Re: [dm-devel] Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?

2017-11-20 Thread Theodore Ts'o
On Thu, Nov 16, 2017 at 03:32:05PM -0700, Chris Murphy wrote:
> 
> XFS does metadata csums by default. But ext4 still doesn't enable them
> for either metadata or the journal by default; they are still optional.
> So for now it mainly benefits XFS.

Metadata checksums are enabled by default in the version of e2fsprogs
shipped by Debian.  Since there were no real problems reported by
Debian users, in the next release of e2fsprogs, coming soon, it will
be enabled by default for all new ext4 file systems.

Regards,

- Ted


Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?

2017-11-17 Thread Andreas Dilger
On Nov 15, 2017, at 7:18 PM, Qu Wenruo  wrote:
> 
> [Background]
> Recently I'm considering the possibility to use checksum from filesystem
> to enhance device-mapper raid.
> 
> The idea behind it is quite simple, since most modern filesystems have
> checksum for their metadata, and even some (btrfs) have checksum for data.
> 
> And for btrfs RAID1/10 (just ignore the RAID5/6 for now), at read time
> it can use the checksum to determine which copy is correct so it can
> return the correct data even if one copy gets corrupted.
> 
> [Objective]
> The final objective is to allow device mapper to do the checksum
> verification (and repair if possible).
> 
> If only for verification, it's not much different from current endio
> hook method used by most of the fs.
> However if we can move the repair part from filesystem (well, only btrfs
> supports it yet), it would benefit all fs.

I recall Darrick was looking into a mechanism to do this.  Rather than
changing the whole block layer to take a callback to do a checksum, what
we looked at was to allow the upper-layer read to specify a "retry count"
to the lower-layer block device.  If the lower layer is able to retry the
read then it will read a different device (or combination of devices for
e.g. RAID-6) based on the retry count, until the upper layer gets a good
read (based on checksum, or whatever).  If there are no more devices (or
combinations) to try then a final error is returned.

Darrick can probably point at the original thread/patch.
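
To make the flow concrete, here is a purely illustrative user-space sketch
(every name below is made up; the real mechanism would sit in the block layer
and the actual patches may look nothing like this):

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define NCOPIES 2

/* Pretend mirror contents: copy 0 is corrupted, copy 1 is good. */
static const char *copies[NCOPIES] = { "garbage", "good data" };

/* "Lower layer": serve attempt `retry` from a different copy;
 * return -1 when there is no further copy/combination to try. */
static int lower_read(int retry, char *buf, size_t len)
{
    if (retry >= NCOPIES)
        return -1;
    strncpy(buf, copies[retry], len - 1);
    buf[len - 1] = '\0';
    return 0;
}

/* "Upper layer" verification, e.g. a filesystem checksum check. */
static bool upper_verify(const char *buf)
{
    return strcmp(buf, "good data") == 0;
}

int main(void)
{
    char buf[32];
    int retry;

    for (retry = 0; ; retry++) {
        if (lower_read(retry, buf, sizeof(buf)) < 0) {
            fprintf(stderr, "EIO: no copy passed verification\n");
            return 1;
        }
        if (upper_verify(buf)) {
            printf("good copy found on retry %d: %s\n", retry, buf);
            return 0;
        }
        /* verification failed: re-issue the read with a higher retry count */
    }
}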

Cheers, Andreas









Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?

2017-11-17 Thread Austin S. Hemmelgarn

On 2017-11-16 20:30, Qu Wenruo wrote:



> On 2017年11月17日 00:47, Austin S. Hemmelgarn wrote:
>>> This is at least less complicated than dm-integrity.
>>>
>>> Just a new hook for READ bio. And it can start from easy part.
>>> Like starting from dm-raid1 and other fs support.
>> It's less complicated for end users (in theory, but cryptsetup devs are
>> working on that for dm-integrity), but significantly more complicated
>> for developers.
>>
>> It also brings up the question of what happens when you want some other
>> layer between the filesystem and the MD/DM RAID layer (say, running
>> bcache or dm-cache on top of the RAID array).  In the case of
>> dm-integrity, that's not an issue because dm-integrity is entirely
>> self-contained, it doesn't depend on other layers beyond the standard
>> block interface.
>
> Each layer can choose to drop the support for extra verification.
>
> If the layer is not modifying the data, it can pass it do lower layer.
> Just as integrity payload.
Which then makes things a bit more complicated in every other layer as
well, in turn making things more complicated for all developers.

>> As I mentioned in my other reply on this thread, running with
>> dm-integrity _below_ the RAID layer instead of on top of it will provide
>> the same net effect, and in fact provide a stronger guarantee than what
>> you are proposing (because dm-integrity does real cryptographic
>> integrity verification, as opposed to just checking for bit-rot).
>
> Although with more CPU usage for each device even they are containing
> same data.
I never said it wasn't higher resource usage.

>>>> If your checksum is calculated and checked at FS level there is no added
>>>> value when you spread this logic to other layers.
>>>
>>> That's why I'm moving the checking part to lower level, to make more
>>> value from the checksum.
>>>
>>>> dm-integrity adds basic 'check-summing' to any filesystem without the
>>>> need to modify fs itself
>>>
>>> Well, despite the fact that modern filesystem has already implemented
>>> their metadata csum.
>>>
>>>>    - the paid price is - if there is bug between
>>>> passing data from  'fs' to dm-integrity'  it cannot be captured.
>>>>
>>>> Advantage of having separated 'fs' and 'block' layer is in its
>>>> separation and simplicity at each level.
>>>
>>> Totally agreed on this.
>>>
>>> But the idea here should not bring that large impact (compared to big
>>> things like ZFS/Btrfs).
>>>
>>> 1) It only affect READ bio
>>> 2) Every dm target can choose if to support or pass down the hook.
>>>      no mean to support it for RAID0 for example.
>>>      And for complex raid like RAID5/6 no need to support it from the very
>>>      beginning.
>>> 3) Main part of the functionality is already implemented
>>>      The core complexity contains 2 parts:
>>>      a) checksum calculation and checking
>>>     Modern fs is already doing this, at least for metadata.
>>>      b) recovery
>>>     dm targets already have this implemented for supported raid
>>>     profile.
>>>      All these are already implemented, just moving them to different
>>>      timing is not bringing such big modification IIRC.
>>>
>>>> If you want integrated solution - you are simply looking for btrfs where
>>>> multiple layers are integrated together.
>>>
>>> If with such verification hook (along with something extra to handle
>>> scrub), btrfs chunk mapping can be re-implemented with device-mapper:
>>>
>>> In fact btrfs logical space is just a dm-linear device, and each chunk
>>> can be implemented by its corresponding dm-* module like:
>>>
>>> dm-linear:   | btrfs chunk 1 | btrfs chunk 2 | ... | btrfs chunk n |
>>> and
>>> btrfs chunk 1: metadata, using dm-raid1 on diskA and diskB
>>> btrfs chunk 2: data, using dm-raid0 on disk A B C D
>>> ...
>>> btrfs chunk n: system, using dm-raid1 on disk A B
>>>
>>> At least btrfs can take the advantage of the simplicity of separate
>>> layers.
>>>
>>> And other filesystem can get a little higher chance to recover its
>>> metadata if built on dm-raid.
>> Again, just put dm-integrity below dm-raid.  The other filesystems
>> primarily have metadata checksums to catch data corruption, not repair
>> it,
>
> Because they have no extra copy.
> If they have, they will definitely use the extra copy to repair.
But they don't have those extra copies now, so that really becomes
irrelevant as an argument (especially since it's not likely they will
add data or metadata replication in the filesystem any time in the near
future).

>> and I severely doubt that you will manage to convince developers to
>> add support in their filesystem (especially XFS) because:
>> 1. It's a layering violation (yes, I know BTRFS is too, but that's a bit
>> less of an issue because it's a completely self-contained layering
>> violation, while this isn't).
>
> If passing something along with bio is violating layers, then integrity
> payload is already doing this for a long time.
The block integrity layer is also interfacing directly with hardware and
_needs_ to pass that data down.  Unless I'm mistaken, it also doesn't do
any verification except in the filesystem layer, and doesn't pass down
any complaints about the integrity of the data (it may try to re-read
it, but that's not the same as what 

Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?

2017-11-16 Thread Chris Murphy
On Thu, Nov 16, 2017 at 6:54 PM, Chris Murphy  wrote:

> The user doesn't have to setup dm-verity to get this.

Or dm-integrity, rather.

-- 
Chris Murphy


Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?

2017-11-16 Thread Chris Murphy
On Thu, Nov 16, 2017 at 6:22 PM, Qu Wenruo  wrote:
>
>
> On 2017年11月17日 06:32, Chris Murphy wrote:
>
>> It's good the file system can stay alive, but data is the much
>> bigger target in terms of percent space on the physical media,
>
> It's also true.
> (Although working on btrfs sometimes makes me care more about safe metadata)

It seems like a good idea if it's lightweight enough, because we get
Btrfs-like metadata error detection and recovery from a copy, for
free. The user doesn't have to setup dm-verity to get this.
Additionally, if the work happens in the md driver, then both mdadm
and LVM based arrays get the feature (strictly speaking I think
dm-raid is deprecated; everything I'm aware of these days uses the md
code, including Intel's IMSM firmware based RAID).

The gotcha of course is that anytime there's a file system format
change, now this layer has to become aware of it and support all
versions of that file system's metadata for the purpose of error
detection. That might be a bitter pill to swallow in the long term.


-- 
Chris Murphy


Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?

2017-11-16 Thread Qu Wenruo


On 2017年11月17日 00:47, Austin S. Hemmelgarn wrote:

>>
>> This is at least less complicated than dm-integrity.
>>
>> Just a new hook for READ bio. And it can start from easy part.
>> Like starting from dm-raid1 and other fs support.
> It's less complicated for end users (in theory, but cryptsetup devs are
> working on that for dm-integrity), but significantly more complicated
> for developers.
> 
> It also brings up the question of what happens when you want some other
> layer between the filesystem and the MD/DM RAID layer (say, running
> bcache or dm-cache on top of the RAID array).  In the case of
> dm-integrity, that's not an issue because dm-integrity is entirely
> self-contained, it doesn't depend on other layers beyond the standard
> block interface.

Each layer can choose to drop the support for extra verification.

If a layer is not modifying the data, it can pass the hook to the lower
layer, just like the integrity payload.

> 
> As I mentioned in my other reply on this thread, running with
> dm-integrity _below_ the RAID layer instead of on top of it will provide
> the same net effect, and in fact provide a stronger guarantee than what
> you are proposing (because dm-integrity does real cryptographic
> integrity verification, as opposed to just checking for bit-rot).

Although that means more CPU usage for each device, even though they
contain the same data.

>>
>>>
>>> If your checksum is calculated and checked at FS level there is no added
>>> value when you spread this logic to other layers.
>>
>> That's why I'm moving the checking part to lower level, to make more
>> value from the checksum.
>>
>>>
>>> dm-integrity adds basic 'check-summing' to any filesystem without the
>>> need to modify fs itself
>>
>> Well, despite the fact that modern filesystem has already implemented
>> their metadata csum.
>>
>>   - the paid price is - if there is bug between
>>> passing data from  'fs' to dm-integrity'  it cannot be captured.
>>>
>>> Advantage of having separated 'fs' and 'block' layer is in its
>>> separation and simplicity at each level.
>>
>> Totally agreed on this.
>>
>> But the idea here should not bring that large impact (compared to big
>> things like ZFS/Btrfs).
>>
>> 1) It only affect READ bio
>> 2) Every dm target can choose if to support or pass down the hook.
>>     no mean to support it for RAID0 for example.
>>     And for complex raid like RAID5/6 no need to support it from the very
>>     beginning.
>> 3) Main part of the functionality is already implemented
>>     The core complexity contains 2 parts:
>>     a) checksum calculation and checking
>>    Modern fs is already doing this, at least for metadata.
>>     b) recovery
>>    dm targets already have this implemented for supported raid
>>    profile.
>>     All these are already implemented, just moving them to different
>>     timing is not bringing such big modification IIRC.
>>>
>>> If you want integrated solution - you are simply looking for btrfs where
>>> multiple layers are integrated together.
>>
>> If with such verification hook (along with something extra to handle
>> scrub), btrfs chunk mapping can be re-implemented with device-mapper:
>>
>> In fact btrfs logical space is just a dm-linear device, and each chunk
>> can be implemented by its corresponding dm-* module like:
>>
>> dm-linear:   | btrfs chunk 1 | btrfs chunk 2 | ... | btrfs chunk n |
>> and
>> btrfs chunk 1: metadata, using dm-raid1 on diskA and diskB
>> btrfs chunk 2: data, using dm-raid0 on disk A B C D
>> ...
>> btrfs chunk n: system, using dm-raid1 on disk A B
>>
>> At least btrfs can take the advantage of the simplicity of separate
>> layers.
>>
>> And other filesystem can get a little higher chance to recover its
>> metadata if built on dm-raid.
> Again, just put dm-integrity below dm-raid.  The other filesystems
> primarily have metadata checksums to catch data corruption, not repair
> it,

Because they have no extra copy.
If they had one, they would definitely use it to repair.

> and I severely doubt that you will manage to convince developers to
> add support in their filesystem (especially XFS) because:
> 1. It's a layering violation (yes, I know BTRFS is too, but that's a bit
> less of an issue because it's a completely self-contained layering
> violation, while this isn't).

If passing something along with a bio violates layering, then the
integrity payload has already been doing that for a long time.

> 2. There's no precedent in hardware (I challenge you to find a block
> device that lets you respond to a read completing with 'Hey, this data
> is bogus, give me the real data!').
> 3. You can get the same net effect with a higher guarantee of security
> using dm-integrity.

With more CPU and IO overhead (journal mode will write data twice: once
for the journal and once for the real data).

Thanks,
Qu

>>
>> Thanks,
>> Qu
>>
>>>
>>> You are also possibly missing feature of dm-interity - it's not just
>>> giving you 'checksum' - it also makes you sure - device has 

Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?

2017-11-16 Thread Qu Wenruo


On 2017年11月17日 06:32, Chris Murphy wrote:
> On Thu, Nov 16, 2017 at 3:04 AM, Qu Wenruo  wrote:
> 
>> For example, if we use the following device mapper layout:
>>
>> FS (can be any fs with metadata csum)
>> |
>>  dm-integrity
>> |
>>  dm-raid1
>>/ \
>>  disk1 disk2
> 
> 
> You would instead do dm-integrity per physical device, then make the
> two dm-integrity devices, members of md raid1 array. Now when
> integrity fails, basically it's UNC error to raid1 which then gets the
> copy from the other device.


Yep, dm-integrity under raid1 makes much more sense here.

Although that doubles the CPU usage for each device added in.

> 
> But what you're getting at, that dm-integrity is more complicated, is
> true, in that it's at least partly COW based in order to get the
> atomic write guarantee needed to ensure data blocks and csums are
> always in sync, and reliable. But this also applies to the entire file
> system. The READ bio concept you're proposing leverages pretty much
> already existing code, has no write performance penalty or complexity
> at all, but does miss data for file systems that don't csum data
> blocks.

That's true, since currently only Btrfs supports data csums.
And for a filesystem to support data csums it needs CoW support, while
only XFS and Btrfs support CoW so far.

> It's good the file system can stay alive, but data is the much
> bigger target in terms of percent space on the physical media,

It's also true.
(Although working on btrfs sometimes makes me care more about safe metadata)

Thanks,
Qu

> and
> more likely to be corrupt or go missing due to media defect or
> whatever. It's still possible for silent data corruption to happen.
> 
> 
> 
> 
>> I just want to make device-mapper raid able to handle such case too.
>> Especially when most fs supports checksum for their metadata.
> 
> XFS by default does metadata csums. But ext4 doesn't use it for either
> metadata or the journal by default still, it is still optional. So for
> now it mainly benefits XFS.
> 
> 





Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?

2017-11-16 Thread Chris Murphy
On Thu, Nov 16, 2017 at 3:04 AM, Qu Wenruo  wrote:

> For example, if we use the following device mapper layout:
>
> FS (can be any fs with metadata csum)
> |
>  dm-integrity
> |
>  dm-raid1
>/ \
>  disk1 disk2


You would instead do dm-integrity per physical device, then make the
two dm-integrity devices members of an md raid1 array. Now when
integrity fails, it's basically a UNC error to raid1, which then gets the
copy from the other device.

But what you're getting at, that dm-integrity is more complicated, is
true, in that it's at least partly COW based in order to get the
atomic write guarantee needed to ensure data blocks and csums are
always in sync, and reliable. But this also applies to the entire file
system. The READ bio concept you're proposing leverages pretty much
already existing code, has no write performance penalty or complexity
at all, but does miss data for file systems that don't csum data
blocks. It's good the file system can stay alive, but data is the much
bigger target in terms of percent space on the physical media, and
more likely to be corrupt or go missing due to media defect or
whatever. It's still possible for silent data corruption to happen.




> I just want to make device-mapper raid able to handle such case too.
> Especially when most fs supports checksum for their metadata.

XFS does metadata csums by default. But ext4 still doesn't enable them for
either metadata or the journal by default; they are still optional. So for
now it mainly benefits XFS.


-- 
Chris Murphy


Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?

2017-11-16 Thread Pasi Kärkkäinen
On Thu, Nov 16, 2017 at 11:47:45AM -0500, Austin S. Hemmelgarn wrote:
> >
> >At least btrfs can take the advantage of the simplicity of separate layers.
> >
> >And other filesystem can get a little higher chance to recover its
> >metadata if built on dm-raid.
> Again, just put dm-integrity below dm-raid.  The other filesystems primarily
> have metadata checksums to catch data corruption, not repair it, and I
> severely doubt that you will manage to convince developers to add support in
> their filesystem (especially XFS) because:
> 1. It's a layering violation (yes, I know BTRFS is too, but that's a bit
> less of an issue because it's a completely self-contained layering
> violation, while this isn't).
> 2. There's no precedent in hardware (I challenge you to find a block device
> that lets you respond to a read completing with 'Hey, this data is bogus,
> give me the real data!').
>

Isn't this what T10 DIF/DIX (Data Integrity Fields / Data Integrity
Extensions) allows: using checksums all the way from userspace applications
to the disks in the storage backend, with checksum verification at all points
in between?

It does require compatible hardware/firmware/kernel/drivers/apps though, so it's
not really a generic solution.


-- Pasi

> 3. You can get the same net effect with a higher guarantee of security using
> dm-integrity.
> >
> >Thanks,
> >Qu
> >
> >>
> >>You are also possibly missing feature of dm-interity - it's not just
> >>giving you 'checksum' - it also makes you sure - device has proper
> >>content - you can't just 'replace block' even with proper checksum for a
> >>block somewhere in the middle of you device... and when joined with
> >>crypto - it makes it way more secure...
> >>
> >>Regards
> >>
> >>Zdenek
> >
> 


Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?

2017-11-16 Thread Austin S. Hemmelgarn

On 2017-11-16 09:06, Qu Wenruo wrote:



On 2017年11月16日 20:33, Zdenek Kabelac wrote:

Dne 16.11.2017 v 11:04 Qu Wenruo napsal(a):



On 2017年11月16日 17:43, Zdenek Kabelac wrote:

Dne 16.11.2017 v 09:08 Qu Wenruo napsal(a):






[What we have]
The nearest infrastructure I found in kernel is
bio_integrity_payload.



Hi

We already have  dm-integrity target upstream.
What's missing in this target ?


If I didn't miss anything, the dm-integrity is designed to calculate and
restore csum into its space to verify the integrity.
The csum happens when bio reaches dm-integrity.

However what I want is, fs generate bio with attached verification hook,
and pass to lower layers to verify it.

For example, if we use the following device mapper layout:

  FS (can be any fs with metadata csum)
  |
   dm-integrity
  |
   dm-raid1
     / \
   disk1 disk2

If some data in disk1 get corrupted (the disk itself is still good), and
when dm-raid1 tries to read the corrupted data, it may return the
corrupted one, and then caught by dm-integrity, finally return -EIO to
FS.

But the truth is, we could at least try to read out data in disk2 if we
know the csum for it.
And use the checksum to verify if it's the correct data.


So my idea will be:
   FS (with metadata csum, or even data csum support)
  |  READ bio for metadata
  |  -With metadata verification hook
  dm-raid1
     / \
    disk1   disk2

dm-raid1 handles the bio, reading out data from disk1.
But the result can't pass verification hook.
Then retry with disk2.

If result from disk2 passes verification hook. That's good, returning
the result from disk2 to upper layer (fs).
And we can even submit WRITE bio to try to write the good result back to
disk1.

If result from disk2 doesn't pass verification hook, then we return -EIO
to upper layer.

That's what btrfs has already done for DUP/RAID1/10 (although RAID5/6
will also try to rebuild data, but it still has some problem).

I just want to make device-mapper raid able to handle such case too.
Especially when most fs supports checksum for their metadata.



Hi

IMHO you are looking for too complicated solution.


This is at least less complicated than dm-integrity.

Just a new hook for READ bio. And it can start from easy part.
Like starting from dm-raid1 and other fs support.
It's less complicated for end users (in theory, but cryptsetup devs are 
working on that for dm-integrity), but significantly more complicated 
for developers.


It also brings up the question of what happens when you want some other 
layer between the filesystem and the MD/DM RAID layer (say, running 
bcache or dm-cache on top of the RAID array).  In the case of 
dm-integrity, that's not an issue because dm-integrity is entirely 
self-contained, it doesn't depend on other layers beyond the standard 
block interface.


As I mentioned in my other reply on this thread, running with 
dm-integrity _below_ the RAID layer instead of on top of it will provide 
the same net effect, and in fact provide a stronger guarantee than what 
you are proposing (because dm-integrity does real cryptographic 
integrity verification, as opposed to just checking for bit-rot).




If your checksum is calculated and checked at FS level there is no added
value when you spread this logic to other layers.


That's why I'm moving the checking part to lower level, to make more
value from the checksum.



dm-integrity adds basic 'check-summing' to any filesystem without the
need to modify fs itself


Well, despite the fact that modern filesystem has already implemented
their metadata csum.

  - the paid price is - if there is bug between

passing data from  'fs' to dm-integrity'  it cannot be captured.

Advantage of having separated 'fs' and 'block' layer is in its
separation and simplicity at each level.


Totally agreed on this.

But the idea here should not bring that large impact (compared to big
things like ZFS/Btrfs).

1) It only affect READ bio
2) Every dm target can choose if to support or pass down the hook.
no mean to support it for RAID0 for example.
And for complex raid like RAID5/6 no need to support it from the very
beginning.
3) Main part of the functionality is already implemented
The core complexity contains 2 parts:
a) checksum calculation and checking
   Modern fs is already doing this, at least for metadata.
b) recovery
   dm targets already have this implemented for supported raid
   profile.
All these are already implemented, just moving them to different
timing is not bringing such big modification IIRC.


If you want integrated solution - you are simply looking for btrfs where
multiple layers are integrated together.


If with such verification hook (along with something extra to handle
scrub), btrfs chunk mapping can be re-implemented with device-mapper:

In 

Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?

2017-11-16 Thread Qu Wenruo


On 2017年11月16日 20:33, Zdenek Kabelac wrote:
> Dne 16.11.2017 v 11:04 Qu Wenruo napsal(a):
>>
>>
>> On 2017年11月16日 17:43, Zdenek Kabelac wrote:
>>> Dne 16.11.2017 v 09:08 Qu Wenruo napsal(a):


>
>>>> [What we have]
>>>> The nearest infrastructure I found in kernel is
>>>> bio_integrity_payload.

>>>
>>> Hi
>>>
>>> We already have  dm-integrity target upstream.
>>> What's missing in this target ?
>>
>> If I didn't miss anything, the dm-integrity is designed to calculate and
>> restore csum into its space to verify the integrity.
>> The csum happens when bio reaches dm-integrity.
>>
>> However what I want is, fs generate bio with attached verification hook,
>> and pass to lower layers to verify it.
>>
>> For example, if we use the following device mapper layout:
>>
>>  FS (can be any fs with metadata csum)
>>  |
>>   dm-integrity
>>  |
>>   dm-raid1
>>     / \
>>   disk1 disk2
>>
>> If some data in disk1 get corrupted (the disk itself is still good), and
>> when dm-raid1 tries to read the corrupted data, it may return the
>> corrupted one, and then caught by dm-integrity, finally return -EIO to
>> FS.
>>
>> But the truth is, we could at least try to read out data in disk2 if we
>> know the csum for it.
>> And use the checksum to verify if it's the correct data.
>>
>>
>> So my idea will be:
>>   FS (with metadata csum, or even data csum support)
>>  |  READ bio for metadata
>>  |  -With metadata verification hook
>>  dm-raid1
>>     / \
>>    disk1   disk2
>>
>> dm-raid1 handles the bio, reading out data from disk1.
>> But the result can't pass verification hook.
>> Then retry with disk2.
>>
>> If result from disk2 passes verification hook. That's good, returning
>> the result from disk2 to upper layer (fs).
>> And we can even submit WRITE bio to try to write the good result back to
>> disk1.
>>
>> If result from disk2 doesn't pass verification hook, then we return -EIO
>> to upper layer.
>>
>> That's what btrfs has already done for DUP/RAID1/10 (although RAID5/6
>> will also try to rebuild data, but it still has some problem).
>>
>> I just want to make device-mapper raid able to handle such case too.
>> Especially when most fs supports checksum for their metadata.
>>
> 
> Hi
> 
> IMHO you are looking for too complicated solution.

This is at least less complicated than dm-integrity.

Just a new hook for READ bios. And it can start from the easy part,
like starting with dm-raid1 and support from other filesystems.

> 
> If your checksum is calculated and checked at FS level there is no added
> value when you spread this logic to other layers.

That's why I'm moving the checking part to a lower level, to get more
value from the checksum.

> 
> dm-integrity adds basic 'check-summing' to any filesystem without the
> need to modify fs itself

Well, despite the fact that modern filesystems have already implemented
their own metadata csums.

 - the paid price is - if there is bug between
> passing data from  'fs' to dm-integrity'  it cannot be captured.
> 
> Advantage of having separated 'fs' and 'block' layer is in its
> separation and simplicity at each level.

Totally agreed on this.

But the idea here should not bring that large an impact (compared to big
things like ZFS/Btrfs):

1) It only affects READ bios.
2) Every dm target can choose whether to support the hook or pass it down.
   There is no need to support it for RAID0, for example, and for complex
   raid like RAID5/6 there is no need to support it from the very beginning.
3) The main part of the functionality is already implemented.
   The core complexity has 2 parts:
   a) checksum calculation and checking
      Modern filesystems already do this, at least for metadata.
   b) recovery
      dm targets already have this implemented for the supported raid
      profiles.
   All of this is already implemented; just invoking it at a different
   time should not bring such big modifications IIRC.
> 
> If you want integrated solution - you are simply looking for btrfs where
> multiple layers are integrated together.

With such a verification hook (along with something extra to handle
scrub), btrfs chunk mapping could be re-implemented with device-mapper:

In fact the btrfs logical space is just a dm-linear device, and each chunk
can be implemented by its corresponding dm-* module, like:

dm-linear:   | btrfs chunk 1 | btrfs chunk 2 | ... | btrfs chunk n |
and
btrfs chunk 1: metadata, using dm-raid1 on diskA and diskB
btrfs chunk 2: data, using dm-raid0 on disk A B C D
...
btrfs chunk n: system, using dm-raid1 on disk A B

At least btrfs could take advantage of the simplicity of separate layers.

And other filesystems would get a somewhat higher chance of recovering their
metadata if built on dm-raid.

Thanks,
Qu

> 
> You are also possibly missing feature of dm-interity - it's not just
> giving you 'checksum' - it also makes you sure - device 

Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?

2017-11-16 Thread Austin S. Hemmelgarn

On 2017-11-16 07:33, Zdenek Kabelac wrote:

> Dne 16.11.2017 v 11:04 Qu Wenruo napsal(a):
>>
>> On 2017年11月16日 17:43, Zdenek Kabelac wrote:
>>> Dne 16.11.2017 v 09:08 Qu Wenruo napsal(a):
>>>>
>>>> [What we have]
>>>> The nearest infrastructure I found in kernel is
>>>> bio_integrity_payload.
>>>
>>> Hi
>>>
>>> We already have  dm-integrity target upstream.
>>> What's missing in this target ?
>>
>> If I didn't miss anything, the dm-integrity is designed to calculate and
>> restore csum into its space to verify the integrity.
>> The csum happens when bio reaches dm-integrity.
>>
>> However what I want is, fs generate bio with attached verification hook,
>> and pass to lower layers to verify it.
>>
>> For example, if we use the following device mapper layout:
>>
>>  FS (can be any fs with metadata csum)
>>  |
>>   dm-integrity
>>  |
>>   dm-raid1
>>     / \
>>   disk1 disk2
>>
>> If some data in disk1 get corrupted (the disk itself is still good), and
>> when dm-raid1 tries to read the corrupted data, it may return the
>> corrupted one, and then caught by dm-integrity, finally return -EIO to
>> FS.
>>
>> But the truth is, we could at least try to read out data in disk2 if we
>> know the csum for it.
>> And use the checksum to verify if it's the correct data.
>>
>> So my idea will be:
>>   FS (with metadata csum, or even data csum support)
>>  |  READ bio for metadata
>>  |  -With metadata verification hook
>>  dm-raid1
>>     / \
>>    disk1   disk2
>>
>> dm-raid1 handles the bio, reading out data from disk1.
>> But the result can't pass verification hook.
>> Then retry with disk2.
>>
>> If result from disk2 passes verification hook. That's good, returning
>> the result from disk2 to upper layer (fs).
>> And we can even submit WRITE bio to try to write the good result back to
>> disk1.
>>
>> If result from disk2 doesn't pass verification hook, then we return -EIO
>> to upper layer.
>>
>> That's what btrfs has already done for DUP/RAID1/10 (although RAID5/6
>> will also try to rebuild data, but it still has some problem).
>>
>> I just want to make device-mapper raid able to handle such case too.
>> Especially when most fs supports checksum for their metadata.
>
> Hi
>
> IMHO you are looking for too complicated solution.
>
> If your checksum is calculated and checked at FS level there is no added
> value when you spread this logic to other layers.
>
> dm-integrity adds basic 'check-summing' to any filesystem without the
> need to modify fs itself - the paid price is - if there is bug between
> passing data from  'fs' to dm-integrity'  it cannot be captured.
But that is true of pretty much any layering, not just dm-integrity.
There's just a slightly larger window for corruption with dm-integrity.

> Advantage of having separated 'fs' and 'block' layer is in its
> separation and simplicity at each level.
>
> If you want integrated solution - you are simply looking for btrfs where
> multiple layers are integrated together.
>
> You are also possibly missing feature of dm-interity - it's not just
> giving you 'checksum' - it also makes you sure - device has proper
> content - you can't just 'replace block' even with proper checksum for a
> block somewhere in the middle of you device... and when joined with
> crypto - it makes it way more secure...
And to expand a bit further, the correct way to integrate dm-integrity
into the stack when RAID is involved is to put it _below_ the RAID
layer, so each underlying device is its own dm-integrity target.
Assuming I understand the way dm-raid and md handle -EIO, that should
get you a similar level of protection to BTRFS (worse in some ways,
better in others).



Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?

2017-11-16 Thread Zdenek Kabelac

Dne 16.11.2017 v 11:04 Qu Wenruo napsal(a):



On 2017年11月16日 17:43, Zdenek Kabelac wrote:

Dne 16.11.2017 v 09:08 Qu Wenruo napsal(a):






[What we have]
The nearest infrastructure I found in kernel is
bio_integrity_payload.



Hi

We already have  dm-integrity target upstream.
What's missing in this target ?


If I didn't miss anything, the dm-integrity is designed to calculate and
restore csum into its space to verify the integrity.
The csum happens when bio reaches dm-integrity.

However what I want is, fs generate bio with attached verification hook,
and pass to lower layers to verify it.

For example, if we use the following device mapper layout:

 FS (can be any fs with metadata csum)
 |
  dm-integrity
 |
  dm-raid1
/ \
  disk1 disk2

If some data in disk1 get corrupted (the disk itself is still good), and
when dm-raid1 tries to read the corrupted data, it may return the
corrupted one, and then caught by dm-integrity, finally return -EIO to FS.

But the truth is, we could at least try to read out data in disk2 if we
know the csum for it.
And use the checksum to verify if it's the correct data.


So my idea will be:
  FS (with metadata csum, or even data csum support)
 |  READ bio for metadata
 |  -With metadata verification hook
 dm-raid1
/ \
   disk1   disk2

dm-raid1 handles the bio, reading out data from disk1.
But the result can't pass verification hook.
Then retry with disk2.

If result from disk2 passes verification hook. That's good, returning
the result from disk2 to upper layer (fs).
And we can even submit WRITE bio to try to write the good result back to
disk1.

If result from disk2 doesn't pass verification hook, then we return -EIO
to upper layer.

That's what btrfs has already done for DUP/RAID1/10 (although RAID5/6
will also try to rebuild data, but it still has some problem).

I just want to make device-mapper raid able to handle such case too.
Especially when most fs supports checksum for their metadata.



Hi

IMHO you are looking for too complicated a solution.

If your checksum is calculated and checked at the FS level there is no added value
when you spread this logic to other layers.

dm-integrity adds basic 'check-summing' to any filesystem without the need to
modify the fs itself - the price paid is that if there is a bug between passing data
from 'fs' to 'dm-integrity', it cannot be captured.


Advantage of having separated 'fs' and 'block' layer is in its separation and 
simplicity at each level.


If you want integrated solution - you are simply looking for btrfs where 
multiple layers are integrated together.


You are also possibly missing a feature of dm-integrity - it's not just giving
you a 'checksum' - it also makes sure the device has proper content - you
can't just 'replace a block', even with a proper checksum, for a block somewhere in
the middle of your device... and when joined with crypto it makes it way more
secure...


Regards

Zdenek


Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?

2017-11-16 Thread Qu Wenruo


On 2017年11月16日 17:43, Zdenek Kabelac wrote:
> Dne 16.11.2017 v 09:08 Qu Wenruo napsal(a):
>>
>>
>>>
>> [What we have]
>> The nearest infrastructure I found in kernel is
>> bio_integrity_payload.
>>
> 
> Hi
> 
> We already have  dm-integrity target upstream.
> What's missing in this target ?

If I didn't miss anything, dm-integrity is designed to calculate and
store csums in its own space to verify integrity.
The csum is generated when the bio reaches dm-integrity.

However, what I want is for the fs to generate a bio with an attached
verification hook, and pass it to lower layers to verify.

For example, if we use the following device mapper layout:

FS (can be any fs with metadata csum)
|
 dm-integrity
|
 dm-raid1
   / \
 disk1 disk2

If some data on disk1 gets corrupted (the disk itself is still good), then
when dm-raid1 tries to read the corrupted data, it may return the corrupted
copy, which is then caught by dm-integrity, finally returning -EIO to the FS.

But the truth is, we could at least try to read the data from disk2 if we
know the csum for it, and use the checksum to verify whether it's the
correct data.


So my idea will be:
 FS (with metadata csum, or even data csum support)
|  READ bio for metadata
|  -With metadata verification hook
dm-raid1
   / \
  disk1   disk2

dm-raid1 handles the bio, reading data from disk1.
If the result can't pass the verification hook, it retries with disk2.

If the result from disk2 passes the verification hook, that's good: we return
the result from disk2 to the upper layer (fs).
And we can even submit a WRITE bio to try to write the good result back to
disk1.

If the result from disk2 doesn't pass the verification hook either, then we
return -EIO to the upper layer.

That's what btrfs has already done for DUP/RAID1/10 (although RAID5/6
will also try to rebuild data, but it still has some problem).

I just want to make device-mapper raid able to handle such case too.
Especially when most fs supports checksum for their metadata.
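
For illustration only, a user-space model of that flow (struct read_req,
raid1_read() and the fs_verify_metadata() hook below are invented names,
not dm or block-layer API):

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* A READ request carrying the fs-supplied verification hook. */
struct read_req {
    char data[32];
    bool (*verify)(const char *data);
};

/* Two mirrors; mirror 0 holds a corrupted copy. */
static char mirror[2][32] = { "corrupted", "metadata+good-csum" };

/* Stand-in for the fs checking the csum embedded in its metadata header. */
static bool fs_verify_metadata(const char *data)
{
    return strcmp(data, "metadata+good-csum") == 0;
}

/* raid1-like read: try each mirror until the hook is satisfied,
 * and write the good copy back over any mirror that fails the hook. */
static int raid1_read(struct read_req *req)
{
    int i, j;

    for (i = 0; i < 2; i++) {
        strcpy(req->data, mirror[i]);
        if (!req->verify(req->data))
            continue;       /* this copy is bad, retry with the other one */
        for (j = 0; j < 2; j++)
            if (!req->verify(mirror[j]))
                strcpy(mirror[j], req->data);   /* repair the bad mirror */
        return 0;
    }
    return -5;              /* -EIO: no mirror passed the fs's verification */
}

int main(void)
{
    struct read_req req = { .verify = fs_verify_metadata };

    if (raid1_read(&req) == 0)
        printf("read ok: \"%s\"; mirror 0 is now \"%s\"\n", req.data, mirror[0]);
    else
        printf("read failed with -EIO\n");
    return 0;
}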

Thanks,
Qu
> 
> Regards
> 
> Zdenek
> 
> 





Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?

2017-11-16 Thread Zdenek Kabelac

Dne 16.11.2017 v 09:08 Qu Wenruo napsal(a):






> [What we have]
> The nearest infrastructure I found in kernel is bio_integrity_payload.



Hi

We already have  dm-integrity target upstream.
What's missing in this target ?

Regards

Zdenek




Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?

2017-11-16 Thread Qu Wenruo


On 2017年11月16日 15:42, Nikolay Borisov wrote:
> 
> 
> On 16.11.2017 09:38, Qu Wenruo wrote:
>>
>>
>> On 2017年11月16日 14:54, Nikolay Borisov wrote:
>>>
>>>
>>> On 16.11.2017 04:18, Qu Wenruo wrote:
>>>> Hi all,
>>>>
>>>> [Background]
>>>> Recently I'm considering the possibility to use checksum from filesystem
>>>> to enhance device-mapper raid.
>>>>
>>>> The idea behind it is quite simple, since most modern filesystems have
>>>> checksum for their metadata, and even some (btrfs) have checksum for data.
>>>>
>>>> And for btrfs RAID1/10 (just ignore the RAID5/6 for now), at read time
>>>> it can use the checksum to determine which copy is correct so it can
>>>> return the correct data even one copy get corrupted.
>>>>
>>>> [Objective]
>>>> The final objective is to allow device mapper to do the checksum
>>>> verification (and repair if possible).
>>>>
>>>> If only for verification, it's not much different from current endio
>>>> hook method used by most of the fs.
>>>> However if we can move the repair part from filesystem (well, only btrfs
>>>> supports it yet), it would benefit all fs.
>>>>
>>>> [What we have]
>>>> The nearest infrastructure I found in kernel is bio_integrity_payload.
>>>>
>>>> However I found it's bounded to device, as it's designed to support
>>>> SCSI/SATA integrity protocol.
>>>> While for such use case, it's more bounded to filesystem, as fs (or
>>>> higher layer dm device) is the source of integrity data, and device
>>>> (dm-raid) only do the verification and possible repair.
>>>>
>>>> I'm not sure if this is a good idea to reuse or abuse
>>>> bio_integrity_payload for this purpose.
>>>>
>>>> Should we use some new infrastructure or enhance existing
>>>> bio_integrity_payload?
>>>>
>>>> (Or is this a valid idea or just another crazy dream?)

>>>
>>> This sounds good in principle, however I think there is one crucial
>>> point which needs to be considered:
>>>
>>> All fs with checksums store those checksums in some specific way, then
>>> when they fetch data from disk they they also know how to acquire the
>>> respective checksum.
>>
>> Just like integrity payload, we generate READ bio attached with checksum
>> hook function and checksum data.
> 
> So how is this checksum data acquired in the first place?

In the btrfs case, through a metadata read bio,
since btrfs puts data csums into its csum tree, as metadata.

We pass a READ bio with a metadata-specific verification function, and empty
verification data.

> 
>>
>> So for data read, we read checksum first and attach it to data READ bio,
>> then submit it.
>>
>> And for metadata read, in most case the checksum is integrated into
>> metadata header, like what we did in btrfs.
>>
>> In that case we attach empty checksum data to bio, but use metadata
>> specific function hook to handle it.
>>
>>> What you suggest might be doable but it will
>>> require lower layers (dm) be aware of how to acquire the specific
>>> checksum for some data.
>>
>> In above case, dm only needs to call the verification hook function.
>> If verification passed, that's good.
>> If not, try other copy if we have.
>>
>> In this case, I don't think dm layer needs any extra interface to
>> communicate with higher layer.
> 
> 
> Well that verification function is the interface I meant, you are
> communicating the checksum out of band essentially (notwithstanding the
> metadata case, since you said checksum is in the actual metadata header)
> 
> In the end - which problem are you trying to solve, allow for a generic
> checksumming layer which filesystems may use if they decide to ?

To make it clear: to allow the device mapper layer to make use of the
filesystem's checksums (if it has them) when there are multiple copies.

One problem with current dm raid1/10 (and possibly raid5/6) is that they
don't have the ability to know which copy is correct.
They can only handle a device disappearing.

Btrfs handles it by verifying data/metadata checksums.
Since xfs/ext4 also have checksums for their metadata, why not allow the
device mapper to use those checksums to pick the correct copy?

The mechanism is *NOT* a generic checksum layer.
How the csum is stored is determined by the fs.
It just allows the device mapper layer to be aware of it and make a smarter
decision.

Moreover, this only affects READ bios; WRITE bios are not affected at all.
Csum calculation and storage are all handled by the filesystem.
The device mapper layer doesn't need to get involved in that case.

And of course, btrfs can reuse this facility to do something bigger, but
that's another story.

Thanks,
Qu

> 
>>
>> Thanks,
>> Qu
>>
>>> I don't think at this point there is such infra
>>> and frankly I cannot even envision how it will work elegantly. Sure you
>>> can create a dm-checksum target (which I believe dm-verity is very
>>> similar to) that stores checksums alongside data but at this point the
>>> fs is really out of the picture.
>>>
>>>
 Thanks,
 Qu


Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?

2017-11-15 Thread Nikolay Borisov


On 16.11.2017 09:38, Qu Wenruo wrote:
> 
> 
> On 2017年11月16日 14:54, Nikolay Borisov wrote:
>>
>>
>> On 16.11.2017 04:18, Qu Wenruo wrote:
>>> Hi all,
>>>
>>> [Background]
>>> Recently I'm considering the possibility to use checksum from filesystem
>>> to enhance device-mapper raid.
>>>
>>> The idea behind it is quite simple, since most modern filesystems have
>>> checksum for their metadata, and even some (btrfs) have checksum for data.
>>>
>>> And for btrfs RAID1/10 (just ignore the RAID5/6 for now), at read time
>>> it can use the checksum to determine which copy is correct so it can
>>> return the correct data even one copy get corrupted.
>>>
>>> [Objective]
>>> The final objective is to allow device mapper to do the checksum
>>> verification (and repair if possible).
>>>
>>> If only for verification, it's not much different from current endio
>>> hook method used by most of the fs.
>>> However if we can move the repair part from filesystem (well, only btrfs
>>> supports it yet), it would benefit all fs.
>>>
>>> [What we have]
>>> The nearest infrastructure I found in kernel is bio_integrity_payload.
>>>
>>> However I found it's bounded to device, as it's designed to support
>>> SCSI/SATA integrity protocol.
>>> While for such use case, it's more bounded to filesystem, as fs (or
>>> higher layer dm device) is the source of integrity data, and device
>>> (dm-raid) only do the verification and possible repair.
>>>
>>> I'm not sure if this is a good idea to reuse or abuse
>>> bio_integrity_payload for this purpose.
>>>
>>> Should we use some new infrastructure or enhance existing
>>> bio_integrity_payload?
>>>
>>> (Or is this a valid idea or just another crazy dream?)
>>>
>>
>> This sounds good in principle, however I think there is one crucial
>> point which needs to be considered:
>>
>> All fs with checksums store those checksums in some specific way, then
>> when they fetch data from disk they they also know how to acquire the
>> respective checksum.
> 
> Just like integrity payload, we generate READ bio attached with checksum
> hook function and checksum data.

So how is this checksum data acquired in the first place?

> 
> So for data read, we read checksum first and attach it to data READ bio,
> then submit it.
> 
> And for metadata read, in most case the checksum is integrated into
> metadata header, like what we did in btrfs.
> 
> In that case we attach empty checksum data to bio, but use metadata
> specific function hook to handle it.
> 
>> What you suggest might be doable but it will
>> require lower layers (dm) be aware of how to acquire the specific
>> checksum for some data.
> 
> In above case, dm only needs to call the verification hook function.
> If verification passed, that's good.
> If not, try other copy if we have.
> 
> In this case, I don't think dm layer needs any extra interface to
> communicate with higher layer.


Well, that verification function is the interface I meant; you are
essentially communicating the checksum out of band (notwithstanding the
metadata case, since you said the checksum is in the actual metadata header).

In the end, which problem are you trying to solve - allowing for a generic
checksumming layer which filesystems may use if they decide to?

> 
> Thanks,
> Qu
> 
>> I don't think at this point there is such infra
>> and frankly I cannot even envision how it will work elegantly. Sure you
>> can create a dm-checksum target (which I believe dm-verity is very
>> similar to) that stores checksums alongside data but at this point the
>> fs is really out of the picture.
>>
>>
>>> Thanks,
>>> Qu
>>>
> 


Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?

2017-11-15 Thread Qu Wenruo


On 2017年11月16日 14:54, Nikolay Borisov wrote:
> 
> 
> On 16.11.2017 04:18, Qu Wenruo wrote:
>> Hi all,
>>
>> [Background]
>> Recently I'm considering the possibility to use checksum from filesystem
>> to enhance device-mapper raid.
>>
>> The idea behind it is quite simple, since most modern filesystems have
>> checksum for their metadata, and even some (btrfs) have checksum for data.
>>
>> And for btrfs RAID1/10 (just ignore the RAID5/6 for now), at read time
>> it can use the checksum to determine which copy is correct so it can
>> return the correct data even one copy get corrupted.
>>
>> [Objective]
>> The final objective is to allow device mapper to do the checksum
>> verification (and repair if possible).
>>
>> If only for verification, it's not much different from current endio
>> hook method used by most of the fs.
>> However if we can move the repair part from filesystem (well, only btrfs
>> supports it yet), it would benefit all fs.
>>
>> [What we have]
>> The nearest infrastructure I found in kernel is bio_integrity_payload.
>>
>> However I found it's bounded to device, as it's designed to support
>> SCSI/SATA integrity protocol.
>> While for such use case, it's more bounded to filesystem, as fs (or
>> higher layer dm device) is the source of integrity data, and device
>> (dm-raid) only do the verification and possible repair.
>>
>> I'm not sure if this is a good idea to reuse or abuse
>> bio_integrity_payload for this purpose.
>>
>> Should we use some new infrastructure or enhance existing
>> bio_integrity_payload?
>>
>> (Or is this a valid idea or just another crazy dream?)
>>
> 
> This sounds good in principle, however I think there is one crucial
> point which needs to be considered:
> 
> All fs with checksums store those checksums in some specific way, then
> when they fetch data from disk they they also know how to acquire the
> respective checksum.

Just like the integrity payload, we generate a READ bio with an attached
checksum hook function and checksum data.

So for a data read, we read the checksum first and attach it to the data READ
bio, then submit it.

And for a metadata read, in most cases the checksum is integrated into the
metadata header, like what we do in btrfs.

In that case we attach empty checksum data to the bio, but use a
metadata-specific hook function to handle it.

> What you suggest might be doable but it will
> require lower layers (dm) be aware of how to acquire the specific
> checksum for some data.

In the above case, dm only needs to call the verification hook function.
If verification passes, that's good.
If not, try another copy if we have one.

In this case, I don't think the dm layer needs any extra interface to
communicate with the higher layer.
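
As a rough, purely illustrative model of those two cases (the names and the
toy one-byte checksum below are made up; this is not the btrfs or
bio_integrity_payload API):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct verify_ctx {
    const uint8_t *csum;    /* external csum for data reads, NULL for metadata */
    size_t csum_len;
};

static uint8_t toy_csum(const uint8_t *buf, size_t len)
{
    uint8_t s = 0;
    while (len--)
        s += *buf++;
    return s;               /* stand-in for crc32c etc. */
}

/* data read: compare the block against the csum the fs read from its csum tree */
static bool verify_data(const uint8_t *block, size_t len, struct verify_ctx *ctx)
{
    return ctx->csum_len == 1 && toy_csum(block, len) == ctx->csum[0];
}

/* metadata read: first byte of the block is the embedded csum of the rest */
static bool verify_metadata(const uint8_t *block, size_t len, struct verify_ctx *ctx)
{
    (void)ctx;              /* empty csum data, per the metadata case above */
    return len > 1 && block[0] == toy_csum(block + 1, len - 1);
}

int main(void)
{
    uint8_t data_block[4] = { 1, 2, 3, 4 };
    uint8_t expected = toy_csum(data_block, sizeof(data_block));
    struct verify_ctx data_ctx = { .csum = &expected, .csum_len = 1 };

    uint8_t meta_block[4] = { 0, 7, 8, 9 };
    meta_block[0] = toy_csum(meta_block + 1, 3);
    struct verify_ctx meta_ctx = { 0 };

    printf("data read verifies: %d\n", verify_data(data_block, 4, &data_ctx));
    printf("metadata read verifies: %d\n", verify_metadata(meta_block, 4, &meta_ctx));
    return 0;
}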

Thanks,
Qu

> I don't think at this point there is such infra
> and frankly I cannot even envision how it will work elegantly. Sure you
> can create a dm-checksum target (which I believe dm-verity is very
> similar to) that stores checksums alongside data but at this point the
> fs is really out of the picture.
> 
> 
>> Thanks,
>> Qu
>>
> 





Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?

2017-11-15 Thread Nikolay Borisov


On 16.11.2017 04:18, Qu Wenruo wrote:
> Hi all,
> 
> [Background]
> Recently I'm considering the possibility to use checksum from filesystem
> to enhance device-mapper raid.
> 
> The idea behind it is quite simple, since most modern filesystems have
> checksum for their metadata, and even some (btrfs) have checksum for data.
> 
> And for btrfs RAID1/10 (just ignore the RAID5/6 for now), at read time
> it can use the checksum to determine which copy is correct so it can
> return the correct data even one copy get corrupted.
> 
> [Objective]
> The final objective is to allow device mapper to do the checksum
> verification (and repair if possible).
> 
> If only for verification, it's not much different from current endio
> hook method used by most of the fs.
> However if we can move the repair part from filesystem (well, only btrfs
> supports it yet), it would benefit all fs.
> 
> [What we have]
> The nearest infrastructure I found in kernel is bio_integrity_payload.
> 
> However I found it's bounded to device, as it's designed to support
> SCSI/SATA integrity protocol.
> While for such use case, it's more bounded to filesystem, as fs (or
> higher layer dm device) is the source of integrity data, and device
> (dm-raid) only do the verification and possible repair.
> 
> I'm not sure if this is a good idea to reuse or abuse
> bio_integrity_payload for this purpose.
> 
> Should we use some new infrastructure or enhance existing
> bio_integrity_payload?
> 
> (Or is this a valid idea or just another crazy dream?)
> 

This sounds good in principle, however I think there is one crucial
point which needs to be considered:

All fs with checksums store those checksums in some specific way, so
when they fetch data from disk they also know how to acquire the
respective checksum. What you suggest might be doable but it will
require lower layers (dm) be aware of how to acquire the specific
checksum for some data. I don't think at this point there is such infra
and frankly I cannot even envision how it will work elegantly. Sure you
can create a dm-checksum target (which I believe dm-verity is very
similar to) that stores checksums alongside data but at this point the
fs is really out of the picture.


> Thanks,
> Qu
> 


Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?

2017-11-15 Thread Qu Wenruo
Hi all,

[Background]
Recently I'm considering the possibility to use checksum from filesystem
to enhance device-mapper raid.

The idea behind it is quite simple, since most modern filesystems have
checksum for their metadata, and even some (btrfs) have checksum for data.

And for btrfs RAID1/10 (just ignore the RAID5/6 for now), at read time
it can use the checksum to determine which copy is correct so it can
return the correct data even if one copy gets corrupted.

[Objective]
The final objective is to allow device mapper to do the checksum
verification (and repair if possible).

If it were only for verification, it wouldn't be much different from the
current endio hook method used by most filesystems.
However, if we can move the repair part out of the filesystem (well, only
btrfs supports it so far), it would benefit all filesystems.

[What we have]
The nearest infrastructure I found in kernel is bio_integrity_payload.

However I found it's bound to the device, as it's designed to support the
SCSI/SATA integrity protocol.
For this use case it's more naturally bound to the filesystem, as the fs (or a
higher-layer dm device) is the source of the integrity data, and the device
(dm-raid) only does the verification and possible repair.

I'm not sure if this is a good idea to reuse or abuse
bio_integrity_payload for this purpose.

Should we use some new infrastructure or enhance existing
bio_integrity_payload?

(Or is this a valid idea or just another crazy dream?)

Thanks,
Qu


