Re: [dm-devel] Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?
On Thu, Nov 16, 2017 at 03:32:05PM -0700, Chris Murphy wrote:
> XFS by default does metadata csums. But ext4 doesn't use it for either
> metadata or the journal by default still, it is still optional. So for
> now it mainly benefits XFS.

Metadata checksums are enabled by default in the version of e2fsprogs
shipped by Debian. Since there were no real problems reported by Debian
users, in the next release of e2fsprogs, coming soon, it will be enabled
by default for all new ext4 file systems.

Regards,

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?
On Nov 15, 2017, at 7:18 PM, Qu Wenruo wrote:
>
> [Background]
> Recently I'm considering the possibility to use checksum from filesystem
> to enhance device-mapper raid.
>
> The idea behind it is quite simple, since most modern filesystems have
> checksum for their metadata, and even some (btrfs) have checksum for data.
>
> And for btrfs RAID1/10 (just ignore the RAID5/6 for now), at read time
> it can use the checksum to determine which copy is correct, so it can
> return the correct data even if one copy gets corrupted.
>
> [Objective]
> The final objective is to allow device mapper to do the checksum
> verification (and repair if possible).
>
> If only for verification, it's not much different from the current endio
> hook method used by most of the fs.
> However if we can move the repair part from the filesystem (well, only
> btrfs supports it yet), it would benefit all fs.

I recall Darrick was looking into a mechanism to do this. Rather than
changing the whole block layer to take a callback to do a checksum, what
we looked at was to allow the upper-layer read to specify a "retry
count" to the lower-layer block device. If the lower layer is able to
retry the read then it will read a different device (or combination of
devices for e.g. RAID-6) based on the retry count, until the upper layer
gets a good read (based on checksum, or whatever). If there are no more
devices (or combinations) to try then a final error is returned.

Darrick can probably point at the original thread/patch.

Cheers, Andreas
Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?
On 2017-11-16 20:30, Qu Wenruo wrote:
> On 2017年11月17日 00:47, Austin S. Hemmelgarn wrote:
>>> This is at least less complicated than dm-integrity.
>>>
>>> Just a new hook for READ bio. And it can start from easy part.
>>> Like starting from dm-raid1 and other fs support.
>> It's less complicated for end users (in theory, but cryptsetup devs are
>> working on that for dm-integrity), but significantly more complicated
>> for developers.
>>
>> It also brings up the question of what happens when you want some other
>> layer between the filesystem and the MD/DM RAID layer (say, running
>> bcache or dm-cache on top of the RAID array). In the case of
>> dm-integrity, that's not an issue because dm-integrity is entirely
>> self-contained; it doesn't depend on other layers beyond the standard
>> block interface.
> Each layer can choose to drop the support for extra verification.
> If the layer is not modifying the data, it can pass it to the lower
> layer, just as integrity payload does.

Which then makes things a bit more complicated in every other layer as
well, in turn making things more complicated for all developers.

>> As I mentioned in my other reply on this thread, running with
>> dm-integrity _below_ the RAID layer instead of on top of it will provide
>> the same net effect, and in fact provide a stronger guarantee than what
>> you are proposing (because dm-integrity does real cryptographic
>> integrity verification, as opposed to just checking for bit-rot).
> Although with more CPU usage for each device, even when they contain
> the same data.

I never said it wasn't higher resource usage.

>>>> If your checksum is calculated and checked at FS level there is no
>>>> added value when you spread this logic to other layers.
>>> That's why I'm moving the checking part to lower level, to make more
>>> value from the checksum.
>>>> dm-integrity adds basic 'check-summing' to any filesystem without the
>>>> need to modify fs itself
>>> Well, despite the fact that modern filesystems have already implemented
>>> their metadata csum.
>>>> - the paid price is - if there is bug between passing data from 'fs'
>>>> to dm-integrity' it cannot be captured.
>>>>
>>>> Advantage of having separated 'fs' and 'block' layer is in its
>>>> separation and simplicity at each level.
>>> Totally agreed on this.
>>>
>>> But the idea here should not bring that large impact (compared to big
>>> things like ZFS/Btrfs).
>>>
>>> 1) It only affect READ bio
>>> 2) Every dm target can choose if to support or pass down the hook.
>>>    No mean to support it for RAID0 for example.
>>>    And for complex raid like RAID5/6 no need to support it from the
>>>    very beginning.
>>> 3) Main part of the functionality is already implemented
>>>    The core complexity contains 2 parts:
>>>    a) checksum calculation and checking
>>>       Modern fs is already doing this, at least for metadata.
>>>    b) recovery
>>>       dm targets already have this implemented for supported raid
>>>       profile.
>>>    All these are already implemented, just moving them to different
>>>    timing is not bringing such big modification IIRC.
>>>> If you want integrated solution - you are simply looking for btrfs
>>>> where multiple layers are integrated together.
>>> If with such verification hook (along with something extra to handle
>>> scrub), btrfs chunk mapping can be re-implemented with device-mapper:
>>>
>>> In fact btrfs logical space is just a dm-linear device, and each chunk
>>> can be implemented by its corresponding dm-* module like:
>>>
>>> dm-linear: | btrfs chunk 1 | btrfs chunk 2 | ... | btrfs chunk n |
>>> and
>>> btrfs chunk 1: metadata, using dm-raid1 on diskA and diskB
>>> btrfs chunk 2: data, using dm-raid0 on disk A B C D
>>> ...
>>> btrfs chunk n: system, using dm-raid1 on disk A B
>>>
>>> At least btrfs can take the advantage of the simplicity of separate
>>> layers.
>>>
>>> And other filesystem can get a little higher chance to recover its
>>> metadata if built on dm-raid.
>> Again, just put dm-integrity below dm-raid. The other filesystems
>> primarily have metadata checksums to catch data corruption, not repair
>> it,
> Because they have no extra copy.
> If they have, they will definitely use the extra copy to repair.

But they don't have those extra copies now, so that really becomes
irrelevant as an argument (especially since it's not likely they will
add data or metadata replication in the filesystem any time in the near
future).

>> and I severely doubt that you will manage to convince developers to
>> add support in their filesystem (especially XFS) because:
>> 1. It's a layering violation (yes, I know BTRFS is too, but that's a
>> bit less of an issue because it's a completely self-contained layering
>> violation, while this isn't).
> If passing something along with bio is violating layers, then integrity
> payload is already doing this for a long time.

The block integrity layer is also interfacing directly with hardware and
_needs_ to pass that data down. Unless I'm mistaken, it also doesn't do
any verification except in the filesystem layer, and doesn't pass down
any complaints about the integrity of the data (it may try to re-read
it, but that's not the same as what
Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?
On Thu, Nov 16, 2017 at 6:54 PM, Chris Murphy wrote:
> The user doesn't have to setup dm-verity to get this.

Or dm-integrity, rather.

--
Chris Murphy
Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?
On Thu, Nov 16, 2017 at 6:22 PM, Qu Wenruo wrote:
> On 2017年11月17日 06:32, Chris Murphy wrote:
>> It's good the file system can stay alive, but data is the much
>> bigger target in terms of percent space on the physical media,
> It's also true.
> (Although working on btrfs sometimes makes me care more about safe
> metadata)

It seems like a good idea if it's lightweight enough, because we get
Btrfs-like metadata error detection and recovery from a copy, for free.
The user doesn't have to setup dm-verity to get this. Additionally, if
the work happens in the md driver, then both mdadm and LVM based arrays
get the feature (strictly speaking I think dm-raid is deprecated;
everything I'm aware of these days uses the md code, including Intel's
IMSM firmware based RAID).

The gotcha of course is that anytime there's a file system format
change, now this layer has to become aware of it and support all
versions of that file system's metadata for the purpose of error
detection. That might be a bitter pill to swallow in the long term.

--
Chris Murphy
Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?
On 2017年11月17日 00:47, Austin S. Hemmelgarn wrote:
>> This is at least less complicated than dm-integrity.
>>
>> Just a new hook for READ bio. And it can start from easy part.
>> Like starting from dm-raid1 and other fs support.
> It's less complicated for end users (in theory, but cryptsetup devs are
> working on that for dm-integrity), but significantly more complicated
> for developers.
>
> It also brings up the question of what happens when you want some other
> layer between the filesystem and the MD/DM RAID layer (say, running
> bcache or dm-cache on top of the RAID array). In the case of
> dm-integrity, that's not an issue because dm-integrity is entirely
> self-contained, it doesn't depend on other layers beyond the standard
> block interface.

Each layer can choose to drop the support for extra verification.
If the layer is not modifying the data, it can pass it to the lower
layer, just as integrity payload does.

> As I mentioned in my other reply on this thread, running with
> dm-integrity _below_ the RAID layer instead of on top of it will provide
> the same net effect, and in fact provide a stronger guarantee than what
> you are proposing (because dm-integrity does real cryptographic
> integrity verification, as opposed to just checking for bit-rot).

Although with more CPU usage for each device, even when they contain the
same data.

>>> If your checksum is calculated and checked at FS level there is no added
>>> value when you spread this logic to other layers.
>>
>> That's why I'm moving the checking part to lower level, to make more
>> value from the checksum.
>>
>>> dm-integrity adds basic 'check-summing' to any filesystem without the
>>> need to modify fs itself
>>
>> Well, despite the fact that modern filesystems have already implemented
>> their metadata csum.
>>
>>> - the paid price is - if there is bug between
>>> passing data from 'fs' to dm-integrity' it cannot be captured.
>>>
>>> Advantage of having separated 'fs' and 'block' layer is in its
>>> separation and simplicity at each level.
>>
>> Totally agreed on this.
>>
>> But the idea here should not bring that large impact (compared to big
>> things like ZFS/Btrfs).
>>
>> 1) It only affect READ bio
>> 2) Every dm target can choose if to support or pass down the hook.
>>    No mean to support it for RAID0 for example.
>>    And for complex raid like RAID5/6 no need to support it from the
>>    very beginning.
>> 3) Main part of the functionality is already implemented
>>    The core complexity contains 2 parts:
>>    a) checksum calculation and checking
>>       Modern fs is already doing this, at least for metadata.
>>    b) recovery
>>       dm targets already have this implemented for supported raid
>>       profile.
>>    All these are already implemented, just moving them to different
>>    timing is not bringing such big modification IIRC.
>>>
>>> If you want integrated solution - you are simply looking for btrfs where
>>> multiple layers are integrated together.
>>
>> If with such verification hook (along with something extra to handle
>> scrub), btrfs chunk mapping can be re-implemented with device-mapper:
>>
>> In fact btrfs logical space is just a dm-linear device, and each chunk
>> can be implemented by its corresponding dm-* module like:
>>
>> dm-linear: | btrfs chunk 1 | btrfs chunk 2 | ... | btrfs chunk n |
>> and
>> btrfs chunk 1: metadata, using dm-raid1 on diskA and diskB
>> btrfs chunk 2: data, using dm-raid0 on disk A B C D
>> ...
>> btrfs chunk n: system, using dm-raid1 on disk A B
>>
>> At least btrfs can take the advantage of the simplicity of separate
>> layers.
>>
>> And other filesystem can get a little higher chance to recover its
>> metadata if built on dm-raid.
> Again, just put dm-integrity below dm-raid. The other filesystems
> primarily have metadata checksums to catch data corruption, not repair
> it,

Because they have no extra copy.
If they have, they will definitely use the extra copy to repair.

> and I severely doubt that you will manage to convince developers to
> add support in their filesystem (especially XFS) because:
> 1. It's a layering violation (yes, I know BTRFS is too, but that's a bit
> less of an issue because it's a completely self-contained layering
> violation, while this isn't).

If passing something along with bio is violating layers, then integrity
payload is already doing this for a long time.

> 2. There's no precedent in hardware (I challenge you to find a block
> device that lets you respond to a read completing with 'Hey, this data
> is bogus, give me the real data!').
> 3. You can get the same net effect with a higher guarantee of security
> using dm-integrity.

With more CPU and IO overhead (journal mode will write data twice, once
for the journal and once for the real data).

Thanks,
Qu

>>
>> Thanks,
>> Qu
>>
>>> You are also possibly missing feature of dm-integrity - it's not just
>>> giving you 'checksum' - it also makes you sure - device has
Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?
On 2017年11月17日 06:32, Chris Murphy wrote:
> On Thu, Nov 16, 2017 at 3:04 AM, Qu Wenruo wrote:
>
>> For example, if we use the following device mapper layout:
>>
>> FS (can be any fs with metadata csum)
>>        |
>>   dm-integrity
>>        |
>>    dm-raid1
>>     /     \
>> disk1    disk2
>
> You would instead do dm-integrity per physical device, then make the
> two dm-integrity devices members of an md raid1 array. Now when
> integrity fails, basically it's a UNC error to raid1, which then gets
> the copy from the other device.

Yep, dm-integrity under raid1 makes much more sense here.
Although it doubles the CPU usage for each device added in.

> But what you're getting at, that dm-integrity is more complicated, is
> true, in that it's at least partly COW based in order to get the
> atomic write guarantee needed to ensure data blocks and csums are
> always in sync, and reliable. But this also applies to the entire file
> system. The READ bio concept you're proposing leverages pretty much
> already existing code, has no write performance penalty or complexity
> at all, but does miss data for file systems that don't csum data
> blocks.

That's true, since currently only Btrfs supports data csum.
And to make a filesystem support data csum, it needs CoW support, while
only XFS and Btrfs support CoW yet.

> It's good the file system can stay alive, but data is the much
> bigger target in terms of percent space on the physical media,

It's also true.
(Although working on btrfs sometimes makes me care more about safe
metadata)

Thanks,
Qu

> and
> more likely to be corrupt or go missing due to media defect or
> whatever. It's still possible for silent data corruption to happen.
>
>> I just want to make device-mapper raid able to handle such case too.
>> Especially when most fs supports checksum for their metadata.
>
> XFS by default does metadata csums. But ext4 doesn't use it for either
> metadata or the journal by default still, it is still optional. So for
> now it mainly benefits XFS.
Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?
On Thu, Nov 16, 2017 at 3:04 AM, Qu Wenruo wrote:
> For example, if we use the following device mapper layout:
>
> FS (can be any fs with metadata csum)
>        |
>   dm-integrity
>        |
>    dm-raid1
>     /     \
> disk1    disk2

You would instead do dm-integrity per physical device, then make the two
dm-integrity devices members of an md raid1 array. Now when integrity
fails, basically it's a UNC error to raid1, which then gets the copy
from the other device.

But what you're getting at, that dm-integrity is more complicated, is
true, in that it's at least partly COW based in order to get the atomic
write guarantee needed to ensure data blocks and csums are always in
sync, and reliable. But this also applies to the entire file system. The
READ bio concept you're proposing leverages pretty much already existing
code, has no write performance penalty or complexity at all, but does
miss data for file systems that don't csum data blocks. It's good the
file system can stay alive, but data is the much bigger target in terms
of percent space on the physical media, and more likely to be corrupt or
go missing due to media defect or whatever. It's still possible for
silent data corruption to happen.

> I just want to make device-mapper raid able to handle such case too.
> Especially when most fs supports checksum for their metadata.

XFS by default does metadata csums. But ext4 doesn't use it for either
metadata or the journal by default still; it is still optional. So for
now it mainly benefits XFS.

--
Chris Murphy
Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?
On Thu, Nov 16, 2017 at 11:47:45AM -0500, Austin S. Hemmelgarn wrote:
>> At least btrfs can take the advantage of the simplicity of separate
>> layers.
>>
>> And other filesystem can get a little higher chance to recover its
>> metadata if built on dm-raid.
> Again, just put dm-integrity below dm-raid. The other filesystems
> primarily have metadata checksums to catch data corruption, not repair
> it, and I severely doubt that you will manage to convince developers to
> add support in their filesystem (especially XFS) because:
> 1. It's a layering violation (yes, I know BTRFS is too, but that's a bit
> less of an issue because it's a completely self-contained layering
> violation, while this isn't).
> 2. There's no precedent in hardware (I challenge you to find a block
> device that lets you respond to a read completing with 'Hey, this data
> is bogus, give me the real data!').

Isn't this what T10 DIF/DIX (Data Integrity Fields / Data Integrity
Extensions) allows: using checksums all the way from userspace
applications to the disks in the storage backend, with checksum
verification at all points in between?

Does require compatible hardware/firmware/kernel/drivers/apps though, so
not really a generic solution.

-- Pasi

> 3. You can get the same net effect with a higher guarantee of security
> using dm-integrity.
>
>> Thanks,
>> Qu
>
>>> You are also possibly missing feature of dm-integrity - it's not just
>>> giving you 'checksum' - it also makes you sure - device has proper
>>> content - you can't just 'replace block' even with proper checksum for a
>>> block somewhere in the middle of you device... and when joined with
>>> crypto - it makes it way more secure...
>>>
>>> Regards
>>>
>>> Zdenek
Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?
On 2017-11-16 09:06, Qu Wenruo wrote:
> On 2017年11月16日 20:33, Zdenek Kabelac wrote:
>> Dne 16.11.2017 v 11:04 Qu Wenruo napsal(a):
>>> On 2017年11月16日 17:43, Zdenek Kabelac wrote:
>>>> Dne 16.11.2017 v 09:08 Qu Wenruo napsal(a):
>>>>> [What we have]
>>>>> The nearest infrastructure I found in kernel is
>>>>> bio_integrity_payload.
>>>> Hi
>>>>
>>>> We already have dm-integrity target upstream.
>>>> What's missing in this target ?
>>> If I didn't miss anything, the dm-integrity is designed to calculate
>>> and restore csum into its space to verify the integrity.
>>> The csum happens when bio reaches dm-integrity.
>>>
>>> However what I want is, fs generate bio with attached verification
>>> hook, and pass to lower layers to verify it.
>>>
>>> For example, if we use the following device mapper layout:
>>>
>>> FS (can be any fs with metadata csum)
>>>        |
>>>   dm-integrity
>>>        |
>>>    dm-raid1
>>>     /     \
>>> disk1    disk2
>>>
>>> If some data in disk1 get corrupted (the disk itself is still good),
>>> and when dm-raid1 tries to read the corrupted data, it may return the
>>> corrupted one, which is then caught by dm-integrity, finally
>>> returning -EIO to FS.
>>>
>>> But the truth is, we could at least try to read out data in disk2 if
>>> we know the csum for it, and use the checksum to verify if it's the
>>> correct data.
>>>
>>> So my idea will be:
>>>
>>> FS (with metadata csum, or even data csum support)
>>>    | READ bio for metadata
>>>    | -With metadata verification hook
>>>    dm-raid1
>>>     /     \
>>> disk1    disk2
>>>
>>> dm-raid1 handles the bio, reading out data from disk1.
>>> But the result can't pass verification hook.
>>> Then retry with disk2.
>>>
>>> If result from disk2 passes verification hook, that's good, returning
>>> the result from disk2 to upper layer (fs).
>>> And we can even submit WRITE bio to try to write the good result back
>>> to disk1.
>>>
>>> If result from disk2 doesn't pass verification hook, then we return
>>> -EIO to upper layer.
>>>
>>> That's what btrfs has already done for DUP/RAID1/10 (although RAID5/6
>>> will also try to rebuild data, but it still has some problem).
>>>
>>> I just want to make device-mapper raid able to handle such case too.
>>> Especially when most fs supports checksum for their metadata.
>> Hi
>>
>> IMHO you are looking for too complicated solution.
> This is at least less complicated than dm-integrity.
>
> Just a new hook for READ bio. And it can start from easy part.
> Like starting from dm-raid1 and other fs support.

It's less complicated for end users (in theory, but cryptsetup devs are
working on that for dm-integrity), but significantly more complicated
for developers.

It also brings up the question of what happens when you want some other
layer between the filesystem and the MD/DM RAID layer (say, running
bcache or dm-cache on top of the RAID array). In the case of
dm-integrity, that's not an issue because dm-integrity is entirely
self-contained; it doesn't depend on other layers beyond the standard
block interface.

As I mentioned in my other reply on this thread, running with
dm-integrity _below_ the RAID layer instead of on top of it will provide
the same net effect, and in fact provide a stronger guarantee than what
you are proposing (because dm-integrity does real cryptographic
integrity verification, as opposed to just checking for bit-rot).

>> If your checksum is calculated and checked at FS level there is no
>> added value when you spread this logic to other layers.
> That's why I'm moving the checking part to lower level, to make more
> value from the checksum.
>> dm-integrity adds basic 'check-summing' to any filesystem without the
>> need to modify fs itself
> Well, despite the fact that modern filesystems have already implemented
> their metadata csum.
>> - the paid price is - if there is bug between passing data from 'fs'
>> to dm-integrity' it cannot be captured.
>>
>> Advantage of having separated 'fs' and 'block' layer is in its
>> separation and simplicity at each level.
> Totally agreed on this.
>
> But the idea here should not bring that large impact (compared to big
> things like ZFS/Btrfs).
>
> 1) It only affect READ bio
> 2) Every dm target can choose if to support or pass down the hook.
>    No mean to support it for RAID0 for example.
>    And for complex raid like RAID5/6 no need to support it from the
>    very beginning.
> 3) Main part of the functionality is already implemented
>    The core complexity contains 2 parts:
>    a) checksum calculation and checking
>       Modern fs is already doing this, at least for metadata.
>    b) recovery
>       dm targets already have this implemented for supported raid
>       profile.
>    All these are already implemented, just moving them to different
>    timing is not bringing such big modification IIRC.
>> If you want integrated solution - you are simply looking for btrfs
>> where multiple layers are integrated together.
> If with such verification hook (along with something extra to handle
> scrub), btrfs chunk mapping can be re-implemented with device-mapper:
> In
Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?
On 2017年11月16日 20:33, Zdenek Kabelac wrote:
> Dne 16.11.2017 v 11:04 Qu Wenruo napsal(a):
>>
>> On 2017年11月16日 17:43, Zdenek Kabelac wrote:
>>> Dne 16.11.2017 v 09:08 Qu Wenruo napsal(a):
>>>> [What we have]
>>>> The nearest infrastructure I found in kernel is
>>>> bio_integrity_payload.
>>>
>>> Hi
>>>
>>> We already have dm-integrity target upstream.
>>> What's missing in this target ?
>>
>> If I didn't miss anything, the dm-integrity is designed to calculate
>> and restore csum into its space to verify the integrity.
>> The csum happens when bio reaches dm-integrity.
>>
>> However what I want is, fs generate bio with attached verification
>> hook, and pass to lower layers to verify it.
>>
>> For example, if we use the following device mapper layout:
>>
>> FS (can be any fs with metadata csum)
>>        |
>>   dm-integrity
>>        |
>>    dm-raid1
>>     /     \
>> disk1    disk2
>>
>> If some data in disk1 get corrupted (the disk itself is still good),
>> and when dm-raid1 tries to read the corrupted data, it may return the
>> corrupted one, which is then caught by dm-integrity, finally returning
>> -EIO to FS.
>>
>> But the truth is, we could at least try to read out data in disk2 if
>> we know the csum for it, and use the checksum to verify if it's the
>> correct data.
>>
>> So my idea will be:
>>
>> FS (with metadata csum, or even data csum support)
>>    | READ bio for metadata
>>    | -With metadata verification hook
>>    dm-raid1
>>     /     \
>> disk1    disk2
>>
>> dm-raid1 handles the bio, reading out data from disk1.
>> But the result can't pass verification hook.
>> Then retry with disk2.
>>
>> If result from disk2 passes verification hook, that's good, returning
>> the result from disk2 to upper layer (fs).
>> And we can even submit WRITE bio to try to write the good result back
>> to disk1.
>>
>> If result from disk2 doesn't pass verification hook, then we return
>> -EIO to upper layer.
>>
>> That's what btrfs has already done for DUP/RAID1/10 (although RAID5/6
>> will also try to rebuild data, but it still has some problem).
>>
>> I just want to make device-mapper raid able to handle such case too.
>> Especially when most fs supports checksum for their metadata.
>
> Hi
>
> IMHO you are looking for too complicated solution.

This is at least less complicated than dm-integrity.

Just a new hook for READ bio. And it can start from easy part.
Like starting from dm-raid1 and other fs support.

> If your checksum is calculated and checked at FS level there is no added
> value when you spread this logic to other layers.

That's why I'm moving the checking part to lower level, to make more
value from the checksum.

> dm-integrity adds basic 'check-summing' to any filesystem without the
> need to modify fs itself

Well, despite the fact that modern filesystems have already implemented
their metadata csum.

> - the paid price is - if there is bug between
> passing data from 'fs' to dm-integrity' it cannot be captured.
>
> Advantage of having separated 'fs' and 'block' layer is in its
> separation and simplicity at each level.

Totally agreed on this.

But the idea here should not bring that large impact (compared to big
things like ZFS/Btrfs).

1) It only affect READ bio
2) Every dm target can choose if to support or pass down the hook.
   No mean to support it for RAID0 for example.
   And for complex raid like RAID5/6 no need to support it from the very
   beginning.
3) Main part of the functionality is already implemented
   The core complexity contains 2 parts:
   a) checksum calculation and checking
      Modern fs is already doing this, at least for metadata.
   b) recovery
      dm targets already have this implemented for supported raid
      profile.
   All these are already implemented, just moving them to different
   timing is not bringing such big modification IIRC.

> If you want integrated solution - you are simply looking for btrfs where
> multiple layers are integrated together.

If with such verification hook (along with something extra to handle
scrub), btrfs chunk mapping can be re-implemented with device-mapper:

In fact btrfs logical space is just a dm-linear device, and each chunk
can be implemented by its corresponding dm-* module like:

dm-linear: | btrfs chunk 1 | btrfs chunk 2 | ... | btrfs chunk n |
and
btrfs chunk 1: metadata, using dm-raid1 on diskA and diskB
btrfs chunk 2: data, using dm-raid0 on disk A B C D
...
btrfs chunk n: system, using dm-raid1 on disk A B

At least btrfs can take the advantage of the simplicity of separate
layers.

And other filesystem can get a little higher chance to recover its
metadata if built on dm-raid.

Thanks,
Qu

> You are also possibly missing feature of dm-integrity - it's not just
> giving you 'checksum' - it also makes you sure - device has
Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?
On 2017-11-16 07:33, Zdenek Kabelac wrote:
> Dne 16.11.2017 v 11:04 Qu Wenruo napsal(a):
>> On 2017年11月16日 17:43, Zdenek Kabelac wrote:
>>> Dne 16.11.2017 v 09:08 Qu Wenruo napsal(a):
>>>> [What we have]
>>>> The nearest infrastructure I found in kernel is
>>>> bio_integrity_payload.
>>> Hi
>>>
>>> We already have dm-integrity target upstream.
>>> What's missing in this target ?
>> If I didn't miss anything, the dm-integrity is designed to calculate
>> and restore csum into its space to verify the integrity.
>> The csum happens when bio reaches dm-integrity.
>>
>> However what I want is, fs generate bio with attached verification
>> hook, and pass to lower layers to verify it.
>>
>> For example, if we use the following device mapper layout:
>>
>> FS (can be any fs with metadata csum)
>>        |
>>   dm-integrity
>>        |
>>    dm-raid1
>>     /     \
>> disk1    disk2
>>
>> If some data in disk1 get corrupted (the disk itself is still good),
>> and when dm-raid1 tries to read the corrupted data, it may return the
>> corrupted one, which is then caught by dm-integrity, finally returning
>> -EIO to FS.
>>
>> But the truth is, we could at least try to read out data in disk2 if
>> we know the csum for it, and use the checksum to verify if it's the
>> correct data.
>>
>> So my idea will be:
>>
>> FS (with metadata csum, or even data csum support)
>>    | READ bio for metadata
>>    | -With metadata verification hook
>>    dm-raid1
>>     /     \
>> disk1    disk2
>>
>> dm-raid1 handles the bio, reading out data from disk1.
>> But the result can't pass verification hook.
>> Then retry with disk2.
>>
>> If result from disk2 passes verification hook, that's good, returning
>> the result from disk2 to upper layer (fs).
>> And we can even submit WRITE bio to try to write the good result back
>> to disk1.
>>
>> If result from disk2 doesn't pass verification hook, then we return
>> -EIO to upper layer.
>>
>> That's what btrfs has already done for DUP/RAID1/10 (although RAID5/6
>> will also try to rebuild data, but it still has some problem).
>>
>> I just want to make device-mapper raid able to handle such case too.
>> Especially when most fs supports checksum for their metadata.
> Hi
>
> IMHO you are looking for too complicated solution.
>
> If your checksum is calculated and checked at FS level there is no added
> value when you spread this logic to other layers.
>
> dm-integrity adds basic 'check-summing' to any filesystem without the
> need to modify fs itself - the paid price is - if there is bug between
> passing data from 'fs' to dm-integrity' it cannot be captured.

But that is true of pretty much any layering, not just dm-integrity.
There's just a slightly larger window for corruption with dm-integrity.

> Advantage of having separated 'fs' and 'block' layer is in its
> separation and simplicity at each level.
>
> If you want integrated solution - you are simply looking for btrfs where
> multiple layers are integrated together.
>
> You are also possibly missing feature of dm-integrity - it's not just
> giving you 'checksum' - it also makes you sure - device has proper
> content - you can't just 'replace block' even with proper checksum for a
> block somewhere in the middle of you device... and when joined with
> crypto - it makes it way more secure...

And to expand a bit further, the correct way to integrate dm-integrity
into the stack when RAID is involved is to put it _below_ the RAID
layer, so each underlying device is its own dm-integrity target.
Assuming I understand the way dm-raid and md handle -EIO, that should
get you a similar level of protection to BTRFS (worse in some ways,
better in others).
Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?
Dne 16.11.2017 v 11:04 Qu Wenruo napsal(a):
> On 2017年11月16日 17:43, Zdenek Kabelac wrote:
>> Dne 16.11.2017 v 09:08 Qu Wenruo napsal(a):
>>> [What we have]
>>> The nearest infrastructure I found in kernel is bio_integrity_payload.
>>
>> Hi
>>
>> We already have dm-integrity target upstream.
>> What's missing in this target ?
>
> If I didn't miss anything, dm-integrity is designed to calculate and
> store the csum in its own space to verify integrity.
> The csum happens when the bio reaches dm-integrity.
>
> However, what I want is for the fs to generate a bio with an attached
> verification hook and pass it to lower layers to verify.
>
> For example, if we use the following device mapper layout:
>
>         FS (can be any fs with metadata csum)
>               |
>         dm-integrity
>               |
>           dm-raid1
>          /        \
>      disk1        disk2
>
> If some data on disk1 gets corrupted (the disk itself is still good),
> then when dm-raid1 tries to read the corrupted data, it may return the
> corrupted copy, which is then caught by dm-integrity, finally returning
> -EIO to the FS.
>
> But the truth is, we could at least try to read the data from disk2 if
> we know the csum for it, and use that checksum to verify whether it's
> the correct data.
>
> So my idea would be:
>
>         FS (with metadata csum, or even data csum support)
>               |  READ bio for metadata
>               |  - with metadata verification hook
>           dm-raid1
>          /        \
>      disk1        disk2
>
> dm-raid1 handles the bio, reading out data from disk1. But the result
> can't pass the verification hook, so it retries with disk2.
>
> If the result from disk2 passes the verification hook, good: return the
> result from disk2 to the upper layer (fs). We can even submit a WRITE
> bio to try to write the good result back to disk1.
>
> If the result from disk2 doesn't pass the verification hook either, we
> return -EIO to the upper layer.
>
> That's what btrfs already does for DUP/RAID1/10 (RAID5/6 will also try
> to rebuild data, but it still has some problems).
>
> I just want to make device-mapper raid able to handle such cases too,
> especially since most filesystems support checksums for their metadata.

Hi

IMHO you are looking for too complicated a solution.

If your checksum is calculated and checked at the FS level, there is no
added value in spreading this logic to other layers.

dm-integrity adds basic 'check-summing' to any filesystem without the
need to modify the fs itself - the price paid is that if there is a bug
between passing data from the 'fs' to 'dm-integrity', it cannot be
caught.

The advantage of having separated 'fs' and 'block' layers is in that
separation and the simplicity at each level.

If you want an integrated solution - you are simply looking for btrfs,
where multiple layers are integrated together.

You are also possibly missing a feature of dm-integrity - it doesn't
just give you a 'checksum' - it also assures you the device has proper
content - you can't just 'replace a block', even with a proper checksum
for that block, somewhere in the middle of your device... and when
joined with crypto it becomes far more secure...

Regards

Zdenek
Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?
On 2017年11月16日 17:43, Zdenek Kabelac wrote:
> Dne 16.11.2017 v 09:08 Qu Wenruo napsal(a):
>> [What we have]
>> The nearest infrastructure I found in kernel is bio_integrity_payload.
>
> Hi
>
> We already have dm-integrity target upstream.
> What's missing in this target ?

If I didn't miss anything, dm-integrity is designed to calculate and
store the csum in its own space to verify integrity.
The csum happens when the bio reaches dm-integrity.

However, what I want is for the fs to generate a bio with an attached
verification hook and pass it to lower layers to verify.

For example, if we use the following device mapper layout:

        FS (can be any fs with metadata csum)
              |
        dm-integrity
              |
          dm-raid1
         /        \
     disk1        disk2

If some data on disk1 gets corrupted (the disk itself is still good),
then when dm-raid1 tries to read the corrupted data, it may return the
corrupted copy, which is then caught by dm-integrity, finally returning
-EIO to the FS.

But the truth is, we could at least try to read the data from disk2 if
we know the csum for it, and use that checksum to verify whether it's
the correct data.

So my idea would be:

        FS (with metadata csum, or even data csum support)
              |  READ bio for metadata
              |  - with metadata verification hook
          dm-raid1
         /        \
     disk1        disk2

dm-raid1 handles the bio, reading out data from disk1. But the result
can't pass the verification hook, so it retries with disk2.

If the result from disk2 passes the verification hook, good: return the
result from disk2 to the upper layer (fs). We can even submit a WRITE
bio to try to write the good result back to disk1.

If the result from disk2 doesn't pass the verification hook either, we
return -EIO to the upper layer.

That's what btrfs already does for DUP/RAID1/10 (RAID5/6 will also try
to rebuild data, but it still has some problems).

I just want to make device-mapper raid able to handle such cases too,
especially since most filesystems support checksums for their metadata.

Thanks,
Qu
Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?
Dne 16.11.2017 v 09:08 Qu Wenruo napsal(a):
> [What we have]
> The nearest infrastructure I found in kernel is bio_integrity_payload.

Hi

We already have dm-integrity target upstream.
What's missing in this target ?

Regards

Zdenek
Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?
On 2017年11月16日 15:42, Nikolay Borisov wrote:
> On 16.11.2017 09:38, Qu Wenruo wrote:
>> On 2017年11月16日 14:54, Nikolay Borisov wrote:
>>> On 16.11.2017 04:18, Qu Wenruo wrote:
>>>> [...]
>>>
>>> This sounds good in principle, however I think there is one crucial
>>> point which needs to be considered:
>>>
>>> All fs with checksums store those checksums in some specific way, then
>>> when they fetch data from disk they also know how to acquire the
>>> respective checksum.
>>
>> Just like the integrity payload, we generate a READ bio with an
>> attached checksum hook function and checksum data.
>
> So how is this checksum data acquired in the first place?

In the btrfs case, through a metadata read bio, since btrfs puts data
csums into its csum tree, as metadata.
Pass a READ bio with a metadata-specific verification function and empty
verification data.

>> So for a data read, we read the checksum first and attach it to the
>> data READ bio, then submit it.
>>
>> And for a metadata read, in most cases the checksum is integrated into
>> the metadata header, like what we do in btrfs.
>>
>> In that case we attach empty checksum data to the bio, but use a
>> metadata-specific function hook to handle it.
>>
>>> What you suggest might be doable but it will require lower layers (dm)
>>> to be aware of how to acquire the specific checksum for some data.
>>
>> In the above case, dm only needs to call the verification hook function.
>> If verification passes, that's good.
>> If not, try another copy if we have one.
>>
>> In this case, I don't think the dm layer needs any extra interface to
>> communicate with the higher layer.
>
> Well, that verification function is the interface I meant; you are
> communicating the checksum out of band essentially (notwithstanding the
> metadata case, since you said the checksum is in the actual metadata
> header).
>
> In the end - which problem are you trying to solve? Allowing for a
> generic checksumming layer which filesystems may use if they decide to?

To make it clear: to allow the device mapper layer to make use of the
filesystem's checksums (if it has them) when there are multiple copies.

One problem of current dm raid1/10 (and possibly raid5/6) is that they
don't have the ability to know which copy is correct. They can only
handle a device disappearing.

Btrfs handles it by verifying data/metadata checksums. While xfs/ext4
also have checksums for their metadata, why not allow device mapper to
use such checksums to get the correct copy?

The mechanism is *NOT* a generic checksum layer.
How the csum is stored is determined by the fs; we just allow the device
mapper layer to be aware of this and make a clever decision.

Moreover, this only affects READ bios; WRITE bios are not affected at
all. Csum calculation and storage are all handled by the filesystem.
The device mapper layer doesn't need to get involved in that case.

And of course, btrfs can reuse this facility to do something bigger, but
that's another story.

Thanks,
Qu

>>> I don't think at this point there is such infra and frankly I cannot
>>> even envision how it will work elegantly. Sure you can create a
>>> dm-checksum target (which I believe dm-verity is very similar to)
>>> that stores checksums alongside data, but at this point the fs is
>>> really out of the picture.
Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?
On 16.11.2017 09:38, Qu Wenruo wrote:
> On 2017年11月16日 14:54, Nikolay Borisov wrote:
>> On 16.11.2017 04:18, Qu Wenruo wrote:
>>> [...]
>>
>> This sounds good in principle, however I think there is one crucial
>> point which needs to be considered:
>>
>> All fs with checksums store those checksums in some specific way, then
>> when they fetch data from disk they also know how to acquire the
>> respective checksum.
>
> Just like the integrity payload, we generate a READ bio with an attached
> checksum hook function and checksum data.

So how is this checksum data acquired in the first place?

> So for a data read, we read the checksum first and attach it to the data
> READ bio, then submit it.
>
> And for a metadata read, in most cases the checksum is integrated into
> the metadata header, like what we do in btrfs.
>
> In that case we attach empty checksum data to the bio, but use a
> metadata-specific function hook to handle it.
>
>> What you suggest might be doable but it will require lower layers (dm)
>> to be aware of how to acquire the specific checksum for some data.
>
> In the above case, dm only needs to call the verification hook function.
> If verification passes, that's good.
> If not, try another copy if we have one.
>
> In this case, I don't think the dm layer needs any extra interface to
> communicate with the higher layer.

Well, that verification function is the interface I meant; you are
communicating the checksum out of band essentially (notwithstanding the
metadata case, since you said the checksum is in the actual metadata
header).

In the end - which problem are you trying to solve? Allowing for a
generic checksumming layer which filesystems may use if they decide to?

>> I don't think at this point there is such infra and frankly I cannot
>> even envision how it will work elegantly. Sure you can create a
>> dm-checksum target (which I believe dm-verity is very similar to) that
>> stores checksums alongside data, but at this point the fs is really out
>> of the picture.
Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?
On 2017年11月16日 14:54, Nikolay Borisov wrote:
> On 16.11.2017 04:18, Qu Wenruo wrote:
>> [...]
>
> This sounds good in principle, however I think there is one crucial
> point which needs to be considered:
>
> All fs with checksums store those checksums in some specific way, then
> when they fetch data from disk they also know how to acquire the
> respective checksum.

Just like the integrity payload, we generate a READ bio with an attached
checksum hook function and checksum data.

So for a data read, we read the checksum first and attach it to the data
READ bio, then submit it.

And for a metadata read, in most cases the checksum is integrated into
the metadata header, like what we do in btrfs.

In that case we attach empty checksum data to the bio, but use a
metadata-specific function hook to handle it.

> What you suggest might be doable but it will require lower layers (dm)
> to be aware of how to acquire the specific checksum for some data.

In the above case, dm only needs to call the verification hook function.
If verification passes, that's good.
If not, try another copy if we have one.

In this case, I don't think the dm layer needs any extra interface to
communicate with the higher layer.

Thanks,
Qu

> I don't think at this point there is such infra and frankly I cannot
> even envision how it will work elegantly. Sure you can create a
> dm-checksum target (which I believe dm-verity is very similar to) that
> stores checksums alongside data, but at this point the fs is really out
> of the picture.
Re: Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?
On 16.11.2017 04:18, Qu Wenruo wrote:
> Hi all,
>
> [...]
>
> Should we use some new infrastructure or enhance the existing
> bio_integrity_payload?
>
> (Or is this a valid idea or just another crazy dream?)

This sounds good in principle, however I think there is one crucial
point which needs to be considered:

All fs with checksums store those checksums in some specific way, then
when they fetch data from disk they also know how to acquire the
respective checksum. What you suggest might be doable but it will
require lower layers (dm) to be aware of how to acquire the specific
checksum for some data. I don't think at this point there is such infra
and frankly I cannot even envision how it will work elegantly. Sure you
can create a dm-checksum target (which I believe dm-verity is very
similar to) that stores checksums alongside data, but at this point the
fs is really out of the picture.
Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?
Hi all,

[Background]
Recently I've been considering the possibility of using checksums from
the filesystem to enhance device-mapper raid.

The idea behind it is quite simple: most modern filesystems have
checksums for their metadata, and some (btrfs) even have checksums for
data.

And for btrfs RAID1/10 (just ignore RAID5/6 for now), at read time it
can use the checksum to determine which copy is correct, so it can
return the correct data even if one copy gets corrupted.

[Objective]
The final objective is to allow device mapper to do the checksum
verification (and repair if possible).

If only for verification, it's not much different from the current endio
hook method used by most filesystems.
However, if we can move the repair part out of the filesystem (well,
only btrfs supports it yet), it would benefit all filesystems.

[What we have]
The nearest infrastructure I found in the kernel is
bio_integrity_payload.

However, I found it's bound to the device, as it's designed to support
the SCSI/SATA integrity protocol.
For our use case, it's more bound to the filesystem, as the fs (or a
higher-layer dm device) is the source of the integrity data, and the
device (dm-raid) only does the verification and possible repair.

I'm not sure whether it is a good idea to reuse (or abuse)
bio_integrity_payload for this purpose.

Should we use some new infrastructure or enhance the existing
bio_integrity_payload?

(Or is this a valid idea or just another crazy dream?)

Thanks,
Qu