Re: Status of free-space-tree feature
On 2016/09/22 17:26, Omar Sandoval wrote: > As far as I know, space_cache=v2 isn't documented in the man page, so > hopefully anyone who managed to find out about it follows the mailing > list. If someone updates their kernel without updating btrfs-progs, the > new compat_ro bit will prevent the old version from doing any damage. Actually it's documented in `man 5 btrfs`. //wbr ojab -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Status of free-space-tree feature
On Thu, Sep 22, 2016 at 05:47:59PM +, ojab wrote: > On 2016/09/22 17:26, Omar Sandoval wrote: > > As far as I know, space_cache=v2 isn't documented in the man page, so > > hopefully anyone who managed to find out about it follows the mailing > > list. If someone updates their kernel without updating btrfs-progs, the > > new compat_ro bit will prevent the old version from doing any damage. > > Actually it's documented in `man 5 btrfs`. > > //wbr ojab Ah, thanks. I'll be sure to update that page when I send the btrfs-progs patch. -- Omar -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Status of free-space-tree feature
On Thu, Sep 22, 2016 at 12:10:09PM +0200, Hans van Kranenburg wrote: > Hi, > > On 09/22/2016 10:52 AM, David Sterba wrote: > > On Wed, Sep 21, 2016 at 01:31:52PM -0700, Omar Sandoval wrote: > I'm not sure I understand - can you explain why this is was so wrong? > Or Omar maybe? > > If btrfsck wants to correct something (write), it can simply always > and unconditionally invalidate the fst instead of trying to "repair" > it..and I think that's what happens at the moment (at least I think > it did for me recently). That seems like a safe and simple way. > >>> I know this is what it does with the regular FSC, but I'm not sure if it > >>> does so with the FST. If it doesn't, it probably should though. > >> > >> It doesn't. The free space cache is easy to invalidate because we can > >> just compare the generation number of the superblock to that of the > >> space cache, but as it exists now, the free space tree doesn't have > >> anything equivalent. That means that any modifications that btrfs-progs > >> made to a space_cache=v2 filesystem could potentially have caused > >> filesystem corruption :/ > > Can you elaborate on what kind of corruption (characteristics / impact) > is meant here? Is it e.g. corruption as in a. filesystem becomes > unusable beyond repair b. kernel crashes when it encounters some data c. > data loss because disk space gets used twice, d. etc? Pretty much option c, yeah. If we allocate some space but don't update the free space tree, when the kernel mounts, it will think it's still free and allocate over already allocated data or metadata. Depending on what exactly it overwrites, that could lead to option a. Option b _might_ also happen if the free space tree code gets confused when trying to update the free space tree with nonsense from the now mismatched extent tree. > I suspect by not being able to update the free space info, but at the > same time doing writes, it results in free space being tracked which is > not actually free any more. I might assume (but haven't been reading / > searching through source code yet this morning) that a sane way of > dealing with a free space cache in kernel would be to always first > validate any claims about free space before actually trying to blindly > putting something in the advertised free range. The only way to verify the free space tree is to check it against the extent tree, which nullifies all of the benefits of having the free space tree in the first place. The free space tree isn't really a cache. It's designed to be the source of truth which can't get out of sync with the extent tree (barring kernel and progs bugs, of course). > >> However, I talked this through with Chris, and he came up with a great > >> idea that will help us deal with both this issue and the endianness > >> issue [1] in one fell swoop. Basically, my objection to adding a compat > >> bit for the endianness bug was that it would unnecessarily affect the > >> vast majority of users on x86; forcing those users to recreate their > >> free space tree seemed silly. However, because of the btrfs-progs bug, > >> just to be safe, we might as well force everyone to recreate their free > >> space tree. > >> > >> The solution is to add a FREE_SPACE_TREE_VALID compat_ro bit. If the bit > >> isn't set, then we destroy and rebuild the free space tree. This covers > >> the case of big-endian kernels which created broken free space trees and > >> filesystems which could have possibly gone through btrfs-progs. > > Just some humble feedback... what does "FREE_SPACE_TREE_VALID" mean in > two years from now, when another bug is found that might require the > same sledgehammer approach to fix it? Will it require adding > FREE_SPACE_TREE_NOW_REALLY_VALID? If it happens, we can probably just stick on a FREE_SPACE_TREE_VALID2 or something, and check both bits. Hopefully it doesn't come to that. > Also, the same bit is being used for two different purposes... > 1. "pfew, we're hopefully post the btrfs-progs issue, and the user will > hopefully not use an old progs again that doesn't clear me when writing, > otherwise there's still no way to detect that and we'll falsely assume > we're ok, b" (once) > 2. btrfs-progs without r/w fst but with fst invalidate support touched > the fs, (any time in the future) Clever, isn't it? ;) Yeah, it's not the prettiest thing, but since we have to add the valid bit anyways, we might as well add support to progs. > btrfs progs >= 4.5 && < v4.7.3 are already "out in the wild". So I think > there should be a warning for users which want to try out FST. The > general recommendation to use btrfs-progs with a version number that is > at least as high as the kernel version (>=4.5 for FST) doesn't help any > more in this case. > > Added it: > https://btrfs.wiki.kernel.org/index.php/Status#Free_space_tree Thanks! > Still requires users to find and read this info instead of looking at > kernel release notes and man
Re: Status of free-space-tree feature
On Thu, Sep 22, 2016 at 10:52:05AM +0200, David Sterba wrote: > On Wed, Sep 21, 2016 at 01:31:52PM -0700, Omar Sandoval wrote: > > > > I'm not sure I understand - can you explain why this is was so wrong? > > > > Or Omar maybe? > > > > > > > > If btrfsck wants to correct something (write), it can simply always > > > > and unconditionally invalidate the fst instead of trying to "repair" > > > > it..and I think that's what happens at the moment (at least I think > > > > it did for me recently). That seems like a safe and simple way. > > > I know this is what it does with the regular FSC, but I'm not sure if it > > > does so with the FST. If it doesn't, it probably should though. > > > > It doesn't. The free space cache is easy to invalidate because we can > > just compare the generation number of the superblock to that of the > > space cache, but as it exists now, the free space tree doesn't have > > anything equivalent. That means that any modifications that btrfs-progs > > made to a space_cache=v2 filesystem could potentially have caused > > filesystem corruption :/ > > > > However, I talked this through with Chris, and he came up with a great > > idea that will help us deal with both this issue and the endianness > > issue [1] in one fell swoop. Basically, my objection to adding a compat > > bit for the endianness bug was that it would unnecessarily affect the > > vast majority of users on x86; forcing those users to recreate their > > free space tree seemed silly. However, because of the btrfs-progs bug, > > just to be safe, we might as well force everyone to recreate their free > > space tree. > > > > The solution is to add a FREE_SPACE_TREE_VALID compat_ro bit. If the bit > > isn't set, then we destroy and rebuild the free space tree. This covers > > the case of big-endian kernels which created broken free space trees and > > filesystems which could have possibly gone through btrfs-progs. > > > > This time we'll make sure not to make btrfs-progs think it can mount > > FREE_SPACE_TREE_VALID filesystems read-write. We can even have > > btrfs-progs check for that bit and clear it if it's mounting read-write. > > The next time it gets mounted, the kernel will recreate the tree. It's > > not pretty, but it'll work. > > Sounds like a good plan to me. The bit is a form of 'clear_cache' mount. > We need to to a coordinated fix (kernel, progs), if the patches are > ready soon, 4.9 is feasible target. I'll try to get them out later today. -- Omar -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Status of free-space-tree feature
Hi, On 09/22/2016 10:52 AM, David Sterba wrote: > On Wed, Sep 21, 2016 at 01:31:52PM -0700, Omar Sandoval wrote: I'm not sure I understand - can you explain why this is was so wrong? Or Omar maybe? If btrfsck wants to correct something (write), it can simply always and unconditionally invalidate the fst instead of trying to "repair" it..and I think that's what happens at the moment (at least I think it did for me recently). That seems like a safe and simple way. >>> I know this is what it does with the regular FSC, but I'm not sure if it >>> does so with the FST. If it doesn't, it probably should though. >> >> It doesn't. The free space cache is easy to invalidate because we can >> just compare the generation number of the superblock to that of the >> space cache, but as it exists now, the free space tree doesn't have >> anything equivalent. That means that any modifications that btrfs-progs >> made to a space_cache=v2 filesystem could potentially have caused >> filesystem corruption :/ Can you elaborate on what kind of corruption (characteristics / impact) is meant here? Is it e.g. corruption as in a. filesystem becomes unusable beyond repair b. kernel crashes when it encounters some data c. data loss because disk space gets used twice, d. etc? I suspect by not being able to update the free space info, but at the same time doing writes, it results in free space being tracked which is not actually free any more. I might assume (but haven't been reading / searching through source code yet this morning) that a sane way of dealing with a free space cache in kernel would be to always first validate any claims about free space before actually trying to blindly putting something in the advertised free range. >> However, I talked this through with Chris, and he came up with a great >> idea that will help us deal with both this issue and the endianness >> issue [1] in one fell swoop. Basically, my objection to adding a compat >> bit for the endianness bug was that it would unnecessarily affect the >> vast majority of users on x86; forcing those users to recreate their >> free space tree seemed silly. However, because of the btrfs-progs bug, >> just to be safe, we might as well force everyone to recreate their free >> space tree. >> >> The solution is to add a FREE_SPACE_TREE_VALID compat_ro bit. If the bit >> isn't set, then we destroy and rebuild the free space tree. This covers >> the case of big-endian kernels which created broken free space trees and >> filesystems which could have possibly gone through btrfs-progs. Just some humble feedback... what does "FREE_SPACE_TREE_VALID" mean in two years from now, when another bug is found that might require the same sledgehammer approach to fix it? Will it require adding FREE_SPACE_TREE_NOW_REALLY_VALID? Also, the same bit is being used for two different purposes... 1. "pfew, we're hopefully post the btrfs-progs issue, and the user will hopefully not use an old progs again that doesn't clear me when writing, otherwise there's still no way to detect that and we'll falsely assume we're ok, b" (once) 2. btrfs-progs without r/w fst but with fst invalidate support touched the fs, (any time in the future) btrfs progs >= 4.5 && < v4.7.3 are already "out in the wild". So I think there should be a warning for users which want to try out FST. The general recommendation to use btrfs-progs with a version number that is at least as high as the kernel version (>=4.5 for FST) doesn't help any more in this case. Added it: https://btrfs.wiki.kernel.org/index.php/Status#Free_space_tree Still requires users to find and read this info instead of looking at kernel release notes and man pages (so, it's great that it can be found now, thanks to the status page). >> This time we'll make sure not to make btrfs-progs think it can mount >> FREE_SPACE_TREE_VALID filesystems read-write. We can even have >> btrfs-progs check for that bit and clear it if it's mounting read-write. >> The next time it gets mounted, the kernel will recreate the tree. It's >> not pretty, but it'll work. > > Sounds like a good plan to me. The bit is a form of 'clear_cache' mount. > We need to to a coordinated fix (kernel, progs), if the patches are > ready soon, 4.9 is feasible target. How will this be announced, to prevent me from being unexpectedly greeted with hours of filesystem downtime during mount? What should be the way of working now, for a user which wants to use FST and any btrfs-progs r/w functionality? 1) umount 2) mount with clear_cache,space_cache=v1 3) use progs, e.g. btrfstune 4) mount with clear_cache,space_cache=v2 Since users who benefit from the FST are generally users that run huge filesystems, this gets a bit painful. But, I guess that might be the price that you have to pay for using cool new things. P.S. I really like the space tree, it's so much of an improvement on btrfs-transaction commits! -- Hans van Kranenburg -- To unsubscribe fr
Re: Status of free-space-tree feature
On Wed, Sep 21, 2016 at 01:31:52PM -0700, Omar Sandoval wrote: > > > I'm not sure I understand - can you explain why this is was so wrong? > > > Or Omar maybe? > > > > > > If btrfsck wants to correct something (write), it can simply always > > > and unconditionally invalidate the fst instead of trying to "repair" > > > it..and I think that's what happens at the moment (at least I think > > > it did for me recently). That seems like a safe and simple way. > > I know this is what it does with the regular FSC, but I'm not sure if it > > does so with the FST. If it doesn't, it probably should though. > > It doesn't. The free space cache is easy to invalidate because we can > just compare the generation number of the superblock to that of the > space cache, but as it exists now, the free space tree doesn't have > anything equivalent. That means that any modifications that btrfs-progs > made to a space_cache=v2 filesystem could potentially have caused > filesystem corruption :/ > > However, I talked this through with Chris, and he came up with a great > idea that will help us deal with both this issue and the endianness > issue [1] in one fell swoop. Basically, my objection to adding a compat > bit for the endianness bug was that it would unnecessarily affect the > vast majority of users on x86; forcing those users to recreate their > free space tree seemed silly. However, because of the btrfs-progs bug, > just to be safe, we might as well force everyone to recreate their free > space tree. > > The solution is to add a FREE_SPACE_TREE_VALID compat_ro bit. If the bit > isn't set, then we destroy and rebuild the free space tree. This covers > the case of big-endian kernels which created broken free space trees and > filesystems which could have possibly gone through btrfs-progs. > > This time we'll make sure not to make btrfs-progs think it can mount > FREE_SPACE_TREE_VALID filesystems read-write. We can even have > btrfs-progs check for that bit and clear it if it's mounting read-write. > The next time it gets mounted, the kernel will recreate the tree. It's > not pretty, but it'll work. Sounds like a good plan to me. The bit is a form of 'clear_cache' mount. We need to to a coordinated fix (kernel, progs), if the patches are ready soon, 4.9 is feasible target. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Status of free-space-tree feature
Hey, guys, On Wed, Sep 21, 2016 at 08:12:51AM -0400, Austin S. Hemmelgarn wrote: > On 2016-09-21 06:25, Holger Hoffstätte wrote: > > On 09/21/16 11:24, David Sterba wrote: > > > Hi, > > > > > > as you might have noticed, the [1] wiki Status page lists the > > > free-space-tree as 'Unstable', referencing a problem with the bitmap > > > endianity. This will affect only bigendian systems. > > > > > > There's one more problem that I overlooked but was pointed to by Omar > > > recently. The initial FST support in btrfs-progs is read-only, > > > > > > https://marc.info/?l=linux-btrfs&m=144113538017042 > > > > > > "However, we're not adding the FREE_SPACE_TREE read-only compat bit to > > > the set of supported bits because progs doesn't know how to keep the > > > free space tree consistent." > > > > > > Historically, the initial patchset version did not recognize FST feature > > > and thus refused to write, but then it was pointed by Qu and Holger that > > > the compat_ro bit is missing for FST. I've added it but this was a > > > mistake. This change is going to be reverted. > > > > I'm not sure I understand - can you explain why this is was so wrong? > > Or Omar maybe? > > > > If btrfsck wants to correct something (write), it can simply always > > and unconditionally invalidate the fst instead of trying to "repair" > > it..and I think that's what happens at the moment (at least I think > > it did for me recently). That seems like a safe and simple way. > I know this is what it does with the regular FSC, but I'm not sure if it > does so with the FST. If it doesn't, it probably should though. It doesn't. The free space cache is easy to invalidate because we can just compare the generation number of the superblock to that of the space cache, but as it exists now, the free space tree doesn't have anything equivalent. That means that any modifications that btrfs-progs made to a space_cache=v2 filesystem could potentially have caused filesystem corruption :/ However, I talked this through with Chris, and he came up with a great idea that will help us deal with both this issue and the endianness issue [1] in one fell swoop. Basically, my objection to adding a compat bit for the endianness bug was that it would unnecessarily affect the vast majority of users on x86; forcing those users to recreate their free space tree seemed silly. However, because of the btrfs-progs bug, just to be safe, we might as well force everyone to recreate their free space tree. The solution is to add a FREE_SPACE_TREE_VALID compat_ro bit. If the bit isn't set, then we destroy and rebuild the free space tree. This covers the case of big-endian kernels which created broken free space trees and filesystems which could have possibly gone through btrfs-progs. This time we'll make sure not to make btrfs-progs think it can mount FREE_SPACE_TREE_VALID filesystems read-write. We can even have btrfs-progs check for that bit and clear it if it's mounting read-write. The next time it gets mounted, the kernel will recreate the tree. It's not pretty, but it'll work. 1: http://marc.info/?l=linux-btrfs&m=147159380621517&w=2 -- Omar -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Status of free-space-tree feature
On 2016-09-21 06:25, Holger Hoffstätte wrote: On 09/21/16 11:24, David Sterba wrote: Hi, as you might have noticed, the [1] wiki Status page lists the free-space-tree as 'Unstable', referencing a problem with the bitmap endianity. This will affect only bigendian systems. There's one more problem that I overlooked but was pointed to by Omar recently. The initial FST support in btrfs-progs is read-only, https://marc.info/?l=linux-btrfs&m=144113538017042 "However, we're not adding the FREE_SPACE_TREE read-only compat bit to the set of supported bits because progs doesn't know how to keep the free space tree consistent." Historically, the initial patchset version did not recognize FST feature and thus refused to write, but then it was pointed by Qu and Holger that the compat_ro bit is missing for FST. I've added it but this was a mistake. This change is going to be reverted. I'm not sure I understand - can you explain why this is was so wrong? Or Omar maybe? If btrfsck wants to correct something (write), it can simply always and unconditionally invalidate the fst instead of trying to "repair" it..and I think that's what happens at the moment (at least I think it did for me recently). That seems like a safe and simple way. I know this is what it does with the regular FSC, but I'm not sure if it does so with the FST. If it doesn't, it probably should though. The fst by itself seems to be working really well even in extreme scenarios (Stefan Priebe's 50+ TB filesystems come to mind, which fell over on a regular basis before I merged the fst into my "stable++" kernels), and considering how many bugs we've seen in the past from the v1 cache, I was really hoping that we can make v2 the default soon instead of skirting around the issue even longer. I entirely agree regarding this. I've not tested it on anywhere near that scale, but I've been running it since shortly after support was present in both stable kernels and userspace, and I've not seen a single issue on any of my systems except for the aforementioned big endian issue (which hasn't really affected me that much, since all my big endian systems are test VM's anyway). Of course if there are plain old runtime bugs in the fst then they need to be considered, but this seems to be about btrfs-progs support. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Status of free-space-tree feature
On 09/21/16 11:24, David Sterba wrote: > Hi, > > as you might have noticed, the [1] wiki Status page lists the > free-space-tree as 'Unstable', referencing a problem with the bitmap > endianity. This will affect only bigendian systems. > > There's one more problem that I overlooked but was pointed to by Omar > recently. The initial FST support in btrfs-progs is read-only, > > https://marc.info/?l=linux-btrfs&m=144113538017042 > > "However, we're not adding the FREE_SPACE_TREE read-only compat bit to > the set of supported bits because progs doesn't know how to keep the > free space tree consistent." > > Historically, the initial patchset version did not recognize FST feature > and thus refused to write, but then it was pointed by Qu and Holger that > the compat_ro bit is missing for FST. I've added it but this was a > mistake. This change is going to be reverted. I'm not sure I understand - can you explain why this is was so wrong? Or Omar maybe? If btrfsck wants to correct something (write), it can simply always and unconditionally invalidate the fst instead of trying to "repair" it..and I think that's what happens at the moment (at least I think it did for me recently). That seems like a safe and simple way. The fst by itself seems to be working really well even in extreme scenarios (Stefan Priebe's 50+ TB filesystems come to mind, which fell over on a regular basis before I merged the fst into my "stable++" kernels), and considering how many bugs we've seen in the past from the v1 cache, I was really hoping that we can make v2 the default soon instead of skirting around the issue even longer. Of course if there are plain old runtime bugs in the fst then they need to be considered, but this seems to be about btrfs-progs support. -h -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html