Re: Will Btrfs have an official command to "uncow" existing files?
On Mon, Aug 22, 2016 at 04:06:13PM -0700, Darrick J. Wong wrote:
> [add Dave and Christoph to cc]
>
> On Mon, Aug 22, 2016 at 04:14:19PM -0400, Jeff Mahoney wrote:
> > On 8/21/16 2:59 PM, Tomokhov Alexander wrote:
> > > Btrfs wiki FAQ gives a link to example Python script:
> > > https://github.com/stsquad/scripts/blob/master/uncow.py
> > >
> > > But such a crucial and fundamental tool must exist in stock btrfs-progs.
> > > Filesystem with CoW technology at it's core must provide user sufficient
> > > control over CoW aspects. Running 3rd-party or manually written scripts
> > > for filesystem properties/metadata manipulation is not convenient, not
> > > safe and definitely not the way it must be done.
> > >
> > > Also is it possible (at least in theory) to "uncow" files being currently
> > > opened in-place? Without the trickery with creation & renaming of files
> > > or directories. So that running "chattr +C" on a file would be
> > > sufficient. If possible, is it going to be implemented?
> >
> > XFS is looking to do this via fallocate using a flag that all file
> > systems can choose to honor. Once that lands, it would make sense for
> > btrfs to use it as well. The idea is that when you pass the flag in, we
> > examine the range and CoW anything that has a refcount != 1.
>
> There /was/ a flag to do that -- FALLOC_FL_UNSHARE_RANGE. However,
> Christoph and Dave felt[1] that the fallocate call didn't need to have
> an explicit 'unshare' mode because unsharing shared blocks is
> necessary to guarantee that a subsequent write will not ENOSPC. I
> felt that was sufficient justification to withdraw the unshare mode
> flag. If you fallocate the entire length of a shared file on XFS, it
> will turn off CoW for that file until you reflink/dedupe it again.

From the XFS POV that's all good...

> At the time I wondered whether or not the btrfs developers (the list
> was cc'd) would pipe up in support of the unshare flag, but nobody
> did. Consequently it remains nonexistent. Christoph commented a few
> months ago about unsharing fallocate over NFS atop XFS blocking for a
> long time, though nobody asked for 'unshare' to be reinstated as a
> separate fallocate mode, much less a 'don't unshare' flag for regular
> fallocate mode.

If there are other use cases, then we can easily implement it in
XFS. However, let's not overload the XFS reflink code with things
other fs devs have once said "that'd be nice to do".

> (FWIW I'm ok with not having to fight for more VFS changes. :))
>
> > That code hasn't landed yet though. The last time I saw it posted was
> > June. I don't speak with knowledge of the integration plan, but it
> > might just be queued up for the next merge window now that the reverse
> > mapping patches have landed in 4.8.
>
> I am going to try to land XFS reflink in 4.9; I hope to have an eighth
> patchset out for review at the end of the week.
>
> So... if the btrfs folks really want an unshare flag I can trivially
> re-add it to the VFS headers and re-enable it in the XFS
> implementation but y'all better speak up now and hammer out an
> acceptable definition. I don't think XFS needs a new flag.

It's not urgent - it can be added at any time, so I'd say it is
something we should ignore on the XFS side of things until someone
actually requires an explicit "unshare" operation for another
filesystem.

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Will Btrfs have an official command to "uncow" existing files?
On Mon, Aug 22, 2016 at 5:06 PM, Darrick J. Wongwrote: > [add Dave and Christoph to cc] > > On Mon, Aug 22, 2016 at 04:14:19PM -0400, Jeff Mahoney wrote: >> On 8/21/16 2:59 PM, Tomokhov Alexander wrote: >> > Btrfs wiki FAQ gives a link to example Python script: >> > https://github.com/stsquad/scripts/blob/master/uncow.py >> > >> > But such a crucial and fundamental tool must exist in stock btrfs-progs. >> > Filesystem with CoW technology at it's core must provide user sufficient >> > control over CoW aspects. Running 3rd-party or manually written scripts >> > for filesystem properties/metadata manipulation is not convenient, not >> > safe and definitely not the way it must be done. >> > >> > Also is it possible (at least in theory) to "uncow" files being currently >> > opened in-place? Without the trickery with creation & renaming of files or >> > directories. So that running "chattr +C" on a file would be sufficient. If >> > possible, is it going to be implemented? >> >> XFS is looking to do this via fallocate using a flag that all file >> systems can choose to honor. Once that lands, it would make sense for >> btrfs to use it as well. The idea is that when you pass the flag in, we >> examine the range and CoW anything that has a refcount != 1. > > There /was/ a flag to do that -- FALLOC_FL_UNSHARE_RANGE. However, > Christoph and Dave felt[1] that the fallocate call didn't need to have > an explicit 'unshare' mode because unsharing shared blocks is > necessary to guarantee that a subsequent write will not ENOSPC. I > felt that was sufficient justification to withdraw the unshare mode > flag. If you fallocate the entire length of a shared file on XFS, it > will turn off CoW for that file until you reflink/dedupe it again. > > At the time I wondered whether or not the btrfs developers (the list > was cc'd) would pipe up in support of the unshare flag, but nobody > did. Consequently it remains nonexistent. 
Christoph commented a few > months ago about unsharing fallocate over NFS atop XFS blocking for a > long time, though nobody asked for 'unshare' to be reinstated as a > separate fallocate mode, much less a 'don't unshare' flag for regular > fallocate mode. > > (FWIW I'm ok with not having to fight for more VFS changes. :)) > >> That code hasn't landed yet though. The last time I saw it posted was >> June. I don't speak with knowledge of the integration plan, but it >> might just be queued up for the next merge window now that the reverse >> mapping patches have landed in 4.8. > > I am going to try to land XFS reflink in 4.9; I hope to have an eighth > patchset out for review at the end of the week. > > So... if the btrfs folks really want an unshare flag I can trivially > re-add it to the VFS headers and re-enable it in the XFS > implementation but y'all better speak up now and hammer out an > acceptable definition. I don't think XFS needs a new flag. Use case wise I can't think of why I'd want to do unshare. There is a use case for wanting to set nocow after the fact. I have no idea what complexity is added on the Btrfs side for either operation, it seems like at the least to set it, data csum needs a way to be ignored or removed; and conversely to unset nocow it's a question whether that means the file should have csum's computed, strictly speaking I guess you could have cow without datacsum. -- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: possible recursive locking detected, 4.8.0-0.rc3.git0.1.fc25.x86_64+debug
On Mon, Aug 22, 2016 at 6:26 PM, Jeff Mahoney wrote:
> On 8/22/16 6:51 PM, Chris Murphy wrote:
>> Trivially reproducible every boot, shortly after mount happens. Also
>> happened with rc2.
>>
>
> Yep. We've disabled this in our kernels. It can actually deadlock.

Interesting. I've only gotten scary looking messages so far.

--
Chris Murphy
Re: possible recursive locking detected, 4.8.0-0.rc3.git0.1.fc25.x86_64+debug
On 8/22/16 6:51 PM, Chris Murphy wrote: > Trivially reproducible every boot, shortly after mount happens. Also > happened with rc2. > Yep. We've disabled this in our kernels. It can actually deadlock. -Jeff > > [ 13.225891] virbr0: port 1(virbr0-nic) entered blocking state > [ 13.225895] virbr0: port 1(virbr0-nic) entered listening state > [ 13.299806] virbr0: port 1(virbr0-nic) entered disabled state > > [ 13.309179] = > [ 13.309181] [ INFO: possible recursive locking detected ] > [ 13.309182] 4.8.0-0.rc3.git0.1.fc25.x86_64+debug #1 Not tainted > [ 13.309183] - > [ 13.309185] libvirt_leasesh/1174 is trying to acquire lock: > [ 13.309186] (>log_mutex){+.+...}, at: [] > btrfs_log_inode+0x162/0x10f0 [btrfs] > [ 13.309212] >but task is already holding lock: > [ 13.309213] (>log_mutex){+.+...}, at: [] > btrfs_log_inode+0x162/0x10f0 [btrfs] > [ 13.309229] >other info that might help us debug this: > [ 13.309230] Possible unsafe locking scenario: > > [ 13.309231]CPU0 > [ 13.309232] > [ 13.309233] lock(>log_mutex); > [ 13.309235] lock(>log_mutex); > [ 13.309237] > *** DEADLOCK *** > > [ 13.309238] May be due to missing lock nesting notation > > [ 13.309240] 6 locks held by libvirt_leasesh/1174: > [ 13.309241] #0: (sb_writers#8){.+.+.+}, at: [] > __sb_start_write+0xb4/0xf0 > [ 13.309247] #1: (>i_mutex_dir_key#3/1){+.+.+.}, at: > [] lock_rename+0xda/0x100 > [ 13.309252] #2: (>s_type->i_mutex_key#14){+.+.+.}, at: > [] lock_two_nondirectories+0x3e/0x70 > [ 13.309258] #3: (>s_type->i_mutex_key#14/4){+.+...}, at: > [] lock_two_nondirectories+0x66/0x70 > [ 13.309263] #4: (sb_internal){.+.+.+}, at: [] > __sb_start_write+0x78/0xf0 > [ 13.309266] #5: (>log_mutex){+.+...}, at: > [] btrfs_log_inode+0x162/0x10f0 [btrfs] > [ 13.309282] >stack backtrace: > [ 13.309284] CPU: 2 PID: 1174 Comm: libvirt_leasesh Not tainted > 4.8.0-0.rc3.git0.1.fc25.x86_64+debug #1 > [ 13.309285] Hardware name: Apple Inc. 
> MacBookPro8,2/Mac-94245A3940C91C80, BIOS > MBP81.88Z.0047.B2C.1510261540 10/26/15 > [ 13.309287] 0086 03b4acd5 891b0a1d77a0 > 9f466723 > [ 13.309290] a0b07910 891adcf44000 891b0a1d7868 > 9f10f01e > [ 13.309294] dcf44a78 0006 b967c054 > a0408900 > [ 13.309297] Call Trace: > [ 13.309301] [] dump_stack+0x86/0xc3 > [ 13.309303] [] __lock_acquire+0x78e/0x1290 > [ 13.309306] [] ? sched_clock+0x9/0x10 > [ 13.309309] [] ? sched_clock_cpu+0xa7/0xc0 > [ 13.309312] [] ? mutex_unlock+0xe/0x10 > [ 13.309314] [] lock_acquire+0xf6/0x1f0 > [ 13.309326] [] ? btrfs_log_inode+0x162/0x10f0 [btrfs] > [ 13.309328] [] mutex_lock_nested+0x86/0x3f0 > [ 13.309340] [] ? btrfs_log_inode+0x162/0x10f0 [btrfs] > [ 13.309341] [] ? mutex_unlock+0xe/0x10 > [ 13.309353] [] ? btrfs_log_inode+0x162/0x10f0 [btrfs] > [ 13.309365] [] btrfs_log_inode+0x162/0x10f0 [btrfs] > [ 13.309368] [] ? __might_sleep+0x49/0x80 > [ 13.309380] [] btrfs_log_inode+0xc8c/0x10f0 [btrfs] > [ 13.309382] [] ? sched_clock+0x9/0x10 > [ 13.309394] [] btrfs_log_inode_parent+0x240/0x940 > [btrfs] > [ 13.309396] [] ? _raw_spin_unlock+0x27/0x40 > [ 13.309408] [] ? btrfs_update_inode+0xda/0x110 [btrfs] > [ 13.309420] [] btrfs_log_new_name+0x71/0x90 [btrfs] > [ 13.309432] [] btrfs_rename2+0x1090/0x17a0 [btrfs] > [ 13.309434] [] ? debug_lockdep_rcu_enabled+0x1d/0x20 > [ 13.309437] [] ? lock_two_nondirectories+0x66/0x70 > [ 13.309439] [] vfs_rename+0x5c2/0x970 > [ 13.309441] [] ? legitimize_path.isra.34+0x20/0x60 > [ 13.309443] [] SyS_rename+0x3a7/0x3d0 > [ 13.309445] [] entry_SYSCALL_64_fastpath+0x1f/0xbd > [ 13.318819] device virbr0-nic left promiscuous mode > [ 13.318854] virbr0: port 1(virbr0-nic) entered disabled state > -- Jeff Mahoney SUSE Labs signature.asc Description: OpenPGP digital signature
Re: Will Btrfs have an official command to "uncow" existing files?
Thanks for the in-depth answer.

Well, a "simple enough process" is still a sequence of steps that must be done carefully, in the proper order and with the correct parameters for the environment. It is work with data, and that data can be invaluable.

No, really, I'm not a beginner. I have used Arch Linux every day since 2009, program in different languages and so on. But even a few ordered manual commands for changing a file attribute, involving the quite dangerous "mv" and "cp" (the overwrite case), look very suspicious to me.

"2a": this step depends on whether another filesystem exists at all; better if it is free, has enough space, supports the same file permission features, etc. Figuring out these conditions takes time, and the step is not suitable for all systems.

"2b": the step with "mv out". Out to where? What if a file with the same name already exists in the destination directory? Not reliable. OK, so a temporary directory has to be created. Where, and what to call it? That involves conditional checks performed by the user.

Similarly, creating the empty file must also satisfy the condition that its name is unique in the directory.

Additionally, all existing ways to "uncow" require a manual free space check beforehand. The user must also verify that the file is not currently open. I'm sure I missed something else.

All of these problems are unrelated to the file attributes themselves, yet the user has to think about them. An official specialized tool could automatically track all these conditions, perform the right sequence of actions and report the results to the user.

Yes, I do take into consideration that there are situations when "uncow" cannot actually be applied to a file, for the reasons you described. I have no snapshots at the moment, and, for example, I have a Firefox SQLite database file with 900+ extents on a rotational disk. I wouldn't say the fragmentation is noticeable, but I would at least like the number of extents not to grow further, so that I never notice it. I admit that Btrfs may defragment it, but it may not.
Sometimes we need a more controllable approach.

22.08.2016, 05:00, "Duncan" <1i5t5.dun...@cox.net>:
> Tomokhov Alexander posted on Sun, 21 Aug 2016 21:59:36 +0300 as excerpted:
>
>> Btrfs wiki FAQ gives a link to example Python script:
>> https://github.com/stsquad/scripts/blob/master/uncow.py
>>
>> But such a crucial and fundamental tool must exist in stock btrfs-progs.
>> Filesystem with CoW technology at it's core must provide user sufficient
>> control over CoW aspects. Running 3rd-party or manually written scripts
>> for filesystem properties/metadata manipulation is not convenient, not
>> safe and definitely not the way it must be done.
>
> Why? No script or dedicated tool needed as it's a simple enough process.
>
> Simply:
>
> 1. chattr +C (that being nocow) the containing directory.
>
> Then either:
>
> 2a. mv the file to and from another filesystem, so it's actually created
> new in the directory and thus inherits the nocow attribute at file
> creation,
>
> or
>
> 2b. mv out and then cp the file back into place with --reflink=never,
> again, forcing the file to be created new in the directory, so it
> inherits the nocow attribute at creation,
>
> OR (replacing both steps above)
>
> Create the empty file (using touch or similar), set it nocow, and use cat
> srcfile >> destfile style redirection to fill it, so the file again gets
> the nocow attribute set before it has content, but allowing you to set
> only the file nocow, without setting the containing directory nocow.
>
> Of course there's no exception here to the general case, if you're doing
> the same thing to a whole bunch of files, setting up a script to do it
> may be more efficient than doing it to each one manually one by one, and
> a script could be useful there, but that's a general rule, nothing
> exceptional for btrfs nocow, and a script or fancy tool isn't actually
> required, regardless.
>
> The point being, cow is the default case, and should work /reasonably/
> well in most cases, certainly well enough so that normal people doing
> normal things shouldn't need to worry about it. The only people who will
> need to worry about it, therefore, are people worried about the last bit
> of optimization possible to various corner-case use-cases that don't
> match default assumptions very well, and it's precisely these sorts of
> people that are /technical/ enough to be able to understand both why they
> might want nocow (and what the positives and negatives are going to be),
> and how to actually get it.
>
>> Also is it possible (at least in theory) to "uncow" files being
>> currently opened in-place? Without the trickery with creation & renaming
>> of files or directories. So that running "chattr +C" on a file would be
>> sufficient. If possible, is it going to be implemented?
>
> It's software. Of course it's possible, tho it's also possible the
> negatives make it not worth the trouble. If the implementation simply
> creates a new file and
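The third variant Duncan describes (create an empty file, set it nocow, then fill it) can be combined with the checks Alexander asks for: a free space check and a unique temporary name. What follows is only a minimal sketch under stated assumptions, not an official tool. The names `uncow` and `chattr_nocow` are hypothetical, the only external command used is `chattr +C` as quoted in the thread, and the sketch does not detect whether another process has the file open.

```python
import os
import shutil
import subprocess
import tempfile


def chattr_nocow(path):
    """Set the nocow attribute with `chattr +C`; the file must still be empty."""
    subprocess.run(["chattr", "+C", path], check=True)


def uncow(path, set_nocow=chattr_nocow):
    """Rewrite `path` as a new file that had nocow set before any data was written.

    Hypothetical helper, not part of btrfs-progs. The caller must make sure
    nothing else has the file open; this sketch does not check that.
    """
    directory = os.path.dirname(os.path.abspath(path))
    size = os.path.getsize(path)

    # Free space check: a full second copy exists until the final rename.
    if shutil.disk_usage(directory).free < size:
        raise OSError("not enough free space to rewrite " + path)

    # mkstemp creates a new file under a name that did not exist before.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".uncow-")
    try:
        with os.fdopen(fd, "wb") as dst:
            set_nocow(tmp)  # attribute goes on while the file is still empty
            with open(path, "rb") as src:
                # Plain read/write loop, so no extents are shared with the source.
                shutil.copyfileobj(src, dst)
            dst.flush()
            os.fsync(dst.fileno())
        shutil.copystat(path, tmp)  # carry over mode and timestamps
        os.replace(tmp, path)       # atomic swap within the same directory
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Calling `uncow("places.sqlite")` would replace the file in place. Ownership is not copied in this sketch, and because the data is rewritten, the result no longer shares extents with any snapshot or reflink copy of the old file.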
Re: Will Btrfs have an official command to "uncow" existing files?
Oh, I didn't know that XFS is going to get many of Btrfs's features and continues to evolve. Thank you for the answer.

22.08.2016, 23:14, "Jeff Mahoney":
> On 8/21/16 2:59 PM, Tomokhov Alexander wrote:
>> Btrfs wiki FAQ gives a link to example Python script:
>> https://github.com/stsquad/scripts/blob/master/uncow.py
>>
>> But such a crucial and fundamental tool must exist in stock btrfs-progs.
>> Filesystem with CoW technology at it's core must provide user sufficient
>> control over CoW aspects. Running 3rd-party or manually written scripts for
>> filesystem properties/metadata manipulation is not convenient, not safe and
>> definitely not the way it must be done.
>>
>> Also is it possible (at least in theory) to "uncow" files being currently
>> opened in-place? Without the trickery with creation & renaming of files or
>> directories. So that running "chattr +C" on a file would be sufficient. If
>> possible, is it going to be implemented?
>
> XFS is looking to do this via fallocate using a flag that all file
> systems can choose to honor. Once that lands, it would make sense for
> btrfs to use it as well. The idea is that when you pass the flag in, we
> examine the range and CoW anything that has a refcount != 1.
>
> That code hasn't landed yet though. The last time I saw it posted was
> June. I don't speak with knowledge of the integration plan, but it
> might just be queued up for the next merge window now that the reverse
> mapping patches have landed in 4.8.
>
> -Jeff
>
> --
> Jeff Mahoney
> SUSE Labs
Re: Will Btrfs have an official command to "uncow" existing files?
[add Dave and Christoph to cc] On Mon, Aug 22, 2016 at 04:14:19PM -0400, Jeff Mahoney wrote: > On 8/21/16 2:59 PM, Tomokhov Alexander wrote: > > Btrfs wiki FAQ gives a link to example Python script: > > https://github.com/stsquad/scripts/blob/master/uncow.py > > > > But such a crucial and fundamental tool must exist in stock btrfs-progs. > > Filesystem with CoW technology at it's core must provide user sufficient > > control over CoW aspects. Running 3rd-party or manually written scripts for > > filesystem properties/metadata manipulation is not convenient, not safe and > > definitely not the way it must be done. > > > > Also is it possible (at least in theory) to "uncow" files being currently > > opened in-place? Without the trickery with creation & renaming of files or > > directories. So that running "chattr +C" on a file would be sufficient. If > > possible, is it going to be implemented? > > XFS is looking to do this via fallocate using a flag that all file > systems can choose to honor. Once that lands, it would make sense for > btrfs to use it as well. The idea is that when you pass the flag in, we > examine the range and CoW anything that has a refcount != 1. There /was/ a flag to do that -- FALLOC_FL_UNSHARE_RANGE. However, Christoph and Dave felt[1] that the fallocate call didn't need to have an explicit 'unshare' mode because unsharing shared blocks is necessary to guarantee that a subsequent write will not ENOSPC. I felt that was sufficient justification to withdraw the unshare mode flag. If you fallocate the entire length of a shared file on XFS, it will turn off CoW for that file until you reflink/dedupe it again. At the time I wondered whether or not the btrfs developers (the list was cc'd) would pipe up in support of the unshare flag, but nobody did. Consequently it remains nonexistent. 
Christoph commented a few months ago about unsharing fallocate over NFS atop XFS blocking for a long time, though nobody asked for 'unshare' to be reinstated as a separate fallocate mode, much less a 'don't unshare' flag for regular fallocate mode. (FWIW I'm ok with not having to fight for more VFS changes. :)) > That code hasn't landed yet though. The last time I saw it posted was > June. I don't speak with knowledge of the integration plan, but it > might just be queued up for the next merge window now that the reverse > mapping patches have landed in 4.8. I am going to try to land XFS reflink in 4.9; I hope to have an eighth patchset out for review at the end of the week. So... if the btrfs folks really want an unshare flag I can trivially re-add it to the VFS headers and re-enable it in the XFS implementation but y'all better speak up now and hammer out an acceptable definition. I don't think XFS needs a new flag. --D [1] https://www.spinics.net/lists/linux-nfs/msg54740.html > > -Jeff > > -- > Jeff Mahoney > SUSE Labs > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
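Darrick's remark that fallocating the entire length of a shared file unshares it on XFS can be sketched from userspace with the standard library's `os.posix_fallocate`. This is a sketch under assumptions: the helper name `preallocate_whole_file` is hypothetical, and whether shared blocks actually get unshared depends on the filesystem. The thread describes that behaviour for XFS with reflink only; elsewhere the call merely guarantees that space is allocated.

```python
import os


def preallocate_whole_file(path):
    """Call posix_fallocate() over the full current length of `path`.

    Returns the number of bytes covered. Unsharing of reflinked blocks is a
    filesystem-specific side effect (described in the thread for XFS), not
    something this call promises in general.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        size = os.fstat(fd).st_size
        if size > 0:  # posix_fallocate rejects a zero length
            os.posix_fallocate(fd, 0, size)
        return size
    finally:
        os.close(fd)
```

The file contents and length are left unchanged; only the allocation underneath may differ afterwards.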
possible recursive locking detected, 4.8.0-0.rc3.git0.1.fc25.x86_64+debug
Trivially reproducible every boot, shortly after mount happens. Also happened with rc2. [ 13.225891] virbr0: port 1(virbr0-nic) entered blocking state [ 13.225895] virbr0: port 1(virbr0-nic) entered listening state [ 13.299806] virbr0: port 1(virbr0-nic) entered disabled state [ 13.309179] = [ 13.309181] [ INFO: possible recursive locking detected ] [ 13.309182] 4.8.0-0.rc3.git0.1.fc25.x86_64+debug #1 Not tainted [ 13.309183] - [ 13.309185] libvirt_leasesh/1174 is trying to acquire lock: [ 13.309186] (>log_mutex){+.+...}, at: [] btrfs_log_inode+0x162/0x10f0 [btrfs] [ 13.309212] but task is already holding lock: [ 13.309213] (>log_mutex){+.+...}, at: [] btrfs_log_inode+0x162/0x10f0 [btrfs] [ 13.309229] other info that might help us debug this: [ 13.309230] Possible unsafe locking scenario: [ 13.309231]CPU0 [ 13.309232] [ 13.309233] lock(>log_mutex); [ 13.309235] lock(>log_mutex); [ 13.309237] *** DEADLOCK *** [ 13.309238] May be due to missing lock nesting notation [ 13.309240] 6 locks held by libvirt_leasesh/1174: [ 13.309241] #0: (sb_writers#8){.+.+.+}, at: [] __sb_start_write+0xb4/0xf0 [ 13.309247] #1: (>i_mutex_dir_key#3/1){+.+.+.}, at: [] lock_rename+0xda/0x100 [ 13.309252] #2: (>s_type->i_mutex_key#14){+.+.+.}, at: [] lock_two_nondirectories+0x3e/0x70 [ 13.309258] #3: (>s_type->i_mutex_key#14/4){+.+...}, at: [] lock_two_nondirectories+0x66/0x70 [ 13.309263] #4: (sb_internal){.+.+.+}, at: [] __sb_start_write+0x78/0xf0 [ 13.309266] #5: (>log_mutex){+.+...}, at: [] btrfs_log_inode+0x162/0x10f0 [btrfs] [ 13.309282] stack backtrace: [ 13.309284] CPU: 2 PID: 1174 Comm: libvirt_leasesh Not tainted 4.8.0-0.rc3.git0.1.fc25.x86_64+debug #1 [ 13.309285] Hardware name: Apple Inc. 
MacBookPro8,2/Mac-94245A3940C91C80, BIOS MBP81.88Z.0047.B2C.1510261540 10/26/15 [ 13.309287] 0086 03b4acd5 891b0a1d77a0 9f466723 [ 13.309290] a0b07910 891adcf44000 891b0a1d7868 9f10f01e [ 13.309294] dcf44a78 0006 b967c054 a0408900 [ 13.309297] Call Trace: [ 13.309301] [] dump_stack+0x86/0xc3 [ 13.309303] [] __lock_acquire+0x78e/0x1290 [ 13.309306] [] ? sched_clock+0x9/0x10 [ 13.309309] [] ? sched_clock_cpu+0xa7/0xc0 [ 13.309312] [] ? mutex_unlock+0xe/0x10 [ 13.309314] [] lock_acquire+0xf6/0x1f0 [ 13.309326] [] ? btrfs_log_inode+0x162/0x10f0 [btrfs] [ 13.309328] [] mutex_lock_nested+0x86/0x3f0 [ 13.309340] [] ? btrfs_log_inode+0x162/0x10f0 [btrfs] [ 13.309341] [] ? mutex_unlock+0xe/0x10 [ 13.309353] [] ? btrfs_log_inode+0x162/0x10f0 [btrfs] [ 13.309365] [] btrfs_log_inode+0x162/0x10f0 [btrfs] [ 13.309368] [] ? __might_sleep+0x49/0x80 [ 13.309380] [] btrfs_log_inode+0xc8c/0x10f0 [btrfs] [ 13.309382] [] ? sched_clock+0x9/0x10 [ 13.309394] [] btrfs_log_inode_parent+0x240/0x940 [btrfs] [ 13.309396] [] ? _raw_spin_unlock+0x27/0x40 [ 13.309408] [] ? btrfs_update_inode+0xda/0x110 [btrfs] [ 13.309420] [] btrfs_log_new_name+0x71/0x90 [btrfs] [ 13.309432] [] btrfs_rename2+0x1090/0x17a0 [btrfs] [ 13.309434] [] ? debug_lockdep_rcu_enabled+0x1d/0x20 [ 13.309437] [] ? lock_two_nondirectories+0x66/0x70 [ 13.309439] [] vfs_rename+0x5c2/0x970 [ 13.309441] [] ? legitimize_path.isra.34+0x20/0x60 [ 13.309443] [] SyS_rename+0x3a7/0x3d0 [ 13.309445] [] entry_SYSCALL_64_fastpath+0x1f/0xbd [ 13.318819] device virbr0-nic left promiscuous mode [ 13.318854] virbr0: port 1(virbr0-nic) entered disabled state -- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
On Mon, 2016-08-22 at 14:49 -0600, Chris Murphy wrote:
> This is really weird. I'm running 4.7.0 (Fedora) and I'm not
> experiencing problems, let alone this. What is this kernel's
> provenance? Is it a plain mainline 4.7.0 that you built? I'm not
> really sure what to recommend except maybe going back to 4.5.7 or
> 4.6.7 as it's a production machine. Heck even 4.4.19 is OK for me in
> this regard.
>

Well, I'm using the default openSUSE kernel here. And I have been seeing these errors for some time. When I reported it, I was using v4.6.1. Hence, I think the version of btrfs-progs is not the problem.
Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
On Mon, Aug 22, 2016 at 2:39 PM, Ronan Arraes Jardim Chagas wrote:
> The same thing just happened again! And now it was also fixed
> automatically, but now I have:
>
> Metadata,DUP: Size:33.50GiB, Used:812.78MiB

This is really weird. I'm running 4.7.0 (Fedora) and I'm not
experiencing problems, let alone this. What is this kernel's
provenance? Is it a plain mainline 4.7.0 that you built? I'm not
really sure what to recommend except maybe going back to 4.5.7 or
4.6.7 as it's a production machine. Heck even 4.4.19 is OK for me in
this regard.

--
Chris Murphy
Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
The same thing just happened again! And now it was also fixed automatically, but now I have:

Metadata,DUP: Size:33.50GiB, Used:812.78MiB
Re: Will Btrfs have an official command to "uncow" existing files?
On 8/21/16 2:59 PM, Tomokhov Alexander wrote:
> Btrfs wiki FAQ gives a link to example Python script:
> https://github.com/stsquad/scripts/blob/master/uncow.py
>
> But such a crucial and fundamental tool must exist in stock btrfs-progs.
> Filesystem with CoW technology at it's core must provide user sufficient
> control over CoW aspects. Running 3rd-party or manually written scripts for
> filesystem properties/metadata manipulation is not convenient, not safe and
> definitely not the way it must be done.
>
> Also is it possible (at least in theory) to "uncow" files being currently
> opened in-place? Without the trickery with creation & renaming of files or
> directories. So that running "chattr +C" on a file would be sufficient. If
> possible, is it going to be implemented?

XFS is looking to do this via fallocate using a flag that all file
systems can choose to honor. Once that lands, it would make sense for
btrfs to use it as well. The idea is that when you pass the flag in, we
examine the range and CoW anything that has a refcount != 1.

That code hasn't landed yet though. The last time I saw it posted was
June. I don't speak with knowledge of the integration plan, but it
might just be queued up for the next merge window now that the reverse
mapping patches have landed in 4.8.

-Jeff

--
Jeff Mahoney
SUSE Labs
Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
New information guys!

I formatted using the latest Tumbleweed snapshot (btrfs-progs v4.7+20160729) and I still have the same problem. I notice two things. First, when I see the "No space left on device", it is fixed when the Metadata space increases **a lot**. For example, when the error first occurred, I had:

Metadata, DUP: total=2.00GiB, used=811.52MiB

After waiting a while (could not run balance), it was automatically fixed and then I have:

Metadata, DUP: total=9.50GiB, used=811.52MiB

During the error, when I ran the balance command, I see these messages in `dmesg`:

Ago 22 16:00:03 ronanarraes-osd kernel: BTRFS info (device sda6): relocating block group 9323937792 flags 34
Ago 22 16:00:04 ronanarraes-osd kernel: BTRFS info (device sda6): found 1 extents
Ago 22 16:00:04 ronanarraes-osd kernel: BTRFS info (device sda6): 1 enospc errors during balance
Ago 22 16:00:24 ronanarraes-osd kernel: BTRFS info (device sda6): relocating block group 36201037824 flags 34
Ago 22 16:00:24 ronanarraes-osd kernel: BTRFS info (device sda6): 2 enospc errors during balance
Ago 22 16:00:45 ronanarraes-osd kernel: BTRFS info (device sda6): relocating block group 36234592256 flags 34
Ago 22 16:00:46 ronanarraes-osd kernel: BTRFS info (device sda6): found 1 extents
Ago 22 16:00:46 ronanarraes-osd kernel: BTRFS info (device sda6): 4 enospc errors during balance
Ago 22 16:01:20 ronanarraes-osd kernel: BTRFS info (device sda6): relocating block group 38415630336 flags 34
Ago 22 16:01:21 ronanarraes-osd kernel: BTRFS info (device sda6): found 1 extents
Ago 22 16:01:21 ronanarraes-osd kernel: BTRFS info (device sda6): 8 enospc errors during balance

Does it add anything relevant to the problem?

Regards,
Ronan Arraes
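To compare the two "Metadata, DUP" lines above, it can help to express them as the fraction of allocated metadata space that is actually in use. The following is a small sketch, assuming the `total=..., used=...` line format exactly as quoted in this message; the names `usage_ratio`, `LINE` and `UNITS` are made up for illustration.

```python
import re

# Binary unit suffixes as they appear in the quoted output.
UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

LINE = re.compile(
    r"(?P<kind>\w+),\s*(?P<profile>\w+):\s*"
    r"total=(?P<total>[\d.]+)(?P<tu>[KMGT]iB|B),\s*"
    r"used=(?P<used>[\d.]+)(?P<uu>[KMGT]iB|B)"
)


def usage_ratio(line):
    """Return used/total for one 'Kind, PROFILE: total=X, used=Y' line."""
    m = LINE.search(line)
    if m is None:
        raise ValueError("unrecognised line: " + line)
    total = float(m["total"]) * UNITS[m["tu"]]
    used = float(m["used"]) * UNITS[m["uu"]]
    return used / total


before = usage_ratio("Metadata, DUP: total=2.00GiB, used=811.52MiB")
after = usage_ratio("Metadata, DUP: total=9.50GiB, used=811.52MiB")
print(f"before: {before:.1%}  after: {after:.1%}")
```

With the figures from this message, usage drops from roughly 40% of the allocated metadata space to roughly 8%, while the used amount stays the same. So the allocation grew; the metadata itself did not.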
Re: [PATCH 1/3] remove mapping from balance_dirty_pages*()
Hi Josef, [auto build test ERROR on linus/master] [also build test ERROR on v4.8-rc3 next-20160822] [cannot apply to linux/master] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] [Suggest to use git(>=2.9.0) format-patch --base= (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on] [Check https://git-scm.com/docs/git-format-patch for more information] url: https://github.com/0day-ci/linux/commits/Josef-Bacik/Provide-accounting-for-dirty-metadata/20160823-014222 config: sparc64-allyesconfig (attached as .config) compiler: sparc64-linux-gnu-gcc (Debian 5.4.0-6) 5.4.0 20160609 reproduce: wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=sparc64 All error/warnings (new ones prefixed by >>): fs/ntfs/attrib.c: In function 'ntfs_attr_set': >> fs/ntfs/attrib.c:2549:35: error: implicit declaration of function >> 'inode_to_bdi' [-Werror=implicit-function-declaration] balance_dirty_pages_ratelimited(inode_to_bdi(inode), ^ >> fs/ntfs/attrib.c:2549:35: warning: passing argument 1 of >> 'balance_dirty_pages_ratelimited' makes pointer from integer without a cast >> [-Wint-conversion] In file included from include/linux/memcontrol.h:30:0, from include/linux/swap.h:8, from fs/ntfs/attrib.c:26: include/linux/writeback.h:367:6: note: expected 'struct backing_dev_info *' but argument is of type 'int' void balance_dirty_pages_ratelimited(struct backing_dev_info *bdi, ^ fs/ntfs/attrib.c:2591:35: warning: passing argument 1 of 'balance_dirty_pages_ratelimited' makes pointer from integer without a cast [-Wint-conversion] balance_dirty_pages_ratelimited(inode_to_bdi(inode), ^ In file included from include/linux/memcontrol.h:30:0, from include/linux/swap.h:8, from fs/ntfs/attrib.c:26: include/linux/writeback.h:367:6: note: expected 'struct 
backing_dev_info *' but argument is of type 'int' void balance_dirty_pages_ratelimited(struct backing_dev_info *bdi, ^ fs/ntfs/attrib.c:2609:35: warning: passing argument 1 of 'balance_dirty_pages_ratelimited' makes pointer from integer without a cast [-Wint-conversion] balance_dirty_pages_ratelimited(inode_to_bdi(inode), ^ In file included from include/linux/memcontrol.h:30:0, from include/linux/swap.h:8, from fs/ntfs/attrib.c:26: include/linux/writeback.h:367:6: note: expected 'struct backing_dev_info *' but argument is of type 'int' void balance_dirty_pages_ratelimited(struct backing_dev_info *bdi, ^ cc1: some warnings being treated as errors vim +/inode_to_bdi +2549 fs/ntfs/attrib.c 2543 kaddr = kmap_atomic(page); 2544 memset(kaddr + start_ofs, val, size - start_ofs); 2545 flush_dcache_page(page); 2546 kunmap_atomic(kaddr); 2547 set_page_dirty(page); 2548 put_page(page); > 2549 balance_dirty_pages_ratelimited(inode_to_bdi(inode), 2550 inode->i_sb); 2551 cond_resched(); 2552 if (idx == end) --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: Binary data
[PATCH 2/3] writeback: allow for dirty metadata accounting
Provide a mechanism for file systems to indicate how much dirty metadata they are holding. This introduces a few things 1) Zone stats for dirty metadata, which is the same as the NR_FILE_DIRTY. 2) WB stat for dirty metadata. This way we know if we need to try and call into the file system to write out metadata. This could potentially be used in the future to make balancing of dirty pages smarter. Signed-off-by: Josef Bacik--- arch/tile/mm/pgtable.c | 3 +- drivers/base/node.c | 2 + fs/fs-writeback.c| 1 + fs/proc/meminfo.c| 2 + include/linux/backing-dev-defs.h | 1 + include/linux/mm.h | 7 +++ include/linux/mmzone.h | 1 + include/trace/events/writeback.h | 7 ++- mm/backing-dev.c | 2 + mm/page-writeback.c | 100 +-- mm/page_alloc.c | 7 ++- mm/vmscan.c | 3 +- 12 files changed, 127 insertions(+), 9 deletions(-) diff --git a/arch/tile/mm/pgtable.c b/arch/tile/mm/pgtable.c index 7cc6ee7..9543468 100644 --- a/arch/tile/mm/pgtable.c +++ b/arch/tile/mm/pgtable.c @@ -44,12 +44,13 @@ void show_mem(unsigned int filter) { struct zone *zone; - pr_err("Active:%lu inactive:%lu dirty:%lu writeback:%lu unstable:%lu free:%lu\n slab:%lu mapped:%lu pagetables:%lu bounce:%lu pagecache:%lu swap:%lu\n", + pr_err("Active:%lu inactive:%lu dirty:%lu metadata_dirty:%lu writeback:%lu unstable:%lu free:%lu\n slab:%lu mapped:%lu pagetables:%lu bounce:%lu pagecache:%lu swap:%lu\n", (global_node_page_state(NR_ACTIVE_ANON) + global_node_page_state(NR_ACTIVE_FILE)), (global_node_page_state(NR_INACTIVE_ANON) + global_node_page_state(NR_INACTIVE_FILE)), global_node_page_state(NR_FILE_DIRTY), + global_node_page_state(NR_METADATA_DIRTY), global_node_page_state(NR_WRITEBACK), global_node_page_state(NR_UNSTABLE_NFS), global_page_state(NR_FREE_PAGES), diff --git a/drivers/base/node.c b/drivers/base/node.c index 5548f96..efc867b2 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -99,6 +99,7 @@ static ssize_t node_read_meminfo(struct device *dev, #endif n += sprintf(buf + n, "Node %d Dirty: %8lu 
kB\n" + "Node %d MetadataDirty: %8lu kB\n" "Node %d Writeback: %8lu kB\n" "Node %d FilePages: %8lu kB\n" "Node %d Mapped: %8lu kB\n" @@ -119,6 +120,7 @@ static ssize_t node_read_meminfo(struct device *dev, #endif , nid, K(node_page_state(pgdat, NR_FILE_DIRTY)), + nid, K(node_page_state(pgdat, NR_METADATA_DIRTY)), nid, K(node_page_state(pgdat, NR_WRITEBACK)), nid, K(node_page_state(pgdat, NR_FILE_PAGES)), nid, K(node_page_state(pgdat, NR_FILE_MAPPED)), diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 56c8fda..d329f89 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -1809,6 +1809,7 @@ static unsigned long get_nr_dirty_pages(void) { return global_node_page_state(NR_FILE_DIRTY) + global_node_page_state(NR_UNSTABLE_NFS) + + global_node_page_state(NR_METADATA_DIRTY) + get_nr_dirty_inodes(); } diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c index 09e18fd..8ca094f 100644 --- a/fs/proc/meminfo.c +++ b/fs/proc/meminfo.c @@ -80,6 +80,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v) "SwapTotal: %8lu kB\n" "SwapFree: %8lu kB\n" "Dirty: %8lu kB\n" + "MetadataDirty: %8lu kB\n" "Writeback: %8lu kB\n" "AnonPages: %8lu kB\n" "Mapped: %8lu kB\n" @@ -139,6 +140,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v) K(i.totalswap), K(i.freeswap), K(global_node_page_state(NR_FILE_DIRTY)), + K(global_node_page_state(NR_METADATA_DIRTY)), K(global_node_page_state(NR_WRITEBACK)), K(global_node_page_state(NR_ANON_MAPPED)), K(global_node_page_state(NR_FILE_MAPPED)), diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h index 3f10307..1200aae 100644 --- a/include/linux/backing-dev-defs.h +++ b/include/linux/backing-dev-defs.h @@ -34,6 +34,7 @@ typedef int (congested_fn)(void *, int); enum wb_stat_item { WB_RECLAIMABLE, WB_WRITEBACK, + WB_METADATA_DIRTY, WB_DIRTIED, WB_WRITTEN, NR_WB_STAT_ITEMS diff --git a/include/linux/mm.h b/include/linux/mm.h index 08ed53e..5a3f626 100644
[PATCH 3/3] writeback: introduce super_operations->write_metadata
Now that we have metadata counters in the VM, we need to provide a way to kick
writeback on dirty metadata. Introduce super_operations->write_metadata. This
allows file systems to deal with writing back any dirty metadata we need based
on the writeback needs of the system. Since there is no inode to key off of we
need a list in the bdi for dirty super blocks to be added. From there we can
find any dirty sb's on the bdi we are currently doing writeback on and call into
their ->write_metadata callback.

Signed-off-by: Josef Bacik
---
 fs/fs-writeback.c                | 58 +---
 fs/super.c                       |  7 +
 include/linux/backing-dev-defs.h |  2 ++
 include/linux/fs.h               |  4 +++
 mm/backing-dev.c                 |  1 +
 5 files changed, 69 insertions(+), 3 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index d329f89..b7d8946 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1615,11 +1615,36 @@ static long writeback_sb_inodes(struct super_block *sb,
 	return wrote;
 }
+static long writeback_sb_metadata(struct super_block *sb,
+				  struct bdi_writeback *wb,
+				  struct wb_writeback_work *work)
+{
+	struct writeback_control wbc = {
+		.sync_mode		= work->sync_mode,
+		.tagged_writepages	= work->tagged_writepages,
+		.for_kupdate		= work->for_kupdate,
+		.for_background		= work->for_background,
+		.for_sync		= work->for_sync,
+		.range_cyclic		= work->range_cyclic,
+		.range_start		= 0,
+		.range_end		= LLONG_MAX,
+	};
+	long write_chunk;
+
+	write_chunk = writeback_chunk_size(wb, work);
+	wbc.nr_to_write = write_chunk;
+	sb->s_op->write_metadata(sb, &wbc);
+	work->nr_pages -= write_chunk - wbc.nr_to_write;
+
+	return write_chunk - wbc.nr_to_write;
+}
+
 static long __writeback_inodes_wb(struct bdi_writeback *wb,
 				  struct wb_writeback_work *work)
 {
 	unsigned long start_time = jiffies;
 	long wrote = 0;
+	bool done = false;
 	while (!list_empty(&wb->b_io)) {
 		struct inode *inode = wb_inode(wb->b_io.prev);
@@ -1639,11 +1664,38 @@ static long __writeback_inodes_wb(struct bdi_writeback *wb,
 		/* refer to the same tests at the end of writeback_sb_inodes */
 		if (wrote) {
-			if (time_is_before_jiffies(start_time + HZ / 10UL))
-				break;
-			if (work->nr_pages <= 0)
+			if (time_is_before_jiffies(start_time + HZ / 10UL) ||
+			    work->nr_pages <= 0) {
+				done = true;
 				break;
+			}
+		}
+	}
+
+	if (!done && wb_stat(wb, WB_METADATA_DIRTY)) {
+		LIST_HEAD(list);
+
+		spin_unlock(&wb->list_lock);
+		spin_lock(&wb->bdi->sb_list_lock);
+		list_splice_init(&wb->bdi->dirty_sb_list, &list);
+		while (!list_empty(&list)) {
+			struct super_block *sb;
+
+			sb = list_first_entry(&list, struct super_block,
+					      s_bdi_list);
+			list_move_tail(&sb->s_bdi_list,
+				       &wb->bdi->dirty_sb_list);
+			if (!sb->s_op->write_metadata)
+				continue;
+			if (!trylock_super(sb))
+				continue;
+			spin_unlock(&wb->bdi->sb_list_lock);
+			wrote += writeback_sb_metadata(sb, wb, work);
+			spin_lock(&wb->bdi->sb_list_lock);
+			up_read(&sb->s_umount);
 		}
+		spin_unlock(&wb->bdi->sb_list_lock);
+		spin_lock(&wb->list_lock);
 	}
 	/* Leave any unwritten inodes on b_io */
 	return wrote;
diff --git a/fs/super.c b/fs/super.c
index c2ff475..c1b1028 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -215,6 +215,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 	spin_lock_init(&s->s_inode_list_lock);
 	INIT_LIST_HEAD(&s->s_inodes_wb);
 	spin_lock_init(&s->s_inode_wblist_lock);
+	INIT_LIST_HEAD(&s->s_bdi_list);
 	if (list_lru_init_memcg(&s->s_dentry_lru))
 		goto fail;
@@ -305,6 +306,8 @@ void deactivate_locked_super(struct super_block *s)
 {
 	struct file_system_type *fs = s->s_type;
 	if (atomic_dec_and_test(&s->s_active)) {
+		struct backing_dev_info *bdi = s->s_bdi;
+
 		cleancache_invalidate_fs(s);
 		unregister_shrinker(&s->s_shrink);
 		fs->kill_sb(s);
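For readers following the accounting in writeback_sb_metadata(): the callback is handed a page budget in wbc.nr_to_write and is expected to decrement it, so the number of pages written is the budget minus whatever is left. Below is a minimal userspace sketch of just that arithmetic. The names wbc_model, write_metadata_fn, fake_write_metadata and model_writeback_sb_metadata are made up for illustration and are not kernel interfaces; the real callback signature is whatever the patch defines in include/linux/fs.h, which is not shown in this excerpt.

```c
/* Hypothetical stand-in for struct writeback_control: only the budget. */
struct wbc_model {
	long nr_to_write;	/* pages the callback may still write */
};

/* Hypothetical stand-in for a file system's ->write_metadata callback. */
typedef void (*write_metadata_fn)(struct wbc_model *wbc);

/* Example callback: pretends the fs holds 300 dirty metadata pages. */
void fake_write_metadata(struct wbc_model *wbc)
{
	long dirty = 300;
	long n = dirty < wbc->nr_to_write ? dirty : wbc->nr_to_write;

	wbc->nr_to_write -= n;	/* consume budget for each page "written" */
}

/*
 * Mirrors the arithmetic of writeback_sb_metadata() in the patch:
 * hand out a chunk, see how much of it was used, and charge that
 * amount against the remaining work.
 */
long model_writeback_sb_metadata(long *work_nr_pages, long write_chunk,
				 write_metadata_fn fn)
{
	struct wbc_model wbc = { .nr_to_write = write_chunk };
	long wrote;

	fn(&wbc);
	wrote = write_chunk - wbc.nr_to_write;
	*work_nr_pages -= wrote;
	return wrote;
}
```

With a chunk of 1024 pages and the example callback, 300 pages are reported as written and the remaining work shrinks by the same amount; with a chunk of 100, the callback is capped at 100.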
[PATCH 0/3][V2] Provide accounting for dirty metadata
Here is my updated set of patches for providing a way for fs'es to do their own
accounting for their dirty metadata pages. The changes since V1 include

-Split the accounting + ->write_metadata patches out into their own patches.
-Added a few more account_metadata* helpers that I hadn't thought about
 previously.
-Changed the bdi->sb_list to bdi->dirty_sb_list. This is to avoid confusion
 about the purpose of the list. I do a splice of this list when processing it
 as we have to drop the list lock and I didn't want to worry about umounts
 screwing up the list while we were writing metadata.
-Added the dirty metadata counter to the various places we output those
 counters (meminfo, oom messages, etc).

I've also actually changed btrfs to use these interfaces and have been testing
that code for almost a week now and have fixed up the various problems that
happened with V1 of this code. Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] remove mapping from balance_dirty_pages*()
The only reason we pass in the mapping is to get the inode in order to see if
writeback cgroups is enabled, and even then it only checks the bdi and a super
block flag. balance_dirty_pages() doesn't even use the mapping. Since
balance_dirty_pages*() works on a bdi level, just pass in the bdi and super
block directly so we can avoid using mapping. This will allow us to still use
balance_dirty_pages for dirty metadata pages that are not backed by an
address_mapping.

Signed-off-by: Josef Bacik
Reviewed-by: Jan Kara
---
 drivers/mtd/devices/block2mtd.c | 12
 fs/btrfs/disk-io.c              |  4 ++--
 fs/btrfs/file.c                 |  3 ++-
 fs/btrfs/ioctl.c                |  3 ++-
 fs/btrfs/relocation.c           |  3 ++-
 fs/buffer.c                     |  3 ++-
 fs/iomap.c                      |  3 ++-
 fs/ntfs/attrib.c                | 10 +++---
 fs/ntfs/file.c                  |  4 ++--
 include/linux/backing-dev.h     | 29 +++--
 include/linux/writeback.h       |  3 ++-
 mm/filemap.c                    |  4 +++-
 mm/memory.c                     |  9 +++--
 mm/page-writeback.c             | 15 +++
 14 files changed, 71 insertions(+), 34 deletions(-)

diff --git a/drivers/mtd/devices/block2mtd.c b/drivers/mtd/devices/block2mtd.c
index 7c887f1..7892d0b 100644
--- a/drivers/mtd/devices/block2mtd.c
+++ b/drivers/mtd/devices/block2mtd.c
@@ -52,7 +52,8 @@ static struct page *page_read(struct address_space *mapping, int index)
 /* erase a specified part of the device */
 static int _block2mtd_erase(struct block2mtd_dev *dev, loff_t to, size_t len)
 {
-	struct address_space *mapping = dev->blkdev->bd_inode->i_mapping;
+	struct inode *inode = dev->blkdev->bd_inode;
+	struct address_space *mapping = inode->i_mapping;
 	struct page *page;
 	int index = to >> PAGE_SHIFT;	// page index
 	int pages = len >> PAGE_SHIFT;
@@ -71,7 +72,8 @@ static int _block2mtd_erase(struct block2mtd_dev *dev, loff_t to, size_t len)
 				memset(page_address(page), 0xff, PAGE_SIZE);
 				set_page_dirty(page);
 				unlock_page(page);
-				balance_dirty_pages_ratelimited(mapping);
+				balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+								inode->i_sb);
 				break;
 			}
@@ -141,7 +143,8 @@ static int _block2mtd_write(struct block2mtd_dev *dev, const u_char *buf,
 		loff_t to, size_t len, size_t *retlen)
 {
 	struct page *page;
-	struct address_space *mapping = dev->blkdev->bd_inode->i_mapping;
+	struct inode *inode = dev->blkdev->bd_inode;
+	struct address_space *mapping = inode->i_mapping;
 	int index = to >> PAGE_SHIFT;	// page index
 	int offset = to & ~PAGE_MASK;	// page offset
 	int cpylen;
@@ -162,7 +165,8 @@ static int _block2mtd_write(struct block2mtd_dev *dev, const u_char *buf,
 			memcpy(page_address(page) + offset, buf, cpylen);
 			set_page_dirty(page);
 			unlock_page(page);
-			balance_dirty_pages_ratelimited(mapping);
+			balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+							inode->i_sb);
 		}
 		put_page(page);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 87dad55..4034ad6 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4024,8 +4024,8 @@ static void __btrfs_btree_balance_dirty(struct btrfs_root *root,
 	ret = percpu_counter_compare(&root->fs_info->dirty_metadata_bytes,
 				     BTRFS_DIRTY_METADATA_THRESH);
 	if (ret > 0) {
-		balance_dirty_pages_ratelimited(
-				   root->fs_info->btree_inode->i_mapping);
+		balance_dirty_pages_ratelimited(&root->fs_info->bdi,
+						root->fs_info->sb);
 	}
 }
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 9404121..f060b08 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1686,7 +1686,8 @@ again:
 		cond_resched();
-		balance_dirty_pages_ratelimited(inode->i_mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 		if (dirty_pages < (root->nodesize >> PAGE_SHIFT) + 1)
 			btrfs_btree_balance_dirty(root);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 14ed1e9..a222bad 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1410,7 +1410,8 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 		}
 		defrag_count +=
Re: [Regression/Behavior change]dm-flakey corrupt read bio, even the feature is drop_writes
On Mon, Aug 22 2016 at 4:05am -0400, Lukas Herbolt wrote:

> Hello,
>
> There is patch from Mike. It's part of current pull request to 4.8-rc1
> For more details check:
> - https://www.redhat.com/archives/dm-devel/2016-July/msg00561.html
> - https://www.redhat.com/archives/dm-devel/2016-August/msg00109.html
>
> Lukas
>
> On Mon, Aug 22, 2016 at 9:31 AM, Qu Wenruo wrote:
> > Hi, Mike and btrfs and dm guys
> >
> > When doing regression test on v4.8-rc1, we found that fstests/btrfs/056
> > always fails. With the following dmesg:
> > ---
> > Buffer I/O error on dev dm-0, logical block 1310704, async page read
> > Buffer I/O error on dev dm-0, logical block 16, async page read
> > Buffer I/O error on dev dm-0, logical block 16, async page read
> > ---
> >
> > And bisect leads to the following commits:
> > ---
> > commit 99f3c90d0d85708e7401a81ce3314e50bf7f2819
> > Author: Mike Snitzer
> > Date: Fri Jul 29 13:19:55 2016 -0400
> >
> > dm flakey: error READ bios during the down_interval
> > ---
> >
> > While according to the document of dm-flakey, it says that when using
> > drop_writes feature, read bios are not affected:
> > ---
> > drop_writes:
> > All write I/O is silently ignored.
> > Read I/O is handled correctly.

I went back to the dm-flakey.c code at the time that the 'drop_writes'
feature was added via commit b26f5e3d. It does confirm your
understanding of how reads should be handled if drop_writes is enabled.
Not sure why I thought differently.

Please try the following patch.

diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
index 97e446d..6a2e8dd 100644
--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -289,15 +289,13 @@ static int flakey_map(struct dm_target *ti, struct bio *bio)
 		pb->bio_submitted = true;
 		/*
-		 * Map reads as normal only if corrupt_bio_byte set.
+		 * Error reads if neither corrupt_bio_byte or drop_writes are set.
+		 * Otherwise, flakey_end_io() will decide if the reads should be modified.
 		 */
 		if (bio_data_dir(bio) == READ) {
-			/* If flags were specified, only corrupt those that match. */
-			if (fc->corrupt_bio_byte && (fc->corrupt_bio_rw == READ) &&
-			    all_corrupt_bio_flags_match(bio, fc))
-				goto map_bio;
-			else
+			if (!fc->corrupt_bio_byte && !test_bit(DROP_WRITES, &fc->flags))
 				return -EIO;
+			goto map_bio;
 		}
 		/*
@@ -334,14 +332,21 @@ static int flakey_end_io(struct dm_target *ti, struct bio *bio, int error)
 	struct flakey_c *fc = ti->private;
 	struct per_bio_data *pb = dm_per_bio_data(bio, sizeof(struct per_bio_data));
-	/*
-	 * Corrupt successful READs while in down state.
-	 */
 	if (!error && pb->bio_submitted && (bio_data_dir(bio) == READ)) {
-		if (fc->corrupt_bio_byte)
+		if (fc->corrupt_bio_byte && (fc->corrupt_bio_rw == READ) &&
+		    all_corrupt_bio_flags_match(bio, fc)) {
+			/*
+			 * Corrupt successful matching READs while in down state.
+			 */
 			corrupt_bio_data(bio, fc);
-		else
+
+		} else if (!test_bit(DROP_WRITES, &fc->flags)) {
+			/*
+			 * Error read during the down_interval if drop_writes
+			 * wasn't configured.
+			 */
 			return -EIO;
+		}
 	}
 	return error;
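To make the intended outcome of the patch above easier to check, here is a small userspace truth-table model of what happens to a READ that arrives during the down interval once the patch is applied. It is only a sketch: struct flakey_model, enum read_result and model_read_during_down() are invented for this illustration, the booleans stand in for the kernel conditions named in the comments, and the read is assumed to complete without a lower-level error.

```c
#include <stdbool.h>

/* Hypothetical model of the relevant flakey_c state; not the kernel struct. */
struct flakey_model {
	bool corrupt_bio_byte_set;	/* fc->corrupt_bio_byte != 0 */
	bool corrupt_rw_is_read;	/* fc->corrupt_bio_rw == READ */
	bool flags_match;		/* all_corrupt_bio_flags_match(bio, fc) */
	bool drop_writes;		/* test_bit(DROP_WRITES, &fc->flags) */
};

enum read_result {
	READ_OK,	/* data returned unmodified */
	READ_CORRUPTED,	/* data returned after corrupt_bio_data() */
	READ_EIO	/* read fails with -EIO */
};

/* Outcome of a READ submitted while the target is in its down interval. */
enum read_result model_read_during_down(const struct flakey_model *fc)
{
	/* flakey_map(): error out early if neither feature is configured. */
	if (!fc->corrupt_bio_byte_set && !fc->drop_writes)
		return READ_EIO;

	/* flakey_end_io(): corrupt only matching reads ... */
	if (fc->corrupt_bio_byte_set && fc->corrupt_rw_is_read &&
	    fc->flags_match)
		return READ_CORRUPTED;

	/* ... and error the rest unless drop_writes was configured. */
	if (!fc->drop_writes)
		return READ_EIO;

	return READ_OK;
}
```

The case that matters for btrfs/056 is drop_writes alone: in this model the read comes back as READ_OK, which matches the documented "Read I/O is handled correctly".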
Re: [dm-devel] [Regression/Behavior change]dm-flakey corrupt read bio, even the feature is drop_writes
Hi Qu,

Sorry for the confusion. Reading the email and the code again, it seems that
READs really are returned as -EIO if you set drop_writes. I just tested it
and you are right.

If I am reading the fstest correctly, the flakey device is created as:
---
flakey: 0 409600 flakey 8:64 0 0 180 1 drop_writes
---

I believe the READs are dropped because the table does not have any
corrupt_bio_byte flags set.
---
	if (bio_data_dir(bio) == READ) {
		/* If flags were specified, only corrupt those that match. */
		if (fc->corrupt_bio_byte && (fc->corrupt_bio_rw == READ) &&
		    all_corrupt_bio_flags_match(bio, fc))
			goto map_bio;
		else
			return -EIO;
	}
---

with conclusion of setting:
---
	/*
	 * Flag this bio as submitted while down.
	 */
	pb->bio_submitted = true;
---

I have a quick test patch ready, but it probably breaks more things than it
fixes, so I will continue working on it. Here it is just in case you want to
test it. The diff is against 4.8-rc1.

--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -292,6 +292,11 @@ static int flakey_map(struct dm_target *ti, struct bio *bio)
 		 * Map reads as normal only if corrupt_bio_byte set.
 		 */
 		if (bio_data_dir(bio) == READ) {
+			/* We should return all READS as ok in case of DROP WRITES flag is set. */
+			if (test_bit(DROP_WRITES, &fc->flags)) {
+				pb->bio_submitted = false;
+				goto map_bio;
+			}
 			/* If flags were specified, only corrupt those that match. */
 			if (fc->corrupt_bio_byte && (fc->corrupt_bio_rw == READ) &&
 			    all_corrupt_bio_flags_match(bio, fc))

On Mon, Aug 22, 2016 at 10:05 AM, Lukas Herbolt wrote:
> Hello,
>
> There is patch from Mike. It's part of current pull request to 4.8-rc1
> For more details check:
> - https://www.redhat.com/archives/dm-devel/2016-July/msg00561.html
> - https://www.redhat.com/archives/dm-devel/2016-August/msg00109.html
>
> Lukas
>
> On Mon, Aug 22, 2016 at 9:31 AM, Qu Wenruo wrote:
>> Hi, Mike and btrfs and dm guys
>>
>> When doing regression test on v4.8-rc1, we found that fstests/btrfs/056
>> always fails. With the following dmesg:
>> ---
>> Buffer I/O error on dev dm-0, logical block 1310704, async page read
>> Buffer I/O error on dev dm-0, logical block 16, async page read
>> Buffer I/O error on dev dm-0, logical block 16, async page read
>> ---
>>
>> And bisect leads to the following commits:
>> ---
>> commit 99f3c90d0d85708e7401a81ce3314e50bf7f2819
>> Author: Mike Snitzer
>> Date: Fri Jul 29 13:19:55 2016 -0400
>>
>> dm flakey: error READ bios during the down_interval
>> ---
>>
>> While according to the document of dm-flakey, it says that when using
>> drop_writes feature, read bios are not affected:
>> ---
>> drop_writes:
>> All write I/O is silently ignored.
>> Read I/O is handled correctly.
>> ---
>>
>> If I understand the word "correctly" correctly, it should means READ I/0 is
>> handled without problem.
>>
>> However with this commit, it also corrupt the read bio, leading to the test
>> failure.
>>
>>
>> At least there are two fixes available here;
>> 1) Fix fstest scripts
>>    The related macro is "_flakey_drop_and_remount yes", which will
>>    check the fs during the "drop_writes" time.
>>
>>    Currently, only btrfs/056 calls "_flakey_drop_and_remount" with
>>    "yes". So other test cases are not affected.
>>
>>    However, even we move the fsck outside of the "drop_writes" range,
>>    although test case can pass without problem, but we will still
>>    get a dmesg error:
>>    "Buffer I/O error on dev dm-0, logical block 1310704, async page read"
>>
>> 2) Revert to flakey behavior to allow READ bio
>>    Then everything is back to the old good days.
>>
>> Not sure which one is correct for current use case, as I'm not familiar with
>> dm codes.
>>
>> Any idea to fix dm-flaky and keep the READ bio behavior?
>>
>> Thanks,
>> Qu
>>
>> --
>> dm-devel mailing list
>> dm-de...@redhat.com
>> https://www.redhat.com/mailman/listinfo/dm-devel
>
> --
> Lukas Herbolt
> RHCE, RH436, BSc, SSc
> Senior Technical Support Engineer
> Global Support Services (GSS)
> Email:lherb...@redhat.com

--
Lukas Herbolt
RHCE, RH436, BSc, SSc
Senior Technical Support Engineer
Global Support Services (GSS)
Email:lherb...@redhat.com
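As a reading aid for the table line quoted in the message above ("0 409600 flakey 8:64 0 0 180 1 drop_writes"), the fields after the target name follow the dm-flakey parameter order: device, offset, up interval, down interval, number of feature arguments, then the features. The sketch below parses such a line in userspace. struct flakey_table, parse_flakey_table() and flakey_is_down() are hypothetical helpers written for this note, and the interval check is only my reading of how flakey_map() decides whether the target is down, so treat it as an assumption rather than the kernel's exact code.

```c
#include <stdio.h>

/* Hypothetical holder for one dm-flakey table line (single feature only). */
struct flakey_table {
	unsigned long long start;	/* logical start sector */
	unsigned long long len;		/* length in sectors */
	char target[16];		/* target type, "flakey" */
	char dev[32];			/* underlying device, e.g. "8:64" */
	unsigned long long offset;	/* offset into the underlying device */
	unsigned int up_interval;	/* seconds the device behaves normally */
	unsigned int down_interval;	/* seconds the device misbehaves */
	unsigned int num_features;	/* number of feature arguments */
	char feature[32];		/* first feature, e.g. "drop_writes" */
};

/* Returns 0 on success, -1 if the line does not have all nine fields. */
int parse_flakey_table(const char *line, struct flakey_table *t)
{
	int n = sscanf(line, "%llu %llu %15s %31s %llu %u %u %u %31s",
		       &t->start, &t->len, t->target, t->dev, &t->offset,
		       &t->up_interval, &t->down_interval, &t->num_features,
		       t->feature);

	return n == 9 ? 0 : -1;
}

/*
 * Assumed model of the interval check: the target is "down" whenever the
 * position inside the up+down cycle is at or past the up interval.
 */
int flakey_is_down(const struct flakey_table *t, unsigned long elapsed_sec)
{
	unsigned long cycle = (unsigned long)t->up_interval + t->down_interval;

	if (cycle == 0)
		return 0;
	return (elapsed_sec % cycle) >= t->up_interval;
}
```

Under that assumption, an up interval of 0 with a down interval of 180 means the device is in the down state for the whole test, which is why every READ ran into the -EIO path once commit 99f3c90d0d85 landed.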
Re: [dm-devel] [Regression/Behavior change]dm-flakey corrupt read bio, even the feature is drop_writes
Hello,

There is patch from Mike. It's part of current pull request to 4.8-rc1
For more details check:
- https://www.redhat.com/archives/dm-devel/2016-July/msg00561.html
- https://www.redhat.com/archives/dm-devel/2016-August/msg00109.html

Lukas

On Mon, Aug 22, 2016 at 9:31 AM, Qu Wenruo wrote:
> Hi, Mike and btrfs and dm guys
>
> When doing regression test on v4.8-rc1, we found that fstests/btrfs/056
> always fails. With the following dmesg:
> ---
> Buffer I/O error on dev dm-0, logical block 1310704, async page read
> Buffer I/O error on dev dm-0, logical block 16, async page read
> Buffer I/O error on dev dm-0, logical block 16, async page read
> ---
>
> And bisect leads to the following commits:
> ---
> commit 99f3c90d0d85708e7401a81ce3314e50bf7f2819
> Author: Mike Snitzer
> Date: Fri Jul 29 13:19:55 2016 -0400
>
> dm flakey: error READ bios during the down_interval
> ---
>
> While according to the document of dm-flakey, it says that when using
> drop_writes feature, read bios are not affected:
> ---
> drop_writes:
> All write I/O is silently ignored.
> Read I/O is handled correctly.
> ---
>
> If I understand the word "correctly" correctly, it should means READ I/0 is
> handled without problem.
>
> However with this commit, it also corrupt the read bio, leading to the test
> failure.
>
>
> At least there are two fixes available here;
> 1) Fix fstest scripts
>    The related macro is "_flakey_drop_and_remount yes", which will
>    check the fs during the "drop_writes" time.
>
>    Currently, only btrfs/056 calls "_flakey_drop_and_remount" with
>    "yes". So other test cases are not affected.
>
>    However, even we move the fsck outside of the "drop_writes" range,
>    although test case can pass without problem, but we will still
>    get a dmesg error:
>    "Buffer I/O error on dev dm-0, logical block 1310704, async page read"
>
> 2) Revert to flakey behavior to allow READ bio
>    Then everything is back to the old good days.
>
> Not sure which one is correct for current use case, as I'm not familiar with
> dm codes.
>
> Any idea to fix dm-flaky and keep the READ bio behavior?
>
> Thanks,
> Qu
>
> --
> dm-devel mailing list
> dm-de...@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

--
Lukas Herbolt
RHCE, RH436, BSc, SSc
Senior Technical Support Engineer
Global Support Services (GSS)
Email:lherb...@redhat.com
[Regression/Behavior change]dm-flakey corrupt read bio, even the feature is drop_writes
Hi, Mike and btrfs and dm guys

When doing regression tests on v4.8-rc1, we found that fstests/btrfs/056
always fails, with the following dmesg:
---
Buffer I/O error on dev dm-0, logical block 1310704, async page read
Buffer I/O error on dev dm-0, logical block 16, async page read
Buffer I/O error on dev dm-0, logical block 16, async page read
---

And bisect leads to the following commit:
---
commit 99f3c90d0d85708e7401a81ce3314e50bf7f2819
Author: Mike Snitzer
Date: Fri Jul 29 13:19:55 2016 -0400

dm flakey: error READ bios during the down_interval
---

However, the dm-flakey documentation says that when using the
drop_writes feature, read bios are not affected:
---
drop_writes:
All write I/O is silently ignored.
Read I/O is handled correctly.
---

If I understand the word "correctly" correctly, it should mean READ I/O is
handled without problem.

However, with this commit the read bios fail as well, leading to the test
failure.

There are at least two fixes available here:
1) Fix fstest scripts
   The related macro is "_flakey_drop_and_remount yes", which will
   check the fs during the "drop_writes" time.

   Currently, only btrfs/056 calls "_flakey_drop_and_remount" with
   "yes". So other test cases are not affected.

   However, even if we move the fsck outside of the "drop_writes" range,
   the test case can pass without problem, but we will still
   get a dmesg error:
   "Buffer I/O error on dev dm-0, logical block 1310704, async page read"

2) Revert to the old flakey behavior to allow READ bios
   Then everything is back to the good old days.

Not sure which one is correct for the current use case, as I'm not familiar
with the dm code.

Any idea how to fix dm-flakey and keep the READ bio behavior?

Thanks,
Qu