Re: Will Btrfs have an official command to "uncow" existing files?

2016-08-22 Thread Dave Chinner
On Mon, Aug 22, 2016 at 04:06:13PM -0700, Darrick J. Wong wrote:
> [add Dave and Christoph to cc]
> 
> On Mon, Aug 22, 2016 at 04:14:19PM -0400, Jeff Mahoney wrote:
> > On 8/21/16 2:59 PM, Tomokhov Alexander wrote:
> > > Btrfs wiki FAQ gives a link to example Python script: 
> > > https://github.com/stsquad/scripts/blob/master/uncow.py
> > > 
> > > But such a crucial and fundamental tool must exist in stock btrfs-progs. 
> > > Filesystem with CoW technology at it's core must provide user sufficient 
> > > control over CoW aspects. Running 3rd-party or manually written scripts 
> > > for filesystem properties/metadata manipulation is not convenient, not 
> > > safe and definitely not the way it must be done.
> > > 
> > > Also is it possible (at least in theory) to "uncow" files being currently 
> > > opened in-place? Without the trickery with creation & renaming of files 
> > > or directories. So that running "chattr +C" on a file would be 
> > > sufficient. If possible, is it going to be implemented?
> > 
> > XFS is looking to do this via fallocate using a flag that all file
> > systems can choose to honor.  Once that lands, it would make sense for
> > btrfs to use it as well.  The idea is that when you pass the flag in, we
> > examine the range and CoW anything that has a refcount != 1.
> 
> There /was/ a flag to do that -- FALLOC_FL_UNSHARE_RANGE.  However,
> Christoph and Dave felt[1] that the fallocate call didn't need to have
> an explicit 'unshare' mode because unsharing shared blocks is
> necessary to guarantee that a subsequent write will not ENOSPC.  I
> felt that was sufficient justification to withdraw the unshare mode
> flag.  If you fallocate the entire length of a shared file on XFS, it
> will turn off CoW for that file until you reflink/dedupe it again.

From the XFS POV that's all good...

> At the time I wondered whether or not the btrfs developers (the list
> was cc'd) would pipe up in support of the unshare flag, but nobody
> did.  Consequently it remains nonexistent.  Christoph commented a few
> months ago about unsharing fallocate over NFS atop XFS blocking for a
> long time, though nobody asked for 'unshare' to be reinstated as a
> separate fallocate mode, much less a 'don't unshare' flag for regular
> fallocate mode.

If there are other use cases, then we can easily implement it in
XFS. However, let's not overload the XFS reflink code with things
other fs devs have once said "that'd be nice to do".

> (FWIW I'm ok with not having to fight for more VFS changes. :))
> 
> > That code hasn't landed yet though.  The last time I saw it posted was
> > June.  I don't speak with knowledge of the integration plan, but it
> > might just be queued up for the next merge window now that the reverse
> > mapping patches have landed in 4.8.
> 
> I am going to try to land XFS reflink in 4.9; I hope to have an eighth
> patchset out for review at the end of the week.
> 
> So... if the btrfs folks really want an unshare flag I can trivially
> re-add it to the VFS headers and re-enable it in the XFS
> implementation  but y'all better speak up now and hammer out an
> acceptable definition.  I don't think XFS needs a new flag.

It's not urgent - it can be added at any time, so I'd say it's
something we should ignore on the XFS side of things until someone
actually requires an explicit "unshare" operation for another
filesystem.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Will Btrfs have an official command to "uncow" existing files?

2016-08-22 Thread Chris Murphy
On Mon, Aug 22, 2016 at 5:06 PM, Darrick J. Wong wrote:
> [add Dave and Christoph to cc]
>
> On Mon, Aug 22, 2016 at 04:14:19PM -0400, Jeff Mahoney wrote:
>> On 8/21/16 2:59 PM, Tomokhov Alexander wrote:
>> > Btrfs wiki FAQ gives a link to example Python script: 
>> > https://github.com/stsquad/scripts/blob/master/uncow.py
>> >
>> > But such a crucial and fundamental tool must exist in stock btrfs-progs. 
>> > Filesystem with CoW technology at it's core must provide user sufficient 
>> > control over CoW aspects. Running 3rd-party or manually written scripts 
>> > for filesystem properties/metadata manipulation is not convenient, not 
>> > safe and definitely not the way it must be done.
>> >
>> > Also is it possible (at least in theory) to "uncow" files being currently 
>> > opened in-place? Without the trickery with creation & renaming of files or 
>> > directories. So that running "chattr +C" on a file would be sufficient. If 
>> > possible, is it going to be implemented?
>>
>> XFS is looking to do this via fallocate using a flag that all file
>> systems can choose to honor.  Once that lands, it would make sense for
>> btrfs to use it as well.  The idea is that when you pass the flag in, we
>> examine the range and CoW anything that has a refcount != 1.
>
> There /was/ a flag to do that -- FALLOC_FL_UNSHARE_RANGE.  However,
> Christoph and Dave felt[1] that the fallocate call didn't need to have
> an explicit 'unshare' mode because unsharing shared blocks is
> necessary to guarantee that a subsequent write will not ENOSPC.  I
> felt that was sufficient justification to withdraw the unshare mode
> flag.  If you fallocate the entire length of a shared file on XFS, it
> will turn off CoW for that file until you reflink/dedupe it again.
>
> At the time I wondered whether or not the btrfs developers (the list
> was cc'd) would pipe up in support of the unshare flag, but nobody
> did.  Consequently it remains nonexistent.  Christoph commented a few
> months ago about unsharing fallocate over NFS atop XFS blocking for a
> long time, though nobody asked for 'unshare' to be reinstated as a
> separate fallocate mode, much less a 'don't unshare' flag for regular
> fallocate mode.
>
> (FWIW I'm ok with not having to fight for more VFS changes. :))
>
>> That code hasn't landed yet though.  The last time I saw it posted was
>> June.  I don't speak with knowledge of the integration plan, but it
>> might just be queued up for the next merge window now that the reverse
>> mapping patches have landed in 4.8.
>
> I am going to try to land XFS reflink in 4.9; I hope to have an eighth
> patchset out for review at the end of the week.
>
> So... if the btrfs folks really want an unshare flag I can trivially
> re-add it to the VFS headers and re-enable it in the XFS
> implementation  but y'all better speak up now and hammer out an
> acceptable definition.  I don't think XFS needs a new flag.

Use-case wise, I can't think of why I'd want to do unshare. There is a
use case for wanting to set nocow after the fact. I have no idea what
complexity either operation adds on the Btrfs side. At the least, to
set nocow, data csums need a way to be ignored or removed. Conversely,
to unset nocow, the question is whether the file should then have
csums computed; strictly speaking, I guess you could have cow without
datacsum.


-- 
Chris Murphy


Re: possible recursive locking detected, 4.8.0-0.rc3.git0.1.fc25.x86_64+debug

2016-08-22 Thread Chris Murphy
On Mon, Aug 22, 2016 at 6:26 PM, Jeff Mahoney wrote:
> On 8/22/16 6:51 PM, Chris Murphy wrote:
>> Trivially reproducible every boot, shortly after mount happens. Also
>> happened with rc2.
>>
>
> Yep.  We've disabled this in our kernels.  It can actually deadlock.

Interesting. I've only gotten scary looking messages so far.




-- 
Chris Murphy


Re: possible recursive locking detected, 4.8.0-0.rc3.git0.1.fc25.x86_64+debug

2016-08-22 Thread Jeff Mahoney
On 8/22/16 6:51 PM, Chris Murphy wrote:
> Trivially reproducible every boot, shortly after mount happens. Also
> happened with rc2.
> 

Yep.  We've disabled this in our kernels.  It can actually deadlock.

-Jeff

> 
> [   13.225891] virbr0: port 1(virbr0-nic) entered blocking state
> [   13.225895] virbr0: port 1(virbr0-nic) entered listening state
> [   13.299806] virbr0: port 1(virbr0-nic) entered disabled state
> 
> [   13.309179] =============================================
> [   13.309181] [ INFO: possible recursive locking detected ]
> [   13.309182] 4.8.0-0.rc3.git0.1.fc25.x86_64+debug #1 Not tainted
> [   13.309183] ---------------------------------------------
> [   13.309185] libvirt_leasesh/1174 is trying to acquire lock:
> [   13.309186]  (>log_mutex){+.+...}, at: [] btrfs_log_inode+0x162/0x10f0 [btrfs]
> [   13.309212]
>                but task is already holding lock:
> [   13.309213]  (>log_mutex){+.+...}, at: [] btrfs_log_inode+0x162/0x10f0 [btrfs]
> [   13.309229]
>                other info that might help us debug this:
> [   13.309230]  Possible unsafe locking scenario:
> 
> [   13.309231]        CPU0
> [   13.309232]        ----
> [   13.309233]   lock(>log_mutex);
> [   13.309235]   lock(>log_mutex);
> [   13.309237]
>                 *** DEADLOCK ***
> 
> [   13.309238]  May be due to missing lock nesting notation
> 
> [   13.309240] 6 locks held by libvirt_leasesh/1174:
> [   13.309241]  #0:  (sb_writers#8){.+.+.+}, at: [] __sb_start_write+0xb4/0xf0
> [   13.309247]  #1:  (>i_mutex_dir_key#3/1){+.+.+.}, at: [] lock_rename+0xda/0x100
> [   13.309252]  #2:  (>s_type->i_mutex_key#14){+.+.+.}, at: [] lock_two_nondirectories+0x3e/0x70
> [   13.309258]  #3:  (>s_type->i_mutex_key#14/4){+.+...}, at: [] lock_two_nondirectories+0x66/0x70
> [   13.309263]  #4:  (sb_internal){.+.+.+}, at: [] __sb_start_write+0x78/0xf0
> [   13.309266]  #5:  (>log_mutex){+.+...}, at: [] btrfs_log_inode+0x162/0x10f0 [btrfs]
> [   13.309282]
>                stack backtrace:
> [   13.309284] CPU: 2 PID: 1174 Comm: libvirt_leasesh Not tainted 4.8.0-0.rc3.git0.1.fc25.x86_64+debug #1
> [   13.309285] Hardware name: Apple Inc. MacBookPro8,2/Mac-94245A3940C91C80, BIOS MBP81.88Z.0047.B2C.1510261540 10/26/15
> [   13.309287]  0086 03b4acd5 891b0a1d77a0 9f466723
> [   13.309290]  a0b07910 891adcf44000 891b0a1d7868 9f10f01e
> [   13.309294]  dcf44a78 0006 b967c054 a0408900
> [   13.309297] Call Trace:
> [   13.309301]  [] dump_stack+0x86/0xc3
> [   13.309303]  [] __lock_acquire+0x78e/0x1290
> [   13.309306]  [] ? sched_clock+0x9/0x10
> [   13.309309]  [] ? sched_clock_cpu+0xa7/0xc0
> [   13.309312]  [] ? mutex_unlock+0xe/0x10
> [   13.309314]  [] lock_acquire+0xf6/0x1f0
> [   13.309326]  [] ? btrfs_log_inode+0x162/0x10f0 [btrfs]
> [   13.309328]  [] mutex_lock_nested+0x86/0x3f0
> [   13.309340]  [] ? btrfs_log_inode+0x162/0x10f0 [btrfs]
> [   13.309341]  [] ? mutex_unlock+0xe/0x10
> [   13.309353]  [] ? btrfs_log_inode+0x162/0x10f0 [btrfs]
> [   13.309365]  [] btrfs_log_inode+0x162/0x10f0 [btrfs]
> [   13.309368]  [] ? __might_sleep+0x49/0x80
> [   13.309380]  [] btrfs_log_inode+0xc8c/0x10f0 [btrfs]
> [   13.309382]  [] ? sched_clock+0x9/0x10
> [   13.309394]  [] btrfs_log_inode_parent+0x240/0x940 [btrfs]
> [   13.309396]  [] ? _raw_spin_unlock+0x27/0x40
> [   13.309408]  [] ? btrfs_update_inode+0xda/0x110 [btrfs]
> [   13.309420]  [] btrfs_log_new_name+0x71/0x90 [btrfs]
> [   13.309432]  [] btrfs_rename2+0x1090/0x17a0 [btrfs]
> [   13.309434]  [] ? debug_lockdep_rcu_enabled+0x1d/0x20
> [   13.309437]  [] ? lock_two_nondirectories+0x66/0x70
> [   13.309439]  [] vfs_rename+0x5c2/0x970
> [   13.309441]  [] ? legitimize_path.isra.34+0x20/0x60
> [   13.309443]  [] SyS_rename+0x3a7/0x3d0
> [   13.309445]  [] entry_SYSCALL_64_fastpath+0x1f/0xbd
> [   13.318819] device virbr0-nic left promiscuous mode
> [   13.318854] virbr0: port 1(virbr0-nic) entered disabled state
> 


-- 
Jeff Mahoney
SUSE Labs





Re: Will Btrfs have an official command to "uncow" existing files?

2016-08-22 Thread Tomokhov Alexander
Thanks for the in-depth answer.

Well, a "simple enough process" is still a sequence of steps that must be 
carried out carefully, in the proper order and with the correct parameters 
for the environment. It is work with data, and that data can be invaluable.
No, really, I'm not a beginner. I have used Arch Linux every day since 2009, 
program in different languages and so on. But even a few ordered manual 
commands to change a file attribute, involving the quite dangerous "mv" and 
"cp" (the overwrite case), look very suspicious to me.

"2a" - this step depends on whether another filesystem even exists, better if 
it is free, has enough space, supports the same file permission features, etc. 
It takes time to check these conditions, and it is not suitable for all systems.

"2b" - the step with "mv out". Out to where? What if a file with the same name 
already exists in the destination directory? Not reliable. OK, then I need 
to create a temporary directory. Where, and what should it be called? That 
involves conditional checks performed by the user.

Similarly, creating the empty file requires that its name be unique in the 
directory.

Additionally, all existing ways to "uncow" require a manual free-space check 
beforehand. The user must also check that the file is not currently open. 
I'm sure I have missed something else.
These are all problems unrelated to the file attribute itself, yet the user 
has to think about them. An official specialized tool could 
check all these conditions automatically, perform the right sequence of actions 
and report the results to the user.
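
A minimal sketch of the checks listed above, written in Python like the uncow.py script linked from the wiki. It is not that script and not part of btrfs-progs; the names `uncow`, `set_nocow` and the `.uncow-` prefix are hypothetical. It assumes the `chattr` command is available, that NOCOW only takes effect when set on a still-empty file, and that nothing else has the file open (the sketch does not detect open files, and it does not preserve ownership).

```python
import os
import shutil
import subprocess
import tempfile


def set_nocow(path):
    """Set the NOCOW attribute with the chattr CLI (assumed to be installed)."""
    subprocess.run(["chattr", "+C", path], check=True)


def uncow(path, mark_nocow=set_nocow):
    """Hypothetical helper: rewrite `path` as a newly created NOCOW file."""
    path = os.path.abspath(path)
    directory = os.path.dirname(path)
    size = os.path.getsize(path)

    # Free-space check: a full second copy of the data is needed temporarily.
    if shutil.disk_usage(directory).free < size:
        raise OSError("not enough free space to rewrite " + path)

    # mkstemp creates a file whose name did not previously exist in the directory.
    fd, tmp = tempfile.mkstemp(prefix=".uncow-", dir=directory)
    try:
        with os.fdopen(fd, "wb") as dst:
            mark_nocow(tmp)  # must happen while the new file is still empty
            with open(path, "rb") as src:
                shutil.copyfileobj(src, dst)  # plain read/write copy, no reflink
            dst.flush()
            os.fsync(dst.fileno())
        shutil.copystat(path, tmp)  # carry over mode bits and timestamps
        os.replace(tmp, path)  # atomic rename over the original
    except BaseException:
        # Leave the original untouched and remove the partial copy.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Calling `uncow("/path/to/places.sqlite")` would replace the file only after the copy has been written and synced; on any failure the original is left in place. The `mark_nocow` parameter exists so the attribute step can be swapped out, for example on a filesystem where `chattr +C` is not supported.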

Yes, I do take into consideration that there are situations when "uncow" cannot 
actually be applied to a file for the reasons you described. There are no 
snapshots at the moment in my case and, for example, I have a Firefox SQLite 
database file with 900+ extents on a rotational disk. I wouldn't say it's 
noticeable, but at the least I want the number of extents not to increase 
further, so that I never notice it. I admit that Btrfs may defragment it, but 
it may not. Sometimes we need a more controllable approach.

22.08.2016, 05:00, "Duncan" <1i5t5.dun...@cox.net>:
> Tomokhov Alexander posted on Sun, 21 Aug 2016 21:59:36 +0300 as excerpted:
>
>>  Btrfs wiki FAQ gives a link to example Python script:
>>  https://github.com/stsquad/scripts/blob/master/uncow.py
>>
>>  But such a crucial and fundamental tool must exist in stock btrfs-progs.
>>  Filesystem with CoW technology at it's core must provide user sufficient
>>  control over CoW aspects. Running 3rd-party or manually written scripts
>>  for filesystem properties/metadata manipulation is not convenient, not
>>  safe and definitely not the way it must be done.
>
> Why? No script or dedicated tool needed as it's a simple enough process.
>
> Simply:
>
> 1. chattr +C (that being nocow) the containing directory.
>
> Then either:
>
> 2a. mv the file to and from another filesystem, so it's actually created
> new in the directory and thus inherits the nocow attribute at file
> creation,
>
> or
>
> 2b. mv out and then cp the file back into place with --reflink=never,
> again, forcing the file to be created new in the directory, so it
> inherits the nocow attribute at creation,
>
> OR (replacing both steps above)
>
> Create the empty file (using touch or similar), set it nocow, and use cat
> srcfile >> destfile style redirection to fill it, so the file again gets
> the nocow attribute set before it has content, but allowing you to set
> only the file nocow, without setting the containing directory nocow.
>
> Of course there's no exception here to the general case, if you're doing
> the same thing to a whole bunch of files, setting up a script to do it
> may be more efficient than doing it to each one manually one by one, and
> a script could be useful there, but that's a general rule, nothing
> exceptional for btrfs nocow, and a script or fancy tool isn't actually
> required, regardless.
>
> The point being, cow is the default case, and should work /reasonably/
> well in most cases, certainly well enough so that normal people doing
> normal things shouldn't need to worry about it. The only people who will
> need to worry about it, therefore, are people worried about the last bit
> of optimization possible to various corner-case use-cases that don't
> match default assumptions very well, and it's precisely these sorts of
> people that are /technical/ enough to be able to understand both why they
> might want nocow (and what the positives and negatives are going to be),
> and how to actually get it.
>
>>  Also is it possible (at least in theory) to "uncow" files being
>>  currently opened in-place? Without the trickery with creation & renaming
>>  of files or directories. So that running "chattr +C" on a file would be
>>  sufficient. If possible, is it going to be implemented?
>
> It's software. Of course it's possible, tho it's also possible the
> negatives make it not worth the trouble. If the implementation simply
> creates a new file and 

Re: Will Btrfs have an official command to "uncow" existing files?

2016-08-22 Thread Tomokhov Alexander
Oh, I didn't know that XFS is going to get many of Btrfs's features and 
continues to evolve. Thank you for the answer.

22.08.2016, 23:14, "Jeff Mahoney":
> On 8/21/16 2:59 PM, Tomokhov Alexander wrote:
>>  Btrfs wiki FAQ gives a link to example Python script: 
>> https://github.com/stsquad/scripts/blob/master/uncow.py
>>
>>  But such a crucial and fundamental tool must exist in stock btrfs-progs. 
>> Filesystem with CoW technology at it's core must provide user sufficient 
>> control over CoW aspects. Running 3rd-party or manually written scripts for 
>> filesystem properties/metadata manipulation is not convenient, not safe and 
>> definitely not the way it must be done.
>>
>>  Also is it possible (at least in theory) to "uncow" files being currently 
>> opened in-place? Without the trickery with creation & renaming of files or 
>> directories. So that running "chattr +C" on a file would be sufficient. If 
>> possible, is it going to be implemented?
>
> XFS is looking to do this via fallocate using a flag that all file
> systems can choose to honor. Once that lands, it would make sense for
> btrfs to use it as well. The idea is that when you pass the flag in, we
> examine the range and CoW anything that has a refcount != 1.
>
> That code hasn't landed yet though. The last time I saw it posted was
> June. I don't speak with knowledge of the integration plan, but it
> might just be queued up for the next merge window now that the reverse
> mapping patches have landed in 4.8.
>
> -Jeff
>
> --
> Jeff Mahoney
> SUSE Labs


Re: Will Btrfs have an official command to "uncow" existing files?

2016-08-22 Thread Darrick J. Wong
[add Dave and Christoph to cc]

On Mon, Aug 22, 2016 at 04:14:19PM -0400, Jeff Mahoney wrote:
> On 8/21/16 2:59 PM, Tomokhov Alexander wrote:
> > Btrfs wiki FAQ gives a link to example Python script: 
> > https://github.com/stsquad/scripts/blob/master/uncow.py
> > 
> > But such a crucial and fundamental tool must exist in stock btrfs-progs. 
> > Filesystem with CoW technology at it's core must provide user sufficient 
> > control over CoW aspects. Running 3rd-party or manually written scripts for 
> > filesystem properties/metadata manipulation is not convenient, not safe and 
> > definitely not the way it must be done.
> > 
> > Also is it possible (at least in theory) to "uncow" files being currently 
> > opened in-place? Without the trickery with creation & renaming of files or 
> > directories. So that running "chattr +C" on a file would be sufficient. If 
> > possible, is it going to be implemented?
> 
> XFS is looking to do this via fallocate using a flag that all file
> systems can choose to honor.  Once that lands, it would make sense for
> btrfs to use it as well.  The idea is that when you pass the flag in, we
> examine the range and CoW anything that has a refcount != 1.

There /was/ a flag to do that -- FALLOC_FL_UNSHARE_RANGE.  However,
Christoph and Dave felt[1] that the fallocate call didn't need to have
an explicit 'unshare' mode because unsharing shared blocks is
necessary to guarantee that a subsequent write will not ENOSPC.  I
felt that was sufficient justification to withdraw the unshare mode
flag.  If you fallocate the entire length of a shared file on XFS, it
will turn off CoW for that file until you reflink/dedupe it again.
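
As a rough illustration of "fallocate the entire length" from userspace, here is a minimal Python sketch using `os.posix_fallocate`, which requests a default-mode allocation over a byte range. The function name `fallocate_whole_file` is hypothetical. Whether the call actually unshares reflinked blocks depends on the filesystem and kernel; the behaviour described above is for XFS with the reflink patches, and nothing here should be read as a statement about Btrfs.

```python
import os


def fallocate_whole_file(path):
    """Request allocation of every byte of `path` using the default fallocate mode."""
    fd = os.open(path, os.O_RDWR)
    try:
        size = os.fstat(fd).st_size
        if size > 0:  # posix_fallocate rejects a zero length
            os.posix_fallocate(fd, 0, size)
        return size
    finally:
        os.close(fd)
```

The file's length and contents are unchanged by the call; only the block allocation behind the range may change.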

At the time I wondered whether or not the btrfs developers (the list
was cc'd) would pipe up in support of the unshare flag, but nobody
did.  Consequently it remains nonexistent.  Christoph commented a few
months ago about unsharing fallocate over NFS atop XFS blocking for a
long time, though nobody asked for 'unshare' to be reinstated as a
separate fallocate mode, much less a 'don't unshare' flag for regular
fallocate mode.

(FWIW I'm ok with not having to fight for more VFS changes. :))

> That code hasn't landed yet though.  The last time I saw it posted was
> June.  I don't speak with knowledge of the integration plan, but it
> might just be queued up for the next merge window now that the reverse
> mapping patches have landed in 4.8.

I am going to try to land XFS reflink in 4.9; I hope to have an eighth
patchset out for review at the end of the week.

So... if the btrfs folks really want an unshare flag I can trivially
re-add it to the VFS headers and re-enable it in the XFS
implementation  but y'all better speak up now and hammer out an
acceptable definition.  I don't think XFS needs a new flag.

--D

[1] https://www.spinics.net/lists/linux-nfs/msg54740.html

> 
> -Jeff
> 
> -- 
> Jeff Mahoney
> SUSE Labs
> 





possible recursive locking detected, 4.8.0-0.rc3.git0.1.fc25.x86_64+debug

2016-08-22 Thread Chris Murphy
Trivially reproducible every boot, shortly after mount happens. Also
happened with rc2.



[   13.225891] virbr0: port 1(virbr0-nic) entered blocking state
[   13.225895] virbr0: port 1(virbr0-nic) entered listening state
[   13.299806] virbr0: port 1(virbr0-nic) entered disabled state

[   13.309179] =============================================
[   13.309181] [ INFO: possible recursive locking detected ]
[   13.309182] 4.8.0-0.rc3.git0.1.fc25.x86_64+debug #1 Not tainted
[   13.309183] ---------------------------------------------
[   13.309185] libvirt_leasesh/1174 is trying to acquire lock:
[   13.309186]  (>log_mutex){+.+...}, at: [] btrfs_log_inode+0x162/0x10f0 [btrfs]
[   13.309212]
               but task is already holding lock:
[   13.309213]  (>log_mutex){+.+...}, at: [] btrfs_log_inode+0x162/0x10f0 [btrfs]
[   13.309229]
               other info that might help us debug this:
[   13.309230]  Possible unsafe locking scenario:

[   13.309231]        CPU0
[   13.309232]        ----
[   13.309233]   lock(>log_mutex);
[   13.309235]   lock(>log_mutex);
[   13.309237]
                *** DEADLOCK ***

[   13.309238]  May be due to missing lock nesting notation

[   13.309240] 6 locks held by libvirt_leasesh/1174:
[   13.309241]  #0:  (sb_writers#8){.+.+.+}, at: [] __sb_start_write+0xb4/0xf0
[   13.309247]  #1:  (>i_mutex_dir_key#3/1){+.+.+.}, at: [] lock_rename+0xda/0x100
[   13.309252]  #2:  (>s_type->i_mutex_key#14){+.+.+.}, at: [] lock_two_nondirectories+0x3e/0x70
[   13.309258]  #3:  (>s_type->i_mutex_key#14/4){+.+...}, at: [] lock_two_nondirectories+0x66/0x70
[   13.309263]  #4:  (sb_internal){.+.+.+}, at: [] __sb_start_write+0x78/0xf0
[   13.309266]  #5:  (>log_mutex){+.+...}, at: [] btrfs_log_inode+0x162/0x10f0 [btrfs]
[   13.309282]
               stack backtrace:
[   13.309284] CPU: 2 PID: 1174 Comm: libvirt_leasesh Not tainted 4.8.0-0.rc3.git0.1.fc25.x86_64+debug #1
[   13.309285] Hardware name: Apple Inc. MacBookPro8,2/Mac-94245A3940C91C80, BIOS MBP81.88Z.0047.B2C.1510261540 10/26/15
[   13.309287]  0086 03b4acd5 891b0a1d77a0 9f466723
[   13.309290]  a0b07910 891adcf44000 891b0a1d7868 9f10f01e
[   13.309294]  dcf44a78 0006 b967c054 a0408900
[   13.309297] Call Trace:
[   13.309301]  [] dump_stack+0x86/0xc3
[   13.309303]  [] __lock_acquire+0x78e/0x1290
[   13.309306]  [] ? sched_clock+0x9/0x10
[   13.309309]  [] ? sched_clock_cpu+0xa7/0xc0
[   13.309312]  [] ? mutex_unlock+0xe/0x10
[   13.309314]  [] lock_acquire+0xf6/0x1f0
[   13.309326]  [] ? btrfs_log_inode+0x162/0x10f0 [btrfs]
[   13.309328]  [] mutex_lock_nested+0x86/0x3f0
[   13.309340]  [] ? btrfs_log_inode+0x162/0x10f0 [btrfs]
[   13.309341]  [] ? mutex_unlock+0xe/0x10
[   13.309353]  [] ? btrfs_log_inode+0x162/0x10f0 [btrfs]
[   13.309365]  [] btrfs_log_inode+0x162/0x10f0 [btrfs]
[   13.309368]  [] ? __might_sleep+0x49/0x80
[   13.309380]  [] btrfs_log_inode+0xc8c/0x10f0 [btrfs]
[   13.309382]  [] ? sched_clock+0x9/0x10
[   13.309394]  [] btrfs_log_inode_parent+0x240/0x940 [btrfs]
[   13.309396]  [] ? _raw_spin_unlock+0x27/0x40
[   13.309408]  [] ? btrfs_update_inode+0xda/0x110 [btrfs]
[   13.309420]  [] btrfs_log_new_name+0x71/0x90 [btrfs]
[   13.309432]  [] btrfs_rename2+0x1090/0x17a0 [btrfs]
[   13.309434]  [] ? debug_lockdep_rcu_enabled+0x1d/0x20
[   13.309437]  [] ? lock_two_nondirectories+0x66/0x70
[   13.309439]  [] vfs_rename+0x5c2/0x970
[   13.309441]  [] ? legitimize_path.isra.34+0x20/0x60
[   13.309443]  [] SyS_rename+0x3a7/0x3d0
[   13.309445]  [] entry_SYSCALL_64_fastpath+0x1f/0xbd
[   13.318819] device virbr0-nic left promiscuous mode
[   13.318854] virbr0: port 1(virbr0-nic) entered disabled state

-- 
Chris Murphy


Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-08-22 Thread Ronan Arraes Jardim Chagas
On Mon, 2016-08-22 at 14:49 -0600, Chris Murphy wrote:
> This is really weird. I'm running 4.7.0 (Fedora) and I'm not
> experiencing problems, let alone this. What is this kernel's
> provenance? Is it a plain mainline 4.7.0 that you built? I'm not
> really sure what to recommend except maybe going back to 4.5.7 or
> 4.6.7 as it's a production machine. Heck even 4.4.19 is OK for me in
> this regard.
> 

Well, I'm using the default openSUSE kernel here, and I have been seeing
these errors for some time. When I reported it, I was using v4.6.1.
Hence, I think the version of btrfs-progs is not the problem.


Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-08-22 Thread Chris Murphy
On Mon, Aug 22, 2016 at 2:39 PM, Ronan Arraes Jardim Chagas wrote:
> The same thing just happened again! And now it was also fixed
> automatically, but now I have:
>
> Metadata,DUP: Size:33.50GiB, Used:812.78MiB

This is really weird. I'm running 4.7.0 (Fedora) and I'm not
experiencing problems, let alone this. What is this kernel's
provenance? Is it a plain mainline 4.7.0 that you built? I'm not
really sure what to recommend except maybe going back to 4.5.7 or
4.6.7 as it's a production machine. Heck even 4.4.19 is OK for me in
this regard.

-- 
Chris Murphy


Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-08-22 Thread Ronan Arraes Jardim Chagas
The same thing just happened again! And now it was also fixed
automatically, but now I have:

Metadata,DUP: Size:33.50GiB, Used:812.78MiB


Re: Will Btrfs have an official command to "uncow" existing files?

2016-08-22 Thread Jeff Mahoney
On 8/21/16 2:59 PM, Tomokhov Alexander wrote:
> Btrfs wiki FAQ gives a link to example Python script: 
> https://github.com/stsquad/scripts/blob/master/uncow.py
> 
> But such a crucial and fundamental tool must exist in stock btrfs-progs. 
> Filesystem with CoW technology at it's core must provide user sufficient 
> control over CoW aspects. Running 3rd-party or manually written scripts for 
> filesystem properties/metadata manipulation is not convenient, not safe and 
> definitely not the way it must be done.
> 
> Also is it possible (at least in theory) to "uncow" files being currently 
> opened in-place? Without the trickery with creation & renaming of files or 
> directories. So that running "chattr +C" on a file would be sufficient. If 
> possible, is it going to be implemented?

XFS is looking to do this via fallocate using a flag that all file
systems can choose to honor.  Once that lands, it would make sense for
btrfs to use it as well.  The idea is that when you pass the flag in, we
examine the range and CoW anything that has a refcount != 1.

That code hasn't landed yet though.  The last time I saw it posted was
June.  I don't speak with knowledge of the integration plan, but it
might just be queued up for the next merge window now that the reverse
mapping patches have landed in 4.8.

-Jeff

-- 
Jeff Mahoney
SUSE Labs





Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-08-22 Thread Ronan Arraes Jardim Chagas
New information, guys! I formatted using the latest Tumbleweed snapshot
(btrfs-progs v4.7+20160729) and I still have the same problem.

I noticed two things. First, when I see "No space left on device",
the error goes away once the Metadata allocation increases **a lot**. For
example, when the error first occurred, I had:

Metadata, DUP: total=2.00GiB, used=811.52MiB

After waiting a while (I could not run balance), it was automatically
fixed and then I had:

Metadata, DUP: total=9.50GiB, used=811.52MiB

Second, during the error, when I ran the balance command, I saw these
messages in `dmesg`:

Ago 22 16:00:03 ronanarraes-osd kernel: BTRFS info (device sda6): relocating block group 9323937792 flags 34
Ago 22 16:00:04 ronanarraes-osd kernel: BTRFS info (device sda6): found 1 extents
Ago 22 16:00:04 ronanarraes-osd kernel: BTRFS info (device sda6): 1 enospc errors during balance
Ago 22 16:00:24 ronanarraes-osd kernel: BTRFS info (device sda6): relocating block group 36201037824 flags 34
Ago 22 16:00:24 ronanarraes-osd kernel: BTRFS info (device sda6): 2 enospc errors during balance
Ago 22 16:00:45 ronanarraes-osd kernel: BTRFS info (device sda6): relocating block group 36234592256 flags 34
Ago 22 16:00:46 ronanarraes-osd kernel: BTRFS info (device sda6): found 1 extents
Ago 22 16:00:46 ronanarraes-osd kernel: BTRFS info (device sda6): 4 enospc errors during balance
Ago 22 16:01:20 ronanarraes-osd kernel: BTRFS info (device sda6): relocating block group 38415630336 flags 34
Ago 22 16:01:21 ronanarraes-osd kernel: BTRFS info (device sda6): found 1 extents
Ago 22 16:01:21 ronanarraes-osd kernel: BTRFS info (device sda6): 8 enospc errors during balance

Does it add anything relevant to the problem?

Regards,
Ronan Arraes
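
To put the two Metadata lines above side by side, here is a small Python sketch that parses lines in the `total=..., used=...` form and computes how full the allocated metadata space is. The helper names `to_bytes` and `utilization` are hypothetical, and the parser only assumes the exact line format quoted in this message.

```python
import re

UNITS = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}


def to_bytes(text):
    """Convert a size such as '811.52MiB' to a number of bytes."""
    number, unit = re.fullmatch(r"([\d.]+)(KiB|MiB|GiB|TiB)", text).groups()
    return float(number) * UNITS[unit]


def utilization(line):
    """Return used/total for a line like 'Metadata, DUP: total=2.00GiB, used=811.52MiB'."""
    fields = dict(re.findall(r"(total|used)=([\d.]+[KMGT]iB)", line))
    return to_bytes(fields["used"]) / to_bytes(fields["total"])


before = utilization("Metadata, DUP: total=2.00GiB, used=811.52MiB")
after = utilization("Metadata, DUP: total=9.50GiB, used=811.52MiB")
print(f"before: {before:.1%}, after: {after:.1%}")
```

With the figures from this report, the used amount is identical in both lines, so the ratio drops from roughly 40% to roughly 8% purely because the allocation grew.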


Re: [PATCH 1/3] remove mapping from balance_dirty_pages*()

2016-08-22 Thread kbuild test robot
Hi Josef,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.8-rc3 next-20160822]
[cannot apply to linux/master]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]
[Suggest to use git(>=2.9.0) format-patch --base= (or --base=auto for 
convenience) to record what (public, well-known) commit your patch series was 
built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Josef-Bacik/Provide-accounting-for-dirty-metadata/20160823-014222
config: sparc64-allyesconfig (attached as .config)
compiler: sparc64-linux-gnu-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sparc64 

All error/warnings (new ones prefixed by >>):

   fs/ntfs/attrib.c: In function 'ntfs_attr_set':
>> fs/ntfs/attrib.c:2549:35: error: implicit declaration of function 
>> 'inode_to_bdi' [-Werror=implicit-function-declaration]
  balance_dirty_pages_ratelimited(inode_to_bdi(inode),
  ^
>> fs/ntfs/attrib.c:2549:35: warning: passing argument 1 of 
>> 'balance_dirty_pages_ratelimited' makes pointer from integer without a cast 
>> [-Wint-conversion]
   In file included from include/linux/memcontrol.h:30:0,
from include/linux/swap.h:8,
from fs/ntfs/attrib.c:26:
   include/linux/writeback.h:367:6: note: expected 'struct backing_dev_info *' 
but argument is of type 'int'
void balance_dirty_pages_ratelimited(struct backing_dev_info *bdi,
 ^
   fs/ntfs/attrib.c:2591:35: warning: passing argument 1 of 
'balance_dirty_pages_ratelimited' makes pointer from integer without a cast 
[-Wint-conversion]
  balance_dirty_pages_ratelimited(inode_to_bdi(inode),
  ^
   In file included from include/linux/memcontrol.h:30:0,
from include/linux/swap.h:8,
from fs/ntfs/attrib.c:26:
   include/linux/writeback.h:367:6: note: expected 'struct backing_dev_info *' 
but argument is of type 'int'
void balance_dirty_pages_ratelimited(struct backing_dev_info *bdi,
 ^
   fs/ntfs/attrib.c:2609:35: warning: passing argument 1 of 
'balance_dirty_pages_ratelimited' makes pointer from integer without a cast 
[-Wint-conversion]
  balance_dirty_pages_ratelimited(inode_to_bdi(inode),
  ^
   In file included from include/linux/memcontrol.h:30:0,
from include/linux/swap.h:8,
from fs/ntfs/attrib.c:26:
   include/linux/writeback.h:367:6: note: expected 'struct backing_dev_info *' 
but argument is of type 'int'
void balance_dirty_pages_ratelimited(struct backing_dev_info *bdi,
 ^
   cc1: some warnings being treated as errors

vim +/inode_to_bdi +2549 fs/ntfs/attrib.c

  2543  kaddr = kmap_atomic(page);
  2544  memset(kaddr + start_ofs, val, size - start_ofs);
  2545  flush_dcache_page(page);
  2546  kunmap_atomic(kaddr);
  2547  set_page_dirty(page);
  2548  put_page(page);
> 2549  balance_dirty_pages_ratelimited(inode_to_bdi(inode),
  2550  inode->i_sb);
  2551  cond_resched();
  2552  if (idx == end)

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation




[PATCH 2/3] writeback: allow for dirty metadata accounting

2016-08-22 Thread Josef Bacik
Provide a mechanism for file systems to indicate how much dirty metadata they
are holding.  This introduces a few things:

1) Zone stats for dirty metadata, which is the same as NR_FILE_DIRTY.
2) A WB stat for dirty metadata.  This way we know if we need to try to call into
the file system to write out metadata.  This could potentially be used in the
future to make balancing of dirty pages smarter.

Signed-off-by: Josef Bacik 
---
 arch/tile/mm/pgtable.c   |   3 +-
 drivers/base/node.c  |   2 +
 fs/fs-writeback.c|   1 +
 fs/proc/meminfo.c|   2 +
 include/linux/backing-dev-defs.h |   1 +
 include/linux/mm.h   |   7 +++
 include/linux/mmzone.h   |   1 +
 include/trace/events/writeback.h |   7 ++-
 mm/backing-dev.c |   2 +
 mm/page-writeback.c  | 100 +--
 mm/page_alloc.c  |   7 ++-
 mm/vmscan.c  |   3 +-
 12 files changed, 127 insertions(+), 9 deletions(-)

diff --git a/arch/tile/mm/pgtable.c b/arch/tile/mm/pgtable.c
index 7cc6ee7..9543468 100644
--- a/arch/tile/mm/pgtable.c
+++ b/arch/tile/mm/pgtable.c
@@ -44,12 +44,13 @@ void show_mem(unsigned int filter)
 {
struct zone *zone;
 
-   pr_err("Active:%lu inactive:%lu dirty:%lu writeback:%lu unstable:%lu free:%lu\n slab:%lu mapped:%lu pagetables:%lu bounce:%lu pagecache:%lu swap:%lu\n",
+   pr_err("Active:%lu inactive:%lu dirty:%lu metadata_dirty:%lu writeback:%lu unstable:%lu free:%lu\n slab:%lu mapped:%lu pagetables:%lu bounce:%lu pagecache:%lu swap:%lu\n",
   (global_node_page_state(NR_ACTIVE_ANON) +
global_node_page_state(NR_ACTIVE_FILE)),
   (global_node_page_state(NR_INACTIVE_ANON) +
global_node_page_state(NR_INACTIVE_FILE)),
   global_node_page_state(NR_FILE_DIRTY),
+  global_node_page_state(NR_METADATA_DIRTY),
   global_node_page_state(NR_WRITEBACK),
   global_node_page_state(NR_UNSTABLE_NFS),
   global_page_state(NR_FREE_PAGES),
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 5548f96..efc867b2 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -99,6 +99,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 #endif
n += sprintf(buf + n,
   "Node %d Dirty:  %8lu kB\n"
+  "Node %d MetadataDirty:  %8lu kB\n"
   "Node %d Writeback:  %8lu kB\n"
   "Node %d FilePages:  %8lu kB\n"
   "Node %d Mapped: %8lu kB\n"
@@ -119,6 +120,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 #endif
,
   nid, K(node_page_state(pgdat, NR_FILE_DIRTY)),
+  nid, K(node_page_state(pgdat, NR_METADATA_DIRTY)),
   nid, K(node_page_state(pgdat, NR_WRITEBACK)),
   nid, K(node_page_state(pgdat, NR_FILE_PAGES)),
   nid, K(node_page_state(pgdat, NR_FILE_MAPPED)),
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 56c8fda..d329f89 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1809,6 +1809,7 @@ static unsigned long get_nr_dirty_pages(void)
 {
return global_node_page_state(NR_FILE_DIRTY) +
global_node_page_state(NR_UNSTABLE_NFS) +
+   global_node_page_state(NR_METADATA_DIRTY) +
get_nr_dirty_inodes();
 }
 
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 09e18fd..8ca094f 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -80,6 +80,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
"SwapTotal:  %8lu kB\n"
"SwapFree:   %8lu kB\n"
"Dirty:  %8lu kB\n"
+   "MetadataDirty:  %8lu kB\n"
"Writeback:  %8lu kB\n"
"AnonPages:  %8lu kB\n"
"Mapped: %8lu kB\n"
@@ -139,6 +140,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
K(i.totalswap),
K(i.freeswap),
K(global_node_page_state(NR_FILE_DIRTY)),
+   K(global_node_page_state(NR_METADATA_DIRTY)),
K(global_node_page_state(NR_WRITEBACK)),
K(global_node_page_state(NR_ANON_MAPPED)),
K(global_node_page_state(NR_FILE_MAPPED)),
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 3f10307..1200aae 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -34,6 +34,7 @@ typedef int (congested_fn)(void *, int);
 enum wb_stat_item {
WB_RECLAIMABLE,
WB_WRITEBACK,
+   WB_METADATA_DIRTY,
WB_DIRTIED,
WB_WRITTEN,
NR_WB_STAT_ITEMS
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 08ed53e..5a3f626 100644
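
The effect of the new counter on the `get_nr_dirty_pages()` hunk above can be modelled in userspace. This is only a sketch under stated assumptions: `DirtyStats` and its `account_metadata_dirtied`/`account_metadata_cleaned` methods are hypothetical stand-ins for the kernel's per-node counters and accounting helpers, not the names used in the patch.

```python
from collections import Counter


class DirtyStats:
    """Hypothetical userspace model of the per-node page counters."""

    def __init__(self):
        self.pages = Counter()

    def account_metadata_dirtied(self, nr_pages):
        # A file system reports that it dirtied this many metadata pages.
        self.pages["NR_METADATA_DIRTY"] += nr_pages

    def account_metadata_cleaned(self, nr_pages):
        # A file system reports that this many metadata pages are clean again.
        self.pages["NR_METADATA_DIRTY"] -= nr_pages

    def get_nr_dirty_pages(self, nr_dirty_inodes=0):
        # Mirrors the sum in the patched get_nr_dirty_pages(): dirty file
        # pages, unstable NFS pages, dirty metadata pages, dirty inodes.
        return (self.pages["NR_FILE_DIRTY"]
                + self.pages["NR_UNSTABLE_NFS"]
                + self.pages["NR_METADATA_DIRTY"]
                + nr_dirty_inodes)


stats = DirtyStats()
stats.pages["NR_FILE_DIRTY"] = 100
stats.account_metadata_dirtied(40)
print(stats.get_nr_dirty_pages(nr_dirty_inodes=5))
```

With the extra term, dirty metadata contributes to the total that writeback uses to decide how much work is outstanding, which the unpatched sum would not reflect.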

[PATCH 3/3] writeback: introduce super_operations->write_metadata

2016-08-22 Thread Josef Bacik
Now that we have metadata counters in the VM, we need to provide a way to kick
writeback on dirty metadata.  Introduce super_operations->write_metadata.  This
allows file systems to deal with writing back any dirty metadata we need based
on the writeback needs of the system.  Since there is no inode to key off of we
need a list in the bdi for dirty super blocks to be added.  From there we can
find any dirty sb's on the bdi we are currently doing writeback on and call into
their ->write_metadata callback.

Signed-off-by: Josef Bacik 
---
 fs/fs-writeback.c| 58 +---
 fs/super.c   |  7 +
 include/linux/backing-dev-defs.h |  2 ++
 include/linux/fs.h   |  4 +++
 mm/backing-dev.c |  1 +
 5 files changed, 69 insertions(+), 3 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index d329f89..b7d8946 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1615,11 +1615,36 @@ static long writeback_sb_inodes(struct super_block *sb,
return wrote;
 }
 
+static long writeback_sb_metadata(struct super_block *sb,
+ struct bdi_writeback *wb,
+ struct wb_writeback_work *work)
+{
+   struct writeback_control wbc = {
+   .sync_mode  = work->sync_mode,
+   .tagged_writepages  = work->tagged_writepages,
+   .for_kupdate= work->for_kupdate,
+   .for_background = work->for_background,
+   .for_sync   = work->for_sync,
+   .range_cyclic   = work->range_cyclic,
+   .range_start= 0,
+   .range_end  = LLONG_MAX,
+   };
+   long write_chunk;
+
+   write_chunk = writeback_chunk_size(wb, work);
+   wbc.nr_to_write = write_chunk;
+   sb->s_op->write_metadata(sb, &wbc);
+   work->nr_pages -= write_chunk - wbc.nr_to_write;
+
+   return write_chunk - wbc.nr_to_write;
+}
+
 static long __writeback_inodes_wb(struct bdi_writeback *wb,
  struct wb_writeback_work *work)
 {
unsigned long start_time = jiffies;
long wrote = 0;
+   bool done = false;
 
while (!list_empty(&wb->b_io)) {
struct inode *inode = wb_inode(wb->b_io.prev);
@@ -1639,11 +1664,38 @@ static long __writeback_inodes_wb(struct bdi_writeback *wb,
 
/* refer to the same tests at the end of writeback_sb_inodes */
if (wrote) {
-   if (time_is_before_jiffies(start_time + HZ / 10UL))
-   break;
-   if (work->nr_pages <= 0)
+   if (time_is_before_jiffies(start_time + HZ / 10UL) ||
+   work->nr_pages <= 0) {
+   done = true;
break;
+   }
+   }
+   }
+
+   if (!done && wb_stat(wb, WB_METADATA_DIRTY)) {
+   LIST_HEAD(list);
+
+   spin_unlock(&wb->list_lock);
+   spin_lock(&wb->bdi->sb_list_lock);
+   list_splice_init(&wb->bdi->dirty_sb_list, &list);
+   while (!list_empty(&list)) {
+   struct super_block *sb;
+
+   sb = list_first_entry(&list, struct super_block,
+ s_bdi_list);
+   list_move_tail(&sb->s_bdi_list,
+  &wb->bdi->dirty_sb_list);
+   if (!sb->s_op->write_metadata)
+   continue;
+   if (!trylock_super(sb))
+   continue;
+   spin_unlock(&wb->bdi->sb_list_lock);
+   wrote += writeback_sb_metadata(sb, wb, work);
+   spin_lock(&wb->bdi->sb_list_lock);
+   up_read(&sb->s_umount);
}
+   spin_unlock(&wb->bdi->sb_list_lock);
+   spin_lock(&wb->list_lock);
}
/* Leave any unwritten inodes on b_io */
return wrote;
diff --git a/fs/super.c b/fs/super.c
index c2ff475..c1b1028 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -215,6 +215,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
spin_lock_init(&s->s_inode_list_lock);
INIT_LIST_HEAD(&s->s_inodes_wb);
spin_lock_init(&s->s_inode_wblist_lock);
+   INIT_LIST_HEAD(&s->s_bdi_list);
 
if (list_lru_init_memcg(&s->s_dentry_lru))
goto fail;
@@ -305,6 +306,8 @@ void deactivate_locked_super(struct super_block *s)
 {
struct file_system_type *fs = s->s_type;
if (atomic_dec_and_test(&s->s_active)) {
+   struct backing_dev_info *bdi = s->s_bdi;
+
cleancache_invalidate_fs(s);
unregister_shrinker(&s->s_shrink);
fs->kill_sb(s);
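
The metadata pass added to `__writeback_inodes_wb()` above can be followed more easily in a small userspace model. This is a sketch only: it omits all locking and the `trylock_super()` check, and `Work`, `SuperBlock` and `writeback_dirty_sbs` are hypothetical names chosen for the illustration. The callback here returns the number of pages it wrote, whereas the kernel callback decrements `wbc.nr_to_write` in place.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Work:
    nr_pages: int  # remaining page budget for this writeback work item


@dataclass
class SuperBlock:
    name: str
    # Takes the page budget, returns how many pages were written.
    write_metadata: Optional[Callable[[int], int]] = None


def writeback_sb_metadata(sb, work, write_chunk):
    nr_to_write = write_chunk
    nr_to_write -= sb.write_metadata(nr_to_write)
    wrote = write_chunk - nr_to_write
    work.nr_pages -= wrote  # charge the written pages to the work item
    return wrote


def writeback_dirty_sbs(dirty_sb_list, work, write_chunk):
    # Splice the shared list onto a private one, as list_splice_init() does.
    pending = deque(dirty_sb_list)
    dirty_sb_list.clear()
    wrote = 0
    while pending:
        sb = pending.popleft()
        dirty_sb_list.append(sb)  # list_move_tail() back to the shared list
        if sb.write_metadata is None:
            continue  # file system does not implement the callback
        wrote += writeback_sb_metadata(sb, work, write_chunk)
    return wrote


sbs = deque([SuperBlock("a"), SuperBlock("b", lambda budget: min(budget, 30))])
work = Work(nr_pages=100)
print(writeback_dirty_sbs(sbs, work, 64), work.nr_pages)
```

Every super block is moved back to the shared list before its callback runs, so the private list is empty once the loop finishes and the shared list again holds all entries.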

[PATCH 0/3][V2] Provide accounting for dirty metadata

2016-08-22 Thread Josef Bacik
Here is my updated set of patches for providing a way for fs'es to do their own
accounting for their dirty metadata pages.  The changes since V1 include

-Split the accounting + ->write_metadata patches out into their own patches.
-Added a few more account_metadata* helpers that I hadn't thought about
previously.
-Changed the bdi->sb_list to bdi->dirty_sb_list.  This is to avoid confusion
about the purpose of the list.  I do a splice of this list when processing it as
we have to drop the list lock and I didn't want to worry about umounts screwing
up the list while we were writing metadata.
-Added the dirty metadata counter to the various places we output those counters
(meminfo, oom messages, etc).

I've also changed btrfs to use these interfaces, have been testing that code
for almost a week now, and have fixed up the various problems that happened
with V1 of this code.  Thanks,

Josef


[PATCH 1/3] remove mapping from balance_dirty_pages*()

2016-08-22 Thread Josef Bacik
The only reason we pass in the mapping is to get the inode in order to see if
writeback cgroups are enabled, and even then it only checks the bdi and a super
block flag.  balance_dirty_pages() doesn't even use the mapping.  Since
balance_dirty_pages*() works on a bdi level, just pass in the bdi and super
block directly so we can avoid using the mapping.  This will allow us to still
use balance_dirty_pages for dirty metadata pages that are not backed by an
address_space.

Signed-off-by: Josef Bacik 
Reviewed-by: Jan Kara 
---
 drivers/mtd/devices/block2mtd.c | 12 
 fs/btrfs/disk-io.c  |  4 ++--
 fs/btrfs/file.c |  3 ++-
 fs/btrfs/ioctl.c|  3 ++-
 fs/btrfs/relocation.c   |  3 ++-
 fs/buffer.c |  3 ++-
 fs/iomap.c  |  3 ++-
 fs/ntfs/attrib.c| 10 +++---
 fs/ntfs/file.c  |  4 ++--
 include/linux/backing-dev.h | 29 +++--
 include/linux/writeback.h   |  3 ++-
 mm/filemap.c|  4 +++-
 mm/memory.c |  9 +++--
 mm/page-writeback.c | 15 +++
 14 files changed, 71 insertions(+), 34 deletions(-)

diff --git a/drivers/mtd/devices/block2mtd.c b/drivers/mtd/devices/block2mtd.c
index 7c887f1..7892d0b 100644
--- a/drivers/mtd/devices/block2mtd.c
+++ b/drivers/mtd/devices/block2mtd.c
@@ -52,7 +52,8 @@ static struct page *page_read(struct address_space *mapping, int index)
 /* erase a specified part of the device */
 static int _block2mtd_erase(struct block2mtd_dev *dev, loff_t to, size_t len)
 {
-   struct address_space *mapping = dev->blkdev->bd_inode->i_mapping;
+   struct inode *inode = dev->blkdev->bd_inode;
+   struct address_space *mapping = inode->i_mapping;
struct page *page;
int index = to >> PAGE_SHIFT;   // page index
int pages = len >> PAGE_SHIFT;
@@ -71,7 +72,8 @@ static int _block2mtd_erase(struct block2mtd_dev *dev, loff_t to, size_t len)
memset(page_address(page), 0xff, PAGE_SIZE);
set_page_dirty(page);
unlock_page(page);
-   balance_dirty_pages_ratelimited(mapping);
+   balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+   inode->i_sb);
break;
}
 
@@ -141,7 +143,8 @@ static int _block2mtd_write(struct block2mtd_dev *dev, const u_char *buf,
loff_t to, size_t len, size_t *retlen)
 {
struct page *page;
-   struct address_space *mapping = dev->blkdev->bd_inode->i_mapping;
+   struct inode *inode = dev->blkdev->bd_inode;
+   struct address_space *mapping = inode->i_mapping;
int index = to >> PAGE_SHIFT;   // page index
int offset = to & ~PAGE_MASK;   // page offset
int cpylen;
@@ -162,7 +165,8 @@ static int _block2mtd_write(struct block2mtd_dev *dev, const u_char *buf,
memcpy(page_address(page) + offset, buf, cpylen);
set_page_dirty(page);
unlock_page(page);
-   balance_dirty_pages_ratelimited(mapping);
+   balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+   inode->i_sb);
}
put_page(page);
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 87dad55..4034ad6 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4024,8 +4024,8 @@ static void __btrfs_btree_balance_dirty(struct btrfs_root *root,
ret = percpu_counter_compare(&root->fs_info->dirty_metadata_bytes,
 BTRFS_DIRTY_METADATA_THRESH);
if (ret > 0) {
-   balance_dirty_pages_ratelimited(
-  root->fs_info->btree_inode->i_mapping);
+   balance_dirty_pages_ratelimited(&root->fs_info->bdi,
+   root->fs_info->sb);
}
 }
 
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 9404121..f060b08 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1686,7 +1686,8 @@ again:
 
cond_resched();
 
-   balance_dirty_pages_ratelimited(inode->i_mapping);
+   balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+   inode->i_sb);
if (dirty_pages < (root->nodesize >> PAGE_SHIFT) + 1)
btrfs_btree_balance_dirty(root);
 
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 14ed1e9..a222bad 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1410,7 +1410,8 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
}
 
defrag_count += 

Re: [Regression/Behavior change]dm-flakey corrupt read bio, even the feature is drop_writes

2016-08-22 Thread Mike Snitzer
On Mon, Aug 22 2016 at  4:05am -0400,
Lukas Herbolt  wrote:

> Hello,
> 
> There is patch from Mike. It's part of current pull request to 4.8-rc1
> For more details check:
>  - https://www.redhat.com/archives/dm-devel/2016-July/msg00561.html
>  - https://www.redhat.com/archives/dm-devel/2016-August/msg00109.html
> 
> Lukas
> 
> On Mon, Aug 22, 2016 at 9:31 AM, Qu Wenruo  wrote:
> > Hi, Mike and btrfs and dm guys
> >
> > When doing regression test on v4.8-rc1, we found that fstests/btrfs/056
> > always fails. With the following dmesg:
> > ---
> > Buffer I/O error on dev dm-0, logical block 1310704, async page read
> > Buffer I/O error on dev dm-0, logical block 16, async page read
> > Buffer I/O error on dev dm-0, logical block 16, async page read
> > ---
> >
> > And bisect leads to the following commits:
> > ---
> > commit 99f3c90d0d85708e7401a81ce3314e50bf7f2819
> > Author: Mike Snitzer 
> > Date:   Fri Jul 29 13:19:55 2016 -0400
> >
> > dm flakey: error READ bios during the down_interval
> > ---
> >
> > While according to the document of dm-flakey, it says that when using
> > drop_writes feature, read bios are not affected:
> > ---
> >   drop_writes:
> > All write I/O is silently ignored.
> > Read I/O is handled correctly.

I went back to the dm-flakey.c code at the time that the 'drop_writes'
feature was added via commit b26f5e3d.  It does confirm your
understanding of how reads should be handled if drop_writes is enabled.

Not sure why I thought differently.  Please try the following patch.

diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
index 97e446d..6a2e8dd 100644
--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -289,15 +289,13 @@ static int flakey_map(struct dm_target *ti, struct bio *bio)
pb->bio_submitted = true;
 
/*
-* Map reads as normal only if corrupt_bio_byte set.
+* Error reads if neither corrupt_bio_byte or drop_writes are set.
+* Otherwise, flakey_end_io() will decide if the reads should be modified.
 */
if (bio_data_dir(bio) == READ) {
-   /* If flags were specified, only corrupt those that match. */
-   if (fc->corrupt_bio_byte && (fc->corrupt_bio_rw == READ) &&
-   all_corrupt_bio_flags_match(bio, fc))
-   goto map_bio;
-   else
+   if (!fc->corrupt_bio_byte && !test_bit(DROP_WRITES, &fc->flags))
return -EIO;
+   goto map_bio;
}
 
/*
@@ -334,14 +332,21 @@ static int flakey_end_io(struct dm_target *ti, struct bio *bio, int error)
struct flakey_c *fc = ti->private;
struct per_bio_data *pb = dm_per_bio_data(bio, sizeof(struct per_bio_data));
 
-   /*
-* Corrupt successful READs while in down state.
-*/
if (!error && pb->bio_submitted && (bio_data_dir(bio) == READ)) {
-   if (fc->corrupt_bio_byte)
+   if (fc->corrupt_bio_byte && (fc->corrupt_bio_rw == READ) &&
+   all_corrupt_bio_flags_match(bio, fc)) {
+   /*
+* Corrupt successful matching READs while in down state.
+*/
corrupt_bio_data(bio, fc);
-   else
+
+   } else if (!test_bit(DROP_WRITES, &fc->flags)) {
+   /*
+* Error read during the down_interval if drop_writes
+* wasn't configured.
+*/
return -EIO;
+   }
}
 
return error;
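
The change in read handling during the down interval can be summarised with a small userspace model of the two hunks above. It is a sketch under stated assumptions: the underlying read is assumed to succeed, `FlakeyConfig` and the two `read_outcome_*` functions are hypothetical names, and `flags_match` stands in for the result of `all_corrupt_bio_flags_match()`.

```python
from dataclasses import dataclass


@dataclass
class FlakeyConfig:
    corrupt_bio_byte: int = 0      # 0 means the feature is not configured
    corrupt_bio_rw: str = "READ"   # direction the corruption applies to
    drop_writes: bool = False
    flags_match: bool = True       # stand-in for all_corrupt_bio_flags_match()


def _corrupt_read_matches(fc):
    return bool(fc.corrupt_bio_byte) and fc.corrupt_bio_rw == "READ" and fc.flags_match


def read_outcome_before(fc):
    """Down-interval read handling before the patch (the removed lines)."""
    # flakey_map(): only matching corrupt_bio_byte reads are mapped.
    if not _corrupt_read_matches(fc):
        return "EIO"
    # flakey_end_io(): corrupt_bio_byte is set here, so the data is corrupted.
    return "corrupted"


def read_outcome_after(fc):
    """Down-interval read handling with the patch applied (the added lines)."""
    # flakey_map(): error only if neither feature is configured.
    if not fc.corrupt_bio_byte and not fc.drop_writes:
        return "EIO"
    # flakey_end_io(): corrupt matching reads, otherwise error the read
    # unless drop_writes was configured.
    if _corrupt_read_matches(fc):
        return "corrupted"
    if not fc.drop_writes:
        return "EIO"
    return "ok"


only_drop_writes = FlakeyConfig(drop_writes=True)
print(read_outcome_before(only_drop_writes), read_outcome_after(only_drop_writes))
```

For a table that sets only `drop_writes`, the model gives `EIO` before the patch and `ok` after it, which matches the behaviour reported for btrfs/056 and the documented meaning of `drop_writes`.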


Re: [dm-devel] [Regression/Behavior change]dm-flakey corrupt read bio, even the feature is drop_writes

2016-08-22 Thread Lukas Herbolt
Hi Qu,

Sorry for the confusion. Reading the email and the code again, it seems
that READs really are returned as -EIO if you set drop_writes.
I just tested it and you are right.

If I read the fstest correctly, the flakey device is created as:
---
flakey: 0 409600 flakey 8:64 0 0 180 1 drop_writes
---
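
The table line above can be split into its fields with a short helper. This is a sketch that assumes the usual device-mapper layout of start sector, length and target name, followed by the dm-flakey parameters (device, offset, up interval, down interval, optional feature count and feature arguments); `parse_flakey_table` is a made-up name for this illustration.

```python
def parse_flakey_table(line):
    """Hypothetical helper: split a 'name: start len flakey ...' table line."""
    name, _, rest = line.partition(":")
    fields = rest.split()
    if len(fields) < 7 or fields[2] != "flakey":
        raise ValueError("not a flakey table line: %r" % line)
    num_features = int(fields[7]) if len(fields) > 7 else 0
    return {
        "name": name.strip(),
        "start": int(fields[0]),          # first sector of the mapping
        "length": int(fields[1]),         # length in sectors
        "device": fields[3],              # underlying device (major:minor)
        "offset": int(fields[4]),
        "up_interval": int(fields[5]),    # seconds the device behaves normally
        "down_interval": int(fields[6]),  # seconds the device misbehaves
        "features": fields[8:8 + num_features],
    }


table = parse_flakey_table("flakey: 0 409600 flakey 8:64 0 0 180 1 drop_writes")
print(table["up_interval"], table["down_interval"], table["features"])
```

Read this way, the line sets an up interval of 0, a down interval of 180, and a single feature, `drop_writes`, with no `corrupt_bio_byte` arguments.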

I believe the READs are dropped because it does not have any flags set.

---
if (bio_data_dir(bio) == READ) {
/* If flags were specified, only corrupt those that match. */
if (fc->corrupt_bio_byte && (fc->corrupt_bio_rw == READ) &&
all_corrupt_bio_flags_match(bio, fc))
goto map_bio;
else
return -EIO;
}
---

with conclusion of setting:
---
/*
 * Flag this bio as submitted while down.
 */
pb->bio_submitted = true;
---

I have a quick test patch ready, but it probably breaks more things than it
fixes, so I will keep working on it.
Here it is in case you want to test it. The diff is against 4.8-rc1.

--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -292,6 +292,11 @@ static int flakey_map(struct dm_target *ti, struct bio *bio)
 * Map reads as normal only if corrupt_bio_byte set.
 */
if (bio_data_dir(bio) == READ) {
+   /* We should return all READs as OK if the DROP_WRITES flag is set. */
+   if (test_bit(DROP_WRITES, &fc->flags)) {
+   pb->bio_submitted = false;
+   goto map_bio;
+   }
/* If flags were specified, only corrupt those that match. */
if (fc->corrupt_bio_byte &&
(fc->corrupt_bio_rw == READ) &&
all_corrupt_bio_flags_match(bio, fc))






-- 
Lukas Herbolt
RHCE, RH436, BSc, SSc
Senior Technical Support Engineer
Global Support Services (GSS)
Email:lherb...@redhat.com


Re: [dm-devel] [Regression/Behavior change]dm-flakey corrupt read bio, even the feature is drop_writes

2016-08-22 Thread Lukas Herbolt
Hello,

There is a patch from Mike. It is part of the current pull request for 4.8-rc1.
For more details, see:
 - https://www.redhat.com/archives/dm-devel/2016-July/msg00561.html
 - https://www.redhat.com/archives/dm-devel/2016-August/msg00109.html

Lukas




-- 
Lukas Herbolt
RHCE, RH436, BSc, SSc
Senior Technical Support Engineer
Global Support Services (GSS)
Email:lherb...@redhat.com


[Regression/Behavior change]dm-flakey corrupt read bio, even the feature is drop_writes

2016-08-22 Thread Qu Wenruo

Hi, Mike and btrfs and dm guys

While running regression tests on v4.8-rc1, we found that fstests/btrfs/056
always fails, with the following dmesg:

---
Buffer I/O error on dev dm-0, logical block 1310704, async page read
Buffer I/O error on dev dm-0, logical block 16, async page read
Buffer I/O error on dev dm-0, logical block 16, async page read
---

A bisect leads to the following commit:
---
commit 99f3c90d0d85708e7401a81ce3314e50bf7f2819
Author: Mike Snitzer 
Date:   Fri Jul 29 13:19:55 2016 -0400

dm flakey: error READ bios during the down_interval
---

However, according to the dm-flakey documentation, read bios are not
affected when the drop_writes feature is used:

---
  drop_writes:
All write I/O is silently ignored.
Read I/O is handled correctly.
---

If I understand the word "correctly" correctly, it should mean that READ I/O
is handled without problems.


However, with this commit, the read bio is also corrupted, leading to the
test failure.



There are at least two possible fixes:
1) Fix the fstests scripts
   The related macro is "_flakey_drop_and_remount yes", which will
   check the fs during the "drop_writes" time.

   Currently, only btrfs/056 calls "_flakey_drop_and_remount" with
   "yes", so other test cases are not affected.

   However, even if we move the fsck outside of the "drop_writes" range
   so that the test case passes, we will still get a dmesg error:
   "Buffer I/O error on dev dm-0, logical block 1310704, async page read"

2) Revert the flakey behavior to allow READ bios
   Then everything is back to the good old days.

I am not sure which one is correct for the current use case, as I'm not
familiar with the dm code.


Any ideas on how to fix dm-flakey and keep the READ bio behavior?

Thanks,
Qu




