Re: About btrfs qgroup import/export command

2013-01-09 Thread Miao Xie
Hi, Arne

On Wed, 19 Dec 2012 12:40:25 +0100, Arne Jansen wrote:
 On 19.12.2012 12:25, Miao Xie wrote:
 As we know, there is no backup function for qgroup. When a problem
 occurs, users must recover the qgroup configuration manually, which is not
 convenient. Besides that, some users might want to import an existing
 qgroup configuration into a new filesystem. Btrfs does not have such a
 function; it can only be done manually.

 So we want to implement btrfs qgroup import/export commands.
 1) 'btrfs qgroup export' will export the qgroup tree
   into a user-specified file (stdout by default).

 2) users may modify the configuration file first and then
   import it into the filesystem (with the 'btrfs qgroup import' command).

 The file may be formatted as follows:

 qgroupid  is_compressed  is_exclusive  limited_size  parent
 ----------------------------------------------------------
 0/1       0              0             10G           1/0
 1/0       1              1             20G           ---

  If 'is_exclusive' is set, 'limited_size' corresponds to the max exclusive size,
  else to the max referenced size. Here 'parent' excludes ancestral qgroups.

 Are there any comments on this idea?
 
 The configuration only really makes sense in combination with the existing
 subvolumes. Even if the target has subvolumes under the same name, they
 might have different internal IDs. So it might make more sense to address
 the level 0 qgroups by name.

Good idea.

 Also it might be misleading to apply a configuration to an existing fs, as
 it currently is not possible to get a correct accounting if the fs is not
 empty. Rescan is not yet implemented.

Rescan will be implemented in the future, so it is not a major obstacle
to implementing 'btrfs qgroup import/export' commands.

 So instead of just saving and restoring the qgroup config, it might make
 more sense to create a new filesystem including all subvolumes and quota
 config from a config file.
 But, I'm not completely convinced that this is a feature that is needed
 frequently. If I want a standard deployment, I simply write a script that
 creates the fs + subvol + quota.

If users want to configure some qgroups (reset the limit size,
modify their ancestral qgroups), I think it is more convenient and flexible
to use import/export commands than to write a script.


In summary, our qgroup import/export commands will be implemented as follows:

qgroupid  is_compressed  is_exclusive  limited_size   parent   full_path


We may also specify a matching degree when importing the qgroup information.

1) strict matching
A level-0 qgroup must match a subvolume/snapshot's objectid and full path
exactly. If a qgroup fails to match, the process will exit.

2) general matching
A level-0 qgroup only needs to match a subvolume/snapshot's full path.
If the corresponding subvolume/snapshot does not exist, skip it. Otherwise,
apply the modifications to the corresponding subvolume/snapshot qgroup.

3) weak matching
A level-0 qgroup only needs to match a subvolume/snapshot's full path.
If the corresponding subvolume/snapshot does not exist, create the subvolume
automatically (a tracking qgroup is also created automatically) and then
apply the modifications to the newly created tracking qgroup.


What do you think of the above idea?

Thanks
Miao
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs-progs: remove '-h' from btrfs man page

2013-01-09 Thread Simon Xu
Remove '-h' from btrfs man page as it's not supported by the btrfs utility.
---
 man/btrfs.8.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/man/btrfs.8.in b/man/btrfs.8.in
index 9222580..bf3ea13 100644
--- a/man/btrfs.8.in
+++ b/man/btrfs.8.in
@@ -52,7 +52,7 @@ btrfs \- control a btrfs filesystem
 \fBbtrfs\fP \fBinspect-internal logical-resolve\fP
 [-Pv] [-s size] \fIlogical\fP \fIpath\fP
 .PP
-\fBbtrfs\fP \fBhelp|\-\-help|\-h \fP\fI\fP
+\fBbtrfs\fP \fBhelp|\-\-help \fP\fI\fP
 .PP
 \fBbtrfs\fP \fBcommand \-\-help \fP\fI\fP
 .PP
-- 
1.8.1



Re: btrfsck: extent-tree.c:2549: btrfs_reserve_extent: Assertion `!(ret)' failed.

2013-01-09 Thread Richard Cooper

On 3 Jan 2013, at 16:43, Richard Cooper wrote:

 On 3 Jan 2013, at 15:06, Josef Bacik wrote:
 
 On Thu, Jan 03, 2013 at 05:26:38AM -0700, Richard Cooper wrote:
 Hi All,
 
 I'm trying to repair a broken fs using btrfsck and am hitting a failed 
 assertion. I'd appreciate any suggestions for what to do next. Is there 
 anything I can do to help fix this bug? Any other information from my FS 
 which would help? If the FS could be salvaged that would be a bonus, but 
 I'm more interested in providing a useful bug report before wiping the disk.
 
 
 Well, the good news is that it's the allocator failing to find space for a new
 block, and the allocator in btrfs-progs is under-tested, so it's likely just an
 internal bug and something we can fix. Can you do btrfs fi show /dev/md4 (not
 mounted) and post that so we can be sure there's actually enough space.
 
 # ./btrfs fi show /dev/md4 
 Label: none  uuid: 5be10dea-64c1-474e-b640-987b25af3c27
   Total devices 1 FS bytes used 606.79GB
   devid    1 size 16.36TB used 627.04GB path /dev/md4

Is this all the information you need? Is there a bug tracker I should report 
this to, to stop it getting lost in the mailing list archives?



Re: [PATCH 3/4] Btrfs: remove unnecessary cur_trans set before goto loop in join_transaction

2013-01-09 Thread Jiri Kosina
On Mon, 24 Sep 2012, Wang Sheng-Hui wrote:

 In the big loop, cur_trans will be set to fs_info->running_transaction
 before it's used. And after it is freed with kmem_cache_free and we goto
 loop, it will be set up again. No need to set it up immediately after freeing it.
 
 Signed-off-by: Wang Sheng-Hui shh...@gmail.com
 ---
  fs/btrfs/transaction.c |    1 -
  1 files changed, 0 insertions(+), 1 deletions(-)
 
 diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
 index 469a8b6..675d813 100644
 --- a/fs/btrfs/transaction.c
 +++ b/fs/btrfs/transaction.c
 @@ -98,7 +98,6 @@ loop:
* to redo the trans_no_join checks above
*/
   kmem_cache_free(btrfs_transaction_cachep, cur_trans);
 - cur_trans = fs_info->running_transaction;
   goto loop;
   } else if (fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR) {
   spin_unlock(&fs_info->trans_lock);

This is Obviously Correct(TM) :-) and doesn't seem to have been picked up 
by btrfs maintainers. I am picking it up now.

-- 
Jiri Kosina
SUSE Labs


Re: [PATCH] Btrfs: fix off-by-one in lseek

2013-01-09 Thread David Sterba
On Wed, Jan 09, 2013 at 12:34:45PM +0800, Liu Bo wrote:
  [20191.948060] D: __set_extent_bit isize = 0 odd range 
  [774144,7384799041917984768)
  [20191.956581] D: clear_extent_bit isize = 0 odd range 
  [774144,7384799041917984768)
  
  so I'm not sending it as a separate patch yet until the check covers all 
  cases.
 
 Thanks for coding this up, I've checked the code, these messages can
 be fixed by the following, please check if it works on your side :)

Thanks, no more of these warnings. There was a new one for me, during test
013:

[  348.433006] [ cut here ]
[  348.438926] WARNING: at fs/btrfs/disk-io.c:3210 free_fs_root+0x8b/0x90 [btrfs]()
[  348.447596] Hardware name: Santa Rosa platform
[  348.447602] Modules linked in: aoe dm_crypt loop btrfs
[  348.447605] Pid: 9091, comm: umount Not tainted 3.8.0-rc2-default+ #229
[  348.447607] Call Trace:
[  348.447615]  [8104c6bf] warn_slowpath_common+0x7f/0xc0
[  348.447619]  [8104c71a] warn_slowpath_null+0x1a/0x20
[  348.447635]  [a002af3b] free_fs_root+0x8b/0x90 [btrfs]
[  348.447652]  [a002e75e] btrfs_free_fs_root+0x7e/0x90 [btrfs]
[  348.447668]  [a002e84b] del_fs_roots+0xdb/0x120 [btrfs]
[  348.447683]  [a002292e] ? btrfs_free_block_groups+0x29e/0x370 [btrfs]
[  348.447699]  [a0030182] close_ctree+0x1d2/0x340 [btrfs]
[  348.447705]  [81178c6f] ? dispose_list+0x4f/0x60
[  348.447711]  [811799d4] ? evict_inodes+0x114/0x130
[  348.447722]  [a0003c69] btrfs_put_super+0x19/0x20 [btrfs]
[  348.447727]  [811608f2] generic_shutdown_super+0x62/0xf0
[  348.447730]  [81160a16] kill_anon_super+0x16/0x30
[  348.447741]  [a0004d9a] btrfs_kill_super+0x1a/0x90 [btrfs]
[  348.447744]  [811618e2] ? deactivate_super+0x42/0x70
[  348.447748]  [81160c6d] deactivate_locked_super+0x3d/0x90
[  348.447751]  [811618ea] deactivate_super+0x4a/0x70
[  348.447755]  [8117dc70] mntput_no_expire+0x100/0x160
[  348.447759]  [8117ecb1] sys_umount+0x71/0x3c0
[  348.447763]  [81960919] system_call_fastpath+0x16/0x1b
[  348.447765] ---[ end trace 25a08f78869c0553 ]---
[  348.614158] VFS: Busy inodes after unmount of sda8. Self-destruct in 5 seconds.  Have a nice day...

Looks like a leaked inode. The line number in the warning does not match a
WARN in the sources; this one is better:

(gdb) l *(free_fs_root+0x8b)
0x2ab5b is in free_fs_root (fs/btrfs/disk-io.c:3206).
3201    }
3202
3203    static void free_fs_root(struct btrfs_root *root)
3204    {
3205            iput(root->cache_inode);
3206            WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree));
3207            if (root->anon_dev)
3208                    free_anon_bdev(root->anon_dev);
3209            free_extent_buffer(root->node);
3210            free_extent_buffer(root->commit_root);

I've added only the 2 fixes from you, no other change. I'll do another test
based on current btrfs-next.


david


Re: About btrfs qgroup import/export command

2013-01-09 Thread Arne Jansen
On 09.01.2013 11:17, Miao Xie wrote:
 Hi, Arne
 
 On Wed, 19 Dec 2012 12:40:25 +0100, Arne Jansen wrote:
 On 19.12.2012 12:25, Miao Xie wrote:
 As we know, there is no backup function for qgroup. When a problem
 occurs, users must recover the qgroup configuration manually, which is not
 convenient. Besides that, some users might want to import an existing
 qgroup configuration into a new filesystem. Btrfs does not have such a
 function; it can only be done manually.

 So we want to implement btrfs qgroup import/export commands.
 1) 'btrfs qgroup export' will export the qgroup tree
   into a user-specified file (stdout by default).

 2) users may modify the configuration file first and then
   import it into the filesystem (with the 'btrfs qgroup import' command).

 The file may be formatted as follows:

 qgroupid  is_compressed  is_exclusive  limited_size  parent
 ----------------------------------------------------------
 0/1       0              0             10G           1/0
 1/0       1              1             20G           ---

  If 'is_exclusive' is set, 'limited_size' corresponds to the max exclusive size,
  else to the max referenced size. Here 'parent' excludes ancestral qgroups.

 Are there any comments on this idea?

 The configuration only really makes sense in combination with the existing
 subvolumes. Even if the target has subvolumes under the same name, they
 might have different internal IDs. So it might make more sense to address
 the level 0 qgroups by name.
 
 Good idea.
 
 Also it might be misleading to apply a configuration to an existing fs, as
 it currently is not possible to get a correct accounting if the fs is not
 empty. Rescan is not yet implemented.
 
 Rescan will be implemented in the future, so it is not a major obstacle
 to implementing 'btrfs qgroup import/export' commands.
 
 So instead of just saving and restoring the qgroup config, it might make
 more sense to create a new filesystem including all subvolumes and quota
 config from a config file.
 But, I'm not completely convinced that this is a feature that is needed
 frequently. If I want a standard deployment, I simply write a script that
 creates the fs + subvol + quota.
 
 If users want to configure some qgroups (reset the limit size,
 modify their ancestral qgroups), I think it is more convenient and flexible
 to use import/export commands than to write a script.
 
 
 In summary, our qgroup import/export commands will be implemented as follows:
 
 qgroupid  is_compressed  is_exclusive  limited_size   parent   full_path
 
 
 We may also specify a matching degree when importing the qgroup information.
 
 1) strict matching
   A level-0 qgroup must match a subvolume/snapshot's objectid and full path
   exactly. If a qgroup fails to match, the process will exit.
 
 2) general matching
   A level-0 qgroup only needs to match a subvolume/snapshot's full path.
   If the corresponding subvolume/snapshot does not exist, skip it. Otherwise,
   apply the modifications to the corresponding subvolume/snapshot qgroup.
 
 3) weak matching
   A level-0 qgroup only needs to match a subvolume/snapshot's full path.
   If the corresponding subvolume/snapshot does not exist, create the subvolume
   automatically (a tracking qgroup is also created automatically) and then
   apply the modifications to the newly created tracking qgroup.
 
 
 What do you think of the above idea?

I still have problems imagining a use case for this. In our setup we have
lots of subvolumes with a quota configuration that follows some rules, but
it won't be possible to just import/export from one machine to the other.
So you have to design the tool to your needs.
There's another essential tool that's still missing with regard to quota
which I'd love to see come to life:
Currently the configuration of the tracking qgroups is completely left to
the user. This requires a deep understanding of how qgroups work.
It would be great if we could come up with a simple description language
where the user just describes what he wants to achieve and the tool calculates
the tracking groups itself. It could also contain a templating mechanism
that might cover your use case.
A description might contain information e.g. which subvols to group, from which
subvols the user intends to take snapshots in the future and in which groups
those snapshots will be put. My pdf gives some example use cases which should
be possible to cover.
That's not exactly what you have in mind, but maybe it is possible to cover
both needs with one tool.

-Arne

 
 Thanks
 Miao

Re: [PATCH] Btrfs: fix off-by-one in lseek

2013-01-09 Thread David Sterba
On Wed, Jan 09, 2013 at 12:50:25PM +0100, David Sterba wrote:
 I've added only the 2 fixes from you, no other change. I'll do another test
 based on current btrfs-next.

Reproduced on linus/master + btrfs-next + debugging patch + your fixes
with test 068

[ 1285.152973] [ cut here ]
[ 1285.158880] WARNING: at fs/btrfs/extent-tree.c:7696 btrfs_free_block_groups+0x25b/0x370 [btrfs]()
[ 1285.169400] Hardware name: Santa Rosa platform
[ 1285.169410] Modules linked in: btrfs aoe dm_crypt loop [last unloaded: btrfs]
[ 1285.169414] Pid: 28160, comm: umount Not tainted 3.8.0-rc2-default+ #230
[ 1285.169420] Call Trace:
[ 1285.169431]  [8104c6bf] warn_slowpath_common+0x7f/0xc0
[ 1285.169438]  [8104c71a] warn_slowpath_null+0x1a/0x20
[ 1285.169462]  [a02108eb] btrfs_free_block_groups+0x25b/0x370 [btrfs]
[ 1285.169487]  [a021e1aa] close_ctree+0x1ca/0x340 [btrfs]
[ 1285.169495]  [81178c6f] ? dispose_list+0x4f/0x60
[ 1285.169502]  [811799d4] ? evict_inodes+0x114/0x130
[ 1285.169519]  [a01f1c69] btrfs_put_super+0x19/0x20 [btrfs]
[ 1285.169527]  [811608f2] generic_shutdown_super+0x62/0xf0
[ 1285.169536]  [81160a16] kill_anon_super+0x16/0x30
[ 1285.169552]  [a01f2d9a] btrfs_kill_super+0x1a/0x90 [btrfs]
[ 1285.169559]  [811618e2] ? deactivate_super+0x42/0x70
[ 1285.169566]  [81160c6d] deactivate_locked_super+0x3d/0x90
[ 1285.169573]  [811618ea] deactivate_super+0x4a/0x70
[ 1285.169582]  [8117dc70] mntput_no_expire+0x100/0x160
[ 1285.169587]  [8117ecb1] sys_umount+0x71/0x3c0
[ 1285.169593]  [81960cd9] system_call_fastpath+0x16/0x1b
[ 1285.169595] ---[ end trace 32a766aa6196f679 ]---
[ 1285.169598] space_info 4 has 1073651712 free, is not full
[ 1285.169601] space_info total=1082130432, used=24576, pinned=0, reserved=0, may_use=98304, readonly=8454144
[ 1285.169618] [ cut here ]
[ 1285.169638] WARNING: at fs/btrfs/disk-io.c:3211 free_fs_root+0x8b/0x90 [btrfs]()
[ 1285.169639] Hardware name: Santa Rosa platform
[ 1285.169646] Modules linked in: btrfs aoe dm_crypt loop [last unloaded: btrfs]
[ 1285.169650] Pid: 28160, comm: umount Tainted: G        W    3.8.0-rc2-default+ #230
[ 1285.169651] Call Trace:
[ 1285.169656]  [8104c6bf] warn_slowpath_common+0x7f/0xc0
[ 1285.169661]  [8104c71a] warn_slowpath_null+0x1a/0x20
[ 1285.169681]  [a0218f6b] free_fs_root+0x8b/0x90 [btrfs]
[ 1285.169703]  [a021c78e] btrfs_free_fs_root+0x7e/0x90 [btrfs]
[ 1285.169726]  [a021c87b] del_fs_roots+0xdb/0x120 [btrfs]
[ 1285.169747]  [a021092e] ? btrfs_free_block_groups+0x29e/0x370 [btrfs]
[ 1285.169771]  [a021e1b2] close_ctree+0x1d2/0x340 [btrfs]
[ 1285.169778]  [81178c6f] ? dispose_list+0x4f/0x60
[ 1285.169784]  [811799d4] ? evict_inodes+0x114/0x130
[ 1285.169801]  [a01f1c69] btrfs_put_super+0x19/0x20 [btrfs]
[ 1285.169808]  [811608f2] generic_shutdown_super+0x62/0xf0
[ 1285.169815]  [81160a16] kill_anon_super+0x16/0x30
[ 1285.169831]  [a01f2d9a] btrfs_kill_super+0x1a/0x90 [btrfs]
[ 1285.169838]  [811618e2] ? deactivate_super+0x42/0x70
[ 1285.169845]  [81160c6d] deactivate_locked_super+0x3d/0x90
[ 1285.169854]  [811618ea] deactivate_super+0x4a/0x70
[ 1285.169861]  [8117dc70] mntput_no_expire+0x100/0x160
[ 1285.169868]  [8117ecb1] sys_umount+0x71/0x3c0
[ 1285.169877]  [81960cd9] system_call_fastpath+0x16/0x1b
[ 1285.169881] ---[ end trace 32a766aa6196f67a ]---
[ 1285.550715] VFS: Busy inodes after unmount of sda9. Self-destruct in 5 seconds.  Have a nice day...

david


[PATCH 0/2] send: Avoid sending disknr==0 and PREALLOC extents when possible

2013-01-09 Thread Alex Lyakas
These two patches address the issue of sending unneeded zero data for
disknr==0 and PREALLOC extents.
There is room for additional improvement for PREALLOC extents, but it
requires adding a new command, so for now this is not addressed.

Please review and comment.

Thanks,
Alex.

Alex Lyakas (2):
  Avoid sending disknr==0 extents in the following cases: 1) full
send 2) new inode in a diff-send 3) when disknr==0 extents are
added to the end of an inode
  On a diff-send, avoid sending PREALLOC extents, if the parent root
has only PREALLOC extents on an appropriate file range.

 fs/btrfs/send.c |  178 +--
 1 file changed, 172 insertions(+), 6 deletions(-)

--
1.7.9.5


[PATCH 1/2] Avoid sending disknr==0 extents when possible

2013-01-09 Thread Alex Lyakas
Subject: [PATCH 1/2] Avoid sending disknr==0 extents in the following cases:
 1) full send 2) new inode in a diff-send 3) when
 disknr==0 extents are  added to the end of an inode

Original-version-by: Chen Yang chenyang.f...@cn.fujitsu.com
Signed-off-by: Alex Lyakas alex.bt...@zadarastorage.com
---
 fs/btrfs/send.c |   28 ++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 5445454..5ab584f 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -3844,7 +3844,8 @@ static int is_extent_unchanged(struct send_ctx *sctx,
btrfs_item_key_to_cpu(eb, &found_key, slot);
if (found_key.objectid != key.objectid ||
found_key.type != key.type) {
-   ret = 0;
+   /* No need to send a no-data extent in this case */
+   ret = (left_disknr == 0) ? 1 : 0;
goto out;
}

@@ -3870,7 +3871,8 @@ static int is_extent_unchanged(struct send_ctx *sctx,
 * This may only happen on the first iteration.
 */
if (found_key.offset + right_len <= ekey->offset) {
-   ret = 0;
+   /* No need to send a no-data extent in this case */
+   ret = (left_disknr == 0) ? 1 : 0;
goto out;
}

@@ -3951,6 +3953,28 @@ static int process_extent(struct send_ctx *sctx,
ret = 0;
goto out;
}
+   } else {
+   struct extent_buffer *eb;
+   struct btrfs_file_extent_item *ei;
+   u8 extent_type;
+   u64 extent_disknr;
+
+   eb = path->nodes[0];
+   ei = btrfs_item_ptr(eb, path->slots[0],
+   struct btrfs_file_extent_item);
+
+   extent_type = btrfs_file_extent_type(eb, ei);
+   extent_disknr = btrfs_file_extent_disk_bytenr(eb, ei);
+   if (extent_type == BTRFS_FILE_EXTENT_REG && extent_disknr == 0) {
+   /*
+* This is a disknr=0 extent in a full send or a new inode
+* in a diff send. Since we will send a truncate command
+* in finish_inode_if_needed anyway, the inode size will be
+* correct, and we don't have to send all-zero data.
+*/
+   ret = 0;
+   goto out;
+   }
}

ret = find_extent_clone(sctx, path, key-objectid, key-offset,
-- 
1.7.9.5


[PATCH 2/2] On a diff-send, avoid sending PREALLOC extent

2013-01-09 Thread Alex Lyakas
Subject: [PATCH 2/2] On a diff-send, avoid sending PREALLOC extents, if the
 parent root has only PREALLOC extents on an appropriate
 file range.

This does not fully avoid sending PREALLOC extents, because on a full send or
a new inode we need a new send command to do that. But this patch improves
the situation by handling diff sends.

Signed-off-by: Alex Lyakas alex.bt...@zadarastorage.com
---
 fs/btrfs/send.c |  150 +--
 1 file changed, 146 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 5ab584f..456bc3e 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -3763,6 +3763,144 @@ out:
return ret;
 }

+static int is_prealloc_extent_unchanged(struct send_ctx *sctx,
+  struct btrfs_path *left_path,
+  struct btrfs_key *ekey)
+{
+   int ret = 0;
+   struct btrfs_key key;
+   struct btrfs_path *path = NULL;
+   struct extent_buffer *eb;
+   int slot;
+   struct btrfs_key found_key;
+   struct btrfs_file_extent_item *ei;
+   u64 left_len;
+   u64 right_len;
+   u8 right_type;
+
+   eb = left_path->nodes[0];
+   slot = left_path->slots[0];
+   ei = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item);
+   left_len = btrfs_file_extent_num_bytes(eb, ei);
+
+   /*
+* The logic is similar, but much simpler than in is_extent_unchanged().
+* We need to check extents on the parent root, and make sure that only
+* PREALLOC extents are on the same file range as our current extent.
+*
+* Following comments will refer to these graphics. L is the left
+* extents which we are checking at the moment. 1-8 are the right
+* extents that we iterate.
+*
+*|-L-|
+* |-1-|-2a-|-3-|-4-|-5-|-6-|
+*
+*|-L-|
+* |--1--|-2b-|...(same as above)
+*
+* Alternative situation. Happens on files where extents got split.
+*|-L-|
+* |---7---|-6-|
+*
+* Alternative situation. Happens on files which got larger.
+*|-L-|
+* |-8-|
+* Nothing follows after 8.
+*/
+
+   path = alloc_path_for_send();
+   if (!path)
+   return -ENOMEM;
+
+   key.objectid = ekey->objectid;
+   key.type = BTRFS_EXTENT_DATA_KEY;
+   key.offset = ekey->offset;
+   ret = btrfs_search_slot_for_read(sctx->parent_root, &key, path, 0, 0);
+   if (ret < 0)
+   goto out;
+   if (ret) {
+   ret = 0;
+   goto out;
+   }
+
+   /*
+* Handle special case where the right side has no extents at all.
+*/
+   eb = path->nodes[0];
+   slot = path->slots[0];
+   btrfs_item_key_to_cpu(eb, &found_key, slot);
+   if (found_key.objectid != key.objectid ||
+   found_key.type != key.type) {
+   /*
+* We need to send a prealloc command, which we don't have yet,
+* just send this extent fully.
+*/
+   ret = 0;
+   goto out;
+   }
+
+   key = found_key;
+   while (key.offset < ekey->offset + left_len) {
+   ei = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item);
+   right_type = btrfs_file_extent_type(eb, ei);
+   right_len = btrfs_file_extent_num_bytes(eb, ei);
+
+   if (right_type != BTRFS_FILE_EXTENT_PREALLOC) {
+   ret = 0;
+   goto out;
+   }
+
+   /*
+* Are we at extent 8? If yes, we know the extent is changed.
+* This may only happen on the first iteration.
+*/
+   if (found_key.offset + right_len <= ekey->offset) {
+   /*
+* We need to send a prealloc command, which we don't have yet,
+* just send this extent fully.
+*/
+   ret = 0;
+   goto out;
+   }
+
+   /* The right extent is also PREALLOC, so up to here we are ok, continue checking */
+
+   ret = btrfs_next_item(sctx->parent_root, path);
+   if (ret < 0)
+   goto out;
+   if (!ret) {
+   eb = path->nodes[0];
+   slot = path->slots[0];
+   btrfs_item_key_to_cpu(eb, &found_key, slot);
+   }
+   if (ret || found_key.objectid != key.objectid ||
+   found_key.type != key.type) {
+   key.offset += right_len;
+   break;
+   } else {
+   if (found_key.offset != key.offset + right_len) {
+   /* 

Re: Btrfs: wipe all the superblock [redhat bugzilla 889888]

2013-01-09 Thread Karel Zak
On Wed, Jan 09, 2013 at 06:48:28PM +0100, Goffredo Baroncelli wrote:
 Hi Karel,
 
  
   You can specify more than one magic strings for the same filesystem,
   the .magics = { } is array.
 
 thanks for your suggestion. However, this does not seem applicable to me. I
 tried to change the code, and what I got seems inconsistent to me:
 
 With this change
 
 1) if I do wipefs device, I got the offset of the first superblock
 (good enough)
 2) if I do wipefs -a device, I clean-up *all three* superblocks
 (very good)
 3) if I do wipefs -o offset device, I clean-up only the superblock
 located at offset (very bad)

 this is expected behavior described in wipefs man page:

Note  that  some  filesystems  or  some partition tables store more
magic strings on the devices. The wipefs lists the first offset where
a magic string has been detected. The device is not  scanned  for
additional magic strings for the same filesystem. It's possible that
after wipefs -o offset will be the same filesystem or partition
table visible by another magic string on another offset.

 If a user who doesn't know btrfs well enough tries 1) and 3), they could think
 that the disk is cleaned up. Instead the 2nd and the 3rd superblocks still
 exist.

 well, users (and installers) usually use wipefs -a or wipefs -t fsname

   see for example libblkid/src/superblocks/reiserfs.c 
 
 I think that this is a different case: the reiser superblocks are

 it was example how to specify the magic strings in the code

 *alternative*; instead in the btrfs case, *all the three superblocks*
 exist at the same time.

 it is pretty common to have a backup superblock (e.g. GPT) or more
 ways to detect the filesystem (e.g. FAT).

 Please, send me the patch with the magic strings :-)
 
 I really don't want to add dummy filesystems to the library (like you
 did in the first version of the patch) -- it's very bad idea with
 many side effects.

Karel

-- 
 Karel Zak  k...@redhat.com
 http://karelzak.blogspot.com


[PATCH][v2] Btrfs: wipe all the superblock [redhat bugzilla 889888]

2013-01-09 Thread Goffredo Baroncelli
Hi all,

Currently wipefs doesn't clear all the btrfs superblocks. Only the first
one is cleared.

Btrfs has three superblocks. The first one is placed at 64KB, the second
one at 64MB, the third one at 256GB.

If the first superblock is valid except that the magic field is zeroed,
btrfs skips the check of the other superblocks.
If the first superblock is fully invalid, btrfs checks the other
superblocks.

So zeroing the magic field of the first superblock makes it look as if the
filesystem is wiped. But when the first superblock is overwritten
(e.g. by another filesystem), the other two superblocks may be considered
valid, and the filesystem may resurrect.


# make a filesystem, wipe it and check if it disappears

$ sudo mkfs.btrfs -L Btrfs-test /dev/loop0
$ sudo btrfs filesystem  show /dev/loop0
Label: 'Btrfs-test'  uuid: 3156cef7-8522-411f-876a-ba8ec32cc781
Total devices 1 FS bytes used 28.00KB
	devid    1 size 4.00TB used 2.04GB path /dev/loop0

Btrfs Btrfs v0.19
$ sudo wipefs /dev/loop0
offset   type

0x10040  btrfs   [filesystem]
 LABEL: Btrfs-test
 UUID:  3156cef7-8522-411f-876a-ba8ec32cc781

$ sudo wipefs /dev/loop0 -a
8 bytes were erased at offset 0x10040 (btrfs)
they were: 5f 42 48 52 66 53 5f 4d
$ sudo btrfs filesystem  show /dev/loop0
Btrfs Btrfs v0.19

# it seems that the filesystem has disappeared
# now zero the whole first superblock

$ sudo dd if=/dev/zero of=/dev/loop0 bs=1 count=4k seek=64k
4096+0 records in
4096+0 records out
4096 bytes (4.1 kB) copied, 0.00659795 s, 621 kB/s

# check if the filesystem is resurrected
$ sudo btrfs filesystem  show /dev/loop0
failed to read /dev/sr0
Label: 'Btrfs-test'  uuid: 3156cef7-8522-411f-876a-ba8ec32cc781
Total devices 1 FS bytes used 28.00KB
	devid    1 size 4.00TB used 2.04GB path /dev/loop0

Btrfs Btrfs v0.19

# it is !!!


# With this patch, wipefs is able to wipe all the superblocks:

$ sudo mkfs.btrfs -L Btrfs-test /dev/loop0
$ sudo ~/btrfs/util-linux/wipefs /dev/loop0
offset   type

0x10040  btrfs   [filesystem]
 LABEL: Btrfs-test
 UUID:  5ca3239c-363c-4c28-b831-eac03cc5ca62

$ sudo ~/btrfs/util-linux/wipefs -a /dev/loop0 
/dev/loop0: 8 bytes were erased at offset 0x00010040 (btrfs): 5f 42 48 52 66 53 5f 4d
/dev/loop0: 8 bytes were erased at offset 0x04000040 (btrfs): 5f 42 48 52 66 53 5f 4d
/dev/loop0: 8 bytes were erased at offset 0x4000000040 (btrfs): 5f 42 48 52 66 53 5f 4d

# Now even if we zero the 1st superblock, the filesystem doesn't resurrect

$ sudo dd if=/dev/zero of=/dev/loop0 bs=1 count=4k seek=64k
4096+0 records in
4096+0 records out
4096 bytes (4.1 kB) copied, 0.00643427 s, 637 kB/s
$ sudo btrfs filesystem  show /dev/loop0
Btrfs Btrfs v0.19
$

Br 
G.Baroncelli

Signed-off-by: Goffredo Baroncelli kreij...@inwind.it

ChangeLog

V1 -> V2	Removed the three different superblock entries and put
		all the info in the same array (thanks to Karel Zak
		for the suggestion)


--

Btrfs has three superblocks. The first one is placed at 64KB, the
second one at 64MB, the third one at 256GB.

If the first superblock is valid except that the magic field is zeroed,
btrfs skips the check of the other superblocks.
If the first superblock is fully invalid, btrfs checks the other
superblocks.

So zeroing the magic field of the first superblock makes it look as if the
filesystem is wiped. But when the first superblock is overwritten
(e.g. by another filesystem), the other two superblocks may be considered
valid, and the filesystem may resurrect.

This patch makes it possible to find and wipe the other btrfs superblock
signatures.
---
 libblkid/src/superblocks/btrfs.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/libblkid/src/superblocks/btrfs.c b/libblkid/src/superblocks/btrfs.c
index 039be42..cb5004f 100644
--- a/libblkid/src/superblocks/btrfs.c
+++ b/libblkid/src/superblocks/btrfs.c
@@ -87,6 +87,14 @@ const struct blkid_idinfo btrfs_idinfo =
.magics =
{
{ .magic = "_BHRfS_M", .len = 8, .kboff = 64, .sboff = 0x40 },
+   { .magic = "_BHRfS_M",
+     .len = 8,
+     .kboff = 64 * 1024,
+     .sboff = 0x40 },
+   { .magic = "_BHRfS_M",
+     .len = 8,
+     .kboff = 256 * 1024 * 1024,
+     .sboff = 0x40 },
{ NULL }
}
 };
-- 
1.7.10.4 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  

Re: [PATCH 1/2] Btrfs: add leak debug for extent map

2013-01-09 Thread Liu Bo
On Tue, Jan 08, 2013 at 12:07:34PM -0800, Zach Brown wrote:
  This is for detecting extent map leak.
 
 Hmm, I guess it's cool to get the allocation-specific decoding which you
 don't get from the generic kernel leak tracking?

Hi Zach,

Thanks for the advice, but which allocation-specific decoding are you referring to?
Could you please show me an example?

 
  +static LIST_HEAD(emaps);
 
  +   while (!list_empty(&emaps)) {
  +   em = list_entry(emaps.next, struct extent_map, leak_list);
  +   printk(KERN_ERR "btrfs ext map leak: start %llu len %llu block %llu flags %llu refs %d in tree %d compress %d\n",
  +   em->start, em->len, em->block_start, em->flags,
  +   atomic_read(&em->refs), em->in_tree, em->compress_type);
  +   list_del(&em->leak_list);
  +   kmem_cache_free(extent_map_cache, em);
 
  +   struct list_head leak_list;
 
 Might as well protect all that with ifdefs, too, if you're going to do
 it that way?

All right, I'm happy to do that.
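Zach's suggestion is the usual debug-only pattern: link every live object into a global list, unlink it on free, and report whatever is still linked at teardown, with all of it compiled out unless a debug switch is set. Below is a minimal single-threaded userspace sketch of that pattern; `LEAK_DEBUG`, `struct obj`, `obj_alloc()`, `obj_free()` and `leak_check()` are hypothetical names, and a kernel version would use `struct list_head`, a slab cache and a lock instead.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical debug switch: build with -DLEAK_DEBUG=0 to compile tracking out. */
#ifndef LEAK_DEBUG
#define LEAK_DEBUG 1
#endif

struct obj {
    unsigned long long start;
    unsigned long long len;
#if LEAK_DEBUG
    struct obj *leak_prev, *leak_next;  /* links in the global leak list */
#endif
};

#if LEAK_DEBUG
static struct obj *leak_head;  /* every live object; not thread-safe */

static void leak_add(struct obj *o)
{
    o->leak_prev = NULL;
    o->leak_next = leak_head;
    if (leak_head)
        leak_head->leak_prev = o;
    leak_head = o;
}

static void leak_del(struct obj *o)
{
    if (o->leak_prev)
        o->leak_prev->leak_next = o->leak_next;
    else
        leak_head = o->leak_next;
    if (o->leak_next)
        o->leak_next->leak_prev = o->leak_prev;
}
#else
static inline void leak_add(struct obj *o) { (void)o; }
static inline void leak_del(struct obj *o) { (void)o; }
#endif

static struct obj *obj_alloc(unsigned long long start, unsigned long long len)
{
    struct obj *o = malloc(sizeof(*o));
    if (!o)
        return NULL;
    o->start = start;
    o->len = len;
    leak_add(o);
    return o;
}

static void obj_free(struct obj *o)
{
    leak_del(o);
    free(o);
}

/* Report and free every object that was never freed; returns how many there were. */
static int leak_check(void)
{
    int leaked = 0;
#if LEAK_DEBUG
    while (leak_head) {
        struct obj *o = leak_head;
        fprintf(stderr, "leak: start %llu len %llu\n", o->start, o->len);
        obj_free(o);  /* also unlinks o, advancing leak_head */
        leaked++;
    }
#endif
    return leaked;
}
```

Keeping the list fields and helpers behind the same switch means a non-debug build pays neither the per-object memory nor the list maintenance.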

Thanks,
liubo


Re: [PATCH] Btrfs: fix off-by-one in lseek

2013-01-09 Thread Liu Bo
On Wed, Jan 09, 2013 at 12:50:25PM +0100, David Sterba wrote:
 On Wed, Jan 09, 2013 at 12:34:45PM +0800, Liu Bo wrote:
   [20191.948060] D: __set_extent_bit isize = 0 odd range [774144,7384799041917984768)
   [20191.956581] D: clear_extent_bit isize = 0 odd range [774144,7384799041917984768)
   
   so I'm not sending it as a separate patch yet until the check covers all 
   cases.
  
  Thanks for coding this up, I've checked the code, these messages can
  be fixed by the following, please check if it works on your side :)
 
 Thanks, no more of these warnings. There was one new to me, during test
 013:
 
 [  348.433006] [ cut here ]
 [  348.438926] WARNING: at fs/btrfs/disk-io.c:3210 free_fs_root+0x8b/0x90 [btrfs]()
 [  348.447596] Hardware name: Santa Rosa platform
 [  348.447602] Modules linked in: aoe dm_crypt loop btrfs
 [  348.447605] Pid: 9091, comm: umount Not tainted 3.8.0-rc2-default+ #229
 [  348.447607] Call Trace:
 [  348.447615]  [8104c6bf] warn_slowpath_common+0x7f/0xc0
 [  348.447619]  [8104c71a] warn_slowpath_null+0x1a/0x20
 [  348.447635]  [a002af3b] free_fs_root+0x8b/0x90 [btrfs]
 [  348.447652]  [a002e75e] btrfs_free_fs_root+0x7e/0x90 [btrfs]
 [  348.447668]  [a002e84b] del_fs_roots+0xdb/0x120 [btrfs]
 [  348.447683]  [a002292e] ? btrfs_free_block_groups+0x29e/0x370 [btrfs]
 [  348.447699]  [a0030182] close_ctree+0x1d2/0x340 [btrfs]
 [  348.447705]  [81178c6f] ? dispose_list+0x4f/0x60
 [  348.447711]  [811799d4] ? evict_inodes+0x114/0x130
 [  348.447722]  [a0003c69] btrfs_put_super+0x19/0x20 [btrfs]
 [  348.447727]  [811608f2] generic_shutdown_super+0x62/0xf0
 [  348.447730]  [81160a16] kill_anon_super+0x16/0x30
 [  348.447741]  [a0004d9a] btrfs_kill_super+0x1a/0x90 [btrfs]
 [  348.447744]  [811618e2] ? deactivate_super+0x42/0x70
 [  348.447748]  [81160c6d] deactivate_locked_super+0x3d/0x90
 [  348.447751]  [811618ea] deactivate_super+0x4a/0x70
 [  348.447755]  [8117dc70] mntput_no_expire+0x100/0x160
 [  348.447759]  [8117ecb1] sys_umount+0x71/0x3c0
 [  348.447763]  [81960919] system_call_fastpath+0x16/0x1b
 [  348.447765] ---[ end trace 25a08f78869c0553 ]---
 [  348.614158] VFS: Busy inodes after unmount of sda8. Self-destruct in 5 seconds.  Have a nice day...
 
 looks like a leaked inode. The line number does not match a WARN in the
 sources, this one is better:
 
 (gdb) l *(free_fs_root+0x8b)
 0x2ab5b is in free_fs_root (fs/btrfs/disk-io.c:3206).
 3201    }
 3202
 3203    static void free_fs_root(struct btrfs_root *root)
 3204    {
 3205            iput(root->cache_inode);
 3206            WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree));
 3207            if (root->anon_dev)
 3208                    free_anon_bdev(root->anon_dev);
 3209            free_extent_buffer(root->node);
 3210            free_extent_buffer(root->commit_root);
 
 I've added only the 2 fixes from you, no other change. I'll do another test
 based on current btrfs-next.

Thanks for the report, could you please show me what options you're
using?

thanks,
liubo


Re: [PATCH RESEND] vfs: re-implement writeback_inodes_sb(_nr)_if_idle() and rename them

2013-01-09 Thread Miao Xie
On Fri, 28 Dec 2012 15:33:38 +0100, David Sterba wrote:
 On Thu, Dec 20, 2012 at 06:09:35PM +0800, Miao Xie wrote:
 --- a/fs/fs-writeback.c
 +++ b/fs/fs-writeback.c
 @@ -1314,7 +1314,6 @@ void writeback_inodes_sb_nr(struct super_block *sb,
   bdi_queue_work(sb->s_bdi, &work);
   wait_for_completion(&done);
  }
 -EXPORT_SYMBOL(writeback_inodes_sb_nr);
 
 Why do you remove the export? writeback_inodes_sb is exported as well.

As you said below, there is no user now.

 Originally the _nr variant has been introduced for btrfs
 (3259f8bed2f0f57c2fdcdac1b510c3fa319ef97e) and there are no other users
 now, so from that point it would make sense. From the other side, the
 change is not strictly necessary for this patch and keeps the writeback
 API a bit more flexible. I vote for keeping it.

Maybe you are right, I'll send out a new one.

 Otherwise (for the btrfs part),
 Tested-by: David Sterba dste...@suse.cz

Thanks for your test and review.
Miao



[PATCH V3] Btrfs: flush all dirty inodes if writeback can not start

2013-01-09 Thread Miao Xie
When there is not enough space to reserve, we may try to flush some dirty
pages. But it is possible that this operation fails to start; in that case,
to get enough space for the reservation, we sync all the delalloc files
instead. This is safe: we needn't worry about the filesystem going from r/w
to r/o, because the filesystem must guarantee that all dirty pages have been
written to disk once it becomes read-only, so the sync does nothing if the
filesystem is already read-only. It may waste a lot of time, but as this is
a corner case, we needn't care.
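The fallback described above can be sketched as follows. This is a single-threaded userspace model, not kernel code: `struct toy_sb`, the `toy_rwsem` type with its trylock, and the two counters are hypothetical stand-ins for `sb->s_umount`, writeback_inodes_sb_nr() and btrfs_start_delalloc_inodes(). It only illustrates the control flow: report success if the flusher is already busy, start writeback if `s_umount` can be read-locked without waiting, and otherwise fall back to syncing all delalloc inodes.

```c
#include <stdbool.h>

/* Toy reader/writer semaphore: a single-threaded model with no real locking. */
struct toy_rwsem {
    int readers;
    bool writer;
};

static bool toy_down_read_trylock(struct toy_rwsem *s)
{
    if (s->writer)
        return false;  /* e.g. umount holds it for writing: do not wait */
    s->readers++;
    return true;
}

static void toy_up_read(struct toy_rwsem *s)
{
    s->readers--;
}

struct toy_sb {
    bool writeback_in_progress;   /* the flusher is already busy */
    struct toy_rwsem s_umount;
    unsigned long pages_queued;   /* stands in for writeback_inodes_sb_nr() */
    int delalloc_syncs;           /* stands in for btrfs_start_delalloc_inodes() */
};

/* Returns 1 if writeback is (or already was) under way, 0 if it could not start. */
static int try_writeback_nr(struct toy_sb *sb, unsigned long nr_pages)
{
    if (sb->writeback_in_progress)
        return 1;
    if (!toy_down_read_trylock(&sb->s_umount))
        return 0;
    sb->pages_queued += nr_pages;
    toy_up_read(&sb->s_umount);
    return 1;
}

/* Same shape as btrfs_writeback_inodes_sb_nr() in the patch: fall back if not started. */
static void writeback_or_sync_delalloc(struct toy_sb *sb, unsigned long nr_pages)
{
    if (!try_writeback_nr(sb, nr_pages))
        sb->delalloc_syncs++;
}
```

The point of returning 1 when the flusher is already running is that the caller then does not trigger the expensive full delalloc sync on top of work that is already in flight.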

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
Changelog v2 - v3:
- remove unnecessary btrfs_wait_ordered_extents()

Changelog v1 - v2:
- make the function static
---
 fs/btrfs/extent-tree.c | 45 -
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index b6ed965..93a2bfc 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3695,12 +3695,15 @@ static int can_overcommit(struct btrfs_root *root,
return 0;
 }
 
-static int writeback_inodes_sb_nr_if_idle_safe(struct super_block *sb,
-  unsigned long nr_pages,
-  enum wb_reason reason)
+static inline int writeback_inodes_sb_nr_if_idle_safe(struct super_block *sb,
+ unsigned long nr_pages,
+ enum wb_reason reason)
 {
-   if (!writeback_in_progress(sb->s_bdi) &&
-   down_read_trylock(&sb->s_umount)) {
+   /* the flusher is dealing with the dirty inodes now. */
+   if (writeback_in_progress(sb->s_bdi))
+   return 1;
+
+   if (down_read_trylock(&sb->s_umount)) {
writeback_inodes_sb_nr(sb, nr_pages, reason);
up_read(&sb->s_umount);
return 1;
@@ -3709,6 +3712,27 @@ static int writeback_inodes_sb_nr_if_idle_safe(struct super_block *sb,
return 0;
 }
 
+static void btrfs_writeback_inodes_sb_nr(struct btrfs_root *root,
+unsigned long nr_pages)
+{
+   struct super_block *sb = root->fs_info->sb;
+   int started;
+
+   /* If we can not start writeback, just sync all the delalloc file. */
+   started = writeback_inodes_sb_nr_if_idle_safe(sb, nr_pages,
+ WB_REASON_FS_FREE_SPACE);
+   if (!started) {
+   /*
+    * We needn't worry the filesystem going from r/w to r/o though
+    * we don't acquire ->s_umount mutex, because the filesystem
+    * should guarantee the delalloc inodes list be empty after
+    * the filesystem is readonly(all dirty pages are written to
+    * the disk).
+    */
+   btrfs_start_delalloc_inodes(root, 0);
+   }
+}
+
 /*
  * shrink metadata reservation for delalloc
  */
@@ -3738,13 +3762,12 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig,
return;
}
 
+   flush = trans ? BTRFS_RESERVE_NO_FLUSH : BTRFS_RESERVE_FLUSH_ALL;
+
while (delalloc_bytes && loops < 3) {
max_reclaim = min(delalloc_bytes, to_reclaim);
nr_pages = max_reclaim >> PAGE_CACHE_SHIFT;
-   writeback_inodes_sb_nr_if_idle_safe(root->fs_info->sb,
-   nr_pages,
-   WB_REASON_FS_FREE_SPACE);
-
+   btrfs_writeback_inodes_sb_nr(root, nr_pages);
/*
 * We need to wait for the async pages to actually start before
 * we do anything.
@@ -3752,10 +3775,6 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig,
wait_event(root->fs_info->async_submit_wait,
   !atomic_read(&root->fs_info->async_delalloc_pages));
 
-   if (!trans)
-   flush = BTRFS_RESERVE_FLUSH_ALL;
-   else
-   flush = BTRFS_RESERVE_NO_FLUSH;
spin_lock(&space_info->lock);
if (can_overcommit(root, space_info, orig, flush)) {
spin_unlock(&space_info->lock);
-- 
1.7.11.7



Re: Btrfs fs not mounting or being identified after power loss.

2013-01-09 Thread Randy Barlow

On 01/08/2013 10:46 PM, Jordan Windsor wrote:
 kernel is at 3.6.11-1-ARCH

Sorry, I don't know enough to help you, but I would suggest using
a newer kernel in the future. It sounds like your FS might be in
trouble as is, but I would recommend using the most recent RC kernel
(3.8-rc3) and restoring your data to a new btrfs from backups. In
general, it's a good idea to stick to fairly recent kernels with btrfs.
Hope this helps!

-- 
R


[PATCH V2] vfs: re-implement writeback_inodes_sb(_nr)_if_idle() and rename them

2013-01-09 Thread Miao Xie
writeback_inodes_sb(_nr)_if_idle() is re-implemented by replacing down_read()
with down_read_trylock() because:
- If ->s_umount is write-locked, the sb is not idle, so
  writeback_inodes_sb(_nr)_if_idle() needn't wait for the lock.
- writeback_inodes_sb(_nr)_if_idle() grabs the s_umount lock when it wants to
  start writeback, which can deadlock during umount. To avoid this, ext4 and
  btrfs implemented their own writeback functions instead of using
  writeback_inodes_sb(_nr)_if_idle(), but that introduced redundant code; it
  is better to implement a new writeback_inodes_sb(_nr)_if_idle().

The names of these two functions are cumbersome, so rename them to
try_to_writeback_inodes_sb(_nr).

This idea came from Christoph Hellwig.
Some code is from the patch of Kamal Mostafa.
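The ext4 hunk in this patch only changes how writeback is kicked off; the two thresholds in ext4_nonda_switch() are unchanged. To illustrate what those comparisons compute, here is a minimal sketch over plain signed 64-bit counts; the function names are hypothetical, and the `watermark` parameter stands in for EXT4_FREECLUSTERS_WATERMARK, whose actual value is not shown here.

```c
#include <stdbool.h>

/* Start pushing delalloc once more than half of the free blocks are dirty. */
static bool should_start_writeback(long long free_blocks, long long dirty_blocks)
{
    return dirty_blocks != 0 && free_blocks < 2 * dirty_blocks;
}

/*
 * Switch to non-delalloc writes when free space is low relative to the
 * dirty count: free < 1.5 * dirty, or free < dirty + watermark.
 */
static bool should_switch_to_nondelalloc(long long free_blocks,
                                         long long dirty_blocks,
                                         long long watermark)
{
    return 2 * free_blocks < 3 * dirty_blocks ||
           free_blocks < dirty_blocks + watermark;
}
```

Both comparisons are kept in integer form (2 * free vs 3 * dirty) rather than dividing, which avoids rounding and matches the shape of the ext4 condition.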

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
Changelog v1 - v2:
- do not remove EXPORT_SYMBOL of writeback_inodes_sb_nr()
---
 fs/btrfs/extent-tree.c| 20 +++-
 fs/ext4/inode.c   |  8 ++--
 fs/fs-writeback.c | 44 
 include/linux/writeback.h |  6 +++---
 4 files changed, 28 insertions(+), 50 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 521e9d4..f31abb1 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3689,20 +3689,6 @@ static int can_overcommit(struct btrfs_root *root,
return 0;
 }
 
-static int writeback_inodes_sb_nr_if_idle_safe(struct super_block *sb,
-  unsigned long nr_pages,
-  enum wb_reason reason)
-{
-   if (!writeback_in_progress(sb->s_bdi) &&
-   down_read_trylock(&sb->s_umount)) {
-   writeback_inodes_sb_nr(sb, nr_pages, reason);
-   up_read(&sb->s_umount);
-   return 1;
-   }
-
-   return 0;
-}
-
 /*
  * shrink metadata reservation for delalloc
  */
@@ -3735,9 +3721,9 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig,
while (delalloc_bytes && loops < 3) {
max_reclaim = min(delalloc_bytes, to_reclaim);
nr_pages = max_reclaim >> PAGE_CACHE_SHIFT;
-   writeback_inodes_sb_nr_if_idle_safe(root->fs_info->sb,
-   nr_pages,
-   WB_REASON_FS_FREE_SPACE);
+   try_to_writeback_inodes_sb_nr(root->fs_info->sb,
+ nr_pages,
+ WB_REASON_FS_FREE_SPACE);
 
/*
 * We need to wait for the async pages to actually start before
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index cbfe13b..5f6eef7 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2512,12 +2512,8 @@ static int ext4_nonda_switch(struct super_block *sb)
/*
 * Start pushing delalloc when 1/2 of free blocks are dirty.
 */
-   if (dirty_blocks && (free_blocks < 2 * dirty_blocks) &&
-   !writeback_in_progress(sb->s_bdi) &&
-   down_read_trylock(&sb->s_umount)) {
-   writeback_inodes_sb(sb, WB_REASON_FS_FREE_SPACE);
-   up_read(&sb->s_umount);
-   }
+   if (dirty_blocks && (free_blocks < 2 * dirty_blocks))
+   try_to_writeback_inodes_sb(sb, WB_REASON_FS_FREE_SPACE);
 
if (2 * free_blocks < 3 * dirty_blocks ||
free_blocks < (dirty_blocks + EXT4_FREECLUSTERS_WATERMARK)) {
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 310972b..ad3cc46 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1332,47 +1332,43 @@ void writeback_inodes_sb(struct super_block *sb, enum wb_reason reason)
 EXPORT_SYMBOL(writeback_inodes_sb);
 
 /**
- * writeback_inodes_sb_if_idle -   start writeback if none underway
+ * try_to_writeback_inodes_sb_nr - try to start writeback if none underway
  * @sb: the superblock
- * @reason: reason why some writeback work was initiated
+ * @nr: the number of pages to write
+ * @reason: the reason of writeback
  *
- * Invoke writeback_inodes_sb if no writeback is currently underway.
+ * Invoke writeback_inodes_sb_nr if no writeback is currently underway.
  * Returns 1 if writeback was started, 0 if not.
  */
-int writeback_inodes_sb_if_idle(struct super_block *sb, enum wb_reason reason)
+int try_to_writeback_inodes_sb_nr(struct super_block *sb,
+ unsigned long nr,
+ enum wb_reason reason)
 {
-   if (!writeback_in_progress(sb->s_bdi)) {
-   down_read(&sb->s_umount);
-   writeback_inodes_sb(sb, reason);
-   up_read(&sb->s_umount);
+   if (writeback_in_progress(sb->s_bdi))
return 1;
-   } else
+
+   if (!down_read_trylock(&sb->s_umount))
return 0;
+
+   writeback_inodes_sb_nr(sb, nr, reason);
+