[PATCH] Btrfs: update inode flags when renaming

2013-02-22 Thread Liu Bo
A user reported some weird behaviour: if we move a file with the noCow
flag into a directory without the noCow flag, the file loses the flag,
but after a remount the file's noCow flag comes back.

This happens because we missed a proper inode update after inheriting
the parent directory's flags.

Reported-by: Marios Titas <redneb8...@gmail.com>
Signed-off-by: Liu Bo <bo.li@oracle.com>
---
 fs/btrfs/inode.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d9984fa..d2e3352 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7478,8 +7478,6 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 					old_dentry->d_inode,
 					old_dentry->d_name.name,
 					old_dentry->d_name.len);
-		if (!ret)
-			ret = btrfs_update_inode(trans, root, old_inode);
 	}
 	if (ret) {
 		btrfs_abort_transaction(trans, root, ret);
@@ -7514,6 +7512,11 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 	}
 
 	fixup_inode_flags(new_dir, old_inode);
+	ret = btrfs_update_inode(trans, root, old_inode);
+	if (ret) {
+		btrfs_abort_transaction(trans, root, ret);
+		goto out_fail;
+	}
 
 	ret = btrfs_add_link(trans, new_dir, old_inode,
 			     new_dentry->d_name.name,
-- 
1.7.7.6
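
Editorial note: the ordering problem described in the commit message can be sketched in user space. The following is a toy model under stated assumptions, not kernel code: `toy_inode`, `toy_update_inode`, `toy_fixup_flags`, `toy_remount` and the flag value are hypothetical stand-ins for the in-memory inode, `btrfs_update_inode()`, `fixup_inode_flags()` and a remount. It only shows why persisting the inode before the flags are changed lets the old flag reappear after a remount.

```c
#include <assert.h>

#define TOY_NODATACOW 0x2u /* illustrative bit for the "C" attribute */

/* Toy inode: one copy of the flags in memory, one copy "on disk". */
struct toy_inode {
	unsigned int mem_flags;
	unsigned int disk_flags;
};

/* Stand-in for btrfs_update_inode(): persist the in-memory state. */
static void toy_update_inode(struct toy_inode *inode)
{
	inode->disk_flags = inode->mem_flags;
}

/* Stand-in for fixup_inode_flags(): take the noCow bit from the new parent. */
static void toy_fixup_flags(unsigned int dir_flags, struct toy_inode *inode)
{
	if (dir_flags & TOY_NODATACOW)
		inode->mem_flags |= TOY_NODATACOW;
	else
		inode->mem_flags &= ~TOY_NODATACOW;
}

/* Ordering before the patch: persist first, then change flags in memory only. */
static void toy_rename_before(unsigned int dir_flags, struct toy_inode *inode)
{
	toy_update_inode(inode);
	toy_fixup_flags(dir_flags, inode);
}

/* Ordering after the patch: change the flags, then persist them. */
static void toy_rename_after(unsigned int dir_flags, struct toy_inode *inode)
{
	toy_fixup_flags(dir_flags, inode);
	toy_update_inode(inode);
}

/* A remount drops the in-memory copy and reloads it from disk. */
static void toy_remount(struct toy_inode *inode)
{
	inode->mem_flags = inode->disk_flags;
}
```

With the old ordering the flag is cleared in memory but returns after `toy_remount()`; with the patched ordering the cleared state survives the remount.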

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: update inode flags when renaming

2013-02-22 Thread Marios Titas
Sorry, but the bug persists even with the above patch.

touch test
chattr +C test
lsattr test
mv test test2
lsattr test2

In the above scenario test2 will not have the C flag.



Re: [PATCH] Btrfs: use reserved space for creating a snapshot

2013-02-22 Thread Miao Xie
On Fri, 22 Feb 2013 12:33:36 +0800, Liu Bo wrote:
> While inserting dir index and updating inode for a snapshot, we'd
> add delayed items which consume trans->block_rsv, if we don't have
> any space reserved in this trans handle, we either just return or
> reserve space again.
>
> But before creating pending snapshots during committing transaction,
> we've done a release on this trans handle, so we don't have space reserved
> in it at this stage.
>
> What we're using is block_rsv of pending snapshots which has already
> reserved well enough space for both inserting dir index and updating
> inode, so we need to set trans handle to indicate that we have space
> now.
>
> Signed-off-by: Liu Bo <bo.li@oracle.com>

Reviewed-by: Miao Xie <mi...@cn.fujitsu.com>

> ---
>  fs/btrfs/transaction.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index fc03aa6..5878bb4 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -1063,6 +1063,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>
>  	rsv = trans->block_rsv;
>  	trans->block_rsv = pending->block_rsv;
> +	trans->bytes_reserved = trans->block_rsv->reserved;
>
>  	dentry = pending->dentry;
>  	parent = dget_parent(dentry);
> @@ -1216,6 +1217,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>  fail:
>  	dput(parent);
>  	trans->block_rsv = rsv;
> +	trans->bytes_reserved = 0;
>  no_free_objectid:
>  	kfree(new_root_item);
>  root_item_alloc_fail:



Re: [PATCH] Btrfs: update inode flags when renaming

2013-02-22 Thread Marios Titas
Wouldn't inheriting, though, create all sorts of problems? For instance,
see the example that I give in my other response [1].

[1] http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg22396.html

On Fri, Feb 22, 2013 at 4:34 AM, Miao Xie <mi...@cn.fujitsu.com> wrote:
> On Fri, 22 Feb 2013 16:40:35 +0800, Liu Bo wrote:
>> On Fri, Feb 22, 2013 at 03:32:50AM -0500, Marios Titas wrote:
>>> Sorry, but the bug persists even with the above patch.
>>>
>>> touch test
>>> chattr +C test
>>> lsattr test
>>> mv test test2
>>> lsattr test2
>>>
>>> In the above scenario test2 will not have the C flag.
>>
>> What do you expect?  IMO it's right that test2 does not have the C flag.
>
> No, it's not right.
> For the users, they expect the C flag is not lost because they just do
> a rename operation. but fixup_inode_flags() re-sets the flags by the
> parent directory's flag.
>
> I think we should inherit the flags from the parent just when we create
> a new file/directory, in the other cases, just give a option to the users.
> How do you think about?
>
> Thanks
> Miao
>
>>
>> This patch ensure that we get the same result after we remount, no more
>> the C flag coming back :)
>>
>> thanks,
>> liubo




Re: [PATCH] Btrfs: update inode flags when renaming

2013-02-22 Thread Liu Bo
On Fri, Feb 22, 2013 at 04:10:37AM -0500, Marios Titas wrote:
> You are right, your patch does improve the situation a bit. But it
> still does not address the main issue. To illustrate that, consider
> the following scenario:

Sorry for causing users so much confusion.

Please let me explain the following scenario.

> touch test
> chattr +C test
> head -c 1048576 /dev/zero > test
> mv test test2
>
> now test2 lost the C flag because it was renamed. But the data in
> test2 was written before it lost the C flag and so the extents do not
> have checksums!
> Also try to clone it with BTRFS_IOC_CLONE. It fails as if it had the C flag:
>
> cp --reflink test2 test3

We don't clone a file when the src (test2) file has NODATASUM and the dest
(test3) file does not have NODATASUM, or vice versa.  This ensures that our
checksums stay valid.

Here,
*  test2 does have NODATASUM because test had NODATASUM, while
*  test3 is a newly created file, and we have neither mounted with
   '-o nodatasum' or '-o nodatacow' nor run chattr on test3,
   so test3 does not have the NODATASUM flag set.

So 'cp' ends up with 'INVALID'.

> OTOH, if you try to clone over a file with NODATACOW then it works:
>
> touch test3
> chattr +C test3
> cp --reflink test2 test3

Now test3 has NODATACOW, so the above 'cp' works.

> so the file is in an inconsistent state: it sometimes behaves as if it
> had the NODATACOW flag and sometimes as if it didn't.

The C flag refers to NODATACOW, and NODATACOW tells btrfs whether to
write the file's data in COW mode.

So a failed 'clone' does not mean that the file is NODATACOW.

Feel free to correct me.

thanks,
liubo
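
Editorial note: the clone rule explained above (source and destination must agree on NODATASUM) can be written as a small predicate. This is a minimal user-space sketch, not the kernel's implementation; `toy_clone_check` and the `TOY_*` bit values are hypothetical names chosen for illustration. It also assumes, as the explanation implies, that a file created with the C flag ends up with NODATASUM set as well.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative flag bits named after the inode flags discussed above. */
#define TOY_NODATASUM 0x1u
#define TOY_NODATACOW 0x2u

/*
 * Return 0 if a clone may proceed, or -EINVAL if exactly one of the two
 * files has checksums disabled. NODATACOW itself is not consulted.
 */
static int toy_clone_check(unsigned int src_flags, unsigned int dst_flags)
{
	if ((src_flags & TOY_NODATASUM) != (dst_flags & TOY_NODATASUM))
		return -EINVAL;
	return 0;
}
```

In this model a renamed file that lost NODATACOW but kept NODATASUM cannot be cloned into a fresh file, yet can be cloned into one that was created with both bits set, which matches the two 'cp --reflink' outcomes in the thread.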

 


Re: Snapshot Cleaner not Working with inode_cache

2013-02-22 Thread Norbert Scheibner

On 20.02.2013 at 02:14, Liu Bo <bo.li@oracle.com> wrote:

> I think I know why inode_cache keeps us from freeing space: inode_cache
> adds a cache_inode in each btrfs root, and this cache_inode will be iput
> at the very last stage during umount, i.e. after we do cleanup work on
> old snapshot/subvols, where we free the space.
>
> A remount will force btrfs to do cleanup work on old snapshots during
> mount.
>
> This may explain the situation.
>
> thanks,
> liubo

I don't know how long the code has behaved that way, but this is
exactly what I see here on Debian kernel 3.2.35.

Norbert



Re: Snapshot Cleaner not Working with inode_cache

2013-02-22 Thread Liu Bo
On Fri, Feb 22, 2013 at 11:16:22AM +0100, Norbert Scheibner wrote:
> On 20.02.2013 at 02:14, Liu Bo <bo.li@oracle.com> wrote:
>
>> I think I know why inode_cache keeps us from freeing space,
>> inode_cache adds a cache_inode in each btrfs root, and this
>> cache_inode will be iput at the very last of stage during umount,
>> ie. after we do cleanup work on old snapshot/subvols, where we
>> free the space.
>>
>> A remount will force btrfs to do cleanup work on old snapshots
>> during mount.
>>
>> This may explain the situation.
>>
>> thanks,
>> liubo
>
> I don't know how long the code behaves that way, but this is
> exactly what I see here on debian kernel 3.2.35.

A patch to fix it is now in btrfs-next, so we may not be bitten any more.

thanks,
liubo


[PATCH 2/2] Btrfs: disable the qgroup level 0 for userspace use

2013-02-22 Thread Wang Shilong
From: Wang Shilong wangsl-f...@cn.fujitsu.com

This patch stops users from creating or destroying level 0 qgroups;
users can only create or destroy qgroups whose level is greater than 0.

The reasoning is as follows: a subvolume/snapshot qgroup is created
automatically when the subvolume/snapshot is created, so a manually
created level 0 qgroup can't be a subvolume/snapshot qgroup. The only way
to use it would be to assign subvolume/snapshot qgroups to it, and we
don't want to have a parent qgroup whose level is 0.

So we want to force users to use qgroups with clear relations, which
means a parent qgroup's level > child qgroup's level. For example:

                 2/0
                /   \
               /     \
              /       \
           1/0         1/1
          /   \           \
         /     \           \
        /       \           \
     0/256    0/257        0/258

This pattern of quota is natural and easy for users to understand;
otherwise the quota configuration becomes confusing and difficult to
maintain.

Signed-off-by: Wang Shilong <wangsl-f...@cn.fujitsu.com>
Acked-by: Miao Xie <mi...@cn.fujitsu.com>
Cc: Arne Jansen <sensi...@gmx.net>
---
 fs/btrfs/ioctl.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a31cd93..3590c21 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3755,7 +3755,7 @@ static long btrfs_ioctl_qgroup_create(struct file *file, void __user *arg)
 		goto drop_write;
 	}
 
-	if (!sa->qgroupid) {
+	if (!(sa->qgroupid >> 48)) {
 		ret = -EINVAL;
 		goto out;
 	}
-- 
1.7.7.6





Re: [PATCH 1/2] Btrfs: create the qgroup that limits root subvolume automatically

2013-02-22 Thread Arne Jansen
On 02/22/13 13:02, Wang Shilong wrote:
> From: Wang Shilong <wangsl-f...@cn.fujitsu.com>
>
> Creating the root subvolume qgroup when enabling quota,with

Why only create a qgroup for the root subvolume and not for
every existing subvolume?

> this patch,it will be ok to limit the whole filesystem size.

This will not limit the whole filesystem, but only the root
subvolume. To limit the whole filesystem you'd have to create
a level 1 qgroup and add all subvolumes to it.

-Arne

>
> Signed-off-by: Wang Shilong <wangsl-f...@cn.fujitsu.com>
> Reviewed-by: Miao Xie <mi...@cn.fujitsu.com>
> Cc: Arne Jansen <sensi...@gmx.net>
> ---
>  fs/btrfs/qgroup.c |   12 ++++++++++++
>  1 files changed, 12 insertions(+), 0 deletions(-)
>
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index a5c8562..c409096 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -777,6 +777,7 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans,
>  	struct extent_buffer *leaf;
>  	struct btrfs_key key;
>  	int ret = 0;
> +	struct btrfs_qgroup *qgroup = NULL;
>
>  	spin_lock(&fs_info->qgroup_lock);
>  	if (fs_info->quota_root) {
> @@ -823,7 +824,18 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans,
>
>  	btrfs_mark_buffer_dirty(leaf);
>
> +	btrfs_release_path(path);
> +	ret = add_qgroup_item(trans, quota_root, BTRFS_FS_TREE_OBJECTID);
> +	if (ret)
> +		goto out;
> +
>  	spin_lock(&fs_info->qgroup_lock);
> +	qgroup = add_qgroup_rb(fs_info, BTRFS_FS_TREE_OBJECTID);
> +	if (IS_ERR(qgroup)) {
> +		spin_unlock(&fs_info->qgroup_lock);
> +		ret = PTR_ERR(qgroup);
> +		goto out;
> +	}
>  	fs_info->quota_root = quota_root;
>  	fs_info->pending_quota_state = 1;
>  	spin_unlock(&fs_info->qgroup_lock);



Re: [RESEND RFC PATCH 2/2] Btrfs: disable the qgroup level 0 for userspace use

2013-02-22 Thread Arne Jansen
On 02/22/13 13:09, Wang Shilong wrote:
> From: Wang Shilong <wangsl-f...@cn.fujitsu.com>
>
> This patch tries to stop users to create/destroy qgroup level 0,
> users can only create/destroy qgroup level more than 0.
>
> See the fact:
>   a subvolume/snapshot qgroup was created automatically
> when creating subvolume/snapshot, so creating a qgroup level 0 can't
> be a subvolume/snapshot qgroup, the only way to use it is that assigning
> subvolume/snapshot qgroup to it, the point is that we don't want to have a
> parent qgroup whose level is 0.
>
>   So we want to force users to use qgroup with clear relations
> which means a parent qgroup's level > child qgroup's level.For example:
>
>                  2/0
>                 /   \
>                /     \
>               /       \
>            1/0         1/1
>           /   \           \
>          /     \           \
>         /       \           \
>      0/256    0/257        0/258
>
> This pattern of quota is nature and easy for users to understand, otherwise
> it will make the quota configuration confusing and difficult to maintain.

I agree that a strict hierarchy of the levels should be enforced.
Currently the kernel has no idea of 'level', it's just an artificial
concept that lives in userspace. This patch would be the first place
to add that magic shift '48' to the kernel.
In my opinion it would be sufficient to do the enforcement in user
space, as it is of no technical nature.

-Arne
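
Editorial note: the "magic shift 48" mentioned above refers to how user space packs a level and an id into one 64-bit qgroupid. The following is a minimal sketch of enforcing the hierarchy in user space, as the reply suggests. All helper names (`toy_qgroup_make`, `toy_qgroup_level`, `toy_may_create`, `toy_may_assign`) are hypothetical and are not part of btrfs-progs or the kernel; only the shift value 48 comes from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_QGROUP_LEVEL_SHIFT 48 /* shift value taken from the patch */

/* Pack "level/id" (for example 1/0 or 0/256) into a single 64-bit qgroupid. */
static uint64_t toy_qgroup_make(uint64_t level, uint64_t id)
{
	return (level << TOY_QGROUP_LEVEL_SHIFT) | id;
}

/* Extract the level from a packed qgroupid. */
static uint64_t toy_qgroup_level(uint64_t qgroupid)
{
	return qgroupid >> TOY_QGROUP_LEVEL_SHIFT;
}

/* Policy from the patch: level 0 qgroups may not be created by hand. */
static int toy_may_create(uint64_t qgroupid)
{
	return toy_qgroup_level(qgroupid) != 0;
}

/* Policy from the description: the parent's level must exceed the child's. */
static int toy_may_assign(uint64_t child, uint64_t parent)
{
	return toy_qgroup_level(parent) > toy_qgroup_level(child);
}
```

For the tree in the patch description, assigning 0/257 to 1/0 passes `toy_may_assign`, while assigning 1/0 to 0/256 does not.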

 


WARNING: at fs/btrfs/extent_io.c:4718 map_private_extent_buffer+0xd4/0xe0 [btrfs]()

2013-02-22 Thread Mace Moneta
https://bugzilla.redhat.com/show_bug.cgi?id=906142

With 3.8 kernels in Fedora 18, using encfs on btrfs I get the
following error.  It can take hours of use before I get a
reoccurrence, and I need to btrfsck, btrfs-zero-log, and/or mount with
'-o recovery' to get the filesystem back after a reboot.  No data
appears to be lost, and a scrub runs to completion with no errors.

[14691.074991] WARNING: at fs/btrfs/extent_io.c:4718
map_private_extent_buffer+0xd4/0xe0 [btrfs]()
[14691.074993] Hardware name: C2SEA
[14691.074995] btrfs bad mapping eb start 645984256 len 4096, wanted 4096 8
[14691.074997] Modules linked in: vfat fat usb_storage fuse rfcomm
bnep nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_conntrack_ipv6
nf_defrag_ipv6 xt_conntrack ip6table_filter nf_conntrack ip6_tables
w83627ehf hwmon_vid snd_hda_codec_realtek snd_hda_intel snd_hda_codec
uvcvideo videobuf2_vmalloc snd_hwdep snd_seq snd_seq_device
videobuf2_memops btusb videobuf2_core videodev snd_pcm bluetooth
iTCO_wdt snd_page_alloc media rfkill coretemp snd_timer
iTCO_vendor_support i2c_i801 snd lpc_ich mfd_core soundcore microcode
r8169 mii vhost_net tun macvtap macvlan kvm_intel kvm binfmt_misc
i2c_dev uinput btrfs zlib_deflate libcrc32c ata_generic pata_acpi i915
video firewire_ohci i2c_algo_bit firewire_core drm_kms_helper
pata_it8213 crc_itu_t drm i2c_core
[14691.075070] Pid: 1926, comm: encfs Not tainted
3.8.0-0.rc7.git0.1.fc19.x86_64 #1
[14691.075072] Call Trace:
[14691.075093]  [a01a7c00] ?
map_private_extent_buffer+0xb0/0xe0 [btrfs]
[14691.075099]  [8105c210] warn_slowpath_common+0x70/0xa0
[14691.075102]  [8105c28c] warn_slowpath_fmt+0x4c/0x50
[14691.075121]  [a01a7c24] map_private_extent_buffer+0xd4/0xe0 [btrfs]
[14691.075139]  [a019da30] btrfs_set_token_64+0x60/0xf0 [btrfs]
[14691.075159]  [a01be264]
btrfs_log_changed_extents+0x384/0x600 [btrfs]
[14691.075178]  [a01c05b8] btrfs_log_inode+0x3b8/0x660 [btrfs]
[14691.075196]  [a01c1519] btrfs_log_inode_parent+0x169/0x450 [btrfs]
[14691.075216]  [a01c183a] btrfs_log_dentry_safe+0x3a/0x60 [btrfs]
[14691.075234]  [a0198400] btrfs_sync_file+0x150/0x1f0 [btrfs]
[14691.075239]  [811c48c6] do_fsync+0x56/0x80
[14691.075242]  [811c4b50] sys_fsync+0x10/0x20
[14691.075247]  [8163e419] system_call_fastpath+0x16/0x1b
[14691.075253] ---[ end trace 0c19c78181b4038d ]---
[14691.075261] BUG: unable to handle kernel NULL pointer dereference
at   (null)
[14691.075311] IP: [a01a7e23] write_extent_buffer+0xd3/0x150 [btrfs]
[14691.075364] PGD 208a79067 PUD 2089a6067 PMD 0
[14691.075400] Oops:  [#1] SMP
[14691.075425] Modules linked in: vfat fat usb_storage fuse rfcomm
bnep nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_conntrack_ipv6
nf_defrag_ipv6 xt_conntrack ip6table_filter nf_conntrack ip6_tables
w83627ehf hwmon_vid snd_hda_codec_realtek snd_hda_intel snd_hda_codec
uvcvideo videobuf2_vmalloc snd_hwdep snd_seq snd_seq_device
videobuf2_memops btusb videobuf2_core videodev snd_pcm bluetooth
iTCO_wdt snd_page_alloc media rfkill coretemp snd_timer
iTCO_vendor_support i2c_i801 snd lpc_ich mfd_core soundcore microcode
r8169 mii vhost_net tun macvtap macvlan kvm_intel kvm binfmt_misc
i2c_dev uinput btrfs zlib_deflate libcrc32c ata_generic pata_acpi i915
video firewire_ohci i2c_algo_bit firewire_core drm_kms_helper
pata_it8213 crc_itu_t drm i2c_core
[14691.076012] CPU 2
[14691.076012] Pid: 1926, comm: encfs Tainted: GW
3.8.0-0.rc7.git0.1.fc19.x86_64 #1 Supermicro C2SEA/C2SEA
[14691.076012] RIP: 0010:[a01a7e23]  [a01a7e23]
write_extent_buffer+0xd3/0x150 [btrfs]
[14691.076012] RSP: 0018:88020b653c20  EFLAGS: 00010202
[14691.076012] RAX:  RBX: 0008 RCX: 0008
[14691.076012] RDX: 1008 RSI: 2681 RDI: 8801316cf988
[14691.076012] RBP: 88020b653c50 R08: 000a R09: 03ea
[14691.076012] R10:  R11: 88020b6538d6 R12: 88020b653c80
[14691.076012] R13: 8801316cf988 R14:  R15: 0008
[14691.076012] FS:  7fd04462b800() GS:880237d0()
knlGS:
[14691.076012] CS:  0010 DS:  ES:  CR0: 80050033
[14691.076012] CR2:  CR3: 0001e7e39000 CR4: 07e0
[14691.076012] DR0:  DR1:  DR2: 
[14691.076012] DR3:  DR6: 0ff0 DR7: 0400
[14691.076012] Process encfs (pid: 1926, threadinfo 88020b652000,
task 8801f0c44620)
[14691.076012] Stack:
[14691.076012]  1000 88020b653d70 8801316cf988
1000
[14691.076012]  0025 0fdb 88020b653cb0
a019dab0
[14691.076012]   880106418000 1000
1000
[14691.076012] Call Trace:
[14691.076012]  [a019dab0] btrfs_set_token_64+0xe0/0xf0 [btrfs]

Re: collapse concurrent forced allocations (was: Re: clear chunk_alloc flag on retryable failure)

2013-02-22 Thread Josef Bacik
On Thu, Feb 21, 2013 at 06:15:49PM -0700, Alexandre Oliva wrote:
> On Feb 21, 2013, Alexandre Oliva <ol...@gnu.org> wrote:
>
>> What I saw in that function also happens to explain why in some cases I
>> see filesystems allocate a huge number of chunks that remain unused
>> (leading to the scenario above, of not having more chunks to allocate).
>> It happens for data and metadata, but not necessarily both.  I'm
>> guessing some thread sets the force_alloc flag on the corresponding
>> space_info, and then several threads trying to get disk space end up
>> attempting to allocate a new chunk concurrently.  All of them will see
>> the force_alloc flag and bump their local copy of force up to the level
>> they see first, and they won't clear it even if another thread succeeds
>> in allocating a chunk, thus clearing the force flag.  Then each thread
>> that observed the force flag will, on its turn, force the allocation of
>> a new chunk.  And any threads that come in while it does that will see
>> the force flag still set and pick it up, and so on.  This sounds like a
>> problem to me, but...  what should the correct behavior be?  Clear
>> force_flag once we copy it to a local force?  Reset force to the
>> incoming value on every loop?
>
> I think a slight variant of the following makes the most sense, so I
> implemented it in the patch below.
>
>> Set the flag to our incoming force if we have it at first, clear our
>> local flag, and move it from the space_info when we determined that we
>> are the thread that's going to perform the allocation?
>
>
> From: Alexandre Oliva <ol...@gnu.org>
>
> btrfs: consume force_alloc in the first thread to chunk_alloc
>
> Even if multiple threads in do_chunk_alloc look at force_alloc and see
> a force flag, it suffices that one of them consumes the flag.  Arrange
> for an incoming force argument to make to force_alloc in case of
> concurrent calls, so that it is used only by the first thread to get
> to allocation after the initial request.
>
> Signed-off-by: Alexandre Oliva <ol...@gnu.org>
> ---
>  fs/btrfs/extent-tree.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 6ee89d5..66283f7 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3574,8 +3574,12 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
>
>  again:
>  	spin_lock(&space_info->lock);
> +
> +	/* Bring force_alloc to force and tentatively consume it.  */
>  	if (force < space_info->force_alloc)
>  		force = space_info->force_alloc;
> +	space_info->force_alloc = CHUNK_ALLOC_NO_FORCE;
> +
>  	if (space_info->full) {
>  		spin_unlock(&space_info->lock);
>  		return 0;
> @@ -3586,6 +3590,10 @@ again:
>  		return 0;
>  	} else if (space_info->chunk_alloc) {
>  		wait_for_alloc = 1;
> +		/* Reset force_alloc so that it's consumed by the
> +		   first thread that completes the allocation.  */
> +		space_info->force_alloc = force;
> +		force = CHUNK_ALLOC_NO_FORCE;

So I understand what you are getting at, but I think you are doing it wrong.  If
we're calling with CHUNK_ALLOC_FORCE, but somebody has already started to
allocate with CHUNK_ALLOC_NO_FORCE, we'll reset space_info->force_alloc to
our original caller's CHUNK_ALLOC_FORCE.  We only really care about making
sure a chunk is actually allocated, so instead of doing this flag shuffling we
should just do

	if (space_info->chunk_alloc) {
		spin_unlock(&space_info->lock);
		wait_event(!space_info->chunk_alloc);
		return 0;
	}

and that way we don't allocate more chunks than normal.  Thanks,

Josef
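
Editorial note: the difference between the reported behaviour and the suggestion above can be illustrated with a deterministic, single-threaded simulation. This is a toy model under stated assumptions and not the kernel's `do_chunk_alloc`; `toy_space_info`, `toy_stale_force` and `toy_wait_for_first` are hypothetical names, and real concurrency is replaced by "all callers arrive before the first allocation completes".

```c
#include <assert.h>

enum { TOY_NO_FORCE = 0, TOY_FORCE = 2 }; /* illustrative values */

struct toy_space_info {
	int force_alloc; /* pending force request */
	int chunks;      /* chunks allocated so far */
};

/*
 * Reported behaviour: every caller copies force_alloc into its local
 * "force" before the first allocation completes, so each one later
 * allocates a chunk of its own. At most 16 callers are simulated.
 */
static int toy_stale_force(struct toy_space_info *si, int callers)
{
	int local_force[16];
	int i;

	if (callers > 16)
		callers = 16;
	for (i = 0; i < callers; i++)   /* phase 1: all callers sample the flag */
		local_force[i] = si->force_alloc;
	for (i = 0; i < callers; i++) { /* phase 2: they allocate one by one */
		if (local_force[i] != TOY_NO_FORCE) {
			si->chunks++;
			si->force_alloc = TOY_NO_FORCE;
		}
	}
	return si->chunks;
}

/*
 * Suggested behaviour: a caller that finds an allocation in progress
 * waits for it and returns without allocating anything itself.
 */
static int toy_wait_for_first(struct toy_space_info *si, int callers)
{
	int in_progress = 0;
	int i;

	for (i = 0; i < callers; i++) {
		if (in_progress)
			continue;       /* models: wait, then return 0 */
		in_progress = 1;        /* first caller performs the allocation */
		si->chunks++;
		si->force_alloc = TOY_NO_FORCE;
	}
	return si->chunks;
}
```

In this model four concurrent callers produce four chunks under the stale-flag behaviour and a single chunk when later callers simply wait for the first allocation.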


Re: clear chunk_alloc flag on retryable failure

2013-02-22 Thread Josef Bacik
On Thu, Feb 21, 2013 at 02:15:14PM -0700, Alexandre Oliva wrote:
> I've experienced filesystem freezes with permanent spikes in the active
> process count for quite a while, particularly on filesystems whose
> available raw space has already been fully allocated to chunks.
>
> While looking into this, I found a pretty obvious error in
> do_chunk_alloc: it sets space_info->chunk_alloc, but if
> btrfs_alloc_chunk returns an error other than ENOSPC, it returns leaving
> that flag set, which causes any other threads waiting for
> space_info->chunk_alloc to become zero to spin indefinitely.
>
> I haven't double-checked that this patch fixes the failure I've observed
> fully (it's not exactly trivial to trigger), but it surely is a bug and
> the fix is trivial, so...  Please put it in :-)

Yup putting in btrfs-next, thanks.

Josef
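
Editorial note: the bug described in the quoted report is an "in progress" flag that is not cleared on one error path. The sketch below shows that shape in user space; it is not the actual patch. `toy_alloc_state`, `toy_alloc_chunk` and both `toy_do_chunk_alloc_*` functions are hypothetical, and the error is injected by the caller rather than produced by a real allocator.

```c
#include <assert.h>
#include <errno.h>

struct toy_alloc_state {
	int chunk_alloc; /* non-zero while an allocation is in flight */
	int full;        /* set once the allocator reports -ENOSPC */
};

/* Hypothetical allocator stand-in: returns whatever error is injected. */
static int toy_alloc_chunk(int injected_error)
{
	return injected_error;
}

/* Buggy shape: an error other than -ENOSPC returns with the flag set. */
static int toy_do_chunk_alloc_buggy(struct toy_alloc_state *st, int injected_error)
{
	int ret;

	st->chunk_alloc = 1;
	ret = toy_alloc_chunk(injected_error);
	if (ret < 0 && ret != -ENOSPC)
		return ret; /* flag stays set: waiters would spin forever */
	if (ret == -ENOSPC)
		st->full = 1;
	st->chunk_alloc = 0;
	return ret;
}

/* Fixed shape: the flag is cleared before every return. */
static int toy_do_chunk_alloc_fixed(struct toy_alloc_state *st, int injected_error)
{
	int ret;

	st->chunk_alloc = 1;
	ret = toy_alloc_chunk(injected_error);
	if (ret == -ENOSPC)
		st->full = 1;
	st->chunk_alloc = 0;
	return ret;
}
```

With an injected `-EIO`, the buggy variant leaves `chunk_alloc` set while the fixed variant clears it, which is the property any waiter depends on.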


Re: [PATCH 1/2] Btrfs: create the qgroup that limits root subvolume automatically

2013-02-22 Thread Shilong Wang
2013/2/22 Arne Jansen <sensi...@gmx.net>:
> On 02/22/13 13:02, Wang Shilong wrote:
>> From: Wang Shilong <wangsl-f...@cn.fujitsu.com>
>>
>> Creating the root subvolume qgroup when enabling quota,with
>
> Why only create a qgroup for the root subvolume and not for
> every existing subvolume?

Yes, you are right.
Creating qgroups for all existing subvolumes is necessary when enabling
quota, since we are trying to prevent users from creating level 0
qgroups; the subvolume/snapshot qgroups should be managed automatically.

After this work, I think it will also be necessary to delete the
subvolume/snapshot qgroup when the subvolume/snapshot is deleted.

BTW, there is one more thing to think about: while quota is being
enabled, no new subvolume should be created until enabling has finished.

I will try to implement these functions.

>> this patch,it will be ok to limit the whole filesystem size.
>
> This will not limit the whole filesystem, but only the root
> subvolume. To limit the whole filesystem you'd have to create
> a level 1 qgroup and add all subvolumes to it.

Right, thanks for correcting it.

Thanks,
Wang


 -Arne


 Signed-off-by: Wang Shilong wangsl-f...@cn.fujitsu.com
 Reviewed-by: Miao Xie mi...@cn.fujitsu.com
 Cc: Arne Jansen sensi...@gmx.net
 ---
  fs/btrfs/qgroup.c |   12 
  1 files changed, 12 insertions(+), 0 deletions(-)

 diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
 index a5c8562..c409096 100644
 --- a/fs/btrfs/qgroup.c
 +++ b/fs/btrfs/qgroup.c
 @@ -777,6 +777,7 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans,
   struct extent_buffer *leaf;
   struct btrfs_key key;
   int ret = 0;
 + struct btrfs_qgroup *qgroup = NULL;

   spin_lock(&fs_info->qgroup_lock);
   if (fs_info->quota_root) {
 @@ -823,7 +824,18 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans,

   btrfs_mark_buffer_dirty(leaf);

 + btrfs_release_path(path);
 + ret = add_qgroup_item(trans, quota_root, BTRFS_FS_TREE_OBJECTID);
 + if (ret)
 +         goto out;
 +
   spin_lock(&fs_info->qgroup_lock);
 + qgroup = add_qgroup_rb(fs_info, BTRFS_FS_TREE_OBJECTID);
 + if (IS_ERR(qgroup)) {
 +         spin_unlock(&fs_info->qgroup_lock);
 +         ret = PTR_ERR(qgroup);
 +         goto out;
 + }
   fs_info->quota_root = quota_root;
   fs_info->pending_quota_state = 1;
   spin_unlock(&fs_info->qgroup_lock);




Re: [RESEND RFC PATCH 2/2] Btrfs: disable the qgroup level 0 for userspace use

2013-02-22 Thread Shilong Wang
Hello,

2013/2/22 Arne Jansen sensi...@gmx.net:
 On 02/22/13 13:09, Wang Shilong wrote:
 From: Wang Shilong wangsl-f...@cn.fujitsu.com

 This patch tries to stop users from creating/destroying level-0 qgroups;
 users can only create/destroy qgroups with a level greater than 0.

 Consider the facts:
   a subvolume/snapshot qgroup is created automatically when the
 subvolume/snapshot is created, so a manually created level-0 qgroup can't
 be a subvolume/snapshot qgroup. The only way to use it is to assign
 subvolume/snapshot qgroups to it, and the point is that we don't want to have a
 parent qgroup whose level is 0.

   So we want to force users to use qgroups with clear relations,
 which means a parent qgroup's level > child qgroup's level. For example:

          2/0
         /   \
        /     \
       /       \
     1/0       1/1
     / \         \
    /   \         \
   /     \         \
 0/256  0/257    0/258

 This quota pattern is natural and easy for users to understand; otherwise
 it will make the quota configuration confusing and difficult to maintain.

 I agree that a strict hierarchy of the levels should be enforced.
 Currently the kernel has no idea of 'level', it's just an artificial
 concept that lives in userspace. This patch would be the first place
 to add that magic shift '48' to the kernel.
 In my opinion it would be sufficient to do the enforcement in user
 space, as it is of no technical nature.


...I have made some patches for this work in btrfs-progs, but they have
not been merged yet...
I will pick up those patches and do the other necessary work.

Thanks,
Wang

 -Arne


 Signed-off-by: Wang Shilong wangsl-f...@cn.fujitsu.com
 Acked-by: Miao Xie mi...@cn.fujitsu.com
 Cc: Arne Jansen sensi...@gmx.net
 ---
  fs/btrfs/ioctl.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

 diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
 index a31cd93..3590c21 100644
 --- a/fs/btrfs/ioctl.c
 +++ b/fs/btrfs/ioctl.c
 @@ -3755,7 +3755,7 @@ static long btrfs_ioctl_qgroup_create(struct file 
 *file, void __user *arg)
   goto drop_write;
   }

 - if (!sa->qgroupid) {
 + if (!(sa->qgroupid >> 48)) {
   ret = -EINVAL;
   goto out;
   }
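
For readers unfamiliar with the "magic shift 48" mentioned in the review, here is a small sketch of how a level/id pair such as 1/0 or 0/256 is packed into a single 64-bit qgroup id, and what the check in the patch tests. The helper names and the `QGROUP_LEVEL_SHIFT` macro are hypothetical, chosen for this illustration; only the shift value 48 and the `qgroupid >> 48` test come from the thread.

```c
#include <stdint.h>

/* The shift from the patch: the level lives in the top 16 bits. */
#define QGROUP_LEVEL_SHIFT 48
#define QGROUP_ID_MASK ((1ULL << QGROUP_LEVEL_SHIFT) - 1)

/* Pack "level/id" (for example 1/0 or 0/256) into one 64-bit value. */
static uint64_t qgroup_make_id(uint16_t level, uint64_t id)
{
	return ((uint64_t)level << QGROUP_LEVEL_SHIFT) | (id & QGROUP_ID_MASK);
}

/* Extract the level part of a packed qgroup id. */
static uint16_t qgroup_level(uint64_t qgroupid)
{
	return (uint16_t)(qgroupid >> QGROUP_LEVEL_SHIFT);
}

/* Extract the id part (the subvolume id for level-0 qgroups). */
static uint64_t qgroup_subvolid(uint64_t qgroupid)
{
	return qgroupid & QGROUP_ID_MASK;
}

/* Mirrors the condition in the patch: ids whose level is 0 are rejected. */
static int qgroup_create_allowed(uint64_t qgroupid)
{
	return (qgroupid >> QGROUP_LEVEL_SHIFT) != 0;
}
```

With this encoding, 0/256 is simply the value 256, so the old check `!sa->qgroupid` only rejected 0/0, while the new check rejects every level-0 id.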




Re: WARNING: at fs/btrfs/extent_io.c:4718 map_private_extent_buffer+0xd4/0xe0 [btrfs]()

2013-02-22 Thread Josef Bacik
On Fri, Feb 22, 2013 at 07:46:16AM -0700, Mace Moneta wrote:
 https://bugzilla.redhat.com/show_bug.cgi?id=906142
 
 With 3.8 kernels in Fedora 18, using encfs on btrfs I get the
 following error.  It can take hours of use before I get a
 reoccurrence, and I need to btrfsck, btrfs-zero-log, and/or mount with
 '-o recovery' to get the filesystem back after a reboot.  No data
 appears to be lost, and a scrub runs to completion with no errors.

Could you do

gdb btrfs.ko
list *(btrfs_log_inode+0x3b8)

and tell me what it says?  Thanks,

Josef


copy on write misconception

2013-02-22 Thread Mike Power
I think I have a misconception of what copy on write in btrfs means for 
individual files.


I had originally thought that I could create a large file:
time dd if=/dev/zero of=10G bs=1G count=10
10+0 records in
10+0 records out
10737418240 bytes (11 GB) copied, 100.071 s, 107 MB/s

real    1m41.082s
user    0m0.000s
sys     0m7.792s

Then if I copied this file no blocks would be copied until they are 
written.  Hence the two files would use the same blocks underneath. But 
specifically that copy would be fast.  Since it would only need to write 
some metadata.  But when I copy the file:

time cp 10G 10G2

real    3m38.790s
user    0m0.124s
sys     0m10.709s

Oddly enough it actually takes longer than the initial file creation.
So I am guessing that the long duration copy of the file is expected and 
that is not one of the virtues of btrfs copy on write.  Does that sound 
right?


I was looking at a virtual machine solution and thought btrfs would be 
great if I could copy the vm disk to a new file at low cost and then 
launch that vm and customize it to my needs.


OS Ubuntu 12.10

Mike Power


Re: copy on write misconception

2013-02-22 Thread Hugo Mills
On Fri, Feb 22, 2013 at 09:11:28AM -0800, Mike Power wrote:
 I think I have a misconception of what copy on write in btrfs means
 for individual files.
 
 I had originally thought that I could create a large file:
 time dd if=/dev/zero of=10G bs=1G count=10
 10+0 records in
 10+0 records out
 10737418240 bytes (11 GB) copied, 100.071 s, 107 MB/s
 
 real    1m41.082s
 user    0m0.000s
 sys     0m7.792s
 
 Then if I copied this file no blocks would be copied until they are
 written.  Hence the two files would use the same blocks underneath.
 But specifically that copy would be fast.  Since it would only need
 to write some metadata.  But when I copy the file:
 time cp 10G 10G2
 
 real    3m38.790s
 user    0m0.124s
 sys     0m10.709s
 
 Oddly enough it actually takes longer than the initial file
 creation.  So I am guessing that the long duration copy of the file
 is expected and that is not one of the virtues of btrfs copy on
 write.  Does that sound right?

   You probably want cp --reflink=always, which makes a CoW copy of
the file's metadata only. The resulting files have the semantics of
two different files, but share their blocks until a part of one of
them is modified (at which point, the modified blocks are no longer
shared).

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- I don't like the look of it,  I tell you. Well, stop --- 
  looking at it, then.  




Re: WARNING: at fs/btrfs/extent_io.c:4718 map_private_extent_buffer+0xd4/0xe0 [btrfs]()

2013-02-22 Thread Mace Moneta
On Fri, Feb 22, 2013 at 11:53 AM, Josef Bacik jba...@fusionio.com wrote:
 On Fri, Feb 22, 2013 at 07:46:16AM -0700, Mace Moneta wrote:
 https://bugzilla.redhat.com/show_bug.cgi?id=906142

 With 3.8 kernels in Fedora 18, using encfs on btrfs I get the
 following error.  It can take hours of use before I get a
 reoccurrence, and I need to btrfsck, btrfs-zero-log, and/or mount with
 '-o recovery' to get the filesystem back after a reboot.  No data
 appears to be lost, and a scrub runs to completion with no errors.

 Could you do

 gdb btrfs.ko
 list *(btrfs_log_inode+0x3b8)

 and tell me what it says?  Thanks,

 Josef

# uname -r
3.8.0-0.rc7.git0.1.fc19.x86_64

# gdb /usr/lib/modules/3.8.0-0.rc7.git0.1.fc19.x86_64/kernel/fs/btrfs/btrfs.ko

(gdb) list *(btrfs_log_inode+0x3b8)
0x675b8 is in btrfs_log_inode (fs/btrfs/tree-log.c:3633).
3628
3629    log_extents:
3630            if (fast_search) {
3631                    btrfs_release_path(dst_path);
3632                    ret = btrfs_log_changed_extents(trans, root, inode, dst_path);
3633                    if (ret) {
3634                            err = ret;
3635                            goto out_unlock;
3636                    }
3637            } else {
(gdb)


Re: copy on write misconception

2013-02-22 Thread cwillu
 Then if I copied this file no blocks would be copied until they are written.
 Hence the two files would use the same blocks underneath. But specifically
 that copy would be fast.  Since it would only need to write some metadata.
 But when I copy the file:
 time cp 10G 10G2

cp without arguments still does a regular copy; btrfs does nothing to
de-duplicate writes.

cp --reflink 10G 10G2 will give you the results you expect.


Re: copy on write misconception

2013-02-22 Thread Mike Power

On 02/22/2013 09:16 AM, Hugo Mills wrote:

You probably want cp --reflink=always, which makes a CoW copy of
the file's metadata only. The resulting files have the semantics of
two different files, but share their blocks until a part of one of
them is modified (at which point, the modified blocks are no longer
shared).

Hugo.


I see, and it works great:
time cp --reflink=always 10G 10G3

real    0m0.028s
user    0m0.000s
sys     0m0.000s

So from the user perspective I might say I want to opt out of this
feature, not opt in.  I want all copies by all applications done as copy
on write.  But if my understanding is correct, that is up to the
application being called (in this case cp) and how it in turn makes
calls to the system.


In short, I can't remount the btrfs filesystem with some new args that
say always copy on write files, because that is what it already does.


Mike Power


Re: WARNING: at fs/btrfs/extent_io.c:4718 map_private_extent_buffer+0xd4/0xe0 [btrfs]()

2013-02-22 Thread Josef Bacik
On Fri, Feb 22, 2013 at 10:22:04AM -0700, Mace Moneta wrote:
 # uname -r
 3.8.0-0.rc7.git0.1.fc19.x86_64
 
 # gdb /usr/lib/modules/3.8.0-0.rc7.git0.1.fc19.x86_64/kernel/fs/btrfs/btrfs.ko
 

Sigh, sorry, I missed the other line because of line wrapping. Can you do

list *(btrfs_log_changed_extents+0x384)

Thanks,

Josef


Re: WARNING: at fs/btrfs/extent_io.c:4718 map_private_extent_buffer+0xd4/0xe0 [btrfs]()

2013-02-22 Thread Mace Moneta
On Fri, Feb 22, 2013 at 12:44 PM, Josef Bacik jba...@fusionio.com wrote:
 Sigh, sorry, I missed the other line because of line wrapping. Can you do

 list *(btrfs_log_changed_extents+0x384)

 Thanks,

 Josef

(gdb) list *(btrfs_log_changed_extents+0x384)
0x65264 is in btrfs_log_changed_extents (fs/btrfs/ctree.h:2731).
2726                       generation, 64);
2727    BTRFS_SETGET_FUNCS(file_extent_disk_num_bytes, struct btrfs_file_extent_item,
2728                       disk_num_bytes, 64);
2729    BTRFS_SETGET_FUNCS(file_extent_offset, struct btrfs_file_extent_item,
2730                      offset, 64);
2731    BTRFS_SETGET_FUNCS(file_extent_num_bytes, struct btrfs_file_extent_item,
2732                       num_bytes, 64);
2733    BTRFS_SETGET_FUNCS(file_extent_ram_bytes, struct btrfs_file_extent_item,
2734                       ram_bytes, 64);
2735    BTRFS_SETGET_FUNCS(file_extent_compression, struct btrfs_file_extent_item,
(gdb)


Re: WARNING: at fs/btrfs/extent_io.c:4718 map_private_extent_buffer+0xd4/0xe0 [btrfs]()

2013-02-22 Thread Josef Bacik
On Fri, Feb 22, 2013 at 10:52:19AM -0700, Mace Moneta wrote:
 (gdb) list *(btrfs_log_changed_extents+0x384)
 0x65264 is in btrfs_log_changed_extents (fs/btrfs/ctree.h:2731).
 2726                       generation, 64);
 2727    BTRFS_SETGET_FUNCS(file_extent_disk_num_bytes, struct btrfs_file_extent_item,
 2728                       disk_num_bytes, 64);
 2729    BTRFS_SETGET_FUNCS(file_extent_offset, struct btrfs_file_extent_item,
 2730                      offset, 64);
 2731    BTRFS_SETGET_FUNCS(file_extent_num_bytes, struct btrfs_file_extent_item,
 2732                       num_bytes, 64);
 2733    BTRFS_SETGET_FUNCS(file_extent_ram_bytes, struct btrfs_file_extent_item,
 2734                       ram_bytes, 64);
 2735    BTRFS_SETGET_FUNCS(file_extent_compression, struct btrfs_file_extent_item,
 (gdb)

Ok, nothing obvious is jumping out at me. Is there anything specific to your
btrfs setup?  Mount options, raid etc.  I'm going to set encfs up here and
hammer it with fsstress and see if I can reproduce.  Thanks,

Josef


Re: WARNING: at fs/btrfs/extent_io.c:4718 map_private_extent_buffer+0xd4/0xe0 [btrfs]()

2013-02-22 Thread Mace Moneta
On Fri, Feb 22, 2013 at 1:10 PM, Josef Bacik jba...@fusionio.com wrote:
 Ok, nothing obvious is jumping out at me. Is there anything specific to your
 btrfs setup?  Mount options, raid etc.  I'm going to set encfs up here and
 hammer it with fsstress and see if I can reproduce.  Thanks,

 Josef

The btrfs mount options I'm using are: subvol=home,noatime,autodefrag

The encfs is mounted with default options.


Re: WARNING: at fs/btrfs/extent_io.c:4718 map_private_extent_buffer+0xd4/0xe0 [btrfs]()

2013-02-22 Thread Mace Moneta
On Fri, Feb 22, 2013 at 1:16 PM, Mace Moneta moneta.m...@gmail.com wrote:
 The btrfs mount options I'm using are: subvol=home,noatime,autodefrag

 The encfs is mounted with default options.

Oh, and there's no raid data, just a single drive.  I don't do heavy
I/O to the encfs, which may explain why it takes minutes to hours to
recreate.  I have my google-chrome config directory (cache, profile,
passwords, etc.) in the encfs, so it's getting read/written as I
browse.

# btrfs fi show
failed to read /dev/sr0
Label: 'btrfs'  uuid: 057239ee-1cc7-44b2-8fa3-714661dfa7fe
Total devices 1 FS bytes used 39.06GB
devid1 size 455.58GB used 77.04GB path /dev/sda3

Btrfs Btrfs v0.19

# btrfs fi df /home
Data: total=58.01GB, used=38.46GB
System, DUP: total=8.00MB, used=16.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=9.50GB, used=611.59MB
Metadata: total=8.00MB, used=0.00


Btrfs Btrfs v0.19


Re: copy on write misconception

2013-02-22 Thread cwillu
On Fri, Feb 22, 2013 at 11:41 AM, Mike Power dodts...@gmail.com wrote:
 I see, and it works great:
 time cp --reflink=always 10G 10G3

 real    0m0.028s
 user    0m0.000s
 sys     0m0.000s

 So from the user perspective I might say I want to opt out of this feature,
 not opt in.  I want all copies by all applications done as copy on write.
 But if my understanding is correct, that is up to the application being
 called (in this case cp) and how it in turn makes calls to the system.

 In short, I can't remount the btrfs filesystem with some new args that say
 always copy on write files, because that is what it already does.

There's no "copy a file" syscall; when a program copies a file, it
opens a new file, and writes all the bytes from the old to the new.
Converting this to a reflink would require btrfs to implement full
de-dup (which is rather expensive), and still wouldn't prevent the
program from reading and writing all 10 GB (and so wouldn't be any
faster).

You can set an alias in your shell to make cp --reflink=auto the
default, but that won't affect other programs, nor other users.
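
To make the point about "the application has to ask for it" concrete, here is a rough sketch of what a copying program can do itself: request a reflink with the clone ioctl and fall back to a plain read/write loop when the filesystem refuses. This is an illustration, not what cp does internally. `clone_or_copy()` is a hypothetical helper name, and `FICLONE` is the generic name later kernels use for the same request that btrfs exposes as `BTRFS_IOC_CLONE`; the fallback definition below assumes that numbering.

```c
#include <fcntl.h>
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef FICLONE
#define FICLONE _IOW(0x94, 9, int)	/* assumed equal to BTRFS_IOC_CLONE */
#endif

/*
 * Hypothetical helper: returns 1 if dst now shares extents with src
 * (reflink), 0 if the data was copied byte by byte, -1 on error.
 */
static int clone_or_copy(const char *src, const char *dst)
{
	char buf[65536];
	ssize_t n;
	int result;
	int in, out;

	in = open(src, O_RDONLY);
	if (in < 0)
		return -1;
	out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (out < 0) {
		close(in);
		return -1;
	}

	if (ioctl(out, FICLONE, in) == 0) {
		result = 1;	/* metadata-only copy, blocks are shared */
	} else {
		result = 0;	/* not supported here: do a regular copy */
		while ((n = read(in, buf, sizeof(buf))) > 0) {
			char *p = buf;

			while (n > 0) {
				ssize_t w = write(out, p, (size_t)n);

				if (w < 0) {
					result = -1;
					goto done;
				}
				p += w;
				n -= w;
			}
		}
		if (n < 0)
			result = -1;
	}
done:
	close(in);
	if (close(out) < 0)
		result = -1;
	return result;
}
```

This has roughly the same intent as cp --reflink=auto: share blocks when possible, copy otherwise. Programs that only use read() and write() never give the filesystem the chance to share anything.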


Re: [bug] mkfs.btrfs reports device busy for ext4 mounted disk

2013-02-22 Thread Zach Brown
  Next, since previously we had btrfs on sdb and mkfs.ext4
  does not overwrite super-block mirror 1.. so
 
btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64
 sb_bytenr)
 
  finds btrfs on sdb.

btrfs-progs shouldn't be unconditionally trusting the backup superblocks
if the primary is garbage.  It should only check the backups if the user
specifically asks it to.

  unless I am missing something. wipefs (along with the below patch)
 [PATCH][v2] Btrfs: wipe all the superblock [redhat bugzilla 889888]
  seems to be only solution as of now.

This is good practice and will work around the bug in btrfs-progs for
now.

- z
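
As background for why a stale "mirror 1" superblock can survive a later mkfs.ext4, the following sketch computes where the btrfs superblock copies live on disk. It follows the offset rule as I understand it from btrfs-progs (primary copy at 64 KiB, mirror N at 16 KiB shifted left by 12 * N); the macro and function names here are local stand-ins, not the project's identifiers.

```c
#include <stdint.h>

/* Local stand-in names; values follow the layout described above. */
#define SB_PRIMARY_OFFSET (64ULL * 1024)	/* 64 KiB */
#define SB_MIRROR_SHIFT   12
#define SB_MIRROR_MAX     3			/* copies 0, 1 and 2 */

/* Byte offset of superblock copy `mirror` (0 is the primary copy). */
static uint64_t sb_offset(int mirror)
{
	uint64_t start = 16ULL * 1024;

	if (mirror > 0)
		return start << (SB_MIRROR_SHIFT * mirror);
	return SB_PRIMARY_OFFSET;
}
```

Under these assumptions the copies sit at 64 KiB, 64 MiB and 256 GiB. A new filesystem created on top is unlikely to write to the 64 MiB location right away, which would explain why the old btrfs signature is still found there unless every copy is wiped explicitly.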


Re: WARNING: at fs/btrfs/extent_io.c:4718 map_private_extent_buffer+0xd4/0xe0 [btrfs]()

2013-02-22 Thread Josef Bacik
On Fri, Feb 22, 2013 at 11:31:07AM -0700, Mace Moneta wrote:
 Oh, and there's no raid data, just a single drive.  I don't do heavy
 I/O to the encfs, which may explain why it takes minutes to hours to
 recreate.  I have my google-chrome config directory (cache, profile,
 passwords, etc.) in the encfs, so it's getting read/written as I
 browse.

So in case I can't reproduce it, can you build btrfs-next and see if it
reproduces there?  And if it does, perfect, I can send you debug patches to
apply and such.  Thanks,

Josef


Re: [PATCH] Btrfs: update inode flags when renaming

2013-02-22 Thread David Sterba
On Fri, Feb 22, 2013 at 05:34:47PM +0800, Miao Xie wrote:
 On Fri, 22 Feb 2013 16:40:35 +0800, Liu Bo wrote:
  On Fri, Feb 22, 2013 at 03:32:50AM -0500, Marios Titas wrote:
  Sorry, but the bug persists even with the above patch.
 
  touch test
  chattr +C test
  lsattr test
  mv test test2
  lsattr test2
 
  In the above scenario test2 will not have the C flag.
  
  What do you expect?  IMO it's right that test2 does not have the C flag.
 
 No, it's not right.
 Users expect the C flag not to be lost, because they are just doing
 a rename operation. But fixup_inode_flags() resets the flags from the
 parent directory's flags.
 
 I think we should inherit the flags from the parent only when we create
 a new file/directory; in the other cases, just give the users an option.
 What do you think?

I agree with that. The COW status of a file should not be changed at all
when renamed. The typical users are database files and vm images, losing
the NOCOW flag just from moving here and back is quite unexpected.

david


Re: [PATCH] Btrfs: update inode flags when renaming

2013-02-22 Thread David Sterba
On Fri, Feb 22, 2013 at 04:19:27PM -0500, Marios Titas wrote:
 I think that many end users will find all this very confusing. They
 will never expect that renaming a file will cause it to suddenly lose
 one flag (NODATACOW) while preserving the other (NODATASUM).
 Especially since they cannot explicitly control the NODATASUM flag on
 a per file basis. I think that renaming a file should preserve all
 flags no matter if it's done in the same directory or not. Just like
 it preserves permissions, ownership and inode number.

I agree. For completeness, the other inherited flags/attributes are the
compression statuses. Silently changing them on rename may be wrong in
case the file gains the 'never try to compress' flag after some clever
heuristic (which we do not have yet) decides so.

 So I think
 inheriting the flags from the parent on rename is not a good idea
 either. Interestingly enough, files don't lose any of the two flags if
 instead of renaming you link and then unlink the original.

Link does not take the same codepath as new file or rename. A new
directory entry is created and link count increased.

david


Re: [bug] mkfs.btrfs reports device busy for ext4 mounted disk

2013-02-22 Thread David Sterba
On Fri, Feb 22, 2013 at 11:03:25AM -0800, Zach Brown wrote:
   Next, since previously we had btrfs on sdb and mkfs.ext4
   does not overwrite super-block mirror 1.. so
  
 btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64
  sb_bytenr)
  
   finds btrfs on sdb.
 
 btrfs-progs shouldn't be unconditionally trusting the backup superblocks
 if the primary is garbage.  It should only check the backups if the user
 specifically asks it to.

Agreed. Let me add that all the rescue tools should accept a parameter
to pick the backup superblocks. Currently these are fsck -s,
select-super -s and restore -u (though I'd like to see all the option
names unified; 'S' is my candidate that would not break compatibility).
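
For reference, the locations such a parameter would select can be computed
as follows. This is a sketch in shell that mirrors the arithmetic of the
btrfs_sb_offset() helper in the btrfs sources (primary copy at 64 KiB,
mirror N at 16 KiB << (12 * N)); the shell function itself is illustrative
only and assumes a shell with 64-bit arithmetic:

```sh
# Print the byte offset of superblock copy $1 (0 = primary, 1 and 2 = mirrors).
btrfs_sb_offset() {
    mirror=$1
    if [ "$mirror" -eq 0 ]; then
        echo $((64 * 1024))                         # primary: 64 KiB
    else
        echo $(( (16 * 1024) << (12 * mirror) ))    # 64 MiB, 256 GiB
    fi
}
```

Mirror 1 therefore sits at 64 MiB into the device, which would explain why
a stale copy can survive a mkfs.ext4 that only rewrites the start of the
disk.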

david


Re: [RESEND RFC PATCH 2/2] Btrfs: disable the qgroup level 0 for userspace use

2013-02-22 Thread David Sterba
On Sat, Feb 23, 2013 at 12:39:24AM +0800, Shilong Wang wrote:
 Hello,
 
 2013/2/22 Arne Jansen sensi...@gmx.net:
  On 02/22/13 13:09, Wang Shilong wrote:
  From: Wang Shilong wangsl-f...@cn.fujitsu.com
 
  This patch tries to stop users from creating/destroying level 0 qgroups;
  users can only create/destroy qgroups whose level is greater than 0.
 
  See the facts:
  A subvolume/snapshot qgroup is created automatically when a
  subvolume/snapshot is created, so a manually created level 0 qgroup can't
  be a subvolume/snapshot qgroup. The only way to use it is to assign a
  subvolume/snapshot qgroup to it, and the point is that we don't want to
  have a parent qgroup whose level is 0.
 
  So we want to force users to use qgroups with clear relations,
  which means a parent qgroup's level > child qgroup's level. For example:
 
                2/0
               /   \
              /     \
             /       \
           1/0       1/1
           / \          \
          /   \          \
         /     \          \
      0/256   0/257      0/258
 
  This pattern of quota is natural and easy for users to understand;
  otherwise it will make the quota configuration confusing and difficult
  to maintain.
 
  I agree that a strict hierarchy of the levels should be enforced.
  Currently the kernel has no idea of 'level', it's just an artificial
  concept that lives in userspace. This patch would be the first place
  to add that magic shift '48' to the kernel.
  In my opinion it would be sufficient to do the enforcement in user
  space, as it is of no technical nature.
 
 
 ...I have made some patches for this work in btrfs-progs, but they have
 not been merged...
 I will pick up those patches and do the other necessary work..

This one?

https://patchwork.kernel.org/patch/2008591/

went through integration branch into progs' master.
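
To make the "magic shift 48" concrete: a qgroupid is a 64-bit number with
the level in the upper 16 bits and the id in the lower 48 bits, so 1/0 is
1 << 48. A sketch of how a userspace check of the hierarchy rule might
look; the function names are hypothetical and a shell with 64-bit
arithmetic is assumed:

```sh
QGROUP_LEVEL_SHIFT=48

# Compose a numeric qgroupid from "level id", e.g. qgroup_id 1 0 for 1/0.
qgroup_id() {
    echo $(( ($1 << QGROUP_LEVEL_SHIFT) | $2 ))
}

# Extract the level from a numeric qgroupid.
qgroup_level() {
    echo $(( $1 >> QGROUP_LEVEL_SHIFT ))
}

# Succeeds only if the parent's level is strictly greater than the child's.
# Arguments: parent qgroupid, child qgroupid (both numeric).
qgroup_relation_ok() {
    [ "$(qgroup_level "$1")" -gt "$(qgroup_level "$2")" ]
}
```

With this rule, assigning 0/256 to 1/0 passes, while assigning 0/256 to
another level 0 qgroup is rejected.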

david


Re: [RFC][PATCH] btrfs: clean snapshots one by one

2013-02-22 Thread David Sterba
On Sun, Feb 17, 2013 at 09:55:23PM +0200, Alex Lyakas wrote:
  --- a/fs/btrfs/disk-io.c
  +++ b/fs/btrfs/disk-io.c
  @@ -1635,15 +1635,17 @@ static int cleaner_kthread(void *arg)
          struct btrfs_root *root = arg;
 
          do {
  +               int again = 0;
  +
                  if (!(root->fs_info->sb->s_flags & MS_RDONLY) &&
                      mutex_trylock(&root->fs_info->cleaner_mutex)) {
                          btrfs_run_delayed_iputs(root);
  -                       btrfs_clean_old_snapshots(root);
  +                       again = btrfs_clean_one_deleted_snapshot(root);
                          mutex_unlock(&root->fs_info->cleaner_mutex);
                          btrfs_run_defrag_inodes(root->fs_info);
                  }
 
  -               if (!try_to_freeze()) {
  +               if (!try_to_freeze() && !again) {
                          set_current_state(TASK_INTERRUPTIBLE);
                          if (!kthread_should_stop())
                                  schedule();
  @@ -3301,8 +3303,8 @@ int btrfs_commit_super(struct btrfs_root *root)
 
          mutex_lock(&root->fs_info->cleaner_mutex);
          btrfs_run_delayed_iputs(root);
  -       btrfs_clean_old_snapshots(root);
          mutex_unlock(&root->fs_info->cleaner_mutex);
  +       wake_up_process(root->fs_info->cleaner_kthread);
 I am probably missing something, but if the cleaner wakes up here,
 won't it attempt cleaning the next snap? Because I don't see the
 cleaner checking anywhere that we are unmounting. Or at this point
 dead_roots is supposed to be empty?

No, you're right, the check of umount semaphore is missing (was in the
dusted patchset and was titled 'avoid cleaner deadlock' which we solve
now in another way, so I did not realize the patch is actually needed).
So, this hunk should do it:

--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1627,11 +1627,13 @@ static int cleaner_kthread(void *arg)
                int again = 0;

                if (!(root->fs_info->sb->s_flags & MS_RDONLY) &&
+                   down_read_trylock(&root->fs_info->sb->s_umount) &&
                    mutex_trylock(&root->fs_info->cleaner_mutex)) {
                        btrfs_run_delayed_iputs(root);
                        again = btrfs_clean_one_deleted_snapshot(root);
                        mutex_unlock(&root->fs_info->cleaner_mutex);
                        btrfs_run_defrag_inodes(root->fs_info);
+                       up_read(&root->fs_info->sb->s_umount);
                }

                if (!try_to_freeze() && !again) {
---

Seems that also checking for btrfs_fs_closing != 0 would help here.

And to the second part: no, dead_roots is not supposed to be empty.

  @@ -1783,31 +1783,50 @@ cleanup_transaction:
   }
 
   /*
  - * interface function to delete all the snapshots we have scheduled for deletion
  + * return < 0 if error
  + * 0 if there are no more dead_roots at the time of call
  + * 1 there are more to be processed, call me again
  + *
  + * The return value indicates there are certainly more snapshots to delete, but
  + * if there comes a new one during processing, it may return 0. We don't mind,
  + * because btrfs_commit_super will poke cleaner thread and it will process it a
  + * few seconds later.
    */
  -int btrfs_clean_old_snapshots(struct btrfs_root *root)
  +int btrfs_clean_one_deleted_snapshot(struct btrfs_root *root)
   {
  -       LIST_HEAD(list);
  +       int ret;
  +       int run_again = 1;
          struct btrfs_fs_info *fs_info = root->fs_info;
 
  +       if (root->fs_info->sb->s_flags & MS_RDONLY) {
  +               pr_debug(G "btrfs: cleaner called for RO fs!\n");
  +               return 0;
  +       }
  +
          spin_lock(&fs_info->trans_lock);
  -       list_splice_init(&fs_info->dead_roots, &list);
  +       if (list_empty(&fs_info->dead_roots)) {
  +               spin_unlock(&fs_info->trans_lock);
  +               return 0;
  +       }
  +       root = list_first_entry(&fs_info->dead_roots,
  +                       struct btrfs_root, root_list);
  +       list_del(&root->root_list);
          spin_unlock(&fs_info->trans_lock);
 
  -       while (!list_empty(&list)) {
  -               int ret;
  -
  -               root = list_entry(list.next, struct btrfs_root, root_list);
  -               list_del(&root->root_list);
  +       pr_debug("btrfs: cleaner removing %llu\n",
  +                       (unsigned long long)root->objectid);
 
  -               btrfs_kill_all_delayed_nodes(root);
  +       btrfs_kill_all_delayed_nodes(root);
 
  -               if (btrfs_header_backref_rev(root->node) <
  -                   BTRFS_MIXED_BACKREF_REV)
  -                       ret = btrfs_drop_snapshot(root, NULL, 0, 0);
  -               else
  -                       ret = btrfs_drop_snapshot(root, NULL, 1, 0);
  -               BUG_ON(ret < 0);
  -       }
  -       return 0;
  +       if (btrfs_header_backref_rev(root->node) <
  +                       BTRFS_MIXED_BACKREF_REV)
  +   ret = btrfs_drop_snapshot(root, NULL, 0, 

Re: Rebalancing RAID1

2013-02-22 Thread Fredrik Tolf

On Mon, 18 Feb 2013, Stefan Behrens wrote:

On Fri, 15 Feb 2013 22:56:19 +0100 (CET), Fredrik Tolf wrote:

The oops cut can be found here:
http://www.dolda2000.com/~fredrik/tmp/btrfs-oops


This scrub issue is fixed since Linux 3.8-rc1 with commit
4ded4f6 Btrfs: fix BUG() in scrub when first superblock reading gives EIO


I see, thanks!

Rebooting the system did get me running again, allowing me to remove the 
missing device from filesystem. However, I encountered a couple of 
somewhat strange happenings as I did that. I don't know if they're 
considered bugs or not, but I thought I had best report them.


To begin with, the act of removing the missing device from the filesystem 
itself caused the resynchronization to the new device to happen in 
blocking mode, so the btrfs device delete missing operation took about a 
day to finish. My expectation would have been that the device removal 
would have been a fast operation and that I would have had to scrub the 
filesystem or something in order to resynchronize, but I can see how this 
would be intended behavior.


However, what's weirder is that while the resynchronization was underway, 
I couldn't mount subvolumes on other mountpoints. The mount commands 
blocked (disk-slept) until the entire synchronization was done, and I 
don't think this was intended behavior, because I had the kernel saying 
the following while it happened:


Feb 16 06:01:27 nerv kernel: [ 3482.512106] INFO: task mount:3525 blocked for more than 120 seconds.
Feb 16 06:01:28 nerv kernel: [ 3482.518484] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 16 06:01:28 nerv kernel: [ 3482.526324] mount   D 88003e220e40     0  3525   3524 0x
Feb 16 06:01:28 nerv kernel: [ 3482.533587]  88003e220e40 0082 a0067470 88003e2300c0
Feb 16 06:01:28 nerv kernel: [ 3482.541088]  00013b40 88001126dfd8 00013b40 88001126dfd8
Feb 16 06:01:28 nerv kernel: [ 3482.548584]  00013b40 88003e220e40 00013b40 88001126c010
Feb 16 06:01:28 nerv kernel: [ 3482.556280] Call Trace:
Feb 16 06:01:28 nerv kernel: [ 3482.558776]  [81396132] ? __mutex_lock_common+0x10d/0x175
Feb 16 06:01:28 nerv kernel: [ 3482.565078]  [81396260] ? mutex_lock+0x1a/0x2c
Feb 16 06:01:28 nerv kernel: [ 3482.570661]  [a05a38c2] ? btrfs_scan_one_device+0x40/0x133 [btrfs]
Feb 16 06:01:28 nerv kernel: [ 3482.577752]  [a0564e8b] ? btrfs_mount+0x1c4/0x4d8 [btrfs]
Feb 16 06:01:28 nerv kernel: [ 3482.584080]  [810e56cb] ? pcpu_next_pop+0x37/0x43
Feb 16 06:01:28 nerv kernel: [ 3482.589709]  [810e52c0] ? cpumask_next+0x18/0x1a
Feb 16 06:01:28 nerv kernel: [ 3482.595226]  [811012aa] ? alloc_pages_current+0xbb/0xd8
Feb 16 06:01:28 nerv kernel: [ 3482.601345]  [81113778] ? mount_fs+0x6c/0x149
Feb 16 06:01:28 nerv kernel: [ 3482.606595]  [811291f7] ? vfs_kern_mount+0x67/0xdd
Feb 16 06:01:28 nerv kernel: [ 3482.612292]  [a056516b] ? btrfs_mount+0x4a4/0x4d8 [btrfs]
Feb 16 06:01:28 nerv kernel: [ 3482.618673]  [810e52c0] ? cpumask_next+0x18/0x1a
Feb 16 06:01:28 nerv kernel: [ 3482.624178]  [811012aa] ? alloc_pages_current+0xbb/0xd8
Feb 16 06:01:28 nerv kernel: [ 3482.630347]  [81113778] ? mount_fs+0x6c/0x149
Feb 16 06:01:28 nerv kernel: [ 3482.635580]  [811291f7] ? vfs_kern_mount+0x67/0xdd
Feb 16 06:01:28 nerv kernel: [ 3482.641258]  [811292e0] ? do_kern_mount+0x49/0xd6
Feb 16 06:01:29 nerv kernel: [ 3482.646855]  [81129a98] ? do_mount+0x72b/0x791
Feb 16 06:01:29 nerv kernel: [ 3482.652186]  [81129b86] ? sys_mount+0x88/0xc3
Feb 16 06:01:29 nerv kernel: [ 3482.657464]  [8139d229] ? system_call_fastpath+0x16/0x1b

Furthermore, it struck me that the consequences of having to mount a
filesystem with missing devices with -o degraded can be a bit strange. I
realize what the intention of the behavior is, of course, but I think it
might cause quite some difficulties when trying to mount a degraded btrfs 
filesystem as root on a system that you don't have physical access to, 
like a hosted server, because it might be hard to manipulate the boot 
process so as to pass that mountflag to the initrd. Note that this is not 
a problem with md-raid; it will simply assemble its arrays in degraded 
mode automatically, without intervention. I'm not necessarily saying 
that's better, but I thought I should bring up the point.


--

Fredrik Tolf


Changing allocation mode

2013-02-22 Thread Fredrik Tolf

Dear list,

I'm still in the process of transferring all the data I have to the btrfs 
filesystem I have had your help in debugging in a previous thread, and I 
have a slight question, if you will humour me.


I have the data I want to transfer on an old ReiserFS partition, 
consisting of 2 mdraid mirrors, one of which consists of two 1.5 TB disks, 
and the other of two 3 TB disks. The btrfs I'm copying the data to 
consists of two 3 TB disks only that I have put in RAID-1 mode, and the 
data on the old filesystem is only slightly larger than 3 TB. I am now at 
the point where I have transferred just under 3 TB.


If I were transferring the data to a new filesystem on mdraid, the 
procedure I would use for that last portion of the data would be to remove 
one disk only from either of the old mdraid mirror arrays (putting that 
array in degraded mode), and then create a new mirror in degraded mode 
with only that disk, add that mirror to the new filesystem, expand it, 
copy the last data, and then delete the old mirrors, moving the rest of 
the disks to the new filesystem.


Is there a way to mirror this procedure in btrfs? I'm not yet quite so 
familiar with all btrfs concepts that I know quite what I'm talking about, 
but I'm guessing that what I want to do is to merely temporarily set the 
allocator to allocate new btrfs on a single disk only, and then add a 
single disk to the filesystem. And then copy the rest of the data, abandon 
the old filesystem and add another disk and rebalance those 
singly-allocated extents to RAID-1 mode.


Have I described a conceivable idea in saying so? And if so, how does
one actually do that? I don't know if I'm just blind, but I haven't found 
any btrfs command to change the allocation algorithm without having to 
rebalance the existing data, which seems a bit unnecessary in this case.


Thanks for any help you can offer!

--

Fredrik Tolf


Re: [Tests] xfs test[298]: Btrfs Quota testing

2013-02-22 Thread Dave Chinner
On Fri, Feb 22, 2013 at 11:24:04AM +0530, Hemanth Kumar wrote:
 
 Signed-off-by: Hemanth Kumar hemanthkuma...@gmail.com

Description?

 ---
  298 | 37 +
  298.out | 12 
  2 files changed, 49 insertions(+)
  create mode 100644 298
  create mode 100644 298.out

You didn't actually run this in the xfstests harness, did you? i.e.

$ ./check 298

because it won't run, as there is no entry in the group file for this
test.

 
 diff --git a/298 b/298
 new file mode 100644
 index 000..d699fb7
 --- /dev/null
 +++ b/298
 @@ -0,0 +1,37 @@
 +
 +#! /bin/bash
 +# FS QA Test No. 298

Newline at top of patch.

 +#
 +# Test btrfs's quotas
 +#
 +#--
 +#
 +# creator
 +#owner=hemanthkuma...@gmail.com

Copyright statement?

 +
 +
 +seq=`basename $0`
 +echo "QA output created by $seq"
 +
 +here=`pwd`
 +tmp=/tmp/$$
 +status=1    # failure is the default!
 +
 +_cleanup()
 +{
 +    rm -rf $tmp.*
 +}
 +
 +trap "_cleanup ; exit \$status" 0 1 2 3 15
 +
 +#Enabeling btrfs qutas

Where are all the usual _require() statements?

 +btrfs quota enable $TEST_DIR
 +echo quota enabled on $TEST_DEV

That won't work - you're not allowed to output actual device names
into the golden output. We always filter them to
TEST_DEV/TEST_DIR or SCRATCH_DEV/SCRATCH_MNT.
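
A rough idea of what such a filter does, as a standalone sketch. The
function name filter_test_names is hypothetical (xfstests ships its own
filters in its common files); it assumes TEST_DEV and TEST_DIR are both
set, non-empty, and free of '#' and regex metacharacters:

```sh
# Replace the configured device and mount point with their symbolic names,
# so the golden output is the same on every machine.
filter_test_names() {
    sed -e "s#$TEST_DEV#TEST_DEV#g" \
        -e "s#$TEST_DIR#TEST_DIR#g"
}
```

Used as "btrfs subvolume create $TEST_DIR/vol1 | filter_test_names", the
.out file would then contain TEST_DIR/vol1 rather than /test/vol1.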

 +btrfs subvolume create $TEST_DIR/vol1
 +echo vol1 created
 +btrfs qgroup show $TEST_DIR

That output will change for different filesystem configurations.
It needs filtering, or dropping.

 +btrfs qgroup limit 2m $TEST_DIR/vol1
 +echo qgroup limited to 2mb
 +dd if=$TEST_DEV of=$TEST_DIR/vol1/file1 bs=3M count=1

You need to filter the output of dd to remove all variable data.
However, it is preferable to use xfs_io for doing IO. i.e:

$XFS_IO_PROG -f -c "pwrite 0 3m" $TEST_DIR/vol1/file1 | filter_io

 +echo tried to write 3m worth data
 +exit

You never set status=0, so the test will always fail.
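
The failure-by-default pattern behind that boilerplate can be shown in
isolation. A sketch only: run_with_default_failure is a made-up wrapper
for illustration, not an xfstests helper, and it runs its body in a
subshell so the trap does not leak into the caller:

```sh
# Run a command under the "status=1 until proven otherwise" convention.
run_with_default_failure() (
    status=1                   # failure is the default
    trap 'exit $status' 0      # the exit trap always reports $status
    "$@" || exit               # early exit: status is still 1
    status=0                   # reached only if the command succeeded
)
```

In a real test the final "status=0" goes right before the closing "exit",
after all the steps that may fail.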

Also, you don't undo any of the modifications you made to the
TEST_DEV, which means that it will affect all subsequent tests. If
you are doing specific configuration tests, you should be using
SCRATCH_DEV/SCRATCH_MNT.

 diff --git a/298.out b/298.out
 new file mode 100644
 index 000..344ab7f
 --- /dev/null
 +++ b/298.out
 @@ -0,0 +1,12 @@
 +QA output created by 298
 +quota enabled on /dev/sdb5
 +Create subvolume '/test/vol1'
 +vol1 created
 +0/257 4096 4096
 +qgroup limited to 2mb
 +dd: writing ‘/test/vol1/file1’: Disk quota exceeded

You've got environment specific characters in your output.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [RESEND RFC PATCH 2/2] Btrfs: disable the qgroup level 0 for userspace use

2013-02-22 Thread Shilong Wang
Hello, David

2013/2/23 David Sterba dste...@suse.cz:
 On Sat, Feb 23, 2013 at 12:39:24AM +0800, Shilong Wang wrote:
 Hello,

 2013/2/22 Arne Jansen sensi...@gmx.net:
  On 02/22/13 13:09, Wang Shilong wrote:
  From: Wang Shilong wangsl-f...@cn.fujitsu.com
 
  This patch tries to stop users from creating/destroying level 0 qgroups;
  users can only create/destroy qgroups whose level is greater than 0.
 
  See the facts:
  A subvolume/snapshot qgroup is created automatically when a
  subvolume/snapshot is created, so a manually created level 0 qgroup can't
  be a subvolume/snapshot qgroup. The only way to use it is to assign a
  subvolume/snapshot qgroup to it, and the point is that we don't want to
  have a parent qgroup whose level is 0.
 
  So we want to force users to use qgroups with clear relations,
  which means a parent qgroup's level > child qgroup's level. For example:
 
                2/0
               /   \
              /     \
             /       \
           1/0       1/1
           / \          \
          /   \          \
         /     \          \
      0/256   0/257      0/258
 
  This pattern of quota is natural and easy for users to understand;
  otherwise it will make the quota configuration confusing and difficult
  to maintain.
 
  I agree that a strict hierarchy of the levels should be enforced.
  Currently the kernel has no idea of 'level', it's just an artificial
  concept that lives in userspace. This patch would be the first place
  to add that magic shift '48' to the kernel.
  In my opinion it would be sufficient to do the enforcement in user
  space, as it is of no technical nature.
 

 ...I have made some patches for this work in btrfs-progs, but they have
 not been merged...
 I will pick up those patches and do the other necessary work..

 This one?

 https://patchwork.kernel.org/patch/2008591/

 went through integration branch into progs' master.

Yes, it is. However, more work needs to be done to make it work well.
I'll continue my work based on integration-20130126.

Thanks,
Wang

 david