Re: [RFC PATCH] Btrfs: fix full backref problem when inserting shared block reference

2013-01-31 Thread Alex Lyakas
Thanks for your comments, Miao.

On Thu, Jan 31, 2013 at 4:42 AM, Miao Xie mi...@cn.fujitsu.com wrote:
 On Wed, 30 Jan 2013 20:23:22 +0200, Alex Lyakas wrote:
 Hi Miao,
 I was following this thread in the past, but I did not understand it
 fully, maybe you can explain?

  # mkfs.btrfs partition
  # mount partition mnt
  # cd mnt
  # for ((i=0;i2400;i++)); do touch long_name_to_make_tree_more_deep$i; 
 done
  # for ((i=0; i4; i++))
   do
   mkdir $i
   for ((j=0; j200; j++))
   do
   btrfs sub snap . $i/$j
   done 
   done

 snapshot creation has a critical section.  Once we copy a given root to
 its snapshot, we're not allowed to change it until the transaction
 is fully committed.

 Is the limitation that if we are creating a snap B of root A, and
 placing the root of B somewhere into the tree of A, then we can do
 this only once per transaction? Does this limitation still exist or
 your fix fixes it?

 The limitation is the snapshoted subvolume can not be changed until the 
 transaction
 is committed. That is we can not insert anything(including the root of B and 
 the
 directory item/index of B) into the tree of A after snap B is created.

 This limitation was fixed.

 Also, according to your reproducer, each btrfs sub snap will
 start/join a transaction, but then it will call
 btrfs_commit_transaction() and not btrfs_commit_transaction_async(),
 so it will wait until the transaction commits. So how it may happen
 that you create more than one snap in the same transaction with your
 reproducer?

 run several tasks, and each task create snapshots repeatedly in its own
 directory.
 (If we create snapshots in the same directory, the i_mutex of the directory
  will make the process serialized)

So I missed the fact that btrfs_start_transaction() can actually join
an existing transaction. So if several threads ask to create snaps in
parallel, we may hit this.


 The reason I am asking, is that I want to try to write code that
 creates several snaps in one transaction and only then commits. Should
 this be possible or there is some limitation, like I mentioned above?

 As far as I know, it is possible, there is no limitation now.

 Thanks
 Miao
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] Btrfs: fix full backref problem when inserting shared block reference

2013-01-30 Thread Alex Lyakas
Hi Miao,
I was following this thread in the past, but I did not understand it
fully, maybe you can explain?

   # mkfs.btrfs partition
   # mount partition mnt
   # cd mnt
   # for ((i=0;i2400;i++)); do touch long_name_to_make_tree_more_deep$i; 
  done
   # for ((i=0; i4; i++))
do
mkdir $i
for ((j=0; j200; j++))
do
btrfs sub snap . $i/$j
done 
done
 
  snapshot creation has a critical section.  Once we copy a given root to
  its snapshot, we're not allowed to change it until the transaction
  is fully committed.

Is the limitation that if we are creating a snap B of root A, and
placing the root of B somewhere into the tree of A, then we can do
this only once per transaction? Does this limitation still exist or
your fix fixes it?

Also, according to your reproducer, each btrfs sub snap will
start/join a transaction, but then it will call
btrfs_commit_transaction() and not btrfs_commit_transaction_async(),
so it will wait until the transaction commits. So how it may happen
that you create more than one snap in the same transaction with your
reproducer?

The reason I am asking, is that I want to try to write code that
creates several snaps in one transaction and only then commits. Should
this be possible or there is some limitation, like I mentioned above?

Thanks for your help,
Alex.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] Btrfs: fix full backref problem when inserting shared block reference

2012-08-21 Thread Miao Xie
On Thu, 9 Aug 2012 09:21:29 +0200, David Sterba wrote:
 On Thu, Aug 09, 2012 at 08:48:02AM +0200, David Sterba wrote:
 and down, no problems so far, and the wikipedia test-subvol stresstest
 that caused trouble to one of your patches is also ok. I'll do some more
 testing on other machines and will report problems eventually.
 
 So it won't be so easy :)
 
 The test generated 15+ G of data, ~500 snapshots, then umount and fsck:
 
 lots of
 
 ref mismatch on [9655283712 4096] extent item 1, found 0
 Incorrect local backref count on 9655283712 root 5 owner 589776 offset 110592 
 found 0 wanted 1 back 0x86badf0
 backpointer mismatch on [9655283712 4096]
 owner ref check failed [9655283712 4096]

By debuging, I found it should be a bug of btrfsck.
Could you try this patch?

Thanks
Miao

From 77e9bcaae464354c0b0176631ba51374e3d31cfc Mon Sep 17 00:00:00 2001
From: Miao Xie mi...@cn.fujitsu.com
Date: Tue, 21 Aug 2012 14:16:27 +0800
Subject: [PATCH] Btrfs-progs: fix wrong return value of check_owner_ref()

If we find the block by seach corresponding fs tree, we should return 0,
and tell the caller we pass the check.

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
 btrfsck.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/btrfsck.c b/btrfsck.c
index 4e91769..57e7b57 100644
--- a/btrfsck.c
+++ b/btrfsck.c
@@ -1954,7 +1954,7 @@ static int check_owner_ref(struct btrfs_root *root,
 
if (buf-start == btrfs_node_blockptr(path.nodes[level + 1],
  path.slots[level + 1]))
-   rec-owner_ref_checked = 1;
+   found = 1;
 
btrfs_release_path(ref_root, path);
return found ? 0 : 1;
-- 
1.7.6.5

 and then
 
 checking fs roots
 root 2854 inode 233882 errors 2500
 root 2880 inode 271639 errors 2200
 
 and it's not finished yet, other types of error may pop up.

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] Btrfs: fix full backref problem when inserting shared block reference

2012-08-10 Thread Miao Xie
On Thu, 9 Aug 2012 09:21:29 +0200, David Sterba wrote:
 On Thu, Aug 09, 2012 at 08:48:02AM +0200, David Sterba wrote:
 and down, no problems so far, and the wikipedia test-subvol stresstest
 that caused trouble to one of your patches is also ok. I'll do some more
 testing on other machines and will report problems eventually.
 
 So it won't be so easy :)
 
 The test generated 15+ G of data, ~500 snapshots, then umount and fsck:
 
 lots of
 
 ref mismatch on [9655283712 4096] extent item 1, found 0
 Incorrect local backref count on 9655283712 root 5 owner 589776 offset 110592 
 found 0 wanted 1 back 0x86badf0
 backpointer mismatch on [9655283712 4096]
 owner ref check failed [9655283712 4096]
 
 and then
 
 checking fs roots
 root 2854 inode 233882 errors 2500
 root 2880 inode 271639 errors 2200
 
 and it's not finished yet, other types of error may pop up.


Could you try Arne's patch?

[PATCH v2] Btrfs: fix race in run_clustered_refs
http://marc.info/?l=linux-btrfsm=134449329717830w=2

I run test for several times with this patch, and the problem didn't happen.
So it seems this patch can fix the above problem.

Regards
Miao

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] Btrfs: fix full backref problem when inserting shared block reference

2012-08-10 Thread Miao Xie
On  thu, 9 Aug 2012 14:04:05 -0400, Chris Mason wrote:
 On Wed, Aug 08, 2012 at 09:10:17PM -0600, Miao Xie wrote:
 If we create several snapshots at the same time, the following BUG_ON() will 
 be
 triggered.

  kernel BUG at fs/btrfs/extent-tree.c:6047!

 Steps to reproduce:
  # mkfs.btrfs partition
  # mount partition mnt
  # cd mnt
  # for ((i=0;i2400;i++)); do touch long_name_to_make_tree_more_deep$i; done
  # for ((i=0; i4; i++))
   do
   mkdir $i
   for ((j=0; j200; j++))
   do
   btrfs sub snap . $i/$j
   done 
   done
 
 snapshot creation has a critical section.  Once we copy a given root to
 its snapshot, we're not allowed to change it until the transaction
 is fully committed.

I knew this critical section. But I think we can kick it away by forcing the
snapshoted tree to do COW.

BTW, I will take a vacation next week, so I can not reply it until the week 
after next.
If it is not urgent, I will continue looking into this problem after I come 
back.

Thanks
Miao

 
 This means that if you're taking a snapshot of root A and storing the
 directory item of the snapshot in root A, you can only do it once per
 transaction with getting into trouble.
 
 Looks like the code doesn't enforce this though.  Dave, could you please
 give this a try:
 
 diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
 index fcc8c21..9e7c621 100644
 --- a/fs/btrfs/ctree.c
 +++ b/fs/btrfs/ctree.c
 @@ -1324,6 +1324,8 @@ noinline int btrfs_cow_block(struct btrfs_trans_handle 
 *trans,
   u64 search_start;
   int ret;
  
 + WARN_ON(root-danger_transid == trans-transid);
 +
   if (trans-transaction != root-fs_info-running_transaction) {
   printk(KERN_CRIT trans %llu running %llu\n,
  (unsigned long long)trans-transid,
 diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
 index 0d195b5..35b5603 100644
 --- a/fs/btrfs/ctree.h
 +++ b/fs/btrfs/ctree.h
 @@ -1490,6 +1490,12 @@ struct btrfs_root {
   u64 objectid;
   u64 last_trans;
  
 + /*
 +  * the last transaction that took a snapshot of this
 +  * root.  We're only allowed one snapshot per root per transaction
 +  */
 + u64 snapshot_trans;
 +
   /* data allocations are done in sectorsize units */
   u32 sectorsize;
  
 @@ -1550,6 +1556,13 @@ struct btrfs_root {
  
   int force_cow;
  
 + /*
 +  * this marks the critical section of snapshot creation.  If we
 +  * make any changes to a root after this critical section starts,
 +  * we corrupt the FS.  It is checked by btrfs_cow_block
 +  */
 + u64 danger_transid;
 +
   spinlock_t root_times_lock;
  };
  
 diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
 index 9df50fa..b20d835 100644
 --- a/fs/btrfs/ioctl.c
 +++ b/fs/btrfs/ioctl.c
 @@ -524,13 +524,28 @@ static int create_snapshot(struct btrfs_root *root, 
 struct dentry *dentry,
   pending_snapshot-inherit = *inherit;
   *inherit = NULL;/* take responsibility to free it */
   }
 -
 +again:
   trans = btrfs_start_transaction(root-fs_info-extent_root, 5);
   if (IS_ERR(trans)) {
   ret = PTR_ERR(trans);
   goto fail;
   }
  
 + /* we're only allowed to snapshot a given root once per transaction */
 + spin_lock(root-fs_info-trans_lock);
 + if (root-snapshot_trans == trans-transid) {
 + spin_unlock(root-fs_info-trans_lock);
 + ret = btrfs_commit_transaction(trans, 
 root-fs_info-extent_root);
 + if (ret)
 + goto fail;
 + goto again;
 + }
 +
 + root-snapshot_trans = trans-transid;
 +
 + spin_unlock(root-fs_info-trans_lock);
 +
 +
   ret = btrfs_snap_reserve_metadata(trans, pending_snapshot);
   BUG_ON(ret);
  
 diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
 index 27c2600..4507421 100644
 --- a/fs/btrfs/transaction.c
 +++ b/fs/btrfs/transaction.c
 @@ -1093,6 +1093,7 @@ static noinline int create_pending_snapshot(struct 
 btrfs_trans_handle *trans,
  
   /* see comments in should_cow_block() */
   root-force_cow = 1;
 + root-danger_transid = trans-transid;
   smp_wmb();
  
   btrfs_set_root_node(new_root_item, tmp);
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] Btrfs: fix full backref problem when inserting shared block reference

2012-08-10 Thread Chris Mason
On Fri, Aug 10, 2012 at 04:38:47AM -0600, Miao Xie wrote:
 Onthu, 9 Aug 2012 14:04:05 -0400, Chris Mason wrote:
  On Wed, Aug 08, 2012 at 09:10:17PM -0600, Miao Xie wrote:
  If we create several snapshots at the same time, the following BUG_ON() 
  will be
  triggered.
 
 kernel BUG at fs/btrfs/extent-tree.c:6047!
 
  Steps to reproduce:
   # mkfs.btrfs partition
   # mount partition mnt
   # cd mnt
   # for ((i=0;i2400;i++)); do touch long_name_to_make_tree_more_deep$i; 
  done
   # for ((i=0; i4; i++))
do
mkdir $i
for ((j=0; j200; j++))
do
btrfs sub snap . $i/$j
done 
done
  
  snapshot creation has a critical section.  Once we copy a given root to
  its snapshot, we're not allowed to change it until the transaction
  is fully committed.
 
 I knew this critical section. But I think we can kick it away by forcing the
 snapshoted tree to do COW.

Yes, it should be possible.

 
 BTW, I will take a vacation next week, so I can not reply it until the week 
 after next.
 If it is not urgent, I will continue looking into this problem after I come 
 back.

No problem, we can take the current patch for now and improve add back
the ability to do multiple snapshots per root later.

-chris
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] Btrfs: fix full backref problem when inserting shared block reference

2012-08-09 Thread David Sterba
On Thu, Aug 09, 2012 at 11:10:17AM +0800, Miao Xie wrote:
 I chose the 1st way to fix it.

The least I can say now is that it fixed the crash! The approach is
minimalistic and I think we can take it for now. I didn't review it,
only tested with reproducer you described plus modified the numbers up
and down, no problems so far, and the wikipedia test-subvol stresstest
that caused trouble to one of your patches is also ok. I'll do some more
testing on other machines and will report problems eventually.

Thanks!

david
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] Btrfs: fix full backref problem when inserting shared block reference

2012-08-09 Thread David Sterba
On Thu, Aug 09, 2012 at 08:48:02AM +0200, David Sterba wrote:
 and down, no problems so far, and the wikipedia test-subvol stresstest
 that caused trouble to one of your patches is also ok. I'll do some more
 testing on other machines and will report problems eventually.

So it won't be so easy :)

The test generated 15+ G of data, ~500 snapshots, then umount and fsck:

lots of

ref mismatch on [9655283712 4096] extent item 1, found 0
Incorrect local backref count on 9655283712 root 5 owner 589776 offset 110592 
found 0 wanted 1 back 0x86badf0
backpointer mismatch on [9655283712 4096]
owner ref check failed [9655283712 4096]

and then

checking fs roots
root 2854 inode 233882 errors 2500
root 2880 inode 271639 errors 2200

and it's not finished yet, other types of error may pop up.

david
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] Btrfs: fix full backref problem when inserting shared block reference

2012-08-09 Thread Miao Xie
On Thu, 9 Aug 2012 09:21:29 +0200, David Sterba wrote:
 On Thu, Aug 09, 2012 at 08:48:02AM +0200, David Sterba wrote:
 and down, no problems so far, and the wikipedia test-subvol stresstest
 that caused trouble to one of your patches is also ok. I'll do some more
 testing on other machines and will report problems eventually.
 
 So it won't be so easy :)
 
 The test generated 15+ G of data, ~500 snapshots, then umount and fsck:
 
 lots of
 
 ref mismatch on [9655283712 4096] extent item 1, found 0
 Incorrect local backref count on 9655283712 root 5 owner 589776 offset 110592 
 found 0 wanted 1 back 0x86badf0
 backpointer mismatch on [9655283712 4096]
 owner ref check failed [9655283712 4096]
 
 and then
 
 checking fs roots
 root 2854 inode 233882 errors 2500
 root 2880 inode 271639 errors 2200
 
 and it's not finished yet, other types of error may pop up.

Thanks for your test, I'll look into it.

Regards 
Miao

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] Btrfs: fix full backref problem when inserting shared block reference

2012-08-09 Thread Josef Bacik
On Wed, Aug 08, 2012 at 09:10:17PM -0600, Miao Xie wrote:
 If we create several snapshots at the same time, the following BUG_ON() will 
 be
 triggered.
 
   kernel BUG at fs/btrfs/extent-tree.c:6047!
 
 Steps to reproduce:
  # mkfs.btrfs partition
  # mount partition mnt
  # cd mnt
  # for ((i=0;i2400;i++)); do touch long_name_to_make_tree_more_deep$i; done
  # for ((i=0; i4; i++))
   do
   mkdir $i
   for ((j=0; j200; j++))
   do
   btrfs sub snap . $i/$j
   done 
   done
 
 The reason is:
 Before transaction commit, some operations changed the fs tree and new tree
 blocks were allocated because of COW. We used the implicit non-shared back
 reference for those newly allocated tree blocks because they were not shared 
 by
 two or more trees.
 
 And then we created the first snapshot for the fs tree, according to the back
 reference rules, we also used implicit back refs for the child tree blocks of
 the root node of the fs tree, now those child nodes/leaves were shared by two
 trees.
 
 Then We didn't deal with the delayed references, and continued to change the 
 fs
 tree(created the second snapshot and inserted the dir item of the new snapshot
 into the fs tree). According to the rules of the back reference, we added full
 back refs for those tree blocks whose parents have be shared by two trees.
 Now some newly allocated tree blocks had two types of the references.
 
 As we know, the delayed reference system handles these delayed references from
 back to front, and the full delayed reference is inserted after the implicit
 ones. So when we dealt with the back references of those newly allocated tree
 blocks, the full references was dealt with at first. And if the first 
 reference
 is a shared back reference and the tree block that the reference points to is
 newly allocated, It would be considered as a tree block which is shared by two
 or more trees when it is allocated and should be a full back reference not a
 implicit one, the flag of its reference also should be set to FULL_BACKREF.
 But in fact, it was a non-shared tree block with a implicit reference at
 beginning, so it was not compulsory to set the flags to FULL_BACKREF. So 
 BUG_ON
 was triggered.
 
 We have several methods to fix this bug:
 1. deal with delayed references after the snapshot is created and before we
change the source tree of the snapshot. This is the easiest and safest way.
 2. modify the sort method of the delayed reference tree, make the full delayed
references be inserted before the implicit ones. It is also very easy, but
I don't know if it will introduce some problems or not.

Thanks for tracking this down, FWIW I like option 2 the most, it would be
intereseting to see if it does actually introduce new issues.  Thanks,

Josef
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] Btrfs: fix full backref problem when inserting shared block reference

2012-08-09 Thread Chris Mason
On Thu, Aug 09, 2012 at 06:23:19AM -0600, Josef Bacik wrote:
 On Wed, Aug 08, 2012 at 09:10:17PM -0600, Miao Xie wrote:
  If we create several snapshots at the same time, the following BUG_ON() 
  will be
  triggered.
  
  kernel BUG at fs/btrfs/extent-tree.c:6047!
  
  Steps to reproduce:
   # mkfs.btrfs partition
   # mount partition mnt
   # cd mnt
   # for ((i=0;i2400;i++)); do touch long_name_to_make_tree_more_deep$i; done
   # for ((i=0; i4; i++))
do
mkdir $i
for ((j=0; j200; j++))
do
btrfs sub snap . $i/$j
done 
done
  
  The reason is:
  Before transaction commit, some operations changed the fs tree and new tree
  blocks were allocated because of COW. We used the implicit non-shared back
  reference for those newly allocated tree blocks because they were not 
  shared by
  two or more trees.
  
  And then we created the first snapshot for the fs tree, according to the 
  back
  reference rules, we also used implicit back refs for the child tree blocks 
  of
  the root node of the fs tree, now those child nodes/leaves were shared by 
  two
  trees.
  
  Then We didn't deal with the delayed references, and continued to change 
  the fs
  tree(created the second snapshot and inserted the dir item of the new 
  snapshot
  into the fs tree). According to the rules of the back reference, we added 
  full
  back refs for those tree blocks whose parents have be shared by two trees.
  Now some newly allocated tree blocks had two types of the references.
  
  As we know, the delayed reference system handles these delayed references 
  from
  back to front, and the full delayed reference is inserted after the implicit
  ones. So when we dealt with the back references of those newly allocated 
  tree
  blocks, the full references was dealt with at first. And if the first 
  reference
  is a shared back reference and the tree block that the reference points to 
  is
  newly allocated, It would be considered as a tree block which is shared by 
  two
  or more trees when it is allocated and should be a full back reference not a
  implicit one, the flag of its reference also should be set to FULL_BACKREF.
  But in fact, it was a non-shared tree block with a implicit reference at
  beginning, so it was not compulsory to set the flags to FULL_BACKREF. So 
  BUG_ON
  was triggered.
  
  We have several methods to fix this bug:
  1. deal with delayed references after the snapshot is created and before we
 change the source tree of the snapshot. This is the easiest and safest 
  way.
  2. modify the sort method of the delayed reference tree, make the full 
  delayed
 references be inserted before the implicit ones. It is also very easy, 
  but
 I don't know if it will introduce some problems or not.
 
 Thanks for tracking this down, FWIW I like option 2 the most, it would be
 intereseting to see if it does actually introduce new issues.  Thanks,

For this release, I like the current patch ;)  Great job tracking it
down Miao.

-chris

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] Btrfs: fix full backref problem when inserting shared block reference

2012-08-09 Thread Josef Bacik
On Thu, Aug 09, 2012 at 07:11:09AM -0600, Chris L. Mason wrote:
 On Thu, Aug 09, 2012 at 06:23:19AM -0600, Josef Bacik wrote:
  On Wed, Aug 08, 2012 at 09:10:17PM -0600, Miao Xie wrote:
   If we create several snapshots at the same time, the following BUG_ON() 
   will be
   triggered.
   
 kernel BUG at fs/btrfs/extent-tree.c:6047!
   
   Steps to reproduce:
# mkfs.btrfs partition
# mount partition mnt
# cd mnt
# for ((i=0;i2400;i++)); do touch long_name_to_make_tree_more_deep$i; 
   done
# for ((i=0; i4; i++))
 do
 mkdir $i
 for ((j=0; j200; j++))
 do
 btrfs sub snap . $i/$j
 done 
 done
   
   The reason is:
   Before transaction commit, some operations changed the fs tree and new 
   tree
   blocks were allocated because of COW. We used the implicit non-shared back
   reference for those newly allocated tree blocks because they were not 
   shared by
   two or more trees.
   
   And then we created the first snapshot for the fs tree, according to the 
   back
   reference rules, we also used implicit back refs for the child tree 
   blocks of
   the root node of the fs tree, now those child nodes/leaves were shared by 
   two
   trees.
   
   Then We didn't deal with the delayed references, and continued to change 
   the fs
   tree(created the second snapshot and inserted the dir item of the new 
   snapshot
   into the fs tree). According to the rules of the back reference, we added 
   full
   back refs for those tree blocks whose parents have be shared by two trees.
   Now some newly allocated tree blocks had two types of the references.
   
   As we know, the delayed reference system handles these delayed references 
   from
   back to front, and the full delayed reference is inserted after the 
   implicit
   ones. So when we dealt with the back references of those newly allocated 
   tree
   blocks, the full references was dealt with at first. And if the first 
   reference
   is a shared back reference and the tree block that the reference points 
   to is
   newly allocated, It would be considered as a tree block which is shared 
   by two
   or more trees when it is allocated and should be a full back reference 
   not a
   implicit one, the flag of its reference also should be set to 
   FULL_BACKREF.
   But in fact, it was a non-shared tree block with a implicit reference at
   beginning, so it was not compulsory to set the flags to FULL_BACKREF. So 
   BUG_ON
   was triggered.
   
   We have several methods to fix this bug:
   1. deal with delayed references after the snapshot is created and before 
   we
  change the source tree of the snapshot. This is the easiest and safest 
   way.
   2. modify the sort method of the delayed reference tree, make the full 
   delayed
  references be inserted before the implicit ones. It is also very easy, 
   but
  I don't know if it will introduce some problems or not.
  
  Thanks for tracking this down, FWIW I like option 2 the most, it would be
  intereseting to see if it does actually introduce new issues.  Thanks,
 
 For this release, I like the current patch ;)  Great job tracking it
 down Miao.
 

Well sure it's much cleaner, but it appears to not work right?

Josef
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] Btrfs: fix full backref problem when inserting shared block reference

2012-08-09 Thread Chris Mason
On Thu, Aug 09, 2012 at 07:12:47AM -0600, Josef Bacik wrote:
 On Thu, Aug 09, 2012 at 07:11:09AM -0600, Chris L. Mason wrote:
  On Thu, Aug 09, 2012 at 06:23:19AM -0600, Josef Bacik wrote:
   On Wed, Aug 08, 2012 at 09:10:17PM -0600, Miao Xie wrote:
If we create several snapshots at the same time, the following BUG_ON() 
will be
triggered.

kernel BUG at fs/btrfs/extent-tree.c:6047!

Steps to reproduce:
 # mkfs.btrfs partition
 # mount partition mnt
 # cd mnt
 # for ((i=0;i2400;i++)); do touch long_name_to_make_tree_more_deep$i; 
done
 # for ((i=0; i4; i++))
  do
  mkdir $i
  for ((j=0; j200; j++))
  do
  btrfs sub snap . $i/$j
  done 
  done

The reason is:
Before transaction commit, some operations changed the fs tree and new 
tree
blocks were allocated because of COW. We used the implicit non-shared 
back
reference for those newly allocated tree blocks because they were not 
shared by
two or more trees.

And then we created the first snapshot for the fs tree, according to 
the back
reference rules, we also used implicit back refs for the child tree 
blocks of
the root node of the fs tree, now those child nodes/leaves were shared 
by two
trees.

Then We didn't deal with the delayed references, and continued to 
change the fs
tree(created the second snapshot and inserted the dir item of the new 
snapshot
into the fs tree). According to the rules of the back reference, we 
added full
back refs for those tree blocks whose parents have be shared by two 
trees.
Now some newly allocated tree blocks had two types of the references.

As we know, the delayed reference system handles these delayed 
references from
back to front, and the full delayed reference is inserted after the 
implicit
ones. So when we dealt with the back references of those newly 
allocated tree
blocks, the full references was dealt with at first. And if the first 
reference
is a shared back reference and the tree block that the reference points 
to is
newly allocated, It would be considered as a tree block which is shared 
by two
or more trees when it is allocated and should be a full back reference 
not a
implicit one, the flag of its reference also should be set to 
FULL_BACKREF.
But in fact, it was a non-shared tree block with a implicit reference at
beginning, so it was not compulsory to set the flags to FULL_BACKREF. 
So BUG_ON
was triggered.

We have several methods to fix this bug:
1. deal with delayed references after the snapshot is created and 
before we
   change the source tree of the snapshot. This is the easiest and 
safest way.
2. modify the sort method of the delayed reference tree, make the full 
delayed
   references be inserted before the implicit ones. It is also very 
easy, but
   I don't know if it will introduce some problems or not.
   
   Thanks for tracking this down, FWIW I like option 2 the most, it would be
   intereseting to see if it does actually introduce new issues.  Thanks,
  
  For this release, I like the current patch ;)  Great job tracking it
  down Miao.
  
 
 Well sure it's much cleaner, but it appears to not work right?

I'm not sure if those are from a different bug.  I'll try to reproduce
as well.

-chris

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] Btrfs: fix full backref problem when inserting shared block reference

2012-08-09 Thread Chris Mason
On Wed, Aug 08, 2012 at 09:10:17PM -0600, Miao Xie wrote:
 If we create several snapshots at the same time, the following BUG_ON() will 
 be
 triggered.
 
   kernel BUG at fs/btrfs/extent-tree.c:6047!
 
 Steps to reproduce:
  # mkfs.btrfs partition
  # mount partition mnt
  # cd mnt
  # for ((i=0;i2400;i++)); do touch long_name_to_make_tree_more_deep$i; done
  # for ((i=0; i4; i++))
   do
   mkdir $i
   for ((j=0; j200; j++))
   do
   btrfs sub snap . $i/$j
   done 
   done

snapshot creation has a critical section.  Once we copy a given root to
its snapshot, we're not allowed to change it until the transaction
is fully committed.

This means that if you're taking a snapshot of root A and storing the
directory item of the snapshot in root A, you can only do it once per
transaction with getting into trouble.

Looks like the code doesn't enforce this though.  Dave, could you please
give this a try:

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index fcc8c21..9e7c621 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1324,6 +1324,8 @@ noinline int btrfs_cow_block(struct btrfs_trans_handle 
*trans,
u64 search_start;
int ret;
 
+   WARN_ON(root-danger_transid == trans-transid);
+
if (trans-transaction != root-fs_info-running_transaction) {
printk(KERN_CRIT trans %llu running %llu\n,
   (unsigned long long)trans-transid,
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0d195b5..35b5603 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1490,6 +1490,12 @@ struct btrfs_root {
u64 objectid;
u64 last_trans;
 
+   /*
+* the last transaction that took a snapshot of this
+* root.  We're only allowed one snapshot per root per transaction
+*/
+   u64 snapshot_trans;
+
/* data allocations are done in sectorsize units */
u32 sectorsize;
 
@@ -1550,6 +1556,13 @@ struct btrfs_root {
 
int force_cow;
 
+   /*
+* this marks the critical section of snapshot creation.  If we
+* make any changes to a root after this critical section starts,
+* we corrupt the FS.  It is checked by btrfs_cow_block
+*/
+   u64 danger_transid;
+
spinlock_t root_times_lock;
 };
 
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 9df50fa..b20d835 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -524,13 +524,28 @@ static int create_snapshot(struct btrfs_root *root, 
struct dentry *dentry,
pending_snapshot-inherit = *inherit;
*inherit = NULL;/* take responsibility to free it */
}
-
+again:
trans = btrfs_start_transaction(root-fs_info-extent_root, 5);
if (IS_ERR(trans)) {
ret = PTR_ERR(trans);
goto fail;
}
 
+   /* we're only allowed to snapshot a given root once per transaction */
+   spin_lock(root-fs_info-trans_lock);
+   if (root-snapshot_trans == trans-transid) {
+   spin_unlock(root-fs_info-trans_lock);
+   ret = btrfs_commit_transaction(trans, 
root-fs_info-extent_root);
+   if (ret)
+   goto fail;
+   goto again;
+   }
+
+   root-snapshot_trans = trans-transid;
+
+   spin_unlock(root-fs_info-trans_lock);
+
+
ret = btrfs_snap_reserve_metadata(trans, pending_snapshot);
BUG_ON(ret);
 
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 27c2600..4507421 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1093,6 +1093,7 @@ static noinline int create_pending_snapshot(struct 
btrfs_trans_handle *trans,
 
/* see comments in should_cow_block() */
root-force_cow = 1;
+   root-danger_transid = trans-transid;
smp_wmb();
 
btrfs_set_root_node(new_root_item, tmp);
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH] Btrfs: fix full backref problem when inserting shared block reference

2012-08-08 Thread Miao Xie
If we create several snapshots at the same time, the following BUG_ON() will be
triggered.

kernel BUG at fs/btrfs/extent-tree.c:6047!

Steps to reproduce:
 # mkfs.btrfs partition
 # mount partition mnt
 # cd mnt
 # for ((i=0;i2400;i++)); do touch long_name_to_make_tree_more_deep$i; done
 # for ((i=0; i4; i++))
  do
  mkdir $i
  for ((j=0; j200; j++))
  do
  btrfs sub snap . $i/$j
  done 
  done

The reason is:
Before transaction commit, some operations changed the fs tree and new tree
blocks were allocated because of COW. We used the implicit non-shared back
reference for those newly allocated tree blocks because they were not shared by
two or more trees.

And then we created the first snapshot for the fs tree, according to the back
reference rules, we also used implicit back refs for the child tree blocks of
the root node of the fs tree, now those child nodes/leaves were shared by two
trees.

Then We didn't deal with the delayed references, and continued to change the fs
tree(created the second snapshot and inserted the dir item of the new snapshot
into the fs tree). According to the rules of the back reference, we added full
back refs for those tree blocks whose parents have be shared by two trees.
Now some newly allocated tree blocks had two types of the references.

As we know, the delayed reference system handles these delayed references from
back to front, and the full delayed reference is inserted after the implicit
ones. So when we dealt with the back references of those newly allocated tree
blocks, the full references was dealt with at first. And if the first reference
is a shared back reference and the tree block that the reference points to is
newly allocated, It would be considered as a tree block which is shared by two
or more trees when it is allocated and should be a full back reference not a
implicit one, the flag of its reference also should be set to FULL_BACKREF.
But in fact, it was a non-shared tree block with a implicit reference at
beginning, so it was not compulsory to set the flags to FULL_BACKREF. So BUG_ON
was triggered.

We have several methods to fix this bug:
1. deal with delayed references after the snapshot is created and before we
   change the source tree of the snapshot. This is the easiest and safest way.
2. modify the sort method of the delayed reference tree, make the full delayed
   references be inserted before the implicit ones. It is also very easy, but
   I don't know if it will introduce some problems or not.
3. modify select_delayed_ref() and make it select the implicit delayed reference
   at first. This way is not so good because it may wastes CPU time if we have
   lots of delayed references.
4. set the flags to FULL_BACKREF, this method is a little complex comparing with
   the 1st way. 

I chose the 1st way to fix it.

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
 fs/btrfs/transaction.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 7ac7cdc..2eafbd2 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1119,6 +1119,11 @@ static noinline int create_pending_snapshot(struct 
btrfs_trans_handle *trans,
ret = btrfs_reloc_post_snapshot(trans, pending);
if (ret)
goto abort_trans;
+
+   ret = btrfs_run_delayed_refs(trans, root, (unsigned long)-1);
+   if (ret)
+   goto abort_trans;
+
ret = 0;
 fail:
kfree(new_root_item);
-- 
1.7.6.5
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html