Re: [PATCH 1/9] Btrfs: introduce sub transaction stuff

2011-05-24 Thread liubo
On 05/24/2011 11:56 PM, liubo wrote:
  The problems I hit:
  
  When an inode is dropped from cache (just via iput) and then read in
  again, the BTRFS_I(inode)-logged_trans goes back to zero.  When this
  happens the logging code assumes the inode isn't in the log and hits
  -EEXIST if it finds inode items.
  
 
 ok, I just find where the problem addresses.  This is because I've put
 a check between logged_trans and transaction_id, which is inclined to
 filter those that are first logged, and I'm sorry for not taking the
 'iput' stuff into consideration.  And it's easy to fix this, as we
 can just kick this check off and put another check while searching
 'BTRFS_INODE_ITEM_KEY', since if we cannot find a inode item in a tree,
 it proves that this inode is definitely not in the tree.
 
 So I'd like to make some changes like this patch(_UNTEST_):

I've thought of this problem more and came up with a _better and more 
efficient_ patch.
It will always get BTRFS_I(inode)-logged_trans correct value.

But I'm still trying to test it somehow... :P

Here it is:

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 40f6f8f..d22b3bf 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1769,12 +1769,9 @@ static int btrfs_finish_ordered_io(struct inode *inode, 
u64 start, u64 end)
add_pending_csums(trans, inode, ordered_extent-file_offset,
  ordered_extent-list);
 
-   ret = btrfs_ordered_update_i_size(inode, 0, ordered_extent);
-   if (!ret) {
-   ret = btrfs_update_inode(trans, root, inode);
-   BUG_ON(ret);
-   } else
-   btrfs_set_inode_last_trans(trans, inode);
+   btrfs_ordered_update_i_size(inode, 0, ordered_extent);
+   ret = btrfs_update_inode(trans, root, inode);
+   BUG_ON(ret);
ret = 0;
 out:
if (nolock) {
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 912397c..92fe5dd 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3032,6 +3032,37 @@ out:
return ret;
 }
 
+static int check_logged_trans(struct btrfs_trans_handle *trans,
+ struct btrfs_root *root, struct inode *inode)
+{
+   struct btrfs_inode_item *inode_item;
+   struct btrfs_path *path;
+   int ret;
+
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
+
+   ret = btrfs_search_slot(trans, root,
+   BTRFS_I(inode)-location, path, 0, 0);
+   if (ret) {
+   if (ret  0)
+   ret = 0;
+   goto out;
+   }
+
+   btrfs_unlock_up_safe(path, 1);
+   inode_item = btrfs_item_ptr(path-nodes[0], path-slots[0],
+   struct btrfs_inode_item);
+
+   BTRFS_I(inode)-logged_trans = btrfs_inode_transid(path-nodes[0],
+  inode_item);
+out:
+   btrfs_free_path(path);
+   return ret;
+}
+
+
 static int inode_in_log(struct btrfs_trans_handle *trans,
 struct inode *inode)
 {
@@ -3084,6 +3115,18 @@ int btrfs_log_inode_parent(struct btrfs_trans_handle 
*trans,
if (ret)
goto end_no_trans;
 
+   /*
+* After we iput a inode and reread it from disk, logged_trans is 0.
+* However, this inode _may_ still remain in log tree and not be
+* committed yet.
+* So we need to check the log tree to get logged_trans a right value.
+*/
+   if (!BTRFS_I(inode)-logged_trans  root-log_root) {
+   ret = check_logged_trans(trans, root-log_root, inode);
+   if (ret)
+   goto end_no_trans;
+   }
+
if (inode_in_log(trans, inode)) {
ret = BTRFS_NO_LOG_SYNC;
goto end_no_trans;


thanks,
liubo
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/9] Btrfs: introduce sub transaction stuff

2011-05-24 Thread Chris Mason
Excerpts from liubo's message of 2011-05-25 06:21:04 -0400:
 On 05/24/2011 11:56 PM, liubo wrote:
   The problems I hit:
   
   When an inode is dropped from cache (just via iput) and then read in
   again, the BTRFS_I(inode)-logged_trans goes back to zero.  When this
   happens the logging code assumes the inode isn't in the log and hits
   -EEXIST if it finds inode items.
   
  
  ok, I just find where the problem addresses.  This is because I've put
  a check between logged_trans and transaction_id, which is inclined to
  filter those that are first logged, and I'm sorry for not taking the
  'iput' stuff into consideration.  And it's easy to fix this, as we
  can just kick this check off and put another check while searching
  'BTRFS_INODE_ITEM_KEY', since if we cannot find a inode item in a tree,
  it proves that this inode is definitely not in the tree.
  
  So I'd like to make some changes like this patch(_UNTEST_):
 
 I've thought of this problem more and came up with a _better and more 
 efficient_ patch.
 It will always get BTRFS_I(inode)-logged_trans correct value.

Thanks, this makes sense.

 
 But I'm still trying to test it somehow... :P

http://oss.oracle.com/~mason/synctest/synctest.c

I used synctest -F -f -u -n 100 -t 32 .

-chris
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/9] Btrfs: introduce sub transaction stuff

2011-05-24 Thread liubo
On 05/24/2011 11:56 PM, liubo wrote:
  
  Second, we use the generation number of the super to read in the log
  tree root after a crash.  This doesn't always match the sub trans id and
  so it doesn't always match the transid stored in the btree blocks.
  
  There are a few solutions to this, we can use some of the reserved
  fields in the super for the generation numbers of the roots the super
  points to, and use whichever one is bigger when we read things in.
  
 
 All right, I'm going to dig it more.
 

I've got this resolved via 'log_root_transid' of 'struct btrfs_super_block',
and it looks nice on both syntactic and functional side. :)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index ac8d2ac..1006898 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2103,6 +2103,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
if (btrfs_super_log_root(disk_super) != 0 
!(fs_info-fs_state  BTRFS_SUPER_FLAG_ERROR)) {
u64 bytenr = btrfs_super_log_root(disk_super);
+   u64 log_root_transid = btrfs_super_log_root_transid(disk_super);
 
if (fs_devices-rw_devices == 0) {
printk(KERN_WARNING Btrfs log replay required 
@@ -2125,7 +2126,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 
log_tree_root-node = read_tree_block(tree_root, bytenr,
  blocksize,
- generation + 1);
+ log_root_transid);
ret = btrfs_recover_log_trees(log_tree_root);
BUG_ON(ret);
 
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 912397c..b304ec1 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2089,6 +2089,8 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
log_root_tree-node-start);
btrfs_set_super_log_root_level(root-fs_info-super_for_commit,
btrfs_header_level(log_root_tree-node));
+   btrfs_set_super_log_root_transid(root-fs_info-super_for_commit,
+trans-transid);
 
log_root_tree-log_batch = 0;
log_root_tree-log_transid++;

thanks,
liubo
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/9] Btrfs: introduce sub transaction stuff

2011-05-23 Thread Chris Mason
Excerpts from Chris Mason's message of 2011-05-19 20:23:29 -0400:
 Excerpts from Liu Bo's message of 2011-05-19 04:11:24 -0400:
  Introduce a new concept sub transaction,
  the relation between transaction and sub transaction is
  
  transaction A   --- transid = x
 sub trans a(1)   --- sub_transid = x+1
 sub trans a(2)   --- sub_transid = x+2
   ... ...
 sub trans a(n-1) --- sub_transid = x+n-1
 sub trans a(n)   --- sub_transid = x+n
  transaction B   --- transid = x+n+1
   ... ...
  
  And the most important is
  a) a trans handler's transid now gets value from sub transid instead of 
  transid.
  b) when a transaction commits, transid may not added by 1, but depend on the
 biggest sub_transaction of the last neighbour transaction,
 i.e.
  B-transid = a(n)-transid + 1,
  (B-transid - A-transid) = 1
  c) we start a new sub transaction after a fsync.
  
  We also ship some 'trans-transid' to 'trans-transaction-transid' to
  ensure btrfs works well and to get rid of WARNings.
  
  These are used for the new log code.
 
 This is exactly what I had in mind.  I need to read it harder and make
 sure it interacts well with the directory logging code, but I love it.

Ok, I hit a few problems with this, and since the transids are used
everywhere for various reasons, I think we need to wait until 2.6.41.
This code is really very close to right, but we have the delayed inode
work, scrub, and the new inode number allocator all at once.  I'd like
to limit the size of the changes.

The problems I hit:

When an inode is dropped from cache (just via iput) and then read in
again, the BTRFS_I(inode)-logged_trans goes back to zero.  When this
happens the logging code assumes the inode isn't in the log and hits
-EEXIST if it finds inode items.

I patched it to just delete away all the logged items if the logged
transid wasn't set, which is probably safest given that we can now reuse
inode numbers.

Second, we use the generation number of the super to read in the log
tree root after a crash.  This doesn't always match the sub trans id and
so it doesn't always match the transid stored in the btree blocks.

There are a few solutions to this, we can use some of the reserved
fields in the super for the generation numbers of the roots the super
points to, and use whichever one is bigger when we read things in.

Liubo, since we'll leave this one for .41, I'll take your smaller patch
that just skips the csum items.

-chris
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/9] Btrfs: introduce sub transaction stuff

2011-05-23 Thread liubo
On 05/23/2011 10:40 AM, Chris Mason wrote:
 Excerpts from Chris Mason's message of 2011-05-19 20:23:29 -0400:
 Excerpts from Liu Bo's message of 2011-05-19 04:11:24 -0400:
 Introduce a new concept sub transaction,
 the relation between transaction and sub transaction is

 transaction A   --- transid = x
sub trans a(1)   --- sub_transid = x+1
sub trans a(2)   --- sub_transid = x+2
  ... ...
sub trans a(n-1) --- sub_transid = x+n-1
sub trans a(n)   --- sub_transid = x+n
 transaction B   --- transid = x+n+1
  ... ...

 And the most important is
 a) a trans handler's transid now gets value from sub transid instead of 
 transid.
 b) when a transaction commits, transid may not added by 1, but depend on the
biggest sub_transaction of the last neighbour transaction,
i.e.
 B-transid = a(n)-transid + 1,
 (B-transid - A-transid) = 1
 c) we start a new sub transaction after a fsync.

 We also ship some 'trans-transid' to 'trans-transaction-transid' to
 ensure btrfs works well and to get rid of WARNings.

 These are used for the new log code.
 This is exactly what I had in mind.  I need to read it harder and make
 sure it interacts well with the directory logging code, but I love it.
 
 Ok, I hit a few problems with this, and since the transids are used
 everywhere for various reasons, I think we need to wait until 2.6.41.
 This code is really very close to right, but we have the delayed inode
 work, scrub, and the new inode number allocator all at once.  I'd like
 to limit the size of the changes.
 

I agree with this, in fact, I'm a litter worried cause it is such an
important role that the transids are playing in btrfs, which means
to change it is dangerous, so it deserves more test.

 The problems I hit:
 
 When an inode is dropped from cache (just via iput) and then read in
 again, the BTRFS_I(inode)-logged_trans goes back to zero.  When this
 happens the logging code assumes the inode isn't in the log and hits
 -EEXIST if it finds inode items.
 

ok, I just find where the problem addresses.  This is because I've put
a check between logged_trans and transaction_id, which is inclined to
filter those that are first logged, and I'm sorry for not taking the
'iput' stuff into consideration.  And it's easy to fix this, as we
can just kick this check off and put another check while searching
'BTRFS_INODE_ITEM_KEY', since if we cannot find a inode item in a tree,
it proves that this inode is definitely not in the tree.

So I'd like to make some changes like this patch(_UNTEST_):

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 912397c..69ddbbd 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2569,10 +2569,6 @@ static int prepare_for_merge_items(struct 
btrfs_trans_handle *trans,
int i;
int ret;
 
-   /* There are no relative items of the inode in log. */
-   if (BTRFS_I(inode)-logged_trans  trans-transaction-transid)
-   return 0;
-
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
@@ -2622,6 +2618,11 @@ static int prepare_for_merge_items(struct 
btrfs_trans_handle *trans,
 
if (ret  0) {
btrfs_release_path(log, path);
+
+   /* There are no relative items of the inode in log. */
+   if (key.type == BTRFS_INODE_ITEM_KEY)
+   break;
+
continue;
}


 I patched it to just delete away all the logged items if the logged
 transid wasn't set, which is probably safest given that we can now reuse
 inode numbers.
 
 Second, we use the generation number of the super to read in the log
 tree root after a crash.  This doesn't always match the sub trans id and
 so it doesn't always match the transid stored in the btree blocks.
 
 There are a few solutions to this, we can use some of the reserved
 fields in the super for the generation numbers of the roots the super
 points to, and use whichever one is bigger when we read things in.
 

All right, I'm going to dig it more.

 Liubo, since we'll leave this one for .41, I'll take your smaller patch
 that just skips the csum items.
 

ok, I see.  Thank a lot for the review. :)

thanks,
liubo

 -chris
 

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/9] Btrfs: introduce sub transaction stuff

2011-05-19 Thread liubo
On 05/20/2011 08:23 AM, Chris Mason wrote:
 Excerpts from Liu Bo's message of 2011-05-19 04:11:24 -0400:
 Introduce a new concept sub transaction,
 the relation between transaction and sub transaction is

 transaction A   --- transid = x
sub trans a(1)   --- sub_transid = x+1
sub trans a(2)   --- sub_transid = x+2
  ... ...
sub trans a(n-1) --- sub_transid = x+n-1
sub trans a(n)   --- sub_transid = x+n
 transaction B   --- transid = x+n+1
  ... ...

 And the most important is
 a) a trans handler's transid now gets value from sub transid instead of 
 transid.
 b) when a transaction commits, transid may not added by 1, but depend on the
biggest sub_transaction of the last neighbour transaction,
i.e.
 B-transid = a(n)-transid + 1,
 (B-transid - A-transid) = 1
 c) we start a new sub transaction after a fsync.

 We also ship some 'trans-transid' to 'trans-transaction-transid' to
 ensure btrfs works well and to get rid of WARNings.

 These are used for the new log code.
 
 This is exactly what I had in mind.  I need to read it harder and make
 sure it interacts well with the directory logging code, but I love it.
 
 Thanks!
 

It's so great that you like it.  :)

But I must NOTE again:
   Due to the bug which patch 8 fixed, the previous preformance statistics I 
posted sometime ago, 
   like (*SPEED* : 4.7+ Mb/sec), are valueless and cannot be used as a basis 
any more...

Hope that more people can get the patchset tested.

thanks,
liubo

 -chris
 

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html