Re: [RFC PATCH 4/4] btrfs: implement delayed dir index insertion and deletion

2010-12-05 Thread Miao Xie

On Thu, 02 Dec 2010 11:33:40 -0500, Chris Mason wrote:

Excerpts from Miao Xie's message of 2010-12-01 03:09:35 -0500:

Compared with Ext3/4, the performance of file creation and deletion on btrfs
is very poor. The reason is that btrfs must do a lot of b+ tree insertions,
such as the inode item, the directory name item, the directory name index and
so on.

If we can delay some of the b+ tree insertions and deletions, we can improve
the performance, so we made this patch, which implements delayed directory
name index insertion and deletion.


Many thanks for working on this.  It's a difficult problem and these
patches look very clean.

I think you can get more improvement if you also do this delayed scheme
for the inode items themselves.


I think I can try to do it later.


The hard part of these delayed implementations is always the throttling,

+if (delayed_root->count >= root->leafsize / sizeof(*dir_item))
+	btrfs_run_delayed_dir_index(trans, root, NULL,
+				    BTRFS_DELAYED_INSERT_ITEM, 0);
+

Have you experimented with other values here?


Yes, I have tried setting it to other values (such as 40 and 100), and also
removing this check.
Though the performance gets better as the number increases, the reserved space
that the fs needs becomes larger; if the fs space is not enough, btrfs will
call shrink_delalloc() and the performance drops rapidly.

Maybe we can use a background worker thread to do the b+ tree insertions and
deletions, implemented just like the dirty page writeback mechanism.


I need to take a hard look at the locking and do some benchmarking on
larger machines.  I'm a little worried about increased lock contention,
but I think we can get around it by breaking up the rbtrees a little
later on if we need to.


Yes, we can create an rb-tree for every directory.

Thanks
Miao
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 4/4] btrfs: implement delayed dir index insertion and deletion

2010-12-02 Thread Chris Mason
Excerpts from Miao Xie's message of 2010-12-01 03:09:35 -0500:
 Compared with Ext3/4, the performance of file creation and deletion on btrfs
 is very poor. The reason is that btrfs must do a lot of b+ tree insertions,
 such as the inode item, the directory name item, the directory name index and
 so on.
 
 If we can delay some of the b+ tree insertions and deletions, we can improve
 the performance, so we made this patch, which implements delayed directory
 name index insertion and deletion.

Many thanks for working on this.  It's a difficult problem and these
patches look very clean.

I think you can get more improvement if you also do this delayed scheme
for the inode items themselves.

The hard part of these delayed implementations is always the throttling,

+if (delayed_root->count >= root->leafsize / sizeof(*dir_item))
+	btrfs_run_delayed_dir_index(trans, root, NULL,
+				    BTRFS_DELAYED_INSERT_ITEM, 0);
+

Have you experimented with other values here?

I need to take a hard look at the locking and do some benchmarking on
larger machines.  I'm a little worried about increased lock contention,
but I think we can get around it by breaking up the rbtrees a little
later on if we need to.

-chris
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 4/4] btrfs: implement delayed dir index insertion and deletion

2010-12-01 Thread Miao Xie
Compared with Ext3/4, the performance of file creation and deletion on btrfs
is very poor. The reason is that btrfs must do a lot of b+ tree insertions,
such as the inode item, the directory name item, the directory name index and
so on.

If we can delay some of the b+ tree insertions and deletions, we can improve
the performance, so we made this patch, which implements delayed directory
name index insertion and deletion.

Implementation:
- Every transaction has two rb-trees: one manages the directory name indexes
  that are going to be inserted into the b+ tree, and the other manages the
  directory name indexes that are going to be deleted from it.
- When we want to insert a directory name index into the b+ tree, we just add
  the information to the insertion rb-tree.
  When the number of directory name indexes reaches the upper limit (the max
  number of directory name indexes that can be stored in one leaf), we start
  the insertion work and insert those directory name indexes into the b+ tree.
- When we want to delete a directory name index from the b+ tree, we first
  search for it in the insertion rb-tree. If we find it, we just drop it. If
  not, we add its key to the deletion rb-tree.
  Similar to the insertion rb-tree, when the number of directory name indexes
  reaches the upper limit, we start the deletion work and delete those
  directory name indexes from the b+ tree.
- When we commit the transaction, we run both the insertion and the deletion
  work, and insert/delete all the directory name indexes in the rb-trees
  into/from the b+ tree.
- When we want to read directory entries, we first run the deletion work and
  delete all the pending directory name indexes of that directory from the
  b+ tree. Then we read the directory entries in the b+ tree, and finally the
  directory entries that are still in the insertion rb-tree.

I did a quick test with the benchmark tool [1] and found that we can improve
the performance of file creation by ~9%, and that of file deletion by ~13%.

Before applying this patch:
Create files:
Total files: 5
Total time: 1.188547
Average time: 0.24
Delete files:
Total files: 5
Total time: 1.662012
Average time: 0.33

After applying this patch:
Create files:
Total files: 5
Total time: 1.083526
Average time: 0.22
Delete files:
Total files: 5
Total time: 1.439360
Average time: 0.29

[1] http://marc.info/?l=linux-btrfs&m=128212635122920&q=p3

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
 fs/btrfs/Makefile|2 +-
 fs/btrfs/btrfs_inode.h   |2 +
 fs/btrfs/ctree.c |   13 +-
 fs/btrfs/ctree.h |   15 +-
 fs/btrfs/delayed-dir-index.c |  790 ++
 fs/btrfs/delayed-dir-index.h |   92 +
 fs/btrfs/dir-item.c  |   24 +-
 fs/btrfs/extent-tree.c   |   21 ++
 fs/btrfs/inode.c |   80 +++--
 fs/btrfs/transaction.c   |9 +
 fs/btrfs/transaction.h   |2 +
 11 files changed, 995 insertions(+), 55 deletions(-)
 create mode 100644 fs/btrfs/delayed-dir-index.c
 create mode 100644 fs/btrfs/delayed-dir-index.h

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index a35eb36..1f7696a 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -7,4 +7,4 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
   extent_map.o sysfs.o struct-funcs.o xattr.o ordered-data.o \
   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
   export.o tree-log.o acl.o free-space-cache.o zlib.o \
-  compression.o delayed-ref.o relocation.o
+  compression.o delayed-ref.o relocation.o delayed-dir-index.o
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 6ad63f1..3d03a17 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -162,6 +162,8 @@ struct btrfs_inode {
struct inode vfs_inode;
 };
 
+extern unsigned char btrfs_filetype_table[];
+
 static inline struct btrfs_inode *BTRFS_I(struct inode *inode)
 {
return container_of(inode, struct btrfs_inode, vfs_inode);
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 9ac1715..08f4339 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -38,10 +38,6 @@ static int balance_node_right(struct btrfs_trans_handle *trans,
  struct extent_buffer *src_buf);
 static int del_ptr(struct btrfs_trans_handle *trans, struct btrfs_root *root,
   struct btrfs_path *path, int level, int slot);
-static int setup_items_for_insert(struct btrfs_trans_handle *trans,
-   struct btrfs_root *root, struct btrfs_path *path,
-   struct btrfs_key *cpu_key, u32 *data_size,
-   u32 total_data, u32 total_size, int nr);
 
 
 struct btrfs_path *btrfs_alloc_path(void)
@@ -3680,11 +3676,10 @@ out:
  * to save stack depth by doing the bulk of the