Re: btrfs hang in flush-btrfs-5
On 07/11/2011 05:21 PM, Jeremy Sanders wrote:
> Josef Bacik wrote:
>
>> On 07/11/2011 07:40 AM, Jeremy Sanders wrote:
>>> Jeremy Sanders wrote:
>>>
>>>> Hi - I'm trying btrfs with kernel 2.6.38.8-32.fc15.x86_64 (a Fedora
>>>> kernel). I'm just doing a tar-to-tar copy onto the file system with
>>>> compress-force=zlib. Here are some traces of the stuck processes.
>>>
>>> I've managed to reproduce the hang using the latest btrfs from the
>>> repository. I had to remove some of the tracing lines to get it to
>>> compile under 2.6.38.8 and an ioctl which wasn't defined.
>>>
>>> Here is where it is stuck:
>>
>> Hrm well that is just unlikely and hard to hit.  Will you try this and
>> see if it helps you?  Thanks,
>
> It's got quite a bit further than where it got before and hasn't
> crashed yet. I will let you know when it has finished ok.
>
> I see that the btrfs-delalloc (rather than endio-write) thread is taking
> up 100% of CPU and the write speed seems to have dropped during the
> copying, however. The copy started with using endio-write fully on both
> cores and now is using delalloc a lot.

When you see that can you get sysrq+w or sysrq+t to get a stacktrace of
what it's doing so I can see if it's something that can be fixed.  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Delayed inode operations not doing the right thing with enospc
On 07/12/2011 11:20 AM, Christian Brunner wrote:
> 2011/6/7 Josef Bacik <jo...@redhat.com>:
>> On 06/06/2011 09:39 PM, Miao Xie wrote:
>>> On fri, 03 Jun 2011 14:46:10 -0400, Josef Bacik wrote:
>>>> I got a lot of these when running stress.sh on my test box
>>>>
>>>> This is because use_block_rsv() is having to do a
>>>> reserve_metadata_bytes(), which shouldn't happen as we should have
>>>> reserved enough space for those operations to complete.  This is
>>>> happening because use_block_rsv() will call get_block_rsv(), which if
>>>> root->ref_cows is set (which is the case on all fs roots) we will use
>>>> trans->block_rsv, which will only have what the current transaction
>>>> starter had reserved.
>>>>
>>>> What needs to be done instead is we need to have a block reserve that
>>>> any reservation that is done at create time for these inodes is
>>>> migrated to this special reserve, and then when you run the delayed
>>>> inode items stuff you set trans->block_rsv to the special block
>>>> reserve so the accounting is all done properly.
>>>>
>>>> This is just off the top of my head, there may be a better way to do
>>>> it, I've not actually looked at the delayed inode code at all.
>>>>
>>>> I would do this myself but I have an ever increasing list of shit to
>>>> do so will somebody pick this up and fix it please?  Thanks,
>>>
>>> Sorry, it's my miss.
>>> I forgot to set trans->block_rsv to global_block_rsv, since we have
>>> migrated the space from trans_block_rsv to global_block_rsv.
>>>
>>> I'll fix it soon.
>>
>> There is another problem, we're failing xfstest 204.  I tried making
>> reserve_metadata_bytes commit the transaction regardless of whether or
>> not there were pinned bytes but the test just hung there.  Usually it
>> takes 7 seconds to run and I ctrl+c'ed it after a couple of minutes.
>> 204 just creates a crap ton of files, which is what is killing us.
>> There needs to be a way to start flushing delayed inode items so we can
>> reclaim the space they are holding onto so we don't get enospc, and it
>> needs to be better than just committing the transaction because that is
>> dog slow.  Thanks,
>>
>> Josef
>
> Is there a solution for this?
>
> I'm running a 2.6.38.8 kernel with all the btrfs patches from 3.0rc7
> (except the plugging). When starting a ceph rebuild on the btrfs
> volumes I get a lot of warnings from block_rsv_use_bytes in
> use_block_rsv:

Ok I think I've got this nailed down.  Will you run with this patch and
make sure the warnings go away?  Thanks,

Josef

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 52d7eca..2263d29 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -112,9 +112,6 @@ struct btrfs_inode {
 	 */
 	u64 disk_i_size;
 
-	/* flags field from the on disk inode */
-	u32 flags;
-
 	/*
 	 * if this is a directory then index_cnt is the counter for the index
 	 * number for new files that are created
@@ -128,14 +125,8 @@ struct btrfs_inode {
 	 */
 	u64 last_unlink_trans;
 
-	/*
-	 * Counters to keep track of the number of extent item's we may use due
-	 * to delalloc and such.  outstanding_extents is the number of extent
-	 * items we think we'll end up using, and reserved_extents is the number
-	 * of extent items we've reserved metadata for.
-	 */
-	atomic_t outstanding_extents;
-	atomic_t reserved_extents;
+	/* flags field from the on disk inode */
+	u32 flags;
 
 	/*
 	 * ordered_data_close is set by truncate when a file that used
@@ -151,12 +142,21 @@ struct btrfs_inode {
 	unsigned orphan_meta_reserved:1;
 	unsigned dummy_inode:1;
 	unsigned in_defrag:1;
-
 	/*
 	 * always compress this one file
 	 */
 	unsigned force_compress:4;
 
+	/*
+	 * Counters to keep track of the number of extent item's we may use due
+	 * to delalloc and such.  outstanding_extents is the number of extent
+	 * items we think we'll end up using, and reserved_extents is the number
+	 * of extent items we've reserved metadata for.
+	 */
+	spinlock_t extents_count_lock;
+	unsigned outstanding_extents;
+	unsigned reserved_extents;
+
 	struct btrfs_delayed_node *delayed_node;
 
 	struct inode vfs_inode;
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index be02cae..3ba4d5f 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2133,7 +2133,7 @@ static inline bool btrfs_mixed_space_info(struct btrfs_space_info *space_info)
 
 /* extent-tree.c */
 static inline u64 btrfs_calc_trans_metadata_size(struct btrfs_root *root,
-						 int num_items)
+						 unsigned num_items)
 {
 	return (root->leafsize + root->nodesize * (BTRFS_MAX_LEVEL - 1)) *
 		3 * num_items;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3e52b85..65a721c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3952,13 +3952,35 @@ static u64
[PATCH] Btrfs: don't print the leaf if we had an error
In __btrfs_free_extent we will print the leaf if we fail to find the extent we
wanted, but the problem is if we get an error we won't have a leaf so often this
leads to a NULL pointer dereference and we lose the error that actually
occurred.  So only print the leaf if ret > 0, which means we didn't find the
item we were looking for but we didn't error either.  This way the error is
preserved.

Signed-off-by: Josef Bacik <jo...@redhat.com>
---
 fs/btrfs/extent-tree.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3e52b85..152669b 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4447,7 +4447,9 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 			printk(KERN_ERR "umm, got %d back from search"
 			       ", was looking for %llu\n", ret,
 			       (unsigned long long)bytenr);
-			btrfs_print_leaf(extent_root, path->nodes[0]);
+			if (ret > 0)
+				btrfs_print_leaf(extent_root,
+						 path->nodes[0]);
 		}
 		BUG_ON(ret);
 		extent_slot = path->slots[0];
--
1.7.5.2
[PATCH 00/16] Btrfs: fixes and cleanups for 3.1
The first 4 patches are bug-fixes, and the remaining are small cleanups
that have sat in my git tree for some time. The first 3 patches have been
sent to the list before.

We save some bytes after this patchset:

   text    data     bss     dec     hex filename
 426387    3854    1024  431265   694a1 fs/btrfs/btrfs.o.orig
 425651    3854    1024  430529   691c1 fs/btrfs/btrfs.o

I've run xfstests for testing.

===

Li Zefan (12):
  Btrfs: copy string correctly in INO_LOOKUP ioctl
  Btrfs: fix space leak when skipping small extents during trimming
  Btrfs: fix space leak when trimming free extents
  Btrfs: check the nodatasum flag when writing compressed files
  Btrfs: use wait_event()
  Btrfs: remove a BUG_ON() in btrfs_commit_transaction()
  Btrfs: remove remaining ref-cache code
  Btrfs: make acl functions really no-op if acl is not enabled
  Btrfs: remove redundant code for dir item lookup
  Btrfs: clean up search_extent_mapping()
  Btrfs: clean up code for extent_map lookup
  Btrfs: clean up code for merging extent maps

Xiao Guangrong (4):
  Btrfs: remove unused members from struct extent_state
  Btrfs: clean up for insert_state()
  Btrfs: clean up for wait_extent_bit()
  Btrfs: clean up for find_first_extent_bit()

 fs/btrfs/Makefile           |    4 +-
 fs/btrfs/acl.c              |   17 -
 fs/btrfs/compression.c      |   14 +++-
 fs/btrfs/ctree.h            |   15 -
 fs/btrfs/dir-item.c         |   30 +
 fs/btrfs/extent_io.c        |   80 --
 fs/btrfs/extent_io.h        |    2 -
 fs/btrfs/extent_map.c       |  155 ++-
 fs/btrfs/free-space-cache.c |   43 +++-
 fs/btrfs/ioctl.c            |    3 +-
 fs/btrfs/ref-cache.c        |   68 ---
 fs/btrfs/ref-cache.h        |   52 --
 fs/btrfs/transaction.c      |   65 +++
 13 files changed, 142 insertions(+), 406 deletions(-)
 delete mode 100644 fs/btrfs/ref-cache.c
 delete mode 100644 fs/btrfs/ref-cache.h
[PATCH 01/16] Btrfs: copy string correctly in INO_LOOKUP ioctl
Memory areas [ptr, ptr+total_len] and [name, name+total_len]
may overlap, so it's wrong to use memcpy().

Signed-off-by: Li Zefan <l...@cn.fujitsu.com>
---
 fs/btrfs/ioctl.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a3c4751..08a4580 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1755,11 +1755,10 @@ static noinline int btrfs_search_path_in_tree(struct btrfs_fs_info *info,
 		key.objectid = key.offset;
 		key.offset = (u64)-1;
 		dirid = key.objectid;
-
 	}
 	if (ptr < name)
 		goto out;
-	memcpy(name, ptr, total_len);
+	memmove(name, ptr, total_len);
 	name[total_len]='\0';
 	ret = 0;
 out:
--
1.7.3.1
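For readers following the archive, here is a minimal userspace sketch of the memcpy()/memmove() distinction this patch relies on. It is not the kernel code: the helper name shift_to_front() and the buffer layout are assumptions made up for illustration, loosely modelled on a string that is built at the tail of a buffer and then moved to its start.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): move the `len` bytes that
 * start at buf + off down to the start of buf and NUL-terminate them.
 *
 * Source and destination overlap whenever off < len, so memmove() is
 * required; memcpy() has undefined behaviour for overlapping regions.
 * The caller must ensure that off + len and len + 1 both fit in buf.
 */
char *shift_to_front(char *buf, size_t off, size_t len)
{
	memmove(buf, buf + off, len);
	buf[len] = '\0';
	return buf;
}
```

With a 16-byte buffer holding "xxabcdefgh", calling shift_to_front(buf, 2, 8) copies between overlapping regions, which is exactly the case where memcpy() gives no guarantees.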
[PATCH 02/16] Btrfs: fix space leak when skipping small extents during trimming
We're taking a free space extent out of the free space cache, trimming
it and then putting it back into the cache.

However for an extent that is smaller than the specified minimum length,
it's taken out but won't be put back, which causes space leak.

Signed-off-by: Li Zefan <l...@cn.fujitsu.com>
---
 fs/btrfs/free-space-cache.c |   34 +-
 1 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index bf0d615..901585d 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2463,6 +2463,7 @@ int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group,
 	u64 bytes = 0;
 	u64 actually_trimmed;
 	int ret = 0;
+	int update_ret;
 
 	*trimmed = 0;
 
@@ -2486,6 +2487,7 @@ int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group,
 		}
 
 		if (entry->bitmap) {
+			bytes = 0;
 			ret = search_bitmap(ctl, entry, &start, &bytes);
 			if (!ret) {
 				if (start >= end) {
@@ -2493,6 +2495,8 @@ int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group,
 					break;
 				}
 				bytes = min(bytes, end - start);
+				if (bytes < minlen)
+					goto next;
 				bitmap_clear_bits(ctl, entry, start, bytes);
 				if (entry->bytes == 0)
 					free_bitmap(ctl, entry);
@@ -2506,33 +2510,29 @@ int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group,
 		} else {
 			start = entry->offset;
 			bytes = min(entry->bytes, end - start);
+			if (bytes < minlen)
+				goto next;
 			unlink_free_space(ctl, entry);
 			kmem_cache_free(btrfs_free_space_cachep, entry);
 		}
 
 		spin_unlock(&ctl->tree_lock);
 
-		if (bytes >= minlen) {
-			int update_ret;
-			update_ret = btrfs_update_reserved_bytes(block_group,
-								 bytes, 1, 1);
+		update_ret = btrfs_update_reserved_bytes(block_group,
+							 bytes, 1, 1);
 
-			ret = btrfs_error_discard_extent(fs_info->extent_root,
-							 start,
-							 bytes,
-							 &actually_trimmed);
+		ret = btrfs_error_discard_extent(fs_info->extent_root, start,
+						 bytes, &actually_trimmed);
 
-			btrfs_add_free_space(block_group, start, bytes);
-			if (!update_ret)
-				btrfs_update_reserved_bytes(block_group,
-							    bytes, 0, 1);
+		btrfs_add_free_space(block_group, start, bytes);
+		if (!update_ret)
+			btrfs_update_reserved_bytes(block_group, bytes, 0, 1);
 
-			if (ret)
-				break;
-			*trimmed += actually_trimmed;
-		}
+		if (ret)
+			break;
+		*trimmed += actually_trimmed;
+next:
 		start += bytes;
-		bytes = 0;
 
 		if (fatal_signal_pending(current)) {
 			ret = -ERESTARTSYS;
--
1.7.3.1
[PATCH 03/16] Btrfs: fix space leak when trimming free extents
When the end of an extent exceeds the end of the specified range,
the extent will be accidentally truncated.

Signed-off-by: Li Zefan <l...@cn.fujitsu.com>
---
 fs/btrfs/free-space-cache.c |    9 -
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 901585d..cf4bffd 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2512,8 +2512,15 @@ int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group,
 			bytes = min(entry->bytes, end - start);
 			if (bytes < minlen)
 				goto next;
+
 			unlink_free_space(ctl, entry);
-			kmem_cache_free(btrfs_free_space_cachep, entry);
+			if (bytes < entry->bytes) {
+				entry->offset = entry->offset + bytes;
+				entry->bytes = entry->bytes - bytes;
+				link_free_space(ctl, entry);
+			} else {
+				kmem_cache_free(btrfs_free_space_cachep, entry);
+			}
 		}
 
 		spin_unlock(&ctl->tree_lock);
--
1.7.3.1
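The bookkeeping this fix introduces can be sketched outside the kernel as follows. This is only an illustration under stated assumptions: struct free_extent and take_for_trim() are hypothetical names, not btrfs APIs, and the sketch assumes end >= e->offset. It shows the one idea from the patch: when only part of an extent falls inside the trim range, the remainder stays in the entry instead of being dropped.

```c
#include <stdint.h>

/* Hypothetical stand-in for a free space cache entry. */
struct free_extent {
	uint64_t offset;	/* start of the free extent */
	uint64_t bytes;		/* length of the free extent */
};

/*
 * Take the part of *e that lies before `end` out for trimming and
 * return its length. The entry is advanced past the taken bytes, so a
 * non-zero e->bytes afterwards is the remainder that has to go back
 * into the cache; zero means the whole entry was consumed.
 * Assumes end >= e->offset.
 */
uint64_t take_for_trim(struct free_extent *e, uint64_t end)
{
	uint64_t avail = end - e->offset;
	uint64_t bytes = e->bytes < avail ? e->bytes : avail;

	e->offset += bytes;
	e->bytes -= bytes;
	return bytes;
}
```

In the patch itself the same decision is the `bytes < entry->bytes` test: a remainder is relinked with link_free_space(), otherwise the entry is freed.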
[PATCH 05/16] Btrfs: use wait_event()
Use wait_event() when possible to avoid code duplication.

Signed-off-by: Li Zefan <l...@cn.fujitsu.com>
---
 fs/btrfs/transaction.c |   59 +--
 1 files changed, 7 insertions(+), 52 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 51dcec8..34a30ea 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -216,17 +216,11 @@ static void wait_current_trans(struct btrfs_root *root)
 	spin_lock(&root->fs_info->trans_lock);
 	cur_trans = root->fs_info->running_transaction;
 	if (cur_trans && cur_trans->blocked) {
-		DEFINE_WAIT(wait);
 		atomic_inc(&cur_trans->use_count);
 		spin_unlock(&root->fs_info->trans_lock);
-		while (1) {
-			prepare_to_wait(&root->fs_info->transaction_wait, &wait,
-					TASK_UNINTERRUPTIBLE);
-			if (!cur_trans->blocked)
-				break;
-			schedule();
-		}
-		finish_wait(&root->fs_info->transaction_wait, &wait);
+
+		wait_event(root->fs_info->transaction_wait,
+			   !cur_trans->blocked);
 		put_transaction(cur_trans);
 	} else {
 		spin_unlock(&root->fs_info->trans_lock);
@@ -362,15 +356,7 @@ struct btrfs_trans_handle *btrfs_start_ioctl_transaction(struct btrfs_root *root
 static noinline int wait_for_commit(struct btrfs_root *root,
 				    struct btrfs_transaction *commit)
 {
-	DEFINE_WAIT(wait);
-	while (!commit->commit_done) {
-		prepare_to_wait(&commit->commit_wait, &wait,
-				TASK_UNINTERRUPTIBLE);
-		if (commit->commit_done)
-			break;
-		schedule();
-	}
-	finish_wait(&commit->commit_wait, &wait);
+	wait_event(commit->commit_wait, commit->commit_done);
 	return 0;
 }
 
@@ -1080,22 +1066,7 @@ int btrfs_transaction_blocked(struct btrfs_fs_info *info)
 static void wait_current_trans_commit_start(struct btrfs_root *root,
 					    struct btrfs_transaction *trans)
 {
-	DEFINE_WAIT(wait);
-
-	if (trans->in_commit)
-		return;
-
-	while (1) {
-		prepare_to_wait(&root->fs_info->transaction_blocked_wait, &wait,
-				TASK_UNINTERRUPTIBLE);
-		if (trans->in_commit) {
-			finish_wait(&root->fs_info->transaction_blocked_wait,
-				    &wait);
-			break;
-		}
-		schedule();
-		finish_wait(&root->fs_info->transaction_blocked_wait, &wait);
-	}
+	wait_event(root->fs_info->transaction_blocked_wait, trans->in_commit);
 }
 
 /*
@@ -1105,24 +1076,8 @@ static void wait_current_trans_commit_start(struct btrfs_root *root,
 static void wait_current_trans_commit_start_and_unblock(struct btrfs_root *root,
 					 struct btrfs_transaction *trans)
 {
-	DEFINE_WAIT(wait);
-
-	if (trans->commit_done || (trans->in_commit && !trans->blocked))
-		return;
-
-	while (1) {
-		prepare_to_wait(&root->fs_info->transaction_wait, &wait,
-				TASK_UNINTERRUPTIBLE);
-		if (trans->commit_done ||
-		    (trans->in_commit && !trans->blocked)) {
-			finish_wait(&root->fs_info->transaction_wait,
-				    &wait);
-			break;
-		}
-		schedule();
-		finish_wait(&root->fs_info->transaction_wait,
-			    &wait);
-	}
+	wait_event(root->fs_info->transaction_wait,
+		   trans->commit_done || (trans->in_commit && !trans->blocked));
 }
 
 /*
--
1.7.3.1
[PATCH 06/16] Btrfs: remove a BUG_ON() in btrfs_commit_transaction()
wait_for_commit() always returns 0.

Signed-off-by: Li Zefan <l...@cn.fujitsu.com>
---
 fs/btrfs/transaction.c |    6 ++
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 34a30ea..40726ac 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -353,11 +353,10 @@ struct btrfs_trans_handle *btrfs_start_ioctl_transaction(struct btrfs_root *root
 }
 
 /* wait for a transaction commit to be fully complete */
-static noinline int wait_for_commit(struct btrfs_root *root,
+static noinline void wait_for_commit(struct btrfs_root *root,
 				    struct btrfs_transaction *commit)
 {
 	wait_event(commit->commit_wait, commit->commit_done);
-	return 0;
 }
 
 int btrfs_wait_for_commit(struct btrfs_root *root, u64 transid)
@@ -1184,8 +1183,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 		atomic_inc(&cur_trans->use_count);
 		btrfs_end_transaction(trans, root);
 
-		ret = wait_for_commit(root, cur_trans);
-		BUG_ON(ret);
+		wait_for_commit(root, cur_trans);
 
 		put_transaction(cur_trans);
 
--
1.7.3.1
[PATCH 07/16] Btrfs: remove remaining ref-cache code
Since commit f2a97a9dbd86eb1ef956bdf20e05c507b32beb96
(btrfs: remove all unused functions), there's no extern functions
at all in ref-cache.c, so just remove the remaining dead code.

Signed-off-by: Li Zefan <l...@cn.fujitsu.com>
---
 fs/btrfs/ref-cache.c |   68 --
 fs/btrfs/ref-cache.h |   52 --
 2 files changed, 0 insertions(+), 120 deletions(-)
 delete mode 100644 fs/btrfs/ref-cache.c
 delete mode 100644 fs/btrfs/ref-cache.h

diff --git a/fs/btrfs/ref-cache.c b/fs/btrfs/ref-cache.c
deleted file mode 100644
index 82d569c..0000000
--- a/fs/btrfs/ref-cache.c
+++ /dev/null
@@ -1,68 +0,0 @@
-/*
- * Copyright (C) 2008 Oracle.  All rights reserved.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public
- * License v2 as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public
- * License along with this program; if not, write to the
- * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
- * Boston, MA 021110-1307, USA.
- */
-
-#include <linux/sched.h>
-#include <linux/slab.h>
-#include <linux/sort.h>
-#include "ctree.h"
-#include "ref-cache.h"
-#include "transaction.h"
-
-static struct rb_node *tree_insert(struct rb_root *root, u64 bytenr,
-				   struct rb_node *node)
-{
-	struct rb_node **p = &root->rb_node;
-	struct rb_node *parent = NULL;
-	struct btrfs_leaf_ref *entry;
-
-	while (*p) {
-		parent = *p;
-		entry = rb_entry(parent, struct btrfs_leaf_ref, rb_node);
-
-		if (bytenr < entry->bytenr)
-			p = &(*p)->rb_left;
-		else if (bytenr > entry->bytenr)
-			p = &(*p)->rb_right;
-		else
-			return parent;
-	}
-
-	entry = rb_entry(node, struct btrfs_leaf_ref, rb_node);
-	rb_link_node(node, parent, p);
-	rb_insert_color(node, root);
-	return NULL;
-}
-
-static struct rb_node *tree_search(struct rb_root *root, u64 bytenr)
-{
-	struct rb_node *n = root->rb_node;
-	struct btrfs_leaf_ref *entry;
-
-	while (n) {
-		entry = rb_entry(n, struct btrfs_leaf_ref, rb_node);
-		WARN_ON(!entry->in_tree);
-
-		if (bytenr < entry->bytenr)
-			n = n->rb_left;
-		else if (bytenr > entry->bytenr)
-			n = n->rb_right;
-		else
-			return n;
-	}
-	return NULL;
-}
diff --git a/fs/btrfs/ref-cache.h b/fs/btrfs/ref-cache.h
deleted file mode 100644
index 24f7001..0000000
--- a/fs/btrfs/ref-cache.h
+++ /dev/null
@@ -1,52 +0,0 @@
-/*
- * Copyright (C) 2008 Oracle.  All rights reserved.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public
- * License v2 as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public
- * License along with this program; if not, write to the
- * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
- * Boston, MA 021110-1307, USA.
- */
-#ifndef __REFCACHE__
-#define __REFCACHE__
-
-struct btrfs_extent_info {
-	/* bytenr and num_bytes find the extent in the extent allocation tree */
-	u64 bytenr;
-	u64 num_bytes;
-
-	/* objectid and offset find the back reference for the file */
-	u64 objectid;
-	u64 offset;
-};
-
-struct btrfs_leaf_ref {
-	struct rb_node rb_node;
-	struct btrfs_leaf_ref_tree *tree;
-	int in_tree;
-	atomic_t usage;
-
-	u64 root_gen;
-	u64 bytenr;
-	u64 owner;
-	u64 generation;
-	int nritems;
-
-	struct list_head list;
-	struct btrfs_extent_info extents[];
-};
-
-static inline size_t btrfs_leaf_ref_size(int nr_extents)
-{
-	return sizeof(struct btrfs_leaf_ref) +
-	       sizeof(struct btrfs_extent_info) * nr_extents;
-}
-#endif
--
1.7.3.1
[PATCH 08/16] Btrfs: make acl functions really no-op if acl is not enabled
So there's no overhead for something we don't use.

Signed-off-by: Li Zefan <l...@cn.fujitsu.com>
---
 fs/btrfs/Makefile |    4 +++-
 fs/btrfs/acl.c    |   17 -
 fs/btrfs/ctree.h  |   15 ---
 3 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 9b72dcf..40e6ac0 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -6,5 +6,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
 	   transaction.o inode.o file.o tree-defrag.o \
 	   extent_map.o sysfs.o struct-funcs.o xattr.o ordered-data.o \
 	   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
-	   export.o tree-log.o acl.o free-space-cache.o zlib.o lzo.o \
+	   export.o tree-log.o free-space-cache.o zlib.o lzo.o \
 	   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o
+
+btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
diff --git a/fs/btrfs/acl.c b/fs/btrfs/acl.c
index f66fc99..b206d4c 100644
--- a/fs/btrfs/acl.c
+++ b/fs/btrfs/acl.c
@@ -28,8 +28,6 @@
 #include "btrfs_inode.h"
 #include "xattr.h"
 
-#ifdef CONFIG_BTRFS_FS_POSIX_ACL
-
 static struct posix_acl *btrfs_get_acl(struct inode *inode, int type)
 {
 	int size;
@@ -318,18 +316,3 @@ const struct xattr_handler btrfs_xattr_acl_access_handler = {
 	.get	= btrfs_xattr_acl_get,
 	.set	= btrfs_xattr_acl_set,
 };
-
-#else /* CONFIG_BTRFS_FS_POSIX_ACL */
-
-int btrfs_acl_chmod(struct inode *inode)
-{
-	return 0;
-}
-
-int btrfs_init_acl(struct btrfs_trans_handle *trans,
-		   struct inode *inode, struct inode *dir)
-{
-	return 0;
-}
-
-#endif /* CONFIG_BTRFS_FS_POSIX_ACL */
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 60e13ef..b5097e2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2644,12 +2644,21 @@ do {								\
 /* acl.c */
 #ifdef CONFIG_BTRFS_FS_POSIX_ACL
 int btrfs_check_acl(struct inode *inode, int mask, unsigned int flags);
-#else
-#define btrfs_check_acl NULL
-#endif
 int btrfs_init_acl(struct btrfs_trans_handle *trans,
 		   struct inode *inode, struct inode *dir);
 int btrfs_acl_chmod(struct inode *inode);
+#else
+#define btrfs_check_acl NULL
+static inline int btrfs_init_acl(struct btrfs_trans_handle *trans,
+				 struct inode *inode, struct inode *dir)
+{
+	return 0;
+}
+static inline int btrfs_acl_chmod(struct inode *inode)
+{
+	return 0;
+}
+#endif
 
 /* relocation.c */
 int btrfs_relocate_block_group(struct btrfs_root *root, u64 group_start);
--
1.7.3.1
[PATCH 09/16] Btrfs: remove redundant code for dir item lookup
When we search a dir item with a specific hash code, we can
just return NULL without further checking if btrfs_search_slot()
returns 1.

Signed-off-by: Li Zefan <l...@cn.fujitsu.com>
---
 fs/btrfs/dir-item.c |   30 ++
 1 files changed, 2 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/dir-item.c b/fs/btrfs/dir-item.c
index 685f259..81533a0 100644
--- a/fs/btrfs/dir-item.c
+++ b/fs/btrfs/dir-item.c
@@ -203,8 +203,6 @@ struct btrfs_dir_item *btrfs_lookup_dir_item(struct btrfs_trans_handle *trans,
 	struct btrfs_key key;
 	int ins_len = mod < 0 ? -1 : 0;
 	int cow = mod != 0;
-	struct btrfs_key found_key;
-	struct extent_buffer *leaf;
 
 	key.objectid = dir;
 	btrfs_set_key_type(&key, BTRFS_DIR_ITEM_KEY);
@@ -214,18 +212,7 @@ struct btrfs_dir_item *btrfs_lookup_dir_item(struct btrfs_trans_handle *trans,
 	ret = btrfs_search_slot(trans, root, &key, path, ins_len, cow);
 	if (ret < 0)
 		return ERR_PTR(ret);
-	if (ret > 0) {
-		if (path->slots[0] == 0)
-			return NULL;
-		path->slots[0]--;
-	}
-
-	leaf = path->nodes[0];
-	btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
-
-	if (found_key.objectid != dir ||
-	    btrfs_key_type(&found_key) != BTRFS_DIR_ITEM_KEY ||
-	    found_key.offset != key.offset)
+	if (ret > 0)
 		return NULL;
 
 	return btrfs_match_dir_item_name(root, path, name, name_len);
@@ -320,8 +307,6 @@ struct btrfs_dir_item *btrfs_lookup_xattr(struct btrfs_trans_handle *trans,
 	struct btrfs_key key;
 	int ins_len = mod < 0 ? -1 : 0;
 	int cow = mod != 0;
-	struct btrfs_key found_key;
-	struct extent_buffer *leaf;
 
 	key.objectid = dir;
 	btrfs_set_key_type(&key, BTRFS_XATTR_ITEM_KEY);
@@ -329,18 +314,7 @@ struct btrfs_dir_item *btrfs_lookup_xattr(struct btrfs_trans_handle *trans,
 	ret = btrfs_search_slot(trans, root, &key, path, ins_len, cow);
 	if (ret < 0)
 		return ERR_PTR(ret);
-	if (ret > 0) {
-		if (path->slots[0] == 0)
-			return NULL;
-		path->slots[0]--;
-	}
-
-	leaf = path->nodes[0];
-	btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
-
-	if (found_key.objectid != dir ||
-	    btrfs_key_type(&found_key) != BTRFS_XATTR_ITEM_KEY ||
-	    found_key.offset != key.offset)
+	if (ret > 0)
 		return NULL;
 
 	return btrfs_match_dir_item_name(root, path, name, name_len);
--
1.7.3.1
[PATCH 10/16] Btrfs: clean up search_extent_mapping()
rb_node returned by __tree_search() can be a valid pointer or NULL,
but won't be some errno.

Signed-off-by: Li Zefan <l...@cn.fujitsu.com>
---
 fs/btrfs/extent_map.c |   17 +++--
 1 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 2d04103..911a9db 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -379,23 +379,12 @@ struct extent_map *search_extent_mapping(struct extent_map_tree *tree,
 		em = rb_entry(next, struct extent_map, rb_node);
 		goto found;
 	}
-	if (!rb_node) {
-		em = NULL;
-		goto out;
-	}
-	if (IS_ERR(rb_node)) {
-		em = ERR_CAST(rb_node);
-		goto out;
-	}
-	em = rb_entry(rb_node, struct extent_map, rb_node);
-	goto found;
-
-	em = NULL;
-	goto out;
+	if (!rb_node)
+		return NULL;
 
+	em = rb_entry(rb_node, struct extent_map, rb_node);
 found:
 	atomic_inc(&em->refs);
-out:
 	return em;
 }
 
--
1.7.3.1
[PATCH 11/16] Btrfs: clean up code for extent_map lookup
lookup_extent_map() and search_extent_map() can share most of code.

Signed-off-by: Li Zefan <l...@cn.fujitsu.com>
---
 fs/btrfs/extent_map.c |   85 +
 1 files changed, 29 insertions(+), 56 deletions(-)

diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 911a9db..df7a803 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -299,19 +299,8 @@ static u64 range_end(u64 start, u64 len)
 	return start + len;
 }
 
-/**
- * lookup_extent_mapping - lookup extent_map
- * @tree:	tree to lookup in
- * @start:	byte offset to start the search
- * @len:	length of the lookup range
- *
- * Find and return the first extent_map struct in @tree that intersects the
- * [start, len] range.  There may be additional objects in the tree that
- * intersect, so check the object returned carefully to make sure that no
- * additional lookups are needed.
- */
-struct extent_map *lookup_extent_mapping(struct extent_map_tree *tree,
-					 u64 start, u64 len)
+struct extent_map *__lookup_extent_mapping(struct extent_map_tree *tree,
+					   u64 start, u64 len, int strict)
 {
 	struct extent_map *em;
 	struct rb_node *rb_node;
@@ -320,38 +309,42 @@ struct extent_map *lookup_extent_mapping(struct extent_map_tree *tree,
 	u64 end = range_end(start, len);
 
 	rb_node = __tree_search(&tree->map, start, &prev, &next);
-	if (!rb_node && prev) {
-		em = rb_entry(prev, struct extent_map, rb_node);
-		if (end > em->start && start < extent_map_end(em))
-			goto found;
-	}
-	if (!rb_node && next) {
-		em = rb_entry(next, struct extent_map, rb_node);
-		if (end > em->start && start < extent_map_end(em))
-			goto found;
-	}
 	if (!rb_node) {
-		em = NULL;
-		goto out;
-	}
-	if (IS_ERR(rb_node)) {
-		em = ERR_CAST(rb_node);
-		goto out;
+		if (prev)
+			rb_node = prev;
+		else if (next)
+			rb_node = next;
+		else
+			return NULL;
 	}
+
 	em = rb_entry(rb_node, struct extent_map, rb_node);
-	if (end > em->start && start < extent_map_end(em))
-		goto found;
 
-	em = NULL;
-	goto out;
+	if (strict && !(end > em->start && start < extent_map_end(em)))
+		return NULL;
 
-found:
 	atomic_inc(&em->refs);
-out:
 	return em;
 }
 
 /**
+ * lookup_extent_mapping - lookup extent_map
+ * @tree:	tree to lookup in
+ * @start:	byte offset to start the search
+ * @len:	length of the lookup range
+ *
+ * Find and return the first extent_map struct in @tree that intersects the
+ * [start, len] range.  There may be additional objects in the tree that
+ * intersect, so check the object returned carefully to make sure that no
+ * additional lookups are needed.
+ */
+struct extent_map *lookup_extent_mapping(struct extent_map_tree *tree,
+					 u64 start, u64 len)
+{
+	return __lookup_extent_mapping(tree, start, len, 1);
+}
+
+/**
  * search_extent_mapping - find a nearby extent map
  * @tree:	tree to lookup in
  * @start:	byte offset to start the search
@@ -365,27 +358,7 @@ out:
 struct extent_map *search_extent_mapping(struct extent_map_tree *tree,
 					 u64 start, u64 len)
 {
-	struct extent_map *em;
-	struct rb_node *rb_node;
-	struct rb_node *prev = NULL;
-	struct rb_node *next = NULL;
-
-	rb_node = __tree_search(&tree->map, start, &prev, &next);
-	if (!rb_node && prev) {
-		em = rb_entry(prev, struct extent_map, rb_node);
-		goto found;
-	}
-	if (!rb_node && next) {
-		em = rb_entry(next, struct extent_map, rb_node);
-		goto found;
-	}
-	if (!rb_node)
-		return NULL;
-
-	em = rb_entry(rb_node, struct extent_map, rb_node);
-found:
-	atomic_inc(&em->refs);
-	return em;
+	return __lookup_extent_mapping(tree, start, len, 0);
 }
 
 /**
--
1.7.3.1
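The "strict" test in this patch is a half-open range intersection check. As a standalone illustration (userspace C, not the kernel code; the function name ranges_intersect() and its flat parameter list are assumptions made for this sketch), it can be written as:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace version of the intersection test:
 * does [start, start + len) overlap [em_start, em_start + em_len)?
 * Range ends are clamped to UINT64_MAX on overflow, in the spirit of
 * the range_end() helper that appears as context in the diff.
 */
bool ranges_intersect(uint64_t start, uint64_t len,
		      uint64_t em_start, uint64_t em_len)
{
	uint64_t end = start + len < start ? UINT64_MAX : start + len;
	uint64_t em_end = em_start + em_len < em_start ?
			  UINT64_MAX : em_start + em_len;

	return end > em_start && start < em_end;
}
```

Two ranges that merely touch, such as [0, 10) and [10, 15), do not intersect under this test, which is why both comparisons are strict.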
[PATCH 12/16] Btrfs: clean up code for merging extent maps
unpin_extent_cache() and add_extent_mapping() share the same code
that merges extent maps.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/extent_map.c |   59 +---
 1 files changed, 21 insertions(+), 38 deletions(-)

diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index df7a803..7c97b33 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -183,22 +183,10 @@ static int mergable_maps(struct extent_map *prev, struct extent_map *next)
 	return 0;
 }
 
-int unpin_extent_cache(struct extent_map_tree *tree, u64 start, u64 len)
+static void try_merge_map(struct extent_map_tree *tree, struct extent_map *em)
 {
-	int ret = 0;
 	struct extent_map *merge = NULL;
 	struct rb_node *rb;
-	struct extent_map *em;
-
-	write_lock(&tree->lock);
-	em = lookup_extent_mapping(tree, start, len);
-
-	WARN_ON(!em || em->start != start);
-
-	if (!em)
-		goto out;
-
-	clear_bit(EXTENT_FLAG_PINNED, &em->flags);
 
 	if (em->start != 0) {
 		rb = rb_prev(&em->rb_node);
@@ -225,6 +213,24 @@ int unpin_extent_cache(struct extent_map_tree *tree, u64 start, u64 len)
 		merge->in_tree = 0;
 		free_extent_map(merge);
 	}
+}
+
+int unpin_extent_cache(struct extent_map_tree *tree, u64 start, u64 len)
+{
+	int ret = 0;
+	struct extent_map *em;
+
+	write_lock(&tree->lock);
+	em = lookup_extent_mapping(tree, start, len);
+
+	WARN_ON(!em || em->start != start);
+
+	if (!em)
+		goto out;
+
+	clear_bit(EXTENT_FLAG_PINNED, &em->flags);
+
+	try_merge_map(tree, em);
 
 	free_extent_map(em);
 out:
@@ -247,7 +253,6 @@ int add_extent_mapping(struct extent_map_tree *tree,
 		       struct extent_map *em)
 {
 	int ret = 0;
-	struct extent_map *merge = NULL;
 	struct rb_node *rb;
 	struct extent_map *exist;
 
@@ -263,30 +268,8 @@ int add_extent_mapping(struct extent_map_tree *tree,
 		goto out;
 	}
 	atomic_inc(&em->refs);
-	if (em->start != 0) {
-		rb = rb_prev(&em->rb_node);
-		if (rb)
-			merge = rb_entry(rb, struct extent_map, rb_node);
-		if (rb && mergable_maps(merge, em)) {
-			em->start = merge->start;
-			em->len += merge->len;
-			em->block_len += merge->block_len;
-			em->block_start = merge->block_start;
-			merge->in_tree = 0;
-			rb_erase(&merge->rb_node, &tree->map);
-			free_extent_map(merge);
-		}
-	}
-	rb = rb_next(&em->rb_node);
-	if (rb)
-		merge = rb_entry(rb, struct extent_map, rb_node);
-	if (rb && mergable_maps(em, merge)) {
-		em->len += merge->len;
-		em->block_len += merge->len;
-		rb_erase(&merge->rb_node, &tree->map);
-		merge->in_tree = 0;
-		free_extent_map(merge);
-	}
+
+	try_merge_map(tree, em);
 out:
 	return ret;
 }
-- 
1.7.3.1
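The shape of this cleanup, one merge helper called from both the insert path and the unpin path, can be sketched in user space as follows. This is only an illustration under assumptions: a doubly linked list replaces the rbtree, nodes are owned by the caller so nothing is freed, the mergeability test is far simpler than `mergable_maps()`, and every name here (`struct ext`, `try_merge()`, `insert_after()`, `unpin()`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified extent: file range plus its starting disk block. */
struct ext {
	uint64_t start;
	uint64_t len;
	uint64_t block_start;
	int pinned;
	int in_list;
	struct ext *prev, *next;
};

/* Simplified rule: neither side pinned, contiguous in the file and on disk. */
static int mergeable(const struct ext *a, const struct ext *b)
{
	if (a->pinned || b->pinned)
		return 0;
	return a->start + a->len == b->start &&
	       a->block_start + a->len == b->block_start;
}

static void unlink_ext(struct ext *e)
{
	if (e->prev)
		e->prev->next = e->next;
	if (e->next)
		e->next->prev = e->prev;
	e->prev = e->next = NULL;
	e->in_list = 0;
}

/* The shared helper: absorb the left neighbour, then the right one. */
void try_merge(struct ext *e)
{
	struct ext *m;

	assert(e->in_list);

	m = e->prev;
	if (m && mergeable(m, e)) {
		e->start = m->start;
		e->len += m->len;
		e->block_start = m->block_start;
		unlink_ext(m);
	}
	m = e->next;
	if (m && mergeable(e, m)) {
		e->len += m->len;
		unlink_ext(m);
	}
}

/* Caller 1: link 'e' after 'pos' (NULL starts a new list), then merge. */
void insert_after(struct ext *pos, struct ext *e)
{
	e->prev = pos;
	e->next = pos ? pos->next : NULL;
	if (e->next)
		e->next->prev = e;
	if (pos)
		pos->next = e;
	e->in_list = 1;
	try_merge(e);
}

/* Caller 2: clearing the pin may make the extent mergeable, so retry. */
void unpin(struct ext *e)
{
	e->pinned = 0;
	try_merge(e);
}
```

A pinned extent inserted between two contiguous neighbours stays separate; once `unpin()` is called on it, the same helper folds all three into one entry.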
[PATCH 13/16] Btrfs: remove unused members from struct extent_state
From: Xiao Guangrong xiaoguangr...@cn.fujitsu.com

These members are not used at all.

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/extent_io.h |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index a11a92e..d04ca37 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -108,8 +108,6 @@ struct extent_state {
 	wait_queue_head_t wq;
 	atomic_t refs;
 	unsigned long state;
-	u64 split_start;
-	u64 split_end;
 
 	/* for use by the FS */
 	u64 private;
-- 
1.7.3.1
[PATCH 14/16] Btrfs: clean up for insert_state()
From: Xiao Guangrong xiaoguangr...@cn.fujitsu.com

Don't duplicate set_state_bits().

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/extent_io.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b181a94..6c7394f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -308,6 +308,9 @@ static void clear_state_cb(struct extent_io_tree *tree,
 		tree->ops->clear_bit_hook(tree->mapping->host, state, bits);
 }
 
+static int set_state_bits(struct extent_io_tree *tree,
+			   struct extent_state *state, int *bits);
+
 /*
  * insert an extent_state struct into the tree.  'bits' are set on the
  * struct before it is inserted.
@@ -323,7 +326,6 @@ static int insert_state(struct extent_io_tree *tree,
 			int *bits)
 {
 	struct rb_node *node;
-	int bits_to_set = *bits & ~EXTENT_CTLBITS;
 	int ret;
 
 	if (end < start) {
@@ -334,13 +336,11 @@ static int insert_state(struct extent_io_tree *tree,
 	}
 	state->start = start;
 	state->end = end;
-	ret = set_state_cb(tree, state, bits);
+
+	ret = set_state_bits(tree, state, bits);
 	if (ret)
 		return ret;
 
-	if (bits_to_set & EXTENT_DIRTY)
-		tree->dirty_bytes += end - start + 1;
-	state->state |= bits_to_set;
 	node = tree_insert(&tree->state, end, &state->rb_node);
 	if (node) {
 		struct extent_state *found;
-- 
1.7.3.1
[PATCH 15/16] Btrfs: clean up for wait_extent_bit()
From: Xiao Guangrong xiaoguangr...@cn.fujitsu.com

We can just use cond_resched_lock().

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/extent_io.c |    6 +-----
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6c7394f..1959a63 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -658,11 +658,7 @@ again:
 		if (start > end)
 			break;
 
-		if (need_resched()) {
-			spin_unlock(&tree->lock);
-			cond_resched();
-			spin_lock(&tree->lock);
-		}
+		cond_resched_lock(&tree->lock);
 	}
 out:
 	spin_unlock(&tree->lock);
-- 
1.7.3.1
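The open-coded sequence this patch removes (drop the lock, give up the CPU, retake the lock) can be mimicked in user space to show what the single helper call replaces. The sketch below is only an analogue, not the kernel primitive: a pthread mutex stands in for the spinlock, `sched_yield()` for `cond_resched()`, and the `want_yield` argument for `need_resched()`. The names `yield_locked()`, `sum_with_yield()` and `struct counter` are invented for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical shared state protected by a lock. */
struct counter {
	pthread_mutex_t lock;
	long total;
};

/*
 * User-space analogue of the helper: if a yield is wanted, drop the lock,
 * let other threads run, and retake it.  Returns 1 when the lock was
 * dropped, so the caller knows that protected state may have changed.
 */
int yield_locked(pthread_mutex_t *lock, int want_yield)
{
	if (!want_yield)
		return 0;
	pthread_mutex_unlock(lock);
	sched_yield();
	pthread_mutex_lock(lock);
	return 1;
}

/*
 * Add vals[0..n) to c->total under c->lock, offering to yield after every
 * 'batch' items.  Returns how many times the lock was dropped.
 */
int sum_with_yield(struct counter *c, const int *vals, size_t n, size_t batch)
{
	int dropped = 0;
	size_t i;

	assert(batch > 0);

	pthread_mutex_lock(&c->lock);
	for (i = 0; i < n; i++) {
		c->total += vals[i];
		dropped += yield_locked(&c->lock, (i + 1) % batch == 0);
	}
	pthread_mutex_unlock(&c->lock);
	return dropped;
}
```

The point to keep in mind with either form is that anything looked up before the call may be stale afterwards, because other threads can take the lock while it is released.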
[PATCH 16/16] Btrfs: clean up for find_first_extent_bit()
From: Xiao Guangrong xiaoguangr...@cn.fujitsu.com

find_first_extent_bit() and find_first_extent_bit_state() share
most of the code, and we can just make the former call the latter.

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/extent_io.c |   64 ++---
 1 files changed, 24 insertions(+), 40 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1959a63..a31ffa5 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1055,46 +1055,6 @@ static int set_range_writeback(struct extent_io_tree *tree, u64 start, u64 end)
 	return 0;
 }
 
-/*
- * find the first offset in the io tree with 'bits' set. zero is
- * returned if we find something, and *start_ret and *end_ret are
- * set to reflect the state struct that was found.
- *
- * If nothing was found, 1 is returned, < 0 on error
- */
-int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
-			  u64 *start_ret, u64 *end_ret, int bits)
-{
-	struct rb_node *node;
-	struct extent_state *state;
-	int ret = 1;
-
-	spin_lock(&tree->lock);
-	/*
-	 * this search will find all the extents that end after
-	 * our range starts.
-	 */
-	node = tree_search(tree, start);
-	if (!node)
-		goto out;
-
-	while (1) {
-		state = rb_entry(node, struct extent_state, rb_node);
-		if (state->end >= start && (state->state & bits)) {
-			*start_ret = state->start;
-			*end_ret = state->end;
-			ret = 0;
-			break;
-		}
-		node = rb_next(node);
-		if (!node)
-			break;
-	}
-out:
-	spin_unlock(&tree->lock);
-	return ret;
-}
-
 /* find the first state struct with 'bits' set after 'start', and
  * return it.  tree->lock must be held.  NULL will returned if
  * nothing was found after 'start'
@@ -1127,6 +1087,30 @@ out:
 }
 
 /*
+ * find the first offset in the io tree with 'bits' set. zero is
+ * returned if we find something, and *start_ret and *end_ret are
+ * set to reflect the state struct that was found.
+ *
+ * If nothing was found, 1 is returned, < 0 on error
+ */
+int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
+			  u64 *start_ret, u64 *end_ret, int bits)
+{
+	struct extent_state *state;
+	int ret = 1;
+
+	spin_lock(&tree->lock);
+	state = find_first_extent_bit_state(tree, start, bits);
+	if (state) {
+		*start_ret = state->start;
+		*end_ret = state->end;
+		ret = 0;
+	}
+	spin_unlock(&tree->lock);
+	return ret;
+}
+
+/*
  * find a contiguous range of bytes in the file marked as delalloc, not
  * more than 'max_bytes'.  start and end are used to return the range,
  *
-- 
1.7.3.1
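The pattern in this patch is a locking wrapper around a helper that expects the lock to be held already. A rough user-space sketch of that split is below, under stated assumptions: a sorted array replaces the rbtree, a pthread mutex replaces the spinlock, and `struct state_tab`, `find_first_state_locked()` and `find_first()` are names invented for the illustration. The return convention (0 when found, 1 when not) follows the comment in the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct extent_state. */
struct state {
	uint64_t start;
	uint64_t end;		/* inclusive */
	unsigned int bits;
};

struct state_tab {
	pthread_mutex_t lock;
	struct state *tab;	/* sorted by start, no overlaps */
	size_t n;
};

/*
 * Return the first entry ending at or after 'start' that has any of 'bits'
 * set, or NULL.  The caller must hold t->lock.
 */
struct state *find_first_state_locked(struct state_tab *t, uint64_t start,
				      unsigned int bits)
{
	size_t i;

	for (i = 0; i < t->n; i++) {
		if (t->tab[i].end >= start && (t->tab[i].bits & bits))
			return &t->tab[i];
	}
	return NULL;
}

/*
 * Locking wrapper: returns 0 and fills *start_ret / *end_ret if an entry
 * was found, 1 otherwise.  The values are copied out while the lock is
 * held, so no pointer into the table escapes.
 */
int find_first(struct state_tab *t, uint64_t start,
	       uint64_t *start_ret, uint64_t *end_ret, unsigned int bits)
{
	struct state *s;
	int ret = 1;

	assert(start_ret && end_ret);

	pthread_mutex_lock(&t->lock);
	s = find_first_state_locked(t, start, bits);
	if (s) {
		*start_ret = s->start;
		*end_ret = s->end;
		ret = 0;
	}
	pthread_mutex_unlock(&t->lock);
	return ret;
}
```

Callers that already hold the lock use the `_locked` helper directly; everyone else goes through the wrapper and never sees the table entry itself.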
Re: Mis-Design of Btrfs?
On Wed, 29 Jun 2011 10:29:53 +0100 Ric Wheeler rwhee...@redhat.com wrote:

> On 06/27/2011 07:46 AM, NeilBrown wrote:
> > On Thu, 23 Jun 2011 12:53:37 +0200 Nico Schottelius
> > nico-lkml-20110...@schottelius.org wrote:
> >
> > > Good morning devs,
> > >
> > > I'm wondering whether the raid- and volume-management-builtin of btrfs
> > > is actually a sane idea or not. Currently we do have md/device-mapper
> > > support for raid already, btrfs lacks raid5 support and re-implements
> > > stuff that has already been done.
> > >
> > > I'm aware of the fact that it is very useful to know on which devices
> > > we are in a filesystem. But I'm wondering, whether it wouldn't be
> > > smarter to generalise the information exposure through the VFS layer
> > > instead of replicating functionality:
> > >
> > > Physical:  USB-HD   SSD   USB-Flash        | Exposes information to
> > > Raid:      Raid1, Raid5, Raid10, etc.      | higher levels
> > > Crypto:    Luks                            |
> > > LVM:       Groups/Volumes                  |
> > > FS:        xfs/jfs/reiser/ext3             v
> > >
> > > Thus a filesystem like ext3 could be aware that it is running on a
> > > USB HD, enable -o sync by default or have the filesystem rewrite
> > > blocks when running on crypto or optimise for an SSD, ...
> >
> > I would certainly agree that exposing information to higher levels is a
> > good idea. To some extent we do. But it isn't always as easy as it might
> > sound. Choosing exactly what information to expose is the challenge. If
> > you lack sufficient foresight you might expose something which turns out
> > to be very specific to just one device, so all those upper levels which
> > make use of the information find they are really special-casing one
> > specific device, which isn't a good idea.
> >
> > However it doesn't follow that RAID5 should not be implemented in BTRFS.
> > The levels that you have drawn are just one perspective. While that has
> > value, it may not be universal. I could easily argue that the LVM layer
> > is a mistake and that filesystems should provide that functionality
> > directly. I could almost argue the same for crypto. RAID1 can make a lot
> > of sense to be tightly integrated with the FS. RAID5 ... I'm less
> > convinced, but then I have a vested interest there so that isn't an
> > objective assessment.
> >
> > Part of the way Linux works is that s/he who writes the code gets to
> > make the design decisions. The BTRFS developers might create something
> > truly awesome, or might end up having to support a RAID feature that
> > they subsequently think is a bad idea. But it really is their decision
> > to make.
> >
> > NeilBrown
>
> One more thing to add here is that I think that we still have a chance to
> increase the sharing between btrfs and the MD stack if we can get those
> changes made. No one likes to duplicate code, but we will need a richer
> interface between the block and file system layer to help close that gap.
>
> Ric

I'm certainly open to suggestions and collaboration. Do you have in mind any
particular way to make the interface richer??

NeilBrown