Re: snapshot space available
Chris Murphy posted on Mon, 26 Aug 2013 09:15:46 -0600 as excerpted: I'm uncertain if autodefrag avoids this problem. It does seem like in certain instances, like this, the file system needs to be able to prune itself somehow, like a partial balance to consolidate data chunks and then release their space so they can become metadata chunks. Intriguing question re defrag. I hadn't thought of the possibility until you suggested it, but indeed, tracking hundreds of extents instead of one or a dozen, does sound like it could reduce metadata usage, reducing the chance of running into the issue in the first place as well as lessening the chance of a simple delete temporarily requiring significant new metadata resources in ordered to track all those extent frees before the final atomic root entry update. -- Duncan - List replies preferred. No HTML msgs. Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: stop refusing the relocation of chunk 0
AFAICT chunk 0 is no longer special, and so it should be restriped just like every other chunk. One reason for this change is us refusing the relocation can lead to filesystems that can only be mounted ro, and never rw -- see the bugzilla [1] for details. The other reason is that device removal code is already doing this: it will happily relocate chunk 0 is part of shrinking the device. [1] https://bugzilla.kernel.org/show_bug.cgi?id=60594 Reported-by: Xavier Bassery xav...@bartica.org Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/volumes.c |6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index c9a0977..e8325de 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2991,10 +2991,6 @@ again: if (found_key.objectid != key.objectid) break; - /* chunk zero is special */ - if (found_key.offset == 0) - break; - chunk = btrfs_item_ptr(leaf, slot, struct btrfs_chunk); if (!counting) { @@ -3030,6 +3026,8 @@ again: spin_unlock(fs_info-balance_lock); } loop: + if (found_key.offset == 0) + break; key.offset = found_key.offset - 1; } -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: handle errors when doing slow caching
On Tue, Aug 27, 2013 at 01:26:36PM +0300, Alex Lyakas wrote: Hi Josef, thanks for addressing this. On Mon, Aug 5, 2013 at 6:19 PM, Josef Bacik jba...@fusionio.com wrote: Alex Lyakas reported a bug where wait_block_group_cache_progress() would wait forever if a drive failed. This is because we just bail out if there is an error while trying to cache a block group, we don't update anybody who may be waiting. So this introduces a new enum for the cache state in case of error and makes everybody bail out if we have an error. Alex tested and verified this patch fixed his problem. This fixes bz 59431. Thanks, Signed-off-by: Josef Bacik jba...@fusionio.com --- fs/btrfs/ctree.h |1 + fs/btrfs/extent-tree.c | 27 --- 2 files changed, 21 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index cbb1263..c17acbc 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1188,6 +1188,7 @@ enum btrfs_caching_type { BTRFS_CACHE_STARTED = 1, BTRFS_CACHE_FAST= 2, BTRFS_CACHE_FINISHED= 3, + BTRFS_CACHE_ERROR = 4, }; enum btrfs_disk_cache_state { diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index e868c35..e6dfa7f 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -113,7 +113,8 @@ static noinline int block_group_cache_done(struct btrfs_block_group_cache *cache) { smp_mb(); - return cache-cached == BTRFS_CACHE_FINISHED; + return cache-cached == BTRFS_CACHE_FINISHED || + cache-cached == BTRFS_CACHE_ERROR; } static int block_group_bits(struct btrfs_block_group_cache *cache, u64 bits) @@ -389,7 +390,7 @@ static noinline void caching_thread(struct btrfs_work *work) u64 total_found = 0; u64 last = 0; u32 nritems; - int ret = 0; + int ret = -ENOMEM; caching_ctl = container_of(work, struct btrfs_caching_control, work); block_group = caching_ctl-block_group; @@ -517,6 +518,12 @@ err: mutex_unlock(caching_ctl-mutex); out: + if (ret) { + spin_lock(block_group-lock); + block_group-caching_ctl = NULL; + block_group-cached = BTRFS_CACHE_ERROR; + spin_unlock(block_group-lock); + } wake_up(caching_ctl-wait); put_caching_control(caching_ctl); @@ -6035,8 +6042,11 @@ static u64 stripe_align(struct btrfs_root *root, * for our min num_bytes. Another option is to have it go ahead * and look in the rbtree for a free extent of a given size, but this * is a good start. + * + * Callers of this must check if cache-cached == BTRFS_CACHE_ERROR before using + * any of the information in this block group. */ -static noinline int +static noinline void wait_block_group_cache_progress(struct btrfs_block_group_cache *cache, u64 num_bytes) { @@ -6044,28 +6054,29 @@ wait_block_group_cache_progress(struct btrfs_block_group_cache *cache, caching_ctl = get_caching_control(cache); if (!caching_ctl) - return 0; + return; wait_event(caching_ctl-wait, block_group_cache_done(cache) || (cache-free_space_ctl-free_space = num_bytes)); put_caching_control(caching_ctl); - return 0; } static noinline int wait_block_group_cache_done(struct btrfs_block_group_cache *cache) { struct btrfs_caching_control *caching_ctl; + int ret = 0; caching_ctl = get_caching_control(cache); if (!caching_ctl) return 0; In case caching_thread completes with error for this block group, get_caching_control() will return NULL. So this function will return success, although the block group was not cached properly. Currently only btrfs_trim_fs() caller checks the return value of this function, although you didn't post the btrfs_trim_fs() change in this patch (but you posed it in the bugzilla). Still, should we check the cache-cached for ERROR even if there is no caching control? Yeah I'll fix that up, sorry about that. Thanks, Josef -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: add support for asserts
On Mon, Aug 26, 2013 at 02:53:26PM -0700, Zach Brown wrote: With this we can go through and convert any BUG_ON()'s that we have to catch actual programming mistakes to the new ASSERT() and then fix everybody else to return errors. I like the sound of that! --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3814,6 +3814,22 @@ void btrfs_printk(const struct btrfs_fs_info *fs_info, const char *fmt, ...) #define btrfs_debug(fs_info, fmt, args...) \ btrfs_printk(fs_info, KERN_DEBUG fmt, ##args) +#ifdef BTRFS_ASSERT + +static inline void assfail(char *expr, char *file, int lin) +{ + printk(KERN_ERR BTRFS assertion failed: %s, file: %s, line: %d, + expr, file, line); + BUG(); +} I'm not sure why this is needed. +#define ASSERT(expr) \ + (unlikely(expr) ? (void)0 : assfail(#expr, __FILE__, __LINE__)) (Passing the assertion is unlikely()? I know, this is from xfs... still.) Yeah I copy+pasted and then thought about it after I sent it. I will fix it up. +#else +#define ASSERT(expr) ((void)0) +#endif Anyway, if you're going to do it this way, why not: #ifdef BTRFS_ASSERT #define btrfs_assert(cond) BUG_ON(!(cond)) #else #define btrfs_assert(cond) do { if (cond) ; } while (0) #endif I like the verbosity, especially with random kernel versions and such, it will help me figure out where we BUG_ON()'ed without having to checkout a particular version and go hunting. Thanks, Josef -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: fix deadlock in uuid scan kthread
If there's an ongoing transaction when the uuid scan kthread attempts to create one, the kthread will block, waiting for that transaction to finish while it's keeping locks on the tree root, and in turn the existing transaction is waiting for those locks to be free. The stack trace reported by the kernel follows. [36700.671601] INFO: task btrfs-uuid:15480 blocked for more than 120 seconds. [36700.671602] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [36700.671602] btrfs-uuid D 0 15480 2 0x [36700.671604] 880710bd5b88 0046 8803d36ba850 0003 [36700.671605] 8806d76dc530 880710bd5fd8 880710bd5fd8 880710bd5fd8 [36700.671607] 8808098ac530 8806d76dc530 880710bd5b98 8805e4508e40 [36700.671608] Call Trace: [36700.671610] [816f36b9] schedule+0x29/0x70 [36700.671620] [a05a3bdf] wait_current_trans.isra.33+0xbf/0x120 [btrfs] [36700.671623] [81066760] ? add_wait_queue+0x60/0x60 [36700.671629] [a05a5b06] start_transaction+0x3d6/0x530 [btrfs] [36700.671636] [a05bb1f4] ? btrfs_get_token_32+0x64/0xf0 [btrfs] [36700.671642] [a05a5fbb] btrfs_start_transaction+0x1b/0x20 [btrfs] [36700.671649] [a05c8a81] btrfs_uuid_scan_kthread+0x211/0x3d0 [btrfs] [36700.671655] [a05c8870] ? __btrfs_open_devices+0x2a0/0x2a0 [btrfs] [36700.671657] [81065fa0] kthread+0xc0/0xd0 [36700.671659] [81065ee0] ? flush_kthread_worker+0xb0/0xb0 [36700.671661] [816fcd1c] ret_from_fork+0x7c/0xb0 [36700.671662] [81065ee0] ? flush_kthread_worker+0xb0/0xb0 [36700.671663] INFO: task btrfs:15481 blocked for more than 120 seconds. [36700.671664] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [36700.671665] btrfs D 0 15481 15212 0x0004 [36700.671666] 880248cbf4c8 0086 8803d36ba700 8801dbd5c280 [36700.671668] 880807815c40 880248cbffd8 880248cbffd8 880248cbffd8 [36700.671669] 8805e86a 880807815c40 880248cbf4d8 8801dbd5c280 [36700.671670] Call Trace: [36700.671672] [816f36b9] schedule+0x29/0x70 [36700.671679] [a05d9b0d] btrfs_tree_lock+0x6d/0x230 [btrfs] [36700.671680] [81066760] ? add_wait_queue+0x60/0x60 [36700.671685] [a0582829] btrfs_search_slot+0x999/0xb00 [btrfs] [36700.671691] [a05bd9de] ? btrfs_lookup_first_ordered_extent+0x5e/0xb0 [btrfs] [36700.671698] [a05e3e54] __btrfs_write_out_cache+0x8c4/0xa80 [btrfs] [36700.671704] [a05e4362] btrfs_write_out_cache+0xb2/0xf0 [btrfs] [36700.671710] [a05c4441] ? free_extent_buffer+0x61/0xc0 [btrfs] [36700.671716] [a0594c82] btrfs_write_dirty_block_groups+0x562/0x650 [btrfs] [36700.671723] [a0610092] commit_cowonly_roots+0x171/0x24b [btrfs] [36700.671729] [a05a4dde] btrfs_commit_transaction+0x4fe/0xa10 [btrfs] [36700.671735] [a0610af3] create_subvol+0x5c0/0x636 [btrfs] [36700.671742] [a05d49ff] btrfs_mksubvol.isra.60+0x33f/0x3f0 [btrfs] [36700.671747] [a05d4bf2] btrfs_ioctl_snap_create_transid+0x142/0x190 [btrfs] [36700.671752] [a05d4c6c] ? btrfs_ioctl_snap_create+0x2c/0x80 [btrfs] [36700.671757] [a05d4c9e] btrfs_ioctl_snap_create+0x5e/0x80 [btrfs] [36700.671759] [8113a764] ? handle_pte_fault+0x84/0x920 [36700.671764] [a05d87eb] btrfs_ioctl+0xf0b/0x1d00 [btrfs] [36700.671766] [8113c120] ? handle_mm_fault+0x210/0x310 [36700.671768] [816f83a4] ? __do_page_fault+0x284/0x4e0 [36700.671770] [81180aa6] do_vfs_ioctl+0x96/0x550 [36700.671772] [81170fe3] ? __sb_end_write+0x33/0x70 [36700.671774] [81180ff1] SyS_ioctl+0x91/0xb0 [36700.671775] [816fcdc2] system_call_fastpath+0x16/0x1b Signed-off-by: Filipe David Borba Manana fdman...@gmail.com --- fs/btrfs/volumes.c | 25 ++--- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index f42e412..44cd21b 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -3465,7 +3465,7 @@ static int btrfs_uuid_scan_kthread(void *data) int slot; struct btrfs_root_item root_item; u32 item_size; - struct btrfs_trans_handle *trans; + struct btrfs_trans_handle *trans = NULL; path = btrfs_alloc_path(); if (!path) { @@ -3509,7 +3509,13 @@ static int btrfs_uuid_scan_kthread(void *data) (int)sizeof(root_item)); if (btrfs_root_refs(root_item) == 0) goto skip; - if (!btrfs_is_empty_uuid(root_item.uuid)) { + + if (!btrfs_is_empty_uuid(root_item.uuid) || + !btrfs_is_empty_uuid(root_item.received_uuid)) { + if (trans) + goto
[PATCH] Btrfs: handle errors when doing slow caching V2
Alex Lyakas reported a bug where wait_block_group_cache_progress() would wait forever if a drive failed. This is because we just bail out if there is an error while trying to cache a block group, we don't update anybody who may be waiting. So this introduces a new enum for the cache state in case of error and makes everybody bail out if we have an error. Alex tested and verified this patch fixed his problem. This fixes bz 59431. Thanks, Signed-off-by: Josef Bacik jba...@fusionio.com --- V1-V2: return an error in wait_block_group_cache_done if there is no caching ctl and we have BTRFS_CACHE_ERROR set. fs/btrfs/ctree.h |1 + fs/btrfs/extent-tree.c | 32 +++- 2 files changed, 24 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index cbb1263..c17acbc 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1188,6 +1188,7 @@ enum btrfs_caching_type { BTRFS_CACHE_STARTED = 1, BTRFS_CACHE_FAST= 2, BTRFS_CACHE_FINISHED= 3, + BTRFS_CACHE_ERROR = 4, }; enum btrfs_disk_cache_state { diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index e868c35..a073f3e 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -113,7 +113,8 @@ static noinline int block_group_cache_done(struct btrfs_block_group_cache *cache) { smp_mb(); - return cache-cached == BTRFS_CACHE_FINISHED; + return cache-cached == BTRFS_CACHE_FINISHED || + cache-cached == BTRFS_CACHE_ERROR; } static int block_group_bits(struct btrfs_block_group_cache *cache, u64 bits) @@ -389,7 +390,7 @@ static noinline void caching_thread(struct btrfs_work *work) u64 total_found = 0; u64 last = 0; u32 nritems; - int ret = 0; + int ret = -ENOMEM; caching_ctl = container_of(work, struct btrfs_caching_control, work); block_group = caching_ctl-block_group; @@ -517,6 +518,12 @@ err: mutex_unlock(caching_ctl-mutex); out: + if (ret) { + spin_lock(block_group-lock); + block_group-caching_ctl = NULL; + block_group-cached = BTRFS_CACHE_ERROR; + spin_unlock(block_group-lock); + } wake_up(caching_ctl-wait); put_caching_control(caching_ctl); @@ -6035,8 +6042,11 @@ static u64 stripe_align(struct btrfs_root *root, * for our min num_bytes. Another option is to have it go ahead * and look in the rbtree for a free extent of a given size, but this * is a good start. + * + * Callers of this must check if cache-cached == BTRFS_CACHE_ERROR before using + * any of the information in this block group. */ -static noinline int +static noinline void wait_block_group_cache_progress(struct btrfs_block_group_cache *cache, u64 num_bytes) { @@ -6044,28 +6054,29 @@ wait_block_group_cache_progress(struct btrfs_block_group_cache *cache, caching_ctl = get_caching_control(cache); if (!caching_ctl) - return 0; + return; wait_event(caching_ctl-wait, block_group_cache_done(cache) || (cache-free_space_ctl-free_space = num_bytes)); put_caching_control(caching_ctl); - return 0; } static noinline int wait_block_group_cache_done(struct btrfs_block_group_cache *cache) { struct btrfs_caching_control *caching_ctl; + int ret = 0; caching_ctl = get_caching_control(cache); if (!caching_ctl) - return 0; + return (cache-cached == BTRFS_CACHE_ERROR) ? -EIO : 0; wait_event(caching_ctl-wait, block_group_cache_done(cache)); - + if (cache-cached == BTRFS_CACHE_ERROR) + ret = -EIO; put_caching_control(caching_ctl); - return 0; + return ret; } int __get_raid_index(u64 flags) @@ -6248,6 +6259,8 @@ have_block_group: ret = 0; } + if (unlikely(block_group-cached == BTRFS_CACHE_ERROR)) + goto loop; if (unlikely(block_group-ro)) goto loop; @@ -8230,7 +8243,8 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info) * We haven't cached this block group, which means we could * possibly have excluded extents on this block group. */ - if (block_group-cached == BTRFS_CACHE_NO) + if (block_group-cached == BTRFS_CACHE_NO || + block_group-cached == BTRFS_CACHE_ERROR) free_excluded_extents(info-extent_root, block_group); btrfs_remove_free_space_cache(block_group); -- 1.7.7.6 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: fix deadlock in uuid scan kthread
On Tue, Aug 27, 2013 at 04:51:55PM +0100, Filipe David Borba Manana wrote: If there's an ongoing transaction when the uuid scan kthread attempts to create one, the kthread will block, waiting for that transaction to finish while it's keeping locks on the tree root, and in turn the existing transaction is waiting for those locks to be free. The stack trace reported by the kernel follows. [36700.671601] INFO: task btrfs-uuid:15480 blocked for more than 120 seconds. [36700.671602] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [36700.671602] btrfs-uuid D 0 15480 2 0x [36700.671604] 880710bd5b88 0046 8803d36ba850 0003 [36700.671605] 8806d76dc530 880710bd5fd8 880710bd5fd8 880710bd5fd8 [36700.671607] 8808098ac530 8806d76dc530 880710bd5b98 8805e4508e40 [36700.671608] Call Trace: [36700.671610] [816f36b9] schedule+0x29/0x70 [36700.671620] [a05a3bdf] wait_current_trans.isra.33+0xbf/0x120 [btrfs] [36700.671623] [81066760] ? add_wait_queue+0x60/0x60 [36700.671629] [a05a5b06] start_transaction+0x3d6/0x530 [btrfs] [36700.671636] [a05bb1f4] ? btrfs_get_token_32+0x64/0xf0 [btrfs] [36700.671642] [a05a5fbb] btrfs_start_transaction+0x1b/0x20 [btrfs] [36700.671649] [a05c8a81] btrfs_uuid_scan_kthread+0x211/0x3d0 [btrfs] [36700.671655] [a05c8870] ? __btrfs_open_devices+0x2a0/0x2a0 [btrfs] [36700.671657] [81065fa0] kthread+0xc0/0xd0 [36700.671659] [81065ee0] ? flush_kthread_worker+0xb0/0xb0 [36700.671661] [816fcd1c] ret_from_fork+0x7c/0xb0 [36700.671662] [81065ee0] ? flush_kthread_worker+0xb0/0xb0 [36700.671663] INFO: task btrfs:15481 blocked for more than 120 seconds. [36700.671664] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [36700.671665] btrfs D 0 15481 15212 0x0004 [36700.671666] 880248cbf4c8 0086 8803d36ba700 8801dbd5c280 [36700.671668] 880807815c40 880248cbffd8 880248cbffd8 880248cbffd8 [36700.671669] 8805e86a 880807815c40 880248cbf4d8 8801dbd5c280 [36700.671670] Call Trace: [36700.671672] [816f36b9] schedule+0x29/0x70 [36700.671679] [a05d9b0d] btrfs_tree_lock+0x6d/0x230 [btrfs] [36700.671680] [81066760] ? add_wait_queue+0x60/0x60 [36700.671685] [a0582829] btrfs_search_slot+0x999/0xb00 [btrfs] [36700.671691] [a05bd9de] ? btrfs_lookup_first_ordered_extent+0x5e/0xb0 [btrfs] [36700.671698] [a05e3e54] __btrfs_write_out_cache+0x8c4/0xa80 [btrfs] [36700.671704] [a05e4362] btrfs_write_out_cache+0xb2/0xf0 [btrfs] [36700.671710] [a05c4441] ? free_extent_buffer+0x61/0xc0 [btrfs] [36700.671716] [a0594c82] btrfs_write_dirty_block_groups+0x562/0x650 [btrfs] [36700.671723] [a0610092] commit_cowonly_roots+0x171/0x24b [btrfs] [36700.671729] [a05a4dde] btrfs_commit_transaction+0x4fe/0xa10 [btrfs] [36700.671735] [a0610af3] create_subvol+0x5c0/0x636 [btrfs] [36700.671742] [a05d49ff] btrfs_mksubvol.isra.60+0x33f/0x3f0 [btrfs] [36700.671747] [a05d4bf2] btrfs_ioctl_snap_create_transid+0x142/0x190 [btrfs] [36700.671752] [a05d4c6c] ? btrfs_ioctl_snap_create+0x2c/0x80 [btrfs] [36700.671757] [a05d4c9e] btrfs_ioctl_snap_create+0x5e/0x80 [btrfs] [36700.671759] [8113a764] ? handle_pte_fault+0x84/0x920 [36700.671764] [a05d87eb] btrfs_ioctl+0xf0b/0x1d00 [btrfs] [36700.671766] [8113c120] ? handle_mm_fault+0x210/0x310 [36700.671768] [816f83a4] ? __do_page_fault+0x284/0x4e0 [36700.671770] [81180aa6] do_vfs_ioctl+0x96/0x550 [36700.671772] [81170fe3] ? __sb_end_write+0x33/0x70 [36700.671774] [81180ff1] SyS_ioctl+0x91/0xb0 [36700.671775] [816fcdc2] system_call_fastpath+0x16/0x1b Signed-off-by: Filipe David Borba Manana fdman...@gmail.com --- fs/btrfs/volumes.c | 25 ++--- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index f42e412..44cd21b 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -3465,7 +3465,7 @@ static int btrfs_uuid_scan_kthread(void *data) int slot; struct btrfs_root_item root_item; u32 item_size; - struct btrfs_trans_handle *trans; + struct btrfs_trans_handle *trans = NULL; path = btrfs_alloc_path(); if (!path) { @@ -3509,7 +3509,13 @@ static int btrfs_uuid_scan_kthread(void *data) (int)sizeof(root_item)); if (btrfs_root_refs(root_item) == 0) goto skip; - if (!btrfs_is_empty_uuid(root_item.uuid)) { + + if
[PATCH v2] Btrfs: fix deadlock in uuid scan kthread
If there's an ongoing transaction when the uuid scan kthread attempts to create one, the kthread will block, waiting for that transaction to finish while it's keeping locks on the tree root, and in turn the existing transaction is waiting for those locks to be free. The stack trace reported by the kernel follows. [36700.671601] INFO: task btrfs-uuid:15480 blocked for more than 120 seconds. [36700.671602] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [36700.671602] btrfs-uuid D 0 15480 2 0x [36700.671604] 880710bd5b88 0046 8803d36ba850 0003 [36700.671605] 8806d76dc530 880710bd5fd8 880710bd5fd8 880710bd5fd8 [36700.671607] 8808098ac530 8806d76dc530 880710bd5b98 8805e4508e40 [36700.671608] Call Trace: [36700.671610] [816f36b9] schedule+0x29/0x70 [36700.671620] [a05a3bdf] wait_current_trans.isra.33+0xbf/0x120 [btrfs] [36700.671623] [81066760] ? add_wait_queue+0x60/0x60 [36700.671629] [a05a5b06] start_transaction+0x3d6/0x530 [btrfs] [36700.671636] [a05bb1f4] ? btrfs_get_token_32+0x64/0xf0 [btrfs] [36700.671642] [a05a5fbb] btrfs_start_transaction+0x1b/0x20 [btrfs] [36700.671649] [a05c8a81] btrfs_uuid_scan_kthread+0x211/0x3d0 [btrfs] [36700.671655] [a05c8870] ? __btrfs_open_devices+0x2a0/0x2a0 [btrfs] [36700.671657] [81065fa0] kthread+0xc0/0xd0 [36700.671659] [81065ee0] ? flush_kthread_worker+0xb0/0xb0 [36700.671661] [816fcd1c] ret_from_fork+0x7c/0xb0 [36700.671662] [81065ee0] ? flush_kthread_worker+0xb0/0xb0 [36700.671663] INFO: task btrfs:15481 blocked for more than 120 seconds. [36700.671664] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [36700.671665] btrfs D 0 15481 15212 0x0004 [36700.671666] 880248cbf4c8 0086 8803d36ba700 8801dbd5c280 [36700.671668] 880807815c40 880248cbffd8 880248cbffd8 880248cbffd8 [36700.671669] 8805e86a 880807815c40 880248cbf4d8 8801dbd5c280 [36700.671670] Call Trace: [36700.671672] [816f36b9] schedule+0x29/0x70 [36700.671679] [a05d9b0d] btrfs_tree_lock+0x6d/0x230 [btrfs] [36700.671680] [81066760] ? add_wait_queue+0x60/0x60 [36700.671685] [a0582829] btrfs_search_slot+0x999/0xb00 [btrfs] [36700.671691] [a05bd9de] ? btrfs_lookup_first_ordered_extent+0x5e/0xb0 [btrfs] [36700.671698] [a05e3e54] __btrfs_write_out_cache+0x8c4/0xa80 [btrfs] [36700.671704] [a05e4362] btrfs_write_out_cache+0xb2/0xf0 [btrfs] [36700.671710] [a05c4441] ? free_extent_buffer+0x61/0xc0 [btrfs] [36700.671716] [a0594c82] btrfs_write_dirty_block_groups+0x562/0x650 [btrfs] [36700.671723] [a0610092] commit_cowonly_roots+0x171/0x24b [btrfs] [36700.671729] [a05a4dde] btrfs_commit_transaction+0x4fe/0xa10 [btrfs] [36700.671735] [a0610af3] create_subvol+0x5c0/0x636 [btrfs] [36700.671742] [a05d49ff] btrfs_mksubvol.isra.60+0x33f/0x3f0 [btrfs] [36700.671747] [a05d4bf2] btrfs_ioctl_snap_create_transid+0x142/0x190 [btrfs] [36700.671752] [a05d4c6c] ? btrfs_ioctl_snap_create+0x2c/0x80 [btrfs] [36700.671757] [a05d4c9e] btrfs_ioctl_snap_create+0x5e/0x80 [btrfs] [36700.671759] [8113a764] ? handle_pte_fault+0x84/0x920 [36700.671764] [a05d87eb] btrfs_ioctl+0xf0b/0x1d00 [btrfs] [36700.671766] [8113c120] ? handle_mm_fault+0x210/0x310 [36700.671768] [816f83a4] ? __do_page_fault+0x284/0x4e0 [36700.671770] [81180aa6] do_vfs_ioctl+0x96/0x550 [36700.671772] [81170fe3] ? __sb_end_write+0x33/0x70 [36700.671774] [81180ff1] SyS_ioctl+0x91/0xb0 [36700.671775] [816fcdc2] system_call_fastpath+0x16/0x1b Signed-off-by: Filipe David Borba Manana fdman...@gmail.com --- V2: Removed wrong assignment of NULL to transaction pointer, and addressed code style comment from Josef. fs/btrfs/volumes.c | 27 +++ 1 file changed, 15 insertions(+), 12 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index f42e412..3cb9649 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -3465,7 +3465,7 @@ static int btrfs_uuid_scan_kthread(void *data) int slot; struct btrfs_root_item root_item; u32 item_size; - struct btrfs_trans_handle *trans; + struct btrfs_trans_handle *trans = NULL; path = btrfs_alloc_path(); if (!path) { @@ -3503,13 +3503,18 @@ static int btrfs_uuid_scan_kthread(void *data) if (item_size sizeof(root_item)) goto skip; - trans = NULL; read_extent_buffer(eb, root_item, btrfs_item_ptr_offset(eb, slot), (int)sizeof(root_item));
[PATCH v4] Btrfs: fix deadlock in uuid scan kthread
If there's an ongoing transaction when the uuid scan kthread attempts to create one, the kthread will block, waiting for that transaction to finish while it's keeping locks on the tree root, and in turn the existing transaction is waiting for those locks to be free. The stack trace reported by the kernel follows. [36700.671601] INFO: task btrfs-uuid:15480 blocked for more than 120 seconds. [36700.671602] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [36700.671602] btrfs-uuid D 0 15480 2 0x [36700.671604] 880710bd5b88 0046 8803d36ba850 0003 [36700.671605] 8806d76dc530 880710bd5fd8 880710bd5fd8 880710bd5fd8 [36700.671607] 8808098ac530 8806d76dc530 880710bd5b98 8805e4508e40 [36700.671608] Call Trace: [36700.671610] [816f36b9] schedule+0x29/0x70 [36700.671620] [a05a3bdf] wait_current_trans.isra.33+0xbf/0x120 [btrfs] [36700.671623] [81066760] ? add_wait_queue+0x60/0x60 [36700.671629] [a05a5b06] start_transaction+0x3d6/0x530 [btrfs] [36700.671636] [a05bb1f4] ? btrfs_get_token_32+0x64/0xf0 [btrfs] [36700.671642] [a05a5fbb] btrfs_start_transaction+0x1b/0x20 [btrfs] [36700.671649] [a05c8a81] btrfs_uuid_scan_kthread+0x211/0x3d0 [btrfs] [36700.671655] [a05c8870] ? __btrfs_open_devices+0x2a0/0x2a0 [btrfs] [36700.671657] [81065fa0] kthread+0xc0/0xd0 [36700.671659] [81065ee0] ? flush_kthread_worker+0xb0/0xb0 [36700.671661] [816fcd1c] ret_from_fork+0x7c/0xb0 [36700.671662] [81065ee0] ? flush_kthread_worker+0xb0/0xb0 [36700.671663] INFO: task btrfs:15481 blocked for more than 120 seconds. [36700.671664] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [36700.671665] btrfs D 0 15481 15212 0x0004 [36700.671666] 880248cbf4c8 0086 8803d36ba700 8801dbd5c280 [36700.671668] 880807815c40 880248cbffd8 880248cbffd8 880248cbffd8 [36700.671669] 8805e86a 880807815c40 880248cbf4d8 8801dbd5c280 [36700.671670] Call Trace: [36700.671672] [816f36b9] schedule+0x29/0x70 [36700.671679] [a05d9b0d] btrfs_tree_lock+0x6d/0x230 [btrfs] [36700.671680] [81066760] ? add_wait_queue+0x60/0x60 [36700.671685] [a0582829] btrfs_search_slot+0x999/0xb00 [btrfs] [36700.671691] [a05bd9de] ? btrfs_lookup_first_ordered_extent+0x5e/0xb0 [btrfs] [36700.671698] [a05e3e54] __btrfs_write_out_cache+0x8c4/0xa80 [btrfs] [36700.671704] [a05e4362] btrfs_write_out_cache+0xb2/0xf0 [btrfs] [36700.671710] [a05c4441] ? free_extent_buffer+0x61/0xc0 [btrfs] [36700.671716] [a0594c82] btrfs_write_dirty_block_groups+0x562/0x650 [btrfs] [36700.671723] [a0610092] commit_cowonly_roots+0x171/0x24b [btrfs] [36700.671729] [a05a4dde] btrfs_commit_transaction+0x4fe/0xa10 [btrfs] [36700.671735] [a0610af3] create_subvol+0x5c0/0x636 [btrfs] [36700.671742] [a05d49ff] btrfs_mksubvol.isra.60+0x33f/0x3f0 [btrfs] [36700.671747] [a05d4bf2] btrfs_ioctl_snap_create_transid+0x142/0x190 [btrfs] [36700.671752] [a05d4c6c] ? btrfs_ioctl_snap_create+0x2c/0x80 [btrfs] [36700.671757] [a05d4c9e] btrfs_ioctl_snap_create+0x5e/0x80 [btrfs] [36700.671759] [8113a764] ? handle_pte_fault+0x84/0x920 [36700.671764] [a05d87eb] btrfs_ioctl+0xf0b/0x1d00 [btrfs] [36700.671766] [8113c120] ? handle_mm_fault+0x210/0x310 [36700.671768] [816f83a4] ? __do_page_fault+0x284/0x4e0 [36700.671770] [81180aa6] do_vfs_ioctl+0x96/0x550 [36700.671772] [81170fe3] ? __sb_end_write+0x33/0x70 [36700.671774] [81180ff1] SyS_ioctl+0x91/0xb0 [36700.671775] [816fcdc2] system_call_fastpath+0x16/0x1b Signed-off-by: Filipe David Borba Manana fdman...@gmail.com --- V2: Removed wrong assignment of NULL to transaction pointer, and addressed code style comment from Josef. V3: Removed unnecessary if statement to check if trans is NULL, as it can't be NULL anymore in that section of the loop. V4: Ensure that we don't leak a transaction handle if after looking again for items we don't end up updating anything in the uuid tree. Before we create a transaction, we release the path we found and then repeat the key search and check if the item still needs to be updated - if not we don't use the transaction for that iteration of the loop - if this is true for all iterations, we would leak the transaction handle. fs/btrfs/volumes.c | 39 ++- 1 file changed, 22 insertions(+), 17 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index f42e412..ec1cf0e 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -3465,7 +3465,7 @@ static int btrfs_uuid_scan_kthread(void *data) int slot;
Re: [PATCH v2] xfstests/btrfs/308: add snapshot-aware defrag for partial extents testcases
Thanks for the patch Liu Bo, it has been committed as test number 10. commit d943515bbcb8893c5d4ae21340a6022235d9e643 Author: Liu Bo bo.li@oracle.com Date: Thu Jul 25 08:15:02 2013 + xfstests/btrfs/010: add snapshot-aware defrag for partial extents testcases This is to test whether snapshot-aware defrag can work well on partial extents. Reviewed-by: Ben Myers b...@sgi.com Signed-off-by: Liu Bo bo.li@oracle.com Signed-off-by: Rich Johnston rjohns...@sgi.com -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: do not complete ordered extents that are truncated
Currently we will do the normal btrfs_finish_ordered_io if we invalidate a dirty page that is part of an ordered extent but hasn't yet been submitted for IO. The side effect of this is that when the rest of the ordered extent is completed it will still add its extent to the file and it's csums to the tree. This usually isn't a problem since this occurs right before a truncate, so these csums and extents are going to be cleaned up. However if there is an error doing the truncate we can leave extents that point to invalid data, and as such you'll get csum errors trying to read back this extent. So to handle this just set a flag on the ordered extent if we are going to invalidate one of its dirty pages, and then once we go to do the completion stuff we skip updating the file extents and adding our csums, we just free everything up and exit. This is preliminary work we need to have better orphan entry error handling. Thanks, Signed-off-by: Josef Bacik jba...@fusionio.com --- fs/btrfs/inode.c| 17 +++-- fs/btrfs/ordered-data.h |3 +++ 2 files changed, 18 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 1c86f9d..d212b36 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2562,8 +2562,9 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent) struct extent_state *cached_state = NULL; struct new_sa_defrag_extent *new = NULL; int compress_type = 0; - int ret; + int ret = 0; bool nolock; + bool truncated = false; nolock = btrfs_is_free_space_inode(inode); @@ -2572,6 +2573,11 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent) goto out; } + if (test_bit(BTRFS_ORDERED_TRUNCATED, ordered_extent-flags)) { + truncated = true; + goto out; + } + if (test_bit(BTRFS_ORDERED_NOCOW, ordered_extent-flags)) { BUG_ON(!list_empty(ordered_extent-list)); /* Logic error */ btrfs_ordered_update_i_size(inode, 0, ordered_extent); @@ -2667,7 +2673,7 @@ out: if (trans) btrfs_end_transaction(trans, root); - if (ret) { + if (ret || truncated) { clear_extent_uptodate(io_tree, ordered_extent-file_offset, ordered_extent-file_offset + ordered_extent-len - 1, NULL, GFP_NOFS); @@ -7348,6 +7354,13 @@ static void btrfs_invalidatepage(struct page *page, unsigned long offset) EXTENT_DIRTY | EXTENT_DELALLOC | EXTENT_LOCKED | EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG, 1, 0, cached_state, GFP_NOFS); + + /* +* Set this so that we don't actually do the extent update work +* for this ordered extent. +*/ + set_bit(BTRFS_ORDERED_TRUNCATED, ordered-flags); + /* * whoever cleared the private bit is responsible * for the finish_ordered_io diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h index 68844d5..3d7fe72 100644 --- a/fs/btrfs/ordered-data.h +++ b/fs/btrfs/ordered-data.h @@ -69,6 +69,9 @@ struct btrfs_ordered_sum { * the isize. */ #define BTRFS_ORDERED_LOGGED_CSUM 8 /* We've logged the csums on this ordered ordered extent */ +#define BTRFS_ORDERED_TRUNCATED 9 /* We truncated a page in this ordered +extent, do not allow the completion stuff +to occur */ struct btrfs_ordered_extent { /* logical offset in the file */ -- 1.7.7.6 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: do not complete ordered extents that are truncated
On Tue, Aug 27, 2013 at 01:20:14PM -0400, Josef Bacik wrote: Currently we will do the normal btrfs_finish_ordered_io if we invalidate a dirty page that is part of an ordered extent but hasn't yet been submitted for IO. The side effect of this is that when the rest of the ordered extent is completed it will still add its extent to the file and it's csums to the tree. This usually isn't a problem since this occurs right before a truncate, so these csums and extents are going to be cleaned up. However if there is an error doing the truncate we can leave extents that point to invalid data, and as such you'll get csum errors trying to read back this extent. So to handle this just set a flag on the ordered extent if we are going to invalidate one of its dirty pages, and then once we go to do the completion stuff we skip updating the file extents and adding our csums, we just free everything up and exit. This is preliminary work we need to have better orphan entry error handling. Thanks, Signed-off-by: Josef Bacik jba...@fusionio.com --- Ignore this, it is wrong. Josef -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] utils: add support for getting/changing file system, features
This patch adds support for getting and changing file system feature bits. It supports both unmounted and mounted operation via a new set of ioctls. Setting bits not supported by the running kernel, if the file system is mounted, or the tools, if the file system is unmounted, can be forced with a -f option. It supports clearing of feature bits, but I've left that undocumented intentionally. Signed-off-by: Jeff Mahoney je...@suse.com --- cmds-filesystem.c | 42 + ioctl.h | 12 ++ man/btrfs.8.in| 28 utils.c | 478 +- utils.h | 10 ++ 5 files changed, 569 insertions(+), 1 deletion(-) diff --git a/cmds-filesystem.c b/cmds-filesystem.c index f41a72a..ca096e0 100644 --- a/cmds-filesystem.c +++ b/cmds-filesystem.c @@ -515,6 +515,47 @@ static int cmd_label(int argc, char **argv) return get_label(argv[1]); } +static const char * const cmd_features_usage[] = { + btrfs filesystem features [device|mountpoint] [[-f] features], + Get or change the list of features currently enabled by a filesystem, + With one argument, get the currently enabled features of filesystem, + on device or mountpoint., , + If features is passed, add new features to the file system., + The format of new features can be a comma separated list of names or, + or a comma-separated list of specifiers of the following format: A, + prefix of compat, compat_ro, or incompat and a decimal number, , + separated by a colon: (e.g. compat:10), , + Features to be enabled will be confirmed against the supported, + features of the kernel (if mounted) or the progs (if unmounted). This, + behavior can be overridden when operating on unmounted, + filesystems by using the -f (force) flag., , + A list of features supported by the progs can be found in the manual., + NULL +}; + +static int cmd_features(int argc, char **argv) +{ + if (check_argc_min(argc, 2) || check_argc_max(argc, 4)) + usage(cmd_features_usage); + + if (argc 2) { + char *features = argv[2]; + int force = 0; + if (argc 3) { + if (!strcmp(argv[3], -f)) + force = 1; + else if (!strcmp(argv[2], -f)) { + force = 1; + features = argv[3]; + } else + usage(cmd_features_usage); + } + + return parse_and_set_features(argv[1], features, force); + } else + return get_features(argv[1]); +} + const struct cmd_group filesystem_cmd_group = { filesystem_cmd_group_usage, NULL, { { df, cmd_df, cmd_df_usage, NULL, 0 }, @@ -524,6 +565,7 @@ const struct cmd_group filesystem_cmd_group = { { balance, cmd_balance, NULL, balance_cmd_group, 1 }, { resize, cmd_resize, cmd_resize_usage, NULL, 0 }, { label, cmd_label, cmd_label_usage, NULL, 0 }, + { features, cmd_features, cmd_features_usage, NULL, 0 }, { 0, 0, 0, 0, 0 }, } }; diff --git a/ioctl.h b/ioctl.h index abe6dd4..44483d1 100644 --- a/ioctl.h +++ b/ioctl.h @@ -172,6 +172,12 @@ struct btrfs_ioctl_fs_info_args { __u64 reserved[124];/* pad to 1k */ }; +struct btrfs_ioctl_feature_flags { + __u64 compat_flags; + __u64 compat_ro_flags; + __u64 incompat_flags; +}; + /* balance control ioctl modes */ #define BTRFS_BALANCE_CTL_PAUSE1 #define BTRFS_BALANCE_CTL_CANCEL 2 @@ -537,6 +543,12 @@ struct btrfs_ioctl_clone_range_args { struct btrfs_ioctl_get_dev_stats) #define BTRFS_IOC_DEV_REPLACE _IOWR(BTRFS_IOCTL_MAGIC, 53, \ struct btrfs_ioctl_dev_replace_args) +#define BTRFS_IOC_GET_FEATURES _IOR(BTRFS_IOCTL_MAGIC, 54, \ + struct btrfs_ioctl_feature_flags) +#define BTRFS_IOC_ADD_FEATURES _IOW(BTRFS_IOCTL_MAGIC, 55, \ + struct btrfs_ioctl_feature_flags) +#define BTRFS_IOC_GET_SUPPORTED_FEATURES _IOR(BTRFS_IOCTL_MAGIC, 56, \ + struct btrfs_ioctl_feature_flags[2]) #ifdef __cplusplus } diff --git a/man/btrfs.8.in b/man/btrfs.8.in index af7df4d..9c5b61e 100644 --- a/man/btrfs.8.in +++ b/man/btrfs.8.in @@ -31,6 +31,8 @@ btrfs \- control a btrfs filesystem .PP \fBbtrfs\fP \fBfilesystem label\fP\fI dev [newlabel]\fP .PP +\fBbtrfs\fP \fBfilesystem features\fP\fI dev|mountpoint [[-f ]newlabel]\fP +.PP \fBbtrfs\fP \fBfilesystem balance\fP\fI path \fP .PP \fBbtrfs\fP \fBdevice scan\fP\fI [--all-devices|device [device...]]\fP @@ -280,6 +282,32 @@ NOTE: Currently there are the following limitations: - the filesystem should
[PATCH] btrfs: add ability to query/change feature bits online
There are some feature bits that require no offline setup and can be enabled online. I've only reviewed extended irefs, but there will probably be more. We introduce three new ioctls: - BTRFS_IOC_GET_SUPPORTED_FEATURES: query the kernel for supported features - BTRFS_IOC_GET_FEATURES: query the kernel for enabled features on a per-fs basis. - BTRFS_IOC_ADD_FEATURES: enable new features on a per-fs basis. We introduce a new mask _SAFE_ONLINE that allows us to define which features are safe to enable at runtime. The failure modes for BTRFS_IOC_ADD_FEATURES are as follows: - Enabling a completely unsupported feature: warns and returns -ENOTSUPP - Enabling a feature that can only be done offline: warns and returns -EPERM The structure passed in is not modified on return since it is possible to isolate which features would bounce a -EPERM by subtracting the features provided by a GET_SUPPORTED_FEATURES call. Signed-off-by: Jeff Mahoney je...@suse.com --- fs/btrfs/ctree.h |6 ++ fs/btrfs/ioctl.c | 127 + include/uapi/linux/btrfs.h | 12 3 files changed, 145 insertions(+) --- a/fs/btrfs/ctree.h 2013-08-27 11:02:35.062626912 -0400 +++ b/fs/btrfs/ctree.h 2013-08-27 11:03:32.002352821 -0400 @@ -512,7 +512,10 @@ struct btrfs_super_block { #define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA (1ULL 8) #define BTRFS_FEATURE_COMPAT_SUPP 0ULL +#define BTRFS_FEATURE_COMPAT_ONLINE_SAFE 0ULL #define BTRFS_FEATURE_COMPAT_RO_SUPP 0ULL +#define BTRFS_FEATURE_COMPAT_RO_ONLINE_SAFE0ULL + #define BTRFS_FEATURE_INCOMPAT_SUPP\ (BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF | \ BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL |\ @@ -523,6 +526,9 @@ struct btrfs_super_block { BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF | \ BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA) +#define BTRFS_FEATURE_INCOMPAT_ONLINE_SAFE \ + (BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF) + /* * A leaf is full of items. offset and size tell us where to find * the item in the leaf (relative to the start of the data area) --- a/fs/btrfs/ioctl.c 2013-08-27 11:02:35.062626912 -0400 +++ b/fs/btrfs/ioctl.c 2013-08-27 15:08:06.900740843 -0400 @@ -4098,6 +4098,127 @@ out_unlock: return ret; } +static int btrfs_ioctl_get_supported_features(struct file *file, + void __user *arg) +{ + struct btrfs_ioctl_feature_flags features[2]; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + features[0].compat_flags = BTRFS_FEATURE_COMPAT_SUPP; + features[0].compat_ro_flags = BTRFS_FEATURE_COMPAT_RO_SUPP; + features[0].incompat_flags = BTRFS_FEATURE_INCOMPAT_SUPP; + + features[1].compat_flags = BTRFS_FEATURE_COMPAT_ONLINE_SAFE; + features[1].compat_ro_flags = BTRFS_FEATURE_COMPAT_RO_ONLINE_SAFE; + features[1].incompat_flags = BTRFS_FEATURE_INCOMPAT_ONLINE_SAFE; + + if (copy_to_user(arg, features, sizeof(features))) + return -EFAULT; + + return 0; +} + +static int btrfs_ioctl_get_features(struct file *file, void __user *arg) +{ + struct btrfs_root *root = BTRFS_I(file_inode(file))-root; + struct btrfs_super_block *super_block = root-fs_info-super_copy; + struct btrfs_ioctl_feature_flags features; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + features.compat_flags = btrfs_super_compat_flags(super_block); + features.compat_ro_flags = btrfs_super_compat_ro_flags(super_block); + features.incompat_flags = btrfs_super_incompat_flags(super_block); + + if (copy_to_user(arg, features, sizeof(features))) + return -EFAULT; + + return 0; +} + +static int check_feature_bits(struct btrfs_root *root, const char *type, + u64 flags, u64 supported_flags, + u64 allowed_flags) +{ + u64 disallowed, unsupported; + + unsupported = flags ~supported_flags; + if (unsupported) { + btrfs_warn(root-fs_info, + this kernel does not support %s bits %llx, type, + unsupported); + return -EOPNOTSUPP; + } + + disallowed = flags ~allowed_flags; + if (disallowed) { + btrfs_warn(root-fs_info, + can't enable %s bits %llx while mounted, type, + disallowed); + return -EPERM; + } + + return 0; +} + +#define check_feature(root, flags, mask_base) \ +check_feature_bits(root, # mask_base, flags, \ + BTRFS_FEATURE_ ## mask_base ## _SUPP, \ + BTRFS_FEATURE_ ## mask_base ## _ONLINE_SAFE) + +static int btrfs_ioctl_add_features(struct file *file, void __user *arg) +{ + struct btrfs_root *root =
[PATCH 1/2] ctree: fix broken macros for compat_ro
The macro for compat_ro had the right name but was modifying compat instead. Signed-off-by: Jeff Mahoney je...@suse.com --- ctree.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ctree.h b/ctree.h index 0b0d701..8ef45f3 100644 --- a/ctree.h +++ b/ctree.h @@ -1912,7 +1912,7 @@ BTRFS_SETGET_STACK_FUNCS(super_num_devices, struct btrfs_super_block, BTRFS_SETGET_STACK_FUNCS(super_compat_flags, struct btrfs_super_block, compat_flags, 64); BTRFS_SETGET_STACK_FUNCS(super_compat_ro_flags, struct btrfs_super_block, -compat_flags, 64); +compat_ro_flags, 64); BTRFS_SETGET_STACK_FUNCS(super_incompat_flags, struct btrfs_super_block, incompat_flags, 64); BTRFS_SETGET_STACK_FUNCS(super_csum_type, struct btrfs_super_block, -- 1.8.1.4 -- Jeff Mahoney SUSE Labs -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: add support for asserts
On 8/27/13 9:47 AM, Josef Bacik wrote: On Mon, Aug 26, 2013 at 02:53:26PM -0700, Zach Brown wrote: With this we can go through and convert any BUG_ON()'s that we have to catch actual programming mistakes to the new ASSERT() and then fix everybody else to return errors. I like the sound of that! --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3814,6 +3814,22 @@ void btrfs_printk(const struct btrfs_fs_info *fs_info, const char *fmt, ...) #define btrfs_debug(fs_info, fmt, args...) \ btrfs_printk(fs_info, KERN_DEBUG fmt, ##args) +#ifdef BTRFS_ASSERT + +static inline void assfail(char *expr, char *file, int lin) +{ + printk(KERN_ERR BTRFS assertion failed: %s, file: %s, line: %d, + expr, file, line); + BUG(); +} I'm not sure why this is needed. +#define ASSERT(expr) \ + (unlikely(expr) ? (void)0 : assfail(#expr, __FILE__, __LINE__)) (Passing the assertion is unlikely()? I know, this is from xfs... still.) Yeah I copy+pasted and then thought about it after I sent it. I will fix it up. +#else +#define ASSERT(expr) ((void)0) +#endif Anyway, if you're going to do it this way, why not: #ifdef BTRFS_ASSERT #define btrfs_assert(cond) BUG_ON(!(cond)) #else #define btrfs_assert(cond) do { if (cond) ; } while (0) #endif I like the verbosity, especially with random kernel versions and such, it will help me figure out where we BUG_ON()'ed without having to checkout a particular version and go hunting. Thanks, Agreed. One of the positives of the obnoxious reiserfs warning IDs is that it uniquely identifies a call site across kernel versions. You can tell at a glance that it's the same failure you may have been chasing for a while. Anything to make the ID-at-a-glance easy is worth it. -Jeff -- Jeff Mahoney SUSE Labs signature.asc Description: OpenPGP digital signature
Re: [PATCH] Btrfs: add support for asserts
On 8/26/13 4:56 PM, Josef Bacik wrote: One of the complaints we get a lot is how many BUG_ON()'s we have. So to help with this I'm introducing a kconfig option to enable/disable a new ASSERT() mechanism much like what XFS does. This will allow us developers to still get our nice panics but allow users/distros to compile them out. With this we can go through and convert any BUG_ON()'s that we have to catch actual programming mistakes to the new ASSERT() and then fix everybody else to return errors. This will also allow developers to leave sanity checks in their new code to make sure we don't trip over problems while testing stuff and vetting new features. Thanks, I don't think the complaint is so much about the number of BUG_ONs, but that there's no distinction between something that is supposed to be impossible and something that is improbable. The BUG_ONs to keep code correctness are good and are littered all over the kernel with positive results. The BUG_ONs that are there in place of real error handling served their purpose and need to be replaced. So, I don't know if it's a net win to compile the good BUG_ONs out of the code. Especially if a user runs into something strange yet familiar and the first response is oh, huh, can you rebuild with asserts enabled? For the call sites that are unimplemented error handling, something to annotate those so that we can keep track of them and gradually eliminate those too would be good, though. -Jeff Signed-off-by: Josef Bacik jba...@fusionio.com --- fs/btrfs/Kconfig |9 + fs/btrfs/ctree.h | 16 2 files changed, 25 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig index 2b3b832..398cbd5 100644 --- a/fs/btrfs/Kconfig +++ b/fs/btrfs/Kconfig @@ -72,3 +72,12 @@ config BTRFS_DEBUG performance, or export extra information via sysfs. If unsure, say N. + +config BTRFS_ASSERT + bool Btrfs assert support + depends on BTRFS_FS + help + Enable run-time assertion checking. This will result in panics if + any of the assertions trip. This is meant for btrfs developers only. + + If unsure, say N. diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index c90be01..8278a3f 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3814,6 +3814,22 @@ void btrfs_printk(const struct btrfs_fs_info *fs_info, const char *fmt, ...) #define btrfs_debug(fs_info, fmt, args...) \ btrfs_printk(fs_info, KERN_DEBUG fmt, ##args) +#ifdef BTRFS_ASSERT + +static inline void assfail(char *expr, char *file, int lin) +{ + printk(KERN_ERR BTRFS assertion failed: %s, file: %s, line: %d, +expr, file, line); + BUG(); +} + +#define ASSERT(expr) \ + (unlikely(expr) ? (void)0 : assfail(#expr, __FILE__, __LINE__)) +#else +#define ASSERT(expr) ((void)0) +#endif + +#define btrfs_assert() __printf(5, 6) void __btrfs_std_error(struct btrfs_fs_info *fs_info, const char *function, unsigned int line, int errno, const char *fmt, ...); -- Jeff Mahoney SUSE Labs signature.asc Description: OpenPGP digital signature
[PATCH v5] Btrfs: fix deadlock in uuid scan kthread
If there's an ongoing transaction when the uuid scan kthread attempts to create one, the kthread will block, waiting for that transaction to finish while it's keeping locks on the tree root, and in turn the existing transaction is waiting for those locks to be free. The stack trace reported by the kernel follows. [36700.671601] INFO: task btrfs-uuid:15480 blocked for more than 120 seconds. [36700.671602] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [36700.671602] btrfs-uuid D 0 15480 2 0x [36700.671604] 880710bd5b88 0046 8803d36ba850 0003 [36700.671605] 8806d76dc530 880710bd5fd8 880710bd5fd8 880710bd5fd8 [36700.671607] 8808098ac530 8806d76dc530 880710bd5b98 8805e4508e40 [36700.671608] Call Trace: [36700.671610] [816f36b9] schedule+0x29/0x70 [36700.671620] [a05a3bdf] wait_current_trans.isra.33+0xbf/0x120 [btrfs] [36700.671623] [81066760] ? add_wait_queue+0x60/0x60 [36700.671629] [a05a5b06] start_transaction+0x3d6/0x530 [btrfs] [36700.671636] [a05bb1f4] ? btrfs_get_token_32+0x64/0xf0 [btrfs] [36700.671642] [a05a5fbb] btrfs_start_transaction+0x1b/0x20 [btrfs] [36700.671649] [a05c8a81] btrfs_uuid_scan_kthread+0x211/0x3d0 [btrfs] [36700.671655] [a05c8870] ? __btrfs_open_devices+0x2a0/0x2a0 [btrfs] [36700.671657] [81065fa0] kthread+0xc0/0xd0 [36700.671659] [81065ee0] ? flush_kthread_worker+0xb0/0xb0 [36700.671661] [816fcd1c] ret_from_fork+0x7c/0xb0 [36700.671662] [81065ee0] ? flush_kthread_worker+0xb0/0xb0 [36700.671663] INFO: task btrfs:15481 blocked for more than 120 seconds. [36700.671664] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [36700.671665] btrfs D 0 15481 15212 0x0004 [36700.671666] 880248cbf4c8 0086 8803d36ba700 8801dbd5c280 [36700.671668] 880807815c40 880248cbffd8 880248cbffd8 880248cbffd8 [36700.671669] 8805e86a 880807815c40 880248cbf4d8 8801dbd5c280 [36700.671670] Call Trace: [36700.671672] [816f36b9] schedule+0x29/0x70 [36700.671679] [a05d9b0d] btrfs_tree_lock+0x6d/0x230 [btrfs] [36700.671680] [81066760] ? add_wait_queue+0x60/0x60 [36700.671685] [a0582829] btrfs_search_slot+0x999/0xb00 [btrfs] [36700.671691] [a05bd9de] ? btrfs_lookup_first_ordered_extent+0x5e/0xb0 [btrfs] [36700.671698] [a05e3e54] __btrfs_write_out_cache+0x8c4/0xa80 [btrfs] [36700.671704] [a05e4362] btrfs_write_out_cache+0xb2/0xf0 [btrfs] [36700.671710] [a05c4441] ? free_extent_buffer+0x61/0xc0 [btrfs] [36700.671716] [a0594c82] btrfs_write_dirty_block_groups+0x562/0x650 [btrfs] [36700.671723] [a0610092] commit_cowonly_roots+0x171/0x24b [btrfs] [36700.671729] [a05a4dde] btrfs_commit_transaction+0x4fe/0xa10 [btrfs] [36700.671735] [a0610af3] create_subvol+0x5c0/0x636 [btrfs] [36700.671742] [a05d49ff] btrfs_mksubvol.isra.60+0x33f/0x3f0 [btrfs] [36700.671747] [a05d4bf2] btrfs_ioctl_snap_create_transid+0x142/0x190 [btrfs] [36700.671752] [a05d4c6c] ? btrfs_ioctl_snap_create+0x2c/0x80 [btrfs] [36700.671757] [a05d4c9e] btrfs_ioctl_snap_create+0x5e/0x80 [btrfs] [36700.671759] [8113a764] ? handle_pte_fault+0x84/0x920 [36700.671764] [a05d87eb] btrfs_ioctl+0xf0b/0x1d00 [btrfs] [36700.671766] [8113c120] ? handle_mm_fault+0x210/0x310 [36700.671768] [816f83a4] ? __do_page_fault+0x284/0x4e0 [36700.671770] [81180aa6] do_vfs_ioctl+0x96/0x550 [36700.671772] [81170fe3] ? __sb_end_write+0x33/0x70 [36700.671774] [81180ff1] SyS_ioctl+0x91/0xb0 [36700.671775] [816fcdc2] system_call_fastpath+0x16/0x1b Signed-off-by: Filipe David Borba Manana fdman...@gmail.com --- V2: Removed wrong assignment of NULL to transaction pointer, and addressed code style comment from Josef. V3: Removed unnecessary if statement to check if trans is NULL, as it can't be NULL anymore in that section of the loop. V4: Ensure that we don't leak a transaction handle if after looking again for items we don't end up updating anything in the uuid tree. Before we create a transaction, we release the path we found and then repeat the key search and check if the item still needs to be updated - if not we don't use the transaction for that iteration of the loop - if this is true for all iterations, we would leak the transaction handle. V5: If a transaction is not used in a loop iteration, end it before we call cond_resched() at the end of the loop body. This is more friendly for other tasks. fs/btrfs/volumes.c | 33 - 1 file changed, 20 insertions(+), 13 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index
[PATCH v6] Btrfs: fix deadlock in uuid scan kthread
If there's an ongoing transaction when the uuid scan kthread attempts to create one, the kthread will block, waiting for that transaction to finish while it's keeping locks on the tree root, and in turn the existing transaction is waiting for those locks to be free. The stack trace reported by the kernel follows. [36700.671601] INFO: task btrfs-uuid:15480 blocked for more than 120 seconds. [36700.671602] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [36700.671602] btrfs-uuid D 0 15480 2 0x [36700.671604] 880710bd5b88 0046 8803d36ba850 0003 [36700.671605] 8806d76dc530 880710bd5fd8 880710bd5fd8 880710bd5fd8 [36700.671607] 8808098ac530 8806d76dc530 880710bd5b98 8805e4508e40 [36700.671608] Call Trace: [36700.671610] [816f36b9] schedule+0x29/0x70 [36700.671620] [a05a3bdf] wait_current_trans.isra.33+0xbf/0x120 [btrfs] [36700.671623] [81066760] ? add_wait_queue+0x60/0x60 [36700.671629] [a05a5b06] start_transaction+0x3d6/0x530 [btrfs] [36700.671636] [a05bb1f4] ? btrfs_get_token_32+0x64/0xf0 [btrfs] [36700.671642] [a05a5fbb] btrfs_start_transaction+0x1b/0x20 [btrfs] [36700.671649] [a05c8a81] btrfs_uuid_scan_kthread+0x211/0x3d0 [btrfs] [36700.671655] [a05c8870] ? __btrfs_open_devices+0x2a0/0x2a0 [btrfs] [36700.671657] [81065fa0] kthread+0xc0/0xd0 [36700.671659] [81065ee0] ? flush_kthread_worker+0xb0/0xb0 [36700.671661] [816fcd1c] ret_from_fork+0x7c/0xb0 [36700.671662] [81065ee0] ? flush_kthread_worker+0xb0/0xb0 [36700.671663] INFO: task btrfs:15481 blocked for more than 120 seconds. [36700.671664] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [36700.671665] btrfs D 0 15481 15212 0x0004 [36700.671666] 880248cbf4c8 0086 8803d36ba700 8801dbd5c280 [36700.671668] 880807815c40 880248cbffd8 880248cbffd8 880248cbffd8 [36700.671669] 8805e86a 880807815c40 880248cbf4d8 8801dbd5c280 [36700.671670] Call Trace: [36700.671672] [816f36b9] schedule+0x29/0x70 [36700.671679] [a05d9b0d] btrfs_tree_lock+0x6d/0x230 [btrfs] [36700.671680] [81066760] ? add_wait_queue+0x60/0x60 [36700.671685] [a0582829] btrfs_search_slot+0x999/0xb00 [btrfs] [36700.671691] [a05bd9de] ? btrfs_lookup_first_ordered_extent+0x5e/0xb0 [btrfs] [36700.671698] [a05e3e54] __btrfs_write_out_cache+0x8c4/0xa80 [btrfs] [36700.671704] [a05e4362] btrfs_write_out_cache+0xb2/0xf0 [btrfs] [36700.671710] [a05c4441] ? free_extent_buffer+0x61/0xc0 [btrfs] [36700.671716] [a0594c82] btrfs_write_dirty_block_groups+0x562/0x650 [btrfs] [36700.671723] [a0610092] commit_cowonly_roots+0x171/0x24b [btrfs] [36700.671729] [a05a4dde] btrfs_commit_transaction+0x4fe/0xa10 [btrfs] [36700.671735] [a0610af3] create_subvol+0x5c0/0x636 [btrfs] [36700.671742] [a05d49ff] btrfs_mksubvol.isra.60+0x33f/0x3f0 [btrfs] [36700.671747] [a05d4bf2] btrfs_ioctl_snap_create_transid+0x142/0x190 [btrfs] [36700.671752] [a05d4c6c] ? btrfs_ioctl_snap_create+0x2c/0x80 [btrfs] [36700.671757] [a05d4c9e] btrfs_ioctl_snap_create+0x5e/0x80 [btrfs] [36700.671759] [8113a764] ? handle_pte_fault+0x84/0x920 [36700.671764] [a05d87eb] btrfs_ioctl+0xf0b/0x1d00 [btrfs] [36700.671766] [8113c120] ? handle_mm_fault+0x210/0x310 [36700.671768] [816f83a4] ? __do_page_fault+0x284/0x4e0 [36700.671770] [81180aa6] do_vfs_ioctl+0x96/0x550 [36700.671772] [81170fe3] ? __sb_end_write+0x33/0x70 [36700.671774] [81180ff1] SyS_ioctl+0x91/0xb0 [36700.671775] [816fcdc2] system_call_fastpath+0x16/0x1b Signed-off-by: Filipe David Borba Manana fdman...@gmail.com --- V2: Removed wrong assignment of NULL to transaction pointer, and addressed code style comment from Josef. V3: Removed unnecessary if statement to check if trans is NULL, as it can't be NULL anymore in that section of the loop. V4: Ensure that we don't leak a transaction handle if after looking again for items we don't end up updating anything in the uuid tree. Before we create a transaction, we release the path we found and then repeat the key search and check if the item still needs to be updated - if not we don't use the transaction for that iteration of the loop - if this is true for all iterations, we would leak the transaction handle. V5: If a transaction is not used in a loop iteration, end it before we call cond_resched() at the end of the loop body. This is more friendly for other tasks. V6: Removed redundant btrfs_end_transaction() calls on error, because this is already performed after we exit the loop. fs/btrfs/volumes.c | 35
Re: [PATCH] Btrfs: add support for asserts
On Tue, Aug 27, 2013 at 03:28:24PM -0400, Jeff Mahoney wrote: On 8/26/13 4:56 PM, Josef Bacik wrote: One of the complaints we get a lot is how many BUG_ON()'s we have. So to help with this I'm introducing a kconfig option to enable/disable a new ASSERT() mechanism much like what XFS does. This will allow us developers to still get our nice panics but allow users/distros to compile them out. With this we can go through and convert any BUG_ON()'s that we have to catch actual programming mistakes to the new ASSERT() and then fix everybody else to return errors. This will also allow developers to leave sanity checks in their new code to make sure we don't trip over problems while testing stuff and vetting new features. Thanks, I don't think the complaint is so much about the number of BUG_ONs, but that there's no distinction between something that is supposed to be impossible and something that is improbable. The BUG_ONs to keep code correctness are good and are littered all over the kernel with positive results. The BUG_ONs that are there in place of real error handling served their purpose and need to be replaced. So, I don't know if it's a net win to compile the good BUG_ONs out of the code. Especially if a user runs into something strange yet familiar and the first response is oh, huh, can you rebuild with asserts enabled? Either I provide an option for it or distros do it themselves, this cuts out the middle man. I'd really rather they just be on all the time since they aren't things we should hit anyway, but at least this way people have a choice. Thanks, -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: add support for asserts
On 8/27/13 4:56 PM, Josef Bacik wrote: On Tue, Aug 27, 2013 at 03:28:24PM -0400, Jeff Mahoney wrote: On 8/26/13 4:56 PM, Josef Bacik wrote: One of the complaints we get a lot is how many BUG_ON()'s we have. So to help with this I'm introducing a kconfig option to enable/disable a new ASSERT() mechanism much like what XFS does. This will allow us developers to still get our nice panics but allow users/distros to compile them out. With this we can go through and convert any BUG_ON()'s that we have to catch actual programming mistakes to the new ASSERT() and then fix everybody else to return errors. This will also allow developers to leave sanity checks in their new code to make sure we don't trip over problems while testing stuff and vetting new features. Thanks, I don't think the complaint is so much about the number of BUG_ONs, but that there's no distinction between something that is supposed to be impossible and something that is improbable. The BUG_ONs to keep code correctness are good and are littered all over the kernel with positive results. The BUG_ONs that are there in place of real error handling served their purpose and need to be replaced. So, I don't know if it's a net win to compile the good BUG_ONs out of the code. Especially if a user runs into something strange yet familiar and the first response is oh, huh, can you rebuild with asserts enabled? Either I provide an option for it or distros do it themselves, this cuts out the middle man. I'd really rather they just be on all the time since they aren't things we should hit anyway, but at least this way people have a choice. Ok. With my distro hat on, I can tell you I'll be leaving them on. :) -Jeff -- Jeff Mahoney SUSE Labs signature.asc Description: OpenPGP digital signature
Re: [PATCH] Btrfs: add support for asserts
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 8/27/13 4:07 PM, Jeff Mahoney wrote: On 8/27/13 4:56 PM, Josef Bacik wrote: On Tue, Aug 27, 2013 at 03:28:24PM -0400, Jeff Mahoney wrote: On 8/26/13 4:56 PM, Josef Bacik wrote: One of the complaints we get a lot is how many BUG_ON()'s we have. So to help with this I'm introducing a kconfig option to enable/disable a new ASSERT() mechanism much like what XFS does. This will allow us developers to still get our nice panics but allow users/distros to compile them out. With this we can go through and convert any BUG_ON()'s that we have to catch actual programming mistakes to the new ASSERT() and then fix everybody else to return errors. This will also allow developers to leave sanity checks in their new code to make sure we don't trip over problems while testing stuff and vetting new features. Thanks, I don't think the complaint is so much about the number of BUG_ONs, but that there's no distinction between something that is supposed to be impossible and something that is improbable. The BUG_ONs to keep code correctness are good and are littered all over the kernel with positive results. The BUG_ONs that are there in place of real error handling served their purpose and need to be replaced. So, I don't know if it's a net win to compile the good BUG_ONs out of the code. Especially if a user runs into something strange yet familiar and the first response is oh, huh, can you rebuild with asserts enabled? Either I provide an option for it or distros do it themselves, this cuts out the middle man. I'd really rather they just be on all the time since they aren't things we should hit anyway, but at least this way people have a choice. Ok. With my distro hat on, I can tell you I'll be leaving them on. :) -Jeff XFS also has XFS_WARN as a config option, which keeps all the assertions in place, but printk's backtraces w/o the icky BUG(). That might be good to add as well, and perhaps best for a shipping distro (vs. a developer debugging who might want to drop a core file when the assert trips). - -Eric -BEGIN PGP SIGNATURE- Version: GnuPG/MacGPG2 v2.0.17 (Darwin) Comment: GPGTools - http://gpgtools.org Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJSHRhOAAoJECCuFpLhPd7gYbcP/034ADG3dwTa83FaAWuAurg7 byKWG4EwRqt3PYjUgruxBJAc426O7tz6j1NNTrAwZys9/GJOsisPShA8gO0f+W/A +bQZJlXoUMbbwVPMcCqsnKMKlXNyKoqgME9AUQOrzMB/SgDtC9Y/OgdqgWF/58UV X1KC3OOtcfQr/1t19AZuNhJ5oHfytoscv3nnnW5872t1JtL8daomak4fyDuRKgRV 45kQ726nafUlXNmi1TG8GadlcmKxxbBm0vt2ui6RtZWVauPE4Gej+iEUux9WtwSc 48eOQ5iqbFVzC8v++Rc1eT28mBIjSetr+O/Tk+VL4TvYCKA2trMAltNAFinv9AB0 Q+Z9F1K26aFe/Z/gcM57j+c0VOkv1tvSElF1iJcVHPuRvV7k+548g+KVzbXNDPBP vuV2fnUCpw/XHQlrI+efYLs7Ies0TuV2eGPhmbKWjhossPwOeng71zxuiXSbNMBE gVcHg6idXjCdaCCIYuJr8+5K4ngnpTEbAUs4C2x6iHzuHZcXScEHWYU/nHvizElL bCZ162QSeQZAd+NgSzoZSmv4XqFMj6c4q60XvhAuu3fpkVPVY4GshcFhT14Onhfl /054HqdQIXjUGOdbeuUwmXoaqzpSKDhBmZ0G+ykarD1KRCaEW61JrnFepRPO7G69 Q3oiPIbvdvCw3BZAJaGL =7vvX -END PGP SIGNATURE- -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: add support for asserts
On 8/27/13 5:21 PM, Eric Sandeen wrote: On 8/27/13 4:07 PM, Jeff Mahoney wrote: On 8/27/13 4:56 PM, Josef Bacik wrote: On Tue, Aug 27, 2013 at 03:28:24PM -0400, Jeff Mahoney wrote: On 8/26/13 4:56 PM, Josef Bacik wrote: One of the complaints we get a lot is how many BUG_ON()'s we have. So to help with this I'm introducing a kconfig option to enable/disable a new ASSERT() mechanism much like what XFS does. This will allow us developers to still get our nice panics but allow users/distros to compile them out. With this we can go through and convert any BUG_ON()'s that we have to catch actual programming mistakes to the new ASSERT() and then fix everybody else to return errors. This will also allow developers to leave sanity checks in their new code to make sure we don't trip over problems while testing stuff and vetting new features. Thanks, I don't think the complaint is so much about the number of BUG_ONs, but that there's no distinction between something that is supposed to be impossible and something that is improbable. The BUG_ONs to keep code correctness are good and are littered all over the kernel with positive results. The BUG_ONs that are there in place of real error handling served their purpose and need to be replaced. So, I don't know if it's a net win to compile the good BUG_ONs out of the code. Especially if a user runs into something strange yet familiar and the first response is oh, huh, can you rebuild with asserts enabled? Either I provide an option for it or distros do it themselves, this cuts out the middle man. I'd really rather they just be on all the time since they aren't things we should hit anyway, but at least this way people have a choice. Ok. With my distro hat on, I can tell you I'll be leaving them on. :) -Jeff XFS also has XFS_WARN as a config option, which keeps all the assertions in place, but printk's backtraces w/o the icky BUG(). That might be good to add as well, and perhaps best for a shipping distro (vs. a developer debugging who might want to drop a core file when the assert trips). Isn't that the distinction between BUG_ON and WARN_ON? If it's worth a BUG_ON, things should be bad enough (or could result in being bad enough) that we want to bail out. -Jeff -- Jeff Mahoney SUSE Labs signature.asc Description: OpenPGP digital signature
Re: [PATCH] Btrfs: add support for asserts
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 8/27/13 4:25 PM, Jeff Mahoney wrote: On 8/27/13 5:21 PM, Eric Sandeen wrote: On 8/27/13 4:07 PM, Jeff Mahoney wrote: On 8/27/13 4:56 PM, Josef Bacik wrote: On Tue, Aug 27, 2013 at 03:28:24PM -0400, Jeff Mahoney wrote: On 8/26/13 4:56 PM, Josef Bacik wrote: One of the complaints we get a lot is how many BUG_ON()'s we have. So to help with this I'm introducing a kconfig option to enable/disable a new ASSERT() mechanism much like what XFS does. This will allow us developers to still get our nice panics but allow users/distros to compile them out. With this we can go through and convert any BUG_ON()'s that we have to catch actual programming mistakes to the new ASSERT() and then fix everybody else to return errors. This will also allow developers to leave sanity checks in their new code to make sure we don't trip over problems while testing stuff and vetting new features. Thanks, I don't think the complaint is so much about the number of BUG_ONs, but that there's no distinction between something that is supposed to be impossible and something that is improbable. The BUG_ONs to keep code correctness are good and are littered all over the kernel with positive results. The BUG_ONs that are there in place of real error handling served their purpose and need to be replaced. So, I don't know if it's a net win to compile the good BUG_ONs out of the code. Especially if a user runs into something strange yet familiar and the first response is oh, huh, can you rebuild with asserts enabled? Either I provide an option for it or distros do it themselves, this cuts out the middle man. I'd really rather they just be on all the time since they aren't things we should hit anyway, but at least this way people have a choice. Ok. With my distro hat on, I can tell you I'll be leaving them on. :) -Jeff XFS also has XFS_WARN as a config option, which keeps all the assertions in place, but printk's backtraces w/o the icky BUG(). That might be good to add as well, and perhaps best for a shipping distro (vs. a developer debugging who might want to drop a core file when the assert trips). Isn't that the distinction between BUG_ON and WARN_ON? If it's worth a BUG_ON, things should be bad enough (or could result in being bad enough) that we want to bail out. -Jeff Maybe; just FWIW here was Dave's rationale for xfs. Right now btrfs doesn't have the behavior-changing side effect (no BTRFS_DEBUG config) though, so maybe the distinction is less important... xfs: introduce CONFIG_XFS_WARN Running a CONFIG_XFS_DEBUG kernel in production environments is not the best idea as it introduces significant overhead, can change the behaviour of algorithms (such as allocation) to improve test coverage, and (most importantly) panic the machine on non-fatal errors. There are many cases where all we want to do is run a kernel with more bounds checking enabled, such as is provided by the ASSERT() statements throughout the code, but without all the potential overhead and drawbacks. This patch converts all the ASSERT statements to evaluate as WARN_ON(1) statements and hence if they fail dump a warning and a stack trace to the log. This has minimal overhead and does not change any algorithms, and will allow us to find strange out of bounds problems more easily on production machines. There are a few places where assert statements contain debug only code. These are converted to be debug-or-warn only code so that we still get all the assert checks in the code. Signed-off-by: Dave Chinner dchin...@redhat.com -BEGIN PGP SIGNATURE- Version: GnuPG/MacGPG2 v2.0.17 (Darwin) Comment: GPGTools - http://gpgtools.org Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJSHRn5AAoJECCuFpLhPd7gpsQQAJlFGX/t9b/LASxSisiv/wdL ZoQXHhCzxpzVVdsWstzOY0bVXw8vdsG2E+nmih2S6T7AzkoqPDoEnYE9CqpNQFFy Ca/kJOcfE1T4mIwKZwLHATkJX0V/S6nY7jPa7xdcseie+1H7ldSPaM5Jb6fkvXg/ 8lNPTikGeoRJdUwQN4xxNgsivITfJpl65Z+AVg5UAUqqUKZtYZLfVeAlyQFKvOyl /am80yLLzhFODtV3GcWkaYcInBaB2AaVlqHrpTnf53gG9JGynyFjnZGlysz0flSs wstNKLOon+wNBg1Dz0HrUSVma87g5hc1WtaZFC/qI3uHuoatsxOWxG6+LXZlr2CN Jsq3ZwHHxOs4MLgyEYlSirpgKqn/aKA+J8O0mNlltBj2lpU2hKgPS7dmMw5o8VAM 1uei1er15eBlCY0uBncRXIcLEcXfRXo9b69ErQBIbCN7xrGyWdbZ/DVtElaFeImh Lw+iBXebBbw6SCqCMZFc3vpYdF+9RP6shImBlsqxTzKs5M1gISrFtCF0GqOuPrWt 7jyrredhpKACAaOpxPW8UWh2vL+q51JWzzZKYE35Gy4M/8E64TQ0rYhLGj7x+TYU FWYzpONK0x7XbmgtEKTutwi9w+vfSlMzFNpUavwFeTZIh8Dw1tEO3dBn59Rs9Oz8 Widxpe+hqz/qK/0O4rTb =4nmO -END PGP SIGNATURE- -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: add support for asserts
On 8/27/13 5:28 PM, Eric Sandeen wrote: On 8/27/13 4:25 PM, Jeff Mahoney wrote: On 8/27/13 5:21 PM, Eric Sandeen wrote: On 8/27/13 4:07 PM, Jeff Mahoney wrote: On 8/27/13 4:56 PM, Josef Bacik wrote: On Tue, Aug 27, 2013 at 03:28:24PM -0400, Jeff Mahoney wrote: On 8/26/13 4:56 PM, Josef Bacik wrote: One of the complaints we get a lot is how many BUG_ON()'s we have. So to help with this I'm introducing a kconfig option to enable/disable a new ASSERT() mechanism much like what XFS does. This will allow us developers to still get our nice panics but allow users/distros to compile them out. With this we can go through and convert any BUG_ON()'s that we have to catch actual programming mistakes to the new ASSERT() and then fix everybody else to return errors. This will also allow developers to leave sanity checks in their new code to make sure we don't trip over problems while testing stuff and vetting new features. Thanks, I don't think the complaint is so much about the number of BUG_ONs, but that there's no distinction between something that is supposed to be impossible and something that is improbable. The BUG_ONs to keep code correctness are good and are littered all over the kernel with positive results. The BUG_ONs that are there in place of real error handling served their purpose and need to be replaced. So, I don't know if it's a net win to compile the good BUG_ONs out of the code. Especially if a user runs into something strange yet familiar and the first response is oh, huh, can you rebuild with asserts enabled? Either I provide an option for it or distros do it themselves, this cuts out the middle man. I'd really rather they just be on all the time since they aren't things we should hit anyway, but at least this way people have a choice. Ok. With my distro hat on, I can tell you I'll be leaving them on. :) -Jeff XFS also has XFS_WARN as a config option, which keeps all the assertions in place, but printk's backtraces w/o the icky BUG(). That might be good to add as well, and perhaps best for a shipping distro (vs. a developer debugging who might want to drop a core file when the assert trips). Isn't that the distinction between BUG_ON and WARN_ON? If it's worth a BUG_ON, things should be bad enough (or could result in being bad enough) that we want to bail out. -Jeff Maybe; just FWIW here was Dave's rationale for xfs. Right now btrfs doesn't have the behavior-changing side effect (no BTRFS_DEBUG config) though, so maybe the distinction is less important... Yeah, I'd agree with the distinction not being there in btrfs (yet). ReiserFS has a similar mode where there are a ton of checks that are optionally enabled and does invasive things that can slow things down. It's disabled pretty much universally AFAIK. One of the things (low) on my TODO list is to go through all of those and move them into regular checks since some of them are the types of things fsfuzzer likes to trip over. -Jeff xfs: introduce CONFIG_XFS_WARN Running a CONFIG_XFS_DEBUG kernel in production environments is not the best idea as it introduces significant overhead, can change the behaviour of algorithms (such as allocation) to improve test coverage, and (most importantly) panic the machine on non-fatal errors. There are many cases where all we want to do is run a kernel with more bounds checking enabled, such as is provided by the ASSERT() statements throughout the code, but without all the potential overhead and drawbacks. This patch converts all the ASSERT statements to evaluate as WARN_ON(1) statements and hence if they fail dump a warning and a stack trace to the log. This has minimal overhead and does not change any algorithms, and will allow us to find strange out of bounds problems more easily on production machines. There are a few places where assert statements contain debug only code. These are converted to be debug-or-warn only code so that we still get all the assert checks in the code. Signed-off-by: Dave Chinner dchin...@redhat.com -- Jeff Mahoney SUSE Labs signature.asc Description: OpenPGP digital signature
Re: bad unlock balance in btrfs_commit_transaction_async
Another developer just noticed this in testing; anyone have any ideas? On 08/22/2013 05:40 PM, Sage Weil wrote: I just noticed that there is a locking imbalance warning with sb_internal in the transaction commit code. I believe this has only started appearing recently (after I merged -rc5 into my testing tree), but I'm working on confirming that. The error is 4[27034.835134] = 4[27034.839854] [ BUG: bad unlock balance detected! ] 4[27034.844576] 3.11.0-rc5-ceph-00061-g546140d #1 Not tainted 4[27034.849992] - 4[27034.854713] ceph-osd/30797 is trying to release lock (sb_internal) at: 4[27034.861304] [a0148fd8] btrfs_commit_transaction_async+0x1c8/0x2c0 [btrfs] 4[27034.868994] but there are no more locks to release! 4[27034.873887] 4[27034.873887] other info that might help us debug this: 4[27034.880448] no locks held by ceph-osd/30797. 4[27034.884733] 4[27034.884733] stack backtrace: 4[27034.889123] CPU: 0 PID: 30797 Comm: ceph-osd Not tainted 3.11.0-rc5-ceph-00061-g546140d #1 4[27034.897421] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011 4[27034.904938] a0148fd8 88020baf9c68 81642d85 0007 4[27034.912411] 88021b32deb0 88020baf9c98 810ab89e 88020cff8000 4[27034.919883] 0246 88020aaeddd0 a0148fd8 88020baf9ce8 4[27034.927358] Call Trace: 4[27034.929836] [a0148fd8] ? btrfs_commit_transaction_async+0x1c8/0x2c0 [btrfs] 4[27034.937790] [81642d85] dump_stack+0x46/0x58 4[27034.942951] [810ab89e] print_unlock_imbalance_bug+0xfe/0x110 4[27034.949599] [a0148fd8] ? btrfs_commit_transaction_async+0x1c8/0x2c0 [btrfs] 4[27034.957552] [810aeafe] lock_release+0x15e/0x220 4[27034.963069] [a0148fff] btrfs_commit_transaction_async+0x1ef/0x2c0 [btrfs] 4[27034.970850] [810af6f5] ? trace_hardirqs_on_caller+0x105/0x1d0 4[27034.977587] [a0177907] btrfs_ioctl_start_sync+0x47/0xc0 [btrfs] 4[27034.984499] [a017c575] btrfs_ioctl+0xe55/0x1af0 [btrfs] 4[27034.990700] [8164ab2b] ? _raw_spin_unlock+0x2b/0x40 4[27034.996552] [810b40b5] ? do_futex+0xa45/0xbb0 4[27035.001885] [8119cf0c] ? fget_light+0x3c/0x130 4[27035.007302] [811922a6] do_vfs_ioctl+0x96/0x560 4[27035.012720] [8119cf6e] ? fget_light+0x9e/0x130 4[27035.018137] [8119cf0c] ? fget_light+0x3c/0x130 4[27035.023556] [81192801] SyS_ioctl+0x91/0xb0 4[27035.028628] [813338fe] ? trace_hardirqs_on_thunk+0x3a/0x3f 4[27035.035089] [81653782] system_call_fastpath+0x16/0x1b This is presumably some breakage in the freeze locking dance that goes on with async commits (btrfs_commit_transaction_async caller takes the freeze semaphore, do_async_commit releases it), but it's not obvious to me what broke yet. Unless this rings any bells for anyone, I'll go ahead and bisect. Thanks! sage -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: bad unlock balance in btrfs_commit_transaction_async
On Wed, Aug 28, 2013 at 8:56 AM, Dan Mick dan.m...@inktank.com wrote: Another developer just noticed this in testing; anyone have any ideas? btrfs_ioctl_start_sync() calls btrfs_attach_transaction_barrier() which further calls start_transaction() with type == TRANS_ATTACH. In start_transaction(), sb_start_intwrite() is called when (type __TRANS_FREEZABLE) is true. but (TRANS_ATTACH __TRANS_FREEZABLE) is false. So we see the bad bad unlock balance bug. Yan, Zheng On 08/22/2013 05:40 PM, Sage Weil wrote: I just noticed that there is a locking imbalance warning with sb_internal in the transaction commit code. I believe this has only started appearing recently (after I merged -rc5 into my testing tree), but I'm working on confirming that. The error is 4[27034.835134] = 4[27034.839854] [ BUG: bad unlock balance detected! ] 4[27034.844576] 3.11.0-rc5-ceph-00061-g546140d #1 Not tainted 4[27034.849992] - 4[27034.854713] ceph-osd/30797 is trying to release lock (sb_internal) at: 4[27034.861304] [a0148fd8] btrfs_commit_transaction_async+0x1c8/0x2c0 [btrfs] 4[27034.868994] but there are no more locks to release! 4[27034.873887] 4[27034.873887] other info that might help us debug this: 4[27034.880448] no locks held by ceph-osd/30797. 4[27034.884733] 4[27034.884733] stack backtrace: 4[27034.889123] CPU: 0 PID: 30797 Comm: ceph-osd Not tainted 3.11.0-rc5-ceph-00061-g546140d #1 4[27034.897421] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011 4[27034.904938] a0148fd8 88020baf9c68 81642d85 0007 4[27034.912411] 88021b32deb0 88020baf9c98 810ab89e 88020cff8000 4[27034.919883] 0246 88020aaeddd0 a0148fd8 88020baf9ce8 4[27034.927358] Call Trace: 4[27034.929836] [a0148fd8] ? btrfs_commit_transaction_async+0x1c8/0x2c0 [btrfs] 4[27034.937790] [81642d85] dump_stack+0x46/0x58 4[27034.942951] [810ab89e] print_unlock_imbalance_bug+0xfe/0x110 4[27034.949599] [a0148fd8] ? btrfs_commit_transaction_async+0x1c8/0x2c0 [btrfs] 4[27034.957552] [810aeafe] lock_release+0x15e/0x220 4[27034.963069] [a0148fff] btrfs_commit_transaction_async+0x1ef/0x2c0 [btrfs] 4[27034.970850] [810af6f5] ? trace_hardirqs_on_caller+0x105/0x1d0 4[27034.977587] [a0177907] btrfs_ioctl_start_sync+0x47/0xc0 [btrfs] 4[27034.984499] [a017c575] btrfs_ioctl+0xe55/0x1af0 [btrfs] 4[27034.990700] [8164ab2b] ? _raw_spin_unlock+0x2b/0x40 4[27034.996552] [810b40b5] ? do_futex+0xa45/0xbb0 4[27035.001885] [8119cf0c] ? fget_light+0x3c/0x130 4[27035.007302] [811922a6] do_vfs_ioctl+0x96/0x560 4[27035.012720] [8119cf6e] ? fget_light+0x9e/0x130 4[27035.018137] [8119cf0c] ? fget_light+0x3c/0x130 4[27035.023556] [81192801] SyS_ioctl+0x91/0xb0 4[27035.028628] [813338fe] ? trace_hardirqs_on_thunk+0x3a/0x3f 4[27035.035089] [81653782] system_call_fastpath+0x16/0x1b This is presumably some breakage in the freeze locking dance that goes on with async commits (btrfs_commit_transaction_async caller takes the freeze semaphore, do_async_commit releases it), but it's not obvious to me what broke yet. Unless this rings any bells for anyone, I'll go ahead and bisect. Thanks! sage -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] btrfs-progs: prevent mkfs from aborting with small volume
(2013/08/26 23:23), Eric Sandeen wrote: Thanks for looking into this - how small of a device did you test? I tried a 2MB device w/ these 2 patches and still got: [btrfs-progs]# truncate --size=2m testfile [btrfs-progs]# ./mkfs.btrfs testfile WARNING! - Btrfs v0.20-rc1-360-geeeb4e9 IS EXPERIMENTAL WARNING! - see http://btrfs.wiki.kernel.org before using SMALL VOLUME: forcing mixed metadata/data groups mkfs.btrfs: volumes.c:857: btrfs_alloc_chunk: Assertion `!(ret)' failed. Aborted (core dumped) which was at: ret = btrfs_alloc_dev_extent(trans, device, info-chunk_root-root_key.objectid, BTRFS_FIRST_CHUNK_TREE_OBJECTID, key.offset, calc_size, dev_offset); BUG_ON(ret); :( Wow... It seems that this abort is different problem from the bug which my patches are going to fix. I'll try to make new patch to fix this problem. Also, I'm curious - I know the code existed before your patch 2/2, but I don't understand why it reserves 1MB for the first superblock when the first superblock is actually at 64k. Any idea? -Eric I'm not sure... According to the git-log, this 1M trick is in the following old commit by Chris: commit a6de0bd778475504f42a142c83b8077993cbddfe Author: Chris Mason chris.ma...@oracle.com Date: Thu Apr 3 16:35:48 2008 -0400 Add mirroring support across multiple drives Thanks, H.Seto -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html