[PATCH 1/3] Btrfs: do not log extents when we only log new names
When we log new names, we need to log just enough to recreate the inode
during log replay, and there is no need to log extents along with it.

This also fixes a bug revealed by xfstests 241: we were logging extents
whose metadata had not been updated yet, so there were no proper
EXTENT_DATA items to copy to the log tree.

Signed-off-by: Liu Bo <bo.li@oracle.com>
---
 fs/btrfs/tree-log.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 81e407d..4ec41ec 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3435,7 +3435,8 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
 			ret = btrfs_truncate_inode_items(trans, log,
 							 inode, 0, 0);
 		} else {
-			fast_search = true;
+			if (inode_only == LOG_INODE_ALL)
+				fast_search = true;
 			max_key.type = BTRFS_XATTR_ITEM_KEY;
 			ret = drop_objectid_items(trans, log, path, ino,
 						  BTRFS_XATTR_ITEM_KEY);
-- 
1.7.7.6

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] Btrfs: get right arguments for btrfs_wait_ordered_range
btrfs_wait_ordered_range expects 'len' as its third argument, not 'end'.

Signed-off-by: Liu Bo <bo.li@oracle.com>
---
 fs/btrfs/file.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 9ab1bed..d2df981 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1562,7 +1562,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	 * range being left.
 	 */
 	atomic_inc(&root->log_batch);
-	btrfs_wait_ordered_range(inode, start, end);
+	btrfs_wait_ordered_range(inode, start, end - start + 1);
 	atomic_inc(&root->log_batch);

 	/*
-- 
1.7.7.6
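The off-by-one in this conversion is easy to get wrong, so here is a minimal sketch of the arithmetic, assuming 'end' is the inclusive last byte of the range as the "end - start + 1" in the patch suggests. The helper name is hypothetical and is not part of btrfs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a btrfs function): turn an inclusive byte
 * range [start, end] into the length that a (start, len) style
 * interface expects.
 */
uint64_t inclusive_range_len(uint64_t start, uint64_t end)
{
	assert(end >= start);		/* an inclusive range is never empty */
	return end - start + 1;		/* both endpoints are counted */
}
```

With start = 4096 and end = 8191 this yields 4096 bytes, whereas passing 'end' itself as the length would describe a range nearly twice as long.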
[PATCH 3/3] Btrfs: cleanup for btrfs_wait_ordered_range
Variable 'found' is no longer used.

Signed-off-by: Liu Bo <bo.li@oracle.com>
---
 fs/btrfs/ordered-data.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 7772f02..7f75bea 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -606,7 +606,6 @@ void btrfs_wait_ordered_range(struct inode *inode, u64 start, u64 len)
 	u64 end;
 	u64 orig_end;
 	struct btrfs_ordered_extent *ordered;
-	int found;

 	if (start + len < start) {
 		orig_end = INT_LIMIT(loff_t);
@@ -642,7 +641,6 @@ void btrfs_wait_ordered_range(struct inode *inode, u64 start, u64 len)
 	filemap_fdatawait_range(inode->i_mapping, start, orig_end);

 	end = orig_end;
-	found = 0;
 	while (1) {
 		ordered = btrfs_lookup_first_ordered_extent(inode, end);
 		if (!ordered)
@@ -655,7 +653,6 @@ void btrfs_wait_ordered_range(struct inode *inode, u64 start, u64 len)
 			btrfs_put_ordered_extent(ordered);
 			break;
 		}
-		found++;
 		btrfs_start_ordered_extent(inode, ordered, 1);
 		end = ordered->file_offset;
 		btrfs_put_ordered_extent(ordered);
-- 
1.7.7.6
kernel BUG at fs/btrfs/ctree.c:2950
Hi,

I have a torture test[1] that I run to test stable page writeback. When I
run it against btrfs (3.7.0-rc3) I observe the kernel bug message[2] that
I've attached at the end of this message. The test program spawns a bunch
of threads, which try to rewrite file blocks either through mmap or through
regular pwrite calls. I'll take a look at it tomorrow, but I was wondering
if this caught anybody's eye...

# gcc -o wac wac.c
# mount /dev/path/to/a/btrfs /mnt
# ./wac -l 65536 -n 32 -m 32 -f -r /mnt/victimfile
wait a few seconds
kaboom

--D

[1] http://djwong.org/docs/wac.c
[2] Double stack-trace:

[ 9713.768350] ------------[ cut here ]------------
[ 9713.773188] WARNING: at /storage/home/djwong/cdev/work/linux-spw/fs/btrfs/tree-log.c:3716 btrfs_log_inode_parent+0x427/0x480 [btrfs]()
[ 9713.778876] Hardware name: Bochs
[ 9713.780877] Modules linked in: btrfs sd_mod scsi_debug ext4 mbcache jbd2 scsi_mod sch_fq_codel nfsv4 eeprom nfsd auth_rpcgss exportfs af_packet raid1 raid0 md_mod zlib_deflate libcrc32c [last unloaded: sd_mod]
[ 9713.786778] Pid: 1992, comm: wac Not tainted 3.7.0-rc3-spw #42
[ 9713.789183] Call Trace:
[ 9713.791102] [8105210f] warn_slowpath_common+0x7f/0xc0
[ 9713.793428] [8105216a] warn_slowpath_null+0x1a/0x20
[ 9713.795826] [a030ad57] btrfs_log_inode_parent+0x427/0x480 [btrfs]
[ 9713.798310] [8117c97c] ? dget_parent+0x1c/0xe0
[ 9713.800653] [a030adf6] btrfs_log_dentry_safe+0x46/0x70 [btrfs]
[ 9713.803149] [a02e21c6] btrfs_sync_file+0x1a6/0x240 [btrfs]
[ 9713.805626] [81550f49] ? sysret_check+0x22/0x5d
[ 9713.807861] [8119705d] do_fsync+0x5d/0x90
[ 9713.810077] [812595ae] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 9713.812642] [81197460] sys_fsync+0x10/0x20
[ 9713.814793] [81550f1d] system_call_fastpath+0x1a/0x1f
[ 9713.817178] ---[ end trace 47f6b9aede5fa6f5 ]---
[ 9715.884354] ------------[ cut here ]------------
[ 9715.888197] kernel BUG at /storage/home/djwong/cdev/work/linux-spw/fs/btrfs/ctree.c:2950!
[ 9715.888197] invalid opcode: [#1] PREEMPT SMP
[ 9715.888197] Modules linked in: btrfs sd_mod scsi_debug ext4 mbcache jbd2 scsi_mod sch_fq_codel nfsv4 eeprom nfsd auth_rpcgss exportfs af_packet raid1 raid0 md_mod zlib_deflate libcrc32c [last unloaded: sd_mod]
[ 9715.888197] CPU 0
[ 9715.888197] Pid: 1990, comm: wac Tainted: G W 3.7.0-rc3-spw #42 Bochs Bochs
[ 9715.888197] RIP: 0010:[a02af5b9] [a02af5b9] btrfs_set_item_key_safe+0x149/0x150 [btrfs]
[ 9715.888197] RSP: 0018:880028735ae8 EFLAGS: 00010246
[ 9715.888197] RAX: RBX: 000f RCX: e000
[ 9715.888197] RDX: RSI: 880028735c16 RDI: 880028735ac7
[ 9715.888197] RBP: 880028735b48 R08: 0c9e R09: 880028735b08
[ 9715.888197] R10: R11: R12: 880016b62bf0
[ 9715.888197] R13: 880028735c16 R14: 880028735b07 R15: 8800293edbd0
[ 9715.888197] FS: 7f3ffa3b3700() GS:88003fc0() knlGS:
[ 9715.888197] CS: 0010 DS: ES: CR0: 8005003b
[ 9715.888197] CR2: 7f3ffa3b0fba CR3: 2872d000 CR4: 07f0
[ 9715.888197] DR0: DR1: DR2:
[ 9715.888197] DR3: DR6: 0ff0 DR7: 0400
[ 9715.888197] Process wac (pid: 1990, threadinfo 880028734000, task 88002870de50)
[ 9715.888197] Stack:
[ 9715.888197] 880028735b48 88002f678000 8800131c2000 0100
[ 9715.888197] 6c01 e000 880028735b48 8800293edbd0
[ 9715.888197] 880016b62bf0 d000 0001
[ 9715.888197] Call Trace:
[ 9715.888197] [a02e46e7] __btrfs_drop_extents+0x557/0xb00 [btrfs]
[ 9715.888197] [a0306fdd] btrfs_log_changed_extents+0x5bd/0x610 [btrfs]
[ 9715.888197] [a02efff7] ? free_extent_buffer+0x37/0x90 [btrfs]
[ 9715.888197] [a0308efb] btrfs_log_inode+0x50b/0x5b0 [btrfs]
[ 9715.888197] [8117c9b8] ? dget_parent+0x58/0xe0
[ 9715.888197] [a030aac2] btrfs_log_inode_parent+0x192/0x480 [btrfs]
[ 9715.888197] [8117c97c] ? dget_parent+0x1c/0xe0
[ 9715.888197] [a030adf6] btrfs_log_dentry_safe+0x46/0x70 [btrfs]
[ 9715.888197] [a02e21c6] btrfs_sync_file+0x1a6/0x240 [btrfs]
[ 9715.888197] [81550f49] ? sysret_check+0x22/0x5d
[ 9715.888197] [8119705d] do_fsync+0x5d/0x90
[ 9715.888197] [812595ae] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 9715.888197] [81197460] sys_fsync+0x10/0x20
[ 9715.888197] [81550f1d] system_call_fastpath+0x1a/0x1f
[ 9715.888197] Code: 00 00 4c 89 f6 4c 89 e7 48 98 48 8d 04 80 48 8d 54 80 65 e8 aa 1b 04 00 4c 89 ee 4c 89 f7 e8 2f f2 ff ff 85 c0 0f 8f 57 ff ff ff 0f 0b 0f 0b 0f 1f 00 66 66 66 66 90 55 48 b8 00 00 00 00 00 16
[ 9715.888197]
Re: kernel BUG at fs/btrfs/ctree.c:2950
On Thu, 1 Nov 2012 00:04:13 -0700, Darrick J. Wong wrote:
> Hi,
>
> I have a torture test[1] that I run to test stable page writeback. When I
> run it against btrfs (3.7.0-rc3) I observe the kernel bug message[2] that
> I've attached at the end of this message. The test program spawns a bunch
> of threads, which try to rewrite file blocks either through mmap or through
> regular pwrite calls. I'll take a look at it tomorrow, but I was wondering
> if this caught anybody's eye...

I'm looking into this problem now. It seems that the file extent metadata
is modified while we are logging.

Thanks
Miao

> # gcc -o wac wac.c
> # mount /dev/path/to/a/btrfs /mnt
> # ./wac -l 65536 -n 32 -m 32 -f -r /mnt/victimfile
> wait a few seconds
> kaboom
>
> --D
>
> [1] http://djwong.org/docs/wac.c
> [2] Double stack-trace:
> [...]
[PATCH 1/5] Btrfs: fix joining the same transaction handler more than 2 times
If we flush inodes with pending delalloc in a transaction, we may join the
same transaction handler more than 2 times.

The reason is:

  Task                                  use_count of trans handle
  commit_transaction                    1
   |- btrfs_start_delalloc_inodes       1
    |- run_delalloc_nocow               1
     |- join_transaction                2
     |- cow_file_range                  2
      |- join_transaction               3

In fact, cow_file_range does not need to join the transaction again because
its caller has already joined it. Fix the problem by splitting out
__cow_file_range(), which takes the caller's transaction handle instead of
joining the transaction itself.

Reported-by: Liu Bo <bo.li@oracle.com>
Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/inode.c       |   77 +--
 fs/btrfs/transaction.c |    1 +
 2 files changed, 48 insertions(+), 30 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5fc0990..aadcdd6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -803,14 +803,14 @@ static u64 get_extent_allocation_hint(struct inode *inode, u64 start,
  * required to start IO on it.  It may be clean and already done with
  * IO when we return.
  */
-static noinline int cow_file_range(struct inode *inode,
-				   struct page *locked_page,
-				   u64 start, u64 end, int *page_started,
-				   unsigned long *nr_written,
-				   int unlock)
+static noinline int __cow_file_range(struct btrfs_trans_handle *trans,
+				     struct inode *inode,
+				     struct btrfs_root *root,
+				     struct page *locked_page,
+				     u64 start, u64 end, int *page_started,
+				     unsigned long *nr_written,
+				     int unlock)
 {
-	struct btrfs_root *root = BTRFS_I(inode)->root;
-	struct btrfs_trans_handle *trans;
 	u64 alloc_hint = 0;
 	u64 num_bytes;
 	unsigned long ram_size;
@@ -823,25 +823,10 @@ static noinline int cow_file_range(struct inode *inode,
 	int ret = 0;

 	BUG_ON(btrfs_is_free_space_inode(inode));
-	trans = btrfs_join_transaction(root);
-	if (IS_ERR(trans)) {
-		extent_clear_unlock_delalloc(inode,
-			     &BTRFS_I(inode)->io_tree,
-			     start, end, locked_page,
-			     EXTENT_CLEAR_UNLOCK_PAGE |
-			     EXTENT_CLEAR_UNLOCK |
-			     EXTENT_CLEAR_DELALLOC |
-			     EXTENT_CLEAR_DIRTY |
-			     EXTENT_SET_WRITEBACK |
-			     EXTENT_END_WRITEBACK);
-		return PTR_ERR(trans);
-	}
-	trans->block_rsv = &root->fs_info->delalloc_block_rsv;

 	num_bytes = (end - start + blocksize) & ~(blocksize - 1);
 	num_bytes = max(blocksize, num_bytes);
 	disk_num_bytes = num_bytes;
-	ret = 0;

 	/* if this is a small write inside eof, kick off defrag */
 	if (num_bytes < 64 * 1024 &&
@@ -952,11 +937,9 @@ static noinline int cow_file_range(struct inode *inode,
 		alloc_hint = ins.objectid + ins.offset;
 		start += cur_alloc_size;
 	}
-	ret = 0;
 out:
-	btrfs_end_transaction(trans, root);
-
 	return ret;
+
 out_unlock:
 	extent_clear_unlock_delalloc(inode,
 		     &BTRFS_I(inode)->io_tree,
@@ -971,6 +954,39 @@ out_unlock:

 	goto out;
 }

+static noinline int cow_file_range(struct inode *inode,
+				   struct page *locked_page,
+				   u64 start, u64 end, int *page_started,
+				   unsigned long *nr_written,
+				   int unlock)
+{
+	struct btrfs_trans_handle *trans;
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	int ret;
+
+	trans = btrfs_join_transaction(root);
+	if (IS_ERR(trans)) {
+		extent_clear_unlock_delalloc(inode,
+			     &BTRFS_I(inode)->io_tree,
+			     start, end, locked_page,
+			     EXTENT_CLEAR_UNLOCK_PAGE |
+			     EXTENT_CLEAR_UNLOCK |
+			     EXTENT_CLEAR_DELALLOC |
+			     EXTENT_CLEAR_DIRTY |
+			     EXTENT_SET_WRITEBACK |
+			     EXTENT_END_WRITEBACK);
+		return PTR_ERR(trans);
+	}
+	trans->block_rsv = &root->fs_info->delalloc_block_rsv;
+
+	ret = __cow_file_range(trans, inode, root, locked_page, start, end,
+			       page_started, nr_written, unlock);
+
+	btrfs_end_transaction(trans, root);
+
+	return ret;
+}
+
 /*
  * work queue call back to started
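The refactoring above follows a common pattern: an inner helper that works on a handle the caller already holds, plus a thin wrapper that joins and ends the transaction for standalone callers. The following is a toy user-space sketch of that pattern only. Every name in it (toy_handle, toy_join, toy_cow_with_handle and so on) is made up for illustration and none of it is kernel code.

```c
#include <assert.h>

/* Toy stand-in for a transaction handle; not the kernel structure. */
struct toy_handle {
	int use_count;	/* how many times the handle is currently joined */
	int peak;	/* highest use_count seen while doing the work */
};

void toy_join(struct toy_handle *h) { h->use_count++; }
void toy_end(struct toy_handle *h)  { assert(h->use_count > 0); h->use_count--; }

/* Inner helper: works on a handle the caller already joined, never joins. */
int toy_cow_with_handle(struct toy_handle *h)
{
	if (h->use_count > h->peak)
		h->peak = h->use_count;
	return 0;
}

/* Wrapper for standalone callers: join, delegate, end. */
int toy_cow(struct toy_handle *h)
{
	int ret;

	toy_join(h);
	ret = toy_cow_with_handle(h);
	toy_end(h);
	return ret;
}

/*
 * Model of a caller that has already joined once on top of the
 * committer's own reference. 'rejoin' selects the old behaviour (call
 * the joining wrapper) or the new one (call the inner helper directly).
 */
int toy_peak_use_count(int rejoin)
{
	struct toy_handle h = { 1, 0 };	/* committer holds one reference */

	toy_join(&h);			/* caller joins: use_count == 2 */
	if (rejoin)
		toy_cow(&h);		/* joins again: use_count == 3 */
	else
		toy_cow_with_handle(&h);
	toy_end(&h);
	assert(h.use_count == 1);	/* every join was paired with an end */
	return h.peak;
}
```

In this model the old call path peaks at a use count of 3, matching the table in the commit message, while the new path stays at 2.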
[PATCH 2/5] Btrfs: fix missing flush when committing a transaction
Consider the following case:

	Task1				Task2
	start_transaction
					commit_transaction
					  check pending snapshots list and the
					  list is empty.
	add pending snapshot into list
					skip the delalloc flush
	end_transaction
					...

The snapshot then ends up different from the source subvolume.

This patch fixes the problem by flushing all pending work once more after
all the other tasks have ended the transaction.

Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/transaction.c |   74 ++-
 1 files changed, 47 insertions(+), 27 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 6d0d5a0..d9a9a70 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1401,6 +1401,48 @@ static void cleanup_transaction(struct btrfs_trans_handle *trans,
 	kmem_cache_free(btrfs_trans_handle_cachep, trans);
 }

+static int btrfs_flush_all_pending_stuffs(struct btrfs_trans_handle *trans,
+					  struct btrfs_root *root)
+{
+	int flush_on_commit = btrfs_test_opt(root, FLUSHONCOMMIT);
+	int snap_pending = 0;
+	int ret;
+
+	if (!flush_on_commit) {
+		spin_lock(&root->fs_info->trans_lock);
+		if (!list_empty(&trans->transaction->pending_snapshots))
+			snap_pending = 1;
+		spin_unlock(&root->fs_info->trans_lock);
+	}
+
+	if (flush_on_commit || snap_pending) {
+		btrfs_start_delalloc_inodes(root, 1);
+		btrfs_wait_ordered_extents(root, 1);
+	}
+
+	ret = btrfs_run_delayed_items(trans, root);
+	if (ret)
+		return ret;
+
+	/*
+	 * running the delayed items may have added new refs. account
+	 * them now so that they hinder processing of more delayed refs
+	 * as little as possible.
+	 */
+	btrfs_delayed_refs_qgroup_accounting(trans, root->fs_info);
+
+	/*
+	 * rename don't use btrfs_join_transaction, so, once we
+	 * set the transaction to blocked above, we aren't going
+	 * to get any new ordered operations.  We can safely run
+	 * it here and no for sure that nothing new will be added
+	 * to the list
+	 */
+	btrfs_run_ordered_operations(root, 1);
+
+	return 0;
+}
+
 /*
  * btrfs_transaction state sequence:
  *    in_commit = 0, blocked = 0  (initial)
@@ -1418,7 +1460,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 	int ret = -EIO;
 	int should_grow = 0;
 	unsigned long now = get_seconds();
-	int flush_on_commit = btrfs_test_opt(root, FLUSHONCOMMIT);

 	btrfs_run_ordered_operations(root, 0);

@@ -1491,39 +1532,14 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 		should_grow = 1;

 	do {
-		int snap_pending = 0;
-
 		joined = cur_trans->num_joined;
-		if (!list_empty(&trans->transaction->pending_snapshots))
-			snap_pending = 1;

 		WARN_ON(cur_trans != trans->transaction);

-		if (flush_on_commit || snap_pending) {
-			btrfs_start_delalloc_inodes(root, 1);
-			btrfs_wait_ordered_extents(root, 1);
-		}
-
-		ret = btrfs_run_delayed_items(trans, root);
+		ret = btrfs_flush_all_pending_stuffs(trans, root);
 		if (ret)
 			goto cleanup_transaction;

-		/*
-		 * running the delayed items may have added new refs. account
-		 * them now so that they hinder processing of more delayed refs
-		 * as little as possible.
-		 */
-		btrfs_delayed_refs_qgroup_accounting(trans, root->fs_info);
-
-		/*
-		 * rename don't use btrfs_join_transaction, so, once we
-		 * set the transaction to blocked above, we aren't going
-		 * to get any new ordered operations.  We can safely run
-		 * it here and no for sure that nothing new will be added
-		 * to the list
-		 */
-		btrfs_run_ordered_operations(root, 1);
-
 		prepare_to_wait(&cur_trans->writer_wait, &wait,
 				TASK_UNINTERRUPTIBLE);

@@ -1536,6 +1552,10 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 	} while (atomic_read(&cur_trans->num_writers) > 1 ||
 		 (should_grow && cur_trans->num_joined != joined));

+	ret = btrfs_flush_all_pending_stuffs(trans, root);
+	if (ret)
+		goto cleanup_transaction;
+
 	/*
 	 * Ok now we need to
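To make the interleaving in the commit message concrete, here is a single-threaded replay of it as a toy model. It is only a sketch under simplifying assumptions: the structure and function names are hypothetical, the flushoncommit option is taken to be off, and the real locking and waiting are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy commit state; not the kernel's btrfs_transaction. */
struct toy_commit {
	int pending_snapshots;	/* snapshots queued on this transaction */
	bool delalloc_flushed;	/* was delalloc flushed for the snapshot? */
};

/* Flush only when a snapshot is pending (flushoncommit assumed off). */
void toy_flush_pending(struct toy_commit *c)
{
	if (c->pending_snapshots > 0)
		c->delalloc_flushed = true;
}

/*
 * Replay the race from the commit message in program order.
 * 'final_flush' selects whether the committer flushes once more after
 * the last writer has ended the transaction, as the patch adds.
 */
bool toy_snapshot_sees_flush(bool final_flush)
{
	struct toy_commit c = { 0, false };

	toy_flush_pending(&c);	/* Task2: list is empty, flush skipped */
	c.pending_snapshots++;	/* Task1: queues a snapshot, then ends */
	if (final_flush)
		toy_flush_pending(&c);	/* Task2: re-check after writers left */
	return c.delalloc_flushed;
}
```

Without the final call the queued snapshot is never flushed for. With it, the late snapshot is caught because the check happens after no other task can add to the list.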
[PATCH 3/5] Btrfs: fix wrong file extent length
There are two types of file extent: inline extents and regular extents.
When logging file extents we did not take inline extents into account.
Fix it.

Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/ctree.h     |    1 +
 fs/btrfs/file-item.c |   21 -
 fs/btrfs/tree-log.c  |   10 ++
 3 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2ce1135..f019fd2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3256,6 +3256,7 @@ int btrfs_lookup_file_extent(struct btrfs_trans_handle *trans,
 			     struct btrfs_root *root,
 			     struct btrfs_path *path, u64 objectid,
 			     u64 bytenr, int mod);
+u64 btrfs_file_extent_length(struct btrfs_path *path);
 int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans,
 			   struct btrfs_root *root,
 			   struct btrfs_ordered_sum *sums);
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 1ad08e4e4..bd38cef 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -133,7 +133,6 @@ fail:
 	return ERR_PTR(ret);
 }

-
 int btrfs_lookup_file_extent(struct btrfs_trans_handle *trans,
 			     struct btrfs_root *root,
 			     struct btrfs_path *path, u64 objectid,
@@ -151,6 +150,26 @@ int btrfs_lookup_file_extent(struct btrfs_trans_handle *trans,
 	return ret;
 }

+u64 btrfs_file_extent_length(struct btrfs_path *path)
+{
+	int extent_type;
+	struct btrfs_file_extent_item *fi;
+	u64 len;
+
+	fi = btrfs_item_ptr(path->nodes[0], path->slots[0],
+			    struct btrfs_file_extent_item);
+	extent_type = btrfs_file_extent_type(path->nodes[0], fi);
+
+	if (extent_type == BTRFS_FILE_EXTENT_REG ||
+	    extent_type == BTRFS_FILE_EXTENT_PREALLOC)
+		len = btrfs_file_extent_num_bytes(path->nodes[0], fi);
+	else if (extent_type == BTRFS_FILE_EXTENT_INLINE)
+		len = btrfs_file_extent_inline_len(path->nodes[0], fi);
+	else
+		BUG();
+
+	return len;
+}

 static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
 				   struct inode *inode, struct bio *bio,
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index e9ebb47..cbb544e 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3143,7 +3143,6 @@ static int log_one_extent(struct btrfs_trans_handle *trans,
 			  struct btrfs_path *dst_path, struct log_args *args)
 {
 	struct btrfs_root *log = root->log_root;
-	struct btrfs_file_extent_item *fi;
 	struct btrfs_key key;
 	u64 start = em->mod_start;
 	u64 search_start = start;
@@ -3199,10 +3198,7 @@ again:
 			}
 		} while (key.offset > start);

-		fi = btrfs_item_ptr(path->nodes[0], path->slots[0],
-				    struct btrfs_file_extent_item);
-		num_bytes = btrfs_file_extent_num_bytes(path->nodes[0],
-							fi);
+		num_bytes = btrfs_file_extent_length(path);
 		if (key.offset + num_bytes <= start) {
 			btrfs_release_path(path);
 			return -ENOENT;
@@ -3211,8 +3207,7 @@ again:
 	args->src = path->nodes[0];
 next_slot:
 	btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
-	fi = btrfs_item_ptr(args->src, path->slots[0],
-			    struct btrfs_file_extent_item);
+	num_bytes = btrfs_file_extent_length(path);
 	if (args->nr &&
 	    args->start_slot + args->nr == path->slots[0]) {
 		args->nr++;
@@ -3230,7 +3225,6 @@ next_slot:
 	}
 	nritems = btrfs_header_nritems(path->nodes[0]);
 	path->slots[0]++;
-	num_bytes = btrfs_file_extent_num_bytes(args->src, fi);
 	if (len < num_bytes) {
 		/* I _think_ this is ok, envision we write to a
 		 * preallocated space that is adjacent to a previously
-- 
1.7.6.5
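The new helper picks the length field according to the extent type. The same dispatch can be sketched in user space as below. This is a toy model, not the on-disk format: the structure, the enum and all toy_* names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy extent types, mirroring the three cases the patch handles. */
enum toy_extent_type {
	TOY_EXTENT_INLINE,
	TOY_EXTENT_REG,
	TOY_EXTENT_PREALLOC
};

/* Toy extent item; the field layout is illustrative only. */
struct toy_file_extent {
	enum toy_extent_type type;
	uint64_t num_bytes;	/* valid for regular and preallocated extents */
	uint64_t inline_len;	/* valid for inline extents */
};

/* Return the length from whichever field is valid for this extent type. */
uint64_t toy_extent_length(const struct toy_file_extent *fi)
{
	switch (fi->type) {
	case TOY_EXTENT_REG:
	case TOY_EXTENT_PREALLOC:
		return fi->num_bytes;
	case TOY_EXTENT_INLINE:
		return fi->inline_len;
	}
	assert(!"unknown extent type");	/* plays the role of BUG() */
	return 0;
}

/* Convenience wrapper so callers can pass plain values. */
uint64_t toy_length_of(enum toy_extent_type type, uint64_t num_bytes,
		       uint64_t inline_len)
{
	struct toy_file_extent fi = { type, num_bytes, inline_len };

	return toy_extent_length(&fi);
}
```

Putting the type check in one helper means every caller gets the inline case right, which is the point of routing both call sites in log_one_extent through btrfs_file_extent_length().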
[PATCH 4/5] Btrfs: fix unprotected extent map operation when logging file extents
The modified_extents list was walked without holding the extent map tree
lock. Take the tree's write lock around the list operation.

Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/tree-log.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index cbb544e..f7e9387 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3525,8 +3525,10 @@ next_slot:
 		struct extent_map_tree *tree = &BTRFS_I(inode)->extent_tree;
 		struct extent_map *em, *n;

+		write_lock(&tree->lock);
 		list_for_each_entry_safe(em, n, &tree->modified_extents, list)
 			list_del_init(&em->list);
+		write_unlock(&tree->lock);
 	}

 	if (inode_only == LOG_INODE_ALL && S_ISDIR(inode->i_mode)) {
-- 
1.7.6.5
[PATCH 5/5] Btrfs: fix missing log when BTRFS_INODE_NEEDS_FULL_SYNC is set
If BTRFS_INODE_NEEDS_FULL_SYNC is set, we should log all the extents, but
we forgot to take the flag into account and set the wrong max key. As a
result, the file extent metadata was skipped during logging. Fix it.

Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/tree-log.c |    5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index f7e9387..c495b47 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3394,7 +3394,10 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,


 	/* today the code can only do partial logging of directories */
-	if (inode_only == LOG_INODE_EXISTS || S_ISDIR(inode->i_mode))
+	if (S_ISDIR(inode->i_mode) ||
+	    (!test_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
+		       &BTRFS_I(inode)->runtime_flags) &&
+	     inode_only == LOG_INODE_EXISTS))
 		max_key.type = BTRFS_XATTR_ITEM_KEY;
 	else
 		max_key.type = (u8)-1;
-- 
1.7.6.5
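The new condition is easier to check as a small truth function. The sketch below restates it in plain C. The enum and function names are hypothetical stand-ins, and "stop at xattrs" here simply means the max key type is limited to BTRFS_XATTR_ITEM_KEY, so file extent items are not walked.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for LOG_INODE_ALL and LOG_INODE_EXISTS. */
enum toy_log_mode { TOY_LOG_INODE_ALL, TOY_LOG_INODE_EXISTS };

/*
 * Returns true when logging should stop at xattr items (partial log),
 * false when everything up to the largest key type must be logged.
 */
bool toy_stop_at_xattrs(bool is_dir, bool needs_full_sync,
			enum toy_log_mode mode)
{
	return is_dir ||
	       (!needs_full_sync && mode == TOY_LOG_INODE_EXISTS);
}
```

The case the patch changes is a regular file with the full-sync flag set and mode LOG_INODE_EXISTS: the old condition stopped at xattrs, the new one does not.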
Re: [PATCH 0/3] flush delalloc by multi-task
Hi, Josef

Please drop this patchset from your btrfs-next tree because it may cause a
performance regression in some cases. I'll improve it later.

Thanks
Miao

On Thu, 25 Oct 2012 17:20:29 +0800, Miao Xie wrote:
> This patchset introduces multi-task delalloc flush, which makes the
> delalloc flush faster. It also fixes the problem that we join the same
> transaction handler more than 2 times.
>
> Implementation:
> - Create a new worker pool.
> - Queue the inodes with pending delalloc into the work queue of the worker
>   pool when we want to force them to disk, and then wait until all the
>   works we submitted are done.
> - The ordered extents can also be queued into this work queue. The process
>   is similar to the second one.
>
> Miao Xie (3):
>   Btrfs: make delalloc inodes be flushed by multi-task
>   Btrfs: make ordered operations be handled by multi-task
>   Btrfs: make ordered extent be flushed by multi-task
>
>  fs/btrfs/ctree.h        |   14 +++
>  fs/btrfs/disk-io.c      |    7
>  fs/btrfs/inode.c        |   78 ++---
>  fs/btrfs/ordered-data.c |   87 ++-
>  fs/btrfs/ordered-data.h |    7 +++-
>  fs/btrfs/relocation.c   |    6 +++-
>  fs/btrfs/transaction.c  |   24 ++---
>  7 files changed, 185 insertions(+), 38 deletions(-)
Re: [PATCH 2/5] Btrfs: fix missing flush when committing a transaction
(sorry, forgot to cc linux-btrfs.)

On Thu, Nov 01, 2012 at 03:51:41PM +0800, Miao Xie wrote:
> On Thu, 1 Nov 2012 15:44:43 +0800, Liu Bo wrote:
> > On Thu, Nov 01, 2012 at 03:33:14PM +0800, Miao Xie wrote:
> > > Consider the following case:
> > >
> > > 	Task1				Task2
> > > 	start_transaction
> > > 					commit_transaction
> > > 					  check pending snapshots list and the
> > > 					  list is empty.
> > > 	add pending snapshot into list
> > > 					skip the delalloc flush
> > > 	end_transaction
> > > 					...
> > >
> > > And then the problem that the snapshot is different with the source
> > > subvolume happen.
> >
> > This is weird, create_snapshot() will first add pending snapshot into
> > list and then commit the transaction itself, regardless of if the
> > snapshot is different with others or not.
>
> But the transaction may be committed by the other task, and the snapshot
> creation task just wait until it ends.

It's possible that a commit transaction becomes an end transaction when it
finds that the transaction is already in commit.

So if snapshot creation starts the transaction, it will increment the
transaction's num_writers. Why does the other task not wait for its
end_transaction?

I doubt this can really happen anyway... Can you elaborate on the
situation?

thanks,
liubo

> > How do you find this?
>
> Just by review the code. I think it can be triggered
>
> Thanks
> Miao
>
> > thanks,
> > liubo
> >
> > > This patch fixes the above problem by flush all pending stuffs when
> > > all the other tasks end the transaction.
> > >
> > > Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
> > > ---
> > >  fs/btrfs/transaction.c |   74 ++-
> > >  1 files changed, 47 insertions(+), 27 deletions(-)
> > > [...]
Re: [PATCH 3/5] Btrfs: fix wrong file extent length
On Thu, Nov 01, 2012 at 03:33:59PM +0800, Miao Xie wrote:
> There are two types of the file extent - inline extent and regular extent,
> When we log file extents, we didn't take inline extent into account, fix it.

Good catch.

Reviewed-by: Liu Bo bo.li@oracle.com

thanks,
liubo

> Signed-off-by: Miao Xie mi...@cn.fujitsu.com
> ---
>  fs/btrfs/ctree.h     |    1 +
>  fs/btrfs/file-item.c |   21 ++++++++++++++++++++-
>  fs/btrfs/tree-log.c  |   10 ++--------
>  3 files changed, 23 insertions(+), 9 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 2ce1135..f019fd2 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3256,6 +3256,7 @@ int btrfs_lookup_file_extent(struct btrfs_trans_handle *trans,
>  			     struct btrfs_root *root,
>  			     struct btrfs_path *path, u64 objectid,
>  			     u64 bytenr, int mod);
> +u64 btrfs_file_extent_length(struct btrfs_path *path);
>  int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans,
>  			   struct btrfs_root *root,
>  			   struct btrfs_ordered_sum *sums);
> diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
> index 1ad08e4e4..bd38cef 100644
> --- a/fs/btrfs/file-item.c
> +++ b/fs/btrfs/file-item.c
> @@ -133,7 +133,6 @@ fail:
>  	return ERR_PTR(ret);
>  }
>  
> -
>  int btrfs_lookup_file_extent(struct btrfs_trans_handle *trans,
>  			     struct btrfs_root *root,
>  			     struct btrfs_path *path, u64 objectid,
> @@ -151,6 +150,26 @@ int btrfs_lookup_file_extent(struct btrfs_trans_handle *trans,
>  	return ret;
>  }
>  
> +u64 btrfs_file_extent_length(struct btrfs_path *path)
> +{
> +	int extent_type;
> +	struct btrfs_file_extent_item *fi;
> +	u64 len;
> +
> +	fi = btrfs_item_ptr(path->nodes[0], path->slots[0],
> +			    struct btrfs_file_extent_item);
> +	extent_type = btrfs_file_extent_type(path->nodes[0], fi);
> +
> +	if (extent_type == BTRFS_FILE_EXTENT_REG ||
> +	    extent_type == BTRFS_FILE_EXTENT_PREALLOC)
> +		len = btrfs_file_extent_num_bytes(path->nodes[0], fi);
> +	else if (extent_type == BTRFS_FILE_EXTENT_INLINE)
> +		len = btrfs_file_extent_inline_len(path->nodes[0], fi);
> +	else
> +		BUG();
> +
> +	return len;
> +}
>  
>  static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
>  				   struct inode *inode, struct bio *bio,
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index e9ebb47..cbb544e 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -3143,7 +3143,6 @@ static int log_one_extent(struct btrfs_trans_handle *trans,
>  			  struct btrfs_path *dst_path, struct log_args *args)
>  {
>  	struct btrfs_root *log = root->log_root;
> -	struct btrfs_file_extent_item *fi;
>  	struct btrfs_key key;
>  	u64 start = em->mod_start;
>  	u64 search_start = start;
> @@ -3199,10 +3198,7 @@ again:
>  			}
>  		} while (key.offset > start);
>  
> -		fi = btrfs_item_ptr(path->nodes[0], path->slots[0],
> -				    struct btrfs_file_extent_item);
> -		num_bytes = btrfs_file_extent_num_bytes(path->nodes[0],
> -							fi);
> +		num_bytes = btrfs_file_extent_length(path);
>  		if (key.offset + num_bytes <= start) {
>  			btrfs_release_path(path);
>  			return -ENOENT;
> @@ -3211,8 +3207,7 @@ again:
>  	args->src = path->nodes[0];
>  next_slot:
>  	btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
> -	fi = btrfs_item_ptr(args->src, path->slots[0],
> -			    struct btrfs_file_extent_item);
> +	num_bytes = btrfs_file_extent_length(path);
>  	if (args->nr &&
>  	    args->start_slot + args->nr == path->slots[0]) {
>  		args->nr++;
> @@ -3230,7 +3225,6 @@ next_slot:
>  	}
>  	nritems = btrfs_header_nritems(path->nodes[0]);
>  	path->slots[0]++;
> -	num_bytes = btrfs_file_extent_num_bytes(args->src, fi);
>  	if (len < num_bytes) {
>  		/* I _think_ this is ok, envision we write to a
>  		 * preallocated space that is adjacent to a previously
> -- 
> 1.7.6.5
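Note: the idea behind btrfs_file_extent_length() in the patch above is that the length of a file extent has to be read from a different field depending on the extent type. A minimal user-space sketch of that dispatch follows. The enum, the struct and the function name are hypothetical stand-ins for illustration only; they are not the btrfs on-disk format or the kernel accessors.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the extent types named in the patch. */
enum extent_type {
	EXTENT_INLINE,
	EXTENT_REG,
	EXTENT_PREALLOC,
};

/* Hypothetical, simplified file extent item (not the on-disk layout). */
struct file_extent {
	enum extent_type type;
	uint64_t num_bytes;   /* meaningful for REG and PREALLOC extents */
	uint64_t inline_len;  /* meaningful for INLINE extents */
};

/* Return the logical length of an extent, choosing the field by type. */
uint64_t file_extent_length(const struct file_extent *fi)
{
	switch (fi->type) {
	case EXTENT_REG:
	case EXTENT_PREALLOC:
		return fi->num_bytes;
	case EXTENT_INLINE:
		return fi->inline_len;
	}
	/* The kernel patch calls BUG() here; this sketch asserts instead. */
	assert(!"unknown extent type");
	return 0;
}
```

A caller that only ever reads num_bytes, as log_one_extent() did before the patch, would get a meaningless length for an inline extent; routing every caller through one helper avoids that.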
Re: [PATCH 4/5] Btrfs: fix unprotected extent map operation when logging file extents
On Thu, Nov 01, 2012 at 03:34:54PM +0800, Miao Xie wrote:
> We forget to protect the modified_extents list, fix it.

Looks good to me.

Reviewed-by: Liu Bo bo.li@oracle.com

thanks,
liubo

> Signed-off-by: Miao Xie mi...@cn.fujitsu.com
> ---
>  fs/btrfs/tree-log.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index cbb544e..f7e9387 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -3525,8 +3525,10 @@ next_slot:
>  		struct extent_map_tree *tree = &BTRFS_I(inode)->extent_tree;
>  		struct extent_map *em, *n;
>  
> +		write_lock(&tree->lock);
>  		list_for_each_entry_safe(em, n, &tree->modified_extents, list)
>  			list_del_init(&em->list);
> +		write_unlock(&tree->lock);
>  	}
>  
>  	if (inode_only == LOG_INODE_ALL && S_ISDIR(inode->i_mode)) {
> -- 
> 1.7.6.5
Re: [PATCH 2/5] Btrfs: fix missing flush when committing a transaction
On thu, 1 Nov 2012 16:04:27 +0800, Liu Bo wrote: (sorry, forgot to cc linux-btrfs.) On Thu, Nov 01, 2012 at 03:51:41PM +0800, Miao Xie wrote: On Thu, 1 Nov 2012 15:44:43 +0800, Liu Bo wrote: On Thu, Nov 01, 2012 at 03:33:14PM +0800, Miao Xie wrote: Consider the following case: Task1 Task2 start_transaction commit_transaction check pending snapshots list and the list is empty. add pending snapshot into list skip the delalloc flush end_transaction ... And then the problem that the snapshot is different with the source subvolume happen. This is weird, create_snapshot() will first add pending snapshot into list and then commit the transaction itself, regardless of if the snapshot is different with others or not. But the transaction may be committed by the other task, and the snapshot creation task just wait until it ends. It's possible that a commit tranaction becomes a end transaction when it finds itself is already in commit. So if snapshot creation starts the transaction, it will increment the transaction's num_writers, why does not the other task wait for its end_transacion? I doubt if this can really happen anyway... Can you elaborate the situation more? Task1 Task2 start_transaction start_transaction commit_transaction set in_commit to 1 check pending snapshots list and the list is empty. add pending snapshot into list skip the delalloc flush commit_transaction find in_commit is 1 end_transaction (num_writer--) wait_for_commit num_writer is 1 continue committing the transaction ... Thanks Miao thanks, liubo How do you find this? Just by review the code. I think it can be triggered Thanks Miao thanks, liubo This patch fixes the above problem by flush all pending stuffs when all the other tasks end the transaction. 
Signed-off-by: Miao Xie mi...@cn.fujitsu.com --- fs/btrfs/transaction.c | 74 ++- 1 files changed, 47 insertions(+), 27 deletions(-) diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 6d0d5a0..d9a9a70 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -1401,6 +1401,48 @@ static void cleanup_transaction(struct btrfs_trans_handle *trans, kmem_cache_free(btrfs_trans_handle_cachep, trans); } +static int btrfs_flush_all_pending_stuffs(struct btrfs_trans_handle *trans, +struct btrfs_root *root) +{ + int flush_on_commit = btrfs_test_opt(root, FLUSHONCOMMIT); + int snap_pending = 0; + int ret; + + if (!flush_on_commit) { + spin_lock(root-fs_info-trans_lock); + if (!list_empty(trans-transaction-pending_snapshots)) + snap_pending = 1; + spin_unlock(root-fs_info-trans_lock); + } + + if (flush_on_commit || snap_pending) { + btrfs_start_delalloc_inodes(root, 1); + btrfs_wait_ordered_extents(root, 1); + } + + ret = btrfs_run_delayed_items(trans, root); + if (ret) + return ret; + + /* + * running the delayed items may have added new refs. account + * them now so that they hinder processing of more delayed refs + * as little as possible. + */ + btrfs_delayed_refs_qgroup_accounting(trans, root-fs_info); + + /* + * rename don't use btrfs_join_transaction, so, once we + * set the transaction to blocked above, we aren't going + * to get any new ordered operations. 
We can safely run + * it here and no for sure that nothing new will be added + * to the list + */ + btrfs_run_ordered_operations(root, 1); + + return 0; +} + /* * btrfs_transaction state sequence: *in_commit = 0, blocked = 0 (initial) @@ -1418,7 +1460,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans, int ret = -EIO; int should_grow = 0; unsigned long now = get_seconds(); - int flush_on_commit = btrfs_test_opt(root, FLUSHONCOMMIT); btrfs_run_ordered_operations(root, 0); @@ -1491,39 +1532,14 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans, should_grow = 1; do { - int snap_pending = 0; - joined = cur_trans-num_joined; - if (!list_empty(trans-transaction-pending_snapshots)) - snap_pending = 1; WARN_ON(cur_trans != trans-transaction); - if
Re: [PATCH 2/5] Btrfs: fix missing flush when committing a transaction
On Thu, Nov 01, 2012 at 04:16:43PM +0800, Miao Xie wrote:
> On Thu, 1 Nov 2012 16:04:27 +0800, Liu Bo wrote:
>> (sorry, forgot to cc linux-btrfs.)
>>
>> On Thu, Nov 01, 2012 at 03:51:41PM +0800, Miao Xie wrote:
>>> On Thu, 1 Nov 2012 15:44:43 +0800, Liu Bo wrote:
>>>> On Thu, Nov 01, 2012 at 03:33:14PM +0800, Miao Xie wrote:
>>>>> Consider the following case:
>>>>> Task1                           Task2
>>>>> start_transaction
>>>>>                                 commit_transaction
>>>>>                                   check pending snapshots list and the
>>>>>                                   list is empty.
>>>>> add pending snapshot into list
>>>>>                                   skip the delalloc flush
>>>>> end_transaction
>>>>> ...
>>>>>
>>>>> And then the problem that the snapshot is different with the source
>>>>> subvolume happen.
>>>>
>>>> This is weird, create_snapshot() will first add pending snapshot into
>>>> list and then commit the transaction itself, regardless of if the
>>>> snapshot is different with others or not.
>>>
>>> But the transaction may be committed by the other task, and the snapshot
>>> creation task just wait until it ends.
>>
>> It's possible that a commit transaction becomes an end transaction when
>> it finds itself is already in commit.
>>
>> So if snapshot creation starts the transaction, it will increment the
>> transaction's num_writers, why does not the other task wait for its
>> end_transaction?
>>
>> I doubt if this can really happen anyway... Can you elaborate the
>> situation more?
>
> Task1                           Task2
> start_transaction
>                                 start_transaction
>                                 commit_transaction
>                                   set in_commit to 1
>                                   check pending snapshots list and the
>                                   list is empty.
> add pending snapshot into list
>                                   skip the delalloc flush
> commit_transaction
>   find in_commit is 1
>   end_transaction (num_writer--)
>   wait_for_commit
>                                   num_writer is 1
>                                   continue committing the transaction
> ...

Make sense.

Then I think we'd better put the flush part right after setting
'trans_no_join = 1' since snapshot creation may also join an existing
transaction.

thanks,
liubo
Re: [PATCH 5/5] Btrfs: fix missing log when BTRFS_INODE_NEEDS_FULL_SYNC is set
On Thu, Nov 01, 2012 at 03:35:23PM +0800, Miao Xie wrote:
> If we set BTRFS_INODE_NEEDS_FULL_SYNC, we should log all the extent, but now
> we forget to take it into account, and set a wrong max key, if so, we will
> skip the file extent metadata when doing logging. Fix it.

But it's along with LOG_INODE_EXISTS, which is set by rename and link and
means we need to log just enough to rebuild the inode during log replay.

On the other side, if we do log all the extents because of having set
BTRFS_INODE_NEEDS_FULL_SYNC, we don't know if we actually get what we want
because rename and link do not wait for dirty pages as fsync does.

thanks,
liubo

> Signed-off-by: Miao Xie mi...@cn.fujitsu.com
> ---
>  fs/btrfs/tree-log.c |    5 ++++-
>  1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index f7e9387..c495b47 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -3394,7 +3394,10 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
>  
>  
>  	/* today the code can only do partial logging of directories */
> -	if (inode_only == LOG_INODE_EXISTS || S_ISDIR(inode->i_mode))
> +	if (S_ISDIR(inode->i_mode) ||
> +	    (!test_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
> +		       &BTRFS_I(inode)->runtime_flags) &&
> +	     inode_only == LOG_INODE_EXISTS))
>  		max_key.type = BTRFS_XATTR_ITEM_KEY;
>  	else
>  		max_key.type = (u8)-1;
> -- 
> 1.7.6.5
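Note: the condition proposed in the quoted patch can be read as a small decision function. The sketch below only restates that condition in user-space C so the cases are easy to enumerate; it does not settle the objection raised in the reply. The function name, the enum and the numeric value used for the xattr key are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's logging modes. */
enum log_mode { LOG_INODE_ALL, LOG_INODE_EXISTS };

#define XATTR_ITEM_KEY 24              /* illustrative stand-in for BTRFS_XATTR_ITEM_KEY */
#define MAX_KEY_TYPE   ((uint8_t)-1)   /* "log everything" */

/*
 * Pick the highest key type to copy into the log, following the patched
 * condition: stop at the xattr key for directories, and for "exists only"
 * logging unless the inode is flagged as needing a full sync.
 */
uint8_t pick_max_key_type(bool is_dir, bool needs_full_sync, enum log_mode mode)
{
	if (is_dir || (!needs_full_sync && mode == LOG_INODE_EXISTS))
		return XATTR_ITEM_KEY;
	return MAX_KEY_TYPE;
}
```

The only case whose result changes relative to the old condition is a regular file logged with LOG_INODE_EXISTS while the full-sync flag is set, which is exactly the case the reply questions.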
[PATCH] shared compression workspaces limits doesnot match
In function find_workspace, it is allowed to allocate at most cpus + 1
workspaces, but in function free_workspace, the workspace is freed if
there are more than cpus workspaces. The two limits don't match. I think
the original intention was to allow at most cpus compression workspaces
to be allocated.

Signed-off-by: Rock Lee geekerrock...@gmail.com
---
 fs/btrfs/compression.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index c6467aa..eef1811 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -779,7 +779,7 @@ again:
 		return workspace;
 
 	}
-	if (atomic_read(alloc_workspace) > cpus) {
+	if (atomic_read(alloc_workspace) >= cpus) {
 		DEFINE_WAIT(wait);
 
 		spin_unlock(workspace_lock);
-- 
1.7.7.6
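Note: the off-by-one described in this patch can be checked with a tiny model. The sketch below counts how many allocations get through before the allocator would start waiting, under the old comparison and under the patched one. The function name and the boolean parameter are hypothetical; the real code uses an atomic counter and a wait queue, which are left out here.

```c
#include <stdbool.h>

/*
 * Return how many workspaces can be allocated before the next caller
 * would have to wait.
 *   patched == false models the old check:     allocated >  cpus
 *   patched == true  models the patched check: allocated >= cpus
 */
int max_allocated(int cpus, bool patched)
{
	int allocated = 0;

	for (;;) {
		bool must_wait = patched ? (allocated >= cpus)
					 : (allocated > cpus);
		if (must_wait)
			break;
		allocated++;  /* stands in for atomic_inc() after the check */
	}
	return allocated;
}
```

With 4 CPUs the old check lets 5 allocations through and the patched check lets 4 through, which matches the limit used on the free side as described above.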
[Request for review] [RFC] Add label support for snapshots and subvols
From: Anand Jain anand.j...@oracle.com

(This patch is for review and testing, not yet for integration.)

Here is an implementation of a feature to add a label to subvolumes and
snapshots, which would help sysadmins better manage them.

This can be done in two ways. One is to use attr, which needs only
userland changes; the drawback is that the label can then be changed
without going through the btrfs CLI. The other is to add a member to
btrfs_root_item in the btrfs kernel code to hold the label for each
snapshot and subvolume; the drawback here is having to introduce a V3
version of this structure. If there is a better way, please do share.

The patch code is for review. Any comments and suggestions are welcome.

Below is a demo of this new feature.

btrfs fi label -t /btrfs/sv1 Prod-DB
btrfs fi label -t /btrfs/sv1
Prod-DB
btrfs su snap /btrfs/sv1 /btrfs/snap1-sv1
Create a snapshot of '/btrfs/sv1' in '/btrfs/snap1-sv1'
btrfs fi label -t /btrfs/snap1-sv1

btrfs fi label -t /btrfs/snap1-sv1 Prod-DB-sand-box-testing
btrfs fi label -t /btrfs/snap1-sv1
Prod-DB-sand-box-testing

Thanks, Anand

Anand Jain (2):
  Btrfs-progs: move open_file_or_dir() to utils.c
  Btrfs-progs: add feature to label subvol and snapshot

 Makefile          |    4 ++--
 btrfsctl.c        |    7 ---
 btrfslabel.c      |   40
 btrfslabel.h      |    4 +++-
 cmds-balance.c    |    1 +
 cmds-filesystem.c |   34
 cmds-inspect.c    |    1 +
 cmds-qgroup.c     |    1 +
 cmds-quota.c      |    1 +
 cmds-subvolume.c  |    1 +
 commands.h        |    3 ---
 common.c          |   46
 ioctl.h           |    2 ++
 utils.c           |   30
 utils.h           |    3 +++
 15 files changed, 116 insertions(+), 62 deletions(-)
 delete mode 100644 common.c

Btrfs: add label to snapshot and subvol

 fs/btrfs/ctree.h       |   14 ++
 fs/btrfs/ioctl.c       |   32
 fs/btrfs/ioctl.h       |    2 ++
 fs/btrfs/root-tree.c   |   44
 fs/btrfs/transaction.c |    1 +
 5 files changed, 72 insertions(+), 21 deletions(-)
[PATCH 2/2] Btrfs-progs: add feature to label subvol and snapshot
From: Anand Jain anand.j...@oracle.com Signed-off-by: Anand Jain anand.j...@oracle.com --- btrfslabel.c | 40 btrfslabel.h |4 +++- cmds-filesystem.c | 34 +- ioctl.h |2 ++ 4 files changed, 74 insertions(+), 6 deletions(-) diff --git a/btrfslabel.c b/btrfslabel.c index cb142b0..90fe618 100644 --- a/btrfslabel.c +++ b/btrfslabel.c @@ -126,3 +126,43 @@ int set_label(char *btrfs_dev, char *nLabel) change_label_unmounted(btrfs_dev, nLabel); return 0; } + +int get_subvol_label(char *subvol) +{ + int fd, e=0; + char label[BTRFS_LABEL_SIZE+1]; + + fd = open_file_or_dir(subvol); + + if(ioctl(fd, BTRFS_IOC_SUBVOL_GETLABEL, label) 0) { + e = errno; + fprintf(stderr, ERROR: get subvol label failed, %s\n, + strerror(e)); + close(fd); + return -e; + } + label[BTRFS_LABEL_SIZE] = '\0'; + printf(%s\n,label); + close(fd); + return 0; +} + +int set_subvol_label(char *subvol, char *labelp) +{ + int fd, e=0; + char label[BTRFS_LABEL_SIZE]; + + fd = open_file_or_dir(subvol); + + memset(label, 0, BTRFS_LABEL_SIZE); + strcpy(label, labelp); + if(ioctl(fd, BTRFS_IOC_SUBVOL_SETLABEL, label) 0) { + e = errno; + fprintf(stderr, ERROR: set subvol label failed, %s\n, + strerror(e)); + close(fd); + return -e; + } + close(fd); + return 0; +} diff --git a/btrfslabel.h b/btrfslabel.h index abf43ad..3db4180 100644 --- a/btrfslabel.h +++ b/btrfslabel.h @@ -2,4 +2,6 @@ int get_label(char *btrfs_dev); -int set_label(char *btrfs_dev, char *nLabel); \ No newline at end of file +int set_label(char *btrfs_dev, char *nLabel); +int get_subvol_label(char *subvol); +int set_subvol_label(char *subvol, char *labelp); diff --git a/cmds-filesystem.c b/cmds-filesystem.c index 9c43d35..718e70b 100644 --- a/cmds-filesystem.c +++ b/cmds-filesystem.c @@ -22,6 +22,7 @@ #include errno.h #include uuid/uuid.h #include ctype.h +#include sys/stat.h #include kerncompat.h #include ctree.h @@ -517,18 +518,41 @@ static int cmd_resize(int argc, char **argv) } static const char * const cmd_label_usage[] = { - btrfs filesystem label 
device [newlabel], - Get or change the label of an unmounted filesystem, - With one argument, get the label of filesystem on device., - If newlabel is passed, set the filesystem label to newlabel., + btrfs filesystem label [-t] device|subvol [newlabel], + Read or modify filesystem or subvol or snapshot label, + -t Read or modify label for the specified subvol or snapshot., NULL }; static int cmd_label(int argc, char **argv) { - if (check_argc_min(argc, 2) || check_argc_max(argc, 3)) + struct stat st; + char subvol[BTRFS_PATH_NAME_MAX+1]; + + if (check_argc_min(argc, 2) || check_argc_max(argc, 4)) usage(cmd_label_usage); + if(getopt(argc, argv, t:) != -1) { + if(optarg == NULL || strlen(optarg) BTRFS_PATH_NAME_MAX) + return -1; + else { + strcpy(subvol,optarg); + if(stat(subvol, st) 0) { + fprintf(stderr, Error: %s\n,strerror(errno)); + return -errno; + } + if(!S_ISDIR(st.st_mode)) { + fprintf(stderr, Error: Not a dir\n); + return -1; + } + } + if (argc 3) + return set_subvol_label(argv[2], argv[3]); + else + return get_subvol_label(argv[2]); + return 0; + } + if (argc 2) return set_label(argv[1], argv[2]); else diff --git a/ioctl.h b/ioctl.h index 6fda3a1..009fa6e 100644 --- a/ioctl.h +++ b/ioctl.h @@ -432,4 +432,6 @@ struct btrfs_ioctl_clone_range_args { struct btrfs_ioctl_qgroup_create_args) #define BTRFS_IOC_QGROUP_LIMIT _IOR(BTRFS_IOCTL_MAGIC, 43, \ struct btrfs_ioctl_qgroup_limit_args) +#define BTRFS_IOC_SUBVOL_GETLABEL _IOWR(BTRFS_IOCTL_MAGIC, 55, __u64) +#define BTRFS_IOC_SUBVOL_SETLABEL _IOW(BTRFS_IOCTL_MAGIC, 56, __u64) #endif -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
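Note: set_subvol_label() in the patch above copies the user-supplied label into a fixed BTRFS_LABEL_SIZE buffer with strcpy(), which would write past the buffer for an over-long label. One possible way to guard that copy is sketched below. The helper name, the LABEL_SIZE constant and the choice of -EINVAL are assumptions for illustration; this is not code from btrfs-progs.

```c
#include <errno.h>
#include <string.h>

#define LABEL_SIZE 256  /* stand-in for BTRFS_LABEL_SIZE */

/*
 * Copy src into a zero-filled, fixed-size label buffer.
 * Returns 0 on success, or -EINVAL if src (plus its terminating NUL)
 * does not fit, in which case dst is left untouched.
 */
int copy_label(char dst[LABEL_SIZE], const char *src)
{
	size_t len = strlen(src);

	if (len >= LABEL_SIZE)
		return -EINVAL;

	memset(dst, 0, LABEL_SIZE);
	memcpy(dst, src, len);
	return 0;
}
```

Rejecting the label rather than silently truncating it keeps what the user typed and what gets stored identical; truncation would be another defensible choice.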
[PATCH 1/2] Btrfs-progs: move open_file_or_dir() to utils.c
From: Anand Jain anand.j...@oracle.com The definition of the function open_file_or_dir() is moved from common.c to utils.c in order to be able to share some common code between scrub and the device stats in the following step. That common code uses open_file_or_dir(). Since open_file_or_dir() makes use of the function dirfd(3), the required XOPEN version was raised from 6 to 7. Signed-off-by: Anand Jain anand.j...@oracle.com Original-Signed-off-by: Stefan Behrens sbehr...@giantdisaster.de --- Makefile |4 ++-- btrfsctl.c |7 --- cmds-balance.c |1 + cmds-inspect.c |1 + cmds-qgroup.c|1 + cmds-quota.c |1 + cmds-subvolume.c |1 + commands.h |3 --- common.c | 46 -- utils.c | 30 -- utils.h |3 +++ 11 files changed, 42 insertions(+), 56 deletions(-) delete mode 100644 common.c diff --git a/Makefile b/Makefile index 4894903..8576d90 100644 --- a/Makefile +++ b/Makefile @@ -41,8 +41,8 @@ all: version $(progs) manpages version: bash version.sh -btrfs: $(objects) btrfs.o help.o common.o $(cmds_objects) - $(CC) $(CFLAGS) -o btrfs btrfs.o help.o common.o $(cmds_objects) \ +btrfs: $(objects) btrfs.o help.o $(cmds_objects) + $(CC) $(CFLAGS) -o btrfs btrfs.o help.o $(cmds_objects) \ $(objects) $(LDFLAGS) $(LIBS) -lpthread calc-size: $(objects) calc-size.o diff --git a/btrfsctl.c b/btrfsctl.c index 518684c..049a5f3 100644 --- a/btrfsctl.c +++ b/btrfsctl.c @@ -63,7 +63,7 @@ static void print_usage(void) exit(1); } -static int open_file_or_dir(const char *fname) +static int btrfsctl_open_file_or_dir(const char *fname) { int ret; struct stat st; @@ -91,6 +91,7 @@ static int open_file_or_dir(const char *fname) } return fd; } + int main(int ac, char **av) { char *fname = NULL; @@ -128,7 +129,7 @@ int main(int ac, char **av) snap_location = strdup(fullpath); snap_location = dirname(snap_location); - snap_fd = open_file_or_dir(snap_location); + snap_fd = btrfsctl_open_file_or_dir(snap_location); name = strdup(fullpath); name = basename(name); @@ -238,7 +239,7 @@ int main(int ac, char **av) } 
name = fname; } else { - fd = open_file_or_dir(fname); + fd = btrfsctl_open_file_or_dir(fname); } if (name) { diff --git a/cmds-balance.c b/cmds-balance.c index 38a7426..6268b61 100644 --- a/cmds-balance.c +++ b/cmds-balance.c @@ -28,6 +28,7 @@ #include volumes.h #include commands.h +#include utils.h static const char * const balance_cmd_group_usage[] = { btrfs [filesystem] balance command [options] path, diff --git a/cmds-inspect.c b/cmds-inspect.c index edabff5..79e069b 100644 --- a/cmds-inspect.c +++ b/cmds-inspect.c @@ -22,6 +22,7 @@ #include kerncompat.h #include ioctl.h +#include utils.h #include commands.h #include btrfs-list.h diff --git a/cmds-qgroup.c b/cmds-qgroup.c index 1525c11..cafc284 100644 --- a/cmds-qgroup.c +++ b/cmds-qgroup.c @@ -24,6 +24,7 @@ #include ioctl.h #include commands.h +#include utils.h static const char * const qgroup_cmd_group_usage[] = { btrfs qgroup command [options] path, diff --git a/cmds-quota.c b/cmds-quota.c index cf9ad97..8481514 100644 --- a/cmds-quota.c +++ b/cmds-quota.c @@ -23,6 +23,7 @@ #include ioctl.h #include commands.h +#include utils.h static const char * const quota_cmd_group_usage[] = { btrfs quota command [options] path, diff --git a/cmds-subvolume.c b/cmds-subvolume.c index ac39f7b..e3cdb1e 100644 --- a/cmds-subvolume.c +++ b/cmds-subvolume.c @@ -32,6 +32,7 @@ #include ctree.h #include commands.h #include btrfs-list.h +#include utils.h static const char * const subvolume_cmd_group_usage[] = { btrfs subvolume command args, diff --git a/commands.h b/commands.h index bb6d2dd..8114a73 100644 --- a/commands.h +++ b/commands.h @@ -79,9 +79,6 @@ void help_ambiguous_token(const char *arg, const struct cmd_group *grp); void help_command_group(const struct cmd_group *grp, int argc, char **argv); -/* common.c */ -int open_file_or_dir(const char *fname); - extern const struct cmd_group subvolume_cmd_group; extern const struct cmd_group filesystem_cmd_group; extern const struct cmd_group balance_cmd_group; diff --git 
a/common.c b/common.c deleted file mode 100644 index 03f6570..000 --- a/common.c +++ /dev/null @@ -1,46 +0,0 @@ -/* - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public - * License v2 as published by the Free Software Foundation. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY;
[PATCH] Btrfs: add label to snapshot and subvol
From: Anand Jain anand.j...@oracle.com This modifies the struct btrfs_root_item to hold the label, and make it v3 of this structure. Signed-off-by: Anand Jain anand.j...@oracle.com --- fs/btrfs/ctree.h | 14 ++ fs/btrfs/ioctl.c | 32 fs/btrfs/ioctl.h |2 ++ fs/btrfs/root-tree.c | 44 +++- fs/btrfs/transaction.c |1 + 5 files changed, 72 insertions(+), 21 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 994d255..b280256 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -759,6 +759,10 @@ struct btrfs_root_item { struct btrfs_timespec otime; struct btrfs_timespec stime; struct btrfs_timespec rtime; + + __le64 generation_v3; + char label[BTRFS_LABEL_SIZE]; /* Add label to subvol */ + __le64 reserved[8]; /* for future */ } __attribute__ ((__packed__)); @@ -2428,6 +2432,8 @@ BTRFS_SETGET_STACK_FUNCS(root_last_snapshot, struct btrfs_root_item, last_snapshot, 64); BTRFS_SETGET_STACK_FUNCS(root_generation_v2, struct btrfs_root_item, generation_v2, 64); +BTRFS_SETGET_STACK_FUNCS(root_generation_v3, struct btrfs_root_item, +generation_v3, 64); BTRFS_SETGET_STACK_FUNCS(root_ctransid, struct btrfs_root_item, ctransid, 64); BTRFS_SETGET_STACK_FUNCS(root_otransid, struct btrfs_root_item, @@ -2441,6 +2447,14 @@ static inline bool btrfs_root_readonly(struct btrfs_root *root) { return (root-root_item.flags cpu_to_le64(BTRFS_ROOT_SUBVOL_RDONLY)) != 0; } +static inline char * btrfs_root_label(struct btrfs_root *root) +{ + return (root-root_item.label); +} +static inline void btrfs_root_set_label(struct btrfs_root *root, char *val) +{ + memcpy(root-root_item.label,val,BTRFS_LABEL_SIZE); +} /* struct btrfs_root_backup */ BTRFS_SETGET_STACK_FUNCS(backup_tree_root, struct btrfs_root_backup, diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index e58bd9d..cce0128 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3725,6 +3725,34 @@ static int btrfs_ioctl_set_label(struct btrfs_root *root, void __user *arg) return 0; } +static int btrfs_ioctl_subvol_getlabel(struct 
btrfs_root *root, + void __user *arg) +{ + char *label; + + label = btrfs_root_label(root); + if (copy_to_user(arg, label, BTRFS_LABEL_SIZE)) + return -EFAULT; + return 0; +} + +static int btrfs_ioctl_subvol_setlabel(struct btrfs_root *root, + void __user *arg) +{ + char label[BTRFS_LABEL_SIZE]; + struct btrfs_trans_handle *trans; + + if (copy_from_user(label, arg, BTRFS_LABEL_SIZE)) + return -EFAULT; + trans = btrfs_start_transaction(root, 1); + if (IS_ERR(trans)) + return PTR_ERR(trans); + btrfs_root_set_label(root, label); + btrfs_commit_transaction(trans, root); + + return 0; +} + long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { @@ -3827,6 +3855,10 @@ long btrfs_ioctl(struct file *file, unsigned int return btrfs_ioctl_get_label(root, argp); case BTRFS_IOC_SET_LABEL: return btrfs_ioctl_set_label(root, argp); + case BTRFS_IOC_SUBVOL_GETLABEL: + return btrfs_ioctl_subvol_getlabel(root, argp); + case BTRFS_IOC_SUBVOL_SETLABEL: + return btrfs_ioctl_subvol_setlabel(root, argp); } return -ENOTTY; diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h index 0c60fcb..1009a0c 100644 --- a/fs/btrfs/ioctl.h +++ b/fs/btrfs/ioctl.h @@ -455,4 +455,6 @@ struct btrfs_ioctl_send_args { struct btrfs_ioctl_get_dev_stats) #define BTRFS_IOC_GET_LABEL _IOR(BTRFS_IOCTL_MAGIC, 53, __u64) #define BTRFS_IOC_SET_LABEL _IOW(BTRFS_IOCTL_MAGIC, 54, __u64) +#define BTRFS_IOC_SUBVOL_GETLABEL _IOWR(BTRFS_IOCTL_MAGIC, 55, __u64) +#define BTRFS_IOC_SUBVOL_SETLABEL _IOW(BTRFS_IOCTL_MAGIC, 56, __u64) #endif diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c index eb923d0..2a9ae5f 100644 --- a/fs/btrfs/root-tree.c +++ b/fs/btrfs/root-tree.c @@ -35,32 +35,34 @@ void btrfs_read_root_item(struct btrfs_root *root, { uuid_le uuid; int len; - int need_reset = 0; len = btrfs_item_size_nr(eb, slot); read_extent_buffer(eb, item, btrfs_item_ptr_offset(eb, slot), min_t(int, len, (int)sizeof(*item))); - if (len sizeof(*item)) - need_reset = 1; - if (!need_reset 
btrfs_root_generation(item) - != btrfs_root_generation_v2(item)) { - if (btrfs_root_generation_v2(item) != 0) { - printk(KERN_WARNING btrfs: mismatching -
Re: [PATCH 2/5] Btrfs: fix missing flush when committing a transaction
On Thu, 1 Nov 2012 17:00:00 +0800, Liu Bo wrote:
> On Thu, Nov 01, 2012 at 04:16:43PM +0800, Miao Xie wrote:
>> On Thu, 1 Nov 2012 16:04:27 +0800, Liu Bo wrote:
>>> (sorry, forgot to cc linux-btrfs.)
>>>
>>> On Thu, Nov 01, 2012 at 03:51:41PM +0800, Miao Xie wrote:
>>>> On Thu, 1 Nov 2012 15:44:43 +0800, Liu Bo wrote:
>>>>> On Thu, Nov 01, 2012 at 03:33:14PM +0800, Miao Xie wrote:
>>>>>> Consider the following case:
>>>>>> Task1                           Task2
>>>>>> start_transaction
>>>>>>                                 commit_transaction
>>>>>>                                   check pending snapshots list and the
>>>>>>                                   list is empty.
>>>>>> add pending snapshot into list
>>>>>>                                   skip the delalloc flush
>>>>>> end_transaction
>>>>>> ...
>>>>>>
>>>>>> And then the problem that the snapshot is different with the source
>>>>>> subvolume happen.
>>>>>
>>>>> This is weird, create_snapshot() will first add pending snapshot into
>>>>> list and then commit the transaction itself, regardless of if the
>>>>> snapshot is different with others or not.
>>>>
>>>> But the transaction may be committed by the other task, and the
>>>> snapshot creation task just wait until it ends.
>>>
>>> It's possible that a commit transaction becomes an end transaction when
>>> it finds itself is already in commit.
>>>
>>> So if snapshot creation starts the transaction, it will increment the
>>> transaction's num_writers, why does not the other task wait for its
>>> end_transaction?
>>>
>>> I doubt if this can really happen anyway... Can you elaborate the
>>> situation more?
>>
>> Task1                           Task2
>> start_transaction
>>                                 start_transaction
>>                                 commit_transaction
>>                                   set in_commit to 1
>>                                   check pending snapshots list and the
>>                                   list is empty.
>> add pending snapshot into list
>>                                   skip the delalloc flush
>> commit_transaction
>>   find in_commit is 1
>>   end_transaction (num_writer--)
>>   wait_for_commit
>>                                   num_writer is 1
>>                                   continue committing the transaction
>> ...
>
> Make sense.
>
> Then I think we'd better put the flush part right after setting
> 'trans_no_join = 1'

No, or the flusher will be blocked when it joins a transaction.

> since snapshot creation may also join an existing transaction.

It is impossible because btrfs_start_transaction is different from
btrfs_join_transaction, it will be blocked when transaction->blocked is 1.
Snapshot creation uses btrfs_start_transaction.

Thanks
Miao
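Note: the interleaving discussed in this thread can be replayed in a single-threaded model to see why the moment of the pending-snapshot check matters. The sketch below is a heavily simplified illustration, not kernel code: the struct, its fields and the function name are hypothetical, and the only thing modelled is whether the committing task looks at the pending list before or after the other writer has left the transaction.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified transaction state. */
struct txn_model {
	int  num_writers;       /* tasks that have started the transaction */
	bool in_commit;         /* set by the task that performs the commit */
	bool snapshot_pending;  /* models a non-empty pending snapshots list */
};

/*
 * Replay the interleaving from the thread. Returns true if the committing
 * task (Task2) notices the pending snapshot, i.e. the delalloc flush
 * would run before the snapshot is created.
 */
bool snapshot_gets_flushed(bool check_after_writers_left)
{
	struct txn_model t = { 2, false, false };  /* Task1 and Task2 started */
	bool saw_pending = false;

	/* Task2: commit_transaction, set in_commit to 1. */
	t.in_commit = true;
	if (!check_after_writers_left)
		saw_pending = t.snapshot_pending;  /* early check: list is empty */

	/*
	 * Task1: add pending snapshot. Its own commit finds in_commit set,
	 * so it degrades to end_transaction (num_writers--) and waits.
	 */
	t.snapshot_pending = true;
	if (t.in_commit)
		t.num_writers--;

	/* Task2: it is now the only writer and continues the commit. */
	if (check_after_writers_left && t.num_writers == 1)
		saw_pending = t.snapshot_pending;  /* late check: sees snapshot */

	return saw_pending;
}
```

In this model the early check misses the snapshot and the late check sees it, which is the behaviour the patch description aims for by flushing once all other tasks have ended the transaction.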
Re: Need help mounting laptop corrupted root btrfs. Kernel BUG at fs/btrfs/volumes.c:3707 - FIXED
Marc MERLIN wrote (ao):
> That said, it's working fine again for now after I went back to kernel
> 3.5.3 (down from 3.6.3).
> It hasn't been long enough to say for sure, but there is a remote
> possibility that changes in 3.6 actually caused my drive to freeze after
> several hours of use.
> When that happened (3 times), 2 of those times, btrfs did not manage to
> write all its data before access was cut off, and I got the bug I
> reported here, which in turn crashes any kernel you try to mount the FS
> with.
> Cleaning the log manually fixed it both times so far.
>
> For now, I'll stick with 3.5.3 for a while to make sure my drive is
> actually ok (it seems to be after all), and once I'm happy that it's the
> case, I'll go back to 3.6.3 with serial console remote logging and try to
> capture the full sata failure I got with 3.6.3.

Thanks for the info. You could put some load on the ssd to see if you can
trigger an issue under 3.6.3(+) with btrfs filesystem scrub or badblocks
(in the default non-destructive mode).

Can you collect SMART data (with smartctl) from the ssd?

	Sander
Re: find-new possibility of showing modified and deleted files/directories
On Thu, 01 Nov 2012 06:06:57 +0100, Arne Jansen wrote:
> On 11/01/2012 02:28 AM, Shane Spencer wrote:
>> That's Plan B. I'll be making a btrfs stream decoder and doing in
>> place edits. I need to move stuff around to other filesystem types
>> otherwise I'd just store the stream or apply the stream to a remote
>> snapshot.
>
> That's the whole point of the btrfs-send design: It's very easy to
> receive on different filesystems. A generic receiver is in preparation.
> And to make it even more generic: A sender using the same stream format
> is also in preparation for zfs.

Consider the rsync bundle format as well. That should provide
interoperability with any filesystem.
Re: [PATCH 1/2 v4] Btrfs: snapshot-aware defrag
Hi Liubo, I couldn't apply your V4 patch against the btrfs-next HEAD. Do you have a github branch which I can checkout? Thanks, Itaru On Wed, Oct 31, 2012 at 9:55 PM, Liu Bo bo.li@oracle.com wrote: On 10/31/2012 08:13 PM, Itaru Kitayama wrote: Hi LiuBo: I am seeing another warning with your patch applied btrfs-next. Hi Itaru, Thanks for testing, you seems to be using an old version, since in the new version record_extent_backrefs() does not own a WARN_ON(). Could you please test it again with the new patches applied? thanks, liubo [ 5224.531560] [ cut here ] [ 5224.531565] WARNING: at fs/btrfs/inode.c:2054 record_extent_backrefs+0x87/0xe0() [ 5224.531567] Hardware name: Bochs [ 5224.531568] Modules linked in: microcode ppdev psmouse nfsd nfs_acl auth_rpcgss serio_raw nfs fscache lockd binfmt_misc sunrpc cirrus parport_pc ttm drm_kms_helper drm sysimgblt i2c_piix4 sysfillrect syscopyarea i2c_core lp parport floppy [ 5224.531591] Pid: 2485, comm: btrfs-endio-wri Tainted: GW 3.7.0-rc1-v11+ #53 [ 5224.531592] Call Trace: [ 5224.531598] [81061c63] warn_slowpath_common+0x93/0xc0 [ 5224.531600] [81061caa] warn_slowpath_null+0x1a/0x20 [ 5224.531603] [81322287] record_extent_backrefs+0x87/0xe0 [ 5224.531606] [8132d10b] btrfs_finish_ordered_io+0x8bb/0xa80 [ 5224.531611] [810ce300] ? trace_hardirqs_off_caller+0xb0/0x140 [ 5224.531614] [8132d2e5] finish_ordered_fn+0x15/0x20 [ 5224.531617] [8134beb7] worker_loop+0x157/0x580 [ 5224.531620] [8134bd60] ? btrfs_queue_worker+0x2f0/0x2f0 [ 5224.531624] [81090aa8] kthread+0xe8/0xf0 [ 5224.531627] [810ce3c2] ? get_lock_stats+0x22/0x70 [ 5224.531630] [810909c0] ? kthread_create_on_node+0x160/0x160 [ 5224.531634] [817c1c6c] ret_from_fork+0x7c/0xb0 [ 5224.531636] [810909c0] ? 
kthread_create_on_node+0x160/0x160 [ 5224.531638] ---[ end trace 0256d2b5a195208c ]--- I've compared some of the old extents logical addresses with the corresponding object ids and offsets from the extent tree; some are just 8k off from the found extents and some keys are totally off. Itaru On Sat, Oct 27, 2012 at 7:28 PM, Liu Bo bo.li@oracle.com wrote: This comes from one of btrfs's project ideas, As we defragment files, we break any sharing from other snapshots. The balancing code will preserve the sharing, and defrag needs to grow this as well. Now we're able to fill the blank with this patch, in which we make full use of backref walking stuff. Here is the basic idea, o set the writeback ranges started by defragment with flag EXTENT_DEFRAG o at endio, after we finish updating fs tree, we use backref walking to find all parents of the ranges and re-link them with the new COWed file layout by adding corresponding backrefs. Originally patch by Li Zefan l...@cn.fujitsu.com Signed-off-by: Liu Bo bo.li@oracle.com --- v3-v4: - fix duplicated refs bugs detected by mounting with autodefrag, thanks for the bug report from Mitch and Chris. 
 fs/btrfs/inode.c | 609 ++
 1 files changed, 609 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 85a1e50..35e6993 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -54,6 +54,7 @@
 #include "locking.h"
 #include "free-space-cache.h"
 #include "inode-map.h"
+#include "backref.h"
 
 struct btrfs_iget_args {
 	u64 ino;
@@ -1839,6 +1840,600 @@ out:
 	return ret;
 }
 
+/* snapshot-aware defrag */
+struct sa_defrag_extent_backref {
+	struct rb_node node;
+	struct old_sa_defrag_extent *old;
+	u64 root_id;
+	u64 inum;
+	u64 file_pos;
+	u64 extent_offset;
+	u64 num_bytes;
+	u64 generation;
+};
+
+struct old_sa_defrag_extent {
+	struct list_head list;
+	struct new_sa_defrag_extent *new;
+
+	u64 extent_offset;
+	u64 bytenr;
+	u64 offset;
+	u64 len;
+	int count;
+};
+
+struct new_sa_defrag_extent {
+	struct rb_root root;
+	struct list_head head;
+	struct btrfs_path *path;
+	struct inode *inode;
+	u64 file_pos;
+	u64 len;
+	u64 bytenr;
+	u64 disk_len;
+	u8 compress_type;
+};
+
+static int backref_comp(struct sa_defrag_extent_backref *b1,
+			struct sa_defrag_extent_backref *b2)
+{
+	if (b1->root_id < b2->root_id)
+		return -1;
+	else if (b1->root_id > b2->root_id)
+		return 1;
+
+	if (b1->inum < b2->inum)
+		return -1;
+	else if (b1->inum > b2->inum)
+		return 1;
+
+	if (b1->file_pos < b2->file_pos)
+		return -1;
+	else if (b1->file_pos > b2->file_pos)
+		return 1;
+
+	return 0;
+}
+
+static void
Re: [PATCH 1/2 v4] Btrfs: snapshot-aware defrag
On Thu, Nov 01, 2012 at 08:08:52PM +0900, Itaru Kitayama wrote: Hi Liubo, I couldn't apply your V4 patch against the btrfs-next HEAD. Do you have a github branch which I can checkout? The current btrfs-next HEAD actually have included this v4 patch, so just pull btrfs-next and give it a shot :) thanks, liubo
Re: find-new possibility of showing modified and deleted files/directories
On 01.11.2012 12:00, Gabriel wrote:
> On Thu, 01 Nov 2012 06:06:57 +0100, Arne Jansen wrote:
>> On 11/01/2012 02:28 AM, Shane Spencer wrote:
>>> That's Plan B. I'll be making a btrfs stream decoder and doing in
>>> place edits. I need to move stuff around to other filesystem types
>>> otherwise I'd just store the stream or apply the stream to a remote
>>> snapshot.
>>
>> That's the whole point of the btrfs-send design: It's very easy to
>> receive on different filesystems. A generic receiver is in preparation.
>> And to make it even more generic: A sender using the same stream format
>> is also in preparation for zfs.
>
> Consider the rsync bundle format as well. That should provide
> interoperability with any filesystem.

Rsync is an interactive protocol. The idea with send/receive is that the
stream can be generated without any interactions with receiver. You can
store the stream somewhere, or replay it to many destinations.
Re: Crashes in extent_io.c after btrfs bad mapping eb notice
On Thu, November 01, 2012 at 03:39 (+0100), Liu Bo wrote:
> On 11/01/2012 04:00 AM, Franke wrote:
>> Hi,
>>
>> since yesterday I have run a balance while asleep/at work. Now I
>> experimented a bit, and the situation has changed. I am now getting
>> hard hangs (system is gone without even writing anything to syslog),
>> some time (minutes to an hour) into running a scrub.
>>
>> Those hangs happen with 3.6.2, 3.6.4 and Jan's unstable version. It
>> hasn't hung yet without running a scrub.
>>
>> I have no idea if this is part of the same problem or something else.
>> Do you have any idea either way?
>
> Well, thanks for testing.
>
> We may need your sysrq-w output (maybe screen output) to locate where
> it hard hangs.
>
> Besides, I recommend you pick Jan's patches out, apply them on the
> latest btrfs upstream and run another round to see if it gets better,
> since there might be some fixes for this very hang already in the
> upstream.
>
> Right now the latest btrfs upstream's top commit is
>
> commit f46dbe3dee853f8a860f889cb2b7ff4c624f2a7a
> Author: Chris Mason chris.ma...@fusionio.com
> Date: Tue Oct 9 11:17:20 2012 -0400
>
>     btrfs: init ref_index to zero in add_inode_ref
>
>     Signed-off-by: Chris Mason chris.ma...@fusionio.com

This is an old top commit. The current cmason/master state is

commit c37b2b6269ee4637fb7cdb5da0d1e47215d57ce2
Author: Josef Bacik jba...@fusionio.com
Date: Mon Oct 22 15:51:44 2012 -0400

and includes my recent fixes. I don't really expect them to prevent
getting stuck anywhere.

sysrq+w output would be really helpful. I'm trying to reproduce the
problems in the meantime.

-Jan
[PATCH] Btrfs-progs: check block group used count and fix if specified
A user reported a problem where all of his block groups had invalid used
counts in the block group item. This patch walks the extent tree and counts
up the used amount for each block group. If the user specifies repair we can
set the correct used value and when the transaction commits we're all set.
This was reported and tested by a user and worked. Thanks,

Signed-off-by: Josef Bacik jba...@fusionio.com
---
 btrfsck.c | 107 +
 ctree.h   |   3 ++
 2 files changed, 110 insertions(+), 0 deletions(-)

diff --git a/btrfsck.c b/btrfsck.c
index 67f4a9d..a5f995d 100644
--- a/btrfsck.c
+++ b/btrfsck.c
@@ -3470,6 +3470,108 @@ static int check_extents(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
+static int check_block_group_used(struct btrfs_trans_handle *trans,
+				  struct btrfs_root *root,
+				  struct btrfs_block_group_cache *block_group,
+				  struct btrfs_path *path, int repair)
+{
+	struct extent_buffer *leaf;
+	struct btrfs_key key;
+	u64 used = 0;
+	int slot;
+	int err = 0;
+	int ret;
+
+	root = root->fs_info->extent_root;
+	key.objectid = min_t(u64, block_group->key.objectid,
+			     BTRFS_SUPER_INFO_OFFSET);
+	key.offset = 0;
+	key.type = BTRFS_EXTENT_ITEM_KEY;
+
+	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+	if (ret < 0)
+		return ret;
+
+	while (1) {
+		leaf = path->nodes[0];
+		slot = path->slots[0];
+
+		if (slot >= btrfs_header_nritems(leaf)) {
+			ret = btrfs_next_leaf(root, path);
+			if (ret < 0) {
+				err = ret;
+				break;
+			}
+			if (ret) {
+				ret = 0;
+				break;
+			}
+			continue;
+		}
+		btrfs_item_key_to_cpu(leaf, &key, slot);
+		if (key.objectid < block_group->key.objectid) {
+			path->slots[0]++;
+			continue;
+		}
+
+		if (key.objectid >=
+		    block_group->key.objectid + block_group->key.offset)
+			break;
+
+		if (key.type == BTRFS_EXTENT_ITEM_KEY)
+			used += key.offset;
+		path->slots[0]++;
+	}
+	btrfs_release_path(root, path);
+
+	if (!err && btrfs_block_group_used(&block_group->item) != used) {
+		fprintf(stderr, "Block group %llu has a wrong used amount, "
+			"used=%llu, actually used=%llu%s\n",
+			(unsigned long long)block_group->key.objectid,
+			(unsigned long long)
+			btrfs_block_group_used(&block_group->item),
+			(unsigned long long)used, repair ? ", fixing": "");
+		if (repair) {
+			btrfs_set_block_group_used(&block_group->item, used);
+			set_extent_bits(&root->fs_info->block_group_cache,
+					block_group->key.objectid,
+					block_group->key.objectid +
+					block_group->key.offset - 1,
+					EXTENT_DIRTY, GFP_NOFS);
+		}
+		err = 1;
+	}
+
+	return err;
+}
+
+static int check_block_groups_used(struct btrfs_trans_handle *trans,
+				   struct btrfs_root *root, int repair)
+{
+	struct btrfs_block_group_cache *block_group;
+	struct btrfs_path *path;
+	u64 bytenr = 0;
+	int ret;
+	int err = 0;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	path->reada = 2;
+	while ((block_group = btrfs_lookup_first_block_group(root->fs_info,
+							     bytenr))) {
+		ret = check_block_group_used(trans, root, block_group, path,
+					     repair);
+		if (ret && !err)
+			ret = err;
+		bytenr = block_group->key.objectid + block_group->key.offset;
+	}
+	btrfs_free_path(path);
+
+	return err;
+}
+
 static void print_usage(void)
 {
 	fprintf(stderr, "usage: btrfsck dev\n");
@@ -3574,6 +3676,11 @@ int main(int ac, char **av)
 	if (ret)
 		fprintf(stderr, "Errors found in extent allocation tree\n");
 
+	fprintf(stderr, "checking block groups used count\n");
+	ret = check_block_groups_used(trans, root, repair);
+	if (ret)
+		fprintf(stderr, "Errors found in block groups\n");
+
 	fprintf(stderr, "checking fs roots\n");
 	ret =
no space left on device.
So I have ended up in a state where I can't delete files with rm. The error
I get is "no space left on device". However, the filesystem is not even
close to full:

/dev/sdb1 38G 27G 9.5G 75%

There are about 800k files/dirs in this filesystem. Extra strange is that I
can create and delete files in another directory.

I tried pretty much everything I could google my way to, but the problem
persisted, so I decided to do a backup and a format. But when the backup was
done I tried one more time, and now it was possible to delete the directory
and all its content?

I am using the 3.5 kernel in Ubuntu 12.10. Is this a known issue? Is it
fixed in later kernels? fsck, btrfs scrub and the kernel log show nothing
that indicates a problem of any kind.
Re: [PATCH] Btrfs-progs: check block group used count and fix if specified
On Thu, Nov 01, 2012 at 06:34:54AM -0600, Josef Bacik wrote:
> A user reported a problem where all of his block groups had invalid used
> counts in the block group item. This patch walks the extent tree and
> counts up the used amount for each block group. If the user specifies
> repair we can set the correct used value and when the transaction commits
> we're all set. This was reported and tested by a user and worked. Thanks,

Josef and I hashed this out a little bit on irc. My fsck repair code
already tries to fix the block group accounting, but I think there is a
key part his code does differently (correctly ;):

> +static int check_block_groups_used(struct btrfs_trans_handle *trans,
> +				   struct btrfs_root *root, int repair)
> +{
> +	struct btrfs_block_group_cache *block_group;
> +	struct btrfs_path *path;
> +	u64 bytenr = 0;
> +	int ret;
> +	int err = 0;
> +
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
> +
> +	path->reada = 2;
> +	while ((block_group = btrfs_lookup_first_block_group(root->fs_info,
> +							     bytenr))) {
> +		ret = check_block_group_used(trans, root, block_group, path,
> +					     repair);
> +		if (ret && !err)
> +			ret = err;
> +		bytenr = block_group->key.objectid + block_group->key.offset;
> +	}
> +	btrfs_free_path(path);
> +
> +	return err;
> +}

My code reuses btrfs_fix_block_group_acounting, which does this:

	start = 0;
	while(1) {
		cache = btrfs_lookup_block_group(fs_info, start);
		if (!cache)
			break;
		start = cache->key.objectid + cache->key.offset;
		btrfs_set_block_group_used(&cache->item, 0);
		cache->space_info->bytes_used = 0;
		set_extent_bits(&root->fs_info->block_group_cache,
				cache->key.objectid,
				cache->key.objectid + cache->key.offset -1,
				BLOCK_GROUP_DIRTY, GFP_NOFS);
	}

Using btrfs_lookup_first_block_group here should fix things. It must be
breaking out too soon and so the accounting isn't updated properly.

-chris
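The "breaking out too soon" diagnosis above can be illustrated with a small stand-alone sketch. It rests on an assumption the thread does not spell out: that btrfs_lookup_block_group() only returns a block group that contains the given byte offset, while btrfs_lookup_first_block_group() returns the first block group that contains it or starts after it. The struct bg, the groups table and both lookup helpers below are hypothetical stand-ins for illustration, not btrfs-progs code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a block group covering [start, start + len). */
struct bg {
	uint64_t start;
	uint64_t len;
};

/* Sorted by start; note the hole between offsets 8 and 16. */
static const struct bg groups[] = {
	{ 0, 8 }, { 16, 8 }, { 24, 8 },
};
static const size_t nr_groups = sizeof(groups) / sizeof(groups[0]);

/* Assumed behaviour of btrfs_lookup_block_group(): containing group or NULL. */
static const struct bg *lookup_containing(uint64_t bytenr)
{
	for (size_t i = 0; i < nr_groups; i++)
		if (groups[i].start <= bytenr &&
		    bytenr < groups[i].start + groups[i].len)
			return &groups[i];
	return NULL;
}

/*
 * Assumed behaviour of btrfs_lookup_first_block_group(): the first group
 * that contains bytenr or begins after it, NULL past the last group.
 */
static const struct bg *lookup_first(uint64_t bytenr)
{
	for (size_t i = 0; i < nr_groups; i++)
		if (bytenr < groups[i].start + groups[i].len)
			return &groups[i];
	return NULL;
}

typedef const struct bg *(*lookup_fn)(uint64_t bytenr);

/* Same loop shape as both snippets in the thread; counts visited groups. */
static size_t count_visited(lookup_fn lookup)
{
	uint64_t start = 0;
	size_t visited = 0;
	const struct bg *cache;

	assert(lookup != NULL);
	while ((cache = lookup(start)) != NULL) {
		visited++;
		start = cache->start + cache->len;
	}
	return visited;
}
```

With the hole in the table, count_visited(lookup_containing) stops after the first group, whereas count_visited(lookup_first) reaches all three.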
Re: [PATCH 2/2] Btrfs: make snapshot-aware defrag as a mount option
On Sat, Oct 27, 2012 at 04:28:41AM -0600, Liu Bo wrote:
> This feature works on our crucial write endio path, so if we've got
> lots of fragments to process, it will be kind of a disaster to the
> performance, so I make such a change.
>
> One can benefit from it while mounting with '-o snap_aware_defrag'.

I think we should always prefer to maintain snapshot cloning as much as
possible, and have a specific option to defrag that makes it break the
clone in favor of removing fragmentation.

So, please keep the snapshot aware defrag the default ;)

Thanks for taking these patches up again!

-chris
[PATCH 1/2] Btrfs-progs: correcting misnamed parameter options for btrfs send
Unfortunately, the command line options for btrfs send were misnamed. To specify a base for an incremental snapshot transfer, the best choice is -i for incremental (was: -p). To optionally add snapshots existing on the receiver as clone sources, the best choice is -c (was: -i). Compatibily note: -i option was broken anyway, which makes it less critical reassigning it. For potential users of the old option style, we emit a fatal warning if the -p is used. Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net Reviewed-by: Alexander Block abloc...@googlemail.com --- cmds-send.c | 97 +++--- 1 files changed, 52 insertions(+), 45 deletions(-) diff --git a/cmds-send.c b/cmds-send.c index 9b47e70..9db65e9 100644 --- a/cmds-send.c +++ b/cmds-send.c @@ -234,7 +234,7 @@ out: return ERR_PTR(ret); } -static int do_send(struct btrfs_send *send, u64 root_id, u64 parent_root) +static int do_send(struct btrfs_send *send, u64 root_id, u64 base_root_id) { int ret; pthread_t t_read; @@ -286,7 +286,7 @@ static int do_send(struct btrfs_send *send, u64 root_id, u64 parent_root) io_send.clone_sources = (__u64*)send-clone_sources; io_send.clone_sources_count = send-clone_sources_count; - io_send.parent_root = parent_root; + io_send.parent_root = base_root_id; ret = ioctl(subvol_fd, BTRFS_IOC_SEND, io_send); if (ret) { ret = -errno; @@ -420,19 +420,20 @@ int cmd_send_start(int argc, char **argv) struct btrfs_send send; u32 i; char *mount_root = NULL; - char *snapshot_parent = NULL; + char *incremental_base = NULL; u64 root_id; - u64 parent_root_id = 0; + u64 base_root_id = 0; + int full_send = 1; memset(send, 0, sizeof(send)); send.dump_fd = fileno(stdout); - while ((c = getopt(argc, argv, vf:i:p:)) != -1) { + while ((c = getopt(argc, argv, vf:i:p:r:)) != -1) { switch (c) { case 'v': g_verbose++; break; - case 'i': { + case 'r': subvol = realpath(optarg, NULL); if (!subvol) { ret = -errno; @@ -455,19 +456,26 @@ int cmd_send_start(int argc, char **argv) add_clone_source(send, root_id); 
free(subvol); break; - } case 'f': outname = optarg; break; - case 'p': - snapshot_parent = realpath(optarg, NULL); - if (!snapshot_parent) { + case 'i': + if (incremental_base) { + fprintf(stderr, ERROR: you cannot have more than one base for incremental send (-i)\n); + return 1; + } + incremental_base = realpath(optarg, NULL); + if (!incremental_base) { ret = -errno; fprintf(stderr, ERROR: realpath %s failed. %s\n, optarg, strerror(-ret)); goto out; } + full_send = 0; break; + case 'p': + fprintf(stderr, ERROR: -p option was removed. use -i instead\n); + return 1; case '?': default: fprintf(stderr, ERROR: send args invalid.\n); @@ -504,17 +512,17 @@ int cmd_send_start(int argc, char **argv) if (ret 0) goto out; - if (snapshot_parent != NULL) { + if (incremental_base != NULL) { ret = get_root_id(send, - get_subvol_name(send, snapshot_parent), - parent_root_id); + get_subvol_name(send, incremental_base), + base_root_id); if (ret 0) { fprintf(stderr, ERROR: could not resolve root_id - for %s\n, snapshot_parent); + for %s\n, incremental_base); goto out; } - add_clone_source(send, parent_root_id); + add_clone_source(send, base_root_id); } for (i = optind; i argc; i++) { @@ -573,10 +581,13 @@ int cmd_send_start(int argc, char **argv) goto out; } - if (!parent_root_id) { - ret = find_good_parent(send, root_id, parent_root_id); - if (ret 0) -
[PATCH 0/2] Btrfs-progs: urgent fixes for btrfs send
Hi everybody,

We made a bad mistake with btrfs send command line arguments and we'd
better fix it before it's being widely used (read: *now*).

When using btrfs send as in the current master, the -i option does *not*
give you an incremental stream as you'd expect. There are two problems in
the back:
- The -i option adds clone sources and btrfs determines itself whether any
  of these should be used to generate an incremental stream.
- That determination is broken.

There is however a -p option to force generation of an incremental stream.

This is a must change in my opinion. We should be using send -i in the same
way zfs send is using it. I expect anything else to cause wide confusion.

Therefore, these 2 patches do the following:
- Turn -i into -r, which stands for remote or receiving side. The option is
  meant to tell btrfs which subvolumes exist on the receiver.
- Turn -p into -i. Yes, this is a clash.
- Fix the parent determination (required in combination with the -r option)
  in a way that we never overwrite a base for an incremental snapshot given
  with -i.

Outcome for people who are already used to the current way btrfs send works:

- btrfs send -p [base] [subvol]
  This command now prints a fatal error message to use -i instead.
  NEW: btrfs send -i [base] [subvol]

- btrfs send -i [snap] [subvol]
  This command now does exactly what one would have expected from the
  previous documentation: it should have automatically determined [snap] as
  base for an incremental stream. Now it really selects [snap] as base.

- btrfs send -p [base] -i [snap1] -i [snap2] [subvol]
  Prints the same error message as the first example.
  NEW: btrfs send -i [base] -r [snap1] -r [snap2] [subvol]

- btrfs send -i [snap1] -i [snap2] [subvol]
  Previously, determination of the best base was likely to fail, resulting
  in a full stream. Now you get a fatal error message for specifying -i
  multiple times.
  NEW: btrfs send -r [snap1] -r [snap2] [subvol]

Although this gives a change in command line options rather late in the
game, I'll emphasize again that I think it's a no-go to leave it as it
currently is. The outcome as outlined should be acceptable for anyone, I
don't see a case where this change does something completely different
after the change. Users will have to adapt to the corrected switches,
though.

For ease of management, you can fetch these patches from my git repo, based
on top of the current cmason/master:
	git://git.jan-o-sch.net/btrfs-progs for-chris

-Jan

Jan Schmidt (2):
  Btrfs-progs: correcting misnamed parameter options for btrfs send
  Btrfs-progs: bugfix for subvolume parent determination in btrfs send

 cmds-send.c  | 97 +++---
 send-utils.c |  4 +-
 2 files changed, 54 insertions(+), 47 deletions(-)
[PATCH 2/2] Btrfs-progs: bugfix for subvolume parent determination in btrfs send
We failed to add the default subvolume, because it has no ROOT_BACKREF_ITEM.
This made get_parent always fail for direct descendants of the default
subvolume, resulting in lots of full streams where incremental streams were
requested.

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
Reviewed-by: Alexander Block abloc...@googlemail.com
---
 send-utils.c | 4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/send-utils.c b/send-utils.c
index fcde5c2..d8d3972 100644
--- a/send-utils.c
+++ b/send-utils.c
@@ -240,7 +240,8 @@ int subvol_uuid_search_init(int mnt_fd, struct subvol_uuid_search *s)
 				memcpy(&root_item, root_item_ptr,
 				       sizeof(root_item));
 				root_item_valid = 1;
-			} else if (sh->type == BTRFS_ROOT_BACKREF_KEY) {
+			} else if (sh->type == BTRFS_ROOT_BACKREF_KEY ||
+				   root_item_valid) {
 				if (!root_item_valid)
 					goto skip;
 
@@ -274,7 +275,6 @@ int subvol_uuid_search_init(int mnt_fd, struct subvol_uuid_search *s)
 				subvol_uuid_search_add(s, si);
 				root_item_valid = 0;
 			} else {
-				root_item_valid = 0;
 				goto skip;
 			}
 
-- 
1.7.1
Re: find-new possibility of showing modified and deleted files/directories
On Thu, 01 Nov 2012 12:29:36 +0100, Arne Jansen wrote:
> On 01.11.2012 12:00, Gabriel wrote:
>> On Thu, 01 Nov 2012 06:06:57 +0100, Arne Jansen wrote:
>>> On 11/01/2012 02:28 AM, Shane Spencer wrote:
>>>> That's Plan B. I'll be making a btrfs stream decoder and doing in
>>>> place edits. I need to move stuff around to other filesystem types
>>>> otherwise I'd just store the stream or apply the stream to a remote
>>>> snapshot.
>>>
>>> That's the whole point of the btrfs-send design: It's very easy to
>>> receive on different filesystems. A generic receiver is in
>>> preparation. And to make it even more generic: A sender using the
>>> same stream format is also in preparation for zfs.
>>
>> Consider the rsync bundle format as well. That should provide
>> interoperability with any filesystem.
>
> Rsync is an interactive protocol. The idea with send/receive is that the
> stream can be generated without any interactions with receiver. You can
> store the stream somewhere, or replay it to many destinations.

Same with rsync's batch mode. Here is more about it:
http://manpages.ubuntu.com/manpages/precise/man1/rsync.1.html#contenttoc21
Re: [PATCH 0/2] Btrfs-progs: urgent fixes for btrfs send
On Thu, Nov 01, 2012 at 09:01:24AM -0600, Jan Schmidt wrote:
> Hi everybody,
>
> We made a bad mistake with btrfs send command line arguments and we'd
> better fix it before it's being widely used (read: *now*).

Ok, I do agree that -i was confusing. I didn't end up using it in my
backup scripts here.

How about:

Make -p and -i mean the same thing.
Add -r for what -i should have done.

This has the advantage of not breaking the people that did get working
btrfs send setups ;)

-chris
Re: [PATCH 0/2] Btrfs-progs: urgent fixes for btrfs send
On Thu, November 01, 2012 at 16:07 (+0100), Chris Mason wrote:
> On Thu, Nov 01, 2012 at 09:01:24AM -0600, Jan Schmidt wrote:
>> Hi everybody,
>>
>> We made a bad mistake with btrfs send command line arguments and we'd
>> better fix it before it's being widely used (read: *now*).
>
> Ok, I do agree that -i was confusing. I didn't end up using it in my
> backup scripts here.

Good we agree here :-)

> How about:
>
> Make -p and -i mean the same thing.
> Add -r for what -i should have done.
>
> This has the advantage of not breaking the people that did get working
> btrfs send setups ;)

I'd carefully argue that we're still in the position to break things,
because the 3.7 kernel isn't released and you cannot use btrfs send without
it. The number of users should be really small.

I prefer having a clean and painful cut over suffering from bad decisions
forever. That may not be the most popular opinion in the world. In the end,
I could live with -p and -i doing the same thing.

-Jan
Re: [PATCH 2/2] Btrfs: make snapshot-aware defrag as a mount option
On 11/01/2012 10:43 PM, Chris Mason wrote:
> On Sat, Oct 27, 2012 at 04:28:41AM -0600, Liu Bo wrote:
>> This feature works on our crucial write endio path, so if we've got
>> lots of fragments to process, it will be kind of a disaster to the
>> performance, so I make such a change.
>>
>> One can benefit from it while mounting with '-o snap_aware_defrag'.
>
> I think we should always prefer to maintain snapshot cloning as much as
> possible, and have a specific option to defrag that makes it break the
> clone in favor of removing fragmentation.

Oh yeah, so I was considering the existing btrfs partitions which have
already broken the cloning relationship.

> So, please keep the snapshot aware defrag the default ;)

All right, that'd be nice, just drop this patch.

thanks,
liubo

> Thanks for taking these patches up again!
>
> -chris
Re: Crashes in extent_io.c after btrfs bad mapping eb notice
On Thu, November 01, 2012 at 12:57 (+0100), Jan Schmidt wrote: I'm trying to reproduce the problems in the meantime. Looks like it worked :-/ And it also looks like it can either bug or deadlock, depending on the things going on in the kernel at the same time. I did a parallel fsmark on a qgroup enabled volume while scrubbing it, reaching at a page fault after four hours of iteration: 1[194521.851156] BUG: unable to handle kernel paging request at 880137c52a08 1[194659.159461] IP: [810e3642] __lock_acquire+0x62/0x1630 4[194659.231741] PGD 1e0c063 PUD be586067 PMD be745067 PTE 800137c52160 4[194659.311717] Oops: [#1] PREEMPT SMP DEBUG_PAGEALLOC 4[194659.375976] Modules linked in: btrfs mpt2sas scsi_transport_sas raid_class 4[194659.460230] CPU 6 4[194659.483318] Pid: 20466, comm: btrfs-scrub-3 Tainted: GW 3.6.0+ #3 Supermicro X8SIL/X8SIL 4[194659.595327] RIP: 0010:[810e3642] [810e3642] __lock_acquire+0x62/0x1630 4[194659.696829] RSP: 0018:880138ab7c50 EFLAGS: 00010046 4[194659.761725] RAX: 0046 RBX: 880137c52a08 RCX: 4[194659.848565] RDX: 0001 RSI: RDI: 880137c52a08 4[194659.935405] RBP: 880138ab7d20 R08: 0002 R09: 0001 4[194660.022245] R10: R11: R12: 8802273ba3b0 4[194660.108984] R13: 0002 R14: R15: 4[194660.195717] FS: () GS:88023720() knlGS: 4[194660.293997] CS: 0010 DS: ES: CR0: 8005003b 4[194660.363990] CR2: 880137c52a08 CR3: 01e0b000 CR4: 07e0 4[194660.450726] DR0: DR1: DR2: 4[194660.537564] DR3: DR6: 0ff0 DR7: 0400 4[194660.624406] Process btrfs-scrub-3 (pid: 20466, threadinfo 880138ab6000, task 8802273ba3b0) 4[194660.733189] Stack: 4[194660.758357] 0286 8802353a4000 8802273ba3b0 4[194660.848733] 880138ab7c90 0286 880138ab7d20 8802353a4000 4[194660.939212] 8802273baa78 8245f100 880138ab7cc0 0286 4[194661.029589] Call Trace: 4[194661.059969] [8109811a] ? del_timer_sync+0x8a/0xc0 4[194661.128964] [81098090] ? try_to_del_timer_sync+0x70/0x70 4[194661.205367] [a00a306a] ? 
worker_loop+0x35a/0x5b0 [btrfs] 4[194661.281688] [810e4ca5] lock_acquire+0x95/0x140 4[194661.347634] [a00a306a] ? worker_loop+0x35a/0x5b0 [btrfs] 4[194661.423964] [819380c0] _raw_spin_lock+0x40/0x80 4[194661.490953] [a00a306a] ? worker_loop+0x35a/0x5b0 [btrfs] 4[194661.567283] [a00a306a] worker_loop+0x35a/0x5b0 [btrfs] 4[194661.641539] [a00a2d10] ? btrfs_queue_worker+0x300/0x300 [btrfs] 4[194661.725249] [810ac3d6] kthread+0xa6/0xb0 4[194661.784961] [819409a4] kernel_thread_helper+0x4/0x10 4[194661.857120] [8193901d] ? retint_restore_args+0xe/0xe 4[194661.929294] [810ac330] ? __init_kthread_worker+0x70/0x70 4[194662.005632] [819409a0] ? gs_change+0xb/0xb 4[194662.067405] Code: 48 89 5d d8 4c 89 7d f8 45 0f 45 e8 85 c0 48 89 fb 4c 8b 55 10 0f 84 4e 04 00 00 44 8b 3d 2b be 0c 01 45 85 ff 0f 84 56 04 00 00 48 81 3b e0 5a 1f 82 b8 01 00 00 00 44 0f 44 e8 83 fe 01 0f 86 1[194662.302652] RIP [810e3642] __lock_acquire+0x62/0x1630 4[194662.375973] RSP 880138ab7c50 4[194662.418821] CR2: 880137c52a08 4[194662.460051] ---[ end trace 85e160ea023efd39 ]--- debug config enabled: CONFIG_DEBUG_PAGEALLOC=y CONFIG_SLUB_DEBUG=y CONFIG_DEBUG_FS=y CONFIG_DEBUG_KERNEL=y CONFIG_LOCKDEP=y -Jan -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2 v4] Btrfs: snapshot-aware defrag
On 11/01/2012 10:05 PM, Itaru Kitayama wrote:
> Hi Liubo:
>
> The V4 leaves only warnings from btrfs_destroy_inode(). So, you think it's
> normal that a recorded old extent can be removed from the extent tree by the
> time relink_file_extents() is invoked?

Yeah, it could be, but only if we run delayed refs in time. I don't think that happens often, since we run delayed refs when their number reaches the limit (64).

thanks,
liubo

> Itaru
>
> On Thu, Nov 1, 2012 at 8:21 PM, Liu Bo bo.li@oracle.com wrote:
>> The current btrfs-next HEAD already includes this v4 patch, so just pull
>> btrfs-next and give it a shot :)
>>
>> thanks,
>> liubo
Re: Need help mounting laptop corrupted root btrfs. Kernel BUG at fs/btrfs/volumes.c:3707 - FIXED
On Thu, Nov 01, 2012 at 11:56:18AM +0100, Sander wrote:
>> For now, I'll stick with 3.5.3 for a while to make sure my drive is
>> actually ok (it seems to be after all), and once I'm happy that it's the
>> case, I'll go back to 3.6.3 with serial console remote logging and try to
>> capture the full sata failure I got with 3.6.3.
>
> Thanks for the info. You could put some load on the ssd to see if you can
> trigger an issue under 3.6.3(+) with btrfs filesystem scrub or badblocks
> (in the default non-destructive mode).

I'll try this in a few days, once I've first confirmed that my SSD is still 100% stable under 3.5.3 (so far it is). After that, I'll go back to 3.6.3 and see what it takes to crash it.

But as per my original report and http://marc.merlins.org/tmp/crash.jpg this does look like a sata layer problem, which btrfs isn't responsible for.

There is also still the unaddressed bug that, when this does happen, btrfs can end up in a state where the filesystem is unmountable without manually fixing it.

> Can you collect SMART data (with smartctl) from the ssd?

I did actually have a look, but to be honest, SSDs have pretty useless smart data overall. Mine's likely a bit worse than the average even.

gandalfthegreat:~# smartctl -a /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.5.3-amd64-preempt-noide-20120903] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     OCZ-VERTEX4
Serial Number:    OCZ-26W4VJ3SP32E1WC2
LU WWN Device Id: 5 e83a97 59be3b57e
Firmware Version: 1.5
User Capacity:    512,110,190,592 bytes [512 GB]
Sector Size:      512 bytes logical/physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   9
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Nov 1 09:14:43 2012 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity was never started.
    Auto Offline Data Collection: Disabled.
Self-test execution status: ( 249) Self-test routine in progress...
    90% of test remaining.
Total time to complete Offline data collection: (0) seconds.
Offline data collection capabilities: (0x1d) SMART execute Offline immediate.
    No Auto Offline data collection support.
    Abort Offline collection upon new command.
    Offline surface scan supported.
    Self-test supported.
    No Conveyance Self-test supported.
    No Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode.
    Supports SMART auto save timer.
Error logging capability: (0x00) Error logging NOT supported.
    General Purpose Logging supported.
Short self-test routine recommended polling time: ( 0) minutes.
Extended self-test routine recommended polling time: ( 0) minutes.

SMART Attributes Data Structure revision number: 18
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG VALUE WORST THRESH TYPE    UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x   006   000   000    Old_age Offline -           6
  3 Spin_Up_Time            0x   100   100   000    Old_age Offline -           0
  4 Start_Stop_Count        0x   100   100   000    Old_age Offline -           0
  5 Reallocated_Sector_Ct   0x   100   100   000    Old_age Offline -           8
  9 Power_On_Hours          0x   100   100   000    Old_age Offline -           1210
 12 Power_Cycle_Count       0x   100   100   000    Old_age Offline -           240
232 Available_Reservd_Space 0x   100   100   000    Old_age Offline -           8019542246
233 Media_Wearout_Indicator 0x   099   000   000    Old_age Offline -           99

SMART Error Log not supported
Warning! SMART Self-Test Log Structure error: invalid SMART checksum.
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

Device does not support Selective Self Tests/Logging

Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems
Re: find-new possibility of showing modified and deleted files/directories
On Wed, Oct 31, 2012 at 9:06 PM, Arne Jansen sensi...@gmx.net wrote:
> On 11/01/2012 02:28 AM, Shane Spencer wrote:
>> That's Plan B. I'll be making a btrfs stream decoder and doing in place
>> edits. I need to move stuff around to other filesystem types otherwise
>> I'd just store the stream or apply the stream to a remote snapshot.
>
> That's the whole point of the btrfs-send design: It's very easy to receive
> on different filesystems. A generic receiver is in preparation. And to make
> it even more generic: A sender using the same stream format is also in
> preparation for zfs.

You just made my day. I will probably be approaching a lot of this from Python as well, so I'm very interested in the stream format itself.
Re: [PATCH 0/2] Btrfs-progs: urgent fixes for btrfs send
On Thu, Nov 01, 2012 at 09:48:25AM -0600, Jan Schmidt wrote:
> On Thu, November 01, 2012 at 16:07 (+0100), Chris Mason wrote:
>> On Thu, Nov 01, 2012 at 09:01:24AM -0600, Jan Schmidt wrote:
>>> Hi everybody,
>>>
>>> We made a bad mistake with btrfs send command line arguments and we'd
>>> better fix it before it's being widely used (read: *now*).
>>
>> Ok, I do agree that -i was confusing. I didn't end up using it in my
>> backup scripts here.
>
> Good we agree here :-)
>
>> How about:
>>
>> Make -p and -i mean the same thing.
>> Add -r for what -i should have done.
>>
>> This has the advantage of not breaking the people that did get working
>> btrfs send setups ;)
>
> I'd carefully argue that we're still in the position to break things,
> because the 3.7 kernel isn't released and you cannot use btrfs send without
> it. The number of users should be really small. I prefer having a clean and
> painful cut over suffering from bad decisions forever. That may not be the
> most popular opinion in the world.
>
> In the end, I could live with -p and -i doing the same thing. But we have a
> working -p, I'm not sure why we'd rename it to -i?

I'm even fine with just flat out removing -i. This is mostly because --parent makes a lot of sense to me, but I'm more than open to other ideas.

-chris
[PATCH] Btrfs: use radix tree tagging to keep track of dirty ebs
Currently we set dirty bits in the transaction's dirty eb's in order to know what we have to write out on transaction commit. This means for every eb we allocate we have to allocate the corresponding extent state in the dirty pages tree. We also only change this tree on commit, so we could end up looking at ranges that we've already written. By using the radix tagging we can avoid the memory allocation altogether, which is a step we need in order to do non-blocking COWs. This also clears the radix tag when we write dirty eb's, so if we write buffers because of memory pressure we won't come back and do all the checking at transaction commit. This ran with my fs_mark billion files test and didn't regress in performance, and it passed xfstests without issues. Thanks,

Signed-off-by: Josef Bacik jba...@fusionio.com
---
 fs/btrfs/disk-io.c     | 87 --
 fs/btrfs/extent-tree.c |9 ++-
 fs/btrfs/extent_io.c   | 160 +--
 fs/btrfs/extent_io.h   |7 ++
 fs/btrfs/transaction.c | 19 +-
 fs/btrfs/transaction.h |1 -
 6 files changed, 197 insertions(+), 86 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 0643159..305bf35 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -61,9 +61,7 @@ static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
 				      struct btrfs_root *root);
 static void btrfs_destroy_pending_snapshots(struct btrfs_transaction *t);
 static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root);
-static int btrfs_destroy_marked_extents(struct btrfs_root *root,
-					struct extent_io_tree *dirty_pages,
-					int mark);
+static int btrfs_destroy_marked_extents(struct btrfs_root *root);
 static int btrfs_destroy_pinned_extent(struct btrfs_root *root,
 				       struct extent_io_tree *pinned_extents);
 
@@ -3263,7 +3261,7 @@ int btrfs_commit_super(struct btrfs_root *root)
 	ret = btrfs_commit_transaction(trans, root);
 	if (ret)
 		return ret;
-	ret = btrfs_write_and_wait_transaction(NULL, root);
+	ret = filemap_write_and_wait(root->fs_info->btree_inode->i_mapping);
 	if (ret) {
 		btrfs_error(root->fs_info, ret,
 			    "Failed to sync btree inode to disk.");
@@ -3650,61 +3648,34 @@ static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root)
 	spin_unlock(&root->fs_info->delalloc_lock);
 }
 
-static int btrfs_destroy_marked_extents(struct btrfs_root *root,
-					struct extent_io_tree *dirty_pages,
-					int mark)
+static int btrfs_destroy_marked_extents(struct btrfs_root *root)
 {
-	int ret;
-	struct page *page;
 	struct inode *btree_inode = root->fs_info->btree_inode;
-	struct extent_buffer *eb;
-	u64 start = 0;
-	u64 end;
-	u64 offset;
-	unsigned long index;
+	struct extent_io_tree *tree = &BTRFS_I(btree_inode)->io_tree;
+	struct radix_tree_iter iter;
+	void **slot;
+	int mark = PAGECACHE_TAG_DIRTY;
 
-	while (1) {
-		ret = find_first_extent_bit(dirty_pages, start, &start, &end,
-					    mark, NULL);
-		if (ret)
-			break;
+again:
+	spin_lock_irq(&tree->buffer_lock);
+	radix_tree_for_each_tagged(slot, &tree->buffer, &iter, 0, mark) {
+		struct extent_buffer *eb;
 
-		clear_extent_bits(dirty_pages, start, end, mark, GFP_NOFS);
-		while (start <= end) {
-			index = start >> PAGE_CACHE_SHIFT;
-			start = (u64)(index + 1) << PAGE_CACHE_SHIFT;
-			page = find_get_page(btree_inode->i_mapping, index);
-			if (!page)
-				continue;
-			offset = page_offset(page);
-
-			spin_lock(&dirty_pages->buffer_lock);
-			eb = radix_tree_lookup(
-			     &(&BTRFS_I(page->mapping->host)->io_tree)->buffer,
-					       offset >> PAGE_CACHE_SHIFT);
-			spin_unlock(&dirty_pages->buffer_lock);
-			if (eb)
-				ret = test_and_clear_bit(EXTENT_BUFFER_DIRTY,
-							 &eb->bflags);
-			if (PageWriteback(page))
-				end_page_writeback(page);
-
-			lock_page(page);
-			if (PageDirty(page)) {
-				clear_page_dirty_for_io(page);
-				spin_lock_irq(&page->mapping->tree_lock);
-				radix_tree_tag_clear(&page->mapping->page_tree,
-
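
The tagging idea in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions: it does not use the kernel radix tree API, and every name in it (toy_index, toy_tag_dirty, toy_write_dirty, TOY_SLOTS) is hypothetical. It shows the two properties the message relies on: tagging a buffer dirty needs no extra allocation, and writing a buffer clears its tag, so a later pass does not revisit it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOTS 64 /* hypothetical capacity: one 64-bit word of tags */

struct toy_eb {          /* stand-in for struct extent_buffer */
	uint64_t start;
	bool in_use;
};

struct toy_index {       /* stand-in for the tagged buffer tree */
	struct toy_eb slots[TOY_SLOTS];
	uint64_t dirty_tags; /* bit i set means slot i is tagged dirty */
};

/* Tag a slot dirty. Setting a bit needs no memory allocation. */
static void toy_tag_dirty(struct toy_index *idx, unsigned int slot)
{
	assert(slot < TOY_SLOTS);
	idx->slots[slot].in_use = true;
	idx->dirty_tags |= UINT64_C(1) << slot;
}

/*
 * "Write out" every tagged buffer and clear its tag as we go.
 * Returns the number of buffers visited. Real code would submit I/O
 * where the comment below indicates.
 */
static size_t toy_write_dirty(struct toy_index *idx)
{
	size_t written = 0;
	unsigned int slot;

	for (slot = 0; slot < TOY_SLOTS; slot++) {
		uint64_t bit = UINT64_C(1) << slot;

		if (!(idx->dirty_tags & bit))
			continue;
		/* I/O for idx->slots[slot] would be submitted here. */
		idx->dirty_tags &= ~bit;
		written++;
	}
	return written;
}
```

If a writeback pass triggered by memory pressure calls toy_write_dirty() first, a second call at "commit" time finds no tagged slots and does no work, which mirrors the behaviour the patch description aims for.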
Re: [Request for review] [RFC] Add label support for snapshots and subvols
> Below is a demo of this new feature.
>
> btrfs fi label -t /btrfs/sv1 Prod-DB
> btrfs fi label -t /btrfs/sv1
> Prod-DB
> btrfs su snap /btrfs/sv1 /btrfs/snap1-sv1
> Create a snapshot of '/btrfs/sv1' in '/btrfs/snap1-sv1'
> btrfs fi label -t /btrfs/snap1-sv1
> btrfs fi label -t /btrfs/snap1-sv1 Prod-DB-sand-box-testing
> btrfs fi label -t /btrfs/snap1-sv1
> Prod-DB-sand-box-testing

Why is this better than:

# btrfs su snap /btrfs/Prod-DB /btrfs/Prod-DB-sand-box-testing
# mv /btrfs/Prod-DB-sand-box-testing /btrfs/Prod-DB-production-test
# ls /btrfs/
Prod-DB  Prod-DB-production-test
Re: [Request for review] [RFC] Add label support for snapshots and subvols
On Fri, Nov 2, 2012 at 5:16 AM, cwillu cwi...@cwillu.com wrote:
>> btrfs fi label -t /btrfs/snap1-sv1
>> Prod-DB-sand-box-testing
>
> Why is this better than:
>
> # btrfs su snap /btrfs/Prod-DB /btrfs/Prod-DB-sand-box-testing
> # mv /btrfs/Prod-DB-sand-box-testing /btrfs/Prod-DB-production-test
> # ls /btrfs/
> Prod-DB  Prod-DB-production-test

... because it would make it possible to decouple the subvol name from whatever-data-you-need (in this case, a label).

My request, though, is to just implement properties, and USER properties, like what we have in zfs. This seems to be a cleaner, saner approach. For example, this is on Ubuntu + zfsonlinux:

# zfs create rpool/u
# zfs set user:label="Some test filesystem" rpool/u
# zfs get creation,user:label rpool/u
NAME     PROPERTY    VALUE                 SOURCE
rpool/u  creation    Fri Nov 2 5:24 2012   -
rpool/u  user:label  Some test filesystem  local

More info about zfs user properties here: http://docs.oracle.com/cd/E19082-01/817-2271/gdrcw/index.html

-- 
Fajar
Re: [Request for review] [RFC] Add label support for snapshots and subvols
On Fri, Nov 02, 2012 at 05:28:01AM +0700, Fajar A. Nugraha wrote:
> On Fri, Nov 2, 2012 at 5:16 AM, cwillu cwi...@cwillu.com wrote:
>>> btrfs fi label -t /btrfs/snap1-sv1
>>> Prod-DB-sand-box-testing
>>
>> Why is this better than:
>>
>> # btrfs su snap /btrfs/Prod-DB /btrfs/Prod-DB-sand-box-testing
>> # mv /btrfs/Prod-DB-sand-box-testing /btrfs/Prod-DB-production-test
>> # ls /btrfs/
>> Prod-DB  Prod-DB-production-test
>
> ... because it would make it possible to decouple the subvol name from
> whatever-data-you-need (in this case, a label).
>
> My request, though, is to just implement properties, and USER properties,
> like what we have in zfs. This seems to be a cleaner, saner approach. For
> example, this is on Ubuntu + zfsonlinux:
>
> # zfs create rpool/u
> # zfs set user:label="Some test filesystem" rpool/u
> # zfs get creation,user:label rpool/u
> NAME     PROPERTY    VALUE                 SOURCE
> rpool/u  creation    Fri Nov 2 5:24 2012   -
> rpool/u  user:label  Some test filesystem  local

Don't we already have an equivalent to that with user xattrs?

Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- I spent most of my money on drink, women and fast cars. The ---
rest I wasted. -- James Hunt
Re: [Request for review] [RFC] Add label support for snapshots and subvols
On Fri, Nov 2, 2012 at 5:32 AM, Hugo Mills h...@carfax.org.uk wrote:
> On Fri, Nov 02, 2012 at 05:28:01AM +0700, Fajar A. Nugraha wrote:
>> On Fri, Nov 2, 2012 at 5:16 AM, cwillu cwi...@cwillu.com wrote:
>>>> btrfs fi label -t /btrfs/snap1-sv1
>>>> Prod-DB-sand-box-testing
>>>
>>> Why is this better than:
>>>
>>> # btrfs su snap /btrfs/Prod-DB /btrfs/Prod-DB-sand-box-testing
>>> # mv /btrfs/Prod-DB-sand-box-testing /btrfs/Prod-DB-production-test
>>> # ls /btrfs/
>>> Prod-DB  Prod-DB-production-test
>>
>> ... because it would make it possible to decouple the subvol name from
>> whatever-data-you-need (in this case, a label).
>>
>> My request, though, is to just implement properties, and USER properties,
>> like what we have in zfs. This seems to be a cleaner, saner approach. For
>> example, this is on Ubuntu + zfsonlinux:
>>
>> # zfs create rpool/u
>> # zfs set user:label="Some test filesystem" rpool/u
>> # zfs get creation,user:label rpool/u
>> NAME     PROPERTY    VALUE                 SOURCE
>> rpool/u  creation    Fri Nov 2 5:24 2012   -
>> rpool/u  user:label  Some test filesystem  local
>
> Don't we already have an equivalent to that with user xattrs?
>
> Hugo.

Anand did say one way to implement the label is by using attr, so +1 from me for that approach.

-- 
Fajar
Re: [Request for review] [RFC] Add label support for snapshots and subvols
On 11/01/2012 11:28 PM, Fajar A. Nugraha wrote:
> On Fri, Nov 2, 2012 at 5:16 AM, cwillu cwi...@cwillu.com wrote:
>>> btrfs fi label -t /btrfs/snap1-sv1
>>> Prod-DB-sand-box-testing
>>
>> Why is this better than:
>>
>> # btrfs su snap /btrfs/Prod-DB /btrfs/Prod-DB-sand-box-testing
>> # mv /btrfs/Prod-DB-sand-box-testing /btrfs/Prod-DB-production-test
>> # ls /btrfs/
>> Prod-DB  Prod-DB-production-test
>
> ... because it would make it possible to decouple the subvol name from
> whatever-data-you-need (in this case, a label).

Could you elaborate on how this solution differs from using an xattr? I also think that these labels could be changed (like an xattr).

This information should be associated with the inode of the subvolume root. We could use a specific name in the system namespace, like system.btrfslabel, even though I didn't see any advantage to using the user namespace...

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
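
For readers unfamiliar with the xattr approach raised in this thread, here is a minimal sketch of storing a label as an extended attribute from C. It assumes Linux and a filesystem that accepts user xattrs. The attribute name user.label and the helper names set_label and get_label are hypothetical, chosen only for illustration; the thread itself floats system.btrfslabel and does not settle on a name.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <unistd.h>

/* Hypothetical attribute name; the thread does not agree on one. */
#define LABEL_XATTR "user.label"

/* Store a label on path. Returns 0 on success, -1 with errno set. */
static int set_label(const char *path, const char *label)
{
	/* flags == 0: create the attribute or replace an existing one */
	return setxattr(path, LABEL_XATTR, label, strlen(label), 0);
}

/*
 * Read the label into buf and NUL-terminate it.
 * Returns the label length, or -1 with errno set on failure.
 */
static ssize_t get_label(const char *path, char *buf, size_t bufsize)
{
	ssize_t n;

	if (bufsize == 0) {
		errno = ERANGE;
		return -1;
	}
	/* Leave room for the terminator; xattr values are not strings. */
	n = getxattr(path, LABEL_XATTR, buf, bufsize - 1);
	if (n >= 0)
		buf[n] = '\0';
	return n;
}
```

Calling set_label() on the directory that is the subvolume root would attach the label to that inode, which is the association suggested above. Whether a snapshot should inherit the attribute is a policy question this sketch does not address.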
Re: [PATCH 5/5] Btrfs: fix missing log when BTRFS_INODE_NEEDS_FULL_SYNC is set
On Thu, 1 Nov 2012 17:21:12 +0800, Liu Bo wrote:
> On Thu, Nov 01, 2012 at 03:35:23PM +0800, Miao Xie wrote:
>> If we set BTRFS_INODE_NEEDS_FULL_SYNC, we should log all the extents, but
>> now we forget to take it into account and set a wrong max key. If so, we
>> will skip the file extent metadata when doing logging. Fix it.
>
> But it's along with LOG_INODE_EXISTS, which is set by rename and link and
> means we need to log just enough to rebuild the inode during log replay.
>
> On the other side, if we do log all the extents because of having set
> BTRFS_INODE_NEEDS_FULL_SYNC, we don't know if we actually get what we want
> because rename and link do not wait for dirty pages as fsync does.

Full sync is the safest way to log. I think we should not ignore it since it is set.

Thanks
Miao

> thanks,
> liubo
>
>> Signed-off-by: Miao Xie mi...@cn.fujitsu.com
>> ---
>>  fs/btrfs/tree-log.c |5 -
>>  1 files changed, 4 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
>> index f7e9387..c495b47 100644
>> --- a/fs/btrfs/tree-log.c
>> +++ b/fs/btrfs/tree-log.c
>> @@ -3394,7 +3394,10 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
>>  	/* today the code can only do partial logging of directories */
>> -	if (inode_only == LOG_INODE_EXISTS || S_ISDIR(inode->i_mode))
>> +	if (S_ISDIR(inode->i_mode) ||
>> +	    (!test_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
>> +		       &BTRFS_I(inode)->runtime_flags) &&
>> +	     inode_only == LOG_INODE_EXISTS))
>>  		max_key.type = BTRFS_XATTR_ITEM_KEY;
>>  	else
>>  		max_key.type = (u8)-1;
>> --
>> 1.7.6.5
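
The condition changed by the patch quoted in this message can be read as a small truth table. The sketch below restates it as a stand-alone function so the cases are easy to check. It is a user-space model, not kernel code: the names toy_log_mode, toy_max_key_type and the TOY_* constants are hypothetical stand-ins, and the numeric key values are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for LOG_INODE_ALL / LOG_INODE_EXISTS. */
enum toy_log_mode { TOY_LOG_INODE_ALL, TOY_LOG_INODE_EXISTS };

/* Illustrative values standing in for BTRFS_XATTR_ITEM_KEY and (u8)-1. */
#define TOY_XATTR_ITEM_KEY 24u
#define TOY_MAX_KEY_TYPE   255u

/*
 * Pick the upper bound for the item types that get logged.
 * Directories are always partially logged. A regular inode is partially
 * logged only when LOG_INODE_EXISTS was requested and no full sync is
 * pending; otherwise every item type, file extents included, is logged.
 */
static uint8_t toy_max_key_type(bool is_dir, bool needs_full_sync,
				enum toy_log_mode mode)
{
	if (is_dir || (!needs_full_sync && mode == TOY_LOG_INODE_EXISTS))
		return TOY_XATTR_ITEM_KEY;
	return TOY_MAX_KEY_TYPE;
}
```

The case under discussion is a regular file with needs_full_sync set and TOY_LOG_INODE_EXISTS requested: before the patch the bound stops at the xattr key, after it the bound covers all item types. Liu Bo's objection concerns whether logging the extents is meaningful in that case, which this model does not capture.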