Re: Recover btrfs volume which can only be mounted in read-only mode
On Mon, Oct 26, 2015 at 09:14:00AM +, Duncan wrote:
> Dmitry Katsubo posted on Sun, 18 Oct 2015 11:44:08 +0200 as excerpted:
>
> >> Meanwhile, the present btrfs raid1 read-scheduler is both pretty simple
> >> to code up and pretty simple to arrange tests for that run either one
> >> side or the other, but not both, or that are well balanced to both.
> >> However, it's pretty poor in terms of ensuring optimized real-world
> >> deployment read-scheduling.
> >>
> >> What it does is simply this. Remember, btrfs raid1 is specifically two
> >> copies. It chooses which copy of the two will be read very simply,
> >> based on the PID making the request. Odd PIDs get assigned one copy,
> >> even PIDs the other. As I said, simple to code, great for ensuring
> >> testing of one copy or the other or both, but not really optimized at
> >> all for real-world usage.
> >>
> >> If your workload happens to be a bunch of all odd or all even PIDs,
> >> well, enjoy your testing-grade read-scheduler, bottlenecking everything
> >> reading one copy, while the other sits entirely idle.
> >
> > I think PID-based solution is not the best one. Why not simply take a
> > random device? Then at least all drives in the volume are equally loaded
> > (in average).
>
> Nobody argues that the even/odd-PID-based read-scheduling solution is
> /optimal/, in a production sense at least. But at the time and for the
> purpose it was written it was pretty good, arguably reasonably close to
> "best", because the implementation is at once simple and transparent for
> debugging purposes, and real easy to test either one side or the other,
> or both, and equally important, to duplicate the results of those tests,
> by simply arranging for the testing to have either all even or all odd
> PIDs, or both. And for ordinary use, it's good /enough/, as ordinarily,
> PIDs will be evenly distributed even/odd.
>
> In that context, your random device read-scheduling algorithm would be
> far worse, because while being reasonably simple, it's anything *but*
> easy to ensure reads go to only one side or equally to both, or for that
> matter, to duplicate the tests, because randomization, by definition
> does /not/ lend itself to duplication.

For what it's worth, David tried implementing round-robin (IIRC) some
time ago, and found that it performed *worse* than the pid-based system.
(It may have been random, but memory says it was round-robin).

Hugo.

--
Hugo Mills             | Great films about cricket: The Umpire Strikes Back
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4          |
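The even/odd rule described in the quoted text can be sketched in a few
lines of C. This is only an illustration of the idea, not the kernel
code: pick_mirror_by_pid() and count_mirror_load() are hypothetical
names, and the sketch assumes the choice is simply the PID modulo the
number of copies.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a kernel function): choose which mirror
 * serves a read, based only on the PID of the requesting process.
 * With two copies, even PIDs get mirror 0 and odd PIDs get mirror 1.
 */
static int pick_mirror_by_pid(int pid, int num_copies)
{
	assert(pid >= 0 && num_copies > 0);
	return pid % num_copies;
}

/* Count how many reads each of two mirrors receives for a set of PIDs. */
static void count_mirror_load(const int *pids, int n, int load[2])
{
	load[0] = 0;
	load[1] = 0;
	for (int i = 0; i < n; i++)
		load[pick_mirror_by_pid(pids[i], 2)]++;
}
```

Feeding count_mirror_load() a mixed set of PIDs spreads the reads over
both mirrors, while a set of all-even PIDs puts every read on mirror 0.
That is both the testability argument and the bottleneck complaint made
above.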
Re: [RFC PATCH 2/3] btrfs-progs: kernel based default features for mkfs
Thanks Jeff for the comments.

On 10/23/2015 11:24 PM, Jeff Mahoney wrote:
> On 10/21/15 4:45 AM, Anand Jain wrote:
>> mkfs from latest btrfs-progs will enable latest default features,
>> and if the kernel is down-rev and does not support a latest default
>> feature then mount fails, as expected.
>>
>> This patch disables default features based on the running kernel.
>
> I like the idea generally based on comments further in the thread, but
> what I don't like is:
> 1) It's silent.

Will add warning.

> 2) There's no way to override it.
>
> If we're going to change the defaults at runtime, we should tell the
> user what has changed and why. Otherwise, an identical mkfs.btrfs binary
> will behave differently on different systems without feedback and that
> violates the principle of least surprise.
>
> If they want to do what Qu suggests later in the thread, where the
> device is being prepared for use on a newer kernel, it should be
> possible to force it. The normal -f should be fine there.

Still have no idea how to get this, trying.

Thanks, Anand

> -Jeff
>
>> Signed-off-by: Anand Jain
>> ---
>>  mkfs.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/mkfs.c b/mkfs.c
>> index a5802f7..2b9d734 100644
>> --- a/mkfs.c
>> +++ b/mkfs.c
>> @@ -1357,10 +1357,13 @@ int main(int ac, char **av)
>>  	int dev_cnt = 0;
>>  	int saved_optind;
>>  	char fs_uuid[BTRFS_UUID_UNPARSED_SIZE] = { 0 };
>> -	u64 features = BTRFS_MKFS_DEFAULT_FEATURES;
>> +	u64 features;
>>  	struct mkfs_allocation allocation = { 0 };
>>  	struct btrfs_mkfs_config mkfs_cfg;
>>
>> +	features = btrfs_features_allowed_by_kernel();
>> +	features &= BTRFS_MKFS_DEFAULT_FEATURES;
>> +
>>  	while(1) {
>>  		int c;
>>  		static const struct option long_options[] = {
>
> --
> Jeff Mahoney
> SUSE Labs

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
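One possible shape for the two review points (report what was dropped,
allow an override) is sketched below. It is an assumption-laden
illustration, not btrfs-progs code: the FEAT_* bits, pick_features() and
its force and dropped parameters are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define FEAT_A (1ULL << 0)
#define FEAT_B (1ULL << 1)
#define FEAT_C (1ULL << 2)

/*
 * Compute the feature set mkfs would enable.
 *   defaults:  features this mkfs build enables by default
 *   supported: features the running kernel reports as supported
 *   force:     nonzero keeps every default, e.g. when preparing a
 *              device for use on a newer kernel
 *   dropped:   receives the default features that were disabled, so
 *              the caller can warn about each of them
 */
static uint64_t pick_features(uint64_t defaults, uint64_t supported,
			      int force, uint64_t *dropped)
{
	assert(dropped != NULL);
	if (force) {
		*dropped = 0;
		return defaults;
	}
	*dropped = defaults & ~supported;
	return defaults & supported;
}
```

A caller would print one warning per bit set in *dropped, which
addresses the "it's silent" objection, and map a command-line force
option to the force argument.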
Re: [PATCH 1/2 v2] Btrfs: fix regression when running delayed references
On Mon, Oct 26, 2015 at 9:22 AM, Liu Bowrote: > On 10/25/2015 06:04 PM, fdman...@kernel.org wrote: >> >> From: Filipe Manana >> >> In the kernel 4.2 merge window we had a refactoring/rework of the delayed >> references implementation in order to fix certain problems with qgroups. >> However that rework introduced one more regression that leads to the >> following trace when running delayed references for metadata: >> >> [35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832! >> [35908.065201] invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC >> [35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic >> xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache >> sunrpc loop fuse parport_pc psmouse i2 >> [35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: GW >> 4.3.0-rc5-btrfs-next-17+ #1 >> [35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS >> rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014 >> [35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper >> [btrfs] >> [35908.065201] task: 880114b7d780 ti: 88010c4c8000 task.ti: >> 88010c4c8000 >> [35908.065201] RIP: 0010:[] [] >> insert_inline_extent_backref+0x52/0xb1 [btrfs] >> [35908.065201] RSP: 0018:88010c4cbb08 EFLAGS: 00010293 >> [35908.065201] RAX: RBX: 88008a661000 RCX: >> >> [35908.065201] RDX: a04dd58f RSI: 0001 RDI: >> >> [35908.065201] RBP: 88010c4cbb40 R08: 1000 R09: >> 88010c4cb9f8 >> [35908.065201] R10: R11: 002c R12: >> >> [35908.065201] R13: 88020a74c578 R14: R15: >> >> [35908.065201] FS: () GS:88023edc() >> knlGS: >> [35908.065201] CS: 0010 DS: ES: CR0: 8005003b >> [35908.065201] CR2: 015e8708 CR3: 000102185000 CR4: >> 06e0 >> [35908.065201] Stack: >> [35908.065201] 88010c4cbb18 0f37 88020a74c578 >> 88015a408000 >> [35908.065201] 880154a44000 0005 >> 88010c4cbbd8 >> [35908.065201] a0492b9a 0005 >> >> [35908.065201] Call Trace: >> [35908.065201] [] __btrfs_inc_extent_ref+0x8b/0x208 >> [btrfs] >> 
[35908.065201] [] ? >> __btrfs_run_delayed_refs+0x4d4/0xd33 [btrfs] >> [35908.065201] [] __btrfs_run_delayed_refs+0xafa/0xd33 >> [btrfs] >> [35908.065201] [] ? join_transaction.isra.10+0x25/0x41f >> [btrfs] >> [35908.065201] [] ? join_transaction.isra.10+0xa8/0x41f >> [btrfs] >> [35908.065201] [] btrfs_run_delayed_refs+0x75/0x1dd >> [btrfs] >> [35908.065201] [] delayed_ref_async_start+0x3c/0x7b >> [btrfs] >> [35908.065201] [] normal_work_helper+0x14c/0x32a >> [btrfs] >> [35908.065201] [] btrfs_extent_refs_helper+0x12/0x14 >> [btrfs] >> [35908.065201] [] process_one_work+0x24a/0x4ac >> [35908.065201] [] worker_thread+0x206/0x2c2 >> [35908.065201] [] ? rescuer_thread+0x2cb/0x2cb >> [35908.065201] [] ? rescuer_thread+0x2cb/0x2cb >> [35908.065201] [] kthread+0xef/0xf7 >> [35908.065201] [] ? kthread_parkme+0x24/0x24 >> [35908.065201] [] ret_from_fork+0x3f/0x70 >> [35908.065201] [] ? kthread_parkme+0x24/0x24 >> [35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8 48 >> 8d 4d d0 e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77 02 >> <0f> 0b 4c 8b 45 30 8b 4d 28 45 31 >> [35908.065201] RIP [] >> insert_inline_extent_backref+0x52/0xb1 [btrfs] >> [35908.065201] RSP >> [35908.310885] ---[ end trace fe4299baf0666457 ]--- >> >> This happens because the new delayed references code no longer merges >> delayed references that have different sequence values. The following >> steps are an example sequence leading to this issue: >> >> 1) Transaction N starts, fs_info->tree_mod_seq has value 0; >> >> 2) Extent buffer (btree node) A is allocated, delayed reference Ref1 for >> bytenr A is created, with a value of 1 and a seq value of 0; >> >> 3) fs_info->tree_mod_seq is incremented to 1; >> >> 4) Extent buffer A is deleted through btrfs_del_items(), which calls >> btrfs_del_leaf(), which in turn calls btrfs_free_tree_block(). 
>>    The latter returns the metadata extent associated to extent buffer A to
>>    the free space cache (the range is not pinned), because the extent
>>    buffer was created in the current transaction (N) and writeback never
>>    happened for the extent buffer (flag BTRFS_HEADER_FLAG_WRITTEN not set
>>    in the extent buffer).
>>    This creates the delayed reference Ref2 for bytenr A, with a value
>>    of -1 and a seq value of 1;
>>
>> 5) Delayed reference Ref2 is not merged with Ref1 when we create it,
>>    because they have different sequence numbers (decided at
>>    add_delayed_ref_tail_merge());
>>
>> 6)
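The merge rule in steps 1 to 5 can be modelled with a toy list. This is
a simplified sketch, not the kernel's add_delayed_ref_tail_merge(): the
toy_* names are hypothetical and the only merge condition modelled is
"same bytenr and same seq".

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_REFS 8

/* Toy delayed reference: mod is +1 for an add, -1 for a drop. */
struct toy_ref {
	uint64_t bytenr;
	uint64_t seq;
	int mod;
};

struct toy_ref_list {
	struct toy_ref refs[TOY_MAX_REFS];
	int nr;
};

/*
 * Append a ref, merging it into the tail entry only when both bytenr
 * and seq match. Returns 1 if merged, 0 if a new entry was appended.
 */
static int toy_add_ref_tail_merge(struct toy_ref_list *l, uint64_t bytenr,
				  uint64_t seq, int mod)
{
	if (l->nr > 0) {
		struct toy_ref *tail = &l->refs[l->nr - 1];

		if (tail->bytenr == bytenr && tail->seq == seq) {
			tail->mod += mod;
			return 1;
		}
	}
	assert(l->nr < TOY_MAX_REFS);
	l->refs[l->nr].bytenr = bytenr;
	l->refs[l->nr].seq = seq;
	l->refs[l->nr].mod = mod;
	l->nr++;
	return 0;
}
```

Adding Ref1 (+1, seq 0) and then Ref2 (-1, seq 1) for the same bytenr
leaves two separate entries in this model, which is the situation step 5
describes. With equal seq values the two would collapse into a single
entry with a net count of 0.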
RE: [PATCH] btrfs: Remove code for no-cow in scrub/replace
Hi, Filipe Manana

> -----Original Message-----
> From: Filipe Manana [mailto:fdman...@gmail.com]
> Sent: Friday, October 23, 2015 11:17 PM
> To: Jeff Mahoney
> Cc: Zhao Lei; linux-btrfs@vger.kernel.org
> Subject: Re: [PATCH] btrfs: Remove code for no-cow in scrub/replace
>
> On Fri, Oct 23, 2015 at 4:11 PM, Jeff Mahoney wrote:
> > On 10/23/15 4:03 AM, Zhao Lei wrote:
> >> Since we set source bg to readonly in scrub/replace, we don't need to
> >> consider confliction of no-cow write in scrub/replace operaion.
> >
> > What happens if there's a read failure? IIRC the initial purpose of
> > this code was to correct read failures during scrub and device
> > replacement by fetching the bad extent from another device if one is
> > available.
>
> And we don't have xfstests, or any other automated test suite, to test
> those code paths.
> So completely useless to run xfstests in a loop for 5 days, especially
> the generic tests which never trigger scrub runs...
>
I completely agree with you.

Something that was not written in the patch's comment: I also have a
custom test, which includes scrub on a bad-block device:

[root@ZLLINUX custom_tests]# ls -l
total 36
-rwxr-xr-x 1 root root 15931 Oct  9 21:08 btrfs_maintance.sh
-rwxr-xr-x 1 root root  2000 Apr 30 18:23 btrfs_out_of_space.sh
-rwxr-xr-x 1 root root  5588 May 26 09:47 btrfs_replace_sumerr.sh
-rwxr-xr-x 1 root root  1042 Sep 18 17:05 __loop_custom.sh
-rwxr-xr-x 1 root root   939 Jul 14 18:15 xfstests.sh
[root@ZLLINUX custom_tests]#

# grep '()' btrfs_maintance.sh
...
redundancy_func_test()
scrub_func_test()
replace_func_test()
many_faildisk_test()
[root@ZLLINUX custom_tests]#

And this patch was also tested with the above script:

# ./__loop_custom.sh ./btrfs_maintance.sh
Loop 0
replace_func_test: raidtype=raid1 mkfs_opt= mount_opt=-o nodatacow max_disk=
mkfs and mount: raid=raid1 dev_cnt=2 mkfs_opt= mount_opt=-o nodatacow:
Writting some data:
du: 17, writting 1M file
du: 18, writting 2M file
du: 20, writting 4M file
du: 24, writting 8M file
du: 32, writting 16M file
reach enough space: 48%
Start replace /dev/vdc -> /dev/vde
OO:/dev/vdc OO:/dev/vdd #X:/dev/vde
#O:/dev/vdc OO:/dev/vdd OO:/dev/vde
Replace start in dmesg found
Replace finish in dmesg found
dmesg: OK
file contents before remount: same
dmesg: OK
file contents after remount: same
Check prune /dev/vdc
#O:/dev/vdc OO:/dev/vdd OO:/dev/vde
prune dsk /dev/vdc
#X:/dev/vdc OO:/dev/vdd OO:/dev/vde
dmesg: OK
fs contents: same
Check prune /dev/vdd
#X:/dev/vdc OO:/dev/vdd OO:/dev/vde
prune dsk /dev/vdd
#X:/dev/vdc OX:/dev/vdd OO:/dev/vde
dmesg: OK
fs contents: same
Start replace /dev/vde -> /dev/vdc
#X:/dev/vdc OX:/dev/vdd OO:/dev/vde
OO:/dev/vdc OX:/dev/vdd #O:/dev/vde
Replace start in dmesg found
Replace finish in dmesg found
dmesg: OK
file contents before remount: same

I'll add the above script to xfstests when it is complete and stable.

Thanks
Zhaolei

> >
> > See commit 0ef8e45158f (btrfs scrub: add fixup code for errors on
> > nodatasum files)
> >
> > -Jeff
> >
> >> This patch removes special code for no-cow mode in scrub/replace,
> >> reduced 670 lines.
> >>
> >> Tested by continuous xfstests in 5 days, include generic and btrfs
> >> groups with 10 mount options include nodatacow.
> >> > >> Signed-off-by: Zhao Lei --- > >> fs/btrfs/ctree.h | 1 - fs/btrfs/scrub.c | 669 > >> --- 2 files > >> changed, 670 deletions(-) > >> > >> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index > >> 938efe3..3387509 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h > >> @@ -1688,7 +1688,6 @@ struct btrfs_fs_info { int > >> scrub_workers_refcnt; struct btrfs_workqueue *scrub_workers; struct > >> btrfs_workqueue *scrub_wr_completion_workers; - struct > >> btrfs_workqueue *scrub_nocow_workers; struct btrfs_workqueue > >> *scrub_parity_workers; > >> > >> #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY diff --git a/fs/btrfs/scrub.c > >> b/fs/btrfs/scrub.c index d64f557..6027679 > >> 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -205,32 > >> +205,6 @@ struct scrub_ctx { atomic_trefs; }; > >> > >> -struct scrub_fixup_nodatasum { - struct scrub_ctx*sctx; - > struct > >> btrfs_device *dev; - u64 logical; - struct > btrfs_root *root; - > >> struct btrfs_work work; - int mirror_num; -}; - > -struct > >> scrub_nocow_inode { - u64 inum; - u64 > offset; - u64 root; - > >> struct list_head list; -}; - -struct scrub_copy_nocow_ctx { - > >> struct scrub_ctx *sctx; -u64 > logical; - u64 len; - int > >> mirror_num; - u64 physical_for_dev_replace; - > struct list_head > >> inodes; -
[PATCH] btrfs: clear PF_NOFREEZE in cleaner_kthread()
From: Jiri Kosina

cleaner_kthread() kthread calls try_to_freeze() at the beginning of every
cleanup attempt. This operation can't ever succeed though, as the kthread
hasn't marked itself as freezable.

Before (hopefully eventually) kthread freezing gets converted to filesystem
freezing, we'd rather mark cleaner_kthread() freezable (as my understanding
is that it can generate filesystem I/O during suspend).

Signed-off-by: Jiri Kosina
---
 fs/btrfs/disk-io.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 295795a..173970d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1759,6 +1759,7 @@ static int cleaner_kthread(void *arg)
 	int again;
 	struct btrfs_trans_handle *trans;
 
+	set_freezable();
 	do {
 		again = 0;
 
--
Jiri Kosina
SUSE Labs
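Why try_to_freeze() is a no-op without set_freezable() can be shown with
a small user-space model. This is a toy, not kernel code: the toy_*
names and TOY_PF_NOFREEZE are hypothetical stand-ins, and the model
assumes only that a kernel thread starts with its "no freeze" flag set
and that a freeze attempt is refused while the flag is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's PF_NOFREEZE task flag. */
#define TOY_PF_NOFREEZE 0x1u

struct toy_task {
	unsigned int flags;
	bool frozen;
};

/* In this model a new kernel thread starts out non-freezable. */
static void toy_kthread_init(struct toy_task *t)
{
	t->flags = TOY_PF_NOFREEZE;
	t->frozen = false;
}

/* Models set_freezable(): clear the "no freeze" flag. */
static void toy_set_freezable(struct toy_task *t)
{
	t->flags &= ~TOY_PF_NOFREEZE;
}

/*
 * Models try_to_freeze(): returns true only if a freeze was requested
 * and the task is allowed to freeze.
 */
static bool toy_try_to_freeze(struct toy_task *t, bool freeze_requested)
{
	assert(t != NULL);
	if (!freeze_requested || (t->flags & TOY_PF_NOFREEZE))
		return false;
	t->frozen = true;
	return true;
}
```

In this model a thread that never calls toy_set_freezable() reaches
toy_try_to_freeze() on every loop iteration and is never frozen, which
mirrors the behaviour the patch description complains about.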
Re: [PATCH v5] btrfs: qgroup: Don't copy extent buffer to do qgroup rescan
On Mon, Oct 26, 2015 at 1:19 AM, Qu Wenruo wrote:
> Ancient qgroup code call memcpy() on a extent buffer and use it for leaf
> iteration.
>
> As extent buffer contains lock, pointers to pages, it's never sane to do
> such copy.
>
> The following bug may be caused by this insane operation:
> [92098.841309] general protection fault: [#1] SMP
> [92098.841338] Modules linked in: ...
> [92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted 4.3.0-rc1 #1
> [92098.841868] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper [btrfs]
> [92098.842261] Call Trace:
> [92098.842277] [] ? read_extent_buffer+0xb8/0x110 [btrfs]
> [92098.842304] [] ? btrfs_find_all_roots+0x60/0x70 [btrfs]
> [92098.842329] [] btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]
>
> Where btrfs_qgroup_rescan_worker+0x28d is btrfs_disk_key_to_cpu(),
> called in reading key from the copied extent_buffer.
>
> This patch will use btrfs_clone_extent_buffer() to a better copy of
> extent buffer to deal such case.
>
> Reported-by: Stephane Lesimple
> Suggested-by: Filipe Manana
> Signed-off-by: Qu Wenruo

Reviewed-by: Filipe Manana

thanks Qu

> ---
> v2:
>   Follow the parameter change in previous patch.
> v3:
>   None
> v4:
>   Use btrfs_clone_extent_buffer() other than introducing new facilities
> v5:
>   Change slot = path->slots[0] postion.
> ---
>  fs/btrfs/qgroup.c | 26 ++++++++++++++++----------
>  1 file changed, 16 insertions(+), 10 deletions(-)
>
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 158633c..31d1934 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -2192,10 +2192,10 @@ void assert_qgroups_uptodate(struct btrfs_trans_handle *trans)
>   */
>  static int
>  qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
> -		   struct btrfs_trans_handle *trans,
> -		   struct extent_buffer *scratch_leaf)
> +		   struct btrfs_trans_handle *trans)
>  {
>  	struct btrfs_key found;
> +	struct extent_buffer *scratch_leaf = NULL;
>  	struct ulist *roots = NULL;
>  	struct seq_list tree_mod_seq_elem = SEQ_LIST_INIT(tree_mod_seq_elem);
>  	u64 num_bytes;
> @@ -2233,7 +2233,15 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
>  	fs_info->qgroup_rescan_progress.objectid = found.objectid + 1;
>
>  	btrfs_get_tree_mod_seq(fs_info, &tree_mod_seq_elem);
> -	memcpy(scratch_leaf, path->nodes[0], sizeof(*scratch_leaf));
> +	scratch_leaf = btrfs_clone_extent_buffer(path->nodes[0]);
> +	if (!scratch_leaf) {
> +		ret = -ENOMEM;
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		goto out;
> +	}
> +	extent_buffer_get(scratch_leaf);
> +	btrfs_tree_read_lock(scratch_leaf);
> +	btrfs_set_lock_blocking_rw(scratch_leaf, BTRFS_READ_LOCK);
>  	slot = path->slots[0];
>  	btrfs_release_path(path);
>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
> @@ -2259,6 +2267,10 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
>  			goto out;
>  	}
>  out:
> +	if (scratch_leaf) {
> +		btrfs_tree_read_unlock_blocking(scratch_leaf);
> +		free_extent_buffer(scratch_leaf);
> +	}
>  	btrfs_put_tree_mod_seq(fs_info, &tree_mod_seq_elem);
>
>  	return ret;
> @@ -2270,16 +2282,12 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>  						     qgroup_rescan_work);
>  	struct btrfs_path *path;
>  	struct btrfs_trans_handle *trans = NULL;
> -	struct extent_buffer *scratch_leaf = NULL;
>  	int err = -ENOMEM;
>  	int ret = 0;
>
>  	path = btrfs_alloc_path();
>  	if (!path)
>  		goto out;
> -	scratch_leaf = kmalloc(sizeof(*scratch_leaf), GFP_NOFS);
> -	if (!scratch_leaf)
> -		goto out;
>
>  	err = 0;
>  	while (!err) {
> @@ -2291,8 +2299,7 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>  		if (!fs_info->quota_enabled) {
>  			err = -EINTR;
>  		} else {
> -			err = qgroup_rescan_leaf(fs_info, path, trans,
> -						 scratch_leaf);
> +			err = qgroup_rescan_leaf(fs_info, path, trans);
>  		}
>  		if (err > 0)
>  			btrfs_commit_transaction(trans, fs_info->fs_root);
> @@ -2301,7 +2308,6 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>  	}
>
>  out:
> -	kfree(scratch_leaf);
>  	btrfs_free_path(path);
>
>  	mutex_lock(&fs_info->qgroup_rescan_lock);
> --
> 2.6.2
>
> --
> To unsubscribe from this list: send the line
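The hazard the commit message describes, copying a structure that holds
pointers and bookkeeping with memcpy(), can be reproduced in plain C.
The sketch below is a toy and assumes nothing about the real struct
extent_buffer: toy_buffer and the toy_* helpers are hypothetical, and
toy_clone_buffer() only stands in for the idea behind
btrfs_clone_extent_buffer().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGES 2
#define TOY_PAGE_SIZE 16

/* Toy buffer: a reference count plus pointers to separately owned pages. */
struct toy_buffer {
	int refs;
	unsigned char *pages[TOY_PAGES];
};

static void toy_free_buffer(struct toy_buffer *b)
{
	if (!b)
		return;
	for (int i = 0; i < TOY_PAGES; i++)
		free(b->pages[i]);
	free(b);
}

static struct toy_buffer *toy_alloc_buffer(void)
{
	struct toy_buffer *b = calloc(1, sizeof(*b));

	if (!b)
		return NULL;
	b->refs = 1;
	for (int i = 0; i < TOY_PAGES; i++) {
		b->pages[i] = calloc(1, TOY_PAGE_SIZE);
		if (!b->pages[i]) {
			toy_free_buffer(b);
			return NULL;
		}
	}
	return b;
}

/* Deep copy: the clone gets its own pages and its own reference count. */
static struct toy_buffer *toy_clone_buffer(const struct toy_buffer *src)
{
	struct toy_buffer *dst;

	assert(src != NULL);
	dst = toy_alloc_buffer();
	if (!dst)
		return NULL;
	for (int i = 0; i < TOY_PAGES; i++)
		memcpy(dst->pages[i], src->pages[i], TOY_PAGE_SIZE);
	return dst;
}
```

A memcpy() of the struct itself yields a second struct whose pages[]
entries alias the original's pages, so any later change or free of the
original is visible through the copy. The clone keeps its own data and
can be released independently.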
RE: [PATCH] btrfs: Remove code for no-cow in scrub/replace
Hi, Jeff Mahoney

Thanks for review!

> -----Original Message-----
> From: Jeff Mahoney [mailto:je...@suse.com]
> Sent: Friday, October 23, 2015 11:11 PM
> To: Zhao Lei; linux-btrfs@vger.kernel.org
> Subject: Re: [PATCH] btrfs: Remove code for no-cow in scrub/replace
>
> On 10/23/15 4:03 AM, Zhao Lei wrote:
> > Since we set source bg to readonly in scrub/replace, we don't need to
> > consider confliction of no-cow write in scrub/replace operaion.
>
> What happens if there's a read failure? IIRC the initial purpose of this
> code was to correct read failures during scrub and device replacement by
> fetching the bad extent from another device if one is available.
>
> See commit 0ef8e45158f (btrfs scrub: add fixup code for errors on
> nodatasum files)
>
"nodatasum" is used to check "is the data in non-cow state", and the
reason for using inode writeback is to avoid concurrent writes to the
block that is being scrubbed.
The comment in the newest code at scrub.c L1055 gives the detail.
(Introduced by commit b5d67f64f)

Since the entire bg is set to readonly during the scrub period, there
are no concurrent write operations for either cow or non-cow bgs, and
the bio-based fix operation can work for all of the above cases.

Thanks
Zhaolei

> -Jeff
>
> > This patch removes special code for no-cow mode in scrub/replace,
> > reduced 670 lines.
> >
> > Tested by continuous xfstests in 5 days, include generic and btrfs
> > groups with 10 mount options include nodatacow.
> > > > Signed-off-by: Zhao Lei --- > > fs/btrfs/ctree.h | 1 - fs/btrfs/scrub.c | 669 > > --- 2 files > > changed, 670 deletions(-) > > > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index > > 938efe3..3387509 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h > > @@ -1688,7 +1688,6 @@ struct btrfs_fs_info { int scrub_workers_refcnt; > > struct btrfs_workqueue *scrub_workers; struct > > btrfs_workqueue *scrub_wr_completion_workers; - struct > > btrfs_workqueue *scrub_nocow_workers; struct btrfs_workqueue > > *scrub_parity_workers; > > > > #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY diff --git a/fs/btrfs/scrub.c > > b/fs/btrfs/scrub.c index d64f557..6027679 > > 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -205,32 > > +205,6 @@ struct scrub_ctx { atomic_trefs; }; > > > > -struct scrub_fixup_nodatasum { - struct scrub_ctx*sctx; - > > struct > > btrfs_device*dev; - u64 logical; - struct > > btrfs_root *root; - > > struct btrfs_work work; - int mirror_num; -}; - > > -struct > > scrub_nocow_inode { - u64 inum; - u64 > > offset; - u64 > root; - > > struct list_headlist; -}; - -struct scrub_copy_nocow_ctx { - > > struct scrub_ctx*sctx; -u64 logical; - > > u64 len; - > int > > mirror_num; - u64 physical_for_dev_replace; - > > struct > list_head > > inodes; - struct btrfs_work work; -}; - struct scrub_warning { > > struct btrfs_path *path; u64 extent_item_size; @@ > > -242,8 > +216,6 > > @@ struct scrub_warning { > > > > static void scrub_pending_bio_inc(struct scrub_ctx *sctx); static void > > scrub_pending_bio_dec(struct scrub_ctx *sctx); -static void > > scrub_pending_trans_workers_inc(struct scrub_ctx *sctx); -static void > > scrub_pending_trans_workers_dec(struct scrub_ctx *sctx); static int > > scrub_handle_errored_block(struct scrub_block *sblock_to_check); > > static int scrub_setup_recheck_block(struct scrub_block > > *original_sblock, struct scrub_block *sblocks_for_recheck); @@ -298,13 > > +270,6 @@ static int scrub_add_page_to_wr_bio(struct 
scrub_ctx *sctx, > > static void scrub_wr_submit(struct scrub_ctx *sctx); static void > > scrub_wr_bio_end_io(struct bio *bio, int err); static void > > scrub_wr_bio_end_io_worker(struct btrfs_work *work); -static int > > write_page_nocow(struct scrub_ctx *sctx, - u64 > > physical_for_dev_replace, struct page *page); -static int > > copy_nocow_pages_for_inode(u64 inum, u64 offset, u64 root, - struct > > scrub_copy_nocow_ctx *ctx); -static int copy_nocow_pages(struct > > scrub_ctx *sctx, u64 logical, u64 len, - int mirror_num, u64 > > physical_for_dev_replace); -static void copy_nocow_pages_worker(struct > > btrfs_work *work); static void __scrub_blocked_if_needed(struct > > btrfs_fs_info *fs_info); static void scrub_blocked_if_needed(struct > > btrfs_fs_info *fs_info); static void scrub_put_ctx(struct scrub_ctx > > *sctx); @@ -355,60 +320,6 @@ static void > > scrub_blocked_if_needed(struct btrfs_fs_info *fs_info) > > scrub_pause_off(fs_info); } > > > > -/* - * used for workers that require transaction commits (i.e., for > > the - * NOCOW case) -
[PATCH v2 0/5] btrfs-progs: Add all missing close_ctree and btrfs_close_all_devices
This patchset adds all missing close_ctree() and btrfs_close_all_devices()
calls to several tools in btrfs-progs, to avoid memory leaks.

Changelog v1->v2:
  Move btrfs_close_all_devices() from cmd-XXX into btrfs.c to make the
  code simpler, and to avoid similar problems in cmd-XXX in the future.

Zhao Lei (5):
  btrfs-progs: btrfs: Add missing btrfs_close_all_devices for btrfs command
  btrfs-progs: Remove all btrfs_close_all_devices in sub-command
  btrfs-progs: Add all missing btrfs_close_all_devices to standalone tools
  btrfs-progs: Add missing close_ctree to btrfs-select-super.c
  btrfs-progs: use system's default path for math.h

 btrfs-calc-size.c    | 1 +
 btrfs-debug-tree.c   | 5 ++++-
 btrfs-find-root.c    | 1 +
 btrfs-map-logical.c  | 1 +
 btrfs-select-super.c | 3 +++
 btrfs.c              | 9 ++++++++-
 btrfstune.c          | 1 +
 cmds-check.c         | 1 -
 cmds-device.c        | 3 ---
 cmds-replace.c       | 2 --
 extent-tree.c        | 2 +-
 11 files changed, 20 insertions(+), 9 deletions(-)

--
1.8.5.1
[PATCH v2 1/5] btrfs-progs: btrfs: Add missing btrfs_close_all_devices for btrfs command
Adding a btrfs_close_all_devices() call after the command callback in
btrfs.c force-closes all opened devices before the program exits, to
avoid memory leaks in all btrfs sub-commands.

Suggested-by: David Sterba
Signed-off-by: Zhao Lei
---
 btrfs.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/btrfs.c b/btrfs.c
index 63df377..9416a29 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -18,6 +18,7 @@
 #include
 #include
 
+#include "volumes.h"
 #include "crc32c.h"
 #include "commands.h"
 #include "utils.h"
@@ -214,6 +215,7 @@ int main(int argc, char **argv)
 {
 	const struct cmd_struct *cmd;
 	const char *bname;
+	int ret;
 
 	if ((bname = strrchr(argv[0], '/')) != NULL)
 		bname++;
@@ -242,5 +244,10 @@ int main(int argc, char **argv)
 	crc32c_optimization_init();
 
 	fixup_argv0(argv, cmd->token);
-	exit(cmd->fn(argc, argv));
+
+	ret = cmd->fn(argc, argv);
+
+	btrfs_close_all_devices();
+
+	exit(ret);
 }
--
1.8.5.1
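The pattern in this patch (run the sub-command, release global state
once in a single place, then report the sub-command's status) can be
shown in isolation. The sketch below is a stand-alone toy, not
btrfs-progs code: toy_cmd, toy_run_command(), toy_cmd_scan() and the
toy_open_devices counter are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for global state that sub-commands may leave allocated. */
static int toy_open_devices;

/* Stand-in for a global cleanup such as btrfs_close_all_devices(). */
static void toy_close_all_devices(void)
{
	toy_open_devices = 0;
}

struct toy_cmd {
	const char *token;
	int (*fn)(int argc, char **argv);
};

/*
 * Run the sub-command, clean up once for all sub-commands, and only
 * then hand back the status the sub-command returned.
 */
static int toy_run_command(const struct toy_cmd *cmd, int argc, char **argv)
{
	int ret;

	assert(cmd != NULL && cmd->fn != NULL);
	ret = cmd->fn(argc, argv);
	toy_close_all_devices();
	return ret;
}

/* Example sub-command: "opens" two devices and fails with status 3. */
static int toy_cmd_scan(int argc, char **argv)
{
	(void)argc;
	(void)argv;
	toy_open_devices = 2;
	return 3;
}
```

Saving the return value before the cleanup is what lets the cleanup run
without changing the exit status, and it removes the need for each
sub-command to remember the cleanup on every exit path.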
[PATCH v2 5/5] btrfs-progs: use system's default path for math.h
The #include "math.h" line in extent-tree.c uses quotes for historical
reasons (we had a custom math.h in the source before).
Now it is better to use "<>" instead of quotes for this header file.

Signed-off-by: Zhao Lei
---
 extent-tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/extent-tree.c b/extent-tree.c
index 0c8152a..0d605e1 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include <math.h>
 #include "kerncompat.h"
 #include "radix-tree.h"
 #include "ctree.h"
@@ -28,7 +29,6 @@
 #include "crc32c.h"
 #include "volumes.h"
 #include "free-space-cache.h"
-#include "math.h"
 #include "utils.h"
 
 #define PENDING_EXTENT_INSERT 0
--
1.8.5.1
[PATCH v2 4/5] btrfs-progs: Add missing close_ctree to btrfs-select-super.c
Add missing close_ctree() to btrfs-select-super.c to avoid memory leak.

Signed-off-by: Zhao Lei
---
 btrfs-select-super.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/btrfs-select-super.c b/btrfs-select-super.c
index bd44978..df74153 100644
--- a/btrfs-select-super.c
+++ b/btrfs-select-super.c
@@ -102,6 +102,7 @@ int main(int ac, char **av)
 	 */
 	printf("using SB copy %llu, bytenr %llu\n", (unsigned long long)num,
 	       (unsigned long long)bytenr);
+	close_ctree(root);
 	btrfs_close_all_devices();
 	return ret;
 }
--
1.8.5.1
[PATCH v2 2/5] btrfs-progs: Remove all btrfs_close_all_devices in sub-command
Since we have btrfs_close_all_devices() in btrfs's main entry point, it
is not necessary to call btrfs_close_all_devices() separately in each
sub-command.

Signed-off-by: Zhao Lei
---
 cmds-check.c   | 1 -
 cmds-device.c  | 3 ---
 cmds-replace.c | 2 --
 3 files changed, 6 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index 1f8caad..3af6e61 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -9738,7 +9738,6 @@ out:
 	free_root_recs_tree(&root_cache);
 close_out:
 	close_ctree(root);
-	btrfs_close_all_devices();
 err_out:
 	if (ctx.progress_enabled)
 		task_deinit(ctx.info);
diff --git a/cmds-device.c b/cmds-device.c
index 5f2b952..620ae8b 100644
--- a/cmds-device.c
+++ b/cmds-device.c
@@ -139,7 +139,6 @@ static int cmd_device_add(int argc, char **argv)
 
 error_out:
 	close_file_or_dir(fdmnt, dirstream);
-	btrfs_close_all_devices();
 	return !!ret;
 }
 
@@ -288,7 +287,6 @@ static int cmd_device_scan(int argc, char **argv)
 	}
 
 out:
-	btrfs_close_all_devices();
 	return !!ret;
 }
 
@@ -466,7 +464,6 @@ static int cmd_device_stats(int argc, char **argv)
 out:
 	free(di_args);
 	close_file_or_dir(fdmnt, dirstream);
-	btrfs_close_all_devices();
 
 	return err;
 }
diff --git a/cmds-replace.c b/cmds-replace.c
index 9ab8438..fadd2cd 100644
--- a/cmds-replace.c
+++ b/cmds-replace.c
@@ -330,7 +330,6 @@ static int cmd_replace_start(int argc, char **argv)
 		}
 	}
 	close_file_or_dir(fdmnt, dirstream);
-	btrfs_close_all_devices();
 	return 0;
 
 leave_with_error:
@@ -340,7 +339,6 @@ leave_with_error:
 		close(fdmnt);
 	if (fddstdev != -1)
 		close(fddstdev);
-	btrfs_close_all_devices();
 	return 1;
 }
 
--
1.8.5.1
[PATCH v2 3/5] btrfs-progs: Add all missing btrfs_close_all_devices to standalone tools
This patch adds all missing btrfs_close_all_devices() calls to the
standalone tools in btrfs-progs, to avoid memory leaks.

Signed-off-by: Zhao Lei
---
 btrfs-calc-size.c    | 1 +
 btrfs-debug-tree.c   | 5 ++++-
 btrfs-find-root.c    | 1 +
 btrfs-map-logical.c  | 1 +
 btrfs-select-super.c | 2 ++
 btrfstune.c          | 1 +
 6 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/btrfs-calc-size.c b/btrfs-calc-size.c
index 7287858..b756693 100644
--- a/btrfs-calc-size.c
+++ b/btrfs-calc-size.c
@@ -508,5 +508,6 @@ int main(int argc, char **argv)
 out:
 	close_ctree(root);
 	free(roots);
+	btrfs_close_all_devices();
 	return ret;
 }
diff --git a/btrfs-debug-tree.c b/btrfs-debug-tree.c
index 7d8e876..8adc39f 100644
--- a/btrfs-debug-tree.c
+++ b/btrfs-debug-tree.c
@@ -28,6 +28,7 @@
 #include "disk-io.h"
 #include "print-tree.h"
 #include "transaction.h"
+#include "volumes.h"
 #include "utils.h"
 
 static int print_usage(int ret)
@@ -428,5 +429,7 @@ no_node:
 	printf("uuid %s\n", uuidbuf);
 	printf("%s\n", PACKAGE_STRING);
 close_root:
-	return close_ctree(root);
+	ret = close_ctree(root);
+	btrfs_close_all_devices();
+	return ret;
 }
diff --git a/btrfs-find-root.c b/btrfs-find-root.c
index 01b3603..fc3812c 100644
--- a/btrfs-find-root.c
+++ b/btrfs-find-root.c
@@ -216,5 +216,6 @@ int main(int argc, char **argv)
 out:
 	btrfs_find_root_free();
 	close_ctree(root);
+	btrfs_close_all_devices();
 	return ret;
 }
diff --git a/btrfs-map-logical.c b/btrfs-map-logical.c
index d9fa6b2..0161b5c 100644
--- a/btrfs-map-logical.c
+++ b/btrfs-map-logical.c
@@ -359,5 +359,6 @@ close:
 	close_ctree(root);
 	if (ret < 0)
 		ret = 1;
+	btrfs_close_all_devices();
 	return ret;
 }
diff --git a/btrfs-select-super.c b/btrfs-select-super.c
index b790f3e..bd44978 100644
--- a/btrfs-select-super.c
+++ b/btrfs-select-super.c
@@ -23,6 +23,7 @@
 #include
 #include "kerncompat.h"
 #include "ctree.h"
+#include "volumes.h"
 #include "disk-io.h"
 #include "print-tree.h"
 #include "transaction.h"
@@ -101,5 +102,6 @@ int main(int ac, char **av)
 	 */
 	printf("using SB copy %llu, bytenr %llu\n",
 	       (unsigned long long)num, (unsigned long long)bytenr);
+	btrfs_close_all_devices();
 	return ret;
 }
diff --git a/btrfstune.c b/btrfstune.c
index c248ee6..0907aa9 100644
--- a/btrfstune.c
+++ b/btrfstune.c
@@ -548,6 +548,7 @@ int main(int argc, char *argv[])
 	}
 
 out:
 	close_ctree(root);
+	btrfs_close_all_devices();
 	return ret;
 }
-- 
1.8.5.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
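The btrfs-debug-tree.c hunk in the patch above cannot simply append the new call, because the function used to end in "return close_ctree(root);". A minimal sketch of the resulting exit-path pattern follows: save the first teardown's return value, run the second teardown unconditionally, then return the saved value. The names fake_close_ctree(), fake_close_all_devices() and tool_exit_path() are hypothetical stand-ins invented for this illustration, not btrfs-progs functions.

```c
#include <assert.h>

/*
 * Hypothetical stand-ins for close_ctree() and btrfs_close_all_devices().
 * They only record what happened; the real functions are not reproduced.
 */
static int devices_open = 1;

static int fake_close_ctree(int simulated_result)
{
	/* Pretend closing the tree succeeded or failed as requested. */
	return simulated_result;
}

static void fake_close_all_devices(void)
{
	devices_open = 0;
}

/*
 * Exit path in the style of the patched tools: the device list is
 * released whether or not closing the tree succeeded, and the result
 * of closing the tree is still what the caller sees.
 */
static int tool_exit_path(int simulated_result)
{
	int ret;

	ret = fake_close_ctree(simulated_result);
	fake_close_all_devices();
	return ret;
}
```

The point of the temporary is that an early "return close_ctree(root);" would skip the device cleanup entirely, which is the leak the patch is about.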
Re: [PATCH 1/2 v2] Btrfs: fix regression when running delayed references
On 10/25/2015 06:04 PM, fdman...@kernel.org wrote:

From: Filipe Manana

In the kernel 4.2 merge window we had a refactoring/rework of the delayed
references implementation in order to fix certain problems with qgroups.
However that rework introduced one more regression that leads to the
following trace when running delayed references for metadata:

[35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832!
[35908.065201] invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC
[35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc psmouse i2
[35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: GW 4.3.0-rc5-btrfs-next-17+ #1
[35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
[35908.065201] task: 880114b7d780 ti: 88010c4c8000 task.ti: 88010c4c8000
[35908.065201] RIP: 0010:[] [] insert_inline_extent_backref+0x52/0xb1 [btrfs]
[35908.065201] RSP: 0018:88010c4cbb08 EFLAGS: 00010293
[35908.065201] RAX: RBX: 88008a661000 RCX:
[35908.065201] RDX: a04dd58f RSI: 0001 RDI:
[35908.065201] RBP: 88010c4cbb40 R08: 1000 R09: 88010c4cb9f8
[35908.065201] R10: R11: 002c R12:
[35908.065201] R13: 88020a74c578 R14: R15:
[35908.065201] FS: () GS:88023edc() knlGS:
[35908.065201] CS: 0010 DS: ES: CR0: 8005003b
[35908.065201] CR2: 015e8708 CR3: 000102185000 CR4: 06e0
[35908.065201] Stack:
[35908.065201] 88010c4cbb18 0f37 88020a74c578 88015a408000
[35908.065201] 880154a44000 0005 88010c4cbbd8
[35908.065201] a0492b9a 0005
[35908.065201] Call Trace:
[35908.065201] [] __btrfs_inc_extent_ref+0x8b/0x208 [btrfs]
[35908.065201] [] ? __btrfs_run_delayed_refs+0x4d4/0xd33 [btrfs]
[35908.065201] [] __btrfs_run_delayed_refs+0xafa/0xd33 [btrfs]
[35908.065201] [] ? join_transaction.isra.10+0x25/0x41f [btrfs]
[35908.065201] [] ? join_transaction.isra.10+0xa8/0x41f [btrfs]
[35908.065201] [] btrfs_run_delayed_refs+0x75/0x1dd [btrfs]
[35908.065201] [] delayed_ref_async_start+0x3c/0x7b [btrfs]
[35908.065201] [] normal_work_helper+0x14c/0x32a [btrfs]
[35908.065201] [] btrfs_extent_refs_helper+0x12/0x14 [btrfs]
[35908.065201] [] process_one_work+0x24a/0x4ac
[35908.065201] [] worker_thread+0x206/0x2c2
[35908.065201] [] ? rescuer_thread+0x2cb/0x2cb
[35908.065201] [] ? rescuer_thread+0x2cb/0x2cb
[35908.065201] [] kthread+0xef/0xf7
[35908.065201] [] ? kthread_parkme+0x24/0x24
[35908.065201] [] ret_from_fork+0x3f/0x70
[35908.065201] [] ? kthread_parkme+0x24/0x24
[35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8 48 8d 4d d0 e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77 02 <0f> 0b 4c 8b 45 30 8b 4d 28 45 31
[35908.065201] RIP [] insert_inline_extent_backref+0x52/0xb1 [btrfs]
[35908.065201] RSP
[35908.310885] ---[ end trace fe4299baf0666457 ]---

This happens because the new delayed references code no longer merges
delayed references that have different sequence values. The following
steps are an example sequence leading to this issue:

1) Transaction N starts, fs_info->tree_mod_seq has value 0;

2) Extent buffer (btree node) A is allocated, delayed reference Ref1 for
   bytenr A is created, with a value of 1 and a seq value of 0;

3) fs_info->tree_mod_seq is incremented to 1;

4) Extent buffer A is deleted through btrfs_del_items(), which calls
   btrfs_del_leaf(), which in turn calls btrfs_free_tree_block(). The
   latter returns the metadata extent associated to extent buffer A to
   the free space cache (the range is not pinned), because the extent
   buffer was created in the current transaction (N) and writeback never
   happened for the extent buffer (flag BTRFS_HEADER_FLAG_WRITTEN not set
   in the extent buffer).
   This creates the delayed reference Ref2 for bytenr A, with a value of
   -1 and a seq value of 1;

5) Delayed reference Ref2 is not merged with Ref1 when we create it,
   because they have different sequence numbers (decided at
   add_delayed_ref_tail_merge());

6) fs_info->tree_mod_seq is incremented to 2;

7) Some task attempts to allocate a new extent buffer (done at
   extent-tree.c:find_free_extent()), but due to heavy fragmentation and
   running low on metadata space the clustered allocation fails and we
   fall back to unclustered allocation, which finds the extent at offset
   A, so a new extent buffer at offset A
[GIT PULL] Btrfs fixes for delayed refs regression and a deadlock
From: Filipe Manana

Hi Chris,

please consider the following fixes for the 4.4 merge window (they were
previously sent to the mailing list already).

They fix an issue with delayed references that makes us hit some BUG_ONs
as of the 4.2 kernel release. A lot of people have been hitting this and
reported it in the mailing list and bugzilla. For at least some of them
this has been making it impossible to run a balance on a 4.2+ kernel,
such as Stéphane's case on his multi terabyte filesystem.

I've tagged both for stable and included review tags that people gave
through the mailing list.

A very special thanks to Stéphane Lesimple for volunteering not only to
test these fixes (balance took over 1 day to complete on his fs!) but
also debug patches to help me figure out what was leading to the crashes.
Not only balance finishes successfully for him now, but fsck also does
not report any inconsistencies and his filesystem seems healthy (his
files, snapshots, etc, seem all ok).

As a bonus, the second patch also ends up fixing a deadlock in the clone
ioctl when qgroups are enabled (reported by Elias Probst in the mailing
list).

Thanks.

The following changes since commit a9e6d153563d2ed69c6cd7fb4fa5ce4ca7c712eb:

  Merge branch 'allocator-fixes' into for-linus-4.4 (2015-10-21 19:00:38 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git delayed-refs-balance-fix-4.4

for you to fetch changes up to b06c4bf5c874a57254b197f53ddf588e7a24a2bf:

  Btrfs: fix regression running delayed references when using qgroups (2015-10-25 19:53:26 +)

Filipe Manana (2):
      Btrfs: fix regression when running delayed references
      Btrfs: fix regression running delayed references when using qgroups

 fs/btrfs/ctree.h       |   4 ++--
 fs/btrfs/delayed-ref.c | 139 ---
 fs/btrfs/delayed-ref.h |   7 ++-
 fs/btrfs/extent-tree.c |  59 ++-
 fs/btrfs/file.c        |  10 +-
 fs/btrfs/inode.c       |   4 ++--
 fs/btrfs/ioctl.c       |  62 +-
 fs/btrfs/relocation.c  |  16 +++-
 fs/btrfs/tree-log.c    |   2 +-
 9 files changed, 170 insertions(+), 133 deletions(-)

-- 
2.1.3
random i/o error without error in dmesg
Hi,

I have had this error for some time. It is not easy to reproduce, so I
am writing down everything I know at the moment.

I maintain some servers running Xen (4.5.1) with a Gentoo dom0 and recent
kernels (3.18.*, 4.1.6, 4.2.3, 4.2.4). I use the gentoo-sources patchset.
The servers run Xen domUs for www and MySQL. I have MySQL servers in
domUs with high load (lots of reads and writes). These systems are
identical in terms of configuration and kernel.

Sometimes I get MySQL errors randomly (sometimes more than one a day,
sometimes one a week), but they are more frequent under high load. The
MySQL errors occur because a file cannot be read from the filesystem. If
I try to run md5sum on it, it shows an I/O error. At this point

  mysql stop && umount && mount && mysql start

solves the problem. Calling

  echo 3 > /proc/sys/vm/drop_caches

sometimes solves the I/O error, but not every time. Rarely, the problem
goes away on its own without a remount.

The problem seems to have no connection to the dom0 kernel and the Xen
version. I have this problem, for example, on these dom0s:

  kernel: 3.19.3  xen 4.5.0
  kernel: 4.2.3   xen 4.5.1

The problem seems to have started with the kernel 4.0 series, but I am
not sure. In the summer the load was low, and the problem occurred very
rarely.

When the I/O error occurs, btrfs scrub finds no errors, there is no
memory or HDD/SSD hardware error (SMART, memtest, etc.; more than one
physical server is affected), and there are no errors in dmesg at all. I
tried different kernel configs, but I don't think I have anything
extraordinary. I use the deadline scheduler.

I use these mount options:

  /dev/xvdb1 on /mnt/mysql_naplo_b2 type btrfs (rw,noatime,compress=zlib,nossd,noacl,space_cache,subvolid=5,subvol=/)

I tried reformatting the filesystem with recent btrfs-progs (and with
older ones before that):

  btrfs-progs v4.2.2

I use the default mkfs options (skinny extents). After the reformat the
problem disappeared for some days (so it seems to correlate with the age
of the filesystem?).

I defragment the filesystem manually with a script that recursively runs
"filefrag" to count the fragments and defragments a file if it has more
than 50 extents and is larger than 64 kbyte (this sometimes lowers the
frequency of the problem).

The unreadable files are usually small files, for example:

  filefrag:
  /mnt/mysql_naplo_b2/mozanaplo_boly_altisk_2015/n_helyettesites.MYD: 2 extents found

  ls -l:
  -rw-rw 1 mysql mysql 8092 okt 22 08.24 /mnt/mysql_naplo_b2/mozanaplo_boly_altisk_2015/n_helyettesites.MYD

There is no error in dmesg, no I/O errors, no kernel panic, etc. at all.

The (virtual) servers have 3-4 GB of memory, and I use a 2 GB tmpfs for
the temporary tables (so the physical memory usage is somewhat hectic).

The filesystem has no snapshots, but sometimes (for rebuilding
replication) I take one and delete it. (The problem also happens on
filesystems where no snapshot was ever created.)

I did not try downgrading the kernel (to 3.18), but I always try to
upgrade.

I guess this problem has some connection to the memory usage (but there
is no out-of-memory condition).

I am able to try any debug mode if you suggest one, but the problem is
not reproducible; it happens randomly.

I think there should be some errors in dmesg when I encounter I/O errors,
but I am not sure whether this error is directly connected to btrfs at
all. I didn't try other filesystems.

The problem has occurred with kernel versions 4.0.1, 4.0.4, 4.1.6, 4.2.1,
4.2.3 and 4.2.4.

I checked the bugzilla and searched Google for a similar problem, but I
couldn't find anything similar.

This problem (I think it is the same one) sometimes happens on a www
server too, with Apache log files (they are heavily fragmented), but very
rarely.

I don't have any problem with this configuration on other servers, even
on MySQL servers with lower load.

I welcome any suggestion:

László Szalma
[PATCH 1/3] btrfs: Cleanup no_quota parameter
The no_quota parameters of the delayed_ref related functions have been
meaningless since 4.2-rc1, as any new delayed_ref_head will cause qgroup
to scan the extent for its rfer/excl change without checking the no_quota
flag.

So this patch cleans them up.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/ctree.h       |  4 ++--
 fs/btrfs/delayed-ref.c | 26 ++
 fs/btrfs/delayed-ref.h |  7 ++
 fs/btrfs/extent-tree.c | 45 ++---
 fs/btrfs/file.c        | 10 -
 fs/btrfs/inode.c       |  4 ++--
 fs/btrfs/ioctl.c       | 60 +-
 fs/btrfs/relocation.c  | 16 ++
 fs/btrfs/tree-log.c    |  2 +-
 9 files changed, 43 insertions(+), 131 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index bc3c711..3fa3c3b 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3422,7 +3422,7 @@ int btrfs_set_disk_extent_flags(struct btrfs_trans_handle *trans,
 int btrfs_free_extent(struct btrfs_trans_handle *trans,
 		      struct btrfs_root *root,
 		      u64 bytenr, u64 num_bytes, u64 parent, u64 root_objectid,
-		      u64 owner, u64 offset, int no_quota);
+		      u64 owner, u64 offset);
 
 int btrfs_free_reserved_extent(struct btrfs_root *root, u64 start, u64 len,
 			       int delalloc);
@@ -3435,7 +3435,7 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans,
 int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 			 struct btrfs_root *root,
 			 u64 bytenr, u64 num_bytes, u64 parent,
-			 u64 root_objectid, u64 owner, u64 offset, int no_quota);
+			 u64 root_objectid, u64 owner, u64 offset);
 
 int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans,
 				   struct btrfs_root *root);
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index bd9b63b..449974f 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -292,8 +292,7 @@ add_delayed_ref_tail_merge(struct btrfs_trans_handle *trans,
 	exist = list_entry(href->ref_list.prev, struct btrfs_delayed_ref_node,
 			   list);
 	/* No need to compare bytenr nor is_head */
-	if (exist->type != ref->type || exist->no_quota != ref->no_quota ||
-	    exist->seq != ref->seq)
+	if (exist->type != ref->type || exist->seq != ref->seq)
 		goto add_tail;
 
 	if ((exist->type == BTRFS_TREE_BLOCK_REF_KEY ||
@@ -526,7 +525,7 @@ add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
 		     struct btrfs_delayed_ref_head *head_ref,
 		     struct btrfs_delayed_ref_node *ref, u64 bytenr,
 		     u64 num_bytes, u64 parent, u64 ref_root, int level,
-		     int action, int no_quota)
+		     int action)
 {
 	struct btrfs_delayed_tree_ref *full_ref;
 	struct btrfs_delayed_ref_root *delayed_refs;
@@ -548,7 +547,6 @@ add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
 	ref->action = action;
 	ref->is_head = 0;
 	ref->in_tree = 1;
-	ref->no_quota = no_quota;
 	ref->seq = seq;
 
 	full_ref = btrfs_delayed_node_to_tree_ref(ref);
@@ -581,7 +579,7 @@ add_delayed_data_ref(struct btrfs_fs_info *fs_info,
 		     struct btrfs_delayed_ref_head *head_ref,
 		     struct btrfs_delayed_ref_node *ref, u64 bytenr,
 		     u64 num_bytes, u64 parent, u64 ref_root, u64 owner,
-		     u64 offset, int action, int no_quota)
+		     u64 offset, int action)
 {
 	struct btrfs_delayed_data_ref *full_ref;
 	struct btrfs_delayed_ref_root *delayed_refs;
@@ -604,7 +602,6 @@ add_delayed_data_ref(struct btrfs_fs_info *fs_info,
 	ref->action = action;
 	ref->is_head = 0;
 	ref->in_tree = 1;
-	ref->no_quota = no_quota;
 	ref->seq = seq;
 
 	full_ref = btrfs_delayed_node_to_data_ref(ref);
@@ -635,17 +632,13 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
 			       struct btrfs_trans_handle *trans,
 			       u64 bytenr, u64 num_bytes, u64 parent,
 			       u64 ref_root, int level, int action,
-			       struct btrfs_delayed_extent_op *extent_op,
-			       int no_quota)
+			       struct btrfs_delayed_extent_op *extent_op)
 {
 	struct btrfs_delayed_tree_ref *ref;
 	struct btrfs_delayed_ref_head *head_ref;
 	struct btrfs_delayed_ref_root *delayed_refs;
 	struct btrfs_qgroup_extent_record *record = NULL;
 
-	if (!is_fstree(ref_root) || !fs_info->quota_enabled)
-		no_quota = 0;
-
 	BUG_ON(extent_op && extent_op->is_data);
 	ref = kmem_cache_alloc(btrfs_delayed_tree_ref_cachep, GFP_NOFS);
[PATCH 2/3] btrfs: qgroup: Fix a race in delayed_ref which leads to abort trans
Between btrfs_alloc_reserved_file_extent() and
btrfs_add_delayed_qgroup_reserve(), there is a window in which
delayed_refs may be run and the delayed ref head may be freed before
btrfs_add_delayed_qgroup_reserve() is called.

This causes btrfs_add_delayed_qgroup_reserve() to return -ENOENT, and
causes the transaction to be aborted.

This patch records the qgroup reserved space info in delayed_ref_head at
btrfs_add_delayed_ref(), to eliminate the race window.

Reported-by: Filipe Manana
Signed-off-by: Qu Wenruo
---
 fs/btrfs/ctree.h       |  3 ++-
 fs/btrfs/delayed-ref.c | 22 +-
 fs/btrfs/delayed-ref.h |  2 +-
 fs/btrfs/extent-tree.c | 14 --
 fs/btrfs/inode.c       | 12
 5 files changed, 32 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 3fa3c3b..a8c9a27 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3403,7 +3403,8 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 int btrfs_alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
 				     struct btrfs_root *root,
 				     u64 root_objectid, u64 owner,
-				     u64 offset, struct btrfs_key *ins);
+				     u64 offset, u64 ram_bytes,
+				     struct btrfs_key *ins);
 int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
 				   struct btrfs_root *root,
 				   u64 root_objectid, u64 owner, u64 offset,
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 449974f..8d65427 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -422,7 +422,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
 		     struct btrfs_trans_handle *trans,
 		     struct btrfs_delayed_ref_node *ref,
 		     struct btrfs_qgroup_extent_record *qrecord,
-		     u64 bytenr, u64 num_bytes, int action, int is_data)
+		     u64 bytenr, u64 num_bytes, u64 ref_root, u64 reserved,
+		     int action, int is_data)
 {
 	struct btrfs_delayed_ref_head *existing;
 	struct btrfs_delayed_ref_head *head_ref = NULL;
@@ -431,6 +432,9 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
 	int count_mod = 1;
 	int must_insert_reserved = 0;
 
+	/* If reserved is provided, it must be a data extent. */
+	BUG_ON(!is_data && reserved);
+
 	/*
 	 * the head node stores the sum of all the mods, so dropping a ref
 	 * should drop the sum in the head node by one.
@@ -480,6 +484,11 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
 
 	/* Record qgroup extent info if provided */
 	if (qrecord) {
+		if (ref_root && reserved) {
+			head_ref->qgroup_ref_root = ref_root;
+			head_ref->qgroup_reserved = reserved;
+		}
+
 		qrecord->bytenr = bytenr;
 		qrecord->num_bytes = num_bytes;
 		qrecord->old_roots = NULL;
@@ -498,6 +507,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
 	existing = htree_insert(&delayed_refs->href_root,
 				&head_ref->href_node);
 	if (existing) {
+		WARN_ON(ref_root && reserved && existing->qgroup_ref_root
+			&& existing->qgroup_reserved);
 		update_existing_head_ref(delayed_refs, &existing->node, ref);
 		/*
 		 * we've updated the existing ref, free the newly
@@ -664,7 +675,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
 	 * the spin lock
 	 */
 	head_ref = add_delayed_ref_head(fs_info, trans, &head_ref->node, record,
-					bytenr, num_bytes, action, 0);
+					bytenr, num_bytes, 0, 0, action, 0);
 
 	add_delayed_tree_ref(fs_info, trans, head_ref, &ref->node, bytenr,
 			     num_bytes, parent, ref_root, level, action);
@@ -687,7 +698,7 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
 			       struct btrfs_trans_handle *trans,
 			       u64 bytenr, u64 num_bytes,
 			       u64 parent, u64 ref_root,
-			       u64 owner, u64 offset, int action,
+			       u64 owner, u64 offset, u64 reserved, int action,
 			       struct btrfs_delayed_extent_op *extent_op)
 {
 	struct btrfs_delayed_data_ref *ref;
@@ -726,7 +737,8 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
 	 * the spin lock
 	 */
 	head_ref = add_delayed_ref_head(fs_info, trans, &head_ref->node, record,
-					bytenr, num_bytes, action, 1);
+					bytenr, num_bytes,
[4.4][PATCH 0/3] btrfs: Qgroup hotfix
This patchset fixes 2 bugs:

1. Race condition leading to an aborted transaction
   Reported by Filipe, fixed by the 2nd patch.

2. Qgroup low-level double free leading to EDQUOT
   In fact, I hit this bug several times during internal rebasing, but I
   stupidly forgot to include the fix in the v3 patchset.
   Fixed in the 3rd patch.

Qu Wenruo (3):
  btrfs: Cleanup no_quota parameter
  btrfs: qgroup: Fix a race in delayed_ref which leads to abort trans
  btrfs: qgroup: Fix a rebase bug which will cause qgroup double free

 fs/btrfs/ctree.h       |  7 +++---
 fs/btrfs/delayed-ref.c | 48
 fs/btrfs/delayed-ref.h |  9 +++-
 fs/btrfs/extent-tree.c | 55 ++---
 fs/btrfs/file.c        | 10 -
 fs/btrfs/inode.c       | 16 +-
 fs/btrfs/ioctl.c       | 60 +-
 fs/btrfs/qgroup.c      |  4
 fs/btrfs/relocation.c  | 16 ++
 fs/btrfs/tree-log.c    |  2 +-
 10 files changed, 73 insertions(+), 154 deletions(-)

-- 
2.6.2
[PATCH 3/3] btrfs: qgroup: Fix a rebase bug which will cause qgroup double free
When rebasing my patchset, I forgot to pick up a cleanup patch that
removes an old hotfix from the 4.2 release.

Without the cleanup, the old hotfix screws up the new qgroup reserve
framework and always causes a negative reserved number.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/qgroup.c | 4
 1 file changed, 4 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 158633c..7664a63 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1652,10 +1652,6 @@ static int qgroup_update_counters(struct btrfs_fs_info *fs_info,
 			}
 		}
 
-		/* For exclusive extent, free its reserved bytes too */
-		if (nr_old_roots == 0 && nr_new_roots == 1 &&
-		    cur_new_count == nr_new_roots)
-			qg->reserved -= num_bytes;
 		if (dirty)
 			qgroup_dirty(fs_info, qg);
 	}
-- 
2.6.2
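The double free described in this patch can be illustrated with a toy model. This is only a sketch under stated assumptions: struct toy_qgroup, toy_reserve(), toy_framework_free() and toy_write_extent() are hypothetical names invented here, the counter is deliberately signed so that the double subtraction shows up as a negative number, and the assumption is that the new reserve framework already releases the reserved bytes on its own path. Only the condition inside toy_old_hotfix() is copied from the removed lines.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model; this is not the kernel's struct btrfs_qgroup. */
struct toy_qgroup {
	int64_t reserved;	/* signed on purpose, to make underflow visible */
};

static void toy_reserve(struct toy_qgroup *qg, int64_t num_bytes)
{
	qg->reserved += num_bytes;
}

/* Release performed by the (assumed) new reserve framework. */
static void toy_framework_free(struct toy_qgroup *qg, int64_t num_bytes)
{
	qg->reserved -= num_bytes;
}

/* The leftover hotfix: frees the bytes of an exclusive extent again. */
static void toy_old_hotfix(struct toy_qgroup *qg, int64_t num_bytes,
			   int nr_old_roots, int nr_new_roots,
			   int cur_new_count)
{
	if (nr_old_roots == 0 && nr_new_roots == 1 &&
	    cur_new_count == nr_new_roots)
		qg->reserved -= num_bytes;
}

/* Reserve, account one new exclusive extent, and report what is left. */
static int64_t toy_write_extent(int64_t num_bytes, int with_hotfix)
{
	struct toy_qgroup qg = { 0 };

	toy_reserve(&qg, num_bytes);
	if (with_hotfix)
		toy_old_hotfix(&qg, num_bytes, 0, 1, 1);
	toy_framework_free(&qg, num_bytes);
	return qg.reserved;
}
```

With the hotfix left in place the same bytes are subtracted twice, so the toy counter ends below zero; with it removed the reservation balances out.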
Re: [PATCH 2/2 v3] Btrfs: fix regression running delayed references when using qgroups
wrote on 2015/10/25 18:51 +:

From: Filipe Manana

In the kernel 4.2 merge window we had big changes to the implementation
of delayed references and qgroups which made the no_quota field of
delayed references not used anymore. More specifically the no_quota field
is not used anymore as of:

  commit 0ed4792af0e8 ("btrfs: qgroup: Switch to new extent-oriented qgroup mechanism.")

Leaving the no_quota field actually prevents delayed references from
getting merged, which in turn causes the following BUG_ON(), at
fs/btrfs/extent-tree.c, to be hit when qgroups are enabled:

  static int run_delayed_tree_ref(...)
  {
     (...)
     BUG_ON(node->ref_mod != 1);
     (...)
  }

This happens on a scenario like the following:

1) Ref1 bytenr X, action = BTRFS_ADD_DELAYED_REF, no_quota = 1, added.

2) Ref2 bytenr X, action = BTRFS_DROP_DELAYED_REF, no_quota = 0, added.
   It's not merged with Ref1 because Ref1->no_quota != Ref2->no_quota.

3) Ref3 bytenr X, action = BTRFS_ADD_DELAYED_REF, no_quota = 1, added.
   It's not merged with the reference at the tail of the list of refs
   for bytenr X because the reference at the tail, Ref2 is incompatible
   due to Ref2->no_quota != Ref3->no_quota.

4) Ref4 bytenr X, action = BTRFS_DROP_DELAYED_REF, no_quota = 0, added.
   It's not merged with the reference at the tail of the list of refs
   for bytenr X because the reference at the tail, Ref3 is incompatible
   due to Ref3->no_quota != Ref4->no_quota.

5) We run delayed references, trigger merging of delayed references,
   through __btrfs_run_delayed_refs() -> btrfs_merge_delayed_refs().

6) Ref1 and Ref3 are merged as Ref1->no_quota = Ref3->no_quota and
   all other conditions are satisfied too. So Ref1 gets a ref_mod
   value of 2.

7) Ref2 and Ref4 are merged as Ref2->no_quota = Ref4->no_quota and
   all other conditions are satisfied too. So Ref2 gets a ref_mod
   value of 2.

8) Ref1 and Ref2 aren't merged, because they have different values
   for their no_quota field.

9) Delayed reference Ref1 is picked for running (select_delayed_ref()
   always prefers references with an action == BTRFS_ADD_DELAYED_REF).
   So run_delayed_tree_ref() is called for Ref1 which triggers the
   BUG_ON because Ref1->ref_mod != 1 (equals 2).

So fix this by removing the no_quota field, as it's not used anymore as
of commit 0ed4792af0e8 ("btrfs: qgroup: Switch to new extent-oriented
qgroup mechanism.").

The use of no_quota was also buggy in at least two places:

1) At delayed-refs.c:btrfs_add_delayed_tree_ref() - we were setting
   no_quota to 0 instead of 1 when the following condition was true:
   is_fstree(ref_root) || !fs_info->quota_enabled

2) At extent-tree.c:__btrfs_inc_extent_ref() - we were attempting to
   reset a node's no_quota when the condition "!is_fstree(root_objectid)
   || !root->fs_info->quota_enabled" was true but we did it only in
   an unused local stack variable, that is, we never reset the no_quota
   value in the node itself.

This fixes the remainder of problems several people have been having when
running delayed references, mostly while a balance is running in parallel,
on a 4.2+ kernel.

Very special thanks to Stéphane Lesimple for helping debugging this issue
and testing this fix on his multi terabyte filesystem (which took more
than one day to balance alone, plus fsck, etc).

Also, this fixes deadlock issue when using the clone ioctl with qgroups
enabled, as reported by Elias Probst in the mailing list. The deadlock
happens because after calling btrfs_insert_empty_item we have our path
holding a write lock on a leaf of the fs/subvol tree and then before
releasing the path we called check_ref() which did backref walking, when
qgroups are enabled, and tried to read lock the same leaf. The trace for
this case is the following:

  INFO: task systemd-nspawn:6095 blocked for more than 120 seconds.
  (...)
  Call Trace:
    [] schedule+0x74/0x83
    [] btrfs_tree_read_lock+0xc0/0xea
    [] ? wait_woken+0x74/0x74
    [] btrfs_search_old_slot+0x51a/0x810
    [] btrfs_next_old_leaf+0xdf/0x3ce
    [] ? ulist_add_merge+0x1b/0x127
    [] __resolve_indirect_refs+0x62a/0x667
    [] ? btrfs_clear_lock_blocking_rw+0x78/0xbe
    [] find_parent_nodes+0xaf3/0xfc6
    [] __btrfs_find_all_roots+0x92/0xf0
    [] btrfs_find_all_roots+0x45/0x65
    [] ? btrfs_get_tree_mod_seq+0x2b/0x88
    [] check_ref+0x64/0xc4
    [] btrfs_clone+0x66e/0xb5d
    [] btrfs_ioctl_clone+0x48f/0x5bb
    [] ? native_sched_clock+0x28/0x77
    [] btrfs_ioctl+0xabc/0x25cb
  (...)

Reported-by: Stéphane Lesimple
Tested-by: Stéphane Lesimple
Reported-by: Elias Probst
Reported-by: Peter Becker
Reported-by: Malte Schröder
Reported-by: Derek Dongray
Reported-by: Erkki Seppala
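The nine-step scenario in the quoted commit message can be replayed with a small toy model. This is a sketch only, under loud assumptions: the toy_* names and structures are hypothetical and invented here, the real kernel code also compares type and seq and works on linked lists, and toy_merge_pass() is a simplified stand-in for btrfs_merge_delayed_refs(). The flag compare_no_quota switches between the old behaviour (no_quota blocks merging) and the behaviour after the field is removed.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_MAX_REFS 8

enum toy_action { TOY_ADD = 1, TOY_DROP = -1 };

/* Hypothetical, simplified delayed ref for a single bytenr. */
struct toy_ref {
	enum toy_action action;
	int no_quota;
	int ref_mod;
};

struct toy_ref_list {
	struct toy_ref refs[TOY_MAX_REFS];
	int nr;
};

/* Fold src into dst; returns 1 if dst cancelled out (ref_mod == 0). */
static int toy_fold(struct toy_ref *dst, const struct toy_ref *src)
{
	int net = dst->action * dst->ref_mod + src->action * src->ref_mod;

	dst->action = net < 0 ? TOY_DROP : TOY_ADD;
	dst->ref_mod = abs(net);
	return dst->ref_mod == 0;
}

static void toy_remove(struct toy_ref_list *l, int idx)
{
	for (int k = idx; k < l->nr - 1; k++)
		l->refs[k] = l->refs[k + 1];
	l->nr--;
}

/* Add a ref, merging it into the tail of the list when compatible. */
static void toy_add_ref(struct toy_ref_list *l, enum toy_action action,
			int no_quota, int compare_no_quota)
{
	struct toy_ref ref = { action, no_quota, 1 };

	if (l->nr > 0) {
		struct toy_ref *tail = &l->refs[l->nr - 1];

		if (!compare_no_quota || tail->no_quota == ref.no_quota) {
			if (toy_fold(tail, &ref))
				l->nr--;	/* cancelled: drop the tail */
			return;
		}
	}
	assert(l->nr < TOY_MAX_REFS);
	l->refs[l->nr++] = ref;
}

/* Run-time merge: fold every compatible later ref into an earlier one. */
static void toy_merge_pass(struct toy_ref_list *l, int compare_no_quota)
{
	int i = 0;

	while (i < l->nr) {
		int j = i + 1;

		while (j < l->nr) {
			if (compare_no_quota &&
			    l->refs[i].no_quota != l->refs[j].no_quota) {
				j++;
				continue;
			}
			toy_fold(&l->refs[i], &l->refs[j]);
			toy_remove(l, j);
		}
		if (l->refs[i].ref_mod == 0)
			toy_remove(l, i);
		else
			i++;
	}
}

/* Returns ref_mod of the first ADD ref left to run, or 0 if none is left. */
static int toy_scenario(int compare_no_quota)
{
	struct toy_ref_list l = { .nr = 0 };

	toy_add_ref(&l, TOY_ADD, 1, compare_no_quota);	/* Ref1 */
	toy_add_ref(&l, TOY_DROP, 0, compare_no_quota);	/* Ref2 */
	toy_add_ref(&l, TOY_ADD, 1, compare_no_quota);	/* Ref3 */
	toy_add_ref(&l, TOY_DROP, 0, compare_no_quota);	/* Ref4 */
	toy_merge_pass(&l, compare_no_quota);

	for (int i = 0; i < l.nr; i++)
		if (l.refs[i].action == TOY_ADD)
			return l.refs[i].ref_mod;
	return 0;
}
```

In this model, comparing no_quota leaves an ADD ref whose ref_mod is 2, which is the value the BUG_ON(node->ref_mod != 1) rejects; ignoring no_quota lets each DROP cancel the preceding ADD at insertion time, so nothing is left to run.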
Re: corrupted RAID1: unsuccessful recovery / help needed
Lukas Pirl posted on Mon, 26 Oct 2015 19:19:50 +1300 as excerpted:

> TL;DR: RAID1 does not recover, I guess the interesting part in the stack
> trace is: [elided, I'm not a dev so it's little help to me]
>
> I'd appreciate some help for repairing a corrupted RAID1.
>
> Setup:
> * Linux 4.2.0-12, Btrfs v3.17,
>   `btrfs fi show`:
>    uuid: 5be372f5-5492-4f4b-b641-c14f4ad8ae23
>    Total devices 6 FS bytes used 2.87TiB
>    devid 1 size 931.51GiB used 636.00GiB path /dev/mapper/[...]
>    devid 2 size 931.51GiB used 634.03GiB path /dev/mapper/
>    devid 3 size 1.82TiB used 1.53TiB path /dev/mapper/
>    devid 4 size 1.82TiB used 1.53TiB path /dev/mapper/
>    devid 6 size 1.82TiB used 1.05TiB path /dev/mapper/
>    *** Some devices missing
> * disks are dm-crypted

FWIW... Older btrfs userspace such as your v3.17 is "OK" for normal
runtime use, assuming you don't need any newer features, as in normal
runtime, it's the kernel code doing the real work and userspace for the
most part simply makes the appropriate kernel calls to do that work.

But, once you get into a recovery situation like the one you're in now,
current userspace becomes much more important, as the various things
you'll do to attempt recovery rely far more on userspace code directly
accessing the filesystem, and it's only the newest userspace code that
has the latest fixes.

So for a recovery situation, the newest userspace release (4.2.2 at
present) as well as a recent kernel is recommended, and depending on the
problem, you may at times need to run integration or apply patches on top
of that.

> What happened:
> * devid 5 started to die (slowly)
> * added a new disk (devid 6) and tried `btrfs device delete`
> * failed with kernel crashes (guess:) due to heavy IO errors
> * removed devid 5 from /dev (deactivated in dm-crypt)
> * tried `btrfs balance`
>   * interrupted multiple times due to kernel crashes
>     (probably due to semi-corrupted file system?)
> * file system did not mount anymore after a required hard-reset
> * no successful recovery so far:
>   if not read-only, kernel IO blocks eventually (hard-reset required)
> * tried:
>   * `-o degraded`
>     -> IO freeze, kernel log: http://pastebin.com/Rzrp7XeL
>   * `-o degraded,recovery`
>     -> IO freeze, kernel log: http://pastebin.com/VemHfnuS
>   * `-o degraded,recovery,ro`
>     -> file system accessible, system stable
> * going rw again does not fix the problem
>
> I did not btrfs-zero-log so far because my oops did not look very
> similar to the one in the Wiki and I did not want to risk to make
> recovery harder.

General note about btrfs and btrfs raid. Given that btrfs itself remains
a "stabilizing, but not yet fully mature and stable filesystem", while
btrfs raid will often let you recover from a bad device, sometimes that
recovery is in the form of letting you mount ro, so you can access the
data and copy it elsewhere, before blowing away the filesystem and
starting over.

Back to the problem at hand. Current btrfs has a known limitation when
operating in degraded mode. That being, a btrfs raid may be
write-mountable only once, degraded, after which it can only be read-only
mounted. This is because under certain circumstances in degraded mode,
btrfs will fall back from its normal raid mode to single mode chunk
allocation for new writes, and once there's single-mode chunks on the
filesystem, btrfs mount isn't currently smart enough to check that all
chunks are actually available on present devices, and simply jumps to the
conclusion that there's single mode chunks on the missing device(s) as
well, so refuses to mount writable after that in order to prevent further
damage to the filesystem and preserve the ability to mount at least ro,
to copy off what isn't damaged.

There's a patch in the pipeline for this problem, that checks individual
chunks instead of leaping to conclusions based on the presence of
single-mode chunks on a degraded filesystem with missing devices. If
that's your only problem (which the backtraces might reveal but I as a
non-dev btrfs user can't tell), the patches should let you mount
writable.

But that patch isn't in kernel 4.2. You'll need at least kernel 4.3-rc,
and possibly btrfs integration, or to cherrypick the patches onto 4.2.

Meanwhile, in keeping with the admin's rule on backups, by definition, if
you valued the data more than the time and resources necessary for a
backup, by definition, you have a backup available, otherwise, by
definition, you valued the data less than the time and resources
necessary to back it up.

Therefore, no worries. Regardless of the fate of the data, you saved what
your actions declared to be of most value to you, either the data, or the
hassle and resources cost of the backup you didn't do. As such, if you
don't have a backup (or if you do but it's outdated), the data at risk of
loss is by definition of very limited value.

That said, it
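
The difference between the current degraded-mount check and the pending per-chunk check, as described earlier in this message, can be sketched with a toy model. Everything here is an assumption made for illustration: struct toy_chunk and the toy_* functions are hypothetical names, only the single and raid1 profiles are modelled, and "a chunk is usable if at least one copy sits on a present device" is a simplification of whatever the actual patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy chunk: the devices that hold its copies. */
struct toy_chunk {
	int nr_copies;		/* 1 = single, 2 = raid1 */
	int devid[2];		/* only the first nr_copies entries are used */
};

/*
 * Behaviour as described for current kernels: if any device is missing
 * and any single-mode chunk exists anywhere, refuse a writable mount.
 */
static bool toy_rw_ok_global(const struct toy_chunk *chunks, size_t n,
			     const bool *present, size_t nr_devs)
{
	bool any_missing = false;
	bool any_single = false;

	for (size_t i = 0; i < nr_devs; i++)
		if (!present[i])
			any_missing = true;
	for (size_t i = 0; i < n; i++)
		if (chunks[i].nr_copies == 1)
			any_single = true;
	return !(any_missing && any_single);
}

/*
 * Assumed per-chunk check: a writable mount is fine as long as every
 * chunk still has at least one copy on a present device.
 */
static bool toy_rw_ok_per_chunk(const struct toy_chunk *chunks, size_t n,
				const bool *present)
{
	for (size_t i = 0; i < n; i++) {
		bool reachable = false;

		for (int c = 0; c < chunks[i].nr_copies; c++)
			if (present[chunks[i].devid[c]])
				reachable = true;
		if (!reachable)
			return false;
	}
	return true;
}

/* devid 2 is missing; the single chunk was written while degraded. */
static bool toy_example(bool per_chunk)
{
	const bool present[3] = { true, true, false };
	const struct toy_chunk chunks[] = {
		{ 2, { 0, 2 } },	/* raid1, one copy on the missing device */
		{ 2, { 0, 1 } },	/* raid1, both copies present */
		{ 1, { 1, 0 } },	/* single, on a present device */
	};
	size_t n = sizeof(chunks) / sizeof(chunks[0]);

	return per_chunk ? toy_rw_ok_per_chunk(chunks, n, present)
			 : toy_rw_ok_global(chunks, n, present, 3);
}
```

In this example the global check refuses the writable mount merely because a single-mode chunk and a missing device coexist, while the per-chunk check allows it because every chunk is still reachable.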
Re: [PATCH 1/3] btrfs: Cleanup no_quota parameter
Filipe Manana wrote on 2015/10/26 08:14 +:
> On Mon, Oct 26, 2015 at 6:11 AM, Qu Wenruo wrote:
>> No_quota parameter for delayed_ref related function are meaningless
>> after 4.2-rc1, as any new delayed_ref_head will cause qgroup to scan
>> extent for its rfer/excl change without checking no_quota flag.
>>
>> So this patch will clean them up.
>
> Hi Qu,
>
> I already sent a patch for this yesterday:
> https://patchwork.kernel.org/patch/7481901/

Sorry, I didn't notice the patch also removed no_quota...

> This is more than a cleanup, it fixes several bugs. The most important
> is crashes (BUG_ON) when running delayed references, mostly triggered
> during balance. The second one is a deadlock in the clone ioctl
> (reported at http://www.spinics.net/lists/linux-btrfs/msg45844.html).
>
> The use of no_quota was also buggy in at least 2 places:
>
> 1) At delayed-refs.c:btrfs_add_delayed_tree_ref() - we were setting
>    no_quota to 0 instead of 1 when the following condition was true:
>    is_fstree(ref_root) || !fs_info->quota_enabled
>
> 2) At extent-tree.c:__btrfs_inc_extent_ref() - we were attempting to
>    reset a node's no_quota when the condition "!is_fstree(root_objectid)
>    || !root->fs_info->quota_enabled" was true but we did it only in
>    an unused local stack variable, that is, we never reset the no_quota
>    value in the node itself.
>
> I want to get this to stable, together with the other delayed references
> fix, as a lot of people are unable to run balance as of kernel 4.2+.

Sorry again for the regression I brought in 4.2.

The rework of the delayed_ref implementation is not important at all,
and in fact the new qgroup accounting could work completely well without
it.

> I'll update the changelog to reflect the clone ioctl deadlock issue,
> which I previously forgot.
>
> thanks

That would be great.

BTW what about splitting the patch into a no_quota cleanup and the other
fixes? It's not that obvious if they are all put into one patch.
Thanks, Qu Signed-off-by: Qu Wenruo --- fs/btrfs/ctree.h | 4 ++-- fs/btrfs/delayed-ref.c | 26 ++ fs/btrfs/delayed-ref.h | 7 ++ fs/btrfs/extent-tree.c | 45 ++--- fs/btrfs/file.c| 10 - fs/btrfs/inode.c | 4 ++-- fs/btrfs/ioctl.c | 60 +- fs/btrfs/relocation.c | 16 ++ fs/btrfs/tree-log.c| 2 +- 9 files changed, 43 insertions(+), 131 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index bc3c711..3fa3c3b 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3422,7 +3422,7 @@ int btrfs_set_disk_extent_flags(struct btrfs_trans_handle *trans, int btrfs_free_extent(struct btrfs_trans_handle *trans, struct btrfs_root *root, u64 bytenr, u64 num_bytes, u64 parent, u64 root_objectid, - u64 owner, u64 offset, int no_quota); + u64 owner, u64 offset); int btrfs_free_reserved_extent(struct btrfs_root *root, u64 start, u64 len, int delalloc); @@ -3435,7 +3435,7 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans, int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root, u64 bytenr, u64 num_bytes, u64 parent, -u64 root_objectid, u64 owner, u64 offset, int no_quota); +u64 root_objectid, u64 owner, u64 offset); int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans, struct btrfs_root *root); diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index bd9b63b..449974f 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -292,8 +292,7 @@ add_delayed_ref_tail_merge(struct btrfs_trans_handle *trans, exist = list_entry(href->ref_list.prev, struct btrfs_delayed_ref_node, list); /* No need to compare bytenr nor is_head */ - if (exist->type != ref->type || exist->no_quota != ref->no_quota || - exist->seq != ref->seq) + if (exist->type != ref->type || exist->seq != ref->seq) goto add_tail; if ((exist->type == BTRFS_TREE_BLOCK_REF_KEY || @@ -526,7 +525,7 @@ add_delayed_tree_ref(struct btrfs_fs_info *fs_info, struct btrfs_delayed_ref_head *head_ref, struct btrfs_delayed_ref_node *ref, u64 bytenr, u64 
num_bytes, u64 parent, u64 ref_root, int level, -int action, int no_quota) +int action) { struct btrfs_delayed_tree_ref *full_ref; struct btrfs_delayed_ref_root *delayed_refs; @@ -548,7 +547,6 @@ add_delayed_tree_ref(struct btrfs_fs_info *fs_info, ref->action = action; ref->is_head = 0;
[PATCH 1/2 v3] Btrfs: fix regression when running delayed references
From: Filipe Manana

In the kernel 4.2 merge window we had a refactoring/rework of the delayed
references implementation in order to fix certain problems with qgroups.
However that rework introduced one more regression that leads to the
following trace when running delayed references for metadata:

[35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832!
[35908.065201] invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC
[35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc psmouse i2
[35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: GW 4.3.0-rc5-btrfs-next-17+ #1
[35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
[35908.065201] task: 880114b7d780 ti: 88010c4c8000 task.ti: 88010c4c8000
[35908.065201] RIP: 0010:[] [] insert_inline_extent_backref+0x52/0xb1 [btrfs]
[35908.065201] RSP: 0018:88010c4cbb08 EFLAGS: 00010293
[35908.065201] RAX: RBX: 88008a661000 RCX:
[35908.065201] RDX: a04dd58f RSI: 0001 RDI:
[35908.065201] RBP: 88010c4cbb40 R08: 1000 R09: 88010c4cb9f8
[35908.065201] R10: R11: 002c R12:
[35908.065201] R13: 88020a74c578 R14: R15:
[35908.065201] FS: () GS:88023edc() knlGS:
[35908.065201] CS: 0010 DS: ES: CR0: 8005003b
[35908.065201] CR2: 015e8708 CR3: 000102185000 CR4: 06e0
[35908.065201] Stack:
[35908.065201] 88010c4cbb18 0f37 88020a74c578 88015a408000
[35908.065201] 880154a44000 0005 88010c4cbbd8
[35908.065201] a0492b9a 0005
[35908.065201] Call Trace:
[35908.065201] [] __btrfs_inc_extent_ref+0x8b/0x208 [btrfs]
[35908.065201] [] ? __btrfs_run_delayed_refs+0x4d4/0xd33 [btrfs]
[35908.065201] [] __btrfs_run_delayed_refs+0xafa/0xd33 [btrfs]
[35908.065201] [] ? join_transaction.isra.10+0x25/0x41f [btrfs]
[35908.065201] [] ? join_transaction.isra.10+0xa8/0x41f [btrfs]
[35908.065201] [] btrfs_run_delayed_refs+0x75/0x1dd [btrfs]
[35908.065201] [] delayed_ref_async_start+0x3c/0x7b [btrfs]
[35908.065201] [] normal_work_helper+0x14c/0x32a [btrfs]
[35908.065201] [] btrfs_extent_refs_helper+0x12/0x14 [btrfs]
[35908.065201] [] process_one_work+0x24a/0x4ac
[35908.065201] [] worker_thread+0x206/0x2c2
[35908.065201] [] ? rescuer_thread+0x2cb/0x2cb
[35908.065201] [] ? rescuer_thread+0x2cb/0x2cb
[35908.065201] [] kthread+0xef/0xf7
[35908.065201] [] ? kthread_parkme+0x24/0x24
[35908.065201] [] ret_from_fork+0x3f/0x70
[35908.065201] [] ? kthread_parkme+0x24/0x24
[35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8 48 8d 4d d0 e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77 02 <0f> 0b 4c 8b 45 30 8b 4d 28 45 31
[35908.065201] RIP [] insert_inline_extent_backref+0x52/0xb1 [btrfs]
[35908.065201] RSP
[35908.310885] ---[ end trace fe4299baf0666457 ]---

This happens because the new delayed references code no longer merges
delayed references that have different sequence values. The following
steps are an example sequence leading to this issue:

1) Transaction N starts, fs_info->tree_mod_seq has value 0;

2) Extent buffer (btree node) A is allocated, delayed reference Ref1 for
   bytenr A is created, with a value of 1 and a seq value of 0;

3) fs_info->tree_mod_seq is incremented to 1;

4) Extent buffer A is deleted through btrfs_del_items(), which calls
   btrfs_del_leaf(), which in turn calls btrfs_free_tree_block(). The
   latter returns the metadata extent associated to extent buffer A to
   the free space cache (the range is not pinned), because the extent
   buffer was created in the current transaction (N) and writeback never
   happened for the extent buffer (flag BTRFS_HEADER_FLAG_WRITTEN not set
   in the extent buffer).
   This creates the delayed reference Ref2 for bytenr A, with a value
   of -1 and a seq value of 1;

5) Delayed reference Ref2 is not merged with Ref1 when we create it,
   because they have different sequence numbers (decided at
   add_delayed_ref_tail_merge());

6) fs_info->tree_mod_seq is incremented to 2;

7) Some task attempts to allocate a new extent buffer (done at
   extent-tree.c:find_free_extent()), but due to heavy fragmentation
   and running low on metadata space the clustered allocation fails
   and we fall back to unclustered allocation, which finds the
   extent at offset A, so a new extent buffer at offset A is allocated.
   This creates delayed reference Ref3 for bytenr A,
Re: [PATCH 1/3] btrfs: Cleanup no_quota parameter
On Mon, Oct 26, 2015 at 8:25 AM, Qu Wenruo wrote:
>
> Filipe Manana wrote on 2015/10/26 08:14 +:
>>
>> On Mon, Oct 26, 2015 at 6:11 AM, Qu Wenruo wrote:
>>>
>>> No_quota parameter for delayed_ref related function are meaningless
>>> after 4.2-rc1, as any new delayed_ref_head will cause qgroup to scan
>>> extent for its rfer/excl change without checking no_quota flag.
>>>
>>> So this patch will clean them up.
>>
>> Hi Qu,
>>
>> I already send a patch for this yesterday:
>> https://patchwork.kernel.org/patch/7481901/
>
> Sorry, I didn't notice the patch also removed no_quota...
>
>> This is more than a cleanup, it fixes several bugs. The most important
>> is crashes (BUG_ON) when running delayed references, mostly triggered
>> during balance. The second one is a deadlock in the clone ioctl
>> (reported at http://www.spinics.net/lists/linux-btrfs/msg45844.html).
>>
>> The use of no_quota was also buggy in at least 2 places:
>>
>> 1) At delayed-refs.c:btrfs_add_delayed_tree_ref() - we were setting
>>    no_quota to 0 instead of 1 when the following condition was true:
>>    is_fstree(ref_root) || !fs_info->quota_enabled
>>
>> 2) At extent-tree.c:__btrfs_inc_extent_ref() - we were attempting to
>>    reset a node's no_quota when the condition "!is_fstree(root_objectid)
>>    || !root->fs_info->quota_enabled" was true but we did it only in
>>    an unused local stack variable, that is, we never reset the no_quota
>>    value in the node itself.
>>
>> I want to get this to stable, together with the other delayed
>> references fix, as a lot of people are unable to run balance as of
>> kernel 4.2+.
>
> Sorry again for the regression I brought in 4.2.
> The rework for delayed_ref implement is not important at all, and in fact
> new qgroup accounting could work completely well without them.
>
>> I'll update the changelog to reflect the clone ioctl deadlock issue,
>> which I previously forgot.
>>
>> thanks
>
> That would be great.
>
> BTW what about split the patch into no_quota cleanup and other fixes?

No point in doing that. Fixing the balance regression requires
removing the whole no_quota thing, fixing those 2 bugs it had alone,
won't fix the problem leading to the BUG_ON.

> It's not that obvious if they are all put into one patch.
>
> Thanks,
> Qu
>
>>> Signed-off-by: Qu Wenruo
>>> ---
>>>  fs/btrfs/ctree.h | 4 ++--
>>>  fs/btrfs/delayed-ref.c | 26 ++
>>>  fs/btrfs/delayed-ref.h | 7 ++
>>>  fs/btrfs/extent-tree.c | 45 ++---
>>>  fs/btrfs/file.c| 10 -
>>>  fs/btrfs/inode.c | 4 ++--
>>>  fs/btrfs/ioctl.c | 60 +-
>>>  fs/btrfs/relocation.c | 16 ++
>>>  fs/btrfs/tree-log.c| 2 +-
>>>  9 files changed, 43 insertions(+), 131 deletions(-)
>>>
>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>> index bc3c711..3fa3c3b 100644
>>> --- a/fs/btrfs/ctree.h
>>> +++ b/fs/btrfs/ctree.h
>>> @@ -3422,7 +3422,7 @@ int btrfs_set_disk_extent_flags(struct btrfs_trans_handle *trans,
>>>  int btrfs_free_extent(struct btrfs_trans_handle *trans,
>>>                        struct btrfs_root *root,
>>>                        u64 bytenr, u64 num_bytes, u64 parent, u64 root_objectid,
>>> -                      u64 owner, u64 offset, int no_quota);
>>> +                      u64 owner, u64 offset);
>>>
>>>  int btrfs_free_reserved_extent(struct btrfs_root *root, u64 start, u64 len,
>>>                                 int delalloc);
>>> @@ -3435,7 +3435,7 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans,
>>>  int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
>>>                           struct btrfs_root *root,
>>>                           u64 bytenr, u64 num_bytes, u64 parent,
>>> -                         u64 root_objectid, u64 owner, u64 offset, int no_quota);
>>> +                         u64 root_objectid, u64 owner, u64 offset);
>>>
>>>  int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans,
>>>                                     struct btrfs_root *root);
>>> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
>>> index bd9b63b..449974f 100644
>>> --- a/fs/btrfs/delayed-ref.c
>>> +++ b/fs/btrfs/delayed-ref.c
>>> @@ -292,8 +292,7 @@ add_delayed_ref_tail_merge(struct btrfs_trans_handle *trans,
>>>          exist = list_entry(href->ref_list.prev, struct btrfs_delayed_ref_node,
>>>                             list);
>>>          /* No need to compare bytenr nor is_head */
>>> -        if (exist->type != ref->type || exist->no_quota != ref->no_quota ||
>>> -            exist->seq != ref->seq)
>>> +        if (exist->type != ref->type || exist->seq != ref->seq)
>>>                  goto add_tail;
>>>
>>>          if ((exist->type ==
Re: Recover btrfs volume which can only be mounded in read-only mode
Dmitry Katsubo posted on Sun, 18 Oct 2015 11:44:08 +0200 as excerpted:

>> Meanwhile, the present btrfs raid1 read-scheduler is both pretty simple
>> to code up and pretty simple to arrange tests for that run either one
>> side or the other, but not both, or that are well balanced to both.
>> However, it's pretty poor in terms of ensuring optimized real-world
>> deployment read-scheduling.
>>
>> What it does is simply this. Remember, btrfs raid1 is specifically two
>> copies. It chooses which copy of the two will be read very simply,
>> based on the PID making the request. Odd PIDs get assigned one copy,
>> even PIDs the other. As I said, simple to code, great for ensuring
>> testing of one copy or the other or both, but not really optimized at
>> all for real-world usage.
>>
>> If your workload happens to be a bunch of all odd or all even PIDs,
>> well, enjoy your testing-grade read-scheduler, bottlenecking everything
>> reading one copy, while the other sits entirely idle.
>
> I think PID-based solution is not the best one. Why not simply take a
> random device? Then at least all drives in the volume are equally loaded
> (in average).

Nobody argues that the even/odd-PID-based read-scheduling solution is
/optimal/, in a production sense at least. But at the time and for the
purpose it was written it was pretty good, arguably reasonably close to
"best", because the implementation is at once simple and transparent for
debugging purposes, and real easy to test either one side or the other,
or both, and equally important, to duplicate the results of those tests,
by simply arranging for the testing to have either all even or all odd
PIDs, or both. And for ordinary use, it's good /enough/, as ordinarily,
PIDs will be evenly distributed even/odd.
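
For readers who want the even/odd selection spelled out, here is a minimal C sketch of the idea as described in this message. It is an illustration only, not the kernel's code: `pick_copy_by_pid`, `count_reads_per_copy` and `NUM_COPIES` are hypothetical names made up for the example, and the only assumption taken from the thread is that raid1 keeps exactly two copies and that the reader's PID parity decides which one is read.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption from the thread: btrfs raid1 keeps exactly two copies. */
#define NUM_COPIES 2

/* Hypothetical helper: choose the copy to read from the PID alone. */
static int pick_copy_by_pid(int pid)
{
    assert(pid >= 0);
    return pid % NUM_COPIES;    /* even PIDs -> copy 0, odd PIDs -> copy 1 */
}

/* Count how many of the given reader PIDs end up on each copy. */
static void count_reads_per_copy(const int *pids, size_t n,
                                 size_t counts[NUM_COPIES])
{
    for (int c = 0; c < NUM_COPIES; c++)
        counts[c] = 0;
    for (size_t i = 0; i < n; i++)
        counts[pick_copy_by_pid(pids[i])]++;
}
```

Feeding `count_reads_per_copy` a set of all-even PIDs puts every read on copy 0, which is both the bottleneck described above and the property that makes a test run easy to repeat: the same PIDs always select the same copy.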
In that context, your random device read-scheduling algorithm would be
far worse, because while being reasonably simple, it's anything *but*
easy to ensure reads go to only one side or equally to both, or for that
matter, to duplicate the tests, because randomization, by definition
does /not/ lend itself to duplication.

And with both simplicity/transparency/debuggability and duplicatability
of testing being primary factors when the code went in...

And again, the fact that it hasn't been optimized since then, in the
context of "premature optimization", really says quite a bit about what
the btrfs devs themselves consider btrfs' status to be -- obviously *not*
production-grade stable and mature, or optimizations like this would have
already been done.

Like it or not, that's btrfs' status at the moment.

Actually, the coming N-way-mirroring may very well be why they've not yet
optimized the even/odd-PID mechanism already, because doing an optimized
two-way would obviously be premature-optimization given the coming N-way,
and doing an N-way clearly couldn't be properly tested at present,
because only two-way is possible. Introducing an optimized N-way
scheduler together with the N-way-mirroring code necessary to properly
test it thus becomes a no-brainer.

> From what you said I believe that certain servers will not benefit from
> btrfs, e.g. dedicated server that runs only one "fat" Java process, or
> one "huge" MySQL database.

Indeed. But with btrfs still "stabilizing, but not entirely stable and
mature", and indeed, various features still set to drop, and various
optimizations still yet to do including this one, nobody, leastwise not
the btrfs devs and knowledgeable regulars on this list, is /claiming/
that btrfs is at this time the be-all and end-all optimal solution for
every single use-case. Rather far from it!

As for the claims of salespeople... should any of them be making wild
claims about btrfs, who in their sane mind takes salespeople's claims at
face value in any case?

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Recover btrfs volume which can only be mounded in read-only mode
Dmitry Katsubo posted on Sun, 18 Oct 2015 11:44:08 +0200 as excerpted:

[Regarding the btrfs raid1 "device-with-the-most-space" chunk-allocation
strategy.]

> I think the mentioned strategy (fill in the device with most free space)
> is not most effective. If the data is spread equally, the read
> performance would be higher (reading from 3 disks instead of 2). In my
> case this is even crucial, because the smallest drive is SSD (and it is
> not loaded at all).
>
> Maybe I don't see the benefit from the strategy which is currently
> implemented (besides that it is robust and well-tested)?

Two comments:

1) As Hugo alluded to, in striped mode (raid0/5/6 and I believe 10), the
chunk allocator goes wide, allocating a chunk from each device with free
space, then striping at something smaller (64 KiB maybe?). When the
smallest device is full, it reduces the width by one and continues
allocating, down to the minimum stripe width for the raid type. However,
raid1 and single do device-with-the-most-space first, thus, particularly
for raid1, ensuring maximum usage of available space.

Were raid1 to do width-first, capacity would be far lower and much more
of the largest device would remain unusable, because some chunk pairs
would be allocated entirely on the smaller devices, meaning less of the
largest device would be used before the smaller devices fill up and no
more raid1 chunks could be allocated as only the single largest device
has free space left and raid1 requires allocation on two separate devices.

In the three-device raid1 case, the difference in usable capacity would
be 1/3 the capacity of the smallest device, since until it is full, 1/3
of all allocations would be to the two smaller devices, leaving that much
more space unusable on the largest device.

So you see there's a reason for most-space-first, that being that it
forces one chunk from each pair-allocation to the largest device, thereby
most efficiently distributing space so as to leave as little space as
possible unusable due to only one device left when pair-allocation is
required.

2) There has been talk of a more flexible chunk allocator with an admin-
specified strategy allowing smart use of hybrid ssd/disk filesystems, for
instance. Perhaps put the metadata on the ssds, for instance, since
btrfs metadata is relatively hot as in addition to the traditional
metadata, it contains the checksums which btrfs of course checks on read.

However, this sort of thing is likely to be some time off, as it's
relatively lower priority than various other possible features.
Unfortunately, given the rate of btrfs development, "some time off" is in
practice likely to be at least five years out.

In the meantime, there are technologies such as bcache that allow hybrid
caching of "hot" data, designed to present themselves as virtual block
devices so btrfs as well as other filesystems can layer on top.

And in fact, we have some regular users that have btrfs on top of bcache
actually deployed, and from reports, it now works quite well. (There
were some problems a while back, but they're several years in the past
now, back well before the last couple LTS kernel series that's the oldest
recommended for btrfs deployment.)

If you're interested, start a new thread with btrfs on bcache in the
subject line, and you'll likely get some very useful replies. =:^)

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
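
The most-space-first behaviour described in comment 1 of the message above can be tried out with a small simulation. The sketch below is a toy model under stated assumptions, not the btrfs allocator: device sizes are counted in whole chunks, every raid1 allocation takes one chunk from each of the two devices that currently have the most free space, and the names `raid1_most_space_first` and `raid1_capacity_bound` are hypothetical. The comparison value min(total / 2, total - largest) is the usual rule of thumb for two-copy capacity and is included only as a reference point.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model: allocate raid1 chunk pairs, always taking one chunk from each
 * of the two devices with the most free space. free_chunks[] is modified in
 * place; the return value is the number of chunk pairs allocated.
 */
static unsigned long raid1_most_space_first(unsigned long *free_chunks,
                                            size_t ndevs)
{
    unsigned long pairs = 0;

    assert(ndevs >= 2);
    for (;;) {
        size_t a = 0, b = 1;    /* a: most free space, b: second most */

        if (free_chunks[b] > free_chunks[a]) {
            a = 1;
            b = 0;
        }
        for (size_t i = 2; i < ndevs; i++) {
            if (free_chunks[i] > free_chunks[a]) {
                b = a;
                a = i;
            } else if (free_chunks[i] > free_chunks[b]) {
                b = i;
            }
        }
        if (free_chunks[b] == 0)
            break;              /* fewer than two devices have space left */
        free_chunks[a]--;
        free_chunks[b]--;
        pairs++;
    }
    return pairs;
}

/* Rule-of-thumb upper bound for two copies: min(total / 2, total - largest). */
static unsigned long raid1_capacity_bound(const unsigned long *sizes,
                                          size_t ndevs)
{
    unsigned long total = 0, largest = 0;

    for (size_t i = 0; i < ndevs; i++) {
        total += sizes[i];
        if (sizes[i] > largest)
            largest = sizes[i];
    }
    return (total / 2 < total - largest) ? total / 2 : total - largest;
}
```

For a three-device layout such as 4, 4 and 8 chunks, the simulation can be compared against the bound to see how much of the largest device ends up paired; swapping in a different selection rule inside the loop is a simple way to experiment with the width-first alternative discussed above.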
Re: [PATCH 1/3] btrfs: Cleanup no_quota parameter
Qu Wenruo wrote on 2015/10/26 16:25 +0800:
> Filipe Manana wrote on 2015/10/26 08:14 +:
>> On Mon, Oct 26, 2015 at 6:11 AM, Qu Wenruo wrote:
>>> No_quota parameter for delayed_ref related function are meaningless
>>> after 4.2-rc1, as any new delayed_ref_head will cause qgroup to scan
>>> extent for its rfer/excl change without checking no_quota flag.
>>>
>>> So this patch will clean them up.
>>
>> Hi Qu,
>>
>> I already send a patch for this yesterday:
>> https://patchwork.kernel.org/patch/7481901/
>
> Sorry, I didn't notice the patch also removed no_quota...
>
>> This is more than a cleanup, it fixes several bugs. The most important
>> is crashes (BUG_ON) when running delayed references, mostly triggered
>> during balance. The second one is a deadlock in the clone ioctl
>> (reported at http://www.spinics.net/lists/linux-btrfs/msg45844.html).
>>
>> The use of no_quota was also buggy in at least 2 places:
>>
>> 1) At delayed-refs.c:btrfs_add_delayed_tree_ref() - we were setting
>>    no_quota to 0 instead of 1 when the following condition was true:
>>    is_fstree(ref_root) || !fs_info->quota_enabled
>>
>> 2) At extent-tree.c:__btrfs_inc_extent_ref() - we were attempting to
>>    reset a node's no_quota when the condition "!is_fstree(root_objectid)
>>    || !root->fs_info->quota_enabled" was true but we did it only in
>>    an unused local stack variable, that is, we never reset the no_quota
>>    value in the node itself.
>>
>> I want to get this to stable, together with the other delayed
>> references fix, as a lot of people are unable to run balance as of
>> kernel 4.2+.
>
> Sorry again for the regression I brought in 4.2.
> The rework for delayed_ref implement is not important at all, and in fact
> new qgroup accounting could work completely well without them.
>
>> I'll update the changelog to reflect the clone ioctl deadlock issue,
>> which I previously forgot.
>>
>> thanks
>
> That would be great.
>
> BTW what about split the patch into no_quota cleanup and other fixes?
> It's not that obvious if they are all put into one patch.

Just forget it...
The cleanup itself will fix them all

Thanks,
Qu

> Thanks,
> Qu
>
>>> Signed-off-by: Qu Wenruo
>>> ---
>>>  fs/btrfs/ctree.h | 4 ++--
>>>  fs/btrfs/delayed-ref.c | 26 ++
>>>  fs/btrfs/delayed-ref.h | 7 ++
>>>  fs/btrfs/extent-tree.c | 45 ++---
>>>  fs/btrfs/file.c| 10 -
>>>  fs/btrfs/inode.c | 4 ++--
>>>  fs/btrfs/ioctl.c | 60 +-
>>>  fs/btrfs/relocation.c | 16 ++
>>>  fs/btrfs/tree-log.c| 2 +-
>>>  9 files changed, 43 insertions(+), 131 deletions(-)
>>>
>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>> index bc3c711..3fa3c3b 100644
>>> --- a/fs/btrfs/ctree.h
>>> +++ b/fs/btrfs/ctree.h
>>> @@ -3422,7 +3422,7 @@ int btrfs_set_disk_extent_flags(struct btrfs_trans_handle *trans,
>>>  int btrfs_free_extent(struct btrfs_trans_handle *trans,
>>>                        struct btrfs_root *root,
>>>                        u64 bytenr, u64 num_bytes, u64 parent, u64 root_objectid,
>>> -                      u64 owner, u64 offset, int no_quota);
>>> +                      u64 owner, u64 offset);
>>>
>>>  int btrfs_free_reserved_extent(struct btrfs_root *root, u64 start, u64 len,
>>>                                 int delalloc);
>>> @@ -3435,7 +3435,7 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans,
>>>  int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
>>>                           struct btrfs_root *root,
>>>                           u64 bytenr, u64 num_bytes, u64 parent,
>>> -                         u64 root_objectid, u64 owner, u64 offset, int no_quota);
>>> +                         u64 root_objectid, u64 owner, u64 offset);
>>>
>>>  int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans,
>>>                                     struct btrfs_root *root);
>>> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
>>> index bd9b63b..449974f 100644
>>> --- a/fs/btrfs/delayed-ref.c
>>> +++ b/fs/btrfs/delayed-ref.c
>>> @@ -292,8 +292,7 @@ add_delayed_ref_tail_merge(struct btrfs_trans_handle *trans,
>>>          exist = list_entry(href->ref_list.prev, struct btrfs_delayed_ref_node,
>>>                             list);
>>>          /* No need to compare bytenr nor is_head */
>>> -        if (exist->type != ref->type || exist->no_quota != ref->no_quota ||
>>> -            exist->seq != ref->seq)
>>> +        if (exist->type != ref->type || exist->seq != ref->seq)
>>>                  goto add_tail;
>>>
>>>          if ((exist->type == BTRFS_TREE_BLOCK_REF_KEY ||
>>> @@ -526,7 +525,7 @@ add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
>>>                       struct btrfs_delayed_ref_head *head_ref,
>>>                       struct btrfs_delayed_ref_node *ref, u64 bytenr,
>>>                       u64 num_bytes, u64 parent, u64 ref_root, int level,
>>> -                     int action, int no_quota)
>>> +                     int action)
>>>  {
>>>          struct btrfs_delayed_tree_ref *full_ref;
>>>          struct btrfs_delayed_ref_root *delayed_refs;
>>> @@ -548,7 +547,6 @@
[4.4][PATCH 0/3] btrfs: Qgroup hotfix
This patchset fixes 2 bugs:

1. Race condition leading to abort transaction
   Reported by Filipe, fixed by 2nd patch.

2. Qgroup low level double free leading to EDQUOT
   In fact, I hit this bug several times during internal rebase, but
   I forgot to include the fix in the v3 patchset.
   Fixed in 3rd patch.

Qu Wenruo (3):
  btrfs: Cleanup no_quota parameter
  btrfs: qgroup: Fix a race in delayed_ref which leads to abort trans
  btrfs: qgroup: Fix a rebase bug which will cause qgroup double free

 fs/btrfs/ctree.h | 7 +++---
 fs/btrfs/delayed-ref.c | 48
 fs/btrfs/delayed-ref.h | 9 +++-
 fs/btrfs/extent-tree.c | 55 ++---
 fs/btrfs/file.c| 10 -
 fs/btrfs/inode.c | 16 +-
 fs/btrfs/ioctl.c | 60 +-
 fs/btrfs/qgroup.c | 4
 fs/btrfs/relocation.c | 16 ++
 fs/btrfs/tree-log.c| 2 +-
 10 files changed, 73 insertions(+), 154 deletions(-)

-- 
2.6.2
Re: [PATCH 1/3] btrfs: Cleanup no_quota parameter
On Mon, Oct 26, 2015 at 6:11 AM, Qu Wenruo wrote:
> No_quota parameter for delayed_ref related function are meaningless
> after 4.2-rc1, as any new delayed_ref_head will cause qgroup to scan
> extent for its rfer/excl change without checking no_quota flag.
>
> So this patch will clean them up.

Hi Qu,

I already send a patch for this yesterday:
https://patchwork.kernel.org/patch/7481901/

This is more than a cleanup, it fixes several bugs. The most important
is crashes (BUG_ON) when running delayed references, mostly triggered
during balance. The second one is a deadlock in the clone ioctl
(reported at http://www.spinics.net/lists/linux-btrfs/msg45844.html).

The use of no_quota was also buggy in at least 2 places:

1) At delayed-refs.c:btrfs_add_delayed_tree_ref() - we were setting
   no_quota to 0 instead of 1 when the following condition was true:
   is_fstree(ref_root) || !fs_info->quota_enabled

2) At extent-tree.c:__btrfs_inc_extent_ref() - we were attempting to
   reset a node's no_quota when the condition "!is_fstree(root_objectid)
   || !root->fs_info->quota_enabled" was true but we did it only in
   an unused local stack variable, that is, we never reset the no_quota
   value in the node itself.

I want to get this to stable, together with the other delayed
references fix, as a lot of people are unable to run balance as of
kernel 4.2+.

I'll update the changelog to reflect the clone ioctl deadlock issue,
which I previously forgot.

thanks

>
> Signed-off-by: Qu Wenruo
> ---
>  fs/btrfs/ctree.h | 4 ++--
>  fs/btrfs/delayed-ref.c | 26 ++
>  fs/btrfs/delayed-ref.h | 7 ++
>  fs/btrfs/extent-tree.c | 45 ++---
>  fs/btrfs/file.c| 10 -
>  fs/btrfs/inode.c | 4 ++--
>  fs/btrfs/ioctl.c | 60 +-
>  fs/btrfs/relocation.c | 16 ++
>  fs/btrfs/tree-log.c| 2 +-
>  9 files changed, 43 insertions(+), 131 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index bc3c711..3fa3c3b 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3422,7 +3422,7 @@ int btrfs_set_disk_extent_flags(struct btrfs_trans_handle *trans,
>  int btrfs_free_extent(struct btrfs_trans_handle *trans,
>                        struct btrfs_root *root,
>                        u64 bytenr, u64 num_bytes, u64 parent, u64 root_objectid,
> -                      u64 owner, u64 offset, int no_quota);
> +                      u64 owner, u64 offset);
>
>  int btrfs_free_reserved_extent(struct btrfs_root *root, u64 start, u64 len,
>                                 int delalloc);
> @@ -3435,7 +3435,7 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans,
>  int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
>                           struct btrfs_root *root,
>                           u64 bytenr, u64 num_bytes, u64 parent,
> -                         u64 root_objectid, u64 owner, u64 offset, int no_quota);
> +                         u64 root_objectid, u64 owner, u64 offset);
>
>  int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans,
>                                     struct btrfs_root *root);
> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> index bd9b63b..449974f 100644
> --- a/fs/btrfs/delayed-ref.c
> +++ b/fs/btrfs/delayed-ref.c
> @@ -292,8 +292,7 @@ add_delayed_ref_tail_merge(struct btrfs_trans_handle *trans,
>          exist = list_entry(href->ref_list.prev, struct btrfs_delayed_ref_node,
>                             list);
>          /* No need to compare bytenr nor is_head */
> -        if (exist->type != ref->type || exist->no_quota != ref->no_quota ||
> -            exist->seq != ref->seq)
> +        if (exist->type != ref->type || exist->seq != ref->seq)
>                  goto add_tail;
>
>          if ((exist->type == BTRFS_TREE_BLOCK_REF_KEY ||
> @@ -526,7 +525,7 @@ add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
>                       struct btrfs_delayed_ref_head *head_ref,
>                       struct btrfs_delayed_ref_node *ref, u64 bytenr,
>                       u64 num_bytes, u64 parent, u64 ref_root, int level,
> -                     int action, int no_quota)
> +                     int action)
>  {
>          struct btrfs_delayed_tree_ref *full_ref;
>          struct btrfs_delayed_ref_root *delayed_refs;
> @@ -548,7 +547,6 @@ add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
>          ref->action = action;
>          ref->is_head = 0;
>          ref->in_tree = 1;
> -        ref->no_quota = no_quota;
>          ref->seq = seq;
>
>          full_ref = btrfs_delayed_node_to_tree_ref(ref);
> @@ -581,7 +579,7 @@ add_delayed_data_ref(struct btrfs_fs_info *fs_info,
>                       struct btrfs_delayed_ref_head *head_ref,
>                       struct btrfs_delayed_ref_node *ref, u64 bytenr,
>
[PATCH 2/2 v3] Btrfs: fix regression running delayed references when using qgroups
From: Filipe Manana

In the kernel 4.2 merge window we had big changes to the implementation
of delayed references and qgroups which made the no_quota field of delayed
references not used anymore. More specifically the no_quota field is not
used anymore as of:

  commit 0ed4792af0e8 ("btrfs: qgroup: Switch to new extent-oriented qgroup mechanism.")

Leaving the no_quota field actually prevents delayed references from
getting merged, which in turn causes the following BUG_ON(), at
fs/btrfs/extent-tree.c, to be hit when qgroups are enabled:

  static int run_delayed_tree_ref(...)
  {
     (...)
     BUG_ON(node->ref_mod != 1);
     (...)
  }

This happens on a scenario like the following:

  1) Ref1 bytenr X, action = BTRFS_ADD_DELAYED_REF, no_quota = 1, added.

  2) Ref2 bytenr X, action = BTRFS_DROP_DELAYED_REF, no_quota = 0, added.
     It's not merged with Ref1 because Ref1->no_quota != Ref2->no_quota.

  3) Ref3 bytenr X, action = BTRFS_ADD_DELAYED_REF, no_quota = 1, added.
     It's not merged with the reference at the tail of the list of refs
     for bytenr X because the reference at the tail, Ref2, is incompatible
     due to Ref2->no_quota != Ref3->no_quota.

  4) Ref4 bytenr X, action = BTRFS_DROP_DELAYED_REF, no_quota = 0, added.
     It's not merged with the reference at the tail of the list of refs
     for bytenr X because the reference at the tail, Ref3, is incompatible
     due to Ref3->no_quota != Ref4->no_quota.

  5) We run delayed references, trigger merging of delayed references,
     through __btrfs_run_delayed_refs() -> btrfs_merge_delayed_refs().

  6) Ref1 and Ref3 are merged as Ref1->no_quota == Ref3->no_quota and
     all other conditions are satisfied too. So Ref1 gets a ref_mod
     value of 2.

  7) Ref2 and Ref4 are merged as Ref2->no_quota == Ref4->no_quota and
     all other conditions are satisfied too. So Ref2 gets a ref_mod
     value of 2.

  8) Ref1 and Ref2 aren't merged, because they have different values
     for their no_quota field.

  9) Delayed reference Ref1 is picked for running (select_delayed_ref()
     always prefers references with an action == BTRFS_ADD_DELAYED_REF).
     So run_delayed_tree_ref() is called for Ref1, which triggers the
     BUG_ON because Ref1->ref_mod != 1 (it equals 2).

So fix this by removing the no_quota field, as it's not used anymore as
of commit 0ed4792af0e8 ("btrfs: qgroup: Switch to new extent-oriented
qgroup mechanism.").

The use of no_quota was also buggy in at least two places:

  1) At delayed-refs.c:btrfs_add_delayed_tree_ref() - we were setting
     no_quota to 0 instead of 1 when the following condition was true:
     is_fstree(ref_root) || !fs_info->quota_enabled

  2) At extent-tree.c:__btrfs_inc_extent_ref() - we were attempting to
     reset a node's no_quota when the condition "!is_fstree(root_objectid)
     || !root->fs_info->quota_enabled" was true, but we did it only in
     an unused local stack variable, that is, we never reset the no_quota
     value in the node itself.

This fixes the remainder of the problems several people have been having
when running delayed references, mostly while a balance is running in
parallel, on a 4.2+ kernel.

Very special thanks to Stéphane Lesimple for helping debug this issue
and for testing this fix on his multi-terabyte filesystem (which took
more than one day to balance alone, plus fsck, etc).

Also, this fixes a deadlock issue when using the clone ioctl with qgroups
enabled, as reported by Elias Probst in the mailing list. The deadlock
happens because after calling btrfs_insert_empty_item we have our path
holding a write lock on a leaf of the fs/subvol tree and then, before
releasing the path, we called check_ref() which did backref walking, when
qgroups are enabled, and tried to read lock the same leaf. The trace for
this case is the following:

  INFO: task systemd-nspawn:6095 blocked for more than 120 seconds.
  (...)
  Call Trace:
    schedule+0x74/0x83
    btrfs_tree_read_lock+0xc0/0xea
    ? wait_woken+0x74/0x74
    btrfs_search_old_slot+0x51a/0x810
    btrfs_next_old_leaf+0xdf/0x3ce
    ? ulist_add_merge+0x1b/0x127
    __resolve_indirect_refs+0x62a/0x667
    ? btrfs_clear_lock_blocking_rw+0x78/0xbe
    find_parent_nodes+0xaf3/0xfc6
    __btrfs_find_all_roots+0x92/0xf0
    btrfs_find_all_roots+0x45/0x65
    ? btrfs_get_tree_mod_seq+0x2b/0x88
    check_ref+0x64/0xc4
    btrfs_clone+0x66e/0xb5d
    btrfs_ioctl_clone+0x48f/0x5bb
    ? native_sched_clock+0x28/0x77
    btrfs_ioctl+0xabc/0x25cb
  (...)

Reported-by: Stéphane Lesimple
Tested-by: Stéphane Lesimple
Reported-by: Elias Probst
Reported-by: Peter Becker
Reported-by: Malte Schröder
Reported-by: Derek Dongray
Reported-by: Erkki Seppala
Cc: sta...@vger.kernel.org # 4.2+
Signed-off-by: Filipe Manana
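The nine-step scenario in the patch description above can be sketched as a toy model. This is only an illustration under stated assumptions, not the kernel's merge code: the names Ref, merge_refs and SCENARIO are hypothetical, and "merging" is reduced to summing signed reference counts among refs considered compatible. It shows why comparing no_quota leaves two refs with ref_mod == 2, while ignoring the field lets all four cancel out.

```python
from dataclasses import dataclass

ADD, DROP = "add", "drop"


@dataclass
class Ref:
    """Toy stand-in for a delayed tree ref on one bytenr (hypothetical)."""
    action: str
    no_quota: int
    ref_mod: int = 1


def merge_refs(refs, compare_no_quota):
    """Merge refs that are considered compatible.

    Simplified model: refs in the same compatibility group are summed,
    ADD counting +ref_mod and DROP counting -ref_mod. A group that sums
    to zero disappears entirely.
    """
    totals = {}
    for ref in refs:
        key = ref.no_quota if compare_no_quota else None
        sign = 1 if ref.action == ADD else -1
        totals[key] = totals.get(key, 0) + sign * ref.ref_mod

    merged = []
    for key, total in totals.items():
        if total == 0:
            continue  # ADDs and DROPs cancelled out
        action = ADD if total > 0 else DROP
        merged.append(Ref(action, key or 0, abs(total)))
    return merged


# Ref1..Ref4 from the scenario: ADD/DROP alternating, no_quota 1/0/1/0.
SCENARIO = [Ref(ADD, 1), Ref(DROP, 0), Ref(ADD, 1), Ref(DROP, 0)]

if __name__ == "__main__":
    # With no_quota compared: two refs survive, each with ref_mod == 2,
    # which is the state that trips BUG_ON(node->ref_mod != 1).
    print(merge_refs(SCENARIO, compare_no_quota=True))
    # With no_quota ignored: everything cancels and nothing is left to run.
    print(merge_refs(SCENARIO, compare_no_quota=False))
```

The grouping key is the only thing that changes between the two calls, which mirrors the point of the patch: the field carried no useful information anymore but still blocked merging.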
corrupted RAID1: unsuccessful recovery / help needed
TL;DR: RAID1 does not recover, I guess the interesting part in the stack
trace is:

  Call Trace:
    __del_reloc_root+0x30/0x100 [btrfs]
    free_reloc_roots+0x25/0x40 [btrfs]
    merge_reloc_roots+0x18e/0x240 [btrfs]
    btrfs_recover_relocation+0x374/0x420 [btrfs]
    open_ctree+0x1b7d/0x23e0 [btrfs]
    btrfs_mount+0x94e/0xa70 [btrfs]
    ? find_next_bit+0x15/0x20
    mount_fs+0x38/0x160
    …

Hello list.

I'd appreciate some help with repairing a corrupted RAID1.

Setup:
* Linux 4.2.0-12, Btrfs v3.17, `btrfs fi show`:
    uuid: 5be372f5-5492-4f4b-b641-c14f4ad8ae23
    Total devices 6 FS bytes used 2.87TiB
    devid 1 size 931.51GiB used 636.00GiB path /dev/mapper/WD-WCC4J7AFLTSZ
    devid 2 size 931.51GiB used 634.03GiB path /dev/mapper/WD-WCAU45343103
    devid 3 size 1.82TiB used 1.53TiB path /dev/mapper/WD-WCAVY6423276
    devid 4 size 1.82TiB used 1.53TiB path /dev/mapper/WD-WCAZAF872578
    devid 6 size 1.82TiB used 1.05TiB path /dev/mapper/WD-WMC4M0H3Z5UK
    *** Some devices missing
* disks are dm-crypted

What happened:
* devid 5 started to die (slowly)
* added a new disk (devid 6) and tried `btrfs device delete`
* failed with kernel crashes (guess:) due to heavy IO errors
* removed devid 5 from /dev (deactivated in dm-crypt)
* tried `btrfs balance`
  * interrupted multiple times due to kernel crashes
    (probably due to semi-corrupted file system?)
* file system did not mount anymore after a required hard-reset
* no successful recovery so far:
  if not read-only, kernel IO blocks eventually (hard-reset required)
* tried:
  * `-o degraded`
    -> IO freeze, kernel log: http://pastebin.com/Rzrp7XeL
  * `-o degraded,recovery`
    -> IO freeze, kernel log: http://pastebin.com/VemHfnuS
  * `-o degraded,recovery,ro`
    -> file system accessible, system stable
* going rw again does not fix the problem

I have not run btrfs-zero-log so far because my oops did not look very
similar to the one in the Wiki and I did not want to risk making
recovery harder.

Thanks,
Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Exclusive quota of snapshot exceeded despite no space used
Thanks a lot for your reply!

While remounting the filesystem fixes the issue temporarily, it doesn't
take very long for the bug to happen again, so it's not really a
workaround I can work with.

I did recompile the kernel using your patches, but unfortunately the
problem still appears.

Thanks,
Johannes

Interesting; merely touching a file causing EDQUOT is quite a big problem.
I'll try to reproduce it with my patchset and see what really caused the
problem.

The problem seems to have to do with snapshot qgroup hacking, but I'm not
completely sure yet.

BTW, does "sync; btrfs qgroup show -prce" still show excl as 16K?
16K is the correct number with only 6 empty files, just in case.

Thanks,
Qu

I ran my example from the first mail again and managed to write 7 files
this time; "qgroup show" still shows 16kB after sync:

root@t420:/media/extern/snap# btrfs qg limit -e 50M .
root@t420:/media/extern/snap# for file in {1..100}; do touch $file; sleep 5m; done
touch: cannot touch ‘8’: Disk quota exceeded
^C
root@t420:/media/extern/snap# sync
root@t420:/media/extern/snap# btrfs qgroup show -pcre .
qgroupid         rfer         excl     max_rfer     max_excl parent  child
--------         ----         ----     --------     -------- ------  -----
0/5          16.00KiB     16.00KiB         none         none ---     ---
0/257        16.00KiB     16.00KiB         none         none ---     ---
0/258        16.00KiB     16.00KiB         none     50.00MiB ---     ---
root@t420:/media/extern/snap# btrfs fi sync .
FSSync '.'
root@t420:/media/extern/snap# btrfs qgroup show -pcre .
qgroupid         rfer         excl     max_rfer     max_excl parent  child
--------         ----         ----     --------     -------- ------  -----
0/5          16.00KiB     16.00KiB         none         none ---     ---
0/257        16.00KiB     16.00KiB         none         none ---     ---
0/258        16.00KiB     16.00KiB         none     50.00MiB ---     ---

By the way, I don't know if it's relevant, but the problem is not limited
to exclusive quotas; it also happens when setting a "referenced" limit
(qgroup limit without "-e").

Thanks,
Johannes

The bug is located, and turns out to be quite a stupid problem caused by
myself. I just forgot to include a cleanup patch during rebase AGAIN!!!

You can apply the following patch to resolve it:
[PATCH 3/3] btrfs: qgroup: Fix a rebase bug which will cause qgroup double free

Or just apply the whole patchset:
[4.4][PATCH 0/3] btrfs: Qgroup hotfix

At least, with the patchset based on Chris' integration-4.4 branch, it
succeeded in touching all the 100 files in my test box.

Thanks,
Qu
Re: BTRFS with 8TB SMR drives
I decided to give this ST8000AS0002 a try for storing old snapshots,
although standardization for more optimal/native control of SMR drives is
still ongoing. I saw people got it working with a 3.18 kernel, so that
gave confidence.

I wanted to see if I could get it running with a 4.3.0-rc6 kernel (and
4.2.3 tools) on an H87M-Pro eSATA (non-Intel) port. The filesystem is
btrfs with all single profiles on top of dm-crypt, mounted with
compress-force=zlib,nossd (I use the drive via bcache, but it is currently
not attached to a cache device).

The initial snapshot send | receive action crashed after 1.2TB
transferred, with all the typical/known problems in dmesg.

Then the same trial, with a newly created fs, on one of the Intel SATA
ports. The same timeouts were seen in dmesg, but the fs was already
corrupted after a few GB of data transfer.

It seemed that the drive was not able to handle and store the filesystem
data stream that was being pushed onto it. So I took a step back and just
created an ext4 on it and did an rsync copy. Unfortunately, the same
timeouts, port resets etc. occurred.

As the drive made the main system unstable, I hooked it up to an AMD
E-350 based board, also to try other kernels. On this board too, there was
no success with 4.x kernels and, at first, not with 3.18.22 either. But I
figured out that a power cycle did the trick, and not just a hard or soft
reset. So I again created the fs from scratch and mounted it as indicated.

Now it is 55% filled (3.9TiB) with 10 snapshots (done as increments from
the source fs from late 2013, with an uncompressed allocation of about
5.6 TiB). The whole data transfer took about 4 days, which is roughly 10x
slower than what would be achieved if the drive were non-SMR and in a
fast (e.g. Core i7) system.

Although the task below took more than 8 minutes:
[322087.174089] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
... the fs and system run OK.

My take is that this relatively low average data transfer rate (one
reason I forced zlib compression) helped get the task done successfully
for this device-managed SMR drive, but it is unsatisfying that there are
kernel version and computer system dependencies. I had limited time for
preparing and setting up the data transfer, so other configurations with
new kernels might also work, but I had the most confidence upfront in the
one that has turned out to work.

Maybe now that all data is on the drive, I will shrink the fs and create
a test fs in a second partition.

On Sat, Oct 24, 2015 at 5:27 AM, Ken Long wrote:
> Hello,
>
> I have a single version of this drive formatted with btrfs. It's my
> only btrfs drive on this machine.
> I'm getting similar errors. Is there any info I can provide to help
> troubleshoot this?
>
> Is a full dmesg still wanted?
>
> here's what I'm running-
>
> $ uname -a
> Linux machine 4.2.0-16-lowlatency #19-Ubuntu SMP PREEMPT Thu Oct 8
> 16:19:23 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
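As a rough check on the figures in the message above, the average throughput can be worked out from the sizes and duration it reports. This is a back-of-the-envelope sketch: it assumes the "about 4 days" and the 3.9 TiB / 5.6 TiB figures are exact and that the transfer ran continuously, neither of which the message guarantees. The function name is made up for the illustration.

```python
TIB = 1024 ** 4
MIB = 1024 ** 2
SECONDS_PER_DAY = 86400


def average_throughput_mib_s(bytes_moved, days):
    """Average rate in MiB/s for bytes_moved spread evenly over days."""
    return bytes_moved / (days * SECONDS_PER_DAY) / MIB


# 3.9 TiB actually written to the SMR drive (after zlib compression).
on_disk_rate = average_throughput_mib_s(3.9 * TIB, 4)
# 5.6 TiB of uncompressed data read from the source filesystem.
logical_rate = average_throughput_mib_s(5.6 * TIB, 4)

if __name__ == "__main__":
    print(f"on-disk: {on_disk_rate:.1f} MiB/s")
    print(f"logical: {logical_rate:.1f} MiB/s")
```

Under those assumptions the drive absorbed a little under 12 MiB/s on average, or about 17 MiB/s measured in uncompressed data, which is in line with the "roughly 10x slower" estimate if a conventional drive sustains on the order of 100 MiB/s or more.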
Re: [PATCH v7 5/4] copy_file_range.2: New page documenting copy_file_range()
On Mon, Oct 26, 2015 at 12:19:33PM +, Pádraig Brady wrote:
> On 26/10/15 03:39, Christoph Hellwig wrote:
> > On Sat, Oct 24, 2015 at 01:02:21PM +0100, Pádraig Brady wrote:
> >> I'm a bit worried about the sparse expansion and default reflinking
> >> which might preclude cp(1) from using this call in most cases, but I will
> >> test and try to use it. coreutils has heuristics for determining if files
> >> are remote, which we might use to restrict to that use case.
> >
> > Can you explain why reflinking and hole expansion are an issue if done
> > locally and not if done remotely? I'd really like to make the call as
> > usable as possible for everyone, but we really need clear semantics for
> > that.
>
> Fair point on local vs remote. I was just assuming that remote
> copy offload would not do reflinking on the backend, or at
> least wasn't an exposed option over the remote interface.

The server could definitely do a reflink.

More generally, from the description of the NFS COPY operation:

	https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-39#page-64

	If the copy completes successfully, either synchronously or
	asynchronously, the data copied from the source file to the
	destination file MUST appear identical to the NFS client.
	However, the NFS server's on disk representation of the data in
	the source file and destination file MAY differ.  For example,
	the NFS server might encrypt, compress, deduplicate, or
	otherwise represent the on disk data in the source and
	destination file differently.

> I get the impression that you think reflinking should be hidden
> from the user, i.e. cp(1) should not have had the --reflink option
> (for the last 6 years)? I'm not convinced of that, and even so
> I think lower level interfaces would benefit from finer grained options.
> This would be especially useful since there is no general interface
> to reflink at present. I was happy with the reflink control options,
> thinking the extra control could allow cp to use this by default.

Maybe that's a case for Christoph's "clone" operation.

I agree with him that it makes sense to allow the filesystem to
implement "copy" using reflink or similar tricks under the covers. And
that in fact it's difficult to imagine how you'd prevent that in the
presence of layers of filesystem or block protocols underneath.

That "cp" flag seems strange to me, but if "cp" wants to take advantage
of a copy system call while continuing to make something like that
distinction then I suppose it could fallocate the destination range
after the copy.

--b.

> > Also note that Anna's current series allows for hole filling - any decent
> > implementation should not do them, but that's really a quality of
> > implementation and not an interface issue.
>
> I think you're saying the default `cp --sparse=auto` operation
> could rely on copy_file_range(...complete file...), while
> cp --sparse={always,never} would have to iterate over the
> file, punching or filling holes as appropriate. I thought
> Anna indicated differently wrt splice filling holes by default.
>
> TBH I'm not clear on the semantics of the current implementation,
> so need to test the above in various cases.
>
> thanks,
> Pádraig.
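For readers unfamiliar with the call being discussed, here is a minimal sketch of how a cp-like tool might use it, written against Python's os.copy_file_range() wrapper (Python 3.8+, Linux). It reflects the interface as it exists today, not necessarily the v7 patch series under review in this thread, and the helper name copy_file is hypothetical. Whether the kernel or filesystem implements the copy with a reflink, a server-side copy or a plain read/write loop is invisible to the caller, which is the point being debated above. The sketch falls back to read/write if the call is unavailable or fails.

```python
import os


def copy_file(src_path, dst_path, chunk=1 << 20):
    """Copy src_path to dst_path and return the number of bytes copied.

    Prefers copy_file_range(2); falls back to a read/write loop when the
    wrapper is missing or the kernel/filesystem rejects the call.
    """
    src = os.open(src_path, os.O_RDONLY)
    try:
        dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            offload = hasattr(os, "copy_file_range")
            copied = 0
            while True:
                if offload:
                    try:
                        # Uses and advances both file offsets.
                        n = os.copy_file_range(src, dst, chunk)
                    except OSError:
                        offload = False  # e.g. ENOSYS, EXDEV, EINVAL
                        continue
                else:
                    data = os.read(src, chunk)
                    n = len(data)
                    view = memoryview(data)
                    while view:  # handle short writes
                        view = view[os.write(dst, view):]
                if n == 0:  # end of file
                    break
                copied += n
            return copied
        finally:
            os.close(dst)
    finally:
        os.close(src)
```

A caller that wanted the old "never share blocks" behaviour would have to do something extra after the copy, such as the fallocate suggested above; this sketch deliberately leaves that out because its effect depends on the filesystem.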
Re: Exclusive quota of snapshot exceeded despite no space used
Johannes Henninger wrote on 2015/10/27 01:15 +0100:

On 26.10.2015 08:12, Qu Wenruo wrote:

Thanks a lot for your reply!

While remounting the filesystem fixes the issue temporarily, it doesn't
take very long for the bug to happen again, so it's not really a
workaround I can work with.

I did recompile the kernel using your patches, but unfortunately the
problem still appears.

Thanks,
Johannes

Interesting; merely touching a file causing EDQUOT is quite a big problem.
I'll try to reproduce it with my patchset and see what really caused the
problem.

The problem seems to have to do with snapshot qgroup hacking, but I'm not
completely sure yet.

BTW, does "sync; btrfs qgroup show -prce" still show excl as 16K?
16K is the correct number with only 6 empty files, just in case.

Thanks,
Qu

I ran my example from the first mail again and managed to write 7 files
this time; "qgroup show" still shows 16kB after sync:

root@t420:/media/extern/snap# btrfs qg limit -e 50M .
root@t420:/media/extern/snap# for file in {1..100}; do touch $file; sleep 5m; done
touch: cannot touch ‘8’: Disk quota exceeded
^C
root@t420:/media/extern/snap# sync
root@t420:/media/extern/snap# btrfs qgroup show -pcre .
qgroupid         rfer         excl     max_rfer     max_excl parent  child
--------         ----         ----     --------     -------- ------  -----
0/5          16.00KiB     16.00KiB         none         none ---     ---
0/257        16.00KiB     16.00KiB         none         none ---     ---
0/258        16.00KiB     16.00KiB         none     50.00MiB ---     ---
root@t420:/media/extern/snap# btrfs fi sync .
FSSync '.'
root@t420:/media/extern/snap# btrfs qgroup show -pcre .
qgroupid         rfer         excl     max_rfer     max_excl parent  child
--------         ----         ----     --------     -------- ------  -----
0/5          16.00KiB     16.00KiB         none         none ---     ---
0/257        16.00KiB     16.00KiB         none         none ---     ---
0/258        16.00KiB     16.00KiB         none     50.00MiB ---     ---

By the way, I don't know if it's relevant, but the problem is not limited
to exclusive quotas; it also happens when setting a "referenced" limit
(qgroup limit without "-e").

Thanks,
Johannes

The bug is located, and turns out to be quite a stupid problem caused by
myself. I just forgot to include a cleanup patch during rebase AGAIN!!!

You can apply the following patch to resolve it:
[PATCH 3/3] btrfs: qgroup: Fix a rebase bug which will cause qgroup double free

Or just apply the whole patchset:
[4.4][PATCH 0/3] btrfs: Qgroup hotfix

At least, with the patchset based on Chris' integration-4.4 branch, it
succeeded in touching all the 100 files in my test box.

Thanks,
Qu

It's working! Thank you so much for fixing this bug, you don't even know
how much this has helped me!

Thanks!
Johannes

Glad to hear that.

If it's working for you, it would be better to add a 'Tested-by' tag for
the 3rd patch.

Thanks,
Qu
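The qgroup tables quoted in this thread are what made the bug obvious: usage was 16 KiB against a 50 MiB limit, yet writes failed. A small sketch of how such output could be checked mechanically is given below. It assumes the column order shown in the thread for `btrfs qgroup show -pcre` (qgroupid, rfer, excl, max_rfer, max_excl, ...); the function names and the SAMPLE text are made up for the illustration, and other btrfs-progs versions may lay the table out differently.

```python
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3, "TiB": 1024 ** 4}


def parse_size(text):
    """Convert '16.00KiB' style values to bytes; 'none' means no limit."""
    if text == "none":
        return None
    # Try the longer suffixes first so 'KiB' is not mistaken for 'B'.
    for unit in sorted(UNITS, key=len, reverse=True):
        if text.endswith(unit):
            return int(float(text[: -len(unit)]) * UNITS[unit])
    return int(text)


def over_limit(table):
    """Return the qgroup ids whose rfer or excl exceeds its limit."""
    hits = []
    for line in table.splitlines():
        fields = line.split()
        # Data rows start with an id such as 0/258; skip headers and rules.
        if len(fields) < 5 or "/" not in fields[0]:
            continue
        rfer, excl, max_rfer, max_excl = (parse_size(f) for f in fields[1:5])
        if (max_rfer is not None and rfer > max_rfer) or (
            max_excl is not None and excl > max_excl
        ):
            hits.append(fields[0])
    return hits


SAMPLE = """\
qgroupid rfer excl max_rfer max_excl parent child
0/5 16.00KiB 16.00KiB none none --- ---
0/257 16.00KiB 16.00KiB none none --- ---
0/258 16.00KiB 16.00KiB none 50.00MiB --- ---
"""

if __name__ == "__main__":
    # An empty list here, combined with "Disk quota exceeded" errors,
    # points at an accounting bug rather than a genuinely full quota.
    print(over_limit(SAMPLE))
```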
Re: Bad fs performance, IO freezes
Hello,
currently my computer freezes every few seconds for half a second or
so. Using it feels like I'm playing musical chairs with the kernel.

I have just one download happening on utorrent right now - this is
what the graph looks like:
http://i.imgur.com/LqhMtrJ.png
and every time a new spike happens, a freeze happens just before
that... that's the only time those freezes happen, too.

Please advise.

On Mon, Oct 26, 2015 at 7:31 PM, cheater00 . wrote:
> I do not experience btrfs-transacti going up to 100% for minutes at a
> time now (not reproduced yet) but I have it spiking up to say 30% for
> a short while and everything jags during that time. So, say, if I am
> watching youtube, the sound cuts out and the video drops out for a
> bit. And if I'm typing, then what I typed during that time gets lost,
> like if I never typed that.
>
> I have also connected the same HDD bay with a USB3 cable instead of
> USB2. It's on an USB3 port. So it's running via USB3 now.
>
>
> On Mon, Oct 26, 2015 at 6:43 PM, cheater00 . wrote:
>> So far I cannot reproduce. If I don't post again this means the issue
>> has been fixed by updating the kernel.
>>
>> On Mon, Oct 26, 2015 at 4:40 PM, cheater00 . wrote:
>>> I have located 4.3.0-rc7 binaries which I will now try.
>>>
>>> On Mon, Oct 26, 2015 at 3:38 PM, cheater00 . wrote:

Thanks for the reply. What version did this go into? I'll try getting
a prebuilt backport of the kernel, building source could slow things
down considerably, but debs will not be available for the latest few
minor versions I guess. So if you can tell me a min version, I'll try
to find the latest deb newer than that, or I'll build if that's not
available.

On Mon, Oct 26, 2015 at 3:25 PM, Liu Bo wrote:
> On 10/26/2015 08:16 PM, cheater00 . wrote:
>>
>> Hi guys,
>> I am running into really bad performance. Here's my setup:
>>
>> WD Red 6 TB connected over USB2 to my core i7 laptop, running Ubuntu
>> 32-bit with kernel 4.0.4-040004-generic #201505171336.
>>
>> Single btrfs partition covering whole disk.
>>
>> Autodefrag is on.
>>
>> fstab line:
>> UUID=... /media/X btrfs rw,nosuid,nodev,autodefrag 0 0
>>
>> Sometimes when files are being modified or removed, I see
>> btrfs-transacti eat 100% cpu; during this time no io operations
>> succeed, that is, they're all stalled. You can't even ls on that fs.
>> This happens for several minutes then normal operation resumes. There
>> doesn't seem to be a rule to what will trigger this, other than
>> opening a single file and reading usually works quite well. (say,
>> watching a movie while all other programs are closed). But even moving
>> files off the disks triggers some sort of bug. Just now I am moving a
>> few files (just 30gb worth) onto another disk, and the bug triggers.
>> So btrfs-transacti was eating my cpu for over 5 minutes and according
>> to mv's output after this was done and cpu usage went back to normal
>> what I was waiting for was for a tiny png file to be removed. This is
>> pretty bad.
>>
>> I have tried defragmenting directories where files are being accessed
>> and moved. This hasn't helped.
>>
>> This happens whether the FS is near full or not. It currently is near
>> full but it wasn't before and it still did that. It still has about ~
>> 100GB free space now.
>>
>> The more things are happening the more often this bug gets triggered.
>> So if I have utorrent running and its temporary downloads directory is
>> there, its download speed graph will be a few spikes of running at
>> several MB/sec separated by durations of 0 download speed.
>>
>> Nothing seems to show up in dmesg or syslog.
>>
>> I have asked in #btrfs but the suggestions ended up not fixing the
>> issue (autodefrag, defrag dirs).
>>
>> Please advise what I should do with this issue.
>
>
> It might be related to delayed ref rework, the last time I saw this kind of
> hanging problem about btrfs-transaction eating cpu is that because btrfs
> doesn't merge delayed refs, it'd be best to try the latest kernel and if
> the issue is not resolved, then we can work out a reproducer and provide
> debugging.
>
> Thanks,
>
> Liubo
[PATCH] btrfs: Print Warning only if ENOSPC_DEBUG is enabled
Signed-off-by: Ashish Samant
---
 fs/btrfs/delayed-inode.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index a2ae427..b86cfd9 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -652,9 +652,13 @@ static int btrfs_delayed_inode_reserve_metadata(
 		goto out;
 
 	ret = btrfs_block_rsv_migrate(src_rsv, dst_rsv, num_bytes);
-	if (!WARN_ON(ret))
+	if (!ret)
 		goto out;
 
+	if (btrfs_test_opt(root, ENOSPC_DEBUG))
+		WARN(1, KERN_DEBUG
+		     "btrfs: block rsv migrate returned %d\n", ret);
+
 	/*
 	 * Ok this is a problem, let's just steal from the global rsv
 	 * since this really shouldn't happen that often.
-- 
1.8.3.2
Re: [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref
Filipe Manana wrote on 2015/10/25 14:39 +:

On Tue, Oct 13, 2015 at 3:20 AM, Qu Wenruo wrote:

Add new function btrfs_add_delayed_qgroup_reserve() to record
how much space is reserved for that extent.

As btrfs only accounts qgroup at run_delayed_refs() time, so newly
allocated extent should keep the reserved space until then.

So add needed function with related members to do it.

Signed-off-by: Qu Wenruo
---
v2:
  None
v3:
  None
---
 fs/btrfs/delayed-ref.c | 29 +++++++++++++++++++++++++++++
 fs/btrfs/delayed-ref.h | 14 ++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index ac3e81d..bd9b63b 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -476,6 +476,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
 	INIT_LIST_HEAD(&head_ref->ref_list);
 	head_ref->processing = 0;
 	head_ref->total_ref_mod = count_mod;
+	head_ref->qgroup_reserved = 0;
+	head_ref->qgroup_ref_root = 0;

 	/* Record qgroup extent info if provided */
 	if (qrecord) {
@@ -746,6 +748,33 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
 	return 0;
 }

+int btrfs_add_delayed_qgroup_reserve(struct btrfs_fs_info *fs_info,
+				     struct btrfs_trans_handle *trans,
+				     u64 ref_root, u64 bytenr, u64 num_bytes)
+{
+	struct btrfs_delayed_ref_root *delayed_refs;
+	struct btrfs_delayed_ref_head *ref_head;
+	int ret = 0;
+
+	if (!fs_info->quota_enabled || !is_fstree(ref_root))
+		return 0;
+
+	delayed_refs = &trans->transaction->delayed_refs;
+
+	spin_lock(&delayed_refs->lock);
+	ref_head = find_ref_head(&delayed_refs->href_root, bytenr, 0);
+	if (!ref_head) {
+		ret = -ENOENT;
+		goto out;
+	}

Hi Qu,

So while running btrfs/063, with qgroups enabled (I modified the test
to enable qgroups), ran into this 2 times:

[169125.246506] BTRFS info (device sdc): disk space caching is enabled
[169125.363164] [ cut here ]
[169125.365236] WARNING: CPU: 10 PID: 2827 at fs/btrfs/inode.c:2929 btrfs_finish_ordered_io+0x347/0x4eb [btrfs]()
[169125.367702] BTRFS: Transaction aborted (error -2)
[169125.368830] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc parport i2c_piix4 psmouse acpi_cpufreq microcode pcspkr processor evdev i2c_core serio_raw button ext4 crc16 jbd2 mbcache sd_mod sg sr_mod cdrom ata_generic virtio_scsi ata_piix libata floppy virtio_pci virtio_ring scsi_mod e1000 virtio [last unloaded: btrfs]
[169125.376755] CPU: 10 PID: 2827 Comm: kworker/u32:14 Tainted: G W 4.3.0-rc5-btrfs-next-17+ #1

Hi Filipe,

Although not related to the bug report, I'm a little interested in your
testing kernel.

Are you testing integration-4.4 from Chris' repo?
Or 4.3-rc from the mainline repo with my qgroup reserve patchset applied?

Although integration-4.4 has already merged the qgroup reserve patchset,
it's causing some strange bugs like over-decreasing data
sinfo->bytes_may_use, mainly in the generic/127 testcase.

But if the qgroup reserve patchset is rebased to integration-4.3 (I did
all my old tests based on that), there is no generic/127 problem at all.

Thanks,
Qu

[169125.378522] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[169125.380916] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[169125.382167] 88007ef2bc28 812566f4 88007ef2bc70
[169125.383643] 88007ef2bc60 8104d0a6 a03cac33 8801f5ca6db0
[169125.385197] 8802c6c7ee98 880122bc1000 fffe 88007ef2bcc8
[169125.386691] Call Trace:
[169125.387194] dump_stack+0x4e/0x79
[169125.388205] warn_slowpath_common+0x9f/0xb8
[169125.389386] ? btrfs_finish_ordered_io+0x347/0x4eb [btrfs]
[169125.390837] warn_slowpath_fmt+0x48/0x50
[169125.391839] ? unpin_extent_cache+0xbe/0xcc [btrfs]
[169125.392973] btrfs_finish_ordered_io+0x347/0x4eb [btrfs]
[169125.395714] ? _raw_spin_unlock_irqrestore+0x38/0x60
[169125.396888] ? trace_hardirqs_off_caller+0x1f/0xb9
[169125.397986] finish_ordered_fn+0x15/0x17 [btrfs]
[169125.399122] normal_work_helper+0x14c/0x32a [btrfs]
[169125.400300] btrfs_endio_write_helper+0x12/0x14 [btrfs]
[169125.401450] process_one_work+0x24a/0x4ac
[169125.402631] worker_thread+0x206/0x2c2
[169125.403622] ? rescuer_thread+0x2cb/0x2cb
[169125.404693] kthread+0xef/0xf7
[169125.405727] ? kthread_parkme+0x24/0x24
[169125.406808] ret_from_fork+0x3f/0x70
[169125.407834] ? kthread_parkme+0x24/0x24
[169125.408840] ---[ end trace 6ee4342a5722b119 ]---
[169125.409654] BTRFS: error (device sdc) in
Re: [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref
Chris Mason wrote on 2015/10/27 01:14 -0400:

On Tue, Oct 27, 2015 at 12:13:11PM +0800, Qu Wenruo wrote:

Filipe Manana wrote on 2015/10/25 14:39 +:

On Tue, Oct 13, 2015 at 3:20 AM, Qu Wenruo wrote:

Add new function btrfs_add_delayed_qgroup_reserve() to record
how much space is reserved for that extent.

As btrfs only accounts qgroup at run_delayed_refs() time, so newly
allocated extent should keep the reserved space until then.

So add needed function with related members to do it.

Signed-off-by: Qu Wenruo
---
v2:
  None
v3:
  None
---
 fs/btrfs/delayed-ref.c | 29 +++++++++++++++++++++++++++++
 fs/btrfs/delayed-ref.h | 14 ++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index ac3e81d..bd9b63b 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -476,6 +476,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
 	INIT_LIST_HEAD(&head_ref->ref_list);
 	head_ref->processing = 0;
 	head_ref->total_ref_mod = count_mod;
+	head_ref->qgroup_reserved = 0;
+	head_ref->qgroup_ref_root = 0;

 	/* Record qgroup extent info if provided */
 	if (qrecord) {
@@ -746,6 +748,33 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
 	return 0;
 }

+int btrfs_add_delayed_qgroup_reserve(struct btrfs_fs_info *fs_info,
+				     struct btrfs_trans_handle *trans,
+				     u64 ref_root, u64 bytenr, u64 num_bytes)
+{
+	struct btrfs_delayed_ref_root *delayed_refs;
+	struct btrfs_delayed_ref_head *ref_head;
+	int ret = 0;
+
+	if (!fs_info->quota_enabled || !is_fstree(ref_root))
+		return 0;
+
+	delayed_refs = &trans->transaction->delayed_refs;
+
+	spin_lock(&delayed_refs->lock);
+	ref_head = find_ref_head(&delayed_refs->href_root, bytenr, 0);
+	if (!ref_head) {
+		ret = -ENOENT;
+		goto out;
+	}

Hi Qu,

So while running btrfs/063, with qgroups enabled (I modified the test
to enable qgroups), ran into this 2 times:

[169125.246506] BTRFS info (device sdc): disk space caching is enabled
[169125.363164] [ cut here ]
[169125.365236] WARNING: CPU: 10 PID: 2827 at fs/btrfs/inode.c:2929 btrfs_finish_ordered_io+0x347/0x4eb [btrfs]()
[169125.367702] BTRFS: Transaction aborted (error -2)
[169125.368830] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc parport i2c_piix4 psmouse acpi_cpufreq microcode pcspkr processor evdev i2c_core serio_raw button ext4 crc16 jbd2 mbcache sd_mod sg sr_mod cdrom ata_generic virtio_scsi ata_piix libata floppy virtio_pci virtio_ring scsi_mod e1000 virtio [last unloaded: btrfs]
[169125.376755] CPU: 10 PID: 2827 Comm: kworker/u32:14 Tainted: G W 4.3.0-rc5-btrfs-next-17+ #1

Hi Filipe,

Although not related to the bug report, I'm a little interested in your
testing kernel.

Are you testing integration-4.4 from Chris' repo?
Or 4.3-rc from the mainline repo with my qgroup reserve patchset applied?

Although integration-4.4 has already merged the qgroup reserve patchset,
it's causing some strange bugs like over-decreasing data
sinfo->bytes_may_use, mainly in the generic/127 testcase.

But if the qgroup reserve patchset is rebased to integration-4.3 (I did
all my old tests based on that), there is no generic/127 problem at all.

Did I mismerge things?

-chris

Not sure yet.

But at least some patches in 4.3 are not in integration-4.4, like the
following patch:
btrfs: Avoid truncate tailing page if fallocate range doesn't exceed inode size

I'll continue testing and bisecting to see what triggers the strange
WARN_ON() in integration-4.4.
--
Oct 27 11:05:00 vmware kernel: WARNING: CPU: 4 PID: 13711 at fs/btrfs//extent-tree.c:4171 btrfs_free_reserved_data_space_noquota+0x175/0x190 [btrfs]()
Oct 27 11:05:00 vmware kernel: Modules linked in: btrfs(OE) fuse vfat msdos fat xfs binfmt_misc bridge stp llc dm_snapshot dm_bufio dm_flakey loop iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw iptable_filter ip_tables dm_mirror dm_region_hash dm_log xor dm_mod crc32c_intel vmw_balloon raid6_pq nfsd vmw_vmci i2c_piix4 shpchp auth_rpcgss acpi_cpufreq nfs_acl lockd grace sunrpc ext4 mbcache jbd2 sd_mod vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix vmxnet3 libata vmw_pvscsi floppy [last unloaded: btrfs]
Oct 27 11:05:00 vmware kernel: CPU: 4 PID: 13711 Comm: fsx Tainted: G W OE 4.3.0-rc5+ #5
Oct 27 11:05:00 vmware kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 08/16/2013
Oct 27 11:05:00 vmware kernel: 2caf2373 88021f63b760 81302e73
Oct 27 11:05:00 vmware kernel:
Re: [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref
On Tue, Oct 27, 2015 at 12:13:11PM +0800, Qu Wenruo wrote:
>
>
> Filipe Manana wrote on 2015/10/25 14:39 +:
> >On Tue, Oct 13, 2015 at 3:20 AM, Qu Wenruo wrote:
> >>Add new function btrfs_add_delayed_qgroup_reserve() function to record
> >>how much space is reserved for that extent.
> >>
> >>As btrfs only accounts qgroup at run_delayed_refs() time, so newly
> >>allocated extent should keep the reserved space until then.
> >>
> >>So add needed function with related members to do it.
> >>
> >>Signed-off-by: Qu Wenruo
> >>---
> >>v2:
> >>  None
> >>v3:
> >>  None
> >>---
> >> fs/btrfs/delayed-ref.c | 29 +
> >> fs/btrfs/delayed-ref.h | 14 ++
> >> 2 files changed, 43 insertions(+)
> >>
> >>diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> >>index ac3e81d..bd9b63b 100644
> >>--- a/fs/btrfs/delayed-ref.c
> >>+++ b/fs/btrfs/delayed-ref.c
> >>@@ -476,6 +476,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
> >>         INIT_LIST_HEAD(&head_ref->ref_list);
> >>         head_ref->processing = 0;
> >>         head_ref->total_ref_mod = count_mod;
> >>+        head_ref->qgroup_reserved = 0;
> >>+        head_ref->qgroup_ref_root = 0;
> >>
> >>         /* Record qgroup extent info if provided */
> >>         if (qrecord) {
> >>@@ -746,6 +748,33 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
> >>         return 0;
> >> }
> >>
> >>+int btrfs_add_delayed_qgroup_reserve(struct btrfs_fs_info *fs_info,
> >>+                                     struct btrfs_trans_handle *trans,
> >>+                                     u64 ref_root, u64 bytenr, u64 num_bytes)
> >>+{
> >>+        struct btrfs_delayed_ref_root *delayed_refs;
> >>+        struct btrfs_delayed_ref_head *ref_head;
> >>+        int ret = 0;
> >>+
> >>+        if (!fs_info->quota_enabled || !is_fstree(ref_root))
> >>+                return 0;
> >>+
> >>+        delayed_refs = &trans->transaction->delayed_refs;
> >>+
> >>+        spin_lock(&delayed_refs->lock);
> >>+        ref_head = find_ref_head(&delayed_refs->href_root, bytenr, 0);
> >>+        if (!ref_head) {
> >>+                ret = -ENOENT;
> >>+                goto out;
> >>+        }
> >
> >Hi Qu,
> >
> >So while running btrfs/063, with qgroups enabled (I modified the test
> >to enable qgroups), ran into this 2 times:
> >
> >[169125.246506] BTRFS info (device sdc): disk space caching is enabled
> >[169125.363164] [ cut here ]
> >[169125.365236] WARNING: CPU: 10 PID: 2827 at fs/btrfs/inode.c:2929
> >btrfs_finish_ordered_io+0x347/0x4eb [btrfs]()
> >[169125.367702] BTRFS: Transaction aborted (error -2)
> >[169125.368830] Modules linked in: btrfs dm_flakey dm_mod
> >crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs
> >lockd grace fscache sunrpc loop fuse parport_pc parport i2c_piix4
> >psmouse acpi_cpufreq microcode pcspkr processor evdev i2c_core
> >serio_raw button ext4 crc16 jbd2 mbcache sd_mod sg sr_mod cdrom
> >ata_generic virtio_scsi ata_piix libata floppy virtio_pci virtio_ring
> >scsi_mod e1000 virtio [last unloaded: btrfs]
> >[169125.376755] CPU: 10 PID: 2827 Comm: kworker/u32:14 Tainted: G
> > W 4.3.0-rc5-btrfs-next-17+ #1
>
> Hi Filipe,
>
> Although not related to the bug report, I'm a little interested in your
> testing kernel.
>
> Are you testing integration-4.4 from Chris repo?
> Or 4.3-rc from mainline repo with my qgroup reserve patchset applied?
>
> Although integration-4.4 already merged qgroup reserve patchset, but it's
> causing some strange bug like over decrease data sinfo->bytes_may_use,
> mainly in generic/127 testcase.
>
> But if qgroup reserve patchset is rebased to integration-4.3 (I did all my
> old tests based on that), no generic/127 problem at all.

Did I mismerge things?

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Recover btrfs volume which can only be mounded in read-only mode
Hugo Mills posted on Mon, 26 Oct 2015 09:24:57 + as excerpted:

> On Mon, Oct 26, 2015 at 09:14:00AM +, Duncan wrote:
>> Dmitry Katsubo posted on Sun, 18 Oct 2015 11:44:08 +0200 as excerpted:
>>
>>> I think PID-based solution is not the best one. Why not simply take a
>>> random device? Then at least all drives in the volume are equally
>>> loaded (in average).
>>
>> Nobody argues that the even/odd-PID-based read-scheduling solution is
>> /optimal/, in a production sense at least. But [it's near ideal for
>> testing, and "good enough" for the most general case].
>
> For what it's worth, David tried implementing round-robin (IIRC)
> some time ago, and found that it performed *worse* than the pid-based
> system. (It may have been random, but memory says it was round-robin).

What I'd like to know is what mdraid1 uses, and if btrfs can get that.
Because some upgrades ago, after trying mdraid6 for the main system and
mdraid0 for some parts (with mdraid1 for boot since grub1 could deal with
it, but not the others), I eventually settled on 4-way mdraid1 for
everything, using the same disks I had used for the raid6 and raid0.

And I was rather blown away by the mdraid1 speed, in comparison,
especially compared to raid0, which I thought would be better than raid1.
I guess my use-case is multi-thread read-heavy enough that, with whatever
mdraid1 uses, I was getting up to four separate reads (one per spindle)
going at once. Writes still happened at single-spindle speed: with SATA
(as opposed to the older IDE; this was when SATA was still new), each
spindle had its own channel, so they could write in parallel, with the
bottleneck being the speed at which the slowest of the four completed its
write. So writes were single-spindle-speed, still far faster than the
raid6 read-modify-write cycle, while reads... it really did appear to
multitask one per spindle.

Also, the mdraid1 may have actually taken into account spindle head
location as well, and scheduled reads to the spindle with the head
already positioned closest to the target, tho I'm not sure on that.

But whatever mdraid1 scheduling does, I was totally astonished at how
efficient it was, and it really did turn my thinking on most efficient
raid choices upside down. So if btrfs could simply take that scheduler
and modify it as necessary for btrfs specifics, provided the
modifications weren't /too/ heavy (and the fact that btrfs does read-time
checksum verification could very well mean the algorithm as directly
adapted as possible may not reach anything like the same efficiency), I
really do think that'd be the ideal.

And of course it's freedomware code in the same kernel, so reusing the
mdraid read-scheduler shouldn't be the problem it might be in other
circumstances, tho the possible caveat of btrfs specific implementation
issues does remain.

And of course someone would have to take the time to adapt it to work
with btrfs, which gets us back onto the practical side of things, the
"opportunity rich, developer-time poor" situation that is btrfs coding
reality, premature optimization, possibly doing it at the same time as
N-way-mirroring, etc.

But anyway, mdraid's raid1 read-scheduler really does seem to be
impressively efficient, the benchmark to try to match, if possible. If
that can be done by reusing some of the same code, so much the better.
=:^)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
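For readers skimming the thread, here is a minimal user-space sketch of the even/odd-PID choice discussed above, assuming exactly two copies. The function names pick_mirror and mirror_load are made up for this illustration and this is not the kernel code; it only shows why a workload whose readers all have even (or all odd) PIDs ends up on a single copy.

```shell
#!/bin/sh
# Hypothetical illustration only: mimics the even/odd-PID choice in user space.

# pick_mirror PID -> prints 0 or 1, i.e. which of the two raid1 copies
# a reader with that PID would be sent to under the even/odd rule.
pick_mirror() {
    echo $(( $1 % 2 ))
}

# mirror_load PID... -> prints how many of the given PIDs land on each copy.
mirror_load() {
    copy0=0
    copy1=0
    for pid in "$@"; do
        if [ "$(pick_mirror "$pid")" -eq 0 ]; then
            copy0=$((copy0 + 1))
        else
            copy1=$((copy1 + 1))
        fi
    done
    echo "copy0=$copy0 copy1=$copy1"
}
```

Feeding it only even PIDs, e.g. `mirror_load 100 102 104`, puts every reader on copy 0, which is the bottleneck case mentioned in the discussion; a mixed set of PIDs spreads out.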
Re: [PATCH] Btrfs-progs: Prevent creation of filesystem with 'mixed bgs' and having differing sectorsize and nodesize.
On Wed, Oct 14, 2015 at 11:10:38PM +0530, Chandan Rajendra wrote:
> mkfs.btrfs allows creation of Btrfs filesystem instances with mixed block
> group feature enabled and having a sectorsize different from nodesize.
> For e.g:
>
> [root@localhost btrfs-progs]# mkfs.btrfs -f -M -s 4096 -n 16384 /dev/loop0
> Forcing mixed metadata/data groups
> btrfs-progs v3.19-rc2-404-gbbbd18e-dirty
> See http://btrfs.wiki.kernel.org for more information.
>
> Performing full device TRIM (4.00GiB) ...
> Label:              (null)
> UUID:               c82b5720-6d88-4fa1-ac05-d0d4cb797fd5
> Node size:          16384
> Sector size:        4096
> Filesystem size:    4.00GiB
> Block group profiles:
>   Data+Metadata:    single            8.00MiB
>   System:           single            4.00MiB
> SSD detected:       no
> Incompat features:  mixed-bg, extref, skinny-metadata
> Number of devices:  1
> Devices:
>    ID        SIZE  PATH
>     1     4.00GiB  /dev/loop6
>
> This commit fixes the issue by setting BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS
> feature bit before checking the validity of nodesize that was specified on the
> command line.
>
> Signed-off-by: Chandan Rajendra

Test added and applied, thanks.
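As a rough illustration of the rule this patch enforces (mixed block groups require nodesize to equal sectorsize), here is a small shell sketch. The helper name check_mixed_sizes and its argument order are assumptions for this example; it is not code from btrfs-progs and only mirrors the check described in the commit message.

```shell
#!/bin/sh
# Hypothetical helper, not part of btrfs-progs: rejects the combination the
# patch is meant to prevent (mixed block groups with nodesize != sectorsize).

# check_mixed_sizes MIXED SECTORSIZE NODESIZE
#   MIXED      1 if -M/--mixed was requested, 0 otherwise
#   SECTORSIZE value that would be passed to -s
#   NODESIZE   value that would be passed to -n
check_mixed_sizes() {
    mixed=$1
    sectorsize=$2
    nodesize=$3
    if [ "$mixed" -eq 1 ] && [ "$sectorsize" -ne "$nodesize" ]; then
        echo "mixed block groups need nodesize ($nodesize) == sectorsize ($sectorsize)" >&2
        return 1
    fi
    return 0
}
```

With the values from the report, `check_mixed_sizes 1 4096 16384` fails, while the same sizes without mixed block groups (`check_mixed_sizes 0 4096 16384`) pass.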
Bad fs performance, IO freezes
Hi guys,
I am running into really bad performance. Here's my setup:

WD Red 6 TB connected over USB2 to my core i7 laptop, running Ubuntu
32-bit with kernel 4.0.4-040004-generic #201505171336.

Single btrfs partition covering whole disk.

Autodefrag is on.

fstab line:
UUID=... /media/X btrfs rw,nosuid,nodev,autodefrag 0 0

Sometimes when files are being modified or removed, I see
btrfs-transacti eat 100% cpu; during this time no io operations
succeed, that is, they're all stalled. You can't even ls on that fs.
This happens for several minutes then normal operation resumes. There
doesn't seem to be a rule to what will trigger this, other than
opening a single file and reading usually works quite well. (say,
watching a movie while all other programs are closed). But even moving
files off the disks triggers some sort of bug. Just now I am moving a
few files (just 30gb worth) onto another disk, and the bug triggers.
So btrfs-transacti was eating my cpu for over 5 minutes and according
to mv's output after this was done and cpu usage went back to normal
what I was waiting for was for a tiny png file to be removed. This is
pretty bad.

I have tried defragmenting directories where files are being accessed
and moved. This hasn't helped.

This happens whether the FS is near full or not. It currently is near
full but it wasn't before and it still did that. It still has about ~
100GB free space now.

The more things are happening the more often this bug gets triggered.
So if I have utorrent running and its temporary downloads directory is
there, its download speed graph will be a few spikes of running at
several MB/sec separated by durations of 0 download speed.

Nothing seems to show up in dmesg or syslog.

I have asked in #btrfs but the suggestions ended up not fixing the
issue (autodefrag, defrag dirs).

Please advise what I should do with this issue.
Re: Bad fs performance, IO freezes
I get the same kind of muddy errors when I do a quota rescan on the
filesystem or qgroup show on any subvolume on mine, and I know I don't
have them enabled.

On Mon, Oct 26, 2015 at 8:56 AM, cheater00 . wrote:
> fwiw, I did this:
>
> sudo btrfs qgroup show /media/X
> ERROR: can't perform the search - No such file or directory
> ERROR: can't list qgroups: No such file or directory
>
> I assume this means no qgroups present, which means no quotas present.
> Please correct me if I'm wrong.
> So yes, the issue must lie elsewhere.
>
> On Mon, Oct 26, 2015 at 2:46 PM, cheater00 . wrote:
>> I don't remember doing that, but just to exclude everything, how do I check?
>>
>> On Mon, Oct 26, 2015 at 2:45 PM, Donald Pearson
>> wrote:
>>> AFAIK quotas aren't a mount option, but if you never enabled them and
>>> created the qgroups by hand that's your answer and the issue must be
>>> something else.
>>>
>>> On Mon, Oct 26, 2015 at 8:36 AM, cheater00 . wrote:
>>>> There are no quotas. I haven't enabled them. I believe the fstab says
>>>> that - could they be enabled in another way? How do I check for sure?
>>>> The man page doesn't say how to check the status:
>>>> https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-quota
>>>>
>>>> On Mon, Oct 26, 2015 at 2:32 PM, Donald Pearson wrote:
>>>>> Accidentally didn't reply to the list the 1st time.
>>>>>
>>>>> I see the same issue when I have quotas enabled. If you have quotas
>>>>> on, see if turning them off helps.
>>>>>
>>>>> On Mon, Oct 26, 2015 at 7:16 AM, cheater00 . wrote:
>>>>>> Hi guys,
>>>>>> I am running into really bad performance. Here's my setup:
>>>>>>
>>>>>> WD Red 6 TB connected over USB2 to my core i7 laptop, running Ubuntu
>>>>>> 32-bit with kernel 4.0.4-040004-generic #201505171336.
>>>>>>
>>>>>> Single btrfs partition covering whole disk.
>>>>>>
>>>>>> Autodefrag is on.
>>>>>>
>>>>>> fstab line:
>>>>>> UUID=... /media/X btrfs rw,nosuid,nodev,autodefrag 0 0
>>>>>>
>>>>>> Sometimes when files are being modified or removed, I see
>>>>>> btrfs-transacti eat 100% cpu; during this time no io operations
>>>>>> succeed, that is, they're all stalled. You can't even ls on that fs.
>>>>>> This happens for several minutes then normal operation resumes. There
>>>>>> doesn't seem to be a rule to what will trigger this, other than
>>>>>> opening a single file and reading usually works quite well. (say,
>>>>>> watching a movie while all other programs are closed). But even moving
>>>>>> files off the disks triggers some sort of bug. Just now I am moving a
>>>>>> few files (just 30gb worth) onto another disk, and the bug triggers.
>>>>>> So btrfs-transacti was eating my cpu for over 5 minutes and according
>>>>>> to mv's output after this was done and cpu usage went back to normal
>>>>>> what I was waiting for was for a tiny png file to be removed. This is
>>>>>> pretty bad.
>>>>>>
>>>>>> I have tried defragmenting directories where files are being accessed
>>>>>> and moved. This hasn't helped.
>>>>>>
>>>>>> This happens whether the FS is near full or not. It currently is near
>>>>>> full but it wasn't before and it still did that. It still has about ~
>>>>>> 100GB free space now.
>>>>>>
>>>>>> The more things are happening the more often this bug gets triggered.
>>>>>> So if I have utorrent running and its temporary downloads directory is
>>>>>> there, its download speed graph will be a few spikes of running at
>>>>>> several MB/sec separated by durations of 0 download speed.
>>>>>>
>>>>>> Nothing seems to show up in dmesg or syslog.
>>>>>>
>>>>>> I have asked in #btrfs but the suggestions ended up not fixing the
>>>>>> issue (autodefrag, defrag dirs).
>>>>>>
>>>>>> Please advise what I should do with this issue.
Re: random i/o error without error in dmesg
Hi,

FWIW, this sounds like what I've been seeing with dovecot. In case it's
relevant, I'll try to explain.

After some uptime, I'll see log messages like this:

Okt 26 12:05:46 thetick dovecot[467]: imap(marcec): Error: pread() failed with file /home/marcec/.mdbox/mailboxes/BTRFS/dbox-Mails/dovecot.index.log: Input/output error

Occasionally they go away by themselves, but usually I have to reboot to
make them go away. This happens when getmail attempts to fetch mail,
which fails due to the above error. After the reboot getmail succeeds
again.

As in Szalma's case, btrfs-scrub never reports anything wrong.

I use LZO compression on the relevant file system, so I wanted to wait
until kernel 4.1.11 before reporting this, but that hasn't hit Gentoo yet
(and neither has 4.1.10, for some reason). I don't use quotas.

According to what I see in the systemd journal, the errors started on
2015-06-01 with kernel 3.19.8. Note that, strangely enough, I had been
using that same version since 2015-05-23, so for more than a week before
the error cropped up. I checked whether I made any changes to the
configuration, and found this:

diff --git a/kernels/kernel-config-3.19.8-gentoo b/kernels/kernel-config-3.19.8-gentoo
index b061b31..8cf8eba 100644
--- a/kernels/kernel-config-3.19.8-gentoo
+++ b/kernels/kernel-config-3.19.8-gentoo
@@ -64,7 +64,7 @@ CONFIG_INIT_ENV_ARG_LIMIT=32
 CONFIG_CROSS_COMPILE=""
 # CONFIG_COMPILE_TEST is not set
 CONFIG_LOCALVERSION=""
-CONFIG_LOCALVERSION_AUTO=y
+# CONFIG_LOCALVERSION_AUTO is not set
 CONFIG_HAVE_KERNEL_GZIP=y
 CONFIG_HAVE_KERNEL_BZIP2=y
 CONFIG_HAVE_KERNEL_LZMA=y
@@ -73,8 +73,8 @@ CONFIG_HAVE_KERNEL_LZO=y
 CONFIG_HAVE_KERNEL_LZ4=y
 # CONFIG_KERNEL_GZIP is not set
 # CONFIG_KERNEL_BZIP2 is not set
-CONFIG_KERNEL_LZMA=y
-# CONFIG_KERNEL_XZ is not set
+# CONFIG_KERNEL_LZMA is not set
+CONFIG_KERNEL_XZ=y
 # CONFIG_KERNEL_LZO is not set
 # CONFIG_KERNEL_LZ4 is not set
 CONFIG_DEFAULT_HOSTNAME="(none)"
@@ -132,7 +132,7 @@ CONFIG_TICK_CPU_ACCOUNTING=y
 # CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
 # CONFIG_IRQ_TIME_ACCOUNTING is not set
 CONFIG_BSD_PROCESS_ACCT=y
-# CONFIG_BSD_PROCESS_ACCT_V3 is not set
+CONFIG_BSD_PROCESS_ACCT_V3=y
 CONFIG_TASKSTATS=y
 CONFIG_TASK_DELAY_ACCT=y
 CONFIG_TASK_XACCT=y

The only change I can think of that might affect anything is
CONFIG_BSD_PROCESS_ACCT_V3=y (I don't remember why exactly I set it). I
can try without it set, but maybe the kernel configuration is a red
herring?

Anyway, the current state of the system is:

# uname -r
4.1.9-gentoo-r1
# btrfs filesystem show /
Label: 'MARCEC_ROOT'  uuid: 0267d8b3-a074-460a-832d-5d5fd36bae64
        Total devices 1 FS bytes used 74.40GiB
        devid    1 size 107.79GiB used 105.97GiB path /dev/sda1

btrfs-progs v4.2.2
# btrfs filesystem df /
Data, single: total=98.94GiB, used=72.30GiB
System, single: total=32.00MiB, used=20.00KiB
Metadata, single: total=7.00GiB, used=2.10GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

The filesystem is mounted as (leaving out subvolume mounts which use the
same mount options):

/dev/sda1 on / type btrfs (rw,noatime,compress=lzo,ssd,discard,space_cache)

Greetings,
--
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup
Re: Bad fs performance, IO freezes
On 10/26/2015 08:16 PM, cheater00 . wrote:
> Hi guys,
> I am running into really bad performance. Here's my setup:
>
> WD Red 6 TB connected over USB2 to my core i7 laptop, running Ubuntu
> 32-bit with kernel 4.0.4-040004-generic #201505171336.
>
> Single btrfs partition covering whole disk.
>
> Autodefrag is on.
>
> fstab line:
> UUID=... /media/X btrfs rw,nosuid,nodev,autodefrag 0 0
>
> Sometimes when files are being modified or removed, I see
> btrfs-transacti eat 100% cpu; during this time no io operations
> succeed, that is, they're all stalled. You can't even ls on that fs.
> This happens for several minutes then normal operation resumes. There
> doesn't seem to be a rule to what will trigger this, other than
> opening a single file and reading usually works quite well. (say,
> watching a movie while all other programs are closed). But even moving
> files off the disks triggers some sort of bug. Just now I am moving a
> few files (just 30gb worth) onto another disk, and the bug triggers.
> So btrfs-transacti was eating my cpu for over 5 minutes and according
> to mv's output after this was done and cpu usage went back to normal
> what I was waiting for was for a tiny png file to be removed. This is
> pretty bad.
>
> I have tried defragmenting directories where files are being accessed
> and moved. This hasn't helped.
>
> This happens whether the FS is near full or not. It currently is near
> full but it wasn't before and it still did that. It still has about ~
> 100GB free space now.
>
> The more things are happening the more often this bug gets triggered.
> So if I have utorrent running and its temporary downloads directory is
> there, its download speed graph will be a few spikes of running at
> several MB/sec separated by durations of 0 download speed.
>
> Nothing seems to show up in dmesg or syslog.
>
> I have asked in #btrfs but the suggestions ended up not fixing the
> issue (autodefrag, defrag dirs).
>
> Please advise what I should do with this issue.

It might be related to the delayed ref rework. The last time I saw this
kind of hang, with btrfs-transaction eating CPU, it was because btrfs
wasn't merging delayed refs. It'd be best to try the latest kernel; if
the issue is not resolved, then we can work out a reproducer and provide
debugging.

Thanks,

Liubo
Re: Bad fs performance, IO freezes
Thanks for the reply. What version did this go into? I'll try getting a
prebuilt backport of the kernel; building from source could slow things
down considerably, but debs will not be available for the latest few
minor versions, I guess. So if you can tell me a min version, I'll try to
find the latest deb newer than that, or I'll build if that's not
available.

On Mon, Oct 26, 2015 at 3:25 PM, Liu Bo wrote:
> On 10/26/2015 08:16 PM, cheater00 . wrote:
>>
>> Hi guys,
>> I am running into really bad performance. Here's my setup:
>>
>> WD Red 6 TB connected over USB2 to my core i7 laptop, running Ubuntu
>> 32-bit with kernel 4.0.4-040004-generic #201505171336.
>>
>> Single btrfs partition covering whole disk.
>>
>> Autodefrag is on.
>>
>> fstab line:
>> UUID=... /media/X btrfs rw,nosuid,nodev,autodefrag 0 0
>>
>> Sometimes when files are being modified or removed, I see
>> btrfs-transacti eat 100% cpu; during this time no io operations
>> succeed, that is, they're all stalled. You can't even ls on that fs.
>> This happens for several minutes then normal operation resumes. There
>> doesn't seem to be a rule to what will trigger this, other than
>> opening a single file and reading usually works quite well. (say,
>> watching a movie while all other programs are closed). But even moving
>> files off the disks triggers some sort of bug. Just now I am moving a
>> few files (just 30gb worth) onto another disk, and the bug triggers.
>> So btrfs-transacti was eating my cpu for over 5 minutes and according
>> to mv's output after this was done and cpu usage went back to normal
>> what I was waiting for was for a tiny png file to be removed. This is
>> pretty bad.
>>
>> I have tried defragmenting directories where files are being accessed
>> and moved. This hasn't helped.
>>
>> This happens whether the FS is near full or not. It currently is near
>> full but it wasn't before and it still did that. It still has about ~
>> 100GB free space now.
>>
>> The more things are happening the more often this bug gets triggered.
>> So if I have utorrent running and its temporary downloads directory is
>> there, its download speed graph will be a few spikes of running at
>> several MB/sec separated by durations of 0 download speed.
>>
>> Nothing seems to show up in dmesg or syslog.
>>
>> I have asked in #btrfs but the suggestions ended up not fixing the
>> issue (autodefrag, defrag dirs).
>>
>> Please advise what I should do with this issue.
>
>
> It might be related to the delayed ref rework. The last time I saw this
> kind of hang, with btrfs-transaction eating CPU, it was because btrfs
> wasn't merging delayed refs. It'd be best to try the latest kernel; if
> the issue is not resolved, then we can work out a reproducer and provide
> debugging.
>
> Thanks,
>
> Liubo
Re: [PATCH v2 0/5] btrfs-progs: Add all missing close_ctree and btrfs_close_all_devices
On Mon, Oct 26, 2015 at 06:28:17PM +0800, Zhao Lei wrote:
> This patch add all missing close_ctree and btrfs_close_all_devices
> to several tools in btrfs progs, to avoid memory leak.
>
> Changelog v1->v2:
> Move btrfs_close_all_devices() from cmd-XXX into btrfs.c to make
> code simple, and avoid similar problem in cmd-XXX in future.
>
> Zhao Lei (5):
>   btrfs-progs: btrfs: Add missing btrfs_close_all_devices for btrfs
>     command
>   btrfs-progs: Remove all btrfs_close_all_devices in sub-command
>   btrfs-progs: Add all missing btrfs_close_all_devices to standalone
>     tools
>   btrfs-progs: Add missing close_ctree to btrfs-select-super.c
>   btrfs-progs: use system's default path for math.h

Applied, thanks.
Re: Bad fs performance, IO freezes
I don't remember doing that, but just to exclude everything, how do I check?

On Mon, Oct 26, 2015 at 2:45 PM, Donald Pearson wrote:
> AFAIK quotas aren't a mount option, but if you never enabled them and
> created the qgroups by hand that's your answer and the issue must be
> something else.
>
> On Mon, Oct 26, 2015 at 8:36 AM, cheater00 . wrote:
>> There are no quotas. I haven't enabled them. I believe the fstab says
>> that - could they be enabled in another way? How do I check for sure?
>> The man page doesn't say how to check the status:
>> https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-quota
>>
>> On Mon, Oct 26, 2015 at 2:32 PM, Donald Pearson
>> wrote:
>>> Accidentally didn't reply to the list the 1st time.
>>>
>>> I see the same issue when I have quotas enabled. If you have quotas
>>> on, see if turning them off helps.
>>>
>>> On Mon, Oct 26, 2015 at 7:16 AM, cheater00 . wrote:
>>>> Hi guys,
>>>> I am running into really bad performance. Here's my setup:
>>>>
>>>> WD Red 6 TB connected over USB2 to my core i7 laptop, running Ubuntu
>>>> 32-bit with kernel 4.0.4-040004-generic #201505171336.
>>>>
>>>> Single btrfs partition covering whole disk.
>>>>
>>>> Autodefrag is on.
>>>>
>>>> fstab line:
>>>> UUID=... /media/X btrfs rw,nosuid,nodev,autodefrag 0 0
>>>>
>>>> Sometimes when files are being modified or removed, I see
>>>> btrfs-transacti eat 100% cpu; during this time no io operations
>>>> succeed, that is, they're all stalled. You can't even ls on that fs.
>>>> This happens for several minutes then normal operation resumes. There
>>>> doesn't seem to be a rule to what will trigger this, other than
>>>> opening a single file and reading usually works quite well. (say,
>>>> watching a movie while all other programs are closed). But even moving
>>>> files off the disks triggers some sort of bug. Just now I am moving a
>>>> few files (just 30gb worth) onto another disk, and the bug triggers.
>>>> So btrfs-transacti was eating my cpu for over 5 minutes and according
>>>> to mv's output after this was done and cpu usage went back to normal
>>>> what I was waiting for was for a tiny png file to be removed. This is
>>>> pretty bad.
>>>>
>>>> I have tried defragmenting directories where files are being accessed
>>>> and moved. This hasn't helped.
>>>>
>>>> This happens whether the FS is near full or not. It currently is near
>>>> full but it wasn't before and it still did that. It still has about ~
>>>> 100GB free space now.
>>>>
>>>> The more things are happening the more often this bug gets triggered.
>>>> So if I have utorrent running and its temporary downloads directory is
>>>> there, its download speed graph will be a few spikes of running at
>>>> several MB/sec separated by durations of 0 download speed.
>>>>
>>>> Nothing seems to show up in dmesg or syslog.
>>>>
>>>> I have asked in #btrfs but the suggestions ended up not fixing the
>>>> issue (autodefrag, defrag dirs).
>>>>
>>>> Please advise what I should do with this issue.
Re: Bad fs performance, IO freezes
Accidentally didn't reply to the list the 1st time.

I see the same issue when I have quotas enabled. If you have quotas on,
see if turning them off helps.

On Mon, Oct 26, 2015 at 7:16 AM, cheater00 . wrote:
> Hi guys,
> I am running into really bad performance. Here's my setup:
>
> WD Red 6 TB connected over USB2 to my core i7 laptop, running Ubuntu
> 32-bit with kernel 4.0.4-040004-generic #201505171336.
>
> Single btrfs partition covering whole disk.
>
> Autodefrag is on.
>
> fstab line:
> UUID=... /media/X btrfs rw,nosuid,nodev,autodefrag 0 0
>
> Sometimes when files are being modified or removed, I see
> btrfs-transacti eat 100% cpu; during this time no io operations
> succeed, that is, they're all stalled. You can't even ls on that fs.
> This happens for several minutes then normal operation resumes. There
> doesn't seem to be a rule to what will trigger this, other than
> opening a single file and reading usually works quite well. (say,
> watching a movie while all other programs are closed). But even moving
> files off the disks triggers some sort of bug. Just now I am moving a
> few files (just 30gb worth) onto another disk, and the bug triggers.
> So btrfs-transacti was eating my cpu for over 5 minutes and according
> to mv's output after this was done and cpu usage went back to normal
> what I was waiting for was for a tiny png file to be removed. This is
> pretty bad.
>
> I have tried defragmenting directories where files are being accessed
> and moved. This hasn't helped.
>
> This happens whether the FS is near full or not. It currently is near
> full but it wasn't before and it still did that. It still has about ~
> 100GB free space now.
>
> The more things are happening the more often this bug gets triggered.
> So if I have utorrent running and its temporary downloads directory is
> there, its download speed graph will be a few spikes of running at
> several MB/sec separated by durations of 0 download speed.
>
> Nothing seems to show up in dmesg or syslog.
>
> I have asked in #btrfs but the suggestions ended up not fixing the
> issue (autodefrag, defrag dirs).
>
> Please advise what I should do with this issue.
Re: Bad fs performance, IO freezes
fwiw, I did this:

sudo btrfs qgroup show /media/X
ERROR: can't perform the search - No such file or directory
ERROR: can't list qgroups: No such file or directory

I assume this means no qgroups present, which means no quotas present.
Please correct me if I'm wrong. So yes, the issue must lie elsewhere.

On Mon, Oct 26, 2015 at 2:46 PM, cheater00 . wrote:
> I don't remember doing that, but just to exclude everything, how do I check?
>
> On Mon, Oct 26, 2015 at 2:45 PM, Donald Pearson wrote:
>> AFAIK quotas aren't a mount option, but if you never enabled them and
>> created the qgroups by hand that's your answer and the issue must be
>> something else.
>>
>> On Mon, Oct 26, 2015 at 8:36 AM, cheater00 . wrote:
>>> There are no quotas. I haven't enabled them. I believe the fstab says
>>> that - could they be enabled in another way? How do I check for sure?
>>> The man page doesn't say how to check the status:
>>> https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-quota
>>>
>>> On Mon, Oct 26, 2015 at 2:32 PM, Donald Pearson wrote:
>>>> Accidentally didn't reply to the list the 1st time.
>>>>
>>>> I see the same issue when I have quotas enabled. If you have quotas
>>>> on, see if turning them off helps.
>>>>
>>>> On Mon, Oct 26, 2015 at 7:16 AM, cheater00 . wrote:
>>>>> Hi guys,
>>>>> I am running into really bad performance. Here's my setup:
>>>>>
>>>>> WD Red 6 TB connected over USB2 to my core i7 laptop, running Ubuntu
>>>>> 32-bit with kernel 4.0.4-040004-generic #201505171336.
>>>>>
>>>>> Single btrfs partition covering whole disk.
>>>>>
>>>>> Autodefrag is on.
>>>>>
>>>>> fstab line:
>>>>> UUID=... /media/X btrfs rw,nosuid,nodev,autodefrag 0 0
>>>>>
>>>>> Sometimes when files are being modified or removed, I see
>>>>> btrfs-transacti eat 100% cpu; during this time no io operations
>>>>> succeed, that is, they're all stalled. You can't even ls on that fs.
>>>>> This happens for several minutes then normal operation resumes. There
>>>>> doesn't seem to be a rule to what will trigger this, other than
>>>>> opening a single file and reading usually works quite well. (say,
>>>>> watching a movie while all other programs are closed). But even moving
>>>>> files off the disks triggers some sort of bug. Just now I am moving a
>>>>> few files (just 30gb worth) onto another disk, and the bug triggers.
>>>>> So btrfs-transacti was eating my cpu for over 5 minutes and according
>>>>> to mv's output after this was done and cpu usage went back to normal
>>>>> what I was waiting for was for a tiny png file to be removed. This is
>>>>> pretty bad.
>>>>>
>>>>> I have tried defragmenting directories where files are being accessed
>>>>> and moved. This hasn't helped.
>>>>>
>>>>> This happens whether the FS is near full or not. It currently is near
>>>>> full but it wasn't before and it still did that. It still has about ~
>>>>> 100GB free space now.
>>>>>
>>>>> The more things are happening the more often this bug gets triggered.
>>>>> So if I have utorrent running and its temporary downloads directory is
>>>>> there, its download speed graph will be a few spikes of running at
>>>>> several MB/sec separated by durations of 0 download speed.
>>>>>
>>>>> Nothing seems to show up in dmesg or syslog.
>>>>>
>>>>> I have asked in #btrfs but the suggestions ended up not fixing the
>>>>> issue (autodefrag, defrag dirs).
>>>>>
>>>>> Please advise what I should do with this issue.
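For anyone wanting to script the quota check discussed in this message, here is a minimal sketch. It assumes, as the ERROR output quoted in the message suggests, that `btrfs qgroup show <path>` exits non-zero when no quota tree exists; that has not been verified against every btrfs-progs version, and a non-zero exit can also mean a wrong path or missing privileges (the command normally needs root). The helper name `quotas_look_enabled` and its injectable `run` parameter are hypothetical, added only so the logic can be exercised without a btrfs volume.

```python
import subprocess


def quotas_look_enabled(mountpoint, run=subprocess.run):
    """Guess whether quotas are enabled on a mounted btrfs filesystem.

    Assumption: `btrfs qgroup show` returns exit status 0 only when a
    quota tree exists. A False result may also mean the path is wrong or
    the caller lacks privileges, so treat it as a hint, not proof.
    """
    result = run(
        ["btrfs", "qgroup", "show", mountpoint],
        capture_output=True,  # keep the ERROR lines out of the terminal
        text=True,
    )
    return result.returncode == 0
```

Called as `quotas_look_enabled("/media/X")` from a root shell, a False result would correspond to the ERROR lines shown in the message.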
Re: [PATCH v7 5/4] copy_file_range.2: New page documenting copy_file_range()
On 26/10/15 03:39, Christoph Hellwig wrote:
> On Sat, Oct 24, 2015 at 01:02:21PM +0100, Pádraig Brady wrote:
>> I'm a bit worried about the sparse expansion and default reflinking
>> which might preclude cp(1) from using this call in most cases, but I will
>> test and try to use it. coreutils has heuristics for determining if files
>> are remote, which we might use to restrict to that use case.
>
> Can you explain why reflinking and hole expansion are an issue if done
> locally and not if done remotely? I'd really like to make the call as
> usable as possible for everyone, but we really need clear semantics for
> that.

Fair point on local vs remote. I was just assuming that remote copy
offload would not do reflinking on the backend, or at least wasn't
an exposed option over the remote interface.

I get the impression that you think reflinking should be hidden from
the user, i.e. cp(1) should not have had the --reflink option (for the
last 6 years)? I'm not convinced of that, and even so I think lower level
interfaces would benefit from finer grained options. This would be
especially useful since there is no general interface to reflink at
present. I was happy with the reflink control options, thinking the
extra control could allow cp to use this by default.

> Also note that Annas current series allows for hole filling - any decent
> implementation should not do them, but that's really a quality of
> implementation and not an interface issue.

I think you're saying the default `cp --sparse=auto` operation could rely
on copy_file_range(...complete file...), while cp --sparse={always,never}
would have to iterate over the file, punching or filling holes as
appropriate. I thought Anna indicated differently wrt splice filling
holes by default. TBH I'm not clear on the semantics of the current
implementation, so need to test the above in various cases.

thanks,
Pádraig.
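To illustrate what "iterate over the file, punching or filling holes" could look like in user space, here is a rough sketch in the spirit of `cp --sparse=always`. It is not the coreutils implementation and does not call copy_file_range(); the function name `sparse_copy` and the 4096-byte block size are assumptions chosen for the example. All-zero blocks are skipped with a seek so the filesystem may leave holes, and everything else is written normally.

```python
import os


def sparse_copy(src_path, dst_path, block_size=4096):
    """Copy src to dst, seeking over all-zero blocks instead of writing them.

    Returns the size of the destination file in bytes.
    """
    zero_block = bytes(block_size)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(block_size)
            if not chunk:
                break
            if chunk == zero_block[:len(chunk)]:
                # Advance the write position without writing; on filesystems
                # that support sparse files this range can stay unallocated.
                dst.seek(len(chunk), os.SEEK_CUR)
            else:
                dst.write(chunk)
        # If the source ended in a skipped block, extend the file so its
        # length matches the source.
        dst.truncate(dst.tell())
    return os.path.getsize(dst_path)
```

Whether the skipped ranges actually become holes depends on the destination filesystem; the copied contents read back identically either way. The `--sparse=never` direction would simply write every block, zeros included.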
Re: Bad fs performance, IO freezes
There are no quotas. I haven't enabled them. I believe the fstab says
that - could they be enabled in another way? How do I check for sure?
The man page doesn't say how to check the status:
https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-quota

On Mon, Oct 26, 2015 at 2:32 PM, Donald Pearson wrote:
> Accidentally didn't reply to the list the 1st time.
>
> I see the same issue when I have quotas enabled. If you have quotas
> on, see if turning them off helps.
>
> On Mon, Oct 26, 2015 at 7:16 AM, cheater00 . wrote:
>> Hi guys,
>> I am running into really bad performance. Here's my setup:
>>
>> WD Red 6 TB connected over USB2 to my core i7 laptop, running Ubuntu
>> 32-bit with kernel 4.0.4-040004-generic #201505171336.
>>
>> Single btrfs partition covering whole disk.
>>
>> Autodefrag is on.
>>
>> fstab line:
>> UUID=... /media/X btrfs rw,nosuid,nodev,autodefrag 0 0
>>
>> Sometimes when files are being modified or removed, I see
>> btrfs-transacti eat 100% cpu; during this time no io operations
>> succeed, that is, they're all stalled. You can't even ls on that fs.
>> This happens for several minutes then normal operation resumes. There
>> doesn't seem to be a rule to what will trigger this, other than
>> opening a single file and reading usually works quite well. (say,
>> watching a movie while all other programs are closed). But even moving
>> files off the disks triggers some sort of bug. Just now I am moving a
>> few files (just 30gb worth) onto another disk, and the bug triggers.
>> So btrfs-transacti was eating my cpu for over 5 minutes and according
>> to mv's output after this was done and cpu usage went back to normal
>> what I was waiting for was for a tiny png file to be removed. This is
>> pretty bad.
>>
>> I have tried defragmenting directories where files are being accessed
>> and moved. This hasn't helped.
>>
>> This happens whether the FS is near full or not. It currently is near
>> full but it wasn't before and it still did that. It still has about ~
>> 100GB free space now.
>>
>> The more things are happening the more often this bug gets triggered.
>> So if I have utorrent running and its temporary downloads directory is
>> there, its download speed graph will be a few spikes of running at
>> several MB/sec separated by durations of 0 download speed.
>>
>> Nothing seems to show up in dmesg or syslog.
>>
>> I have asked in #btrfs but the suggestions ended up not fixing the
>> issue (autodefrag, defrag dirs).
>>
>> Please advise what I should do with this issue.
Re: Bad fs performance, IO freezes
AFAIK quotas aren't a mount option, but if you never enabled them and
created the qgroups by hand that's your answer and the issue must be
something else.

On Mon, Oct 26, 2015 at 8:36 AM, cheater00 . wrote:
> There are no quotas. I haven't enabled them. I believe the fstab says
> that - could they be enabled in another way? How do I check for sure?
> The man page doesn't say how to check the status:
> https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-quota
>
> On Mon, Oct 26, 2015 at 2:32 PM, Donald Pearson wrote:
>> Accidentally didn't reply to the list the 1st time.
>>
>> I see the same issue when I have quotas enabled. If you have quotas
>> on, see if turning them off helps.
>>
>> On Mon, Oct 26, 2015 at 7:16 AM, cheater00 . wrote:
>>> Hi guys,
>>> I am running into really bad performance. Here's my setup:
>>>
>>> WD Red 6 TB connected over USB2 to my core i7 laptop, running Ubuntu
>>> 32-bit with kernel 4.0.4-040004-generic #201505171336.
>>>
>>> Single btrfs partition covering whole disk.
>>>
>>> Autodefrag is on.
>>>
>>> fstab line:
>>> UUID=... /media/X btrfs rw,nosuid,nodev,autodefrag 0 0
>>>
>>> Sometimes when files are being modified or removed, I see
>>> btrfs-transacti eat 100% cpu; during this time no io operations
>>> succeed, that is, they're all stalled. You can't even ls on that fs.
>>> This happens for several minutes then normal operation resumes. There
>>> doesn't seem to be a rule to what will trigger this, other than
>>> opening a single file and reading usually works quite well. (say,
>>> watching a movie while all other programs are closed). But even moving
>>> files off the disks triggers some sort of bug. Just now I am moving a
>>> few files (just 30gb worth) onto another disk, and the bug triggers.
>>> So btrfs-transacti was eating my cpu for over 5 minutes and according
>>> to mv's output after this was done and cpu usage went back to normal
>>> what I was waiting for was for a tiny png file to be removed. This is
>>> pretty bad.
>>>
>>> I have tried defragmenting directories where files are being accessed
>>> and moved. This hasn't helped.
>>>
>>> This happens whether the FS is near full or not. It currently is near
>>> full but it wasn't before and it still did that. It still has about ~
>>> 100GB free space now.
>>>
>>> The more things are happening the more often this bug gets triggered.
>>> So if I have utorrent running and its temporary downloads directory is
>>> there, its download speed graph will be a few spikes of running at
>>> several MB/sec separated by durations of 0 download speed.
>>>
>>> Nothing seems to show up in dmesg or syslog.
>>>
>>> I have asked in #btrfs but the suggestions ended up not fixing the
>>> issue (autodefrag, defrag dirs).
>>>
>>> Please advise what I should do with this issue.
Re: Bad fs performance, IO freezes
I have located 4.3.0-rc7 binaries which I will now try.

On Mon, Oct 26, 2015 at 3:38 PM, cheater00 . wrote:
> Thanks for the reply. What version did this go into? I'll try getting
> a prebuilt backport of the kernel, building source could slow things
> down considerably, but debs will not be available for the latest few
> minor versions I guess. So if you can tell me a min version, I'll try
> to find the latest deb newer than that, or I'll build if that's not
> available.
>
> On Mon, Oct 26, 2015 at 3:25 PM, Liu Bo wrote:
>> On 10/26/2015 08:16 PM, cheater00 . wrote:
>>>
>>> Hi guys,
>>> I am running into really bad performance. Here's my setup:
>>>
>>> WD Red 6 TB connected over USB2 to my core i7 laptop, running Ubuntu
>>> 32-bit with kernel 4.0.4-040004-generic #201505171336.
>>>
>>> Single btrfs partition covering whole disk.
>>>
>>> Autodefrag is on.
>>>
>>> fstab line:
>>> UUID=... /media/X btrfs rw,nosuid,nodev,autodefrag 0 0
>>>
>>> Sometimes when files are being modified or removed, I see
>>> btrfs-transacti eat 100% cpu; during this time no io operations
>>> succeed, that is, they're all stalled. You can't even ls on that fs.
>>> This happens for several minutes then normal operation resumes. There
>>> doesn't seem to be a rule to what will trigger this, other than
>>> opening a single file and reading usually works quite well. (say,
>>> watching a movie while all other programs are closed). But even moving
>>> files off the disks triggers some sort of bug. Just now I am moving a
>>> few files (just 30gb worth) onto another disk, and the bug triggers.
>>> So btrfs-transacti was eating my cpu for over 5 minutes and according
>>> to mv's output after this was done and cpu usage went back to normal
>>> what I was waiting for was for a tiny png file to be removed. This is
>>> pretty bad.
>>>
>>> I have tried defragmenting directories where files are being accessed
>>> and moved. This hasn't helped.
>>>
>>> This happens whether the FS is near full or not. It currently is near
>>> full but it wasn't before and it still did that. It still has about ~
>>> 100GB free space now.
>>>
>>> The more things are happening the more often this bug gets triggered.
>>> So if I have utorrent running and its temporary downloads directory is
>>> there, its download speed graph will be a few spikes of running at
>>> several MB/sec separated by durations of 0 download speed.
>>>
>>> Nothing seems to show up in dmesg or syslog.
>>>
>>> I have asked in #btrfs but the suggestions ended up not fixing the
>>> issue (autodefrag, defrag dirs).
>>>
>>> Please advise what I should do with this issue.
>>
>>
>> It might be related to delayed ref rework, the last time I saw this kind of
>> hanging problem about btrfs-transaction eating cpu is that because btrfs
>> doesn't merge delayed refs, it'd be best to try the lastest kernel and if
>> the issue is not resolved, then we can work out a reproducer and provide
>> debugging.
>>
>> Thanks,
>>
>> Liubo
Re: Bad fs performance, IO freezes
So far I cannot reproduce. If I don't post again this means the issue
has been fixed by updating the kernel.

On Mon, Oct 26, 2015 at 4:40 PM, cheater00 . wrote:
> I have located 4.3.0-rc7 binaries which I will now try.
>
> On Mon, Oct 26, 2015 at 3:38 PM, cheater00 . wrote:
>> Thanks for the reply. What version did this go into? I'll try getting
>> a prebuilt backport of the kernel, building source could slow things
>> down considerably, but debs will not be available for the latest few
>> minor versions I guess. So if you can tell me a min version, I'll try
>> to find the latest deb newer than that, or I'll build if that's not
>> available.
>>
>> On Mon, Oct 26, 2015 at 3:25 PM, Liu Bo wrote:
>>> On 10/26/2015 08:16 PM, cheater00 . wrote:
>>>>
>>>> Hi guys,
>>>> I am running into really bad performance. Here's my setup:
>>>>
>>>> WD Red 6 TB connected over USB2 to my core i7 laptop, running Ubuntu
>>>> 32-bit with kernel 4.0.4-040004-generic #201505171336.
>>>>
>>>> Single btrfs partition covering whole disk.
>>>>
>>>> Autodefrag is on.
>>>>
>>>> fstab line:
>>>> UUID=... /media/X btrfs rw,nosuid,nodev,autodefrag 0 0
>>>>
>>>> Sometimes when files are being modified or removed, I see
>>>> btrfs-transacti eat 100% cpu; during this time no io operations
>>>> succeed, that is, they're all stalled. You can't even ls on that fs.
>>>> This happens for several minutes then normal operation resumes. There
>>>> doesn't seem to be a rule to what will trigger this, other than
>>>> opening a single file and reading usually works quite well. (say,
>>>> watching a movie while all other programs are closed). But even moving
>>>> files off the disks triggers some sort of bug. Just now I am moving a
>>>> few files (just 30gb worth) onto another disk, and the bug triggers.
>>>> So btrfs-transacti was eating my cpu for over 5 minutes and according
>>>> to mv's output after this was done and cpu usage went back to normal
>>>> what I was waiting for was for a tiny png file to be removed. This is
>>>> pretty bad.
>>>>
>>>> I have tried defragmenting directories where files are being accessed
>>>> and moved. This hasn't helped.
>>>>
>>>> This happens whether the FS is near full or not. It currently is near
>>>> full but it wasn't before and it still did that. It still has about ~
>>>> 100GB free space now.
>>>>
>>>> The more things are happening the more often this bug gets triggered.
>>>> So if I have utorrent running and its temporary downloads directory is
>>>> there, its download speed graph will be a few spikes of running at
>>>> several MB/sec separated by durations of 0 download speed.
>>>>
>>>> Nothing seems to show up in dmesg or syslog.
>>>>
>>>> I have asked in #btrfs but the suggestions ended up not fixing the
>>>> issue (autodefrag, defrag dirs).
>>>>
>>>> Please advise what I should do with this issue.
>>>
>>>
>>> It might be related to delayed ref rework, the last time I saw this kind of
>>> hanging problem about btrfs-transaction eating cpu is that because btrfs
>>> doesn't merge delayed refs, it'd be best to try the lastest kernel and if
>>> the issue is not resolved, then we can work out a reproducer and provide
>>> debugging.
>>>
>>> Thanks,
>>>
>>> Liubo
Re: Bad fs performance, IO freezes
I do not experience btrfs-transacti going up to 100% for minutes at a
time now (not reproduced yet) but I have it spiking up to say 30% for
a short while and everything jags during that time. So, say, if I am
watching youtube, the sound cuts out and the video drops out for a
bit. And if I'm typing, then what I typed during that time gets lost,
as if I had never typed it.

I have also connected the same HDD bay with a USB3 cable instead of
USB2. It's on a USB3 port. So it's running via USB3 now.

On Mon, Oct 26, 2015 at 6:43 PM, cheater00 . wrote:
> So far I cannot reproduce. If I don't post again this means the issue
> has been fixed by updating the kernel.
>
> On Mon, Oct 26, 2015 at 4:40 PM, cheater00 . wrote:
>> I have located 4.3.0-rc7 binaries which I will now try.
>>
>> On Mon, Oct 26, 2015 at 3:38 PM, cheater00 . wrote:
>>> Thanks for the reply. What version did this go into? I'll try getting
>>> a prebuilt backport of the kernel, building source could slow things
>>> down considerably, but debs will not be available for the latest few
>>> minor versions I guess. So if you can tell me a min version, I'll try
>>> to find the latest deb newer than that, or I'll build if that's not
>>> available.
>>>
>>> On Mon, Oct 26, 2015 at 3:25 PM, Liu Bo wrote:
>>>> On 10/26/2015 08:16 PM, cheater00 . wrote:
>>>>>
>>>>> Hi guys,
>>>>> I am running into really bad performance. Here's my setup:
>>>>>
>>>>> WD Red 6 TB connected over USB2 to my core i7 laptop, running Ubuntu
>>>>> 32-bit with kernel 4.0.4-040004-generic #201505171336.
>>>>>
>>>>> Single btrfs partition covering whole disk.
>>>>>
>>>>> Autodefrag is on.
>>>>>
>>>>> fstab line:
>>>>> UUID=... /media/X btrfs rw,nosuid,nodev,autodefrag 0 0
>>>>>
>>>>> Sometimes when files are being modified or removed, I see
>>>>> btrfs-transacti eat 100% cpu; during this time no io operations
>>>>> succeed, that is, they're all stalled. You can't even ls on that fs.
>>>>> This happens for several minutes then normal operation resumes. There
>>>>> doesn't seem to be a rule to what will trigger this, other than
>>>>> opening a single file and reading usually works quite well. (say,
>>>>> watching a movie while all other programs are closed). But even moving
>>>>> files off the disks triggers some sort of bug. Just now I am moving a
>>>>> few files (just 30gb worth) onto another disk, and the bug triggers.
>>>>> So btrfs-transacti was eating my cpu for over 5 minutes and according
>>>>> to mv's output after this was done and cpu usage went back to normal
>>>>> what I was waiting for was for a tiny png file to be removed. This is
>>>>> pretty bad.
>>>>>
>>>>> I have tried defragmenting directories where files are being accessed
>>>>> and moved. This hasn't helped.
>>>>>
>>>>> This happens whether the FS is near full or not. It currently is near
>>>>> full but it wasn't before and it still did that. It still has about ~
>>>>> 100GB free space now.
>>>>>
>>>>> The more things are happening the more often this bug gets triggered.
>>>>> So if I have utorrent running and its temporary downloads directory is
>>>>> there, its download speed graph will be a few spikes of running at
>>>>> several MB/sec separated by durations of 0 download speed.
>>>>>
>>>>> Nothing seems to show up in dmesg or syslog.
>>>>>
>>>>> I have asked in #btrfs but the suggestions ended up not fixing the
>>>>> issue (autodefrag, defrag dirs).
>>>>>
>>>>> Please advise what I should do with this issue.
>>>>
>>>>
>>>> It might be related to delayed ref rework, the last time I saw this kind of
>>>> hanging problem about btrfs-transaction eating cpu is that because btrfs
>>>> doesn't merge delayed refs, it'd be best to try the lastest kernel and if
>>>> the issue is not resolved, then we can work out a reproducer and provide
>>>> debugging.
>>>>
>>>> Thanks,
>>>>
>>>> Liubo