Re: btrfs double send
Filipe, Its still not perfect. Here I can do sequential sends a few times then I get something like this: [root@grover snap]# sh -x brh + base=/snap/shot ++ date +%Y-%V-%u_%m-%d_%H:%M + stamp=2015-43-6_10-24_10:24 + btrfs subv snapshot -r /snap/homevol /snap/shot.2015-43-6_10-24_10:24 Create a readonly snapshot of '/snap/homevol' in '/snap/shot.2015-43-6_10-24_10:24' + sync + /usr/bin/time -f 'send %E' nice -19 ionice -c3 btrfs send -v -p /snap/shot /snap/shot.2015-43-6_10-24_10:24 + btrfs receive -v /backup/snap At subvol /snap/shot.2015-43-6_10-24_10:24 At snapshot shot.2015-43-6_10-24_10:24 receiving snapshot shot.2015-43-6_10-24_10:24 uuid=cb3ad856-ca0f-f744-b7c3-b33f2d5bc8d3, ctransid=625267 parent_uuid=cb3ad856-ca0f-f744-b7c3-b33f2d5bc8d3, parent_ctransid=619893 ERROR: unlink home/ed/Maildir/.spam/dovecot.index.log.2 failed. No such file or directory Command terminated by signal 13 send 1:49.64 + sync + btrfs subv delete /snap/shot Delete subvolume (no-commit): '/snap/shot' + sync + mv /snap/shot.2015-43-6_10-24_10:24 /snap/shot so there is another bug hiding in the code, its usually something in my Maildir that shows in the log. Please let me know if there is anything you would like me to try. I am running 4.2 with the 4.3 for-linus tree applied and the 4.2.x patches with btrfs fixes removed. On top of this are a few patches from this list. TIA Ed Tomlinson On Saturday, October 24, 2015 1:52:21 PM EDT, Filipe Manana wrote: On Sat, Oct 24, 2015 at 6:36 PM,wrote: Hello, I would like to do backups based on btrfs send/receive. So I though I would do a transfer over portable HDD and then incremental sends (using -p) over network. Initial : ... It's a bug, we got it recently reported and fixed. The fix is in Chris' integration branch for the upcoming 4.4 merge window: http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?h=integration-4.4=b96b1db039ebc584d03a9933b279e0d3e704c528 cheers Best regards, Lubos -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2 v2] Btrfs: fix regression when running delayed references
From: Filipe MananaIn the kernel 4.2 merge window we had a refactoring/rework of the delayed references implementation in order to fix certain problems with qgroups. However that rework introduced one more regression that leads to the following trace when running delayed references for metadata: [35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832! [35908.065201] invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC [35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc psmouse i2 [35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: GW 4.3.0-rc5-btrfs-next-17+ #1 [35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014 [35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs] [35908.065201] task: 880114b7d780 ti: 88010c4c8000 task.ti: 88010c4c8000 [35908.065201] RIP: 0010:[] [] insert_inline_extent_backref+0x52/0xb1 [btrfs] [35908.065201] RSP: 0018:88010c4cbb08 EFLAGS: 00010293 [35908.065201] RAX: RBX: 88008a661000 RCX: [35908.065201] RDX: a04dd58f RSI: 0001 RDI: [35908.065201] RBP: 88010c4cbb40 R08: 1000 R09: 88010c4cb9f8 [35908.065201] R10: R11: 002c R12: [35908.065201] R13: 88020a74c578 R14: R15: [35908.065201] FS: () GS:88023edc() knlGS: [35908.065201] CS: 0010 DS: ES: CR0: 8005003b [35908.065201] CR2: 015e8708 CR3: 000102185000 CR4: 06e0 [35908.065201] Stack: [35908.065201] 88010c4cbb18 0f37 88020a74c578 88015a408000 [35908.065201] 880154a44000 0005 88010c4cbbd8 [35908.065201] a0492b9a 0005 [35908.065201] Call Trace: [35908.065201] [] __btrfs_inc_extent_ref+0x8b/0x208 [btrfs] [35908.065201] [] ? __btrfs_run_delayed_refs+0x4d4/0xd33 [btrfs] [35908.065201] [] __btrfs_run_delayed_refs+0xafa/0xd33 [btrfs] [35908.065201] [] ? join_transaction.isra.10+0x25/0x41f [btrfs] [35908.065201] [] ? join_transaction.isra.10+0xa8/0x41f [btrfs] [35908.065201] [] btrfs_run_delayed_refs+0x75/0x1dd [btrfs] [35908.065201] [] delayed_ref_async_start+0x3c/0x7b [btrfs] [35908.065201] [] normal_work_helper+0x14c/0x32a [btrfs] [35908.065201] [] btrfs_extent_refs_helper+0x12/0x14 [btrfs] [35908.065201] [] process_one_work+0x24a/0x4ac [35908.065201] [] worker_thread+0x206/0x2c2 [35908.065201] [] ? rescuer_thread+0x2cb/0x2cb [35908.065201] [] ? rescuer_thread+0x2cb/0x2cb [35908.065201] [] kthread+0xef/0xf7 [35908.065201] [] ? kthread_parkme+0x24/0x24 [35908.065201] [] ret_from_fork+0x3f/0x70 [35908.065201] [] ? kthread_parkme+0x24/0x24 [35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8 48 8d 4d d0 e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77 02 <0f> 0b 4c 8b 45 30 8b 4d 28 45 31 [35908.065201] RIP [] insert_inline_extent_backref+0x52/0xb1 [btrfs] [35908.065201] RSP [35908.310885] ---[ end trace fe4299baf0666457 ]--- This happens because the new delayed references code no longer merges delayed references that have different sequence values. The following steps are an example sequence leading to this issue: 1) Transaction N starts, fs_info->tree_mod_seq has value 0; 2) Extent buffer (btree node) A is allocated, delayed reference Ref1 for bytenr A is created, with a value of 1 and a seq value of 0; 3) fs_info->tree_mod_seq is incremented to 1; 4) Extent buffer A is deleted through btrfs_del_items(), which calls btrfs_del_leaf(), which in turn calls btrfs_free_tree_block(). The later returns the metadata extent associated to extent buffer A to the free space cache (the range is not pinned), because the extent buffer was created in the current transaction (N) and writeback never happened for the extent buffer (flag BTRFS_HEADER_FLAG_WRITTEN not set in the extent buffer). This creates the delayed reference Ref2 for bytenr A, with a value of -1 and a seq value of 1; 5) Delayed reference Ref2 is not merged with Ref1 when we create it, because they have different sequence numbers (decided at add_delayed_ref_tail_merge()); 6) fs_info->tree_mod_seq is incremented to 2; 7) Some task attempts to allocate a new extent buffer (done at extent-tree.c:find_free_extent()), but due to heavy fragmentation and running low on metadata space the clustered allocation fails and we fall back to unclustered allocation, which finds the extent at offset A, so a new extent buffer at offset A is allocated. This creates delayed reference Ref3 for bytenr A,
[PATCH 2/2 v2] Btrfs: fix regression running delayed references when using qgroups
From: Filipe MananaIn the kernel 4.2 merge window we had a big changes to the implementation of delayed references and qgroups which made the no_quota field of delayed references not used anymore. More specifically the no_quota field is not used anymore as of: commit 0ed4792af0e8 ("btrfs: qgroup: Switch to new extent-oriented qgroup mechanism.") Leaving the no_quota field actually prevents delayed references from getting merged, which in turn cause the following BUG_ON(), at fs/btrfs/extent-tree.c, to be hit when qgroups are enabled: static int run_delayed_tree_ref(...) { (...) BUG_ON(node->ref_mod != 1); (...) } This happens on a scenario like the following: 1) Ref1 bytenr X, action = BTRFS_ADD_DELAYED_REF, no_quota = 1, added. 2) Ref2 bytenr X, action = BTRFS_DROP_DELAYED_REF, no_quota = 0, added. It's not merged with Ref1 because Ref1->no_quota != Ref2->no_quota. 3) Ref3 bytenr X, action = BTRFS_ADD_DELAYED_REF, no_quota = 1, added. It's not merged with the reference at the tail of the list of refs for bytenr X because the reference at the tail, Ref2 is incompatible due to Ref2->no_quota != Ref3->no_quota. 4) Ref4 bytenr X, action = BTRFS_DROP_DELAYED_REF, no_quota = 0, added. It's not merged with the reference at the tail of the list of refs for bytenr X because the reference at the tail, Ref3 is incompatible due to Ref3->no_quota != Ref4->no_quota. 5) We run delayed references, trigger merging of delayed references, through __btrfs_run_delayed_refs() -> btrfs_merge_delayed_refs(). 6) Ref1 and Ref3 are merged as Ref1->no_quota = Ref3->no_quota and all other conditions are satisfied too. So Ref1 gets a ref_mod value of 2. 7) Ref2 and Ref4 are merged as Ref2->no_quota = Ref4->no_quota and all other conditions are satisfied too. So Ref2 gets a ref_mod value of 2. 8) Ref1 and Ref2 aren't merged, because they have different values for their no_quota field. 9) Delayed reference Ref1 is picked for running (select_delayed_ref() always prefers references with an action == BTRFS_ADD_DELAYED_REF). So run_delayed_tree_ref() is called for Ref1 which triggers the BUG_ON because Ref1->red_mod != 1 (equals 2). So fix this by removing the no_quota field, as it's not used anymore as of commit 0ed4792af0e8 ("btrfs: qgroup: Switch to new extent-oriented qgroup mechanism."). This fixes the remainder of problems several people have been having when running delayed references, mostly while a balance is running in parallel, on a 4.2+ kernel. Very special thanks to Stéphane Lesimple for helping debugging this issue and testing this fix on his multi terabyte filesystem (which took more than one day to balance alone, plus fsck, etc). Reported-by: Stéphane Lesimple Tested-by: Stéphane Lesimple Cc: sta...@vger.kernel.org # 4.2+ Signed-off-by: Filipe Manana --- V2: There was no v1. But since the first patch in the series became a v2, made this one a v2 from the start. fs/btrfs/ctree.h | 4 ++-- fs/btrfs/delayed-ref.c | 28 +++ fs/btrfs/delayed-ref.h | 7 ++ fs/btrfs/extent-tree.c | 45 +--- fs/btrfs/file.c| 10 fs/btrfs/inode.c | 4 ++-- fs/btrfs/ioctl.c | 62 +- fs/btrfs/relocation.c | 16 ++--- fs/btrfs/tree-log.c| 2 +- 9 files changed, 44 insertions(+), 134 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 49bc792..43eb981 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3398,7 +3398,7 @@ int btrfs_set_disk_extent_flags(struct btrfs_trans_handle *trans, int btrfs_free_extent(struct btrfs_trans_handle *trans, struct btrfs_root *root, u64 bytenr, u64 num_bytes, u64 parent, u64 root_objectid, - u64 owner, u64 offset, int no_quota); + u64 owner, u64 offset); int btrfs_free_reserved_extent(struct btrfs_root *root, u64 start, u64 len, int delalloc); @@ -3411,7 +3411,7 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans, int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root, u64 bytenr, u64 num_bytes, u64 parent, -u64 root_objectid, u64 owner, u64 offset, int no_quota); +u64 root_objectid, u64 owner, u64 offset); int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans, struct btrfs_root *root); diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index 4832943..7832031 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -220,7 +220,7 @@ static bool merge_ref(struct
Re: Exclusive quota of snapshot exceeded despite no space used
On 25.10.2015 02:44, Qu Wenruo wrote: > > > 在 2015年10月23日 23:05, Johannes Henninger 写道: >> On 23.10.2015 01:47, Qu Wenruo wrote: >>> 在 2015年10月23日 04:38, Johannes Henninger 写道: I'm having a weird problem with snapshots and exclusive quotas. After creating a snapshot of a subvolume and setting an exclusive quota of 50MB for the snapshot, everything seems to work fine. I can write approximately 50MB before the quota kicks in. However, if I create a snapshot, set an exclusive quota and just wait for some time, I suddenly cannot even create an empty file because I'm getting a "quota exceeded" error. The time until the bug appears seems to vary. During the waiting time, I'm changing neither the snapshot nor the original subvolume. "qgroup show -e" reports an exclusive use of only a few kilobytes for the snapshot, which is nowhere near the limit. Steps to reproduce (/media/extern is a fresh and empty btrfs partition): Enable quota and create an empty subvolume: root@t420:/media/extern# btrfs quota enable . root@t420:/media/extern# btrfs subvolume create sub Create subvolume './sub' Snapshot the subvolume and set a limit: root@t420:/media/extern# btrfs subvolume snapshot sub snap Create a snapshot of 'sub' in './snap' root@t420:/media/extern# cd snap/ root@t420:/media/extern/snap# btrfs qgroup limit -e 50M . Sometimes it takes "longer" for the quota to kick in, so I'm touching a file every 5 minutes here: root@t420:/media/extern/snap# for file in {1..100}; do touch $file; sleep 5m; done touch: cannot touch ‘7’: Disk quota exceeded ^C root@t420:/media/extern/snap# btrfs qgroup show -e . qgroupid rfer excl max_excl 0/5 16.00KiB 16.00KiB none 0/25716.00KiB 16.00KiB none 0/25816.00KiB 16.00KiB 50.00MiB Any idea why this happens? >>> BTW, to make btrfs qgroup show work, it's better to call sync before >>> qgroup show. >>> >>> It's a known bug that even after qgroup accounting rework, qgroup >>> reserve still has bug and can cause reserved space to underflow, >>> making such problem happen. >>> >>> For such case, btrfs qgroup show won't help as reserved space is not >>> shown in the output. >>> >>> One workaround would be, umount the filesystem and mount again. >>> Which will reset the underflow reserved space and work for sometime. >>> >>> If it's OK for you to recompile the kernel, you can try the following >>> patchset: >>> [PATCH v3 00/21] Rework btrfs qgroup reserved space framework >>> >>> Which should solve the problem. >>> >>> Thanks, >>> Qu >>> >> >> Thanks a lot for your reply! >> >> While remounting the filesystem fixes the issue temporary, it doesn't >> take very long for the bug to happen again so it's not really a >> workaround I can work with. >> >> I did recompile the kernel using your patches, but unfortunately the >> problem still appears. >> >> Thanks, >> Johannes >> > Interesting, just touching file will cause EQUOTA is quite a big problem. > > I'll try to reproduce it with my patchset and see what really caused > the problem. > The problem seems to do with snapshot qgroup hacking. > But I'm not completely sure yet. > > BTW, does "sync; btrfs qgroup show -prce" still show excl as 16K? > 16K is the correct number with only 6 empty files, just in case. > > Thanks, > Qu I ran my example from the first mail again and managed to write 7 files this time, "qgroup show" still shows 16kB after sync: root@t420:/media/extern/snap# btrfs qg limit -e 50M . root@t420:/media/extern/snap# for file in {1..100}; do touch $file; sleep 5m; done touch: cannot touch ‘8’: Disk quota exceeded ^C root@t420:/media/extern/snap# sync root@t420:/media/extern/snap# btrfs qgroup show -pcre . qgroupid rfer excl max_rfer max_excl parent child -- - 0/5 16.00KiB 16.00KiB none none --- --- 0/25716.00KiB 16.00KiB none none --- --- 0/25816.00KiB 16.00KiB none 50.00MiB --- --- root@t420:/media/extern/snap# btrfs fi sync . FSSync '.' root@t420:/media/extern/snap# btrfs qgroup show -pcre . qgroupid rfer excl max_rfer max_excl parent child -- - 0/5 16.00KiB 16.00KiB none none --- --- 0/25716.00KiB 16.00KiB none none --- --- 0/25816.00KiB 16.00KiB none 50.00MiB --- --- By the way, I don't if its relevant but the problem is
Re: Exclusive quota of snapshot exceeded despite no space used
在 2015年10月25日 19:10, Johannes Henninger 写道: On 25.10.2015 02:44, Qu Wenruo wrote: 在 2015年10月23日 23:05, Johannes Henninger 写道: On 23.10.2015 01:47, Qu Wenruo wrote: 在 2015年10月23日 04:38, Johannes Henninger 写道: I'm having a weird problem with snapshots and exclusive quotas. After creating a snapshot of a subvolume and setting an exclusive quota of 50MB for the snapshot, everything seems to work fine. I can write approximately 50MB before the quota kicks in. However, if I create a snapshot, set an exclusive quota and just wait for some time, I suddenly cannot even create an empty file because I'm getting a "quota exceeded" error. The time until the bug appears seems to vary. During the waiting time, I'm changing neither the snapshot nor the original subvolume. "qgroup show -e" reports an exclusive use of only a few kilobytes for the snapshot, which is nowhere near the limit. Steps to reproduce (/media/extern is a fresh and empty btrfs partition): Enable quota and create an empty subvolume: root@t420:/media/extern# btrfs quota enable . root@t420:/media/extern# btrfs subvolume create sub Create subvolume './sub' Snapshot the subvolume and set a limit: root@t420:/media/extern# btrfs subvolume snapshot sub snap Create a snapshot of 'sub' in './snap' root@t420:/media/extern# cd snap/ root@t420:/media/extern/snap# btrfs qgroup limit -e 50M . Sometimes it takes "longer" for the quota to kick in, so I'm touching a file every 5 minutes here: root@t420:/media/extern/snap# for file in {1..100}; do touch $file; sleep 5m; done touch: cannot touch ‘7’: Disk quota exceeded ^C root@t420:/media/extern/snap# btrfs qgroup show -e . qgroupid rfer excl max_excl 0/5 16.00KiB 16.00KiB none 0/25716.00KiB 16.00KiB none 0/25816.00KiB 16.00KiB 50.00MiB Any idea why this happens? BTW, to make btrfs qgroup show work, it's better to call sync before qgroup show. It's a known bug that even after qgroup accounting rework, qgroup reserve still has bug and can cause reserved space to underflow, making such problem happen. For such case, btrfs qgroup show won't help as reserved space is not shown in the output. One workaround would be, umount the filesystem and mount again. Which will reset the underflow reserved space and work for sometime. If it's OK for you to recompile the kernel, you can try the following patchset: [PATCH v3 00/21] Rework btrfs qgroup reserved space framework Which should solve the problem. Thanks, Qu Thanks a lot for your reply! While remounting the filesystem fixes the issue temporary, it doesn't take very long for the bug to happen again so it's not really a workaround I can work with. I did recompile the kernel using your patches, but unfortunately the problem still appears. Thanks, Johannes Interesting, just touching file will cause EQUOTA is quite a big problem. I'll try to reproduce it with my patchset and see what really caused the problem. The problem seems to do with snapshot qgroup hacking. But I'm not completely sure yet. BTW, does "sync; btrfs qgroup show -prce" still show excl as 16K? 16K is the correct number with only 6 empty files, just in case. Thanks, Qu I ran my example from the first mail again and managed to write 7 files this time, "qgroup show" still shows 16kB after sync: root@t420:/media/extern/snap# btrfs qg limit -e 50M . root@t420:/media/extern/snap# for file in {1..100}; do touch $file; sleep 5m; done touch: cannot touch ‘8’: Disk quota exceeded ^C root@t420:/media/extern/snap# sync root@t420:/media/extern/snap# btrfs qgroup show -pcre . qgroupid rfer excl max_rfer max_excl parent child -- - 0/5 16.00KiB 16.00KiB none none --- --- 0/25716.00KiB 16.00KiB none none --- --- 0/25816.00KiB 16.00KiB none 50.00MiB --- --- root@t420:/media/extern/snap# btrfs fi sync . FSSync '.' root@t420:/media/extern/snap# btrfs qgroup show -pcre . qgroupid rfer excl max_rfer max_excl parent child -- - 0/5 16.00KiB 16.00KiB none none --- --- 0/25716.00KiB 16.00KiB none none --- --- 0/25816.00KiB 16.00KiB none 50.00MiB --- --- Thanks for the report. At least, the rfer/excl number is correct. So the problem is limited to qgroup reserved space. I'll debug it tomorrow to see what's going wrong inside the reserved space. By the way, I don't if its relevant but the problem is not limited to exclusive quotas, but also happens when setting a "referenced" limit (qgroup limit
Re: btrfs double send
On Sun, Oct 25, 2015 at 1:38 PM, Ed Tomlinsonwrote: > Filipe, > > Its still not perfect. Here I can do sequential sends a few times then I > get something like this: > > [root@grover snap]# sh -x brh > + base=/snap/shot > ++ date +%Y-%V-%u_%m-%d_%H:%M > + stamp=2015-43-6_10-24_10:24 > + btrfs subv snapshot -r /snap/homevol /snap/shot.2015-43-6_10-24_10:24 > Create a readonly snapshot of '/snap/homevol' in > '/snap/shot.2015-43-6_10-24_10:24' > + sync > + /usr/bin/time -f 'send %E' nice -19 ionice -c3 btrfs send -v -p /snap/shot > /snap/shot.2015-43-6_10-24_10:24 > + btrfs receive -v /backup/snap > At subvol /snap/shot.2015-43-6_10-24_10:24 > At snapshot shot.2015-43-6_10-24_10:24 > receiving snapshot shot.2015-43-6_10-24_10:24 > uuid=cb3ad856-ca0f-f744-b7c3-b33f2d5bc8d3, ctransid=625267 > parent_uuid=cb3ad856-ca0f-f744-b7c3-b33f2d5bc8d3, parent_ctransid=619893 > ERROR: unlink home/ed/Maildir/.spam/dovecot.index.log.2 failed. No such file > or directory > Command terminated by signal 13 > send 1:49.64 > + sync > + btrfs subv delete /snap/shot > Delete subvolume (no-commit): '/snap/shot' > + sync > + mv /snap/shot.2015-43-6_10-24_10:24 /snap/shot > > > so there is another bug hiding in the code, its usually something in my > Maildir that shows in the log. Different and unrelated problem. Yes, we know there are still some problems regarding send issuing invalid/outdated paths to the send stream. Nothing to do with incorrect uuids in the send stream. > > Please let me know if there is anything you would like me to try. I am > running 4.2 with the 4.3 for-linus tree applied and the 4.2.x patches with > btrfs fixes removed. On top of this are a few patches from this list. > > TIA > Ed Tomlinson > > On Saturday, October 24, 2015 1:52:21 PM EDT, Filipe Manana wrote: >> >> On Sat, Oct 24, 2015 at 6:36 PM, wrote: >>> >>> Hello, >>> >>> I would like to do backups based on btrfs send/receive. >>> >>> So I though I would do a transfer over portable HDD and then incremental >>> sends (using -p) over network. >>> >>> Initial : ... >> >> >> It's a bug, we got it recently reported and fixed. >> The fix is in Chris' integration branch for the upcoming 4.4 merge window: >> >> >> http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?h=integration-4.4=b96b1db039ebc584d03a9933b279e0d3e704c528 >> >> cheers >>> >>> >>> Best regards, >>> >>> Lubos >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>> the body of a message to majord...@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> >> >> > -- Filipe David Manana, "Reasonable men adapt themselves to the world. Unreasonable men adapt the world to themselves. That's why all progress depends on unreasonable men." -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref
Filipe Manana wrote on 2015/10/25 14:39 +: On Tue, Oct 13, 2015 at 3:20 AM, Qu Wenruowrote: Add new function btrfs_add_delayed_qgroup_reserve() function to record how much space is reserved for that extent. As btrfs only accounts qgroup at run_delayed_refs() time, so newly allocated extent should keep the reserved space until then. So add needed function with related members to do it. Signed-off-by: Qu Wenruo --- v2: None v3: None --- fs/btrfs/delayed-ref.c | 29 + fs/btrfs/delayed-ref.h | 14 ++ 2 files changed, 43 insertions(+) diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index ac3e81d..bd9b63b 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -476,6 +476,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info, INIT_LIST_HEAD(_ref->ref_list); head_ref->processing = 0; head_ref->total_ref_mod = count_mod; + head_ref->qgroup_reserved = 0; + head_ref->qgroup_ref_root = 0; /* Record qgroup extent info if provided */ if (qrecord) { @@ -746,6 +748,33 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info, return 0; } +int btrfs_add_delayed_qgroup_reserve(struct btrfs_fs_info *fs_info, +struct btrfs_trans_handle *trans, +u64 ref_root, u64 bytenr, u64 num_bytes) +{ + struct btrfs_delayed_ref_root *delayed_refs; + struct btrfs_delayed_ref_head *ref_head; + int ret = 0; + + if (!fs_info->quota_enabled || !is_fstree(ref_root)) + return 0; + + delayed_refs = >transaction->delayed_refs; + + spin_lock(_refs->lock); + ref_head = find_ref_head(_refs->href_root, bytenr, 0); + if (!ref_head) { + ret = -ENOENT; + goto out; + } Hi Qu, So while running btrfs/063, with qgroups enabled (I modified the test to enable qgroups), ran into this 2 times: Thanks for the test. I also want a method to enable quota for all other btrfs/generic tests, but have no good idea other than modifing testcase itself. Any good ideas? [169125.246506] BTRFS info (device sdc): disk space caching is enabled [169125.363164] [ cut here ] [169125.365236] WARNING: CPU: 10 PID: 2827 at fs/btrfs/inode.c:2929 btrfs_finish_ordered_io+0x347/0x4eb [btrfs]() [169125.367702] BTRFS: Transaction aborted (error -2) [169125.368830] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc parport i2c_piix4 psmouse acpi_cpufreq microcode pcspkr processor evdev i2c_core serio_raw button ext4 crc16 jbd2 mbcache sd_mod sg sr_mod cdrom ata_generic virtio_scsi ata_piix libata floppy virtio_pci virtio_ring scsi_mod e1000 virtio [last unloaded: btrfs] [169125.376755] CPU: 10 PID: 2827 Comm: kworker/u32:14 Tainted: G W 4.3.0-rc5-btrfs-next-17+ #1 [169125.378522] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014 [169125.380916] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs] [169125.382167] 88007ef2bc28 812566f4 88007ef2bc70 [169125.383643] 88007ef2bc60 8104d0a6 a03cac33 8801f5ca6db0 [169125.385197] 8802c6c7ee98 880122bc1000 fffe 88007ef2bcc8 [169125.386691] Call Trace: [169125.387194] [] dump_stack+0x4e/0x79 [169125.388205] [] warn_slowpath_common+0x9f/0xb8 [169125.389386] [] ? btrfs_finish_ordered_io+0x347/0x4eb [btrfs] [169125.390837] [] warn_slowpath_fmt+0x48/0x50 [169125.391839] [] ? unpin_extent_cache+0xbe/0xcc [btrfs] [169125.392973] [] btrfs_finish_ordered_io+0x347/0x4eb [btrfs] [169125.395714] [] ? _raw_spin_unlock_irqrestore+0x38/0x60 [169125.396888] [] ? trace_hardirqs_off_caller+0x1f/0xb9 [169125.397986] [] finish_ordered_fn+0x15/0x17 [btrfs] [169125.399122] [] normal_work_helper+0x14c/0x32a [btrfs] [169125.400300] [] btrfs_endio_write_helper+0x12/0x14 [btrfs] [169125.401450] [] process_one_work+0x24a/0x4ac [169125.402631] [] worker_thread+0x206/0x2c2 [169125.403622] [] ? rescuer_thread+0x2cb/0x2cb [169125.404693] [] kthread+0xef/0xf7 [169125.405727] [] ? kthread_parkme+0x24/0x24 [169125.406808] [] ret_from_fork+0x3f/0x70 [169125.407834] [] ? kthread_parkme+0x24/0x24 [169125.408840] ---[ end trace 6ee4342a5722b119 ]--- [169125.409654] BTRFS: error (device sdc) in btrfs_finish_ordered_io:2929: errno=-2 No such entry So what you have here is racy: btrfs_finish_ordered_io() joins existing transaction (or starts a new one) insert_reserved_file_extent() btrfs_alloc_reserved_file_extent() --> creates delayed ref *** delayed refs are run, someone called btrfs_async_run_delayed_refs() from btrfs_end_transaction(), ref
[PATCH v5] btrfs: qgroup: Don't copy extent buffer to do qgroup rescan
Ancient qgroup code call memcpy() on a extent buffer and use it for leaf iteration. As extent buffer contains lock, pointers to pages, it's never sane to do such copy. The following bug may be caused by this insane operation: [92098.841309] general protection fault: [#1] SMP [92098.841338] Modules linked in: ... [92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted 4.3.0-rc1 #1 [92098.841868] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper [btrfs] [92098.842261] Call Trace: [92098.842277] [] ? read_extent_buffer+0xb8/0x110 [btrfs] [92098.842304] [] ? btrfs_find_all_roots+0x60/0x70 [btrfs] [92098.842329] [] btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs] Where btrfs_qgroup_rescan_worker+0x28d is btrfs_disk_key_to_cpu(), called in reading key from the copied extent_buffer. This patch will use btrfs_clone_extent_buffer() to a better copy of extent buffer to deal such case. Reported-by: Stephane LesimpleSuggested-by: Filipe Manana Signed-off-by: Qu Wenruo --- v2: Follow the parameter change in previous patch. v3: None v4: Use btrfs_clone_extent_buffer() other than introducing new facilities v5: Change slot = path->slots[0] postion. --- fs/btrfs/qgroup.c | 26 -- 1 file changed, 16 insertions(+), 10 deletions(-) diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 158633c..31d1934 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -2192,10 +2192,10 @@ void assert_qgroups_uptodate(struct btrfs_trans_handle *trans) */ static int qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path, - struct btrfs_trans_handle *trans, - struct extent_buffer *scratch_leaf) + struct btrfs_trans_handle *trans) { struct btrfs_key found; + struct extent_buffer *scratch_leaf = NULL; struct ulist *roots = NULL; struct seq_list tree_mod_seq_elem = SEQ_LIST_INIT(tree_mod_seq_elem); u64 num_bytes; @@ -2233,7 +2233,15 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path, fs_info->qgroup_rescan_progress.objectid = found.objectid + 1; btrfs_get_tree_mod_seq(fs_info, _mod_seq_elem); - memcpy(scratch_leaf, path->nodes[0], sizeof(*scratch_leaf)); + scratch_leaf = btrfs_clone_extent_buffer(path->nodes[0]); + if (!scratch_leaf) { + ret = -ENOMEM; + mutex_unlock(_info->qgroup_rescan_lock); + goto out; + } + extent_buffer_get(scratch_leaf); + btrfs_tree_read_lock(scratch_leaf); + btrfs_set_lock_blocking_rw(scratch_leaf, BTRFS_READ_LOCK); slot = path->slots[0]; btrfs_release_path(path); mutex_unlock(_info->qgroup_rescan_lock); @@ -2259,6 +2267,10 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path, goto out; } out: + if (scratch_leaf) { + btrfs_tree_read_unlock_blocking(scratch_leaf); + free_extent_buffer(scratch_leaf); + } btrfs_put_tree_mod_seq(fs_info, _mod_seq_elem); return ret; @@ -2270,16 +2282,12 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work) qgroup_rescan_work); struct btrfs_path *path; struct btrfs_trans_handle *trans = NULL; - struct extent_buffer *scratch_leaf = NULL; int err = -ENOMEM; int ret = 0; path = btrfs_alloc_path(); if (!path) goto out; - scratch_leaf = kmalloc(sizeof(*scratch_leaf), GFP_NOFS); - if (!scratch_leaf) - goto out; err = 0; while (!err) { @@ -2291,8 +2299,7 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work) if (!fs_info->quota_enabled) { err = -EINTR; } else { - err = qgroup_rescan_leaf(fs_info, path, trans, -scratch_leaf); + err = qgroup_rescan_leaf(fs_info, path, trans); } if (err > 0) btrfs_commit_transaction(trans, fs_info->fs_root); @@ -2301,7 +2308,6 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work) } out: - kfree(scratch_leaf); btrfs_free_path(path); mutex_lock(_info->qgroup_rescan_lock); -- 2.6.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 0/4] VFS: In-kernel copy system call
On Sat, Oct 24, 2015 at 11:52:37AM -0500, Eric Biggers wrote: > A few comments: > > > if (!(file_in->f_mode & FMODE_READ) || > > !(file_out->f_mode & FMODE_WRITE) || > > (file_out->f_flags & O_APPEND) || > > !file_out->f_op) > > return -EBADF; > > Isn't 'f_op' always non-NULL? Yes, its is. > If the destination file cannot be append-only, shouldn't this be documented? Yes. > > if (inode_in->i_sb != inode_out->i_sb || > > file_in->f_path.mnt != file_out->f_path.mnt) > > return -EXDEV; > > Doesn't the same mount already imply the same superblock? It does. > > /* > > * copy_file_range() differs from regular file read and write in that it > > * specifically allows return partial success. When it does so is up to > > * the copy_file_range method. > > */ > > What does this mean? I thought that read() and write() can also return > partial success. The syscalls are allow to return short from the standards perspective, but if you actually do that for regualr fiels hell will break loose as applications don't expect it. That's why we can't actually ever do it. > Should FMODE_PREAD or FMODE_PWRITE access be checked if the user specifies > their > own 'off_in' or 'off_out', respectively? Maybe. > What is supposed to happen if the user passes provides a file descriptor to a > non-regular file, such as a block device or char device? If they implement the proper method I see no reason why we can't support it. For block device we only have one file_ops instance and mapping that to the bio-level XCOPY abstraction that's been posted a couple of times would seem sensible. For character devices that's entirely up to the driver. > If the 'in' file has fewer than 'len' bytes remaining until EOF, what is the > expected behavior? It looks like the btrfs implementation has different > behavior from the pagecache implementation. Good question. I'd say failure is the right way to handle a mismatching length. > It appears the btrfs implementation has alignment restrictions --- where is > this > documented and how will users know what alignment to use? For actual clones we're limited to the file system block size (NFS adds an extra attribute for the clone block size), but for regaulr copies we probably should fall back to the dumb implementation if we don't match it. > Are copies within the same file permitted and can the ranges overlap? The man > page doesn't say. For clones we defintively want to support it, but for copies I'd be tempted to say no. Does anyone else have an opinion? > It looks like the initial patch defines __NR_copy_file_range for the ARM > architecture but doesn't actually hook that system call up for ARM; why is > that? Looks like that should be dropped. I really wish we had a way to just wire up syscalls everywhere. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 5/4] copy_file_range.2: New page documenting copy_file_range()
On Sat, Oct 24, 2015 at 01:02:21PM +0100, P??draig Brady wrote: > I'm a bit worried about the sparse expansion and default reflinking > which might preclude cp(1) from using this call in most cases, but I will > test and try to use it. coreutils has heuristics for determining if files > are remote, which we might use to restrict to that use case. Can you explain why reflinking and hole expansion are an issue if done locally and not if done remotely? I'd really like to make the call as usable as possible for everyone, but we really need clear sem?ntics for that. Also note that Annas current series allows for hole filling - any decent implementation should not do them, but that's really a quality of implementation and not an interface issue. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: zero out delayed node upon allocation
On Oct 25, 2015, at 3:50 PM, Alexandru Moise <00moses.alexande...@gmail.com> wrote: >>> This allows us to trim out half of btrfs_init_delayed_node() which >>> is now reduntant. >> >> It's redundant if kmem_cache_zalloc is used, but you haven't >> documented that doing so is now required. For all of these changes >> you've posted, if they're to be accepted, I'd really prefer to set up >> the slab with a constructor instead. Then we don't need to worry >> about such guarantees. The object returned via kmem_cache_alloc will >> always be properly initialized. > > Well I wouldn't say it's *required* just makes this particular piece > of code neater, since we memset-zero the node's inode_item _anyways_. But the rest of the delayed node still needs to be initialized. That's happening implicitly with your patch via zalloc but if anyone ever adds a new allocation from that cache, they'll need to know that zalloc is required for proper initializatiom. Documenting that is one approach. Constructors are another. > I like the constructor idea though, do you suggest I should invest in > that idea? Well, see below. >> >> This makes assumptions about atomic_t and what atomic_set does that >> aren't guaranteed to be true. When accessors/mutators are part of the >> API they should be used. >> >> - -Jeff > > You're right, taking out that atomic_set was really stupid. I'll > resent the patch with a proper explanation in the commit message and > put the atomic_set back. > > Unless you feel that the change is rather pointless, I'll gladly back > off :-). I don't want to dissuade new contributors but there's plenty of work to be done before we start worrying about cleaning initializers. These aren't hot paths (but if they were, the implicit memset of the whole object might have negative effects.) The immediate impact of cleanup patches like these is code churn so that developers with outstanding patch sets need to rebase on top of new context. It's a normal part of the development cycle but it's more work for not a lot of benefit. So, if your goal is to contribute to btrfs, I'd suggest digging into the code. Try out changes. Break it and figure out how you broke it. It's like learning how to navigate in a new city: by getting lost you learn more. :) -Jeff-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: zero out delayed node upon allocation
On Sun, Oct 25, 2015 at 01:33:45PM -0400, Jeff Mahoney wrote: > > On Oct 25, 2015, at 3:50 PM, Alexandru Moise <00moses.alexande...@gmail.com> > wrote: > > >>> This allows us to trim out half of btrfs_init_delayed_node() which > >>> is now reduntant. > >> > >> It's redundant if kmem_cache_zalloc is used, but you haven't > >> documented that doing so is now required. For all of these changes > >> you've posted, if they're to be accepted, I'd really prefer to set up > >> the slab with a constructor instead. Then we don't need to worry > >> about such guarantees. The object returned via kmem_cache_alloc will > >> always be properly initialized. > > > > Well I wouldn't say it's *required* just makes this particular piece > > of code neater, since we memset-zero the node's inode_item _anyways_. > > But the rest of the delayed node still needs to be initialized. That's > happening implicitly with your patch via zalloc but if anyone ever adds a new > allocation from that cache, they'll need to know that zalloc is required for > proper initializatiom. Documenting that is one approach. Constructors are > another. > > > I like the constructor idea though, do you suggest I should invest in > > that idea? > > Well, see below. > > >> > >> This makes assumptions about atomic_t and what atomic_set does that > >> aren't guaranteed to be true. When accessors/mutators are part of the > >> API they should be used. > >> > >> - -Jeff > > > > You're right, taking out that atomic_set was really stupid. I'll > > resent the patch with a proper explanation in the commit message and > > put the atomic_set back. > > > > Unless you feel that the change is rather pointless, I'll gladly back > > off :-). > > I don't want to dissuade new contributors but there's plenty of work to be > done before we start worrying about cleaning initializers. These aren't hot > paths (but if they were, the implicit memset of the whole object might have > negative effects.) The immediate impact of cleanup patches like these is code > churn so that developers with outstanding patch sets need to rebase on top of > new context. It's a normal part of the development cycle but it's more work > for not a lot of benefit. > > So, if your goal is to contribute to btrfs, I'd suggest digging into the > code. Try out changes. Break it and figure out how you broke it. It's like > learning how to navigate in a new city: by getting lost you learn more. :) > > -Jeff Yeah I see your point, I take back version 2 of this patch and will follow your advice. Much appreciated! Alex -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: zero out delayed node upon allocation
> > This allows us to trim out half of btrfs_init_delayed_node() which > > is now reduntant. > > It's redundant if kmem_cache_zalloc is used, but you haven't > documented that doing so is now required. For all of these changes > you've posted, if they're to be accepted, I'd really prefer to set up > the slab with a constructor instead. Then we don't need to worry > about such guarantees. The object returned via kmem_cache_alloc will > always be properly initialized. Well I wouldn't say it's *required* just makes this particular piece of code neater, since we memset-zero the node's inode_item _anyways_. I like the constructor idea though, do you suggest I should invest in that idea? > > This makes assumptions about atomic_t and what atomic_set does that > aren't guaranteed to be true. When accessors/mutators are part of the > API they should be used. > > - -Jeff You're right, taking out that atomic_set was really stupid. I'll resent the patch with a proper explanation in the commit message and put the atomic_set back. Unless you feel that the change is rather pointless, I'll gladly back off :-). Regards, Alex -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] btrfs: zero out delayed node upon allocation
It's slightly cleaner to zero-out the delayed node upon allocation than to do it by hand in btrfs_init_delayed_node() for a few members Signed-off-by: Alexandru Moise <00moses.alexande...@gmail.com> --- v2: Thanks Jeff Mahoney for pointing out the mistake of removing the atomic_set in v1 of the patch. atomic_set() call restored. fs/btrfs/delayed-inode.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c index a2ae427..0a00c4f 100644 --- a/fs/btrfs/delayed-inode.c +++ b/fs/btrfs/delayed-inode.c @@ -54,16 +54,11 @@ static inline void btrfs_init_delayed_node( delayed_node->root = root; delayed_node->inode_id = inode_id; atomic_set(_node->refs, 0); - delayed_node->count = 0; - delayed_node->flags = 0; delayed_node->ins_root = RB_ROOT; delayed_node->del_root = RB_ROOT; mutex_init(_node->mutex); - delayed_node->index_cnt = 0; INIT_LIST_HEAD(_node->n_list); INIT_LIST_HEAD(_node->p_list); - delayed_node->bytes_reserved = 0; - memset(_node->inode_item, 0, sizeof(delayed_node->inode_item)); } static inline int btrfs_is_continuous_delayed_item( @@ -132,7 +127,7 @@ again: if (node) return node; - node = kmem_cache_alloc(delayed_node_cache, GFP_NOFS); + node = kmem_cache_zalloc(delayed_node_cache, GFP_NOFS); if (!node) return ERR_PTR(-ENOMEM); btrfs_init_delayed_node(node, root, ino); -- 2.6.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs double send
Filipe, I realize its another bug. Two additional pieces of info that might help. One, btrfs-progs was at 4.1.2 (I missed a tag= in my git pull). Second is that I have been able to recreate this issue three times over a period of two to three days (tring again with 4.2.3). My fs is probably a good testcase for any patches that appear. Meanwhile I've been falling back to rsync which always works but is so much slower. TIA Ed On Sunday, October 25, 2015 9:42:54 AM EDT, Filipe Manana wrote: On Sun, Oct 25, 2015 at 1:38 PM, Ed Tomlinsonwrote: Filipe, Its still not perfect. Here I can do sequential sends a few times then I get something like this: [root@grover snap]# sh -x brh + base=/snap/shot ... Different and unrelated problem. Yes, we know there are still some problems regarding send issuing invalid/outdated paths to the send stream. Nothing to do with incorrect uuids in the send stream. Please let me know if there is anything you would like me to try. I am running 4.2 with the 4.3 for-linus tree applied and the 4.2.x patches with btrfs fixes removed. On top of this are a few patches from this list. TIA Ed Tomlinson ... -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs double send
On Sun, Oct 25, 2015 at 2:10 PM, Ed Tomlinsonwrote: > Filipe, > > I realize its another bug. Two additional pieces of info that might help. > One, btrfs-progs was at 4.1.2 (I missed a tag= in my git pull). Btrfs-progs' version is irrelevant. The problem is the send side generating incorrect paths in the send stream (as I mentioned earlier), so it's exclusively a problem in the kernel. > Second > is that I have been able to recreate this issue three times over a period of > two to three days (tring again with 4.2.3). My fs is probably a good > testcase for any patches that appear. Yeah. If you can find out what happened to the directory/file in question between the parent and send snapshots (was it renamed/moved? was it deleted and then a new one with the same name/location created?) that helps. Otherwise looking at btrfs-debug-tree and see the trees for each snapshot helps too, but you need to figure out the number of the inode where it's failing (sometimes btrfs receive -vv is enough to figure this out). thanks > > Meanwhile I've been falling back to rsync which always works but is so much > slower. > > TIA > Ed > On Sunday, October 25, 2015 9:42:54 AM EDT, Filipe Manana wrote: >> >> On Sun, Oct 25, 2015 at 1:38 PM, Ed Tomlinson wrote: >>> >>> Filipe, >>> >>> Its still not perfect. Here I can do sequential sends a few times then I >>> get something like this: >>> >>> [root@grover snap]# sh -x brh >>> + base=/snap/shot ... >> >> >> Different and unrelated problem. >> Yes, we know there are still some problems regarding send issuing >> invalid/outdated paths to the send stream. Nothing to do with >> incorrect uuids in the send stream. >> >>> Please let me know if there is anything you would like me to try. I am >>> running 4.2 with the 4.3 for-linus tree applied and the 4.2.x patches >>> with >>> btrfs fixes removed. On top of this are a few patches from this list. >>> >>> TIA >>> Ed Tomlinson ... >> >> >> >> > -- Filipe David Manana, "Reasonable men adapt themselves to the world. Unreasonable men adapt the world to themselves. That's why all progress depends on unreasonable men." -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref
On Tue, Oct 13, 2015 at 3:20 AM, Qu Wenruowrote: > Add new function btrfs_add_delayed_qgroup_reserve() function to record > how much space is reserved for that extent. > > As btrfs only accounts qgroup at run_delayed_refs() time, so newly > allocated extent should keep the reserved space until then. > > So add needed function with related members to do it. > > Signed-off-by: Qu Wenruo > --- > v2: > None > v3: > None > --- > fs/btrfs/delayed-ref.c | 29 + > fs/btrfs/delayed-ref.h | 14 ++ > 2 files changed, 43 insertions(+) > > diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c > index ac3e81d..bd9b63b 100644 > --- a/fs/btrfs/delayed-ref.c > +++ b/fs/btrfs/delayed-ref.c > @@ -476,6 +476,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info, > INIT_LIST_HEAD(_ref->ref_list); > head_ref->processing = 0; > head_ref->total_ref_mod = count_mod; > + head_ref->qgroup_reserved = 0; > + head_ref->qgroup_ref_root = 0; > > /* Record qgroup extent info if provided */ > if (qrecord) { > @@ -746,6 +748,33 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info > *fs_info, > return 0; > } > > +int btrfs_add_delayed_qgroup_reserve(struct btrfs_fs_info *fs_info, > +struct btrfs_trans_handle *trans, > +u64 ref_root, u64 bytenr, u64 num_bytes) > +{ > + struct btrfs_delayed_ref_root *delayed_refs; > + struct btrfs_delayed_ref_head *ref_head; > + int ret = 0; > + > + if (!fs_info->quota_enabled || !is_fstree(ref_root)) > + return 0; > + > + delayed_refs = >transaction->delayed_refs; > + > + spin_lock(_refs->lock); > + ref_head = find_ref_head(_refs->href_root, bytenr, 0); > + if (!ref_head) { > + ret = -ENOENT; > + goto out; > + } Hi Qu, So while running btrfs/063, with qgroups enabled (I modified the test to enable qgroups), ran into this 2 times: [169125.246506] BTRFS info (device sdc): disk space caching is enabled [169125.363164] [ cut here ] [169125.365236] WARNING: CPU: 10 PID: 2827 at fs/btrfs/inode.c:2929 btrfs_finish_ordered_io+0x347/0x4eb [btrfs]() [169125.367702] BTRFS: Transaction aborted (error -2) [169125.368830] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc parport i2c_piix4 psmouse acpi_cpufreq microcode pcspkr processor evdev i2c_core serio_raw button ext4 crc16 jbd2 mbcache sd_mod sg sr_mod cdrom ata_generic virtio_scsi ata_piix libata floppy virtio_pci virtio_ring scsi_mod e1000 virtio [last unloaded: btrfs] [169125.376755] CPU: 10 PID: 2827 Comm: kworker/u32:14 Tainted: G W 4.3.0-rc5-btrfs-next-17+ #1 [169125.378522] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014 [169125.380916] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs] [169125.382167] 88007ef2bc28 812566f4 88007ef2bc70 [169125.383643] 88007ef2bc60 8104d0a6 a03cac33 8801f5ca6db0 [169125.385197] 8802c6c7ee98 880122bc1000 fffe 88007ef2bcc8 [169125.386691] Call Trace: [169125.387194] [] dump_stack+0x4e/0x79 [169125.388205] [] warn_slowpath_common+0x9f/0xb8 [169125.389386] [] ? btrfs_finish_ordered_io+0x347/0x4eb [btrfs] [169125.390837] [] warn_slowpath_fmt+0x48/0x50 [169125.391839] [] ? unpin_extent_cache+0xbe/0xcc [btrfs] [169125.392973] [] btrfs_finish_ordered_io+0x347/0x4eb [btrfs] [169125.395714] [] ? _raw_spin_unlock_irqrestore+0x38/0x60 [169125.396888] [] ? trace_hardirqs_off_caller+0x1f/0xb9 [169125.397986] [] finish_ordered_fn+0x15/0x17 [btrfs] [169125.399122] [] normal_work_helper+0x14c/0x32a [btrfs] [169125.400300] [] btrfs_endio_write_helper+0x12/0x14 [btrfs] [169125.401450] [] process_one_work+0x24a/0x4ac [169125.402631] [] worker_thread+0x206/0x2c2 [169125.403622] [] ? rescuer_thread+0x2cb/0x2cb [169125.404693] [] kthread+0xef/0xf7 [169125.405727] [] ? kthread_parkme+0x24/0x24 [169125.406808] [] ret_from_fork+0x3f/0x70 [169125.407834] [] ? kthread_parkme+0x24/0x24 [169125.408840] ---[ end trace 6ee4342a5722b119 ]--- [169125.409654] BTRFS: error (device sdc) in btrfs_finish_ordered_io:2929: errno=-2 No such entry So what you have here is racy: btrfs_finish_ordered_io() joins existing transaction (or starts a new one) insert_reserved_file_extent() btrfs_alloc_reserved_file_extent() --> creates delayed ref *** delayed refs are run, someone called btrfs_async_run_delayed_refs() from btrfs_end_transaction(), ref head is removed ** btrfs_add_delayed_qgroup_reserve() --> does not find delayed ref head, returns -ENOENT and
[PATCH] btrfs: zero out delayed node upon allocation
This allows us to trim out half of btrfs_init_delayed_node() which is now reduntant. Signed-off-by: Alexandru Moise <00moses.alexande...@gmail.com> --- fs/btrfs/delayed-inode.c | 8 +--- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c index a2ae427..7aa84be 100644 --- a/fs/btrfs/delayed-inode.c +++ b/fs/btrfs/delayed-inode.c @@ -53,17 +53,11 @@ static inline void btrfs_init_delayed_node( { delayed_node->root = root; delayed_node->inode_id = inode_id; - atomic_set(_node->refs, 0); - delayed_node->count = 0; - delayed_node->flags = 0; delayed_node->ins_root = RB_ROOT; delayed_node->del_root = RB_ROOT; mutex_init(_node->mutex); - delayed_node->index_cnt = 0; INIT_LIST_HEAD(_node->n_list); INIT_LIST_HEAD(_node->p_list); - delayed_node->bytes_reserved = 0; - memset(_node->inode_item, 0, sizeof(delayed_node->inode_item)); } static inline int btrfs_is_continuous_delayed_item( @@ -132,7 +126,7 @@ again: if (node) return node; - node = kmem_cache_alloc(delayed_node_cache, GFP_NOFS); + node = kmem_cache_zalloc(delayed_node_cache, GFP_NOFS); if (!node) return ERR_PTR(-ENOMEM); btrfs_init_delayed_node(node, root, ino); -- 2.6.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: zero out delayed node upon allocation
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 On 10/25/15 1:48 PM, Alexandru Moise wrote: > This allows us to trim out half of btrfs_init_delayed_node() which > is now reduntant. It's redundant if kmem_cache_zalloc is used, but you haven't documented that doing so is now required. For all of these changes you've posted, if they're to be accepted, I'd really prefer to set up the slab with a constructor instead. Then we don't need to worry about such guarantees. The object returned via kmem_cache_alloc will always be properly initialized. > Signed-off-by: Alexandru Moise <00moses.alexande...@gmail.com> --- > fs/btrfs/delayed-inode.c | 8 +--- 1 file changed, 1 > insertion(+), 7 deletions(-) > > diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c > index a2ae427..7aa84be 100644 --- a/fs/btrfs/delayed-inode.c +++ > b/fs/btrfs/delayed-inode.c @@ -53,17 +53,11 @@ static inline void > btrfs_init_delayed_node( { delayed_node->root = root; > delayed_node->inode_id = inode_id; - > atomic_set(_node->refs, 0); This makes assumptions about atomic_t and what atomic_set does that aren't guaranteed to be true. When accessors/mutators are part of the API they should be used. - -Jeff > - delayed_node->count = 0; - delayed_node->flags = 0; > delayed_node->ins_root = RB_ROOT; delayed_node->del_root = > RB_ROOT; mutex_init(_node->mutex); - > delayed_node->index_cnt = 0; > INIT_LIST_HEAD(_node->n_list); > INIT_LIST_HEAD(_node->p_list); - > delayed_node->bytes_reserved = 0; - > memset(_node->inode_item, 0, > sizeof(delayed_node->inode_item)); } > > static inline int btrfs_is_continuous_delayed_item( @@ -132,7 > +126,7 @@ again: if (node) return node; > > - node = kmem_cache_alloc(delayed_node_cache, GFP_NOFS); +node = > kmem_cache_zalloc(delayed_node_cache, GFP_NOFS); if (!node) return > ERR_PTR(-ENOMEM); btrfs_init_delayed_node(node, root, ino); > - -- Jeff Mahoney SUSE Labs -BEGIN PGP SIGNATURE- Version: GnuPG/MacGPG2 v2 iQIcBAEBCAAGBQJWLP5RAAoJEB57S2MheeWyQMoQAMUNGQFl14cwbJWtSsNjwZJy eRaSckKcRinc4zoFxATSn4R/UQLadFNg+tAtSnNgoJngpg/nn6Fc19vMS5QVpU1c OhPgYUVkBTjLxhBaY6Iivinbx4+pjFaEP+e6LoqXefelB6Wq17e8DavQcDU5gH2e 3vzXLt8kKa1hlC5SbtjW59cLgHzrclR6h4qeO00R7vWPA4YyIzOKOnYznrgZ69wS V97YBmcyucrl+EENYFe5BYBJmP5LojHjsxfPtF2zzZnvCpSrWPZ5Num/2yCofFir 7HNs86wLEBjsEUhkSjZ/7u3XATjVOXNHVUsknOOMB/B6zk9ngnK51mp7a2Ib41PM nMbNOVHhauumY9sf++RWjN53HTGH8aAlp7B0JIRTceGrvKdT2YGjW8iWwIrtlCE0 bSoEXRNcMYFvR+0we/WKVi4sX0ysRfhDbCuQeWPEJ6R4DbzmtqjCmmHQnZyVpwex 1ZxT8yvFVJvQHEl2E+mHOneRgdVuj89j8AZJ99idIHKa9b60tHAlkq+zXoW0odIt qV09bYoc7J87W5JvTgKVks13ovwKmM7Ij7zHFMBLVO/BGrd+zFFW1qq8r6GJBKG9 PaXt+HUHBCt4TpvAza5JZ0Yn7tYvALHizj+IZStTw2Kz1vQC5KEnjZIMa3LcKrsx r2oQLSfGkcsf8EcIzDX1 =qINk -END PGP SIGNATURE- -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: pass proper enum type to start_transaction()
Signed-off-by: Alexandru Moise <00moses.alexande...@gmail.com> --- fs/btrfs/transaction.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index a5b0644..cb50f53 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -618,17 +618,20 @@ struct btrfs_trans_handle *btrfs_start_transaction_lflush( struct btrfs_trans_handle *btrfs_join_transaction(struct btrfs_root *root) { - return start_transaction(root, 0, TRANS_JOIN, 0); + return start_transaction(root, 0, TRANS_JOIN, +BTRFS_RESERVE_NO_FLUSH); } struct btrfs_trans_handle *btrfs_join_transaction_nolock(struct btrfs_root *root) { - return start_transaction(root, 0, TRANS_JOIN_NOLOCK, 0); + return start_transaction(root, 0, TRANS_JOIN_NOLOCK, +BTRFS_RESERVE_NO_FLUSH); } struct btrfs_trans_handle *btrfs_start_ioctl_transaction(struct btrfs_root *root) { - return start_transaction(root, 0, TRANS_USERSPACE, 0); + return start_transaction(root, 0, TRANS_USERSPACE, +BTRFS_RESERVE_NO_FLUSH); } /* @@ -646,7 +649,8 @@ struct btrfs_trans_handle *btrfs_start_ioctl_transaction(struct btrfs_root *root */ struct btrfs_trans_handle *btrfs_attach_transaction(struct btrfs_root *root) { - return start_transaction(root, 0, TRANS_ATTACH, 0); + return start_transaction(root, 0, TRANS_ATTACH, +BTRFS_RESERVE_NO_FLUSH); } /* @@ -661,7 +665,8 @@ btrfs_attach_transaction_barrier(struct btrfs_root *root) { struct btrfs_trans_handle *trans; - trans = start_transaction(root, 0, TRANS_ATTACH, 0); + trans = start_transaction(root, 0, TRANS_ATTACH, + BTRFS_RESERVE_NO_FLUSH); if (IS_ERR(trans) && PTR_ERR(trans) == -ENOENT) btrfs_wait_for_commit(root, 0); -- 2.6.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html