trying to balance, filesystem keeps going read-only.
I have a file system of four 5TB drives. Well, one drive is 8TB with a 5TB partition; the rest are 5TB drives. I created the initial btrfs file system on one drive and rsync'd data to it, added another drive and rsync'd data, added a third drive and rsync'd data. Then I added a fourth drive and am trying to balance. The file system gets an error and I have to reboot to get the file system out of read-only.

I don't think it is a hardware issue, but it could be... or it could be some kind of bug in btrfs?

To recover, I've been commenting out the btrfs entry in fstab, shutting down, powering off, verifying cable connections, powering on, verifying all the devices are present, and remounting.

This kind of balance was successful:

  btrfs balance start -dusage=55 /mnt/magenta/

but this gets the error below and goes into read-only:

  btrfs filesystem balance /mnt/magenta/

My kernel now is:

  Linux ubuntu 4.2.0-17-lowlatency #21-Ubuntu SMP PREEMPT Fri Oct 23 20:40:07 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

~# btrfs filesystem show
Label: 'uv000'  uuid: f974efbd-82cb-4812-a3b3-eb12ae470f2c
	Total devices 4  FS bytes used 11.17TiB
	devid 1 size 4.77TiB used 4.26TiB path /dev/sdf1
	devid 2 size 4.55TiB used 4.09TiB path /dev/sde
	devid 3 size 4.55TiB used 2.74TiB path /dev/sdb
	devid 4 size 4.55TiB used 96.03GiB path /dev/sdg

Label: 'LinuxSampler'  uuid: 55aa3bf2-f319-42c7-8f1d-20ee35a91f9a
	Total devices 1  FS bytes used 555.80GiB
	devid 1 size 2.51TiB used 558.04GiB path /dev/sdf2

# btrfs filesystem df /mnt/btrfs/
Data, single: total=11.16TiB, used=11.16TiB
System, RAID1: total=32.00MiB, used=1.20MiB
Metadata, RAID1: total=10.00GiB, used=8.29GiB
Metadata, DUP: total=4.50GiB, used=4.21GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

# btrfs filesystem usage /mnt/btrfs/
Overall:
    Device size:         18.41TiB
    Device allocated:    11.19TiB
    Device unallocated:   7.22TiB
    Device missing:         0.00B
    Used:                11.18TiB
    Free (estimated):     7.22TiB  (min: 3.61TiB)
    Data ratio:              1.00
    Metadata ratio:          2.00
    Global reserve:     512.00MiB  (used: 0.00B)

Data,single: Size:11.16TiB, Used:11.16TiB
   /dev/sdb      2.73TiB
   /dev/sde      4.09TiB
   /dev/sdf1     4.25TiB
   /dev/sdg     93.00GiB

Metadata,RAID1: Size:10.00GiB, Used:8.29GiB
   /dev/sdb      5.00GiB
   /dev/sde      6.00GiB
   /dev/sdf1     6.00GiB
   /dev/sdg      3.00GiB

Metadata,DUP: Size:4.50GiB, Used:4.21GiB
   /dev/sdf1     9.00GiB

System,RAID1: Size:32.00MiB, Used:1.20MiB
   /dev/sdb     32.00MiB
   /dev/sdg     32.00MiB

Unallocated:
   /dev/sdb      1.81TiB
   /dev/sde    465.53GiB
   /dev/sdf1   515.80GiB
   /dev/sdg      4.45TiB

and the tail of dmesg -

[50363.836660] ata10: EH complete
[64894.794201] BTRFS info (device sdb): relocating block group 12704106414080 flags 1
[64899.944105] BTRFS info (device sdb): found 124 extents
[64906.622024] BTRFS info (device sdb): found 124 extents
[64906.915197] BTRFS info (device sdb): relocating block group 12703032672256 flags 1
[64915.409311] BTRFS info (device sdb): found 813 extents
[64947.160961] ata10.00: exception Emask 0x0 SAct 0x7fff SErr 0x0 action 0x6 frozen
[64947.160966] ata10.00: failed command: WRITE FPDMA QUEUED
[64947.160970] ata10.00: cmd 61/c0:00:38:8a:1d/0f:00:0c:00:00/40 tag 0 ncq 2064384 out
               res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[64947.160975] ata10.00: status: { DRDY }
[64947.160977] ata10.00: failed command: WRITE FPDMA QUEUED
[64947.160980] ata10.00: cmd 61/80:08:f8:99:1d/0a:00:0c:00:00/40 tag 1 ncq 1376256 out
               res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[64947.160981] ata10.00: status: { DRDY }
[64947.160982] ata10.00: failed command: WRITE FPDMA QUEUED
[64947.160985] ata10.00: cmd 61/c0:10:78:a4:1d/0f:00:0c:00:00/40 tag 2 ncq 2064384 out
               res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[64947.160987] ata10.00: status: { DRDY }
[64947.160988] ata10.00: failed command: WRITE FPDMA QUEUED
[64947.160991] ata10.00: cmd 61/80:18:38:b4:1d/0a:00:0c:00:00/40 tag 3 ncq 1376256 out
               res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[64947.160992] ata10.00: status: { DRDY }
[64947.160994] ata10.00: failed command: WRITE FPDMA QUEUED
[64947.160996] ata10.00: cmd 61/c0:20:b8:be:1d/0f:00:0c:00:00/40 tag 4 ncq 2064384 out
               res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
[64947.160998] ata10.00: status: { DRDY }
[64947.160999] ata10.00: failed command: WRITE FPDMA QUEUED
[64947.161002] ata10.00: cmd 61/40:28:78:ce:1d/1a:00:0c:00:00/40 tag 5 ncq 3440640 out
               res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[64947.161003] ata10.00: status: { DRDY }
[64947.161005] ata10.00: failed command: WRITE FPDMA QUEUED
[64947.161007] ata10.00: cmd 61/40:30:b8:e8:1d/1a:00:0c:00:00/40 tag 6 ncq 3440640 out
               res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[64947.161009] ata10.00: status: { DRDY }
[64947.161010]
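A minimal sketch (not from the original post) of the incremental approach that did succeed here: balancing with rising -dusage filters instead of one full balance, so each pass only rewrites partly-filled data chunks. The mount point /mnt/magenta/ and the thresholds are illustrative; the guard lets the sketch exit cleanly on machines without btrfs-progs.

```shell
#!/bin/sh
# Hedged sketch: incremental balance with rising usage filters. The mount
# point and thresholds are illustrative, not from the original post.
command -v btrfs >/dev/null 2>&1 || { echo "btrfs-progs not installed; skipping"; exit 0; }
for pct in 10 25 40 55; do
    echo "balancing data chunks under ${pct}% usage"
    # Only rewrites data chunks whose usage is below the threshold,
    # so each pass touches far less than a full balance.
    btrfs balance start -dusage="$pct" /mnt/magenta/ || break
    sync
done
```

Each pass is a smaller burst of writes than a full balance, which may matter on a drive that stalls under sustained sequential rewriting.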
Re: trying to balance, filesystem keeps going read-only.
On Sun, Nov 01, 2015 at 06:24:53AM -0500, Ken Long wrote:
> I have a file system of four 5TB drives. Well, one drive is 8TB with a
> 5TB partition.. the rest are 5TB drives. I created the initial btrfs
> file system on on drive. rsync'd data to it. added another drive.
> rsync'd data. added a third drive, rsync'd data. Added a four drive,
> trying to balance. The file system gets an error and I have to reboot
> to get the file system out of read only.
>
> I dont think it is hardware issue..but It could be... or it could be
> some kind bug in btrfs?

Looks very much like a hardware error to me. This stuff:

> [64947.160961] ata10.00: exception Emask 0x0 SAct 0x7fff SErr 0x0
> action 0x6 frozen
> [64947.160966] ata10.00: failed command: WRITE FPDMA QUEUED
> [64947.160970] ata10.00: cmd 61/c0:00:38:8a:1d/0f:00:0c:00:00/40 tag 0
> ncq 2064384 out
>          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask
> 0x4 (timeout)

is coming from the ATA layer, a couple of layers below btrfs, and would definitely indicate some kind of issue with the hardware.
> [66025.199406] ata10: softreset failed (1st FIS failed)
> [66025.199417] ata10: hard resetting link
> [66030.407703] ata10: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> [66030.407713] ata10.00: link online but device misclassified
> [66030.407746] ata10: EH complete
> [66030.408360] sd 9:0:0:0: [sdg] tag#16 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [66030.408363] sd 9:0:0:0: [sdg] tag#16 CDB: Write(16) 8a 00 00 00 00 00 09 a4 bf 80 00 00 49 80 00 00
> [66030.408365] blk_update_request: I/O error, dev sdg, sector 161791872
> [66030.408369] BTRFS: bdev /dev/sdg errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
> [66030.408439] BTRFS: bdev /dev/sdg errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
> [66030.408537] BTRFS: bdev /dev/sdg errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
> [66030.408643] BTRFS: bdev /dev/sdg errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
> [66030.408768] BTRFS: bdev /dev/sdg errs: wr 5, rd 0, flush 0, corrupt 0, gen 0
> [66030.408880] BTRFS: bdev /dev/sdg errs: wr 6, rd 0, flush 0, corrupt 0, gen 0
> [66030.408985] BTRFS: bdev /dev/sdg errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
> [66030.409082] BTRFS: bdev /dev/sdg errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
> [66030.409180] BTRFS: bdev /dev/sdg errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
> [66030.409284] BTRFS: bdev /dev/sdg errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
> [66030.409847] sd 9:0:0:0: [sdg] tag#17 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [66030.409850] sd 9:0:0:0: [sdg] tag#17 CDB: Write(16) 8a 00 00 00 00 00 09 a5 09 00 00 00 44 40 00 00
> [66030.409851] blk_update_request: I/O error, dev sdg, sector 161810688
> [66030.411235] sd 9:0:0:0: [sdg] tag#18 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [66030.411238] sd 9:0:0:0: [sdg] tag#18 CDB: Write(16) 8a 00 00 00 00 00 09 a5 4d 40 00 00 49 80 00 00
> [66030.411239] blk_update_request: I/O error, dev sdg, sector 161828160
> [66030.412695] sd 9:0:0:0: [sdg] tag#19 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [66030.412697] sd 9:0:0:0: [sdg] tag#19 CDB: Write(16) 8a 00 00 00 00 00 09 a5 96 c0 00 00 49 80 00 00
> [66030.412699] blk_update_request: I/O error, dev sdg, sector 161846976
> [66030.414113] sd 9:0:0:0: [sdg] tag#20 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [66030.414115] sd 9:0:0:0: [sdg] tag#20 CDB: Write(16) 8a 00 00 00 00 00 09 a5 e0 40 00 00 1f 80 00 00
> [66030.414117] blk_update_request: I/O error, dev sdg, sector 161865792
> [66030.414755] sd 9:0:0:0: [sdg] tag#21 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [66030.414758] sd 9:0:0:0: [sdg] tag#21 CDB: Write(16) 8a 00 00 00 00 00 09 a5 ff c0 00 00 15 00 00 00
> [66030.414759] blk_update_request: I/O error, dev sdg, sector 161873856
> [66030.415205] sd 9:0:0:0: [sdg] tag#22 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [66030.415207] sd 9:0:0:0: [sdg] tag#22 CDB: Write(16) 8a 00 00 00 00 00 09 a6 14 c0 00 00 44 40 00 00
> [66030.415208] blk_update_request: I/O error, dev sdg, sector 161879232
> [66030.416562] sd 9:0:0:0: [sdg] tag#23 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [66030.416564] sd 9:0:0:0: [sdg] tag#23 CDB: Write(16) 8a 00 00 00 00 00 09 a6 59 00 00 00 44 40 00 00
> [66030.416572] blk_update_request: I/O error, dev sdg, sector 161896704
> [66030.417922] sd 9:0:0:0: [sdg] tag#24 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [66030.417924] sd 9:0:0:0: [sdg] tag#24 CDB: Write(16) 8a 00 00 00 00 00 09 a6 9d 40 00 00 49 80 00 00
> [66030.417926] blk_update_request: I/O error, dev sdg, sector 161914176
> [66030.419365] sd 9:0:0:0: [sdg] tag#25 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [66030.419368] sd 9:0:0:0: [sdg] tag#25 CDB: Write(16) 8a 00 00 00 00 00 09 a6 e6 c0 00 00 49 80 00 00

Here, we've got
Re: Crash during mount -o degraded, kernel BUG at fs/btrfs/extent_io.c:2044
This is misleading: these error messages might make one think that the 4th drive is bad and has to be replaced, which would reduce the redundancy to the minimum, because it's the second drive that's actually bad.

The following RFC will solve the misleading part of the problem:

  [RFC PATCH] Btrfs: fix fs logging for multi device

Thanks, Anand
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: trying to balance, filesystem keeps going read-only.
I get a similar read-only status when I try to remove the drive from the array. Too bad the utility's function can't be slowed down to avoid triggering this error?

I had some success putting data *onto* the drive by cronning sync every two seconds in a different terminal.

Doesn't seem to be fixed yet: https://bugzilla.kernel.org/show_bug.cgi?id=93581

On Sun, Nov 1, 2015 at 9:17 AM, Roman Mamedov wrote:
> On Sun, 1 Nov 2015 09:07:08 -0500
> Ken Long wrote:
>
>> Yes, the one drive is that Seagate 8TB drive..
>>
>> Smart tools doesn't show anything outrageous or obvious in hardware.
>>
>> Is there any other info I can provide to isolate, troubleshoot further?
>>
>> I'm not sure how to correlate the dmesg message to a specific drive,
>> SATA cable etc..
>
> See this discussion: http://www.spinics.net/lists/linux-btrfs/msg48054.html
>
> My guess is these drives need to do a lot of housekeeping internally,
> especially during heavy write load or random writes, and do not reply to the
> host machine in time, which translates into those "frozen [...] failed
> command: WRITE FPDMA QUEUED" failures.
>
> I did not follow the issue closely enough to know if there's a solution yet,
> or even if this is specific to Btrfs or to GNU/Linux in general. Maybe your
> best bet would be to avoid using that drive in your Btrfs array altogether
> for the time being.
>
> --
> With respect,
> Roman
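The every-two-seconds sync workaround described above can be sketched as a background loop rather than a cron entry (a hedged sketch; the long-running copy is a placeholder here, substitute the real rsync or balance command):

```shell
#!/bin/sh
# Hedged sketch of the periodic-sync workaround: flush dirty pages every
# two seconds in the background so the drive's write backlog stays small.
while :; do sync; sleep 2; done &
SYNC_PID=$!

sleep 1   # placeholder for the long-running rsync/balance

# Stop the background sync loop once the workload is finished.
kill "$SYNC_PID" 2>/dev/null
wait "$SYNC_PID" 2>/dev/null
echo "done"
```

The idea is the same as the cron approach: smaller, more frequent writeback bursts instead of letting a large dirty backlog hit the drive at once.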
Re: Crash during mount -o degraded, kernel BUG at fs/btrfs/extent_io.c:2044
On 11/01/2015 04:22 AM, Duncan wrote:
> So what btrfs is logging to dmesg on mount here are the historical error
> counts, in this case expected as they were deliberate during your test,
> nearly 200K of them, not one or more new errors. To have btrfs report
> these at the CLI, use btrfs device stats. To zero

Thanks for clarifying. I forgot to check btrfs dev stats. That explains it.

The 2-drive failure scenario still caused data corruption with 4.3-rc7, though.

Philip
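The `btrfs device stats` counters mentioned above can be filtered for anything nonzero with a small awk pipeline. A hedged sketch: the sample output is embedded here so the filter can be tried without a btrfs mount; on a real system you would pipe `btrfs device stats /mnt` into the same awk.

```shell
#!/bin/sh
# Hedged sketch: flag nonzero per-device error counters. The sample input
# mimics the usual `btrfs device stats` output format; values illustrative.
cat <<'EOF' | awk '$2 > 0 { print "nonzero:", $1, $2 }'
[/dev/sdg].write_io_errs   10
[/dev/sdg].read_io_errs    0
[/dev/sdg].flush_io_errs   0
[/dev/sdg].corruption_errs 0
[/dev/sdg].generation_errs 0
EOF
```

With the sample input this prints only the write_io_errs line, matching the historical-counter behavior Duncan describes: the counters persist until explicitly zeroed.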
Re: trying to balance, filesystem keeps going read-only.
On Sun, 1 Nov 2015 06:24:53 -0500 Ken Long wrote:
> Well, one drive is 8TB with a 5TB partition.

Is this by any chance a Seagate "SMR" drive? From what I remember seeing on the list, those do not work well with Btrfs currently, with symptoms very similar to what you're seeing.

--
With respect,
Roman
Re: trying to balance, filesystem keeps going read-only.
Yes, the one drive is that Seagate 8TB drive.

Smart tools don't show anything outrageous or obvious in hardware.

Is there any other info I can provide to isolate and troubleshoot further?

I'm not sure how to correlate the dmesg messages to a specific drive, SATA cable, etc.

On Sun, Nov 1, 2015 at 8:48 AM, Roman Mamedov wrote:
> On Sun, 1 Nov 2015 06:24:53 -0500
> Ken Long wrote:
>
>> Well, one drive is 8TB with a 5TB partition.
>
> Is this by any chance a Seagate "SMR" drive? From what I remember seeing on
> the list, those do not work well with Btrfs currently, with symptoms very
> similar to what you're seeing.
>
> --
> With respect,
> Roman
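On the correlation question above: on libata systems the resolved sysfs path of each /dev/sdX block device usually contains the "ataN" port name that appears in dmesg, so a hedged sketch like this can map one to the other (it prints nothing on machines with no SATA disks; the exact sysfs layout varies by kernel):

```shell
#!/bin/sh
# Hedged sketch: map dmesg "ataN" port names to /dev/sdX device names by
# looking for the ataN component in each disk's resolved sysfs path.
for dev in /sys/block/sd*; do
    [ -e "$dev" ] || continue
    link=$(readlink -f "$dev")
    # libata-attached disks typically have ".../ataN/host.../sdX" here.
    ata=$(printf '%s\n' "$link" | grep -o 'ata[0-9]*' | head -n 1)
    [ -n "$ata" ] && printf '%s -> /dev/%s\n' "$ata" "${dev##*/}"
done
exit 0
```

Running it on the machine in question should show which /dev/sdX sits on ata10, which is the drive (or cable/port) the timeouts point at.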
Re: trying to balance, filesystem keeps going read-only.
On Sun, 1 Nov 2015 09:07:08 -0500 Ken Long wrote:
> Yes, the one drive is that Seagate 8TB drive..
>
> Smart tools doesn't show anything outrageous or obvious in hardware.
>
> Is there any other info I can provide to isolate, troubleshoot further?
>
> I'm not sure how to correlate the dmesg message to a specific drive,
> SATA cable etc..

See this discussion: http://www.spinics.net/lists/linux-btrfs/msg48054.html

My guess is these drives need to do a lot of housekeeping internally, especially during heavy write load or random writes, and do not reply to the host machine in time, which translates into those "frozen [...] failed command: WRITE FPDMA QUEUED" failures.

I did not follow the issue closely enough to know if there's a solution yet, or even if this is specific to Btrfs or to GNU/Linux in general. Maybe your best bet would be to avoid using that drive in your Btrfs array altogether for the time being.

--
With respect,
Roman
Re: BTRFS raid 5/6 status
I've looked into snap-raid and it seems well suited to my needs as most of the data is static. I'm planning on using it in conjunction with mhddfs so all drives are seen as a single storage pool. Is there then any benefit in using Btrfs as the underlying filesystem on each of the drives?
RE: [PATCH 5/6] btrfs-progs: free comparer_set in cmd_qgroup_show
Hi, David Sterba

> -----Original Message-----
> From: David Sterba [mailto:dste...@suse.cz]
> Sent: Friday, October 30, 2015 9:36 PM
> To: Zhao Lei
> Cc: linux-btrfs@vger.kernel.org
> Subject: Re: [PATCH 5/6] btrfs-progs: free comparer_set in cmd_qgroup_show
>
> On Thu, Oct 29, 2015 at 05:31:47PM +0800, Zhao Lei wrote:
> > comparer_set, which was allocated by malloc(), should be free before
> > function return.
> >
> > Signed-off-by: Zhao Lei
> > ---
> >  cmds-qgroup.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/cmds-qgroup.c b/cmds-qgroup.c
> > index a64b716..f069d32 100644
> > --- a/cmds-qgroup.c
> > +++ b/cmds-qgroup.c
> > @@ -290,7 +290,7 @@ static int cmd_qgroup_show(int argc, char **argv)
> >  	int filter_flag = 0;
> >  	unsigned unit_mode;
> >
> > -	struct btrfs_qgroup_comparer_set *comparer_set;
> > +	struct btrfs_qgroup_comparer_set *comparer_set = NULL;
> >  	struct btrfs_qgroup_filter_set *filter_set;
> >  	filter_set = btrfs_qgroup_alloc_filter_set();
> >  	comparer_set = btrfs_qgroup_alloc_comparer_set();
> > @@ -372,6 +372,8 @@ static int cmd_qgroup_show(int argc, char **argv)
> >  		fprintf(stderr, "ERROR: can't list qgroups: %s\n",
> >  			strerror(e));
> >
> > +	free(comparer_set);
>
> Doh, coverity correctly found that comparer_set is freed inside
> btrfs_show_qgroups() a few lines above. Patch dropped.

My bad. This problem was found on my node by valgrind memcheck; maybe it is not freed in some case, or it's a valgrind misreport. I'll check it deeply.

Thanks
Zhaolei
Re: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete
Mark Fasheh wrote on 2015/09/22 13:15 -0700:
> Commit 0ed4792 ('btrfs: qgroup: Switch to new extent-oriented qgroup
> mechanism.') removed our qgroup accounting during btrfs_drop_snapshot().
> Predictably, this results in qgroup numbers going bad shortly after a
> snapshot is removed.
>
> Fix this by adding a dirty extent record when we encounter extents during
> our shared subtree walk. This effectively restores the functionality we
> had with the original shared subtree walking code in 1152651 (btrfs:
> qgroup: account shared subtrees during snapshot delete).
>
> The idea with the original patch (and this one) is that shared subtrees
> can get skipped during drop_snapshot. The shared subtree walk then allows
> us a chance to visit those extents and add them to the qgroup work for
> later processing. This ultimately makes the accounting for drop snapshot
> work.
>
> The new qgroup code nicely handles all the other extents during the tree
> walk via the ref dec/inc functions so we don't have to add actions beyond
> what we had originally.
>
> Signed-off-by: Mark Fasheh

Hi Mark,

Despite the performance regression reported by Stefan Priebe, there is another problem. I'll comment inline below.

> ---
>  fs/btrfs/extent-tree.c | 41 ++---
>  1 file changed, 34 insertions(+), 7 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 3a70e6c..89be620 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -7757,17 +7757,37 @@ reada:
>  }
>
>  /*
> - * TODO: Modify related function to add related node/leaf to dirty_extent_root,
> - * for later qgroup accounting.
> - *
> - * Current, this function does nothing.
> + * These may not be seen by the usual inc/dec ref code so we have to
> + * add them here.
>   */
> +static int record_one_subtree_extent(struct btrfs_trans_handle *trans,
> +				     struct btrfs_root *root, u64 bytenr,
> +				     u64 num_bytes)
> +{
> +	struct btrfs_qgroup_extent_record *qrecord;
> +	struct btrfs_delayed_ref_root *delayed_refs;
> +
> +	qrecord = kmalloc(sizeof(*qrecord), GFP_NOFS);
> +	if (!qrecord)
> +		return -ENOMEM;
> +
> +	qrecord->bytenr = bytenr;
> +	qrecord->num_bytes = num_bytes;
> +	qrecord->old_roots = NULL;
> +
> +	delayed_refs = &trans->transaction->delayed_refs;
> +	if (btrfs_qgroup_insert_dirty_extent(delayed_refs, qrecord))
> +		kfree(qrecord);

1) Unprotected dirty_extent_root.

Unfortunately, btrfs_qgroup_insert_dirty_extent() is not protected by any lock/mutex, and I'm sorry not to have added a comment about that.

In fact, btrfs_qgroup_insert_dirty_extent() should always be used with delayed_refs->lock held, just like add_delayed_ref_head(), where every caller of add_delayed_ref_head() holds delayed_refs->lock.

So here you will need to hold delayed_refs->lock.

2) Performance regression. (Reported by Stefan Priebe)

The performance regression is not caused by your code, at least not completely. It's my fault for not adding enough comments to insert_dirty_extent(). (Just like 1, I must say I'm a bad reviewer until there is a bug report.)

As I was only expecting it to be called inside add_delayed_ref_head(), and callers of add_delayed_ref_head() judge whether qgroup is enabled before calling it, insert_dirty_extent() won't ever be called in the qgroup-disabled case.

As a result, if you want to call btrfs_qgroup_insert_dirty_extent() outside of add_delayed_ref_head(), you will need to handle the delayed_refs->lock and judge whether qgroup is enabled.

BTW, if it's OK for you, you can also further improve the performance of qgroup by using a kmem_cache for struct btrfs_qgroup_extent_record. I assume the kmalloc() may be one performance hot spot, considering how often it is called in the qgroup-enabled case.

Thanks,
Qu

> +
> +	return 0;
> +}
> +
>  static int account_leaf_items(struct btrfs_trans_handle *trans,
>  			      struct btrfs_root *root,
>  			      struct extent_buffer *eb)
>  {
>  	int nr = btrfs_header_nritems(eb);
> -	int i, extent_type;
> +	int i, extent_type, ret;
>  	struct btrfs_key key;
>  	struct btrfs_file_extent_item *fi;
>  	u64 bytenr, num_bytes;
> @@ -7790,6 +7810,10 @@ static int account_leaf_items(struct btrfs_trans_handle *trans,
>  			continue;
>
>  		num_bytes = btrfs_file_extent_disk_num_bytes(eb, fi);
> +
> +		ret = record_one_subtree_extent(trans, root, bytenr, num_bytes);
> +		if (ret)
> +			return ret;
>  	}
>  	return 0;
>  }
> @@ -7858,8 +7882,6 @@ static int adjust_slots_upwards(struct btrfs_root *root,
>
>  /*
>   * root_eb is the subtree root and is locked before this function is called.
> - * TODO: Modify this function to mark all (including complete shared node)
> - * to dirty_extent_root to allow it get accounted in qgroup.
>   */
>  static int
Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete
Stefan Priebe wrote on 2015/11/01 21:49 +0100:
> Hi,
>
> this one: http://www.spinics.net/lists/linux-btrfs/msg47377.html
>
> adds a regression to my test systems with very large disks (30tb and
> 50tb).
>
> btrfs balance is super slow afterwards while heavily making use of cp
> --reflink=always on big files (200gb - 500gb).
>
> Sorry didn't know how to correctly reply to that "old" message.
>
> Greets,
> Stefan

Thanks for the testing.

Are you using qgroup, or just doing a normal balance with qgroup disabled?

For the latter case, that should be optimized to skip the dirty extent insert in the qgroup-disabled case.

For the qgroup-enabled case, I'm afraid that's the design. As relocation will drop a subtree to relocate, and to ensure qgroup consistency, we must walk down all the tree blocks and mark them dirty for later qgroup accounting.

But there should be some hope left for optimization. For example, if all subtree blocks are already relocated, we can skip the tree walk-down routine.

Anyway, for your case of huge files, as the tree level grows rapidly, any workload involving tree iteration will be very time consuming, like snapshot deletion and relocation.

BTW, thanks for your regression report; I also found another problem with the patch. I'll reply to the author to improve the patchset.

Thanks,
Qu
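Since the overhead discussed above should only apply when qgroups are enabled, a quick way to tell which case a system is in is whether `btrfs qgroup show` succeeds on the mount. A hedged sketch (mount point illustrative; the guard keeps the check harmless where btrfs-progs is absent):

```shell
#!/bin/sh
# Hedged sketch: report whether qgroups appear to be enabled on a mount.
# `btrfs qgroup show` fails on filesystems where quota is not enabled.
MNT="${1:-/mnt}"
if ! command -v btrfs >/dev/null 2>&1; then
    echo "btrfs-progs not installed; cannot check"
elif btrfs qgroup show "$MNT" >/dev/null 2>&1; then
    echo "qgroups enabled on $MNT"
else
    echo "qgroups not enabled on $MNT"
fi
```

If this reports qgroups disabled and balance is still slow, the slowdown is the unconditional dirty-extent insertion the thread identifies, not qgroup accounting itself.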
Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete
Stefan Priebe posted on Sun, 01 Nov 2015 21:49:44 +0100 as excerpted:

> this one: http://www.spinics.net/lists/linux-btrfs/msg47377.html
>
> adds a regression to my test systems with very large disks (30tb and
> 50tb).
>
> btrfs balance is super slow afterwards while heavily making use of cp
> --reflink=always on big files (200gb - 500gb).
>
> Sorry didn't know how to correctly reply to that "old" message.

Just on the message-reply bit...

Gmane.org carries this list (among many), archiving the posts with both nntp/news and http/web interfaces. Both the web and news interfaces normally allow replies to both old and current messages via the gmane gateway forwarding to the list, tho the first time you reply to a list via gmane, it'll respond with a confirmation to the email address you used, requiring you to reply to that before forwarding the mail on to the list. If you don't reply within a week, the message is dropped. However, at least for the news interface (not sure about the web interface), you only have to confirm for a particular list/newsgroup once; after that, it forwards to the list without further confirmations.

That's how I follow all my lists, reading and replying to them as newsgroups via the gmane list2news interface. See http://gmane.org for more info.

The one caveat is that while on a lot of lists replies to the list only are the norm, on the Linux kernel and vger.kernel.org hosted lists (including this one), replying to all, list and previous posters, is the norm, and I'm not sure if the web interface allows that. On the news interface it of course depends on your news client -- mine is more adapted to news than mail, and while it allows forwarding to your normal mail client for the mail side, normal followups are to news only, and it's not easy to reply to all, so I generally reply to list (as newsgroup) only, unless a poster specifically requests to be CCed on replies.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete
Hi,

this one: http://www.spinics.net/lists/linux-btrfs/msg47377.html

adds a regression to my test systems with very large disks (30tb and 50tb).

btrfs balance is super slow afterwards while heavily making use of cp --reflink=always on big files (200gb - 500gb).

Sorry, didn't know how to correctly reply to that "old" message.

Greets,
Stefan
Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete
On 02.11.2015 at 02:34, Qu Wenruo wrote:
> Stefan Priebe wrote on 2015/11/01 21:49 +0100:
>> Hi,
>>
>> this one: http://www.spinics.net/lists/linux-btrfs/msg47377.html
>>
>> adds a regression to my test systems with very large disks (30tb and
>> 50tb).
>>
>> btrfs balance is super slow afterwards while heavily making use of cp
>> --reflink=always on big files (200gb - 500gb).
>>
>> Sorry didn't know how to correctly reply to that "old" message.
>>
>> Greets,
>> Stefan
>
> Thanks for the testing.
>
> Are you using qgroup or just doing normal balance with qgroup disabled?

Just doing a normal balance with qgroup disabled.

> For the latter case, that should be optimized to skip the dirty extent
> insert in the qgroup-disabled case.
>
> For the qgroup-enabled case, I'm afraid that's the design. As relocation
> will drop a subtree to relocate, and to ensure qgroup consistency, we
> must walk down all the tree blocks and mark them dirty for later qgroup
> accounting.
>
> But there should be some hope left for optimization. For example, if all
> subtree blocks are already relocated, we can skip the tree walk-down
> routine.
>
> Anyway, for your case of huge files, as the tree level grows rapidly,
> any workload involving tree iteration will be very time consuming, like
> snapshot deletion and relocation.
>
> BTW, thanks for your regression report; I also found another problem
> with the patch. I'll reply to the author to improve the patchset.

Thanks,
Stefan