Re: exclusive subvolume space missing
Tomasz Pala posted on Sat, 02 Dec 2017 01:53:39 +0100 as excerpted:

> # btrfs fi usage /
> Overall:
>     Device size:         128.00GiB
>     Device allocated:    117.19GiB
>     Device unallocated:   10.81GiB
>     Device missing:          0.00B
>     Used:                103.56GiB
>     Free (estimated):     11.19GiB  (min: 11.14GiB)
>     Data ratio:               1.98
>     Metadata ratio:           2.00
>     Global reserve:      146.08MiB  (used: 0.00B)
>
> Data,single: Size:1.19GiB, Used:1.18GiB
>    /dev/sda2    1.07GiB
>    /dev/sdb2  132.00MiB
>
> Data,RAID1: Size:55.97GiB, Used:50.30GiB
>    /dev/sda2   55.97GiB
>    /dev/sdb2   55.97GiB
>
> Metadata,RAID1: Size:2.00GiB, Used:908.61MiB
>    /dev/sda2    2.00GiB
>    /dev/sdb2    2.00GiB
>
> System,RAID1: Size:32.00MiB, Used:16.00KiB
>    /dev/sda2   32.00MiB
>    /dev/sdb2   32.00MiB
>
> Unallocated:
>    /dev/sda2    4.93GiB
>    /dev/sdb2    5.87GiB

OK, is this supposed to be raid1 or single data? The above shows metadata as all raid1, while some data is single tho most is raid1, and while old mkfs used to create unused single chunks on raid1 that had to be removed manually via balance, those single data chunks aren't unused. Which means that if it's supposed to be raid1, you don't have redundancy on that single data.

Assuming the intent is raid1, I'd recommend doing...

btrfs balance start -dconvert=raid1,soft /

Probably disable quotas at least temporarily while you do so, tho, as they don't scale well with balance and make it take much longer. That should go reasonably fast as it's only a bit over 1 GiB on the one device, and 132 MiB on the other (from your btrfs device usage), and the soft filter allows it to skip chunks that don't need conversion. It should kill those single entries and even up usage on both devices, along with making the filesystem much more tolerant of loss of one of the two devices.

Other than that, what we can see from the above is that it's a relatively small filesystem, 64 GiB each on a pair of devices, raid1 but for the above. We also see that the allocated-chunks vs. chunk-usage spread isn't /too/ bad, that being a somewhat common problem.
However, given the relatively small 64 GiB per device pair-device raid1 filesystem, there is some slack, about 5 GiB worth, in that raid1 data, that you can recover:

btrfs balance start -dusage=N /

Where N represents a percentage full, so 0-100. Smaller values of N normally complete much faster, with the most effect if they're enough, because at say 10% usage, ten 90%-empty chunks can be rewritten into a single 100%-full chunk. The idea is to start with a small N value since it completes fast, and redo with higher values as necessary to shrink the total data chunk allocation toward the actual usage.

I too run relatively small btrfs raid1s and would suggest trying N=5, 20, 40, 70, until the spread between used and total is under 2 GiB, under a GiB if you want to go that far (nominal data chunk size is 1 GiB, so even a full balance is unlikely to get the spread below that). Over 70 likely won't gain you much, so isn't worth it.

That should return the excess to unallocated, leaving the filesystem able to use the freed space for data or metadata chunks as necessary, tho you're unlikely to see an increase in available space in (non-btrfs) df or similar.

If the unallocated value drops below 1 GiB you may have trouble freeing space, since balance needs room to write the replacement chunk before it can free the others, so you probably want to keep an eye on it and rebalance whenever unallocated falls under 2-3 GiB, assuming of course that there's slack between used and total that /can/ be freed by a rebalance.

FWIW the same can be done with metadata using -musage=, with metadata chunks being 256 MiB nominal, but keep in mind that global reserve is allocated from metadata space yet doesn't count as used, so you typically can't get the metadata spread below half a GiB or so. And in most cases it's data chunks that develop the big spread, not metadata, so it's much more common to need -d for data than -m for metadata.
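The used-vs-total spread such a balance can reclaim is just arithmetic over the chunk lines. As a rough sketch (a hypothetical helper, not part of btrfs-progs), parsing the `Data,RAID1: Size:... Used:...` line from the `btrfs fi usage` output quoted above:

```python
import re

def chunk_slack(usage_output):
    """Per-profile (size, used, slack) in GiB, from 'btrfs fi usage'-style lines."""
    slack = {}
    for line in usage_output.splitlines():
        m = re.match(r"(\w+),(\w+): Size:([\d.]+)GiB, Used:([\d.]+)GiB", line.strip())
        if m:
            size, used = float(m.group(3)), float(m.group(4))
            slack[f"{m.group(1)},{m.group(2)}"] = (size, used, round(size - used, 2))
    return slack

# The Data,RAID1 line from the usage output quoted earlier in the thread:
sample = "Data,RAID1: Size:55.97GiB, Used:50.30GiB"
print(chunk_slack(sample))  # ~5.67 GiB of slack a -dusage balance could reclaim
```

That 5.67 GiB figure matches the "about 5 GiB worth" of slack estimated above.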
All that said, the numbers don't show a runaway spread between total and used, so while this might help, it's not going to fix the primary space-being-eaten problem of the thread, as I had hoped it might.

Additionally, at 2 GiB total per device, metadata chunks aren't runaway-consuming your space either, as I'd suspect they might if the problem were for instance atime updates, so while noatime is certainly recommended and might help some, it doesn't appear to be a primary contributor to the problem either.

The other possibility that comes to mind here has to do with btrfs COW write patterns... Suppose you start with a 100 MiB file (I'm adjusting the sizes down from the GiB+ example typically used due to the filesystem size
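The message is truncated here, but the COW write pattern being introduced can be sketched numerically. A toy model (a simplifying assumption, not how btrfs splits extents in every case): the original large extent stays fully allocated while any byte of it is still referenced (e.g. by a snapshot), and every small in-place overwrite allocates a fresh extent.

```python
# Simplified model of COW overwrite amplification (illustrative assumption).
MIB = 1024 * 1024

def cow_overwrite_usage(file_size_mib, overwrites, block_kib=4):
    """On-disk MiB after random in-place overwrites of a COW file whose
    original content was written as one large extent.

    The original extent stays fully allocated as long as any byte of it
    is still referenced, while every overwrite allocates a new extent.
    """
    original = file_size_mib * MIB            # pinned until fully superseded
    new_extents = overwrites * block_kib * 1024
    return (original + new_extents) / MIB

# A 100 MiB file with 10,000 distinct 4 KiB overwrites (~39 MiB written)
# can consume ~139 MiB on disk while the file itself is still 100 MiB.
print(cow_overwrite_usage(100, 10_000))
```

This is the kind of pattern (databases, VM images, systemd journals) that makes space vanish without any file growing.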
Re: exclusive subvolume space missing
On 2017年12月02日 10:21, Tomasz Pala wrote:
> On Sat, Dec 02, 2017 at 09:47:19 +0800, Qu Wenruo wrote:
>
>>> Actually I should rephrase the problem:
>>>
>>> "snapshot has taken 8 GB of space despite nothing has altered source
>>> subvolume"
>
> Actually, after:
>
> # btrfs balance start -v -dconvert=raid1 /
>   ctrl-c on block group 35G/113G
> # btrfs balance start -v -dconvert=raid1,soft /
> # btrfs balance start -v -dusage=55 /
>   Done, had to relocate 1 out of 56 chunks
> # btrfs balance start -v -musage=55 /
>   Done, had to relocate 2 out of 55 chunks
>
> and waiting a few minutes after ...the 8 GB I've lost yesterday is back:
>
> # btrfs fi sh /
> Label: none  uuid: 17a3de25-6e26-4b0b-9665-ac267f6f6c4a
>         Total devices 2  FS bytes used 44.10GiB
>         devid 1 size 64.00GiB used 54.00GiB path /dev/sda2
>         devid 2 size 64.00GiB used 54.00GiB path /dev/sdb2
>
> # btrfs fi usage /
> Overall:
>     Device size:         128.00GiB
>     Device allocated:    108.00GiB
>     Device unallocated:   20.00GiB
>     Device missing:          0.00B
>     Used:                 88.19GiB
>     Free (estimated):     18.75GiB  (min: 18.75GiB)
>     Data ratio:               2.00
>     Metadata ratio:           2.00
>     Global reserve:      131.14MiB  (used: 0.00B)
>
> Data,RAID1: Size:51.97GiB, Used:43.22GiB
>    /dev/sda2   51.97GiB
>    /dev/sdb2   51.97GiB
>
> Metadata,RAID1: Size:2.00GiB, Used:895.69MiB
>    /dev/sda2    2.00GiB
>    /dev/sdb2    2.00GiB
>
> System,RAID1: Size:32.00MiB, Used:16.00KiB
>    /dev/sda2   32.00MiB
>    /dev/sdb2   32.00MiB
>
> Unallocated:
>    /dev/sda2   10.00GiB
>    /dev/sdb2   10.00GiB
>
> # btrfs dev usage /
> /dev/sda2, ID: 1
>    Device size:      64.00GiB
>    Device slack:        0.00B
>    Data,RAID1:       51.97GiB
>    Metadata,RAID1:    2.00GiB
>    System,RAID1:     32.00MiB
>    Unallocated:      10.00GiB
>
> /dev/sdb2, ID: 2
>    Device size:      64.00GiB
>    Device slack:        0.00B
>    Data,RAID1:       51.97GiB
>    Metadata,RAID1:    2.00GiB
>    System,RAID1:     32.00MiB
>    Unallocated:      10.00GiB
>
> # btrfs fi df /
> Data, RAID1: total=51.97GiB, used=43.22GiB
> System, RAID1: total=32.00MiB, used=16.00KiB
> Metadata, RAID1: total=2.00GiB, used=895.69MiB
> GlobalReserve, single: total=131.14MiB, used=0.00B
>
> # df
> /dev/sda2   64G   45G   19G  71% /
>
> However the difference is on the active root fs:
>
> -0/291  24.29GiB   9.77GiB
> +0/291  15.99GiB  76.00MiB
>
> Still, 45G used, while there is (if I counted this correctly) 25G of data...
>
>> Then please provide correct qgroup numbers.
>>
>> The correct numbers can be obtained by:
>> # btrfs quota enable
>> # btrfs quota rescan -w
>> # btrfs qgroup show -prce --sync
>
> OK, just added the --sort=excl:
>
> qgroupid  rfer      excl      max_rfer  max_excl  parent  child
> --------  --------  --------  --------  --------  ------  -----
> 0/5       16.00KiB  16.00KiB  none      none      ---     ---
> 0/361     22.57GiB   7.00MiB  none      none      ---     ---
> 0/358     22.54GiB   7.50MiB  none      none      ---     ---
> 0/343     22.36GiB   7.84MiB  none      none      ---     ---
> 0/345     22.49GiB   8.05MiB  none      none      ---     ---
> 0/357     22.50GiB   9.27MiB  none      none      ---     ---
> 0/360     22.57GiB  10.27MiB  none      none      ---     ---
> 0/344     22.48GiB  11.09MiB  none      none      ---     ---
> 0/359     22.55GiB  12.57MiB  none      none      ---     ---
> 0/362     22.59GiB  22.96MiB  none      none      ---     ---
> 0/302     12.87GiB  31.23MiB  none      none      ---     ---
> 0/428     15.96GiB  38.68MiB  none      none      ---     ---
> 0/294     11.09GiB  47.86MiB  none      none      ---     ---
> 0/336     21.80GiB  49.59MiB  none      none      ---     ---
> 0/300     12.56GiB  51.43MiB  none      none      ---     ---
> 0/342     22.31GiB  52.93MiB  none      none      ---     ---
> 0/333     21.71GiB  54.54MiB  none      none      ---     ---
> 0/363     22.63GiB  58.83MiB  none      none      ---     ---
> 0/370     23.27GiB  59.46MiB  none      none      ---     ---
> 0/305     13.01GiB  61.47MiB  none      none      ---     ---
> 0/331     21.61GiB  61.49MiB  none      none      ---     ---
> 0/334     21.78GiB  62.95MiB  none      none      ---     ---
> 0/306     13.04GiB  64.11MiB  none      none      ---     ---
> 0/304     12.96GiB  64.90MiB  none      none      ---
Re: exclusive subvolume space missing
On Sat, Dec 02, 2017 at 09:47:19 +0800, Qu Wenruo wrote:
>> Actually I should rephrase the problem:
>>
>> "snapshot has taken 8 GB of space despite nothing has altered source
>> subvolume"

Actually, after:

# btrfs balance start -v -dconvert=raid1 /
  ctrl-c on block group 35G/113G
# btrfs balance start -v -dconvert=raid1,soft /
# btrfs balance start -v -dusage=55 /
  Done, had to relocate 1 out of 56 chunks
# btrfs balance start -v -musage=55 /
  Done, had to relocate 2 out of 55 chunks

and waiting a few minutes after ...the 8 GB I've lost yesterday is back:

# btrfs fi sh /
Label: none  uuid: 17a3de25-6e26-4b0b-9665-ac267f6f6c4a
        Total devices 2  FS bytes used 44.10GiB
        devid 1 size 64.00GiB used 54.00GiB path /dev/sda2
        devid 2 size 64.00GiB used 54.00GiB path /dev/sdb2

# btrfs fi usage /
Overall:
    Device size:         128.00GiB
    Device allocated:    108.00GiB
    Device unallocated:   20.00GiB
    Device missing:          0.00B
    Used:                 88.19GiB
    Free (estimated):     18.75GiB  (min: 18.75GiB)
    Data ratio:               2.00
    Metadata ratio:           2.00
    Global reserve:      131.14MiB  (used: 0.00B)

Data,RAID1: Size:51.97GiB, Used:43.22GiB
   /dev/sda2   51.97GiB
   /dev/sdb2   51.97GiB

Metadata,RAID1: Size:2.00GiB, Used:895.69MiB
   /dev/sda2    2.00GiB
   /dev/sdb2    2.00GiB

System,RAID1: Size:32.00MiB, Used:16.00KiB
   /dev/sda2   32.00MiB
   /dev/sdb2   32.00MiB

Unallocated:
   /dev/sda2   10.00GiB
   /dev/sdb2   10.00GiB

# btrfs dev usage /
/dev/sda2, ID: 1
   Device size:      64.00GiB
   Device slack:        0.00B
   Data,RAID1:       51.97GiB
   Metadata,RAID1:    2.00GiB
   System,RAID1:     32.00MiB
   Unallocated:      10.00GiB

/dev/sdb2, ID: 2
   Device size:      64.00GiB
   Device slack:        0.00B
   Data,RAID1:       51.97GiB
   Metadata,RAID1:    2.00GiB
   System,RAID1:     32.00MiB
   Unallocated:      10.00GiB

# btrfs fi df /
Data, RAID1: total=51.97GiB, used=43.22GiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=2.00GiB, used=895.69MiB
GlobalReserve, single: total=131.14MiB, used=0.00B

# df
/dev/sda2   64G   45G   19G  71% /

However the difference is on the active root fs:

-0/291  24.29GiB   9.77GiB
+0/291  15.99GiB  76.00MiB

Still, 45G used, while there is (if I counted this correctly) 25G of data...

> Then please provide correct qgroup numbers.
>
> The correct numbers can be obtained by:
> # btrfs quota enable
> # btrfs quota rescan -w
> # btrfs qgroup show -prce --sync

OK, just added the --sort=excl:

qgroupid  rfer      excl      max_rfer  max_excl  parent  child
--------  --------  --------  --------  --------  ------  -----
0/5       16.00KiB  16.00KiB  none      none      ---     ---
0/361     22.57GiB   7.00MiB  none      none      ---     ---
0/358     22.54GiB   7.50MiB  none      none      ---     ---
0/343     22.36GiB   7.84MiB  none      none      ---     ---
0/345     22.49GiB   8.05MiB  none      none      ---     ---
0/357     22.50GiB   9.27MiB  none      none      ---     ---
0/360     22.57GiB  10.27MiB  none      none      ---     ---
0/344     22.48GiB  11.09MiB  none      none      ---     ---
0/359     22.55GiB  12.57MiB  none      none      ---     ---
0/362     22.59GiB  22.96MiB  none      none      ---     ---
0/302     12.87GiB  31.23MiB  none      none      ---     ---
0/428     15.96GiB  38.68MiB  none      none      ---     ---
0/294     11.09GiB  47.86MiB  none      none      ---     ---
0/336     21.80GiB  49.59MiB  none      none      ---     ---
0/300     12.56GiB  51.43MiB  none      none      ---     ---
0/342     22.31GiB  52.93MiB  none      none      ---     ---
0/333     21.71GiB  54.54MiB  none      none      ---     ---
0/363     22.63GiB  58.83MiB  none      none      ---     ---
0/370     23.27GiB  59.46MiB  none      none      ---     ---
0/305     13.01GiB  61.47MiB  none      none      ---     ---
0/331     21.61GiB  61.49MiB  none      none      ---     ---
0/334     21.78GiB  62.95MiB  none      none      ---     ---
0/306     13.04GiB  64.11MiB  none      none      ---     ---
0/304     12.96GiB  64.90MiB  none      none      ---     ---
0/303     12.94GiB  68.39MiB  none      none      ---     ---
0/367     23.20GiB  68.52MiB  none      none      ---     ---
0/366     23.22GiB  69.79MiB  none      none      ---     ---
0/364     22.63GiB  72.03MiB
Re: exclusive subvolume space missing
On 2017年12月02日 09:43, Tomasz Pala wrote:
> On Sat, Dec 02, 2017 at 09:05:50 +0800, Qu Wenruo wrote:
>
>>> qgroupid  rfer      excl
>>>
>>> 0/260  12.25GiB   3.22GiB  from 170712 - first snapshot
>>> 0/312  17.54GiB   4.56GiB  from 170811
>>> 0/366  25.59GiB   2.44GiB  from 171028
>>> 0/370  23.27GiB  59.46MiB  from 18 - prev snapshot
>>> 0/388  21.69GiB   7.16GiB  from 171125 - last snapshot
>>> 0/291  24.29GiB   9.77GiB  default subvolume
>>
>> You may need to manually sync the filesystem (trigger a transaction
>> commitment) to update qgroup accounting.
>
> The data I've pasted were just calculated.
>
>>> # btrfs quota enable /
>>> # btrfs qgroup show /
>>> WARNING: quota disabled, qgroup data may be out of date
>>> [...]
>>> # btrfs quota enable /    - for the second time!
>>> # btrfs qgroup show /
>>> WARNING: qgroup data inconsistent, rescan recommended
>>
>> Please wait for the rescan, or no number is correct.
>
> Here I was pointing out that the first "quota enable" resulted in a
> "quota disabled" warning until I enabled it once again.
>
>> It's highly recommended to read btrfs-quota(8) and btrfs-qgroup(8) to
>> ensure you understand all the limitations.
>
> I probably won't understand them all, but this is not an issue of my
> concern as I don't use quotas. There is simply no other way I am aware of
> that could show me per-subvolume stats. Well, no straightforward way, as
> the hard way I'm using (btrfs send) confirms the problem.

Unfortunately, send doesn't count everything. The most common case: send doesn't count extent booking space.

Try the following (file and mount-point names here are only placeholders):

# fallocate -l 1G disk.img
# mkfs.btrfs -f disk.img
# mount disk.img /mnt
# btrfs subv create /mnt/subv1
# xfs_io -f -c "pwrite 0 128M" -c "sync" /mnt/subv1/file1
# xfs_io -c "fpunch 0 127M" -c "sync" /mnt/subv1/file1
# btrfs subv snapshot -r /mnt/subv1 /mnt/snapshot
# btrfs send /mnt/snapshot

You will only get the 1M of data, while it still takes 128M of space on-disk. Btrfs extent bookkeeping will only free a whole extent if and only if no inode refers to *ANY* part of the extent. Even if only 1M of a 128M file extent is still used, it will still take 128M of space on-disk.

And that's what send can't tell you, but qgroup can. That's also why I need *CORRECT* qgroup numbers to further investigate the problem.

> You could simply remove all the quota results I've posted and there will
> still be the underlying problem, that the 25 GB of data I got occupies 52 GB.

If you only want to know why your "25G" of data occupies 52G on disk, the above is one of the possible explanations. (And I think I should put it into btrfs(5), although I highly doubt whether users will really read it.)

You could try to defrag, but I'm not sure defrag works well in the multi-subvolume case.

> At least one recent snapshot, that was taken after some minor (<100 MB)
> changes from the subvolume, that has undergone some minor changes since
> then, occupied 8 GB during one night when the entire system was idling.

The only method to fully isolate all the disturbing factors is to get rid of snapshots. Build the subvolume from scratch (not even cp --reflink from another subvolume), then test what happens. Only in that case can you trust vanilla du (if you don't do any reflinking). And although you can always trust qgroup numbers, a subvolume built from scratch makes the exclusive number equal to the referenced number, making debugging a little easier.

Thanks,
Qu

> This was crosschecked on file metadata (mtimes compared) and 'du' results.
>
> As a last resort I've rebalanced the disk (once again), this time with
> -dconvert=raid1 (to get rid of the single residue).
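Qu's bookkeeping rule can be expressed as a tiny model (an illustrative sketch, not btrfs code): an extent contributes its full size to on-disk usage as long as any byte of it is still referenced.

```python
# Sketch of the rule above (illustrative assumption, not btrfs code):
# an extent's space is freed only when no inode references *any* part of it.

M = 1024 * 1024

def on_disk_bytes(extents):
    """extents: list of (extent_size, referenced_bytes) pairs."""
    return sum(size for size, referenced in extents if referenced > 0)

# One 128M extent with only 1M still referenced after the fpunch:
print(on_disk_bytes([(128 * M, 1 * M)]) // M)  # MiB actually held on disk
```

send, by contrast, only transfers the bytes that are still referenced, which is why send-based accounting under-reports on-disk usage.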
Re: exclusive subvolume space missing
On 2017年12月02日 09:23, Tomasz Pala wrote:
> On Sat, Dec 02, 2017 at 08:27:56 +0800, Qu Wenruo wrote:
>
>> I assume there is a program eating up the space.
>> Not btrfs itself.
>
> Very doubtful. I've encountered an ext3 "eating" problem once, that
> couldn't be found by lsof on a 3.4.75 kernel, but the space was returned
> after killing Xorg. The system I'm having the problem on now is very
> recent; the space doesn't return after reboot/emergency and doesn't sum
> up with the files.

Unlike vanilla df or "fi usage" or "fi df", btrfs quota only counts on-disk extents. That's to say, reserved space won't contribute to qgroup numbers.

Unless one is using an anonymous file, which is opened but unlinked, so no one can access it except the owner (which I doubt is your case), quota should be the best tool to debug your problem. (As long as you follow the various limitations of btrfs quota; in particular you need to sync, or use the --sync option, when showing qgroup numbers.)

>>> Now, the weird part for me is the exclusive data count:
>>>
>>> # btrfs sub sh ./snapshot-171125
>>> [...]
>>> Subvolume ID: 388
>>> # btrfs fi du -s ./snapshot-171125
>>>      Total   Exclusive  Set shared  Filename
>>>   21.50GiB    63.35MiB    20.77GiB  snapshot-171125
>>
>> That's the difference between how sub show and quota work.
>>
>> For quota, it's a per-root owner check.
>
> Just to be clear: I've enabled quota _only_ to see subvolume usage on the
> spot. And exclusive data - the more detailed approach I've described in
> the e-mail I sent a minute ago.
>
>> That means even if a file extent is shared between different inodes, if
>> all the inodes are inside the same subvolume, it's counted as exclusive.
>> And if any of the file extents belongs to another subvolume, then it's
>> counted as shared.
>
> Good to know, but this is an almost UID0-only system. There are system
> users (vendor provided) and 2 ssh accounts for su, but nobody uses this
> machine for daily work. The quota values were the last tool I could find
> to debug with.
>
>> For fi du, it's a per-inode owner check. (The exact behavior is a little
>> more complex; I'll skip such corner cases to make it a little easier to
>> understand.)
>>
>> That's to say, if one file extent is shared by different inodes, then
>> it's counted as shared, no matter whether those inodes belong to
>> different subvolumes or the same one.
>>
>> So "fi du" has a looser condition for the "shared" calculation, and that
>> should explain why you have 20+G shared.
>
> There shouldn't be many multi-inode extents inside a single subvolume, as
> this is a mostly fresh system, with no containers, no deduplication;
> snapshots are taken from the same running system before or after some more
> important change is done. By 'change' I mean altering text config files
> mostly (plus etckeeper's git metadata), so the volume of difference is
> extremely low. Actually most of the diffs between subvolumes come from
> updating distro packages. There were not many reflink copies made on this
> partition, only one kernel source compiled (.ccache files removed today).
> So this partition is as clean as it could be after almost 5 months in use.
>
> Actually I should rephrase the problem:
>
> "snapshot has taken 8 GB of space despite nothing has altered source
> subvolume"

Then please provide correct qgroup numbers.

The correct numbers can be obtained by:

# btrfs quota enable
# btrfs quota rescan -w
# btrfs qgroup show -prce --sync

Rescan and --sync are important for getting correct numbers (while rescan can take a long, long time to finish).

Furthermore, please ensure that all deleted files are really deleted. Btrfs delays file and subvolume deletion, so you may need to sync several times, or use "btrfs subv sync", to ensure deleted files are gone. (Vanilla du won't tell you whether such delayed deletion is really done.)

Thanks,
Qu
Re: exclusive subvolume space missing
On Sat, Dec 02, 2017 at 09:05:50 +0800, Qu Wenruo wrote:
>> qgroupid  rfer      excl
>>
>> 0/260  12.25GiB   3.22GiB  from 170712 - first snapshot
>> 0/312  17.54GiB   4.56GiB  from 170811
>> 0/366  25.59GiB   2.44GiB  from 171028
>> 0/370  23.27GiB  59.46MiB  from 18 - prev snapshot
>> 0/388  21.69GiB   7.16GiB  from 171125 - last snapshot
>> 0/291  24.29GiB   9.77GiB  default subvolume
>
> You may need to manually sync the filesystem (trigger a transaction
> commitment) to update qgroup accounting.

The data I've pasted were just calculated.

>> # btrfs quota enable /
>> # btrfs qgroup show /
>> WARNING: quota disabled, qgroup data may be out of date
>> [...]
>> # btrfs quota enable /    - for the second time!
>> # btrfs qgroup show /
>> WARNING: qgroup data inconsistent, rescan recommended
>
> Please wait for the rescan, or no number is correct.

Here I was pointing out that the first "quota enable" resulted in a "quota disabled" warning until I enabled it once again.

> It's highly recommended to read btrfs-quota(8) and btrfs-qgroup(8) to
> ensure you understand all the limitations.

I probably won't understand them all, but this is not an issue of my concern as I don't use quotas. There is simply no other way I am aware of that could show me per-subvolume stats. Well, no straightforward way, as the hard way I'm using (btrfs send) confirms the problem.

You could simply remove all the quota results I've posted and there will still be the underlying problem: the 25 GB of data I have occupies 52 GB.

At least one recent snapshot, taken after some minor (<100 MB) changes to the subvolume, which has undergone only minor changes since then, occupied 8 GB during one night when the entire system was idling.

This was crosschecked on file metadata (mtimes compared) and 'du' results.

As a last resort I've rebalanced the disk (once again), this time with -dconvert=raid1 (to get rid of the single residue).
--
Tomasz Pala
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: exclusive subvolume space missing
On Sat, Dec 02, 2017 at 08:27:56 +0800, Qu Wenruo wrote:
> I assume there is a program eating up the space.
> Not btrfs itself.

Very doubtful. I've encountered an ext3 "eating" problem once, that couldn't be found by lsof on a 3.4.75 kernel, but the space was returned after killing Xorg. The system I'm having the problem on now is very recent; the space doesn't return after reboot/emergency and doesn't sum up with the files.

>> Now, the weird part for me is the exclusive data count:
>>
>> # btrfs sub sh ./snapshot-171125
>> [...]
>> Subvolume ID: 388
>> # btrfs fi du -s ./snapshot-171125
>>      Total   Exclusive  Set shared  Filename
>>   21.50GiB    63.35MiB    20.77GiB  snapshot-171125
>
> That's the difference between how sub show and quota work.
>
> For quota, it's a per-root owner check.

Just to be clear: I've enabled quota _only_ to see subvolume usage on the spot. And exclusive data - the more detailed approach I've described in the e-mail I sent a minute ago.

> That means even if a file extent is shared between different inodes, if
> all the inodes are inside the same subvolume, it's counted as exclusive.
> And if any of the file extents belongs to another subvolume, then it's
> counted as shared.

Good to know, but this is an almost UID0-only system. There are system users (vendor provided) and 2 ssh accounts for su, but nobody uses this machine for daily work. The quota values were the last tool I could find to debug with.

> For fi du, it's a per-inode owner check. (The exact behavior is a little
> more complex; I'll skip such corner cases to make it a little easier to
> understand.)
>
> That's to say, if one file extent is shared by different inodes, then
> it's counted as shared, no matter whether those inodes belong to
> different subvolumes or the same one.
>
> So "fi du" has a looser condition for the "shared" calculation, and that
> should explain why you have 20+G shared.

There shouldn't be many multi-inode extents inside a single subvolume, as this is a mostly fresh system, with no containers, no deduplication; snapshots are taken from the same running system before or after some more important change is done. By 'change' I mean altering text config files mostly (plus etckeeper's git metadata), so the volume of difference is extremely low. Actually most of the diffs between subvolumes come from updating distro packages. There were not many reflink copies made on this partition, only one kernel source compiled (.ccache files removed today). So this partition is as clean as it could be after almost 5 months in use.

Actually I should rephrase the problem:

"snapshot has taken 8 GB of space despite nothing has altered source subvolume"

--
Tomasz Pala
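Qu's distinction between the quota-style per-root check and the fi-du-style per-inode check can be sketched with a toy model (an assumption for illustration only, not btrfs code), where each extent records the (subvolume, inode) pairs referencing it:

```python
# Toy model (illustrative assumption): an extent is (size_gib, refs),
# where refs lists the (subvolume, inode) pairs referencing it.

def qgroup_exclusive(extents, subvol):
    """Quota-style per-root check: exclusive if every referencing inode
    lives in this subvolume, even when several inodes share the extent."""
    return sum(size for size, refs in extents
               if all(sv == subvol for sv, _ in refs))

def fi_du_exclusive(extents, subvol):
    """'fi du'-style per-inode check: exclusive only when exactly one
    inode references the extent, regardless of subvolume boundaries."""
    return sum(size for size, refs in extents
               if len(refs) == 1 and refs[0][0] == subvol)

# One 4 GiB extent shared by two inodes inside the same subvolume "A":
extents = [(4, [("A", 1), ("A", 2)])]
print(qgroup_exclusive(extents, "A"))  # quota counts it as exclusive -> 4
print(fi_du_exclusive(extents, "A"))   # fi du counts it as shared    -> 0
```

This is why a snapshot's qgroup "excl" can be far larger than the Exclusive column of `btrfs fi du` for the very same subvolume.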
Re: exclusive subvolume space missing
>>> Now, the weird part for me is the exclusive data count:
>>>
>>> # btrfs sub sh ./snapshot-171125
>>> [...]
>>> Subvolume ID: 388
>>> # btrfs fi du -s ./snapshot-171125
>>>      Total   Exclusive  Set shared  Filename
>>>   21.50GiB    63.35MiB    20.77GiB  snapshot-171125
>>>
>>> How is that possible? This doesn't even remotely relate to the 7.15 GiB
>>> from qgroup. ~The same amount differs in total: 28.75-21.50=7.25 GiB.
>>> And the same happens with other snapshots, much more exclusive data
>>> shown in qgroup than actually found in files. So if not files, where
>>> is that space wasted? Metadata?
>>
>> Personally, I'd trust qgroups' output about as far as I could spit
>> Belgium(*).
>
> Well, there is something wrong here, as after removing the .ccache
> directories inside all the snapshots the 'excl' values decreased
> ...except for the last snapshot (the list below is short by ~40 snapshots
> that have 2 GB excl in total):
>
> qgroupid  rfer      excl
>
> 0/260  12.25GiB   3.22GiB  from 170712 - first snapshot
> 0/312  17.54GiB   4.56GiB  from 170811
> 0/366  25.59GiB   2.44GiB  from 171028
> 0/370  23.27GiB  59.46MiB  from 18 - prev snapshot
> 0/388  21.69GiB   7.16GiB  from 171125 - last snapshot
> 0/291  24.29GiB   9.77GiB  default subvolume

You may need to manually sync the filesystem (trigger a transaction commitment) to update qgroup accounting.

> [~/test/snapshot-171125]# du -sh .
> 15G     .
>
> After changing them back to ro I tested how much data really changed
> between the previous and last snapshots:
>
> [~/test]# btrfs send -p snapshot-171118 snapshot-171125 | pv > /dev/null
> At subvol snapshot-171125
> 74.2MiB 0:00:32 [2.28MiB/s]
>
> This means there can't be 7 GiB of exclusive data in the last snapshot.

As mentioned before, sync the fs first before checking the qgroup numbers, or use the --sync option along with qgroup show.

> Well, even btrfs send -p snapshot-170712 snapshot-171125 | pv > /dev/null
> 5.68GiB 0:03:23 [28.6MiB/s]
>
> I've created a new snapshot right now to compare it with 171125:
> 75.5MiB 0:00:43 [1.73MiB/s]
>
> OK, I could even compare all the snapshots in sequence:
>
> # for i in snapshot-17*; btrfs prop set $i ro true
> # p=''; for i in snapshot-17*; do [ -n "$p" ] && btrfs send -p "$p" "$i" | pv > /dev/null; p="$i"; done
> 1.7GiB 0:00:15 [ 114MiB/s]
> 1.03GiB 0:00:38 [27.2MiB/s]
> 155MiB 0:00:08 [19.1MiB/s]
> 1.08GiB 0:00:47 [23.3MiB/s]
> 294MiB 0:00:29 [ 9.9MiB/s]
> 324MiB 0:00:42 [7.69MiB/s]
> 82.8MiB 0:00:06 [12.7MiB/s]
> 64.3MiB 0:00:05 [11.6MiB/s]
> 137MiB 0:00:07 [19.3MiB/s]
> 85.3MiB 0:00:13 [6.18MiB/s]
> 62.8MiB 0:00:19 [3.21MiB/s]
> 132MiB 0:00:42 [3.15MiB/s]
> 102MiB 0:00:42 [2.42MiB/s]
> 197MiB 0:00:50 [3.91MiB/s]
> 321MiB 0:01:01 [5.21MiB/s]
> 229MiB 0:00:18 [12.3MiB/s]
> 109MiB 0:00:11 [ 9.7MiB/s]
> 139MiB 0:00:14 [9.32MiB/s]
> 573MiB 0:00:35 [15.9MiB/s]
> 64.1MiB 0:00:30 [2.11MiB/s]
> 172MiB 0:00:11 [14.9MiB/s]
> 98.9MiB 0:00:07 [14.1MiB/s]
> 54MiB 0:00:08 [6.17MiB/s]
> 78.6MiB 0:00:02 [32.1MiB/s]
> 15.1MiB 0:00:01 [12.5MiB/s]
> 20.6MiB 0:00:00 [ 23MiB/s]
> 20.3MiB 0:00:00 [ 23MiB/s]
> 110MiB 0:00:14 [7.39MiB/s]
> 62.6MiB 0:00:11 [5.67MiB/s]
> 65.7MiB 0:00:08 [7.58MiB/s]
> 731MiB 0:00:42 [ 17MiB/s]
> 73.7MiB 0:00:29 [ 2.5MiB/s]
> 322MiB 0:00:53 [6.04MiB/s]
> 105MiB 0:00:35 [2.95MiB/s]
> 95.2MiB 0:00:36 [2.58MiB/s]
> 74.2MiB 0:00:30 [2.43MiB/s]
> 75.5MiB 0:00:46 [1.61MiB/s]
>
> This is 9.3 GB of total diffs between all the snapshots I have.
> Plus the 15 GB initial snapshot means there is about 25 GB used,
> while df reports twice that amount, way too much for overhead:
> /dev/sda2   64G   52G   11G  84% /
>
> # btrfs quota enable /
> # btrfs qgroup show /
> WARNING: quota disabled, qgroup data may be out of date
> [...]
> # btrfs quota enable /    - for the second time!
> # btrfs qgroup show /
> WARNING: qgroup data inconsistent, rescan recommended

Please wait for the rescan, or no number is correct. (Although it will only be less than the actually occupied space.)

It's highly recommended to read btrfs-quota(8) and btrfs-qgroup(8) to ensure you understand all the limitations.

> [...]
> 0/428  15.96GiB  19.23MiB  newly created (now) snapshot
>
> Assuming the qgroups output is bogus and the space isn't physically
> occupied (which is coherent with btrfs fi du output and my expectation),
> the question remains: why is that bogus-excl removed from available
> space as reported by df or btrfs fi df/usage? And how to reclaim it?

Already explained the difference in another thread.

Thanks,
Qu

> [~/test]# btrfs device usage /
> /dev/sda2, ID: 1
>    Device size:      64.00GiB
>    Device slack:        0.00B
>    Data,single:       1.07GiB
>    Data,RAID1:       55.97GiB
>    Metadata,RAID1:    2.00GiB
>
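The "9.3 GB of total diffs" figure quoted above can be sanity-checked by summing the pv transfer sizes from the incremental send run (a quick helper of my own, not from the thread; the sum comes out at roughly 9 GiB, in the same ballpark as the quoted decimal-GB figure):

```python
# Sum the pv sizes from the snapshot-to-snapshot send run quoted above.
sizes = """1.7GiB 1.03GiB 155MiB 1.08GiB 294MiB 324MiB 82.8MiB 64.3MiB
137MiB 85.3MiB 62.8MiB 132MiB 102MiB 197MiB 321MiB 229MiB 109MiB 139MiB
573MiB 64.1MiB 172MiB 98.9MiB 54MiB 78.6MiB 15.1MiB 20.6MiB 20.3MiB
110MiB 62.6MiB 65.7MiB 731MiB 73.7MiB 322MiB 105MiB 95.2MiB 74.2MiB
75.5MiB""".split()

def to_gib(token):
    """Convert a pv size token ('155MiB', '1.7GiB') to GiB."""
    if token.endswith("GiB"):
        return float(token[:-3])
    return float(token[:-3]) / 1024  # MiB -> GiB

total = sum(to_gib(t) for t in sizes)
print(round(total, 2))  # roughly 9 GiB of incremental diffs
```

Together with the ~15 GiB initial snapshot, that supports the ~25 GB estimate of actual data, against the 52 GB that df reports.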
Re: exclusive subvolume space missing
On Fri, Dec 01, 2017 at 21:36:14 +, Hugo Mills wrote: >The thing I'd first go looking for here is some rogue process > writing lots of data. I've had something like this happen to me > before, a few times. First, I'd look for large files with "du -ms /* | > sort -n", then work down into the tree until you find them. I already did a handful of searches (mounting parent node in separate directory and diving into default working subvolume on order to unhide possible things covered by any other mounts on top of actual root fs). That's how it looks like: [~/test/@]# du -sh . 15G . >If that doesn't show up anything unusually large, then lsof to look > for open but deleted files (orphans) which are still being written to > by some process. No (deleted) files, the only activity on iotop are internals... 174 be/4 root 15.64 K/s3.67 M/s 0.00 % 5.88 % [btrfs-transacti] 1439 be/4 root0.00 B/s 1173.22 K/s 0.00 % 0.00 % [kworker/u8:8] Only the systemd-journald is writing, but the /var/log is mounted to separate ext3 parition (with journald restarted after the mount); this is also confirmed by looking into separate mount. Anyway that can't be opened-deleted files, as the usage doesn't change after booting into emergency. The worst thing is that the 8 GB was lost during the night, when nothing except for stats collector was running. As already said, this is not the classical "Linux eats my HDD" problem. >This is very likely _not_ to be a btrfs problem, but instead some > runaway process writing lots of crap very fast. Log files are probably > the most plausible location, but not the only one. That would be visible in iostat or /proc/diskstats - it isn't. The free space disappears without being physically written, which means it is some allocation problem. I also created a list of files modified between the snapshots with: find test/@ -xdev -newer some_reference_file_inside_snapshot and there is nothing bigger than a few MBs. 
I've changed the snapshots to rw and removed some data from all the instances: 4.8 GB in two ISO images and 5 GB-limited .ccache directory. After this I got 11 GB freed, so the numbers are fine. # btrfs fi usage / Overall: Device size: 128.00GiB Device allocated:117.19GiB Device unallocated: 10.81GiB Device missing: 0.00B Used:103.56GiB Free (estimated): 11.19GiB (min: 11.14GiB) Data ratio: 1.98 Metadata ratio: 2.00 Global reserve: 146.08MiB (used: 0.00B) Data,single: Size:1.19GiB, Used:1.18GiB /dev/sda2 1.07GiB /dev/sdb2 132.00MiB Data,RAID1: Size:55.97GiB, Used:50.30GiB /dev/sda2 55.97GiB /dev/sdb2 55.97GiB Metadata,RAID1: Size:2.00GiB, Used:908.61MiB /dev/sda2 2.00GiB /dev/sdb2 2.00GiB System,RAID1: Size:32.00MiB, Used:16.00KiB /dev/sda2 32.00MiB /dev/sdb2 32.00MiB Unallocated: /dev/sda2 4.93GiB /dev/sdb2 5.87GiB >> Now, the weird part for me is exclusive data count: >> >> # btrfs sub sh ./snapshot-171125 >> [...] >> Subvolume ID: 388 >> # btrfs fi du -s ./snapshot-171125 >> Total Exclusive Set shared Filename >> 21.50GiB63.35MiB20.77GiB snapshot-171125 >> >> How is that possible? This doesn't even remotely relate to 7.15 GiB >> from qgroup.~The same amount differs in total: 28.75-21.50=7.25 GiB. >> And the same happens with other snapshots, much more exclusive data >> shown in qgroup than actually found in files. So if not files, where >> is that space wasted? Metadata? > >Personally, I'd trust qgroups' output about as far as I could spit > Belgium(*). 
Well, there is something wrong here, as after removing the .ccache directories inside all the snapshots the 'excl' values decreased ...except for the last snapshot (the list below is short by ~40 snapshots that have 2 GB excl in total):

qgroupid      rfer        excl
0/260     12.25GiB     3.22GiB   from 170712 - first snapshot
0/312     17.54GiB     4.56GiB   from 170811
0/366     25.59GiB     2.44GiB   from 171028
0/370     23.27GiB    59.46MiB   from 18 - prev snapshot
0/388     21.69GiB     7.16GiB   from 171125 - last snapshot
0/291     24.29GiB     9.77GiB   default subvolume

[~/test/snapshot-171125]# du -sh .
15G .

After changing back to ro I tested how much data really has changed between the previous and last snapshot:

[~/test]# btrfs send -p snapshot-171118 snapshot-171125 | pv > /dev/null
At subvol snapshot-171125
74.2MiB 0:00:32 [2.28MiB/s]

This means there can't be 7 GiB of exclusive data in the last snapshot. Well, even:

btrfs send -p snapshot-170712 snapshot-171125 | pv > /dev/null
5.68GiB 0:03:23 [28.6MiB/s]

I've created a new snapshot right now to
Re: exclusive subvolume space missing
On 2017-12-02 00:15, Tomasz Pala wrote:
> Hello,
>
> I got a problem with btrfs running out of space (not THE
> Internet-wide, well known issues with interpretation).
>
> The problem is: something eats the space while not running anything that
> justifies this. There were 18 GB free space available, suddenly it
> dropped to 8 GB and then to 63 MB during one night. I recovered 1 GB
> with rebalance -dusage=5 -musage=5 (or sth about), but it is being eaten
> right now, just as I'm writing this e-mail:
>
> /dev/sda2   64G   63G  452M  100% /
> /dev/sda2   64G   63G  365M  100% /
> /dev/sda2   64G   63G  316M  100% /
> /dev/sda2   64G   63G  287M  100% /
> /dev/sda2   64G   63G  268M  100% /
> /dev/sda2   64G   63G  239M  100% /
> /dev/sda2   64G   63G  230M  100% /
> /dev/sda2   64G   63G  182M  100% /
> /dev/sda2   64G   63G  163M  100% /
> /dev/sda2   64G   64G  153M  100% /
> /dev/sda2   64G   64G  143M  100% /
> /dev/sda2   64G   64G   96M  100% /
> /dev/sda2   64G   64G   88M  100% /
> /dev/sda2   64G   64G   57M  100% /
> /dev/sda2   64G   64G   25M  100% /
>
> while my rough calculations show, that there should be at least 10 GB of
> free space. After enabling quotas it is somehow confirmed:
>
> # btrfs qgroup sh --sort=excl /
> qgroupid      rfer        excl
>
> 0/5       16.00KiB    16.00KiB
> [30 snapshots with about 100 MiB excl]
> 0/333     24.53GiB   305.79MiB
> 0/298     13.44GiB   312.74MiB
> 0/327     23.79GiB   427.13MiB
> 0/331     23.93GiB   930.51MiB
> 0/260     12.25GiB     3.22GiB
> 0/312     19.70GiB     4.56GiB
> 0/388     28.75GiB     7.15GiB
> 0/291     30.60GiB     9.01GiB  <- this is the running one
>
> This is about 30 GB total excl (didn't find a switch to sum this up). I
> know I can't just add 'excl' to get usage, so tried to pinpoint the
> exact files that occupy space in 0/388 exclusively (this is the last
> snapshots taken, all of the snapshots are created from the running fs).

I assume there is a program eating up the space, not btrfs itself.

>
> Now, the weird part for me is exclusive data count:
>
> # btrfs sub sh ./snapshot-171125
> [...]
> Subvolume ID: 388
> # btrfs fi du -s ./snapshot-171125
>      Total   Exclusive  Set shared  Filename
>   21.50GiB    63.35MiB    20.77GiB  snapshot-171125

That's the difference between how sub show and quota work.

For quota, it's a per-root owner check. That means even if a file extent is shared between different inodes, as long as all those inodes are inside the same subvolume, it's counted as exclusive. And if any of the file extent's references belongs to another subvolume, then it's counted as shared.

For fi du, it's a per-inode owner check. (The exact behavior is a little more complex; I'll skip such corner cases to make it a little easier to understand.) That's to say, if one file extent is shared by different inodes, then it's counted as shared, no matter whether these inodes belong to different subvolumes or the same one.

So "fi du" has a looser condition for the "shared" calculation, and that should explain why you have 20+G shared.

Thanks,
Qu

>
> How is that possible? This doesn't even remotely relate to 7.15 GiB
> from qgroup. The same amount differs in total: 28.75-21.50=7.25 GiB.
> And the same happens with other snapshots, much more exclusive data
> shown in qgroup than actually found in files. So if not files, where
> is that space wasted? Metadata?
>
> btrfs-progs-4.12 running on Linux 4.9.46.
>
> best regards,

signature.asc Description: OpenPGP digital signature
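Qu's per-root vs per-inode distinction can be sketched with a toy model (illustrative Python, not btrfs code; the subvolume and extent names are invented). Qgroup-style accounting asks whether an extent is referenced by more than one subvolume; fi-du-style accounting asks whether it is referenced by more than one inode:

```python
# Toy model of the two "exclusive" definitions described above.
# Each extent is referenced by a list of (subvolume, inode) owners.
extents = {
    "e1": [("subvol_a", 1), ("subvol_a", 2)],  # reflinked twice inside one subvolume
    "e2": [("subvol_a", 3), ("subvol_b", 3)],  # shared with a snapshot subvolume
    "e3": [("subvol_a", 4)],                   # plain private extent
}

def qgroup_exclusive(subvol, extents):
    # per-root check: exclusive iff every referencing subvolume is this one
    return [e for e, owners in extents.items()
            if any(s == subvol for s, _ in owners)
            and all(s == subvol for s, _ in owners)]

def fi_du_exclusive(subvol, extents):
    # per-inode check: exclusive iff exactly one inode references the extent
    return [e for e, owners in extents.items()
            if any(s == subvol for s, _ in owners) and len(owners) == 1]

print(qgroup_exclusive("subvol_a", extents))  # e1 counts as exclusive here...
print(fi_du_exclusive("subvol_a", extents))   # ...but not here: two inodes share it
```

Extent e1 is the interesting case: qgroup calls it exclusive (only subvol_a references it), fi du calls it shared (two inodes reference it), which is the direction of the discrepancy seen in the numbers above.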
Re: [PATCH 5/5] btrfs: Greatly simplify btrfs_read_dev_super
On 12/01/2017 05:19 PM, Nikolay Borisov wrote:
> Currently this function executes the inner loop at most once due to the
> i = 0; i < 1 condition. Furthermore, the btrfs_super_generation(super) >
> transid code in the if condition is never executed due to latest always
> being set to NULL, hence the first part of the condition always
> triggering. The gist of btrfs_read_dev_super is really to read the
> first superblock.
>
> Signed-off-by: Nikolay Borisov
> ---
>  fs/btrfs/disk-io.c | 27 ++++-----------------------
>  1 file changed, 4 insertions(+), 23 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 82c96607fc46..6d5f632fd1e7 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3170,37 +3170,18 @@ int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num,
>  struct buffer_head *btrfs_read_dev_super(struct block_device *bdev)
>  {
>  	struct buffer_head *bh;
> -	struct buffer_head *latest = NULL;
> -	struct btrfs_super_block *super;
> -	int i;
> -	u64 transid = 0;
> -	int ret = -EINVAL;
> +	int ret;
>
>  	/* we would like to check all the supers, but that would make
>  	 * a btrfs mount succeed after a mkfs from a different FS.
>  	 * So, we need to add a special mount option to scan for
>  	 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>  	 */

We need the below loop to support the above comment at some point; instead of removing it I would prefer to fix it as per the above comments.

Thanks, Anand

> -	for (i = 0; i < 1; i++) {
> -		ret = btrfs_read_dev_one_super(bdev, i, &bh);
> -		if (ret)
> -			continue;
> -
> -		super = (struct btrfs_super_block *)bh->b_data;
> -
> -		if (!latest || btrfs_super_generation(super) > transid) {
> -			brelse(latest);
> -			latest = bh;
> -			transid = btrfs_super_generation(super);
> -		} else {
> -			brelse(bh);
> -		}
> -	}
> -
> -	if (!latest)
> +	ret = btrfs_read_dev_one_super(bdev, 0, &bh);
> +	if (ret)
>  		return ERR_PTR(ret);
>
> -	return latest;
> +	return bh;
>  }
>
>  /*

-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs-transacti hammering the system
Well, it's at zero now...

# btrfs fi df /export/
Data, single: total=30.45TiB, used=30.25TiB
System, DUP: total=32.00MiB, used=3.62MiB
Metadata, DUP: total=66.50GiB, used=65.16GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

On 01/12/17 16:47, Duncan wrote:
> Hans van Kranenburg posted on Fri, 01 Dec 2017 18:06:23 +0100 as excerpted:
>> On 12/01/2017 05:31 PM, Matt McKinnon wrote:
>>> Sorry, I missed your in-line reply:
>>>
>>>> 2) How big is this filesystem? What does your `btrfs fi df
>>>> /mountpoint` say?
>>>
>>> # btrfs fi df /export/
>>> Data, single: total=30.45TiB, used=30.25TiB
>>> System, DUP: total=32.00MiB, used=3.62MiB
>>> Metadata, DUP: total=66.50GiB, used=65.08GiB
>>> GlobalReserve, single: total=512.00MiB, used=53.69MiB
>>
>> Multi-TiB filesystem, check. total/used ratio looks healthy.
>
> Not so healthy, from here. Data/metadata are healthy, yes, but...
>
> Any usage at all of global reserve is a red flag indicating that
> something in the filesystem thinks, or thought when it resorted to
> global reserve, that space is running out. Global reserve usage doesn't
> really hint what the problem is, but it's definitely a red flag that
> there /is/ a problem, and it's easily overlooked, as it apparently was
> here.
>
> It's likely indication of a bug, possibly one of the ones fixed right
> around 4.12/4.13. I'll let the devs and better experts take it from
> there, but I'd certainly be worried until global reserve drops to zero
> usage.

-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs-transacti hammering the system
Hans van Kranenburg posted on Fri, 01 Dec 2017 18:06:23 +0100 as excerpted: > On 12/01/2017 05:31 PM, Matt McKinnon wrote: >> Sorry, I missed your in-line reply: >> >> >>> 2) How big is this filesystem? What does your `btrfs fi df >>> /mountpoint` say? >>> >> >> # btrfs fi df /export/ >> Data, single: total=30.45TiB, used=30.25TiB >> System, DUP: total=32.00MiB, used=3.62MiB >> Metadata, DUP: total=66.50GiB, used=65.08GiB >> GlobalReserve, single: total=512.00MiB, used=53.69MiB > > Multi-TiB filesystem, check. total/used ratio looks healthy. Not so healthy, from here. Data/metadata are healthy, yes, but... Any usage at all of global reserve is a red flag indicating that something in the filesystem thinks, or thought when it resorted to global reserve, that space is running out. Global reserve usage doesn't really hint what the problem is, but it's definitely a red flag that there /is/ a problem, and it's easily overlooked, as it apparently was here. It's likely indication of a bug, possibly one of the ones fixed right around 4.12/4.13. I'll let the devs and better experts take it from there, but I'd certainly be worried until global reserve drops to zero usage. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
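Duncan's red flag, any nonzero "used" on the global reserve, is easy to check for mechanically. A sketch that parses `btrfs fi df` text; the sample here is the output quoted above, and in practice you would feed the live command output instead:

```python
import re

# Sample `btrfs fi df` output as quoted in the thread.
fi_df = """\
Data, single: total=30.45TiB, used=30.25TiB
System, DUP: total=32.00MiB, used=3.62MiB
Metadata, DUP: total=66.50GiB, used=65.08GiB
GlobalReserve, single: total=512.00MiB, used=53.69MiB
"""

def global_reserve_used(text):
    # Return the used= figure of the GlobalReserve line, e.g. "53.69MiB".
    m = re.search(r"GlobalReserve.*used=([\d.]+[KMGT]?i?B)", text)
    return m.group(1) if m else None

used = global_reserve_used(fi_df)
if used is not None and used != "0.00B":
    print(f"warning: global reserve in use ({used}) - possible space pressure")
```

A check like this could run from a monitoring cron job, alerting well before df reaches 100%.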
Re: exclusive subvolume space missing
On Fri, Dec 01, 2017 at 05:15:55PM +0100, Tomasz Pala wrote:
> Hello,
>
> I got a problem with btrfs running out of space (not THE
> Internet-wide, well known issues with interpretation).
>
> The problem is: something eats the space while not running anything that
> justifies this. There were 18 GB free space available, suddenly it
> dropped to 8 GB and then to 63 MB during one night. I recovered 1 GB
> with rebalance -dusage=5 -musage=5 (or sth about), but it is being eaten
> right now, just as I'm writing this e-mail:
>
> /dev/sda2   64G   63G  452M  100% /
> /dev/sda2   64G   63G  365M  100% /
> /dev/sda2   64G   63G  316M  100% /
> /dev/sda2   64G   63G  287M  100% /
> /dev/sda2   64G   63G  268M  100% /
> /dev/sda2   64G   63G  239M  100% /
> /dev/sda2   64G   63G  230M  100% /
> /dev/sda2   64G   63G  182M  100% /
> /dev/sda2   64G   63G  163M  100% /
> /dev/sda2   64G   64G  153M  100% /
> /dev/sda2   64G   64G  143M  100% /
> /dev/sda2   64G   64G   96M  100% /
> /dev/sda2   64G   64G   88M  100% /
> /dev/sda2   64G   64G   57M  100% /
> /dev/sda2   64G   64G   25M  100% /
>
> while my rough calculations show, that there should be at least 10 GB of
> free space. After enabling quotas it is somehow confirmed:
>
> # btrfs qgroup sh --sort=excl /
> qgroupid      rfer        excl
>
> 0/5       16.00KiB    16.00KiB
> [30 snapshots with about 100 MiB excl]
> 0/333     24.53GiB   305.79MiB
> 0/298     13.44GiB   312.74MiB
> 0/327     23.79GiB   427.13MiB
> 0/331     23.93GiB   930.51MiB
> 0/260     12.25GiB     3.22GiB
> 0/312     19.70GiB     4.56GiB
> 0/388     28.75GiB     7.15GiB
> 0/291     30.60GiB     9.01GiB  <- this is the running one
>
> This is about 30 GB total excl (didn't find a switch to sum this up). I
> know I can't just add 'excl' to get usage, so tried to pinpoint the
> exact files that occupy space in 0/388 exclusively (this is the last
> snapshots taken, all of the snapshots are created from the running fs).

The thing I'd first go looking for here is some rogue process writing lots of data. I've had something like this happen to me before, a few times.
First, I'd look for large files with "du -ms /* | sort -n", then work down into the tree until you find them.

If that doesn't show up anything unusually large, then lsof to look for open but deleted files (orphans) which are still being written to by some process.

This is very likely _not_ to be a btrfs problem, but instead some runaway process writing lots of crap very fast. Log files are probably the most plausible location, but not the only one.

> Now, the weird part for me is exclusive data count:
>
> # btrfs sub sh ./snapshot-171125
> [...]
> Subvolume ID: 388
> # btrfs fi du -s ./snapshot-171125
>      Total   Exclusive  Set shared  Filename
>   21.50GiB    63.35MiB    20.77GiB  snapshot-171125
>
> How is that possible? This doesn't even remotely relate to 7.15 GiB
> from qgroup. The same amount differs in total: 28.75-21.50=7.25 GiB.
> And the same happens with other snapshots, much more exclusive data
> shown in qgroup than actually found in files. So if not files, where
> is that space wasted? Metadata?

Personally, I'd trust qgroups' output about as far as I could spit Belgium(*).

Hugo.

(*) No offence intended to Belgium.

-- Hugo Mills | I used to live in hope, but I got evicted. hugo@... carfax.org.uk | http://carfax.org.uk/ | PGP: E2AB1DE4 | signature.asc Description: Digital signature
Re: exclusive subvolume space missing
Tomasz Pala posted on Fri, 01 Dec 2017 17:15:55 +0100 as excerpted:
> Hello,
>
> I got a problem with btrfs running out of space (not THE
> Internet-wide, well known issues with interpretation).
>
> The problem is: something eats the space while not running anything that
> justifies this. There were 18 GB free space available, suddenly it
> dropped to 8 GB and then to 63 MB during one night. I recovered 1 GB
> with rebalance -dusage=5 -musage=5 (or sth about), but it is being eaten
> right now, just as I'm writing this e-mail:
>
> /dev/sda2   64G   63G  452M  100% /
> /dev/sda2   64G   63G  365M  100% /
> /dev/sda2   64G   63G  316M  100% /
> /dev/sda2   64G   63G  287M  100% /
> /dev/sda2   64G   63G  268M  100% /
> /dev/sda2   64G   63G  239M  100% /
> /dev/sda2   64G   63G  230M  100% /
> /dev/sda2   64G   63G  182M  100% /
> /dev/sda2   64G   63G  163M  100% /
> /dev/sda2   64G   64G  153M  100% /
> /dev/sda2   64G   64G  143M  100% /
> /dev/sda2   64G   64G   96M  100% /
> /dev/sda2   64G   64G   88M  100% /
> /dev/sda2   64G   64G   57M  100% /
> /dev/sda2   64G   64G   25M  100% /

Scary.

> while my rough calculations show, that there should be at least 10 GB of
> free space. After enabling quotas it is somehow confirmed:

I don't use quotas so won't claim working knowledge or an explanation of that side of things, however...

> btrfs-progs-4.12 running on Linux 4.9.46.

Until quite recently btrfs quotas were too buggy to recommend for use. While the known blocker-level bugs are now fixed, scaling and real-world performance are still an issue, and AFAIK, the fixes didn't make 4.9 and may not be backported as the feature was simply known to be broken beyond reliable usability at that point. Based on comments in other threads here, I /think/ the critical quota fixes hit 4.10, but of course not being an LTS, 4.10 is long out of support.

I'd suggest either turning off and forgetting about quotas since it doesn't appear you actually need them, or upgrading to at least 4.13 and keeping current, or the LTS 4.14 if you want to stay on the same kernel series for awhile.
As for the scaling and performance issues: during normal/generic filesystem use things are generally fine. It's various btrfs maintenance commands such as balance, snapshot deletion, and btrfs check that have the scaling issues, and they have /some/ scaling issues even without quotas; it's just that quotas make the problem *much* worse.

One workaround for balance and snapshot deletion is to temporarily disable quotas while the job is running, then reenable (and rescan if necessary; as I don't use the feature here, I'm not sure whether that's needed). That can literally turn a job that was looking to take /weeks/ due to the scaling issue into a job of hours.

Unfortunately, the sorts of conditions that would trigger running a btrfs check don't lend themselves to the same sort of workaround, so not having quotas on at all is the only workaround there.

As to your space-being-eaten problem, the output of btrfs filesystem usage (and perhaps btrfs device usage if it's a multi-device btrfs) could be really helpful here, much more so than quota reports if it's a btrfs issue, or to help eliminate btrfs as the problem if it's not.

-- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
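The disable/balance/reenable workaround described above can be collected into a script. A dry-run sketch under stated assumptions: the mount point and the -dusage/-musage thresholds are placeholders, and whether a rescan is required after re-enabling is, as noted above, uncertain. Commands are only echoed unless DO_IT=1 is set:

```shell
MNT=/mnt/placeholder   # adjust to the real mount point
# Dry-run by default; set DO_IT=1 to execute for real (needs root + btrfs-progs).
run() { if [ "${DO_IT:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi; }

run btrfs quota disable "$MNT"
run btrfs balance start -dusage=50 -musage=50 "$MNT"   # thresholds are illustrative
run btrfs quota enable "$MNT"
run btrfs quota rescan -w "$MNT"   # -w waits for the rescan to finish
```

Running it without DO_IT=1 just prints the four commands, which makes the sequence easy to review before committing to a long balance.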
Re: btrfs-transacti hammering the system
On Fri, Dec 1, 2017 at 12:07 PM, Matt McKinnon wrote:
> Right. The file system is 48T, with 17T available, so we're not quite
> pushing it yet.
>
> So far so good on the space_cache=v2 mount. I'm surprised this isn't on
> the gotcha page in the wiki; it may end up making a world of difference
> to the users here.

I'd change one thing at a time so you learn what change does/doesn't resolve the problem. For storage of mostly large files, autodefrag doesn't seem applicable, but I'd leave it on for now since you've already made the space cache v2 change.

-- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs-progs - failed btrfs replace on RAID1 seems to have left things in a wrong state
Patrik Lundquist posted on Fri, 01 Dec 2017 10:29:43 +0100 as excerpted: > On 1 December 2017 at 08:18, Duncan <1i5t5.dun...@cox.net> wrote: >> >> When udev sees a device it triggers >> a btrfs device scan, which lets btrfs know which devices belong to which >> individual btrfs. But once it associates a device with a particular >> btrfs, there's nothing to unassociate it -- the only way to do that on >> a running kernel is to successfully complete a btrfs device remove or >> replacement... and your replace didn't complete due to error. >> >> Of course the other way to do it is to reboot, fresh kernel, fresh >> btrfs state, and it learns again what devices go with which btrfs >> when the appearing devices trigger the udev rule that triggers a >> btrfs scan. > > Or reload the btrfs module. Thanks. Yes. With a monolithic kernel I tend to forget about that (and as I have a btrfs root it wouldn't be possible anyway), but indeed, unloading/reloading the btrfs kernel module clears the btrfs device state tracking as effectively as a reboot. Good point! =:^) -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs-transacti hammering the system
Right. The file system is 48T, with 17T available, so we're not quite pushing it yet.

So far so good on the space_cache=v2 mount. I'm surprised this isn't on the gotcha page in the wiki; it may end up making a world of difference to the users here.

Thanks again,
Matt

On 01/12/17 13:24, Hans van Kranenburg wrote:
> On 12/01/2017 06:57 PM, Holger Hoffstätte wrote:
>> On 12/01/17 18:34, Matt McKinnon wrote:
>>> Thanks, I'll give space_cache=v2 a shot.
>>
>> Yes, very much recommended.
>>
>>> My mount options are: rw,relatime,space_cache,autodefrag,subvolid=5,subvol=/
>>
>> Turn autodefrag off and use noatime instead of relatime.
>>
>> Your filesystem also seems very full,
>
> We don't know. btrfs fi df only displays allocated space. And that
> being full is good, it means not too much free space fragments
> everywhere.
>
>> that's bad with every filesystem but *especially* with btrfs because
>> the allocator has to work really hard to find free space for COWing.
>> Really consider deleting stuff or adding more space.

-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs-transacti hammering the system
On 12/01/2017 06:57 PM, Holger Hoffstätte wrote: > On 12/01/17 18:34, Matt McKinnon wrote: >> Thanks, I'll give space_cache=v2 a shot. > > Yes, very much recommended. > >> My mount options are: rw,relatime,space_cache,autodefrag,subvolid=5,subvol=/ > > Turn autodefrag off and use noatime instead of relatime. > > Your filesystem also seems very full, We don't know. btrfs fi df only displays allocated space. And that being full is good, it means not too much free space fragments everywhere. > that's bad with every filesystem but > *especially* with btrfs because the allocator has to work really hard to find > free space for COWing. Really consider deleting stuff or adding more space. -- Hans van Kranenburg -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs-transacti hammering the system
On 2017-12-01 12:13, Andrei Borzenkov wrote:
> 01.12.2017 20:06, Hans van Kranenburg wrote:
>> Additional tips (forgot to ask for your /proc/mounts before):
>> * Use the noatime mount option, so that only accessing files does not
>> lead to changes in metadata,
>
> Is not 'lazytime' the default today? It gives you correct atime + no
> extra metadata update caused by an atime-only update.

Unless things have changed since the last time this came up, BTRFS does not support the 'lazytime' mount option (but it doesn't complain about it either). Also, lazytime is independent from noatime, and using both can have benefits (lazytime will still have to write out the inode for every file read on the system every 24 hours, but with noatime it only has to write out the inode for files that have changed).

On top of all that though, you generally shouldn't be trusting atime because:

1. Many people run with noatime (or patch their kernels to default to noatime instead of relatime), so you can't be certain if the atime is accurate at all.
2. It has somewhat non-intuitive semantics when dealing with directories.
3. Even without noatime thrown in, you only get 1 day resolution by default (as per the operation of 'relatime').
4. Essentially nothing uses it other than find (which only has one-day resolution as it's typically used) and older versions of mutt (which use it because of lazy programming), which is why issues 1 and 3 are the case.

-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs-transacti hammering the system
On 12/01/17 18:34, Matt McKinnon wrote: > Thanks, I'll give space_cache=v2 a shot. Yes, very much recommended. > My mount options are: rw,relatime,space_cache,autodefrag,subvolid=5,subvol=/ Turn autodefrag off and use noatime instead of relatime. Your filesystem also seems very full, that's bad with every filesystem but *especially* with btrfs because the allocator has to work really hard to find free space for COWing. Really consider deleting stuff or adding more space. -h -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs-transacti hammering the system
Thanks, I'll give space_cache=v2 a shot. My mount options are: rw,relatime,space_cache,autodefrag,subvolid=5,subvol=/ -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs-transacti hammering the system
On 01.12.2017 20:06, Hans van Kranenburg wrote:
>
> Additional tips (forgot to ask for your /proc/mounts before):
> * Use the noatime mount option, so that only accessing files does not
> lead to changes in metadata,

Is not 'lazytime' the default today? It gives you correct atime + no extra metadata update caused by an atime-only update.

-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs-transacti hammering the system
On 12/01/2017 05:31 PM, Matt McKinnon wrote:
> Sorry, I missed your in-line reply:
>
>> 1) The one right above, btrfs_write_out_cache, is the write-out of the
>> free space cache v1. Do you see this for multiple seconds going on, and
>> does it match the time when it's writing X MB/s to disk?
>
> It seems to only last until the next watch update.
>
> [] io_schedule+0x16/0x40
> [] get_request+0x23e/0x720
> [] blk_queue_bio+0xc1/0x3a0
> [] generic_make_request+0xf8/0x2a0
> [] submit_bio+0x75/0x150
> [] btrfs_map_bio+0xe5/0x2f0 [btrfs]
> [] btree_submit_bio_hook+0x8c/0xe0 [btrfs]
> [] submit_one_bio+0x63/0xa0 [btrfs]
> [] flush_epd_write_bio+0x3b/0x50 [btrfs]
> [] flush_write_bio+0xe/0x10 [btrfs]
> [] btree_write_cache_pages+0x379/0x450 [btrfs]
> [] btree_writepages+0x5d/0x70 [btrfs]
> [] do_writepages+0x1c/0x70
> [] __filemap_fdatawrite_range+0xaa/0xe0
> [] filemap_fdatawrite_range+0x13/0x20
> [] btrfs_write_marked_extents+0xe9/0x110 [btrfs]
> [] btrfs_write_and_wait_transaction.isra.22+0x3d/0x80 [btrfs]
> [] btrfs_commit_transaction+0x665/0x900 [btrfs]
> [] transaction_kthread+0x18a/0x1c0 [btrfs]
> [] kthread+0x109/0x140
> [] ret_from_fork+0x25/0x30
>
> The last three lines will stick around for a while. Is switching to
> space cache v2 something that everyone should be doing? Something that
> would be a good test at least?

Yes. Read on.

>> 2) How big is this filesystem? What does your `btrfs fi df
>> /mountpoint` say?
>
> # btrfs fi df /export/
> Data, single: total=30.45TiB, used=30.25TiB
> System, DUP: total=32.00MiB, used=3.62MiB
> Metadata, DUP: total=66.50GiB, used=65.08GiB
> GlobalReserve, single: total=512.00MiB, used=53.69MiB

Multi-TiB filesystem, check. total/used ratio looks healthy.

>> 3) What kind of workload are you running? E.g. how can you describe it
>> within a range from "big files which just sit there" to "small writes
>> and deletes all over the place all the time"?
>
> It's a pretty light workload most of the time.
> It's a file system that exports two NFS shares to a small lab group. I
> believe it is more small reads all over a large file (MRI imaging)
> rather than small writes.

Ok.

>> 4) What kernel version is this? `uname -a` output?
>
> # uname -a
> Linux machine_name 4.12.8-custom #1 SMP Tue Aug 22 10:15:01 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

Yes, I'd recommend switching to space_cache v2, which stores the free space information in a tree instead of separate blobs, and does not block the transaction while writing out all info of all touched parts of the filesystem again.

Here's of course the famous presentation with all kinds of info why:
http://events.linuxfoundation.org/sites/events/files/slides/vault2016_0.pdf

How:
* umount the filesystem
* btrfsck --clear-space-cache v1 /block/device
* do a rw mount with the space_cache=v2 option added (only needed explicitly once)

During that mount, it will generate the free space tree by reading the extent tree and writing the inverse of it. This will take some time, depending on how fast your storage can do random reads with a cold disk cache.

For x86_64, using the free space cache v2 is fine since Linux 4.5. Up to 4.9, there was a bug for big-endian systems. So, with your kernel it's absolutely fine.

Why isn't this the default yet? It's because btrfs-progs doesn't have support to update the free space tree when doing offline modifications (like check --repair or btrfstune, which you hopefully don't need often anyway). So, until that's fully added, you need to do a `btrfsck --clear-space-cache v2`, then do the offline r/w action, and then generate the tree again on the next mount.

Additional tips (forgot to ask for your /proc/mounts before):
* Use the noatime mount option, so that only accessing files does not lead to changes in metadata, which lead to writes, which lead to cowing and writes in a new place, which lead to updates of the free space administration etc...
-- Hans van Kranenburg -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
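The space_cache=v2 migration steps from the message above can be collected into a script. A dry-run sketch: the device name is an invented placeholder, and commands are only echoed unless DO_IT=1 is set:

```shell
DEV=/dev/sdX    # placeholder block device - substitute the real one
MNT=/export     # mount point from the thread
run() { if [ "${DO_IT:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi; }

run umount "$MNT"
run btrfsck --clear-space-cache v1 "$DEV"
# space_cache=v2 only needs to be given explicitly once; the first rw mount
# builds the free space tree, which takes a while on a cold disk cache.
run mount -o space_cache=v2 "$DEV" "$MNT"
```

Reviewing the echoed plan first is worthwhile here, since the clear-and-remount has to happen with the filesystem offline.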
exclusive subvolume space missing
Hello,

I got a problem with btrfs running out of space (not THE Internet-wide, well known issues with interpretation).

The problem is: something eats the space while not running anything that justifies this. There were 18 GB free space available, suddenly it dropped to 8 GB and then to 63 MB during one night. I recovered 1 GB with rebalance -dusage=5 -musage=5 (or sth about), but it is being eaten right now, just as I'm writing this e-mail:

/dev/sda2   64G   63G  452M  100% /
/dev/sda2   64G   63G  365M  100% /
/dev/sda2   64G   63G  316M  100% /
/dev/sda2   64G   63G  287M  100% /
/dev/sda2   64G   63G  268M  100% /
/dev/sda2   64G   63G  239M  100% /
/dev/sda2   64G   63G  230M  100% /
/dev/sda2   64G   63G  182M  100% /
/dev/sda2   64G   63G  163M  100% /
/dev/sda2   64G   64G  153M  100% /
/dev/sda2   64G   64G  143M  100% /
/dev/sda2   64G   64G   96M  100% /
/dev/sda2   64G   64G   88M  100% /
/dev/sda2   64G   64G   57M  100% /
/dev/sda2   64G   64G   25M  100% /

while my rough calculations show, that there should be at least 10 GB of free space. After enabling quotas it is somehow confirmed:

# btrfs qgroup sh --sort=excl /
qgroupid      rfer        excl

0/5       16.00KiB    16.00KiB
[30 snapshots with about 100 MiB excl]
0/333     24.53GiB   305.79MiB
0/298     13.44GiB   312.74MiB
0/327     23.79GiB   427.13MiB
0/331     23.93GiB   930.51MiB
0/260     12.25GiB     3.22GiB
0/312     19.70GiB     4.56GiB
0/388     28.75GiB     7.15GiB
0/291     30.60GiB     9.01GiB  <- this is the running one

This is about 30 GB total excl (didn't find a switch to sum this up). I know I can't just add 'excl' to get usage, so tried to pinpoint the exact files that occupy space in 0/388 exclusively (this is the last snapshots taken, all of the snapshots are created from the running fs).

Now, the weird part for me is exclusive data count:

# btrfs sub sh ./snapshot-171125
[...]
Subvolume ID: 388
# btrfs fi du -s ./snapshot-171125
     Total   Exclusive  Set shared  Filename
  21.50GiB    63.35MiB    20.77GiB  snapshot-171125

How is that possible? This doesn't even remotely relate to 7.15 GiB from qgroup. The same amount differs in total: 28.75-21.50=7.25 GiB.
And the same happens with other snapshots, much more exclusive data shown in qgroup than actually found in files. So if not files, where is that space wasted? Metadata?

btrfs-progs-4.12 running on Linux 4.9.46.

best regards,

-- Tomasz Pala

-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
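On the "didn't find a switch to sum this up" point: there is indeed no built-in total, but the excl column can be summed externally with awk, converting the mixed KiB/MiB/GiB units to GiB. The sketch below is demonstrated on a pasted subset of the qgroup output above; in real use you would pipe `btrfs qgroup show --sort=excl /` into it instead of the here-doc:

```shell
# Sum column 3 (excl) of qgroup output, normalising units to GiB.
sum_excl() {
  awk '$3 ~ /[KMG]iB$/ {
         v = $3 + 0                    # awk takes the numeric prefix
         if ($3 ~ /KiB$/) v /= 1048576
         if ($3 ~ /MiB$/) v /= 1024
         s += v                        # accumulate in GiB
       }
       END { printf "%.2f GiB\n", s }'
}

sum_excl <<'EOF'
qgroupid  rfer      excl
0/260     12.25GiB  3.22GiB
0/312     19.70GiB  4.56GiB
0/388     28.75GiB  7.15GiB
0/291     30.60GiB  9.01GiB
EOF
```

With `--raw` output (plain bytes) the conversion branches become unnecessary and a bare `awk '{s+=$3} END {print s}'` suffices.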
Re: btrfs-transacti hammering the system
Sorry, I missed your in-line reply:

> 1) The one right above, btrfs_write_out_cache, is the write-out of the
> free space cache v1. Do you see this for multiple seconds going on, and
> does it match the time when it's writing X MB/s to disk?

It seems to only last until the next watch update.

[] io_schedule+0x16/0x40
[] get_request+0x23e/0x720
[] blk_queue_bio+0xc1/0x3a0
[] generic_make_request+0xf8/0x2a0
[] submit_bio+0x75/0x150
[] btrfs_map_bio+0xe5/0x2f0 [btrfs]
[] btree_submit_bio_hook+0x8c/0xe0 [btrfs]
[] submit_one_bio+0x63/0xa0 [btrfs]
[] flush_epd_write_bio+0x3b/0x50 [btrfs]
[] flush_write_bio+0xe/0x10 [btrfs]
[] btree_write_cache_pages+0x379/0x450 [btrfs]
[] btree_writepages+0x5d/0x70 [btrfs]
[] do_writepages+0x1c/0x70
[] __filemap_fdatawrite_range+0xaa/0xe0
[] filemap_fdatawrite_range+0x13/0x20
[] btrfs_write_marked_extents+0xe9/0x110 [btrfs]
[] btrfs_write_and_wait_transaction.isra.22+0x3d/0x80 [btrfs]
[] btrfs_commit_transaction+0x665/0x900 [btrfs]
[] transaction_kthread+0x18a/0x1c0 [btrfs]
[] kthread+0x109/0x140
[] ret_from_fork+0x25/0x30

The last three lines will stick around for a while. Is switching to space cache v2 something that everyone should be doing? Something that would be a good test at least?

> 2) How big is this filesystem? What does your `btrfs fi df
> /mountpoint` say?

# btrfs fi df /export/
Data, single: total=30.45TiB, used=30.25TiB
System, DUP: total=32.00MiB, used=3.62MiB
Metadata, DUP: total=66.50GiB, used=65.08GiB
GlobalReserve, single: total=512.00MiB, used=53.69MiB

> 3) What kind of workload are you running? E.g. how can you describe it
> within a range from "big files which just sit there" to "small writes
> and deletes all over the place all the time"?

It's a pretty light workload most of the time. It's a file system that exports two NFS shares to a small lab group. I believe it is more small reads all over a large file (MRI imaging) rather than small writes.

> 4) What kernel version is this? `uname -a` output?
# uname -a
Linux machine_name 4.12.8-custom #1 SMP Tue Aug 22 10:15:01 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
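[Editorial aside, not part of the thread: the `btrfs fi df` output quoted above can be mined for chunk slack, i.e. space allocated to block groups but not yet used. A hedged sketch; `parse_fi_df` and the unit handling are invented for illustration.]

```python
import re

# Hypothetical helper: parse `btrfs fi df` output into byte counts and
# report the allocated-vs-used gap per block group type.
UNITS = {"KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

def parse_fi_df(text):
    """Return {block group type: (total_bytes, used_bytes)}."""
    pat = re.compile(
        r"(\w+), (\w+): total=([\d.]+)([KMGT]iB), used=([\d.]+)([KMGT]iB)")
    rows = {}
    for kind, _profile, total, tu, used, uu in pat.findall(text):
        rows[kind] = (float(total) * UNITS[tu], float(used) * UNITS[uu])
    return rows

sample = """\
Data, single: total=30.45TiB, used=30.25TiB
System, DUP: total=32.00MiB, used=3.62MiB
Metadata, DUP: total=66.50GiB, used=65.08GiB
GlobalReserve, single: total=512.00MiB, used=53.69MiB
"""

for kind, (total, used) in parse_fi_df(sample).items():
    print(f"{kind}: {(total - used) / UNITS['GiB']:.2f} GiB slack")
```

On the output quoted in this thread, Data slack comes out to roughly 0.20 TiB, which is modest for a 30 TiB filesystem.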
Re: btrfs-transacti hammering the system
These seem to come up most often:

[] transaction_kthread+0x133/0x1c0 [btrfs]
[] kthread+0x109/0x140
[] ret_from_fork+0x25/0x30
Re: btrfs-transacti hammering the system
On 12/01/2017 04:24 PM, Matt McKinnon wrote:
> Thanks for this. Here's what I get:

Ok, and which one is displaying most of the time?

> [...]
>
> [] io_schedule+0x16/0x40
> [] get_request+0x23e/0x720
> [] blk_queue_bio+0xc1/0x3a0
> [] generic_make_request+0xf8/0x2a0
> [] submit_bio+0x75/0x150
> [] btrfs_map_bio+0xe5/0x2f0 [btrfs]
> [] btree_submit_bio_hook+0x8c/0xe0 [btrfs]
> [] submit_one_bio+0x63/0xa0 [btrfs]
> [] flush_epd_write_bio+0x3b/0x50 [btrfs]
> [] flush_write_bio+0xe/0x10 [btrfs]
> [] btree_write_cache_pages+0x379/0x450 [btrfs]
> [] btree_writepages+0x5d/0x70 [btrfs]
> [] do_writepages+0x1c/0x70
> [] __filemap_fdatawrite_range+0xaa/0xe0
> [] filemap_fdatawrite_range+0x13/0x20
> [] btrfs_write_marked_extents+0xe9/0x110 [btrfs]
> [] btrfs_write_and_wait_transaction.isra.22+0x3d/0x80 [btrfs]
> [] btrfs_commit_transaction+0x665/0x900 [btrfs]
>
> [...]
>
> [] io_schedule+0x16/0x40
> [] wait_on_page_bit+0xe8/0x120
> [] read_extent_buffer_pages+0x1cd/0x2e0 [btrfs]
> [] btree_read_extent_buffer_pages+0x9f/0x100 [btrfs]
> [] read_tree_block+0x32/0x50 [btrfs]
> [] read_block_for_search.isra.32+0x120/0x2e0 [btrfs]
> [] btrfs_next_old_leaf+0x215/0x400 [btrfs]
> [] btrfs_next_leaf+0x10/0x20 [btrfs]
> [] btrfs_lookup_csums_range+0x12e/0x410 [btrfs]
> [] csum_exist_in_range.isra.49+0x2a/0x81 [btrfs]
> [] run_delalloc_nocow+0x9b2/0xa10 [btrfs]
> [] run_delalloc_range+0x68/0x340 [btrfs]
> [] writepage_delalloc.isra.47+0xf0/0x140 [btrfs]
> [] __extent_writepage+0xc7/0x290 [btrfs]
> [] extent_write_cache_pages.constprop.53+0x2b5/0x450 [btrfs]
> [] extent_writepages+0x4d/0x70 [btrfs]
> [] btrfs_writepages+0x28/0x30 [btrfs]
> [] do_writepages+0x1c/0x70
> [] __filemap_fdatawrite_range+0xaa/0xe0
> [] filemap_fdatawrite_range+0x13/0x20
> [] btrfs_fdatawrite_range+0x20/0x50 [btrfs]
> [] __btrfs_write_out_cache+0x3d9/0x420 [btrfs]
> [] btrfs_write_out_cache+0x86/0x100 [btrfs]
> [] btrfs_write_dirty_block_groups+0x261/0x390 [btrfs]
> [] commit_cowonly_roots+0x1fb/0x290 [btrfs]
> [] btrfs_commit_transaction+0x434/0x900 [btrfs]

1) The one right above, btrfs_write_out_cache, is the write-out of the
free space cache v1. Do you see this for multiple seconds going on, and
does it match the time when it's writing X MB/s to disk?

2) How big is this filesystem? What does your `btrfs fi df /mountpoint`
say?

3) What kind of workload are you running? E.g. how can you describe it
within a range from "big files which just sit there" to "small writes
and deletes all over the place all the time"?

4) What kernel version is this? `uname -a` output?

--
Hans van Kranenburg
Re: btrfs-transacti hammering the system
Thanks for this. Here's what I get:

[] transaction_kthread+0x133/0x1c0 [btrfs]
[] kthread+0x109/0x140
[] ret_from_fork+0x25/0x30

...

[] io_schedule+0x16/0x40
[] get_request+0x23e/0x720
[] blk_queue_bio+0xc1/0x3a0
[] generic_make_request+0xf8/0x2a0
[] submit_bio+0x75/0x150
[] btrfs_map_bio+0xe5/0x2f0 [btrfs]
[] btree_submit_bio_hook+0x8c/0xe0 [btrfs]
[] submit_one_bio+0x63/0xa0 [btrfs]
[] flush_epd_write_bio+0x3b/0x50 [btrfs]
[] flush_write_bio+0xe/0x10 [btrfs]
[] btree_write_cache_pages+0x379/0x450 [btrfs]
[] btree_writepages+0x5d/0x70 [btrfs]
[] do_writepages+0x1c/0x70
[] __filemap_fdatawrite_range+0xaa/0xe0
[] filemap_fdatawrite_range+0x13/0x20
[] btrfs_write_marked_extents+0xe9/0x110 [btrfs]
[] btrfs_write_and_wait_transaction.isra.22+0x3d/0x80 [btrfs]
[] btrfs_commit_transaction+0x665/0x900 [btrfs]

...

[] io_schedule+0x16/0x40
[] wait_on_page_bit+0xe8/0x120
[] read_extent_buffer_pages+0x1cd/0x2e0 [btrfs]
[] btree_read_extent_buffer_pages+0x9f/0x100 [btrfs]
[] read_tree_block+0x32/0x50 [btrfs]
[] read_block_for_search.isra.32+0x120/0x2e0 [btrfs]
[] btrfs_next_old_leaf+0x215/0x400 [btrfs]
[] btrfs_next_leaf+0x10/0x20 [btrfs]
[] btrfs_lookup_csums_range+0x12e/0x410 [btrfs]
[] csum_exist_in_range.isra.49+0x2a/0x81 [btrfs]
[] run_delalloc_nocow+0x9b2/0xa10 [btrfs]
[] run_delalloc_range+0x68/0x340 [btrfs]
[] writepage_delalloc.isra.47+0xf0/0x140 [btrfs]
[] __extent_writepage+0xc7/0x290 [btrfs]
[] extent_write_cache_pages.constprop.53+0x2b5/0x450 [btrfs]
[] extent_writepages+0x4d/0x70 [btrfs]
[] btrfs_writepages+0x28/0x30 [btrfs]
[] do_writepages+0x1c/0x70
[] __filemap_fdatawrite_range+0xaa/0xe0
[] filemap_fdatawrite_range+0x13/0x20
[] btrfs_fdatawrite_range+0x20/0x50 [btrfs]
[] __btrfs_write_out_cache+0x3d9/0x420 [btrfs]
[] btrfs_write_out_cache+0x86/0x100 [btrfs]
[] btrfs_write_dirty_block_groups+0x261/0x390 [btrfs]
[] commit_cowonly_roots+0x1fb/0x290 [btrfs]
[] btrfs_commit_transaction+0x434/0x900 [btrfs]

...

[] tree_search_offset.isra.23+0x37/0x1d0 [btrfs]
Re: btrfs-transacti hammering the system
On 12/01/2017 03:25 PM, Matt McKinnon wrote:
>
> Is there any way to figure out what exactly btrfs-transacti is chugging
> on? I have a few file systems that seem to get wedged for days on end
> with this process pegged around 100%. I've stopped all snapshots, made
> sure no quotas were enabled, turned on autodefrag in the mount options,
> tried manual defragging, kernel upgrades, yet still this brings my
> system to a crawl.
>
> Network I/O to the system seems very tiny. The only I/O I see to the
> disk is btrfs-transacti writing a couple M/s.
>
> # time touch foo
>
> real 2m54.303s
> user 0m0.000s
> sys  0m0.002s
>
> # uname -r
> 4.12.8-custom
>
> # btrfs --version
> btrfs-progs v4.13.3
>
> Yes, I know I'm a bit behind there...

One of the simple things you can do is watch the stack traces of the
kernel thread:

watch 'cat /proc/<pid>/stack'

where <pid> is the pid of the btrfs-transaction process.

In there, you will see a pattern of recurring things, like, it's
searching for free space, it's writing out free space cache, or other
things. Correlate this with the disk write traffic and see if we get a
step further.

--
Hans van Kranenburg
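[Editorial aside, not part of the thread: the watch-the-stack suggestion above can be automated by sampling `/proc/<pid>/stack` repeatedly and tallying which frames recur. A hedged sketch; the tallying logic and helper names are invented, not from the thread.]

```python
import time
from collections import Counter

def tally_frames(samples):
    """Count how often each function appears across sampled stack traces."""
    counts = Counter()
    for trace in samples:
        for line in trace.splitlines():
            # "[] io_schedule+0x16/0x40 [btrfs]" -> "io_schedule"
            frame = line.strip().lstrip("[] ").split("+")[0]
            if frame:
                counts[frame] += 1
    return counts

def sample_stack(pid, n=30, interval=1.0):
    """Read /proc/<pid>/stack n times; typically needs root."""
    samples = []
    for _ in range(n):
        with open(f"/proc/{pid}/stack") as f:
            samples.append(f.read())
        time.sleep(interval)
    return samples

# Live use, with the btrfs-transaction thread's pid:
#   for frame, hits in tally_frames(sample_stack(pid)).most_common(10):
#       print(hits, frame)
```

A frame that dominates the tally (e.g. btrfs_write_out_cache vs. a free-space search) points at what the transaction commit is actually spending its time on.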
btrfs-transacti hammering the system
Hi All,

Is there any way to figure out what exactly btrfs-transacti is chugging
on? I have a few file systems that seem to get wedged for days on end
with this process pegged around 100%. I've stopped all snapshots, made
sure no quotas were enabled, turned on autodefrag in the mount options,
tried manual defragging, kernel upgrades, yet still this brings my
system to a crawl.

Network I/O to the system seems very tiny. The only I/O I see to the
disk is btrfs-transacti writing a couple M/s.

# time touch foo

real 2m54.303s
user 0m0.000s
sys  0m0.002s

# uname -r
4.12.8-custom

# btrfs --version
btrfs-progs v4.13.3

Yes, I know I'm a bit behind there...

-Matt
RE: btrfs-progs - failed btrfs replace on RAID1 seems to have left things in a wrong state
Duncan,

Thank you for your thorough response to my problem. I am now wiser in my
understanding of how btrfs works in RAID1 thanks to your words.

Last night I worked with someone in the IRC channel and we essentially
came to the exact same conclusion. I used wipefs -a on the errant drive,
rebooted, and voilà. As of last night the replace was running fine.
(Didn't have time to check this morning before heading out.)

The people on IRC had recommended filing a bug based on the fact that a
btrfs filesystem was created during the replace, but if I understand
your feedback, this has already been noted and there are patches being
considered.

As for your backup feedback, it has been thoroughly beaten into my head
over the last half-decade that RAID is not backup. Although, I'd argue
that RAID on btrfs or ZFS making use of snapshots is pretty darn close.
(Since it covers the fat-finger situation - although it doesn't cover
the MOBO frying your hard-drives situation.) But I do have offsite
backup - it's just that it's with a commercial provider (as opposed to,
say, a friend's house) so I didn't want to have to download 3TB if
things got borked. (I consider that my house burning/theft backup.) And
it IS in the plans to have a separate backup system in my house. I just
haven't spent the money yet as it's currently a bit tight. But I do
appreciate that you took the time to explain that in case I didn't know
about it. And it's on the mailing list archives now, so if someone else
is under the misunderstanding that RAID is backup they can also be
educated.

Anyway, this is running a bit long. I just want to conclude by again
offering my thanks for your very thorough response. If I hadn't been
able to obtain help on the IRC, this would have put me on the right
path. And it came with knowledge rather than just a list of
instructions. So thanks for that as well.
--
Eric Mesa
http://www.ericmesa.com
Re: [PATCHSET v2] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs from the root cgroup
On Wed 29-11-17 13:38:26, Chris Mason wrote:
> On 11/29/2017 12:05 PM, Tejun Heo wrote:
> > On Wed, Nov 29, 2017 at 09:03:30AM -0800, Tejun Heo wrote:
> > > Hello,
> > >
> > > On Wed, Nov 29, 2017 at 05:56:08PM +0100, Jan Kara wrote:
> > > > What has happened with this patch set?
> > >
> > > No idea. cc'ing Chris directly. Chris, if the patchset looks good,
> > > can you please route them through the btrfs tree?
> >
> > lol looking at the patchset again, I'm not sure that's obviously the
> > right tree. It can either be cgroup, block or btrfs. If no one
> > objects, I'll just route them through cgroup.
>
> We'll have to coordinate a bit during the next merge window but I don't
> have a problem with these going in through cgroup. Dave does this sound
> good to you?

Also I was wondering about another thing: How does this play with
Josef's series for metadata writeback (metadata specific accounting and
dirty writeout)? Would the per-inode selection of cgroup writeback still
be needed once Josef's series is applied, since metadata writeback then
won't be associated with any particular mapping anymore?

Honza
--
Jan Kara
SUSE Labs, CR
Re: btrfs-progs - failed btrfs replace on RAID1 seems to have left things in a wrong state
On 1 December 2017 at 08:18, Duncan <1i5t5.dun...@cox.net> wrote:
>
> When udev sees a device it triggers a btrfs device scan, which lets
> btrfs know which devices belong to which individual btrfs. But once it
> associates a device with a particular btrfs, there's nothing to
> unassociate it -- the only way to do that on a running kernel is to
> successfully complete a btrfs device remove or replacement... and your
> replace didn't complete due to error.
>
> Of course the other way to do it is to reboot, fresh kernel, fresh
> btrfs state, and it learns again what devices go with which btrfs when
> the appearing devices trigger the udev rule that triggers a btrfs scan.

Or reload the btrfs module.
[PATCH 5/5] btrfs: Greatly simplify btrfs_read_dev_super
Currently this function executes the inner loop at most once, due to the
`i = 0; i < 1` loop condition. Furthermore, the
`btrfs_super_generation(super) > transid` part of the if condition is
never evaluated, because `latest` is always NULL on the first (and only)
iteration, so the first part of the condition always triggers. The gist
of btrfs_read_dev_super is really to read the first superblock.

Signed-off-by: Nikolay Borisov
---
 fs/btrfs/disk-io.c | 27 ++++-----------------------
 1 file changed, 4 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 82c96607fc46..6d5f632fd1e7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3170,37 +3170,18 @@ int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num,
 struct buffer_head *btrfs_read_dev_super(struct block_device *bdev)
 {
 	struct buffer_head *bh;
-	struct buffer_head *latest = NULL;
-	struct btrfs_super_block *super;
-	int i;
-	u64 transid = 0;
-	int ret = -EINVAL;
+	int ret;
 
 	/* we would like to check all the supers, but that would make
 	 * a btrfs mount succeed after a mkfs from a different FS.
 	 * So, we need to add a special mount option to scan for
 	 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
 	 */
-	for (i = 0; i < 1; i++) {
-		ret = btrfs_read_dev_one_super(bdev, i, &bh);
-		if (ret)
-			continue;
-
-		super = (struct btrfs_super_block *)bh->b_data;
-
-		if (!latest || btrfs_super_generation(super) > transid) {
-			brelse(latest);
-			latest = bh;
-			transid = btrfs_super_generation(super);
-		} else {
-			brelse(bh);
-		}
-	}
-
-	if (!latest)
+	ret = btrfs_read_dev_one_super(bdev, 0, &bh);
+	if (ret)
 		return ERR_PTR(ret);
 
-	return latest;
+	return bh;
 }
 
 /*
--
2.7.4
[PATCH 1/5] btrfs: Remove dead code
trans was statically assigned to NULL and this never changed over the
course of btrfs_get_extent. So remove any code which checks whether
trans != NULL and just hardcode the fact trans is always NULL. This
fixes CID#112806.

Signed-off-by: Nikolay Borisov
---
 fs/btrfs/inode.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 57785eadb95c..92d140b06271 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6943,7 +6943,6 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
 	struct extent_map *em = NULL;
 	struct extent_map_tree *em_tree = &inode->extent_tree;
 	struct extent_io_tree *io_tree = &inode->io_tree;
-	struct btrfs_trans_handle *trans = NULL;
 	const bool new_inline = !page || create;
 
 	read_lock(&em_tree->lock);
@@ -6984,8 +6983,7 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
 		path->reada = READA_FORWARD;
 	}
 
-	ret = btrfs_lookup_file_extent(trans, root, path,
-				       objectid, start, trans != NULL);
+	ret = btrfs_lookup_file_extent(NULL, root, path, objectid, start, 0);
 	if (ret < 0) {
 		err = ret;
 		goto out;
@@ -7181,11 +7179,6 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
 	trace_btrfs_get_extent(root, inode, em);
 	btrfs_free_path(path);
 
-	if (trans) {
-		ret = btrfs_end_transaction(trans);
-		if (!err)
-			err = ret;
-	}
 	if (err) {
 		free_extent_map(em);
 		return ERR_PTR(err);
--
2.7.4
[PATCH 3/5] btrfs: Fix possible off-by-one in btrfs_search_path_in_tree
The name char array passed to btrfs_search_path_in_tree is of size
BTRFS_INO_LOOKUP_PATH_MAX (4080), so the valid indexes are in the range
[0, 4079]. Currently the code initializes ptr with the define itself,
which points one element past the end of the array: an off-by-one.

Signed-off-by: Nikolay Borisov
---
 fs/btrfs/ioctl.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index e8adebc8c1b0..fc148b7c4265 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2206,7 +2206,7 @@ static noinline int btrfs_search_path_in_tree(struct btrfs_fs_info *info,
 	if (!path)
 		return -ENOMEM;
 
-	ptr = &name[BTRFS_INO_LOOKUP_PATH_MAX];
+	ptr = &name[BTRFS_INO_LOOKUP_PATH_MAX - 1];
 
 	key.objectid = tree_id;
 	key.type = BTRFS_ROOT_ITEM_KEY;
@@ -2272,8 +2272,8 @@ static noinline int btrfs_search_path_in_tree(struct btrfs_fs_info *info,
 static noinline int btrfs_ioctl_ino_lookup(struct file *file,
 					   void __user *argp)
 {
-struct btrfs_ioctl_ino_lookup_args *args;
-struct inode *inode;
+	struct btrfs_ioctl_ino_lookup_args *args;
+	struct inode *inode;
 	int ret = 0;
 
 	args = memdup_user(argp, sizeof(*args));
--
2.7.4
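[Editorial aside, not part of the thread: the off-by-one above comes from building a path backwards in a fixed buffer. A toy Python model of that technique (buffer size, helper name, and path components are all invented for illustration, not the kernel's actual code): writing must start at index size - 1, the last valid slot, because index size is one past the end.]

```python
# Toy model of how btrfs_search_path_in_tree assembles "a/b/c" from
# leaf to root inside a fixed-size buffer, prepending each component.
def build_path_backwards(components, size):
    """components given root-to-leaf, e.g. ["a", "b", "c"] -> "a/b/c"."""
    buf = bytearray(size)
    ptr = size - 1            # like ptr = &name[SIZE - 1]: last valid index
    buf[ptr] = 0              # buf[size] would be one past the end
    for name in reversed(components):   # walk from leaf up to root
        data = name.encode()
        if ptr - len(data) - 1 < 0:
            raise ValueError("-ENAMETOOLONG")
        ptr -= len(data)
        buf[ptr:ptr + len(data)] = data
        ptr -= 1
        buf[ptr] = ord("/")   # separator before each component
    # skip the leading "/" and the trailing NUL
    return bytes(buf[ptr + 1:size - 1]).decode()
```

In C the same pattern with `ptr = &name[SIZE]` writes the NUL terminator out of bounds, which is exactly what the patch corrects.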
[PATCH 4/5] btrfs: Remove redundant NULL check
Before returning hole_em in btrfs_get_extent_fiemap we check whether it
is different from NULL. However, by the time this NULL check is reached
we already know hole_em is not NULL, because it points to the em we
found and has already been dereferenced.

Signed-off-by: Nikolay Borisov
---
 fs/btrfs/inode.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 92d140b06271..9e0473c883ce 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7300,9 +7300,8 @@ struct extent_map *btrfs_get_extent_fiemap(struct btrfs_inode *inode,
 			em->block_start = EXTENT_MAP_DELALLOC;
 			em->block_len = found;
 		}
-	} else if (hole_em) {
+	} else
 		return hole_em;
-	}
 
 out:
 	free_extent_map(hole_em);
--
2.7.4
[PATCH 2/5] btrfs: Remove dead code
'clear' is always set to 0 (BTRFS_FEATURE_COMPAT_SAFE_CLEAR,
BTRFS_FEATURE_COMPAT_RO_SAFE_CLEAR and BTRFS_FEATURE_INCOMPAT_SAFE_CLEAR
are all defined to 0). So remove the code that logically can never
execute.

Signed-off-by: Nikolay Borisov
---
 fs/btrfs/sysfs.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index a8bafed931f4..37dbf2fccedc 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -84,8 +84,6 @@ static int can_modify_feature(struct btrfs_feature_attr *fa)
 
 	if (set & fa->feature_bit)
 		val |= 1;
-	if (clear & fa->feature_bit)
-		val |= 2;
 
 	return val;
 }
--
2.7.4
[PATCH 0/5] Misc cleanups
Here's a bunch of stuff that Coverity found; this survived a full
xfstests run.

Nikolay Borisov (5):
  btrfs: Remove dead code
  btrfs: Remove dead code
  btrfs: Fix possible off-by-one in btrfs_search_path_in_tree
  btrfs: Remove redundant NULL check
  btrfs: Greatly simplify btrfs_read_dev_super

 fs/btrfs/disk-io.c | 27 ---
 fs/btrfs/inode.c   | 12 ++--
 fs/btrfs/ioctl.c   |  6 +++---
 fs/btrfs/sysfs.c   |  2 --
 4 files changed, 9 insertions(+), 38 deletions(-)

--
2.7.4