Re: How to delete this snapshot, and how to succeed with balancing?
Simon King posted on Sat, 31 Oct 2015 18:31:45 +0100 as excerpted:

> I know that "df" is different from "btrfs fi df". However, I see that df
> shows significantly more free space after balancing. Also, when my
> computer became unusable, the problem disappeared by balancing and
> defragmentation (deleting the old snapshots was not enough).
>
> Unfortunately, df also shows significantly less free space after
> UNSUCCESSFUL balancing.

On a btrfs, df is hardly relevant at all, except to the extent that if you're trying to copy a 100 MB file and df says there's only 50 MB of room, obviously there are going to be problems.

Btrfs actually has two-stage space allocation. At the first stage, entirely unallocated space is taken in largish chunks, normally separately for data and metadata: nominally 1 GiB for data chunks (tho larger or smaller is possible depending on the size of the filesystem and how close to fully chunk-allocated it is) and 256 MiB for metadata chunks. But metadata chunks are normally allocated and used in dup mode, two at a time, on a single-device btrfs, so 512 MiB at a time.

At the second stage, space is used from already allocated chunks as needed for files (data) or metadata. And particularly on older kernels, this is where the problem arises. Over time, as files are created and deleted, all unallocated space tends to be allocated as data chunks, such that when the existing metadata chunks get full, there's no unallocated space left from which to allocate more metadata chunks. It's all tied up in data chunks, many of which might be mostly or entirely empty, as the files they once were allocated to contain have since been deleted or moved elsewhere (due to btrfs copy-on-write).

On newer kernels, entirely empty chunks are automatically deleted, significantly easing the problem, tho it can still happen if there are a lot of mostly but not entirely empty data chunks.

Which is why df isn't always particularly reliable on btrfs, because it doesn't know about all this chunk preallocation stuff, and will (again, at least on older kernels; AFAIK newer ones have improved this to some extent but it's still not ideal) happily report all that empty data-chunk space as available for files, not knowing it's out of space to store metadata. Often, if you were to have one big file take all the space df reports, that would work, because tracking a single file uses only a relatively small bit of metadata space. But try to use only a tenth of the space with a thousand much smaller files, and the remaining metadata space may well be exhausted, allowing no more file creation, even tho df is still saying there's lots of room left, because it's all in data chunks!

Which is where balance comes in, since in rewriting the chunks it consolidates them, eliminating chunks when, say, three 2/3-full chunks combine into only two full chunks, returning the freed space to unallocated, so it can be allocated for either data or metadata as needed, once again.

As for getting out of the tight spot you're in ATM, with all would-be unallocated space apparently (you didn't post btrfs fi show and df output, but this is what the symptoms suggest) gone, tied up in mostly empty data chunks, without even enough space to easily balance those data chunks to free up more space by consolidating them...

There's some discussion on the btrfs wiki, in the free-space questions on the FAQ, and similarly in the problem-FAQ (watch the link wrap):

FAQ:
https://btrfs.wiki.kernel.org/index.php/FAQ#Help.21_I_ran_out_of_disk_space.21

Also see FAQ sections 4.6-4.9, discussing free space, and 4.12, discussing balance.

Problem-FAQ:
https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#I_get_.22No_space_left_on_device.22_errors.2C_but_df_says_I.27ve_got_lots_of_space

Basically, if filters won't let you do it, you can try deleting large files -- assuming they're not also referenced by still existing snapshots. That might empty a data chunk or two, allowing a balance -dusage=0 to eliminate it, giving you enough room to try a higher dusage number, perhaps 5% or 10%, then 20 and 50. (Above 50% the time will go up while the possible payback goes down, and it shouldn't be necessary until the filesystem gets real close to actually full, tho on my ssd, speeds are fast enough that I'll sometimes try up to 70% or so.)

If it's too tight for that, or everything's snapshotted on snapshots you don't want to or can't delete, you can try adding (btrfs device add) a device temporarily. The device should be several gigs in size, minimum; even a few-GiB USB thumbdrive or the like can work, tho access can be slow. That should give you enough additional space to do the balance -dusage= thing, which, assuming it does consolidate nearly empty data chunks, freeing the extra space they took, should free up enough newly unallocated space on the original device, to do a btrfs device delete of the temporarily added device, returning everything that was on it temporarily,
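The escalation described in this message (empty data chunks first, then rising usage thresholds, with a temporarily added device as the fallback) might be sketched as below. This is a hedged illustration, not a tested recovery procedure: the functions only print the commands so that nothing touches the filesystem, and the names balance_plan, temp_device_plan and the device /dev/sdX are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them. Copy a line
# and run it as root yourself once you are happy with it.

# Escalating data-chunk balance: empty chunks first, then raise the
# usage threshold step by step (the percentage ladder is illustrative).
balance_plan() {
    mnt=$1
    for pct in 0 5 10 20 50; do
        printf 'btrfs balance start -dusage=%s %s\n' "$pct" "$mnt"
    done
}

# Fallback when even -dusage=0 fails with ENOSPC: borrow a device,
# balance, then remove the device again.
# The device argument (e.g. /dev/sdX) is a placeholder, not a real path.
temp_device_plan() {
    mnt=$1
    dev=$2
    printf 'btrfs device add %s %s\n' "$dev" "$mnt"
    balance_plan "$mnt"
    printf 'btrfs device delete %s %s\n' "$dev" "$mnt"
}

balance_plan /
```

Printing rather than executing keeps the sketch harmless; each balance step should only be attempted after the previous one has succeeded.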
Re: How to delete this snapshot, and how to succeed with balancing?
Hi Hugo,

On 31.10.2015 at 17:41, Hugo Mills wrote:
>> linux-va3e:~ # uname -a
>> Linux linux-va3e.site 3.16.7-29-desktop #1 SMP PREEMPT Fri Oct 23
>> 00:46:04 UTC 2015 (6be6a97) x86_64 x86_64 x86_64 GNU/Linux
>
> OK, that's a bit old -- you would probably do well to upgrade this
> anyway, regardless of the issues you're having. (I'd recommend 4.1 at
> the moment; there's a bug in 4.2 at the moment that affects
> balancing).

The latest version of openSuse is Tumbleweed; in a few days there will be openSuse Leap. I am not sure what kernel it would give me.

> I thought snapper could automatically delete old snapshots. (I've
> never used it, though, so I'm not sure). Worth looking at the snapper
> config to see if you can tell it how many to keep.

Probably it can, right.

> You're telling it to move the first two chunks with less than 10%
> usage. If all the other chunks are full, and there are two chunks (one
> data, one metadata) with less than 10% usage, then they'll be moved to
> two new chunks... with less than 10% usage. So it's perfectly possible
> that the same command will show the same output.

Do I understand correctly: In that situation, balancing would have no benefit, as two old chunks are moved to two new chunks? Then why are they moved at all?

> Incidentally, I would suggest using -dlimit=2 on its own, rather
> than both limit and usage.

I combined the two, since -dlimit on its own won't work:

linux-va3e:~ # btrfs balance start -dlimit=2 /
ERROR: error during balancing '/' - No space left on device
There may be more info in syslog - try dmesg | tail

> "btrfs balance start /" should rebalance the whole filesystem --

linux-va3e:~ # btrfs balance start /
ERROR: error during balancing '/' - No space left on device
There may be more info in syslog - try dmesg | tail
linux-va3e:~ # dmesg | tail
[ 9814.499013] BTRFS info (device sda2): found 8153 extents
[ 9815.254270] BTRFS info (device sda2): relocating block group 820062978048 flags 36
[ 9826.335122] BTRFS info (device sda2): found 8182 extents
[ 9826.858482] BTRFS info (device sda2): relocating block group 805064146944 flags 36
[ 9839.444820] BTRFS info (device sda2): found 8184 extents
[ 9839.822108] BTRFS info (device sda2): relocating block group 794595164160 flags 36
[ 9850.456697] BTRFS info (device sda2): found 8143 extents
[ 9850.778264] BTRFS info (device sda2): relocating block group 794460946432 flags 36
[ 9862.546336] BTRFS info (device sda2): found 8140 extents
[ 9862.890330] BTRFS info (device sda2): 12 enospc errors during balance

> not that you'd need to for purposes of dealing with space usage
> issues.

I know that "df" is different from "btrfs fi df". However, I see that df shows significantly more free space after balancing. Also, when my computer became unusable, the problem disappeared by balancing and defragmentation (deleting the old snapshots was not enough).

Unfortunately, df also shows significantly less free space after UNSUCCESSFUL balancing.

> You may have more success using mkfs.btrfs --mixed when you create
> the FS, which puts data and metadata in the same chunks.

Can I do this in the running system? Or would that only be an option during upgrade of openSuse Harlequin to Tumbleweed/Leap? Or even worse: Only an option after nuking the old installation and installing a new one from scratch?

Best regards,
Simon
Re: How to delete this snapshot, and how to succeed with balancing?
On Sat, Oct 31, 2015 at 06:31:45PM +0100, Simon King wrote:
> Hi Hugo,
>
> On 31.10.2015 at 17:41, Hugo Mills wrote:
> >> linux-va3e:~ # uname -a
> >> Linux linux-va3e.site 3.16.7-29-desktop #1 SMP PREEMPT Fri Oct 23
> >> 00:46:04 UTC 2015 (6be6a97) x86_64 x86_64 x86_64 GNU/Linux
> >
> > OK, that's a bit old -- you would probably do well to upgrade this
> > anyway, regardless of the issues you're having. (I'd recommend 4.1 at
> > the moment; there's a bug in 4.2 at the moment that affects
> > balancing).
>
> The latest version of openSuse is Tumbleweed, in a few days there will
> be openSuse Leap; I am not sure what kernel it would give me.
>
> > I thought snapper could automatically delete old snapshots. (I've
> > never used it, though, so I'm not sure). Worth looking at the snapper
> > config to see if you can tell it how many to keep.
>
> Probably it can, right.
>
> > You're telling it to move the first two chunks with less than 10%
> > usage. If all the other chunks are full, and there are two chunks (one
> > data, one metadata) with less than 10% usage, then they'll be moved to
> > two new chunks... with less than 10% usage. So it's perfectly possible
> > that the same command will show the same output.
>
> Do I understand correctly: In that situation, balancing would have no
> benefit, as two old chunks are moved to two new chunks? Then why are
> they moved at all?

Because that's what balance does -- it's a fairly blunt tool for one thing (evening out the usage across multiple devices) that happens to have some useful side-effects (compacting space usage into smaller numbers of block groups and freeing up the empty ones).

> > Incidentally, I would suggest using -dlimit=2 on its own, rather
> > than both limit and usage.
>
> I combined the two, since -dlimit on its own won't work:
>
> linux-va3e:~ # btrfs balance start -dlimit=2 /
> ERROR: error during balancing '/' - No space left on device
> There may be more info in syslog - try dmesg | tail

And this is with a filesystem that's not fully allocated? (i.e. btrfs fi show indicates that used and total are different for each device). If that's the case, then you may have hit a known but unfixed bug to do with space allocation.

> > "btrfs balance start /" should rebalance the whole filesystem --
>
> linux-va3e:~ # btrfs balance start /
> ERROR: error during balancing '/' - No space left on device
> There may be more info in syslog - try dmesg | tail
> linux-va3e:~ # dmesg | tail
> [ 9814.499013] BTRFS info (device sda2): found 8153 extents
> [ 9815.254270] BTRFS info (device sda2): relocating block group
> 820062978048 flags 36
> [ 9826.335122] BTRFS info (device sda2): found 8182 extents
> [ 9826.858482] BTRFS info (device sda2): relocating block group
> 805064146944 flags 36
> [ 9839.444820] BTRFS info (device sda2): found 8184 extents
> [ 9839.822108] BTRFS info (device sda2): relocating block group
> 794595164160 flags 36
> [ 9850.456697] BTRFS info (device sda2): found 8143 extents
> [ 9850.778264] BTRFS info (device sda2): relocating block group
> 794460946432 flags 36
> [ 9862.546336] BTRFS info (device sda2): found 8140 extents
> [ 9862.890330] BTRFS info (device sda2): 12 enospc errors during balance
>
> > not that you'd need to for purposes of dealing with space usage
> > issues.
>
> I know that "df" is different from "btrfs fi df". However, I see that df
> shows significantly more free space after balancing. Also, when my
> computer became unusable, the problem disappeared by balancing and
> defragmentation (deleting the old snapshots was not enough).
>
> Unfortunately, df also shows significantly less free space after
> UNSUCCESSFUL balancing.
>
> > You may have more success using mkfs.btrfs --mixed when you create
> > the FS, which puts data and metadata in the same chunks.
>
> Can I do this in the running system? Or would that only be an option
> during upgrade of openSuse Harlequin to Tumbleweed/Leap? Or even worse:
> Only an option after nuking the old installation and installing a new
> one from scratch?

You'd have to recreate the FS, so it's a matter of a reinstall, or nuking it and restoring from your backups.

Hugo.

--
Hugo Mills             | Le Corbusier's plan for improving Paris involved the
hugo@... carfax.org.uk | assassination of the city, and its rebirth as tower
http://carfax.org.uk/  | blocks.
PGP: E2AB1DE4          | Robert Hughes, The Shock of the New
Re: How to delete this snapshot, and how to succeed with balancing?
Hi!

On 31.10.2015 at 19:33, Hugo Mills wrote:
>> I combined the two, since -dlimit on its own won't work:
>>
>> linux-va3e:~ # btrfs balance start -dlimit=2 /
>> ERROR: error during balancing '/' - No space left on device
>> There may be more info in syslog - try dmesg | tail
>
> And this is with a filesystem that's not fully allocated?
> (i.e. btrfs fi show indicates that used and total are different for
> each device). If that's the case, then you may have hit a known but
> unfixed bug to do with space allocation.

linux-va3e:~ # btrfs fi show
Label: none  uuid: 656dc65f-240b-4137-a490-0175717dd7fa
        Total devices 1 FS bytes used 13.71GiB
        devid    1 size 20.00GiB used 16.88GiB path /dev/sda2

btrfs-progs v4.0+20150429

Is there a manual work-around?

>> Can I do this in the running system? Or would that only be an option
>> during upgrade of openSuse Harlequin to Tumbleweed/Leap? Or even worse:
>> Only an option after nuking the old installation and installing a new
>> one from scratch?
>
> You'd have to recreate the FS, so it's a matter of a reinstall, or
> nuking it and restoring from your backups.

OK. So, I'll try to find out whether it is better to move on to Tumbleweed or to Leap (btw, I found out that the latter is based on the 4.1 kernel).

Best regards,
Simon
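Whether the filesystem is "fully allocated" in Hugo's sense can be read off the devid line: the device size minus the space already handed out to chunks ("used") is what remains unallocated. A minimal sketch of that arithmetic follows, assuming both figures are printed in GiB as in the output above; parse_unallocated is a hypothetical helper name, not part of btrfs-progs.

```shell
#!/bin/sh
# Reads `btrfs fi show` output on stdin and prints, per devid line,
# size minus used (the unallocated space) in GiB.
# Assumption: both values carry a GiB suffix, as in this thread.
parse_unallocated() {
    awk '/devid/ {
        for (i = 1; i < NF; i++) {
            if ($i == "size") size = $(i + 1)
            if ($i == "used") used = $(i + 1)
        }
        sub(/GiB$/, "", size)
        sub(/GiB$/, "", used)
        printf "%.2f\n", size - used
    }'
}

# The devid line from the message above.
echo "devid    1 size 20.00GiB used 16.88GiB path /dev/sda2" | parse_unallocated
```

For the figures in this message the result is 3.12, so roughly 3 GiB is still unallocated, which is why the ENOSPC during balance points to the allocation bug Hugo mentions rather than to a genuinely full device.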
Re: How to delete this snapshot, and how to succeed with balancing?
On Sat, Oct 31, 2015 at 04:26:11PM +0100, Simon King wrote:
> Hi!
>
> From the messages I see in this forum, I got the impression that it is a
> developer forum and not a help forum. I seek help. So, please point me
> to the right place if I shouldn't ask my questions here.

No, you're good here. We do support as well as development. :)

> Since I am new, first my data:
>
> linux-va3e:~ # uname -a
> Linux linux-va3e.site 3.16.7-29-desktop #1 SMP PREEMPT Fri Oct 23
> 00:46:04 UTC 2015 (6be6a97) x86_64 x86_64 x86_64 GNU/Linux

OK, that's a bit old -- you would probably do well to upgrade this anyway, regardless of the issues you're having. (I'd recommend 4.1 at the moment; there's a bug in 4.2 at the moment that affects balancing).

> linux-va3e:~ # btrfs --version
> btrfs-progs v4.0+20150429
>
> linux-va3e:~ # btrfs fi show
> Label: none  uuid: 656dc65f-240b-4137-a490-0175717dd7fa
>         Total devices 1 FS bytes used 13.49GiB
>         devid    1 size 20.00GiB used 16.19GiB path /dev/sda2
>
> btrfs-progs v4.0+20150429
>
> linux-va3e:~ # btrfs fi df /
> Data, single: total=12.62GiB, used=12.14GiB
> System, DUP: total=32.00MiB, used=16.00KiB
> Metadata, DUP: total=1.75GiB, used=1.35GiB
> GlobalReserve, single: total=304.00MiB, used=0.00B
>
> dmesg > dmesg.log is attached.
>
>
> When I installed openSuse 13.2 on my computer, I used the default:
> - btrfs is used for a root partition of 20GB
> - A program called "snapper" creates snapshots of the system whenever a
> change is done (i.e., whenever a new program is installed or an old
> program is upgraded).
>
> After a while, the root partition was full because of btrfs's metadata.
> In an openSuse forum, people told me I should regularly delete the
> snapshots, and should "btrfs balance" and "btrfs fi defragment" in order
> to keep the metadata under control.
>
> It was effective in the sense that I could use my computer again. It is
> annoying that the user has to manually delete snapshots and has to
> remember to regularly do balancing, though.

I thought snapper could automatically delete old snapshots. (I've never used it, though, so I'm not sure). Worth looking at the snapper config to see if you can tell it how many to keep.

> Now to my current problem: It seems that my root partition gradually
> fills up. So, I am afraid that in a few weeks my computer will be broken
> again. And that happens although I keep deleting old snapshots and try
> balancing.
>
> My questions:
> 1. There is one old snapshot that I can not delete:
> # snapper -c root delete 318
> Fehler beim Löschen des Schnappschusses. [Error deleting the snapshot.]
> The error message does not hint *why* the snapshot can not be deleted.
> Here is one observation that may indicate what goes wrong:
> # find / -name 318
> /.snapshots/318
> find: File system loop detected; ‘/.snapshots/318/snapshot’ is part of
> the same file system loop as ‘/’.
>
> So, what can I do to delete the snapshot to hopefully free some memory?

That one, I don't know what's happening, I'm afraid.

> 2. When I simply do "btrfs balance start /", then it says the device is
> full. So, I tried
> linux-va3e:~ # btrfs balance start -mlimit=1 -dlimit=2 -dusage=10 /
> Done, had to relocate 2 out of 28 chunks
> Fine. But when I try it again, it still says that it had to relocate 2
> out of 28 chunks! Shouldn't it be the case that the work has already
> been done, so that no further relocation is needed with the same parameters?

You're telling it to move the first two chunks with less than 10% usage. If all the other chunks are full, and there are two chunks (one data, one metadata) with less than 10% usage, then they'll be moved to two new chunks... with less than 10% usage. So it's perfectly possible that the same command will show the same output.

Incidentally, I would suggest using -dlimit=2 on its own, rather than both limit and usage.

You shouldn't normally need to run it on the metadata: The usual case of early ENOSPC is that all of the block groups in the filesystem are allocated (i.e. the space is reserved for either data or metadata, not necessarily used), and then the metadata fills up, while there is still lots of space for data allocated but unused. The balance simply moves some data around so that one or more of the data block groups can be freed up, giving the FS some more space that it can allocate to metadata.

> 3. When I increase the parameters, I always come to the point that there
> is no space left on device. So, how can I achieve full balance of the
> system?

"btrfs balance start /" should rebalance the whole filesystem -- not that you'd need to for purposes of dealing with space usage issues.

> One last remark: On the openSuse forum, I was advised to re-install the
> system, and either reserve at least 50GB for the root partition, or drop
> btrfs and use ext4 for the root partition. I would like to avoid such
> trouble and hope that you can tell me how to sanitise my root partition,
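As a rough cross-check of the allocated-versus-used distinction, the btrfs fi df figures quoted in this message can be added up and compared with the "used" value on the devid line of btrfs fi show. The sketch below rests on two assumptions: that DUP block groups occupy twice their listed total on the device, and that the GlobalReserve should be treated as unavailable metadata space. The helper names raw_allocated and meta_headroom are hypothetical.

```shell
#!/bin/sh
# Figures from the `btrfs fi df /` output quoted in this thread, in GiB.
data_total=12.62
meta_total=1.75
meta_used=1.35
sys_total=0.03125      # 32 MiB
reserve=0.296875       # GlobalReserve, 304 MiB

# Assumption: DUP keeps two copies, so Metadata and System count twice
# toward the space allocated on the device.
raw_allocated() {
    awk -v d="$1" -v m="$2" -v s="$3" \
        'BEGIN { printf "%.4f\n", d + 2 * m + 2 * s }'
}

# Assumption: the global reserve is carved out of metadata and is not
# available for ordinary metadata writes.
meta_headroom() {
    awk -v t="$1" -v u="$2" -v r="$3" \
        'BEGIN { printf "%.2f\n", t - u - r }'
}

raw_allocated "$data_total" "$meta_total" "$sys_total"
meta_headroom "$meta_total" "$meta_used" "$reserve"
```

Under those assumptions the allocation sums to 16.1825 GiB, which matches the 16.19GiB "used" figure within rounding, and leaves only about 0.10 GiB of metadata headroom while data still has roughly 0.5 GiB allocated but unused.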
Re: How to delete this snapshot, and how to succeed with balancing?
Hi Simon,

>>> linux-va3e:~ # btrfs balance start -dlimit=2 /
>>> ERROR: error during balancing '/' - No space left on device
>>> There may be more info in syslog - try dmesg | tail
>>
>> And this is with a filesystem that's not fully allocated?
>> (i.e. btrfs fi show indicates that used and total are different for
>> each device). If that's the case, then you may have hit a known but
>> unfixed bug to do with space allocation.
>
> linux-va3e:~ # btrfs fi show
> Label: none  uuid: 656dc65f-240b-4137-a490-0175717dd7fa
>         Total devices 1 FS bytes used 13.71GiB
>         devid    1 size 20.00GiB used 16.88GiB path /dev/sda2
>
> btrfs-progs v4.0+20150429
>
> Is there a manual work-around?

For 'No space left on device', a trick I once saw is to run:

btrfs balance start -dusage=0 -musage=0 /

Under certain circumstances (I don't remember which kernel, tools versions etc), this enables you to create files again on the filesystem.

Looking at the btrfs fi df / output, I don't see a real need for balancing, the numbers can be much different and then balance might be useful.

The 318 snapshot is more of a problem and you should get rid of (some/unneeded/all) snapshots first. The default openSuse snapshot ages are high (months and years), so maybe you want to edit the configs

/etc/snapper/configs/<>
/etc/sysconfig/snapper

to keep the snapshot age at just 1 or 2 days or so, but it really depends on how you use the notebook and the subvolumes on the filesystem. Or maybe you just disable snapper snapshotting completely, as 20GB will quite easily get too full with the default snapper config.

A crontask will automatically delete too old snapshots based on the snapper config. If the command (with 318 or a higher snapshot number)

snapper -c root delete 1-318

does not work, or the crontask fails, try

btrfs sub del /.snapshots//snapshot

The 318 related error might also be fixed/worked around by newer tools/kernel. Maybe get newer (4.2.3) btrfs tools from this repo

http://download.opensuse.org/repositories/filesystems/openSUSE_13.2/

and find some 4.1 kernel rpm for openSuse 13.2 (or compile your own from kernel.org).

You could also start with a Leap/Tumbleweed liveDVD and mount your /dev/sda2 somewhere and run the commands suggested above.

You should probably install/enable the btrfsmaintenance package (and tune its config) so that defrag and balancing run as a crontask.

And one important thing: A btrfs fi defragment with still many snapshots around and a 20GB rootfs will make the situation worse...

/Henk
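The snapshot cleanup suggested in this message could be sketched roughly as follows. Everything here is illustrative: the functions print text rather than acting, the limit values are arbitrary examples rather than recommendations, and the key names are assumed to match the snapper config file installed on the system (check the file itself before editing). The helper names snapper_limits and snapshot_cleanup_plan are hypothetical.

```shell
#!/bin/sh
# Prints example retention settings for a snapper config file.
# Assumption: these keys exist in the installed config; values are
# illustrative only.
snapper_limits() {
    cat <<'EOF'
TIMELINE_LIMIT_HOURLY="2"
TIMELINE_LIMIT_DAILY="2"
TIMELINE_LIMIT_MONTHLY="0"
TIMELINE_LIMIT_YEARLY="0"
NUMBER_LIMIT="5"
EOF
}

# Prints the delete commands: first the snapper range delete, then the
# direct subvolume delete as a fallback for one stubborn snapshot.
# Arguments: config name, first and last snapshot number.
snapshot_cleanup_plan() {
    cfg=$1
    first=$2
    last=$3
    printf 'snapper -c %s delete %s-%s\n' "$cfg" "$first" "$last"
    printf 'btrfs subvolume delete /.snapshots/%s/snapshot\n' "$last"
}

snapper_limits
snapshot_cleanup_plan root 1 318
```

The fallback path follows the /.snapshots/318/snapshot layout shown earlier in the thread; deleting a snapshot subvolume directly bypasses snapper's own bookkeeping, so it is best kept as a last resort.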
Re: How to delete this snapshot, and how to succeed with balancing?
On Sat, Oct 31, 2015 at 11:45:15PM +0100, Henk Slager wrote:
> Hi Simon,
>
> >>> linux-va3e:~ # btrfs balance start -dlimit=2 /
> >>> ERROR: error during balancing '/' - No space left on device
> >>> There may be more info in syslog - try dmesg | tail
> >>
> >> And this is with a filesystem that's not fully allocated?
> >> (i.e. btrfs fi show indicates that used and total are different for
> >> each device). If that's the case, then you may have hit a known but
> >> unfixed bug to do with space allocation.
> >
> > linux-va3e:~ # btrfs fi show
> > Label: none  uuid: 656dc65f-240b-4137-a490-0175717dd7fa
> >         Total devices 1 FS bytes used 13.71GiB
> >         devid    1 size 20.00GiB used 16.88GiB path /dev/sda2
> >
> > btrfs-progs v4.0+20150429
> >
> > Is there a manual work-around?

Not that we've found in the last year or so of poking at it, I'm afraid.

> For 'No space left on device', a trick I once saw is to run:
>
> btrfs balance start -dusage=0 -musage=0 /
>
> Under certain circumstances (I don't remember which kernel, tools
> versions etc), this enables you to create files again on the
> filesystem.

That's basically what Simon's been doing (although a little more aggressively). It's not going to help here.

> Looking at the btrfs fi df / output, I don't see a real need for
> balancing, the numbers can be much different and then balance might be
> useful.

If you're hitting ENOSPC with unallocated space, then that's a bug. In fact, it's a known bug that hasn't been fixed yet.

Hugo.

--
Hugo Mills             | That's not rain, that's a lake with slots in it.
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4          |