Re: How to delete this snapshot, and how to succeed with balancing?

2015-10-31 Thread Duncan
Simon King posted on Sat, 31 Oct 2015 18:31:45 +0100 as excerpted:

> I know that "df" is different from "btrfs fi df". However, I see that df
> shows significantly more free space after balancing. Also, when my
> computer became unusable, the problem disappeared by balancing and
> defragmentation (deleting the old snapshots was not enough).
> 
> Unfortunately, df also shows significantly less free space after
> UNSUCCESSFUL balancing.

On a btrfs, df is hardly relevant at all, except to the extent that if
you're trying to copy a 100 MB file and df says there's only 50 MB of
room, obviously there's going to be problems.

Btrfs actually has two-stage space allocation.

At the first stage, entirely unallocated space is taken in largish chunks,
normally separately for data and metadata.  Data chunks are nominally
1 GiB (tho larger or smaller is possible depending on the size of the
filesystem and how close to fully chunk-allocated it is), metadata chunks
256 MiB -- but metadata chunks are normally allocated and used in dup
mode, two at a time, on a single-device btrfs, so 512 MiB at a time.

At the second stage, space is used from already allocated chunks as needed
for files (data) or metadata.

And particularly on older kernels, this is where the problem arises.
Over time, as files are created and deleted, all unallocated space tends
to be allocated as data chunks.  Then when the existing metadata chunks
get full, there's no unallocated space left from which to allocate more
metadata chunks, as it's all tied up in data chunks, many of which might
be mostly or entirely empty, as the files they once contained have since
been deleted or (due to btrfs copy-on-write) moved elsewhere.

On newer kernels, entirely empty chunks are automatically deleted,
significantly easing the problem, tho it can still happen if there's a lot
of mostly but not entirely empty data chunks.

Which is why df isn't always particularly reliable on btrfs, because it
doesn't know about all this chunk preallocation stuff, and will (again,
at least on older kernels, AFAIK newer ones have improved this to some
extent but it's still not ideal) happily report all that empty data-chunk
space as available for files, not knowing it's out of space to store
metadata.  Often, if you were to have one big file take all the space df
reports, that would work, because tracking a single file uses only a
relatively small bit of metadata space.  But try to use only a tenth of
the space with a thousand much smaller files, and the remaining metadata
space may well be exhausted, allowing no more file creation, even tho df
is still saying there's lots of room left, because it's all in data
chunks!

Which is where balance comes in, since in rewriting the chunks it
consolidates them, eliminating chunks when, say, three chunks that are
each 2/3 full combine into only two full chunks, returning the freed
space to unallocated, so it can be allocated for either data or metadata
as needed, once again.
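
The arithmetic behind that consolidation can be sketched in a few lines of
Python.  This is only an idealised model: it assumes equally sized chunks
and perfect packing, which a real balance only approximates, and the
function name is made up for illustration.

```python
import math
from fractions import Fraction

def chunks_after_balance(fill_levels):
    """Return (chunks_before, chunks_after) for an idealised compaction.

    fill_levels: the used fraction (0..1) of each allocated chunk.
    Assumes equally sized chunks and perfect packing.
    """
    used = sum(Fraction(f) for f in fill_levels)
    return len(fill_levels), math.ceil(used)

# Three chunks, each 2/3 full, hold two chunks' worth of data.
before, after = chunks_after_balance([Fraction(2, 3)] * 3)
print(f"{before} chunks -> {after} chunks, "
      f"{before - after} returned to unallocated")
```

The one chunk that drops out is what goes back to the unallocated pool,
where it can become either a data or a metadata chunk.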

As for getting out of the tight spot you're in ATM, with all would-be
unallocated space apparently (you didn't post btrfs fi show and df output,
but this is what the symptoms suggest) gone, tied up in mostly empty data
chunks, without even enough space to easily balance those data chunks to
free up more space by consolidating them...

There's some discussion on the btrfs wiki, in the free-space questions on
the faq, and similarly in the problem-faq (watch the link wrap):

FAQ:

https://btrfs.wiki.kernel.org/index.php/FAQ#Help.21_I_ran_out_of_disk_space.21

Also see FAQ sections 4.6-4.9, discussing free space, and 4.12,
discussing balance.

Problem-FAQ:

https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#I_get_.22No_space_left_on_device.22_errors.2C_but_df_says_I.27ve_got_lots_of_space


Basically, if filters won't let you do it, you can try deleting large
files -- assuming they're not also referenced by still existing snapshots.
That might empty a data chunk or two, allowing a balance -dusage=0 to
eliminate it, giving you enough room to try a higher dusage number,
perhaps 5% or 10%, then 20 and 50.  (Above 50% the time will go up while
the possible payback goes down, and it shouldn't be necessary until the
filesystem gets real close to actually full, tho on my ssd, speeds are
fast enough I'll sometimes try up to 70% or so.)
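
As a rough sketch of that escalation, the Python below only builds the
command lines; it does not run anything.  The helper name and the default
threshold list (taken from the numbers mentioned above) are illustrative,
not a recommended policy.

```python
def balance_steps(mountpoint, thresholds=(0, 5, 10, 20, 50)):
    """Build one `btrfs balance start -dusage=N` argv per threshold."""
    return [["btrfs", "balance", "start", f"-dusage={pct}", mountpoint]
            for pct in thresholds]

# Print the commands in the order they would be tried.
for argv in balance_steps("/"):
    print(" ".join(argv))
```

Each step would be run as root in turn; if one fails with ENOSPC there is
little point going on to a higher number until more space has been freed.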

If it's too tight for that or everything's snapshotted on snapshots you
don't want to or can't delete, you can try adding (btrfs device add) a
device temporarily.  The device should be several gigs in size, minimum;
even a few-GiB USB thumbdrive or the like can work, tho access can be
slow.  That should give you enough additional space to do the balance
-dusage= thing, which, assuming it does consolidate nearly empty data
chunks, freeing the extra space they took, should free up enough newly
unallocated space on the original device, to do a btrfs device delete of
the temporarily added device, returning everything that was on it
temporarily, 

Re: How to delete this snapshot, and how to succeed with balancing?

2015-10-31 Thread Simon King
Hi Hugo,

On 31.10.2015 at 17:41, Hugo Mills wrote:
>> linux-va3e:~ # uname -a
>> Linux linux-va3e.site 3.16.7-29-desktop #1 SMP PREEMPT Fri Oct 23
>> 00:46:04 UTC 2015 (6be6a97) x86_64 x86_64 x86_64 GNU/Linux
> 
> OK, that's a bit old -- you would probably do well to upgrade this
> anyway, regardless of the issues you're having. (I'd recommend 4.1 at
> the moment; there's a bug in 4.2 at the moment that affects
> balancing).

The latest version of openSuse is Tumbleweed, in a few days there will
be openSuse Leap; I am not sure what kernel it would give me.

> I thought snapper could automatically delete old snapshots. (I've
> never used it, though, so I'm not sure). Worth looking at the snapper
> config to see if you can tell it how many to keep.

Probably it can, right.

> You're telling it to move the first two chunks with less than 10%
> usage. If all the other chunks are full, and there are two chunks (one
> data, one metadata) with less than 10% usage, then they'll be moved to
> two new chunks... with less than 10% usage. So it's perfectly possible
> that the same command will show the same output.

Do I understand correctly: In that situation, balancing would have no
benefit, as two old chunks are moved to two new chunks? Then why are
they moved at all?

> Incidentally, I would suggest using -dlimit=2 on its own, rather
> than both limit and usage.

I combined the two, since -dlimit on its own won't work:

linux-va3e:~ # btrfs balance start -dlimit=2 /
ERROR: error during balancing '/' - No space left on device
There may be more info in syslog - try dmesg | tail

> "btrfs balance start /" should rebalance the whole filesystem --

linux-va3e:~ # btrfs balance start /
ERROR: error during balancing '/' - No space left on device
There may be more info in syslog - try dmesg | tail
linux-va3e:~ # dmesg | tail
[ 9814.499013] BTRFS info (device sda2): found 8153 extents
[ 9815.254270] BTRFS info (device sda2): relocating block group 820062978048 flags 36
[ 9826.335122] BTRFS info (device sda2): found 8182 extents
[ 9826.858482] BTRFS info (device sda2): relocating block group 805064146944 flags 36
[ 9839.444820] BTRFS info (device sda2): found 8184 extents
[ 9839.822108] BTRFS info (device sda2): relocating block group 794595164160 flags 36
[ 9850.456697] BTRFS info (device sda2): found 8143 extents
[ 9850.778264] BTRFS info (device sda2): relocating block group 794460946432 flags 36
[ 9862.546336] BTRFS info (device sda2): found 8140 extents
[ 9862.890330] BTRFS info (device sda2): 12 enospc errors during balance
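
The "flags 36" in the relocation messages can be decoded by hand.  The
sketch below assumes the value is printed in decimal and uses the block
group type/profile bits from the btrfs on-disk format; the table is
limited to the low bits and the helper name is made up.

```python
# Block group type and profile bits from the btrfs on-disk format.
BLOCK_GROUP_FLAGS = {
    1 << 0: "DATA",
    1 << 1: "SYSTEM",
    1 << 2: "METADATA",
    1 << 3: "RAID0",
    1 << 4: "RAID1",
    1 << 5: "DUP",
    1 << 6: "RAID10",
}

def decode_flags(value):
    """Return the names of all known flag bits set in value."""
    return [name for bit, name in BLOCK_GROUP_FLAGS.items() if value & bit]

print(decode_flags(36))  # 36 = 32 + 4
```

Read that way, the block groups being relocated just before the enospc
errors are metadata block groups in DUP profile, not data ones.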


> not that you'd need to for purposes of dealing with space usage
> issues.

I know that "df" is different from "btrfs fi df". However, I see that df
shows significantly more free space after balancing. Also, when my
computer became unusable, the problem disappeared by balancing and
defragmentation (deleting the old snapshots was not enough).

Unfortunately, df also shows significantly less free space after
UNSUCCESSFUL balancing.

> You may have more success using mkfs.btrfs --mixed when you create
> the FS, which puts data and metadata in the same chunks.

Can I do this in the running system? Or would that only be an option
during upgrade of openSuse Harlequin to Tumbleweed/Leap? Or even worse:
Only an option after nuking the old installation and installing a new
one from scratch?

Best regards,
Simon
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: How to delete this snapshot, and how to succeed with balancing?

2015-10-31 Thread Hugo Mills
On Sat, Oct 31, 2015 at 06:31:45PM +0100, Simon King wrote:
> Hi Hugo,
> 
> On 31.10.2015 at 17:41, Hugo Mills wrote:
> >> linux-va3e:~ # uname -a
> >> Linux linux-va3e.site 3.16.7-29-desktop #1 SMP PREEMPT Fri Oct 23
> >> 00:46:04 UTC 2015 (6be6a97) x86_64 x86_64 x86_64 GNU/Linux
> > 
> > OK, that's a bit old -- you would probably do well to upgrade this
> > anyway, regardless of the issues you're having. (I'd recommend 4.1 at
> > the moment; there's a bug in 4.2 at the moment that affects
> > balancing).
> 
> The latest version of openSuse is Tumbleweed, in a few days there will
> be openSuse Leap; I am not sure what kernel it would give me.
> 
> > I thought snapper could automatically delete old snapshots. (I've
> > never used it, though, so I'm not sure). Worth looking at the snapper
> > config to see if you can tell it how many to keep.
> 
> Probably it can, right.
> 
> > You're telling it to move the first two chunks with less than 10%
> > usage. If all the other chunks are full, and there are two chunks (one
> > data, one metadata) with less than 10% usage, then they'll be moved to
> > two new chunks... with less than 10% usage. So it's perfectly possible
> > that the same command will show the same output.
> 
> Do I understand correctly: In that situation, balancing would have no
> benefit, as two old chunks are moved to two new chunks? Then why are
> they moved at all?

   Because that's what balance does -- it's a fairly blunt tool for
one thing (evening out the usage across multiple devices) that happens
to have some useful side-effects (compacting space usage into smaller
numbers of block groups and freeing up the empty ones).

> > Incidentally, I would suggest using -dlimit=2 on its own, rather
> > than both limit and usage.
> 
> I combined the two, since -dlimit on its own won't work:
> 
> linux-va3e:~ # btrfs balance start -dlimit=2 /
> ERROR: error during balancing '/' - No space left on device
> There may be more info in syslog - try dmesg | tail

   And this is with a filesystem that's not fully allocated?
(i.e. btrfs fi show indicates that used and total are different for
each device). If that's the case, then you may have hit a known but
unfixed bug to do with space allocation.

> > "btrfs balance start /" should rebalance the whole filesystem --
> 
> linux-va3e:~ # btrfs balance start /
> ERROR: error during balancing '/' - No space left on device
> There may be more info in syslog - try dmesg | tail
> linux-va3e:~ # dmesg | tail
> [ 9814.499013] BTRFS info (device sda2): found 8153 extents
> [ 9815.254270] BTRFS info (device sda2): relocating block group 820062978048 flags 36
> [ 9826.335122] BTRFS info (device sda2): found 8182 extents
> [ 9826.858482] BTRFS info (device sda2): relocating block group 805064146944 flags 36
> [ 9839.444820] BTRFS info (device sda2): found 8184 extents
> [ 9839.822108] BTRFS info (device sda2): relocating block group 794595164160 flags 36
> [ 9850.456697] BTRFS info (device sda2): found 8143 extents
> [ 9850.778264] BTRFS info (device sda2): relocating block group 794460946432 flags 36
> [ 9862.546336] BTRFS info (device sda2): found 8140 extents
> [ 9862.890330] BTRFS info (device sda2): 12 enospc errors during balance
> 
> 
> > not that you'd need to for purposes of dealing with space usage
> > issues.
> 
> I know that "df" is different from "btrfs fi df". However, I see that df
> shows significantly more free space after balancing. Also, when my
> computer became unusable, the problem disappeared by balancing and
> defragmentation (deleting the old snapshots was not enough).
> 
> Unfortunately, df also shows significantly less free space after
> UNSUCCESSFUL balancing.
> 
> > You may have more success using mkfs.btrfs --mixed when you create
> > the FS, which puts data and metadata in the same chunks.
> 
> Can I do this in the running system? Or would that only be an option
> during upgrade of openSuse Harlequin to Tumbleweed/Leap? Or even worse:
> Only an option after nuking the old installation and installing a new
> one from scratch?

   You'd have to recreate the FS, so it's a matter of a reinstall, or
nuking it and restoring from your backups.

   Hugo.

-- 
Hugo Mills | Le Corbusier's plan for improving Paris involved the
hugo@... carfax.org.uk | assassination of the city, and its rebirth as tower
http://carfax.org.uk/  | blocks.
PGP: E2AB1DE4  |   Robert Hughes, The Shock of the New




Re: How to delete this snapshot, and how to succeed with balancing?

2015-10-31 Thread Simon King
Hi!

On 31.10.2015 at 19:33, Hugo Mills wrote:
>> I combined the two, since -dlimit on its own won't work:
>>
>> linux-va3e:~ # btrfs balance start -dlimit=2 /
>> ERROR: error during balancing '/' - No space left on device
>> There may be more info in syslog - try dmesg | tail
> 
> And this is with a filesystem that's not fully allocated?
> (i.e. btrfs fi show indicates that used and total are different for
> each device). If that's the case, then you may have hit a known but
> unfixed bug to do with space allocation.

linux-va3e:~ # btrfs fi show
Label: none  uuid: 656dc65f-240b-4137-a490-0175717dd7fa
Total devices 1 FS bytes used 13.71GiB
devid    1 size 20.00GiB used 16.88GiB path /dev/sda2

btrfs-progs v4.0+20150429
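
The "not fully allocated" check comes down to size minus used on the devid
line.  A small Python sketch of that subtraction follows; it assumes both
figures are printed in GiB as in the output above, and the regex and
function name are illustrative rather than any btrfs-progs interface.

```python
import re

# Matches a devid line from `btrfs fi show`, tolerating missing whitespace.
DEVID_RE = re.compile(
    r"devid\s*(\d+)\s+size\s+([\d.]+)GiB\s+used\s+([\d.]+)GiB\s+path\s+(\S+)"
)

def unallocated_gib(fi_show_text):
    """Map device path -> GiB not yet allocated to any chunk."""
    result = {}
    for m in DEVID_RE.finditer(fi_show_text):
        size, used = float(m.group(2)), float(m.group(3))
        result[m.group(4)] = round(size - used, 2)
    return result

sample = "devid    1 size 20.00GiB used 16.88GiB path /dev/sda2"
print(unallocated_gib(sample))
```

With the numbers above that leaves roughly 3 GiB unallocated on /dev/sda2,
so this is a filesystem that still has room to allocate new chunks.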

Is there a manual work-around?

>> Can I do this in the running system? Or would that only be an option
>> during upgrade of openSuse Harlequin to Tumbleweed/Leap? Or even worse:
>> Only an option after nuking the old installation and installing a new
>> one from scratch?
> 
> You'd have to recreate the FS, so it's a matter of a reinstall, or
> nuking it and restoring from your backups.

OK. So, I'll try to find out whether it is better to move on to
Tumbleweed or to Leap (btw, I found out that the latter is based on the
4.1 kernel).

Best regards,
Simon



Re: How to delete this snapshot, and how to succeed with balancing?

2015-10-31 Thread Hugo Mills
On Sat, Oct 31, 2015 at 04:26:11PM +0100, Simon King wrote:
> Hi!
> 
> From the messages I see in this forum, I got the impression that it is a
> developer forum and not a help forum. I seek help. So, please point me
> to the right place if I shouldn't ask my questions here.

   No, you're good here. We do support as well as development. :)

> Since I am new, first my data:
> 
> linux-va3e:~ # uname -a
> Linux linux-va3e.site 3.16.7-29-desktop #1 SMP PREEMPT Fri Oct 23
> 00:46:04 UTC 2015 (6be6a97) x86_64 x86_64 x86_64 GNU/Linux

   OK, that's a bit old -- you would probably do well to upgrade this
anyway, regardless of the issues you're having. (I'd recommend 4.1 at
the moment; there's a bug in 4.2 at the moment that affects
balancing).

> linux-va3e:~ # btrfs --version
> btrfs-progs v4.0+20150429
> 
> linux-va3e:~ # btrfs fi show
> Label: none  uuid: 656dc65f-240b-4137-a490-0175717dd7fa
> Total devices 1 FS bytes used 13.49GiB
> devid    1 size 20.00GiB used 16.19GiB path /dev/sda2
> 
> btrfs-progs v4.0+20150429
> 
> linux-va3e:~ # btrfs fi df /
> Data, single: total=12.62GiB, used=12.14GiB
> System, DUP: total=32.00MiB, used=16.00KiB
> Metadata, DUP: total=1.75GiB, used=1.35GiB
> GlobalReserve, single: total=304.00MiB, used=0.00B
> 
> dmesg > dmesg.log is attached.
> 
> 
> When I installed openSuse 13.2 on my computer, I used the default:
> - btrfs is used for a root partition of 20GB
> - A program called "snapper" creates snapshots of the system whenever a
> change is done (i.e., whenever a new program is installed or an old
> program is upgraded).
> 
> After a while, the root partition was full because of btrfs's metadata.
> In an openSuse forum, people told me I should regularly delete the
> snapshots, and should "btrfs balance" and "btrfs fi defragment" in order
> to keep the metadata under control.
> 
> It was effective in the sense that I could use my computer again. It is
> annoying that the user has to manually delete snapshots and has to
> remember to regularly do balancing, though.

   I thought snapper could automatically delete old snapshots. (I've
never used it, though, so I'm not sure). Worth looking at the snapper
config to see if you can tell it how many to keep.

> Now to my current problem: It seems that my root partition gradually
> fills up. So, I am afraid that in a few weeks my computer will be broken
> again. And that happens although I keep deleting old snapshots and try
> balancing.
> 
> My questions:
> 1. There is one old snapshot that I can not delete:
>   # snapper -c root delete 318
>   Fehler beim Löschen des Schnappschusses.  [Error deleting the snapshot.]
> The error message does not hint *why* the snapshot can not be deleted.
> Here is one observation that may indicate what goes wrong:
>   # find / -name 318
>   /.snapshots/318
>   find: File system loop detected; ‘/.snapshots/318/snapshot’ is part of
> the same file system loop as ‘/’.
> 
> So, what can I do to delete the snapshot to hopefully free some memory?

   That one, I don't know what's happening, I'm afraid.

> 2. When I simply do "btrfs balance start /", then it says the device is
> full. So, I tried
>   linux-va3e:~ # btrfs balance start -mlimit=1 -dlimit=2  -dusage=10 /
>   Done, had to relocate 2 out of 28 chunks
> Fine. But when I try it again, it still says that it had to relocate 2
> out of 28 chunks! Shouldn't it be the case that the work has already
> been done, so that no further relocation is needed with the same parameters?

   You're telling it to move the first two chunks with less than 10%
usage. If all the other chunks are full, and there are two chunks (one
data, one metadata) with less than 10% usage, then they'll be moved to
two new chunks... with less than 10% usage. So it's perfectly possible
that the same command will show the same output.

   Incidentally, I would suggest using -dlimit=2 on its own, rather
than both limit and usage. You shouldn't normally need to run it on
the metadata: The usual case of early ENOSPC is that all of the block
groups in the filesystem are allocated (i.e. the space is reserved for
either data or metadata, not necessarily used), and then the metadata
fills up, while there is still lots of space for data allocated but
unused.  The balance simply moves some data around so that one or more
of the data block groups can be freed up, giving the FS some more
space that it can allocate to metadata.
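
   The difference between allocated and used can be checked against the
numbers posted above.  The Python sketch below sums the "total" figures
from btrfs fi df, counting DUP block groups twice, to estimate the raw
space allocated on the device.  The names are illustrative, only the two
profiles seen in this thread are handled, and GlobalReserve is skipped on
the understanding that it is carved out of metadata.

```python
import re

UNIT_GIB = {"B": 1 / 2**30, "KiB": 1 / 2**20, "MiB": 1 / 2**10, "GiB": 1.0}
RAW_COPIES = {"single": 1, "DUP": 2}  # only the profiles seen in this thread

LINE_RE = re.compile(
    r"(\w+), (\w+): total=([\d.]+)(B|KiB|MiB|GiB), used=([\d.]+)(B|KiB|MiB|GiB)"
)

def raw_allocated_gib(fi_df_text):
    """Estimate raw GiB allocated to block groups from `btrfs fi df` text."""
    total = 0.0
    for kind, profile, tot, unit, _used, _used_unit in LINE_RE.findall(fi_df_text):
        if kind == "GlobalReserve":
            continue  # part of metadata, not a separate allocation
        total += float(tot) * UNIT_GIB[unit] * RAW_COPIES[profile]
    return total

sample = """\
Data, single: total=12.62GiB, used=12.14GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=1.75GiB, used=1.35GiB
GlobalReserve, single: total=304.00MiB, used=0.00B
"""
print(f"{raw_allocated_gib(sample):.2f} GiB allocated")
```

   That comes to 12.62 + 2 x 0.03125 + 2 x 1.75 = 16.1825 GiB, which lines
up with the 16.19GiB "used" on the devid line once rounding in the
displayed figures is allowed for, and it is well short of the 20.00GiB
device size.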

> 3. When I increase the parameters, I always come to the point that there
> is no space left on device. So, how can I achieve full balance of the
> system?

   "btrfs balance start /" should rebalance the whole filesystem --
not that you'd need to for purposes of dealing with space usage
issues.

> One last remark: On the openSuse forum, I was advised to re-install the
> system, and either reserve at least 50GB for the root partition, or drop
> btrfs and use ext4 for the root partition. I would like to avoid such
> trouble and hope that you can tell me how to sanitise my root partition,

Re: How to delete this snapshot, and how to succeed with balancing?

2015-10-31 Thread Henk Slager
Hi Simon,

>>> linux-va3e:~ # btrfs balance start -dlimit=2 /
>>> ERROR: error during balancing '/' - No space left on device
>>> There may be more info in syslog - try dmesg | tail
>>
>> And this is with a filesystem that's not fully allocated?
>> (i.e. btrfs fi show indicates that used and total are different for
>> each device). If that's the case, then you may have hit a known but
>> unfixed bug to do with space allocation.
>
> linux-va3e:~ # btrfs fi show
> Label: none  uuid: 656dc65f-240b-4137-a490-0175717dd7fa
> Total devices 1 FS bytes used 13.71GiB
> devid    1 size 20.00GiB used 16.88GiB path /dev/sda2
>
> btrfs-progs v4.0+20150429
>
> Is there a manual work-around?
For 'No space left on device', a trick I once saw is to run:

btrfs balance start -dusage=0 -musage=0 /

Under certain circumstances (I don't remember which kernel, tools
versions etc), this enables you to create files again on the
filesystem.
Looking at the   btrfs fi df /  output, I don't see a real need for
balancing; when total and used differ much more than this, a balance
might be useful.

The 318 snapshot is more of a problem, and you should get rid of
(some/unneeded/all) snapshots first. The default openSuse snapshot
retention ages are high (months and years), so maybe you want to edit
the configs
/etc/snapper/configs/<>
/etc/sysconfig/snapper

to keep snapshots for just 1 or 2 days or so, but it really depends on
how you use the notebook and the subvolumes on the filesystem. Or
maybe you just disable snapper snapshotting completely, as 20GB will
quite easily get too full with the default snapper config. A cron task
will automatically delete too-old snapshots based on the snapper config.

If the command (with 318 or a higher snapshot number)
snapper -c root delete 1-318

does not work, or the cron task fails, try
btrfs sub del /.snapshots//snapshot

The 318-related error might also be fixed or worked around by newer
tools/kernel. Maybe get newer (4.2.3) btrfs tools from this repo
http://download.opensuse.org/repositories/filesystems/openSUSE_13.2/
and find some 4.1 kernel rpm for openSuse 13.2 (or compile your own
from kernel.org).

You could also start with a Leap/Tumbleweed liveDVD and mount your
/dev/sda2  somewhere and run the commands suggested above.

You should probably install/enable the btrfsmaintenance package (and tune
its config) so that defrag and balancing run as cron tasks.

And one important thing: a btrfs fi defragment with still many
snapshots around and a 20GB rootfs will make the situation worse...

/Henk


Re: How to delete this snapshot, and how to succeed with balancing?

2015-10-31 Thread Hugo Mills
On Sat, Oct 31, 2015 at 11:45:15PM +0100, Henk Slager wrote:
> Hi Simon,
> 
> >>> linux-va3e:~ # btrfs balance start -dlimit=2 /
> >>> ERROR: error during balancing '/' - No space left on device
> >>> There may be more info in syslog - try dmesg | tail
> >>
> >> And this is with a filesystem that's not fully allocated?
> >> (i.e. btrfs fi show indicates that used and total are different for
> >> each device). If that's the case, then you may have hit a known but
> >> unfixed bug to do with space allocation.
> >
> > linux-va3e:~ # btrfs fi show
> > Label: none  uuid: 656dc65f-240b-4137-a490-0175717dd7fa
> > Total devices 1 FS bytes used 13.71GiB
> > devid    1 size 20.00GiB used 16.88GiB path /dev/sda2
> >
> > btrfs-progs v4.0+20150429
> >
> > Is there a manual work-around?

   Not that we've found in the last year or so of poking at it, I'm
afraid.

> For 'No space left on device', a trick I once saw is to run:
> 
> btrfs balance start -dusage=0 -musage=0 /
> 
> Under certain circumstances (I don't remember which kernel, tools
> versions etc), this enables you to create files again on the
> filesystem.

   That's basically what Simon's been doing (although a little more
aggressively). It's not going to help here.

> Looking at the   btrfs fi df /  output, I don't see a real need for
> balancing; when total and used differ much more than this, a balance
> might be useful.

   If you're hitting ENOSPC with unallocated space, then that's a
bug. In fact, it's a known bug that hasn't been fixed yet.

   Hugo.

-- 
Hugo Mills | That's not rain, that's a lake with slots in it.
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4  |

