Re: Healthy amount of free space?

2018-07-20 Thread Austin S. Hemmelgarn

On 2018-07-20 01:01, Andrei Borzenkov wrote:

18.07.2018 16:30, Austin S. Hemmelgarn wrote:

On 2018-07-18 09:07, Chris Murphy wrote:

On Wed, Jul 18, 2018 at 6:35 AM, Austin S. Hemmelgarn
 wrote:


If you're doing a training presentation, it may be worth mentioning that
preallocation with fallocate() does not behave the same on BTRFS as it does
on other filesystems.  For example, the following sequence of commands:

  fallocate -l X ./tmp
  dd if=/dev/zero of=./tmp bs=1 count=X

will always work on ext4, XFS, and most other filesystems, for any value of
X between zero and just below the total amount of free space on the
filesystem.  On BTRFS though, it will reliably fail with ENOSPC for values
of X that are greater than _half_ of the total amount of free space on the
filesystem (actually, greater than just short of half).  In essence,
preallocating space does not prevent COW semantics for the first write
unless the file is marked NOCOW.


Is this a bug, or is it suboptimal behavior, or is it intentional?

It's been discussed before, though I can't find the email thread right
now.  Pretty much, this is _technically_ not incorrect behavior, as the
documentation for fallocate doesn't say that subsequent writes can't
fail due to lack of space.  I personally consider it a bug though
because it breaks from existing behavior in a way that is avoidable and
defies user expectations.

There are two issues here:

1. Regions preallocated with fallocate still do COW on the first write
to any given block in that region.  This can be handled by either
treating the first write to each block as NOCOW, or by allocating a bit


How is it possible? As long as fallocate actually allocates space, that
space should be checksummed, which means it is no longer possible to
overwrite it in place. Maybe fallocate on btrfs could simply reserve space.
I'm not sure whether that complies with the fallocate specification, but as
long as the intention is to ensure that writes will not fail for lack of
space, it should be adequate (to the extent it can be ensured on btrfs, of
course). Also, a hole in a file returns zeros by definition, which also
matches fallocate behavior.
Except it doesn't _have_ to be checksummed if there's no data there, and 
that will always be the case for a new allocation.   When I say it could 
be NOCOW, I'm talking specifically about the first write to each newly 
allocated block (that is, one either beyond the previous end of the 
file, or one in a region that used to be a hole).  This obviously won't 
work for places where there are already data.



of extra space and doing a rotating approach like this for writes:
    - Write goes into the extra space.
    - Once the write is done, convert the region covered by the write
      into a new block of extra space.
    - When the final block of the preallocated region is written,
      deallocate the extra space.
2. Preallocation does not completely account for necessary metadata
space that will be needed to store the data there.  This may not be
necessary if the first issue is addressed properly.


And then I wonder what happens with XFS COW:

   fallocate -l X ./tmp
   cp --reflink ./tmp ./tmp2
   dd if=/dev/zero of=./tmp bs=1 count=X

I'm not sure.  In this particular case, this will fail on BTRFS for any
X larger than just short of one third of the total free space.  I would
expect it to fail for any X larger than just short of half instead.

ZFS gets around this by not supporting fallocate (well, kind of: if
you're using glibc and call posix_fallocate, that _will_ work, but it
will take forever because it works by writing out each block of space
that's being allocated, which, ironically, means it potentially still
suffers from the same issue that we have).


What happens on btrfs then? fallocate specifies that new space should be
initialized to zero, so something should still write those zeros?

For new regions (places that were holes previously, or were beyond the 
end of the file), we create an unwritten extent, which is a region 
that's 'allocated', but everything reads back as zero.  The problem is 
that we don't write into the blocks allocated for the unwritten extent 
at all, and only deallocate them once a write to another block finishes. 
 In essence, we're (either explicitly or implicitly) applying COW 
semantics to a region that should not be COW until after the first write 
to each block.
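
(A rough way to watch this from user space, assuming the util-linux
fallocate and filefrag tools and a hypothetical scratch mount point:

   fallocate -l 1G /mnt/scratch/prealloc
   filefrag -v /mnt/scratch/prealloc    # the extents show the 'unwritten' flag
   dd if=/dev/zero of=/mnt/scratch/prealloc bs=1M count=1024 conv=notrunc
   sync
   filefrag -v /mnt/scratch/prealloc    # compare the physical offsets with the
                                        # first run; see also Chris's before and
                                        # after output further down the thread

conv=notrunc matters here: without it, dd truncates the file first, which
throws the preallocation away entirely.)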


For the case of calling fallocate on existing data, we don't really do 
anything (unless the flag telling fallocate to unshare the region is 
passed).  This is actually consistent with pretty much every other 
filesystem in existence, but that's because pretty much every other 
filesystem in existence implicitly provides the same guarantee that 
fallocate does for regions that already have data.  This case can in 
theory be handled by the same looping algorithm I described above 
without needing the base amount of space allocated, but I wouldn't 
consider it important 

Re: Healthy amount of free space?

2018-07-19 Thread Andrei Borzenkov
18.07.2018 16:30, Austin S. Hemmelgarn wrote:
> On 2018-07-18 09:07, Chris Murphy wrote:
>> On Wed, Jul 18, 2018 at 6:35 AM, Austin S. Hemmelgarn
>>  wrote:
>>
>>> If you're doing a training presentation, it may be worth mentioning that
>>> preallocation with fallocate() does not behave the same on BTRFS as
>>> it does
>>> on other filesystems.  For example, the following sequence of commands:
>>>
>>>  fallocate -l X ./tmp
>>>  dd if=/dev/zero of=./tmp bs=1 count=X
>>>
>>> Will always work on ext4, XFS, and most other filesystems, for any
>>> value of
>>> X between zero and just below the total amount of free space on the
>>> filesystem.  On BTRFS though, it will reliably fail with ENOSPC for
>>> values
>>> of X that are greater than _half_ of the total amount of free space
>>> on the
>>> filesystem (actually, greater than just short of half).  In essence,
>>> preallocating space does not prevent COW semantics for the first write
>>> unless the file is marked NOCOW.
>>
>> Is this a bug, or is it suboptimal behavior, or is it intentional?
> It's been discussed before, though I can't find the email thread right
> now.  Pretty much, this is _technically_ not incorrect behavior, as the
> documentation for fallocate doesn't say that subsequent writes can't
> fail due to lack of space.  I personally consider it a bug though
> because it breaks from existing behavior in a way that is avoidable and
> defies user expectations.
> 
> There are two issues here:
> 
> 1. Regions preallocated with fallocate still do COW on the first write
> to any given block in that region.  This can be handled by either
> treating the first write to each block as NOCOW, or by allocating a bit

How is it possible? As long as fallocate actually allocates space, that
space should be checksummed, which means it is no longer possible to
overwrite it in place. Maybe fallocate on btrfs could simply reserve space.
I'm not sure whether that complies with the fallocate specification, but as
long as the intention is to ensure that writes will not fail for lack of
space, it should be adequate (to the extent it can be ensured on btrfs, of
course). Also, a hole in a file returns zeros by definition, which also
matches fallocate behavior.

> of extra space and doing a rotating approach like this for writes:
>     - Write goes into the extra space.
>     - Once the write is done, convert the region covered by the write
>   into a new block of extra space.
>     - When the final block of the preallocated region is written,
>   deallocate the extra space.
> 2. Preallocation does not completely account for necessary metadata
> space that will be needed to store the data there.  This may not be
> necessary if the first issue is addressed properly.
>>
>> And then I wonder what happens with XFS COW:
>>
>>   fallocate -l X ./tmp
>>   cp --reflink ./tmp ./tmp2
>>   dd if=/dev/zero of=./tmp bs=1 count=X
> I'm not sure.  In this particular case, this will fail on BTRFS for any
> X larger than just short of one third of the total free space.  I would
> expect it to fail for any X larger than just short of half instead.
> 
> ZFS gets around this by not supporting fallocate (well, kind of, if
> you're using glibc and call posix_fallocate, that _will_ work, but it
> will take forever because it works by writing out each block of space
> that's being allocated, which, ironically, means that that still suffers
> from the same issue potentially that we have).

What happens on btrfs then? fallocate specifies that new space should be
initialized to zero, so something should still write those zeros?


Re: Healthy amount of free space?

2018-07-19 Thread Austin S. Hemmelgarn

On 2018-07-18 17:32, Chris Murphy wrote:

On Wed, Jul 18, 2018 at 12:01 PM, Austin S. Hemmelgarn
 wrote:

On 2018-07-18 13:40, Chris Murphy wrote:


On Wed, Jul 18, 2018 at 11:14 AM, Chris Murphy 
wrote:


I don't know for sure, but based on the addresses reported before and
after dd for the fallocated tmp file, it looks like Btrfs is not using
the originally fallocated addresses for dd. So maybe it is COWing into
new blocks, but is just as quickly deallocating the fallocated blocks
as it goes, and hence doesn't end up in enospc?



Previous thread is "Problem with file system" from August 2017. And
there's these reproduce steps from Austin which have fallocate coming
after the dd.

  truncate --size=4G ./test-fs
  mkfs.btrfs ./test-fs
  mkdir ./test
  mount -t auto ./test-fs ./test
  dd if=/dev/zero of=./test/test bs=65536 count=32768
  fallocate -l 2147483650 ./test/test && echo "Success!"


My test Btrfs is 2G not 4G, so I'm cutting the values of dd and
fallocate in half.

[chris@f28s btrfs]$ sudo dd if=/dev/zero of=tmp bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 7.13391 s, 147 MB/s
[chris@f28s btrfs]$ sync
[chris@f28s btrfs]$ df -h
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/vg-btrfstest  2.0G 1018M  1.1G  50% /mnt/btrfs
[chris@f28s btrfs]$ sudo fallocate -l 1000m tmp


Succeeds. If I do it with a 1200M file for dd and fallocate 1200M over
it, this fails, but I kinda expect that because there's only 1.1G free
space. But maybe that's what you're saying is the bug, it shouldn't
fail?


Yes, you're right, I had things backwards (well, kind of, this does work on
ext4 and regular XFS, so it arguably should work here).


I guess I'm confused what it even means to fallocate over a file with
in-use blocks unless either -d or -p options are used. And from the
man page, I don't grok the distinction between -d and -p either. But
based on their descriptions I'd expect they both should work without
enospc.

Without any specific options, it forces allocation of any sparse regions 
in the file (that is, it gets rid of holes in the file).  On BTRFS, I 
believe the command also forcibly unshares all the extents in the file 
(for the system call, there's a special flag for doing this). 
Additionally, you can extend a file with fallocate this way by 
specifying a length longer than the current size of the file, which 
guarantees that writes into that region will succeed, unlike truncating 
the file to a larger size, which just creates a hole at the end of the 
file to bring it up to size.
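
A quick sketch of that difference (file names here are arbitrary):

   truncate -s 100M ./sparse     # only sets the size; the whole file is a hole
   fallocate -l 100M ./prealloc  # actually reserves 100M of space
   du -h --apparent-size ./sparse ./prealloc   # both report 100M
   du -h ./sparse ./prealloc                   # only ./prealloc has blocks allocated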


As far as `-d` versus `-p`:  `-p` directly translates to the option for 
the system call that punches a hole.  It requires a length and possibly 
an offset, and will punch a hole at that exact location of that exact 
size.  `-d` is a special option that's only available for the command. 
It tells the `fallocate` command to search the file for zero-filled 
regions, and punch holes there.  Neither option should ever trigger an 
ENOSPC, except possibly if it has to split an extent for some reason and 
you are completely out of metadata space.
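
As a concrete sketch of the two (offset, length, and file name are
arbitrary; filefrag is just one way to check the result):

   fallocate -p -o 4096 -l 65536 ./file   # punch a hole at exactly offset 4KiB,
                                          # length 64KiB
   fallocate -d ./file                    # scan the whole file for zero-filled
                                          # blocks and punch holes there
   filefrag -v ./file                     # punched regions show up as gaps in
                                          # the extent list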



Re: Healthy amount of free space?

2018-07-18 Thread Chris Murphy
Related on XFS list.

https://www.spinics.net/lists/linux-xfs/msg20722.html


Re: Healthy amount of free space?

2018-07-18 Thread Chris Murphy
On Wed, Jul 18, 2018 at 12:01 PM, Austin S. Hemmelgarn
 wrote:
> On 2018-07-18 13:40, Chris Murphy wrote:
>>
>> On Wed, Jul 18, 2018 at 11:14 AM, Chris Murphy 
>> wrote:
>>
>>> I don't know for sure, but based on the addresses reported before and
>>> after dd for the fallocated tmp file, it looks like Btrfs is not using
>>> the originally fallocated addresses for dd. So maybe it is COWing into
>>> new blocks, but is just as quickly deallocating the fallocated blocks
>>> as it goes, and hence doesn't end up in enospc?
>>
>>
>> Previous thread is "Problem with file system" from August 2017. And
>> there's these reproduce steps from Austin which have fallocate coming
>> after the dd.
>>
>>  truncate --size=4G ./test-fs
>>  mkfs.btrfs ./test-fs
>>  mkdir ./test
>>  mount -t auto ./test-fs ./test
>>  dd if=/dev/zero of=./test/test bs=65536 count=32768
>>  fallocate -l 2147483650 ./test/test && echo "Success!"
>>
>>
>> My test Btrfs is 2G not 4G, so I'm cutting the values of dd and
>> fallocate in half.
>>
>> [chris@f28s btrfs]$ sudo dd if=/dev/zero of=tmp bs=1M count=1000
>> 1000+0 records in
>> 1000+0 records out
>> 1048576000 bytes (1.0 GB, 1000 MiB) copied, 7.13391 s, 147 MB/s
>> [chris@f28s btrfs]$ sync
>> [chris@f28s btrfs]$ df -h
>> Filesystem                Size  Used Avail Use% Mounted on
>> /dev/mapper/vg-btrfstest  2.0G 1018M  1.1G  50% /mnt/btrfs
>> [chris@f28s btrfs]$ sudo fallocate -l 1000m tmp
>>
>>
>> Succeeds. If I do it with a 1200M file for dd and fallocate 1200M over
>> it, this fails, but I kinda expect that because there's only 1.1G free
>> space. But maybe that's what you're saying is the bug, it shouldn't
>> fail?
>
> Yes, you're right, I had things backwards (well, kind of, this does work on
> ext4 and regular XFS, so it arguably should work here).

I guess I'm confused what it even means to fallocate over a file with
in-use blocks unless either -d or -p options are used. And from the
man page, I don't grok the distinction between -d and -p either. But
based on their descriptions I'd expect they both should work without
enospc.

-- 
Chris Murphy


Re: Healthy amount of free space?

2018-07-18 Thread Austin S. Hemmelgarn

On 2018-07-18 13:40, Chris Murphy wrote:

On Wed, Jul 18, 2018 at 11:14 AM, Chris Murphy  wrote:


I don't know for sure, but based on the addresses reported before and
after dd for the fallocated tmp file, it looks like Btrfs is not using
the originally fallocated addresses for dd. So maybe it is COWing into
new blocks, but is just as quickly deallocating the fallocated blocks
as it goes, and hence doesn't end up in enospc?


Previous thread is "Problem with file system" from August 2017. And
there's these reproduce steps from Austin which have fallocate coming
after the dd.

 truncate --size=4G ./test-fs
 mkfs.btrfs ./test-fs
 mkdir ./test
 mount -t auto ./test-fs ./test
 dd if=/dev/zero of=./test/test bs=65536 count=32768
 fallocate -l 2147483650 ./test/test && echo "Success!"


My test Btrfs is 2G not 4G, so I'm cutting the values of dd and
fallocate in half.

[chris@f28s btrfs]$ sudo dd if=/dev/zero of=tmp bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 7.13391 s, 147 MB/s
[chris@f28s btrfs]$ sync
[chris@f28s btrfs]$ df -h
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/vg-btrfstest  2.0G 1018M  1.1G  50% /mnt/btrfs
[chris@f28s btrfs]$ sudo fallocate -l 1000m tmp


Succeeds. If I do it with a 1200M file for dd and fallocate 1200M over
it, this fails, but I kinda expect that because there's only 1.1G free
space. But maybe that's what you're saying is the bug, it shouldn't
fail?
Yes, you're right, I had things backwards (well, kind of, this does work 
on ext4 and regular XFS, so it arguably should work here).



Re: Healthy amount of free space?

2018-07-18 Thread Chris Murphy
On Wed, Jul 18, 2018 at 11:14 AM, Chris Murphy  wrote:

> I don't know for sure, but based on the addresses reported before and
> after dd for the fallocated tmp file, it looks like Btrfs is not using
> the originally fallocated addresses for dd. So maybe it is COWing into
> new blocks, but is just as quickly deallocating the fallocated blocks
> as it goes, and hence doesn't end up in enospc?

Previous thread is "Problem with file system" from August 2017. And
there's these reproduce steps from Austin which have fallocate coming
after the dd.

truncate --size=4G ./test-fs
mkfs.btrfs ./test-fs
mkdir ./test
mount -t auto ./test-fs ./test
dd if=/dev/zero of=./test/test bs=65536 count=32768
fallocate -l 2147483650 ./test/test && echo "Success!"


My test Btrfs is 2G not 4G, so I'm cutting the values of dd and
fallocate in half.

[chris@f28s btrfs]$ sudo dd if=/dev/zero of=tmp bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 7.13391 s, 147 MB/s
[chris@f28s btrfs]$ sync
[chris@f28s btrfs]$ df -h
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/vg-btrfstest  2.0G 1018M  1.1G  50% /mnt/btrfs
[chris@f28s btrfs]$ sudo fallocate -l 1000m tmp


Succeeds. If I do it with a 1200M file for dd and fallocate 1200M over
it, this fails, but I kinda expect that because there's only 1.1G free
space. But maybe that's what you're saying is the bug, it shouldn't
fail?



-- 
Chris Murphy


Re: Healthy amount of free space?

2018-07-18 Thread Chris Murphy
On Wed, Jul 18, 2018 at 11:06 AM, Austin S. Hemmelgarn
 wrote:
> On 2018-07-18 13:04, Chris Murphy wrote:
>>
>> On Wed, Jul 18, 2018 at 7:30 AM, Austin S. Hemmelgarn
>>  wrote:
>>
>>>
>>> I'm not sure.  In this particular case, this will fail on BTRFS for any X
>>> larger than just short of one third of the total free space.  I would
>>> expect
>>> it to fail for any X larger than just short of half instead.
>>
>>
>> I'm confused. I can't get it to fail when X is 3/4 of free space.
>>
>> lvcreate -V 2g -T vg/thintastic -n btrfstest
>> mkfs.btrfs -M /dev/mapper/vg-btrfstest
>> mount /dev/mapper/vg-btrfstest /mnt/btrfs
>> cd /mnt/btrfs
>> fallocate -l 1500m tmp
>> dd if=/dev/zero of=/mnt/btrfs/tmp bs=1M count=1450
>>
>> Succeeds. No enospc. This is on kernel 4.17.6.
>
> Odd, I could have sworn it would fail reliably.  Unless something has
> changed since I last tested though, doing it with X equal to the free space
> on the filesystem will fail.

OK well X is being defined twice here so I can't tell if I'm doing
this correctly. There's fallocate X and that's 75% of free space for
the empty fs at the time of fallocate.

And then there's dd which is 1450m which is ~2.67x the free space at
the time of dd.

I don't know for sure, but based on the addresses reported before and
after dd for the fallocated tmp file, it looks like Btrfs is not using
the originally fallocated addresses for dd. So maybe it is COWing into
new blocks, but is just as quickly deallocating the fallocated blocks
as it goes, and hence doesn't end up in enospc?



-- 
Chris Murphy


Re: Healthy amount of free space?

2018-07-18 Thread Chris Murphy
On Wed, Jul 18, 2018 at 7:30 AM, Austin S. Hemmelgarn
 wrote:

>
> I'm not sure.  In this particular case, this will fail on BTRFS for any X
> larger than just short of one third of the total free space.  I would expect
> it to fail for any X larger than just short of half instead.

I'm confused. I can't get it to fail when X is 3/4 of free space.

lvcreate -V 2g -T vg/thintastic -n btrfstest
mkfs.btrfs -M /dev/mapper/vg-btrfstest
mount /dev/mapper/vg-btrfstest /mnt/btrfs
cd /mnt/btrfs
fallocate -l 1500m tmp
dd if=/dev/zero of=/mnt/btrfs/tmp bs=1M count=1450

Succeeds. No enospc. This is on kernel 4.17.6.


Copied from terminal:

[chris@f28s btrfs]$ df -h
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/vg-btrfstest  2.0G   17M  2.0G   1% /mnt/btrfs
[chris@f28s btrfs]$ sudo fallocate -l 1500m /mnt/btrfs/tmp
[chris@f28s btrfs]$ filefrag -v tmp
Filesystem type is: 9123683e
File size of tmp is 1572864000 (384000 blocks of 4096 bytes)
 ext: logical_offset:physical_offset: length:   expected: flags:
   0:0..   32767:  16400.. 49167:  32768: unwritten
   1:32768..   65535:  56576.. 89343:  32768:  49168: unwritten
   2:65536..   98303: 109824..142591:  32768:  89344: unwritten
   3:98304..  131071: 163072..195839:  32768: 142592: unwritten
   4:   131072..  163839: 216320..249087:  32768: 195840: unwritten
   5:   163840..  196607: 269568..302335:  32768: 249088: unwritten
   6:   196608..  229375: 322816..355583:  32768: 302336: unwritten
   7:   229376..  262143: 376064..408831:  32768: 355584: unwritten
   8:   262144..  294911: 429312..462079:  32768: 408832: unwritten
   9:   294912..  327679: 482560..515327:  32768: 462080: unwritten
  10:   327680..  344063:  89344..105727:  16384: 515328: unwritten
  11:   344064..  360447: 142592..158975:  16384: 105728: unwritten
  12:   360448..  376831: 195840..212223:  16384: 158976: unwritten
  13:   376832..  383999: 249088..256255:   7168: 212224:
last,unwritten,eof
tmp: 14 extents found
[chris@f28s btrfs]$ df -h
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/vg-btrfstest  2.0G  1.5G  543M  74% /mnt/btrfs
[chris@f28s btrfs]$ sudo dd if=/dev/zero of=/mnt/btrfs/tmp bs=1M count=1450
1450+0 records in
1450+0 records out
1520435200 bytes (1.5 GB, 1.4 GiB) copied, 13.4757 s, 113 MB/s
[chris@f28s btrfs]$ df -h
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/vg-btrfstest  2.0G  1.5G  591M  72% /mnt/btrfs
[chris@f28s btrfs]$ filefrag -v tmp
Filesystem type is: 9123683e
File size of tmp is 1520435200 (371200 blocks of 4096 bytes)
 ext: logical_offset:physical_offset: length:   expected: flags:
   0:0..   16383: 302336..318719:  16384:
   1:16384..   32767: 355584..371967:  16384: 318720:
   2:32768..   49151: 408832..425215:  16384: 371968:
   3:49152..   65535: 462080..478463:  16384: 425216:
   4:65536..   73727: 515328..523519:   8192: 478464:
   5:73728..   86015:   3328.. 15615:  12288: 523520:
   6:86016..   98303: 256256..268543:  12288:  15616:
   7:98304..  104959:  49168.. 55823:   6656: 268544:
   8:   104960..  109047: 105728..109815:   4088:  55824:
   9:   109048..  113143: 158976..163071:   4096: 109816:
  10:   113144..  117239: 212224..216319:   4096: 163072:
  11:   117240..  121335: 318720..322815:   4096: 216320:
  12:   121336..  125431: 371968..376063:   4096: 322816:
  13:   125432..  128251: 425216..428035:   2820: 376064:
  14:   128252..  131071: 478464..481283:   2820: 428036:
  15:   131072..  132409:   1460..  2797:   1338: 481284:
  16:   132410..  165177: 322816..355583:  32768:   2798:
  17:   165178..  197945: 376064..408831:  32768: 355584:
  18:   197946..  230713: 429312..462079:  32768: 408832:
  19:   230714..  263481: 482560..515327:  32768: 462080:
  20:   263482..  296249:  16400.. 49167:  32768: 515328:
  21:   296250..  327687:  56576.. 88013:  31438:  49168:
  22:   327688..  328711: 428036..429059:   1024:  88014:
  23:   328712..  361479: 109824..142591:  32768: 429060:
  24:   361480..  371199:  88014.. 97733:   9720: 142592: last,eof
tmp: 25 extents found
[chris@f28s btrfs]$


*shrug*


-- 
Chris Murphy


Re: Healthy amount of free space?

2018-07-18 Thread Austin S. Hemmelgarn

On 2018-07-18 09:07, Chris Murphy wrote:

On Wed, Jul 18, 2018 at 6:35 AM, Austin S. Hemmelgarn
 wrote:


If you're doing a training presentation, it may be worth mentioning that
preallocation with fallocate() does not behave the same on BTRFS as it does
on other filesystems.  For example, the following sequence of commands:

 fallocate -l X ./tmp
 dd if=/dev/zero of=./tmp bs=1 count=X

Will always work on ext4, XFS, and most other filesystems, for any value of
X between zero and just below the total amount of free space on the
filesystem.  On BTRFS though, it will reliably fail with ENOSPC for values
of X that are greater than _half_ of the total amount of free space on the
filesystem (actually, greater than just short of half).  In essence,
preallocating space does not prevent COW semantics for the first write
unless the file is marked NOCOW.


Is this a bug, or is it suboptimal behavior, or is it intentional?
It's been discussed before, though I can't find the email thread right 
now.  Pretty much, this is _technically_ not incorrect behavior, as the 
documentation for fallocate doesn't say that subsequent writes can't 
fail due to lack of space.  I personally consider it a bug though 
because it breaks from existing behavior in a way that is avoidable and 
defies user expectations.


There are two issues here:

1. Regions preallocated with fallocate still do COW on the first write 
to any given block in that region.  This can be handled by either 
treating the first write to each block as NOCOW, or by allocating a bit 
of extra space and doing a rotating approach like this for writes:

- Write goes into the extra space.
- Once the write is done, convert the region covered by the write
  into a new block of extra space.
- When the final block of the preallocated region is written,
  deallocate the extra space.
2. Preallocation does not completely account for necessary metadata 
space that will be needed to store the data there.  This may not be 
necessary if the first issue is addressed properly.


And then I wonder what happens with XFS COW:

  fallocate -l X ./tmp
  cp --reflink ./tmp ./tmp2
  dd if=/dev/zero of=./tmp bs=1 count=X
I'm not sure.  In this particular case, this will fail on BTRFS for any 
X larger than just short of one third of the total free space.  I would 
expect it to fail for any X larger than just short of half instead.


ZFS gets around this by not supporting fallocate (well, kind of: if 
you're using glibc and call posix_fallocate, that _will_ work, but it 
will take forever because it works by writing out each block of space 
that's being allocated, which, ironically, means it potentially still 
suffers from the same issue that we have).
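
(Incidentally, you can ask for exactly that fallback from the command line
too, assuming a util-linux new enough to have the --posix option:

   fallocate -x -l 1G ./tmp   # uses posix_fallocate(3) instead of fallocate(2)

On a filesystem without fallocate() support, glibc emulates the call by
writing to every block, so it is slow, but the space really does get
consumed.)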



Re: Healthy amount of free space?

2018-07-18 Thread Chris Murphy
On Wed, Jul 18, 2018 at 6:35 AM, Austin S. Hemmelgarn
 wrote:

> If you're doing a training presentation, it may be worth mentioning that
> preallocation with fallocate() does not behave the same on BTRFS as it does
> on other filesystems.  For example, the following sequence of commands:
>
> fallocate -l X ./tmp
> dd if=/dev/zero of=./tmp bs=1 count=X
>
> Will always work on ext4, XFS, and most other filesystems, for any value of
> X between zero and just below the total amount of free space on the
> filesystem.  On BTRFS though, it will reliably fail with ENOSPC for values
> of X that are greater than _half_ of the total amount of free space on the
> filesystem (actually, greater than just short of half).  In essence,
> preallocating space does not prevent COW semantics for the first write
> unless the file is marked NOCOW.

Is this a bug, or is it suboptimal behavior, or is it intentional?

And then I wonder what happens with XFS COW:

 fallocate -l X ./tmp
 cp --reflink ./tmp ./tmp2
 dd if=/dev/zero of=./tmp bs=1 count=X



-- 
Chris Murphy


Re: Healthy amount of free space?

2018-07-18 Thread Austin S. Hemmelgarn

On 2018-07-17 13:54, Martin Steigerwald wrote:

Nikolay Borisov - 17.07.18, 10:16:

On 17.07.2018 11:02, Martin Steigerwald wrote:

Nikolay Borisov - 17.07.18, 09:20:

On 16.07.2018 23:58, Wolf wrote:

Greetings,
I would like to ask what is a healthy amount of free space to
keep on each device for btrfs to be happy?

This is how my disk array currently looks like

 [root@dennas ~]# btrfs fi usage /raid
 
 Overall:

  Device size:                  29.11TiB
  Device allocated:             21.26TiB
  Device unallocated:            7.85TiB
  Device missing:                  0.00B
  Used:                         21.18TiB
  Free (estimated):              3.96TiB  (min: 3.96TiB)
  Data ratio:                       2.00
  Metadata ratio:                   2.00
  Global reserve:              512.00MiB  (used: 0.00B)


[…]


Btrfs does quite a good job of evenly using space on all devices.
Now, how low can I let that go? In other words, with how much
free/unallocated space remaining should I consider adding a new
disk?


Btrfs will start running into problems when you run out of
unallocated space. So the best advice will be monitor your device
unallocated, once it gets really low - like 2-3 gb I will suggest
you run balance which will try to free up unallocated space by
rewriting data more compactly into sparsely populated block
groups. If after running balance you haven't really freed any
space then you should consider adding a new drive and running
balance to even out the spread of data/metadata.


What are these issues exactly?


For example if you have plenty of data space but your metadata is full
then you will be getting ENOSPC.


Of that one I am aware.

This just did not happen so far.

I did not yet add it explicitly to the training slides, but I just make
myself a note to do that.

Anything else?


If you're doing a training presentation, it may be worth mentioning that 
preallocation with fallocate() does not behave the same on BTRFS as it 
does on other filesystems.  For example, the following sequence of commands:


fallocate -l X ./tmp
dd if=/dev/zero of=./tmp bs=1 count=X

Will always work on ext4, XFS, and most other filesystems, for any value 
of X between zero and just below the total amount of free space on the 
filesystem.  On BTRFS though, it will reliably fail with ENOSPC for 
values of X that are greater than _half_ of the total amount of free 
space on the filesystem (actually, greater than just short of half).  In 
essence, preallocating space does not prevent COW semantics for the 
first write unless the file is marked NOCOW.
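
If you need the preallocation guarantee to actually hold on BTRFS today, 
the usual workaround is to mark the file NOCOW before it contains any 
data, for example (a sketch; note that the attribute only takes effect on 
an empty file, and that it disables data checksumming and compression for 
that file):

   touch ./tmp
   chattr +C ./tmp                # mark the (still empty) file NOCOW
   fallocate -l X ./tmp
   dd if=/dev/zero of=./tmp bs=1 count=X conv=notrunc   # overwrites the
                                                        # preallocated blocks in
                                                        # place instead of COWing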




Re: Healthy amount of free space?

2018-07-17 Thread Martin Steigerwald
Nikolay Borisov - 17.07.18, 10:16:
> On 17.07.2018 11:02, Martin Steigerwald wrote:
> > Nikolay Borisov - 17.07.18, 09:20:
> >> On 16.07.2018 23:58, Wolf wrote:
> >>> Greetings,
> >>> I would like to ask what is a healthy amount of free space to
> >>> keep on each device for btrfs to be happy?
> >>> 
> >>> This is how my disk array currently looks like
> >>> 
> >>> [root@dennas ~]# btrfs fi usage /raid
> >>> 
> >>> Overall:
> >>> Device size:                  29.11TiB
> >>> Device allocated:             21.26TiB
> >>> Device unallocated:            7.85TiB
> >>> Device missing:                  0.00B
> >>> Used:                         21.18TiB
> >>> Free (estimated):              3.96TiB  (min: 3.96TiB)
> >>> Data ratio:                       2.00
> >>> Metadata ratio:                   2.00
> >>> Global reserve:              512.00MiB  (used: 0.00B)
> > 
> > […]
> > 
> >>> Btrfs does quite a good job of evenly using space on all devices.
> >>> Now, how low can I let that go? In other words, with how much
> >>> free/unallocated space remaining should I consider adding a new
> >>> disk?
> >> 
> >> Btrfs will start running into problems when you run out of
> >> unallocated space. So the best advice will be monitor your device
> >> unallocated, once it gets really low - like 2-3 gb I will suggest
> >> you run balance which will try to free up unallocated space by
> >> rewriting data more compactly into sparsely populated block
> >> groups. If after running balance you haven't really freed any
> >> space then you should consider adding a new drive and running
> >> balance to even out the spread of data/metadata.
> > 
> > What are these issues exactly?
> 
> For example if you have plenty of data space but your metadata is full
> then you will be getting ENOSPC.

Of that one I am aware.

This just did not happen so far.

I did not yet add it explicitly to the training slides, but I just make 
myself a note to do that.

Anything else?

> > I have
> > 
> > % btrfs fi us -T /home
> > 
> > Overall:
> > Device size:         340.00GiB
> > Device allocated:    340.00GiB
> > Device unallocated:    2.00MiB
> > Device missing:          0.00B
> > Used:                308.37GiB
> > Free (estimated):     14.65GiB  (min: 14.65GiB)
> > Data ratio:               2.00
> > Metadata ratio:           2.00
> > Global reserve:      512.00MiB  (used: 0.00B)
> > 
> >                            Data       Metadata   System
> > Id Path                    RAID1      RAID1      RAID1     Unallocated
> > -- ----------------------- ---------- ---------- --------- -----------
> >  1 /dev/mapper/msata-home  165.89GiB    4.08GiB  32.00MiB      1.00MiB
> >  2 /dev/mapper/sata-home   165.89GiB    4.08GiB  32.00MiB      1.00MiB
> > -- ----------------------- ---------- ---------- --------- -----------
> >    Total                   165.89GiB    4.08GiB  32.00MiB      2.00MiB
> >    Used                    151.24GiB    2.95GiB  48.00KiB
>
> You already have only 33% of your metadata full, so if your workload
> turned out to actually be making more metadata-heavy changes, i.e.
> snapshots, you could exhaust this and get ENOSPC, despite having around
> 14gb of free data space. Furthermore, this data space is spread around
> multiple data chunks; depending on how populated they are, a balance
> could be able to free up unallocated space which later could be
> re-purposed for metadata (again, depending on what you are doing).

The filesystem above IMO is not fit for snapshots. It would fill up 
rather quickly, I think even when I balance metadata. Actually I tried 
this and as I remember it took at most a day until it was full.

If I read the above figures correctly, at maximum I could currently gain 
one additional GiB by balancing metadata. That would not make a huge 
difference.

I bet I am already running this filesystem beyond recommendation, as I 
bet many would argue it is too full already for regular usage… I do not 
see the benefit of squeezing the last free space out of it just to fit 
in another GiB.

So I still do not get the point why it would make sense to balance it at 
this point in time. Especially as this 1 GiB I could regain is not even 
needed. And I do not see th

Re: Healthy amount of free space?

2018-07-17 Thread Austin S. Hemmelgarn

On 2018-07-16 16:58, Wolf wrote:

Greetings,
I would like to ask what is a healthy amount of free space to keep on
each device for btrfs to be happy?

This is how my disk array currently looks like

 [root@dennas ~]# btrfs fi usage /raid
 Overall:
 Device size:                  29.11TiB
 Device allocated:             21.26TiB
 Device unallocated:            7.85TiB
 Device missing:                  0.00B
 Used:                         21.18TiB
 Free (estimated):              3.96TiB  (min: 3.96TiB)
 Data ratio:                       2.00
 Metadata ratio:                   2.00
 Global reserve:              512.00MiB  (used: 0.00B)

 Data,RAID1: Size:10.61TiB, Used:10.58TiB
/dev/mapper/data1   1.75TiB
/dev/mapper/data2   1.75TiB
/dev/mapper/data3 856.00GiB
/dev/mapper/data4 856.00GiB
/dev/mapper/data5   1.75TiB
/dev/mapper/data6   1.75TiB
/dev/mapper/data7   6.29TiB
/dev/mapper/data8   6.29TiB

 Metadata,RAID1: Size:15.00GiB, Used:13.00GiB
/dev/mapper/data1   2.00GiB
/dev/mapper/data2   3.00GiB
/dev/mapper/data3   1.00GiB
/dev/mapper/data4   1.00GiB
/dev/mapper/data5   3.00GiB
/dev/mapper/data6   1.00GiB
/dev/mapper/data7   9.00GiB
/dev/mapper/data8  10.00GiB
Slightly OT, but the distribution of metadata chunks across devices 
looks a bit sub-optimal here.  If you can tolerate the volume being 
somewhat slower for a while, I'd suggest balancing these (it should get 
you better performance long-term).


 System,RAID1: Size:64.00MiB, Used:1.50MiB
/dev/mapper/data2  32.00MiB
/dev/mapper/data6  32.00MiB
/dev/mapper/data7  32.00MiB
/dev/mapper/data8  32.00MiB

 Unallocated:
/dev/mapper/data1  1004.52GiB
/dev/mapper/data2  1004.49GiB
/dev/mapper/data3  1006.01GiB
/dev/mapper/data4  1006.01GiB
/dev/mapper/data5  1004.52GiB
/dev/mapper/data6  1004.49GiB
/dev/mapper/data7  1005.00GiB
/dev/mapper/data8  1005.00GiB

Btrfs does quite a good job of evenly using space on all devices. Now, how
low can I let that go? In other words, with how much free/unallocated
space remaining should I consider adding a new disk?

Disclaimer: What I'm about to say is based on personal experience.  YMMV.

It depends on how you use the filesystem.

Realistically, there are a couple of things I consider when trying to 
decide on this myself:


* How quickly does the total usage increase on average, and how much can 
it be expected to increase in one day in the worst case scenario?  This 
isn't really BTRFS specific, but it's worth mentioning.  I usually don't 
let an array get close enough to full that it wouldn't be able to safely 
handle at least one day of the worst-case increase and another two days of 
average increases.  In BTRFS terms, the 'safely handle' part means you 
should be adding about 5GB for a multi-TB array like you have, or about 
1GB for a sub-TB array.


* What are the typical write patterns?  Do files get rewritten in-place, 
or are they only ever rewritten with a replace-by-rename? Are writes 
mostly random, or mostly sequential?  Are writes mostly small or mostly 
large?  The more towards the first possibility listed in each of those 
question (in-place rewrites, random access, and small writes), the more 
free space you should keep on the volume.


* Does this volume see heavy usage of fallocate() either to preallocate 
space (note that this _DOES NOT WORK SANELY_ on BTRFS), or to punch 
holes or remove ranges from files.  If whatever software you're using 
does this a lot on this volume, you want even more free space.


* Do old files tend to get removed in large batches?  That is, possibly 
hundreds or thousands of files at a time.  If so, and you're running a 
reasonably recent (4.x series) kernel or regularly balance the volume to 
clean up empty chunks, you don't need quite as much free space.


* How quickly can you get a new device added, and is it critical that 
this volume always be writable?  Sounds stupid, but a lot of people 
don't consider this.  If you can trivially get a new device added 
immediately, you can generally let things go a bit further than you 
would normally, same for if the volume being read-only can be tolerated 
for a while without significant issues.


It's worth noting that I explicitly do not care about snapshot usage. 
It rarely has much impact on this other than changing how the total 
usage increases in a day.


Evaluating all of this is of course something I can't really do for you. 
 If I had to guess, with no other information than the allocations 
shown, I'd say that you're probably generically fine until you get down 
to about 5GB more than twice the average

Re: Healthy amount of free space?

2018-07-17 Thread Nikolay Borisov



On 17.07.2018 11:02, Martin Steigerwald wrote:
> Hi Nikolay.
> 
> Nikolay Borisov - 17.07.18, 09:20:
>> On 16.07.2018 23:58, Wolf wrote:
>>> Greetings,
>>> I would like to ask what is a healthy amount of free space to
>>> keep on each device for btrfs to be happy?
>>>
>>> This is how my disk array currently looks like
>>>
>>> [root@dennas ~]# btrfs fi usage /raid
>>> 
>>> Overall:
>>> Device size:                  29.11TiB
>>> Device allocated:             21.26TiB
>>> Device unallocated:            7.85TiB
>>> Device missing:                  0.00B
>>> Used:                         21.18TiB
>>> Free (estimated):              3.96TiB  (min: 3.96TiB)
>>> Data ratio:                       2.00
>>> Metadata ratio:                   2.00
>>> Global reserve:              512.00MiB  (used: 0.00B)
> […]
>>> Btrfs does quite a good job of evenly using space on all devices. Now,
>>> how low can I let that go? In other words, with how much free/unallocated
>>> space remaining should I consider adding a new disk?
>>
>> Btrfs will start running into problems when you run out of unallocated
>> space. So the best advice will be monitor your device unallocated,
>> once it gets really low - like 2-3 gb I will suggest you run balance
>> which will try to free up unallocated space by rewriting data more
>> compactly into sparsely populated block groups. If after running
>> balance you haven't really freed any space then you should consider
>> adding a new drive and running balance to even out the spread of
>> data/metadata.
> 
> What are these issues exactly?

For example if you have plenty of data space but your metadata is full
then you will be getting ENOSPC.

> 
> I have
> 
> % btrfs fi us -T /home
> Overall:
> Device size:         340.00GiB
> Device allocated:    340.00GiB
> Device unallocated:    2.00MiB
> Device missing:          0.00B
> Used:                308.37GiB
> Free (estimated):     14.65GiB  (min: 14.65GiB)
> Data ratio:               2.00
> Metadata ratio:           2.00
> Global reserve:      512.00MiB  (used: 0.00B)
> 
>                            Data       Metadata   System
> Id Path                    RAID1      RAID1      RAID1     Unallocated
> -- ----------------------- ---------- ---------- --------- -----------
>  1 /dev/mapper/msata-home  165.89GiB    4.08GiB  32.00MiB      1.00MiB
>  2 /dev/mapper/sata-home   165.89GiB    4.08GiB  32.00MiB      1.00MiB
> -- ----------------------- ---------- ---------- --------- -----------
>    Total                   165.89GiB    4.08GiB  32.00MiB      2.00MiB
>    Used                    151.24GiB    2.95GiB  48.00KiB

You already have only 33% of your metadata full, so if your workload
turned out to actually be making more metadata-heavy changes, i.e.
snapshots, you could exhaust this and get ENOSPC, despite having around
14gb of free data space. Furthermore, this data space is spread around
multiple data chunks; depending on how populated they are, a balance
could be able to free up unallocated space which later could be
re-purposed for metadata (again, depending on what you are doing).

> 
> on a RAID-1 filesystem one, part of the time two Plasma desktops + 
> KDEPIM and Akonadi + Baloo desktop search + you name it write to like 
> mad.
> 



> 
> Thanks,
> 


Re: Healthy amount of free space?

2018-07-17 Thread Martin Steigerwald
Hi Nikolay.

Nikolay Borisov - 17.07.18, 09:20:
> On 16.07.2018 23:58, Wolf wrote:
> > Greetings,
> > I would like to ask what is a healthy amount of free space to
> > keep on each device for btrfs to be happy?
> > 
> > This is how my disk array currently looks like
> > 
> > [root@dennas ~]# btrfs fi usage /raid
> > 
> > Overall:
> > Device size:                  29.11TiB
> > Device allocated:             21.26TiB
> > Device unallocated:            7.85TiB
> > Device missing:                  0.00B
> > Used:                         21.18TiB
> > Free (estimated):              3.96TiB  (min: 3.96TiB)
> > Data ratio:                       2.00
> > Metadata ratio:                   2.00
> > Global reserve:              512.00MiB  (used: 0.00B)
[…]
> > Btrfs does quite a good job of evenly using space on all devices. Now,
> > how low can I let that go? In other words, with how much free/unallocated
> > space remaining should I consider adding a new disk?
> 
> Btrfs will start running into problems when you run out of unallocated
> space. So the best advice will be monitor your device unallocated,
> once it gets really low - like 2-3 gb I will suggest you run balance
> which will try to free up unallocated space by rewriting data more
> compactly into sparsely populated block groups. If after running
> balance you haven't really freed any space then you should consider
> adding a new drive and running balance to even out the spread of
> data/metadata.

What are these issues exactly?

I have

% btrfs fi us -T /home
Overall:
Device size:         340.00GiB
Device allocated:    340.00GiB
Device unallocated:    2.00MiB
Device missing:          0.00B
Used:                308.37GiB
Free (estimated):     14.65GiB  (min: 14.65GiB)
Data ratio:               2.00
Metadata ratio:           2.00
Global reserve:      512.00MiB  (used: 0.00B)

                           Data       Metadata   System
Id Path                    RAID1      RAID1      RAID1     Unallocated
-- ----------------------- ---------- ---------- --------- -----------
 1 /dev/mapper/msata-home  165.89GiB    4.08GiB  32.00MiB      1.00MiB
 2 /dev/mapper/sata-home   165.89GiB    4.08GiB  32.00MiB      1.00MiB
-- ----------------------- ---------- ---------- --------- -----------
   Total                   165.89GiB    4.08GiB  32.00MiB      2.00MiB
   Used                    151.24GiB    2.95GiB  48.00KiB

on a RAID-1 filesystem to which one, and part of the time two, Plasma 
desktops + KDEPIM and Akonadi + Baloo desktop search + you name it write 
like mad.

Since kernel 4.5 or 4.6 this simply works. Before that sometimes BTRFS 
crawled to a halt searching for free blocks, and I had to switch off 
the laptop uncleanly. If that happened, a balance helped for a while. 
But since 4.5 or 4.6 this did not happen anymore.

I found that with SLES 12 SP 3 or so, btrfsmaintenance runs a balance 
weekly, which created an issue on our Proxmox + Ceph on Intel NUC based 
opensource demo lab. This is for sure not a recommended configuration for 
Ceph, and Ceph is quite slow on these 2.5 inch harddisks and the 1 GBit 
network link, despite the (albeit somewhat minimal, limited to 5 GiB) m.2 
SSD caching. What happened is that the VM crawled to a halt and the 
kernel gave "task hung for more than 120 seconds" messages. The VM was 
basically unusable during the balance. Sure, that should not happen with 
a "proper" setup, but it also did not happen without the automatic 
balance.

Also, what would happen on a hypervisor setup with several thousands of 
VMs with BTRFS, when several hundred of them decide to start a balance at 
a similar time? It could probably bring the I/O system below them to a 
halt, as many enterprise storage systems are designed to sustain burst 
I/O loads, but not maximum utilization during an extended period of time.

I am really wondering what to recommend in my Linux performance tuning 
and analysis courses. On my own laptop I do not do regular balances so 
far. Due to my thinking: If it is not broken, do not fix it.

My personal opinion here also is: if the filesystem degrades so much 
that it becomes unusable without regular maintenance from user space, 
the filesystem needs to be fixed. Ideally I would not have to worry on 
whether to regularly balance an BTRFS or not. In other words: I should 
not have to visit a performance analysis and tuning course in order to 
use a computer with BTRFS filesystem.

Thanks,
-- 
Martin




Re: Healthy amount of free space?

2018-07-17 Thread Nikolay Borisov



On 16.07.2018 23:58, Wolf wrote:
> Greetings,
> I would like to ask what is a healthy amount of free space to keep on
> each device for btrfs to be happy?
> 
> This is how my disk array currently looks like
> 
> [root@dennas ~]# btrfs fi usage /raid
> Overall:
> Device size:                  29.11TiB
> Device allocated:             21.26TiB
> Device unallocated:            7.85TiB
> Device missing:                  0.00B
> Used:                         21.18TiB
> Free (estimated):              3.96TiB  (min: 3.96TiB)
> Data ratio:                       2.00
> Metadata ratio:                   2.00
> Global reserve:              512.00MiB  (used: 0.00B)
> 
> Data,RAID1: Size:10.61TiB, Used:10.58TiB
>/dev/mapper/data1   1.75TiB
>/dev/mapper/data2   1.75TiB
>/dev/mapper/data3 856.00GiB
>/dev/mapper/data4 856.00GiB
>/dev/mapper/data5   1.75TiB
>/dev/mapper/data6   1.75TiB
>/dev/mapper/data7   6.29TiB
>/dev/mapper/data8   6.29TiB
> 
> Metadata,RAID1: Size:15.00GiB, Used:13.00GiB
>/dev/mapper/data1   2.00GiB
>/dev/mapper/data2   3.00GiB
>/dev/mapper/data3   1.00GiB
>/dev/mapper/data4   1.00GiB
>/dev/mapper/data5   3.00GiB
>/dev/mapper/data6   1.00GiB
>/dev/mapper/data7   9.00GiB
>/dev/mapper/data8  10.00GiB
> 
> System,RAID1: Size:64.00MiB, Used:1.50MiB
>/dev/mapper/data2  32.00MiB
>/dev/mapper/data6  32.00MiB
>/dev/mapper/data7  32.00MiB
>/dev/mapper/data8  32.00MiB
> 
> Unallocated:
>    /dev/mapper/data1  1004.52GiB
>    /dev/mapper/data2  1004.49GiB
>    /dev/mapper/data3  1006.01GiB
>    /dev/mapper/data4  1006.01GiB
>    /dev/mapper/data5  1004.52GiB
>    /dev/mapper/data6  1004.49GiB
>    /dev/mapper/data7  1005.00GiB
>    /dev/mapper/data8  1005.00GiB
> 
> Btrfs does quite a good job of evenly using space on all devices. Now, how
> low can I let that go? In other words, with how much free/unallocated
> space remaining should I consider adding a new disk?

Btrfs will start running into problems when you run out of unallocated
space. So the best advice is to monitor your device's unallocated space; once
it gets really low - like 2-3 GB - I suggest you run a balance, which
will try to free up unallocated space by rewriting data more compactly
into sparsely populated block groups. If after running balance you
haven't really freed any space, then you should consider adding a new
drive and running balance to even out the spread of data/metadata.
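
A rough sketch of what that kind of monitoring and maintenance can look
like (the /raid mount point is taken from your report above; /dev/sdX is a
placeholder):

   btrfs filesystem usage /raid    # watch the 'Device unallocated' line
   # rewrite only block groups that are at most half full, so the balance
   # does not churn through data that is already packed tightly
   btrfs balance start -dusage=50 -musage=50 /raid
   # if that does not free up unallocated space, grow the array and
   # spread the existing data/metadata across the new device
   btrfs device add /dev/sdX /raid
   btrfs balance start /raid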

> 
> Thanks for advice :)
> 
> W.
> 


Healthy amount of free space?

2018-07-16 Thread Wolf
Greetings,
I would like to ask what is a healthy amount of free space to keep on
each device for btrfs to be happy?

This is how my disk array currently looks like

[root@dennas ~]# btrfs fi usage /raid
Overall:
Device size:                  29.11TiB
Device allocated:             21.26TiB
Device unallocated:            7.85TiB
Device missing:                  0.00B
Used:                         21.18TiB
Free (estimated):              3.96TiB  (min: 3.96TiB)
Data ratio:                       2.00
Metadata ratio:                   2.00
Global reserve:              512.00MiB  (used: 0.00B)

Data,RAID1: Size:10.61TiB, Used:10.58TiB
   /dev/mapper/data1   1.75TiB
   /dev/mapper/data2   1.75TiB
   /dev/mapper/data3 856.00GiB
   /dev/mapper/data4 856.00GiB
   /dev/mapper/data5   1.75TiB
   /dev/mapper/data6   1.75TiB
   /dev/mapper/data7   6.29TiB
   /dev/mapper/data8   6.29TiB

Metadata,RAID1: Size:15.00GiB, Used:13.00GiB
   /dev/mapper/data1   2.00GiB
   /dev/mapper/data2   3.00GiB
   /dev/mapper/data3   1.00GiB
   /dev/mapper/data4   1.00GiB
   /dev/mapper/data5   3.00GiB
   /dev/mapper/data6   1.00GiB
   /dev/mapper/data7   9.00GiB
   /dev/mapper/data8  10.00GiB

System,RAID1: Size:64.00MiB, Used:1.50MiB
   /dev/mapper/data2  32.00MiB
   /dev/mapper/data6  32.00MiB
   /dev/mapper/data7  32.00MiB
   /dev/mapper/data8  32.00MiB

Unallocated:
   /dev/mapper/data1  1004.52GiB
   /dev/mapper/data2  1004.49GiB
   /dev/mapper/data3  1006.01GiB
   /dev/mapper/data4  1006.01GiB
   /dev/mapper/data5  1004.52GiB
   /dev/mapper/data6  1004.49GiB
   /dev/mapper/data7  1005.00GiB
   /dev/mapper/data8  1005.00GiB

Btrfs does quite a good job of evenly using space on all devices. Now, how
low can I let that go? In other words, with how much free/unallocated
space remaining should I consider adding a new disk?

Thanks for advice :)

W.

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.

