Re: RAID-1 refuses to balance large drive

2018-06-07 Thread Zygo Blaxell
On Sat, May 26, 2018 at 06:27:57PM -0700, Brad Templeton wrote:
> A few years ago, I encountered an issue (halfway between a bug and a
> problem) with attempting to grow a BTRFS 3 disk Raid 1 which was
> fairly full.   The problem was that after replacing (by add/delete) a
> small drive with a larger one, there were now 2 full drives and one
> new half-full one, and balance was not able to correct this situation
> to produce the desired result, which is 3 drives, each with a roughly
> even amount of free space.  It can't do it because the 2 smaller
> drives are full, and it doesn't realize it could just move one of the
> copies of a block off the smaller drive onto the larger drive to free
> space on the smaller drive, it wants to move them both, and there is
> nowhere to put them both.
> 
> I'm about to do it again, taking my nearly full array which is 4TB,
> 4TB, 6TB and replacing one of the 4TB with an 8TB.  I don't want to
> repeat the very time consuming situation, so I wanted to find out if
> things were fixed now.   I am running Xenial (kernel 4.4.0) and could
> consider the upgrade to  bionic (4.15) though that adds a lot more to
> my plate before a long trip and I would prefer to avoid if I can.
> 
> So what is the best strategy:
> 
> a) Replace 4TB with 8TB, resize up and balance?  (This is the "basic" 
> strategy)
> b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks
> from 4TB but possibly not enough)
> c) Replace 6TB with 8TB, resize/balance, then replace 4TB with
> recently vacated 6TB -- much longer procedure but possibly better

d) Run "btrfs balance start -dlimit=3 /fs" to make some unallocated
space on all drives *before* adding disks.  Then replace, resize up,
and balance until unallocated space on all disks are equal.  There is
no need to continue balancing after that, so once that point is reached
you can cancel the balance.
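
As a concrete sketch of that sequence (the device names and devid below
are placeholders, not values from this thread):

    # free a little unallocated space on every drive first
    btrfs balance start -dlimit=3 /fs

    # swap the old drive for the new one, keeping raid1 redundancy intact
    btrfs replace start /dev/old /dev/new /fs
    btrfs replace status /fs        # wait until it reports "finished"

    # grow the filesystem to the full size of the new drive
    # (use the devid shown for /dev/new by "btrfs filesystem show /fs")
    btrfs filesystem resize 4:max /fs

    # balance; watch "btrfs device usage /fs" from another shell and run
    # "btrfs balance cancel /fs" once unallocated space is roughly equal
    btrfs balance start /fs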

A number of bad things can happen when unallocated space goes to zero,
and being unable to expand a raid1 array is only one of them.  Avoid that
situation even when not resizing the array, because some cases can be
very difficult to get out of.

Assuming your disk is not filled to the last gigabyte, you'll be able
to keep at least 1GB unallocated on every disk at all times.  Monitor
the amount of unallocated space and balance a few data block groups
(e.g. -dlimit=3) whenever unallocated space gets low.
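
A minimal monitoring sketch along those lines; the 10GiB threshold here
is an arbitrary placeholder:

    #!/bin/sh
    # Kick off a small balance when any device's unallocated space gets low.
    MNT=/fs
    MIN_BYTES=$((10 * 1024 * 1024 * 1024))   # 10GiB, pick your own margin

    # "btrfs device usage -b" prints raw byte counts, easy to compare
    if btrfs device usage -b "$MNT" | awk -v min="$MIN_BYTES" \
          '/Unallocated:/ && $2 + 0 < min { found = 1 } END { exit !found }'
    then
        # relocate a few data block groups, as suggested above
        btrfs balance start -dlimit=3 "$MNT"
    fi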

A potential btrfs enhancement area:  allow the 'devid' parameter of
balance to specify two disks to balance block groups that contain chunks
on both disks.  We want to balance only those block groups that consist of
one chunk on each smaller drive.  This redistributes those block groups
to have one chunk on the large disk and one chunk on one of the smaller
disks, freeing space on the other small disk for the next block group.
Block groups that consist of a chunk on the big disk and one of the
small disks are already in the desired configuration, so rebalancing
them is just a waste of time.  Currently it's only possible to do this
by writing a script to select individual block groups with python-btrfs
or similar--much faster than plain btrfs balance for this case, but more
involved to set up.
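
One way to approximate that today: once you have the logical (virtual)
addresses of the block groups whose two chunks sit on the two smaller
devices (however you collect them, e.g. with a python-btrfs script), the
vrange balance filter can relocate them one at a time.  The addresses
below are made-up placeholders:

    # vrange matches block groups overlapping the given logical byte
    # range, so addr..addr+1 selects exactly one block group
    for bg in 111669149696 112742891520 113816633344; do
        btrfs balance start -dvrange="$bg..$((bg + 1))" /fs
    done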

> Or has this all been fixed and method A will work fine and get to the
> ideal goal -- 3 drives, with available space suitably distributed to
> allow full utilization over time?
> 
> On Sat, May 26, 2018 at 6:24 PM, Brad Templeton  wrote:
> > A few years ago, I encountered an issue (halfway between a bug and a
> > problem) with attempting to grow a BTRFS 3 disk Raid 1 which was fairly
> > full.   The problem was that after replacing (by add/delete) a small drive
> > with a larger one, there were now 2 full drives and one new half-full one,
> > and balance was not able to correct this situation to produce the desired
> > result, which is 3 drives, each with a roughly even amount of free space.
> > It can't do it because the 2 smaller drives are full, and it doesn't realize
> > it could just move one of the copies of a block off the smaller drive onto
> > the larger drive to free space on the smaller drive, it wants to move them
> > both, and there is nowhere to put them both.
> >
> > I'm about to do it again, taking my nearly full array which is 4TB, 4TB, 6TB
> > and replacing one of the 4TB with an 8TB.  I don't want to repeat the very
> > time consuming situation, so I wanted to find out if things were fixed now.
> > I am running Xenial (kernel 4.4.0) and could consider the upgrade to  bionic
> > (4.15) though that adds a lot more to my plate before a long trip and I
> > would prefer to avoid if I can.
> >
> > So what is the best strategy:
> >
> > a) Replace 4TB with 8TB, resize up and balance?  (This is the "basic"
> > strategy)
> > b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks from
> > 4TB but possibly not enough)
> > c) Replace 6TB with 8TB, resize/balance, then replace 4TB with recently
> > vacated 6TB -- much longer 

Re: RAID-1 refuses to balance large drive

2018-05-28 Thread Duncan
Brad Templeton posted on Sun, 27 May 2018 11:22:07 -0700 as excerpted:

> BTW, I decided to follow the original double replace strategy suggested 
--
> replace 6TB with 8TB and replace 4TB with 6TB.  That should be sure to
> leave the 2 large drives each with 2TB free once expanded, and thus able
> to fully use all space.
> 
> However, the first one has been going for 9 hours and is "189.7% done" 
> and still going.   Some sort of bug in calculating the completion
> status, obviously.  With luck 200% will be enough?

IIRC there was an over-100% completion-status bug fixed, I'd guess about
18 months to two years ago now.  That's long enough for it to have
slipped regulars' minds, so nobody would have thought of it even knowing
you're still on 4.4, which is one of the reasons we don't do as well
supporting stuff that old.

If it is indeed the same bug, anything even half modern should have it
fixed.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: RAID-1 refuses to balance large drive

2018-05-27 Thread Brad Templeton
BTW, I decided to follow the original double replace strategy
suggested -- replace 6TB with 8TB and replace 4TB with 6TB.  That
should be sure to leave the 2 large drives each with 2TB free once
expanded, and thus able to fully use all space.

However, the first one has been going for 9 hours and is "189.7% done"
and still going.   Some sort of bug in calculating the completion
status, obviously.  With luck 200% will be enough?

On Sat, May 26, 2018 at 7:21 PM, Brad Templeton  wrote:
> Certainly.  My apologies for not including them before.   As
> described, the disks are reasonably balanced -- not as full as the
> last time.  As such, it might be enough that balance would (slowly)
> free up enough chunks to get things going.  And if I have to, I will
> partially convert to single again.   Certainly btrfs replace seems
> like the most planned and simple path but it will result in a strange
> distribution of the chunks.
>
> Label: 'butter'  uuid: a91755d4-87d8-4acd-ae08-c11e7f1f5438
>    Total devices 3 FS bytes used 6.11TiB
>    devid 1 size 3.62TiB used 3.47TiB path /dev/sdj2
>
> Overall:
>    Device size:          12.70TiB
>    Device allocated:     12.25TiB
>    Device unallocated:  459.95GiB
>    Device missing:          0.00B
>    Used:                 12.21TiB
>    Free (estimated):    246.35GiB  (min: 246.35GiB)
>    Data ratio:               2.00
>    Metadata ratio:           2.00
>    Global reserve:      512.00MiB  (used: 1.32MiB)
>
> Data,RAID1: Size:6.11TiB, Used:6.09TiB
>    /dev/sda    3.48TiB
>    /dev/sdi2   5.28TiB
>    /dev/sdj2   3.46TiB
>
> Metadata,RAID1: Size:14.00GiB, Used:12.38GiB
>    /dev/sda    8.00GiB
>    /dev/sdi2   7.00GiB
>    /dev/sdj2  13.00GiB
>
> System,RAID1: Size:32.00MiB, Used:888.00KiB
>    /dev/sdi2  32.00MiB
>    /dev/sdj2  32.00MiB
>
> Unallocated:
>    /dev/sda  153.02GiB
>    /dev/sdi2 154.56GiB
>    /dev/sdj2 152.36GiB
>
>    devid 2 size 3.64TiB used 3.49TiB path /dev/sda
>    devid 3 size 5.43TiB used 5.28TiB path /dev/sdi2
>
>
> On Sat, May 26, 2018 at 7:16 PM, Qu Wenruo  wrote:
>>
>>
>> On 2018年05月27日 10:06, Brad Templeton wrote:
>>> Thanks.  These are all things which take substantial fractions of a
>>> day to try, unfortunately.
>>
>> Normally I would suggest just using VM and several small disks (~10G),
>> along with fallocate (the fastest way to use space) to get a basic view
>> of the procedure.
>>
>>> Last time I ended up fixing it in a
>>> fairly kluged way, which was to convert from raid-1 to single long
>>> enough to get enough single blocks that when I converted back to
>>> raid-1 they got distributed to the right drives.
>>
>> Yep, that's the ultimate one-size-fits-all solution.
>> Also, this reminds me that we could do the
>> RAID1->Single/DUP->Single downgrade in a much, much faster way.
>> I think it's worth considering as a later enhancement.
>>
>>>  But this is, aside
>>> from being a kludge, a procedure with some minor risk.  Of course I am
>>> taking a backup first, but still...
>>>
>>> This strikes me as something that should be a fairly common event --
>>> your raid is filling up, and so you expand it by replacing the oldest
>>> and smallest drive with a new much bigger one.   In the old days of
>>> RAID, you could not do that, you had to grow all drives at the same
>>> time, and this is one of the ways that BTRFS is quite superior.
>>> When I had MD raid, I went through a strange process of always having
>>> a raid 5 that consisted of different sized drives.  The raid-5 was
>>> based on the smallest of the 3 drives, and then the larger ones had
>>> extra space which could either be in raid-1, or more simply was in solo
>>> disk mode and used for less critical data (such as backups and old
>>> archives.)   Slowly, and in a messy way, each time I replaced the
>>> smallest drive, I could then grow the raid 5.  Yuck. BTRFS is so
>>> much better, except for this issue.
>>>
>>> So if somebody has a thought of a procedure that is fairly sure to
>>> work and doesn't involve too many copying passes -- copying 4tb is not
>>> a quick operation -- it is much appreciated and might be a good thing
>>> to add to a wiki page, which I would be happy to do.
>>
>> Anyway, "btrfs fi show" and "btrfs fi usage" would help before any
>> further advice from community.
>>
>> Thanks,
>> Qu
>>
>>>
>>> On Sat, May 26, 2018 at 6:56 PM, Qu Wenruo  wrote:


 On 2018年05月27日 09:49, Brad Templeton wrote:
> That is what did not work last time.
>
> I say I think there can be a "fix" because I hope the goal of BTRFS
> raid is to be superior to traditional RAID.   That if one replaces a
> drive, and asks to balance, it figures out what needs to be done to
> make that work.  I understand that the current balance algorithm may
> have 

Re: RAID-1 refuses to balance large drive

2018-05-26 Thread Duncan
Brad Templeton posted on Sat, 26 May 2018 19:21:57 -0700 as excerpted:

> Certainly.  My apologies for not including them before.

Aieee!  Reply before quote, making the reply out of context, and my
attempt to reply in context... difficult and troublesome.

Please use standard list context-quote, reply in context, next time,
making it easier for further replies also in context.

> As
> described, the disks are reasonably balanced -- not as full as the
> last time.  As such, it might be enough that balance would (slowly)
> free up enough chunks to get things going.  And if I have to, I will
> partially convert to single again.   Certainly btrfs replace seems
> like the most planned and simple path but it will result in a strange
> distribution of the chunks.

[btrfs filesystem usage output below]

> Label: 'butter'  uuid: a91755d4-87d8-4acd-ae08-c11e7f1f5438
>    Total devices 3 FS bytes used 6.11TiB
>    devid 1 size 3.62TiB used 3.47TiB path /dev/sdj2
>
> Overall:
>    Device size:          12.70TiB
>    Device allocated:     12.25TiB
>    Device unallocated:  459.95GiB
>    Device missing:          0.00B
>    Used:                 12.21TiB
>    Free (estimated):    246.35GiB  (min: 246.35GiB)
>    Data ratio:               2.00
>    Metadata ratio:           2.00
>    Global reserve:      512.00MiB  (used: 1.32MiB)
>
> Data,RAID1: Size:6.11TiB, Used:6.09TiB
>    /dev/sda    3.48TiB
>    /dev/sdi2   5.28TiB
>    /dev/sdj2   3.46TiB
>
> Metadata,RAID1: Size:14.00GiB, Used:12.38GiB
>    /dev/sda    8.00GiB
>    /dev/sdi2   7.00GiB
>    /dev/sdj2  13.00GiB
>
> System,RAID1: Size:32.00MiB, Used:888.00KiB
>    /dev/sdi2  32.00MiB
>    /dev/sdj2  32.00MiB
>
> Unallocated:
>    /dev/sda  153.02GiB
>    /dev/sdi2 154.56GiB
>    /dev/sdj2 152.36GiB

[Presumably this is a bit of btrfs filesystem show output, but the
rest of it is missing...]

>    devid 2 size 3.64TiB used 3.49TiB path /dev/sda
>    devid 3 size 5.43TiB used 5.28TiB path /dev/sdi2


Based on the 100+ GiB still free on each of the three devices above,
you should have no issues balancing after replacing one of them.

Presumably the first time you tried it, there was far less, likely under
a GiB free on the two not replaced.  Since data chunks are nominally
1 GiB each and raid1 requires two copies, each on a different device,
that didn't leave enough space on either of the older devices to do
a balance, even tho there was plenty of space left on the just-replaced
new one.

(Tho multiple-GiB chunks are possible on TB+ devices, 10 GiB free
on each device should be plenty, so with 100+ GiB free on each there
should be no issues unless you run into some strange bug.)


Meanwhile, even in the case of not enough space free on all three
existing devices, given that they're currently two 4 TB devices and
a 6 TB device and that you're replacing one of the 4 TB devices with
an 8 TB device...

Doing a two-step replace should do the trick very well: first replace
the 6 TB device with the new 8 TB device and resize to the new 8 TB
size, giving you ~2 TB of free space on it; then replace one of the 4 TB
devices with the now-free 6 TB device and again resize to the new 6 TB
size, giving you ~2 TB free on it too.  That leaves ~2 TB free on each
of two devices instead of all 4 TB of new space on a single device, and
it should still be faster, probably MUCH faster, than doing a temporary
convert to single and back to raid1, the kludge you used last time. =:^)
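
Sketched as commands, with placeholder device names, devids and mount
point standing in for the real ones:

    # step 1: swap the 6 TB drive for the new 8 TB drive, then grow it
    btrfs replace start /dev/sd6tb /dev/sd8tb /mnt
    btrfs replace status /mnt            # wait for "finished"
    btrfs filesystem resize 3:max /mnt   # devid of the 8 TB drive

    # step 2: swap one 4 TB drive for the freed 6 TB drive, then grow it;
    # -f is needed because the 6 TB drive still carries a btrfs signature
    btrfs replace start -f /dev/sd4tb /dev/sd6tb /mnt
    btrfs replace status /mnt
    btrfs filesystem resize 1:max /mnt   # devid of the 6 TB drive

    # each of the two larger drives should now show roughly 2 TB unallocated
    btrfs filesystem usage /mnt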


Meanwhile, while kernel version of course remains up to you, given that
you mentioned 4.4 with a potential upgrade to 4.15, I will at least
cover the following, so you'll have it to use as you decide on kernel
versions.

4.15?  Why?  4.14 is the current mainline LTS kernel series, with 4.15
only being a normal short-term stable series that has already been
EOLed.  So 4.15 now makes little sense at all.  Either go current-stable
series and do 4.16 and continue to upgrade as the new kernels come (4.17
should be out shortly as it's past rc6, with rc7 likely out by the time
you read this and release likely in a week), or stick with 4.14 LTS for
the longer-term support.

Of course you can go with your distro kernel if you like, and I presume
that's why you mentioned 4.15, but as I said it's already EOLed upstream,
and of course this list being a kernel development list, our focus tends
to be on upstream/mainstream, not distro level kernels.  If you choose
a distro level kernel series that's EOLed at kernel.org, then you really
should be getting support from them for it, as they know what they've
backported and/or patched and are thus best positioned to support it.

As for what this list does try to support, it's the last two kernel
release series in each of the current and LTS tracks.  So as the first
release back from current 4.16, 4.15, tho EOLed upstream, is still

Re: RAID-1 refuses to balance large drive

2018-05-26 Thread Brad Templeton
Certainly.  My apologies for not including them before.   As
described, the disks are reasonably balanced -- not as full as the
last time.  As such, it might be enough that balance would (slowly)
free up enough chunks to get things going.  And if I have to, I will
partially convert to single again.   Certainly btrfs replace seems
like the most planned and simple path but it will result in a strange
distribution of the chunks.

Label: 'butter'  uuid: a91755d4-87d8-4acd-ae08-c11e7f1f5438
   Total devices 3 FS bytes used 6.11TiB
   devid 1 size 3.62TiB used 3.47TiB path /dev/sdj2

Overall:
   Device size:          12.70TiB
   Device allocated:     12.25TiB
   Device unallocated:  459.95GiB
   Device missing:          0.00B
   Used:                 12.21TiB
   Free (estimated):    246.35GiB  (min: 246.35GiB)
   Data ratio:               2.00
   Metadata ratio:           2.00
   Global reserve:      512.00MiB  (used: 1.32MiB)

Data,RAID1: Size:6.11TiB, Used:6.09TiB
   /dev/sda    3.48TiB
   /dev/sdi2   5.28TiB
   /dev/sdj2   3.46TiB

Metadata,RAID1: Size:14.00GiB, Used:12.38GiB
   /dev/sda    8.00GiB
   /dev/sdi2   7.00GiB
   /dev/sdj2  13.00GiB

System,RAID1: Size:32.00MiB, Used:888.00KiB
   /dev/sdi2  32.00MiB
   /dev/sdj2  32.00MiB

Unallocated:
   /dev/sda  153.02GiB
   /dev/sdi2 154.56GiB
   /dev/sdj2 152.36GiB

   devid 2 size 3.64TiB used 3.49TiB path /dev/sda
   devid 3 size 5.43TiB used 5.28TiB path /dev/sdi2


On Sat, May 26, 2018 at 7:16 PM, Qu Wenruo  wrote:
>
>
> On 2018年05月27日 10:06, Brad Templeton wrote:
>> Thanks.  These are all things which take substantial fractions of a
>> day to try, unfortunately.
>
> Normally I would suggest just using VM and several small disks (~10G),
> along with fallocate (the fastest way to use space) to get a basic view
> of the procedure.
>
>> Last time I ended up fixing it in a
>> fairly kluged way, which was to convert from raid-1 to single long
>> enough to get enough single blocks that when I converted back to
>> raid-1 they got distributed to the right drives.
>
> Yep, that's the ultimate one-size-fits-all solution.
> Also, this reminds me that we could do the
> RAID1->Single/DUP->Single downgrade in a much, much faster way.
> I think it's worth considering as a later enhancement.
>
>>  But this is, aside
>> from being a kludge, a procedure with some minor risk.  Of course I am
>> taking a backup first, but still...
>>
>> This strikes me as something that should be a fairly common event --
>> your raid is filling up, and so you expand it by replacing the oldest
>> and smallest drive with a new much bigger one.   In the old days of
>> RAID, you could not do that, you had to grow all drives at the same
>> time, and this is one of the ways that BTRFS is quite superior.
>> When I had MD raid, I went through a strange process of always having
>> a raid 5 that consisted of different sized drives.  The raid-5 was
>> based on the smallest of the 3 drives, and then the larger ones had
>> extra space which could either be in raid-1, or more simply was in solo
>> disk mode and used for less critical data (such as backups and old
>> archives.)   Slowly, and in a messy way, each time I replaced the
>> smallest drive, I could then grow the raid 5.  Yuck. BTRFS is so
>> much better, except for this issue.
>>
>> So if somebody has a thought of a procedure that is fairly sure to
>> work and doesn't involve too many copying passes -- copying 4tb is not
>> a quick operation -- it is much appreciated and might be a good thing
>> to add to a wiki page, which I would be happy to do.
>
> Anyway, "btrfs fi show" and "btrfs fi usage" would help before any
> further advice from community.
>
> Thanks,
> Qu
>
>>
>> On Sat, May 26, 2018 at 6:56 PM, Qu Wenruo  wrote:
>>>
>>>
>>> On 2018年05月27日 09:49, Brad Templeton wrote:
 That is what did not work last time.

 I say I think there can be a "fix" because I hope the goal of BTRFS
 raid is to be superior to traditional RAID.   That if one replaces a
 drive, and asks to balance, it figures out what needs to be done to
 make that work.  I understand that the current balance algorithm may
 have trouble with that.   In this situation, the ideal result would be
 the system would take the 3 drives (4TB and 6TB full, 8TB with 4TB
 free) and move extents strictly from the 4TB and 6TB to the 8TB -- ie
 extents which are currently on both the 4TB and 6TB -- by moving only
 one copy.
>>>
>>> Btrfs can only do balance in a chunk unit.
>>> Thus btrfs can only do:
>>> 1) Create new chunk
>>> 2) Copy data
>>> 3) Remove old chunk.
>>>
>>> So it can't do the way you mentioned.
>> But your purpose sounds pretty valid, and maybe we could enhance btrfs
>> to do such a thing.
>>> (Currently only replace can behave like that)
>>>
 

Re: RAID-1 refuses to balance large drive

2018-05-26 Thread Qu Wenruo


On 2018年05月27日 10:06, Brad Templeton wrote:
> Thanks.  These are all things which take substantial fractions of a
> day to try, unfortunately.

Normally I would suggest just using a VM and several small disks (~10G),
along with fallocate (the fastest way to use space) to get a basic view
of the procedure.
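
A throwaway test environment of that sort can be sketched with loop
devices over sparse files (all sizes and paths below are arbitrary):

    # three ~10G backing files and loop devices for a disposable raid1 array
    truncate -s 10G /tmp/d1.img /tmp/d2.img /tmp/d3.img
    l1=$(losetup -f --show /tmp/d1.img)
    l2=$(losetup -f --show /tmp/d2.img)
    l3=$(losetup -f --show /tmp/d3.img)

    mkfs.btrfs -d raid1 -m raid1 "$l1" "$l2" "$l3"
    mkdir -p /mnt/test && mount "$l1" /mnt/test

    # fallocate fills space almost instantly, which is the point of the tip
    for i in 1 2 3 4 5 6; do fallocate -l 1G "/mnt/test/fill$i"; done

    # ...then rehearse the replace/resize/balance procedure on the loop devices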

> Last time I ended up fixing it in a
> fairly kluged way, which was to convert from raid-1 to single long
> enough to get enough single blocks that when I converted back to
> raid-1 they got distributed to the right drives.

Yep, that's the ultimate one-size-fits-all solution.
Also, this reminds me that we could do the
RAID1->Single/DUP->Single downgrade in a much, much faster way.
I think it's worth considering as a later enhancement.

>  But this is, aside
> from being a kludge, a procedure with some minor risk.  Of course I am
> taking a backup first, but still...
> 
> This strikes me as something that should be a fairly common event --
> your raid is filling up, and so you expand it by replacing the oldest
> and smallest drive with a new much bigger one.   In the old days of
> RAID, you could not do that, you had to grow all drives at the same
> time, and this is one of the ways that BTRFS is quite superior.
> When I had MD raid, I went through a strange process of always having
> a raid 5 that consisted of different sized drives.  The raid-5 was
> based on the smallest of the 3 drives, and then the larger ones had
> extra space which could either be in raid-1, or more simply was in solo
> disk mode and used for less critical data (such as backups and old
> archives.)   Slowly, and in a messy way, each time I replaced the
> smallest drive, I could then grow the raid 5.  Yuck. BTRFS is so
> much better, except for this issue.
> 
> So if somebody has a thought of a procedure that is fairly sure to
> work and doesn't involve too many copying passes -- copying 4tb is not
> a quick operation -- it is much appreciated and might be a good thing
> to add to a wiki page, which I would be happy to do.

Anyway, "btrfs fi show" and "btrfs fi usage" would help before any
further advice from community.

Thanks,
Qu

> 
> On Sat, May 26, 2018 at 6:56 PM, Qu Wenruo  wrote:
>>
>>
>> On 2018年05月27日 09:49, Brad Templeton wrote:
>>> That is what did not work last time.
>>>
>>> I say I think there can be a "fix" because I hope the goal of BTRFS
>>> raid is to be superior to traditional RAID.   That if one replaces a
>>> drive, and asks to balance, it figures out what needs to be done to
>>> make that work.  I understand that the current balance algorithm may
>>> have trouble with that.   In this situation, the ideal result would be
>>> the system would take the 3 drives (4TB and 6TB full, 8TB with 4TB
>>> free) and move extents strictly from the 4TB and 6TB to the 8TB -- ie
>>> extents which are currently on both the 4TB and 6TB -- by moving only
>>> one copy.
>>
>> Btrfs can only do balance in a chunk unit.
>> Thus btrfs can only do:
>> 1) Create new chunk
>> 2) Copy data
>> 3) Remove old chunk.
>>
>> So it can't do the way you mentioned.
>> But your purpose sounds pretty valid, and maybe we could enhance btrfs
>> to do such a thing.
>> (Currently only replace can behave like that)
>>
>>> It is not strictly a "bug" in that the code is operating
>>> as designed, but it is an undesired function.
>>>
>>> The problem is the approach you describe did not work in the prior upgrade.
>>
>> Would you please try 4/4/6 + 4 or 4/4/6 + 2 and then balance?
>> And before/after balance, "btrfs fi usage" and "btrfs fi show" output
>> could also help.
>>
>> Thanks,
>> Qu
>>
>>>
>>> On Sat, May 26, 2018 at 6:41 PM, Qu Wenruo  wrote:


 On 2018年05月27日 09:27, Brad Templeton wrote:
> A few years ago, I encountered an issue (halfway between a bug and a
> problem) with attempting to grow a BTRFS 3 disk Raid 1 which was
> fairly full.   The problem was that after replacing (by add/delete) a
> small drive with a larger one, there were now 2 full drives and one
> new half-full one, and balance was not able to correct this situation
> to produce the desired result, which is 3 drives, each with a roughly
> even amount of free space.  It can't do it because the 2 smaller
> drives are full, and it doesn't realize it could just move one of the
> copies of a block off the smaller drive onto the larger drive to free
> space on the smaller drive, it wants to move them both, and there is
> nowhere to put them both.

 It's not that easy.
 For balance, btrfs must first find enough space to place both
 copies, then copy the data.
 Otherwise, if a power loss happens, it would cause data corruption.

 So in your case, btrfs can only find enough space for one copy, thus
 unable to relocate any chunk.

>
> I'm about to do it again, taking my nearly full array which is 4TB,
> 4TB, 6TB and replacing one of the 4TB with 

Re: RAID-1 refuses to balance large drive

2018-05-26 Thread Brad Templeton
Thanks.  These are all things which take substantial fractions of a
day to try, unfortunately.  Last time I ended up fixing it in a
fairly kluged way, which was to convert from raid-1 to single long
enough to get enough single blocks that when I converted back to
raid-1 they got distributed to the right drives.  But this is, aside
from being a kludge, a procedure with some minor risk.  Of course I am
taking a backup first, but still...

This strikes me as something that should be a fairly common event --
your raid is filling up, and so you expand it by replacing the oldest
and smallest drive with a new much bigger one.   In the old days of
RAID, you could not do that, you had to grow all drives at the same
time, and this is one of the ways that BTRFS is quite superior.
When I had MD raid, I went through a strange process of always having
a raid 5 that consisted of different sized drives.  The raid-5 was
based on the smallest of the 3 drives, and then the larger ones had
extra space which could either be in raid-1, or more simply was in solo
disk mode and used for less critical data (such as backups and old
archives.)   Slowly, and in a messy way, each time I replaced the
smallest drive, I could then grow the raid 5.  Yuck. BTRFS is so
much better, except for this issue.

So if somebody has a thought of a procedure that is fairly sure to
work and doesn't involve too many copying passes -- copying 4tb is not
a quick operation -- it is much appreciated and might be a good thing
to add to a wiki page, which I would be happy to do.

On Sat, May 26, 2018 at 6:56 PM, Qu Wenruo  wrote:
>
>
> On 2018年05月27日 09:49, Brad Templeton wrote:
>> That is what did not work last time.
>>
>> I say I think there can be a "fix" because I hope the goal of BTRFS
>> raid is to be superior to traditional RAID.   That if one replaces a
>> drive, and asks to balance, it figures out what needs to be done to
>> make that work.  I understand that the current balance algorithm may
>> have trouble with that.   In this situation, the ideal result would be
>> the system would take the 3 drives (4TB and 6TB full, 8TB with 4TB
>> free) and move extents strictly from the 4TB and 6TB to the 8TB -- ie
>> extents which are currently on both the 4TB and 6TB -- by moving only
>> one copy.
>
> Btrfs can only do balance in a chunk unit.
> Thus btrfs can only do:
> 1) Create new chunk
> 2) Copy data
> 3) Remove old chunk.
>
> So it can't do the way you mentioned.
> But your purpose sounds pretty valid, and maybe we could enhance btrfs
> to do such a thing.
> (Currently only replace can behave like that)
>
>> It is not strictly a "bug" in that the code is operating
>> as designed, but it is an undesired function.
>>
>> The problem is the approach you describe did not work in the prior upgrade.
>
> Would you please try 4/4/6 + 4 or 4/4/6 + 2 and then balance?
> And before/after balance, "btrfs fi usage" and "btrfs fi show" output
> could also help.
>
> Thanks,
> Qu
>
>>
>> On Sat, May 26, 2018 at 6:41 PM, Qu Wenruo  wrote:
>>>
>>>
>>> On 2018年05月27日 09:27, Brad Templeton wrote:
 A few years ago, I encountered an issue (halfway between a bug and a
 problem) with attempting to grow a BTRFS 3 disk Raid 1 which was
 fairly full.   The problem was that after replacing (by add/delete) a
 small drive with a larger one, there were now 2 full drives and one
 new half-full one, and balance was not able to correct this situation
 to produce the desired result, which is 3 drives, each with a roughly
 even amount of free space.  It can't do it because the 2 smaller
 drives are full, and it doesn't realize it could just move one of the
 copies of a block off the smaller drive onto the larger drive to free
 space on the smaller drive, it wants to move them both, and there is
 nowhere to put them both.
>>>
>>> It's not that easy.
>>> For balance, btrfs must first find enough space to place both
>>> copies, then copy the data.
>>> Otherwise, if a power loss happens, it would cause data corruption.
>>>
>>> So in your case, btrfs can only find enough space for one copy, thus
>>> unable to relocate any chunk.
>>>

 I'm about to do it again, taking my nearly full array which is 4TB,
 4TB, 6TB and replacing one of the 4TB with an 8TB.  I don't want to
 repeat the very time consuming situation, so I wanted to find out if
 things were fixed now.   I am running Xenial (kernel 4.4.0) and could
 consider the upgrade to  bionic (4.15) though that adds a lot more to
 my plate before a long trip and I would prefer to avoid if I can.
>>>
>>> Since there is nothing to fix, the behavior will not change at all.
>>>

 So what is the best strategy:

 a) Replace 4TB with 8TB, resize up and balance?  (This is the "basic" 
 strategy)
 b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks
 from 4TB but possibly not enough)
 c) 

Re: RAID-1 refuses to balance large drive

2018-05-26 Thread Qu Wenruo


On 2018年05月27日 09:49, Brad Templeton wrote:
> That is what did not work last time.
> 
> I say I think there can be a "fix" because I hope the goal of BTRFS
> raid is to be superior to traditional RAID.   That if one replaces a
> drive, and asks to balance, it figures out what needs to be done to
> make that work.  I understand that the current balance algorithm may
> have trouble with that.   In this situation, the ideal result would be
> the system would take the 3 drives (4TB and 6TB full, 8TB with 4TB
> free) and move extents strictly from the 4TB and 6TB to the 8TB -- ie
> extents which are currently on both the 4TB and 6TB -- by moving only
> one copy.

Btrfs can only do balance in a chunk unit.
Thus btrfs can only do:
1) Create new chunk
2) Copy data
3) Remove old chunk.

So it can't do the way you mentioned.
But your purpose sounds pretty valid, and maybe we could enhance btrfs
to do such a thing.
(Currently only replace can behave like that)

> It is not strictly a "bug" in that the code is operating
> as designed, but it is an undesired function.
> 
> The problem is the approach you describe did not work in the prior upgrade.

Would you please try 4/4/6 + 4 or 4/4/6 + 2 and then balance?
And before/after balance, "btrfs fi usage" and "btrfs fi show" output
could also help.

Thanks,
Qu

> 
> On Sat, May 26, 2018 at 6:41 PM, Qu Wenruo  wrote:
>>
>>
>> On 2018年05月27日 09:27, Brad Templeton wrote:
>>> A few years ago, I encountered an issue (halfway between a bug and a
>>> problem) with attempting to grow a BTRFS 3 disk Raid 1 which was
>>> fairly full.   The problem was that after replacing (by add/delete) a
>>> small drive with a larger one, there were now 2 full drives and one
>>> new half-full one, and balance was not able to correct this situation
>>> to produce the desired result, which is 3 drives, each with a roughly
>>> even amount of free space.  It can't do it because the 2 smaller
>>> drives are full, and it doesn't realize it could just move one of the
>>> copies of a block off the smaller drive onto the larger drive to free
>>> space on the smaller drive, it wants to move them both, and there is
>>> nowhere to put them both.
>>
>> It's not that easy.
>> For balance, btrfs must first find enough space to place both
>> copies, then copy the data.
>> Otherwise, if a power loss happens, it would cause data corruption.
>>
>> So in your case, btrfs can only find enough space for one copy, thus
>> unable to relocate any chunk.
>>
>>>
>>> I'm about to do it again, taking my nearly full array which is 4TB,
>>> 4TB, 6TB and replacing one of the 4TB with an 8TB.  I don't want to
>>> repeat the very time consuming situation, so I wanted to find out if
>>> things were fixed now.   I am running Xenial (kernel 4.4.0) and could
>>> consider the upgrade to  bionic (4.15) though that adds a lot more to
>>> my plate before a long trip and I would prefer to avoid if I can.
>>
>> Since there is nothing to fix, the behavior will not change at all.
>>
>>>
>>> So what is the best strategy:
>>>
>>> a) Replace 4TB with 8TB, resize up and balance?  (This is the "basic" 
>>> strategy)
>>> b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks
>>> from 4TB but possibly not enough)
>>> c) Replace 6TB with 8TB, resize/balance, then replace 4TB with
>>> recently vacated 6TB -- much longer procedure but possibly better
>>>
>>> Or has this all been fixed and method A will work fine and get to the
>>> ideal goal -- 3 drives, with available space suitably distributed to
>>> allow full utilization over time?
>>
>> The btrfs chunk allocator has been trying to utilize all drives for a
>> long, long time.
>> When allocating chunks, btrfs will choose the device with the most free
>> space.  However, the nature of RAID1 requires btrfs to allocate extents from
>> 2 different devices, which makes your replaced 4/4/6 a little complex.
>> (If your 4/4/6 array is set up and then filled to current stage, btrfs
>> should be able to utilize all the space)
>>
>>
>> Personally speaking, if you're confident enough, just add a new device,
>> and then do balance.
>> If enough chunks get balanced, there should be enough space freed on
>> existing disks.
>> Then remove the newly added device, then btrfs should handle the
>> remaining space well.
>>
>> Thanks,
>> Qu
>>
>>>
>>> On Sat, May 26, 2018 at 6:24 PM, Brad Templeton  wrote:
 A few years ago, I encountered an issue (halfway between a bug and a
 problem) with attempting to grow a BTRFS 3 disk Raid 1 which was fairly
 full.   The problem was that after replacing (by add/delete) a small drive
 with a larger one, there were now 2 full drives and one new half-full one,
 and balance was not able to correct this situation to produce the desired
 result, which is 3 drives, each with a roughly even amount of free space.
 It can't do it because the 2 smaller drives are full, and it doesn't 
 realize
 it could just move 

Re: RAID-1 refuses to balance large drive

2018-05-26 Thread Brad Templeton
That is what did not work last time.

I say I think there can be a "fix" because I hope the goal of BTRFS
raid is to be superior to traditional RAID: that if one replaces a
drive and asks to balance, it figures out what needs to be done to
make that work.  I understand that the current balance algorithm may
have trouble with that.   In this situation, the ideal result would be
the system would take the 3 drives (4TB and 6TB full, 8TB with 4TB
free) and move extents strictly from the 4TB and 6TB to the 8TB -- ie
extents which are currently on both the 4TB and 6TB -- by moving only
one copy.   It is not strictly a "bug" in that the code is operating
as designed, but it is an undesired function.

The problem is the approach you describe did not work in the prior upgrade.

On Sat, May 26, 2018 at 6:41 PM, Qu Wenruo  wrote:
>
>
> On 2018年05月27日 09:27, Brad Templeton wrote:
>> A few years ago, I encountered an issue (halfway between a bug and a
>> problem) with attempting to grow a BTRFS 3 disk Raid 1 which was
>> fairly full.   The problem was that after replacing (by add/delete) a
>> small drive with a larger one, there were now 2 full drives and one
>> new half-full one, and balance was not able to correct this situation
>> to produce the desired result, which is 3 drives, each with a roughly
>> even amount of free space.  It can't do it because the 2 smaller
>> drives are full, and it doesn't realize it could just move one of the
>> copies of a block off the smaller drive onto the larger drive to free
>> space on the smaller drive, it wants to move them both, and there is
>> nowhere to put them both.
>
> It's not that easy.
> For balance, btrfs must first find enough space to place both
> copies, then copy the data.
> Otherwise, if a power loss happens, it would cause data corruption.
>
> So in your case, btrfs can only find enough space for one copy, thus
> unable to relocate any chunk.
>
>>
>> I'm about to do it again, taking my nearly full array which is 4TB,
>> 4TB, 6TB and replacing one of the 4TB with an 8TB.  I don't want to
>> repeat the very time consuming situation, so I wanted to find out if
>> things were fixed now.   I am running Xenial (kernel 4.4.0) and could
>> consider the upgrade to  bionic (4.15) though that adds a lot more to
>> my plate before a long trip and I would prefer to avoid if I can.
>
> Since there is nothing to fix, the behavior will not change at all.
>
>>
>> So what is the best strategy:
>>
>> a) Replace 4TB with 8TB, resize up and balance?  (This is the "basic" 
>> strategy)
>> b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks
>> from 4TB but possibly not enough)
>> c) Replace 6TB with 8TB, resize/balance, then replace 4TB with
>> recently vacated 6TB -- much longer procedure but possibly better
>>
>> Or has this all been fixed and method A will work fine and get to the
>> ideal goal -- 3 drives, with available space suitably distributed to
>> allow full utilization over time?
>
> The btrfs chunk allocator has been trying to utilize all drives for a
> long, long time.
> When allocating chunks, btrfs will choose the device with the most free
> space.  However, the nature of RAID1 requires btrfs to allocate extents from
> 2 different devices, which makes your replaced 4/4/6 a little complex.
> (If your 4/4/6 array is set up and then filled to current stage, btrfs
> should be able to utilize all the space)
>
>
> Personally speaking, if you're confident enough, just add a new device,
> and then do balance.
> If enough chunks get balanced, there should be enough space freed on
> existing disks.
> Then remove the newly added device, then btrfs should handle the
> remaining space well.
>
> Thanks,
> Qu
>
>>
>> On Sat, May 26, 2018 at 6:24 PM, Brad Templeton  wrote:
>>> A few years ago, I encountered an issue (halfway between a bug and a
>>> problem) with attempting to grow a BTRFS 3 disk Raid 1 which was fairly
>>> full.   The problem was that after replacing (by add/delete) a small drive
>>> with a larger one, there were now 2 full drives and one new half-full one,
>>> and balance was not able to correct this situation to produce the desired
>>> result, which is 3 drives, each with a roughly even amount of free space.
>>> It can't do it because the 2 smaller drives are full, and it doesn't realize
>>> it could just move one of the copies of a block off the smaller drive onto
>>> the larger drive to free space on the smaller drive, it wants to move them
>>> both, and there is nowhere to put them both.
>>>
>>> I'm about to do it again, taking my nearly full array which is 4TB, 4TB, 6TB
>>> and replacing one of the 4TB with an 8TB.  I don't want to repeat the very
>>> time consuming situation, so I wanted to find out if things were fixed now.
>>> I am running Xenial (kernel 4.4.0) and could consider the upgrade to  bionic
>>> (4.15) though that adds a lot more to my plate before a long trip and I
>>> would prefer to avoid if I can.

Re: RAID-1 refuses to balance large drive

2018-05-26 Thread Qu Wenruo


On 2018年05月27日 09:27, Brad Templeton wrote:
> A few years ago, I encountered an issue (halfway between a bug and a
> problem) with attempting to grow a BTRFS 3 disk Raid 1 which was
> fairly full.   The problem was that after replacing (by add/delete) a
> small drive with a larger one, there were now 2 full drives and one
> new half-full one, and balance was not able to correct this situation
> to produce the desired result, which is 3 drives, each with a roughly
> even amount of free space.  It can't do it because the 2 smaller
> drives are full, and it doesn't realize it could just move one of the
> copies of a block off the smaller drive onto the larger drive to free
> space on the smaller drive, it wants to move them both, and there is
> nowhere to put them both.

It's not that easy.
For balance, btrfs must first find enough space to place both
copies, then copy the data.
Otherwise, if a power loss happens, it would cause data corruption.

So in your case, btrfs can only find enough space for one copy, thus
unable to relocate any chunk.

> 
> I'm about to do it again, taking my nearly full array which is 4TB,
> 4TB, 6TB and replacing one of the 4TB with an 8TB.  I don't want to
> repeat the very time consuming situation, so I wanted to find out if
> things were fixed now.   I am running Xenial (kernel 4.4.0) and could
> consider the upgrade to  bionic (4.15) though that adds a lot more to
> my plate before a long trip and I would prefer to avoid if I can.

Since there is nothing to fix, the behavior will not change at all.

> 
> So what is the best strategy:
> 
> a) Replace 4TB with 8TB, resize up and balance?  (This is the "basic" 
> strategy)
> b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks
> from 4TB but possibly not enough)
> c) Replace 6TB with 8TB, resize/balance, then replace 4TB with
> recently vacated 6TB -- much longer procedure but possibly better
> 
> Or has this all been fixed and method A will work fine and get to the
> ideal goal -- 3 drives, with available space suitably distributed to
> allow full utilization over time?

The btrfs chunk allocator has been trying to utilize all drives for a
long, long time.
When allocating chunks, btrfs will choose the device with the most free
space.  However, the nature of RAID1 requires btrfs to allocate extents from
2 different devices, which makes your replaced 4/4/6 a little complex.
(If your 4/4/6 array is set up and then filled to current stage, btrfs
should be able to utilize all the space)


Personally speaking, if you're confident enough, just add a new device,
and then do balance.
If enough chunks get balanced, there should be enough space freed on
existing disks.
Then remove the newly added device, then btrfs should handle the
remaining space well.
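
As a command-level sketch of that add/balance/remove sequence (the spare
device name and mount point are placeholders):

    # temporarily add a spare device to gain unallocated space everywhere
    btrfs device add /dev/spare /mnt

    # balance so chunks spread out and space is freed on the full drives
    btrfs balance start /mnt

    # remove the spare again; its chunks are migrated back onto the
    # remaining drives as part of the delete
    btrfs device delete /dev/spare /mnt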

Thanks,
Qu

> 
> On Sat, May 26, 2018 at 6:24 PM, Brad Templeton  wrote:
>> A few years ago, I encountered an issue (halfway between a bug and a
>> problem) with attempting to grow a BTRFS 3 disk Raid 1 which was fairly
>> full.   The problem was that after replacing (by add/delete) a small drive
>> with a larger one, there were now 2 full drives and one new half-full one,
>> and balance was not able to correct this situation to produce the desired
>> result, which is 3 drives, each with a roughly even amount of free space.
>> It can't do it because the 2 smaller drives are full, and it doesn't realize
>> it could just move one of the copies of a block off the smaller drive onto
>> the larger drive to free space on the smaller drive, it wants to move them
>> both, and there is nowhere to put them both.
>>
>> I'm about to do it again, taking my nearly full array which is 4TB, 4TB, 6TB
>> and replacing one of the 4TB with an 8TB.  I don't want to repeat the very
>> time consuming situation, so I wanted to find out if things were fixed now.
>> I am running Xenial (kernel 4.4.0) and could consider the upgrade to  bionic
>> (4.15) though that adds a lot more to my plate before a long trip and I
>> would prefer to avoid if I can.
>>
>> So what is the best strategy:
>>
>> a) Replace 4TB with 8TB, resize up and balance?  (This is the "basic"
>> strategy)
>> b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks from
>> 4TB but possibly not enough)
>> c) Replace 6TB with 8TB, resize/balance, then replace 4TB with recently
>> vacated 6TB -- much longer procedure but possibly better
>>
>> Or has this all been fixed and method A will work fine and get to the ideal
>> goal -- 3 drives, with available space suitably distributed to allow full
>> utilization over time?
>>
>> On Fri, Mar 25, 2016 at 7:35 AM, Henk Slager  wrote:
>>>
>>> On Fri, Mar 25, 2016 at 2:16 PM, Patrik Lundquist
>>>  wrote:
 On 23 March 2016 at 20:33, Chris Murphy  wrote:
>
> On Wed, Mar 23, 2016 at 1:10 PM, Brad Templeton 
> wrote:
>>
>> I am surprised to hear it said that having the 

Re: RAID-1 refuses to balance large drive

2018-05-26 Thread Brad Templeton
A few years ago, I encountered an issue (halfway between a bug and a
problem) with attempting to grow a BTRFS 3 disk Raid 1 which was
fairly full.   The problem was that after replacing (by add/delete) a
small drive with a larger one, there were now 2 full drives and one
new half-full one, and balance was not able to correct this situation
to produce the desired result, which is 3 drives, each with a roughly
even amount of free space.  It can't do it because the 2 smaller
drives are full, and it doesn't realize it could just move one of the
copies of a block off the smaller drive onto the larger drive to free
space on the smaller drive, it wants to move them both, and there is
nowhere to put them both.

I'm about to do it again, taking my nearly full array which is 4TB,
4TB, 6TB and replacing one of the 4TB with an 8TB.  I don't want to
repeat the very time consuming situation, so I wanted to find out if
things were fixed now.   I am running Xenial (kernel 4.4.0) and could
consider the upgrade to  bionic (4.15) though that adds a lot more to
my plate before a long trip and I would prefer to avoid if I can.

So what is the best strategy:

a) Replace 4TB with 8TB, resize up and balance?  (This is the "basic" strategy)
b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks
from 4TB but possibly not enough)
c) Replace 6TB with 8TB, resize/balance, then replace 4TB with
recently vacated 6TB -- much longer procedure but possibly better

Or has this all been fixed and method A will work fine and get to the
ideal goal -- 3 drives, with available space suitably distributed to
allow full utilization over time?

On Sat, May 26, 2018 at 6:24 PM, Brad Templeton  wrote:
> A few years ago, I encountered an issue (halfway between a bug and a
> problem) with attempting to grow a BTRFS 3 disk Raid 1 which was fairly
> full.   The problem was that after replacing (by add/delete) a small drive
> with a larger one, there were now 2 full drives and one new half-full one,
> and balance was not able to correct this situation to produce the desired
> result, which is 3 drives, each with a roughly even amount of free space.
> It can't do it because the 2 smaller drives are full, and it doesn't realize
> it could just move one of the copies of a block off the smaller drive onto
> the larger drive to free space on the smaller drive, it wants to move them
> both, and there is nowhere to put them both.
>
> I'm about to do it again, taking my nearly full array which is 4TB, 4TB, 6TB
> and replacing one of the 4TB with an 8TB.  I don't want to repeat the very
> time consuming situation, so I wanted to find out if things were fixed now.
> I am running Xenial (kernel 4.4.0) and could consider the upgrade to  bionic
> (4.15) though that adds a lot more to my plate before a long trip and I
> would prefer to avoid if I can.
>
> So what is the best strategy:
>
> a) Replace 4TB with 8TB, resize up and balance?  (This is the "basic"
> strategy)
> b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks from
> 4TB but possibly not enough)
> c) Replace 6TB with 8TB, resize/balance, then replace 4TB with recently
> vacated 6TB -- much longer procedure but possibly better
>
> Or has this all been fixed and method A will work fine and get to the ideal
> goal -- 3 drives, with available space suitably distributed to allow full
> utilization over time?
>
> On Fri, Mar 25, 2016 at 7:35 AM, Henk Slager  wrote:
>>
>> On Fri, Mar 25, 2016 at 2:16 PM, Patrik Lundquist
>>  wrote:
>> > On 23 March 2016 at 20:33, Chris Murphy  wrote:
>> >>
>> >> On Wed, Mar 23, 2016 at 1:10 PM, Brad Templeton 
>> >> wrote:
>> >> >
>> >> > I am surprised to hear it said that having the mixed sizes is an odd
>> >> > case.
>> >>
>> >> Not odd as in wrong, just uncommon compared to other arrangements being
>> >> tested.
>> >
>> > I think mixed drive sizes in raid1 is a killer feature for a home NAS,
>> > where you replace an old smaller drive with the latest and largest
>> > when you need more storage.
>> >
>> > My raid1 currently consists of 6TB+3TB+3*2TB.
>>
>> For the original OP situation, with chunks all filled up with extents
>> and devices all filled up with chunks, 'integrating' a new 6TB drive
>> in a 4TB+3TB+2TB raid1 array could probably be done in a bit unusual
>> way in order to avoid immediate balancing needs:
>> - 'plug-in' the 6TB
>> - btrfs-replace  4TB by 6TB
>> - btrfs fi resize max 6TB_devID
>> - btrfs-replace  2TB by 4TB
>> - btrfs fi resize max 4TB_devID
>> - 'unplug' the 2TB
>>
>> So then there would be 2 devices with roughly 2TB space available, so
>> good for continued btrfs raid1 writes.
>>
>> An offline variant with dd instead of btrfs-replace could also be done
>> (I used to do that sometimes when btrfs-replace was not implemented).
>> My experience is that btrfs-replace speed is roughly at max speed (so
>> 

Re: RAID-1 refuses to balance large drive

2016-03-26 Thread Brad Templeton



For those curious as the the result, the reduction to single and
restoration to RAID1 did indeed balance the array.   It was extremely
slow of course on a 12tb array.   I did not bother doing this with the
metadata.   I also stopped the conversion to single when it had freed up
enough space on the 2 smaller drives, because at that time it was moving
stuff into the big drive, which seemed sub-optimal considering what was
to come.

In general, obviously, I hope the long term goal is to not need this,
indeed not to need manual balance at all.   I would hope the goal is to
just be able to add and remove drives, tell the system what type of
redundancy you need and let it figure out the rest.  But I know this is
an FS in development.

I've actually come to feel that when it comes to personal drive arrays,
we actually need something much smarter than today's filesystems.  Truth
is, for example, that once my infrequently accessed files, such as old
photo and video archives, have a solid backup made, there is not
actually a need to keep them redundantly at all, except for speed, while
the much smaller volume of frequently accessed files needs that (or even
extra redundancy not for safety but extra speed, and of course cache on
an SSD is even better.)   This requires not just the fileystem and OS to
get smarter about this, but even the apps.  It may happen some day -- no
matter how cheap storage gets, we keep coming up with ways to fill it.

Thanks for the help.


Re: RAID-1 refuses to balance large drive

2016-03-25 Thread Duncan
Henk Slager posted on Fri, 25 Mar 2016 15:35:52 +0100 as excerpted:

> For the original OP situation, with chunks all filled up with extents
> and devices all filled up with chunks, 'integrating' a new 6TB drive
> in a 4TB+3TB+2TB raid1 array could probably be done in a bit unusual
> way in order to avoid immediate balancing needs:

> - 'plug-in' the 6TB
> - btrfs-replace  4TB by 6TB
> - btrfs fi resize max 6TB_devID
> - btrfs-replace  2TB by 4TB
> - btrfs fi resize max 4TB_devID
> - 'unplug' the 2TB

Way to think outside the box, Henk!  I'll have to remember this as it's
a very clever and rather useful method-tool to have in the ol' admin
toolbox (aka brain). =:^)

I only wish I had thought of it, as it sure seems clear... now that
you described it!

Greatly appreciated, in any case! =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: RAID-1 refuses to balance large drive

2016-03-25 Thread Henk Slager
On Fri, Mar 25, 2016 at 2:16 PM, Patrik Lundquist
 wrote:
> On 23 March 2016 at 20:33, Chris Murphy  wrote:
>>
>> On Wed, Mar 23, 2016 at 1:10 PM, Brad Templeton  wrote:
>> >
>> > I am surprised to hear it said that having the mixed sizes is an odd
>> > case.
>>
>> Not odd as in wrong, just uncommon compared to other arrangements being 
>> tested.
>
> I think mixed drive sizes in raid1 is a killer feature for a home NAS,
> where you replace an old smaller drive with the latest and largest
> when you need more storage.
>
> My raid1 currently consists of 6TB+3TB+3*2TB.

For the original OP situation, with chunks all filled up with extents
and devices all filled up with chunks, 'integrating' a new 6TB drive
in a 4TB+3TB+2TB raid1 array could probably be done in a bit unusual
way in order to avoid immediate balancing needs:
- 'plug-in' the 6TB
- btrfs-replace  4TB by 6TB
- btrfs fi resize max 6TB_devID
- btrfs-replace  2TB by 4TB
- btrfs fi resize max 4TB_devID
- 'unplug' the 2TB

So then there would be 2 devices with roughly 2TB space available, so
good for continued btrfs raid1 writes.

An offline variant with dd instead of btrfs-replace could also be done
(I used to do that sometimes when btrfs-replace was not implemented).
My experience is that btrfs-replace runs at roughly maximum speed (i.e.
the hard disk's magnetic-media transfer speed) during the whole replace
process, and it does what you actually want in a more direct way.  So in
total the device replace/upgrade is mostly much faster than with the
add+delete method, and raid1 redundancy stays active the whole time.  Of
course it means first making sure the system runs an up-to-date/latest
kernel+tools.


Re: RAID-1 refuses to balance large drive

2016-03-25 Thread Patrik Lundquist
On 23 March 2016 at 20:33, Chris Murphy  wrote:
>
> On Wed, Mar 23, 2016 at 1:10 PM, Brad Templeton  wrote:
> >
> > I am surprised to hear it said that having the mixed sizes is an odd
> > case.
>
> Not odd as in wrong, just uncommon compared to other arrangements being 
> tested.

I think mixed drive sizes in raid1 is a killer feature for a home NAS,
where you replace an old smaller drive with the latest and largest
when you need more storage.

My raid1 currently consists of 6TB+3TB+3*2TB.


Re: RAID-1 refuses to balance large drive

2016-03-24 Thread Andrew Vaughan
Hi Brad

Just a user here, not a dev.

I think I might have run into a similar bug about 6 months ago.

At the time I was running Debian stable.  (iirc that is kernel 3.16
and probably btrfs-progs of a similar vintage).

The filesystem was originally a 2 x 6TB array with a 4TB drive added
later when space began to get low.  I'm pretty sure I must have done
at least a partial balance after adding the 4TB drive, but something
like 1TB free on each of the two 6TB drives, and 2TB on the 4TB, would
have been 'good enough for me'.

It was nearly full again when a copy unexpectedly reported
out-of-space.  Balance didn't fix it.  In retrospect btrfs had
probably run out of chunks on both 6TB drives.

I'm not sure what actually fixed it.  I upgraded to Debian testing
(something I was going to do soon anyway).  I might have also
temporarily added another drive.   (I have since had a 6TB drive fail,
and btrfs is running happily on 2x4TB, and 1x6TB).

More inline below.

On 24 March 2016 at 05:34, Chris Murphy  wrote:
> On Wed, Mar 23, 2016 at 10:51 AM, Brad Templeton  wrote:
>> Thanks for assist.  To reiterate what I said in private:
>>
>> a) I am fairly sure I swapped drives by adding the 6TB drive and then
>> removing the 2TB drive, which would not have made the 6TB think it was
>> only 2TB.The btrfs statistics commands have shown from the beginning
>> the size of the device as 6TB, and that after the remove, it haad 4TB
>> unallocated.
>
> I agree this seems to be consistent with what's been reported.
>



>>
>> Some options remaining open to me:
>>
>> a) I could re-add the 2TB device, which is still there.  Then balance
>> again, which hopefully would move a lot of stuff.   Then remove it again
>> and hopefully the new stuff would distribute mostly to the large drive.
>>  Then I could try balance again.
>
> Yeah, to do this will require -f to wipe the signature info from that
> drive when you add it. But I don't think this is a case of needing
> more free space, I think it might be due to the odd number of drives
> that are also fairly different in size.
>
If I recall correctly, when I did a device delete, the delete did
remove the btrfs signature.  But I could be wrong.

> But then what happens when you delete the 2TB drive after the balance?
> Do you end up right back in this same situation?
>

If balance manages to get the data properly distributed across the
drives, then the 2TB should be mostly empty, and device delete should
be able to remove the 2TB disk.   I successfully added a 4TB disk, did
a balance, and then removed a failing 6TB from the 3 drive array
above.

>
>>
>> b) It was suggested I could (with a good backup) convert the drive to
>> non-RAID1 to free up tons of space and then re-convert.  What's the
>> precise procedure for that?  Perhaps I can do it with a limit to see how
>> it works as an experiment?   Any way to specifically target the blocks
>> that have their two copies on the 2 smaller drives for conversion?
>
> btrfs balance -dconvert=single -mconvert=single -f   ## you have to
> use -f to force reduction in redundancy
> btrfs balance -dconvert=raid1 -mconvert=raid1

I would probably try upgrading to a newer kernel + btrfs-progs first.
Before converting back to raid1, I would also run btrfs device usage
and check to see whether the all devices have approximately the same
amount of unallocated space.  If they don't, maybe try running a full
balance again.
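
Roughly, that conversion round trip would be (a sketch, assuming a
mount point of /mnt and a good backup, since redundancy is gone until
the convert back finishes):

  btrfs balance start -f -dconvert=single -mconvert=single /mnt
  btrfs device usage /mnt     # check unallocated space is now roughly even
  btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt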



Andrew


Re: RAID-1 refuses to balance large drive

2016-03-24 Thread Duncan
Brad Templeton posted on Wed, 23 Mar 2016 19:49:00 -0700 as excerpted:

> On 03/23/2016 07:33 PM, Qu Wenruo wrote:
> 
>>> Still, it seems to me
>>> that the lack of space even after I filled the disks should not
>>> interfere with the balance's ability to move chunks which are found on
>>> both 3 and 4 so that one remains and one goes to the 6.  This action
>>> needs no spare space.   Now I presume the current algorithm perhaps
>>> does not work this way?
>> 
>> No, balance is not working like that.
>> Although most user consider balance is moving data, which is partly
>> right. The fact is, balance is, copy-and-delete. And it needs spare
>> space.
>> 
>> Means you must have enough space for the extents you are balancing,
>> then btrfs will copy them, update reference, and then delete old data
>> (with its block group).
>> 
>> So for balancing data in already filled device, btrfs needs to find
>> space for them first.
>> Which will need 2 devices with unallocated space for RAID1.
>> 
>> And in you case, you only have 1 devices with unallocated space, so no
>> space to balance.
> 
> Ah.  I would class this as a bug, or at least a non-optimal design.  If
> I understand, you say it tries to move both of the matching chunks to
> new homes.  This makes no sense if there are 3 drives because it is
> assured that one chunk is staying on the same drive.   Even with 4 or
> more drives, where this could make sense, in fact it would still be wise
> to attempt to move only one of the pair of chunks, and then move the
> other if that is also a good idea.

What balance does, at its most basic, is rewrite and in the process 
manipulate chunks in some desired way, depending on the filters used, if 
any.  Once the chunks have been rewritten, the old copies are deleted.  
But existing chunks are never simply left in place unless the filters 
exclude them entirely.  If they are rewritten, a new chunk is created and 
the old chunk is removed.

Now one of the simplest and most basic effects of this rewrite process is 
that where two or more chunks of the same type (typically data or 
metadata) are only partially full, the rewrite process will create a new 
chunk and start writing, filling it until it is full, then creating 
another and filling it, etc, which ends up compacting chunks as it 
rewrites them.  So if there's ten chunks and average of 50% full, it'll 
compact that into five chunks, 100% full.  The usage filter is very 
helpful here, letting you tell balance to only bother with chunks that 
are under say 10% (usage=10) full, where you'll get a pretty big effect 
for the effort, as 10 such chunks can be consolidated into one.  Of 
course that would only happen if you /had/ 10 such chunks under 10% full, 
but at say usage=50, you still get one freed chunk for every two balance 
rewrites, taking longer, but still far less time than it would take to 
rewrite 90% full chunks, with far more dramatic effects... as long as 
there are chunks to balance and combine at that usage level, of course.
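
For example (a sketch, assuming a mount point of /mnt):

  btrfs balance start -dusage=10 /mnt   # only rewrite data block groups <=10% full
  btrfs balance start -dusage=50 /mnt   # widen the net if the first pass freed too little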

Here, we're using a different side effect, the fact that with a raid1 
setup, there are always two copies of the chunk, one on each of exactly 
two devices, and that when new chunks are allocated, they *SHOULD* be 
allocated from the devices with the most free space, subject only to the 
rule that both copies cannot be on the same device, so the effect is that 
it'll allocate from the device with the most space left for the first 
copy, and then for the second copy, it'll allocate from the device with 
the most space left, but where the device list excludes the device that 
the first copy is on.

But, the point that Qu is making is that balance, by definition, rewrites 
both raid1 copies of the chunk.  It can't simply rewrite just the one 
that's on the fullest device to the most empty and leave the other copy 
alone.  So what it will do is allocate space for a new chunk from each of 
the two devices with the most space left, and will copy the chunks to 
them, only releasing the existing copies when the copy is done and the 
new copies are safely on their respective devices.

Which means that at least two devices MUST have space left in order to 
rebalance from raid1 to raid1.  If only one device has space left, no 
rebalance can be done.

Now your 3 TB and 4 TB devices, one each, are full, with space left only 
on the 6 TB device.  When you first switched from the 2 TB device to the 
6 TB device, the device delete would have rewritten from the 2 TB device 
to the 6 TB device, and you probably had some space left on the other 
devices at that point.  However, you didn't have enough space left on the 
other two devices to utilize much of the 6 TB device, because each time 
you allocated a chunk on the 6 TB device, a chunk had to be allocated on 
one of the others as well, and they simply didn't have enough space left 
by that point to do that too many times.


Now, you /did/ try to rebalance before you /fully/ ran out of space on 

Re: RAID-1 refuses to balance large drive

2016-03-23 Thread Qu Wenruo



Brad Templeton wrote on 2016/03/23 19:49 -0700:



On 03/23/2016 07:33 PM, Qu Wenruo wrote:



The stage I talked about only applies if you fill btrfs from scratch,
with the 3TB, 4TB and 6TB devices.

Just an example to explain how btrfs allocates space on uneven devices.



Then we had 4 + 3 + 6 + 2, but did not add more files or balance.

Then we had a remove of the 2, which caused, as expected, all the chunks
on the 2TB drive to be copied to the 6TB drive, as it was the most empty
drive.

Then we had a balance.  The balance (I would have expected) would have
moved chunks found on both 3 and 4, taking one of them and moving it to
the 6.  Generally alternating taking ones from the 3 and 4.   I can see
no reason this should not work even if 3 and 4 are almost entirely full,
but they were not.
But this did not happen.



2) 6T and 3/4 switch stage: Allocate 4T Raid1 chunk.
 After stage 1), we have 3/3/5 remaining space, then btrfs will pick
 space from 5T remaining(6T devices), and switch between the other 3T
 remaining one.

 Cause the remaining space to be 1/1/1.

3) Fake-even allocation stage: Allocate 1T raid chunk.
 Now all devices have the same unallocated space, and there are 3
 devices, we can't really balance all chunks across them.
 As we must and will only select 2 devices, in this stage, there will
 be 1T unallocated and never be used.

After all, you will get 1 +4 +1 = 6T, still smaller than (3 + 4 +6 ) /2
= 6.5T

Now let's talk about your 3 + 4 + 6 case.

For your initial state, 3 and 4 T devices is already filled up.
Even your 6T device have about 4T available space, it's only 1 device,
not 2 which raid1 needs.

So, no space for balance to allocate a new raid chunk. The extra 20G is
so small that it almost makes no sense.


Yes, it was added as an experiment on the suggestion of somebody on the
IRC channel.  I will be rid of it soon.  Still, it seems to me that the
lack of space even after I filled the disks should not interfere with
the balance's ability to move chunks which are found on both 3 and 4 so
that one remains and one goes to the 6.  This action needs no spare
space.   Now I presume the current algorithm perhaps does not work
this way?


No, balance does not work like that.
Although most users consider balance to be moving data, which is partly
right, the fact is that balance is copy-and-delete.  And it needs spare
space.

That means you must have enough space for the extents you are balancing;
then btrfs will copy them, update references, and then delete the old
data (with its block group).

So for balancing data on an already filled device, btrfs needs to find
space for it first.
Which, for RAID1, needs 2 devices with unallocated space.

And in your case, you only have 1 device with unallocated space, so no
space to balance.


Ah.  I would class this as a bug, or at least a non-optimal design.  If
I understand, you say it tries to move both of the matching chunks to
new homes.  This makes no sense if there are 3 drives because it is
assured that one chunk is staying on the same drive.   Even with 4 or
more drives, where this could make sense, in fact it would still be wise
to attempt to move only one of the pair of chunks, and then move the
other if that is also a good idea.


For moving only one of the pair of chunks, you mean a stripe of a chunk.
And in that case, IIRC only replace works like that.

In most cases, btrfs works in chunk units, which means it may move data 
within a device.


Even in that case, it's still useful.

For example, say there is a chunk (1G size) which only contains 1 extent
(4K).  Such a balance can move the 4K extent into an existing chunk, and
free the whole 1G chunk to allow a new chunk to be created.

Considering balance is not only for making chunk allocation even, but 
also for a lot of other uses, IMHO the behavior can hardly be called a bug.










My next plan is to add the 2tb back. If I am right, balance will move
chunks from 3 and 4 to the 2TB,


Not only to the 2TB, but to the 2TB and 6TB.  Never forget that RAID1
needs 2 devices.
And if the 2TB is filled and the 3/4 have free space, it's also possible
to allocate to the 3/4 devices.

That will free 2TB in the already filled up devices.  But that's still
not enough to get space even.

You may need to balance several times (maybe 10+) to make space a little
more even, as balance won't balance any chunk which was created by
balance.  (Otherwise balance would loop infinitely.)
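
One way to script those repeated passes (a rough sketch; the pass count
and limit are arbitrary, and the mount point /mnt is an example):

  for pass in 1 2 3 4 5; do
      btrfs balance start -dlimit=100 /mnt   # rewrite at most 100 data block groups per pass
      btrfs device usage /mnt                # stop early once unallocated space looks even
  done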


Now I understand -- I had not thought it would try to move 2 when that's
so obviously wrong on a 3-drive, and so I was not thinking of the
general case.  So I can now calculate that if I add the 2TB, in an ideal
situation, it will perhaps get 1TB of chunks and the 6TB will get 1TB of
chunks and then the 4 drives will have 3 with 1TB free, and the 6TB will
have 3TB free.   Then when I remove the 2TB, the 6TB should get all its
chunks and will have 2TB free and the other two 1TB free and that's
actually the right situation as all new blocks will appear on the 6TB
and one of the other 

Re: RAID-1 refuses to balance large drive

2016-03-23 Thread Chris Murphy
On Wed, Mar 23, 2016 at 8:49 PM, Brad Templeton  wrote:
> On 03/23/2016 07:33 PM, Qu Wenruo wrote:
>>
>> No, balance is not working like that.
>> Although most user consider balance is moving data, which is partly right.
>> The fact is, balance is, copy-and-delete. And it needs spare space.
>>
>> Means you must have enough space for the extents you are balancing, then
>> btrfs will copy them, update reference, and then delete old data (with
>> its block group).
>>
>> So for balancing data in already filled device, btrfs needs to find
>> space for them first.
>> Which will need 2 devices with unallocated space for RAID1.
>>
>> And in you case, you only have 1 devices with unallocated space, so no
>> space to balance.
>
> Ah.  I would class this as a bug, or at least a non-optimal design.  If
> I understand, you say it tries to move both of the matching chunks to
> new homes.  This makes no sense if there are 3 drives because it is
> assured that one chunk is staying on the same drive.   Even with 4 or
> more drives, where this could make sense, in fact it would still be wise
> to attempt to move only one of the pair of chunks, and then move the
> other if that is also a good idea.

In a separate thread, it's observed that balance code is getting
complicated and it's probably important that it not be too smart for
itself.

The thing to understand is that chunks are a contiguous range of
physical sectors. What's really being copied are extents in those
chunks. And the balance not only rewrites extents but it tries to
collect them together to efficiently use the chunk space. The Btrfs
chunk isn't like an md chunk.

>
>
>>
>>
>>>
>>> My next plan is to add the 2tb back. If I am right, balance will move
>>> chunks from 3 and 4 to the 2TB,
>>
>> Not only to 2TB, but to 2TB and 6TB. Never forgot that RAID1 needs 2
>> devices.
>> And if 2TB is filled and 3/4 and free space, it's also possible to 3/4
>> devices.
>>
>> That will free 2TB in already filled up devices. But that's still not
>> enough to get space even.
>>
>> You may need to balance several times(maybe 10+) to make space a little
>> even, as balance won't balance any chunk which is created by balance.
>> (Or balance will loop infinitely).
>
> Now I understand -- I had not thought it would try to move 2 when that's
> so obviously wrong on a 3-drive, and so I was not thinking of the
> general case.  So I can now calculate that if I add the 2TB, in an ideal
> situation, it will perhaps get 1TB of chunks and the 6TB will get 1TB of
> chunks and then the 4 drives will have 3 with 1TB free, and the 6TB will
> have 3TB free.

The problem is that you have two devices totally full now, devid1 and
devid2. So it's not certain it's going to start just copying chunks
off those drives. Whatever it does, it does on both chunk copies. It
might be moving them. It might be packing them more efficiently with
extents. No deallocation of a chunk can happen until it's empty. So
for two full drives it's difficult to see how this gets fixed just
with a regular balance. I think you have to go to single profile...
OR...

Add the 2TB.
Remove the 6TB and wait.

devid3 size 5.43TiB used 1.42TiB path /dev/sdg2   this
suggests 1.4TiB on the 6TB drive so it should be possible for those
chunks to get moved to the 2TB drive.

Now you have an empty 6TB, and you still have a (very full) raid1 with all data.

mkfs a new volume on the 6TB, btrfs send/receive to get all data on
the 6TB drive. "Data,RAID1: Size:3.87TiB, Used:3.87TiB" suggests only
4TB data so the 6TB can hold all of it.

Now you can umount the old volume; and you can force add 3TB and 4TB
to the new 6TB volume, and -dconvert=raid1 -mconvert=raid1

The worst case scenario is that the 6TB drive dies during the
conversion, and then it could be totally broken and you have to go to
backup. But otherwise, it's a bit less risky than two balances to and
from single profile across three or even four drives.
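
A rough sketch of that sequence (all device names and mount points are
hypothetical; btrfs send needs a read-only snapshot, and the final
add/convert step reuses the 3TB and 4TB for the new volume):

  btrfs device add -f /dev/sd_2tb /mnt            # re-add the 2TB
  btrfs device delete /dev/sd_6tb /mnt            # empties and removes the 6TB
  mkfs.btrfs /dev/sd_6tb                          # new single-device filesystem
  mount /dev/sd_6tb /mnt-new
  btrfs subvolume snapshot -r /mnt /mnt/migrate   # read-only snapshot for send
  btrfs send /mnt/migrate | btrfs receive /mnt-new
  # once the data is verified on the 6TB:
  umount /mnt
  btrfs device add -f /dev/sd_3tb /dev/sd_4tb /mnt-new
  btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt-new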



-- 
Chris Murphy


Re: RAID-1 refuses to balance large drive

2016-03-23 Thread Brad Templeton


On 03/23/2016 06:59 PM, Qu Wenruo wrote:

> 
> About chunk allocation problem, I hope to get a clear view of the whole
> disk layout now.
> 
> What's the final disk layout?
> Is that 4T + 3T + 6T + 20G layout?
> 
> If so, I'll say, in that case, only fully re-convert to single may help.
> As there is no enough space to allocate new raid1 chunks for balance
> them all.
> 
> 
> Chris Murphy may have already mentioned, btrfs chunk allocation has some
> limitation, although it is already more flex than mdadm.
> 
> 
> Btrfs chunk allocation will choose the device with most unallocated, and
> for raid1, it will ensure always pick 2 different devices to allocation.
> 
> This allocation does make btrfs raid1 allocation more space in a more
> flex method than mdadm raid1.
> But that only works if you start from scratch.
> 
> I'll explain it that case first.
> 
> 1) 6T and 4T devices only stage: Allocate 1T Raid1 chunk.
>As 6T and 4T devices have the most unallocated space, so the first
>1T raid chunk will be allocated from them.
>Remaining space: 3/3/5

This stage never existed.  We had a 4 + 3 + 2 stage, which was low-ish
on space but not full.  I mean it had hundreds of GB free.

Then we had 4 + 3 + 6 + 2, but did not add more files or balance.

Then we had a remove of the 2, which caused, as expected, all the chunks
on the 2TB drive to be copied to the 6TB drive, as it was the most empty
drive.

Then we had a balance.  The balance (I would have expected) would have
moved chunks found on both 3 and 4, taking one of them and moving it to
the 6.  Generally alternating taking ones from the 3 and 4.   I can see
no reason this should not work even if 3 and 4 are almost entirely full,
but they were not.
But this did not happen.

> 
> 2) 6T and 3/4 switch stage: Allocate 4T Raid1 chunk.
>After stage 1), we have 3/3/5 remaining space, then btrfs will pick
>space from 5T remaining(6T devices), and switch between the other 3T
>remaining one.
> 
>Cause the remaining space to be 1/1/1.
> 
> 3) Fake-even allocation stage: Allocate 1T raid chunk.
>Now all devices have the same unallocated space, and there are 3
>devices, we can't really balance all chunks across them.
>As we must and will only select 2 devices, in this stage, there will
>be 1T unallocated and never be used.
> 
> After all, you will get 1 +4 +1 = 6T, still smaller than (3 + 4 +6 ) /2
> = 6.5T
> 
> Now let's talk about your 3 + 4 + 6 case.
> 
> For your initial state, 3 and 4 T devices is already filled up.
> Even your 6T device have about 4T available space, it's only 1 device,
> not 2 which raid1 needs.
> 
> So, no space for balance to allocate a new raid chunk. The extra 20G is
> so small that almost makes no sence.

Yes, it was added as an experiment on the suggestion of somebody on the
IRC channel.  I will be rid of it soon.  Still, it seems to me that the
lack of space even after I filled the disks should not interfere with
the balance's ability to move chunks which are found on both 3 and 4 so
that one remains and one goes to the 6.  This action needs no spare
space.   Now I presume the current algorithm perhaps does not work this way?

My next plan is to add the 2tb back. If I am right, balance will move
chunks from 3 and 4 to the 2TB, but it should not move any from the 6TB
because it has so much space.  Likewise, when I re-remove the 2tb, all
its chunks should move to the 6tb, and I will be at least in a usable state.

Or is the single approach faster?

> 
> 
> The convert to single then back to raid1, will do its job partly.
> But according to other report from mail list.
> The result won't be perfect even, even the reporter uses devices with
> all same size.
> 
> 
> So to conclude:
> 
> 1) Btrfs will use most of devices space for raid1.
> 2) 1) only happens if one fills btrfs from scratch
> 3) For already filled case, convert to single then convert back will
>work, but not perfectly.
> 
> Thanks,
> Qu
> 
>>
>>
>>
>>> Under mdadm the bigger drive
>>> still helped, because it replaced at smaller drive, the one that was
>>> holding the RAID back, but you didn't get to use all the big drive until
>>> a year later when you had upgraded them all.  In the meantime you used
>>> the extra space in other RAIDs.  (For example, a raid-5 plus a raid-1 on
>>> the 2 bigger drives) Or you used the extra space as non-RAID space, ie.
>>> space for static stuff that has offline backups.  In fact, most of my
>>> storage is of that class (photo archives, reciprocal backups of other
>>> systems) where RAID is not needed.
>>>
>>> So the long story is, I think most home users are likely to always have
>>> different sizes and want their FS to treat it well.
>>
>> Yes of course. And at the expense of getting a frownie face
>>
>> "Btrfs is under heavy development, and is not suitable for
>> any uses other than benchmarking and review."
>> https://www.kernel.org/doc/Documentation/filesystems/btrfs.txt
>>
>> Despite that 

Re: RAID-1 refuses to balance large drive

2016-03-23 Thread Qu Wenruo



Chris Murphy wrote on 2016/03/23 13:33 -0600:

On Wed, Mar 23, 2016 at 1:10 PM, Brad Templeton  wrote:

It is Ubuntu wily, which is 4.2 and btrfs-progs 0.4.  I will upgrade to
Xenial in April but probably not before, I don't have days to spend on
this.   Is there a fairly safe ppa to pull 4.4 or 4.5?


I'm not sure.


  In olden days, I

would patch and build my kernels from source but I just don't have time
for all the long-term sysadmin burden that creates any more.

Also, I presume if this is a bug, it's in btrfsprogs, though the new one
presumably needs a newer kernel too.


No you can mix and match progs and kernel versions. You just don't get
new features if you don't have a new kernel.

But the issue is the balance code is all in the kernel. It's activated
by user space tools but it's all actually done by kernel code.




I am surprised to hear it said that having the mixed sizes is an odd
case.


Not odd as in wrong, just uncommon compared to other arrangements being tested.


  That was actually one of the more compelling features of btrfs
that made me switch from mdadm, lvm and the rest.   I presumed most
people were the same. You need more space, you go out and buy a new
drive and of course the new drive is bigger than the old drives you
bought because they always get bigger.


Of course and I'm not saying it shouldn't work. The central problem
here is we don't even know what the problem really is; we only know
the manifestation of the problem isn't the desired or expected
outcome. And how to find out the cause is different than how to fix
it.


About the chunk allocation problem, I hope to get a clear view of the
whole disk layout now.


What's the final disk layout?
Is that 4T + 3T + 6T + 20G layout?

If so, I'll say that in that case only a full re-convert to single may
help, as there is not enough space to allocate new raid1 chunks to
balance them all.



Chris Murphy may have already mentioned it: btrfs chunk allocation has
some limitations, although it is already more flexible than mdadm.

Btrfs chunk allocation will choose the device with the most unallocated
space, and for raid1 it will always pick 2 different devices for the
allocation.

This allocation lets btrfs raid1 use more space, in a more flexible way,
than mdadm raid1.

But that only works if you start from scratch.

I'll explain that case first.

1) 6T and 4T devices only stage: Allocate 1T of raid1 chunks.
   As the 6T and 4T devices have the most unallocated space, the first
   1T of raid1 chunks will be allocated from them.
   Remaining space: 3/3/5

2) 6T and 3/4 switch stage: Allocate 4T of raid1 chunks.
   After stage 1), we have 3/3/5 remaining space, so btrfs will pick
   space from the 5T remaining (the 6T device), and switch between the
   other two devices with 3T remaining.

   That causes the remaining space to be 1/1/1.

3) Fake-even allocation stage: Allocate 1T of raid1 chunks.
   Now all devices have the same unallocated space, and there are 3
   devices, so we can't really balance all chunks across them.
   As we must and will only select 2 devices, in this stage there will
   be 1T unallocated that can never be used.

In the end, you will get 1 + 4 + 1 = 6T of usable raid1 capacity, still
smaller than (3 + 4 + 6) / 2 = 6.5T.


Now let's talk about your 3 + 4 + 6 case.

For your initial state, the 3 and 4 T devices are already filled up.
Even though your 6T device has about 4T of available space, it's only
1 device, not the 2 which raid1 needs.

So there is no space for balance to allocate a new raid chunk.  The
extra 20G is so small that it almost makes no sense.



Converting to single and then back to raid1 will do its job, partly.
But according to another report from the mailing list, the result won't
be perfectly even, even when the reporter used devices of all the same
size.



So to conclude:

1) Btrfs will use most of the devices' space for raid1.
2) But 1) only happens if one fills btrfs from scratch.
3) For the already-filled case, converting to single and then back will
   work, but not perfectly.

Thanks,
Qu






Under mdadm the bigger drive
still helped, because it replaced at smaller drive, the one that was
holding the RAID back, but you didn't get to use all the big drive until
a year later when you had upgraded them all.  In the meantime you used
the extra space in other RAIDs.  (For example, a raid-5 plus a raid-1 on
the 2 bigger drives) Or you used the extra space as non-RAID space, ie.
space for static stuff that has offline backups.  In fact, most of my
storage is of that class (photo archives, reciprocal backups of other
systems) where RAID is not needed.

So the long story is, I think most home users are likely to always have
different sizes and want their FS to treat it well.


Yes of course. And at the expense of getting a frownie face

"Btrfs is under heavy development, and is not suitable for
any uses other than benchmarking and review."
https://www.kernel.org/doc/Documentation/filesystems/btrfs.txt

Despite that disclosure, what you're describing is not what I'd expect
and not what I've 

Re: RAID-1 refuses to balance large drive

2016-03-23 Thread Duncan
Brad Templeton posted on Wed, 23 Mar 2016 12:10:29 -0700 as excerpted:

> It is Ubuntu wily, which is 4.2 and btrfs-progs 0.4.

Presumably that's a typo for btrfs-progs.  Either that or Ubuntu's using 
a versioning that's totally different from upstream btrfs.  For some time 
now (since the 3.12 release, ancient history in btrfs terms), btrfs-progs 
has been release version synced with the kernel.  So the latest release 
is 4.5.0, to match the kernel 4.5.0 that came out shortly before that 
userspace release and that was developed at the same time.  Before that 
was 4.4.1, a primarily bugfix release to the previous 4.4.0.

Before 3.12, the previous actual userspace release, extremely stale by 
that point, was 0.19, tho there was a 0.20-rc1 release, that wasn't 
followed up with a 0.20 full release.  The recommendation back then was 
to run and for distros to ship git snapshots.

So where 0.4 came from I've not the foggiest, unless as I said it's a 
typo, perhaps for 4.0.

> I will upgrade to
> Xenial in April but probably not before, I don't have days to spend on
> this.   Is there a fairly safe ppa to pull 4.4 or 4.5?  In olden days, I
> would patch and build my kernels from source but I just don't have time
> for all the long-term sysadmin burden that creates any more.

Heh, this posting is from a gentooer, who builds /everything/ from 
sources. =:^)  Tho that's not really a problem as it can go on in the 
background and thus takes little actual attention time.

The real time is in figuring out what I need to know about what has 
changed between versions and if/how that needs to affect my existing 
config, but that's time that needs spent regardless of the distro, the 
major question being one of rolling distro and thus spending that time a 
bit here and a bit there as the various components upgrade, with a better 
chance of actually nailing down the problem to a specific package upgrade 
when there's issues, or doing it all in one huge version upgrade, which 
pretty much leaves you high and dry in terms of fixing problems since the 
entire world changes at once and it's thus nearly impossible to pin a bug 
to a particular package upgrade.


But meanwhile, as CMurphy says at the expense of a frowny face...

Given that btrfs is still maturing, and /not/ yet entirely stable and 
mature, and the fact that the list emphasis is on mainline, the list 
kernel recommendation is to follow one of two tracks, either mainline 
current, or mainline LTS.

If you choose the mainline current track, the recommendation is to stay 
within the latest two current kernel series.  With 4.5 out, that means 
you should be on 4.4 at least.  Previous non-LTS kernel series no longer 
get patch backports at least from mainline, and as we focus on mainline 
here, we're not tracking what distros may or may not backport on their 
own, so we simply can't provide the same level of support.

For LTS kernel track, the recommendation has recently relaxed slightly.  
Previously, it was again to stick with the latest two kernel LTS series, 
which would be 4.4 and 4.1.  However, the one previous to that was 3.18, 
and it has been reasonably stable, certainly more so than those previous 
to it, so while 4.1 or 4.4 is still what we really like to see, we 
recognize that some will be sticking to 3.18 and are continuing to try to 
support them as well, now that the LTS 4.4 has pushed it out of the 
primary recommended range.  But previous to that really isn't supported.

Not that we won't do best-effort, regardless, but in many instances, the 
best recommendation we can make with out-of-support kernels really is to 
upgrade to something more current, and try again.

Meanwhile, yes, we do recognize that distros have chosen to support btrfs 
on kernels outside that list.  But as I said, we don't track what patches 
the distros may or may not have backported, and thus aren't in a 
particularly good position to provide support for them.  The distros 
themselves, having chosen to provide that support, are in a far better 
position to do just that, since they know what they've backported and 
what they haven't.  So in that case, the best we can do is refer you to 
the distros whose support you are nominally relying on, to actually 
provide that support.

And obviously, kernel 4.2 isn't one of the ones named above.  It's 
neither a mainstream LTS, nor any longer within the last two current 
kernel releases.

So kernel upgrade, however you choose to do it, is strongly recommended, 
with two other alternatives if you prefer:

1) Ask your distro for support of versions off the mainline support 
list.  After all, they're the ones claiming to support the known to be 
not entirely stabilized and ready for production use btrfs on non-
mainline-LTS kernels long after mainline support for those non-LTS 
kernels has been dropped.

2) Choose a filesystem that better matches your needs, presumably because 
it /is/ fully mature and stable, and thus is properly 

Re: RAID-1 refuses to balance large drive

2016-03-23 Thread Chris Murphy
On Wed, Mar 23, 2016 at 1:10 PM, Brad Templeton  wrote:
> It is Ubuntu wily, which is 4.2 and btrfs-progs 0.4.  I will upgrade to
> Xenial in April but probably not before, I don't have days to spend on
> this.   Is there a fairly safe ppa to pull 4.4 or 4.5?

I'm not sure.


 In olden days, I
> would patch and build my kernels from source but I just don't have time
> for all the long-term sysadmin burden that creates any more.
>
> Also, I presume if this is a bug, it's in btrfsprogs, though the new one
> presumably needs a newer kernel too.

No you can mix and match progs and kernel versions. You just don't get
new features if you don't have a new kernel.

But the issue is the balance code is all in the kernel. It's activated
by user space tools but it's all actually done by kernel code.



> I am surprised to hear it said that having the mixed sizes is an odd
> case.

Not odd as in wrong, just uncommon compared to other arrangements being tested.

>  That was actually one of the more compelling features of btrfs
> that made me switch from mdadm, lvm and the rest.   I presumed most
> people were the same. You need more space, you go out and buy a new
> drive and of course the new drive is bigger than the old drives you
> bought because they always get bigger.

Of course and I'm not saying it shouldn't work. The central problem
here is we don't even know what the problem really is; we only know
the manifestation of the problem isn't the desired or expected
outcome. And how to find out the cause is different than how to fix
it.



> Under mdadm the bigger drive
> still helped, because it replaced at smaller drive, the one that was
> holding the RAID back, but you didn't get to use all the big drive until
> a year later when you had upgraded them all.  In the meantime you used
> the extra space in other RAIDs.  (For example, a raid-5 plus a raid-1 on
> the 2 bigger drives) Or you used the extra space as non-RAID space, ie.
> space for static stuff that has offline backups.  In fact, most of my
> storage is of that class (photo archives, reciprocal backups of other
> systems) where RAID is not needed.
>
> So the long story is, I think most home users are likely to always have
> different sizes and want their FS to treat it well.

Yes of course. And at the expense of getting a frownie face

"Btrfs is under heavy development, and is not suitable for
any uses other than benchmarking and review."
https://www.kernel.org/doc/Documentation/filesystems/btrfs.txt

Despite that disclosure, what you're describing is not what I'd expect
and not what I've previously experienced. But I haven't had three
different sized drives, and they weren't particularly full, and I
don't know if you started with three from the outset at mkfs time or
if this is the result of two drives with a third added on later, etc.
So the nature of file systems is actually really complicated and it's
normal for there to be regressions - and maybe this is a regression,
hard to say with available information.



> Since 6TB is a relatively new size, I wonder if that plays a role.  More
> than 4TB of free space to balance into, could that confuse it?

Seems unlikely.


-- 
Chris Murphy


Re: RAID-1 refuses to balance large drive

2016-03-23 Thread Alexander Fougner
2016-03-23 20:10 GMT+01:00 Brad Templeton :
> It is Ubuntu wily, which is 4.2 and btrfs-progs 0.4.  I will upgrade to
> Xenial in April but probably not before, I don't have days to spend on
> this.   Is there a fairly safe ppa to pull 4.4 or 4.5?

Use the mainline ppa: http://kernel.ubuntu.com/~kernel-ppa/mainline/
Instructions: https://wiki.ubuntu.com/Kernel/MainlineBuilds

You can also find a newer btrfs-progs .deb here:
launchpad.net/ubuntu/+source/btrfs-tools
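
Installing a mainline build is roughly (a sketch; the exact file names
depend on the version you pick from the directory above):

  # download the linux-headers-*_all.deb, linux-headers-*-generic_*_amd64.deb
  # and linux-image-*-generic_*_amd64.deb files for the chosen version, then:
  sudo dpkg -i linux-*.deb
  sudo reboot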

 In olden days, I
> would patch and build my kernels from source but I just don't have time
> for all the long-term sysadmin burden that creates any more.
>
> Also, I presume if this is a bug, it's in btrfsprogs, though the new one
> presumably needs a newer kernel too.
>
> I am surprised to hear it said that having the mixed sizes is an odd
> case.  That was actually one of the more compelling features of btrfs
> that made me switch from mdadm, lvm and the rest.   I presumed most
> people were the same. You need more space, you go out and buy a new
> drive and of course the new drive is bigger than the old drives you
> bought because they always get bigger.  Under mdadm the bigger drive
> still helped, because it replaced at smaller drive, the one that was
> holding the RAID back, but you didn't get to use all the big drive until
> a year later when you had upgraded them all.  In the meantime you used
> the extra space in other RAIDs.  (For example, a raid-5 plus a raid-1 on
> the 2 bigger drives) Or you used the extra space as non-RAID space, ie.
> space for static stuff that has offline backups.  In fact, most of my
> storage is of that class (photo archives, reciprocal backups of other
> systems) where RAID is not needed.
>
> So the long story is, I think most home users are likely to always have
> different sizes and want their FS to treat it well.
>
> Since 6TB is a relatively new size, I wonder if that plays a role.  More
> than 4TB of free space to balance into, could that confuse it?
>
> Off to do a backup (good idea anyway.)
>
>
>
> On 03/23/2016 11:34 AM, Chris Murphy wrote:
>> On Wed, Mar 23, 2016 at 10:51 AM, Brad Templeton  wrote:
>>> Thanks for assist.  To reiterate what I said in private:
>>>
>>> a) I am fairly sure I swapped drives by adding the 6TB drive and then
>>> removing the 2TB drive, which would not have made the 6TB think it was
>>> only 2TB.The btrfs statistics commands have shown from the beginning
>>> the size of the device as 6TB, and that after the remove, it haad 4TB
>>> unallocated.
>>
>> I agree this seems to be consistent with what's been reported.
>>
>>
>>>
>>> So I am looking for other options, or if people have commands I might
>>> execute to diagnose this (as it seems to be a flaw in balance) let me know.
>>
>> What version of btrfs-progs is this? I'm vaguely curious what 'btrfs
>> check' reports (without --repair). Any version is OK but it's better
>> to use something fairly recent since the check code continues to
>> change a lot.
>>
>> Another thing you could try is a newer kernel. Maybe there's a related
>> bug in 4.2.0. I think it may be more likely this is just an edge case
>> bug that's always been there, but it's valuable to know if recent
>> kernels exhibit the problem.
>>
>> And before proceeding with a change in layout (converting to another
>> profile) I suggest taking an image of the metadata with btrfs-image,
>> it might come in handy for a developer.
>>
>>
>>
>>>
>>> Some options remaining open to me:
>>>
>>> a) I could re-add the 2TB device, which is still there.  Then balance
>>> again, which hopefully would move a lot of stuff.   Then remove it again
>>> and hopefully the new stuff would distribute mostly to the large drive.
>>>  Then I could try balance again.
>>
>> Yeah, to do this will require -f to wipe the signature info from that
>> drive when you add it. But I don't think this is a case of needing
>> more free space, I think it might be due to the odd number of drives
>> that are also fairly different in size.
>>
>> But then what happens when you delete the 2TB drive after the balance?
>> Do you end up right back in this same situation?
>>
>>
>>
>>>
>>> b) It was suggested I could (with a good backup) convert the drive to
>>> non-RAID1 to free up tons of space and then re-convert.  What's the
>>> precise procedure for that?  Perhaps I can do it with a limit to see how
>>> it works as an experiment?   Any way to specifically target the blocks
>>> that have their two copies on the 2 smaller drives for conversion?
>>
>> btrfs balance -dconvert=single -mconvert=single -f   ## you have to
>> use -f to force reduction in redundancy
>> btrfs balance -dconvert=raid1 -mconvert=raid1
>>
>> There is the devid= filter but I'm not sure of the consequences of
>> limiting the conversion to two of three devices, that's kinda
>> confusing and is sufficiently an edge case I wonder how many bugs
>> you're looking to find today? :-)
>>
>>
>>
>>> c) Finally, I 

Re: RAID-1 refuses to balance large drive

2016-03-23 Thread Brad Templeton
It is Ubuntu wily, which is 4.2 and btrfs-progs 0.4.  I will upgrade to
Xenial in April but probably not before, I don't have days to spend on
this.   Is there a fairly safe ppa to pull 4.4 or 4.5?  In olden days, I
would patch and build my kernels from source but I just don't have time
for all the long-term sysadmin burden that creates any more.

Also, I presume if this is a bug, it's in btrfsprogs, though the new one
presumably needs a newer kernel too.

I am surprised to hear it said that having the mixed sizes is an odd
case.  That was actually one of the more compelling features of btrfs
that made me switch from mdadm, lvm and the rest.   I presumed most
people were the same. You need more space, you go out and buy a new
drive and of course the new drive is bigger than the old drives you
bought because they always get bigger.  Under mdadm the bigger drive
still helped, because it replaced a smaller drive, the one that was
holding the RAID back, but you didn't get to use all the big drive until
a year later when you had upgraded them all.  In the meantime you used
the extra space in other RAIDs.  (For example, a raid-5 plus a raid-1 on
the 2 bigger drives) Or you used the extra space as non-RAID space, ie.
space for static stuff that has offline backups.  In fact, most of my
storage is of that class (photo archives, reciprocal backups of other
systems) where RAID is not needed.

So the long story is, I think most home users are likely to always have
different sizes and want their FS to treat it well.

Since 6TB is a relatively new size, I wonder if that plays a role.  More
than 4TB of free space to balance into, could that confuse it?

Off to do a backup (good idea anyway.)



On 03/23/2016 11:34 AM, Chris Murphy wrote:
> On Wed, Mar 23, 2016 at 10:51 AM, Brad Templeton  wrote:
>> Thanks for assist.  To reiterate what I said in private:
>>
>> a) I am fairly sure I swapped drives by adding the 6TB drive and then
>> removing the 2TB drive, which would not have made the 6TB think it was
>> only 2TB.The btrfs statistics commands have shown from the beginning
>> the size of the device as 6TB, and that after the remove, it haad 4TB
>> unallocated.
> 
> I agree this seems to be consistent with what's been reported.
> 
> 
>>
>> So I am looking for other options, or if people have commands I might
>> execute to diagnose this (as it seems to be a flaw in balance) let me know.
> 
> What version of btrfs-progs is this? I'm vaguely curious what 'btrfs
> check' reports (without --repair). Any version is OK but it's better
> to use something fairly recent since the check code continues to
> change a lot.
> 
> Another thing you could try is a newer kernel. Maybe there's a related
> bug in 4.2.0. I think it may be more likely this is just an edge case
> bug that's always been there, but it's valuable to know if recent
> kernels exhibit the problem.
> 
> And before proceeding with a change in layout (converting to another
> profile) I suggest taking an image of the metadata with btrfs-image,
> it might come in handy for a developer.
> 
> 
> 
>>
>> Some options remaining open to me:
>>
>> a) I could re-add the 2TB device, which is still there.  Then balance
>> again, which hopefully would move a lot of stuff.   Then remove it again
>> and hopefully the new stuff would distribute mostly to the large drive.
>>  Then I could try balance again.
> 
> Yeah, to do this will require -f to wipe the signature info from that
> drive when you add it. But I don't think this is a case of needing
> more free space, I think it might be due to the odd number of drives
> that are also fairly different in size.
> 
> But then what happens when you delete the 2TB drive after the balance?
> Do you end up right back in this same situation?
> 
> 
> 
>>
>> b) It was suggested I could (with a good backup) convert the drive to
>> non-RAID1 to free up tons of space and then re-convert.  What's the
>> precise procedure for that?  Perhaps I can do it with a limit to see how
>> it works as an experiment?   Any way to specifically target the blocks
>> that have their two copies on the 2 smaller drives for conversion?
> 
> btrfs balance -dconvert=single -mconvert=single -f   ## you have to
> use -f to force reduction in redundancy
> btrfs balance -dconvert=raid1 -mconvert=raid1
> 
> There is the devid= filter but I'm not sure of the consequences of
> limiting the conversion to two of three devices, that's kinda
> confusing and is sufficiently an edge case I wonder how many bugs
> you're looking to find today? :-)
> 
> 
> 
>> c) Finally, I could take a full-full backup (my normal backups don't
>> bother with cached stuff and certain other things that you can recover)
>> and take the system down for a while to just wipe and restore the
>> volumes.  That doesn't find the bug, however.
> 
> I'd have the full backup no matter what choice you make. At any time
> for any reason any filesystem can face plant without warning.
> 
> But 

Re: RAID-1 refuses to balance large drive

2016-03-23 Thread Chris Murphy
On Wed, Mar 23, 2016 at 10:51 AM, Brad Templeton  wrote:
> Thanks for assist.  To reiterate what I said in private:
>
> a) I am fairly sure I swapped drives by adding the 6TB drive and then
> removing the 2TB drive, which would not have made the 6TB think it was
> only 2TB.The btrfs statistics commands have shown from the beginning
> the size of the device as 6TB, and that after the remove, it haad 4TB
> unallocated.

I agree this seems to be consistent with what's been reported.


>
> So I am looking for other options, or if people have commands I might
> execute to diagnose this (as it seems to be a flaw in balance) let me know.

What version of btrfs-progs is this? I'm vaguely curious what 'btrfs
check' reports (without --repair). Any version is OK but it's better
to use something fairly recent since the check code continues to
change a lot.

Another thing you could try is a newer kernel. Maybe there's a related
bug in 4.2.0. I think it may be more likely this is just an edge case
bug that's always been there, but it's valuable to know if recent
kernels exhibit the problem.

And before proceeding with a change in layout (converting to another
profile) I suggest taking an image of the metadata with btrfs-image,
it might come in handy for a developer.
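
Capturing that image looks roughly like this (device name hypothetical;
-c9 compresses, -s sanitizes file names if privacy matters, and it is
best taken while the filesystem is unmounted or at least idle):

  btrfs-image -c9 -s /dev/sdg2 /root/btrfs-metadata.img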



>
> Some options remaining open to me:
>
> a) I could re-add the 2TB device, which is still there.  Then balance
> again, which hopefully would move a lot of stuff.   Then remove it again
> and hopefully the new stuff would distribute mostly to the large drive.
>  Then I could try balance again.

Yeah, to do this will require -f to wipe the signature info from that
drive when you add it. But I don't think this is a case of needing
more free space, I think it might be due to the odd number of drives
that are also fairly different in size.

But then what happens when you delete the 2TB drive after the balance?
Do you end up right back in this same situation?



>
> b) It was suggested I could (with a good backup) convert the drive to
> non-RAID1 to free up tons of space and then re-convert.  What's the
> precise procedure for that?  Perhaps I can do it with a limit to see how
> it works as an experiment?   Any way to specifically target the blocks
> that have their two copies on the 2 smaller drives for conversion?

btrfs balance -dconvert=single -mconvert=single -f   ## you have to
use -f to force reduction in redundancy
btrfs balance -dconvert=raid1 -mconvert=raid1

There is the devid= filter but I'm not sure of the consequences of
limiting the conversion to two of three devices, that's kinda
confusing and is sufficiently an edge case I wonder how many bugs
you're looking to find today? :-)



> c) Finally, I could take a full-full backup (my normal backups don't
> bother with cached stuff and certain other things that you can recover)
> and take the system down for a while to just wipe and restore the
> volumes.  That doesn't find the bug, however.

I'd have the full backup no matter what choice you make. At any time
for any reason any filesystem can face plant without warning.

But yes this should definitely work or else you've definitely found a
bug. Finding the bug in your current scenario is harder because the
history of this volume makes it really non-deterministic whereas if
you start with a 3 disk volume at mkfs time, and then you reproduce
this problem, for sure it's a bug. And fairly straightforward to
reproduce.

I still recommend a newer kernel and progs though, just because
there's no work being done on 4.2 anymore. I suggest 4.4.6 and 4.4.1
progs. And then if you reproduce it, it's not just a bug, it's a
current bug.



-- 
Chris Murphy


Re: RAID-1 refuses to balance large drive

2016-03-23 Thread Brad Templeton
Thanks for assist.  To reiterate what I said in private:

a) I am fairly sure I swapped drives by adding the 6TB drive and then
removing the 2TB drive, which would not have made the 6TB think it was
only 2TB.  The btrfs statistics commands have shown from the beginning
the size of the device as 6TB, and that after the remove, it had 4TB
unallocated.

b) Even if my memory is wrong and I did a replace (that's not even
documented in the wiki page on multiple devices, so I did not think I
had heard of it), I have since done a resize to "max" on all devices,
and still the balance moves nothing.  It says it processes almost all
the blocks it sees, but nothing changes.
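
For reference, the per-device resize step looks something like this;
the devids and mount point are examples only:

  btrfs filesystem show /local          # list devids and their sizes
  btrfs filesystem resize 1:max /local
  btrfs filesystem resize 2:max /local
  btrfs filesystem resize 3:max /local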

So I am looking for other options, or if people have commands I might
execute to diagnose this (as it seems to be a flaw in balance) let me know.

Some options remaining open to me:

a) I could re-add the 2TB device, which is still there.  Then balance
again, which hopefully would move a lot of stuff.   Then remove it again
and hopefully the new stuff would distribute mostly to the large drive.
 Then I could try balance again.

b) It was suggested I could (with a good backup) convert the drive to
non-RAID1 to free up tons of space and then re-convert.  What's the
precise procedure for that?  Perhaps I can do it with a limit to see how
it works as an experiment?   Any way to specifically target the blocks
that have their two copies on the 2 smaller drives for conversion?

c) Finally, I could take a full-full backup (my normal backups don't
bother with cached stuff and certain other things that you can recover)
and take the system down for a while to just wipe and restore the
volumes.  That doesn't find the bug, however.

On 03/22/2016 11:17 PM, Chris Murphy wrote:
> On Tue, Mar 22, 2016 at 11:54 PM, Brad Templeton  wrote:
>> Actually, the URL suggests that all the space will be used, which is
>> what I had read about btrfs, that it handled this.
> 
> It will. But it does this by dominating writes to the devices that
> have the most free space, until all devices have the same free space.
> 
> 
>> But again, how could it possibly know to restrict the new device to only
>> using 2TB?
> 
> In your case, before resizing it, it's just inheriting the size from
> the device being replaced.
> 
>>
>> Stage one:  Add the new 6TB device.  The 2TB device is still present.
>>
>> Stage two:  Remove the 2TB device.
> 
> OK this is confusing. In your first post you said replaced. That
> suggests you used 'btrfs replace start' rather than 'btrfs device add'
> followed by 'btrfs device remove'. So which did you do?
> 
> If you did the latter, then there's no resize necessary.
> 
> 
>> The system copies everything on it
>> to the device which has the most space, the empty 6TB device.  But you
>> are saying it decides to _shrink_ the 6TB device now that we know it is
>> a 2TB device being removed?
> 
> No I'm not. The source of confusion appears to be that you're
> unfamiliar with 'btrfs replace' so you mean 'dev add' followed by 'dev
> remove' to mean replaced.
> 
> This line:
> devid3 size 5.43TiB used 1.42TiB path /dev/sdg2
> 
> suggests it's using the entire 6TB of the newly added drive, it's
> already at max size.
> 
> 
>> We didn't know the 2TB would be removed
>> when we added the 6TB, so I just can't fathom why the code would do
>> that.  In addition, the stats I get back say it didn't do that.
> 
> I don't understand the first part. Whether you asked for 'dev remove'
> or you used 'replace' both of those mean removing some device. You
> have to specify the device to be removed.
> 
> Now might be a good time to actually write out the exact commands you've used.
> 
> 
>>
>> More to the point, after the resize, the balance is still not changing
>> any size numbers.  It should be moving blocks to the most empty device,
>> should it not?There is almost no space on devids 1 and 2, so it
>> would not copy any chunks there.
>>
>> I'm starting to think this is a bug, but I'll keep plugging.
> 
> Could be a bug. Three drive raid1 of different sizes is somewhat
> uncommon so it's possible it's hit an edge case somehow. Qu will know
> more about how to find out why it's not allocating mostly to the
> larger drive. The eventual work around might end up being to convert
> data chunks to single, then convert back to raid1. But before doing
> that it'd be better to find out why it's not doing the right thing the
> normal way.
> 
> 


Re: RAID-1 refuses to balance large drive

2016-03-23 Thread Chris Murphy
On Tue, Mar 22, 2016 at 11:54 PM, Brad Templeton  wrote:
> Actually, the URL suggests that all the space will be used, which is
> what I had read about btrfs, that it handled this.

It will. But it does this by dominating writes to the devices that
have the most free space, until all devices have the same free space.


> But again, how could it possibly know to restrict the new device to only
> using 2TB?

In your case, before resizing it, it's just inheriting the size from
the device being replaced.

>
> Stage one:  Add the new 6TB device.  The 2TB device is still present.
>
> Stage two:  Remove the 2TB device.

OK this is confusing. In your first post you said replaced. That
suggests you used 'btrfs replace start' rather than 'btrfs device add'
followed by 'btrfs device remove'. So which did you do?

If you did the latter, then there's no resize necessary.


> The system copies everything on it
> to the device which has the most space, the empty 6TB device.  But you
> are saying it decides to _shrink_ the 6TB device now that we know it is
> a 2TB device being removed?

No I'm not. The source of confusion appears to be that you're
unfamiliar with 'btrfs replace' so you mean 'dev add' followed by 'dev
remove' to mean replaced.

This line:
devid3 size 5.43TiB used 1.42TiB path /dev/sdg2

suggests it's using the entire 6TB of the newly added drive, it's
already at max size.


> We didn't know the 2TB would be removed
> when we added the 6TB, so I just can't fathom why the code would do
> that.  In addition, the stats I get back say it didn't do that.

I don't understand the first part. Whether you asked for 'dev remove'
or you used 'replace' both of those mean removing some device. You
have to specify the device to be removed.

Now might be a good time to actually write out the exact commands you've used.


>
> More to the point, after the resize, the balance is still not changing
> any size numbers.  It should be moving blocks to the most empty device,
> should it not?There is almost no space on devids 1 and 2, so it
> would not copy any chunks there.
>
> I'm starting to think this is a bug, but I'll keep plugging.

Could be a bug. A three-drive raid1 of different sizes is somewhat
uncommon, so it's possible it's hit an edge case somehow. Qu will know
more about how to find out why it's not allocating mostly to the
larger drive. The eventual workaround might end up being to convert
data chunks to single, then convert back to raid1. But before doing
that it'd be better to find out why it's not doing the right thing the
normal way.
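
(For completeness, that workaround would look roughly like the two
commands below. While data sits in the single profile it has only one
copy, so it's very much a last resort.)

  btrfs balance start -dconvert=single /local   # data block groups only; metadata stays raid1
  btrfs balance start -dconvert=raid1 /local    # convert data back to raid1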


-- 
Chris Murphy


Re: RAID-1 refuses to balance large drive

2016-03-22 Thread Chris Murphy
On Tue, Mar 22, 2016 at 10:47 PM, Brad Templeton  wrote:
> That's rather counterintuitive behaviour.  In most FSs, resizes are
> needed when you do things like change the size of an underlying
> partition, or you weren't using all the partition.  When you add one
> drive with device add, and you then remove another with device delete,
> why and how would the added device know to size itself to the device
> that you are planning to delete?   I.e., I don't see how it could know
> (you add the new drive before even telling it you want to remove the old
> one) and I also can't see a reason it would not use all the drive you
> tell it to add.
>
> In any event, I did a btrfs fi resize 3:max /local on the 6TB as you
> suggest, and have another balance running, but like all the others it
> appears to be doing nothing, though of course it will take hours.  Are
> you sure it works that way?  Even before the resize, as you see below,
> it indicates the volume is 6TB with 4TB of unallocated space.  It is
> only the df that says full (and the fact that there is no unallocated
> space on the 3TB and 4TB drives).


It does work that way, and I agree offhand that the lack of an
automatic resize to max is counterintuitive. I'd think the user has
implicitly set the size they want by handing the device over to Btrfs,
be it a whole device, partition or LV. There might be some notes in
the mail archive, and possibly comments in btrfs-progs, that explain
the logic.

devid1 size 3.62TiB used 3.62TiB path /dev/sdi2
devid2 size 2.73TiB used 2.72TiB path /dev/sdh
devid3 size 5.43TiB used 1.42TiB path /dev/sdg

Also note that after a successful balance this will not be evenly
allocated, because the device sizes aren't even. Simplistically it'll
do something like this: put one copy of each chunk on devid3 and the
other copy on devid1, until the free space on devid1 equals the free
space on devid2. Then it'll start alternating the second copy between
devid1 and devid2, while the first copy continues to land on devid3.
That happens until free space on all three is equal, and then
allocation alternates among all three to maintain approximately equal
free space remaining.

You might find this helpful:
http://carfax.org.uk/btrfs-usage/
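
Back-of-the-envelope, the usable raid1 space that calculator reports
works out to roughly min(total/2, total minus the largest device). A
quick shell sketch with the three devid sizes above, rounded to GiB
(this ignores metadata overhead and the temporary 20GB device, so it's
only approximate):

  d1=3707; d2=2795; d3=5560                  # ~3.62TiB, ~2.73TiB, ~5.43TiB in GiB
  total=$(( d1 + d2 + d3 ))
  rest=$(( total - d3 ))                     # everything except the largest drive
  half=$(( total / 2 ))
  usable=$(( half < rest ? half : rest ))
  echo "approx usable raid1 data: ${usable} GiB"   # ~6031 GiB, about 5.9TiB here

That is roughly consistent with the fi usage output quoted earlier in
the thread: ~3.87TiB of data already allocated plus ~2.02TiB of
"Free (estimated)".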



-- 
Chris Murphy


Re: RAID-1 refuses to balance large drive

2016-03-22 Thread Brad Templeton
That's rather counterintuitive behaviour.  In most FSs, resizes are
needed when you do things like change the size of an underlying
partition, or you weren't using all the partition.  When you add one
drive with device add, and you then remove another with device delete,
why and how would the added device know to size itself to the device
that you are planning to delete?   I.e., I don't see how it could know
(you add the new drive before even telling it you want to remove the old
one) and I also can't see a reason it would not use all the drive you
tell it to add.

In any event, I did a btrfs fi resize 3:max /local on the 6TB as you
suggest, and have another balance running, but like all the others it
appears to be doing nothing, though of course it will take hours.  Are
you sure it works that way?  Even before the resize, as you see below,
it indicates the volume is 6TB with 4TB of unallocated space.  It is
only the df that says full (and the fact that there is no unallocated
space on the 3TB and 4TB drives).

On 03/22/2016 09:01 PM, Qu Wenruo wrote:
> 
> 
> Brad Templeton wrote on 2016/03/22 17:47 -0700:
>> I have a RAID 1, and was running a bit low, so replaced a 2TB drive with
>> a 6TB.  The other drives are a 3TB and a 4TB.  After switching the
>> drive, I did a balance and ... essentially nothing changed.  It did not
>> balance clusters over to the 6TB drive off of the other 2 drives.  I
>> found it odd, and wondered if it would do it as needed, but as time went
>> on, the filesys got full for real.
> 
> Did you resize the replaced device to max?
> Without a resize, btrfs still considers it can only use 2TB of the 6TB device.
> 
> Thanks,
> Qu
> 
>>
>> Making inquiries on the IRC channel, it was suggested that perhaps the
>> drives were too full for a balance, but they had at least 50GB free, I
>> would estimate, when I swapped.  As a test, I added a 4th drive, a spare
>> 20GB partition, and did a balance.  The balance did indeed balance the 3
>> small drives, so they now each have 6GB unallocated, but the big drive
>> remained unchanged.   The balance reported it operated on almost all the
>> clusters, though.
>>
>> Linux kernel 4.2.0 (Ubuntu Wily)
>>
>> Label: 'butter'  uuid: a91755d4-87d8-4acd-ae08-c11e7f1f5438
>>  Total devices 4 FS bytes used 3.88TiB
>>  devid1 size 3.62TiB used 3.62TiB path /dev/sdi2
>>  devid2 size 2.73TiB used 2.72TiB path /dev/sdh
>>  devid3 size 5.43TiB used 1.42TiB path /dev/sdg2
>>  devid4 size 20.00GiB used 14.00GiB path /dev/sda1
>>
>> btrfs fi usage /local
>>
>> Overall:
>>  Device size:  11.81TiB
>>  Device allocated:  7.77TiB
>>  Device unallocated:  4.04TiB
>>  Device missing:  0.00B
>>  Used:  7.76TiB
>>  Free (estimated):  2.02TiB  (min: 2.02TiB)
>>  Data ratio:   2.00
>>  Metadata ratio:   2.00
>>  Global reserve:  512.00MiB  (used: 0.00B)
>>
>> Data,RAID1: Size:3.87TiB, Used:3.87TiB
>> /dev/sda1  14.00GiB
>> /dev/sdg2   1.41TiB
>> /dev/sdh    2.72TiB
>> /dev/sdi2   3.61TiB
>>
>> Metadata,RAID1: Size:11.00GiB, Used:9.79GiB
>> /dev/sdg2   5.00GiB
>> /dev/sdh    7.00GiB
>> /dev/sdi2  10.00GiB
>>
>> System,RAID1: Size:32.00MiB, Used:572.00KiB
>> /dev/sdg2  32.00MiB
>> /dev/sdi2  32.00MiB
>>
>> Unallocated:
>> /dev/sda1   6.00GiB
>> /dev/sdg2   4.02TiB
>> /dev/sdh    5.52GiB
>> /dev/sdi2   7.36GiB
>>
>> --
>> btrfs fi df /local
>> Data, RAID1: total=3.87TiB, used=3.87TiB
>> System, RAID1: total=32.00MiB, used=572.00KiB
>> Metadata, RAID1: total=11.00GiB, used=9.79GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>
>> I would have presumed that a balance would take blocks found on both the
>> 3TB and 4TB, and move one of them over to the 6TB until all had 1.3TB of
>> unallocated space.  But this does not happen.  Any clues on how to make
>> it happen?
>>
>>
> 


Re: RAID-1 refuses to balance large drive

2016-03-22 Thread Qu Wenruo



Brad Templeton wrote on 2016/03/22 17:47 -0700:

I have a RAID 1, and was running a bit low, so replaced a 2TB drive with
a 6TB.  The other drives are a 3TB and a 4TB.  After switching the
drive, I did a balance and ... essentially nothing changed.  It did not
balance clusters over to the 6TB drive off of the other 2 drives.  I
found it odd, and wondered if it would do it as needed, but as time went
on, the filesys got full for real.


Did you resize the replaced device to max?
Without a resize, btrfs still considers it can only use 2TB of the 6TB device.
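
A minimal sketch of that step, assuming the filesystem is mounted at
/local and the new 6TB drive is devid 3 as in the listing below:

  btrfs filesystem show /local           # check the per-device 'size' column
  btrfs filesystem resize 3:max /local   # grow devid 3 to the full size of its disk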

Thanks,
Qu



Making inquiries on the IRC channel, it was suggested that perhaps the
drives were too full for a balance, but they had at least 50GB free, I
would estimate, when I swapped.  As a test, I added a 4th drive, a spare
20GB partition, and did a balance.  The balance did indeed balance the 3
small drives, so they now each have 6GB unallocated, but the big drive
remained unchanged.   The balance reported it operated on almost all the
clusters, though.

Linux kernel 4.2.0 (Ubuntu Wily)

Label: 'butter'  uuid: a91755d4-87d8-4acd-ae08-c11e7f1f5438
 Total devices 4 FS bytes used 3.88TiB
 devid1 size 3.62TiB used 3.62TiB path /dev/sdi2
 devid2 size 2.73TiB used 2.72TiB path /dev/sdh
 devid3 size 5.43TiB used 1.42TiB path /dev/sdg2
 devid4 size 20.00GiB used 14.00GiB path /dev/sda1

btrfs fi usage /local

Overall:
 Device size:  11.81TiB
 Device allocated:  7.77TiB
 Device unallocated:  4.04TiB
 Device missing:  0.00B
 Used:  7.76TiB
 Free (estimated):  2.02TiB  (min: 2.02TiB)
 Data ratio:   2.00
 Metadata ratio:   2.00
 Global reserve:  512.00MiB  (used: 0.00B)

Data,RAID1: Size:3.87TiB, Used:3.87TiB
/dev/sda1  14.00GiB
/dev/sdg2   1.41TiB
/dev/sdh     2.72TiB
/dev/sdi2   3.61TiB

Metadata,RAID1: Size:11.00GiB, Used:9.79GiB
/dev/sdg2   5.00GiB
/dev/sdh     7.00GiB
/dev/sdi2  10.00GiB

System,RAID1: Size:32.00MiB, Used:572.00KiB
/dev/sdg2  32.00MiB
/dev/sdi2  32.00MiB

Unallocated:
/dev/sda1   6.00GiB
/dev/sdg2   4.02TiB
/dev/sdh     5.52GiB
/dev/sdi2   7.36GiB

--
btrfs fi df /local
Data, RAID1: total=3.87TiB, used=3.87TiB
System, RAID1: total=32.00MiB, used=572.00KiB
Metadata, RAID1: total=11.00GiB, used=9.79GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

I would have presumed that a balance would take blocks found on both the
3TB and 4TB, and move one of them over to the 6TB until all had 1.3TB of
unallocated space.  But this does not happen.  Any clues on how to make
it happen?









RAID-1 refuses to balance large drive

2016-03-22 Thread Brad Templeton
I have a RAID 1, and was running a bit low, so replaced a 2TB drive with
a 6TB.  The other drives are a 3TB and a 4TB.  After switching the
drive, I did a balance and ... essentially nothing changed.  It did not
balance clusters over to the 6TB drive off of the other 2 drives.  I
found it odd, and wondered if it would do it as needed, but as time went
on, the filesys got full for real.

Making inquiries on the IRC channel, it was suggested that perhaps the
drives were too full for a balance, but they had at least 50GB free, I
would estimate, when I swapped.  As a test, I added a 4th drive, a spare
20GB partition, and did a balance.  The balance did indeed balance the 3
small drives, so they now each have 6GB unallocated, but the big drive
remained unchanged.   The balance reported it operated on almost all the
clusters, though.

Linux kernel 4.2.0 (Ubuntu Wily)

Label: 'butter'  uuid: a91755d4-87d8-4acd-ae08-c11e7f1f5438
Total devices 4 FS bytes used 3.88TiB
devid1 size 3.62TiB used 3.62TiB path /dev/sdi2
devid2 size 2.73TiB used 2.72TiB path /dev/sdh
devid3 size 5.43TiB used 1.42TiB path /dev/sdg2
devid4 size 20.00GiB used 14.00GiB path /dev/sda1

btrfs fi usage /local

Overall:
Device size:  11.81TiB
Device allocated:  7.77TiB
Device unallocated:  4.04TiB
Device missing:  0.00B
Used:  7.76TiB
Free (estimated):  2.02TiB  (min: 2.02TiB)
Data ratio:   2.00
Metadata ratio:   2.00
Global reserve:  512.00MiB  (used: 0.00B)

Data,RAID1: Size:3.87TiB, Used:3.87TiB
   /dev/sda1  14.00GiB
   /dev/sdg2   1.41TiB
   /dev/sdh     2.72TiB
   /dev/sdi2   3.61TiB

Metadata,RAID1: Size:11.00GiB, Used:9.79GiB
   /dev/sdg2   5.00GiB
   /dev/sdh     7.00GiB
   /dev/sdi2  10.00GiB

System,RAID1: Size:32.00MiB, Used:572.00KiB
   /dev/sdg2  32.00MiB
   /dev/sdi2  32.00MiB

Unallocated:
   /dev/sda1   6.00GiB
   /dev/sdg2   4.02TiB
   /dev/sdh     5.52GiB
   /dev/sdi2   7.36GiB

--
btrfs fi df /local
Data, RAID1: total=3.87TiB, used=3.87TiB
System, RAID1: total=32.00MiB, used=572.00KiB
Metadata, RAID1: total=11.00GiB, used=9.79GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

I would have presumed that a balance would take blocks found on both the
3TB and 4TB, and move one of them over to the 6TB until all had 1.3TB of
unallocated space.  But this does not happen.  Any clues on how to make
it happen?

