Re: Convert from RAID 5 to 10

2016-12-01 Thread Wilson Meier
On 30/11/16 at 17:48, Austin S. Hemmelgarn wrote:
> On 2016-11-30 10:49, Wilson Meier wrote:
>> On 30/11/16 at 15:37, Austin S. Hemmelgarn wrote:
>>
>> Transferring this to a car analogy, just to make it a bit more funny:
>> The airbag (whatever raid level) itself is OK, but the microcontroller
>> (general btrfs) which is responsible for inflating the airbag suffers
>> from problems, sometimes doesn't inflate it, and the manufacturer
>> doesn't mention that fact.
>> From your point of view the airbag is OK. From my point of view -> Don't
>> buy that car!!!
>> Don't you think that the fact that the life saver has problems should
>> be noted, and that every dependent component should point to that fact?
>> I think it should.
>> I'm not talking about performance issues, I'm talking about data loss.
>> Now the next one can throw in "Backups, always make backups!".
>> Sure, but backup is backup and raid is raid. Both have their own
>> concerns.
> A better analogy for a car would be something along the lines of the
> radio working fine but the general wiring having issues that cause all
> the electronics in the car to stop working under certain
> circumstances. In that case, the radio itself is absolutely OK, but it
> suffers from issues caused directly by poor design elsewhere in the
> vehicle.
Ahm, no. You cannot swap a safety mechanism (raid) for a comfort feature
(compression) and treat them as equally important.
It makes a serious difference whether the airbag doesn't work properly or
whether you can't listen to music while you're driving into a wall.
Anyway, we should stop this here.
>>>> I'm not angry or something like that :) .
>>>> I just would like to be able to read such information about
>>>> the storage I put my personal data (> 3 TB) on in its official wiki.
> There are more places than the wiki to look for info about BTRFS (and
> this is the case about almost any piece of software, not just BTRFS,
> very few things have one central source for everything).  I don't mean
> to sound unsympathetic, but given what you're saying, it's sounding
> more and more like you didn't look at anything beyond the wiki and
> should have checked other sources as well.
This is your assumption.


On 01/12/16 at 07:47, Duncan wrote:
> Austin S. Hemmelgarn posted on Wed, 30 Nov 2016 11:48:57 -0500 as
> excerpted:
>> On 2016-11-30 10:49, Wilson Meier wrote:
>>> Do you also have all the home users in mind who go on vacation (sometimes
>>> more than 3 weeks) and don't have a 24/7 support team to replace monitored
>>> disks which do report SMART errors?
>> Better than 90% of people I know either shut down their systems when
>> they're going to be away for a long period of time, or like me have
>> ways to log in remotely and tell the FS to not use that disk anymore.
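
For reference, a hedged sketch of what "tell the FS to not use that disk
anymore" can look like on btrfs (device names and mount point are
hypothetical; this assumes the array is still complete and keeps enough
devices for its profile):

  # btrfs device delete /dev/sdX /mnt
    (migrates the data off the ailing disk and removes it from the fs)
  # btrfs replace start /dev/sdX /dev/sdY /mnt
    (or swap it for a spare that is already attached)
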
> https://btrfs.wiki.kernel.org/index.php/Getting_started ... has
> two warnings offset in red right in the first section:
> * If you have btrfs filesystems, run the latest kernel.
I do. OK, not the very latest, but I'm always on the latest major version.
Right now I have 4.8.4 and the very latest is 4.8.11.
> * You should keep and test backups of your data, and be prepared to use 
> them.
I have daily backups.
> As to the three weeks vacation thing... And "daily use" != "three
> weeks without physical access to something you're going to actually be
> relying on for parts of those three weeks".
>
Maybe I have my own mail server and ownCloud to serve files to my
family? Maybe I'm out of the country somewhere with no internet access?
I will not comment on this any further as it leads us nowhere.


In general I think that this discussion is taking a completely wrong
direction.
The only thing I have asked for is to document the *known*
problems/flaws/limitations of all raid profiles and to link to them from
the stability matrix.

Regarding raid10:
Even if one knows that btrfs handles things at the chunk level, one would
assume that the code is written in a way that keeps the two copies of a
chunk on a consistent pair of devices.
Otherwise raid10 ***can*** become pretty useless in terms of data
redundancy, and 2 x raid1 with LVM on top should be considered as a
replacement (see the sketch below).
This is a serious thing and should be documented. If it is documented
somewhere then please point me to it, as I cannot find a word about it
anywhere.
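
A minimal sketch of the "2 x raid1 with LVM" layout mentioned above, assuming
four disks with hypothetical device names; md and LVM handle the redundancy
and btrfs runs as a plain single-device filesystem on top (which also means
btrfs can no longer self-heal checksum errors from a second copy):

  # mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb
  # mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdc /dev/sdd
  # pvcreate /dev/md0 /dev/md1
  # vgcreate vg_data /dev/md0 /dev/md1
  # lvcreate --stripes 2 --extents 100%FREE --name data vg_data
  # mkfs.btrfs /dev/vg_data/data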

Cheers,
Wilson




Re: Convert from RAID 5 to 10

2016-11-30 Thread Wilson Meier


On 30/11/16 at 15:37, Austin S. Hemmelgarn wrote:
> On 2016-11-30 08:12, Wilson Meier wrote:
>> On 30/11/16 at 11:41, Duncan wrote:
>>> Wilson Meier posted on Wed, 30 Nov 2016 09:35:36 +0100 as excerpted:
>>>
>>>> On 30/11/16 at 09:06, Martin Steigerwald wrote:
>>>>> On Wednesday, 30 November 2016, 10:38:08 CET, Roman Mamedov wrote:
>>>>>> [snip]
>>>>> So the stability matrix would need to be updated not to recommend any
>>>>> kind of BTRFS RAID 1 at the moment?
>>>>>
>>>>> Actually I ran into the BTRFS RAID 1 going read-only after the first
>>>>> attempt of mounting it "degraded" just a short time ago.
>>>>>
>>>>> BTRFS still needs way more stability work, it seems to me.
>>>>>
>>>> I would say the matrix should be updated to not recommend any RAID
>>>> level, as from the discussion it seems all of them have flaws.
>>>> To me RAID is broken if one cannot expect to recover from a device
>>>> failure in a solid way, as this is why RAID is used.
>>>> Correct me if I'm wrong. Right now I'm thinking about
>>>> migrating to another FS and/or hardware RAID.
>>> It should be noted that no list regular that I'm aware of anyway, would
>>> make any claims about btrfs being stable and mature either now or in the
>>> near-term future in any case.  Rather to the contrary, as I generally put
>>> it, btrfs is still stabilizing and maturing, with backups one is willing
>>> to use (and as any admin of any worth would say, a backup that hasn't
>>> been tested usable isn't yet a backup; the job of creating the backup
>>> isn't done until that backup has been tested actually usable for
>>> recovery) still extremely strongly recommended.  Similarly, keeping up
>>> with the list is recommended, as is staying relatively current on both
>>> the kernel and userspace (generally considered to be within the latest
>>> two kernel series of either current or LTS series kernels, and with a
>>> similarly versioned btrfs userspace).
>>>
>>> In that context, btrfs single-device and raid1 (and raid0 of course) are
>>> quite usable and as stable as btrfs in general is, that being stabilizing
>>> but not yet fully stable and mature, with raid10 being slightly less so
>>> and raid56 being much more experimental/unstable at this point.
>>>
>>> But that context never claims full stability even for the relatively
>>> stable raid1 and single device modes, and in fact anticipates that there
>>> may be times when recovery from the existing filesystem may not be
>>> practical, thus the recommendation to keep tested usable backups at the
>>> ready.
>>>
>>> Meanwhile, it remains relatively common on this list for those wondering
>>> about their btrfs on long-term-stale (not a typo) "enterprise" distros,
>>> or even debian-stale, to be actively steered away from btrfs, especially
>>> if they're not willing to update to something far more current than those
>>> distros often provide, because in general, the current stability status
>>> of btrfs is in conflict with the reason people generally choose to use
>>> that level of old and stale software in the first place -- they
>>> prioritize tried and tested to work, stable and mature, over the latest
>>> generally newer and flashier featured but sometimes not entirely stable,
>>> and btrfs at this point simply doesn't meet that sort of stability/
>>> maturity expectations, nor is it likely to for some time (measured in
>>> years), due to all the reasons enumerated so well in the above thread.
>>>
>>>
>>> In that context, the stability status matrix on the wiki is already
>>> reasonably accurate, certainly so IMO, because "OK" in context means as
>>> OK as btrfs is in general, and btrfs itself remains still stabilizing,
>>> not fully stable and mature.
>>>
>>> If there IS an argument as to the accuracy of the raid0/1/10 OK status,
>>> I'd argue it's purely due to people not understanding the status of btrfs
>>> in general, and that if there's a general deficiency at all, it's in the
>>> lack of a general stability status paragraph on that page itself
>>> explaining all this, despite the fact that the main https:/

Re: Convert from RAID 5 to 10

2016-11-30 Thread Wilson Meier
On 30/11/16 at 11:41, Duncan wrote:
> Wilson Meier posted on Wed, 30 Nov 2016 09:35:36 +0100 as excerpted:
>
>> On 30/11/16 at 09:06, Martin Steigerwald wrote:
>>> On Wednesday, 30 November 2016, 10:38:08 CET, Roman Mamedov wrote:
>>>> [snip]
>>> So the stability matrix would need to be updated not to recommend any
>>> kind of BTRFS RAID 1 at the moment?
>>>
>>> Actually I ran into the BTRFS RAID 1 going read-only after the first
>>> attempt of mounting it "degraded" just a short time ago.
>>>
>>> BTRFS still needs way more stability work, it seems to me.
>>>
>> I would say the matrix should be updated to not recommend any RAID level,
>> as from the discussion it seems all of them have flaws.
>> To me RAID is broken if one cannot expect to recover from a device
>> failure in a solid way, as this is why RAID is used.
>> Correct me if I'm wrong. Right now I'm thinking about
>> migrating to another FS and/or hardware RAID.
> It should be noted that no list regular that I'm aware of anyway, would 
> make any claims about btrfs being stable and mature either now or in the 
> near-term future in any case.  Rather to the contrary, as I generally put 
> it, btrfs is still stabilizing and maturing, with backups one is willing 
> to use (and as any admin of any worth would say, a backup that hasn't 
> been tested usable isn't yet a backup; the job of creating the backup 
> isn't done until that backup has been tested actually usable for 
> recovery) still extremely strongly recommended.  Similarly, keeping up 
> with the list is recommended, as is staying relatively current on both 
> the kernel and userspace (generally considered to be within the latest 
> two kernel series of either current or LTS series kernels, and with a 
> similarly versioned btrfs userspace).
>
> In that context, btrfs single-device and raid1 (and raid0 of course) are 
> quite usable and as stable as btrfs in general is, that being stabilizing 
> but not yet fully stable and mature, with raid10 being slightly less so 
> and raid56 being much more experimental/unstable at this point.
>
> But that context never claims full stability even for the relatively 
> stable raid1 and single device modes, and in fact anticipates that there 
> may be times when recovery from the existing filesystem may not be 
> practical, thus the recommendation to keep tested usable backups at the 
> ready.
>
> Meanwhile, it remains relatively common on this list for those wondering 
> about their btrfs on long-term-stale (not a typo) "enterprise" distros, 
> or even debian-stale, to be actively steered away from btrfs, especially 
> if they're not willing to update to something far more current than those 
> distros often provide, because in general, the current stability status 
> of btrfs is in conflict with the reason people generally choose to use 
> that level of old and stale software in the first place -- they 
> prioritize tried and tested to work, stable and mature, over the latest 
> generally newer and flashier featured but sometimes not entirely stable, 
> and btrfs at this point simply doesn't meet that sort of stability/
> maturity expectations, nor is it likely to for some time (measured in 
> years), due to all the reasons enumerated so well in the above thread.
>
>
> In that context, the stability status matrix on the wiki is already 
> reasonably accurate, certainly so IMO, because "OK" in context means as 
> OK as btrfs is in general, and btrfs itself remains still stabilizing, 
> not fully stable and mature.
>
> If there IS an argument as to the accuracy of the raid0/1/10 OK status, 
> I'd argue it's purely due to people not understanding the status of btrfs 
> in general, and that if there's a general deficiency at all, it's in the 
> lack of a general stability status paragraph on that page itself 
> explaining all this, despite the fact that the main https://
> btrfs.wiki.kernel.org landing page states quite plainly under stability 
> status that btrfs remains under heavy development and that current 
> kernels are strongly recommended.  (Tho were I editing it, there'd 
> certainly be a more prominent mention of keeping backups at the ready as 
> well.)
>
Hi Duncan,

I understand your arguments but cannot fully agree.
First of all, I'm not sticking with old, stale versions of anything, as I
try to keep my system up to date.
My kernel is 4.8.4 (Gentoo) and btrfs-progs is 4.8.4.
That being said, I'm quite aware of the heavy development status of
btrfs, but pointing the finger at the users, saying that they don't fully
understand the status of btrfs, without giving the information on the
wiki is in my opinion not the right

Re: Convert from RAID 5 to 10

2016-11-30 Thread Wilson Meier


On 30/11/16 at 09:06, Martin Steigerwald wrote:
> On Wednesday, 30 November 2016, 10:38:08 CET, Roman Mamedov wrote:
>> On Wed, 30 Nov 2016 00:16:48 +0100
>>
>> Wilson Meier <wilson.me...@gmail.com> wrote:
>>> That said, btrfs shouldn't be used for anything other than raid1, as every
>>> other raid level has serious problems or at least doesn't work as the
>>> expected raid level (in terms of failure recovery).
>> RAID1 shouldn't be used either:
>>
>> *) Read performance is not optimized: all metadata is always read from the
>> first device unless it has failed, data reads are supposedly balanced
>> between devices per PID of the process reading. Better implementations
>> dispatch reads per request to devices that are currently idle.
>>
>> *) Write performance is not optimized: during long full-bandwidth sequential
>> writes it is common to see devices writing not in parallel, but with long
>> periods of just one device writing, then another. (Admittedly it has been
>> some time since I tested that.)
>>
>> *) A degraded RAID1 won't mount by default.
>>
>> If this was the root filesystem, the machine won't boot.
>>
>> To mount it, you need to add the "degraded" mount option.
>> However you have exactly a single chance at that: you MUST restore the RAID
>> to a non-degraded state while it's mounted during that session, since it
>> won't ever mount again in r/w+degraded mode, and in r/o mode you can't
>> perform any operations on the filesystem, including adding/removing
>> devices.
>>
>> *) It does not properly handle a device disappearing during operation.
>> (There is a patchset to add that).
>>
>> *) It does not properly handle said device returning (under a
>> different /dev/sdX name, for bonus points).
>>
>> Most of these also apply to all other RAID levels.
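
A hedged illustration of the degraded-mount point in the list above (device
names, devid and mount point are hypothetical, and the "single chance"
limitation described there still applies):

  # mount -o degraded /dev/sdb /mnt
  # btrfs replace start 3 /dev/sde /mnt
    (replace the missing devid 3 with a new disk while the fs is mounted r/w)
  # btrfs device delete missing /mnt
    (or shrink to the remaining devices, if enough are left for the profile)
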
> So the stability matrix would need to be updated not to recommend any kind of 
> BTRFS RAID 1 at the moment?
>
> Actually I ran into the BTRFS RAID 1 going read-only after the first attempt
> of mounting it "degraded" just a short time ago.
>
> BTRFS still needs way more stability work, it seems to me.
>
I would say the matrix should be updated to not recommend any RAID level,
as from the discussion it seems all of them have flaws.
To me RAID is broken if one cannot expect to recover from a device
failure in a solid way, as this is why RAID is used.
Correct me if I'm wrong. Right now I'm thinking about
migrating to another FS and/or hardware RAID.




Re: Convert from RAID 5 to 10

2016-11-29 Thread Wilson Meier


On 30.11.2016 00:49, Chris Murphy wrote:
> On Tue, Nov 29, 2016 at 4:16 PM, Wilson Meier <wilson.me...@gmail.com> wrote:
>>
>>
>> On 29.11.2016 23:52, Chris Murphy wrote:
>>> On Tue, Nov 29, 2016 at 3:34 PM, Wilson Meier <wilson.me...@gmail.com> 
>>> wrote:
>>>> On 29.11.2016 18:54, Austin S. Hemmelgarn wrote:
>>>>> On 2016-11-29 12:20, Florian Lindner wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I have 4 harddisks with 3TB capacity each. They are all used in a
>>>>>> btrfs RAID 5. It has come to my attention, that there
>>>>>> seem to be major flaws in btrfs' raid 5 implementation. Because of
>>>>>> that, I want to convert the raid 5 to a raid 10
>>>>>> and I have several questions.
>>>>>>
>>>>>> * Is that possible as an online conversion?
>>>>> Yes, as long as you have a complete array to begin with (converting from
>>>>> a degraded raid5/6 array has the same issues as rebuilding a degraded
>>>>> raid5/6 array).
>>>>>>
>>>>>> * Since my effective capacity will shrink during conversions, does
>>>>>> btrfs check if there is enough free capacity to
>>>>>> convert? As you see below, right now it's probably too full, but I'm
>>>>>> going to delete some stuff.
>>>>> No, you'll have to do the math yourself.  This would be a great project
>>>>> idea to place on the wiki though.
>>>>>>
>>>>>> * I understand the command to convert is
>>>>>>
>>>>>> btrfs balance start -dconvert=raid10 -mconvert=raid10 /mnt
>>>>>>
>>>>>> Correct?
>>>>> Yes, but I would personally convert first metadata then data.  The
>>>>> raid10 profile gets better performance than raid5, so converting the
>>>>> metadata first (by issuing a balance just covering the metadata) should
>>>>> speed up the data conversion a bit.
>>>>>>
>>>>>> * What disks are allowed to fail? My understanding of a raid 10 is
>>>>>> like that
>>>>>>
>>>>>> disks = {a, b, c, d}
>>>>>>
>>>>>> raid0( raid1(a, b), raid1(c, d) )
>>>>>>
>>>>>> This way (a XOR b) AND (c XOR d) are allowed to fail without the raid
>>>>>> failing (either a or b, and c or d, are allowed to fail)
>>>>>>
>>>>>> How is that with a btrfs raid 10?
>>>>> A BTRFS raid10 can only sustain one disk failure.  Ideally, it would
>>>>> work like you show, but in practice it doesn't.
>>>> I'm a little bit concerned right now. I migrated my 4 disk raid6 to
>>>> raid10 because of the known raid5/6 problems. I assumed that btrfs
>>>> raid10 can handle 2 disk failures as long as they occur in different
>>>> stripes.
>>>> Could you please point out why it cannot sustain 2 disk failures?
>>>
>>> Conventional raid10 has a fixed assignment of which drives are
>>> mirrored pairs, and this doesn't happen with Btrfs at the device level
>>> but rather the chunk level. And a chunk stripe number is not fixed to
>>> a particular device, therefore it's possible a device will have more
>>> than one chunk stripe number. So what that means is the loss of two
>>> devices has a pretty decent chance of resulting in the loss of both
>>> copies of a chunk, whereas conventional RAID 10 must lose both
>>> mirrored pairs for data loss to happen.
>>>
>>> With very cursory testing what I've found is btrfs-progs establishes
>>> an initial stripe number to device mapping that's different than the
>>> kernel code. The kernel code appears to be pretty consistent so long
>>> as the member devices are identically sized. So it's probably not an
>>> unfixable problem, but the effect is that right now Btrfs raid10
>>> profile is more like raid0+1.
>>>
>>> You can use
>>> $ sudo btrfs insp dump-tr -t 3 /dev/
>>>
>>> That will dump the chunk tree, and you can see if any device has more
>>> than one chunk stripe number associated with it.
>>>
>>>
>> Huh, that makes sense. That probably should be fixed :)
>>
>> Given your advised command (extended it a bit for readability):
>> # btrfs insp dump-tr -t 3 /dev/mapper/luks-2.1 | grep "stripe " 

Re: Convert from RAID 5 to 10

2016-11-29 Thread Wilson Meier


On 29.11.2016 23:52, Chris Murphy wrote:
> On Tue, Nov 29, 2016 at 3:34 PM, Wilson Meier <wilson.me...@gmail.com> wrote:
>> On 29.11.2016 18:54, Austin S. Hemmelgarn wrote:
>>> On 2016-11-29 12:20, Florian Lindner wrote:
>>>> Hello,
>>>>
>>>> I have 4 harddisks with 3TB capacity each. They are all used in a
>>>> btrfs RAID 5. It has come to my attention, that there
>>>> seem to be major flaws in btrfs' raid 5 implementation. Because of
>> that, I want to convert the raid 5 to a raid 10
>>>> and I have several questions.
>>>>
>>>> * Is that possible as an online conversion?
>>> Yes, as long as you have a complete array to begin with (converting from
>>> a degraded raid5/6 array has the same issues as rebuilding a degraded
>>> raid5/6 array).
>>>>
>>>> * Since my effective capacity will shrink during conversions, does
>>>> btrfs check if there is enough free capacity to
>>>> convert? As you see below, right now it's probably too full, but I'm
>>>> going to delete some stuff.
>>> No, you'll have to do the math yourself.  This would be a great project
>>> idea to place on the wiki though.
>>>>
>>>> * I understand the command to convert is
>>>>
>>>> btrfs balance start -dconvert=raid10 -mconvert=raid10 /mnt
>>>>
>>>> Correct?
>>> Yes, but I would personally convert first metadata then data.  The
>>> raid10 profile gets better performance than raid5, so converting the
>>> metadata first (by issuing a balance just covering the metadata) should
>>> speed up the data conversion a bit.
>>>>
>>>> * What disks are allowed to fail? My understanding of a raid 10 is
>>>> like that
>>>>
>>>> disks = {a, b, c, d}
>>>>
>>>> raid0( raid1(a, b), raid1(c, d) )
>>>>
>> This way (a XOR b) AND (c XOR d) are allowed to fail without the raid
>> failing (either a or b, and c or d, are allowed to fail)
>>>>
>>>> How is that with a btrfs raid 10?
>>> A BTRFS raid10 can only sustain one disk failure.  Ideally, it would
>>> work like you show, but in practice it doesn't.
>> I'm a little bit concerned right now. I migrated my 4 disk raid6 to
>> raid10 because of the known raid5/6 problems. I assumed that btrfs
>> raid10 can handle 2 disk failures as long as they occur in different
>> stripes.
>> Could you please point out why it cannot sustain 2 disk failures?
> 
> Conventional raid10 has a fixed assignment of which drives are
> mirrored pairs, and this doesn't happen with Btrfs at the device level
> but rather the chunk level. And a chunk stripe number is not fixed to
> a particular device, therefore it's possible a device will have more
> than one chunk stripe number. So what that means is the loss of two
> devices has a pretty decent chance of resulting in the loss of both
> copies of a chunk, whereas conventional RAID 10 must lose both
> mirrored pairs for data loss to happen.
> 
> With very cursory testing what I've found is btrfs-progs establishes
> an initial stripe number to device mapping that's different than the
> kernel code. The kernel code appears to be pretty consistent so long
> as the member devices are identically sized. So it's probably not an
> unfixable problem, but the effect is that right now Btrfs raid10
> profile is more like raid0+1.
> 
> You can use
> $ sudo btrfs insp dump-tr -t 3 /dev/
> 
> That will dump the chunk tree, and you can see if any device has more
> than one chunk stripe number associated with it.
> 
> 
Huh, that makes sense. That probably should be fixed :)

Given your advised command (I extended it a bit for readability):
# btrfs insp dump-tr -t 3 /dev/mapper/luks-2.1 | grep "stripe " | awk '{
print $1" "$2" "$3" "$4 }' | sort -u

I get:
stripe 0 devid 1
stripe 0 devid 4
stripe 1 devid 2
stripe 1 devid 3
stripe 1 devid 4
stripe 2 devid 1
stripe 2 devid 2
stripe 2 devid 3
stripe 3 devid 1
stripe 3 devid 2
stripe 3 devid 3
stripe 3 devid 4

Now I'm even more concerned! Devid 1, for example, shows up under stripes
0, 2 and 3, so the mirroring clearly isn't pinned to fixed device pairs.
That said, btrfs shouldn't be used for anything other than raid1, as every
other raid level has serious problems or at least doesn't work as the
expected raid level (in terms of failure recovery).



Re: Convert from RAID 5 to 10

2016-11-29 Thread Wilson Meier
On 29.11.2016 18:54, Austin S. Hemmelgarn wrote:
> On 2016-11-29 12:20, Florian Lindner wrote:
>> Hello,
>>
>> I have 4 harddisks with 3TB capacity each. They are all used in a
>> btrfs RAID 5. It has come to my attention, that there
>> seem to be major flaws in btrfs' raid 5 implementation. Because of
>> that, I want to convert the raid 5 to a raid 10
>> and I have several questions.
>>
>> * Is that possible as an online conversion?
> Yes, as long as you have a complete array to begin with (converting from
> a degraded raid5/6 array has the same issues as rebuilding a degraded
> raid5/6 array).
>>
>> * Since my effective capacity will shrink during conversions, does
>> btrfs check if there is enough free capacity to
>> convert? As you see below, right now it's probably too full, but I'm
>> going to delete some stuff.
> No, you'll have to do the math yourself.  This would be a great project
> idea to place on the wiki though.
>>
>> * I understand the command to convert is
>>
>> btrfs balance start -dconvert=raid10 -mconvert=raid10 /mnt
>>
>> Correct?
> Yes, but I would personally convert first metadata then data.  The
> raid10 profile gets better performance than raid5, so converting the
> metadata first (by issuing a balance just covering the metadata) should
> speed up the data conversion a bit.
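
A hedged sketch of the two-step conversion described above (metadata first,
then data), using the same mount point as the balance command quoted earlier;
depending on the btrfs-progs version the system chunks may need an extra pass
with -sconvert=raid10 -f:

  # btrfs balance start -mconvert=raid10 /mnt
  # btrfs balance start -dconvert=raid10 /mnt
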
>>
>> * What disks are allowed to fail? My understanding of a raid 10 is
>> like that
>>
>> disks = {a, b, c, d}
>>
>> raid0( raid1(a, b), raid1(c, d) )
>>
>> This way (a XOR b) AND (c XOR d) are allowed to fail without the raid
>> failing (either a or b, and c or d, are allowed to fail)
>>
>> How is that with a btrfs raid 10?
> A BTRFS raid10 can only sustain one disk failure.  Ideally, it would
> work like you show, but in practice it doesn't.
I'm a little bit concerned right now. I migrated my 4 disk raid6 to
raid10 because of the known raid5/6 problems. I assumed that btrfs
raid10 can handle 2 disk failures as long as they occur in different
stripes.
Could you please point out why it cannot sustain 2 disk failures?

Thanks
>>
>> * Any other advice? ;-)
> You'll actually get significantly better performance with no loss of
> data safety by running BTRFS in raid1 mode on top of two RAID0 volumes
> (LVM/MD/hardware doesn't matter much).  I do this myself and see roughly
> 10-20% improved performance on average with my workloads.
> 
> If you do decide to do this, it's theoretically possible to do so
> online, but it's kind of tricky, so I won't post any instructions for
> that here unless someone asks for them.
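
A rough, hedged sketch of the layout Austin describes (these are not his
instructions, just one possible offline setup with hypothetical device names;
btrfs raid1 across the two md stripes keeps checksum-based self-healing):

  # mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sda /dev/sdb
  # mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/sdc /dev/sdd
  # mkfs.btrfs -d raid1 -m raid1 /dev/md0 /dev/md1
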
>>
>> Thanks a lot,
>>
>> Florian
>>
>>
>> Some information of my filesystem:
>>
>> # btrfs filesystem show /
>> Label: 'data'  uuid: 57e5b9e9-01ae-4f9e-8a3d-9f42204d7005
>> Total devices 4 FS bytes used 7.57TiB
>> devid    1 size 2.72TiB used 2.72TiB path /dev/sda4
>> devid    2 size 2.72TiB used 2.72TiB path /dev/sdb4
>> devid    3 size 2.72TiB used 2.72TiB path /dev/sdc4
>> devid    4 size 2.72TiB used 2.72TiB path /dev/sdd4
>>
>> # btrfs filesystem df /
>> Data, RAID5: total=8.14TiB, used=7.56TiB
>> System, RAID5: total=96.00MiB, used=592.00KiB
>> Metadata, RAID5: total=12.84GiB, used=11.06GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
> Based on this output, you will need to delete some data before you can
> convert to raid10.  With 4 2.72TiB drives, you're looking at roughly
> 5.44TiB of usable space, so you're probably going to have to delete at
> least 2-3TiB of data from this filesystem before converting.
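
A back-of-the-envelope check of that estimate (raid10 keeps two copies of
everything, so usable space is roughly half the raw capacity):

  4 x 2.72 TiB = 10.88 TiB raw        ->  ~5.44 TiB usable as raid10
  7.56 TiB currently used - 5.44 TiB  ->  ~2.1 TiB that has to be freed first
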
> 
> If you're not already using transparent compression, it could probably
> help some with this, but it likely won't save you more than a few
> hundred GB unless you are storing lots of data that compresses very well.
>>
>> # df -h
>> Filesystem  Size  Used Avail Use% Mounted on
>>
>> /dev/sda4        11T  7.6T  597G  93% /
> 


Re: [RFC] Preliminary BTRFS Encryption

2016-09-14 Thread Wilson Meier


> On 14.09.2016 at 09:02, Anand Jain <anand.j...@oracle.com> wrote:
> 
> 
> 
> Wilson,
> 
> Thanks for commenting. Pls see inline below..
> 
>> On 09/14/2016 12:42 AM, Wilson Meier wrote:
>> Hi Anand,
>> 
>> this is great news! Thanks for your work. I'm looking forward to using the
>> encryption.
>>
>> I would like to ask a few questions regarding the feature set.
>>
>> 1. Is encryption of an existing, filled and unencrypted subvolume possible
>> without manually moving the data?
> 
>  Encrypt contexts are set only on newly created files. However, you can
>  create an empty encrypted subvol and move files and dirs into it. In short,
>  you can't set the encrypt property on a non-empty subvolume as of now.

OK, so manually moving the data to a new encrypted subvolume is the only
possibility. Maybe there will be another way in the future. ;)
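
A hedged sketch of that manual migration, using the CLI from the RFC quoted
below (paths are hypothetical, and --reflink=never is used defensively to
force the data to be physically rewritten so the new copies pick up the
encryption context):

  # btrfs su create -e 'ctr(aes)' /btrfs/enc
  Passphrase:
  Again passphrase:
  # cp -a --reflink=never /btrfs/plain/. /btrfs/enc/
  # btrfs su delete /btrfs/plain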

> 
>> 2. What about encrypting the root and boot subvolume? Will it work with 
>> grub2?
> 
>  Keys are only in-memory, so they do not persist across boot and there is
>  no prompt for them at boot. I had keyctl code written to prompt
>  for the key but it isn't successful yet. Probably once we support a
>  keyfile, root/boot support will be possible.
> 

Currently I'm using dm-crypt and btrfs to achieve a fully encrypted system. I'm
looking forward to switching to a pure btrfs solution. Hopefully this will be
possible soon.

>> 3. How does btrfs rescue handle the encrypted subvolume to recover data in 
>> case of an emergency?
> 
>  btrfs rescue / btrfsck work as usual. btrfs restore, which
>  needs to decrypt the data, isn't supported.
> 

Don't get me wrong, but not being able to use btrfs restore is a showstopper,
as I already had a case where I could only rescue my data using the restore
command. In my opinion, in the current state of btrfs such recovery options
are key.
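
For context, a hedged example of the kind of rescue referred to here, run
against an ordinary unencrypted btrfs (device and target path hypothetical):

  # btrfs restore -v /dev/sdX /mnt/rescue-target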

>> 4. Is it possible to unlock a subvolume using a keyfile?
> 
>  Keyfile support is at the top of the list of things to be supported; it
>  helps testing as well.
> 

This goes hand in hand with my question about boot/root unlocking. 

> Thanks, Anand

Thanks for your feedback. I really appreciate it.
Wilson

> 
>> Thanks in advance,
>> 
>> Wilson
>> 
>> 
>>> On 13.09.2016 at 15:39, Anand Jain <anand.j...@oracle.com> wrote:
>>> 
>>> 
>>> This patchset adds btrfs encryption support.
>>> 
>>> The main objective of this series is to have bugs fixed and stability.
>>> I have verified with fstests to confirm that there is no regression.
>>> 
>>> A design write-up is coming next; however, here below is a quick example
>>> of the CLI usage. Please try it out and let me know if I have missed
>>> something.
>>> 
>>> Also would like to mention that a review from the security experts is due,
>>> which is important and I believe those review comments can be accommodated
>>> without major changes from here.
>>> 
>>> Also yes, thanks for the emails; I hear that per-file encryption and
>>> doing it in line with the VFS layer are also important, which is WIP
>>> among other things on the list.
>>> 
>>> As of now this patch set supports encryption per subvolume, as
>>> managing properties per subvolume is kind of core to btrfs, which is
>>> easier for data center solutioning, seamlessly persistent and easy to
>>> manage.
>>> 
>>> 
>>> Steps:
>>> -
>>> 
>>> Make sure following kernel TFMs are compiled in.
>>> # cat /proc/crypto | egrep 'cbc\(aes\)|ctr\(aes\)'
>>> name : ctr(aes)
>>> name : cbc(aes)
>>> 
>>> Create encrypted subvolume.
>>> # btrfs su create -e 'ctr(aes)' /btrfs/e1
>>> Create subvolume '/btrfs/e1'
>>> Passphrase:
>>> Again passphrase:
>>> 
>>> A key is created and its hash is updated into the subvolume item,
>>> and then added to the system keyctl.
>>> # btrfs su show /btrfs/e1 | egrep -i encrypt
>>>   Encryption:ctr(aes)@btrfs:75197c8e (594790215)
>>> 
>>> # keyctl show 594790215
>>> Keyring
>>> 594790215 --alsw-v  0 0  logon: btrfs:75197c8e
>>> 
>>> 
>>> Now any file data extents under the subvol /btrfs/e1 will be
>>> encrypted.
>>> 
>>> You may revoke key using keyctl or btrfs(8) as below.
>>> # btrfs su encrypt -k out /btrfs/e1
>>> 
>>> # btrfs su show /btrfs/e1 | egrep -i encrypt
>>>   Encryption:ctr(aes)@btrfs:75197c8e (Required key not available)
>>> 
>>> # keyctl show 594790215

Re: [RFC] Preliminary BTRFS Encryption

2016-09-13 Thread Wilson Meier
Hi Anand,

this is great news! Thanks for your work. I'm looking forward to using the
encryption.

I would like to ask a few questions regarding the feature set.

1. Is encryption of an existing, filled and unencrypted subvolume possible
without manually moving the data?

2. What about encrypting the root and boot subvolume? Will it work with grub2?

3. How does btrfs rescue handle the encrypted subvolume to recover data in case 
of an emergency? 

4. Is it possible to unlock a subvolume using a keyfile?

Thanks in advance,

Wilson


> On 13.09.2016 at 15:39, Anand Jain wrote:
> 
> 
> This patchset adds btrfs encryption support.
> 
> The main objective of this series is to have bugs fixed and stability.
> I have verified with fstests to confirm that there is no regression.
> 
> A design write-up is coming next; however, here below is a quick example
> of the CLI usage. Please try it out and let me know if I have missed
> something.
> 
> Also would like to mention that a review from the security experts is due,
> which is important and I believe those review comments can be accommodated
> without major changes from here.
> 
> Also yes, thanks for the emails; I hear that per-file encryption and
> doing it in line with the VFS layer are also important, which is WIP
> among other things on the list.
> 
> As of now this patch set supports encryption per subvolume, as
> managing properties per subvolume is kind of core to btrfs, which is
> easier for data center solutioning, seamlessly persistent and easy to
> manage.
> 
> 
> Steps:
> -
> 
> Make sure following kernel TFMs are compiled in.
> # cat /proc/crypto | egrep 'cbc\(aes\)|ctr\(aes\)'
> name : ctr(aes)
> name : cbc(aes)
> 
> Create encrypted subvolume.
> # btrfs su create -e 'ctr(aes)' /btrfs/e1
> Create subvolume '/btrfs/e1'
> Passphrase: 
> Again passphrase: 
> 
> A key is created and its hash is updated into the subvolume item,
> and then added to the system keyctl.
> # btrfs su show /btrfs/e1 | egrep -i encrypt
>Encryption:ctr(aes)@btrfs:75197c8e (594790215)
> 
> # keyctl show 594790215
> Keyring
> 594790215 --alsw-v  0 0  logon: btrfs:75197c8e
> 
> 
> Now any file data extents under the subvol /btrfs/e1 will be
> encrypted.
> 
> You may revoke key using keyctl or btrfs(8) as below.
> # btrfs su encrypt -k out /btrfs/e1
> 
> # btrfs su show /btrfs/e1 | egrep -i encrypt
>Encryption:ctr(aes)@btrfs:75197c8e (Required key not available)
> 
> # keyctl show 594790215
> Keyring
> Unable to dump key: Key has been revoked
> 
> As the key hash is updated, if you provide a wrong passphrase at the next
> key-in, it won't add the key to the system. So we have key verification
> from day 1.
> 
> # btrfs su encrypt -k in /btrfs/e1
> Passphrase: 
> Again passphrase: 
> ERROR: failed to set attribute 'btrfs.encrypt' to 'ctr(aes)@btrfs:75197c8e' : 
> Key was rejected by service
> 
> ERROR: key set failed: Key was rejected by service
> 
> # btrfs su encrypt -k in /btrfs/e1
> Passphrase: 
> Again passphrase: 
> key for '/btrfs/e1' has  logged in with keytag 'btrfs:75197c8e'
> 
> Now if you revoke the key the read / write fails with key error.
> 
> # md5sum /btrfs/e1/2k-test-file 
> 8c9fbc69125ebe84569a5c1ca088cb14  /btrfs/e1/2k-test-file
> 
> # btrfs su encrypt -k out /btrfs/e1
> 
> # md5sum /btrfs/e1/2k-test-file 
> md5sum: /btrfs/e1/2k-test-file: Key has been revoked
> 
> # cp /tfs/1k-test-file /btrfs/e1/
> cp: cannot create regular file ‘/btrfs/e1/1k-test-file’: Key has been revoked
> 
> Scrubbing plain text from memory for security reasons is still pending, as
> there are some key-revoke notification challenges to coincide with the
> encryption context switch, which I do believe should be fixed in due course,
> but is not a roadblock at this stage.
> 
> Thanks, Anand
> 
> 
> Anand Jain (1):
>  btrfs: Encryption: Add btrfs encryption support
> 
> fs/btrfs/Makefile   |   4 +-
> fs/btrfs/btrfs_inode.h  |   6 +
> fs/btrfs/compression.c  |  30 +-
> fs/btrfs/compression.h  |  10 +-
> fs/btrfs/ctree.h|   4 +
> fs/btrfs/disk-io.c  |   3 +
> fs/btrfs/encrypt.c  | 807 
> fs/btrfs/encrypt.h  |  94 +
> fs/btrfs/inode.c| 255 -
> fs/btrfs/ioctl.c|  67 
> fs/btrfs/lzo.c  |   2 +-
> fs/btrfs/props.c| 331 +++-
> fs/btrfs/super.c|  27 +-
> fs/btrfs/tests/crypto-tests.c   | 376 +++
> fs/btrfs/tests/crypto-tests.h   |  38 ++
> fs/btrfs/zlib.c |   2 +-
> include/uapi/linux/btrfs_tree.h |   6 +-
> 17 files changed, 2027 insertions(+), 35 deletions(-)
> create mode 100644 fs/btrfs/encrypt.c
> create mode 100644 fs/btrfs/encrypt.h
> create mode 100755 fs/btrfs/tests/crypto-tests.c
> create mode 100755 fs/btrfs/tests/crypto-tests.h
> 
> Anand Jain (2):
>  btrfs-progs: make 

Re: [PATCH 1/2] btrfs-progs: mkfs: Warn user for minimal RAID5/6 devices setup

2016-09-02 Thread Wilson Meier
+1 for the note/warning!



> On 02.09.2016 at 03:59, Steven Haigh wrote:
> 
> 
> Is it worthwhile adding a note that RAID5 / RAID6 may very well eat your
> data at this stage?
> 
>> On 02/09/16 11:41, Qu Wenruo wrote:
>> For RAID5, a 2-device setup is just RAID1 with more overhead.
>> For RAID6, a 3-device setup is RAID1 with 3 copies, not what most users
>> want.
>>
>> So warn the user at mkfs time for such cases, and add an explanation to
>> the man pages.
>>
>> Signed-off-by: Qu Wenruo 
>> ---
>> Documentation/mkfs.btrfs.asciidoc | 15 +++-
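
For reference, a hedged illustration of the degenerate setups the patch warns
about (device names are hypothetical):

  # mkfs.btrfs -d raid5 -m raid5 /dev/sdX /dev/sdY
    (2-device raid5: effectively raid1 with parity overhead)
  # mkfs.btrfs -d raid6 -m raid6 /dev/sdX /dev/sdY /dev/sdZ
    (3-device raid6: effectively a 3-copy raid1)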