Re: status page

2018-04-25 Thread Duncan
Gandalf Corvotempesta posted on Wed, 25 Apr 2018 14:30:42 +0200 as
excerpted:

> For me, RAID56 is mandatory. Any ETA for a stable RAID56?
> Is this something we should expect this year, next year, or in 10 years?

It's complicated... is the best short answer to that.  Here's my attempt
at a somewhat longer, admin/user-oriented answer (I'm not a dev, just a
btrfs user and list regular).

AFAIK, the current status of raid56/parity-raid is: no known major bugs
left in the current code, but one major caveat, the "degraded-mode
parity-raid write hole" that is common to parity-raid unless worked
around some other way.  Arguably it has somewhat more significance on
btrfs than in other parity-raid implementations, because the current
raid56 code doesn't checksum the parity itself, losing some of the
data-integrity safeguards people choose btrfs for in the first place.
The implications are particularly disturbing for metadata: due to
parity-raid's read-modify-write cycle, it's not just newly
written/changed data/metadata that's put at risk, but potentially old
and stable data as well.
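
To make that risk to old data concrete, here's a toy model in Python
(purely illustrative; the XOR layout is a simplification and this is in
no way the actual btrfs code):

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# One raid5 stripe across 3 devices: two data strips plus parity.
d0, d1 = b"OLD-DATA", b"STABLE!!"
parity = xor(d0, d1)                     # consistent stripe on disk

# Partial-stripe write: update only d0, forcing a read-modify-write
# of the parity strip.
new_d0 = b"NEW-DATA"

# Crash window: new_d0 reaches disk, the recomputed parity does not.
on_disk = {"d0": new_d0, "d1": d1, "parity": parity}

# Later the d1 device dies; reconstruct d1 as d0 XOR parity.
rebuilt_d1 = xor(on_disk["d0"], on_disk["parity"])
print(rebuilt_d1 == d1)                  # False: the old, "stable" d1
                                         # comes back silently corrupt

Because the current code doesn't checksum the parity strip, nothing
flags the stale parity until it's actually used for reconstruction.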

Again, this is a known issue with parity-raid in general; it simply has
additional implications on btrfs.  But because it's a well-known issue,
there are well-accepted mitigations available.  *If* your storage plans
account for it with sufficient safeguards, such as a good (tested)
backup routine that values your data appropriately in the number and
frequency of backups you keep of it...  (Data without a backup is
simply being defined as worth less than the time/trouble/resources
necessary to make that backup, because if it were worth more, there'd
*BE* that backup.)

... Then AFAIK, at this point the only thing btrfs raid56 mode lacks,
stability-wise, is the testing of time.  Until recently there *were*
severe known bugs, and although they've now been fixed, the fixes are
recent enough that other bugs may well remain to show themselves.

My own suggestion for such time-testing is a year, five kernel cycles,
after the last known severe bug has been fixed.  If there's no hint of
further reset-the-clock-level bugs in that time, then it's reasonable
to consider deployment beyond testing, still with some caution and
additional safeguards.


Meanwhile, as others have mentioned, there are a number of proposals out 
there for write-hole mitigation.

The theoretically cleanest, but also the most intensive since it
requires rewriting and retesting much of the existing raid56 mode,
would be reworking raid56 to COW and checksum the parity as well.  If
this happens, it's almost certainly at least five years out to
well-tested, and could well be a decade out.
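
For what it's worth, here's a minimal sketch of what checksummed parity
would buy at scrub time (illustrative Python only; btrfs actually keeps
checksums in a dedicated csum tree, and crc32 here merely stands in for
the crc32c that btrfs uses):

import zlib

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

d0, d1 = b"NEW-DATA", b"STABLE!!"
parity = xor(d0, d1)
stored = {"parity": parity, "parity_csum": zlib.crc32(parity)}

# Simulate bitrot/corruption of the parity strip on disk:
stored["parity"] = b"GARBAGE!"

# A scrub can now detect the bad parity instead of silently
# trusting it during a later rebuild:
if zlib.crc32(stored["parity"]) != stored["parity_csum"]:
    print("parity strip bad: rebuild it from data, don't trust it")

Note this catches corrupted parity; closing the write hole itself is
what the COW part of the rewrite would do.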

Another possibility is borrowing a technique from zfs: stripes of
varying width (a varying number of strips, fewer than the total number
of devices) depending on how much data is being written.  Btrfs raid56
mode can already deal with this to some extent, and does so when some
devices are smaller than others and run out of space, so stripes
written after that point don't include them.  A similar situation
occurs when devices are added, until a balance restripes existing data
to take the new device into account.  What btrfs raid56 mode /could/ do
is extend this to handle small writes much as zfs does, deliberately
writing less-than-full-width stripes when there's less data, thus
avoiding read-modify-write of existing data/metadata.  A balance could
then be scheduled periodically to restripe these "short stripes" to
full width.

A variant of the above would simply write full-width, but partially
empty, stripes.  Both of these should be less work to code than the
first/cleanest solution above, since they largely repurpose existing
code, but they're somewhat more complicated and thus potentially more
bug-prone, and both would require periodic rebalancing of the short or
partially empty stripes back to full width for full efficiency.  (A
sketch contrasting the two follows.)
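
A toy allocator contrasting the two approaches (illustrative only; the
strip size, device count, and names are made up, not btrfs internals):

STRIP = 64 * 1024      # bytes per device strip (hypothetical)
NDEV = 6               # 6 devices: full raid5 stripe = 5 data + parity

def short_stripe(nbytes: int) -> dict:
    # zfs-style: use only as many strips as the write needs
    data_strips = -(-nbytes // STRIP)          # ceiling division
    return {"data_strips": data_strips,
            "devices_touched": data_strips + 1}

def padded_stripe(nbytes: int) -> dict:
    # variant: always full width, pad out the unused space
    return {"data_strips": NDEV - 1,
            "devices_touched": NDEV,
            "padding_bytes": (NDEV - 1) * STRIP - nbytes}

print(short_stripe(70 * 1024))   # small write touches 3 devices
print(padded_stripe(70 * 1024))  # same write burns a whole stripe

Either way, the write lands in a freshly allocated stripe, so no
existing data is read-modified-rewritten; the cost is the wasted space
until the periodic rebalance mentioned above repacks it.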

Finally, there's the possibility of logging partial-width writes before
actually committing them.  This would be an extension to existing code,
and would mean writing small writes twice: once to the log, then again
to main storage at full stripe width with parity.  As a result it'd
slow things down (though only for less-than-full-width stripe writes;
full-width stripes would be written as normal, since they don't involve
the risky read-modify-write cycle), but people don't choose parity-raid
for write speed anyway, /because/ of the read-modify-write penalty it
imposes.

This last solution should involve the least change to existing code,
and thus should be the fastest to implement, with the least chance of
introducing new bugs, so the testing and bugfixing cycle should be
shorter as well.  But ouch, that logged-write penalty...
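
For concreteness, a toy version of that logged write path (illustrative
Python only; the log format and names are invented, not an actual btrfs
design):

FULL_WIDTH = 5 * 64 * 1024   # hypothetical full data-stripe size

log = []       # stand-in for a dedicated log area/device
stripes = {}   # stand-in for the main raid56 storage

def submit_write(stripe_no: int, data: bytes) -> None:
    if len(data) >= FULL_WIDTH:
        stripes[stripe_no] = data    # full stripe: write directly,
        return                       # no RMW, so no log needed
    # Partial stripe: persist to the log first, so a crash mid-update
    # can be replayed instead of leaving stale parity behind.
    log.append((stripe_no, data))
    stripes[stripe_no] = data        # second write, to main storage
    log.pop()                        # retire the log entry

def replay_after_crash() -> None:
    for stripe_no, data in log:      # redo anything still logged
        stripes[stripe_no] = data

The double write is exactly the penalty I'm wincing at above; the
payoff is that a torn partial-stripe update can always be replayed.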

Re: status page

2018-04-25 Thread Hugo Mills
On Wed, Apr 25, 2018 at 02:30:42PM +0200, Gandalf Corvotempesta wrote:
> 2018-04-25 13:39 GMT+02:00 Austin S. Hemmelgarn :
> > Define 'stable'.
> 
> Something ready for production use, like ext4 or xfs: no critical
> bugs, no easy data loss.
> 
> > If you just want 'safe for critical data', it's mostly there already
> > provided that your admins and operators are careful.  Assuming you avoid
> > qgroups and parity raid, don't run the filesystem near full all the time,
> > and keep an eye on the chunk allocations (which is easy to automate with
> > newer kernels), you will generally be fine.  We've been using it in
> > production where I work for a couple of years now, with the only issues
> > we've encountered arising from the fact that we're stuck using an older
> > kernel which doesn't automatically deallocate empty chunks.
> 
> For me, RAID56 is mandatory. Any ETA for a stable RAID56?
> Is this something we should expect this year, next year, or in 10 years?

   There aren't really any ETAs for anything in the kernel, in general,
unless the relevant code has already been committed and accepted (at
which point it has a fairly deterministic path onwards). ETAs for
fixing even known bugs are pretty variable, depending largely on how
easily the bug can be reproduced by the reporter and by the developer.

   As for a stable version -- you'll have to define "stable" in a way
that's actually measurable to get any useful answer, and even then,
see my previous comment about ETAs.

   There have been example patches in the last few months on the
subject of closing the write hole, so there's clear ongoing work on
that particular item, but again, see the comment on ETAs. It'll be
done when it's done.

   Hugo.

-- 
Hugo Mills             | Nothing wrong with being written in Perl... Some of
hugo@... carfax.org.uk | my best friends are written in Perl.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |                                                 dark




Re: status page

2018-04-25 Thread Gandalf Corvotempesta
2018-04-25 13:39 GMT+02:00 Austin S. Hemmelgarn :
> Define 'stable'.

Something ready for production use, like ext4 or xfs: no critical
bugs, no easy data loss.

> If you just want 'safe for critical data', it's mostly there already
> provided that your admins and operators are careful.  Assuming you avoid
> qgroups and parity raid, don't run the filesystem near full all the time,
> and keep an eye on the chunk allocations (which is easy to automate with
> newer kernels), you will generally be fine.  We've been using it in
> production where I work for a couple of years now, with the only issues
> we've encountered arising from the fact that we're stuck using an older
> kernel which doesn't automatically deallocate empty chunks.

For me, RAID56 is mandatory. Any ETA for a stable RAID56?
Is this something we should expect this year, next year, or in 10 years?


Re: status page

2018-04-25 Thread Austin S. Hemmelgarn

On 2018-04-25 07:13, Gandalf Corvotempesta wrote:
> 2018-04-23 17:16 GMT+02:00 David Sterba :
>> Reviewed and updated for 4.16, there's no change regarding the overall
>> status, though 4.16 has some raid56 fixes.
> 
> Thank you!
> Any ETA for a stable RAID56? (Or, even better, for a stable btrfs
> ready for production use?)
> 
> I've seen many improvements in recent months, and a stable btrfs
> seems not that far off.


Define 'stable'.

If you want 'bug free', that won't happen ever.  Even 'stable'
filesystems like XFS and ext4 still have bugs found and fixed on a
somewhat regular basis.  The only filesystem drivers that don't have
bugs reported are either so trivial that there really are no bugs (see
for example minix and vfat) or aren't under active development (so
nobody is finding and reporting the bugs that remain).


If you just want 'safe for critical data', it's mostly there already 
provided that your admins and operators are careful.  Assuming you avoid 
qgroups and parity raid, don't run the filesystem near full all the 
time, and keep an eye on the chunk allocations (which is easy to 
automate with newer kernels), you will generally be fine.  We've been 
using it in production where I work for a couple of years now, with the 
only issues we've encountered arising from the fact that we're stuck 
using an older kernel which doesn't automatically deallocate empty chunks.
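
Regarding the 'keep an eye on the chunk allocations' part, here's a
minimal sketch of the sort of automation I mean, in Python.  The mount
point and threshold are arbitrary; "btrfs filesystem df -b" and the
zero-usage balance are the usual CLI, but treat the script itself as
illustrative:

import re
import subprocess

MOUNT = "/mnt/data"    # hypothetical mount point
THRESHOLD = 0.5        # act when under 50% of allocated space is used

out = subprocess.run(["btrfs", "filesystem", "df", "-b", MOUNT],
                     capture_output=True, text=True, check=True).stdout

for total, used in re.findall(r"total=(\d+), used=(\d+)", out):
    if int(total) and int(used) / int(total) < THRESHOLD:
        # usage=0 only touches completely empty chunks, so this is
        # cheap; raise the number to repack partially full chunks.
        subprocess.run(["btrfs", "balance", "start",
                        "-dusage=0", "-musage=0", MOUNT], check=True)
        break

Newer kernels deallocate empty chunks automatically, so this matters
mostly on older kernels like ours.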



Re: status page

2018-04-25 Thread Gandalf Corvotempesta
2018-04-23 17:16 GMT+02:00 David Sterba :
> Reviewed and updated for 4.16, there's no change regarding the overall
> status, though 4.16 has some raid56 fixes.

Thank you!
Any ETA for a stable RAID56? (Or, even better, for a stable btrfs
ready for production use?)

I've seen many improvements in recent months, and a stable btrfs
seems not that far off.


Re: status page

2018-04-23 Thread David Sterba
On Thu, Apr 19, 2018 at 06:24:29PM +0200, Gandalf Corvotempesta wrote:
> Hi to all,
> as kernel 4.16 is out and 4.17 is in RC, would it be possible to
> update the BTRFS status page
> https://btrfs.wiki.kernel.org/index.php/Status to reflect 4.16
> stability?
> 
> That page is still based on kernel 4.15 (marked as EOL here:
> https://www.kernel.org/)

Reviewed and updated for 4.16, there's no change regarding the overall
status, though 4.16 has some raid56 fixes.


status page

2018-04-19 Thread Gandalf Corvotempesta
Hi to all,
as kernel 4.16 is out and 4.17 is in RC, would it be possible to
update the BTRFS status page
https://btrfs.wiki.kernel.org/index.php/Status to reflect 4.16
stability?

That page is still based on kernel 4.15 (marked as EOL here:
https://www.kernel.org/)

Thank you


Re: Please update the BTRFS status page

2018-02-27 Thread David Sterba
On Sat, Feb 24, 2018 at 03:05:48AM +0100, waxhead wrote:
> The latest released kernel is 4.15

Updated. There's one pending ack for the degraded-mount item, which is
still marked mostly-OK, but IMHO it was fixed in 4.14, so it'll be OK.


Please update the BTRFS status page

2018-02-23 Thread waxhead

The latest released kernel is 4.15