Re: How does btrfs handle sudden shutdowns?

2012-11-08 Thread Alex
Michael Kjörling <mich...@kjorling.se> writes:

> 
> Can btrfs deal reasonably gracefully with sudden shutdowns? (I'm
> mainly thinking of power outages which lead to logical structure
> damage but not physical media damage.)
> 

Really rather well! We've had a sequence of power cuts around here and I've
scrubbed after each one, finding only one corruption overall, which the scrub
fixed with no data lost.
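
For reference, a scrub like the one described above can be scripted. This is
only a minimal sketch in Python, assuming the btrfs-progs "btrfs" command is
installed; the mount point /mnt/data and the function names are placeholders,
not anything from this thread.

```python
import subprocess
from typing import List


def scrub_command(mount_point: str) -> List[str]:
    """Build the command line for a foreground scrub of a mounted btrfs path."""
    # -B keeps the scrub in the foreground, so the caller can wait for it
    # and inspect the exit status once it has finished.
    return ["btrfs", "scrub", "start", "-B", mount_point]


def run_scrub(mount_point: str) -> bool:
    """Run the scrub (usually needs root) and report whether it exited with 0."""
    result = subprocess.run(scrub_command(mount_point), check=False)
    # Any non-zero exit status is treated here as "needs a closer look".
    return result.returncode == 0


if __name__ == "__main__":
    # Placeholder mount point; only prints the command, does not run it.
    print(" ".join(scrub_command("/mnt/data")))
```

Calling run_scrub() after each unclean shutdown matches the routine described
above; "btrfs scrub status" on the same path shows the error counters.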




--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: How does btrfs handle sudden shutdowns?

2012-11-06 Thread Hugo Mills
On Tue, Nov 06, 2012 at 01:47:02PM +, Michael Kjörling wrote:
> On 6 Nov 2012 12:48 +, from h...@carfax.org.uk (Hugo Mills):
>    There are also some caveats: while the FS should always be
> > consistent, the latest transaction write may not have been completed,
> > so you could potentially lose up to 30 seconds of writes to the FS
> > from immediately before the crash.
> 
> I'd rather lose the most recent 30 seconds of writes but have a
> consistent file system with as-consistent-as-can-be-expected data,
> than end up with a corrupted file system.
> 
> On that note; can this value be tuned currently, is it hardcoded, or
> is it stored in metadata somewhere but the tooling to tune it is not
> yet available?

   As far as I understand, no, it's hard-coded.
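
Note for later readers: the interval was hard-coded when this was written, but
later kernels added a commit=<seconds> mount option to btrfs. A minimal sketch
in Python of reading it from a mount-options string (the fourth field of a
/proc/mounts line); the function name is hypothetical, and the default of 30
is simply the figure quoted in this thread.

```python
def commit_interval(mount_options: str, default: int = 30) -> int:
    """Return the commit interval in seconds from a comma-separated option string.

    Falls back to `default` when no commit=<seconds> option is present,
    which is the normal case when the option was never set explicitly.
    """
    for option in mount_options.split(","):
        key, sep, value = option.partition("=")
        if key == "commit" and sep and value.isdigit():
            return int(value)
    return default


if __name__ == "__main__":
    # Example option strings; these are illustrative, not from a real system.
    print(commit_interval("rw,relatime,commit=15"))
    print(commit_interval("rw,noatime"))
```

A shorter interval narrows the window of writes that can be lost, at the cost
of more frequent transaction commits.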

>    If the FS does corrupt over a power failure, and the hardware can
> > be demonstrated to be good, then we have a bug that needs to be
> > tracked down. (There have been a number of these over the development
> > of the FS so far, but they do get fixed).
> 
> Is there a simple way to tell ahead of time whether the hardware meets
> the assumptions made by the file system with regards to write barriers
> etc.?

   "Most" hardware does. I think there's a "barriers disabled" warning
in the kernel logs on mounting the FS, and some time ago there were
rumours of a tool to check for it (from Red Hat, but I don't know if
it ever saw the light of day). That's all for the case where the
hardware explicitly states that it doesn't support barriers.
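
A rough way to look for such a warning is to filter the kernel log for lines
that mention barriers. The sketch below is in Python and makes no assumption
about the exact wording of the message, which is why it only matches the word
"barrier"; the function names are hypothetical, and reading the log via dmesg
may need elevated privileges.

```python
import subprocess
from typing import Iterable, List


def barrier_messages(log_lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention barriers, case-insensitively."""
    return [line.rstrip("\n") for line in log_lines if "barrier" in line.lower()]


def kernel_log_lines() -> List[str]:
    """Read the kernel ring buffer via dmesg; returns [] if nothing was captured."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    return result.stdout.splitlines()


def report() -> None:
    """Print every kernel log line that mentions barriers."""
    for line in barrier_messages(kernel_log_lines()):
        print(line)
```

An empty result only means no such message was logged; it says nothing about
hardware that claims barrier support and does not honour it.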

   More concerning is the out-of-spec hardware which claims to support
barriers and utterly fails to do so. I don't think there's much you
can do to detect that case, other than force failures and try to catch
it out -- then return it to the manufacturer under whatever consumer
protection laws you have, on the grounds that it's not fit for purpose.

   I think the number of hard disks that actually do this is fairly
small, but they are out there. I'm not aware of a blacklist/quirks
list for them.

>    I guess the question for you is: are you after the _expected_
> > behaviour of the FS (should always be consistent on good hardware, but
> > you may lose up to 30 seconds of writes), or are you after mitigation
> > strategies in the face of FS bugs (keep off-site backups and be
> > prepared to use them)?
> 
> I already have full, daily on-site backups on an external drive that
> is logically unmounted except for when backups are running, as well as
> partial off-site backups to cloud storage - and of course, taking
> advantage of btrfs's snapshotting support there is no real reason why
> I couldn't increase the backup frequency while retaining data
> consistency. Losing half a minute of writes is fairly inconsequential
> for personal use as long as the file system remains consistent, and in
> the face of disastrous corruption it is at least possible to do a full
> restore to bare metal from rescue media and backup without losing too
> much. Not trivial time-wise (that's currently 1.4 TB over USB 2.0),
> but possible.

   OK, so I hope I've managed to answer your question satisfactorily.
Let us know if there are any outstanding queries you want cleared up. :)

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- "I will not be pushed,  filed, stamped, indexed, briefed, ---
   debriefed or numbered.  My life is my own."   



Re: How does btrfs handle sudden shutdowns?

2012-11-06 Thread Michael Kjörling
On 6 Nov 2012 12:48 +, from h...@carfax.org.uk (Hugo Mills):
>    There are also some caveats: while the FS should always be
> consistent, the latest transaction write may not have been completed,
> so you could potentially lose up to 30 seconds of writes to the FS
> from immediately before the crash.

I'd rather lose the most recent 30 seconds of writes but have a
consistent file system with as-consistent-as-can-be-expected data,
than end up with a corrupted file system.

On that note; can this value be tuned currently, is it hardcoded, or
is it stored in metadata somewhere but the tooling to tune it is not
yet available?


>    If the FS does corrupt over a power failure, and the hardware can
> be demonstrated to be good, then we have a bug that needs to be
> tracked down. (There have been a number of these over the development
> of the FS so far, but they do get fixed).

Is there a simple way to tell ahead of time whether the hardware meets
the assumptions made by the file system with regards to write barriers
etc.?


>    I guess the question for you is: are you after the _expected_
> behaviour of the FS (should always be consistent on good hardware, but
> you may lose up to 30 seconds of writes), or are you after mitigation
> strategies in the face of FS bugs (keep off-site backups and be
> prepared to use them)?

I already have full, daily on-site backups on an external drive that
is logically unmounted except for when backups are running, as well as
partial off-site backups to cloud storage - and of course, taking
advantage of btrfs's snapshotting support there is no real reason why
I couldn't increase the backup frequency while retaining data
consistency. Losing half a minute of writes is fairly inconsequential
for personal use as long as the file system remains consistent, and in
the face of disastrous corruption it is at least possible to do a full
restore to bare metal from rescue media and backup without losing too
much. Not trivial time-wise (that's currently 1.4 TB over USB 2.0),
but possible.

-- 
Michael Kjörling • http://michael.kjorling.se • mich...@kjorling.se
“People who think they know everything really annoy
those of us who know we don’t.” (Bjarne Stroustrup)


Re: How does btrfs handle sudden shutdowns?

2012-11-06 Thread Liu Bo
On Tue, Nov 06, 2012 at 12:33:08PM +, Michael Kjörling wrote:
> Can btrfs deal reasonably gracefully with sudden shutdowns? (I'm
> mainly thinking of power outages which lead to logical structure
> damage but not physical media damage.)
> 

AFAIK, yes, because btrfs is copy-on-write (COW) by design, which means
you can at least roll back to the latest stable state.

> What would be the risk points, file-system-wise?
> 

Data loss is possible if you're not writing with O_SYNC or doing fsync
after a write.
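
A minimal sketch in Python of a write followed by fsync, assuming a POSIX
system; durable_write is a hypothetical helper name. It also syncs the
containing directory so that a newly created file's directory entry is
flushed as well.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write data to path and return only after fsync has succeeded."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to push the file's data and metadata to the device.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Sync the parent directory too, so the new directory entry is durable.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Opening the file with os.O_SYNC added to the flags is the other route named
above: each write then returns only once the data has reached the device.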

> Can for example a rotating snapshot schedule mitigate some or all
> issues relating to sudden shutdowns, if any? (_For example_, take a
> snapshot every minute, keeping the last five; if the main file system
> fails to mount, then could the most recent usable snapshot be used as
> a fallback, or is it likely to be equally damaged or inconsistent?)
> 

In your case, once a snapshot has been created, the whole FS is in a
stable state (both metadata and data have been safely written to disk).

So yes, you can use the latest snapshot as a fallback or backup or something.
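
The rotation asked about (one snapshot a minute, keep the last five) could be
sketched in Python as below. Everything here is illustrative: the function
names and paths are hypothetical, snapshot names are assumed to sort
chronologically (for example ISO timestamps), and the commands are only
built, not run. The sketch covers the rotation itself and says nothing
about whether an older snapshot survives a given failure.

```python
from typing import List, Tuple


def plan_rotation(existing: List[str], new_name: str,
                  keep: int = 5) -> Tuple[List[str], List[str]]:
    """Return (kept, expired) snapshot names after adding new_name."""
    if keep <= 0:
        raise ValueError("keep must be positive")
    names = sorted(existing + [new_name])
    # The newest `keep` names are kept; everything older expires.
    return names[-keep:], names[:-keep]


def rotation_commands(source: str, snap_dir: str, existing: List[str],
                      new_name: str, keep: int = 5) -> List[List[str]]:
    """Build the btrfs commands for one rotation step."""
    _, expired = plan_rotation(existing, new_name, keep)
    # -r makes the new snapshot read-only.
    commands = [["btrfs", "subvolume", "snapshot", "-r",
                 source, f"{snap_dir}/{new_name}"]]
    commands += [["btrfs", "subvolume", "delete", f"{snap_dir}/{name}"]
                 for name in expired]
    return commands
```

Each returned command can be passed to subprocess.run() from a timer or cron
job that fires once a minute.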

I'd note here that btrfs can run into ENOSPC (out of space) situations, where
it may either recover by itself or drop to a read-only state, but your data is
safe at least.

thanks,
liubo

> Obviously a UPS or other form of fallback power is preferable to no
> UPS if power outages are a concern, so as to allow a controlled system
> shutdown (or fail-over to a more long-term backup power supply) in the
> event of a prolonged power outage, but I'm wondering about situations
> where such don't exist or even fail.
> 
> -- 
> Michael Kjörling • http://michael.kjorling.se • mich...@kjorling.se
> “People who think they know everything really annoy
> those of us who know we don’t.” (Bjarne Stroustrup)


Re: How does btrfs handle sudden shutdowns?

2012-11-06 Thread Hugo Mills
On Tue, Nov 06, 2012 at 12:33:08PM +, Michael Kjörling wrote:
> Can btrfs deal reasonably gracefully with sudden shutdowns? (I'm
> mainly thinking of power outages which lead to logical structure
> damage but not physical media damage.)

   In theory (i.e. by the design of the FS), you should be able to
pull the plug on btrfs at any point, and the FS will always be
consistent.

   This makes some assumptions: that writing a single page to the FS
is atomic, and that the hardware reports barriers to the OS reliably,
i.e. if the hardware says it has fully stored the data without losing
it, then it actually has.

   There are also some caveats: while the FS should always be
consistent, the latest transaction write may not have been completed,
so you could potentially lose up to 30 seconds of writes to the FS
from immediately before the crash.

   If the FS does corrupt over a power failure, and the hardware can
be demonstrated to be good, then we have a bug that needs to be
tracked down. (There have been a number of these over the development
of the FS so far, but they do get fixed).

> What would be the risk points, file-system-wise?
> 
> Can for example a rotating snapshot schedule mitigate some or all
> issues relating to sudden shutdowns, if any? (_For example_, take a
> snapshot every minute, keeping the last five; if the main file system
> fails to mount, then could the most recent usable snapshot be used as
> a fallback, or is it likely to be equally damaged or inconsistent?)

   No, snapshots give you no additional guarantees -- if the FS
corrupts and is unmountable, a snapshot is part of the same FS and
will also be unmountable.

> Obviously a UPS or other form of fallback power is preferable to no
> UPS if power outages are a concern, so as to allow a controlled system
> shutdown (or fail-over to a more long-term backup power supply) in the
> event of a prolonged power outage, but I'm wondering about situations
> where such don't exist or even fail.

   As I said above, the FS structures _should_ be completely reliable
in the face of power loss; that they haven't been in the past is
definitely a bug, and those bugs have been / are being fixed as
they're found. We've had very few transid match failures recently,
which used to be the main failure mode for these bugs. I don't know
whether that's because people aren't reporting them, or because
they're not happening nearly so often these days. I suspect the
latter.

   I guess the question for you is: are you after the _expected_
behaviour of the FS (should always be consistent on good hardware, but
you may lose up to 30 seconds of writes), or are you after mitigation
strategies in the face of FS bugs (keep off-site backups and be
prepared to use them)?

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- emacs:  Eighty Megabytes And Constantly Swapping. ---

