Re: Lost file-system story

2011-12-14 Thread Aleksej Saushev
James Chacon chacon.ja...@gmail.com writes:

 On Tue, Dec 13, 2011 at 4:09 PM, Greg A. Woods wo...@planix.ca wrote:
 At Wed, 14 Dec 2011 09:06:23 +1030, Brett Lymn brett.l...@baesystems.com 
 wrote:
 Subject: Re: Lost file-system story

 On Tue, Dec 13, 2011 at 01:38:57PM +0100, Joerg Sonnenberger wrote:
 
  fsck is supposed to handle *all* corruptions to the file system that can
  occur as part of normal file system operation in the kernel. It is doing
  best effort for others. It's a bug if it doesn't do the former and a
  potential missing feature for the latter.

 There are a lot of slips twixt cup and lip.  If you are really unlucky
 you can get an outage at just the wrong time that will cause the
 filesystem to be hosed so badly that fsck cannot recover it.  Sure, fsck
 can run to completion but all you have is most of your FS in lost+found
 which you have to be really really desperate to sort through.  I have
 been working with UNIX for over 20 years now and I have only seen this
 happen once and it was with a commercial UNIX.

 I've seen that happen more than once unfortunately.  SunOS-4 once I think.

 I agree 100% with Joerg here though.

 I'm pretty sure at least some of the times I've seen fsck do more damage
 than good it was due to a kernel bug or more breaking assumptions about
 ordered operations.

 There have of course also been some pretty serious bugs in various fsck
 implementations across the years and vendors.

 I'd be suspicious of fsck failing on a regularly mounted disk with
 corruption that can't otherwise be tracked to outside influences (bad
 ram, bad disk cache, etc). I've seen some bizarre things happen on ram
 errors over the years for instance.

I once got an infinite sequence of nested subdirectories on new hardware
running stable FreeBSD 5.3. Something like http://xkcd.com/981/
fsck refused to work on it.


-- 
HE CE3OH...



Re: Lost file-system story

2011-12-14 Thread Greg A. Woods
At Wed, 14 Dec 2011 07:50:37 + (UTC), mlel...@serpens.de (Michael van Elst) 
wrote:
Subject: Re: Lost file-system story
 
 wo...@planix.ca (Greg A. Woods) writes:
 
 easy, if not even easier, to do a mount -u -r
 
 Does this work again?

Not that I know of, and PR#30525 concurs, as does the commit mentioned
in that PR to prevent it from falsely appearing to work, a change which
remains in netbsd-5 and -current to date.  See my discussion of this
issue earlier in this thread.

-- 
Greg A. Woods
Planix, Inc.

wo...@planix.com   +1 250 762-7675   http://www.planix.com/




Re: Lost file-system story

2011-12-14 Thread David Holland
On Thu, Dec 15, 2011 at 12:48:51AM +0400, Aleksej Saushev wrote:
   There have of course also been some pretty serious bugs in various fsck
   implementations across the years and vendors.
  
   I'd be suspicious of fsck failing on a regularly mounted disk with
   corruption that can't otherwise be tracked to outside influences (bad
   ram, bad disk cache, etc). I've seen some bizarre things happen on ram
   errors over the years for instance.
  
  I once got an infinite sequence of nested subdirectories on new hardware
  running stable FreeBSD 5.3. Something like http://xkcd.com/981/
  fsck refused to work on it.

At one point some time back when pounding on rename, I got a test
volume into a state where if you ran fsck -fy it would fix a ton of
stuff, run to completion, and mark the fs clean. Which was great,
except that if you did it again, it would do the same thing. Over and
over. I'm glad it was a test volume...
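
(A cheap way to check for that failure mode, sketched here with an
illustrative raw device name, is to run the same forced check twice and
insist that the second pass is a no-op:

    fsck_ffs -fy /dev/rwd1e   # first pass: fixes whatever it finds
    fsck_ffs -fy /dev/rwd1e   # second pass: should find nothing if fsck really converged

If the second pass keeps finding and "fixing" things, fsck is not actually
reaching a fixed point.)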

-- 
David A. Holland
dholl...@netbsd.org


Fwd: Lost file-system story

2011-12-13 Thread Donald Allen
I did it again. gmail is trying to teach an old dog a new trick 


-- Forwarded message --
From: Donald Allen donaldcal...@gmail.com
Date: Tue, Dec 13, 2011 at 10:04 AM
Subject: Re: Lost file-system story
To: David Holland dholland-t...@netbsd.org


On Tue, Dec 13, 2011 at 1:27 AM, David Holland dholland-t...@netbsd.org wrote:
 On Mon, Dec 12, 2011 at 03:31:09PM -0500, Donald Allen wrote:
   Note that this bug *may* not worsen the probability of recovery after
   a crash. It might even increase it! Think about it. If you boot NetBSD
   and mount a filesystem async, it is going to be correctly structured
   (or deemed to be by fsck) at boot time, or the system wouldn't mount
   it. Assuming the system is happy with it, if you then make changes to
   the filesystem,  but, because of this bug they are all in the buffer
   cache and never get written out, and then the system crashes ---
   you've got the filesystem you started with.

 Not necessarily;

I did say *may* (which I wrote because you could write a good book
about NetBSD internals with what I don't know about NetBSD internals).

right off I can see two ways to get hosed:

 1. Delete a large file. This causes the in-memory FS to believe the
 indirect blocks from this file are free; then it can reallocate them
 as data for some other file. That data then *does* get written out, so
 after crashing and rebooting the indirect blocks contain utter
 nonsense. The ffs fsck probably can't recover this.

 2. Use a program that calls fsync(). This will write out some metadata
 blocks and not others; in the relatively benign case it will just
 update a previously-free inode and after crashing fsck will place the
 file in lost+found. In less benign cases it might do the converse of
 (1), and e.g. overwrite file data with indirect blocks, leading to
 crosslinked files or worse and probably total fsck failure.

 Not that any of this matters...

I agree. I was just indulging in some idle speculation, having some
fun. This bug should be fixed and I think the fix, as I said before,
should include a knob to allow the user to control the sync frequency
(maybe the knob is already there in sysctl and getting ignored for
some reason?). I'm running NetBSD again on my test machine, and have a
sleep-sync loop started in rc.local.
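
For anyone wanting the same stopgap, something along these lines at the end
of /etc/rc.local is enough (the 30-second interval is just the traditional
syncer period, nothing magic):

    # crude workaround: flush dirty buffers periodically until the kernel does it itself
    ( while true; do sync; sleep 30; done ) &

The existing knobs can be inspected with "sysctl vfs.sync", though as noted
above it is not clear they are actually being honored for async mounts.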

/Don



 --
 David A. Holland
 dholl...@netbsd.org


Re: Lost file-system story

2011-12-13 Thread Greg A. Woods
At Wed, 14 Dec 2011 09:06:23 +1030, Brett Lymn brett.l...@baesystems.com 
wrote:
Subject: Re: Lost file-system story
 
 On Tue, Dec 13, 2011 at 01:38:57PM +0100, Joerg Sonnenberger wrote:
  
  fsck is supposed to handle *all* corruptions to the file system that can
  occur as part of normal file system operation in the kernel. It is doing
  best effort for others. It's a bug if it doesn't do the former and a
  potential missing feature for the latter.
  
 
 There are a lot of slips twixt cup and lip.  If you are really unlucky
 you can get an outage at just the wrong time that will cause the
 filesystem to be hosed so badly that fsck cannot recover it.  Sure, fsck
 can run to completion but all you have is most of your FS in lost+found
 which you have to be really really desperate to sort through.  I have
 been working with UNIX for over 20 years now and I have only seen this
 happen once and it was with a commercial UNIX.

I've seen that happen more than once unfortunately.  SunOS-4 once I think.

I agree 100% with Joerg here though.

I'm pretty sure at least some of the times I've seen fsck do more damage
than good it was due to a kernel bug or more breaking assumptions about
ordered operations.

There have of course also been some pretty serious bugs in various fsck
implementations across the years and vendors.

-- 
Greg A. Woods
Planix, Inc.

wo...@planix.com   +1 250 762-7675   http://www.planix.com/




Re: Lost file-system story

2011-12-13 Thread James Chacon
On Tue, Dec 13, 2011 at 4:09 PM, Greg A. Woods wo...@planix.ca wrote:
 At Wed, 14 Dec 2011 09:06:23 +1030, Brett Lymn brett.l...@baesystems.com 
 wrote:
 Subject: Re: Lost file-system story

 On Tue, Dec 13, 2011 at 01:38:57PM +0100, Joerg Sonnenberger wrote:
 
  fsck is supposed to handle *all* corruptions to the file system that can
  occur as part of normal file system operation in the kernel. It is doing
  best effort for others. It's a bug if it doesn't do the former and a
  potential missing feature for the latter.
 

 There are a lot of slips twixt cup and lip.  If you are really unlucky
 you can get an outage at just the wrong time that will cause the
 filesystem to be hosed so badly that fsck cannot recover it.  Sure, fsck
 can run to completion but all you have is most of your FS in lost+found
 which you have to be really really desperate to sort through.  I have
 been working with UNIX for over 20 years now and I have only seen this
 happen once and it was with a commercial UNIX.

 I've seen that happen more than once unfortunately.  SunOS-4 once I think.

 I agree 100% with Joerg here though.

 I'm pretty sure at least some of the times I've seen fsck do more damage
 than good it was due to a kernel bug or more breaking assumptions about
 ordered operations.

 There have of course also been some pretty serious bugs in various fsck
 implementations across the years and vendors.


I'd be suspicious of fsck failing on a regularly mounted disk with
corruption that can't otherwise be tracked to outside influences (bad
ram, bad disk cache, etc). I've seen some bizarre things happen on ram
errors over the years for instance.

James


Re: Lost file-system story

2011-12-13 Thread Greg A. Woods
At Mon, 12 Dec 2011 18:49:31 -0500 (EST), Matt W. Benjamin 
m...@linuxbox.com wrote:
Subject: Re: Lost file-system story
 
 Why would sync not be effective under MNT_ASYNC?  Use of sync is not
 required to lead to consistency except with respect to an arbitrary
 point in time, but I don't think anyone ever believed otherwise.
 However, there should be no question of metadata never being written
 out if sync was run?

Well sync(2) _could_ be effective even in the face of MNT_ASYNC, though
I'm not sure it will, or indeed even should be required to, have a
guaranteed ongoing beneficial affect to the on-disk consistency of
filesystem that was mounted with MNT_ASYNC while activity continues to
proceed on the filesystem.

I.e. I don't expect sync(2) to suddenly enforce order on the writes that
it schedules to a MNT_ASYNC-mounted filesystem.  The ordering _may_ be a
natural result of the implementation, but if it's not then I wouldn't
consider that to be a bug, and I certainly wouldn't write any
documentation that suggested it might be a possible outcome.  MNT_ASYNC
means, to me at least, that even sync(2) can get away with doing writes
to a filesystem mounted with that flag in an order other than one which
would guarantee on-disk consistency to a level where fsck could repair
it.

I.e. sync(2) could possibly make things worse for MNT_ASYNC mounted
filesystems before it makes them better, and I don't see how that could
be considered to be a bug.

I do agree that IFF the filesystem is made quiescent, AND all writes
necessary and scheduled by sync(2) are allowed to come to completion,
THEN the on-disk state of an MNT_ASYNC-mounted filesystem must be
consistent (and all data blocks must be flushed to the disk too).

However if you're going to go to that trouble (i.e. close all files open
on the MNT_ASYNC-mounted filesystem and somehow prevent any other file
operations of any kind on that filesystem until such time that you think
the sync(2) scheduled writes are all done), then it should be just as
easy, if not even easier, to do a mount -u -r (or mount -u -o
noasync, or even umount), in which case you'll not only be sure that
the filesystem is consistent and secure, but you'll know when it reaches
this state (i.e. you won't have to guess about when sync(2)'s scheduled
work completes).
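
Concretely, that sequence amounts to something like the following (the mount
point is illustrative, and note that elsewhere in this thread the
update-mount path is reported not to actually work at the moment):

    # first stop every writer and close every open file on the filesystem, then:
    mount -u -r /scratch              # downgrade the async mount to read-only
    # or drop only the async flag:
    mount -u -o noasync /scratch
    # or give up on keeping it mounted at all:
    umount /scratch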

-- 
Greg A. Woods
Planix, Inc.

wo...@planix.com   +1 250 762-7675   http://www.planix.com/




Re: Lost file-system story

2011-12-13 Thread Michael van Elst
wo...@planix.ca (Greg A. Woods) writes:

easy, if not even easier, to do a mount -u -r

Does this work again?

-- 
-- 
Michael van Elst
Internet: mlel...@serpens.de
A potential Snark may lurk in every tree.


Re: Lost file-system story

2011-12-12 Thread Greg A. Woods
At Fri, 9 Dec 2011 22:12:25 -0500, Donald Allen donaldcal...@gmail.com wrote:
Subject: Re: Lost file-system story
 
 On Fri, Dec 9, 2011 at 8:43 PM, Greg A. Woods wo...@planix.ca wrote:
  At Fri, 9 Dec 2011 15:50:35 -0500, Donald Allen donaldcal...@gmail.com 
  wrote:
  Subject: Re: Lost file-system story
  
   does not guarantee to keep a consistent file system structure on the
   disk is what I expected from NetBSD. From what I've been told in this
   discussion, NetBSD pretty much guarantees that if you use async and
   the system crashes, you *will* lose the filesystem if there's been any
   writing to it for an arbitrarily long period of time, since apparently
   meta-data for async filesystems doesn't get written as a matter of
   course.
 
  I'm not sure what the difference is.
 
 You would be sure if you'd read my posts carefully. The difference is
 whether the probability of an async-mounted filesystem surviving a crash
 is near zero or near one.

I think perhaps the misunderstanding between you and everyone else is
because you haven't fully appreciated what everyone has been trying to
tell you about the true meaning of async in Unix-based filesystems,
and in particular about NetBSD's current implementation of Unix-based
filesystems, and what that all means to implementing algorithms that can
reliably repair the on-disk image of a filesystem after a crash.

I would have thought the warning given in the description of async in
mount(8) would be sufficient, but apparently you haven't read it that
way.

Perhaps the problem is that the last occurrence of the word "or" in the last
sentence of that warning should be changed to "and".  To me that would
at least make the warning a bit stronger.


  And that's why by default, and by very strong recommendation, filesystem
  metadata for Unix-based filesystems (sans WAPBL) should always be
  written synchronously to the disk if you ever hope to even try to use
  fsck(8).
 
 That's simply not true. Have you ever used Linux in all the years that
  ext2 was the predominant filesystem? ext2 filesystems were routinely
 mounted async for many years; everything -- data, meta-data -- was
 written asynchronously with no regard to ordering. 

DO NOT confuse any Linux-based filesystem with any Unix-based
filesystem.  They may have nearly identical semantics from the user
programming perspective (i.e. POSIX), but they're all entirely different
under the hood.

Unix-based filesystems (sans WAPBL, and ignoring the BSD-only LFS) have
never ever Ever EVER given any guarantee about the repairability of the
filesystem after a crash if it has been mounted with MNT_ASYNC.

Indeed it is more or less _impossible_ by design for the system to make
any such guarantee given what MNT_ASYNC actually means for Unix-based
filesystems, and especially what it means in the NetBSD implementation.


  Unix filesystems, including Berkeley Fast File System variant, have
  never made any guarantees about the recoverability of an async-mounted
  filesystem after a crash.
 
 I never thought or asserted otherwise.

Well, from my perspective, especially after carefully reading your
posts, you do indeed seem to think that async-mounted Unix-based
filesystems should be able to be repaired, at least some of the time,
despite the documentation, and all the collected wisdom of those who've
replied to your posts so far, saying otherwise.


  You seem to have inferred some impossible capability based on your
  experience with other non-Unix filesystems that have a completely
  different internal structure and implementation from the Unix-based
  filesystems in NetBSD.
 
 Nonsense -- I have inferred no such thing. Instead of referring you to
 previous posts for a re-read, I'll give you a little summary. I am
 speaking about probabilities. I completely understand that no
 filesystem mounted async (or any other way, for that matter), whether
 Linux or NetBSD or OpenBSD, is GUARANTEED to survive a crash.

OK, let's try stating this once more in what I hope are the same terms
you're trying to use:  The probability of any Unix-based filesystem
being repairable after a crash is zero (0) if it has been mounted with
MNT_ASYNC, and if there was _any_ activity that affected its structure
since mount time up to the time of the crash.  It still might survive
after some types of changes, but it _probably_ won't.  There are no
guarantees.  Use newfs and restore to recover.

Linux ext2 is not a Unix-based filesystem and Linux itself is not a
Unix-based kernel.  The meaning of async to ext2 is apparently very
different than it is to any Unix-based filesystem.  NetBSD might be free
of UNIX(tm) code, but it and its progenitors, right back to the 7th
Edition of the original Unix, were all implemented by people firmly
entrenched in the original Unix heritage from the inside out.

For Unix-based filesystems and their repair tools, any probability of
recovery less than one is as good as if it were zero.  Don't ever get
your hopes up.  Use newfs

Re: Lost file-system story

2011-12-12 Thread Valeriy E. Ushakov
On Sun, Dec 11, 2011 at 23:23:33 -0500, Donald Allen wrote:

 On Sun, Dec 11, 2011 at 9:53 PM, Greg A. Woods wo...@planix.ca wrote:

  Perhaps this sentence from McKusick's memo about fsck will help you to
  understand:  "fsck is able to repair corrupted file systems using
  procedures based upon the order in which UNIX honors these file system
  update requests."  This is true for all Unix-based filesystems.
 
 I'm not going to put words in McKusick's mouth, but I think you have
 misinterpreted this to mean that without ordering, recovery is
 impossible. If that's what you think (and you've said so, except when
 you've contradicted yourself), then you are wrong. Why? Because the
 evidence (e.g., my experiments) says  that recovery *is* possible. Not
 guaranteed. Possible.

What you are arguing is effectively isomorphic to:

1. I have C code that does i = i++ + i++;
2. When I use compiler C1 it always give me this specific result for i.
3. When I use compiler C2 it sometimes (or always) gives me some
   different result.
4. B/c of #2 C2 compiler must be wrong

-uwe


Re: Lost file-system story

2011-12-12 Thread David Holland
On Sun, Dec 11, 2011 at 06:53:26PM -0800, Greg A. Woods wrote:
   You would be sure if you'd read my posts carefully. The difference is
   whether the probability of an async-mounted filesystem surviving a crash
   is near zero or near one.
  
  I think perhaps the misunderstanding between you and everyone else is
  because you haven't fully appreciated what everyone has been trying to
  tell you about the true meaning of async in Unix-based filesystems,
  and in particular about NetBSD's current implementation of Unix-based
  filesystems, and what that all means to implementing algorithms that can
  reliably repair the on-disk image of a filesystem after a crash.

No, as far as I can tell he understands perfectly well; he just
doesn't consider the behavior acceptable.

It appears that currently a ffs volume mounted -oasync never writes
back metadata. I don't think this behavior is acceptable either.

The fact that mounting -oasync violates assumptions made by fsck_ffs,
with the result that fsck may not be able to recover after a crash
(either without making a huge mess in lost+found, or not at all) is
secondary at the moment, because in the absence of the previous
glaring bug it's impossible to even estimate what the probability of
it choking is.

(Note that with ext2 on Linux from time to time fsck will not be able
to recover after a crash and make a huge mess in lost+found. It
never happened all that often and is probably less common now after 15
years or so of incremental work on e2fsck.)

  DO NOT confuse any Linux-based filesystem with any Unix-based
  filesystem.  They may have nearly identical semantics from the user
  programming perspective (i.e. POSIX), but they're all entirely different
  under the hood.
  
  Unix-based filesystems (sans WAPBL, and ignoring the BSD-only LFS) have
  never ever Ever EVER given any guarantee about the repairability of the
  filesystem after a crash if it has been mounted with MNT_ASYNC.

What on earth do you mean by Unix-based filesystems such that this
statement is true?

  Perhaps this sentence from McKusick's memo about fsck will help you to
  understand:  "fsck is able to repair corrupted file systems using
  procedures based upon the order in which UNIX honors these file system
  update requests."  This is true for all Unix-based filesystems.

No, it is true for ffs, and possibly for our ext2 implementation
(which shares a lot of code with ffs) but nothing else.

-- 
David A. Holland
dholl...@netbsd.org


Re: Lost file-system story

2011-12-12 Thread Mouse
 [...], you do indeed seem to think that async-mounted Unix-based
 filesystems should be able to be repaired, at least some of the time,

There's a huge difference between "this isn't promised" and "this never
happens".

They _can_ be repaired...some of the time.  When they can, it is
because, by coincidence, it just so happens that the stuff that got
written produces a filesystem fsck can repair.

 The probability of any Unix-based filesystem being repairable after
 a crash is zero (0) if it has been mounted with MNT_ASYNC, and if
 there was _any_ activity that affected its structure since mount time
 up to the time of the crash.

This is simply false.  I just tried it.  On a 5.1 i386 system, I used
fdisk and disklabel to make a half-gig partition, newfsed it, mounted
it normally, copied a file into it, unmounted it, mounted it async,
removed the file, and hit the power switch.  After the machine came
back up, I tried fsck on the filesystem.  It said it was clean.  I used
fsck -f.  It was happy.  I mounted it and, as far as I can tell, fsck
was correct in thinking the filesystem was OK.  So, there is an
existence-proof-by-example that there are circumstances under which a
filesystem mounted async can be changed and still be left in a state
fsck can repair.
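
Spelled out (device and file names here are only illustrative, and the
fdisk/disklabel step is omitted), the experiment was essentially:

    newfs /dev/rwd1e                  # fresh half-gig test filesystem
    mount /dev/wd1e /mnt              # mount it normally
    cp /netbsd /mnt/                  # copy a file onto it
    umount /mnt
    mount -o async /dev/wd1e /mnt     # remount it async
    rm /mnt/netbsd                    # change it while mounted async
    # hit the power switch here; after the machine comes back up:
    fsck -f /dev/rwd1e                # forced check found nothing wrong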

 It still might survive after some types of changes, but it _probably_
 won't.

Right.  But that's not "probability ... is zero (0)".

 Linux ext2 is not a Unix-based filesystem and Linux itself is not a
 Unix-based kernel.

It's about as Unix-based as NetBSD is.  Unless you mean something
strange by Unix-based - what _do_ you mean by it?

 For Unix-based filesystems and their repair tools, any probability
 of recovery less than one is as good as if it were zero.

That's not how I feel about it when I've lost a filesystem.  I'll take
a filesystem with a nonzero probability of recovering something useful
from over one that guarantees to trash everything any day (other things
being equal, of course).

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML   mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Lost file-system story

2011-12-12 Thread Greg A. Woods
At Mon, 12 Dec 2011 15:08:40 +, David Holland dholland-t...@netbsd.org 
wrote:
Subject: Re: Lost file-system story
 
 On Sun, Dec 11, 2011 at 06:53:26PM -0800, Greg A. Woods wrote:
 No, as far as I can tell he understands perfectly well; he just
 doesn't consider the behavior acceptable.
 
 It appears that currently a ffs volume mounted -oasync never writes
 back metadata. I don't think this behavior is acceptable either.

I agree there are conditions and operations which _should_ guarantee
that the on-disk state of the filesystem is identical to what the user
perceives and thus that the filesystem is 100% consistent and secure.

It seems umount(2) works to make this guarantee, for example.

The two other most important of these that come to mind are:

mount -u -r /async-mounted-fs

and

mount -u -o noasync /async-mounted-fs

It is my understanding that neither works at the moment, and that this
is a known and reported and accepted bug, as I outlined in an earlier
post to this thread.

I think sync(2) should probably also work, but _only_ if the filesystem
is made entirely quiescent from before the time sync() is called, and
until after the time all the writes it has scheduled have completed, all
the way to the disk media.  (and of course once activity starts on the
filesystem again, all guarantees are lost again)

It might be nice if sync(2) could schedule all the needed writes to
happen in an order which would ensure consistency and repairability of
the on-disk image at any given time, but I'm guessing this might be too
much to ask, at least without some more significant effort.

However without enforcing the synchronous ordering of writes, sync(2)
is effectively useless for the purposes Mr. Allen appears to have,
though perhaps his level of risk tolerance would still make it useful
to him while others of us would still be unable to tolerate its dangers
in any scenarios where we were not prepared to use newfs to recover.

Besides, the only way I know to guarantee a filesystem remains quiescent
is to unmount it, so if you do that first then there's nothing for
sync(2) to do afterwards, so nothing new to implement.  :-)


   DO NOT confuse any Linux-based filesystem with any Unix-based
   filesystem.  They may have nearly identical semantics from the user
   programming perspective (i.e. POSIX), but they're all entirely different
   under the hood.
   
   Unix-based filesystems (sans WAPBL, and ignoring the BSD-only LFS) have
   never ever Ever EVER given any guarantee about the repairability of the
   filesystem after a crash if it has been mounted with MNT_ASYNC.
 
 What on earth do you mean by Unix-based filesystems such that this
 statement is true?

I mean exactly what it sounds like -- nothing more.

Having almost no knowledge about ext2 or any other non-Unix-based
filesystems, I'm trying to be careful to avoid making any claims about
those non-Unix-based filesystems.

I included FFS as a Unix-based filesystem because I know for sure that
it shares many of the attributes of the original Unix filesystems with
respect to the issues surrounding MNT_ASYNC.

   Perhaps this sentence from McKusick's memo about fsck will help you to
   understand:  "fsck is able to repair corrupted file systems using
   procedures based upon the order in which UNIX honors these file system
   update requests."  This is true for all Unix-based filesystems.
 
 No, it is true for ffs, and possibly for our ext2 implementation
 (which shares a lot of code with ffs) but nothing else.

Well, if you follow what I mean by Unix-based filesystems, and you ignore LFS
and options like WAPBL, as I've said, then I believe it is entirely true
since within my definition that leaves just FFS, and.

V7, though it didn't have MNT_ASYNC, would suffer the same as if
MNT_ASYNC were implemented for it -- indeed I'm guessing that NetBSD's
reimplementation of v7fs will have the same problems with MNT_ASYNC.

As I say, I don't know enough about the non-Unix-based filesystems in
NetBSD, such as those compatible with AmigaDOS, Acorn, Windows NT, or even
MS-DOS, to know if they would be adversely affected by MNT_ASYNC.
Indeed I'm not even sure if they all have reasonable filesystem repair
tools (NetBSD has none, except maybe for ext2fs and msdos, though in my
experience NetBSD's MS-DOS filesystem implementation is very fragile and
it does not have a truly useful fsck_msdos, even without trying to use
MNT_ASYNC with it).  SysVbfs may suffer too, but I don't know enough
about it either despite it being by definition Unix-based, and we don't
have an fsck for it in any case.

I'd also be guessing about EFS, and I'm not sure I'd categorize it as
Unix-based any more than I do LFS.

-- 
Greg A. Woods
Planix, Inc.

wo...@planix.com   +1 250 762-7675   http://www.planix.com/




Re: Lost file-system story

2011-12-12 Thread Greg Troxel

Andy Ruhl acr...@gmail.com writes:

 If solving your problem depends on sync frequency, I don't see why
 this shouldn't be managed by some knob to twiddle. Given that the
 crash scenario doesn't get worse depending on where the knob is or if
 the crash happens while the knob is working. If it does, it's
 pointless.

My sense is that Donald isn't complaining about "why is the sync
frequency 30s instead of 60s"; it's more bafflement at waiting 10-15
minutes with an idle disk and having the data not synced at all.
There's a historical period of 30s, and that seems both not often enough
to cause trouble and often enough to not boggle users.

It may also make sense to have a syncer behavior that is low rate, to
not overwhelm asked-for IO, and to use most of the disk bandwidth when
it is on, and to let it be otherwise, for laptops.  But a basic
correctness property is almost certainly that if the disk is spun up and
is not in heavy use and lots of time passes, dirty buffers (data and
metadata) are written to disk.




Re: Lost file-system story

2011-12-12 Thread Greg A. Woods
At Mon, 12 Dec 2011 11:09:44 -0500 (EST), Mouse mo...@rodents-montreal.org 
wrote:
Subject: Re: Lost file-system story

 They _can_ be repaired...some of the time.  When they can, it is
 because, by coincidence, it just so happens that the stuff that got
 written produces a filesystem fsck can repair.

That's totally irrelevant.

Probabilities other than zero or one are not useful in manual pages, and
they are only useful to an end user as a very last resort -- equivalent
to calling out the army to put Humpty Dumpty back together again.

For all useful intents and purposes any probability of irreparable damage
of greater than zero is, for the end user, and for all planning purposes,
as good as a probability of one.  Plan to use newfs and restore after
every crash and you'll be OK.  Plan otherwise and you will eventually be
disappointed.

 That's not how I feel about it when I've lost a filesystem.  I'll take
 a filesystem with a nonzero probability of recovering something useful
 from over one that guarantees to trash everything any day (other things
 being equal, of course).

Heh.  Yup, there are those of us who will find it a challenge to see
just how much we can recover from a damaged file system no matter how
useful the outcome may be.

You don't put that in the manual page though, and you never give the end
user that expectation (unless it's already too late for them and they've
got yolk all over their face).

-- 
Greg A. Woods
Planix, Inc.

wo...@planix.com   +1 250 762-7675   http://www.planix.com/




Re: Lost file-system story

2011-12-12 Thread Greg A. Woods
At Sun, 11 Dec 2011 23:23:33 -0500, Donald Allen donaldcal...@gmail.com wrote:
Subject: Re: Lost file-system story
 
 How can you possibly say such a thing and hope to be taken seriously?
 What you just said means that P(survival) = .999 is the same as
 P(survival) = 0.
 
 There are a LOT of situations (e.g., mine) where P(survival) = .999
 would be very acceptable and P(survival) = 0 would not.

The manual page must not give probabilities or even speak of
possibilities.

So, as-is you have been warned properly by the manual page.

For planning purposes you _must_ expect that your filesystem will be
damaged beyond repair after a crash and that you will have to use
newfs and restore to recover.  Learn these expectations well and you
will be happier in the long run.  Fail to learn them and you have no
recourse but to wallow in your own sorrows.  I.e. you can't come to the
mailing list and say that you expected something better just because you
say you can get something better from something else entirely different.
You have false expectations based on your experiences with entirely
foreign environments.

Maybe Humpty Dumpty can be put back together again, sometimes, but even
if you have all the King's horses and all the King's men on call to
respond to a disaster at a moment's notice, you must not expect that you
can have the egg put back together successfully, even just once, even if
it does look like just a minor crack this time.

-- 
Greg A. Woods
Planix, Inc.

wo...@planix.com   +1 250 762-7675   http://www.planix.com/




Re: Lost file-system story

2011-12-12 Thread Eric Haszlakiewicz
On Mon, Dec 12, 2011 at 11:39:38AM -0800, Greg A. Woods wrote:
 Having almost no knowledge about ext2 or any other non-Unix-based
 filesystems, I'm trying to be careful to avoid making any claims about
 those non-Unix-based filesystems.

hmm.. so then how can you claim that it is entirely different (as you did
in an earlier email)?  It sounds like you're talking out of your, ahem.. depth.

 I included FFS as a Unix-based filesystem because I know for sure that
 it shares many of the attributes of the original Unix filesystems with
 respect to the issues surrounding MNT_ASYNC.

Have you tried actually comparing the current NetBSD ffs sources against
whatever Unix sources you are talking about?  While I'm sure that there
are many attributes that are shared, if you even compare the current NetBSD
sources with those from, say, 1994, you will find a ton of differences.

eric


Re: Lost file-system story

2011-12-12 Thread Eric Haszlakiewicz
On Mon, Dec 12, 2011 at 12:10:32PM -0800, Greg A. Woods wrote:
 At Sun, 11 Dec 2011 23:23:33 -0500, Donald Allen donaldcal...@gmail.com 
 wrote:
 Subject: Re: Lost file-system story
  
  How can you possibly say such a thing and hope to be taken seriously?
  What you just said means that P(survival) = .999 is the same as
  P(survival) = 0.
  
  There are a LOT of situations (e.g., mine) where P(survival) = .999
  would be very acceptable and P(survival) = 0 would not.
 
 The manual page must not give probabilities or even speak of
 possibilities.

Oh really, Greg?  I suppose you can believe that if you want to, while the 
rest of us can continue to live in the real world where knowing things like 
that is actually useful.

 recourse but to wallow in your own sorrows.  I.e. you can't come to the
 mailing list and say that you expected something better just because you
 say you can get something better from something else entirely different.
 You have false expectations based on your experiences with entirely
 foreign environments.

Donald, don't listen to Greg.  Just in case it needs to be repeated, you're
not the only one that thinks it is reasonable to expect a non-0 probability
that things will be recovereable, even if something goes wrong.

eric


Re: Lost file-system story

2011-12-12 Thread Greg A. Woods
At Mon, 12 Dec 2011 14:23:40 -0600, Eric Haszlakiewicz e...@nimenees.com 
wrote:
Subject: Re: Lost file-system story
 
 Donald, don't listen to Greg.  Just in case it needs to be repeated, you're
 not the only one that thinks it is reasonable to expect a non-0 probability
 that things will be recoverable, even if something goes wrong.

Eric, what part of MNT_ASYNC don't you understand?

-- 
Greg A. Woods
Planix, Inc.

wo...@planix.com   +1 250 762-7675   http://www.planix.com/




Re: Lost file-system story

2011-12-12 Thread Donald Allen
On Mon, Dec 12, 2011 at 2:40 PM, Greg Troxel g...@ir.bbn.com wrote:

 Andy Ruhl acr...@gmail.com writes:

 If solving your problem depends on sync frequency, I don't see why
 this shouldn't be managed by some knob to twiddle. Given that the
 crash scenario doesn't get worse depending on where the knob is or if
 the crash happens while the knob is working. If it does, it's
 pointless.

 My sense is that Donald isn't complaining about "why is the sync
 frequency 30s instead of 60s";

That's right. The only thing I'm *really* complaining about is people
who don't read what seems to me to be plain English (I exclude from my
complaint those for whom English is not their native language).

it's more bafflement at waiting 10-15
 minutes with an idle disk and having the data not synced at all.
 There's a historical period of 30s, and that seems both not often enough
 to cause trouble and often enough to not boggle users.

That's certainly an issue with NetBSD that David Holland, correctly in
my view, identified as a bug. OpenBSD, per the experiments I've
already described, does not exhibit this behavior.

Note that this bug *may* not worsen the probability of recovery after
a crash. It might even increase it! Think about it. If you boot NetBSD
and mount a filesystem async, it is going to be correctly structured
(or deemed to be by fsck) at boot time, or the system wouldn't mount
it. Assuming the system is happy with it, if you then make changes to
the filesystem,  but, because of this bug they are all in the buffer
cache and never get written out, and then the system crashes ---
you've got the filesystem you started with. This bug more importantly
affects, in my view, the amount of stuff you might lose in the event
of a crash. If the system has been up for N hours and you've been
working away, making changes, dutifully hitting ctrl-s in gnumeric to
write out changes because people have told you that changes to a
gnumeric spreadsheet aren't in the filesystem until saved, and the
system crashes, you are in for a big surprise. Chances are good that
you will not lose the filesystem, but chances are great that you will
lose your N hours of work.


 It may also make sense to have a syncer behavior that is low rate, to
 not overwhelm asked-for IO, and to use most of the disk bandwidth when
 it is on, and to let it be otherwise, for laptops.  But a basic
 correctness property is almost certainly that if the disk is spun up and
 is not in heavy use and lots of time passes, dirty buffers (data and
 metadata) are written to disk.

Yep.

Now, knowing about this bug, a simple sync-sleep loop takes care of
it. But it should be fixed in the system, so the user doesn't have to
remember to do this, or to install such a loop in one of the init-time
files.

/Don


Re: Lost file-system story

2011-12-12 Thread Donald Allen
On Mon, Dec 12, 2011 at 3:10 PM, Greg A. Woods wo...@planix.ca wrote:
 At Sun, 11 Dec 2011 23:23:33 -0500, Donald Allen donaldcal...@gmail.com 
 wrote:
 Subject: Re: Lost file-system story

 How can you possibly say such a thing and hope to be taken seriously?
 What you just said means that P(survival) = .999 is the same as
 P(survival) = 0.

 There are a LOT of situations (e.g., mine) where P(survival) = .999
 would be very acceptable and P(survival) = 0 would not.

 The manual page must not give probabilities or even speak of
 possibilities.

Even when the process the man page is describing is non-deterministic?
So you want man pages that lie?


 So, as-is you have been warned properly by the manual page.

 For planning purposes you _must_ expect that your filesystem will be
 damaged beyond repair after a crash and that you will have to use
 newfs and restore to recover.  Learn these expectations well and you
 will be happier in the long run.  Fail to learn them and you have no
 recourse but to wallow in your own sorrows.  I.e. you can't come to the
 mailing list and say that you expected something better just because you
 say you can get something better from something else entirely different.
 You have false expectations based on your experiences with entirely
 foreign environments.

 Maybe Humpty Dumpty can be put back together again, sometimes, but even
 if you have all the King's horses and all the King's men on call to
 respond to a disaster at a moment's notice, you must not expect that you
 can have the egg put back together successfully, even just once, even if
 it does look like just a minor crack this time.

You seem to have some pre-conceived and incorrect notions, together
with a don't-confuse-me-with-the-facts attitude. You've hit the Daily
Double.

You spoke about "happier in the long run" above. I'd suggest trying to
give more weight to reading/input and less to writing/output, and
you'll most likely be happier in the long run. No guarantees, of
course.

/Don



 --
                                                Greg A. Woods
                                                Planix, Inc.

 wo...@planix.com       +1 250 762-7675        http://www.planix.com/


Re: Lost file-system story

2011-12-12 Thread Greg A. Woods
At Mon, 12 Dec 2011 14:17:35 -0600, Eric Haszlakiewicz e...@nimenees.com 
wrote:
Subject: Re: Lost file-system story
 
 On Mon, Dec 12, 2011 at 11:39:38AM -0800, Greg A. Woods wrote:
  Having almost no knowledge about ext2 or any other non-Unix-based
  filesystems, I'm trying to be careful to avoid making any claims about
  those non-Unix-based filesystems.
 
 hmm.. so then how can you claim that it is entirely different (as you did
 in an earlier email)?  It sounds like you're talking out of your, ahem.. 
 depth.

As I said, I'm trying to be careful to avoid making claims one way or
another about non-Unix-based filesystems.

I'm also trying to keep in mind that MNT_ASYNC can be an attribute of
the OS implementation well above the filesystems and I'm also trying to
avoid making claims about non-Unix filesystem structures which may be
faced with this feature for the first time.

Once upon a time I was quite familiar with the use of the tools that
came before fsck.  I have a great deal of experience with the on-disk
structure of V7fs, SysVfs, and many of the minor variants of these
filesystems.  I'm experienced with many of the things that can go wrong
with these filesystems and I'm moderately experienced with how they can
be repaired as best as is humanly possible with low-level bit
manipulating tools when bugs in either the kernel or fsck cause
unexpected failures (not unlike what can happen when MNT_ASYNC is used).
I'm moderately experienced with more modern filesystems such as with
SysVr4's native FS and Berkeley FFS, though less experienced with
low-level on-disk repair of those filesystems (since on these modern
Unix-based filesystems the standard repair tools, especially fsck, have
been vastly improved; and kernel bugs which destroy the ordered writing
of metadata have effectively been eliminated).

  I included FFS as a Unix-based filesystem because I know for sure that
  it shares many of the attributes of the original Unix filesystems with
  respect to the issues surrounding MNT_ASYNC.
 
 Have you tried actually comparing the current NetBSD ffs sources against
 whatever Unix sources you are talking about?  While I'm sure that there
 are many attributes that are shared, if you even compare the current NetBSD
 sources with those from, say, 1994, you will find a ton of differences.

This has nothing to do with any given pile of source code per se.  The
issues that affect repairability of a Unix-based filesystem are higher
level design considerations that are common to the implementations of
fsck and the filesystems they can repair from the v7 addenda tape all
the way through to the implementation of modern day NetBSD's
fsck_ffs(8).

You might find McKusick and Kowalski's paper about BSD FFS fsck
enlightening.  (I can supply a copy if you can't find it elsewhere.  It
would be nice if it could be included in the NetBSD distribution, even
if not cleaned up to reflect the current implementation.  It was in
4.4BSD-Lite2, after all.)


Like I said earlier:

Perhaps the superblock(s) should also record when a filesystem has been
mounted with MNT_ASYNC so that fsck(8) can print a warning such as:

FS is dirty and was mounted async.  Demons will fly out of your nose


-- 
Greg A. Woods
Planix, Inc.

wo...@planix.com   +1 250 762-7675   http://www.planix.com/




Re: Lost file-system story

2011-12-12 Thread Greg Troxel

Greg A. Woods wo...@planix.ca writes:

 At Mon, 12 Dec 2011 14:23:40 -0600, Eric Haszlakiewicz e...@nimenees.com 
 wrote:
 Subject: Re: Lost file-system story
 
 Donald, don't listen to Greg.  Just in case it needs to be repeated, you're
 not the only one that thinks it is reasonable to expect a non-0 probability
 that things will be recoverable, even if something goes wrong.

 Eric, what part of MNT_ASYNC don't you understand?

He seems to understand it quite well.

Donald came here not complaining, just surprised that things were
somewhat worse than one would have expected.  And he's right - async
doesn't mean and data might never be written indefinitely, just that
there are no ordering or completion guarantees.  I'm not 100% clear what
is wrong, but it seems likely that this discussion has surfaced a bug or
two




Re: Lost file-system story

2011-12-12 Thread Matt W. Benjamin
Hi,

Why would sync not be effective under MNT_ASYNC?  Use of sync is not required 
to lead to consistency except with respect to an arbitrary point in time, but I 
don't think anyone ever believed otherwise.  However, there should be no 
question of metadata never being written out if sync was run?

Matt

- Greg A. Woods wo...@planix.ca wrote:

 
 (I am waffling though on whether I think sync(2) should have any
 beneficial effect on the consistency of MNT_ASYNC-mounted
 filesystems.)
 

-- 

Matt Benjamin

The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI  48104

http://linuxbox.com

tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309


Re: Lost file-system story

2011-12-12 Thread Mouse
 They _can_ be repaired...some of the time.

 That's totally irrelevant.

I don't think so, not when I'm replying to a claim otherwise.

 Probabilities other than zero or one are not useful in manual pages,

Then we can throw away fsck, because there is always _some_ chance the
filesystem will be irreparable.  Memory, CPUs, disks, and the
transports between them do fail, occasionally transiently.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML   mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Lost file-system story

2011-12-11 Thread Donald Allen
On Fri, Dec 9, 2011 at 8:43 PM, Greg A. Woods wo...@planix.ca wrote:
 At Fri, 9 Dec 2011 15:50:35 -0500, Donald Allen donaldcal...@gmail.com 
 wrote:
 Subject: Re: Lost file-system story

 does not guarantee to keep a consistent file system structure on the
 disk is what I expected from NetBSD. From what I've been told in this
 discussion, NetBSD pretty much guarantees that if you use async and
 the system crashes, you *will* lose the filesystem if there's been any
 writing to it for an arbitrarily long period of time, since apparently
 meta-data for async filesystems doesn't get written as a matter of
 course.

 I'm not sure what the difference is.

You would be sure if you'd read my posts carefully. The difference is
whether the probability of an async-mounted filesystem surviving a crash
is near zero or near one.

 You seem to be quibbling over
 minor differences and perhaps one-off experiences.

Having a crash almost certainly destroy your filesystem vs. having the
filesystem almost certainly survive a crash is not a minor difference.

 Both OpenBSD and
 NetBSD also say that you should not use the async flag unless you are
 prepared to recreate the file system from scratch if your system
 crashes.  That means use newfs(8) [and, by implication, something like
 restore(8)], not fsck(8), to recover after a crash.  You got lucky with
 your test on OpenBSD.



 And then there's the matter of NetBSD fsck apparently not
 really being designed to cope with the mess left on the disk after
 such a crash. Please correct me if I've misinterpreted what's been
 said here (there have been a few different stories told, so I'm trying
 to compute the mean).

 That's been true of Unix (and many unix-like) filesystems and their
 fsck(8) commands since the beginning of Unix.

 fsck(8) is designed to rely on the possible states of on-disk filesystem
 metadata because that's how Unix-based filesystems have been guaranteed
 to work (barring use of MNT_ASYNC, obviously).

 And that's why by default, and by very strong recommendation, filesystem
 metadata for Unix-based filesystems (sans WAPBL) should always be
 written synchronously to the disk if you ever hope to even try to use
 fsck(8).

That's simply not true. Have you ever used Linux in all the years that
 ext2 was the predominant filesystem? ext2 filesystems were routinely
mounted async for many years; everything -- data, meta-data -- was
written asynchronously with no regard to ordering. And yet, when those
systems crashed, fsck generally, not always, but usually, restored the
filesystem to working order. Of course, some data could be lost and
was, but you rarely suffered the loss of an entire filesystem. That's
a fact.



 I am not telling the OpenBSD story to rub NetBSD peoples' noses in it.
 I'm simply pointing out that that system appears to be an example of
 ffs doing what I thought it did and what I know ext2 and journal-less
 ext4 do -- do a very good job of putting the world into operating
 order (without offering an impossible guarantee to do so) after a
 crash when async is used, after having been told that ffs and its fsck
 were not designed to do this.

 You seem to be very confused about what MNT_ASYNC is and is not.  :-)

No, you don't understand what I've said.


 Unix filesystems, including Berkeley Fast File System variant, have
 never made any guarantees about the recoverability of an async-mounted
 filesystem after a crash.

I never thought or asserted otherwise.


 You seem to have inferred some impossible capability based on your
 experience with other non-Unix filesystems that have a completely
 different internal structure and implementation from the Unix-based
 filesystems in NetBSD.

Nonsense -- I have inferred no such thing. Instead of referring you to
previous posts for a re-read, I'll give you a little summary. I am
speaking about probabilities. I completely understand that no
filesystem mounted async (or any other way, for that matter), whether
Linux or NetBSD or OpenBSD, is GUARANTEED to survive a crash. The
probability of surviving a crash for any of them is < 1. But my
experience with Linux ext2 over many years has been that the
probability of survival is quite high, near 1. When I reported my
experience with NetBSD ffs in this thread, I expressed surprise that
the filesystem was a total loss, based on what preceded the crash. My
surprise was a result of years of Linux experience. I then got some
responses -- see the one from Thor Lancelot Simon, for example. In
that message, he asserts that, in NetBSD, *nothing* pushes meta-data
to the disk for a filesystem mounted async. Others said some
contradictory things about that and I'm not sure what the truth is,
but if Simon is right, then the probability of crash survival in
NetBSD is indeed near zero. Another point that was made was that
NetBSD ffs fsck was not designed to put a damaged filesystem back
together, at least the kind of damage one might encounter with async
mounting. The probability of an async

Re: Lost file-system story

2011-12-11 Thread Matthew Mondor
On Fri, 9 Dec 2011 22:12:25 -0500
Donald Allen donaldcal...@gmail.com wrote:

 Linux systems do periodically write ext2 meta-data to the disk. And
 ext2 fsck has always been very good, and has gotten better over the
 years, due to the efforts of Ted T'so. I first installed Linux in
 1993, almost 20 years ago, and have been using it continuously ever
 since. I have *never* lost an ext2 filesystem and I've never mounted
 one sync.

I'm not sure if it's the case on Linux with ext2, but by default NetBSD
FFS mounts are not sync, nor async; metadata is sync and data blocks
are async.  In async mode, all data is asynchronously written, including
the metadata, and in sync mode everything is written synchronously (the
default OpenBSD uses, if I recall).  I just wanted to specify this as
you mentioned not mounting your ext2 systems in sync mode, but a
default NetBSD FFS mount will not be in sync mode either.

Other available options with FFS are using soft dependencies (softdep)
or WAPBL metadata journalling (log), with which it is possible to have
increased performance VS the default mode, without really sacrificing
reliability, unlike with the fully async mode.  In those modes,
metadata is written asynchronously as well.
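
Written out as illustrative /etc/fstab lines (the device and mount point are
made up), the modes described above are selected like this:

    /dev/wd0e  /data  ffs  rw            0 2   # default: sync metadata, async data
    /dev/wd0e  /data  ffs  rw,sync       0 2   # everything synchronous
    /dev/wd0e  /data  ffs  rw,async      0 2   # everything asynchronous
    /dev/wd0e  /data  ffs  rw,softdep    0 2   # soft dependencies
    /dev/wd0e  /data  ffs  rw,log        0 2   # WAPBL metadata journalling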

Sorry if what I said is already obvious to you,
-- 
Matt


Re: Lost file-system story

2011-12-11 Thread David Holland
On Tue, Dec 06, 2011 at 11:58:25AM -0500, Thor Lancelot Simon wrote:
  With the filesystem mounted async *nothing* pushes out most
  metadata updates,

If this is really true, it's a bug and should be fixed.

-- 
David A. Holland
dholl...@netbsd.org


Fwd: Lost file-system story

2011-12-11 Thread Donald Allen
I should have sent this to the mailing list as well as David. Google
has fixed something that wasn't broke -- gmail. They've introduced a
new UI that I haven't gotten used to yet ...


-- Forwarded message --
From: Donald Allen donaldcal...@gmail.com
Date: Sun, Dec 11, 2011 at 10:23 AM
Subject: Re: Lost file-system story
To: David Holland dholland-t...@netbsd.org


On Sun, Dec 11, 2011 at 8:57 AM, David Holland dholland-t...@netbsd.org wrote:
 On Tue, Dec 06, 2011 at 11:58:25AM -0500, Thor Lancelot Simon wrote:
   With the filesystem mounted async *nothing* pushes out most
   metadata updates,

 If this is really true, it's a bug and should be fixed.

It may very well be true. I just did the following:

I brought up my test laptop, running 5.1 GENERIC, with /home mounted
async,noatime. I created a new file in my home directory. I should
note that when I ZZ'ed out of vi, the disk light flashed momentarily,
and I could hear the disk doing something. I did an ls -lt | head and
the new file was there. I waited just under a minute (to let syncs
happen; this is longer than any of the sysctl vfs.sync.delays, which I
assume are in seconds; the man page doesn't say) and then I pulled the
plug (no battery in the machine). On restart, I got no fsck errors,
but the new file was not in my home directory. I then repeated this
test, waiting a little over a minute this time. Same result, the new
file was gone (this time I got fsck errors). Then I did the test a
third time, but this time I did a sync before pulling the plug. On
restart, I still got some fsck errors that were fixed, but the new
file was present.

This does suggest that the meta-data is not being written, at least
within a minute or so of creating a new file.
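
For anyone repeating the test, it reduces to a few commands run with /home
already mounted async,noatime as above (the vfs.sync values are assumed to be
in seconds, as noted):

    sysctl vfs.sync                  # show the syncer delay knobs
    echo test > ~/newfile            # create a file on the async filesystem
    sleep 90                         # wait well past the largest delay
    # pull the plug here; after reboot and fsck, check whether the file survived:
    ls -l ~/newfile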

One thing I think we have not discussed much or at all in this thread
is that the filesystem surviving a crash and how much data you lose
when it does survive are separate issues. The experiments I did
yesterday demonstrate that a NetBSD ffs async-mounted filesystem,
together with its fsck, *can* survive a crash in bad circumstances --
lots of write activity at the time of the crash. We don't know what
the probability of survival is, just that it's > 0.

What I did yesterday also does not address loss of data. If Simon is
correct and NetBSD is just not writing metadata until sync is
explicitly called, then you could have a system up for days or weeks
and lose as many as all of the files created in an async filesystem
since the last re-boot. We don't know definitively what it's doing
yet, but I think I've demonstrated that it's not writing meta-data
within one minute windows. I will do some more playing with this,
waiting longer and will report what I find. We also know from this
morning's tests that a user-called sync does get the meta-data
written, reducing the amount of data lost in a crash that the
filesystem survives. So those who advocated periodically calling sync
in a loop (Christos first suggested this to me in a private email) are
correct -- it's necessary if you are going to use async mounting.

More later ...

/Don



 --
 David A. Holland
 dholl...@netbsd.org


Re: Lost file-system story

2011-12-11 Thread Donald Allen
On Sun, Dec 11, 2011 at 10:25 AM, Donald Allen donaldcal...@gmail.com wrote:
 I should have sent this to the mailing list as well as David. Google
 has fixed something that wasn't broke -- gmail. They've introduced a
 new UI that I haven't gotten used to yet ...


 -- Forwarded message --
 From: Donald Allen donaldcal...@gmail.com
 Date: Sun, Dec 11, 2011 at 10:23 AM
 Subject: Re: Lost file-system story
 To: David Holland dholland-t...@netbsd.org


 On Sun, Dec 11, 2011 at 8:57 AM, David Holland dholland-t...@netbsd.org 
 wrote:
 On Tue, Dec 06, 2011 at 11:58:25AM -0500, Thor Lancelot Simon wrote:
   With the filesystem mounted async *nothing* pushes out most
   metadata updates,

 If this is really true, it's a bug and should be fixed.

 It may very well be true. I just did the following:

 I brought up my test laptop, running 5.1 GENERIC, with /home mounted
 async,noatime. I created a new file in my home directory. I should
 note that when I ZZ'ed out of vi, the disk light flashed momentarily,
 and I could hear the disk doing something. I did an ls -lt | head and
 the new file was there. I waited just under a minute (to let syncs
 happen; this is longer than any of the sysctl vfs.sync.delays, which I
 assume are in seconds; the man page doesn't say) and then I pulled the
 plug (no battery in the machine). On restart, I got no fsck errors,
 but the new file was not in my home directory. I then repeated this
 test, waiting a little over a minute this time. Same result, the new
 file was gone (this time I got fsck errors). Then I did the test a
 third time, but this time I did a sync before pulling the plug. On
 restart, I still got some fsck errors that were fixed, but the new
 file was present.

 This does suggest that the meta-data is not being written, at least
 within a minute or so of creating a new file.

 One thing I think we have not discussed much or at all in this thread
 is that the filesystem surviving a crash and how much data you lose
 when it does survive are separate issues. The experiments I did
 yesterday demonstrate that a NetBSD ffs async-mounted filesystem,
 together with its fsck, *can* survive a crash in bad circumstances --
 lots of write activity at the time of the crash. We don't know what
 the probability of survival is, just that it's > 0.

 What I did yesterday also does not address loss of data. If Simon is
 correct and NetBSD is just not writing metadata until sync is
 explicitly called, then you could have a system up for days or weeks
 and lose as many as all of the files created in an async filesystem
 since the last re-boot. We don't know definitively what it's doing
 yet, but I think I've demonstrated that it's not writing meta-data
 within one minute windows. I will do some more playing with this,
 waiting longer and will report what I find. We also know from this
 morning's tests that a user-called sync does get the meta-data
 written, reducing the amount of data lost in a crash that the
 filesystem survives. So those who advocated periodically calling sync
 in a loop (Christos first suggested this to me in a private email) are
 correct -- it's necessary if you are going to use async mounting.

I repeated the test without the sync, but waited 15 minutes after
creating the new file before killing the power. When the system came
up, I got fsck errors that were fixed, and the new file I created 15
minutes before pulling the plug was not present. Whether this is
intentional or a bug, I agree with David Holland -- it's wrong and
should be fixed.

/Don


 More later ...

 /Don



 --
 David A. Holland
 dholl...@netbsd.org


Re: Lost file-system story

2011-12-11 Thread Joerg Sonnenberger
On Sun, Dec 11, 2011 at 10:50:29AM -0500, Donald Allen wrote:
 I repeated the test without the sync, but waited 15 minutes after
 creating the new file before killing the power. When the system came
 up, I got fsck errors that were fixed, and the new file I created 15
 minutes before pulling the plug was not present. Whether this is
 intentional or a bug, I agree with David Holland -- it's wrong and
 should be fixed.

I disagree. It is exactly why I use FFS with -o async -- to get
disk-backed storage that doesn't waste resources if everything fits into
memory, but falls back gracefully otherwise.

Joerg


Re: Lost file-system story

2011-12-11 Thread Donald Allen
On Sun, Dec 11, 2011 at 11:04 AM, Joerg Sonnenberger
jo...@britannica.bec.de wrote:
 On Sun, Dec 11, 2011 at 10:50:29AM -0500, Donald Allen wrote:
 I repeated the test without the sync, but waited 15 minutes after
 creating the new file before killing the power. When the system came
 up, I got fsck errors that were fixed, and the new file I created 15
 minutes before pulling the plug was not present. Whether this is
 intentional or a bug, I agree with David Holland -- it's wrong and
 should be fixed.

 I disagree. It is exactly why I use FFS with -o async -- to get a disk
 backed storage, that doesn't waste resources, if everything fits into
 memory, but falls gracefully otherwise.

Certainly a valid requirement, but we haven't talked about what the
fix should be. I think it should have an adjustable sync frequency, so
that the user can turn a knob from "I want to lose as little as
possible" to "I want maximum performance". If I get my wish, you can
use the latter, which should set the frequency to zero.

/Don


 Joerg


Re: Lost file-system story

2011-12-11 Thread Joerg Sonnenberger
On Sun, Dec 11, 2011 at 11:32:51AM -0500, Donald Allen wrote:
 On Sun, Dec 11, 2011 at 11:04 AM, Joerg Sonnenberger
 jo...@britannica.bec.de wrote:
  On Sun, Dec 11, 2011 at 10:50:29AM -0500, Donald Allen wrote:
  I repeated the test without the sync, but waited 15 minutes after
  creating the new file before killing the power. When the system came
  up, I got fsck errors that were fixed, and the new file I created 15
  minutes before pulling the plug was not present. Whether this is
  intentional or a bug, I agree with David Holland -- it's wrong and
  should be fixed.
 
  I disagree. It is exactly why I use FFS with -o async -- to get a disk
  backed storage, that doesn't waste resources, if everything fits into
  memory, but falls gracefully otherwise.
 
 Certainly a valid requirement, but we haven't talked about what the
 fix should be. I think it should have an adjustable sync frequency, so
 that the user can turn a knob from "I want to lose as little as
 possible" to "I want maximum performance". If I get my wish, you can
 use the latter, which should set the frequency to zero.

I don't see the point. Out of order meta updates can fry the file system
at any point. Really, just don't use them if you can't recreate the
file system freely. As has been mentioned elsewhere in the thread, the
default mount option is *not* async.

Joerg


Re: Lost file-system story

2011-12-11 Thread Donald Allen
On Sun, Dec 11, 2011 at 11:44 AM, Joerg Sonnenberger
jo...@britannica.bec.de wrote:
 On Sun, Dec 11, 2011 at 11:32:51AM -0500, Donald Allen wrote:
 On Sun, Dec 11, 2011 at 11:04 AM, Joerg Sonnenberger
 jo...@britannica.bec.de wrote:
  On Sun, Dec 11, 2011 at 10:50:29AM -0500, Donald Allen wrote:
  I repeated the test without the sync, but waited 15 minutes after
  creating the new file before killing the power. When the system came
  up, I got fsck errors that were fixed, and the new file I created 15
  minutes before pulling the plug was not present. Whether this is
  intentional or a bug, I agree with David Holland -- it's wrong and
  should be fixed.
 
  I disagree. It is exactly why I use FFS with -o async -- to get a disk
  backed storage, that doesn't waste resources, if everything fits into
  memory, but falls gracefully otherwise.

 Certainly a valid requirement, but we haven't talked about what the
 fix should be. I think it should have an adjustable sync frequency, so
 that the user can turn a knob from "I want to lose as little as
 possible" to "I want maximum performance". If I get my wish, you can
 use the latter, which should set the frequency to zero.

 I don't see the point. Out of order meta updates can fry the file system
 at any point. Really, just don't use them if you can't recreate the
 file system freely. As has been mentioned elsewhere in the thread, the
 default mount option is *not* async.

Yes, they *can* destroy the filesystem, but in Linux ext2, they rarely
do (see what I've said about this in previous messages in this
thread), and I've started, in a small way, to build a case for NetBSD
ffs and its fsck also having a reasonable probability of surviving a
crash (what really matters is the joint probability of crashing --
very low in the case of Linux over the years -- *and* losing the
filesystem on restart).

As for the knob, it probably doesn't make sense to mount a filesystem
async and then set the knob to sync every 50 milliseconds. One isn't
going to get much of a performance benefit in return for incurring the
risk of async mounting (I would guess that the risk goes down as the
sync frequency goes up, but doesn't go to zero). If safety is one's
orientation, it would probably be better to mount default, sync, or
softdep, or use the new journaling option. But sync'ing every 5
minutes or 10 minutes might well give one the performance benefit that
brought async to consideration in the first place, while likely
limiting lost work to a 5- or 10-minute window. I say "likely"
because, I emphasize again, for the umpteenth time in this discussion,
that I completely understand that async incurs the risk of losing the
whole filesystem. But if NetBSD/ffs/fsck turns out to exhibit the same
behavior as Linux/ext2 has exhibited for years, the joint probability
of crashing and incurring that loss is extremely low. And if it
happens, I can and will deal with that.

As an example, the machine I'm typing this on is running 5.1 with an
/etc/fstab that looks like this:

# NetBSD /etc/fstab
# See /usr/share/examples/fstab/ for more examples.
/dev/wd0a   /           ffs     rw,noatime          1 1
/dev/wd0b   none        swap    sw,dp               0 0
/dev/wd0e   /usr        ffs     rw,noatime          1 2
/dev/wd0f   /var        ffs     rw,noatime          1 2
/dev/wd0g   /home       ffs     rw,noatime,async    1 2
/dev/wd0b   /tmp        mfs     rw,-s=205632
kernfs      /kern       kernfs  rw
ptyfs       /dev/pts    ptyfs   rw
procfs      /proc       procfs  rw
/dev/cd0a   /cdrom      cd9660  ro,noauto

So everything has the default mounting+noatime except /home, which is
noatime,async. I routinely rsync my home directory among my many
machines, so I've got N very up-to-date backups. If I lose /home, not
that big a deal. But if the system crashes and the filesystem is
recovered, I'd like to have the option to make it a smaller deal
still, and be able to define a maximum-loss window, something smaller
than the min(time since last normal reboot, time since last rsync).
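
For what it's worth, the rsync invocation behind those backups is nothing
exotic; something along these lines, with the host name and paths as
placeholders rather than my real setup:

    # mirror the home directory to another machine; -a preserves
    # permissions and timestamps, --delete keeps the copies identical
    rsync -a --delete /home/don/ backuphost:/home/don/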

/Don




 Joerg


Re: Lost file-system story

2011-12-11 Thread Donald Allen
 More later ...

I installed OpenBSD 5.0 on the same machine, similar setup (all
filesystems noatime except /tmp and /home, which are both
async,noatime). I repeated my experiment -- wrote a new file in my
home directory, waited a few minutes, and killed the power. On reboot,
there were complaints from the fscks, async and not, all fixed. The
system came up without a manual fsck and the new file was present in
my directory. So meta-data for async filesystems is being written
within a window of a handful of minutes with OpenBSD.

/Don


Re: Lost file-system story

2011-12-11 Thread Rhialto
On Fri 09 Dec 2011 at 17:40:29 -0500, Donald Allen wrote:
 If I can find the time, I'll do that.

Even a little shell script would do:

#!/bin/sh
while sleep 30; do sync; done
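
If you want it running from boot, something like this in /etc/rc.local
would do (the script path is only an example):

    # start the periodic sync loop in the background at boot
    /usr/local/sbin/syncloop.sh &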

-Olaf.
-- 
___ Olaf 'Rhialto' Seibert  -- There's no point being grown-up if you 
\X/ rhialto/at/xs4all.nl-- can't be childish sometimes. -The 4th Doctor


Re: Lost file-system story

2011-12-11 Thread Andy Ruhl
On Sun, Dec 11, 2011 at 3:21 PM, Donald Allen donaldcal...@gmail.com wrote:
 More later ...

 I installed OpenBSD 5.0 on the same machine, similar setup (all
 filesystems noatime except /tmp and /home, which are both
 async,noatime). I repeated my experiment -- wrote a new file in my
 home directory, waited a few minutes, and killed the power. On reboot,
 there were complaints from the fscks, async and not, all fixed. The
 system came up without a manual fsck and the new file was present in
 my directory. So meta-data for async filesystems is being written
 within a window of a handful of minutes with OpenBSD.

I haven't read every single word you've said about this subject, so I
apologize if I'm missing something.

I assume you're using async because you want better performance and
you have some tolerance for data loss, otherwise this wouldn't even be
a discussion I think.

We're just talking about probabilities of data loss then, correct? For
some people (I suspect, a few that have already answered), this isn't
something they are willing to discuss, even though we all know it's
impossible to get to 1 as you said. But you can get really close
these days.

If solving your problem depends on sync frequency, I don't see why
this shouldn't be managed by some knob to twiddle, provided that the
crash scenario doesn't get worse depending on where the knob is set or
whether the crash happens while the knob is working. If it does, it's
pointless.

Why haven't other solutions been discussed? NetBSD supports ext2. And
raid. And all kinds of other stuff. Why not use it?

Andy


Re: Lost file-system story

2011-12-11 Thread David Holland
On Sun, Dec 11, 2011 at 05:04:23PM +0100, Joerg Sonnenberger wrote:
   I repeated the test without the sync, but waited 15 minutes after
   creating the new file before killing the power. When the system came
   up, I got fsck errors that were fixed, and the new file I created 15
   minutes before pulling the plug was not present. Whether this is
   intentional or a bug, I agree with David Holland -- it's wrong and
   should be fixed.
  
  I disagree. It is exactly why I use FFS with -o async -- to get a disk
  backed storage, that doesn't waste resources, if everything fits into
  memory, but falls gracefully otherwise.

That's as may be, but it's still wrong. The syncer should be writing
out the metadata buffers as well as file data. (For your purpose,
you'd want it to be writing out neither, btw.)

Note the result from OpenBSD; we probably broke it with the UBC merge
and never noticed.

Don't we have at least one filesystem that doesn't support UBC? What
happens to it?

-- 
David A. Holland
dholl...@netbsd.org


Re: Lost file-system story

2011-12-10 Thread Edgar Fuß
My impression is that you are asking for the impossible.

The underlying misconception (which I know very well, having suffered from it
myself) is that a filesystem aims at keeping the on-disc metadata consistent
and that tools like fsck are intended to repair any inconsistencies that happen
nonetheless.

This, I learned, is not true.

The point of synchronous metadata writes, soft dependency metadata write
re-ordering, logging/journaling/WAPBL and whatnot is _not_ to keep the on-disc
metadata consistent. The sole point is to, under all adverse conditions, leave
that metadata in a state that can be _put back_ into a consistent state
(preferably reflecting an in-memory state not too far back from the time of the
crash) by fsck, on-mount journal replay or whatever.
That difference becomes perfectly clear with journalling. After an unclean
shutdown, the on-disc metadata need not be consistent. But the journal enables
putting it back into a consistent state quite easily.
So fsck is not aimed at and does not claim to be able to recover from random
inconsistencies in the on-disc metadata. It is aimed at repairing those
inconsistencies that can occur after a crash _given that the metadata was written
synchronously_.
FreeBSD's background fsck, by the way, is aimed at repairing only those
inconsistencies that can occur given that the metadata was written with softdep's
re-ordering.

Of course, keeping the on-disc metadata in a ``repairable'' state incurs a 
performance penalty.

So you seem to be asking for the File System Holy Grail: a file system that is
as fast as asynchronous metadata writes, yet able to survive any possible kind
of unclean shutdown. Such a thing, to my knowledge, doesn't exist.


Re: Lost file-system story

2011-12-10 Thread Donald Allen
On Sat, Dec 10, 2011 at 1:14 PM, Edgar Fuß e...@math.uni-bonn.de wrote:
 My impression is that you are asking for the impossible.

 The underlying misconception (which I know very well, having suffered from it
 myself) is that a filesystem aims at keeping the on-disc metadata consistent
 and that tools like fsck are intended to repair any inconsistencies that happen
 nonetheless.

 This, I learned, is not true.

 The point of synchronous metadata writes, soft dependency metadata write
 re-ordering, logging/journaling/WAPBL and whatnot is _not_ to keep the
 on-disc metadata consistent. The sole point is to, under all adverse
 conditions, leave that metadata in a state that can be _put back_ into a
 consistent state (preferably reflecting an in-memory state not too far back
 from the time of the crash) by fsck, on-mount journal replay or whatever.
 That difference becomes perfectly clear with journalling. After an unclean
 shutdown, the on-disc metadata need not be consistent. But the journal
 enables putting it back into a consistent state quite easily.
 So fsck is not aimed at and does not claim to be able to recover from random
 inconsistencies in the on-disc metadata. It is aimed at repairing those
 inconsistencies that can occur after a crash _given that the metadata was
 written synchronously_.
 FreeBSD's background fsck, by the way, is aimed at repairing only those
 inconsistencies that can occur given that the metadata was written with
 softdep's re-ordering.

 Of course, keeping the on-disc metadata in a ``repairable'' state incurs a 
 performance penalty.

 So you seem to be asking for the File System Holy Grail: a file system that
 is as fast as asynchronous metadata writes, yet able to survive any possible
 kind of unclean shutdown. Such a thing, to my knowledge, doesn't exist.

I'm sorry, I don't wish to be rude, but you, too, seem not to have
read what I've written carefully. Or, perhaps the fault is mine, that
I simply haven't made myself sufficiently clear. I've talked at length
about the behavior of Linux ext2 and that it was more than acceptable,
both from a standpoint of performance and reliability. I am not
looking for something able to survive any possible kind of unclean
shutdown. I'm looking for a reasonably low joint probability of a
crash occurring *and* losing an async-mounted filesystem as a result.
I simply want an async implementation where the benefit (performance)
is not out-weighed by the risk (lost filesystems) and I cited Linux
ext2 is an example of that. If that's not clear to you, then I'm
afraid I can't do better.


Re: Lost file-system story

2011-12-10 Thread Donald Allen
On Fri, Dec 9, 2011 at 4:33 PM, Brian Buhrow
buh...@lothlorien.nfbcal.org wrote:
        Hello.  Just for your edification, it is possible to break out of fsck
 mid-way and reinvoke it with fsck -y to get it to do the cleaning on its
 own.

This whole discussion, interesting though it may be, may have occurred
simply because of my unfamiliarity with NetBSD and probably a mistake
in not looking at the fsck man page for something like the -y option
when I reached the point where continuing to feed 'y's to fsck after
the original crash seemed like a losing battle. Had I thought about -y
(I know that fscks typically have such an option, but in my experience
it's an optional answer to fsck questions, as OpenBSD's is; for
whatever reason, I didn't think of it), I'd have used it, since I had
nothing to lose at that point. But it's possible you have put your
finger on the real truth of what happened here. Read on.

You suggested trying the experiment I did with OpenBSD with NetBSD,
and so I did. Twice. I installed NetBSD with separate filesystems for
/, /usr, /var, /tmp, and /home, a la OpenBSD's default setup. All,
except /home and /tmp, were mounted softdep,noatime. /home was mounted
async, and /tmp is an in-memory filesystem. The first time, I untarred
the OpenBSD ports.tar.gz (I used it because it was what I used in the
OpenBSD test, it's big, and I had it lying around) into a temporary
directory in my home directory. With the battery removed from the
laptop, I did an

rm -rf ports

and while that was happening, I pulled the power connector.

On restart, fsck found a bunch of things it didn't like about the
/home filesystem, but managed to fix things up to its satisfaction and
declare the filesystem clean. My home directory survived this and,
like OpenBSD, a fair amount of the ports directory was still present.
I then removed it and re-did the untar; while the untar was happening,
I again pulled the plug. This time, the automatic fsck got unhappy
enough to drop me into single-user mode, and I ran fsck there manually. I
again encountered a seemingly never-ending sequence of requests to fix
this and that. So I aborted and used the -y option. It charged through
a bunch of trouble spots and completed. On reboot, I found the same
situation as the first one -- home directory intact and some of the
ports directory present.

I have some thoughts about this:

1. Had I run fsck -y at the time of the first crash, I might well have
found what I found today -- a repaired filesystem that was usable. So
my assertion that the filesystem was lost may well have simply been my
lack of skill as a NetBSD sys-admin.
2. Today's experiment shows that a NetBSD ffs filesystem mounted
async, together with its fsck, *is* capable of surviving even a pretty
brutal improper shutdown -- loss of power while a lot of writing was
happening. Obviously I still don't have enough data to know if the
probability of survival is comparable to Linux ext2, but what I found
today is at least encouraging.

I did one more experiment, and that was to untar the ports tarball,
and then waited about a minute. I then did a sync. The disk light
blinked just for a brief moment. This is a *big* tar file, but it
appears from this easy little test that there was not a huge amount of
dirty stuff sitting in the buffer cache. This is obviously not
definitive, but does suggest that NetBSD is migrating stuff from the
buffer cache back to the disk for async-mounted filesystems in timely
fashion. A look at the code is probably the final arbiter here. I also
note that there are sysctl items, such as vfs.sync.metadelay that I
would like to understand.
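
If those knobs mean what their names suggest, shortening the metadata
delay ought to be a one-liner, though I have not verified that it has any
effect on async mounts, so treat this as a guess rather than a
recommendation:

    # read the current metadata flush delay, then (speculatively) lower it
    sysctl vfs.sync.metadelay
    sysctl -w vfs.sync.metadelay=10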

/Don Allen


Re: Lost file-system story

2011-12-10 Thread Aleksej Saushev
Donald Allen donaldcal...@gmail.com writes:

 On Sat, Dec 10, 2011 at 1:14 PM, Edgar Fuß e...@math.uni-bonn.de wrote:
 Of course, keeping the on-disc metadata in a ``repairable'' state incurs a 
 performance penalty.

 So you seem to be asking for the File System Holy Grail: a file
 system that is as fast as asynchronous metadata writes, yet able to
 survive any possible kind of unclean shutdown. Such a thing, to my
 knowledge, doesn't exist.

 I'm sorry, I don't wish to be rude, but you, too, seem not to have
 read what I've written carefully. Or, perhaps the fault is mine, that
 I simply haven't made myself sufficiently clear. I've talked at length
 about the behavior of Linux ext2 and that it was more than acceptable,
 both from a standpoint of performance and reliability. I am not
 looking for something able to survive any possible kind of unclean
 shutdown. I'm looking for a reasonably low joint probability of a
 crash occurring *and* losing an async-mounted filesystem as a result.
 I simply want an async implementation where the benefit (performance)
 is not outweighed by the risk (lost filesystems), and I cited Linux
 ext2 as an example of that. If that's not clear to you, then I'm
 afraid I can't do better.

I think it should be clear that an async mount excludes what you want.
An async mount basically means that you create a fresh file system after
every boot. In Linux it may mean something else (e.g., it may be less
asynchronous); in the BSDs it means exactly that. Thus, unless you really
can afford to start the file system from scratch, don't mount it async.


-- 
HE CE3OH...



Re: Lost file-system story

2011-12-09 Thread Donald Allen
I just did a little experiment. I installed OpenBSD 5.0 on the same
machine where I had my adventure with NetBSD. This time, I broke up
the world into separate filesystems, which OpenBSD facilitates,
mounting only /home and /tmp async, noatime. All the others were
mounted softdep,noatime. I downloaded ports.tar.gz and un-tarred it
into my home directory (I had previously un-tarred it into /usr). I
then did

rm -rf ports

which takes awhile. While that was going, I hit the power button (I
can afford to lose a filesystem containing only my home directory;
it's backed up thoroughly, because I rsync it from one machine to
another; there are current copies on several other machines). The
system did a rapid shutdown without sync'ing the filesystems.

On restart, all the softdep-mounted filesystems had no errors in fsck,
as one might expect (especially since there was no intensive
write-activity going on when I improperly shut the system down, as
there was in /home), but I got an "Unexpected inconsistency" error on
the /home filesystem and a request for a manual fsck; the system dropped
into single-user mode after the automatic fscks finished. I ran the fsck on
the filesystem that gets mounted as /home and there were a number of
files and directories that were apparently half-deleted and it asked
me one-by-one if I wanted to delete them. I did with a few, but then
used the 'F' option to do so without further interaction (I don't
believe the NetBSD fsck gave me that option; it is not documented in
the NetBSD fsck man page, while it *is* documented in the OpenBSD fsck
man page). The fsck completed and marked the filesystem clean. I
rebooted, everything mounted normally, and a check of my home
directory shows everything intact, even most of the ports directory,
whose deletion I deliberately interrupted.

The async warning in the OpenBSD mount(8) man page reads as follows:

     async   Metadata I/O to the file system should be done
             asynchronously.  By default, only regular data is
             read/written asynchronously.

             This is a dangerous flag to set since it does not
             guarantee to keep a consistent file system structure on
             the disk.  You should not use this flag unless you are
             prepared to recreate the file system should your system
             crash.  The most common use of this flag is to speed up
             restore(8) where it can give a factor of two speed
             increase.

"does not guarantee to keep a consistent file system structure on the
disk" is what I expected from NetBSD. From what I've been told in this
discussion, NetBSD pretty much guarantees that if you use async and
the system crashes, you *will* lose the filesystem if there's been any
writing to it for an arbitrarily long period of time, since apparently
meta-data for async filesystems doesn't get written as a matter of
course. And then there's the matter of NetBSD fsck apparently not
really being designed to cope with the mess left on the disk after
such a crash. Please correct me if I've misinterpreted what's been
said here (there have been a few different stories told, so I'm trying
to compute the mean).

I am not telling the OpenBSD story to rub NetBSD people's noses in it.
I'm simply pointing out that that system appears to be an example of
ffs doing what I thought it did and what I know ext2 and journal-less
ext4 do -- do a very good job of putting the world into operating
order (without offering an impossible guarantee to do so) after a
crash when async is used, after having been told that ffs and its fsck
were not designed to do this. The reason I'm beating on this is that I
would have liked to use NetBSD for the application I have in mind, but
I need the performance improvement that async provides (my tests show
this; the tests also show that NetBSD async is about as fast as Linux,
much faster than OpenBSD async, at least for doing a lot of writing,
such as un-tarring a large tar file). This is practical if the joint
probability of the system crashing *and* losing the async filesystem
is low. My one little data point was discouraging -- using a wireless
card with a common chipset (Atheros) resulted in losing my network
connection and then a system crash when a restart of networking was
attempted (and I had to use the Atheros card because the system didn't
pick up the built-in Cisco wireless device, which I think is supposed
to be served by the an(4) driver). The
crash took out the filesystem, as we've been discussing.

So I'd love it if my experience encourages someone to improve NetBSD
ffs and fsck to make use of async practical, perhaps by drawing on
what OpenBSD has done. I also realize that my situation is unusual,
and with resources being scarce, there are a lot more important things
to work on, that will affect a lot more people. But I'd at least like
to get it in the queue.


Re: Lost file-system story

2011-12-09 Thread Brian Buhrow
Hello.  Just for your edification, it is possible to break out of fsck
mid-way and reinvoke it with fsck -y to get it to do the cleaning on its
own.
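
For example (the raw device name below is only a placeholder for whatever
holds the damaged filesystem):

    # abort the interactive run, then let fsck answer "yes" to everything
    fsck -y /dev/rwd0g
    # -f forces a check even if the filesystem is already marked clean
    fsck -fy /dev/rwd0g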

With regard to your notes on speed with NetBSD versus OpenBSD,  I
suspect the speed trade off is where the difference is.  OpenBSD is
flushing buffers to disk more frequently than NetBSD is, and thus the
filesystem is more complete with respect to what is on disk.  Since you
readily admit that you are a rare case, might I suggest that there may be
an easy way for you to have your cake and eat it too: that is, get the
speed and performance of NetBSD with the relative reliability (which may
have been luck -- I'm not sure) of OpenBSD.  You could write yourself a
little program, or find an old version of update(8) from old source trees,
which runs as a daemon and calls sync(2) every n seconds where n is what
ever comfort level you deem appropriate.  I believe that when you call
sync(2), even async mounted filesystem data is flushed.  With that program
running, I'd be interested in having you retry your experiment with NetBSD
and see if your results differ.

-Brian
On Dec 9,  3:50pm, Donald Allen wrote:
} Subject: Re: Lost file-system story
} I just did a little experiment. I installed OpenBSD 5.0 on the same
} machine where I had my adventure with NetBSD. This time, I broke up
} the world into separate filesystems, which OpenBSD facilitates,
} mounting only /home and /tmp async, noatime. All the others were
} mounted softdep,noatime. I downloaded ports.tar.gz and un-tarred it
} into my home directory (I had previously un-tarred it into /usr). I
} then did
} 
} rm -rf ports
} 
} which takes awhile. While that was going, I hit the power button (I
} can afford to lose a filesystem containing only my home directory;
} it's backed up thoroughly, because I rsync it from one machine to
} another; there are current copies on several other machines). The
} system did a rapid shutdown without sync'ing the filesystems.
} 
} On restart, all the softdep-mounted filesystems had no errors in fsck,
} as one might expect (especially since there was no intensive
} write-activity going on when I improperly shut the system down, as
} there was in /home), but I got an Unexpected inconsistency error in
} my home directory and requested a manual fsck; the system dropped into
} single-user mode after the automatic fscks finished. I ran the fsck on
} the filesystem that gets mounted as /home and there were a number of
} files and directories that were apparently half-deleted and it asked
} me one-by-one if I wanted to delete them. I did with a few, but then
} used the 'F' option to do so without further interaction (I don't
} believe the NetBSD fsck gave me that option; it is not documented in
} the NetBSD fsck man page, while it *is* documented in the OpenBSD fsck
} man page). The fsck completed and marked the filesystem clean. I
} rebooted, everything mounted normally, and a check of my home
} directory shows everything intact, even most of the ports directory,
} whose deletion I deliberately interrupted.
} 
} The async warning in the OpenBSD mount page reads as follows:
} 
} async   Metadata I/O to the file system should be done
}  asynchronously.  By default, only regular data is
}  read/written asynchronously.
} 
}  This is a dangerous flag to set since it does not
}  guarantee to keep a consistent file system structure on
}  the disk.  You should not use this flag unless you are
}  prepared to recreate the file system should your system
}  crash.  The most common use of this flag is to speed up
}  restore(8) where it can give a factor of two speed
}  increase.
} 
} does not guarantee to keep a consistent file system structure on the
} disk is what I expected from NetBSD. From what I've been told in this
} discussion, NetBSD pretty much guarantees that if you use async and
} the system crashes, you *will* lose the filesystem if there's been any
} writing to it for an arbitrarily long period of time, since apparently
} meta-data for async filesystems doesn't get written as a matter of
} course. And then there's the matter of NetBSD fsck apparently not
} really being designed to cope with the mess left on the disk after
} such a crash. Please correct me if I've misinterpreted what's been
} said here (there have been a few different stories told, so I'm trying
} to compute the mean).
} 
} I am not telling the OpenBSD story to rub NetBSD peoples' noses in it.
} I'm simply pointing out that that system appears to be an example of
} ffs doing what I thought it did and what I know ext2 and journal-less
} ext4 do -- do a very good job of putting the world into operating
} order (without offering an impossible guarantee to do so) after a
} crash when async is used, after

Re: Lost file-system story

2011-12-09 Thread Matthew Mondor
On Fri, 9 Dec 2011 15:50:35 -0500
Donald Allen donaldcal...@gmail.com wrote:

 were not designed to do this. The reason I'm beating on this is that I
 would have liked to use NetBSD for the application I have in mind, but
 I need the performance improvement that async provides (my tests show
 this; the tests also show that NetBSD async is about as fast as Linux,
 much faster than OpenBSD async, at least for doing a lot of writing,
 such as un-tarring a large tar file). This is practical if the joint

The speed and reliability WAPBL provides have been enough for my uses
personally; are the few seconds saved using async really worth the
trouble?  Also, if raw speed is needed to do many installations on
identical systems, dd with large blocks to mirror the system might be a
faster alternative...
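
Something along these lines, with the source and target disks as
placeholders and a block size large enough to keep both drives streaming:

    # clone one whole disk onto an identical one via the raw devices
    dd if=/dev/rwd0d of=/dev/rwd1d bs=1m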

I'm not arguing that fsck shouldn't be able to recover though; it
ideally should, but the problem seems to be that too much metadata is
missing when crashing while writing in async mode.

OpenBSD's async mode would be slightly slower while flushing metadata
more often, probably.  Perhaps having a sysctl to control flushing
would be a good thing, though.

Thanks,
-- 
Matt


Re: Lost file-system story

2011-12-09 Thread Donald Allen
On Fri, Dec 9, 2011 at 4:33 PM, Brian Buhrow
buh...@lothlorien.nfbcal.org wrote:
        Hello.  Just for your edification, it is possible to break out of fsck
 mid-way and reinvoke it with fsck -y to get it to do the cleaning on its
 own.

        With regard to your notes on speed with NetBSD versus OpenBSD,  I
 suspect the speed trade off is where the difference is.  OpenBSD is
 flushing buffers to disk more frequently than NetBSD is, and thus the
 filesystem is more complete with respect to what is on disk.

I suspect that is due to OpenBSD's lack of a unified buffer cache,
which NetBSD has. So they run out of space in the buffer cache, even
though memory devoted to (empty) page-frames is available.

 Since you
 readily admit that you are a rare case, might I suggest that there may be
 an easy way for you to have your cake and eat it too. That is, get the
 speed and performance of NetBSD with the relative reliability, which may
 have been luck -- I'm not sure, with OpenBSD.  You could write yourself a
 little program, or find an old version of update(8) from old source trees,
 which runs as a daemon and calls sync(2) every n seconds where n is what
 ever comfort level you deem appropriate.  I believe that when you call
 sync(2), even async mounted filesystem data is flushed.  With that program
 running, I'd be interested in having you retry your experiment with NetBSD
 and see if your results differ.

If I can find the time, I'll do that.


 -Brian
 On Dec 9,  3:50pm, Donald Allen wrote:
 } Subject: Re: Lost file-system story
 } I just did a little experiment. I installed OpenBSD 5.0 on the same
 } machine where I had my adventure with NetBSD. This time, I broke up
 } the world into separate filesystems, which OpenBSD facilitates,
 } mounting only /home and /tmp async, noatime. All the others were
 } mounted softdep,noatime. I downloaded ports.tar.gz and un-tarred it
 } into my home directory (I had previously un-tarred it into /usr). I
 } then did
 }
 } rm -rf ports
 }
 } which takes awhile. While that was going, I hit the power button (I
 } can afford to lose a filesystem containing only my home directory;
 } it's backed up thoroughly, because I rsync it from one machine to
 } another; there are current copies on several other machines). The
 } system did a rapid shutdown without sync'ing the filesystems.
 }
 } On restart, all the softdep-mounted filesystems had no errors in fsck,
 } as one might expect (especially since there was no intensive
 } write-activity going on when I improperly shut the system down, as
 } there was in /home), but I got an Unexpected inconsistency error in
 } my home directory and requested a manual fsck; the system dropped into
 } single-user mode after the automatic fscks finished. I ran the fsck on
 } the filesystem that gets mounted as /home and there were a number of
 } files and directories that were apparently half-deleted and it asked
 } me one-by-one if I wanted to delete them. I did with a few, but then
 } used the 'F' option to do so without further interaction (I don't
 } believe the NetBSD fsck gave me that option; it is not documented in
 } the NetBSD fsck man page, while it *is* documented in the OpenBSD fsck
 } man page). The fsck completed and marked the filesystem clean. I
 } rebooted, everything mounted normally, and a check of my home
 } directory shows everything intact, even most of the ports directory,
 } whose deletion I deliberately interrupted.
 }
 } The async warning in the OpenBSD mount page reads as follows:
 }
 }             async   Metadata I/O to the file system should be done
 }                      asynchronously.  By default, only regular data is
 }                      read/written asynchronously.
 }
 }                      This is a dangerous flag to set since it does not
 }                      guarantee to keep a consistent file system structure on
 }                      the disk.  You should not use this flag unless you are
 }                      prepared to recreate the file system should your system
 }                      crash.  The most common use of this flag is to speed up
 }                      restore(8) where it can give a factor of two speed
 }                      increase.
 }
 } does not guarantee to keep a consistent file system structure on the
 } disk is what I expected from NetBSD. From what I've been told in this
 } discussion, NetBSD pretty much guarantees that if you use async and
 } the system crashes, you *will* lose the filesystem if there's been any
 } writing to it for an arbitrarily long period of time, since apparently
 } meta-data for async filesystems doesn't get written as a matter of
 } course. And then there's the matter of NetBSD fsck apparently not
 } really being designed to cope with the mess left on the disk after
 } such a crash. Please correct me if I've misinterpreted what's been
 } said here (there have been a few different stories told, so I'm trying
 } to compute the mean).
 }
 } I am

Re: Lost file-system story

2011-12-09 Thread Greg A. Woods
At Tue, 6 Dec 2011 12:44:16 -0500, Donald Allen donaldcal...@gmail.com wrote:
Subject: Re: Lost file-system story
 
 much more clear. When I read this before the fun started, I took it to
 mean, perhaps unjustifiably, what I know to be true -- there is some
 non-zero probability that fsck of an async file-system will not be
 able to verify and/or restore the filesystem to correctness  after a
 crash. You are saying that the probability, in the case of NetBSD, is
 1. If that's true, that there's no periodic sync, I would say that's
 *really* a mistake. It should be there with a knob the administrator
 can turn to adjust the sync frequency.

Just to be clear:  There is such a knob, or rather binary switch.  It's
called umount(2).

sync(2) might work too, but I seem to vaguely remember something about
it not working for async-mounted filesystems, and some obscure reason
why it wouldn't/couldn't work for them, though that doesn't seem logical
to me any more.  sync(2) should, IMHO, even go so far as to cause the
dirty flag to be cleared on the disk once all the writes to flush all
necessary updates have completed (and assuming of course that no further
changes of any kind are made to the filesystem after sync(2) scheduled
all the writes, and assuming of course that writes cached in the storage
interface controller or in the drive controller will be written out in
order).

In theory mount -u -r should work too, but then there's PR#30525.
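
That is, something like the following on the mount point in question
(/home is only an example):

    # flush everything and detach the async filesystem
    umount /home
    # or, in theory, downgrade it to read-only in place -- but see PR#30525
    mount -u -r /home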

Steve Bellovin asked a question some time ago on netbsd-users about why
umount(2) works, but mount -u -r doesn't, and to the best of my
understanding it hasn't been answered yet (though mention was made of a
possible fix to be found in FreeBSD, followed by some musings on how
hard it is to find and use such fixes in the diverging code bases of
FreeBSD and NetBSD).

Perhaps sync(2) will fail for async-mounted filesystems, or even without
MNT_ASYNC, for the same reason that mount -u -r fails, though that's
pure speculation based on my vague ideas, and is not based on anything
in the code.  The question was asked in PR#30525 about mount -u -r
vs. filesystems mounted with MNT_SYNC, but nobody knew if that would
make any significant difference or not (and I would naively suspect not).

Perhaps the superblock should also record when a filesystem has been
mounted with MNT_ASYNC so that fsck(8) can print a warning such as:

FS is dirty and was mounted async.  Demons will fly out of your nose

-- 
Greg A. Woods
Planix, Inc.

wo...@planix.com   +1 250 762-7675http://www.planix.com/


pgppoVyhhnBug.pgp
Description: PGP signature


Re: Lost file-system story

2011-12-09 Thread Greg A. Woods
At Fri, 9 Dec 2011 15:50:35 -0500, Donald Allen donaldcal...@gmail.com wrote:
Subject: Re: Lost file-system story
 
 does not guarantee to keep a consistent file system structure on the
 disk is what I expected from NetBSD. From what I've been told in this
 discussion, NetBSD pretty much guarantees that if you use async and
 the system crashes, you *will* lose the filesystem if there's been any
 writing to it for an arbitrarily long period of time, since apparently
 meta-data for async filesystems doesn't get written as a matter of
 course.

I'm not sure what the difference is.  You seem to be quibbling over
minor differences and perhaps one-off experiences.  Both OpenBSD and
NetBSD also say that you should not use the async flag unless you are
prepared to recreate the file system from scratch if your system
crashes.  That means use newfs(8) [and, by implication, something like
restore(8)], not fsck(8), to recover after a crash.  You got lucky with
your test on OpenBSD.


 And then there's the matter of NetBSD fsck apparently not
 really being designed to cope with the mess left on the disk after
 such a crash. Please correct me if I've misinterpreted what's been
 said here (there have been a few different stories told, so I'm trying
 to compute the mean).

That's been true of Unix (and many unix-like) filesystems and their
fsck(8) commands since the beginning of Unix.

fsck(8) is designed to rely on the possible states of on-disk filesystem
metadata because that's how Unix-based filesystems have been guaranteed
to work (barring use of MNT_ASYNC, obviously).

And that's why by default, and by very strong recommendation, filesystem
metadata for Unix-based filesystems (sans WAPBL) should always be
written synchronously to the disk if you ever hope to even try to use
fsck(8).


 I am not telling the OpenBSD story to rub NetBSD peoples' noses in it.
 I'm simply pointing out that that system appears to be an example of
 ffs doing what I thought it did and what I know ext2 and journal-less
 ext4 do -- do a very good job of putting the world into operating
 order (without offering an impossible guarantee to do so) after a
 crash when async is used, after having been told that ffs and its fsck
 were not designed to do this.

You seem to be very confused about what MNT_ASYNC is and is not.  :-)

Unix filesystems, including the Berkeley Fast File System variant, have
never made any guarantees about the recoverability of an async-mounted
filesystem after a crash.

You seem to have inferred some impossible capability based on your
experience with other non-Unix filesystems that have a completely
different internal structure and implementation from the Unix-based
filesystems in NetBSD.

Perhaps the BSD manuals have assumed some knowledge of Unix history, but
even the NetBSD-1.6 mount(8) manual, from 2002, is _extremely_ clear
about the dangers of the async flag, with strong emphasis in the
formatted text on the relevant warning:

 async   All I/O to the file system should be done asyn-
 chronously.  In the event of a crash, _it_is_
 _impossible_for_the_system_to_verify_the_integrity_of_
 _data_on_a_file_system_mounted_with_this_option._  You
 should only use this option if you have an applica-
 tion-specific data recovery mechanism, or are willing
 to recreate the file system from scratch.

According to CVS that wording has not changed since October 1, 2002, and
the emphasised text has been there unchanged since September 16, 1998.

 So I'd love it if my experience encourages someone to improve NetBSD
 ffs and fsck to make use of async practical

As others have already said, this has already been done.  It's called
WAPBL. See wapbl(4) for more information. Use mount -o log to enable
it.
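
For example, either at mount time or via the options field in fstab (the
device and mount point are placeholders):

    # enable the WAPBL journal on an existing ffs
    mount -o log /dev/wd0g /home

    # or the equivalent fstab entry
    /dev/wd0g   /home   ffs   rw,log,noatime   1 2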

(BTW, I personally don't think you would want to use softdep -- it can
suffer almost as badly as async after a crash, though perhaps without
totally invalidating fsck(8)'s ability to at least recover files and
directories which were static since mount; and it does also offer vastly
improved performance in many use cases, but as the manual says, it
should still be used with care (i.e. recognition of the risks of
less-tested, much more complex code, and vastly changed internal
implementation semantics implying radically different recovery modes).)

-- 
Greg A. Woods
Planix, Inc.

wo...@planix.com   +1 250 762-7675http://www.planix.com/


pgp7bEgL4qiOc.pgp
Description: PGP signature


Re: Lost file-system story

2011-12-07 Thread Manuel Bouyer
On Wed, Dec 07, 2011 at 10:21:14AM +1100, Simon Burge wrote:
 David Holland wrote:
 
  There is at least one known structural problem where atime/mtime
  updates do not get applied to buffers (but are instead saved up
  internally) so they don't get written out by the syncer.
  
  We believe this is what causes those unmount-time writes, or at least
  many of them.
 
 I understand the delayed atime writes were added to reduce the number
 of times a laptop harddisk spins up.  I've often wondered if a simple
 sysctl could be added to control this.
 
 Unmounting my /home on my main machine takes approximately a minute.  My


Seconded. On a ftp server with a large filesystem (5TB, 5M inodes), shutdown
takes a very long time too.

-- 
Manuel Bouyer bou...@antioche.eu.org
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Lost file-system story

2011-12-07 Thread Joerg Sonnenberger
On Wed, Dec 07, 2011 at 10:54:40AM +0100, Manuel Bouyer wrote:
 On Wed, Dec 07, 2011 at 10:21:14AM +1100, Simon Burge wrote:
  David Holland wrote:
  
   There is at least one known structural problem where atime/mtime
   updates do not get applied to buffers (but are instead saved up
   internally) so they don't get written out by the syncer.
   
   We believe this is what causes those unmount-time writes, or at least
   many of them.
  
   I understand the delayed atime writes were added to reduce the number
  of times a laptop harddisk spins up.  I've often wondered if a simple
  sysctl could be added to control this.
  
  Unmounting my /home on my main machine takes approximately a minute.  My
 
 
 Seconded. On a ftp server with a large filesystem (5TB, 5M inodes), shutdown
 takes a very long time too.

Isn't that more the issue of writing out the atime updates?

Joerg


Re: Lost file-system story

2011-12-07 Thread Manuel Bouyer
On Wed, Dec 07, 2011 at 10:59:11AM +0100, Joerg Sonnenberger wrote:
 On Wed, Dec 07, 2011 at 10:54:40AM +0100, Manuel Bouyer wrote:
  On Wed, Dec 07, 2011 at 10:21:14AM +1100, Simon Burge wrote:
   David Holland wrote:
   
There is at least one known structural problem where atime/mtime
updates do not get applied to buffers (but are instead saved up
internally) so they don't get written out by the syncer.

We believe this is what causes those unmount-time writes, or at least
many of them.
   
    I understand the delayed atime writes were added to reduce the number
   of times a laptop harddisk spins up.  I've often wondered if a simple
   sysctl could be added to control this.
   
   Unmounting my /home on my main machine takes approximately a minute.  My
  
  
  Seconded. On a ftp server with a large filesystem (5TB, 5M inodes), shutdown
  takes a very long time too.
 
 Isn't that more the issue of writing out the atime updates?

Yes, that's it. Wasn't Simon talking about this?

-- 
Manuel Bouyer bou...@antioche.eu.org
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Lost file-system story

2011-12-07 Thread Simon Burge
Manuel Bouyer wrote:

 On Wed, Dec 07, 2011 at 10:59:11AM +0100, Joerg Sonnenberger wrote:
  On Wed, Dec 07, 2011 at 10:54:40AM +0100, Manuel Bouyer wrote:
   On Wed, Dec 07, 2011 at 10:21:14AM +1100, Simon Burge wrote:
David Holland wrote:

 There is at least one known structural problem where atime/mtime
 updates do not get applied to buffers (but are instead saved up
 internally) so they don't get written out by the syncer.
 
 We believe this is what causes those unmount-time writes, or at least
 many of them.

I understand the delayed atime writes were added to reduce the number
of times a laptop harddisk spins up.  I've often wondered if a simple
sysctl could be added to control this.

Unmounting my /home on my main machine takes approximately a minute.  My
   
   
   Seconded. On a ftp server with a large filesystem (5TB, 5M inodes), 
   shutdown
   takes a very long time too.
  
  Isn't that more the issue of writing out the atime updates?
 
 Yes, that's it. Wasn't Simon talking about this ?

Yes.  We all appear to be in total agreement here :)

Cheers,
Simon.


Re: Lost file-system story

2011-12-07 Thread Ignatios Souvatzis
On Tue, Dec 06, 2011 at 12:44:16PM -0500, Donald Allen wrote:
 On Tue, Dec 6, 2011 at 11:58 AM, Thor Lancelot Simon t...@panix.com wrote:
  On Tue, Dec 06, 2011 at 11:10:44AM -0500, Donald Allen wrote:
 
  2. I'm a little bit surprised that the filesystem was as much of a
  mess as it was.
 
  I'm not.  You mounted the filesystem async and had a crash.  With the
  filesystem mounted async *nothing* pushes out most metadata updates,
  with the result that the filesystem's metadata can quickly enter a
  fatally inconsistent state.  The only way home safe is a clean unmount.
 
 So unwritten meta-data from an async filesystem can sit in the buffer
 cache for arbitrarily long periods of time in NetBSD? I just want to
 be sure I understand what you are saying. Because that essentially
 guarantees, as you imply above, that if the system crashes, you will
 lose the filesystem. That makes the following warning, in the mount(8)
 man page, in the description of the async option:
 
 In the event of a crash, it is impossible for the system to verify
 the integrity of data on a file system mounted with this option.
 
 much more clear. When I read this before the fun started, I took it to

You left out part of the warning. From NetBSD 5.1:

   async   All I/O to the file system should be done asyn-
   chronously.  In the event of a crash, it is
   impossible for the system to verify the integrity of
   data on a file system mounted with this option.  You
   should only use this option if you have an applica-
   tion-specific data recovery mechanism, or are willing
   to recreate the file system from scratch.

Isn't the last sentence of that paragraph in your version?

Basically, there are two situations where -o async on ffs is sort of safe:

a) you're installing or restore(8)ing on a freshly newfs'd filesystem,
   plan to unmount (or shutdown) as soon as you're finished before using 
   the file system, and could do that again, with the same source data,
   in the event of a power failure during the operation; you get the
   benefit of a fast installation/restore (see the sketch after this list).

b) the file system is on volatile memory and would be gone anyway on
   shutdown, crash, or power failure.
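
A minimal sketch of case a), with the device, mount point, and dump file
as placeholders:

    # build a fresh filesystem, fill it as fast as possible, then unmount
    newfs /dev/rwd0g
    mount -o async /dev/wd0g /mnt
    cd /mnt && restore -rf /path/to/home.dump
    cd / && umount /mnt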

FFS in its default mode has been designed to do some of its operations
in an async fashion, but to guarantee enough writes that the remaining 
inconsistencies after a power failure can be cleaned up by fsck_ffs. 
fsck_ffs is designed for this task. It's not designed for arbitrary 
repairs.

(When ffs does async, it really does it.)

Regards,
-is


Re: Lost file-system story

2011-12-07 Thread Donald Allen
On Wed, Dec 7, 2011 at 9:58 AM, Ignatios Souvatzis pre...@ycm-bonn.de wrote:
 On Tue, Dec 06, 2011 at 12:44:16PM -0500, Donald Allen wrote:
 On Tue, Dec 6, 2011 at 11:58 AM, Thor Lancelot Simon t...@panix.com wrote:
  On Tue, Dec 06, 2011 at 11:10:44AM -0500, Donald Allen wrote:
 
  2. I'm a little bit surprised that the filesystem was as much of a
  mess as it was.
 
  I'm not.  You mounted the filesystem async and had a crash.  With the
  filesystem mounted async *nothing* pushes out most metadata updates,
  with the result that the filesystem's metadata can quickly enter a
  fatally inconsistent state.  The only way home safe is a clean unmount.

 So unwritten meta-data from an async filesystem can sit in the buffer
 cache for arbitrarily long periods of time in NetBSD? I just want to
 be sure I understand what you are saying. Because that essentially
 guarantees, as you imply above, that if the system crashes, you will
 lose the filesystem. That makes the following warning, in the mount(8)
 man page, in the description of the async option:

 In the event of a crash, it is impossible for the system to verify
 the integrity of data on a file system mounted with this option.

 much more clear. When I read this before the fun started, I took it to

 You left out part of the warning. From NetBSD 5.1:

   async       All I/O to the file system should be done asyn-
               chronously.  In the event of a crash, it is
               impossible for the system to verify the integrity of
               data on a file system mounted with this option.  You
               should only use this option if you have an applica-
               tion-specific data recovery mechanism, or are willing
               to recreate the file system from scratch.

 Isn't the last sentence of that paragraph in your version?

No. My version says "If you use this option and the system crashes,
everything will be fine."

/Don


Lost file-system story

2011-12-06 Thread Donald Allen
I recently installed NetBSD 5.1 on an old Thinkpad T41 that I use for
experimentation. I installed it with a single, monolithic filesystem,
which I mounted async,noatime. Yes, I'm fully aware that's dangerous
and was aware of it at the time. But  I have a long history of
running Linux systems with ext2 filesystems and now, journal-less ext4
filesystems, and in all the years of running those systems, where no
particular care is taken to write file-system meta-data in ordered
fashion, I have never lost a file-system. Linux crashes are extremely
rare, my systems are either laptops or on UPSes, and I never do
something as stupid as just whacking the power-button to shut them
down. On the rare occasions when a file-system has suffered an
improper shutdown, fsck has always been able to recover with little or
no damage. (I should perhaps mention that I'm retired now, having had
a long career in software development, with a lot of OS development
experience -- IBM CP/67, Tenex, TOPS20, Unix (Mach), and a LOT of
Linux sys-admin experience; less with the BSDs, but not zero).

The T41 has built-in Aironet Wireless Communications MPI350 wireless
hardware. The GENERIC 5.1 kernel did not see this device at boot time,
so no wireless. To fix this, I stuck an Atheros-based PCMCIA card in
the machine, which did work. I was attempting to build Gnucash via
pkgsrc on the T41 and had left the machine grinding away overnight
(webkit is one of Gnucash's dependencies, and it's huge). It had
finished the build when I got up the following morning and I installed
gnucash and then did a
bunch of cleaning-up in /usr/pkgsrc. I then tried to use firefox and
found that my network connection was dead. So I did a

 /etc/rc.d/network restart

and the system froze, completely dead.

Upon restart, the automatic fsck gave up and requested a manual fsck.
I tried that, but there are just too many things broken, a
consequence, I'm sure, of running async and having this crash occur
just after having done a lot of filesystem writing. The situation was
so bad, I had to abandon this install.

There are two issues here:

1. It looks like there's a bug in the Atheros driver.
2. I'm a little bit surprised that the filesystem was as much of a
mess as it was.

I mentioned all this to old friend Christos Zoulas and he suggested
that I post this message. It is certainly true that I had done a lot
of writing to the filesystem (as a result of my pkgsrc cleanup) and
that had occurred within, say, 10 minutes of the crash, maybe less. So
it wasn't hours. But it also wasn't seconds. My Linux experience, and
this is strictly gut feel -- I have no hard evidence to back this up
-- tells me that if this had happened on a Linux system with an async,
unjournaled filesystem, the filesystem would have survived. In
suggesting that I post this, Christos mentioned that he's seen
situations where a lot of writing happened in a session (e.g., a
kernel build) and then the sync at shutdown time took a long time,
which has made him somewhat suspicious that there might be a problem
with the trickle sync that the kernel is supposed to be doing.

So my purpose in posting this is to ask: after doing 'make clean's of
perhaps 15 or 20 packages and their dependencies, what is your
estimate of the maximum time before everything gets safely written out
of the buffer cache (this machine has a 1.6 Ghz Pentium M, 2 GB of
memory, and a 7200 rpm 60 GB pata disk -- yes, not a normal
configuration for a T41; I stuck the memory and disk in this machine
taken from another, dead Thinkpad I have)? Is it seconds? Tens of
seconds? Minutes? If it's small, then I would suggest that a kernel
wizard have a look at the trickle sync stuff. I made the point to
Christos that I'm probably one of a very small number, maybe one, who
would mount the whole world async (and please, no lectures; I knew the
risk going in; this was an experiment and I knew it could end badly; I
did not have 10 years worth of un-backed-up financial data on this
machine :-), and it is almost certainly true that if the filesystem
had been mounted sync or softdep, it would have survived the crash. So
if there's a problem with trickle sync, it would only have
catastrophic consequences in the very rare case of someone doing what
I did (mounting async, doing a lot of writing followed by a system
crash). I'm trying to make the argument that there could be a problem
that is benign in 99.99% of the NetBSD setups, and so you haven't
heard about it.
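
(A crude way to put a number on it, sketched rather than tested -- the
60-second idle period is arbitrary, just comfortably longer than the
syncer's nominal cycle:

    # right after the pkgsrc cleanup finishes, let the machine idle,
    # then see how much is still waiting in the buffer cache:
    sleep 60
    time sync    # if trickle sync is keeping up, this returns almost instantly
)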

/Don Allen


Re: Lost file-system story

2011-12-06 Thread Greg Troxel

Interesting situation.  I agree that after 30s to a minute most
things should have been flushed.

As a side note, it would be interesting to benchmark async vs wapbl.

I have never really looked, but it has always seemed that it would be
nice to have:

  statistics visibility into the number of dirty buffers/etc. in
  various caches

  a way to force flushes and clear caches (individually)

Specifically, I think it would be great if 'systat vmstat' had a count
of dirty buffers.

Perhaps this is doable now and I just don't know how.
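
(Part of it may be doable already; a sketch, assuming a 5.x userland,
and I'm not certain either tool breaks out dirty buffers specifically:

    systat bufcache    # per-file-system view of buffer cache usage
    sync               # force a global flush of whatever is dirty
)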


Another question is whether the disk had write caching enabled, but I
would also expect it to flush the write cache quickly.
It would be nice to have visibility into that cache, but I don't know
if the ATA interface supports it.
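
(dkctl(8) can at least poke at the drive's own cache, if my memory of
it is right; a sketch, with wd0 standing in for whatever the disk is:

    dkctl wd0 getcache     # report whether the read/write caches are on
    dkctl wd0 setcache r   # leave only the read cache enabled
    dkctl wd0 synccache    # ask the drive to flush its write cache now
)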




Re: Lost file-system story

2011-12-06 Thread Donald Allen
On Tue, Dec 6, 2011 at 11:10 AM, Donald Allen donaldcal...@gmail.com wrote:
[deleted]

 catastrophic consequences in the very rare case of someone doing what
 I did (mounting async, doing a lot of writing followed by a system
 crash). I'm trying to make the argument that there could be a problem
 that is benign in 99.99% of the NetBSD setups, and so you haven't
 heard about it.

I should amend this a bit. By 'benign' above, I meant that you
wouldn't lose the filesystem. But if trickle-sync is working too
slowly or not at all, I would think that, in the event of a crash
preceded by writes to a softdep-mounted filesystem, more data could be
lost than if trickle-sync were working as intended. Which wouldn't
feel so benign if it happened to you.

/Don


Re: Lost file-system story

2011-12-06 Thread Thor Lancelot Simon
On Tue, Dec 06, 2011 at 11:10:44AM -0500, Donald Allen wrote:
 
 2. I'm a little bit surprised that the filesystem was as much of a
 mess as it was.

I'm not.  You mounted the filesystem async and had a crash.  With the
filesystem mounted async *nothing* pushes out most metadata updates,
with the result that the filesystem's metadata can quickly enter a
fatally inconsistent state.  The only safe way home is a clean unmount.

If you mount an FFS filesystem async you are playing with fire.  Sure,
it can be useful, but asbestos clothing is not optional.

Thor


Re: Lost file-system story

2011-12-06 Thread David Holland
On Tue, Dec 06, 2011 at 11:10:44AM -0500, Donald Allen wrote:
  My Linux experience, and this is strictly gut feel -- I have no
  hard evidence to back this up -- tells me that if this had happened
  on a Linux system with an async, unjournaled filesystem, the
  filesystem would have survived.

Yes, it likely would have, at least if that filesystem was ext2fs.

There is at least one issue beyond bugs though: ext2's fsck is
written to cope with this situation. The ffs fsck isn't, and so it
makes unwarranted assumptions and gets itself into trouble, sometimes
even into infinite repair loops. (That is, where you can 'fsck -fy'
over and over again and it'll never reach a clean state.)
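
(One way to spot that failure mode is to repeat the forced check and
watch whether it keeps changing things; a sketch only -- the device
name and pass limit are arbitrary, and it keys off the usual
WAS MODIFIED marker in fsck's output:

    # a healthy fs should stop reporting repairs after one forced pass
    for pass in 1 2 3 4 5; do
        out=$(fsck -f -y /dev/rwd0a 2>&1)
        if echo "$out" | grep -q 'WAS MODIFIED'; then
            echo "pass $pass still repaired something" >&2
        else
            echo "clean after $pass pass(es)"
            break
        fi
    done
)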

The short answer is: don't do that.

I have no idea, btw, if using our ext2fs this way, along with e2fsck
from the Linux e2fsprogs, can be expected to work or not. I have
doubts about our fsck_ext2fs though.

  In
  suggesting that I post this, Christos mentioned that he's seen
  situations where a lot of writing happened in a session (e.g., a
  kernel build) and then the sync at shutdown time took a long time,
  which has made him somewhat suspicious that there might be a problem
  with the trickle sync that the kernel is supposed to be doing.

There is at least one known structural problem where atime/mtime
updates do not get applied to buffers (but are instead saved up
internally) so they don't get written out by the syncer.

We believe this is what causes those unmount-time writes, or at least
many of them. However, failure to update timestamps shouldn't result
in a trashed fs.

-- 
David A. Holland
dholl...@netbsd.org


Re: Lost file-system story

2011-12-06 Thread Donald Allen
On Tue, Dec 6, 2011 at 11:58 AM, Thor Lancelot Simon t...@panix.com wrote:
 On Tue, Dec 06, 2011 at 11:10:44AM -0500, Donald Allen wrote:

 2. I'm a little bit surprised that the filesystem was as much of a
 mess as it was.

 I'm not.  You mounted the filesystem async and had a crash.  With the
 filesystem mounted async *nothing* pushes out most metadata updates,
 with the result that the filesystem's metadata can quickly enter a
 fatally inconsistent state.  The only safe way home is a clean unmount.

So unwritten meta-data from an async filesystem can sit in the buffer
cache for arbitrarily long periods of time in NetBSD? I just want to
be sure I understand what you are saying. Because that essentially
guarantees, as you imply above, that if the system crashes, you will
lose the filesystem. That makes the following warning, in the mount(8)
man page, in the description of the async option:

In the event of a crash, it is impossible for the system to verify
the integrity of data on a file system mounted with this option.

much more clear. When I read this before the fun started, I took it to
mean, perhaps unjustifiably, what I know to be true -- there is some
non-zero probability that fsck of an async file-system will not be
able to verify and/or restore the filesystem to correctness after a
crash. You are saying that the probability, in the case of NetBSD, is
1. If it's true that there's no periodic sync, I would say that's
*really* a mistake. It should be there, with a knob the administrator
can turn to adjust the sync frequency. There are uses for async
filesystems (hell, Google used ext2 for years and now uses
journal-less ext4) and, as I said in my original post, with the
assumed periodic sync'ing, fsck can put the system back together after
a crash; in my case that has been invariably true.
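
(Until such a knob exists it can be approximated from userland; a
sketch, with the 30-second interval picked arbitrarily:

    # crude stand-in for a tunable trickle sync: flush all dirty
    # buffers at a fixed interval
    while sleep 30; do
        sync
    done &
)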


 If you mount an FFS filesystem async you are playing with fire.  Sure,
 it can be useful, but asbestos clothing is not optional.





 Thor


Re: Lost file-system story

2011-12-06 Thread Simon Burge
David Holland wrote:

 There is at least one known structural problem where atime/mtime
 updates do not get applied to buffers (but are instead saved up
 internally) so they don't get written out by the syncer.
 
 We believe this is what causes those unmount-time writes, or at least
 many of them.

I understand the delayed atime writes were added to reduce the number
of times a laptop hard disk spins up.  I've often wondered if a simple
sysctl could be added to control this.

Unmounting my /home on my main machine takes approximately a minute.  My
ups-nut script does

sync
( cd /home; umount /home )
sync

as soon as it gets an on-battery event, so that hopefully the actual
shutdown, if needed, happens quickly.

Cheers,
Simon.