Apologies for not catching this in my first reply to Bruce's message.

> There is also the discussion of ordered meta-data updates (OMDU) vs
> unordered (UMDU).  Linux (with the exception of newer journalled
> file systems) does UMDU.  With OMDU, the file meta-data (inode,
> indirect blocks, etc) is written in an ordered fashion, typically
> before the data.  This means FWIR that you can have good meta-data
> pointing to bad data in the case of a crash.  With UMDU, you can
> have bad meta-data but good data, which is something that a fsck
> will detect.

You have OMDU backwards.  Any sane ordered write scheme will write out
a block X before writing out a block (inode or directory entry) which
points to block X.  FFS, with or without soft updates, should never
encounter a case where an inode points to bad data.  (Of course, if
your disk controller reorders write operations, you'll lose no matter
what.  Unfortunately, you have to choose both your hardware and your
software somewhat carefully if you really care about filesystem
consistency.)
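
To make the ordering rule concrete, here is a toy Python model. It is a hypothetical sketch, not FFS code: the block numbers and contents are made up, and each block write is assumed to be atomic. Creating a one-block file takes two writes, block X and the inode pointing to X, and the model checks every point at which the system could crash. With block X written first, no crash leaves the inode pointing at stale contents. If the writes may land in either order, as with a reordering disk controller, one crash point does.

```python
"""Toy model: crash states of an ordered vs. unordered pair of writes.

Hypothetical illustration; block numbers and contents are made up.
"""
from itertools import permutations

X = 7      # hypothetical data block
INODE = 2  # hypothetical inode block that will point at X

# Creating a one-block file takes two block writes: X, then the pointer to X.
WRITES = [(X, "new data"), (INODE, X)]


def initial_disk():
    # Block X holds stale contents; the inode does not point anywhere yet.
    return {X: "stale", INODE: None}


def crash_states(order):
    """Yield the on-disk image after a crash following each prefix of order."""
    for k in range(len(order) + 1):
        disk = initial_disk()
        for block, contents in order[:k]:
            disk[block] = contents
        yield disk


def consistent(disk):
    """False if the inode points at a block that lacks the file's data."""
    ptr = disk[INODE]
    return ptr is None or disk[ptr] == "new data"


def safe(order):
    """True if every possible crash point leaves a consistent image."""
    return all(consistent(d) for d in crash_states(order))


print(safe(WRITES))  # True: block X reaches disk before the inode
print(all(safe(list(p)) for p in permutations(WRITES)))  # False
```

The failing permutation is the one that writes the inode first: a crash between the two writes leaves an inode pointing at block X while X still holds whatever was there before.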

Linux ext2fs has no write ordering whatsoever.  If the system goes
down uncleanly, you can get metadata pointing to bad data or data not
pointed to by metadata.  A recently created file might exist but
contain blocks from an old copy of /etc/shadow instead of the data you
wrote to it.  It's really ugly.  fsck cannot correct all of the
possible problems which can arise, no matter how clever or thorough it
is.  People have tried to justify this state of affairs in lots of
ways, but the only potentially correct and convincing justification
is, "who cares?"  Which is great unless you're one of the (admittedly,
relatively few) people who do care.
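
The no-ordering case can be sketched with a similar toy Python model. Everything in it is a simplifying assumption: a hypothetical block number, a made-up three-write file creation (bitmap, data block, inode), and a "fsck" that only cross-checks the allocation bitmap against inode pointers. It is not ext2fs's on-disk format or e2fsck's logic. Because nothing is ordered, any subset of the three writes may be on disk at the crash. The model shows that a metadata-only check repairs the bitmap yet still accepts states in which the new file's inode points at the old contents of its block, since those contents look like perfectly valid data.

```python
"""Toy model: unordered writes plus a metadata-only fsck.

Hypothetical illustration; not ext2fs's on-disk format or e2fsck's logic.
"""
from itertools import combinations

X = 7  # hypothetical data block for a newly created file


def write_bitmap(disk):
    disk["allocated"].add(X)          # mark block X in use


def write_data(disk):
    disk["blocks"][X] = "new data"    # the file's actual contents


def write_inode(disk):
    disk["inode_ptr"] = X             # new inode points at block X


# With no write ordering, any subset of these may be on disk at a crash.
WRITES = {"bitmap": write_bitmap, "data": write_data, "inode": write_inode}


def initial_disk():
    # Block X still holds a deleted file's contents (an old /etc/shadow, say).
    return {"allocated": set(), "blocks": {X: "old secret"}, "inode_ptr": None}


def toy_fsck(disk):
    """Repair only what metadata can reveal: bitmap vs. inode pointer."""
    if disk["inode_ptr"] is not None:
        disk["allocated"].add(disk["inode_ptr"])  # referenced => allocated
    else:
        disk["allocated"].discard(X)              # unreferenced => free
    return disk


def leaks_after_fsck():
    """Return the crash states whose repaired file shows the old contents."""
    bad = []
    for r in range(len(WRITES) + 1):
        for done in combinations(WRITES, r):
            disk = initial_disk()
            for name in done:
                WRITES[name](disk)
            disk = toy_fsck(disk)
            ptr = disk["inode_ptr"]
            if ptr is not None and disk["blocks"][ptr] != "new data":
                bad.append(done)
    return bad


print(leaks_after_fsck())  # [('inode',), ('bitmap', 'inode')]
```

The "fsck" handles the bitmap mismatches without complaint; the leaks are exactly the states where the inode reached disk before its data, and nothing in the metadata distinguishes them from a correct file.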

Note that write ordering is different from synchronous
vs. asynchronous operations.  Write ordering is about filesystem
consistency, which is mostly irrelevant to qmail's operation because
of the way qmail works.  ext2fs is also a little odd with respect to
synchronous operations (as discussed in my last piece of mail), but
it's certainly possible to work around that.
