On Fri, Feb 13, 2009 at 10:29:05AM -0800, Frank Cusack wrote:
> On February 13, 2009 1:10:55 PM -0500 Miles Nordin <car...@ivy.net> wrote:
> >>>>>>"fc" == Frank Cusack <fcus...@fcusack.com> writes:
> >
> >    fc> If you're misordering writes
> >    fc> isn't that a completely different problem?
> >
> >no.  ignoring the flush cache command causes writes to be misordered.
> 
> oh.  can you supply a reference or if you have the time, some more
> explanation?  (or can someone else confirm this.)

Ordering matters for atomic operations, and filesystems are full of
those.

Now, if ordering is broken but the writes all eventually hit the disk,
then no one will notice.  But if a power failure and/or a partition
occurs (a cable gets pulled, a network partition interrupts an iSCSI
connection, ...), then bad things happen.
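A minimal sketch (not ZFS or driver code) of how ignoring the flush-cache
command misorders writes: a transaction writes two data blocks, issues a
flush barrier, then writes a commit record that points at the data.  The
block numbers and labels are invented for illustration.

```python
def on_platter_after_crash(writes, honor_flush, k):
    """Return what reached the platter if power fails after k physical
    writes.  A well-behaved disk drains its cache at each FLUSH barrier,
    so submission order is preserved across the barrier; a buggy disk
    ignores the barrier and drains everything in elevator (block-number)
    order instead."""
    if honor_flush:
        # barrier respected: physical order == submission order
        physical = [w for w in writes if w[1] != "FLUSH"]
    else:
        # barrier ignored: the elevator sorts by block number
        physical = sorted(w for w in writes if w[1] != "FLUSH")
    return set(physical[:k])

# data at blocks 100-101, flush, then a commit record at block 5
txn = [(100, "data-A"), (101, "data-B"), (0, "FLUSH"), (5, "commit")]
```

With the flush honored, the commit record can only land after its data.
With the flush ignored, the elevator writes block 5 first, so a crash
after one physical write leaves a commit record pointing at data that
never arrived -- exactly the misordering above.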

For ZFS the easiest way to ameliorate this is the txg fallback fix that
Jeff Bonwick has said is now a priority.  And if ZFS guarantees that no
block is re-used until N txgs have passed after the block was freed,
then the fallback can go back up to N txgs, which gives you a decent
chance of recovering your pool in the face of buggy devices.  But for
each discarded txg you lose that transaction's writes, so you lose data
incrementally.  (The larger N is, the better your chance that all of
the writes from the oldest of the last N txgs will have hit the disk in
spite of the disk's lousy cache behavior.)
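The fallback idea can be sketched like this -- the uberblock layout and
the `tree_is_intact` check are placeholders, not ZFS internals.  The
N-txg no-reuse guarantee is what makes an older root still point at
valid blocks:

```python
def newest_openable_txg(uberblocks, tree_is_intact, max_fallback):
    """uberblocks: iterable of (txg, root) pairs, in any order.
    tree_is_intact: predicate that verifies the block tree under a
    root (checksums, etc.).  Try the newest txg first, then fall back
    to up to max_fallback older txgs.  Returns the chosen txg, or
    None if nothing within the window opens."""
    by_age = sorted(uberblocks, key=lambda u: u[0], reverse=True)
    for txg, root in by_age[: max_fallback + 1]:
        if tree_is_intact(root):
            return txg  # writes from any newer, discarded txgs are lost
    return None
```

For example, if only the root from txg 8 survived a buggy cache, a
fallback window of 2 recovers the pool at txg 8 (losing txgs 9 and 10),
while a window of 1 fails to open it at all.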

The next question is how to do the fallback, UI-wise.  Should it ever be
automatic?  A pool option for that would be nice (I'd use it on all-USB
pools).  If/when not automatic, how should the user/admin be informed of
the failure to open the pool and of the option to fall back to an older txg
(with data loss)?  (For non-removable pools imported at boot time the
answer is that the service will fail, causing sulogin to be invoked so
you can fix the problem on the console.  For removable pools there should be
a GUI.)

Nico
-- 
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss