>>>>> "ms" == Monish Shah <mon...@indranetworks.com> writes:
>>>>> "sl" == Scott Lawson <scott.law...@manukau.ac.nz> writes:
>>>>> "np" == Neal Pollack <neal.poll...@sun.com> writes:

    ms> If you are on a UPS, is it OK to disable ZIL?

    sl> I have seen numerous UPS' failures over the years,

Yeah, at my place in NYC we've had more problems with the UPS than
with the utility service.  At the very least a UPS needs to be switched
off for new batteries every two years, and the raw service does not go
out that often for me.

It starts to make more sense to use a UPS if you have dual power
supplies, dual UPSes, and bypass switches.  Or crappy aboveground
power.

Anyway, typical machines panic because of bugs a lot more often than
they go down from either UPS or line problems.

**BUT THIS IS ALL BESIDE THE POINT**!

The ZIL is for implementing fsync() for databases and also the part of
NFS that allows servers to reboot without client data loss.  It has
*NOTHING TO DO* with losing your entire pool.  Disabling the ZIL does
not make catastrophic pool loss more likely, not even a little bit!
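
To make the fsync() point concrete, here is a minimal sketch of the
kind of write the ZIL exists to serve.  The helper name
durable_append and the file name orders.log are made up for
illustration; only os.open, os.write and os.fsync are real calls.

```python
import os
import tempfile


def durable_append(path, data):
    """Append data to path and return only after fsync() has returned.

    The caller treats the data as "accepted" only once this function
    returns.  That promise is what fsync() is for, and on ZFS it is the
    ZIL that backs the promise between TXG commits.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        written = 0
        while written < len(data):
            # os.write may write fewer bytes than asked, so loop.
            written += os.write(fd, data[written:])
        # Ask the OS to push the data to stable storage before we
        # tell anyone the write succeeded.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written


# Hypothetical usage: an application confirming an order.
workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "orders.log")
durable_append(log_path, b"order 1 confirmed\n")
```

With the ZIL disabled, a call like this can presumably return before
the data is really on disk, so a crash may lose the last few seconds
of "confirmed" writes.  That is a durability problem for the
application, not a pool-integrity problem.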

Unfortunately some software developer decided to write a bunch of DIRE
WARNINGS to SCARE PEOPLE INTO ASSUMPTIONS leading them to use the
maximum amount of code of which said developer is justly proud,
regardless of whether they're using it for the right reason or not.

Oddly, I don't think disabling the ZIL will make catastrophic loss
more likely for databases running on top of ZFS, either, because
unlike non-COW filesystems ZFS never recovers to a state where writes
appear to have happened out of order prior to the crash.  Yes,
disabling the ZIL could break the 'D' in ACID for databases running on
that ZFS, but in a way that rolls them back in time rather than
corrupting them.  Running without the ZIL is as if a snapshot were
taken at each TXG commit time, and on reboot after a crash you recover
to the most recent TXG-snapshot that fully committed.  Thus databases
will be "crash-consistent" even without the ZIL, unless I'm mistaken.
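
A toy model of that argument, in case it helps.  This is not ZFS
code: ToyPool and its methods are invented for illustration, and the
only thing it models is "writes become durable in whole TXGs, in
order, or not at all".

```python
class ToyPool:
    """Toy model of TXG-style commits with no intent log."""

    def __init__(self):
        self.committed = {}  # state as of the last fully committed TXG
        self.open_txg = []   # ordered writes not yet committed

    def write(self, key, value):
        # Writes only join the open TXG; nothing is durable yet.
        self.open_txg.append((key, value))

    def commit_txg(self):
        # Build the new state off to the side, then switch to it in
        # one step, so a crash sees either the old or the new state.
        new_state = dict(self.committed)
        for key, value in self.open_txg:
            new_state[key] = value
        self.committed = new_state
        self.open_txg = []

    def crash_and_recover(self):
        # Everything in the open TXG is lost; the last committed
        # state comes back untouched.
        self.open_txg = []
        return dict(self.committed)


pool = ToyPool()
pool.write("balance", 100)
pool.write("journal", "credit 100")
pool.commit_txg()

pool.write("balance", 50)        # these two writes never commit
pool.write("journal", "debit 50")
state = pool.crash_and_recover()
print(state)  # {'balance': 100, 'journal': 'credit 100'}
```

The recovered state is older than what the application last saw, but
"balance" and "journal" still agree with each other.  You never get
the new balance with the old journal, which is the out-of-order case
that corrupts databases on other filesystems.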

Adding an SSD *does* make catastrophic pool loss more likely, because
if the SSD breaks and you then export the pool, you can never import
it again.  So adding an SSD for the ZIL, as the suggested
good-little-boy alternative to disabling the ZIL, makes catastrophic
loss of the entire pool more likely, not less.

The advantage of running with the ZIL is that, if you're using NFS,
you should be able to crash and reboot the server without the clients
noticing.  Also, MTAs that accept messages and databases that confirm
orders and bookings won't lose anything they've accepted or confirmed
before the crash (if everything else works).  I wish the ZIL could be
enabled and disabled per filesystem instead of per kernel.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
