>>>>> "rs" == Ragnar Sundblad <ra...@csc.kth.se> writes:

    rs> But are there any clients that assume that an iSCSI volume is
    rs> synchronous?

there will probably be clients that seem to implicitly make this
assumption by mishandling the case where an iSCSI target goes away and
then comes back (minus whatever writes were in its write cache).
Handling that case for NFS was complicated, and I bet the equivalent
complexity is simply missing from the iSCSI spec, but I could be
wrong.  I'd love to be educated.

Even if there is some magical thing in iSCSI to handle it, the magic
will be rarely used and often wrong until people learn how to test it,
which they haven't yet the way they have with NFS.

yeah, of course, making all writes synchronous isn't an ok way to fix
this, because it would make iSCSI way slower than non-iSCSI
alternatives.

    rs> Isn't an iSCSI target supposed to behave like any other SCSI
    rs> disk (pSCSI, SAS, FC, USB MSC, SSA, ATAPI, FW SBP...)?  With
    rs> that I mean: A disk which understands SCSI commands with an
    rs> optional write cache that could be turned off, with cache sync
    rs> command, and all those things.

yeah: reboot a SAS disk without rebooting the host it's attached to,
and you may see some dropped writes showing up as mysterious checksum
errors there as well.  I bet disabling that SAS disk's write cache
would lessen or eliminate the problem.
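For what it's worth, on Linux the write-cache toggle lives in the WCE
bit of the SCSI Caching mode page; something like the following checks
and clears it (the device name is a placeholder, and you'd use sdparm
for SCSI/SAS devices or hdparm for ATA ones):

```shell
# Query the write-cache-enable (WCE) bit on the Caching mode page
sdparm --get=WCE /dev/sdX

# Clear WCE to disable the volatile write cache
sdparm --clear=WCE /dev/sdX

# Equivalent for an ATA disk
hdparm -W0 /dev/sdX
```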

I think it's become a stupid mess because everyone assumed, long past
the point where the assumption became unreasonable, that disks with
mounted filesystems would never lose power unless the kernel with the
mounted filesystem also lost power.

    rs> But - all normal disks come with write caching enabled, [...]
    rs> so why should an iSCSI lun behave any different?

because normal disks usually don't dump the contents of their write
caches on the floor unless the kernel running the filesystem code also
loses power at the same instant.  That coincident kernel panic acts as
a signal to the filesystem to expect some lost writes from the disk.
It also lets the kernel take advantage of NFS server reboot recovery
(asking NFS clients to replay some of their writes), and it's an
excuse to force-close any file a userland process might've had open on
the filesystem, forcing those processes to go through their
crash-recovery steps by replaying database logs and such.
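That "replaying database logs" step can be sketched as a toy
write-ahead log (everything here is hypothetical illustration, not any
particular database's format): each mutation is fsync'd to an
append-only log before the in-memory state changes, so a crashed
process can rebuild its state by replaying the log on restart.

```python
import json
import os
import tempfile

class ToyKV:
    """Toy key-value store that survives a crash by replaying its log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.state = {}
        self._replay()  # crash recovery: rebuild state from the log

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path) as f:
            for line in f:
                rec = json.loads(line)
                self.state[rec["key"]] = rec["value"]

    def put(self, key, value):
        # Append to the log and fsync before mutating in-memory state,
        # so a crash at any point leaves a replayable log behind.
        with open(self.log_path, "a") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.state[key] = value

log = os.path.join(tempfile.mkdtemp(), "wal.log")
db = ToyKV(log)
db.put("a", 1)
db.put("b", 2)

# Simulate a crash: discard the process state, reopen from the log.
recovered = ToyKV(log)
assert recovered.state == {"a": 1, "b": 2}
```

Of course, this whole scheme assumes the fsync actually reaches stable
storage, which is precisely what a disk that silently drops its write
cache breaks.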

Over iSCSI it's relatively common for a target to lose power and then
come back without its write cache contents.  But when that happens
over iSCSI, you're expected to soldier on without killing all the
userland processes.  NFS could probably invoke its crash-recovery
state machine without an actual server reboot if it wanted to, but I
bet it doesn't currently know how, and that's probably not the right
fix anyway because you've still got the userland-process problem.

I agree with you that the iSCSI write cache needs to stay on, but
there is probably broken shit all over the place because of this.
Pre-ZFS iSCSI targets tend to have battery-backed NVRAM, so they can
be fully synchronous without demolishing performance, and thus fix, or
at least ease, this problem.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss