>>>>> "rs" == Ragnar Sundblad <ra...@csc.kth.se> writes:
rs> But are there any clients that assume that an iSCSI volume is
rs> synchronous?

there will probably be clients that seem to implicitly make this assumption by mishandling the case where an iSCSI target goes away and then comes back, but comes back less whatever writes were in its write cache. Handling that case for NFS was complicated, and I bet the equivalent complexity is just missing from the iSCSI spec, but I could be wrong. I'd love to be educated. Even if there is some magical thing in iSCSI to handle it, the magic will be rarely used and often wrong until people learn how to test it, which they haven't yet the way they have with NFS.

yeah, of course, making all writes synchronous isn't an ok way to fix this case, because it'll make iSCSI way slower than non-iSCSI alternatives (there's a rough sketch of what that workaround looks like at the end of this mail).

rs> Isn't an iSCSI target supposed to behave like any other SCSI
rs> disk (pSCSI, SAS, FC, USB MSC, SSA, ATAPI, FW SBP...)? With
rs> that I mean: A disk which understands SCSI commands with an
rs> optional write cache that could be turned off, with cache sync
rs> command, and all those things.

yeah, and reboot a SAS disk without rebooting the host it's attached to, and you may see some dropped writes showing up as mysterious checksum errors there as well. I bet disabling that SAS disk's write cache will lessen or eliminate the problem. I think it's become a stupid mess because everyone kept assuming, long past the point where it became unreasonable, that disks with mounted filesystems would never lose power unless the kernel with the mounted filesystem lost power too. (there's also a sketch of the cache sync command you mention down at the end of this mail.)

rs> But - all normal disks come with write caching enabled, [...]
rs> so why should an iSCSI lun behave any different?

because normal disks usually don't dump the contents of their write caches on the floor unless the kernel running the filesystem code also loses power at the same instant. That coincident kernel panic acts as a signal to the filesystem to expect some lost writes from the disks. It also lets the kernel take advantage of NFS server reboot recovery (asking NFS clients to replay some of their writes), and it's an excuse to force-close any file a userland process might've had open on the filesystem, forcing those processes through their own crash-recovery steps by replaying database logs and such.

Over iSCSI it's relatively common for a target to lose power and then come back without its write cache, but when that happens you're expected to soldier on without killing all the userland processes. NFS could probably invoke its crash-recovery state machine without an actual server reboot if it wanted to, but I bet it doesn't currently know how, and that's probably not the right fix anyway because you've still got the userland-process problem.

I agree with you that the iSCSI write cache needs to stay on, but there is probably broken shit all over the place because of this. pre-ZFS iSCSI targets tend to have battery-backed NVRAM, so they can be all-synchronous without demolishing performance and thus fix, or at least ease a little bit, this problem.
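Just to make the "all writes synchronous" point concrete, here's a minimal C sketch of what that workaround amounts to from userland. It's illustration only: the device path is made up, and nothing here comes from any real initiator or target code. The point is that with O_DSYNC every write(2) has to wait for the target to report the data stable, so you pay a full iSCSI round trip (plus whatever the target does to commit) per write instead of letting the write cache absorb the burst:

/* sync_write_sketch.c -- illustration only; the device path is hypothetical */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char buf[8192];
    memset(buf, 0xab, sizeof(buf));

    /* O_DSYNC: each write(2) blocks until the device says the data is stable */
    int fd = open("/dev/rdsk/c2t1d0s0", O_WRONLY | O_DSYNC);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    for (int i = 0; i < 1000; i++) {
        if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
            perror("write");
            break;
        }
    }

    /* contrast with the normal path: open without O_DSYNC, let the write
     * cache absorb the writes, and issue one fsync(fd) only at the barrier
     * points you actually care about */

    close(fd);
    return 0;
}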
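And since you brought up the cache sync command, here's a sketch of issuing SCSI SYNCHRONIZE CACHE(10) by hand. I'm writing it against the Linux SG_IO passthrough because that's the one I can do from memory (on Solaris you'd go through the uscsi/USCSICMD ioctl instead); the device path is made up and the error handling is minimal. The point is just that the command exists, and an initiator or filesystem that cares about ordering is supposed to use it (or FUA) rather than assume the target's cache is non-volatile:

/* sync_cache_sketch.c -- illustration only; Linux sg passthrough */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <scsi/sg.h>

int main(int argc, char **argv)
{
    const char *dev = (argc > 1) ? argv[1] : "/dev/sdb";  /* hypothetical LUN */
    unsigned char cdb[10] = { 0x35 };   /* SYNCHRONIZE CACHE(10); LBA 0, length 0 = whole LUN */
    unsigned char sense[32];
    struct sg_io_hdr io;

    int fd = open(dev, O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    memset(&io, 0, sizeof(io));
    io.interface_id    = 'S';
    io.cmdp            = cdb;
    io.cmd_len         = sizeof(cdb);
    io.dxfer_direction = SG_DXFER_NONE;   /* no data phase, just the command */
    io.sbp             = sense;
    io.mx_sb_len       = sizeof(sense);
    io.timeout         = 60 * 1000;       /* milliseconds */

    if (ioctl(fd, SG_IO, &io) < 0) {
        perror("SG_IO");
        close(fd);
        return 1;
    }
    if (io.status != 0)
        fprintf(stderr, "%s: target returned scsi status 0x%x\n", dev, io.status);
    else
        printf("%s: cache flush accepted\n", dev);

    close(fd);
    return 0;
}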