Hello list,

My Unified Storage 7310 cluster has a little tickbox that allows me to enable/disable write caching on a LUN. It also has a warning saying not to enable it if you don't know what you're doing, but doesn't go into any detail about how it actually works. I'd like to know what I'm doing, so I hope someone on this list can enlighten me.

As I've understood it so far, when the write cache is disabled, all writes go straight to the ZIL and are asynchronously migrated to the data disks. The behaviour is the same as if you're writing synchronously to a file on an NFS export. Correct me if I'm wrong here.

If I enable the write cache, do all writes then go straight to DRAM/ARC (are those the same thing)? Does the data stop by the ZIL on the way to the data disks, or is it migrated straight there?

When using write caching, is it safe to do cluster takeovers/failbacks? Will the volatile cached data be flushed to the ZIL or the data disks before the other cluster node takes over, or not?

I'm using the 7310 as backing for a private cloud with virtual machines running Linux, mostly - one LUN per machine. The filesystem in the VMs is designed to work well with write-caching block devices, as it journals all writes and can insert barriers after the journal commit blocks. I can see a potential problem with surviving an (uncontrolled only?) cluster failover on the 7310 without a reboot and a journal replay on the VM, though: DRAM-cached writes will just vanish and operation will continue as if they never happened, right? But other than that, would it be safe for me to enable write caching?

Does enabling/disabling the write cache affect pre-caching of reads of recently written data? If the writes are cached in the ZIL only (i.e. cache disabled, if I've understood correctly), will a read of that data have to wait for the writes to be flushed to the data disks before the data can be read back and inserted into the ARC/L2ARC? If that's the case, will this situation improve if I enable write caching?

If a block that was recently written (and has therefore not yet reached the data disks) is overwritten, will the first write still be flushed to the data disks at some point, or not?

The reason I'm asking is that I recently had a situation where a virtual machine ran out of memory and started thrashing its swap partition, i.e. writing lots of data just to re-read it shortly after. This basically killed the performance of the 7310 completely - the disks were all pegged at ~250 IOPS. The swap partition on the VM was small enough to fit in any of the DRAM, ARC, L2ARC, or ZIL, so I would very much like to understand why that happened and whether enabling write caching would have prevented it.

Best regards,
-- 
Tore Anderson
Redpill Linpro AS - http://www.redpill-linpro.com/
Tel: +47 21 54 41 27

_______________________________________________
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss