Ian Jackson wrote:
> The attached patch implements the ATA write cache feature. This
> enables a guest to control, in the standard way, whether disk writes
> are immediately committed to disk before the IDE command completes, or
> may be buffered in the host.
>
> In this patch, by default buffering is off, which provides better
> reliability but may have a performance impact. It would be
> straightforward to change the default, or perhaps offer a command-line
> option, if that would be preferred.
>
> This patch is derived from one which was originally submitted to the
> Xen tree by Rik van Riel <[EMAIL PROTECTED]>.

This is a very sensible improvement, imho.

However, I notice that it tells the guest that data is committed to hard storage when the host has merely called fsync(). On Linux (and other host OSes), fdatasync() and fsync() don't always commit data to hard storage; sometimes they only commit it to the hard drive's cache. (Seriously, just look at fs/ext3/fsync.c; only journal writes cause the flush, and they aren't done if the inode itself hasn't changed.)

It may be worth mentioning in the documentation that for guests which need strong durability guarantees, e.g. for critical database work or for filesystem journalling safety following host power failure, it is not enough to disable the IDE write cache in the guest, even with this patch. It is necessary to disable the host's disk write cache too.

Ideally, the host would provide a variation of fdatasync() which flushes data to hard storage in the same way that kernel filesystem journal writes can, and Qemu would use that. But, presently, I'm not aware of any way to do that short of the administrator disabling the host's disk write cache.

(Darwin provides F_FULLFSYNC. On Linux, an extra flag to sync_file_range() suggests itself. It would need changes to the block device and elevator APIs, though, as it's a flush command, not an ordering tag, and it is not always associated with a prior or subsequent write, although there are some coalescing optimisations when it can be.)

-- Jamie
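
P.S. As a rough illustration of the Darwin case only, here is a minimal sketch of a flush helper. The name flush_to_stable_storage() is made up for this example and is not part of Qemu or of the patch. It assumes the platform's <fcntl.h> defines F_FULLFSYNC where the fcntl is available. Everywhere else it falls back to plain fsync(), which carries only the weaker guarantee described above.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from Qemu or the patch): try to push the
 * data for fd all the way to stable storage.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int flush_to_stable_storage(int fd)
{
#if defined(F_FULLFSYNC)
    /* Darwin: also asks the drive to flush its own write cache. */
    if (fcntl(fd, F_FULLFSYNC) == 0)
        return 0;
    /* Some filesystems reject F_FULLFSYNC; fall back to fsync(). */
#endif
    /* Weaker guarantee: data may still sit in the drive's cache. */
    return fsync(fd);
}
```

A caller would use it wherever it calls fsync() today and check the return value in the same way. On hosts without F_FULLFSYNC this changes nothing, so disabling the host's disk write cache is still needed there.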