On Fri, Aug 24, 2012 at 12:49:11PM -0400, Thor Lancelot Simon wrote:
> If the controller has a nonvolatile cache, it should ignore the
> flush.

I can see why a paranoid admin would want to keep the flush: a power
failure may last longer than the BBU's capacity. That's why I suggested
making it user-settable.

> If you have the awkward cache where the controller cache
> is NV but the disks have volatile caches which are switched on,
> then the disks' caches must be flushed but the controller's
> should not.

I think in this case you have to flush both: if you flush only the
disks, the data you want on stable storage may still be in the
controller's cache. I can see how to detect whether the controller's
cache is NV, but getting the drives' cache status may be harder.
Unfortunately, I don't have the programming docs for this controller.

> If the controller's firmware can't do this you probably need
> to advise the user to turn off the local caches on the disks,
> and skip the controller-level flush in the driver if the
> controller indicates its cache is NV.

This is a setting in the controller's firmware.

> I have always thought we should use finer grained barriers
> (in terms of SCSI or modern ATA, writes with ordered tags,
> which we wait for, to force all preceding simple-tagged
> requests out first) instead of full cache flushes, but
> it is noteworthy that Linux started with that approach and
> hurriedly backed away from it. If we intend to go there we
> must understand as best we can why they changed their minds
> and why we should still use the way they rejected.

Agreed. Maybe it's because of (S)ATA drives. If I read the specs
correctly, a WRITE FUA command returns only when the data hits the
platters, but it doesn't guarantee that previously written data still
in the cache have been written (i.e. a WRITE FUA may pass earlier
writes). It looks like there's no ordered tag in ATA.

-- 
Manuel Bouyer <bou...@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
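
The flush policy discussed above (user-settable override, NV controller cache, possibly volatile disk caches) can be sketched as a small decision function. This is only an illustration under stated assumptions: every name below (`disk_cache_state`, `flush_policy_input`, `force_flush`, `should_flush()`) is hypothetical and does not come from any NetBSD driver. It assumes the driver can read the controller's NV-cache flag, that the disks' write-cache state may be unknown (treated as "on"), and that a single flush request makes the controller flush its own cache and the disks' caches.

```c
#include <stdbool.h>

/* Hypothetical: the driver may be unable to query the member disks. */
enum disk_cache_state {
	DISK_CACHE_OFF,
	DISK_CACHE_ON,
	DISK_CACHE_UNKNOWN
};

struct flush_policy_input {
	bool ctrl_cache_nv;               /* controller reports an NV (battery-backed) cache */
	enum disk_cache_state disk_cache; /* write-cache state of the disks behind it */
	bool force_flush;                 /* hypothetical user-settable knob */
};

/*
 * Return true if a cache flush should be issued.  Assumption: one flush
 * request makes the controller flush its own cache and the disks' caches.
 */
static bool
should_flush(const struct flush_policy_input *in)
{
	/* Paranoid admin: the outage may outlast the BBU. */
	if (in->force_flush)
		return true;
	/* Volatile controller cache: always flush. */
	if (!in->ctrl_cache_nv)
		return true;
	/*
	 * NV controller cache: flush only if the disks may hold data in
	 * volatile caches.  In that case both levels must be flushed, since
	 * the data may still be sitting in the controller's cache.  An
	 * unknown disk cache state is treated as "on".
	 */
	return in->disk_cache != DISK_CACHE_OFF;
}
```

With an NV controller cache, disk write caches known to be off and the knob unset, the flush is skipped; in every other case it is sent.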