On Jan 22, 2016, at 11:54 AM, James K. Lowden <jklowden at schemamania.org> wrote:
>
> On Fri, 22 Jan 2016 06:24:08 +0000
> Simon Slavin <slavins at bigfraud.org> wrote:
>
>> This is, of course, all about waiting for a rotating disc to be in
>> the right place.
>
> All true, but I think you're exaggerating if you're implying that's
> what the user will see. A call to write(2) doesn't necessarily involve
> the rotating media; it merely transfers the data from userspace to the
> kernel buffer cache (using Linux as an example). Even fsync, on
> consumer-grade disks, may return when the data have been flushed to the
> device's cache, before they come to rest on the platter. Both buffers
> ameliorate the effects of latency and track-to-track seek.
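The write(2) versus fsync distinction drawn in the quoted message can be sketched in Python, whose `os.write` and `os.fsync` are thin wrappers over the POSIX calls. This is a minimal illustration, not SQLite's code: the helper name `durable_append` is hypothetical. Note that even after `os.fsync` returns, a drive that lies about flushing may still be holding the data in its own volatile cache.

```python
import os


def durable_append(path: str, data: bytes) -> int:
    """Hypothetical helper: append data, then ask for it to reach stable storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        written = 0
        while written < len(data):
            # os.write() only copies bytes into the kernel's buffer cache;
            # it can return long before anything touches the platter.
            written += os.write(fd, data[written:])
        # os.fsync() asks the kernel to push this file's dirty buffers to the
        # device. A consumer drive may acknowledge once the data sit in its
        # own cache, before they come to rest on the platter.
        os.fsync(fd)
        return written
    finally:
        os.close(fd)
```

Skipping the `os.fsync` call makes the function return as soon as the kernel has a copy, which is the latency-hiding behaviour the quoted message describes. Keeping it is what buys durability, at the cost of waiting on the device.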
First, SQLite *does* fsync() each transaction before returning, on purpose, to provide the D in ACID:

    https://www.sqlite.org/lockingv3.html

Second, even if you're using the sort of consumer-grade disk that lies about fsync [*] you still have seek time to cope with. The track on disk where the data lands is probably not the track where the indices and other metadata structures live. The head may have to go back and forth several times to complete a transaction.

Even when the disk lies about fsync, that cost eventually has to be paid. If it's left unpaid too long, the write buffer fills up, and then SQLite will have to wait for buffer space to open up.

> Given...the capacity of the raw disk (about 100 MB/s)

You mean transfer rate, not capacity, of course.

But you only get 100 MByte/sec in linear reads, not random writes, which is what multi-track writes effectively are. Typical disks drop into the single digits of MByte/sec on random writes.


[*] See "Disks from the Perspective of a File System", by Marshall Kirk McKusick [**] in ACM Queue:

    https://queue.acm.org/detail.cfm?id=2367378

[**] https://en.wikipedia.org/wiki/Marshall_Kirk_McKusick
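The gap between the 100 MB/s linear figure and random-write throughput can be reproduced with a simple service-time model: each random write pays a seek plus, on average, half a revolution of rotational latency before any bytes move. The drive figures below (8.5 ms average seek, 7200 RPM, 100 MB/s media rate) and the "four head positions per transaction" case are illustrative assumptions, not measurements of any particular disk; the function names are hypothetical.

```python
def avg_rotational_latency_s(rpm: float) -> float:
    # On average the target sector is half a revolution away from the head.
    return 0.5 * 60.0 / rpm


def random_write_throughput(
    block_bytes: int,
    seek_s: float = 0.0085,  # assumed average seek time
    rotational_s: float = avg_rotational_latency_s(7200),  # assumed 7200 RPM
    media_rate: float = 100e6,  # assumed linear transfer rate, bytes/s
) -> float:
    """Bytes/s achieved when every block needs its own seek + rotation."""
    service_s = seek_s + rotational_s + block_bytes / media_rate
    return block_bytes / service_s


def transactions_per_second(
    head_positions: int,
    seek_s: float = 0.0085,
    rotational_s: float = avg_rotational_latency_s(7200),
) -> float:
    """Upper bound on durable commits/s if each commit moves the head
    head_positions times (data, index, journal, ...). Transfer time is
    ignored, so this is optimistic."""
    return 1.0 / (head_positions * (seek_s + rotational_s))


if __name__ == "__main__":
    for size in (4096, 65536):
        mb_s = random_write_throughput(size) / 1e6
        print(f"{size:>6} B random writes: {mb_s:.2f} MB/s")
    print(f"4 positions per commit: {transactions_per_second(4):.1f} tx/s")
```

Under these assumed numbers, 4 KiB random writes come out around 0.3 MB/s and 64 KiB writes around 5 MB/s, while a commit that needs four head movements is capped at roughly 20 per second. Setting `seek_s` and `rotational_s` to zero recovers the full 100 MB/s, which is the linear case.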