On 14/07/2013 20:13, Greg Smith wrote:
> The most efficient way to write things out is to delay those writes as long as possible.

That doesn't smell right to me. Delaying might allow more write combining and let the kernel see more at once and optimise it, but I think the counter-argument is that it is an efficiency loss to have either the CPU or the disks sitting idle waiting on the other. From a throughput point of view it cannot make sense to leave the disks doing nothing and then overload them so that they become the bottleneck (mostly seeking) while the CPU does nothing.

Now, I have NOT measured this behaviour, but I'd observe that we see disks that can stream 100MB/s yet achieve only 5% of that when doing random IO. Some random seeks during sync can't be helped, but if they are done when we aren't waiting for sync completion then they are effectively free. The flip side is that we can't really know whether those writes would have been merged with adjacent writes later, so it's hard to schedule them early. But we can observe that if we have a bunch of writes to adjacent data, then the seek needed to do the write is effectively amortised across them.
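To put rough numbers on the amortisation argument, here is a back-of-envelope model (an assumption, not a measurement). Charge one positioning cost S (seek plus rotational latency) per run of n adjacent blocks, plus a transfer time t per block. If a lone block write achieves a fraction f of the streaming rate, then t/(S + t) = f, so S = t(1/f - 1), and a run of n blocks achieves n/(n + 1/f - 1) of the streaming rate. The sketch below just evaluates that ratio; the function name is hypothetical and the 5% default is taken from the figures above rather than from any benchmark.

```python
def run_efficiency(run_blocks: int, single_block_fraction: float = 0.05) -> float:
    """Fraction of streaming bandwidth achieved when one positioning cost
    (seek + rotational latency) is paid per run of adjacent blocks.

    Assumed model (hypothetical, not measured): a lone block write achieves
    `single_block_fraction` (f) of the streaming rate, which fixes the
    positioning cost at S = t * (1/f - 1), where t is one block's transfer time.
    """
    if run_blocks < 1:
        raise ValueError("run_blocks must be >= 1")
    if not 0.0 < single_block_fraction <= 1.0:
        raise ValueError("single_block_fraction must be in (0, 1]")
    # Positioning cost S expressed in units of one block's transfer time t.
    seek_in_blocks = 1.0 / single_block_fraction - 1.0
    # n*t / (S + n*t), with t cancelled out.
    return run_blocks / (run_blocks + seek_in_blocks)


if __name__ == "__main__":
    stream_mb_s = 100.0  # streaming rate quoted in the post
    for n in (1, 8, 32, 128):
        eff = run_efficiency(n)
        print(f"{n:4d} adjacent blocks: {eff:6.1%} -> {eff * stream_mb_s:5.1f} MB/s")
```

With f = 0.05, runs of 1, 8, 32 and 128 blocks come out at roughly 5%, 30%, 63% and 87% of streaming bandwidth, which is the sense in which one seek is amortised across adjacent writes. Real drives, their caches and the kernel's elevator will not follow such a simple model exactly.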

So it occurs to me that perhaps we could watch for patterns where groups of adjacent writes form that could be streamed, and when they do, schedule them to be pushed out early (if not immediately): ideally out as far as the drive (but not flushed from its cache), and without forcing all other data to be flushed too. And perhaps we should always be looking to keep drives dedicated to the DBMS doing something, even if it turns out to have been redundant in the end.
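One way to make the "watch for groups of adjacent writes" idea concrete is sketched below. It assumes a hypothetical per-file set of dirty block numbers, coalesces them into runs of consecutive blocks, and treats runs of at least `min_run` blocks as candidates for early writeback. All function names and the 16-block threshold are illustrative assumptions, not existing PostgreSQL code; the 8 kB block size matches PostgreSQL's default.

```python
from typing import Iterable, List, Tuple

BLOCK_SIZE = 8192  # bytes; PostgreSQL's default block size

Run = Tuple[int, int]  # (first block number, number of consecutive blocks)


def adjacent_runs(dirty_blocks: Iterable[int]) -> List[Run]:
    """Coalesce dirty block numbers (any order, duplicates allowed) into runs."""
    runs: List[Run] = []
    for blk in sorted(set(dirty_blocks)):
        if runs and blk == runs[-1][0] + runs[-1][1]:
            # Block directly follows the current run: extend it.
            start, length = runs[-1]
            runs[-1] = (start, length + 1)
        else:
            # Gap (or first block): start a new run.
            runs.append((blk, 1))
    return runs


def early_write_candidates(dirty_blocks: Iterable[int], min_run: int = 16) -> List[Run]:
    """Hypothetical policy: push out early only runs long enough to amortise a seek."""
    return [run for run in adjacent_runs(dirty_blocks) if run[1] >= min_run]


def to_byte_range(run: Run, block_size: int = BLOCK_SIZE) -> Tuple[int, int]:
    """Convert a run to an (offset, nbytes) range within the file."""
    start, length = run
    return start * block_size, length * block_size


if __name__ == "__main__":
    dirty = [3, 4, 5, 6, 10] + list(range(20, 40))
    print(adjacent_runs(dirty))
    for run in early_write_candidates(dirty):
        print(run, to_byte_range(run))
```

Short, scattered runs are left alone in the hope they merge with later writes, which reflects the trade-off described above. The resulting byte ranges could then be handed to whatever "start writeback, but don't wait and don't flush the drive cache" primitive a platform offers; on Linux, `sync_file_range(2)` with `SYNC_FILE_RANGE_WRITE` is one candidate, though whether it behaves well enough in practice would need measuring.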

That's not necessarily easy on Linux without using direct unbuffered IO, but to me that is Linux's problem. For a start, it's not the only target system; feedback about what "we need" from the database and mail system communities to the NT kernel devs hasn't hurt, and it never hurt Solaris to hear what the Oracle and Sybase devs felt they needed either.

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)