Hi,

With your tests, did you try to write the hot buffers first? That is, buffers with a high refcount, either by sorting them on refcount or at least by sweeping the buffer list in reverse?
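To make the question concrete, here is a minimal sketch of the ordering I have in mind, written in Python rather than PostgreSQL's C. `Buf`, its `refcount` field and `checkpoint_write_order` are hypothetical names used only for illustration; they are not PostgreSQL's actual structures or functions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buf:
    """Hypothetical stand-in for a shared buffer descriptor."""
    buf_id: int
    refcount: int  # assumed "hotness" measure, named as in this post
    dirty: bool


def checkpoint_write_order(buffers):
    """Return the dirty buffers, hottest (highest refcount) first."""
    dirty = [b for b in buffers if b.dirty]
    # sorted() is stable, so buffers with equal refcount keep pool order.
    return sorted(dirty, key=lambda b: b.refcount, reverse=True)


pool = [
    Buf(0, 1, True),
    Buf(1, 5, True),
    Buf(2, 3, False),  # clean, so the checkpoint skips it
    Buf(3, 5, True),
    Buf(4, 0, True),
]
order = [b.buf_id for b in checkpoint_write_order(pool)]
print(order)
```

A full sort costs O(n log n) over the dirty set; sweeping the list in reverse would be a cheaper approximation, and whether it orders buffers well enough depends on how the list is laid out.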
In my understanding there's an 'impedance mismatch' between what PostgreSQL wants and what the OS offers. When it calls fsync(), PostgreSQL wants the set of buffers it selected quickly at checkpoint start time to be written to disk, but the OS only offers to write all dirty buffers at fsync time. That is not exactly the same contract. On a loaded server with checkpoint spreading the difference could be big: in the worst case the checkpoint wants 8KB written and fsync writes 1GB.

By writing to the OS first the buffers that are less likely to be recycled, fsync may have less work to do: hopefully those buffers have already been written by the OS background task during the spread and have not been re-dirtied by other backends.

Didier