Bruce Momjian wrote:
xlogdefs.h says:

/*
 *  Because O_DIRECT bypasses the kernel buffers, and because we never
 *  read those buffers except during crash recovery, it is a win to use
 *  it in all cases where we sync on each write().  We could allow O_DIRECT
 *  with fsync(), but because skipping the kernel buffer forces writes out
 *  quickly, it seems best just to use it for O_SYNC.  It is hard to imagine
 *  how fsync() could be a win for O_DIRECT compared to O_SYNC and O_DIRECT.
 *  Also, O_DIRECT is never enough to force data to the drives, it merely
 *  tries to bypass the kernel cache, so we still need O_SYNC or fsync().
 */

This seems wrong because fsync() can win if there are two writes before
the sync call.  Can kernels not issue fsync() if the write was O_DIRECT?
If that is the cause, we should document it.

The comment does look busted: it claims it's hard to imagine fsync() winning with O_DIRECT, and you just imagined exactly such a case. The only incompatibility that I'm aware of is that O_DIRECT requires reads and writes to be properly aligned (buffer address, file offset, and length), so you can't use it in random application code unless that code is aware of the restriction. O_DIRECT and fsync() are compatible; for example, MySQL allows combining the two: http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html

(That whole bit of documentation around innodb_flush_method includes some very interesting observations around O_DIRECT actually)
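
To make the case under discussion concrete, here is a minimal sketch of two O_DIRECT writes followed by a single fsync(). It is only an illustration, not what PostgreSQL does: the helper name write_two_blocks_then_fsync, the 4096-byte BLOCK_SIZE, and the fallback to buffered I/O when the filesystem rejects O_DIRECT are all my own assumptions.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE           /* exposes O_DIRECT on Linux/glibc */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0            /* no O_DIRECT on this platform: plain buffered I/O */
#endif

#define BLOCK_SIZE 4096       /* assumed alignment; real code should ask the device */

/*
 * Hypothetical helper: write two aligned blocks, then sync once.
 * Returns the number of bytes written, or -1 on failure.
 */
static long write_two_blocks_then_fsync(const char *path)
{
    void *buf = NULL;
    long  written = -1;
    int   fd;

    /* O_DIRECT needs the user buffer aligned, not just the length. */
    if (posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE) != 0)
        return -1;
    memset(buf, 'x', BLOCK_SIZE);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0600);
    if (fd < 0 && errno == EINVAL)      /* some filesystems reject O_DIRECT */
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
    {
        free(buf);
        return -1;
    }

    /* Two writes, one sync: the case the comment says is hard to imagine. */
    if (write(fd, buf, BLOCK_SIZE) == BLOCK_SIZE &&
        write(fd, buf, BLOCK_SIZE) == BLOCK_SIZE &&
        fsync(fd) == 0)
        written = 2L * BLOCK_SIZE;

    if (close(fd) != 0)
        written = -1;
    free(buf);
    return written;
}
```

With O_SYNC instead, each write() would wait for the data to be flushed, so the two-write case pays for two flushes rather than one. Whether that difference shows up in practice is exactly what the benchmarks need to tell us.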

I'm starting to consider the idea that much of the performance gain seen on earlier systems with O_DIRECT came from reduced CPU usage, since it avoids shuffling data into the OS cache, rather than from avoiding pollution of said cache. On Linux, for example, its main accomplishment is described like this: "File I/O is done directly to/from user space buffers." http://www.kernel.org/doc/man-pages/online/pages/man2/open.2.html The earliest paper on the implementation suggests a big decrease in CPU overhead from that: http://www.ukuug.org/events/linux2001/papers/html/AArcangeli-o_direct.html

It's impossible to guess whether that's more true ("CPU cache pollution is a bigger problem now") or less true ("drives are much slower relative to CPUs now") today. I'm trying to remain agnostic and let the benchmarks offer an opinion instead.

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

