Greg Smith wrote:
> Bruce Momjian wrote:
> > xlogdefs.h says:
> >
> > /*
> >  * Because O_DIRECT bypasses the kernel buffers, and because we never
> >  * read those buffers except during crash recovery, it is a win to use
> >  * it in all cases where we sync on each write().  We could allow O_DIRECT
> >  * with fsync(), but because skipping the kernel buffer forces writes out
> >  * quickly, it seems best just to use it for O_SYNC.  It is hard to imagine
> >  * how fsync() could be a win for O_DIRECT compared to O_SYNC and O_DIRECT.
> >  * Also, O_DIRECT is never enough to force data to the drives, it merely
> >  * tries to bypass the kernel cache, so we still need O_SYNC or fsync().
> >  */
> >
> > This seems wrong because fsync() can win if there are two writes before
> > the sync call.  Can kernels not issue fsync() if the write was O_DIRECT?
> > If that is the cause, we should document it.
>
> The comment does look busted, because you did imagine exactly a case
> where they might be combined.  The only incompatibility that I'm aware
> of is that O_DIRECT requires reads and writes to be aligned properly, so
> you can't use it in random application code unless it's aware of that.
> O_DIRECT and fsync are compatible; for example, MySQL allows combining
> the two:  http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html
>
> (That whole bit of documentation around innodb_flush_method includes
> some very interesting observations around O_DIRECT actually)
>
> I'm starting to consider the idea that much of the performance gains
> seen on earlier systems with O_DIRECT was because it decreased CPU usage
> shuffling things into the OS cache, rather than its impact on avoiding
> pollution of said cache.  On Linux for example, its main accomplishment
> is described like this:  "File I/O is done directly to/from user space
> buffers."
> http://www.kernel.org/doc/man-pages/online/pages/man2/open.2.html  The
> earliest paper on the implementation suggests a big decrease in CPU
> overhead from that:
> http://www.ukuug.org/events/linux2001/papers/html/AArcangeli-o_direct.html
>
> Impossible to guess whether that's more true ("CPU cache pollution is a
> bigger problem now") or less true ("drives are much slower relative to
> CPUs now") today.  I'm trying to remain agnostic and let the benchmarks
> offer an opinion instead.

Agreed.  Perhaps we need a separate setting to turn direct I/O on and
off, and decouple wal_sync_method and direct I/O.

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
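
To make the decoupling idea concrete, here is a rough sketch of how the
open() flags might be computed if direct I/O were its own switch instead
of being implied by the O_SYNC-style sync methods.  Everything named here
(WalSyncMethod, wal_open_flags, wal_needs_explicit_sync, the WAL_O_*
macros) is hypothetical and invented for illustration; it is not the
existing xlog code, and the enum only loosely mirrors the wal_sync_method
values.

```c
/* Illustrative sketch only; all names are hypothetical, not PostgreSQL code. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux/glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/* Platforms without O_DIRECT simply get no extra flag. */
#ifdef O_DIRECT
#define WAL_O_DIRECT O_DIRECT
#else
#define WAL_O_DIRECT 0
#endif

/* Fall back to O_SYNC where O_DSYNC is not defined. */
#ifdef O_DSYNC
#define WAL_O_DSYNC O_DSYNC
#else
#define WAL_O_DSYNC O_SYNC
#endif

typedef enum
{
    WAL_SYNC_FSYNC,             /* write(), then fsync() */
    WAL_SYNC_FDATASYNC,         /* write(), then fdatasync() */
    WAL_SYNC_OPEN_SYNC,         /* open() with O_SYNC */
    WAL_SYNC_OPEN_DATASYNC      /* open() with O_DSYNC */
} WalSyncMethod;

/*
 * Compute open() flags.  The sync method decides only the O_SYNC/O_DSYNC
 * bits; direct_io independently decides whether O_DIRECT is added, so
 * O_DIRECT plus fsync() becomes a legal combination.
 */
static int
wal_open_flags(WalSyncMethod method, bool direct_io)
{
    int         flags = O_RDWR;

    switch (method)
    {
        case WAL_SYNC_OPEN_SYNC:
            flags |= O_SYNC;
            break;
        case WAL_SYNC_OPEN_DATASYNC:
            flags |= WAL_O_DSYNC;
            break;
        case WAL_SYNC_FSYNC:
        case WAL_SYNC_FDATASYNC:
            break;              /* synced later by an explicit call */
        default:
            assert(!"unknown sync method");
            break;
    }

    if (direct_io)
        flags |= WAL_O_DIRECT;

    return flags;
}

/* True when the caller must still issue fsync()/fdatasync() after writing. */
static bool
wal_needs_explicit_sync(WalSyncMethod method)
{
    return method == WAL_SYNC_FSYNC || method == WAL_SYNC_FDATASYNC;
}
```

With something shaped like this, wal_open_flags(WAL_SYNC_FSYNC, true)
would give the "two writes, then one fsync()" case with the kernel cache
bypassed, which the current comment rules out.  Callers would still have
to meet the alignment requirements Greg mentions whenever direct_io is
set.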