Tom Lane wrote:
> Neil Conway <[EMAIL PROTECTED]> writes:
>> is opening a file with O_DIRECT sufficient to ensure that
>> a write(2) does not return until the data has hit disk?
>
> Some googling suggests so, eg
> http://www.die.net/doc/linux/man/man2/open.2.html

Really?  On that page I read:
 "O_DIRECT...at the completion of the read(2) or write(2)
  system call, data is guaranteed to have been transferred."
which sounds to me like the data is transferred to the device's
cache but not necessarily flushed through that cache.
It says nothing about physical media.  That wording feels
different to me from O_SYNC which reads:
 "O_SYNC will block the calling process until the data has
  been physically written to the underlying hardware."
which does suggest to me that it writes to physical media.
Or am I reading that wrong?
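
To make the comparison concrete, here is a minimal C sketch of the two portable ways I know of to ask for a synchronous write: opening the file with O_SYNC, or doing a plain write(2) followed by fsync(2). The function name write_durably() and its parameters are made up for illustration. Whether "done" really means the platter rather than the drive's write cache still depends on the kernel, the filesystem and the drive, so treat this as a sketch of the API usage and not as a durability guarantee.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path and ask the kernel to make
 * them durable, either via O_SYNC (use_o_sync != 0) or via a trailing
 * fsync(). Returns 0 on success, -1 on failure with errno set. */
int write_durably(const char *path, const void *buf, size_t len, int use_o_sync)
{
    assert(path != NULL && buf != NULL);

    int flags = O_WRONLY | O_CREAT | O_TRUNC;
    if (use_o_sync)
        flags |= O_SYNC;          /* each write(2) blocks until "written" */

    int fd = open(path, flags, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0 && errno == EINTR)
            continue;             /* interrupted by a signal: retry */
        if (n <= 0) {
            int saved = (n < 0) ? errno : EIO;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Without O_SYNC, the data may still sit in the page cache here,
     * so request a flush explicitly. */
    if (!use_o_sync && fsync(fd) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}
```

Calling write_durably("/tmp/example.dat", "hello", 5, 1) exercises the O_SYNC path, and passing 0 as the last argument exercises the write-then-fsync path.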



PS: I've gotten way out of my depth here, but...

    ...attempting to browse the Linux source(!!)

  Looking at the O_SYNC stuff in ext3:
      http://lxr.linux.no/source/fs/ext3/file.c#L67
  it looks like in this conditional:
   if (file->f_flags & O_SYNC) {
      ...
      goto force_commit;
   }
  the goto branch calls ext3_force_commit() in much the
  same way that it seems fsync() does here:
      http://lxr.linux.no/source/fs/ext3/fsync.c#L71
  so I believe O_SYNC does at least as much as fsync().

  However I can't find O_DIRECT anywhere in the ext3 stuff,
  so if it does give the same guarantee there, it's less
  obvious how it would.

  Moreover I see O_SYNC used in lots of places:
      http://lxr.linux.no/ident?i=O_SYNC
  including fs/ext3/, and I don't see O_DIRECT in nearly
  as many places:
      http://lxr.linux.no/ident?i=O_DIRECT
  It looks like reiserfs and xfs do look at O_DIRECT,
  but ext3 doesn't appear to, unless it's somewhere
  outside the fs/ext3 directory.
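
For reference, this is roughly what using O_DIRECT looks like from user space. It is only a sketch under assumptions: the helper name direct_write_block() is hypothetical, the 4096-byte alignment is a guess that suits many Linux filesystems but is not universal, and the fallback #define simply degrades to a buffered write where the flag is not exposed. The trailing fsync() is there because, as argued above, I would not rely on O_DIRECT alone to push data through the device's cache.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE               /* glibc exposes O_DIRECT under _GNU_SOURCE */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0                /* not exposed here: falls back to buffered I/O */
#endif

#define BLOCK_SIZE 4096           /* assumed alignment and transfer size */

/* Hypothetical helper: write one zero-padded, aligned block containing
 * text to path using O_DIRECT, then fsync(). Returns 0 on success,
 * -1 on failure with errno set (EINVAL if the filesystem rejects O_DIRECT). */
int direct_write_block(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);

    /* O_DIRECT generally needs the buffer address and the transfer
     * length to be aligned; posix_memalign provides the former. */
    void *buf = NULL;
    int rc = posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE);
    if (rc != 0) {
        errno = rc;
        return -1;
    }
    memset(buf, 0, BLOCK_SIZE);
    strncpy((char *)buf, text, BLOCK_SIZE - 1);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    if (fd < 0) {
        int saved = errno;
        free(buf);
        errno = saved;
        return -1;
    }

    int saved = 0;
    ssize_t n = write(fd, buf, BLOCK_SIZE);
    if (n < 0)
        saved = errno;
    else if (n != BLOCK_SIZE)
        saved = EIO;              /* short write: report as an I/O error */
    else if (fsync(fd) != 0)
        saved = errno;            /* flush request failed */

    close(fd);
    free(buf);
    if (saved != 0) {
        errno = saved;
        return -1;
    }
    return 0;
}
```

Some filesystems (tmpfs on older kernels, for example) refuse O_DIRECT with EINVAL, so a caller should be prepared to retry without the flag.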


PPS: Of course not even fsync() flushed correctly until very recent kernels:
    http://hardware.slashdot.org/comments.pl?sid=149349&cid=12519114
    In that article Jeff Garzik (the Linux SATA driver guy) suggests
    that until very recent kernels ext3 did not have write barrier
    support that issues the FLUSH CACHE (IDE) or SYNCHRONIZE CACHE
    (SCSI) commands even on fsync.


PPPS: No, I don't understand the kernel - I'm just showing what quick
      grep commands showed without any deep understanding.
