Attached is an updated version of your Darwin Fsync-writethrough patch. 
I did much more restructuring because we do fsync in more places than
just WAL (like at checkpoints), and we should follow the WAL
write-through in those places too if that method is used for fsync of
WAL, rather than the normal fsync without the force-writethrough.

I restructured the code so now pg_fsync does writethrough or not based
on the wal setting.  My feeling is you need WAL writethrough, you need
it for other fsyncs as well.  Write-through was added in an 8.0.X
subrelease, and because it was a subrelease, the patch did a minimal
amount of code change.  This change is the restructuring that makes
sense for 8.1.

---------------------------------------------------------------------------

Chris Campbell wrote:
> Thanks, Bruce! Here's the patch (would you rather patches be attached 
> files, or inline?).
> 
> Some explanation. In the current code, there is no "fsync" on Windows, 
> only "fsync_writethrough". But, internally, it behaves identically to 
> "fsync". The only special case was for the GUC strings.
> 
> The patch changes it thusly: since "fsync" and "fsync_writethrough" 
> cannot be identical on Darwin, I added a new constant for 
> SYNC_METHOD_FSYNC_WRITE_THROUGH (3). A GUC value of 
> "fsync_writethrough" sets fsync_method to this new constant.
> 
> On Windows, the fsync_writethrough case falls through to the fsync case 
> (which simply runs fsync). On Darwin, the fsync_writethrough case 
> executes fcntl(openLogFile, F_FULLFSYNC, 0), which does everything that 
> fsync does, and then flushes the disk's write cache. The code does not 
> need to call fsync directly, just fcntl.
> 
> The performance of a script that executes 11,000 INSERT statements 
> (outside a transaction) is pretty slow, since it requires flushing the 
> disk write cache after EVERY query (transaction commit). I waited about 
> 15 minutes before I just killed it. But if you do all the inserts 
> inside a transaction, then I only pay the cache flush price once at 
> transaction commit, and it's quite usable.
> 
> I asked around on IRC, and the only place where flushing the disk cache 
> is important is when syncing a WAL file. So I didn't change the 
> behavior of fsync() globally for the code...just for WAL syncs.
> 
> I added a new #define called HAVE_FSYNC_WRITE_THROUGH, which is defined 
> only by the win32.h and darwin.h port headers. This controls the 
> availability of the "fsync_writethrough" GUC setting. Thanks to the 
> port headers, no configure test is needed.
> 
> Thanks! Let me know what you think of the patch, and tell me what else 
> I can do.
> 
> - Chris
> 
> 

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: doc/src/sgml/runtime.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v
retrieving revision 1.319
diff -c -c -r1.319 runtime.sgml
*** doc/src/sgml/runtime.sgml   15 May 2005 00:26:18 -0000      1.319
--- doc/src/sgml/runtime.sgml   17 May 2005 22:03:49 -0000
***************
*** 1595,1601 ****
          values are
          <literal>fsync</> (call <function>fsync()</> at each commit),
          <literal>fdatasync</> (call <function>fdatasync()</> at each commit),
!         <literal>fsync_writethrough</> (call <function>_commit()</> at each 
commit on Windows),
          <literal>open_sync</> (write WAL files with <function>open()</> 
option <symbol>O_SYNC</>), and
          <literal>open_datasync</> (write WAL files with <function>open()</> 
option <symbol>O_DSYNC</>).
          Not all of these choices are available on all platforms.
--- 1595,1601 ----
          values are
          <literal>fsync</> (call <function>fsync()</> at each commit),
          <literal>fdatasync</> (call <function>fdatasync()</> at each commit),
!         <literal>fsync_writethrough</> (force write-through of any disk write 
cache),
          <literal>open_sync</> (write WAL files with <function>open()</> 
option <symbol>O_SYNC</>), and
          <literal>open_datasync</> (write WAL files with <function>open()</> 
option <symbol>O_DSYNC</>).
          Not all of these choices are available on all platforms.
Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.191
diff -c -c -r1.191 xlog.c
*** src/backend/access/transam/xlog.c   10 May 2005 22:27:29 -0000      1.191
--- src/backend/access/transam/xlog.c   17 May 2005 22:03:53 -0000
***************
*** 51,61 ****
   * default method.    We assume that fsync() is always available, and that
   * configure determined whether fdatasync() is.
   */
- #define SYNC_METHOD_FSYNC             0
- #define SYNC_METHOD_FDATASYNC 1
- #define SYNC_METHOD_OPEN              2               /* used for both O_SYNC 
and
-                                                                               
 * O_DSYNC */
- 
  #if defined(O_SYNC)
  #define OPEN_SYNC_FLAG           O_SYNC
  #else
--- 51,56 ----
***************
*** 79,99 ****
  #define DEFAULT_SYNC_METHOD_STR    "open_datasync"
  #define DEFAULT_SYNC_METHOD              SYNC_METHOD_OPEN
  #define DEFAULT_SYNC_FLAGBIT     OPEN_DATASYNC_FLAG
! #else
! #if defined(HAVE_FDATASYNC)
  #define DEFAULT_SYNC_METHOD_STR   "fdatasync"
  #define DEFAULT_SYNC_METHOD             SYNC_METHOD_FDATASYNC
  #define DEFAULT_SYNC_FLAGBIT    0
! #else
! #ifndef FSYNC_IS_WRITE_THROUGH
  #define DEFAULT_SYNC_METHOD_STR   "fsync"
  #else
  #define DEFAULT_SYNC_METHOD_STR   "fsync_writethrough"
! #endif
! #define DEFAULT_SYNC_METHOD             SYNC_METHOD_FSYNC
  #define DEFAULT_SYNC_FLAGBIT    0
  #endif
- #endif
  
  
  /* User-settable parameters */
--- 74,92 ----
  #define DEFAULT_SYNC_METHOD_STR    "open_datasync"
  #define DEFAULT_SYNC_METHOD              SYNC_METHOD_OPEN
  #define DEFAULT_SYNC_FLAGBIT     OPEN_DATASYNC_FLAG
! #elif defined(HAVE_FDATASYNC)
  #define DEFAULT_SYNC_METHOD_STR   "fdatasync"
  #define DEFAULT_SYNC_METHOD             SYNC_METHOD_FDATASYNC
  #define DEFAULT_SYNC_FLAGBIT    0
! #elif !defined(HAVE_FSYNC_WRITETHROUGH_ONLY)
  #define DEFAULT_SYNC_METHOD_STR   "fsync"
+ #define DEFAULT_SYNC_METHOD             SYNC_METHOD_FSYNC
+ #define DEFAULT_SYNC_FLAGBIT    0
  #else
  #define DEFAULT_SYNC_METHOD_STR   "fsync_writethrough"
! #define DEFAULT_SYNC_METHOD             SYNC_METHOD_FSYNC_WRITETHROUGH
  #define DEFAULT_SYNC_FLAGBIT    0
  #endif
  
  
  /* User-settable parameters */
***************
*** 122,128 ****
  
  
  /* these are derived from XLOG_sync_method by assign_xlog_sync_method */
! static int    sync_method = DEFAULT_SYNC_METHOD;
  static int    open_sync_bit = DEFAULT_SYNC_FLAGBIT;
  
  #define XLOG_SYNC_BIT  (enableFsync ? open_sync_bit : 0)
--- 115,121 ----
  
  
  /* these are derived from XLOG_sync_method by assign_xlog_sync_method */
! int   sync_method = DEFAULT_SYNC_METHOD;
  static int    open_sync_bit = DEFAULT_SYNC_FLAGBIT;
  
  #define XLOG_SYNC_BIT  (enableFsync ? open_sync_bit : 0)
***************
*** 5249,5264 ****
        int                     new_sync_method;
        int                     new_sync_bit;
  
- #ifndef FSYNC_IS_WRITE_THROUGH
        if (pg_strcasecmp(method, "fsync") == 0)
- #else
-       /* Win32 fsync() == _commit(), which writes through a write cache */
-       if (pg_strcasecmp(method, "fsync_writethrough") == 0)
- #endif
        {
                new_sync_method = SYNC_METHOD_FSYNC;
                new_sync_bit = 0;
        }
  #ifdef HAVE_FDATASYNC
        else if (pg_strcasecmp(method, "fdatasync") == 0)
        {
--- 5242,5259 ----
        int                     new_sync_method;
        int                     new_sync_bit;
  
        if (pg_strcasecmp(method, "fsync") == 0)
        {
                new_sync_method = SYNC_METHOD_FSYNC;
                new_sync_bit = 0;
        }
+ #ifdef HAVE_FSYNC_WRITETHROUGH
+       else if (pg_strcasecmp(method, "fsync_writethrough") == 0)
+       {
+               new_sync_method = SYNC_METHOD_FSYNC_WRITETHROUGH;
+               new_sync_bit = 0;
+       }
+ #endif
  #ifdef HAVE_FDATASYNC
        else if (pg_strcasecmp(method, "fdatasync") == 0)
        {
***************
*** 5328,5339 ****
        switch (sync_method)
        {
                case SYNC_METHOD_FSYNC:
!                       if (pg_fsync(openLogFile) != 0)
                                ereport(PANIC,
                                                (errcode_for_file_access(),
                                        errmsg("could not fsync log file %u, 
segment %u: %m",
                                                   openLogId, openLogSeg)));
                        break;
  #ifdef HAVE_FDATASYNC
                case SYNC_METHOD_FDATASYNC:
                        if (pg_fdatasync(openLogFile) != 0)
--- 5323,5343 ----
        switch (sync_method)
        {
                case SYNC_METHOD_FSYNC:
!                       if (pg_fsync_no_writethrough(openLogFile) != 0)
                                ereport(PANIC,
                                                (errcode_for_file_access(),
                                        errmsg("could not fsync log file %u, 
segment %u: %m",
                                                   openLogId, openLogSeg)));
                        break;
+ #ifdef HAVE_FSYNC_WRITETHROUGH
+               case SYNC_METHOD_FSYNC_WRITETHROUGH:
+                       if (pg_fsync_writethrough(openLogFile) != 0)
+                               ereport(PANIC,
+                                               (errcode_for_file_access(),
+                                       errmsg("could not fsync write-through 
log file %u, segment %u: %m",
+                                                  openLogId, openLogSeg)));
+                       break;
+ #endif
  #ifdef HAVE_FDATASYNC
                case SYNC_METHOD_FDATASYNC:
                        if (pg_fdatasync(openLogFile) != 0)
Index: src/backend/storage/file/fd.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/storage/file/fd.c,v
retrieving revision 1.115
diff -c -c -r1.115 fd.c
*** src/backend/storage/file/fd.c       31 Dec 2004 22:00:51 -0000      1.115
--- src/backend/storage/file/fd.c       17 May 2005 22:03:54 -0000
***************
*** 232,242 ****
  
  
  /*
!  * pg_fsync --- same as fsync except does nothing if enableFsync is off
   */
  int
  pg_fsync(int fd)
  {
        if (enableFsync)
                return fsync(fd);
        else
--- 232,258 ----
  
  
  /*
!  * pg_fsync --- do fsync with or without writethrough
   */
  int
  pg_fsync(int fd)
  {
+ #ifndef HAVE_FSYNC_WRITETHROUGH_ONLY
+       if (sync_method != SYNC_METHOD_FSYNC_WRITETHROUGH)
+               return pg_fsync_no_writethrough(fd);
+       else
+ #endif
+               return pg_fsync_writethrough(fd);
+ }
+ 
+ 
+ /*
+  * pg_fsync_no_writethrough --- same as fsync except does nothing if
+  *    enableFsync is off
+  */
+ int
+ pg_fsync_no_writethrough(int fd)
+ {
        if (enableFsync)
                return fsync(fd);
        else
***************
*** 244,249 ****
--- 260,283 ----
  }
  
  /*
+  * pg_fsync_writethrough
+  */
+ int
+ pg_fsync_writethrough(int fd)
+ {
+       if (enableFsync)
+ #ifdef WIN32
+               return _commit(fd);
+ #elif defined(__darwin__)
+               return (fcntl(fd, F_FULLFSYNC, 0) == -1) ? -1 : 0;
+ #else
+               return -1;
+ #endif
+       else
+               return 0;
+ }
+ 
+ /*
   * pg_fdatasync --- same as fdatasync except does nothing if enableFsync is 
off
   *
   * Not all platforms have fdatasync; treat as fsync if not available.
Index: src/include/access/xlog.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/access/xlog.h,v
retrieving revision 1.60
diff -c -c -r1.60 xlog.h
*** src/include/access/xlog.h   28 Apr 2005 21:47:17 -0000      1.60
--- src/include/access/xlog.h   17 May 2005 22:03:56 -0000
***************
*** 75,80 ****
--- 75,87 ----
   */
  #define XLOG_NO_TRAN                  XLR_INFO_MASK
  
+ /* Sync methods */
+ #define SYNC_METHOD_FSYNC             0
+ #define SYNC_METHOD_FDATASYNC 1
+ #define SYNC_METHOD_OPEN              2                       /* for O_SYNC 
and O_DSYNC */
+ #define SYNC_METHOD_FSYNC_WRITETHROUGH        3
+ extern int    sync_method;
+ 
  /*
   * List of these structs is used to pass data to XLogInsert().
   *
Index: src/include/port/darwin.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/port/darwin.h,v
retrieving revision 1.6
diff -c -c -r1.6 darwin.h
*** src/include/port/darwin.h   23 Dec 2003 03:31:30 -0000      1.6
--- src/include/port/darwin.h   17 May 2005 22:03:57 -0000
***************
*** 1 ****
--- 1,4 ----
  #define __darwin__    1
+ 
+ #define HAVE_FSYNC_WRITETHROUGH
+ 
Index: src/include/port/win32.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/port/win32.h,v
retrieving revision 1.44
diff -c -c -r1.44 win32.h
*** src/include/port/win32.h    24 Mar 2005 04:36:19 -0000      1.44
--- src/include/port/win32.h    17 May 2005 22:03:57 -0000
***************
*** 16,23 ****
  #define mkdir(a,b)    mkdir(a)
  
  
! #define fsync(a)      _commit(a)
! #define FSYNC_IS_WRITE_THROUGH
  #define ftruncate(a,b)        chsize(a,b)
  
  #define USES_WINSOCK
--- 16,23 ----
  #define mkdir(a,b)    mkdir(a)
  
  
! #define HAVE_FSYNC_WRITETHROUGH
! #define HAVE_FSYNC_WRITETHROUGH_ONLY
  #define ftruncate(a,b)        chsize(a,b)
  
  #define USES_WINSOCK
Index: src/include/storage/fd.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/storage/fd.h,v
retrieving revision 1.50
diff -c -c -r1.50 fd.h
*** src/include/storage/fd.h    31 Dec 2004 22:03:42 -0000      1.50
--- src/include/storage/fd.h    17 May 2005 22:03:57 -0000
***************
*** 89,94 ****
--- 89,96 ----
                                                          SubTransactionId 
parentSubid);
  extern void RemovePgTempFiles(void);
  extern int    pg_fsync(int fd);
+ extern int    pg_fsync_no_writethrough(int fd);
+ extern int    pg_fsync_writethrough(int fd);
  extern int    pg_fdatasync(int fd);
  
  /* Filename components for OpenTemporaryFile */
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to [EMAIL PROTECTED] so that your
      message can get through to the mailing list cleanly

Reply via email to