Applied. ---------------------------------------------------------------------------
Bruce Momjian wrote: > > Attached is an updated version of your Darwin Fsync-writethrough patch. > I did much more restructuring because we do fsync in more places than > just WAL (like at checkpoints), and we should follow the WAL > write-through in those places too if that method is used for fsync of > WAL, rather than the normal fsync without the force-writethrough. > > I restructured the code so now pg_fsync does writethrough or not based > on the wal setting. My feeling is you need WAL writethrough, you need > it for other fsyncs as well. Write-through was added in an 8.0.X > subrelease, and because it was a subrelease, the patch did a minimal > amount of code change. This change is the restructuring that makes > sense for 8.1. > > --------------------------------------------------------------------------- > > Chris Campbell wrote: > > Thanks, Bruce! Here's the patch (would you rather patches be attached > > files, or inline?). > > > > Some explanation. In the current code, there is no "fsync" on Windows, > > only "fsync_writethrough". But, internally, it behaves identically to > > "fsync". The only special case was for the GUC strings. > > > > The patch changes it thusly: since "fsync" and "fsync_writethrough" > > cannot be identical on Darwin, I added a new constant for > > SYNC_METHOD_FSYNC_WRITE_THROUGH (3). A GUC value of > > "fsync_writethrough" sets fsync_method to this new constant. > > > > On Windows, the fsync_writethrough case falls through to the fsync case > > (which simply runs fsync). On Darwin, the fsync_writethrough case > > executes fcntl(openLogFile, F_FULLFSYNC, 0), which does everything that > > fsync does, and then flushes the disk's write cache. The code does not > > need to call fsync directly, just fcntl. > > > > The performance of a script that executes 11,000 INSERT statements > > (outside a transaction) is pretty slow, since it requires flushing the > > disk write cache after EVERY query (transaction commit). I waited about > > 15 minutes before I just killed it. But if you do all the inserts > > inside a transaction, then I only pay the cache flush price once at > > transaction commit, and it's quite usable. > > > > I asked around on IRC, and the only place where flushing the disk cache > > is important is when syncing a WAL file. So I didn't change the > > behavior of fsync() globally for the code...just for WAL syncs. > > > > I added a new #define called HAVE_FSYNC_WRITE_THROUGH, which is defined > > only by the win32.h and darwin.h port headers. This controls the > > availability of the "fsync_writethrough" GUC setting. Thanks to the > > port headers, no configure test is needed. > > > > Thanks! Let me know what you think of the patch, and tell me what else > > I can do. > > > > - Chris > > > > > > -- > Bruce Momjian | http://candle.pha.pa.us > pgman@candle.pha.pa.us | (610) 359-1001 > + If your life is a hard drive, | 13 Roberts Road > + Christ can be your backup. | Newtown Square, Pennsylvania 19073 > Index: doc/src/sgml/runtime.sgml > =================================================================== > RCS file: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v > retrieving revision 1.319 > diff -c -c -r1.319 runtime.sgml > *** doc/src/sgml/runtime.sgml 15 May 2005 00:26:18 -0000 1.319 > --- doc/src/sgml/runtime.sgml 17 May 2005 22:03:49 -0000 > *************** > *** 1595,1601 **** > values are > <literal>fsync</> (call <function>fsync()</> at each commit), > <literal>fdatasync</> (call <function>fdatasync()</> at each > commit), > ! <literal>fsync_writethrough</> (call <function>_commit()</> at each > commit on Windows), > <literal>open_sync</> (write WAL files with <function>open()</> > option <symbol>O_SYNC</>), and > <literal>open_datasync</> (write WAL files with <function>open()</> > option <symbol>O_DSYNC</>). > Not all of these choices are available on all platforms. > --- 1595,1601 ---- > values are > <literal>fsync</> (call <function>fsync()</> at each commit), > <literal>fdatasync</> (call <function>fdatasync()</> at each > commit), > ! <literal>fsync_writethrough</> (force write-through of any disk > write cache), > <literal>open_sync</> (write WAL files with <function>open()</> > option <symbol>O_SYNC</>), and > <literal>open_datasync</> (write WAL files with <function>open()</> > option <symbol>O_DSYNC</>). > Not all of these choices are available on all platforms. > Index: src/backend/access/transam/xlog.c > =================================================================== > RCS file: /cvsroot/pgsql/src/backend/access/transam/xlog.c,v > retrieving revision 1.191 > diff -c -c -r1.191 xlog.c > *** src/backend/access/transam/xlog.c 10 May 2005 22:27:29 -0000 1.191 > --- src/backend/access/transam/xlog.c 17 May 2005 22:03:53 -0000 > *************** > *** 51,61 **** > * default method. We assume that fsync() is always available, and that > * configure determined whether fdatasync() is. > */ > - #define SYNC_METHOD_FSYNC 0 > - #define SYNC_METHOD_FDATASYNC 1 > - #define SYNC_METHOD_OPEN 2 /* used for both O_SYNC > and > - > * O_DSYNC */ > - > #if defined(O_SYNC) > #define OPEN_SYNC_FLAG O_SYNC > #else > --- 51,56 ---- > *************** > *** 79,99 **** > #define DEFAULT_SYNC_METHOD_STR "open_datasync" > #define DEFAULT_SYNC_METHOD SYNC_METHOD_OPEN > #define DEFAULT_SYNC_FLAGBIT OPEN_DATASYNC_FLAG > ! #else > ! #if defined(HAVE_FDATASYNC) > #define DEFAULT_SYNC_METHOD_STR "fdatasync" > #define DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC > #define DEFAULT_SYNC_FLAGBIT 0 > ! #else > ! #ifndef FSYNC_IS_WRITE_THROUGH > #define DEFAULT_SYNC_METHOD_STR "fsync" > #else > #define DEFAULT_SYNC_METHOD_STR "fsync_writethrough" > ! #endif > ! #define DEFAULT_SYNC_METHOD SYNC_METHOD_FSYNC > #define DEFAULT_SYNC_FLAGBIT 0 > #endif > - #endif > > > /* User-settable parameters */ > --- 74,92 ---- > #define DEFAULT_SYNC_METHOD_STR "open_datasync" > #define DEFAULT_SYNC_METHOD SYNC_METHOD_OPEN > #define DEFAULT_SYNC_FLAGBIT OPEN_DATASYNC_FLAG > ! #elif defined(HAVE_FDATASYNC) > #define DEFAULT_SYNC_METHOD_STR "fdatasync" > #define DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC > #define DEFAULT_SYNC_FLAGBIT 0 > ! #elif !defined(HAVE_FSYNC_WRITETHROUGH_ONLY) > #define DEFAULT_SYNC_METHOD_STR "fsync" > + #define DEFAULT_SYNC_METHOD SYNC_METHOD_FSYNC > + #define DEFAULT_SYNC_FLAGBIT 0 > #else > #define DEFAULT_SYNC_METHOD_STR "fsync_writethrough" > ! #define DEFAULT_SYNC_METHOD SYNC_METHOD_FSYNC_WRITETHROUGH > #define DEFAULT_SYNC_FLAGBIT 0 > #endif > > > /* User-settable parameters */ > *************** > *** 122,128 **** > > > /* these are derived from XLOG_sync_method by assign_xlog_sync_method */ > ! static int sync_method = DEFAULT_SYNC_METHOD; > static int open_sync_bit = DEFAULT_SYNC_FLAGBIT; > > #define XLOG_SYNC_BIT (enableFsync ? open_sync_bit : 0) > --- 115,121 ---- > > > /* these are derived from XLOG_sync_method by assign_xlog_sync_method */ > ! int sync_method = DEFAULT_SYNC_METHOD; > static int open_sync_bit = DEFAULT_SYNC_FLAGBIT; > > #define XLOG_SYNC_BIT (enableFsync ? open_sync_bit : 0) > *************** > *** 5249,5264 **** > int new_sync_method; > int new_sync_bit; > > - #ifndef FSYNC_IS_WRITE_THROUGH > if (pg_strcasecmp(method, "fsync") == 0) > - #else > - /* Win32 fsync() == _commit(), which writes through a write cache */ > - if (pg_strcasecmp(method, "fsync_writethrough") == 0) > - #endif > { > new_sync_method = SYNC_METHOD_FSYNC; > new_sync_bit = 0; > } > #ifdef HAVE_FDATASYNC > else if (pg_strcasecmp(method, "fdatasync") == 0) > { > --- 5242,5259 ---- > int new_sync_method; > int new_sync_bit; > > if (pg_strcasecmp(method, "fsync") == 0) > { > new_sync_method = SYNC_METHOD_FSYNC; > new_sync_bit = 0; > } > + #ifdef HAVE_FSYNC_WRITETHROUGH > + else if (pg_strcasecmp(method, "fsync_writethrough") == 0) > + { > + new_sync_method = SYNC_METHOD_FSYNC_WRITETHROUGH; > + new_sync_bit = 0; > + } > + #endif > #ifdef HAVE_FDATASYNC > else if (pg_strcasecmp(method, "fdatasync") == 0) > { > *************** > *** 5328,5339 **** > switch (sync_method) > { > case SYNC_METHOD_FSYNC: > ! if (pg_fsync(openLogFile) != 0) > ereport(PANIC, > (errcode_for_file_access(), > errmsg("could not fsync log file %u, > segment %u: %m", > openLogId, openLogSeg))); > break; > #ifdef HAVE_FDATASYNC > case SYNC_METHOD_FDATASYNC: > if (pg_fdatasync(openLogFile) != 0) > --- 5323,5343 ---- > switch (sync_method) > { > case SYNC_METHOD_FSYNC: > ! if (pg_fsync_no_writethrough(openLogFile) != 0) > ereport(PANIC, > (errcode_for_file_access(), > errmsg("could not fsync log file %u, > segment %u: %m", > openLogId, openLogSeg))); > break; > + #ifdef HAVE_FSYNC_WRITETHROUGH > + case SYNC_METHOD_FSYNC_WRITETHROUGH: > + if (pg_fsync_writethrough(openLogFile) != 0) > + ereport(PANIC, > + (errcode_for_file_access(), > + errmsg("could not fsync write-through > log file %u, segment %u: %m", > + openLogId, openLogSeg))); > + break; > + #endif > #ifdef HAVE_FDATASYNC > case SYNC_METHOD_FDATASYNC: > if (pg_fdatasync(openLogFile) != 0) > Index: src/backend/storage/file/fd.c > =================================================================== > RCS file: /cvsroot/pgsql/src/backend/storage/file/fd.c,v > retrieving revision 1.115 > diff -c -c -r1.115 fd.c > *** src/backend/storage/file/fd.c 31 Dec 2004 22:00:51 -0000 1.115 > --- src/backend/storage/file/fd.c 17 May 2005 22:03:54 -0000 > *************** > *** 232,242 **** > > > /* > ! * pg_fsync --- same as fsync except does nothing if enableFsync is off > */ > int > pg_fsync(int fd) > { > if (enableFsync) > return fsync(fd); > else > --- 232,258 ---- > > > /* > ! * pg_fsync --- do fsync with or without writethrough > */ > int > pg_fsync(int fd) > { > + #ifndef HAVE_FSYNC_WRITETHROUGH_ONLY > + if (sync_method != SYNC_METHOD_FSYNC_WRITETHROUGH) > + return pg_fsync_no_writethrough(fd); > + else > + #endif > + return pg_fsync_writethrough(fd); > + } > + > + > + /* > + * pg_fsync_no_writethrough --- same as fsync except does nothing if > + * enableFsync is off > + */ > + int > + pg_fsync_no_writethrough(int fd) > + { > if (enableFsync) > return fsync(fd); > else > *************** > *** 244,249 **** > --- 260,283 ---- > } > > /* > + * pg_fsync_writethrough > + */ > + int > + pg_fsync_writethrough(int fd) > + { > + if (enableFsync) > + #ifdef WIN32 > + return _commit(fd); > + #elif defined(__darwin__) > + return (fcntl(fd, F_FULLFSYNC, 0) == -1) ? -1 : 0; > + #else > + return -1; > + #endif > + else > + return 0; > + } > + > + /* > * pg_fdatasync --- same as fdatasync except does nothing if enableFsync is > off > * > * Not all platforms have fdatasync; treat as fsync if not available. > Index: src/include/access/xlog.h > =================================================================== > RCS file: /cvsroot/pgsql/src/include/access/xlog.h,v > retrieving revision 1.60 > diff -c -c -r1.60 xlog.h > *** src/include/access/xlog.h 28 Apr 2005 21:47:17 -0000 1.60 > --- src/include/access/xlog.h 17 May 2005 22:03:56 -0000 > *************** > *** 75,80 **** > --- 75,87 ---- > */ > #define XLOG_NO_TRAN XLR_INFO_MASK > > + /* Sync methods */ > + #define SYNC_METHOD_FSYNC 0 > + #define SYNC_METHOD_FDATASYNC 1 > + #define SYNC_METHOD_OPEN 2 /* for O_SYNC > and O_DSYNC */ > + #define SYNC_METHOD_FSYNC_WRITETHROUGH 3 > + extern int sync_method; > + > /* > * List of these structs is used to pass data to XLogInsert(). > * > Index: src/include/port/darwin.h > =================================================================== > RCS file: /cvsroot/pgsql/src/include/port/darwin.h,v > retrieving revision 1.6 > diff -c -c -r1.6 darwin.h > *** src/include/port/darwin.h 23 Dec 2003 03:31:30 -0000 1.6 > --- src/include/port/darwin.h 17 May 2005 22:03:57 -0000 > *************** > *** 1 **** > --- 1,4 ---- > #define __darwin__ 1 > + > + #define HAVE_FSYNC_WRITETHROUGH > + > Index: src/include/port/win32.h > =================================================================== > RCS file: /cvsroot/pgsql/src/include/port/win32.h,v > retrieving revision 1.44 > diff -c -c -r1.44 win32.h > *** src/include/port/win32.h 24 Mar 2005 04:36:19 -0000 1.44 > --- src/include/port/win32.h 17 May 2005 22:03:57 -0000 > *************** > *** 16,23 **** > #define mkdir(a,b) mkdir(a) > > > ! #define fsync(a) _commit(a) > ! #define FSYNC_IS_WRITE_THROUGH > #define ftruncate(a,b) chsize(a,b) > > #define USES_WINSOCK > --- 16,23 ---- > #define mkdir(a,b) mkdir(a) > > > ! #define HAVE_FSYNC_WRITETHROUGH > ! #define HAVE_FSYNC_WRITETHROUGH_ONLY > #define ftruncate(a,b) chsize(a,b) > > #define USES_WINSOCK > Index: src/include/storage/fd.h > =================================================================== > RCS file: /cvsroot/pgsql/src/include/storage/fd.h,v > retrieving revision 1.50 > diff -c -c -r1.50 fd.h > *** src/include/storage/fd.h 31 Dec 2004 22:03:42 -0000 1.50 > --- src/include/storage/fd.h 17 May 2005 22:03:57 -0000 > *************** > *** 89,94 **** > --- 89,96 ---- > SubTransactionId > parentSubid); > extern void RemovePgTempFiles(void); > extern int pg_fsync(int fd); > + extern int pg_fsync_no_writethrough(int fd); > + extern int pg_fsync_writethrough(int fd); > extern int pg_fdatasync(int fd); > > /* Filename components for OpenTemporaryFile */ > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to [EMAIL PROTECTED] so that your > message can get through to the mailing list cleanly -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster