Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2011-03-10 Thread Bruce Momjian
Josh Berkus wrote:
> On 12/6/10 6:10 PM, Tom Lane wrote:
>> Robert Haas <robertmh...@gmail.com> writes:
>>> On Mon, Dec 6, 2010 at 9:04 PM, Josh Berkus <j...@agliodbs.com> wrote:
>>>> Actually, on OSX 10.5.8, o_dsync and fdatasync aren't even available.
>>>> From my run, it looks like even so regular fsync might be better than
>>>> open_sync.
>>>
>>> But I think you need to use fsync_writethrough if you actually want
>>> durability.
>>
>> Yeah.  Unless your laptop contains an SSD, those numbers are garbage on
>> their face.  So that's another problem with test_fsync: it omits
>> fsync_writethrough.
>
> Yeah, the issue with test_fsync appears to be that it's designed to work
> without os-specific switches no matter what, not to accurately reflect
> how we access wal.

I have now modified pg_test_fsync to use O_DIRECT for O_SYNC/O_FSYNC,
and O_DSYNC, if supported, so it now matches how we use WAL (except we
don't use O_DIRECT when in 'archive' and 'hot standby' mode).  Applied
patch attached.

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_test_fsync/pg_test_fsync.c b/contrib/pg_test_fsync/pg_test_fsync.c
new file mode 100644
index d075483..49a7b3c
*** a/contrib/pg_test_fsync/pg_test_fsync.c
--- b/contrib/pg_test_fsync/pg_test_fsync.c
***************
*** 23,29 ****
  #define XLOG_BLCKSZ_K	(XLOG_BLCKSZ / 1024)
  
  #define LABEL_FORMAT		"%-32s"
! #define NA_FORMAT			LABEL_FORMAT "%18s"
  #define OPS_FORMAT			"%9.3f ops/sec"
  
  static const char *progname;
--- 23,29 ----
  #define XLOG_BLCKSZ_K	(XLOG_BLCKSZ / 1024)
  
  #define LABEL_FORMAT		"%-32s"
! #define NA_FORMAT			"%18s"
  #define OPS_FORMAT			"%9.3f ops/sec"
  
  static const char *progname;
*************** handle_args(int argc, char *argv[])
*** 134,139 ****
--- 134,144 ----
  	}
  
  	printf("%d operations per test\n", ops_per_test);
+ #if PG_O_DIRECT != 0
+ 	printf("O_DIRECT supported on this platform for open_datasync and open_sync.\n");
+ #else
+ 	printf("Direct I/O is not supported on this platform.\n");
+ #endif
  }
  
  static void
*************** test_sync(int writes_per_op)
*** 184,226 ****
  	/*
  	 * Test open_datasync if available
  	 */
! #ifdef OPEN_DATASYNC_FLAG
! 	printf(LABEL_FORMAT, "open_datasync"
! #if PG_O_DIRECT != 0
! 		" (non-direct I/O)*"
! #endif
! 		);
  	fflush(stdout);
  
! 	if ((tmpfile = open(filename, O_RDWR | O_DSYNC, 0)) == -1)
! 		die("could not open output file");
! 	gettimeofday(&start_t, NULL);
! 	for (ops = 0; ops < ops_per_test; ops++)
! 	{
! 		for (writes = 0; writes < writes_per_op; writes++)
! 			if (write(tmpfile, buf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
! 				die("write failed");
! 		if (lseek(tmpfile, 0, SEEK_SET) == -1)
! 			die("seek failed");
! 	}
! 	gettimeofday(&stop_t, NULL);
! 	close(tmpfile);
! 	print_elapse(start_t, stop_t);
! 
! 	/*
! 	 * If O_DIRECT is enabled, test that with open_datasync
! 	 */
! #if PG_O_DIRECT != 0
  	if ((tmpfile = open(filename, O_RDWR | O_DSYNC | PG_O_DIRECT, 0)) == -1)
  	{
! 		printf(NA_FORMAT, "o_direct", "n/a**\n");
  		fs_warning = true;
  	}
  	else
  	{
! 		printf(LABEL_FORMAT, "open_datasync (direct I/O)");
! 		fflush(stdout);
! 
  		gettimeofday(&start_t, NULL);
  		for (ops = 0; ops < ops_per_test; ops++)
  		{
--- 189,207 ----
  	/*
  	 * Test open_datasync if available
  	 */
! 	printf(LABEL_FORMAT, "open_datasync");
  	fflush(stdout);
  
! #ifdef OPEN_DATASYNC_FLAG
  	if ((tmpfile = open(filename, O_RDWR | O_DSYNC | PG_O_DIRECT, 0)) == -1)
  	{
! 		printf(NA_FORMAT, "n/a*\n");
  		fs_warning = true;
  	}
  	else
  	{
! 		if ((tmpfile = open(filename, O_RDWR | O_DSYNC | PG_O_DIRECT, 0)) == -1)
! 			die("could not open output file");
  		gettimeofday(&start_t, NULL);
  		for (ops = 0; ops < ops_per_test; ops++)
  		{
*************** test_sync(int writes_per_op)
*** 234,252 ****
  		close(tmpfile);
  		print_elapse(start_t, stop_t);
  	}
- #endif
- 
  #else
! 	printf(NA_FORMAT, "open_datasync", "n/a\n");
  #endif
  
  /*
   * Test fdatasync if available
   */
- #ifdef HAVE_FDATASYNC
  	printf(LABEL_FORMAT, "fdatasync");
  	fflush(stdout);
  
  	if ((tmpfile = open(filename, O_RDWR, 0)) == -1)
  		die("could not open output file");
  	gettimeofday(&start_t, NULL);
--- 215,231 ----
  		close(tmpfile);
  		print_elapse(start_t, stop_t);
  	}
  #else
! 	printf(NA_FORMAT, "n/a\n");
  #endif
  
  /*
   * Test fdatasync if available
   */
  	printf(LABEL_FORMAT, "fdatasync");
  	fflush(stdout);
  
+ #ifdef HAVE_FDATASYNC
  	if ((tmpfile = open(filename, O_RDWR, 0)) == -1)
  		die("could not open output file");
  	gettimeofday(&start_t, NULL);
*************** test_sync(int writes_per_op)
*** 263,269 ****
  	close(tmpfile);
  	print_elapse(start_t, stop_t);
  #else
! 	printf(NA_FORMAT, "fdatasync", "n/a\n");
  #endif
  
  /*
--- 242,248 ----
  	close(tmpfile);
  	print_elapse(start_t, stop_t);
  #else
! 	printf(NA_FORMAT, "n/a\n");
  #endif
  
  /*
*************** test_sync(int writes_per_op)
*** 

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-08 Thread Marti Raudsepp
On Tue, Dec 7, 2010 at 03:34, Tom Lane <t...@sss.pgh.pa.us> wrote:
> To my mind, O_DIRECT is not really the key issue here, it's whether to
> prefer O_DSYNC or fdatasync.

Since different platforms implement these primitives differently, and
it's not always clear from the header file definitions which options
are actually implemented, how about simply hard-coding a default value
for each platform?

1. This would be quite straightforward to code and document (a table
of platforms and their default wal_sync_method setting)

2. The best performing (or safest) method can be chosen on every
platform. From the above discussion it seems that Windows and OSX
should default to fsync_writethrough even if other methods are
available

3. It would pre-empt similar surprises if other platforms change their
header files, like what happened on Linux now.

Sounds like the simple and foolproof solution.

Regards,
Marti

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-08 Thread Tom Lane
Marti Raudsepp <ma...@juffo.org> writes:
> On Tue, Dec 7, 2010 at 03:34, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> To my mind, O_DIRECT is not really the key issue here, it's whether to
>> prefer O_DSYNC or fdatasync.
>
> Since different platforms implement these primitives differently, and
> it's not always clear from the header file definitions which options
> are actually implemented, how about simply hard-coding a default value
> for each platform?

There's not a fixed finite list of platforms we support.  In general
we prefer to avoid designing things that way at all.  If we have to have
specific exceptions for specific platforms, we grin and bear it, but for
the most part behavioral differences ought to be driven by configure's
probes for platform features.

regards, tom lane



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-07 Thread Steve Singer

On 10-12-06 09:00 PM, Josh Berkus wrote:
> Steve,
>
>> If you tell me which options to pgbench and which .conf file settings
>> you'd like to see I can probably arrange to run some tests on AIX.
>
> Compile and run test_fsync in PGSRC/src/tools/fsync.



Attached are runs against two different disk sub-systems from a server 
running AIX 5.3.


The first one is against the local disks


Loops = 1

Simple write:
    8k write                         60812.454/second

Compare file sync methods using one write:
    open_datasync 8k write             162.160/second
    open_sync 8k write                 158.472/second
    8k write, fdatasync                158.157/second
    8k write, fsync                     45.382/second

Compare file sync methods using two writes:
    2 open_datasync 8k writes           79.472/second
    2 open_sync 8k writes               80.095/second
    8k write, 8k write, fdatasync      159.268/second
    8k write, 8k write, fsync           44.725/second

Compare open_sync with different sizes:
    open_sync 16k write                162.017/second
    2 open_sync 8k writes               79.709/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
    8k write, fsync, close              45.361/second
    8k write, close, fsync              36.311/second





The below profile is from the same machine using an IBM DS 6800 SAN for 
storage.



Loops = 1

Simple write:
    8k write                         75933.027/second

Compare file sync methods using one write:
    open_datasync 8k write            2762.801/second
    open_sync 8k write                2453.822/second
    8k write, fdatasync               2867.331/second
    8k write, fsync                   1094.048/second

Compare file sync methods using two writes:
    2 open_datasync 8k writes         1287.845/second
    2 open_sync 8k writes             1332.084/second
    8k write, 8k write, fdatasync     1966.411/second
    8k write, 8k write, fsync         1048.354/second

Compare open_sync with different sizes:
    open_sync 16k write               2281.425/second
    2 open_sync 8k writes             1401.561/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
    8k write, fsync, close            1298.404/second
    8k write, close, fsync            1188.582/second






Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-06 Thread Tom Lane
Greg Smith <g...@2ndquadrant.com> writes:
> Regardless, I'm now leaning heavily toward the idea of avoiding
> open_datasync by default given this bug, and backpatching that change to
> at least 8.4.  I'll do some more database-level performance tests here
> just as a final sanity check on that.  My gut feel is now that we'll
> eventually be taking something like Marti's patch, adding some more
> documentation around it, and applying that to HEAD as well as some
> number of back branches.

I think we have got consensus that (1) open_datasync should not be the
default on Linux, and (2) this change needs to be back-patched.  What
is not clear to me is whether we have consensus to change the option
preference order globally, or restrict the change to just be effective
on Linux.  The various testing that's been reported so far is all for
Linux and thus doesn't directly address the question of whether other
kernels will have similar performance properties.  However, it seems
reasonable to me to suppose that open_datasync could only be a win in
very restricted scenarios and thus shouldn't be a preferred default.
Also, I dread trying to document the behavior if the preference order
becomes platform-dependent.

With the holidays fast approaching, our window to do something about
this in a timely fashion grows short.  If we don't schedule update
releases to be made this week, I think we're looking at not getting the
updates out till after New Year's.  Do we want to wait that long?  Is
anyone actually planning to do performance testing that would prove
anything about non-Linux platforms?

regards, tom lane



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-06 Thread Greg Smith

Tom Lane wrote:
> The various testing that's been reported so far is all for
> Linux and thus doesn't directly address the question of whether other
> kernels will have similar performance properties.


Survey of some popular platforms:

Linux:  don't want O_DIRECT by default for reliability reasons, and 
there's no clear performance win in the default config with small 
wal_buffers


Solaris:  O_DIRECT doesn't work; there's another API, but support for it
has never been added; see
http://blogs.sun.com/jkshah/entry/postgresql_wal_sync_method_and


Windows:  Small reported gains for O_DIRECT, i.e. 10% at
http://archives.postgresql.org/pgsql-hackers/2007-03/msg01615.php


FreeBSD:  It probably works there, but I've never seen good performance 
tests of it on this platform.


Mac OS X:  Like Solaris, there's a similar mechanism but it's not 
O_DIRECT; see 
http://stackoverflow.com/questions/2299402/how-does-one-do-raw-io-on-mac-os-x-ie-equivalent-to-linuxs-o-direct-flag 
for notes about the F_NOCACHE  feature used.  Same basic situation as 
Solaris; there's an API, but PostgreSQL doesn't use it yet.


So my guess is that some small percentage of Windows users might notice 
a change here, and some testing on FreeBSD would be useful too.  That's 
about it for platforms that I think anybody needs to worry about.


--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-06 Thread Steve Singer

On 10-12-06 06:56 PM, Greg Smith wrote:
> Tom Lane wrote:
>> The various testing that's been reported so far is all for
>> Linux and thus doesn't directly address the question of whether other
>> kernels will have similar performance properties.
>
> Survey of some popular platforms:
>
> <snip>
>
> So my guess is that some small percentage of Windows users might notice
> a change here, and some testing on FreeBSD would be useful too. That's
> about it for platforms that I think anybody needs to worry about.

If you tell me which options to pgbench and which .conf file settings
you'd like to see I can probably arrange to run some tests on AIX.

> --
> Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
> PostgreSQL Training, Services and Support        www.2ndQuadrant.us
> PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books






Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-06 Thread Tom Lane
Greg Smith <g...@2ndquadrant.com> writes:
> So my guess is that some small percentage of Windows users might notice
> a change here, and some testing on FreeBSD would be useful too.  That's
> about it for platforms that I think anybody needs to worry about.

To my mind, O_DIRECT is not really the key issue here, it's whether to
prefer O_DSYNC or fdatasync.  I looked back in the archives, and I think
that the main reason we prefer O_DSYNC when available is the results
I got here:

http://archives.postgresql.org/pgsql-hackers/2001-03/msg00381.php

which demonstrated a performance benefit on HPUX 10.20, though with a
test tool much more primitive than test_fsync.  I still have that
machine, although the disk that was in it at the time died awhile back.
What's in there now is a Seagate ST336607LW spinning at 10000 RPM (166
rev/sec) and today I get numbers like this from test_fsync:

Simple write:
    8k write                         28331.020/second

Compare file sync methods using one write:
    open_datasync 8k write             161.190/second
    open_sync 8k write                 156.478/second
    8k write, fdatasync                 54.302/second
    8k write, fsync                     51.810/second

Compare file sync methods using two writes:
    2 open_datasync 8k writes           81.702/second
    2 open_sync 8k writes               80.172/second
    8k write, 8k write, fdatasync       40.829/second
    8k write, 8k write, fsync           39.836/second

Compare open_sync with different sizes:
    open_sync 16k write                 80.192/second
    2 open_sync 8k writes               78.018/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
    8k write, fsync, close              52.527/second
    8k write, close, fsync              54.092/second

So *on that rather ancient platform* there's a measurable performance
benefit to O_DSYNC, but this seems to be largely because fdatasync is
stubbed to fsync in userspace rather than because fdatasync wouldn't
be a better idea in the abstract.  Also, a lot of the argument against
fsync at the time was that it forced the kernel to iterate through all
the buffers for the WAL file to see if any were dirty.  I would imagine
that modern kernels are a tad smarter about that; and even if they
aren't, the CPU speed versus disk speed tradeoff has changed enough
since 2001 that iterating through 16MB of buffers isn't as interesting
as it was then.

So to my mind, switching to the preference order fdatasync,
fsync_writethrough, fsync seems like the thing to do.  Since we assume
fsync is always available, that means that O_DSYNC/O_SYNC will not be
the defaults on any platform.
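
A minimal sketch of that preference order, for illustration only. The enum, the function name, and the two availability flags are hypothetical; in PostgreSQL itself the availability of each method comes from configure probes and platform macros rather than from runtime arguments.

```c
/* Sync methods in the proposed order of preference (hypothetical names). */
enum sketch_sync_method
{
	SKETCH_FDATASYNC,
	SKETCH_FSYNC_WRITETHROUGH,
	SKETCH_FSYNC
};

/*
 * Pick a default using the order proposed above: fdatasync first,
 * then fsync_writethrough, then fsync.  fsync is assumed to be always
 * available, so it is the unconditional fallback.
 */
enum sketch_sync_method
sketch_default_sync_method(int have_fdatasync, int have_fsync_writethrough)
{
	if (have_fdatasync)
		return SKETCH_FDATASYNC;
	if (have_fsync_writethrough)
		return SKETCH_FSYNC_WRITETHROUGH;
	return SKETCH_FSYNC;
}
```

Note that O_DSYNC and O_SYNC never appear in the result, which is the point of the proposal: they remain selectable, but are never chosen by default.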

regards, tom lane



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-06 Thread Josh Berkus
Steve,

> If you tell me which options to pgbench and which .conf file settings
> you'd like to see I can probably arrange to run some tests on AIX.

Compile and run test_fsync in PGSRC/src/tools/fsync.
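
For readers who have not used the tool, the following is a simplified sketch of the kind of measurement behind the "8k write, fsync" rows in this thread. It is not the tool's source: the function name is hypothetical, and the 8 kB block size is an assumption taken from the "8k write" labels.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

#define SKETCH_BLCKSZ 8192		/* assumed; matches the "8k write" rows */

/*
 * Hypothetical helper: rewrite one 8 kB block at offset 0 `ops` times,
 * calling fsync() after each write, and return operations per second.
 * Returns -1.0 on any failure.  The file is created if needed and is
 * left in place for the caller to remove.
 */
double
sketch_fsync_ops_per_sec(const char *path, int ops)
{
	char		buf[SKETCH_BLCKSZ];
	struct timeval start;
	struct timeval stop;
	double		elapsed;
	int			fd;
	int			i;

	memset(buf, 'a', sizeof(buf));
	if ((fd = open(path, O_RDWR | O_CREAT, 0600)) == -1)
		return -1.0;

	gettimeofday(&start, NULL);
	for (i = 0; i < ops; i++)
	{
		if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf) ||
			fsync(fd) != 0 ||
			lseek(fd, 0, SEEK_SET) == -1)
		{
			close(fd);
			return -1.0;
		}
	}
	gettimeofday(&stop, NULL);
	close(fd);

	elapsed = (stop.tv_sec - start.tv_sec) +
		(stop.tv_usec - start.tv_usec) / 1000000.0;
	if (elapsed <= 0.0)
		elapsed = 0.000001;		/* clock too coarse; avoid dividing by zero */
	return ops / elapsed;
}
```

The file should live on the same filesystem as the WAL directory, since the results depend heavily on the filesystem and the storage underneath it.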

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-06 Thread Josh Berkus

> Mac OS X:  Like Solaris, there's a similar mechanism but it's not
> O_DIRECT; see
> http://stackoverflow.com/questions/2299402/how-does-one-do-raw-io-on-mac-os-x-ie-equivalent-to-linuxs-o-direct-flag
> for notes about the F_NOCACHE feature used.  Same basic situation as
> Solaris; there's an API, but PostgreSQL doesn't use it yet.

Actually, on OSX 10.5.8, o_dsync and fdatasync aren't even available.
From my run, it looks like even so regular fsync might be better than
open_sync.  Results from a MacBook:

Sidney-Stratton:fsync josh$ ./test_fsync
Loops = 1

Simple write:
    8k write                          2121.004/second

Compare file sync methods using one write:
    (open_datasync unavailable)
    open_sync 8k write                1993.833/second
    (fdatasync unavailable)
    8k write, fsync                   1878.154/second

Compare file sync methods using two writes:
    (open_datasync unavailable)
    2 open_sync 8k writes             1005.009/second
    (fdatasync unavailable)
    8k write, 8k write, fsync         1709.862/second

Compare open_sync with different sizes:
    open_sync 16k write               1728.803/second
    2 open_sync 8k writes              969.416/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
    8k write, fsync, close            1772.572/second
    8k write, close, fsync            1939.897/second


-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-06 Thread Robert Haas
On Mon, Dec 6, 2010 at 9:04 PM, Josh Berkus <j...@agliodbs.com> wrote:
>> Mac OS X:  Like Solaris, there's a similar mechanism but it's not
>> O_DIRECT; see
>> http://stackoverflow.com/questions/2299402/how-does-one-do-raw-io-on-mac-os-x-ie-equivalent-to-linuxs-o-direct-flag
>> for notes about the F_NOCACHE feature used.  Same basic situation as
>> Solaris; there's an API, but PostgreSQL doesn't use it yet.
>
> Actually, on OSX 10.5.8, o_dsync and fdatasync aren't even available.
> From my run, it looks like even so regular fsync might be better than
> open_sync.

But I think you need to use fsync_writethrough if you actually want durability.
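
As a rough illustration of what a write-through sync looks like on OS X: the platform provides the fcntl() command F_FULLFSYNC, which asks the drive to flush its own cache as well. The helper name below is hypothetical, and falling back to plain fsync() where F_FULLFSYNC is not defined is an assumption made so the sketch compiles elsewhere; it is not a statement about how the backend behaves on those platforms.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush fd through the drive's write cache where
 * the platform offers F_FULLFSYNC (OS X); otherwise fall back to a
 * plain fsync().  Returns 0 on success, -1 on failure.
 */
int
sketch_fsync_writethrough(int fd)
{
#ifdef F_FULLFSYNC
	return (fcntl(fd, F_FULLFSYNC, 0) == -1) ? -1 : 0;
#else
	return fsync(fd);
#endif
}
```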

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-06 Thread Tom Lane
Robert Haas <robertmh...@gmail.com> writes:
> On Mon, Dec 6, 2010 at 9:04 PM, Josh Berkus <j...@agliodbs.com> wrote:
>> Actually, on OSX 10.5.8, o_dsync and fdatasync aren't even available.
>> From my run, it looks like even so regular fsync might be better than
>> open_sync.
>
> But I think you need to use fsync_writethrough if you actually want
> durability.

Yeah.  Unless your laptop contains an SSD, those numbers are garbage on
their face.  So that's another problem with test_fsync: it omits
fsync_writethrough.
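
The arithmetic behind "garbage on their face" can be sketched as follows. The test rewrites the same block each time, so on a rotating disk an honest sync can complete at most about once per platter revolution. The function names are hypothetical, and the 5400 RPM figure in the usage note is an assumed laptop drive speed, not something reported in this thread.

```c
/*
 * Upper bound on honest syncs per second when rewriting a single block
 * on a rotating disk: one per platter revolution.
 */
double
sketch_max_syncs_per_second(double rpm)
{
	return rpm / 60.0;
}

/*
 * Returns 1 if a measured rate exceeds that bound, which suggests the
 * write was acknowledged from a volatile cache rather than the platter.
 */
int
sketch_looks_cached(double measured_ops_per_sec, double rpm)
{
	return measured_ops_per_sec > sketch_max_syncs_per_second(rpm);
}
```

With an assumed 5400 RPM drive the bound is 90 syncs per second, so the MacBook's 1878.154/second for "8k write, fsync" is far above it. By contrast, 161.190/second on a 10000 RPM disk sits just under the roughly 166 per second bound.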

regards, tom lane



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-06 Thread Josh Berkus
On 12/6/10 6:10 PM, Tom Lane wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Mon, Dec 6, 2010 at 9:04 PM, Josh Berkus <j...@agliodbs.com> wrote:
>>> Actually, on OSX 10.5.8, o_dsync and fdatasync aren't even available.
>>> From my run, it looks like even so regular fsync might be better than
>>> open_sync.
>>
>> But I think you need to use fsync_writethrough if you actually want
>> durability.
>
> Yeah.  Unless your laptop contains an SSD, those numbers are garbage on
> their face.  So that's another problem with test_fsync: it omits
> fsync_writethrough.

Yeah, the issue with test_fsync appears to be that it's designed to work
without os-specific switches no matter what, not to accurately reflect
how we access wal.

I'll see if I can do better.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-06 Thread Josh Berkus
All,

Geirth's results from his FreeBSD 7.1 server using 8.4's test_fsync:

Simple write timing:
    write                     0.007081

Compare fsync times on write() and non-write() descriptor:
If the times are similar, fsync() can sync data written
on a different descriptor.
    write, fsync, close       5.937933
    write, close, fsync       8.056394

Compare one o_sync write to two:
    one 16k o_sync write      7.366927
    two 8k o_sync writes     15.299300

Compare file sync methods with one 8k write:
    (o_dsync unavailable)
    open o_sync, write        7.512682
    (fdatasync unavailable)
    write, fsync              5.856480

Compare file sync methods with two 8k writes:
    (o_dsync unavailable)
    open o_sync, write       15.472910
    (fdatasync unavailable)
    write, fsync              5.880319


... again, open_sync does not look very impressive.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-03 Thread Josh Berkus
All,

So, I've been doing some reading about this issue, and I think
regardless of what other changes we make we should never enable O_DIRECT
automatically on Linux, and it was a mistake for us to do so in the
first place.

First, in the Linux docs for open():

=

In summary, O_DIRECT is a potentially powerful tool that should be used
with caution.  It is recommended that applications treat use of O_DIRECT
as a performance option which is disabled by default.

=

Second, Linus has a quote about O_DIRECT that I think should serve as an
indicator to us that directIO will never be beneficial-by-default on
Linux, and might even someday be desupported:



The right way to do it is to just not use O_DIRECT.

The whole notion of direct IO is totally braindamaged. Just say no.

This is your brain: O
This is your brain on O_DIRECT: .

Any questions?

I should have fought back harder. There really is no valid reason for EVER
using O_DIRECT. You need a buffer whatever IO you do, and it might as well
be the page cache. There are better ways to control the page cache than
play games and think that a page cache isn't necessary.

So don't use O_DIRECT. Use things like madvise() and posix_fadvise()
instead.

Linus
=
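
A small sketch of the alternative named in that quote: write through the page cache as usual, then hint that the cached pages are no longer needed. The helper name is hypothetical, and pairing the hint with fsync() is an assumption made so that the pages are clean before the hint is given. The hint is advisory, so its result is deliberately ignored.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a buffer, sync it, then advise the kernel
 * that the file's cached pages will not be needed again.
 * Returns 0 if the write and sync succeeded, -1 otherwise.
 */
int
sketch_write_and_drop(int fd, const void *buf, size_t len)
{
	if (write(fd, buf, len) != (ssize_t) len)
		return -1;
	if (fsync(fd) != 0)
		return -1;
#ifdef POSIX_FADV_DONTNEED
	/* offset 0 with length 0 covers the whole file */
	(void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
#endif
	return 0;
}
```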



-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-03 Thread Heikki Linnakangas

On 03.12.2010 21:55, Josh Berkus wrote:
> All,
>
> So, I've been doing some reading about this issue, and I think
> regardless of what other changes we make we should never enable O_DIRECT
> automatically on Linux, and it was a mistake for us to do so in the
> first place.
>
> First, in the Linux docs for open():


The quote on that man page is hilarious:

"The thing that has always disturbed me about O_DIRECT is that
 the whole interface is just stupid, and was probably designed by
 a deranged monkey on some serious mind-controlling substances."
  -- Linus

I agree we should not enable it by default. If it's faster in some
circumstances, the admin is free to do the research and enable it, but
defaults need to be safe above all.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-02 Thread Bruce Momjian
Andrew Dunstan wrote:
> On 11/30/2010 11:17 PM, Tom Lane wrote:
>> Andrew Dunstan <and...@dunslane.net> writes:
>>> On 11/30/2010 10:09 PM, Tom Lane wrote:
>>>> We should wait for the outcome of the discussion about whether to change
>>>> the default wal_sync_method before worrying about this.
>>> we've just had a significant PGX customer encounter this with the latest
>>> Postgres on Redhat's freshly released flagship product. Presumably the
>>> default wal_sync_method will only change prospectively.
>> I don't think so.  The fact that Linux is changing underneath us is a
>> compelling reason for back-patching a change here.  Our older branches
>> still have to be able to run on modern OS versions.  I'm also fairly
>> unclear on what you think a fix would look like if it's not effectively
>> a change in the default.
>>
>> (Hint: this *will* be changing, one way or another, in Red Hat's version
>> of 8.4, since that's what RH is shipping in RHEL6.)
>
> Well, my initial idea was that if PG_O_DIRECT is non-zero, we should
> test at startup time if we can use it on the WAL file system and inhibit
> its use if not.
>
> Incidentally, I notice it's not used at all in test_fsync.c - should it
> not be?

test_fsync certainly should be using PG_O_DIRECT in the same places the
backend does.  Once we decide how to handle PG_O_DIRECT, I will modify
test_fsync to match.

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-01 Thread Dimitri Fontaine
Tom Lane <t...@sss.pgh.pa.us> writes:
> As things stand, though, I think the only thing that's really open for
> discussion is how wide to make the scope of the default-change: should
> we just do it across the board, or try to limit it to some subset of the
> platforms where open_datasync is currently the default.  And that's a
> decision that ought to be informed by some performance testing.

Maybe I have a distorted view of the situation, having hit the
problem with an Ubuntu upgrade, but it really does not look like a
performance item to me.

PANIC:  could not open file pg_xlog/00010001 (log file 0, 
segment 1): Invalid argument

It took me quite some time to be able to start my development cluster
again and validate some new patch to send to the list.

Now I understand that you want to test the other alternatives before
choosing among those which work, but my opinion is that it should be fixed
in HEAD before next alpha, or even ASAP. It could be that a HINT here
would be enough for contributors not to lose too much time. It could read:
HINT: if you're running linux, please try to change wal_sync_method,
open_datasync is not reliable anymore in recent kernels. An example of
trustworthy setting is fdatasync.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-01 Thread Marti Raudsepp
On Wed, Dec 1, 2010 at 12:35, Dimitri Fontaine <dimi...@2ndquadrant.fr> wrote:
> PANIC:  could not open file pg_xlog/00010001 (log file 0,
> segment 1): Invalid argument

+1 I got the same error when trying to get PostgreSQL working on tmpfs
and gave up.

> Now I understand that you want to test the other alternatives before
> choosing among those which work, but my opinion is that it should be fixed
> in HEAD before next alpha, or even ASAP.

It's queued for this month's commitfest, so things are moving.

https://commitfest.postgresql.org/action/patch_view?id=432

Regards,
Marti



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-01 Thread Robert Haas
On Wed, Dec 1, 2010 at 12:31 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Josh Berkus <j...@agliodbs.com> writes:
>> On 11/30/10 7:09 PM, Tom Lane wrote:
>>> Josh Berkus <j...@agliodbs.com> writes:
>>>> Apparently, testing for O_DIRECT at compile time isn't adequate.  Ideas?
>>>
>>> We should wait for the outcome of the discussion about whether to change
>>> the default wal_sync_method before worrying about this.
>>
>> Are we considering backporting that change?
>>
>> If so, this would be another argument in favor of changing the default.
>
> Well, no, actually it's the same (only) argument.  We'd never consider
> back-patching such a change if our hand weren't being forced by kernel
> changes :-(
>
> As things stand, though, I think the only thing that's really open for
> discussion is how wide to make the scope of the default-change: should
> we just do it across the board, or try to limit it to some subset of the
> platforms where open_datasync is currently the default.  And that's a
> decision that ought to be informed by some performance testing.

If we could get a clear idea of what performance testing needs to be
done, I suspect we could find some people willing to do it.  What do
you think would be useful?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-01 Thread Andrew Dunstan



On 11/30/2010 11:17 PM, Tom Lane wrote:
> Andrew Dunstan <and...@dunslane.net> writes:
>> On 11/30/2010 10:09 PM, Tom Lane wrote:
>>> We should wait for the outcome of the discussion about whether to change
>>> the default wal_sync_method before worrying about this.
>> we've just had a significant PGX customer encounter this with the latest
>> Postgres on Redhat's freshly released flagship product. Presumably the
>> default wal_sync_method will only change prospectively.
> I don't think so.  The fact that Linux is changing underneath us is a
> compelling reason for back-patching a change here.  Our older branches
> still have to be able to run on modern OS versions.  I'm also fairly
> unclear on what you think a fix would look like if it's not effectively
> a change in the default.
>
> (Hint: this *will* be changing, one way or another, in Red Hat's version
> of 8.4, since that's what RH is shipping in RHEL6.)




Well, my initial idea was that if PG_O_DIRECT is non-zero, we should 
test at startup time if we can use it on the WAL file system and inhibit 
its use if not.
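
A minimal sketch of such a startup-time probe follows.  It is not
PostgreSQL code: probe_o_direct() and the PROBE_O_DIRECT macro are
hypothetical names (the macro stands in for PG_O_DIRECT), and the probe
only tests the open() call, on the assumption that an unsupported
O_DIRECT shows up as EINVAL ("Invalid argument") the way it does in the
PANIC reported in this thread.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc only exposes O_DIRECT with this */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for PG_O_DIRECT: 0 when the platform lacks the flag */
#ifdef O_DIRECT
#define PROBE_O_DIRECT O_DIRECT
#else
#define PROBE_O_DIRECT 0
#endif

/*
 * Try to create a scratch file with O_DIRECT inside "dir".
 *
 * Returns 1 if the open succeeded, 0 if O_DIRECT is unavailable there
 * (no such flag at compile time, or open() failed with EINVAL), and -1
 * on any other error.
 */
int
probe_o_direct(const char *dir)
{
    char        path[1024];
    int         fd;

    assert(dir != NULL);

    if (PROBE_O_DIRECT == 0)
        return 0;

    snprintf(path, sizeof(path), "%s/o_direct_probe.%ld",
             dir, (long) getpid());

    fd = open(path, O_RDWR | O_CREAT | O_EXCL | PROBE_O_DIRECT, 0600);
    if (fd < 0)
        return (errno == EINVAL) ? 0 : -1;

    close(fd);
    unlink(path);
    return 1;
}
```

A caller could run probe_o_direct() against the WAL directory at startup
and leave O_DIRECT out of its open flags whenever the result is 0.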


Incidentally, I notice it's not used at all in test_fsync.c - should it 
not be?


cheers

andrew





Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-01 Thread Josh Berkus
Tom,

 Well, no, actually it's the same (only) argument.  We'd never consider
 back-patching such a change if our hand weren't being forced by kernel
 changes :-(

I think we have to back-patch the change.  The way it is now, a DBA who
thinks they are doing normal sensible configuration can cause PostgreSQL
to fail to restart.  Imagine this scenario, for example:

1) DBA, using PostgreSQL 8.3, gets worried about possible disk issues
2) DBA changes their single Ext3/4 partition to data=journal
3) DBA restarts system
4) PostgreSQL won't start
5) DBA thrashes around for a few hours while the site is down
6) DBA gets fired and the new DBA migrates to some other DBMS.

I simply can't think of *anywhere* we could put the information about
opensync and Linux/Ext which would be prominent enough to avoid the
above scenario.  And per replies, a lot of people have hit this issue
already.

It's a bug and it's our bug.  Back when we added O_DIRECT, we assumed
that support for O_DIRECT/opensync could be determined on an OS/kernel
basis, because that was the information we had.   Now it turns out that
support can vary *by filesystem* and *between remounts*.  We didn't have
any way of knowing different back in 2004, but that doesn't mean we
don't need to fix our mistaken assumption now.

Ideally, we would change our code to test support for O_DIRECT on
startup, rather than at compile time, and backport *that*.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-01 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 It's a bug and it's our bug.

No, it's a filesystem bug that this particular filesystem doesn't
support a perfectly reasonable combination of options, and doesn't
even fail gracefully as it could easily do.  But assigning blame
doesn't help much.

 Back when we added O_DIRECT, we assumed
 that support for O_DIRECT/opensync could be determined on an OS/kernel
 basis, because that was the information we had.   Now it turns out that
 support can vary *by filesystem* and *between remounts*.  We didn't have
 any way of knowing different back in 2004, but that doesn't mean we
 don't need to fix our mistaken assumption now.

 Ideally, we would change our code to test support for O_DIRECT on
 startup, rather than at compile time, and backport *that*.

I'm not convinced that a startup-time test would be enough either,
since as you note a remount might be enough to change the situation.
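
To illustrate the remount concern: a check that survives remounts would
have to happen at each open() rather than once at startup.  The sketch
below shows that idea only; it was not proposed in this thread, and
open_maybe_direct() and FALLBACK_O_DIRECT are hypothetical names.  It
assumes an unsupported O_DIRECT is reported as EINVAL.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc only exposes O_DIRECT with this */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical macro: 0 when the platform has no O_DIRECT at all */
#ifdef O_DIRECT
#define FALLBACK_O_DIRECT O_DIRECT
#else
#define FALLBACK_O_DIRECT 0
#endif

/*
 * Open "path" with O_DIRECT added to "flags"; if that attempt fails with
 * EINVAL, retry once without O_DIRECT.
 *
 * *used_direct is set to 1 only when the returned descriptor really was
 * opened with O_DIRECT.  Returns the descriptor, or -1 with errno set.
 * Callers should avoid O_EXCL here, since the retry would then fail if
 * the first attempt left the file behind.
 */
int
open_maybe_direct(const char *path, int flags, mode_t mode, int *used_direct)
{
    int         fd;

    *used_direct = 0;

    if (FALLBACK_O_DIRECT != 0)
    {
        fd = open(path, flags | FALLBACK_O_DIRECT, mode);
        if (fd >= 0)
        {
            *used_direct = 1;
            return fd;
        }
        if (errno != EINVAL)
            return -1;          /* a real error, not a missing feature */
    }

    return open(path, flags, mode);
}
```

The cost is one extra failed open() per file on filesystems that reject
O_DIRECT, which is part of why simply not using it by default is the
simpler answer.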

I think the best answer is to get out of the business of using
O_DIRECT by default, especially seeing that available evidence
suggests it might not be a performance win anyway.

regards, tom lane



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-01 Thread Josh Berkus

 I think the best answer is to get out of the business of using
 O_DIRECT by default, especially seeing that available evidence
 suggests it might not be a performance win anyway.

Well, we don't have any performance evidence ... there's an issue with
the fsync-test script which causes it not to use O_DIRECT.

However, we haven't seen any evidence for benefits on any production
filesystem, either.  So given the lack of evidence of performance
benefit, combined with the definite evidence of related failures, I
agree that simply disabling O_DIRECT by default would be a good way to
solve this.

It might be nice to add new sync_method options, osync_odirect and
odatasync_odirect for DBAs who think they know enough to tune with
non-defaults.
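
As a rough sketch of what that split could mean in terms of open() flags
(sync_method_open_flags() and the SKETCH_* macros are hypothetical; only
the method names come from this thread and from the existing
wal_sync_method values):

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc only exposes O_DIRECT with this */
#endif
#include <fcntl.h>
#include <string.h>

/* Hypothetical macros; 0 means "no such flag on this platform" */
#ifdef O_DIRECT
#define SKETCH_O_DIRECT O_DIRECT
#else
#define SKETCH_O_DIRECT 0
#endif

/* Simplification for the sketch: fall back to O_SYNC without O_DSYNC */
#ifdef O_DSYNC
#define SKETCH_O_DSYNC O_DSYNC
#else
#define SKETCH_O_DSYNC O_SYNC
#endif

/*
 * Map a sync method name to the extra flags passed to open().
 * The plain open_* methods no longer imply O_DIRECT; only the two
 * proposed *_odirect names add it.  Returns -1 for an unknown name.
 */
int
sync_method_open_flags(const char *method)
{
    if (strcmp(method, "fsync") == 0 || strcmp(method, "fdatasync") == 0)
        return 0;               /* synced by an explicit call instead */
    if (strcmp(method, "open_sync") == 0)
        return O_SYNC;
    if (strcmp(method, "open_datasync") == 0)
        return SKETCH_O_DSYNC;
    if (strcmp(method, "osync_odirect") == 0)
        return O_SYNC | SKETCH_O_DIRECT;
    if (strcmp(method, "odatasync_odirect") == 0)
        return SKETCH_O_DSYNC | SKETCH_O_DIRECT;
    return -1;
}
```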

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-01 Thread Andres Freund
On Wednesday 01 December 2010 19:09:05 Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
  It's a bug and it's our bug.
 
 No, it's a filesystem bug that this particular filesystem doesn't
 support a perfectly reasonable combination of options, and doesn't
 even fail gracefully as it could easily do.  But assigning blame
 doesn't help much.

I wouldn't call it a reasonable combination - promising fs-level
data-journaling (data=journal) and O_DIRECT are not really compatible with
each other...

Andres




Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-01 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 It might be nice to add new sync_method options, osync_odirect and
 odatasync_odirect for DBAs who think they know enough to tune with
 non-defaults.

That would have the benefit that we'd not have to argue with people
who liked the current behavior (assuming there are any).  I'm not
sure there's much technical advantage, but from a political standpoint
it might be the easiest sort of change to push through.

However, this doesn't really address the question of what a sensible
choice of default is.  If there's little evidence about whether the
current flavor of open_datasync is really the fastest way, there's
none whatsoever that establishes open_datasync_without_o_direct
being a sane choice of default.

regards, tom lane



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-01 Thread Josh Berkus

 However, this doesn't really address the question of what a sensible
 choice of default is.  If there's little evidence about whether the
 current flavor of open_datasync is really the fastest way, there's
 none whatsoever that establishes open_datasync_without_o_direct
 being a sane choice of default.

No, I'd switch to fdatasync.  That's the performance that most people
are familiar with anyway, since it was all Linux supported before.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-01 Thread Andrew Dunstan



On 12/01/2010 01:41 PM, Andres Freund wrote:
 On Wednesday 01 December 2010 19:09:05 Tom Lane wrote:
  Josh Berkus j...@agliodbs.com writes:
   It's a bug and it's our bug.

  No, it's a filesystem bug that this particular filesystem doesn't
  support a perfectly reasonable combination of options, and doesn't
  even fail gracefully as it could easily do.  But assigning blame
  doesn't help much.

 I wouldn't call it a reasonable combination - promising fs-level
 data-journaling (data=journal) and O_DIRECT are not really compatible with
 each other...

OK, but how is an application supposed to know that data journaling is 
set? Postgres doesn't even look at the FS type, let alone the mount 
options. From the app's POV it's perfectly reasonable. If the OS is 
going to provide the API, it should expect people to use it.
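
To show how OS-specific such a check would be, here is a sketch of the
string matching an application would need just to spot data=journal in a
mount option list such as the "(rw,data=journal)" printed by mount.
mount_opts_have_data_journal() is a hypothetical helper, not anything in
Postgres, and finding the right mount table entry for a given path is a
separate problem it does not address.

```c
#include <string.h>

/*
 * Return 1 if the comma-separated mount option string "opts"
 * (for example "rw,data=journal") contains the exact option
 * "data=journal", else 0.
 */
int
mount_opts_have_data_journal(const char *opts)
{
    static const char want[] = "data=journal";
    const size_t wantlen = sizeof(want) - 1;
    const char *p = opts;

    while (*p != '\0')
    {
        /* length of the current option, up to the next comma or the end */
        size_t      len = strcspn(p, ",");

        if (len == wantlen && strncmp(p, want, wantlen) == 0)
            return 1;

        p += len;
        if (*p == ',')
            p++;                /* skip the separator */
    }
    return 0;
}
```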


cheers

andrew



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-01 Thread Greg Smith

Tom Lane wrote:
 I think the best answer is to get out of the business of using
 O_DIRECT by default, especially seeing that available evidence
 suggests it might not be a performance win anyway.

I was concerned that open_datasync might be doing a better job of 
forcing data out of drive write caches.  But the tests I've done on 
RHEL6 so far suggest that's not true; the write guarantees seem to be 
the same as when using fdatasync.  And there's certainly one performance 
regression possible going from fdatasync to open_datasync, the case 
where you're overflowing wal_buffers before you actually commit.
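
One way to read the wal_buffers case: with open_datasync every write()
has to wait for storage, including writes forced out early because the
buffers filled up, while with fdatasync those early writes can sit in
the OS cache until a single flush at commit.  The sketch below only
counts those waits for a run of block writes; write_blocks(),
SKETCH_SYNC_FLAG and SKETCH_BLCKSZ are hypothetical names, the 8192-byte
block size is an assumption, and it measures nothing.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Simplification for the sketch: use O_SYNC where O_DSYNC is missing */
#ifdef O_DSYNC
#define SKETCH_SYNC_FLAG O_DSYNC
#else
#define SKETCH_SYNC_FLAG O_SYNC
#endif

#define SKETCH_BLCKSZ 8192      /* assumed block size */

/*
 * Write "nblocks" zero-filled blocks to a scratch file at "path", then
 * remove it.  Returns how many calls had to wait for storage: one per
 * write() in the open_datasync style, or a single fdatasync() otherwise.
 * Returns -1 on error.
 */
int
write_blocks(const char *path, int nblocks, int use_open_datasync)
{
    char        buf[SKETCH_BLCKSZ];
    int         flags = O_WRONLY | O_CREAT | O_TRUNC;
    int         waits = 0;
    int         result;
    int         fd;
    int         i;

    memset(buf, 0, sizeof(buf));
    if (use_open_datasync)
        flags |= SKETCH_SYNC_FLAG;

    fd = open(path, flags, 0600);
    if (fd < 0)
        return -1;

    result = 0;
    for (i = 0; i < nblocks && result == 0; i++)
    {
        if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf))
            result = -1;
        else if (use_open_datasync)
            waits++;            /* this write itself waited for storage */
    }

    if (result == 0 && !use_open_datasync)
    {
        if (fdatasync(fd) != 0)
            result = -1;
        else
            waits++;            /* one flush covers all writes above */
    }

    close(fd);
    unlink(path);
    return (result == 0) ? waits : -1;
}
```

With four blocks written before the commit, the fdatasync style waits
once and the open_datasync style waits four times.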


Below is a test of the troublesome behavior on the same RHEL6 system I 
gave test_fsync performance test results from at 
http://archives.postgresql.org/message-id/4ce2ebf8.4040...@2ndquadrant.com


This confirms the problem: the kernel now defines O_DSYNC behavior as 
being available, but does not actually support it when running the 
filesystem in journaled mode.  That's clearly a kernel bug and no fault 
of PostgreSQL; it has just never been exposed in a default configuration 
before.  The Red Hat bugzilla report seems a bit unclear about what's 
going on here, so it may be worth updating that to note the underlying 
cause.


Regardless, I'm now leaning heavily toward the idea of avoiding 
open_datasync by default given this bug, and backpatching that change to 
at least 8.4.  I'll do some more database-level performance tests here 
just as a final sanity check on that.  My gut feel is now that we'll 
eventually be taking something like Marti's patch, adding some more 
documentation around it, and applying that to HEAD as well as some 
number of back branches.


$ mount | head -n 1
/dev/sda7 on / type ext4 (rw)
$ cat $PGDATA/postgresql.conf | grep wal_sync_method
#wal_sync_method = fdatasync        # the default is the first option
$ pg_ctl start
server starting
LOG:  database system was shut down at 2010-12-01 17:20:16 EST
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
$ psql -c "show wal_sync_method"
 wal_sync_method
-----------------
 open_datasync

[Edit /etc/fstab, change mount options to be data=journal and reboot]

$ mount | grep journal
/dev/sda7 on / type ext4 (rw,data=journal)
$ cat postgresql.conf | grep wal_sync_method
#wal_sync_method = fdatasync        # the default is the first option
$ pg_ctl start
server starting
LOG:  database system was shut down at 2010-12-01 12:14:50 EST
PANIC:  could not open file pg_xlog/00010001 (log file 0, segment 1): Invalid argument

LOG:  startup process (PID 2690) was terminated by signal 6: Aborted
LOG:  aborting startup due to startup process failure
$ pg_ctl stop

$ vi $PGDATA/postgresql.conf
$ cat $PGDATA/postgresql.conf | grep wal_sync_method
wal_sync_method = fdatasync        # the default is the first option
$ pg_ctl start
server starting
LOG:  database system was shut down at 2010-12-01 12:14:40 EST
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books




[HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-11-30 Thread Josh Berkus
Hackers,

Some of you might already be aware that this combination produces a
fatal startup crash in PostgreSQL:

1. Create an Ext3 or Ext4 partition and mount it with data=journal on a
server with Linux kernel 2.6.30 or later.
2. Initdb a PGDATA on that partition
3. Start PostgreSQL with the default config from that PGDATA

This was reported a ways back:
https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=567113

To explain: opening a file with O_DIRECT on an ext3 or ext4 partition
mounted with data=journal fails, and PostgreSQL crashes at startup as a
result.  However, recent Linux kernels now report support for O_DIRECT
when we compile PostgreSQL, so we use it by default.  This results in a
"crash by default" situation with new Linuxes if anyone sets data=journal.

We just encountered this again with another user.  With RHEL6 out now,
this seems likely to become a fairly common crash report.

Apparently, testing for O_DIRECT at compile time isn't adequate.  Ideas?

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-11-30 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 Apparently, testing for O_DIRECT at compile time isn't adequate.  Ideas?

We should wait for the outcome of the discussion about whether to change
the default wal_sync_method before worrying about this.

regards, tom lane



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-11-30 Thread Josh Berkus
On 11/30/10 7:09 PM, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
 Apparently, testing for O_DIRECT at compile time isn't adequate.  Ideas?
 
 We should wait for the outcome of the discussion about whether to change
 the default wal_sync_method before worrying about this.

Are we considering backporting that change?

If so, this would be another argument in favor of changing the default.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-11-30 Thread Andrew Dunstan



On 11/30/2010 10:09 PM, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
  Apparently, testing for O_DIRECT at compile time isn't adequate.  Ideas?

 We should wait for the outcome of the discussion about whether to change
 the default wal_sync_method before worrying about this.

Tom,

we've just had a significant PGX customer encounter this with the latest 
Postgres on Red Hat's freshly released flagship product. Presumably the 
default wal_sync_method will only change prospectively. But this will 
feel to every user out there who encounters it like a bug in our code, 
and it needs attention. It was darn difficult to diagnose, and many 
people will just give up in disgust if they encounter it.


cheers

andrew



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-11-30 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 11/30/2010 10:09 PM, Tom Lane wrote:
 We should wait for the outcome of the discussion about whether to change
 the default wal_sync_method before worrying about this.

 we've just had a significant PGX customer encounter this with the latest 
 Postgres on Red Hat's freshly released flagship product. Presumably the 
 default wal_sync_method will only change prospectively.

I don't think so.  The fact that Linux is changing underneath us is a
compelling reason for back-patching a change here.  Our older branches
still have to be able to run on modern OS versions.  I'm also fairly
unclear on what you think a fix would look like if it's not effectively
a change in the default.

(Hint: this *will* be changing, one way or another, in Red Hat's version
of 8.4, since that's what RH is shipping in RHEL6.)

regards, tom lane



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-11-30 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 On 11/30/10 7:09 PM, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
 Apparently, testing for O_DIRECT at compile time isn't adequate.  Ideas?
 
 We should wait for the outcome of the discussion about whether to change
 the default wal_sync_method before worrying about this.

 Are we considering backporting that change?

 If so, this would be another argument in favor of changing the default.

Well, no, actually it's the same (only) argument.  We'd never consider
back-patching such a change if our hand weren't being forced by kernel
changes :-(

As things stand, though, I think the only thing that's really open for
discussion is how wide to make the scope of the default-change: should
we just do it across the board, or try to limit it to some subset of the
platforms where open_datasync is currently the default.  And that's a
decision that ought to be informed by some performance testing.

regards, tom lane
