Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2011-03-10 Thread Bruce Momjian
Josh Berkus wrote: On 12/6/10 6:10 PM, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Dec 6, 2010 at 9:04 PM, Josh Berkus j...@agliodbs.com wrote: Actually, on OSX 10.5.8, o_dsync and fdatasync aren't even available. From my run, it looks like even so regular fsync

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-08 Thread Marti Raudsepp
On Tue, Dec 7, 2010 at 03:34, Tom Lane t...@sss.pgh.pa.us wrote: To my mind, O_DIRECT is not really the key issue here, it's whether to prefer O_DSYNC or fdatasync. Since different platforms implement these primitives differently, and it's not always clear from the header file definitions which
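Tom's distinction between the two primitives can be shown with a minimal POSIX C sketch. It is not PostgreSQL's WAL code, and the helper names `write_with_odsync` and `write_then_fdatasync` are hypothetical.

- **O_DSYNC:** each write() returns only after the written data, plus any metadata needed to read it back, has been flushed.
- **fdatasync:** the write lands in the OS cache and is flushed explicitly afterwards.

Which one is faster, and whether either exists at all, depends on the platform. Josh reports neither is available on OS X 10.5.8.

```c
#define _POSIX_C_SOURCE 200809L   /* exposes O_DSYNC and fdatasync() */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: the descriptor is opened with O_DSYNC, so the
 * write() itself does not return until the data has been flushed. */
int write_with_odsync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_DSYNC, 0600);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, buf, len);
    close(fd);
    return n == (ssize_t) len ? 0 : -1;
}

/* Hypothetical helper: ordinary buffered write, then an explicit
 * fdatasync() to push the data to stable storage. */
int write_then_fdatasync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, buf, len);
    int rc = (n == (ssize_t) len) ? fdatasync(fd) : -1;
    close(fd);
    return rc;
}
```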

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-08 Thread Tom Lane
Marti Raudsepp ma...@juffo.org writes: On Tue, Dec 7, 2010 at 03:34, Tom Lane t...@sss.pgh.pa.us wrote: To my mind, O_DIRECT is not really the key issue here, it's whether to prefer O_DSYNC or fdatasync. Since different platforms implement these primitives differently, and it's not always

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-07 Thread Steve Singer
On 10-12-06 09:00 PM, Josh Berkus wrote: Steve, If you tell me which options to pgbench and which .conf file settings you'd like to see I can probably arrange to run some tests on AIX. Compile and run test_fsync in PGSRC/src/tools/fsync. Attached are runs against two different disk

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-06 Thread Tom Lane
Greg Smith g...@2ndquadrant.com writes: Regardless, I'm now leaning heavily toward the idea of avoiding open_datasync by default given this bug, and backpatching that change to at least 8.4. I'll do some more database-level performance tests here just as a final sanity check on that. My

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-06 Thread Greg Smith
Tom Lane wrote: The various testing that's been reported so far is all for Linux and thus doesn't directly address the question of whether other kernels will have similar performance properties. Survey of some popular platforms: Linux: don't want O_DIRECT by default for reliability reasons,

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-06 Thread Steve Singer
On 10-12-06 06:56 PM, Greg Smith wrote: Tom Lane wrote: The various testing that's been reported so far is all for Linux and thus doesn't directly address the question of whether other kernels will have similar performance properties. Survey of some popular platforms: snip So my guess is

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-06 Thread Tom Lane
Greg Smith g...@2ndquadrant.com writes: So my guess is that some small percentage of Windows users might notice a change here, and some testing on FreeBSD would be useful too. That's about it for platforms that I think anybody needs to worry about. To my mind, O_DIRECT is not really the key

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-06 Thread Josh Berkus
Steve, If you tell me which options to pgbench and which .conf file settings you'd like to see I can probably arrange to run some tests on AIX. Compile and run test_fsync in PGSRC/src/tools/fsync. -- -- Josh Berkus
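The kind of measurement test_fsync makes can be approximated in a few lines. This is a hedged sketch, not test_fsync's source. `flushes_per_sec` is a hypothetical name, and the 8 kB block is an assumption chosen to match PostgreSQL's default WAL block size. The function rewrites one block and flushes it repeatedly, then reports flushes per second.

```c
#define _POSIX_C_SOURCE 200809L   /* pwrite, fdatasync, clock_gettime */
#include <fcntl.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical mini-benchmark: rewrite one 8 kB block at offset 0 and
 * flush it with fsync() or fdatasync(), `iterations` times.
 * Returns flushes per second, or -1.0 on error. */
double flushes_per_sec(const char *path, int use_fdatasync, int iterations)
{
    static char block[8192];              /* zero-filled test page */
    struct timespec t0, t1;

    if (iterations <= 0)
        return -1.0;
    int fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iterations; i++) {
        if (pwrite(fd, block, sizeof block, 0) != (ssize_t) sizeof block ||
            (use_fdatasync ? fdatasync(fd) : fsync(fd)) != 0) {
            close(fd);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);

    double secs = (double) (t1.tv_sec - t0.tv_sec) +
                  (double) (t1.tv_nsec - t0.tv_nsec) / 1e9;
    return secs > 0.0 ? iterations / secs : -1.0;
}
```

Point it at a file on the filesystem under test. On a single spinning disk, rates far above the drive's rotation speed often suggest the flush is stopping in a volatile write cache. The real tool compares more methods than these two.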

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-06 Thread Josh Berkus
Mac OS X: Like Solaris, there's a similar mechanism but it's not O_DIRECT; see http://stackoverflow.com/questions/2299402/how-does-one-do-raw-io-on-mac-os-x-ie-equivalent-to-linuxs-o-direct-flag for notes about the F_NOCACHE feature used. Same basic situation as Solaris; there's an API,
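The per-descriptor mechanism Josh describes for Mac OS X can be sketched as follows. This assumes the platform headers define F_NOCACHE, as Mac OS X's do. `open_uncached` is a hypothetical helper. On platforms without F_NOCACHE it simply opens the file and reports that no cache bypass was requested. Unlike O_DIRECT, the request is made with fcntl() after open(), so it cannot make the open itself fail.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open `path` for writing and, where F_NOCACHE
 * exists, ask the kernel not to cache this descriptor's data.
 * *uncached is set to 1 only if that request succeeded. */
int open_uncached(const char *path, int *uncached)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0600);
    *uncached = 0;
#ifdef F_NOCACHE
    /* Mac OS X: no O_DIRECT; caching is switched off per descriptor. */
    if (fd >= 0 && fcntl(fd, F_NOCACHE, 1) != -1)
        *uncached = 1;
#endif
    return fd;
}
```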

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-06 Thread Robert Haas
On Mon, Dec 6, 2010 at 9:04 PM, Josh Berkus j...@agliodbs.com wrote: Mac OS X:  Like Solaris, there's a similar mechanism but it's not O_DIRECT; see http://stackoverflow.com/questions/2299402/how-does-one-do-raw-io-on-mac-os-x-ie-equivalent-to-linuxs-o-direct-flag for notes about the

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-06 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Mon, Dec 6, 2010 at 9:04 PM, Josh Berkus j...@agliodbs.com wrote: Actually, on OSX 10.5.8, o_dsync and fdatasync aren't even available. From my run, it looks like even so regular fsync might be better than open_sync. But I think you need to use

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-06 Thread Josh Berkus
On 12/6/10 6:10 PM, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Dec 6, 2010 at 9:04 PM, Josh Berkus j...@agliodbs.com wrote: Actually, on OSX 10.5.8, o_dsync and fdatasync aren't even available. From my run, it looks like even so regular fsync might be better than

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-06 Thread Josh Berkus
All, Gierth's results from his FreeBSD 7.1 server using 8.4's test_fsync: Simple write timing: write 0.007081 Compare fsync times on write() and non-write() descriptor: If the times are similar, fsync() can sync data written on a different descriptor. write,

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-03 Thread Josh Berkus
All, So, I've been doing some reading about this issue, and I think regardless of what other changes we make we should never enable O_DIRECT automatically on Linux, and it was a mistake for us to do so in the first place. First, in the Linux docs for open(): In summary, O_DIRECT is a
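One reason the Linux open(2) page urges caution is alignment. O_DIRECT transfers generally require the user buffer, the transfer length and the file offset to be aligned, typically to the filesystem's logical block size, and the exact rule has varied across kernels and filesystems.

The sketch below shows what that means for a caller. `direct_write_zeros` is hypothetical. The 4096-byte alignment is an assumption chosen to cover common 512-byte and 4 kB block sizes.

```c
#define _GNU_SOURCE              /* glibc exposes O_DIRECT only with this */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define DIRECT_ALIGN 4096        /* assumption, see lead-in */

/* Hypothetical helper: write `len` zero bytes at offset 0 with O_DIRECT.
 * `len` must be a nonzero multiple of DIRECT_ALIGN.
 * Returns 0 on success, -1 with errno set on failure. */
int direct_write_zeros(const char *path, size_t len)
{
#ifdef O_DIRECT
    void *buf;
    if (len == 0 || len % DIRECT_ALIGN != 0) {
        errno = EINVAL;
        return -1;
    }
    /* The transfer bypasses the page cache, so the user buffer itself
     * must be aligned; plain malloc() does not guarantee that. */
    int rc = posix_memalign(&buf, DIRECT_ALIGN, len);
    if (rc != 0) {
        errno = rc;
        return -1;
    }
    memset(buf, 0, len);

    int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0600);
    if (fd < 0) {                /* e.g. EINVAL where O_DIRECT is refused */
        int e = errno;
        free(buf);
        errno = e;
        return -1;
    }
    ssize_t n = pwrite(fd, buf, len, 0);   /* offset 0 is aligned */
    int e = errno;
    close(fd);
    free(buf);
    errno = e;
    return n == (ssize_t) len ? 0 : -1;
#else
    (void) path;
    (void) len;
    errno = ENOTSUP;             /* headers do not define O_DIRECT */
    return -1;
#endif
}
```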

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-03 Thread Heikki Linnakangas
On 03.12.2010 21:55, Josh Berkus wrote: All, So, I've been doing some reading about this issue, and I think regardless of what other changes we make we should never enable O_DIRECT automatically on Linux, and it was a mistake for us to do so in the first place. First, in the Linux docs for

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-02 Thread Bruce Momjian
Andrew Dunstan wrote: On 11/30/2010 11:17 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 11/30/2010 10:09 PM, Tom Lane wrote: We should wait for the outcome of the discussion about whether to change the default wal_sync_method before worrying about this. we've

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-01 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes: As things stand, though, I think the only thing that's really open for discussion is how wide to make the scope of the default-change: should we just do it across the board, or try to limit it to some subset of the platforms where open_datasync is currently

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-01 Thread Marti Raudsepp
On Wed, Dec 1, 2010 at 12:35, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: PANIC:  could not open file pg_xlog/00010001 (log file 0, segment 1): Invalid argument +1 I got the same error when trying to get PostgreSQL working on tmpfs and gave up. Now I understand that you

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-01 Thread Robert Haas
On Wed, Dec 1, 2010 at 12:31 AM, Tom Lane t...@sss.pgh.pa.us wrote: Josh Berkus j...@agliodbs.com writes: On 11/30/10 7:09 PM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: Apparently, testing for O_DIRECT at compile time isn't adequate.  Ideas? We should wait for the outcome of the

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-01 Thread Andrew Dunstan
On 11/30/2010 11:17 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 11/30/2010 10:09 PM, Tom Lane wrote: We should wait for the outcome of the discussion about whether to change the default wal_sync_method before worrying about this. we've just had a significant PGX

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-01 Thread Josh Berkus
Tom, Well, no, actually it's the same (only) argument. We'd never consider back-patching such a change if our hand weren't being forced by kernel changes :-( I think we have to back-patch the change. The way it is now, a DBA who thinks they are doing normal sensible configuration can cause

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-01 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes: It's a bug and it's our bug. No, it's a filesystem bug that this particular filesystem doesn't support a perfectly reasonable combination of options, and doesn't even fail gracefully as it could easily do. But assigning blame doesn't help much. Back when

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-01 Thread Josh Berkus
I think the best answer is to get out of the business of using O_DIRECT by default, especially seeing that available evidence suggests it might not be a performance win anyway. Well, we don't have any performance evidence ... there's an issue with the fsync-test script which causes it not to

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-01 Thread Andres Freund
On Wednesday 01 December 2010 19:09:05 Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: It's a bug and it's our bug. No, it's a filesystem bug that this particular filesystem doesn't support a perfectly reasonable combination of options, and doesn't even fail gracefully as it could

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-01 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes: It might be nice to add new sync_method options, osync_odirect and odatasync_odirect for DBAs who think they know enough to tune with non-defaults. That would have the benefit that we'd not have to argue with people who liked the current behavior (assuming
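The proposal quoted here amounts to splitting each open_* method into a plain variant and an explicit O_DIRECT variant. A hypothetical lookup table makes the idea concrete.

- fsync, fdatasync, open_sync and open_datasync are real wal_sync_method values.
- osync_odirect and odatasync_odirect are only the names proposed in this message.
- The flag assignments are an illustration, not PostgreSQL's actual logic.

```c
#define _GNU_SOURCE              /* glibc exposes O_DIRECT only with this */
#include <fcntl.h>
#include <string.h>

#ifdef O_DIRECT
#define SKETCH_O_DIRECT O_DIRECT
#else
#define SKETCH_O_DIRECT 0        /* assumption: no O_DIRECT, request nothing */
#endif

struct sync_method {
    const char *name;            /* value a DBA would set */
    int open_flags;              /* extra flags used when opening WAL files */
};

/* Illustrative only: the two *_odirect entries are proposed names. */
static const struct sync_method sync_methods[] = {
    { "fsync",             0 },
    { "fdatasync",         0 },
    { "open_sync",         O_SYNC },
    { "open_datasync",     O_DSYNC },
    { "osync_odirect",     O_SYNC | SKETCH_O_DIRECT },
    { "odatasync_odirect", O_DSYNC | SKETCH_O_DIRECT },
};

/* Returns the extra open() flags for `name`, or -1 if it is unknown. */
int sync_method_flags(const char *name)
{
    for (size_t i = 0; i < sizeof sync_methods / sizeof sync_methods[0]; i++)
        if (strcmp(sync_methods[i].name, name) == 0)
            return sync_methods[i].open_flags;
    return -1;
}
```

With this split, the default methods never add O_DIRECT on their own. Only a DBA who explicitly chooses a *_odirect value would get it.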

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-01 Thread Josh Berkus
However, this doesn't really address the question of what a sensible choice of default is. If there's little evidence about whether the current flavor of open_datasync is really the fastest way, there's none whatsoever that establishes open_datasync_without_o_direct being a sane choice of

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-01 Thread Andrew Dunstan
On 12/01/2010 01:41 PM, Andres Freund wrote: On Wednesday 01 December 2010 19:09:05 Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: It's a bug and it's our bug. No, it's a filesystem bug that this particular filesystem doesn't support a perfectly reasonable combination of options, and

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-01 Thread Greg Smith
Tom Lane wrote: I think the best answer is to get out of the business of using O_DIRECT by default, especially seeing that available evidence suggests it might not be a performance win anyway. I was concerned that open_datasync might be doing a better job of forcing data out of drive write

[HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-11-30 Thread Josh Berkus
Hackers, Some of you might already be aware that this combination produces a fatal startup crash in PostgreSQL: 1. Create an Ext3 or Ext4 partition and mount it with data=journal on a server with Linux kernel 2.6.30 or later. 2. Initdb a PGDATA on that partition. 3. Start PostgreSQL with the

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-11-30 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes: Apparently, testing for O_DIRECT at compile time isn't adequate. Ideas? We should wait for the outcome of the discussion about whether to change the default wal_sync_method before worrying about this. regards, tom lane
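A compile-time test falls short because `#ifdef O_DIRECT` only shows that the headers define the flag. Whether the filesystem supports it is decided when open() runs. A filesystem that rejects the flag fails the open itself: the tmpfs report in this thread shows `Invalid argument`, i.e. EINVAL.

A minimal run-time fallback might look like this, assuming EINVAL is the error to treat as "not supported". `open_maybe_direct` is a hypothetical helper, not PostgreSQL code.

```c
#define _GNU_SOURCE              /* glibc exposes O_DIRECT only with this */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try O_DIRECT at run time and fall back to a
 * plain open() if the filesystem rejects it with EINVAL.
 * *used_direct reports which path succeeded. Returns fd or -1. */
int open_maybe_direct(const char *path, int flags, int *used_direct)
{
    *used_direct = 0;
#ifdef O_DIRECT
    int fd = open(path, flags | O_DIRECT, 0600);
    if (fd >= 0) {
        *used_direct = 1;
        return fd;
    }
    if (errno != EINVAL)
        return -1;               /* genuine error: do not mask it */
#endif
    return open(path, flags, 0600);
}
```

Retrying only on EINVAL keeps genuine errors such as ENOENT or EACCES visible instead of hiding them behind the fallback.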

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-11-30 Thread Josh Berkus
On 11/30/10 7:09 PM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: Apparently, testing for O_DIRECT at compile time isn't adequate. Ideas? We should wait for the outcome of the discussion about whether to change the default wal_sync_method before worrying about this. Are we

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-11-30 Thread Andrew Dunstan
On 11/30/2010 10:09 PM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: Apparently, testing for O_DIRECT at compile time isn't adequate. Ideas? We should wait for the outcome of the discussion about whether to change the default wal_sync_method before worrying about this.

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-11-30 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes: On 11/30/2010 10:09 PM, Tom Lane wrote: We should wait for the outcome of the discussion about whether to change the default wal_sync_method before worrying about this. we've just had a significant PGX customer encounter this with the latest

Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-11-30 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes: On 11/30/10 7:09 PM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: Apparently, testing for O_DIRECT at compile time isn't adequate. Ideas? We should wait for the outcome of the discussion about whether to change the default wal_sync_method