Re: [HACKERS] wal_buffers, redux

2012-03-14 Thread Jeff Janes
On Mon, Mar 12, 2012 at 7:16 AM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Mar 11, 2012 at 11:51 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sun, Mar 11, 2012 at 12:55 PM, Robert Haas robertmh...@gmail.com wrote:
 I've finally been able to run some more tests of the effect of
 adjusting wal_buffers to values higher than 16MB.  I ran the test on
 the 16 core (x 4 hw threads/core) IBM POWER7 machine, with my usual
 configuration settings:

 shared_buffers = 8GB
 maintenance_work_mem = 1GB
 synchronous_commit = off
 checkpoint_segments = 300
 checkpoint_timeout = 15min
 checkpoint_completion_target = 0.9
 wal_writer_delay = 20ms

 I ran three 30-minute tests at scale factor 300 with wal_buffers set
 at various values from 16MB up to 160MB, in multiples of 16MB, using
 pgbench with 32 clients and 32 threads in each case.  The short
 version is that 32MB seems to be significantly better than 16MB, by
 about 1000 tps, and after that it gets murky; full results are below.

 Currently the max of wal_buffers is 16MB (i.e., the size of one WAL file)
 when it's set to -1. Given your results, should we increase that max to
 32MB?

 I think that might be a good idea, although I'm not entirely convinced
 that we understand why increasing wal_buffers is helping as much as it
 is.  I stuck an elog() into AdvanceXLInsertBuffer() to complain in the
 case that we were writing buffers while holding the insert lock.
 Then, I reran 30-minute tests with 32 clients, one with wal_buffers=16MB
 and the other wal_buffers=32MB.  On the 16MB test, the elog() fired 15
 times in a single second shortly after the start of the test, and then
 9 more times over the rest of the test.  On the 32MB test, the elog()
 fired a total of 6 times over the course of the test.  The first test got
 14320 tps, while the second got 15026 tps.  I find that quite
 surprising, because although WAL buffer wraparound is certainly bad
 (on this test, it probably brings the system completely to a halt
 until fsync() finishes) it really shouldn't lock up the system for
 multiple seconds at a time.  And yet that's what it would need to be
 doing to account for the tps discrepancy on this test, considering how
 rarely it occurs.

Where did you put the elog?  If you put it near
TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_START(), it is probably too
late in the code.
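
For reference, the shape of that code path is roughly the following (a
paraphrased sketch, not verbatim xlog.c):

LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);  /* may stall behind an fsync */
LogwrtResult = xlogctl->LogwrtResult;
if (XLByteLE(OldPageRqstPtr, LogwrtResult.Write))
{
    /*
     * Someone else already wrote the pages we wanted to overwrite.
     * An elog() placed next to the tracepoint below never fires in
     * this case, even though we may have just spent seconds blocked
     * behind somebody's fsync.
     */
    LWLockRelease(WALWriteLock);
}
else
{
    /* Only now do we reach the write-dirty tracepoint. */
    TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_START();
    XLogWrite(WriteRqst, false, false);
    LWLockRelease(WALWriteLock);
    TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE();
}

An elog() beside the tracepoint counts only the writes we actually
performed, not the (likely far more common) stalls in LWLockAcquire().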

If someone else (like the background wal writer) is blocked on an
fsync, then AdvanceXLInsertBuffer will block on getting the
WALWriteLock.  Once it gets the lock, it will likely find that it no
longer needs to do the write (because it was already done by whoever
was holding the lock), but at that point the damage has
already been done.  The damage is not that it has to do a write, but
that it had to block (indirectly) on an fsync which it didn't really
care about.

One possibility is that the writer should update
xlogctl->LogwrtResult.Write between finishing the write and starting
the fsync.  That way, an AdvanceXLInsertBuffer that arrives during the
fsync could see that the data it needs to overwrite has already been
written, even if not yet fsynced, and that would be enough to allow it
to proceed.
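
Concretely, something like this in XLogWrite(), after the write()s for
the dirty range complete and before the fsync starts (a hand-wavy
sketch using xlog.c naming conventions, not a real patch):

/*
 * Sketch only: advertise the new Write position as soon as write()
 * completes, so that backends stuck in AdvanceXLInsertBuffer() on a
 * buffer wraparound can proceed without waiting for the fsync.
 */
{
    /* use volatile pointer to prevent code rearrangement */
    volatile XLogCtlData *xlogctl = XLogCtl;

    SpinLockAcquire(&xlogctl->info_lck);
    if (XLByteLT(xlogctl->LogwrtResult.Write, LogwrtResult.Write))
        xlogctl->LogwrtResult.Write = LogwrtResult.Write;
    SpinLockRelease(&xlogctl->info_lck);
}
/* ...and only then do the slow part: */
issue_xlog_fsync(openLogFile, openLogId, openLogSeg);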

That would be ugly, though, because an AdvanceXLInsertBuffer that
arrives during the write itself would check xlogctl and then block
until the fsync finished, with no way to wake up as soon as the write
part finished.  So making it work cleanly would require a redesign of
the whole locking scheme -- which the scaling patch is already doing.

 Studying AdvanceXLInsertBuffer() a bit more, I'm wondering if the
 problem isn't so much - or isn't only - that it's expensive to write
 buffers while also holding WALInsertLock.  Maybe it's too expensive
 even to acquire WALWriteLock in the first place - that is, the real
 problem isn't so much the wraparound condition itself, but the expense
 of testing whether a possible wraparound has actually occurred.  A
 quick test suggests that we acquire WALWriteLock here much more often
 than we actually write anything while holding it, and that we get a
 big burst of WALWriteLock acquisitions here immediately after a
 checkpoint.  I don't have any proof that this is what's causing the
 tps drop with smaller wal_buffers, but I think there has to be
 something other than an actual wraparound condition causing problems
 here, because that just doesn't seem to happen frequently enough to be
 an issue.

I think my analysis is pretty much a re-wording of yours, but I'd
emphasize that taking the WALWriteLock is bad not just because backends
fight over the lock, but because someone else (probably the background
wal writer) is camping out on the lock while doing an fsync.

Cheers,

Jeff



Re: [HACKERS] wal_buffers, redux

2012-03-14 Thread Robert Haas
On Tue, Mar 13, 2012 at 11:18 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Mar 13, 2012 at 6:44 PM, Josh Berkus j...@agliodbs.com wrote:
 That's a speedup of nearly a factor of two, so clearly fsync-related
 stalls are a big problem here, even with wal_buffers cranked up
 through the ceiling.

 Hmmmm.   Do you have any ability to test on XFS?

 It seems I do.

 XFS, with fsync = on:
 tps = 14746.687499 (including connections establishing)
 XFS, with fsync = off:
 tps = 25121.876560 (including connections establishing)

 No real dramatic difference there, maybe a bit slower.

 On further thought, it may be that this is just a simple case of too
 many checkpoints.  With fsync=off, we don't have to actually write all
 that dirty data back to disk.  I'm going to try cranking up
 checkpoint_segments and see what happens.

OK, this is bizarre.  I wiped out my XFS filesystem and put ext4 back,
and look at this:

tps = 19105.740878 (including connections establishing)
tps = 19687.674409 (including connections establishing)

That's a jump of nearly a third from before.  I'm not sure what's
different.  Nothing, AFAIK.  I drop and recreate the database after
every test run, so I don't see why this should be so much better,
unless ext4 degrades over time (even though the FS is nearly empty,
and I'm dropping the whole database after each test run).

Then I tried it with checkpoint_segments=3000 rather than 300.

tps = 26750.190469 (including connections establishing)

Hmm, what happens with checkpoint_segments=3000 and fsync=off?

tps = 30395.583366 (including connections establishing)

Hmm, and what if I set checkpoint_segments=300 and fsync=off?

tps = 26029.160919 (including connections establishing)

Not sure what to make of all this, yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] wal_buffers, redux

2012-03-14 Thread Robert Haas
On Wed, Mar 14, 2012 at 3:29 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 I think my analysis is pretty much a re-wording of yours, but I'd
 emphasize that taking the WALWriteLock is bad not just because backends
 fight over the lock, but because someone else (probably the background
 wal writer) is camping out on the lock while doing an fsync.

Yeah, good point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] wal_buffers, redux

2012-03-13 Thread Andres Freund
On Tuesday, March 13, 2012 03:26:34 AM Robert Haas wrote:
 Meanwhile, here are some TPS graphs at 16MB and 32MB on the IBM POWER7
 machine.  32 clients, 1800 seconds, scale factor 300, synchronous
 commit off.
That graph makes me cringe because it's pretty representative of what I have
seen in practice.  Any chance we can get a ~10s average in there?  I find it
pretty hard to read anything more out of it than "spikey".

What's the filesystem + options on this?

Andres



Re: [HACKERS] wal_buffers, redux

2012-03-13 Thread Robert Haas
On Mon, Mar 12, 2012 at 4:45 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 Rerunning all 4 benchmarks (both 16MB and 32MB wal_buffers on both
 machines) with fsync=off (as well as synchronous_commit=off still)
 might help clarify things.
 If it increases the TPS of Nate@16MB, but doesn't change the other 3
 situations much, then that suggests the IO system is driving it.
 Basically moving up to 32MB is partially inoculating against slow
 fsyncs upon log switch on that machine.

I had the idea of running pg_test_fsync on each machine.  Here, for
what it's worth, are the numbers.

Nate Boley's box:

2 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                 116.604 ops/sec
        fdatasync                     108.654 ops/sec
        fsync                          20.234 ops/sec
        fsync_writethrough                        n/a
        open_sync                      17.979 ops/sec

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                  39.833 ops/sec
        fdatasync                      58.072 ops/sec
        fsync                          19.756 ops/sec
        fsync_writethrough                        n/a
        open_sync                       8.425 ops/sec

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.)
         1 * 16kB open_sync write      17.408 ops/sec
         2 *  8kB open_sync writes      9.376 ops/sec
         4 *  4kB open_sync writes      4.912 ops/sec
         8 *  2kB open_sync writes      2.477 ops/sec
        16 *  1kB open_sync writes      1.244 ops/sec

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        write, fsync, close            19.818 ops/sec
        write, close, fsync            19.086 ops/sec

Non-Sync'ed 8kB writes:
        write                      101176.089 ops/sec

IBM POWER7 server:

2 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                 167.009 ops/sec
        fdatasync                     167.091 ops/sec
        fsync                          47.321 ops/sec
        fsync_writethrough                        n/a
        open_sync                      53.735 ops/sec

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                  68.891 ops/sec
        fdatasync                     164.597 ops/sec
        fsync                          50.334 ops/sec
        fsync_writethrough                        n/a
        open_sync                      25.240 ops/sec

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.)
         1 * 16kB open_sync write      53.305 ops/sec
         2 *  8kB open_sync writes     23.863 ops/sec
         4 *  4kB open_sync writes     12.500 ops/sec
         8 *  2kB open_sync writes      4.734 ops/sec
        16 *  1kB open_sync writes      3.063 ops/sec

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        write, fsync, close            48.730 ops/sec
        write, close, fsync            48.463 ops/sec

Non-Sync'ed 8kB writes:
        write                      186465.321 ops/sec

I can't rerun any more serious benchmarks on Nate Boley's box right
now due to other activity on the box.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] wal_buffers, redux

2012-03-13 Thread Robert Haas
On Mon, Mar 12, 2012 at 4:45 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 Rerunning all 4 benchmarks (both 16MB and 32MB wal_buffers on both
 machines) with fsync=off (as well as synchronous_commit=off still)
 might help clarify things.

I reran the 32-client benchmark on the IBM machine with fsync=off and got this:

32MB: tps = 26809.442903 (including connections establishing)
16MB: tps = 26651.320145 (including connections establishing)

That's a speedup of nearly a factor of two, so clearly fsync-related
stalls are a big problem here, even with wal_buffers cranked up
through the ceiling.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] wal_buffers, redux

2012-03-13 Thread Robert Haas
On Tue, Mar 13, 2012 at 3:48 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Mar 12, 2012 at 4:45 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 Rerunning all 4 benchmarks (both 16MB and 32MB wal_buffers on both
 machines) with fsync=off (as well as synchronous_commit=off still)
 might help clarify things.

 I reran the 32-client benchmark on the IBM machine with fsync=off and got 
 this:

 32MB: tps = 26809.442903 (including connections establishing)
 16MB: tps = 26651.320145 (including connections establishing)

 That's a speedup of nearly a factor of two, so clearly fsync-related
 stalls are a big problem here, even with wal_buffers cranked up
 through the ceiling.

And here's a tps plot with wal_buffers = 16MB, fsync = off.  The
performance still bounces up and down, so there's obviously some other
factor contributing to latency spikes, but equally clearly, needing to
wait for fsyncs makes it a lot worse.  I bet if we could understand
why that happens, we could improve things here a good deal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
attachment: tps-16MB.no-fsync.png


Re: [HACKERS] wal_buffers, redux

2012-03-13 Thread Robert Haas
On Tue, Mar 13, 2012 at 4:55 AM, Andres Freund and...@anarazel.de wrote:
 On Tuesday, March 13, 2012 03:26:34 AM Robert Haas wrote:
 Meanwhile, here are some TPS graphs at 16MB and 32MB on the IBM POWER7
 machine.  32 clients, 1800 seconds, scale factor 300, synchronous
 commit off.
 That graph makes me cringe because it's pretty representative of what I have
 seen in practice.  Any chance we can get a ~10s average in there?  I find it
 pretty hard to read anything more out of it than "spikey".

I tried that but don't find it an improvement; however I'm attaching
an example for your consideration.

 What's the filesystem + options on this?

ext4, default options.  Fedora 16, 3.2.6-3.fc16.ppc64.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
attachment: tps-16MB-10s.png


Re: [HACKERS] wal_buffers, redux

2012-03-13 Thread Josh Berkus

 That's a speedup of nearly a factor of two, so clearly fsync-related
 stalls are a big problem here, even with wal_buffers cranked up
 through the ceiling.

Hmmmm.   Do you have any ability to test on XFS?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] wal_buffers, redux

2012-03-13 Thread Fujii Masao
On Wed, Mar 14, 2012 at 7:20 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Mar 13, 2012 at 3:48 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Mar 12, 2012 at 4:45 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 Rerunning all 4 benchmarks (both 16MB and 32MB wal_buffers on both
 machines) with fsync=off (as well as synchronous_commit=off still)
 might help clarify things.

 I reran the 32-client benchmark on the IBM machine with fsync=off and got 
 this:

 32MB: tps = 26809.442903 (including connections establishing)
 16MB: tps = 26651.320145 (including connections establishing)

 That's a speedup of nearly a factor of two, so clearly fsync-related
 stalls are a big problem here, even with wal_buffers cranked up
 through the ceiling.

 And here's a tps plot with wal_buffers = 16MB, fsync = off.  The
 performance still bounces up and down, so there's obviously some other
 factor contributing to latency spikes

Initialization of WAL files? Do the latency spikes disappear if you start
the benchmark after preparing lots of recycled WAL files?
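
For context: when no recycled segment is available, XLogFileInit() has
to build a fresh one, and the expensive part is essentially this
(paraphrased from xlog.c):

/*
 * Zero-fill the entire 16MB segment before use, so that later WAL
 * writes never need to extend the file -- a burst of write()s plus
 * an fsync, all while some WAL inserter waits.
 */
memset(zbuffer, 0, XLOG_BLCKSZ);
for (nbytes = 0; nbytes < XLogSegSize; nbytes += XLOG_BLCKSZ)
{
    errno = 0;
    if ((int) write(fd, zbuffer, XLOG_BLCKSZ) != (int) XLOG_BLCKSZ)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not write to file \"%s\": %m", tmppath)));
}
if (pg_fsync(fd) != 0)
    ereport(ERROR,
            (errcode_for_file_access(),
             errmsg("could not fsync file \"%s\": %m", tmppath)));

With enough recycled segments on hand, this becomes a cheap rename
instead.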

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] wal_buffers, redux

2012-03-13 Thread Robert Haas
On Tue, Mar 13, 2012 at 10:02 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Mar 14, 2012 at 7:20 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Mar 13, 2012 at 3:48 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Mar 12, 2012 at 4:45 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 Rerunning all 4 benchmarks (both 16MB and 32MB wal_buffers on both
 machines) with fsync=off (as well as synchronous_commit=off still)
 might help clarify things.

 I reran the 32-client benchmark on the IBM machine with fsync=off and got 
 this:

 32MB: tps = 26809.442903 (including connections establishing)
 16MB: tps = 26651.320145 (including connections establishing)

 That's a speedup of nearly a factor of two, so clearly fsync-related
 stalls are a big problem here, even with wal_buffers cranked up
 through the ceiling.

 And here's a tps plot with wal_buffers = 16MB, fsync = off.  The
 performance still bounces up and down, so there's obviously some other
 factor contributing to latency spikes

 Initialization of WAL files? Do the latency spikes disappear if you start
 the benchmark after preparing lots of recycled WAL files?

The latency spikes seem to correspond to checkpoints, so I don't think
that's it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] wal_buffers, redux

2012-03-13 Thread Robert Haas
On Tue, Mar 13, 2012 at 6:44 PM, Josh Berkus j...@agliodbs.com wrote:
 That's a speedup of nearly a factor of two, so clearly fsync-related
 stalls are a big problem here, even with wal_buffers cranked up
 through the ceiling.

 Hmmmm.   Do you have any ability to test on XFS?

It seems I do.

XFS, with fsync = on:
tps = 14746.687499 (including connections establishing)
XFS, with fsync = off:
tps = 25121.876560 (including connections establishing)

No real dramatic difference there, maybe a bit slower.

On further thought, it may be that this is just a simple case of too
many checkpoints.  With fsync=off, we don't have to actually write all
that dirty data back to disk.  I'm going to try cranking up
checkpoint_segments and see what happens.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] wal_buffers, redux

2012-03-12 Thread Robert Haas
On Sun, Mar 11, 2012 at 11:51 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sun, Mar 11, 2012 at 12:55 PM, Robert Haas robertmh...@gmail.com wrote:
 I've finally been able to run some more tests of the effect of
 adjusting wal_buffers to values higher than 16MB.  I ran the test on
 the 16 core (x 4 hw threads/core) IBM POWER7 machine, with my usual
 configuration settings:

 shared_buffers = 8GB
 maintenance_work_mem = 1GB
 synchronous_commit = off
 checkpoint_segments = 300
 checkpoint_timeout = 15min
 checkpoint_completion_target = 0.9
 wal_writer_delay = 20ms

 I ran three 30-minute tests at scale factor 300 with wal_buffers set
 at various values from 16MB up to 160MB, in multiples of 16MB, using
 pgbench with 32 clients and 32 threads in each case.  The short
 version is that 32MB seems to be significantly better than 16MB, by
 about 1000 tps, and after that it gets murky; full results are below.

 Currently the max of wal_buffers is 16MB (i.e., the size of one WAL file)
 when it's set to -1. Given your results, should we increase that max to
 32MB?

I think that might be a good idea, although I'm not entirely convinced
that we understand why increasing wal_buffers is helping as much as it
is.  I stuck an elog() into AdvanceXLInsertBuffer() to complain in the
case that we were writing buffers while holding the insert lock.
Then, I reran 30-minute tests with 32 clients, one with wal_buffers=16MB
and the other wal_buffers=32MB.  On the 16MB test, the elog() fired 15
times in a single second shortly after the start of the test, and then
9 more times over the rest of the test.  On the 32MB test, the elog()
fired a total of 6 times over the course of the test.  The first test got
14320 tps, while the second got 15026 tps.  I find that quite
surprising, because although WAL buffer wraparound is certainly bad
(on this test, it probably brings the system completely to a halt
until fsync() finishes) it really shouldn't lock up the system for
multiple seconds at a time.  And yet that's what it would need to be
doing to account for the tps discrepancy on this test, considering how
rarely it occurs.

Studying AdvanceXLInsertBuffer() a bit more, I'm wondering if the
problem isn't so much - or isn't only - that it's expensive to write
buffers while also holding WALInsertLock.  Maybe it's too expensive
even to acquire WALWriteLock in the first place - that is, the real
problem isn't so much the wraparound condition itself, but the expense
of testing whether a possible wraparound has actually occurred.  A
quick test suggests that we acquire WALWriteLock here much more often
than we actually write anything while holding it, and that we get a
big burst of WALWriteLock acquisitions here immediately after a
checkpoint.  I don't have any proof that this is what's causing the
tps drop with smaller wal_buffers, but I think there has to be
something other than an actual wraparound condition causing problems
here, because that just doesn't seem to happen frequently enough to be
an issue.
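
A sketch of the kind of instrumentation that can show this -- counter
names invented for illustration, not the actual test:

static uint64 wrap_lock_acquires = 0;   /* hypothetical counters */
static uint64 wrap_actual_writes = 0;

LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
wrap_lock_acquires++;                   /* paid for the lock at all */
LogwrtResult = xlogctl->LogwrtResult;
if (XLByteLE(OldPageRqstPtr, LogwrtResult.Write))
    LWLockRelease(WALWriteLock);        /* someone already wrote it */
else
{
    wrap_actual_writes++;               /* genuinely had to write */
    XLogWrite(WriteRqst, false, false);
    LWLockRelease(WALWriteLock);
}

If wrap_lock_acquires vastly exceeds wrap_actual_writes, the cost is
the lock traffic itself rather than the writes.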

Anyway, maybe none of that matters at the moment.  Perhaps it's enough
to know that wal_buffers > 16MB can help, and just bump up the maximum
auto-tuned value a bit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] wal_buffers, redux

2012-03-12 Thread Jeff Janes
On Sat, Mar 10, 2012 at 7:55 PM, Robert Haas robertmh...@gmail.com wrote:
 I've finally been able to run some more tests of the effect of
 adjusting wal_buffers to values higher than 16MB.  I ran the test on
 the 16 core (x 4 hw threads/core) IBM POWER7 machine, with my usual
 configuration settings:

 shared_buffers = 8GB
 maintenance_work_mem = 1GB
 synchronous_commit = off
 checkpoint_segments = 300
 checkpoint_timeout = 15min
 checkpoint_completion_target = 0.9
 wal_writer_delay = 20ms

 I ran three 30-minute tests at scale factor 300 with wal_buffers set
 at various values from 16MB up to 160MB, in multiples of 16MB, using
 pgbench with 32 clients and 32 threads in each case.  The short
 version is that 32MB seems to be significantly better than 16MB, by
 about 1000 tps, and after that it gets murky; full results are below.

On Nate Boley's machine, the difference was ~100% increase rather than
~10%.  Do you think the difference is in the CPU architecture, or the
IO subsystem?

Also, do you have the latency numbers?

Cheers,

Jeff



Re: [HACKERS] wal_buffers, redux

2012-03-12 Thread Robert Haas
On Mon, Mar 12, 2012 at 12:32 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Nate Boley's machine, the difference was ~100% increase rather than
 ~10%.

Oh, right.  I had forgotten how dramatic the changes were in those
test runs.  I guess I should be happy that the absolute numbers on
this machine were as high as they were.  This machine seems to be
beating that one on every metric.

 Do you think the difference is in the CPU architecture, or the
 IO subsystem?

That is an excellent question.  I tried looking at vmstat output, but
a funny thing kept happening: periodically, the iowait column would
show a gigantic negative number instead of a number between 0 and 100.
This makes me a little chary of believing any of it.  Even if I did,
I'm not sure that would fully answer the question.  So I guess the
short answer is that I don't know, and I'm not even sure how I might
go about figuring it out.  Any ideas?

 Also, do you have the latency numbers?

Not at the moment, but I'll generate them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] wal_buffers, redux

2012-03-12 Thread Jeff Janes
On Mon, Mar 12, 2012 at 10:55 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Mar 12, 2012 at 12:32 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Nate Boley's machine, the difference was ~100% increase rather than
 ~10%.

 Oh, right.  I had forgotten how dramatic the changes were in those
 test runs.  I guess I should be happy that the absolute numbers on
 this machine were as high as they were.  This machine seems to be
 beating that one on every metric.

 Do you think the difference is in the CPU architecture, or the
 IO subsystem?

 That is an excellent question.  I tried looking at vmstat output, but
 a funny thing kept happening: periodically, the iowait column would
 show a gigantic negative number instead of a number between 0 and 100.

On which machine was that happening?

  This makes me a little chary of believing any of it.  Even if I did,
 I'm not sure that would fully answer the question.  So I guess the
 short answer is that I don't know, and I'm not even sure how I might
 go about figuring it out.  Any ideas?

Rerunning all 4 benchmarks (both 16MB and 32MB wal_buffers on both
machines) with fsync=off (as well as synchronous_commit=off still)
might help clarify things.
If it increases the TPS of Nate@16MB, but doesn't change the other 3
situations much, then that suggests the IO system is driving it.
Basically moving up to 32MB is partially inoculating against slow
fsyncs upon log switch on that machine.

Does the POWER7 have a nonvolatile cache?  What happened with
synchronous_commit=on?

Also, since all data fits in shared_buffers, making checkpoint_segments
and checkpoint_timeout large enough that no checkpoint occurs during
the benchmark period should remove the only other source of writing
from the system.  With no checkpoints, no evictions, and no fsyncs, it
is unlikely for the remaining IO to be the bottleneck.
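
For a 30-minute run, hypothetical settings along these lines should
keep the window checkpoint-free (the segment count actually needed
depends on the WAL generation rate):

checkpoint_segments = 3000      # more WAL headroom than 30 minutes can consume
checkpoint_timeout = 60min      # longer than the benchmark itself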

Cheers,

Jeff



Re: [HACKERS] wal_buffers, redux

2012-03-12 Thread Robert Haas
On Mon, Mar 12, 2012 at 4:45 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 Do you think the difference is in the CPU architecture, or the
 IO subsystem?

 That is an excellent question.  I tried looking at vmstat output, but
 a funny thing kept happening: periodically, the iowait column would
 show a gigantic negative number instead of a number between 0 and 100.

 On which machine was that happening?

The IBM server.

  This makes me a little chary of believing any of it.  Even if I did,
 I'm not sure that would fully answer the question.  So I guess the
 short answer is that I don't know, and I'm not even sure how I might
 go about figuring it out.  Any ideas?

 Rerunning all 4 benchmarks (both 16MB and 32MB wal_buffers on both
 machines) with fsync=off (as well as synchronous_commit=off still)
 might help clarify things.
 If it increases the TPS of Nate@16MB, but doesn't change the other 3
 situations much, then that suggests the IO system is driving it.
 Basically moving up to 32MB is partially inoculating against slow
 fsyncs upon log switch on that machine.

Mmm, yeah.  Although, I think it might have been 64MB rather than 32MB
that I tested on that machine.

 Does the POWER7 have a nonvolatile cache?  What happened with
 synchronous_commit=on?

Haven't tried that yet.

 Also, since all data fits in shared_buffers, making checkpoint_segments
 and checkpoint_timeout large enough that no checkpoint occurs during
 the benchmark period should remove the only other source of writing
 from the system.  With no checkpoints, no evictions, and no fsyncs, it
 is unlikely for the remaining IO to be the bottleneck.

Another thing to test.

Meanwhile, here are some TPS graphs at 16MB and 32MB on the IBM POWER7
machine.  32 clients, 1800 seconds, scale factor 300, synchronous
commit off.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
attachment: tps-16MB.png
attachment: tps-32MB.png


Re: [HACKERS] wal_buffers, redux

2012-03-11 Thread Fujii Masao
On Sun, Mar 11, 2012 at 12:55 PM, Robert Haas robertmh...@gmail.com wrote:
 I've finally been able to run some more tests of the effect of
 adjusting wal_buffers to values higher than 16MB.  I ran the test on
 the 16 core (x 4 hw threads/core) IBM POWER7 machine, with my usual
 configuration settings:

 shared_buffers = 8GB
 maintenance_work_mem = 1GB
 synchronous_commit = off
 checkpoint_segments = 300
 checkpoint_timeout = 15min
 checkpoint_completion_target = 0.9
 wal_writer_delay = 20ms

 I ran three 30-minute tests at scale factor 300 with wal_buffers set
 at various values from 16MB up to 160MB, in multiples of 16MB, using
 pgbench with 32 clients and 32 threads in each case.  The short
 version is that 32MB seems to be significantly better than 16MB, by
 about 1000 tps, and after that it gets murky; full results are below.

Currently the max of wal_buffers is 16MB (i.e., the size of one WAL file)
when it's set to -1. Given your results, should we increase that max to
32MB?
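
For reference, the -1 auto-tuning rule is roughly the following,
paraphrased from XLOGChooseNumBuffers() in xlog.c: 1/32nd of
shared_buffers, clamped between 64kB and one WAL segment.

xbuffers = NBuffers / 32;                    /* in 8kB pages */
if (xbuffers > XLOG_SEG_SIZE / XLOG_BLCKSZ)  /* cap: one 16MB segment */
    xbuffers = XLOG_SEG_SIZE / XLOG_BLCKSZ;
if (xbuffers < 8)                            /* floor: 64kB */
    xbuffers = 8;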

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



[HACKERS] wal_buffers, redux

2012-03-10 Thread Robert Haas
I've finally been able to run some more tests of the effect of
adjusting wal_buffers to values higher than 16MB.  I ran the test on
the 16 core (x 4 hw threads/core) IBM POWER7 machine, with my usual
configuration settings:

shared_buffers = 8GB
maintenance_work_mem = 1GB
synchronous_commit = off
checkpoint_segments = 300
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
wal_writer_delay = 20ms

I ran three 30-minute tests at scale factor 300 with wal_buffers set
at various values from 16MB up to 160MB, in multiples of 16MB, using
pgbench with 32 clients and 32 threads in each case.  The short
version is that 32MB seems to be significantly better than 16MB, by
about 1000 tps, and after that it gets murky; full results are below.

16MB tps = 13069.420658 (including connections establishing)
16MB tps = 14370.293228 (including connections establishing)
16MB tps = 13787.219680 (including connections establishing)
32MB tps = 14916.068968 (including connections establishing)
32MB tps = 14746.448728 (including connections establishing)
32MB tps = 15110.676467 (including connections establishing)
48MB tps = 15111.981870 (including connections establishing)
48MB tps = 12824.628192 (including connections establishing)
48MB tps = 15090.280081 (including connections establishing)
64MB tps = 15382.740815 (including connections establishing)
64MB tps = 12367.942312 (including connections establishing)
64MB tps = 15195.405382 (including connections establishing)
80MB tps = 15346.080326 (including connections establishing)
80MB tps = 12791.192216 (including connections establishing)
80MB tps = 14780.804054 (including connections establishing)
96MB tps = 15476.229392 (including connections establishing)
96MB tps = 15426.012162 (including connections establishing)
96MB tps = 15548.671849 (including connections establishing)
112MB tps = 15400.669675 (including connections establishing)
112MB tps = 15676.416378 (including connections establishing)
112MB tps = 15016.651083 (including connections establishing)
128MB tps = 15811.463522 (including connections establishing)
128MB tps = 15590.343669 (including connections establishing)
128MB tps = 15256.867665 (including connections establishing)
144MB tps = 15842.131696 (including connections establishing)
144MB tps = 15669.880640 (including connections establishing)
144MB tps = 15753.460908 (including connections establishing)
160MB tps = 15658.726637 (including connections establishing)
160MB tps = 15599.600752 (including connections establishing)
160MB tps = 15311.198480 (including connections establishing)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
