Re: [HACKERS] some longer, larger pgbench tests with various performance-related patches

2012-02-11 Thread Jeff Janes
On Mon, Feb 6, 2012 at 6:38 AM, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Feb 4, 2012 at 2:13 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 We really need to nail that down.  Could you post the scripts (on the
 wiki) you use for running the benchmark and making the graph?  I'd
 like to see how much work it would be for me to change it to detect
 checkpoints and do something like color the markers blue during
 checkpoints and red elsewhen.

 They're pretty crude - I've attached them here.

Thanks.  But given Greg's comments about pgbench-tools, I'll probably
work on learning that instead.


 Also, I'm not sure how bad that graph really is.  The overall
 throughput is more variable, and there are a few latency spikes, but
 they are few.  The dominant feature is simply that the long-term
 average is less than the initial burst.  Of course the goal is to have
 a high level of throughput with smooth latency under sustained
 conditions.  But to expect that long-sustained smooth level of
 throughput to be identical to the initial burst throughput sounds like
 more of a fantasy than a goal.

 That's probably true, but the drop-off is currently quite extreme.
 The fact that disabling full_page_writes causes throughput to increase
 by 4x is dismaying, at least to me.

I just meant that the latest graph, with full page writes off, is not that bad.

The ones with fpw on are definitely bad, more due to the latency
spikes than the throughput.



 If we want to accept the lowered
 throughput and work on what variability/spikes remain, I think
 a good approach would be to take the long-term TPS average, and dial
 the number of clients back until the initial burst TPS matches that
 long-term average.  Then see if the spikes still exist over the long
 term using that dialed-back number of clients.

 Hmm, I might be able to do that.

 I don't think the full-page writes are leading to WALInsert
 contention, for example, because that would probably lead to a smooth
 throughput decline, not those latency spikes in which entire
 seconds go by without transactions.

 Right.

 I doubt it is leading to general
 I/O congestion, as the I/O at that point should be pretty much
 sequential (the checkpoint has not yet reached the sync stage, and
 the WAL is sequential).  So I bet that it is caused by fsyncs
 occurring at xlog segment switches, and the locking that entails.

 That's definitely possible.

 If I
 recall correctly, we can have a segment which is completely written
 to the OS and in the process of being fsynced, and another segment
 which is partly in wal_buffers and partly written out to the OS
 cache, but we can't start reusing the wal_buffers that were already
 written to the OS for that segment (and therefore are theoretically
 available for reuse by the upcoming third segment) until the previous
 segment's fsync has completed.  So all WAL inserts freeze.  Or
 something like that.  This code has changed a bit since I last
 studied it.

 Yeah, I need to better-characterize where the pauses are coming from,
 but I'm reluctant to invest too much effort in it until Heikki's xlog
 scaling patch goes in, because I think that's going to change things
 enough that any work done now will mostly be wasted.

OK, you've scared me off from digging into the locking at WAL switch
for now.  So instead I've spent some time today trying to unbreak the
xlog scaling patch, and haven't had any luck.  Does anyone know if any
combination of that patch's history + git master history has been
tested and verified to produce a recoverable WAL stream?  It is a
shame that make check doesn't test for that, but I don't know how to
make it do so.
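
For what it's worth, the kind of test I have in mind is just a
hand-driven crash-and-replay smoke test; a rough sketch, with
illustrative paths and scale:

    pg_ctl start -w -D $PGDATA
    pgbench -i -s 30
    pgbench -c 8 -T 60 &                     # generate some WAL traffic
    sleep 30
    pg_ctl stop -D $PGDATA -m immediate      # like a crash: no clean shutdown
    pg_ctl start -w -D $PGDATA               # must come up via WAL replay
    psql -c "SELECT sum(abalance) FROM pgbench_accounts"   # quick consistency check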

Cheers,

Jeff



Re: [HACKERS] some longer, larger pgbench tests with various performance-related patches

2012-02-07 Thread Greg Smith

On 01/24/2012 03:53 PM, Robert Haas wrote:

There are two graphs for each branch.  The first is a scatter plot of
latency vs. transaction time.  I found that graph hard to understand,
though; I couldn't really tell what I was looking at.  So I made a
second set of graphs which plot the number of completed transactions
in each second of the test against time.


Note that you're now reinventing parts of pgbench-tools; the main two 
graphs it gives are the latency scatter plot and TPS per second.  The 
things you're likely to find interesting next are maximum latency, 90th 
percentile latency, and a delta for what changed in pg_stat_bgwriter 
during the test; those are the other things I track in that program.
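
The pg_stat_bgwriter delta, for example, is just a before/after
snapshot around the run; a stripped-down version of what the program
does looks like this (column list abbreviated, and ./runtest standing
in for whatever drives pgbench):

    psql -Atc "SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint,
               buffers_clean, buffers_backend FROM pg_stat_bgwriter" > before.txt
    ./runtest
    psql -Atc "SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint,
               buffers_clean, buffers_backend FROM pg_stat_bgwriter" > after.txt
    diff before.txt after.txt      # the delta is what's interesting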


I'm working toward publishing my own tests of the performance patches 
still considered useful by the end of the week.  Murphy's Law has been 
active on that project since it started, though--the server crashed the 
day I left on a week-long trip, and I've been sick ever since getting back.



First, some of
these transactions had really long latency.  Second, there are a
remarkable number of seconds throughout the test during which no
transactions at all manage to complete, sometimes several seconds in a
row.


These periods have in my tests always been associated with Linux turning 
aggressive about cleaning out its write cache, either due to an fsync 
request or simply crossing one of its thresholds for doing so.  My 
current record is an 80-second pause with no transactions completing.


One of the things I expect to add to pgbench-tools within the next week is 
tracking how much dirty memory is accumulating during each test.  Seeing 
that graph overlaid on top of the rest makes a lot of what's happening 
at any time more obvious.  Noting when the checkpoints happen is a bit 
less interesting, because once the first one happens, they happen almost 
continuously.
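
The collection side of that is trivial--something like this loop,
writing one timestamped sample per second of the kernel's dirty-page
counter:

    while sleep 1; do
      echo "$(date +%s) $(awk '/^Dirty:/ {print $2}' /proc/meminfo)"   # value in kB
    done >> dirty-memory.log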


You really need to track when the write and sync phases are happening 
for that to be useful.  This circles back to why I proposed 
exposing those timing bits in pg_stat_bgwriter.  pgbench-tools already 
grabs data from it, which avoids all the mess around log file parsing.  
If I could do that more often and extract checkpoint timing from that 
data, it would make labelling graphs like these much easier to do, even 
from the client that's running the benchmark.



Third, all of the tests initially start off
processing transactions very quickly, and get slammed down very hard,
probably because the very high rate of transaction processing early on
causes a checkpoint to occur around 200 s.


At the beginning of a write-heavy pgbench run, the rate is high until 
one of these two things happens:


1) A checkpoint begins
2) Linux's write cache threshold (typically 
/proc/sys/vm/dirty_background_ratio) worth of dirty memory accumulates.


Note that (1) on its own isn't necessarily the problem; it's sometimes 
the case that it just makes (2) happen much faster.


Basically, the first 30 to 150 seconds of any write-heavy test always 
have an inflated speed.  You're writing into the OS cache at maximum 
speed, and none of those writes are making it to physical disk--except 
perhaps for the WAL, which is all fast and sequential.
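
To see roughly where threshold (2) sits on a given box, you can compute
it directly (the kernel actually applies the ratio to dirtyable memory
rather than MemTotal, so treat this as an estimate):

    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    ratio=$(cat /proc/sys/vm/dirty_background_ratio)
    echo "$(( mem_kb * ratio / 100 / 1024 )) MB dirty before background writeback starts"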


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] some longer, larger pgbench tests with various performance-related patches

2012-02-06 Thread Robert Haas
On Sat, Feb 4, 2012 at 2:13 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 We really need to nail that down.  Could you post the scripts (on the
 wiki) you use for running the benchmark and making the graph?  I'd
 like to see how much work it would be for me to change it to detect
 checkpoints and do something like color the markers blue during
 checkpoints and red elsewhen.

They're pretty crude - I've attached them here.

 Also, I'm not sure how bad that graph really is.  The overall
 throughput is more variable, and there are a few latency spikes, but
 they are few.  The dominant feature is simply that the long-term
 average is less than the initial burst.  Of course the goal is to have
 a high level of throughput with smooth latency under sustained
 conditions.  But to expect that long-sustained smooth level of
 throughput to be identical to the initial burst throughput sounds like
 more of a fantasy than a goal.

That's probably true, but the drop-off is currently quite extreme.
The fact that disabling full_page_writes causes throughput to increase
by 4x is dismaying, at least to me.

 If we want to accept the lowered
 throughput and work on what variability/spikes remain, I think
 a good approach would be to take the long-term TPS average, and dial
 the number of clients back until the initial burst TPS matches that
 long-term average.  Then see if the spikes still exist over the long
 term using that dialed-back number of clients.

Hmm, I might be able to do that.

 I don't think the full-page writes are leading to WALInsert
 contention, for example, because that would probably lead to a smooth
 throughput decline, not those latency spikes in which entire
 seconds go by without transactions.

Right.

 I doubt it is leading to general
 I/O congestion, as the I/O at that point should be pretty much
 sequential (the checkpoint has not yet reached the sync stage, and
 the WAL is sequential).  So I bet that it is caused by fsyncs
 occurring at xlog segment switches, and the locking that entails.

That's definitely possible.

 If I
 recall correctly, we can have a segment which is completely written
 to the OS and in the process of being fsynced, and another segment
 which is partly in wal_buffers and partly written out to the OS
 cache, but we can't start reusing the wal_buffers that were already
 written to the OS for that segment (and therefore are theoretically
 available for reuse by the upcoming third segment) until the previous
 segment's fsync has completed.  So all WAL inserts freeze.  Or
 something like that.  This code has changed a bit since I last
 studied it.

Yeah, I need to better-characterize where the pauses are coming from,
but I'm reluctant to invest too much effort in it until Heikki's xlog
scaling patch goes in, because I think that's going to change things
enough that any work done now will mostly be wasted.

It might be worth trying a run with wal_buffers=32MB or something like
that, just to see whether that mitigates any of the locking pile-ups.
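
Something like this, say--a one-off run with only that one setting
changed from the earlier tests (the thread count is a guess; I'd keep
everything else identical):

    # in postgresql.conf, then restart:
    #   wal_buffers = 32MB
    pg_ctl restart -w -D $PGDATA
    pgbench -c 32 -j 32 -T 1800 -l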

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Attachment: runtestl (binary data)
Attachment: makeplot (binary data)



Re: [HACKERS] some longer, larger pgbench tests with various performance-related patches

2012-02-04 Thread Jeff Janes
On Wed, Feb 1, 2012 at 9:47 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jan 25, 2012 at 8:49 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Jan 24, 2012 at 4:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
 I think we should be working to commit XLogInsert and then Group
 Commit, then come back to the discussion.

 I definitely agree that those two have way more promise than anything
 else on the table.  However, now that I understand how badly we're
 getting screwed by checkpoints, I'm curious to do some more
 investigation of what's going on there.  I can't help thinking that in
 these particular cases the full page writes may be a bigger issue than
 the checkpoint itself.  If that turns out to be the case it's not
 likely to be something we're able to address for 9.2, but I'd like to
 at least characterize it.

 I did another run with full_page_writes=off:

 http://wiki.postgresql.org/wiki/File:Tps-master-nofpw.png

 This is run with the master branch as of commit
 4f42b546fd87a80be30c53a0f2c897acb826ad52, same as the previous tests,
 so that the results are comparable.

 The graph is pretty dramatically different from the earlier one:

 http://wiki.postgresql.org/wiki/File:Tps-master.png

 So are the TPS numbers:

 full page writes on: tps = 2200.848350 (including connections establishing)
 full page writes off: tps = 9494.757916 (including connections establishing)

 It seems pretty clear that, even with full_page_writes=off, the
 checkpoint is affecting things negatively: the first 700 seconds or so
 show much better and more consistent performance than the remaining
 portion of the test.  I'm not sure why that is, but I'm guessing there
 was a checkpoint around that time.

We really need to nail that down.  Could you post the scripts (on the
wiki) you use for running the benchmark and making the graph?  I'd
like to see how much work it would be for me to change it to detect
checkpoints and do something like color the markers blue during
checkpoints and red elsewhen.

Also, I'm not sure how bad that graph really is.  The overall
throughput is more variable, and there are a few latency spikes, but
they are few.  The dominant feature is simply that the long-term
average is less than the initial burst.  Of course the goal is to have
a high level of throughput with smooth latency under sustained
conditions.  But to expect that long-sustained smooth level of
throughput to be identical to the initial burst throughput sounds like
more of a fantasy than a goal.  If we want to accept the lowered
throughput and work on what variability/spikes remain, I think
a good approach would be to take the long-term TPS average, and dial
the number of clients back until the initial burst TPS matches that
long-term average.  Then see if the spikes still exist over the long
term using that dialed-back number of clients.
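
Mechanically that's just a sweep over client counts, comparing
short-run burst TPS against the long-run average (the numbers here
are invented):

    # suppose the 30-minute run averaged ~2200 tps at 32 clients
    for c in 24 16 12 8; do
      pgbench -c $c -j $c -T 120 | grep '^tps'
    done
    # pick the client count whose burst tps lands near 2200, then repeat
    # the 30-minute run at that count and see whether the spikes persist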



 But the effect is much worse with
 full_page_writes=on: the distinctive parabolic shape of those graphs
 is apparently caused by the gradually decreasing frequency of full
 page writes as the number of transactions processed since the last
 checkpoint grows.

The gradual dilution of not-yet-full-page-written buffers as the
checkpoint ages is probably the ultimate cause of that pattern, but I
suspect there are intermediate causes that we could tackle.

I don't think the full-page writes are leading to WALInsert
contention, for example, because that would probably lead to a smooth
throughput decline, not those latency spikes in which entire
seconds go by without transactions.  I doubt it is leading to general
I/O congestion, as the I/O at that point should be pretty much
sequential (the checkpoint has not yet reached the sync stage, and the
WAL is sequential).  So I bet that it is caused by fsyncs occurring at
xlog segment switches, and the locking that entails.  If I
recall correctly, we can have a segment which is completely written to
the OS and in the process of being fsynced, and another segment which
is partly in wal_buffers and partly written out to the OS cache, but
we can't start reusing the wal_buffers that were already written to
the OS for that segment (and therefore are theoretically available
for reuse by the upcoming third segment) until the previous segment's
fsync has completed.  So all WAL inserts freeze.  Or something like
that.  This code has changed a bit since I last studied it.

Cheers,

Jeff



Re: [HACKERS] some longer, larger pgbench tests with various performance-related patches

2012-02-03 Thread Kevin Grittner
Robert Haas wrote:
 
 A couple of things stand out at me from these graphs. First, some
 of these transactions had really long latency. Second, there are a
 remarkable number of seconds throughout the test during which no
 transactions at all manage to complete, sometimes several seconds
 in a row. I'm not sure why. Third, all of the tests initially start
 off processing transactions very quickly, and get slammed down very
 hard, probably because the very high rate of transaction processing
 early on causes a checkpoint to occur around 200 s.
 
The amazing performance at the very start of all of these tests
suggests that there is a write-back cache (presumably battery-backed)
which is absorbing writes until the cache becomes full, at which
point actual disk writes become a bottleneck.  The problems you
mention here, where no transactions complete, sound like the usual
problem that many people have complained about on the lists, where
the controller cache becomes so overwhelmed that activity seems to
cease while the controller catches up.  Greg and, to a lesser degree,
I have written about this for years.
 
On the nofpw graph, I wonder whether the lower write rate just takes
that much longer to fill the controller cache.  I don't think it's
out of the question that it could take 700 seconds instead of 200
seconds depending on whether full pages are being fsynced to WAL. 
This effect is precisely why I think that on such machines the DW
feature may be a huge help.  If one small file is being written to
and fsynced repeatedly, it stays fresh enough not to actually be
written to the disk (it will stay in OS or controller cache), and the
disks are freed up to write everything else, helping to keep the
controller cache from being overwhelmed.  (Whether patches to date
are effective at achieving this is a separate question -- I'm
convinced the concept is sound for certain important workloads.)
 
 I didn't actually log when the checkpoints were occurring,
 
It would be good to have that information if you can get it for
future tests.
 
-Kevin



Re: [HACKERS] some longer, larger pgbench tests with various performance-related patches

2012-02-01 Thread Robert Haas
On Wed, Jan 25, 2012 at 8:49 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Jan 24, 2012 at 4:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
 I think we should be working to commit XLogInsert and then Group
 Commit, then come back to the discussion.

 I definitely agree that those two have way more promise than anything
 else on the table.  However, now that I understand how badly we're
 getting screwed by checkpoints, I'm curious to do some more
 investigation of what's going on there.  I can't help thinking that in
 these particular cases the full page writes may be a bigger issue than
 the checkpoint itself.  If that turns out to be the case it's not
 likely to be something we're able to address for 9.2, but I'd like to
 at least characterize it.

I did another run with full_page_writes=off:

http://wiki.postgresql.org/wiki/File:Tps-master-nofpw.png

This is run with the master branch as of commit
4f42b546fd87a80be30c53a0f2c897acb826ad52, same as the previous tests,
so that the results are comparable.

The graph is pretty dramatically different from the earlier one:

http://wiki.postgresql.org/wiki/File:Tps-master.png

So are the TPS numbers:

full page writes on: tps = 2200.848350 (including connections establishing)
full page writes off: tps = 9494.757916 (including connections establishing)

It seems pretty clear that, even with full_page_writes=off, the
checkpoint is affecting things negatively: the first 700 seconds or so
show much better and more consistent performance than the remaining
portion of the test.  I'm not sure why that is, but I'm guessing there
was a checkpoint around that time.  But the effect is much worse with
full_page_writes=on: the distinctive parabolic shape of those graphs
is apparently caused by the gradually decreasing frequency of full
page writes as the number of transactions processed since the last
checkpoint grows.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] some longer, larger pgbench tests with various performance-related patches

2012-02-01 Thread Simon Riggs
On Wed, Feb 1, 2012 at 5:47 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jan 25, 2012 at 8:49 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Jan 24, 2012 at 4:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
 I think we should be working to commit XLogInsert and then Group
 Commit, then come back to the discussion.

 I definitely agree that those two have way more promise than anything
 else on the table.  However, now that I understand how badly we're
 getting screwed by checkpoints, I'm curious to do some more
 investigation of what's going on there.  I can't help thinking that in
 these particular cases the full page writes may be a bigger issue than
 the checkpoint itself.  If that turns out to be the case it's not
 likely to be something we're able to address for 9.2, but I'd like to
 at least characterize it.

 I did another run with full_page_writes=off:

 http://wiki.postgresql.org/wiki/File:Tps-master-nofpw.png

 This is run with the master branch as of commit
 4f42b546fd87a80be30c53a0f2c897acb826ad52, same as the previous tests,
 so that the results are comparable.

 The graph is pretty dramatically different from the earlier one:

 http://wiki.postgresql.org/wiki/File:Tps-master.png

 So are the TPS numbers:

 full page writes on: tps = 2200.848350 (including connections establishing)
 full page writes off: tps = 9494.757916 (including connections establishing)

 It seems pretty clear that, even with full_page_writes=off, the
 checkpoint is affecting things negatively: the first 700 seconds or so
 show much better and more consistent performance than the remaining
 portion of the test.  I'm not sure why that is, but I'm guessing there
 was a checkpoint around that time.  But the effect is much worse with
 full_page_writes=on: the distinctive parabolic shape of those graphs
 is apparently caused by the gradually decreasing frequency of full
 page writes as the number of transactions processed since the last
 checkpoint grows.

Sounds like time to test the checkpoint smoothing patch.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] some longer, larger pgbench tests with various performance-related patches

2012-01-25 Thread Robert Haas
On Tue, Jan 24, 2012 at 4:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
 I think we should be working to commit XLogInsert and then Group
 Commit, then come back to the discussion.

I definitely agree that those two have way more promise than anything
else on the table.  However, now that I understand how badly we're
getting screwed by checkpoints, I'm curious to do some more
investigation of what's going on there.  I can't help thinking that in
these particular cases the full page writes may be a bigger issue than
the checkpoint itself.  If that turns out to be the case it's not
likely to be something we're able to address for 9.2, but I'd like to
at least characterize it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] some longer, larger pgbench tests with various performance-related patches

2012-01-25 Thread Jeff Janes
On Tue, Jan 24, 2012 at 12:53 PM, Robert Haas robertmh...@gmail.com wrote:
 Early yesterday morning, I was able to use Nate Boley's test machine
 to do a single 30-minute pgbench run at scale factor 300 using a variety
 of trees built with various patches, and with the -l option added to
 track latency on a per-transaction basis.  All tests were done using
 32 clients and permanent tables.  The configuration was otherwise
 identical to that described here:

 http://archives.postgresql.org/message-id/CA+TgmoboYJurJEOB22Wp9RECMSEYGNyHDVFv5yisvERqFw=6...@mail.gmail.com

Previously we mostly used this machine for CPU benchmarking.  Have you
described the I/O subsystem anywhere?  That is becoming relevant for
these types of benchmarks.  For example, is WAL separated from data?


 By doing this, I hoped to get a better understanding of (1) the
 effects of a scale factor too large to fit in shared_buffers,

In my hands, the active part of the data at scale 300 fits very
comfortably into 8GB of shared_buffers.

Indeed, until you have run a very long time so that pgbench_history
gets large, everything fits in 8GB.

Cheers,

Jeff



Re: [HACKERS] some longer, larger pgbench tests with various performance-related patches

2012-01-25 Thread Jeff Janes
On Wed, Jan 25, 2012 at 9:09 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jan 25, 2012 at 12:00 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Tue, Jan 24, 2012 at 12:53 PM, Robert Haas robertmh...@gmail.com wrote:
 Early yesterday morning, I was able to use Nate Boley's test machine
 do a single 30-minute pgbench run at scale factor 300 using a variety
 of trees built with various patches, and with the -l option added to
 track latency on a per-transaction basis.  All tests were done using
 32 clients and permanent tables.  The configuration was otherwise
 identical to that described here:

 http://archives.postgresql.org/message-id/CA+TgmoboYJurJEOB22Wp9RECMSEYGNyHDVFv5yisvERqFw=6...@mail.gmail.com

 Previously we mostly used this machine for CPU benchmarking.  Have you
 described the I/O subsystem anywhere?  That is becoming relevant for
 these types of benchmarks.  For example, is WAL separated from data?

 I actually don't know much about the I/O subsystem, but, no, WAL is
 not separated from data.  I believe $PGDATA is on a SAN, but I don't
 know anything about its characteristics.

I think the checkpointing issues that Greg is exploring are important,
and I'm pretty sure that that is what is limiting your TPS in this
case.  However, I don't think we can make much progress on that front
using a machine whose IO subsystem is largely unknown, and not
tweakable.

So I think the best use of this machine would be to explore the purely
CPU-based scaling issues, like freelist, CLOG, and XLogInsert.

To do that, I'd set the scale factor small enough that the entire data
set can be cached dirty by the kernel, based on the kernel
parameters:

egrep . /proc/sys/vm/dirty_*

Then set shared_buffers to be less than what that scale factor needs,
so the freelist gets exercised.

And neutralize checkpoints by setting fsync=off, so they don't
generate physical I/O that we can't control for, given the current
constraints on the machine's setup.
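
Concretely, the setup I'd use looks something like this; the sizes
are guesses that depend on the machine's RAM and those dirty_*
settings:

    # postgresql.conf: shared_buffers = 1GB, fsync = off
    pgbench -i -s 100                  # ~1.5GB of data, bigger than shared_buffers
    pgbench -c 32 -j 32 -T 1800 -l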

Cheers,

Jeff



Re: [HACKERS] some longer, larger pgbench tests with various performance-related patches

2012-01-25 Thread Robert Haas
On Wed, Jan 25, 2012 at 12:00 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Tue, Jan 24, 2012 at 12:53 PM, Robert Haas robertmh...@gmail.com wrote:
 Early yesterday morning, I was able to use Nate Boley's test machine
 do a single 30-minute pgbench run at scale factor 300 using a variety
 of trees built with various patches, and with the -l option added to
 track latency on a per-transaction basis.  All tests were done using
 32 clients and permanent tables.  The configuration was otherwise
 identical to that described here:

 http://archives.postgresql.org/message-id/CA+TgmoboYJurJEOB22Wp9RECMSEYGNyHDVFv5yisvERqFw=6...@mail.gmail.com

 Previously we mostly used this machine for CPU benchmarking.  Have you
 described the I/O subsystem anywhere?  That is becoming relevant for
 these types of benchmarks.  For example, is WAL separated from data?

I actually don't know much about the I/O subsystem, but, no, WAL is
not separated from data.  I believe $PGDATA is on a SAN, but I don't
know anything about its characteristics.

 By doing this, I hoped to get a better understanding of (1) the
 effects of a scale factor too large to fit in shared_buffers,

 In my hands, the active part of the data at scale 300 fits very
 comfortably into 8GB of shared_buffers.

 Indeed, until you have run a very long time so that pgbench_history
 gets large, everything fits in 8GB.

Hmm, my mistake.  Maybe I need to crank up the scale factor some more.
This is why good benchmarking is hard.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] some longer, larger pgbench tests with various performance-related patches

2012-01-25 Thread Nathan Boley
 I actually don't know much about the I/O subsystem, but, no, WAL is
 not separated from data.  I believe $PGDATA is on a SAN, but I don't
 know anything about its characteristics.

The computer is here:
http://www.supermicro.nl/Aplus/system/2U/2042/AS-2042G-6RF.cfm

$PGDATA is on a 5-disk SATA software RAID 5.

Is there anything else that would be useful to know?  (Sorry, this
isn't a subject that I'm very familiar with.)

-Nathan



[HACKERS] some longer, larger pgbench tests with various performance-related patches

2012-01-24 Thread Robert Haas
Early yesterday morning, I was able to use Nate Boley's test machine
to do a single 30-minute pgbench run at scale factor 300 using a variety
of trees built with various patches, and with the -l option added to
track latency on a per-transaction basis.  All tests were done using
32 clients and permanent tables.  The configuration was otherwise
identical to that described here:

http://archives.postgresql.org/message-id/CA+TgmoboYJurJEOB22Wp9RECMSEYGNyHDVFv5yisvERqFw=6...@mail.gmail.com

By doing this, I hoped to get a better understanding of (1) the
effects of a scale factor too large to fit in shared_buffers, (2) what
happens on a longer test run, and (3) how response time varies
throughout the test.  First, here are the raw tps numbers:

background-clean-slru-v2: tps = 2027.282539 (including connections establishing)
buffreelistlock-reduction-v1: tps = 2625.155348 (including connections establishing)
buffreelistlock-reduction-v1-freelist-ok-v2: tps = 2468.638149 (including connections establishing)
freelist-ok-v2: tps = 2467.065010 (including connections establishing)
group-commit-2012-01-21: tps = 2205.128609 (including connections establishing)
master: tps = 2200.848350 (including connections establishing)
removebufmgrfreelist-v1: tps = 2679.453056 (including connections establishing)
xloginsert-scale-6: tps = 3675.312202 (including connections establishing)

Obviously these numbers are fairly noisy, especially since this is
just one run, so the increases and decreases might not be all that
meaningful.  Time permitting, I'll try to run some more tests to get
my hands around that situation a little better.
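
For reference, each branch's run boils down to an invocation along
these lines (reconstructed from the description above rather than
copied from my script, so take the exact flags as approximate):

    pgbench -i -s 300                  # initialize at scale factor 300
    pgbench -c 32 -j 32 -T 1800 -l     # 32 clients, 30 minutes, per-transaction log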

Graphs are here:

http://wiki.postgresql.org/wiki/Robert_Haas_9.2CF4_Benchmark_Results

There are two graphs for each branch.  The first is a scatter plot of
latency vs. transaction time.  I found that graph hard to understand,
though; I couldn't really tell what I was looking at.  So I made a
second set of graphs which plot the number of completed transactions
in each second of the test against time.  The results are also included
on the previous page, below the latency graphs, and I find them much
more informative.

A couple of things stand out at me from these graphs.  First, some of
these transactions had really long latency.  Second, there are a
remarkable number of seconds throughout the test during which no
transactions at all manage to complete, sometimes several seconds in a
row.  I'm not sure why.  Third, all of the tests initially start off
processing transactions very quickly, and get slammed down very hard,
probably because the very high rate of transaction processing early on
causes a checkpoint to occur around 200 s.  I didn't actually log when
the checkpoints were occurring, but it seems like a good guess.  It's
also interesting to wonder whether the checkpoint I/O itself causes
the drop-off, or the ensuing full page writes.  Fourth,
xloginsert-scale-6 helps quite a bit; in fact, it's the only patch
that actually changes the whole shape of the tps graph.  I'm
speculating here, but that may be because it blunts the impact of full
page writes by allowing backends to copy their full page images into
the write-ahead log in parallel.

One thing I also noticed while running the tests is that the system
was really not using much CPU time.  It was mostly idle.  That could
be because waiting for I/O leads to waiting for locks, or it could be
fundamental lock contention.  I don't know which.

A couple of obvious further tests suggest themselves: (1) rerun some
of the tests with full_page_writes=off, and (2) repeat this test with
the remaining performance-related patches.  It would be especially
interesting, I think, to see what effect the checkpoint-related
patches have on these graphs.  But I plan to drop
buffreelistlock-reduction-v1 and freelist-ok-v2 from future test runs
based on Simon's comments elsewhere.  I'm including the results here
just because these tests were already running when he made those
comments.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] some longer, larger pgbench tests with various performance-related patches

2012-01-24 Thread Simon Riggs
On Tue, Jan 24, 2012 at 8:53 PM, Robert Haas robertmh...@gmail.com wrote:

 do a single 30-minute pgbench run at scale factor 300 using a variety

Nice

A minor but necessary point: repeated testing of the group commit
patch when you have synchronous commit off is clearly pointless, so
publishing numbers for that without saying very clearly that's what's
happening doesn't really help.

 Graphs are here:

 http://wiki.postgresql.org/wiki/Robert_Haas_9.2CF4_Benchmark_Results

Nicer

 There are two graphs for each branch.  The first is a scatter plot of
 latency vs. transaction time.  I found that graph hard to understand,
 though; I couldn't really tell what I was looking at.  So I made a
 second set of graphs which plot the number of completed transactions
 in each second of the test against time.  The results are also included
 on the previous page, below the latency graphs, and I find them much
 more informative.

 A couple of things stand out at me from these graphs.  First, some of
 these transactions had really long latency.  Second, there are a
 remarkable number of seconds throughout the test during which no
 transactions at all manage to complete, sometimes several seconds in a
 row.  I'm not sure why.  Third, all of the tests initially start off
 processing transactions very quickly, and get slammed down very hard,
 probably because the very high rate of transaction processing early on
 causes a checkpoint to occur around 200 s.

Check

I'm happy that this exposes characteristics I've seen and have been
discussing for a while now.

It would be useful to calculate the slow-down contribution of the
longer txns: what % of total time is given over to the slowest 10% of
transactions?  Looking at that, I think you'll see why we should care
about sorting out what happens in the worst cases.
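
That number falls straight out of the per-transaction log; with the
stock pgbench -l format, where field 3 is the transaction time in
microseconds, roughly:

    sort -n -k3 pgbench_log.* |
      awk '{lat[NR] = $3; total += $3}
           END {cut = int(0.9 * NR); slow = 0
                for (i = cut + 1; i <= NR; i++) slow += lat[i]
                printf "slowest 10%% of txns: %.1f%% of total txn time\n",
                       100 * slow / total}'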

  I didn't actually log when
 the checkpoints were occurring, but it seems like a good guess.  It's
 also interesting to wonder whether the checkpoint I/O itself causes
 the drop-off, or the ensuing full page writes.  Fourth,
 xloginsert-scale-6 helps quite a bit; in fact, it's the only patch
 that actually changes the whole shape of the tps graph.  I'm
 speculating here, but that may be because it blunts the impact of full
 page writes by allowing backends to copy their full page images into
 the write-ahead log in parallel.

I think we should be working to commit XLogInsert and then Group
Commit, then come back to the discussion.

There's no way we're committing other patches but not those, so
everything needs to be viewed with those two in, i.e. commit and then
re-baseline.

So I'd say no more tests just yet, but then lots of testing next week++.

 But I plan to drop
 buffreelistlock-reduction-v1 and freelist-ok-v2 from future test runs
 based on Simon's comments elsewhere.  I'm including the results here
 just because these tests were already running when he made those
 comments.

Yep

One you aren't testing is clog_history, which is designed to work in
conjunction with background_clean_slru. That last one clearly needs a
little tuning though...

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] some longer, larger pgbench tests with various performance-related patches

2012-01-24 Thread Pavan Deolasee
On Wed, Jan 25, 2012 at 2:23 AM, Robert Haas robertmh...@gmail.com wrote:
 Early yesterday morning, I was able to use Nate Boley's test machine
 to do a single 30-minute pgbench run at scale factor 300 using a variety
 of trees built with various patches, and with the -l option added to
 track latency on a per-transaction basis.  All tests were done using
 32 clients and permanent tables.  The configuration was otherwise
 identical to that described here:

 http://archives.postgresql.org/message-id/CA+TgmoboYJurJEOB22Wp9RECMSEYGNyHDVFv5yisvERqFw=6...@mail.gmail.com

 By doing this, I hoped to get a better understanding of (1) the
 effects of a scale factor too large to fit in shared_buffers, (2) what
 happens on a longer test run, and (3) how response time varies
 throughout the test.  First, here are the raw tps numbers:

 background-clean-slru-v2: tps = 2027.282539 (including connections establishing)
 buffreelistlock-reduction-v1: tps = 2625.155348 (including connections establishing)
 buffreelistlock-reduction-v1-freelist-ok-v2: tps = 2468.638149 (including connections establishing)
 freelist-ok-v2: tps = 2467.065010 (including connections establishing)
 group-commit-2012-01-21: tps = 2205.128609 (including connections establishing)
 master: tps = 2200.848350 (including connections establishing)
 removebufmgrfreelist-v1: tps = 2679.453056 (including connections establishing)
 xloginsert-scale-6: tps = 3675.312202 (including connections establishing)

 Obviously these numbers are fairly noisy, especially since this is
 just one run, so the increases and decreases might not be all that
 meaningful.  Time permitting, I'll try to run some more tests to get
 my hands around that situation a little better.


This is nice. I am sure long-running tests will point out many more
issues. If we are doing these tests, it might be more effective to
run even longer runs, such that we get at least 3-4
checkpoints/vacuums/analyzes (and other such events which can impact
the final numbers either way) per test. Otherwise, one patch may stand
out just because it avoids, say, one checkpoint.

It would definitely help to log the checkpoint and
autovacuum/autoanalyze details and plot them on the graph to see if
the drop in performance has anything to do with these activities. It
might also be a good idea to collect statistics such as relation
sizes at the end of the run.
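
All of that is cheap to collect--something along these lines, with
'bench' standing in for the test database:

    # postgresql.conf for the test runs:
    #   log_checkpoints = on
    #   log_autovacuum_min_duration = 0    # log every autovacuum/autoanalyze
    psql -d bench -c "SELECT relname, pg_total_relation_size(oid)
                      FROM pg_class WHERE relname LIKE 'pgbench%'
                      ORDER BY 2 DESC"     # relation sizes at the end of the run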

Thanks,
Pavan
-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
