Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-26 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote: 
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 The checkpoint_segments seems dramatic enough to be real.  I wonder
 if the test is short enough that it never got around to re-using
 any of them, so it was doing extra writes for the initial creation
 during the test?
 
 That's exactly what I was about to suggest.  Are you starting each
 run from a fresh initdb?  If so, try running the load long enough
 that the number of WAL files stabilizes (should happen at 2x
 checkpoint_segments) and then start the test measurement.
 
default conf (xlogs not populated)
real    3m49.604s
real    3m47.225s
real    3m45.831s

default conf (xlogs populated)
real    3m45.603s
real    3m45.284s
real    3m45.906s

default conf + checkpoint_segments = 100 (xlogs not populated)
real    4m27.629s
real    4m24.496s
real    4m22.832s

default conf + checkpoint_segments = 100 (xlogs populated)
real    3m52.746s
real    3m52.619s
real    3m50.418s
 
I used ten times the number of rows, to get more meaningful results. 
To get the populated times, I just dropped the target table and
created it again; otherwise identical runs.  Clearly, pre-populating
the xlog files reduces run time, especially for a large number of xlog
files; however, I still got better performance with a smaller set of
xlog files.
 
Regarding the fact that even with the xlog files pre-populated, the
smaller set of xlog files is faster: I'm only guessing, but I suspect
the battery backed RAID controller is what's defeating conventional
wisdom here.  By writing to the same, relatively small, set of xlog
files repeatedly, some of the actual disk writes probably evaporate in
the BBU cache.  More frequent checkpoints from the smaller number of
xlog files might also have caused data to start streaming to the disk
a little sooner, minimizing write gluts later.
 
I've often seen similar benefits from the BBU cache, which cause some of
the frequently-given advice here to have no discernible effect or to be
counter-productive in our environment.  (I know that some doubted my
claim that my aggressive background writer settings didn't increase disk
writes, but I couldn't even measure a difference there in the writes from
OS cache to the controller cache, much less anything which indicated it
actually increased physical disk writes.)
 
By the way, the number of xlog files seemed to always go to two above
2x checkpoint_segments.
 
-Kevin



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-26 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Regarding the fact that even with the xlog files pre-populated, the
 smaller set of xlog files is faster: I'm only guessing, but I suspect
 the battery backed RAID controller is what's defeating conventional
 wisdom here.  By writing to the same, relatively small, set of xlog
 files repeatedly, some of the actual disk writes probably evaporate in
 the BBU cache.

Yeah, sounds plausible.  How big is your BBU cache?

regards, tom lane



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-26 Thread Andrew Dunstan
On Fri, June 26, 2009 4:13 pm, Kevin Grittner wrote:
 By the way, the number of xlog files seemed to always go to two above
 2x checkpoint_segments.


The docs say:

There will always be at least one WAL segment file, and will normally not
be more than (2 + checkpoint_completion_target) * checkpoint_segments + 1
files.
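
For example, with checkpoint_segments = 100 as in the test above and the
default checkpoint_completion_target of 0.5 (an assumption; Kevin's
setting wasn't posted), that bound works out to
(2 + 0.5) * 100 + 1 = 251 segments, so the "2x + 2" = 202 files he
observed is comfortably inside it.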


cheers

andrew




Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-26 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote: 
 How big is your BBU cache?
 
On this machine, I guess it is 512MB.  (Possibly 1GB, but I'm having
trouble finding the right incantation to check it at the moment, so
I'm going by what the hardware tech remembers.)
 
-Kevin



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-23 Thread Simon Riggs

On Mon, 2009-06-22 at 15:18 -0500, Kevin Grittner wrote:
 Tom Lane t...@sss.pgh.pa.us wrote: 
  
  
 default postgresql.conf (comments stripped)
 max_connections = 100
 shared_buffers = 32MB

This forces the ring size to be 4MB, since the ring is capped at
min(shared_buffers/8, requested size) = min(32MB/8, 16MB) = 4MB.

Please re-run tests with your config and ring size 4MB, which should
give the same answer
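
A standalone sketch of that cap, assuming 8K pages (the real check lives
in GetAccessStrategy() in freelist.c; this just redoes the arithmetic):

#include <stdio.h>

int main(void)
{
    int nbuffers  = 32 * 1024 / 8; /* shared_buffers = 32MB at 8K pages -> 4096 buffers */
    int requested = 16 * 1024 / 8; /* bulk-write ring request: 16MB -> 2048 buffers */
    int ring = requested < nbuffers / 8 ? requested : nbuffers / 8;

    /* prints: ring = 512 buffers (4MB) */
    printf("ring = %d buffers (%dMB)\n", ring, ring * 8 / 1024);
    return 0;
}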

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-22 Thread Alan Li
Hi Tom,

How much concern is there for the contention for use cases where the WAL
can't be bypassed?

Thanks, Alan

On Sun, Jun 21, 2009 at 10:00 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
  The following is copying 3M rows (each) into a separate table of the same
  database.

 Is this with WAL, or bypassing WAL?  Given what we've already seen,
 a lot of contention for WALInsertLock wouldn't surprise me much here.
 It should be possible to bypass that though.

regards, tom lane





Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-22 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
I was going to say that since we flush the WAL every 16MB anyway (at 
every XLOG file switch), you shouldn't see any benefit with larger ring 
buffers, since to fill 16MB of data you're not going to write more than 
16MB WAL.


I'm not convinced that WAL segment boundaries are particularly relevant
to this.  The unit of flushing is an 8K page, not a segment.


We fsync() the old WAL segment every time we switch to a new WAL 
segment. That's what I meant by flush.


If the walwriter is keeping up, it will fsync() the WAL more often, but 
16MB is the maximum distance between fsync()s.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-22 Thread Simon Riggs

On Mon, 2009-06-22 at 10:52 +0300, Heikki Linnakangas wrote:
 Tom Lane wrote:
  Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
  I was going to say that since we flush the WAL every 16MB anyway (at 
  every XLOG file switch), you shouldn't see any benefit with larger ring 
  buffers, since to fill 16MB of data you're not going to write more than 
  16MB WAL.
  
  I'm not convinced that WAL segment boundaries are particularly relevant
  to this.  The unit of flushing is an 8K page, not a segment.
 
 We fsync() the old WAL segment every time we switch to a new WAL 
 segment. That's what I meant by flush.
 
 If the walwriter is keeping up, it will fsync() the WAL more often, but 
 16MB is the maximum distance between fsync()s.

Yes, but the fsync is performed by the process that writes the WAL, not
necessarily by the process that inserts the WAL. In perfect balance, an
inserter-of-WAL could insert an infinite amount of WAL and never need to
fsync the WAL.  So the question is: are we in balance between WALWriter
and COPY?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-22 Thread Tom Lane
Alan Li a...@truviso.com writes:
 How much concern is there for the contention for use cases where the WAL
 can't be bypassed?

If you mean is something going to be done about it in 8.4, the
answer is no.  This is a pre-existing issue that there is no simple
fix for.

regards, tom lane



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-22 Thread Andrew Dunstan



Tom Lane wrote:

Alan Li a...@truviso.com writes:
  

How much concern is there for the contention for use cases where the WAL
can't be bypassed?



If you mean is something going to be done about it in 8.4, the
answer is no.  This is a pre-existing issue that there is no simple
fix for.


  


I thought he was asking if we intend to provide for WAL bypass on a 
table by table basis in future.


cheers

andrew



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-22 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Tom Lane wrote:
 Alan Li a...@truviso.com writes:
 How much concern is there for the contention for use cases where the WAL
 can't be bypassed?
 
 If you mean is something going to be done about it in 8.4, the
 answer is no.  This is a pre-existing issue that there is no simple
 fix for.

 I thought he was asking if we intend to provide for WAL bypass on a 
 table by table basis in future.

I thought he was asking for a solution to the problem of WALInsertLock
contention.  In any case, we have WAL bypass on a table by table basis
now, don't we?

regards, tom lane



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-22 Thread Andrew Dunstan



Tom Lane wrote:

I thought he was asking for a solution to the problem of WALInsertLock
contention.  In any case, we have WAL bypass on a table by table basis
now, don't we?


  


If we do, I'm ignorant of it ;-)  How do we say "Never WAL this table"?

cheers

andrew



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-22 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Tom Lane wrote:
 I thought he was asking for a solution to the problem of WALInsertLock
 contention.  In any case, we have WAL bypass on a table by table basis
 now, don't we?

 If we do I'm ignorant of it ;-) How do we say Never WAL this table?

Make it a temporary table.

The more useful case for data load is create or truncate it in the
same transaction, of course.
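
A minimal sketch of both patterns (table and file names hypothetical):

-- temp tables are never WAL-logged:
CREATE TEMPORARY TABLE load_tmp (id int, payload text);
COPY load_tmp FROM '/tmp/rows.csv' WITH CSV;

-- for a permanent table, create (or truncate) it in the same transaction
-- as the COPY; with archiving off, the heap is fsync'ed at commit instead
-- of being WAL-logged:
BEGIN;
TRUNCATE bulk_target;
COPY bulk_target FROM '/tmp/rows.csv' WITH CSV;
COMMIT;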

regards, tom lane



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-22 Thread Aidan Van Dyk
* Andrew Dunstan and...@dunslane.net [090622 10:47]:

 If we do, I'm ignorant of it ;-)  How do we say "Never WAL this table"?

CREATE TEMPORARY TABLE ...

a.


-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-22 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 The more useful case for data load is create or truncate it in the
 same transaction, of course.

Unfortunately, WAL bypass also requires not being in archive mode with
no way to turn that off w/o a server restart, aiui.

Thanks,

Stephen




Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-22 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Tom Lane (t...@sss.pgh.pa.us) wrote:
 The more useful case for data load is create or truncate it in the
 same transaction, of course.

 Unfortunately, WAL bypass also requires not being in archive mode with
 no way to turn that off w/o a server restart, aiui.

Well, if you're trying to archive then you certainly wouldn't want WAL
off, so I'm failing to see where this thread is going ...

regards, tom lane



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-22 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Stephen Frost sfr...@snowman.net writes:
  Unfortunately, WAL bypass also requires not being in archive mode with
  no way to turn that off w/o a server restart, aiui.
 
 Well, if you're trying to archive then you certainly wouldn't want WAL
 off, so I'm failing to see where this thread is going ...

I disagree.  I'd love to be able to say "please bypass WAL logging for
this bulk load, because I know that I'll pick up the data during my next
full dump and I can reload it from the original if I get disrupted before
then."  This is especially true when you're doing bulk loads of static or
reference data from another data source.

Thanks,

Stephen




Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-22 Thread Simon Riggs

On Mon, 2009-06-22 at 11:14 -0400, Tom Lane wrote:
 Stephen Frost sfr...@snowman.net writes:
  * Tom Lane (t...@sss.pgh.pa.us) wrote:
  The more useful case for data load is create or truncate it in the
  same transaction, of course.
 
  Unfortunately, WAL bypass also requires not being in archive mode with
  no way to turn that off w/o a server restart, aiui.
 
 Well, if you're trying to archive then you certainly wouldn't want WAL
 off, so I'm failing to see where this thread is going ...

I was thinking it might be beneficial to be able to defer writing WAL
until COPY is complete, so heap_sync would either fsync the whole heap
file or copy the whole file to WAL.

That would avoid writing WAL piecemeal because we could just backup the
whole block, plus we wouldn't write anything at all if the COPY failed.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-22 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 I was thinking it might be beneficial to be able to defer writing WAL
 until COPY is complete, so heap_sync would either fsync the whole heap
 file or copy the whole file to WAL.

What about indexes?

regards, tom lane



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-22 Thread Simon Riggs

On Mon, 2009-06-22 at 11:24 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  I was thinking it might be beneficial to be able to defer writing WAL
  until COPY is complete, so heap_sync would either fsync the whole heap
  file or copy the whole file to WAL.
 
 What about indexes?

I was thinking we could do exactly as stated for the cases that would be
WAL-bypass now but need to write WAL because XLogArchivingActive() is
true - i.e., improve the exact case we are measuring here.

Yes, it is more complex than that for loading to existing tables.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-22 Thread Andrew Dunstan



Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:
  

Tom Lane wrote:


I thought he was asking for a solution to the problem of WALInsertLock
contention.  In any case, we have WAL bypass on a table by table basis
now, don't we?
  


  

If we do, I'm ignorant of it ;-)  How do we say "Never WAL this table"?



Make it a temporary table.


  


That doesn't help if you need the data visible in multiple sessions. But 
we're digressing from the original topic. Sorry.


cheers

andrew



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-22 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Tom Lane wrote:
 I'm not convinced that WAL segment boundaries are particularly relevant
 to this.  The unit of flushing is an 8K page, not a segment.

 We fsync() the old WAL segment every time we switch to a new WAL 
 segment. That's what I meant by flush.

 If the walwriter is keeping up, it will fsync() the WAL more often, but 
 16MB is the maximum distance between fsync()s.

I'm still not convinced --- to my mind the issue is not whether fsyncs
happen but whether the COPY process has to wait for 'em, and I don't
think that segment boundaries directly affect that.  I'd still be
interested to see similar measurements done with different wal_buffers
settings.

However, in the interests of getting this resolved in time for 8.4.0,
I propose that we just settle on 16MB as the bulkwrite ring buffer size.
There doesn't seem to be any evidence that a larger size will make for
a significant improvement, and we shouldn't allow COPY to trash a bigger
fraction of the arena than it really has to.

regards, tom lane



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-22 Thread Dimitri Fontaine

On 22 Jun 2009, at 17:24, Tom Lane t...@sss.pgh.pa.us wrote:


Simon Riggs si...@2ndquadrant.com writes:

I was thinking it might be beneficial to be able to defer writing WAL
until COPY is complete, so heap_sync would either fsync the whole heap
file or copy the whole file to WAL.


What about indexes?


Skip this optimisation if there are any.
It's already common practice to create indexes only after the COPY has
succeeded, when possible, for better bulk-loading performance.


Then there's also the COPY + INSERT ... SELECT ... technique.

--
dim


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-22 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote: 
 
 I wonder though whether the wal_buffers setting interacts with the
 ring size.  Has everyone who's tested this used the same 16MB
 wal_buffers setting as in Alan's original scenario?
 
I had been using his postgresql.conf file, then added autovacuum =
off.  When I tried setting the ring size to 16MB, I accidentally
left off the step to copy the postgresql.conf file, and got better
performance.  I alternated between the postgresql.conf file from
earlier tests and the default file left there by the initdb, and got
this:
 
8.4rc1 with 16MB ring, default postgresql.conf
0m23.223s
0m23.489s
0m23.921s
 
8.4rc1 with 16MB ring, Alan's postgresql.conf
0m28.678s
0m26.171s
0m27.513s
 
default postgresql.conf (comments stripped)
max_connections = 100
shared_buffers = 32MB
datestyle = 'iso, mdy'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.english'
 
Alan's postgresql.conf (comments stripped)
shared_buffers = 256MB
wal_buffers = 16MB
checkpoint_segments = 100
autovacuum = off
 
I'm not going to claim I know why, but I thought I should report it.
 
Oh, and the 8.3.7 numbers and pre-patch numbers were averaging the
same under the day-time load as the replication sync mode.  So, with
the ring size at 16MB this load is faster under 8.4 than 8.3.
 
-Kevin



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-22 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Tom Lane t...@sss.pgh.pa.us wrote: 
 I wonder though whether the wal_buffers setting interacts with the
 ring size.  Has everyone who's tested this used the same 16MB
 wal_buffers setting as in Alan's original scenario?
 
 I had been using his postgresql.conf file, then added autovacuum =
 off.  When I tried setting the ring size to 16MB, I accidentally
 left off the step to copy the postgresql.conf file, and got better
 performance.

Huh, that's bizarre.  I can see that increasing shared_buffers should
make no difference in this test case (we're not using them all anyway).
But why should increasing wal_buffers make it slower?  I forget the
walwriter's control algorithm at the moment ... maybe it works harder
when wal buffers are full?

BTW, I committed the change to use 16MB; that will be in RC2.

regards, tom lane



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-22 Thread Greg Smith

On Mon, 22 Jun 2009, Kevin Grittner wrote:

When I tried setting the ring size to 16MB, I accidentally left off 
the step to copy the postgresql.conf file, and got better performance.


Do you happen to have a build with assertions turned on?  That is one 
common cause of performance going down via increased shared_buffers that 
people tend to run into during beta.


You can check via psql with

  show debug_assertions;

--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-22 Thread Kevin Grittner
Greg Smith gsm...@gregsmith.com wrote: 
 
 Do you happen to have a build with assertions turned on?
 
Nope.  I showed my ./configure options upthread, but can confirm with
pg_config:
 
BINDIR = /usr/local/pgsql-8.4rc1/bin
DOCDIR = /usr/local/pgsql-8.4rc1/share/doc
HTMLDIR = /usr/local/pgsql-8.4rc1/share/doc
INCLUDEDIR = /usr/local/pgsql-8.4rc1/include
PKGINCLUDEDIR = /usr/local/pgsql-8.4rc1/include
INCLUDEDIR-SERVER = /usr/local/pgsql-8.4rc1/include/server
LIBDIR = /usr/local/pgsql-8.4rc1/lib
PKGLIBDIR = /usr/local/pgsql-8.4rc1/lib
LOCALEDIR = /usr/local/pgsql-8.4rc1/share/locale
MANDIR = /usr/local/pgsql-8.4rc1/share/man
SHAREDIR = /usr/local/pgsql-8.4rc1/share
SYSCONFDIR = /usr/local/pgsql-8.4rc1/etc
PGXS = /usr/local/pgsql-8.4rc1/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/usr/local/pgsql-8.4rc1'
'--enable-integer-datetimes' '--enable-debug' '--disable-nls'
'--with-libxml'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2
CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
-fwrapv -g
CFLAGS_SL = -fpic
LDFLAGS = -Wl,--as-needed -Wl,-rpath,'/usr/local/pgsql-8.4rc1/lib'
LDFLAGS_SL =
LIBS = -lpgport -lxml2 -lz -lreadline -lcrypt -ldl -lm
VERSION = PostgreSQL 8.4rc1
 
 You can check via psql with
 
show debug_assertions;
 
OK, we'll do it your way.  :-)
 
kgri...@athena:~/postgresql-8.4rc1> psql postgres
psql (8.4rc1)
Type "help" for help.

postgres=# show debug_assertions;
 debug_assertions
--
 off
(1 row)
 
-Kevin



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-22 Thread Kevin Grittner
Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: 
 
 A 25-30% performance regression in our main bulk loading mechanism 
 should at least be explained before the release...
 
I think a performance regression of that magnitude merits holding up
a release to resolve.
 
Note that in a follow-up post showing the slow 8.4 copying on Linux,
the copy was 11.7% to 19.3% *faster* on 8.4 when the WAL writing was
suppressed:
 
http://archives.postgresql.org/pgsql-performance/2009-06/msg00219.php
 
Extracting from that post:
 
# I can reproduce that on Linux (CentOS 5.3/x86_64, Nehalem Xeon E5530)
 
# on 8.4 I get:
# 
# 3m59/4m01/3m56s runtime and a profile of
# 
# samples  %        symbol name
# 636302   19.6577  XLogInsert
# 415510   12.8366  CopyReadLine
 
# on 8.3.7 I get 2m58s,2m54s,2m55s
# 
# and a profile of:
# 
# samples  %        symbol name
# 460966   16.2924  XLogInsert
# 307386   10.8643  CopyReadLine
 
# If I do the same test utilizing WAL bypass the picture changes:
# 
# 8.3 runtimes:2m16,2min14s,2min22s
 
# 8.4 runtime: 2m1s,2m,1m59s
 
Is there a reason to believe that the XLogInsert part of this *only*
affects bulk loads?
 
-Kevin



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-22 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote: 
 
 Huh, that's bizarre.  I can see that increasing shared_buffers
 should make no difference in this test case (we're not using them
 all anyway). But why should increasing wal_buffers make it slower? 
 I forget the walwriter's control algorithm at the moment ... maybe
 it works harder when wal buffers are full?
 
I created a postgresql.conf file with the options from the default
file, and then tried that by itself again, and with each of three
other options:
 
none
0m24.540s
0m24.630s
0m23.778s

checkpoint_segments = 100
0m30.251s
0m29.474s
0m26.604s

wal_buffers = 16MB
0m24.487s
0m23.939s
0m23.557s

shared_buffers = 256MB
0m25.885s
0m25.654s
0m24.025s
 
So the big hit seems to come from boosting checkpoint_segments,
although boosting shared_buffers seems to cause a slight slowdown. 
Boosting wal_buffers seemed to help a little.  Both of these last two,
though, are within the noise, so low confidence on those without a lot
more tests.
 
The checkpoint_segments seems dramatic enough to be real.  I wonder if
the test is short enough that it never got around to re-using any of
them, so it was doing extra writes for the initial creation during the
test?
 
-Kevin



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-22 Thread Kevin Grittner
I wrote: 
 Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: 
  
 A 25-30% performance regression in our main bulk loading mechanism 
 should at least be explained before the release...
 
 I think a performance regression of that magnitude merits holding
 up a release to resolve.
 
Wow.  That sure took a long time to come through.  I posted that
days ago ...
 
-Kevin




Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-22 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 The checkpoint_segments seems dramatic enough to be real.  I wonder if
 the test is short enough that it never got around to re-using any of
 them, so it was doing extra writes for the initial creation during the
 test?

That's exactly what I was about to suggest.  Are you starting each run
from a fresh initdb?  If so, try running the load long enough that the
number of WAL files stabilizes (should happen at 2x checkpoint_segments)
and then start the test measurement.
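
A rough sketch of that procedure (database, table and file names
hypothetical):

for i in 1 2 3 4 5 6 7 8; do   # repeat until the segment count stops growing
    psql -d test -c "COPY t FROM '/tmp/rows.csv'"
    ls -1 "$PGDATA/pg_xlog" | grep -c '^[0-9A-F]'   # WAL segment file count
done
time psql -d test -c "COPY t FROM '/tmp/rows.csv'"  # the measured run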

regards, tom lane



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-22 Thread Alan Li
On Mon, Jun 22, 2009 at 7:16 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Alan Li a...@truviso.com writes:
  How much concern is there for the contention for use cases where the WAL
  can't be bypassed?

 If you mean is something going to be done about it in 8.4, the
 answer is no.  This is a pre-existing issue that there is no simple
 fix for.

regards, tom lane


 No no, I am certainly not implying anything for the 8.4 timeframe.

Moving forward, I imagine this being more of a problem for data warehouse
applications, where bulk inserts occur on existing fact tables.  In this
case, the WAL cannot be bypassed (unless the bulk insert occurs on a newly
created partition).  And since COPY is CPU-bound, it would perhaps be
advantageous to do parallel COPYs on the same table on multi-core systems,
which won't work with WAL bypassing either.

Thanks, Alan


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-21 Thread Greg Smith

On Sat, 20 Jun 2009, Simon Riggs wrote:


I would suggest that we check how much WAL has been written. There may
be a secondary effect or a different regression hidden in these results.


What's the easiest way to do that?  My first thought was to issue a 
checkpoint before the test (which is a good idea to make each test 
consistent anyway), save the output from pg_controldata, test, checkpoint, 
and look at the control data again.  This seems kind of clunky though, but 
still better than trolling through the OS statistics for the data.  Any 
clever ideas for a better way to measure bytes of WAL written during a 
particular chunk of code?  We may need some sort of checkpoint/sync after 
the test to get correct results, because I've noticed that the tests I run 
sometimes continue writing out buffers for a few seconds after the test 
time is finished.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-21 Thread Heikki Linnakangas

Tom Lane wrote:

Simon Riggs si...@2ndquadrant.com writes:

On Sat, 2009-06-20 at 13:15 +0200, Stefan Kaltenbrunner wrote:

8192 6m43.203s/6m48.293s
16384 6m24.980s/6m24.116s
32768 6m20.753s/6m22.083s
65536 6m22.913s/6m22.449s
1048576 6m23.765s/6m24.645s



The rest of the patch should have had a greater effect on tables with
thinner rows. Your results match my expectations, though I read from
them that we should use 16384, since that provides some gain, not just a
cancellation of the regression. 


+1 for using 16384 (ie, max ring buffer size 16MB).  Maybe even more.
It seems likely that other cases might have an even bigger issue than
is exhibited in the couple of test cases we have here, so we should
leave some margin for error.  Also, there's code in there to limit the
ring buffer to 1/8th of shared buffers, so we don't have to worry about
trashing the whole buffer arena in small configurations.  Any limitation
at all is still a step forward over previous releases as far as not
trashing the arena is concerned.


+1. You might get away with a smaller ring with narrow tables, where 
writing 16MB of data produces more than 16MB of WAL, but I don't think 
it can ever be the other way round. Leaving a little bit of room for 
error doesn't seem like a bad idea, though.


IIRC we experimented with an auto-tuning ring size when we worked on the 
original ring buffer patch. The idea is that you start with a small 
ring, and enlarge it in StrategyRejectBuffer. But that seems too risky 
for 8.4.


I wonder if using the small ring showed any benefit when the COPY is not 
WAL-logged? In that scenario block-on-WAL-flush behavior doesn't happen, 
so the small ring might have some L2 cache benefits.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-21 Thread Stefan Kaltenbrunner

Heikki Linnakangas wrote:

Tom Lane wrote:

Simon Riggs si...@2ndquadrant.com writes:

On Sat, 2009-06-20 at 13:15 +0200, Stefan Kaltenbrunner wrote:

8192 6m43.203s/6m48.293s
16384 6m24.980s/6m24.116s
32768 6m20.753s/6m22.083s
65536 6m22.913s/6m22.449s
1048576 6m23.765s/6m24.645s



The rest of the patch should have had a greater effect on tables with
thinner rows. Your results match my expectations, though I read from
them that we should use 16384, since that provides some gain, not just a
cancellation of the regression. 


+1 for using 16384 (ie, max ring buffer size 16MB).  Maybe even more.
It seems likely that other cases might have an even bigger issue than
is exhibited in the couple of test cases we have here, so we should
leave some margin for error.  Also, there's code in there to limit the
ring buffer to 1/8th of shared buffers, so we don't have to worry about
trashing the whole buffer arena in small configurations.  Any limitation
at all is still a step forward over previous releases as far as not
trashing the arena is concerned.


+1. You might get away with a smaller ring with narrow tables, where 
writing 16MB of data produces more than 16MB of WAL, but I don't think 
it can ever be the other way round. Leaving a little bit of room for 
error doesn't seem like a bad idea, though.


yeah 16MB seems like the best choice given the available data and how 
far we are into the release cycle.




IIRC we experimented with an auto-tuning ring size when we worked on the 
original ring buffer patch. The idea is that you start with a small 
ring, and enlarge it in StrategyRejectBuffer. But that seems too risky 
for 8.4.


agreed.



I wonder if using the small ring showed any benefit when the COPY is not 
WAL-logged? In that scenario block-on-WAL-flush behavior doesn't happen, 
so the small ring might have some L2 cache benefits.


I did some limited testing on that but I was unable to measure any 
significant effect - especially since the difference between wal-logged 
and not is rather small for a non-parallel COPY (ie in the above example 
you get around 6m20s runtime for wal-logged and ~5m40s in the other case).



Stefan



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-21 Thread Simon Riggs

On Sun, 2009-06-21 at 10:28 +0200, Stefan Kaltenbrunner wrote:
 I did some limited testing on that but I was unable to measure any 
 significant effect - especially since the difference between
 wal-logged and not is rather small for a non-parallel COPY (ie in the
 above example you get around 6m20s runtime for wal-logged and ~5m40s
 in the other case).

This is a common confusion for small tests.

The non-WAL-logged case causes all buffers to be written to disk at the
end of COPY; this is roughly the same size as the volume of WAL written.
In the logged case we do not write the data blocks - they get written at
the next checkpoint.  So the reduction in I/O is not apparent: during the
period of the test the I/O is about the same in both cases, rather than
less in the non-WAL-logged case.  On longer tests the difference shows
more clearly, because the data blocks start to migrate out of shared
buffers while the COPY is still running, affecting the test results.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-21 Thread Simon Riggs

On Sun, 2009-06-21 at 02:45 -0400, Greg Smith wrote:
 On Sat, 20 Jun 2009, Simon Riggs wrote:
 
  I would suggest that we check how much WAL has been written. There may
  be a secondary effect or a different regression hidden in these results.
 
 What's the easiest way to do that? 

pg_current_xlog_insert_location()
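
Usage is straightforward, though on 8.4 the byte difference between the
two values has to be worked out by hand (a sketch; the sample value is
invented):

SELECT pg_current_xlog_insert_location();  -- before the test, e.g. 0/16B6C50
-- ... run the COPY ...
SELECT pg_current_xlog_insert_location();  -- after the test
-- the difference between the two hex locations is the WAL volume written
-- (later releases grew pg_xlog_location_diff() for this)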

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-21 Thread Stefan Kaltenbrunner

Simon Riggs wrote:

On Sun, 2009-06-21 at 10:28 +0200, Stefan Kaltenbrunner wrote:
I did some limited testing on that but I was unable to measure any 
significant effect - especially since the difference between

wal-logged and not is rather small for a non-parallel COPY (ie in the
above example you get around 6m20s runtime for wal-logged and ~5m40s
in the other case).


This is a common confusion for small tests.

The non-WAL-logged case causes all buffers to be written to disk at the
end of COPY; this is roughly the same size as the volume of WAL written.
In the logged case we do not write the data blocks - they get written at
the next checkpoint.  So the reduction in I/O is not apparent: during the
period of the test the I/O is about the same in both cases, rather than
less in the non-WAL-logged case.  On longer tests the difference shows
more clearly, because the data blocks start to migrate out of shared
buffers while the COPY is still running, affecting the test results.


I was actually testing with and without explicit CHECKPOINTing
before/after the load (and also with longer runs) too - the difference is
negligible, especially with only one process involved.
I think the difference is simply not that large because we are still
mostly CPU bound within COPY on reasonably fast IO-subsystems.




Stefan



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-21 Thread Stefan Kaltenbrunner

Stefan Kaltenbrunner wrote:

Simon Riggs wrote:

On Sun, 2009-06-21 at 10:28 +0200, Stefan Kaltenbrunner wrote:
I did some limited testing on that but I was unable to measure any 
significant effect - especially since the difference between

wal-logged and not is rather small for a non-parallel COPY (ie in the
above example you get around 6m20s runtime for wal-logged and ~5m40s
in the other case).


This is a common confusion for small tests.

The non-WAL-logged case causes all buffers to be written to disk at the
end of COPY; this is roughly the same size as the volume of WAL written.
In the logged case we do not write the data blocks - they get written at
the next checkpoint.  So the reduction in I/O is not apparent: during the
period of the test the I/O is about the same in both cases, rather than
less in the non-WAL-logged case.  On longer tests the difference shows
more clearly, because the data blocks start to migrate out of shared
buffers while the COPY is still running, affecting the test results.


I was actually testing with and without explicit CHECKPOINTing
before/after the load (and also with longer runs) too - the difference is
negligible, especially with only one process involved.
I think the difference is simply not that large because we are still
mostly CPU bound within COPY on reasonably fast IO-subsystems.


hmm, to further demonstrate that I just did some testing (same config as
before, with the 16MB ring buffer) by loading those 60M rows into a 20GB
ramdisk instead of the SAN (with a CHECKPOINT before and after).


this results in the following improvements:

16384:
wal bypass: 5min40s -> 5min10s (~9%)
wal logged: 6min20s -> 6min8s (~3%)


vmstat 5 output shows that the system is in fact CPU bound (ie using ~6%,
which is more or less a full core on a 16-core box) and not doing
anything IO-wise.



 r  b   swpd     free   buff    cache si so bi bo   in   cs us sy id wa st
 1  0      0 19010444 118648 15415684  0  0  0  6 1068 2151  6  0 94  0  0
[eight more samples in the same range: no swap, near-zero block I/O,
~1070 interrupts/s, ~2150 context switches/s, 6% user, 0% system,
94% idle]



So I do think that IO is in fact not too significant for this kind of 
testing and we still have ways to go in terms of CPU efficiency in COPY.




Stefan



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-21 Thread Robert Haas
On Sun, Jun 21, 2009 at 6:48 AM, Stefan
Kaltenbrunner ste...@kaltenbrunner.cc wrote:
 So I do think that IO is in fact not too significant for this kind of
 testing and we still have ways to go in terms of CPU efficiency in COPY.

It would be interesting to see some gprof or oprofile output from that
test.  I went back and dug up the results that I got when I profiled
this patch during initial development; with my version of the patch
applied, the profile looked like this on my system:

  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 14.48  0.85 0.851 0.85 5.47  DoCopy
 10.05  1.44 0.59 1001 0.00 0.00  CopyReadLine
  5.62  1.77 0.33 1039 0.00 0.00  PageAddItem
  5.11  2.07 0.30 10400378 0.00 0.00  LWLockRelease
  4.68  2.35 0.28 1013 0.00 0.00  heap_insert
  4.34  2.60 0.26 1012 0.00 0.00  heap_formtuple
  3.83  2.83 0.23 10356158 0.00 0.00  LWLockAcquire
  3.83  3.05 0.23 1054 0.00 0.00  MarkBufferDirty
  3.32  3.25 0.20 1013 0.00 0.00  RelationGetBufferForTuple
  3.07  3.43 0.18 1005 0.00 0.00  pg_verify_mbstr_len
  2.90  3.60 0.17 1002 0.00 0.00  CopyGetData
  2.73  3.76 0.16 2030 0.00 0.00  enlargeStringInfo
  2.73  3.92 0.16 2014 0.00 0.00  pq_getbytes
  2.04  4.04 0.12 1000 0.00 0.00  InputFunctionCall

...but this might not be very representative, since I think I may have
tested it on a single-column table.  It would be interesting to see
some other results.
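
For anyone wanting to collect a comparable profile, an oprofile recipe
along these lines should work (paths and workload hypothetical; details
vary by system):

opcontrol --init
opcontrol --no-vmlinux
opcontrol --start
psql -d test -c "COPY t FROM '/tmp/rows.csv'"   # the workload to profile
opcontrol --shutdown
opreport -l /usr/local/pgsql/bin/postgres | head -20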

Simon had the idea of further improving performance by keeping the
current buffer locked (this patch just kept it pinned, but not
locked), but I didn't see an obvious clean design for that.  Heikki
also had a patch for speeding up copy, but it got dropped for 8.4 due
to time constraints.

...Robert



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 It would be interesting to see some gprof or oprofile output from that
 test.  I went back and dug up the results that I got when I profiled
 this patch during initial development; with my version of the patch
 applied, the profile looked like this on my system:

Were you testing with a temp table?  The lack of XLogInsert in your
profile is striking.  Stefan's results upthread had it at the top,
and I got more or less the same thing here (didn't keep my numbers
unfortunately).

 Simon had the idea of further improving performance by keeping the
 current buffer locked (this patch just kept it pinned, but not
 locked), but I didn't see an obvious clean design for that.

The potential for deadlock seems sufficient reason not to do that.

regards, tom lane



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-21 Thread Stefan Kaltenbrunner

Tom Lane wrote:

Robert Haas robertmh...@gmail.com writes:

It would be interesting to see some gprof or oprofile output from that
test.  I went back and dug up the results that I got when I profiled
this patch during initial development; with my version of the patch
applied, the profile looked like this on my system:


Were you testing with a temp table?  The lack of XLogInsert in your
profile is striking.  Stefan's results upthread had it at the top,
and I got more or less the same thing here (didn't keep my numbers
unfortunately).


I guess that profile was for the wal bypass case and it looks fairly
similar to what I get here (lineitem table into tmpfs - though only 30M
rows this time to keep VM pressure low):


samples  %        symbol name
286197   17.1997  DoCopy
232958   14.0002  CopyReadLine
99762 5.9954  DecodeNumber
92751 5.5741  heap_fill_tuple
84439 5.0746  pg_verify_mbstr_len
65421 3.9316  InputFunctionCall
62502 3.7562  DecodeDate
53565 3.2191  heap_form_tuple
47731 2.8685  ParseDateTime
41206 2.4764  DecodeDateTime
39936 2.4001  pg_next_dst_boundary
36093 2.1691  AllocSetAlloc
33967 2.0413  heap_compute_data_size
29921 1.7982  float4in
27227 1.6363  DetermineTimeZoneOffset
25622 1.5398  pg_atoi
24703 1.4846  pg_mblen
24495 1.4721  .plt
23912 1.4371  pg_mbstrlen_with_len
23448 1.4092  bpchar_input
20033 1.2039  date2j
16331 0.9815  date_in
15684 0.9426  bpcharin
14819 0.8906  PageAddItem
14261 0.8571  ValidateDate


Stefan



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-21 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 I wonder if using the small ring showed any benefit when the COPY is not 
 WAL-logged? In that scenario block-on-WAL-flush behavior doesn't happen, 
 so the small ring might have some L2 cache benefits.

I think the notion that we might get a cache win from a smaller ring
is an illusion.  We're not expecting to go back and re-read from a
previously filled page in this scenario.  In any case, all of the
profiling results so far show that the CPU bottlenecks are elsewhere.
Until we can squeeze an order of magnitude out of COPY's data parsing
and/or XLogInsert, any possible cache effects will be down in the noise.

So to my mind, the only question left to answer (at least for the 8.4
cycle) is "is 16MB enough, or do we want to make the ring even bigger?".
Right at the moment I'd be satisfied with 16, but I wonder whether there
are scenarios where 32MB would show a significant advantage.

regards, tom lane



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-21 Thread Robert Haas
On Sun, Jun 21, 2009 at 11:31 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 It would be interesting to see some gprof or oprofile output from that
 test.  I went back and dug up the results that I got when I profiled
 this patch during initial development; with my version of the patch
 applied, the profile looked like this on my system:

 Were you testing with a temp table?

No.

 The lack of XLogInsert in your
 profile is striking.  Stefan's results upthread had it at the top,
 and I got more or less the same thing here (didn't keep my numbers
 unfortunately).

As Stephen guessed, I created the table in the same transaction that I
inserted into it...

 Simon had the idea of further improving performance by keeping the
 current buffer locked (this patch just kept it pinned, but not
 locked), but I didn't see an obvious clean design for that.

 The potential for deadlock seems sufficient reason not to do that.

Yep, that was the problem.  *thinks*

I think we had the idea of buffering up enough tuples to fill a page
(estimating conservatively so as to make sure we actually fill it),
and then inserting them all at once.  But I'm not sure how much
trouble that causes in terms of the timing of inserting index entries
and firing of after row insert triggers.  If the command ID doesn't
change, it seems like it might be OK.  Or at worst, even if the
optimization would only work in cases where there are no after row
triggers, that would still be useful to a lot of people, I think.

I haven't really spent much time on this angle of attack and
completely confess to not understanding all of the issues...

...Robert



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-21 Thread Robert Haas
On Sun, Jun 21, 2009 at 11:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 I wonder if using the small ring showed any benefit when the COPY is not
 WAL-logged? In that scenario block-on-WAL-flush behavior doesn't happen,
 so the small ring might have some L2 cache benefits.

 I think the notion that we might get a cache win from a smaller ring
 is an illusion.  We're not expecting to go back and re-read from a
 previously filled page in this scenario.  In any case, all of the
 profiling results so far show that the CPU bottlenecks are elsewhere.
 Until we can squeeze an order of magnitude out of COPY's data parsing
 and/or XLogInsert, any possible cache effects will be down in the noise.

 So to my mind, the only question left to answer (at least for the 8.4
 cycle) is "is 16MB enough, or do we want to make the ring even bigger?".
 Right at the moment I'd be satisfied with 16, but I wonder whether there
 are scenarios where 32MB would show a significant advantage.

Even 32MB is not that much.  It seems to me that in any realistic
production scenario you're going to have at least half a gig of shared
buffers, so we're really talking about at most one-sixteenth of the
shared buffer arena, and possibly quite a bit less.  I think that's
pretty conservative.

...Robert



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-21 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 There was some discussion of doing this in general for all inserts
 inside the indexam. The btree indexam could buffer up any inserts done
 within the transaction and keep them in an in-memory btree. Any
 lookups done within the transaction first look up in the in-memory
 tree then the disk. If the in-memory buffer fills up then we flush
 them to the index.

 The reason this is tempting is that we could then insert them all in a
 single index-merge operation which would often be more efficient than
 retail inserts.

That's not gonna work for a unique index, which unfortunately is a
pretty common case ...

regards, tom lane



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-21 Thread Stefan Kaltenbrunner

Tom Lane wrote:

Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
I wonder if using the small ring showed any benefit when the COPY is not 
WAL-logged? In that scenario block-on-WAL-flush behavior doesn't happen, 
so the small ring might have some L2 cache benefits.


I think the notion that we might get a cache win from a smaller ring
is an illusion.  We're not expecting to go back and re-read from a
previously filled page in this scenario.  In any case, all of the
profiling results so far show that the CPU bottlenecks are elsewhere.
Until we can squeeze an order of magnitude out of COPY's data parsing
and/or XLogInsert, any possible cache effects will be down in the noise.


we also need to take a serious look at our locking overhead - WAL-logged
COPY is already taking a significant performance hit with just a second
process running in parallel (into a separate table).
I just did some testing using the 16MB ring buffer, the
upthread-mentioned postgresql.conf and a 20GB tmpfs.


The following is copying 3M rows (each) into a separate table of the same
database.


processes   total time(s)   rows/s  rows/s - per core

 1             17.5         171428.57     171428.57
 2             20.8         288461.54     144230.77
 4             25.5         470588.24     117647.06
 6             31.1         578778.14      96463.02
 8             41.4         579710.14      72463.77
10             63           476190.48      47619.05
12             89           404494.38      33707.87
14            116           362068.97      25862.07
16            151           317880.79      19867.55



the higher the process count the more erratic the box behaves - it will
show a very high context switch rate (between 300,000 and 400,000/s) and
a large amount of idle time (60%!).


example vmstat 5 output for the 12-process test:

 r  b   swpd     free   buff    cache si so bi bo   in     cs us sy id wa st
 7  0      0 21654500  45436 12932516  0  0  0  3 1079 336941 34  7 59  0  0
[eight more samples in the same range: no swap, near-zero block I/O,
~310,000-350,000 context switches/s, 34-36% user, 6-7% system, 58-60%
idle]




Stefan



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-21 Thread Tom Lane
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
 The following is copying 3M rows (each) into a separate table of the same
 database.

Is this with WAL, or bypassing WAL?  Given what we've already seen,
a lot of contention for WALInsertLock wouldn't surprise me much here.
It should be possible to bypass that though.

regards, tom lane



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-21 Thread Heikki Linnakangas

Robert Haas wrote:

On Sun, Jun 21, 2009 at 11:52 AM, Tom Lanet...@sss.pgh.pa.us wrote:

So to my mind, the only question left to answer (at least for the 8.4
cycle) is "is 16MB enough, or do we want to make the ring even bigger?".
Right at the moment I'd be satisfied with 16, but I wonder whether there
are scenarios where 32MB would show a significant advantage.


Even 32MB is not that much.  It seems to me that in any realistic
production scenario you're going to have at least half a gig of shared
buffers, so we're really talking about at most one-sixteenth of the
shared buffer arena, and possibly quite a bit less.  I think that's
pretty conservative.


I was going to say that since we flush the WAL every 16MB anyway (at 
every XLOG file switch), you shouldn't see any benefit with larger ring 
buffers, since to fill 16MB of data you're not going to write more than 
16MB WAL. But then I realized that that's not true if you have an 
unusually low fillfactor. If you only fill each page say 50% full, 
you're going to use 32MB worth of data pages but only write 16MB of WAL. 
And maybe you could have a situation like that with very wide rows as 
well, with wasted space on each page that's not enough to store one
more row.


Could you test that scenario?
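
A minimal setup for that scenario might look like this (names
hypothetical):

CREATE TABLE half_full (id int, payload text) WITH (fillfactor = 50);
COPY half_full FROM '/tmp/rows.csv' WITH CSV;
-- each heap page is left ~50% empty, so roughly twice as many data pages
-- are dirtied per 16MB of WAL written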

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-21 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 I was going to say that since we flush the WAL every 16MB anyway (at 
 every XLOG file switch), you shouldn't see any benefit with larger ring 
 buffers, since to fill 16MB of data you're not going to write more than 
 16MB WAL.

I'm not convinced that WAL segment boundaries are particularly relevant
to this.  The unit of flushing is an 8K page, not a segment.

I wonder though whether the wal_buffers setting interacts with the
ring size.  Has everyone who's tested this used the same 16MB
wal_buffers setting as in Alan's original scenario?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-21 Thread Stefan Kaltenbrunner

Tom Lane wrote:

Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
The following shows copying 3M rows (each) into a separate table of the 
same database.


Is this with WAL, or bypassing WAL?  Given what we've already seen,
a lot of contention for WALInsertLock wouldn't surprise me much here.
It should be possible to bypass that though.


this was with WAL. here are the numbers for bypass:

processes   total time(s)   rows/s       rows/s - per core

 1            15             200000        200000
 2            15.1           397350.99     198675.5
 4            15.2           789473.68     197368.42
 6            15.3          1176470.59     196078.43
 8            16.2          1481481.48     185185.19
10            21.9          1369863.01     136986.3
12            22.7          1585903.08     132158.59
14            25.2          1666666.67     119047.62
16            27.9          1720430.11     107526.88
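
For reference, the bypass numbers rely on COPY's WAL-skip path: with WAL 
archiving off, a table created or truncated in the same transaction as the 
COPY is fsynced at commit instead of being WAL-logged. A minimal sketch, 
with an illustrative table name:

BEGIN;
TRUNCATE TABLE t1;                -- or CREATE TABLE inside this transaction
COPY t1 FROM '/tmp/rows_3m.dat';  -- skips WAL while archiving is disabled
COMMIT;                           -- the heap is synced to disk at commit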


runtimes grew very short here but the numbers still seem sane, and if you 
compare them to what I did on real storage (though without the 16MB 
ringbuffer fix!):


http://www.kaltenbrunner.cc/blog/index.php?/archives/27-Benchmarking-8.4-Chapter-2bulk-loading.html

you will see that for a single core there is almost no performance 
difference between ramdisk and real disk; at 8 cores there is the 
largest gap, at around 45%, but on 16 cores we are down to a mere 20% 
difference.
All in all it seems that we have a big locking issue with WALInsertLock, 
and even with that removed we are mostly CPU limited, not IO limited, 
for COPY.




Stefan



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-21 Thread Simon Riggs

On Sun, 2009-06-21 at 12:38 -0400, Tom Lane wrote:
 Greg Stark gsst...@mit.edu writes:
  There was some discussion of doing this in general for all inserts
  inside the indexam. The btree indexam could buffer up any inserts done
  within the transaction and keep them in an in-memory btree. Any
  lookups done within the transaction first look up in the in-memory
  tree then the disk. If the in-memory buffer fills up then we flush
  them to the index.
 
  The reason this is tempting is that we could then insert them all in a
  single index-merge operation which would often be more efficient than
  retail inserts.
 
 That's not gonna work for a unique index, which unfortunately is a
 pretty common case ...

I think it can. If we fail on a unique index we fail. We aren't
expecting that, else we wouldn't be using COPY. So I reckon it's
acceptable to load a whole block of rows and then load a whole block's
worth of index entries. The worst thing that can happen is we insert a
few extra heap rows that get aborted, which is small in comparison to
the potential gains from buffering.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-21 Thread Simon Riggs

On Sun, 2009-06-21 at 20:37 +0300, Heikki Linnakangas wrote:
 Robert Haas wrote:
  On Sun, Jun 21, 2009 at 11:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  So to my mind, the only question left to answer (at least for the 8.4
  cycle) is "is 16MB enough, or do we want to make the ring even bigger?".
  Right at the moment I'd be satisfied with 16, but I wonder whether there
  are scenarios where 32MB would show a significant advantage.
  
  Even 32MB is not that much.  It seems to me that in any realistic
  production scenario you're going to have at least half a gig of shared
  buffers, so we're really talking about at most one-sixteenth of the
  shared buffer arena, and possibly quite a bit less.  I think that's
  pretty conservative.
 
 I was going to say that since we flush the WAL every 16MB anyway (at 
 every XLOG file switch), you shouldn't see any benefit with larger ring 
 buffers, since to fill 16MB of data you're not going to write more than 
 16MB WAL. But then I realized that that's not true if you have an 
 unusually low fillfactor. If you only fill each page say 50% full, 
 you're going to use 32MB worth of data pages but only write 16MB of WAL. 
 And maybe you could have a situation like that with very wide rows as 
 well, with wasted space on each page that's not enough to store one more 
   row.

If walwriter is working correctly then it should be writing and fsyncing
WAL, while the COPY process just inserts WAL. I don't see that as an
argument to limit us to 16MB. But I take your point as being an argument
in favour of that as a consensus value for us to choose.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-20 Thread Greg Smith

On Sat, 20 Jun 2009, Simon Riggs wrote:


At the time, I also proposed a filled buffer list change to bufmgr to
allow bgwriter to preferentially target COPY's filled blocks, which
would also help with this effect.


One of the things I keep meaning to investigate is whether there's any 
benefit to pushing buffers the background writer cleans onto the free 
list, to speed up their subsequent allocation to needy backends.  Both 
this and the sequential scan ring buffer implementation might benefit 
from an approach where buffers are similarly pushed onto the free list 
when they're no longer needed by the process that pulled them in. 
Buffers could move from allocated->used->filled buffer list->free list in 
the COPY buffer case, and allocated->used->free list when executing a 
sequential scan.


That would seem to me to be a more robust general approach for solving 
this class of problem than the whole ring buffer idea, which is a great 
start but bound to run into situations where the size of the buffer just 
isn't right anymore a few hardware generations down the road.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-20 Thread Simon Riggs

On Sat, 2009-06-20 at 02:53 -0400, Greg Smith wrote:
 On Sat, 20 Jun 2009, Simon Riggs wrote:
 
  At the time, I also proposed a filled buffer list change to bufmgr to
  allow bgwriter to preferentially target COPY's filled blocks, which
  would also help with this effect.
 
 One of the things I keep meaning to investigate is whether there's any 
 benefit to pushing buffers the background writer cleans onto the free 
 list, to speed up their subsequent allocation to needy backends.  Both 
 this and the sequential scan ring buffer implementation might both benefit 
 from an approach where buffers are similarly pushed onto the free list 
 when they're no longer needed by the process that pulled them in. 
 Buffers could move from allocated->used->filled buffer list->free list in 
 the COPY buffer case, and allocated->used->free list when executing a 
 sequential scan.

The reason for not doing that would be that we don't know that the
blocks are free to use; we know very little about them. The longer we
leave them the more likely they are to be reused, so putting buffers
onto the freelist when they aren't actually free would likely make the
cache less effective, ISTM.

With filled buffers from COPY we have a very good clue that the buffers
will no longer be needed and can treat them differently from others.
Also, if we can get the bgwriter to do some of the work then we would
have the COPY process, bgwriter and WALwriter all working together on
the data loading. (We need to make WALwriter a little smarter also, so
that it can respond to changing WAL write workloads).

 That would seem to me to be a more robust general approach for solving 
 this class of problem than the whole ring buffer idea, which is a great 
 start but bound to run into situations where the size of the buffer just 
 isn't right anymore a few hardware generations down the road.

The ring buffer optimises L2 cache, not the buffer cache in general. If
we put buffers back on the freelist that is the same as having one
global ring buffer, which would then spoil the benefit for L2 on
multi-CPU systems.

We don't see any L2 benefit with COPY yet, but it is potentially there
if we can overcome the stronger effect of the WAL costs. When the ring
buffer expands to minimise WAL overheads we also minimise benefit for
L2. 

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-20 Thread Simon Riggs

On Fri, 2009-06-19 at 22:03 -0400, Greg Smith wrote:

 This makes me wonder if in addition to the ring buffering issue, there
 isn't just plain more writing per average completed transaction in 8.4
 with this type of COPY.

I would suggest that we check how much WAL has been written. There may
be a secondary effect or a different regression hidden in these results.
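
One crude way to do that with the functions available in 8.4 (there is no 
diff helper, so the two positions have to be subtracted by hand):

SELECT pg_current_xlog_insert_location();  -- before the load
-- ... run the COPY ...
SELECT pg_current_xlog_insert_location();  -- after; the delta is the WAL volume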

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-20 Thread Greg Stark
On Sat, Jun 20, 2009 at 9:22 AM, Simon Riggs si...@2ndquadrant.com wrote:
 That would seem to me to be a more robust general approach for solving
 this class of problem than the whole ring buffer idea, which is a great
 start but bound to run into situations where the size of the buffer just
 isn't right anymore a few hardware generations down the road.

 The ring buffer optimises L2 cache, not the buffer cache in general. If
 we put buffers back on the freelist that is the same as having one
 global ring buffer, which would then spoil the benefit for L2 on
 multi-CPU systems.

Incidentally a ring buffer is something else. We're talking about a
clock-sweep algorithm.

The whole point of the clock-sweep algorithm is that it's an
approximated LRU but with no single point of contention like a list
pointer. Doing this would be undermining that advantage.

I don't understand what you mean by "size of the buffer" either. The
only real parameter in the clock sweep algorithm is how many distinct
counter values there are. The more values, the closer to an LRU the
result is. It isn't really tied to hardware in any way, just the usage
pattern.


-- 
greg
http://mit.edu/~gsstark/resume.pdf

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-20 Thread Stefan Kaltenbrunner

Greg Smith wrote:

On Fri, 19 Jun 2009, Stefan Kaltenbrunner wrote:

In my case both the CPU (an Intel E5530 Nehalem) and the IO subsystem 
(8GB Fiberchannel connected NetApp with 4GB cache) are pretty fast.


The server Alan identified as "Solaris 10 8/07 s10x_u4wos_12b X86" has a 
Xeon E5320 (1.86GHz) and a single boring SAS drive in it (Sun X4150). 
The filesystem involved in that particular case is UFS, which I am 
suspicious of as being part of why the problem is so pronounced 
there--the default UFS tuning is pretty lightweight in terms of how much 
caching it does.  Not sure if Alan ran any tests against the big ZFS 
volume on the other sever, I think all the results he posted were from 
the UFS boot drive there too.


so 4096 * 1024 / BLCKSZ seems to be the sweet spot and also results in 
more or less the same performance that 8.3 had.


It looks like it's a touch higher on our 8/07 system; it levels out at 
8192 * (haven't checked the other one yet).  I'm seeing this, using 
Alan's original test set size (to make sure I was running exactly the 
same test) and just grabbing the low/high from a set of 3 runs:


8.3.7:  0m39.266s   0m43.269s (alan:  36.2 - 39.2)

256:0m50.435s   0m51.944s (alan:  48.1 - 50.6)
1024:   0m47.299s   0m49.346s
4096:   0m43.725s   0m46.116s
8192:   0m40.715s   0m42.480s
16384:  0m41.318s   0m42.118s
65536:  0m41.675s   0m42.955s


hmm interesting - I just did a bunch of runs using the lineitem table 
from the DBT3 tests (loading 60M rows in each run) and the same config 
Alan used.


8.4 (postpatch - not RC1, but that one seems to behave exactly the same way)

lineitem1
256 9min38s
512 9min20s
1024 7m44.667s/7m45.342s
2048 7m15.500s/7m17.910s
4096 7m11.424s/7m13.276s
8192 6m43.203s/6m48.293s
16384 6m24.980s/6m24.116s
32768 6m20.753s/6m22.083s
65536 6m22.913s/6m22.449s
1048576 6m23.765s/6m24.645s


8.3

6m45.650s/6m44.781s


so on this workload the sweet spot seems to be much higher than on the 
one with the narrower rows.


[...]
That's actually doing less I/O per capita, which is why it's also got 
less waiting for I/O%, but it's completing the most work.  This makes me 
wonder if in addition to the ring buffering issue, there isn't just 
plain more writing per average completed transaction in 8.4 with this 
type of COPY. This might explain why even with the expanded ring buffer, 
both Stefan and my test runs still showed a bit of a regression against 
8.3.  I'm guessing we have a second, smaller shooter here involved as well.


well yes, I also suspect that there is some secondary effect at play here, 
and I believe I have seen the "more IO with 8.4" thing here too, but I 
have not actually paid enough attention yet to be sure.





In any case, a bump of the ring multiplier to either 4096 or 8192 
eliminates the worst of the regression here, good improvement so far.


yeah with the above numbers I would say that 8192 should remove most if 
not all of the regression. However it seems that we might have to make 
this more dynamic in the future since the behaviour seems to depend on a 
number of variables...



Stefan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-20 Thread Greg Stark
On Sat, Jun 20, 2009 at 12:10 PM, Greg Stark gsst...@mit.edu wrote:

  I don't understand what you mean by "size of the buffer" either.

Ok, having gone back and read the whole thread I understand the
context for that statement. Nevermind.



-- 
greg
http://mit.edu/~gsstark/resume.pdf

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-20 Thread Tom Lane
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
and I believe I have seen the "more IO with 8.4" thing here too but I 
 have not actually paid enough attention yet to be sure.

FSM/VM overhead maybe?  I think COPY IN is setting the SKIP_FSM bit,
but I wonder if there's some vestigial overhead anyway.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-20 Thread Simon Riggs

On Sat, 2009-06-20 at 13:15 +0200, Stefan Kaltenbrunner wrote:
  
  8.3.7:  0m39.266s   0m43.269s (alan:  36.2 - 39.2)
  
  8192:   0m40.715s   0m42.480s
  16384:  0m41.318s   0m42.118s
  65536:  0m41.675s   0m42.955s
 
 hmm interesting - I just did a bunch of runs using the lineitem table 
 from the DBT3 tests (loading 60M rows in each run) and the same config 
 Alan used.
 
 8.4(postpatch - not RC1 but that one seems to behave exactly the same way)
 
 lineitem1
 8192 6m43.203s/6m48.293s
 16384 6m24.980s/6m24.116s
 32768 6m20.753s/6m22.083s
 65536 6m22.913s/6m22.449s
 1048576 6m23.765s/6m24.645s

 8.3
 
 6m45.650s/6m44.781s

 so on this workload the sweetspot seems to be much higher than on the 
 one with the narrower rows.

The rest of the patch should have had a greater effect on tables with
thinner rows. Your results match my expectations, though I read from
them that we should use 16384, since that provides some gain, not just a
cancellation of the regression. 

I would suggest that we leave it as a tunable parameter in this release
and remove it again once we have clear evidence of how to set it. We are
unlikely to work out conclusively how to do this before we release 8.4.

  
  In any case, a bump of the ring multiplier to either 4096 or 8192 
  eliminates the worst of the regression here, good improvement so far.
 
 yeah with the above numbers I would say that 8192 should remove most if 
 not all of the regression. However it seems that we might have to make 
 this more dynamic in the future since the behaviour seems to depend on a 
 number of variables...

I would be inclined to repeat this with multiple concurrent COPYs. We
may be able to improve on the numbers there, as well as get a
perspective on how to set parameter in real world.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-20 Thread Greg Smith

On Sat, 20 Jun 2009, Simon Riggs wrote:


The reason for not doing that would be that we don't know that the
blocks are free to use; we know very little about them. The longer we
leave them the more likely they are to be reused, so putting buffers
onto the freelist when they aren't actually free would likely make the
cache less effective, ISTM.


This is the prevailing wisdom.  The current design of the background 
writer tries to make a reasonable guess as to how many buffers are going 
to be allocated in the next bg_writer_delay period, and it 
locates/generates clean buffers to fill that expected need.  The idea 
would be to put those buffers, which have a good chance of being allocated 
by somebody else and therefore invalidated shortly, onto the free list.


There's a certain amount of overhead the backends themselves wait for in 
order to advance the clock sweep position to find the buffers they need, 
and that adds some latency to them.  I would guess there's some potential 
for reducing that latency if the freelist was sometimes populated, which 
right now it rarely is.  While that would add contention onto the free 
list, I wonder if it might still be less than the contention on advancing 
the current buffer pool strategy point.


I don't want to go through the whole thing in this thread, just pointing 
out some similarity with an item I'd already proposed onto the TODO 
list--and that a good solution there might get rid of this whole "how big 
do I make the ring buffer?" situation.  Obviously actual measurements here 
would trump any theorizing as to what works better, it's hard to get any 
intuition about low-level optimizing given how complicated CPU caches are 
nowadays.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-20 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Sat, 2009-06-20 at 13:15 +0200, Stefan Kaltenbrunner wrote:
 8192 6m43.203s/6m48.293s
 16384 6m24.980s/6m24.116s
 32768 6m20.753s/6m22.083s
 65536 6m22.913s/6m22.449s
 1048576 6m23.765s/6m24.645s

 The rest of the patch should have had a greater effect on tables with
 thinner rows. Your results match my expectations, though I read from
 them that we should use 16384, since that provides some gain, not just a
 cancellation of the regression. 

+1 for using 16384 (ie, max ring buffer size 16MB).  Maybe even more.
It seems likely that other cases might have an even bigger issue than
is exhibited in the couple of test cases we have here, so we should
leave some margin for error.  Also, there's code in there to limit the
ring buffer to 1/8th of shared buffers, so we don't have to worry about
trashing the whole buffer arena in small configurations.  Any limitation
at all is still a step forward over previous releases as far as not
trashing the arena is concerned.
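
(For reference: with the default BLCKSZ of 8192, 16384 * 1024 / BLCKSZ 
works out to a ring of 2048 buffers, i.e. 16MB, and the 1/8th cap means 
any shared_buffers setting below 128MB shrinks the ring proportionally.)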

 I would suggest that we leave it as a tunable parameter in this release
 and remove it again once we have clear evidence of how to set it.

It's not going to become an exposed tunable.  There is not evidence to
justify that, and we are down to the point of the release cycle where
any more than one-liner changes have to be looked at with considerable
reluctance.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-19 Thread Alan Li
It doesn't look like it's related to autovacuum.  I re-ran the test against
the two Solaris boxes with autovacuum turned off and the results look about
the same.

8.3.7 - Solaris 10 11/06 s10x_u3wos_10 X86
real    0m43.662s
user    0m0.001s
sys     0m0.003s
real    0m43.565s
user    0m0.001s
sys     0m0.003s
real    0m43.742s
user    0m0.001s
sys     0m0.003s

8.4rc1 - Solaris 10 11/06 s10x_u3wos_10 X86
real    0m59.304s
user    0m0.001s
sys     0m0.003s
real    0m58.653s
user    0m0.001s
sys     0m0.003s
real    1m0.253s
user    0m0.001s
sys     0m0.003s

8.3.7 - Solaris 10 8/07 s10x_u4wos_12b X86
real    0m38.981s
user    0m0.002s
sys     0m0.004s
real    0m39.879s
user    0m0.002s
sys     0m0.004s
real    0m39.111s
user    0m0.002s
sys     0m0.004s

8.4rc1 - Solaris 10 8/07 s10x_u4wos_12b X86
real    0m50.647s
user    0m0.002s
sys     0m0.004s
real    0m49.453s
user    0m0.002s
sys     0m0.004s
real    0m49.725s
user    0m0.002s
sys     0m0.004s

Alan


On Thu, Jun 18, 2009 at 3:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Kevin Grittner kevin.gritt...@wicourts.gov writes:
  Tom Lane t...@sss.pgh.pa.us wrote:
  I am unable to duplicate any slowdown on this test case.

  [ Kevin can ]

 It'd be useful first off to figure out if it's a CPU or I/O issue.
 Is there any visible difference in vmstat output?  Also, try turning
 off autovacuum in both cases, just to see if that's related.

regards, tom lane

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers





Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-19 Thread Kenneth Marshall
Yes, you are right. I thought that they were absolute function
counts. The data makes more sense now.

Regards,
Ken

On Thu, Jun 18, 2009 at 07:03:34PM -0500, Kevin Grittner wrote:
 Kenneth Marshall k...@rice.edu wrote: 
  
  What is not clear from Stefan's function listing is how the 8.4
  server could issue 33% more XLogInsert() and CopyReadLine()
  calls than the 8.3.7 server using the same input file.
  
 I thought those were profiling numbers -- the number of times a timer
 checked what was executing and found it in that method.  Which
 suggests that those two methods are probably slower now than in 8.3.7,
 at least in some environments.
  
 -Kevin
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-19 Thread Andrew Dunstan



Kevin Grittner wrote:
 
8.3.7

real    0m24.249s
real    0m24.054s
real    0m24.361s
 
8.4rc1

real    0m33.503s
real    0m34.198s
real    0m33.931s
 

  


Ugh. This looks like a poster child case for a benchfarm ...

Is there any chance you guys could triangulate this a bit? Good initial 
triangulation points might be the end of each commitfest. (I have a 
vested interest in making sure COPY performance doesn't regress, since 
it will affect parallel restore's speed in spades.)


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-19 Thread Marko Kreen
On 6/19/09, Andrew Dunstan and...@dunslane.net wrote:
  Kevin Grittner wrote:
   8.3.7
  real    0m24.249s
  real    0m24.054s
  real    0m24.361s
   8.4rc1
  real    0m33.503s
  real    0m34.198s
  real    0m33.931s

  Ugh. This looks like a poster child case for a benchfarm ...

  Is there any chance you guys could triangulate this a bit? Good initial
 triangulation points might be the end of each commitfest. (I have a vested
 interest in making sure COPY performance doesn't regress, since it will
 affect parallel restore's speed in spades.)

git bisect?

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-19 Thread Stefan Kaltenbrunner

Andrew Dunstan wrote:



Kevin Grittner wrote:
 
8.3.7

real    0m24.249s
real    0m24.054s
real    0m24.361s
 
8.4rc1

real    0m33.503s
real    0m34.198s
real    0m33.931s
 

  


Ugh. This looks like a poster child case for a benchfarm ...


indeed...



Is there any chance you guys could triangulate this a bit? Good initial 
triangulation points might be the end of each commitfest. (I have a 
vested interest in making sure COPY performance doesn't regress, since 
it will affect parallel restore's speed in spades.)


Maybe parallel restore is the reason we haven't noticed this earlier. 
The case that regressed this way is WAL-logged COPY; COPY that can 
bypass WAL (which typically happens in parallel restore now) is actually 
a bit faster in my testing in 8.4.


I will try and see if I can figure out what caused the regression...


Stefan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-19 Thread Tom Lane
Just eyeing the code ... another thing we changed since 8.3 is to enable
posix_fadvise() calls for WAL.  Any of the complaints want to try diking
out this bit of code (near line 2580 in src/backend/access/transam/xlog.c)?

#if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
    if (!XLogArchivingActive() &&
        (get_sync_bit(sync_method) & PG_O_DIRECT) == 0)
        (void) posix_fadvise(openLogFile, 0, 0, POSIX_FADV_DONTNEED);
#endif

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-19 Thread Stefan Kaltenbrunner

Tom Lane wrote:

Just eyeing the code ... another thing we changed since 8.3 is to enable
posix_fadvise() calls for WAL.  Any of the complaints want to try diking
out this bit of code (near line 2580 in src/backend/access/transam/xlog.c)?

#if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
    if (!XLogArchivingActive() &&
        (get_sync_bit(sync_method) & PG_O_DIRECT) == 0)
        (void) posix_fadvise(openLogFile, 0, 0, POSIX_FADV_DONTNEED);
#endif


doesn't seem to cause any noticable difference for me...


Stefan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-19 Thread Stefan Kaltenbrunner

Tom Lane wrote:

Just eyeing the code ... another thing we changed since 8.3 is to enable
posix_fadvise() calls for WAL.  Any of the complaints want to try diking
out this bit of code (near line 2580 in src/backend/access/transam/xlog.c)?

#if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
    if (!XLogArchivingActive() &&
        (get_sync_bit(sync_method) & PG_O_DIRECT) == 0)
        (void) posix_fadvise(openLogFile, 0, 0, POSIX_FADV_DONTNEED);
#endif


ok after a bit of bisecting I'm happy to announce the winner of the contest:

http://archives.postgresql.org/pgsql-committers/2008-11/msg00054.php

this patch causes a 25-30% performance regression for WAL-logged COPY; 
however, in the WAL bypass case (maybe that was what got tested?) it 
results in a 20% performance increase.


the raw numbers using the upthread posted minimal postgresql.conf are:

post patch/WAL logged:  4m10s/4m19s/4m12s
post patch/WAL bypass:  1m55s/1m58s/2m00s
prepatch/WAL logged:    2m55s/3m00s/2m59s
prepatch/WAL bypass:    2m22s/2m18s/2m20s


Stefan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-19 Thread Kenneth Marshall
On Fri, Jun 19, 2009 at 07:49:31PM +0200, Stefan Kaltenbrunner wrote:
 Tom Lane wrote:
 Just eyeing the code ... another thing we changed since 8.3 is to enable
 posix_fadvise() calls for WAL.  Any of the complaints want to try diking
 out this bit of code (near line 2580 in 
 src/backend/access/transam/xlog.c)?
 #if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
     if (!XLogArchivingActive() &&
         (get_sync_bit(sync_method) & PG_O_DIRECT) == 0)
         (void) posix_fadvise(openLogFile, 0, 0, POSIX_FADV_DONTNEED);
 #endif

 ok after a bit of bisecting I'm happy to announce the winner of the 
 contest:

 http://archives.postgresql.org/pgsql-committers/2008-11/msg00054.php

 this patch causes a 25-30% performance regression for WAL logged copy, 
 however in the WAL bypass case (maybe that was what got tested?) it results 
 in a 20% performance increase.

 the raw numbers using the upthread posted minimal postgresql.conf are:

 post patch/wal logged: 4min10s/4min19/4min12
 post patch/wal bypass: 1m55s/1m58s/2m00
 prepatch/wal logged: 2m55s/3min00/2m59
 prepatch/wal bypass: 2m22s/2m18s/2m20s


 Stefan


Great!  Maybe just increasing the size of the BULKWRITE ring,
possibly as a function of shared memory, is all that is needed.
256kB is the currently coded ring_size in storage/buffer/freelist.c.
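(At the default 8K block size, 256 * 1024 / BLCKSZ works out to a ring of 
just 32 buffers.)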

Ken

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-19 Thread Tom Lane
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
 ok after a bit of bisecting I'm happy to announce the winner of the contest:
 http://archives.postgresql.org/pgsql-committers/2008-11/msg00054.php

 this patch causes a 25-30% performance regression for WAL logged copy, 
 however in the WAL bypass case (maybe that was what got tested?) it 
 results in a 20% performance increase.

Hmm.  What that patch actually changes is that it prevents a bulk insert
(ie COPY in) from trashing the entire shared-buffers arena.  I think the
reason for the WAL correlation is that once it's filled the ring buffer,
creating new pages requires writing out old ones, and the
WAL-before-data rule means that the copy process has to block waiting
for WAL to go down to disk before it can write.  When it's allowed to
use the whole arena there is more chance for some of that writing to be
done by the walwriter or bgwriter.  But the details are going to depend
on the platform's CPU vs I/O balance, which no doubt explains why some
of us don't see it.

I don't think we want to revert that patch --- not trashing the whole
buffer arena seems like a Good Thing from a system-wide point of view,
even if it makes individual COPY operations go slower.  However, we
could maybe play around with the tradeoffs a bit.  In particular it
seems like it would be useful to experiment with different ring buffer
sizes.  Could you try increasing the ring size allowed in
src/backend/storage/buffer/freelist.c for the BULKWRITE case

***
*** 384,389 
--- 384,392 
case BAS_BULKREAD:
ring_size = 256 * 1024 / BLCKSZ;
break;
+   case BAS_BULKWRITE:
+   ring_size = 256 * 1024 / BLCKSZ;
+   break;
case BAS_VACUUM:
ring_size = 256 * 1024 / BLCKSZ;
break;


and see if maybe we can buy back most of the loss with not too much
of a ring size increase?
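
(Concretely, the experiment is just bumping the multiplier in that 
BULKWRITE case - e.g. 1024 * 1024 / BLCKSZ gives a 1MB ring of 128 buffers 
at the default block size - then rebuilding and re-timing the load.)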

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-19 Thread Stefan Kaltenbrunner

Tom Lane wrote:

Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:

ok after a bit of bisecting I'm happy to announce the winner of the contest:
http://archives.postgresql.org/pgsql-committers/2008-11/msg00054.php


this patch causes a 25-30% performance regression for WAL logged copy, 
however in the WAL bypass case (maybe that was what got tested?) it 
results in a 20% performance increase.


Hmm.  What that patch actually changes is that it prevents a bulk insert
(ie COPY in) from trashing the entire shared-buffers arena.  I think the
reason for the WAL correlation is that once it's filled the ring buffer,
creating new pages requires writing out old ones, and the
WAL-before-data rule means that the copy process has to block waiting
for WAL to go down to disk before it can write.  When it's allowed to
use the whole arena there is more chance for some of that writing to be
done by the walwriter or bgwriter.  But the details are going to depend
on the platform's CPU vs I/O balance, which no doubt explains why some
of us don't see it.


hmm - in my case both the CPU (an Intel E5530 Nehalem) and the IO 
subsystem (8GB Fiberchannel connected NetApp with 4GB cache) are pretty 
fast, and even with, say, fsync=off 8.4RC1 is only slightly faster than 
8.3 with the same config and fsync=on, so maybe there is a secondary 
effect at play too.




I don't think we want to revert that patch --- not trashing the whole
buffer arena seems like a Good Thing from a system-wide point of view,
even if it makes individual COPY operations go slower.  However, we
could maybe play around with the tradeoffs a bit.  In particular it
seems like it would be useful to experiment with different ring buffer
sizes.  Could you try increasing the ring size allowed in
src/backend/storage/buffer/freelist.c for the BULKWRITE case

***
*** 384,389 
--- 384,392 
case BAS_BULKREAD:
ring_size = 256 * 1024 / BLCKSZ;
break;
+   case BAS_BULKWRITE:
+   ring_size = 256 * 1024 / BLCKSZ;
+   break;
case BAS_VACUUM:
ring_size = 256 * 1024 / BLCKSZ;
break;


and see if maybe we can buy back most of the loss with not too much
of a ring size increase?


already started testing that once I found the offending commit.

256 * 1024 / BLCKSZ
4m10s/4m19s/4m12s

512 * 1024 / BLCKSZ
3m27s/3m32s

1024 * 1024 / BLCKSZ
3m14s/3m12s

2048 * 1024 / BLCKSZ
3m02s/3m02s

4096 * 1024 / BLCKSZ
2m59s/2m58s

8192 * 1024 / BLCKSZ
2m59s/2m59s

so 4096 * 1024 / BLCKSZ seems to be the sweet spot and also results in 
more or less the same performance that 8.3 had.




Stefan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-19 Thread Josh Berkus



so 4096 * 1024 / BLCKSZ seems to be the sweet spot and also results in
more or less the same performance that 8.3 had.


Can some folks test this with different size COPYs?  That's both 
larger/smaller tables, and larger/smaller rows.  We should also test 
copy with large blob data.
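
A sketch of generating one such wide-row data set (names and sizes are 
made up; generate_series() makes it easy to vary row width and row count 
independently):

CREATE TABLE widerows (id int, payload text);
INSERT INTO widerows
    SELECT i, repeat('x', 2000) FROM generate_series(1, 3000000) AS s(i);
COPY widerows TO '/tmp/widerows.dat';  -- dump it once...
TRUNCATE widerows;
-- ...then time: COPY widerows FROM '/tmp/widerows.dat';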



--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-19 Thread Simon Riggs

On Fri, 2009-06-19 at 14:11 -0400, Tom Lane wrote:
 Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
  ok after a bit of bisecting I'm happy to announce the winner of the contest:
  http://archives.postgresql.org/pgsql-committers/2008-11/msg00054.php
 
  this patch causes a 25-30% performance regression for WAL logged copy, 
  however in the WAL bypass case (maybe that was what got tested?) it 
  results in a 20% performance increase.
 
 Hmm.  What that patch actually changes is that it prevents a bulk insert
 (ie COPY in) from trashing the entire shared-buffers arena.  I think the
 reason for the WAL correlation is that once it's filled the ring buffer,
 creating new pages requires writing out old ones, and the
 WAL-before-data rule means that the copy process has to block waiting
 for WAL to go down to disk before it can write.  When it's allowed to
 use the whole arena there is more chance for some of that writing to be
 done by the walwriter or bgwriter.  But the details are going to depend
 on the platform's CPU vs I/O balance, which no doubt explains why some
 of us don't see it.
 
 I don't think we want to revert that patch --- not trashing the whole
 buffer arena seems like a Good Thing from a system-wide point of view,
 even if it makes individual COPY operations go slower.  However, we
 could maybe play around with the tradeoffs a bit.  In particular it
 seems like it would be useful to experiment with different ring buffer
 sizes.  Could you try increasing the ring size allowed in
 src/backend/storage/buffer/freelist.c for the BULKWRITE case

Yes, that's definitely the cause. The ring buffer technique was
prototyped in 8.3 and a ring of 32 blocks was found not to be
sufficient, which was one reason we didn't try to commit that then.
At the time, I also proposed a filled buffer list change to bufmgr to
allow bgwriter to preferentially target COPY's filled blocks, which
would also help with this effect.

You'll get better benefit from decreasing wal_writer_delay also.
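(wal_writer_delay defaults to 200ms in 8.4; dropping it to, say, 50ms in 
postgresql.conf wakes the walwriter four times as often, giving it more 
chances to flush WAL before the COPY backend has to do it itself.)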

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-19 Thread Greg Smith

On Fri, 19 Jun 2009, Stefan Kaltenbrunner wrote:

In my case both the CPU (an Intel E5530 Nehalem) and the IO subsystem 
(8GB Fiberchannel connected NetApp with 4GB cache) are pretty fast.


The server Alan identified as "Solaris 10 8/07 s10x_u4wos_12b X86" has a 
Xeon E5320 (1.86GHz) and a single boring SAS drive in it (Sun X4150). 
The filesystem involved in that particular case is UFS, which I am 
suspicious of as being part of why the problem is so pronounced there--the 
default UFS tuning is pretty lightweight in terms of how much caching it 
does.  Not sure if Alan ran any tests against the big ZFS volume on the 
other sever, I think all the results he posted were from the UFS boot 
drive there too.


so 4096 * 1024 / BLCKSZ seems to be the sweet spot and also results in more 
or less the same performance that 8.3 had.


It looks like it's a touch higher on our 8/07 system; it levels out at 
8192 * (haven't checked the other one yet).  I'm seeing this, using Alan's 
original test set size (to make sure I was running exactly the same test) 
and just grabbing the low/high from a set of 3 runs:


8.3.7:  0m39.266s   0m43.269s (alan:  36.2 - 39.2)

256:0m50.435s   0m51.944s (alan:  48.1 - 50.6)
1024:   0m47.299s   0m49.346s
4096:   0m43.725s   0m46.116s
8192:   0m40.715s   0m42.480s
16384:  0m41.318s   0m42.118s
65536:  0m41.675s   0m42.955s

I collected some iostat data here as well for some of the runs (the vmstat 
data was harder to read, this being Solaris, and didn't seem to add 
anything).  I'm seeing lines like this with the default ring buffer of 256 
*:


    tty          sd1           sd2           nfs1          cpu
 tin tout kps tps serv  kps tps serv  kps tps serv   us sy wt id
   0  322  12   10  41371 275400   00   12 11  0 78
   0  166   0   00  46246 338000   00   14 10  0 76
   0  164   0   00  44874 306810   00   13  9  0 78

Obviously sd2 is where the database and source file are at.  Basically, 
about one core (out of four) tied up with a pretty even split of 
user/system time.  Using the highest ring size I tried, 65536 *, gives 
lines that look like this:


    tty          sd1           sd2           nfs1          cpu
 tin tout kps tps serv  kps tps serv  kps tps serv   us sy wt id
   0  163   0   00  56696 429100   00   20 12  0 68
   0  166   0   00  58554 454200   00   21 12  0 67
   0  168   0   00  56057 430800   00   21 12  0 67

So it seems like increasing the ring size helps saturate the disks better, 
went from ~45MB/s to 57MB/s.  What's kind of interesting is to compare 
this against the 8.3.7 run, which is the fastest of them all, which I was 
expecting to find had the highest write rate of them all;


    tty          sd1           sd2           nfs1          cpu
 tin tout kps tps serv  kps tps serv  kps tps serv   us sy wt id
   0   83   0   00  47654 212100   00   23  8  0 69
   0  240   0   00  44198 215010   00   19  8  0 73
   0   83   0   00  37750 111010   00   21  6  0 72

That's actually doing less I/O per capita, which is why it's also got less 
waiting for I/O%, but it's completing the most work.  This makes me wonder 
if in addition to the ring buffering issue, there isn't just plain more 
writing per average completed transaction in 8.4 with this type of COPY. 
This might explain why even with the expanded ring buffer, both Stefan 
and my test runs still showed a bit of a regression against 8.3.  I'm 
guessing we have a second, smaller shooter here involved as well.


In any case, a bump of the ring multiplier to either 4096 or 8192 
eliminates the worst of the regression here, good improvement so far.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] 8.4 open item: copy performance regression?

2009-06-18 Thread Stefan Kaltenbrunner

Any objections if I add:

http://archives.postgresql.org/pgsql-performance/2009-06/msg00215.php

to the (currently empty) list of open items for 8.4?

A 25-30% performance regression in our main bulk loading mechanism 
should at least be explained before the release...





Stefan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-18 Thread Tom Lane
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
 Any objections if I add:
 http://archives.postgresql.org/pgsql-performance/2009-06/msg00215.php
 to the (currently empty) list of open items for 8.4?

I am unable to duplicate any slowdown on this test case.  AFAICT
8.4 and 8.3 branch tip are about the same speed; if anything 8.4
is faster.  Testing on x86_64 Fedora 10 ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-18 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote: 
 Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
 Any objections if I add:

http://archives.postgresql.org/pgsql-performance/2009-06/msg00215.php
 to the (currently empty) list of open items for 8.4?
 
 I am unable to duplicate any slowdown on this test case.  AFAICT
 8.4 and 8.3 branch tip are about the same speed; if anything 8.4
 is faster.  Testing on x86_64 Fedora 10 ...
 
I just ran the specified test on:
 
Linux ATHENA 2.6.16.60-0.31-smp #1 SMP Tue Oct 7 16:16:29 UTC 2008
x86_64 x86_64 x86_64 GNU/Linux
Linux version 2.6.16.60-0.31-smp (ge...@buildhost) (gcc version 4.1.2
20070115 (SUSE Linux)) #1 SMP Tue Oct 7 16:16:29 UTC 2008
SUSE Linux Enterprise Server 10 (x86_64)
VERSION = 10
PATCHLEVEL = 2
 
I did configure with a prefix and these options:
 
 --enable-integer-datetimes --enable-debug --disable-nls --with-libxml
 
I did initdb with --no-locale
 
8.3.7
real    0m24.249s
real    0m24.054s
real    0m24.361s
 
8.4rc1
real    0m33.503s
real    0m34.198s
real    0m33.931s
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-18 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Tom Lane t...@sss.pgh.pa.us wrote: 
 I am unable to duplicate any slowdown on this test case.

 [ Kevin can ]

It'd be useful first off to figure out if it's a CPU or I/O issue.
Is there any visible difference in vmstat output?  Also, try turning
off autovacuum in both cases, just to see if that's related.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-18 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote: 
 
 It'd be useful first off to figure out if it's a CPU or I/O issue.
 Is there any visible difference in vmstat output?  Also, try turning
 off autovacuum in both cases, just to see if that's related.
 
Both took slightly longer with autovacuum off, but probably just
within the noise.
 
It's hard to interpret the vmstat output, largely because I chose to
run this on one of our biggest servers, which is not currently serving
an application, per se, but as a replication target, and this being
off hours is busy running the sync process to the source machines. 
This involves generating md5 sums on both sides for blocks of rows,
which is pretty CPU-intensive.  There is very little disk output from
that right now, pretty light on the disk reads, but keeping a few CPUs
pretty busy generating those md5 sums.
 
I've got to go keep an appointment, but I'll come back and see if I
can do more.  For now, here's the raw vmstat, in case someone can pick
out info I'm missing:
 
-Kevin

baseline:
 
procs -----------memory---------- ---swap-- -----io---- ---system--- -----cpu------
 r  b swpd    free   buff     cache  si  so   bi   bo   in    cs us sy id wa st
 3  1  240 1456928  70164 124350736   0   0  101  678    0     0  6  1 89  4  0
 4  0  240 1451224  70164 124356896   0   0 5580   47 5457 39514 15  3 74  8  0
 2  1  240 1445812  70172 124361008   0   0 4988  543 5347 36680 15  3 73  9  0
 3  1  240 1440196  70172 124366144   0   0 5928  501 3431 30250 14  3 76  8  0
 3  1  240 143      70172 124372312   0   0 5920   47 6477 34794 16  2 73  9  0
 1  3  240 1428916  70172 124376424   0   0 5268  468 5449 33003 15  2 73 10  0
 1  1  240 1422820  70172 124382592   0   0 5036  107 3995 26131 12  2 75 11  0
 1  3  240 1418556  70172 124387728   0   0 4052  156 4562 33519 12  2 78  8  0
 1  3  240 1412972  70180 124392864   0   0 5544  676 5063 32300 14  2 73 11  0
 1  3  240 1406964  70180 124399032   0   0 5796  119 4743 28285 12  2 74 13  0
 0  3  240 1400992  70180 124404176   0   0 6112  116 4184 15183  9  1 75 15  0
 4  0  240 1396372  70180 124410336   0   0 4604  119 4123 38639 14  2 74 10  0
 3  1  240 1391936  70180 124416512   0   0 4572  332 5025 38534 15  3 73  9  0
 2  3  240 1387324  70188 124418560   0   0 4796  397 4271 23039 11  2 80  8  0
 2  1  240 1380196  70188 124425752   0   0 6864  209 5583 32509 16  2 73  9  0
 1  3  240 1374980  70188 124432944   0   0 5292   52 5007 37758 15  2 74  9  0
 0  0  240 1372708  70188 124435008   0   0 2600  132 3363 18199  9  1 86  4  0
 2  2  240 1367488  70188 124439120   0   0 5152  430 5450 42153 18  3 74  6  0
 
8.3.7 test:
 
procs -----------memory---------- ---swap-- -----io---- ---system---- -----cpu------
 r  b swpd   free  buff     cache  si  so   bi    bo   in     cs us sy id wa st
 0  0  240 590064 70164 125219392   0   0  101   678    0      0  6  1 89  4  0
 4  1  240 586336 70164 125221448   0   0 3628    37 4990  39525 15  3 77  5  0
 2  0  240 581892 70164 125226592   0   0 4556    50 5597  33057 16  2 76  5  0
 3  1  240 578616 70164 125229672   0   0 3500    42 3902  36959 14  3 80  4  0
 3  1  240 573520 70164 125234816   0   0 5232    82 5081  26713 15  2 75  8  0
 1  3  240 591688 70164 125217336   0   0 4612 15890 5256  33309 13  3 76  8  0
 3  1  240 586944 70172 125220416   0   0 4596   100 4640  23263 12  2 77  9  0
 3  1  240 595632 70160 125215280   0   0 5512  5814 5314  26822 16  2 73  8  0
 1  2  240 581200 70152 125229680   0   0 4684 39061 3604  26682 16  3 67 14  0
 4  1  240 596972 70152 125211184   0   0 1508 53915 2729  20654  9  3 83  5  0
 2  4  240 589184 70152 125220432   0   0 4140 44877 3809  82485 12  4 69 16  0
 1  4  240 597620 70116 125209160   0   0 5744 54096 4597  26422 14  3 66 17  0
 1  4  240 585936 70116 125219440   0   0 4136 44584 3661  35574 18  3 68 11  0
 0  5  240 601460 70096 125203008   0   0 4968 54009 4669  26507 12  3 72 14  0
 1  4  240 592604 70080 125210224   0   0 5064 44674 4190  40394 14  3 71 12  0
 2  3  240 604844 70076 125196864   0   0 5528 49601 4664  39189 18  4 68 10  0
 1  4  240 590348 70072 125214344   0   0 7716 49397 4949  41086 18  3 67 11  0
 1  4  240 600036 70060 125200992   0   0 4152 53968 4641  31875 16  3 66 15  0
 1  4  240 587336 70052 125212304   0   0 6420 44620 4427  47627 16  4 67 13  0
 3  2  240 582508 70044 125215400   0   0 4524 32812 4510 256103 18  9 60 13  0
 5  1  240 603680 70020 125192808   0   0 6584 33341 3844  28189 16  3 68 13  0
 4  1  240 591464 70016 125206176   0   0 6180 54710 6664  37282 20  3 64 13  0
 5  1  240 586924 70020 125212336   0   0 5748 44552 5351  39772 18  3 67 11  0
 4  1  240 602376 70004 125193856   0   0 5840 54181 4824  34279 17  4 67

Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-18 Thread Kenneth Marshall
On Thu, Jun 18, 2009 at 05:20:08PM -0400, Tom Lane wrote:
 Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
  Any objections if I add:
  http://archives.postgresql.org/pgsql-performance/2009-06/msg00215.php
  to the (currently empty) list of open items for 8.4?
 
 I am unable to duplicate any slowdown on this test case.  AFAICT
 8.4 and 8.3 branch tip are about the same speed; if anything 8.4
 is faster.  Testing on x86_64 Fedora 10 ...
 
   regards, tom lane
 

What is not clear from Stefan's function listing is how the 8.4
server could issue 33% more XLogInsert() and CopyReadLine()
calls than the 8.3.7 server using the same input file. That would
account for the slowdown but not why it is happening.

Cheers,
Ken

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-18 Thread Kevin Grittner
Kevin Grittner kevin.gritt...@wicourts.gov wrote: 
 
 I've got to go keep an appointment
 
Sorry about that.  Back now.  Anything else I can do to help with
this?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 open item: copy performance regression?

2009-06-18 Thread Kevin Grittner
Kenneth Marshall k...@rice.edu wrote: 
 
 What is not clear from Stefan's function listing is how the 8.4
 server could issue 33% more XLogInsert() and CopyReadLine()
 calls than the 8.3.7 server using the same input file.
 
I thought those were profiling numbers -- the number of times a timer
checked what was executing and found it in that method.  Which
suggests that those two methods are probably slower now than in 8.3.7,
at least in some environments.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers