Re: [HACKERS] 8.4 open item: copy performance regression?
Tom Lane t...@sss.pgh.pa.us wrote:
> Kevin Grittner kevin.gritt...@wicourts.gov writes:
>> The checkpoint_segments seems dramatic enough to be real.  I wonder
>> if the test is short enough that it never got around to re-using any
>> of them, so it was doing extra writes for the initial creation
>> during the test?
>
> That's exactly what I was about to suggest.  Are you starting each
> run from a fresh initdb?  If so, try running the load long enough
> that the number of WAL files stabilizes (should happen at 2x
> checkpoint_segments) and then start the test measurement.

default conf (xlogs not populated)
real    3m49.604s
real    3m47.225s
real    3m45.831s

default conf (xlogs populated)
real    3m45.603s
real    3m45.284s
real    3m45.906s

default conf + checkpoint_segments = 100 (xlogs not populated)
real    4m27.629s
real    4m24.496s
real    4m22.832s

default conf + checkpoint_segments = 100 (xlogs populated)
real    3m52.746s
real    3m52.619s
real    3m50.418s

I used ten times the number of rows, to get more meaningful results.
To get the "populated" times, I just dropped the target table and
created it again; otherwise identical runs.

Clearly, pre-populating the xlog files reduces run time, especially
for a large number of xlog files; however, I still got better
performance with a smaller set of xlog files.

Regarding the fact that even with the xlog files pre-populated, the
smaller set of xlog files is faster: I'm only guessing, but I suspect
the battery-backed RAID controller is what's defeating conventional
wisdom here.  By writing to the same, relatively small, set of xlog
files repeatedly, some of the actual disk writes probably evaporate
in the BBU cache.  More frequent checkpoints from the smaller number
of xlog files might also have caused data to start streaming to the
disk a little sooner, minimizing write gluts later.  I've often seen
similar benefits of the BBU cache which cause some of the
frequently-given advice here to have no discernible effect or be
counter-productive in our environment.
(I know that some doubted my claim that my aggressive background
writer settings didn't increase disk writes, but I couldn't even
measure a difference there in the writes from OS cache to the
controller cache, much less anything which indicated it actually
increased physical disk writes.)

By the way, the number of xlog files seemed to always go to two above
2x checkpoint_segments.

-Kevin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 8.4 open item: copy performance regression?
Kevin Grittner kevin.gritt...@wicourts.gov writes:
> Regarding the fact that even with the xlog files pre-populated, the
> smaller set of xlog files is faster: I'm only guessing, but I suspect
> the battery backed RAID controller is what's defeating conventional
> wisdom here.  By writing to the same, relatively small, set of xlog
> files repeatedly, some of the actual disk writes probably evaporate
> in the BBU cache.

Yeah, sounds plausible.  How big is your BBU cache?

                        regards, tom lane
Re: [HACKERS] 8.4 open item: copy performance regression?
On Fri, June 26, 2009 4:13 pm, Kevin Grittner wrote:
> By the way, the number of xlog files seemed to always go to two above
> 2x checkpoint_segments.

The docs say:

    There will always be at least one WAL segment file, and will
    normally not be more than (2 + checkpoint_completion_target) *
    checkpoint_segments + 1 files.

cheers

andrew
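[A quick back-of-the-envelope check of the formula Andrew quotes.  This
is not part of the original thread, just a sketch of the documented
ceiling on retained WAL segment files; the parameter values are the
8.x defaults and Kevin's test setting.]

```python
def max_wal_files(checkpoint_segments, checkpoint_completion_target):
    """Documented upper bound on retained WAL segment files:
    (2 + checkpoint_completion_target) * checkpoint_segments + 1,
    rounded down to a whole number of files."""
    return int((2 + checkpoint_completion_target) * checkpoint_segments) + 1

# 8.4 defaults: checkpoint_segments = 3, checkpoint_completion_target = 0.5
print(max_wal_files(3, 0.5))    # -> 8
# Kevin's test: checkpoint_segments = 100
print(max_wal_files(100, 0.5))  # -> 251
```

Kevin's observation of "two above 2x checkpoint_segments" (202 files at
checkpoint_segments = 100) sits comfortably under that 251-file ceiling.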
Re: [HACKERS] 8.4 open item: copy performance regression?
Tom Lane t...@sss.pgh.pa.us wrote:
> How big is your BBU cache?

On this machine, I guess it is 512MB.  (Possibly 1GB, but I'm having
trouble finding the right incantation to check it at the moment, so
I'm going by what the hardware tech remembers.)

-Kevin
Re: [HACKERS] 8.4 open item: copy performance regression?
On Mon, 2009-06-22 at 15:18 -0500, Kevin Grittner wrote:
> Tom Lane t...@sss.pgh.pa.us wrote:
>
> default postgresql.conf (comments stripped)
>
> max_connections = 100
> shared_buffers = 32MB

This forces the ring size to be 4MB, since it is min(32MB/8,
ringsize).  Please re-run the tests with your config and ring size
4MB, which should give the same answer.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
Re: [HACKERS] 8.4 open item: copy performance regression?
Hi Tom,

How much concern is there for the contention for use cases where the
WAL can't be bypassed?

Thanks,

Alan

On Sun, Jun 21, 2009 at 10:00 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
>> The following is copying 3M rows (each) into a separate table of
>> the same database.
>
> Is this with WAL, or bypassing WAL?  Given what we've already seen, a
> lot of contention for WALInsertLock wouldn't surprise me much here.
> It should be possible to bypass that though.
>
>                        regards, tom lane
Re: [HACKERS] 8.4 open item: copy performance regression?
Tom Lane wrote:
> Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
>> I was going to say that since we flush the WAL every 16MB anyway (at
>> every XLOG file switch), you shouldn't see any benefit with larger
>> ring buffers, since to fill 16MB of data you're not going to write
>> more than 16MB WAL.
>
> I'm not convinced that WAL segment boundaries are particularly
> relevant to this.  The unit of flushing is an 8K page, not a segment.

We fsync() the old WAL segment every time we switch to a new WAL
segment.  That's what I meant by "flush".  If the walwriter is keeping
up, it will fsync() the WAL more often, but 16MB is the maximum
distance between fsync()s.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] 8.4 open item: copy performance regression?
On Mon, 2009-06-22 at 10:52 +0300, Heikki Linnakangas wrote:
> Tom Lane wrote:
>> Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
>>> I was going to say that since we flush the WAL every 16MB anyway
>>> (at every XLOG file switch), you shouldn't see any benefit with
>>> larger ring buffers, since to fill 16MB of data you're not going to
>>> write more than 16MB WAL.
>>
>> I'm not convinced that WAL segment boundaries are particularly
>> relevant to this.  The unit of flushing is an 8K page, not a
>> segment.
>
> We fsync() the old WAL segment every time we switch to a new WAL
> segment.  That's what I meant by "flush".  If the walwriter is
> keeping up, it will fsync() the WAL more often, but 16MB is the
> maximum distance between fsync()s.

Yes, but the fsync is performed by the process that writes the WAL,
not necessarily by the process that inserts the WAL.  In perfect
balance, an inserter-of-WAL could insert an infinite amount of WAL and
never need to fsync the WAL.  So the question is: are we in balance
between WALWriter and COPY?

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
Re: [HACKERS] 8.4 open item: copy performance regression?
Alan Li a...@truviso.com writes:
> How much concern is there for the contention for use cases where the
> WAL can't be bypassed?

If you mean "is something going to be done about it in 8.4", the
answer is no.  This is a pre-existing issue that there is no simple
fix for.

                        regards, tom lane
Re: [HACKERS] 8.4 open item: copy performance regression?
Tom Lane wrote:
> Alan Li a...@truviso.com writes:
>> How much concern is there for the contention for use cases where the
>> WAL can't be bypassed?
>
> If you mean "is something going to be done about it in 8.4", the
> answer is no.  This is a pre-existing issue that there is no simple
> fix for.

I thought he was asking if we intend to provide for WAL bypass on a
table-by-table basis in future.

cheers

andrew
Re: [HACKERS] 8.4 open item: copy performance regression?
Andrew Dunstan and...@dunslane.net writes:
> Tom Lane wrote:
>> Alan Li a...@truviso.com writes:
>>> How much concern is there for the contention for use cases where
>>> the WAL can't be bypassed?
>>
>> If you mean "is something going to be done about it in 8.4", the
>> answer is no.  This is a pre-existing issue that there is no simple
>> fix for.
>
> I thought he was asking if we intend to provide for WAL bypass on a
> table-by-table basis in future.

I thought he was asking for a solution to the problem of WALInsertLock
contention.  In any case, we have WAL bypass on a table-by-table basis
now, don't we?

                        regards, tom lane
Re: [HACKERS] 8.4 open item: copy performance regression?
Tom Lane wrote:
> I thought he was asking for a solution to the problem of
> WALInsertLock contention.  In any case, we have WAL bypass on a
> table-by-table basis now, don't we?

If we do I'm ignorant of it ;-)  How do we say "Never WAL this table"?

cheers

andrew
Re: [HACKERS] 8.4 open item: copy performance regression?
Andrew Dunstan and...@dunslane.net writes:
> Tom Lane wrote:
>> I thought he was asking for a solution to the problem of
>> WALInsertLock contention.  In any case, we have WAL bypass on a
>> table-by-table basis now, don't we?
>
> If we do I'm ignorant of it ;-)  How do we say "Never WAL this
> table"?

Make it a temporary table.  The more useful case for data load is
"create or truncate it in the same transaction", of course.

                        regards, tom lane
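[The "create it in the same transaction" pattern Tom describes can be
sketched like this; it is not from the thread, the table and file names
are illustrative, and it only skips WAL when WAL archiving is off:]

```sql
BEGIN;
-- Creating (or truncating) the table in the same transaction as the
-- load lets COPY skip WAL: on commit the heap file is fsync'd
-- directly, and on abort the new file is simply discarded.
CREATE TABLE staging_facts (id integer, payload text);
COPY staging_facts FROM '/path/to/data.csv' WITH CSV;
COMMIT;
```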
Re: [HACKERS] 8.4 open item: copy performance regression?
* Andrew Dunstan and...@dunslane.net [090622 10:47]:
> If we do I'm ignorant of it ;-)  How do we say "Never WAL this
> table"?

CREATE TEMPORARY TABLE ...

a.

--
Aidan Van Dyk                   Create like a god,
ai...@highrise.ca               command like a king,
http://www.highrise.ca/         work like a slave.
Re: [HACKERS] 8.4 open item: copy performance regression?
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> The more useful case for data load is "create or truncate it in the
> same transaction", of course.

Unfortunately, WAL bypass also requires not being in archive mode,
with no way to turn that off w/o a server restart, aiui.

Thanks,

Stephen
Re: [HACKERS] 8.4 open item: copy performance regression?
Stephen Frost sfr...@snowman.net writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> The more useful case for data load is "create or truncate it in the
>> same transaction", of course.
>
> Unfortunately, WAL bypass also requires not being in archive mode,
> with no way to turn that off w/o a server restart, aiui.

Well, if you're trying to archive then you certainly wouldn't want WAL
off, so I'm failing to see where this thread is going ...

                        regards, tom lane
Re: [HACKERS] 8.4 open item: copy performance regression?
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost sfr...@snowman.net writes:
>> Unfortunately, WAL bypass also requires not being in archive mode,
>> with no way to turn that off w/o a server restart, aiui.
>
> Well, if you're trying to archive then you certainly wouldn't want
> WAL off, so I'm failing to see where this thread is going ...

I disagree.  I'd love to be able to say "please bypass WAL logging for
this bulk load because I know that I'll pick up the data during my
next full dump and I can reload it from the original if I get
disrupted before then."  This is especially true when you're doing
bulk loads of static or reference data from another data source.

Thanks,

Stephen
Re: [HACKERS] 8.4 open item: copy performance regression?
On Mon, 2009-06-22 at 11:14 -0400, Tom Lane wrote:
> Stephen Frost sfr...@snowman.net writes:
>> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>>> The more useful case for data load is "create or truncate it in
>>> the same transaction", of course.
>>
>> Unfortunately, WAL bypass also requires not being in archive mode,
>> with no way to turn that off w/o a server restart, aiui.
>
> Well, if you're trying to archive then you certainly wouldn't want
> WAL off, so I'm failing to see where this thread is going ...

I was thinking it might be beneficial to be able to defer writing WAL
until COPY is complete, so heap_sync would either fsync the whole heap
file or copy the whole file to WAL.  That would avoid writing WAL
piecemeal because we could just back up the whole block, plus we
wouldn't write anything at all if the COPY failed.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
Re: [HACKERS] 8.4 open item: copy performance regression?
Simon Riggs si...@2ndquadrant.com writes:
> I was thinking it might be beneficial to be able to defer writing WAL
> until COPY is complete, so heap_sync would either fsync the whole
> heap file or copy the whole file to WAL.

What about indexes?

                        regards, tom lane
Re: [HACKERS] 8.4 open item: copy performance regression?
On Mon, 2009-06-22 at 11:24 -0400, Tom Lane wrote:
> Simon Riggs si...@2ndquadrant.com writes:
>> I was thinking it might be beneficial to be able to defer writing
>> WAL until COPY is complete, so heap_sync would either fsync the
>> whole heap file or copy the whole file to WAL.
>
> What about indexes?

I was thinking we could do exactly as stated for the cases that would
be WAL-bypass now, but need to write WAL because
XLogArchivingActive(), i.e. improve the exact case we are measuring
here.  Yes, it is more complex than that for loading to existing
tables.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
Re: [HACKERS] 8.4 open item: copy performance regression?
Tom Lane wrote:
> Andrew Dunstan and...@dunslane.net writes:
>> Tom Lane wrote:
>>> I thought he was asking for a solution to the problem of
>>> WALInsertLock contention.  In any case, we have WAL bypass on a
>>> table-by-table basis now, don't we?
>>
>> If we do I'm ignorant of it ;-)  How do we say "Never WAL this
>> table"?
>
> Make it a temporary table.

That doesn't help if you need the data visible in multiple sessions.
But we're digressing from the original topic.  Sorry.

cheers

andrew
Re: [HACKERS] 8.4 open item: copy performance regression?
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
> Tom Lane wrote:
>> I'm not convinced that WAL segment boundaries are particularly
>> relevant to this.  The unit of flushing is an 8K page, not a
>> segment.
>
> We fsync() the old WAL segment every time we switch to a new WAL
> segment.  That's what I meant by "flush".  If the walwriter is
> keeping up, it will fsync() the WAL more often, but 16MB is the
> maximum distance between fsync()s.

I'm still not convinced --- to my mind the issue is not whether fsyncs
happen but whether the COPY process has to wait for 'em, and I don't
think that segment boundaries directly affect that.  I'd still be
interested to see similar measurements done with different wal_buffers
settings.

However, in the interests of getting this resolved in time for 8.4.0,
I propose that we just settle on 16MB as the bulkwrite ring buffer
size.  There doesn't seem to be any evidence that a larger size will
make for a significant improvement, and we shouldn't allow COPY to
trash a bigger fraction of the arena than it really has to.

                        regards, tom lane
Re: [HACKERS] 8.4 open item: copy performance regression?
On 22 June 2009, at 17:24, Tom Lane t...@sss.pgh.pa.us wrote:
> Simon Riggs si...@2ndquadrant.com writes:
>> I was thinking it might be beneficial to be able to defer writing
>> WAL until COPY is complete, so heap_sync would either fsync the
>> whole heap file or copy the whole file to WAL.
>
> What about indexes?

Skip this optimisation if there are any.  It's already common practice
to create them only after the COPY has succeeded, when possible, for
better bulk-loading performance.  Then there's also the COPY +
INSERT ... SELECT ... technique.

--
dim
Re: [HACKERS] 8.4 open item: copy performance regression?
Tom Lane t...@sss.pgh.pa.us wrote:
> I wonder though whether the wal_buffers setting interacts with the
> ring size.  Has everyone who's tested this used the same 16MB
> wal_buffers setting as in Alan's original scenario?

I had been using his postgresql.conf file, then added autovacuum =
off.  When I tried with setting the ring size to 16MB, I accidentally
left off the step to copy the postgresql.conf file, and got better
performance.  I alternated between the postgresql.conf file from
earlier tests and the default file left there by the initdb, and got
this:

8.4rc1 with 16MB ring, default postgresql.conf
0m23.223s
0m23.489s
0m23.921s

8.4rc1 with 16MB ring, Alan's postgresql.conf
0m28.678s
0m26.171s
0m27.513s

default postgresql.conf (comments stripped):

max_connections = 100
shared_buffers = 32MB
datestyle = 'iso, mdy'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.english'

Alan's postgresql.conf (comments stripped):

shared_buffers = 256MB
wal_buffers = 16MB
checkpoint_segments = 100
autovacuum = off

I'm not going to claim I know why, but I thought I should report it.

Oh, and the 8.3.7 numbers and pre-patch numbers were averaging the
same under the day-time load as the replication sync mode.  So, with
the ring size at 16MB this load is faster under 8.4 than 8.3.

-Kevin
Re: [HACKERS] 8.4 open item: copy performance regression?
Kevin Grittner kevin.gritt...@wicourts.gov writes:
> Tom Lane t...@sss.pgh.pa.us wrote:
>> I wonder though whether the wal_buffers setting interacts with the
>> ring size.  Has everyone who's tested this used the same 16MB
>> wal_buffers setting as in Alan's original scenario?
>
> I had been using his postgresql.conf file, then added autovacuum =
> off.  When I tried with setting the ring size to 16MB, I accidentally
> left off the step to copy the postgresql.conf file, and got better
> performance.

Huh, that's bizarre.  I can see that increasing shared_buffers should
make no difference in this test case (we're not using them all
anyway).  But why should increasing wal_buffers make it slower?  I
forget the walwriter's control algorithm at the moment ... maybe it
works harder when wal buffers are full?

BTW, I committed the change to use 16MB; that will be in RC2.

                        regards, tom lane
Re: [HACKERS] 8.4 open item: copy performance regression?
On Mon, 22 Jun 2009, Kevin Grittner wrote:
> When I tried with setting the ring size to 16MB, I accidentally left
> off the step to copy the postgresql.conf file, and got better
> performance.

Do you happen to have a build with assertions turned on?  That is one
common cause of performance going down via increased shared_buffers
that people tend to run into during beta.  You can check via psql with
"show debug_assertions;".

--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] 8.4 open item: copy performance regression?
Greg Smith gsm...@gregsmith.com wrote:
> Do you happen to have a build with assertions turned on?

Nope.  I showed my ./configure options upthread, but can confirm with
pg_config:

BINDIR = /usr/local/pgsql-8.4rc1/bin
DOCDIR = /usr/local/pgsql-8.4rc1/share/doc
HTMLDIR = /usr/local/pgsql-8.4rc1/share/doc
INCLUDEDIR = /usr/local/pgsql-8.4rc1/include
PKGINCLUDEDIR = /usr/local/pgsql-8.4rc1/include
INCLUDEDIR-SERVER = /usr/local/pgsql-8.4rc1/include/server
LIBDIR = /usr/local/pgsql-8.4rc1/lib
PKGLIBDIR = /usr/local/pgsql-8.4rc1/lib
LOCALEDIR = /usr/local/pgsql-8.4rc1/share/locale
MANDIR = /usr/local/pgsql-8.4rc1/share/man
SHAREDIR = /usr/local/pgsql-8.4rc1/share
SYSCONFDIR = /usr/local/pgsql-8.4rc1/etc
PGXS = /usr/local/pgsql-8.4rc1/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/usr/local/pgsql-8.4rc1' '--enable-integer-datetimes' '--enable-debug' '--disable-nls' '--with-libxml'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2
CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g
CFLAGS_SL = -fpic
LDFLAGS = -Wl,--as-needed -Wl,-rpath,'/usr/local/pgsql-8.4rc1/lib'
LDFLAGS_SL =
LIBS = -lpgport -lxml2 -lz -lreadline -lcrypt -ldl -lm
VERSION = PostgreSQL 8.4rc1

> You can check via psql with "show debug_assertions;"

OK, we'll do it your way.  :-)

kgri...@athena:~/postgresql-8.4rc1> psql postgres
psql (8.4rc1)
Type "help" for help.

postgres=# show debug_assertions;
 debug_assertions
------------------
 off
(1 row)

-Kevin
Re: [HACKERS] 8.4 open item: copy performance regression?
Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote:
> A 25-30% performance regression in our main bulk loading mechanism
> should at least be explained before the release...

I think a performance regression of that magnitude merits holding up a
release to resolve.  Note that in a follow-up post showing the slow
8.4 copying on Linux, the copy was 11.7% to 19.3% *faster* on 8.4 when
the WAL writing was suppressed:

http://archives.postgresql.org/pgsql-performance/2009-06/msg00219.php

Extracting from that post:

# I can reproduce that on Linux (CentOS 5.3/x86_64, Nehalem Xeon
# E5530).  On 8.4 I get:
#
# 3m59s/4m01s/3m56s runtime and a profile of
#
# samples  %        symbol name
# 636302   19.6577  XLogInsert
# 415510   12.8366  CopyReadLine

# on 8.3.7 I get 2m58s, 2m54s, 2m55s
#
# and a profile of:
#
# samples  %        symbol name
# 460966   16.2924  XLogInsert
# 307386   10.8643  CopyReadLine

# If I do the same test utilizing WAL bypass the picture changes:
#
# 8.3 runtimes: 2m16s, 2m14s, 2m22s
# 8.4 runtimes: 2m1s, 2m, 1m59s

Is there a reason to believe that the XLogInsert part of this *only*
affects bulk loads?

-Kevin
Re: [HACKERS] 8.4 open item: copy performance regression?
Tom Lane t...@sss.pgh.pa.us wrote:
> Huh, that's bizarre.  I can see that increasing shared_buffers should
> make no difference in this test case (we're not using them all
> anyway).  But why should increasing wal_buffers make it slower?  I
> forget the walwriter's control algorithm at the moment ... maybe it
> works harder when wal buffers are full?

I created a postgresql.conf file with the options from the default
file, and then tried that by itself again, and with each of three
other options:

none
0m24.540s
0m24.630s
0m23.778s

checkpoint_segments = 100
0m30.251s
0m29.474s
0m26.604s

wal_buffers = 16MB
0m24.487s
0m23.939s
0m23.557s

shared_buffers = 256MB
0m25.885s
0m25.654s
0m24.025s

So the big hit seems to come from boosting checkpoint_segments,
although boosting shared_buffers seems to cause a slight slowdown.
Boosting wal_buffers seemed to help a little.  Both of these last two,
though, are within the noise, so low confidence on those without a lot
more tests.  The checkpoint_segments seems dramatic enough to be real.
I wonder if the test is short enough that it never got around to
re-using any of them, so it was doing extra writes for the initial
creation during the test?

-Kevin
Re: [HACKERS] 8.4 open item: copy performance regression?
I wrote:
> Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote:
>> A 25-30% performance regression in our main bulk loading mechanism
>> should at least be explained before the release...
>
> I think a performance regression of that magnitude merits holding up
> a release to resolve.

Wow.  That sure took a long time to come through.  I posted that days
ago.

-Kevin
Re: [HACKERS] 8.4 open item: copy performance regression?
Kevin Grittner kevin.gritt...@wicourts.gov writes:
> The checkpoint_segments seems dramatic enough to be real.  I wonder
> if the test is short enough that it never got around to re-using any
> of them, so it was doing extra writes for the initial creation during
> the test?

That's exactly what I was about to suggest.  Are you starting each run
from a fresh initdb?  If so, try running the load long enough that the
number of WAL files stabilizes (should happen at 2x
checkpoint_segments) and then start the test measurement.

                        regards, tom lane
Re: [HACKERS] 8.4 open item: copy performance regression?
On Mon, Jun 22, 2009 at 7:16 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Alan Li a...@truviso.com writes:
>> How much concern is there for the contention for use cases where the
>> WAL can't be bypassed?
>
> If you mean "is something going to be done about it in 8.4", the
> answer is no.  This is a pre-existing issue that there is no simple
> fix for.
>
>                        regards, tom lane

No no, I am certainly not implying anything for the 8.4 timeframe.
Moving forward, I imagine this being more of a problem for data
warehouse applications, where bulk inserts occur on existing fact
tables.  In this case, the WAL cannot be bypassed (unless the bulk
insert occurs on a newly created partition).  And since COPY is
CPU-bound, it would perhaps be advantageous to do parallel COPYs on
the same table on multi-core systems, which won't work with WAL
bypassing either.

Thanks,

Alan
Re: [HACKERS] 8.4 open item: copy performance regression?
On Sat, 20 Jun 2009, Simon Riggs wrote:
> I would suggest that we check how much WAL has been written.  There
> may be a secondary effect or a different regression hidden in these
> results.

What's the easiest way to do that?  My first thought was to issue a
checkpoint before the test (which is a good idea to make each test
consistent anyway), save the output from pg_controldata, test,
checkpoint, and look at the control data again.  This seems kind of
clunky though, but still better than trolling through the OS
statistics for the data.  Any clever ideas for a better way to measure
bytes of WAL written during a particular chunk of code?

We may need some sort of checkpoint/sync after the test to get correct
results, because I've noticed that the tests I run sometimes continue
writing out buffers for a few seconds after the test time is finished.

--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] 8.4 open item: copy performance regression?
Tom Lane wrote:
> Simon Riggs si...@2ndquadrant.com writes:
>> On Sat, 2009-06-20 at 13:15 +0200, Stefan Kaltenbrunner wrote:
>>> 8192     6m43.203s/6m48.293s
>>> 16384    6m24.980s/6m24.116s
>>> 32768    6m20.753s/6m22.083s
>>> 65536    6m22.913s/6m22.449s
>>> 1048576  6m23.765s/6m24.645s
>>
>> The rest of the patch should have had a greater effect on tables
>> with thinner rows.  Your results match my expectations, though I
>> read from them that we should use 16384, since that provides some
>> gain, not just a cancellation of the regression.
>
> +1 for using 16384 (ie, max ring buffer size 16MB).  Maybe even more.
> It seems likely that other cases might have an even bigger issue than
> is exhibited in the couple of test cases we have here, so we should
> leave some margin for error.  Also, there's code in there to limit
> the ring buffer to 1/8th of shared buffers, so we don't have to worry
> about trashing the whole buffer arena in small configurations.  Any
> limitation at all is still a step forward over previous releases as
> far as not trashing the arena is concerned.

+1.  You might get away with a smaller ring with narrow tables, where
writing 16MB of data produces more than 16MB of WAL, but I don't think
it can ever be the other way round.  Leaving a little bit of room for
error doesn't seem like a bad idea, though.

IIRC we experimented with an auto-tuning ring size when we worked on
the original ring buffer patch.  The idea is that you start with a
small ring, and enlarge it in StrategyRejectBuffer.  But that seems
too risky for 8.4.

I wonder if using the small ring showed any benefit when the COPY is
not WAL-logged?  In that scenario the block-on-WAL-flush behavior
doesn't happen, so the small ring might have some L2 cache benefits.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] 8.4 open item: copy performance regression?
Heikki Linnakangas wrote: Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On Sat, 2009-06-20 at 13:15 +0200, Stefan Kaltenbrunner wrote: 8192 6m43.203s/6m48.293s 16384 6m24.980s/6m24.116s 32768 6m20.753s/6m22.083s 65536 6m22.913s/6m22.449s 1048576 6m23.765s/6m24.645s The rest of the patch should have had a greater effect on tables with thinner rows. Your results match my expectations, though I read from them that we should use 16384, since that provides some gain, not just a cancellation of the regression. +1 for using 16384 (ie, max ring buffer size 16MB). Maybe even more. It seems likely that other cases might have an even bigger issue than is exhibited in the couple of test cases we have here, so we should leave some margin for error. Also, there's code in there to limit the ring buffer to 1/8th of shared buffers, so we don't have to worry about trashing the whole buffer arena in small configurations. Any limitation at all is still a step forward over previous releases as far as not trashing the arena is concerned. +1. You might get away with a smaller ring with narrow tables, where writing 16MB of data produces more than 16MB of WAL, but I don't think it can ever be the other way round. Leaving a little bit of room for error doesn't seem like a bad idea, though. yeah 16MB seems like the best choice given the available data and how far we are into the release cycle. IIRC we experimented with an auto-tuning ring size when we worked on the original ring buffer patch. The idea is that you start with a small ring, and enlarge it in StrategyRejectBuffer. But that seems too risky for 8.4. agreed. I wonder if using the small ring showed any benefit when the COPY is not WAL-logged? In that scenario block-on-WAL-flush behavior doesn't happen, so the small ring might have some L2 cache benefits. 
I did some limited testing on that but I was unable to measure any significant effect - especially since the difference between wal-logged and not is rather small for a non-parallel COPY (i.e. in the above example you get around 6m20s runtime for the wal-logged case and ~5m40s in the other case). Stefan
Re: [HACKERS] 8.4 open item: copy performance regression?
On Sun, 2009-06-21 at 10:28 +0200, Stefan Kaltenbrunner wrote: I did some limited testing on that but I was unable to measure any significant effect - especially since the difference between wal-logged and not is rather small for a non-parallel COPY (ie in the above example you get around 6m20s runtime for wal-logged and ~5m40s in the other case).

This is a common confusion for small tests. The non-WAL-logged case causes all buffers to be written to disk at the end of the COPY, and that write volume is roughly the same size as the volume of WAL written. In the logged case we do not write the data blocks; they get written at the next checkpoint. So the reduction in I/O is not apparent: during the period of the test, the I/O is about the same in both cases, even though the non-WAL-logged case does less I/O overall. On longer tests the difference shows more clearly, because the data blocks start to migrate out of shared buffers while the COPY is still running, affecting the test results.

-- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] 8.4 open item: copy performance regression?
On Sun, 2009-06-21 at 02:45 -0400, Greg Smith wrote: On Sat, 20 Jun 2009, Simon Riggs wrote: I would suggest that we check how much WAL has been written. There may be a secondary effect or a different regression hidden in these results. What's the easiest way to do that? pg_current_xlog_insert_location() -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
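As a sketch of how that would be used: take a pg_current_xlog_insert_location() reading just before and just after the COPY under test, then convert the "hi/lo" hex text format into byte offsets and subtract. The simple 32-bit shift below is exact within one 4GB xlog file; note that pre-9.3 servers skip the last segment of each 4GB file, so differences that cross a file boundary are slightly overstated by this math.

```python
# Sketch: byte distance between two pg_current_xlog_insert_location() readings,
# e.g. taken just before and just after the COPY being measured.
def xlog_to_bytes(loc):
    # Locations print as two hex halves, e.g. "0/16B6C50".
    hi, lo = loc.split('/')
    return (int(hi, 16) << 32) | int(lo, 16)

def wal_written(start, end):
    return xlog_to_bytes(end) - xlog_to_bytes(start)

print(wal_written("0/1000000", "0/2000000"))  # 16777216, i.e. one 16MB segment
```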
Re: [HACKERS] 8.4 open item: copy performance regression?
Simon Riggs wrote: On Sun, 2009-06-21 at 10:28 +0200, Stefan Kaltenbrunner wrote: I did some limited testing on that but I was unable to measure any significant effect - especially since the difference between wal-logged and not is rather small for a non-parallel COPY (ie in the above example you get around 6m20s runtime for wal-logged and ~5m40s in the other case). This is a common confusion for small tests. Non-WAL logged case causes all buffers to be written to disk at end of COPY. This is roughly the same size as the volume of WAL written. In logged case we do not write data blocks, they get written at next checkpoint. So the reduction in I/O is not apparent, since during the period of the test the I/O is about the same in both cases. On longer tests the difference shows more clearly because the data blocks start to migrate out of shared buffers while the COPY is still running, affecting the test results.

I was actually testing with and without explicit CHECKPOINTing before/after the load (and also with longer runs) too - the difference is negligible, especially with only one process involved. I think the difference is simply not that large because we are still mostly CPU bound within COPY on reasonably fast IO-subsystems. Stefan
Re: [HACKERS] 8.4 open item: copy performance regression?
Stefan Kaltenbrunner wrote: Simon Riggs wrote: On Sun, 2009-06-21 at 10:28 +0200, Stefan Kaltenbrunner wrote: I did some limited testing on that but I was unable to measure any significant effect - especially since the difference between wal-logged and not is rather small for a non-parallel COPY (ie in the above example you get around 6m20s runtime for wal-logged and ~5m40s in the other case). This is a common confusion for small tests. Non-WAL logged case causes all buffers to be written to disk at end of COPY. This is roughly the same size as the volume of WAL written. In logged case we do not write data blocks, they get written at next checkpoint. So the reduction in I/O is not apparent, since during the period of the test the I/O is about the same in both cases. On longer tests the difference shows more clearly because the data blocks start to migrate out of shared buffers while the COPY is still running, affecting the test results. I was actually testing with and without explicit CHECKPOINTing before/after the load (and also with longer runs) too - the difference is negligible, especially with only one process involved. I think the difference is simply not that large because we are still mostly CPU bound within COPY on reasonably fast IO-subsystems.

hmm to further demonstrate that I just did some testing (same config as before and the 16MB for the buffer) by loading those 60M rows into a 20GB ramdisk instead of the SAN (with a CHECKPOINT before and after). This results in the following improvements:

16384:
wal bypass: 5min40s -> 5min10s (~9%)
wal logged: 6min20s -> 6min8s  (~3%)

vmstat 5 output shows that the system is in fact CPU bound (i.e. using ~6%, which is more or less a full core on a 16 core box) and not doing anything IO-wise.
 r  b swpd     free   buff    cache si so bi  bo   in   cs us sy id wa st
 1  0    0 19010444 118648 15415684  0  0  0   6 1068 2151  6  0 94  0  0
 1  0    0 18870448 118652 120400 0 6 1069 2142 6 0 94 0 0
 1  0    0 18730568 118684 15694872  0  0  0 185 1080 2151  6  0 94  0  0
 1  0    0 18591236 118692 15834516  0  0  0  19 1072 2144  6  0 94  0  0
 1  0    0 18451472 118696 15973532  0  0  0  46 1073 2152  6  0 94  0  0
 1  0    0 18311720 118704 16113204  0  0  0   7 1059 2136  6  0 94  0  0
 1  0    0 18171968 118704 16252944  0  0  0   0 1077 2171  6  0 94  0  0
 1  0    0 18032088 118712 16392300  0  0  0  54 1062 2138  6  0 94  0  0
 1  0    0 17891716 118720 16532060  0  0  0   8 1078 2176  6  0 94  0  0

So I do think that IO is in fact not too significant for this kind of testing and we still have ways to go in terms of CPU efficiency in COPY. Stefan
Re: [HACKERS] 8.4 open item: copy performance regression?
On Sun, Jun 21, 2009 at 6:48 AM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: So I do think that IO is in fact not too significant for this kind of testing and we still have ways to go in terms of CPU efficiency in COPY.

It would be interesting to see some gprof or oprofile output from that test. I went back and dug up the results that I got when I profiled this patch during initial development, and with my version of the patch applied, the profile looked like this on my system:

  %   cumulative   self              self     total
 time   seconds    seconds    calls  s/call   s/call  name
14.48      0.85      0.85         1    0.85     5.47  DoCopy
10.05      1.44      0.59      1001    0.00     0.00  CopyReadLine
 5.62      1.77      0.33      1039    0.00     0.00  PageAddItem
 5.11      2.07      0.30  10400378    0.00     0.00  LWLockRelease
 4.68      2.35      0.28      1013    0.00     0.00  heap_insert
 4.34      2.60      0.26      1012    0.00     0.00  heap_formtuple
 3.83      2.83      0.23  10356158    0.00     0.00  LWLockAcquire
 3.83      3.05      0.23      1054    0.00     0.00  MarkBufferDirty
 3.32      3.25      0.20      1013    0.00     0.00  RelationGetBufferForTuple
 3.07      3.43      0.18      1005    0.00     0.00  pg_verify_mbstr_len
 2.90      3.60      0.17      1002    0.00     0.00  CopyGetData
 2.73      3.76      0.16      2030    0.00     0.00  enlargeStringInfo
 2.73      3.92      0.16      2014    0.00     0.00  pq_getbytes
 2.04      4.04      0.12      1000    0.00     0.00  InputFunctionCall

...but this might not be very representative, since I think I may have tested it on a single-column table. It would be interesting to see some other results. Simon had the idea of further improving performance by keeping the current buffer locked (this patch just kept it pinned, but not locked), but I didn't see an obvious clean design for that. Heikki also had a patch for speeding up copy, but it got dropped for 8.4 due to time constraints. ...Robert
Re: [HACKERS] 8.4 open item: copy performance regression?
Robert Haas robertmh...@gmail.com writes: It would be interesting to see some gprof or oprofile output from that test. I went back and dug up the results that I got when I profiled this patch during initial development, and with my version of the patch applied, the profile looked like this on my system: Were you testing with a temp table? The lack of XLogInsert in your profile is striking. Stefan's results upthread had it at the top, and I got more or less the same thing here (didn't keep my numbers unfortunately). Simon had the idea of further improving performance by keeping the current buffer locked (this patch just kept it pinned, but not locked), but I didn't see an obvious clean design for that. The potential for deadlock seems sufficient reason not to do that. regards, tom lane
Re: [HACKERS] 8.4 open item: copy performance regression?
Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: It would be interesting to see some gprof or oprofile output from that test. I went back and dug up the results that I got when I profiled this patch during initial development, and with my version of the patch applied, the profile looked like this on my system: Were you testing with a temp table? The lack of XLogInsert in your profile is striking. Stefan's results upthread had it at the top, and I got more or less the same thing here (didn't keep my numbers unfortunately).

I guess that profile was for the wal bypass case, and it looks fairly similar to what I get here (lineitem table into tmpfs - though only 30M rows this time to keep VM pressure low):

samples  %        symbol name
286197   17.1997  DoCopy
232958   14.0002  CopyReadLine
99762     5.9954  DecodeNumber
92751     5.5741  heap_fill_tuple
84439     5.0746  pg_verify_mbstr_len
65421     3.9316  InputFunctionCall
62502     3.7562  DecodeDate
53565     3.2191  heap_form_tuple
47731     2.8685  ParseDateTime
41206     2.4764  DecodeDateTime
39936     2.4001  pg_next_dst_boundary
36093     2.1691  AllocSetAlloc
33967     2.0413  heap_compute_data_size
29921     1.7982  float4in
27227     1.6363  DetermineTimeZoneOffset
25622     1.5398  pg_atoi
24703     1.4846  pg_mblen
24495     1.4721  .plt
23912     1.4371  pg_mbstrlen_with_len
23448     1.4092  bpchar_input
20033     1.2039  date2j
16331     0.9815  date_in
15684     0.9426  bpcharin
14819     0.8906  PageAddItem
14261     0.8571  ValidateDate

Stefan
Re: [HACKERS] 8.4 open item: copy performance regression?
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: I wonder if using the small ring showed any benefit when the COPY is not WAL-logged? In that scenario block-on-WAL-flush behavior doesn't happen, so the small ring might have some L2 cache benefits. I think the notion that we might get a cache win from a smaller ring is an illusion. We're not expecting to go back and re-read from a previously filled page in this scenario. In any case, all of the profiling results so far show that the CPU bottlenecks are elsewhere. Until we can squeeze an order of magnitude out of COPY's data parsing and/or XLogInsert, any possible cache effects will be down in the noise. So to my mind, the only question left to answer (at least for the 8.4 cycle) is "is 16MB enough, or do we want to make the ring even bigger?". Right at the moment I'd be satisfied with 16MB, but I wonder whether there are scenarios where 32MB would show a significant advantage. regards, tom lane
Re: [HACKERS] 8.4 open item: copy performance regression?
On Sun, Jun 21, 2009 at 11:31 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: It would be interesting to see some gprof or oprofile output from that test. I went back and dug up the results that I got when I profiled this patch during initial development, and with my version of the patch applied, the profile looked like this on my system: Were you testing with a temp table? No. The lack of XLogInsert in your profile is striking. Stefan's results upthread had it at the top, and I got more or less the same thing here (didn't keep my numbers unfortunately). As Stephen guessed, I created the table in the same transaction that I inserted into it... Simon had the idea of further improving performance by keeping the current buffer locked (this patch just kept it pinned, but not locked), but I didn't see an obvious clean design for that. The potential for deadlock seems sufficient reason not to do that. Yep, that was the problem. *thinks* I think we had the idea of buffering up enough tuples to fill a page (estimating conservatively so as to make sure we actually fill it), and then inserting them all at once. But I'm not sure how much trouble that causes in terms of the timing of inserting index entries and firing of after-row insert triggers. If the command ID doesn't change, it seems like it might be OK. Or at worst, even if the optimization would only work in cases where there are no after-row triggers, that would still be useful to a lot of people, I think. I haven't really spent much time on this angle of attack and completely confess to not understanding all of the issues... ...Robert
Re: [HACKERS] 8.4 open item: copy performance regression?
On Sun, Jun 21, 2009 at 11:52 AM, Tom Lane t...@sss.pgh.pa.us wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: I wonder if using the small ring showed any benefit when the COPY is not WAL-logged? In that scenario block-on-WAL-flush behavior doesn't happen, so the small ring might have some L2 cache benefits. I think the notion that we might get a cache win from a smaller ring is an illusion. We're not expecting to go back and re-read from a previously filled page in this scenario. In any case, all of the profiling results so far show that the CPU bottlenecks are elsewhere. Until we can squeeze an order of magnitude out of COPY's data parsing and/or XLogInsert, any possible cache effects will be down in the noise. So to my mind, the only question left to answer (at least for the 8.4 cycle) is "is 16MB enough, or do we want to make the ring even bigger?". Right at the moment I'd be satisfied with 16, but I wonder whether there are scenarios where 32MB would show a significant advantage. Even 32MB is not that much. It seems to me that in any realistic production scenario you're going to have at least half a gig of shared buffers, so we're really talking about at most one-sixteenth of the shared buffer arena, and possibly quite a bit less. I think that's pretty conservative. ...Robert
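For concreteness, the sizing rule discussed upthread (a fixed ring size, but capped at 1/8th of shared buffers) works out as below. This is a sketch of the arithmetic only, not the actual freelist.c code:

```python
BLCKSZ = 8192  # default PostgreSQL block size

def bulk_write_ring_buffers(ring_bytes, shared_buffers_bytes):
    """Number of buffers in the COPY ring: the configured ring size,
    but never more than 1/8th of the whole buffer arena."""
    ring = ring_bytes // BLCKSZ
    nbuffers = shared_buffers_bytes // BLCKSZ
    return min(ring, nbuffers // 8)

# With 512MB of shared_buffers, a 16MB ring fits comfortably:
print(bulk_write_ring_buffers(16 * 1024**2, 512 * 1024**2))  # 2048 buffers
# With only 32MB of shared_buffers, the 1/8th cap kicks in:
print(bulk_write_ring_buffers(16 * 1024**2, 32 * 1024**2))   # 512 buffers
```

So even doubling the ring to 32MB only matters for the cap once shared_buffers drops below 256MB, which is the point Robert is making.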
Re: [HACKERS] 8.4 open item: copy performance regression?
Greg Stark gsst...@mit.edu writes: There was some discussion of doing this in general for all inserts inside the indexam. The btree indexam could buffer up any inserts done within the transaction and keep them in an in-memory btree. Any lookups done within the transaction first look up in the in-memory tree then the disk. If the in-memory buffer fills up then we flush them to the index. The reason this is tempting is that we could then insert them all in a single index-merge operation which would often be more efficient than retail inserts. That's not gonna work for a unique index, which unfortunately is a pretty common case ... regards, tom lane
Re: [HACKERS] 8.4 open item: copy performance regression?
Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: I wonder if using the small ring showed any benefit when the COPY is not WAL-logged? In that scenario block-on-WAL-flush behavior doesn't happen, so the small ring might have some L2 cache benefits. I think the notion that we might get a cache win from a smaller ring is an illusion. We're not expecting to go back and re-read from a previously filled page in this scenario. In any case, all of the profiling results so far show that the CPU bottlenecks are elsewhere. Until we can squeeze an order of magnitude out of COPY's data parsing and/or XLogInsert, any possible cache effects will be down in the noise.

we also need to take a serious look at our locking overhead - WAL-logged COPY is already taking a significant performance hit with just a second process running in parallel (into a separate table). I just did some testing using those 16MB buffers, the upthread mentioned postgresql.conf and a 20GB tmpfs, copying 3M rows (each) into a separate table of the same database:

processes  total time(s)  rows/s     rows/s per core
 1          17.5          171428.57  171428.57
 2          20.8          288461.54  144230.77
 4          25.5          470588.24  117647.06
 6          31.1          578778.14   96463.02
 8          41.4          579710.14   72463.77
10          63            476190.48   47619.05
12          89            404494.38   33707.87
14         116            362068.97   25862.07
16         151            317880.79   19867.55

the higher the process count the more erratic the box behaves - it shows a very high context switch rate (between 300000 and 400000/s) and a large amount of idle time (60%!).
example vmstat 5 output for the 12 process test:

 r  b swpd     free   buff    cache si so bi  bo   in     cs us sy id wa st
 7  0    0 21654500  45436 12932516  0  0  0   3 1079 336941 34  7 59  0  0
 6  0    0 21354044  45444 13232444  0  0  0  52 1068 341836 35  7 59  0  0
 4  0    0 21053832  45452 13531472  0  0  0  23 1082 341672 35  7 59  0  0
 9  0    0 20751136  45460 138600 041 1063 344117 35 7 59 0 0
 6  0    0 20443856  45468 14138116  0  0  0  14 1079 349398 35  7 58  0  0
 8  0    0 20136592  45476 164400 0 8 1060 351569 35 7 58 0 0
10  0    0 19836600  45484 14743320  0  0  0 144 1086 341533 35  7 58  0  0
 7  0    0 19540472  45492 15039616  0  0  0  94 1067 337731 36  7 58  0  0
 2  0    0 19258244  45500 15321156  0  0  0  15 1079 311394 34  6 60  0  0

Stefan
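For reference, the derived columns in the scaling table above are straightforward to recompute from the raw wall-clock timings (3,000,000 rows loaded per process, per the message):

```python
# Recompute rows/s and rows/s-per-core from the measured wall-clock times,
# matching the parallel-COPY scaling table above.
def copy_throughput(processes, total_seconds, rows_per_process=3_000_000):
    rows_per_sec = processes * rows_per_process / total_seconds
    return rows_per_sec, rows_per_sec / processes

total, per_core = copy_throughput(8, 41.4)
print(round(total, 2), round(per_core, 2))  # 579710.14 72463.77
```

The per-core column dropping from ~171k rows/s at 1 process to ~20k at 16 is what points at the lock contention discussed next.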
Re: [HACKERS] 8.4 open item: copy performance regression?
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes: The following copying 3M rows (each) into a separate table of the same database. Is this with WAL, or bypassing WAL? Given what we've already seen, a lot of contention for WALInsertLock wouldn't surprise me much here. It should be possible to bypass that though. regards, tom lane
Re: [HACKERS] 8.4 open item: copy performance regression?
Robert Haas wrote: On Sun, Jun 21, 2009 at 11:52 AM, Tom Lane t...@sss.pgh.pa.us wrote: So to my mind, the only question left to answer (at least for the 8.4 cycle) is is 16MB enough, or do we want to make the ring even bigger?. Right at the moment I'd be satisfied with 16, but I wonder whether there are scenarios where 32MB would show a significant advantage. Even 32MB is not that much. It seems to me that in any realistic production scenario you're going to have at least half a gig of shared buffers, so we're really talking about at most one-sixteenth of the shared buffer arena, and possibly quite a bit less. I think that's pretty conservative. I was going to say that since we flush the WAL every 16MB anyway (at every XLOG file switch), you shouldn't see any benefit with larger ring buffers, since to fill 16MB of data you're not going to write more than 16MB WAL. But then I realized that that's not true if you have an unusually low fillfactor. If you only fill each page say 50% full, you're going to use 32MB worth of data pages but only write 16MB of WAL. And maybe you could have a situation like that with very wide rows as well, with wasted space on each page that's not enough to store one more row. Could you test that scenario? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
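The arithmetic behind that scenario can be sketched out explicitly (illustrative back-of-envelope math, not measured data):

```python
# At fill fraction f, every byte of WAL-logged tuple data dirties roughly
# 1/f bytes of data pages, so one 16MB WAL flush cycle can correspond to
# more than 16MB of pages cycling through the ring buffer.
def data_pages_mb(wal_mb, fill_fraction):
    return wal_mb / fill_fraction

print(data_pages_mb(16, 0.5))   # 32.0 MB of pages per 16MB of WAL at fillfactor 50
print(data_pages_mb(16, 1.0))   # 16.0 MB at fully packed pages
```

That 2x multiplier at 50% fill is exactly why a ring larger than 16MB might start to pay off in the low-fillfactor case.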
Re: [HACKERS] 8.4 open item: copy performance regression?
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: I was going to say that since we flush the WAL every 16MB anyway (at every XLOG file switch), you shouldn't see any benefit with larger ring buffers, since to fill 16MB of data you're not going to write more than 16MB WAL. I'm not convinced that WAL segment boundaries are particularly relevant to this. The unit of flushing is an 8K page, not a segment. I wonder though whether the wal_buffers setting interacts with the ring size. Has everyone who's tested this used the same 16MB wal_buffers setting as in Alan's original scenario? regards, tom lane
Re: [HACKERS] 8.4 open item: copy performance regression?
Tom Lane wrote: Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes: The following copying 3M rows (each) into a separate table of the same database. Is this with WAL, or bypassing WAL? Given what we've already seen, a lot of contention for WALInsertLock wouldn't surprise me much here. It should be possible to bypass that though.

this was with WAL. here are the numbers for bypass:

processes  total time(s)  rows/s       rows/s per core
 1          15             200000       200000
 2          15.1           397350.99    198675.5
 4          15.2           789473.68    197368.42
 6          15.3          1176470.59    196078.43
 8          16.2          1481481.48    185185.19
10          21.9          1369863.01    136986.3
12          22.7          1585903.08    132158.59
14          25.2          1666666.67    119047.62
16          27.9          1720430.11    107526.88

runtimes grew very short here but the numbers still seem sane, and if you compare them to what I did on real storage (though without the 16MB ringbuffer fix!): http://www.kaltenbrunner.cc/blog/index.php?/archives/27-Benchmarking-8.4-Chapter-2bulk-loading.html you will see that for a single core there is almost no performance difference between ramdisk and real disk; at 8 cores there is the largest gap, at around 45%, but at 16 cores we are down to a mere 20% difference. All in all it seems that we have a big locking issue with WALInsertLock, and even with that removed we are mostly CPU limited and not IO limited for COPY. Stefan
Re: [HACKERS] 8.4 open item: copy performance regression?
On Sun, 2009-06-21 at 12:38 -0400, Tom Lane wrote: Greg Stark gsst...@mit.edu writes: There was some discussion of doing this in general for all inserts inside the indexam. The btree indexam could buffer up any inserts done within the transaction and keep them in an in-memory btree. Any lookups done within the transaction first look up in the in-memory tree then the disk. If the in-memory buffer fills up then we flush them to the index. The reason this is tempting is that we could then insert them all in a single index-merge operation which would often be more efficient than retail inserts. That's not gonna work for a unique index, which unfortunately is a pretty common case ... I think it can. If we fail on a unique index we fail. We aren't expecting that, else we wouldn't be using COPY. So I reckon it's acceptable to load a whole block of rows and then load a whole block's worth of index entries. The worst thing that can happen is we insert a few extra heap rows that get aborted, which is small in comparison to the potential gains from buffering. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] 8.4 open item: copy performance regression?
On Sun, 2009-06-21 at 20:37 +0300, Heikki Linnakangas wrote: Robert Haas wrote: On Sun, Jun 21, 2009 at 11:52 AM, Tom Lane t...@sss.pgh.pa.us wrote: So to my mind, the only question left to answer (at least for the 8.4 cycle) is is 16MB enough, or do we want to make the ring even bigger?. Right at the moment I'd be satisfied with 16, but I wonder whether there are scenarios where 32MB would show a significant advantage. Even 32MB is not that much. It seems to me that in any realistic production scenario you're going to have at least half a gig of shared buffers, so we're really talking about at most one-sixteenth of the shared buffer arena, and possibly quite a bit less. I think that's pretty conservative. I was going to say that since we flush the WAL every 16MB anyway (at every XLOG file switch), you shouldn't see any benefit with larger ring buffers, since to fill 16MB of data you're not going to write more than 16MB WAL. But then I realized that that's not true if you have an unusually low fillfactor. If you only fill each page say 50% full, you're going to use 32MB worth of data pages but only write 16MB of WAL. And maybe you could have a situation like that with very wide rows as well, with wasted space on each page that's not enough to store one more row. If walwriter is working correctly then it should be writing and fsyncing WAL, while the COPY process just inserts WAL. I don't see that as an argument to limit us to 16MB. But I take your point as being an argument in favour of that as a consensus value for us to choose. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] 8.4 open item: copy performance regression?
On Sat, 20 Jun 2009, Simon Riggs wrote: At the time, I also proposed a filled buffer list change to bufmgr to allow bgwriter to preferentially target COPY's filled blocks, which would also help with this effect. One of the things I keep meaning to investigate is whether there's any benefit to pushing buffers the background writer cleans onto the free list, to speed up their subsequent allocation to needy backends. Both this and the sequential scan ring buffer implementation might benefit from an approach where buffers are similarly pushed onto the free list when they're no longer needed by the process that pulled them in. Buffers could move from allocated -> used -> filled buffer list -> free list in the COPY buffer case, and allocated -> used -> free list when executing a sequential scan. That would seem to me to be a more robust general approach for solving this class of problem than the whole ring buffer idea, which is a great start but bound to run into situations where the size of the buffer just isn't right anymore a few hardware generations down the road. -- * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] 8.4 open item: copy performance regression?
On Sat, 2009-06-20 at 02:53 -0400, Greg Smith wrote: On Sat, 20 Jun 2009, Simon Riggs wrote: At the time, I also proposed a filled buffer list change to bufmgr to allow bgwriter to preferentially target COPY's filled blocks, which would also help with this effect. One of the things I keep meaning to investigate is whether there's any benefit to pushing buffers the background writer cleans onto the free list, to speed up their subsequent allocation to needy backends. Both this and the sequential scan ring buffer implementation might benefit from an approach where buffers are similarly pushed onto the free list when they're no longer needed by the process that pulled them in. Buffers could move from allocated -> used -> filled buffer list -> free list in the COPY buffer case, and allocated -> used -> free list when executing a sequential scan. The reason for not doing that would be that we don't know that the blocks are free to use; we know very little about them. The longer we leave them the more likely they are to be reused, so putting buffers onto the freelist when they aren't actually free would likely make the cache less effective, ISTM. With filled buffers from COPY we have a very good clue that the buffers will no longer be needed and can treat them differently from others. Also, if we can get the bgwriter to do some of the work then we would have the COPY process, bgwriter and WALwriter all working together on the data loading. (We need to make WALwriter a little smarter also, so that it can respond to changing WAL write workloads.) That would seem to me to be a more robust general approach for solving this class of problem than the whole ring buffer idea, which is a great start but bound to run into situations where the size of the buffer just isn't right anymore a few hardware generations down the road. The ring buffer optimises L2 cache, not the buffer cache in general.
If we put buffers back on the freelist that is the same as having one global ring buffer, which would then spoil the benefit for L2 on multi-CPU systems. We don't see any L2 benefit with COPY yet, but it is potentially there if we can overcome the stronger effect of the WAL costs. When the ring buffer expands to minimise WAL overheads we also minimise benefit for L2. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] 8.4 open item: copy performance regression?
On Fri, 2009-06-19 at 22:03 -0400, Greg Smith wrote: This makes me wonder if in addition to the ring buffering issue, there isn't just plain more writing per average completed transaction in 8.4 with this type of COPY. I would suggest that we check how much WAL has been written. There may be a secondary effect or a different regression hidden in these results. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] 8.4 open item: copy performance regression?
On Sat, Jun 20, 2009 at 9:22 AM, Simon Riggs si...@2ndquadrant.com wrote: That would seem to me to be a more robust general approach for solving this class of problem than the whole ring buffer idea, which is a great start but bound to run into situations where the size of the buffer just isn't right anymore a few hardware generations down the road. The ring buffer optimises L2 cache, not the buffer cache in general. If we put buffers back on the freelist that is the same as having one global ring buffer, which would then spoil the benefit for L2 on multi-CPU systems. Incidentally, a ring buffer is something else. We're talking about a clock-sweep algorithm. The whole point of the clock-sweep algorithm is that it's an approximated LRU but with no single point of contention like a list pointer. Doing this would be undermining that advantage. I don't understand what you mean by size of the buffer either. The only real parameter in the clock sweep algorithm is how many distinct counter values there are. The more values, the closer to an LRU the result is. It isn't really tied to hardware in any way, just the usage pattern. -- greg http://mit.edu/~gsstark/resume.pdf
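To make the terminology concrete, here is a minimal toy sketch of a clock-sweep victim search of the kind being described (hypothetical illustration, not PostgreSQL's actual bufmgr code; PostgreSQL caps usage counts at 5):

```python
MAX_USAGE = 5  # distinct counter values: the algorithm's one real parameter

class ClockSweep:
    def __init__(self, nbuffers):
        self.usage = [0] * nbuffers
        self.hand = 0

    def access(self, buf):
        # Touching a buffer bumps its usage count, saturating at the cap.
        self.usage[buf] = min(self.usage[buf] + 1, MAX_USAGE)

    def evict(self):
        # Advance the hand, decrementing counts; the first buffer found
        # with a zero count is the victim. There is no global LRU list,
        # hence no single list pointer for every backend to contend on.
        while True:
            buf = self.hand
            self.hand = (self.hand + 1) % len(self.usage)
            if self.usage[buf] == 0:
                return buf
            self.usage[buf] -= 1

cs = ClockSweep(3)
cs.access(0)
cs.access(1)
print(cs.evict())  # 2 (the only buffer whose usage count is already 0)
```

The more distinct counter values, the closer the sweep approximates true LRU, which is the knob Greg refers to.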
Re: [HACKERS] 8.4 open item: copy performance regression?
Greg Smith wrote: On Fri, 19 Jun 2009, Stefan Kaltenbrunner wrote: In my case both the CPU (an Intel E5530 Nehalem) and the IO subsystem (8GB Fiberchannel connected NetApp with 4GB cache) are pretty fast. The server Alan identified as Solaris 10 8/07 s10x_u4wos_12b X86 has a Xeon E5320 (1.86GHz) and a single boring SAS drive in it (Sun X4150). The filesystem involved in that particular case is UFS, which I am suspicious of as being part of why the problem is so pronounced there--the default UFS tuning is pretty lightweight in terms of how much caching it does. Not sure if Alan ran any tests against the big ZFS volume on the other server, I think all the results he posted were from the UFS boot drive there too. so 4096 * 1024 / BLCKSZ seems to be the sweet spot and also results in more or less the same performance that 8.3 had. It looks like it's a touch higher on our 8/07 system, it levels out at 8192 * (haven't checked the other one yet). I'm seeing this, using Alan's original test set size (to make sure I was running exactly the same test) and just grabbing the low/high from a set of 3 runs:

8.3.7: 0m39.266s 0m43.269s (alan: 36.2 - 39.2)
256:   0m50.435s 0m51.944s (alan: 48.1 - 50.6)
1024:  0m47.299s 0m49.346s
4096:  0m43.725s 0m46.116s
8192:  0m40.715s 0m42.480s
16384: 0m41.318s 0m42.118s
65536: 0m41.675s 0m42.955s

hmm interesting - I just did a bunch of runs using the lineitem table from the DBT3 tests (loading 60M rows in each run) and the same config Alan used. 8.4 (postpatch - not RC1, but that one seems to behave exactly the same way):

lineitem1
256     9min38s
512     9min20s
1024    7m44.667s/7m45.342s
2048    7m15.500s/7m17.910s
4096    7m11.424s/7m13.276s
8192    6m43.203s/6m48.293s
16384   6m24.980s/6m24.116s
32768   6m20.753s/6m22.083s
65536   6m22.913s/6m22.449s
1048576 6m23.765s/6m24.645s
8.3     6m45.650s/6m44.781s

so on this workload the sweet spot seems to be much higher than on the one with the narrower rows. [...]
That's actually doing less I/O per capita, which is why it's also got less waiting for I/O%, but it's completing the most work. This makes me wonder if in addition to the ring buffering issue, there isn't just plain more writing per average completed transaction in 8.4 with this type of COPY. This might explain why even with the expanded ring buffer, both Stefan's and my test runs still showed a bit of a regression against 8.3. I'm guessing we have a second, smaller shooter here involved as well. well yes I also suspect that there is some secondary effect at play here, and I believe I have seen the "more IO with 8.4" thing here too, but I have not actually paid enough attention yet to be sure. In any case, a bump of the ring multiplier to either 4096 or 8192 eliminates the worst of the regression here, good improvement so far. yeah with the above numbers I would say that 8192 should remove most if not all of the regression. However it seems that we might have to make this more dynamic in the future since the behaviour seems to depend on a number of variables... Stefan
Re: [HACKERS] 8.4 open item: copy performance regression?
On Sat, Jun 20, 2009 at 12:10 PM, Greg Stark gsst...@mit.edu wrote: I don't understand what you mean by size of the buffer either. Ok, having gone back and read the whole thread I understand the context for that statement. Never mind. -- greg http://mit.edu/~gsstark/resume.pdf
Re: [HACKERS] 8.4 open item: copy performance regression?
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes: and I believe I have seen the more IO with 8.4 thing here too but I have not actually paid enough attention yet to be sure. FSM/VM overhead maybe? I think COPY IN is setting the SKIP_FSM bit, but I wonder if there's some vestigial overhead anyway. regards, tom lane
Re: [HACKERS] 8.4 open item: copy performance regression?
On Sat, 2009-06-20 at 13:15 +0200, Stefan Kaltenbrunner wrote:

8.3.7: 0m39.266s 0m43.269s (alan: 36.2 - 39.2)
8192:  0m40.715s 0m42.480s
16384: 0m41.318s 0m42.118s
65536: 0m41.675s 0m42.955s

hmm interesting - I just did a bunch of runs using the lineitem table from the DBT3 tests (loading 60M rows in each run) and the same config Alan used. 8.4 (postpatch - not RC1, but that one seems to behave exactly the same way):

lineitem1
8192    6m43.203s/6m48.293s
16384   6m24.980s/6m24.116s
32768   6m20.753s/6m22.083s
65536   6m22.913s/6m22.449s
1048576 6m23.765s/6m24.645s
8.3     6m45.650s/6m44.781s

so on this workload the sweet spot seems to be much higher than on the one with the narrower rows. The rest of the patch should have had a greater effect on tables with thinner rows. Your results match my expectations, though I read from them that we should use 16384, since that provides some gain, not just a cancellation of the regression. I would suggest that we leave it as a tunable parameter in this release and remove it again once we have clear evidence of how to set it. We are unlikely to cover conclusively how to do this before we release 8.4. In any case, a bump of the ring multiplier to either 4096 or 8192 eliminates the worst of the regression here, good improvement so far. yeah with the above numbers I would say that 8192 should remove most if not all of the regression. However it seems that we might have to make this more dynamic in the future since the behaviour seems to depend on a number of variables... I would be inclined to repeat this with multiple concurrent COPYs. We may be able to improve on the numbers there, as well as get a perspective on how to set the parameter in the real world. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] 8.4 open item: copy performance regression?
On Sat, 20 Jun 2009, Simon Riggs wrote: The reason for not doing that would be that we don't know that the blocks are free to use; we know very little about them. The longer we leave them the more likely they are to be reused, so putting buffers onto the freelist when they aren't actually free would likely make the cache less effective, ISTM. This is the prevailing wisdom. The current design of the background writer tries to make a reasonable guess as to how many buffers are going to be allocated in the next bg_writer_delay period, and it locates/generates clean buffers to fill that expected need. The idea would be to put those buffers, which have a good chance of being allocated by somebody else and therefore invalided shortly, onto the free list. There's a certain amount of overhead the backends themselves wait for in order to advance the clock sweep position to find the buffers they need, and that adds some latency to them. I would guess there's some potential for reducing that latency if the freelist was sometimes populated, which right now it rarely is. While that would add contention onto the free list, I wonder if it might still be less than the contention on advancing the current buffer pool strategy point. I don't want to go through the whole thing in this thread, just pointing out some similarity with an item I'd already proposed onto the TODO list--and that a good solution there might get rid of this whole how big do I make the ring buffer? situation. Obviously actual measurements here would trump any theorizing as to what works better, it's hard to get any intuition about low-level optimizing given how complicated CPU caches are nowadays. -- * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
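Greg's proposal, the background writer stocking a freelist that backends check before falling back to the clock sweep, could be sketched roughly as below. This is a hypothetical toy model, not bufmgr code; the names and the sweep-step counter (a stand-in for allocation latency) are inventions for illustration:

```python
from collections import deque

class BufferPool:
    """Backends pop from a freelist the background writer keeps stocked,
    falling back to the clock sweep only when the freelist is empty."""
    def __init__(self, nbuffers):
        self.usage = [0] * nbuffers   # per-buffer usage counts
        self.freelist = deque()
        self.hand = 0
        self.sweep_steps = 0          # proxy for sweep-advance latency

    def bgwriter_fill(self, target):
        # Background writer anticipates demand and stocks clean buffers,
        # absorbing the sweep cost outside the backends' critical path.
        while len(self.freelist) < target:
            self.freelist.append(self._clock_sweep())

    def allocate(self):
        if self.freelist:
            return self.freelist.popleft()   # cheap path for backends
        return self._clock_sweep()           # backend pays sweep latency

    def _clock_sweep(self):
        while True:
            self.sweep_steps += 1
            i = self.hand
            self.hand = (self.hand + 1) % len(self.usage)
            if self.usage[i] == 0:
                return i
            self.usage[i] -= 1
```

The design question Greg raises is exactly the trade visible here: the freelist turns sweep latency into background work, at the cost of a new shared structure that every allocation touches.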
Re: [HACKERS] 8.4 open item: copy performance regression?
Simon Riggs si...@2ndquadrant.com writes: On Sat, 2009-06-20 at 13:15 +0200, Stefan Kaltenbrunner wrote:

8192    6m43.203s/6m48.293s
16384   6m24.980s/6m24.116s
32768   6m20.753s/6m22.083s
65536   6m22.913s/6m22.449s
1048576 6m23.765s/6m24.645s

The rest of the patch should have had a greater effect on tables with thinner rows. Your results match my expectations, though I read from them that we should use 16384, since that provides some gain, not just a cancellation of the regression. +1 for using 16384 (ie, max ring buffer size 16MB). Maybe even more. It seems likely that other cases might have an even bigger issue than is exhibited in the couple of test cases we have here, so we should leave some margin for error. Also, there's code in there to limit the ring buffer to 1/8th of shared buffers, so we don't have to worry about trashing the whole buffer arena in small configurations. Any limitation at all is still a step forward over previous releases as far as not trashing the arena is concerned. I would suggest that we leave it as a tunable parameter in this release and remove it again once we have clear evidence of how to set it. It's not going to become an exposed tunable. There is no evidence to justify that, and we are down to the point of the release cycle where anything more than one-liner changes has to be looked at with considerable reluctance. regards, tom lane
Re: [HACKERS] 8.4 open item: copy performance regression?
It doesn't look like it's related to autovacuum. I re-ran the test against the two Solaris boxes with autovacuum turned off and the results look about the same.

8.3.7 - Solaris 10 11/06 s10x_u3wos_10 X86
real 0m43.662s  user 0m0.001s  sys 0m0.003s
real 0m43.565s  user 0m0.001s  sys 0m0.003s
real 0m43.742s  user 0m0.001s  sys 0m0.003s

8.4rc1 - Solaris 10 11/06 s10x_u3wos_10 X86
real 0m59.304s  user 0m0.001s  sys 0m0.003s
real 0m58.653s  user 0m0.001s  sys 0m0.003s
real 1m0.253s   user 0m0.001s  sys 0m0.003s

8.3.7 - Solaris 10 8/07 s10x_u4wos_12b X86
real 0m38.981s  user 0m0.002s  sys 0m0.004s
real 0m39.879s  user 0m0.002s  sys 0m0.004s
real 0m39.111s  user 0m0.002s  sys 0m0.004s

8.4rc1 - Solaris 10 8/07 s10x_u4wos_12b X86
real 0m50.647s  user 0m0.002s  sys 0m0.004s
real 0m49.453s  user 0m0.002s  sys 0m0.004s
real 0m49.725s  user 0m0.002s  sys 0m0.004s

Alan

On Thu, Jun 18, 2009 at 3:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: Tom Lane t...@sss.pgh.pa.us wrote: I am unable to duplicate any slowdown on this test case. [ Kevin can ] It'd be useful first off to figure out if it's a CPU or I/O issue. Is there any visible difference in vmstat output? Also, try turning off autovacuum in both cases, just to see if that's related. regards, tom lane
Re: [HACKERS] 8.4 open item: copy performance regression?
Yes, you are right. I thought that they were absolute function counts. The data makes more sense now. Regards, Ken On Thu, Jun 18, 2009 at 07:03:34PM -0500, Kevin Grittner wrote: Kenneth Marshall k...@rice.edu wrote: What is not clear from Stefan's function listing is how the 8.4 server could issue 33% more XLogInsert() and CopyReadLine() calls than the 8.3.7 server using the same input file. I thought those were profiling numbers -- the number of times a timer checked what was executing and found it in that method. Which suggests that those two methods are probably slower now than in 8.3.7, at least in some environments. -Kevin
Re: [HACKERS] 8.4 open item: copy performance regression?
Kevin Grittner wrote:

8.3.7
real 0m24.249s
real 0m24.054s
real 0m24.361s

8.4rc1
real 0m33.503s
real 0m34.198s
real 0m33.931s

Ugh. This looks like a poster child case for a benchfarm ... Is there any chance you guys could triangulate this a bit? Good initial triangulation points might be the end of each commitfest. (I have a vested interest in making sure COPY performance doesn't regress, since it will affect parallel restore's speed in spades.) cheers andrew
Re: [HACKERS] 8.4 open item: copy performance regression?
On 6/19/09, Andrew Dunstan and...@dunslane.net wrote: Kevin Grittner wrote:

8.3.7
real 0m24.249s
real 0m24.054s
real 0m24.361s

8.4rc1
real 0m33.503s
real 0m34.198s
real 0m33.931s

Ugh. This looks like a poster child case for a benchfarm ... Is there any chance you guys could triangulate this a bit? Good initial triangulation points might be the end of each commitfest. (I have a vested interest in making sure COPY performance doesn't regress, since it will affect parallel restore's speed in spades.) git bisect? -- marko
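Marko's suggestion works well for timing regressions when the benchmark is wrapped in a driver whose exit status tells `git bisect run` good from bad. A sketch of such a driver; the build/test commands, the file names, and the 30-second threshold are all assumptions for illustration (the real threshold would sit between the known-good and known-bad timings):

```python
#!/usr/bin/env python3
import subprocess
import sys
import time

# Hypothetical threshold between 8.3's ~24s runs and the regressed ~34s runs.
THRESHOLD_SECS = 30.0

def classify(elapsed, threshold=THRESHOLD_SECS):
    """git bisect run treats exit status 0 as good and 1-124 as bad."""
    return 0 if elapsed < threshold else 1

def timed_copy_test():
    # Hypothetical: rebuild the checked-out revision, then time the COPY test.
    subprocess.run(["make", "install"], check=True)
    start = time.monotonic()
    subprocess.run(["psql", "-f", "copy_test.sql"], check=True)
    return time.monotonic() - start

if __name__ == "__main__" and "--run" in sys.argv:
    # Invoked by: git bisect run ./bisect_copy.py --run
    sys.exit(classify(timed_copy_test()))
```

With `git bisect start <bad-commit> <good-commit>` followed by `git bisect run ./bisect_copy.py --run`, git then walks the history automatically; for a noisy timing test it helps to take the best of several runs before classifying.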
Re: [HACKERS] 8.4 open item: copy performance regression?
Andrew Dunstan wrote: Kevin Grittner wrote:

8.3.7
real 0m24.249s
real 0m24.054s
real 0m24.361s

8.4rc1
real 0m33.503s
real 0m34.198s
real 0m33.931s

Ugh. This looks like a poster child case for a benchfarm ... indeed... Is there any chance you guys could triangulate this a bit? Good initial triangulation points might be the end of each commitfest. (I have a vested interest in making sure COPY performance doesn't regress, since it will affect parallel restore's speed in spades.) Maybe parallel restore is the reason why we haven't noticed this earlier. The case that regressed this way is WAL-logged COPY; COPY that can bypass WAL (which typically happens in parallel restore now) is actually a bit faster in 8.4 in my testing. I will try and see if I can figure out what caused the regression... Stefan
Re: [HACKERS] 8.4 open item: copy performance regression?
Just eyeing the code ... another thing we changed since 8.3 is to enable posix_fadvise() calls for WAL. Any of the complainants want to try diking out this bit of code (near line 2580 in src/backend/access/transam/xlog.c)?

#if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
	if (!XLogArchivingActive() &&
		(get_sync_bit(sync_method) & PG_O_DIRECT) == 0)
		(void) posix_fadvise(openLogFile, 0, 0, POSIX_FADV_DONTNEED);
#endif

regards, tom lane
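For reference, the same cache-dropping hint is available from Python's os module, which makes it easy to experiment with the effect outside the server. A minimal sketch; the fsync-before-fadvise ordering mirrors what the xlog.c snippet above relies on for a WAL segment that won't be re-read:

```python
import os

def write_and_drop_cache(path, data):
    """Write a file durably, then hint the kernel that its pages won't be
    re-read soon, so they can be evicted from the OS page cache."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)
        os.fsync(fd)  # make the data durable before discarding cached pages
        # POSIX_FADV_DONTNEED is advisory; not every platform exposes it.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
```

The call is purely advisory: correctness is unaffected either way, which is why reverting it is a cheap experiment; only the cache-residency of the WAL pages (and hence later I/O behavior) changes.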
Re: [HACKERS] 8.4 open item: copy performance regression?
Tom Lane wrote: Just eyeing the code ... another thing we changed since 8.3 is to enable posix_fadvise() calls for WAL. Any of the complainants want to try diking out this bit of code (near line 2580 in src/backend/access/transam/xlog.c)?

#if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
	if (!XLogArchivingActive() &&
		(get_sync_bit(sync_method) & PG_O_DIRECT) == 0)
		(void) posix_fadvise(openLogFile, 0, 0, POSIX_FADV_DONTNEED);
#endif

doesn't seem to cause any noticeable difference for me... Stefan
Re: [HACKERS] 8.4 open item: copy performance regression?
Tom Lane wrote: Just eyeing the code ... another thing we changed since 8.3 is to enable posix_fadvise() calls for WAL. Any of the complainants want to try diking out this bit of code (near line 2580 in src/backend/access/transam/xlog.c)?

#if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
	if (!XLogArchivingActive() &&
		(get_sync_bit(sync_method) & PG_O_DIRECT) == 0)
		(void) posix_fadvise(openLogFile, 0, 0, POSIX_FADV_DONTNEED);
#endif

ok after a bit of bisecting I'm happy to announce the winner of the contest: http://archives.postgresql.org/pgsql-committers/2008-11/msg00054.php this patch causes a 25-30% performance regression for WAL-logged COPY; however, in the WAL bypass case (maybe that was what got tested?) it results in a 20% performance increase. the raw numbers using the upthread posted minimal postgresql.conf are:

post patch/wal logged: 4min10s/4min19/4min12
post patch/wal bypass: 1m55s/1m58s/2m00
prepatch/wal logged:   2m55s/3min00/2m59
prepatch/wal bypass:   2m22s/2m18s/2m20s

Stefan
Re: [HACKERS] 8.4 open item: copy performance regression?
On Fri, Jun 19, 2009 at 07:49:31PM +0200, Stefan Kaltenbrunner wrote: Tom Lane wrote: Just eyeing the code ... another thing we changed since 8.3 is to enable posix_fadvise() calls for WAL. Any of the complainants want to try diking out this bit of code (near line 2580 in src/backend/access/transam/xlog.c)?

#if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
	if (!XLogArchivingActive() &&
		(get_sync_bit(sync_method) & PG_O_DIRECT) == 0)
		(void) posix_fadvise(openLogFile, 0, 0, POSIX_FADV_DONTNEED);
#endif

ok after a bit of bisecting I'm happy to announce the winner of the contest: http://archives.postgresql.org/pgsql-committers/2008-11/msg00054.php this patch causes a 25-30% performance regression for WAL-logged COPY; however, in the WAL bypass case (maybe that was what got tested?) it results in a 20% performance increase. the raw numbers using the upthread posted minimal postgresql.conf are:

post patch/wal logged: 4min10s/4min19/4min12
post patch/wal bypass: 1m55s/1m58s/2m00
prepatch/wal logged:   2m55s/3min00/2m59
prepatch/wal bypass:   2m22s/2m18s/2m20s

Stefan

Great! Maybe just increasing the size of the BULKWRITE ring, possibly as a function of shared_buffers, is all that is needed. 256kB is the currently coded ring_size in storage/buffer/freelist.c Ken
Re: [HACKERS] 8.4 open item: copy performance regression?
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes: ok after a bit of bisecting I'm happy to announce the winner of the contest: http://archives.postgresql.org/pgsql-committers/2008-11/msg00054.php this patch causes a 25-30% performance regression for WAL logged copy, however in the WAL bypass case (maybe that was what got tested?) it results in a 20% performance increase. Hmm. What that patch actually changes is that it prevents a bulk insert (ie COPY in) from trashing the entire shared-buffers arena. I think the reason for the WAL correlation is that once it's filled the ring buffer, creating new pages requires writing out old ones, and the WAL-before-data rule means that the copy process has to block waiting for WAL to go down to disk before it can write. When it's allowed to use the whole arena there is more chance for some of that writing to be done by the walwriter or bgwriter. But the details are going to depend on the platform's CPU vs I/O balance, which no doubt explains why some of us don't see it. I don't think we want to revert that patch --- not trashing the whole buffer arena seems like a Good Thing from a system-wide point of view, even if it makes individual COPY operations go slower. However, we could maybe play around with the tradeoffs a bit. In particular it seems like it would be useful to experiment with different ring buffer sizes. Could you try increasing the ring size allowed in src/backend/storage/buffer/freelist.c for the BULKWRITE case

*** 384,389 ****
--- 384,392 ----
  		case BAS_BULKREAD:
  			ring_size = 256 * 1024 / BLCKSZ;
  			break;
+ 		case BAS_BULKWRITE:
+ 			ring_size = 256 * 1024 / BLCKSZ;
+ 			break;
  		case BAS_VACUUM:
  			ring_size = 256 * 1024 / BLCKSZ;
  			break;

and see if maybe we can buy back most of the loss with not too much of a ring size increase? regards, tom lane
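Tom's explanation, that with a small ring the backend itself must flush WAL before it can recycle each dirty buffer, can be illustrated with a toy model. Everything here is an assumption for illustration, not a measurement: the FIFO recycling, the fixed "walwriter lag" behind the insert point, and the page counts are all invented to show the shape of the tradeoff:

```python
def forced_wal_flushes(pages_loaded, ring_size, walwriter_lag):
    """Toy model: COPY dirties one page per step and recycles ring buffers
    FIFO. Reusing a dirty buffer whose WAL record hasn't been flushed yet
    forces a synchronous flush (the WAL-before-data rule). The walwriter
    asynchronously keeps WAL flushed up to `walwriter_lag` pages behind
    the insert point. Returns how many flushes the backend had to do."""
    flushed_upto = 0   # WAL is flushed through this page's record
    flushes = 0
    for page in range(pages_loaded):
        victim = page - ring_size      # buffer being recycled this step
        if victim >= 0:
            # background flushing has caught up to within walwriter_lag
            flushed_upto = max(flushed_upto, page - walwriter_lag)
            if victim >= flushed_upto:
                flushes += 1           # backend must flush WAL itself
                flushed_upto = victim + 1
    return flushes

# A ring smaller than the background flushing lag forces a flush on nearly
# every recycle; a ring larger than the lag forces none in this model.
small = forced_wal_flushes(100_000, ring_size=32, walwriter_lag=512)
large = forced_wal_flushes(100_000, ring_size=2048, walwriter_lag=512)
```

In the model, `small` comes out near the total page count while `large` is zero, which is the qualitative behavior behind the measured sweet spot: beyond the point where background flushing keeps up, further ring growth buys nothing.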
Re: [HACKERS] 8.4 open item: copy performance regression?
Tom Lane wrote: Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes: ok after a bit of bisecting I'm happy to announce the winner of the contest: http://archives.postgresql.org/pgsql-committers/2008-11/msg00054.php this patch causes a 25-30% performance regression for WAL logged copy, however in the WAL bypass case (maybe that was what got tested?) it results in a 20% performance increase. Hmm. What that patch actually changes is that it prevents a bulk insert (ie COPY in) from trashing the entire shared-buffers arena. I think the reason for the WAL correlation is that once it's filled the ring buffer, creating new pages requires writing out old ones, and the WAL-before-data rule means that the copy process has to block waiting for WAL to go down to disk before it can write. When it's allowed to use the whole arena there is more chance for some of that writing to be done by the walwriter or bgwriter. But the details are going to depend on the platform's CPU vs I/O balance, which no doubt explains why some of us don't see it. hmm - In my case both the CPU (an Intel E5530 Nehalem) and the IO subsystem (8GB Fiberchannel connected NetApp with 4GB cache) are pretty fast. and even with say fsync=off 8.4RC1 is only slightly faster than 8.3 with the same config and fsync=on so maybe there is a secondary effect at play too. I don't think we want to revert that patch --- not trashing the whole buffer arena seems like a Good Thing from a system-wide point of view, even if it makes individual COPY operations go slower. However, we could maybe play around with the tradeoffs a bit. In particular it seems like it would be useful to experiment with different ring buffer sizes. 
Could you try increasing the ring size allowed in src/backend/storage/buffer/freelist.c for the BULKWRITE case

*** 384,389 ****
--- 384,392 ----
  		case BAS_BULKREAD:
  			ring_size = 256 * 1024 / BLCKSZ;
  			break;
+ 		case BAS_BULKWRITE:
+ 			ring_size = 256 * 1024 / BLCKSZ;
+ 			break;
  		case BAS_VACUUM:
  			ring_size = 256 * 1024 / BLCKSZ;
  			break;

and see if maybe we can buy back most of the loss with not too much of a ring size increase? already started testing that once I found the offending commit.

256 * 1024 / BLCKSZ   4min10s/4min19/4min12
512 * 1024 / BLCKSZ   3min27s/3min32s
1024 * 1024 / BLCKSZ  3min14s/3min12s
2048 * 1024 / BLCKSZ  3min02/3min02
4096 * 1024 / BLCKSZ  2m59/2m58s
8192 * 1024 / BLCKSZ  2m59/2m59s

so 4096 * 1024 / BLCKSZ seems to be the sweet spot and also results in more or less the same performance that 8.3 had. Stefan
Re: [HACKERS] 8.4 open item: copy performance regression?
so 4096 * 1024 / BLCKSZ seems to be the sweet spot and also results in more or less the same performance that 8.3 had. Can some folks test this with different size COPYs? That's both larger/smaller tables, and larger/smaller rows. We should also test copy with large blob data. -- Josh Berkus PostgreSQL Experts Inc. www.pgexperts.com
Re: [HACKERS] 8.4 open item: copy performance regression?
On Fri, 2009-06-19 at 14:11 -0400, Tom Lane wrote: Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes: ok after a bit of bisecting I'm happy to announce the winner of the contest: http://archives.postgresql.org/pgsql-committers/2008-11/msg00054.php this patch causes a 25-30% performance regression for WAL logged copy, however in the WAL bypass case (maybe that was what got tested?) it results in a 20% performance increase. Hmm. What that patch actually changes is that it prevents a bulk insert (ie COPY in) from trashing the entire shared-buffers arena. I think the reason for the WAL correlation is that once it's filled the ring buffer, creating new pages requires writing out old ones, and the WAL-before-data rule means that the copy process has to block waiting for WAL to go down to disk before it can write. When it's allowed to use the whole arena there is more chance for some of that writing to be done by the walwriter or bgwriter. But the details are going to depend on the platform's CPU vs I/O balance, which no doubt explains why some of us don't see it. I don't think we want to revert that patch --- not trashing the whole buffer arena seems like a Good Thing from a system-wide point of view, even if it makes individual COPY operations go slower. However, we could maybe play around with the tradeoffs a bit. In particular it seems like it would be useful to experiment with different ring buffer sizes. Could you try increasing the ring size allowed in src/backend/storage/buffer/freelist.c for the BULKWRITE case Yes, that's definitely the cause. The ring buffer technique was prototyped in 8.3 and a ring of 32 blocks was found not to be sufficient, which was one reason we didn't try to commit that then. At the time, I also proposed a filled buffer list change to bufmgr to allow bgwriter to preferentially target COPY's filled blocks, which would also help with this effect. You'll get better benefit from decreasing wal_writer_delay also. 
-- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] 8.4 open item: copy performance regression?
On Fri, 19 Jun 2009, Stefan Kaltenbrunner wrote: In my case both the CPU (an Intel E5530 Nehalem) and the IO subsystem (8GB Fiberchannel connected NetApp with 4GB cache) are pretty fast. The server Alan identified as Solaris 10 8/07 s10x_u4wos_12b X86 has a Xeon E5320 (1.86GHz) and a single boring SAS drive in it (Sun X4150). The filesystem involved in that particular case is UFS, which I am suspicious of as being part of why the problem is so pronounced there--the default UFS tuning is pretty lightweight in terms of how much caching it does. Not sure if Alan ran any tests against the big ZFS volume on the other server, I think all the results he posted were from the UFS boot drive there too. so 4096 * 1024 / BLCKSZ seems to be the sweet spot and also results in more or less the same performance that 8.3 had. It looks like it's a touch higher on our 8/07 system, it levels out at 8192 * (haven't checked the other one yet). I'm seeing this, using Alan's original test set size (to make sure I was running exactly the same test) and just grabbing the low/high from a set of 3 runs:

8.3.7: 0m39.266s 0m43.269s (alan: 36.2 - 39.2)
256:   0m50.435s 0m51.944s (alan: 48.1 - 50.6)
1024:  0m47.299s 0m49.346s
4096:  0m43.725s 0m46.116s
8192:  0m40.715s 0m42.480s
16384: 0m41.318s 0m42.118s
65536: 0m41.675s 0m42.955s

I collected some iostat data here as well for some of the runs (the vmstat data was harder to read, this being Solaris, and didn't seem to add anything). I'm seeing lines like this with the default ring buffer of 256 *:

   tty        sd1           sd2           nfs1          cpu
 tin tout  kps tps serv  kps tps serv  kps tps serv  us sy wt id
 0 322 12 10 41371 275400 00 12 11 0 78
 0 166 0 00 46246 338000 00 14 10 0 76
 0 164 0 00 44874 306810 00 13 9 0 78

Obviously sd2 is where the database and source file are at. Basically, about one core (out of four) tied up with a pretty even split of user/system time.
Using the highest ring size I tried, 65536 *, gives lines that look like this:

   tty        sd1           sd2           nfs1          cpu
 tin tout  kps tps serv  kps tps serv  kps tps serv  us sy wt id
 0 163 0 00 56696 429100 00 20 12 0 68
 0 166 0 00 58554 454200 00 21 12 0 67
 0 168 0 00 56057 430800 00 21 12 0 67

So it seems like increasing the ring size helps saturate the disks better; throughput went from ~45MB/s to 57MB/s. What's kind of interesting is to compare this against the 8.3.7 run, which is the fastest of them all, and which I was expecting to find had the highest write rate of them all:

   tty        sd1           sd2           nfs1          cpu
 tin tout  kps tps serv  kps tps serv  kps tps serv  us sy wt id
 0 83 0 00 47654 212100 00 23 8 0 69
 0 240 0 00 44198 215010 00 19 8 0 73
 0 83 0 00 37750 111010 00 21 6 0 72

That's actually doing less I/O per capita, which is why it's also got less waiting for I/O%, but it's completing the most work. This makes me wonder if in addition to the ring buffering issue, there isn't just plain more writing per average completed transaction in 8.4 with this type of COPY. This might explain why even with the expanded ring buffer, both Stefan's and my test runs still showed a bit of a regression against 8.3. I'm guessing we have a second, smaller shooter here involved as well. In any case, a bump of the ring multiplier to either 4096 or 8192 eliminates the worst of the regression here, good improvement so far. -- * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD
[HACKERS] 8.4 open item: copy performance regression?
Any objections if I add: http://archives.postgresql.org/pgsql-performance/2009-06/msg00215.php to the (currently empty) list of open items for 8.4? A 25-30% performance regression in our main bulk loading mechanism should at least be explained before the release... Stefan
Re: [HACKERS] 8.4 open item: copy performance regression?
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes: Any objections if I add: http://archives.postgresql.org/pgsql-performance/2009-06/msg00215.php to the (currently empty) list of open items for 8.4? I am unable to duplicate any slowdown on this test case. AFAICT 8.4 and 8.3 branch tip are about the same speed; if anything 8.4 is faster. Testing on x86_64 Fedora 10 ... regards, tom lane
Re: [HACKERS] 8.4 open item: copy performance regression?
Tom Lane t...@sss.pgh.pa.us wrote: Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes: Any objections if I add: http://archives.postgresql.org/pgsql-performance/2009-06/msg00215.php to the (currently empty) list of open items for 8.4? I am unable to duplicate any slowdown on this test case. AFAICT 8.4 and 8.3 branch tip are about the same speed; if anything 8.4 is faster. Testing on x86_64 Fedora 10 ... I just ran the specified test on: Linux ATHENA 2.6.16.60-0.31-smp #1 SMP Tue Oct 7 16:16:29 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux Linux version 2.6.16.60-0.31-smp (ge...@buildhost) (gcc version 4.1.2 20070115 (SUSE Linux)) #1 SMP Tue Oct 7 16:16:29 UTC 2008 SUSE Linux Enterprise Server 10 (x86_64) VERSION = 10 PATCHLEVEL = 2 I did configure with a prefix and these options: --enable-integer-datetimes --enable-debug --disable-nls --with-libxml I did initdb with --no-locale

8.3.7
real 0m24.249s
real 0m24.054s
real 0m24.361s

8.4rc1
real 0m33.503s
real 0m34.198s
real 0m33.931s

-Kevin
Re: [HACKERS] 8.4 open item: copy performance regression?
Kevin Grittner kevin.gritt...@wicourts.gov writes:
> Tom Lane t...@sss.pgh.pa.us wrote:
>> I am unable to duplicate any slowdown on this test case.
> [ Kevin can ]

It'd be useful first off to figure out if it's a CPU or I/O issue. Is there any visible difference in vmstat output? Also, try turning off autovacuum in both cases, just to see if that's related.

regards, tom lane
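[Editorial aside: one quick way to act on the CPU-vs-I/O suggestion is to average the `us` (user CPU) and `wa` (I/O wait) columns of `vmstat` output for each run. A sketch, assuming the standard 17-column `vmstat` data layout; the two sample rows fed in below are made up for illustration, not taken from Kevin's run:]

```shell
#!/bin/sh
# Average the user-CPU (us) and I/O-wait (wa) columns from `vmstat 5` output,
# as a first cut at "CPU-bound vs I/O-bound".  On a live system you would
# pipe real output in, e.g.:  vmstat 5 12 | avg_cpu

avg_cpu() {
    # vmstat data rows have 17 numeric fields; us is $13, wa is $16.
    # Header rows are skipped: either NF != 17 or $1 is non-numeric ("r").
    awk 'NF == 17 && $1 ~ /^[0-9]+$/ {
             us += $13; wa += $16; n++
         }
         END { if (n) printf "us=%.1f wa=%.1f\n", us / n, wa / n }'
}

# Self-contained demo with two made-up sample rows:
avg_cpu <<'EOF'
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  1    240 1451224  70164 124356896  0    0  5580    47 5457 39514 15  3 74  8  0
 1  3    240 1428916  70172 124376424  0    0  5268   468 5449 33003 15  2 73 10  0
EOF
# prints: us=15.0 wa=9.0
```

A high `wa` average during the slower run would point at I/O; a high `us` average with low `wa` would point at CPU.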
Re: [HACKERS] 8.4 open item: copy performance regression?
Tom Lane t...@sss.pgh.pa.us wrote:
> It'd be useful first off to figure out if it's a CPU or I/O issue. Is there any visible difference in vmstat output? Also, try turning off autovacuum in both cases, just to see if that's related.

Both took slightly longer with autovacuum off, but probably just within the noise.

It's hard to interpret the vmstat output, largely because I chose to run this on one of our biggest servers, which is not currently serving an application, per se, but acting as a replication target; this being off hours, it is busy running the sync process to the source machines. That involves generating md5 sums on both sides for blocks of rows, which is pretty CPU-intensive. There is very little disk output from that right now, and it is pretty light on disk reads, but it keeps a few CPUs pretty busy generating those md5 sums.

I've got to go keep an appointment, but I'll come back and see if I can do more. For now, here's the raw vmstat, in case someone can pick out info I'm missing:

-Kevin

baseline:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
3 1240 1456928 70164 12435073600 10167800 6 1 89 4 0
4 0240 1451224 70164 12435689600 558047 5457 39514 15 3 74 8 0
2 1240 1445812 70172 12436100800 4988 543 5347 36680 15 3 73 9 0
3 1240 1440196 70172 12436614400 5928 501 3431 30250 14 3 76 8 0
3 1240 143 70172 12437231200 592047 6477 34794 16 2 73 9 0
1 3240 1428916 70172 12437642400 5268 468 5449 33003 15 2 73 10 0
1 1240 1422820 70172 12438259200 5036 107 3995 26131 12 2 75 11 0
1 3240 1418556 70172 12438772800 4052 156 4562 33519 12 2 78 8 0
1 3240 1412972 70180 12439286400 5544 676 5063 32300 14 2 73 11 0
1 3240 1406964 70180 12439903200 5796 119 4743 28285 12 2 74 13 0
0 3240 1400992 70180 12440417600 6112 116 4184 15183 9 1 75 15 0
4 0240 1396372 70180 12441033600 4604 119 4123 38639 14 2 74 10 0
3 1240 1391936 70180 12441651200 4572 332 5025 38534 15 3 73 9 0
2 3240 1387324 70188 12441856000 4796 397 4271 23039 11 2 80 8 0
2 1240 1380196 70188 12442575200 6864 209 5583 32509 16 2 73 9 0
1 3240 1374980 70188 12443294400 529252 5007 37758 15 2 74 9 0
0 0240 1372708 70188 12443500800 2600 132 3363 18199 9 1 86 4 0
2 2240 1367488 70188 12443912000 5152 430 5450 42153 18 3 74 6 0

8.3.7 test:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
0 0240 590064 70164 12521939200 10167800 6 1 89 4 0
4 1240 586336 70164 12522144800 362837 4990 39525 15 3 77 5 0
2 0240 581892 70164 12522659200 455650 5597 33057 16 2 76 5 0
3 1240 578616 70164 12522967200 350042 3902 36959 14 3 80 4 0
3 1240 573520 70164 12523481600 523282 5081 26713 15 2 75 8 0
1 3240 591688 70164 12521733600 4612 15890 5256 33309 13 3 76 8 0
3 1240 586944 70172 12522041600 4596 100 4640 23263 12 2 77 9 0
3 1240 595632 70160 12521528000 5512 5814 5314 26822 16 2 73 8 0
1 2240 581200 70152 12522968000 4684 39061 3604 26682 16 3 67 14 0
4 1240 596972 70152 12521118400 1508 53915 2729 20654 9 3 83 5 0
2 4240 589184 70152 12522043200 4140 44877 3809 82485 12 4 69 16 0
1 4240 597620 70116 12520916000 5744 54096 4597 26422 14 3 66 17 0
1 4240 585936 70116 12521944000 4136 44584 3661 35574 18 3 68 11 0
0 5240 601460 70096 12520300800 4968 54009 4669 26507 12 3 72 14 0
1 4240 592604 70080 12521022400 5064 44674 4190 40394 14 3 71 12 0
2 3240 604844 70076 12519686400 5528 49601 4664 39189 18 4 68 10 0
1 4240 590348 70072 12521434400 7716 49397 4949 41086 18 3 67 11 0
1 4240 600036 70060 12520099200 4152 53968 4641 31875 16 3 66 15 0
1 4240 587336 70052 12521230400 6420 44620 4427 47627 16 4 67 13 0
3 2240 582508 70044 12521540000 4524 32812 4510 256103 18 9 60 13 0
5 1240 603680 70020 12519280800 6584 33341 3844 28189 16 3 68 13 0
4 1240 591464 70016 12520617600 6180 54710 6664 37282 20 3 64 13 0
5 1240 586924 70020 12521233600 5748 44552 5351 39772 18 3 67 11 0
4 1240 602376 70004 12519385600 5840 54181 4824 34279 17 4 67
Re: [HACKERS] 8.4 open item: copy performance regression?
On Thu, Jun 18, 2009 at 05:20:08PM -0400, Tom Lane wrote:
> Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
>> Any objections if I add: http://archives.postgresql.org/pgsql-performance/2009-06/msg00215.php to the (currently empty) list of open items for 8.4?
> I am unable to duplicate any slowdown on this test case. AFAICT 8.4 and 8.3 branch tip are about the same speed; if anything 8.4 is faster. Testing on x86_64 Fedora 10 ...

What is not clear from Stefan's function listing is how the 8.4 server could issue 33% more XLogInsert() and CopyReadLine() calls than the 8.3.7 server using the same input file. That would account for the slowdown, but not why it is happening.

Cheers,
Ken
Re: [HACKERS] 8.4 open item: copy performance regression?
Kevin Grittner kevin.gritt...@wicourts.gov wrote:
> I've got to go keep an appointment

Sorry about that. Back now. Anything else I can do to help with this?

-Kevin
Re: [HACKERS] 8.4 open item: copy performance regression?
Kenneth Marshall k...@rice.edu wrote:
> What is not clear from Stefen's function listing is how the 8.4 server could issue 33% more XLogInsert() and CopyReadLine() calls than the 8.3.7 server using the same input file.

I thought those were profiling numbers -- the number of times a sampling timer checked what was executing and found it in that function, not call counts. Which suggests that those two functions are probably slower now than in 8.3.7, at least in some environments.

-Kevin
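[Editorial aside: the distinction Kevin draws matters for interpreting the listing. With a sampling profiler such as oprofile, counts are proportional to time spent in a function, not to the number of calls made. A toy model of this, with hypothetical per-call costs chosen purely for illustration:]

```python
# Toy model of a sampling (timer-based) profiler: samples land in a function
# in proportion to the time spent there, not the number of calls made.
# Per-call costs and counts below are hypothetical, just to show the effect.

def samples_in_func(per_call_us, calls, sample_rate_hz=1000):
    """Expected profiler samples attributed to one function."""
    time_s = per_call_us * 1e-6 * calls
    return time_s * sample_rate_hz

CALLS = 1_000_000  # same input file => same number of calls in both versions

old = samples_in_func(1.00, CALLS)  # 8.3.7: hypothetical 1.00 us per call
new = samples_in_func(1.33, CALLS)  # 8.4:   hypothetical 1.33 us per call

print(f"samples old: {old:.0f}")
print(f"samples new: {new:.0f}  ({new / old - 1:.0%} more, with zero extra calls)")
```

So a 33% jump in samples attributed to XLogInsert() or CopyReadLine() is exactly what a 33% higher per-call cost would produce, with no change in call counts.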