[HACKERS] 7.5-dev, pg_dumpall, dollarquoting
Hi! Since we have a lot of databases here that suffer from pg_dump's deficits regarding dependencies in 7.3 and 7.4, we tried pg_dump from the upcoming 7.5 release. This version works much better, but it is not possible to dump a complete cluster in a 7.3- or 7.4-compatible way using pg_dumpall, because pg_dumpall lacks the -X disable-dollar-quoting switch. Would it be possible to modify pg_dumpall to accept the same options as pg_dump (at least those that make sense) - or am I missing something here? thanks Stefan ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
[HACKERS] Tablespace issues (comment on, moving indexes)
Hi! I'm currently working on the psql tab-complete code, fixing quite a lot of bugs/annoyances in the process. One of the things I'm trying to do is syncing the available commands in psql with the docs - during this work I found two irritating things regarding tablespaces: 1. there is no COMMENT ON TABLESPACE support - it is neither documented nor does it seem to work using the obvious syntax (COMMENT ON TABLESPACE 'foo' IS 'bar'). 2. how is one supposed to move indexes (not tables) to another tablespace? The (devel) docs have this in the ALTER TABLE section: This form changes the table's tablespace to the specified tablespace and moves the data file(s) associated with the table to the new tablespace. Indexes on the table, if any, are not moved; but they can be moved separately with additional SET TABLESPACE commands. Not sure how to interpret that - how would an example of moving an index look, given that (AFAIR) there is nothing like ALTER INDEX 'foo' SET TABLESPACE 'bar'? thanks Stefan
Re: [HACKERS] tablespace and sequences?
Fabien COELHO wrote: (3) psql auto completion does not have CREATE/DROP TABLESPACE in its list. I have already posted a patch for this (http://candle.pha.pa.us/mhonarc/patches/msg0.html) and AFAIK it is on Bruce's beta TODO list too. Stefan
Re: [HACKERS] Regression test failures
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: I am still seeing random regression test failures on my SMP BSD/OS machine. It basically happens when doing 'gmake check'. I have tried running repeated tests and can't get it to reproduce, but when checking patches it has happened perhaps once a week for the past six weeks. It happens once and then doesn't happen again. I will keep investigating. I reported this perhaps three weeks ago. Do these failures look anything like this?

--- 78,86 ----
  DROP TABLE foo;
  CREATE TABLE bar (a int);
  ROLLBACK TO SAVEPOINT one;
! WARNING:  AbortSubTransaction while in ABORT state
! ERROR:  relation 555088 deleted while still in use
! server closed the connection unexpectedly
!         This probably means the server terminated abnormally
!         before or while processing the request.
! connection to server was lost

I got this once this morning and have been unable to reproduce it. The OID referenced in the message seemed to correspond to the relation bar, created just above the point of error. Just for the record, I had strange errors on beta1 too - when playing with creating/deleting/altering tables and savepoints (not sure if that is related at all). It happened twice in a row once, but when I tried to build a testcase to report the issue I couldn't reproduce it again :-( IIRC the error I got was something along the lines of: ERROR: catalog is missing 1 attribute(s) for relid 17231 Stefan
[HACKERS] -HEAD build failure on OpenBSD 3.6-current/Sparc64 +patch
This one got caught by the testfarm as well - it looks like the OpenBSD-specific makefile is missing a -fPIC for the Sparc platform (I would assume that at least NetBSD/sparc is affected too, but I don't have access to such a system to test on). I also think that -shared is now preferred/recommended on OpenBSD/elf - a small patch implementing those changes is attached and passes regress on my box. The actual compile failure is in contrib/tsearch2 (because it looks like the postgresql libs themselves are small enough to avoid the 8k GOT limit). Stefan

gmake[1]: Entering directory `/home/pgbuild/pgbuildfarm/HEAD/pgsql.8969/contrib/tsearch2'
sed -e 's,MODULE_PATHNAME,$libdir/tsearch2,g' \
    -e 's,DATA_PATH,/home/pgbuild/pgbuildfarm/HEAD/inst/share/postgresql/contrib,g' tsearch.sql.in >tsearch2.sql
cp untsearch.sql.in untsearch2.sql
gcc -O2 -fno-strict-aliasing -g -Wall -Wmissing-prototypes -Wmissing-declarations -fpic -DPIC -I./snowball -I./ispell -I./wordparser -I. -I../../src/include -c -o dict_ex.o dict_ex.c
gcc -O2 -fno-strict-aliasing -g -Wall -Wmissing-prototypes -Wmissing-declarations -fpic -DPIC -I./snowball -I./ispell -I./wordparser -I. -I../../src/include -c -o dict.o dict.c
gcc -O2 -fno-strict-aliasing -g -Wall -Wmissing-prototypes -Wmissing-declarations -fpic -DPIC -I./snowball -I./ispell -I./wordparser -I. -I../../src/include -c -o snmap.o snmap.c
gcc -O2 -fno-strict-aliasing -g -Wall -Wmissing-prototypes -Wmissing-declarations -fpic -DPIC -I./snowball -I./ispell -I./wordparser -I. -I../../src/include -c -o stopword.o stopword.c
gcc -O2 -fno-strict-aliasing -g -Wall -Wmissing-prototypes -Wmissing-declarations -fpic -DPIC -I./snowball -I./ispell -I./wordparser -I. -I../../src/include -c -o common.o common.c
gcc -O2 -fno-strict-aliasing -g -Wall -Wmissing-prototypes -Wmissing-declarations -fpic -DPIC -I./snowball -I./ispell -I./wordparser -I. -I../../src/include -c -o prs_dcfg.o prs_dcfg.c
gcc -O2 -fno-strict-aliasing -g -Wall -Wmissing-prototypes -Wmissing-declarations -fpic -DPIC -I./snowball -I./ispell -I./wordparser -I. -I../../src/include -c -o dict_snowball.o dict_snowball.c
gcc -O2 -fno-strict-aliasing -g -Wall -Wmissing-prototypes -Wmissing-declarations -fpic -DPIC -I./snowball -I./ispell -I./wordparser -I. -I../../src/include -c -o dict_ispell.o dict_ispell.c
gcc -O2 -fno-strict-aliasing -g -Wall -Wmissing-prototypes -Wmissing-declarations -fpic -DPIC -I./snowball -I./ispell -I./wordparser -I. -I../../src/include -c -o dict_syn.o dict_syn.c
gcc -O2 -fno-strict-aliasing -g -Wall -Wmissing-prototypes -Wmissing-declarations -fpic -DPIC -I./snowball -I./ispell -I./wordparser -I. -I../../src/include -c -o wparser.o wparser.c
gcc -O2 -fno-strict-aliasing -g -Wall -Wmissing-prototypes -Wmissing-declarations -fpic -DPIC -I./snowball -I./ispell -I./wordparser -I. -I../../src/include -c -o wparser_def.o wparser_def.c
gcc -O2 -fno-strict-aliasing -g -Wall -Wmissing-prototypes -Wmissing-declarations -fpic -DPIC -I./snowball -I./ispell -I./wordparser -I. -I../../src/include -c -o ts_cfg.o ts_cfg.c
gcc -O2 -fno-strict-aliasing -g -Wall -Wmissing-prototypes -Wmissing-declarations -fpic -DPIC -I./snowball -I./ispell -I./wordparser -I. -I../../src/include -c -o tsvector.o tsvector.c
gcc -O2 -fno-strict-aliasing -g -Wall -Wmissing-prototypes -Wmissing-declarations -fpic -DPIC -I./snowball -I./ispell -I./wordparser -I. -I../../src/include -c -o rewrite.o rewrite.c
gcc -O2 -fno-strict-aliasing -g -Wall -Wmissing-prototypes -Wmissing-declarations -fpic -DPIC -I./snowball -I./ispell -I./wordparser -I. -I../../src/include -c -o crc32.o crc32.c
gcc -O2 -fno-strict-aliasing -g -Wall -Wmissing-prototypes -Wmissing-declarations -fpic -DPIC -I./snowball -I./ispell -I./wordparser -I. -I../../src/include -c -o query.o query.c
gcc -O2 -fno-strict-aliasing -g -Wall -Wmissing-prototypes -Wmissing-declarations -fpic -DPIC -I./snowball -I./ispell -I./wordparser -I. -I../../src/include -c -o gistidx.o gistidx.c
gcc -O2 -fno-strict-aliasing -g -Wall -Wmissing-prototypes -Wmissing-declarations -fpic -DPIC -I./snowball -I./ispell -I./wordparser -I. -I../../src/include -c -o tsvector_op.o tsvector_op.c
gcc -O2 -fno-strict-aliasing -g -Wall -Wmissing-prototypes -Wmissing-declarations -fpic -DPIC -I./snowball -I./ispell -I./wordparser -I. -I../../src/include -c -o rank.o rank.c
gcc -O2 -fno-strict-aliasing -g -Wall -Wmissing-prototypes -Wmissing-declarations -fpic -DPIC -I./snowball -I./ispell -I./wordparser -I. -I../../src/include -c -o ts_stat.o ts_stat.c
gmake -C snowball SUBSYS.o
gmake[2]: Entering directory `/home/pgbuild/pgbuildfarm/HEAD/pgsql.8969/contrib/tsearch2/snowball'
gcc -O2 -fno-strict-aliasing -g -Wall -Wmissing-prototypes -Wmissing-declarations -fpic -DPIC -I./.. -I. -I../../../src/include -c -o english_stem.o english_stem.c
gcc -O2 -fno-strict-aliasing -g -Wall -Wmissing-prototypes -Wmissing-declarations -fpic
Re: [HACKERS] -HEAD build failure on OpenBSD 3.6-current/Sparc64
Tom Lane wrote: Stefan Kaltenbrunner [EMAIL PROTECTED] writes: this one got caught by the testfarm as well - it looks like the openbsd-specific makefile is missing a -fPIC for the Sparc platform (I would assume that at least NetBSD/sparc is affected too but I don't have access to such a system to test on). Why did you remove -DPIC? Uhm, partly because I sent the wrong patch and partly because I didn't understand what it does anyway (in the !Sparc case). The only place I can find on my machine where defining PIC seems to have an effect is in /usr/include/sparc64/asm.h - so I would guess it was a no-op anyway on OpenBSD/!Sparc. But my programming skills are somewhat limited, so I would definitely need some guidance on this - or I will just send another patch which adds -DPIC back for both cases if you want. Stefan
Re: [HACKERS] -HEAD build failure on OpenBSD 3.6-current/Sparc64
Tom Lane wrote: Stefan Kaltenbrunner [EMAIL PROTECTED] writes: Tom Lane wrote: Why did you remove -DPIC? Uhm, partly because I sent the wrong patch and partly because I didn't understand what it does anyway (in the !Sparc case). The only place I can find on my machine where defining PIC seems to have an effect is in /usr/include/sparc64/asm.h - so I would guess it was a no-op anyway on OpenBSD/!Sparc. It may do nothing, but since it was in there and not causing trouble, I think it's probably best to leave it. Agreed - leaving it in is definitely the safer approach. But my programming skills are somewhat limited, so I would definitely need some guidance on this - or I will just send another patch which adds -DPIC back for both cases if you want. No, I can change it. thanks Stefan
[HACKERS] -HEAD regression failure on OpenBSD 3.6-current/x86
One of my boxes (emu) on the buildfarm fails to pass the float8 regression test: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=emu&dt=2004-10-31%2003:35:02 The interesting thing is that spoonbill (slightly older OpenBSD-current/Sparc64) passes this test (but fails contribcheck later on): http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2004-10-30%2023:50:04 A wild guess is that the difference might be that OpenBSD/x86 is still using a 2.95.x compiler in the base system while Sparc64 already has 3.3.x... Still, it looks like the float8 issue (-0 vs 0) is not really fatal - just annoying :-). Stefan
Re: [HACKERS] OpenBSD/Sparc status
Tom Lane wrote: The answer is: it's a gcc bug. The attached program should print x = 12.3 y = 12.3 but if compiled with -O or -O2 on Stefan's machine, I get garbage:

$ gcc -O ftest.c
$ ./a.out
x = 12.3
y = 1.47203e-39

Whoa - scary. I will report that to the OpenBSD folks upstream - many thanks for the nice testcase! Stefan
Re: [HACKERS] OpenBSD/Sparc status
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: Meanwhile, what do we do? Turn off -O in src/template/openbsd for some/all releases? Certainly not. This problem is only known to exist in one gcc version for one architecture, and besides it's only affecting (so far as we can tell) one rather inessential contrib module. I'd say ignore the test failure until Stefan can get a fixed gcc. FWIW: I got the bug confirmed by Miod Vallat (OpenBSD hacker) on IRC; it looks like at least OpenBSD 3.6-STABLE and OpenBSD-current on Sparc64 with the stock system compiler are affected. Stefan
Re: [HACKERS] OpenBSD/Sparc status
Tom Lane wrote: Darcy Buskermolen [EMAIL PROTECTED] writes: I can confirm this behavior on Solaris 8/sparc 64 as well.

bash-2.03$ gcc -m64 -O2 test.c
bash-2.03$ ./a.out
x = 12.3
y = 2.51673e-42
bash-2.03$ gcc -m64 -O3 test.c
bash-2.03$ ./a.out
x = 12.3
y = 12.3
bash-2.03$

Hmm. I hadn't bothered to try -O3 ... interesting that it works correctly again at that level. -O3 works on my box too. Anyway, this proves that it is an upstream gcc bug and not something OpenBSD broke. I just tried on Solaris 9 with gcc 3.4.2 - it seems the bug is fixed in that version. Unfortunately it is quite problematic to change the compiler, at least on OpenBSD: gcc 3.3.2 is quite heavily modified on that platform, and switching the base system compiler might screw up a boatload of other tools. The actual recommendation I got from the OpenBSD folks was to add -mfaster-structs to the compiler flags, which seems to work around the issue - I'm currently doing a full build to verify that though ... Stefan
Re: [HACKERS] OpenBSD/Sparc status
Darcy Buskermolen wrote: On November 19, 2004 10:55 am, you wrote: The answer is: it's a gcc bug. The attached program should print x = 12.3 y = 12.3 but if compiled with -O or -O2 on Stefan's machine, I get garbage:

$ gcc -O ftest.c
$ ./a.out
x = 12.3
y = 1.47203e-39
$ gcc -v
Reading specs from /usr/lib/gcc-lib/sparc64-unknown-openbsd3.6/3.3.2/specs
Configured with:
Thread model: single
gcc version 3.3.2 (propolice)
$

I can confirm this behavior on Solaris 8/sparc 64 as well. Some more datapoints: Solaris 2.9 with gcc 3.1 is broken (-O3 does not help here), and linux/sparc64 (Debian) with gcc 3.3.5 is broken too. So it looks like at least gcc 3.1 and gcc 3.3.x are affected on Sparc64 on all operating systems. Stefan
[HACKERS] latest pgcrypto patches cause compile errors
Looks like the latest pgcrypto patches that just got applied cause widespread failures on the buildfarm machines: http://www.pgbuildfarm.org/cgi-bin/show_status.pl Stefan
Re: [HACKERS] openbsd, plpython, missing threading symbols
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: Tom Lane wrote: The alternative is to say that plpython isn't supported on BSDen unless you choose to build an unthreaded libpython. I'm OK with that, but if that's what's done I think we should check for it up front at configure time and not let it fail at run time like we do now. If you can create a suitable configure test, it'd be fine with me. Not sure if it is of any help, but mod_python seems to be using this configure.in snippet to detect (and warn about) a threaded python installation:

# check if python is compiled with threads
AC_MSG_CHECKING(whether Python is compiled with thread support)
PyTHREADS=`$PYTHON_BIN -c "import sys; print \"thread\" in sys.builtin_module_names"`
if test "$PyTHREADS" = "1"; then
    AC_MSG_RESULT(yes)
    echo
    echo "** WARNING **"
    echo "Python is compiled with thread support. Apache 1.3 does not use threads."
    echo "On some systems this will cause problems during compilation, on others"
    echo "it may result in unpredictable behaviour of your Apache server. Yet on"
    echo "others it will work just fine. The recommended approach is to compile"
    echo "Python without thread support in a separate location and specify it with"
    echo "the --with-python option to this ./configure script."
    echo
else
    AC_MSG_RESULT([no threads, good])
fi

Stefan
[HACKERS] psql and ROLES
Hi, I'm currently working on syncing psql's tab-complete code with the docs, especially wrt ROLES. While working on this I noticed the following things: *) there is no backslash command for getting a list of roles (like \du / \dg for users and groups) - I'm considering using \dr for that - does that sound sensible? *) the new connection limit code allows for negative limits (besides -1), like this:

playground=# CREATE ROLE testrole LOGIN CONNECTION LIMIT -9;
CREATE ROLE

That doesn't strike me as very useful (and it is not clear what it should mean anyway, because such a user can still log in) - so maybe we should reject that (and create a sensible upper bound too). Stefan
Re: [HACKERS] psql and ROLES
Tom Lane wrote: Stefan Kaltenbrunner [EMAIL PROTECTED] writes: *) there is no backslash command for getting a list of roles (like \du / \dg for users and groups) - I'm considering using \dr for that - does that sound sensible? We could just recycle \du and/or \dg for the purpose. If those should still exist as separate commands, what should they do differently from \dr? There's no longer any hard-and-fast distinction ... ok - that seems sensible - I will just reuse \du for this. *) the new connection limit code allows for negative limits (besides -1) Right now, any negative value is interpreted as no limit. I don't feel a pressing need to change that. ok - in that case we might consider changing the wording in the docs from "-1 (the default) means no limit" to something like "any negative value means no limit". Stefan
Re: [HACKERS] Changes improve the performance of INSERT and UPDATE
Tom Lane wrote: Hiroki Kataoka [EMAIL PROTECTED] writes: This small patch improves the performance of INSERT and UPDATE. On my machine, these changes raised performance by about 5%~10% in pgbench. BTW, in profiling the backend I've never seen PageAddItem take more than about 1% of the runtime, and in pgbench in particular it seems to be down around 0.1% ... so the above seems a bit optimistic ... I have nearly the same result, but pgbench says different. I don't know why my test shows a 5~10% performance improvement. Therefore, I want to take a benchmark in a reliable environment. I've been testing this patch a bit, and I'm unable to measure any consistent improvement in pgbench times (sometimes it seems to win, and other times it doesn't). And gprof still swears up and down that PageAddItem is only about 0.1% of the runtime, which would make it impossible to obtain more than an 0.1% speedup. I'm inclined to write off your result as measurement error --- it's notoriously hard to get reproducible results out of pgbench. I played with the patch two weeks ago (pgbench and some bulk loading of production data as well as restores from large dumps) - AFAIR I have been unable to measure any noticeable real-life improvement. Trusting pgbench is quite difficult for such tests - I can easily get up to 15% variation on consecutive runs on my boxes here ... Stefan
[HACKERS] ALTER ROLES - questions
Hi! I played around with roles a bit today and noticed some minor things: ALTER ROLE seems to support ALTER ROLE name ROLE name - but that form is not mentioned in the docs:

playground=# CREATE ROLE myrole;
CREATE ROLE
playground=# CREATE ROLE myrole2;
CREATE ROLE
playground=# ALTER ROLE myrole ROLE myrole2;
ALTER ROLE

ALTER ROLE name IN ROLE name (undocumented, but it seems logical to try because CREATE ROLE supports that) results in the following somewhat cryptic error message:

playground=# CREATE ROLE myrole;
CREATE ROLE
playground=# CREATE ROLE myrole2;
CREATE ROLE
playground=# ALTER ROLE myrole IN ROLE myrole2;
ERROR: option "addroleto" not recognized

I understand that adding/removing role membership can be done by means of GRANT/REVOKE, but at least improving the error message (or stopping the parser from accepting that syntax) would be nice :-) Stefan
Re: [HACKERS] Any MIPS assembly experts in the house?
Tom Lane wrote: I see the latest buildfarm result from a mipsel machine is failing: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2005-08-26%2005:30:07 and the failure is this:

TRAP: FailedAssertion("!(lock->shared > 0)", File: "lwlock.c", Line: 456)
LOG: server process (PID 10112) was terminated by signal 6

which makes it seem highly probable that this recently committed patch to convert the MIPS out-of-line spinlock code into inline assembler isn't right: http://archives.postgresql.org/pgsql-committers/2005-08/msg00319.php Can anyone spot the problem? If not I fear we'll have to revert this. As the owner of said machine I was about to report the problem - but on a subsequent run of the buildfarm script (to get access to the compiled source for further debugging and testing) it completed without an error. Stefan
Re: [HACKERS] Any MIPS assembly experts in the house?
Tom Lane wrote: I wrote: Can anyone spot the problem? If not I fear we'll have to revert this. After a bit of reading MIPS documentation, I found out that the proposed patch is exactly backward: it returns 1 if it gets the lock and 0 if the lock is already held :-( Because callers will loop on a nonzero return, the second iteration falls through, which is why the thing isn't an infinite loop. The only problem is when we hit the lock at an instant when somebody else already has it. Given the short duration of our spinlock holds, it was probably quite a coincidence that Stefan's machine got a failure almost immediately. We might have had the problem lurking for awhile. I'll try to commit something that really works in a little bit. Well, not sure if that counts as "really works" :-) http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2005-08-27%2006:33:05 Stefan
[HACKERS] small pg_dumpall bug/warning in 8.1beta1
Hi! During testing of 8.1 I found that using pg_dumpall (-g) against a live 8.0 install that has at least one GROUP defined results in the following warning while it creates the CREATE ROLE statements in the dump: row number 0 is out of range 0..-1 To reproduce the problem it is enough to init a new 8.0 cluster, create an empty GROUP and dump the cluster using pg_dumpall from -HEAD (or beta1). Stefan
Re: [HACKERS] small pg_dumpall bug/warning in 8.1beta1
Tom Lane wrote: Stefan Kaltenbrunner [EMAIL PROTECTED] writes: During testing of 8.1 I found that using pg_dumpall (-g) against a live 8.0 install that has at least one GROUP defined results in the following warning while it creates the CREATE ROLE statements in the dump: row number 0 is out of range 0..-1 Fixed, thanks for the report! AFAICT this is purely cosmetic, and the dump is OK anyway. But it's definitely an oversight. Oh yeah - the dump is fine otherwise; that's why I labeled it a small bug/warning :-) Stefan
Re: [HACKERS] memcpy SEGV on AIX 5.3
Seneca Cunningham wrote: On a PowerPC AIX 5.3 box, initdb from 8.1beta4 segfaults at src/backend/utils/hash/dynahash.c:673. No segfaults occur and all 98 regression tests pass if a test is added to see if keycopy is memcpy and, if it is, go through a loop memcpying one byte at a time instead of memcpying everything at once. Looks like I'm seeing a similar problem (using -HEAD) on AIX 5.3 ML3 using the IBM AIX C compiler. initdb just hangs after "selecting default max_connections ..." in a 100% CPU loop. Stefan
Re: [HACKERS] memcpy SEGV on AIX 5.3
Stefan Kaltenbrunner wrote: Seneca Cunningham wrote: On a PowerPC AIX 5.3 box, initdb from 8.1beta4 segfaults at src/backend/utils/hash/dynahash.c:673. No segfaults occur and all 98 regression tests pass if a test is added to see if keycopy is memcpy and, if it is, go through a loop memcpying one byte at a time instead of memcpying everything at once. Looks like I'm seeing a similar problem (using -HEAD) on AIX 5.3 ML3 using the IBM AIX C compiler. initdb just hangs after "selecting default max_connections ..." in a 100% CPU loop. Yeah, this seems to be the very same issue. A backtrace of a stuck initdb process looks like:

(gdb) bt
#0  0x10006f10 in bcopy ()
#1  0x1001d398 in hash_search (hashp=0x1001d85c, keyPtr=0xf020f9fc, action=804399456, foundPtr=0x20029d78) at dynahash.c:673
#2  0x100a5e58 in formrdesc (relationName=0x2006bf38 "template1", relationReltype=0, hasoids=0 '\0', natts=0, att=0x200100f8) at relcache.c:1295
#3  0x100a818c in RelationCacheInitialize () at relcache.c:2160
#4  0x102016ec in InitPostgres (dbname=0xd025b7f0 "\200A", username=0x2) at postinit.c:424
#5  0x102009e8 in BootstrapMain (argc=271528, argv=0xb0002) at bootstrap.c:445
#6  0x1578 in main (argc=0, argv=0x0) at main.c:285
#7  0x122c in __start ()

Stefan
Re: [HACKERS] 8.1 Release Candidate 1 Coming ...
Marc G. Fournier wrote: Tomorrow evening, I'm going to wrap up RC1, to announce it on Monday ... if anyone is sitting on *anything*, please say something before about midnight GMT ... Hmm, well -HEAD (and 8.0.4 too!) is broken on AIX 5.3 ML3: http://archives.postgresql.org/pgsql-hackers/2005-10/msg01053.php Stefan
Re: [HACKERS] 8.1 Release Candidate 1 Coming ...
Tom Lane wrote: Stefan Kaltenbrunner [EMAIL PROTECTED] writes: hmm well -HEAD (and 8.0.4 too!) is broken on AIX 5.3 ML3: http://archives.postgresql.org/pgsql-hackers/2005-10/msg01053.php [ shrug... ] The reports of this problem have not given enough information to fix it, and since it's not a regression from 8.0, it's not going to hold up the 8.1 release. When and if we receive enough info to fix it, we'll gladly do so, but ... (My guess is that the problem is a compiler or libc bug anyway, given that one report says that replacing a memcpy call with an equivalent loop makes the failure go away.) Seneca is using gcc 4.0.1; I can reproduce the sig11 with gcc 3.3.2 and the hang with the IBM AIX compiler, so that would indicate a libc bug ... If somebody is interested I can provide access to my testbox to help in debugging ... Stefan
Re: [HACKERS] 8.1 Release Candidate 1 Coming ...
Mag Gam wrote: Is this issue only on AIX 5.3 ML1 thru ML3? Does the build work fine with 5.2 (all MLs)? 5.3 ML1 works, but it is affected by the system include bug mentioned in our AIX FAQ. ML3 is supposed to fix that specific problem but breaks in another, more difficult way, as it seems ... Stefan
Re: [HACKERS] Gist Recovery testing
Teodor Sigaev wrote: btree manages to avoid calling the index support functions during WAL restore --- can't you do the same? Perhaps you need to be including more information in the initial xlog records, so that the cleanup step has everything it needs. Or resort to brute-force search (which is more or less what btree does). I don't think this operation needs to be very efficient, since it's a corner case that should only seldom be invoked. I've just committed WAL logging for GiST. It works for online backup (normal recovery) and mostly after crash, but in that case it can't restore incomplete inserts, although it can detect them and say in the log that the GiST index needs to be reindexed. FYI: we now have at least 4 machines (otter, kingfisher, lionfish, corgi) on the buildfarm crashing during testing of GiST-related things in contrib. Any chance this could be related to this change? Stefan
Re: [HACKERS] hashtable crash (was Re: [PATCHES] Post-mortem: final 2PC patch)
Tom Lane wrote: dynahash.c thinks it should always copy 255 bytes of key, since that's what it was told the key size was ... but in this case the supplied search key has been allocated very close to the end of the process's memory, and there are not 255 bytes before the end of memory. Aaah - this description rings a bell ... OpenBSD has some very useful features for configuration of malloc() - and on this particular box it has:

G    ``Guard''.  Enable guard pages and chunk randomization.  Each page size or larger allocation is followed by a guard page that will cause a segmentation fault upon any access.  Smaller than page size chunks are returned in a random order.

and indeed - enabling G on another (x86) OpenBSD box of mine causes make check to die there too. Stefan
Re: [HACKERS] Gist Recovery testing
Oleg Bartunov wrote: On Wed, 15 Jun 2005, Stefan Kaltenbrunner wrote: Teodor Sigaev wrote: btree manages to avoid calling the index support functions during WAL restore --- can't you do the same? Perhaps you need to be including more information in the initial xlog records, so that the cleanup step has everything it needs. Or resort to brute-force search (which is more or less what btree does). I don't think this operation needs to be very efficient, since it's a corner case that should only seldom be invoked. I've just committed WAL logging for GiST. It works for online backup (normal recovery) and mostly after crash, but in that case it can't restore incomplete inserts, although it can detect them and say in the log that the GiST index needs to be reindexed. FYI: we now have at least 4 machines (otter, kingfisher, lionfish, corgi) on the buildfarm crashing during testing of GiST-related things in contrib. Any chance this could be related to this change? Most probably :) But, wait a little bit. We have a patch currently being tested, and I see no problem with any of the GiST-based contribs on my Slackware Linux 10.1 using it.
I played a little bit on lionfish (this is the result of a COPY of the btree_gist testdata into a variant of the regression test tables) and managed to get the following backtrace:

#0  gistmakedeal (state=0x0, giststate=0x7fff5128) at gist.c:597
#1  0x00436658 in gistdoinsert (r=0x2c0752e0, itup=0x100b4c10, giststate=0x7fff5128) at gist.c:325
#2  0x00436444 in gistinsert (fcinfo=0x2b52eab0) at gist.c:288
#3  0x0073522c in FunctionCall6 (flinfo=0x2b52eab0, arg1=39, arg2=0, arg3=810, arg4=5, arg5=39, arg6=5) at fmgr.c:1270
#4  0x0045aca8 in index_insert (indexRelation=0x2c0752e0, values=0x7fff6920, isnull=0x7fff69a0, heap_t_ctid=0x100b2bec, heapRelation=0x1, check_uniqueness=0 '\0') at indexam.c:215
#5  0x00580074 in ExecInsertIndexTuples (slot=0x100ad710, tupleid=0x100b2bec, estate=0x100ab5c0, is_vacuum=0 '\0') at execUtils.c:935
#6  0x00519800 in CopyFrom (rel=0x2c072a90, attnumlist=0x100ab200, binary=0 '\0', oids=0 '\0', delim=0x7c277c "\t", null_print=0x7c2774 "\\N", csv_mode=0 '\0', quote=0x0, escape=0x0, force_notnull_atts=0x0, header_line=0 '\0') at copy.c:1955
#7  0x00515f08 in DoCopy (stmt=0x2b52eab0) at copy.c:1032
#8  0x00671868 in ProcessUtility (parsetree=0x10090f10, params=0x0, dest=0x10090f78, completionTag=0x7fff6f78) at utility.c:608
#9  0x0066f5ec in PortalRunUtility (portal=0x100990c8, query=0x10090fc8, dest=0x10090f78, completionTag=0x7fff6f78) at pquery.c:940
#10 0x0066fbb0 in PortalRunMulti (portal=0x100990c8, dest=0x10090f78, altdest=0x10090f78, completionTag=0x7fff6f78) at pquery.c:1007
#11 0x0066eb30 in PortalRun (portal=0x100990c8, count=2147483647, dest=0x10090f78, altdest=0x10090f78, completionTag=0x7fff6f78) at pquery.c:617
#12 0x00666f60 in exec_simple_query (query_string=0x10090be8 "COPY inettmp FROM STDIN") at postgres.c:1021
#13 0x0066be54 in PostgresMain (argc=4, argv=0x100513c0, username=0x10051390 "pgbuild") at postgres.c:3186
#14 0x00621304 in BackendRun (port=0x10060300) at postmaster.c:2800
#15 0x006208bc in BackendStartup (port=0x10060300) at postmaster.c:2440
#16 0x0061d23c in ServerLoop () at postmaster.c:1221
#17 0x0061b6d0 in PostmasterMain (argc=3, argv=0x10050e40) at postmaster.c:930
#18 0x005ab904 in main (argc=3, argv=0x10050e40) at main.c:268

Stefan
Re: [HACKERS] What bison versions are installed on buildfarm machines?
Tom Lane wrote: Is there any way to find out $subject? I see that several of the buildfarm machines are choking on a patch I committed yesterday:

guc-file.l: In function `ProcessConfigFile':
guc-file.l:162: error: `YY_FLUSH_BUFFER' undeclared (first use in this function)
guc-file.l:162: error: (Each undeclared identifier is reported only once
guc-file.l:162: error: for each function it appears in.)
make[4]: *** [guc.o] Error 1

YY_FLUSH_BUFFER is documented as a standard macro in flex 2.5.4, which is the oldest version we officially support. But I'm prepared to change it if there is another way that would work with a wider range of flex versions.

I just verified that -HEAD is broken on Debian Sarge 3.1 (nearly all of the failing buildfarm members are Debian Sarge 3.1 boxes) - and I just verified the problem exists on i386 too. The version of bison available on Debian Sarge reports as:

bison (GNU Bison) 1.875d
Written by Robert Corbett and Richard Stallman.
Copyright (C) 2004 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Stefan
Re: [HACKERS] What bison versions are installed on buildfarm machines?
Tom Lane wrote: Stefan Kaltenbrunner [EMAIL PROTECTED] writes: I just verified that -HEAD is broken on Debian Sarge 3.1 (nearly all of the failing buildfarm members are Debian Sarge 3.1 boxes) - and I just verified the Problem exists on i386 too. What flex version are they using? flex 2.5.31 Stefan
Re: [HACKERS] What bison versions are installed on buildfarm machines?
Tom Lane wrote: Joshua D. Drake [EMAIL PROTECTED] writes: Not that hard to believe. 2.5.4 is what the major distributions are shipping. Even FC4 comes with 2.5.4a. The only reason I can see for this is that flex is now considered a non-GNU project. No, the major reason for it is that flex 2.5.31 is seriously broken and non-compatible with its prior releases. I wasn't aware that they'd gone so far as to remove a documented macro (one that was documented in 2.5.4 as the *preferred* way to do things, mind you) but we already knew of several other issues with it. See the archives. I'll try to snarf a copy and see if there's a way to do it that's compatible with both releases, but it's the flex authors' own fault that 2.5.31 has had such poor uptake. Hmm, it does not seem to work with the 2.5.4 version Debian supplies as flex-old either - maybe the following Debian bug report (filed against woody!) is related to this: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=194904 Stefan
Re: [HACKERS] What bison versions are installed on buildfarm machines?
Stefan Kaltenbrunner wrote: Tom Lane wrote: Joshua D. Drake [EMAIL PROTECTED] writes: Not that hard to believe. 2.5.4 is what the major distributions are shipping. Even FC4 comes with 2.5.4a. The only reason I can see for this is that flex is now considered a non-GNU project. No, the major reason for it is that flex 2.5.31 is seriously broken and non-compatible with its prior releases. I wasn't aware that they'd gone so far as to remove a documented macro (one that was documented in 2.5.4 as the *preferred* way to do things, mind you) but we already knew of several other issues with it. See the archives. I'll try to snarf a copy and see if there's a way to do it that's compatible with both releases, but it's the flex authors' own fault that 2.5.31 has had such poor uptake. Hmm, it does not seem to work with the 2.5.4 version Debian supplies as flex-old either - maybe the following Debian bug report (filed against woody!) is related to this: Sorry for the false alarm - actually I can confirm that 2.5.31 (which is the default flex on Sarge) is broken and 2.5.4 (available as flex-old) DOES work. Stefan
[HACKERS] could not access status of transaction 0
Hi all! We seem to be getting this error (in German) once in a while on a rather complex database:

FEHLER: konnte auf den Status von Transaktion 0 nicht zugreifen
DETAIL: kann Datei /var/databases/postgres/data/pg_subtrans/57DA nicht erstellen: Die Datei existiert bereits

which roughly translates to

ERROR: could not access status of transaction 0
DETAIL: could not create file /var/databases/postgres/data/pg_subtrans/57DA: File exists

and seems to be generated in backend/access/transam/slru.c. It looks like we got those (with changing filenames) about 5 times during the last 2 months, mostly during low-utilisation times (on this particular database - not on the server itself). The server itself is a Dual AMD Opteron box running Debian Sarge/AMD64 with 64-bit kernel and userspace. The PostgreSQL version is 8.0.5. It might be interesting to note that we use Slony to replicate a few tables of this database to multiple slaves, and according to our logs it was always the slony user connected to the database that triggered this error.

Stefan
Re: [HACKERS] could not access status of transaction 0
Tom Lane wrote: Stefan Kaltenbrunner [EMAIL PROTECTED] writes: ERROR: could not access status of transaction 0 DETAIL: could not create file /var/databases/postgres/data/pg_subtrans/57DA: File exists and seems to be generated in backend/access/transam/slru.c it looks like we got those (with changing filenames) about 5 times during the last 2 months mostly during low-utilisation times (on this particular database - not on the server itself). Hm ... can you check exactly what set of filenames exists in pg_subtrans/ when this happens? Is it always referencing pg_subtrans/, or are there similar complaints about pg_clog/ ? It always complains about pg_subtrans/ - as for finding out what files are in that directory when that happens, I will try to put something in place that monitors the directory. However, it might take a while until we get a result out of this, since I cannot reproduce this issue at will and it only happens rarely :-( Stefan
[HACKERS] -HEAD compile failure on OpenBSD-current
Hi! -HEAD does not compile on my SMP i386 OpenBSD-current box (using a plain ./configure):

gmake[4]: Leaving directory `/home/mastermind/source/pgsql/src/backend/utils/mb'
/usr/local/bin/gmake -C misc SUBSYS.o
gmake[4]: Entering directory `/home/mastermind/source/pgsql/src/backend/utils/misc'
/usr/bin/flex guc-file.l
sed -e 's/^yy/GUC_yy/g' -e 's/\([^a-zA-Z0-9_]\)yy/\1GUC_yy/g' lex.yy.c > guc-file.c
rm -f lex.yy.c
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wendif-labels -fno-strict-aliasing -I. -I../../../../src/include -c -o guc.o guc.c
guc.c:570: error: invalid lvalue in unary `&'
guc.c:570: error: initializer element is not constant
guc.c:570: error: (near initialization for `ConfigureNamesBool[19].variable')
guc.c:577: error: initializer element is not constant
guc.c:577: error: (near initialization for `ConfigureNamesBool[19]')
guc.c:584: error: initializer element is not constant
guc.c:584: error: (near initialization for `ConfigureNamesBool[20].gen')
guc.c:587: error: initializer element is not constant
guc.c:587: error: (near initialization for `ConfigureNamesBool[20]')
guc.c:592: error: initializer element is not constant

this failure seems to be the same one that buildfarm member herring reports: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=herring&dt=2006-01-08%2005:30:24

Stefan
Re: [HACKERS] Fw: Is anyone interested in getting PostgreSQL working
Jim Buttafuoco wrote: Hackers, I can confirm that HEAD does not initdb because of a SIGBUS as reported below by Martin Pitt @ Debian (see his email below). My buildfarm member (corgi) did pass all checks 6 days ago (I was having some issues with the buildfarm code before that). If anyone would like to SSH into the box, please contact me via email and I will get an account set up. Right now, I am trying to build 8.1 to see if it passes. I cannot confirm this - the mipsel box I have on the buildfarm (lionfish) seems to be happily building all branches and completing make check. Stefan
Re: [HACKERS] Fw: Is anyone interested in getting PostgreSQL working
Jim Buttafuoco wrote: Stefan, First I would ask you to fix your mailserver setup, because my last mail to you bounced with "550 5.0.0 Sorry we don't accept mail from Austria", which makes it rather difficult for me to reply to your personal mail. well that is good news, can you tell me what version of Linux you are using and what gcc version also. I will let Martin know. lionfish is a stock Debian/Sarge box (a Cobalt Qube) with gcc 3.3.5. Stefan
[HACKERS] problem with large maintenance_work_mem settings and CREATE INDEX
Hi all! While playing on a new box I noticed that PostgreSQL does not seem to be able to cope with very large settings for maintenance_work_mem. For a test I created a single table with 5 integer columns containing about 1,8B rows (about 300M distinct values in the column I want to index):

foo=# select count(*) from testtable;
   count
------------
 1800201755
(1 row)

I tried to create an index on one of the columns:

foo=# SET maintenance_work_mem to 400;
SET
foo=# CREATE INDEX a_idx ON testtable(a);
ERROR: invalid memory alloc request size 1073741824
foo=# SET maintenance_work_mem to 300;
SET
foo=# CREATE INDEX a_idx ON testtable(a);
ERROR: invalid memory alloc request size 1073741824

the error is generated pretty fast (a few seconds into the CREATE INDEX); however:

foo=# SET maintenance_work_mem to 200;
SET
foo=# CREATE INDEX a_idx ON testtable(a);

is running now for about 10 hours with nearly no IO but pegging the CPU core it is running on at a constant 100%. Watching the process while this happens seems to indicate that the above error occurs after the backend exceeds about 3,1GB in resident size. The box in question is a Dual Opteron 275 (4 cores @2,2GHz) with 16GB of RAM and 24GB of swap. OS is Debian Sarge/AMD64 with a pure 64-bit userland.

Stefan
Re: [HACKERS] problem with large maintenance_work_mem settings and
Stefan Kaltenbrunner wrote: Hi all! While playing on a new box I noticed that PostgreSQL does not seem to be able to cope with very large settings for maintenance_work_mem. For a test I created a single table with 5 integer columns containing about 1,8B rows (about 300M distinct values in the column I want to index): foo=# select count(*) from testtable; count 1800201755 (1 row) I tried to create an index on one of the columns: foo=# SET maintenance_work_mem to 400; SET foo=# CREATE INDEX a_idx ON testtable(a); ERROR: invalid memory alloc request size 1073741824 foo=# SET maintenance_work_mem to 300; SET foo=# CREATE INDEX a_idx ON testtable(a); ERROR: invalid memory alloc request size 1073741824 the error is generated pretty fast (a few seconds into the CREATE INDEX); however: foo=# SET maintenance_work_mem to 200; SET foo=# CREATE INDEX a_idx ON testtable(a); is running now for about 10 hours with nearly no IO but pegging the CPU core it is running on at a constant 100%. Watching the process while this happens seems to indicate that the above error occurs after the backend exceeds about 3,1GB in resident size. The box in question is a Dual Opteron 275 (4 cores @2,2GHz) with 16GB of RAM and 24GB of swap. OS is Debian Sarge/AMD64 with a pure 64-bit userland.

Forgot to mention that this is 8.1.3 compiled from source. Further testing shows that it is not only CREATE INDEX that has issues with large maintenance_work_mem settings:

foo=# set maintenance_work_mem to 200;
SET
foo=# VACUUM ANALYZE verbose;
INFO: vacuuming information_schema.sql_features
ERROR: invalid memory alloc request size 204798

Stefan
Re: [HACKERS] problem with large maintenance_work_mem settings and
hubert depesz lubaczewski wrote: On 3/4/06, Stefan Kaltenbrunner [EMAIL PROTECTED] wrote: forgot to mention that this is 8.1.3 compiled from source. Further testing shows that not only CREATE INDEX has some issue with large maintenance_work_mem settings: what does it show: cat /proc/sys/kernel/shmmax

1421326592

- not that I think it is related to the problem at all. It looks like I'm hitting the MaxAllocSize limit in src/include/utils/memutils.h.

Stefan
Re: [HACKERS] problem with large maintenance_work_mem settings and
Stefan Kaltenbrunner wrote: hubert depesz lubaczewski wrote: On 3/4/06, Stefan Kaltenbrunner [EMAIL PROTECTED] wrote: forgot to mention that this is 8.1.3 compiled from source. Further testing shows that not only CREATE INDEX has some issue with large maintenance_work_mem settings: what does it show: cat /proc/sys/kernel/shmmax 1421326592 - not that I think it is related to the problem at all. It looks like I'm hitting the MaxAllocSize limit in src/include/utils/memutils.h.

I just tried to increase this limit to 4GB (from the default 1GB) and this seems to help a fair bit (ie. CREATE INDEX and VACUUM do seem to work with much higher maintenance_work_mem settings now). BUT: VACUUM VERBOSE (or VACUUM ANALYZE VERBOSE) breaks (with a rather gigantic allocation request ;-)):

foo=# VACUUM VERBOSE;
INFO: vacuuming information_schema.sql_features
ERROR: invalid memory alloc request size 18446744073709551615

Stefan
Re: [HACKERS] problem with large maintenance_work_mem settings and
Michael Paesold wrote: Stefan Kaltenbrunner wrote: hubert depesz lubaczewski wrote: On 3/4/06, Stefan Kaltenbrunner [EMAIL PROTECTED] wrote: forgot to mention that this is 8.1.3 compiled from source. Further testing shows that not only CREATE INDEX has some issue with large maintenance_work_mem settings: what does it show: cat /proc/sys/kernel/shmmax 1421326592 not that I think it is related to the problem at all. I can second that. Maintenance work mem is not allocated in shared memory. It looks like I'm hitting the MaxAllocSize limit in src/include/utils/memutils.h. There are two issues you have mentioned. This one is more obvious: the limitation of the memory that can be allocated. yes The other one is the very bad performance for index creation. I can only guess, but it sounds like this is related to the recent discussion on hackers about issues with the qsort performance. If the theory is true, your index creation should be much faster with a much lower setting for maintenance_work_mem, so that it uses an external sort. See the discussion starting here: http://archives.postgresql.org/pgsql-hackers/2006-02/msg00590.php I was following this thread - and it was partly a reason why I'm playing with this (the CREATE INDEX on that table finished after about 12 hours with a bit less than 2GB for maintenance_work_mem; for comparison, it took me only about 2,5 hours to create this table). I'm currently testing how long it takes with very low settings to force an external sort. Stefan
Re: [HACKERS] problem with large maintenance_work_mem settings and
Tom Lane wrote: Stefan Kaltenbrunner [EMAIL PROTECTED] writes: not that I think it is related to the problem at all. It looks like I'm hitting the MaxAllocSize limit in src/include/utils/memutils.h. just tried to increase this limit to 4GB (from the default 1GB) and this seems to help a fair bit. s/help a fair bit/break a whole lot of stuff/ There are reasons for that limit, and you can't just arbitrarily rejigger it. heh - sure, this is just a testbox so it was worth a try, and I don't care for the data anyway ... The sorting code probably needs a defense to keep it from trying to exceed MaxAllocSize for the SortObject array; AFAIR there is no such consideration there now, but it's easily added. I'm not sure where your VACUUM failure is coming from though --- can you get a backtrace from the errfinish call in that case? Like this (with maintenance_work_mem set to 200):

(gdb) bt
#0  errfinish (dummy=0) at elog.c:310
#1  0x005c6c93 in elog_finish (elevel=-4145840, fmt=0x84da50 "invalid memory alloc request size %lu") at elog.c:931
#2  0x005d96a0 in MemoryContextAlloc (context=0x8d9c58, size=204798) at mcxt.c:505
#3  0x004db947 in lazy_space_alloc (vacrelstats=0x8de5b0, relblocks=6) at vacuumlazy.c:963
#4  0x004dab33 in lazy_scan_heap (onerel=0x2ad69a589cc8, vacrelstats=0x8de5b0, Irel=0x0, nindexes=0) at vacuumlazy.c:240
#5  0x004da9d1 in lazy_vacuum_rel (onerel=0x2ad69a589cc8, vacstmt=0x8c0fd0) at vacuumlazy.c:157
#6  0x004d7325 in vacuum_rel (relid=2589498568, vacstmt=0x8c0fd0, expected_relkind=-27 'å') at vacuum.c:1077
#7  0x004d6990 in vacuum (vacstmt=0x8c0fd0, relids=0x0) at vacuum.c:449
#8  0x0055e871 in PortalRunUtility (portal=0x8e0360, query=0x8c0e00, dest=0x8c1050, completionTag=0x7fc0c410 "") at pquery.c:987
#9  0x0055eb07 in PortalRunMulti (portal=0x8e0360, dest=0x8c1050, altdest=0x8c1050, completionTag=0x7fc0c410 "") at pquery.c:1054
#10 0x0055e28f in PortalRun (portal=0x8e0360, count=9223372036854775807, dest=0x8c1050, altdest=0x8c1050, completionTag=0x7fc0c410 "") at pquery.c:665
#11 0x0055a3a1 in exec_simple_query (query_string=0x8c0cf0 "VACUUM VERBOSE;") at postgres.c:1002
#12 0x0055cc2c in PostgresMain (argc=4, argv=0x84c078, username=0x84c040 "postgres") at postgres.c:3217
#13 0x00538a71 in BackendRun (port=0x86add0) at postmaster.c:2853
#14 0x00538550 in BackendStartup (port=0x86add0) at postmaster.c:2497
#15 0x00536b4d in ServerLoop () at postmaster.c:1230
#16 0x00535fcf in PostmasterMain (argc=3, argv=0x8493c0) at postmaster.c:941
#17 0x004fcaa5 in main (argc=3, argv=0x8493c0) at main.c:265

Stefan
[HACKERS] EXPLAIN and HashAggregate
While playing around with large (or in this case a bit insane) work_mem and maintenance_work_mem settings I noticed that EXPLAIN behaves quite weirdly:

foo=# set work_mem to 20;
SET
Time: 0.187 ms
foo=# explain select count(*) from testtable2 group by a;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 GroupAggregate  (cost=8845616.04..9731787.89 rows=37349188 width=4)
   ->  Sort  (cost=8845616.04..8985385.04 rows=55907600 width=4)
         Sort Key: a
         ->  Seq Scan on testtable2  (cost=0.00..1088488.00 rows=55907600 width=4)
(4 rows)

Time: 0.364 ms
foo=# set work_mem to 250;
SET
Time: 0.195 ms
foo=# explain select count(*) from testtable2 group by a;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 HashAggregate  (cost=1368026.00..1834890.85 rows=37349188 width=4)
   ->  Seq Scan on testtable2  (cost=0.00..1088488.00 rows=55907600 width=4)
(2 rows)

Time: 615.108 ms

It looks like PostgreSQL is actually allocating the memory for the hash table of the HashAggregate, which is a bit unexpected for a plain EXPLAIN.

Stefan
Re: [HACKERS] problem with large maintenance_work_mem settings and
Tom Lane wrote: Stefan Kaltenbrunner [EMAIL PROTECTED] writes: forgot to mention that this is 8.1.3 compiled from source. See the discussion starting here: http://archives.postgresql.org/pgsql-hackers/2006-02/msg00590.php I was following this thread - and it was partly a reason why I'm playing with this (the CREATE INDEX on that table finished after about 12 hours with a bit less than 2GB for maintenance_work_mem; for comparison, it took me only about 2,5 hours to create this table). It would be interesting to try the same test with CVS tip to see if the sorting improvements Simon and I made over the past few weeks help much. Playing with CVS tip right now - it is a bit faster for both the initial bulk loading (about 5%) and for the CREATE INDEX itself (11h30min vs 11h54min), though not a dramatic improvement. Stefan
Re: [HACKERS] problem with large maintenance_work_mem settings and
Tom Lane wrote: I wrote: Stefan Kaltenbrunner [EMAIL PROTECTED] writes:

samples    %        symbol name
24915704   96.2170  ltsReleaseBlock

We probably need to tweak things so this doesn't get called during the final merge pass. Looking at it now. I've committed a fix for this into CVS HEAD --- please try it out. Just tried that with CVS HEAD (includes the second fix too): CREATE INDEX on a 1,8B row table (5 int columns - index created on the first column, about 300M distinct values):

before: 11h 51min
after: 3h 11min (!)

Stefan
Re: [HACKERS] problem with large maintenance_work_mem settings and
Tom Lane wrote: Stefan Kaltenbrunner [EMAIL PROTECTED] writes: CREATE INDEX on a 1,8B row table (5 int columns - index created on the first column, about 300M distinct values): before: 11h 51min after: 3h 11min (!) Cool. Does it seem to be I/O bound now? Would you be willing to do it over with oprofile turned on? While it now does a fair amount of IO during the whole operation, it is not yet IO-bound AFAICS. Profile:

samples    %        symbol name
103520432  47.9018  inlineApplySortFunction
33382738   15.4471  comparetup_index
25296438   11.7054  tuplesort_heap_siftup
10089122    4.6685  btint4cmp
8395676     3.8849  LogicalTapeRead
2873556     1.3297  tuplesort_heap_insert
2796545     1.2940  tuplesort_gettuple_common
2752345     1.2736  AllocSetFree
2233889     1.0337  IndexBuildHeapScan
2035265     0.9418  heapgettup
1571035     0.7270  LWLockAcquire
1498800     0.6935  readtup_index
1213587     0.5616  index_form_tuple
1097172     0.5077  AllocSetAlloc
1056964     0.4891  heap_fill_tuple
1041172     0.4818  btbuildCallback
990005      0.4581  LWLockRelease
897662      0.4154  slot_deform_tuple
858527      0.3973  LogicalTapeWrite
806849      0.3734  PageAddItem
764136      0.3536  LockBuffer

trace_sort:
LOG: begin index sort: unique = f, workMem = 2048000, randomAccess = f
LOG: switching to external sort with 7315 tapes: CPU 4.07s/13.70u sec elapsed 17.79 sec
LOG: finished writing run 1 to tape 0: CPU 240.07s/3926.66u sec elapsed 4498.49 sec
LOG: performsort starting: CPU 535.66s/8138.92u sec elapsed 9435.11 sec
LOG: finished writing final run 2 to tape 1: CPU 538.54s/8242.23u sec elapsed 9541.55 sec
LOG: performsort done (except final merge): CPU 539.39s/8254.83u sec elapsed 9559.75 sec
LOG: external sort ended, 4398827 disk blocks used: CPU 768.38s/10027.39u sec elapsed 11884.63 sec

Stefan
Re: [HACKERS] problem with large maintenance_work_mem settings and
Tom Lane wrote: Stefan Kaltenbrunner [EMAIL PROTECTED] writes:

samples    %        symbol name
103520432  47.9018  inlineApplySortFunction
33382738   15.4471  comparetup_index
25296438   11.7054  tuplesort_heap_siftup
10089122    4.6685  btint4cmp
8395676     3.8849  LogicalTapeRead
2873556     1.3297  tuplesort_heap_insert
2796545     1.2940  tuplesort_gettuple_common
2752345     1.2736  AllocSetFree
2233889     1.0337  IndexBuildHeapScan

Interesting. What's the platform, and what compiler exactly? For me, gcc seems to inline inlineApplySortFunction successfully, but your compiler evidently is not doing that. Debian Sarge/AMD64 with gcc version 3.3.5 (Debian 1:3.3.5-13), running on a Dual AMD Opteron 280 (so 4 cores @2,4GHz) with 16GB of RAM and a very recent kernel. Debian has gcc 3.4 as an optional package in Sarge too, so I can certainly try that one. [...] Your machine seems not to be subject to nearly the same amount of memory delay. Which is interesting, because most of the argument for changing sort algorithms seems to hinge on the assumption that main-memory delay is the main thing we need to worry about. That looks to be valid on the Xeon I'm testing but not on your machine ... Hmm, very interesting - Opterons are known for their very high memory bandwidth, and some (rather limited) testing using various benchmark tools against a 3,2GHz DP Xeon with 2MB L2 cache shows that the Opteron box really has a significant advantage here ... Stefan
Re: [HACKERS] problem with large maintenance_work_mem settings and
Tom Lane wrote: Stefan Kaltenbrunner [EMAIL PROTECTED] writes:

LOG: begin index sort: unique = f, workMem = 8024000, randomAccess = f
LOG: switching to external sort with 28658 tapes: CPU 4.18s/13.96u sec elapsed 32.43 sec
LOG: finished writing run 1 to tape 0: CPU 173.56s/3425.85u sec elapsed 3814.82 sec
LOG: performsort starting: CPU 344.17s/7013.20u sec elapsed 7715.45 sec
LOG: finished writing final run 2 to tape 1: CPU 347.19s/7121.78u sec elapsed 7827.25 sec
LOG: performsort done (except 2-way final merge): CPU 348.25s/7132.99u sec elapsed 7846.47 sec

after that the postmaster has now been consuming 99% CPU for about 22 hours (!). I'll look into it, but I was already wondering if we shouldn't bound the number of tapes somehow. It's a bit hard to believe that 28000 tapes is a sane setting. Heh - I don't think it is a sane setting either (and it doesn't look like pg is using more than 2GB anyway). If this testing helps with defining appropriate upper bounds to prevent bad behaviour like this (not responding to signals any more and eating CPU like mad) I'm more than happy. And the ltsReleaseBlock fix already reduced dump/restore times for one of our production databases by about 15%, which is quite an impressive improvement on its own ;-) Stefan
[HACKERS] random observations while testing with a 1,8B row table
Hi all! During my testing of large work_mem and maintenance_work_mem settings wrt CREATE INDEX and sorting, I encountered a number of things wrt doing various operations on such a large table (about 106GB on disk with no dead tuples). I will summarize some of them just in case somebody is interested:

- table used has 5 integer columns, non-indexed during the loads
- hardware is a Dual Opteron 280 with 4 cores @2,4GHz and 16GB RAM; data is on a multipathed (busy) SAN with different (RAID 10) arrays for WAL and data.

1. data loading - I'm using COPY with batches of 300M rows:

*) with one COPY running it takes about 20 minutes/batch to load the data (~250k rows/sec) and virtually no context switches.
*) with two COPYs running concurrently it takes a bit less than 30 minutes/batch and a steady 40k context switches/sec (~340k rows/sec overall)
*) with three COPYs it takes about 40min/batch at 140k context switches/sec (380k rows/sec overall)

the profiles for those runs look very similar to:

samples   %        symbol name
5065118   20.9607  XLogInsert
3496868   14.4709  DoCopy
2807313   11.6174  CopyReadLine
1373621    5.6844  PageAddItem
1227069    5.0779  heap_formtuple
1193319    4.9383  LWLockAcquire
894243     3.7006  hash_search
717427     2.9689  LWLockRelease
699359     2.8941  pg_atoi
691385     2.8611  FunctionCall3
640383     2.6501  heap_insert
579331     2.3974  int4in
411286     1.7020  AllocSetReset
376452     1.5579  hash_any
349220     1.4452  RelationGetBufferForTuple
261568     1.0824  AllocSetAlloc
257511     1.0656  ReadBuffer

while the amount of IO going on is quite a lot, it looks like we are still mostly CPU-bound for COPY.

2. updating all of the rows in the table: I updated all of the rows in the table with a simple UPDATE testtable set a=a+1; this took about 2,5 hours (~200k rows/sec) with a profile looking like:

samples   %        symbol name
27860285  26.5844  XLogInsert
4828077    4.6070  PageAddItem
4490535    4.2849  heap_update
4267647    4.0722  slot_deform_tuple
3996750    3.8137  LWLockAcquire
3716184    3.5460  slot_getattr
3454679    3.2965  hash_search
2998742    2.8614  hash_any
2909261    2.7760  heap_fill_tuple
2825256    2.6959  LWLockRelease
2283086    2.1785  LockBuffer
2135048    2.0373  ExecTargetList
1636017    1.5611  ExecEvalVar
1632377    1.5576  UnpinBuffer
1566087    1.4944  RelationGetBufferForTuple
1561378    1.4899  ExecMakeFunctionResultNoSets
1511366    1.4421  ReadBuffer
1381614    1.3183  heap_compute_data_size

3. vacuuming this table - it turned out that VACUUM FULL is completely unusable on a table of this size (which I actually expected before), not only due to the locking involved but rather due to a gigantic memory requirement and unbelievable slowness. It seems that the heap-scan part of VACUUM FULL completed after about 2 hours, ending up with a postmaster having a resident size of about 8,5GB (!!!) with maintenance_work_mem set to 1GB. The profile for this stage looks like:

samples   %        symbol name
941058    26.0131  scan_heap
35        12.2852  HeapTupleSatisfiesVacuum
242117     6.6927  TransactionIdIsInProgress
220044     6.0825  _mdfd_getseg
212571     5.8760  hash_search
186963     5.1681  TransactionIdPrecedes
176016     4.8655  SetBufferCommitInfoNeedsSave
137668     3.8055  TransactionIdDidCommit
137068     3.7889  PageRepairFragmentation
111474     3.0814  TransactionLogFetch
103814     2.8697  LWLockAcquire
102925     2.8451  LWLockRelease
102456     2.8321  hash_any
67199      1.8575  BufferAlloc

after that the postmaster started slowly consuming more and more memory, doing virtually no IO and eating CPU like mad, with a profile similar to:

samples     %        symbol name
2708391248  94.1869  repair_frag
155395833    5.4040  enough_space
5707137      0.1985  XLogInsert
1410703      0.0491  PageAddItem
691616       0.0241  BgBufferSync

I actually ended up canceling the VACUUM FULL after about 50 hours of runtime with a resident size of ~11,5GB.

Stefan
Re: [HACKERS] random observations while testing with a 1,8B row table
Luke Lonergan wrote: Stefan, On 3/10/06 9:40 AM, Stefan Kaltenbrunner [EMAIL PROTECTED] wrote: I will summarize some of the just in case somebody is interested: I am! heh - not surprised :-) - table used has 5 integer columns non-indexed during the loads - hardware is a Dual Opteron 280 with 4 [EMAIL PROTECTED],4GHz and 16GB RAM, data is on a multipathed (busy) SAN with different (RAID 10) Arrays for WAL and data. How many connections out of the machine? How many disks behind each LUN? 2 HBAs in the server, 2x2 possible paths to each LUN. 6 disks for the WAL and 12 disks for the data So - about 20 Bytes per row (5*4) unless those are int8, but on disk it's 108GB/1.8B = 60 Bytes per row on disk. I wonder what all that overhead is? 1. data loading - I'm using COPY with batches of 300M rows it takes *) with one copy running it takes about 20minutes/batch to load the data (~250k rows/sec) and virtually no context switches. *) with two copys running concurrently it takes a bit less then 30 minutes/batch and a steady 40k context switches/sec (~340k rows/sec overall) *) with three copy it takes about 40min/batch at 140k context switches/sec (380k rows/sec overall) So, from 15 MB/s up to about 20 MB/s. while the amount of IO going on is quite a lot it looks like we are still mostly CPU-bound for COPY. It's what we see almost always. In this case if your I/O configuration is capable of performing at about 3x the 20MB/s max parsing rate, or 60MB/s, you will be CPU limited. the IO-System I use should be capable of doing that if pushed hard enough :-) The 3x is approximate, and based on observations, the reasoning underneath it is that Postgres is writing the data several times, once to the WAL, then from the WAL to the heap files. 2. updating all of the rows in the table: I updated all of the rows in the table with a simple UPDATE testtable set a=a+1; this took about 2,5 hours (~200rows/sec) Ugh. 
This is where Bizgres MPP shines, I'll try to recreate your test and post results. This scales linearly in Bizgres MPP with the number of disks and CPUs available, but I would hope for much more than that. interesting to know, but still I'm testing/playing with postgresql here, not Bizgres MPP ... 3. vacuuming this table - it turned out that VACUUM FULL is completely unusable on a table of this size (which I actually expected before), not only due to the locking involved but rather due to a gigantic memory requirement and unbelievable slowness. Simple vacuum should be enough IMO. sure, that was mostly meant as an experiment; if I had to do this on a production database I would most likely use CLUSTER to get the desired effect (which in my case was purely getting back the disk space wasted by dead tuples) Stefan ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
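A rough sketch of the CLUSTER approach mentioned above (the index name is hypothetical, since the test table was loaded without indexes, and the syntax shown is the pre-8.3 form current at the time):

```sql
-- CLUSTER rewrites the heap in index order, leaving dead tuples behind
-- and thereby returning the wasted disk space. The caveats from the
-- thread apply: it takes an exclusive lock and needs roughly 2x the
-- table's size on disk while it runs.
CREATE INDEX test_idx ON testtable (a);  -- hypothetical helper index
CLUSTER test_idx ON testtable;
ANALYZE testtable;
```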
Re: [HACKERS] random observations while testing with a 1,8B row table
Luke Lonergan wrote: Stefan, On 3/10/06 11:48 AM, Stefan Kaltenbrunner [EMAIL PROTECTED] wrote: 2 HBAs in the server, 2x2 possible paths to each LUN. 6 disks for the WAL and 12 disks for the data So - you have 18 disks worth of potential bandwidth, not factoring loss due to RAID. That's roughly 18 * 60 = 1,080 MB/s. If we organized that into four banks, one for each CPU, and made each one RAID5 and left two disks for spares, you'd have 12 disks working for you at 720MB/s, which is possibly double the number of active FC channels you have, unless they are all active, in which case you have a nicely matched 800MB/s of FC. wrong (or rather extremely optimistic): the array itself only has two (redundant) FC-loops (@2GB) to the attached expansion chassis. The array has 2 active/active controllers (with a failover penalty) with two host interfaces each; furthermore it has write-cache mirroring (to the standby controller) enabled, which means the traffic has to go over the internal FC-loop too. beside that, the host (as I said) itself only has two HBAs @2GB which are configured for failover, which limits the host's maximum available bandwidth to less than 200MB/s per LUN. So, from 15 MB/s up to about 20 MB/s. Gee - seems a long distance from 700 MB/s potential :-) well the array is capable of about 110MB/s write per controller head (a bit more than half the possible, due to write mirroring being enabled, which uses delta-synchronisation). WAL and data are on different controllers though by default. the IO-System I use should be capable of doing that if pushed hard enough :-) I would expect some 10x this if configured well. see above ... interesting to know, but still I'm testing/playing with postgresql here not bizgres MPP ... Sure. Still, what I'd expect is something like 10x this update rate using the parallelism buried in your hardware. If you configure the same machine with 4 Bizgres MPP segments running on 4 LUNs I think you'd be shocked at the speedups. 
that might be true, though it might sound a bit harsh, I really prefer to spend the small amount of spare time I have with testing (and helping to improve if possible) postgresql than playing with a piece of commercial software I'm not going to use anyway ... Stefan ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] random observations while testing with a 1,8B row table
Luke Lonergan wrote: Stefan, On 3/10/06 12:23 PM, Stefan Kaltenbrunner [EMAIL PROTECTED] wrote: wrong (or rather extremely optimistic): the array itself only has two (redundant) FC-loops (@2GB) to the attached expansion chassis. The array has 2 active/active controllers (with a failover penalty) with two host interfaces each; furthermore it has write-cache mirroring (to the standby controller) enabled, which means the traffic has to go over the internal FC-loop too. beside that, the host (as I said) itself only has two HBAs @2GB which are configured for failover, which limits the host's maximum available bandwidth to less than 200MB/s per LUN. Wow - the ickiness of SAN from a performance / value standpoint never ceases to astound me. Well, while you make it sound a bit like that, performance is not everything. One has to factor in manageability, scalability (in terms of future upgrades using the same platform and such) and high-availability features too. With that in mind a SAN (or a NAS - depends on the actual use cases) suddenly looks much more interesting than plain old DASD. Gee - seems a long distance from 700 MB/s potential :-) well the array is capable of about 110MB/s write per controller head (a bit more than half the possible, due to write mirroring being enabled, which uses delta-synchronisation). WAL and data are on different controllers though by default. So - you're getting 20MB/s on loading from a potential of 200MB/s? no - I can write 110MB/s on the WAL LUN and 110MB/s on the other LUN concurrently. I would expect some 10x this if configured well. see above ... OTOH - configured well could include taking the disks out of the smart (?) chassis, plugging them into a dumb chassis and deploying 2 dual channel U320 SCSI adapters - total cost of about $3,000. as I said above, even if that would work (it does not, because the disks have FC-connectors) I would lose a LOT of features like being able to use the SAN for more than a single host (big one!) 
or doing firmware-upgrades without downtime, using SAN-replication, having cable-lengths exceeding 12m (makes it possible to place parts of the infrastructure at remote sites), out-of-band management, scriptable(!), ... Beside that, sequential IO, as you are propagating everywhere, is NOT the holy grail or the sole solution to a fast database. While the SAN above really is not a screamer for that kind of application, it is actually a very good performer (compared with some of the DASD based boxes) under heavy random IO and concurrent load. This has a direct measurable influence on the overall speed of our production applications, which are mostly OLTP ;-) that might be true, though it might sound a bit harsh, I really prefer to spend the small amount of spare time I have with testing (and helping to improve if possible) postgresql than playing with a piece of commercial software I'm not going to use anyway ... No problem - that's our job anyway - to make the case for Postgres' use in typical large scale use-cases like the one you describe. yep Stefan ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] random observations while testing with a 1,8B row table
Tom Lane wrote: Stefan Kaltenbrunner [EMAIL PROTECTED] writes: 3. vacuuming this table - it turned out that VACUUM FULL is completely unusable on a table of this size (which I actually expected before), not only due to the locking involved but rather due to a gigantic memory requirement and unbelievable slowness. sure, that was mostly meant as an experiment; if I had to do this on a production database I would most likely use CLUSTER to get the desired effect (which in my case was purely getting back the disk space wasted by dead tuples) Yeah, the VACUUM FULL algorithm is really designed for situations where just a fraction of the rows have to be moved to re-compact the table. It might be interesting to teach it to abandon that plan and go to a CLUSTER-like table rewrite once the percentage of dead space is seen to reach some suitable level. CLUSTER has its own disadvantages though (2X peak disk space usage, doesn't work on core catalogs, etc). hmm, very interesting idea - I for myself like it, but from what I have seen people quite often use VACUUM FULL to get their disk usage down _because_ they are running low on space (and because it's not that well known that CLUSTER could be much faster) - maybe we should add a note/hint about this to the maintenance/vacuum docs at least ? Stefan ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] problem with large maintenance_work_mem settings and
Tom Lane wrote: I wrote: Stefan Kaltenbrunner [EMAIL PROTECTED] writes:

samples    %        symbol name
350318533  98.8618  mergepreread
971822     0.2743   tuplesort_gettuple_common
413674     0.1167   tuplesort_heap_siftup

I don't have enough memory to really reproduce this, but I've come close enough that I believe I see what's happening. It's an artifact of the code I added recently to prevent the SortTuple array from growing during the merge phase, specifically the mergeslotsfree logic. I've committed a fix for this; if you're still interested, please retest. This seems to have fixed the problem - mergepreread is gone from the profile and the CREATE INDEX finished in about 2h 37 minutes:

LOG: begin index sort: unique = f, workMem = 8024000, randomAccess = f
LOG: switching to external sort with 28658 tapes: CPU 4.68s/11.67u sec elapsed 16.36 sec
LOG: finished writing run 1 to tape 0: CPU 303.18s/3324.51u sec elapsed 3858.55 sec
LOG: performsort starting: CPU 611.37s/6829.75u sec elapsed 7830.90 sec
LOG: finished writing final run 2 to tape 1: CPU 614.15s/6928.92u sec elapsed 7933.38 sec
LOG: performsort done (except 2-way final merge): CPU 615.26s/6940.77u sec elapsed 7951.58 sec
LOG: external sort ended, 4398827 disk blocks used: CPU 827.45s/8519.86u sec elapsed 10046.31 sec

profile for this run looks like: CPU: AMD64 processors, speed 2405.5 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 10

samples   %        symbol name
77645525  39.6040  comparetup_index
47024448  23.9854  btint4cmp
22508630  11.4808  tuplesort_heap_siftup
12657874  6.4563   LogicalTapeRead
4049912   2.0657   tuplesort_heap_insert
3025537   1.5432   tuplesort_gettuple_common
2922149   1.4905   AllocSetFree
2499669   1.2750   readtup_index
1725984   0.8804   AllocSetAlloc
1318344   0.6724   LWLockAcquire
1298245   0.6622   PageAddItem
1271657   0.6486   heapgettup

Stefan ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] ERROR: record type has not been registered on CVS head
While trying to help somebody on IRC with slow queries against information_schema I stumbled across the following EXPLAIN buglet (much reduced from the original one, and therefore not making a lot of sense): foo=# explain SELECT * FROM information_schema.constraint_column_usage JOIN information_schema.key_column_usage ON key_column_usage.constraint_name = constraint_column_usage.constraint_name; ERROR: record type has not been registered Stefan ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] problems compiling CVS HEAD - LDAP auth and Kerberos
Tom Lane wrote: Albe Laurenz [EMAIL PROTECTED] writes: Compiling src/interfaces/libpq/fe-auth.c on Linux I get gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -fno-strict-aliasing -I../../../src/include -D_GNU_SOURCE -I/usr/kerberos/include -c -o auth.o auth.c auth.c:793:1: directives may not be used inside a macro argument auth.c:792:56: unterminated argument list invoking macro ereport auth.c: In function `CheckLDAPAuth': auth.c:794: warning: implicit declaration of function `ereport' auth.c:798: syntax error before ')' token Fixed, thanks for the report. Anybody want to enable the LDAP code on any of the buildfarm machines? This shoulda been noticed sooner. added --with-ldap to lionfish and spoonbill Stefan ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Number of dimensions of an array parameter
Thomas Hallgren wrote: I can create a function that takes a two dimension int array: CREATE FUNCTION twodims(int[][]) RETURNS void AS ... but there's nothing stopping me from calling this function with an arbitrary number of dimensions on the array. I'd like to map a parameter like the one above to a corresponding representation in Java (it would be int[][] there too). As it turns out, I can't do that. PostgreSQL will not store any information that can tell me how many dimensions were used in the declaration, i.e. it's impossible to write a language VALIDATOR that, based on the information in pg_proc, builds a signature where the number of dimensions is reflected. This leaves me with two choices: Only allow arrays with one dimension unless the parameter is of a domain type (domains are apparently stored with the actual number of dimensions). Any call that uses an array parameter with more than one dimension will yield an exception. --OR-- Always map to Object[] instead of mapping to the correct type. This will work since an array in Java is also an Object and all primitive types can be represented as objects (i.e. int can be a java.lang.Integer). The strong typing and the ability to use primitives are lost however. I'm leaning towards #1 and hoping that PostgreSQL will enhance the parameter type declarations to include the dimensions in future releases. ... After some more testing ... Unfortunately, I run into problems even when I use domains. Consider the following: thhal=# CREATE DOMAIN twodims as int[][]; CREATE DOMAIN thhal=# SELECT typndims FROM pg_type WHERE typname = 'twodims'; typndims ---------- 2 (1 row) thhal=# SELECT array_dims('{{{1,2,3},{3,4,3}},{{5,3,2},{9,9,9}}}'::twodims); array_dims ----------------- [1:2][1:2][1:3] (1 row) IMO, there is something seriously wrong here. Clearly the number of dimensions is a property of the type. Any array with a different number of dimensions should yield an error or at least be coerced into the right number of dimensions. 
while it would be nice to improve that - it is actually documented quite clearly. http://www.postgresql.org/docs/current/static/arrays.html has: However, the current implementation does not enforce the array size limits — the behavior is the same as for arrays of unspecified length. Actually, the current implementation does not enforce the declared number of dimensions either. Arrays of a particular element type are all considered to be of the same type, regardless of size or number of dimensions. So, declaring number of dimensions or sizes in CREATE TABLE is simply documentation, it does not affect run-time behavior. Stefan ---(end of broadcast)--- TIP 6: explain analyze is your friend
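Given that the declared dimensionality is only documentation, one possible workaround (a sketch, not something the docs promise) is a domain with an explicit CHECK constraint, since array_ndims() reports a value's actual number of dimensions at run time:

```sql
-- A domain whose CHECK constraint rejects anything but 2-D int arrays;
-- the domain name is illustrative.
CREATE DOMAIN twodims_checked AS int[]
    CHECK (array_ndims(VALUE) = 2);

-- '{{1,2},{3,4}}'::twodims_checked                    -- accepted
-- '{{{1,2},{3,4}},{{5,6},{7,8}}}'::twodims_checked    -- rejected by the CHECK
```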
[HACKERS] hashagg, statistics and excessive memory allocation
Hi! on irc somebody complained yesterday that a simple group by on a 25M row integer table caused his backend to exhaust the 3GB process limit on his 32-bit build (on a box with 16GB RAM). Some testing showed that the planner was seriously underestimating the number of distinct rows in the table (with the default statistics target it estimated ~150k rows while there are about 19M distinct values) and choosing a hashagg for the aggregate. upping the statistics target to 1000 improves the estimate to about 5M rows, which unfortunately is still not enough to cause the planner to switch to a groupagg with work_mem set to 256000. Some testing seems to indicate that even with perfectly matching stats like (8.1.3 here): foo=# create table testtable AS select a from generate_series(1,500) as a; SELECT foo=# CREATE INDEX test_idx on testtable (a); CREATE INDEX foo=# ANALYZE ; ANALYZE foo=# explain select a,count(*) from testtable group by a;

                               QUERY PLAN
-------------------------------------------------------------------------
 HashAggregate  (cost=97014.73..159504.51 rows=4999182 width=4)
   ->  Seq Scan on testtable  (cost=0.00..72018.82 rows=4999182 width=4)
(2 rows)

it will use about 2,5x of what work_mem is set to; while that is partly expected, it seems quite dangerous that one can, even with only moderate underestimation of the expected result count (say 2x or 4x), run a server out of memory. Stefan ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
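The statistics-target experiment described above corresponds to roughly the following commands (a sketch against the message's test table):

```sql
-- Raise the per-column statistics target so ANALYZE samples more rows
-- and produces a better n_distinct estimate for the planner.
ALTER TABLE testtable ALTER COLUMN a SET STATISTICS 1000;
ANALYZE testtable;

-- Inspect the resulting estimate (a negative value means a fraction
-- of the table's row count rather than an absolute number):
SELECT n_distinct FROM pg_stats
 WHERE tablename = 'testtable' AND attname = 'a';
```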
Re: [HACKERS] Going for all green buildfarm results
Tom Lane wrote: I've been making another pass over getting rid of buildfarm failures. The remaining ones I see at the moment are: firefly HEAD: intermittent failures in the stats test. We seem to have fixed every other platform back in January, but not this one. kudu HEAD: one-time failure 6/1/06 in statement_timeout test, never seen before. Is it possible the system was under enough load that the 1-second timeout fired before control reached the exception block? [...] FWIW: lionfish had a weird make check error 3 weeks ago which I (unsuccessfully) tried to reproduce multiple times after that: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2006-05-12%2005:30:14 [...] cobra, stoat, sponge 7.4: pilot error. Either install Tk or configure --without-tk. sorry for that, but the issue with sponge on 7.4 was fixed nearly a week ago, though there have been no changes until today to trigger a new build ;-) Stefan ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] COPY (query) TO file
Andrew Dunstan wrote: Mark Woodward wrote: Tom had posted a question about file compression with copy. I thought about it, and I want to throw this out and see if anyone thinks it is a good idea. Currently, the COPY command only copies a table, what if it could operate with a query, as: COPY (select * from mytable where foo='bar') as BAR TO stdout Isn't this already being worked on? The TODO list says: Allow COPY to output from views Another idea would be to allow actual SELECT statements in a COPY. Personally I strongly favor the second option as being more flexible than the first. I second that - allowing arbitrary SELECT statements as a COPY source seems much more powerful and flexible than just supporting COPY FROM VIEW. Stefan ---(end of broadcast)--- TIP 6: explain analyze is your friend
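For illustration, the second option would presumably look like this (a sketch of syntax that did not yet exist at the time of the message; mytable and foo are taken from the example above):

```sql
-- Use an arbitrary SELECT, rather than a bare table name, as COPY's source.
COPY (SELECT * FROM mytable WHERE foo = 'bar') TO STDOUT;
```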
Re: [HACKERS] Going for all green buildfarm results
Tom Lane wrote: Stefan Kaltenbrunner [EMAIL PROTECTED] writes: FWIW: lionfish had a weird make check error 3 weeks ago which I (unsuccessfully) tried to reproduce multiple times after that: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2006-05-12%2005:30:14 Weird. SELECT ''::text AS eleven, unique1, unique2, stringu1 FROM onek WHERE unique1 < 50 ORDER BY unique1 DESC LIMIT 20 OFFSET 39; ! ERROR: could not open relation with OID 27035 AFAICS, the only way to get that error in HEAD is if ScanPgRelation can't find a pg_class row with the mentioned OID. Presumably 27035 belongs to onek or one of its indexes. The very next command also refers to onek, and doesn't fail, so what we seem to have here is a transient lookup failure. We've found a btree bug like that once before ... wonder if there's still one left? If there is still one left it must be quite hard to trigger (using the regression tests). Like I said before - I tried quite hard to reproduce the issue back then - without any success. Stefan ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] bison version
ohp@pyrenet.fr wrote: Hi, I'd like to check 2 things: What's the bison version required to compile gram.y ? Trying to set up a buildfarm machine, it seems I can't compile with bison 2.1 ... 2.1 should work fine - there are a number of boxes on the buildfarm running with that version (like sponge, the FC5/ppc box I own). What exact problem do you see on your platform ? Also where is the documentation page that gives postgresql limits (number of columns/table, max size of col)? http://www.postgresql.org/docs/faqs.FAQ.html#item4.4 Stefan ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Test request for Stats collector performance improvement
Bruce Momjian wrote: Would some people please run the attached test procedure and report back the results? I basically need to know the patch is an improvement on more platforms than just my own. Thanks OpenBSD 3.9-current/x86:

without stats:         0m6.79s real   0m1.56s user  0m1.12s system
-HEAD + stats:         0m10.44s real  0m2.26s user  0m1.22s system
-HEAD + stats + patch: 0m10.68s real  0m2.16s user  0m1.36s system

Stefan ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Test request for Stats collector performance improvement
Bruce Momjian wrote: Would some people please run the attached test procedure and report back the results? I basically need to know the patch is an improvement on more platforms than just my own. Thanks Debian Sarge/AMD64 Kernel 2.6.16.16 (all tests done multiple times with variation of less than 10%):

-HEAD: real 0m0.486s user 0m0.064s sys 0m0.048s
-HEAD with 10 SELECT 1; queries: real 0m4.763s user 0m0.896s sys 0m1.232s
-HEAD + stats: real 0m0.720s user 0m0.128s sys 0m0.096s
-HEAD + stats (100k): real 0m7.204s user 0m1.504s sys 0m1.028s
-HEAD + stats + patch: there is something weird going on here - I get either runtimes like: real 0m0.729s user 0m0.092s sys 0m0.100s and occasionally: real 0m3.926s user 0m0.144s sys 0m0.140s (always ~0,7 vs ~4 seconds - same variation as Qingqing Zhou seems to see)
-HEAD + stats + patch (100k): similar variation with: real 0m7.955s user 0m1.124s sys 0m1.164s and real 0m11.836s user 0m1.368s sys 0m1.156s (i.e. 7-8 seconds vs 11-12 seconds)

looks like this patch is actually a loss on that box. Stefan ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Test request for Stats collector performance improvement
Bruce Momjian wrote: OK, based on reports I have seen, generally stats_query_string adds 50% to the total runtime of a SELECT 1 query, and the patch reduces the overhead to 25%. that is actually not true for both of the platforms (a slow OpenBSD 3.9/x86 and a very fast Linux/x86_64) I tested on. Both of them show virtually no improvement with the patch, and even worse, it causes considerable (negative) variance on at least the Linux box. However, that 25% is still much too large. Consider that SELECT 1 has to travel from psql to the server, go through the parser/optimizer/executor, and then return; it is clearly wrong that the stats_query_string performance hit should be measurable. I am actually surprised that so few people in the community are concerned about this. While we have lots of people studying large queries, these small queries should also get attention from a performance perspective. I have created a new test that also turns off writing of the stats file. This will not pass regression tests, but it will show the stats write overhead. will try to run those too in a few. Stefan ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Test request for Stats collector performance improvement
Bruce Momjian wrote: Stefan Kaltenbrunner wrote: Bruce Momjian wrote: OK, based on reports I have seen, generally stats_query_string adds 50% to the total runtime of a SELECT 1 query, and the patch reduces the overhead to 25%. that is actually not true for both of the platforms (a slow OpenBSD 3.9/x86 and a very fast Linux/x86_64) I tested on. Both of them show virtually no improvement with the patch, and even worse, it causes considerable (negative) variance on at least the Linux box. I see the results I suggested on OpenBSD that you reported. OpenBSD 3.9-current/x86: without stats: 0m6.79s real 0m1.56s user 0m1.12s system -HEAD + stats: 0m10.44s real 0m2.26s user 0m1.22s system -HEAD + stats + patch: 0m10.68s real 0m2.16s user 0m1.36s system yep, those are very stable even over a large number of runs, and I got similar results reported from a Debian: Linux 2.6.16 on a single processor HT 2.8Ghz Pentium compiled with gcc 4.0.4. real 0m3.306s real 0m4.905s real 0m4.448s I am unclear on the cause for the widely varying results you saw in Debian. I can reproduce the widely varying results on a number of x86 and x86_64 based Linux boxes here (Debian, Fedora and CentOS), though I cannot reproduce it on a Fedora Core 5/ppc box. All the x86 boxes are SMP - while the ppc one is not - that might have some influence on the results. Stefan ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] regresssion script hole
Michael Fuhr wrote: On Sun, Jun 18, 2006 at 07:18:07PM -0600, Michael Fuhr wrote: Maybe I'm misreading the packet, but I think the query is for ''kaltenbrunner.cc (two single quotes followed by kaltenbrunner.cc) Correction: ''.kaltenbrunner.cc yes, that is exactly the issue - the postmaster tries to resolve ''.kaltenbrunner.cc multiple times during startup and gets ServFail as a response from the upstream resolver. Stefan ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] regresssion script hole
Andrew Dunstan wrote: Tom Lane wrote: Anyway, the tail end of the trace shows it repeatedly sending off a UDP packet and getting practically the same data back: I'm not too up on what the DNS protocol looks like on-the-wire, but I'll bet this is it. I think it's trying to look up kaltenbrunner.cc and failing. Why are we actually looking up anything? Just so we can bind to a listening socket? Anyway, maybe the box needs a lookup line in its /etc/resolv.conf to direct it to use files first, something like lookup file bind Stefan, can you look into that? It would be a bit ugly if it's calling DNS (and failing) to resolve localhost. no - resolving localhost works fine (both using /etc/hosts and through the dns-resolver) - and I in fact verified that when we initially started to investigate that issue a while ago :-) Stefan ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] regresssion script hole
Martijn van Oosterhout wrote: On Mon, Jun 19, 2006 at 09:21:21AM -0400, Tom Lane wrote: Of course the $64 question is *why* is 8.0 trying to resolve that name, particularly seeing that the later branches apparently aren't. The formatting of the message suggests it is a gethostbyname('') doing it. Did any quoting rules change between 8.0 and 8.1 w.r.t. the configuration files? I tcpdump'd the dns-traffic on that box during a postmaster startup and it's definitely trying to look up ''.kaltenbrunner.cc a lot of times. And from what it looks like it might be getting somehow rate limited by my ISP's recursive resolvers after doing the same query dozens of times and getting a servfail every time. At least the timestamps seem to indicate that the responses are getting delayed up to 10 seconds after a number of queries ... It might be a complete shot in the dark, but spoonbill worked fine on REL_8_0_STABLE until I disabled reporting 3 months ago. During this time the large escaping security fix/standard_strings patch went in - could this be related in any way ? Stefan ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] regresssion script hole
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: The question isn't whether is succeeds, it's how long it takes to succeed. When I increased the pg_regress timeout it actually went through the whole regression test happily. I suspect we have 2 things eating up the 60s timeout here: loading the timezone db and resolving whatever it is we are trying to resolve. The behavior of loading the whole TZ database was there for awhile before anyone noticed; I believe it could only be responsible for a few seconds. So the failed DNS responses must be the problem. Could we get a ktrace with timestamps on the syscalls to confirm that? Of course the $64 question is *why* is 8.0 trying to resolve that name, particularly seeing that the later branches apparently aren't. hmm maybe the later branches are trying to resolve that too - but only the combination of the TZ database loading + the failed DNS-queries is pushing the startup time over the 60 second limit on this (quite slow) box ? I will try to verify what the later branches are doing exactly ... Stefan ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [CORE] GPL Source and Copyright Questions
Tom Lane wrote: Josh Berkus josh@agliodbs.com writes: Yeah, thanks for reminding me. Will do before feature freeze. As soon as I can figure out how to generate a patch that removes directories. Don't worry about that; CVS never deletes directories. But anyway, I can easily handle removing the code. I just want someone else to stuff it into pgfoundry, because I'm not up to speed on pgfoundry. I believe the current hit-list for modules to move to pgfoundry is adddepend dbase dbmirror fulltextindex mSQL-interface mac oracle tips userlock one thing to note is that at least on IRC we are still getting a notable number of questions about fixing up constraint triggers left over from importing dumps of old pg versions. The usual answer to that is try contrib/adddepend - but I guess redirecting them to pgfoundry will work too. Stefan ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Overhead for stats_command_string et al, take 2
Tom Lane wrote: The bad news is that except in the stats_command_string cases, HEAD is noticeably slower than 8.1 on the machine with slow gettimeofday. In the single-transaction test this might be blamed on the addition of statement_timestamp support (which requires a gettimeofday per statement that wasn't there in 8.1) ... but in the one-transaction-per-statement tests that doesn't hold water, because each branch is doing a gettimeofday per statement, just in different places. Can anyone else reproduce this slowdown? It might be only an artifact of these particular builds, but it's a bit too consistent in my x86 data to just ignore. This is what I get on a fast AMD Dual Opteron box (running Debian Sarge/AMD64):

                                 8.1.4        HEAD
100 SELECT 1;                    74,74,73     77,76,77
stats_command_string=1;          105,99,106   78,79,78
log_min_duration_statement=100   79,80,81     75,80,76
statement_timeout=100            78,79,78     75,79,77
all 3                            104,108,107  82,81,81

all values in seconds with 3 consecutive runs of one million SELECT 1; queries. It takes about 48 seconds to run the same test without stat-collection btw. Stefan ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Overhead for stats_command_string et al, take 2
Tom Lane wrote: Stefan Kaltenbrunner [EMAIL PROTECTED] writes: This is what I get on a fast AMD Dual Opteron box (running Debian Sarge/AMD64):

                                 8.1.4        HEAD
100 SELECT 1;                    74,74,73     77,76,77
stats_command_string=1;          105,99,106   78,79,78
log_min_duration_statement=100   79,80,81     75,80,76
statement_timeout=100            78,79,78     75,79,77
all 3                            104,108,107  82,81,81

all values in seconds with 3 consecutive runs of one million SELECT 1; queries. It takes about 48 seconds to run the same test without stat-collection btw. I'm confused. Isn't your first table row for the case of no stat collection? Or do you mean that you have stats_row_level and/or stats_block_level on in all four cases? yes - stats_row_level and stats_block_level on in all cases (sorry for the confusion) - I can easily redo the tests without those - but that's what I had in the running conf and I only remembered that after I was nearly done with all the testing :-) Stefan ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Overhead for stats_command_string et al, take 2
Tom Lane wrote: Stefan Kaltenbrunner [EMAIL PROTECTED] writes: Tom Lane wrote: Or do you mean that you have stats_row_level and/or stats_block_level on in all four cases? yes - stats_row_level and stats_block_level on in all cases (sorry for the confusion) - I can easily redo the tests without those - but that's what I had in the running conf and I only remember that after I was nearly done with all the testing :-) It'd be interesting to compare 8.1 and HEAD for the no-overhead case; I don't think you need to redo all four cases, but I'd like to see that one. 8.1: 50,50,49 HEAD: 49,48,49 Stefan ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] vacuum, performance, and MVCC
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: I think at some point we have to admit that _polling_ the tables, which is what autovacuum does, just isn't going to work well, no matter how much it is tweaked, and another approach should be considered for certain workload cases. Autovacuum polls in its current, first-generation implementation; what I said upthread was it needs to be smarter than that. I am not sure how you get from that to the conclusion that the very next step is to abandon the vacuuming approach altogether. yeah, autovacuum can still be improved quite a lot, but as always this can be done on a step-by-step basis. What I see in this discussion is a huge amount of "the grass must be greener on the other side" syndrome, and hardly any recognition that every technique has its downsides and complications. Furthermore, I do not believe that this project has the ability to support multiple fundamental storage models, as a number of people seem to be blithely suggesting. We're having a hard enough time debugging and optimizing *one* storage model. I think the correct path forward is to stick with the same basic storage model and vacuuming concept, and address the known performance issues with better-optimized vacuuming. No, it will never be perfect for every scenario, but we can certainly make it much better than it is now, without risking killing the project by introducing undebuggable, unmaintainable complexity. While I'm not an expert on MVCC - it certainly seems that sticking with the current storage model and continuing to improve on it (especially wrt vacuum performance) gradually over time (as has happened over the last few years) is a much better and safer approach than trying to do something revolutionary which in theory might (or might not) be better than the current approach for this or that workload. 
PostgreSQL got a _LOT_ faster with each of the last releases, and by my testing -HEAD is already significantly (20-30%) faster for some of our apps than 8.1, and all that was achieved without radically redesigning a proven (reliability-wise) storage engine. Maybe, and only maybe, one day (or when somebody comes up with a usable patch - as always) we will be at the point where we really need to think about doing that, but for now there seems to be still a lot of low-hanging fruit left to improve for months and years to come. Stefan ---(end of broadcast)--- TIP 6: explain analyze is your friend
[HACKERS] GIN index creation extremely slow ?
on IRC somebody mentioned that it took 34h to create a GIN index (on a tsvector) on a ~3 million row table (wikipedia dump) on a reasonably specced box (AMD 3400+). After getting hold of a dump of said table (around 4.1GB in size) I managed to get the following timings:

test=# CREATE INDEX idxFTI_idx ON wikipedia USING gist(vector);
CREATE INDEX
Time: 416122.896 ms

so about 7 minutes - sounds very reasonable

test=# CREATE INDEX idxFTI2_idx ON wikipedia USING gin(vector);
CREATE INDEX
Time: 52681605.101 ms

ouch - that makes for a whopping 14.6 hours(!). During that time the box is completely CPU-bottlenecked and doing virtually no IO at all (varying maintenance_work_mem does not seem to make any noticeable difference). The box is a fast Dual Opteron 2.6GHz with 8GB RAM, a 4-disk RAID10 for the WAL and 12 disks for the data, running a very recent -HEAD checkout ... It looks like we still don't have any docs for GIN in the tree, so I don't know if those timings are expected or not ... Stefan ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
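Converting those two Time: readouts makes the gap explicit (simple arithmetic on the numbers from the psql output above):

```python
gist_ms = 416122.896   # GiST CREATE INDEX time from the output above
gin_ms = 52681605.101  # GIN CREATE INDEX time from the output above

gist_minutes = gist_ms / 60_000   # ~6.9 minutes
gin_hours = gin_ms / 3_600_000    # ~14.6 hours
slowdown = gin_ms / gist_ms       # GIN build roughly 127x slower here

print(round(gist_minutes, 1), round(gin_hours, 1), round(slowdown))  # 6.9 14.6 127
```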
Re: [HACKERS] GIN index creation extremely slow ?
Teodor Sigaev wrote: test=# CREATE INDEX idxFTI_idx ON wikipedia USING gist(vector); CREATE INDEX Time: 416122.896 ms so about 7 minutes - sounds very reasonable test=# CREATE INDEX idxFTI2_idx ON wikipedia USING gin(vector); CREATE INDEX Time: 52681605.101 ms I'll look at this, but GiST creation time is suspiciously small. Can you test on smaller table, for example with 10 records and if results are repeat, pls, send to me test suite... I won't have access to the original testcase and server for a few days, but I just redid some testing on a slower personal box of mine with a smaller (but similar) test set, and on that box I could not reproduce the issue. So the problem is either caused by the size of the table or somehow by the data itself :-( Stefan ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] GIN index creation extremely slow ?
Teodor Sigaev wrote: Tom did commit a patch a while ago which made a huge difference in index creation time for tsearch by changing one routine. I don't know if it got backpatched, so it might be worth checking people are working on the same version. I saw that patch, but I still think that 7 minutes is too small :) hmm, I did some further testing on that and it looks like you might indeed be right in suspecting that there is something fishy with the GiST results. It might be possible that there was some issue with the generated tsvectors (all of them empty due to hitting the "too long value" error case for exceeding MAXSTRPOS) in the GiST case - sorry for the confusion :-( Speaking of the "too long value" error message - some of the error messages in tsvector.c are a bit terse (i.e. it is not really obvious what is causing the above error without looking at the source, for example). Stefan ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Removing AddDepends; should I bother with a project?
Bruce Momjian wrote: Josh Berkus wrote: Folks, For the code sprint, I'm starting off by removing the projects from contrib which need to be removed but still have some usefulness. I'm not exactly sure what to do with adddepends, though. It seems unlikely to lead an independent existence on pgFoundry; I'm inclined to just nuke it. I vote for the nuclear option. ;-) as I said when this first came up - we still get a sizable number of support requests on IRC from people trying to import dumps(!) of very old postgresql versions. adddepends is often of some value for those people and I would rather like to see it fixed for 8.1 and maybe even 8.2 ... Stefan ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] contrib promotion?
Greg Sabino Mullane wrote: Doesn't our inclusion of md5() pretty much blow that argument away? (Just asking). I don't think so, because md5 is just a one-way hash function. There is no method to decrypt anything :). Actually, I've had to install pgcrypto on more than one occasion for clients who needed to have sha1 instead of md5. I've had to install pgcrypto for other functions as well, so +1 for me on coring it, but at the least please consider adding in sha1. I don't have a very strong opinion on that, but sha1() is something I need from pgcrypto on a regular basis too. Stefan ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Windows buildfarm support, or lack of it
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: Dave Page wrote: I can bump that up as high as you'd like within reason. 4? 6 times a day? Let's go for 6, at least for HEAD. There's probably no need to check the back branches oftener than once a day, but if you can do HEAD every 4 hours that'd be great ... I will have seahorse doing Windows builds at the same rate then ... Stefan ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: CSStorm occurred again by postgreSQL8.2. (Re: [HACKERS] poor
Katsuhiko Okano wrote: Tom Lane [EMAIL PROTECTED] wrote: Katsuhiko Okano [EMAIL PROTECTED] writes: It does not solve, even if it increases the number of NUM_SUBTRANS_BUFFERS. The problem was only postponed. Can you provide a reproducible test case for this? Seven machines are required in order to perform measurement. (DB*1,AP*2,CLient*4) Enough work load was not able to be given in two machines. (DB*1,{AP+CL}*1) It was not able to reappear to a multiplex run of pgbench or a simple SELECT query. TPC-W of a work load tool used this time is a full scratch. Regrettably it cannot open to the public. If there is a work load tool of a free license, I would like to try. FYI: there is a free tpc-w implementation done by Jan available at: http://pgfoundry.org/projects/tpc-w-php/ Stefan ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Adding a pgbench run to buildfarm
Mark Kirkwood wrote: Tom Lane wrote: Bort, Paul [EMAIL PROTECTED] writes: Andrew said I should solicit opinions as to what parameters to use. A cursory search through the archives led me to pick a scaling factor of 10, 5 users, and 100 transactions. 100 transactions seems barely enough to get through startup transients. Maybe 1000 would be good. Scale factor 10 produces an accounts table of about 130 Mb. Given that most HW these days has at least 1G of ram, this probably means not much retrieval IO is tested (only checkpoint and wal fsync). Do we want to try 100 or even 200? (or recommend a scale factor such that size > ram)? hmm - that 1GB is a rather optimistic estimate for most of the buildfarm boxes (mine at least). Out of the six I have, only one actually has much RAM (allocated), and lionfish for example is rather resource-starved at only 48(!) MB of RAM and very limited diskspace - which has been plenty enough until now for doing the builds (with enough swap of course). I suspect that anything that would result in additional diskspace usage in excess of maybe 150MB or so would run it out of resources :-( I'm also not too keen on running excessively long pgbench runs on some of the buildfarm members, so I would prefer to make that one optional. Stefan ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
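Mark's sizing can be extrapolated with a bit of arithmetic; note the 13 MB-per-scale-unit figure below is derived from his "scale factor 10 produces about 130 Mb" estimate, not an official pgbench number:

```python
# Extrapolate the pgbench accounts-table size from Mark's estimate:
# scale factor 10 ~= 130 MB, i.e. roughly 13 MB per scale-factor unit
# (each unit adds 100,000 accounts rows).
MB_PER_UNIT = 130 / 10

def accounts_mb(scale):
    return scale * MB_PER_UNIT

print(accounts_mb(10))   # 130.0 MB - already near a ~150 MB diskspace budget
print(accounts_mb(100))  # 1300.0 MB - enough to exceed 1 GB of RAM
print(accounts_mb(200))  # 2600.0 MB
```

which illustrates the tension in the thread: a scale factor large enough to exercise retrieval IO on a 1 GB box is already far past what a resource-starved animal like lionfish can host.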
Re: [HACKERS] [COMMITTERS] pgsql: /contrib/cube improvements: Update
Bruce Momjian wrote: Tom Lane wrote: Joshua Reich [EMAIL PROTECTED] writes: The problem is that there are new functions in cube.sql, so the output is now different and breaks the diff (to state the obvious). Actually, the new theory on this is that you should explicitly create a shell type first: CREATE TYPE cube; and then create the functions, and then make the type for real. This will still involve a change in the earthdistance expected output (down to zero expected NOTICEs) but if we're gonna fix it then let's fix it right. OK, I will wait for the next patch. Funny, I tested the cube regression test, but not earthdistance. FYI: the buildfarm script has a no-reporting mode and can use preexisting source trees - that's what I regularly use to test patches and modifications to the source. Stefan ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [Pgbuildfarm-members] [Fwd: RE: Build farm on Windows]
Andrew Dunstan wrote: Can one of the Windows buildfarm owners please try building and running make check by hand rather than using the buildfarm script? It looks like they all stopped reporting around the same time, and this might give us a better clue about when things fall over. Also, if you're up for it, please try reversing this patch, which looks innocuous enough, but is the only thing I can see in the relevant time period that looks at all suspicious: http://archives.postgresql.org/pgsql-committers/2006-07/msg00256.php will see what I can do (it definitely hangs in make check here too) - but this issue seems to kill my box up to the point where it is impossible to log in(!) and I have to hard-reboot it. Looks like it is churning CPU like mad when that happens ... Stefan ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [Pgbuildfarm-members] [Fwd: RE: Build farm on Windows]
Tom Lane wrote: I wrote: Andrew Dunstan [EMAIL PROTECTED] writes: The TimeZone changes are looking mighty suspicious ... FATAL: failed to initialize timezone_abbreviations to "Default" Hm. It looks like this is working in the postmaster but failing in subprocesses. I'll see if I can duplicate it using EXEC_BACKEND. Nope, works fine with EXEC_BACKEND, so it's something Windows-specific. I'm not sure why you're not getting any more specific messages --- they should be coming out at WARNING level AFAICS. You'll need to trace through load_tzoffsets() and see why it's failing in the subprocess.

that was a bit painful, but we failed to see a useful error message due to the fact that we have been actively suppressing it - with a quick hack like:

--- /home/pgbuild/pgfarmbuild/HEAD/pgsql/src/backend/utils/misc/tzparser.c  Tue Jul 25 05:51:21 2006
+++ src/backend/utils/misc/tzparser.c  Fri Jul 28 19:33:24 2006
@@ -326,7 +326,6 @@
     if (!tzFile)
     {
         /* at level 0, if file doesn't exist, guc.c's complaint is enough */
-        if (errno != ENOENT || depth > 0)
             ereport(tz_elevel,
                     (errcode_for_file_access(),
                      errmsg("could not read time zone file \"%s\": %m",

(will probably get mangled by my mailer) I get a much more useful:

WARNING: could not read time zone file "Default": No such file or directory
FATAL: failed to initialize timezone_abbreviations to "Default"
WARNING: could not read time zone file "Default": No such file or directory
FATAL: failed to initialize timezone_abbreviations to "Default"
LOG: background writer process (PID 3776) exited with exit code 0
LOG: terminating any other active server processes
WARNING: could not read time zone file "Default": No such file or directory
FATAL: failed to initialize timezone_abbreviations to "Default"
LOG: all server processes terminated; reinitializing
WARNING: could not read time zone file "Default": No such file or directory

which gives a strong further hint at the underlying issue. Stefan ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [Pgbuildfarm-members] [Fwd: RE: Build farm on Windows]
Tom Lane wrote: Stefan Kaltenbrunner [EMAIL PROTECTED] writes: I get a much more useful: WARNING: could not read time zone file "Default": No such file or directory FATAL: failed to initialize timezone_abbreviations to "Default" Hm, but why would the file not be there? Try hacking it to print the whole path it's trying to open, maybe that will help.

WARNING: could not read time zone file "/home/pgbuild/devel/pginst/share/postgresql/timezonesets/Default": No such file or directory
FATAL: failed to initialize timezone_abbreviations to "Default"
WARNING: could not read time zone file "/home/pgbuild/devel/pginst/share/postgresql/timezonesets/Default": No such file or directory
FATAL: failed to initialize timezone_abbreviations to "Default"
LOG: background writer process (PID 1460) exited with exit code 0
LOG: terminating any other active server processes
WARNING: could not read time zone file "/home/pgbuild/devel/pginst/share/postgresql/timezonesets/Default": No such file or directory

$ ls -l /home/pgbuild/devel/pginst/share/postgresql/timezonesets/Default
-rw-r--r--  1 pgbuild  Administ  28630 Jul 28 20:03 /home/pgbuild/devel/pginst/share/postgresql/timezonesets/Default

so it's there, but as an msys-virtual path - is it getting passed to some win32 function expecting a Windows-style path? Stefan ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Going for all green buildfarm results
Tom Lane wrote: Stefan Kaltenbrunner [EMAIL PROTECTED] writes: FWIW: lionfish had a weird make check error 3 weeks ago which I (unsuccessfully) tried to reproduce multiple times after that: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2006-05-12%2005:30:14 Weird.

SELECT ''::text AS eleven, unique1, unique2, stringu1
FROM onek WHERE unique1 < 50
ORDER BY unique1 DESC LIMIT 20 OFFSET 39;
! ERROR: could not open relation with OID 27035

AFAICS, the only way to get that error in HEAD is if ScanPgRelation can't find a pg_class row with the mentioned OID. Presumably 27035 belongs to onek or one of its indexes. The very next command also refers to onek, and doesn't fail, so what we seem to have here is a transient lookup failure. We've found a btree bug like that once before ... wonder if there's still one left? FYI: lionfish just managed to hit that problem again: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2006-07-29%2023:30:06 Stefan ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Going for all green buildfarm results
Alvaro Herrera wrote: Stefan Kaltenbrunner wrote: Tom Lane wrote: Stefan Kaltenbrunner [EMAIL PROTECTED] writes: FWIW: lionfish had a weird make check error 3 weeks ago which I (unsuccessfully) tried to reproduce multiple times after that: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2006-05-12%2005:30:14 Weird. SELECT ''::text AS eleven, unique1, unique2, stringu1 FROM onek WHERE unique1 < 50 ORDER BY unique1 DESC LIMIT 20 OFFSET 39; ! ERROR: could not open relation with OID 27035 AFAICS, the only way to get that error in HEAD is if ScanPgRelation can't find a pg_class row with the mentioned OID. Presumably 27035 belongs to onek or one of its indexes. The very next command also refers to onek, and doesn't fail, so what we seem to have here is a transient lookup failure. We've found a btree bug like that once before ... wonder if there's still one left? FYI: lionfish just managed to hit that problem again: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2006-07-29%2023:30:06 The error message this time is ! ERROR: could not open relation with OID 27006 yeah, and before it was: ! ERROR: could not open relation with OID 27035 which looks quite related :-) It's worth mentioning that the portals_p2 test, which happens in the parallel group previous to where this test is run, also accesses the onek table successfully. It may be interesting to see exactly what relation 27006 is. sorry, but I don't have access to the cluster in question any more (lionfish is quite resource-starved and I only enabled keeping failed builds on -HEAD after the last incident ...) 
The test alter_table, which is in the same parallel group as limit (the failing test), contains these lines: ALTER INDEX onek_unique1 RENAME TO tmp_onek_unique1; ALTER INDEX tmp_onek_unique1 RENAME TO onek_unique1; hmm, interesting - lionfish is a slow box (250MHz MIPS) and particularly low on memory (48MB + 140MB swap), so it is quite likely that the parallel regression tests are driving it into swap - maybe some sort of subtle timing issue? Stefan ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Going for all green buildfarm results
Jim C. Nasby wrote: On Sun, Jul 30, 2006 at 11:44:44AM -0400, Tom Lane wrote: Alvaro Herrera [EMAIL PROTECTED] writes: Stefan Kaltenbrunner wrote: FYI: lionfish just managed to hit that problem again: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2006-07-29%2023:30:06 The test alter_table, which is in the same parallel group as limit (the failing test), contains these lines: ALTER INDEX onek_unique1 RENAME TO tmp_onek_unique1; ALTER INDEX tmp_onek_unique1 RENAME TO onek_unique1; I bet Alvaro's spotted the problem. ALTER INDEX RENAME doesn't seem to take any lock on the index's parent table, only on the index itself. That means that a query on onek could be trying to read the pg_class entries for onek's indexes concurrently with someone trying to commit a pg_class update to rename an index. If the query manages to visit the new and old versions of the row in that order, and the commit happens between, *neither* of the versions would look valid. MVCC doesn't save us because this is all SnapshotNow. Not sure what to do about this. Trying to lock the parent table could easily be a cure-worse-than-the-disease, because it would create deadlock risks (we've already locked the index before we could look up and lock the parent). Thoughts? The path of least resistance might just be to not run these tests in parallel. The chance of this issue causing problems in the real world seems small. It doesn't seem that unusual to want to rename an index on a running system, and it certainly doesn't seem like the kind of operation that should pose a problem. So at the very least, we'd need a big fat warning in the docs about how renaming an index could cause other queries in the system to fail, and the error message needs to be improved. it is my understanding that Tom is already tackling the underlying issue on a much more general basis ... Stefan ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
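Tom's scenario can be illustrated with a toy model (purely illustrative Python, not PostgreSQL code; the row layout and visibility rule are simplified assumptions): a SnapshotNow scan that happens to visit the new pg_class row version before the renaming transaction commits, and the old version after, sees neither as valid.

```python
# Toy model of the SnapshotNow race described above (not PostgreSQL code).
RENAME_XID = 42
committed = set()

def snapshot_now_visible(row):
    # A version is visible iff its inserter has committed and its deleter
    # (if any) has not - evaluated at the instant of the check.
    inserted_ok = row["xmin"] is None or row["xmin"] in committed
    deleted_ok = row["xmax"] is None or row["xmax"] not in committed
    return inserted_ok and deleted_ok

# pg_class row for the index, before and after ALTER INDEX ... RENAME
# (xmin None stands for "committed long ago").
old_version = {"xmin": None, "xmax": RENAME_XID}  # deleted by the rename
new_version = {"xmin": RENAME_XID, "xmax": None}  # inserted by the rename

# btree equal-key ordering can make the scan hit the new version first
found = snapshot_now_visible(new_version)  # rename not yet committed
committed.add(RENAME_XID)                  # ... commit lands mid-scan
found = found or snapshot_now_visible(old_version)

print(found)  # False -> "could not open relation with OID ..."
```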
Re: [HACKERS] Going for all green buildfarm results
Andrew Dunstan wrote: Tom Lane wrote: Stefan Kaltenbrunner [EMAIL PROTECTED] writes: Jim C. Nasby wrote: On Sun, Jul 30, 2006 at 11:44:44AM -0400, Tom Lane wrote: The path of least resistance might just be to not run these tests in parallel. The chance of this issue causing problems in the real world seems small. It doesn't seem that unusual to want to rename an index on a running system, and it certainly doesn't seem like the kind of operation that should pose a problem. So at the very least, we'd need a big fat warning in the docs about how renaming an index could cause other queries in the system to fail, and the error message needs to be improved. it is my understanding that Tom is already tackling the underlying issue on a much more general basis ... Done in HEAD, but we might still wish to think about changing the regression tests in the back branches, else we'll probably continue to see this failure once in a while ... How sure are we that this is the cause of the problem? The feeling I got was this is a good guess. If so, do we want to prevent ourselves getting any further clues in case we're wrong? It's also an interesting case of a (low likelihood) bug which is not fixable on any stable branch. well, I have a lot of trust in Tom - though the main issue is that this problem seems to be very hard to trigger. afaik only one box (lionfish) ever managed to hit it, and even there only 2 times out of several hundred builds - I don't suppose we can come up with a testcase that shows the issue more reliably? Stefan ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] 8.2 features status
Merlin Moncure wrote: On 8/3/06, Tom Lane [EMAIL PROTECTED] wrote: I'm not clear on why there's all this doom and gloom about how 8.2 will be merely a performance-oriented release, with few new features, eg http://archives.postgresql.org/pgsql-hackers/2006-07/msg00111.php Certainly there's been a ton of effort spent on high-end performance issues. But a quick troll through the CVS logs shows a fair number of features that could be considered killer must-have things by their respective target audiences: i can't resist this unproductive distraction from actual work. this is a huge release for me as it nails a lot of the features i've been waiting literally years for. it feels a lot like the 7.4 release where similar debates went on, esp. regarding the windows port, etc. note that even if the release had no user level features at all, it would be better to release: the outside world likes to see the project is still active and moving forward. I fully agree here - 8.2 is a release that is of more interest to us than, say, 8.0 was. For some of our existing apps 8.2 is dramatically faster due to much better planned queries, and things like 20-25% faster dump/restore cycles due to the dramatic improvements in sorting (and therefore CREATE INDEX) are really, really cool. Just switching to 8.2 makes one of our (interactive web) apps feel blazingly fast instead of just ok, and that is a good thing - a very good one in fact ... And besides that, the list Tom posted is already damn impressive on its own - I guess there are a number of large projects that can only dream of having a new-features list like that. Stefan ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] buildfarm - make check failures for leveret on 8.0
Jeremy Drake wrote: On Mon, 7 Aug 2006, Tom Lane wrote: Stefan Kaltenbrunner [EMAIL PROTECTED] writes: *) why the large difference in the build-flags ? CVS HEAD configure.in knows about icc and the release branches don't. I think the changes were only put into HEAD because of lack of testing, but if we have buildfarm coverage I think it'd be OK to back-port the configure logic to the prior branches. Plus if it is backported, I can enable 8.x builds on mongoose (my x86 icc buildfarm box). well, with two buildfarm boxes (one 32bit and the other one 64bit) we have pretty good coverage, which should allow backporting a rather simple fix like that. Stefan ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] buildfarm - make check failures for leveret on 8.0
Tom Lane wrote: Jeremy Drake [EMAIL PROTECTED] writes: Plus if it is backported, I can enable 8.x builds on mongoose (my x86 icc buildfarm box). Please do --- I've applied the changes in 8.1 and 8.0 branches. and leveret went green on both 8.0 and 8.1 ... Stefan ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] buildfarm - make check failures for leveret on 8.0
Andrew Dunstan wrote: Stefan Kaltenbrunner wrote: Tom Lane wrote: Jeremy Drake [EMAIL PROTECTED] writes: Plus if it is backported, I can enable 8.x builds on mongoose (my x86 icc buildfarm box). Please do --- I've applied the changes in 8.1 and 8.0 branches. and leveret went green on both 8.0 and 8.1 ... Good. Now we need to clean up the huge number of warnings, such as:

/usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/CORE/proto.h(95): warning #1292: attribute "warn_unused_result" ignored
    __attribute__warn_unused_result__;
    ^

and

pg_restore.c(332): warning #188: enumerated type mixed with another type
    AH = OpenArchive(inputFileSpec, opts->format);

well, a large number of those look a bit bogus (annoying) - and icc has ways to disable individual warnings (indicated by the number following the #), like:

-wdL1[,L2,...LN]  Disable diagnostics L1 through LN.

maybe we should use that (ftp://download.intel.com/support/performancetools/c/linux/v9/icc.txt has the full manpage)? Stefan ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
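A sketch of how that might be wired in - the -wd syntax comes from the icc manpage quoted above, but the exact invocation (and whether CFLAGS is the right place for it on a given buildfarm animal) is an assumption:

```shell
# Hypothetical configure invocation suppressing the two icc diagnostics
# quoted above by number:
#   1292: attribute "warn_unused_result" ignored
#    188: enumerated type mixed with another type
./configure CC=icc CFLAGS="-O2 -wd1292,188"
```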
[HACKERS] seahorse buildfarm issues
hi all! seahorse has been struggling to submit buildfarm reports for a few days now. there seems to be a rather weird thing going on: what appears to happen is that the build fails during make check with a crashing postmaster, but that crash also hangs the buildfarm script, so it fails to submit a report. After that event there are no remaining processes in the taskmanager, but the OS still believes that some files (postgres.exe and some of the files used in the regression test) are still in use. The only way to recover from that (at least that I found as a non-windows person) is a reboot of the whole VM - this then results in a really large number of "application failed to initialize properly" errors during shutdown(!). the applications mentioned in the errors are more.exe, cmd.exe and diff.exe. running the buildfarm script manually works just fine - which makes the whole issue even weirder. the following is in the postmaster.log:

ERROR: invalid input syntax for type circle: "(3,(1,2),3)"
ERROR: date/time value "current" is no longer supported
ERROR: date/time field value out of range: "1997-02-29"
ERROR: invalid input syntax for type time with time zone: "15:36:39 America/New_York"
LOG: server process (PID 2016) exited with exit code -1073741502
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process

I'm out of ideas on how to go further in debugging this - any ideas (maybe from somebody who knows Windows better than I do)? Stefan ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
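As an aside, the exit code in that log decodes to a familiar Windows error: reinterpreted as an unsigned 32-bit NTSTATUS it is 0xC0000142, i.e. STATUS_DLL_INIT_FAILED - the same condition behind the "application failed to initialize properly" popups mentioned above. The conversion is simple two's-complement arithmetic:

```python
exit_code = -1073741502  # from "server process (PID 2016) exited ..." above

# Reinterpret the signed 32-bit exit code as an unsigned NTSTATUS value
ntstatus = exit_code & 0xFFFFFFFF

print(hex(ntstatus))  # 0xc0000142 (STATUS_DLL_INIT_FAILED)
```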
Re: [HACKERS] Going for all green buildfarm results
Tom Lane wrote: Alvaro Herrera [EMAIL PROTECTED] writes: Maybe we could write a suitable test case using Martijn's concurrent testing framework. The trick is to get process A to commit between the times that process B looks at the new and old versions of the pg_class row (and it has to happen to do so in that order ... although that's not a bad bet given the way btree handles equal keys). I think the reason we've not tracked this down before is that that's a pretty small window. You could force the problem by stopping process B with a debugger breakpoint and then letting A do its thing, but short of something like that you'll never reproduce it with high probability. As far as Andrew's question goes: I have no doubt that this race condition is (or now, was) real and could explain Stefan's failure. It's not impossible that there's some other problem in there, though. If so we will still see the problem from time to time on HEAD, and know that we have more work to do. But I don't think that continuing to see it on the back branches will teach us anything. maybe the following buildfarm report means that we need a new theory :-( http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=sponge&dt=2006-08-16%2021:30:02 Stefan ---(end of broadcast)--- TIP 6: explain analyze is your friend