While fooling around with the GIN fast insert patch tonight, I managed to hang my test database. :-(
I'm going to try to reproduce this, but here's approximately what I did. create table foo (id serial, x int[], primary key (id)); create index foo_gin on foo using gin (x); insert into foo (x) select array[(random() * 100)::int, (random() * 90)::int, (random()*80)::int] from generate_series(1,1000000); The last insert command was run multiple times with various numbers in place of the constant 1000000 through several autovacuum cycles. Eventually, though, it froze up. Here's pg_stat_activity, with apologies for the crappy formatting: datid | datname | procpid | usesysid | usename | current_query | waiting | xact_start | query_ start | backend_start | client_addr | client_port -------+---------+---------+----------+---------+------------------------------- -------------------------------------------------------------------------------- ---------------------+---------+-------------------------------+---------------- ---------------+-------------------------------+-------------+------------- 16384 | rhaas | 15071 | 10 | rhaas | insert into foo (x) select arr ay[(random() * 100)::int, (random() * 90)::int, (random()*80)::int] from generat e_series(1,1000000); | f | 2009-02-11 21:42:24.984759-05 | 2009-02-11 21:4 2:24.984759-05 | 2009-02-11 21:31:45.061903-05 | | -1 16384 | rhaas | 15530 | 10 | rhaas | autovacuum: VACUUM public.foo | f | 2009-02-11 21:42:10.046085-05 | 2009-02-11 21:4 2:10.046085-05 | 2009-02-11 21:42:09.947378-05 | | 16384 | rhaas | 15780 | 10 | rhaas | explain analyze select sum(1) from foo where array[1] <@ x; | f | 2009-02-11 21:43:20.518837-05 | 2009-02-11 21:4 3:20.518837-05 | 2009-02-11 21:43:18.717031-05 | | -1 16384 | rhaas | 15994 | 10 | rhaas | <IDLE> | f | | 2009-02-11 21:4 Neither the INSERT nor the EXPLAIN ANALYZE nor the autovacuum died when sent SIGINT or SIGTERM. They all seem to be hung on semop: $ strace -p 15071 Process 15071 attached - interrupt to quit semop(3866630, 0x7fff14a46be0, 1^C <unfinished ...> Process 15071 detached $ strace -p 15530 Process 15530 attached - interrupt to quit semop(3866630, 0x7fff14a45740, 1^C <unfinished ...> Process 15530 detached $ strace -p 15780; Process 15780 attached - interrupt to quit semop(3866630, 0x7fff14a48c50, 1^C <unfinished ...> Process 15780 detached $ uname -a Linux njpen7.snipinc.net 2.6.27.12-78.2.8.fc9.x86_64 #1 SMP Mon Jan 19 19:25:03 EST 2009 x86_64 x86_64 x86_64 GNU/Linux $ cat /etc/fedora-release Fedora release 9 (Sulphur) Backtrace of pid 15071: #0 0x0000003b1bce6257 in semop () from /lib64/libc.so.6 #1 0x00000000005a8117 in PGSemaphoreLock () #2 0x00000000005c6409 in LockBufferForCleanup () #3 0x00000000004944d1 in ginInsertCleanup () #4 0x000000000049501d in ginHeapTupleFastInsert () #5 0x000000000048a772 in gininsert () #6 0x000000000068fb39 in FunctionCall6 () #7 0x0000000000469181 in index_insert () #8 0x0000000000540ee2 in ExecInsertIndexTuples () #9 0x0000000000536504 in standard_ExecutorRun () #10 0x00000000005e07e1 in ProcessQuery () #11 0x00000000005e0a06 in PortalRunMulti () #12 0x00000000005e11b2 in PortalRun () #13 0x00000000005dc507 in exec_simple_query () #14 0x00000000005ddad7 in PostgresMain () #15 0x00000000005b2e58 in ServerLoop () #16 0x00000000005b3b39 in PostmasterMain () #17 0x000000000055e1e8 in main () Backtrace of pid 15530: #0 0x0000003b1bce6257 in semop () from /lib64/libc.so.6 #1 0x00000000005a8117 in PGSemaphoreLock () #2 0x00000000005d624d in LWLockAcquire () #3 0x0000000000494322 in ginInsertCleanup () #4 0x00000000004922ed in ginvacuumcleanup () #5 0x000000000068fddc in FunctionCall2 () #6 0x0000000000530e1e in lazy_scan_heap () #7 0x00000000005312d0 in lazy_vacuum_rel () #8 0x000000000052f935 in vacuum_rel () #9 0x000000000052fd4c in vacuum () #10 0x00000000005aa338 in do_autovacuum () #11 0x00000000005aa62d in AutoVacWorkerMain () #12 0x00000000005aa756 in StartAutoVacWorker () #13 0x00000000005b525e in sigusr1_handler () #14 <signal handler called> #15 0x0000003b1bcdcbb3 in __select_nocancel () from /lib64/libc.so.6 #16 0x00000000005b281c in ServerLoop () #17 0x00000000005b3b39 in PostmasterMain () #18 0x000000000055e1e8 in main () Backtrace of pid 15780: #0 0x0000003b1bce6257 in semop () from /lib64/libc.so.6 #1 0x00000000005a8117 in PGSemaphoreLock () #2 0x00000000005d624d in LWLockAcquire () #3 0x00000000004913ac in scanFastInsert () #4 0x000000000049201c in gingetbitmap () #5 0x000000000068fddc in FunctionCall2 () #6 0x000000000046878e in index_getbitmap () #7 0x00000000005459c5 in MultiExecBitmapIndexScan () #8 0x00000000005455f4 in BitmapHeapNext () #9 0x000000000053f492 in ExecScan () #10 0x0000000000538360 in ExecProcNode () #11 0x000000000054435a in ExecAgg () #12 0x00000000005382a0 in ExecProcNode () #13 0x0000000000535ed2 in standard_ExecutorRun () #14 0x00000000004f9b2f in ExplainOnePlan () #15 0x00000000004f9d4e in ExplainQuery () #16 0x00000000005df8f7 in PortalRunUtility () #17 0x00000000005e0c3c in FillPortalStore () #18 0x00000000005e123d in PortalRun () #19 0x00000000005dc507 in exec_simple_query () #20 0x00000000005ddad7 in PostgresMain () #21 0x00000000005b2e58 in ServerLoop () #22 0x00000000005b3b39 in PostmasterMain () #23 0x000000000055e1e8 in main () Since fast shutdown failed, I tried an immediate shutdown, which did work, but the database won't start back up: LOG: database system was interrupted; last known up at 2009-02-11 21:42:38 EST LOG: database system was not properly shut down; automatic recovery in progress LOG: redo starts at 0/3600E050 FATAL: bad buffer id: 0 CONTEXT: xlog redo Delete list page (16), node: 19425/19426/19427 blkno: 19429 LOG: startup process (PID 17314) exited with exit code 1 LOG: aborting startup due to startup process failure Ugh... ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers