[HACKERS] GIN fast insert database hang

Robert Haas Wed, 11 Feb 2009 19:05:02 -0800

While fooling around with the GIN fast insert patch tonight, I managed
to hang my test database.  :-(


I'm going to try to reproduce this, but here's approximately what I did.

create table foo (id serial, x int[], primary key (id));
create index foo_gin on foo using gin (x);
insert into foo (x)
  select array[(random() * 100)::int, (random() * 90)::int, (random()*80)::int]
  from generate_series(1,1000000);

I ran the last insert command multiple times, with various numbers in
place of the constant 1000000, across several autovacuum cycles.
Eventually, though, it froze up.
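
A minimal sketch of that procedure as a shell loop, assuming psql is on
the PATH and can connect without prompting. The function names, the
DBNAME variable, and the row counts in the usage line are hypothetical;
the report does not say which counts were actually used.

```shell
#!/bin/sh
# Print the INSERT statement from the report with a chosen row count.
insert_sql() {
    printf 'insert into foo (x) select array[(random() * 100)::int, (random() * 90)::int, (random()*80)::int] from generate_series(1,%s);\n' "$1"
}

# Run the insert once per row count given as an argument.
# DBNAME is a hypothetical variable naming the test database.
repro_loop() {
    for n in "$@"; do
        # -X skips ~/.psqlrc so local settings do not change the run.
        insert_sql "$n" | psql -X -d "${DBNAME:-postgres}" || return 1
    done
}
```

For example, `DBNAME=rhaas repro_loop 1000000 250000 500000` runs three
inserts back to back. Whether this reproduces the hang likely depends on
autovacuum timing, so it may take many rounds or never trigger at all.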

Here's pg_stat_activity, with apologies for the crappy formatting:

-[ RECORD 1 ]-+------------------------------
datid         | 16384
datname       | rhaas
procpid       | 15071
usesysid      | 10
usename       | rhaas
current_query | insert into foo (x) select array[(random() * 100)::int, (random() * 90)::int, (random()*80)::int] from generate_series(1,1000000);
waiting       | f
xact_start    | 2009-02-11 21:42:24.984759-05
query_start   | 2009-02-11 21:42:24.984759-05
backend_start | 2009-02-11 21:31:45.061903-05
client_addr   |
client_port   | -1
-[ RECORD 2 ]-+------------------------------
datid         | 16384
datname       | rhaas
procpid       | 15530
usesysid      | 10
usename       | rhaas
current_query | autovacuum: VACUUM public.foo
waiting       | f
xact_start    | 2009-02-11 21:42:10.046085-05
query_start   | 2009-02-11 21:42:10.046085-05
backend_start | 2009-02-11 21:42:09.947378-05
client_addr   |
client_port   |
-[ RECORD 3 ]-+------------------------------
datid         | 16384
datname       | rhaas
procpid       | 15780
usesysid      | 10
usename       | rhaas
current_query | explain analyze select sum(1) from foo where array[1] <@ x;
waiting       | f
xact_start    | 2009-02-11 21:43:20.518837-05
query_start   | 2009-02-11 21:43:20.518837-05
backend_start | 2009-02-11 21:43:18.717031-05
client_addr   |
client_port   | -1
-[ RECORD 4 ]-+------------------------------
datid         | 16384
datname       | rhaas
procpid       | 15994
usesysid      | 10
usename       | rhaas
current_query | <IDLE>
waiting       | f
xact_start    |
query_start   | 2009-02-11 21:4

Neither the INSERT nor the EXPLAIN ANALYZE nor the autovacuum died
when sent SIGINT or SIGTERM.  They all seem to be hung on semop:

$ strace -p 15071
Process 15071 attached - interrupt to quit
semop(3866630, 0x7fff14a46be0, 1^C <unfinished ...>
Process 15071 detached
$ strace -p 15530
Process 15530 attached - interrupt to quit
semop(3866630, 0x7fff14a45740, 1^C <unfinished ...>
Process 15530 detached
$ strace -p 15780;
Process 15780 attached - interrupt to quit
semop(3866630, 0x7fff14a48c50, 1^C <unfinished ...>
Process 15780 detached
$ uname -a
Linux njpen7.snipinc.net 2.6.27.12-78.2.8.fc9.x86_64 #1 SMP Mon Jan 19
19:25:03 EST 2009 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/fedora-release
Fedora release 9 (Sulphur)

Backtrace of pid 15071:

#0  0x0000003b1bce6257 in semop () from /lib64/libc.so.6
#1  0x00000000005a8117 in PGSemaphoreLock ()
#2  0x00000000005c6409 in LockBufferForCleanup ()
#3  0x00000000004944d1 in ginInsertCleanup ()
#4  0x000000000049501d in ginHeapTupleFastInsert ()
#5  0x000000000048a772 in gininsert ()
#6  0x000000000068fb39 in FunctionCall6 ()
#7  0x0000000000469181 in index_insert ()
#8  0x0000000000540ee2 in ExecInsertIndexTuples ()
#9  0x0000000000536504 in standard_ExecutorRun ()
#10 0x00000000005e07e1 in ProcessQuery ()
#11 0x00000000005e0a06 in PortalRunMulti ()
#12 0x00000000005e11b2 in PortalRun ()
#13 0x00000000005dc507 in exec_simple_query ()
#14 0x00000000005ddad7 in PostgresMain ()
#15 0x00000000005b2e58 in ServerLoop ()
#16 0x00000000005b3b39 in PostmasterMain ()
#17 0x000000000055e1e8 in main ()

Backtrace of pid 15530:

#0  0x0000003b1bce6257 in semop () from /lib64/libc.so.6
#1  0x00000000005a8117 in PGSemaphoreLock ()
#2  0x00000000005d624d in LWLockAcquire ()
#3  0x0000000000494322 in ginInsertCleanup ()
#4  0x00000000004922ed in ginvacuumcleanup ()
#5  0x000000000068fddc in FunctionCall2 ()
#6  0x0000000000530e1e in lazy_scan_heap ()
#7  0x00000000005312d0 in lazy_vacuum_rel ()
#8  0x000000000052f935 in vacuum_rel ()
#9  0x000000000052fd4c in vacuum ()
#10 0x00000000005aa338 in do_autovacuum ()
#11 0x00000000005aa62d in AutoVacWorkerMain ()
#12 0x00000000005aa756 in StartAutoVacWorker ()
#13 0x00000000005b525e in sigusr1_handler ()
#14 <signal handler called>
#15 0x0000003b1bcdcbb3 in __select_nocancel () from /lib64/libc.so.6
#16 0x00000000005b281c in ServerLoop ()
#17 0x00000000005b3b39 in PostmasterMain ()
#18 0x000000000055e1e8 in main ()

Backtrace of pid 15780:

#0  0x0000003b1bce6257 in semop () from /lib64/libc.so.6
#1  0x00000000005a8117 in PGSemaphoreLock ()
#2  0x00000000005d624d in LWLockAcquire ()
#3  0x00000000004913ac in scanFastInsert ()
#4  0x000000000049201c in gingetbitmap ()
#5  0x000000000068fddc in FunctionCall2 ()
#6  0x000000000046878e in index_getbitmap ()
#7  0x00000000005459c5 in MultiExecBitmapIndexScan ()
#8  0x00000000005455f4 in BitmapHeapNext ()
#9  0x000000000053f492 in ExecScan ()
#10 0x0000000000538360 in ExecProcNode ()
#11 0x000000000054435a in ExecAgg ()
#12 0x00000000005382a0 in ExecProcNode ()
#13 0x0000000000535ed2 in standard_ExecutorRun ()
#14 0x00000000004f9b2f in ExplainOnePlan ()
#15 0x00000000004f9d4e in ExplainQuery ()
#16 0x00000000005df8f7 in PortalRunUtility ()
#17 0x00000000005e0c3c in FillPortalStore ()
#18 0x00000000005e123d in PortalRun ()
#19 0x00000000005dc507 in exec_simple_query ()
#20 0x00000000005ddad7 in PostgresMain ()
#21 0x00000000005b2e58 in ServerLoop ()
#22 0x00000000005b3b39 in PostmasterMain ()
#23 0x000000000055e1e8 in main ()
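
Backtraces like the ones above can be gathered and condensed with
something like the following sketch. It assumes gdb is installed and
allowed to attach to the backends; collect_bt and frame_func are
hypothetical helper names. In all three traces frame #2 is the caller of
PGSemaphoreLock, so printing it shows what each backend is waiting in:
LockBufferForCleanup for pid 15071 and LWLockAcquire for pids 15530 and
15780.

```shell
#!/bin/sh
# Attach gdb to each pid in turn and print its backtrace.
collect_bt() {
    for pid in "$@"; do
        echo "Backtrace of pid $pid:"
        # -batch makes gdb exit once the commands finish; -ex bt runs "bt".
        gdb -p "$pid" -batch -ex bt 2>/dev/null
    done
}

# Read a gdb backtrace on stdin and print the function name of frame N.
# Only matches lines of the form "#N  0xADDR in func (...)".
frame_func() {
    awk -v want="#$1" '$1 == want && $3 == "in" { print $4; exit }'
}
```

For example, `collect_bt 15071 | frame_func 2` prints the function that
called PGSemaphoreLock in that backend.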

Since fast shutdown failed, I tried an immediate shutdown, which did
work, but the database won't start back up:

LOG:  database system was interrupted; last known up at 2009-02-11 21:42:38 EST
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 0/3600E050
FATAL:  bad buffer id: 0
CONTEXT:  xlog redo Delete list page (16), node: 19425/19426/19427 blkno: 19429
LOG:  startup process (PID 17314) exited with exit code 1
LOG:  aborting startup due to startup process failure

Ugh...

...Robert
