subject:"\[HACKERS\] GIN fast insert database hang"

Re: [HACKERS] GIN fast insert database hang

2009-02-12 Thread Teodor Sigaev


This freezes the whole system even with autovacuum = off in
postgresql.conf.  As before, the backends wait on a semop() call.
Fixed. There was a deadlock of LockBufferForCleanup and LockBuffer(SHARE). 
Redesign that place to downgrade  LockBufferForCleanup to LockBuffer(EXCLUSIVE) 
with correction of page's locking during scan of pending list.




I was able to reproduce the recovery failure this way once as well,
but that part of the problem seems to be much more erratic.  Most of

Fixed too. I missed comments on XLogInsert:
 * NB: this routine feels free to scribble on the XLogRecData structs,
 * though not on the data they reference.  This is OK since the XLogRecData
 * structs are always just temporaries in the calling code.

and I reused once initialized XLogRecData many times in a loop.
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


fast_insert_gin-0.27.gz
Description: Unix tar archive

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] GIN fast insert database hang

2009-02-11 Thread Robert Haas

While fooling around with the GIN fast insert patch tonight, I managed
to hang my test database.  :-(

I'm going to try to reproduce this, but here's approximately what I did.

create table foo (id serial, x int[], primary key (id));
create index foo_gin on foo using gin (x);
insert into foo (x) select array[(random() * 100)::int, (random() *
90)::int, (random()*80)::int] from generate_series(1,100);

The last insert command was run multiple times with various numbers in
place of the constant 100 through several autovacuum cycles.
Eventually, though, it froze up.

Here's pg_stat_activity, with apologies for the crappy formatting:

 datid | datname | procpid | usesysid | usename |
current_query
 | waiting |  xact_start   |  query_
start  | backend_start | client_addr | client_port
---+-+-+--+-+---

-+-+---+
---+---+-+-
 16384 | rhaas   |   15071 |   10 | rhaas   | insert into foo (x) select arr
ay[(random() * 100)::int, (random() * 90)::int, (random()*80)::int] from generat
e_series(1,100); | f   | 2009-02-11 21:42:24.984759-05 | 2009-02-11 21:4
2:24.984759-05 | 2009-02-11 21:31:45.061903-05 | |  -1
 16384 | rhaas   |   15530 |   10 | rhaas   | autovacuum: VACUUM public.foo

 | f   | 2009-02-11 21:42:10.046085-05 | 2009-02-11 21:4
2:10.046085-05 | 2009-02-11 21:42:09.947378-05 | |
 16384 | rhaas   |   15780 |   10 | rhaas   | explain analyze select sum(1)
from foo where array[1] @ x;
 | f   | 2009-02-11 21:43:20.518837-05 | 2009-02-11 21:4
3:20.518837-05 | 2009-02-11 21:43:18.717031-05 | |  -1
 16384 | rhaas   |   15994 |   10 | rhaas   | IDLE

 | f   |   | 2009-02-11 21:4

Neither the INSERT nor the EXPLAIN ANALYZE nor the autovacuum died
when sent SIGINT or SIGTERM.  They all seem to be hung on semop:

$ strace -p 15071
Process 15071 attached - interrupt to quit
semop(3866630, 0x7fff14a46be0, 1^C unfinished ...
Process 15071 detached
$ strace -p 15530
Process 15530 attached - interrupt to quit
semop(3866630, 0x7fff14a45740, 1^C unfinished ...
Process 15530 detached
$ strace -p 15780;
Process 15780 attached - interrupt to quit
semop(3866630, 0x7fff14a48c50, 1^C unfinished ...
Process 15780 detached
$ uname -a
Linux njpen7.snipinc.net 2.6.27.12-78.2.8.fc9.x86_64 #1 SMP Mon Jan 19
19:25:03 EST 2009 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/fedora-release
Fedora release 9 (Sulphur)

Backtrace of pid 15071:

#0  0x003b1bce6257 in semop () from /lib64/libc.so.6
#1  0x005a8117 in PGSemaphoreLock ()
#2  0x005c6409 in LockBufferForCleanup ()
#3  0x004944d1 in ginInsertCleanup ()
#4  0x0049501d in ginHeapTupleFastInsert ()
#5  0x0048a772 in gininsert ()
#6  0x0068fb39 in FunctionCall6 ()
#7  0x00469181 in index_insert ()
#8  0x00540ee2 in ExecInsertIndexTuples ()
#9  0x00536504 in standard_ExecutorRun ()
#10 0x005e07e1 in ProcessQuery ()
#11 0x005e0a06 in PortalRunMulti ()
#12 0x005e11b2 in PortalRun ()
#13 0x005dc507 in exec_simple_query ()
#14 0x005ddad7 in PostgresMain ()
#15 0x005b2e58 in ServerLoop ()
#16 0x005b3b39 in PostmasterMain ()
#17 0x0055e1e8 in main ()

Backtrace of pid 15530:
#0  0x003b1bce6257 in semop () from /lib64/libc.so.6
#1  0x005a8117 in PGSemaphoreLock ()
#2  0x005d624d in LWLockAcquire ()
#3  0x00494322 in ginInsertCleanup ()
#4  0x004922ed in ginvacuumcleanup ()
#5  0x0068fddc in FunctionCall2 ()
#6  0x00530e1e in lazy_scan_heap ()
#7  0x005312d0 in lazy_vacuum_rel ()
#8  0x0052f935 in vacuum_rel ()
#9  0x0052fd4c in vacuum ()
#10 0x005aa338 in do_autovacuum ()
#11 0x005aa62d in AutoVacWorkerMain ()
#12 0x005aa756 in StartAutoVacWorker ()
#13 0x005b525e in sigusr1_handler ()
#14 signal handler called
#15 0x003b1bcdcbb3 in __select_nocancel () from /lib64/libc.so.6
#16 0x005b281c in ServerLoop ()
#17 0x005b3b39 in PostmasterMain ()
#18 0x0055e1e8 in main ()

Backtrace of pid 15780:
#0  0x003b1bce6257 in semop () from /lib64/libc.so.6
#1  0x005a8117 in PGSemaphoreLock ()
#2  0x005d624d in LWLockAcquire ()
#3  0x004913ac in scanFastInsert ()
#4  0x0049201c in gingetbitmap ()
#5  0x0068fddc in FunctionCall2 ()
#6  0x0046878e in index_getbitmap ()
#7  0x005459c5 in MultiExecBitmapIndexScan ()
#8

Re: [HACKERS] GIN fast insert database hang

2009-02-11 Thread Robert Haas

On Wed, Feb 11, 2009 at 10:03 PM, Robert Haas robertmh...@gmail.com wrote:
 I'm going to try to reproduce this, but here's approximately what I did.

OK, I've managed to build a reproducible test case.  Initial setup is
just as I had before:

 create table foo (id serial, x int[], primary key (id));
 create index foo_gin on foo using gin (x);

Then just start these two commands running in different windows and wait:

while true; do psql -c explain analyze select sum(1) from foo where
array[1] @ x; done
while true; do psql -c insert into foo (x) select array[(random() *
100)::int, (random() * 90)::int, (random()*80)::int] from
generate_series(1,10);; done

I did this four times, sometimes with the variant of adding set
enable_bitmapscan to false; before the explain analyze.  In three
cases the database recovered succesfully after the immediate shutdown;
in the other case, it crapped out much as described in my original
email.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GIN fast insert database hang

2009-02-11 Thread Robert Haas

 I did this four times, sometimes with the variant of adding set
 enable_bitmapscan to false; before the explain analyze.  In three
 cases the database recovered succesfully after the immediate shutdown;
 in the other case, it crapped out much as described in my original
 email.

Sorry to keep replying to myself, but I've figured that autovacuum is
not required to trigger this bug.  In fact, I can reliably trigger it
much more quickly just by starting two concurrent copies of:

psql -c insert into foo (x) select array[(random() * 100)::int,
(random() * 90)::int, (random()*80)::int] from
generate_series(1,10);

This freezes the whole system even with autovacuum = off in
postgresql.conf.  As before, the backends wait on a semop() call.

I was able to reproduce the recovery failure this way once as well,
but that part of the problem seems to be much more erratic.  Most of
the time after an immediate shutdown, pg_ctl start triggers recovery
followed by normal startup.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GIN fast insert database hang

[HACKERS] GIN fast insert database hang

Re: [HACKERS] GIN fast insert database hang

Re: [HACKERS] GIN fast insert database hang

4 matches

Site Navigation

Mail list logo

Footer information