On 21.06.2013 21:55, Jeff Janes wrote:
> I think I'm getting an undetected deadlock between the checkpointer and a user process running a TRUNCATE command.
>
> This is the checkpointer:
>
> #0  0x0000003a73eeaf37 in semop () from /lib64/libc.so.6
> #1  0x00000000005ff847 in PGSemaphoreLock (sema=0x7f8c0a4eb730, interruptOK=0 '\000') at pg_sema.c:415
> #2  0x00000000004b0abf in WaitOnSlot (upto=416178159648) at xlog.c:1775
> #3  WaitXLogInsertionsToFinish (upto=416178159648) at xlog.c:2086
> #4  0x00000000004b657a in CopyXLogRecordToWAL (write_len=32, isLogSwitch=1 '\001', rdata=0x0, StartPos=<value optimized out>, EndPos=416192397312) at xlog.c:1389
> #5  0x00000000004b6fb2 in XLogInsert (rmid=0 '\000', info=<value optimized out>, rdata=0x7fff00000020) at xlog.c:1209
> #6  0x00000000004b7644 in RequestXLogSwitch () at xlog.c:8748
Hmm, it looks like the xlog-switch is trying to wait for itself to finish. The concurrent TRUNCATE is just being blocked behind the xlog-switch, which is stuck on itself.
I wasn't able to reproduce exactly that, but I got a PANIC by running pgbench and concurrently doing "select pg_switch_xlog()" many times in psql.
Attached is a new version that fixes at least the problem I saw. Not sure if it fixes what you saw, but it's worth a try. How easily can you reproduce that?
> This is using the same testing harness as in the last round of this patch.
This one? http://www.postgresql.org/message-id/CAMkU=1xoa6fdyoj_4fmlqpiczr1v9gp7clnxjdhu+iggqb6...@mail.gmail.com
> Is there a way for me to dump the list of held/waiting lwlocks from gdb?
You can print out the held_lwlocks array. Or, to make it friendlier, write a function that prints it out and call that from gdb. There's no easy way that I know of to print out who's waiting for what.
Thanks for the testing!

- Heikki
xloginsert-scale-24.patch.gz