Happened again, another backtrace from a COMMIT process. I changed the commit_delay to 0 (it was 4000 to help with our storage) and will report back.
Already logging to debuglog.txt. #0 0x00007f47b0789ec7 in semop () from /lib/libc.so.6 #1 0x00007f47b2513d91 in PGSemaphoreLock (sema=0x7f47abfe7ed0, interruptOK=0 '\000') at pg_sema.c:421 #2 0x00007f47b25704f4 in LWLockWaitForVar (l=0x7f449f452680, valptr=0x7f449f4526a0, oldval=<optimized out>, newval=0xffffffffffffffff) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/storage/lmgr/lwlock.c:1011 #3 0x00007f47b23a93e6 in WaitXLogInsertionsToFinish (upto=121780827906336) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:1755 #4 0x00007f47b23aa1d3 in XLogFlush (record=121780827898328) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:2849 #5 0x00007f47b2398f17 in RecordTransactionCommit () at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xact.c:1196 #6 0x00007f47b239921b in CommitTransaction () at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xact.c:1953 #7 0x00007f47b239a8e5 in CommitTransactionCommand () at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xact.c:2732 #8 0x00007f47b257c105 in finish_xact_command () at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/tcop/postgres.c:2437 #9 0x00007f47b2580a10 in exec_execute_message (max_rows=<optimized out>, portal_name=<optimized out>) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/tcop/postgres.c:1974 #10 PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=0x7f47b49b5730 "ifms_db", username=<optimized out>) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/tcop/postgres.c:4137 #11 0x00007f47b25280f5 in BackendRun (port=<optimized out>) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/postmaster/postmaster.c:4164 #12 BackendStartup (port=<optimized out>) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/postmaster/postmaster.c:3829 #13 ServerLoop () at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/postmaster/postmaster.c:1597 #14 PostmasterMain (argc=<optimized out>, argv=<optimized out>) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/postmaster/postmaster.c:1244 #15 0x00007f47b24ba474 in main (argc=5, argv=0x7f47b49b4920) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/main/main.c:228 Detaching from program: /usr/lib/postgresql/9.4/bin/postgres, process 15601 *Spiros Ioannou IT Manager, inAccesswww.inaccess.com <http://www.inaccess.com>M: +30 6973-903808T: +30 210-6802-358* On 20 July 2015 at 15:21, Heikki Linnakangas <hlinn...@iki.fi> wrote: > On 07/20/2015 03:01 PM, Andres Freund wrote: > >> Heikki, >> >> On 2015-07-20 13:27:12 +0200, Andres Freund wrote: >> >>> On 2015-07-20 13:22:42 +0200, Andres Freund wrote: >>> >>>> Hm. The problem seems to be the WaitXLogInsertionsToFinish() call in >>>> XLogFlush(). >>>> >>> >>> These are the relevant stack traces: >>> db9lock/debuglog-commit.txt >>> #2 0x00007f7405bd44f4 in LWLockWaitForVar (l=0x7f70f2ab6680, >>> valptr=0x7f70f2ab66a0, oldval=<optimized out>, newval=0xffffffffffffffff) >>> at >>> /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/storage/lmgr/lwlock.c:1011 >>> #3 0x00007f7405a0d3e6 in WaitXLogInsertionsToFinish >>> (upto=121713318915952) at >>> /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:1755 >>> #4 0x00007f7405a0e1d3 in XLogFlush (record=121713318911056) at >>> /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:2849 >>> >>> db9lock/debuglog-insert-8276.txt >>> #1 0x00007f7405b77d91 in PGSemaphoreLock (sema=0x7f73ff6531d0, >>> interruptOK=0 '\000') at pg_sema.c:421 >>> #2 0x00007f7405bd4849 in LWLockAcquireCommon (val=<optimized out>, >>> valptr=<optimized out>, mode=<optimized out>, l=<optimized out>) at >>> /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/storage/lmgr/lwlock.c:626 >>> #3 LWLockAcquire (l=0x7f70ecaaa1a0, mode=LW_EXCLUSIVE) at >>> /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/storage/lmgr/lwlock.c:467 >>> #4 0x00007f7405a0dcca in AdvanceXLInsertBuffer (upto=<optimized out>, >>> opportunistic=<optimized out>) at >>> /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:2161 >>> #5 0x00007f7405a0e301 in GetXLogBuffer (ptr=121713318928384) at >>> /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:1848 >>> #6 0x00007f7405a0e9c9 in CopyXLogRecordToWAL (EndPos=<optimized out>, >>> StartPos=<optimized out>, rdata=0x7ffff1c21b90, isLogSwitch=<optimized >>> out>, write_len=<optimized out>) at >>> /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:1494 >>> #7 XLogInsert (rmid=<optimized out>, info=<optimized out>, >>> rdata=<optimized out>) at /tmp/buildd/postgre >>> >> >> >> XLogFlush() has the following comment: >> /* >> * Re-check how far we can now flush the WAL. >> It's generally not >> * safe to call WaitXLogInsertionsToFinish while >> holding >> * WALWriteLock, because an in-progress insertion >> might need to >> * also grab WALWriteLock to make progress. But >> we know that all >> * the insertions up to insertpos have already >> finished, because >> * that's what the earlier >> WaitXLogInsertionsToFinish() returned. >> * We're only calling it again to allow insertpos >> to be moved >> * further forward, not to actually wait for >> anyone. >> */ >> insertpos = WaitXLogInsertionsToFinish(insertpos); >> >> but I don't think that's valid reasoning. WaitXLogInsertionsToFinish() >> calls LWLockWaitForVar(oldval = InvalidXLogRecPtr), which will block if >> there's a exlusive locker and some backend doesn't yet have set >> initializedUpto. Which seems like a ossible state? >> > > A backend always updates its insert position before sleeping/acquiring > another lock, by calling WALInsertLockUpdateInsertingAt. So even though > another backend might indeed be in the initializedUpto==InvalidXlogRecPtr > state, it will get out of that state before either by releasing the lock or > updating initializedUpto, before it will in turn do anything that might > deadlock. > > Clearly there's *something* wrong here, though, given the bug report... > > - Heikki > >