I wrote:
> It's possible that this is not a deadlock per se, but the aftermath of
> someone having errored out without releasing the BtreeVacuumLock --- but
> I don't entirely see how that could happen either, at least not without
> a core dump scenario.

On closer inspection, the autovac stack trace

#4  0x080abe38 in _bt_end_vacuum (rel=0xb5f0b298) at nbtutils.c:1028
#5  0x080a9c68 in btbulkdelete (fcinfo=0xbfc58cd8) at nbtree.c:552

suggests that _bt_end_vacuum is called from the CATCH part of
btbulkdelete, and that provides an idea: if either of the elog(ERROR)
calls in _bt_start_vacuum were to actually fire, it would throw control
without having released BtreeVacuumLock, and then _bt_end_vacuum would
hang up.  _bt_start_vacuum is coded on the assumption that the LWLock
would get released by transaction abort cleanup, but we'd fail before
getting there.

So this is definitely a bug, but the next question is what's triggering
it --- both of those elogs should be "can't happen" conditions.

> Is there anything in the postmaster log when this happens?

I repeat that with more urgency.  Do you see any
"multiple active vacuums for index \"%s\"" or
"out of btvacinfo slots" log messages when these hangups occur?

			regards, tom lane
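
To make the suspected failure path concrete, here is a toy model in plain C. It is only a sketch, not the nbtree code: setjmp/longjmp stands in for PG_TRY/PG_CATCH and elog(ERROR), a boolean stands in for the non-recursive BtreeVacuumLock, and every toy_* name and the release_before_error switch are made up for illustration. Where a real backend would block forever on the lock, the toy returns false instead. Releasing the lock before raising the error is shown as one conceivable remedy, assumed here purely for comparison.

```c
#include <setjmp.h>
#include <stdbool.h>

/* Toy stand-in for a non-recursive LWLock such as BtreeVacuumLock. */
static bool lock_held = false;

/* Toy stand-in for the jump target that PG_TRY/PG_CATCH sets up. */
static jmp_buf error_context;

/* Returns false where a real backend would wait forever on its own lock. */
static bool toy_lock_acquire(void)
{
    if (lock_held)
        return false;
    lock_held = true;
    return true;
}

static void toy_lock_release(void)
{
    lock_held = false;
}

/* Models _bt_start_vacuum: take the lock, then possibly hit an error. */
static void toy_start_vacuum(bool trigger_error, bool release_before_error)
{
    toy_lock_acquire();
    if (trigger_error)
    {
        if (release_before_error)
            toy_lock_release();     /* hypothetical remedy */
        longjmp(error_context, 1);  /* stand-in for elog(ERROR) */
    }
    toy_lock_release();
}

/* Models _bt_end_vacuum: it needs the same lock to clean up. */
static bool toy_end_vacuum(void)
{
    if (!toy_lock_acquire())
        return false;               /* the real code would hang here */
    toy_lock_release();
    return true;
}

/*
 * Models btbulkdelete. Returns true if cleanup completed, false if the
 * cleanup step would have hung on the still-held lock.
 */
static bool toy_bulkdelete(bool trigger_error, bool release_before_error)
{
    lock_held = false;
    if (setjmp(error_context) == 0)
    {
        toy_start_vacuum(trigger_error, release_before_error);
        /* ... the index scan would run here ... */
        return toy_end_vacuum();
    }
    /* CATCH path: cleanup runs before any transaction abort processing. */
    return toy_end_vacuum();
}
```

In this model, toy_bulkdelete(true, false) is the case described above: the error escapes with the lock still held, and the cleanup in the CATCH path cannot get it. toy_bulkdelete(false, false) and toy_bulkdelete(true, true) both complete.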