Martin Pitt wrote:
Since our Launchpad developers switched from 8.1 to 8.2.3, they often
encounter a situation when the postmaster gets stuck and needs to be
restarted. This happens on various CREATE commands (FUNCTION,
DATABASE, not consistently).
The backtraces show that the process doing the CREATION, another
client connection that InitPostgres(), and the autovacuum daemon all
hang eternally in PGSemaphoreLock(), so I suspect that there is a race
condition of some sort?
I tried to reproduce this with an hour-long run of CREATE/DROP
DATABASE, but without success.
It occurred to me that the process which does the CREATE is in
_bt_vacuum_cycleid(), is that also related to the autovacuum daemon?
I asked the original reporter (Mark Shuttleworth, in CC) to disable
the autovacuum daemon for testing.
Ok, I think I know what's happening. In btbulkdelete we have a
PG_TRY-CATCH block. In the try-block, we call _bt_start_vacuum which
acquires and releases the BtreeVacuumLock. Under certain error
conditions, _bt_start_vacuum calls elog(ERROR) while holding the
BtreeVacuumLock. The PG_CATCH block calls _bt_end_vacuum which also
tries to acquire BtreeVacuumLock.
If those error conditions in _bt_start_vacuum is triggered, we'll enter
the PG_CATCH-block and _bt_end_vacuum while already holding
BtreeVacuumLock -> deadlock.
The two error conditions in _bt_start_vacuum that throw elog(ERROR) are
the case when there's multiple vacuums for a single index, and running
out of vacuum slots. Neither scenario should ever happen, but maybe
there's a bug related to CREATE DATABASE that triggers it.
I'll investigate this more this evening or tomorrow...
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at
http://www.postgresql.org/about/donate