Martin Pitt wrote:
Since our Launchpad developers switched from 8.1 to 8.2.3, they often
encounter a situation when the postmaster gets stuck and needs to be
restarted. This happens on various CREATE commands (FUNCTION,
DATABASE, not consistently).

The backtraces show that the process doing the CREATION, another
client connection that InitPostgres(), and the autovacuum daemon all
hang eternally in PGSemaphoreLock(), so I suspect that there is a race
condition of some sort?
I tried to reproduce this with an hour-long run of CREATE/DROP
DATABASE, but without success.

It occurred to me that the process which does the CREATE is in
_bt_vacuum_cycleid(), is that also related to the autovacuum daemon?
I asked the original reporter (Mark Shuttleworth, in CC) to disable
the autovacuum daemon for testing.

Ok, I think I know what's happening. In btbulkdelete we have a PG_TRY-CATCH block. In the try-block, we call _bt_start_vacuum which acquires and releases the BtreeVacuumLock. Under certain error conditions, _bt_start_vacuum calls elog(ERROR) while holding the BtreeVacuumLock. The PG_CATCH block calls _bt_end_vacuum which also tries to acquire BtreeVacuumLock.

If those error conditions in _bt_start_vacuum is triggered, we'll enter the PG_CATCH-block and _bt_end_vacuum while already holding BtreeVacuumLock -> deadlock.

The two error conditions in _bt_start_vacuum that throw elog(ERROR) are the case when there's multiple vacuums for a single index, and running out of vacuum slots. Neither scenario should ever happen, but maybe there's a bug related to CREATE DATABASE that triggers it.

I'll investigate this more this evening or tomorrow...

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

               http://www.postgresql.org/about/donate

Reply via email to