Re: [ADMIN] Strange deadlock error last night

2009-01-13 Thread Tom Lane
Scott Whitney swhit...@journyx.com writes:
 Last night, I got this:

 Jan 13 03:31:28 db01 postgres[23537]: [140-2] DETAIL:  Process 23537 waits
 for AccessShareLock on relation 1260 of database 0; blocked by process
 8.
 Jan 13 03:31:28 db01 postgres[23537]: [140-3]   Process 8 waits for
 AccessShareLock on relation 1262 of database 0; blocked by process 6816.
 Jan 13 03:31:28 db01 postgres[23537]: [140-4]   Process 6816 waits for
 AccessShareLock on relation 1260 of database 0; blocked by process 14624.
 Jan 13 03:31:28 db01 postgres[23537]: [140-5]   Process 14624 waits for
 AccessExclusiveLock on relation 1260 of database 0; blocked by process
 23537.
 Jan 13 03:31:28 db01 postgres[14624]: [243-1] ERROR:  deadlock detected

Well, the problem is evidently that guy trying to get exclusive lock on
relation 1260, which is pg_authid (a quick way to check that is select
1260::regclass).  (Note: the database 0 just means it's a system catalog
that's shared across all databases.)  pg_authid is touched often enough
that trying to ex-lock it in an active database is just asking for
trouble.

Is it possible that that process was trying to run VACUUM FULL or
REINDEX on pg_authid?  I can't think of any other likely-sounding
explanation for something trying to take an exclusive lock on that
catalog.  The best solution would be don't do that ...

 It ended up locking up about 250 customer databases until I restarted the
 postmaster. This is version 8.1.4. Upgrading right now (even to a minor rev)
 is not really an option.

Not related to the immediate problem, but: you really need to schedule
5 minutes' downtime so you can update to 8.1.latest.  Reasons can be
found here:
http://www.postgresql.org/docs/8.1/static/release.html

regards, tom lane

-- 
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin


Re: [ADMIN] Strange deadlock error last night

2009-01-13 Thread Alvaro Herrera
Tom Lane wrote:
 Scott Whitney swhit...@journyx.com writes:

  It ended up locking up about 250 customer databases until I restarted the
  postmaster. This is version 8.1.4. Upgrading right now (even to a minor rev)
  is not really an option.
 
 Not related to the immediate problem, but: you really need to schedule
 5 minutes' downtime so you can update to 8.1.latest.  Reasons can be
 found here:
 http://www.postgresql.org/docs/8.1/static/release.html

Particularly so if you run autovacuum, because otherwise you'll be hit
by a bug fixed in 8.1.6 about template0 being unfrozen (thawed?)

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin


Re: [ADMIN] Strange deadlock error last night

2009-01-13 Thread Scott Marlowe
On Tue, Jan 13, 2009 at 10:37 AM, Scott Whitney swhit...@journyx.com wrote:

 It ended up locking up about 250 customer databases until I restarted the
 postmaster. This is version 8.1.4. Upgrading right now (even to a minor rev)
 is not really an option. This box has been up and running for 306 days. This
 postgres level has been installed for..err...well...at least Aug 9, 2006,
 based on some dates in the directories.

You need to ask yourself how much downtime you can afford.  The 2 or 3
minutes every few months to go from 8.1.x to 8.1.x+1, or the half a
day of downtime when some horrendous bug takes down the whole site
because you didn't update it.  Seriously, that unfozen template0 bug
that Alvarro mentioned is one of those kinds of bugs.

Nothing like your db going down in the middle of the day with an error
message that it's going down to prevent txid wraparound induced loss,
please run vacuum on all your databases in single user mode.

If you can't find set aside a minute or two at 0200 hrs, then don't be
surprised when you get one of those failures.

-- 
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin


Re: [ADMIN] Strange deadlock error last night

2009-01-13 Thread Scott Whitney
Thanks for all the information, guys. I think Tom was right. Our application
was doing a couple of full vacs at the same time. It's weird that we didn't
run into this in the past.

You're all absolutely right about the upgrading, but in our environment,
it's not 2-3 minutes. It's 2-3 weeks. I've got to fully vet the app on the
platform internally with full test plans, etc, even for the most minor
upgrades; corp policy.

Right now, my effort is in going to the latest stable branch. Moving
forward, I will use these notes to get the company to revisit the minor
upgrade policy, though.

After all, when I _do_ get hit by one of those bugs, I _will_ be asked why
we weren't upgraded. *sigh*

 

-Original Message-
From: Scott Marlowe [mailto:scott.marl...@gmail.com] 
Sent: Tuesday, January 13, 2009 4:16 PM
To: Scott Whitney
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Strange deadlock error last night

On Tue, Jan 13, 2009 at 10:37 AM, Scott Whitney swhit...@journyx.com
wrote:

 It ended up locking up about 250 customer databases until I restarted the
 postmaster. This is version 8.1.4. Upgrading right now (even to a minor
rev)
 is not really an option. This box has been up and running for 306 days.
This
 postgres level has been installed for..err...well...at least Aug 9, 2006,
 based on some dates in the directories.

You need to ask yourself how much downtime you can afford.  The 2 or 3
minutes every few months to go from 8.1.x to 8.1.x+1, or the half a
day of downtime when some horrendous bug takes down the whole site
because you didn't update it.  Seriously, that unfozen template0 bug
that Alvarro mentioned is one of those kinds of bugs.

Nothing like your db going down in the middle of the day with an error
message that it's going down to prevent txid wraparound induced loss,
please run vacuum on all your databases in single user mode.

If you can't find set aside a minute or two at 0200 hrs, then don't be
surprised when you get one of those failures.


-- 
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin