Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held

2011-09-08 Thread daveg
On Wed, Sep 07, 2011 at 09:02:04PM -0400, Tom Lane wrote: daveg da...@sonic.net writes: On Wed, Sep 07, 2011 at 07:39:15PM -0400, Tom Lane wrote: BTW ... what were the last versions you were running on which you had *not* seen the problem? (Just wondering about the possibility that we

Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held

2011-09-08 Thread Tom Lane
daveg da...@sonic.net writes: On Wed, Sep 07, 2011 at 09:02:04PM -0400, Tom Lane wrote: daveg da...@sonic.net writes: The first version we saw it on was 8.4.7. Yeah, you said that. I was wondering what you'd last run before 8.4.7. Sorry, misunderstood. We were previously running 8.4.4, but

Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held

2011-09-07 Thread daveg
On Tue, Aug 23, 2011 at 12:15:23PM -0400, Robert Haas wrote: On Mon, Aug 22, 2011 at 3:31 AM, daveg da...@sonic.net wrote: So far I've got:  - affects system tables  - happens very soon after process startup  - in 8.4.7 and 9.0.4  - not likely to be hardware or OS related  - happens

Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held

2011-09-07 Thread Robert Haas
On Wed, Sep 7, 2011 at 5:16 AM, daveg da...@sonic.net wrote: On Tue, Aug 23, 2011 at 12:15:23PM -0400, Robert Haas wrote: On Mon, Aug 22, 2011 at 3:31 AM, daveg da...@sonic.net wrote: So far I've got:  - affects system tables  - happens very soon after process startup  - in 8.4.7 and

Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held

2011-09-07 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: After spending some time staring at the code, I do have one idea as to what might be going on here. When a backend is terminated, ShutdownPostgres() calls AbortOutOfAnyTransaction() and then LockReleaseAll(USER_LOCKMETHOD, true). The second call

Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held

2011-09-07 Thread daveg
On Wed, Sep 07, 2011 at 10:20:24AM -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: After spending some time staring at the code, I do have one idea as to what might be going on here. When a backend is terminated, ShutdownPostgres() calls AbortOutOfAnyTransaction() and then

Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held

2011-09-07 Thread Robert Haas
On Wed, Sep 7, 2011 at 4:22 PM, daveg da...@sonic.net wrote: Yes, we make extensive use of advisory locks. That was my thought too when Robert mentioned session level locks. I'm happy to add any additional instrumentation, but my client would be happier to actually run it if there was a way

Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held

2011-09-07 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: Tom's right to be skeptical of my theory, because it would require a CHECK_FOR_INTERRUPTS() outside of a transaction block in one of the pathways that use session-level locks, and I can't find one. More to the point: session-level locks are released on

Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held

2011-09-07 Thread Robert Haas
On Wed, Sep 7, 2011 at 4:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: Yeah, and for that matter it seems to let VACUUM off the hook too. If we assume that the reported object ID is non-corrupt (and if it's always the same, that seems like the way to bet) then this is a lock on pg_authid. Hmmm

Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held

2011-09-07 Thread daveg
On Wed, Sep 07, 2011 at 04:55:24PM -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: Tom's right to be skeptical of my theory, because it would require a CHECK_FOR_INTERRUPTS() outside of a transaction block in one of the pathways that use session-level locks, and I can't

Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held

2011-09-07 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: I thought about an error exit from client authentication, and that's a somewhat appealing explanation, but I can't quite see why we wouldn't clean up there the same as anywhere else. The whole mechanism feels a bit rickety to me - we don't actually

Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held

2011-09-07 Thread Tom Lane
daveg da...@sonic.net writes: It does not seem restricted to pg_authid: 2011-08-24 18:35:57.445 24987 c23 apps ERROR: lock AccessShareLock on object 16403/2615/0 And I think I've seen it on other tables too. Hmm. 2615 = pg_namespace, which most likely is the first catalog accessed by

Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held

2011-09-07 Thread daveg
On Wed, Sep 07, 2011 at 06:35:08PM -0400, Tom Lane wrote: daveg da...@sonic.net writes: It does not seem restricted to pg_authid: 2011-08-24 18:35:57.445 24987 c23 apps ERROR: lock AccessShareLock on object 16403/2615/0 And I think I've seen it on other tables too. Hmm. 2615 =

Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held

2011-09-07 Thread daveg
On Wed, Sep 07, 2011 at 06:25:23PM -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: I thought about an error exit from client authentication, and that's a somewhat appealing explanation, but I can't quite see why we wouldn't clean up there the same as anywhere else. The

Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held

2011-09-07 Thread Tom Lane
daveg da...@sonic.net writes: On Wed, Sep 07, 2011 at 06:25:23PM -0400, Tom Lane wrote: ... But maybe it'd be interesting for Dave to stick a LockReleaseAll call into ProcKill() and see if that makes things better. (Dave: test that before you put it in production, I'm not totally sure it's

Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held

2011-09-07 Thread Tom Lane
daveg da...@sonic.net writes: Also, this is very intermittent, we have seen it only in recent months on both 8.4.7 and 9.0.4 after years of no problems. Lately we see it what feels like a few times a month. Possibly some new application behaviour is provoking it, but I have no guesses as to

Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held

2011-09-07 Thread daveg
On Wed, Sep 07, 2011 at 07:39:15PM -0400, Tom Lane wrote: daveg da...@sonic.net writes: Also, this is very intermittent, we have seen it only in recent months on both 8.4.7 and 9.0.4 after years of no problems. Lately we see it what feels like a few times a month. Possibly some new

Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held

2011-09-07 Thread Tom Lane
daveg da...@sonic.net writes: On Wed, Sep 07, 2011 at 07:39:15PM -0400, Tom Lane wrote: BTW ... what were the last versions you were running on which you had *not* seen the problem? (Just wondering about the possibility that we back-patched some fix that broke things. It would be good to

Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held

2011-09-07 Thread Robert Haas
On Wed, Sep 7, 2011 at 6:25 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I thought about an error exit from client authentication, and that's a somewhat appealing explanation, but I can't quite see why we wouldn't clean up there the same as anywhere else.  

Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held

2011-08-23 Thread Robert Haas
On Mon, Aug 22, 2011 at 3:31 AM, daveg da...@sonic.net wrote: So far I've got:  - affects system tables  - happens very soon after process startup  - in 8.4.7 and 9.0.4  - not likely to be hardware or OS related  - happens in clusters for periods of a few seconds to many minutes I'll work

Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held

2011-08-22 Thread daveg
On Fri, Aug 12, 2011 at 04:19:37PM -0700, daveg wrote: This seems to be bug month for my client. Now they are seeing periods where all new connections fail immediately with the error: FATAL: lock AccessShareLock on object 0/1260/0 is already held This happens on PostgreSQL 8.4.7 on