Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-16 Thread Robert Haas
On Tue, Aug 16, 2011 at 3:45 PM, Tom Lane wrote: >> It would be nice to move the short-circuit test I recently inserted at >> the top of SIGetDataEntries() somewhere higher up in the call stack, >> but right now the layers of abstraction are so thick that it's not >> exactly clear how to do that.

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-16 Thread Tom Lane
Robert Haas writes: > On Thu, Aug 11, 2011 at 5:09 PM, Tom Lane wrote: >> What's bothering me at the moment is that the CLOBBER_CACHE_ALWAYS hack, >> which was meant to expose exactly this sort of problem, failed to do so >> --- buildfarm member jaguar has been running with that flag for ages and

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-15 Thread daveg
[adding back hackers so the thread shows the resolution] On Sun, Aug 14, 2011 at 07:02:55PM -0400, Tom Lane wrote: > Sounds good. Based on my own testing so far, I think that patch will > probably make things measurably better for you, though it won't resolve > every corner case. The most recent

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-11 Thread Robert Haas
On Thu, Aug 11, 2011 at 5:09 PM, Tom Lane wrote: > I can reproduce the problem fairly conveniently with this crude hack: > > diff --git a/src/backend/storage/ipc/sinval.c > b/src/backend/storage/ipc/sinval.c > index 8499615..5ad2aee 100644 > *** a/src/backend/storage/ipc/sinval.c > --- b/src/back

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-11 Thread Tom Lane
I wrote: > I still haven't reproduced the behavior here, but I think I see what > must be happening: we are getting an sinval reset while attempting to > open pg_class_oid_index. After a number of false starts, I've managed to reproduce this behavior locally. The above theory turns out to be wron

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-05 Thread Tom Lane
daveg writes: > Should this be applied in addition to the earlier patch, or to replace it? Apply it instead of the earlier one. regards, tom lane

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-05 Thread daveg
On Fri, Aug 05, 2011 at 12:10:31PM -0400, Tom Lane wrote: > I wrote: > > Ahh ... you know what, never mind about stack traces, let's just see if > > the attached patch doesn't fix it. > > On reflection, that patch would only fix the issue for pg_class, and > that's not the only catalog that gets c

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-05 Thread Tom Lane
I wrote: > Ahh ... you know what, never mind about stack traces, let's just see if > the attached patch doesn't fix it. On reflection, that patch would only fix the issue for pg_class, and that's not the only catalog that gets consulted during relcache reloads. I think we'd better do it as attache

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-04 Thread Tom Lane
Ahh ... you know what, never mind about stack traces, let's just see if the attached patch doesn't fix it. I still haven't reproduced the behavior here, but I think I see what must be happening: we are getting an sinval reset while attempting to open pg_class_oid_index. The latter condition cause

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-04 Thread Tom Lane
daveg writes: > On Thu, Aug 04, 2011 at 04:16:08PM -0400, Tom Lane wrote: >> If this theory is correct then all of the file-related errors ought to >> match up to recently-vacuumed mapped catalogs or indexes (those are the >> ones with relfilenode = 0 in pg_class). Do you want to expand your >> l

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-04 Thread daveg
On Thu, Aug 04, 2011 at 04:16:08PM -0400, Tom Lane wrote: > daveg writes: > > We are seeing 'cannot read' and 'cannot open' errors too that would be > > consistent with trying to use a vanished file. > > Yeah, these all seem consistent with the idea that the failing backend > somehow missed an up

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-04 Thread Tom Lane
daveg writes: > We are seeing 'cannot read' and 'cannot open' errors too that would be > consistent with trying to use a vanished file. Yeah, these all seem consistent with the idea that the failing backend somehow missed an update for the relation mapping file. You would get the "could not find

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-04 Thread daveg
On Thu, Aug 04, 2011 at 12:28:31PM -0400, Tom Lane wrote: > daveg writes: > > Summary: the failing process reads 0 rows from 0 blocks from the OLD > > relfilenode. > > Hmm. This seems to mean that we're somehow missing a relation mapping > invalidation message, or perhaps not processing it soon

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-04 Thread daveg
On Thu, Aug 04, 2011 at 12:28:31PM -0400, Tom Lane wrote: > daveg writes: > > Summary: the failing process reads 0 rows from 0 blocks from the OLD > > relfilenode. > > Hmm. This seems to mean that we're somehow missing a relation mapping > invalidation message, or perhaps not processing it soon

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-04 Thread Tom Lane
daveg writes: > Summary: the failing process reads 0 rows from 0 blocks from the OLD > relfilenode. Hmm. This seems to mean that we're somehow missing a relation mapping invalidation message, or perhaps not processing it soon enough during some complex set of invalidations. I did some testing

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-04 Thread daveg
On Wed, Aug 03, 2011 at 11:18:20AM -0400, Tom Lane wrote: > Evidently not, if it's not logging anything, but now the question is > why. One possibility is that for some reason RelationGetNumberOfBlocks > is persistently lying about the file size. (We've seen kernel bugs > before that resulted in

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-03 Thread Tom Lane
daveg writes: > We have installed the patch and have encountered the error as usual. > However there is no additional output from the patch. I'm speculating > that the pg_class scan in ScanPgRelationDetailed() fails to return > tuples somehow. Evidently not, if it's not logging anything, but now

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-03 Thread daveg
On Mon, Aug 01, 2011 at 01:23:49PM -0400, Tom Lane wrote: > daveg writes: > > On Sun, Jul 31, 2011 at 11:44:39AM -0400, Tom Lane wrote: > >> I think we need to start adding some instrumentation so we can get a > >> better handle on what's going on in your database. If I were to send > >> you a so

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-01 Thread Tom Lane
daveg writes: > On Sun, Jul 31, 2011 at 11:44:39AM -0400, Tom Lane wrote: >> I think we need to start adding some instrumentation so we can get a >> better handle on what's going on in your database. If I were to send >> you a source-code patch for the server that adds some more logging >> printo

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-31 Thread daveg
On Sun, Jul 31, 2011 at 11:44:39AM -0400, Tom Lane wrote: > daveg writes: > > Here is the update: the problem happens with vacuum full alone, no reindex > > is needed to trigger it. I updated the script to avoid reindexing after > > vacuum. Over the past two days there are still many occurrences of

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-31 Thread Tom Lane
daveg writes: > Here is the update: the problem happens with vacuum full alone, no reindex > is needed to trigger it. I updated the script to avoid reindexing after > vacuum. Over the past two days there are still many occurrences of this > error coincident with the vacuum. Well, that jibes with t

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-31 Thread daveg
On Thu, Jul 28, 2011 at 11:31:31PM -0700, daveg wrote: > On Thu, Jul 28, 2011 at 07:45:01PM -0400, Robert Haas wrote: > > REINDEX. My guess is that this is happening either right around the > > time the VACUUM FULL commits or right around the time the REINDEX > > commits. It'd be helpful to know

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-29 Thread daveg
On Fri, Jul 29, 2011 at 09:55:46AM -0400, Tom Lane wrote: > The thing that was bizarre about the one instance in the buildfarm was > that the error was persistent, ie, once a session had failed all its > subsequent attempts to access pg_class failed too. I gather from Dave's > description that it'

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-29 Thread Tom Lane
Robert Haas writes: > On Fri, Jul 29, 2011 at 11:27 AM, Tom Lane wrote: >> Well, no, because the ScanPgRelation call is not failing internally. >> It's performing a seqscan of pg_class and not finding a matching tuple. > SnapshotNow race? That's what I would have guessed to start with, except t

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-29 Thread Robert Haas
On Fri, Jul 29, 2011 at 11:27 AM, Tom Lane wrote: > Robert Haas writes: >> On Fri, Jul 29, 2011 at 9:55 AM, Tom Lane wrote: >>> The thing that was bizarre about the one instance in the buildfarm was >>> that the error was persistent, ie, once a session had failed all its >>> subsequent attempts

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-29 Thread Tom Lane
Robert Haas writes: > On Fri, Jul 29, 2011 at 9:55 AM, Tom Lane wrote: >> The thing that was bizarre about the one instance in the buildfarm was >> that the error was persistent, ie, once a session had failed all its >> subsequent attempts to access pg_class failed too. > I was thinking more alo

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-29 Thread Robert Haas
On Fri, Jul 29, 2011 at 9:55 AM, Tom Lane wrote: > daveg writes: >> On Thu, Jul 28, 2011 at 07:45:01PM -0400, Robert Haas wrote: >>> Ah, OK, sorry. Well, in 9.0, VACUUM FULL is basically CLUSTER, which >>> means that a REINDEX is happening as part of the same operation. In >>> 9.0, there's no p

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-29 Thread Tom Lane
daveg writes: > On Thu, Jul 28, 2011 at 07:45:01PM -0400, Robert Haas wrote: >> Ah, OK, sorry. Well, in 9.0, VACUUM FULL is basically CLUSTER, which >> means that a REINDEX is happening as part of the same operation. In >> 9.0, there's no point in doing VACUUM FULL immediately followed by >> REI

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-28 Thread daveg
On Thu, Jul 28, 2011 at 07:45:01PM -0400, Robert Haas wrote: > On Thu, Jul 28, 2011 at 5:46 PM, daveg wrote: > > On Thu, Jul 28, 2011 at 09:46:41AM -0400, Robert Haas wrote: > >> On Wed, Jul 27, 2011 at 8:28 PM, daveg wrote: > >> > My client has been seeing regular instances of the following sort

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-28 Thread Robert Haas
On Thu, Jul 28, 2011 at 5:46 PM, daveg wrote: > On Thu, Jul 28, 2011 at 09:46:41AM -0400, Robert Haas wrote: >> On Wed, Jul 27, 2011 at 8:28 PM, daveg wrote: >> > My client has been seeing regular instances of the following sort of >> > problem: >> On what version of PostgreSQL? > > 9.0.4. > > I

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-28 Thread daveg
On Thu, Jul 28, 2011 at 09:46:41AM -0400, Robert Haas wrote: > On Wed, Jul 27, 2011 at 8:28 PM, daveg wrote: > > My client has been seeing regular instances of the following sort of > > problem: > On what version of PostgreSQL? 9.0.4. I previously said: > > This occurs on postgresql 9.0.4. on 3

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-28 Thread Robert Haas
On Wed, Jul 27, 2011 at 8:28 PM, daveg wrote: > My client has been seeing regular instances of the following sort of problem: On what version of PostgreSQL? > If simplicity worked, the world would be overrun with insects. I thought it was... :-)