Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-16 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Thu, Aug 11, 2011 at 5:09 PM, Tom Lane t...@sss.pgh.pa.us wrote: What's bothering me at the moment is that the CLOBBER_CACHE_ALWAYS hack, which was meant to expose exactly this sort of problem, failed to do so --- buildfarm member jaguar has been

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-16 Thread Robert Haas
On Tue, Aug 16, 2011 at 3:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: It would be nice to move the short-circuit test I recently inserted at the top of SIGetDataEntries() somewhere higher up in the call stack, but right now the layers of abstraction are so thick that it's not exactly clear how

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-15 Thread daveg
[adding back hackers so the thread shows the resolution] On Sun, Aug 14, 2011 at 07:02:55PM -0400, Tom Lane wrote: Sounds good. Based on my own testing so far, I think that patch will probably make things measurably better for you, though it won't resolve every corner case. The most recent

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-11 Thread Tom Lane
I wrote: I still haven't reproduced the behavior here, but I think I see what must be happening: we are getting an sinval reset while attempting to open pg_class_oid_index. After a number of false starts, I've managed to reproduce this behavior locally. The above theory turns out to be wrong,

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-11 Thread Robert Haas
On Thu, Aug 11, 2011 at 5:09 PM, Tom Lane t...@sss.pgh.pa.us wrote: I can reproduce the problem fairly conveniently with this crude hack: diff --git a/src/backend/storage/ipc/sinval.c b/src/backend/storage/ipc/sinval.c index 8499615..5ad2aee 100644 *** a/src/backend/storage/ipc/sinval.c

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-05 Thread Tom Lane
I wrote: Ahh ... you know what, never mind about stack traces, let's just see if the attached patch doesn't fix it. On reflection, that patch would only fix the issue for pg_class, and that's not the only catalog that gets consulted during relcache reloads. I think we'd better do it as

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-05 Thread daveg
On Fri, Aug 05, 2011 at 12:10:31PM -0400, Tom Lane wrote: I wrote: Ahh ... you know what, never mind about stack traces, let's just see if the attached patch doesn't fix it. On reflection, that patch would only fix the issue for pg_class, and that's not the only catalog that gets

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-05 Thread Tom Lane
daveg da...@sonic.net writes: Should this be applied in addition to the earlier patch, or to replace it? Apply it instead of the earlier one. regards, tom lane

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-04 Thread daveg
On Wed, Aug 03, 2011 at 11:18:20AM -0400, Tom Lane wrote: Evidently not, if it's not logging anything, but now the question is why. One possibility is that for some reason RelationGetNumberOfBlocks is persistently lying about the file size. (We've seen kernel bugs before that resulted in

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-04 Thread Tom Lane
daveg da...@sonic.net writes: Summary: the failing process reads 0 rows from 0 blocks from the OLD relfilenode. Hmm. This seems to mean that we're somehow missing a relation mapping invalidation message, or perhaps not processing it soon enough during some complex set of invalidations. I did

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-04 Thread daveg
On Thu, Aug 04, 2011 at 12:28:31PM -0400, Tom Lane wrote: daveg da...@sonic.net writes: Summary: the failing process reads 0 rows from 0 blocks from the OLD relfilenode. Hmm. This seems to mean that we're somehow missing a relation mapping invalidation message, or perhaps not processing

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-04 Thread daveg
On Thu, Aug 04, 2011 at 12:28:31PM -0400, Tom Lane wrote: daveg da...@sonic.net writes: Summary: the failing process reads 0 rows from 0 blocks from the OLD relfilenode. Hmm. This seems to mean that we're somehow missing a relation mapping invalidation message, or perhaps not processing

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-04 Thread Tom Lane
daveg da...@sonic.net writes: We are seeing 'cannot read' and 'cannot open' errors too that would be consistent with trying to use a vanished file. Yeah, these all seem consistent with the idea that the failing backend somehow missed an update for the relation mapping file. You would get the

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-04 Thread daveg
On Thu, Aug 04, 2011 at 04:16:08PM -0400, Tom Lane wrote: daveg da...@sonic.net writes: We are seeing 'cannot read' and 'cannot open' errors too that would be consistent with trying to use a vanished file. Yeah, these all seem consistent with the idea that the failing backend somehow

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-04 Thread Tom Lane
daveg da...@sonic.net writes: On Thu, Aug 04, 2011 at 04:16:08PM -0400, Tom Lane wrote: If this theory is correct then all of the file-related errors ought to match up to recently-vacuumed mapped catalogs or indexes (those are the ones with relfilenode = 0 in pg_class). Do you want to expand

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-04 Thread Tom Lane
Ahh ... you know what, never mind about stack traces, let's just see if the attached patch doesn't fix it. I still haven't reproduced the behavior here, but I think I see what must be happening: we are getting an sinval reset while attempting to open pg_class_oid_index. The latter condition

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-03 Thread daveg
On Mon, Aug 01, 2011 at 01:23:49PM -0400, Tom Lane wrote: daveg da...@sonic.net writes: On Sun, Jul 31, 2011 at 11:44:39AM -0400, Tom Lane wrote: I think we need to start adding some instrumentation so we can get a better handle on what's going on in your database. If I were to send you

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-03 Thread Tom Lane
daveg da...@sonic.net writes: We have installed the patch and have encountered the error as usual. However there is no additional output from the patch. I'm speculating that the pg_class scan in ScanPgRelationDetailed() fails to return tuples somehow. Evidently not, if it's not logging

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-01 Thread Tom Lane
daveg da...@sonic.net writes: On Sun, Jul 31, 2011 at 11:44:39AM -0400, Tom Lane wrote: I think we need to start adding some instrumentation so we can get a better handle on what's going on in your database. If I were to send you a source-code patch for the server that adds some more logging

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-31 Thread daveg
On Thu, Jul 28, 2011 at 11:31:31PM -0700, daveg wrote: On Thu, Jul 28, 2011 at 07:45:01PM -0400, Robert Haas wrote: REINDEX. My guess is that this is happening either right around the time the VACUUM FULL commits or right around the time the REINDEX commits. It'd be helpful to know which,

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-31 Thread Tom Lane
daveg da...@sonic.net writes: Here is the update: the problem happens with vacuum full alone, no reindex is needed to trigger it. I updated the script to avoid reindexing after vacuum. Over the past two days there are still many occurrences of this error coincident with the vacuum. Well, that

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-31 Thread daveg
On Sun, Jul 31, 2011 at 11:44:39AM -0400, Tom Lane wrote: daveg da...@sonic.net writes: Here is the update: the problem happens with vacuum full alone, no reindex is needed to trigger it. I updated the script to avoid reindexing after vacuum. Over the past two days there are still many

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-29 Thread daveg
On Thu, Jul 28, 2011 at 07:45:01PM -0400, Robert Haas wrote: On Thu, Jul 28, 2011 at 5:46 PM, daveg da...@sonic.net wrote: On Thu, Jul 28, 2011 at 09:46:41AM -0400, Robert Haas wrote: On Wed, Jul 27, 2011 at 8:28 PM, daveg da...@sonic.net wrote: My client has been seeing regular instances

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-29 Thread Tom Lane
daveg da...@sonic.net writes: On Thu, Jul 28, 2011 at 07:45:01PM -0400, Robert Haas wrote: Ah, OK, sorry. Well, in 9.0, VACUUM FULL is basically CLUSTER, which means that a REINDEX is happening as part of the same operation. In 9.0, there's no point in doing VACUUM FULL immediately followed

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-29 Thread Robert Haas
On Fri, Jul 29, 2011 at 9:55 AM, Tom Lane t...@sss.pgh.pa.us wrote: daveg da...@sonic.net writes: On Thu, Jul 28, 2011 at 07:45:01PM -0400, Robert Haas wrote: Ah, OK, sorry. Well, in 9.0, VACUUM FULL is basically CLUSTER, which means that a REINDEX is happening as part of the same operation.

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-29 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Fri, Jul 29, 2011 at 9:55 AM, Tom Lane t...@sss.pgh.pa.us wrote: The thing that was bizarre about the one instance in the buildfarm was that the error was persistent, ie, once a session had failed all its subsequent attempts to access pg_class

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-29 Thread Robert Haas
On Fri, Jul 29, 2011 at 11:27 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Jul 29, 2011 at 9:55 AM, Tom Lane t...@sss.pgh.pa.us wrote: The thing that was bizarre about the one instance in the buildfarm was that the error was persistent, ie, once a

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-29 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Fri, Jul 29, 2011 at 11:27 AM, Tom Lane t...@sss.pgh.pa.us wrote: Well, no, because the ScanPgRelation call is not failing internally. It's performing a seqscan of pg_class and not finding a matching tuple. SnapshotNow race? That's what I would

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-29 Thread daveg
On Fri, Jul 29, 2011 at 09:55:46AM -0400, Tom Lane wrote: The thing that was bizarre about the one instance in the buildfarm was that the error was persistent, ie, once a session had failed all its subsequent attempts to access pg_class failed too. I gather from Dave's description that it's

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-28 Thread Robert Haas
On Wed, Jul 27, 2011 at 8:28 PM, daveg da...@sonic.net wrote: My client has been seeing regular instances of the following sort of problem: On what version of PostgreSQL? If simplicity worked, the world would be overrun with insects. I thought it was... :-)

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-28 Thread daveg
On Thu, Jul 28, 2011 at 09:46:41AM -0400, Robert Haas wrote: On Wed, Jul 27, 2011 at 8:28 PM, daveg da...@sonic.net wrote: My client has been seeing regular instances of the following sort of problem: On what version of PostgreSQL? 9.0.4. I previously said: This occurs on postgresql

Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-28 Thread Robert Haas
On Thu, Jul 28, 2011 at 5:46 PM, daveg da...@sonic.net wrote: On Thu, Jul 28, 2011 at 09:46:41AM -0400, Robert Haas wrote: On Wed, Jul 27, 2011 at 8:28 PM, daveg da...@sonic.net wrote: My client has been seeing regular instances of the following sort of problem: On what version of