On Mon, Aug 01, 2011 at 01:23:49PM -0400, Tom Lane wrote:
> daveg <da...@sonic.net> writes:
> > On Sun, Jul 31, 2011 at 11:44:39AM -0400, Tom Lane wrote:
> >> I think we need to start adding some instrumentation so we can get a
> >> better handle on what's going on in your database.  If I were to send
> >> you a source-code patch for the server that adds some more logging
> >> printout when this happens, would you be willing/able to run a patched
> >> build on your machine?
>
> > Yes we can run an instrumented server so long as the instrumentation does
> > not interfere with normal operation. However, scheduling downtime to switch
> > binaries is difficult, and generally needs to happen on a weekend, but
> > sometimes can be expedited. I'll look into that.
>
> OK, attached is a patch against 9.0 branch that will re-scan pg_class
> after a failure of this sort occurs, and log what it sees in the tuple
> header fields for each tuple for the target index.  This should give us
> some useful information.  It might be worthwhile for you to also log the
> results of
>
> select relname,pg_relation_filenode(oid) from pg_class
> where relname like 'pg_class%';
>
> in your script that does VACUUM FULL, just before and after each time it
> vacuums pg_class.  That will help in interpreting the relfilenodes in
> the log output.

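A minimal sketch of the before/after logging suggested above, assuming the
VACUUM FULL script is able to shell out to psql. The helper names (run_psql,
log_filenodes, vacuum_pg_class_with_logging) and the log line format are
hypothetical and are not taken from the actual script; only the query itself
comes from the quoted text.

```python
import subprocess
from datetime import datetime, timezone

# Query from the quoted message: maps pg_class and its indexes to relfilenodes.
FILENODE_QUERY = (
    "select relname, pg_relation_filenode(oid) from pg_class "
    "where relname like 'pg_class%';"
)


def run_psql(dbname, sql):
    """Run one statement through psql; return unaligned, tuples-only output."""
    result = subprocess.run(
        ["psql", "-X", "-A", "-t", "-d", dbname, "-c", sql],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def log_filenodes(dbname, log, label, run):
    """Write one timestamped log line per row returned by FILENODE_QUERY."""
    stamp = datetime.now(timezone.utc).isoformat()
    for line in run(dbname, FILENODE_QUERY).splitlines():
        if line:
            log.write(f"{stamp} {label} {line}\n")


def vacuum_pg_class_with_logging(dbname, log, run=run_psql):
    """Record relfilenodes just before and just after VACUUM FULL pg_class."""
    log_filenodes(dbname, log, "before", run)
    run(dbname, "VACUUM FULL pg_class;")
    log_filenodes(dbname, log, "after", run)
```

Passing the runner in as a parameter is only a convenience for trying the
wrapper without a live server; a real script would more likely call psql
directly.
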
We have installed the patch and have encountered the error as usual.
However, there is no additional output from the patch. I'm speculating
that the pg_class scan in ScanPgRelationDetailed() somehow fails to return
any tuples.

I have also been trying to trace it further by reading the code, but do not
have a solid hypothesis yet. In the absence of any debugging output I've
been trying to deduce the call tree leading to the original failure. So far
it looks like this:

RelationReloadIndexInfo(Relation)
    // Relation is 2662 and !rd_isvalid
    pg_class_tuple = ScanPgRelation(2662, indexOK=false)  // returns NULL
        pg_class_desc = heap_open(1259, ACC_SHARE)
            r = relation_open(1259, ACC_SHARE)  // locks oid, ensures RelationIsValid(r)
                r = RelationIdGetRelation(1259)
                    r = RelationIdCacheLookup(1259)  // assume success
                    if !rd_isvalid:
                        RelationClearRelation(r, true)
                            RelationInitPhysicalAddr(r)  // r is pg_class relcache

-dg

-- 
David Gould       da...@sonic.net      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.