On Fri, Jul 29, 2011 at 9:55 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> daveg <da...@sonic.net> writes:
>> On Thu, Jul 28, 2011 at 07:45:01PM -0400, Robert Haas wrote:
>>> Ah, OK, sorry. Well, in 9.0, VACUUM FULL is basically CLUSTER, which
>>> means that a REINDEX is happening as part of the same operation. In
>>> 9.0, there's no point in doing VACUUM FULL immediately followed by
>>> REINDEX. My guess is that this is happening either right around the
>>> time the VACUUM FULL commits or right around the time the REINDEX
>>> commits. It'd be helpful to know which, if you can figure it out.
>
>> I'll update my vacuum script to skip reindexes after vacuum full for 9.0
>> servers and see if that makes the problem go away.
>
> The thing that was bizarre about the one instance in the buildfarm was
> that the error was persistent, ie, once a session had failed all its
> subsequent attempts to access pg_class failed too. I gather from Dave's
> description that it's working that way for him too. I can think of ways
> that there might be a transient race condition against a REINDEX, but
> it's very unclear why the failure would persist across multiple
> attempts. The best idea I can come up with is that the session has
> somehow cached a wrong commit status for the reindexing transaction,
> causing it to believe that both old and new copies of the index's
> pg_class row are dead ... but how could that happen? The underlying
> state in the catalog is not wrong, because no concurrent sessions are
> upset (at least not in the buildfarm case ... Dave, do you see more than
> one session doing this at a time?).

I was thinking more along the lines of a failure while processing a
sinval message emitted by the REINDEX. The sinval message doesn't get
fully processed and therefore we get confused about what the
relfilenode is for pg_class. If that happened for any other relation,
we could recover by scanning pg_class. But if it happens for pg_class
or pg_class_oid_index, we're toast, because we can't scan them without
knowing what relfilenode to open.

Now that can't be quite right, because of course those are mapped
relations... and I'm pretty sure there are some other holes in my line
of reasoning too. Just thinking out loud...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company