Re: [HACKERS] Bug in point releases 9.3.6 and 9.2.10?

Peter Geoghegan Thu, 12 Mar 2015 17:43:18 -0700

I'm in a hurry right now, so unfortunately I'll need to be brief.

On Thu, Mar 12, 2015 at 5:21 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> On 2015-03-12 16:42:24 -0700, Peter Geoghegan wrote:
>> We want to create a new role when this happens, for various reasons.
>> This occurs after recovery ends, but before the database has been
>> "unfenced". The template code that generates various ALTER ROLE
>> statements in our internal provisioning system - which has apparently
>> worked just fine for a long time - is:
>
> Is this all the code that's executed after recovery? How are these
> forks brought up? Promoted how? Is it a common 'source' database?


We do PITR up to a recovery target. We're talking about the same issue
occurring on entirely distinct customer databases, with entirely
distinct major PG versions. I'm not sure what other code might have
already been run at this point, but it won't have been much. As I
said, the only common factor that I know of is all affected databases
being on the latest point release.

> Have you looked at these files? Are they indeed zero bytes when this
> error occurs?

I think that they are indeed zero. I looked at that last week, when I
didn't consider that this might be a more widespread issue. I'll check
again later.

> Do you still have a base backup from the relevant time, so you could
> repeat the whole thing?

Yes.

>> The only common factor is that this occurs on the latest point
>> releases (either 9.3.6 or 9.2.10, at least so far). In all cases I've
>> seen so far, the relation in question is the pg_auth_members heap
>> relation. For example:
>
> Any chance that the new nodes also use a different kernel version or
> such?

They may differ, but that doesn't seem likely to be relevant, at least
to me. This has happened something like 6 or 7 times already, starting
late last week. I am unfamiliar with this provisioning code, so, as I
mentioned, offhand I cannot be entirely sure that there isn't some
other code run when the problem originally arises (that I should have
included in my report). What I can tell you is that I saw the same
error messages when I manually ran the statements generated by the
above code within a transaction...until I ran "VACUUM FULL
pg_auth_members;".

> This filenode must be pg_auth_members' original one, given it's below
> FirstNormalObjectId. I get a lower value, but that's probably caused by
> having fewer collations and other data generated during initdb. That
> implies that the table hasn't ever been rewritten.
>
> What's 12811?

It's the same catalog, pg_auth_members. As I said, the messages you saw
come from entirely different customer databases, servers and (sometimes)
PG versions.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers