Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-08-09 Thread Bruce Momjian
On Wed, Aug  8, 2018 at 10:27:59AM -0700, Peter Geoghegan wrote:
> On Wed, Aug 8, 2018 at 10:23 AM, Kyle Samson  wrote:
> We found the exact same issue on a production database at TripAdvisor. We
> have no new information to add for debugging. Bumping this up to raise the
> priority of getting a patch version release out.
> 
> There is a batch of point releases that were just stamped + tagged.
> They should be released shortly.

They have been released:

https://www.postgresql.org/about/news/1878/

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-08-08 Thread Peter Geoghegan
On Wed, Aug 8, 2018 at 10:23 AM, Kyle Samson  wrote:
> We found the exact same issue on a production database at TripAdvisor. We
> have no new information to add for debugging. Bumping this up to raise the
> priority of getting a patch version release out.

There is a batch of point releases that were just stamped + tagged.
They should be released shortly.

-- 
Peter Geoghegan



Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-08-08 Thread Kyle Samson
Hello,

We found the exact same issue on a production database at TripAdvisor. We have
no new information to add for debugging. Bumping this up to raise the priority
of getting a patch version release out.

Thanks,
-Kyle Samson


Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-06-22 Thread Matheus de Oliveira
Hello again...

On Tue, Jun 19, 2018 at 12:53 PM, Andres Freund  wrote:

> ...
>
> After that, yes, deleting the
> global/pg_internal.init file is the way to go, and I can't think of a
> case where it's problematic, even without stopping the server.
>
>
Just to let you know: I applied the workaround on the affected server and it
seems to have worked; so far, no errors.

Thank you a lot for all the information and help.

Best regards,
-- 
Matheus de Oliveira


Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-06-20 Thread Andres Freund
Hi,

On 2018-06-20 15:05:59 +0300, Sergey Burladyan wrote:
> Andres Freund  writes:
> 
> > You should first make sure it's actually this problem - which tables are
> > holding back the xmin horizon?  After that, yes, deleting the
> > global/pg_internal.init file is the way to go, and I can't think of a
> > case where it's problematic, even without stopping the server.
> 
> Thanks for the clarification! I also have this problem; BTW, autovacuum does
> not work at all:
> # select max(last_autovacuum) from pg_stat_user_tables;
>   max
> ---
>  2018-06-06 00:48:47.813841+03
> 
> all its workers stopped with these messages:
> ERROR: found xmin 982973690 from before relfrozenxid 2702858737
> CONTEXT: automatic vacuum of table "avito_delta.pg_catalog.pg_authid"
> ERROR: found xmin 982973690 from before relfrozenxid 2702858761
> CONTEXT: automatic vacuum of table "avito_delta.pg_catalog.pg_auth_members"
> 
> and it does not try to vacuum other tables.

Yea, that's this bug.  I'd remove global/pg_internal.init, reconnect,
and manually vacuum.
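
For reference, a minimal sketch of that workaround (illustrative, assuming a
default data directory layout; the file removal itself happens at the OS
level, e.g. "rm $PGDATA/global/pg_internal.init", before reconnecting):

-- in a fresh session, in each database that reported the error,
-- vacuum the affected shared catalogs explicitly (as a superuser):
VACUUM VERBOSE pg_catalog.pg_authid;
VACUUM VERBOSE pg_catalog.pg_auth_members;
-- then check that autovacuum is making progress again:
SELECT max(last_autovacuum) FROM pg_stat_user_tables;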

Greetings,

Andres Freund



Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-06-19 Thread Andres Freund
Hi,

On 2018-06-19 17:05:48 -0300, Matheus de Oliveira wrote:
> > You should first make sure it's actually this problem - which tables are
> > holding back the xmin horizon?
> 
> 
> How can I be sure? The tables are `pg_authid` and `pg_auth_members`, the
> following message is logged every minute (which I believe is due to
> `autovacuum_naptime`, which is using the default of 1 minute):

Yes, that sounds like the issue. Basically I just wanted the table names:

> ERROR:  found xmin 3460221635 from before relfrozenxid 1245633870
> CONTEXT:  automatic vacuum of table "template0.pg_catalog.pg_authid"

which indeed are shared relations.
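
For reference, what the catalogs themselves currently record can be checked
with a routine query (run in the database named in the CONTEXT line, and
compared against the relfrozenxid reported in the ERROR):

SELECT oid::regclass AS relation, relfrozenxid, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relisshared AND relname IN ('pg_authid', 'pg_auth_members');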

Greetings,

Andres Freund



Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-06-19 Thread Matheus de Oliveira
On Tue, Jun 19, 2018 at 12:53 PM, Andres Freund  wrote:

> Hi,
>
> On 2018-06-19 10:26:26 -0300, Matheus de Oliveira wrote:
> > Hello friends.
> >
> > On Tue, Jun 12, 2018 at 3:31 PM, Andres Freund 
> wrote:
> >
> > >
> > > On 2018-06-11 17:39:14 -0700, Andres Freund wrote:
> > > > I plan to go over the change again tomorrow, and then push. Unless
> > > > somebody has comments before then, obviously.
> > >
> > > Done.
> > >
> > >
> > Sorry to bother you about this, but do you have any plan to do the minor
> > release earlier than planned due to this bug?
>
> Unclear at this point.  There's been discussion about it, without coming
> to a conclusion.
>
>
Ok. Thank you for the information.

I really hope you decide to release it soon; I'm very worried about the
users that have hit the bug but haven't noticed the issue.


>
> > I'm pondering what is the best option to avoid a forced shutdown of this
> > server:
> > - should I just wait for a release (if it is soon, I would be fine)?
> > - build PG from the git version by myself?
> > - or is there a safer workaround to the problem? (not clear to me if
> > deleting the `global/pg_internal.init` file is really the way to go, and
> > the details, is it safe? Should I stop the server, delete, start?)
>
> You should first make sure it's actually this problem - which tables are
> holding back the xmin horizon?


How can I be sure? The tables are `pg_authid` and `pg_auth_members`, the
following message is logged every minute (which I believe is due to
`autovacuum_naptime`, which is using the default of 1 minute):

ERROR:  found xmin 3460221635 from before relfrozenxid 1245633870
CONTEXT:  automatic vacuum of table "template0.pg_catalog.pg_authid"
ERROR:  found xmin 3460221635 from before relfrozenxid 1245633870
CONTEXT:  automatic vacuum of table
"template0.pg_catalog.pg_auth_members"

Do you need controldata or more info to validate it?


>   After that, yes, deleting the
> global/pg_internal.init file is the way to go, and I can't think of a
> case where it's problematic, even without stopping the server.
>
>
I'll try that and get back to you on whether it worked. Thank you for the
confirmation.

And thank you for all clarifications.

Best regards,
-- 
Matheus de Oliveira


Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-06-19 Thread Andres Freund
Hi,

On 2018-06-19 08:25:33 -0500, Jeremy Finzel wrote:
> I have a question related to this - and specifically, preventing the error
> until we have a patch :).

The patch has been committed to postgres, although no release has been
made including it yet.


> We are encountering this error every few weeks on one very high
> transaction db, and have to restart to fix it.

> If I read you correctly, the cache may never be invalidated for these
> catalogs even if I manually VACUUM them?  I was thinking if I routinely run
> VACUUM FREEZE on these tables in every database I might avoid the
> issue.

Correct, that won't help. In fact, it'll quite possibly make it worse.


> But given the cause of the issue, would that just make no difference and I
> will still hit the error eventually?

Yes. You might be able to get away with just removing the
pg_internal.init files before and after an explicit VACUUM on shared
system tables.
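
A sketch of what that could look like (illustrative only; the list of shared
system tables is taken from pg_class rather than hard-coded, and the
pg_internal.init removal itself happens at the OS level):

-- shared system tables an explicit VACUUM would need to cover:
SELECT oid::regclass AS relation, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relisshared AND relkind = 'r'
ORDER BY age(relfrozenxid) DESC;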

Greetings,

Andres Freund



Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-06-19 Thread Andres Freund
Hi,

On 2018-06-19 10:26:26 -0300, Matheus de Oliveira wrote:
> Hello friends.
> 
> On Tue, Jun 12, 2018 at 3:31 PM, Andres Freund  wrote:
> 
> >
> > On 2018-06-11 17:39:14 -0700, Andres Freund wrote:
> > > I plan to go over the change again tomorrow, and then push. Unless
> > > somebody has comments before then, obviously.
> >
> > Done.
> >
> >
> Sorry to bother you about this, but do you have any plan to do the minor
> release earlier than planned due to this bug?

Unclear at this point.  There's been discussion about it, without coming
to a conclusion.


> I'm pondering what is the best option to avoid a forced shutdown of this
> server:
> - should I just wait for a release (if it is soon, I would be fine)?
> - build PG from the git version by myself?
> - or is there a safer workaround to the problem? (not clear to me if
> deleting the `global/pg_internal.init` file is really the way to go, and
> the details, is it safe? Should I stop the server, delete, start?)

You should first make sure it's actually this problem - which tables are
holding back the xmin horizon?  After that, yes, deleting the
global/pg_internal.init file is the way to go, and I can't think of a
case where it's problematic, even without stopping the server.
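
One generic way to answer that question (a common diagnostic, not specific to
this bug) is to look at per-table freeze ages in each database and see which
relations are the oldest:

SELECT oid::regclass AS relation, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind IN ('r', 't', 'm')
ORDER BY age(relfrozenxid) DESC
LIMIT 20;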

Greetings,

Andres Freund



Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-06-19 Thread Jeremy Finzel
On Tue, Jun 19, 2018 at 8:26 AM Matheus de Oliveira <
matioli.math...@gmail.com> wrote:

> Hello friends.
>
> On Tue, Jun 12, 2018 at 3:31 PM, Andres Freund  wrote:
>
>>
>> On 2018-06-11 17:39:14 -0700, Andres Freund wrote:
>> > I plan to go over the change again tomorrow, and then push. Unless
>> > somebody has comments before then, obviously.
>>
>> Done.
>>
>>
> Sorry to bother you about this, but do you have any plan to do the minor
> release earlier than planned due to this bug?
>
> There seem to be too many users affected by this. And worse, many users may
> not have even noticed they have the problem, which can cause
> `age(datfrozenxid)` to keep increasing until it reaches 2.1 billion and the
> system goes down.
>
> In my case, I have a server whose `age(datfrozenxid)` is already at 1.9
> billion, and I expect it to reach 2.1 billion in about 14 days. Fortunately,
> I have a monitoring system on `age(datfrozenxid)`, which is how I found the
> issue on one of my servers.
>
> I'm pondering what is the best option to avoid a forced shutdown of this
> server:
> - should I just wait for a release (if it is soon, I would be fine)?
> - build PG from the git version by myself?
> - or is there a safer workaround to the problem? (not clear to me if
> deleting the `global/pg_internal.init` file is really the way to go, and
> the details, is it safe? Should I stop the server, delete, start?)
>
> Best regards,
> --
> Matheus de Oliveira
>
>
Restarting the database has fixed the error on these pg_catalog tables,
allowing us to vacuum them and avoid wraparound.

We first noticed that a restart fixed the issue because SAN snapshots did not
have the error; the only real difference being shared memory, nothing at the
disk level.

Jeremy


Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-06-19 Thread Matheus de Oliveira
Hello friends.

On Tue, Jun 12, 2018 at 3:31 PM, Andres Freund  wrote:

>
> On 2018-06-11 17:39:14 -0700, Andres Freund wrote:
> > I plan to go over the change again tomorrow, and then push. Unless
> > somebody has comments before then, obviously.
>
> Done.
>
>
Sorry to bother you about this, but do you have any plan to do the minor
release earlier than planned due to this bug?

There seem to be too many users affected by this. And worse, many users may not
have even noticed they have the problem, which can cause `age(datfrozenxid)` to
keep increasing until it reaches 2.1 billion and the system goes down.

In my case, I have a server whose `age(datfrozenxid)` is already at 1.9
billion, and I expect it to reach 2.1 billion in about 14 days. Fortunately, I
have a monitoring system on `age(datfrozenxid)`, which is how I found the issue
on one of my servers.
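
A sketch of that kind of monitoring query (the 2^31 - 1 ceiling and any alert
threshold are illustrative):

SELECT datname,
       age(datfrozenxid) AS xid_age,
       round(100.0 * age(datfrozenxid) / 2147483647, 1) AS pct_of_limit
FROM pg_database
ORDER BY age(datfrozenxid) DESC;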

I'm pondering what is the best option to avoid a forced shutdown of this
server:
- should I just wait for a release (if it is soon, I would be fine)?
- build PG from the git version by myself?
- or is there a safer workaround to the problem? (not clear to me if
deleting the `global/pg_internal.init` file is really the way to go, and
the details, is it safe? Should I stop the server, delete, start?)

Best regards,
-- 
Matheus de Oliveira


Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-06-19 Thread Jeremy Finzel
On Fri, May 25, 2018 at 3:37 PM, Andres Freund  wrote:

> Hi,
>
> Moving discussion to -hackers.  Tom, I think you worked most with this
> code, your input would be appreciated.
>
> Original discussion is around:
> http://archives.postgresql.org/message-id/20180524211311.tnswfnjwnii54htx%40alvherre.pgsql
>
> On 2018-05-24 17:13:11 -0400, Alvaro Herrera wrote:
> > On 2018-May-24, Andres Freund wrote:
> > > Then there's also:
> > > http://archives.postgresql.org/message-id/1527193504642.36340%40amazon.com
> >
> > ah, so deleting the relcache file makes the problem go away?  That's
> > definitely pretty strange.  I see no reason for the value in relcache to
> > become out of step with the catalogued value in the same database ... I
> > don't think we transmit in any way values of one database to another.
>
> I can reproduce the issue. As far as I can tell we just don't ever
> actually update nailed relcache entries in the normal course, leaving
> the "physical address" aside.  VACUUM will, via
> vac_update_relstats() -> heap_inplace_update() ->
> CacheInvalidateHeapTuple(),
> send out an invalidation. But the receiving side (in my case another session)
> will essentially ignore most of that due to:
>
> static void
> RelationClearRelation(Relation relation, bool rebuild)
> ...
> /*
>  * Never, never ever blow away a nailed-in system relation, because we'd
>  * be unable to recover.  However, we must redo RelationInitPhysicalAddr
>  * in case it is a mapped relation whose mapping changed.
>  *
>  * If it's a nailed-but-not-mapped index, then we need to re-read the
>  * pg_class row to see if its relfilenode changed. We do that immediately
>  * if we're inside a valid transaction and the relation is open (not
>  * counting the nailed refcount).  Otherwise just mark the entry as
>  * possibly invalid, and it'll be fixed when next opened.
>  */
> if (relation->rd_isnailed)
> {
>     RelationInitPhysicalAddr(relation);
>
>     if (relation->rd_rel->relkind == RELKIND_INDEX ||
>         relation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX)
>     {
>         relation->rd_isvalid = false;   /* needs to be revalidated */
>         if (relation->rd_refcnt > 1 && IsTransactionState())
>             RelationReloadIndexInfo(relation);
>     }
>     return;
> }
>
> Which basically means that once running we'll never update the relcache
> data for nailed entries.  That's unproblematic for most relcache fields,
> but not for things like RelationData->rd_rel->relfrozenxid / relminmxid.
>
> This'll e.g. lead to lazy_vacuum_rel() wrongly not using aggressive
> vacuums despite being required. And it'll lead, triggering this thread,
> to wrong errors being raised during vacuum because relfrozenxid just is
> some random value from the past.  I suspect this might also be
> co-responsible for a bunch of planning issues for queries involving the
> catalog, because the planner will use wrong relcache data until the next
> time the init file is thrown away?
>
> This looks like a very longstanding bug to me.  I'm not yet quite sure
> what the best way to deal with this is.  I suspect we might get away
> with just looking up a new version of the pg_class tuple and copying
> rd_rel over?
>
> Greetings,
>
> Andres Freund
>

I have a question related to this - and specifically, preventing the error
until we have a patch :).  We are encountering this error every few weeks
on one very high transaction db, and have to restart to fix it.

If I read you correctly, the cache may never be invalidated for these
catalogs even if I manually VACUUM them?  I was thinking if I routinely run
VACUUM FREEZE on these tables in every database I might avoid the issue.
But given the cause of the issue, would that just make no difference and I
will still hit the error eventually?

Thanks,
Jeremy


Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-06-12 Thread Andres Freund
Hi,

On 2018-06-11 17:39:14 -0700, Andres Freund wrote:
> I plan to go over the change again tomorrow, and then push. Unless
> somebody has comments before then, obviously.

Done.

- Andres



Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-06-11 Thread Andres Freund
Hi,

On 2018-05-28 12:52:06 -0700, Andres Freund wrote:
> Hi,
> 
> On 2018-05-27 13:00:06 -0700, Andres Freund wrote:
> > I've a patch that seems to work, that mostly needs some comment
> > polishing.
> 
> Attached is what I currently have. Still needs some more work, but I
> think it's more than good enough to review the approach.  Basically the
> approach consists out of two changes:
> 
> 1) Send init file removals for shared nailed relations as well.
> 
>This fixes that the shared init file contains arbitrarily outdated
>information for relfrozenxid etc. Leading to e.g. the pg_authid
>errors we've seen in some recent threads.  Only applies to
>new connections.
> 
> 2) Reread RelationData->rd_rel for nailed relations when invalidated.
> 
>This ensures that already built relcache entries for nailed relations
>are updated. Currently they never are.  This currently doesn't cause
>*that* frequently an issue for !shared entries, because for those the
>init file gets zapped regularly, and autovacuum workers usually don't
>live that long.  But it's still a significant correctness issue for
>both shared and non-shared relations.

Here's a more polished v2 version of the patch.  I, locally, did the
work to backpatch the change. Besides trivialities there's two
nontrivial changes:

- In earlier versions there's no global invalidations. I've inquired and
  complained about what exactly they're for in
  
http://archives.postgresql.org/message-id/20180611231634.w2rgtlzxaw4loefk%40alap3.anarazel.de

  In earlier branches we just don't do the global thing. That seems
  unproblematic to me.

- The bigger issue is that in 9.3, and just there
  
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=8de3e410faa06ab20ec1aa6d0abb0a2c040261ba
  does not yet exist.

  That means back then we performed reloads even outside a
  transaction. I don't feel confident about invoking additional catalog
  reloads in the new situations, so I kept the IsTransactionState()
  checks in RelationReloadNailed().  That seems less risky, but possibly
  somebody wants to argue the other way round?

There's some minor other conflicts, but they're all pretty obvious.

I plan to go over the change again tomorrow, and then push. Unless
somebody has comments before then, obviously.


> FWIW, I wonder if this isn't critical enough to make us consider having
> a point release earlier..

Still think this is something we should seriously consider.

- Andres
From d4fc41f841951e74e298811ad5ad9b872d14d607 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 28 May 2018 12:38:22 -0700
Subject: [PATCH v2] Fix bugs in vacuum of shared rels, by keeping their
 relcache entries current.

When vacuum processes a relation it uses the corresponding relcache
entry's relfrozenxid / relminmxid as a cutoff for when to remove
tuples etc. Unfortunately for nailed relations (i.e. critical system
catalogs) bugs could frequently lead to the corresponding relcache
entry being stale.

This set of bugs could cause actual data corruption as vacuum would
potentially not remove the correct row versions, potentially reviving
them at a later point.  After 699bf7d05c some corruptions in this vein
were prevented, but the additional error checks could also trigger
spuriously. Examples of such errors are:
  ERROR: found xmin ... from before relfrozenxid ...
and
  ERROR: found multixact ... from before relminmxid ...
To be caused by this bug the errors have to occur on system catalog
tables.

The two bugs are:

1) Invalidations for nailed relations were ignored, based on the
   theory that the relcache entry for such tables doesn't
   change. Which is largely true, except for fields like relfrozenxid
   etc.  This means that changes to relations vacuumed in other
   sessions weren't picked up by already existing sessions.  Luckily
   autovacuum doesn't have particularly longrunning sessions.

2) For shared *and* nailed relations, the shared relcache init file
   was never invalidated while running.  That means that for such
   tables (e.g. pg_authid, pg_database) it's not just already existing
   sessions that are affected, but even new connections are as well.
   That explains why the reports usually were about pg_authid et. al.

To fix 1), revalidate the rd_rel portion of a relcache entry when
invalid. This implies a bit of extra complexity to deal with
bootstrapping, but it's not too bad.  The fix for 2) is simpler,
simply always remove both the shared and local init files.

Author: Andres Freund
Reviewed-By: Alvaro Herrera
Discussion:
https://postgr.es/m/20180525203736.crkbg36muzxrj...@alap3.anarazel.de
https://postgr.es/m/CAMa1XUhKSJd98JW4o9StWPrfS=11bpgg+_gdmxe25tvuy4s...@mail.gmail.com
https://postgr.es/m/cakmfjucqbuodrfxpdx39wha3vjyxwerg_zdvxzncr6+5wog...@mail.gmail.com
https://postgr.es/m/cagewt-ujgpmlq09gxcufmzazsgjc98vxhefbf-tppb0fb13...@mail.gmail.com
Backpatch: 9.3-
---
 src/backend/utils/cache/inval.c

Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-06-11 Thread Andres Freund
Hi,

On 2018-05-29 19:14:51 -0400, Alvaro Herrera wrote:
> I added an Assert(DatabasePath != NULL) to
> RelationCacheInitFilePreInvalidate() and didn't see a single crash when
> running the tests.  I thought that adding a "VACUUM FREEZE pg_class"
> in the recovery tests (where there is a standby) ought to do it, but it
> doesn't.  What's the deal there?
> 
> Here are some proposed changes.  Some of these comment edits are WIP :-)

I'm a bit confused by these changes - there seem to be some that look
like a borked diff?  And a number of others look pretty unrelated?

> -- 
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

> diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
> index 0d6100fb08..8a8620943f 100644
> --- a/src/backend/utils/cache/inval.c
> +++ b/src/backend/utils/cache/inval.c
> @@ -872,6 +872,8 @@ ProcessCommittedInvalidationMessages(SharedInvalidationMessage *msgs,
>                                       int nmsgs, bool RelcacheInitFileInval,
>                                       Oid dbid, Oid tsid)
>  {
> +    Assert(InRecovery);
> +

Idk, seems unrelated.

>   if (nmsgs <= 0)
>   return;
>  
> @@ -884,12 +886,13 @@ ProcessCommittedInvalidationMessages(SharedInvalidationMessage *msgs,
>               dbid);
>  
>      /*
> -     * RelationCacheInitFilePreInvalidate, when the invalidation message
> -     * is for a specific database, requires DatabasePath to be set, but we
> -     * should not use SetDatabasePath during recovery, since it is
> -     * intended to be used only once by normal backends.  Hence, a quick
> -     * hack: set DatabasePath directly then unset after use.
> +     * When the invalidation message is for a specific database,
> +     * RelationCacheInitFilePreInvalidate requires DatabasePath to be set,
> +     * but we're not allowed to use SetDatabasePath during recovery (since
> +     * it is intended to be used by normal backends).  Hence, a quick hack:
> +     * set DatabasePath directly then unset after use.
>      */
> +    Assert(!DatabasePath);  /* don't clobber an existing value */
>      if (OidIsValid(dbid))
>          DatabasePath = GetDatabasePath(dbid, tsid);

Same.


> diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
> index aa4427724d..71b2212cbd 100644
> --- a/src/backend/utils/cache/relcache.c
> +++ b/src/backend/utils/cache/relcache.c
> @@ -1934,6 +1934,11 @@ RelationIdGetRelation(Oid relationId)
>              RelationClearRelation(rd, true);
>  
>              /*
> +             * Normal entries are valid by now -- except nailed ones when
> +             * loaded before relcache initialization.  There isn't enough
> +             * infrastructure yet to do pg_class lookups, but we need their
> +             * rd_rel entries to be updated, so we let these through.
> +             */
>              * Normally entries need to be valid here, but before the relcache
>              * has been initialized, not enough infrastructure exists to
>              * perform pg_class lookups. The structure of such entries doesn't
> @@ -2346,8 +2351,7 @@ RelationClearRelation(Relation relation, bool rebuild)
>   RelationCloseSmgr(relation);

This sure looks like it's a syntax error? So I guess you might not have
staged the removal parts of the diff?


>   /*
> -  * Treat nailed-in system relations separately, they always need to be
> -  * accessible, so we can't blow them away.
> +  * We cannot blow away nailed-in relations, so treat them especially.
>*/

The former seems just as accurate, and is basically just the already
existing comment?


>   if (relation->rd_isnailed)
>   {
> @@ -5942,7 +5946,8 @@ write_relcache_init_file(bool shared)
>* wrote out was up-to-date.)
>*
>* This mustn't run concurrently with the code that unlinks an init file
> -  * and sends SI messages, so grab a serialization lock for the duration.
> +  * and sends SI messages (see RelationCacheInitFilePreInvalidate), so grab
> +  * a serialization lock for the duration.
>*/
>   LWLockAcquire(RelCacheInitLock, LW_EXCLUSIVE);

Unrelated?


> @@ -6061,6 +6066,10 @@ RelationHasUnloggedIndex(Relation rel)
>   * changed one or more of the relation cache entries that are kept in the
>   * local init file.
>   *
> + * When DatabasePath is set, both the init file for that database and the
> + * shared (global) init files are to be removed; otherwise only the latter is.
> + * This is useful during recovery (XXX really?)

Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-29 Thread Alvaro Herrera
On 2018-May-29, Alvaro Herrera wrote:

> I added an Assert(DatabasePath != NULL) to
> RelationCacheInitFilePreInvalidate() and didn't see a single crash when
> running the tests.  I thought that adding a "VACUUM FREEZE pg_class"
> in the recovery tests (where there is a standby) ought to do it, but it
> doesn't.  What's the deal there?

Sorry, that was dumb -- the assert obviously is hit if I remove the code
to set DatabasePath beforehand :-)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-29 Thread Alvaro Herrera
I added an Assert(DatabasePath != NULL) to
RelationCacheInitFilePreInvalidate() and didn't see a single crash when
running the tests.  I thought that adding a "VACUUM FREEZE pg_class"
in the recovery tests (where there is a standby) ought to do it, but it
doesn't.  What's the deal there?

Here are some proposed changes.  Some of these comment edits are WIP :-)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index 0d6100fb08..8a8620943f 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -872,6 +872,8 @@ ProcessCommittedInvalidationMessages(SharedInvalidationMessage *msgs,
                                      int nmsgs, bool RelcacheInitFileInval,
                                      Oid dbid, Oid tsid)
 {
+    Assert(InRecovery);
+
     if (nmsgs <= 0)
         return;
 
@@ -884,12 +886,13 @@ ProcessCommittedInvalidationMessages(SharedInvalidationMessage *msgs,
              dbid);
 
         /*
-         * RelationCacheInitFilePreInvalidate, when the invalidation message
-         * is for a specific database, requires DatabasePath to be set, but we
-         * should not use SetDatabasePath during recovery, since it is
-         * intended to be used only once by normal backends.  Hence, a quick
-         * hack: set DatabasePath directly then unset after use.
+         * When the invalidation message is for a specific database,
+         * RelationCacheInitFilePreInvalidate requires DatabasePath to be set,
+         * but we're not allowed to use SetDatabasePath during recovery (since
+         * it is intended to be used by normal backends).  Hence, a quick hack:
+         * set DatabasePath directly then unset after use.
         */
+        Assert(!DatabasePath);  /* don't clobber an existing value */
        if (OidIsValid(dbid))
            DatabasePath = GetDatabasePath(dbid, tsid);
 
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index aa4427724d..71b2212cbd 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1934,6 +1934,11 @@ RelationIdGetRelation(Oid relationId)
             RelationClearRelation(rd, true);
 
             /*
+             * Normal entries are valid by now -- except nailed ones when
+             * loaded before relcache initialization.  There isn't enough
+             * infrastructure yet to do pg_class lookups, but we need their
+             * rd_rel entries to be updated, so we let these through.
+             */
             * Normally entries need to be valid here, but before the relcache
             * has been initialized, not enough infrastructure exists to
             * perform pg_class lookups. The structure of such entries doesn't
@@ -2346,8 +2351,7 @@ RelationClearRelation(Relation relation, bool rebuild)
    RelationCloseSmgr(relation);
 
    /*
-    * Treat nailed-in system relations separately, they always need to be
-    * accessible, so we can't blow them away.
+    * We cannot blow away nailed-in relations, so treat them especially.
    */
    if (relation->rd_isnailed)
    {
@@ -5942,7 +5946,8 @@ write_relcache_init_file(bool shared)
    * wrote out was up-to-date.)
    *
    * This mustn't run concurrently with the code that unlinks an init file
-   * and sends SI messages, so grab a serialization lock for the duration.
+   * and sends SI messages (see RelationCacheInitFilePreInvalidate), so grab
+   * a serialization lock for the duration.
    */
    LWLockAcquire(RelCacheInitLock, LW_EXCLUSIVE);
 
@@ -6061,6 +6066,10 @@ RelationHasUnloggedIndex(Relation rel)
  * changed one or more of the relation cache entries that are kept in the
  * local init file.
  *
+ * When DatabasePath is set, both the init file for that database and the
+ * shared (global) init files are to be removed; otherwise only the latter is.
+ * This is useful during recovery (XXX really?)
+ *
  * To be safe against concurrent inspection or rewriting of the init file,
  * we must take RelCacheInitLock, then remove the old init file, then send
  * the SI messages that include relcache inval for such relations, and then
@@ -6180,9 +6189,9 @@ unlink_initfile(const char *initfilename, int elevel)
 {
if (unlink(initfilename) < 0)
{
-   /* It might not be there, but log any error other than ENOENT */
+   /* It might not be there, but report any error other than 

Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-29 Thread Andres Freund
On 2018-05-29 18:06:12 +, Nishant, Fnu wrote:
> Hi,
> 
> > To achieve this we can allocate Form_pg_class structures (for shared
> > relations… a small number) on shared memory.
> 
> But why would this be necessary / a good idea? Even if we decided it
> were, it seems like it'd end up being quite invasive.  But I doubt it's
> a good plan, because relcache entries want / need to be updated
> differently in the transaction that does the changes (as it needs to see
> the effect of catalog changes before commit) than other sessions (which
> only should see them after commit).
> 
> It will be a good idea as, we can avoid maintaining file
> (creation/deletion/updation) for period of engine running and we do
> not need to invalidate other backend cache.

a) That's a major change. Shouldn't be considered for a bugfix.
b) This is going to have locking issues / lock contention.


> > Why is this good idea? 
> Lets take an example (assuming we have 1000 postgres backends running).
> With shared memory scheme-
>   Operation wise, for a transaction, we allocate/free once (private 
> memory allocation) and memcpy data to and fro (from shared to private and 
> back to shared)...
>   Overall memory footprint 1 shared copy and 1 private only when updating.
>   No file creation/deletion/updation.

I don't buy that nailed relations are a meaningful part of that
problem. They hardly ever change.  And a shared cache is a much bigger
issue.

Greetings,

Andres Freund



Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-29 Thread Nishant, Fnu
Hi,

> > To achieve this we can allocate Form_pg_class structures (for shared
> > relations… a small number) on shared memory.
>
> But why would this be necessary / a good idea? Even if we decided it
> were, it seems like it'd end up being quite invasive.  But I doubt it's
> a good plan, because relcache entries want / need to be updated
> differently in the transaction that does the changes (as it needs to see
> the effect of catalog changes before commit) than other sessions (which
> only should see them after commit).

It will be a good idea because we can avoid maintaining the init file
(creation/deletion/update) while the server is running, and we do not need to
invalidate other backends' caches.
For a transaction which changes the catalog, we can allocate a new private
Form_pg_class structure, do operations on it, and update the shared structure
post-commit.
The routine would roughly be:
1) A backend has a relcache entry for a shared rel where the rd_rel part points
to a shared memory structure.
Rel->rd_rel stores a shared memory pointer.
2) It wants to update the entry: allocate a new private structure, memcpy the
shared content into the new memory, and point rd_rel at the private memory.
Rel->rd_rel stores a pointer to a process-specific structure.
3) The transaction commits, and we copy the private memory content back to the
shared memory area and point rd_rel at shared memory again.
4) Free the private memory.
5) Other backends do not do any invalidation but still see the latest updated
values.

Why is this a good idea?
Let's take an example (assuming we have 1000 postgres backends running).
With the shared memory scheme:
    Operation-wise, for a transaction, we allocate/free once (private
memory allocation) and memcpy data to and fro (from shared to private and back
to shared)...
    Overall memory footprint: 1 shared copy, plus 1 private copy only while
updating.
    No file creation/deletion/update.
With the current file scheme:
    Operation-wise, for a transaction, we use the private cache but we need to
invalidate 1000 other caches (which will be at least 1000 memcpys and
allocate/frees) and may involve reading back in a page of pg_class.
    Overall memory footprint: 1000 private copies.
    We have to create/delete/update the init file and synchronize around it.

Having said that, we may not need to worry about a transaction for updating all
values... (I think relfrozenxid can be updated by a CAS operation; still
thinking about it).

-Nishant




Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-28 Thread Andres Freund
Hi,

On 2018-05-27 13:00:06 -0700, Andres Freund wrote:
> I've a patch that seems to work, that mostly needs some comment
> polishing.

Attached is what I currently have. Still needs some more work, but I
think it's more than good enough to review the approach.  Basically the
approach consists out of two changes:

1) Send init file removals for shared nailed relations as well.

   This fixes that the shared init file contains arbitrarily outdated
   information for relfrozenxid etc. Leading to e.g. the pg_authid
   errors we've seen in some recent threads.  Only applies to
   new connections.

2) Reread RelationData->rd_rel for nailed relations when invalidated.

   This ensures that already built relcache entries for nailed relations
   are updated. Currently they never are.  This currently doesn't cause
   *that* frequently an issue for !shared entries, because for those the
   init file gets zapped regularly, and autovacuum workers usually don't
   live that long.  But it's still a significant correctness issue for
   both shared and non-shared relations.

FWIW, I wonder if this isn't critical enough to make us consider having
a point release earlier..

Greetings,

Andres Freund
From c7121eec1e76b6f9b2cfa5d7d7f5f4b7a4f8be96 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 28 May 2018 12:38:22 -0700
Subject: [PATCH] WIP: Ensure relcache entries for nailed relations are
 accurate.

Author: Andres Freund
Reviewed-By:
Discussion:
https://postgr.es/m/20180525203736.crkbg36muzxrj...@alap3.anarazel.de
https://postgr.es/m/CAMa1XUhKSJd98JW4o9StWPrfS=11bpgg+_gdmxe25tvuy4s...@mail.gmail.com
https://postgr.es/m/cakmfjucqbuodrfxpdx39wha3vjyxwerg_zdvxzncr6+5wog...@mail.gmail.com
https://postgr.es/m/cagewt-ujgpmlq09gxcufmzazsgjc98vxhefbf-tppb0fb13...@mail.gmail.com
Backpatch:
---
 src/backend/utils/cache/inval.c|  31 +++--
 src/backend/utils/cache/relcache.c | 198 -
 src/include/storage/standbydefs.h  |   2 +-
 3 files changed, 156 insertions(+), 75 deletions(-)

diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index 955f5c7fdc5..0d6100fb082 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -521,12 +521,11 @@ RegisterRelcacheInvalidation(Oid dbId, Oid relId)
 	(void) GetCurrentCommandId(true);
 
 	/*
-	 * If the relation being invalidated is one of those cached in the local
-	 * relcache init file, mark that we need to zap that file at commit. Same
-	 * is true when we are invalidating whole relcache.
+	 * If the relation being invalidated is one of those cached in a relcache
+	 * init file, mark that we need to zap that file at commit. Same is true
+	 * when we are invalidating whole relcache.
 	 */
-	if (OidIsValid(dbId) &&
-		(RelationIdIsInInitFile(relId) || relId == InvalidOid))
+	if (relId == InvalidOid || RelationIdIsInInitFile(relId))
 		transInvalInfo->RelcacheInitFileInval = true;
 }
 
@@ -881,18 +880,26 @@ ProcessCommittedInvalidationMessages(SharedInvalidationMessage *msgs,
 
 	if (RelcacheInitFileInval)
 	{
+		elog(trace_recovery(DEBUG4), "removing relcache init files for database %u",
+			 dbid);
+
 		/*
-		 * RelationCacheInitFilePreInvalidate requires DatabasePath to be set,
-		 * but we should not use SetDatabasePath during recovery, since it is
+		 * RelationCacheInitFilePreInvalidate, when the invalidation message
+		 * is for a specific database, requires DatabasePath to be set, but we
+		 * should not use SetDatabasePath during recovery, since it is
 		 * intended to be used only once by normal backends.  Hence, a quick
 		 * hack: set DatabasePath directly then unset after use.
 		 */
-		DatabasePath = GetDatabasePath(dbid, tsid);
-		elog(trace_recovery(DEBUG4), "removing relcache init file in \"%s\"",
-			 DatabasePath);
+		if (OidIsValid(dbid))
+			DatabasePath = GetDatabasePath(dbid, tsid);
+
 		RelationCacheInitFilePreInvalidate();
-		pfree(DatabasePath);
-		DatabasePath = NULL;
+
+		if (OidIsValid(dbid))
+		{
+			pfree(DatabasePath);
+			DatabasePath = NULL;
+		}
 	}
 
 	SendSharedInvalidMessages(msgs, nmsgs);
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 3dfb1b8fbe2..aa4427724d5 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -250,6 +250,7 @@ static void RelationDestroyRelation(Relation relation, bool remember_tupdesc);
 static void RelationClearRelation(Relation relation, bool rebuild);
 
 static void RelationReloadIndexInfo(Relation relation);
+static void RelationReloadNailed(Relation relation);
 static void RelationFlushRelation(Relation relation);
 static void RememberToFreeTupleDescAtEOX(TupleDesc td);
 static void AtEOXact_cleanup(Relation relation, bool isCommit);
@@ -286,7 +287,7 @@ static void IndexSupportInitialize(oidvector *indclass,
 static OpClassCacheEnt *LookupOpclassInfo(Oid operatorClassOid,
   StrategyNumber numSupport);
 static void 

Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-28 Thread Nishant, Fnu
Hi,
We were working on this issue and thinking about whether we could actually make
the pg_class (rd_rel) part of the relcache entry updatable.
To achieve this we can allocate Form_pg_class structures (for shared relations… 
a small number) on shared memory.
We would not need the global pg_internal.init file, as a new backend during
startup would be set to point at the already-stored Form_pg_class structure.

Thanks,
Nishant

On 5/27/18, 1:01 PM, "Andres Freund"  wrote:

Hi,

On 2018-05-27 13:22:21 -0400, Tom Lane wrote:
> But I don't think there's any need for special magic here: we just
> have to accept the fact that there's a need to flush that cache
> sometimes.  In normal use it shouldn't happen often enough to be a
> performance problem.

Yea, it's not that problematic. We already remove the local init
file. I started out trying to write a version of invalidation that also
WAL logs shared inval, but that turns out to be hard to do without
breaking compatibilty.  So I think we should just always unlink the
shared one - that seems to work well.


> FWIW, I'm not on board with "memcpy the whole row".  I think the right
> thing is more like what we do in RelationReloadIndexInfo, ie copy over
> the specific fields that we expect to be mutable.

But that's what RelationReloadIndexInfo() etc do?
relp = (Form_pg_class) GETSTRUCT(pg_class_tuple);
memcpy(relation->rd_rel, relp, CLASS_TUPLE_SIZE);
I don't think we need to modify anything outside of rd_rel at this
point?

I've a patch that seems to work, that mostly needs some comment
polishing.

Greetings,

Andres Freund





Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-27 Thread Andres Freund
Hi,

On 2018-05-27 13:22:21 -0400, Tom Lane wrote:
> But I don't think there's any need for special magic here: we just
> have to accept the fact that there's a need to flush that cache
> sometimes.  In normal use it shouldn't happen often enough to be a
> performance problem.

Yea, it's not that problematic. We already remove the local init
file. I started out trying to write a version of invalidation that also
WAL logs shared inval, but that turns out to be hard to do without
breaking compatibilty.  So I think we should just always unlink the
shared one - that seems to work well.


> FWIW, I'm not on board with "memcpy the whole row".  I think the right
> thing is more like what we do in RelationReloadIndexInfo, ie copy over
> the specific fields that we expect to be mutable.

But that's what RelationReloadIndexInfo() etc do?
relp = (Form_pg_class) GETSTRUCT(pg_class_tuple);
memcpy(relation->rd_rel, relp, CLASS_TUPLE_SIZE);
I don't think we need to modify anything outside of rd_rel at this
point?

I've a patch that seems to work, that mostly needs some comment
polishing.

Greetings,

Andres Freund



Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-27 Thread Tom Lane
Andres Freund  writes:
> On May 27, 2018 9:39:49 AM PDT, "Nasby, Jim"  wrote:
>> How about only keeping the critical info for being able to find
>> relations in the .init files, and then fully populate the cache by
>> doing a normal lookup?

> Then the cache wouldn't have any benefits, no? It's been a while, but last 
> time I checked it does make quite a measurable performance difference in a 
> new backend.

Yeah, we don't want to lose the performance benefit.  But I don't think
there's any need for special magic here: we just have to accept the fact
that there's a need to flush that cache sometimes.  In normal use it
shouldn't happen often enough to be a performance problem.

FWIW, I'm not on board with "memcpy the whole row".  I think the right
thing is more like what we do in RelationReloadIndexInfo, ie copy over
the specific fields that we expect to be mutable.

regards, tom lane



Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-27 Thread Andres Freund


On May 27, 2018 9:39:49 AM PDT, "Nasby, Jim"  wrote:
>On May 26, 2018, at 1:45 PM, Andres Freund  wrote:
>> Does anybody see a way to not have to remove the .init file?
>
>How about only keeping the critical info for being able to find
>relations in the .init files, and then fully populate the cache by
>doing a normal lookup? Since there’s multiple entries needed to
>bootstrap things I guess there’d need to be a flag in relcache to
>indicate that an existing entry wasn’t fully formed.

Then the cache wouldn't have any benefits, no? It's been a while, but last time 
I checked it does make quite a measurable performance difference in a new 
backend.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-27 Thread Nasby, Jim
On May 26, 2018, at 1:45 PM, Andres Freund  wrote:
> Does anybody see a way to not have to remove the .init file?

How about only keeping the critical info for being able to find relations in 
the .init files, and then fully populate the cache by doing a normal lookup? 
Since there’s multiple entries needed to bootstrap things I guess there’d need 
to be a flag in relcache to indicate that an existing entry wasn’t fully formed.

Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-26 Thread Andres Freund
On 2018-05-26 13:45:06 -0700, Andres Freund wrote:
> On 2018-05-25 15:05:31 -0700, Andres Freund wrote:
> > On 2018-05-25 17:47:37 -0400, Tom Lane wrote:
> > > For nailed indexes, we allow updating of some additional fields, and I
> > > guess what has to happen here is that we teach the code to update some
> > > additional fields for nailed tables too.
> > 
> > Yea, it seems like we could just get a new version of the pg_class tuple
> > if in the right state, and memcpy() it into place. Not sure if there's
> > any other issues...
> 
> That part isn't too hard. I've a patch that appears to address the
> issue, and isn't *too* ugly.
> 
> We don't really have a way to force .init file removal / update for
> shared relations however. Otherwise we'll just continue to read old data
> from .init files at startup. And there'll commonly not be any
> outstanding invalidation.  Thus it appears to me that we need to extend
> RelcacheInitFileInval to also support the shared file.  That's WAL
> logged, but it looks like we can just add flag like
> XACT_COMPLETION_UPDATE_RELCACHE_FILE without breaking the WAL format.
> 
> Does anybody see a way to not have to remove the .init file?

Just to be clear: we already remove the non-shared relcache init file
when a non-shared table in it is changed, which I presume is the reason
this issue hasn't bitten us in a much bigger way. While the lack of
proper invalidations means that already running sessions will see the
wrong values and make wrong decisions, the fact that the non-shared file
will regularly be removed has reduced the impact quite a bit.

Greetings,

Andres Freund



Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-26 Thread Andres Freund
On 2018-05-25 15:05:31 -0700, Andres Freund wrote:
> On 2018-05-25 17:47:37 -0400, Tom Lane wrote:
> > For nailed indexes, we allow updating of some additional fields, and I
> > guess what has to happen here is that we teach the code to update some
> > additional fields for nailed tables too.
> 
> Yea, it seems like we could just get a new version of the pg_class tuple
> if in the right state, and memcpy() it into place. Not sure if there's
> any other issues...

That part isn't too hard. I've a patch that appears to address the
issue, and isn't *too* ugly.

We don't really have a way to force .init file removal / update for
shared relations however. Otherwise we'll just continue to read old data
from .init files at startup. And there'll commonly not be any
outstanding invalidation.  Thus it appears to me that we need to extend
RelcacheInitFileInval to also support the shared file.  That's WAL
logged, but it looks like we can just add flag like
XACT_COMPLETION_UPDATE_RELCACHE_FILE without breaking the WAL format.

Does anybody see a way to not have to remove the .init file?

Greetings,

Andres Freund



Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-25 Thread Andres Freund
On 2018-05-25 17:47:37 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Moving discussion to -hackers.  Tom, I think you worked most with this
> > code, your input would be appreciated.
> 
> Yeah, the assumption in the relcache is that the only part of a nailed
> catalog's relcache entry that really needs to be updated intrasession is
> the relfilenode mapping.

Paging through the changes to relcache.c and vacuum[lazy].c it looks to
me like that hasn't been true in a long time, right?


> For nailed indexes, we allow updating of some additional fields, and I
> guess what has to happen here is that we teach the code to update some
> additional fields for nailed tables too.

Yea, it seems like we could just get a new version of the pg_class tuple
if in the right state, and memcpy() it into place. Not sure if there's
any other issues...


BTW, and I guess this mostly goes to Alvaro, I don't understand why that
code accepts relation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX?
That seems like something we'll hopefully never support.

Greetings,

Andres Freund



Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-25 Thread Tom Lane
Andres Freund  writes:
> Moving discussion to -hackers.  Tom, I think you worked most with this
> code, your input would be appreciated.

Yeah, the assumption in the relcache is that the only part of a nailed
catalog's relcache entry that really needs to be updated intrasession is
the relfilenode mapping.  For nailed indexes, we allow updating of some
additional fields, and I guess what has to happen here is that we teach
the code to update some additional fields for nailed tables too.

regards, tom lane



Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-25 Thread Andres Freund
Hi,

Moving discussion to -hackers.  Tom, I think you worked most with this
code, your input would be appreciated.

Original discussion is around:
http://archives.postgresql.org/message-id/20180524211311.tnswfnjwnii54htx%40alvherre.pgsql

On 2018-05-24 17:13:11 -0400, Alvaro Herrera wrote:
> On 2018-May-24, Andres Freund wrote:
> > Then there's also:
> > http://archives.postgresql.org/message-id/1527193504642.36340%40amazon.com
> 
> ah, so deleting the relcache file makes the problem go away?  That's
> definitely pretty strange.  I see no reason for the value in relcache to
> become out of step with the catalogued value in the same database ... I
> don't think we transmit in any way values of one database to another.

I can reproduce the issue. As far as I can tell we just don't ever
actually update nailed relcache entries in the normal course, leaving
the "physical address" aside.  VACUUM will, via
vac_update_relstats() -> heap_inplace_update() -> CacheInvalidateHeapTuple(),
send out an invalidation. But the receiving side (in my case another session)
will essentially ignore most of that due to:

static void
RelationClearRelation(Relation relation, bool rebuild)
...
    /*
     * Never, never ever blow away a nailed-in system relation, because we'd
     * be unable to recover.  However, we must redo RelationInitPhysicalAddr
     * in case it is a mapped relation whose mapping changed.
     *
     * If it's a nailed-but-not-mapped index, then we need to re-read the
     * pg_class row to see if its relfilenode changed. We do that immediately
     * if we're inside a valid transaction and the relation is open (not
     * counting the nailed refcount).  Otherwise just mark the entry as
     * possibly invalid, and it'll be fixed when next opened.
     */
    if (relation->rd_isnailed)
    {
        RelationInitPhysicalAddr(relation);

        if (relation->rd_rel->relkind == RELKIND_INDEX ||
            relation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX)
        {
            relation->rd_isvalid = false;   /* needs to be revalidated */
            if (relation->rd_refcnt > 1 && IsTransactionState())
                RelationReloadIndexInfo(relation);
        }
        return;
    }

Which basically means that once running we'll never update the relcache
data for nailed entries.  That's unproblematic for most relcache fields,
but not for things like RelationData->rd_rel->relfrozenxid / relminmxid.

This'll e.g. lead to lazy_vacuum_rel() wrongly not using aggressive
vacuums despite being required. And it'll lead, triggering this thread,
to wrong errors being raised during vacuum because relfrozenxid just is
some random value from the past.  I suspect this might also be
co-responsible for a bunch of planning issues for queries involving the
catalog, because the planner will use wrong relcache data until the next
time the init file is thrown away?

This looks like a very longstanding bug to me.  I'm not yet quite sure
what the best way to deal with this is.  I suspect we might get away
with just looking up a new version of the pg_class tuple and copying
rd_rel over?

Greetings,

Andres Freund