On 2021-Apr-08, Tom Lane wrote:
> Alvaro Herrera <[email protected]> writes:
> > autovacuum: handle analyze for partitioned tables
>
> Looks like this has issues under EXEC_BACKEND:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2021-04-08%2005%3A50%3A08
Hmm, I couldn't reproduce this under EXEC_BACKEND or otherwise. I think
it is unrelated to EXEC_BACKEND and is rather a race condition.
The backtrace saved by buildfarm is:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 relation_needs_vacanalyze (relid=relid@entry=43057,
relopts=relopts@entry=0x0, classForm=classForm@entry=0x7e000501eef0,
tabentry=0x5611ec71b030,
effective_multixact_freeze_max_age=effective_multixact_freeze_max_age@entry=400000000,
dovacuum=dovacuum@entry=0x7ffd78cc4ee0, doanalyze=0x7ffd78cc4ee1,
wraparound=0x7ffd78cc4ee2) at
/mnt/resource/andres/bf/culicidae/HEAD/pgsql.build/../pgsql/src/backend/postmaster/autovacuum.c:3237
3237			childclass = (Form_pg_class) GETSTRUCT(childtuple);
#0 relation_needs_vacanalyze (relid=relid@entry=43057,
relopts=relopts@entry=0x0, classForm=classForm@entry=0x7e000501eef0,
tabentry=0x5611ec71b030,
effective_multixact_freeze_max_age=effective_multixact_freeze_max_age@entry=400000000,
dovacuum=dovacuum@entry=0x7ffd78cc4ee0, doanalyze=0x7ffd78cc4ee1,
wraparound=0x7ffd78cc4ee2) at
/mnt/resource/andres/bf/culicidae/HEAD/pgsql.build/../pgsql/src/backend/postmaster/autovacuum.c:3237
#1 0x00005611eb09fc91 in do_autovacuum () at
/mnt/resource/andres/bf/culicidae/HEAD/pgsql.build/../pgsql/src/backend/postmaster/autovacuum.c:2168
#2 0x00005611eb0a0f8b in AutoVacWorkerMain (argc=argc@entry=1,
argv=argv@entry=0x5611ec61f1e0) at
/mnt/resource/andres/bf/culicidae/HEAD/pgsql.build/../pgsql/src/backend/postmaster/autovacuum.c:1715
The code in question is:

	children = find_all_inheritors(relid, AccessShareLock, NULL);

	foreach(lc, children)
	{
		Oid			childOID = lfirst_oid(lc);
		HeapTuple	childtuple;
		Form_pg_class childclass;

		childtuple = SearchSysCache1(RELOID, ObjectIdGetDatum(childOID));
		childclass = (Form_pg_class) GETSTRUCT(childtuple);
Evidently SearchSysCache1 must be returning NULL, so that GETSTRUCT
dereferences a null pointer. But how can that happen, when we have
acquired a lock on the rel during find_all_inheritors?
I would suggest that we do not take a lock here at all, and just skip the
rel if SearchSysCache1 returns NULL, as in the attached. Still, I am
baffled about this crash.
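
To illustrate the skip-on-NULL idea outside the backend, here is a minimal
standalone sketch. Everything in it is hypothetical: ToyClass, toy_lookup
and toy_sum_reltuples are stand-ins I made up for this example and are not
PostgreSQL APIs; the "dropped" flag merely simulates a row that went away
after the list of children was built.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a pg_class row; not the real Form_pg_class. */
typedef struct ToyClass
{
	unsigned int oid;
	char		relkind;	/* 'r' = plain table, 'p' = partitioned table */
	double		reltuples;
	int			dropped;	/* nonzero once a concurrent DROP removed the row */
} ToyClass;

/* Toy lookup: returns NULL when the row is gone, like a cache miss. */
ToyClass *
toy_lookup(ToyClass *catalog, size_t ncatalog, unsigned int oid)
{
	assert(catalog != NULL);

	for (size_t i = 0; i < ncatalog; i++)
	{
		if (catalog[i].oid == oid && !catalog[i].dropped)
			return &catalog[i];
	}
	return NULL;
}

/* Sum reltuples over the children, skipping any that have vanished. */
double
toy_sum_reltuples(ToyClass *catalog, size_t ncatalog,
				  const unsigned int *children, size_t nchildren)
{
	double		total = 0;

	for (size_t i = 0; i < nchildren; i++)
	{
		ToyClass   *child = toy_lookup(catalog, ncatalog, children[i]);

		/* The row may have disappeared since the child list was built. */
		if (child == NULL)
			continue;

		/* Only count children that have storage of their own. */
		if (child->relkind == 'r')
			total += child->reltuples;
	}
	return total;
}
```

The point is only that the lookup result is checked before it is
dereferenced; a child that has vanished simply contributes nothing to the
total.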
--
Álvaro Herrera Valdivia, Chile
"Oh, great altar of passive entertainment, bestow upon me thy discordant images
at such speed as to render linear thought impossible" (Calvin a la TV)
From 2bb3e54862c37ee2a20fed21513a3df309381919 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <[email protected]>
Date: Thu, 8 Apr 2021 11:10:44 -0400
Subject: [PATCH] Fix race condition in relation_needs_vacanalyze
---
src/backend/postmaster/autovacuum.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index aef9ac4dd2..96073d4597 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -3223,18 +3223,23 @@ relation_needs_vacanalyze(Oid relid,
ListCell *lc;
reltuples = 0;
- /* Find all members of inheritance set taking AccessShareLock */
- children = find_all_inheritors(relid, AccessShareLock, NULL);
+ /*
+ * Find all members of inheritance set. Beware that they may
+ * disappear from under us, since we don't acquire any locks.
+ */
+ children = find_all_inheritors(relid, NoLock, NULL);
foreach(lc, children)
{
Oid childOID = lfirst_oid(lc);
HeapTuple childtuple;
Form_pg_class childclass;
childtuple = SearchSysCache1(RELOID, ObjectIdGetDatum(childOID));
+ if (childtuple == NULL)
+ continue;
childclass = (Form_pg_class) GETSTRUCT(childtuple);
/* Skip a partitioned table and foreign partitions */
if (RELKIND_HAS_STORAGE(childclass->relkind))
--
2.20.1