Re: error_severity of brin work item

2021-12-02 Thread Daniel Gustafsson
This thread has stalled, and from the discussions there isn't consencus on how
to proceed (if at all?), so I'm marking this Returned with Feedback.  If the
discussion and work picks up again then it can be resubmitted.

--
Daniel Gustafsson   https://vmware.com/





Re: error_severity of brin work item

2021-09-24 Thread Justin Pryzby
On Fri, Sep 24, 2021 at 10:42:36AM -0300, Alvaro Herrera wrote:
> I think the most reasonable action is to push the patch in
> https://postgr.es/m/20201123193957.GA21810@alvherre.pgsql to all
> branches, closing the immediate hole, and we can see about the xact-hook
> stuff (when we have it) to master only, which I view as a better
> mechanism to protect work-items going forward.  That's what I understand
> Justin was suggesting in his last reply.

I suggested to use the 20 line solution rather than invent new infrastructure
just to avoid corner case errors.

But I also don't understand why changing to use an INFO message doesn't work.
vacuum_open_relation() outputs a WARNING for manual vacuum and an INFO for
autovacuum when log_min_duration >= 0.  Why couldn't this do the same thing?

I found a few instances of this in my logs, indicating that my understanding of
the logic is not wrong.

log_time   | 2021-01-13 15:23:40.687-05
session_line   | 2
session_start_time | 2021-01-13 15:23:19-05
error_severity | LOG
sql_state_code | 42P01
message| skipping vacuum of "pg_toast_984781820" --- relation 
no longer exists
backend_type   | autovacuum worker

-- 
Justin




Re: error_severity of brin work item

2021-09-24 Thread Alvaro Herrera
On 2021-Sep-24, Jaime Casanova wrote:

> Do you plan to work on this for this CF?

I think the most reasonable action is to push the patch in
https://postgr.es/m/20201123193957.GA21810@alvherre.pgsql to all
branches, closing the immediate hole, and we can see about the xact-hook
stuff (when we have it) to master only, which I view as a better
mechanism to protect work-items going forward.  That's what I understand
Justin was suggesting in his last reply.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"La fuerza no está en los medios físicos
sino que reside en una voluntad indomable" (Gandhi)




Re: error_severity of brin work item

2021-09-24 Thread Jaime Casanova
On Wed, Mar 10, 2021 at 08:24:50AM -0500, David Steele wrote:
> On 12/1/20 5:25 PM, Justin Pryzby wrote:
> > On Tue, Dec 01, 2020 at 03:57:24PM -0300, Alvaro Herrera wrote:
> > 
> > > > Another idea is if perform_work_item() were responsible for discarding
> > > > relations which disappear.  Currently it does this, which is racy since 
> > > > it
> > > > holds no lock.
> > > 
> > > That has the property that it remains contained in autovacuum.c, but no
> > > other advantages I think.
> > 
> > It has the advantage that it moves all the try_open stuff out of brin.
> > 
> > I started implementing this, and then realized that the try_open stuff 
> > *has* to
> > be in the brin_summarize function, to handle the case that someone passes a
> > non-index, since it's SQL exposed.
> > So maybe we should use your LockOid patch now, and refactor in the future 
> > if we
> > add additional work-item types.
> 
> Thoughts on this, Álvaro? I can see that the first version of this patch was
> not ideal but the rework seems to have stalled. Since it is a bug perhaps it
> would be better to get something in as Justin suggests?
> 

Hi Álvaro,

Do you plan to work on this for this CF?

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL




Re: error_severity of brin work item

2021-03-10 Thread David Steele

On 12/1/20 5:25 PM, Justin Pryzby wrote:

On Tue, Dec 01, 2020 at 03:57:24PM -0300, Alvaro Herrera wrote:


Another idea is if perform_work_item() were responsible for discarding
relations which disappear.  Currently it does this, which is racy since it
holds no lock.


That has the property that it remains contained in autovacuum.c, but no
other advantages I think.


It has the advantage that it moves all the try_open stuff out of brin.

I started implementing this, and then realized that the try_open stuff *has* to
be in the brin_summarize function, to handle the case that someone passes a
non-index, since it's SQL exposed.
So maybe we should use your LockOid patch now, and refactor in the future if we
add additional work-item types.


Thoughts on this, Álvaro? I can see that the first version of this patch 
was not ideal but the rework seems to have stalled. Since it is a bug 
perhaps it would be better to get something in as Justin suggests?


Regards,
--
-David
da...@pgmasters.net




Re: error_severity of brin work item

2020-12-01 Thread Justin Pryzby
On Tue, Dec 01, 2020 at 03:57:24PM -0300, Alvaro Herrera wrote:
> On 2020-Dec-01, Justin Pryzby wrote:
> 
> > This was an idea I made up - I don't know any of the details of this, but if
> > you give a hint I could look at it more.  There'd (still) be a race window, 
> > but
> > I think that's ok.
> 
> See CommitTransaction() and friends, where AtEOXact_on_commit_actions()

..thanks.  I was going to persue the other idea first since this is new to me.

> Not sure what's a good place for OIDs to be added to the list.  We don't
> have AM-specific entry points for relation drop.  I think this is the
> weakest point of this.

I assume it would just add OIDs of *all* dropped rels, and the autovacuum half
would silently ignore any OID for which there's no work item.  (As it would do
in any case).

> > Another idea is if perform_work_item() were responsible for discarding
> > relations which disappear.  Currently it does this, which is racy since it
> > holds no lock.
> 
> That has the property that it remains contained in autovacuum.c, but no
> other advantages I think.

It has the advantage that it moves all the try_open stuff out of brin.

I started implementing this, and then realized that the try_open stuff *has* to
be in the brin_summarize function, to handle the case that someone passes a
non-index, since it's SQL exposed.
So maybe we should use your LockOid patch now, and refactor in the future if we
add additional work-item types.

-- 
Justin




Re: error_severity of brin work item

2020-12-01 Thread Alvaro Herrera
On 2020-Dec-01, Justin Pryzby wrote:

> This was an idea I made up - I don't know any of the details of this, but if
> you give a hint I could look at it more.  There'd (still) be a race window, 
> but
> I think that's ok.

See CommitTransaction() and friends, where AtEOXact_on_commit_actions()
and others are called.  You'd have to create a new routine (say
AtEOXact_Autovacuum or more specific AtEOXact_AutovacuumWorkItems), to
be called at the right places in xact.c.  Keep a global variable, say a
list of OIDs.  On subxact commit, the list is reassigned to its parent
transaction; on subxact abort, the list is discarded.  On top xact
commit, the list of OIDs is passed to some new routine in autovacuum.c
that scans the workitem array and deletes items as appropriate.

Not sure what's a good place for OIDs to be added to the list.  We don't
have AM-specific entry points for relation drop.  I think this is the
weakest point of this.


> Another idea is if perform_work_item() were responsible for discarding
> relations which disappear.  Currently it does this, which is racy since it
> holds no lock.

That has the property that it remains contained in autovacuum.c, but no
other advantages I think.




Re: error_severity of brin work item

2020-12-01 Thread Justin Pryzby
On Tue, Dec 01, 2020 at 11:07:30AM -0300, Alvaro Herrera wrote:
> > Should it be done in an AtCommit hook or something like that ?
> 
> I didn't like this idea much on first read, on extensibility grounds,
> but perhaps it's not so bad because we can generalize it whenever
> there's pressure to add a second type of work-item (*if* that ever
> happens).
> 
> I suppose the idea is that index_drop saves the index OID when a BRIN
> index with autosummarization=on is dropped, and then the
> AtCommit_WorkItems() call scans the items list and drops those that
> match any OIDs in the list.  (It does require to be mindful of subxact
> aborts, of course.)

This was an idea I made up - I don't know any of the details of this, but if
you give a hint I could look at it more.  There'd (still) be a race window, but
I think that's ok.

Another idea is if perform_work_item() were responsible for discarding
relations which disappear.  Currently it does this, which is racy since it
holds no lock.

cur_relname = get_rel_name(workitem->avw_relation);
cur_nspname = 
get_namespace_name(get_rel_namespace(workitem->avw_relation));
cur_datname = get_database_name(MyDatabaseId);
if (!cur_relname || !cur_nspname || !cur_datname)
goto deleted2;

-- 
Justin




Re: error_severity of brin work item

2020-12-01 Thread Alvaro Herrera
On 2020-Nov-30, Justin Pryzby wrote:

> On Mon, Nov 30, 2020 at 08:47:32PM -0300, Alvaro Herrera wrote:
> > The more I look at this, the less I like it.  This would set a precedent
> > that any action that can be initiated from an autovac work-item has a
> > requirement of silently being discarded when it referenced a
> > non-existant relation.
> 
> My original request was to change to INFO, which is what vacuum proper does at
> vacuum_open_relation().  I realize that still means that new work item 
> handlers
> would have to do the LockOid, try_open dance - maybe it could be factored out.

As I understand, INFO is not well suited to messages that are not going
to the client.  Anyway, my point is about the contortions that are
needed to support the case, rather than what exactly do we do when it
happens.

Retrospectively, it's strange that this problem (what happens when
indexes with pending work-items are dropped) hadn't manifested.  It
seems a pretty obvious one.

> > I'd rather have the code that drops the index go through the list of
> > work-items and delete those that reference that relation.
> > 
> > I'm not sure if this is something that ought to be done in index_drop();
> > One objection might be that if the drop is rolled back, the work-items
> > are lost.
> 
> Should it be done in an AtCommit hook or something like that ?

I didn't like this idea much on first read, on extensibility grounds,
but perhaps it's not so bad because we can generalize it whenever
there's pressure to add a second type of work-item (*if* that ever
happens).

I suppose the idea is that index_drop saves the index OID when a BRIN
index with autosummarization=on is dropped, and then the
AtCommit_WorkItems() call scans the items list and drops those that
match any OIDs in the list.  (It does require to be mindful of subxact
aborts, of course.)




Re: error_severity of brin work item

2020-11-30 Thread Justin Pryzby
On Mon, Nov 30, 2020 at 08:47:32PM -0300, Alvaro Herrera wrote:
> The more I look at this, the less I like it.  This would set a precedent
> that any action that can be initiated from an autovac work-item has a
> requirement of silently being discarded when it referenced a
> non-existant relation.

My original request was to change to INFO, which is what vacuum proper does at
vacuum_open_relation().  I realize that still means that new work item handlers
would have to do the LockOid, try_open dance - maybe it could be factored out.

I noticed this on multiple servers immediate after changing a nagios script to
look for all ERRORs (rather than a specific list of log_messages) and for a
longer time period.

> I'd rather have the code that drops the index go through the list of
> work-items and delete those that reference that relation.
> 
> I'm not sure if this is something that ought to be done in index_drop();
> One objection might be that if the drop is rolled back, the work-items
> are lost.

Should it be done in an AtCommit hook or something like that ?

-- 
Justin




Re: error_severity of brin work item

2020-11-30 Thread Alvaro Herrera
The more I look at this, the less I like it.  This would set a precedent
that any action that can be initiated from an autovac work-item has a
requirement of silently being discarded when it referenced a
non-existant relation.

I'd rather have the code that drops the index go through the list of
work-items and delete those that reference that relation.

I'm not sure if this is something that ought to be done in index_drop();
One objection might be that if the drop is rolled back, the work-items
are lost.  It's the easiest, though; and work-items are supposed to be
lossy anyway, and vacuum would fix the lack of summarization eventually.
So, not pretty, but not all that bad.  (Hopefully rolled-back drops are
not all that common.)




Re: error_severity of brin work item

2020-11-25 Thread Justin Pryzby
On Mon, Nov 23, 2020 at 04:39:57PM -0300, Alvaro Herrera wrote:
> I think this formulation (attached v3) has fewer moving parts.
> 
> However, now that I did that, I wonder if this is really the best
> approach to solve this problem.  Maybe instead of doing this at the BRIN
> level, it should be handled at the autovac level, by having the worker
> copy the work-item to local memory and remove it from the shared list as
> soon as it is in progress.  That way, if *any* error occurs while trying
> to execute it, it will go away instead of being retried for all
> eternity.
> 
> Preliminary patch for that attached as autovacuum-workitem.patch.
> 
> I would propose to clean that up to apply instead of your proposed fix.

I don't know why you said it would be retried for eternity ?
I think the perform_work_item() TRY/CATCH avoids that.

Also .. I think your idea doesn't solve my issue, that REINDEX CONCURRENTLY is
causing vacuum to leave errors in my logs.

I check that the first patch avoids the issue and the 2nd one does not.

-- 
Justin




Re: error_severity of brin work item

2020-11-23 Thread Alvaro Herrera
I think this formulation (attached v3) has fewer moving parts.

However, now that I did that, I wonder if this is really the best
approach to solve this problem.  Maybe instead of doing this at the BRIN
level, it should be handled at the autovac level, by having the worker
copy the work-item to local memory and remove it from the shared list as
soon as it is in progress.  That way, if *any* error occurs while trying
to execute it, it will go away instead of being retried for all
eternity.

Preliminary patch for that attached as autovacuum-workitem.patch.

I would propose to clean that up to apply instead of your proposed fix.
>From e0b457866e503a1b70b94c540147026b652fb019 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 13 Nov 2020 13:39:31 -0300
Subject: [PATCH v3] Avoid errors in brin summarization..

..which can happen if an index is reindexed concurrently
---
 src/backend/access/brin/brin.c | 37 --
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 1f72562c60..4d012ebd76 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -33,6 +33,7 @@
 #include "postmaster/autovacuum.h"
 #include "storage/bufmgr.h"
 #include "storage/freespace.h"
+#include "storage/lmgr.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/index_selfuncs.h"
@@ -886,21 +887,35 @@ brin_summarize_range(PG_FUNCTION_ARGS)
 
 	/*
 	 * We must lock table before index to avoid deadlocks.  However, if the
-	 * passed indexoid isn't an index then IndexGetRelation() will fail.
-	 * Rather than emitting a not-very-helpful error message, postpone
-	 * complaining, expecting that the is-it-an-index test below will fail.
+	 * passed indexoid isn't an index, then IndexGetRelation(false) would
+	 * emit a not-very-helpful error message.  Instead, postpone complaining
+	 * until the is-it-an-index test, below.
 	 */
 	heapoid = IndexGetRelation(indexoid, true);
-	if (OidIsValid(heapoid))
-		heapRel = table_open(heapoid, ShareUpdateExclusiveLock);
-	else
-		heapRel = NULL;
+	if (heapoid != InvalidOid)
+		LockRelationOid(heapoid, ShareUpdateExclusiveLock);
 
-	indexRel = index_open(indexoid, ShareUpdateExclusiveLock);
+	/* Now we can open the index; silently skip if it's gone. */
+	indexRel = try_relation_open(indexoid, ShareUpdateExclusiveLock);
+	if (!indexRel)
+	{
+		if (heapoid != InvalidOid)
+			UnlockRelationOid(heapoid, ShareUpdateExclusiveLock);
+		PG_RETURN_INT32(0);
+	}
+
+	/* If passed a non-index, fail noisily */
+	if (indexRel->rd_rel->relkind != RELKIND_INDEX)
+		ereport(ERROR,
+(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not an index",
+		RelationGetRelationName(indexRel;
+
+	/* Now we can open the table */
+	heapRel = table_open(heapoid, NoLock);
 
 	/* Must be a BRIN index */
-	if (indexRel->rd_rel->relkind != RELKIND_INDEX ||
-		indexRel->rd_rel->relam != BRIN_AM_OID)
+	if (indexRel->rd_rel->relam != BRIN_AM_OID)
 		ereport(ERROR,
 (errcode(ERRCODE_WRONG_OBJECT_TYPE),
  errmsg("\"%s\" is not a BRIN index",
@@ -916,7 +931,7 @@ brin_summarize_range(PG_FUNCTION_ARGS)
 	 * barely possible that a race against an index drop/recreation could have
 	 * netted us the wrong table.  Recheck.
 	 */
-	if (heapRel == NULL || heapoid != IndexGetRelation(indexoid, false))
+	if (heapoid != IndexGetRelation(indexoid, false))
 		ereport(ERROR,
 (errcode(ERRCODE_UNDEFINED_TABLE),
  errmsg("could not open parent table of index %s",
-- 
2.20.1

commit 2326df869ee8516b8b1f0e01930946e9a63800b5
Author: Alvaro Herrera 
AuthorDate: Mon Nov 23 15:56:23 2020 -0300
CommitDate: Mon Nov 23 15:57:28 2020 -0300

autovacuum-centered workitem fix

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index aa5b97fbac..77c1401bc9 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2561,6 +2561,7 @@ deleted:
 	for (i = 0; i < NUM_WORKITEMS; i++)
 	{
 		AutoVacuumWorkItem *workitem = >av_workItems[i];
+		AutoVacuumWorkItem my_workitem;
 
 		if (!workitem->avw_used)
 			continue;
@@ -2569,11 +2570,14 @@ deleted:
 		if (workitem->avw_database != MyDatabaseId)
 			continue;
 
+		my_workitem = *workitem;
+		workitem->avw_used = false;
+
 		/* claim this one, and release lock while performing it */
 		workitem->avw_active = true;
 		LWLockRelease(AutovacuumLock);
 
-		perform_work_item(workitem);
+		perform_work_item(_workitem);
 
 		/*
 		 * Check for config changes before acquiring lock for further jobs.


Re: error_severity of brin work item

2020-11-19 Thread Justin Pryzby
On Thu, Nov 19, 2020 at 03:15:21PM -0300, Alvaro Herrera wrote:
> On 2020-Nov-19, Justin Pryzby wrote:
> 
> > On Fri, Nov 13, 2020 at 12:11:21PM -0600, Justin Pryzby wrote:
> 
> > > Your patch didn't actually say "try_relation_open", so didn't work.
> > > But it does works if I do that, and close the table.
> 
> Thanks for fixing and testing.
> 
> > That patch broke the case that a non-index is passed, which I addressed 
> > here.
> 
> Hmm, I think the reaction to that should be the same as before, so
> rather than return 0, the patch should raise the same error that
> index_open() would.

The resulting logic is not very clear and requires a lot of commentary..

BTW I saw that in tablecmds.c, RangeVarCallbackForAttachIndex() does this:

if (classform->relkind != RELKIND_PARTITIONED_INDEX &&
classform->relkind != RELKIND_INDEX)
ereport(ERROR,
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 errmsg("\"%s\" is not an index", 
rv->relname)));

Is it wrong to use ERRCODE_INVALID_OBJECT_DEFINITION ?
Most other places say ERRCODE_WRONG_OBJECT_TYPE

Likewise, transformPartitionCmd() in parse_utilcmd.c:
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 errmsg("\"%s\" is not a partitioned table",
RelationGetRelationName(parentRel;

-- 
Justin
>From 3c6deeb57ba007b142f35312a436defb9d1b3783 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 13 Nov 2020 13:39:31 -0300
Subject: [PATCH] Avoid errors in brin summarization..

..which can happen if an index is reindexed concurrently
---
 src/backend/access/brin/brin.c | 34 --
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 1f72562c60..a1089d5f9e 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -886,9 +886,9 @@ brin_summarize_range(PG_FUNCTION_ARGS)
 
 	/*
 	 * We must lock table before index to avoid deadlocks.  However, if the
-	 * passed indexoid isn't an index then IndexGetRelation() will fail.
-	 * Rather than emitting a not-very-helpful error message, postpone
-	 * complaining, expecting that the is-it-an-index test below will fail.
+	 * passed indexoid isn't an index, then IndexGetRelation(true) would
+	 * emit a not-very-helpful error message.  Instead, postpone complaining
+	 * until the is-it-an-index test, below.
 	 */
 	heapoid = IndexGetRelation(indexoid, true);
 	if (OidIsValid(heapoid))
@@ -896,11 +896,33 @@ brin_summarize_range(PG_FUNCTION_ARGS)
 	else
 		heapRel = NULL;
 
-	indexRel = index_open(indexoid, ShareUpdateExclusiveLock);
+	indexRel = try_relation_open(indexoid, ShareUpdateExclusiveLock);
+
+	/* Silently skip autovacuum work-items if an index has disappeared. */
+	if (!indexRel)
+	{
+		if (heapRel)
+			table_close(heapRel, ShareUpdateExclusiveLock);
+
+		PG_RETURN_INT32(0);
+	}
+
+	/* If passed a non-index, fail noisily */
+	if (indexRel->rd_rel->relkind != RELKIND_INDEX)
+		ereport(ERROR,
+(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not an index",
+		RelationGetRelationName(indexRel;
+
+	/* If the table wasn't opened for some other reason, silently skip it */
+	if (!heapRel)
+	{
+		relation_close(indexRel, ShareUpdateExclusiveLock);
+		PG_RETURN_INT32(0);
+	}
 
 	/* Must be a BRIN index */
-	if (indexRel->rd_rel->relkind != RELKIND_INDEX ||
-		indexRel->rd_rel->relam != BRIN_AM_OID)
+	if (indexRel->rd_rel->relam != BRIN_AM_OID)
 		ereport(ERROR,
 (errcode(ERRCODE_WRONG_OBJECT_TYPE),
  errmsg("\"%s\" is not a BRIN index",
-- 
2.17.0



Re: error_severity of brin work item

2020-11-19 Thread Alvaro Herrera
On 2020-Nov-19, Justin Pryzby wrote:

> On Fri, Nov 13, 2020 at 12:11:21PM -0600, Justin Pryzby wrote:

> > Your patch didn't actually say "try_relation_open", so didn't work.
> > But it does works if I do that, and close the table.

Thanks for fixing and testing.

> That patch broke the case that a non-index is passed, which I addressed here.

Hmm, I think the reaction to that should be the same as before, so
rather than return 0, the patch should raise the same error that
index_open() would.





Re: error_severity of brin work item

2020-11-19 Thread Justin Pryzby
On Fri, Nov 13, 2020 at 12:11:21PM -0600, Justin Pryzby wrote:
> On Fri, Nov 13, 2020 at 01:39:31PM -0300, Alvaro Herrera wrote:
> > On 2020-Nov-13, Justin Pryzby wrote:
> > 
> > > I saw a bunch of these in my logs:
> > > 
> > > log_time | 2020-10-25 22:59:45.619-07
> > > database | 
> > > left | could not open relation with OID 292103095
> > > left | processing work entry for relation 
> > > "ts.child.alarms_202010_alarm_clear_time_idx"
> > > 
> > > Those happen following a REINDEX job on that index.
> > > 
> > > I think that should be more like an INFO message, since that's what 
> > > vacuum does
> > > (vacuum_open_relation), and a queued work item is even more likely to hit 
> > > a
> > > dropped relation.
> > 
> > Ah, interesting.  Yeah, I agree this is a bug.  I think it can be fixed
> > by using try_relation_open() on the index; if that returns NULL, discard
> > the work item.
> > 
> > Does this patch solve the problem?
> 
> Your patch didn't actually say "try_relation_open", so didn't work.
> But it does works if I do that, and close the table.

That patch broke the case that a non-index is passed, which I addressed here.

I wondered if the function should return NULL in those cases, but it seems to
be "impossible".
>From ef5685253a04cf720574f21b711e177e13299ba3 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 13 Nov 2020 13:39:31 -0300
Subject: [PATCH v3] Avoid errors in brin summarization..

..which can happen if an index is reindexed concurrently
---
 src/backend/access/brin/brin.c | 22 +++---
 src/test/regress/expected/brin.out |  6 +-
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 1f72562c60..46e5269260 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -887,16 +887,24 @@ brin_summarize_range(PG_FUNCTION_ARGS)
 	/*
 	 * We must lock table before index to avoid deadlocks.  However, if the
 	 * passed indexoid isn't an index then IndexGetRelation() will fail.
-	 * Rather than emitting a not-very-helpful error message, postpone
-	 * complaining, expecting that the is-it-an-index test below will fail.
+	 * Rather than emitting a not-very-helpful error message, prepare to
+	 * return without doing anything.  This allows autovacuum work-items to be
+	 * silently discarded rather than uselessly accumulating error messages in
+	 * the server log.
 	 */
 	heapoid = IndexGetRelation(indexoid, true);
-	if (OidIsValid(heapoid))
-		heapRel = table_open(heapoid, ShareUpdateExclusiveLock);
-	else
-		heapRel = NULL;
+	heapRel = OidIsValid(heapoid) ?
+		table_open(heapoid, ShareUpdateExclusiveLock) : NULL;
 
-	indexRel = index_open(indexoid, ShareUpdateExclusiveLock);
+	if (heapRel == NULL)
+		PG_RETURN_INT32(0);
+
+	indexRel = try_relation_open(indexoid, ShareUpdateExclusiveLock);
+	if (indexRel == NULL)
+	{
+		table_close(heapRel, ShareUpdateExclusiveLock);
+		PG_RETURN_INT32(0);
+	}
 
 	/* Must be a BRIN index */
 	if (indexRel->rd_rel->relkind != RELKIND_INDEX ||
diff --git a/src/test/regress/expected/brin.out b/src/test/regress/expected/brin.out
index e53d6e4885..51dff7295d 100644
--- a/src/test/regress/expected/brin.out
+++ b/src/test/regress/expected/brin.out
@@ -405,7 +405,11 @@ UPDATE brintest SET int8col = int8col * int4col;
 UPDATE brintest SET textcol = '' WHERE textcol IS NOT NULL;
 -- Tests for brin_summarize_new_values
 SELECT brin_summarize_new_values('brintest'); -- error, not an index
-ERROR:  "brintest" is not an index
+ brin_summarize_new_values 
+---
+ 0
+(1 row)
+
 SELECT brin_summarize_new_values('tenk1_unique1'); -- error, not a BRIN index
 ERROR:  "tenk1_unique1" is not a BRIN index
 SELECT brin_summarize_new_values('brinidx'); -- ok, no change expected
-- 
2.17.0



Re: error_severity of brin work item

2020-11-13 Thread Justin Pryzby
On Fri, Nov 13, 2020 at 01:39:31PM -0300, Alvaro Herrera wrote:
> On 2020-Nov-13, Justin Pryzby wrote:
> 
> > I saw a bunch of these in my logs:
> > 
> > log_time | 2020-10-25 22:59:45.619-07
> > database | 
> > left | could not open relation with OID 292103095
> > left | processing work entry for relation 
> > "ts.child.alarms_202010_alarm_clear_time_idx"
> > 
> > Those happen following a REINDEX job on that index.
> > 
> > I think that should be more like an INFO message, since that's what vacuum 
> > does
> > (vacuum_open_relation), and a queued work item is even more likely to hit a
> > dropped relation.
> 
> Ah, interesting.  Yeah, I agree this is a bug.  I think it can be fixed
> by using try_relation_open() on the index; if that returns NULL, discard
> the work item.
> 
> Does this patch solve the problem?

Your patch didn't actually say "try_relation_open", so didn't work.
But it does works if I do that, and close the table.

I tested like:

pryzbyj=# ALTER SYSTEM SET 
backtrace_functions='try_relation_open,relation_open';
pryzbyj=# ALTER SYSTEM SET autovacuum_naptime=3; SELECT pg_reload_conf();
pryzbyj=# CREATE TABLE tt AS SELECT generate_series(1,)i;
pryzbyj=# CREATE INDEX ON tt USING brin(i) 
WITH(autosummarize,pages_per_range=1);
pryzbyj=# \! while :; do psql -h /tmp -qc 'SET client_min_messages=info' -c 
'REINDEX INDEX CONCURRENTLY tt_i_idx'; done&

-- run this 5-10 times and hit the "...was not recorded" message, which for
-- whatever reason causes the race condition involving work queue
pryzbyj=# UPDATE tt SET i=1+i;

2020-11-13 11:50:46.093 CST [30687] ERROR:  could not open relation with OID 
1110882
2020-11-13 11:50:46.093 CST [30687] CONTEXT:  processing work entry for 
relation "pryzbyj.public.tt_i_idx"
2020-11-13 11:50:46.093 CST [30687] BACKTRACE:
postgres: autovacuum worker pryzbyj(+0xb9ce8) [0x55acf2af0ce8]
postgres: autovacuum worker pryzbyj(index_open+0xb) [0x55acf2bab59b]
postgres: autovacuum worker pryzbyj(brin_summarize_range+0x8f) 
[0x55acf2b5b5bf]
postgres: autovacuum worker pryzbyj(DirectFunctionCall2Coll+0x62) 
[0x55acf2f40372]
...

-- 
Justin
>From e08c6d3e2b10964633904ff247e70330077d31b4 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 13 Nov 2020 13:39:31 -0300
Subject: [PATCH v2] error_severity of brin work item

---
 src/backend/access/brin/brin.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 1f72562c60..8278a5209c 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -887,8 +887,10 @@ brin_summarize_range(PG_FUNCTION_ARGS)
 	/*
 	 * We must lock table before index to avoid deadlocks.  However, if the
 	 * passed indexoid isn't an index then IndexGetRelation() will fail.
-	 * Rather than emitting a not-very-helpful error message, postpone
-	 * complaining, expecting that the is-it-an-index test below will fail.
+	 * Rather than emitting a not-very-helpful error message, prepare to
+	 * return without doing anything.  This allows autovacuum work-items to be
+	 * silently discarded rather than uselessly accumulating error messages in
+	 * the server log.
 	 */
 	heapoid = IndexGetRelation(indexoid, true);
 	if (OidIsValid(heapoid))
@@ -896,7 +898,14 @@ brin_summarize_range(PG_FUNCTION_ARGS)
 	else
 		heapRel = NULL;
 
-	indexRel = index_open(indexoid, ShareUpdateExclusiveLock);
+	indexRel = try_relation_open(indexoid, ShareUpdateExclusiveLock);
+	if (heapRel == NULL || indexRel == NULL)
+	{
+		if (heapRel != NULL)
+			table_close(heapRel, ShareUpdateExclusiveLock);
+
+		PG_RETURN_INT32(0);
+	}
 
 	/* Must be a BRIN index */
 	if (indexRel->rd_rel->relkind != RELKIND_INDEX ||
-- 
2.17.0



Re: error_severity of brin work item

2020-11-13 Thread Alvaro Herrera
On 2020-Nov-13, Justin Pryzby wrote:

> I saw a bunch of these in my logs:
> 
> log_time | 2020-10-25 22:59:45.619-07
> database | 
> left | could not open relation with OID 292103095
> left | processing work entry for relation 
> "ts.child.alarms_202010_alarm_clear_time_idx"
> 
> Those happen following a REINDEX job on that index.
> 
> I think that should be more like an INFO message, since that's what vacuum 
> does
> (vacuum_open_relation), and a queued work item is even more likely to hit a
> dropped relation.

Ah, interesting.  Yeah, I agree this is a bug.  I think it can be fixed
by using try_relation_open() on the index; if that returns NULL, discard
the work item.

Does this patch solve the problem?

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 1f72562c60..6021b16a01 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -887,8 +887,10 @@ brin_summarize_range(PG_FUNCTION_ARGS)
 	/*
 	 * We must lock table before index to avoid deadlocks.  However, if the
 	 * passed indexoid isn't an index then IndexGetRelation() will fail.
-	 * Rather than emitting a not-very-helpful error message, postpone
-	 * complaining, expecting that the is-it-an-index test below will fail.
+	 * Rather than emitting a not-very-helpful error message, prepare to
+	 * return without doing anything.  This allows autovacuum work-items to be
+	 * silently discarded rather than uselessly accumulating error messages in
+	 * the server log.
 	 */
 	heapoid = IndexGetRelation(indexoid, true);
 	if (OidIsValid(heapoid))
@@ -897,6 +899,11 @@ brin_summarize_range(PG_FUNCTION_ARGS)
 		heapRel = NULL;
 
 	indexRel = index_open(indexoid, ShareUpdateExclusiveLock);
+	if (heapRel == NULL || indexRel == NULL)
+	{
+		/* Table or index don't exist anymore; ignore this request */
+		PG_RETURN_INT32(0);
+	}
 
 	/* Must be a BRIN index */
 	if (indexRel->rd_rel->relkind != RELKIND_INDEX ||


error_severity of brin work item

2020-11-13 Thread Justin Pryzby
I saw a bunch of these in my logs:

log_time | 2020-10-25 22:59:45.619-07
database | 
left | could not open relation with OID 292103095
left | processing work entry for relation 
"ts.child.alarms_202010_alarm_clear_time_idx"

Those happen following a REINDEX job on that index.

I think that should be more like an INFO message, since that's what vacuum does
(vacuum_open_relation), and a queued work item is even more likely to hit a
dropped relation.  It's easy to hit by setting autovacuum_naptime=1 and looping
around REINDEX CONCURRENTLY while updating a table.

Autovacuum is queueing work items for later:

src/backend/postmaster/autovacuum.c-switch (workitem->avw_type)
src/backend/postmaster/autovacuum.c-{
src/backend/postmaster/autovacuum.c-case 
AVW_BRINSummarizeRange:
src/backend/postmaster/autovacuum.c:
DirectFunctionCall2(brin_summarize_range,
src/backend/postmaster/autovacuum.c-
ObjectIdGetDatum(workitem->avw_relation),
src/backend/postmaster/autovacuum.c-
Int64GetDatum((int64) workitem->avw_blockNumber));
src/backend/postmaster/autovacuum.c-break;

And if the index is missing:

brin_summarize_range(PG_FUNCTION_ARGS)
|/*
| * We must lock table before index to avoid deadlocks.  However, if the
| * passed indexoid isn't an index then IndexGetRelation() will fail.
| * Rather than emitting a not-very-helpful error message, postpone
| * complaining, expecting that the is-it-an-index test below will fail.
| */
|heapoid = IndexGetRelation(indexoid, true);
|if (OidIsValid(heapoid))
|heapRel = table_open(heapoid, ShareUpdateExclusiveLock);
|else
|heapRel = NULL;
|
|indexRel = index_open(indexoid, ShareUpdateExclusiveLock);

table_open is succcessful and then index_open fails.  (I thought locks would
have prevented that ?)

Justin