Buildfarm member hyrax has shown this failure twice recently:

--- 
/home/buildfarm/buildroot/HEAD/pgsql.build/src/test/regress/expected/brin.out   
    2020-01-23 11:10:05.730014075 -0500
+++ 
/home/buildfarm/buildroot/HEAD/pgsql.build/src/test/regress/results/brin.out    
    2020-06-30 03:50:23.651196117 -0400
@@ -490,6 +490,7 @@
 INSERT INTO brintest_2 VALUES (numrange(0, 2^1000::numeric));
 INSERT INTO brintest_2 VALUES ('(-1, 0)');
 SELECT brin_desummarize_range('brinidx', 0);
+WARNING:  leftover placeholder tuple detected in BRIN index "brinidx", deleting
  brin_desummarize_range 
 ------------------------
  
This happened on HEAD:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-06-30%2001%3A41%3A50
and v13:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-06-17%2005%3A50%3A46
and lousyjack has also shown it once on HEAD:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-02%2013%3A03%3A04

The warning is from this bit (commit c655899ba):

    /*
     * Because of ShareUpdateExclusive lock, this function shouldn't run
     * concurrently with summarization.  Placeholder tuples can only exist as
     * leftovers from crashed summarization, so if we detect any, we complain
     * but proceed.
     */
    if (BrinTupleIsPlaceholder(tup))
        ereport(WARNING,
                (errmsg("leftover placeholder tuple detected in BRIN index 
\"%s\", deleting",
                        RelationGetRelationName(idxrel))));

Now, there was no preceding crash in these tests, so the comment's claim
is evidently a lie.  But what's going wrong?  The postmaster log provides
a strong hint:

2020-06-30 03:50:14.603 EDT [3699:48] pg_regress/brin LOG:  statement: SELECT 
brin_desummarize_range('brinidx', 0);
2020-06-30 03:50:15.894 EDT [3699:49] pg_regress/brin LOG:  process 3699 still 
waiting for ShareUpdateExclusiveLock on relation 24795 of database 16384 after 
1000.094 ms
2020-06-30 03:50:15.894 EDT [3699:50] pg_regress/brin DETAIL:  Process holding 
the lock: 7237. Wait queue: 3699.
2020-06-30 03:50:15.894 EDT [3699:51] pg_regress/brin STATEMENT:  SELECT 
brin_desummarize_range('brinidx', 0);
2020-06-30 03:50:15.895 EDT [7237:1] ERROR:  canceling autovacuum task
2020-06-30 03:50:15.895 EDT [7237:2] CONTEXT:  while cleaning up index 
"brinidx" of relation "public.brintest"
        automatic vacuum of table "regression.public.brintest"
2020-06-30 03:50:15.899 EDT [3699:52] pg_regress/brin LOG:  process 3699 
acquired ShareUpdateExclusiveLock on relation 24795 of database 16384 after 
1001.626 ms
2020-06-30 03:50:15.899 EDT [3699:53] pg_regress/brin STATEMENT:  SELECT 
brin_desummarize_range('brinidx', 0);
2020-06-30 03:50:16.018 EDT [3699:54] pg_regress/brin WARNING:  leftover 
placeholder tuple detected in BRIN index "brinidx", deleting
2020-06-30 03:50:16.031 EDT [3699:55] pg_regress/brin LOG:  statement: SELECT 
brin_summarize_range('brinidx', 0);

I think the "crash" actually was the forced autovac cancellation we see
here.  Thus, the fact that both these animals use CLOBBER_CACHE_ALWAYS is
not a direct cause of the failure, though it might contribute to getting
the timing right for this to happen.

So (1) the comment needs to be adjusted to mention that vacuum
cancellation is enough to create the situation, and (2) probably the
message needs to be downgraded quite a lot, DEBUG2 or even less.
Or maybe we could just delete the quoted stanza altogether.

                        regards, tom lane


Reply via email to