On Thu, Nov 14, 2019 at 05:35:06PM -0500, Tom Lane wrote:
Tomas Vondra <tomas.von...@2ndquadrant.com> writes:
On Thu, Nov 14, 2019 at 04:36:54PM -0500, Tom Lane wrote:
Hm.  No, it's not enough, unless you add more logic to deal with the
possibility that the stats object is gone by the time you have the
table lock.  Consider e.g. two concurrent DROP STATISTICS commands,
or a statistics drop that's cascading from something like a drop
of a relevant function and so has no earlier table lock.

Isn't that solved by RemoveObjects() doing this?

    /* Get an ObjectAddress for the object. */
    address = get_object_address(stmt->removeType,
                                 object,
                                 &relation,
                                 AccessExclusiveLock,
                                 stmt->missing_ok);

Ah, I see, we already have AEL on the stats object itself.  So that
eliminates my concern about a race between two RemoveStatisticsById
calls, but what we have instead is fear of deadlock.  A DROP STATISTICS
command will acquire AEL on the stats object but then AEL on the table,
the opposite of what will happen during DROP TABLE, so concurrent
executions of those will deadlock.  That might be better than the
failures Mark is seeing now, but not by much.


Hmmm, yeah :-(

A correct fix I think is that the planner ought to acquire AccessShareLock
on a stats object it's trying to use (and then recheck whether the object
is still there).  That seems rather expensive, but there may be no other
way.

Yes, so something like for indexes, although we don't need the recheck
in that case. I think the attached patch does that (but it's 1AM here).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/optimizer/util/plancat.c 
b/src/backend/optimizer/util/plancat.c
index 5e889d1861..a1e325615c 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -46,6 +46,7 @@
 #include "rewrite/rewriteManip.h"
 #include "statistics/statistics.h"
 #include "storage/bufmgr.h"
+#include "storage/lmgr.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/partcache.h"
@@ -1305,9 +1306,13 @@ get_relation_statistics(RelOptInfo *rel, Relation 
relation)
                Bitmapset  *keys = NULL;
                int                     i;
 
+               LockDatabaseObject(StatisticExtRelationId, statOid, 0, 
AccessShareLock);
+
+               /* we may get invalid tuple, if the statistic just got dropped 
*/
                htup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(statOid));
                if (!HeapTupleIsValid(htup))
-                       elog(ERROR, "cache lookup failed for statistics object 
%u", statOid);
+                       continue;
+
                staForm = (Form_pg_statistic_ext) GETSTRUCT(htup);
 
                dtup = SearchSysCache1(STATEXTDATASTXOID, 
ObjectIdGetDatum(statOid));

Reply via email to