On 11/18/22 14:00, Tomas Vondra wrote:
Seems fine. I wonder if we could/could introduce a new constant for 0,
similar to ATTSTATSSLOT_NUMBERS/ATTSTATSSLOT_VALUES, instead of using a
magic constant. Say, ATTSTATSSLOT_NONE or ATTSTATSSLOT_CHECK.
Good idea. I called it ATTSTATSSLOT_EXISTS. New patch attached.
I don't think you can write a test for this, because there is no change
to behavior that can be observed by the user. If one side has no MCV,
the only difference is whether we try to load the other MCV or not.

Yeah. I thought along the lines of checking the number of pages read when the pg_stats entry is not in syscache yet. But that seems awfully implementation specific. So no test provided.

--
David Geier
(ServiceNow)
From 5c5c0fb9dd99e79daaa015984c9dda22e4ccda17 Mon Sep 17 00:00:00 2001
From: David Geier <geidav...@gmail.com>
Date: Fri, 18 Nov 2022 09:35:08 +0100
Subject: [PATCH] Don't read MCV stats needlessly in join selectivity
 estimation

Join selectivity estimation only uses MCV stats if they're present
on both join attributes. Therefore, if MCV stats are not available
on one of the two columns, skip reading MCV stats altogether.

Frequently encountered situations in which MCV stats not available
is unique columns or columns for which all values have roughly
the same frequency and therefore ANALYZE decided to not store them.
---
 src/backend/utils/adt/selfuncs.c    | 17 +++++++++++++++--
 src/backend/utils/cache/lsyscache.c |  4 ++++
 src/include/utils/lsyscache.h       |  3 ++-
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index d597b7e81f..85a29eaeb9 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -2263,6 +2263,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
 	bool		have_mcvs2 = false;
 	bool		join_is_reversed;
 	RelOptInfo *inner_rel;
+	bool		get_mcv_stats;
 
 	get_join_variables(root, args, sjinfo,
 					   &vardata1, &vardata2, &join_is_reversed);
@@ -2275,11 +2276,23 @@ eqjoinsel(PG_FUNCTION_ARGS)
 	memset(&sslot1, 0, sizeof(sslot1));
 	memset(&sslot2, 0, sizeof(sslot2));
 
+	/*
+	 * There is no use in fetching one side's MCVs if we lack MCVs for the
+	 * other side, so do a quick check to verify that both stats exist.
+	 */
+	get_mcv_stats = (HeapTupleIsValid(vardata1.statsTuple) &&
+			 HeapTupleIsValid(vardata2.statsTuple) &&
+			 get_attstatsslot(&sslot1, vardata1.statsTuple,
+					  STATISTIC_KIND_MCV, InvalidOid, ATTSTATSSLOT_EXISTS) &&
+			 get_attstatsslot(&sslot2, vardata2.statsTuple,
+					  STATISTIC_KIND_MCV, InvalidOid, ATTSTATSSLOT_EXISTS));
+    
+    
 	if (HeapTupleIsValid(vardata1.statsTuple))
 	{
 		/* note we allow use of nullfrac regardless of security check */
 		stats1 = (Form_pg_statistic) GETSTRUCT(vardata1.statsTuple);
-		if (statistic_proc_security_check(&vardata1, opfuncoid))
+		if (get_mcv_stats && statistic_proc_security_check(&vardata1, opfuncoid))
 			have_mcvs1 = get_attstatsslot(&sslot1, vardata1.statsTuple,
 										  STATISTIC_KIND_MCV, InvalidOid,
 										  ATTSTATSSLOT_VALUES | ATTSTATSSLOT_NUMBERS);
@@ -2289,7 +2302,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
 	{
 		/* note we allow use of nullfrac regardless of security check */
 		stats2 = (Form_pg_statistic) GETSTRUCT(vardata2.statsTuple);
-		if (statistic_proc_security_check(&vardata2, opfuncoid))
+		if (get_mcv_stats && statistic_proc_security_check(&vardata2, opfuncoid))
 			have_mcvs2 = get_attstatsslot(&sslot2, vardata2.statsTuple,
 										  STATISTIC_KIND_MCV, InvalidOid,
 										  ATTSTATSSLOT_VALUES | ATTSTATSSLOT_NUMBERS);
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index a16a63f495..191f68f7b4 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -3157,6 +3157,10 @@ get_attavgwidth(Oid relid, AttrNumber attnum)
  * reqkind: STAKIND code for desired statistics slot kind.
  * reqop: STAOP value wanted, or InvalidOid if don't care.
  * flags: bitmask of ATTSTATSSLOT_VALUES and/or ATTSTATSSLOT_NUMBERS.
+ *        Passing 0 will only check if the requested slot type exists but not
+ *        extract any content. This is useful in cases where MCV stats are only
+ *        useful if they exists on all involved columns (e.g. during join
+ *        selectivity computation).
  *
  * If a matching slot is found, true is returned, and *sslot is filled thus:
  * staop: receives the actual STAOP value.
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 50f0288305..d9a3916997 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -39,7 +39,8 @@ typedef enum IOFuncSelector
 } IOFuncSelector;
 
 /* Flag bits for get_attstatsslot */
-#define ATTSTATSSLOT_VALUES		0x01
+#define ATTSTATSSLOT_EXISTS	0x00
+#define ATTSTATSSLOT_VALUES	0x01
 #define ATTSTATSSLOT_NUMBERS	0x02
 
 /* Result struct for get_attstatsslot */
-- 
2.34.1

Reply via email to