Hello

At my current job, we have a lot of multi-tenant databases, thus with tables 
containing a tenant_id column. Such a column introduces a severe bias in 
statistics estimation since any other FK in the next columns is very likely to 
have a functional dependency on the tenant id. We found several queries where 
this functional dependency messed up the estimations so much the optimizer 
chose wrong plans.
When we tried to use extended statistics with CREATE STATISTIC on tenant_id, 
other_id, we noticed that the current implementation for detecting functional 
dependency lacks two features (at least in our use case):
- support for IN clauses
- support for the array contains operator (that could be considered as a 
special case of IN)

After digging in the source code, I think the lack of support for IN clauses 
is an oversight and due to the fact that IN clauses are ScalarArrayOpExpr 
instead of OpExpr. The attached patch fixes this by simply copying the code-
path for OpExpr and changing the type name. It compiles and the results are 
correct, with a dependency being correctly taken into consideration when 
estimating rows. If you think such a copy paste is bad and should be factored 
in another static bool function, please say so and I will happily provide an 
updated patch.
The lack of support for @> operator, on the other hand, seems to be a decision 
taken when writing the initial code, but I can not find any mathematical nor 
technical reason for it. The current code restricts dependency calculation to 
the = operator, obviously because inequality operators are not going to 
work... but array contains is just several = operators grouped, thus the same 
for the dependency calculation. The second patch refactors the operator check 
in order to also include array contains.

I tested the patches on current HEAD, but I can test and provide back-ported 
versions of the patch for other versions if needed (this code path hardly 
changed since its introduction in 10).

Best regards

 Pierre Ducroquet
>From d2b89748e1b520c276875538149eb466e97c21b4 Mon Sep 17 00:00:00 2001
From: Pierre <pierre.ducroq...@people-doc.com>
Date: Fri, 31 Jan 2020 23:58:35 +0100
Subject: [PATCH 1/2] Add support for IN clauses in dependencies check

---
 src/backend/statistics/dependencies.c | 34 +++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
index e2f6c5bb97..d4844a9ec6 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -801,6 +801,40 @@ dependency_is_compatible_clause(Node *clause, Index relid, AttrNumber *attnum)
 
 		/* OK to proceed with checking "var" */
 	}
+	else if (IsA(rinfo->clause, ScalarArrayOpExpr))
+	{
+		/* If it's an opclause, check for Var IN Const. */
+		ScalarArrayOpExpr	   *expr = (ScalarArrayOpExpr *) rinfo->clause;
+
+		/* Only expressions with two arguments are candidates. */
+		if (list_length(expr->args) != 2)
+			return false;
+
+		/* Make sure non-selected argument is a pseudoconstant. */
+		if (is_pseudo_constant_clause(lsecond(expr->args)))
+			var = linitial(expr->args);
+		else if (is_pseudo_constant_clause(linitial(expr->args)))
+			var = lsecond(expr->args);
+		else
+			return false;
+
+		/*
+		 * If it's not an "=" operator, just ignore the clause, as it's not
+		 * compatible with functional dependencies.
+		 *
+		 * This uses the function for estimating selectivity, not the operator
+		 * directly (a bit awkward, but well ...).
+		 *
+		 * XXX this is pretty dubious; probably it'd be better to check btree
+		 * or hash opclass membership, so as not to be fooled by custom
+		 * selectivity functions, and to be more consistent with decisions
+		 * elsewhere in the planner.
+		 */
+		if (get_oprrest(expr->opno) != F_EQSEL)
+			return false;
+
+		/* OK to proceed with checking "var" */
+	}
 	else if (is_notclause(rinfo->clause))
 	{
 		/*
-- 
2.24.1

>From 02363c52d0d03f58b8e79088a7261a9fc1e3bbb7 Mon Sep 17 00:00:00 2001
From: Pierre <pierre.ducroq...@people-doc.com>
Date: Fri, 31 Jan 2020 23:58:55 +0100
Subject: [PATCH 2/2] Add support for array contains in dependency check

---
 src/backend/statistics/dependencies.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
index d4844a9ec6..c3f37a474b 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -785,7 +785,7 @@ dependency_is_compatible_clause(Node *clause, Index relid, AttrNumber *attnum)
 			return false;
 
 		/*
-		 * If it's not an "=" operator, just ignore the clause, as it's not
+		 * If it's not an "=" or "@>" operator, just ignore the clause, as it's not
 		 * compatible with functional dependencies.
 		 *
 		 * This uses the function for estimating selectivity, not the operator
@@ -796,8 +796,13 @@ dependency_is_compatible_clause(Node *clause, Index relid, AttrNumber *attnum)
 		 * selectivity functions, and to be more consistent with decisions
 		 * elsewhere in the planner.
 		 */
-		if (get_oprrest(expr->opno) != F_EQSEL)
-			return false;
+		switch(get_oprrest(expr->opno)) {
+			case F_EQSEL:
+			case F_ARRAYCONTSEL:
+				break;
+			default:
+				return false;
+		}
 
 		/* OK to proceed with checking "var" */
 	}
-- 
2.24.1

Reply via email to