On Tue, Apr 1, 2025 at 1:32 PM Amit Langote <amitlangot...@gmail.com> wrote:
>
> I think David’s suggestion to use 64 as the fixed initial size is
> simpler and more predictable. Since list_length * 2 will always be >=
> 64 anyway, unless we expect clause counts to grow significantly right
> after the threshold, the fixed size avoids the need to reason about
> sizing heuristics and keeps the logic clearer.
>
> It also lines up well with David’s point -- 64 strikes a good balance
> for both memory usage and CPU efficiency. Unless there’s a strong
> reason to favor dynamic sizing, I’d prefer to stick with the fixed 64.

Consider following table and query (based on a similar table in
join.sql, which exercises find_derived_clause_for_ec_member() code
path.
#create table fkest (x integer, x10 integer, x10b integer, x100
integer, unique(x, x10, x100), foreign key (x, x10b, x100) references
fkest(x, x10, x100));
#select 'select * from fkest f1 join ' || string_agg(format('fkest f%s
on (f1.x = f%s.x and f1.x10 = f%s.x10b and f1.x100 = f%s.x100)', i, i,
i , i), ' join ') || ' where f1.x100 = 2' query from
generate_series(2, 100) i; \gset
#explain (summary) :query;

This is a 100-way self-join between foreign key and referenced key and
one of the foreign keys being set to a constant. This exercises the
case of derived clauses with constant EM.

When planning this query, all the 100 derived clauses containing the
constant EM are created first and then they are searched one by one
for every EM. Thus when the hash table is created in
find_derived_clause_for_ec_member()->ec_search_derived_clause_for_ems(),
the length of ec_derives_list is already 100, so the hash table will
be expanded while it's being filled up the first time, if we use
constant 64 as initial hash table size. This can be avoided if we use
list_length() * 2. The pattern for create_join_clause() however is
search then insert - so it will always create the hash table with
initial size 64 right when the list length is 32. Both these cases are
served well if we base the initial hash table size as a multiple of
list_length(ec_derives_list).

For the record, without the patch this query takes about 5800ms on
average for planning on my laptop. With the patch the planning time
reduces to about 5400ms - 6-7% of improvement if my approximate math
is correct.

>
> As David suggested off-list, it seems better to have a static inline
> function for the key canonicalization logic than to duplicate it in
> the insert and lookup paths, as done in your 0004.

Static inline function to fill the key in canonical form is a good
idea. Thanks for the patch.

>
> * Replaces literal values for threshold and initial size with macros,
> defined near the key and entry types.

I don't see this in your attached patches. Am I missing something?
It's still using list_length() for initial hash table size. But with
my explanation above, I believe we will keep it that way.

>
> I wasn’t sure why you removed this comment:
>
> - * We do not attempt a second lookup with EMs swapped when using the hash
> - * table; such clauses are inserted under both orderings at the time of
> - * insertion.

When I read it, I didn't understand why we mentioned a second lookup,
so I dropped it. But now I see that the comment is in the context of
two comparisons being done when using list. I have rephrased the
comment a bit to make this comparison clear.


>
> 0003’s commit message includes a blurb I plan to paste into the main
> patch’s message (with minor tweaks) when we squash the patches.

Slight correction in the first sentence of that blurb message.

Derived clauses are now stored in the ec_derives_hash using canonicalized
keys: the EquivalenceMember with lower memory address is always placed
in em1, and ...

+/*
+ * fill_ec_derives_key
+ * Compute a canonical key for ec_derives_hash lookup or insertion.
+ *
+ * Derived clauses are looked up using a pair of EquivalenceMembers and a
+ * parent EquivalenceClass. To avoid storing or searching for both EM
orderings,
+ * we canonicalize the key:
+ *
+ * - For clauses involving two non-constant EMs, we order the EMs by address
+ * and place the lower one first.
+ * - For clauses involving a constant EM, the caller must pass the non-constant
+ * EM as leftem; we then set em1 = NULL and em2 = leftem.
+ */
+static inline void
+fill_ec_derives_key(ECDerivesKey *key,
+ EquivalenceMember *leftem,
+ EquivalenceMember *rightem,
+ EquivalenceClass *parent_ec)
+{
+ Assert(leftem); /* Always required for lookup or insertion */
+
+ /*
+ * Clauses with a constant EM are always stored and looked up using only
+ * the non-constant EM, with the other key slot set to NULL.
+ */

This comment seems to overlap with what's already there in the
prologue. However "Clauses with a constant EM are always stored and
looked up using the non-constant EM" is non-overlapping part. This
part is also repeated in ec_add_clause_to_derives_hash(), which I
think is a better place for this comment. Changed in the attached
patch.

*/
Assert(!rinfo->left_em->em_is_const);
+ /*
+ * Clauses containing a constant are never considered redundant, so
+ * parent_ec is not set.
+ */
+ Assert(!rinfo->parent_ec || !rinfo->right_em->em_is_const);

These two Assertions are applicable to all the derived clauses but are
added to a function which deals with only hash table. If an EC doesn't
have enough derived clauses to create a hash table these assertions
won't be checked. The reason they are here is because we are using
these properties to create canonical hash table key. Should we move
them to ec_add_derived_clause() for better coverage?

I have also removed some comments repeated in the function prologue
and function body.

0001, 0002 and 0003 are the same as your set. 0004 contains my changes
in a separate patch so that it's easy to review those.

--
Best Wishes,
Ashutosh Bapat
From 49c6c25df01d830f1dc367048a4db013291107c9 Mon Sep 17 00:00:00 2001
From: Amit Langote <amit...@postgresql.org>
Date: Mon, 24 Mar 2025 21:17:46 +0900
Subject: [PATCH 1/4] Add assertion to verify derived clause has constant RHS

find_derived_clause_for_ec_member() searches for a previously-derived
clause that equates a non-constant EquivalenceMember to a constant.
It is only called for EquivalenceClasses with ec_has_const set, and
with a non-constant member as the target.

The matched clause is expected to have the non-constant member on the
left-hand side and the constant EquivalenceMember on the right.

Assert that the RHS is indeed a constant, to catch violations of this
structure and enforce assumptions made by
generate_base_implied_equalities_const().

Author: Ashutosh Bapat <ashutosh.bapat....@gmail.com>
Discussion: https://postgr.es/m/caexhw5scmxyfrqofe6odmbiw2rnvbemeeca-p4w_cyueiku...@mail.gmail.com
---
 src/backend/optimizer/path/equivclass.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 0f9ecf5ee8b..493a95d26cc 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2664,7 +2664,10 @@ find_derived_clause_for_ec_member(EquivalenceClass *ec,
 		 * members on the left side of derived clauses.
 		 */
 		if (rinfo->left_em == em)
+		{
+			Assert(rinfo->right_em->em_is_const);
 			return rinfo;
+		}
 	}
 	return NULL;
 }

base-commit: 764d501d24baab8ead6dc3bf7bb0dbd13ea86084
-- 
2.34.1

From ff721aeed52f15e9aad22f03777e37225ff4183f Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat....@gmail.com>
Date: Tue, 1 Apr 2025 22:13:08 +0530
Subject: [PATCH 4/4] Minor fixes

1. Assert movement
2. grammar fixes
3. removed duplicate comments
4. comment rephrased

Author: Ashutosh Bapat
---
 src/backend/optimizer/path/equivclass.c | 68 +++++++++++--------------
 1 file changed, 30 insertions(+), 38 deletions(-)

diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 12155c0c957..3249f966573 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -90,7 +90,8 @@ static RestrictInfo *ec_search_derived_clause_for_ems(PlannerInfo *root,
 /*
  * Hash key identifying a derived clause.
  *
- * See fill_ec_derives_key() for details on canonical key construction.
+ * The structure is not supposed to be filled manually. Please use
+ * fill_ec_derives_key() to setup key in canonical form.
  */
 typedef struct
 {
@@ -3478,10 +3479,24 @@ ec_build_derives_hash(PlannerInfo *root, EquivalenceClass *ec)
  *		Add a clause to the set of derived clauses for the given
  *		EquivalenceClass. Always appends to ec_derives_list; also adds
  *		to ec_derives_hash if it exists.
+ *
+ * While at it, also check some derived clause invariants.
  */
 static void
 ec_add_derived_clause(EquivalenceClass *ec, RestrictInfo *clause)
 {
+	/*
+	 * Constant, if present, is always placed on the RHS; see
+	 * generate_base_implied_equalities_const(). LHS is never a constant.
+	 */
+	Assert(!clause->left_em->em_is_const);
+
+	/*
+	 * Clauses containing a constant are never considered redundant, so
+	 * parent_ec is not set.
+	 */
+	Assert(!clause->parent_ec || !clause->right_em->em_is_const);
+
 	ec->ec_derives_list = lappend(ec->ec_derives_list, clause);
 	if (ec->ec_derives_hash)
 		ec_add_clause_to_derives_hash(ec, clause);
@@ -3492,8 +3507,10 @@ ec_add_derived_clause(EquivalenceClass *ec, RestrictInfo *clause)
  *		Add a list of clauses to the set of clauses derived from the given
  *		EquivalenceClass; adding to the list and hash table if needed.
  *
- * This function is similar to above function but optimized for adding multiple
- * clauses at a time to the ec_derives_list.
+ * This function is similar to ec_add_derived_clause() but optimized for adding
+ * multiple clauses at a time to the ec_derives_list. However we don't need to
+ * repeat the Asserts in that function since they must have been applied when
+ * the clauses were inserted in the derived clause list from where they came.
  */
 static void
 ec_add_derived_clauses(EquivalenceClass *ec, List *clauses)
@@ -3506,16 +3523,16 @@ ec_add_derived_clauses(EquivalenceClass *ec, List *clauses)
 
 /*
  * fill_ec_derives_key
- *		Compute a canonical key for ec_derives_hash lookup or insertion.
+ *		Setup canonical key for ec_derives_hash lookup or insertion.
  *
  * Derived clauses are looked up using a pair of EquivalenceMembers and a
  * parent EquivalenceClass. To avoid storing or searching for both EM orderings,
  * we canonicalize the key:
  *
- * - For clauses involving two non-constant EMs, we order the EMs by address
- *   and place the lower one first.
+ * - For clauses involving two non-constant EMs, em1 is set to the EM with lower
+ *   memory address and em2 is set to the other one.
  * - For clauses involving a constant EM, the caller must pass the non-constant
- *   EM as leftem; we then set em1 = NULL and em2 = leftem.
+ *   EM as leftem and NULL as rightem; we then set em1 = NULL and em2 = leftem.
  */
 static inline void
 fill_ec_derives_key(ECDerivesKey *key,
@@ -3525,10 +3542,6 @@ fill_ec_derives_key(ECDerivesKey *key,
 {
 	Assert(leftem);				/* Always required for lookup or insertion */
 
-	/*
-	 * Clauses with a constant EM are always stored and looked up using only
-	 * the non-constant EM, with the other key slot set to NULL.
-	 */
 	if (rightem == NULL)
 	{
 		key->em1 = NULL;
@@ -3549,9 +3562,9 @@ fill_ec_derives_key(ECDerivesKey *key,
 
 /*
  * ec_add_clause_to_derives_hash
- *		Add a derived clause to ec_derives_hash for the given EquivalenceClass.
+ *		Add a derived clause to ec_derives_hash in the given EquivalenceClass.
  *
- * Each clause is inserted under a canonicalized key. For constant-containing
+ * Each clause is associated with a canonicalized key. For constant-containing
  * clauses, only the non-constant EM is used for lookup; see comments in
  * fill_ec_derives_key().
  */
@@ -3562,23 +3575,6 @@ ec_add_clause_to_derives_hash(EquivalenceClass *ec, RestrictInfo *rinfo)
 	ECDerivesEntry *entry;
 	bool		found;
 
-	/*
-	 * Constants are always placed on the RHS; see
-	 * generate_base_implied_equalities_const().
-	 */
-	Assert(!rinfo->left_em->em_is_const);
-
-	/*
-	 * Clauses containing a constant are never considered redundant, so
-	 * parent_ec is not set.
-	 */
-	Assert(!rinfo->parent_ec || !rinfo->right_em->em_is_const);
-
-	/*
-	 * See fill_ec_derives_key() for details: we use a canonicalized key to
-	 * avoid storing both EM orderings. For constant EMs, only the
-	 * non-constant EM is included in the key.
-	 */
 	fill_ec_derives_key(&key,
 						rinfo->left_em,
 						rinfo->right_em->em_is_const ? NULL : rinfo->right_em,
@@ -3654,11 +3650,12 @@ ec_search_clause_for_ems(PlannerInfo *root, EquivalenceClass *ec,
  * lookup; otherwise, scan ec_derives_list linearly.
  *
  * Clauses involving constants are looked up using only the non-constant EM
- * (leftem), with rightem set to NULL. In that case, we expect to find a
+ * passed as leftem and rightem set to NULL. In that case, we expect to find a
  * clause with a constant on the RHS.
  *
- * We do not attempt a second lookup with EMs swapped, because the key is
- * canonicalized during both insertion and lookup.
+ * While looking up the derived clause in the list, we match each given EM with
+ * both sides of the clause respectively.  But hash table look up is carried out
+ * only once using a canonicalized key.
  */
 static RestrictInfo *
 ec_search_derived_clause_for_ems(PlannerInfo *root, EquivalenceClass *ec,
@@ -3686,11 +3683,6 @@ ec_search_derived_clause_for_ems(PlannerInfo *root, EquivalenceClass *ec,
 		{
 			rinfo = entry->rinfo;
 			Assert(rinfo);
-
-			/*
-			 * If this is a lookup in a const-containing EC, the RHS must be a
-			 * constant. The caller signals this by passing NULL for rightem.
-			 */
 			Assert(rightem || rinfo->right_em->em_is_const);
 			return rinfo;
 		}
-- 
2.34.1

From d13e8975522ec8b90b33bd0749c370b3ac8ac196 Mon Sep 17 00:00:00 2001
From: Amit Langote <amit...@postgresql.org>
Date: Tue, 25 Mar 2025 20:36:26 +0900
Subject: [PATCH 2/4] Make derived clause lookup in EquivalenceClass more
 efficient

Previously, derived clauses were stored in ec_derives, a List of
RestrictInfos. These clauses are looked up later using the left and
right EquivalenceMembers and the clause's parent EC, typically during
join clause generation.

This lookup becomes expensive in queries with many partitions or
joins, where ec_derives may contain thousands of entries. In
particular, create_join_clause() can spend significant time scanning
this list.

To improve performance, a hash table (ec_derives_hash) is now built
when ec_derives_list exceeds 32 entries.  This is the same threshold
used for join_rel_hash. To reflect this dual structure, the
ec_derives field is renamed to ec_derives_list. The list is retained
alongside the hash table to support _outEquivalenceClass() and
EquivalenceClass merging, which both operate on ec_derives_list.

Lookup logic is updated to consult the hash table when available. For
ECs with a constant EM, lookups are done using only the non-constant
EM; the constant EM is assumed to be unique and may not be available
at lookup time. In such cases, the hash table stores em2 = NULL, which
can be matched by using NULL for em2 in the lookup key.

An assertion originally placed in find_derived_clause_for_ec_member()
is moved into ec_search_derived_clause_for_ems() to enforce the same
invariant in both hash and list lookup paths.

The new ec_clear_derived_clauses() always frees ec_derives_list, even
though some of the original code paths that cleared the old ec_derives
field did not. This ensures consistent cleanup and avoids leaking
memory when the list grows large.

Author: Ashutosh Bapat <ashutosh.bapat....@gmail.com>
Reviewed-by: Amit Langote <amitlangot...@gmail.com>
Tested-by: Dmitry Dolgov <9erthali...@gmail.com>
Tested-by: Alvaro Herrera <alvhe...@alvh.no-ip.org>
Tested-by: Amit Langote <amitlangot...@gmail.com>
Discussion: https://postgr.es/m/caexhw5vziqtwu6moszlp5iz8glx_zaubgex0dxglx9pgwct...@mail.gmail.com
---
 src/backend/nodes/outfuncs.c              |   3 +-
 src/backend/optimizer/path/costsize.c     |   3 +-
 src/backend/optimizer/path/equivclass.c   | 403 ++++++++++++++++++----
 src/backend/optimizer/plan/analyzejoins.c |   5 +-
 src/include/nodes/pathnodes.h             |  17 +-
 src/include/optimizer/paths.h             |   4 +-
 src/tools/pgindent/typedefs.list          |   2 +
 7 files changed, 357 insertions(+), 80 deletions(-)

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index bb9bdd67192..557f06e344f 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -467,7 +467,8 @@ _outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
 	WRITE_OID_FIELD(ec_collation);
 	WRITE_NODE_FIELD(ec_members);
 	WRITE_NODE_FIELD(ec_sources);
-	WRITE_NODE_FIELD(ec_derives);
+	/* Only ec_derives_list is written; hash is not serialized. */
+	WRITE_NODE_FIELD(ec_derives_list);
 	WRITE_BITMAPSET_FIELD(ec_relids);
 	WRITE_BOOL_FIELD(ec_has_const);
 	WRITE_BOOL_FIELD(ec_has_volatile);
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index f6f77b8fe19..c474c7a06af 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -5848,7 +5848,8 @@ get_foreign_key_join_selectivity(PlannerInfo *root,
 				if (ec && ec->ec_has_const)
 				{
 					EquivalenceMember *em = fkinfo->fk_eclass_member[i];
-					RestrictInfo *rinfo = find_derived_clause_for_ec_member(ec,
+					RestrictInfo *rinfo = find_derived_clause_for_ec_member(root,
+																			ec,
 																			em);
 
 					if (rinfo)
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 493a95d26cc..b988db4d7b3 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -20,6 +20,7 @@
 
 #include "access/stratnum.h"
 #include "catalog/pg_type.h"
+#include "common/hashfn.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/appendinfo.h"
@@ -72,7 +73,50 @@ static Bitmapset *get_eclass_indexes_for_relids(PlannerInfo *root,
 												Relids relids);
 static Bitmapset *get_common_eclass_indexes(PlannerInfo *root, Relids relids1,
 											Relids relids2);
+static void ec_build_derives_hash(PlannerInfo *root, EquivalenceClass *ec);
+static void ec_add_derived_clauses(EquivalenceClass *ec, List *clauses);
+static void ec_add_derived_clause(EquivalenceClass *ec, RestrictInfo *clause);
+static void ec_add_clause_to_derives_hash(EquivalenceClass *ec, RestrictInfo *rinfo);
+static RestrictInfo *ec_search_clause_for_ems(PlannerInfo *root, EquivalenceClass *ec,
+											  EquivalenceMember *leftem,
+											  EquivalenceMember *rightem,
+											  EquivalenceClass *parent_ec);
+static RestrictInfo *ec_search_derived_clause_for_ems(PlannerInfo *root,
+													  EquivalenceClass *ec,
+													  EquivalenceMember *leftem,
+													  EquivalenceMember *rightem,
+													  EquivalenceClass *parent_ec);
 
+/*
+ * Hash key identifying a derived clause by EquivalenceMembers and parent EC.
+ */
+typedef struct
+{
+	EquivalenceMember *em1;
+	EquivalenceMember *em2;
+	EquivalenceClass *parent_ec;
+} ECDerivesKey;
+
+/* Hash table entry in ec_derives_hash. */
+typedef struct
+{
+	uint32		status;
+	ECDerivesKey key;
+	RestrictInfo *rinfo;
+} ECDerivesEntry;
+
+#define SH_PREFIX               derives
+#define SH_ELEMENT_TYPE			ECDerivesEntry
+#define SH_KEY_TYPE             ECDerivesKey
+#define SH_KEY                  key
+#define SH_HASH_KEY(tb, key)	\
+	hash_bytes((const unsigned char *) &(key), sizeof(ECDerivesKey))
+#define SH_EQUAL(tb, a, b)	\
+	((a).em1 == (b).em1 && (a).em2 == (b).em2 && (a).parent_ec == (b).parent_ec)
+#define SH_SCOPE                static inline
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
 
 /*
  * process_equivalence
@@ -342,7 +386,12 @@ process_equivalence(PlannerInfo *root,
 		 */
 		ec1->ec_members = list_concat(ec1->ec_members, ec2->ec_members);
 		ec1->ec_sources = list_concat(ec1->ec_sources, ec2->ec_sources);
-		ec1->ec_derives = list_concat(ec1->ec_derives, ec2->ec_derives);
+
+		/*
+		 * Appends ec2's derived clauses to ec1->ec_derives_list and adds them
+		 * to ec1->ec_derives_hash if present.
+		 */
+		ec_add_derived_clauses(ec1, ec2->ec_derives_list);
 		ec1->ec_relids = bms_join(ec1->ec_relids, ec2->ec_relids);
 		ec1->ec_has_const |= ec2->ec_has_const;
 		/* can't need to set has_volatile */
@@ -355,7 +404,7 @@ process_equivalence(PlannerInfo *root,
 		/* just to avoid debugging confusion w/ dangling pointers: */
 		ec2->ec_members = NIL;
 		ec2->ec_sources = NIL;
-		ec2->ec_derives = NIL;
+		ec_clear_derived_clauses(ec2);
 		ec2->ec_relids = NULL;
 		ec1->ec_sources = lappend(ec1->ec_sources, restrictinfo);
 		ec1->ec_min_security = Min(ec1->ec_min_security,
@@ -412,7 +461,8 @@ process_equivalence(PlannerInfo *root,
 		ec->ec_collation = collation;
 		ec->ec_members = NIL;
 		ec->ec_sources = list_make1(restrictinfo);
-		ec->ec_derives = NIL;
+		ec->ec_derives_list = NIL;
+		ec->ec_derives_hash = NULL;
 		ec->ec_relids = NULL;
 		ec->ec_has_const = false;
 		ec->ec_has_volatile = false;
@@ -671,7 +721,8 @@ get_eclass_for_sort_expr(PlannerInfo *root,
 	newec->ec_collation = collation;
 	newec->ec_members = NIL;
 	newec->ec_sources = NIL;
-	newec->ec_derives = NIL;
+	newec->ec_derives_list = NIL;
+	newec->ec_derives_hash = NULL;
 	newec->ec_relids = NULL;
 	newec->ec_has_const = false;
 	newec->ec_has_volatile = contain_volatile_functions((Node *) expr);
@@ -1026,8 +1077,8 @@ relation_can_be_sorted_early(PlannerInfo *root, RelOptInfo *rel,
  * scanning of the quals and before Path construction begins.
  *
  * We make no attempt to avoid generating duplicate RestrictInfos here: we
- * don't search ec_sources or ec_derives for matches.  It doesn't really
- * seem worth the trouble to do so.
+ * don't search existing source or derived clauses in the EC for matches.  It
+ * doesn't really seem worth the trouble to do so.
  */
 void
 generate_base_implied_equalities(PlannerInfo *root)
@@ -1188,11 +1239,11 @@ generate_base_implied_equalities_const(PlannerInfo *root,
 
 		/*
 		 * If the clause didn't degenerate to a constant, fill in the correct
-		 * markings for a mergejoinable clause, and save it in ec_derives. (We
-		 * will not re-use such clauses directly, but selectivity estimation
-		 * may consult the list later.  Note that this use of ec_derives does
-		 * not overlap with its use for join clauses, since we never generate
-		 * join clauses from an ec_has_const eclass.)
+		 * markings for a mergejoinable clause, and save it as a derived
+		 * clause. (We will not re-use such clauses directly, but selectivity
+		 * estimation may consult those later.  Note that this use of derived
+		 * clauses does not overlap with its use for join clauses, since we
+		 * never generate join clauses from an ec_has_const eclass.)
 		 */
 		if (rinfo && rinfo->mergeopfamilies)
 		{
@@ -1200,7 +1251,7 @@ generate_base_implied_equalities_const(PlannerInfo *root,
 			rinfo->left_ec = rinfo->right_ec = ec;
 			rinfo->left_em = cur_em;
 			rinfo->right_em = const_em;
-			ec->ec_derives = lappend(ec->ec_derives, rinfo);
+			ec_add_derived_clause(ec, rinfo);
 		}
 	}
 }
@@ -1265,10 +1316,10 @@ generate_base_implied_equalities_no_const(PlannerInfo *root,
 
 			/*
 			 * If the clause didn't degenerate to a constant, fill in the
-			 * correct markings for a mergejoinable clause.  We don't put it
-			 * in ec_derives however; we don't currently need to re-find such
-			 * clauses, and we don't want to clutter that list with non-join
-			 * clauses.
+			 * correct markings for a mergejoinable clause.  We don't record
+			 * it as a derived clause, since we don't currently need to
+			 * re-find such clauses, and don't want to clutter the
+			 * derived-clause set with non-join clauses.
 			 */
 			if (rinfo && rinfo->mergeopfamilies)
 			{
@@ -1369,7 +1420,7 @@ generate_base_implied_equalities_broken(PlannerInfo *root,
  * we consider different join paths, we avoid generating multiple copies:
  * whenever we select a particular pair of EquivalenceMembers to join,
  * we check to see if the pair matches any original clause (in ec_sources)
- * or previously-built clause (in ec_derives).  This saves memory and allows
+ * or previously-built derived clause.  This saves memory and allows
  * re-use of information cached in RestrictInfos.  We also avoid generating
  * commutative duplicates, i.e. if the algorithm selects "a.x = b.y" but
  * we already have "b.y = a.x", we return the existing clause.
@@ -1754,9 +1805,9 @@ generate_join_implied_equalities_broken(PlannerInfo *root,
 	/*
 	 * If we have to translate, just brute-force apply adjust_appendrel_attrs
 	 * to all the RestrictInfos at once.  This will result in returning
-	 * RestrictInfos that are not listed in ec_derives, but there shouldn't be
-	 * any duplication, and it's a sufficiently narrow corner case that we
-	 * shouldn't sweat too much over it anyway.
+	 * RestrictInfos that are not included in EC's derived clauses, but there
+	 * shouldn't be any duplication, and it's a sufficiently narrow corner
+	 * case that we shouldn't sweat too much over it anyway.
 	 *
 	 * Since inner_rel might be an indirect descendant of the baserel
 	 * mentioned in the ec_sources clauses, we have to be prepared to apply
@@ -1823,43 +1874,11 @@ create_join_clause(PlannerInfo *root,
 {
 	RestrictInfo *rinfo;
 	RestrictInfo *parent_rinfo = NULL;
-	ListCell   *lc;
 	MemoryContext oldcontext;
 
-	/*
-	 * Search to see if we already built a RestrictInfo for this pair of
-	 * EquivalenceMembers.  We can use either original source clauses or
-	 * previously-derived clauses, and a commutator clause is acceptable.
-	 *
-	 * We used to verify that opno matches, but that seems redundant: even if
-	 * it's not identical, it'd better have the same effects, or the operator
-	 * families we're using are broken.
-	 */
-	foreach(lc, ec->ec_sources)
-	{
-		rinfo = (RestrictInfo *) lfirst(lc);
-		if (rinfo->left_em == leftem &&
-			rinfo->right_em == rightem &&
-			rinfo->parent_ec == parent_ec)
-			return rinfo;
-		if (rinfo->left_em == rightem &&
-			rinfo->right_em == leftem &&
-			rinfo->parent_ec == parent_ec)
-			return rinfo;
-	}
-
-	foreach(lc, ec->ec_derives)
-	{
-		rinfo = (RestrictInfo *) lfirst(lc);
-		if (rinfo->left_em == leftem &&
-			rinfo->right_em == rightem &&
-			rinfo->parent_ec == parent_ec)
-			return rinfo;
-		if (rinfo->left_em == rightem &&
-			rinfo->right_em == leftem &&
-			rinfo->parent_ec == parent_ec)
-			return rinfo;
-	}
+	rinfo = ec_search_clause_for_ems(root, ec, leftem, rightem, parent_ec);
+	if (rinfo)
+		return rinfo;
 
 	/*
 	 * Not there, so build it, in planner context so we can re-use it. (Not
@@ -1923,7 +1942,7 @@ create_join_clause(PlannerInfo *root,
 	rinfo->left_em = leftem;
 	rinfo->right_em = rightem;
 	/* and save it for possible re-use */
-	ec->ec_derives = lappend(ec->ec_derives, rinfo);
+	ec_add_derived_clause(ec, rinfo);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -2648,28 +2667,14 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
  * Returns NULL if no such clause can be found.
  */
 RestrictInfo *
-find_derived_clause_for_ec_member(EquivalenceClass *ec,
+find_derived_clause_for_ec_member(PlannerInfo *root,
+								  EquivalenceClass *ec,
 								  EquivalenceMember *em)
 {
-	ListCell   *lc;
-
 	Assert(ec->ec_has_const);
 	Assert(!em->em_is_const);
-	foreach(lc, ec->ec_derives)
-	{
-		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
 
-		/*
-		 * generate_base_implied_equalities_const will have put non-const
-		 * members on the left side of derived clauses.
-		 */
-		if (rinfo->left_em == em)
-		{
-			Assert(rinfo->right_em->em_is_const);
-			return rinfo;
-		}
-	}
-	return NULL;
+	return ec_search_derived_clause_for_ems(root, ec, em, NULL, NULL);
 }
 
 
@@ -3442,3 +3447,255 @@ get_common_eclass_indexes(PlannerInfo *root, Relids relids1, Relids relids2)
 	/* Calculate and return the common EC indexes, recycling the left input. */
 	return bms_int_members(rel1ecs, rel2ecs);
 }
+
+/*
+ * ec_build_derives_hash
+ *	  Construct the auxiliary hash table for derived clauses.
+ */
+static void
+ec_build_derives_hash(PlannerInfo *root, EquivalenceClass *ec)
+{
+	Assert(!ec->ec_derives_hash);
+
+	/* Create the hash table */
+	ec->ec_derives_hash = derives_create(root->planner_cxt, 256, NULL);
+
+	foreach_node(RestrictInfo, rinfo, ec->ec_derives_list)
+		ec_add_clause_to_derives_hash(ec, rinfo);
+}
+
+/*
+ * ec_add_derived_clause
+ *		Add a clause to the set of derived clauses for the given
+ *		EquivalenceClass. Always appends to ec_derives_list; also adds
+ *		to ec_derives_hash if it exists.
+ */
+static void
+ec_add_derived_clause(EquivalenceClass *ec, RestrictInfo *clause)
+{
+	ec->ec_derives_list = lappend(ec->ec_derives_list, clause);
+	if (ec->ec_derives_hash)
+		ec_add_clause_to_derives_hash(ec, clause);
+}
+
+/*
+ * ec_add_derived_clauses
+ *		Add a list of clauses to the set of clauses derived from the given
+ *		EquivalenceClass; adding to the list and hash table if needed.
+ *
+ * This function is similar to above function but optimized for adding multiple
+ * clauses at a time to the ec_derives_list.
+ */
+static void
+ec_add_derived_clauses(EquivalenceClass *ec, List *clauses)
+{
+	ec->ec_derives_list = list_concat(ec->ec_derives_list, clauses);
+	if (ec->ec_derives_hash)
+		foreach_node(RestrictInfo, rinfo, clauses)
+			ec_add_clause_to_derives_hash(ec, rinfo);
+}
+
+/*
+ * ec_add_clause_to_derives_hash
+ *      Add a derived clause to the ec_derives_hash of the given
+ *      EquivalenceClass.
+ */
+static void
+ec_add_clause_to_derives_hash(EquivalenceClass *ec, RestrictInfo *rinfo)
+{
+	ECDerivesKey key;
+	ECDerivesEntry *entry;
+	bool		found;
+
+	/*
+	 * Constants are always placed on the RHS; see
+	 * generate_base_implied_equalities_const().
+	 */
+	Assert(!rinfo->left_em->em_is_const);
+
+	if (rinfo->right_em->em_is_const)
+	{
+		/*
+		 * Clauses containing a constant are never considered redundant, so
+		 * parent_ec is not set.
+		 */
+		Assert(!rinfo->parent_ec);
+
+		/*
+		 * find_derived_clause_for_ec_member() performs lookup of a clause
+		 * involving a constant using only the non-constant EM and NULL for
+		 * the RHS. Since there's only one constant EM per EC, we don't need
+		 * to store or match it during lookup.  We set key.em2 = NULL to
+		 * reflect this.
+		 */
+		key.em1 = rinfo->left_em;
+		key.em2 = NULL;
+		key.parent_ec = rinfo->parent_ec;
+		entry = derives_insert(ec->ec_derives_hash, key, &found);
+		Assert(!found);
+		entry->rinfo = rinfo;
+	}
+	else
+	{
+		key.em1 = rinfo->left_em;
+		key.em2 = rinfo->right_em;
+		key.parent_ec = rinfo->parent_ec;
+		entry = derives_insert(ec->ec_derives_hash, key, &found);
+		Assert(!found);
+		entry->rinfo = rinfo;
+
+		/*
+		 * Insert the clause under the given EM pair key, and also under the
+		 * reverse order. This ensures we can find the clause regardless of
+		 * the order in which EMs are passed to the lookup function.
+		 */
+		key.em1 = rinfo->right_em;
+		key.em2 = rinfo->left_em;
+		key.parent_ec = rinfo->parent_ec;
+		entry = derives_insert(ec->ec_derives_hash, key, &found);
+		Assert(!found);
+		entry->rinfo = rinfo;
+	}
+}
+
+/*
+ * ec_clear_derived_clauses
+ *      Reset ec_derives_list and ec_derives_hash.
+ *
+ * We destroy the hash table explicitly, since it may consume significant
+ * space. The list holds the same set of entries and can become equally large
+ * when thousands of partitions are involved, so we free it as well -- even
+ * though we do not typically free lists.
+ */
+void
+ec_clear_derived_clauses(EquivalenceClass *ec)
+{
+	list_free(ec->ec_derives_list);
+	ec->ec_derives_list = NIL;
+
+	if (ec->ec_derives_hash)
+	{
+		derives_destroy(ec->ec_derives_hash);
+		ec->ec_derives_hash = NULL;
+	}
+}
+
+/*
+ * ec_search_clause_for_ems
+ *		Search for an existing RestrictInfo that equates the given pair
+ *		of EquivalenceMembers, either from ec_sources or ec_derives.
+ *
+ * We accept clauses in either operand order, so commutators are matched. We
+ * used to require matching operator OIDs, but dropped that since any
+ * semantically different operator here would indicate a broken operator
+ * family.
+ *
+ * Returns NULL if no matching clause is found.
+ */
+
+static RestrictInfo *
+ec_search_clause_for_ems(PlannerInfo *root, EquivalenceClass *ec,
+						 EquivalenceMember *leftem, EquivalenceMember *rightem,
+						 EquivalenceClass *parent_ec)
+{
+	/* Check original source clauses */
+	foreach_node(RestrictInfo, rinfo, ec->ec_sources)
+	{
+		if (rinfo->left_em == leftem &&
+			rinfo->right_em == rightem &&
+			rinfo->parent_ec == parent_ec)
+			return rinfo;
+		if (rinfo->left_em == rightem &&
+			rinfo->right_em == leftem &&
+			rinfo->parent_ec == parent_ec)
+			return rinfo;
+	}
+
+	/* Not found in ec_sources; search derived clauses */
+	return ec_search_derived_clause_for_ems(root, ec, leftem, rightem,
+											parent_ec);
+}
+
+/*
+ * ec_search_derived_clause_for_ems
+ *		Search for an existing derived clause between two EquivalenceMembers.
+ *
+ * If the number of derived clauses exceeds a threshold, switch to hash table
+ * lookup; otherwise, scan ec_derives_list linearly.
+ *
+ * Clauses involving constants are looked up using only the non-const EM
+ * (leftem) and a NULL rightem. In that case, we expect to find a clause with
+ * a constant on the RHS.
+ *
+ * We do not attempt a second lookup with EMs swapped when using the hash
+ * table; such clauses are inserted under both orderings at the time of
+ * insertion.
+ */
+static RestrictInfo *
+ec_search_derived_clause_for_ems(PlannerInfo *root, EquivalenceClass *ec,
+								 EquivalenceMember *leftem,
+								 EquivalenceMember *rightem,
+								 EquivalenceClass *parent_ec)
+{
+	/*
+	 * Switch to using hash lookup when list grows "too long". The threshold
+	 * is arbitrary and is known only here.
+	 */
+	if (!ec->ec_derives_hash && list_length(ec->ec_derives_list) >= 32)
+		ec_build_derives_hash(root, ec);
+
+	/* Perform hash table lookup if available */
+	if (ec->ec_derives_hash)
+	{
+		ECDerivesKey key;
+		RestrictInfo *rinfo;
+		ECDerivesEntry *entry;
+
+		/*
+		 * See ec_add_clause_to_derives_hash() for rationale: derived clauses
+		 * are inserted into the hash table under both (em1, em2) and (em2,
+		 * em1), so a single lookup with the original order is sufficient.
+		 */
+		key.em1 = leftem;
+		key.em2 = rightem;
+		key.parent_ec = parent_ec;
+		entry = derives_lookup(ec->ec_derives_hash, key);
+		if (entry)
+		{
+			rinfo = entry->rinfo;
+			Assert(rinfo);
+
+			/*
+			 * If this is a lookup in a const-containing EC, the RHS must be a
+			 * constant. The caller signals this by passing NULL for rightem.
+			 */
+			Assert(rightem || rinfo->right_em->em_is_const);
+			return rinfo;
+		}
+	}
+	else
+	{
+		/* Fallback to linear search over ec_derives_list */
+		foreach_node(RestrictInfo, rinfo, ec->ec_derives_list)
+		{
+			/* Handle special case: lookup by non-const EM alone */
+			if (!rightem &&
+				rinfo->left_em == leftem)
+			{
+				/* See the comment above in hash path for rationale. */
+				Assert(rinfo->right_em->em_is_const);
+				return rinfo;
+			}
+			if (rinfo->left_em == leftem &&
+				rinfo->right_em == rightem &&
+				rinfo->parent_ec == parent_ec)
+				return rinfo;
+			if (rinfo->left_em == rightem &&
+				rinfo->right_em == leftem &&
+				rinfo->parent_ec == parent_ec)
+				return rinfo;
+		}
+	}
+
+	return NULL;
+}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 8a8d4a2af33..ae20691ca91 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -749,7 +749,7 @@ remove_rel_from_eclass(EquivalenceClass *ec, SpecialJoinInfo *sjinfo,
 	 * drop them.  (At this point, any such clauses would be base restriction
 	 * clauses, which we'd not need anymore anyway.)
 	 */
-	ec->ec_derives = NIL;
+	ec_clear_derived_clauses(ec);
 }
 
 /*
@@ -1544,8 +1544,7 @@ update_eclasses(EquivalenceClass *ec, int from, int to)
 	list_free(ec->ec_members);
 	ec->ec_members = new_members;
 
-	list_free(ec->ec_derives);
-	ec->ec_derives = NULL;
+	ec_clear_derived_clauses(ec);
 
 	/* Update EC source expressions */
 	foreach_node(RestrictInfo, rinfo, ec->ec_sources)
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index c24a1fc8514..e29d26a1814 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1403,6 +1403,18 @@ typedef struct JoinDomain
  * entry: consider SELECT random() AS a, random() AS b ... ORDER BY b,a.
  * So we record the SortGroupRef of the originating sort clause.
  *
+ * Derived equality clauses are stored in ec_derives_list. For small queries,
+ * this list is scanned directly during lookup. For larger queries -- e.g.,
+ * with many partitions or joins -- a hash table (ec_derives_hash) is built
+ * when the list grows beyond a threshold, for faster lookup. When present,
+ * the hash table contains the same RestrictInfos and is maintained alongside
+ * the list. We retain the list even when the hash is used to simplify
+ * serialization (e.g., in _outEquivalenceClass()) and support
+ * EquivalenceClass merging.
+ *
+ * In contrast, ec_sources holds equality clauses that appear directly in the
+ * query. These are typically few and do not require a hash table for lookup.
+ *
  * NB: if ec_merged isn't NULL, this class has been merged into another, and
  * should be ignored in favor of using the pointed-to class.
  *
@@ -1422,7 +1434,10 @@ typedef struct EquivalenceClass
 	Oid			ec_collation;	/* collation, if datatypes are collatable */
 	List	   *ec_members;		/* list of EquivalenceMembers */
 	List	   *ec_sources;		/* list of generating RestrictInfos */
-	List	   *ec_derives;		/* list of derived RestrictInfos */
+	List	   *ec_derives_list;	/* list of derived RestrictInfos */
+	struct derives_hash *ec_derives_hash;	/* optional hash table for fast
+											 * lookup; contains same
+											 * RestrictInfos as list */
 	Relids		ec_relids;		/* all relids appearing in ec_members, except
 								 * for child members (see below) */
 	bool		ec_has_const;	/* any pseudoconstants in ec_members? */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index bc5dfd7db41..16dc4d5ee82 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -167,7 +167,8 @@ extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2,
 extern EquivalenceClass *match_eclasses_to_foreign_key_col(PlannerInfo *root,
 														   ForeignKeyOptInfo *fkinfo,
 														   int colno);
-extern RestrictInfo *find_derived_clause_for_ec_member(EquivalenceClass *ec,
+extern RestrictInfo *find_derived_clause_for_ec_member(PlannerInfo *root,
+													   EquivalenceClass *ec,
 													   EquivalenceMember *em);
 extern void add_child_rel_equivalences(PlannerInfo *root,
 									   AppendRelInfo *appinfo,
@@ -197,6 +198,7 @@ extern bool eclass_useful_for_merging(PlannerInfo *root,
 extern bool is_redundant_derived_clause(RestrictInfo *rinfo, List *clauselist);
 extern bool is_redundant_with_indexclauses(RestrictInfo *rinfo,
 										   List *indexclauses);
+extern void ec_clear_derived_clauses(EquivalenceClass *ec);
 
 /*
  * pathkeys.c
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b66cecd8799..997abe1427c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -667,6 +667,8 @@ DumpableObjectWithAcl
 DynamicFileList
 DynamicZoneAbbrev
 EC_KEY
+ECDerivesKey
+ECDerivesEntry
 EDGE
 ENGINE
 EOM_flatten_into_method
-- 
2.34.1

From aeb2e9e7058dc90851123185135176ebfd24317d Mon Sep 17 00:00:00 2001
From: Amit Langote <amit...@postgresql.org>
Date: Tue, 1 Apr 2025 16:06:48 +0900
Subject: [PATCH 3/4] Canonicalize keys for derived clause hash table

Derived clauses are now stored in the ec_derives_hash using canonicalized
keys: the lower-addressed EquivalenceMember is always placed in em1, and
the higher-addressed one in em2. This avoids storing or searching for
both permutations of the same clause and eliminates the need for
redundant lookups.

For clauses involving a constant EM, only the non-constant EM is included
in the key; em1 is set to NULL and em2 to the non-constant EM.

The initial hash table size is set to twice the length of ec_derives_list
to avoid immediate resizing.

This design follows suggestions by David Rowley, who proposed both the
key canonicalization to reduce entry count and guidance on sizing the
hash table to balance memory usage and CPU efficiency.
---
 src/backend/optimizer/path/equivclass.c | 146 ++++++++++++++----------
 1 file changed, 84 insertions(+), 62 deletions(-)

diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index b988db4d7b3..12155c0c957 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -88,7 +88,9 @@ static RestrictInfo *ec_search_derived_clause_for_ems(PlannerInfo *root,
 													  EquivalenceClass *parent_ec);
 
 /*
- * Hash key identifying a derived clause by EquivalenceMembers and parent EC.
+ * Hash key identifying a derived clause.
+ *
+ * See fill_ec_derives_key() for details on canonical key construction.
  */
 typedef struct
 {
@@ -3450,15 +3452,22 @@ get_common_eclass_indexes(PlannerInfo *root, Relids relids1, Relids relids2)
 
 /*
  * ec_build_derives_hash
- *	  Construct the auxiliary hash table for derived clauses.
+ *	  Construct the auxiliary hash table for derived clause lookups.
  */
 static void
 ec_build_derives_hash(PlannerInfo *root, EquivalenceClass *ec)
 {
 	Assert(!ec->ec_derives_hash);
 
-	/* Create the hash table */
-	ec->ec_derives_hash = derives_create(root->planner_cxt, 256, NULL);
+	/*
+	 * Create the hash table.
+	 *
+	 * We set the initial size to twice the number of clauses in
+	 * ec_derives_list to avoid immediate resizing during insertion.
+	 */
+	ec->ec_derives_hash = derives_create(root->planner_cxt,
+										 list_length(ec->ec_derives_list) * 2,
+										 NULL);
 
 	foreach_node(RestrictInfo, rinfo, ec->ec_derives_list)
 		ec_add_clause_to_derives_hash(ec, rinfo);
@@ -3495,10 +3504,56 @@ ec_add_derived_clauses(EquivalenceClass *ec, List *clauses)
 			ec_add_clause_to_derives_hash(ec, rinfo);
 }
 
+/*
+ * fill_ec_derives_key
+ *		Compute a canonical key for ec_derives_hash lookup or insertion.
+ *
+ * Derived clauses are looked up using a pair of EquivalenceMembers and a
+ * parent EquivalenceClass. To avoid storing or searching for both EM orderings,
+ * we canonicalize the key:
+ *
+ * - For clauses involving two non-constant EMs, we order the EMs by address
+ *   and place the lower one first.
+ * - For clauses involving a constant EM, the caller must pass the non-constant
+ *   EM as leftem; we then set em1 = NULL and em2 = leftem.
+ */
+static inline void
+fill_ec_derives_key(ECDerivesKey *key,
+					EquivalenceMember *leftem,
+					EquivalenceMember *rightem,
+					EquivalenceClass *parent_ec)
+{
+	Assert(leftem);				/* Always required for lookup or insertion */
+
+	/*
+	 * Clauses with a constant EM are always stored and looked up using only
+	 * the non-constant EM, with the other key slot set to NULL.
+	 */
+	if (rightem == NULL)
+	{
+		key->em1 = NULL;
+		key->em2 = leftem;
+	}
+	else if (leftem < rightem)
+	{
+		key->em1 = leftem;
+		key->em2 = rightem;
+	}
+	else
+	{
+		key->em1 = rightem;
+		key->em2 = leftem;
+	}
+	key->parent_ec = parent_ec;
+}
+
 /*
  * ec_add_clause_to_derives_hash
- *      Add a derived clause to the ec_derives_hash of the given
- *      EquivalenceClass.
+ *		Add a derived clause to ec_derives_hash for the given EquivalenceClass.
+ *
+ * Each clause is inserted under a canonicalized key. For constant-containing
+ * clauses, only the non-constant EM is used for lookup; see comments in
+ * fill_ec_derives_key().
  */
 static void
 ec_add_clause_to_derives_hash(EquivalenceClass *ec, RestrictInfo *rinfo)
@@ -3513,49 +3568,24 @@ ec_add_clause_to_derives_hash(EquivalenceClass *ec, RestrictInfo *rinfo)
 	 */
 	Assert(!rinfo->left_em->em_is_const);
 
-	if (rinfo->right_em->em_is_const)
-	{
-		/*
-		 * Clauses containing a constant are never considered redundant, so
-		 * parent_ec is not set.
-		 */
-		Assert(!rinfo->parent_ec);
-
-		/*
-		 * find_derived_clause_for_ec_member() performs lookup of a clause
-		 * involving a constant using only the non-constant EM and NULL for
-		 * the RHS. Since there's only one constant EM per EC, we don't need
-		 * to store or match it during lookup.  We set key.em2 = NULL to
-		 * reflect this.
-		 */
-		key.em1 = rinfo->left_em;
-		key.em2 = NULL;
-		key.parent_ec = rinfo->parent_ec;
-		entry = derives_insert(ec->ec_derives_hash, key, &found);
-		Assert(!found);
-		entry->rinfo = rinfo;
-	}
-	else
-	{
-		key.em1 = rinfo->left_em;
-		key.em2 = rinfo->right_em;
-		key.parent_ec = rinfo->parent_ec;
-		entry = derives_insert(ec->ec_derives_hash, key, &found);
-		Assert(!found);
-		entry->rinfo = rinfo;
+	/*
+	 * Clauses containing a constant are never considered redundant, so
+	 * parent_ec is not set.
+	 */
+	Assert(!rinfo->parent_ec || !rinfo->right_em->em_is_const);
 
-		/*
-		 * Insert the clause under the given EM pair key, and also under the
-		 * reverse order. This ensures we can find the clause regardless of
-		 * the order in which EMs are passed to the lookup function.
-		 */
-		key.em1 = rinfo->right_em;
-		key.em2 = rinfo->left_em;
-		key.parent_ec = rinfo->parent_ec;
-		entry = derives_insert(ec->ec_derives_hash, key, &found);
-		Assert(!found);
-		entry->rinfo = rinfo;
-	}
+	/*
+	 * See fill_ec_derives_key() for details: we use a canonicalized key to
+	 * avoid storing both EM orderings. For constant EMs, only the
+	 * non-constant EM is included in the key.
+	 */
+	fill_ec_derives_key(&key,
+						rinfo->left_em,
+						rinfo->right_em->em_is_const ? NULL : rinfo->right_em,
+						rinfo->parent_ec);
+	entry = derives_insert(ec->ec_derives_hash, key, &found);
+	Assert(!found);
+	entry->rinfo = rinfo;
 }
 
 /*
@@ -3623,13 +3653,12 @@ ec_search_clause_for_ems(PlannerInfo *root, EquivalenceClass *ec,
  * If the number of derived clauses exceeds a threshold, switch to hash table
  * lookup; otherwise, scan ec_derives_list linearly.
  *
- * Clauses involving constants are looked up using only the non-const EM
- * (leftem) and a NULL rightem. In that case, we expect to find a clause with
- * a constant on the RHS.
+ * Clauses involving constants are looked up using only the non-constant EM
+ * (leftem), with rightem set to NULL. In that case, we expect to find a
+ * clause with a constant on the RHS.
  *
- * We do not attempt a second lookup with EMs swapped when using the hash
- * table; such clauses are inserted under both orderings at the time of
- * insertion.
+ * We do not attempt a second lookup with EMs swapped, because the key is
+ * canonicalized during both insertion and lookup.
  */
 static RestrictInfo *
 ec_search_derived_clause_for_ems(PlannerInfo *root, EquivalenceClass *ec,
@@ -3651,14 +3680,7 @@ ec_search_derived_clause_for_ems(PlannerInfo *root, EquivalenceClass *ec,
 		RestrictInfo *rinfo;
 		ECDerivesEntry *entry;
 
-		/*
-		 * See ec_add_clause_to_derives_hash() for rationale: derived clauses
-		 * are inserted into the hash table under both (em1, em2) and (em2,
-		 * em1), so a single lookup with the original order is sufficient.
-		 */
-		key.em1 = leftem;
-		key.em2 = rightem;
-		key.parent_ec = parent_ec;
+		fill_ec_derives_key(&key, leftem, rightem, parent_ec);
 		entry = derives_lookup(ec->ec_derives_hash, key);
 		if (entry)
 		{
-- 
2.34.1

Reply via email to