Re: MergeJoin beats HashJoin in the case of multiple hash clauses

Alena Rybakina Wed, 28 Jun 2023 06:53:35 -0700

Hi!

On 15.06.2023 11:30, Andrey Lepikhov wrote:

Hi, all.
Some of my clients use JOIN's with three - four clauses. Quitefrequently, I see complaints on unreasonable switch of JOIN algorithmto Merge Join instead of Hash Join. Quick research have shown one weakplace - estimation of an average bucket size in final_cost_hashjoin(see q2.sql in attachment) with very conservative strategy.Unlike estimation of groups, here we use smallest ndistinct valueacross all buckets instead of multiplying them (or trying to makemultivariate analysis).It works fine for the case of one clause. But if we have many clauses,and if each has high value of ndistinct, we will overestimate averagesize of a bucket and, as a result, prefer to use Merge Join. As theexample in attachment shows, it leads to worse plan than possible,sometimes drastically worse.I assume, this is done with fear of functional dependencies betweenhash clause components. But as for me, here we should go the same way,as estimation of groups.
The attached patch shows a sketch of the solution.

This problem is very important.

Honestly, I'm still learning your code and looking for cases on whichcases your patch can affect for the worse or for the better. But I havealready found something that seemed interesting to me. I have foundseveral other interesting cases where your patch can solve some problemin order to choose a more correct plan, but in focus on memory consumption.

To make it easier to evaluate, I added a hook to your patch that makesit easier to switch to your or the original way of estimating the sizeof baskets (diff_estimate.diff).


Here are other cases where your fix improves the query plan.

First of all, I changed the way creation of tables are created to lookat the behavior of the query plan in terms of planning and execution time:


DROP TABLE IF EXISTS a,b CASCADE;
CREATE TABLE a AS
  SELECT ((3*gs) % 300) AS x, ((3*gs+1) % 300) AS y, ((3*gs+2) % 300) AS z
  FROM generate_series(1,1e5) AS gs;
CREATE TABLE b AS
  SELECT gs % 90 AS x, gs % 49 AS y, gs %100 AS z, 'abc' || gs AS payload
  FROM generate_series(1,1e5) AS gs;
ANALYZE a,b;

SET enable_cost_size = 'on';
EXPLAIN ANALYZE
SELECT * FROM a,b
WHERE a.x=b.x AND a.y=b.y AND a.z=b.z;

SET enable_cost_size = 'off';
EXPLAIN ANALYZE
SELECT * FROM a,b
WHERE a.x=b.x AND a.y=b.y AND a.z=b.z;


                                QUERY PLAN
---------------------------------------------------------------------------
 Hash Join (actual time=200.872..200.879 rows=0 loops=1)
   Hash Cond: ((b.x = a.x) AND (b.y = a.y) AND (b.z = a.z))
   ->  Seq Scan on b (actual time=0.029..15.946 rows=100000 loops=1)
   ->  Hash (actual time=97.645..97.649 rows=100000 loops=1)
         Buckets: 131072  Batches: 1  Memory Usage: 5612kB
         ->  Seq Scan on a (actual time=0.024..17.153 rows=100000 loops=1)
 Planning Time: 2.910 ms
 Execution Time: 201.949 ms
(8 rows)

SET
                                QUERY PLAN
---------------------------------------------------------------------------
 Merge Join (actual time=687.415..687.416 rows=0 loops=1)
   Merge Cond: ((b.y = a.y) AND (b.x = a.x) AND (b.z = a.z))
   ->  Sort (actual time=462.022..536.716 rows=100000 loops=1)
         Sort Key: b.y, b.x, b.z
         Sort Method: external merge  Disk: 3328kB
         ->  Seq Scan on b (actual time=0.017..12.326 rows=100000 loops=1)
   ->  Sort (actual time=111.295..113.196 rows=16001 loops=1)
         Sort Key: a.y, a.x, a.z
         Sort Method: external sort  Disk: 2840kB
         ->  Seq Scan on a (actual time=0.020..10.129 rows=100000 loops=1)
 Planning Time: 0.752 ms
 Execution Time: 688.829 ms
(12 rows)

Secondly, I found another case that is not related to the fact that theplanner would prefer to choose merge join rather than hash join, but wehave the opportunity to see that the plan has become better due to theconsumption of less memory, and also takes less planning time.

Here, with the same query, the planning time was reduced by 5 times, andthe number of buckets by 128 times, therefore, memory consumption alsodecreased:


DROP TABLE IF EXISTS a,b CASCADE;

CREATE TABLE a AS
  SELECT ((3*gs) % 300) AS x, ((3*gs+1) % 300) AS y, ((3*gs+2) % 300) AS z
  FROM generate_series(1,600) AS gs;
CREATE TABLE b AS
  SELECT gs % 90 AS x, gs % 49 AS y, gs %100 AS z, 'abc' || gs AS payload
  FROM generate_series(1,1e5) AS gs;
ANALYZE a,b;

SET enable_cost_size = 'on';
EXPLAIN ANALYZE
SELECT * FROM a,b
WHERE a.x=b.x AND a.y=b.y AND a.z=b.z;

SET enable_cost_size = 'off';
EXPLAIN ANALYZE
SELECT * FROM a,b
WHERE a.x=b.x AND a.y=b.y AND a.z=b.z;

QUERY PLAN
----------------------------------------------------------------------------------------------------------------

Hash Join (cost=20.50..3157.58 rows=8 width=32) (actualtime=95.648..95.651 rows=0 loops=1) Hash Cond: ((b.x = (a.x)::numeric) AND (b.y = (a.y)::numeric) AND(b.z = (a.z)::numeric)) -> Seq Scan on b (cost=0.00..1637.00 rows=100000 width=20) (actualtime=0.027..17.980 rows=100000 loops=1) -> Hash (cost=10.00..10.00 rows=600 width=12) (actualtime=2.046..2.047 rows=600 loops=1)

         Buckets: 1024  Batches: 1  Memory Usage: 34kB

-> Seq Scan on a (cost=0.00..10.00 rows=600 width=12)(actual time=0.022..0.315 rows=600 loops=1)

 Planning Time: 0.631 ms
 Execution Time: 95.730 ms
(8 rows)

SET
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------

Hash Join (cost=3387.00..8621.58 rows=8 width=32) (actualtime=102.873..102.877 rows=0 loops=1) Hash Cond: (((a.x)::numeric = b.x) AND ((a.y)::numeric = b.y) AND((a.z)::numeric = b.z)) -> Seq Scan on a (cost=0.00..10.00 rows=600 width=12) (actualtime=0.014..0.131 rows=600 loops=1) -> Hash (cost=1637.00..1637.00 rows=100000 width=20) (actualtime=101.920..101.921 rows=100000 loops=1)

         Buckets: 131072  Batches: 1  Memory Usage: 6474kB

-> Seq Scan on b (cost=0.00..1637.00 rows=100000 width=20)(actual time=0.013..16.349 rows=100000 loops=1)

 Planning Time: 0.153 ms
 Execution Time: 103.518 ms
(8 rows)

I also give an improvement relative to the left external or rightconnection:


DROP TABLE IF EXISTS a,b CASCADE;

CREATE TABLE a AS
  SELECT ((3*gs) % 300) AS x, ((3*gs+1) % 300) AS y, ((3*gs+2) % 300) AS z
  FROM generate_series(1,600) AS gs;
CREATE TABLE b AS
  SELECT gs % 90 AS x, gs % 49 AS y, gs %100 AS z, 'abc' || gs AS payload
  FROM generate_series(1,1e5) AS gs;
ANALYZE a,b;


SET enable_cost_size = 'on';

EXPLAIN ANALYZE
SELECT * FROM a right join b
on a.x=b.x AND a.y=b.y AND a.z=b.z;

SET enable_cost_size = 'off';
EXPLAIN ANALYZE
SELECT * FROM a right join b
on a.x=b.x AND a.y=b.y AND a.z=b.z;

QUERY PLAN
----------------------------------------------------------------------------------------------------------------

Hash Left Join (cost=20.50..3157.58 rows=100000 width=32) (actualtime=1.846..102.264 rows=100000 loops=1) Hash Cond: ((b.x = (a.x)::numeric) AND (b.y = (a.y)::numeric) AND(b.z = (a.z)::numeric)) -> Seq Scan on b (cost=0.00..1637.00 rows=100000 width=20) (actualtime=0.041..15.328 rows=100000 loops=1) -> Hash (cost=10.00..10.00 rows=600 width=12) (actualtime=1.780..1.781 rows=600 loops=1)

         Buckets: 1024  Batches: 1  Memory Usage: 34kB

-> Seq Scan on a (cost=0.00..10.00 rows=600 width=12)(actual time=0.031..0.252 rows=600 loops=1)

 Planning Time: 0.492 ms
 Execution Time: 107.609 ms
(8 rows)

SET
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------

Hash Right Join (cost=3387.00..8500.08 rows=100000 width=32) (actualtime=80.919..101.613 rows=100000 loops=1) Hash Cond: (((a.x)::numeric = b.x) AND ((a.y)::numeric = b.y) AND((a.z)::numeric = b.z)) -> Seq Scan on a (cost=0.00..10.00 rows=600 width=12) (actualtime=0.017..0.084 rows=600 loops=1) -> Hash (cost=1637.00..1637.00 rows=100000 width=20) (actualtime=80.122..80.123 rows=100000 loops=1)

         Buckets: 131072  Batches: 1  Memory Usage: 6474kB

-> Seq Scan on b (cost=0.00..1637.00 rows=100000 width=20)(actual time=0.015..11.819 rows=100000 loops=1)

 Planning Time: 0.194 ms
 Execution Time: 104.662 ms
(8 rows)

--
Regards,
Alena Rybakina
Postgres Professional

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ef475d95a18..31771dfba46 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -153,6 +153,7 @@ bool		enable_parallel_hash = true;
 bool		enable_partition_pruning = true;
 bool		enable_presorted_aggregate = true;
 bool		enable_async_append = true;
+bool 		enable_cost_size = true;
 
 typedef struct
 {
@@ -4033,11 +4034,22 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
 				thismcvfreq = restrictinfo->left_mcvfreq;
 			}
 
+			if (enable_cost_size)
+			{
+				innerbucketsize *= thisbucketsize;
+				innermcvfreq *= thismcvfreq;
+			}
+			else
+			{
 			if (innerbucketsize > thisbucketsize)
 				innerbucketsize = thisbucketsize;
 			if (innermcvfreq > thismcvfreq)
 				innermcvfreq = thismcvfreq;
+			}
 		}
+
+		if (enable_cost_size && innerbucketsize > virtualbuckets)
+			innerbucketsize = 1.0 / virtualbuckets;
 	}
 
 	/*
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 71e27f8eb05..ded9ba3b7a9 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1007,6 +1007,19 @@ struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_cost_size", PGC_USERSET, QUERY_TUNING_OTHER,
+			gettext_noop("set the optimizer coefficient"
+						 "so that custom or generic plan is selected more often. "
+						 "by default, the value is set to 1, which means that "
+						 "the choice of using both depends on the calculated cost"),
+			NULL,
+			GUC_EXPLAIN
+		},
+		&enable_cost_size,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_async_append", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of async append plans."),
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 6cf49705d3a..c79ec12e6d5 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -71,6 +71,7 @@ extern PGDLLIMPORT bool enable_partition_pruning;
 extern PGDLLIMPORT bool enable_presorted_aggregate;
 extern PGDLLIMPORT bool enable_async_append;
 extern PGDLLIMPORT int constraint_exclusion;
+extern PGDLLIMPORT bool enable_cost_size;
 
 extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
 								  double index_pages, PlannerInfo *root);

Re: MergeJoin beats HashJoin in the case of multiple hash clauses

Reply via email to