> On Thu, Aug 13, 2015 at 2:49 AM, Kouhei Kaigai <[email protected]> wrote:
> > In fact, cost of HashJoin underlying Sort node is:
> > -> Hash Join (cost=621264.91..752685.48 rows=1 width=132)
> >
> > On the other hands, NestedLoop on same place is:
> > -> Nested Loop (cost=0.00..752732.26 rows=1 width=132)
> >
> > Probably, small GUC adjustment may make optimizer to prefer HashJoin towards
> > these kind of queries.
>
> With that kind of discrepancy I doubt adjusting GUCs will be sufficient
>
> > Do you have a good idea?
>
> Do you have EXPLAIN ANALYZE from the plan that finishes? Are there any
> row estimates that are way off?
>
Yes, EXPLAIN ANALYZE is attached.
According to this, CTE year_total generates 384,208 rows. It is much smaller
than estimation (4.78M rows), however, filter's selectivity of CTE Scan was
not large as expectation.
For example, the deepest CTE Scan returns 37923 rows and 26314 rows, even though
40 rows were expected. On the next level, relations join between 11324 rows and
9952 rows, towards to estimation of 40rows x 8 rows.
If NestLoop is placed instead of HashJoin, it will make an explosion of the
number
of loops.
Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <[email protected]>
[kaigai@ayu tpcds]$ (echo "SET enable_nestloop=off;"; echo EXPLAIN ANALYZE; cat
query04.sql) | psql tpcds
SET
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1248761.93..1248761.93 rows=1 width=132) (actual
time=10831.134..10831.134 rows=8 loops=1)
CTE year_total
-> Append (cost=193769.66..496076.44 rows=4778919 width=220) (actual
time=5510.862..10034.982 rows=384208 loops=1)
-> HashAggregate (cost=193769.66..226692.26 rows=2633808
width=178) (actual time=5510.862..5654.366 rows=190581 loops=1)
Group Key: customer.c_customer_id, customer.c_first_name,
customer.c_last_name, customer.c_preferred_cust_flag, customer.c_birth_country,
customer.c_login, customer.c_email_address, date_dim.d_year
-> Custom Scan (GpuJoin) (cost=14554.84..108170.90
rows=2633808 width=178) (actual time=987.623..1221.769 rows=2685453 loops=1)
Bulkload: On (density: 100.00%)
Depth 1: Logic: GpuHashJoin, HashKeys:
(ss_sold_date_sk), JoinQual: (ss_sold_date_sk = d_date_sk), nrows_ratio:
0.95623338
Depth 2: Logic: GpuHashJoin, HashKeys: (ss_customer_sk),
JoinQual: (ss_customer_sk = c_customer_sk), nrows_ratio: 0.91441411
-> Custom Scan (BulkScan) on store_sales
(cost=0.00..96501.23 rows=2880323 width=38) (actual time=10.139..935.822
rows=2880404 loops=1)
-> Seq Scan on date_dim (cost=0.00..2705.49 rows=73049
width=16) (actual time=0.012..13.443 rows=73049 loops=1)
-> Seq Scan on customer (cost=0.00..4358.00
rows=100000 width=156) (actual time=0.004..18.978 rows=100000 loops=1)
-> HashAggregate (cost=125474.72..143301.10 rows=1426111
width=181) (actual time=2784.068..2882.514 rows=136978 loops=1)
Group Key: customer_1.c_customer_id, customer_1.c_first_name,
customer_1.c_last_name, customer_1.c_preferred_cust_flag,
customer_1.c_birth_country, customer_1.c_login, customer_1.c_email_address,
date_dim_1.d_year
-> Custom Scan (GpuJoin) (cost=14610.07..79126.11
rows=1426111 width=181) (actual time=319.825..431.830 rows=1430939 loops=1)
Bulkload: On (density: 100.00%)
Depth 1: Logic: GpuHashJoin, HashKeys:
(cs_bill_customer_sk), JoinQual: (c_customer_sk = cs_bill_customer_sk),
nrows_ratio: 0.99446636
Depth 2: Logic: GpuHashJoin, HashKeys:
(cs_sold_date_sk), JoinQual: (cs_sold_date_sk = d_date_sk), nrows_ratio:
0.98929483
-> Custom Scan (BulkScan) on catalog_sales
(cost=0.00..65628.43 rows=1441543 width=41) (actual time=9.649..260.027
rows=1441548 loops=1)
-> Seq Scan on customer customer_1 (cost=0.00..4358.00
rows=100000 width=156) (actual time=0.010..13.686 rows=100000 loops=1)
-> Seq Scan on date_dim date_dim_1 (cost=0.00..2705.49
rows=73049 width=16) (actual time=0.004..9.383 rows=73049 loops=1)
-> HashAggregate (cost=69306.38..78293.88 rows=719000 width=181)
(actual time=1435.470..1469.888 rows=56649 loops=1)
Group Key: customer_2.c_customer_id, customer_2.c_first_name,
customer_2.c_last_name, customer_2.c_preferred_cust_flag,
customer_2.c_birth_country, customer_2.c_login, customer_2.c_email_address,
date_dim_2.d_year
-> Custom Scan (GpuJoin) (cost=14702.18..45938.88
rows=719000 width=181) (actual time=196.365..252.823 rows=719119 loops=1)
Bulkload: On (density: 100.00%)
Depth 1: Logic: GpuHashJoin, HashKeys:
(ws_bill_customer_sk), JoinQual: (c_customer_sk = ws_bill_customer_sk),
nrows_ratio: 0.99973309
Depth 2: Logic: GpuHashJoin, HashKeys:
(ws_sold_date_sk), JoinQual: (ws_sold_date_sk = d_date_sk), nrows_ratio:
0.99946618
-> Custom Scan (BulkScan) on web_sales
(cost=0.00..32877.84 rows=719384 width=41) (actual time=10.217..137.788
rows=719384 loops=1)
-> Seq Scan on customer customer_2 (cost=0.00..4358.00
rows=100000 width=156) (actual time=0.009..13.679 rows=100000 loops=1)
-> Seq Scan on date_dim date_dim_2 (cost=0.00..2705.49
rows=73049 width=16) (actual time=0.004..9.343 rows=73049 loops=1)
-> Sort (cost=752685.49..752685.50 rows=1 width=132) (actual
time=10831.134..10831.134 rows=8 loops=1)
Sort Key: t_s_secyear.customer_id, t_s_secyear.customer_first_name,
t_s_secyear.customer_last_name, t_s_secyear.customer_email_address
Sort Method: quicksort Memory: 27kB
-> Hash Join (cost=621264.91..752685.48 rows=1 width=132) (actual
time=10812.727..10831.071 rows=8 loops=1)
Hash Cond: (t_s_secyear.customer_id = t_w_secyear.customer_id)
Join Filter: (CASE WHEN (t_c_firstyear.year_total >
'0'::numeric) THEN (t_c_secyear.year_total / t_c_firstyear.year_total) ELSE
NULL::numeric END > CASE WHEN (t_w_firstyear.year_total > '0'::numeric) THEN
(t_w_secyear.year_total / t_w_firstyear.year_total) ELSE NULL::numeric END)
Rows Removed by Join Filter: 4
-> Hash Join (cost=501790.45..633210.98 rows=1 width=308)
(actual time=518.438..537.969 rows=72 loops=1)
Hash Cond: (t_s_secyear.customer_id =
t_c_secyear.customer_id)
Join Filter: (CASE WHEN (t_c_firstyear.year_total >
'0'::numeric) THEN (t_c_secyear.year_total / t_c_firstyear.year_total) ELSE
NULL::numeric END > CASE WHEN (t_s_firstyear.year_total > '0'::numeric) THEN
(t_s_secyear.year_total / t_s_firstyear.year_total) ELSE NULL::numeric END)
Rows Removed by Join Filter: 57
-> Hash Join (cost=382315.99..513736.47 rows=1
width=320) (actual time=434.939..454.328 rows=437 loops=1)
Hash Cond: (t_s_firstyear.customer_id =
t_s_secyear.customer_id)
-> Hash Join (cost=262841.53..394261.97 rows=2
width=156) (actual time=342.768..361.650 rows=1171 loops=1)
Hash Cond: (t_w_firstyear.customer_id =
t_s_firstyear.customer_id)
-> CTE Scan on year_total t_w_firstyear
(cost=0.00..131420.27 rows=40 width=52) (actual time=93.409..110.721 rows=11324
loops=1)
Filter: ((year_total > '0'::numeric) AND
(sale_type = 'w'::text) AND (dyear = 2001))
Rows Removed by Filter: 372884
-> Hash (cost=262841.43..262841.43 rows=8
width=104) (actual time=249.311..249.311 rows=9952 loops=1)
Buckets: 16384 (originally 1024)
Batches: 1 (originally 1) Memory Usage: 934kB
-> Hash Join
(cost=131420.77..262841.43 rows=8 width=104) (actual time=120.248..246.712
rows=9952 loops=1)
Hash Cond:
(t_s_firstyear.customer_id = t_c_firstyear.customer_id)
-> CTE Scan on year_total
t_s_firstyear (cost=0.00..131420.27 rows=40 width=52) (actual
time=0.008..119.731 rows=37923 loops=1)
Filter: ((year_total >
'0'::numeric) AND (sale_type = 's'::text) AND (dyear = 2001))
Rows Removed by Filter:
346285
-> Hash
(cost=131420.27..131420.27 rows=40 width=52) (actual time=120.209..120.209
rows=26314 loops=1)
Buckets: 32768 (originally
1024) Batches: 1 (originally 1) Memory Usage: 1725kB
-> CTE Scan on year_total
t_c_firstyear (cost=0.00..131420.27 rows=40 width=52) (actual
time=53.855..114.811 rows=26314 loops=1)
Filter: ((year_total >
'0'::numeric) AND (sale_type = 'c'::text) AND (dyear = 2001))
Rows Removed by
Filter: 357894
-> Hash (cost=119472.98..119472.98 rows=119
width=164) (actual time=92.151..92.151 rows=38175 loops=1)
Buckets: 65536 (originally 1024) Batches: 1
(originally 1) Memory Usage: 6369kB
-> CTE Scan on year_total t_s_secyear
(cost=0.00..119472.98 rows=119 width=164) (actual time=0.006..79.445 rows=38175
loops=1)
Filter: ((sale_type = 's'::text) AND
(dyear = 2002))
Rows Removed by Filter: 346033
-> Hash (cost=119472.98..119472.98 rows=119 width=52)
(actual time=83.245..83.245 rows=27177 loops=1)
Buckets: 32768 (originally 1024) Batches: 1
(originally 1) Memory Usage: 1772kB
-> CTE Scan on year_total t_c_secyear
(cost=0.00..119472.98 rows=119 width=52) (actual time=35.634..77.900 rows=27177
loops=1)
Filter: ((sale_type = 'c'::text) AND (dyear =
2002))
Rows Removed by Filter: 357031
-> Hash (cost=119472.98..119472.98 rows=119 width=52) (actual
time=10293.033..10293.033 rows=11252 loops=1)
Buckets: 16384 (originally 1024) Batches: 1 (originally
1) Memory Usage: 759kB
-> CTE Scan on year_total t_w_secyear
(cost=0.00..119472.98 rows=119 width=52) (actual time=10212.333..10290.662
rows=11252 loops=1)
Filter: ((sale_type = 'w'::text) AND (dyear = 2002))
Rows Removed by Filter: 372956
Planning time: 9.320 ms
Execution time: 11249.081 ms
(77 rows)
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers