Hi Ashutosh,
Thanks for your reply!
>I first thought enable_hashagg should be sufficient to choose one strategy
over the other. But that is not true, enable_hashagg = true allows both the
strategies, enable_hashagg = false disables just hash strategy. There's no
way to disable group agg alone. So I think it makes sense to have this GUC.
Yes, exactly! Thank you for the clarification.
>I am surprised that we didn't see this being a problem for so long.
I was also wondering why this GUC parameter hadn't been introduced
earlier.
>We seem to disable mixed strategy when enable_hashagg is false. Do we want
to do the same when enable_groupagg = false as well?
As I mentioned in my previous email, setting enable_groupagg = false
resulted in better execution time compared to the mixed strategy (which,
as I understand, includes both HashAgg and GroupAgg). So, I believe
this new GUC parameter would be helpful for users in certain situations.
Here’s how the current and proposed behaviors compare:
Current behavior:
enable_hashagg = ON → Mixed
enable_hashagg = OFF → GroupAgg
After applying the patch:
enable_hashagg = ON, enable_groupagg = ON → Mixed
enable_hashagg = OFF, enable_groupagg = ON → GroupAgg
enable_hashagg = ON, enable_groupagg = OFF → HashAgg (new)
enable_hashagg = OFF, enable_groupagg = OFF → GroupAgg
In addition, if the both parameters = OFF, GroupAgg will be selected in
the patch. The reason is that I found a comment that HashAgg might use
too much memory, so I decided to prioritize using GroupAgg, which is
more secure.
In the last case, I chose to default to GroupAgg since I found a
comment suggesting HashAgg might consume excessive memory, so GroupAgg
seemed the safer fallback.
Some hackers may want to compare the actual execution plans and times
under different GUC settings, so I've attached my test results in
"test_result.txt".
>Some of those instances are for plan stability, all of which need not be
replicated. But some of them explicitly test sort based grouping. For rest
of them hash based plan seems to be the best one, so explicit
enable_groupagg = false is not needed. We will need some test to test the
switch though.
Thanks for your advice. I'll create a regression test and send a new patch
to -hackers in my next email.
Regards,
Tatsuro Yamada
# 1. Setting: default (enable_hashagg = ON, enable_groupagg = ON)
# Mix strategy was used
# Execution time was 41 sec
====
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=159135.83..7790760.21 rows=100 width=66) (actual
time=12449.124..41253.731 rows=100.00 loops=1)
Buffers: shared hit=116473 read=30167, temp read=1482 written=11035
-> GroupAggregate (cost=159135.83..1689495507.58 rows=22136 width=66)
(actual time=12449.121..41253.648 rows=100.00 loops=1)
Group Key: cd.cd_gender, cd.cd_marital_status, cd.cd_education_status,
cd.cd_income_band, cd.cd_credit_rating, cd.cd_dep_count
Buffers: shared hit=116473 read=30167, temp read=1482 written=11035
-> Nested Loop (cost=159135.83..1689494898.84 rows=22136 width=34)
(actual time=12317.664..41252.393 rows=101.00 loops=1)
Join Filter: (cd.cd_demo_sk = c.c_current_cdemo_sk)
Rows Removed by Join Filter: 197188565
Buffers: shared hit=116473 read=30167, temp read=1482
written=11035
-> Gather Merge (cost=125190.47..348899.28 rows=1920800
width=38) (actual time=10122.049..10269.869 rows=34276.00 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=4889 read=11029, temp read=1482
written=11035
-> Sort (cost=124190.45..126191.28 rows=800333 width=38)
(actual time=9985.641..10044.083 rows=11982.00 loops=3)
Sort Key: cd.cd_gender, cd.cd_marital_status,
cd.cd_education_status, cd.cd_income_band, cd.cd_credit_rating, cd.cd_dep_count
Sort Method: external merge Disk: 28488kB
Buffers: shared hit=4889 read=11029, temp read=1482
written=11035
Worker 0: Sort Method: external merge Disk: 29392kB
Worker 1: Sort Method: external merge Disk: 30112kB
-> Parallel Seq Scan on customer_demographics cd
(cost=0.00..23831.33 rows=800333 width=38) (actual time=0.377..852.678
rows=640266.67 loops=3)
Buffers: shared hit=4799 read=11029
-> Materialize (cost=33945.36..1051363622.90 rows=22136
width=4) (actual time=0.052..0.337 rows=5752.97 loops=34276)
Storage: Memory Maximum Storage: 289kB
Buffers: shared hit=111584 read=19138
-> Nested Loop (cost=33945.36..1051363512.22 rows=22136
width=4) (actual time=1785.640..2183.175 rows=5753.00 loops=1)
Buffers: shared hit=111584 read=19138
-> Nested Loop (cost=33945.07..1051355209.75
rows=24605 width=8) (actual time=1785.567..2119.781 rows=6409.00 loops=1)
Buffers: shared hit=92357 read=19138
-> HashAggregate (cost=33944.78..34272.85
rows=32807 width=4) (actual time=819.215..842.604 rows=28544.00 loops=1)
Group Key: ss.ss_customer_sk
Batches: 1 Memory Usage: 2329kB
Buffers: shared hit=2187 read=11744
-> Gather (cost=2683.76..33862.76
rows=32807 width=4) (actual time=23.253..768.211 rows=33609.00 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=2187 read=11744
-> Hash Join
(cost=1683.76..29582.06 rows=13670 width=4) (actual time=22.719..780.073
rows=11203.00 loops=3)
Hash Cond:
(ss.ss_sold_date_sk = d.d_date_sk)
Buffers: shared hit=2187
read=11744
-> Parallel Seq Scan on
store_sales ss (cost=0.00..24747.68 rows=1200168 width=8) (actual
time=5.144..424.772 rows=960134.67 loops=3)
Buffers: shared
hit=1002 read=11744
-> Hash
(cost=1673.36..1673.36 rows=832 width=4) (actual time=17.501..17.502
rows=848.00 loops=3)
Buckets: 1024
Batches: 1 Memory Usage: 38kB
Buffers: shared
hit=1185
-> Seq Scan on
date_dim d (cost=0.00..1673.36 rows=832 width=4) (actual time=0.024..17.253
rows=848.00 loops=3)
Filter: ((d_moy
>= 1) AND (d_moy <= 4) AND (d_year = 2002))
Rows Removed by
Filter: 72201
Buffers: shared
hit=1185
-> Index Scan using customer_pkey on customer
c (cost=0.29..36766.33 rows=1 width=12) (actual time=0.044..0.044 rows=0.22
loops=28544)
Index Cond: (c_customer_sk =
ss.ss_customer_sk)
Filter: ((ANY (c_customer_sk = (hashed
SubPlan 2).col1)) OR (ANY (c_customer_sk = (hashed SubPlan 4).col1)))
Rows Removed by Filter: 1
Index Searches: 28544
Buffers: shared hit=90170 read=7394
SubPlan 2
-> Gather (cost=2683.76..10471.46
rows=8194 width=4) (actual time=18.769..103.477 rows=8156.00 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=2496
read=1873
-> Hash Join
(cost=1683.76..8652.06 rows=3414 width=4) (actual time=27.114..97.488
rows=2718.67 loops=3)
Hash Cond:
(ws.ws_sold_date_sk = d_1.d_date_sk)
Buffers: shared hit=2496
read=1873
-> Parallel Seq Scan on
web_sales ws (cost=0.00..6181.43 rows=299743 width=8) (actual
time=0.005..26.446 rows=239794.67 loops=3)
Buffers: shared
hit=1311 read=1873
-> Hash
(cost=1673.36..1673.36 rows=832 width=4) (actual time=27.079..27.080
rows=848.00 loops=3)
Buckets: 1024
Batches: 1 Memory Usage: 38kB
Buffers: shared
hit=1185
-> Seq Scan on
date_dim d_1 (cost=0.00..1673.36 rows=832 width=4) (actual time=0.033..26.791
rows=848.00 loops=3)
Filter:
((d_moy >= 1) AND (d_moy <= 4) AND (d_year = 2002))
Rows Removed
by Filter: 72201
Buffers:
shared hit=1185
SubPlan 4
-> Gather (cost=2683.76..18287.89
rows=16419 width=4) (actual time=11.472..831.348 rows=16874.00 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=2043
read=5521
-> Hash Join
(cost=1683.76..15645.99 rows=6841 width=4) (actual time=18.591..824.234
rows=5624.67 loops=3)
Hash Cond:
(cs.cs_sold_date_sk = d_2.d_date_sk)
Buffers: shared hit=2043
read=5521
-> Parallel Seq Scan on
catalog_sales cs (cost=0.00..12385.45 rows=600645 width=8) (actual
time=0.334..455.274 rows=480516.00 loops=3)
Buffers: shared
hit=858 read=5521
-> Hash
(cost=1673.36..1673.36 rows=832 width=4) (actual time=18.205..18.206
rows=848.00 loops=3)
Buckets: 1024
Batches: 1 Memory Usage: 38kB
Buffers: shared
hit=1185
-> Seq Scan on
date_dim d_2 (cost=0.00..1673.36 rows=832 width=4) (actual time=0.048..17.917
rows=848.00 loops=3)
Filter:
((d_moy >= 1) AND (d_moy <= 4) AND (d_year = 2002))
Rows Removed
by Filter: 72201
Buffers:
shared hit=1185
-> Index Scan using customer_address_pkey on
customer_address ca (cost=0.29..0.34 rows=1 width=4) (actual time=0.009..0.009
rows=0.90 loops=6409)
Index Cond: (ca_address_sk =
c.c_current_addr_sk)
Filter: ((ca_county)::text = ANY ('{"Rush
County","Toole County","Jefferson County","Dona Ana County","La Porte
County"}'::text[]))
Rows Removed by Filter: 0
Index Searches: 6409
Buffers: shared hit=19227
Planning:
Buffers: shared hit=316
Planning Time: 5.683 ms
Execution Time: 41263.299 ms
(98 rows)
====
# 2. Setting: enable_hashagg = OFF, enable_groupagg = ON
# GroupAgg was used
# Execution time was 44 sec
====
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=161514.62..7793138.25 rows=100 width=66) (actual
time=10197.188..44152.766 rows=100.00 loops=1)
Buffers: shared hit=118758 read=27882, temp read=1482 written=11040
-> GroupAggregate (cost=161514.62..1689497722.34 rows=22136 width=66)
(actual time=10197.187..44152.676 rows=100.00 loops=1)
Group Key: cd.cd_gender, cd.cd_marital_status, cd.cd_education_status,
cd.cd_income_band, cd.cd_credit_rating, cd.cd_dep_count
Buffers: shared hit=118758 read=27882, temp read=1482 written=11040
-> Nested Loop (cost=161514.62..1689497113.60 rows=22136 width=34)
(actual time=10129.162..44151.929 rows=101.00 loops=1)
Join Filter: (cd.cd_demo_sk = c.c_current_cdemo_sk)
Rows Removed by Join Filter: 197184348
Buffers: shared hit=118758 read=27882, temp read=1482
written=11040
-> Gather Merge (cost=125190.47..348899.28 rows=1920800
width=38) (actual time=8741.033..8900.320 rows=34276.00 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=5469 read=10449, temp read=1482
written=11040
-> Sort (cost=124190.45..126191.28 rows=800333 width=38)
(actual time=8499.285..8571.085 rows=11989.67 loops=3)
Sort Key: cd.cd_gender, cd.cd_marital_status,
cd.cd_education_status, cd.cd_income_band, cd.cd_credit_rating, cd.cd_dep_count
Sort Method: external merge Disk: 30368kB
Buffers: shared hit=5469 read=10449, temp read=1482
written=11040
Worker 0: Sort Method: external merge Disk: 29104kB
Worker 1: Sort Method: external merge Disk: 28560kB
-> Parallel Seq Scan on customer_demographics cd
(cost=0.00..23831.33 rows=800333 width=38) (actual time=2.220..766.199
rows=640266.67 loops=3)
Buffers: shared hit=5379 read=10449
-> Materialize (cost=36324.15..1051365837.65 rows=22136
width=4) (actual time=0.032..0.363 rows=5752.84 loops=34276)
Storage: Memory Maximum Storage: 289kB
Buffers: shared hit=113289 read=17433
-> Nested Loop (cost=36324.15..1051365726.97 rows=22136
width=4) (actual time=1084.423..1391.426 rows=5753.00 loops=1)
Buffers: shared hit=113289 read=17433
-> Nested Loop (cost=36323.86..1051357424.51
rows=24605 width=8) (actual time=1084.375..1346.642 rows=6409.00 loops=1)
Buffers: shared hit=94062 read=17433
-> Unique (cost=36323.57..36487.60
rows=32807 width=4) (actual time=545.023..574.883 rows=28544.00 loops=1)
Buffers: shared hit=2031 read=11900
-> Sort (cost=36323.57..36405.58
rows=32807 width=4) (actual time=545.019..556.733 rows=33609.00 loops=1)
Sort Key: ss.ss_customer_sk
Sort Method: quicksort Memory:
1537kB
Buffers: shared hit=2031 read=11900
-> Gather
(cost=2683.76..33862.76 rows=32807 width=4) (actual time=33.440..529.429
rows=33609.00 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=2031
read=11900
-> Hash Join
(cost=1683.76..29582.06 rows=13670 width=4) (actual time=37.334..511.304
rows=11203.00 loops=3)
Hash Cond:
(ss.ss_sold_date_sk = d.d_date_sk)
Buffers: shared
hit=2031 read=11900
-> Parallel Seq Scan
on store_sales ss (cost=0.00..24747.68 rows=1200168 width=8) (actual
time=4.599..258.454 rows=960134.67 loops=3)
Buffers: shared
hit=846 read=11900
-> Hash
(cost=1673.36..1673.36 rows=832 width=4) (actual time=32.247..32.248
rows=848.00 loops=3)
Buckets: 1024
Batches: 1 Memory Usage: 38kB
Buffers: shared
hit=1185
-> Seq Scan on
date_dim d (cost=0.00..1673.36 rows=832 width=4) (actual time=0.024..31.303
rows=848.00 loops=3)
Filter:
((d_moy >= 1) AND (d_moy <= 4) AND (d_year = 2002))
Rows
Removed by Filter: 72201
Buffers:
shared hit=1185
-> Index Scan using customer_pkey on customer
c (cost=0.29..36766.33 rows=1 width=12) (actual time=0.026..0.026 rows=0.22
loops=28544)
Index Cond: (c_customer_sk =
ss.ss_customer_sk)
Filter: ((ANY (c_customer_sk = (hashed
SubPlan 2).col1)) OR (ANY (c_customer_sk = (hashed SubPlan 4).col1)))
Rows Removed by Filter: 1
Index Searches: 28544
Buffers: shared hit=92031 read=5533
SubPlan 2
-> Gather (cost=2683.76..10471.46
rows=8194 width=4) (actual time=28.354..100.442 rows=8156.00 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=4369
-> Hash Join
(cost=1683.76..8652.06 rows=3414 width=4) (actual time=15.023..74.578
rows=2718.67 loops=3)
Hash Cond:
(ws.ws_sold_date_sk = d_1.d_date_sk)
Buffers: shared hit=4369
-> Parallel Seq Scan on
web_sales ws (cost=0.00..6181.43 rows=299743 width=8) (actual
time=0.007..19.459 rows=239794.67 loops=3)
Buffers: shared
hit=3184
-> Hash
(cost=1673.36..1673.36 rows=832 width=4) (actual time=14.987..14.987
rows=848.00 loops=3)
Buckets: 1024
Batches: 1 Memory Usage: 38kB
Buffers: shared
hit=1185
-> Seq Scan on
date_dim d_1 (cost=0.00..1673.36 rows=832 width=4) (actual time=0.016..14.672
rows=848.00 loops=3)
Filter:
((d_moy >= 1) AND (d_moy <= 4) AND (d_year = 2002))
Rows Removed
by Filter: 72201
Buffers:
shared hit=1185
SubPlan 4
-> Gather (cost=2683.76..18287.89
rows=16419 width=4) (actual time=69.799..443.128 rows=16874.00 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=2031
read=5533
-> Hash Join
(cost=1683.76..15645.99 rows=6841 width=4) (actual time=66.236..401.988
rows=5624.67 loops=3)
Hash Cond:
(cs.cs_sold_date_sk = d_2.d_date_sk)
Buffers: shared hit=2031
read=5533
-> Parallel Seq Scan on
catalog_sales cs (cost=0.00..12385.45 rows=600645 width=8) (actual
time=10.580..208.806 rows=480516.00 loops=3)
Buffers: shared
hit=846 read=5533
-> Hash
(cost=1673.36..1673.36 rows=832 width=4) (actual time=55.629..55.630
rows=848.00 loops=3)
Buckets: 1024
Batches: 1 Memory Usage: 38kB
Buffers: shared
hit=1185
-> Seq Scan on
date_dim d_2 (cost=0.00..1673.36 rows=832 width=4) (actual time=0.021..53.491
rows=848.00 loops=3)
Filter:
((d_moy >= 1) AND (d_moy <= 4) AND (d_year = 2002))
Rows Removed
by Filter: 72201
Buffers:
shared hit=1185
-> Index Scan using customer_address_pkey on
customer_address ca (cost=0.29..0.34 rows=1 width=4) (actual time=0.006..0.006
rows=0.90 loops=6409)
Index Cond: (ca_address_sk =
c.c_current_addr_sk)
Filter: ((ca_county)::text = ANY ('{"Rush
County","Toole County","Jefferson County","Dona Ana County","La Porte
County"}'::text[]))
Rows Removed by Filter: 0
Index Searches: 6409
Buffers: shared hit=19227
Settings: enable_hashagg = 'off'
Planning:
Buffers: shared hit=316
Planning Time: 2.101 ms
Execution Time: 44168.266 ms
(101 rows)
====
# 3. Setting: enable_hashagg = ON, enable_groupagg = OFF
# HashAgg was used
# Execution time was 1 sec
====
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1051399723.16..1051399723.41 rows=100 width=66) (actual
time=1096.948..1097.642 rows=100.00 loops=1)
Buffers: shared hit=133811 read=19929
-> Sort (cost=1051399723.16..1051399778.50 rows=22136 width=66) (actual
time=1096.946..1097.633 rows=100.00 loops=1)
Sort Key: cd.cd_gender, cd.cd_marital_status, cd.cd_education_status,
cd.cd_income_band, cd.cd_credit_rating, cd.cd_dep_count
Sort Method: top-N heapsort Memory: 49kB
Buffers: shared hit=133811 read=19929
-> HashAggregate (cost=1051398655.78..1051398877.14 rows=22136
width=66) (actual time=1087.378..1090.664 rows=5740.00 loops=1)
Group Key: cd.cd_gender, cd.cd_marital_status,
cd.cd_education_status, cd.cd_income_band, cd.cd_credit_rating, cd.cd_dep_count
Batches: 1 Memory Usage: 1049kB
Buffers: shared hit=133805 read=19929
-> Nested Loop (cost=33945.79..1051398268.40 rows=22136
width=34) (actual time=702.467..1076.172 rows=5753.00 loops=1)
Buffers: shared hit=133805 read=19929
-> Nested Loop (cost=33945.36..1051363512.22 rows=22136
width=4) (actual time=702.451..958.299 rows=5753.00 loops=1)
Buffers: shared hit=114045 read=16677
-> Nested Loop (cost=33945.07..1051355209.75
rows=24605 width=8) (actual time=702.387..921.978 rows=6409.00 loops=1)
Buffers: shared hit=94818 read=16677
-> HashAggregate (cost=33944.78..34272.85
rows=32807 width=4) (actual time=354.993..373.328 rows=28544.00 loops=1)
Group Key: ss.ss_customer_sk
Batches: 1 Memory Usage: 2329kB
Buffers: shared hit=2223 read=11708
-> Gather (cost=2683.76..33862.76
rows=32807 width=4) (actual time=17.988..336.471 rows=33609.00 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=2223 read=11708
-> Hash Join
(cost=1683.76..29582.06 rows=13670 width=4) (actual time=16.964..337.747
rows=11203.00 loops=3)
Hash Cond:
(ss.ss_sold_date_sk = d.d_date_sk)
Buffers: shared hit=2223
read=11708
-> Parallel Seq Scan on
store_sales ss (cost=0.00..24747.68 rows=1200168 width=8) (actual
time=0.394..157.640 rows=960134.67 loops=3)
Buffers: shared
hit=1038 read=11708
-> Hash
(cost=1673.36..1673.36 rows=832 width=4) (actual time=16.492..16.493
rows=848.00 loops=3)
Buckets: 1024
Batches: 1 Memory Usage: 38kB
Buffers: shared
hit=1185
-> Seq Scan on
date_dim d (cost=0.00..1673.36 rows=832 width=4) (actual time=0.021..15.934
rows=848.00 loops=3)
Filter: ((d_moy
>= 1) AND (d_moy <= 4) AND (d_year = 2002))
Rows Removed by
Filter: 72201
Buffers: shared
hit=1185
-> Index Scan using customer_pkey on customer
c (cost=0.29..36766.33 rows=1 width=12) (actual time=0.018..0.018 rows=0.22
loops=28544)
Index Cond: (c_customer_sk =
ss.ss_customer_sk)
Filter: ((ANY (c_customer_sk = (hashed
SubPlan 2).col1)) OR (ANY (c_customer_sk = (hashed SubPlan 4).col1)))
Rows Removed by Filter: 1
Index Searches: 28544
Buffers: shared hit=92595 read=4969
SubPlan 2
-> Gather (cost=2683.76..10471.46
rows=8194 width=4) (actual time=72.010..153.327 rows=8156.00 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=4369
-> Hash Join
(cost=1683.76..8652.06 rows=3414 width=4) (actual time=44.282..111.007
rows=2718.67 loops=3)
Hash Cond:
(ws.ws_sold_date_sk = d_1.d_date_sk)
Buffers: shared hit=4369
-> Parallel Seq Scan on
web_sales ws (cost=0.00..6181.43 rows=299743 width=8) (actual
time=0.006..22.145 rows=239794.67 loops=3)
Buffers: shared
hit=3184
-> Hash
(cost=1673.36..1673.36 rows=832 width=4) (actual time=44.238..44.239
rows=848.00 loops=3)
Buckets: 1024
Batches: 1 Memory Usage: 38kB
Buffers: shared
hit=1185
-> Seq Scan on
date_dim d_1 (cost=0.00..1673.36 rows=832 width=4) (actual time=0.015..42.452
rows=848.00 loops=3)
Filter:
((d_moy >= 1) AND (d_moy <= 4) AND (d_year = 2002))
Rows Removed
by Filter: 72201
Buffers:
shared hit=1185
SubPlan 4
-> Gather (cost=2683.76..18287.89
rows=16419 width=4) (actual time=34.268..183.058 rows=16874.00 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=2595
read=4969
-> Hash Join
(cost=1683.76..15645.99 rows=6841 width=4) (actual time=30.132..174.422
rows=5624.67 loops=3)
Hash Cond:
(cs.cs_sold_date_sk = d_2.d_date_sk)
Buffers: shared hit=2595
read=4969
-> Parallel Seq Scan on
catalog_sales cs (cost=0.00..12385.45 rows=600645 width=8) (actual
time=3.432..69.160 rows=480516.00 loops=3)
Buffers: shared
hit=1410 read=4969
-> Hash
(cost=1673.36..1673.36 rows=832 width=4) (actual time=26.671..26.672
rows=848.00 loops=3)
Buckets: 1024
Batches: 1 Memory Usage: 38kB
Buffers: shared
hit=1185
-> Seq Scan on
date_dim d_2 (cost=0.00..1673.36 rows=832 width=4) (actual time=0.019..26.377
rows=848.00 loops=3)
Filter:
((d_moy >= 1) AND (d_moy <= 4) AND (d_year = 2002))
Rows Removed
by Filter: 72201
Buffers:
shared hit=1185
-> Index Scan using customer_address_pkey on
customer_address ca (cost=0.29..0.34 rows=1 width=4) (actual time=0.005..0.005
rows=0.90 loops=6409)
Index Cond: (ca_address_sk =
c.c_current_addr_sk)
Filter: ((ca_county)::text = ANY ('{"Rush
County","Toole County","Jefferson County","Dona Ana County","La Porte
County"}'::text[]))
Rows Removed by Filter: 0
Index Searches: 6409
Buffers: shared hit=19227
-> Index Scan using customer_demographics_pkey on
customer_demographics cd (cost=0.43..1.57 rows=1 width=38) (actual
time=0.019..0.019 rows=1.00 loops=5753)
Index Cond: (cd_demo_sk = c.c_current_cdemo_sk)
Index Searches: 5753
Buffers: shared hit=19760 read=3252
Settings: enable_groupagg = 'off'
Planning:
Buffers: shared hit=316
Planning Time: 21.057 ms
Execution Time: 1105.672 ms
(91 rows)
====
# 4. Setting: enable_hashagg = OFF, enable_groupagg = OFF
# GroupAgg was used
# Execution time was 38 sec
====
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=161514.62..7793138.25 rows=100 width=66) (actual
time=7611.145..38354.564 rows=100.00 loops=1)
Buffers: shared hit=119783 read=26857, temp read=1482 written=11039
-> GroupAggregate (cost=161514.62..1689497722.34 rows=22136 width=66)
(actual time=7611.128..38354.435 rows=100.00 loops=1)
Group Key: cd.cd_gender, cd.cd_marital_status, cd.cd_education_status,
cd.cd_income_band, cd.cd_credit_rating, cd.cd_dep_count
Buffers: shared hit=119783 read=26857, temp read=1482 written=11039
-> Nested Loop (cost=161514.62..1689497113.60 rows=22136 width=34)
(actual time=7510.502..38353.114 rows=101.00 loops=1)
Join Filter: (cd.cd_demo_sk = c.c_current_cdemo_sk)
Rows Removed by Join Filter: 197184348
Buffers: shared hit=119783 read=26857, temp read=1482
written=11039
-> Gather Merge (cost=125190.47..348899.28 rows=1920800
width=38) (actual time=6363.769..6499.733 rows=34276.00 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=5833 read=10085, temp read=1482
written=11039
-> Sort (cost=124190.45..126191.28 rows=800333 width=38)
(actual time=6163.457..6232.614 rows=11966.67 loops=3)
Sort Key: cd.cd_gender, cd.cd_marital_status,
cd.cd_education_status, cd.cd_income_band, cd.cd_credit_rating, cd.cd_dep_count
Sort Method: external merge Disk: 28992kB
Buffers: shared hit=5833 read=10085, temp read=1482
written=11039
Worker 0: Sort Method: external merge Disk: 28752kB
Worker 1: Sort Method: external merge Disk: 30280kB
-> Parallel Seq Scan on customer_demographics cd
(cost=0.00..23831.33 rows=800333 width=38) (actual time=0.737..321.713
rows=640266.67 loops=3)
Buffers: shared hit=5743 read=10085
-> Materialize (cost=36324.15..1051365837.65 rows=22136
width=4) (actual time=0.025..0.325 rows=5752.84 loops=34276)
Storage: Memory Maximum Storage: 289kB
Buffers: shared hit=113950 read=16772
-> Nested Loop (cost=36324.15..1051365726.97 rows=22136
width=4) (actual time=865.332..1133.116 rows=5753.00 loops=1)
Buffers: shared hit=113950 read=16772
-> Nested Loop (cost=36323.86..1051357424.51
rows=24605 width=8) (actual time=863.365..1073.375 rows=6409.00 loops=1)
Buffers: shared hit=94723 read=16772
-> Unique (cost=36323.57..36487.60
rows=32807 width=4) (actual time=393.964..415.541 rows=28544.00 loops=1)
Buffers: shared hit=2128 read=11803
-> Sort (cost=36323.57..36405.58
rows=32807 width=4) (actual time=393.957..402.146 rows=33609.00 loops=1)
Sort Key: ss.ss_customer_sk
Sort Method: quicksort Memory:
1537kB
Buffers: shared hit=2128 read=11803
-> Gather
(cost=2683.76..33862.76 rows=32807 width=4) (actual time=80.713..349.762
rows=33609.00 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=2128
read=11803
-> Hash Join
(cost=1683.76..29582.06 rows=13670 width=4) (actual time=44.679..308.141
rows=11203.00 loops=3)
Hash Cond:
(ss.ss_sold_date_sk = d.d_date_sk)
Buffers: shared
hit=2128 read=11803
-> Parallel Seq Scan
on store_sales ss (cost=0.00..24747.68 rows=1200168 width=8) (actual
time=9.033..125.081 rows=960134.67 loops=3)
Buffers: shared
hit=943 read=11803
-> Hash
(cost=1673.36..1673.36 rows=832 width=4) (actual time=35.605..35.606
rows=848.00 loops=3)
Buckets: 1024
Batches: 1 Memory Usage: 38kB
Buffers: shared
hit=1185
-> Seq Scan on
date_dim d (cost=0.00..1673.36 rows=832 width=4) (actual time=0.036..35.172
rows=848.00 loops=3)
Filter:
((d_moy >= 1) AND (d_moy <= 4) AND (d_year = 2002))
Rows
Removed by Filter: 72201
Buffers:
shared hit=1185
-> Index Scan using customer_pkey on customer
c (cost=0.29..36766.33 rows=1 width=12) (actual time=0.022..0.022 rows=0.22
loops=28544)
Index Cond: (c_customer_sk =
ss.ss_customer_sk)
Filter: ((ANY (c_customer_sk = (hashed
SubPlan 2).col1)) OR (ANY (c_customer_sk = (hashed SubPlan 4).col1)))
Rows Removed by Filter: 1
Index Searches: 28544
Buffers: shared hit=92595 read=4969
SubPlan 2
-> Gather (cost=2683.76..10471.46
rows=8194 width=4) (actual time=39.476..124.540 rows=8156.00 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=4369
-> Hash Join
(cost=1683.76..8652.06 rows=3414 width=4) (actual time=20.456..94.877
rows=2718.67 loops=3)
Hash Cond:
(ws.ws_sold_date_sk = d_1.d_date_sk)
Buffers: shared hit=4369
-> Parallel Seq Scan on
web_sales ws (cost=0.00..6181.43 rows=299743 width=8) (actual
time=0.020..28.916 rows=239794.67 loops=3)
Buffers: shared
hit=3184
-> Hash
(cost=1673.36..1673.36 rows=832 width=4) (actual time=20.396..20.397
rows=848.00 loops=3)
Buckets: 1024
Batches: 1 Memory Usage: 38kB
Buffers: shared
hit=1185
-> Seq Scan on
date_dim d_1 (cost=0.00..1673.36 rows=832 width=4) (actual time=0.019..20.094
rows=848.00 loops=3)
Filter:
((d_moy >= 1) AND (d_moy <= 4) AND (d_year = 2002))
Rows Removed
by Filter: 72201
Buffers:
shared hit=1185
SubPlan 4
-> Gather (cost=2683.76..18287.89
rows=16419 width=4) (actual time=56.442..326.649 rows=16874.00 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=2595
read=4969
-> Hash Join
(cost=1683.76..15645.99 rows=6841 width=4) (actual time=64.537..301.087
rows=5624.67 loops=3)
Hash Cond:
(cs.cs_sold_date_sk = d_2.d_date_sk)
Buffers: shared hit=2595
read=4969
-> Parallel Seq Scan on
catalog_sales cs (cost=0.00..12385.45 rows=600645 width=8) (actual
time=11.234..130.365 rows=480516.00 loops=3)
Buffers: shared
hit=1410 read=4969
-> Hash
(cost=1673.36..1673.36 rows=832 width=4) (actual time=53.242..53.243
rows=848.00 loops=3)
Buckets: 1024
Batches: 1 Memory Usage: 38kB
Buffers: shared
hit=1185
-> Seq Scan on
date_dim d_2 (cost=0.00..1673.36 rows=832 width=4) (actual time=0.025..52.328
rows=848.00 loops=3)
Filter:
((d_moy >= 1) AND (d_moy <= 4) AND (d_year = 2002))
Rows Removed
by Filter: 72201
Buffers:
shared hit=1185
-> Index Scan using customer_address_pkey on
customer_address ca (cost=0.29..0.34 rows=1 width=4) (actual time=0.009..0.009
rows=0.90 loops=6409)
Index Cond: (ca_address_sk =
c.c_current_addr_sk)
Filter: ((ca_county)::text = ANY ('{"Rush
County","Toole County","Jefferson County","Dona Ana County","La Porte
County"}'::text[]))
Rows Removed by Filter: 0
Index Searches: 6409
Buffers: shared hit=19227
Settings: enable_hashagg = 'off', enable_groupagg = 'off'
Planning:
Buffers: shared hit=316
Planning Time: 10.736 ms
Execution Time: 38358.196 ms
(101 rows)
====