On Sat, Aug 2, 2014 at 1:40 PM, Jeff Davis <pg...@j-davis.com> wrote:
> This is a prerequisite for memory-bounded HashAgg, which I intend to
> submit for the next CF.

FWIW, I think that's a good project. A large number of these TPC-H
queries used HashAggs when I checked, on a moderate sized sample TPC-H
database:

http://examples.citusdata.com/tpch_queries.html

(I found these queries at random from Googling, but happened to have a
~2GB TPC-H database on my laptop). I attach EXPLAIN ANALYZE ouput for
each, as shown on master. From this admittedly unscientific random
sampling, 5 out of 8 query plans have a hash aggregate node. TPC-H is
a benchmark that Postgres does not tend to do too well on [1], and I
suspect that this has something to do with it; lower work_mem settings
will spook the optimizer into using a group aggregate within
choose_hashed_grouping(). Of course, in order to get the benefit of
your patch, that will need to be adjusted. I think that part is
surprisingly straightforward, though.

[1] https://wiki.postgresql.org/wiki/TPC-H
-- 
Peter Geoghegan
                                                            QUERY PLAN          
                                                   
-----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=410317.35..410317.37 rows=6 width=25) (actual 
time=10424.815..10424.816 rows=4 loops=1)
   Sort Key: l_returnflag, l_linestatus
   Sort Method: quicksort  Memory: 25kB
   ->  HashAggregate  (cost=410317.11..410317.27 rows=6 width=25) (actual 
time=10424.769..10424.775 rows=4 loops=1)
         Group Key: l_returnflag, l_linestatus
         ->  Seq Scan on lineitem  (cost=0.00..173620.19 rows=5917423 width=25) 
(actual time=0.017..1179.993 rows=5916591 loops=1)
               Filter: (l_shipdate <= '1998-09-02 00:00:00'::timestamp without 
time zone)
               Rows Removed by Filter: 84624
 Planning time: 0.380 ms
 Execution time: 10424.894 ms
(10 rows)

                                                                       QUERY 
PLAN                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=267628.27..267628.32 rows=20 width=24) (actual 
time=2227.928..2227.931 rows=20 loops=1)
   ->  Sort  (cost=267628.27..268428.35 rows=320034 width=24) (actual 
time=2227.927..2227.929 rows=20 loops=1)
         Sort Key: (sum((lineitem.l_extendedprice * (1::numeric - 
lineitem.l_discount)))), orders.o_orderdate
         Sort Method: top-N heapsort  Memory: 26kB
         ->  HashAggregate  (cost=255111.85..259112.28 rows=320034 width=24) 
(actual time=2222.023..2225.916 rows=11620 loops=1)
               Group Key: lineitem.l_orderkey, orders.o_orderdate, 
orders.o_shippriority
               ->  Hash Join  (cost=61347.05..250311.34 rows=320034 width=24) 
(actual time=511.401..2197.516 rows=30519 loops=1)
                     Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
                     ->  Seq Scan on lineitem  (cost=0.00..173620.19 
rows=3238338 width=16) (actual time=0.007..880.520 rows=3241776 loops=1)
                           Filter: (l_shipdate > '1995-03-15'::date)
                           Rows Removed by Filter: 2759439
                     ->  Hash  (cost=59494.05..59494.05 rows=148240 width=12) 
(actual time=511.151..511.151 rows=147126 loops=1)
                           Buckets: 16384  Batches: 1  Memory Usage: 6322kB
                           ->  Hash Join  (cost=5841.56..59494.05 rows=148240 
width=12) (actual time=32.502..482.473 rows=147126 loops=1)
                                 Hash Cond: (orders.o_custkey = 
customer.c_custkey)
                                 ->  Seq Scan on orders  (cost=0.00..43975.00 
rows=728452 width=16) (actual time=0.006..226.220 rows=727305 loops=1)
                                       Filter: (o_orderdate < 
'1995-03-15'::date)
                                       Rows Removed by Filter: 772695
                                 ->  Hash  (cost=5460.00..5460.00 rows=30525 
width=4) (actual time=32.464..32.464 rows=30142 loops=1)
                                       Buckets: 4096  Batches: 1  Memory Usage: 
1060kB
                                       ->  Seq Scan on customer  
(cost=0.00..5460.00 rows=30525 width=4) (actual time=0.006..28.441 rows=30142 
loops=1)
                                             Filter: (c_mktsegment = 
'BUILDING'::bpchar)
                                             Rows Removed by Filter: 119858
 Planning time: 0.483 ms
 Execution time: 2230.504 ms
(25 rows)

                                                                         QUERY 
PLAN                                                                         
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=194199.92..194199.98 rows=25 width=38) (actual 
time=675.182..675.183 rows=5 loops=1)
   Sort Key: (sum((lineitem.l_extendedprice * (1::numeric - 
lineitem.l_discount))))
   Sort Method: quicksort  Memory: 25kB
   ->  HashAggregate  (cost=194199.03..194199.34 rows=25 width=38) (actual 
time=675.170..675.171 rows=5 loops=1)
         Group Key: nation.n_name
         ->  Hash Join  (cost=6797.46..194125.81 rows=7322 width=38) (actual 
time=37.295..666.937 rows=7243 loops=1)
               Hash Cond: ((lineitem.l_suppkey = supplier.s_suppkey) AND 
(customer.c_nationkey = supplier.s_nationkey))
               ->  Nested Loop  (cost=6325.46..188087.98 rows=183087 width=50) 
(actual time=34.245..581.860 rows=184082 loops=1)
                     ->  Hash Join  (cost=6325.03..55651.73 rows=45763 
width=38) (actual time=34.224..379.780 rows=46008 loops=1)
                           Hash Cond: (orders.o_custkey = customer.c_custkey)
                           ->  Seq Scan on orders  (cost=0.00..47725.00 
rows=228813 width=8) (actual time=0.005..210.170 rows=227597 loops=1)
                                 Filter: ((o_orderdate >= '1994-01-01'::date) 
AND (o_orderdate < '1995-01-01 00:00:00'::timestamp without time zone))
                                 Rows Removed by Filter: 1272403
                           ->  Hash  (cost=5950.03..5950.03 rows=30000 
width=38) (actual time=34.197..34.197 rows=30183 loops=1)
                                 Buckets: 4096  Batches: 1  Memory Usage: 2123kB
                                 ->  Hash Join  (cost=2.53..5950.03 rows=30000 
width=38) (actual time=0.026..29.180 rows=30183 loops=1)
                                       Hash Cond: (customer.c_nationkey = 
nation.n_nationkey)
                                       ->  Seq Scan on customer  
(cost=0.00..5085.00 rows=150000 width=8) (actual time=0.002..12.259 rows=150000 
loops=1)
                                       ->  Hash  (cost=2.47..2.47 rows=5 
width=30) (actual time=0.015..0.015 rows=5 loops=1)
                                             Buckets: 1024  Batches: 1  Memory 
Usage: 1kB
                                             ->  Hash Join  (cost=1.07..2.47 
rows=5 width=30) (actual time=0.012..0.014 rows=5 loops=1)
                                                   Hash Cond: 
(nation.n_regionkey = region.r_regionkey)
                                                   ->  Seq Scan on nation  
(cost=0.00..1.25 rows=25 width=34) (actual time=0.002..0.004 rows=25 loops=1)
                                                   ->  Hash  (cost=1.06..1.06 
rows=1 width=4) (actual time=0.004..0.004 rows=1 loops=1)
                                                         Buckets: 1024  
Batches: 1  Memory Usage: 1kB
                                                         ->  Seq Scan on region 
 (cost=0.00..1.06 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=1)
                                                               Filter: (r_name 
= 'ASIA'::bpchar)
                                                               Rows Removed by 
Filter: 4
                     ->  Index Scan using l_orderkey_idx on lineitem  
(cost=0.43..2.74 rows=15 width=20) (actual time=0.003..0.004 rows=4 loops=46008)
                           Index Cond: (l_orderkey = orders.o_orderkey)
               ->  Hash  (cost=322.00..322.00 rows=10000 width=8) (actual 
time=3.028..3.028 rows=10000 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 391kB
                     ->  Seq Scan on supplier  (cost=0.00..322.00 rows=10000 
width=8) (actual time=0.005..1.698 rows=10000 loops=1)
 Planning time: 1.136 ms
 Execution time: 675.249 ms
(35 rows)

                                                                                
                 QUERY PLAN                                                     
                                             
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=234212.12..234212.14 rows=1 width=12) (actual 
time=1091.744..1091.745 rows=1 loops=1)
   ->  Seq Scan on lineitem  (cost=0.00..233632.34 rows=115957 width=12) 
(actual time=0.014..1059.285 rows=114160 loops=1)
         Filter: ((l_shipdate >= '1994-01-01'::date) AND (l_shipdate < 
'1995-01-01 00:00:00'::timestamp without time zone) AND (l_discount >= 0.05) 
AND (l_discount <= 0.07) AND (l_quantity < 24::numeric))
         Rows Removed by Filter: 5887055
 Planning time: 0.072 ms
 Execution time: 1091.763 ms
(6 rows)

                                                                            
QUERY PLAN                                                                      
      
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=252248.87..252248.92 rows=20 width=182) (actual 
time=1864.614..1864.618 rows=20 loops=1)
   ->  Sort  (cost=252248.87..252394.97 rows=58442 width=182) (actual 
time=1864.613..1864.616 rows=20 loops=1)
         Sort Key: (sum((lineitem.l_extendedprice * (1::numeric - 
lineitem.l_discount))))
         Sort Method: top-N heapsort  Memory: 30kB
         ->  HashAggregate  (cost=249963.22..250693.75 rows=58442 width=182) 
(actual time=1836.884..1855.133 rows=37967 loops=1)
               Group Key: customer.c_custkey, customer.c_name, 
customer.c_acctbal, customer.c_phone, nation.n_name, customer.c_address, 
customer.c_comment
               ->  Hash Join  (cost=55425.61..248502.17 rows=58442 width=182) 
(actual time=258.113..1683.082 rows=114705 loops=1)
                     Hash Cond: (customer.c_nationkey = nation.n_nationkey)
                     ->  Hash Join  (cost=55424.05..247697.03 rows=58442 
width=160) (actual time=258.097..1656.935 rows=114705 loops=1)
                           Hash Cond: (orders.o_custkey = customer.c_custkey)
                           ->  Hash Join  (cost=48464.05..239349.03 rows=58442 
width=16) (actual time=200.985..1518.488 rows=114705 loops=1)
                                 Hash Cond: (lineitem.l_orderkey = 
orders.o_orderkey)
                                 ->  Seq Scan on lineitem  
(cost=0.00..173620.19 rows=1482700 width=16) (actual time=0.009..865.622 
rows=1478870 loops=1)
                                       Filter: (l_returnflag = 'R'::bpchar)
                                       Rows Removed by Filter: 4522345
                                 ->  Hash  (cost=47725.00..47725.00 rows=59124 
width=8) (actual time=200.939..200.939 rows=57069 loops=1)
                                       Buckets: 8192  Batches: 1  Memory Usage: 
2230kB
                                       ->  Seq Scan on orders  
(cost=0.00..47725.00 rows=59124 width=8) (actual time=0.005..193.035 rows=57069 
loops=1)
                                             Filter: ((o_orderdate >= 
'1993-10-01'::date) AND (o_orderdate < '1994-01-01 00:00:00'::timestamp without 
time zone))
                                             Rows Removed by Filter: 1442931
                           ->  Hash  (cost=5085.00..5085.00 rows=150000 
width=148) (actual time=57.066..57.066 rows=150000 loops=1)
                                 Buckets: 16384  Batches: 1  Memory Usage: 
26813kB
                                 ->  Seq Scan on customer  (cost=0.00..5085.00 
rows=150000 width=148) (actual time=0.002..24.265 rows=150000 loops=1)
                     ->  Hash  (cost=1.25..1.25 rows=25 width=30) (actual 
time=0.008..0.008 rows=25 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 2kB
                           ->  Seq Scan on nation  (cost=0.00..1.25 rows=25 
width=30) (actual time=0.002..0.004 rows=25 loops=1)
 Planning time: 0.515 ms
 Execution time: 1864.759 ms
(28 rows)

                                                                                
                                           QUERY PLAN                           
                                                                                
                
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=241770.28..241770.29 rows=1 width=41) (actual 
time=1331.551..1331.551 rows=2 loops=1)
   Sort Key: lineitem.l_shipmode
   Sort Method: quicksort  Memory: 25kB
   ->  HashAggregate  (cost=241770.26..241770.27 rows=1 width=41) (actual 
time=1331.519..1331.519 rows=2 loops=1)
         Group Key: lineitem.l_shipmode
         ->  Nested Loop  (cost=0.43..241752.36 rows=1023 width=41) (actual 
time=0.068..1313.268 rows=30988 loops=1)
               ->  Seq Scan on lineitem  (cost=0.00..233632.34 rows=1023 
width=36) (actual time=0.061..1232.505 rows=30988 loops=1)
                     Filter: ((l_shipmode = ANY ('{MAIL,SHIP}'::text[])) AND 
(l_commitdate < l_receiptdate) AND (l_shipdate < l_commitdate) AND 
(l_receiptdate >= '1994-01-01'::date) AND (l_receiptdate < '1995-01-01 
00:00:00'::timestamp without time zone))
                     Rows Removed by Filter: 5970227
               ->  Index Scan using o_orderkey_idx on orders  (cost=0.43..7.93 
rows=1 width=13) (actual time=0.002..0.002 rows=1 loops=30988)
                     Index Cond: (o_orderkey = lineitem.l_orderkey)
 Planning time: 0.225 ms
 Execution time: 1331.586 ms
(13 rows)

                                                            QUERY PLAN          
                                                  
----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=200018.78..200018.79 rows=1 width=33) (actual 
time=1052.716..1052.716 rows=1 loops=1)
   ->  Hash Join  (cost=8597.00..198712.78 rows=74628 width=33) (actual 
time=60.716..1003.388 rows=75983 loops=1)
         Hash Cond: (lineitem.l_partkey = part.p_partkey)
         ->  Seq Scan on lineitem  (cost=0.00..188623.22 rows=74628 width=16) 
(actual time=0.028..873.368 rows=75983 loops=1)
               Filter: ((l_shipdate >= '1995-09-01'::date) AND (l_shipdate < 
'1995-10-01 00:00:00'::timestamp without time zone))
               Rows Removed by Filter: 5925232
         ->  Hash  (cost=6097.00..6097.00 rows=200000 width=25) (actual 
time=60.662..60.662 rows=200000 loops=1)
               Buckets: 32768  Batches: 1  Memory Usage: 11552kB
               ->  Seq Scan on part  (cost=0.00..6097.00 rows=200000 width=25) 
(actual time=0.003..34.523 rows=200000 loops=1)
 Planning time: 0.269 ms
 Execution time: 1052.747 ms
(11 rows)

                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                          QUERY 
PLAN                                                                            
                                                                                
                                                                                
                                                                                
                                                                                
                                                                              
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=69127.72..69127.73 rows=1 width=12) (actual 
time=83.373..83.373 rows=1 loops=1)
   ->  Nested Loop  (cost=4.61..69127.66 rows=8 width=12) (actual 
time=0.243..83.235 rows=121 loops=1)
         ->  Seq Scan on part  (cost=0.00..12597.00 rows=481 width=30) (actual 
time=0.023..59.944 rows=485 loops=1)
               Filter: ((p_size >= 1) AND (((p_brand = 'Brand#12'::bpchar) AND 
(p_container = ANY ('{"SM CASE","SM BOX","SM PACK","SM PKG"}'::bpchar[])) AND 
(p_size <= 5)) OR ((p_brand = 'Brand#23'::bpchar) AND (p_container = ANY 
('{"MED BAG","MED BOX","MED PKG","MED PACK"}'::bpchar[])) AND (p_size <= 10)) 
OR ((p_brand = 'Brand#34'::bpchar) AND (p_container = ANY ('{"LG CASE","LG 
BOX","LG PACK","LG PKG"}'::bpchar[])) AND (p_size <= 15))))
               Rows Removed by Filter: 199515
         ->  Bitmap Heap Scan on lineitem  (cost=4.61..117.52 rows=1 width=21) 
(actual time=0.042..0.046 rows=0 loops=485)
               Recheck Cond: (l_partkey = part.p_partkey)
               Filter: ((l_shipmode = ANY ('{AIR,"AIR REG"}'::text[])) AND 
(l_shipinstruct = 'DELIVER IN PERSON'::text) AND (((l_quantity >= 1::numeric) 
AND (l_quantity <= 11::numeric)) OR ((l_quantity >= 10::numeric) AND 
(l_quantity <= 20::numeric)) OR ((l_quantity >= 20::numeric) AND (l_quantity <= 
30::numeric))) AND (((part.p_brand = 'Brand#12'::bpchar) AND (part.p_container 
= ANY ('{"SM CASE","SM BOX","SM PACK","SM PKG"}'::bpchar[])) AND (l_quantity >= 
1::numeric) AND (l_quantity <= 11::numeric) AND (part.p_size <= 5)) OR 
((part.p_brand = 'Brand#23'::bpchar) AND (part.p_container = ANY ('{"MED 
BAG","MED BOX","MED PKG","MED PACK"}'::bpchar[])) AND (l_quantity >= 
10::numeric) AND (l_quantity <= 20::numeric) AND (part.p_size <= 10)) OR 
((part.p_brand = 'Brand#34'::bpchar) AND (part.p_container = ANY ('{"LG 
CASE","LG BOX","LG PACK","LG PKG"}'::bpchar[])) AND (l_quantity >= 20::numeric) 
AND (l_quantity <= 30::numeric) AND (part.p_size <= 15))))
               Rows Removed by Filter: 29
               Heap Blocks: exact=14299
               ->  Bitmap Index Scan on l_partkey_idx  (cost=0.00..4.61 rows=30 
width=0) (actual time=0.007..0.007 rows=29 loops=485)
                     Index Cond: (l_partkey = part.p_partkey)
 Planning time: 0.399 ms
 Execution time: 83.411 ms
(14 rows)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to