On Fri, Dec 1, 2017 at 4:05 AM, Robert Haas <[email protected]> wrote:
> On Wed, Nov 29, 2017 at 11:55 PM, Thomas Munro
> <[email protected]> wrote:
>> Thank you for the original pointer and the commit. Everything here
>> seems to make intuitive sense and the accompanying throw-away tests
>> that I posted above seem to produce sensible results except in some
>> cases that we discussed, so I think this is progress. There is still
>> something pretty funny about the cardinality estimates for TPCH Q21
>> which I haven't grokked though. I suspect it is crafted to look for a
>> technique we don't know (an ancient challenge set by some long retired
>> database gurus back in 1992 that their RDBMSs know how to solve,
>> hopefully not in the manner of a certain car manufacturer's air
>> pollution tests), but I haven't yet obtained enough round tuits to dig
>> further. I will, though.
>
> Hmm, do you have an example of the better but still-funky estimates
> handy? Like an EXPLAIN plan?
Sure. Here is EXPLAIN ANALYZE output for Q21 from a scale factor 3 TPC-H
database plus a few indexes [1]. There are two plans, both from HEAD: one
with commit 7ca25b7d and one without it.
[1]
https://github.com/macdice/pg_sisyphus/blob/master/cluster-recipes/make-tpch-cluster.sh
--
Thomas Munro
http://www.enterprisedb.com
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=544020.68..544020.69 rows=1 width=34) (actual time=5070.235..5070.253 rows=100 loops=1)
  ->  Sort  (cost=544020.68..544020.69 rows=1 width=34) (actual time=5070.233..5070.241 rows=100 loops=1)
        Sort Key: (count(*)) DESC, supplier.s_name
        Sort Method: top-N heapsort  Memory: 39kB
        ->  GroupAggregate  (cost=544020.65..544020.67 rows=1 width=34) (actual time=5061.289..5068.050 rows=1194 loops=1)
              Group Key: supplier.s_name
              ->  Sort  (cost=544020.65..544020.66 rows=1 width=26) (actual time=5061.275..5063.360 rows=11887 loops=1)
                    Sort Key: supplier.s_name
                    Sort Method: quicksort  Memory: 1406kB
                    ->  Nested Loop  (cost=1752.62..544020.64 rows=1 width=26) (actual time=3.134..4926.365 rows=11887 loops=1)
                          ->  Nested Loop Semi Join  (cost=1752.19..544015.98 rows=1 width=34) (actual time=3.122..4598.302 rows=24138 loops=1)
                                ->  Gather  (cost=1751.75..544010.27 rows=1 width=34) (actual time=2.909..4354.282 rows=40387 loops=1)
                                      Workers Planned: 2
                                      Workers Launched: 2
                                      ->  Nested Loop Anti Join  (cost=751.75..543010.17 rows=1 width=34) (actual time=4.282..4720.283 rows=13462 loops=3)
                                            ->  Hash Join  (cost=751.32..442412.33 rows=99981 width=34) (actual time=3.911..3554.800 rows=151103 loops=3)
                                                  Hash Cond: (l1.l_suppkey = supplier.s_suppkey)
                                                  ->  Parallel Seq Scan on lineitem l1  (cost=0.00..431288.00 rows=2499520 width=8) (actual time=0.046..2742.912 rows=3792542 loops=3)
                                                        Filter: (l_receiptdate > l_commitdate)
                                                        Rows Removed by Filter: 2206328
                                                  ->  Hash  (cost=736.32..736.32 rows=1200 width=30) (actual time=3.690..3.690 rows=1194 loops=3)
                                                        Buckets: 2048  Batches: 1  Memory Usage: 91kB
                                                        ->  Nested Loop  (cost=25.59..736.32 rows=1200 width=30) (actual time=0.486..3.206 rows=1194 loops=3)
                                                              ->  Seq Scan on nation  (cost=0.00..1.31 rows=1 width=4) (actual time=0.018..0.038 rows=1 loops=3)
                                                                    Filter: (n_name = 'ALGERIA'::bpchar)
                                                                    Rows Removed by Filter: 24
                                                              ->  Bitmap Heap Scan on supplier  (cost=25.59..723.00 rows=1200 width=34) (actual time=0.460..2.809 rows=1194 loops=3)
                                                                    Recheck Cond: (s_nationkey = nation.n_nationkey)
                                                                    Heap Blocks: exact=564
                                                                    ->  Bitmap Index Scan on idx_supplier_nation_key  (cost=0.00..25.29 rows=1200 width=0) (actual time=0.350..0.350 rows=1194 loops=3)
                                                                          Index Cond: (s_nationkey = nation.n_nationkey)
                                            ->  Index Scan using idx_lineitem_orderkey on lineitem l3  (cost=0.44..5.81 rows=15 width=8) (actual time=0.007..0.007 rows=1 loops=453308)
                                                  Index Cond: (l_orderkey = l1.l_orderkey)
                                                  Filter: ((l_receiptdate > l_commitdate) AND (l_suppkey <> l1.l_suppkey))
                                                  Rows Removed by Filter: 1
                                ->  Index Scan using idx_lineitem_orderkey on lineitem l2  (cost=0.44..5.70 rows=45 width=8) (actual time=0.005..0.005 rows=1 loops=40387)
                                      Index Cond: (l_orderkey = l1.l_orderkey)
                                      Filter: (l_suppkey <> l1.l_suppkey)
                                      Rows Removed by Filter: 1
                          ->  Index Scan using orders_pkey on orders  (cost=0.43..4.66 rows=1 width=4) (actual time=0.013..0.013 rows=0 loops=24138)
                                Index Cond: (o_orderkey = l1.l_orderkey)
                                Filter: (o_orderstatus = 'F'::bpchar)
                                Rows Removed by Filter: 1
Planning time: 4.473 ms
Execution time: 5070.780 ms
(45 rows)
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=928256.19..928256.20 rows=1 width=34) (actual time=10481.473..10481.495 rows=100 loops=1)
  ->  Sort  (cost=928256.19..928256.20 rows=1 width=34) (actual time=10481.472..10481.481 rows=100 loops=1)
        Sort Key: (count(*)) DESC, supplier.s_name
        Sort Method: top-N heapsort  Memory: 39kB
        ->  GroupAggregate  (cost=928256.16..928256.18 rows=1 width=34) (actual time=10471.441..10478.748 rows=1194 loops=1)
              Group Key: supplier.s_name
              ->  Sort  (cost=928256.16..928256.17 rows=1 width=26) (actual time=10469.949..10472.172 rows=11887 loops=1)
                    Sort Key: supplier.s_name
                    Sort Method: quicksort  Memory: 1406kB
                    ->  Nested Loop Anti Join  (cost=172401.80..928256.15 rows=1 width=26) (actual time=2663.307..10340.567 rows=11887 loops=1)
                          ->  Gather  (cost=172401.37..928250.19 rows=1 width=34) (actual time=2651.873..8386.936 rows=210800 loops=1)
                                Workers Planned: 2
                                Workers Launched: 2
                                ->  Nested Loop Semi Join  (cost=171401.37..927250.09 rows=1 width=34) (actual time=2647.954..8515.304 rows=70267 loops=3)
                                      Join Filter: (l2.l_suppkey <> l1.l_suppkey)
                                      Rows Removed by Join Filter: 18351
                                      ->  Hash Join  (cost=171400.93..624463.20 rows=48771 width=38) (actual time=2647.713..6984.199 rows=72959 loops=3)
                                            Hash Cond: (l1.l_orderkey = orders.o_orderkey)
                                            ->  Hash Join  (cost=751.32..442412.33 rows=99981 width=34) (actual time=18.468..3552.191 rows=151103 loops=3)
                                                  Hash Cond: (l1.l_suppkey = supplier.s_suppkey)
                                                  ->  Parallel Seq Scan on lineitem l1  (cost=0.00..431288.00 rows=2499520 width=8) (actual time=0.047..2796.852 rows=3792542 loops=3)
                                                        Filter: (l_receiptdate > l_commitdate)
                                                        Rows Removed by Filter: 2206328
                                                  ->  Hash  (cost=736.32..736.32 rows=1200 width=30) (actual time=18.222..18.222 rows=1194 loops=3)
                                                        Buckets: 2048  Batches: 1  Memory Usage: 91kB
                                                        ->  Nested Loop  (cost=25.59..736.32 rows=1200 width=30) (actual time=0.986..17.583 rows=1194 loops=3)
                                                              ->  Seq Scan on nation  (cost=0.00..1.31 rows=1 width=4) (actual time=0.020..0.028 rows=1 loops=3)
                                                                    Filter: (n_name = 'ALGERIA'::bpchar)
                                                                    Rows Removed by Filter: 24
                                                              ->  Bitmap Heap Scan on supplier  (cost=25.59..723.00 rows=1200 width=34) (actual time=0.958..17.108 rows=1194 loops=3)
                                                                    Recheck Cond: (s_nationkey = nation.n_nationkey)
                                                                    Heap Blocks: exact=564
                                                                    ->  Bitmap Index Scan on idx_supplier_nation_key  (cost=0.00..25.29 rows=1200 width=0) (actual time=0.829..0.829 rows=1194 loops=3)
                                                                          Index Cond: (s_nationkey = nation.n_nationkey)
                                            ->  Hash  (cost=134635.25..134635.25 rows=2195149 width=4) (actual time=2616.583..2616.583 rows=2192233 loops=3)
                                                  Buckets: 131072  Batches: 32  Memory Usage: 3440kB
                                                  ->  Seq Scan on orders  (cost=0.00..134635.25 rows=2195149 width=4) (actual time=0.026..1883.496 rows=2192233 loops=3)
                                                        Filter: (o_orderstatus = 'F'::bpchar)
                                                        Rows Removed by Filter: 2307767
                                      ->  Index Scan using idx_lineitem_orderkey on lineitem l2  (cost=0.44..5.65 rows=45 width=8) (actual time=0.020..0.020 rows=1 loops=218876)
                                            Index Cond: (l_orderkey = orders.o_orderkey)
                          ->  Index Scan using idx_lineitem_orderkey on lineitem l3  (cost=0.44..5.81 rows=15 width=8) (actual time=0.009..0.009 rows=1 loops=210800)
                                Index Cond: (l_orderkey = l1.l_orderkey)
                                Filter: ((l_receiptdate > l_commitdate) AND (l_suppkey <> l1.l_suppkey))
                                Rows Removed by Filter: 1
Planning time: 17.594 ms
Execution time: 10483.094 ms
(47 rows)
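
For anyone who would rather not eyeball plans like these for the funky
estimates, here is a rough Python sketch that lists the nodes whose estimated
row count is far from the actual one. It is only an illustration: the name
`misestimates`, the regular expression and the 10x default threshold are my
own assumptions, and it expects each plan node on a single unwrapped line of
EXPLAIN ANALYZE text output.

```python
import re

# Matches one plan node per line, e.g.
#   ->  Hash Join  (cost=a..b rows=N width=W) (actual time=c..d rows=M loops=L)
NODE_RE = re.compile(
    r"^[ \t]*(?:->[ \t]*)?(?P<node>[^()\n]+?)[ \t]*"
    r"\(cost=[\d.]+\.\.[\d.]+ rows=(?P<est>\d+) width=\d+\)[ \t]*"
    r"\(actual time=[\d.]+\.\.[\d.]+ rows=(?P<act>\d+) loops=\d+\)",
    re.MULTILINE,
)


def misestimates(plan_text, threshold=10.0):
    """Return (node, estimated, actual, ratio) for nodes off by >= threshold."""
    found = []
    for m in NODE_RE.finditer(plan_text):
        est, act = int(m["est"]), int(m["act"])
        # Estimated and actual rows are both per-loop figures, so they are
        # compared directly. Clamp to 1 so "rows=0" cannot divide by zero.
        ratio = max(est, 1) / max(act, 1)
        ratio = max(ratio, 1 / ratio)
        if ratio >= threshold:
            found.append((m["node"], est, act, ratio))
    return found


# Two nodes copied from the first plan above.
sample = """\
->  Nested Loop Anti Join  (cost=751.75..543010.17 rows=1 width=34) (actual time=4.282..4720.283 rows=13462 loops=3)
  ->  Hash Join  (cost=751.32..442412.33 rows=99981 width=34) (actual time=3.911..3554.800 rows=151103 loops=3)
"""

for node, est, act, ratio in misestimates(sample):
    print(f"{node}: estimated {est}, actual {act} ({ratio:.0f}x off)")
```

With the default threshold this reports only the anti join, which is where
the rows=1 estimate first appears; the hash join underneath it is off by
roughly 1.5x and is skipped.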