On Fri, Dec 1, 2017 at 4:05 AM, Robert Haas <robertmh...@gmail.com> wrote: > On Wed, Nov 29, 2017 at 11:55 PM, Thomas Munro > <thomas.mu...@enterprisedb.com> wrote: >> Thank you for the original pointer and the commit. Everything here >> seems to make intuitive sense and the accompanying throw-away tests >> that I posted above seem to produce sensible results except in some >> cases that we discussed, so I think this is progress. There is still >> something pretty funny about the cardinality estimates for TPCH Q21 >> which I haven't grokked though. I suspect it is crafted to look for a >> technique we don't know (an ancient challenge set by some long retired >> database gurus back in 1992 that their RDBMSs know how to solve, >> hopefully not in the manner of a certain car manufacturer's air >> pollution tests), but I haven't yet obtained enough round tuits to dig >> further. I will, though. > > Hmm, do you have an example of the better but still-funky estimates > handy? Like an EXPLAIN plan?
Sure. Here's some EXPLAIN ANALYZE output from scale 3 TPCH + a few indexes[1]. There's a version from HEAD with and without commit 7ca25b7d. [1] https://github.com/macdice/pg_sisyphus/blob/master/cluster-recipes/make-tpch-cluster.sh -- Thomas Munro http://www.enterprisedb.com
QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=544020.68..544020.69 rows=1 width=34) (actual time=5070.235..5070.253 rows=100 loops=1) -> Sort (cost=544020.68..544020.69 rows=1 width=34) (actual time=5070.233..5070.241 rows=100 loops=1) Sort Key: (count(*)) DESC, supplier.s_name Sort Method: top-N heapsort Memory: 39kB -> GroupAggregate (cost=544020.65..544020.67 rows=1 width=34) (actual time=5061.289..5068.050 rows=1194 loops=1) Group Key: supplier.s_name -> Sort (cost=544020.65..544020.66 rows=1 width=26) (actual time=5061.275..5063.360 rows=11887 loops=1) Sort Key: supplier.s_name Sort Method: quicksort Memory: 1406kB -> Nested Loop (cost=1752.62..544020.64 rows=1 width=26) (actual time=3.134..4926.365 rows=11887 loops=1) -> Nested Loop Semi Join (cost=1752.19..544015.98 rows=1 width=34) (actual time=3.122..4598.302 rows=24138 loops=1) -> Gather (cost=1751.75..544010.27 rows=1 width=34) (actual time=2.909..4354.282 rows=40387 loops=1) Workers Planned: 2 Workers Launched: 2 -> Nested Loop Anti Join (cost=751.75..543010.17 rows=1 width=34) (actual time=4.282..4720.283 rows=13462 loops=3) -> Hash Join (cost=751.32..442412.33 rows=99981 width=34) (actual time=3.911..3554.800 rows=151103 loops=3) Hash Cond: (l1.l_suppkey = supplier.s_suppkey) -> Parallel Seq Scan on lineitem l1 (cost=0.00..431288.00 rows=2499520 width=8) (actual time=0.046..2742.912 rows=3792542 loops=3) Filter: (l_receiptdate > l_commitdate) Rows Removed by Filter: 2206328 -> Hash (cost=736.32..736.32 rows=1200 width=30) (actual time=3.690..3.690 rows=1194 loops=3) Buckets: 2048 Batches: 1 Memory Usage: 91kB -> Nested Loop (cost=25.59..736.32 rows=1200 width=30) (actual time=0.486..3.206 rows=1194 loops=3) -> Seq Scan on nation (cost=0.00..1.31 rows=1 width=4) (actual time=0.018..0.038 rows=1 loops=3) Filter: (n_name = 'ALGERIA'::bpchar) Rows Removed by Filter: 24 -> Bitmap Heap Scan on supplier (cost=25.59..723.00 rows=1200 width=34) (actual time=0.460..2.809 rows=1194 loops=3) Recheck Cond: (s_nationkey = nation.n_nationkey) Heap Blocks: exact=564 -> Bitmap Index Scan on idx_supplier_nation_key (cost=0.00..25.29 rows=1200 width=0) (actual time=0.350..0.350 rows=1194 loops=3) Index Cond: (s_nationkey = nation.n_nationkey) -> Index Scan using idx_lineitem_orderkey on lineitem l3 (cost=0.44..5.81 rows=15 width=8) (actual time=0.007..0.007 rows=1 loops=453308) Index Cond: (l_orderkey = l1.l_orderkey) Filter: ((l_receiptdate > l_commitdate) AND (l_suppkey <> l1.l_suppkey)) Rows Removed by Filter: 1 -> Index Scan using idx_lineitem_orderkey on lineitem l2 (cost=0.44..5.70 rows=45 width=8) (actual time=0.005..0.005 rows=1 loops=40387) Index Cond: (l_orderkey = l1.l_orderkey) Filter: (l_suppkey <> l1.l_suppkey) Rows Removed by Filter: 1 -> Index Scan using orders_pkey on orders (cost=0.43..4.66 rows=1 width=4) (actual time=0.013..0.013 rows=0 loops=24138) Index Cond: (o_orderkey = l1.l_orderkey) Filter: (o_orderstatus = 'F'::bpchar) Rows Removed by Filter: 1 Planning time: 4.473 ms Execution time: 5070.780 ms (45 rows)
QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=928256.19..928256.20 rows=1 width=34) (actual time=10481.473..10481.495 rows=100 loops=1) -> Sort (cost=928256.19..928256.20 rows=1 width=34) (actual time=10481.472..10481.481 rows=100 loops=1) Sort Key: (count(*)) DESC, supplier.s_name Sort Method: top-N heapsort Memory: 39kB -> GroupAggregate (cost=928256.16..928256.18 rows=1 width=34) (actual time=10471.441..10478.748 rows=1194 loops=1) Group Key: supplier.s_name -> Sort (cost=928256.16..928256.17 rows=1 width=26) (actual time=10469.949..10472.172 rows=11887 loops=1) Sort Key: supplier.s_name Sort Method: quicksort Memory: 1406kB -> Nested Loop Anti Join (cost=172401.80..928256.15 rows=1 width=26) (actual time=2663.307..10340.567 rows=11887 loops=1) -> Gather (cost=172401.37..928250.19 rows=1 width=34) (actual time=2651.873..8386.936 rows=210800 loops=1) Workers Planned: 2 Workers Launched: 2 -> Nested Loop Semi Join (cost=171401.37..927250.09 rows=1 width=34) (actual time=2647.954..8515.304 rows=70267 loops=3) Join Filter: (l2.l_suppkey <> l1.l_suppkey) Rows Removed by Join Filter: 18351 -> Hash Join (cost=171400.93..624463.20 rows=48771 width=38) (actual time=2647.713..6984.199 rows=72959 loops=3) Hash Cond: (l1.l_orderkey = orders.o_orderkey) -> Hash Join (cost=751.32..442412.33 rows=99981 width=34) (actual time=18.468..3552.191 rows=151103 loops=3) Hash Cond: (l1.l_suppkey = supplier.s_suppkey) -> Parallel Seq Scan on lineitem l1 (cost=0.00..431288.00 rows=2499520 width=8) (actual time=0.047..2796.852 rows=3792542 loops=3) Filter: (l_receiptdate > l_commitdate) Rows Removed by Filter: 2206328 -> Hash (cost=736.32..736.32 rows=1200 width=30) (actual time=18.222..18.222 rows=1194 loops=3) Buckets: 2048 Batches: 1 Memory Usage: 91kB -> Nested Loop (cost=25.59..736.32 rows=1200 width=30) (actual time=0.986..17.583 rows=1194 loops=3) -> Seq Scan on nation (cost=0.00..1.31 rows=1 width=4) (actual time=0.020..0.028 rows=1 loops=3) Filter: (n_name = 'ALGERIA'::bpchar) Rows Removed by Filter: 24 -> Bitmap Heap Scan on supplier (cost=25.59..723.00 rows=1200 width=34) (actual time=0.958..17.108 rows=1194 loops=3) Recheck Cond: (s_nationkey = nation.n_nationkey) Heap Blocks: exact=564 -> Bitmap Index Scan on idx_supplier_nation_key (cost=0.00..25.29 rows=1200 width=0) (actual time=0.829..0.829 rows=1194 loops=3) Index Cond: (s_nationkey = nation.n_nationkey) -> Hash (cost=134635.25..134635.25 rows=2195149 width=4) (actual time=2616.583..2616.583 rows=2192233 loops=3) Buckets: 131072 Batches: 32 Memory Usage: 3440kB -> Seq Scan on orders (cost=0.00..134635.25 rows=2195149 width=4) (actual time=0.026..1883.496 rows=2192233 loops=3) Filter: (o_orderstatus = 'F'::bpchar) Rows Removed by Filter: 2307767 -> Index Scan using idx_lineitem_orderkey on lineitem l2 (cost=0.44..5.65 rows=45 width=8) (actual time=0.020..0.020 rows=1 loops=218876) Index Cond: (l_orderkey = orders.o_orderkey) -> Index Scan using idx_lineitem_orderkey on lineitem l3 (cost=0.44..5.81 rows=15 width=8) (actual time=0.009..0.009 rows=1 loops=210800) Index Cond: (l_orderkey = l1.l_orderkey) Filter: ((l_receiptdate > l_commitdate) AND (l_suppkey <> l1.l_suppkey)) Rows Removed by Filter: 1 Planning time: 17.594 ms Execution time: 10483.094 ms (47 rows)