On Fri, Dec 1, 2017 at 4:05 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Nov 29, 2017 at 11:55 PM, Thomas Munro
> <thomas.mu...@enterprisedb.com> wrote:
>> Thank you for the original pointer and the commit.  Everything here
>> seems to make intuitive sense and the accompanying throw-away tests
>> that I posted above seem to produce sensible results except in some
>> cases that we discussed, so I think this is progress.  There is still
>> something pretty funny about the cardinality estimates for TPCH Q21
>> which I haven't grokked though.  I suspect it is crafted to look for a
>> technique we don't know (an ancient challenge set by some long retired
>> database gurus back in 1992 that their RDBMSs know how to solve,
>> hopefully not in the manner of a certain car manufacturer's air
>> pollution tests), but I haven't yet obtained enough round tuits to dig
>> further.  I will, though.
>
> Hmm, do you have an example of the better but still-funky estimates
> handy?  Like an EXPLAIN plan?

Sure.  Here's some EXPLAIN ANALYZE output from scale 3 TPCH + a few
indexes[1].  There's a version from HEAD with and without commit
7ca25b7d.

[1] 
https://github.com/macdice/pg_sisyphus/blob/master/cluster-recipes/make-tpch-cluster.sh

-- 
Thomas Munro
http://www.enterprisedb.com
                                                                                
               QUERY PLAN                                                       
                                         
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=544020.68..544020.69 rows=1 width=34) (actual 
time=5070.235..5070.253 rows=100 loops=1)
   ->  Sort  (cost=544020.68..544020.69 rows=1 width=34) (actual 
time=5070.233..5070.241 rows=100 loops=1)
         Sort Key: (count(*)) DESC, supplier.s_name
         Sort Method: top-N heapsort  Memory: 39kB
         ->  GroupAggregate  (cost=544020.65..544020.67 rows=1 width=34) 
(actual time=5061.289..5068.050 rows=1194 loops=1)
               Group Key: supplier.s_name
               ->  Sort  (cost=544020.65..544020.66 rows=1 width=26) (actual 
time=5061.275..5063.360 rows=11887 loops=1)
                     Sort Key: supplier.s_name
                     Sort Method: quicksort  Memory: 1406kB
                     ->  Nested Loop  (cost=1752.62..544020.64 rows=1 width=26) 
(actual time=3.134..4926.365 rows=11887 loops=1)
                           ->  Nested Loop Semi Join  (cost=1752.19..544015.98 
rows=1 width=34) (actual time=3.122..4598.302 rows=24138 loops=1)
                                 ->  Gather  (cost=1751.75..544010.27 rows=1 
width=34) (actual time=2.909..4354.282 rows=40387 loops=1)
                                       Workers Planned: 2
                                       Workers Launched: 2
                                       ->  Nested Loop Anti Join  
(cost=751.75..543010.17 rows=1 width=34) (actual time=4.282..4720.283 
rows=13462 loops=3)
                                             ->  Hash Join  
(cost=751.32..442412.33 rows=99981 width=34) (actual time=3.911..3554.800 
rows=151103 loops=3)
                                                   Hash Cond: (l1.l_suppkey = 
supplier.s_suppkey)
                                                   ->  Parallel Seq Scan on 
lineitem l1  (cost=0.00..431288.00 rows=2499520 width=8) (actual 
time=0.046..2742.912 rows=3792542 loops=3)
                                                         Filter: (l_receiptdate 
> l_commitdate)
                                                         Rows Removed by 
Filter: 2206328
                                                   ->  Hash  
(cost=736.32..736.32 rows=1200 width=30) (actual time=3.690..3.690 rows=1194 
loops=3)
                                                         Buckets: 2048  
Batches: 1  Memory Usage: 91kB
                                                         ->  Nested Loop  
(cost=25.59..736.32 rows=1200 width=30) (actual time=0.486..3.206 rows=1194 
loops=3)
                                                               ->  Seq Scan on 
nation  (cost=0.00..1.31 rows=1 width=4) (actual time=0.018..0.038 rows=1 
loops=3)
                                                                     Filter: 
(n_name = 'ALGERIA'::bpchar)
                                                                     Rows 
Removed by Filter: 24
                                                               ->  Bitmap Heap 
Scan on supplier  (cost=25.59..723.00 rows=1200 width=34) (actual 
time=0.460..2.809 rows=1194 loops=3)
                                                                     Recheck 
Cond: (s_nationkey = nation.n_nationkey)
                                                                     Heap 
Blocks: exact=564
                                                                     ->  Bitmap 
Index Scan on idx_supplier_nation_key  (cost=0.00..25.29 rows=1200 width=0) 
(actual time=0.350..0.350 rows=1194 loops=3)
                                                                           
Index Cond: (s_nationkey = nation.n_nationkey)
                                             ->  Index Scan using 
idx_lineitem_orderkey on lineitem l3  (cost=0.44..5.81 rows=15 width=8) (actual 
time=0.007..0.007 rows=1 loops=453308)
                                                   Index Cond: (l_orderkey = 
l1.l_orderkey)
                                                   Filter: ((l_receiptdate > 
l_commitdate) AND (l_suppkey <> l1.l_suppkey))
                                                   Rows Removed by Filter: 1
                                 ->  Index Scan using idx_lineitem_orderkey on 
lineitem l2  (cost=0.44..5.70 rows=45 width=8) (actual time=0.005..0.005 rows=1 
loops=40387)
                                       Index Cond: (l_orderkey = l1.l_orderkey)
                                       Filter: (l_suppkey <> l1.l_suppkey)
                                       Rows Removed by Filter: 1
                           ->  Index Scan using orders_pkey on orders  
(cost=0.43..4.66 rows=1 width=4) (actual time=0.013..0.013 rows=0 loops=24138)
                                 Index Cond: (o_orderkey = l1.l_orderkey)
                                 Filter: (o_orderstatus = 'F'::bpchar)
                                 Rows Removed by Filter: 1
 Planning time: 4.473 ms
 Execution time: 5070.780 ms
(45 rows)

                                                                                
               QUERY PLAN                                                       
                                         
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=928256.19..928256.20 rows=1 width=34) (actual 
time=10481.473..10481.495 rows=100 loops=1)
   ->  Sort  (cost=928256.19..928256.20 rows=1 width=34) (actual 
time=10481.472..10481.481 rows=100 loops=1)
         Sort Key: (count(*)) DESC, supplier.s_name
         Sort Method: top-N heapsort  Memory: 39kB
         ->  GroupAggregate  (cost=928256.16..928256.18 rows=1 width=34) 
(actual time=10471.441..10478.748 rows=1194 loops=1)
               Group Key: supplier.s_name
               ->  Sort  (cost=928256.16..928256.17 rows=1 width=26) (actual 
time=10469.949..10472.172 rows=11887 loops=1)
                     Sort Key: supplier.s_name
                     Sort Method: quicksort  Memory: 1406kB
                     ->  Nested Loop Anti Join  (cost=172401.80..928256.15 
rows=1 width=26) (actual time=2663.307..10340.567 rows=11887 loops=1)
                           ->  Gather  (cost=172401.37..928250.19 rows=1 
width=34) (actual time=2651.873..8386.936 rows=210800 loops=1)
                                 Workers Planned: 2
                                 Workers Launched: 2
                                 ->  Nested Loop Semi Join  
(cost=171401.37..927250.09 rows=1 width=34) (actual time=2647.954..8515.304 
rows=70267 loops=3)
                                       Join Filter: (l2.l_suppkey <> 
l1.l_suppkey)
                                       Rows Removed by Join Filter: 18351
                                       ->  Hash Join  
(cost=171400.93..624463.20 rows=48771 width=38) (actual time=2647.713..6984.199 
rows=72959 loops=3)
                                             Hash Cond: (l1.l_orderkey = 
orders.o_orderkey)
                                             ->  Hash Join  
(cost=751.32..442412.33 rows=99981 width=34) (actual time=18.468..3552.191 
rows=151103 loops=3)
                                                   Hash Cond: (l1.l_suppkey = 
supplier.s_suppkey)
                                                   ->  Parallel Seq Scan on 
lineitem l1  (cost=0.00..431288.00 rows=2499520 width=8) (actual 
time=0.047..2796.852 rows=3792542 loops=3)
                                                         Filter: (l_receiptdate 
> l_commitdate)
                                                         Rows Removed by 
Filter: 2206328
                                                   ->  Hash  
(cost=736.32..736.32 rows=1200 width=30) (actual time=18.222..18.222 rows=1194 
loops=3)
                                                         Buckets: 2048  
Batches: 1  Memory Usage: 91kB
                                                         ->  Nested Loop  
(cost=25.59..736.32 rows=1200 width=30) (actual time=0.986..17.583 rows=1194 
loops=3)
                                                               ->  Seq Scan on 
nation  (cost=0.00..1.31 rows=1 width=4) (actual time=0.020..0.028 rows=1 
loops=3)
                                                                     Filter: 
(n_name = 'ALGERIA'::bpchar)
                                                                     Rows 
Removed by Filter: 24
                                                               ->  Bitmap Heap 
Scan on supplier  (cost=25.59..723.00 rows=1200 width=34) (actual 
time=0.958..17.108 rows=1194 loops=3)
                                                                     Recheck 
Cond: (s_nationkey = nation.n_nationkey)
                                                                     Heap 
Blocks: exact=564
                                                                     ->  Bitmap 
Index Scan on idx_supplier_nation_key  (cost=0.00..25.29 rows=1200 width=0) 
(actual time=0.829..0.829 rows=1194 loops=3)
                                                                           
Index Cond: (s_nationkey = nation.n_nationkey)
                                             ->  Hash  
(cost=134635.25..134635.25 rows=2195149 width=4) (actual 
time=2616.583..2616.583 rows=2192233 loops=3)
                                                   Buckets: 131072  Batches: 32 
 Memory Usage: 3440kB
                                                   ->  Seq Scan on orders  
(cost=0.00..134635.25 rows=2195149 width=4) (actual time=0.026..1883.496 
rows=2192233 loops=3)
                                                         Filter: (o_orderstatus 
= 'F'::bpchar)
                                                         Rows Removed by 
Filter: 2307767
                                       ->  Index Scan using 
idx_lineitem_orderkey on lineitem l2  (cost=0.44..5.65 rows=45 width=8) (actual 
time=0.020..0.020 rows=1 loops=218876)
                                             Index Cond: (l_orderkey = 
orders.o_orderkey)
                           ->  Index Scan using idx_lineitem_orderkey on 
lineitem l3  (cost=0.44..5.81 rows=15 width=8) (actual time=0.009..0.009 rows=1 
loops=210800)
                                 Index Cond: (l_orderkey = l1.l_orderkey)
                                 Filter: ((l_receiptdate > l_commitdate) AND 
(l_suppkey <> l1.l_suppkey))
                                 Rows Removed by Filter: 1
 Planning time: 17.594 ms
 Execution time: 10483.094 ms
(47 rows)

Reply via email to