On Fri, Dec 1, 2017 at 4:05 AM, Robert Haas wrote:
> On Wed, Nov 29, 2017 at 11:55 PM, Thomas Munro
> wrote:
>> Thank you for the original pointer and the commit. Everything here
>> seems to make intuitive sense and the accompanying throw-away tests
>> that I posted above seem to produce sensible results except in some
>> cases that we discussed, so I think this is progress. There is still
>> something pretty funny about the cardinality estimates for TPCH Q21
>> which I haven't grokked though. I suspect it is crafted to look for a
>> technique we don't know (an ancient challenge set by some long retired
>> database gurus back in 1992 that their RDBMSs know how to solve,
>> hopefully not in the manner of a certain car manufacturer's air
>> pollution tests), but I haven't yet obtained enough round tuits to dig
>> further. I will, though.
>
> Hmm, do you have an example of the better but still-funky estimates
> handy? Like an EXPLAIN plan?
Sure. Here's some EXPLAIN ANALYZE output from scale 3 TPCH + a few
indexes[1]. There's a version from HEAD with and without commit
7ca25b7d.
[1]
https://github.com/macdice/pg_sisyphus/blob/master/cluster-recipes/make-tpch-cluster.sh
--
Thomas Munro
http://www.enterprisedb.com
QUERY PLAN
-
Limit (cost=544020.68..544020.69 rows=1 width=34) (actual
time=5070.235..5070.253 rows=100 loops=1)
-> Sort (cost=544020.68..544020.69 rows=1 width=34) (actual
time=5070.233..5070.241 rows=100 loops=1)
Sort Key: (count(*)) DESC, supplier.s_name
Sort Method: top-N heapsort Memory: 39kB
-> GroupAggregate (cost=544020.65..544020.67 rows=1 width=34)
(actual time=5061.289..5068.050 rows=1194 loops=1)
Group Key: supplier.s_name
-> Sort (cost=544020.65..544020.66 rows=1 width=26) (actual
time=5061.275..5063.360 rows=11887 loops=1)
Sort Key: supplier.s_name
Sort Method: quicksort Memory: 1406kB
-> Nested Loop (cost=1752.62..544020.64 rows=1 width=26)
(actual time=3.134..4926.365 rows=11887 loops=1)
-> Nested Loop Semi Join (cost=1752.19..544015.98
rows=1 width=34) (actual time=3.122..4598.302 rows=24138 loops=1)
-> Gather (cost=1751.75..544010.27 rows=1
width=34) (actual time=2.909..4354.282 rows=40387 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Nested Loop Anti Join
(cost=751.75..543010.17 rows=1 width=34) (actual time=4.282..4720.283
rows=13462 loops=3)
-> Hash Join
(cost=751.32..442412.33 rows=99981 width=34) (actual time=3.911..3554.800
rows=151103 loops=3)
Hash Cond: (l1.l_suppkey =
supplier.s_suppkey)
-> Parallel Seq Scan on
lineitem l1 (cost=0.00..431288.00 rows=2499520 width=8) (actual
time=0.046..2742.912 rows=3792542 loops=3)
Filter: (l_receiptdate
> l_commitdate)
Rows Removed by
Filter: 2206328
-> Hash
(cost=736.32..736.32 rows=1200 width=30) (actual time=3.690..3.690 rows=1194
loops=3)
Buckets: 2048
Batches: 1 Memory Usage: 91kB
-> Nested Loop
(cost=25.59..736.32 rows=1200 width=30) (actual time=0.486..3.206 rows=1194
loops=3)
-> Seq Scan on
nation (cost=0.00..1.31 rows=1 width=4) (actual time=0.018..0.038 rows=1
loops=3)
Filter:
(n_name = 'ALGERIA'::bpchar)
Rows
Removed by Filter: 24
-> Bitmap Heap
Scan on supplier (cost=25.59..723.00 rows=1200 width=34) (actual
time=0.460..2.809 rows=1194 loops=3)
Recheck
Cond: (s_nationkey = nation.n_nationkey)
Heap
Blocks: exact=564
-> Bitmap
Index Scan