On 01/04/2017 03:21 PM, Dilip Kumar wrote:
> On Wed, Jan 4, 2017 at 8:05 AM, Tomas Vondra
> <tomas.von...@2ndquadrant.com> wrote:
>> Attached is v22 of the patch series, rebased to current master and fixing
>> the reported bug. I haven't made any other changes - the issues reported
>> by Petr are mostly minor, so I've decided to wait a bit more for
>> (hopefully) other reviews.
> v22 fixes the problem I reported. In my test, I observed that the group
> by estimation is much better with the ndistinct stats.
>
> Here is one example:
>
> postgres=# explain analyze select p_brand, p_type, p_size from part
> group by p_brand, p_type, p_size;
>                               QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------
>  HashAggregate  (cost=37992.00..38992.00 rows=100000 width=36) (actual
> time=953.359..1011.302 rows=186607 loops=1)
>    Group Key: p_brand, p_type, p_size
>    ->  Seq Scan on part  (cost=0.00..30492.00 rows=1000000 width=36)
> (actual time=0.013..163.672 rows=1000000 loops=1)
>  Planning time: 0.194 ms
>  Execution time: 1020.776 ms
> (5 rows)
>
> postgres=# CREATE STATISTICS s2 WITH (ndistinct) on (p_brand, p_type,
> p_size) from part;
> CREATE STATISTICS
> postgres=# analyze part;
> ANALYZE
> postgres=# explain analyze select p_brand, p_type, p_size from part
> group by p_brand, p_type, p_size;
>                               QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------
>  HashAggregate  (cost=37992.00..39622.46 rows=163046 width=36) (actual
> time=935.162..992.944 rows=186607 loops=1)
>    Group Key: p_brand, p_type, p_size
>    ->  Seq Scan on part  (cost=0.00..30492.00 rows=1000000 width=36)
> (actual time=0.013..156.746 rows=1000000 loops=1)
>  Planning time: 0.308 ms
>  Execution time: 1001.889 ms
>
> In the above example:
> Without MVStat -> estimated: 100000  Actual: 186607
> With MVStat    -> estimated: 163046  Actual: 186607
Thanks. Those plans match my experiments with the TPC-H data set,
although I've been playing with the smallest scale (1GB).
It's not very difficult to make the estimation error arbitrarily large,
e.g. by using perfectly correlated (identical) columns.
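For instance, something like this (a hypothetical sketch - the table and
statistics names are made up, and the CREATE STATISTICS syntax follows the
form used in the session above):

```sql
-- Two perfectly correlated (identical) columns.
CREATE TABLE t (a int, b int);
INSERT INTO t SELECT i % 1000, i % 1000
  FROM generate_series(1, 1000000) s(i);
ANALYZE t;

-- Without extended statistics the planner treats a and b as
-- independent, so the group estimate tends toward
-- ndistinct(a) * ndistinct(b), while the actual number of
-- groups is only 1000.
EXPLAIN ANALYZE SELECT a, b FROM t GROUP BY a, b;

-- An ndistinct statistic on (a, b) records the combined distinct
-- count during ANALYZE and should repair the estimate.
CREATE STATISTICS s1 WITH (ndistinct) ON (a, b) FROM t;
ANALYZE t;
EXPLAIN ANALYZE SELECT a, b FROM t GROUP BY a, b;
```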
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers