Re: [HACKERS] improving GROUP BY estimation

Tomas Vondra Sat, 19 Mar 2016 06:16:36 -0700

On 03/13/2016 11:09 PM, Tomas Vondra wrote:

Hi,


On Sun, 2016-03-13 at 15:24 +0000, Dean Rasheed wrote:

On 4 March 2016 at 13:10, Tomas Vondra <[email protected]>
wrote:

...
>>>


I think that a better formula to use would be

reldistinct *= (1 - powl(1 - rel-rows / rel->tuples, rel->tuples /
reldistinct)

Attached is a v3 of the patch using this formula instead of the original one. Interestingly, that apparently reduces the number of regression tests that get broken to a single one.

I'm not sure whether we need to provide a link to the PDF the formula comes from - perhaps we should?

I've also repeated the tests for the two tables (dependent and independent columns), comparing the actual number of groups and different estimates, and the results look like this (v3 is the formula used in this patch):



1) independent

               |   10 |   50 |  100 |  500 |  1000 |  5000
   ---------------------------------------------------------
        actual |  919 | 3829 | 6244 | 9944 | 10001 | 10001
       current |   10 |   50 |  102 |  516 |  1018 |  4996
      new (v1) |  973 | 4001 | 6382 | 9897 |  9951 |  9951
      new (v3) | 1117 | 3852 | 6229 | 9943 | 10004 | 10004


2) dependent

                |  10 |   50 |  100 |  500 |  1000 |  5000
     --------------------------------------------------------
         actual |  10 |   50 |  100 |  500 |  1000 |  5000
        current |  10 |   53 |  105 |  508 |  1016 |  5014
       new (v1) | 880 | 4105 | 6472 | 9955 | 10018 | 10018
       new (v3) | 807 | 3680 | 6050 | 9916 |  9983 |  9983

I only collected numbers for the new estimator, the other numbers are just a copy from the previous message. So there might be minor differences due to slightly different ndistinct estimates etc.

Anyway, the numbers are obviously quite close to the formula from v1 of the patch, plus the formula gives better estimates when scanning nearly all rows.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

estimate-num-groups-v3.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] improving GROUP BY estimation

Reply via email to