On Mon, Aug 13, 2012 at 1:11 AM, Alexander Korotkov <aekorot...@gmail.com>wrote:

> On Thu, Aug 9, 2012 at 12:44 AM, Alexander Korotkov 
> <aekorot...@gmail.com>wrote:
>
>> My conclusion is so, that current errors are probably ok for selectivity
>> estimation. But taking into attention that generated datasets ideally fits
>> assumptions of estimation, there could be room for improvement. Especially,
>> it's unclear why estimate for "<@" and "@>" have much greater error than
>> estimate for "&&". Possibly, it's caused by some bugs.
>>
>
> ITSM, I found reason of inaccuracy. Implementation of linear interpolation
> was wrong. Fixed version is attached. Now, need to rerun tests, possible
> refactoring and comments rework.
>

After fixing few more bugs, I've a version with much more reasonable
accuracy.

Statistics target = 100.

Relatively large result sets (>= 10)

test=# select operator, avg(estimate_count::float8/actual_count::float8) as
avg_ratio, avg(exp(abs(ln(estimate_count::float8/actual_count::float8)))) -
1.0 as avg_error from datasets d join test_results tr on tr.test_id =
d.idwhere d.stat_target = 100 and actual_count >= 10 group by
operator;
 operator |    avg_ratio     |     avg_error
----------+------------------+--------------------
 <@       | 1.00404179116863 | 0.0504415454560903
 @>       | 1.06364108531688 |  0.105646077989812
 &&       | 1.00757984721409 | 0.0420984234933233
(3 rows)

Small result sets (1 - 9)

test=# select operator, avg(estimate_count::float8/actual_count::float8) as
avg_ratio, avg(exp(abs(ln(estimate_count::float8/actual_count::float8)))) -
1.0 as avg_error from datasets d join test_results tr on tr.test_id =
d.idwhere d.stat_target = 100 and actual_count between 1 and 9 group
by
operator;
 operator |    avg_ratio     |     avg_error
----------+------------------+-------------------
 <@       | 1.31530838062865 | 0.654886592410495
 @>       | 2.78708078320147 |  1.94124123003433
 &&       | 1.93268112525538 |  1.09904919063335
(3 rows)

Empty result sets

test=# select operator, avg(estimate_count) as avg_estimate, count(*) as
tests_count from datasets d join test_results tr on tr.test_id = d.id where
d.stat_target = 100 and actual_count = 0 group by operator;
 operator |    avg_estimate    | tests_count
----------+--------------------+-------------
 <@       | 1.1437670609645132 |        1099
 @>       | 1.0479430126460701 |       87458
(2 rows)

Statistics target = 1000.

Relatively large result sets (>= 10)

test=# select operator, avg(estimate_count::float8/actual_count::float8) as
avg_ratio, avg(exp(abs(ln(estimate_count::float8/actual_count::float8)))) -
1.0 as avg_error from datasets d join test_results tr on tr.test_id =
d.idwhere d.stat_target = 1000 and actual_count >= 10 group by
operator;
 operator |    avg_ratio     |     avg_error
----------+------------------+--------------------
 <@       | 1.00073999445381 |  0.045099762607524
 @>       | 1.05296320350853 | 0.0907489633452971
 &&       | 1.00217602359039 | 0.0353421159150165
(3 rows)

Small result sets (1 - 9)

test=# select operator, avg(estimate_count::float8/actual_count::float8) as
avg_ratio, avg(exp(abs(ln(estimate_count::float8/actual_count::float8)))) -
1.0 as avg_error from datasets d join test_results tr on tr.test_id =
d.idwhere d.stat_target = 1000 and actual_count between 1 and 9 group
by
operator;
 operator |    avg_ratio     |     avg_error
----------+------------------+-------------------
 <@       | 1.26946358795998 | 0.577803898836364
 @>       | 2.69000633430211 |  1.83165424646645
 &&       | 1.48715184186882 | 0.577998652291105
(3 rows)

Empty result sets

test=# select operator, avg(estimate_count) as avg_estimate, count(*) as
tests_count from datasets d join test_results tr on tr.test_id = d.id where
d.stat_target = 1000 and actual_count = 0 group by operator;
 operator |    avg_estimate    | tests_count
----------+--------------------+-------------
 <@       | 1.0887096774193548 |        1364
 @>       | 1.0423876983771183 |       89224
 &&       | 5.0000000000000000 |           1
(3 rows)

------
With best regards,
Alexander Korotkov.

Attachment: range_stat-0.6.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to