One of the things I think has to change in Postgres is the default selectivity assumptions for inequality operators. They're way too high currently. Probably the single most frequently asked question on -performance and -general is why Postgres isn't using someone's index. And while some of those cases are caused by other things, easily half the time it's simply Postgres making unreasonably pessimistic assumptions about the selectivity of inequalities.
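For comparison, here's roughly what the same experiment looks like on the Postgres side. This is only a sketch, not captured output: the table and index names are made up, generate_series() needs a new-enough server, and the row counts in the comments are just what I'd expect if the stock defaults really are about one third for a single inequality and half a percent for a range.

  create table test2 (a integer);
  insert into test2 select * from generate_series(1, 100000);
  create index test2_a_idx on test2 (a);
  analyze test2;

  -- hide the comparison value in a scalar subquery so the planner can't
  -- consult the histogram and has to fall back on its default selectivity
  explain select 1 from test2 where a > (select 90000);
  -- expect something like: Seq Scan on test2  (rows=33333 ...)

  explain select 1 from test2 where a > (select 40000) and a < (select 60000);
  -- expect something like: rows=500

With estimates like ~33% of the table matching, it's no surprise the planner keeps picking sequential scans.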
Just out of curiosity I checked what Oracle does. This took me a lot of effort, so I hope you find the data useful. The short of it is that Oracle assumes 5% for a single inequality. It assumes 0.25% for a range query, but that's just 5% times 5%, so I would guess it's not handling range queries as a special case.

This is actually a startling coincidence. I was already planning to argue for precisely 5% myself as a reasonable compromise. I think even lower values make sense, but at least 5% would be low enough to consistently cause index scans, which seems to match users' expectations.

Methodology: This is with Oracle 10g (aka 10.1.0.2.0) on Linux. I created a table with 100,000 records containing a single integer column populated with integers ranging from 1 to 100,000 and ran "analyze table test2 compute statistics" on it. That's vaguely equivalent to "vacuum full analyze" as far as gathering statistics goes: it scans the entire table rather than sampling.

I then ran:

  explain plan for select 1 from test2 where a > :0;
  select cardinality from plan_table;

The result was 5000. I did the same thing for "where a > :0 and a < :1"; the result was 250.

I intend to try some other distributions and other where clauses tomorrow. But for now it's time for some sleep.

--
greg