Peter Eisentraut wrote:
Arturo Pérez wrote:
The DBA therefore pokes the
right information into
the planner's statistical tables (or, perhaps, a more human-
manageable one that gets
"compiled" into the planner's stats).

I think we're perfectly capable of producing a system that can collect the statistics. We just don't want to collect every possible statistic, but just those that someone declared to be interesting beforehand. There need not be any manual poking. Just manual declaring.

But we need to work this from the other end anyway. We need to determine first, what sort of statistics the planner could make use of. Then we can figure out the difficulties in collecting them.

I've been told that oracle has an interesting feature regarding
materialized views that gave me an idea how to declare what statistics
to gather. It seems as if oracle is able to figure out that it can
use a certain materialized view to speed up execution of a certain
query, even if the query doesn't use that view explicitly. So, e.g.
if you do

1) create materialized view v as select * from t1 join t2 on t1.t2_id =
2) select * from t1 join t2 on t1.t2_id = join t3 on t3.t2_id =

then oracle seems to be able to use the already-joined tuples in v, and
only needs to join t3 to those, instead of having to rejoin t1 and t2.

That gave me the idea that something similar could be used to declare
what statistics to gather, in a very general way. Imagine that I could

1) create statistics for select * from t1 join t2 on t1.t2_id and t1.flag = TRUE. 2) select * from t1 join t2 on t1.t2_id and t1.flag = TRUE join t3 on ... join t4 on ...

The command 1) would basically gather the same statistics for the result
of the query as it would gather for a normal table with the same signature.
When planning 2), postgres would recognize that it can use those
statistics (similar to how oracle recognizes that it can use a certain
materialized view), and would thus. know the selectivity of that
particular join very accurately.

I think there might even be a way to do (1) without actually executing the (whole) query. If every access-method in the query plan could be
told to deliver only say 10% of the rows it would deliver "normally",
but the rest of the plan was executed normally, then the result should
have the same statistical properties as the complete result would have.

greetings, Florian Pflug

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to [EMAIL PROTECTED] so that your
      message can get through to the mailing list cleanly

Reply via email to