On Sep 13, 2006, at 14:44, Gregory Stark wrote:
I think we need a serious statistics jock to pipe up with some standard metrics that do what we need. Otherwise we'll never have a solid footing for the predictions we make and will never know how much we can trust them.

That said, I'm now going to do exactly what I just said we should stop doing and brainstorm an ad-hoc metric that might help:
I wonder if what we need is something like this: sort the sampled values by value and count up the average number of distinct blocks per value. That might let us predict how many pages a fetch of a specific value would retrieve. Or perhaps we need a second histogram where the quantities are of distinct pages rather than total records.
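A rough sketch of the distinct-blocks-per-value idea, assuming the sampler gives us (value, block number) pairs; the function name and input shape here are hypothetical, not anything PostgreSQL actually exposes:

```python
from collections import defaultdict

def avg_distinct_blocks_per_value(samples):
    """samples: iterable of (value, block_number) pairs from a sample scan.

    Returns the average number of distinct heap blocks holding each
    distinct value -- a proxy for how many pages a fetch of one value
    would touch.
    """
    blocks_by_value = defaultdict(set)
    for value, block in samples:
        blocks_by_value[value].add(block)
    if not blocks_by_value:
        return 0.0
    return sum(len(b) for b in blocks_by_value.values()) / len(blocks_by_value)

# 'a' is spread across 3 blocks, 'b' is clustered on 1 block:
samples = [('a', 1), ('a', 7), ('a', 12), ('b', 3), ('b', 3)]
print(avg_distinct_blocks_per_value(samples))  # (3 + 1) / 2 = 2.0
```

A high average suggests an index scan on that column would incur many random page fetches; a value near 1 suggests the values are well clustered.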
We might also need a separate "average number of n-block spans per value" metric to predict how sequential the I/O will be, in addition to how many pages will be fetched.
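One way to read "n-block spans" is maximal runs of consecutive block numbers: fewer spans for the same number of blocks means more sequential I/O. A minimal sketch under that assumption (again with a hypothetical function name and input shape):

```python
from collections import defaultdict

def avg_spans_per_value(samples):
    """samples: iterable of (value, block_number) pairs.

    A "span" is a maximal run of consecutive block numbers for one value.
    Averaging span counts across values estimates how sequential the I/O
    for a typical value-fetch would be.
    """
    blocks_by_value = defaultdict(set)
    for value, block in samples:
        blocks_by_value[value].add(block)
    if not blocks_by_value:
        return 0.0
    total_spans = 0
    for blocks in blocks_by_value.values():
        ordered = sorted(blocks)
        # Each gap larger than one block starts a new span.
        total_spans += 1 + sum(1 for a, b in zip(ordered, ordered[1:]) if b - a > 1)
    return total_spans / len(blocks_by_value)

# 'a' occupies blocks {1, 2} and {10}: two spans; 'b' occupies one span.
samples = [('a', 1), ('a', 2), ('a', 10), ('b', 5)]
print(avg_spans_per_value(samples))  # (2 + 1) / 2 = 1.5
```

Combined with the distinct-blocks metric above, this would distinguish a value scattered over 10 isolated pages (10 spans: mostly random I/O) from one stored on 10 contiguous pages (1 span: one sequential read).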
Currently, statistics are only collected during an "ANALYZE". Why aren't statistics collected during actual query runs, such as seq scans? One could turn such a beast off in order to get repeatable, deterministic optimizer results.
-M