On 01/10/14 05:54, Jeff Janes wrote:
On Mon, Sep 29, 2014 at 7:12 PM, Gavin Flower
<gavinflo...@archidevsys.co.nz <mailto:gavinflo...@archidevsys.co.nz>>
wrote:
Would it be feasible to get a competent statistician to advise what data to
collect, and to analyze it? Maybe it is possible to get a better estimate on
how much of a table needs to be scanned, based on some fairly simple
statistics. But unless research is done, it is probably impossible to
determine what statistics might be useful, and how effective a better estimate
could be.
I have a nasty feeling that assuming a uniform distribution, may
still end up being the best we can do - but I maybe being unduly
pessimistic!.
As a semi-competent statistician, my gut feeling is that our best bet
would be not to rely on the competence of statisticians for too much,
and instead try to give the executor the ability to abandon a
fruitless path and pick a different plan instead. Of course this
option is foreclosed once a tuple is returned to the client (unless
the ctid is also cached, so we can make sure not to send it again on
the new plan).
I think that the exponential explosion of possibilities is going to be
too great to analyze in any rigorous way.
Cheers,
Jeff
Many moons ago, I passed several 300 level statistics papers.
I looked at this problem and found it was too hard to even properly
characterise the problem (looks 'simple' - if you don't look too
closely), and ended up feeling it was definitely 'way above my pay
grade'! :-)
It might be possible to tackle it more pragmatically, instead of trying
to be all analytic and rigorously list all the possible influences, have
a look at queries of this nature that are taking far too long. Then get
a feel for combinations of issues involved and how they contribute. If
you have enough data, you might be able to use something like Principle
Component Analysis (I was fortunate to meet a scientist who had got
heavily into this area of statistics). Such an approach might yield
valuable insights, even if the problem is not fully characterised, let
alone 'solved'.
Cheers,
Gavin