[ Sorry, forgot to cc list ]
>> It is said to be 10%. I would like to raise that, because we are getting bad
>> estimates for n_distinct.
>
> More to the point, the estimator we use is going to be biased for many
> ( probably most ) distributions no matter how large your sample size
> is.
>
> If
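To make that bias concrete, here is a toy harness (Python, purely illustrative: the Zipf-ish table and the sample fractions are invented, and this is only a Haas-Stokes-style estimator of the kind analyze.c uses, not the real code path):

import random
from collections import Counter

def haas_stokes(sample, total_rows):
    # Duj1-style estimate: n*d / (n - f1 + f1*n/N), where f1 is the
    # number of values seen exactly once in the sample.
    n = len(sample)
    counts = Counter(sample)
    d = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    denom = (n - f1) + f1 * n / total_rows
    return n * d / denom if denom > 0 else d

# Skewed "table": value i appears roughly N/(10*i) times, so a handful of
# values dominate and the tail is full of rare ones.  (Made-up data.)
N = 1_000_000
table = []
i = 1
while len(table) < N:
    table.extend([i] * max(1, N // (10 * i)))
    i += 1
table = table[:N]
true_d = len(set(table))

for pct in (1, 10, 30):
    sample = random.sample(table, N * pct // 100)
    print(pct, "% sample ->", round(haas_stokes(sample, N)), "vs true", true_d)

It is just a harness for playing with different distributions and sample fractions, but that is exactly the kind of experiment the point above is about.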
> The accesses to an index are far more likely to be clustered than the
> accesses to the underlying table, because the index is organized in a
> way that's application-meaningful and the table not so much.
So, to clarify, are you saying that if the query were actually requesting
rows uniformly at random
>> If you model the costing to reflect the reality on your server, good
>> plans will be chosen.
>
> Wouldn't it be "better" to derive those costs from actual performance
> data measured at runtime?
>
> Say, pg could measure random/seq page cost, *per tablespace* even.
>
> Has that been tried?
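For what it's worth, the measurement itself is easy to approximate outside the server. A rough sketch (Python; the path and probe count are made up, and unless the file is much larger than RAM the OS page cache and readahead will flatter the random case):

import os, random, time

BLOCK = 8192                    # PostgreSQL's page size
PATH = "/tmp/costprobe.dat"     # hypothetical scratch file on the disk in question
PROBES = 2000

fd = os.open(PATH, os.O_RDONLY)
nblocks = os.fstat(fd).st_size // BLOCK

t0 = time.perf_counter()
for _ in range(PROBES):         # sequential 8kB reads from the start of the file
    os.read(fd, BLOCK)
seq = time.perf_counter() - t0

t0 = time.perf_counter()
for _ in range(PROBES):         # the same number of reads at random offsets
    os.lseek(fd, random.randrange(nblocks) * BLOCK, os.SEEK_SET)
    os.read(fd, BLOCK)
rnd = time.perf_counter() - t0

os.close(fd)
print("random/sequential ratio:", round(rnd / seq, 1))

That ratio is roughly what random_page_cost / seq_page_cost is meant to capture, so even a crude probe like this gives a sanity check on the values being fed to the planner.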
FWI
> how EXPLAIN works as math equations to estimate cost with constant query
> parameters such as cpu_tuple_cost, random_page_cost, etc.
> I want a maths expression in order to know how these parameters will
> affect the cost?
The expressions are complicated, and they are certainly not linear.
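To give a flavour of the simplest case: a plain sequential scan is costed as roughly pages * seq_page_cost plus tuples * (cpu_tuple_cost + cpu_operator_cost per qual clause); see cost_seqscan() in costsize.c. A toy sketch (the table sizes are invented; index scans, sorts and joins are where the genuinely complicated expressions live):

# Default GUC values
seq_page_cost     = 1.0
cpu_tuple_cost    = 0.01
cpu_operator_cost = 0.0025

def seqscan_cost(pages, tuples, nquals=1):
    # disk component + per-tuple CPU component
    disk = pages * seq_page_cost
    cpu = tuples * (cpu_tuple_cost + nquals * cpu_operator_cost)
    return disk + cpu

# e.g. a 10,000-page table with 1,000,000 rows and one WHERE clause:
print(seqscan_cost(10_000, 1_000_000))   # 10000 + 12500 = 22500.0

Index scan costing, by contrast, roughly interpolates between random_page_cost and seq_page_cost based on the index correlation (among other things), which is one place the non-linearity mentioned above comes from.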
> mergejoinscansel doesn't currently try to fix up the histogram bounds by
> consulting indexes. At the time I was afraid of the costs of doing
> that, and I still am; but it would be a way to address this issue.
>
Another cheaper but less accurate way to deal with this is to note
that we are try
ally, we could keep
the current estimates ( every passed array would be of length one )
and then make changes as problems appear ( like Josh's )
I hope my little estimation procedure tutorial has been a little
helpful; please feel free to contact me off list if you have
questions or want references.
> Uh, no, it wouldn't. Visually:
>
> L1 -
> L2 ---
> L3 -
>
> R1
>
> At L2, you'd conclude that you're done matching R1.
>
No, you should conclude that you're done matching