On 1/15/19 11:07 AM, Surafel Temesgen wrote:
> 
> 
> On Wed, Jan 2, 2019 at 6:19 PM Tomas Vondra
> <tomas.von...@2ndquadrant.com <mailto:tomas.von...@2ndquadrant.com>> wrote:
> 
>     After looking at the "FETCH FIRST ... PERCENT" patch, I wonder if this
>     patch should tweak estimates in some way. Currently, the cardinality
>     estimate is the same as for plain LIMIT, using the requested number of
>     rows. But let's say there are very few large groups - that will
>     naturally increase the number of rows produced.
> 
>     As an example, let's say the subplan produces 1M rows, and there are
>     1000 groups (when split according to the ORDER BY clause).
> 
> 
>  
> 
> can we reliably use the ORDER BY column's raw statistics in the Limit
> node? It seems to me they can be affected by other operations in the
> subplan, such as filter conditions.
> 

What do you mean by "raw statistics"? Using stats from the underlying
table is not quite possible, because you might be operating on top of a
join or something like that.

IMHO the best thing you can do is call estimate_num_groups() and combine
that with the number of input rows. That will benefit from ndistinct
coefficients when available, etc. I've been thinking that, given the
unreliability of grouping estimates, we should use a multiple of the
average group size (because there may be much larger groups), but I think
that's quite unprincipled and I'd much rather try without it first.

But maybe we can do better when there really is a single table to
consider, in which case we might look at MCV lists and estimate the
largest group. That would give us a much better idea of the worst case.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
