On Tue, Dec 05, 2017 at 01:50:11PM -0500, Tom Lane wrote:
> Jeff Janes <jeff.ja...@gmail.com> writes:
> > On Dec 3, 2017 15:31, "Tom Lane" <t...@sss.pgh.pa.us> wrote:
> >> Jeff Janes <jeff.ja...@gmail.com> writes:
> >>> But I do see that ties within the logical order of the column values are
> >>> broken to agree with the physical order.  That is wrong, right?  Is there
> >>> any argument that this is desirable?
> >> Uh ... what do you propose doing instead?  We'd have to do something with
> >> ties, and it's not so obvious this way is wrong.
> > Let them be tied.
> I thought some more about this.  What we really want the correlation stat
> to do is help us estimate how randomly an index-ordered scan will access
> the heap.  If the values we've sampled are all unequal then there's no
> particular issue.  However, if we have some group of equal values, we
> do not really know what order an indexscan will visit them in.  The
> existing correlation calculation is making the *most optimistic possible*
> assumption, that such a group will be visited exactly in heap order ---
> and that assumption isn't too defensible.

I'm interested in discusstion regarding bitmap cost, since it would have helped
our case discussed here ~18 months ago:

...but remember: in Vitaliy's case (as opposed to mine), the index scan is
*faster* but being estimated at higher cost than bitmap (I have to keep
reminding myself).  So the rest of this discussion is about the
overly-optimistic cost estimate of index scans, which moves in the opposite
direction for this reported problem.  For the test cases I looked at, index
scans were used when RPC=1 and redundant conditions were avoided, so I'm not
sure if there's any remaining issue (but I haven't looked at the latest cases
Vitaliy sent).

> In any case, given that we do this calculation without regard
> to any specific index,

One solution is to compute stats (at least correlation) for all indices, not
just expr inds.  I did that earlier this year while throwing around/out ideas.

> We do have an idea, from the data we have, whether the duplicates are close
> together in the heap or spread all over.

I think you just mean pg_stats.correlation for all values, not just duplicates
(with the understanding that duplicates might be a large fraction of the
tuples, and high weight in correlation).

Another issue I noted in an earlier thread is that as table sizes increase, the
existing correlation computation approaches 1 for correlated insertions, (like
"append-only" timestamps clustered around now()), due to ANALYZE sampling a
fraction of the table, and thereby representing only large-scale correlation,
and, to an increasing degree, failing to represent small-scale variations
between adjacent index TIDs, which has real cost (and for which the mitigation
by cache effects probably decreases WRT table size, too).  I think any solution
needs to handle this somehow.

Generated data demonstrating this (I reused this query so it's more complicated
than it needs to be):

[pryzbyj@database ~]$ time for sz in 9999{,9{,9{,9{,9}}}} ; do psql postgres 
t(i);INSERT INTO t SELECT i/99999.0+pow(2,(-random())) FROM 
generate_series(1,$sz) i ORDER BY i; ANALYZE t; SELECT $sz, correlation, 
most_common_freqs[1] FROM pg_stats WHERE attname='i' AND tablename='t'"; done

     9999 |    0.187146 |                  
    99999 |    0.900629 |                  
   999999 |    0.998772 |                  
  9999999 |    0.999987 |                  

Trying to keep it all in my own head: For sufficiently large number of pages,
bitmap scan should be preferred to idx scan due to reduced random-page-cost
outweighing its overhead in CPU cost.  Probably by penalizing index scans, not
discounting bitmap scans.  Conceivably a correlation adjustment can be
conditionalized or weighted based on index_pages_fetched() ...
        x = ln (x/999999);
        if (x>1) correlation/=x;

I think one could look at the fraction of duplicated index keys expected to be
returned: if we expect to return 1000 tuples, with 200 duplicates from MCV,
cost_index would multiply correlation by (1 - 200/1000), meaning to use
something closer to max_IO_cost rather than min_IO_cost.  I imagine it'd be
possible to limit to only those MCVs which pass quals - if none pass, then
there may be few tuples returned, so apply no correction to (additionally)
penalize index scan.

In my tests, at one point I implemented idx_corr_fudge(), returning a value
like "fragmentation" from pgstatindex (which I'm sure is where I got the phrase
when reporting the problem).  That only uses the leaf nodes' "next" pointer,
and not the individual tuples, which probably works if there's a sufficiently
number of repeated keys.

I think that's all for now..


Reply via email to