On Wed, Dec 6, 2017 at 1:46 PM, Justin Pryzby <pry...@telsasoft.com> wrote:

## Advertising

> On Tue, Dec 05, 2017 at 01:50:11PM -0500, Tom Lane wrote: > > Jeff Janes <jeff.ja...@gmail.com> writes: > > > On Dec 3, 2017 15:31, "Tom Lane" <t...@sss.pgh.pa.us> wrote: > > >> Jeff Janes <jeff.ja...@gmail.com> writes: > > >>> But I do see that ties within the logical order of the column values > are > > >>> broken to agree with the physical order. That is wrong, right? Is > there > > >>> any argument that this is desirable? > > > > >> Uh ... what do you propose doing instead? We'd have to do something > with > > >> ties, and it's not so obvious this way is wrong. > > > > > Let them be tied. > ... > > I thought some more about this. What we really want the correlation stat > > to do is help us estimate how randomly an index-ordered scan will access > > the heap. If the values we've sampled are all unequal then there's no > > particular issue. However, if we have some group of equal values, we > > do not really know what order an indexscan will visit them in. The > > existing correlation calculation is making the *most optimistic possible* > > assumption, that such a group will be visited exactly in heap order --- > > and that assumption isn't too defensible. > > I'm interested in discusstion regarding bitmap cost, since it would have > helped > our case discussed here ~18 months ago: > https://www.postgresql.org/message-id/flat/20160524173914.GA11880% > 40telsasoft.com#20160524173914.ga11...@telsasoft.com > > ...but remember: in Vitaliy's case (as opposed to mine), the index scan is > *faster* but being estimated at higher cost than bitmap (I have to keep > reminding myself). So the rest of this discussion is about the > overly-optimistic cost estimate of index scans, which moves in the opposite > direction for this reported problem. For the test cases I looked at, index > scans were used when RPC=1 and redundant conditions were avoided, so I'm > not > sure if there's any remaining issue (but I haven't looked at the latest > cases > Vitaliy sent). > > > In any case, given that we do this calculation without regard > > to any specific index, > > One solution is to compute stats (at least correlation) for all indices, > not > just expr inds. I did that earlier this year while throwing around/out > ideas. > https://www.postgresql.org/message-id/20170707234119. > GN17566%40telsasoft.com When is the correlation of a column which is not the leading column of a btree index or in a brin index ever used? If we did compute index-specific correlations, we could maybe just drop pure-column correlations. > > > We do have an idea, from the data we have, whether the duplicates are > close > > together in the heap or spread all over. > > I think you just mean pg_stats.correlation for all values, not just > duplicates > (with the understanding that duplicates might be a large fraction of the > tuples, and high weight in correlation). > > Another issue I noted in an earlier thread is that as table sizes > increase, the > existing correlation computation approaches 1 for correlated insertions, > (like > "append-only" timestamps clustered around now()), due to ANALYZE sampling a > fraction of the table, and thereby representing only large-scale > correlation, > That isn't due to sampling. That is due to the definition of linear correlation. Large scale is what it is about. > Generated data demonstrating this (I reused this query so it's more > complicated > than it needs to be): > > [pryzbyj@database ~]$ time for sz in 9999{,9{,9{,9{,9}}}} ; do psql > postgres -tc "DROP TABLE IF EXISTS t; CREATE TABLE t(i float, j int); > CREATE INDEX ON t(i);INSERT INTO t SELECT i/99999.0+pow(2,(-random())) FROM > generate_series(1,$sz) i ORDER BY i; ANALYZE t; SELECT $sz, correlation, > most_common_freqs[1] FROM pg_stats WHERE attname='i' AND tablename='t'"; > done > > 9999 | 0.187146 | > 99999 | 0.900629 | > 999999 | 0.998772 | > 9999999 | 0.999987 | > Because the amount of jitter introduced is constant WRT $sz, but the range of i/99999.0 increases with $sz, the correlation actually does increase; it is not a sampling effect. Trying to keep it all in my own head: For sufficiently large number of > pages, > bitmap scan should be preferred to idx scan due to reduced random-page-cost > outweighing its overhead in CPU cost. But CPU cost is probably not why it is losing anyway. Index scans get a double bonus from high correlation. It assumes that only a small fraction of the table will be visited. And then it assumes that the visits it does make will be largely sequential. I think that you are saying that for a large enough table, that last assumption is wrong, that the residual amount of non-correlation is enough to make the table reads more random than sequential. Maybe. Do you have a test case that demonstrates this? If so, how big do we need to go, and can you see the problem on SSD as well as HDD? The thing is, the bitmap scan gets cheated out of one of these bonuses. It gets no credit for visiting only a small part of the table when the correlation is high, but does get credit for being mostly sequential in whatever visits it does make (even if the correlation is low, because of course the bitmap perfects the correlation). I think it would be easier to give bitmap scans their due rather than try to knock down index scans. But of course a synthetic test case would go a long way to either one. Probably by penalizing index scans, not > discounting bitmap scans. Conceivably a correlation adjustment can be > conditionalized or weighted based on index_pages_fetched() ... > x = ln (x/999999); > if (x>1) correlation/=x; > I think we should go with something with some statistical principle behind it if we want to do that. There is the notion of "restricted range" in correlations, that if there is a high correlation over the full range of data, but you zoom into only a small fraction of that range, the correlation you see over that restricted range will be much less than the full correlation is. I don't know that field well enough to give a formula off the top of my head, but I think it will be based on the fraction of the key space which is being scanned, and (1-RSQ), rather than an arbitrary number like 999999. Cheers, Jeff