Re: [HACKERS] new correlation metric

Heikki Linnakangas Sun, 26 Oct 2008 08:58:14 -0700

Martijn van Oosterhout wrote:

On Sun, Oct 26, 2008 at 01:38:02AM -0700, Jeff Davis wrote:

I worked with Nathan Boley to come up with what we think is a better
metric for measuring this cost. It is based on the number of times in
the ordered sample that you have to physically backtrack (i.e. the data
value increases, but the physical position is earlier).


For example, if the table's physical order is

6 7 8 9 10 1 2 3 4 5


How does it deal with a case like the following:

1 6 2 7 3 8 4 9 5 10  (interleaving)

ISTM that your code will overestimate the cost whereas the old code
wouldn't have done too badly.

Another similar case is where the tables is almost in order, but notquite. For example, take an ordered table, but swap every other value:


2 1 4 3 6 5 7 8 10 9

Distributions similar to that that would happen naturully if you have alogging application that receives timestamped events from elsewhere. Theevents would be come in roughly in timestamp order, but not quitebecause of different delays, for example. For all practical purposes,that is just as good as a perfectly sorted table.

However, the patch actually operates on the *sample*, not the realphysical order. So I think it would actually work well for my example,because if you take just a few values from that sequence in random, thesample is likely to be in good order. Not sure about the interleaved case.

I wonder how the proposed measurement behaves wrt. the sample size vs.table size ratio. Intuitively, it feels like it becomes less sensitiveto disorder, as the table size grows and (sample size)/(table size)becomes smaller. Which is not good, I believe.

I think the code is in the right direction, but I think want you want
is some kind of estimate of "given I've looked for tuple X, how many
tuples in the next k pages are near this one". Unfortunatly I don't see
a way of calculating it other than a full simulation.

Yeah, it would be nice to figure out something, as the current methodsure isn't perfect.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] new correlation metric

Reply via email to