Martijn van Oosterhout wrote:
On Sun, Oct 26, 2008 at 01:38:02AM -0700, Jeff Davis wrote:
I worked with Nathan Boley to come up with what we think is a better
metric for measuring this cost. It is based on the number of times in
the ordered sample that you have to physically backtrack (i.e. the data
value increases, but the physical position is earlier).
For example, if the table's physical order is
6 7 8 9 10 1 2 3 4 5
How does it deal with a case like the following:
1 6 2 7 3 8 4 9 5 10 (interleaving)
ISTM that your code will overestimate the cost whereas the old code
wouldn't have done too badly.
Another similar case is where the tables is almost in order, but not
quite. For example, take an ordered table, but swap every other value:
2 1 4 3 6 5 7 8 10 9
Distributions similar to that that would happen naturully if you have a
logging application that receives timestamped events from elsewhere. The
events would be come in roughly in timestamp order, but not quite
because of different delays, for example. For all practical purposes,
that is just as good as a perfectly sorted table.
However, the patch actually operates on the *sample*, not the real
physical order. So I think it would actually work well for my example,
because if you take just a few values from that sequence in random, the
sample is likely to be in good order. Not sure about the interleaved case.
I wonder how the proposed measurement behaves wrt. the sample size vs.
table size ratio. Intuitively, it feels like it becomes less sensitive
to disorder, as the table size grows and (sample size)/(table size)
becomes smaller. Which is not good, I believe.
I think the code is in the right direction, but I think want you want
is some kind of estimate of "given I've looked for tuple X, how many
tuples in the next k pages are near this one". Unfortunatly I don't see
a way of calculating it other than a full simulation.
Yeah, it would be nice to figure out something, as the current method
sure isn't perfect.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers