Re: [HACKERS] GiST penalty functions [PoC]

Heikki Linnakangas Tue, 06 Sep 2016 09:58:45 -0700

On 08/29/2016 09:04 AM, Andrew Borodin wrote:

In this message I address only first problem. I want to construct a
penalty function that will choose:
1.    Entry with a zero volume and smallest margin, that can
accommodate insertion tuple without extension, if one exists;
2.    Entry with smallest volume, that can accommodate insertion tuple
without extension, if one exists;
3.    Entry with zero volume increase after insert and smallest margin
increase, if one exists;
4.    Entry with minimal volume increase after insert.

Looking at the code, by "margin", you mean the sum of all "edges", i.e.of each dimension, of the cube. I guess the point of that is todifferentiate between cubes that have the same volume, but a differentshape.

So, let's try to come up with a scheme that doesn't require IEEE 754floats. Those above cases can be split into two categories, depending onwhether the new value has zero volume or not. We can use a completelydifferent scheme for the two cases, if we want, because when we'rechoosing a target page to insert to, penalty gets called for everyoriginal tuple, but with the same new tuple.

Here's a scheme I just came up with. There might be better ones, butit's something.


Let's have:

oX: Original tuple's edge sum
nX: New tuple's edge sum
dX: Edge increase

oV: Original tuple's volume
nX: New tuple's volume
dX: Volume increase

Case A: New entry has zero volume.
------

Two sub-cases:
A1: if dE > 0, use dE. dE must be in the range [0, nE]
A2: otherwise, use oE.

So how do we cram those two into a single float?

If we offset case A1 by n, we can use the range between [0, nE] for A2.Something like this pseudocode:


if (dE > 0)
    return nE + dE;       /* A1, offset dE into range [nE, inf] */
else
    return nE - (nE/oE);  /* A2, scale oE into range [0, nE] */

Case B: New entry has non-zero volume
------
B1: if dV > 0. use dV. dV must be in the range [0, nV].
B2: if dE > 0, use dE. dE must be in the range [0, nE].
B3: oV, otherwise.

By offsetting cases B1 and B2, we can again divide the range into threeparts, with pseudocode like:


if (dV > 0)
    return nV + nE + dV;   /* B1, offset dV into range [nV+nE, inf] */
else if (dE > 0)
    return nV + dE;        /* B2, offset dE into range [nV, nV+nE] */
else
    return nV - (nV/oV)    /* B3, scale oV into range [0, nV]

I’ve tested patch performance with attached test. On my machine patch
slows index construction time from 60 to 76 seconds, reduces size of
the index from 85Mb to 82Mb, reduces time of aggregates computation
from 36 seconds to 29 seconds.

Hmm. That's a pretty large increase in construction time. Have you doneany profiling on where the time goes?

I do not know: should I continue this work in cube, or it’d be better
to fork cube?

Should definitely continue work within cube, IMHO. I don't think forkingit to a new datatype would make this any easier.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GiST penalty functions [PoC]

Reply via email to