On 08/29/2016 09:04 AM, Andrew Borodin wrote:
In this message I address only first problem. I want to construct a
penalty function that will choose:
1.    Entry with a zero volume and smallest margin, that can
accommodate insertion tuple without extension, if one exists;
2.    Entry with smallest volume, that can accommodate insertion tuple
without extension, if one exists;
3.    Entry with zero volume increase after insert and smallest margin
increase, if one exists;
4.    Entry with minimal volume increase after insert.


Looking at the code, by "margin", you mean the sum of all "edges", i.e. of each dimension, of the cube. I guess the point of that is to differentiate between cubes that have the same volume, but a different shape.

So, let's try to come up with a scheme that doesn't require IEEE 754 floats. Those above cases can be split into two categories, depending on whether the new value has zero volume or not. We can use a completely different scheme for the two cases, if we want, because when we're choosing a target page to insert to, penalty gets called for every original tuple, but with the same new tuple.

Here's a scheme I just came up with. There might be better ones, but it's something.

Let's have:

oX: Original tuple's edge sum
nX: New tuple's edge sum
dX: Edge increase

oV: Original tuple's volume
nX: New tuple's volume
dX: Volume increase

Case A: New entry has zero volume.
------

Two sub-cases:
A1: if dE > 0, use dE. dE must be in the range [0, nE]
A2: otherwise, use oE.

So how do we cram those two into a single float?

If we offset case A1 by n, we can use the range between [0, nE] for A2. Something like this pseudocode:

if (dE > 0)
    return nE + dE;       /* A1, offset dE into range [nE, inf] */
else
    return nE - (nE/oE);  /* A2, scale oE into range [0, nE] */

Case B: New entry has non-zero volume
------
B1: if dV > 0. use dV. dV must be in the range [0, nV].
B2: if dE > 0, use dE. dE must be in the range [0, nE].
B3: oV, otherwise.

By offsetting cases B1 and B2, we can again divide the range into three parts, with pseudocode like:

if (dV > 0)
    return nV + nE + dV;   /* B1, offset dV into range [nV+nE, inf] */
else if (dE > 0)
    return nV + dE;        /* B2, offset dE into range [nV, nV+nE] */
else
    return nV - (nV/oV)    /* B3, scale oV into range [0, nV]

I’ve tested patch performance with attached test. On my machine patch
slows index construction time from 60 to 76 seconds, reduces size of
the index from 85Mb to 82Mb, reduces time of aggregates computation
from 36 seconds to 29 seconds.

Hmm. That's a pretty large increase in construction time. Have you done any profiling on where the time goes?

I do not know: should I continue this work in cube, or it’d be better
to fork cube?

Should definitely continue work within cube, IMHO. I don't think forking it to a new datatype would make this any easier.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to