On 08/29/2016 09:04 AM, Andrew Borodin wrote:

In this message I address only first problem. I want to construct a penalty function that will choose: 1. Entry with a zero volume and smallest margin, that can accommodate insertion tuple without extension, if one exists; 2. Entry with smallest volume, that can accommodate insertion tuple without extension, if one exists; 3. Entry with zero volume increase after insert and smallest margin increase, if one exists; 4. Entry with minimal volume increase after insert.

## Advertising

`Looking at the code, by "margin", you mean the sum of all "edges", i.e.`

`of each dimension, of the cube. I guess the point of that is to`

`differentiate between cubes that have the same volume, but a different`

`shape.`

`So, let's try to come up with a scheme that doesn't require IEEE 754`

`floats. Those above cases can be split into two categories, depending on`

`whether the new value has zero volume or not. We can use a completely`

`different scheme for the two cases, if we want, because when we're`

`choosing a target page to insert to, penalty gets called for every`

`original tuple, but with the same new tuple.`

`Here's a scheme I just came up with. There might be better ones, but`

`it's something.`

Let's have: oX: Original tuple's edge sum nX: New tuple's edge sum dX: Edge increase oV: Original tuple's volume nX: New tuple's volume dX: Volume increase Case A: New entry has zero volume. ------ Two sub-cases: A1: if dE > 0, use dE. dE must be in the range [0, nE] A2: otherwise, use oE. So how do we cram those two into a single float?

`If we offset case A1 by n, we can use the range between [0, nE] for A2.`

`Something like this pseudocode:`

if (dE > 0) return nE + dE; /* A1, offset dE into range [nE, inf] */ else return nE - (nE/oE); /* A2, scale oE into range [0, nE] */ Case B: New entry has non-zero volume ------ B1: if dV > 0. use dV. dV must be in the range [0, nV]. B2: if dE > 0, use dE. dE must be in the range [0, nE]. B3: oV, otherwise.

`By offsetting cases B1 and B2, we can again divide the range into three`

`parts, with pseudocode like:`

if (dV > 0) return nV + nE + dV; /* B1, offset dV into range [nV+nE, inf] */ else if (dE > 0) return nV + dE; /* B2, offset dE into range [nV, nV+nE] */ else return nV - (nV/oV) /* B3, scale oV into range [0, nV]

I’ve tested patch performance with attached test. On my machine patch slows index construction time from 60 to 76 seconds, reduces size of the index from 85Mb to 82Mb, reduces time of aggregates computation from 36 seconds to 29 seconds.

`Hmm. That's a pretty large increase in construction time. Have you done`

`any profiling on where the time goes?`

I do not know: should I continue this work in cube, or it’d be better to fork cube?

`Should definitely continue work within cube, IMHO. I don't think forking`

`it to a new datatype would make this any easier.`

- Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers