### Re: [HACKERS] GiST penalty functions [PoC]

> This thread has basically died, so marking as returned with feedback. Well, not exactly died, but it's kind of in blind lead. I'll summarize a bit for future references: 1. cube extension for indexing lowered dimensionality data (2d in 3d) is broken. Queries effectively will degrade to full index scan. 2. In existing GiST API this can be fixed only with technique, implemented in this patch. But this technique seems controversial. 3. This patch also brings 30%-40% speedup, but expected speedup of advanced GIST API techniques is much higher. As Norbert Beckmann put it in other discussion: > Overlap optimization [requires API advancement] is one of the main elements, > if not the main query performance tuning element of the RR*-tree. You would > fall back to old R-Tree times if that would be left off. Outcome: I'm planning to rollout another patch for both GiST and cube, probably to next CF, which will render this patch unnecessary. Thanks to everyone for discussion. Best regards, Andrey Borodin. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

### Re: [HACKERS] GiST penalty functions [PoC]

On Thu, Sep 22, 2016 at 3:45 AM, Andrew Borodinwrote: > [blah] > > Practically, this algorithm cannot be implemented in current GiST API > (only if we provide non-penalty-based choose subtree function, > optional for GiST extension), but it certainly has scientific value. This thread has basically died, so marking as returned with feedback. If you feel that's not adapted, feel free to move it to next CF. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

### Re: [HACKERS] GiST penalty functions [PoC]

Hi Heikki! > Got a link for a description of the RR*-tree? Would be good to reference it > in the patch comments, too. Well, as for now, the best way to reach the paper is DOI 10.1145/1559845.1559929 http://sci-hub.cc/ Authors in conversations clearly stated that they endorse (not sure in correct word) implementation in PostgreSQL, so I do not think it's a bad kind of piracy. More or less persistent link is http://dl.acm.org/citation.cfm?id=1559929 > If I understand correctly, cases #1 and #3 arise when one of the dimensions > is 0. For example, in a 3D space, if the existing entry is a rectangle on a > plane, with zero-length edge on one of the dimensions, and the new entry is > on the same plane. Case #1 arises if the new entry falls within that > rectangle, and case #3 if it's outside it. Currently, we treat all such > cases as 0-penalty, because the degenerate 0-dimension causes all the > calculated volumes to become zero. So clearly we can do better, which is > what this patch does. > > At first blush, I'm surprised that you switch to using the sum of the edges > in those cases. I would expect to ignore the degenerate 0-dimension, and > calculate the volume using the other dimensions. So in a 3D space, you would > calculate the increase of the area of the rectangle (A*B), not the sum of > the edges (A+B). And it probably would be best to take into account how many > of the dimensions are zero. So in a 3D space, if there is an existing line > segment that the new point falls into, and also a rectangle that the new > point falls into, you should prefer the 1-dimensional line segment over the > 2-dimensional rectangle. > > I don't know how big a difference that makes in practice. But it seems odd > that if you e.g. have a set of 3D points, but the Z dimension in all of the > points is 0, the algorithm behaves differently than if you had the exact > same points in a 2D space. > > (If this is explained in the RR*-tree paper, feel free to just point me to > that and I'll ask again if I have further questions.) As far as I know, your version of penalty function degradation is a new concept regarding R-trees. I have not saw this idea before. It is based on two assertions: 1. Degrading algorithms should resemble general algorithm, if choice of general algorithm is correct. 2. Degradation happens when and only when at least on edge of MBB has zero size. First assertion seems correct, while second is wrong. When you index high-dimensional data (say 10 dimensions), you can easily reach 0 by multiplication of values around 10^-4. And such data often is a result of scaling and normalizing in machine learning, these are concepts natural for them, along with high dimensinonality. We can rejig your algorithm: edges are sorted in descending order, and multiplied just before getting zero. Choose subtree picks tuple by count of numbers multiplied, resolving ties by result of multiplication. We can get on fire with big edges, but current cube has no more than 100 dimensions. That is a little more than 1000 for each before overlow (if multiplication is done in doubles). Practically, this algorithm cannot be implemented in current GiST API (only if we provide non-penalty-based choose subtree function, optional for GiST extension), but it certainly has scientific value. Regards, Andrey Borodin. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

### Re: [HACKERS] GiST penalty functions [PoC]

On 09/14/2016 07:57 PM, Andrew Borodin wrote: Here is v6 of cube penalty patch. There was a warning about unused edge function under systems without __STDC_IEC_559__ defined, this patch fixes it. Thanks! Reading through this thread again, I think we got a bit side-tracked with this float bit-twiddling aspect, and we haven't really discussed the algorithm itself. So: On 08/29/2016 09:04 AM, Andrew Borodin wrote: Some time ago I put a patch to commitfest that optimizes slightly GiST inserts and updates. It's effect in general is quite modest (15% perf improved), but for sorted data inserts are 3 times quicker. This effect I attribute to mitigation of deficiencies of old split algorithm. Alexander Korotkov advised me to lookup some features of the RR*-tree. > The RR*-tree differs from combination of Gist + cube extension in two algorithms: 1.Algorithm to choose subtree for insertion 2.Split algorithm Got a link for a description of the RR*-tree? Would be good to reference it in the patch comments, too. This is the message regarding implementation of first one in GiST. I think that both of this algorithms will improve querying performance. Long story short, there are two problems in choose subtree algorithm. 1.GiST chooses subtree via penalty calculation for each entry, while RR*-tree employs complicated conditional logic: when there are MBBs (Minimum Bounding Box) which do not require extensions, smallest of them is chosen; in some cases, entries are chosen not by volume, but by theirs margin. 2.RR*-tree algorithm jumps through entry comparation non-linearly, I think this means that GiST penalty computation function will have to access other entries on a page. In this message I address only first problem. I want to construct a penalty function that will choose: 1.Entry with a zero volume and smallest margin, that can accommodate insertion tuple without extension, if one exists; 2.Entry with smallest volume, that can accommodate insertion tuple without extension, if one exists; 3.Entry with zero volume increase after insert and smallest margin increase, if one exists; 4.Entry with minimal volume increase after insert. Current cube behavior inspects only 4th option by returning as a penalty (float) MBB’s volume increase. To implement all 4 options, I use a hack: order of positive floats corresponds exactly to order of integers with same binary representation (I’m not sure this stands for every single supported platform). So I use two bits of float as if it were integer, and all others are used as shifted bits of corresponding float: option 4 – volume increase, 3 - margin increase, 2 – MBB volume, 1 – MBB margin. You can check the reinterpretation cast function pack_float() in the patch. If I understand correctly, cases #1 and #3 arise when one of the dimensions is 0. For example, in a 3D space, if the existing entry is a rectangle on a plane, with zero-length edge on one of the dimensions, and the new entry is on the same plane. Case #1 arises if the new entry falls within that rectangle, and case #3 if it's outside it. Currently, we treat all such cases as 0-penalty, because the degenerate 0-dimension causes all the calculated volumes to become zero. So clearly we can do better, which is what this patch does. At first blush, I'm surprised that you switch to using the sum of the edges in those cases. I would expect to ignore the degenerate 0-dimension, and calculate the volume using the other dimensions. So in a 3D space, you would calculate the increase of the area of the rectangle (A*B), not the sum of the edges (A+B). And it probably would be best to take into account how many of the dimensions are zero. So in a 3D space, if there is an existing line segment that the new point falls into, and also a rectangle that the new point falls into, you should prefer the 1-dimensional line segment over the 2-dimensional rectangle. I don't know how big a difference that makes in practice. But it seems odd that if you e.g. have a set of 3D points, but the Z dimension in all of the points is 0, the algorithm behaves differently than if you had the exact same points in a 2D space. (If this is explained in the RR*-tree paper, feel free to just point me to that and I'll ask again if I have further questions.) - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

### Re: [HACKERS] GiST penalty functions [PoC]

Here is v6 of cube penalty patch. There was a warning about unused edge function under systems without __STDC_IEC_559__ defined, this patch fixes it. Regards, Andrey Borodin. diff --git a/contrib/cube/cube.c b/contrib/cube/cube.c index 3feddef..ad868ac 100644 --- a/contrib/cube/cube.c +++ b/contrib/cube/cube.c @@ -18,6 +18,7 @@ #include "cubedata.h" + PG_MODULE_MAGIC; /* @@ -27,6 +28,15 @@ PG_MODULE_MAGIC; #define ARRNELEMS(x) ArrayGetNItems( ARR_NDIM(x), ARR_DIMS(x)) /* + * If IEEE754 floats are fully supported we use advanced penalty + * which takes into account cube volume, perimeter and tend to favor + * small nodes over big ones. For more info see g_cube_penalty implementation + */ +#ifdef __STDC_IEC_559__ +#define ADVANCED_PENALTY +#endif + +/* ** Input/Output routines */ PG_FUNCTION_INFO_V1(cube_in); @@ -91,14 +101,18 @@ PG_FUNCTION_INFO_V1(cube_enlarge); /* ** For internal use only */ -int32 cube_cmp_v0(NDBOX *a, NDBOX *b); -bool cube_contains_v0(NDBOX *a, NDBOX *b); -bool cube_overlap_v0(NDBOX *a, NDBOX *b); -NDBOX *cube_union_v0(NDBOX *a, NDBOX *b); -void rt_cube_size(NDBOX *a, double *sz); -NDBOX *g_cube_binary_union(NDBOX *r1, NDBOX *r2, int *sizep); -bool g_cube_leaf_consistent(NDBOX *key, NDBOX *query, StrategyNumber strategy); -bool g_cube_internal_consistent(NDBOX *key, NDBOX *query, StrategyNumber strategy); +static int32 cube_cmp_v0(NDBOX *a, NDBOX *b); +static boolcube_contains_v0(NDBOX *a, NDBOX *b); +static boolcube_overlap_v0(NDBOX *a, NDBOX *b); +static NDBOX *cube_union_v0(NDBOX *a, NDBOX *b); +#ifdef ADVANCED_PENALTY +static float pack_float(float value, int realm); +static voidrt_cube_edge(NDBOX *a, double *sz); +#endif +static voidrt_cube_size(NDBOX *a, double *sz); +static NDBOX *g_cube_binary_union(NDBOX *r1, NDBOX *r2, int *sizep); +static boolg_cube_leaf_consistent(NDBOX *key, NDBOX *query, StrategyNumber strategy); +static boolg_cube_internal_consistent(NDBOX *key, NDBOX *query, StrategyNumber strategy); /* ** Auxiliary funxtions @@ -419,7 +433,36 @@ g_cube_decompress(PG_FUNCTION_ARGS) } PG_RETURN_POINTER(entry); } +/* + * Function to pack floats of different realms + * This function serves to pack bit flags inside float type + * Resulted value represent can be from four different "realms" + * Every value from realm 3 is greater than any value from realms 2,1 and 0. + * Every value from realm 2 is less than every value from realm 3 and greater + * than any value from realm 1 and 0, and so on. Values from the same realm + * loose two bits of precision. This technique is possible due to floating + * point numbers specification according to IEEE 754: exponent bits are highest + * (excluding sign bits, but here penalty is always positive). If float a is + * greater than float b, integer A with same bit representation as a is greater + * than integer B with same bits as b. + */ +#ifdef ADVANCED_PENALTY +static float +pack_float(const float value, const int realm) +{ + union { +float f; +struct { unsigned value:31, sign:1; } vbits; +struct { unsigned value:29, realm:2, sign:1; } rbits; + } a; + + a.f = value; + a.rbits.value = a.vbits.value >> 2; + a.rbits.realm = realm; + return a.f; +} +#endif /* ** The GiST Penalty method for boxes @@ -441,9 +484,42 @@ g_cube_penalty(PG_FUNCTION_ARGS) rt_cube_size(DatumGetNDBOX(origentry->key), ); *result = (float) (tmp1 - tmp2); - /* -* fprintf(stderr, "penalty\n"); fprintf(stderr, "\t%g\n", *result); -*/ +#ifdef ADVANCED_PENALTY + /* Realm tricks are used only in case of IEEE754 support(IEC 60559) */ + + /* REALM 0: No extension is required, volume is zero, return edge */ + /* REALM 1: No extension is required, return nonzero volume */ + /* REALM 2: Volume extension is zero, return nonzero edge extension */ + /* REALM 3: Volume extension is nonzero, return it */ + + if( *result == 0 ) + { + double tmp3 = tmp1; /* remember entry volume */ + rt_cube_edge(ud, ); + rt_cube_edge(DatumGetNDBOX(origentry->key), ); + *result = (float) (tmp1 - tmp2); + if( *result == 0 ) + { + if( tmp3 != 0 ) + { + *result = pack_float(tmp3, 1); /* REALM 1 */ + } + else + { + *result = pack_float(tmp1, 0); /* REALM 0 */ + } + } + else + { + *result = pack_float(*result, 2); /* REALM 2 */ + } + } + else + { +

### Re: [HACKERS] GiST penalty functions [PoC]

Hi hackers! Here is gist cube penaly patch v5. Changes: 1. union version of pack_float() is used, without warings and with minimum asm commands emitted 2. test for __STDC_IEC_559__ is added, this is a standard way of C99 to determine float compliance with IEEE754. ANSI-only compiler+libs, along with non-IEEE-float-systems, will fallback to pre-patch behavior. Best regards, Andrey Borodin. diff --git a/contrib/cube/cube.c b/contrib/cube/cube.c index 3feddef..c86418d 100644 --- a/contrib/cube/cube.c +++ b/contrib/cube/cube.c @@ -91,14 +91,18 @@ PG_FUNCTION_INFO_V1(cube_enlarge); /* ** For internal use only */ -int32 cube_cmp_v0(NDBOX *a, NDBOX *b); -bool cube_contains_v0(NDBOX *a, NDBOX *b); -bool cube_overlap_v0(NDBOX *a, NDBOX *b); -NDBOX *cube_union_v0(NDBOX *a, NDBOX *b); -void rt_cube_size(NDBOX *a, double *sz); -NDBOX *g_cube_binary_union(NDBOX *r1, NDBOX *r2, int *sizep); -bool g_cube_leaf_consistent(NDBOX *key, NDBOX *query, StrategyNumber strategy); -bool g_cube_internal_consistent(NDBOX *key, NDBOX *query, StrategyNumber strategy); +static int32 cube_cmp_v0(NDBOX *a, NDBOX *b); +static boolcube_contains_v0(NDBOX *a, NDBOX *b); +static boolcube_overlap_v0(NDBOX *a, NDBOX *b); +static NDBOX *cube_union_v0(NDBOX *a, NDBOX *b); +#ifdef __STDC_IEC_559__ +static float pack_float(float value, int realm); +#endif +static voidrt_cube_size(NDBOX *a, double *sz); +static voidrt_cube_edge(NDBOX *a, double *sz); +static NDBOX *g_cube_binary_union(NDBOX *r1, NDBOX *r2, int *sizep); +static boolg_cube_leaf_consistent(NDBOX *key, NDBOX *query, StrategyNumber strategy); +static boolg_cube_internal_consistent(NDBOX *key, NDBOX *query, StrategyNumber strategy); /* ** Auxiliary funxtions @@ -419,7 +423,36 @@ g_cube_decompress(PG_FUNCTION_ARGS) } PG_RETURN_POINTER(entry); } +/* + * Function to pack floats of different realms + * This function serves to pack bit flags inside float type + * Resulted value represent can be from four different "realms" + * Every value from realm 3 is greater than any value from realms 2,1 and 0. + * Every value from realm 2 is less than every value from realm 3 and greater + * than any value from realm 1 and 0, and so on. Values from the same realm + * loose two bits of precision. This technique is possible due to floating + * point numbers specification according to IEEE 754: exponent bits are highest + * (excluding sign bits, but here penalty is always positive). If float a is + * greater than float b, integer A with same bit representation as a is greater + * than integer B with same bits as b. + */ +#ifdef __STDC_IEC_559__ +static float +pack_float(const float value, const int realm) +{ + union { +float f; +struct { unsigned value:31, sign:1; } vbits; +struct { unsigned value:29, realm:2, sign:1; } rbits; + } a; + a.f = value; + a.rbits.value = a.vbits.value >> 2; + a.rbits.realm = realm; + + return a.f; +} +#endif /* ** The GiST Penalty method for boxes @@ -428,6 +461,11 @@ g_cube_decompress(PG_FUNCTION_ARGS) Datum g_cube_penalty(PG_FUNCTION_ARGS) { + /* REALM 0: No extension is required, volume is zero, return edge */ + /* REALM 1: No extension is required, return nonzero volume */ + /* REALM 2: Volume extension is zero, return nonzero edge extension */ + /* REALM 3: Volume extension is nonzero, return it */ + GISTENTRY *origentry = (GISTENTRY *) PG_GETARG_POINTER(0); GISTENTRY *newentry = (GISTENTRY *) PG_GETARG_POINTER(1); float *result = (float *) PG_GETARG_POINTER(2); @@ -441,9 +479,36 @@ g_cube_penalty(PG_FUNCTION_ARGS) rt_cube_size(DatumGetNDBOX(origentry->key), ); *result = (float) (tmp1 - tmp2); - /* -* fprintf(stderr, "penalty\n"); fprintf(stderr, "\t%g\n", *result); -*/ +#ifdef __STDC_IEC_559__ + /* Realm tricks are in use only in case of IEEE754 support(IEC 60559)*/ + if( *result == 0 ) + { + double tmp3 = tmp1; /* remember entry volume */ + rt_cube_edge(ud, ); + rt_cube_edge(DatumGetNDBOX(origentry->key), ); + *result = (float) (tmp1 - tmp2); + if( *result == 0 ) + { + if( tmp3 != 0 ) + { + *result = pack_float(tmp3, 1); /* REALM 1 */ + } + else + { + *result = pack_float(tmp1, 0); /* REALM 0 */ + } + } + else + { + *result = pack_float(*result, 2); /* REALM 2 */ + } + } + else + { +

### Re: [HACKERS] GiST penalty functions [PoC]

>Personally I wouldn't be very happy about an IEEE754 assumption. Ok, so let's avoid autoconf man. There is no real explanations of the ground for this assumption, just a reference to paper of David Goldberg (and there is no metion about IEEE754 is absoulte everywhere). BTW, may be we can ask David Goldberg how to hack that float properly? I do not see any math sense in this: pack_float(float actualValue, int realm) { ... int realmAjustment = *((int*))/4; float something = *((float*)) ... } No wonder it's not in libs. Nither I see a way to implement it with ldexp siblings. >I did go to the trouble of testing Postgres on a VAX and we fixed the few instances where it had such dependencies for a net gain. Greg, could you please point out those places to see how exaclty it should be #ifdef'd? Regrads, Andrey Borodin. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

### Re: [HACKERS] GiST penalty functions [PoC]

On Thu, Sep 8, 2016 at 9:29 AM, Andrew Borodinwrote: >>autoconf check for IEEE 754 floats > Autoconf man says folowing: >>it is safe to assume IEEE-754 in most portable code these days > https://www.gnu.org/software/autoconf/manual/autoconf.html#Floating-Point-Portability Personally I wouldn't be very happy about an IEEE754 assumption. I did go to the trouble of testing Postgres on a VAX and we fixed the few instances where it had such dependencies for a net gain. If we intentionally put a dependency in in one place then we'll never be able to determine anywhere else where there are unintentional dependencies. I haven't followed the thread closely but I'm puzzled why you would need to use bit twiddling to set a floating point number. Isn't there a perfectly good way to calculate the value you want using ldexp() and other standard C library functions? -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

### Re: [HACKERS] GiST penalty functions [PoC]

If you are still interested in. Here are 3 versions of pack-float. The union version of pack-float should run faster. The code is simpler, the dependencies are easier. But it may be less accurate or even wrong, as for signed integers (x>>2) and (x/4) are not the same. Consider x = -1. You may try pack_float_good, which gives the same asm as v3, but without warnings. - Mikhail, respectfully On Thu, Sep 08, 2016 at 01:29:36PM +0500, Andrew Borodin wrote: > >autoconf check for IEEE 754 floats > Autoconf man says folowing: > >it is safe to assume IEEE-754 in most portable code these days > https://www.gnu.org/software/autoconf/manual/autoconf.html#Floating-Point-Portability > > > A union might be more readable > Here is union version of the patch. It's slower 10% than original cube > and dereference version. Have no idea why. > Select performance is improved as in v3. > #include typedef union { float fp; int i; } U; float pack_float(const float v, const int r) { const U a = { .fp = v }; const U b = { .i = (a.i >> 2) + r * (INT32_MAX / 4) }; return b.fp; } float pack_float_av(float v, int r) { U buf; buf.fp = v; buf.i = (buf.i >> 2) + (INT32_MAX / 4) * r; return buf.fp; } float pack_float_v3(float actualValue, int realm) { /* two bits for realm, others for value */ /* we have 4 realms */ int realmAjustment = *((int*))/4; int realCode = realm * (INT32_MAX/4) + realmAjustment; return *((float*)); } float pack_float_good(const float v, const int r) { const U a = { .fp = v }; const U b = { .i = a.i/4 + r * (INT32_MAX / 4) }; return b.fp; } .file "pack-float.c" .text .p2align 4,,15 .globl pack_float .type pack_float, @function pack_float: .LFB0: .cfi_startproc movd %xmm0, %eax movl %edi, %edx sall $29, %edx sarl $2, %eax subl %edi, %edx addl %edx, %eax movl %eax, -4(%rsp) movss -4(%rsp), %xmm0 ret .cfi_endproc .LFE0: .size pack_float, .-pack_float .p2align 4,,15 .globl pack_float_av .type pack_float_av, @function pack_float_av: .LFB1: .cfi_startproc movd %xmm0, %eax movl %edi, %edx sall $29, %edx sarl $2, %eax subl %edi, %edx addl %edx, %eax movl %eax, -4(%rsp) movss -4(%rsp), %xmm0 ret .cfi_endproc .LFE1: .size pack_float_av, .-pack_float_av .p2align 4,,15 .globl pack_float_v3 .type pack_float_v3, @function pack_float_v3: .LFB2: .cfi_startproc movd %xmm0, %edx leal 3(%rdx), %eax testl %edx, %edx cmovns %edx, %eax sarl $2, %eax movl %eax, %edx movl %edi, %eax sall $29, %eax subl %edi, %eax addl %edx, %eax movl %eax, -4(%rsp) movss -4(%rsp), %xmm0 ret .cfi_endproc .LFE2: .size pack_float_v3, .-pack_float_v3 .p2align 4,,15 .globl pack_float_good .type pack_float_good, @function pack_float_good: .LFB3: .cfi_startproc movd %xmm0, %edx leal 3(%rdx), %eax testl %edx, %edx cmovns %edx, %eax sarl $2, %eax movl %eax, %edx movl %edi, %eax sall $29, %eax subl %edi, %eax addl %edx, %eax movl %eax, -4(%rsp) movss -4(%rsp), %xmm0 ret .cfi_endproc .LFE3: .size pack_float_good, .-pack_float_good .ident "GCC: (GNU) 6.1.1 20160802" .section .note.GNU-stack,"",@progbits signature.asc Description: PGP signature

### Re: [HACKERS] GiST penalty functions [PoC]

Yes. You are right, ANSI C allows only load-time initializers. Attached ANSI compatible version leads to the same assembly. And let me suggest a bit-twiddling version as well. It gives 12 instructions, instead of 13. 12 is better, as modern x86 CPU will fetch them at most in 3 cycles, one less than for 13 instructions. Also this bit-twiddling is more parallel at instruction level. And for ARM, which is unsurpassed at bit-twiddling this code is a way better. Of course speed is influenced by a lot of factors as always, so it needs to be tested on some datasets. - Mikhail, respectfully On Fri, Sep 09, 2016 at 08:50:53AM +0500, Andrey Borodin wrote: > Thank you for your attention to details, Mikhail. > > pack_float_good() looks good. But I'm not sure inline strict init is allowed > under ansi C. Converting to regular ancient form b.fp = v; won't change > compile result, would it? > > Regards, Andrey Borodin. #include typedef union { float fp; int i; } U; float pack_float(const float v, const int r) { const U a = { .fp = v }; const U b = { .i = (a.i >> 2) + r * (INT32_MAX / 4) }; return b.fp; } float pack_float_av(float v, int r) { U buf; buf.fp = v; buf.i = (buf.i >> 2) + (INT32_MAX / 4) * r; return buf.fp; } float pack_float_v3(float actualValue, int realm) { /* two bits for realm, others for value */ /* we have 4 realms */ int realmAjustment = *((int*))/4; int realCode = realm * (INT32_MAX/4) + realmAjustment; return *((float*)); } float pack_float_good(const float v, const int r) { const U a = { .fp = v }; const U b = { .i = a.i/4 + r * (INT32_MAX / 4) }; return b.fp; } float pack_float_ansi(const float v, const int r) { union { float f; int i; } a; a.f = v; a.i = a.i / 4 + r * (INT32_MAX / 4); return a.f; } float pack_float_bits(const float v, const int r) { union { float f; struct { unsigned value:31, sign:1; } vbits; struct { unsigned value:29, realm:2, sign:1; } rbits; } a; a.f = v; a.rbits.value = a.vbits.value >> 2; a.rbits.realm = r; return a.f; } .file "pack-float.c" .text .p2align 4,,15 .globl pack_float .type pack_float, @function pack_float: .LFB0: .cfi_startproc movd %xmm0, %eax movl %edi, %edx sall $29, %edx sarl $2, %eax subl %edi, %edx addl %edx, %eax movl %eax, -4(%rsp) movss -4(%rsp), %xmm0 ret .cfi_endproc .LFE0: .size pack_float, .-pack_float .p2align 4,,15 .globl pack_float_av .type pack_float_av, @function pack_float_av: .LFB1: .cfi_startproc movd %xmm0, %eax movl %edi, %edx sall $29, %edx sarl $2, %eax subl %edi, %edx addl %edx, %eax movl %eax, -4(%rsp) movss -4(%rsp), %xmm0 ret .cfi_endproc .LFE1: .size pack_float_av, .-pack_float_av .p2align 4,,15 .globl pack_float_v3 .type pack_float_v3, @function pack_float_v3: .LFB2: .cfi_startproc movd %xmm0, %edx leal 3(%rdx), %eax testl %edx, %edx cmovns %edx, %eax sarl $2, %eax movl %eax, %edx movl %edi, %eax sall $29, %eax subl %edi, %eax addl %edx, %eax movl %eax, -4(%rsp) movss -4(%rsp), %xmm0 ret .cfi_endproc .LFE2: .size pack_float_v3, .-pack_float_v3 .p2align 4,,15 .globl pack_float_good .type pack_float_good, @function pack_float_good: .LFB3: .cfi_startproc movd %xmm0, %edx leal 3(%rdx), %eax testl %edx, %edx cmovns %edx, %eax sarl $2, %eax movl %eax, %edx movl %edi, %eax sall $29, %eax subl %edi, %eax addl %edx, %eax movl %eax, -4(%rsp) movss -4(%rsp), %xmm0 ret .cfi_endproc .LFE3: .size pack_float_good, .-pack_float_good .p2align 4,,15 .globl pack_float_ansi .type pack_float_ansi, @function pack_float_ansi: .LFB4: .cfi_startproc movd %xmm0, %edx leal 3(%rdx), %eax testl %edx, %edx cmovns %edx, %eax sarl $2, %eax movl %eax, %edx movl %edi, %eax sall $29, %eax subl %edi, %eax addl %edx, %eax movl %eax, -4(%rsp) movss -4(%rsp), %xmm0 ret .cfi_endproc .LFE4: .size pack_float_ansi, .-pack_float_ansi .p2align 4,,15 .globl pack_float_bits .type pack_float_bits, @function pack_float_bits: .LFB5: .cfi_startproc movd %xmm0, %edx movd %xmm0, %eax andl $3, %edi sall $29, %edi andl $2147483647, %edx andl $-2147483648, %eax shrl $2, %edx orl %edx, %eax orl %edi, %eax movl %eax, -4(%rsp) movss -4(%rsp), %xmm0 ret .cfi_endproc .LFE5: .size pack_float_bits, .-pack_float_bits .ident "GCC: (GNU) 6.1.1 20160802" .section .note.GNU-stack,"",@progbits .arch armv7-a .eabi_attribute 28, 1 .eabi_attribute 20, 1 .eabi_attribute 21, 1 .eabi_attribute 23, 3 .eabi_attribute 24, 1 .eabi_attribute 25, 1 .eabi_attribute 26, 2 .eabi_attribute 30, 2 .eabi_attribute 34, 1 .eabi_attribute 18, 4 .file "pack-float.c" .text .align 2 .global pack_float .syntax unified .arm .fpu vfpv3-d16 .type pack_float, %function pack_float: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vmov r3, s0 @ int rsb r0, r0, r0, lsl #29 add r0, r0, r3, asr #2 vmov s0, r0 bx lr .size pack_float,

### Re: [HACKERS] GiST penalty functions [PoC]

Thank you for your attention to details, Mikhail. pack_float_good() looks good. But I'm not sure inline strict init is allowed under ansi C. Converting to regular ancient form b.fp = v; won't change compile result, would it? Regards, Andrey Borodin. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

### Re: [HACKERS] GiST penalty functions [PoC]

Excuse me for intervention. It depends. For instance, i run PostgreSQL on the modern MIPS CPU, which does not have sqrt support. But you are right, it is supported in most cases. And if execution speed of this very fuction is of concern, sqrtf(x) should be used instead of sqrt(x). Despite this, Andrew's solution gives more accurate representation of values. And as far as i understand, this improves overall performance by decreasing the overall amount of instructions, which must be executed. It is possible to speed up Andrew's implementation and get rid of warnings by using bit-masks and unions. Something like: union { float f; struct { unsigned int mantissa:23, exponent:8, sign:1; } bits; } I am sorry, i have no time to check this. But it is common wisdom to avoid pointer-based memory accesses in high-performance code, as they create a lot of false write-to-read dependencies. - Mikhail, respectfully On Wed, Sep 07, 2016 at 05:58:42PM -0400, Tom Lane wrote: > Heikki Linnakangaswrites: > > Unfortunately, sqrt(x) isn't very cheap. > > You'd be surprised: sqrt is built-in on most modern hardware. On my > three-year-old workstation, sqrt(x) seems to take about 2.6ns. For > comparison, the pack_float version posted in > > takes 3.9ns (and draws multiple compiler warnings, too). > > regards, tom lane signature.asc Description: PGP signature

### Re: [HACKERS] GiST penalty functions [PoC]

>BTW, would someone explain to me why using a float here will not fail >catastrophically for inputs outside the range of float? Indeed, it will. And that's one of the reason I'm considering advancing GiST API instead of advancing extensions. It won't crash, but will produce terrible index, not suitable of performing efficient queries. GiST treats penalty this way: /* disallow negative or NaN penalty */ if (isnan(penalty) || penalty < 0.0) penalty = 0.0; Any NaN is a "perfect fit". Calculation of real (say float) penalty and choice of subtree with smallest penalty is an idea from 92. Modern spatial index requires more than just a float to compare subtrees. Best regards, Andrey Borodin, Octonica & Ural Federal University. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

### Re: [HACKERS] GiST penalty functions [PoC]

Heikki Linnakangaswrites: > BTW, I would be OK with the bit-twiddling hack, if we had an autoconf > check for IEEE 754 floats, and a graceful fallback for other systems. I would still object to the version submitted, on the grounds of the compiler warnings it causes. Possibly those could be avoided with a union, though; Float4GetDatum doesn't produce such warnings. BTW, would someone explain to me why using a float here will not fail catastrophically for inputs outside the range of float? Cubes are based on doubles, not floats. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

### Re: [HACKERS] GiST penalty functions [PoC]

>autoconf check for IEEE 754 floats Autoconf man says folowing: >it is safe to assume IEEE-754 in most portable code these days https://www.gnu.org/software/autoconf/manual/autoconf.html#Floating-Point-Portability > A union might be more readable Here is union version of the patch. It's slower 10% than original cube and dereference version. Have no idea why. Select performance is improved as in v3. Also I've investigated a bit why linear package failed. It's usefull to think in terms of bijections, not in terms of arithmetic functions. float 1 is 1065353216 hex 3f80 FLT_MAX / ( 8 >> 3 ) is 2139095039 hex 7f7f FLT_MAX / ( 8 >> 2 ) is 2130706431 hex 7eff FLT_MAX / ( 8 >> 1 ) is 2122317823 hex 7e7f FLT_MAX / ( 8 >> 0 ) is 2113929215 hex 7dff This realm borders are completly wrong. That maens I used 800 thousands values to pack 2billions of values of realms 1-3, and all other 2.1 bils of values to pack realm 0. Fragile arithmetics was done wrong. What I had to use was Realm 3 INT32_MAX is nan (or something little less than nan) 3*INT32_MAX/4 is 3.689348e+19 Total little less than 512kk different values Realm 2 3*INT32_MAX/4 is 3.689348e+19 INT32_MAX/2 is 2.00e+00 Total 512kk different values Realm 1 INT32_MAX/2 is 2.00e+00 INT32_MAX/4 is 1.084202e-19 Total 512kk different values Realm 0 INT32_MAX/4 is 1.084202e-19 0 is 0.00e+00 Total 512kk different values Though I hadn't tested it yet. I'm not sure, BTW, hardcoding this consts is a good idea. Best regards, Andrey Borodin, Octonica & Ural Federal University. diff --git a/contrib/cube/cube.c b/contrib/cube/cube.c index 3feddef..ded639b 100644 --- a/contrib/cube/cube.c +++ b/contrib/cube/cube.c @@ -91,14 +91,16 @@ PG_FUNCTION_INFO_V1(cube_enlarge); /* ** For internal use only */ -int32 cube_cmp_v0(NDBOX *a, NDBOX *b); -bool cube_contains_v0(NDBOX *a, NDBOX *b); -bool cube_overlap_v0(NDBOX *a, NDBOX *b); -NDBOX *cube_union_v0(NDBOX *a, NDBOX *b); -void rt_cube_size(NDBOX *a, double *sz); -NDBOX *g_cube_binary_union(NDBOX *r1, NDBOX *r2, int *sizep); -bool g_cube_leaf_consistent(NDBOX *key, NDBOX *query, StrategyNumber strategy); -bool g_cube_internal_consistent(NDBOX *key, NDBOX *query, StrategyNumber strategy); +static int32 cube_cmp_v0(NDBOX *a, NDBOX *b); +static boolcube_contains_v0(NDBOX *a, NDBOX *b); +static boolcube_overlap_v0(NDBOX *a, NDBOX *b); +static NDBOX *cube_union_v0(NDBOX *a, NDBOX *b); +static float pack_float(float actualValue, int realm); +static voidrt_cube_size(NDBOX *a, double *sz); +static voidrt_cube_edge(NDBOX *a, double *sz); +static NDBOX *g_cube_binary_union(NDBOX *r1, NDBOX *r2, int *sizep); +static boolg_cube_leaf_consistent(NDBOX *key, NDBOX *query, StrategyNumber strategy); +static boolg_cube_internal_consistent(NDBOX *key, NDBOX *query, StrategyNumber strategy); /* ** Auxiliary funxtions @@ -419,7 +421,32 @@ g_cube_decompress(PG_FUNCTION_ARGS) } PG_RETURN_POINTER(entry); } +/* + * Function to pack floats of different realms + * This function serves to pack bit flags inside float type + * Resulted value represent can be from four different "realms" + * Every value from realm 3 is greater than any value from realms 2,1 and 0. + * Every value from realm 2 is less than every value from realm 3 and greater + * than any value from realm 1 and 0, and so on. Values from the same realm + * loose two bits of precision. This technique is possible due to floating + * point numbers specification according to IEEE 754: exponent bits are highest + * (excluding sign bits, but here penalty is always positive). If float a is + * greater than float b, integer A with same bit representation as a is greater + * than integer B with same bits as b. + */ +static float +pack_float(float actualValue, int realm) +{ + union + { + float fp; + int32 bits; + } buffer; + buffer.fp = actualValue; + buffer.bits = (buffer.bits >> 2) + (INT32_MAX / 4) * realm; + return buffer.fp; +} /* ** The GiST Penalty method for boxes @@ -428,6 +455,11 @@ g_cube_decompress(PG_FUNCTION_ARGS) Datum g_cube_penalty(PG_FUNCTION_ARGS) { + /* REALM 0: No extension is required, volume is zero, return edge */ + /* REALM 1: No extension is required, return nonzero volume */ + /* REALM 2: Volume extension is zero, return nonzero edge extension */ + /* REALM 3: Volume extension is nonzero, return it */ + GISTENTRY *origentry = (GISTENTRY *) PG_GETARG_POINTER(0); GISTENTRY *newentry = (GISTENTRY *) PG_GETARG_POINTER(1); float *result = (float *) PG_GETARG_POINTER(2); @@ -441,9 +473,33 @@ g_cube_penalty(PG_FUNCTION_ARGS)

### Re: [HACKERS] GiST penalty functions [PoC]

On 09/08/2016 09:39 AM, Михаил Бахтерев wrote: Excuse me for intervention. It depends. For instance, i run PostgreSQL on the modern MIPS CPU, which does not have sqrt support. But you are right, it is supported in most cases. And if execution speed of this very fuction is of concern, sqrtf(x) should be used instead of sqrt(x). Despite this, Andrew's solution gives more accurate representation of values. And as far as i understand, this improves overall performance by decreasing the overall amount of instructions, which must be executed. BTW, I would be OK with the bit-twiddling hack, if we had an autoconf check for IEEE 754 floats, and a graceful fallback for other systems. The fallback could be simply the current penalty function. You wouldn't get the benefit from the better penalty function on non-IEEE systems, then, but it would still be correct. It is possible to speed up Andrew's implementation and get rid of warnings by using bit-masks and unions. Something like: union { float f; struct { unsigned int mantissa:23, exponent:8, sign:1; } bits; } I am sorry, i have no time to check this. But it is common wisdom to avoid pointer-based memory accesses in high-performance code, as they create a lot of false write-to-read dependencies. The compiler should be smart enough to generate the same instructions either way. A union might be more readable, though. (We don't need to extract the mantissa, exponent and sign, so a union of float and int32 would do.) - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

### Re: [HACKERS] GiST penalty functions [PoC]

Heikki Linnakangaswrites: > Unfortunately, sqrt(x) isn't very cheap. You'd be surprised: sqrt is built-in on most modern hardware. On my three-year-old workstation, sqrt(x) seems to take about 2.6ns. For comparison, the pack_float version posted in takes 3.9ns (and draws multiple compiler warnings, too). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

### Re: [HACKERS] GiST penalty functions [PoC]

On 09/07/2016 09:20 PM, Andrew Borodin wrote: Well, arithmetics is too fragile. This version of float packing with arithmetical packaging static float pack_float(float actualValue, int realm) { double max,min; max = FLT_MAX / ( 8 >> realm ); min = FLT_MAX / ( 16 >> realm ); if( realm == 0 ) min = 0; /* squeeze the actual value between min and max */ return ( min + (actualValue * ( max - min ) / FLT_MAX)); } Inserts are the same as of bithacked pack_float, but selects are 5 times slower. When we are trying to pack value linearly into range we loose too much bits. That's why I suggested scaling it by the new value's volume and/or edge-sum. I was hoping that the old and new values are roughly of the same magnitude, so that it would work out. I guess not. But we could probably something like the above too, if we use logarithmic or exponential, rather than linear, packing. Something like: static float pack_float(float actualValue, int realm) { double val; val = sqrt(sqrt(actualValue)); if (realm == 0) return actualvalue; if (realm == 1) return actualValue * sqrt(sqrt(FLT_MAX)); if (realm == 2) return actualValue * sqrt(FLT_MAX); } Unfortunately, sqrt(x) isn't very cheap. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

### Re: [HACKERS] GiST penalty functions [PoC]

Well, arithmetics is too fragile. This version of float packing with arithmetical packaging static float pack_float(float actualValue, int realm) { double max,min; max = FLT_MAX / ( 8 >> realm ); min = FLT_MAX / ( 16 >> realm ); if( realm == 0 ) min = 0; /* squeeze the actual value between min and max */ return ( min + (actualValue * ( max - min ) / FLT_MAX)); } Inserts are the same as of bithacked pack_float, but selects are 5 times slower. When we are trying to pack value linearly into range we loose too much bits. Here is how it happens: in floats addition of small values to big values causes loss of small values. Applied to Heikki's algorithm this means that nV, nE and dV can all be in different mantissa ranges. (Please note that dV ca be many times more than nV and many times smaller that nV) Integer bitshift of a float have no arithmetical meaning. It would be hard to describe in formulas what carring of mantissa bit to significant field mean. But this bitshift preserves an order among almost all float values (except 2 controllably lost bits and some values become sNaN ). Entire idea of the advanced GiST penalty stands on this. The other way is to add to API optional handler which executes all of choose subtree algorithm inside cube (or other extension). Best regards, Andrey Borodin, Octonica & Ural Federal University. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

### Re: [HACKERS] GiST penalty functions [PoC]

Oh, sorry, made one more attemp and now I see your algorithm differently. So you propose to use oE and oV as a markers of borders for what I call Realm. But there may be very little significant bits in one of this ranges. pg_sphere and PostGiS extensions tried to use 1 as a marker, with alike formula. And that generated not a good tree. Here's how they done it https://github.com/postgis/postgis/commit/9569e898078eeac8928891e8019ede2cbf27d06f I'll try to compose a patch for your formula later, I think we should just try and see, it is easier than to reason about digital floating points. Best regards, Andrey Borodin, Octonica & Ural Federal University. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

### Re: [HACKERS] GiST penalty functions [PoC]

On 09/07/2016 09:42 AM, Andrew Borodin wrote: 2. Your algorithm, among loosing some bits of precision (which is absolutely acceptable - we have like 31 of them and that’s a lot) rely on false assumption. We compare tuples on page not by MBR of inserted tuple, but by MBR of tuple on page, which is different for every penalty calculation. The penalty function has two arguments: the new tuple, and the existing tuple on the page. When we're inserting, it gets called for every existing tuple on the page, with the same new tuple. And then we compare the returned penalties. So for a single insertion, the "new tuple" argument stays the same for every call. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

### Re: [HACKERS] GiST penalty functions [PoC]

Hi Heikki! Thank you for reviewing the code, it's always inspiring when a work is noted (: >Looking at the code, by "margin", you mean the sum of all "edges", i.e. of each dimension, of the cube. Indeed. As far as I remember, this is a terminology of old R*-tree paper. Now they use the word "perimeter", seems like it's closer for English-speaking folks, so in future patches I'll switch to "perimeter" >I guess the point of that is to differentiate between cubes that have the same volume, but a different shape. Somewhat like this. For an R[something]-tree volume is an ultimate measure to compare figures, close to multidimensional cube. Tree must tend to form MBRs with equal edges to ensure search optimizations spread across all dimensions equally. But on practice volume degrade quickly. When you index dots, you quickly get a page with a "plane", lot's of dots with one dimensions on a fixed value. And then R-tree do not work anymore, inserts are somewhat-randomized by GiST, but in practice tree is a lot worse than randomized, due to randomization skew towards early insert candidates. Perimeter is not as good as volume in general case. But it does not degrade until you have truly equal object MBRs. So this was volume vs perimeter. For example in MySQL they use #define to choose one strategy of these, which seems for me not a good design. In this patch I propose to use perimeter when volume strategy degraded, effectively when one dimension edge is 0 or some dimensions edges are close to epsilon. But there is one more tie to break. When you choose subtree for insertion, some candidates have zero volume extension and zero edge extension. They do not need to be extended to accommodate new tuple. In this case we choose smallest one, by volume. If all of them have 0 volume, we choose by perimeter. All in all, we have 4 different cases here, ordered by priority. >So, let's try to come up with a scheme that doesn't require IEEE 754 floats. 1. Conversion for my algorithm to float ops. I actually haven't thought about it before, but seems it's impossible. Bit shift makes no sense for regular float operations, under any circumstances exponent bits cannot shit to significant field and vise versa. If we do not move last bit of exponent - all significant field bits are garbage, that leaves us with only 6 bits for actual value, which is unacceptable. 2. Your algorithm, among loosing some bits of precision (which is absolutely acceptable - we have like 31 of them and that’s a lot) rely on false assumption. We compare tuples on page not by MBR of inserted tuple, but by MBR of tuple on page, which is different for every penalty calculation. I'll put it in your var names, they are good to reason about proposed algorithm. oE: Original tuple's edge sum (perimeter measure). This tuple is on a page. nE: New tuple's edge sum. This tuple is to be inserted. We calc penalty to choose subtree with minimum penalty. dE: Edge increase. Edge of union(original|new) minus oE. oV: Original tuple's volume nV: New tuple's volume dV: Volume increase Best desired algorithm: sort tuples by dV, then by dE, then by oV, then by oE (all ascending) and pick top 1. Unfortunately, I do not see a way to do that in 31 bit of float (in GiST we cannot use sign bit, <=0 is used as a magic condition). So implemented is following: Say we have number ALOT which is guaranteed to be bigger than dV,dE,oV,oE. if( dV > 0 ) penalty = ALOT*3 + dV;//nonzero dV goes last elif ( dE > 0 ) penalty = ALOT*2 + dE;//than goes dE elif ( oV > 0 ) penalty = ALOT + oV then penalty = oE; //oE is zero for subtrees containing only equal points. In this case we pass split optimization possibility to next GiST attribute Volume and perimeter of new entry does not affect choice of subtree directly, whether it is zero or not. >Hmm. That's a pretty large increase in construction time. Have you done any profiling on where the time goes? Since I've changes only one function, I didn't consider profiling. I've asked Michael to check disassembly of the function call tree, implemented his recommendations and patch v2 have all build performance back (: And now it computes aggregates 40% faster. >continue work within cube There is one more option: implement extension index using CREATE ACCESS METHOD feature of 9.6. It is an option: we can skip all GiST API hacks and implement real RR*-tree, there are some other improvements. But, on the other hand, improvement of GiST API could be beneficial to other extensions. Again, thanks for your attention. Possibly we can come up with some solution without dirty hacks. Best regards, Andrey Borodin, Octonica & Ural Federal University.

### Re: [HACKERS] GiST penalty functions [PoC]

On 08/29/2016 09:04 AM, Andrew Borodin wrote: In this message I address only first problem. I want to construct a penalty function that will choose: 1.Entry with a zero volume and smallest margin, that can accommodate insertion tuple without extension, if one exists; 2.Entry with smallest volume, that can accommodate insertion tuple without extension, if one exists; 3.Entry with zero volume increase after insert and smallest margin increase, if one exists; 4.Entry with minimal volume increase after insert. Looking at the code, by "margin", you mean the sum of all "edges", i.e. of each dimension, of the cube. I guess the point of that is to differentiate between cubes that have the same volume, but a different shape. So, let's try to come up with a scheme that doesn't require IEEE 754 floats. Those above cases can be split into two categories, depending on whether the new value has zero volume or not. We can use a completely different scheme for the two cases, if we want, because when we're choosing a target page to insert to, penalty gets called for every original tuple, but with the same new tuple. Here's a scheme I just came up with. There might be better ones, but it's something. Let's have: oX: Original tuple's edge sum nX: New tuple's edge sum dX: Edge increase oV: Original tuple's volume nX: New tuple's volume dX: Volume increase Case A: New entry has zero volume. -- Two sub-cases: A1: if dE > 0, use dE. dE must be in the range [0, nE] A2: otherwise, use oE. So how do we cram those two into a single float? If we offset case A1 by n, we can use the range between [0, nE] for A2. Something like this pseudocode: if (dE > 0) return nE + dE; /* A1, offset dE into range [nE, inf] */ else return nE - (nE/oE); /* A2, scale oE into range [0, nE] */ Case B: New entry has non-zero volume -- B1: if dV > 0. use dV. dV must be in the range [0, nV]. B2: if dE > 0, use dE. dE must be in the range [0, nE]. B3: oV, otherwise. By offsetting cases B1 and B2, we can again divide the range into three parts, with pseudocode like: if (dV > 0) return nV + nE + dV; /* B1, offset dV into range [nV+nE, inf] */ else if (dE > 0) return nV + dE;/* B2, offset dE into range [nV, nV+nE] */ else return nV - (nV/oV)/* B3, scale oV into range [0, nV] I’ve tested patch performance with attached test. On my machine patch slows index construction time from 60 to 76 seconds, reduces size of the index from 85Mb to 82Mb, reduces time of aggregates computation from 36 seconds to 29 seconds. Hmm. That's a pretty large increase in construction time. Have you done any profiling on where the time goes? I do not know: should I continue this work in cube, or it’d be better to fork cube? Should definitely continue work within cube, IMHO. I don't think forking it to a new datatype would make this any easier. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

### Re: [HACKERS] GiST penalty functions [PoC]

Here is new version of the patch. Now function pack_float is commented better. All inline keywords are removed. I haven't found any performance loss for that. Remove of static's showed 1%-7% performance loss in SELECT computation (3 test runs), so I left statics where they are. Actually, to avoid this kind of hacks, probably, it would be better to fork GiST to GiSTv2 and implement many API changes there: 1. Bulk load API 2. Richier choose subtree API 3. Allow for key test not just NO\MAYBE answers, but SURE answer to skip key test for underlying tree And some other improvements require breaking chanes for extensions. GiST idea is almost 25, modern spatial indices vary a lot from things that were there during 90th. Best regards, Andrey Borodin, Octonica & Ural Federal University. diff --git a/contrib/cube/cube.c b/contrib/cube/cube.c index 3feddef..9ce3cd6 100644 --- a/contrib/cube/cube.c +++ b/contrib/cube/cube.c @@ -91,14 +91,16 @@ PG_FUNCTION_INFO_V1(cube_enlarge); /* ** For internal use only */ -int32 cube_cmp_v0(NDBOX *a, NDBOX *b); -bool cube_contains_v0(NDBOX *a, NDBOX *b); -bool cube_overlap_v0(NDBOX *a, NDBOX *b); -NDBOX *cube_union_v0(NDBOX *a, NDBOX *b); -void rt_cube_size(NDBOX *a, double *sz); -NDBOX *g_cube_binary_union(NDBOX *r1, NDBOX *r2, int *sizep); -bool g_cube_leaf_consistent(NDBOX *key, NDBOX *query, StrategyNumber strategy); -bool g_cube_internal_consistent(NDBOX *key, NDBOX *query, StrategyNumber strategy); +static int32 cube_cmp_v0(NDBOX *a, NDBOX *b); +static boolcube_contains_v0(NDBOX *a, NDBOX *b); +static boolcube_overlap_v0(NDBOX *a, NDBOX *b); +static NDBOX *cube_union_v0(NDBOX *a, NDBOX *b); +static float pack_float(float actualValue, int realm); +static voidrt_cube_size(NDBOX *a, double *sz); +static voidrt_cube_edge(NDBOX *a, double *sz); +static NDBOX *g_cube_binary_union(NDBOX *r1, NDBOX *r2, int *sizep); +static boolg_cube_leaf_consistent(NDBOX *key, NDBOX *query, StrategyNumber strategy); +static boolg_cube_internal_consistent(NDBOX *key, NDBOX *query, StrategyNumber strategy); /* ** Auxiliary funxtions @@ -419,7 +421,27 @@ g_cube_decompress(PG_FUNCTION_ARGS) } PG_RETURN_POINTER(entry); } - +/* + * Function to pack bit flags inside float type + * Resulted value represent can be from four different "realms" + * Every value from realm 3 is greater than any value from realms 2,1 and 0. + * Every value from realm 2 is less than every value from realm 3 and greater + * than any value from realm 1 and 0, and so on. Values from the same realm + * loose two bits of precision. This technique is possible due to floating + * point numbers specification according to IEEE 754: exponent bits are highest + * (excluding sign bits, but here penalty is always positive). If float a is + * greater than float b, integer A with same bit representation as a is greater + * than integer B with same bits as b. + */ +static float +pack_float(float actualValue, int realm) +{ + /* two bits for realm, others for value */ + /* we have 4 realms */ + int realmAjustment = *((int*))/4; + int realCode = realm * (INT32_MAX/4) + realmAjustment; + return *((float*)); +} /* ** The GiST Penalty method for boxes @@ -428,6 +450,11 @@ g_cube_decompress(PG_FUNCTION_ARGS) Datum g_cube_penalty(PG_FUNCTION_ARGS) { + /* REALM 0: No extension is required, volume is zero, return edge */ + /* REALM 1: No extension is required, return nonzero volume */ + /* REALM 2: Volume extension is zero, return nonzero edge extension */ + /* REALM 3: Volume extension is nonzero, return it */ + GISTENTRY *origentry = (GISTENTRY *) PG_GETARG_POINTER(0); GISTENTRY *newentry = (GISTENTRY *) PG_GETARG_POINTER(1); float *result = (float *) PG_GETARG_POINTER(2); @@ -441,9 +468,33 @@ g_cube_penalty(PG_FUNCTION_ARGS) rt_cube_size(DatumGetNDBOX(origentry->key), ); *result = (float) (tmp1 - tmp2); - /* -* fprintf(stderr, "penalty\n"); fprintf(stderr, "\t%g\n", *result); -*/ + if( *result == 0 ) + { + double tmp3 = tmp1; /* remember entry volume */ + rt_cube_edge(ud, ); + rt_cube_edge(DatumGetNDBOX(origentry->key), ); + *result = (float) (tmp1 - tmp2); + if( *result == 0 ) + { + if( tmp3 != 0 ) + { + *result = pack_float(tmp3, 1); /* REALM 1 */ + } + else + { + *result = pack_float(tmp1, 0); /* REALM 0 */ + } + } +

### Re: [HACKERS] GiST penalty functions [PoC]

Thank you for your coments, Tom. > Modern compilers are generally able to make their own decisions about it, and > trying to put your thumb on the scales this heavily is not likely to improve > the code. Well, I have tested that combination of "static inline" affets performance of index build on a scale of 5%. Though I didn't tested with "static" only. AFAIK compiler cannot prove that array of function input and output do not intersect, so it emits lots of writes to output address inside loop body. >That pack_float function is absolutely not acceptable Well, possibly, there are ways to achive penalty optimization without such hacks, but it failed for pg_shpere and in PostGIS code. They reverted plain arithmetic optimization without bit hacks, becuase it didn't worked. This one works. There is other way: advance GiST API. But I'm not sure it is possible to do so without breaking compatibility with many existing extensions. Best regards, Andrey Borodin, Octonica & Ural Federal University. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

### Re: [HACKERS] GiST penalty functions [PoC]

Andrew Borodinwrites: > Does every supported Postgres platform conforms to IEEE 754 floating > point specification? Possibly, but it is against project policy to allow code to assume so. That pack_float function is absolutely not acceptable IMV, and would still be if it were adequately documented, which it's far away from being. On general grounds, I would forget all the "inline"s. Modern compilers are generally able to make their own decisions about it, and trying to put your thumb on the scales this heavily is not likely to improve the code. Haven't really read the patch, just responding to a couple points you mentioned in the intro. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

### Re: [HACKERS] GiST penalty functions [PoC]

Hi hackers! Here is the new patch version. With the help of Mikhail Bakhterev I've optimized subroutines of the penalty function. Index build time now is roughly equivalent to build time before patch (test attached to thread start). Time of SELECT statement execution is reduced by 40%. Changes in the patch: 1. Wrong usage of realms is fixed 2. Cube size and margin (edge) functions are optimized to reduce memory write instructions count (result of these functions were written on evey cycle of a loop) 3. All private functions are marked as static inline 4. Comments are formatted per project style I'm going to put this to commitfest queue, because performance of gist queries is improved significantly and I do not see any serious drawbacks. Any ideas about this patch are welcome. Especialy I'm conserned about portability of pack_float function. Does every supported Postgres platform conforms to IEEE 754 floating point specification? Also I'm not sure about possibility to hit any problems with float NaNs during float package? Best regards, Andrey Borodin, Octonica & Ural Federal University. diff --git a/contrib/cube/cube.c b/contrib/cube/cube.c index 3feddef..dec314f 100644 --- a/contrib/cube/cube.c +++ b/contrib/cube/cube.c @@ -91,20 +91,22 @@ PG_FUNCTION_INFO_V1(cube_enlarge); /* ** For internal use only */ -int32 cube_cmp_v0(NDBOX *a, NDBOX *b); -bool cube_contains_v0(NDBOX *a, NDBOX *b); -bool cube_overlap_v0(NDBOX *a, NDBOX *b); -NDBOX *cube_union_v0(NDBOX *a, NDBOX *b); -void rt_cube_size(NDBOX *a, double *sz); -NDBOX *g_cube_binary_union(NDBOX *r1, NDBOX *r2, int *sizep); -bool g_cube_leaf_consistent(NDBOX *key, NDBOX *query, StrategyNumber strategy); -bool g_cube_internal_consistent(NDBOX *key, NDBOX *query, StrategyNumber strategy); +static inline int32cube_cmp_v0(NDBOX *a, NDBOX *b); +static inline bool cube_contains_v0(NDBOX *a, NDBOX *b); +static inline bool cube_overlap_v0(NDBOX *a, NDBOX *b); +static inline NDBOX *cube_union_v0(NDBOX *a, NDBOX *b); +static inline floatpack_float(float actualValue, int realm); +static inline void rt_cube_size(NDBOX *a, double *sz); +static inline void rt_cube_edge(NDBOX *a, double *sz); +static inline NDBOX *g_cube_binary_union(NDBOX *r1, NDBOX *r2, int *sizep); +static inline bool g_cube_leaf_consistent(NDBOX *key, NDBOX *query, StrategyNumber strategy); +static inline bool g_cube_internal_consistent(NDBOX *key, NDBOX *query, StrategyNumber strategy); /* ** Auxiliary funxtions */ -static double distance_1D(double a1, double a2, double b1, double b2); -static bool cube_is_point_internal(NDBOX *cube); +static inline double distance_1D(double a1, double a2, double b1, double b2); +static inline bool cube_is_point_internal(NDBOX *cube); /* @@ -420,6 +422,15 @@ g_cube_decompress(PG_FUNCTION_ARGS) PG_RETURN_POINTER(entry); } +static inline float +pack_float(float actualValue, int realm) +{ + /* two bits for realm, other for value */ + /* we have 4 realms */ + int realmAjustment = *((int*))/4; + int realCode = realm * (INT32_MAX/4) + realmAjustment; + return *((float*)); +} /* ** The GiST Penalty method for boxes @@ -428,6 +439,11 @@ g_cube_decompress(PG_FUNCTION_ARGS) Datum g_cube_penalty(PG_FUNCTION_ARGS) { + /* REALM 0: No extension is required, volume is zero, return edge */ + /* REALM 1: No extension is required, return nonzero volume */ + /* REALM 2: Volume extension is zero, return nonzero edge extension */ + /* REALM 3: Volume extension is nonzero, return it */ + GISTENTRY *origentry = (GISTENTRY *) PG_GETARG_POINTER(0); GISTENTRY *newentry = (GISTENTRY *) PG_GETARG_POINTER(1); float *result = (float *) PG_GETARG_POINTER(2); @@ -441,9 +457,33 @@ g_cube_penalty(PG_FUNCTION_ARGS) rt_cube_size(DatumGetNDBOX(origentry->key), ); *result = (float) (tmp1 - tmp2); - /* -* fprintf(stderr, "penalty\n"); fprintf(stderr, "\t%g\n", *result); -*/ + if( *result == 0 ) + { + double tmp3 = tmp1; /* remember entry volume */ + rt_cube_edge(ud, ); + rt_cube_edge(DatumGetNDBOX(origentry->key), ); + *result = (float) (tmp1 - tmp2); + if( *result == 0 ) + { + if( tmp3 != 0 ) + { + *result = pack_float(tmp3, 1); /* REALM 1 */ + } + else + { + *result = pack_float(tmp1, 0); /* REALM 0 */ +

### [HACKERS] GiST penalty functions [PoC]

Hi, hackers! Some time ago I put a patch to commitfest that optimizes slightly GiST inserts and updates. It's effect in general is quite modest (15% perf improved), but for sorted data inserts are 3 times quicker. This effect I attribute to mitigation of deficiencies of old split algorithm. Alexander Korotkov advised me to lookup some features of the RR*-tree. The RR*-tree differs from combination of Gist + cube extension in two algorithms: 1.Algorithm to choose subtree for insertion 2.Split algorithm This is the message regarding implementation of first one in GiST. I think that both of this algorithms will improve querying performance. Long story short, there are two problems in choose subtree algorithm. 1.GiST chooses subtree via penalty calculation for each entry, while RR*-tree employs complicated conditional logic: when there are MBBs (Minimum Bounding Box) which do not require extensions, smallest of them is chosen; in some cases, entries are chosen not by volume, but by theirs margin. 2.RR*-tree algorithm jumps through entry comparation non-linearly, I think this means that GiST penalty computation function will have to access other entries on a page. In this message I address only first problem. I want to construct a penalty function that will choose: 1.Entry with a zero volume and smallest margin, that can accommodate insertion tuple without extension, if one exists; 2.Entry with smallest volume, that can accommodate insertion tuple without extension, if one exists; 3.Entry with zero volume increase after insert and smallest margin increase, if one exists; 4.Entry with minimal volume increase after insert. Current cube behavior inspects only 4th option by returning as a penalty (float) MBB’s volume increase. To implement all 4 options, I use a hack: order of positive floats corresponds exactly to order of integers with same binary representation (I’m not sure this stands for every single supported platform). So I use two bits of float as if it were integer, and all others are used as shifted bits of corresponding float: option 4 – volume increase, 3 - margin increase, 2 – MBB volume, 1 – MBB margin. You can check the reinterpretation cast function pack_float() in the patch. I’ve tested patch performance with attached test. On my machine patch slows index construction time from 60 to 76 seconds, reduces size of the index from 85Mb to 82Mb, reduces time of aggregates computation from 36 seconds to 29 seconds. I do not know: should I continue this work in cube, or it’d be better to fork cube? Maybe anyone have tried already RR*-tree implementation? Science papers show very good search performance increase. Please let me know if you have any ideas, information or interest in this topic. Best regards, Andrey Borodin, Octonica & Ural Federal University. diff --git a/contrib/cube/cube.c b/contrib/cube/cube.c index 3feddef..9c8c63d 100644 --- a/contrib/cube/cube.c +++ b/contrib/cube/cube.c @@ -96,6 +96,7 @@ bool cube_contains_v0(NDBOX *a, NDBOX *b); bool cube_overlap_v0(NDBOX *a, NDBOX *b); NDBOX *cube_union_v0(NDBOX *a, NDBOX *b); void rt_cube_size(NDBOX *a, double *sz); +void rt_cube_edge(NDBOX *a, double *sz); NDBOX *g_cube_binary_union(NDBOX *r1, NDBOX *r2, int *sizep); bool g_cube_leaf_consistent(NDBOX *key, NDBOX *query, StrategyNumber strategy); bool g_cube_internal_consistent(NDBOX *key, NDBOX *query, StrategyNumber strategy); @@ -420,6 +421,13 @@ g_cube_decompress(PG_FUNCTION_ARGS) PG_RETURN_POINTER(entry); } +float pack_float(float actualValue, int realm) +{ + // two bits for realm, other for value + int realmAjustment = *((int*))/4; + int realCode = realm * (INT32_MAX/4) + realmAjustment; // we have 4 realms + return *((float*)); +} /* ** The GiST Penalty method for boxes @@ -428,6 +436,11 @@ g_cube_decompress(PG_FUNCTION_ARGS) Datum g_cube_penalty(PG_FUNCTION_ARGS) { + // REALM 0: No extension is required, volume is zerom return edge + // REALM 1: No extension is required, return nonzero volume + // REALM 2: Volume extension is zero, return nonzero edge extension + // REALM 3: Volume extension is nonzero, return it + GISTENTRY *origentry = (GISTENTRY *) PG_GETARG_POINTER(0); GISTENTRY *newentry = (GISTENTRY *) PG_GETARG_POINTER(1); float *result = (float *) PG_GETARG_POINTER(2); @@ -440,6 +453,32 @@ g_cube_penalty(PG_FUNCTION_ARGS) rt_cube_size(ud, ); rt_cube_size(DatumGetNDBOX(origentry->key), ); *result = (float) (tmp1 - tmp2); + if(*result == 0) + { + double tmp3 = tmp1; + rt_cube_edge(ud, ); + rt_cube_edge(DatumGetNDBOX(origentry->key), ); + *result = (float) (tmp1 - tmp2); + if(*result == 0) + { + if(tmp3!=0) +