Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-04-23 Thread Bruce Momjian
On Thu, Mar 31, 2016 at 05:46:41PM +0200, Emre Hasegeli wrote:
> > Thank you, pushed
> 
> Thank you for working on this.
> 
> I noticed that have overlooked this:
> 
>  static RectBox *
> -initRectBox()
> +initRectBox(void)
>  {

Done, thanks.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-31 Thread Emre Hasegeli
> Thank you, pushed

Thank you for working on this.

I noticed that have overlooked this:

 static RectBox *
-initRectBox()
+initRectBox(void)
 {


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-30 Thread Teodor Sigaev

Thank you, pushed

Emre Hasegeli wrote:

I'll try to explain with two-dimensional example over points. ASCII-art:


Thank you for the explanation.  Should we incorporate this with the patch.


added


I have worked on the comments of the patch.  It is attached.  I hope
it looks more clear than it was before.


+ cmp_double(const double a, const double b)


Does this function necessary?  We can just compare the doubles.


Yes, it compares with limited precision as it does by geometry operations.
Renamed to point actual arguments.


I meant that we could use FP macros directly instead of this function.
Doing so is also more correct.  I haven't tried to produce the
problem, but this function is not same as using the macros directly.
For differences smaller that the epsilon, it can return different
results.  I have removed it on the attached version.


+ boxPointerToRangeBox(BOX *box, RangeBox * rectangle)


The "rectangle" variable in here should be renamed.


fixed


I found a bunch of those too and renamed for clarity.  I have also
reordered the arguments of the helper functions to keep them
consistent.


evalRangeBox() is used to initialize fields of RangeBox in evalRectBox(). If
evalRangeBox() will palloc its result then we need to copy its result into
RangeBox struct and free. Let both fucntion use the same interface.


I found it nicer to copy and edit the existing value than creating it
in two steps like this.  It is also attached.


it works until allthesame branch contains only one inner node. Otherwise
traversalValue will be freeed several times, see spgWalk().


It just works, when traversalValues is not set.  It is also attached.

I have also added the missing regression tests.  While doing that I
noticed that some operators are missing and also added support for
them.  It is also attached with the updated version of my test script.

I hope I haven't changed the patch too much.  I have also pushed the
changes here:

https://github.com/hasegeli/postgres/commits/box-spgist



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-27 Thread Emre Hasegeli
>>> I'll try to explain with two-dimensional example over points. ASCII-art:
>>
>> Thank you for the explanation.  Should we incorporate this with the patch.
>
> added

I have worked on the comments of the patch.  It is attached.  I hope
it looks more clear than it was before.

>>> + cmp_double(const double a, const double b)
>>
>> Does this function necessary?  We can just compare the doubles.
>
> Yes, it compares with limited precision as it does by geometry operations.
> Renamed to point actual arguments.

I meant that we could use FP macros directly instead of this function.
Doing so is also more correct.  I haven't tried to produce the
problem, but this function is not same as using the macros directly.
For differences smaller that the epsilon, it can return different
results.  I have removed it on the attached version.

>>> + boxPointerToRangeBox(BOX *box, RangeBox * rectangle)
>>
>> The "rectangle" variable in here should be renamed.
>
> fixed

I found a bunch of those too and renamed for clarity.  I have also
reordered the arguments of the helper functions to keep them
consistent.

> evalRangeBox() is used to initialize fields of RangeBox in evalRectBox(). If
> evalRangeBox() will palloc its result then we need to copy its result into
> RangeBox struct and free. Let both fucntion use the same interface.

I found it nicer to copy and edit the existing value than creating it
in two steps like this.  It is also attached.

> it works until allthesame branch contains only one inner node. Otherwise
> traversalValue will be freeed several times, see spgWalk().

It just works, when traversalValues is not set.  It is also attached.

I have also added the missing regression tests.  While doing that I
noticed that some operators are missing and also added support for
them.  It is also attached with the updated version of my test script.

I hope I haven't changed the patch too much.  I have also pushed the
changes here:

https://github.com/hasegeli/postgres/commits/box-spgist


q4d-5.patch
Description: Binary data


box-index-test.sql
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-24 Thread Teodor Sigaev

+  * boxtype_spgist.c

The names on the file header need to be changed, too.

Oops. fixed




I'll try to explain with two-dimensional example over points. ASCII-art:

Thank you for the explanation.  Should we incorporate this with the patch.

added



+ cmp_double(const double a, const double b)

Does this function necessary?  We can just compare the doubles.
Yes, it compares with limited precision as it does by geometry operations. 
Renamed to point actual arguments.





+ boxPointerToRangeBox(BOX *box, RangeBox * rectangle)

The "rectangle" variable in here should be renamed.

fixed




+ Assert(is_infinite(b) == 0);

This is failing on my test.  You can reproduce with the script I have sent.

I didn't know:
# select '(1,inf)'::point;
  point
-
 (1,inf)

fixed




Wouldn't it be simpler to palloc and return the value on those functions?

evalRangeBox() initializes part of RectBox, so, it could not palloc its
result.
Other methods use the same signature just for consistency.


I couldn't understand it.  evalRangeBox() can palloc and return the
result.  evalRectBox() can return the result palloc'ed by
evalRangeBox().  The caller wouldn't need to palloc anything.

evalRangeBox() is used to initialize fields of RangeBox in evalRectBox(). If
evalRangeBox() will palloc its result then we need to copy its result into 
RangeBox struct and free. Let both fucntion use the same interface.



I went through all other variables:

+ int r = is_infinite(a);
+ double  x = *(double *) a;
+ double  y = *(double *) b;
+ spgInnerConsistentIn *in = (spgInnerConsistentIn *) PG_GETARG_POINTER(0);
+ spgLeafConsistentIn *in = (spgLeafConsistentIn *) PG_GETARG_POINTER(0);
+ BOX*leafBox = DatumGetBoxP(in->leafDatum);

Shouldn't they be "const", too?
They could. But it doesn't required. To be in consistent state I've removed 
const modifier where possible






+/*
+ * Begin. This block evaluates the median of coordinates of boxes
+ */


I would rather explain what the function does on the function header.


fixed

The "end" part of it is still there.

Oops again, fixed




Do we really need to copy the traversalValues on allTheSame case.
Wouldn't it work if just the same value is passed for all of them.
The search shouldn't continue after allTheSame case.


Seems, yes. spgist tree could contain a locng branches with allTheSame.


Does SP-GiST allows any node under allTheSame to not being allTheSame?

No, SP-GiST doesn't allow that

  Not setting traversalValues for allTheSame worked fine with my test.
it works until allthesame branch contains only one inner node. Otherwise 
traversalValue will be freeed several times, see spgWalk().

# select i as id, '1,2,3,4'::box as b into x from generate_series(1,100) i;
# create index ix on i using spgist (b);
# select count(*) from x where b && '1,2,3,4'::box; -- coredump
gdb:
#0  0x00080143564a in thr_kill () from /lib/libc.so.7
#1  0x000801435636 in raise () from /lib/libc.so.7
#2  0x0008014355b9 in abort () from /lib/libc.so.7
#3  0x00a80739 in in_error_recursion_trouble () at elog.c:199
#4  0x00abb748 in pfree (pointer=0x801e90868) at mcxt.c:1016
#5  0x0053330c in freeScanStackEntry (so=0x801e8d358, 
stackEntry=0x801e935d8) at spgscan.c:47
#6  0x00532cdb in spgWalk (index=0x801f1c588, so=0x801e8d358, 
scanWholeIndex=1 '\001', storeRes=0x532d10 
...




+ if (in->allTheSame)

Most of the things happening before this check is not used in there.
Shouldn't we move this to the top of the function?

yep, fixed




+ out->nodeNumbers = (int *) palloc(sizeof(int) * in->nNodes);

This could go before allTheSame block.

yep, fixed

I've attached all patches again. Thank you very much!

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


q4d-4.patch.gz
Description: GNU Zip compressed data


traversalValue-2.patch.gz
Description: GNU Zip compressed data


range-1.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-24 Thread Emre Hasegeli
> +  * boxtype_spgist.c

The names on the file header need to be changed, too.

> I'll try to explain with two-dimensional example over points. ASCII-art:
>|
>|
> 1  |  2
>|
> ---+-
>|P
>  3 |  4
>|
>|
>
> '+' with 'A' represents a centroid or, other words, point which splits plane
> for 4 quadrants and it strorend in parent node. 1,2,3,4 are labels of
> quadrants, each labeling will be the same for all pictures and all
> centriods, and i will not show them in pictures later to prevent too
> complicated images. Let we add C - child node (and again, it will split
> plane for 4 quads):
>
>
>| |
>+-+---
>  X |B|C
>| |
> ---+-+---
>|P|A
>| |
>|
>|
>
> A and B are points of intersection of lines. So, box PBCAis a bounding box
> for points contained in 3-rd (see labeling above). For example X labeled
> point is not a descendace of child node with centroid  C because it must be
> in branch of 1-st quad of parent node. So, each node (except root) will have
> a limitation in its quadrant. To transfer that limitation the traversalValue
> is used.
>
> To keep formatting I attached a txt file with this fragment.

Thank you for the explanation.  Should we incorporate this with the patch.

>>> +  * Portions Copyright (c) 1996-2015, PostgreSQL Global Development
>>> Group
>>
>> 2016.
>
> fixed

Not on the patch.

> + cmp_double(const double a, const double b)

Does this function necessary?  We can just compare the doubles.

> + boxPointerToRangeBox(BOX *box, RangeBox * rectangle)

The "rectangle" variable in here should be renamed.

> + Assert(is_infinite(b) == 0);

This is failing on my test.  You can reproduce with the script I have sent.

>> Wouldn't it be simpler to palloc and return the value on those functions?
>
> evalRangeBox() initializes part of RectBox, so, it could not palloc its
> result.
> Other methods use the same signature just for consistency.

I couldn't understand it.  evalRangeBox() can palloc and return the
result.  evalRectBox() can return the result palloc'ed by
evalRangeBox().  The caller wouldn't need to palloc anything.

>> Many variables are defined with "const".  I am not sure they are all
>> that doesn't change.  If it applies to the pointer, "out" should also
>> not change in here.  Am I wrong?
>
> No, changes

I now read about "const".  I am sorry for not being experienced in C.
The new version of the patch marks them as "const" by mistake.

I went through all other variables:

> + int r = is_infinite(a);
>
> + double  x = *(double *) a;
> + double  y = *(double *) b;
>
> + spgInnerConsistentIn *in = (spgInnerConsistentIn *) 
> PG_GETARG_POINTER(0);
>
> + spgLeafConsistentIn *in = (spgLeafConsistentIn *) PG_GETARG_POINTER(0);
>
> + BOX*leafBox = DatumGetBoxP(in->leafDatum);

Shouldn't they be "const", too?

>>> +/*
>>> + * Begin. This block evaluates the median of coordinates of boxes
>>> + */
>>
>> I would rather explain what the function does on the function header.
>
> fixed

The "end" part of it is still there.

>> Do we really need to copy the traversalValues on allTheSame case.
>> Wouldn't it work if just the same value is passed for all of them.
>> The search shouldn't continue after allTheSame case.
>
> Seems, yes. spgist tree could contain a locng branches with allTheSame.

Does SP-GiST allows any node under allTheSame to not being allTheSame?
 Not setting traversalValues for allTheSame worked fine with my test.

> + if (in->allTheSame)

Most of the things happening before this check is not used in there.
Shouldn't we move this to the top of the function?

> + out->nodeNumbers = (int *) palloc(sizeof(int) * in->nNodes);

This could go before allTheSame block.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-22 Thread Teodor Sigaev



Isn't this basically the same thing that the cube contrib module does? (Which
has the added benefit of kNN-capable operators).
No, cube module introduces new type - N-dimensional box. And adds an index 
support for it.


Current patch suggests non-traditional indexing technique for 2D boxes by 
treating them as point in 4D space. With such representation it's became 
possible to use quad-tree technique which:

1 supports only points to index
2 already supported by SP-GiST

Such technique provides some benefits in comparance with traditional R-Tree 
which implemented in pg with a help GiST. See Oleg's message.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-22 Thread Teodor Sigaev

tend to think of a *point* as having *zero* dimensions.  Would it
perhaps be more accurate to say we are treating a 2-dimensional box
as a point in 4-dimensional space?

Exactly, sorry for ambiguity.
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-22 Thread Jim Nasby

On 3/21/16 7:41 PM, Stas Kelvich wrote:

While people tends to use machine learning and regressions models more and more 
it is interesting to have some general n-dim indexing with kNN,
but I think it is different problem and should be solved in a different way.


I think one of the issues here is it's not very clear to someone without 
a good amount of ML knowledge how these things relate. I hear "box' and 
'cube' and think it's just a 2D vs 3D issue, and intarray isn't even on 
the radar.


Maybe what's needed are actual vector and matrix types?

In any case, if you've got a good reason why box and cube should stay 
separate then further discussion should happen in another thread.


BTW, if you haven't seen it, take a look at http://madlib.apache.org/
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-21 Thread Stas Kelvich

> On 22 Mar 2016, at 01:47, Jim Nasby  wrote:
> 
> On 3/21/16 11:57 AM, Teodor Sigaev wrote:
>> A and B are points of intersection of lines. So, box PBCAis a bounding
>> box for points contained in 3-rd (see labeling above). For example X
>> labeled point is not a descendace of child node with centroid  C because
>> it must be in branch of 1-st quad of parent node. So, each node (except
>> root) will have a limitation in its quadrant. To transfer that
>> limitation the traversalValue is used.
> 
> Isn't this basically the same thing that the cube contrib module does? (Which 
> has the added benefit of kNN-capable operators).

Cube and postgres own geometric types are indexed by building R-tree. While 
this is one of the most popular solutions it
degrades almost to linear search when bounding boxes for each index node 
overlaps a lot. This problem will arise when one will
try to index streets in the city with grid system — a lot of street's bounding 
boxes will just coincide with bounding box for whole city.

But that patch use totally different strategy for building index and do not 
suffer from above problem.

> 
> If that's true then ISTM it'd be better to work on pulling cube's features 
> into box?
> 
> If it's not true, I'm still wondering if there's enough commonality here that 
> we should pull cube into core…

There is also intarray module with very common functionality, but it also 
addressing different data pattern.

Optimal indexing and kNN strategy are very sensible to the data itself and if 
we want some general multidimensional search routines,
then I think none of the mentioned extensions is general enough. Cube barely 
applicable when dimensions number is higher than 10-20,
intarray performs bad on data with big sets of possible coordinates, this patch 
is also intended to help with specific, niche problem.

While people tends to use machine learning and regressions models more and more 
it is interesting to have some general n-dim indexing with kNN,
but I think it is different problem and should be solved in a different way.

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-21 Thread Jim Nasby

On 3/21/16 11:57 AM, Teodor Sigaev wrote:

A and B are points of intersection of lines. So, box PBCAis a bounding
box for points contained in 3-rd (see labeling above). For example X
labeled point is not a descendace of child node with centroid  C because
it must be in branch of 1-st quad of parent node. So, each node (except
root) will have a limitation in its quadrant. To transfer that
limitation the traversalValue is used.


Isn't this basically the same thing that the cube contrib module does? 
(Which has the added benefit of kNN-capable operators).


If that's true then ISTM it'd be better to work on pulling cube's 
features into box?


If it's not true, I'm still wondering if there's enough commonality here 
that we should pull cube into core...

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-21 Thread Stas Kelvich

> On 21 Mar 2016, at 22:38, Kevin Grittner  wrote:
> 
> On Mon, Mar 21, 2016 at 11:57 AM, Teodor Sigaev  wrote:
> 
>>> I couldn't get the term 4D point.  Maybe it means that we are
>>> using box datatype as the prefix, but we are not treating them
>>> as boxes.
>> 
>> exactly, we treat boxes as 4-dimentional points.
> 
> I'm not entirely sure I understand the terminology either, since I
> tend to think of a *point* as having *zero* dimensions.  Would it
> perhaps be more accurate to say we are treating a 2-dimensional box
> as a point in 4-dimensional space?

Or just say 4-d vector instead of 4-d point. Look like it will be enough 
rigorous.

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-21 Thread Kevin Grittner
On Mon, Mar 21, 2016 at 11:57 AM, Teodor Sigaev  wrote:

>> I couldn't get the term 4D point.  Maybe it means that we are
>> using box datatype as the prefix, but we are not treating them
>> as boxes.
>
> exactly, we treat boxes as 4-dimentional points.

I'm not entirely sure I understand the terminology either, since I
tend to think of a *point* as having *zero* dimensions.  Would it
perhaps be more accurate to say we are treating a 2-dimensional box
as a point in 4-dimensional space?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-21 Thread Teodor Sigaev

+ *implementation of quad-4d tree over boxes for SP-GiST.

Isn't the whole thing actually 3D?
No. The idea if this work is a representation of 2d box as 4d point. Quad means 
quadrant of 2d plane. Originally such kind of tree was developed for 2d point, 
and each node of tree splits plane on 4 quadrant. For 3d tree each node splits 
space for 8 "quadrants", 4d - 16 ones.



+  * For example consider the case of intersection.

No need for a new line after this.

fixed


+  * A quadrant has bounds, but sp-gist keeps only 4-d point (box) in inner 
nodes.

I couldn't get the term 4D point.  Maybe it means that we are using
box datatype as the prefix, but we are not treating them as boxes.

exactly, we treat boxes as 4-dimentional points.




+ * We use traversalValue to calculate quadrant bounds from parent's quadrant
+ * bounds.

I couldn't understand this sentence.  We are using the parent's prefix
and the quadrant to generate the traversalValue.  I think this is the
crucial part of the opclass.  It would be nice to explain it more
clearly.


I'll try to explain with two-dimensional example over points. ASCII-art:
   |
   |
1  |  2
   |
---+-
   |P
 3 |  4
   |
   |

'+' with 'A' represents a centroid or, other words, point which splits plane for 
4 quadrants and it strorend in parent node. 1,2,3,4 are labels of quadrants, 
each labeling will be the same for all pictures and all centriods, and i will 
not show them in pictures later to prevent too complicated images. Let we add C 
- child node (and again, it will split plane for 4 quads):



   | |
   +-+---
 X |B|C
   | |
---+-+---
   |P|A
   | |
   |
   |

A and B are points of intersection of lines. So, box PBCAis a bounding box for 
points contained in 3-rd (see labeling above). For example X labeled point is 
not a descendace of child node with centroid  C because it must be in branch of 
1-st quad of parent node. So, each node (except root) will have a limitation in 
its quadrant. To transfer that limitation the traversalValue is used.


To keep formatting I attached a txt file with this fragment.




+  * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group

2016.

fixed




+ *  src/backend/utils/adt/boxtype_spgist.c

Maybe we should name this file as geo_spgist.c to support other types
in the future.

done


+ #include "utils/builtins.h";

Extra ;.

fixed




+ #include "utils/datum.h"

I think this is unnecessary.

removed




+ /* InfR type implements doubles and +- infinity */
+ typedef struct
+ {
+int infFlag;
+double  val;
+ }   InfR;

Why do we need this?  Can't we store infinity in "double"?

Hmm, you are right. fixed.


This isn't returning the value.

removed




+ typedef struct
+ {
+Range   range_x;
+Range   range_y;
+ }   Rectangle;


Wouldn't it be simpler to just using BOX instead of this?

Too much the same names in RectBox




+ static void
+ evalIRangeBox(const IRangeBox *range_box, const Range *range, const int half1,
+  const int half2, IRangeBox *new_range_box)

+ static void
+ evalIRectBox(const IRectBox *rect_box, const Rectangle *centroid,
+ const uint8 quadrant, IRectBox * new_rect_box)

+ inline static void
+ initializeUnboundedBox(IRectBox * rect_box)


Wouldn't it be simpler to palloc and return the value on those functions?

evalRangeBox() initializes part of RectBox, so, it could not palloc its result.
Other methods use the same signature just for consistency.




+ static int
+ intersect2D(const Range * range, const IRangeBox * range_box)


Wouldn't it be better to return "bool" on those functions.

done




+return ((p1 >= 0) && (p2 <= 0));

Extra parentheses.

removed




+ iconst spgChooseIn *in = (spgChooseIn *) PG_GETARG_POINTER(0);
+ ispgChooseOut *out = (spgChooseOut *) PG_GETARG_POINTER(1);


Many variables are defined with "const".  I am not sure they are all
that doesn't change.  If it applies to the pointer, "out" should also
not change in here.  Am I wrong?

No, changes




+/*
+ * Begin. This block evaluates the median of coordinates of boxes
+ */

I would rather explain what the function does on the function header.

fixed




+ memcpy(new_rect_box, rect_box, sizeof(IRectBox));

Do we really need to copy the traversalValues on allTheSame case.
Wouldn't it work if just the same value is passed for all of them.
The search shouldn't continue after allTheSame case.

Seems, yes. spgist tree could contain a locng branches with allTheSame.




+ for (i = 0; flag && i < in->nkeys && flag; i++)

It checks flag two times.

fixed




+boxPointerToRectangle(
+DatumGetBoxP(in->scankeys[i].sk_argument),
+p_query_rect
+   

Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-21 Thread Emre Hasegeli
Here are my comments about the operator class implementation:

> + *implementation of quad-4d tree over boxes for SP-GiST.

Isn't the whole thing actually 3D?

> +  * For example consider the case of intersection.

No need for a new line after this.

> +  * A quadrant has bounds, but sp-gist keeps only 4-d point (box) in inner 
> nodes.

I couldn't get the term 4D point.  Maybe it means that we are using
box datatype as the prefix, but we are not treating them as boxes.

> + * We use traversalValue to calculate quadrant bounds from parent's quadrant
> + * bounds.

I couldn't understand this sentence.  We are using the parent's prefix
and the quadrant to generate the traversalValue.  I think this is the
crucial part of the opclass.  It would be nice to explain it more
clearly.

> +  * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group

2016.

> + *  src/backend/utils/adt/boxtype_spgist.c

Maybe we should name this file as geo_spgist.c to support other types
in the future.

> + #include "utils/builtins.h";

Extra ;.

> + #include "utils/datum.h"

I think this is unnecessary.

> + /* InfR type implements doubles and +- infinity */
> + typedef struct
> + {
> +int infFlag;
> +double  val;
> + }   InfR;

Why do we need this?  Can't we store infinity in "double"?

> + /* wrap double to InfR */
> + static InfR
> + toInfR(double v, InfR * r)
> + {
> +r->infFlag = NotInf;
> +r->val = v;
> + }

This isn't returning the value.

> + typedef struct
> + {
> +Range   range_x;
> +Range   range_y;
> + }   Rectangle;

Wouldn't it be simpler to just using BOX instead of this?

> + static void
> + evalIRangeBox(const IRangeBox *range_box, const Range *range, const int 
> half1,
> +  const int half2, IRangeBox *new_range_box)
>
> + static void
> + evalIRectBox(const IRectBox *rect_box, const Rectangle *centroid,
> + const uint8 quadrant, IRectBox * new_rect_box)
>
> + inline static void
> + initializeUnboundedBox(IRectBox * rect_box)

Wouldn't it be simpler to palloc and return the value on those functions?

> + static int
> + intersect2D(const Range * range, const IRangeBox * range_box)

Wouldn't it be better to return "bool" on those functions.

> +return ((p1 >= 0) && (p2 <= 0));

Extra parentheses.

> + iconst spgChooseIn *in = (spgChooseIn *) PG_GETARG_POINTER(0);
> + ispgChooseOut *out = (spgChooseOut *) PG_GETARG_POINTER(1);

Many variables are defined with "const".  I am not sure they are all
that doesn't change.  If it applies to the pointer, "out" should also
not change in here.  Am I wrong?

> +/*
> + * Begin. This block evaluates the median of coordinates of boxes
> + */

I would rather explain what the function does on the function header.

> + memcpy(new_rect_box, rect_box, sizeof(IRectBox));

Do we really need to copy the traversalValues on allTheSame case.
Wouldn't it work if just the same value is passed for all of them.
The search shouldn't continue after allTheSame case.

> + for (i = 0; flag && i < in->nkeys && flag; i++)

It checks flag two times.

> +boxPointerToRectangle(
> +DatumGetBoxP(in->scankeys[i].sk_argument),
> +p_query_rect
> +);

I don't think this matches the project coding style.

> + int flag = 1,

Wouldn't it be better as "bool"?

The regression tests for the remaining operators are still missing.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-18 Thread Teodor Sigaev

Do reconstructedValues is required now?  Wouldn't it be cleaner to use
the new field on the prefix tree implementation, too?
reconstructedValues is needed to reconctruct full indexed value and should match 
to type info indexed column




We haven't had specific memory context for reconstructedValues.  Why
is it required for the new field?
because pg knows type of reconstructedValues (see above why) but traversalValue 
just a void*, SP-GiST core doesn't knnow how to free memory of allocated for it.






+   Memory for traversalValues should be allocated in
+   the default context, but make sure to switch to
+   traversalMemoryContext before allocating memory
+   for pointers themselves.


This sentence is not understandable.  Shouldn't it say "the memory
should *not* be allocated in the default context"?  What are the
"pointers themselves"?

Clarified, I hope




The box opclass is broken because of the floating point arithmetics of
the box type.  You can reproduce it with the attached script.  We

hope, fixed. At least, seems, syncronized with seqscan


really need to fix the geometric types, before building more on them.
They fail to serve the only purpose they are useful for, demonstrating
features.

agree, but it's not a deal for this patch



It think the opclass should support the cross data type operators like
box @> point.  Ideally we should do it by using multiple opclasses in
an opfamily.  The floating problem will bite us again in here, because
some of the operators are not using FP macros.

Again, agree. But I suggest to do it by separate patch.



The tests check not supported operator "@".  It should be "<@".  It is
nice that the opclass doesn't support long deprecated operators.

fixed



+ #define LT  -1
+ #define GT   1
+ #define EQ   0


Using these numbers is a very well established pattern.  I don't think
macros make the code any more readable.

fixed




+ /* SP-GiST API functions */
+ Datum   spg_box_quad_config(PG_FUNCTION_ARGS);
+ Datum   spg_box_quad_choose(PG_FUNCTION_ARGS);
+ Datum   spg_box_quad_picksplit(PG_FUNCTION_ARGS);
+ Datum   spg_box_quad_inner_consistent(PG_FUNCTION_ARGS);
+ Datum   spg_box_quad_leaf_consistent(PG_FUNCTION_ARGS);


I guess they should go to the header file.
Remove them because they are presented in auto-generated file 
./src/include/utils/builtins.h


range patch is unchanged, but still attached to keep them altogether.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


q4d-2.patch.gz
Description: GNU Zip compressed data


traversalValue-2.patch.gz
Description: GNU Zip compressed data


range-1.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-18 Thread Emre Hasegeli
Here are my first comments.  I haven't read the actual index
implementation, yet.

I think traversal value is a useful addition.  It is nice that the
implementation for the range types is also changed.  My questions
about them are:

Do reconstructedValues is required now?  Wouldn't it be cleaner to use
the new field on the prefix tree implementation, too?

We haven't had specific memory context for reconstructedValues.  Why
is it required for the new field?

> +   Memory for traversalValues should be allocated in
> +   the default context, but make sure to switch to
> +   traversalMemoryContext before allocating memory
> +   for pointers themselves.

This sentence is not understandable.  Shouldn't it say "the memory
should *not* be allocated in the default context"?  What are the
"pointers themselves"?

The box opclass is broken because of the floating point arithmetics of
the box type.  You can reproduce it with the attached script.  We
really need to fix the geometric types, before building more on them.
They fail to serve the only purpose they are useful for, demonstrating
features.

It think the opclass should support the cross data type operators like
box @> point.  Ideally we should do it by using multiple opclasses in
an opfamily.  The floating problem will bite us again in here, because
some of the operators are not using FP macros.

The tests check not supported operator "@".  It should be "<@".  It is
nice that the opclass doesn't support long deprecated operators.

We needs tests for the remaining operators and for adding new values
to the index.  I am not sure create_index.sql is the best place for
them.  Maybe we should use the test scripts of "box".

> + #define LT  -1
> + #define GT   1
> + #define EQ   0

Using these numbers is a very well established pattern.  I don't think
macros make the code any more readable.

> + /* SP-GiST API functions */
> + Datum   spg_box_quad_config(PG_FUNCTION_ARGS);
> + Datum   spg_box_quad_choose(PG_FUNCTION_ARGS);
> + Datum   spg_box_quad_picksplit(PG_FUNCTION_ARGS);
> + Datum   spg_box_quad_inner_consistent(PG_FUNCTION_ARGS);
> + Datum   spg_box_quad_leaf_consistent(PG_FUNCTION_ARGS);

I guess they should go to the header file.


box-index-test.sql
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-15 Thread Oleg Bartunov
On Mon, Mar 14, 2016 at 9:01 PM, David Steele  wrote:

> On 2/15/16 10:29 AM, Teodor Sigaev wrote:
>
> It's very pity but author is not able to continue work on this patch,
>> and I would like to raise this flag.
>>
>> I'd like to add some comments about patches:
>>
>> traversalValue patch adds arbitrary value assoсiated with branch in
>> SP-GiST tree walk. Unlike to recostructedValue it could be just pointer,
>> it havn't to be a regular pgsql type. Also, it could be used independly
>> from reconstructedValue. This patch is used both following attached
>> patches.
>>
>> range patch just switchs using reconstructedValue to traversalValue in
>> range opclasses. reconstructedValue was used just because it was an
>> acceptable workaround in case of range type. Range opclase stores a full
>> value in leafs and doesn't need to use reconstructedValue to return
>> tuple in index only scan.
>> See http://www.postgresql.org/message-id/5399.1343512...@sss.pgh.pa.us
>>
>> q4d patch implements index over boxes using SP-GiST. Basic idea was an
>> observation, range opclass thinks about one-dimensional ranges as 2D
>> points.
>> Following this idea, we can think that 2D box (what is 2 points or 4
>> numbers) could represent a 4D point. We hoped that this approach will be
>> much more effective than traditional R-Tree in case of many overlaps in
>> box's collection.
>> Performance results are coming shortly.
>>
>
> It appears that the issues raised in this thread have been addressed but
> the patch still has not gone though a real review.
>
> Anybody out there willing to take a crack at a review?  All three patches
> apply (with whitespace issues).
>

Emre Hasegeli will review the patch.


>
> Thanks,
> --
> -David
> da...@pgmasters.net
>


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-14 Thread David Steele

On 2/15/16 10:29 AM, Teodor Sigaev wrote:


It's very pity but author is not able to continue work on this patch,
and I would like to raise this flag.

I'd like to add some comments about patches:

traversalValue patch adds arbitrary value assoсiated with branch in
SP-GiST tree walk. Unlike to recostructedValue it could be just pointer,
it havn't to be a regular pgsql type. Also, it could be used independly
from reconstructedValue. This patch is used both following attached
patches.

range patch just switchs using reconstructedValue to traversalValue in
range opclasses. reconstructedValue was used just because it was an
acceptable workaround in case of range type. Range opclase stores a full
value in leafs and doesn't need to use reconstructedValue to return
tuple in index only scan.
See http://www.postgresql.org/message-id/5399.1343512...@sss.pgh.pa.us

q4d patch implements index over boxes using SP-GiST. Basic idea was an
observation, range opclass thinks about one-dimensional ranges as 2D
points.
Following this idea, we can think that 2D box (what is 2 points or 4
numbers) could represent a 4D point. We hoped that this approach will be
much more effective than traditional R-Tree in case of many overlaps in
box's collection.
Performance results are coming shortly.


It appears that the issues raised in this thread have been addressed but 
the patch still has not gone though a real review.


Anybody out there willing to take a crack at a review?  All three 
patches apply (with whitespace issues).


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-02-15 Thread Teodor Sigaev
It's very pity but author is not able to continue work on this patch, and I 
would like to raise this flag.


I'd like to add some comments about patches:

traversalValue patch adds arbitrary value assoсiated with branch in SP-GiST tree 
walk. Unlike to recostructedValue it could be just pointer, it havn't to be a 
regular pgsql type. Also, it could be used independly from reconstructedValue. 
This patch is used both following attached patches.


range patch just switchs using reconstructedValue to traversalValue in range 
opclasses. reconstructedValue was used just because it was an acceptable 
workaround in case of range type. Range opclase stores a full value in leafs and 
doesn't need to use reconstructedValue to return tuple in index only scan.

See http://www.postgresql.org/message-id/5399.1343512...@sss.pgh.pa.us

q4d patch implements index over boxes using SP-GiST. Basic idea was an 
observation, range opclass thinks about one-dimensional ranges as 2D points.
Following this idea, we can think that 2D box (what is 2 points or 4 numbers) 
could represent a 4D point. We hoped that this approach will be much more 
effective than traditional R-Tree in case of many overlaps in box's collection.

Performance results are coming shortly.

In near future (say, tommorrow) I hope to suggest a opclass over polygons based 
on this approach.


Alexander Lebedev wrote:

Hello, Hacker.

* [PATCH] add a box index to sp-gist

   We have extended sp-gist with an index that keeps track of boxes

   We have used ideas underlying sp-gist range implementation to
   represent 2D boxes as points in 4D space. We use quad tree
   analogue, but in 4-dimensional space. We call this tree q4d. Each
   node of this tree is a box (a point in 4D space) which splits space
   in 16 hyperrectangles.

   Rationale: r-tree assumes that boxes we're trying to index don't
   overlap much. When this assumption fails, r-tree performs badly,
   while our proposal to represent a rectangle as a point in 4D space
   solves this problem.

   NB: the index uses traversalValue introduced in a separate patch.

* [PATCH] add traversalValue in sp-gist

   During implementation of box index for sp-gist we saw that we only
   keep rectangles, but to determine traversal direction we may need
   to know boundaries of a hyperrectangle. So we calculate them
   on the fly and store them in traversalValue, available
   when traversing child nodes, because otherwise we would't be able to
   calculate them from inside the inner_consistent function call.

   This patch was written by Teodor Sigaev.

* [PATCH] change reconstructValue -> traversalValue in range_spgist.c

   During traversal, local context is
   insufficient to pick traversal direction. As of now, this is worked
   around with the help of reconstructValue. reconstructValue only
   stores data of the same type as a tree node, that is, a range.

   We have written a patch that calculates auxillary values and makes
   them accessible during traversal.

   We then use traversalValue in a new box index and change
   range_spgist.c to use it in place of reconstructValue.

   NB: apply this patch after traversalValue patch.






--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


traversalValue-1.patch.gz
Description: GNU Zip compressed data


range-1.patch.gz
Description: GNU Zip compressed data


q4d-1.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-01-27 Thread Alvaro Herrera
Alexander Lebedev wrote:
> Hello, Hacker.
> 
> * [PATCH] add a box index to sp-gist

I closed this patch as "returned with feedback" because the author
didn't reply in three months.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-01-04 Thread Alvaro Herrera
CC'ing Teodor because he's author of one of the patches.

Alexander Lebedev wrote:
> Hello, Hacker.

So, can we have a more thorough explanation of what is the point of this
patch?  If it is supposed to improve the performance of searching for
boxes, can we see measurements from some benchmark?  Do the patches
still apply, or do you need to rebase?  If so, please post an updated
version.

I'm setting this patch as Waiting on Author.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2015-10-31 Thread Oleg Bartunov
On Sat, Oct 31, 2015 at 9:49 PM, Alexander Lebedev  wrote:

> Hello, Hacker.
>
> * [PATCH] add a box index to sp-gist
>
>   We have extended sp-gist with an index that keeps track of boxes
>
>   We have used ideas underlying sp-gist range implementation to
>   represent 2D boxes as points in 4D space. We use quad tree
>   analogue, but in 4-dimensional space. We call this tree q4d. Each
>   node of this tree is a box (a point in 4D space) which splits space
>   in 16 hyperrectangles.
>
>   Rationale: r-tree assumes that boxes we're trying to index don't
>   overlap much. When this assumption fails, r-tree performs badly,
>   while our proposal to represent a rectangle as a point in 4D space
>   solves this problem.
>
>   NB: the index uses traversalValue introduced in a separate patch.
>
> * [PATCH] add traversalValue in sp-gist
>
>   During implementation of box index for sp-gist we saw that we only
>   keep rectangles, but to determine traversal direction we may need
>   to know boundaries of a hyperrectangle. So we calculate them
>   on the fly and store them in traversalValue, available
>   when traversing child nodes, because otherwise we would't be able to
>   calculate them from inside the inner_consistent function call.
>
>   This patch was written by Teodor Sigaev.
>
> * [PATCH] change reconstructValue -> traversalValue in range_spgist.c
>
>   During traversal, local context is
>   insufficient to pick traversal direction. As of now, this is worked
>   around with the help of reconstructValue. reconstructValue only
>   stores data of the same type as a tree node, that is, a range.
>
>   We have written a patch that calculates auxillary values and makes
>   them accessible during traversal.
>
>   We then use traversalValue in a new box index and change
>   range_spgist.c to use it in place of reconstructValue.
>
>   NB: apply this patch after traversalValue patch.
>
>
Did you forget to show us performance numbers ?



>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


[HACKERS] [PATCH] we have added support for box type in SP-GiST index

2015-10-31 Thread Alexander Lebedev

Hello, Hacker.

* [PATCH] add a box index to sp-gist

  We have extended sp-gist with an index that keeps track of boxes

  We have used ideas underlying sp-gist range implementation to
  represent 2D boxes as points in 4D space. We use quad tree
  analogue, but in 4-dimensional space. We call this tree q4d. Each
  node of this tree is a box (a point in 4D space) which splits space
  in 16 hyperrectangles.

  Rationale: r-tree assumes that boxes we're trying to index don't
  overlap much. When this assumption fails, r-tree performs badly,
  while our proposal to represent a rectangle as a point in 4D space
  solves this problem.

  NB: the index uses traversalValue introduced in a separate patch.

* [PATCH] add traversalValue in sp-gist

  During implementation of box index for sp-gist we saw that we only
  keep rectangles, but to determine traversal direction we may need
  to know boundaries of a hyperrectangle. So we calculate them
  on the fly and store them in traversalValue, available
  when traversing child nodes, because otherwise we would't be able to
  calculate them from inside the inner_consistent function call.

  This patch was written by Teodor Sigaev.

* [PATCH] change reconstructValue -> traversalValue in range_spgist.c

  During traversal, local context is
  insufficient to pick traversal direction. As of now, this is worked
  around with the help of reconstructValue. reconstructValue only
  stores data of the same type as a tree node, that is, a range.

  We have written a patch that calculates auxillary values and makes
  them accessible during traversal.

  We then use traversalValue in a new box index and change
  range_spgist.c to use it in place of reconstructValue.

  NB: apply this patch after traversalValue patch.

diff --git a/doc/src/sgml/spgist.sgml b/doc/src/sgml/spgist.sgml
index 56827e5..d558b8e 100644
--- a/doc/src/sgml/spgist.sgml
+++ b/doc/src/sgml/spgist.sgml
@@ -542,6 +542,8 @@ typedef struct spgInnerConsistentIn
 int nkeys;  /* length of array */
 
 Datum   reconstructedValue; /* value reconstructed at parent */
+void   *traversalValue; /* opclass-specific traverse value */
+MemoryContext traversalMemoryContext;
 int level;  /* current level (counting from zero) */
 boolreturnData; /* original data must be returned? */
 
@@ -559,6 +561,8 @@ typedef struct spgInnerConsistentOut
 int*nodeNumbers;/* their indexes in the node array */
 int*levelAdds;  /* increment level by this much for each */
 Datum  *reconstructedValues;/* associated reconstructed values */
+void  **traversalValues;/* opclass-specific traverse values */
+
 } spgInnerConsistentOut;
 
 
@@ -593,6 +597,9 @@ typedef struct spgInnerConsistentOut
inner tuple, and
nodeLabels is an array of their label values, or
NULL if the nodes do not have labels.
+   traversalValue is a pointer to data that
+   inner_consistent gets when called on child nodes from an
+   outer call of inner_consistent on parent nodes.
   
 
   
@@ -612,6 +619,15 @@ typedef struct spgInnerConsistentOut
responsible for palloc'ing the
nodeNumbers, levelAdds and
reconstructedValues arrays.
+   Sometimes accumulating some information is needed, while 
+   descending from parent to child node was happened. In this case
+
+   traversalValues array keeps pointers to
+   specific data you need to accumulate for every child node.
+   Memory for traversalValues should be allocated in
+   the default context, but make sure to switch to
+   traversalMemoryContext before allocating memory
+   for pointers themselves.
   
  
 
@@ -638,6 +654,7 @@ typedef struct spgLeafConsistentIn
 ScanKey scankeys;   /* array of operators and comparison values */
 int nkeys;  /* length of array */
 
+void   *traversalValue; /* opclass-specific traverse value */
 Datum   reconstructedValue; /* value reconstructed at parent */
 int level;  /* current level (counting from zero) */
 boolreturnData; /* original data must be returned? */
diff --git a/src/backend/access/spgist/spgscan.c 
b/src/backend/access/spgist/spgscan.c
index 8a0d909..bec6daf 100644
--- a/src/backend/access/spgist/spgscan.c
+++ b/src/backend/access/spgist/spgscan.c
@@ -30,6 +30,7 @@ typedef void (*storeRes_func) (SpGistScanOpaque so, 
ItemPointer heapPtr,
 typedef struct ScanStackEntry
 {
Datum   reconstructedValue; /* value reconstructed 
from parent */
+   void   *traversalValue; /* opclass-specific traverse value */
int level;  /* level of items on 
this page */
ItemPointerData ptr;/* block and offset to scan from */
 } ScanStackEntry;
@@ -42,6 +43,9 @@ freeScanStackEntr