Re: [HACKERS] CUBE seems a bit confused about ORDER BY

2017-10-30 Thread Alexander Korotkov
On Sun, Oct 29, 2017 at 1:30 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Fri, Oct 20, 2017 at 1:01 AM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
> I've reviewed code of ~> operator and its KNN-GiST support.
> Unfortunately, it appears that it's broken in design...  The explanation is
> above.
>
> We've following implementation of ~> operator.
>
> if (coord <= DIM(cube))
>> {
>> if (IS_POINT(cube))
>> PG_RETURN_FLOAT8(cube->x[coord - 1]);
>> else
>> PG_RETURN_FLOAT8(Min(cube->x[coord - 1],
>> cube->x[coord - 1 + DIM(cube)]));
>> }
>> else
>> {
>> if (IS_POINT(cube))
>> PG_RETURN_FLOAT8(cube->x[(coord - 1) % DIM(cube)]);
>> else
>> PG_RETURN_FLOAT8(Max(cube->x[coord - 1],
>> cube->x[coord - 1 - DIM(cube)]));
>> }
>
>
> Thus, for cube of N dimensions [1; N - 1] are lower bounds and [N; 2*N -
> 1] are upper bounds.  However, we might have indexed cubes of different
> dimensions.  For example, for cube of 2 dimensions "cube ~> 3" selects
> upper bound of 1st dimension.  For cube of 3 dimensions "cube ~> 3" selects
> lower bound of 3rd dimension.
>
> Therefore, despite ~> operator was specially invented to be supported by
> KNN-GIST, it can't serve this way.
>
> Regarding particular case discovered by Tomas, I think the error is in the
> GiST supporting code.
>
> if (strategy == CubeKNNDistanceCoord)
>> {
>> int coord = PG_GETARG_INT32(1);
>> if (DIM(cube) == 0)
>> retval = 0.0;
>> else if (IS_POINT(cube))
>> retval = cube->x[(coord - 1) % DIM(cube)];
>> else
>> retval = Min(cube->x[(coord - 1) % DIM(cube)],
>> cube->x[(coord - 1) % DIM(cube) + DIM(cube)]);
>> }
>
>
> g_cube_distance() always returns lower bound of cube.  It should return
> upper bound for coord > DIM(cube).
>
> It would be also nice to provide descending ordering using KNN-GiST.  It
> would be especially effective for upper bound.  Since, KNN-GiST doesn't
> support explicit descending ordering, it might be useful to make ~>
> operator return negative of coordinate when negative argument is provided.
> For instance, '(1,2), (3,4)'::cube ~> -1 return -1.
>
> I'd like to propose following way to resolve design issue.  cube ~> (2*N -
> 1) should return lower bound of Nth coordinate of the cube while cube ~>
> 2*N should return upper bound of Nth coordinate.  Then it would be
> independent on number of coordinates in particular cube.  For sure, it
> would be user-visible incompatible change.  But I think there is not so
> many users of this operator yet.  Also, current behavior of ~> seems quite
> useless.
>

Thus, I heard no objection to my approach.  Attached patch changes ~>
operator in the proposed way and fixes KNN-GiST search for that.  I'm going
to register this patch to the nearest commitfest.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


cube-knn-fix-1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CUBE seems a bit confused about ORDER BY

2017-10-28 Thread Alexander Korotkov
On Fri, Oct 20, 2017 at 1:01 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Fri, Oct 20, 2017 at 12:52 AM, Tomas Vondra <
> tomas.von...@2ndquadrant.com> wrote:
>
>> I've noticed this suspicious behavior of "cube" data type with ORDER BY,
>> which I believe is a bug in the extension (or the GiST index support).
>> The following example comes directly from regression tests added by
>> 33bd250f (so CC Teodor and Stas, who are mentioned in the commit).
>>
>> This query should produce results with ordering "ascending by 2nd
>> coordinate or upper right corner". To make it clear, I've added the
>> "c~>4" expression to the query, otherwise it's right from the test.
>>
>> test=# SELECT c~>4 "c~>4", * FROM test_cube ORDER BY c~>4 LIMIT 15;
>>  c~>4 | c
>> --+---
>>50 | (30333, 50),(30273, 6)
>>75 | (43301, 75),(43227, 43)
>>   142 | (19650, 142),(19630, 51)
>>   160 | (2424, 160),(2424, 81)
>>   171 | (3449, 171),(3354, 108)
>>   155 | (18037, 155),(17941, 109)
>>   208 | (28511, 208),(28479, 114)
>>   217 | (19946, 217),(19941, 118)
>>   191 | (16906, 191),(16816, 139)
>>   187 | (759, 187),(662, 163)
>>   266 | (22684, 266),(22656, 181)
>>   255 | (24423, 255),(24360, 213)
>>   249 | (45989, 249),(45910, 222)
>>   377 | (11399, 377),(11360, 294)
>>   389 | (12162, 389),(12103, 309)
>> (15 rows)
>>
>> As you can see, it's not actually sorted by the c~>4 coordinate (but by
>> c~>2, which it the last number).
>>
>> Moreover, disabling index scans fixes the ordering:
>>
>> test=# set enable_indexscan = off;
>> SET
>> test=# SELECT c~>4, * FROM test_cube ORDER BY c~>4 LIMIT 15; --
>> ascending by 2nd coordinate or upper right corner
>>  ?column? | c
>> --+---
>>50 | (30333, 50),(30273, 6)
>>75 | (43301, 75),(43227, 43)
>>   142 | (19650, 142),(19630, 51)
>>   155 | (18037, 155),(17941, 109)
>>   160 | (2424, 160),(2424, 81)
>>   171 | (3449, 171),(3354, 108)
>>   187 | (759, 187),(662, 163)
>>   191 | (16906, 191),(16816, 139)
>>   208 | (28511, 208),(28479, 114)
>>   217 | (19946, 217),(19941, 118)
>>   249 | (45989, 249),(45910, 222)
>>   255 | (24423, 255),(24360, 213)
>>   266 | (22684, 266),(22656, 181)
>>   367 | (31018, 367),(30946, 333)
>>   377 | (11399, 377),(11360, 294)
>> (15 rows)
>>
>>
>> Seems like a bug somewhere in gist_cube_ops, I guess?
>>
>
> +1,
> that definitely looks like a bug. Thank you for reporting!
> I'll take a look on it in couple days.
>

I've reviewed code of ~> operator and its KNN-GiST support.  Unfortunately,
it appears that it's broken in design...  The explanation is above.

We've following implementation of ~> operator.

if (coord <= DIM(cube))
> {
> if (IS_POINT(cube))
> PG_RETURN_FLOAT8(cube->x[coord - 1]);
> else
> PG_RETURN_FLOAT8(Min(cube->x[coord - 1],
> cube->x[coord - 1 + DIM(cube)]));
> }
> else
> {
> if (IS_POINT(cube))
> PG_RETURN_FLOAT8(cube->x[(coord - 1) % DIM(cube)]);
> else
> PG_RETURN_FLOAT8(Max(cube->x[coord - 1],
> cube->x[coord - 1 - DIM(cube)]));
> }


Thus, for cube of N dimensions [1; N - 1] are lower bounds and [N; 2*N - 1]
are upper bounds.  However, we might have indexed cubes of different
dimensions.  For example, for cube of 2 dimensions "cube ~> 3" selects
upper bound of 1st dimension.  For cube of 3 dimensions "cube ~> 3" selects
lower bound of 3rd dimension.

Therefore, despite ~> operator was specially invented to be supported by
KNN-GIST, it can't serve this way.

Regarding particular case discovered by Tomas, I think the error is in the
GiST supporting code.

if (strategy == CubeKNNDistanceCoord)
> {
> int coord = PG_GETARG_INT32(1);
> if (DIM(cube) == 0)
> retval = 0.0;
> else if (IS_POINT(cube))
> retval = cube->x[(coord - 1) % DIM(cube)];
> else
> retval = Min(cube->x[(coord - 1) % DIM(cube)],
> cube->x[(coord - 1) % DIM(cube) + DIM(cube)]);
> }


g_cube_distance() always returns lower bound of cube.  It should return
upper bound for coord > DIM(cube).

It would be also nice to provide descending ordering using KNN-GiST.  It
would be especially effective for upper bound.  Since, KNN-GiST doesn't
support explicit descending ordering, it might be useful to make ~>
operator return negative of coordinate when negative argument is provided.
For instance, '(1,2), (3,4)'::cube ~> -1 return -1.

I'd like to propose following way to resolve design issue.  cube ~> (2*N -
1) should return lower bound of Nth coordinate of the cube while cube ~>
2*N should return upper bound of Nth coordinate.  Then it would be
independent on number of coordinates in particular cube.  For sure, it
would be user-visible incompatible change.  But I think there is not so
many users of this operator yet.  Also, current behavior of ~> seems quite
useless.

Any thoughts?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] CUBE seems a bit confused about ORDER BY

2017-10-19 Thread Alexander Korotkov
Hi!

On Fri, Oct 20, 2017 at 12:52 AM, Tomas Vondra  wrote:

> I've noticed this suspicious behavior of "cube" data type with ORDER BY,
> which I believe is a bug in the extension (or the GiST index support).
> The following example comes directly from regression tests added by
> 33bd250f (so CC Teodor and Stas, who are mentioned in the commit).
>
> This query should produce results with ordering "ascending by 2nd
> coordinate or upper right corner". To make it clear, I've added the
> "c~>4" expression to the query, otherwise it's right from the test.
>
> test=# SELECT c~>4 "c~>4", * FROM test_cube ORDER BY c~>4 LIMIT 15;
>  c~>4 | c
> --+---
>50 | (30333, 50),(30273, 6)
>75 | (43301, 75),(43227, 43)
>   142 | (19650, 142),(19630, 51)
>   160 | (2424, 160),(2424, 81)
>   171 | (3449, 171),(3354, 108)
>   155 | (18037, 155),(17941, 109)
>   208 | (28511, 208),(28479, 114)
>   217 | (19946, 217),(19941, 118)
>   191 | (16906, 191),(16816, 139)
>   187 | (759, 187),(662, 163)
>   266 | (22684, 266),(22656, 181)
>   255 | (24423, 255),(24360, 213)
>   249 | (45989, 249),(45910, 222)
>   377 | (11399, 377),(11360, 294)
>   389 | (12162, 389),(12103, 309)
> (15 rows)
>
> As you can see, it's not actually sorted by the c~>4 coordinate (but by
> c~>2, which it the last number).
>
> Moreover, disabling index scans fixes the ordering:
>
> test=# set enable_indexscan = off;
> SET
> test=# SELECT c~>4, * FROM test_cube ORDER BY c~>4 LIMIT 15; --
> ascending by 2nd coordinate or upper right corner
>  ?column? | c
> --+---
>50 | (30333, 50),(30273, 6)
>75 | (43301, 75),(43227, 43)
>   142 | (19650, 142),(19630, 51)
>   155 | (18037, 155),(17941, 109)
>   160 | (2424, 160),(2424, 81)
>   171 | (3449, 171),(3354, 108)
>   187 | (759, 187),(662, 163)
>   191 | (16906, 191),(16816, 139)
>   208 | (28511, 208),(28479, 114)
>   217 | (19946, 217),(19941, 118)
>   249 | (45989, 249),(45910, 222)
>   255 | (24423, 255),(24360, 213)
>   266 | (22684, 266),(22656, 181)
>   367 | (31018, 367),(30946, 333)
>   377 | (11399, 377),(11360, 294)
> (15 rows)
>
>
> Seems like a bug somewhere in gist_cube_ops, I guess?
>

+1,
that definitely looks like a bug. Thank you for reporting!
I'll take a look on it in couple days.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company