Re: [HACKERS] CUBE seems a bit confused about ORDER BY

2017-10-30 Thread Alexander Korotkov
On Sun, Oct 29, 2017 at 1:30 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Fri, Oct 20, 2017 at 1:01 AM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
> I've reviewed code of ~> operator and its KNN-GiST support.
> Unfortunately, it appears that it's broken in design...  The explanation is
> above.
>
> We've following implementation of ~> operator.
>
> if (coord <= DIM(cube))
>> {
>> if (IS_POINT(cube))
>> PG_RETURN_FLOAT8(cube->x[coord - 1]);
>> else
>> PG_RETURN_FLOAT8(Min(cube->x[coord - 1],
>> cube->x[coord - 1 + DIM(cube)]));
>> }
>> else
>> {
>> if (IS_POINT(cube))
>> PG_RETURN_FLOAT8(cube->x[(coord - 1) % DIM(cube)]);
>> else
>> PG_RETURN_FLOAT8(Max(cube->x[coord - 1],
>> cube->x[coord - 1 - DIM(cube)]));
>> }
>
>
> Thus, for cube of N dimensions [1; N - 1] are lower bounds and [N; 2*N -
> 1] are upper bounds.  However, we might have indexed cubes of different
> dimensions.  For example, for cube of 2 dimensions "cube ~> 3" selects
> upper bound of 1st dimension.  For cube of 3 dimensions "cube ~> 3" selects
> lower bound of 3rd dimension.
>
> Therefore, despite ~> operator was specially invented to be supported by
> KNN-GIST, it can't serve this way.
>
> Regarding particular case discovered by Tomas, I think the error is in the
> GiST supporting code.
>
> if (strategy == CubeKNNDistanceCoord)
>> {
>> int coord = PG_GETARG_INT32(1);
>> if (DIM(cube) == 0)
>> retval = 0.0;
>> else if (IS_POINT(cube))
>> retval = cube->x[(coord - 1) % DIM(cube)];
>> else
>> retval = Min(cube->x[(coord - 1) % DIM(cube)],
>> cube->x[(coord - 1) % DIM(cube) + DIM(cube)]);
>> }
>
>
> g_cube_distance() always returns lower bound of cube.  It should return
> upper bound for coord > DIM(cube).
>
> It would be also nice to provide descending ordering using KNN-GiST.  It
> would be especially effective for upper bound.  Since, KNN-GiST doesn't
> support explicit descending ordering, it might be useful to make ~>
> operator return negative of coordinate when negative argument is provided.
> For instance, '(1,2), (3,4)'::cube ~> -1 return -1.
>
> I'd like to propose following way to resolve design issue.  cube ~> (2*N -
> 1) should return lower bound of Nth coordinate of the cube while cube ~>
> 2*N should return upper bound of Nth coordinate.  Then it would be
> independent on number of coordinates in particular cube.  For sure, it
> would be user-visible incompatible change.  But I think there is not so
> many users of this operator yet.  Also, current behavior of ~> seems quite
> useless.
>

Thus, I heard no objection to my approach.  Attached patch changes ~>
operator in the proposed way and fixes KNN-GiST search for that.  I'm going
to register this patch to the nearest commitfest.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


cube-knn-fix-1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CUBE seems a bit confused about ORDER BY

2017-10-28 Thread Alexander Korotkov
On Fri, Oct 20, 2017 at 1:01 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Fri, Oct 20, 2017 at 12:52 AM, Tomas Vondra <
> tomas.von...@2ndquadrant.com> wrote:
>
>> I've noticed this suspicious behavior of "cube" data type with ORDER BY,
>> which I believe is a bug in the extension (or the GiST index support).
>> The following example comes directly from regression tests added by
>> 33bd250f (so CC Teodor and Stas, who are mentioned in the commit).
>>
>> This query should produce results with ordering "ascending by 2nd
>> coordinate or upper right corner". To make it clear, I've added the
>> "c~>4" expression to the query, otherwise it's right from the test.
>>
>> test=# SELECT c~>4 "c~>4", * FROM test_cube ORDER BY c~>4 LIMIT 15;
>>  c~>4 | c
>> --+---
>>50 | (30333, 50),(30273, 6)
>>75 | (43301, 75),(43227, 43)
>>   142 | (19650, 142),(19630, 51)
>>   160 | (2424, 160),(2424, 81)
>>   171 | (3449, 171),(3354, 108)
>>   155 | (18037, 155),(17941, 109)
>>   208 | (28511, 208),(28479, 114)
>>   217 | (19946, 217),(19941, 118)
>>   191 | (16906, 191),(16816, 139)
>>   187 | (759, 187),(662, 163)
>>   266 | (22684, 266),(22656, 181)
>>   255 | (24423, 255),(24360, 213)
>>   249 | (45989, 249),(45910, 222)
>>   377 | (11399, 377),(11360, 294)
>>   389 | (12162, 389),(12103, 309)
>> (15 rows)
>>
>> As you can see, it's not actually sorted by the c~>4 coordinate (but by
>> c~>2, which it the last number).
>>
>> Moreover, disabling index scans fixes the ordering:
>>
>> test=# set enable_indexscan = off;
>> SET
>> test=# SELECT c~>4, * FROM test_cube ORDER BY c~>4 LIMIT 15; --
>> ascending by 2nd coordinate or upper right corner
>>  ?column? | c
>> --+---
>>50 | (30333, 50),(30273, 6)
>>75 | (43301, 75),(43227, 43)
>>   142 | (19650, 142),(19630, 51)
>>   155 | (18037, 155),(17941, 109)
>>   160 | (2424, 160),(2424, 81)
>>   171 | (3449, 171),(3354, 108)
>>   187 | (759, 187),(662, 163)
>>   191 | (16906, 191),(16816, 139)
>>   208 | (28511, 208),(28479, 114)
>>   217 | (19946, 217),(19941, 118)
>>   249 | (45989, 249),(45910, 222)
>>   255 | (24423, 255),(24360, 213)
>>   266 | (22684, 266),(22656, 181)
>>   367 | (31018, 367),(30946, 333)
>>   377 | (11399, 377),(11360, 294)
>> (15 rows)
>>
>>
>> Seems like a bug somewhere in gist_cube_ops, I guess?
>>
>
> +1,
> that definitely looks like a bug. Thank you for reporting!
> I'll take a look on it in couple days.
>

I've reviewed code of ~> operator and its KNN-GiST support.  Unfortunately,
it appears that it's broken in design...  The explanation is above.

We've following implementation of ~> operator.

if (coord <= DIM(cube))
> {
> if (IS_POINT(cube))
> PG_RETURN_FLOAT8(cube->x[coord - 1]);
> else
> PG_RETURN_FLOAT8(Min(cube->x[coord - 1],
> cube->x[coord - 1 + DIM(cube)]));
> }
> else
> {
> if (IS_POINT(cube))
> PG_RETURN_FLOAT8(cube->x[(coord - 1) % DIM(cube)]);
> else
> PG_RETURN_FLOAT8(Max(cube->x[coord - 1],
> cube->x[coord - 1 - DIM(cube)]));
> }


Thus, for cube of N dimensions [1; N - 1] are lower bounds and [N; 2*N - 1]
are upper bounds.  However, we might have indexed cubes of different
dimensions.  For example, for cube of 2 dimensions "cube ~> 3" selects
upper bound of 1st dimension.  For cube of 3 dimensions "cube ~> 3" selects
lower bound of 3rd dimension.

Therefore, despite ~> operator was specially invented to be supported by
KNN-GIST, it can't serve this way.

Regarding particular case discovered by Tomas, I think the error is in the
GiST supporting code.

if (strategy == CubeKNNDistanceCoord)
> {
> int coord = PG_GETARG_INT32(1);
> if (DIM(cube) == 0)
> retval = 0.0;
> else if (IS_POINT(cube))
> retval = cube->x[(coord - 1) % DIM(cube)];
> else
> retval = Min(cube->x[(coord - 1) % DIM(cube)],
> cube->x[(coord - 1) % DIM(cube) + DIM(cube)]);
> }


g_cube_distance() always returns lower bound of cube.  It should return
upper bound for coord > DIM(cube).

It would be also nice to provide descending ordering using KNN-GiST.  It
would be especially effective for upper bound.  Since, KNN-GiST doesn't
support explicit descending ordering, it might be useful to make ~>
operator return negative of coordinate when negative argument is provided.
For instance, '(1,2), (3,4)'::cube ~> -1 return -1.

I'd like to propose following way to resolve design issue.  cube ~> (2*N -
1) should return lower bound of Nth coordinate of the cube while cube ~>
2*N should return upper bound of Nth coordinate.  Then it would be
independent on number of coordinates in particular cube.  For sure, it
would be user-visible incompatible change.  But I think there is not so
many users of this operator yet.  Also, current behavior of ~> seems quite
useless.

Any thoughts?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] CUBE seems a bit confused about ORDER BY

2017-10-19 Thread Alexander Korotkov
Hi!

On Fri, Oct 20, 2017 at 12:52 AM, Tomas Vondra  wrote:

> I've noticed this suspicious behavior of "cube" data type with ORDER BY,
> which I believe is a bug in the extension (or the GiST index support).
> The following example comes directly from regression tests added by
> 33bd250f (so CC Teodor and Stas, who are mentioned in the commit).
>
> This query should produce results with ordering "ascending by 2nd
> coordinate or upper right corner". To make it clear, I've added the
> "c~>4" expression to the query, otherwise it's right from the test.
>
> test=# SELECT c~>4 "c~>4", * FROM test_cube ORDER BY c~>4 LIMIT 15;
>  c~>4 | c
> --+---
>50 | (30333, 50),(30273, 6)
>75 | (43301, 75),(43227, 43)
>   142 | (19650, 142),(19630, 51)
>   160 | (2424, 160),(2424, 81)
>   171 | (3449, 171),(3354, 108)
>   155 | (18037, 155),(17941, 109)
>   208 | (28511, 208),(28479, 114)
>   217 | (19946, 217),(19941, 118)
>   191 | (16906, 191),(16816, 139)
>   187 | (759, 187),(662, 163)
>   266 | (22684, 266),(22656, 181)
>   255 | (24423, 255),(24360, 213)
>   249 | (45989, 249),(45910, 222)
>   377 | (11399, 377),(11360, 294)
>   389 | (12162, 389),(12103, 309)
> (15 rows)
>
> As you can see, it's not actually sorted by the c~>4 coordinate (but by
> c~>2, which it the last number).
>
> Moreover, disabling index scans fixes the ordering:
>
> test=# set enable_indexscan = off;
> SET
> test=# SELECT c~>4, * FROM test_cube ORDER BY c~>4 LIMIT 15; --
> ascending by 2nd coordinate or upper right corner
>  ?column? | c
> --+---
>50 | (30333, 50),(30273, 6)
>75 | (43301, 75),(43227, 43)
>   142 | (19650, 142),(19630, 51)
>   155 | (18037, 155),(17941, 109)
>   160 | (2424, 160),(2424, 81)
>   171 | (3449, 171),(3354, 108)
>   187 | (759, 187),(662, 163)
>   191 | (16906, 191),(16816, 139)
>   208 | (28511, 208),(28479, 114)
>   217 | (19946, 217),(19941, 118)
>   249 | (45989, 249),(45910, 222)
>   255 | (24423, 255),(24360, 213)
>   266 | (22684, 266),(22656, 181)
>   367 | (31018, 367),(30946, 333)
>   377 | (11399, 377),(11360, 294)
> (15 rows)
>
>
> Seems like a bug somewhere in gist_cube_ops, I guess?
>

+1,
that definitely looks like a bug. Thank you for reporting!
I'll take a look on it in couple days.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


[HACKERS] CUBE seems a bit confused about ORDER BY

2017-10-19 Thread Tomas Vondra
Hi,

I've noticed this suspicious behavior of "cube" data type with ORDER BY,
which I believe is a bug in the extension (or the GiST index support).
The following example comes directly from regression tests added by
33bd250f (so CC Teodor and Stas, who are mentioned in the commit).

This query should produce results with ordering "ascending by 2nd
coordinate or upper right corner". To make it clear, I've added the
"c~>4" expression to the query, otherwise it's right from the test.

test=# SELECT c~>4 "c~>4", * FROM test_cube ORDER BY c~>4 LIMIT 15;
 c~>4 | c
--+---
   50 | (30333, 50),(30273, 6)
   75 | (43301, 75),(43227, 43)
  142 | (19650, 142),(19630, 51)
  160 | (2424, 160),(2424, 81)
  171 | (3449, 171),(3354, 108)
  155 | (18037, 155),(17941, 109)
  208 | (28511, 208),(28479, 114)
  217 | (19946, 217),(19941, 118)
  191 | (16906, 191),(16816, 139)
  187 | (759, 187),(662, 163)
  266 | (22684, 266),(22656, 181)
  255 | (24423, 255),(24360, 213)
  249 | (45989, 249),(45910, 222)
  377 | (11399, 377),(11360, 294)
  389 | (12162, 389),(12103, 309)
(15 rows)

As you can see, it's not actually sorted by the c~>4 coordinate (but by
c~>2, which it the last number).

Moreover, disabling index scans fixes the ordering:

test=# set enable_indexscan = off;
SET
test=# SELECT c~>4, * FROM test_cube ORDER BY c~>4 LIMIT 15; --
ascending by 2nd coordinate or upper right corner
 ?column? | c
--+---
   50 | (30333, 50),(30273, 6)
   75 | (43301, 75),(43227, 43)
  142 | (19650, 142),(19630, 51)
  155 | (18037, 155),(17941, 109)
  160 | (2424, 160),(2424, 81)
  171 | (3449, 171),(3354, 108)
  187 | (759, 187),(662, 163)
  191 | (16906, 191),(16816, 139)
  208 | (28511, 208),(28479, 114)
  217 | (19946, 217),(19941, 118)
  249 | (45989, 249),(45910, 222)
  255 | (24423, 255),(24360, 213)
  266 | (22684, 266),(22656, 181)
  367 | (31018, 367),(30946, 333)
  377 | (11399, 377),(11360, 294)
(15 rows)


Seems like a bug somewhere in gist_cube_ops, I guess?


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers