Re: [HACKERS] CUBE seems a bit confused about ORDER BY
On Sun, Oct 29, 2017 at 1:30 AM, Alexander Korotkov < a.korot...@postgrespro.ru> wrote: > On Fri, Oct 20, 2017 at 1:01 AM, Alexander Korotkov < > a.korot...@postgrespro.ru> wrote: > I've reviewed code of ~> operator and its KNN-GiST support. > Unfortunately, it appears that it's broken in design... The explanation is > above. > > We've following implementation of ~> operator. > > if (coord <= DIM(cube)) >> { >> if (IS_POINT(cube)) >> PG_RETURN_FLOAT8(cube->x[coord - 1]); >> else >> PG_RETURN_FLOAT8(Min(cube->x[coord - 1], >> cube->x[coord - 1 + DIM(cube)])); >> } >> else >> { >> if (IS_POINT(cube)) >> PG_RETURN_FLOAT8(cube->x[(coord - 1) % DIM(cube)]); >> else >> PG_RETURN_FLOAT8(Max(cube->x[coord - 1], >> cube->x[coord - 1 - DIM(cube)])); >> } > > > Thus, for cube of N dimensions [1; N - 1] are lower bounds and [N; 2*N - > 1] are upper bounds. However, we might have indexed cubes of different > dimensions. For example, for cube of 2 dimensions "cube ~> 3" selects > upper bound of 1st dimension. For cube of 3 dimensions "cube ~> 3" selects > lower bound of 3rd dimension. > > Therefore, despite ~> operator was specially invented to be supported by > KNN-GIST, it can't serve this way. > > Regarding particular case discovered by Tomas, I think the error is in the > GiST supporting code. > > if (strategy == CubeKNNDistanceCoord) >> { >> int coord = PG_GETARG_INT32(1); >> if (DIM(cube) == 0) >> retval = 0.0; >> else if (IS_POINT(cube)) >> retval = cube->x[(coord - 1) % DIM(cube)]; >> else >> retval = Min(cube->x[(coord - 1) % DIM(cube)], >> cube->x[(coord - 1) % DIM(cube) + DIM(cube)]); >> } > > > g_cube_distance() always returns lower bound of cube. It should return > upper bound for coord > DIM(cube). > > It would be also nice to provide descending ordering using KNN-GiST. It > would be especially effective for upper bound. Since, KNN-GiST doesn't > support explicit descending ordering, it might be useful to make ~> > operator return negative of coordinate when negative argument is provided. > For instance, '(1,2), (3,4)'::cube ~> -1 return -1. > > I'd like to propose following way to resolve design issue. cube ~> (2*N - > 1) should return lower bound of Nth coordinate of the cube while cube ~> > 2*N should return upper bound of Nth coordinate. Then it would be > independent on number of coordinates in particular cube. For sure, it > would be user-visible incompatible change. But I think there is not so > many users of this operator yet. Also, current behavior of ~> seems quite > useless. > Thus, I heard no objection to my approach. Attached patch changes ~> operator in the proposed way and fixes KNN-GiST search for that. I'm going to register this patch to the nearest commitfest. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company cube-knn-fix-1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CUBE seems a bit confused about ORDER BY
On Fri, Oct 20, 2017 at 1:01 AM, Alexander Korotkov < a.korot...@postgrespro.ru> wrote: > On Fri, Oct 20, 2017 at 12:52 AM, Tomas Vondra < > tomas.von...@2ndquadrant.com> wrote: > >> I've noticed this suspicious behavior of "cube" data type with ORDER BY, >> which I believe is a bug in the extension (or the GiST index support). >> The following example comes directly from regression tests added by >> 33bd250f (so CC Teodor and Stas, who are mentioned in the commit). >> >> This query should produce results with ordering "ascending by 2nd >> coordinate or upper right corner". To make it clear, I've added the >> "c~>4" expression to the query, otherwise it's right from the test. >> >> test=# SELECT c~>4 "c~>4", * FROM test_cube ORDER BY c~>4 LIMIT 15; >> c~>4 | c >> --+--- >>50 | (30333, 50),(30273, 6) >>75 | (43301, 75),(43227, 43) >> 142 | (19650, 142),(19630, 51) >> 160 | (2424, 160),(2424, 81) >> 171 | (3449, 171),(3354, 108) >> 155 | (18037, 155),(17941, 109) >> 208 | (28511, 208),(28479, 114) >> 217 | (19946, 217),(19941, 118) >> 191 | (16906, 191),(16816, 139) >> 187 | (759, 187),(662, 163) >> 266 | (22684, 266),(22656, 181) >> 255 | (24423, 255),(24360, 213) >> 249 | (45989, 249),(45910, 222) >> 377 | (11399, 377),(11360, 294) >> 389 | (12162, 389),(12103, 309) >> (15 rows) >> >> As you can see, it's not actually sorted by the c~>4 coordinate (but by >> c~>2, which it the last number). >> >> Moreover, disabling index scans fixes the ordering: >> >> test=# set enable_indexscan = off; >> SET >> test=# SELECT c~>4, * FROM test_cube ORDER BY c~>4 LIMIT 15; -- >> ascending by 2nd coordinate or upper right corner >> ?column? | c >> --+--- >>50 | (30333, 50),(30273, 6) >>75 | (43301, 75),(43227, 43) >> 142 | (19650, 142),(19630, 51) >> 155 | (18037, 155),(17941, 109) >> 160 | (2424, 160),(2424, 81) >> 171 | (3449, 171),(3354, 108) >> 187 | (759, 187),(662, 163) >> 191 | (16906, 191),(16816, 139) >> 208 | (28511, 208),(28479, 114) >> 217 | (19946, 217),(19941, 118) >> 249 | (45989, 249),(45910, 222) >> 255 | (24423, 255),(24360, 213) >> 266 | (22684, 266),(22656, 181) >> 367 | (31018, 367),(30946, 333) >> 377 | (11399, 377),(11360, 294) >> (15 rows) >> >> >> Seems like a bug somewhere in gist_cube_ops, I guess? >> > > +1, > that definitely looks like a bug. Thank you for reporting! > I'll take a look on it in couple days. > I've reviewed code of ~> operator and its KNN-GiST support. Unfortunately, it appears that it's broken in design... The explanation is above. We've following implementation of ~> operator. if (coord <= DIM(cube)) > { > if (IS_POINT(cube)) > PG_RETURN_FLOAT8(cube->x[coord - 1]); > else > PG_RETURN_FLOAT8(Min(cube->x[coord - 1], > cube->x[coord - 1 + DIM(cube)])); > } > else > { > if (IS_POINT(cube)) > PG_RETURN_FLOAT8(cube->x[(coord - 1) % DIM(cube)]); > else > PG_RETURN_FLOAT8(Max(cube->x[coord - 1], > cube->x[coord - 1 - DIM(cube)])); > } Thus, for cube of N dimensions [1; N - 1] are lower bounds and [N; 2*N - 1] are upper bounds. However, we might have indexed cubes of different dimensions. For example, for cube of 2 dimensions "cube ~> 3" selects upper bound of 1st dimension. For cube of 3 dimensions "cube ~> 3" selects lower bound of 3rd dimension. Therefore, despite ~> operator was specially invented to be supported by KNN-GIST, it can't serve this way. Regarding particular case discovered by Tomas, I think the error is in the GiST supporting code. if (strategy == CubeKNNDistanceCoord) > { > int coord = PG_GETARG_INT32(1); > if (DIM(cube) == 0) > retval = 0.0; > else if (IS_POINT(cube)) > retval = cube->x[(coord - 1) % DIM(cube)]; > else > retval = Min(cube->x[(coord - 1) % DIM(cube)], > cube->x[(coord - 1) % DIM(cube) + DIM(cube)]); > } g_cube_distance() always returns lower bound of cube. It should return upper bound for coord > DIM(cube). It would be also nice to provide descending ordering using KNN-GiST. It would be especially effective for upper bound. Since, KNN-GiST doesn't support explicit descending ordering, it might be useful to make ~> operator return negative of coordinate when negative argument is provided. For instance, '(1,2), (3,4)'::cube ~> -1 return -1. I'd like to propose following way to resolve design issue. cube ~> (2*N - 1) should return lower bound of Nth coordinate of the cube while cube ~> 2*N should return upper bound of Nth coordinate. Then it would be independent on number of coordinates in particular cube. For sure, it would be user-visible incompatible change. But I think there is not so many users of this operator yet. Also, current behavior of ~> seems quite useless. Any thoughts? -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] CUBE seems a bit confused about ORDER BY
Hi! On Fri, Oct 20, 2017 at 12:52 AM, Tomas Vondrawrote: > I've noticed this suspicious behavior of "cube" data type with ORDER BY, > which I believe is a bug in the extension (or the GiST index support). > The following example comes directly from regression tests added by > 33bd250f (so CC Teodor and Stas, who are mentioned in the commit). > > This query should produce results with ordering "ascending by 2nd > coordinate or upper right corner". To make it clear, I've added the > "c~>4" expression to the query, otherwise it's right from the test. > > test=# SELECT c~>4 "c~>4", * FROM test_cube ORDER BY c~>4 LIMIT 15; > c~>4 | c > --+--- >50 | (30333, 50),(30273, 6) >75 | (43301, 75),(43227, 43) > 142 | (19650, 142),(19630, 51) > 160 | (2424, 160),(2424, 81) > 171 | (3449, 171),(3354, 108) > 155 | (18037, 155),(17941, 109) > 208 | (28511, 208),(28479, 114) > 217 | (19946, 217),(19941, 118) > 191 | (16906, 191),(16816, 139) > 187 | (759, 187),(662, 163) > 266 | (22684, 266),(22656, 181) > 255 | (24423, 255),(24360, 213) > 249 | (45989, 249),(45910, 222) > 377 | (11399, 377),(11360, 294) > 389 | (12162, 389),(12103, 309) > (15 rows) > > As you can see, it's not actually sorted by the c~>4 coordinate (but by > c~>2, which it the last number). > > Moreover, disabling index scans fixes the ordering: > > test=# set enable_indexscan = off; > SET > test=# SELECT c~>4, * FROM test_cube ORDER BY c~>4 LIMIT 15; -- > ascending by 2nd coordinate or upper right corner > ?column? | c > --+--- >50 | (30333, 50),(30273, 6) >75 | (43301, 75),(43227, 43) > 142 | (19650, 142),(19630, 51) > 155 | (18037, 155),(17941, 109) > 160 | (2424, 160),(2424, 81) > 171 | (3449, 171),(3354, 108) > 187 | (759, 187),(662, 163) > 191 | (16906, 191),(16816, 139) > 208 | (28511, 208),(28479, 114) > 217 | (19946, 217),(19941, 118) > 249 | (45989, 249),(45910, 222) > 255 | (24423, 255),(24360, 213) > 266 | (22684, 266),(22656, 181) > 367 | (31018, 367),(30946, 333) > 377 | (11399, 377),(11360, 294) > (15 rows) > > > Seems like a bug somewhere in gist_cube_ops, I guess? > +1, that definitely looks like a bug. Thank you for reporting! I'll take a look on it in couple days. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company