Hello, thank you for the revised patch. The only comment from me is about comments on the new over*2D funnctions.
At Mon, 30 Jan 2017 21:12:31 +0300, Nikita Glukhov <n.glu...@postgrespro.ru> wrote in <4450e7a6-01e7-0fb2-a01e-98fb5405d...@postgrespro.ru> > On 30.01.2017 12:04, Kyotaro HORIGUCHI wrote: > > Hello, > > > > At Mon, 30 Jan 2017 07:12:03 +0300, Nikita Glukhov > > <n.glu...@postgrespro.ru> wrote > > in <9ea5b157-478c-8874-bc9b-050076b7d...@postgrespro.ru>: > >> Working on the tests for new SP-GiST opclasses for polygons and > >> circles, I've found a bug in the SP-GiST box_ops (added in 9.6): some > >> operators > >> (&<, &>, $<|, |&>) have wrong tests in > >> spg_box_quad_inner_consistent(). > >> This obviously leads to incorrect results of a SP-GiST index scan (see > >> tests > >> in the attached patch, their results were taken from a sequential > >> scan). > > > > Your problem is not necessarily evident for others. Perhaps you > > have to provide an explanation and/or a test case that describes > > what is wrong for you so that others can get a clue for this > > problem. Simpler test is better. > > The problem is that there are mixed low/high values for x/y axes. For > example, > overLeft4D() compares x-RangeBox rect_box->range_box_x with y-Range > &query->right, > and also lower2D() here must use query->high instead of query->low. > > The corresponding test is very simple: insert 10000 nearly arbitrary > boxes and > see the difference between results of sequential scan and results of > index scan. > > regression=# CREATE TEMPORARY TABLE quad_box_tbl (b box); > > regression=# INSERT INTO quad_box_tbl > SELECT box(point(x * 10, y * 10), point(x * 10 + 5, y * 10 + 5)) > FROM generate_series(1, 100) x, > generate_series(1, 100) y; > > regression=# CREATE INDEX quad_box_tbl_idx ON quad_box_tbl USING > spgist(b); Thank you for the explanation and the test with fixed numbers. I clearly understand what is the problem. > regression=# SET enable_seqscan = ON; > regression=# SET enable_indexscan = OFF; > regression=# SET enable_bitmapscan = OFF; > > regression=# SELECT count(*) FROM quad_box_tbl WHERE b &< box > '((100,200),(300,500))'; > count > ------- > 2900 > (1 row) > > regression=# SELECT count(*) FROM quad_box_tbl WHERE b &> box > '((100,200),(300,500))'; > count > ------- > 9100 > (1 row) > > regression=# SELECT count(*) FROM quad_box_tbl WHERE b &<| box > '((100,200),(300,500))'; > count > ------- > 4900 > (1 row) > > regression=# SELECT count(*) FROM quad_box_tbl WHERE b |&> box > '((100,200),(300,500))'; > count > ------- > 8100 > (1 row) > > regression=# SET enable_seqscan = OFF; > regression=# SET enable_indexscan = ON; > regression=# SET enable_bitmapscan = ON; (different result for the same queies) The minimal bad example for the '&<' case is the following. =# create table t (b box); =# create index on t using spgist(b); =# insert into t values ('((215, 305),(210,300))'::box); =# select * from t where b &< '((100,200),(300,400))'::box; b --------------------- (215,305),(210,300) (1 row) =# set enable_seqscan to false; =# select * from t where b &< '((100,200),(300,400))'::box; b --- (0 rows) Index scan excludes the row and it obviously is not a difference comes from error nor tolerance. The current overLeft4D is equivalent to the following. | FPlt(rect_box->range_box_x->left.high, query->right.high) && | FPlt(rect_box->range_box_x->right.high, query->right.high) This is translated into the BOX context as the following. | FPlt(range high(low.x), query->high.y) && | FPlt(range high(high.x), query->high.y) Yes, it is obviously wrong as your report. The code part of the v02 patch looks very good than the previous one. It is exactly what I had in my mind. Thanks. The following comment, > /* Can any range from range_box to be overlower than this argument? */ This might be better to be using the same wording to its users, for example the comment for overLeft4D is the following. | /* Can any rectangle from rect_box does not extend the right of this argument? */ And the documentation uses the following sentsnce, https://www.postgresql.org/docs/current/static/functions-geometry.html | Does not extend to the right of? So I think "Can any range from range_box does not extend the upper of this argument?" is better than the overlower. What do you think? > > counting them in this way is unstable. Even though this were > > stable due to a fixed initial, this would be unacceptable, I > > think. This kind of test should use nonrandom numbers. > > I have replaced random data in this test with stable uniformly > distributed data. > I would also like to use existing box data from rect.data that is > loaded to > slow_emp4000 table in copy.sql test, but box.sql test is executed > before copy.sql. Nonrandom data is good. I'd like to compare the result of corresnponding queries but unfortunately we don't have outer-join on boxes. Anyway the test is enough since it detects this bug. I confirmed other functions in geo_spgist but found no bugs like this. > > Even though I don't understand this in depth, the following > > change seems somewhat wrong in direction. Changing the second > > argument type seems breaking the basis of the design. > > > > | -lower2D(RangeBox *range_box, Range *query) > > | +lower2D(RangeBox *range_box, double query) > > Maybe. I also thought that these changes are quite doubtful, so I > decided to > replace lowerEqual2D()/higherEqual2D() with > overlower2D()/overhigher2D() and > not to touch lower2D()/higher2D(). Thanks, this looks bery good. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers