Re: [HACKERS] [PATCH]: fix bug in SP-GiST box_ops

Kyotaro HORIGUCHI Tue, 31 Jan 2017 02:05:55 -0800

Hello, thank you for the revised patch.

The only comment from me is about comments on the new over*2D
funnctions.



At Mon, 30 Jan 2017 21:12:31 +0300, Nikita Glukhov <n.glu...@postgrespro.ru> 
wrote in <4450e7a6-01e7-0fb2-a01e-98fb5405d...@postgrespro.ru>
> On 30.01.2017 12:04, Kyotaro HORIGUCHI wrote:
> > Hello,
> >
> > At Mon, 30 Jan 2017 07:12:03 +0300, Nikita Glukhov
> > <n.glu...@postgrespro.ru> wrote
> > in <9ea5b157-478c-8874-bc9b-050076b7d...@postgrespro.ru>:
> >> Working on the tests for new SP-GiST opclasses for polygons and
> >> circles, I've found a bug in the SP-GiST box_ops (added in 9.6): some
> >> operators
> >> (&<, &>, $<|, |&>) have wrong tests in
> >> spg_box_quad_inner_consistent().
> >> This obviously leads to incorrect results of a SP-GiST index scan (see
> >> tests
> >> in the attached patch, their results were taken from a sequential
> >> scan).
> >
> > Your problem is not necessarily evident for others.  Perhaps you
> > have to provide an explanation and/or a test case that describes
> > what is wrong for you so that others can get a clue for this
> > problem. Simpler test is better.
> 
> The problem is that there are mixed low/high values for x/y axes.  For
> example,
> overLeft4D() compares x-RangeBox rect_box->range_box_x with y-Range
> &query->right,
> and also lower2D() here must use query->high instead of query->low.
> 
> The corresponding test is very simple: insert 10000 nearly arbitrary
> boxes and
> see the difference between results of sequential scan and results of
> index scan.
> 
> regression=# CREATE TEMPORARY TABLE quad_box_tbl (b box);
> 
> regression=# INSERT INTO quad_box_tbl
>       SELECT box(point(x * 10, y * 10), point(x * 10 + 5, y * 10 + 5))
>       FROM generate_series(1, 100) x,
>                generate_series(1, 100) y;
> 
> regression=# CREATE INDEX quad_box_tbl_idx ON quad_box_tbl USING
> spgist(b);

Thank you for the explanation and the test with fixed numbers.
I clearly understand what is the problem.

> regression=# SET enable_seqscan = ON;
> regression=# SET enable_indexscan = OFF;
> regression=# SET enable_bitmapscan = OFF;
> 
> regression=# SELECT count(*) FROM quad_box_tbl WHERE b &< box
> '((100,200),(300,500))';
>  count
> -------
>   2900
> (1 row)
> 
> regression=# SELECT count(*) FROM quad_box_tbl WHERE b &> box
> '((100,200),(300,500))';
>  count
> -------
>   9100
> (1 row)
> 
> regression=# SELECT count(*) FROM quad_box_tbl WHERE b &<| box
> '((100,200),(300,500))';
>  count
> -------
>   4900
> (1 row)
> 
> regression=# SELECT count(*) FROM quad_box_tbl WHERE b |&> box
> '((100,200),(300,500))';
>  count
> -------
>   8100
> (1 row)
> 
> regression=# SET enable_seqscan = OFF;
> regression=# SET enable_indexscan = ON;
> regression=# SET enable_bitmapscan = ON;
(different result for the same queies)

The minimal bad example for the '&<' case is the following.

=# create table t (b box);
=# create index on t using spgist(b);
=# insert into t values ('((215, 305),(210,300))'::box);
=# select * from t where b &< '((100,200),(300,400))'::box;
          b          
---------------------
 (215,305),(210,300)
(1 row)

=# set enable_seqscan to false;
=# select * from t where b &< '((100,200),(300,400))'::box;
 b 
---
(0 rows)

Index scan excludes the row and it obviously is not a difference
comes from error nor tolerance.

The current overLeft4D is equivalent to the following.

| FPlt(rect_box->range_box_x->left.high,  query->right.high) &&
| FPlt(rect_box->range_box_x->right.high, query->right.high)

This is translated into the BOX context as the following.

| FPlt(range high(low.x), query->high.y) &&
| FPlt(range high(high.x), query->high.y)

Yes, it is obviously wrong as your report.


The code part of the v02 patch looks very good than the previous
one. It is exactly what I had in my mind. Thanks.

The following comment,

> /* Can any range from range_box to be overlower than this argument? */

This might be better to be using the same wording to its users,
for example the comment for overLeft4D is the following.

| /* Can any rectangle from rect_box does not extend the right of this 
argument? */

And the documentation uses the following sentsnce,

https://www.postgresql.org/docs/current/static/functions-geometry.html
| Does not extend to the right of?

So I think "Can any range from range_box does not extend the
upper of this argument?" is better than the overlower. What do
you think?


> > counting them in this way is unstable. Even though this were
> > stable due to a fixed initial, this would be unacceptable, I
> > think. This kind of test should use nonrandom numbers.
> 
> I have replaced random data in this test with stable uniformly
> distributed data.
> I would also like to use existing box data from rect.data that is
> loaded to
> slow_emp4000 table in copy.sql test, but box.sql test is executed
> before copy.sql.

Nonrandom data is good. I'd like to compare the result of
corresnponding queries but unfortunately we don't have outer-join
on boxes. Anyway the test is enough since it detects this bug.

I confirmed other functions in geo_spgist but found no bugs like
this.

> > Even though I don't understand this in depth, the following
> > change seems somewhat wrong in direction. Changing the second
> > argument type seems breaking the basis of the design.
> >
> > | -lower2D(RangeBox *range_box, Range *query)
> > | +lower2D(RangeBox *range_box, double query)
> 
> Maybe.  I also thought that these changes are quite doubtful, so I
> decided to
> replace lowerEqual2D()/higherEqual2D() with
> overlower2D()/overhigher2D() and
> not to touch lower2D()/higher2D().

Thanks, this looks bery good.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH]: fix bug in SP-GiST box_ops

Reply via email to