On Mon, Feb 20, 2012 at 7:22 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Alexander Korotkov <aekorot...@gmail.com> writes: > > On Thu, Feb 16, 2012 at 11:43 AM, Alexander Korotkov > > <aekorot...@gmail.com>wrote: > >> Described differences leads to incorrect behaviour of GiST index. > >> The question is: what is correct way to fix it? Should on_pb also use > FP* > >> or consistent method should behave like on_pb? > > > Any comments on this? Current behaviour definitely indicates a bug, and > I'm > > ready to fix it. The only question: is this bug in on_pb or gist? > > I'm inclined to think the right answer is to make on_pb use the FP* > macros, for consistency with other geometric operators. But it's worth > asking whether that will actually fix the problem. I've thought for > some time that we'd eventually find cases where geo_ops' use of fuzzy > comparisons breaks index behavior entirely, because it destroys natural > assumptions like the transitive law. So that could eventually lead us > to rip out the FP* macros everywhere. > > In any case, this doesn't seem like something we could back-patch; > it'd be a behavioral change in HEAD only. > Analyzing GiST interface methods more detail I found one other place where fuzzy comparison was used. It is gist_box_same function. And it's really scary thing. It means that entry of internal page is not extended when inserting index tuple extends it in less than EPSILON. Correspondingly current GiST search behaviour is neither exact search or fuzzy search with given EPSILON. It is something in the middle. See following example for proof. test=# create table test(a box); CREATE TABLE test=# insert into test (select (case when i%2= 0 then box(point(1,1),point(1,1)) else box(point(2,2),point(2,2)) end) from generate_series(1,300) i); INSERT 0 300 test=# create index test_idx on test using gist(a); CREATE INDEX test=# test=# insert into test values (box(point(1.0000009,1.0000009), point(1.0000009,1.0000009))); INSERT 0 1 test=# select * from test where a && box(point(1.0000018,1.0000018), point(1.0000018,1.0000018)); a --------------------------------------------- (1.0000009,1.0000009),(1.0000009,1.0000009) (1 row) test=# set enable_seqscan = off; SET test=# select * from test where a && box(point(1.0000018,1.0000018), point(1.0000018,1.0000018)); a --- (0 rows) But, I believe we still can bachpatch it in one of following ways: 1) Fix consistent and same functions. Add to release note notice that users should rebuild indexes if they want correct behaviour. 2) Leave same function as is. Add kluges to consistent functions which provides correct search on current not truly correct trees. But anyway +1 for rip out the FP* macros everywhere in HEAD. Because I really dislike the way FP* are currently implemented. Why EPSILON is 1.0E-06? We don't know anything about nature of data that users store in geometrical datatypes. The lesser double precision value is 5E-324. For some datasets EPSILON can easily cover whole range. ------ With bestregards, Alexander Korotkov.