I forgot to mention this.. At Thu, 02 Feb 2017 21:11:16 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp> wrote in <20170202.211116.49572782.horiguchi.kyot...@lab.ntt.co.jp> > Hello, I digged this topic covered by a spiderweb.. > > # PostGIS requires so many libraries installed :( > > FIY, for the given test case, the following query hits the bug. > > | SELECT edge.geom AS geom > | FROM (SELECT * FROM knn_recheck_geom) AS edge > | ORDER BY '010100000014AE47E17A141FC09A99999999A170C0' <-> edge.geom LIMIT 2; > > > At Tue, 3 Jan 2017 11:18:55 -0500, Robert Haas <robertmh...@gmail.com> wrote > in <ca+tgmoauhlf6r07sauzqircstf5kfrw7nwiwn4vzgisf8ma...@mail.gmail.com> > > On Tue, Jan 3, 2017 at 12:36 AM, Regina Obe <l...@pcorp.us> wrote: > > > It's still not quite clear to me even looking at that git commit, why > > > those need to error instead of going thru recheck aside from efficiency. > > > > The code that reorders the returned tuples assumes that (1) the actual > > distance is always greater than or equal to the estimated distance and > > (2) the index returns the tuples in order of increasing estimated > > distance. Imagine that the estimated distances are 0, 1, 2, 3... and > > the real distances are 2,3,4,5... When it sees the > > estimated-distance-0 tuple it computes that the actual distance is 2, > > but it doesn't know whether there's going to be a tuple later with an > > actual distance between 0 and 2, so it buffers the tuple. When it sees > > the estimated-distance-1 tuple it computes that the actual distance is > > 2, and now it knows there won't be any more estimated or actual > > distances between 0 and 1, but there could still be a tuple with an > > estimated distance of 1 and 2 whose actual distance is also between 1 > > and 2, so it buffers the second tuple as well. When it sees the third > > tuple, with estimated distance 2, it now knows that there won't be any > > further tuples whose estimated or actual distance is less than 2. So > > now it can emit the first tuple that it saw, because that had an > > actual distance of 2 and from this point forward the index will never > > return anything with a smaller estimated or actual distance. The > > estimated-distance-1 tuple still has to stay in the buffer, though, > > until we see a tuple whose estimated distance is greater than that > > tuple's actual distance (3). > > The estimation is calculated using box2df_distance, and the > recheck evaluation is using distnce (of libgeom). > > For the problematic case, the distance's result is > "6.992999999999999439" and box2df_distance's is > "6.993000030517578125" by "%20.18lf". The point should be just on > the edge of its bounding box.

## Advertising

The binary representaions of the two are the following. distance: "40 1b f8 d4 fd f3 b6 45" =(0:plus) (10000000001 = 2^(1025-1023) = *4) 1."bf8d4fdf3b645" estimation: "40 1b f8 d5 00 00 00 00" =(0:plus) (10000000001 = 2^(1025-1023) = *4) 1."bf8d500000000" I mean, the estimation (box2df_distance) seems getting precision reduction on its matissa. (I cannot say more than this, though) > gserialized_gist_distance_2d() of PostGIS 2.3.0: > > /* In all cases, since we only have keys (boxes) we'll return */ > > /* the minimum possible distance, which is the box2df_distance */ > > /* and let the recheck sort things out in the case of leaves */ > > distance = (double)box2df_distance(entry_box, &query_box); > > This theoretically doesn't give larger value than the real > distance but it is failing. I'm not sure how to fix this but at > least it is not a problem of GIST or PostgreSQL. > > If set the distance() of libgeom as the base, box2df_distance > might should return a very-very slightly smaller value (I don't > know how much it should be.) so that it can conseal the error of > its operation. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers