Re: [HACKERS] What is "index returned tuples in wrong order" for recheck supposed to guard against?
I forgot to mention this.. At Thu, 02 Feb 2017 21:11:16 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHIwrote in <20170202.26.49572782.horiguchi.kyot...@lab.ntt.co.jp> > Hello, I digged this topic covered by a spiderweb.. > > # PostGIS requires so many libraries installed :( > > FIY, for the given test case, the following query hits the bug. > > | SELECT edge.geom AS geom > | FROM (SELECT * FROM knn_recheck_geom) AS edge > | ORDER BY '01010014AE47E17A141FC09AA170C0' <-> edge.geom LIMIT 2; > > > At Tue, 3 Jan 2017 11:18:55 -0500, Robert Haas wrote > in > > On Tue, Jan 3, 2017 at 12:36 AM, Regina Obe wrote: > > > It's still not quite clear to me even looking at that git commit, why > > > those need to error instead of going thru recheck aside from efficiency. > > > > The code that reorders the returned tuples assumes that (1) the actual > > distance is always greater than or equal to the estimated distance and > > (2) the index returns the tuples in order of increasing estimated > > distance. Imagine that the estimated distances are 0, 1, 2, 3... and > > the real distances are 2,3,4,5... When it sees the > > estimated-distance-0 tuple it computes that the actual distance is 2, > > but it doesn't know whether there's going to be a tuple later with an > > actual distance between 0 and 2, so it buffers the tuple. When it sees > > the estimated-distance-1 tuple it computes that the actual distance is > > 2, and now it knows there won't be any more estimated or actual > > distances between 0 and 1, but there could still be a tuple with an > > estimated distance of 1 and 2 whose actual distance is also between 1 > > and 2, so it buffers the second tuple as well. When it sees the third > > tuple, with estimated distance 2, it now knows that there won't be any > > further tuples whose estimated or actual distance is less than 2. So > > now it can emit the first tuple that it saw, because that had an > > actual distance of 2 and from this point forward the index will never > > return anything with a smaller estimated or actual distance. The > > estimated-distance-1 tuple still has to stay in the buffer, though, > > until we see a tuple whose estimated distance is greater than that > > tuple's actual distance (3). > > The estimation is calculated using box2df_distance, and the > recheck evaluation is using distnce (of libgeom). > > For the problematic case, the distance's result is > "6.992439" and box2df_distance's is > "6.99330517578125" by "%20.18lf". The point should be just on > the edge of its bounding box. The binary representaions of the two are the following. distance: "40 1b f8 d4 fd f3 b6 45" =(0:plus) (101 = 2^(1025-1023) = *4) 1."bf8d4fdf3b645" estimation: "40 1b f8 d5 00 00 00 00" =(0:plus) (101 = 2^(1025-1023) = *4) 1."bf8d5" I mean, the estimation (box2df_distance) seems getting precision reduction on its matissa. (I cannot say more than this, though) > gserialized_gist_distance_2d() of PostGIS 2.3.0: > > /* In all cases, since we only have keys (boxes) we'll return */ > > /* the minimum possible distance, which is the box2df_distance */ > > /* and let the recheck sort things out in the case of leaves */ > > distance = (double)box2df_distance(entry_box, _box); > > This theoretically doesn't give larger value than the real > distance but it is failing. I'm not sure how to fix this but at > least it is not a problem of GIST or PostgreSQL. > > If set the distance() of libgeom as the base, box2df_distance > might should return a very-very slightly smaller value (I don't > know how much it should be.) so that it can conseal the error of > its operation. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] What is "index returned tuples in wrong order" for recheck supposed to guard against?
Hello, I digged this topic covered by a spiderweb.. # PostGIS requires so many libraries installed :( FIY, for the given test case, the following query hits the bug. | SELECT edge.geom AS geom | FROM (SELECT * FROM knn_recheck_geom) AS edge | ORDER BY '01010014AE47E17A141FC09AA170C0' <-> edge.geom LIMIT 2; At Tue, 3 Jan 2017 11:18:55 -0500, Robert Haaswrote in > On Tue, Jan 3, 2017 at 12:36 AM, Regina Obe wrote: > > It's still not quite clear to me even looking at that git commit, why those > > need to error instead of going thru recheck aside from efficiency. > > The code that reorders the returned tuples assumes that (1) the actual > distance is always greater than or equal to the estimated distance and > (2) the index returns the tuples in order of increasing estimated > distance. Imagine that the estimated distances are 0, 1, 2, 3... and > the real distances are 2,3,4,5... When it sees the > estimated-distance-0 tuple it computes that the actual distance is 2, > but it doesn't know whether there's going to be a tuple later with an > actual distance between 0 and 2, so it buffers the tuple. When it sees > the estimated-distance-1 tuple it computes that the actual distance is > 2, and now it knows there won't be any more estimated or actual > distances between 0 and 1, but there could still be a tuple with an > estimated distance of 1 and 2 whose actual distance is also between 1 > and 2, so it buffers the second tuple as well. When it sees the third > tuple, with estimated distance 2, it now knows that there won't be any > further tuples whose estimated or actual distance is less than 2. So > now it can emit the first tuple that it saw, because that had an > actual distance of 2 and from this point forward the index will never > return anything with a smaller estimated or actual distance. The > estimated-distance-1 tuple still has to stay in the buffer, though, > until we see a tuple whose estimated distance is greater than that > tuple's actual distance (3). The estimation is calculated using box2df_distance, and the recheck evaluation is using distnce (of libgeom). For the problematic case, the distance's result is "6.992439" and box2df_distance's is "6.99330517578125" by "%20.18lf". The point should be just on the edge of its bounding box. gserialized_gist_distance_2d() of PostGIS 2.3.0: > /* In all cases, since we only have keys (boxes) we'll return */ > /* the minimum possible distance, which is the box2df_distance */ > /* and let the recheck sort things out in the case of leaves */ > distance = (double)box2df_distance(entry_box, _box); This theoretically doesn't give larger value than the real distance but it is failing. I'm not sure how to fix this but at least it is not a problem of GIST or PostgreSQL. If set the distance() of libgeom as the base, box2df_distance might should return a very-very slightly smaller value (I don't know how much it should be.) so that it can conseal the error of its operation. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] What is "index returned tuples in wrong order" for recheck supposed to guard against?
On Tue, Jan 3, 2017 at 12:36 AM, Regina Obewrote: >> cmp would return 0 if the estimated distance returned by the index AM were >> greater than the actual distance. >> The estimated distance can be less than the actual distance, but it isn't >> allowed to be more. See gist_bbox_distance for an example of a "lossy" >> distance calculation, and more generally "git show >> 35fcb1b3d038a501f3f4c87c05630095abaaadab". > > Did you mean would return < 0 ? Yes, sorry. > Since I thought 0 meant exact and not where it's Erroring? > > I think for points then maybe we should turn it off, as this could just be > floating point issues with the way we compute the index. > That would explain why it doesn't happen for other cases like polygon / > point in our code > or polygon /polygon in our code since the box box distance in our code would > always be <= actual distance for those. > > So maybe the best course of action is just for us inspect the geometries and > if both are points just disable recheck. > > It's still not quite clear to me even looking at that git commit, why those > need to error instead of going thru recheck aside from efficiency. The code that reorders the returned tuples assumes that (1) the actual distance is always greater than or equal to the estimated distance and (2) the index returns the tuples in order of increasing estimated distance. Imagine that the estimated distances are 0, 1, 2, 3... and the real distances are 2,3,4,5... When it sees the estimated-distance-0 tuple it computes that the actual distance is 2, but it doesn't know whether there's going to be a tuple later with an actual distance between 0 and 2, so it buffers the tuple. When it sees the estimated-distance-1 tuple it computes that the actual distance is 2, and now it knows there won't be any more estimated or actual distances between 0 and 1, but there could still be a tuple with an estimated distance of 1 and 2 whose actual distance is also between 1 and 2, so it buffers the second tuple as well. When it sees the third tuple, with estimated distance 2, it now knows that there won't be any further tuples whose estimated or actual distance is less than 2. So now it can emit the first tuple that it saw, because that had an actual distance of 2 and from this point forward the index will never return anything with a smaller estimated or actual distance. The estimated-distance-1 tuple still has to stay in the buffer, though, until we see a tuple whose estimated distance is greater than that tuple's actual distance (3). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] What is "index returned tuples in wrong order" for recheck supposed to guard against?
>> If things are out of order, why isn't just going to was_exact = false >> good enough? >> >> I'm not sure if the mistake is in our PostGIS code or something in >> PostgreSQL recheck logic. >> If I change the elog(ERROR ...) to a elog(NOTICE, the answers are >> correct and sort order is right. >> >> Under what conditions would cmp return less than 0? I tried following >> the code in cmp_orderbyvals, but got lost and trying to put elog >> notices in to see what the distance is returning (I probably did it >> wrong), just ended up crashing by backend. > cmp would return 0 if the estimated distance returned by the index AM were > greater than the actual distance. > The estimated distance can be less than the actual distance, but it isn't > allowed to be more. See gist_bbox_distance for an example of a "lossy" > distance calculation, and more generally "git show > 35fcb1b3d038a501f3f4c87c05630095abaaadab". >-- >Robert Haas > EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company Did you mean would return < 0 ? Since I thought 0 meant exact and not where it's Erroring? I think for points then maybe we should turn it off, as this could just be floating point issues with the way we compute the index. That would explain why it doesn't happen for other cases like polygon / point in our code or polygon /polygon in our code since the box box distance in our code would always be <= actual distance for those. So maybe the best course of action is just for us inspect the geometries and if both are points just disable recheck. It's still not quite clear to me even looking at that git commit, why those need to error instead of going thru recheck aside from efficiency. Thanks, Regina -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] What is "index returned tuples in wrong order" for recheck supposed to guard against?
On Fri, Dec 30, 2016 at 12:51 AM, Regina Obewrote: > I've been trying to troubleshoot the cause of this PostGIS recheck bug we > have reported by two people so far. The last test was a nice simple > repeatable one that triggered the issue: > > https://trac.osgeo.org/postgis/ticket/3418 > > from what I have seen this only affects cases where we are doing a distance > check between two points, which we actually don't need to enable recheck for > anyway, but trying to disable that seems like just shoving the real problem > under the covers. Agreed. > If things are out of order, why isn't just going to was_exact = false good > enough? > > I'm not sure if the mistake is in our PostGIS code or something in > PostgreSQL recheck logic. > If I change the elog(ERROR ...) to a elog(NOTICE, the answers are correct > and sort order is right. > > Under what conditions would cmp return less than 0? I tried following the > code in cmp_orderbyvals, but got lost > and trying to put elog notices in to see what the distance is returning (I > probably did it wrong), just ended up crashing by backend. cmp would return 0 if the estimated distance returned by the index AM were greater than the actual distance. The estimated distance can be less than the actual distance, but it isn't allowed to be more. See gist_bbox_distance for an example of a "lossy" distance calculation, and more generally "git show 35fcb1b3d038a501f3f4c87c05630095abaaadab". -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] What is "index returned tuples in wrong order" for recheck supposed to guard against?
I've been trying to troubleshoot the cause of this PostGIS recheck bug we have reported by two people so far. The last test was a nice simple repeatable one that triggered the issue: https://trac.osgeo.org/postgis/ticket/3418 from what I have seen this only affects cases where we are doing a distance check between two points, which we actually don't need to enable recheck for anyway, but trying to disable that seems like just shoving the real problem under the covers. Where it errors is this line 272 in src/backend/executor/nodeIndexscan https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/exe cutor/nodeIndexscan.c;h=3143bd94ec4499fba94b41693538b785c4b32e6c;hb=HEAD#l27 2 /* 259 * Was the ORDER BY value returned by the index accurate? The 260 * recheck flag means that the index can return inaccurate values, 261 * but then again, the value returned for any particular tuple 262 * could also be exactly correct. Compare the value returned by 263 * the index with the recalculated value. (If the value returned 264 * by the index happened to be exact right, we can often avoid 265 * pushing the tuple to the queue, just to pop it back out again.) 266 */ 267 cmp = cmp_orderbyvals(node->iss_OrderByValues, 268 node->iss_OrderByNulls, 269 scandesc->xs_orderbyvals, 270 scandesc->xs_orderbynulls, 271 node); 272 if (cmp < 0) 273 elog(ERROR, "index returned tuples in wrong order"); 274 else if (cmp == 0) 275 was_exact = true; 276 else 277 was_exact = false; If things are out of order, why isn't just going to was_exact = false good enough? I'm not sure if the mistake is in our PostGIS code or something in PostgreSQL recheck logic. If I change the elog(ERROR ...) to a elog(NOTICE, the answers are correct and sort order is right. Under what conditions would cmp return less than 0? I tried following the code in cmp_orderbyvals, but got lost and trying to put elog notices in to see what the distance is returning (I probably did it wrong), just ended up crashing by backend. Thanks for any thoughts, Regina -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers