Re: Strange behavior with polygon and NaN
At Mon, 1 Aug 2022 13:29:09 -0700, Jacob Champion wrote in > As discussed in [1], we're taking this opportunity to return some > patchsets that don't appear to be getting enough reviewer interest. Oh, sorry. I missed that thread. Thank you for kindly noticing that. > This is not a rejection, since we don't necessarily think there's > anything unacceptable about the entry, but it differs from a standard > "Returned with Feedback" in that there's probably not much actionable > feedback at all. Rather than code changes, what this patch needs is more > community interest. You might > > - ask people for help with your approach, > - see if there are similar patches that your code could supplement, > - get interested parties to agree to review your patch in a CF, or > - possibly present the functionality in a way that's easier to review > overall. > > (Doing these things is no guarantee that there will be interest, but > it's hopefully better than endlessly rebasing a patchset that is not > receiving any feedback from the community.) > > Once you think you've built up some community support and the patchset > is ready for review, you (or any interested party) can resurrect the > patch entry by visiting > > https://commitfest.postgresql.org/38/2710/ > > and changing the status to "Needs Review", and then changing the > status again to "Move to next CF". (Don't forget the second step; > hopefully we will have streamlined this in the near future!) Thanks. I don't insist on this patch unless some other people are interested in. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Strange behavior with polygon and NaN
As discussed in [1], we're taking this opportunity to return some patchsets that don't appear to be getting enough reviewer interest. This is not a rejection, since we don't necessarily think there's anything unacceptable about the entry, but it differs from a standard "Returned with Feedback" in that there's probably not much actionable feedback at all. Rather than code changes, what this patch needs is more community interest. You might - ask people for help with your approach, - see if there are similar patches that your code could supplement, - get interested parties to agree to review your patch in a CF, or - possibly present the functionality in a way that's easier to review overall. (Doing these things is no guarantee that there will be interest, but it's hopefully better than endlessly rebasing a patchset that is not receiving any feedback from the community.) Once you think you've built up some community support and the patchset is ready for review, you (or any interested party) can resurrect the patch entry by visiting https://commitfest.postgresql.org/38/2710/ and changing the status to "Needs Review", and then changing the status again to "Move to next CF". (Don't forget the second step; hopefully we will have streamlined this in the near future!) Thanks, --Jacob [1] https://postgr.es/m/flat/0ab66589-2f71-69b3-2002-49e821740b0d%40timescale.com
Re: Strange behavior with polygon and NaN
At Wed, 15 Dec 2021 16:20:55 +0900 (JST), Kyotaro Horiguchi wrote in > adjusted so that it treats null as false. On the way doing this, the > bug #17334 [2] and another bug raised earlier [3] are naturally fixed. That being said, even if this patch were committed to the master branch, we won't apply the whole patch set for backbranches. I guess applying the computeDistance part in the patch works. # But I found that the part contains a bugX( regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Strange behavior with polygon and NaN
On 12/21/20 3:30 AM, Kyotaro Horiguchi wrote: At Tue, 01 Dec 2020 10:03:42 -0500, Tom Lane wrote in I think it should be "needs review" now. Conflicted with some commit(s) uncertain to me. Rebased. Tom, Georgios, thoughts on the new patch? Regards, -- -David da...@pgmasters.net
Re: Strange behavior with polygon and NaN
At Tue, 01 Dec 2020 10:03:42 -0500, Tom Lane wrote in > I think it should be "needs review" now. Conflicted with some commit(s) uncertain to me. Rebased. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 26d9edeccf3f3111a7e0ff049aa6e3ba3746dce9 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Fri, 13 Nov 2020 13:31:20 +0900 Subject: [PATCH v8] fix geometric nan handling --- doc/src/sgml/func.sgml | 17 ++ src/backend/utils/adt/geo_ops.c| 337 ++--- src/include/utils/float.h | 22 ++ src/test/regress/expected/create_index.out | 2 +- src/test/regress/expected/geometry.out | 331 +--- 5 files changed, 485 insertions(+), 224 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index d5cd705eeb..2f7f3c633c 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -10855,6 +10855,23 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple + + + NaN and Infinity make geometric functions and operators behave + inconsistently. Geometric operators or functions that return a boolean + return false for operands that contain NaNs. Number-returning functions + and operators return NaN in most cases but sometimes return a valid + value if no NaNs are met while actual calculation. Object-returning ones + yield an object that contain NaNs depending to the operation. Likewise + the objects containing Infinity can make geometric operators and + functions behave inconsistently. For example (point + '(Infinity,Infinity)' - line '{-1,0,5}') is Infinity but (point + '(Infinity,Infinity)' - line '{0,-1,5}') is NaN. It can never + be a value other than these, but you should consider it uncertain how + geometric operators behave for objects containing Infinity. + + + Geometric Functions diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c index c1dc511a1a..e74bf3031e 100644 --- a/src/backend/utils/adt/geo_ops.c +++ b/src/backend/utils/adt/geo_ops.c @@ -904,9 +904,9 @@ box_intersect(PG_FUNCTION_ARGS) result = (BOX *) palloc(sizeof(BOX)); - result->high.x = float8_min(box1->high.x, box2->high.x); + result->high.x = float8_min_nan(box1->high.x, box2->high.x); result->low.x = float8_max(box1->low.x, box2->low.x); - result->high.y = float8_min(box1->high.y, box2->high.y); + result->high.y = float8_min_nan(box1->high.y, box2->high.y); result->low.y = float8_max(box1->low.y, box2->low.y); PG_RETURN_BOX_P(result); @@ -1061,15 +1061,23 @@ line_construct(LINE *result, Point *pt, float8 m) result->A = -1.0; result->B = 0.0; result->C = pt->x; + + /* Avoid creating a valid line from an invalid point */ + if (likely(!isnan(pt->x) && !isnan(pt->y))) + return; } - else if (m == 0) + else if (m == 0.0) { /* horizontal - use "y = C" */ result->A = 0.0; result->B = -1.0; result->C = pt->y; + + /* Avoid creating a valid line from an invalid point */ + if (likely(!isnan(pt->x) && !isnan(pt->y))) + return; } - else + else if (!isnan(m)) { /* use "mx - y + yinter = 0" */ result->A = m; @@ -1078,7 +1086,12 @@ line_construct(LINE *result, Point *pt, float8 m) /* on some platforms, the preceding expression tends to produce -0 */ if (result->C == 0.0) result->C = 0.0; + + if (likely(!isnan(result->C))) + return; } + + result->A = result->B = result->C = get_float8_nan(); } /* line_construct_pp() @@ -1091,6 +1104,7 @@ line_construct_pp(PG_FUNCTION_ARGS) Point *pt2 = PG_GETARG_POINT_P(1); LINE *result = (LINE *) palloc(sizeof(LINE)); + /* NaNs are considered to be equal by point_eq_point */ if (point_eq_point(pt1, pt2)) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), @@ -,8 +1125,12 @@ line_intersect(PG_FUNCTION_ARGS) { LINE *l1 = PG_GETARG_LINE_P(0); LINE *l2 = PG_GETARG_LINE_P(1); + Point xp; - PG_RETURN_BOOL(line_interpt_line(NULL, l1, l2)); + if (line_interpt_line(, l1, l2) && !isnan(xp.x) && !isnan(xp.y)) + PG_RETURN_BOOL(true); + else + PG_RETURN_BOOL(false); } Datum @@ -1130,14 +1148,17 @@ line_perp(PG_FUNCTION_ARGS) LINE *l1 = PG_GETARG_LINE_P(0); LINE *l2 = PG_GETARG_LINE_P(1); + if (unlikely(isnan(l1->C) || isnan(l2->C))) + return false; + if (FPzero(l1->A)) - PG_RETURN_BOOL(FPzero(l2->B)); + PG_RETURN_BOOL(FPzero(l2->B) && !isnan(l1->B) && !isnan(l2->A)); if (FPzero(l2->A)) - PG_RETURN_BOOL(FPzero(l1->B)); + PG_RETURN_BOOL(FPzero(l1->B) && !isnan(l2->B) && !isnan(l1->A)); if (FPzero(l1->B)) - PG_RETURN_BOOL(FPzero(l2->A)); + PG_RETURN_BOOL(FPzero(l2->A) && !isnan(l1->A) && !isnan(l2->B)); if (FPzero(l2->B)) - PG_RETURN_BOOL(FPzero(l1->A)); + PG_RETURN_BOOL(FPzero(l1->A) && !isnan(l2->A) && !isnan(l1->B)); PG_RETURN_BOOL(FPeq(float8_div(float8_mul(l1->A, l2->A),
Re: Strange behavior with polygon and NaN
Anastasia Lubennikova writes: > The commitfest is closed now and this entry is "Waiting on author". > As far as I see, part of the fixes is already committed. Is there > anything left to work on or this patch needs review/ ready for committer > now? I think it should be "needs review" now. regards, tom lane
Re: Strange behavior with polygon and NaN
On 25.11.2020 11:14, Kyotaro Horiguchi wrote: At Wed, 25 Nov 2020 11:39:39 +0900 (JST), Kyotaro Horiguchi wrote in So that line of thought prompts me to tread *very* carefully when trying to dodge NaN results. We need to be certain that we introduce only logically-defensible special cases. Something like float8_coef_mul() seems much more likely to lead us into errors than away from them. Agreed on that point. I'm going to rewirte the patch in that direction. Removed the function float8_coef_mul(). I noticed that the check you proposed to add to line_closept_point doesn't work for the following case: select line('{1,-1,0}') <-> point(1e300, 'Infinity'); Ax + By + C = 1 * 1e300 + -1 * Inf + 0 = -Inf is not NaN so we go on the following steps. derive the perpendicular line: => line(-1, -1, Inf} derive the cross point : => point(Inf, Inf) calculate the distance : => NaN (which should be Infinity) So I left the check whether distance is NaN in this version. In the previous version the check is done before directly calculating the distance, but since we already have the result of Ax+Bx+C so I decided not to use point_dt() in this version. Although I wrote that it should be wrong that applying FPzero() to coefficients, there are some places already doing that so I followed those predecessors. Reverted the change of pg_hypot(). While checking the regression results, I noticed that the follwoing calculation, which seems wrong. select line('{3,NaN,5}') = line('{3,NaN,5}'); ?column? -- t But after looking point_eq(), I decided to let the behavior alone since I'm not sure the reason for the behavior of the functions. At least the comment for point_eq() says that is the delibarate behvior. box_same, poly_same base on the point_eq_point so they behave the same way. By the way, '=' doesn't compare the shape but compares the area. However, what is the area of a line? That should be always 0 even if we considered it. And it is also strange that we don't have corresponding comparison ('<' and so) operators. It seems to me as if a mistake of '~='. If it is correct, I should revert the change of line_eq() along with fixing operator assignment. regards. Status update for a commitfest entry. The commitfest is closed now and this entry is "Waiting on author". As far as I see, part of the fixes is already committed. Is there anything left to work on or this patch needs review/ ready for committer now? -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Strange behavior with polygon and NaN
At Wed, 25 Nov 2020 11:39:39 +0900 (JST), Kyotaro Horiguchi wrote in > > So that line of thought prompts me to tread *very* carefully when > > trying to dodge NaN results. We need to be certain that we > > introduce only logically-defensible special cases. Something like > > float8_coef_mul() seems much more likely to lead us into errors > > than away from them. > > Agreed on that point. I'm going to rewirte the patch in that > direction. Removed the function float8_coef_mul(). I noticed that the check you proposed to add to line_closept_point doesn't work for the following case: select line('{1,-1,0}') <-> point(1e300, 'Infinity'); Ax + By + C = 1 * 1e300 + -1 * Inf + 0 = -Inf is not NaN so we go on the following steps. derive the perpendicular line: => line(-1, -1, Inf} derive the cross point : => point(Inf, Inf) calculate the distance : => NaN (which should be Infinity) So I left the check whether distance is NaN in this version. In the previous version the check is done before directly calculating the distance, but since we already have the result of Ax+Bx+C so I decided not to use point_dt() in this version. Although I wrote that it should be wrong that applying FPzero() to coefficients, there are some places already doing that so I followed those predecessors. Reverted the change of pg_hypot(). While checking the regression results, I noticed that the follwoing calculation, which seems wrong. select line('{3,NaN,5}') = line('{3,NaN,5}'); ?column? -- t But after looking point_eq(), I decided to let the behavior alone since I'm not sure the reason for the behavior of the functions. At least the comment for point_eq() says that is the delibarate behvior. box_same, poly_same base on the point_eq_point so they behave the same way. By the way, '=' doesn't compare the shape but compares the area. However, what is the area of a line? That should be always 0 even if we considered it. And it is also strange that we don't have corresponding comparison ('<' and so) operators. It seems to me as if a mistake of '~='. If it is correct, I should revert the change of line_eq() along with fixing operator assignment. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 4f8f8605afcd97571bc891bff17b63c963e27fb5 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Fri, 13 Nov 2020 13:31:20 +0900 Subject: [PATCH v7] fix geometric nan handling --- doc/src/sgml/func.sgml | 17 ++ src/backend/utils/adt/geo_ops.c| 337 ++--- src/include/utils/float.h | 22 ++ src/test/regress/expected/create_index.out | 2 +- src/test/regress/expected/geometry.out | 331 +--- 5 files changed, 485 insertions(+), 224 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 507bc1a668..dc9fe52174 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -10859,6 +10859,23 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple + + + NaN and Infinity make geometric functions and operators behave + inconsistently. Geometric operators or functions that return a boolean + return false for operands that contain NaNs. Number-returning functions + and operators return NaN in most cases but sometimes return a valid + value if no NaNs are met while actual calculation. Object-returning ones + yield an object that contain NaNs depending to the operation. Likewise + the objects containing Infinity can make geometric operators and + functions behave inconsistently. For example (point + '(Infinity,Infinity)' - line '{-1,0,5}') is Infinity but (point + '(Infinity,Infinity)' - line '{0,-1,5}') is NaN. It can never + be a value other than these, but you should consider it uncertain how + geometric operators behave for objects containing Infinity. + + + Geometric Functions diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c index c1dc511a1a..e74bf3031e 100644 --- a/src/backend/utils/adt/geo_ops.c +++ b/src/backend/utils/adt/geo_ops.c @@ -904,9 +904,9 @@ box_intersect(PG_FUNCTION_ARGS) result = (BOX *) palloc(sizeof(BOX)); - result->high.x = float8_min(box1->high.x, box2->high.x); + result->high.x = float8_min_nan(box1->high.x, box2->high.x); result->low.x = float8_max(box1->low.x, box2->low.x); - result->high.y = float8_min(box1->high.y, box2->high.y); + result->high.y = float8_min_nan(box1->high.y, box2->high.y); result->low.y = float8_max(box1->low.y, box2->low.y); PG_RETURN_BOX_P(result); @@ -1061,15 +1061,23 @@ line_construct(LINE *result, Point *pt, float8 m) result->A = -1.0; result->B = 0.0; result->C = pt->x; + + /* Avoid creating a valid line from an invalid point */ + if (likely(!isnan(pt->x) && !isnan(pt->y))) + return; } - else if (m == 0) + else if (m ==
Re: Strange behavior with polygon and NaN
(My mailer seems to have recovered from unresponsiveness.) At Tue, 24 Nov 2020 12:29:41 -0500, Tom Lane wrote in > Kyotaro Horiguchi writes: > > At Fri, 20 Nov 2020 15:57:46 -0500, Tom Lane wrote in > >> I don't much like anything about float8_coef_mul(). > > > I have the same feeling on the function, but I concluded that > > coefficients and coordinates should be regarded as different things in > > the practical standpoint. > > > For example, consider Ax + By + C == 0, if B is 0.0, we can remove the > > second term from the equation, regardless of the value of y, of course > > even if it were inf. that is, The function imitates that kind of > > removals. > > Meh --- I can see where you're going with that, but I don't much like it. > I fear that it's as likely to introduce weird behaviors as remove any. > > The core of the issue in > > > | postgres=# select point(1e+300, 'Infinity') <-> line('{1,0,5}'); > > | ?column? > > | -- > > | NaN > > is that we generate the line y = Inf: > > (gdb) p tmp > $1 = {A = 0, B = -1, C = inf} > > and then try to find the intersection with {1,0,5} (x = -5), but that > calculation involves 0 * Inf so we get NaNs. It seems reasonable that > the intersection should be (-5,Inf), but I don't think we should try > to force the normal calculation to produce that. I think we'd be > better off to explicitly special-case vertical and/or horizontal lines > in line_interpt_line. I don't object to have explicit special case for vertical lines since it is clear than embedding such a function in the formula, but it seems equivalent to what the function is doing, that is, treating inf * 0.0 as 0.0 in some special cases. # And after rethinking, the FPzero() used in the function is wrong # since the macro (function) is expected to be applied to coordinates, # not to coefficients. > Actually though ... even if we successfully got that intersection > point, we'd still end up with a NaN distance between (1e300,Inf) and > (-5,Inf), on account of Inf - Inf being NaN. I think this is correct > and we'd be ill-advised to try to force it to be something else. > Although we pretend that two Infs are equal for purposes such as > sorting, they aren't really, so we should not assume that their > difference is zero. The definition "inf == inf" comes from some practical reasons uncertain to me, and actually inf - inf yields NaN in IEEE 754. However, aren't we going to assume a line on which B is exactly 0.0 as a completely vertical line? Thus things are slightiy different from the IEEE's definition. The "Inf" as the y-coord of the perpendicular foot is actually "the same y-coord with the point". So what we should do on our definition for the calculation is: perp-foot (line {1,0,5}, point(1e300, Inf)) => point(-5, ) distance (point(1e300, Inf), point(-5, )) => 1e300 (+5) This is what the code below is doing: + return float8_div(fabs(float8_pl( + float8_pl( + float8_coef_mul(line->A, point->x, false), + float8_coef_mul(line->B, point->y, false)), + line->C)), + HYPOT(line->A, line->B)); > So that line of thought prompts me to tread *very* carefully when > trying to dodge NaN results. We need to be certain that we > introduce only logically-defensible special cases. Something like > float8_coef_mul() seems much more likely to lead us into errors > than away from them. Agreed on that point. I'm going to rewirte the patch in that direction. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Strange behavior with polygon and NaN
Kyotaro Horiguchi writes: > At Fri, 20 Nov 2020 15:57:46 -0500, Tom Lane wrote in >> I don't much like anything about float8_coef_mul(). > I have the same feeling on the function, but I concluded that > coefficients and coordinates should be regarded as different things in > the practical standpoint. > For example, consider Ax + By + C == 0, if B is 0.0, we can remove the > second term from the equation, regardless of the value of y, of course > even if it were inf. that is, The function imitates that kind of > removals. Meh --- I can see where you're going with that, but I don't much like it. I fear that it's as likely to introduce weird behaviors as remove any. The core of the issue in > | postgres=# select point(1e+300, 'Infinity') <-> line('{1,0,5}'); > | ?column? > | -- > | NaN is that we generate the line y = Inf: (gdb) p tmp $1 = {A = 0, B = -1, C = inf} and then try to find the intersection with {1,0,5} (x = -5), but that calculation involves 0 * Inf so we get NaNs. It seems reasonable that the intersection should be (-5,Inf), but I don't think we should try to force the normal calculation to produce that. I think we'd be better off to explicitly special-case vertical and/or horizontal lines in line_interpt_line. Actually though ... even if we successfully got that intersection point, we'd still end up with a NaN distance between (1e300,Inf) and (-5,Inf), on account of Inf - Inf being NaN. I think this is correct and we'd be ill-advised to try to force it to be something else. Although we pretend that two Infs are equal for purposes such as sorting, they aren't really, so we should not assume that their difference is zero. So that line of thought prompts me to tread *very* carefully when trying to dodge NaN results. We need to be certain that we introduce only logically-defensible special cases. Something like float8_coef_mul() seems much more likely to lead us into errors than away from them. regards, tom lane
Re: Strange behavior with polygon and NaN
At Fri, 20 Nov 2020 15:57:46 -0500, Tom Lane wrote in > I spent some more time looking at this patch. > > Some experimentation shows that the changes around bounding box > calculation (ie float8_min_nan() and its call sites) seem to be > completely pointless: removing them doesn't change any of the regression > results. Nor does using float8_min_nan() in the other two bounding-box > calculations I'd asked about. So I think we should just drop that set > of changes and stick with the rule that bounding box upper and lower > values are sorted as per float.h comparison rules. This isn't that hard > to work with: if you want to know whether any NaNs are in the box, test > the upper limit (and *not* the lower limit) for isnan(). Moreover, even > if we wanted a different coding rule, we really can't have it because we > will still need to work with existing on-disk values that have bounding > boxes computed the old way. Actually that changes the result since that code gives a shortcut of checking NaNs in the object coordinates. I don't think that the it is pointless to avoid full calculations that are performed only to find NaNs are involved, if bounding box check is meaningful. > I don't much like anything about float8_coef_mul(). In the first place, > FPzero() is an ugly, badly designed condition that we should be trying > to get rid of not add more dependencies on. In the second place, it's > really unclear why allowing 0 times Inf to be something other than NaN > is a good idea, and it's even less clear why allowing small-but-not-zero > times Inf to be zero rather than Inf is a good idea. In the third > place, the asymmetry between the two inputs looks more like a bug than > something we should actually want. I have the same feeling on the function, but I concluded that coefficients and coordinates should be regarded as different things in the practical standpoint. For example, consider Ax + By + C == 0, if B is 0.0, we can remove the second term from the equation, regardless of the value of y, of course even if it were inf. that is, The function imitates that kind of removals. > After some time spent staring at the specific case of line_closept_point > and its callers, I decided that the real problems there are twofold. > First, the API, or at least the callers' interpretation of this > undocumented point, is that whether the distance is undefined (NaN) is > equivalent to whether the closest point is undefined. This is not so; > in some cases we know that the distance is infinite even though we can't > calculate a unique closest point. Second, it's not making any attempt > to eliminate undefined cases up front. We can do that pretty easily > by plugging the point's coordinates into the line's equation Ax+By+C > and seeing whether we get a NaN. The attached 0002 is a subset patch > that just fixes these two issues, and I like the results it produces. Actually the code reacts to some "problem" cases in a "wrong" way: +* If it is unclear whether the point is on the line or not, then the +* results are ill-defined. This eliminates cases where any of the given +* coordinates are NaN, as well as cases where infinite coordinates give +* rise to Inf - Inf, 0 * Inf, etc. +*/ + if (unlikely(isnan(float8_pl(float8_pl(float8_mul(line->A, point->x), + float8_mul(line->B, point->y)), +line->C | postgres=# select point(1e+300, 'Infinity') <-> line('{1,0,5}'); | ?column? | -- | NaN Aren't our guts telling that is 1e+300? You might be thinking to put some special case handling into that path (as mentioned below?), but otherwise it yeildsa "wrong" result. The reason for the expectation is that we assume that "completely vertical" lines have a constant x value regardless of the y coordinate. That is the reason for the float8_coef_mul() function. > I wonder now whether the problems around line_interpt_line() and the > other intersection-ish functions wouldn't be better handled in a similar > way, by making changes to their API specs to be clearer about what > happens with NaNs and trying to eliminate ill-defined cases explicitly. > I've not tried to code that though. One of the "ill-defined" cases is the zero-coefficient issue. The asymmetric multiply function "fixes" it, at least. Of course it could be open-coded instead of being as a function that looks as if having some general interpretation. > Changing pg_hypot() the way you've done here is right out. See the > comment for the function: what it is doing now is per all the relevant > standards, and your patch breaks that. It's extremely unlikely that > doing it differently from IEEE and POSIX is a good idea. Mmm. Ok, I agree to that. > Attached are the same old 0001 (adding the extra point_tbl entry) > and a small 0002 that
Re: Strange behavior with polygon and NaN
At Sat, 21 Nov 2020 17:33:53 -0500, Tom Lane wrote in > I went ahead and pushed 0001 and 0003 (the latter in two parts), since > they didn't seem particularly controversial to me. Just to keep the > cfbot from whining, here's a rebased version of 0002. I didn't noticed that inf == inf sould be true (in IEEE754). # (inf - inf == 0) => false but (inf == inf + 0) == false is somewhat # uneasy but, yes, it's the standare we are basing on. So, I agree that the changes of line_construct() and line_(inv)sl() looks good to me. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Strange behavior with polygon and NaN
I went ahead and pushed 0001 and 0003 (the latter in two parts), since they didn't seem particularly controversial to me. Just to keep the cfbot from whining, here's a rebased version of 0002. regards, tom lane diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c index c1dc511a1a..d9f577bd8b 100644 --- a/src/backend/utils/adt/geo_ops.c +++ b/src/backend/utils/adt/geo_ops.c @@ -2719,9 +2719,14 @@ lseg_interpt_line(Point *result, LSEG *lseg, LINE *line) *---*/ /* - * If *result is not NULL, it is set to the intersection point of a - * perpendicular of the line through the point. Returns the distance - * of those two points. + * Determine the closest point on the given line to the given point. + * If result is not NULL, *result is set to the coordinates of that point. + * In any case, returns the distance from the point to the line. + * Returns NaN for any ill-defined value. + * + * NOTE: in some cases the distance is well defined (i.e., infinity) + * even though the specific closest point is not. Hence, if you care + * about whether the point is well-defined, check its coordinates for NaNs. */ static float8 line_closept_point(Point *result, LINE *line, Point *point) @@ -2729,17 +2734,30 @@ line_closept_point(Point *result, LINE *line, Point *point) Point closept; LINE tmp; + /* + * If it is unclear whether the point is on the line or not, then the + * results are ill-defined. This eliminates cases where any of the given + * coordinates are NaN, as well as cases where infinite coordinates give + * rise to Inf - Inf, 0 * Inf, etc. + */ + if (unlikely(isnan(float8_pl(float8_pl(float8_mul(line->A, point->x), + float8_mul(line->B, point->y)), + line->C + { + if (result != NULL) + result->x = result->y = get_float8_nan(); + return get_float8_nan(); + } + /* * We drop a perpendicular to find the intersection point. Ordinarily we - * should always find it, but that can fail in the presence of NaN - * coordinates, and perhaps even from simple roundoff issues. + * should always find it, but perhaps roundoff issues could prevent that. */ line_construct(, point, line_invsl(line)); if (!line_interpt_line(, , line)) { if (result != NULL) - *result = *point; - + result->x = result->y = get_float8_nan(); return get_float8_nan(); } @@ -2758,7 +2776,9 @@ close_pl(PG_FUNCTION_ARGS) result = (Point *) palloc(sizeof(Point)); - if (isnan(line_closept_point(result, line, pt))) + (void) line_closept_point(result, line, pt); + + if (unlikely(isnan(result->x) || isnan(result->y))) PG_RETURN_NULL(); PG_RETURN_POINT_P(result); diff --git a/src/test/regress/expected/geometry.out b/src/test/regress/expected/geometry.out index c4f853ae9f..6210075bc1 100644 --- a/src/test/regress/expected/geometry.out +++ b/src/test/regress/expected/geometry.out @@ -1070,9 +1070,9 @@ SELECT p.f1, l.s, p.f1 ## l.s FROM POINT_TBL p, LINE_TBL l; (1e+300,Infinity) | {0,-1,5} | (1e+300,5) (1e+300,Infinity) | {1,0,5} | (1e+300,Infinity) | {0,3,0} | (1e+300,0) - (1e+300,Infinity) | {1,-1,0} | (Infinity,NaN) - (1e+300,Infinity) | {-0.4,-1,-6} | (-Infinity,NaN) - (1e+300,Infinity) | {-0.000184615384615,-1,15.3846153846} | (-Infinity,NaN) + (1e+300,Infinity) | {1,-1,0} | + (1e+300,Infinity) | {-0.4,-1,-6} | + (1e+300,Infinity) | {-0.000184615384615,-1,15.3846153846} | (1e+300,Infinity) | {3,NaN,5} | (1e+300,Infinity) | {NaN,NaN,NaN} | (1e+300,Infinity) | {0,-1,3} | (1e+300,3)
Re: Strange behavior with polygon and NaN
Further to this ... I realized after looking at things some more that one of line_closept_point's issues is really a bug in line_construct: it fails to draw a horizontal line through a point with x = Inf, though surely that's not particularly ill-defined. The reason is that somebody thought they could dispense with a special case for m == 0, but then we end up doing result->C = float8_mi(pt->y, float8_mul(m, pt->x)); and if m = 0 and pt->x = Inf, we get NaN. It also annoyed me that the code was still using DBL_MAX instead of a true Inf to represent infinite slope. That's sort of okay as long as it's just a communication mechanism between line_construct and places like line_sl, but it's not really okay, because in some places you can get a true infinity from a slope calculation. Thus in HEAD you get different results from regression=# select line(point(1,2),point(1,'inf')); line -- {-1,0,1} (1 row) regression=# select line(point(1,2),point(4,'inf')); line - {Infinity,-1,-Infinity} (1 row) which is completely silly: we ought to "round off" that infinitesimal slope to a true vertical, rather than producing a line representation we can't do anything with. So I fixed that too, but then I got a weird regression test diff: the case of lseg '[(-10,2),(-10,3)]' ?|| lseg '[(-10,2),(-10,3)]' was no longer returning true. The reason turned out to be that lseg_parallel does PG_RETURN_BOOL(FPeq(lseg_sl(l1), lseg_sl(l2))); and now lseg_sl is returning true infinities for vertical lines, and FPeq *gets the wrong answer* when asked to compare Inf to Inf. It should say equal, surely, but internally it computes a NaN and ends up with false. So the attached 0003 patch also fixes FPeq() and friends to give sane answers for Inf-vs-Inf comparisons. That part seems like a fairly fundamental bug fix, and so I feel like we ought to go ahead and apply it before we do too much more messing with the logic in this area. (Note that the apparently-large results diff in 0003 is mostly a whitespace change: the first hunk just reflects slopes coming out as Infinity not DBL_MAX.) I'm reposting 0001 and 0002 just to keep the cfbot happy, they're the same as in my previous message. regards, tom lane diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out index 93a8736a3f..76679bae8d 100644 --- a/src/test/regress/expected/create_index.out +++ b/src/test/regress/expected/create_index.out @@ -157,7 +157,7 @@ SELECT count(*) FROM point_tbl p WHERE p.f1 << '(0.0, 0.0)'; SELECT count(*) FROM point_tbl p WHERE p.f1 >> '(0.0, 0.0)'; count --- - 3 + 4 (1 row) SELECT count(*) FROM point_tbl p WHERE p.f1 <^ '(0.0, 0.0)'; @@ -169,7 +169,7 @@ SELECT count(*) FROM point_tbl p WHERE p.f1 <^ '(0.0, 0.0)'; SELECT count(*) FROM point_tbl p WHERE p.f1 >^ '(0.0, 0.0)'; count --- - 4 + 5 (1 row) SELECT count(*) FROM point_tbl p WHERE p.f1 ~= '(-5, -12)'; @@ -188,10 +188,11 @@ SELECT * FROM point_tbl ORDER BY f1 <-> '0,1'; (10,10) (-5,-12) (5.1,34.5) + (Infinity,1e+300) (1e+300,Infinity) (NaN,NaN) -(10 rows) +(11 rows) SELECT * FROM point_tbl WHERE f1 IS NULL; f1 @@ -202,16 +203,17 @@ SELECT * FROM point_tbl WHERE f1 IS NULL; SELECT * FROM point_tbl WHERE f1 IS NOT NULL ORDER BY f1 <-> '0,1'; f1 --- - (1e-300,-1e-300) (0,0) + (1e-300,-1e-300) (-3,4) (-10,0) (10,10) (-5,-12) (5.1,34.5) (1e+300,Infinity) + (Infinity,1e+300) (NaN,NaN) -(9 rows) +(10 rows) SELECT * FROM point_tbl WHERE f1 <@ '(-10,-10),(10,10)':: box ORDER BY f1 <-> '0,1'; f1 @@ -464,7 +466,7 @@ SELECT count(*) FROM point_tbl p WHERE p.f1 >> '(0.0, 0.0)'; SELECT count(*) FROM point_tbl p WHERE p.f1 >> '(0.0, 0.0)'; count --- - 3 + 4 (1 row) EXPLAIN (COSTS OFF) @@ -494,7 +496,7 @@ SELECT count(*) FROM point_tbl p WHERE p.f1 >^ '(0.0, 0.0)'; SELECT count(*) FROM point_tbl p WHERE p.f1 >^ '(0.0, 0.0)'; count --- - 4 + 5 (1 row) EXPLAIN (COSTS OFF) @@ -530,10 +532,11 @@ SELECT * FROM point_tbl ORDER BY f1 <-> '0,1'; (10,10) (-5,-12) (5.1,34.5) + (Infinity,1e+300) (1e+300,Infinity) (NaN,NaN) -(10 rows) +(11 rows) EXPLAIN (COSTS OFF) SELECT * FROM point_tbl WHERE f1 IS NULL; @@ -568,9 +571,10 @@ SELECT * FROM point_tbl WHERE f1 IS NOT NULL ORDER BY f1 <-> '0,1'; (10,10) (-5,-12) (5.1,34.5) + (Infinity,1e+300) (1e+300,Infinity) (NaN,NaN) -(9 rows) +(10 rows) EXPLAIN (COSTS OFF) SELECT * FROM point_tbl WHERE f1 <@ '(-10,-10),(10,10)':: box ORDER BY f1 <-> '0,1'; diff --git a/src/test/regress/expected/geometry.out b/src/test/regress/expected/geometry.out index 5b9d37030f..81202a4c33 100644 --- a/src/test/regress/expected/geometry.out +++ b/src/test/regress/expected/geometry.out @@ -120,6 +120,7 @@ SELECT p1.f1, p2.f1, slope(p1.f1,
Re: Strange behavior with polygon and NaN
I spent some more time looking at this patch. Some experimentation shows that the changes around bounding box calculation (ie float8_min_nan() and its call sites) seem to be completely pointless: removing them doesn't change any of the regression results. Nor does using float8_min_nan() in the other two bounding-box calculations I'd asked about. So I think we should just drop that set of changes and stick with the rule that bounding box upper and lower values are sorted as per float.h comparison rules. This isn't that hard to work with: if you want to know whether any NaNs are in the box, test the upper limit (and *not* the lower limit) for isnan(). Moreover, even if we wanted a different coding rule, we really can't have it because we will still need to work with existing on-disk values that have bounding boxes computed the old way. I don't much like anything about float8_coef_mul(). In the first place, FPzero() is an ugly, badly designed condition that we should be trying to get rid of not add more dependencies on. In the second place, it's really unclear why allowing 0 times Inf to be something other than NaN is a good idea, and it's even less clear why allowing small-but-not-zero times Inf to be zero rather than Inf is a good idea. In the third place, the asymmetry between the two inputs looks more like a bug than something we should actually want. After some time spent staring at the specific case of line_closept_point and its callers, I decided that the real problems there are twofold. First, the API, or at least the callers' interpretation of this undocumented point, is that whether the distance is undefined (NaN) is equivalent to whether the closest point is undefined. This is not so; in some cases we know that the distance is infinite even though we can't calculate a unique closest point. Second, it's not making any attempt to eliminate undefined cases up front. We can do that pretty easily by plugging the point's coordinates into the line's equation Ax+By+C and seeing whether we get a NaN. The attached 0002 is a subset patch that just fixes these two issues, and I like the results it produces. I wonder now whether the problems around line_interpt_line() and the other intersection-ish functions wouldn't be better handled in a similar way, by making changes to their API specs to be clearer about what happens with NaNs and trying to eliminate ill-defined cases explicitly. I've not tried to code that though. Changing pg_hypot() the way you've done here is right out. See the comment for the function: what it is doing now is per all the relevant standards, and your patch breaks that. It's extremely unlikely that doing it differently from IEEE and POSIX is a good idea. Attached are the same old 0001 (adding the extra point_tbl entry) and a small 0002 that fixes just line_closept_point. I've not tried to figure out just which of the rest of your diffs should be dropped given that. I did note though that the example you add to func.sgml doesn't apply to this version of line_closept_point: regression=# select point '(Infinity,Infinity)' <-> line '{-1,0,5}'; ?column? -- NaN (1 row) regression=# select point '(Infinity,Infinity)' <-> line '{0,-1,5}'; ?column? -- NaN (1 row) regards, tom lane diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out index 93a8736a3f..76679bae8d 100644 --- a/src/test/regress/expected/create_index.out +++ b/src/test/regress/expected/create_index.out @@ -157,7 +157,7 @@ SELECT count(*) FROM point_tbl p WHERE p.f1 << '(0.0, 0.0)'; SELECT count(*) FROM point_tbl p WHERE p.f1 >> '(0.0, 0.0)'; count --- - 3 + 4 (1 row) SELECT count(*) FROM point_tbl p WHERE p.f1 <^ '(0.0, 0.0)'; @@ -169,7 +169,7 @@ SELECT count(*) FROM point_tbl p WHERE p.f1 <^ '(0.0, 0.0)'; SELECT count(*) FROM point_tbl p WHERE p.f1 >^ '(0.0, 0.0)'; count --- - 4 + 5 (1 row) SELECT count(*) FROM point_tbl p WHERE p.f1 ~= '(-5, -12)'; @@ -188,10 +188,11 @@ SELECT * FROM point_tbl ORDER BY f1 <-> '0,1'; (10,10) (-5,-12) (5.1,34.5) + (Infinity,1e+300) (1e+300,Infinity) (NaN,NaN) -(10 rows) +(11 rows) SELECT * FROM point_tbl WHERE f1 IS NULL; f1 @@ -202,16 +203,17 @@ SELECT * FROM point_tbl WHERE f1 IS NULL; SELECT * FROM point_tbl WHERE f1 IS NOT NULL ORDER BY f1 <-> '0,1'; f1 --- - (1e-300,-1e-300) (0,0) + (1e-300,-1e-300) (-3,4) (-10,0) (10,10) (-5,-12) (5.1,34.5) (1e+300,Infinity) + (Infinity,1e+300) (NaN,NaN) -(9 rows) +(10 rows) SELECT * FROM point_tbl WHERE f1 <@ '(-10,-10),(10,10)':: box ORDER BY f1 <-> '0,1'; f1 @@ -464,7 +466,7 @@ SELECT count(*) FROM point_tbl p WHERE p.f1 >> '(0.0, 0.0)'; SELECT count(*) FROM point_tbl p WHERE p.f1 >> '(0.0, 0.0)'; count --- - 3 + 4 (1 row) EXPLAIN (COSTS OFF) @@ -494,7 +496,7 @@ SELECT count(*)
Re: Strange behavior with polygon and NaN
Kyotaro Horiguchi writes: > At Tue, 10 Nov 2020 14:30:08 -0500, Tom Lane wrote in >> For instance, {1,-1,0} is the line "x = y". We could argue about >> whether it'd be sensible to return zero for the distance between that >> and the point (inf,inf), but surely any point with one inf and one >> finite coordinate must be an infinite distance away from that line. >> There's nothing ill-defined about that situation. > Mmm... (swinging my arms to mimic lines..) > dist(x = y, (1e300, Inf)) looks indeterminant to me.. Well, what you're showing is that we get an internal overflow, essentially, on the way to calculating the result. Which is true, so it's sort of accidental that we got a sensible result before. Nonetheless, we *did* get a sensible result, so producing NaN instead seems like a regression. We might need to introduce special-case handling to protect the low-level calculations from ever seeing NaN or Inf in their inputs. Getting the right answer to "just fall out" of those calculations might be an unreasonable hope. For example, for a line with positive slope (A and B of opposite signs), I think that the right answer for points (Inf,Inf) and (-Inf,-Inf) should be NaN, on much the same grounds that Inf minus Inf is NaN not zero. But all other points involving any Inf coordinates are clearly an infinite distance away from that line. regards, tom lane
Re: Strange behavior with polygon and NaN
At Fri, 13 Nov 2020 15:35:58 +0900 (JST), Kyotaro Horiguchi wrote in > Thank you for the review, Georgios and Tom. > > At Tue, 10 Nov 2020 14:30:08 -0500, Tom Lane wrote in > > I spent some time looking this over, and have a few thoughts: > > > > 1. I think it's useful to split the test changes into two patches, > > as I've done below: first, just add the additional row in point_tbl > > and let the fallout from that happen, and then in the second patch > > make the code changes. This way, it's much clearer what the actual > > behavioral changes are. Some of them don't look right, either. > > For instance, in the very first hunk in geometry.out, we have > > this: > > > > - (Infinity,1e+300) | {1,0,5} | > > NaN |NaN > > + (Infinity,1e+300) | {1,0,5} | > > Infinity | Infinity > > > > which seems right, and also this: > > For example, ('Infinity', 1e300) <-> {1,0,5}, that is: > >line "x = -5" <-> point(1e300, Inf) > > So sqrt((1e300 - 5)^2 + Inf^2) = Inf, which looks right. ??! Correction: It's sqrt((1e300 - 5)^2 + 0^2) = Inf, which looks right. reagrds. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Strange behavior with polygon and NaN
Thank you for the review, Georgios and Tom. At Tue, 10 Nov 2020 14:30:08 -0500, Tom Lane wrote in > I spent some time looking this over, and have a few thoughts: > > 1. I think it's useful to split the test changes into two patches, > as I've done below: first, just add the additional row in point_tbl > and let the fallout from that happen, and then in the second patch > make the code changes. This way, it's much clearer what the actual > behavioral changes are. Some of them don't look right, either. > For instance, in the very first hunk in geometry.out, we have > this: > > - (Infinity,1e+300) | {1,0,5} | > NaN |NaN > + (Infinity,1e+300) | {1,0,5} | > Infinity | Infinity > > which seems right, and also this: For example, ('Infinity', 1e300) <-> {1,0,5}, that is: line "x = -5" <-> point(1e300, Inf) So sqrt((1e300 - 5)^2 + Inf^2) = Inf, which looks right. > - (1e+300,Infinity) | {1,-1,0} | > Infinity | Infinity > - (1e+300,Infinity) | {-0.4,-1,-6} | > Infinity | Infinity > - (1e+300,Infinity) | {-0.000184615384615,-1,15.3846153846} | > Infinity | Infinity > + (1e+300,Infinity) | {1,-1,0} | > NaN |NaN > + (1e+300,Infinity) | {-0.4,-1,-6} | > NaN |NaN > + (1e+300,Infinity) | {-0.000184615384615,-1,15.3846153846} | > NaN |NaN > > which does not. Why aren't these distances infinite as well? > > For instance, {1,-1,0} is the line "x = y". We could argue about > whether it'd be sensible to return zero for the distance between that > and the point (inf,inf), but surely any point with one inf and one > finite coordinate must be an infinite distance away from that line. > There's nothing ill-defined about that situation. Mmm... (swinging my arms to mimic lines..) dist(x = y, (1e300, Inf)) looks indeterminant to me.. The calcuation is performed in the following steps. 1. construct the perpendicular line for the line. perpine(1e300, 'Infinity') => {-1, -1, Inf} 2. calculate the cross point. corsspoint({-1, -1, Inf}, {1,-1,0}) => (Inf, NaN) 3. calculte the distance from the crosspoint to the point. point_dt((Inf, NaN), (1e300, 'Infinity')) = HYPOT(Inf - 1e300, NaN - Inf); = HYPOT(Inf, NaN); 4. HYPOT changed the behavior by the patch Before: HYPOT(Inf, NaN) = Inf After : HYPOT(Inf, NaN) = NaN- Result A So if we will "fix" that, we should fix any, some, or all of 1-3. 1. seems to have no other way than the result. 2. crosspoint (x = - y + Inf, x = y) could be (Inf, Inf)? 3.point_dt((Inf, Inf), (1e300, Inf)) = HYPOT(Inf - 1e300, Inf - Inf) = HYPOT(Inf, -NaN) = NaN. - Result B I'm not sure why Inf - Inf is negative, but |Inf-Inf| = NaN is reasonable. That is, we don't get a "reasonable" result this way. The formula for the distance((x0,y0) - (ax + by + c = 0)) is |ax0 + by0 + c|/sqrt(a^2 + b^2) where a = -1, b = -1, c = Inf, x0 = 1e300, y0 = Inf, abs(-1 * 1e300 + -1 * Inf + Inf) / sqrt(1 + 1) = abs(-1e300 -Inf + Inf) / C = NaN. - Result C All of the Result A - C is NaN. At last NaN looks to be the right result. By the way that the formula is far simple than what we are doing now. Is there any reason to take the above steps for the calculation? > 2. Rather than coding around undesirable behavior of float8_min, > it seems like it's better to add a primitive to float.h that > does what you want, ie "NaN if either input is NaN, else the > smaller input". This is more readable, and possibly more efficient > (depending on whether the compiler is smart enough to optimize > away redundant isnan checks). I did that in the attached. Sounds reasonable. I found that I forgot to do the same thing to y coordinate. > 3. Looking for other calls of float8_min, I wonder why you did not > touch the bounding-box calculations in box_interpt_lseg() or > boxes_bound_box(). While doing that, I didn't make changes just by looking a code locally since I thought that that might be looked as overdone. Maybe, for example box_interpt_lseg, even if bounding-box check overlooked NaNs, I thought that the following calcualaions reflect any involved NaNs to the result. (But I'm not confident that that is perfect, though..) > 4. The line changes feel a bit weird, like there's no clear idea > of what a "valid" or "invalid" line is. For instance the first > hunk in line_construct(): > > + /* Avoid creating a valid line from an invalid point */ > + if (unlikely(isnan(pt->y))) > + result->C = get_float8_nan(); > > Why's it appropriate to set C and only C to NaN? Not limited to here,
Re: Strange behavior with polygon and NaN
I spent some time looking this over, and have a few thoughts: 1. I think it's useful to split the test changes into two patches, as I've done below: first, just add the additional row in point_tbl and let the fallout from that happen, and then in the second patch make the code changes. This way, it's much clearer what the actual behavioral changes are. Some of them don't look right, either. For instance, in the very first hunk in geometry.out, we have this: - (Infinity,1e+300) | {1,0,5} | NaN |NaN + (Infinity,1e+300) | {1,0,5} | Infinity | Infinity which seems right, and also this: - (1e+300,Infinity) | {1,-1,0} | Infinity | Infinity - (1e+300,Infinity) | {-0.4,-1,-6} | Infinity | Infinity - (1e+300,Infinity) | {-0.000184615384615,-1,15.3846153846} | Infinity | Infinity + (1e+300,Infinity) | {1,-1,0} | NaN |NaN + (1e+300,Infinity) | {-0.4,-1,-6} | NaN |NaN + (1e+300,Infinity) | {-0.000184615384615,-1,15.3846153846} | NaN |NaN which does not. Why aren't these distances infinite as well? For instance, {1,-1,0} is the line "x = y". We could argue about whether it'd be sensible to return zero for the distance between that and the point (inf,inf), but surely any point with one inf and one finite coordinate must be an infinite distance away from that line. There's nothing ill-defined about that situation. 2. Rather than coding around undesirable behavior of float8_min, it seems like it's better to add a primitive to float.h that does what you want, ie "NaN if either input is NaN, else the smaller input". This is more readable, and possibly more efficient (depending on whether the compiler is smart enough to optimize away redundant isnan checks). I did that in the attached. 3. Looking for other calls of float8_min, I wonder why you did not touch the bounding-box calculations in box_interpt_lseg() or boxes_bound_box(). 4. The line changes feel a bit weird, like there's no clear idea of what a "valid" or "invalid" line is. For instance the first hunk in line_construct(): + /* Avoid creating a valid line from an invalid point */ + if (unlikely(isnan(pt->y))) + result->C = get_float8_nan(); Why's it appropriate to set C and only C to NaN? 5. But actually there's a bigger issue with that particular hunk. This code branch is dealing with "draw a vertical line through this point", so why should we care what the point's y coordinate is --- that is, why is this particular change appropriate at all? The usual rule as I understand it is that if a function's result is determined by some of its arguments independently of what another argument's value is, then it doesn't matter if that one is NaN, you can still return the same result. 6. I'm a bit uncomfortable with the use of "bool isnan" in a couple of places. I think it's confusing to use that alongside the isnan() macro. Moreover, it's at least possible that some platforms implement isnan() in a way that would break this usage. The C spec specifically says that isnan() is a macro not a function ... but it doesn't commit to it being a macro-with-arguments. I think "anynan" or something like that would be a better choice of name. [ a bit later... ] Indeed, I get a compile failure on gaur: geo_ops.c: In function 'lseg_closept_lseg': geo_ops.c:2906:17: error: called object 'isnan' is not a function geo_ops.c:2906:32: error: called object 'isnan' is not a function geo_ops.c:2916:16: error: called object 'isnan' is not a function geo_ops.c:2924:16: error: called object 'isnan' is not a function geo_ops.c: In function 'box_closept_point': geo_ops.c:2989:16: error: called object 'isnan' is not a function geo_ops.c:2992:16: error: called object 'isnan' is not a function geo_ops.c:3004:16: error: called object 'isnan' is not a function geo_ops.c:3014:16: error: called object 'isnan' is not a function make: *** [geo_ops.o] Error 1 So that scenario isn't hypothetical. Please rename the variables. regards, tom lane diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out index 93a8736a3f..76679bae8d 100644 --- a/src/test/regress/expected/create_index.out +++ b/src/test/regress/expected/create_index.out @@ -157,7 +157,7 @@ SELECT count(*) FROM point_tbl p WHERE p.f1 << '(0.0, 0.0)'; SELECT count(*) FROM point_tbl p WHERE p.f1 >> '(0.0, 0.0)'; count --- - 3 + 4 (1 row) SELECT count(*) FROM point_tbl p WHERE p.f1 <^ '(0.0, 0.0)'; @@ -169,7 +169,7 @@ SELECT count(*) FROM point_tbl p WHERE p.f1 <^ '(0.0, 0.0)'; SELECT count(*) FROM point_tbl p WHERE
Re: Strange behavior with polygon and NaN
At Mon, 02 Nov 2020 14:43:32 +, Georgios Kokolatos wrote in > Hi, > > apologies for the very, very late reply to your fixes. > > You have answered/addressed all my questions concerns. The added documentation > reads well, at least to a non native English speaker. > > The patch still applies and as far as I can see the tests are passing. > > It gets my :+1: and I am changing the status to "Ready for Committer". > > For what little is worth, I learned a lot from this patch, thank you. > > Cheers, > Georgios > > The new status of this patch is: Ready for Committer Oh! Thanks. Since a part of this patch is committed (Thanks to Tom.) this is a rebased version on the commit. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 1005fb2d992c115a5881892d273fed865a6c73d4 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Thu, 5 Nov 2020 14:01:16 +0900 Subject: [PATCH v3] Fix NaN handling of some geometric operators and functions Some geometric operators shows somewhat odd behavior comes from NaN handling and that leads to inconsistency between index scan and seq scan on geometric conditions. For example: point '(NaN,NaN)' <-> polygon '((0,0),(0,1),(1,1))' => 0, not NaN point '(0.3,0.5)' <-> polygon '((0,0),(0,NaN),(1,1))' => 1.14, not NaN point '(NaN,NaN)' <@ polygon '((0,0),(0,1),(1,1))' => true, not false Some other functions returned totally wrong result like this: point '(1e+300,Infinity)' ## box '(2,2),(0,0)' => '(0,2)' Fix NaN and Infinity handling of geo_ops.c so that these generates saner results. However, the behavioral inconsistency that comes from Infinity cannot be eliminated with a moderate amount of additional code against the benefit so they are left alone and added documentation instead. --- doc/src/sgml/func.sgml | 16 + src/backend/utils/adt/geo_ops.c| 215 -- src/test/regress/expected/create_index.out | 24 +- src/test/regress/expected/geometry.out | 458 ++--- src/test/regress/expected/point.out| 138 +-- src/test/regress/sql/point.sql | 2 + 6 files changed, 629 insertions(+), 224 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 7b1dc264f6..1aeeb48585 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -10889,6 +10889,22 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple on), where available for these types, likewise compare areas. + + + NaN and Infinity make geometric functions and operators behave + inconsistently. Geometric operators or functions that return a boolean + return false for operands that contain NaNs. Number-returning functions + and operators return NaN in most cases but sometimes return a valid + value if no NaNs are met while actual calculation. Object-returning ones + yield an object that contain NaNs depending to the operation. Likewise + the objects containing Infinity can make geometric operators and + functions behave inconsistently. For example (point + '(Infinity,Infinity)' - line '{-1,0,5}') is Infinity but (point + '(Infinity,Infinity)' - line '{0,-1,5}') is NaN. It can never + be a value other than these, but you should consider it uncertain how + geometric operators behave for objects containing Infinity. + + Geometric Functions diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c index a7db783958..06deeb6d12 100644 --- a/src/backend/utils/adt/geo_ops.c +++ b/src/backend/utils/adt/geo_ops.c @@ -904,9 +904,19 @@ box_intersect(PG_FUNCTION_ARGS) result = (BOX *) palloc(sizeof(BOX)); - result->high.x = float8_min(box1->high.x, box2->high.x); + /* float8_min conceals NaN, check separately for NaNs */ + if (unlikely(isnan(box1->high.x) || isnan(box2->high.x))) + result->high.x = get_float8_nan(); + else + result->high.x = float8_min(box1->high.x, box2->high.x); + result->low.x = float8_max(box1->low.x, box2->low.x); - result->high.y = float8_min(box1->high.y, box2->high.y); + + if (unlikely(isnan(box1->high.y) || isnan(box2->high.y))) + result->high.x = get_float8_nan(); + else + result->high.y = float8_min(box1->high.y, box2->high.y); + result->low.y = float8_max(box1->low.y, box2->low.y); PG_RETURN_BOX_P(result); @@ -1061,6 +1071,21 @@ line_construct(LINE *result, Point *pt, float8 m) result->A = -1.0; result->B = 0.0; result->C = pt->x; + + /* Avoid creating a valid line from an invalid point */ + if (unlikely(isnan(pt->y))) + result->C = get_float8_nan(); + } + else if (m == 0.0) + { + /* use "mx - y + yinter = 0" */ + result->A = 0.0; + result->B = -1.0; + result->C = pt->y; + + /* Avoid creating a valid line from an invalid point */ + if (unlikely(isnan(pt->x))) + result->C = get_float8_nan(); } else { @@ -1084,6 +1109,7 @@ line_construct_pp(PG_FUNCTION_ARGS)
Re: Strange behavior with polygon and NaN
Hi, apologies for the very, very late reply to your fixes. You have answered/addressed all my questions concerns. The added documentation reads well, at least to a non native English speaker. The patch still applies and as far as I can see the tests are passing. It gets my :+1: and I am changing the status to "Ready for Committer". For what little is worth, I learned a lot from this patch, thank you. Cheers, Georgios The new status of this patch is: Ready for Committer
Re: Strange behavior with polygon and NaN
Hello, Georgios. At Mon, 07 Sep 2020 12:46:50 +, gkokola...@pm.me wrote in > ‐‐‐ Original Message ‐‐‐ > On Thursday, 27 August 2020 14:24, Kyotaro Horiguchi > wrote: > > > At Wed, 26 Aug 2020 08:18:49 -0400, Tom Lane t...@sss.pgh.pa.us wrote in > > > > > Kyotaro Horiguchi horikyota@gmail.com writes: > > > > > > > At Tue, 25 Aug 2020 19:03:50 -0400, Bruce Momjian br...@momjian.us > > > > wrote in > > > > > > > > > I can confirm that this two-month old email report still produces > > > > > different results with indexes on/off in git master, which I don't > > > > > think > > > > > is ever correct behavior. > > > > > > > I agree to that the behavior is broken. > > > > > > Yeah, but ... what is "non broken" in this case? I'm not convinced > > > that having point_inside() return zero for any case involving NaN > > > is going to lead to noticeably saner behavior than today. > > > > Yes, just doing that leaves many unfixed behavior come from NaNs, but > > at least it seems to me one of sane definition candidates that a point > > cannot be inside a polygon when NaN is involved. It's similar to > > Fpxx() returns false if NaN is involved. As mentioned, I had't fully > > checked and haven't considered this seriously, but I changed my mind > > to check all the callers. I started checking all the callers of > > point_inside, then finally I had to check all functions in geo_ops.c:( > > > > For what is worth, I agree with this definition. Thanks. > > The attached is the result as of now. > > > > === Resulting behavior > > > > With the attached: > > > > - All boolean functions return false if NaN is involved. > > - All float8 functions return NaN if NaN is involved. > > - All geometric arithmetics return NaN as output if NaN is involved. > > Agreed! As in both this behaviour conforms to the definition above and the > patch provides this behaviour with the exceptions below. > > > > > With some exceptions: > > > > - line_eq: needs to consider that NaNs are equal each other. > > - point_eq/ne (point_eq_pint): ditto > > - lseg_eq/ne: ditto ... > > === pg_hypot mistake? > > > > I noticed that pg_hypot returns inf for the parameters (NaN, Inf) but > > I think NaN is appropriate here since other operators behaves that > > way. This change causes a change of distance between point(1e+300,Inf) > > and line (1,-1,0) from infinity to NaN, which I think is correct > > because the arithmetic generates NaN as an intermediate value. > > > > === Infinity annoyances > > > > Infinity makes some not-great changes in regresssion results. For example: > > > > - point '(1e+300,Infinity)' <-> path '((10,20))' returns > > NaN(previously Infinity), but point '(1e+300,Infinity)' <-> path > > '[(1,2),(3,4)]' returns Infinity. The difference of the two > > expressions is whether (0 * Inf = NaN) is performed or not. The > > former performs it but that was concealed by not propagating NaN to > > upper layer without the patch. > > Although I understand the reasoning for this change. I am not certain I agree > with the result. I feel that: > point '(1e+300,Infinity)' <-> path '((10,20))' > should return Infinity. Even if I am wrong to think that, the two results > should be expected to behave the same. Am I wrong on that too? No. Actually that's not correct and that just comes from avoiding special code paths for Infinity. I put more thought on line_interpt_line and found that that issue is "fixed" by just simplifying formulas by removing invariants. But one more if-block is needed to make the function work a symmetrical way, though.. However, still we have a similar issue. point '(Infinity,1e+300)' <-> line '{-1,0,5}' => Infinity point '(Infinity,1e+300)' <-> line '{0,-1,5}' => NaN point '(Infinity,1e+300)' <-> line '{1,1,5}' => NaN The second should be 1e+300 and the third infinity. This is because line_closept_point taking the distance between the foot of the perpendicular line from the point and the point. We can fix the second case by adding special code paths for vertical and horizontal lines, but the third needs another special code path explicitly identifying Infinity. It seems a kind of too-much.. Finally, I gave up fixing that and added doucmentation. As another issue, (point '(Infinity, 1e+300)' <-> path '((10,20))') results in NaN. That is "fixed" by adding a special path for "m == 0.0" case, but I'm not sure it's worth doing.. By the way I found that float8_div(, infinity) erros out as underflow. It is inconsistent with the behavior that float8_mul doesn't error-out as overflow when Infinity is given. So fixed it. > > - This is not a difference caused by this patch, but for both patched > > and unpatched, point '(1e+300,Inf)' <-> line '{3,0,0}' returns NaN, > > which should be 1e+300. However, the behavior comes from arithmetic > > reasons and wouldn't be a problem. > > > > create_index.out is changed since point(NaN,NaN) <@ polygon
Re: Strange behavior with polygon and NaN
‐‐‐ Original Message ‐‐‐ On Thursday, 27 August 2020 14:24, Kyotaro Horiguchi wrote: > At Wed, 26 Aug 2020 08:18:49 -0400, Tom Lane t...@sss.pgh.pa.us wrote in > > > Kyotaro Horiguchi horikyota@gmail.com writes: > > > > > At Tue, 25 Aug 2020 19:03:50 -0400, Bruce Momjian br...@momjian.us wrote > > > in > > > > > > > I can confirm that this two-month old email report still produces > > > > different results with indexes on/off in git master, which I don't think > > > > is ever correct behavior. > > > > > I agree to that the behavior is broken. > > > > Yeah, but ... what is "non broken" in this case? I'm not convinced > > that having point_inside() return zero for any case involving NaN > > is going to lead to noticeably saner behavior than today. > > Yes, just doing that leaves many unfixed behavior come from NaNs, but > at least it seems to me one of sane definition candidates that a point > cannot be inside a polygon when NaN is involved. It's similar to > Fpxx() returns false if NaN is involved. As mentioned, I had't fully > checked and haven't considered this seriously, but I changed my mind > to check all the callers. I started checking all the callers of > point_inside, then finally I had to check all functions in geo_ops.c:( > For what is worth, I agree with this definition. > The attached is the result as of now. > > === Resulting behavior > > With the attached: > > - All boolean functions return false if NaN is involved. > - All float8 functions return NaN if NaN is involved. > - All geometric arithmetics return NaN as output if NaN is involved. Agreed! As in both this behaviour conforms to the definition above and the patch provides this behaviour with the exceptions below. > > With some exceptions: > > - line_eq: needs to consider that NaNs are equal each other. > - point_eq/ne (point_eq_pint): ditto > - lseg_eq/ne: ditto > > The change makes some difference in the regression test. > For example, > > <-> changed from 0 to NaN. (distance) > > > <@ changed from true to false. (contained) > <-> changed from 0 to NaN. (distance) > ?# changed from true to false (overlaps) > > === pg_hypot mistake? > > I noticed that pg_hypot returns inf for the parameters (NaN, Inf) but > I think NaN is appropriate here since other operators behaves that > way. This change causes a change of distance between point(1e+300,Inf) > and line (1,-1,0) from infinity to NaN, which I think is correct > because the arithmetic generates NaN as an intermediate value. > > === Infinity annoyances > > Infinity makes some not-great changes in regresssion results. For example: > > - point '(1e+300,Infinity)' <-> path '((10,20))' returns > NaN(previously Infinity), but point '(1e+300,Infinity)' <-> path > '[(1,2),(3,4)]' returns Infinity. The difference of the two > expressions is whether (0 * Inf = NaN) is performed or not. The > former performs it but that was concealed by not propagating NaN to > upper layer without the patch. Although I understand the reasoning for this change. I am not certain I agree with the result. I feel that: point '(1e+300,Infinity)' <-> path '((10,20))' should return Infinity. Even if I am wrong to think that, the two results should be expected to behave the same. Am I wrong on that too? > > - Without the patch, point '(1e+300,Infinity)' ## box '(2,2),(0,0)' > generates '(0,2)', which is utterly wrong. It is because > box_closept_point evaluates float8_lt(Inf, NaN) as true(!) and sets > the wrong point for distance=NaN is set. With the patch, the NaN > makes the result NULL. Agreed. > > - This is not a difference caused by this patch, but for both patched > and unpatched, point '(1e+300,Inf)' <-> line '{3,0,0}' returns NaN, > which should be 1e+300. However, the behavior comes from arithmetic > reasons and wouldn't be a problem. > > create_index.out is changed since point(NaN,NaN) <@ polygon changed > from true to false, which seems rather saner. > > I haven't checked unchanged results but at least changed results seems > saner to me. All in all a great patch! It is clean, well reasoned and carefully crafted. Do you think that the documentation needs to get updated to the 'new' behaviour? //Georgios > > regards. > > -- > Kyotaro Horiguchi > NTT Open Source Software Center > From 432a671517e78061edc87d18aec291f5629fcbe6 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Thu, 27 Aug 2020 14:49:21 +0900 Subject: [PATCH v1] Fix NaN handling of some geometric operators and functions Some geometric operators shows somewhat odd behavior comes from NaN handling and that leads to inconsistency between index scan and seq scan on geometric conditions. For example: point '(NaN,NaN)' <-> polygon '((0,0),(0,1),(1,1))' => 0, not NaN point '(0.3,0.5)' <-> polygon '((0,0),(0,NaN),(1,1))' => 1.14, not NaN point '(NaN,NaN)' <@ polygon
Re: Strange behavior with polygon and NaN
At Wed, 26 Aug 2020 08:18:49 -0400, Tom Lane wrote in > Kyotaro Horiguchi writes: > > At Tue, 25 Aug 2020 19:03:50 -0400, Bruce Momjian wrote > > in > >> I can confirm that this two-month old email report still produces > >> different results with indexes on/off in git master, which I don't think > >> is ever correct behavior. > > > I agree to that the behavior is broken. > > Yeah, but ... what is "non broken" in this case? I'm not convinced > that having point_inside() return zero for any case involving NaN > is going to lead to noticeably saner behavior than today. Yes, just doing that leaves many unfixed behavior come from NaNs, but at least it seems to me one of sane definition candidates that a point cannot be inside a polygon when NaN is involved. It's similar to Fpxx() returns false if NaN is involved. As mentioned, I had't fully checked and haven't considered this seriously, but I changed my mind to check all the callers. I started checking all the callers of point_inside, then finally I had to check all functions in geo_ops.c:( The attached is the result as of now. === Resulting behavior With the attached: - All boolean functions return false if NaN is involved. - All float8 functions return NaN if NaN is involved. - All geometric arithmetics return NaN as output if NaN is involved. With some exceptions: - line_eq: needs to consider that NaNs are equal each other. - point_eq/ne (point_eq_pint): ditto - lseg_eq/ne: ditto The change makes some difference in the regression test. For example, <-> changed from 0 to NaN. (distance) <@ changed from true to false. (contained) <-> changed from 0 to NaN. (distance) ?# changed from true to false (overlaps) === pg_hypot mistake? I noticed that pg_hypot returns inf for the parameters (NaN, Inf) but I think NaN is appropriate here since other operators behaves that way. This change causes a change of distance between point(1e+300,Inf) and line (1,-1,0) from infinity to NaN, which I think is correct because the arithmetic generates NaN as an intermediate value. === Infinity annoyances Infinity makes some not-great changes in regresssion results. For example: - point '(1e+300,Infinity)' <-> path '((10,20))' returns NaN(previously Infinity), but point '(1e+300,Infinity)' <-> path '[(1,2),(3,4)]' returns Infinity. The difference of the two expressions is whether (0 * Inf = NaN) is performed or not. The former performs it but that was concealed by not propagating NaN to upper layer without the patch. - Without the patch, point '(1e+300,Infinity)' ## box '(2,2),(0,0)' generates '(0,2)', which is utterly wrong. It is because box_closept_point evaluates float8_lt(Inf, NaN) as true(!) and sets the wrong point for distance=NaN is set. With the patch, the NaN makes the result NULL. - This is not a difference caused by this patch, but for both patched and unpatched, point '(1e+300,Inf)' <-> line '{3,0,0}' returns NaN, which should be 1e+300. However, the behavior comes from arithmetic reasons and wouldn't be a problem. create_index.out is changed since point(NaN,NaN) <@ polygon changed from true to false, which seems rather saner. I haven't checked unchanged results but at least changed results seems saner to me. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 432a671517e78061edc87d18aec291f5629fcbe6 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Thu, 27 Aug 2020 14:49:21 +0900 Subject: [PATCH v1] Fix NaN handling of some geometric operators and functions Some geometric operators shows somewhat odd behavior comes from NaN handling and that leads to inconsistency between index scan and seq scan on geometric conditions. For example: point '(NaN,NaN)' <-> polygon '((0,0),(0,1),(1,1))' => 0, not NaN point '(0.3,0.5)' <-> polygon '((0,0),(0,NaN),(1,1))' => 1.14, not NaN point '(NaN,NaN)' <@ polygon '((0,0),(0,1),(1,1))' => true, not false Some other functions returned totally wrong result like this: point '(1e+300,Infinity)' ## box '(2,2),(0,0)' => '(0,2)' Fix NaN handling of geo_ops.c so that these generates saner results. --- src/backend/utils/adt/geo_ops.c| 183 -- src/test/regress/expected/create_index.out | 2 +- src/test/regress/expected/geometry.out | 209 + 3 files changed, 248 insertions(+), 146 deletions(-) diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c index a7db783958..916ae87d05 100644 --- a/src/backend/utils/adt/geo_ops.c +++ b/src/backend/utils/adt/geo_ops.c @@ -904,9 +904,19 @@ box_intersect(PG_FUNCTION_ARGS) result = (BOX *) palloc(sizeof(BOX)); - result->high.x = float8_min(box1->high.x, box2->high.x); + /* float8_min conceals NaN, check separately for NaNs */ + if (unlikely(isnan(box1->high.x) || isnan(box2->high.x))) + result->high.x = get_float8_nan(); + else + result->high.x = float8_min(box1->high.x, box2->high.x); + result->low.x =
Re: Strange behavior with polygon and NaN
Kyotaro Horiguchi writes: > At Tue, 25 Aug 2020 19:03:50 -0400, Bruce Momjian wrote in >> I can confirm that this two-month old email report still produces >> different results with indexes on/off in git master, which I don't think >> is ever correct behavior. > I agree to that the behavior is broken. Yeah, but ... what is "non broken" in this case? I'm not convinced that having point_inside() return zero for any case involving NaN is going to lead to noticeably saner behavior than today. regards, tom lane
Re: Strange behavior with polygon and NaN
At Tue, 25 Aug 2020 19:03:50 -0400, Bruce Momjian wrote in > > I can confirm that this two-month old email report still produces > different results with indexes on/off in git master, which I don't think > is ever correct behavior. I agree to that the behavior is broken. > --- > > On Wed, Jun 24, 2020 at 03:11:03PM -0700, Jesse Zhang wrote: > > Hi hackers, > > > > While working with Chris Hajas on merging Postgres 12 with Greenplum > > Database we stumbled upon the following strange behavior in the geometry > > type polygon: > > > > -- >8 > > > > CREATE TEMP TABLE foo (p point); > > CREATE INDEX ON foo USING gist(p); > > > > INSERT INTO foo VALUES ('0,0'), ('1,1'), ('NaN,NaN'); > > > > SELECT $q$ > > SELECT * FROM foo WHERE p <@ polygon '(0,0), (0, 100), (100, 100), (100, 0)' > > $q$ AS qry \gset > > > > BEGIN; > > SAVEPOINT yolo; > > SET LOCAL enable_seqscan TO off; > > :qry; > > > > ROLLBACK TO SAVEPOINT yolo; > > SET LOCAL enable_indexscan TO off; > > SET LOCAL enable_bitmapscan TO off; > > :qry; > > > > -- 8< > > > > If you run the above repro SQL in HEAD (and 12, and likely all older > > versions), you get the following output: > > > > CREATE TABLE > > CREATE INDEX > > INSERT 0 3 > > BEGIN > > SAVEPOINT > > SET > >p > > --- > > (0,0) > > (1,1) > > (2 rows) > > > > ROLLBACK > > SET > > SET > > p > > --- > > (0,0) > > (1,1) > > (NaN,NaN) > > (3 rows) > > > > > > At first glance, you'd think this is the gist AM's bad, but on a second > > thought, something else is strange here. The following query returns > > true: > > > > SELECT point '(NaN, NaN)' <@ polygon '(0,0), (0, 100), (100, 100), (100, 0)' > > > > The above behavior of the "contained in" operator is surprising, and > > it's probably not what the GiST AM is expecting. I took a look at > > point_inside() in geo_ops.c, and it doesn't seem well equipped to handle > > NaN. Similary ill-equipped is dist_ppoly_internal() which underlies the > > distnace operator for polygon. It gives the following interesting > > output: > > > > SELECT *, c <-> polygon '(0,0),(0,100),(100,100),(100,0)' as distance > > FROM ( > > SELECT circle(point(100 * i, 'NaN'), 50) AS c > > FROM generate_series(-2, 4) i > > ) t(c) > > ORDER BY 2; > > > > c| distance > > -+-- > > <(-200,NaN),50> |0 > > <(-100,NaN),50> |0 > > <(0,NaN),50>|0 > > <(100,NaN),50> |0 > > <(200,NaN),50> | NaN > > <(300,NaN),50> | NaN > > <(400,NaN),50> | NaN > > (7 rows) > > > > Should they all be NaN? Am I alone in thinking the index is right but > > the operators are wrong? Or should we call the indexes wrong here? There may be other places that NaN is not fully considered. For example, the following (perpendicular op) returns true. SELECT lseg '[(0,0),(0,NaN)]' ?-| lseg '[(0,0),(10,0)]'; It is quite hard to fix all of the defect.. For the above cases, it's enough to make sure that point_inside don't pass NaN's to lseg_crossing(), but it's much of labor to fill all holes reasonable and principled way.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c index a7db783958..1875f8d436 100644 --- a/src/backend/utils/adt/geo_ops.c +++ b/src/backend/utils/adt/geo_ops.c @@ -5350,6 +5350,9 @@ point_inside(Point *p, int npts, Point *plist) x0 = float8_mi(plist[0].x, p->x); y0 = float8_mi(plist[0].y, p->y); + if (isnan(x0) || isnan(y0) || isnan(p->x) || isnan(p->y)) + return 0; + prev_x = x0; prev_y = y0; /* loop over polygon points and aggregate total_cross */ @@ -5359,6 +5362,9 @@ point_inside(Point *p, int npts, Point *plist) x = float8_mi(plist[i].x, p->x); y = float8_mi(plist[i].y, p->y); + if (isnan(x) || isnan(y)) + return 0; + /* compute previous to current point crossing */ if ((cross = lseg_crossing(x, y, prev_x, prev_y)) == POINT_ON_POLYGON) return 2;
Re: Strange behavior with polygon and NaN
I can confirm that this two-month old email report still produces different results with indexes on/off in git master, which I don't think is ever correct behavior. --- On Wed, Jun 24, 2020 at 03:11:03PM -0700, Jesse Zhang wrote: > Hi hackers, > > While working with Chris Hajas on merging Postgres 12 with Greenplum > Database we stumbled upon the following strange behavior in the geometry > type polygon: > > -- >8 > > CREATE TEMP TABLE foo (p point); > CREATE INDEX ON foo USING gist(p); > > INSERT INTO foo VALUES ('0,0'), ('1,1'), ('NaN,NaN'); > > SELECT $q$ > SELECT * FROM foo WHERE p <@ polygon '(0,0), (0, 100), (100, 100), (100, 0)' > $q$ AS qry \gset > > BEGIN; > SAVEPOINT yolo; > SET LOCAL enable_seqscan TO off; > :qry; > > ROLLBACK TO SAVEPOINT yolo; > SET LOCAL enable_indexscan TO off; > SET LOCAL enable_bitmapscan TO off; > :qry; > > -- 8< > > If you run the above repro SQL in HEAD (and 12, and likely all older > versions), you get the following output: > > CREATE TABLE > CREATE INDEX > INSERT 0 3 > BEGIN > SAVEPOINT > SET >p > --- > (0,0) > (1,1) > (2 rows) > > ROLLBACK > SET > SET > p > --- > (0,0) > (1,1) > (NaN,NaN) > (3 rows) > > > At first glance, you'd think this is the gist AM's bad, but on a second > thought, something else is strange here. The following query returns > true: > > SELECT point '(NaN, NaN)' <@ polygon '(0,0), (0, 100), (100, 100), (100, 0)' > > The above behavior of the "contained in" operator is surprising, and > it's probably not what the GiST AM is expecting. I took a look at > point_inside() in geo_ops.c, and it doesn't seem well equipped to handle > NaN. Similary ill-equipped is dist_ppoly_internal() which underlies the > distnace operator for polygon. It gives the following interesting > output: > > SELECT *, c <-> polygon '(0,0),(0,100),(100,100),(100,0)' as distance > FROM ( > SELECT circle(point(100 * i, 'NaN'), 50) AS c > FROM generate_series(-2, 4) i > ) t(c) > ORDER BY 2; > > c| distance > -+-- > <(-200,NaN),50> |0 > <(-100,NaN),50> |0 > <(0,NaN),50>|0 > <(100,NaN),50> |0 > <(200,NaN),50> | NaN > <(300,NaN),50> | NaN > <(400,NaN),50> | NaN > (7 rows) > > Should they all be NaN? Am I alone in thinking the index is right but > the operators are wrong? Or should we call the indexes wrong here? > > Cheers, > Jesse and Chris > > -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee