Re: Strange behavior with polygon and NaN

2022-08-02 Thread Kyotaro Horiguchi
At Mon, 1 Aug 2022 13:29:09 -0700, Jacob Champion  
wrote in 
> As discussed in [1], we're taking this opportunity to return some
> patchsets that don't appear to be getting enough reviewer interest.

Oh, sorry. I missed that thread. Thank you for kindly noticing that.

> This is not a rejection, since we don't necessarily think there's
> anything unacceptable about the entry, but it differs from a standard
> "Returned with Feedback" in that there's probably not much actionable
> feedback at all. Rather than code changes, what this patch needs is more
> community interest. You might
> 
> - ask people for help with your approach,
> - see if there are similar patches that your code could supplement,
> - get interested parties to agree to review your patch in a CF, or
> - possibly present the functionality in a way that's easier to review
>   overall.
> 
> (Doing these things is no guarantee that there will be interest, but
> it's hopefully better than endlessly rebasing a patchset that is not
> receiving any feedback from the community.)
> 
> Once you think you've built up some community support and the patchset
> is ready for review, you (or any interested party) can resurrect the
> patch entry by visiting
> 
> https://commitfest.postgresql.org/38/2710/
> 
> and changing the status to "Needs Review", and then changing the
> status again to "Move to next CF". (Don't forget the second step;
> hopefully we will have streamlined this in the near future!)

Thanks. I don't insist on this patch unless some other people are
interested in.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Strange behavior with polygon and NaN

2022-08-01 Thread Jacob Champion
As discussed in [1], we're taking this opportunity to return some
patchsets that don't appear to be getting enough reviewer interest.

This is not a rejection, since we don't necessarily think there's
anything unacceptable about the entry, but it differs from a standard
"Returned with Feedback" in that there's probably not much actionable
feedback at all. Rather than code changes, what this patch needs is more
community interest. You might

- ask people for help with your approach,
- see if there are similar patches that your code could supplement,
- get interested parties to agree to review your patch in a CF, or
- possibly present the functionality in a way that's easier to review
  overall.

(Doing these things is no guarantee that there will be interest, but
it's hopefully better than endlessly rebasing a patchset that is not
receiving any feedback from the community.)

Once you think you've built up some community support and the patchset
is ready for review, you (or any interested party) can resurrect the
patch entry by visiting

https://commitfest.postgresql.org/38/2710/

and changing the status to "Needs Review", and then changing the
status again to "Move to next CF". (Don't forget the second step;
hopefully we will have streamlined this in the near future!)

Thanks,
--Jacob

[1]
https://postgr.es/m/flat/0ab66589-2f71-69b3-2002-49e821740b0d%40timescale.com




Re: Strange behavior with polygon and NaN

2021-12-15 Thread Kyotaro Horiguchi
At Wed, 15 Dec 2021 16:20:55 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> adjusted so that it treats null as false. On the way doing this, the
> bug #17334 [2] and another bug raised earlier [3] are naturally fixed.

That being said, even if this patch were committed to the master
branch, we won't apply the whole patch set for backbranches.

I guess applying the computeDistance part in the patch works.

# But I found that the part contains a bugX(

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Strange behavior with polygon and NaN

2021-03-15 Thread David Steele

On 12/21/20 3:30 AM, Kyotaro Horiguchi wrote:

At Tue, 01 Dec 2020 10:03:42 -0500, Tom Lane  wrote in

I think it should be "needs review" now.


Conflicted with some commit(s) uncertain to me. Rebased.


Tom, Georgios, thoughts on the new patch?

Regards,
--
-David
da...@pgmasters.net




Re: Strange behavior with polygon and NaN

2020-12-21 Thread Kyotaro Horiguchi
At Tue, 01 Dec 2020 10:03:42 -0500, Tom Lane  wrote in 
> I think it should be "needs review" now.

Conflicted with some commit(s) uncertain to me. Rebased.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 26d9edeccf3f3111a7e0ff049aa6e3ba3746dce9 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 13 Nov 2020 13:31:20 +0900
Subject: [PATCH v8] fix geometric nan handling

---
 doc/src/sgml/func.sgml |  17 ++
 src/backend/utils/adt/geo_ops.c| 337 ++---
 src/include/utils/float.h  |  22 ++
 src/test/regress/expected/create_index.out |   2 +-
 src/test/regress/expected/geometry.out | 331 +---
 5 files changed, 485 insertions(+), 224 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index d5cd705eeb..2f7f3c633c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -10855,6 +10855,23 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 

 
+   
+ 
+   NaN and Infinity make geometric functions and operators behave
+   inconsistently. Geometric operators or functions that return a boolean
+   return false for operands that contain NaNs. Number-returning functions
+   and operators return NaN in most cases but sometimes return a valid
+   value if no NaNs are met while actual calculation.  Object-returning ones
+   yield an object that contain NaNs depending to the operation.  Likewise
+   the objects containing Infinity can make geometric operators and
+   functions behave inconsistently. For example (point
+   '(Infinity,Infinity)' - line '{-1,0,5}') is Infinity but (point
+   '(Infinity,Infinity)' - line '{0,-1,5}') is NaN. It can never
+   be a value other than these, but you should consider it uncertain how
+   geometric operators behave for objects containing Infinity.
+ 
+   
+

 Geometric Functions
 
diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c
index c1dc511a1a..e74bf3031e 100644
--- a/src/backend/utils/adt/geo_ops.c
+++ b/src/backend/utils/adt/geo_ops.c
@@ -904,9 +904,9 @@ box_intersect(PG_FUNCTION_ARGS)
 
 	result = (BOX *) palloc(sizeof(BOX));
 
-	result->high.x = float8_min(box1->high.x, box2->high.x);
+	result->high.x = float8_min_nan(box1->high.x, box2->high.x);
 	result->low.x = float8_max(box1->low.x, box2->low.x);
-	result->high.y = float8_min(box1->high.y, box2->high.y);
+	result->high.y = float8_min_nan(box1->high.y, box2->high.y);
 	result->low.y = float8_max(box1->low.y, box2->low.y);
 
 	PG_RETURN_BOX_P(result);
@@ -1061,15 +1061,23 @@ line_construct(LINE *result, Point *pt, float8 m)
 		result->A = -1.0;
 		result->B = 0.0;
 		result->C = pt->x;
+
+		/* Avoid creating a valid line from an invalid point */
+		if (likely(!isnan(pt->x) && !isnan(pt->y)))
+			return;
 	}
-	else if (m == 0)
+	else if (m == 0.0)
 	{
 		/* horizontal - use "y = C" */
 		result->A = 0.0;
 		result->B = -1.0;
 		result->C = pt->y;
+
+		/* Avoid creating a valid line from an invalid point */
+		if (likely(!isnan(pt->x) && !isnan(pt->y)))
+			return;
 	}
-	else
+	else if (!isnan(m))
 	{
 		/* use "mx - y + yinter = 0" */
 		result->A = m;
@@ -1078,7 +1086,12 @@ line_construct(LINE *result, Point *pt, float8 m)
 		/* on some platforms, the preceding expression tends to produce -0 */
 		if (result->C == 0.0)
 			result->C = 0.0;
+
+		if (likely(!isnan(result->C)))
+			return;
 	}
+
+	result->A = result->B = result->C = get_float8_nan();
 }
 
 /* line_construct_pp()
@@ -1091,6 +1104,7 @@ line_construct_pp(PG_FUNCTION_ARGS)
 	Point	   *pt2 = PG_GETARG_POINT_P(1);
 	LINE	   *result = (LINE *) palloc(sizeof(LINE));
 
+	/* NaNs are considered to be equal by point_eq_point */
 	if (point_eq_point(pt1, pt2))
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -,8 +1125,12 @@ line_intersect(PG_FUNCTION_ARGS)
 {
 	LINE	   *l1 = PG_GETARG_LINE_P(0);
 	LINE	   *l2 = PG_GETARG_LINE_P(1);
+	Point		xp;
 
-	PG_RETURN_BOOL(line_interpt_line(NULL, l1, l2));
+	if (line_interpt_line(, l1, l2) && !isnan(xp.x) && !isnan(xp.y))
+		PG_RETURN_BOOL(true);
+	else
+		PG_RETURN_BOOL(false);
 }
 
 Datum
@@ -1130,14 +1148,17 @@ line_perp(PG_FUNCTION_ARGS)
 	LINE	   *l1 = PG_GETARG_LINE_P(0);
 	LINE	   *l2 = PG_GETARG_LINE_P(1);
 
+	if (unlikely(isnan(l1->C) || isnan(l2->C)))
+		return false;
+
 	if (FPzero(l1->A))
-		PG_RETURN_BOOL(FPzero(l2->B));
+		PG_RETURN_BOOL(FPzero(l2->B) && !isnan(l1->B) && !isnan(l2->A));
 	if (FPzero(l2->A))
-		PG_RETURN_BOOL(FPzero(l1->B));
+		PG_RETURN_BOOL(FPzero(l1->B) && !isnan(l2->B) && !isnan(l1->A));
 	if (FPzero(l1->B))
-		PG_RETURN_BOOL(FPzero(l2->A));
+		PG_RETURN_BOOL(FPzero(l2->A) && !isnan(l1->A) && !isnan(l2->B));
 	if (FPzero(l2->B))
-		PG_RETURN_BOOL(FPzero(l1->A));
+		PG_RETURN_BOOL(FPzero(l1->A) && !isnan(l2->A) && !isnan(l1->B));
 
 	PG_RETURN_BOOL(FPeq(float8_div(float8_mul(l1->A, l2->A),
 			

Re: Strange behavior with polygon and NaN

2020-12-01 Thread Tom Lane
Anastasia Lubennikova  writes:
> The commitfest is closed now and this entry is "Waiting on author".
> As far as I see, part of the fixes is already committed. Is there 
> anything left to work on or this patch needs review/ ready for committer 
> now?

I think it should be "needs review" now.

regards, tom lane




Re: Strange behavior with polygon and NaN

2020-12-01 Thread Anastasia Lubennikova

On 25.11.2020 11:14, Kyotaro Horiguchi wrote:

At Wed, 25 Nov 2020 11:39:39 +0900 (JST), Kyotaro Horiguchi 
 wrote in

So that line of thought prompts me to tread *very* carefully when
trying to dodge NaN results.  We need to be certain that we
introduce only logically-defensible special cases.  Something like
float8_coef_mul() seems much more likely to lead us into errors
than away from them.

Agreed on that point.  I'm going to rewirte the patch in that
direction.

Removed the function float8_coef_mul().


I noticed that the check you proposed to add to line_closept_point
doesn't work for the following case:

select line('{1,-1,0}') <-> point(1e300, 'Infinity');

Ax + By + C = 1 * 1e300 + -1 * Inf + 0 = -Inf is not NaN so we go on
the following steps.

derive the perpendicular line: => line(-1, -1, Inf}
derive the cross point   : => point(Inf, Inf)
calculate the distance   : => NaN  (which should be Infinity)

So I left the check whether distance is NaN in this version. In the previous 
version the check is done before directly calculating the distance, but since 
we already have the result of Ax+Bx+C so I decided not to use point_dt() in this
version.

Although I wrote that it should be wrong that applying FPzero() to
coefficients, there are some places already doing that so I followed
those predecessors.


Reverted the change of pg_hypot().


While checking the regression results, I noticed that the follwoing
calculation, which seems wrong.

select line('{3,NaN,5}') = line('{3,NaN,5}');
  ?column?
--
  t

But after looking point_eq(), I decided to let the behavior alone
since I'm not sure the reason for the behavior of the functions. At
least the comment for point_eq() says that is the delibarate
behvior. box_same, poly_same base on the point_eq_point so they behave
the same way.


By the way, '=' doesn't compare the shape but compares the area.
However, what is the area of a line?  That should be always 0 even if
we considered it. And it is also strange that we don't have
corresponding comparison ('<' and so) operators.  It seems to me as if
a mistake of '~='.  If it is correct, I should revert the change of
line_eq() along with fixing operator assignment.

regards.



Status update for a commitfest entry.

The commitfest is closed now and this entry is "Waiting on author".
As far as I see, part of the fixes is already committed. Is there 
anything left to work on or this patch needs review/ ready for committer 
now?


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Strange behavior with polygon and NaN

2020-11-25 Thread Kyotaro Horiguchi
At Wed, 25 Nov 2020 11:39:39 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> > So that line of thought prompts me to tread *very* carefully when
> > trying to dodge NaN results.  We need to be certain that we
> > introduce only logically-defensible special cases.  Something like
> > float8_coef_mul() seems much more likely to lead us into errors
> > than away from them.
> 
> Agreed on that point.  I'm going to rewirte the patch in that
> direction.

Removed the function float8_coef_mul().


I noticed that the check you proposed to add to line_closept_point
doesn't work for the following case:

select line('{1,-1,0}') <-> point(1e300, 'Infinity');

Ax + By + C = 1 * 1e300 + -1 * Inf + 0 = -Inf is not NaN so we go on
the following steps.

derive the perpendicular line: => line(-1, -1, Inf}
derive the cross point   : => point(Inf, Inf)
calculate the distance   : => NaN  (which should be Infinity)

So I left the check whether distance is NaN in this version. In the previous 
version the check is done before directly calculating the distance, but since 
we already have the result of Ax+Bx+C so I decided not to use point_dt() in this
version.

Although I wrote that it should be wrong that applying FPzero() to
coefficients, there are some places already doing that so I followed
those predecessors.


Reverted the change of pg_hypot().


While checking the regression results, I noticed that the follwoing
calculation, which seems wrong.

select line('{3,NaN,5}') = line('{3,NaN,5}');
 ?column? 
--
 t

But after looking point_eq(), I decided to let the behavior alone
since I'm not sure the reason for the behavior of the functions. At
least the comment for point_eq() says that is the delibarate
behvior. box_same, poly_same base on the point_eq_point so they behave
the same way.


By the way, '=' doesn't compare the shape but compares the area.
However, what is the area of a line?  That should be always 0 even if
we considered it. And it is also strange that we don't have
corresponding comparison ('<' and so) operators.  It seems to me as if
a mistake of '~='.  If it is correct, I should revert the change of
line_eq() along with fixing operator assignment.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 4f8f8605afcd97571bc891bff17b63c963e27fb5 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 13 Nov 2020 13:31:20 +0900
Subject: [PATCH v7] fix geometric nan handling

---
 doc/src/sgml/func.sgml |  17 ++
 src/backend/utils/adt/geo_ops.c| 337 ++---
 src/include/utils/float.h  |  22 ++
 src/test/regress/expected/create_index.out |   2 +-
 src/test/regress/expected/geometry.out | 331 +---
 5 files changed, 485 insertions(+), 224 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 507bc1a668..dc9fe52174 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -10859,6 +10859,23 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 

 
+   
+ 
+   NaN and Infinity make geometric functions and operators behave
+   inconsistently. Geometric operators or functions that return a boolean
+   return false for operands that contain NaNs. Number-returning functions
+   and operators return NaN in most cases but sometimes return a valid
+   value if no NaNs are met while actual calculation.  Object-returning ones
+   yield an object that contain NaNs depending to the operation.  Likewise
+   the objects containing Infinity can make geometric operators and
+   functions behave inconsistently. For example (point
+   '(Infinity,Infinity)' - line '{-1,0,5}') is Infinity but (point
+   '(Infinity,Infinity)' - line '{0,-1,5}') is NaN. It can never
+   be a value other than these, but you should consider it uncertain how
+   geometric operators behave for objects containing Infinity.
+ 
+   
+

 Geometric Functions
 
diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c
index c1dc511a1a..e74bf3031e 100644
--- a/src/backend/utils/adt/geo_ops.c
+++ b/src/backend/utils/adt/geo_ops.c
@@ -904,9 +904,9 @@ box_intersect(PG_FUNCTION_ARGS)
 
 	result = (BOX *) palloc(sizeof(BOX));
 
-	result->high.x = float8_min(box1->high.x, box2->high.x);
+	result->high.x = float8_min_nan(box1->high.x, box2->high.x);
 	result->low.x = float8_max(box1->low.x, box2->low.x);
-	result->high.y = float8_min(box1->high.y, box2->high.y);
+	result->high.y = float8_min_nan(box1->high.y, box2->high.y);
 	result->low.y = float8_max(box1->low.y, box2->low.y);
 
 	PG_RETURN_BOX_P(result);
@@ -1061,15 +1061,23 @@ line_construct(LINE *result, Point *pt, float8 m)
 		result->A = -1.0;
 		result->B = 0.0;
 		result->C = pt->x;
+
+		/* Avoid creating a valid line from an invalid point */
+		if (likely(!isnan(pt->x) && !isnan(pt->y)))
+			return;
 	}
-	else if (m == 0)
+	else if (m == 

Re: Strange behavior with polygon and NaN

2020-11-24 Thread Kyotaro Horiguchi
(My mailer seems to have recovered from unresponsiveness.)

At Tue, 24 Nov 2020 12:29:41 -0500, Tom Lane  wrote in 
> Kyotaro Horiguchi  writes:
> > At Fri, 20 Nov 2020 15:57:46 -0500, Tom Lane  wrote in 
> >> I don't much like anything about float8_coef_mul().
> 
> > I have the same feeling on the function, but I concluded that
> > coefficients and coordinates should be regarded as different things in
> > the practical standpoint.
> 
> > For example, consider Ax + By + C == 0, if B is 0.0, we can remove the
> > second term from the equation, regardless of the value of y, of course
> > even if it were inf. that is, The function imitates that kind of
> > removals.
> 
> Meh --- I can see where you're going with that, but I don't much like it.
> I fear that it's as likely to introduce weird behaviors as remove any.
>
> The core of the issue in
> 
> > | postgres=# select point(1e+300, 'Infinity') <-> line('{1,0,5}');
> > |  ?column? 
> > | --
> > |   NaN
> 
> is that we generate the line y = Inf:
> 
> (gdb) p tmp
> $1 = {A = 0, B = -1, C = inf}
> 
> and then try to find the intersection with {1,0,5} (x = -5), but that
> calculation involves 0 * Inf so we get NaNs.  It seems reasonable that
> the intersection should be (-5,Inf), but I don't think we should try
> to force the normal calculation to produce that.  I think we'd be
> better off to explicitly special-case vertical and/or horizontal lines
> in line_interpt_line.

I don't object to have explicit special case for vertical lines since
it is clear than embedding such a function in the formula, but it
seems equivalent to what the function is doing, that is, treating inf
* 0.0 as 0.0 in some special cases.

# And after rethinking, the FPzero() used in the function is wrong
# since the macro (function) is expected to be applied to coordinates,
# not to coefficients.

> Actually though ... even if we successfully got that intersection
> point, we'd still end up with a NaN distance between (1e300,Inf) and
> (-5,Inf), on account of Inf - Inf being NaN.  I think this is correct
> and we'd be ill-advised to try to force it to be something else.
> Although we pretend that two Infs are equal for purposes such as
> sorting, they aren't really, so we should not assume that their
> difference is zero.

The definition "inf == inf" comes from some practical reasons
uncertain to me, and actually inf - inf yields NaN in IEEE
754. However, aren't we going to assume a line on which B is exactly
0.0 as a completely vertical line?  Thus things are slightiy different
from the IEEE's definition.  The "Inf" as the y-coord of the
perpendicular foot is actually "the same y-coord with the point".  So
what we should do on our definition for the calculation is:

perp-foot (line {1,0,5}, point(1e300, Inf)) => point(-5, )
distance (point(1e300, Inf), point(-5, )) => 1e300 (+5)

This is what the code below is doing:

+   return float8_div(fabs(float8_pl(
+  float8_pl(
+  
float8_coef_mul(line->A, point->x, false),
+  
float8_coef_mul(line->B, point->y, false)),
+  line->C)),
+ HYPOT(line->A, line->B));

> So that line of thought prompts me to tread *very* carefully when
> trying to dodge NaN results.  We need to be certain that we
> introduce only logically-defensible special cases.  Something like
> float8_coef_mul() seems much more likely to lead us into errors
> than away from them.

Agreed on that point.  I'm going to rewirte the patch in that
direction.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Strange behavior with polygon and NaN

2020-11-24 Thread Tom Lane
Kyotaro Horiguchi  writes:
> At Fri, 20 Nov 2020 15:57:46 -0500, Tom Lane  wrote in 
>> I don't much like anything about float8_coef_mul().

> I have the same feeling on the function, but I concluded that
> coefficients and coordinates should be regarded as different things in
> the practical standpoint.

> For example, consider Ax + By + C == 0, if B is 0.0, we can remove the
> second term from the equation, regardless of the value of y, of course
> even if it were inf. that is, The function imitates that kind of
> removals.

Meh --- I can see where you're going with that, but I don't much like it.
I fear that it's as likely to introduce weird behaviors as remove any.

The core of the issue in

> | postgres=# select point(1e+300, 'Infinity') <-> line('{1,0,5}');
> |  ?column? 
> | --
> |   NaN

is that we generate the line y = Inf:

(gdb) p tmp
$1 = {A = 0, B = -1, C = inf}

and then try to find the intersection with {1,0,5} (x = -5), but that
calculation involves 0 * Inf so we get NaNs.  It seems reasonable that
the intersection should be (-5,Inf), but I don't think we should try
to force the normal calculation to produce that.  I think we'd be
better off to explicitly special-case vertical and/or horizontal lines
in line_interpt_line.

Actually though ... even if we successfully got that intersection
point, we'd still end up with a NaN distance between (1e300,Inf) and
(-5,Inf), on account of Inf - Inf being NaN.  I think this is correct
and we'd be ill-advised to try to force it to be something else.
Although we pretend that two Infs are equal for purposes such as
sorting, they aren't really, so we should not assume that their
difference is zero.

So that line of thought prompts me to tread *very* carefully when
trying to dodge NaN results.  We need to be certain that we
introduce only logically-defensible special cases.  Something like
float8_coef_mul() seems much more likely to lead us into errors
than away from them.

regards, tom lane




Re: Strange behavior with polygon and NaN

2020-11-23 Thread Kyotaro Horiguchi
At Fri, 20 Nov 2020 15:57:46 -0500, Tom Lane  wrote in 
> I spent some more time looking at this patch.
> 
> Some experimentation shows that the changes around bounding box
> calculation (ie float8_min_nan() and its call sites) seem to be
> completely pointless: removing them doesn't change any of the regression
> results.  Nor does using float8_min_nan() in the other two bounding-box
> calculations I'd asked about.  So I think we should just drop that set
> of changes and stick with the rule that bounding box upper and lower
> values are sorted as per float.h comparison rules.  This isn't that hard
> to work with: if you want to know whether any NaNs are in the box, test
> the upper limit (and *not* the lower limit) for isnan().  Moreover, even
> if we wanted a different coding rule, we really can't have it because we
> will still need to work with existing on-disk values that have bounding
> boxes computed the old way.

Actually that changes the result since that code gives a shortcut of
checking NaNs in the object coordinates. I don't think that the it is
pointless to avoid full calculations that are performed only to find
NaNs are involved, if bounding box check is meaningful.

> I don't much like anything about float8_coef_mul().  In the first place,
> FPzero() is an ugly, badly designed condition that we should be trying
> to get rid of not add more dependencies on.  In the second place, it's
> really unclear why allowing 0 times Inf to be something other than NaN
> is a good idea, and it's even less clear why allowing small-but-not-zero
> times Inf to be zero rather than Inf is a good idea.  In the third
> place, the asymmetry between the two inputs looks more like a bug than
> something we should actually want.

I have the same feeling on the function, but I concluded that
coefficients and coordinates should be regarded as different things in
the practical standpoint.

For example, consider Ax + By + C == 0, if B is 0.0, we can remove the
second term from the equation, regardless of the value of y, of course
even if it were inf. that is, The function imitates that kind of
removals.

> After some time spent staring at the specific case of line_closept_point
> and its callers, I decided that the real problems there are twofold.
> First, the API, or at least the callers' interpretation of this
> undocumented point, is that whether the distance is undefined (NaN) is
> equivalent to whether the closest point is undefined.  This is not so;
> in some cases we know that the distance is infinite even though we can't
> calculate a unique closest point.  Second, it's not making any attempt
> to eliminate undefined cases up front.  We can do that pretty easily
> by plugging the point's coordinates into the line's equation Ax+By+C
> and seeing whether we get a NaN.  The attached 0002 is a subset patch
> that just fixes these two issues, and I like the results it produces.

Actually the code reacts to some "problem" cases in a "wrong" way:

+* If it is unclear whether the point is on the line or not, then the
+* results are ill-defined.  This eliminates cases where any of the 
given
+* coordinates are NaN, as well as cases where infinite coordinates give
+* rise to Inf - Inf, 0 * Inf, etc.
+*/
+   if (unlikely(isnan(float8_pl(float8_pl(float8_mul(line->A, point->x),
+   
   float8_mul(line->B, point->y)),
+line->C

| postgres=# select point(1e+300, 'Infinity') <-> line('{1,0,5}');
|  ?column? 
| --
|   NaN

Aren't our guts telling that is 1e+300?  You might be thinking to put
some special case handling into that path (as mentioned below?), but
otherwise it yeildsa "wrong" result.  The reason for the expectation
is that we assume that "completely vertical" lines have a constant x
value regardless of the y coordinate.  That is the reason for the
float8_coef_mul() function.

> I wonder now whether the problems around line_interpt_line() and the
> other intersection-ish functions wouldn't be better handled in a similar
> way, by making changes to their API specs to be clearer about what
> happens with NaNs and trying to eliminate ill-defined cases explicitly.
> I've not tried to code that though.

One of the "ill-defined" cases is the zero-coefficient issue.  The
asymmetric multiply function "fixes" it, at least.  Of course it could
be open-coded instead of being as a function that looks as if having
some general interpretation.

> Changing pg_hypot() the way you've done here is right out.  See the
> comment for the function: what it is doing now is per all the relevant
> standards, and your patch breaks that.  It's extremely unlikely that
> doing it differently from IEEE and POSIX is a good idea.

Mmm. Ok, I agree to that.

> Attached are the same old 0001 (adding the extra point_tbl entry)
> and a small 0002 that 

Re: Strange behavior with polygon and NaN

2020-11-23 Thread Kyotaro Horiguchi
At Sat, 21 Nov 2020 17:33:53 -0500, Tom Lane  wrote in 
> I went ahead and pushed 0001 and 0003 (the latter in two parts), since
> they didn't seem particularly controversial to me.  Just to keep the
> cfbot from whining, here's a rebased version of 0002.

I didn't noticed that inf == inf sould be true (in IEEE754).

# (inf - inf == 0) => false but (inf == inf + 0) == false is somewhat
# uneasy but, yes, it's the standare we are basing on.

So, I agree that the changes of line_construct() and line_(inv)sl()
looks good to me.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Strange behavior with polygon and NaN

2020-11-21 Thread Tom Lane
I went ahead and pushed 0001 and 0003 (the latter in two parts), since
they didn't seem particularly controversial to me.  Just to keep the
cfbot from whining, here's a rebased version of 0002.

regards, tom lane

diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c
index c1dc511a1a..d9f577bd8b 100644
--- a/src/backend/utils/adt/geo_ops.c
+++ b/src/backend/utils/adt/geo_ops.c
@@ -2719,9 +2719,14 @@ lseg_interpt_line(Point *result, LSEG *lseg, LINE *line)
  *---*/
 
 /*
- * If *result is not NULL, it is set to the intersection point of a
- * perpendicular of the line through the point.  Returns the distance
- * of those two points.
+ * Determine the closest point on the given line to the given point.
+ * If result is not NULL, *result is set to the coordinates of that point.
+ * In any case, returns the distance from the point to the line.
+ * Returns NaN for any ill-defined value.
+ *
+ * NOTE: in some cases the distance is well defined (i.e., infinity)
+ * even though the specific closest point is not.  Hence, if you care
+ * about whether the point is well-defined, check its coordinates for NaNs.
  */
 static float8
 line_closept_point(Point *result, LINE *line, Point *point)
@@ -2729,17 +2734,30 @@ line_closept_point(Point *result, LINE *line, Point *point)
 	Point		closept;
 	LINE		tmp;
 
+	/*
+	 * If it is unclear whether the point is on the line or not, then the
+	 * results are ill-defined.  This eliminates cases where any of the given
+	 * coordinates are NaN, as well as cases where infinite coordinates give
+	 * rise to Inf - Inf, 0 * Inf, etc.
+	 */
+	if (unlikely(isnan(float8_pl(float8_pl(float8_mul(line->A, point->x),
+		   float8_mul(line->B, point->y)),
+ line->C
+	{
+		if (result != NULL)
+			result->x = result->y = get_float8_nan();
+		return get_float8_nan();
+	}
+
 	/*
 	 * We drop a perpendicular to find the intersection point.  Ordinarily we
-	 * should always find it, but that can fail in the presence of NaN
-	 * coordinates, and perhaps even from simple roundoff issues.
+	 * should always find it, but perhaps roundoff issues could prevent that.
 	 */
 	line_construct(, point, line_invsl(line));
 	if (!line_interpt_line(, , line))
 	{
 		if (result != NULL)
-			*result = *point;
-
+			result->x = result->y = get_float8_nan();
 		return get_float8_nan();
 	}
 
@@ -2758,7 +2776,9 @@ close_pl(PG_FUNCTION_ARGS)
 
 	result = (Point *) palloc(sizeof(Point));
 
-	if (isnan(line_closept_point(result, line, pt)))
+	(void) line_closept_point(result, line, pt);
+
+	if (unlikely(isnan(result->x) || isnan(result->y)))
 		PG_RETURN_NULL();
 
 	PG_RETURN_POINT_P(result);
diff --git a/src/test/regress/expected/geometry.out b/src/test/regress/expected/geometry.out
index c4f853ae9f..6210075bc1 100644
--- a/src/test/regress/expected/geometry.out
+++ b/src/test/regress/expected/geometry.out
@@ -1070,9 +1070,9 @@ SELECT p.f1, l.s, p.f1 ## l.s FROM POINT_TBL p, LINE_TBL l;
  (1e+300,Infinity) | {0,-1,5}  | (1e+300,5)
  (1e+300,Infinity) | {1,0,5}   | 
  (1e+300,Infinity) | {0,3,0}   | (1e+300,0)
- (1e+300,Infinity) | {1,-1,0}  | (Infinity,NaN)
- (1e+300,Infinity) | {-0.4,-1,-6}  | (-Infinity,NaN)
- (1e+300,Infinity) | {-0.000184615384615,-1,15.3846153846} | (-Infinity,NaN)
+ (1e+300,Infinity) | {1,-1,0}  | 
+ (1e+300,Infinity) | {-0.4,-1,-6}  | 
+ (1e+300,Infinity) | {-0.000184615384615,-1,15.3846153846} | 
  (1e+300,Infinity) | {3,NaN,5} | 
  (1e+300,Infinity) | {NaN,NaN,NaN} | 
  (1e+300,Infinity) | {0,-1,3}  | (1e+300,3)


Re: Strange behavior with polygon and NaN

2020-11-20 Thread Tom Lane
Further to this ...

I realized after looking at things some more that one of
line_closept_point's issues is really a bug in line_construct:
it fails to draw a horizontal line through a point with x = Inf,
though surely that's not particularly ill-defined.  The reason
is that somebody thought they could dispense with a special case
for m == 0, but then we end up doing

result->C = float8_mi(pt->y, float8_mul(m, pt->x));

and if m = 0 and pt->x = Inf, we get NaN.

It also annoyed me that the code was still using DBL_MAX instead of a
true Inf to represent infinite slope.  That's sort of okay as long as
it's just a communication mechanism between line_construct and places
like line_sl, but it's not really okay, because in some places you can
get a true infinity from a slope calculation.  Thus in HEAD you get
different results from

regression=# select line(point(1,2),point(1,'inf'));
   line   
--
 {-1,0,1}
(1 row)

regression=# select line(point(1,2),point(4,'inf'));
  line   
-
 {Infinity,-1,-Infinity}
(1 row)

which is completely silly: we ought to "round off" that infinitesimal
slope to a true vertical, rather than producing a line representation
we can't do anything with.

So I fixed that too, but then I got a weird regression test diff:
the case of
lseg '[(-10,2),(-10,3)]' ?|| lseg '[(-10,2),(-10,3)]'
was no longer returning true.  The reason turned out to be that
lseg_parallel does

PG_RETURN_BOOL(FPeq(lseg_sl(l1), lseg_sl(l2)));

and now lseg_sl is returning true infinities for vertical lines, and
FPeq *gets the wrong answer* when asked to compare Inf to Inf.  It
should say equal, surely, but internally it computes a NaN and ends up
with false.

So the attached 0003 patch also fixes FPeq() and friends to give
sane answers for Inf-vs-Inf comparisons.  That part seems like
a fairly fundamental bug fix, and so I feel like we ought to
go ahead and apply it before we do too much more messing with
the logic in this area.

(Note that the apparently-large results diff in 0003 is mostly
a whitespace change: the first hunk just reflects slopes coming
out as Infinity not DBL_MAX.)

I'm reposting 0001 and 0002 just to keep the cfbot happy,
they're the same as in my previous message.

regards, tom lane

diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 93a8736a3f..76679bae8d 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -157,7 +157,7 @@ SELECT count(*) FROM point_tbl p WHERE p.f1 << '(0.0, 0.0)';
 SELECT count(*) FROM point_tbl p WHERE p.f1 >> '(0.0, 0.0)';
  count 
 ---
- 3
+ 4
 (1 row)
 
 SELECT count(*) FROM point_tbl p WHERE p.f1 <^ '(0.0, 0.0)';
@@ -169,7 +169,7 @@ SELECT count(*) FROM point_tbl p WHERE p.f1 <^ '(0.0, 0.0)';
 SELECT count(*) FROM point_tbl p WHERE p.f1 >^ '(0.0, 0.0)';
  count 
 ---
- 4
+ 5
 (1 row)
 
 SELECT count(*) FROM point_tbl p WHERE p.f1 ~= '(-5, -12)';
@@ -188,10 +188,11 @@ SELECT * FROM point_tbl ORDER BY f1 <-> '0,1';
  (10,10)
  (-5,-12)
  (5.1,34.5)
+ (Infinity,1e+300)
  (1e+300,Infinity)
  (NaN,NaN)
  
-(10 rows)
+(11 rows)
 
 SELECT * FROM point_tbl WHERE f1 IS NULL;
  f1 
@@ -202,16 +203,17 @@ SELECT * FROM point_tbl WHERE f1 IS NULL;
 SELECT * FROM point_tbl WHERE f1 IS NOT NULL ORDER BY f1 <-> '0,1';
 f1 
 ---
- (1e-300,-1e-300)
  (0,0)
+ (1e-300,-1e-300)
  (-3,4)
  (-10,0)
  (10,10)
  (-5,-12)
  (5.1,34.5)
  (1e+300,Infinity)
+ (Infinity,1e+300)
  (NaN,NaN)
-(9 rows)
+(10 rows)
 
 SELECT * FROM point_tbl WHERE f1 <@ '(-10,-10),(10,10)':: box ORDER BY f1 <-> '0,1';
 f1
@@ -464,7 +466,7 @@ SELECT count(*) FROM point_tbl p WHERE p.f1 >> '(0.0, 0.0)';
 SELECT count(*) FROM point_tbl p WHERE p.f1 >> '(0.0, 0.0)';
  count 
 ---
- 3
+ 4
 (1 row)
 
 EXPLAIN (COSTS OFF)
@@ -494,7 +496,7 @@ SELECT count(*) FROM point_tbl p WHERE p.f1 >^ '(0.0, 0.0)';
 SELECT count(*) FROM point_tbl p WHERE p.f1 >^ '(0.0, 0.0)';
  count 
 ---
- 4
+ 5
 (1 row)
 
 EXPLAIN (COSTS OFF)
@@ -530,10 +532,11 @@ SELECT * FROM point_tbl ORDER BY f1 <-> '0,1';
  (10,10)
  (-5,-12)
  (5.1,34.5)
+ (Infinity,1e+300)
  (1e+300,Infinity)
  (NaN,NaN)
  
-(10 rows)
+(11 rows)
 
 EXPLAIN (COSTS OFF)
 SELECT * FROM point_tbl WHERE f1 IS NULL;
@@ -568,9 +571,10 @@ SELECT * FROM point_tbl WHERE f1 IS NOT NULL ORDER BY f1 <-> '0,1';
  (10,10)
  (-5,-12)
  (5.1,34.5)
+ (Infinity,1e+300)
  (1e+300,Infinity)
  (NaN,NaN)
-(9 rows)
+(10 rows)
 
 EXPLAIN (COSTS OFF)
 SELECT * FROM point_tbl WHERE f1 <@ '(-10,-10),(10,10)':: box ORDER BY f1 <-> '0,1';
diff --git a/src/test/regress/expected/geometry.out b/src/test/regress/expected/geometry.out
index 5b9d37030f..81202a4c33 100644
--- a/src/test/regress/expected/geometry.out
+++ b/src/test/regress/expected/geometry.out
@@ -120,6 +120,7 @@ SELECT p1.f1, p2.f1, slope(p1.f1, 

Re: Strange behavior with polygon and NaN

2020-11-20 Thread Tom Lane
I spent some more time looking at this patch.

Some experimentation shows that the changes around bounding box
calculation (ie float8_min_nan() and its call sites) seem to be
completely pointless: removing them doesn't change any of the regression
results.  Nor does using float8_min_nan() in the other two bounding-box
calculations I'd asked about.  So I think we should just drop that set
of changes and stick with the rule that bounding box upper and lower
values are sorted as per float.h comparison rules.  This isn't that hard
to work with: if you want to know whether any NaNs are in the box, test
the upper limit (and *not* the lower limit) for isnan().  Moreover, even
if we wanted a different coding rule, we really can't have it because we
will still need to work with existing on-disk values that have bounding
boxes computed the old way.

I don't much like anything about float8_coef_mul().  In the first place,
FPzero() is an ugly, badly designed condition that we should be trying
to get rid of not add more dependencies on.  In the second place, it's
really unclear why allowing 0 times Inf to be something other than NaN
is a good idea, and it's even less clear why allowing small-but-not-zero
times Inf to be zero rather than Inf is a good idea.  In the third
place, the asymmetry between the two inputs looks more like a bug than
something we should actually want.

After some time spent staring at the specific case of line_closept_point
and its callers, I decided that the real problems there are twofold.
First, the API, or at least the callers' interpretation of this
undocumented point, is that whether the distance is undefined (NaN) is
equivalent to whether the closest point is undefined.  This is not so;
in some cases we know that the distance is infinite even though we can't
calculate a unique closest point.  Second, it's not making any attempt
to eliminate undefined cases up front.  We can do that pretty easily
by plugging the point's coordinates into the line's equation Ax+By+C
and seeing whether we get a NaN.  The attached 0002 is a subset patch
that just fixes these two issues, and I like the results it produces.

I wonder now whether the problems around line_interpt_line() and the
other intersection-ish functions wouldn't be better handled in a similar
way, by making changes to their API specs to be clearer about what
happens with NaNs and trying to eliminate ill-defined cases explicitly.
I've not tried to code that though.

Changing pg_hypot() the way you've done here is right out.  See the
comment for the function: what it is doing now is per all the relevant
standards, and your patch breaks that.  It's extremely unlikely that
doing it differently from IEEE and POSIX is a good idea.

Attached are the same old 0001 (adding the extra point_tbl entry)
and a small 0002 that fixes just line_closept_point.  I've not
tried to figure out just which of the rest of your diffs should be
dropped given that.  I did note though that the example you add
to func.sgml doesn't apply to this version of line_closept_point:

regression=# select point '(Infinity,Infinity)' <-> line '{-1,0,5}';
 ?column? 
--
  NaN
(1 row)

regression=# select point '(Infinity,Infinity)' <-> line '{0,-1,5}';
 ?column? 
--
  NaN
(1 row)

regards, tom lane

diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 93a8736a3f..76679bae8d 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -157,7 +157,7 @@ SELECT count(*) FROM point_tbl p WHERE p.f1 << '(0.0, 0.0)';
 SELECT count(*) FROM point_tbl p WHERE p.f1 >> '(0.0, 0.0)';
  count 
 ---
- 3
+ 4
 (1 row)
 
 SELECT count(*) FROM point_tbl p WHERE p.f1 <^ '(0.0, 0.0)';
@@ -169,7 +169,7 @@ SELECT count(*) FROM point_tbl p WHERE p.f1 <^ '(0.0, 0.0)';
 SELECT count(*) FROM point_tbl p WHERE p.f1 >^ '(0.0, 0.0)';
  count 
 ---
- 4
+ 5
 (1 row)
 
 SELECT count(*) FROM point_tbl p WHERE p.f1 ~= '(-5, -12)';
@@ -188,10 +188,11 @@ SELECT * FROM point_tbl ORDER BY f1 <-> '0,1';
  (10,10)
  (-5,-12)
  (5.1,34.5)
+ (Infinity,1e+300)
  (1e+300,Infinity)
  (NaN,NaN)
  
-(10 rows)
+(11 rows)
 
 SELECT * FROM point_tbl WHERE f1 IS NULL;
  f1 
@@ -202,16 +203,17 @@ SELECT * FROM point_tbl WHERE f1 IS NULL;
 SELECT * FROM point_tbl WHERE f1 IS NOT NULL ORDER BY f1 <-> '0,1';
 f1 
 ---
- (1e-300,-1e-300)
  (0,0)
+ (1e-300,-1e-300)
  (-3,4)
  (-10,0)
  (10,10)
  (-5,-12)
  (5.1,34.5)
  (1e+300,Infinity)
+ (Infinity,1e+300)
  (NaN,NaN)
-(9 rows)
+(10 rows)
 
 SELECT * FROM point_tbl WHERE f1 <@ '(-10,-10),(10,10)':: box ORDER BY f1 <-> '0,1';
 f1
@@ -464,7 +466,7 @@ SELECT count(*) FROM point_tbl p WHERE p.f1 >> '(0.0, 0.0)';
 SELECT count(*) FROM point_tbl p WHERE p.f1 >> '(0.0, 0.0)';
  count 
 ---
- 3
+ 4
 (1 row)
 
 EXPLAIN (COSTS OFF)
@@ -494,7 +496,7 @@ SELECT count(*) 

Re: Strange behavior with polygon and NaN

2020-11-13 Thread Tom Lane
Kyotaro Horiguchi  writes:
> At Tue, 10 Nov 2020 14:30:08 -0500, Tom Lane  wrote in 
>> For instance, {1,-1,0} is the line "x = y".  We could argue about
>> whether it'd be sensible to return zero for the distance between that
>> and the point (inf,inf), but surely any point with one inf and one
>> finite coordinate must be an infinite distance away from that line.
>> There's nothing ill-defined about that situation.

> Mmm... (swinging my arms to mimic lines..)
> dist(x = y, (1e300, Inf)) looks indeterminant to me..

Well, what you're showing is that we get an internal overflow,
essentially, on the way to calculating the result.  Which is true,
so it's sort of accidental that we got a sensible result before.
Nonetheless, we *did* get a sensible result, so producing NaN
instead seems like a regression.

We might need to introduce special-case handling to protect the
low-level calculations from ever seeing NaN or Inf in their inputs.
Getting the right answer to "just fall out" of those calculations
might be an unreasonable hope.

For example, for a line with positive slope (A and B of opposite
signs), I think that the right answer for points (Inf,Inf) and
(-Inf,-Inf) should be NaN, on much the same grounds that Inf
minus Inf is NaN not zero.  But all other points involving any Inf
coordinates are clearly an infinite distance away from that line.

regards, tom lane




Re: Strange behavior with polygon and NaN

2020-11-12 Thread Kyotaro Horiguchi
At Fri, 13 Nov 2020 15:35:58 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> Thank you for the review, Georgios and Tom.
> 
> At Tue, 10 Nov 2020 14:30:08 -0500, Tom Lane  wrote in 
> > I spent some time looking this over, and have a few thoughts:
> > 
> > 1. I think it's useful to split the test changes into two patches,
> > as I've done below: first, just add the additional row in point_tbl
> > and let the fallout from that happen, and then in the second patch
> > make the code changes.  This way, it's much clearer what the actual
> > behavioral changes are.  Some of them don't look right, either.
> > For instance, in the very first hunk in geometry.out, we have
> > this:
> > 
> > - (Infinity,1e+300) | {1,0,5}   |   
> >  NaN |NaN
> > + (Infinity,1e+300) | {1,0,5}   |   
> > Infinity |   Infinity
> > 
> > which seems right, and also this:
> 
> For example, ('Infinity', 1e300) <-> {1,0,5}, that is:
> 
>line "x = -5" <-> point(1e300, Inf)
> 
> So sqrt((1e300 - 5)^2 + Inf^2) = Inf, which looks right.

??! Correction:

 It's sqrt((1e300 - 5)^2 + 0^2) = Inf, which looks right.

reagrds.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Strange behavior with polygon and NaN

2020-11-12 Thread Kyotaro Horiguchi
Thank you for the review, Georgios and Tom.

At Tue, 10 Nov 2020 14:30:08 -0500, Tom Lane  wrote in 
> I spent some time looking this over, and have a few thoughts:
> 
> 1. I think it's useful to split the test changes into two patches,
> as I've done below: first, just add the additional row in point_tbl
> and let the fallout from that happen, and then in the second patch
> make the code changes.  This way, it's much clearer what the actual
> behavioral changes are.  Some of them don't look right, either.
> For instance, in the very first hunk in geometry.out, we have
> this:
> 
> - (Infinity,1e+300) | {1,0,5}   |
> NaN |NaN
> + (Infinity,1e+300) | {1,0,5}   |   
> Infinity |   Infinity
> 
> which seems right, and also this:

For example, ('Infinity', 1e300) <-> {1,0,5}, that is:

   line "x = -5" <-> point(1e300, Inf)

So sqrt((1e300 - 5)^2 + Inf^2) = Inf, which looks right.


> - (1e+300,Infinity) | {1,-1,0}  |   
> Infinity |   Infinity
> - (1e+300,Infinity) | {-0.4,-1,-6}  |   
> Infinity |   Infinity
> - (1e+300,Infinity) | {-0.000184615384615,-1,15.3846153846} |   
> Infinity |   Infinity
> + (1e+300,Infinity) | {1,-1,0}  |
> NaN |NaN
> + (1e+300,Infinity) | {-0.4,-1,-6}  |
> NaN |NaN
> + (1e+300,Infinity) | {-0.000184615384615,-1,15.3846153846} |
> NaN |NaN
> 
> which does not.  Why aren't these distances infinite as well?
> 
> For instance, {1,-1,0} is the line "x = y".  We could argue about
> whether it'd be sensible to return zero for the distance between that
> and the point (inf,inf), but surely any point with one inf and one
> finite coordinate must be an infinite distance away from that line.
> There's nothing ill-defined about that situation.

Mmm... (swinging my arms to mimic lines..)
dist(x = y, (1e300, Inf)) looks indeterminant to me..

The calcuation is performed in the following steps.

1. construct the perpendicular line for the line.
 perpine(1e300, 'Infinity') => {-1, -1, Inf}

2. calculate the cross point.
 corsspoint({-1, -1, Inf}, {1,-1,0}) => (Inf, NaN)

3. calculte the distance from the crosspoint to the point.
 point_dt((Inf, NaN), (1e300, 'Infinity'))
   = HYPOT(Inf - 1e300, NaN - Inf);
   = HYPOT(Inf, NaN);

4. HYPOT changed the behavior by the patch

   Before: HYPOT(Inf, NaN) = Inf
   After : HYPOT(Inf, NaN) = NaN- Result A


So if we will "fix" that, we should fix any, some, or all of 1-3.

1. seems to have no other way than the result.

2. crosspoint (x = - y + Inf, x = y)  could be (Inf, Inf)?

3.point_dt((Inf, Inf), (1e300, Inf))
   = HYPOT(Inf - 1e300, Inf - Inf)
   = HYPOT(Inf, -NaN)
   = NaN.   - Result B

  I'm not sure why Inf - Inf is negative, but |Inf-Inf| = NaN is
  reasonable.

That is, we don't get a "reasonable" result this way.


The formula for the distance((x0,y0) - (ax + by + c = 0)) is

  |ax0 + by0 + c|/sqrt(a^2 + b^2)

 where a = -1, b = -1, c = Inf, x0 = 1e300, y0 = Inf,

abs(-1 * 1e300 + -1 * Inf + Inf) / sqrt(1 + 1)
  = abs(-1e300 -Inf + Inf) / C
  = NaN.   - Result C

All of the Result A - C is NaN. At last NaN looks to be the right
result.

By the way that the formula is far simple than what we are doing
now. Is there any reason to take the above steps for the calculation?


> 2. Rather than coding around undesirable behavior of float8_min,
> it seems like it's better to add a primitive to float.h that
> does what you want, ie "NaN if either input is NaN, else the
> smaller input".  This is more readable, and possibly more efficient
> (depending on whether the compiler is smart enough to optimize
> away redundant isnan checks).  I did that in the attached.

Sounds reasonable. I found that I forgot to do the same thing to y
coordinate.

> 3. Looking for other calls of float8_min, I wonder why you did not
> touch the bounding-box calculations in box_interpt_lseg() or
> boxes_bound_box().

While doing that, I didn't make changes just by looking a code locally
since I thought that that might be looked as overdone.  Maybe, for
example box_interpt_lseg, even if bounding-box check overlooked NaNs,
I thought that the following calcualaions reflect any involved NaNs to
the result. (But I'm not confident that that is perfect, though..)

> 4. The line changes feel a bit weird, like there's no clear idea
> of what a "valid" or "invalid" line is.  For instance the first
> hunk in line_construct():
> 
> + /* Avoid creating a valid line from an invalid point */
> + if (unlikely(isnan(pt->y)))
> + result->C = get_float8_nan();
> 
> Why's it appropriate to set C and only C to NaN?

Not limited to here, 

Re: Strange behavior with polygon and NaN

2020-11-10 Thread Tom Lane
I spent some time looking this over, and have a few thoughts:

1. I think it's useful to split the test changes into two patches,
as I've done below: first, just add the additional row in point_tbl
and let the fallout from that happen, and then in the second patch
make the code changes.  This way, it's much clearer what the actual
behavioral changes are.  Some of them don't look right, either.
For instance, in the very first hunk in geometry.out, we have
this:

- (Infinity,1e+300) | {1,0,5}   |
NaN |NaN
+ (Infinity,1e+300) | {1,0,5}   |   
Infinity |   Infinity

which seems right, and also this:

- (1e+300,Infinity) | {1,-1,0}  |   
Infinity |   Infinity
- (1e+300,Infinity) | {-0.4,-1,-6}  |   
Infinity |   Infinity
- (1e+300,Infinity) | {-0.000184615384615,-1,15.3846153846} |   
Infinity |   Infinity
+ (1e+300,Infinity) | {1,-1,0}  |
NaN |NaN
+ (1e+300,Infinity) | {-0.4,-1,-6}  |
NaN |NaN
+ (1e+300,Infinity) | {-0.000184615384615,-1,15.3846153846} |
NaN |NaN

which does not.  Why aren't these distances infinite as well?
For instance, {1,-1,0} is the line "x = y".  We could argue about
whether it'd be sensible to return zero for the distance between that
and the point (inf,inf), but surely any point with one inf and one
finite coordinate must be an infinite distance away from that line.
There's nothing ill-defined about that situation.

2. Rather than coding around undesirable behavior of float8_min,
it seems like it's better to add a primitive to float.h that
does what you want, ie "NaN if either input is NaN, else the
smaller input".  This is more readable, and possibly more efficient
(depending on whether the compiler is smart enough to optimize
away redundant isnan checks).  I did that in the attached.

3. Looking for other calls of float8_min, I wonder why you did not
touch the bounding-box calculations in box_interpt_lseg() or
boxes_bound_box().

4. The line changes feel a bit weird, like there's no clear idea
of what a "valid" or "invalid" line is.  For instance the first
hunk in line_construct():

+   /* Avoid creating a valid line from an invalid point */
+   if (unlikely(isnan(pt->y)))
+   result->C = get_float8_nan();

Why's it appropriate to set C and only C to NaN?

5. But actually there's a bigger issue with that particular hunk.
This code branch is dealing with "draw a vertical line through this
point", so why should we care what the point's y coordinate is --- that
is, why is this particular change appropriate at all?  The usual rule as
I understand it is that if a function's result is determined by some of
its arguments independently of what another argument's value is, then it
doesn't matter if that one is NaN, you can still return the same result.

6. I'm a bit uncomfortable with the use of "bool isnan" in a couple
of places.  I think it's confusing to use that alongside the isnan()
macro.  Moreover, it's at least possible that some platforms implement
isnan() in a way that would break this usage.  The C spec specifically
says that isnan() is a macro not a function ... but it doesn't commit
to it being a macro-with-arguments.  I think "anynan" or something
like that would be a better choice of name.

[ a bit later... ] Indeed, I get a compile failure on gaur:

geo_ops.c: In function 'lseg_closept_lseg':
geo_ops.c:2906:17: error: called object 'isnan' is not a function
geo_ops.c:2906:32: error: called object 'isnan' is not a function
geo_ops.c:2916:16: error: called object 'isnan' is not a function
geo_ops.c:2924:16: error: called object 'isnan' is not a function
geo_ops.c: In function 'box_closept_point':
geo_ops.c:2989:16: error: called object 'isnan' is not a function
geo_ops.c:2992:16: error: called object 'isnan' is not a function
geo_ops.c:3004:16: error: called object 'isnan' is not a function
geo_ops.c:3014:16: error: called object 'isnan' is not a function
make: *** [geo_ops.o] Error 1

So that scenario isn't hypothetical.  Please rename the variables.

regards, tom lane

diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 93a8736a3f..76679bae8d 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -157,7 +157,7 @@ SELECT count(*) FROM point_tbl p WHERE p.f1 << '(0.0, 0.0)';
 SELECT count(*) FROM point_tbl p WHERE p.f1 >> '(0.0, 0.0)';
  count 
 ---
- 3
+ 4
 (1 row)
 
 SELECT count(*) FROM point_tbl p WHERE p.f1 <^ '(0.0, 0.0)';
@@ -169,7 +169,7 @@ SELECT count(*) FROM point_tbl p WHERE p.f1 <^ '(0.0, 0.0)';
 SELECT count(*) FROM point_tbl p WHERE 

Re: Strange behavior with polygon and NaN

2020-11-04 Thread Kyotaro Horiguchi
At Mon, 02 Nov 2020 14:43:32 +, Georgios Kokolatos 
 wrote in 
> Hi,
> 
> apologies for the very, very late reply to your fixes.
> 
> You have answered/addressed all my questions concerns. The added documentation
> reads well, at least to a non native English speaker.
> 
> The patch still applies and as far as I can see the tests are passing.
> 
> It gets my :+1: and I am changing the status to "Ready for Committer".
> 
> For what little is worth, I learned a lot from this patch, thank you.
> 
> Cheers,
> Georgios
> 
> The new status of this patch is: Ready for Committer

Oh! Thanks. Since a part of this patch is committed (Thanks to Tom.)
this is a rebased version on the commit.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 1005fb2d992c115a5881892d273fed865a6c73d4 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 5 Nov 2020 14:01:16 +0900
Subject: [PATCH v3] Fix NaN handling of some geometric operators and functions

Some geometric operators shows somewhat odd behavior comes from NaN
handling and that leads to inconsistency between index scan and seq
scan on geometric conditions.

For example:
  point '(NaN,NaN)' <-> polygon '((0,0),(0,1),(1,1))' => 0, not NaN
  point '(0.3,0.5)' <-> polygon '((0,0),(0,NaN),(1,1))' => 1.14, not NaN
  point '(NaN,NaN)' <@ polygon '((0,0),(0,1),(1,1))'  => true, not false

Some other functions returned totally wrong result like this:
 point '(1e+300,Infinity)' ## box '(2,2),(0,0)' => '(0,2)'

Fix NaN and Infinity handling of geo_ops.c so that these generates
saner results. However, the behavioral inconsistency that comes from
Infinity cannot be eliminated with a moderate amount of additional
code against the benefit so they are left alone and added
documentation instead.
---
 doc/src/sgml/func.sgml |  16 +
 src/backend/utils/adt/geo_ops.c| 215 --
 src/test/regress/expected/create_index.out |  24 +-
 src/test/regress/expected/geometry.out | 458 ++---
 src/test/regress/expected/point.out| 138 +--
 src/test/regress/sql/point.sql |   2 +
 6 files changed, 629 insertions(+), 224 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7b1dc264f6..1aeeb48585 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -10889,6 +10889,22 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
  on), where available for these types, likewise compare areas.
 

+   
+ 
+   NaN and Infinity make geometric functions and operators behave
+   inconsistently. Geometric operators or functions that return a boolean
+   return false for operands that contain NaNs. Number-returning functions
+   and operators return NaN in most cases but sometimes return a valid
+   value if no NaNs are met while actual calculation.  Object-returning ones
+   yield an object that contain NaNs depending to the operation.  Likewise
+   the objects containing Infinity can make geometric operators and
+   functions behave inconsistently. For example (point
+   '(Infinity,Infinity)' - line '{-1,0,5}') is Infinity but (point
+   '(Infinity,Infinity)' - line '{0,-1,5}') is NaN. It can never
+   be a value other than these, but you should consider it uncertain how
+   geometric operators behave for objects containing Infinity.
+ 
+   
 

 Geometric Functions
diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c
index a7db783958..06deeb6d12 100644
--- a/src/backend/utils/adt/geo_ops.c
+++ b/src/backend/utils/adt/geo_ops.c
@@ -904,9 +904,19 @@ box_intersect(PG_FUNCTION_ARGS)
 
 	result = (BOX *) palloc(sizeof(BOX));
 
-	result->high.x = float8_min(box1->high.x, box2->high.x);
+	/* float8_min conceals NaN, check separately for NaNs */
+	if (unlikely(isnan(box1->high.x) || isnan(box2->high.x)))
+		result->high.x = get_float8_nan();
+	else
+		result->high.x = float8_min(box1->high.x, box2->high.x);
+
 	result->low.x = float8_max(box1->low.x, box2->low.x);
-	result->high.y = float8_min(box1->high.y, box2->high.y);
+
+	if (unlikely(isnan(box1->high.y) || isnan(box2->high.y)))
+		result->high.x = get_float8_nan();
+	else
+		result->high.y = float8_min(box1->high.y, box2->high.y);
+
 	result->low.y = float8_max(box1->low.y, box2->low.y);
 
 	PG_RETURN_BOX_P(result);
@@ -1061,6 +1071,21 @@ line_construct(LINE *result, Point *pt, float8 m)
 		result->A = -1.0;
 		result->B = 0.0;
 		result->C = pt->x;
+
+		/* Avoid creating a valid line from an invalid point */
+		if (unlikely(isnan(pt->y)))
+			result->C = get_float8_nan();
+	}
+	else if (m == 0.0)
+	{
+		/* use "mx - y + yinter = 0" */
+		result->A = 0.0;
+		result->B = -1.0;
+		result->C = pt->y;
+
+		/* Avoid creating a valid line from an invalid point */
+		if (unlikely(isnan(pt->x)))
+			result->C = get_float8_nan();
 	}
 	else
 	{
@@ -1084,6 +1109,7 @@ line_construct_pp(PG_FUNCTION_ARGS)
 	

Re: Strange behavior with polygon and NaN

2020-11-02 Thread Georgios Kokolatos
Hi,

apologies for the very, very late reply to your fixes.

You have answered/addressed all my questions concerns. The added documentation
reads well, at least to a non native English speaker.

The patch still applies and as far as I can see the tests are passing.

It gets my :+1: and I am changing the status to "Ready for Committer".

For what little is worth, I learned a lot from this patch, thank you.

Cheers,
Georgios

The new status of this patch is: Ready for Committer


Re: Strange behavior with polygon and NaN

2020-09-10 Thread Kyotaro Horiguchi
Hello, Georgios.

At Mon, 07 Sep 2020 12:46:50 +, gkokola...@pm.me wrote in 
> ‐‐‐ Original Message ‐‐‐
> On Thursday, 27 August 2020 14:24, Kyotaro Horiguchi 
>  wrote:
> 
> > At Wed, 26 Aug 2020 08:18:49 -0400, Tom Lane t...@sss.pgh.pa.us wrote in
> >
> > > Kyotaro Horiguchi horikyota@gmail.com writes:
> > >
> > > > At Tue, 25 Aug 2020 19:03:50 -0400, Bruce Momjian br...@momjian.us 
> > > > wrote in
> > > >
> > > > > I can confirm that this two-month old email report still produces
> > > > > different results with indexes on/off in git master, which I don't 
> > > > > think
> > > > > is ever correct behavior.
> > >
> > > > I agree to that the behavior is broken.
> > >
> > > Yeah, but ... what is "non broken" in this case? I'm not convinced
> > > that having point_inside() return zero for any case involving NaN
> > > is going to lead to noticeably saner behavior than today.
> >
> > Yes, just doing that leaves many unfixed behavior come from NaNs, but
> > at least it seems to me one of sane definition candidates that a point
> > cannot be inside a polygon when NaN is involved. It's similar to
> > Fpxx() returns false if NaN is involved. As mentioned, I had't fully
> > checked and haven't considered this seriously, but I changed my mind
> > to check all the callers. I started checking all the callers of
> > point_inside, then finally I had to check all functions in geo_ops.c:(
> >
> 
> For what is worth, I agree with this definition.

Thanks.

> > The attached is the result as of now.
> >
> > === Resulting behavior
> >
> > With the attached:
> >
> > -   All boolean functions return false if NaN is involved.
> > -   All float8 functions return NaN if NaN is involved.
> > -   All geometric arithmetics return NaN as output if NaN is involved.
> 
> Agreed! As in both this behaviour conforms to the definition above and the 
> patch provides this behaviour with the exceptions below.
> 
> >
> > With some exceptions:
> >
> > -   line_eq: needs to consider that NaNs are equal each other.
> > -   point_eq/ne (point_eq_pint): ditto
> > -   lseg_eq/ne: ditto
...
> > === pg_hypot mistake?
> >
> > I noticed that pg_hypot returns inf for the parameters (NaN, Inf) but
> > I think NaN is appropriate here since other operators behaves that
> > way. This change causes a change of distance between point(1e+300,Inf)
> > and line (1,-1,0) from infinity to NaN, which I think is correct
> > because the arithmetic generates NaN as an intermediate value.
> >
> > === Infinity annoyances
> >
> > Infinity makes some not-great changes in regresssion results. For example:
> >
> > -   point '(1e+300,Infinity)' <-> path '((10,20))' returns
> > NaN(previously Infinity), but point '(1e+300,Infinity)' <-> path
> > '[(1,2),(3,4)]' returns Infinity. The difference of the two
> > expressions is whether (0 * Inf = NaN) is performed or not. The
> > former performs it but that was concealed by not propagating NaN to
> > upper layer without the patch.
> 
> Although I understand the reasoning for this change. I am not certain I agree 
> with the result. I feel that:
> point '(1e+300,Infinity)' <-> path '((10,20))'
> should return Infinity. Even if I am wrong to think that, the two results 
> should be expected to behave the same. Am I wrong on that too?

No. Actually that's not correct and that just comes from avoiding
special code paths for Infinity.  I put more thought on
line_interpt_line and found that that issue is "fixed" by just
simplifying formulas by removing invariants. But one more if-block is
needed to make the function work a symmetrical way, though..

However, still we have a similar issue.

point '(Infinity,1e+300)' <-> line '{-1,0,5}' => Infinity
point '(Infinity,1e+300)' <-> line '{0,-1,5}' => NaN
point '(Infinity,1e+300)' <-> line '{1,1,5}' => NaN

The second should be 1e+300 and the third infinity. This is because
line_closept_point taking the distance between the foot of the
perpendicular line from the point and the point. We can fix the second
case by adding special code paths for vertical and horizontal lines,
but the third needs another special code path explicitly identifying
Infinity. It seems a kind of too-much..

Finally, I gave up fixing that and added doucmentation.

As another issue, (point '(Infinity, 1e+300)' <-> path '((10,20))')
results in NaN. That is "fixed" by adding a special path for "m ==
0.0" case, but I'm not sure it's worth doing..

By the way I found that float8_div(, infinity) erros
out as underflow. It is inconsistent with the behavior that float8_mul
doesn't error-out as overflow when Infinity is given. So fixed it.

> > -   This is not a difference caused by this patch, but for both patched
> > and unpatched, point '(1e+300,Inf)' <-> line '{3,0,0}' returns NaN,
> > which should be 1e+300. However, the behavior comes from arithmetic
> > reasons and wouldn't be a problem.
> >
> > create_index.out is changed since point(NaN,NaN) <@ polygon 

Re: Strange behavior with polygon and NaN

2020-09-07 Thread gkokolatos





‐‐‐ Original Message ‐‐‐
On Thursday, 27 August 2020 14:24, Kyotaro Horiguchi  
wrote:

> At Wed, 26 Aug 2020 08:18:49 -0400, Tom Lane t...@sss.pgh.pa.us wrote in
>
> > Kyotaro Horiguchi horikyota@gmail.com writes:
> >
> > > At Tue, 25 Aug 2020 19:03:50 -0400, Bruce Momjian br...@momjian.us wrote 
> > > in
> > >
> > > > I can confirm that this two-month old email report still produces
> > > > different results with indexes on/off in git master, which I don't think
> > > > is ever correct behavior.
> >
> > > I agree to that the behavior is broken.
> >
> > Yeah, but ... what is "non broken" in this case? I'm not convinced
> > that having point_inside() return zero for any case involving NaN
> > is going to lead to noticeably saner behavior than today.
>
> Yes, just doing that leaves many unfixed behavior come from NaNs, but
> at least it seems to me one of sane definition candidates that a point
> cannot be inside a polygon when NaN is involved. It's similar to
> Fpxx() returns false if NaN is involved. As mentioned, I had't fully
> checked and haven't considered this seriously, but I changed my mind
> to check all the callers. I started checking all the callers of
> point_inside, then finally I had to check all functions in geo_ops.c:(
>

For what is worth, I agree with this definition.


> The attached is the result as of now.
>
> === Resulting behavior
>
> With the attached:
>
> -   All boolean functions return false if NaN is involved.
> -   All float8 functions return NaN if NaN is involved.
> -   All geometric arithmetics return NaN as output if NaN is involved.

Agreed! As in both this behaviour conforms to the definition above and the 
patch provides this behaviour with the exceptions below.

>
> With some exceptions:
>
> -   line_eq: needs to consider that NaNs are equal each other.
> -   point_eq/ne (point_eq_pint): ditto
> -   lseg_eq/ne: ditto
>
> The change makes some difference in the regression test.
> For example,
>
>  <->  changed from 0 to NaN. (distance)
>
>
>  <@  changed from true to false. (contained)
>  <->  changed from 0 to NaN. (distance)
>  ?#  changed from true to false (overlaps)
>
> === pg_hypot mistake?
>
> I noticed that pg_hypot returns inf for the parameters (NaN, Inf) but
> I think NaN is appropriate here since other operators behaves that
> way. This change causes a change of distance between point(1e+300,Inf)
> and line (1,-1,0) from infinity to NaN, which I think is correct
> because the arithmetic generates NaN as an intermediate value.
>
> === Infinity annoyances
>
> Infinity makes some not-great changes in regresssion results. For example:
>
> -   point '(1e+300,Infinity)' <-> path '((10,20))' returns
> NaN(previously Infinity), but point '(1e+300,Infinity)' <-> path
> '[(1,2),(3,4)]' returns Infinity. The difference of the two
> expressions is whether (0 * Inf = NaN) is performed or not. The
> former performs it but that was concealed by not propagating NaN to
> upper layer without the patch.

Although I understand the reasoning for this change. I am not certain I agree 
with the result. I feel that:
point '(1e+300,Infinity)' <-> path '((10,20))'
should return Infinity. Even if I am wrong to think that, the two results 
should be expected to behave the same. Am I wrong on that too?


>
> -   Without the patch, point '(1e+300,Infinity)' ## box '(2,2),(0,0)'
> generates '(0,2)', which is utterly wrong. It is because
> box_closept_point evaluates float8_lt(Inf, NaN) as true(!) and sets
> the wrong point for distance=NaN is set. With the patch, the NaN
> makes the result NULL.

Agreed.

>
> -   This is not a difference caused by this patch, but for both patched
> and unpatched, point '(1e+300,Inf)' <-> line '{3,0,0}' returns NaN,
> which should be 1e+300. However, the behavior comes from arithmetic
> reasons and wouldn't be a problem.
>
> create_index.out is changed since point(NaN,NaN) <@ polygon changed
> from true to false, which seems rather saner.
>
> I haven't checked unchanged results but at least changed results seems
> saner to me.

All in all a great patch!

It is clean, well reasoned and carefully crafted.

Do you think that the documentation needs to get updated to the 'new' behaviour?


//Georgios


>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>

From 432a671517e78061edc87d18aec291f5629fcbe6 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 27 Aug 2020 14:49:21 +0900
Subject: [PATCH v1] Fix NaN handling of some geometric operators and functions

Some geometric operators shows somewhat odd behavior comes from NaN
handling and that leads to inconsistency between index scan and seq
scan on geometric conditions.

For example:
  point '(NaN,NaN)' <-> polygon '((0,0),(0,1),(1,1))' => 0, not NaN
  point '(0.3,0.5)' <-> polygon '((0,0),(0,NaN),(1,1))' => 1.14, not NaN
  point '(NaN,NaN)' <@ polygon 

Re: Strange behavior with polygon and NaN

2020-08-27 Thread Kyotaro Horiguchi
At Wed, 26 Aug 2020 08:18:49 -0400, Tom Lane  wrote in 
> Kyotaro Horiguchi  writes:
> > At Tue, 25 Aug 2020 19:03:50 -0400, Bruce Momjian  wrote 
> > in 
> >> I can confirm that this two-month old email report still produces
> >> different results with indexes on/off in git master, which I don't think
> >> is ever correct behavior.
> 
> > I agree to that the behavior is broken. 
> 
> Yeah, but ... what is "non broken" in this case?  I'm not convinced
> that having point_inside() return zero for any case involving NaN
> is going to lead to noticeably saner behavior than today.

Yes, just doing that leaves many unfixed behavior come from NaNs, but
at least it seems to me one of sane definition candidates that a point
cannot be inside a polygon when NaN is involved. It's similar to
Fpxx() returns false if NaN is involved. As mentioned, I had't fully
checked and haven't considered this seriously, but I changed my mind
to check all the callers. I started checking all the callers of
point_inside, then finally I had to check all functions in geo_ops.c:(

The attached is the result as of now.

=== Resulting behavior

With the attached:

 - All boolean functions return false if NaN is involved.
 - All float8 functions return NaN if NaN is involved.
 - All geometric arithmetics return NaN as output if NaN is involved.

With some exceptions:
 - line_eq: needs to consider that NaNs are equal each other.
 - point_eq/ne (point_eq_pint): ditto
 - lseg_eq/ne: ditto

The change makes some difference in the regression test.
For example,

 <->  changed from 0 to NaN. (distance)
 <@   changed from true to false. (contained)
 <->  changed from 0 to NaN. (distance)
 ?#   changed from true to false (overlaps)


=== pg_hypot mistake?

I noticed that pg_hypot returns inf for the parameters (NaN, Inf) but
I think NaN is appropriate here since other operators behaves that
way. This change causes a change of distance between point(1e+300,Inf)
and line (1,-1,0) from infinity to NaN, which I think is correct
because the arithmetic generates NaN as an intermediate value.


=== Infinity annoyances

Infinity makes some not-great changes in regresssion results. For example:

- point '(1e+300,Infinity)' <-> path '((10,20))' returns
  NaN(previously Infinity), but point '(1e+300,Infinity)' <-> path
  '[(1,2),(3,4)]' returns Infinity.  The difference of the two
  expressions is whether (0 * Inf = NaN) is performed or not. The
  former performs it but that was concealed by not propagating NaN to
  upper layer without the patch.

- Without the patch, point '(1e+300,Infinity)' ## box '(2,2),(0,0)'
  generates '(0,2)', which is utterly wrong. It is because
  box_closept_point evaluates float8_lt(Inf, NaN) as true(!) and sets
  the wrong point for distance=NaN is set. With the patch, the NaN
  makes the result NULL.

- This is not a difference caused by this patch, but for both patched
  and unpatched, point '(1e+300,Inf)' <-> line '{3,0,0}' returns NaN,
  which should be 1e+300. However, the behavior comes from arithmetic
  reasons and wouldn't be a problem.

create_index.out is changed since point(NaN,NaN) <@ polygon changed
from true to false, which seems rather saner.

I haven't checked unchanged results but at least changed results seems
saner to me.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 432a671517e78061edc87d18aec291f5629fcbe6 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 27 Aug 2020 14:49:21 +0900
Subject: [PATCH v1] Fix NaN handling of some geometric operators and functions

Some geometric operators shows somewhat odd behavior comes from NaN
handling and that leads to inconsistency between index scan and seq
scan on geometric conditions.

For example:
  point '(NaN,NaN)' <-> polygon '((0,0),(0,1),(1,1))' => 0, not NaN
  point '(0.3,0.5)' <-> polygon '((0,0),(0,NaN),(1,1))' => 1.14, not NaN
  point '(NaN,NaN)' <@ polygon '((0,0),(0,1),(1,1))'  => true, not false

Some other functions returned totally wrong result like this:
 point '(1e+300,Infinity)' ## box '(2,2),(0,0)' => '(0,2)'

Fix NaN handling of geo_ops.c so that these generates saner results.
---
 src/backend/utils/adt/geo_ops.c| 183 --
 src/test/regress/expected/create_index.out |   2 +-
 src/test/regress/expected/geometry.out | 209 +
 3 files changed, 248 insertions(+), 146 deletions(-)

diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c
index a7db783958..916ae87d05 100644
--- a/src/backend/utils/adt/geo_ops.c
+++ b/src/backend/utils/adt/geo_ops.c
@@ -904,9 +904,19 @@ box_intersect(PG_FUNCTION_ARGS)
 
 	result = (BOX *) palloc(sizeof(BOX));
 
-	result->high.x = float8_min(box1->high.x, box2->high.x);
+	/* float8_min conceals NaN, check separately for NaNs */
+	if (unlikely(isnan(box1->high.x) || isnan(box2->high.x)))
+		result->high.x = get_float8_nan();
+	else
+		result->high.x = float8_min(box1->high.x, box2->high.x);
+
 	result->low.x = 

Re: Strange behavior with polygon and NaN

2020-08-26 Thread Tom Lane
Kyotaro Horiguchi  writes:
> At Tue, 25 Aug 2020 19:03:50 -0400, Bruce Momjian  wrote in 
>> I can confirm that this two-month old email report still produces
>> different results with indexes on/off in git master, which I don't think
>> is ever correct behavior.

> I agree to that the behavior is broken. 

Yeah, but ... what is "non broken" in this case?  I'm not convinced
that having point_inside() return zero for any case involving NaN
is going to lead to noticeably saner behavior than today.

regards, tom lane




Re: Strange behavior with polygon and NaN

2020-08-26 Thread Kyotaro Horiguchi
At Tue, 25 Aug 2020 19:03:50 -0400, Bruce Momjian  wrote in 
> 
> I can confirm that this two-month old email report still produces
> different results with indexes on/off in git master, which I don't think
> is ever correct behavior.

I agree to that the behavior is broken. 

> ---
> 
> On Wed, Jun 24, 2020 at 03:11:03PM -0700, Jesse Zhang wrote:
> > Hi hackers,
> > 
> > While working with Chris Hajas on merging Postgres 12 with Greenplum
> > Database we stumbled upon the following strange behavior in the geometry
> > type polygon:
> > 
> > -- >8 
> > 
> > CREATE TEMP TABLE foo (p point);
> > CREATE INDEX ON foo USING gist(p);
> > 
> > INSERT INTO foo VALUES ('0,0'), ('1,1'), ('NaN,NaN');
> > 
> > SELECT $q$
> > SELECT * FROM foo WHERE p <@ polygon '(0,0), (0, 100), (100, 100), (100, 0)'
> > $q$ AS qry \gset
> > 
> > BEGIN;
> > SAVEPOINT yolo;
> > SET LOCAL enable_seqscan TO off;
> > :qry;
> > 
> > ROLLBACK TO SAVEPOINT yolo;
> > SET LOCAL enable_indexscan TO off;
> > SET LOCAL enable_bitmapscan TO off;
> > :qry;
> > 
> > -- 8< 
> > 
> > If you run the above repro SQL in HEAD (and 12, and likely all older
> > versions), you get the following output:
> > 
> > CREATE TABLE
> > CREATE INDEX
> > INSERT 0 3
> > BEGIN
> > SAVEPOINT
> > SET
> >p
> > ---
> >  (0,0)
> >  (1,1)
> > (2 rows)
> > 
> > ROLLBACK
> > SET
> > SET
> >  p
> > ---
> >  (0,0)
> >  (1,1)
> >  (NaN,NaN)
> > (3 rows)
> > 
> > 
> > At first glance, you'd think this is the gist AM's bad, but on a second
> > thought, something else is strange here. The following query returns
> > true:
> > 
> > SELECT point '(NaN, NaN)' <@ polygon '(0,0), (0, 100), (100, 100), (100, 0)'
> > 
> > The above behavior of the "contained in" operator is surprising, and
> > it's probably not what the GiST AM is expecting. I took a look at
> > point_inside() in geo_ops.c, and it doesn't seem well equipped to handle
> > NaN. Similary ill-equipped is dist_ppoly_internal() which underlies the
> > distnace operator for polygon. It gives the following interesting
> > output:
> > 
> > SELECT *, c <-> polygon '(0,0),(0,100),(100,100),(100,0)' as distance
> > FROM (
> >   SELECT circle(point(100 * i, 'NaN'), 50) AS c
> >   FROM generate_series(-2, 4) i
> > ) t(c)
> > ORDER BY 2;
> > 
> > c| distance
> > -+--
> >  <(-200,NaN),50> |0
> >  <(-100,NaN),50> |0
> >  <(0,NaN),50>|0
> >  <(100,NaN),50>  |0
> >  <(200,NaN),50>  |  NaN
> >  <(300,NaN),50>  |  NaN
> >  <(400,NaN),50>  |  NaN
> > (7 rows)
> > 
> > Should they all be NaN? Am I alone in thinking the index is right but
> > the operators are wrong? Or should we call the indexes wrong here?


There may be other places that NaN is not fully considered. For
example, the following (perpendicular op) returns true.

SELECT lseg '[(0,0),(0,NaN)]' ?-| lseg '[(0,0),(10,0)]';

It is quite hard to fix all of the defect..

For the above cases, it's enough to make sure that point_inside don't
pass NaN's to lseg_crossing(), but it's much of labor to fill all
holes reasonable and principled way..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c
index a7db783958..1875f8d436 100644
--- a/src/backend/utils/adt/geo_ops.c
+++ b/src/backend/utils/adt/geo_ops.c
@@ -5350,6 +5350,9 @@ point_inside(Point *p, int npts, Point *plist)
 	x0 = float8_mi(plist[0].x, p->x);
 	y0 = float8_mi(plist[0].y, p->y);
 
+	if (isnan(x0) || isnan(y0) || isnan(p->x) || isnan(p->y))
+		return 0;
+
 	prev_x = x0;
 	prev_y = y0;
 	/* loop over polygon points and aggregate total_cross */
@@ -5359,6 +5362,9 @@ point_inside(Point *p, int npts, Point *plist)
 		x = float8_mi(plist[i].x, p->x);
 		y = float8_mi(plist[i].y, p->y);
 
+		if (isnan(x) || isnan(y))
+			return 0;
+
 		/* compute previous to current point crossing */
 		if ((cross = lseg_crossing(x, y, prev_x, prev_y)) == POINT_ON_POLYGON)
 			return 2;


Re: Strange behavior with polygon and NaN

2020-08-25 Thread Bruce Momjian


I can confirm that this two-month old email report still produces
different results with indexes on/off in git master, which I don't think
is ever correct behavior.

---

On Wed, Jun 24, 2020 at 03:11:03PM -0700, Jesse Zhang wrote:
> Hi hackers,
> 
> While working with Chris Hajas on merging Postgres 12 with Greenplum
> Database we stumbled upon the following strange behavior in the geometry
> type polygon:
> 
> -- >8 
> 
> CREATE TEMP TABLE foo (p point);
> CREATE INDEX ON foo USING gist(p);
> 
> INSERT INTO foo VALUES ('0,0'), ('1,1'), ('NaN,NaN');
> 
> SELECT $q$
> SELECT * FROM foo WHERE p <@ polygon '(0,0), (0, 100), (100, 100), (100, 0)'
> $q$ AS qry \gset
> 
> BEGIN;
> SAVEPOINT yolo;
> SET LOCAL enable_seqscan TO off;
> :qry;
> 
> ROLLBACK TO SAVEPOINT yolo;
> SET LOCAL enable_indexscan TO off;
> SET LOCAL enable_bitmapscan TO off;
> :qry;
> 
> -- 8< 
> 
> If you run the above repro SQL in HEAD (and 12, and likely all older
> versions), you get the following output:
> 
> CREATE TABLE
> CREATE INDEX
> INSERT 0 3
> BEGIN
> SAVEPOINT
> SET
>p
> ---
>  (0,0)
>  (1,1)
> (2 rows)
> 
> ROLLBACK
> SET
> SET
>  p
> ---
>  (0,0)
>  (1,1)
>  (NaN,NaN)
> (3 rows)
> 
> 
> At first glance, you'd think this is the gist AM's bad, but on a second
> thought, something else is strange here. The following query returns
> true:
> 
> SELECT point '(NaN, NaN)' <@ polygon '(0,0), (0, 100), (100, 100), (100, 0)'
> 
> The above behavior of the "contained in" operator is surprising, and
> it's probably not what the GiST AM is expecting. I took a look at
> point_inside() in geo_ops.c, and it doesn't seem well equipped to handle
> NaN. Similary ill-equipped is dist_ppoly_internal() which underlies the
> distnace operator for polygon. It gives the following interesting
> output:
> 
> SELECT *, c <-> polygon '(0,0),(0,100),(100,100),(100,0)' as distance
> FROM (
>   SELECT circle(point(100 * i, 'NaN'), 50) AS c
>   FROM generate_series(-2, 4) i
> ) t(c)
> ORDER BY 2;
> 
> c| distance
> -+--
>  <(-200,NaN),50> |0
>  <(-100,NaN),50> |0
>  <(0,NaN),50>|0
>  <(100,NaN),50>  |0
>  <(200,NaN),50>  |  NaN
>  <(300,NaN),50>  |  NaN
>  <(400,NaN),50>  |  NaN
> (7 rows)
> 
> Should they all be NaN? Am I alone in thinking the index is right but
> the operators are wrong? Or should we call the indexes wrong here?
> 
> Cheers,
> Jesse and Chris
> 
> 

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee