Re: [HACKERS] GiST index build versus NaN coordinates

2016-07-16 Thread Tom Lane
Andreas Seltenreich  writes:
> I wrote:
>> Sounds like some fuzz testing with nan/infinity is in order.

> related fallout: close_ps returns a NULL pointer with NaNs around:
> select close_ps('(nan,nan)', '(nan,nan),(nan,nan)');
> -- TRAP: FailedAssertion("!(result != ((void *)0))", File: "geo_ops.c", Line: 
> 2860)

Yeah, that Assert seems way too optimistic.  Even without NaNs, I wonder
whether plain old roundoff error couldn't trigger cases where interpt_sl
fails to find an intersection point.  I'm inclined to just let close_ps
return SQL NULL in such cases.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST index build versus NaN coordinates

2016-07-16 Thread Andreas Seltenreich
I wrote:

> Sounds like some fuzz testing with nan/infinity is in order.

related fallout: close_ps returns a NULL pointer with NaNs around:

select close_ps('(nan,nan)', '(nan,nan),(nan,nan)');
-- TRAP: FailedAssertion("!(result != ((void *)0))", File: "geo_ops.c", Line: 
2860)

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST index build versus NaN coordinates

2016-07-12 Thread Tom Lane
I wrote:
> Andreas Seltenreich  writes:
>> The infinite loop from the bug report was triggered. Further, two
>> previously unseen errors are logged:
>> ERROR:  timestamp cannot be NaN
>> ERROR:  getQuadrant: impossible case
>> The first is porbably as boring as it gets, the second one is from the
>> getQuadrant() in spgquadtreeproc.c.

> Yeah, the first one is presumably from float8_timestamptz() intentionally
> rejecting a NaN, which seems fine.

>> Curiously, the getQuadrant()s in geo_spgist.c and rangetypes_spgist.c do
>> not have such a check.  I guess the boxes will just end up in an
>> undefined position in the index for these.

> Right, we probably want them all to apply some consistent ordering ---
> doesn't matter so much what it is, but float8's rule is as good as any.

I looked into these a bit more closely.  I think rangetypes_spgist.c is
fine as-is: it's relying on the range component type to provide a total
ordering of its values, and if the component type fails to do that, it's
the fault of the component type not the range code.

spgquadtreeproc.c and geo_spgist.c both have got NaN issues, but the code
in both of them is rather tightly tied to the fuzzy-geometric-comparisons
logic that Emre has been messing with.  I think we ought to put "what
to do with NaNs?" into that same can of worms, rather than try to resolve
it separately.  (Yeah, I know, I just made Emre's job even harder.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST index build versus NaN coordinates

2016-07-12 Thread Tom Lane
I wrote:
> I looked into the problem reported in bug #14238,
> https://www.postgresql.org/message-id/20160708151747.1426.60...@wrigleys.postgresql.org
> I think that probably the most reasonable answer is to replace all the
> raw "double" comparisons in this code with float8_cmp_internal() or
> something that's the moral equivalent of that.  Does anyone want to
> propose something else?

Here's a draft patch that modifies gistproc.c to treat NaNs essentially
as though they are infinities.  I've only touched the code associated
with GiST index insertion, not with searches, so it's quite possible that
searches will still not work right in the presence of NaNs.  However,
this does fix the index-build infinite loop complained of originally.

I've not yet looked at the SPGiST issues reported by Andreas.

regards, tom lane

diff --git a/src/backend/access/gist/gistproc.c b/src/backend/access/gist/gistproc.c
index e8213e2..d47211a 100644
*** a/src/backend/access/gist/gistproc.c
--- b/src/backend/access/gist/gistproc.c
***
*** 17,36 
   */
  #include "postgres.h"
  
  #include "access/gist.h"
  #include "access/stratnum.h"
  #include "utils/geo_decls.h"
  
  
  static bool gist_box_leaf_consistent(BOX *key, BOX *query,
  		 StrategyNumber strategy);
- static double size_box(BOX *box);
  static bool rtree_internal_consistent(BOX *key, BOX *query,
  		  StrategyNumber strategy);
  
  /* Minimum accepted ratio of split */
  #define LIMIT_RATIO 0.3
  
  
  /**
   * Box ops
--- 17,47 
   */
  #include "postgres.h"
  
+ #include 
+ 
  #include "access/gist.h"
  #include "access/stratnum.h"
+ #include "utils/builtins.h"
  #include "utils/geo_decls.h"
  
  
  static bool gist_box_leaf_consistent(BOX *key, BOX *query,
  		 StrategyNumber strategy);
  static bool rtree_internal_consistent(BOX *key, BOX *query,
  		  StrategyNumber strategy);
  
  /* Minimum accepted ratio of split */
  #define LIMIT_RATIO 0.3
  
+ /* Convenience macros for NaN-aware comparisons */
+ #define FLOAT8_EQ(a,b)	(float8_cmp_internal(a, b) == 0)
+ #define FLOAT8_LT(a,b)	(float8_cmp_internal(a, b) < 0)
+ #define FLOAT8_LE(a,b)	(float8_cmp_internal(a, b) <= 0)
+ #define FLOAT8_GT(a,b)	(float8_cmp_internal(a, b) > 0)
+ #define FLOAT8_GE(a,b)	(float8_cmp_internal(a, b) >= 0)
+ #define FLOAT8_MAX(a,b)  (FLOAT8_GT(a, b) ? (a) : (b))
+ #define FLOAT8_MIN(a,b)  (FLOAT8_LT(a, b) ? (a) : (b))
+ 
  
  /**
   * Box ops
*** static bool rtree_internal_consistent(BO
*** 40,51 
   * Calculates union of two boxes, a and b. The result is stored in *n.
   */
  static void
! rt_box_union(BOX *n, BOX *a, BOX *b)
  {
! 	n->high.x = Max(a->high.x, b->high.x);
! 	n->high.y = Max(a->high.y, b->high.y);
! 	n->low.x = Min(a->low.x, b->low.x);
! 	n->low.y = Min(a->low.y, b->low.y);
  }
  
  /*
--- 51,103 
   * Calculates union of two boxes, a and b. The result is stored in *n.
   */
  static void
! rt_box_union(BOX *n, const BOX *a, const BOX *b)
  {
! 	n->high.x = FLOAT8_MAX(a->high.x, b->high.x);
! 	n->high.y = FLOAT8_MAX(a->high.y, b->high.y);
! 	n->low.x = FLOAT8_MIN(a->low.x, b->low.x);
! 	n->low.y = FLOAT8_MIN(a->low.y, b->low.y);
! }
! 
! /*
!  * Size of a BOX for penalty-calculation purposes.
!  * The result can be +Infinity, but not NaN.
!  */
! static double
! size_box(const BOX *box)
! {
! 	/*
! 	 * Check for zero-width cases.  Note that we define the size of a zero-
! 	 * by-infinity box as zero.  It's important to special-case this somehow,
! 	 * as naively multiplying infinity by zero will produce NaN.
! 	 *
! 	 * The less-than cases should not happen, but if they do, say "zero".
! 	 */
! 	if (FLOAT8_LE(box->high.x, box->low.x) ||
! 		FLOAT8_LE(box->high.y, box->low.y))
! 		return 0.0;
! 
! 	/*
! 	 * We treat NaN as larger than +Infinity, so any distance involving a NaN
! 	 * and a non-NaN is infinite.  Note the previous check eliminated the
! 	 * possibility that the low fields are NaNs.
! 	 */
! 	if (isnan(box->high.x) || isnan(box->high.y))
! 		return get_float8_infinity();
! 	return (box->high.x - box->low.x) * (box->high.y - box->low.y);
! }
! 
! /*
!  * Return amount by which the union of the two boxes is larger than
!  * the original BOX's area.  The result can be +Infinity, but not NaN.
!  */
! static double
! box_penalty(const BOX *original, const BOX *new)
! {
! 	BOX			unionbox;
! 
! 	rt_box_union(, original, new);
! 	return size_box() - size_box(original);
  }
  
  /*
*** gist_box_consistent(PG_FUNCTION_ARGS)
*** 85,100 
   strategy));
  }
  
  static void
! adjustBox(BOX *b, BOX *addon)
  {
! 	if (b->high.x < addon->high.x)
  		b->high.x = addon->high.x;
! 	if (b->low.x > addon->low.x)
  		b->low.x = addon->low.x;
! 	if (b->high.y < addon->high.y)
  		b->high.y = addon->high.y;
! 	if (b->low.y > addon->low.y)
  		b->low.y = addon->low.y;
  }
  
--- 

Re: [HACKERS] GiST index build versus NaN coordinates

2016-07-12 Thread Tom Lane
Andreas Seltenreich  writes:
> Tom Lane writes:
>> More generally, this example makes me fearful that NaN coordinates in
>> geometric values are likely to cause all sorts of issues.  It's too late
>> to disallow them, probably, but I wonder how can we identify other bugs
>> of this ilk.

> Sounds like some fuzz testing with nan/infinity is in order.  sqlsmith
> doesn't generate any float literals, but it calls functions to satisfy
> its need for values of specific types.  Adding suitable functions[1] to
> the regression db, I made the following observations:

This is really useful, thanks!

> The infinite loop from the bug report was triggered. Further, two
> previously unseen errors are logged:
> ERROR:  timestamp cannot be NaN
> ERROR:  getQuadrant: impossible case
> The first is porbably as boring as it gets, the second one is from the
> getQuadrant() in spgquadtreeproc.c.

Yeah, the first one is presumably from float8_timestamptz() intentionally
rejecting a NaN, which seems fine.

> Curiously, the getQuadrant()s in geo_spgist.c and rangetypes_spgist.c do
> not have such a check.  I guess the boxes will just end up in an
> undefined position in the index for these.

Right, we probably want them all to apply some consistent ordering ---
doesn't matter so much what it is, but float8's rule is as good as any.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST index build versus NaN coordinates

2016-07-12 Thread Andreas Seltenreich
Tom Lane writes:

> More generally, this example makes me fearful that NaN coordinates in
> geometric values are likely to cause all sorts of issues.  It's too late
> to disallow them, probably, but I wonder how can we identify other bugs
> of this ilk.

Sounds like some fuzz testing with nan/infinity is in order.  sqlsmith
doesn't generate any float literals, but it calls functions to satisfy
its need for values of specific types.  Adding suitable functions[1] to
the regression db, I made the following observations:

The infinite loop from the bug report was triggered. Further, two
previously unseen errors are logged:

ERROR:  timestamp cannot be NaN
ERROR:  getQuadrant: impossible case

The first is porbably as boring as it gets, the second one is from the
getQuadrant() in spgquadtreeproc.c.

Curiously, the getQuadrant()s in geo_spgist.c and rangetypes_spgist.c do
not have such a check.  I guess the boxes will just end up in an
undefined position in the index for these.

regards
Andreas

Footnotes:
[1]
create function smith_double_inf() returns float as $$select 
'infinity'::float$$ language sql immutable;
create function smith_double_ninf() returns float as $$select 
'-infinity'::float$$ language sql immutable;
create function smith_double_nan() returns float as $$select 'nan'::float$$ 
language sql immutable;
create function smith_real_nan() returns real as $$select 'nan'::real$$ 
language sql immutable;
create function smith_real_inf() returns real as $$select 'infinity'::real$$ 
language sql immutable;
create function smith_real_ninf() returns real as $$select '-infinity'::real$$ 
language sql immutable;


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST index build versus NaN coordinates

2016-07-11 Thread Tom Lane
Emre Hasegeli  writes:
>> I think that probably the most reasonable answer is to replace all the
>> raw "double" comparisons in this code with float8_cmp_internal() or
>> something that's the moral equivalent of that.  Does anyone want to
>> propose something else?

> We can probably get away by changing the comparison on the GiST code.
> It is not likely to cause inconsistent results.  Comparisons with NaN
> coordinates don't return true to anything, anyway:

Yes, and that is itself inconsistent with the behavior of the primitive
float8 datatype:

regression=# select '4'::float8 < 'NaN'::float8;
 ?column? 
--
 t
(1 row)

I'm inclined to think that we ought to try to make NaNs in geometric types
work like float8 thinks they work, ie they compare higher than non-NaNs.
Yeah, it would make an IEEE-spec purist blanch, but there is no room for
unordered values in a datatype that you would like to be indexable, or
groupable.

> Is it reasonable to disallow NaN coordinates on the next major
> release.  Are there any reason to deal with them?

I don't see how we can do that; what would you do about tables already
containing NaNs?  Even without that consideration, assuming that there's
no way a NaN could creep in seems a pretty fragile assumption, considering
that operations like Infinity/Infinity will produce one.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST index build versus NaN coordinates

2016-07-11 Thread Emre Hasegeli
> I think that probably the most reasonable answer is to replace all the
> raw "double" comparisons in this code with float8_cmp_internal() or
> something that's the moral equivalent of that.  Does anyone want to
> propose something else?

We can probably get away by changing the comparison on the GiST code.
It is not likely to cause inconsistent results.  Comparisons with NaN
coordinates don't return true to anything, anyway:

# select '(3,4),(nan,5)'::box = '(3,4),(nan,5)'::box;
 ?column?
--
 f
(1 row)

# select '(3,4),(nan,5)'::box < '(3,4),(nan,5)'::box;
 ?column?
--
 f
(1 row)

# select '(3,4),(nan,5)'::box > '(3,4),(nan,5)'::box;
 ?column?
--
 f
(1 row)

> More generally, this example makes me fearful that NaN coordinates in
> geometric values are likely to cause all sorts of issues.  It's too late
> to disallow them, probably, but I wonder how can we identify other bugs
> of this ilk.

Is it reasonable to disallow NaN coordinates on the next major
release.  Are there any reason to deal with them?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] GiST index build versus NaN coordinates

2016-07-11 Thread Tom Lane
I looked into the problem reported in bug #14238,
https://www.postgresql.org/message-id/20160708151747.1426.60...@wrigleys.postgresql.org
The submitter was kind enough to give me a copy of the problem data,
and it turns out that the issue is that a few of the boxes contain
NaN coordinates.  Armed with that knowledge, it's trivial to reproduce:

regression=# create table foo (f1 box);
CREATE TABLE
regression=# insert into foo values ('(3,4),(nan,5)');
INSERT 0 1
regression=# insert into foo select box(point(x,x+1),point(x,x+1)) from 
generate_series(1,1000) x;
INSERT 0 1000
regression=# create index on foo using gist(f1);
-- hangs, does not respond to control-C

The infinite loop is at lines 613ff in gistproc.c: once rightLower
contains a NaN, the test

while (i1 < nentries && rightLower == intervalsLower[i1].lower)

can never succeed, so i1 never increments again and the loop cannot exit.

I think that probably the most reasonable answer is to replace all the
raw "double" comparisons in this code with float8_cmp_internal() or
something that's the moral equivalent of that.  Does anyone want to
propose something else?

More generally, this example makes me fearful that NaN coordinates in
geometric values are likely to cause all sorts of issues.  It's too late
to disallow them, probably, but I wonder how can we identify other bugs
of this ilk.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers