Re: [HACKERS] range_adjacent and discrete ranges

2011-11-23 Thread Tom Lane
I wrote:
 Attached is a draft patch for this.  It passes regression tests but I've
 not tried to exercise it with a canonical function that actually does
 something different.

I hacked up int4range_canonical to produce []-style ranges, and
confirmed that this version of range_adjacent seems to work with them.

 It's going to be a bit slower than Jeff's
 original, because it does not only range_cmp_bound_values but also a
 make_range cycle (in most cases).  So I guess the question is how much
 we care about supporting canonical functions with non-default policies.
 Thoughts?

I did a little bit of performance testing on an x86_64 machine (Fedora 14),
and found that the time to execute a clause like
WHERE int4range(1,2) -|- int4range(x, 1000)
(x being an integer Var) grows from 0.37 us to 0.56 us if we adopt the
patched version of range_adjacent.  With float8 ranges it grows from
0.35 us to 0.54 us.  So these are noticeable penalties but they don't
seem like show-stoppers.  Since the alternative is to document that
the apparent freedom to choose a canonicalization policy is illusory,
I'm inclined to think we should change it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] range_adjacent and discrete ranges

2011-11-23 Thread Tom Lane
I wrote:
 I did a little bit of performance testing on an x86_64 machine (Fedora 14),
 and found that the time to execute a clause like
   WHERE int4range(1,2) -|- int4range(x, 1000)
 (x being an integer Var) grows from 0.37 us to 0.56 us if we adopt the
 patched version of range_adjacent.  With float8 ranges it grows from
 0.35 us to 0.54 us.  So these are noticeable penalties but they don't
 seem like show-stoppers.  Since the alternative is to document that
 the apparent freedom to choose a canonicalization policy is illusory,
 I'm inclined to think we should change it.

It occurred to me that we can easily buy back the extra time for range
types that don't have a canonical function (ie, continuous ranges).
If there's no such function, it's impossible for B..C to normalize to
empty when B  C, so we can skip the extra logic.  The attached version
is no slower than the original code for continuous ranges, and doesn't
seem measurably different from my previous patch for discrete ranges.

regards, tom lane

*** /home/postgres/pgsql/src/backend/utils/adt/rangetypes.c	Tue Nov 22 23:18:51 2011
--- new/rangetypes.c	Wed Nov 23 16:29:20 2011
***
*** 699,704 
--- 699,706 
  upper2;
  	bool		empty1,
  empty2;
+ 	RangeType  *r3;
+ 	int			cmp;
  
  	/* Different types should be prevented by ANYRANGE matching rules */
  	if (RangeTypeGetOid(r1) != RangeTypeGetOid(r2))
***
*** 714,736 
  		PG_RETURN_BOOL(false);
  
  	/*
! 	 * For two ranges to be adjacent, the lower boundary of one range has to
! 	 * match the upper boundary of the other. However, the inclusivity of
! 	 * those two boundaries must also be different.
  	 *
! 	 * The semantics for range_cmp_bounds aren't quite what we need here, so
! 	 * we do the comparison more directly.
  	 */
! 	if (lower1.inclusive != upper2.inclusive)
  	{
! 		if (range_cmp_bound_values(typcache, lower1, upper2) == 0)
! 			PG_RETURN_BOOL(true);
  	}
  
! 	if (upper1.inclusive != lower2.inclusive)
  	{
! 		if (range_cmp_bound_values(typcache, upper1, lower2) == 0)
! 			PG_RETURN_BOOL(true);
  	}
  
  	PG_RETURN_BOOL(false);
--- 716,774 
  		PG_RETURN_BOOL(false);
  
  	/*
! 	 * Given two ranges A..B and C..D, where B  C, the ranges are adjacent
! 	 * if and only if the range B..C is empty, where inclusivity of these two
! 	 * bounds is inverted compared to the original bounds.  For discrete
! 	 * ranges, we have to rely on the canonicalization function to normalize
! 	 * B..C to empty if it contains no elements of the subtype.  (If there is
! 	 * no canonicalization function, it's impossible for such a range to
! 	 * normalize to empty, so we needn't bother to try.)
! 	 *
! 	 * If B == C, the ranges are adjacent only if these bounds have different
! 	 * inclusive flags (i.e., exactly one of the ranges includes the common
! 	 * boundary point).
  	 *
! 	 * And if B  C then the ranges cannot be adjacent in this order, but we
! 	 * must consider the other order (i.e., check D = A).
  	 */
! 	cmp = range_cmp_bound_values(typcache, upper1, lower2);
! 	if (cmp  0)
! 	{
! 		/* in a continuous subtype, there are assumed to be points between */
! 		if (!OidIsValid(typcache-rng_canonical_finfo.fn_oid))
! 			PG_RETURN_BOOL(false);
! 		/* flip the inclusion flags */
! 		upper1.inclusive = !upper1.inclusive;
! 		lower2.inclusive = !lower2.inclusive;
! 		/* change upper/lower labels to avoid Assert failures */
! 		upper1.lower = true;
! 		lower2.lower = false;
! 		r3 = make_range(typcache, upper1, lower2, false);
! 		PG_RETURN_BOOL(RangeIsEmpty(r3));
! 	}
! 	if (cmp == 0)
  	{
! 		PG_RETURN_BOOL(upper1.inclusive != lower2.inclusive);
  	}
  
! 	cmp = range_cmp_bound_values(typcache, upper2, lower1);
! 	if (cmp  0)
! 	{
! 		/* in a continuous subtype, there are assumed to be points between */
! 		if (!OidIsValid(typcache-rng_canonical_finfo.fn_oid))
! 			PG_RETURN_BOOL(false);
! 		/* flip the inclusion flags */
! 		upper2.inclusive = !upper2.inclusive;
! 		lower1.inclusive = !lower1.inclusive;
! 		/* change upper/lower labels to avoid Assert failures */
! 		upper2.lower = true;
! 		lower1.lower = false;
! 		r3 = make_range(typcache, upper2, lower1, false);
! 		PG_RETURN_BOOL(RangeIsEmpty(r3));
! 	}
! 	if (cmp == 0)
  	{
! 		PG_RETURN_BOOL(upper2.inclusive != lower1.inclusive);
  	}
  
  	PG_RETURN_BOOL(false);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] range_adjacent and discrete ranges

2011-11-22 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 More formally, let there be two arbitrary ranges
   a,b,i_a,i_b
   c,d,i_c,i_d
 where the first two parameters are the lower respectively upper bound, and
 the last two are booleans saying whether the lower respectively upper bound
 is inclusive (true) or exclusive (false).

 These ranges are then adjacent exactly if the range
   b,c,!i_b,!i_c
 is empty.

I tried to implement this, and I think it has a small bug.  It works as
stated if we have b  c.  However, if we have b == c, then we want to
consider the ranges adjacent if i_b != i_c (ie, only one of them claims
the common boundary point).  But a singleton range is empty unless it's
inclusive on both sides.  So we have to have a special case when the
bounds are equal.

(If b  c, then of course we have to consider the two ranges in the
opposite order.)

Attached is a draft patch for this.  It passes regression tests but I've
not tried to exercise it with a canonical function that actually does
something different.  It's going to be a bit slower than Jeff's
original, because it does not only range_cmp_bound_values but also a
make_range cycle (in most cases).  So I guess the question is how much
we care about supporting canonical functions with non-default policies.
Thoughts?

regards, tom lane

diff --git a/src/backend/utils/adt/rangetypes.c b/src/backend/utils/adt/rangetypes.c
index 3326cb17c895273fd01c4eda5eb0d65a521d0168..890b6850cb6f5611e9afa922b85b0c64125f31bb 100644
*** a/src/backend/utils/adt/rangetypes.c
--- b/src/backend/utils/adt/rangetypes.c
*** range_adjacent(PG_FUNCTION_ARGS)
*** 699,704 
--- 699,706 
  upper2;
  	bool		empty1,
  empty2;
+ 	RangeType  *r3;
+ 	int			cmp;
  
  	/* Different types should be prevented by ANYRANGE matching rules */
  	if (RangeTypeGetOid(r1) != RangeTypeGetOid(r2))
*** range_adjacent(PG_FUNCTION_ARGS)
*** 714,736 
  		PG_RETURN_BOOL(false);
  
  	/*
! 	 * For two ranges to be adjacent, the lower boundary of one range has to
! 	 * match the upper boundary of the other. However, the inclusivity of
! 	 * those two boundaries must also be different.
! 	 *
! 	 * The semantics for range_cmp_bounds aren't quite what we need here, so
! 	 * we do the comparison more directly.
  	 */
! 	if (lower1.inclusive != upper2.inclusive)
  	{
! 		if (range_cmp_bound_values(typcache, lower1, upper2) == 0)
! 			PG_RETURN_BOOL(true);
  	}
! 
! 	if (upper1.inclusive != lower2.inclusive)
  	{
! 		if (range_cmp_bound_values(typcache, upper1, lower2) == 0)
! 			PG_RETURN_BOOL(true);
  	}
  
  	PG_RETURN_BOOL(false);
--- 716,758 
  		PG_RETURN_BOOL(false);
  
  	/*
! 	 * Given two ranges A..B and C..D, where B  C, the ranges are adjacent
! 	 * if and only if the range B..C is empty, where inclusivity of these two
! 	 * bounds is inverted compared to the original bounds.  For discrete
! 	 * ranges, we have to rely on the canonicalization function to normalize
! 	 * B..C to empty if it contains no elements of the subtype.  If B == C,
! 	 * the ranges are adjacent only if these bounds have different inclusive
! 	 * flags (i.e., exactly one of the ranges includes the common boundary).
! 	 * And if B  C then the ranges cannot be adjacent in this order, but we
! 	 * must consider the other order (i.e., check D = A).
  	 */
! 	cmp = range_cmp_bound_values(typcache, upper1, lower2);
! 	if (cmp  0)
  	{
! 		upper1.inclusive = !upper1.inclusive;
! 		upper1.lower = true;
! 		lower2.inclusive = !lower2.inclusive;
! 		lower2.lower = false;
! 		r3 = make_range(typcache, upper1, lower2, false);
! 		PG_RETURN_BOOL(RangeIsEmpty(r3));
  	}
! 	if (cmp == 0)
  	{
! 		PG_RETURN_BOOL(upper1.inclusive != lower2.inclusive);
! 	}
! 	cmp = range_cmp_bound_values(typcache, upper2, lower1);
! 	if (cmp  0)
! 	{
! 		upper2.inclusive = !upper2.inclusive;
! 		upper2.lower = true;
! 		lower1.inclusive = !lower1.inclusive;
! 		lower1.lower = false;
! 		r3 = make_range(typcache, upper2, lower1, false);
! 		PG_RETURN_BOOL(RangeIsEmpty(r3));
! 	}
! 	if (cmp == 0)
! 	{
! 		PG_RETURN_BOOL(upper2.inclusive != lower1.inclusive);
  	}
  
  	PG_RETURN_BOOL(false);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] range_adjacent and discrete ranges

2011-11-19 Thread Jeff Davis
On Fri, 2011-11-18 at 14:47 -0500, Tom Lane wrote:
 Yeah, probably not.  However, I don't like the idea of
 '(3,4)'::int4range throwing an error, as it currently does, because it
 seems to require the application to have quite a lot of knowledge of the
 range semantics to avoid having errors sprung on it.

OK, then let's make '(3,4)'::int4range the empty range. (3,3) might be
OK as well (for any range type), because at least it's consistent.

The one that I find strange is [3,3), but I think that needs to work for
the range_adjacent idea to work. Seeing it as useful in the context of
range_adjacent might mean that it's useful elsewhere, too, so now I'm
leaning toward supporting [3,3) as an empty range.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] range_adjacent and discrete ranges

2011-11-19 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 On Fri, 2011-11-18 at 14:47 -0500, Tom Lane wrote:
 Yeah, probably not.  However, I don't like the idea of
 '(3,4)'::int4range throwing an error, as it currently does, because it
 seems to require the application to have quite a lot of knowledge of the
 range semantics to avoid having errors sprung on it.

 OK, then let's make '(3,4)'::int4range the empty range. (3,3) might be
 OK as well (for any range type), because at least it's consistent.

 The one that I find strange is [3,3), but I think that needs to work for
 the range_adjacent idea to work. Seeing it as useful in the context of
 range_adjacent might mean that it's useful elsewhere, too, so now I'm
 leaning toward supporting [3,3) as an empty range.

OK, so the thought is that the only actual error condition would be
lower bound value  upper bound value?  Otherwise, if the range
specification represents an empty set, that's fine, we just normalize it
to 'empty'.  Seems consistent to me.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] range_adjacent and discrete ranges

2011-11-19 Thread Florian Pflug
On Nov19, 2011, at 22:03 , Tom Lane wrote:
 Jeff Davis pg...@j-davis.com writes:
 On Fri, 2011-11-18 at 14:47 -0500, Tom Lane wrote:
 Yeah, probably not.  However, I don't like the idea of
 '(3,4)'::int4range throwing an error, as it currently does, because it
 seems to require the application to have quite a lot of knowledge of the
 range semantics to avoid having errors sprung on it.
 
 OK, then let's make '(3,4)'::int4range the empty range. (3,3) might be
 OK as well (for any range type), because at least it's consistent.
 
 The one that I find strange is [3,3), but I think that needs to work for
 the range_adjacent idea to work. Seeing it as useful in the context of
 range_adjacent might mean that it's useful elsewhere, too, so now I'm
 leaning toward supporting [3,3) as an empty range.
 
 OK, so the thought is that the only actual error condition would be
 lower bound value  upper bound value?  Otherwise, if the range
 specification represents an empty set, that's fine, we just normalize it
 to 'empty'.  Seems consistent to me.

+1. Making the error condition independent from the boundary type makes
it easy for callers to avoid the error conditions. And it should still
catch mistakes that stem from swapping the lower and upper bound.

best regards,
Florian Pflug




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] range_adjacent and discrete ranges

2011-11-18 Thread Jeff Davis
While thinking about range_cmp_bounds, I started to think that the way
range_adjacent works is wrong.

range_adjacent() depends on the bounds of two ranges to match up, such
that the boundary values are equal, but one is exclusive and the other
inclusive, and one is a lower bound and the other an upper bound.

That makes perfect sense for continuous ranges because that is the only
way they can possibly be adjacent. It also works for the discrete ranges
as defined so far, because they use a canonical function that translates
the values to [) form. But if someone were to write a canonical function
that translates the ranges to [] or () form, range_adjacent would be
useless.

There are a few approaches to solving it:

1. Make the canonical function accept a format argument much like the
constructor, and have an output format stored in the catalog that would
be passed in under most circumstances. However, range_adjacent could
pass in its own form that would be useful for its purposes.

2. Replace the canonical function with inc and dec functions, or
some variation thereof. We'd still need some kind of output format to
match the current behavior, otherwise every range would always be output
in [) form (I don't necessarily object to that, but it would be a
behavior difference for user-defined range types).

3. Remove range_adjacent.

4. Do nothing, and document that the canonical function should translate
to either [) or (] if you want range_adjacent to work.

Thoughts?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] range_adjacent and discrete ranges

2011-11-18 Thread Florian Pflug
On Nov18, 2011, at 09:25 , Jeff Davis wrote:
 While thinking about range_cmp_bounds, I started to think that the way
 range_adjacent works is wrong.
 
 range_adjacent() depends on the bounds of two ranges to match up, such
 that the boundary values are equal, but one is exclusive and the other
 inclusive, and one is a lower bound and the other an upper bound.

 That makes perfect sense for continuous ranges because that is the only
 way they can possibly be adjacent. It also works for the discrete ranges
 as defined so far, because they use a canonical function that translates
 the values to [) form. But if someone were to write a canonical function
 that translates the ranges to [] or () form, range_adjacent would be
 useless.

Hm, the problem here is for range_adjacent to recognize that [1,2] is
adjacent to [3,4] when treated as integer ranges, but that they're not
adjacent when treated as float ranges. The reason being, of course, that
there's isn't any integer between 2 and 3, but there are floats between
2 and 3.

That information, however, *is* already contained in the canonical
functions, because those function know that (2,3) are empty as an integer
range, but non-empty as a float range.

For example, [1,2] is adjacent to [3,4] as integer ranges because (2,3)
is empty as an integer range. Conversely, since (2,3) is *not* empty as a
float range, [1,2] and [3,4] are *not* adjacent as float ranges.

More formally, let there be two arbitrary ranges
  a,b,i_a,i_b
  c,d,i_c,i_d
where the first two parameters are the lower respectively upper bound, and
the last two are booleans saying whether the lower respectively upper bound
is inclusive (true) or exclusive (false).

These ranges are then adjacent exactly if the range
  b,c,!i_b,!i_c
is empty.

This definition does not depend on any specific canonical form of ranges,
only on the canonicalize function's ability to detect empty ranges.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] range_adjacent and discrete ranges

2011-11-18 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 ...This definition does not depend on any specific canonical form of ranges,
 only on the canonicalize function's ability to detect empty ranges.

Hmm, well, now that you mention it, I don't think the current canonical
functions handle empty ranges very nicely at all.  They tend to spit up:

regression=# select int4range(4,4,'[]');
 int4range 
---
 [4,5)
(1 row)

regression=# select int4range(4,4,'(]');
ERROR:  range lower bound must be less than or equal to range upper bound
regression=# select int4range(4,4,'()');
ERROR:  range lower bound must be less than or equal to range upper bound

Would it be better for them to silently transform such cases to empty?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] range_adjacent and discrete ranges

2011-11-18 Thread Jeff Davis
On Fri, 2011-11-18 at 10:33 -0500, Tom Lane wrote:
 regression=# select int4range(4,4,'(]');
 ERROR:  range lower bound must be less than or equal to range upper bound
 regression=# select int4range(4,4,'()');
 ERROR:  range lower bound must be less than or equal to range upper bound
 
 Would it be better for them to silently transform such cases to empty?

That had crossed my mind, but I read the first as saying that it
includes 4 and doesn't include 4, which is a little confusing.

But I wouldn't object to making them return empty ranges. Seeing that we
removed some other errors in favor of returning something, it might be a
little more consistent to return empty when possible.

I wouldn't like to extend that to int4range(4,3), however. When the
upper bound is less than the lower bound, it's almost certainly a
mistake, and the user should be informed.

By the way, what does this have to do with canonical functions? This
seems more like a constructor issue, and there is already a
zero-argument constructor to make empty ranges.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] range_adjacent and discrete ranges

2011-11-18 Thread Jeff Davis
On Fri, 2011-11-18 at 13:32 +0100, Florian Pflug wrote:
 That information, however, *is* already contained in the canonical
 functions, because those function know that (2,3) are empty as an integer
 range, but non-empty as a float range.

Very good point. Thank you.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] range_adjacent and discrete ranges

2011-11-18 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 On Fri, 2011-11-18 at 10:33 -0500, Tom Lane wrote:
 Would it be better for them to silently transform such cases to empty?

 I wouldn't like to extend that to int4range(4,3), however. When the
 upper bound is less than the lower bound, it's almost certainly a
 mistake, and the user should be informed.

Yeah, probably not.  However, I don't like the idea of
'(3,4)'::int4range throwing an error, as it currently does, because it
seems to require the application to have quite a lot of knowledge of the
range semantics to avoid having errors sprung on it.

 By the way, what does this have to do with canonical functions? This
 seems more like a constructor issue, and there is already a
 zero-argument constructor to make empty ranges.

What I was concerned about was whether Florian's idea of implementing
range_adjacent by testing for empty intervening range would work, or
would fail because of errors getting thrown.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers