Re: [HACKERS] Range Types and length function

2011-06-27 Thread Florian Pflug
On Jun27, 2011, at 03:12 , Jeff Davis wrote:
 But I think you're right, it shouldn't be the responsibility of range
 types. Perhaps I should leave length() as some inlinable SQL functions
 like I mentioned, or perhaps I should remove them completely.

Does the current definition of length(range), i.e.
  upper(range) - lower(range)
deal correctly with open vs. closed ranges and unbounded ranges? I'm thinking
that it probably doesn't - what would be the results of
  length('[0,1]'::intrange) -- Should be 2
  length('[0,1)'::intrange) -- Should be 1
  length('[0,inf]'::intrange) -- Should be infinity, but ints can't
 represent that, can't they?

If it cannot be easily made to support these cases, than I vote for
removing it all together.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types and length function

2011-06-27 Thread Jeff Davis
On Mon, 2011-06-27 at 12:25 +0200, Florian Pflug wrote:
 Does the current definition of length(range), i.e.
   upper(range) - lower(range)
 deal correctly with open vs. closed ranges and unbounded ranges? I'm thinking
 that it probably doesn't - what would be the results of
   length('[0,1]'::intrange) -- Should be 2
   length('[0,1)'::intrange) -- Should be 1

I alluded to this problem in an earlier email.

I think this would need to be handled by the canonical function. If
the canonical function is specified to return values in [) or (] form,
then we'd get the behavior above.

However, it's a little strange, because for discrete ranges you probably
want cardinality, not length. I don't have a clear idea on exactly what
behavior users will expect in this case, which is a pretty good argument
to leave length() out.

   length('[0,inf]'::intrange) -- Should be infinity, but ints can't
  represent that, can't they?

That would throw an exception currently, for exactly the reason you
mention.

 If it cannot be easily made to support these cases, than I vote for
 removing it all together.

I now agree. I think you've brought up some good reasons for that. If
users write upper(r)-lower(r), then they know what the semantics will
be; or they can easily write their own length() function (perhaps
specific to a range type).

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Range Types and length function

2011-06-26 Thread Jeff Davis
Currently, there is no way to define a generic length function over
range types, which would give you the distance between the boundary
points.

It sounds simple, but the system actually needs quite a lot more
information to accomplish that:
 * a function that subtracts two values of the range's subtype
 * it needs to know the result type of that function, which might not be
the subtype (for instance, for timestamp the difference type would be
interval)
 * it needs to know the zero value of the subtype for empty ranges
 * it also needs to know how to canonicalize discrete ranges for
meaningful results -- what's the length of [10,10]? If you write a
difference canonical function should the result be different? I
suppose so.

Even if the system knows all of that, we might run into problems with
the type system, because if you have a generic function:
  f(anyrange) - anyelement
how would it know whether anyelement should be the subtype (e.g. if
f is the function upper) or the difference type (e.g. if f is the
function length)?

My solution to all of this is somewhat simplistic, but the best idea I
have so far:

create function length(anyrange) returns anyelement
language sql as
$$
  select case when $1? then upper($1) - lower($1) else '0' end;
$$;

And then, for timestamp[tz] and date, just define specific functions for
those like:

create function length(tsrange) returns interval
language sql as
$$
  select case when $1? then upper($1) - lower($1) else '0 s' end;
$$;

In other words, special case the range types where the difference type
is not the same as the subtype, and rely on function overloading to sort
them out.

These work for the most part, but they have a few problems:

1. It assumes that - really means minus and is defined effectively
over the subtypes.

2. It assumes that '0' is valid input for the zero value of the
subtype.

3. If the difference type is not the same as the subtype, and you forget
to define the special-case function, then you are bound to get a cryptic
error.

I suppose the right way to solve these problems would be:

1. Force users to supply the minus function.

2. Force users to supply the zero value as a constant of the same type
as the minus function's return value.

3. Check to see if the minus function's return type is different from
the subtype. If so, automatically create a new entry in the catalog for
the length function.

I suppose it's not out of the question to do all of that work, but it
seems like a little much just to get the generic length() function.

I don't mind leaving it as-is, and I think it's a fairly reasonable
solution. But I thought I would re-open it for discussion in case
someone has a better idea. The length() function is obviously an
important function to provide.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types and length function

2011-06-26 Thread Greg Stark
On Sun, Jun 26, 2011 at 8:18 AM, Jeff Davis pg...@j-davis.com wrote:
  * it needs to know the result type of that function, which might not be
 the subtype (for instance, for timestamp the difference type would be
 interval)

What's the use case for the length() function? Is it for users to be
able to display useful information about their ranges? Or is it for
implementing things like GIST indexes?

For the latter a length function that always returns a float might be
more useful. Even if it isn't guaranteed to always be perfectly
precise, that is if ranges of similar length sometimes returned
identical values, at least it could be used for things like penalty().


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types and length function

2011-06-26 Thread Jeff Davis
On Sun, 2011-06-26 at 13:45 +0100, Greg Stark wrote:
 On Sun, Jun 26, 2011 at 8:18 AM, Jeff Davis pg...@j-davis.com wrote:
   * it needs to know the result type of that function, which might not be
  the subtype (for instance, for timestamp the difference type would be
  interval)
 
 What's the use case for the length() function? Is it for users to be
 able to display useful information about their ranges? Or is it for
 implementing things like GIST indexes?

Here I was talking about something for logical use, not GiST. It's
pretty common to want to know how long a range is.

 For the latter a length function that always returns a float might be
 more useful. Even if it isn't guaranteed to always be perfectly
 precise, that is if ranges of similar length sometimes returned
 identical values, at least it could be used for things like penalty().

I already have a function like that. It's actually a function that takes
the subtype and returns a float, and the GiST code does the subtraction.

But you're right, I could have a length function that always returns a
float instead, and that would do the job. Do you see an advantage?

If I had a length function that returned the subtype, I wouldn't need
that. Except for those pesky types like timestamp -- because then, even
if I had a length() function, I'd also need a total order on the
interval type.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types and length function

2011-06-26 Thread Florian Pflug
On Jun26, 2011, at 09:18 , Jeff Davis wrote:
 Currently, there is no way to define a generic length function over
 range types, which would give you the distance between the boundary
 points.

I actually wouldn't expect there to be one. From what I gathered
during the last discussion, the ideal behind range types is that they
model sets of the form {x in T | a = x  b} for arbitrary types
T, with the only requirement being that T be ordered. To compute
a length, you additionally need either an algebraic structure on
T which defines an operation minus, or some metric which defines
distance(a,b). Both are *much* stronger concepts than simply being
ordered. The problems you outline below seem to me to all root in
this discrepancy.

Strings are a nice example of an ordered type on which no intuitive
definition of either s1 - s2 or distance(s1,s2) exists.

 I suppose the right way to solve these problems would be:
 
 1. Force users to supply the minus function.
 
 2. Force users to supply the zero value as a constant of the same type
 as the minus function's return value.
 
 3. Check to see if the minus function's return type is different from
 the subtype. If so, automatically create a new entry in the catalog for
 the length function.
 
 I suppose it's not out of the question to do all of that work, but it
 seems like a little much just to get the generic length() function.

That sounds like a poor man's version of type interfaces - i.e. of
a general-purpose way of having various algebraic structures defined
on a type. Having real type interface would be cool, but I don't think
that ranges should introduce a simplistic version of them.

 I don't mind leaving it as-is, and I think it's a fairly reasonable
 solution. But I thought I would re-open it for discussion in case
 someone has a better idea.

I suggest to simply provide no length() function at all. It's beyond
the realm of the mental model behind range types, and providing some
ad-hoc definition is IMHO bound to create more confusion than it'll
help. Especially since upper(range) - lower(range) isn't that much
longer to write than length(range) anyway.

 The length() function is obviously an
 important function to provide.


I'd say it isn't, but maybe I'm missing some use-case that you have
in mind.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types and length function

2011-06-26 Thread Jeff Davis
On Mon, 2011-06-27 at 00:37 +0200, Florian Pflug wrote:
 I actually wouldn't expect there to be one. From what I gathered
 during the last discussion, the ideal behind range types is that they
 model sets of the form {x in T | a = x  b} for arbitrary types
 T, with the only requirement being that T be ordered. To compute
 a length, you additionally need either an algebraic structure on
 T which defines an operation minus, or some metric which defines
 distance(a,b). Both are *much* stronger concepts than simply being
 ordered. The problems you outline below seem to me to all root in
 this discrepancy.

I agree with you here. It does seem like supporting length() increases
the the complexity of range types quite a bit.

 Strings are a nice example of an ordered type on which no intuitive
 definition of either s1 - s2 or distance(s1,s2) exists.

Another good point. There's no logical length() function at all for a
text range.

  The length() function is obviously an
  important function to provide.
 
 
 I'd say it isn't, but maybe I'm missing some use-case that you have
 in mind.

The reason I said that is because, if making only a single range type
for, say, timestamptz, I would make a length() function without even
thinking about it.

There are a few types of queries where that kind of thing is useful,
like billing based on the amount of time some resource is allocated to
you.

But I think you're right, it shouldn't be the responsibility of range
types. Perhaps I should leave length() as some inlinable SQL functions
like I mentioned, or perhaps I should remove them completely.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers