Re: [HACKERS] Range Types, constructors, and the type system

Darren Duncan Sun, 26 Jun 2011 00:58:22 -0700

To eludicate my earlier comments on this subject ...

I believe that the best general solution here is for every ordered base type tojust have a single total order, which is always used with that type in anygeneric order-sensitive operation, including any ranges defined over it, or anyORDER BY or any <,>,etc. The built-in generic text type would have exactly 1system-defined collation that can't be changed, and it would be something simpleand generic, such as simply sorting on the codepoint as integers.

When we want to have some other "native" ordering for an existing type, such aswhen we want to use a different text collation, we do this by creating a *newbase type*, using CREATE TYPE or some shorthand thereof, and this new typedefines its own ordering, such as that a particular text collation is used.


For example:

  CREATE TYPE text__en_US AS (v text);

  CREATE TYPE text__C AS (v text);

These will not compare equal to each other or to text, and that is good, becausehaving a different text collation implies that we consider 'foo'::text__en_USand 'foo'::text__C to be different values.

I believe that any other approach is worse, and in particular I believe thatcreating DOMAIN over text is worse, because DOMAIN are supposed to be subtypesof some other type, whose set of member values is a subset of the other type'svalues, and that have the same ordering. Multiple CREATE type over the samebase type don't interfere with each other like multiple DOMAIN could.

Assuming that what CREATE TYPE produces is actually a base type, I believe thereis no better solution using the facilities that SQL provides.

If there is concern about performance related to CREATE TYPE being a compositetype, I'm sure it is possible to engineer an optimization for when the type hasjust 1 attribute so that performance isn't an issue. The main point I'm tryingto raise here is about semantics and good type systems.

Likewise, don't let concern about syntax for using values of such compositetypes. Once again, there can be shorthands if necessary.

In fact, Postgres could provide a general shorthand for creating a compositetype of 1 attribute whose purpose is to make one type that is like but unequalto another, and using this shorthand could also cause the composite type tooverload/polymorph all the operators of it's attribute type, so that the syntaxto define one is very short.


For example:

  CREATE TYPE text__en_US WRAPS text COLLATE en_US;

... and I assume the name of that attribute would just be system-defined.

Note that the above is specific to wrapping text, and the COLLATE is justshorthand for defining an ordering function for text__en_US. A more generalform could be:


  CREATE TYPE bar WRAPS foo ORDER USING FUNCTION baz (lhs foo, rhs foo) ...;

And then we can say:

  RANGE OF text__en_US

  RANGE OF text

... similarly to how we declare array types with ARRAY.

One can also just define range values as they do array values, such as like 
this:

  range('foo','bar')  # default collation

  range('foo'::text__en_US, 'bar'::text__en_US)  # en_us collation

If that seems verbose, I have a few words for you:

1. Users should in practice name their wrapper types over their intendedmeaning, not their mechanics, such as like this (not using text for variety),and that may be more terse:


  CREATE TYPE acct_num WRAPS integer;  # inherits integer ordering by default

2. If the wrapper types overload the base operators, either automatically orselectively (does it make sense to multiply an acct_num?), one doesn't have tokeep unpacking and packing them to use them in most cases. For example, I'dexpect many text wrappers to polymorph catenation or substring etc, so no extrasyntax.

3. In practice, most literal values come from applications and are given to SQLcode either as function parameters or bind parameter arguments. While lots ofexample code may have literal values in it, I would think that most real-workcode would hardly have any, and hence you'd rarely see any 'foo'::text__en_USfor example. You'd more likely see the less common var::text__en_US or such.

So that's my position, CREATE TYPE on the regular types or the like is the bestsolution, and anything else is an inferior solution.


Such a design is also how I do collations and ranges in my Muldis D language.

-- Darren Duncan

Jeff Davis wrote:

Different ranges over the same subtype make sense when using different
total orders for the subtype. This is most apparent with text collation,
but makes sense (at least mathematically, if not practically) for any
subtype.

For instance:
 [a, Z)
is a valid range in "en_US", but not in "C", so it makes sense to have
multiple ranges over the same subtype with different collations.

But what if you have a function (like a constructor), of the form:
  (anyelement, anyelement) -> anyrange
? To work with the type system, you need to be able to figure out the
return type from the arguments; which means to support functions like
this we need a mapping from the subtype to the range type.
Unfortunately, that restricts us to one range type per subtype (this
isn't a problem for ARRAYs, because there is only one useful array type
for a given element type).

This problem first came up a while ago:
http://archives.postgresql.org/pgsql-hackers/2011-01/msg02788.php

My workaround was to use domains, but that's not a very clean solution
(you have to add a bunch of casts to make sure the right domain is
chosen). It became entirely unworkable with collations, because people
would be using different text collations a lot more frequently than,
say, a different ordering for timestamptz. Tom mentioned that here:

http://archives.postgresql.org/message-id/24831.1308579...@sss.pgh.pa.us

I think Florian proposed the most promising line of attack here:

http://archives.postgresql.org/message-id/ad4fc75d-db99-48ed-9082-52ee3a4d7...@phlo.org

by suggesting that functions of the form:
  (anyelement, [other non-anyrange arguments]) -> anyrange
might be expendable. After all, they are only useful for constructors as
far as we can tell. Other range functions will have an anyrange
parameter, and we can use the actual type of the argument to know the
range type (as well as the subtype).

Although it's very nice to be able to say:
  range(1,10)
and get an int4range out of it, it's not the only way, and it's not
without its problems anyway. For instance, to get an int8range you have
to do:
  range(1::int8, 10::int8)
or similar.

So, we could just make functions like:
  int4range(int4, int4)
  int8range(int8, int8)
  ...
when creating the range type, and it would actually be a usability
improvement.

There are at least a few constructors that would need to be made for
each rangetype: the constructor above, the singleton constructor,
constructors that have infinite bounds, the empty constructor, and all
of the permutations for inclusivity/exclusivity. That adds up to quite a
few catalog entries per range type.

We could reduce some of the permutations by using extra arguments
somehow, but that seems like it adds to the ugliness. This might also be
a time to revisit whether there is a better way to present all of these
constructors (rather than the _[co][co] suffixes to names, etc.).

Even if we're willing to put up with a bunch of catalog entries, it will
take a little creativity to figure out how to run the functions
generically from a fixed set of C functions.

Are there other thoughts or ideas about eliminating the need for generic
constructors like range()?

Another idea Florian suggested (in the same email) was the ability to
declare the return type of a function, and then use the declared type to
infer the argument types. That would be nice because you would just have
to do:
  range(1,10)::int8range
However, that's kind of backwards from how our type inference system
works now, and sounds like a big change.

Maybe we could consider a syntax addition for constructing range values?
That was kicked around briefly, but perhaps we should revisit the
possibilities there.

Regards,
        Jeff Davis



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types, constructors, and the type system

Reply via email to