To eludicate my earlier comments on this subject ...

I believe that the best general solution here is for every ordered base type to just have a single total order, which is always used with that type in any generic order-sensitive operation, including any ranges defined over it, or any ORDER BY or any <,>,etc. The built-in generic text type would have exactly 1 system-defined collation that can't be changed, and it would be something simple and generic, such as simply sorting on the codepoint as integers.

When we want to have some other "native" ordering for an existing type, such as when we want to use a different text collation, we do this by creating a *new base type*, using CREATE TYPE or some shorthand thereof, and this new type defines its own ordering, such as that a particular text collation is used.

For example:

  CREATE TYPE text__en_US AS (v text);

  CREATE TYPE text__C AS (v text);

These will not compare equal to each other or to text, and that is good, because having a different text collation implies that we consider 'foo'::text__en_US and 'foo'::text__C to be different values.

I believe that any other approach is worse, and in particular I believe that creating DOMAIN over text is worse, because DOMAIN are supposed to be subtypes of some other type, whose set of member values is a subset of the other type's values, and that have the same ordering. Multiple CREATE type over the same base type don't interfere with each other like multiple DOMAIN could.

Assuming that what CREATE TYPE produces is actually a base type, I believe there is no better solution using the facilities that SQL provides.

If there is concern about performance related to CREATE TYPE being a composite type, I'm sure it is possible to engineer an optimization for when the type has just 1 attribute so that performance isn't an issue. The main point I'm trying to raise here is about semantics and good type systems.

Likewise, don't let concern about syntax for using values of such composite types. Once again, there can be shorthands if necessary.

In fact, Postgres could provide a general shorthand for creating a composite type of 1 attribute whose purpose is to make one type that is like but unequal to another, and using this shorthand could also cause the composite type to overload/polymorph all the operators of it's attribute type, so that the syntax to define one is very short.

For example:

  CREATE TYPE text__en_US WRAPS text COLLATE en_US;

... and I assume the name of that attribute would just be system-defined.

Note that the above is specific to wrapping text, and the COLLATE is just shorthand for defining an ordering function for text__en_US. A more general form could be:

  CREATE TYPE bar WRAPS foo ORDER USING FUNCTION baz (lhs foo, rhs foo) ...;

And then we can say:

  RANGE OF text__en_US

  RANGE OF text

... similarly to how we declare array types with ARRAY.

One can also just define range values as they do array values, such as like 
this:

  range('foo','bar')  # default collation

  range('foo'::text__en_US, 'bar'::text__en_US)  # en_us collation

If that seems verbose, I have a few words for you:

1. Users should in practice name their wrapper types over their intended meaning, not their mechanics, such as like this (not using text for variety), and that may be more terse:

  CREATE TYPE acct_num WRAPS integer;  # inherits integer ordering by default

2. If the wrapper types overload the base operators, either automatically or selectively (does it make sense to multiply an acct_num?), one doesn't have to keep unpacking and packing them to use them in most cases. For example, I'd expect many text wrappers to polymorph catenation or substring etc, so no extra syntax.

3. In practice, most literal values come from applications and are given to SQL code either as function parameters or bind parameter arguments. While lots of example code may have literal values in it, I would think that most real-work code would hardly have any, and hence you'd rarely see any 'foo'::text__en_US for example. You'd more likely see the less common var::text__en_US or such.

So that's my position, CREATE TYPE on the regular types or the like is the best solution, and anything else is an inferior solution.

Such a design is also how I do collations and ranges in my Muldis D language.

-- Darren Duncan

Jeff Davis wrote:
Different ranges over the same subtype make sense when using different
total orders for the subtype. This is most apparent with text collation,
but makes sense (at least mathematically, if not practically) for any
subtype.

For instance:
 [a, Z)
is a valid range in "en_US", but not in "C", so it makes sense to have
multiple ranges over the same subtype with different collations.

But what if you have a function (like a constructor), of the form:
  (anyelement, anyelement) -> anyrange
? To work with the type system, you need to be able to figure out the
return type from the arguments; which means to support functions like
this we need a mapping from the subtype to the range type.
Unfortunately, that restricts us to one range type per subtype (this
isn't a problem for ARRAYs, because there is only one useful array type
for a given element type).

This problem first came up a while ago:
http://archives.postgresql.org/pgsql-hackers/2011-01/msg02788.php

My workaround was to use domains, but that's not a very clean solution
(you have to add a bunch of casts to make sure the right domain is
chosen). It became entirely unworkable with collations, because people
would be using different text collations a lot more frequently than,
say, a different ordering for timestamptz. Tom mentioned that here:

http://archives.postgresql.org/message-id/24831.1308579...@sss.pgh.pa.us

I think Florian proposed the most promising line of attack here:

http://archives.postgresql.org/message-id/ad4fc75d-db99-48ed-9082-52ee3a4d7...@phlo.org

by suggesting that functions of the form:
  (anyelement, [other non-anyrange arguments]) -> anyrange
might be expendable. After all, they are only useful for constructors as
far as we can tell. Other range functions will have an anyrange
parameter, and we can use the actual type of the argument to know the
range type (as well as the subtype).

Although it's very nice to be able to say:
  range(1,10)
and get an int4range out of it, it's not the only way, and it's not
without its problems anyway. For instance, to get an int8range you have
to do:
  range(1::int8, 10::int8)
or similar.

So, we could just make functions like:
  int4range(int4, int4)
  int8range(int8, int8)
  ...
when creating the range type, and it would actually be a usability
improvement.

There are at least a few constructors that would need to be made for
each rangetype: the constructor above, the singleton constructor,
constructors that have infinite bounds, the empty constructor, and all
of the permutations for inclusivity/exclusivity. That adds up to quite a
few catalog entries per range type.

We could reduce some of the permutations by using extra arguments
somehow, but that seems like it adds to the ugliness. This might also be
a time to revisit whether there is a better way to present all of these
constructors (rather than the _[co][co] suffixes to names, etc.).

Even if we're willing to put up with a bunch of catalog entries, it will
take a little creativity to figure out how to run the functions
generically from a fixed set of C functions.

Are there other thoughts or ideas about eliminating the need for generic
constructors like range()?

Another idea Florian suggested (in the same email) was the ability to
declare the return type of a function, and then use the declared type to
infer the argument types. That would be nice because you would just have
to do:
  range(1,10)::int8range
However, that's kind of backwards from how our type inference system
works now, and sounds like a big change.

Maybe we could consider a syntax addition for constructing range values?
That was kicked around briefly, but perhaps we should revisit the
possibilities there.

Regards,
        Jeff Davis




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to