To eludicate my earlier comments on this subject ...
I believe that the best general solution here is for every ordered base type to
just have a single total order, which is always used with that type in any
generic order-sensitive operation, including any ranges defined over it, or any
ORDER BY or any <,>,etc. The built-in generic text type would have exactly 1
system-defined collation that can't be changed, and it would be something simple
and generic, such as simply sorting on the codepoint as integers.
When we want to have some other "native" ordering for an existing type, such as
when we want to use a different text collation, we do this by creating a *new
base type*, using CREATE TYPE or some shorthand thereof, and this new type
defines its own ordering, such as that a particular text collation is used.
For example:
CREATE TYPE text__en_US AS (v text);
CREATE TYPE text__C AS (v text);
These will not compare equal to each other or to text, and that is good, because
having a different text collation implies that we consider 'foo'::text__en_US
and 'foo'::text__C to be different values.
I believe that any other approach is worse, and in particular I believe that
creating DOMAIN over text is worse, because DOMAIN are supposed to be subtypes
of some other type, whose set of member values is a subset of the other type's
values, and that have the same ordering. Multiple CREATE type over the same
base type don't interfere with each other like multiple DOMAIN could.
Assuming that what CREATE TYPE produces is actually a base type, I believe there
is no better solution using the facilities that SQL provides.
If there is concern about performance related to CREATE TYPE being a composite
type, I'm sure it is possible to engineer an optimization for when the type has
just 1 attribute so that performance isn't an issue. The main point I'm trying
to raise here is about semantics and good type systems.
Likewise, don't let concern about syntax for using values of such composite
types. Once again, there can be shorthands if necessary.
In fact, Postgres could provide a general shorthand for creating a composite
type of 1 attribute whose purpose is to make one type that is like but unequal
to another, and using this shorthand could also cause the composite type to
overload/polymorph all the operators of it's attribute type, so that the syntax
to define one is very short.
For example:
CREATE TYPE text__en_US WRAPS text COLLATE en_US;
... and I assume the name of that attribute would just be system-defined.
Note that the above is specific to wrapping text, and the COLLATE is just
shorthand for defining an ordering function for text__en_US. A more general
form could be:
CREATE TYPE bar WRAPS foo ORDER USING FUNCTION baz (lhs foo, rhs foo) ...;
And then we can say:
RANGE OF text__en_US
RANGE OF text
... similarly to how we declare array types with ARRAY.
One can also just define range values as they do array values, such as like
this:
range('foo','bar') # default collation
range('foo'::text__en_US, 'bar'::text__en_US) # en_us collation
If that seems verbose, I have a few words for you:
1. Users should in practice name their wrapper types over their intended
meaning, not their mechanics, such as like this (not using text for variety),
and that may be more terse:
CREATE TYPE acct_num WRAPS integer; # inherits integer ordering by default
2. If the wrapper types overload the base operators, either automatically or
selectively (does it make sense to multiply an acct_num?), one doesn't have to
keep unpacking and packing them to use them in most cases. For example, I'd
expect many text wrappers to polymorph catenation or substring etc, so no extra
syntax.
3. In practice, most literal values come from applications and are given to SQL
code either as function parameters or bind parameter arguments. While lots of
example code may have literal values in it, I would think that most real-work
code would hardly have any, and hence you'd rarely see any 'foo'::text__en_US
for example. You'd more likely see the less common var::text__en_US or such.
So that's my position, CREATE TYPE on the regular types or the like is the best
solution, and anything else is an inferior solution.
Such a design is also how I do collations and ranges in my Muldis D language.
-- Darren Duncan
Jeff Davis wrote:
Different ranges over the same subtype make sense when using different
total orders for the subtype. This is most apparent with text collation,
but makes sense (at least mathematically, if not practically) for any
subtype.
For instance:
[a, Z)
is a valid range in "en_US", but not in "C", so it makes sense to have
multiple ranges over the same subtype with different collations.
But what if you have a function (like a constructor), of the form:
(anyelement, anyelement) -> anyrange
? To work with the type system, you need to be able to figure out the
return type from the arguments; which means to support functions like
this we need a mapping from the subtype to the range type.
Unfortunately, that restricts us to one range type per subtype (this
isn't a problem for ARRAYs, because there is only one useful array type
for a given element type).
This problem first came up a while ago:
http://archives.postgresql.org/pgsql-hackers/2011-01/msg02788.php
My workaround was to use domains, but that's not a very clean solution
(you have to add a bunch of casts to make sure the right domain is
chosen). It became entirely unworkable with collations, because people
would be using different text collations a lot more frequently than,
say, a different ordering for timestamptz. Tom mentioned that here:
http://archives.postgresql.org/message-id/24831.1308579...@sss.pgh.pa.us
I think Florian proposed the most promising line of attack here:
http://archives.postgresql.org/message-id/ad4fc75d-db99-48ed-9082-52ee3a4d7...@phlo.org
by suggesting that functions of the form:
(anyelement, [other non-anyrange arguments]) -> anyrange
might be expendable. After all, they are only useful for constructors as
far as we can tell. Other range functions will have an anyrange
parameter, and we can use the actual type of the argument to know the
range type (as well as the subtype).
Although it's very nice to be able to say:
range(1,10)
and get an int4range out of it, it's not the only way, and it's not
without its problems anyway. For instance, to get an int8range you have
to do:
range(1::int8, 10::int8)
or similar.
So, we could just make functions like:
int4range(int4, int4)
int8range(int8, int8)
...
when creating the range type, and it would actually be a usability
improvement.
There are at least a few constructors that would need to be made for
each rangetype: the constructor above, the singleton constructor,
constructors that have infinite bounds, the empty constructor, and all
of the permutations for inclusivity/exclusivity. That adds up to quite a
few catalog entries per range type.
We could reduce some of the permutations by using extra arguments
somehow, but that seems like it adds to the ugliness. This might also be
a time to revisit whether there is a better way to present all of these
constructors (rather than the _[co][co] suffixes to names, etc.).
Even if we're willing to put up with a bunch of catalog entries, it will
take a little creativity to figure out how to run the functions
generically from a fixed set of C functions.
Are there other thoughts or ideas about eliminating the need for generic
constructors like range()?
Another idea Florian suggested (in the same email) was the ability to
declare the return type of a function, and then use the declared type to
infer the argument types. That would be nice because you would just have
to do:
range(1,10)::int8range
However, that's kind of backwards from how our type inference system
works now, and sounds like a big change.
Maybe we could consider a syntax addition for constructing range values?
That was kicked around briefly, but perhaps we should revisit the
possibilities there.
Regards,
Jeff Davis
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers