Re: [HACKERS] How to pass around collation information

Heikki Linnakangas Fri, 28 May 2010 10:22:47 -0700

On 28/05/10 19:27, Peter Eisentraut wrote:

> I have been thinking about this collation support business a bit.
> Ignoring for the moment where we would get the actual collation routines
> from, I wonder how we are going to pass this information around in the
> system.  Someone declares a collation on a column in a table, and
> somehow this information needs to arrive in bttextcmp() and friends.


Yes. Comparison operators need it, as do functions like isalpha().

> Also, functions that take in a string and return one (e.g., substring),
> need to take in this information and return it back out.  How should
> this work?

Hmm, I don't see what substring would need collation for. And it
certainly shouldn't be returning it. Collation is a property of the
comparison operators (and isalpha etc.), and the planner needs to deduce
the right collation for each such operation in the query. That involves
looking at the tables and columns involved, as well as per-user
information and any explicit COLLATE clauses in the query, but all that
happens at plan-time.
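
As a rough illustration of plan-time deduction, here is a toy model in
Python rather than PostgreSQL's C. Everything in it is hypothetical: the
Operand class, the deduce_collation() name, and especially the precedence
order (explicit COLLATE clause, then column collation, then per-user
default), which the message above does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Operand:
    """One input to a comparison, as seen at plan time (hypothetical model)."""
    column_collation: Optional[str] = None  # collation declared on the column, if any
    explicit_collate: Optional[str] = None  # from a COLLATE clause in the query, if any


def deduce_collation(operands, user_default):
    """Pick the collation for one comparison at plan time.

    Assumed precedence (not taken from the thread): explicit COLLATE
    clause, then column collation, then the per-user default. Conflicting
    collations at the same level raise an error instead of being guessed.
    """
    for attr in ("explicit_collate", "column_collation"):
        found = {getattr(op, attr) for op in operands} - {None}
        if len(found) > 1:
            raise ValueError(f"conflicting collations: {sorted(found)}")
        if found:
            return found.pop()
    return user_default
```

The point of the sketch is only that the decision is made once per
operation, before execution, and that the string values themselves never
carry a collation.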

> Option 1, make it part of the datum.  That way it will pass through the
> system just fine, but it would waste a lot of storage and break just
> about everything that operates on string types now, as well as
> pg_upgrade.  So that's probably out.

It's also fundamentally wrong, collation is not a property of a datum
but of the operation.

> Option 2, invent some new mechanism that accompanies a datum or a type
> wherever it goes.  Kind of like typmod, but not really.  Then the
> collation information would presumably be made available to functions
> through the fmgr interface.  The binary representation of data values
> stays the same.

Something like that. I'm thinking that bttextcmp() and friends will
simply take an extra argument indicating the collation, and we'll teach
the operator / operator class infrastructure about that too.
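
A minimal sketch of the "extra argument" idea, again as a toy model in
Python rather than the real C function. The COLLATIONS registry, the two
collation names, and the text_cmp() signature are assumptions made for
illustration; they are not PostgreSQL's API.

```python
# Hypothetical collation registry: name -> function producing a sort key.
COLLATIONS = {
    "bytewise": lambda s: s.encode("utf-8"),
    "case_insensitive": lambda s: s.casefold(),
}


def text_cmp(a, b, collation):
    """Three-way comparison of two strings under the named collation.

    Returns -1, 0, or 1, in the style of a btree comparison support
    function. The strings are plain values; the collation arrives
    separately, as an argument to the operation.
    """
    key = COLLATIONS[collation]
    ka, kb = key(a), key(b)
    return (ka > kb) - (ka < kb)
```

The same pair of values can then compare differently depending on the
argument: "a" sorts after "B" under the bytewise entry and before it
under the case-insensitive one.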

One way to approach this is to realize that it's already possible to use
multiple collations in a database. You just have to define separate
< = > operators and operator classes for every collation, and change all
your queries to use the right operator depending on the desired
collation everywhere you use < = > (including ORDER BYs, with the
USING <operator> syntax). The behavior is exactly what we want, it's
just completely impractical, so we need something to do the same in a
less cumbersome way.
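
To make the cumbersome variant concrete, here is a small Python analogy,
not SQL and not PostgreSQL code. The two comparison functions stand in for
per-collation operators, and their names are invented for this sketch.

```python
from functools import cmp_to_key


def cmp_bytewise(a, b):
    """Comparison backing a hypothetical bytewise '<' operator."""
    ka, kb = a.encode("utf-8"), b.encode("utf-8")
    return (ka > kb) - (ka < kb)


def cmp_nocase(a, b):
    """Comparison backing a hypothetical case-insensitive '<' operator."""
    ka, kb = a.casefold(), b.casefold()
    return (ka > kb) - (ka < kb)


words = ["banana", "Cherry", "apple"]

# Every call site has to name the comparison it wants, much as every
# ORDER BY would need its own USING <operator>.
sorted_bytewise = sorted(words, key=cmp_to_key(cmp_bytewise))
sorted_nocase = sorted(words, key=cmp_to_key(cmp_nocase))
```

The results are what one would want from each collation; the cost is that
each new collation means another function and another change at every
place that sorts or compares.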


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
