Re: [HACKERS] Proposal - Collation at database level

Radek Strnad Fri, 30 May 2008 06:49:40 -0700

Zdenek Kotala wrote:

Radek Strnad wrote:


<snip>


I'm thinking of dividing the problem into two parts. In the beginning,
pg_collation will contain two functions. One will have hard-coded rules
for these basic collations (SQL_CHARACTER, GRAPHIC_IRV, LATIN1, ISO8BIT,
UCS_BASIC). It will compare strings character by character on their
binary values and guarantee that the implementation meets the SQL
standard as implemented in PostgreSQL.
The second one will allow the user to use installed system locales. The
set of these collations will obviously vary between systems. Catalogs will
contain the encoding and collation needed to call the system locale
function. This will allow us to use collations such as en_US.utf8,
cs_CZ.iso88592, etc., if they are available.

We will also need to change the way strings are compared. Depending on
the collation set for the database, the appropriate function will be used.

http://doxygen.postgresql.org/varlena_8c.html#4c7af81f110f9be0bd8eb2bd99525675


This design will make a possible switch to ICU or any other implementation
quite simple and will not cause any major rewriting of what I'm coding
right now.

The collation function is the main point here. As you mentioned, one will be only a wrapper around strcmp and the second one around strcoll (maybe you need four: char/wchar). Which function is used is defined in the pg_collation catalog by the CREATE COLLATION command. But you need to specify the name of the locale for system locales. It means you need an attribute for storing the locale name.
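
A minimal C sketch of the two wrappers and of picking one of them by collation name may help the discussion. All identifiers here (collation_cmp_fn, builtin_collation_cmp, system_collation_cmp, choose_collation_cmp) are hypothetical and are not taken from the PostgreSQL source or from the patch. The sketch assumes NUL-terminated strings and assumes that any name outside the hard-coded list refers to a system locale.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical signature shared by both comparison paths. */
typedef int (*collation_cmp_fn)(const char *a, const char *b);

/* Built-in path: strcmp compares the bytes as unsigned char values and
 * does not depend on the system locale. */
static int
builtin_collation_cmp(const char *a, const char *b)
{
    return strcmp(a, b);
}

/* System path: strcoll follows the LC_COLLATE category of the current
 * C library locale. */
static int
system_collation_cmp(const char *a, const char *b)
{
    return strcoll(a, b);
}

/* The basic collations named in the proposal. */
static const char *const builtin_collations[] = {
    "SQL_CHARACTER", "GRAPHIC_IRV", "LATIN1", "ISO8BIT", "UCS_BASIC"
};

/* Assumption: a name that is not in the list above is a system locale. */
static collation_cmp_fn
choose_collation_cmp(const char *existing_collation)
{
    size_t n = sizeof(builtin_collations) / sizeof(builtin_collations[0]);
    size_t i;

    for (i = 0; i < n; i++)
    {
        if (strcmp(existing_collation, builtin_collations[i]) == 0)
            return builtin_collation_cmp;
    }
    return system_collation_cmp;
}
```

With this shape, a later switch to ICU would presumably mean adding a third function with the same signature, while callers keep going through the function pointer.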

You're right. I've extended pg_collation with columns for the system locale. In the first stage we actually don't need any other catalogs such as encoding, etc., and we can build this functionality only on the following pg_collation catalog. Which collation function is used (system or built-in) will be decided based on the existing collation name.


CATALOG(pg_collations, ###)
{
   NameData    colname;               /* collation name */
   Oid         colschema;             /* collation schema */
   NameData    colcharset;            /* character set specification */
   Oid         colexistingcollation;  /* existing collation */
   bool        colpadattribute;       /* pad attribute */
   bool        colcasesensitive;      /* case sensitive */
   bool        colaccent;             /* accent sensitive */
   NameData    colsyslccollate;       /* lc_collate */
   NameData    colsyslcctype;         /* lc_ctype */
   regproc     colfunc;               /* used collation function */
} FormData_pg_collations;

It would be good to send a list of new and modified SQL commands (like CREATE COLLATION) for wider discussion.

CREATE COLLATION <collation name> FOR <character set specification>
    FROM <existing collation name>
    [ <pad characteristic> ] [ <case sensitive> ] [ <accent sensitive> ]
    [ LC_COLLATE <lc_collate> ] [ LC_CTYPE <lc_ctype> ]


<pad characteristic> := NO PAD | PAD SPACE
<case sensitive> := CASE SENSITIVE | CASE INSENSITIVE
<accent sensitive> := ACCENT SENSITIVE | ACCENT INSENSITIVE
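
To make the pad characteristic concrete, here is a rough C sketch of a byte-wise comparison under both settings. The function name bytewise_cmp_padded is hypothetical. The semantics are an assumption based on the usual SQL reading, since this message does not spell them out: with PAD SPACE the shorter string is treated as if extended with spaces, and with NO PAD it is compared as is. The sketch only makes sense for single-byte encodings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns -1, 0 or 1. pad_space == true models PAD SPACE and
 * pad_space == false models NO PAD (assumed semantics, see above). */
static int
bytewise_cmp_padded(const char *a, const char *b, bool pad_space)
{
    size_t la = strlen(a);
    size_t lb = strlen(b);
    size_t n = la < lb ? la : lb;
    size_t i;
    int c = memcmp(a, b, n);

    /* A difference inside the common prefix decides the result. */
    if (c != 0)
        return c < 0 ? -1 : 1;

    /* NO PAD: with equal prefixes, the shorter string sorts first. */
    if (!pad_space)
        return (la > lb) - (la < lb);

    /* PAD SPACE: compare the tail of the longer string against spaces. */
    for (i = n; i < la; i++)
    {
        if (a[i] != ' ')
            return (unsigned char) a[i] < ' ' ? -1 : 1;
    }
    for (i = n; i < lb; i++)
    {
        if (b[i] != ' ')
            return (unsigned char) b[i] < ' ' ? 1 : -1;
    }
    return 0;
}
```

Under these assumptions, 'abc' and 'abc  ' compare equal with PAD SPACE and unequal with NO PAD.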

Since you can specify ORDER BY in a SELECT, there's no need to add ascending and descending types of collation. They will always be ascending.


DROP COLLATION <collation name>

CREATE DATABASE ... [ COLLATE <collation name> ] ...

ALTER DATABASE ... [ COLLATE <collation name> ] ...

   Any thoughts?

         Radek


