Zdenek Kotala wrote:
Radek Strnad napsal(a):

<snip>


I'm thinking of dividing the problem into two parts - in beginning
pg_collation will contain two functions. One will have hard-coded rules
for these basic collations (SQL_CHARACTER, GRAPHIC_IRV, LATIN1, ISO8BIT,
UCS_BASIC). It will compare each string character bitwise and guarantee
that the implementation will meet the SQL standard implemented in
PostgreSQL.
Second one will allow the user to use installed system locales. The set
of these collations will obviously vary between systems. Catalogs will
contain encoding and collation for calling the system locale function.
This will allow us to use collations such as en_US.utf8, cs_CZ.iso88592
etc. if they will be availible.

We will also need to change the way how strings are compared. Regarding
the set database collation the right function will be used.
http://doxygen.postgresql.org/varlena_8c.html#4c7af81f110f9be0bd8eb2bd99525675

This design will make possible switch to ICU or any other implementation
quite simple and will not cause any major rewriting of what I'm coding
right now.


Collation function is main point here. How you mentioned one will be only wrapper about strcmp and second one about strcoll. (maybe you need four - char/wchar) Which function will be used it is defined in pg_collation catalog by CREATE COLLATION command. But you need specify name of locale for system locales. It means you need attribute for storing locale name.

You're right. I've extended pg_collation for system locale columns. In the first stage we actually don't need any other catalogs such as encoding, etc. and we can build this functionality only on following pg_collation catalog. Used collation function (system or built-in) will be decided on existing collation name.

CATALOG(pg_collations, ###)
{
   NameData    colname;        /* collation name */
   Oid        colschema;        /* collation schema */
   NameData   colcharset;    /*  character set specification */
   Oid         colexistingcollation; /* existing collation */
   bool        colpadattribute;    /* pad attribute */
   bool        colcasesensitive;    /* case sensitive */
   bool        colaccent;        /* accent sensitive */
   NameData    colsyslccollate;    /* lc_collate */
   NameData    colsyslcctype; /* lc_ctype */
   regproc        colfunc;        /* used collation function */
} FormData_pg_collations;


FormData_pg_collations;
It would be good to send list of new and modified SQL commands (like CREATE COLLATION) for wide discussion.

CREATE COLLATION <collation name> FOR <character set specification> FROM <existing collation name> [ <pad characteristic> ] [ <case sensitive> ] [ <accent sensitive> ] [ LC_COLLATE <lc_collate> ] [ LC_CTYPE <lc_ctype> ]

<pad characteristic> := NO PAD | PAD SPACE
<case sensitive> := CASE SENSITIVE | CASE INSENSITIVE
<accent sensitive> := ACCENT SENSITIVE | ACCENT INSENSITIVE

Since you can specify order by in select clause there's no need for adding ascending and descending type of collation. They will allways be ascending.

DROP COLLATION <collation name>

CREATE DATABASE ... [ COLLATE <collation name> ] ...

ALTER DATABASE ... [ COLLATE <collation name> ] ...

   Any thoughts?

         Radek



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to