Re: [HACKERS] [WIP] patch - Collation at database level

2008-08-02 Thread Radek Strnad
Hello,

the main reason why I've submitted the patch was to start a discussion and
know other people's opinion on this problem.

On Tue, Jul 29, 2008 at 10:41 AM, Peter Eisentraut [EMAIL PROTECTED] wrote:


 Where are the collations going to come from?


There will be two new catalogs - pg_collate and pg_charset. Each of them
will be filled with ANSI standard collations and charsets (ISO8BIT, LATIN1,
UTF-8..) and alternatively with default collation set when creating. For
instance if you create database cluster with initdb and specify en_US.utf8
there will be standard rows (ISO8BIT, LATIN1, UTF-8..) + one row with
en_US.utf8 in template0. Then you can connect to template0 and create other
collations if your POSIX locales support them and use them one per each
database.

Have the various build and distributions issues been thought about?


Yes. Since POSIX locales doesn't guarantee any collation there will be
hard-coded collations implemented regarding ANSI collation standard. Others
can be set by command CREATE COLLATION.

 How are they going to be configured (not the SQL syntax, but how will the
 configuration be applied)?


pg_type, pg_attribute, pg_namespace of each database will be extended with
collation oid column that will be specifying collation.

 How are the collations going to be applied at run-time?


Collation will be set when connecting to the database with
setlocale(LC_COLLATION, XXX) and setlocale(LC_CTYPE, XXX)


  How are you going to handle locale and encoding conflicts?


Since I'm currently implementing collation support per database I don't
think this is an issue. (It will be in the future I know.)


  I also think that the clauses you have attached to your CREATE COLLATION
 statement (case-insensitive,
 accent-insensitive) are an oversimplification of reality.  I suggest you
 look
 up the Unicode collation algorithm to learn about who collations work in
 practice.


I already did in the very beginning of the development. The reason why I'm
not implementing the whole Unicode collation algorithm is that this patch
shold be sort of framework. You'll be able to use different collation
functions not only POSIX locales so further development towards full Unicode
collation algorithm is possible.

At the end of the next week I'll publish my bachelor thesis concerning this
topic where everything will be explained in details so stay tuned.

Regards

Radek Strnad


Re: [HACKERS] [WIP] patch - Collation at database level

2008-08-02 Thread Martijn van Oosterhout
On Sat, Aug 02, 2008 at 03:39:18PM +0200, Radek Strnad wrote:
   I also think that the clauses you have attached to your CREATE
  COLLATION statement (case-insensitive, accent-insensitive) are an
  oversimplification of reality.  I suggest you look up the Unicode
  collation algorithm to learn about who collations work in practice.
 
 I already did in the very beginning of the development. The reason why I'm
 not implementing the whole Unicode collation algorithm is that this patch
 shold be sort of framework. You'll be able to use different collation
 functions not only POSIX locales so further development towards full Unicode
 collation algorithm is possible.

Agreed. Ofcourse it's a simplification of reality. POSIX locales are a
simplification of reality, but its the only form of collation currently
available to us. And quite frankly, I don't beleive postgresql should
be in the business of writing collation algorithms, we don't have the
expertese.

FWIW, I think case-insensitive and accent-insensitive are useful modifiers
that we should aim to support in the future.

 At the end of the next week I'll publish my bachelor thesis concerning this
 topic where everything will be explained in details so stay tuned.

Good luck!

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Please line up in a tree and maintain the heap invariant while 
 boarding. Thank you for flying nlogn airlines.


signature.asc
Description: Digital signature


Re: [HACKERS] [WIP] patch - Collation at database level

2008-07-29 Thread Peter Eisentraut
Am Tuesday, 1. July 2008 schrieb Radek Strnad:
 I'm sending part of the code that I've done and is available for reviewing
 and I'm asking for your comments and some help because I'm new to
 PostgreSQL.

 Proposal: http://archives.postgresql.org/pgsql-hackers/2008-05/msg00857.php

 The code contains changes of parser, pg_collation catalog, parsenodes.h for
 CREATE COLLATION... and DROP COLLATION statements.

This patch is a small start on the way to adding a catalog and some SQL 
statements that add and remove entries from it.  But I don't see any 
collation support here, which is about 99% of the work that is left to do.  
Where are the collations going to come from?  Have the various build and 
distributions issues been thought about?  How are they going to be configured 
(not the SQL syntax, but how will the configuration be applied)?  How are the 
collations going to be applied at run-time?  How are you going to handle 
locale and encoding conflicts?  I also think that the clauses you have 
attached to your CREATE COLLATION statement (case-insensitive, 
accent-insensitive) are an oversimplification of reality.  I suggest you look 
up the Unicode collation algorithm to learn about who collations work in 
practice.

In my opinion, you are starting this project from the wrong end.  I would 
suggest you approach it like this:

- Find some collation implementations
- Patch PostgreSQL to link with them
- Patch PostgreSQL to apply them for comparison
- Implement system catalog to configure them
- Implement SQL statements to manipulate the system catalog

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] patch - Collation at database level

2008-07-29 Thread Gregory Stark
Peter Eisentraut [EMAIL PROTECTED] writes:

 In my opinion, you are starting this project from the wrong end.  I would 
 suggest you approach it like this:

 - Find some collation implementations
 - Patch PostgreSQL to link with them

Well I think the feeling is that we may as well start with the lowest common
denominator of libc's collation implementation. It's the only one everyone's
going to have. Later adding compile-time options to use a different library
and different function calls might be useful but a lot of people are unhappy
about the idea of *requiring* a major outside library for this.

 - Patch PostgreSQL to apply them for comparison

Er, yes. Well we do that already but the tricky bit is keeping track of
multiple collations and applying the right one for each comparison.

So we do need the concept of multiple collations and the syntax to select a
collation for each ordering operation.



-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] patch - Collation at database level

2008-07-09 Thread Martijn van Oosterhout
On Tue, Jul 08, 2008 at 09:05:11PM +0200, Zdenek Kotala wrote:
 All the argument here is based on the premise that we should have
 database-level collation specifications, which AFAICS is not required
 nor suggested by the SQL spec. 
 
 Yeah, it is not required, but by my opinion it should be derived from 
 CREATE SCHEMA statement. There is following item:
 
 --- SQL ANSI 2003 page 520 ---
 
 5) If schema character set specification is not specified, then a schema 

Careful, this is a 'character set specification which has (almost)
nothing to do with collation. It's closer to the encoding field, which
is already in pg_database.

The issue with having a default database collation is that it's
unclear where it would be used. In the end the collation is defined by
the types and domains. Columns inherit from the types. I think the only
senseible definition is to decide that all the text/varchar/char types
inherit from the database. It's not in the spec but I think it does
make easier to decide what the default collation is. As an alternative
to:

ALTER TYPE text SET DEFAULT COLLATION TO foo;
repeat for all other text types

 I try to determine how to implement collation itself - collation catalog 
 structure and content and how to create new collation. Column-level 
 collation is nice but until we will not have basic infrastructure we cannot 
 start implemented it.

I agree that this patch includes much basic work that needs to be done
for full collation support.

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Please line up in a tree and maintain the heap invariant while 
 boarding. Thank you for flying nlogn airlines.


signature.asc
Description: Digital signature


Re: [HACKERS] [WIP] patch - Collation at database level

2008-07-09 Thread Zdenek Kotala

Martijn van Oosterhout napsal(a):

On Tue, Jul 08, 2008 at 09:05:11PM +0200, Zdenek Kotala wrote:

All the argument here is based on the premise that we should have
database-level collation specifications, which AFAICS is not required
nor suggested by the SQL spec. 
Yeah, it is not required, but by my opinion it should be derived from 
CREATE SCHEMA statement. There is following item:


--- SQL ANSI 2003 page 520 ---

5) If schema character set specification is not specified, then a schema 


Careful, this is a 'character set specification which has (almost)
nothing to do with collation. It's closer to the encoding field, which
is already in pg_database.


Yeah, I'm confuse why for character set is defined behavior when is not 
specified but there is not definition how to handle it for collation.


Maybe because charset has default collation.


The issue with having a default database collation is that it's
unclear where it would be used. In the end the collation is defined by
the types and domains. Columns inherit from the types. I think the only
senseible definition is to decide that all the text/varchar/char types
inherit from the database. It's not in the spec but I think it does
make easier to decide what the default collation is. As an alternative
to:

ALTER TYPE text SET DEFAULT COLLATION TO foo;
repeat for all other text types


It is good point I thought about following situation

select 'xxx'  'yyy';

But how you mentioned because both are text and text type has default collation 
there is no reason to have collation per database.


Hmm, It seems to me that expectation should be create all string data types with 
default collation during initdb phase. It means modify BKI types definition 
according to default locale setting.



Zdenek

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] patch - Collation at database level

2008-07-08 Thread Zdenek Kotala

Tom Lane napsal(a):

Gregory Stark [EMAIL PROTECTED] writes:

Out of curiosity, what is a user-defined collation? Are there SQL statements
to go around declaring what order code points should be sorted in? That seems
like it would be... quite tedious!




snip


We might be best off to treat collations like index access methods,
ie, they're theoretically add-able but there's no infrastructure for
managing them, and what's expected is that all the ones you need are
created by initdb.


I though more about it and I discussed it with Radek yesterday. The problem is 
that collation must be created before user want to use CREATE DATABASE ... 
COLLATE ... command. It inclines to have have pg_collation as a global catalog, 
but ANSI specifies to use schema name in collation specification and schemes are 
database specific ... It means that pg_collation have to be non-shared catalog 
and new database only inherits collation from template db. And CREATE DATABASE 
have to check list of collation in template database :(.


My conclusion is that CREATE COLLATION does not make much sense. I see two 
possible solutions:


1) have global an local catalog for collation and have modified variants of 
create cmd:


   CREATE COLLATION ... GLOBAL|LOCAL

   CREATE DATABASE will use only collation from global catalog

Local catalog will be useful when full support of collation will be available 
mostly for specifying case sensitivity of collation.



2) Use Tom's suggested approach. Create list of collations in initdb phase. But 
there is problem how to obtain list of supported collation on the server. I 
think, only what is possible to do is to use default locale for creating default 
collation for template1 database.



Any suggestion?

thanks Zdenek


--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] patch - Collation at database level

2008-07-08 Thread Zdenek Kotala

Zdenek Kotala napsal(a):



I though more about it and I discussed it with Radek yesterday. The 
problem is that collation must be created before user want to use CREATE 
DATABASE ... COLLATE ... command. It inclines to have have pg_collation 
as a global catalog, but ANSI specifies to use schema name in collation 
specification and schemes are database specific ... It means that 
pg_collation have to be non-shared catalog and new database only 
inherits collation from template db. And CREATE DATABASE have to check 
list of collation in template database :(.


thinking more ...
It must be shared catalog because pg_database will depend on it.

Zdenek


--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] patch - Collation at database level

2008-07-08 Thread Martijn van Oosterhout
On Tue, Jul 08, 2008 at 11:27:35AM +0200, Zdenek Kotala wrote:
 Zdenek Kotala napsal(a):
 
 
 I though more about it and I discussed it with Radek yesterday. The 
 problem is that collation must be created before user want to use CREATE 
 DATABASE ... COLLATE ... command. It inclines to have have pg_collation 
 as a global catalog, but ANSI specifies to use schema name in collation 
 specification and schemes are database specific ... It means that 
 pg_collation have to be non-shared catalog and new database only 
 inherits collation from template db. And CREATE DATABASE have to check 
 list of collation in template database :(.
 
 thinking more ...
 It must be shared catalog because pg_database will depend on it.

Not necessarily. pg_class is not shared yet without it you can't even
find pg_database. Same deal with pg_type. All it means is that
pg_collation in template1 must contain all the collations used in
template1, which shouldn't be hard to arrange.

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Please line up in a tree and maintain the heap invariant while 
 boarding. Thank you for flying nlogn airlines.


signature.asc
Description: Digital signature


Re: [HACKERS] [WIP] patch - Collation at database level

2008-07-08 Thread Zdenek Kotala

Martijn van Oosterhout napsal(a):

On Tue, Jul 08, 2008 at 11:27:35AM +0200, Zdenek Kotala wrote:

Zdenek Kotala napsal(a):

I though more about it and I discussed it with Radek yesterday. The 
problem is that collation must be created before user want to use CREATE 
DATABASE ... COLLATE ... command. It inclines to have have pg_collation 
as a global catalog, but ANSI specifies to use schema name in collation 
specification and schemes are database specific ... It means that 
pg_collation have to be non-shared catalog and new database only 
inherits collation from template db. And CREATE DATABASE have to check 
list of collation in template database :(.

thinking more ...
It must be shared catalog because pg_database will depend on it.


Not necessarily. pg_class is not shared yet without it you can't even
find pg_database. Same deal with pg_type. All it means is that
pg_collation in template1 must contain all the collations used in
template1, which shouldn't be hard to arrange.


I think, Collation situation is different, becasue pg_database will contains 
column colname. pg_class (and all bootstrap catalog) only contains row which 
specify that shared table exists and content is cloned to the new database from 
template database. In corner case you can get context specific dependency for 
example if Czech collation will have oid=10 in database test01 and Swedish 
collation will have oid=10 in database test02. How to handle CREATE DATABASE and 
connect database? OK it shouldn't happen in normal situation  but ...


Zdenek

--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] patch - Collation at database level

2008-07-08 Thread Martijn van Oosterhout
On Tue, Jul 08, 2008 at 12:00:34PM +0200, Zdenek Kotala wrote:
 Not necessarily. pg_class is not shared yet without it you can't even
 find pg_database. Same deal with pg_type. All it means is that
 pg_collation in template1 must contain all the collations used in
 template1, which shouldn't be hard to arrange.
 
 I think, Collation situation is different, becasue pg_database will 
 contains column colname. pg_class (and all bootstrap catalog) only contains 
 row which specify that shared table exists and content is cloned to the new 
 database from template database. In corner case you can get context 
 specific dependency for example if Czech collation will have oid=10 in 
 database test01 and Swedish collation will have oid=10 in database test02. 
 How to handle CREATE DATABASE and connect database? OK it shouldn't happen 
 in normal situation  but ...

Oh I see, you're referring to the storage of the default collation for
a database. I was jumping ahead to the per-column collation state, when
the collation default is attached to columns, types and domains, and
not at the database level. So there the problem does not exist.

To be honest, I'd suggest storing the collation in pg_database as a
string, rather than as an identifier. This sidesteps the problem
entirly.

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Please line up in a tree and maintain the heap invariant while 
 boarding. Thank you for flying nlogn airlines.


signature.asc
Description: Digital signature


Re: [HACKERS] [WIP] patch - Collation at database level

2008-07-08 Thread Zdenek Kotala

Martijn van Oosterhout napsal(a):


Oh I see, you're referring to the storage of the default collation for
a database. I was jumping ahead to the per-column collation state, when
the collation default is attached to columns, types and domains, and
not at the database level. So there the problem does not exist.


Yeah, but you still need one source/one collation list for database, scheme, 
table and column. And of course shared tables need also collation for their indexes.



To be honest, I'd suggest storing the collation in pg_database as a
string, rather than as an identifier. This sidesteps the problem
entirly.


I don't think that string is good idea. You need to use same approach on all 
levels by my opinion.


Zdenek


--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] patch - Collation at database level

2008-07-08 Thread Tom Lane
Zdenek Kotala [EMAIL PROTECTED] writes:
 Martijn van Oosterhout napsal(a):
 Not necessarily. pg_class is not shared yet without it you can't even
 find pg_database. Same deal with pg_type. All it means is that
 pg_collation in template1 must contain all the collations used in
 template1, which shouldn't be hard to arrange.

 I think, Collation situation is different,

All the argument here is based on the premise that we should have
database-level collation specifications, which AFAICS is not required
nor suggested by the SQL spec.  I wonder why we are allowing a
nonstandard half-measure to drive our thinking, rather than solving the
real problem which is column-level collations.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] patch - Collation at database level

2008-07-08 Thread Tom Lane
Zdenek Kotala [EMAIL PROTECTED] writes:
 ... And of course shared tables need also collation for their indexes.

No, they don't, because the only textual indexes on shared catalogs are
on name columns, which are intentionally not locale aware, and
wouldn't be collation aware either.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] patch - Collation at database level

2008-07-08 Thread Andrew Dunstan



Tom Lane wrote:


All the argument here is based on the premise that we should have
database-level collation specifications, which AFAICS is not required
nor suggested by the SQL spec.  I wonder why we are allowing a
nonstandard half-measure to drive our thinking, rather than solving the
real problem which is column-level collations.


  


Agreed. Are we even sure that we want per-database collations as a 
half-way house? Unless we can be sure that we want all the required 
catalog changes for the full requirement, it seems to me a rather messy 
way of getting to where we want to go.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] patch - Collation at database level

2008-07-08 Thread Martijn van Oosterhout
On Tue, Jul 08, 2008 at 10:54:28AM -0400, Andrew Dunstan wrote:
 Agreed. Are we even sure that we want per-database collations as a 
 half-way house? Unless we can be sure that we want all the required 
 catalog changes for the full requirement, it seems to me a rather messy 
 way of getting to where we want to go.

Given that the current projected timeframe for full COLLATE support is
something like 2-3 major releases, I don't really see the problem with
doing this now. Who knows, it might prompt people to do something
sooner.

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Please line up in a tree and maintain the heap invariant while 
 boarding. Thank you for flying nlogn airlines.


signature.asc
Description: Digital signature


Re: [HACKERS] [WIP] patch - Collation at database level

2008-07-08 Thread Josh Berkus
Andrew, Tom,

 Agreed. Are we even sure that we want per-database collations as a
 half-way house? Unless we can be sure that we want all the required
 catalog changes for the full requirement, it seems to me a rather messy
 way of getting to where we want to go.

Given that we don't have a delivery date for table or column level collations, 
we don't want to turn down database-level collations.If nothing else, 
Radek's work will expose what areas of our code are collation-dependant and 
hopefully make the work of more granular collations easier.  And if it takes 
us 3 years to get more granular collations, at least people can use 
database-level ones in the meantime so that they don't need to have separate 
PostgreSQL binaries for every language they want to support fully.

Also ... this is a Summer of Code Project, which we accepted, which at least 
in Google and the student's eyes means we're not going to discard the entire 
premise of the patch.  I'm not exaggerating when I say doing something like 
that could get PostgreSQL permanently banned from Google SoC.  

Tom, I think you need to be on the SoC committee in the future, just to raise 
objections.  Some 15+ PostgreSQL contributors on the SoC committee approved 
Radek's project.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] patch - Collation at database level

2008-07-08 Thread Michael Paesold

Tom Lane wrote:


Zdenek Kotala [EMAIL PROTECTED] writes:

Martijn van Oosterhout napsal(a):
Not necessarily. pg_class is not shared yet without it you can't  
even

find pg_database. Same deal with pg_type. All it means is that
pg_collation in template1 must contain all the collations used in
template1, which shouldn't be hard to arrange.



I think, Collation situation is different,


All the argument here is based on the premise that we should have
database-level collation specifications, which AFAICS is not required
nor suggested by the SQL spec.  I wonder why we are allowing a
nonstandard half-measure to drive our thinking, rather than solving  
the

real problem which is column-level collations.


Wouldn't you still need per-database and per-table default collations?  
At least MySQL does have such a concept.


Best Regards
Michael Paesold 


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] patch - Collation at database level

2008-07-08 Thread Zdenek Kotala

Tom Lane napsal(a):

Zdenek Kotala [EMAIL PROTECTED] writes:

... And of course shared tables need also collation for their indexes.


No, they don't, because the only textual indexes on shared catalogs are
on name columns, which are intentionally not locale aware, and
wouldn't be collation aware either.


Yeah, name uses strcmp, which is not locale aware but from ANSI perspective 
there is collation SQL_IDENTIFIER for it which is fortunately implementation 
defined.


What I see now as the problem is that we need also to know correct collation for 
ORDER  - for example:


select * from pg_shdescription order by description;

...thinking...

but it should solve by collation per column which will work well with 
pg_attribute cloning for new database as Martijn mentioned.


Zdenek


--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] patch - Collation at database level

2008-07-08 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 Andrew, Tom,
 Agreed. Are we even sure that we want per-database collations as a
 half-way house? Unless we can be sure that we want all the required
 catalog changes for the full requirement, it seems to me a rather messy
 way of getting to where we want to go.

 Given that we don't have a delivery date for table or column level 
 collations, 
 we don't want to turn down database-level collations.

I am one hundred percent prepared to turn them down, if they end up
contorting the design in a way that we will have to undo (with
consequent backwards-compatibility problems) to get to the full feature.

If it's a partial implementation of the full feature, that's fine, but
I'm not getting good vibes about that from the discussions so far.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] patch - Collation at database level

2008-07-08 Thread Zdenek Kotala

Tom Lane napsal(a):

Zdenek Kotala [EMAIL PROTECTED] writes:

Martijn van Oosterhout napsal(a):

Not necessarily. pg_class is not shared yet without it you can't even
find pg_database. Same deal with pg_type. All it means is that
pg_collation in template1 must contain all the collations used in
template1, which shouldn't be hard to arrange.



I think, Collation situation is different,


All the argument here is based on the premise that we should have
database-level collation specifications, which AFAICS is not required
nor suggested by the SQL spec. 


Yeah, it is not required, but by my opinion it should be derived from CREATE 
SCHEMA statement. There is following item:


--- SQL ANSI 2003 page 520 ---

5) If schema character set specification is not specified, then a schema 
character set specification that specifies an implementation-defined character 
set that contains at least every character that is in SQL language character 
is implicit.




It is not for collation directly, but if I understand it correctly when you want 
to create schema  then default charset is inherit from parent instance which is 
database (catalog).


Following sentence specified that pg_collation should be database specific.

 SQL ANSI 2003 page 15 ---
Character sets defined by standards or by SQL-implementations reside in the 
Information Schema (named INFORMATION_SCHEMA) in each catalog, as do collations 
defined by standards and collations, transliterations, and transcodings defined 
by SQL-implementations.



 I wonder why we are allowing a
 nonstandard half-measure to drive our thinking, rather than solving the
 real problem which is column-level collations.

I try to determine how to implement collation itself - collation catalog 
structure and content and how to create new collation. Column-level collation is 
nice but until we will not have basic infrastructure we cannot start implemented it.


Zdenek

--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] patch - Collation at database level

2008-07-08 Thread Zdenek Kotala

Andrew Dunstan napsal(a):



Tom Lane wrote:


All the argument here is based on the premise that we should have
database-level collation specifications, which AFAICS is not required
nor suggested by the SQL spec.  I wonder why we are allowing a
nonstandard half-measure to drive our thinking, rather than solving the
real problem which is column-level collations.


  


Agreed. Are we even sure that we want per-database collations as a 
half-way house? Unless we can be sure that we want all the required 
catalog changes for the full requirement, it seems to me a rather messy 
way of getting to where we want to go.


Andrew,
I would like also to see full collation implementation rather then collation per 
database. But from my point of view split collation into small parts is much 
better. Radek's work is mostly about creating infrastructure for full collation 
support. When it will be finished, then only changes in parser, executor... 
will be necessary to complete a job.


If you look on most discussion about collation they fell into ICU yes/no problem 
without any real decision how to implemented the feature.


Zdenek


--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] patch - Collation at database level

2008-07-03 Thread Zdenek Kotala

Tom Lane napsal(a):

Gregory Stark [EMAIL PROTECTED] writes:

Out of curiosity, what is a user-defined collation? Are there SQL statements
to go around declaring what order code points should be sorted in? That seems
like it would be... quite tedious!


Hm, that's a good point.  SQL99 has

 collation definition ::=
  CREATE COLLATION collation name FOR
  character set specification
FROM existing collation name
  [ pad characteristic ]

 existing collation name ::= collation name

 pad characteristic ::=
NO PAD
  | PAD SPACE

which seems pretty stupid if you ask me --- all the mechanism required
to manage a new object type, just to enable PAD SPACE or not?
(Especially when PAD SPACE itself is an utterly broken, useless concept
... but I digress.)  You might as well just provide all the standard
collations in both variants and be done with it.

The statement looks the same in last year's 200n draft, so it's not
like they were just about to add some more capability.


The proposed syntax of CREATE COLLATION is:

CREATE COLLATION collation name FOR character set specification
FROM existing collation name [STRCOLFN fn name]
[ pad characteristic ] [ case sensitive ] [ accent sensitive ]
[ LCCOLLATE lc_collate ] [ LCCTYPE lc_ctype ]

Which extends ANSI specification.



We might be best off to treat collations like index access methods,
ie, they're theoretically add-able but there's no infrastructure for
managing them, and what's expected is that all the ones you need are
created by initdb.


I think you cannot create all collation at bootstrap. You can only create record 
for actual LC_COLLATION, because I there is not standard way how to obtain 
complete list of supported collations and there is also problem if you install 
new locales after initdb.


When I looked to another DB (MS SQL, MySQL, DB2, Firebird) then only Firebird 
supports CREATE COLLATION command. Other databases has hard coded list of 
locales. Hardcoded solution means to use some lib (e.g ICU) with unified names 
or has locale name mapping for all supported OS.


I personally prefer open solution when I can create own collation and specify 
collation function to handle it.



Zdenek





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] patch - Collation at database level

2008-07-02 Thread Alvaro Herrera
Radek Strnad escribió:

 2) What type should all names in CREATE and DROP statement in gram.y have?
 I've chosen qualified_name but I know it's not the best choice.

I think it should be ColId.

 3) All collations are created from existing collations. How do I ensure that
 the collation already exists? Is there any possibility to define it in
 gram.y?

Certainly not -- shouldn't they come from a catalog?  In that case, it
must come in parse analysis (parser/analyze.c I guess) or perhaps later,
when you actually execute the function to create the new collation.

 5) Also can you look at the pg_catalog and tell me if anything is wrong with
 it?

Why does a collation have a schema?

What's the existing collation?

It seems a bit silly to have enum for what are basically boolean
variables.  Why not just use true and false?
 
-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] patch - Collation at database level

2008-07-02 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Why does a collation have a schema?

Because the SQL spec says so.  Also, if we don't put them in schemas,
we have no nice way to distinguish built-in and user-defined collations,
which creates a problem for pg_dump.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] patch - Collation at database level

2008-07-02 Thread Alvaro Herrera
Tom Lane escribió:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  Why does a collation have a schema?
 
 Because the SQL spec says so.  Also, if we don't put them in schemas,
 we have no nice way to distinguish built-in and user-defined collations,
 which creates a problem for pg_dump.

Oh, I see :-)  In that case, qualified_name would seem the right symbol
to use in the parser.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] patch - Collation at database level

2008-07-02 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Alvaro Herrera [EMAIL PROTECTED] writes:
 Why does a collation have a schema?

 Because the SQL spec says so.  Also, if we don't put them in schemas,
 we have no nice way to distinguish built-in and user-defined collations,
 which creates a problem for pg_dump.

Out of curiosity, what is a user-defined collation? Are there SQL statements
to go around declaring what order code points should be sorted in? That seems
like it would be... quite tedious!

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] patch - Collation at database level

2008-07-02 Thread Martijn van Oosterhout
On Wed, Jul 02, 2008 at 07:22:10PM +0100, Gregory Stark wrote:
  Because the SQL spec says so.  Also, if we don't put them in schemas,
  we have no nice way to distinguish built-in and user-defined collations,
  which creates a problem for pg_dump.
 
 Out of curiosity, what is a user-defined collation? Are there SQL statements
 to go around declaring what order code points should be sorted in? That seems
 like it would be... quite tedious!

Not that we'll ever use it, but ICU for example allows users to say:
use collation X but move this code point somewhere else, essentially
allowing users tweak the collation on a small scale. In any case,
whatever collation library is used, we're unlikely to predefine every
possible collation in the system, there's too many (assuming they're
denumerable).

Have a niceday,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Please line up in a tree and maintain the heap invariant while 
 boarding. Thank you for flying nlogn airlines.


signature.asc
Description: Digital signature


Re: [HACKERS] [WIP] patch - Collation at database level

2008-07-02 Thread Radek Strnad
My patch should be sort of wrapper that will implement guts for further
development (collation at column level) like catalogs, creating collations
etc. When creating collation user will be able to choose which function to
use (by statement STRCOLFN - not in SQL standard). In the first stage I'll
implement function that will use system locales. Adding ICU or any other
library won't be that big deal.

Radek Strnad

On Wed, Jul 2, 2008 at 8:22 PM, Gregory Stark [EMAIL PROTECTED]
wrote:

 Tom Lane [EMAIL PROTECTED] writes:

  Alvaro Herrera [EMAIL PROTECTED] writes:
  Why does a collation have a schema?
 
  Because the SQL spec says so.  Also, if we don't put them in schemas,
  we have no nice way to distinguish built-in and user-defined collations,
  which creates a problem for pg_dump.

 Out of curiosity, what is a user-defined collation? Are there SQL
 statements
 to go around declaring what order code points should be sorted in? That
 seems
 like it would be... quite tedious!

 --
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!



Re: [HACKERS] [WIP] patch - Collation at database level

2008-07-02 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Out of curiosity, what is a user-defined collation? Are there SQL statements
 to go around declaring what order code points should be sorted in? That seems
 like it would be... quite tedious!

Hm, that's a good point.  SQL99 has

 collation definition ::=
  CREATE COLLATION collation name FOR
  character set specification
FROM existing collation name
  [ pad characteristic ]

 existing collation name ::= collation name

 pad characteristic ::=
NO PAD
  | PAD SPACE

which seems pretty stupid if you ask me --- all the mechanism required
to manage a new object type, just to enable PAD SPACE or not?
(Especially when PAD SPACE itself is an utterly broken, useless concept
... but I digress.)  You might as well just provide all the standard
collations in both variants and be done with it.

The statement looks the same in last year's 200n draft, so it's not
like they were just about to add some more capability.

We might be best off to treat collations like index access methods,
ie, they're theoretically add-able but there's no infrastructure for
managing them, and what's expected is that all the ones you need are
created by initdb.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [WIP] patch - Collation at database level

2008-07-01 Thread Radek Strnad
Hello,

I'm sending part of the code that I've done and is available for reviewing
and I'm asking for your comments and some help because I'm new to
PostgreSQL.

Proposal: http://archives.postgresql.org/pgsql-hackers/2008-05/msg00857.php

The code contains changes of parser, pg_collation catalog, parsenodes.h for
CREATE COLLATION... and DROP COLLATION statements.

Questions that I would like to know answers:

1) Should I move DROP statement from DropCollationStmt to DropStmt? Some
statements are seperated and some are included in DropStmt? Is there any
reason for that?
2) What type should all names in CREATE and DROP statement in gram.y have?
I've chosen qualified_name but I know it's not the best choice.
3) All collations are created from existing collations. How do I ensure that
the collation already exists? Is there any possibility to define it in
gram.y?
4) For further functionality development is there anything more needed than
adding T_CreateCollationStmt and T_DropCollationStmt to
/src/backend/tcop/utility.c, and write functionality into collation.c?
5) Also can you look at the pg_catalog and tell me if anything is wrong with
it?

Thank you for all your replies in advance.

Regards

Radek Strnad
Index: backend/parser/keywords.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/parser/keywords.c,v
retrieving revision 1.194
diff -c -r1.194 keywords.c
*** backend/parser/keywords.c	1 Jan 2008 19:45:50 -	1.194
--- backend/parser/keywords.c	1 Jul 2008 14:29:26 -
***
*** 33,38 
--- 33,39 
  	/* name, value, category */
  	{abort, ABORT_P, UNRESERVED_KEYWORD},
  	{absolute, ABSOLUTE_P, UNRESERVED_KEYWORD},
+ 	{accent, ACCENT, UNRESERVED_KEYWORD},
  	{access, ACCESS, UNRESERVED_KEYWORD},
  	{action, ACTION, UNRESERVED_KEYWORD},
  	{add, ADD_P, UNRESERVED_KEYWORD},
***
*** 81,87 
  	{close, CLOSE, UNRESERVED_KEYWORD},
  	{cluster, CLUSTER, UNRESERVED_KEYWORD},
  	{coalesce, COALESCE, COL_NAME_KEYWORD},
! 	{collate, COLLATE, RESERVED_KEYWORD},
  	{column, COLUMN, RESERVED_KEYWORD},
  	{comment, COMMENT, UNRESERVED_KEYWORD},
  	{commit, COMMIT, UNRESERVED_KEYWORD},
--- 82,89 
  	{close, CLOSE, UNRESERVED_KEYWORD},
  	{cluster, CLUSTER, UNRESERVED_KEYWORD},
  	{coalesce, COALESCE, COL_NAME_KEYWORD},
! 	{collate, COLLATE, UNRESERVED_KEYWORD},
! 	{collation, COLLATION, UNRESERVED_KEYWORD},
  	{column, COLUMN, RESERVED_KEYWORD},
  	{comment, COMMENT, UNRESERVED_KEYWORD},
  	{commit, COMMIT, UNRESERVED_KEYWORD},
***
*** 206,211 
--- 208,215 
  	{language, LANGUAGE, UNRESERVED_KEYWORD},
  	{large, LARGE_P, UNRESERVED_KEYWORD},
  	{last, LAST_P, UNRESERVED_KEYWORD},
+ 	{lccollate, LCCOLLATE, UNRESERVED_KEYWORD},
+ 	{lcctype, LCCTYPE, UNRESERVED_KEYWORD},
  	{leading, LEADING, RESERVED_KEYWORD},
  	{least, LEAST, COL_NAME_KEYWORD},
  	{left, LEFT, TYPE_FUNC_NAME_KEYWORD},
***
*** 270,275 
--- 274,280 
  	{overlay, OVERLAY, COL_NAME_KEYWORD},
  	{owned, OWNED, UNRESERVED_KEYWORD},
  	{owner, OWNER, UNRESERVED_KEYWORD},
+ 	{pad, PAD, UNRESERVED_KEYWORD},
  	{parser, PARSER, UNRESERVED_KEYWORD},
  	{partial, PARTIAL, UNRESERVED_KEYWORD},
  	{password, PASSWORD, UNRESERVED_KEYWORD},
***
*** 317,322 
--- 322,328 
  	{second, SECOND_P, UNRESERVED_KEYWORD},
  	{security, SECURITY, UNRESERVED_KEYWORD},
  	{select, SELECT, RESERVED_KEYWORD},
+ 	{sensitive, SENSITIVE, UNRESERVED_KEYWORD},
  	{sequence, SEQUENCE, UNRESERVED_KEYWORD},
  	{serializable, SERIALIZABLE, UNRESERVED_KEYWORD},
  	{session, SESSION, UNRESERVED_KEYWORD},
***
*** 329,334 
--- 335,341 
  	{simple, SIMPLE, UNRESERVED_KEYWORD},
  	{smallint, SMALLINT, COL_NAME_KEYWORD},
  	{some, SOME, RESERVED_KEYWORD},
+ 	{space, SPACE, UNRESERVED_KEYWORD},
  	{stable, STABLE, UNRESERVED_KEYWORD},
  	{standalone, STANDALONE_P, UNRESERVED_KEYWORD},
  	{start, START, UNRESERVED_KEYWORD},
***
*** 337,342 
--- 344,350 
  	{stdin, STDIN, UNRESERVED_KEYWORD},
  	{stdout, STDOUT, UNRESERVED_KEYWORD},
  	{storage, STORAGE, UNRESERVED_KEYWORD},
+ 	{strcolfn, STRCOLFN, UNRESERVED_KEYWORD},
  	{strict, STRICT_P, UNRESERVED_KEYWORD},
  	{strip, STRIP_P, UNRESERVED_KEYWORD},
  	{substring, SUBSTRING, COL_NAME_KEYWORD},
Index: backend/parser/gram.y
===
RCS file: /projects/cvsroot/pgsql/src/backend/parser/gram.y,v
retrieving revision 2.605
diff -c -r2.605 gram.y
*** backend/parser/gram.y	1 Jan 2008 19:45:50 -	2.605
--- backend/parser/gram.y	1 Jul 2008 14:29:25 -
***
*** 149,168 
  }
  
  %type node	stmt schema_stmt
! 		AlterDatabaseStmt AlterDatabaseSetStmt AlterDomainStmt AlterGroupStmt
! 		AlterObjectSchemaStmt AlterOwnerStmt AlterSeqStmt AlterTableStmt
! 		AlterUserStmt AlterUserSetStmt AlterRoleStmt AlterRoleSetStmt
! 		AnalyzeStmt ClosePortalStmt ClusterStmt CommentStmt
!