[HACKERS] Refined LC_COLLATE or multiple database clusters?

2004-06-09 Thread Grega Bremec
Hello, List,

I recently stumbled across a problem that I can't really get across.

We have a database cluster (PG 7.4.2) that was initialized as follows:

$ pg_controldata /data/dir
pg_control version number:72
Catalog version number:   200310211
Database cluster state:   in production
pg_control last modified: sre 09 jun 2004 03:00:26 CEST
Current log file ID:  20
Next log file segment:63
... snip irrelevant checkpoint info ...
Database block size:  8192
Blocks per segment of large relation: 131072
Maximum length of identifiers:64
Maximum number of function arguments: 32
Date/time type storage:   64-bit integers
Maximum length of locale name:128
LC_COLLATE:   C
LC_CTYPE: C

Inside that cluster, there are several databases using different encodings:

template1=# \l
 List of databases
Name |  Owner   | Encoding 
-+--+--
 db1 | ownera   | UNICODE
 db2 | ownera   | UNICODE
 db3 | ownerb   | LATIN2
 db4 | ownerc   | LATIN2
 db5 | ownera   | LATIN2
 db6 | ownera   | LATIN2
 template0   | postgres | UNICODE
 template1   | postgres | UNICODE

Collate order for those databases, however, needs to be different. Obviously,
db3, db4, db5 and db6 will want to use some collate ordering scheme based on
ISO-8859-2, whereas the other two could possibly have table- or even column-
based collate requirements, as they contain unicode data in UTF-8 encoding,
which doesn't give any warranties wrt the contents of these databases.

Producing a list of requirements and/or imposing certain conventions on the
format of data stored in those tables is outside the scope of my authorities,
the only reasonable assumption I can make is that these databases could be
limited to one collating order per database (or several databases, as it is)
without much hassle.

Also, running several postmasters on this same machine is not an option, as
it only has 1.5GB RAM, of which only 1GB is available for cache (~260MB is
swapped after roughly a month's uptime, but that doesn't change much after
settling down in a week or two).

My question to the list would be the following:

Is it possible to do either of these things that could solve this problem
adequately:

- somehow manage to make one postmaster run on top of two separate
  database clusters that would each have a different collate ordering
  scheme

- use some other method of initializing one database from a different
  template and taking with it LC_COLLATE setting (I suppose not, as
  the ${PGDATA}/global/ directory is global to the cluster)

- use a patch that would add such functionality or upgrade to a version
  (even if release-candidate, beta is not really an option, i gather)
  of PostgreSQL that supported it

- in absence of any other viable solution, change the global setting of
  the database cluster without having to dump/reinitdb/restore it

I thank you in advance for your valuable input,
-- 
Grega Bremec
Senior Administrator
Noviforum Ltd., Software  Media
http://www.noviforum.si/


pgp2j0EXb4ihK.pgp
Description: PGP signature


Re: [HACKERS] Refined LC_COLLATE or multiple database clusters?

2004-06-09 Thread Honza Pazdziora
On Wed, Jun 09, 2004 at 12:33:03PM +0200, Grega Bremec wrote:
 
 Collate order for those databases, however, needs to be different. Obviously,

[...]

 Is it possible to do either of these things that could solve this problem
 adequately:
 
 - somehow manage to make one postmaster run on top of two separate
   database clusters that would each have a different collate ordering
   scheme
 
 - use some other method of initializing one database from a different
   template and taking with it LC_COLLATE setting (I suppose not, as
   the ${PGDATA}/global/ directory is global to the cluster)
 
 - use a patch that would add such functionality or upgrade to a version
   (even if release-candidate, beta is not really an option, i gather)
   of PostgreSQL that supported it
 
 - in absence of any other viable solution, change the global setting of
   the database cluster without having to dump/reinitdb/restore it

If you do not need the collating sequence to affect index operations,
you can use nls_string function to sort using

order by nls_string(column, 'sl_SI.utf-8')

where any locale can be specified in the runtime. The nls_string
result can also be used to compare strings in collating manner, however,
indexes will not be used in that case, which may or may not be a
problem for your usage.

The source of nls_string with installation instructions can be found
at

http://www.fi.muni.cz/~adelton/l10n/

-- 

 Honza Pazdziora | [EMAIL PROTECTED] | http://www.fi.muni.cz/~adelton/
 .project: Perl, mod_perl, DBI, Oracle, large Web systems, XML/XSL, ...
Only self-confident people can be simple.

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly