Re: [HACKERS] charset/collation in values

2004-11-02 Thread Peter Eisentraut
On Monday, 1 November 2004 07:41, Dennis Bjorklund wrote:
 For each type we need to have conversion functions to and from strings.
 Any suggestion of how to represent these as strings now that it's a string
 plus two oid's? This is a tough one..

A collation implies a character set, so you only need to store one piece of 
information anyway.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] charset/collation in values

2004-11-02 Thread Dennis Bjorklund
On Tue, 2 Nov 2004, Peter Eisentraut wrote:

 A collation implies a character set, so you only need to store one piece of 
 information anyway.

No, a collation implies a character repertoire like UCS (Unicode); it can
apply to several character sets like UTF8 and UTF16.

One can enumerate all combinations if one wants to, as suggested
previously.

-- 
/Dennis Björklund




Re: [HACKERS] charset/collation in values

2004-11-02 Thread Tatsuo Ishii
 On Monday, 1 November 2004 07:41, Dennis Bjorklund wrote:
  For each type we need to have conversion functions to and from strings.
  Any suggestion of how to represent these as strings now that it's a string
  plus two oid's? This is a tough one..
 
 A collation implies a character set, so you only need to store one piece of 
 information anyway.

In my understanding the relation between charset and collation is
1:N. Thus storing only a collation is sufficient to determine the
charset. However a charset cannot determine a collation.
--
Tatsuo Ishii
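
The 1:N relation described above can be sketched as a lookup table. This is only an illustration: the collation names and the `COLLATION_CHARSET` table are hypothetical and do not correspond to any existing PostgreSQL catalog.

```python
from typing import Dict, List

# Hypothetical catalog: every collation names exactly one charset,
# while one charset may be shared by many collations (1:N).
COLLATION_CHARSET: Dict[str, str] = {
    "utf8_sv": "UTF8",
    "utf8_en": "UTF8",
    "latin1_sv": "LATIN1",
    "latin1_de": "LATIN1",
}


def charset_of(collation: str) -> str:
    """A collation determines its charset uniquely."""
    return COLLATION_CHARSET[collation]


def collations_of(charset: str) -> List[str]:
    """A charset maps to several collations, so it cannot select one."""
    return sorted(c for c, cs in COLLATION_CHARSET.items() if cs == charset)
```

Under this assumption, storing only the collation identifier in a value is enough, because the charset can always be recovered from it.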



Re: [HACKERS] charset/collation in values

2004-11-02 Thread Peter Eisentraut
On Tuesday, 2 November 2004 13:15, Dennis Bjorklund wrote:
 On Tue, 2 Nov 2004, Peter Eisentraut wrote:
  A collation implies a character set, so you only need to store one piece
  of information anyway.

 No, a collation implies a character repertoire like UCS (Unicode); it can
 apply to several character sets like UTF8 and UTF16.

For the theoretical specification of a collation, it might suffice to know the 
character repertoire.  But I think in practice, the implementation of a 
collation will require knowing the specific character encoding.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] charset/collation in values

2004-11-02 Thread Peter Eisentraut
On Tuesday, 2 November 2004 13:53, Tatsuo Ishii wrote:
 In my understanding the relation between charset and collation is
 1:N. Thus storing only a collation is sufficient to determine the
 charset. However a charset cannot determine a collation.

Exactly.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] charset/collation in values

2004-11-02 Thread Dennis Bjorklund
On Tue, 2 Nov 2004, Peter Eisentraut wrote:

 For the theoretical specification of a collation, it might suffice to
 know the character repertoire.  But I think in practice, the
 implementation of a collation will require knowing the specific
 character encoding.

The named entity that is called a collation works for a character
repertoire. It would need to handle different charsets for that repertoire
of course. So there would be one collation called say ucs_sv and not 
utf8_sv, utf16_sv, utf32_sv.

Anyway, this is not a problem.

-- 
/Dennis Björklund
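
A collation defined on the repertoire rather than on one encoding could be sketched as below. This is a rough illustration under stated assumptions: `ucs_compare` is a hypothetical name, and plain code point order stands in for real Swedish collation rules, which a collation such as `ucs_sv` would have to implement.

```python
def ucs_compare(a: bytes, a_enc: str, b: bytes, b_enc: str) -> int:
    """Compare two encoded strings on the level of the shared repertoire.

    Both inputs are decoded to code points first, so the same comparison
    works whether a value arrives as UTF-8, UTF-16 or UTF-32.
    Code point order is only a stand-in for language-specific rules.
    """
    ka = [ord(ch) for ch in a.decode(a_enc)]
    kb = [ord(ch) for ch in b.decode(b_enc)]
    return (ka > kb) - (ka < kb)
```

The cost of this design is the decoding step on every comparison, which is one way to read the objection that the implementation needs to know the specific encoding.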




Re: [HACKERS] charset/collation in values

2004-11-02 Thread Peter Eisentraut
Dennis Bjorklund wrote:
 The named entity that is called a collation works for a character
 repertoire. It would need to handle different charsets for that
 repertoire of course. So there would be one collation called say
 ucs_sv and not utf8_sv, utf16_sv, utf32_sv.

Again, theoretically, this might work, but I doubt that this is a 
practical implementation.  Moreover, since Unicode is more or less the 
only character repertoire that has more than one encoding in use, 
and neither UTF-16 nor UTF-32 can be used inside the PostgreSQL server 
(embedded zero bytes etc.), this is really a nonissue.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
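
The "embedded zero bytes" point can be checked with a short sketch. The helper names are hypothetical; `c_string_view` merely imitates how code that treats data as a NUL-terminated C string would see the bytes.

```python
def has_embedded_nul(text: str, encoding: str) -> bool:
    """Return True if the encoded form of text contains a zero byte."""
    return b"\x00" in text.encode(encoding)


def c_string_view(encoded: bytes) -> bytes:
    """Imitate a NUL-terminated reader: everything after the first zero byte is lost."""
    return encoded.split(b"\x00", 1)[0]


sample = "Björklund"
utf8_ok = not has_embedded_nul(sample, "utf-8")     # UTF-8 has no zero bytes here
utf16_bad = has_embedded_nul(sample, "utf-16-le")   # 'B' becomes 0x42 0x00
truncated = c_string_view(sample.encode("utf-16-le"))
```

For text without a NUL character, UTF-8 never produces a zero byte, whereas UTF-16 and UTF-32 produce them for every ASCII character.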




Re: [HACKERS] charset/collation in values

2004-11-02 Thread Tatsuo Ishii
 Dennis Bjorklund wrote:
  The named entity that is called a collation works for a character
  repertoire. It would need to handle different charsets for that
  repertoire of course. So there would be one collation called say
  ucs_sv and not utf8_sv, utf16_sv, utf32_sv.
 
 Again, theoretically, this might work, but I doubt that this is a 
 practical implementation.  Moreover, since Unicode is more or less the 
 only character repertoire that has more than one encoding in use, 
 and neither UTF-16 nor UTF-32 can be used inside the PostgreSQL server 
 (embedded zero bytes etc.), this is really a nonissue.

I agree with Peter.
--
Tatsuo Ishii



Re: [HACKERS] charset/collation in values

2004-11-01 Thread Thomas Hallgren
Dennis Bjorklund wrote:
 I've looked into storing charset/collation in the string values. This
 means that we change varchar/text/BpChar to be structures that have a
 charset oid field and a collation oid field, the rest of the Datum is the
 string data.

I think the number of charset/collation combinations will be relatively 
few so perhaps it would be space efficient to maintain a table where 
each combination is given an oid and have string values store that 
rather than two separate oid's?

Regards,
Thomas Hallgren


Re: [HACKERS] charset/collation in values

2004-11-01 Thread Tom Lane
Thomas Hallgren [EMAIL PROTECTED] writes:
 I think the number of charset/collation combinations will be relatively 
 few so perhaps it would be space efficient to maintain a table where 
 each combination is given an oid and have string values store that 
 rather than two separate oid's?

In fact, we should do our best to get the overhead down to 1 or 2 bytes.
Two OIDs (8 bytes) is ridiculous.

I'm not sure if 1 byte is enough or not --- there might be more than 256
charsets/collations to support.  2 ought to be plenty though.

regards, tom lane
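
The combined-identifier idea with a 2-byte header could look roughly like this. It is a sketch only: `CombinationRegistry`, `pack_value` and `unpack_value` are hypothetical names, and the little-endian layout is an assumption, not anything PostgreSQL defines.

```python
import struct


class CombinationRegistry:
    """Assigns one small integer to each (charset, collation) pair."""

    def __init__(self):
        self._ids = {}
        self._pairs = []

    def id_for(self, charset, collation):
        key = (charset, collation)
        if key not in self._ids:
            if len(self._pairs) >= 0x10000:
                raise OverflowError("more than 65536 combinations")
            self._ids[key] = len(self._pairs)
            self._pairs.append(key)
        return self._ids[key]

    def pair_for(self, combo_id):
        return self._pairs[combo_id]


def pack_value(combo_id, data):
    # "<H" is an unsigned 16-bit header: 2 bytes of overhead per value
    return struct.pack("<H", combo_id) + data


def unpack_value(buf):
    (combo_id,) = struct.unpack_from("<H", buf)
    return combo_id, buf[2:]


# For comparison: two 32-bit OIDs would cost 8 bytes per value.
TWO_OIDS_SIZE = struct.calcsize("<II")
```

A single byte would cap the table at 256 combinations; two bytes allow 65536, which matches the "2 ought to be plenty" estimate.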



Re: [HACKERS] charset/collation in values

2004-11-01 Thread Dennis Bjorklund
On Mon, 1 Nov 2004, Tom Lane wrote:

  I think the number of charset/collation combinations will be relatively 
  few so perhaps it would be space efficient to maintain a table where 
  each combination is given an oid and have string values store that 
  rather than two separate oid's?
 
 In fact, we should do our best to get the overhead down to 1 or 2 bytes.
 Two OIDs (8 bytes) is ridiculous.

Just to be clear, we don't want to store it on disk no matter what since
it should be enough to store it once for each column. As a first solution
we could store it just to keep it simple until we have tried it out.

-- 
/Dennis Björklund




Re: [HACKERS] charset/collation in values

2004-11-01 Thread Tatsuo Ishii
 On Mon, 1 Nov 2004, Tom Lane wrote:
 
   I think the number of charset/collation combinations will be relatively 
   few so perhaps it would be space efficient to maintain a table where 
   each combination is given an oid and have string values store that 
   rather than two separate oid's?
  
  In fact, we should do our best to get the overhead down to 1 or 2 bytes.
  Two OIDs (8 bytes) is ridiculous.
 
 Just to be clear, we don't want to store it on disk no matter what since
 it should be enough to store it once for each column. As a first solution
 we could store it just to keep it simple until we have tried it out.

Right. AFAIK nobody has proposed storing charsets/collations on disk.
--
Tatsuo Ishii



Re: [HACKERS] charset/collation in values

2004-11-01 Thread Thomas Hallgren
Tatsuo Ishii wrote:
 Right. AFAIK nobody has proposed storing charsets/collations on disk.

My apologies in that case. I triggered on Dennis's wording: "If we want to 
avoid storing charset/collation both in the column type and in each row, 
we would need an extra layer that transforms the Datums before they are 
stored. As a first implementation it's easier to just store everything."

Regards,
Thomas Hallgren
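
The "extra layer" mentioned in the quoted wording might be sketched as a pair of conversion functions. Everything here is hypothetical: the 2-byte `HEADER` layout, the function names, and the idea that the column's combination ID is passed in by the caller.

```python
import struct

# Hypothetical 2-byte per-value header holding a charset/collation ID.
HEADER = struct.Struct("<H")


def to_disk(in_memory: bytes, column_combo_id: int) -> bytes:
    """Drop the per-value header; the column definition already records it."""
    (combo_id,) = HEADER.unpack_from(in_memory)
    if combo_id != column_combo_id:
        raise ValueError("value does not match the column's charset/collation")
    return in_memory[HEADER.size:]


def from_disk(on_disk: bytes, column_combo_id: int) -> bytes:
    """Re-attach the header when the value is read back into memory."""
    return HEADER.pack(column_combo_id) + on_disk
```

This saves the header bytes per row, at the price of an in-memory format that differs from the on-disk one and a transformation on every read and write.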


Re: [HACKERS] charset/collation in values

2004-11-01 Thread Tom Lane
Tatsuo Ishii [EMAIL PROTECTED] writes:
 Right. AFAIK nobody has proposed storing charsets/collations on disk.

Oh?

Personally, I'd much sooner eat those few bytes than try to impose a
regime where in-memory representation is different from on-disk.

regards, tom lane
