Re: [HACKERS] Server-side support of all encodings

2007-06-24 Thread William ZHANG
Tom Lane [EMAIL PROTECTED] ITAGAKI Takahiro [EMAIL PROTECTED] writes: PostgreSQL supports SJIS, BIG5, GBK, UHC and GB18030 as client encodings, but we cannot use them as server encodings. Is there any reason for it? Very much so --- they aren't safe ASCII-supersets, and thus for example the

Re: [HACKERS] Server-side support of all encodings

2007-06-24 Thread Tom Lane
William ZHANG [EMAIL PROTECTED] writes: Sorry. I still cannot understand why backend encodings must have this property. AFAIK, the parser treats characters as ASCII. So any multi-byte characters will be treated as two or more ASCII characters. But if the multi-byte encoding does not use any

Re: [HACKERS] Server-side support of all encodings

2007-04-15 Thread Tatsuo Ishii
Tatsuo Ishii [EMAIL PROTECTED] writes: I think the best way to proceed is probably to fix this in HEAD but not back-patch it. During a dump and reload the encoding can be corrected to something safe. Ok. Shall I go ahead and remove JOHAB in HEAD? +1 for me.

Re: [HACKERS] Server-side support of all encodings

2007-04-15 Thread Tom Lane
Tatsuo Ishii [EMAIL PROTECTED] writes: BTW, do we have to modify pg_dump or pg_restore so that it can automatically adjust JOHAB to UTF8 (it's the only safe encoding compatible with JOHAB)? I'm not sure it's worth the trouble. Maybe documenting in the release note is enough? Do we actually

Re: [HACKERS] Server-side support of all encodings

2007-04-14 Thread Tatsuo Ishii
Tatsuo Ishii [EMAIL PROTECTED] writes: Sigh. From the first day JOHAB was supported (back in the 7.3 days), it should not have been among the server encodings. JOHAB's second byte definitely contains 0x41 and above. *johab*.map just reflects that fact. I think we should remove JOHAB from the

Re: [HACKERS] Server-side support of all encodings

2007-04-14 Thread Tom Lane
Tatsuo Ishii [EMAIL PROTECTED] writes: I think the best way to proceed is probably to fix this in HEAD but not back-patch it. During a dump and reload the encoding can be corrected to something safe. Ok. Shall I go ahead and remove JOHAB in HEAD? +1 for me. regards,

Re: [HACKERS] Server-side support of all encodings

2007-03-29 Thread Dezso Zoltan
pointers on how to proceed?) Thank you, Zaki -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane Sent: Monday, March 26, 2007 11:20 AM To: ITAGAKI Takahiro Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Server-side support of all encodings

Re: [HACKERS] Server-side support of all encodings

2007-03-29 Thread Martijn van Oosterhout
On Wed, Mar 28, 2007 at 10:44:00AM +0900, Dezso Zoltan wrote: My question is, however: what would be the best practice if it was imperative to use SJIS encoding for texts and no built-in conversions are useful? To elaborate, I need to support Japanese emoji characters, which are special

Re: [HACKERS] Server-side support of all encodings

2007-03-29 Thread Tatsuo Ishii
Hello Everyone, I very much understand why SJIS is not a server encoding. It contains ASCII second bytes (including \ and ', both of which can be really nasty inside normal SQL) and further, half-width katakana are represented as one-byte characters, incidentally two of which coincide with
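
The "ASCII second bytes" problem described in this message can be seen with a small sketch. It assumes Python 3 and its standard `shift_jis` codec, neither of which appears in the thread; the helper name `byte_positions` is hypothetical. It encodes a quoted string holding one kanji and reports where a byte-wise scanner would see a backslash.

```python
# Illustration only: uses Python's standard "shift_jis" codec, not PostgreSQL source.
def byte_positions(raw, target):
    """Return every offset at which the byte value `target` occurs in `raw`."""
    return [i for i, b in enumerate(raw) if b == target]

literal = "'表'"                      # a quoted SQL string holding one kanji
sjis = literal.encode("shift_jis")    # second byte of the kanji is 0x5C
utf8 = literal.encode("utf-8")        # every byte of the kanji is >= 0x80

print(" ".join(f"{b:02x}" for b in sjis))
print("backslash bytes in SJIS:", byte_positions(sjis, 0x5C))
print("backslash bytes in UTF-8:", byte_positions(utf8, 0x5C))
```

In the SJIS bytes, the backslash byte sits directly before the closing quote, so a scanner that works byte by byte and honors backslash escapes could read the quote as escaped and run past the end of the string.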

Re: [HACKERS] Server-side support of all encodings

2007-03-26 Thread Tatsuo Ishii
ITAGAKI Takahiro [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] wrote: Backend encodings must have the property that all bytes of a multibyte character are >= 128. But then, PG_JOHAB has already infringed it. Please see johab_to_utf8.map. Trailing bytes of JOHAB can be less than

Re: [HACKERS] Server-side support of all encodings

2007-03-26 Thread Tom Lane
Tatsuo Ishii [EMAIL PROTECTED] writes: Sigh. From the first day JOHAB was supported (back in the 7.3 days), it should not have been among the server encodings. JOHAB's second byte definitely contains 0x41 and above. *johab*.map just reflects that fact. I think we should remove JOHAB from the server

Re: [HACKERS] Server-side support of all encodings

2007-03-25 Thread Tom Lane
ITAGAKI Takahiro [EMAIL PROTECTED] writes: PostgreSQL supports SJIS, BIG5, GBK, UHC and GB18030 as client encodings, but we cannot use them as server encodings. Is there any reason for it? Very much so --- they aren't safe ASCII-supersets, and thus for example the parser will fail on them.

Re: [HACKERS] Server-side support of all encodings

2007-03-25 Thread ITAGAKI Takahiro
Tom Lane [EMAIL PROTECTED] wrote: PostgreSQL supports SJIS, BIG5, GBK, UHC and GB18030 as client encodings, but we cannot use them as server encodings. Is there any reason for it? Very much so --- they aren't safe ASCII-supersets, and thus for example the parser will fail on them.

Re: [HACKERS] Server-side support of all encodings

2007-03-25 Thread Tom Lane
ITAGAKI Takahiro [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] wrote: Backend encodings must have the property that all bytes of a multibyte character are >= 128. But then, PG_JOHAB has already infringed it. Please see johab_to_utf8.map. Trailing bytes of JOHAB can be less than 128.
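
The property quoted here (every byte of a multibyte character is >= 128) can be checked for a given character with a short sketch. It assumes Python 3 and its standard `utf-8`, `euc_kr` and `johab` codecs, which are stand-ins for the PostgreSQL encodings of the same names and may differ from them in detail; the function name `all_bytes_high` is hypothetical.

```python
# Illustration only: Python's standard codecs, not PostgreSQL's conversion maps.
def all_bytes_high(text, encoding):
    """True if every multibyte character in `text` encodes to bytes that are all >= 0x80."""
    for ch in text:
        raw = ch.encode(encoding)
        if len(raw) > 1 and any(b < 0x80 for b in raw):
            return False
    return True

hangul = "가"  # a single Hangul syllable used as a probe
for enc in ("utf-8", "euc_kr", "johab"):
    raw = hangul.encode(enc)
    print(enc, raw.hex(), "safe" if all_bytes_high(hangul, enc) else "has a byte < 0x80")
```

For this probe character, the JOHAB encoding has a trailing byte in the ASCII letter range, which is the situation the message points out, while the UTF-8 and EUC-KR encodings keep every byte at 0x80 or above.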

Re: [HACKERS] Server-side support of all encodings

2007-03-25 Thread Ioseph Kim
Tom Lane [EMAIL PROTECTED] wrote: PostgreSQL supports SJIS, BIG5, GBK, UHC and GB18030 as client encodings, but we cannot use them as server encodings. Is there any reason for it? Very much so --- they aren't safe ASCII-supersets