Well, maybe we'd be better off compiling a list of (in?)valid ranges
from the full Unicode database
(http://www.unicode.org/Public/UNIDATA/UnicodeData.txt) and, with every
release of pg, updating the detection logic so that only valid
characters are allowed?
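A rough sketch of the range-list idea follows. Nothing in the thread names an implementation language, so Python is used only for illustration, and `parse_assigned_ranges` and `is_assigned` are hypothetical names. It assumes "valid" means "assigned in the Unicode version the file came from". The parser collapses UnicodeData.txt records into sorted, inclusive (first, last) ranges, expanding the `<..., First>`/`<..., Last>` record pairs the file uses for large blocks such as CJK ideographs and the private-use areas. The lookup then uses a binary search.

```python
import bisect
from typing import Iterable, List, Optional, Tuple

Range = Tuple[int, int]  # inclusive (first, last) code points

def parse_assigned_ranges(lines: Iterable[str]) -> List[Range]:
    """Collapse UnicodeData.txt records (sorted by code point) into merged ranges."""
    ranges: List[Range] = []
    block_start: Optional[int] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        fields = line.split(";")
        cp, name = int(fields[0], 16), fields[1]
        if name.endswith(", First>"):
            # Large blocks are stored as a First/Last pair; remember the start.
            block_start = cp
            continue
        if name.endswith(", Last>"):
            if block_start is None:
                raise ValueError(f"unmatched Last record at {fields[0]}")
            start, block_start = block_start, None
        else:
            start = cp
        if ranges and ranges[-1][1] + 1 >= start:
            # Contiguous with the previous range: extend it instead of appending.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], cp))
        else:
            ranges.append((start, cp))
    return ranges

def is_assigned(cp: int, ranges: List[Range]) -> bool:
    """Binary-search the sorted ranges for the one that could contain cp."""
    # (cp, 0x110000) sorts after every range starting at or before cp,
    # because no range ends above U+10FFFF.
    i = bisect.bisect_right(ranges, (cp, 0x110000)) - 1
    return i >= 0 and ranges[i][0] <= cp <= ranges[i][1]
```

Passing an open copy of UnicodeData.txt to `parse_assigned_ranges` would build the table. Because unassigned code points are simply absent from the file, such a table rejects characters added in later Unicode versions until it is regenerated, which is the per-release update the suggestion implies.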


John Hansen

-----Original Message-----
From: Tatsuo Ishii [mailto:[EMAIL PROTECTED] 
Sent: Saturday, August 07, 2004 8:46 PM
To: John Hansen
Subject: Re: [PATCHES] [HACKERS] UNICODE characters above 0x10000 

> Yes, but the specification allows for 6-byte sequences, or 32-bit
> characters.

UTF-8 is just an encoding specification, not a character set
specification. Unicode only has 17 planes of 256x256 code points in its
specification.
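To make the arithmetic concrete: 17 planes of 256x256 code points give 1,114,112 code points, U+0000 through U+10FFFF. The Python sketch below (`utf8_length` is a hypothetical helper, not a PostgreSQL function) contrasts the current UTF-8 definition in RFC 3629, which stops at U+10FFFF and four bytes, with the older RFC 2279 rules, which encoded values up to 0x7FFFFFFF in sequences of up to six bytes.

```python
# Unicode code space: 17 planes of 256 x 256 code points each.
PLANES = 17
PLANE_SIZE = 256 * 256
MAX_CODE_POINT = PLANES * PLANE_SIZE - 1  # 0x10FFFF

# Exclusive upper bounds for 1- through 6-byte sequences (RFC 2279 layout).
_LIMITS = (0x80, 0x800, 0x10000, 0x200000, 0x4000000, 0x80000000)

def utf8_length(cp: int, rfc3629: bool = True) -> int:
    """Bytes needed to encode cp in UTF-8.

    With rfc3629=True (current UTF-8) only Unicode scalar values are accepted,
    so the answer is at most 4. With rfc3629=False the obsolete RFC 2279 rules
    apply and values up to 0x7FFFFFFF take up to 6 bytes.
    """
    if cp < 0:
        raise ValueError("negative code point")
    if rfc3629 and (cp > MAX_CODE_POINT or 0xD800 <= cp <= 0xDFFF):
        raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
    for nbytes, limit in enumerate(_LIMITS, start=1):
        if cp < limit:
            return nbytes
    raise ValueError("value does not fit in 31 bits")
```

Under the current rules the 5- and 6-byte cases are unreachable, because every code point the character set defines fits in four bytes.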

> As Dennis pointed out, just because they're not used doesn't mean we
> should not allow them to be stored, since there might be someone using
> the high ranges for a private character set, which could very well be
> included in the specification some day.

Then we should expand it to 64-bit, since some day the specification
might be changed :-)

More seriously, Unicode is full of confusion and inconsistency, IMO.
Remember that Unicode advocates once said the merit of Unicode was that
it only required 16-bit width. Now they say they need surrogate pairs
and 32-bit-wide characters...
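Surrogate pairs are also where the 17-plane limit comes from. UTF-16 encodes a code point above U+FFFF as a high surrogate (U+D800..U+DBFF) followed by a low surrogate (U+DC00..U+DFFF), and the 1,024 x 1,024 possible pairs, added to the 65,536 code points of the first plane, end exactly at U+10FFFF. A minimal Python sketch of that mapping (the function names are hypothetical):

```python
from typing import Tuple

def to_surrogates(cp: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is outside the surrogate-pair range")
    v = cp - 0x10000                      # 20 bits remain: 0x00000..0xFFFFF
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

def from_surrogates(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into a code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

The largest pair, (0xDBFF, 0xDFFF), maps back to U+10FFFF, so nothing above that point can be represented in UTF-16 at all.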

Anyway, my point is: if the current Unicode specification only allows a
24-bit range, why do we need to allow usage that violates the
specification?
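If input is held to the specification as argued here, checking it reduces to accepting only RFC 3629 UTF-8. The Python sketch below (the function name is hypothetical, and this is not PostgreSQL's actual verification code) reports the offset of the first byte sequence that is malformed, truncated, overlong, a UTF-16 surrogate, or above U+10FFFF. That last rule also rejects the old 5- and 6-byte forms.

```python
# Smallest code point that legitimately needs 1, 2, or 3 continuation bytes;
# anything below that is an overlong encoding.
_MIN_CP = {1: 0x80, 2: 0x800, 3: 0x10000}

def first_invalid_utf8_offset(data: bytes) -> int:
    """Return the offset of the first invalid sequence under RFC 3629, or -1."""
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:                       # ASCII
            i += 1
            continue
        if 0xC2 <= lead <= 0xDF:
            need, cp = 1, lead & 0x1F
        elif 0xE0 <= lead <= 0xEF:
            need, cp = 2, lead & 0x0F
        elif 0xF0 <= lead <= 0xF4:
            need, cp = 3, lead & 0x07
        else:
            # 0x80-0xBF: stray continuation byte; 0xC0/0xC1: always overlong;
            # 0xF5-0xFF: leads of 4-byte forms above U+10FFFF, of the old
            # 5-/6-byte forms, or bytes that were never valid.
            return i
        if i + need >= n:                     # truncated at end of input
            return i
        for k in range(1, need + 1):
            cont = data[i + k]
            if cont & 0xC0 != 0x80:           # not a 10xxxxxx continuation byte
                return i
            cp = (cp << 6) | (cont & 0x3F)
        if cp < _MIN_CP[need] or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return i
        i += need + 1
    return -1
```

Note that this only enforces the encoding rules; rejecting unassigned code points as well would need a table like the one suggested at the top of the thread.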
Tatsuo Ishii
