Well, maybe we'd be better off compiling a list of (in?)valid ranges
from the full Unicode database, and updating the detection logic with
every release of pg so that only valid characters are allowed?
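
A minimal sketch of what such a range table could look like, written in
Python rather than the backend's C, with Python's bundled unicodedata
module standing in for the full Unicode database. The names
build_valid_ranges and is_valid are hypothetical, and treating unassigned
(Cn) and surrogate (Cs) code points as invalid is an assumption; none of
this reflects pg's actual code.

```python
import unicodedata
from bisect import bisect_right

MAX_CODE_POINT = 0x10FFFF  # last code point of plane 16


def build_valid_ranges():
    """Collapse acceptable code points into sorted, inclusive (start, end) ranges.

    Assumption: "valid" means neither unassigned (Cn) nor surrogate (Cs)
    according to the Unicode data shipped with this Python build.
    """
    ranges = []
    start = None
    for cp in range(MAX_CODE_POINT + 1):
        ok = unicodedata.category(chr(cp)) not in ("Cn", "Cs")
        if ok and start is None:
            start = cp                      # a new valid run begins here
        elif not ok and start is not None:
            ranges.append((start, cp - 1))  # the run ended at the previous code point
            start = None
    if start is not None:
        ranges.append((start, MAX_CODE_POINT))
    return ranges


RANGES = build_valid_ranges()
STARTS = [start for start, _ in RANGES]


def is_valid(cp):
    """Binary-search the range table for the code point cp."""
    i = bisect_right(STARTS, cp) - 1
    return i >= 0 and RANGES[i][0] <= cp <= RANGES[i][1]
```

The table would have to be regenerated whenever the Unicode database
changes, which is the per-release update mentioned above. Private-use
code points (category Co) are accepted by this sketch.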
From: Tatsuo Ishii [mailto:[EMAIL PROTECTED]
Sent: Saturday, August 07, 2004 8:46 PM
To: John Hansen
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED];
Subject: Re: [PATCHES] [HACKERS] UNICODE characters above 0x10000
> Yes, but the specification allows for 6byte sequences, or 32bit
UTF-8 is just an encoding specification, not a character set
specification. Unicode only defines 17 planes of 256x256 code points
each.
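
For reference, the arithmetic behind the 17-plane figure can be checked
with a small Python sketch. The helper utf8_length is hypothetical, and
its 5- and 6-byte branches follow the original UTF-8 bit layout (up to
31 bits); that this is the layout the 6-byte remark refers to is an
assumption.

```python
PLANES = 17              # planes 0 through 16
PLANE_SIZE = 256 * 256   # 65,536 code points per plane

total_code_points = PLANES * PLANE_SIZE         # 1,114,112, i.e. 0x110000
highest_code_point = total_code_points - 1      # 0x10FFFF
bits_needed = highest_code_point.bit_length()   # bits required for the largest value

# Exclusive upper bounds for 1- to 6-byte sequences in the original UTF-8 layout.
UTF8_LIMITS = (0x80, 0x800, 0x10000, 0x200000, 0x4000000, 0x80000000)


def utf8_length(cp):
    """Return the number of bytes needed to encode cp in that layout."""
    if cp < 0:
        raise ValueError("code point must be non-negative")
    for length, limit in enumerate(UTF8_LIMITS, start=1):
        if cp < limit:
            return length
    raise ValueError("value does not fit in 31 bits")
```

Every code point in the 17 planes fits in 21 bits and in at most 4 UTF-8
bytes; the 5- and 6-byte forms only cover values above 0x10FFFF.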
> As Dennis pointed out, just because they're not used, doesn't mean we
> should not allow them to be stored, since there might be someone using
> the high ranges for a private character set, which could very well be
> included in the specification some day.
We should expand it to 64-bit since some day the specification might be
More seriously, Unicode is filled with tons of confusion and
inconsistency IMO. Remember that Unicode advocates once said that the
merit of Unicode was that it only required 16-bit width. Now they say
they need surrogate pairs and 32-bit width chars...
Anyway, my point is: if the current specification of Unicode only allows
a 24-bit range, why do we need to allow usage that goes against the
specification?