I don't believe it is valid to ignore CJK characters above U+20000.
If it is used for names, it will be stored in the database.
If the behaviour is different from characters below U+FFFF, you will
get a bug report in meanwhile.

see
CJK Extension B, C, and D
from
http://www.unicode.org/charts/

Also, there are some code points that could be regarded alphabet and numbers
http://en.wikipedia.org/wiki/Mathematical_alphanumeric_symbols

On the other hand, it is ok if processing of characters above U+10000 is very 
slow, 
as far as properly processed, because it is considered rare.


On 2012/02/17, at 23:56, Andrew Dunstan wrote:

> 
> 
> On 02/17/2012 09:39 AM, Tom Lane wrote:
>> Heikki Linnakangas<heikki.linnakan...@enterprisedb.com>  writes:
>>> Here's a wild idea: keep the class of each codepoint in a hash table.
>>> Initialize it with all codepoints up to 0xFFFF. After that, whenever a
>>> string contains a character that's not in the hash table yet, query the
>>> class of that character, and add it to the hash table. Then recompile
>>> the whole regex and restart the matching engine.
>>> Recompiling is expensive, but if you cache the results for the session,
>>> it would probably be acceptable.
>> Dunno ... recompiling is so expensive that I can't see this being a win;
>> not to mention that it would require fundamental surgery on the regex
>> code.
>> 
>> In the Tcl implementation, no codepoints above U+FFFF have any locale
>> properties (alpha/digit/punct/etc), period.  Personally I'd not have a
>> problem imposing the same limitation, so that dealing with stuff above
>> that range isn't really a consideration anyway.
> 
> 
> up to U+FFFF is the BMP which is described as containing "characters for 
> almost all modern languages, and a large number of special characters." It 
> seems very likely to be acceptable not to bother about the locale of code 
> points in the supplementary planes.
> 
> See <http://en.wikipedia.org/wiki/Plane_%28Unicode%29> for descriptions of 
> which sets of characters are involved.
> 
> 
> cheers
> 
> andrew
> 
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
> 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to