I don't believe it is valid to ignore CJK characters at U+20000 and above. If they are used in names, they will be stored in the database. If they behave differently from characters below U+FFFF, you will get a bug report sooner or later.
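To illustrate the point, here is a minimal sketch that asks the Unicode character database for the class of a few code points inside and outside the BMP. It uses Python's standard unicodedata module only because that is easy to run; it says nothing about how the regex code itself would look the class up. The sample code points and the helper name classify are my own choices for illustration.

```python
import unicodedata

# Illustrative sample code points (chosen for this sketch, not taken from the thread).
samples = {
    "BMP CJK ideograph": "\u4e2d",
    "CJK Extension B ideograph": "\U00020000",
    "Mathematical bold capital A": "\U0001D400",
    "Mathematical bold digit zero": "\U0001D7CE",
}


def classify(ch):
    """Return (general category, is-alphabetic, is-digit) for one character."""
    return unicodedata.category(ch), ch.isalpha(), ch.isdigit()


results = {label: classify(ch) for label, ch in samples.items()}

for label, (category, is_alpha, is_digit) in results.items():
    code_point = ord(samples[label])
    print(f"U+{code_point:05X} {label}: category={category} "
          f"alpha={is_alpha} digit={is_digit}")
```

The ideograph at U+20000 gets the same letter classification as an ideograph in the BMP, so treating it as "no class at all" would make the two behave differently in [[:alpha:]]-style matching.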
See CJK Extension B, C, and D at http://www.unicode.org/charts/

Also, there are some code points that could be regarded as letters and digits:
http://en.wikipedia.org/wiki/Mathematical_alphanumeric_symbols

On the other hand, it is OK if processing of characters at U+10000 and above is very slow, as long as they are processed correctly, because they are considered rare.

On 2012/02/17, at 23:56, Andrew Dunstan wrote:

> On 02/17/2012 09:39 AM, Tom Lane wrote:
>> Heikki Linnakangas<heikki.linnakan...@enterprisedb.com> writes:
>>> Here's a wild idea: keep the class of each codepoint in a hash table.
>>> Initialize it with all codepoints up to 0xFFFF. After that, whenever a
>>> string contains a character that's not in the hash table yet, query the
>>> class of that character, and add it to the hash table. Then recompile
>>> the whole regex and restart the matching engine.
>>> Recompiling is expensive, but if you cache the results for the session,
>>> it would probably be acceptable.
>> Dunno ... recompiling is so expensive that I can't see this being a win;
>> not to mention that it would require fundamental surgery on the regex
>> code.
>>
>> In the Tcl implementation, no codepoints above U+FFFF have any locale
>> properties (alpha/digit/punct/etc), period. Personally I'd not have a
>> problem imposing the same limitation, so that dealing with stuff above
>> that range isn't really a consideration anyway.
>
> up to U+FFFF is the BMP which is described as containing "characters for
> almost all modern languages, and a large number of special characters." It
> seems very likely to be acceptable not to bother about the locale of code
> points in the supplementary planes.
>
> See <http://en.wikipedia.org/wiki/Plane_%28Unicode%29> for descriptions of
> which sets of characters are involved.
>
> cheers
>
> andrew
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers