john knightley <john.knight...@gmail.com> writes:
> The OS I am using is Ubuntu 12.04, with PostgreSQL 9.1.5 installed on
> a utf8 local
> A short 5-line dictionary file is sufficient to test:
>
> raeuz
> 我们
> ð¦ð¥µ
> ðª½ð«
> ó¶ó´®¬
>
> Line 1, "raeuz", is a Zhuang word written using English letters, and it
> shows up under ts_vector OK.
> Line 2, "我们", is an everyday Chinese word, and it shows up under
> ts_vector OK.
> Line 3, "ð¦ð¥µ", is a Zhuang word written using rather old Chinese
> characters found in Unicode 3.1, which came out in about the year 2000,
> and it shows up under ts_vector OK.
> Line 4, "ðª½ð«", is a Zhuang word written using rather old Chinese
> characters found in Unicode 5.2, which came out in about the year 2009,
> but it does not show up under ts_vector.
> Line 5, "ó¶ó´®¬", is a Zhuang word written using rather old Chinese
> characters found in the PUA area of the font Sawndip.ttf, but it does not
> show up under ts_vector. (The font can be downloaded from
> http://gdzhdb.l10n-support.com/sawndip-fonts/Sawndip.ttf)

AFAIK there is nothing in Postgres itself that would distinguish, say,
ð¦ from ðª½. I think this must be down to your platform's locale
definition: it probably thinks that the former is a letter and the latter
is not. You'd have to gripe to the locale maintainers to get that fixed.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
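One way to check the locale explanation is to ask what Unicode itself says about each kind of character. The sketch below uses Python's standard `unicodedata` module, which reads Python's bundled Unicode database and not the platform C locale that PostgreSQL's parser consults. The three code points are hypothetical representatives of each block (the first ideograph of CJK Extension B, the first of Extension C, and the start of the Plane 15 Private Use Area); they are not the exact characters from the test file. If Unicode classifies Extension B and Extension C ideographs the same way, a locale that treats them differently is plausibly just carrying older character data.

```python
import unicodedata

# Hypothetical representative code points, one per kind of character in the
# thread; these are NOT the exact characters from the original test file.
SAMPLES = {
    "CJK Extension B (Unicode 3.1)": 0x20000,
    "CJK Extension C (Unicode 5.2)": 0x2A700,
    "Plane 15 Private Use Area": 0xF0000,
}


def unicode_category(cp: int) -> str:
    """Return the Unicode general category of a code point, according to
    Python's bundled Unicode database (independent of the OS locale)."""
    return unicodedata.category(chr(cp))


def is_unicode_letter(cp: int) -> bool:
    # General categories beginning with "L" (Lu, Ll, Lt, Lm, Lo) are letters.
    return unicode_category(cp).startswith("L")


if __name__ == "__main__":
    for label, cp in SAMPLES.items():
        print(f"U+{cp:05X} {label}: {unicode_category(cp)} "
              f"letter={is_unicode_letter(cp)}")
```

Both ideograph samples come back as `Lo` (other letter), while the private-use sample comes back as `Co`. That suggests PUA characters such as those in Sawndip.ttf may not be treated as letters by any Unicode-based classifier, even one with current data.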