On Mon, Oct 1, 2012 at 12:11 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> john knightley <john.knight...@gmail.com> writes:
>> The OS I am using is Ubuntu 12.04, with PostgreSQL 9.1.5 installed on
>> a utf8 locale
>
>> A short 5 line dictionary file is sufficient to test:-
>
>> raeuz
>> 我们
>> 𦘭𥎵
>> 𪽖𫖂
>>
>
>> line 1 "raeuz" Zhuang word written using English letters and show up
>> under ts_vector ok
>> line 2 "我们" uses everyday Chinese word and show up under ts_vector ok
>> line 3 "𦘭𥎵" Zhuang word written using rather old Chinese characters
>> found in Unicode 3.1 which came in about the year 2000 and show up
>> under ts_vector ok
>> line 4 "𪽖𫖂" Zhuang word written using rather old Chinese characters
>> found in Unicode 5.2 which came in about the year 2009 but do not show
>> up under ts_vector ok
>> line 5 "" Zhuang word written using rather old Chinese characters
>> found in PUA area of the font Sawndip.ttf but do not show up under
>> ts_vector ok (Font can be downloaded from
>> http://gdzhdb.l10n-support.com/sawndip-fonts/Sawndip.ttf)
>
> AFAIK there is nothing in Postgres itself that would distinguish, say,
> 𦘭 from 𪽖.  I think this must be down to
> your platform's locale definition: it probably thinks that the former is
> a letter and the latter is not.  You'd have to gripe to the locale
> maintainers to get that fixed.
>
>                         regards, tom lane

PostgreSQL in general does not distinguish between them, but full text
search does:-

    select ts_debug('𦘭 from 𪽖');

gives the result:-

                                  ts_debug
    -------------------------------------------------------------------
     (word,"Word, all letters",𦘭,{english_stem},english_stem,{𦘭})
     (blank,"Space symbols"," ",{},,)
     (asciiword,"Word, all ASCII",from,{english_stem},english_stem,{})
     (blank,"Space symbols"," 𪽖",{},,)
    (4 rows)

The first character is parsed as a word, while the second is lumped in
with the preceding space as a "blank" token and never reaches the
dictionary.

Somewhere there is a dictionary or library that is based on roughly
Unicode 4.0, which includes "𦘭" (U+2662D) but not "𫖂" (U+2B582), which
was added in Unicode 5.2. PUA characters are also dropped in the same
way by the full text search, which is what Google does but which I do
not wish to do.

Regards
John
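
P.S. For anyone who wants to check which Unicode block the test characters fall in, here is a minimal Python sketch. The block ranges and version numbers are hard-coded from the Unicode standard, and the helper name `describe` is made up for illustration. The general category comes from Python's own Unicode tables, not from the platform locale that Tom suspects is involved, so it only shows what an up-to-date classification looks like and says nothing about what the parser actually sees.

```python
import unicodedata

# (first, last, block name, Unicode version that introduced the block).
# Private-use ranges have no assigned characters, so no version is given.
BLOCKS = [
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B", "3.1"),
    (0x2A700, 0x2B73F, "CJK Unified Ideographs Extension C", "5.2"),
    (0xE000, 0xF8FF, "Private Use Area", None),
    (0xF0000, 0xFFFFD, "Supplementary Private Use Area-A", None),
    (0x100000, 0x10FFFD, "Supplementary Private Use Area-B", None),
]


def describe(ch):
    """Return (code point, block name, introducing version, general category)."""
    cp = ord(ch)
    category = unicodedata.category(ch)  # 'Lo' = other letter, 'Co' = private use
    for first, last, name, version in BLOCKS:
        if first <= cp <= last:
            return (f"U+{cp:04X}", name, version, category)
    return (f"U+{cp:04X}", "other", None, category)


if __name__ == "__main__":
    # Escapes are used so the script does not depend on the editor's font.
    for ch in ("\U0002662D", "\U0002B582", "\uE000"):
        print(describe(ch))
```

Both U+2662D and U+2B582 are ordinary letters (category Lo) in current Unicode data; they differ only in the version that introduced them, which fits the guess that some table in the chain predates Unicode 5.2.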