Currently tsearch2 does not accept non-ASCII stop words if the locale is
C. The included patches should fix the problem. The patches are against
PostgreSQL 8.2.3.
I'm not sure about the correctness of the patch's description.
First, the p_islatin() function is used only in the word/lexeme parser, not in
the stop-word code. Second, p_islatin() is used for catching lexemes such as
URLs or HTML entities, so it is important that it identifies real Latin
characters. And it works correctly: it calls p_isalpha (already patched for
your case), then it calls p_isascii, which should be correct for any encoding
with the C locale.
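
For illustration only, here is a minimal sketch of that check as described
above. The names mirror the tsearch2 parser functions, but the real p_is*
functions operate on the parser state rather than a single byte, so treat the
signatures here as assumptions, not the actual source:

#include <ctype.h>

/* Sketch only: not the actual tsearch2 code. */
static int
p_isascii_sketch(unsigned char c)
{
    return c <= 0x7F;               /* pure 7-bit ASCII */
}

static int
p_isalpha_sketch(unsigned char c)
{
    return isalpha(c);              /* locale-dependent alphabetic test */
}

/* A byte is "latin" only if it is alphabetic AND ASCII, so multibyte
 * input (e.g. UTF8 Cyrillic) is never classified as latin, which is
 * what the URL/HTML-entity lexeme detection relies on. */
static int
p_islatin_sketch(unsigned char c)
{
    return p_isalpha_sketch(c) && p_isascii_sketch(c);
}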
Third (and last):
contrib_regression=# show server_encoding;
 server_encoding
-----------------
 UTF8

contrib_regression=# show lc_ctype;
 lc_ctype
----------
 C

contrib_regression=# select lexize('ru_stem_utf8', RUSSIAN_STOP_WORD);
 lexize
--------
 {}
Russian characters in UTF8 take two bytes each.
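
If it helps to see the two-byte point concretely, here is a small standalone C
example; the Cyrillic letter used is just an arbitrary illustration, not taken
from the patch:

#include <stdio.h>
#include <string.h>

/* Count UTF8 characters by skipping continuation bytes (10xxxxxx). */
static size_t
utf8_strlen(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char) *s & 0xC0) != 0x80)
            n++;
    return n;
}

int
main(void)
{
    const char *word = "\xD0\xB8";  /* Cyrillic letter "и" in UTF8 */
    printf("bytes = %zu, chars = %zu\n", strlen(word), utf8_strlen(word));
    /* prints: bytes = 2, chars = 1 */
    return 0;
}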
--
Teodor Sigaev E-mail: [EMAIL PROTECTED]
WWW: http://www.sigaev.ru/