Tom Lane wrote:
> Something that was annoying me yesterday was that it was not clear
> whether we had fixed every single place that uses a tsearch config file
> to assume that the file is in UTF8 and should be converted to database
> encoding.  So I was thinking of hardwiring the "recode" part into
> readstopwords, and using wordop just for the "lowercase" part, which
> seemed to me like a saner division of labor.  That is, UTF8 is a policy
> that we want to enforce globally, but lowercasing maybe not, and this
> still leaves the door open for more processing besides lowercasing.
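
To make that division of labor concrete, here is a rough standalone sketch. It is not the actual tsearch code: every sketch_* name is hypothetical, the recode step is only a placeholder copy where the UTF8-to-database-encoding conversion would go, and the lowercasing is ASCII-only. The point is just that the reader always recodes, while the per-word operation is optional and supplied by the caller.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-word callback type, standing in for "wordop". */
typedef void (*sketch_wordop) (char *word);

/*
 * Placeholder for the "recode" step.  A real implementation would convert
 * from UTF8 to the database encoding; this sketch only returns a malloc'd
 * copy so that the control flow can be shown.
 */
static char *
sketch_recode(const char *utf8_line)
{
	size_t		len = strlen(utf8_line);
	char	   *copy = malloc(len + 1);

	if (copy != NULL)
		memcpy(copy, utf8_line, len + 1);
	return copy;
}

/* Example wordop: ASCII-only lowercasing, done in place. */
static void
sketch_lowercase(char *word)
{
	for (; *word != '\0'; word++)
		*word = (char) tolower((unsigned char) *word);
}

/*
 * The reader hardwires the recode step for every caller; wordop may be
 * NULL when no further processing is wanted.  Caller frees the result.
 */
static char *
sketch_read_word(const char *raw_line, sketch_wordop wordop)
{
	char	   *word;

	assert(raw_line != NULL);
	word = sketch_recode(raw_line);
	if (word != NULL && wordop != NULL)
		wordop(word);
	return word;
}
```

A caller that wants lowercased stopwords would pass sketch_lowercase; one that wants the words untouched would pass NULL, and the recoding still happens either way.
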
I think we also want to always run input files through pg_verify_mbstr.
We do it for stopwords and for synonym files (though incorrectly), but
not for thesaurus files or ispell files. It's probably best to do that
within the recode function as well.

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com