On Fri, 22 Jun 2007, Bruce Momjian wrote:
Tom Lane wrote:
Alvaro Herrera <[EMAIL PROTECTED]> writes:
I very much doubt that the different variants of Spanish differ in their
stemming rules, so there's no need for es_ES, es_PE, es_AR, es_CL etc;
but in the case of Portuguese I'm not so sure. Maybe there are other
examples (like Chinese, but I'm not sure how useful tsearch is for
And the .ISO8859-1 part you don't need at all if you accept that the
files are UTF8 by design, as Tom proposed.
Also, the problem we're dealing with here is mainly lack of
standardization of the encoding part of locale names. AFAIK, just about
everybody agrees on "es_ES", "ru_RU", etc; it's the part that comes
after that (if any) that is not too consistent across platforms.
So I see no problem in distinguishing between pt_PT and pt_BR if it
turns out we have to. The trick is to not look at any more of the
locale name than that; and if we standardize on "stopword files are
UTF8" then I don't think we need to.
OK, and the open question is when do we do this default setting. If we
do it in initdb then we can isolate all the detection there.
We can do that at initdb time, but we still have to decide how to map
between the human-readable language name and the language part of the
locale name. Are we going to hardcode it?
Setting the default only at initdb time is not friendly to hosted setups,
where people often have no access to postgresql.conf and so have to
remember to set tsearch_conf_name themselves.
It could be solved with an 'alter user ... set tsearch_conf_name' command, though.
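
To make the mapping question concrete, here is a rough sketch of what a
hardcoded table could look like, written in Python only for illustration.
It looks at nothing beyond the lang_TERRITORY part of the locale name, as
Tom suggests, and tries the territory-specific entry before the plain
language entry, so pt_PT and pt_BR can be split later if that turns out to
be necessary. The function name, the table contents and the "simple"
fallback are all assumptions made for this example, not anything that
exists in tsearch.

```python
import re

# Hypothetical hardcoded table. Keys are either "lang" or "lang_TERRITORY";
# values are configuration names invented for this sketch.
DEFAULT_TABLE = {
    "es": "spanish",
    "ru": "russian",
    "pt": "portuguese",
}

# Matches "es", "es_ES", "es_ES.ISO8859-1", "es_ES@euro" and captures only
# the language and the optional territory. Anything after "." or "@"
# (the encoding or modifier) is deliberately ignored.
_LOCALE_RE = re.compile(r"^([a-z]{2,3})(?:_([A-Z]{2}))?(?=$|[.@])")


def config_for_locale(locale_name, table=DEFAULT_TABLE, fallback="simple"):
    """Return a configuration name for a locale name (illustrative only)."""
    m = _LOCALE_RE.match(locale_name)
    if m is None:
        # Names such as "C" or "POSIX" carry no language information.
        return fallback
    lang, territory = m.group(1), m.group(2)
    if territory is not None:
        # A territory-specific entry wins if the table has one.
        specific = table.get(f"{lang}_{territory}")
        if specific is not None:
            return specific
    return table.get(lang, fallback)


if __name__ == "__main__":
    for name in ("es_ES.ISO8859-1", "es_AR", "ru_RU.KOI8-R", "pt_BR.UTF-8", "C"):
        print(name, "->", config_for_locale(name))
```

With a table like this, adding a separate "pt_BR" key is enough to
distinguish Brazilian Portuguese without touching the lookup code, and the
encoding suffix never enters into the decision.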
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83