Re: [HACKERS] How does the tsearch configuration get selected?

Teodor Sigaev Fri, 15 Jun 2007 09:30:18 -0700

> One possibility is that the user-visible specification is just a name
> (eg, "english"), but the actual filename out on the filesystem is,
> say, name.encoding.stop (eg, "english.utf8.stop") where we use PG's
> names for the encodings.  We could just fail if there's not a file
> matching the database encoding, or we could try that and then try
> utf8, or some other rule.  In any case I'd want it to verify and
> convert encoding as necessary while reading.

I have no strong objection to UTF8-encoded files (stop words, ispell, synonym, or thesaurus). Just recode them after reading.
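
To make the lookup rule concrete, here is a minimal sketch in Python of "try the database encoding, then fall back to utf8, and recode after reading". Nothing here is existing tsearch code: the function names, the codec mapping table, and the choice to return the words as bytes in the database encoding are all assumptions made for illustration only.

```python
from pathlib import Path

# Assumed mapping from PG-style encoding names (as used in the file
# name) to Python codec names. Only a few entries, for illustration.
PG_TO_PYTHON_CODEC = {"utf8": "utf-8", "latin1": "latin-1", "koi8r": "koi8_r"}


def candidate_files(name, db_encoding):
    """List (filename, encoding) pairs to try: database encoding first, then utf8."""
    encodings = [db_encoding]
    if db_encoding != "utf8":
        encodings.append("utf8")
    return [(f"{name}.{enc}.stop", enc) for enc in encodings]


def load_stop_words(directory, name, db_encoding):
    """Read the first matching stop file and recode its words to the database encoding."""
    for filename, file_encoding in candidate_files(name, db_encoding):
        path = Path(directory) / filename
        if not path.exists():
            continue
        # Decoding verifies the file: invalid bytes raise UnicodeDecodeError.
        text = path.read_bytes().decode(PG_TO_PYTHON_CODEC[file_encoding])
        words = [line.strip() for line in text.splitlines() if line.strip()]
        # Recode after reading: fails if a word cannot be represented
        # in the database encoding.
        return [w.encode(PG_TO_PYTHON_CODEC[db_encoding]) for w in words]
    raise FileNotFoundError(f"no stop file for {name!r} in {db_encoding} or utf8")
```

With only "russian.utf8.stop" on disk and a koi8r database, the sketch reads the utf8 file and hands back KOI8-R bytes; with no matching file at all it simply fails, which is the other option mentioned above.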

But configurations for different languages may differ. For example, a Russian (or any Cyrillic-based) configuration differs from a West European configuration because they are based on different character sets. So we would need non-obvious rules for stemmers to define exactly which stemmer and stop-word file should be used. For the Russian language with utf8 encoding, lword should use the English stemmer, but for the Italian language it should use the Italian stemmer. A Russian word cannot contain any ASCII characters, but an Italian word may consist only of ASCII.
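
A rough sketch of the kind of per-language rule this implies, in Python. It is only an illustration: the rule table, the stemmer names ("english_stem", "russian_stem", "italian_stem"), the "simple" fallback, and the ASCII-based token classification are assumptions, not the actual parser or dictionary code.

```python
def token_type(word):
    """Classify a word by its characters (a simplified stand-in for the parser)."""
    if word.isascii():
        return "lword"    # ASCII-only word
    if not any(ch.isascii() for ch in word):
        return "nlword"   # word with no ASCII characters
    return "word"         # mixed word


# Hypothetical rule table: which stemmer handles which token type.
STEMMER_RULES = {
    # An ASCII-only word in a Russian text cannot be Russian.
    "russian": {"lword": "english_stem", "nlword": "russian_stem"},
    # An Italian word may be pure ASCII, so every type goes to Italian.
    "italian": {"lword": "italian_stem", "word": "italian_stem",
                "nlword": "italian_stem"},
}


def choose_stemmer(language, word):
    """Pick a stemmer name; fall back to an assumed 'simple' dictionary."""
    return STEMMER_RULES[language].get(token_type(word), "simple")
```

The point of the table is that the mapping from token type to stemmer cannot be derived from the language name alone: the same lword token goes to a different stemmer depending on the configuration.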




--
Teodor Sigaev                                   E-mail: [EMAIL PROTECTED]
                                                   WWW: http://www.sigaev.ru/
