Re: [HACKERS] How does the tsearch configuration get selected?

Tom Lane Fri, 15 Jun 2007 08:44:16 -0700

Teodor Sigaev <[EMAIL PROTECTED]> writes:
> Hmm. You mean to use the language name in the configuration, use the current
> encoding to determine which dictionary should be used (stemmers for the same
> language differ by encoding), and recode the dictionary files from UTF8 to
> the current locale. Did I understand you right?


Right.

> That's possible to do. But it's an incompatible change and will cause some
> difficulties for DBAs. If the server locale is ISO (or KOI8 or any other)
> and the file is in UTF8, then text editors/tools might be confused.

Well, I'm not as worried about that as I am about the database being
confused ;-).  We need some way to deal with stopword files that are in
a different encoding than the database encoding, and this has to be
proof against accidental or malicious mistakes by the non-superuser
users who are going to be able to specify which stopword file to use.
So I don't want the specification that goes into the CREATE DICTIONARY
command to involve an encoding.

One possibility is that the user-visible specification is just a name
(eg, "english"), but the actual filename out on the filesystem is,
say, name.encoding.stop (eg, "english.utf8.stop") where we use PG's
names for the encodings.  We could just fail if there's not a file
matching the database encoding, or we could try that and then try
utf8, or some other rule.  In any case I'd want it to verify and
convert encoding as necessary while reading.
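
A minimal sketch of that lookup rule, in Python for brevity rather than
anything resembling backend code. The function names, the PG_TO_CODEC
table, and the lowercase encoding spellings are assumptions made up for
illustration; the "database encoding first, then utf8" order is just one
of the rules mentioned above:

```python
import os

# Hypothetical mapping from PG-style encoding names to Python codec names.
PG_TO_CODEC = {"utf8": "utf-8", "latin1": "latin-1", "koi8r": "koi8-r"}


def candidate_encodings(db_encoding):
    """File encodings to try, in order: database encoding, then utf8."""
    encodings = [db_encoding]
    if db_encoding != "utf8":
        encodings.append("utf8")
    return encodings


def load_stopwords(directory, name, db_encoding):
    """Load stopwords for `name`, verified against the database encoding."""
    # The user supplies only a bare name: no path parts, no encoding suffix.
    if not (name.isascii() and name.isalnum()):
        raise ValueError("invalid stopword list name: %r" % name)
    if db_encoding not in PG_TO_CODEC:
        raise ValueError("unknown encoding: %r" % db_encoding)

    for file_encoding in candidate_encodings(db_encoding):
        path = os.path.join(directory, "%s.%s.stop" % (name, file_encoding))
        if not os.path.exists(path):
            continue
        with open(path, "rb") as f:
            raw = f.read()
        # Verify: strict decoding raises UnicodeDecodeError if the file
        # is not valid in the encoding its name claims.
        text = raw.decode(PG_TO_CODEC[file_encoding])
        # Convert: raises UnicodeEncodeError if a word cannot be
        # represented in the database encoding.
        text.encode(PG_TO_CODEC[db_encoding])
        return [line.strip() for line in text.splitlines() if line.strip()]

    raise FileNotFoundError("no stopword file for %r" % name)
```

With only english.utf8.stop on disk, load_stopwords(dir, "english",
"latin1") would fall back to the utf8 file and check that every word
converts to latin1. Since the encoding never appears in the name the
user supplies, a user cannot point the database at a mislabeled file.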

                        regards, tom lane