Re: [HACKERS] How does the tsearch configuration get selected?

2007-06-16 Thread Bruce Momjian

Teodor Sigaev wrote: So, added to my plan (http://archives.postgresql.org/pgsql-hackers/2007-06/msg00618.php) n) single encoded files. That will touch snowball, ispell, synonym, thesaurus and simple dictionaries n+1) use encoding names instead of locale's names in configuration FYI, I

Re: [HACKERS] How does the tsearch configuration get selected?

2007-06-15 Thread Teodor Sigaev

Probably, having default text search configuration is not a good idea and we could just require it as a mandatory parameter, which could eliminate many confusion with selecting text search configuration. Ugh. Having default configuration (by locale or by postgresql.conf or some other way)

Re: [HACKERS] How does the tsearch configuration get selected?

2007-06-15 Thread Bruce Momjian

Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: First, why are we specifying the server locale here since it never changes: It's poorly described. What it should really say is the language that the text-to-be-searched is in. We can actually support multiple languages here

Re: [HACKERS] How does the tsearch configuration get selected?

2007-06-15 Thread Bruce Momjian

Bruce Momjian wrote: My guess right now is that we use a GUC that will default if a pg_catalog configuration name matches the lc_ctype locale name, and we have to throw an error if an accessed index creation GUC doesn't match the current GUC. So we create a pg_catalog full text

Re: [HACKERS] How does the tsearch configuration get selected?

2007-06-15 Thread Teodor Sigaev

1) Require the configuration to be always specified. The problem with this is that casting (::tsquery) and operators (@@) have no way to specify a configuration. it's not comfortable for most often cases 2) Use a GUC that you can set for the configuration, and perhaps default it if

Re: [HACKERS] How does the tsearch configuration get selected?

2007-06-15 Thread Tom Lane

Teodor Sigaev [EMAIL PROTECTED] writes: My guess right now is that we use a GUC that will default if a pg_catalog configuration name matches the lc_ctype locale name, and we have to throw an error if an accessed index creation GUC doesn't match the current GUC. Where will index store index

Re: [HACKERS] How does the tsearch configuration get selected?

2007-06-15 Thread Teodor Sigaev

I'd suggest allowing either full names (swedish) or the standard two-letter abbreviations (sv). But let's stay away from locale names. We can use database's encoding name (the same names used in initdb -E) -- Teodor Sigaev E-mail: [EMAIL PROTECTED]

Re: [HACKERS] How does the tsearch configuration get selected?

2007-06-15 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes: Do locale names vary across operating systems? Yes, which is the fatal flaw in the whole thing. The ru_RU part is reasonably well standardized, but the encoding part is not. Considering that encoding is exactly the part of it we don't care about for this

Re: [HACKERS] How does the tsearch configuration get selected?

2007-06-15 Thread Tom Lane

Teodor Sigaev [EMAIL PROTECTED] writes: I'd suggest allowing either full names (swedish) or the standard two-letter abbreviations (sv). But let's stay away from locale names. We can use database's encoding name (the same names used in initdb -E) AFAICS the encoding name shouldn't be anywhere

Re: [HACKERS] How does the tsearch configuration get selected?

2007-06-15 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes: It's not really the index's problem; IIUC the behavior of the gist and gin index opclasses is not locale-specific. It's the to_tsvector calls that built the tsvector heap column that have a locale specified or implicit. We need some way of annotating the

Re: [HACKERS] How does the tsearch configuration get selected?

2007-06-15 Thread Tom Lane

Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: It's not really the index's problem; IIUC the behavior of the gist and gin index opclasses is not locale-specific. It's the to_tsvector calls that built the tsvector heap column that have a locale specified or

Re: [HACKERS] How does the tsearch configuration get selected?

2007-06-15 Thread Teodor Sigaev

The only reason the TS stuff needs an encoding spec is to figure out how to read an external stop word file. I think my suggestion upthread is a lot better: have just one stop word file per language, store them all in UTF8, and convert to database encoding when loading them. The database Hmm.

Re: [HACKERS] How does the tsearch configuration get selected?

2007-06-15 Thread Teodor Sigaev

It's not really the index's problem; IIUC the behavior of the gist and gin index opclasses is not locale-specific. Right It's the to_tsvector calls that built the tsvector heap column that have a locale specified or implicit. We need some way of annotating the heap column about this. It

Re: [HACKERS] How does the tsearch configuration get selected?

2007-06-15 Thread Tom Lane

Teodor Sigaev [EMAIL PROTECTED] writes: It's the to_tsvector calls that built the tsvector heap column that have a locale specified or implicit. We need some way of annotating the heap column about this. It seems too restrictive to advanced users. Hm, are you trying to say that it's sane to

Re: [HACKERS] How does the tsearch configuration get selected?

2007-06-15 Thread Tom Lane

Teodor Sigaev [EMAIL PROTECTED] writes: Hmm. You mean to use language name in configuration, use current encoding to define which dictionary should be used (stemmers for the same language are different for different encoding) and recode dictionaries file from UTF8 to current locale. Did I

Re: [HACKERS] How does the tsearch configuration get selected?

2007-06-15 Thread Teodor Sigaev

Hm, are you trying to say that it's sane to have different tsvectors in a column computed under different language settings? Maybe we're all Yes, I think so. That might have sense for close languages. Norwegian languages has two dialects and one of them has advanced rules for compound words,

Re: [HACKERS] How does the tsearch configuration get selected?

2007-06-15 Thread Teodor Sigaev

So, added to my plan (http://archives.postgresql.org/pgsql-hackers/2007-06/msg00618.php) n) single encoded files. That will touch snowball, ispell, synonym, thesaurus and simple dictionaries n+1) use encoding names instead of locale's names in configuration Tom Lane wrote: Teodor Sigaev

Re: [HACKERS] How does the tsearch configuration get selected?

2007-06-15 Thread Teodor Sigaev

One possibility is that the user-visible specification is just a name (eg, english), but the actual filename out on the filesystem is, say, name.encoding.stop (eg, english.utf8.stop) where we use PG's names for the encodings. We could just fail if there's not a file matching the database

Re: [HACKERS] How does the tsearch configuration get selected?

2007-06-15 Thread Tom Lane

Teodor Sigaev [EMAIL PROTECTED] writes: But configuration for different languages might be differ, for example russian (and any cyrillic-based) configuration is differ from west-european configuration based on different character sets. Sure. I'm just assuming that the set of stopwords doesn't

Re: [HACKERS] How does the tsearch configuration get selected?

2007-06-15 Thread Teodor Sigaev

Sure. I'm just assuming that the set of stopwords doesn't need to vary depending on the encoding you're using for a language --- that is, if you're willing to convert the encoding then the same stopword list file should serve for all encodings of a given language. Do you think this might be

Re: [HACKERS] How does the tsearch configuration get selected?

2007-06-15 Thread Gregory Stark

Teodor Sigaev [EMAIL PROTECTED] writes: Hm, are you trying to say that it's sane to have different tsvectors in a column computed under different language settings? Maybe we're all Yes, I think so. That might have sense for close languages. Norwegian languages has two dialects and one

Re: [HACKERS] How does the tsearch configuration get selected?

2007-06-15 Thread Teodor Sigaev

To support this sanely though wouldn't you need to know which language rule a tsvector was generated with? Like, have a byte in the tsvector tagging it with the language rule forever more? No. As corner case, dictionary might return just a number or a hash value. What I'm wondering about is

Re: [HACKERS] How does the tsearch configuration get selected?

2007-06-14 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes: First, why are we specifying the server locale here since it never changes: It's poorly described. What it should really say is the language that the text-to-be-searched is in. We can actually support multiple languages here today, the restriction being

Re: [HACKERS] How does the tsearch configuration get selected?

2007-06-14 Thread Oleg Bartunov

On Thu, 14 Jun 2007, Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: First, why are we specifying the server locale here since it never changes: server's locale is used just for one purpose - to select what text search configuration to use by default. Any text search functions can

24 matches
