Re: [Fwd: Re: [HACKERS] tsearch in core patch]

2007-06-30 Thread Tatsuo Ishii
> Ishii-san, > > >>> Ok, probably we need to copy the English stemming rule to the one for > >>> Japanese. > >> Pardon my ignorance here, but is the concept of stemming even relevant > >> to Japanese/Chinese/Korean? What little I know about ideographic > >> languages suggests it wouldn't work wel

Re: [Fwd: Re: [HACKERS] tsearch in core patch]

2007-06-30 Thread Josh Berkus
Ishii-san, Ok, probably we need to copy the English stemming rule to the one for Japanese. Pardon my ignorance here, but is the concept of stemming even relevant to Japanese/Chinese/Korean? What little I know about ideographic languages suggests it wouldn't work well. And surely the specific

Re: [Fwd: Re: [HACKERS] tsearch in core patch]

2007-06-25 Thread Mike Rylander
On 6/25/07, Tom Lane <[EMAIL PROTECTED]> wrote: "Mike Rylander" <[EMAIL PROTECTED]> writes: > I can certainly understand the benefit of making the default > configuration a simple locale to language map, but there are > definitely uses for searching using different stemmers/stop-lists even > with

Re: [Fwd: Re: [HACKERS] tsearch in core patch]

2007-06-25 Thread Tom Lane
"Mike Rylander" <[EMAIL PROTECTED]> writes: > I can certainly understand the benefit of making the default > configuration a simple locale to language map, but there are > definitely uses for searching using different stemmers/stop-lists even > within the same corpus/index. So, as a datapoint for

Re: [Fwd: Re: [HACKERS] tsearch in core patch]

2007-06-25 Thread Mike Rylander
On 6/25/07, Tom Lane <[EMAIL PROTECTED]> wrote: Well, it's not hard at all to find chunks of English text that have embedded bits of French, Spanish, or what-have-you, but that's not an argument for trying to intermix the stemmers. I doubt that such simple bits of program could tell the language

Re: [Fwd: Re: [HACKERS] tsearch in core patch]

2007-06-24 Thread Tatsuo Ishii
> Tatsuo Ishii <[EMAIL PROTECTED]> writes: > > Ok, probably we need to copy the English stemming rule to the one for > > Japanese. > > Pardon my ignorance here, but is the concept of stemming even relevant > to Japanese/Chinese/Korean? What little I know about ideographic > languages suggests it

Re: [Fwd: Re: [HACKERS] tsearch in core patch]

2007-06-24 Thread Tom Lane
Tatsuo Ishii <[EMAIL PROTECTED]> writes: > Ok, probably we need to copy the English stemming rule to the one for > Japanese. Pardon my ignorance here, but is the concept of stemming even relevant to Japanese/Chinese/Korean? What little I know about ideographic languages suggests it wouldn't work

Re: [Fwd: Re: [HACKERS] tsearch in core patch]

2007-06-24 Thread Tatsuo Ishii
> Tatsuo Ishii wrote: > > > japanese '{ja_JP, C}' > > > > How would we know C -> japanese? > > > You can't do that. You can't have different languages (not locales) > mapping to the same 'tsearch language' because the stemmer doesn't know > that a specific word is in english or japanese. So you

Re: [Fwd: Re: [HACKERS] tsearch in core patch]

2007-06-23 Thread Euler Taveira de Oliveira
Tatsuo Ishii wrote: > japanese '{ja_JP, C}' > > How would we know C -> japanese? > You can't do that. You can't have different languages (not locales) mapping to the same 'tsearch language' because the stemmer doesn't know that a specific word is in english or japanese. So you have two options:

[Fwd: Re: [HACKERS] tsearch in core patch]

2007-06-22 Thread teodor
>> How would this work for initdb with locale C? > > I'm worrying about that too. english '{en_GB, en_US, C}' I suppose that a locale name always has a dot separator, except the C locale, which is a well-known exception.
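Teodor's suggestion is that every locale name except C carries an encoding suffix after a dot. Under that assumption, picking a configuration could work like this: strip the suffix, then look the base name up in a per-configuration locale list. The Python below is a minimal illustration of that idea, not code from the patch. The `TS_CONFIG_LOCALES` table and the `locale_base`/`config_for_locale` helpers are hypothetical. It also assumes locale names of the form `language_TERRITORY.encoding` and ignores modifiers such as `@euro`.

```python
from typing import Dict, List, Optional

# Hypothetical table modelled on the proposal: english '{en_GB, en_US, C}'
TS_CONFIG_LOCALES: Dict[str, List[str]] = {
    "english": ["en_GB", "en_US", "C"],
}


def locale_base(locale: str) -> str:
    """Drop everything from the first dot on, e.g. 'en_US.UTF-8' -> 'en_US'.

    'C' contains no dot, so it is returned unchanged.
    """
    return locale.split(".", 1)[0]


def config_for_locale(locale: str,
                      mapping: Dict[str, List[str]] = TS_CONFIG_LOCALES) -> Optional[str]:
    """Return the first configuration whose locale list contains the base name."""
    base = locale_base(locale)
    for config, locales in mapping.items():
        if base in locales:
            return config
    return None  # no configuration claims this locale


print(config_for_locale("en_US.UTF-8"))   # english
print(config_for_locale("C"))             # english
print(config_for_locale("ja_JP.EUC-JP"))  # None
```

Because `C` has no dot, it goes through the same lookup unchanged. That is exactly why listing it under more than one configuration becomes a problem.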

Re: [Fwd: Re: [HACKERS] tsearch in core patch]

2007-06-22 Thread Tatsuo Ishii
> >> How would this work for initdb with locale C? > > > > I'm worrying about that too. > > english '{en_GB, en_US, C}' > > I suppose that a locale name always has a dot separator, except the C locale, which is a well-known exception. So we would have to?: japanese '{ja_JP, C}' How would we know C
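Ishii-san's question points at a concrete conflict. If both `english` and `japanese` list `C`, a locale-to-configuration lookup no longer has a single answer. One way to catch this is to flag every locale claimed by more than one configuration, which is a check one might run before choosing a default. The Python below is a hypothetical sketch of such a check; the table and the `ambiguous_locales` helper are illustrative and not part of the patch.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical table in which two configurations both claim the C locale
TS_CONFIG_LOCALES: Dict[str, List[str]] = {
    "english": ["en_GB", "en_US", "C"],
    "japanese": ["ja_JP", "C"],
}


def ambiguous_locales(mapping: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return {locale: sorted configs} for each locale listed under more than one config."""
    owners = defaultdict(set)  # locale -> set of configurations that list it
    for config, locales in mapping.items():
        for loc in locales:
            owners[loc].add(config)
    return {loc: sorted(cfgs) for loc, cfgs in owners.items() if len(cfgs) > 1}


print(ambiguous_locales(TS_CONFIG_LOCALES))  # {'C': ['english', 'japanese']}
```

A check like this only detects the ambiguity. Deciding which configuration should own `C` is still the open question in the thread.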