RE: Single Analyzer for multiple European languages

2005-09-27 Thread Madhu Satyanarayana Panitini
:08 PM
To: java-user@lucene.apache.org
Subject: Re: Single Analyzer for multiple European languages

On Mon, 26 Sep 2005, Andrzej Bialecki wrote:
| Shashikant Kore wrote:
|
| > Search:
| > - Get the superset of stopwords by merging the stopwords from all the
| > languages.
|
| This ste

Re: Single Analyzer for multiple European languages

2005-09-27 Thread Endre Stølsvik
On Mon, 26 Sep 2005, Andrzej Bialecki wrote:
| Shashikant Kore wrote:
|
| > Search:
| > - Get the superset of stopwords by merging the stopwords from all the
| > languages.
|
| This step doesn't make sense. Stopwords ARE language specific. A stopword in
| one language may be a valid content word

Re: Single Analyzer for multiple European languages

2005-09-26 Thread Andrzej Bialecki
Shashikant Kore wrote:
> Search:
> - Get the superset of stopwords by merging the stopwords from all the
>   languages.

This step doesn't make sense. Stopwords ARE language specific. A stopword in one language may be a valid content word in another language - e.g. English stopwords "is, by, far" mea
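
The objection to a merged stopword list can be illustrated with a small sketch in plain Java (no Lucene dependency). The two word lists below are tiny made-up samples, not the stopword sets shipped with any analyzer, and the class and method names are hypothetical. The German article "die" is used as the colliding word because it is also an ordinary English verb.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopwordMergeSketch {
    // Illustrative samples only (assumption): real stopword lists are much longer.
    static final Set<String> ENGLISH = Set.of("the", "is", "by", "a");
    static final Set<String> GERMAN = Set.of("der", "die", "das", "und");

    // Returns the tokens that are not in the given stopword set, in order.
    static List<String> removeStopwords(List<String> tokens, Set<String> stopwords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopwords.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    // The "superset" approach: union of the stopwords of all languages.
    static Set<String> merged() {
        Set<String> all = new HashSet<>(ENGLISH);
        all.addAll(GERMAN);
        return all;
    }

    public static void main(String[] args) {
        List<String> englishQuery = List.of("plants", "die", "by", "frost");
        // English-only list keeps the content word "die".
        System.out.println(removeStopwords(englishQuery, ENGLISH));
        // Merged list also drops "die", because it is a German stopword.
        System.out.println(removeStopwords(englishQuery, merged()));
    }
}
```

With the English list alone the query keeps "plants", "die" and "frost"; with the merged list "die" is removed as well, so an English search for that word would silently lose a content term.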

Single Analyzer for multiple European languages

2005-09-26 Thread Shashikant Kore
Hi,

I plan to use Lucene to index documents in multiple languages (i.e. each document in more than one European language) as follows.

Index:
- Before indexing, find the language of the document (using Nutch's Language Identifier).
- Use the Analyzer for that language to index the document.

Analyzer
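
The index-time steps described in this message (identify the language, then analyze with that language's rules) might look roughly like the following plain-Java sketch. It does not use Lucene or Nutch: the language code is passed in as a string standing in for whatever the language identifier returns, the stopword samples are made up, and `PerLanguageAnalysisSketch` and `analyze` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class PerLanguageAnalysisSketch {
    // Assumption: one stopword set per language code; tiny samples for illustration.
    static final Map<String, Set<String>> STOPWORDS = Map.of(
            "en", Set.of("the", "is", "by"),
            "de", Set.of("der", "die", "das"));

    // Tokenizes on anything that is not a letter or digit, lowercases,
    // and removes only the stopwords of the detected language.
    // An unknown language code falls back to no stopword removal.
    static List<String> analyze(String text, String detectedLanguage) {
        Set<String> stop = STOPWORDS.getOrDefault(detectedLanguage, Set.of());
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !stop.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Die Katze ist hier", "de"));
        System.out.println(analyze("They die by frost", "en"));
    }
}
```

Keeping the stopword sets separate per language means "die" is dropped from the German text but preserved in the English one. In a real setup the map values would be the language-specific Analyzer instances rather than bare word sets.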