:08 PM
To: java-user@lucene.apache.org
Subject: Re: Single Analyzer for multiple European languages
On Mon, 26 Sep 2005, Andrzej Bialecki wrote:
| Shashikant Kore wrote:
|
| > Search:
| > - Get the superset of stopwords by merging the stopwords from all the
| > languages.
|
| This step doesn't make sense. Stopwords ARE language specific. A stopword in
| one language may be a valid content word in another language - e.g. English
| stopwords "is, by, far" mea
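
To make the objection above concrete, here is a minimal sketch in plain Java (no Lucene classes) of what a merged stopword set does to a query. The two word lists are tiny illustrative samples, not Lucene's real stopword lists, and the class and method names are made up for this example. The word "die" is my own illustration: it is a German article and a typical stopword there, but an ordinary content word in English.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopwordMergeDemo {
    // Illustrative samples only (assumption): real stopword lists are much longer.
    static final Set<String> ENGLISH = new HashSet<>(Arrays.asList("is", "by", "the", "was"));
    static final Set<String> GERMAN = new HashSet<>(Arrays.asList("die", "der", "das", "und"));

    // Builds the "superset" of stopwords proposed in the quoted message.
    static Set<String> merge(Set<String> a, Set<String> b) {
        Set<String> all = new HashSet<>(a);
        all.addAll(b);
        return all;
    }

    // Lowercases, splits on whitespace, and drops every token found in the stopword set.
    static List<String> removeStopwords(String text, Set<String> stopwords) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty() && !stopwords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String query = "heroes never die";
        // English-only stopwords keep the content word "die".
        System.out.println(removeStopwords(query, ENGLISH));                // [heroes, never, die]
        // The merged set silently drops it, because "die" is a German stopword.
        System.out.println(removeStopwords(query, merge(ENGLISH, GERMAN))); // [heroes, never]
    }
}
```

With the merged set, an English query loses a term that carries meaning, which is the kind of collision the reply is warning about.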
Hi,
I plan to use Lucene to index documents in multiple languages (i.e.
each document in more than one European language) as follows.
Index:
- Before indexing, find the language of the document (using Nutch's
Language Identifier)
- Use the Analyzer for that language to index the document. Analyzer
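
The indexing steps described so far (identify the language, then analyze with that language's rules) could be sketched roughly as follows. This is plain Java with no Lucene or Nutch dependency: `detectLanguage` is a crude hypothetical stand-in for Nutch's Language Identifier, the per-language stopword sets stand in for language-specific Lucene Analyzers, and all names and word lists are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class PerLanguageIndexingSketch {
    // Hypothetical registry: language code -> stopwords used by that language's analyzer.
    static final Map<String, Set<String>> STOPWORDS = new LinkedHashMap<>();
    static {
        STOPWORDS.put("en", new HashSet<>(Arrays.asList("is", "by", "the", "was")));
        STOPWORDS.put("de", new HashSet<>(Arrays.asList("die", "der", "das", "und")));
    }

    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).trim().split("\\s+");
    }

    // Stand-in for a real language identifier (assumption): picks the language
    // whose stopwords occur most often, or "unknown" if none occur at all.
    static String detectLanguage(String text) {
        String best = "unknown";
        int bestHits = 0;
        for (Map.Entry<String, Set<String>> entry : STOPWORDS.entrySet()) {
            int hits = 0;
            for (String token : tokenize(text)) {
                if (entry.getValue().contains(token)) hits++;
            }
            if (hits > bestHits) {
                bestHits = hits;
                best = entry.getKey();
            }
        }
        return best;
    }

    // Analyzes a document with the stopwords of its own language only.
    static List<String> analyze(String text) {
        Set<String> stopwords = STOPWORDS.get(detectLanguage(text));
        if (stopwords == null) stopwords = Collections.emptySet(); // unknown language: keep everything
        List<String> terms = new ArrayList<>();
        for (String token : tokenize(text)) {
            if (!token.isEmpty() && !stopwords.contains(token)) terms.add(token);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("the dog is by the door"));  // [dog, door]
        System.out.println(analyze("der Hund und die Katze"));  // [hund, katze]
    }
}
```

In a real setup, the detected language code would typically also be stored with the document, so that the same language-specific analysis can be chosen again at search time.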