Hello,
I would like to build a search engine that handles several different languages,
e.g. Spanish names, French names, ...
- Using a different analyzer for each language would be one solution.
- But how about replacing each special character (umlauts such as ä, ö, ...)
with its HTML entity (e.g. &auml;)?
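Here is a minimal sketch of the second option, in Java because Lucene is a Java library. Every character found in a small lookup table is replaced by its HTML named entity before the text is passed to the analyzer. The class name `EntityEncoder` and the contents of the table are illustrative assumptions, not part of Lucene. Query strings would need the same encoding. Otherwise a user typing "Müller" would never match the indexed "M&uuml;ller".

```java
import java.util.HashMap;
import java.util.Map;

public class EntityEncoder {
    // Hypothetical, deliberately small mapping table; extend as needed.
    // Characters are written as Unicode escapes so the source file encoding does not matter.
    private static final Map<Character, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put('\u00e4', "&auml;");   // a umlaut
        ENTITIES.put('\u00f6', "&ouml;");   // o umlaut
        ENTITIES.put('\u00fc', "&uuml;");   // u umlaut
        ENTITIES.put('\u00c4', "&Auml;");
        ENTITIES.put('\u00d6', "&Ouml;");
        ENTITIES.put('\u00dc', "&Uuml;");
        ENTITIES.put('\u00df', "&szlig;");  // sharp s
        ENTITIES.put('\u00e9', "&eacute;");
        ENTITIES.put('\u00e8', "&egrave;");
        ENTITIES.put('\u00f1', "&ntilde;");
        ENTITIES.put('\u00e7', "&ccedil;");
    }

    // Replaces every mapped character with its HTML named entity;
    // characters not in the table are copied unchanged.
    public static String encode(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String entity = ENTITIES.get(c);
            if (entity != null) {
                sb.append(entity);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("M\u00fcller, Pe\u00f1a")); // prints M&uuml;ller, Pe&ntilde;a
    }
}
```

Before choosing this approach, check one thing. Tokenizers commonly treat `&` and `;` as punctuation, so an analyzer may split "M&uuml;ller" into several tokens instead of one. Test how StandardAnalyzer handles it on real data.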
On Feb 28, 2005, at 10:01 AM, [EMAIL PROTECTED] wrote:
Will your text be a mix of languages within a single field? Or would
each document (or field) be a single language?
Hatcher [EMAIL PROTECTED]
28.02.2005 16:17
Please reply to: Lucene Developers List lucene-dev@jakarta.apache.org
To: Lucene Developers List lucene-dev@jakarta.apache.org
Subject: Re: special character with lucene
Lucene Developers List [EMAIL PROTECTED]
28.02.2005 17:04
To: [EMAIL PROTECTED]
Subject: Re: special character with lucene
On Monday 28 February 2005 16:36, [EMAIL PROTECTED] wrote:
In a simple test I noticed that StandardAnalyzer removes special
characters like ä, ö, ...
It doesn't do that on my system (configured for UTF-8).
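The reply is cut off, but one plausible explanation (an assumption here) is that the umlauts disappear before they ever reach StandardAnalyzer. If UTF-8 text is decoded with a different charset, such as a non-UTF-8 platform default, each umlaut turns into other characters. The sketch below demonstrates the effect with plain JDK calls by decoding the same UTF-8 bytes twice. The class name `CharsetDemo` is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Decodes raw bytes with an explicitly named charset rather than
    // relying on the platform default.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        // "Kaese" with a real umlaut, written as a Unicode escape.
        String original = "K\u00e4se";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // Correct: bytes decoded with the charset they were written in.
        String right = decode(utf8Bytes, StandardCharsets.UTF_8);
        // Wrong: the two UTF-8 bytes of the umlaut become two Latin-1 characters.
        String wrong = decode(utf8Bytes, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(original)); // true
        System.out.println(wrong.equals(original)); // false
        System.out.println(wrong.length());         // 5
    }
}
```

The same idea applies when reading files for indexing. Wrap the stream as `new InputStreamReader(in, StandardCharsets.UTF_8)` instead of using `FileReader`, which before Java 11 always uses the platform default encoding.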
To: Lucene Developers List
Subject: Re: special character with lucene
Usually the text is in one specific language: English, German, Spanish,
French, ...
However, I don't really have a runtime identifier telling me which language it is. I
could only pick a few words and decide from
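"Picking a few words and deciding" can be sketched as a stopword vote. The code counts how many tokens appear in a short list of very common words for each language and returns the language with the most hits. The class `StopwordLanguageGuesser` and its word lists are hypothetical and far too small for production use. Real language identifiers usually rely on larger statistics, such as character n-gram frequencies.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class StopwordLanguageGuesser {
    // Hypothetical, deliberately tiny stopword lists. LinkedHashMap keeps a
    // fixed iteration order, so ties go to the language listed first.
    private static final Map<String, Set<String>> STOPWORDS = new LinkedHashMap<>();
    static {
        STOPWORDS.put("en", words("the", "and", "of", "is", "to", "in"));
        STOPWORDS.put("de", words("der", "die", "und", "das", "ist", "nicht"));
        STOPWORDS.put("es", words("el", "la", "y", "los", "es", "que"));
        STOPWORDS.put("fr", words("le", "la", "et", "les", "est", "des"));
    }

    private static Set<String> words(String... ws) {
        return new HashSet<>(Arrays.asList(ws));
    }

    // Scores each language by how many tokens are in its stopword list and
    // returns the best-scoring code, or "unknown" when nothing matches.
    public static String guess(String text) {
        // Split on runs of non-letters; \p{L} also covers accented letters.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("\\P{L}+");
        String best = "unknown";
        int bestScore = 0;
        for (Map.Entry<String, Set<String>> e : STOPWORDS.entrySet()) {
            int score = 0;
            for (String t : tokens) {
                if (e.getValue().contains(t)) {
                    score++;
                }
            }
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(guess("Le chat est sur la table")); // prints fr
    }
}
```

The returned code could then select one of the per-language analyzers discussed at the start of the thread.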
Also, check out Nutch's language identification stuff:
URL:http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/src/plugin/languageidentifier/
Zhaohui Li wrote:
Basis Technology has a commercial product, Rosette Language Identifier,
that identifies the input language. If you are interested, you can
send