Also, check out Nutch's language identification stuff:
http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/src/plugin/languageidentifier/
Zhaohui Li wrote:
Basis Technology has a commercial product Rosette Language Identifier
to identify the input language. If you are interested, you can
send ema
To: Lucene Developers List
Subject: Re: special character with lucene
Usually the text is in one specific language. English, German, Spanish,
French, ...
However, I don't really have a runtime identifier for which language it is. I
could only pick a few words and decide from there (?) - if this is
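
The "pick a few words and decide" idea above can be sketched as a stopword vote. This is only an illustration under stated assumptions: the class name `StopwordLanguageGuesser`, the method `guess`, and the tiny word lists are all hypothetical and not part of Lucene, Nutch, or any product mentioned in this thread.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: guesses a language by counting common function words. */
class StopwordLanguageGuesser {
    // Tiny illustrative word lists (an assumption); real lists would be much larger.
    private static final Map<String, Set<String>> STOPWORDS = new LinkedHashMap<>();
    static {
        STOPWORDS.put("en", new HashSet<>(Arrays.asList("the", "and", "of", "is", "to")));
        STOPWORDS.put("de", new HashSet<>(Arrays.asList("der", "die", "und", "ist", "nicht")));
        STOPWORDS.put("es", new HashSet<>(Arrays.asList("el", "los", "y", "es", "que")));
        STOPWORDS.put("fr", new HashSet<>(Arrays.asList("le", "les", "et", "est", "que")));
    }

    /** Returns the language code with the most stopword hits, or "unknown" if nothing matches. */
    static String guess(String text) {
        String best = "unknown";
        int bestHits = 0;
        // Split on anything that is not a Unicode letter, so ä, ö, é stay inside words.
        String[] words = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        for (Map.Entry<String, Set<String>> entry : STOPWORDS.entrySet()) {
            int hits = 0;
            for (String word : words) {
                if (entry.getValue().contains(word)) {
                    hits++;
                }
            }
            // Strictly greater: on a tie the language listed first wins.
            if (hits > bestHits) {
                bestHits = hits;
                best = entry.getKey();
            }
        }
        return best;
    }
}
```

A call such as `StopwordLanguageGuesser.guess(documentText)` would return a code like "de" that could then be used to choose an analyzer. Very short inputs such as single names may contain no function words at all, in which case the sketch returns "unknown".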
28.02.2005 17:04
To: [EMAIL PROTECTED]
Subject: Re: special character with lucene
On Monday 28 February 2005 16:36, [EMAIL PROTECTED] wrote:
> In a simple test I noticed that StandardAnalyzer removes special
> characters like ä, ö, ...
It doesn't do that on my system (con
To: "Lucene Developers List" <[EMAIL PROTECTED]>
Subject: Re: special character with lucene
Erik Hatcher <[EMAIL PROTECTED]>
28.02.2005 16:17
Reply-To: "Lucene Developers List"
To: "Lucene Developers List"
Subject: Re: special character with lucene
On Feb 28, 2005, at 10:01 AM, [EMAIL PROTECTED] wrote:
> Hello,
> I would like to build a search engine using several different languages -
> e.g. Spanish names, French names, ...
Will your text be a mix of languages within a single field? Or would
each document (or field) be a single language?
- Usi
Hello,
I would like to build a search engine using several different languages -
e.g. Spanish names, French names, ...
- Using a different analyzer for each language would be one solution.
- But how about replacing each special character (umlauts such as ä, ö, ...)
with its HTML special character b
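
The second idea in the list above, replacing each special character with an HTML entity before indexing, could look roughly like the following. This is a minimal sketch under stated assumptions: `EntityEscaper` and `escape` are hypothetical names, the mapping table covers only a handful of characters, and nothing here calls into Lucene.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: replaces selected accented characters with HTML entities. */
class EntityEscaper {
    // Small illustrative table (an assumption), not a complete entity list.
    // Unicode escapes keep the source file independent of its encoding.
    private static final Map<Character, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put('\u00e4', "&auml;");   // ä
        ENTITIES.put('\u00f6', "&ouml;");   // ö
        ENTITIES.put('\u00fc', "&uuml;");   // ü
        ENTITIES.put('\u00c4', "&Auml;");   // Ä
        ENTITIES.put('\u00d6', "&Ouml;");   // Ö
        ENTITIES.put('\u00dc', "&Uuml;");   // Ü
        ENTITIES.put('\u00df', "&szlig;");  // ß
        ENTITIES.put('\u00e9', "&eacute;"); // é
        ENTITIES.put('\u00f1', "&ntilde;"); // ñ
    }

    /** Returns a copy of text with every mapped character replaced by its entity. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String entity = ENTITIES.get(c);
            // Characters without a mapping are copied through unchanged.
            out.append(entity != null ? entity : String.valueOf(c));
        }
        return out.toString();
    }
}
```

If this route is taken, the same replacement has to be applied to query strings as well as to the indexed text, otherwise the two will not match. Whether a given analyzer keeps a string like `M&uuml;ller` together as one token is not checked by this sketch and would need to be tested.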