special character with lucene

2005-02-28 Thread Philipp_Breuss
Hello, I would like to build a search engine that handles several different languages, e.g. Spanish names, French names, ... Using a different analyzer for each language would be one solution. But how about replacing each special character (umlauts such as ä, ö, ...) with its HTML special character
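
The replacement idea raised in this message could be sketched roughly as follows. This is only an illustration under stated assumptions: `EntityEncoder` is a hypothetical helper (not part of Lucene), and the entity table is a small hand-picked subset rather than a complete mapping.

```java
import java.util.Map;

// Hypothetical helper, not a Lucene class: rewrites a few accented
// characters as HTML character entities before indexing or querying.
final class EntityEncoder {
    // Illustrative subset only; characters are written as Unicode escapes
    // so the source compiles the same way under any file encoding.
    private static final Map<Character, String> ENTITIES = Map.of(
        '\u00e4', "&auml;",   // a with diaeresis
        '\u00f6', "&ouml;",   // o with diaeresis
        '\u00fc', "&uuml;",   // u with diaeresis
        '\u00e9', "&eacute;", // e with acute accent
        '\u00f1', "&ntilde;", // n with tilde
        '\u00df', "&szlig;"); // sharp s

    // Returns a copy of text with every mapped character replaced by its
    // entity; all other characters are copied through unchanged.
    static String encode(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String entity = ENTITIES.get(c);
            if (entity != null) {
                out.append(entity);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("M\u00fcller"));
    }
}
```

For this to work at all, the same encoding would have to be applied to both the indexed text and the query text. Whether an entity such as `&uuml;` survives tokenization intact depends on the analyzer in use, so that should be checked before relying on this approach.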

Re: special character with lucene

2005-02-28 Thread Erik Hatcher
On Feb 28, 2005, at 10:01 AM, [EMAIL PROTECTED] wrote: "Hello, I would like to build a search engine using several different languages - e.g. Spanish names, French names, ..." Will your text be a mix of languages within a single field? Or would each document (or field) be a single language?

Re: special character with lucene

2005-02-28 Thread Philipp_Breuss
Hatcher [EMAIL PROTECTED] 28.02.2005 16:17 Please reply to: Lucene Developers List lucene-dev@jakarta.apache.org To: Lucene Developers List lucene-dev@jakarta.apache.org Cc: Subject: Re: special character with lucene On Feb 28, 2005, at 10:01 AM, [EMAIL PROTECTED] wrote: Hello, I would

Re: special character with lucene

2005-02-28 Thread slagraulet
Lucene Developers List [EMAIL PROTECTED] ta.apache.org Subject: Re: special character with lucene

WG: Re: special character with lucene

2005-02-28 Thread Philipp_Breuss
28.02.2005 17:04 To: [EMAIL PROTECTED] Cc: Subject: Re: special character with lucene On Monday 28 February 2005 16:36, [EMAIL PROTECTED] wrote: "In a simple test I noticed that StandardAnalyzer removes special characters like ä, ö, ..." It doesn't do that on my system (configured for UTF-8
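
The reply's mention of UTF-8 suggests that the difference between the two systems may come down to character encoding rather than to the analyzer itself. That is an assumption, since the message is cut off here. The sketch below uses only the Java standard library (`CharsetDemo` is a hypothetical name) to show how an umlaut is mangled when UTF-8 bytes are decoded with a different charset, which can make it look as though the character was removed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows what happens to "ä" when its UTF-8 bytes
// are decoded with the right charset and with a mismatched one.
final class CharsetDemo {
    // Encodes text as UTF-8 bytes, as a UTF-8 file on disk would hold it.
    static byte[] utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes bytes using the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] raw = utf8Bytes("\u00e4"); // "ä" occupies two bytes in UTF-8

        // Matching charset: the original character comes back.
        String ok = decode(raw, StandardCharsets.UTF_8);

        // Mismatched charset: each byte becomes its own character.
        String mangled = decode(raw, StandardCharsets.ISO_8859_1);

        System.out.println(ok.length() + " char vs " + mangled.length() + " chars");
    }
}
```

If the text handed to the analyzer has already been decoded with the wrong charset, the tokenizer never sees the intended letter, so making the charset explicit when reading input is worth checking first.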

RE: special character with lucene

2005-02-28 Thread Zhaohui Li
AM To: Lucene Developers List Subject: Re: special character with lucene Usually the text is in one specific language: English, German, Spanish, French, ... However, I don't really have a runtime identifier for which language it is. I could only pick a few words and decide from

Re: special character with lucene

2005-02-28 Thread Steven Rowe
Also, check out Nutch's language identification stuff: URL:http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/src/plugin/languageidentifier/ Zhaohui Li wrote: "Basis Technology has a commercial product, Rosette Language Identifier, to identify the input language. If you are interested, you can send