Re: a question for french analyzer

2007-07-30 Thread Erick Erickson
Yes, the character set we use is, as I remember, MARC-8. Which I don't think is the ISOLatin, but since I didn't know about that filter when we had our problem, I didn't even look. Oh well, smarter/braver/lazier next time ... Which is why I love this list, I find things like this and loo

RE: a question for french analyzer

2007-07-30 Thread Renaud Waldura
etter test to be sure... --Renaud -Original Message- From: Chris Lu [mailto:[EMAIL PROTECTED] Sent: Monday, July 30, 2007 1:36 PM To: java-user@lucene.apache.org Subject: Re: a question for french analyzer Hi, Erick, I added ISOLatin1AccentFilter to FrenchAnalyzer following Samir'

Re: a question for french analyzer

2007-07-30 Thread Chris Lu
Hi, Erick, I added ISOLatin1AccentFilter to FrenchAnalyzer following Samir's tip, and it works great! And I think it's the right way to go. Problems like "You have to store the data raw for display purposes if you want the accents to show though" will go away since the Analyzer already has the origin

Re: a question for french analyzer

2007-07-30 Thread Chris Lu
Hi, Samir, Thanks a lot for this tip! It works great! -- Chris Lu - Instant Scalable Full-Text Search On Any Database/Application site: http://www.dbsight.net demo: http://search.dbsight.com Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Crea

RE: a question for french analyzer

2007-07-30 Thread Samir Abdou
Hi, Take a look at the class ISOLatin1AccentFilter! Add this to your analyzer and it should work! Hope this helps, Samir -Original Message- From: Chris Lu [mailto:[EMAIL PROTECTED] Sent: Monday, July 30, 2007 20:06 To: java-user@lucene.apache.org Subject: a question for frenc

Re: a question for french analyzer

2007-07-30 Thread Erick Erickson
Gosh, I sure hope not, because that would mean that we rolled our own for no good reason. We wound up just collapsing the input stream by substituting plain old 'e' for all the accented variants before indexing and before searching. Be *really* careful what character set you're using. Actually, we
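
For readers who want to see what "collapsing" accented variants before indexing and before searching can look like, here is a minimal sketch. It is not the code Erick's team wrote and it is not Lucene's ISOLatin1AccentFilter; it swaps in the JDK's java.text.Normalizer instead. The class name AccentFolding and the method fold are hypothetical names made up for this illustration. It assumes the text is already a correctly decoded Java String, so MARC-8 or other non-Unicode input would have to be converted first. It only removes combining marks, so ligatures such as "œ" pass through unchanged.

```java
import java.text.Normalizer;

// Hypothetical helper for illustration only; not part of Lucene.
final class AccentFolding {
    private AccentFolding() {}

    // Decompose each character (NFD) so that an accented letter becomes
    // base letter + combining mark, then drop the combining marks.
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of source-file encoding.
        String indexedTerm = fold("\u00e9l\u00e8ve"); // accented form in the document
        String queryTerm = fold("eleve");             // what a user might type
        System.out.println(indexedTerm + " / " + queryTerm
                + " match: " + indexedTerm.equals(queryTerm));
    }
}
```

The important part is that the same fold step runs on both sides: once on the text going into the index and once on the query string, so that accented and unaccented spellings end up as the same term.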