Thanks Erik! Trouble is, I don't know those languages well enough to tell whether my setup is correct, especially for CJK.
It's less problematic for European languages, but then again, should I be using those English filters with the German SnowballPorterFilterFactory? That is, will WordDelimiterFilterFactory work with a German filter? Etc.

It would be nice if folks shared their settings (generic ones for each language) so that we can add them to the Solr Wiki.

-- George

> -----Original Message-----
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, July 02, 2008 9:40 PM
> To: solr-user@lucene.apache.org
> Subject: Re: schema.xml for CJK, German, French, etc.
>
>
> On Jul 2, 2008, at 9:16 PM, George Aroush wrote:
> > Has anyone created schema.xml for languages other than English?
>
> Indeed.
>
> > I'd like to see a working example mainly for CJK, German and French.
> > If you have, can you share them?
> >
> > To get me started, I created the following for German:
> >
> > <fieldtype name="myfieldtype" class="solr.TextField">
> >   <analyzer>
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >             words="stopwords.txt"/>
> >     <filter class="solr.WordDelimiterFilterFactory"
> >             generateWordParts="0"
> >             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> >             catenateAll="0"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.SnowballPorterFilterFactory"
> >             language="German" />
> >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >   </analyzer>
> > </fieldtype>
> >
> > Will those filters work on German text?
>
>
> One tip that will help is visiting
> http://localhost:8983/solr/admin/analysis.jsp
> and testing it out to see that you're getting the tokenization
> that you desire on some sample text. Solr's analysis
> introspection is quite nice and easy to tinker with.
>
> Removing stop words before lower casing won't quite work
> though, as StopFilter is case-sensitive with all stop words
> generally lowercased, but other than relocating the
> StopFilterFactory in that chain it seems reasonable.
>
> As always, though, it depends on what you want to do with
> these languages to offer more concrete recommendations.
>
> Erik
>
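
For reference, here is a rough sketch of the German field type with StopFilterFactory moved after LowerCaseFilterFactory, as Erik suggests. It is the same set of filters as in the original message, only reordered, and it has not been checked against real German text. It assumes that stopwords.txt (the file name from the original config) holds lowercase German stop words; the contents of that file are an assumption. As far as I understand it, ignoreCase="true" should already make the stop filter match regardless of case, so the reordering may only matter if that attribute is dropped.

```xml
<!-- Sketch only: same filters as the original message, with
     StopFilterFactory moved after LowerCaseFilterFactory.
     Not verified against German text. -->
<fieldtype name="myfieldtype" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Assumption: stopwords.txt contains lowercase German stop words -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldtype>
```

Running a few sample German sentences through the analysis.jsp page mentioned above would show, filter by filter, whether the stop words are actually removed and how the stemmer treats the remaining tokens.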