Can I specify multiple languages in the filter tags in schema.xml, like below?
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Dutch" />
    <filter class="solr.SnowballPorterFilterFactory" language="English" />
    <filter class="solr.SnowballPorterFilterFactory" language="Chinese" />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <tokenizer class="solr.CJKTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Hungarian" />

On 8 June 2011 18:47, Erick Erickson <erickerick...@gmail.com> wrote:
> This page is a handy reference for individual languages...
> http://wiki.apache.org/solr/LanguageAnalysis
>
> But the usual approach, especially for Chinese/Japanese/Korean
> (CJK), is to index the content in different fields with language-specific
> analyzers, then spread your search across the language-specific
> fields (e.g. title_en, title_fr, title_ar). Stemming and stopwords
> in particular give "surprising" results if you put words from different
> languages in the same field.
>
> Best
> Erick
>
> On Wed, Jun 8, 2011 at 8:34 AM, Mohammad Shariq <shariqn...@gmail.com>
> wrote:
> > Hi,
> > I have set up Solr (solr-1.4 on Ubuntu 10.10) for indexing news articles in
> > English, but my requirements now extend to indexing news in other languages
> > too.
> >
> > This is how my schema looks:
> > <field name="news" type="text" indexed="true" stored="false" required="false"/>
> >
> > And the "text" field in schema.xml looks like this:
> >
> > <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
> >     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
> >     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
> >   </analyzer>
> > </fieldType>
> >
> > My problem is:
> > Now I want to index news articles in other languages too, e.g.
> > Chinese and Japanese.
> > How can I modify my text field so that I can index news in other languages
> > too and make it searchable?
> >
> > Thanks
> > Shariq
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/how-to-Index-and-Search-non-Eglish-Text-in-solr-tp3038851p3038851.html
> > Sent from the Solr - User mailing list archive at Nabble.com.

--
Thanks and Regards
Mohammad Shariq
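A minimal sketch of the per-language-field approach Erick describes, assuming Solr 1.4 and only the factories already used in this thread. The type names (text_en, text_cjk), field names (news_en, news_cjk) and handler name (/news) are hypothetical. CJK text gets its own tokenizer instead of a Snowball stemmer, since Snowball has no Chinese stemmer. This also assumes your indexing code decides which language field each article is sent to; Solr does not do that routing for you in this setup.

```xml
<!-- schema.xml, inside <types>: one analyzer chain per language (hypothetical type names) -->
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Only English text reaches this stemmer, so other languages are not mangled by it -->
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Splits runs of Chinese/Japanese/Korean characters into overlapping two-character tokens; no stemming -->
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- schema.xml, inside <fields>: one field per language (hypothetical field names) -->
<field name="news_en"  type="text_en"  indexed="true" stored="false" required="false"/>
<field name="news_cjk" type="text_cjk" indexed="true" stored="false" required="false"/>

<!-- solrconfig.xml: search both language fields in one request via dismax (hypothetical handler name) -->
<requestHandler name="/news" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">news_en news_cjk</str>
  </lst>
</requestHandler>
```

With something like this in place, a request such as http://localhost:8983/solr/news?q=earthquake (host, port and query are placeholders) would be run against both news_en and news_cjk, each using its own analyzer at query time. More languages would follow the same pattern: one field type, one field, and one more name in qf.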