Re: how to Index and Search non-English Text in Solr

Mohammad Shariq Thu, 09 Jun 2011 22:06:16 -0700

Thanks, Erick, for your help.
I have another silly question.
Suppose I create multiple language-specific fields (each with its own fieldType),
e.g. news_English, news_Chinese, news_Japnese, etc.
After creating these fields, can I copy all of them into a single copyField
destination, "defaultquery", like below:

<copyField source="news_English" dest="defaultquery"/>
<copyField source="news_Chinese" dest="defaultquery"/>
<copyField source="news_Japnese" dest="defaultquery"/>

and my "defaultquery" field looks like this:
<field name="defaultquery" type="query_text" indexed="false" stored="false"
multiValued="true"/>

Is this the right way to deal with indexing and searching in multiple languages?


On 9 June 2011 19:06, Erick Erickson <erickerick...@gmail.com> wrote:

> No, you'd have to create multiple fieldTypes, one for each language.
>
> Best
> Erick
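A minimal sketch of "one fieldType per language" for a Solr 1.4 schema.xml might look like the following. The type names text_en and text_cjk are hypothetical, and the English chain is a trimmed version of the original "text" type. Two constraints suggest why the combined type quoted below is unlikely to behave as hoped: an analyzer chain has a single tokenizer, so WhitespaceTokenizerFactory and CJKTokenizerFactory cannot be stacked, and there is no Chinese Snowball stemmer, so the CJK type relies on CJKTokenizerFactory's overlapping character bigrams instead of stemming.

```xml
<!-- inside the <types> section of schema.xml (type names are hypothetical) -->

<!-- English: whitespace tokenizer plus English Snowball stemming -->
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>

<!-- Chinese/Japanese/Korean: exactly one tokenizer (CJK bigrams), no stemmer -->
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.CJKTokenizerFactory"/>
    <!-- lowercases any Latin-script words mixed into the CJK text -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

A single <analyzer> element is used only to keep the sketch short; Solr applies it at both index and query time. In practice the English type would probably keep separate index and query analyzers (for example, synonyms only at query time), as in the original post.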
>
> On Thu, Jun 9, 2011 at 5:26 AM, Mohammad Shariq <shariqn...@gmail.com> wrote:
> > Can I specify multiple languages in the filter tags in schema.xml, like below?
> >
> > <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >             words="stopwords.txt" enablePositionIncrements="true"/>
> >     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> >             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> >             catenateAll="0" splitOnCaseChange="1"/>
> >
> >     <filter class="solr.SnowballPorterFilterFactory" language="Dutch" />
> >     <filter class="solr.SnowballPorterFilterFactory" language="English" />
> >     <filter class="solr.SnowballPorterFilterFactory" language="Chinese" />
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <tokenizer class="solr.CJKTokenizerFactory"/>
> >
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.SnowballPorterFilterFactory" language="Hungarian" />
> >
> >
> > On 8 June 2011 18:47, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> >> This page is a handy reference for individual languages...
> >> http://wiki.apache.org/solr/LanguageAnalysis
> >>
> >> But the usual approach, especially for Chinese/Japanese/Korean
> >> (CJK) is to index the content in different fields with language-specific
> >> analyzers then spread your search across the language-specific
> >> fields (e.g. title_en, title_fr, title_ar). Stemming and stopwords
> >> particularly give "surprising" results if you put words from different
> >> languages in the same field.
> >>
> >> Best
> >> Erick
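The per-language-field approach described above can be sketched as follows, assuming field types named text_en and text_cjk have been defined with language-appropriate analyzers. Those type names, the field names news_en, news_zh and news_ja, and the handler name /multilang are illustrative assumptions, not taken from the thread. The indexing client has to route each article into the field matching its language, which means it must know or detect that language.

```xml
<!-- inside the <fields> section of schema.xml: one field per language,
     each bound to a language-specific fieldType (all names hypothetical) -->
<field name="news_en" type="text_en"  indexed="true" stored="false"/>
<field name="news_zh" type="text_cjk" indexed="true" stored="false"/>
<field name="news_ja" type="text_cjk" indexed="true" stored="false"/>

<!-- in solrconfig.xml: a request handler that spreads each query
     across the language fields using the dismax query parser -->
<requestHandler name="/multilang" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">news_en news_zh news_ja</str>
  </lst>
</requestHandler>
```

The same effect could be had per request by passing defType=dismax and qf=news_en news_zh news_ja (URL-encoded) to the standard /select handler. With dismax, each field listed in qf applies its own query-time analyzer to the user's input, so the English and CJK chains never have to share one field.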
> >>
> >> On Wed, Jun 8, 2011 at 8:34 AM, Mohammad Shariq <shariqn...@gmail.com>
> >> wrote:
> >> > Hi,
> >> > I have set up Solr (Solr 1.4 on Ubuntu 10.10) for indexing news articles in
> >> > English, but my requirements now extend to indexing news in other languages
> >> > too.
> >> >
> >> > This is how my schema looks:
> >> > <field name="news" type="text" indexed="true" stored="false"
> >> > required="false"/>
> >> >
> >> >
> >> > And the "text" fieldType in schema.xml looks like this:
> >> >
> >> > <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
> >> >    <analyzer type="index">
> >> >       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >> >       <filter class="solr.StopFilterFactory" ignoreCase="true"
> >> >               words="stopwords.txt" enablePositionIncrements="true"/>
> >> >       <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> >> >               generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> >> >               catenateAll="0" splitOnCaseChange="1"/>
> >> >       <filter class="solr.LowerCaseFilterFactory"/>
> >> >       <filter class="solr.SnowballPorterFilterFactory" language="English"
> >> >               protected="protwords.txt"/>
> >> >    </analyzer>
> >> >    <analyzer type="query">
> >> >       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >> >       <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >> >               ignoreCase="true" expand="true"/>
> >> >       <filter class="solr.StopFilterFactory" ignoreCase="true"
> >> >               words="stopwords.txt" enablePositionIncrements="true"/>
> >> >       <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> >> >               generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> >> >               catenateAll="0" splitOnCaseChange="1"/>
> >> >       <filter class="solr.LowerCaseFilterFactory"/>
> >> >       <filter class="solr.SnowballPorterFilterFactory" language="English"
> >> >               protected="protwords.txt"/>
> >> >    </analyzer>
> >> > </fieldType>
> >> >
> >> >
> >> > My problem is:
> >> > Now I want to index news articles in other languages too, e.g. Chinese
> >> > and Japanese.
> >> > How can I modify my text field so that I can index news in other
> >> > languages too and make it searchable?
> >> >
> >> > Thanks
> >> > Shariq
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > View this message in context:
> >> > http://lucene.472066.n3.nabble.com/how-to-Index-and-Search-non-Eglish-Text-in-solr-tp3038851p3038851.html
> >> > Sent from the Solr - User mailing list archive at Nabble.com.
> >> >
> >>
> >
> >
> >
> > --
> > Thanks and Regards
> > Mohammad Shariq
> >
>



-- 
Thanks and Regards
Mohammad Shariq
