Hello Team,
Solr provides several field types out of the box in the managed schema for
different languages, such as English, French, and Japanese.
We are using the common field type "text_general" for our field declarations,
with stopwords.txt for stopword filtering.
<fieldType name="text_general" class="solr.TextField"
           autoGeneratePhraseQueries="true" positionIncrementGap="100"
           multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="20" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.SynonymGraphFilterFactory" expand="true"
            ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
While syncing data to the Solr core, we import text in different languages
(French, English, German, etc.) into these fields.
My question is: should we put the stopwords for all of these languages into the
same "stopwords.txt" file, or how does Solr handle stopwords for different
languages?
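To make the question concrete, the alternative to one shared file that I have in
mind is a separate field type per language, each pointing at its own stopword
list. The fragment below is only a sketch of that idea, not something we have
deployed: the names "text_fr" and "title_fr" and the path
"lang/stopwords_fr.txt" are assumptions modelled on Solr's default configset,
and format="snowball" assumes the file uses the Snowball stopword format, as the
bundled language files do.

```xml
<!-- Sketch only: "text_fr", "title_fr" and the lang/ file path are assumed names -->
<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- French-only stopword list instead of the shared stopwords.txt -->
    <filter class="solr.StopFilterFactory" words="lang/stopwords_fr.txt"
            format="snowball" ignoreCase="true"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field that would hold only French text -->
<field name="title_fr" type="text_fr" indexed="true" stored="true"/>
```

With this layout we would need to know the language of each document at sync
time and route its text to the matching field, so I would like to understand
whether that is the recommended approach or whether a single merged stopword
file is acceptable.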
Warm Regards,
Abhay Kumar | Lead Developer
401/402, Pride Portal, Shivaji Housing Society, Off. S. B. Road | Shivaji
Nagar, Pune-411 016
+91 20 2563 1011 | Mobile: +91 9096644108
anjusoftware.com<https://anjusoftware.com/>