Having the language already separated makes it a lot easier. 

You could add a language suffix (e.g. the three-letter ISO 639-2/B code, see 
https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) to each field that holds 
language-specific text. Alternatively, you could copy an entire field into each 
of the language-analyzed fields and hope that is good enough for matching. 
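
As a rough sketch of the suffix approach, the snippet below renames the text 
fields of a document before it is sent to Solr. The field names ("title", 
"body") and the helper name localize_fields are assumptions made up for 
illustration; the suffixes are the ISO 639-2/B codes for the languages 
mentioned in this thread.

```python
# ISO 639-2/B codes for the languages mentioned in this thread.
LANG_SUFFIX = {
    "arabic": "ara",
    "english": "eng",
    "bengali": "ben",
    "hindi": "hin",
    "malay": "may",
}


def localize_fields(doc, language, text_fields=("title", "body")):
    """Return a copy of doc with language-specific text fields suffixed.

    text_fields is a hypothetical list of fields that need per-language
    analysis; every other field (e.g. "id") is passed through unchanged.
    """
    suffix = LANG_SUFFIX[language.lower()]
    out = {}
    for name, value in doc.items():
        if name in text_fields:
            out[f"{name}_{suffix}"] = value  # e.g. title -> title_may
        else:
            out[name] = value
    return out
```

Each suffixed field (title_may, title_ara, ...) would then be mapped in the 
schema to a field type with the matching language analyzer, for example via 
dynamic fields such as *_may.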

I think Malay is similar enough to Indonesian that the Indonesian analysis 
(https://wiki.apache.org/solr/LanguageAnalysis#Indonesian) should work as a 
starting point. You could extend it by adding your own dictionary (keywords) 
and stopwords, if that is desirable.
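
A minimal sketch of what that could look like, assuming you define the field 
type through the Schema API rather than by editing the schema file by hand. 
The field type name text_may and the file names stopwords_may.txt and 
protwords_may.txt are hypothetical, and you would have to supply those files 
yourself. The Indonesian stemmer here is a stand-in for Malay, not a dedicated 
Malay analyzer.

```python
def malay_field_type(name="text_may",
                     stopwords="stopwords_may.txt",
                     protected="protwords_may.txt"):
    """Build an 'add-field-type' command for the Solr Schema API.

    The name and both file names are assumptions for illustration only.
    """
    return {
        "add-field-type": {
            "name": name,
            "class": "solr.TextField",
            "positionIncrementGap": "100",
            "analyzer": {
                "tokenizer": {"class": "solr.StandardTokenizerFactory"},
                "filters": [
                    {"class": "solr.LowerCaseFilterFactory"},
                    # Your own stopword list, one word per line.
                    {"class": "solr.StopFilterFactory",
                     "ignoreCase": "true", "words": stopwords},
                    # Words listed here are protected from stemming.
                    {"class": "solr.KeywordMarkerFilterFactory",
                     "protected": protected},
                    # Indonesian stemmer reused for Malay (an approximation).
                    {"class": "solr.IndonesianStemFilterFactory",
                     "stemDerivational": "true"},
                ],
            },
        }
    }
```

The resulting dictionary would be serialized to JSON and POSTed to the schema 
endpoint of your collection. Test the stemming on real Malay text in the 
Analysis screen before relying on it.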

/JZ

-----Original Message-----
From: Mugeesh Husain [mailto:muge...@gmail.com] 
Sent: Monday, September 11, 2017 3:46 AM
To: solr-user@lucene.apache.org
Subject: Re: multi language search engine in solr

Thank you, Rick, for your response.

The documents are separated by language rather than being a mix of Arabic, 
English, Bengali, Hindi, and Malay.

I could not find any tokenizer for Malay. Can you suggest one if you know of any, please?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
