Having the languages already separated makes this a lot easier. You could add a language suffix (e.g. the 3-letter ISO 639-2/B code, see https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) to each field that holds language-specific text. Alternatively, you could copy an entire field into each of the language-analyzed fields and hope that is good enough for matching.
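To make the suffix idea concrete, here is a minimal Python sketch of how an indexing client might rename fields before sending documents to Solr. The helper name `to_solr_doc`, the `language` field, and the input field names are made up for illustration; the codes are the ISO 639-2/B ones for the five languages in this thread. It assumes the schema defines matching fields (or dynamic fields such as `*_ara`) with a suitable analyzer per language.

```python
# ISO 639-2/B codes for the languages mentioned in this thread.
LANG_CODES = {
    "arabic": "ara",
    "english": "eng",
    "bengali": "ben",
    "hindi": "hin",
    "malay": "may",
}


def to_solr_doc(doc_id, lang, fields):
    """Build a dict whose text fields carry a language suffix.

    Hypothetical helper: `fields` maps plain field names (e.g. "title")
    to their text; the result uses names such as "title_eng".
    """
    code = LANG_CODES[lang.lower()]
    # "language" is an assumed extra field, handy for filtering at query time.
    solr_doc = {"id": doc_id, "language": code}
    for name, value in fields.items():
        solr_doc[f"{name}_{code}"] = value
    return solr_doc


if __name__ == "__main__":
    print(to_solr_doc("42", "Malay", {"title": "Selamat datang"}))
```

At query time the client would then search the suffixed field for the language it is interested in, so each language goes through its own analysis chain.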
I think Malay should be very similar to Indonesian (https://wiki.apache.org/solr/LanguageAnalysis#Indonesian). You could extend the Indonesian analysis by adding your own dictionary (keywords) and stopwords, if that is desirable.

/JZ

-----Original Message-----
From: Mugeesh Husain [mailto:muge...@gmail.com]
Sent: Monday, September 11, 2017 3:46 AM
To: solr-user@lucene.apache.org
Subject: Re: multi language search engine in solr

Thank you Rick for your response. Each document is in a single language rather than a mix of Arabic, English, Bengali, Hindi, and Malay. I could not find any tokenizer for Malay; can you suggest one if you know of any, please?

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html