Thanks Jack, 

This is a good solution, but we have many more combinations that I don't
think can be handled as synonyms, such as every name that starts with
‘عبد’ ‘Abd’ or ‘أبو’ ‘Abo’. When StandardTokenizer is applied to
‘أبو بكر’ ‘Abo Bakr’, it is tokenized into ‘أبو’ and ‘بكر’, and the
filters are then applied to each term separately.

Is there a tokenizer available that can tokenize ‘أبو *’ or ‘عبد *’ as a single term?
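
To make the desired behaviour concrete, here is a minimal Python sketch of
the output I am after. It is not Solr code and does not use any Solr API;
the prefix list and the function name join_prefixed_names are made up for
illustration, and it assumes simple whitespace splitting rather than the
real StandardTokenizer rules.

```python
# Hypothetical prefix list for illustration only; a real list would be longer.
PREFIXES = ("عبد", "أبو")


def join_prefixed_names(text):
    """Split on whitespace, keeping a known prefix and the next word as one term."""
    tokens = text.split()
    terms = []
    i = 0
    while i < len(tokens):
        # A prefix followed by another word is emitted as a single term.
        if tokens[i] in PREFIXES and i + 1 < len(tokens):
            terms.append(tokens[i] + " " + tokens[i + 1])
            i += 2
        else:
            terms.append(tokens[i])
            i += 1
    return terms


if __name__ == "__main__":
    print(join_prefixed_names("أبو بكر"))
    print(join_prefixed_names("قال عبد الله"))
```

So ‘أبو بكر’ would come out as one term instead of two, while words that
do not follow a prefix are still emitted separately.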

Thanks,
Mahmoud 


> On Nov 9, 2015, at 5:47 PM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
> 
> Use an index-time (but not query-time) synonym filter with a rule like:
> 
> Abd Allah,Abdallah
> 
> This will index the combined word in addition to the separate words.
> 
> -- Jack Krupansky
> 
> On Mon, Nov 9, 2015 at 4:48 AM, Mahmoud Almokadem <prog.mahm...@gmail.com>
> wrote:
> 
>> Hello,
>> 
>> We are indexing Arabic content and have a problem tokenizing multi-term
>> phrases like 'عبد الله' 'Abd Allah': users will search for
>> 'عبدالله' 'Abdallah' without a space and need to get the results for 'عبد
>> الله' with a space. We are using StandardTokenizer.
>> 
>> 
>> Is there any configuration to handle this case?
>> 
>> Thank you,
>> Mahmoud
>> 
