Re: Arabic analyser

Paul Libbrecht Tue, 10 Nov 2015 05:07:12 -0800

Mahmoud,

There is an Arabic analyzer:
  https://wiki.apache.org/solr/LanguageAnalysis#Arabic
Doesn't it do what you describe?
Synonyms probably work there too.


Paul

> Mahmoud Almokadem <mailto:prog.mahm...@gmail.com>
> 9 November 2015 17:47
> Thanks Jack,
>
> This is a good solution, but we have more combinations that I think
> can’t be handled as synonyms, such as every word that starts with ‘عبد’ ‘Abd’
> or ‘أبو’ ‘Abo’. When using the Standard tokenizer on ‘أبو بكر’ ‘Abo
> Bakr’, it will be tokenised to ‘أبو’ and ‘بكر’, and the filters will be
> applied to each term separately.
>
> Is there a tokeniser available that can tokenise ‘أبو *’ or ‘عبد *’ as a
> single term?
>
> Thanks,
> Mahmoud
>
>
>
> Jack Krupansky <mailto:jack.krupan...@gmail.com>
> 9 November 2015 16:47
> Use an index-time (but not query time) synonym filter with a rule like:
>
> Abd Allah,Abdallah
>
> This will index the combined word in addition to the separate words.
>
> -- Jack Krupansky
>
> On Mon, Nov 9, 2015 at 4:48 AM, Mahmoud Almokadem <prog.mahm...@gmail.com> wrote:
>
> Mahmoud Almokadem <mailto:prog.mahm...@gmail.com>
> 9 November 2015 10:48
> Hello,
>
> We are indexing Arabic content and have a problem tokenizing multi-term
> phrases like 'عبد الله' 'Abd Allah': users will search for
> 'عبدالله' 'Abdallah' without a space and need to get the results for 'عبد
> الله' with a space. We are using StandardTokenizer.
>
>
> Is there any configuration to handle this case?
>
> Thank you,
> Mahmoud
>
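For illustration, the two ideas discussed in the thread (indexing the joined word in addition to the separate words, and treating 'عبد' / 'أبو' plus the following word as a single term) can be sketched in plain Python. This is only a rough sketch of the logic, not Solr or Lucene code: the names PREFIXES, join_prefixed and index_terms are hypothetical, whitespace splitting stands in for StandardTokenizer, and joining by simple concatenation is an assumption.

```python
# Illustrative only: these helpers are hypothetical, not Solr or Lucene APIs.

# Leading particles mentioned in the thread that should stay attached
# to the word that follows them.
PREFIXES = {"عبد", "أبو"}


def join_prefixed(tokens):
    """Merge each prefix particle with the token after it into one term."""
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i] in PREFIXES and i + 1 < len(tokens):
            # Assumption: the joined form is plain concatenation, no space.
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def index_terms(text):
    """Index-time view: keep the separate words and add the joined forms."""
    tokens = text.split()  # stand-in for a real tokenizer
    joined = join_prefixed(tokens)
    # Preserve first-seen order and drop duplicates.
    return list(dict.fromkeys(tokens + joined))


if __name__ == "__main__":
    print(join_prefixed("أبو بكر".split()))
    print(index_terms("عبد الله"))
```

With this shape, a query for the joined spelling would match a document that contains the spaced spelling, because both forms are present among the indexed terms. In Solr itself the equivalent behaviour would have to come from the analysis chain in the schema rather than from application code.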
