Re: Arabic analyser

Mahmoud Almokadem Tue, 10 Nov 2015 05:47:52 -0800

Thanks Paul,

The Arabic analyser applies its normalisation and stemming filters only to
the single terms that come out of the standard tokenizer.
Gathering all the synonyms would be hard work. Should I customise my tokenizer
to handle this case?


Sincerely,
Mahmoud
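
As a rough illustration of what such a customised step might do, here is a
minimal Python sketch (not Lucene or Solr code). The function name
merge_prefix_tokens and the two-entry prefix list are assumptions made for
illustration only; in Solr the same logic would have to live in a custom
tokenizer or token filter. Matching is exact, so the sketch assumes the tokens
have not yet been normalised.

```python
# Hypothetical prefix list; a real deployment would need a fuller set.
PREFIXES = {"عبد", "أبو"}

def merge_prefix_tokens(tokens, prefixes=PREFIXES, joiner=" "):
    """Join each prefix token with the token that follows it."""
    merged = []
    i = 0
    while i < len(tokens):
        if tokens[i] in prefixes and i + 1 < len(tokens):
            # Emit the prefix and the next token as one term.
            merged.append(tokens[i] + joiner + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

tokens = "أبو بكر".split()
print(merge_prefix_tokens(tokens))             # one term, space kept
print(merge_prefix_tokens(tokens, joiner=""))  # one term, space removed
```

Emitting the joined form with an empty joiner is one way the spaced and
un-spaced spellings could be made to match without listing every name as a
synonym.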


On Tue, Nov 10, 2015 at 3:06 PM, Paul Libbrecht <p...@hoplahup.net> wrote:

> Mahmoud,
>
> there is an Arabic analyzer:
>   https://wiki.apache.org/solr/LanguageAnalysis#Arabic
> doesn't it do what you describe?
> Synonyms probably work there too.
>
> Paul
>
> > Mahmoud Almokadem <mailto:prog.mahm...@gmail.com>
> > 9 November 2015 17:47
> > Thanks Jack,
> >
> > This is a good solution, but we have more combinations that I think
> > can’t be handled as synonyms, such as every word that starts with ‘عبد’
> > ‘Abd’ or ‘أبو’ ‘Abo’. When the standard tokenizer is applied to ‘أبو بكر’
> > ‘Abo Bakr’, it is tokenised into ‘أبو’ and ‘بكر’, and the filters are
> > applied to each term separately.
> >
> > Is there a tokeniser available that can tokenise ‘أبو *’ or ‘عبد *’ as a
> > single term?
> >
> > Thanks,
> > Mahmoud
> >
> >
> >
> > Jack Krupansky <mailto:jack.krupan...@gmail.com>
> > 9 November 2015 16:47
> > Use an index-time (but not query-time) synonym filter with a rule like:
> >
> > Abd Allah,Abdallah
> >
> > This will index the combined word in addition to the separate words.
> >
> > -- Jack Krupansky
> >
> > On Mon, Nov 9, 2015 at 4:48 AM, Mahmoud Almokadem <prog.mahm...@gmail.com>
> >
> > Mahmoud Almokadem <mailto:prog.mahm...@gmail.com>
> > 9 November 2015 10:48
> > Hello,
> >
> > We are indexing Arabic content and have a problem tokenizing multi-term
> > phrases like 'عبد الله' 'Abd Allah': users search for 'عبدالله'
> > 'Abdallah' without the space and need to get the results for 'عبد الله'
> > with the space. We are using StandardTokenizer.
> >
> >
> > Is there any configuration to handle this case?
> >
> > Thank you,
> > Mahmoud
> >
>
>
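
For reference, the index-time synonym idea quoted above can be sketched in a
few lines of Python. This is only an illustration of the behaviour Jack
describes (indexing the combined word in addition to the separate words), not
Solr's synonym filter itself; the rule table RULES and the function
expand_at_index_time are hypothetical names.

```python
# Hypothetical rule table: a multi-word phrase and its single-word form.
RULES = {("عبد", "الله"): "عبدالله"}

def expand_at_index_time(tokens, rules=RULES):
    """Return (position, term) pairs, adding the combined form of each matched phrase."""
    out = []
    for pos, term in enumerate(tokens):
        out.append((pos, term))
        for phrase, combined in rules.items():
            if tuple(tokens[pos:pos + len(phrase)]) == phrase:
                # Stack the combined word at the position of the phrase's first word.
                out.append((pos, combined))
    return out

indexed = expand_at_index_time("عبد الله".split())
terms = {term for _, term in indexed}
print("عبدالله" in terms)  # checks whether the un-spaced form was indexed
```

Because the expansion happens only when indexing, a query for the un-spaced
form needs no rewriting; the limitation raised in the thread remains that
every phrase has to be listed in the rule table.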
