Thanks Paul,

The Arabic analyser applies its normalisation and stemming filters only to the single terms that come out of the standard tokenizer, so a two-word name like 'عبد الله' is never seen as one unit. Gathering all the synonyms would be hard work. Should I customise my tokenizer to handle this case?
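To make the question concrete, here is a rough sketch in plain Python (not Solr or Lucene code) of the behaviour I have in mind: a known prefix such as 'عبد' or 'أبو' is glued to the token that follows it, so the text with a space ends up as the same term as the text without one. The names `PREFIXES`, `merge_prefix_tokens` and `tokenize` are hypothetical, and whitespace splitting only stands in for the standard tokenizer.

```python
# Hypothetical list of prefixes that should be joined to the next token.
PREFIXES = {"عبد", "أبو"}


def merge_prefix_tokens(tokens, prefixes=PREFIXES):
    """Join each prefix token with the token that follows it.

    Tokens that are not prefixes, or a prefix with nothing after it,
    are passed through unchanged.
    """
    merged = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in prefixes and i + 1 < len(tokens):
            # Concatenate without a space, e.g. 'عبد' + 'الله'.
            merged.append(token + tokens[i + 1])
            i += 2
        else:
            merged.append(token)
            i += 1
    return merged


def tokenize(text):
    """Whitespace split (a stand-in for the real tokenizer), then merge."""
    return merge_prefix_tokens(text.split())
```

With this, 'عبد الله' and 'عبدالله' would both produce the single term 'عبدالله', so normalisation and stemming would run on the combined word. In Solr the same idea would presumably have to live in a custom tokenizer or token filter placed before the Arabic filters.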
Sincerely,
Mahmoud

On Tue, Nov 10, 2015 at 3:06 PM, Paul Libbrecht <p...@hoplahup.net> wrote:

> Mahmoud,
>
> there is an arabic analyzer:
> https://wiki.apache.org/solr/LanguageAnalysis#Arabic
> doesn't it do what you describe?
> Synonyms probably work there too.
>
> Paul
>
> > Mahmoud Almokadem <mailto:prog.mahm...@gmail.com>
> > 9 November 2015 17:47
> > Thanks Jack,
> >
> > This is a good solution, but we have more combinations that I think
> > can’t be handled as synonyms like every word starts with ‘عبد’ ‘Abd’
> > and ‘أبو’ ‘Abo’. When using Standard tokenizer on ‘أبو بكر’ ‘Abo
> > Bakr’, It’ll be tokenised to ‘أبو’ and ‘بكر’ and the filters will be
> > applied for each separate term.
> >
> > Is there available tokeniser to tokenise ‘أبو *’ or ‘عبد *' as a
> > single term?
> >
> > Thanks,
> > Mahmoud
> >
> > Jack Krupansky <mailto:jack.krupan...@gmail.com>
> > 9 November 2015 16:47
> > Use an index-time (but not query time) synonym filter with a rule like:
> >
> > Abd Allah,Abdallah
> >
> > This will index the combined word in addition to the separate words.
> >
> > -- Jack Krupansky
> >
> > On Mon, Nov 9, 2015 at 4:48 AM, Mahmoud Almokadem <prog.mahm...@gmail.com>
> >
> > Mahmoud Almokadem <mailto:prog.mahm...@gmail.com>
> > 9 November 2015 10:48
> > Hello,
> >
> > We are indexing Arabic content and facing a problem for tokenizing multi
> > terms phrases like 'عبد الله' 'Abd Allah', so users will search for
> > 'عبدالله' 'Abdallah' without space and need to get the results of 'عبد
> > الله' with space. We are using StandardTokenizer.
> >
> > Is there any configurations to handle this case?
> >
> > Thank you,
> > Mahmoud