Re: Arabic words search in solr

2017-09-11 Thread mohanmca01
Hi Aman Deep Singh,

Thanks for the information.

We tried EdgeNGramFilterFactory, but it's not working: we are not
getting the expected results.

Can you please suggest some alternative approaches?

Thanks,




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Arabic words search in solr

2017-08-13 Thread Aman Deep Singh
Try the edge n-gram filter:
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory
I think it will help you solve the problem.
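To illustrate what the edge n-gram suggestion does, here is a minimal Python sketch of the prefix expansion an edge n-gram filter applies to each token. The gram sizes (2 to 15), the whitespace tokenization and the helper names are assumptions made for the example, not Solr defaults or APIs. It shows that when prefixes are generated on the index side only, the partial word الات is itself an indexed term of الاتصال, so an AND query for مسقط الات can match.

```python
def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 15) -> list[str]:
    """Leading substrings (prefixes) of token, as an edge n-gram filter emits them."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(text: str, min_gram: int = 2, max_gram: int = 15) -> set[str]:
    """Whitespace-tokenize text and expand every token into its edge n-grams."""
    terms: set[str] = set()
    for token in text.split():
        # Tokens shorter than min_gram (such as "-") produce no grams at all.
        terms.update(edge_ngrams(token, min_gram, max_gram))
    return terms


def matches_all(query: str, indexed: set[str]) -> bool:
    """AND semantics: every query token must be one of the indexed terms."""
    return all(token in indexed for token in query.split())


doc = "بنك مسقط - مركز الاتصال"
print(edge_ngrams("الاتصال"))
print(matches_all("مسقط الات", index_terms(doc)))        # grams indexed: matches
print(matches_all("مسقط الات", set(doc.split())))        # plain tokens: no match
```

If the same filter were also applied on the query side, الات would itself be expanded into short prefixes such as ال, which would likely match far too many Arabic words; that index/query asymmetry is worth checking when the filter seems not to work.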

On Sun, Aug 13, 2017 at 7:08 PM mohanmca01  wrote:

> [...]


RE: Arabic words search in solr

2017-08-13 Thread mohanmca01
Hi Aman Deep Singh,

Thanks for your update. I will report the status after completing the
testing.

I need one more piece of help from you. Can you check the scenario below?

We get results when we use the AND operator between the words.

Below is the example:

Scenario 1:

{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "indent": "true",
      "q": "bizNameAr:(مسقط AND الاتصال)",
      "_": "1501998206658",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 44,
    "start": 0,
    "docs": [
      {
        "id": "56367",
        "bizNameAr": "بنك مسقط - مركز الاتصال",
        "_version_": 1574621133647380500
      },
      {
        "id": "27224",
        "bizNameAr": "بلدية مسقط -  - بلدية مسقط - مركز الاتصالات",
        "_version_": 1574621132817956900
      },
      {
        "id": "148922",
        "bizNameAr": "بنك مسقط - ميثاق - مركز الاتصال",
        "_version_": 1574621136335929300
      },
      {
        "id": "23695",
        "bizNameAr": "قوة السلطان الخاصة - مركز الإتصالات  - مسقط",
        "_version_": 1574621132683739100
      },
      {
        "id": "34992",
        "bizNameAr": "طوارئ الكهرباء - محافظة مسقط - مركز الاتصال",
        "_version_": 1574621133116801000
      },
      {
        "id": "96500",
        "bizNameAr": "شركة مسقط لتوزيع الكهرباء( ام اي دي سي)  - مركز الاتصال",
        "_version_": 1574621134575370200
      },
      {
        "id": "23966",
        "bizNameAr": "ديوان البلاط السلطاني - القصر - مسقط - المديرية العامة للاتصالات ونظم المعلومات -  - المديرية العامة للاتصالات ونظم المعلومات - البدالة",
        "_version_": 1574621132692127700
      },
      {
        "id": "24005",
        "bizNameAr": "ديوان البلاط السلطاني - القصر - مسقط - المديرية العامة للاتصالات ونظم المعلومات -  - مدير عام الاتصالات ونظم المعلومات -",
        "_version_": 1574621132694225000
      },
      {
        "id": "24026",
        "bizNameAr": "ديوان البلاط السلطاني - القصر - مسقط - المديرية العامة للاتصالات ونظم المعلومات -  - مساعد مدير عام الاتصالات ونظم المعلومات -",
        "_version_": 1574621132694225000
      },
      {
        "id": "24096",
        "bizNameAr": "ديوان البلاط السلطاني - القصر - مسقط - المديرية العامة للاتصالات ونظم المعلومات -  - مدير دائرة الاتصالات والصيانة -",
        "_version_": 1574621132697370600
      }
    ]
  }
}


Scenario 2:

{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "indent": "true",
      "q": "bizNameAr:(مسقط AND الات)",
      "_": "1501998438821",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 0,
    "start": 0,
    "docs": []
  }
}

We expect the same results in Scenario 2, where I am not typing
the second word in full (see the Scenario 2 input).


Below are the inputs used in both scenarios: 

Scenario 1:
First word: مسقط 
Second word: الاتصال 

Scenario 2:
First word: مسقط 
Second word: الات 

However, in our current production environment both of the above scenarios
work fine, but we have an issue with the “Hamza” character: we do not get
results unless we type the “Hamza” when the indexed word contains it.

{
  "responseHeader": {
    "status": 0,
    "QTime": 9,
    "params": {
      "fl": "businessNmBl",
      "indent": "true",
      "q": "businessNmBl:شرطة إزكي",
      "_": "1501997897849",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "businessNmBl": "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية  -  - مركز شرطة إزكي"
      }
    ]
  }
}
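The Hamza problem is the kind of spelling variation that Arabic orthographic normalization (for example Solr's solr.ArabicNormalizationFilterFactory) is meant to fold away, provided the same filter runs at both index and query time. The Python sketch below approximates that folding with a hand-written character map: alef with hamza or madda to bare alef, alef maqsura to yeh, teh marbuta to heh, and tatweel plus short-vowel marks removed. It is a simplified stand-in for illustration, not the Lucene implementation, and it leaves the standalone hamza (ء) and hamza on waw/yeh (ؤ, ئ) unchanged.

```python
# Simplified approximation of Arabic orthographic normalization (assumption:
# loosely modelled on what Arabic normalization filters fold; not Lucene code).
ALEF = "\u0627"
CHAR_MAP = {
    "\u0622": ALEF,      # alef with madda above -> alef
    "\u0623": ALEF,      # alef with hamza above -> alef
    "\u0625": ALEF,      # alef with hamza below -> alef
    "\u0649": "\u064A",  # alef maqsura (dotless yeh) -> yeh
    "\u0629": "\u0647",  # teh marbuta -> heh
}
# Tatweel plus tanween, short vowels, shadda and sukun (U+064B..U+0652) are dropped.
REMOVED = {"\u0640"} | {chr(cp) for cp in range(0x064B, 0x0653)}


def normalize_ar(text: str) -> str:
    """Fold spelling variants so that, e.g., إزكي and ازكي compare equal."""
    out = []
    for ch in text:
        if ch in REMOVED:
            continue
        out.append(CHAR_MAP.get(ch, ch))
    return "".join(out)


# Normalize both sides the same way, then require every query token (AND).
indexed = "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية  -  - مركز شرطة إزكي"
indexed_tokens = set(normalize_ar(indexed).split())
print(all(tok in indexed_tokens for tok in normalize_ar("شرطة ازكي").split()))
```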

Thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4350392.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Arabic words search in solr

2017-08-13 Thread Aman Deep Singh
You can configure mm either in the request handler in solrconfig.xml or pass it as
a request parameter alongside the user query.
For more detail, refer to
https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser

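An example of a sample handler:

<requestHandler name="/select" class="solr.SearchHandler">
  <!-- Reconstructed around the values that survived in the archive;
       the handler name and parameter names are assumptions. -->
  <lst name="defaults">
    <int name="rows">10</int>
    <str name="qf">searchFields</str>
    <str name="mm">100%</str>
    <str name="defType">dismax</str>
  </lst>
</requestHandler>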

On 13-Aug-2017 6:43 PM, "mohanmca01"  wrote:

[...]


Re: Arabic words search in solr

2017-08-13 Thread mohanmca01
Can anyone help me with the use case below?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4350390.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Arabic words search in solr

2017-08-13 Thread mohanmca01
Hi Aman Deep,

Thanks for the information. Where exactly in the request handler should I add mm=100%?
Can you please share a sample snippet? Thanks in advance.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4350389.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Arabic words search in solr

2017-08-06 Thread Aman Deep Singh
Use mm=100% in the request handler;
it gives the same behaviour as the AND operator.
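To see why mm=100% gives AND behaviour, the Python sketch below turns a simple mm value into the number of query terms that must match. It handles only the plain forms N, -N, P% and -P% described in the DisMax documentation (percentages rounded down), not conditional specs such as 3<90%; the function names and the two toy documents are hypothetical.

```python
def min_should_match(num_clauses: int, spec: str) -> int:
    """Translate a simple mm spec (N, -N, P%, -P%) into a required clause count."""
    spec = spec.strip()
    if spec.endswith("%"):
        pct = int(spec[:-1])
        # The computed number is rounded down; a negative percentage is the
        # share of clauses that may be missing.
        computed = (num_clauses * abs(pct)) // 100
        required = num_clauses - computed if pct < 0 else computed
    else:
        n = int(spec)
        required = num_clauses + n if n < 0 else n
    # Clamp to the valid range [0, num_clauses].
    return max(0, min(required, num_clauses))


def doc_matches(query_terms: list[str], doc_terms: set[str], mm: str) -> bool:
    """A document matches if at least mm of the query terms occur in it."""
    hits = sum(1 for term in query_terms if term in doc_terms)
    return hits >= min_should_match(len(query_terms), mm)


query = ["شرطة", "ازكي"]
police = set("مركز شرطة ازكي".split())     # toy doc containing both words
bank = set("بنك ظفار - فرع ازكي".split())  # toy doc containing one word
for mm in ("1", "100%"):
    print(mm, doc_matches(query, police, mm), doc_matches(query, bank, mm))
```

With mm=1 both toy documents match (OR-like); with mm=100% only the one containing both words does.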


On 06-Aug-2017 11:59 AM, "mohanmca01"  wrote:

[...]


RE: Arabic words search in solr

2017-08-06 Thread mohanmca01
Hello Allison,

Thank you for the information.

I looked at your slide 33; yes, we are looking for the same kind of results
and solution.

Would you please guide us on how to achieve this?

Also, instead of putting the AND operator between the words, is there
another way of doing this at the configuration level?

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4349259.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Arabic words search in solr

2017-08-06 Thread mohanmca01
Hi Dave,

Yes, we get results when we use the AND operator between the
words.

Below is the example:

*Scenario 1:*

{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "indent": "true",
      "q": "bizNameAr:(مسقط AND الاتصال)",
      "_": "1501998206658",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 44,
    "start": 0,
    "docs": [
      {
        "id": "56367",
        "bizNameAr": "بنك مسقط - مركز الاتصال",
        "_version_": 1574621133647380500
      },
      {
        "id": "27224",
        "bizNameAr": "بلدية مسقط -  - بلدية مسقط - مركز الاتصالات",
        "_version_": 1574621132817956900
      },
      {
        "id": "148922",
        "bizNameAr": "بنك مسقط - ميثاق - مركز الاتصال",
        "_version_": 1574621136335929300
      },
      {
        "id": "23695",
        "bizNameAr": "قوة السلطان الخاصة - مركز الإتصالات  - مسقط",
        "_version_": 1574621132683739100
      },
      {
        "id": "34992",
        "bizNameAr": "طوارئ الكهرباء - محافظة مسقط - مركز الاتصال",
        "_version_": 1574621133116801000
      },
      {
        "id": "96500",
        "bizNameAr": "شركة مسقط لتوزيع الكهرباء( ام اي دي سي)  - مركز الاتصال",
        "_version_": 1574621134575370200
      },
      {
        "id": "23966",
        "bizNameAr": "ديوان البلاط السلطاني - القصر - مسقط - المديرية العامة للاتصالات ونظم المعلومات -  - المديرية العامة للاتصالات ونظم المعلومات - البدالة",
        "_version_": 1574621132692127700
      },
      {
        "id": "24005",
        "bizNameAr": "ديوان البلاط السلطاني - القصر - مسقط - المديرية العامة للاتصالات ونظم المعلومات -  - مدير عام الاتصالات ونظم المعلومات -",
        "_version_": 1574621132694225000
      },
      {
        "id": "24026",
        "bizNameAr": "ديوان البلاط السلطاني - القصر - مسقط - المديرية العامة للاتصالات ونظم المعلومات -  - مساعد مدير عام الاتصالات ونظم المعلومات -",
        "_version_": 1574621132694225000
      },
      {
        "id": "24096",
        "bizNameAr": "ديوان البلاط السلطاني - القصر - مسقط - المديرية العامة للاتصالات ونظم المعلومات -  - مدير دائرة الاتصالات والصيانة -",
        "_version_": 1574621132697370600
      }
    ]
  }
}


*Scenario 2:*

{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "indent": "true",
      "q": "bizNameAr:(مسقط AND الات)",
      "_": "1501998438821",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 0,
    "start": 0,
    "docs": []
  }
}

We expect the same results in Scenario 2, where I am not typing
the second word in full (see the Scenario 2 input).


Below are the inputs used in both scenarios:

*Scenario 1:*
First word: مسقط
Second word: الاتصال

*Scenario 2:*
First word: مسقط
Second word: الات

However, in our current production environment both of the above scenarios
work fine, but we have an issue with the “Hamza” character: we do not get
results unless we type the “Hamza” when the indexed word contains it.

{
  "responseHeader": {
    "status": 0,
    "QTime": 9,
    "params": {
      "fl": "businessNmBl",
      "indent": "true",
      "q": "businessNmBl:شرطة إزكي",
      "_": "1501997897849",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "businessNmBl": "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية  -  - مركز شرطة إزكي"
      }
    ]
  }
}






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4349258.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Arabic words search in solr

2017-08-02 Thread Tim Casey
There should be a way to use a phrase query for the specific names.
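A phrase query matches only when the terms occur next to each other and in order. The Python sketch below models that positional check on whitespace tokens; it does no normalization, so the query uses the same spelling إزكي as the indexed name. In Solr the corresponding query would be written with double quotes, e.g. bizNameAr:"شرطة إزكي". The helper name and the whitespace tokenization are simplifications for illustration, not Solr's analyzer.

```python
def phrase_match(text: str, phrase: str) -> bool:
    """True if the phrase's tokens appear consecutively and in order in text.

    A token's index in the whitespace-split list stands in for its position.
    """
    tokens, wanted = text.split(), phrase.split()
    span = len(wanted)
    return any(tokens[i:i + span] == wanted for i in range(len(tokens) - span + 1))


docs = {
    "28107": "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية  -  - مركز شرطة إزكي",
    "25108": "المستشفيات -  - مستشفى إزكي",
    "67011": "بنك ظفار - فرع ازكي",
}
hits = [doc_id for doc_id, text in docs.items() if phrase_match(text, "شرطة إزكي")]
print(hits)
```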

On Wed, Aug 2, 2017 at 2:15 PM, Phil Scadden <p.scad...@gns.cri.nz> wrote:

> [...]


RE: Arabic words search in solr

2017-08-02 Thread Phil Scadden
Hopefully changing the default operator to AND solves your problem. If so, I would be
quite interested in what your index config looks like in the end. I also have an
upcoming need to index Arabic words.

-Original Message-
From: mohanmca01 [mailto:mohanmc...@gmail.com]
Sent: Thursday, 3 August 2017 12:58 a.m.
To: solr-user@lucene.apache.org
Subject: RE: Arabic words search in solr

[...]
Notice: This email and any attachments are confidential and may not be used, 
published or redistributed without the prior written consent of the Institute 
of Geological and Nuclear Sciences Limited (GNS Science). If received in error 
please destroy and immediately notify GNS Science. Do not copy or disclose the 
contents.


RE: Arabic words search in solr

2017-08-02 Thread Allison, Timothy B.
+1

I was hoping to use this as a case for arguing for turning off an overly 
aggressive stemmer, but I checked on your 10 docs and query, and David is 
right, of course -- if you change the default operator to AND, you only get the 
one document back that you had intended to.

I can still use this as a case for getting on my Unicode normalization soapbox 
and +1'ing your use of the ICUFoldingFilter.  With no token filters, you get 4 
results; when you add the ICUFoldingFilter, you get 8 results; and when you add 
in the Arabic stemmer, you get all 10.  Not that you need this, but see slide 
33 of [1], where we show 78 Unicode variants for "America" in ~800k docs in an 
Arabic script language.  Without Unicode normalization, users might get 1/2 the 
documents back or far, far fewer...and they wouldn't even know what they were 
missing!

[1] 
https://github.com/tballison/share/blob/master/slides/TextProcessingAndAdvancedSearch_tallison_MITRE_201510_final_abbrev.pdf
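Allison's counts can be approximated with a short Python sketch over the ten documents quoted in this thread. The fold() helper is only a rough stand-in for ICUFoldingFilter (NFKD decomposition, dropping combining marks such as the hamza that decomposes out of إ and أ, then recomposing and case folding), and no stemmer is modelled; both are assumptions for illustration, not the ICU or Solr implementation. With these simplifications the sketch gives 4 OR hits without folding and 8 with folding, in line with the numbers reported above, and only document 28107 when both terms are required.

```python
import re
import unicodedata

# bizNameAr values of the ten documents listed in this thread.
DOCS = {
    "28107": "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية  -  - مركز شرطة إزكي",
    "13937": "مؤسسةا الازكي للتجارة والمقاولات",
    "15914": "العلوي والازكي المتحدة ش.م.م",
    "20639": "سحائب ازكي للتجارة",
    "25108": "المستشفيات -  - مستشفى إزكي",
    "27629": "وزارة الداخلية -  -  - والي إزكي -",
    "36351": "طوارئ الكهرباء - إزكي",
    "61235": "اضواء ازكي للتجارة",
    "66821": "أطلال إزكي للتجارة",
    "67011": "بنك ظفار - فرع ازكي",
}


def fold(text: str) -> str:
    """Rough folding: decompose, drop combining marks (Mn), recompose, casefold."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped).casefold()


def terms(text: str, folding: bool) -> set[str]:
    """Word tokens of text, optionally folded first (no stemming)."""
    return set(re.findall(r"\w+", fold(text) if folding else text))


def search(query: str, operator: str, folding: bool) -> list[str]:
    """Ids of docs containing any (OR) or all (AND) of the query terms."""
    wanted = terms(query, folding)
    combine = all if operator == "AND" else any
    hits = []
    for doc_id, text in DOCS.items():
        doc_terms = terms(text, folding)
        if combine(t in doc_terms for t in wanted):
            hits.append(doc_id)
    return hits


QUERY = "شرطة ازكي"
for folding in (False, True):
    for op in ("OR", "AND"):
        print(f"folding={folding} op={op}: {search(QUERY, op, folding)}")
```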

-Original Message-
From: David Hastings [mailto:hastings.recurs...@gmail.com] 
Sent: Wednesday, August 2, 2017 9:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Arabic words search in solr

[...]


Re: Arabic words search in solr

2017-08-02 Thread David Hastings
Perhaps change your default operator to AND instead of OR, if that's what you
are expecting as a result.
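With the standard query parser, the default operator can also be set per request through the q.op parameter instead of writing AND between the words. The Python sketch below only builds such a request URL; the host, port and core name (localhost:8983, core "bizcore") are hypothetical placeholders, and it assumes a Solr version whose standard query parser honours q.op.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical Solr location; replace with your own host and core name.
SOLR_SELECT = "http://localhost:8983/solr/bizcore/select"


def select_url(field: str, user_query: str, default_op: str = "AND") -> str:
    """Build a /select URL whose query terms are combined with default_op."""
    params = {
        "q": f"{field}:({user_query})",
        "q.op": default_op,  # AND: every term in q must match
        "wt": "json",
    }
    # urlencode percent-encodes the Arabic text as UTF-8.
    return f"{SOLR_SELECT}?{urlencode(params)}"


url = select_url("bizNameAr", "شرطة ازكي")
print(url)
print(parse_qs(urlsplit(url).query))
```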

On Wed, Aug 2, 2017 at 8:57 AM, mohanmca01  wrote:

> [...]


RE: Arabic words search in solr

2017-08-02 Thread mohanmca01
Hi Phil Scadden,

Thank you for your reply.

We tried your suggested solution of removing the hyphen while indexing, but we
are getting wrong results. I searched for "شرطة ازكي", and it returned the
result I am looking for, plus irrelevant results that contain only the
first or only the second word I typed.

First word: شرطة 
Second Word: ازكي

Results that we are getting:


{
  "responseHeader": {
    "status": 0,
    "QTime": 3,
    "params": {
      "indent": "true",
      "q": "bizNameAr:(شرطة ازكي)",
      "_": "1501678260335",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 444,
    "start": 0,
    "docs": [
      {
        "id": "28107",
        "bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية  -  - مركز شرطة إزكي",
        "_version_": 1574621132849414100
      },
      {
        "id": "13937",
        "bizNameAr": "مؤسسةا الازكي للتجارة والمقاولات",
        "_version_": 157462113219720
      },
      {
        "id": "15914",
        "bizNameAr": "العلوي والازكي المتحدة ش.م.م",
        "_version_": 1574621132344000500
      },
      {
        "id": "20639",
        "bizNameAr": "سحائب ازكي للتجارة",
        "_version_": 1574621132574687200
      },
      {
        "id": "25108",
        "bizNameAr": "المستشفيات -  - مستشفى إزكي",
        "_version_": 1574621132737216500
      },
      {
        "id": "27629",
        "bizNameAr": "وزارة الداخلية -  -  - والي إزكي -",
        "_version_": 1574621132833685500
      },
      {
        "id": "36351",
        "bizNameAr": "طوارئ الكهرباء - إزكي",
        "_version_": 157462113318391
      },
      {
        "id": "61235",
        "bizNameAr": "اضواء ازكي للتجارة",
        "_version_": 1574621133785792500
      },
      {
        "id": "66821",
        "bizNameAr": "أطلال إزكي للتجارة",
        "_version_": 1574621133915816000
      },
      {
        "id": "67011",
        "bizNameAr": "بنك ظفار - فرع ازكي",
        "_version_": 1574621133920010200
      }
    ]
  }
}

Actually, we expect only the result below, since it contains both words
that we typed:

  {
    "id": "28107",
    "bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية  -  - مركز شرطة إزكي",
    "_version_": 1574621132849414100
  }


Configuration:

In schema.xml we configured as below:





   






 
 
  



Thanks,





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4348774.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Arabic words search in solr

2017-07-31 Thread Phil Scadden
Further to that: what results do you get when you put those indexed terms into
the Analysis tool in the Solr UI?
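The same index-versus-query comparison the Analysis screen shows can also be requested directly from Solr's field analysis handler. The Python sketch below only builds the request URL; it assumes the handler is reachable at /analysis/field, as in the stock example configurations, and the host and core name are hypothetical. The field name bizNameAr and the sample values come from this thread.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical core URL; the /analysis/field path assumes the stock handler mapping.
ANALYSIS = "http://localhost:8983/solr/bizcore/analysis/field"


def analysis_url(field: str, indexed_text: str, query_text: str) -> str:
    """Build a request showing index-time and query-time token streams for field."""
    params = {
        "analysis.fieldname": field,
        "analysis.fieldvalue": indexed_text,  # run through the index analyzer
        "analysis.query": query_text,         # run through the query analyzer
        "analysis.showmatch": "true",         # flag index tokens matched by the query
        "wt": "json",
    }
    return f"{ANALYSIS}?{urlencode(params)}"


print(analysis_url("bizNameAr", "بنك مسقط - مركز الاتصال", "مسقط الات"))
```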

-Original Message-
From: Phil Scadden [mailto:p.scad...@gns.cri.nz]
Sent: Tuesday, 1 August 2017 9:06 a.m.
To: solr-user@lucene.apache.org
Subject: RE: Arabic words search in solr

[...]


RE: Arabic words search in solr

2017-07-31 Thread Phil Scadden
Am I correct in assuming that you have the problem searching only when there is
a hyphen in your indexed text? If so, that suggests you need to use a different
tokenizer when indexing: it looks like the hyphen is removed and the words on
each side are concatenated, hence you need both terms to find the text.
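Phil's hypothesis can be illustrated with a toy comparison of two tokenization strategies. Both functions below are hypothetical simplifications, not Solr tokenizers: the first deletes hyphens before splitting on whitespace, so a name written with a bare hyphen (no surrounding spaces) collapses into one token; the second splits on any run of non-word characters, which keeps the words on either side of the hyphen searchable on their own.

```python
import re


def strip_hyphens_then_split(text: str) -> list[str]:
    """Delete hyphens first, then split on whitespace (joins words around a bare hyphen)."""
    return text.replace("-", "").split()


def split_on_non_word(text: str) -> list[str]:
    """Treat hyphens, dots and spaces alike as separators."""
    return re.findall(r"\w+", text)


for name in ("بنك مسقط - مركز الاتصال", "مسقط-مركز"):
    print(strip_hyphens_then_split(name), split_on_non_word(name))
```

The two strategies agree when the hyphen is surrounded by spaces, as in most of the names in this thread, and differ only for a bare hyphen.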

-Original Message-
From: mohanmca01 [mailto:mohanmc...@gmail.com]
Sent: Tuesday, 1 August 2017 1:18 a.m.
To: solr-user@lucene.apache.org
Subject: Re: Arabic words search in solr

[...]


Re: Arabic words search in solr

2017-07-31 Thread mohanmca01
Please help me on this...



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4348372.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Arabic words search in solr

2017-07-11 Thread mohanmca01
Hi Steve,

Thank you for your reply; it has been quite a long time since I last wrote back.

I have tried what you suggested, and there were some improvements in
searching and getting results.

However, the team is having difficulty searching with shortened (partial)
forms of the indexed names, which forced us to revert the suggested changes.

Below are the examples we are facing:


-
*Example 1:*

*Indexed Text*
بنك مسقط - مركز الاتصال

*Searched*
مسقط الات

*Remarks of Example 1*
Unable to get the indexed result unless I type both words in full (مسقط الاتصال)


{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "indent": "true",
      "q": "businessNmBl:(مسقط الات)",
      "_": "1499758511717",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 0,
    "start": 0,
    "docs": []
  }
}


-

*Example 2:*

*Indexed Text*
الطيران العماني - مركز الاتصال

*Searched*
الطير الات

*Remarks*
Unable to get the indexed result unless I type both words in full (الطيران الاتصال)


{
  "responseHeader": {
    "status": 0,
    "QTime": 2,
    "params": {
      "indent": "true",
      "q": "businessNmBl:(طير الات)",
      "_": "1499758649600",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 0,
    "start": 0,
    "docs": []
  }
}



Please note that the existing configuration in production (the one with
the problems around hamza (ء), etc.) does work for the above examples. They
stop working only once we apply your suggested configuration.
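A common way to get prefix matches such as مسقط الات is to index edge n-grams, i.e. the leading substrings of each term. The Python below is a minimal sketch of that idea only, not Solr's EdgeNGramFilterFactory: it assumes plain whitespace tokenization, skips Arabic normalization and stemming, and the document IDs are hypothetical. In a real schema the n-gram filter is typically applied only in the index-time analyzer, so that query terms are not themselves split into n-grams.

```python
def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 15) -> list[str]:
    """Return the leading substrings of token, shortest first (edge n-grams)."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map every edge n-gram to the IDs of the documents that contain it."""
    index: dict[str, set[str]] = {}
    for doc_id, text in docs.items():
        for token in text.split():  # naive whitespace tokenization (assumption)
            for gram in edge_ngrams(token):
                index.setdefault(gram, set()).add(doc_id)
    return index


def search_all(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents in which every query token matches as a prefix."""
    hits = [index.get(tok, set()) for tok in query.split()]
    return set.intersection(*hits) if hits else set()


# Hypothetical IDs; the texts are the two indexed names from the examples above.
docs = {
    "example1": "بنك مسقط - مركز الاتصال",
    "example2": "الطيران العماني - مركز الاتصال",
}
index = build_index(docs)
print(search_all(index, "مسقط الات"))   # {'example1'}
print(search_all(index, "الطير الات"))  # {'example2'}
```

The trade-off is index size: every term is stored once per prefix length, which is why `min_gram`/`max_gram` bounds are usually set.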

Thanks in advance





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4345392.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Arabic words search in solr

2017-03-09 Thread Steve Rowe
Hi Mohan,

Your examples refer to documents I don’t have in my 9 document set, so I recast 
the problem to a query/doc combo I have from earlier in this thread, and I was 
able to restrict hits to only documents that contained all terms from the query.

If I use the query “name_ar:(شرطة ازكي)” I get 3 hits (I’ve left out some 
details):

-
{ "responseHeader": { ... "params": { "q":"name_ar:(شرطة ازكي)", ... } },
  "response": { "numFound":3, "start":0,
    "docs": [
      { "id":"6", "name_ar":["شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية - - مركز شرطة إزكي"], ... },
      { "id":"3", "name_ar":["شرطة عمان السلطانية - قيادة شرطة محافظة شمال الشرقية - - مركز شرطة إبراء"], ... },
      { "id":"8", "name_ar":["وزارة الصحة - المديرية العامة للخدمات الصحية  محافظة الداخلية -  - مستشفى إزكي (البدالة)  - الطوارئ"], ... }]}
-

If I add “q.op=AND” to the request, only one of these documents matches - note 
that I’ve also checked the “debugQuery” option on the Admin UI:

-
{ "responseHeader": { ...
  "params": { "q":"name_ar:(شرطة ازكي)", "q.op":"AND", "debugQuery":"true", ... } },
  "response": { "numFound":1, "start":0,
    "docs": [
      { "id":"6", "name_ar":["شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية - - مركز شرطة إزكي"], ... }]},
  "debug": {
    "rawquerystring": "name_ar:(شرطة ازكي)",
    "querystring": "name_ar:(شرطة ازكي)",
    "parsedquery": "+name_ar:شرط +name_ar:ازك",
    "parsedquery_toString": "+name_ar:شرط +name_ar:ازك",
-

Note the “parsedquery” above - it shows how to require individual terms when 
specifying the field for each term.  This is how the “name_ar:(شرطة ازكي)” 
query is interpreted when the “q.op=AND” request param is used.

The equivalent query using ‘+’ signs is: “name_ar:(+شرطة +ازكي)”.  This *looks* 
strange because of how the Unicode bidirectional algorithm works.  This W3C 
writeup uses Arabic to drive its discussion of display of strings that contain 
both RTL and LTR character runs, and I found it quite helpful here: 
.
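To confirm that the ‘+’ really precedes each Arabic term in the stored (logical) string, even when a bidi-aware display makes it look otherwise, you can inspect the characters directly. A small Python sketch using only the standard `unicodedata` module:

```python
import unicodedata


def bidi_classes(text: str) -> list[tuple[str, str]]:
    """Pair each character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


query = "name_ar:(+شرطة +ازكي)"
# Slicing and splitting work on logical order, however the text is rendered.
inner = query[query.index("(") + 1 : query.index(")")]
print(inner.split())               # ['+شرطة', '+ازكي']
print(bidi_classes("+شرطة")[:2])  # [('+', 'ES'), ('ش', 'AL')]
```

The ‘+’ has a weak bidi class (ES) while the letters are strong right-to-left (AL), which is why renderers can visually move the sign to the other end of the word.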

Here’s the output from the “name_ar:(+شرطة +ازكي)” query:

-
{ "responseHeader": { ... "params": { "q":"name_ar:(+شرطة +ازكي)", "debugQuery":"true", ... } },
  "response": { "numFound":1, "start":0,
    "docs": [
      { "id":"6", "name_ar":["شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية - - مركز شرطة إزكي"], ... }]},
  "debug": {
    "rawquerystring": "name_ar:(+شرطة +ازكي)",
    "querystring": "name_ar:(+شرطة +ازكي)",
    "parsedquery": "+name_ar:شرط +name_ar:ازك",
    "parsedquery_toString": "+name_ar:شرط +name_ar:ازك",
-

The above is the same result (and has the same parsedQuery) as query 
“name_ar:(شرطة ازكي)” with request param “q.op=AND”.

I won’t show it here, but I get the same 1-hit result for this query when I use 
AND instead of ‘+’: “name_ar:(شرطة AND ازكي)” - note that the terms only 
*appear* to be in reverse order because of how the Unicode bidirectional 
algorithm works.
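When such a query is sent over HTTP instead of typed into the Admin UI, the ‘+’ must be percent-encoded: in a URL query string a bare ‘+’ is decoded as a space, which under the default OR operator silently turns a required term into an optional one. A minimal Python sketch that builds both equivalent forms with the standard library's `urlencode`; the helper names are hypothetical, and the request path the parameters would be appended to (for example `/solr/<core>/select`) is an assumption.

```python
from urllib.parse import urlencode


def and_params(field: str, terms: list[str]) -> dict[str, str]:
    """Default-operator form: field:(t1 t2) together with q.op=AND."""
    return {"q": f"{field}:({' '.join(terms)})", "q.op": "AND", "wt": "json"}


def plus_params(field: str, terms: list[str]) -> dict[str, str]:
    """Equivalent form that marks each term as required with '+'."""
    required = " ".join("+" + t for t in terms)
    return {"q": f"{field}:({required})", "wt": "json"}


terms = ["شرطة", "ازكي"]
print(urlencode(and_params("name_ar", terms)))
print(urlencode(plus_params("name_ar", terms)))  # each '+' is sent as %2B
```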

> On Mar 9, 2017, at 2:30 AM, mohanmca01  wrote:
> 
> I saw your products in lucidworks website. Do you have any solr arabic
> support customized product?

Lucidworks doesn’t have a specifically Arabic-focused product, but we have 
helped people enable Arabic search in the past.  Click on the “Contact Us” link 
on the website if you’d like to talk to us about getting involved.

--
Steve
www.lucidworks.com



Re: Arabic words search in solr

2017-03-08 Thread mohanmca01
Hi Steve,

Thanks for the support. I tried the cases below, but I'm still not getting
the expected results.

Case 1 :

Input :  bizNameAr:شرطة + ازكي

Output :
{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "indent": "true",
      "q": " bizNameAr:شرطة + ازكي",
      "_": "1489041466096",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 4,
    "start": 0,
    "docs": [
      {
        "id": "82",
        "bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية - - مركز شرطة إزكي",
        "_version_": 1560298301338681300
      },
      {
        "id": "63",
        "bizNameAr": "شركة ظفار للتأمين ش.م.ع.ع - فرع ازكي",
        "_version_": 1560298301325049900
      },
      {
        "id": "56",
        "bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة شمال الشرقية  -  - مركز شرطة إبراء",
        "_version_": 1560298301319807000
      },
      {
        "id": "79",
        "bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة شمال الشرقية - - مركز شرطة إبراء",
        "_version_": 1560298301335535600
      }
    ]
  }
}


In this case, documents 63, 56 and 79 do not match the input; document 82
is the only correct result.



Case 2:


{
  "responseHeader": {
    "status": 0,
    "QTime": 3,
    "params": {
      "indent": "true",
      "q": " bizNameAr:شرطة AND ازكي",
      "_": "1489043935549",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 0,
    "start": 0,
    "docs": []
  }
}


If AND is placed between the terms, no results are returned.
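In Case 2, one likely factor is that in the standard query parser a field prefix applies only to the term immediately after it, so in bizNameAr:شرطة AND ازكي the second word is searched in the default field (df), not in bizNameAr. Grouping with parentheses, as in the name_ar:(شرطة AND ازكي) form used elsewhere in this thread, scopes every term to the field. The Python sketch below builds such a query from raw user input and backslash-escapes query-syntax characters (business names here contain "-" and parentheses). The helper names are hypothetical, and the escaped character list is an approximation of what SolrJ's `ClientUtils.escapeQueryChars` handles, so treat it as an assumption.

```python
import re

# Query-parser syntax characters to escape (approximation, see above).
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/;])')


def escape_term(term: str) -> str:
    """Backslash-escape characters the standard query parser treats as syntax."""
    return _SPECIAL.sub(r"\\\1", term)


def scoped_query(field: str, user_input: str, op: str = "AND") -> str:
    """Build field:(t1 OP t2 ...) so that every term is searched in `field`."""
    terms = [escape_term(t) for t in user_input.split()]
    return f"{field}:(" + f" {op} ".join(terms) + ")"


print(scoped_query("bizNameAr", "شرطة ازكي"))  # bizNameAr:(شرطة AND ازكي)
print(escape_term("(البدالة)"))                # \(البدالة\)
```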

I saw your products on the Lucidworks website. Do you have a customized
product for Arabic search support in Solr?

Thanks,



On Thu, Mar 2, 2017 at 7:01 PM, sarowe [via Lucene] <
ml-node+s472066n4323036...@n3.nabble.com> wrote:

> Hi Mohan,
>
> > On Feb 26, 2017, at 1:37 AM, mohanmca01 <[hidden email]
> > wrote:
> >
> > i searched with (bizNameAr: شرطة ازكي), and am getting:
> > […]
> >
> > the expected result is:   "id": "82",
> >  "bizNameAr": "شرطة عمان السلطانية -
> قيادة
> > شرطة محافظة الداخلية - - مركز *شرطة إزكي*",
> >
> > as the above has both the words mentioned in the query (marked as Bold),
> > where the rest have the following:
> >
> >"id": "63",
> >"bizNameAr": "شركة ظفار للتأمين ش.م.ع.ع - فرع ازكي"
> >
> > it has only one word of the query (ازكي)
> >
> >"id": "56",
> >"bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة شمال
> الشرقية
> > -  - مركز شرطة إبراء"
> >
> > it has only one word of the query (شرطة)
> >
> > "id": "79",
> > "bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة شمال الشرقية - -
> مركز
> > شرطة إبراء"
> >
> > It has only one word of the query (شرطة)
> >
> > where the above 3 records should not come in the result since already 2
> > words mentioned in the query, and only one record has these two words.
>
> Solr's standard query language includes two mechanisms for requiring
> terms: ‘+’ before a required term, and ‘AND’ between two required terms.
>  ‘+’ is better - see  12/28/why-not-and-or-and-not/> for more information.
>
> You can also set the default operator to ‘AND’, e.g. via request parameter
> “q.op=AND” (if this is always what you want, you can include this in the
> /select request handler’s definition in solrconfig.xml).  See <
> https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser>
> for more information.
>
> > I would really suggest if we can give you a real-time demo on our system
> > with my Arab colleague so it can be more clear for you. let us know if
> we
> > can do that.
>
> I prefer to keep discussion on this public mailing list so that others can
> benefit.  If you find that you need faster or more interactive help, you
> can check out the list of people who have indicated that they provide Solr
> support: .
>
> --
> Steve
> www.lucidworks.com
>
>
>



-- 
Regards,
Mohan.N
9865998919




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4324142.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Arabic words search in solr

2017-03-02 Thread Steve Rowe
Hi Mohan,

> On Feb 26, 2017, at 1:37 AM, mohanmca01  wrote:
> 
> i searched with (bizNameAr: شرطة ازكي), and am getting:
> […]
> 
> the expected result is:   "id": "82",
>  "bizNameAr": "شرطة عمان السلطانية - قيادة
> شرطة محافظة الداخلية - - مركز *شرطة إزكي*",
> 
> as the above has both the words mentioned in the query (marked as Bold),
> where the rest have the following:
> 
>"id": "63",
>"bizNameAr": "شركة ظفار للتأمين ش.م.ع.ع - فرع ازكي"
> 
> it has only one word of the query (ازكي)
> 
>"id": "56",
>"bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة شمال الشرقية 
> -  - مركز شرطة إبراء"
> 
> it has only one word of the query (شرطة)
> 
> "id": "79",
> "bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة شمال الشرقية - - مركز
> شرطة إبراء"
> 
> It has only one word of the query (شرطة)
> 
> where the above 3 records should not come in the result since already 2
> words mentioned in the query, and only one record has these two words.

Solr's standard query language includes two mechanisms for requiring terms: ‘+’ 
before a required term, and ‘AND’ between two required terms.  ‘+’ is better - 
see  for more 
information.

You can also set the default operator to ‘AND’, e.g. via request parameter 
“q.op=AND” (if this is always what you want, you can include this in the 
/select request handler’s definition in solrconfig.xml).  See 
 
for more information.  

> I would really suggest if we can give you a real-time demo on our system
> with my Arab colleague so it can be more clear for you. let us know if we
> can do that.

I prefer to keep discussion on this public mailing list so that others can 
benefit.  If you find that you need faster or more interactive help, you can 
check out the list of people who have indicated that they provide Solr support: 
.

--
Steve
www.lucidworks.com



Re: Arabic words search in solr

2017-03-02 Thread mohanmca01
Hi Steve,

Any update on this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4323005.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Arabic words search in solr

2017-02-25 Thread mohanmca01
Hi Steve,

Thanks for your continued investigation.

This has improved the search a little, but I'm facing another issue: I'm
getting records that don't contain one of the words in my query.

Please note that you indexed only 9 records, whereas I shared more than 76
sample records (please refer to the earlier attachment Arabic_Characters2.xlsx,
Examples sheet) for you to index so that you can reproduce the issue.

For example, I searched with (bizNameAr: شرطة ازكي) and got:

{
  "responseHeader": {
    "status": 0,
    "QTime": 3,
    "params": {
      "indent": "true",
      "q": "bizNameAr: شرطة ازكي",
      "_": "1488089550104",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 4,
    "start": 0,
    "docs": [
      {
        "id": "82",
        "bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية - - مركز شرطة إزكي",
        "_version_": 1560298301338681300
      },
      {
        "id": "63",
        "bizNameAr": "شركة ظفار للتأمين ش.م.ع.ع - فرع ازكي",
        "_version_": 1560298301325049900
      },
      {
        "id": "56",
        "bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة شمال الشرقية -  - مركز شرطة إبراء",
        "_version_": 1560298301319807000
      },
      {
        "id": "79",
        "bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة شمال الشرقية - - مركز شرطة إبراء",
        "_version_": 1560298301335535600
      }
    ]
  }
}



The expected result is:
"id": "82",
"bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية - - مركز *شرطة إزكي*",

since it contains both words from the query (marked in bold), whereas the
others contain the following:

"id": "63",
"bizNameAr": "شركة ظفار للتأمين ش.م.ع.ع - فرع ازكي"

It contains only one of the query words (ازكي).

"id": "56",
"bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة شمال الشرقية -  - مركز شرطة إبراء"

It contains only one of the query words (شرطة).

"id": "79",
"bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة شمال الشرقية - - مركز شرطة إبراء"

It contains only one of the query words (شرطة).

These 3 records should not appear in the results: the query contains two
words, and only one record contains both of them.
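The behaviour described here is what the default OR operator produces: a document matches if it contains any query term. A minimal Python sketch contrasting OR and AND over the four documents above. It assumes whitespace tokenization and folds only the alef variants أ إ آ to ا (so that إزكي matches ازكي); it does not reproduce Solr's real analysis chain or scoring.

```python
ALEF_VARIANTS = str.maketrans({"أ": "ا", "إ": "ا", "آ": "ا"})

docs = {
    "82": "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية - - مركز شرطة إزكي",
    "63": "شركة ظفار للتأمين ش.م.ع.ع - فرع ازكي",
    "56": "شرطة عمان السلطانية - قيادة شرطة محافظة شمال الشرقية -  - مركز شرطة إبراء",
    "79": "شرطة عمان السلطانية - قيادة شرطة محافظة شمال الشرقية - - مركز شرطة إبراء",
}


def terms_of(text: str) -> set[str]:
    """Whitespace-tokenize and fold alef variants to bare alef (simplified)."""
    return {tok.translate(ALEF_VARIANTS) for tok in text.split()}


def match(query: str, op: str = "OR") -> list[str]:
    """Return sorted IDs of docs matching any (OR) or all (AND) query terms."""
    q = terms_of(query)

    def hit(doc_terms: set[str]) -> bool:
        return q <= doc_terms if op == "AND" else bool(q & doc_terms)

    return sorted(doc_id for doc_id, text in docs.items() if hit(terms_of(text)))


print(match("شرطة ازكي"))            # ['56', '63', '79', '82']
print(match("شرطة ازكي", op="AND"))  # ['82']
```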


I would really like to suggest giving you a real-time demo of our system
together with my Arab colleague, so that it is clearer for you. Let us know
if we can do that.

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4322354.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Arabic words search in solr

2017-02-23 Thread Steve Rowe
Hi Mohan,

I indexed your 9 examples as simple documents after mapping dynamic field 
“*_ar” to the “text_ar” field type:

-
[{"id":"1", "name_ar":"المؤسسة التجارية العمانية"},
{"id":"2", "name_ar":"شركة التأمين الأهلية ش.م.ع.م"},
{"id":"3", "name_ar":"شرطة عمان السلطانية - قيادة شرطة محافظة شمال الشرقية - - مركز شرطة إبراء"},
{"id":"4", "name_ar":"شركة ظفار للتأمين ش.م.ع.ع"},
{"id":"5", "name_ar":"طوارئ المستشفيات   - طوارئ مستشفى صحار"},
{"id":"6", "name_ar":"شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية - - مركز شرطة إزكي"},
{"id":"7", "name_ar":"المؤسسة التجارية العمانية"},
{"id":"8", "name_ar":"وزارة الصحة - المديرية العامة للخدمات الصحية  محافظة الداخلية -  - مستشفى إزكي (البدالة)  - الطوارئ"},
{"id":"9", "name_ar":"أسعار المكالمات الدولية - مونتسرات -  - مونتسرات"}]
-

Then when I search from the Admin UI for “name_ar:شرطة ازكي” (the query in one 
of your screenshots with numFound=0) I get the following results:

-
{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "indent": "true",
      "q": "name_ar:شرطة ازكي",
      "_": "1487912340325",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {
        "id": "6",
        "name_ar": [
          "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية - - مركز شرطة إزكي"
        ],
        "_version_": 1560170434794619000
      },
      {
        "id": "3",
        "name_ar": [
          "شرطة عمان السلطانية - قيادة شرطة محافظة شمال الشرقية - - مركز شرطة إبراء"
        ],
        "_version_": 1560170434793570300
      }
    ]
  }
}
-

So I cannot reproduce the failures you’re seeing.  In fact, I tried all 9 of 
the queries you listed as not working, and all of them matched at least one of 
the above 9 documents, except for case 5 (which I give details for below).  Are 
you absolutely sure that you reindexed your data with ICUFoldingFilterFactory as the last filter?

The one query that didn’t return any matches for me is “name_ar:طوارى صحار”.  
Here’s why:

Indexed original: طوارئ صحار
Indexed analyzed: طواري صحار

Query original: طوارى صحار
Query analyzed: طوار صحار

In the analyzed indexed form, the “ئ” (yeh with hamza above) is left intact by 
ArabicNormalizationFilter and ArabicStemFilter, and then the ICUFoldingFilter 
converts it to “ي” (yeh without the hamza).

In the analyzed query, ArabicNormalizationFilter converts “طوارى” to “طواري” 
(alef maksura->yeh), which ArabicStemFilter converts to “طوار” by removing the 
trailing yeh.
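The mismatch, and the effect of the proposed mapping, can be reproduced outside Solr. The Python sketch below models only two steps: a hypothetical character mapping ى → ئ standing in for the suggested char filter, and diacritic folding via Unicode NFD decomposition plus removal of combining marks, which approximates what ICUFoldingFilter does to yeh with hamza. It skips normalization and stemming because, per the analysis above, those leave ئ intact.

```python
import unicodedata

ALEF_MAKSURA = "\u0649"    # ى
YEH_WITH_HAMZA = "\u0626"  # ئ


def map_alef_maksura(text: str) -> str:
    """Hypothetical char-filter step: rewrite every alef maksura as yeh with hamza."""
    return text.replace(ALEF_MAKSURA, YEH_WITH_HAMZA)


def fold_marks(text: str) -> str:
    """NFD-decompose and drop combining marks, e.g. ئ -> ي (ICU folding approximation)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


indexed, typed = "طوارئ", "طوارى"
print(fold_marks(indexed), fold_marks(typed))  # still two different strings
print(fold_marks(map_alef_maksura(typed)) == fold_marks(indexed))  # True
```

Note that such a mapping also rewrites every legitimate word-final ى, so it would need checking against the rest of the data.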

I don’t know what the correct thing to do is to make alef maksura and yeh match 
each other, but one possibility is adding a char filter that converts all alefs 
maksura into yehs with hamza, like this:



Re: Arabic words search in solr

2017-02-21 Thread mohanmca01
Hi Steve,

As you suggested, I added the ICU folding filter and re-indexed the entire
Solr data, but I'm still unable to get the expected results I highlighted
earlier.

I've attached an Excel sheet with examples of Arabic names so you can
investigate and reproduce the issue.
Arabic_Characters2.xlsx
  

thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4321582.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Arabic words search in solr

2017-02-21 Thread Steve Rowe
Hi Mohan,

It looks to me like the example query should match, since the analyzed query 
terms look like a subset of the analyzed document terms.

Did you re-index your documents after you changed your schema?  If not, then 
the indexed documents won’t have the same terms as the ones you see on the 
Admin UI Analysis pane.

If you have re-indexed, and are still not getting matches you expect, please 
include textual examples of the remaining problems, so that I can copy/paste to 
reproduce the problem - I can’t copy/paste Arabic from images you pointed to.

--
Steve
www.lucidworks.com

> On Feb 21, 2017, at 1:28 AM, mohanmca01  wrote:
> 
> Hi Steve,
> 
> I changed ICU folding filter order and re-index entire Arabic content. But
> still problem is present. I am not able to get the expected result.
> 
> I attached screen shot for your references.
>  
>  
>  
> 
> Kindly check and let me know.
> 
> Thanks
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4321397.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Arabic words search in solr

2017-02-20 Thread mohanmca01
Hi Steve,

I changed the ICU folding filter order and re-indexed the entire Arabic
content, but the problem is still present: I am not getting the expected result.

I've attached screenshots for your reference.
 
 
 

Kindly check and let me know.

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4321397.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Arabic words search in solr

2017-02-15 Thread Steve Rowe
Hi Mohan,

When I said "the ICU folding filter should be the last filter, to allow the 
Arabic normalization and stemming filters to see the original words”, I meant 
that no filter should follow it.  

You did not make that change.

Here’s what I mean:

   
  
   
   
   
   
   
 
   

--
Steve
www.lucidworks.com

> On Feb 15, 2017, at 12:23 AM, mohanmca01  wrote:
> 
> Hi Steve,
> 
> As per your suggestion,I added ICUFoldingFilterFactory in schema.xml as
> below:
> 
> 
>   
>
>
> words="lang/stopwords_ar.txt" />
>
>
>  
>
> 
> I attached expecting result document in previous mail thread for your
> references.
> 
> Kindly check and let me know.
> 
> Thanks
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4320427.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Arabic words search in solr

2017-02-14 Thread mohanmca01
Hi Steve,

As you suggested, I added ICUFoldingFilterFactory to schema.xml as
below:


   





  


I attached a document with the expected results earlier in this thread for
your reference.

Kindly check and let me know.

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4320427.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Arabic words search in solr

2017-02-14 Thread Steve Rowe
Hi Mohan,

Did you change the order of the filters as I suggested?

--
Steve
www.lucidworks.com

On Tue, Feb 14, 2017 at 8:05 AM mohanmca01  wrote:

> Hi Steve,
>
> any update on this .???.. I am waiting for your inputs..
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4320253.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Arabic words search in solr

2017-02-14 Thread mohanmca01
Hi Steve,

Any update on this? I am waiting for your input.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4320253.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Arabic words search in solr

2017-02-08 Thread Steve Rowe
Hi Mohan,

I haven’t looked at the latest problems, but the ICU folding filter should be 
the last filter, to allow the Arabic normalization and stemming filters to see 
the original words.
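Order matters because each filter only sees the output of the one before it. The toy Python pipeline below makes that concrete with three deliberately simplified, hypothetical stand-ins (not Lucene's implementations): a normalizer that maps ى to ي, a stemmer that strips one trailing ي, and mark folding via NFD. With folding last, the toy chain reproduces the طواري vs. طوار mismatch described in the 2017-02-23 message; with folding first, the two words converge. Neither order is right for every input, so whichever you choose is worth checking on the Admin UI Analysis pane.

```python
import unicodedata
from typing import Callable

Filter = Callable[[str], str]


def normalize(tok: str) -> str:
    """Stand-in for Arabic normalization: alef maksura -> yeh."""
    return tok.replace("\u0649", "\u064a")


def toy_stem(tok: str) -> str:
    """Stand-in stemmer: strip one trailing yeh if at least 2 chars remain."""
    return tok[:-1] if tok.endswith("\u064a") and len(tok) >= 3 else tok


def fold(tok: str) -> str:
    """Stand-in for ICU folding: NFD, then drop combining marks."""
    nfd = unicodedata.normalize("NFD", tok)
    return "".join(ch for ch in nfd if unicodedata.category(ch) != "Mn")


def analyze(tok: str, filters: list[Filter]) -> str:
    """Apply the filters in order, each to the previous filter's output."""
    for f in filters:
        tok = f(tok)
    return tok


FOLD_LAST = [normalize, toy_stem, fold]
FOLD_FIRST = [fold, normalize, toy_stem]

for word in ("طوارئ", "طوارى"):
    # fold last: طواري / طوار (different); fold first: طوار / طوار (same)
    print(word, analyze(word, FOLD_LAST), analyze(word, FOLD_FIRST))
```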

--
Steve
www.lucidworks.com

> On Feb 8, 2017, at 10:58 PM, mohanmca01  wrote:
> 
> Hi Steve,
> 
> Thanks for your continues investigation on this issue.
> 
> I added ICU Folding Filter in schema.xml file and re-indexed all the data
> again. i noticed some improvements in search but its not really as expected.
> 
> below is the configuration changed in schema file:
> 
> -
> 
>   
>
> 
> 
> words="lang/stopwords_ar.txt" />
> 
>
>
>  
>
> -
> 
> attached the document for your reference where highlighted ones in red are
> not working as expected.
> 
> Also, i have raised one point regarding Jquery autocomplete with unique
> records..kindly let me know if you have any background on how to implement
> the same.
> 
> arabicSearch.docx
>   
> 
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4319436.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Arabic words search in solr

2017-02-08 Thread mohanmca01
Hi Steve,

Thanks for your continued investigation of this issue.

I added the ICU folding filter to the schema.xml file and re-indexed all the
data. I noticed some improvements in search, but it's still not working as expected.

Below is the changed configuration in the schema file:

-

   


 




  

-

I've attached a document for your reference; the items highlighted in red
are not working as expected.

Also, I raised a question about jQuery autocomplete with unique records.
Kindly let me know if you have any background on how to implement it.

arabicSearch.docx
  


 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4319436.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Arabic words search in solr

2017-02-02 Thread Steve Rowe
Hi Mohan,

I ran your Case #1 through Solr 4.9.0’s Admin UI Analysis pane and I can see 
that the analyzer for the “text_ar” field type does not remove all 
diacritics:

Indexed original: المؤسسة التجارية العمانية
Indexed analyzed: مؤسس تجار عمان

Query original: الموسسة التجارية
Query analyzed: موسس تجار

The analyzed query terms are the same as the first two analyzed indexed terms, 
with one exception: the hamza on the waw in the analyzed indexed term “مؤسس” 
was not stripped off by the analyzer, and so won’t match the analyzed query 
term “موسس”, which was entered by the user without the hamza.

Adding ICUFoldingFilterFactory to the “text_ar” field type fixed case #1 for me 
by stripping the hamza from the waw.  You can read more about this filter in 
the Solr Reference Guide (yes, this is basically for Solr 6.4, but I don’t 
think this functionality has changed between 4.9 and 6.4): 
.
  If you do this, you can remove the LowerCaseFilterFactory since 
ICUFoldingFilterFactory performs lowercasing as part of its work.
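The effect on this example can be approximated in plain Python: NFD decomposition splits ؤ into و plus a combining hamza, and dropping combining marks leaves و. This is only a rough sketch of the diacritic-removal and lowercasing parts of ICU folding (the real filter applies further foldings); the function name is hypothetical.

```python
import unicodedata


def approx_icu_fold(text: str) -> str:
    """Rough stand-in for ICU folding: NFD, drop combining marks, case-fold."""
    nfd = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in nfd if unicodedata.category(ch) != "Mn").casefold()


# Hamza/madda-bearing letters fold to their base letter; the standalone
# hamza ء has no decomposition and is kept as is.
for letter in "أإآؤئء":
    print(letter, "->", approx_icu_fold(letter), unicodedata.name(letter))

print(approx_icu_fold("مؤسس") == approx_icu_fold("موسس"))  # True
```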

Note that to use ICUFoldingFilterFactory you must add three jars to the lib/ 
directory in your solr home dir.  Here’s how I did it:

$ mkdir example/solr/lib
$ cp dist/solr-analysis-extras-4.9.0.jar example/solr/lib/
$ cp contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-4.9.0.jar example/solr/lib/
$ cp contrib/analysis-extras/lib/icu4j-53.1.jar example/solr/lib/

--
Steve
www.lucidworks.com 

> On Feb 1, 2017, at 6:50 AM, mohanmca01  wrote:
> 
> Dear Steve,Thanks for investigating our problem. Our project is basically
> business directory search platform, and we have more than 100+ K business
> details information. I’m providing you some examples of Arabic words to
> reproduce the problem. please find attached word file where i explained
> everything along with screenshots. arabicSearch.docx
>  
> regarding upgrading to the latest version, our project is running on Java
> 1.7V, and if i need to upgrade then we have to upgrade Java, Application
> Server JBoos, and etc. which is not that right time to do this activity at
> all..!!
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4318227.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Arabic words search in solr

2017-02-01 Thread mohanmca01
Dear Steve, thanks for investigating our problem. Our project is basically a
business directory search platform, and we have information on more than 100K
businesses. I’m providing some examples of Arabic words to reproduce the
problem; please find attached a Word file where I explain everything, with
screenshots: arabicSearch.docx

Regarding upgrading to the latest version: our project runs on Java 1.7, and
upgrading would mean also upgrading Java, the application server (JBoss), etc.,
which is not possible for us at this time.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4318227.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Arabic words search in solr

2017-01-31 Thread Steve Rowe
Mohan,

I downloaded and started Solr 4.9.0 and entered your example indexed and 
queried words into the Admin UI’s Analysis pane using the text_ar field type.  
You can see the results here: 
.

Each of the indexed words and the query word are analyzed to the same string.  
They should match and return docs containing them as hits for the query word.

So, what is exactly the problem you are having?  What specifically doesn’t work?

FYI, in general you should be using the most recent release of Solr (6.4.0 
right now) unless there are reasons why you can't.  It’s the most 
stable/performant/supported version.

--
Steve
www.lucidworks.com

> On Jan 31, 2017, at 1:19 AM, mohan sundaram  wrote:
> 
> Hi,
> 
> I went through the solr references document which you shared in the link.
> Your shared references document pointing to solr version 6.4.0.
> 
> The implemented Solr version in my project is 4.9.0.
> 
> 
> As I mentioned earlier In my solr schema.xml I defined product Arabic name
> field as below:
> 
> /*--*/
> 
>  stored="true"/>
> 
> 
> 
>  positionIncrementGap="100">
> 
>
> 
> class="solr.StandardTokenizerFactory"/>
> 
> 
>
> 
> 
> ignoreCase="true" words="lang/stopwords_ar.txt" />
> 
>
> 
>
> 
>
> 
> 
> 
> /*--*/
> 
> 
> 
> I am indexing the Arabic content using “text_ar” field type.
> 
> 
> 
> 
> *Characters*
> 
> *ا*
> 
> *أ*
> 
> *إ*
> 
> *آ*
> 
> Shift key Considers for the above
> 
> Table 1
> 
> 
> These are the example of characters where I’m facing the searching
> difficulty.
> 
> 
> 
> 
> *Example Indexed words*
> 
> *ابرا*
> 
> *أبرا*
> 
> *إبرا*
> 
> *آبرا*
> 
> Table 2
> 
> These an example of indexed words in Solr.
> 
> 
> 
> *Searching word*
> 
> *ابرا*
> 
> Table 3
> 
> 
> Now my problem is, By searching for the above word(table 3) I should get
> all indexed words in table 2 in the output.
> 
> 
> 
> Is Solr version 4.9.0 compatible with Arabic search or do I need to upgrade
> to higher version?
> 
> 
> Kindly, do let me know if I need to give an example of all characters since
> I gave only for one character which is hamza with alef.
> 
> 
> Thanks,
> 
> Mohan
> 
> 
> 
> 
> On Mon, Jan 30, 2017 at 9:21 PM, Steve Rowe  wrote:
> 
>> Hi Mohan,
>> 
>> I answered your question on the solr-user list.  Did you see my response?
>> 
>> I CC’d you on this email, but you should know that Apache mailing lists
>> won’t automatically send you email unless you have subscribed to the list.
>> For more information, see > /community.html#mailing-lists-irc>.
>> 
>> --
>> Steve
>> www.lucidworks.com
>> 
>>> On Jan 29, 2017, at 2:16 PM, mohan sundaram 
>> wrote:
>>> 
>>> Hi,
>>> 
>>> In solr search I want to search with product name using Arabic letters.
>>> While searching, Arabic user can feel little default to search some
>> product
>>> name. Because some characters need to mention while searching.
>>> 
>>> Ex: إ أ آ
>>> 
>>> 
>>> In the above mentioned characters, user can get combination of shift key.
>>> Usually if Arabic people will mention “ ا “  character and will get the
>>> below combined words.
>>> 
>>> Ex: إبرا
>>> 
>>> 
>>> In my solr schema.xml I defined product arabic name field as below
>>> 
>>> 
>>> >> stored="true"/>
>>> 
>>> 
>>> >> positionIncrementGap="100">
>>> 
>>> 
>>> 
>>>   
>>> 
>>>   
>>> 
>>>   >> words="lang/stopwords_ar.txt" />
>>> 
>>>   
>>> 
>>>   
>>> 
>>> 
>>> 
>>>   
>>> 
>>> 
>>> 
>>> What changes I have do in schame.xml. Please help me on this.
>>> 
>>> 
>>> 
>>> --
>>> Regards,
>>> Mohan.N
>>> 096896429683
>> 
>> 



Re: Arabic words search in solr

2017-01-31 Thread Erick Erickson
If you look in the upper-left corner of any reference guide page
you'll see a link to previous versions of the docs and can download
whatever version you are working with back to 4.7 IIRC. I'd download
that and see if there's similar functionality.

On Mon, Jan 30, 2017 at 10:19 PM, mohan sundaram  wrote:
> Hi,
>
>  I went through the solr references document which you shared in the link.
> Your shared references document pointing to solr version 6.4.0.
>
> The implemented Solr version in my project is 4.9.0.
>
>
> As I mentioned earlier In my solr schema.xml I defined product Arabic name
> field as below:
>
> /*--*/
>
>  stored="true"/>
>
>
>
>  positionIncrementGap="100">
>
> 
>
>  class="solr.StandardTokenizerFactory"/>
>
>
> 
>
>
>  ignoreCase="true" words="lang/stopwords_ar.txt" />
>
> 
>
> 
>
> 
>
> 
>
> /*--*/
>
>
>
> I am indexing the Arabic content using “text_ar” field type.
>
>
>
>
> *Characters*
>
> *ا*
>
> *أ*
>
> *إ*
>
> *آ*
>
> Shift key Considers for the above
>
> Table 1
>
>
> These are the example of characters where I’m facing the searching
> difficulty.
>
>
>
>
> *Example Indexed words*
>
> *ابرا*
>
> *أبرا*
>
> *إبرا*
>
> *آبرا*
>
> Table 2
>
> These an example of indexed words in Solr.
>
>
>
> *Searching word*
>
> *ابرا*
>
> Table 3
>
>
> Now my problem is, By searching for the above word(table 3) I should get
> all indexed words in table 2 in the output.
>
>
>
> Is Solr version 4.9.0 compatible with Arabic search or do I need to upgrade
> to higher version?
>
>
> Kindly, do let me know if I need to give an example of all characters since
> I gave only for one character which is hamza with alef.
>
>
> Thanks,
>
> Mohan
>
>
>
>
> On Mon, Jan 30, 2017 at 9:21 PM, Steve Rowe  wrote:
>
>> Hi Mohan,
>>
>> I answered your question on the solr-user list.  Did you see my response?
>>
>> I CC’d you on this email, but you should know that Apache mailing lists
>> won’t automatically send you email unless you have subscribed to the list.
>> For more information, see > /community.html#mailing-lists-irc>.
>>
>> --
>> Steve
>> www.lucidworks.com
>>
>> > On Jan 29, 2017, at 2:16 PM, mohan sundaram 
>> wrote:
>> >
>> > Hi,
>> >
>> > In solr search I want to search with product name using Arabic letters.
>> > While searching, Arabic user can feel little default to search some
>> product
>> > name. Because some characters need to mention while searching.
>> >
>> > Ex: إ أ آ
>> >
>> >
>> > The characters above are typed using the Shift key. Arabic users
>> > usually type the plain “ا” character and expect to get words like
>> > the one below.
>> >
>> > Ex: إبرا
>> >
>> >
>> > In my Solr schema.xml I defined the product Arabic name field as below:
>> >
>> >
>> > > > stored="true"/>
>> >
>> >
>> >  > > positionIncrementGap="100">
>> >
>> >  
>> >
>> >
>> >
>> >
>> >
>> >> > words="lang/stopwords_ar.txt" />
>> >
>> >
>> >
>> >
>> >
>> >  
>> >
>> >
>> >
>> >
>> >
>> > What changes do I have to make in schema.xml? Please help me with this.
>> >
>> >
>> >
>> > --
>> > Regards,
>> > Mohan.N
>> > 096896429683
>>
>>


Re: Arabic words search in solr

2017-01-30 Thread mohanmca01
Hi Steve,

Thanks for sharing the information. 

I went through the Solr reference document you shared in the link. It
covers Solr version 6.4.0, but the Solr version used in my project is 4.9.0.

As I mentioned earlier, in my Solr schema.xml I defined the product Arabic
name field as below:

/*--*/
 
 









/*--*/


I am indexing the Arabic content using the “text_ar” field type.

 
Characters
ا
أ
إ
آ
The Shift key is needed to type the hamza forms above.
Table 1

These are examples of the characters that cause me difficulty when
searching.
 
Example Indexed words
ابرا
أبرا
إبرا
آبرا
Table 2

These are examples of indexed words in Solr.
 
Searching word
ابرا
Table 3

My problem is this: when I search for the word above (Table 3), I should get
all of the indexed words in Table 2 in the output.
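As a rough illustration of the expected behaviour (not Solr code), here is a toy inverted index in Python. It assumes that the same alef-with-hamza folding performed by Lucene's ArabicNormalizer is applied both to the indexed words and to the query, which is what happens when a field type declares a single `<analyzer>` used at index and query time. `normalize_alef`, `build_index` and `search` are hypothetical helpers made up for this sketch.

```python
# Toy model of index-time + query-time normalization (illustration only, not Solr code).
# Assumption: only the alef-with-hamza/madda folding is modelled here.
ALEF_VARIANTS = {
    "\u0622": "\u0627",  # ALEF WITH MADDA ABOVE (آ) -> ALEF (ا)
    "\u0623": "\u0627",  # ALEF WITH HAMZA ABOVE (أ) -> ALEF (ا)
    "\u0625": "\u0627",  # ALEF WITH HAMZA BELOW (إ) -> ALEF (ا)
}


def normalize_alef(term: str) -> str:
    """Fold the hamza/madda alef forms to a bare alef."""
    return "".join(ALEF_VARIANTS.get(ch, ch) for ch in term)


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each normalized whitespace-separated token to the ids of the docs containing it."""
    index: dict[str, set[str]] = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(normalize_alef(token), set()).add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Normalize the query the same way as the indexed tokens, then look it up."""
    return index.get(normalize_alef(query), set())


# The four indexed words from Table 2.
docs = {"1": "ابرا", "2": "أبرا", "3": "إبرا", "4": "آبرا"}
index = build_index(docs)
print(sorted(search(index, "ابرا")))  # ['1', '2', '3', '4']
```

If the normalization were applied only at index time, the query "أبرا" would no longer match anything, which is why the same chain has to run on both sides.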
 
Is Solr version 4.9.0 compatible with Arabic search, or do I need to upgrade
to a higher version?

Please let me know if I need to give examples for all the characters, since
I gave one only for a single character, alef with hamza.

Thanks,
Mohan



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4317941.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Arabic words search in solr

2017-01-29 Thread Steve Rowe
Hi Mohan,

The analyzer in your text_ar field type looks like an expanded version of the 
one suggested in the Solr Reference Guide[1].

Can you give an example of a query that doesn't match indexed text you expect
it to match?

ArabicNormalizationFilterFactory, which uses Lucene’s ArabicNormalizer[2],
should convert alefs with hamza to plain alef, among several other
normalizations.
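A rough Python approximation of the normalizations that the ArabicNormalizer javadocs list (hamza/madda alef to bare alef, alef maksura to yeh, teh marbuta to heh, removal of tatweel and of the harakat diacritics) is sketched below. It is an illustration written for this thread, not Lucene's code; `arabic_normalize` is a hypothetical name, and the exact character set should be checked against the javadocs.

```python
# Approximation of the normalizations documented for Lucene's ArabicNormalizer.
# Illustrative sketch only; the javadocs/source are the authoritative reference.
FOLD = {
    "\u0622": "\u0627",  # ALEF WITH MADDA ABOVE -> ALEF
    "\u0623": "\u0627",  # ALEF WITH HAMZA ABOVE -> ALEF
    "\u0625": "\u0627",  # ALEF WITH HAMZA BELOW -> ALEF
    "\u0649": "\u064A",  # ALEF MAKSURA (dotless yeh) -> YEH
    "\u0629": "\u0647",  # TEH MARBUTA -> HEH
}
# TATWEEL (U+0640) and the harakat FATHATAN..SUKUN (U+064B-U+0652) are removed.
DROP = {"\u0640"} | {chr(cp) for cp in range(0x064B, 0x0653)}


def arabic_normalize(term: str) -> str:
    """Drop tatweel/diacritics and fold the remaining characters per FOLD."""
    out = []
    for ch in term:
        if ch in DROP:
            continue
        out.append(FOLD.get(ch, ch))
    return "".join(out)


# All four spellings from the example collapse to the same indexed term.
for word in ["ابرا", "أبرا", "إبرا", "آبرا"]:
    print(word, "->", arabic_normalize(word))
```

Because every variant in the example reduces to the same string, a query typed with the plain alef and a document spelled with a hamza form should meet on the same term, provided the filter runs at both index and query time.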

The Light 10 stemming algorithm implemented by ArabicNormalizer and 
ArabicStemmer[3] is described here: 
.
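The idea behind light stemming, stripping a small set of common prefixes (such as the article "ال") and suffixes (such as the plural "ات") from each token, can be sketched in Python as follows. The affix lists and minimum-length checks are assumptions modelled on Lucene's ArabicStemmer and written from memory, and `light_stem` is a hypothetical helper; consult the ArabicStemmer javadocs and source for the real rules.

```python
# Sketch of Light-10-style light stemming (illustration only, not Lucene's implementation).
# Assumed affix lists, modelled on those used by Lucene's ArabicStemmer.
PREFIXES = ["ال", "وال", "بال", "كال", "فال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]


def stem_prefix(term: str) -> str:
    """Remove at most one prefix, keeping at least two characters
    (three for the single-letter waw prefix)."""
    for p in PREFIXES:
        min_len = 4 if len(p) == 1 else len(p) + 2
        if len(term) >= min_len and term.startswith(p):
            return term[len(p):]
    return term


def stem_suffix(term: str) -> str:
    """Try each suffix in order, removing it while at least two characters would remain."""
    for s in SUFFIXES:
        if len(term) >= len(s) + 2 and term.endswith(s):
            term = term[: -len(s)]
    return term


def light_stem(term: str) -> str:
    """Prefix removal first, then suffix removal."""
    return stem_suffix(stem_prefix(term))


for w in ["الاتصال", "الاتصالات", "مسقط"]:
    print(w, "->", light_stem(w))
```

In this sketch "الاتصال" and "الاتصالات" both reduce to "اتصال", while a word without listed affixes such as "مسقط" is left unchanged.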

[1] Solr Ref Guide: Language Analysis: Arabic
[2] ArabicNormalizer javadocs
[3] ArabicStemmer javadocs

--
Steve
www.lucidworks.com

> On Jan 29, 2017, at 2:12 PM, mohan sundaram  wrote:
> 
> Hi,
> 
> In Solr I want to search by product name using Arabic letters.
> Arabic users may find it a little difficult to search for some product
> names, because some characters have to be typed exactly when searching.
> 
> Ex: إ أ آ
> 
> 
> The characters above are typed using the Shift key. Arabic users
> usually type the plain “ا” character and expect to get words like
> the one below.
> 
> Ex: إبرا
> 
> 
> In my Solr schema.xml I defined the product Arabic name field as below:
> 
> 
>  stored="true"/>
> 
> 
>   positionIncrementGap="100">
> 
>  
> 
>
> 
>
> 
> words="lang/stopwords_ar.txt" />
> 
>
> 
>
> 
>  
> 
>
> 
> 
> 
> What changes do I have to make in schema.xml? Please help me with this.
> 
> 
> 
> --
> Regards,
> Mohan.N
> 096896429683