[tesseract-ocr] Re: How to make training for Arabic in Tesseract 4.0

Ibr Thu, 04 May 2017 05:44:15 -0700

I replied to it.

On Thursday, May 4, 2017 at 3:06:34 PM UTC+3, Ahmad Moawad wrote:
>
> Check your email.
>
> On Thursday, May 4, 2017 at 1:51:04 PM UTC+2, Ibr wrote:
>>
>> [email protected]
>>
>> On Thursday, May 4, 2017 at 2:47:12 PM UTC+3, Ahmad Moawad wrote:
>>>
>>> Ibr, give me your email!
>>>
>>> On Thursday, May 4, 2017 at 1:06:22 PM UTC+2, Ibr wrote:
>>>>
>>>> While I was creating lstmf files so that I could use them for recognizing 
>>>> text images, I found that some characters are recognized incorrectly. 
>>>> Some of them are not integrated in Tesseract, and some of the errors are 
>>>> due to particular writing styles in Arabic itself.
>>>>
>>>> In the latter case Tesseract behaves correctly, but the Arabic font uses a 
>>>> different script style. In the other case, Tesseract makes a mistake in 
>>>> detecting the characters. 
>>>> Both cases are described in this issue that I opened a few days ago:
>>>> https://github.com/tesseract-ocr/tesseract/issues/840
>>>>
>>>>
>>>> On Thursday, May 4, 2017 at 12:49:01 PM UTC+3, Ahmad Moawad wrote:
>>>>
>>>>> My scenario involves training from images, not from text. 
>>>>> I want to fine-tune characters such as: 
>>>>> لمجرد not ملجرد
>>>>> and so on.
>>>>>
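
To pin down which characters differ in a pair like the one quoted above, a character-by-character comparison can help. The following is only a minimal sketch: the function name `mismatched_positions` is hypothetical, and it assumes the first word in the message is the intended text and the second is what was recognized. The strings are written as Unicode escapes so that right-to-left display does not obscure their logical order.

```python
from itertools import zip_longest


def mismatched_positions(expected, recognized):
    """Return (index, expected_char, recognized_char) for each differing position."""
    pairs = zip_longest(expected, recognized, fillvalue="")
    return [(i, e, r) for i, (e, r) in enumerate(pairs) if e != r]


# Assumption: the first word is the intended text, the second the OCR output.
expected = "\u0644\u0645\u062c\u0631\u062f"    # لمجرد (lam, meem, jeem, reh, dal)
recognized = "\u0645\u0644\u062c\u0631\u062f"  # ملجرد (meem, lam, jeem, reh, dal)

for index, exp_ch, rec_ch in mismatched_positions(expected, recognized):
    # Print code points so the logical order is unambiguous.
    print(index, f"U+{ord(exp_ch):04X}" if exp_ch else "-",
          f"U+{ord(rec_ch):04X}" if rec_ch else "-")
```

For this pair the only differences are at positions 0 and 1, where lam and meem are swapped.
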
>>>>> On Thursday, May 4, 2017 at 11:28:13 AM UTC+2, Ibr wrote:
>>>>>>
>>>>>> If you are referring to Tesseract 4.00alpha with Leptonica 1.74.1, 
>>>>>> and if you compiled them correctly and got the binaries that you need 
>>>>>> for training lstmf files, then I recommend following the suggestion 
>>>>>> made by the Tesseract devs: once you create an .lstmf file for a 
>>>>>> certain font (one that can be used for Arabic writing), get the 
>>>>>> official ara.traineddata file from GitHub, paste it in the tessdata 
>>>>>> folder, put the lstmf file in the tesseract folder, and run the command:
>>>>>>
>>>>>>     tesseract text_image result_text -l ara --oem 1
>>>>>>
>>>>>> Which Arabic characters exactly are you trying to improve the accuracy 
>>>>>> for?
>>>>>>
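
The recognition command quoted above can also be assembled from a script, which makes it easier to repeat over many images. This is a rough sketch, not an official recipe: the helper name `build_tesseract_command` and the file name `text_image.png` are placeholders, and the optional `--tessdata-dir` argument is included only as one way to point Tesseract at the folder containing ara.traineddata.

```python
import shlex


def build_tesseract_command(image, output_base, lang="ara", oem=1, tessdata_dir=None):
    """Return the argument list for a Tesseract recognition run."""
    cmd = ["tesseract", str(image), str(output_base), "-l", lang, "--oem", str(oem)]
    if tessdata_dir is not None:
        # Optional: folder that holds ara.traineddata
        cmd += ["--tessdata-dir", str(tessdata_dir)]
    return cmd


if __name__ == "__main__":
    # Placeholder file names; replace with your own image and output base.
    cmd = build_tesseract_command("text_image.png", "result_text")
    print(" ".join(shlex.quote(part) for part in cmd))
    # To actually run it (requires Tesseract to be installed):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when image paths contain spaces.
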
>>>>>> On Saturday, April 8, 2017 at 11:52:25 AM UTC+3, Ahmad Moawad wrote:
>>>>>>
>>>>>>> Hello All,
>>>>>>>
>>>>>>>
>>>>>>> I want to train Tesseract 4.0 for the Arabic language. The results 
>>>>>>> of this version are great but still need some tuning, so I got 
>>>>>>> jTessBoxEditor 2.0 beta.
>>>>>>> I tried to correct the incorrect characters and build 
>>>>>>> ara.traineddata. After copying the ara.traineddata to 
>>>>>>> /usr/share/tesseract-ocr/4.00/tessdata, I got random characters when 
>>>>>>> I ran Tesseract on the image.
>>>>>>> So, are there any suggestions on how to train version 4.0? I already 
>>>>>>> know that the Cube engine from version 3.0x is not included in the 4.0 
>>>>>>> LSTM engine. Or should I wait until Ray releases another updated 
>>>>>>> ara.traineddata?
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>

