While I was creating .lstmf files so that I could use them to recognize text
images, I found that some characters are recognized incorrectly. Some of
them are not supported in Tesseract, and others are caused by the way
certain Arabic text is written.

In the second case Tesseract behaves correctly, and the difference comes
from the font's Arabic script style. In the first case, however, Tesseract
actually makes mistakes when detecting the characters.
Both cases are described in this issue, which I opened a few days ago:
https://github.com/tesseract-ocr/tesseract/issues/840
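
To check these misreadings quickly, the recognition command Ibr suggested
in the quoted reply (tesseract text_image result_text -l ara --oem 1) can
be scripted. This is a minimal Python sketch, assuming the tesseract binary
is on PATH and ara.traineddata is already in the tessdata folder. The
function names are hypothetical helpers written for illustration, not part
of Tesseract; the sketch only relies on the CLI writing its plain-text
output to <outputbase>.txt.

```python
import subprocess
from pathlib import Path
from typing import List


def build_command(image: str, out_base: str, lang: str = "ara", oem: int = 1) -> List[str]:
    """Build the argv for: tesseract <image> <out_base> -l <lang> --oem <oem>."""
    # --oem 1 selects the LSTM engine in Tesseract 4.x.
    return ["tesseract", image, out_base, "-l", lang, "--oem", str(oem)]


def recognize(image: str, out_base: str = "result_text") -> str:
    """Run tesseract (assumed to be on PATH) and return the recognized text."""
    subprocess.run(build_command(image, out_base), check=True)
    # Tesseract writes the plain-text result to <out_base>.txt.
    return Path(out_base + ".txt").read_text(encoding="utf-8")


def missing_words(recognized: str, expected: List[str]) -> List[str]:
    """Return the expected words that do not appear as whitespace-separated tokens.

    This is a crude check: a word with punctuation attached will not match.
    """
    tokens = set(recognized.split())
    return [word for word in expected if word not in tokens]
```

For example, missing_words(recognize("text_image.png"), ["لمجرد"]) returns
["لمجرد"] if the word was misread (e.g. as ملجرد), and an empty list if it
was recognized. The file name text_image.png is only a placeholder.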


On Thursday, May 4, 2017 at 12:49:01 PM UTC+3, Ahmad Moawad wrote:

> My scenario is about training from images rather than from text. I
> want to fine-tune characters such as:
> لمجرد not ملجرد
> and so on.
>
> On Thursday, May 4, 2017 at 11:28:13 AM UTC+2, Ibr wrote:
>>
>> If you are referring to Tesseract 4.00alpha with Leptonica 1.74.1, and
>> you compiled them correctly and got the binaries you need for training
>> .lstmf files, then I recommend following the suggestion made by the
>> Tesseract devs: once you create an .lstmf file for a certain font (one
>> that can be used for Arabic writing), get the official ara.traineddata
>> file from GitHub, paste it into the tessdata folder, put the .lstmf
>> file in the tesseract folder, and run the command tesseract text_image
>> result_text -l ara --oem 1
>> Which Arabic characters exactly are you trying to improve the accuracy
>> for?
>>
>> On Saturday, April 8, 2017 at 11:52:25 AM UTC+3, Ahmad Moawad wrote:
>>
>>> Hello All,
>>>
>>>
>>> I want to train Tesseract 4.0 for the Arabic language. The results of
>>> this version are great but still need some tuning, so I got
>>> jTessBoxEditor 2.0 beta.
>>> I tried to correct the wrongly recognized characters and build
>>> ara.traineddata. After copying ara.traineddata to
>>> /usr/share/tesseract-ocr/4.00/tessdata, I got random characters when I
>>> ran tesseract on the image.
>>> Does anyone have suggestions on how to do training for version 4.0? I
>>> already know that the cube engine from the last version (3.0x) isn't
>>> included in the 4.0 LSTM. Or should I just wait until Ray makes another
>>> updated ara.traineddata?
>>>
>>> Thanks.
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/5b826196-eaee-4a35-8e81-2bb7d0dc4367%40googlegroups.com.
