Replied to it.

On Thursday, May 4, 2017 at 3:06:34 PM UTC+3, Ahmad Moawad wrote:
>
> check ur email
>
> On Thursday, May 4, 2017 at 1:51:04 PM UTC+2, Ibr wrote:
>>
>> [email protected]
>>
>> On Thursday, May 4, 2017 at 2:47:12 PM UTC+3, Ahmad Moawad wrote:
>>>
>>> Ibr, give me your email!
>>>
>>> On Thursday, May 4, 2017 at 1:06:22 PM UTC+2, Ibr wrote:
>>>>
>>>> While I was creating .lstmf files so that I could use them to recognize
>>>> text in images, I found that some characters are recognized incorrectly.
>>>> Some of them are not supported by Tesseract, and some are due to the way
>>>> certain Arabic scripts are written.
>>>>
>>>> In the latter case Tesseract behaves correctly, but the Arabic font uses
>>>> a different script style; in the former case, Tesseract makes mistakes in
>>>> detecting the characters. Both cases are described in this issue that I
>>>> opened a few days ago:
>>>> https://github.com/tesseract-ocr/tesseract/issues/840
>>>>
>>>>
>>>> On Thursday, May 4, 2017 at 12:49:01 PM UTC+3, Ahmad Moawad wrote:
>>>>
>>>>> My scenario is about training from images, not from text. I want to
>>>>> fine-tune characters such as:
>>>>> لمجرد, not ملجرد
>>>>> and so on.
>>>>>
>>>>> On Thursday, May 4, 2017 at 11:28:13 AM UTC+2, Ibr wrote:
>>>>>>
>>>>>> If you are referring to Tesseract 4.00alpha with Leptonica 1.74.1,
>>>>>> and you compiled them correctly and got the binaries that you need
>>>>>> for training .lstmf files, then I recommend following the suggestions
>>>>>> made by the Tesseract devs: once you create an .lstmf file for a
>>>>>> certain font (one that can be used for Arabic writing), get the
>>>>>> official ara.traineddata file from GitHub and paste it into the
>>>>>> tessdata folder, put the .lstmf file in the tesseract folder, and run
>>>>>> the command
>>>>>> tesseract text_image result_text -l ara --oem 1
>>>>>> Which Arabic characters exactly are you trying to improve the accuracy
>>>>>> for?
>>>>>>
>>>>>> On Saturday, April 8, 2017 at 11:52:25 AM UTC+3, Ahmad Moawad wrote:
>>>>>>
>>>>>>> Hello All,
>>>>>>>
>>>>>>>
>>>>>>> I want to train Tesseract 4.0 for the Arabic language. The results
>>>>>>> of this version are great but still need some tuning, so I got
>>>>>>> jTessBoxEditor 2.0 beta.
>>>>>>> I tried to correct the misrecognized characters and build
>>>>>>> ara.traineddata. After copying ara.traineddata to
>>>>>>> /usr/share/tesseract-ocr/4.00/tessdata, I got random characters when
>>>>>>> I ran tesseract on the image.
>>>>>>> So, any suggestions on how to train for version 4.0? I already know
>>>>>>> that the Cube engine from the previous 3.0x versions isn't included
>>>>>>> in the 4.0 LSTM version. Or should I wait until Ray makes another
>>>>>>> updated ara.traineddata?
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>
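For the example Ahmad gives (لمجرد misread as ملجرد), a quick code-point comparison can confirm what kind of error it is before any fine-tuning is attempted. The minimal Python sketch below uses only the standard `unicodedata` module and shows that both strings contain the same five letters and differ only at the first two positions (lam and meem swapped). The helper names `describe` and `mismatched_positions` are hypothetical, written just for this illustration; they are not part of Tesseract or its training tools.

```python
import unicodedata

# Strings quoted in the thread, written as escapes so the exact code points are unambiguous.
EXPECTED = "\u0644\u0645\u062c\u0631\u062f"    # لمجرد (what the image says)
OCR_OUTPUT = "\u0645\u0644\u062c\u0631\u062f"  # ملجرد (what Tesseract returned)


def describe(text: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


def mismatched_positions(expected: str, actual: str) -> list[int]:
    """Indices where the two strings differ, compared position by position.

    Hypothetical helper: assumes equal-length strings (zip stops at the shorter one).
    """
    return [i for i, (a, b) in enumerate(zip(expected, actual)) if a != b]


if __name__ == "__main__":
    for label, text in (("expected", EXPECTED), ("ocr", OCR_OUTPUT)):
        print(label, describe(text))
    print("differing positions:", mismatched_positions(EXPECTED, OCR_OUTPUT))
    print("same letters, different order:", sorted(EXPECTED) == sorted(OCR_OUTPUT))
```

Running it prints each character's code point and Unicode name and reports that only positions 0 and 1 differ, i.e. a letter-order error rather than a wrong letter. Because the comparison is positional, it only suits equal-length strings; for insertions or deletions, `difflib.SequenceMatcher` would be a better fit.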
-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/7d05a823-e5a8-4673-8d4f-b8a2e4b147a0%40googlegroups.com.

