Please see https://github.com/Shreeshrii/tessdata_arabic
You can try the new traineddata from there, along with the PR
https://github.com/tesseract-ocr/tesseract/pull/2266

On Mon, Feb 25, 2019 at 9:27 PM Soufiane Sabiri <[email protected]> wrote:

> Have you had any luck training Tesseract for Arabic letters or numbers?
>
> On Sunday, November 25, 2018 at 9:09:33 AM UTC+1, [email protected] wrote:
>>
>> Hi Marwa M. Khan,
>>
>> Have you generated any tessdata for Arabic-Indic numbers?
>>
>> I'm trying to generate one, but JTessBoxEditor does not accept
>> Arabic-Indic numbers. How can I fix it?
>>
>> On Thursday, July 19, 2018 at 12:52:24 PM UTC+3, Marwa M. Khan wrote:
>>>
>>> Hello,
>>>
>>> I am trying to train Tesseract 4.0 with LSTM on Arabic/Hindi
>>> digits on Windows. I found that I need to create a box file, so I am
>>> using JTessBoxEditor 2.0 to create the tiff and box files. However, it
>>> fails when I use JTessBoxEditor 2.0 to generate the .traineddata file.
>>> Note that I chose combine_tessdata.exe as the tesseract executable,
>>> ara.arial.exp0.box as the training data, and "training with existing
>>> box" as the training mode.
>>>
>>> The output is as follows:
>>>
>>> Tesseract Open Source OCR Engine v4.0.0-beta.1-108-gf291 with Leptonica
>>> Page 1
>>> Bad box coordinates in boxfile string! ١ ٤٥٤ ٣١٦٣ ٤٦٣ ٣١٩٠ ٠
>>> Bad box coordinates in boxfile string! ٢ ٤١٣ ٣١٦٣ ٤٢٨ ٣١٩٠ ٠
>>> Bad box coordinates in boxfile string! ٣ ٣٧٣ ٣١٦٣ ٣٩٣ ٣١٩٠ ٠
>>> Bad box coordinates in boxfile string! ٤ ٣٣٨ ٣١٦٣ ٣٥٠ ٣١٩٠ ٠
>>> Bad box coordinates in boxfile string! ٥ ٢٩٨ ٣١٦٨ ٣١٤ ٣١٨٥ ٠
>>> Bad box coordinates in boxfile string! ٦ ٢٥٨ ٣١٦٣ ٢٧٣ ٣١٩٠ ٠
>>> Bad box coordinates in boxfile string! ٧ ٢١٩ ٣١٦٣ ٢٣٨ ٣١٩٠ ٠
>>> Bad box coordinates in boxfile string! ٨ ١٨٠ ٣١٦٣ ٢٠٠ ٣١٩٠ ٠
>>> Bad box coordinates in boxfile string! ٩ ١٤٥ ٣١٦٣ ١٥٩ ٣١٩٠ ٠
>>> Bad box coordinates in boxfile string! ٠ ١٠٩ ٣١٦٧ ١١٧ ٣١٧٨ ٠
>>> Bad box coordinates in boxfile string! ١ ٤٥٤ ٣٠١٥ ٤٦٣ ٣٠٤٢ ٠
>>> Bad box coordinates in boxfile string! ٢ ٤١٣ ٣٠١٥ ٤٢٨ ٣٠٤٢ ٠
>>> Bad box coordinates in boxfile string! ٣ ٣٧٣ ٣٠١٥ ٣٩٣ ٣٠٤٢ ٠
>>> Bad box coordinates in boxfile string! ٤ ٣٣٨ ٣٠١٥ ٣٥٠ ٣٠٤٢ ٠
>>> Bad box coordinates in boxfile string! ٥ ٢٩٨ ٣٠٢٠ ٣١٤ ٣٠٣٧ ٠
>>> Bad box coordinates in boxfile string! ٦ ٢٥٨ ٣٠١٥ ٢٧٣ ٣٠٤٢ ٠
>>>
>>> Could you please tell me what I did wrong or how to fix this error?
>>>
>>> Best Regards,
>>> Marwa M. Khan

--
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduWLeVKjkMBQw8t%3DOOtFHs6ZxFvgCT7UPKEGK6b_cOW54w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
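
A note on the error output quoted above: in every rejected box line the coordinate fields are written with Arabic-Indic digits, not only the glyph in the first field. One plausible reading is that the box-file parser accepts only ASCII digits for the coordinates. That is an assumption and is not confirmed anywhere in this thread. Under that assumption, a minimal Python sketch that rewrites only the numeric fields and leaves the glyph untouched might look like the following. The function name normalize_box_line is hypothetical, and the field layout "glyph left bottom right top page" is assumed from the six fields visible in the log.

```python
# Assumption: coordinates must be ASCII digits; the glyph may stay Arabic-Indic.
# Translation table: Arabic-Indic digits U+0660..U+0669 -> ASCII "0".."9".
ARABIC_INDIC_TO_ASCII = {0x0660 + i: str(i) for i in range(10)}


def normalize_box_line(line: str) -> str:
    """Return the box line with its numeric fields rewritten in ASCII digits.

    Hypothetical helper: the first field (the glyph) is kept exactly as it is,
    the remaining fields (left, bottom, right, top, page) are translated.
    Lines with fewer than six fields are returned unchanged.
    """
    fields = line.split()
    if len(fields) < 6:
        return line
    glyph, numbers = fields[0], fields[1:]
    converted = [n.translate(ARABIC_INDIC_TO_ASCII) for n in numbers]
    return " ".join([glyph] + converted)


if __name__ == "__main__":
    # First rejected line from the log, written with escapes to avoid
    # right-to-left display issues in editors.
    sample = ("\u0661 \u0664\u0665\u0664 \u0663\u0661\u0666\u0663 "
              "\u0664\u0666\u0663 \u0663\u0661\u0669\u0660 \u0660")
    print(normalize_box_line(sample))
```

To try it on a real file, read ara.arial.exp0.box as UTF-8, pass each line through normalize_box_line, and write the result to a copy, not over the original, so the two can be compared before retraining.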

