Hi, I'm using Tesseract 4 with Visual Studio 2017. I started with English
text and have now begun adding Arabic as well. The problem is that I get
weird characters in the output, even after switching from eng.traineddata
to ara.traineddata.

I found that these are the characters you get when each byte of the UTF-8
encoding is treated as a separate character code (hex value).
[image: image] 
<https://user-images.githubusercontent.com/35866217/39512903-1cb50738-4e25-11e8-8d0b-eba47c163d8b.png>
This is the image I want to recognize.
The result is:
[image: image] 
<https://user-images.githubusercontent.com/35866217/39513059-9a6a2424-4e25-11e8-9bcd-2eb56d3437b6.png>
I converted the Arabic letters to their UTF-8 byte values using this website:
https://r12a.github.io/app-conversion/
When I then convert each of those byte values, as a hex code, back to a
character, I get the same characters that Tesseract showed me the first time:
"عبدالسلام مدي عبدالعزيز"

I think the problem is with how the UTF-8 output is handled, or perhaps my
Visual Studio setup can't display Arabic letters, or something else.
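
To illustrate what I mean, here is a minimal sketch of the effect, not my actual program. It assumes the text coming out of Tesseract (for example from GetUTF8Text()) is correct UTF-8 and that only the display step misreads it. The function names misread_as_latin1 and decode_utf8 are made up for this example, and Latin-1 stands in for whatever single-byte code page the console or debugger really uses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: simulates a viewer that treats every byte of a UTF-8
// string as one Latin-1 character. The result is re-encoded as UTF-8 so the
// garbled text can be printed or compared.
std::string misread_as_latin1(const std::string& utf8_bytes) {
    std::string out;
    for (unsigned char b : utf8_bytes) {
        if (b < 0x80) {
            out += static_cast<char>(b);                 // ASCII is unchanged
        } else {
            out += static_cast<char>(0xC0 | (b >> 6));   // U+0080..U+00FF
            out += static_cast<char>(0x80 | (b & 0x3F)); // as two UTF-8 bytes
        }
    }
    return out;
}

// Hypothetical helper: decodes UTF-8 into code points. It assumes the input
// is well-formed and does no validation beyond a bounds check.
std::u32string decode_utf8(const std::string& s) {
    std::u32string out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        // Number of continuation bytes implied by the lead byte.
        int extra = b < 0x80 ? 0 : b < 0xE0 ? 1 : b < 0xF0 ? 2 : 3;
        assert(i + extra < s.size());
        char32_t cp = extra == 0 ? b : b & (0x3F >> extra);
        for (int k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(s[i + k]);
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

For example, the first Arabic letter in my text is U+0639, which is the two bytes D8 B9 in UTF-8. decode_utf8("\xD8\xB9") gives that single code point, while misread_as_latin1("\xD8\xB9") gives the two characters U+00D8 and U+00B9, which is how the garbled string above begins. If this is what is happening, the bytes themselves are fine and they only need to be decoded as UTF-8 before being shown.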

On Thursday, April 26, 2018 at 1:00:47 PM UTC+8, Amir Raouf wrote:
>
> First The arabic is read by tesseract with good accuracy but NO DIGITS 
> read so I decided to train only numbers with specific font I need
>
> This is the question 
> https://stackoverflow.com/questions/50029477/issue-with-training-tesseract-4-0
>
> Any advice
>
