Hi, I'm using Tesseract 4 with Visual Studio 2017. I started with English text and have now begun adding Arabic as well. The problem is that I get weird characters, even after switching from eng.traineddata to ara.traineddata.
I found out that these are the characters you get when the UTF-8 code of the Arabic text is treated as hex character codes.

This is the image I want to recognize:
[image: image] <https://user-images.githubusercontent.com/35866217/39512903-1cb50738-4e25-11e8-8d0b-eba47c163d8b.png>

This is the result:
[image: image] <https://user-images.githubusercontent.com/35866217/39513059-9a6a2424-4e25-11e8-9bcd-2eb56d3437b6.png>

I converted the Arabic letters to their UTF-8 code using this website: https://r12a.github.io/app-conversion/
When I take that code and convert it, as hex codes, back to characters, I get the same characters that Tesseract showed me the first time. The text is:

"عبدالسلام مدي عبدالعزيز"

I think the problem is with UTF-8 handling, or maybe my Visual Studio can't display Arabic letters. I'm not sure which.

On Thursday, April 26, 2018 at 1:00:47 PM UTC+8, Amir Raouf wrote:
> First, the Arabic is read by Tesseract with good accuracy, but NO DIGITS
> are read, so I decided to train only numbers with the specific font I need.
>
> This is the question:
> https://stackoverflow.com/questions/50029477/issue-with-training-tesseract-4-0
>
> Any advice?
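The symptom described in this post (Arabic text turning into the characters you get from its UTF-8 hex codes) can be reproduced outside Tesseract. The following is only a minimal sketch in Python, not the poster's actual setup. It assumes the output is being displayed with a single-byte encoding such as Latin-1, which is a guess about the cause rather than a confirmed diagnosis. The variable names are made up for illustration.

```python
# Sample word taken from the text quoted in the post.
text = "عبدالسلام"

# Correct UTF-8 encoding: each Arabic letter here takes two bytes.
utf8_bytes = text.encode("utf-8")

# ASSUMPTION: the viewer interprets every byte as its own character.
# Latin-1 maps each byte value 1:1 to a code point, so it models that misreading.
garbled = utf8_bytes.decode("latin-1")

# If that is what happened, the damage is reversible:
# turn the wrong characters back into bytes, then decode them as UTF-8.
recovered = garbled.encode("latin-1").decode("utf-8")

print(utf8_bytes.hex(" "))          # the hex codes of the UTF-8 bytes
print(len(text), len(garbled))      # letters vs. misread characters
print(recovered == text)
```

If the round trip recovers the original text, the recognized bytes are probably fine and only the display (console, debugger view, or string conversion) is decoding them with the wrong encoding.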