[tesseract-ocr] Not recognizing Arabic numbers, but recognizes letters

Aijolomohi Egwaikhide Mon, 06 Apr 2020 11:12:25 -0700

Hi, I am working with OCR to recognize Arabic words and numbers (dates) from
a scanned PDF (I have done some enhancement on it), but it can't seem to
read Arabic numbers accurately, although it reads the letters properly. What can
I do to make this better?


An example of the numbers (in date format):

[image: Screen Shot 2020-04-06 at 10.41.21 AM.png]

[image: Screen Shot 2020-04-06 at 10.42.42 AM.png]
The first one is the input and the second one is the output: it was read as
English even though I specified that it is all Arabic. When I open the
document in Google Docs as a Word document, the numbers are read fine.
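For reference, here is a minimal sketch of what "specifying Arabic" typically looks like from Python. It assumes the `pytesseract` wrapper, Pillow, and Tesseract's `ara` traineddata are installed (none of these are named in the post). Because the output may contain Western digits (0-9) where the source date uses Arabic-Indic digits, the sketch also includes two hypothetical helpers, `to_western_digits` and `to_arabic_indic_digits`, that map between the two digit sets so the OCR result can be compared with the scan.

```python
# Minimal sketch; assumes pytesseract, Pillow and Tesseract's "ara" model are installed.

# Western (ASCII) digits and their Arabic-Indic counterparts (U+0660..U+0669).
WESTERN_DIGITS = "0123456789"
ARABIC_INDIC_DIGITS = "".join(chr(0x0660 + i) for i in range(10))

TO_WESTERN = str.maketrans(ARABIC_INDIC_DIGITS, WESTERN_DIGITS)
TO_ARABIC_INDIC = str.maketrans(WESTERN_DIGITS, ARABIC_INDIC_DIGITS)


def to_western_digits(text: str) -> str:
    """Hypothetical helper: replace Arabic-Indic digits with ASCII digits."""
    return text.translate(TO_WESTERN)


def to_arabic_indic_digits(text: str) -> str:
    """Hypothetical helper: replace ASCII digits with Arabic-Indic digits."""
    return text.translate(TO_ARABIC_INDIC)


def ocr_arabic(image_path: str) -> str:
    """Run Tesseract with the Arabic model only ("ara"), without "eng"."""
    # Imported here so the digit helpers above work without these packages.
    import pytesseract
    from PIL import Image

    # --psm 6 asks Tesseract to treat the image as a single uniform block of text.
    return pytesseract.image_to_string(
        Image.open(image_path), lang="ara", config="--psm 6"
    )


if __name__ == "__main__":
    # Convert an ASCII date into Arabic-Indic digits for comparison.
    print(to_arabic_indic_digits("2020/04/06"))
```

Leaving `eng` out of `lang` keeps Tesseract from falling back to the English model for the digits; whether that alone fixes the misreads depends on the scan quality and the installed `ara` model.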

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/5db69553-cde5-4937-ba71-1ff5c2b3bfd4%40googlegroups.com.
