[tesseract-ocr] Extracting black & white text from image

Edoardo Conti Fri, 23 Aug 2019 20:43:02 -0700


I am using tesseract to extract a set of sparse numbers from an image 
<https://i.stack.imgur.com/PS6AR.png> for a poker application I am working 
on. After tweaking the settings I get decent results, but several numbers 
that I need are still missing. Specifically, I am missing all the player 
numbers (the 1 - 6 labels in the small circles) and the small $ values 
($0.05, $0.15, $0.37, etc.). I suspect the cause is that the image contains 
both black and white text.



Any advice on preprocessing that could improve this, or on tesseract 
settings to change, would be appreciated.


Code below:

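from PIL import Image
import pytesseract

# Placeholder: set this to the location of the screenshot.
path = 'table.png'

# Load the screenshot and convert it to 8-bit grayscale.
img = Image.open(path).convert('L')

# --psm 11 treats the page as sparse text; the whitelist limits the
# output to digits, '$' and '.'.
print(pytesseract.image_to_string(
    img, lang='eng',
    config='--psm 11 -c tessedit_char_whitelist=0123456789$.'))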


And output:

$ python test.py
08

$0.02$0.05

$1.50

$4.12

$2.56

3

$2.39

$4.33

$1.52
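
One way to test the suspicion that mixed black and white text is the problem would be to run tesseract twice, once on a binarized copy of the image and once on its inverse, so that both kinds of text appear as dark glyphs on a light background in at least one pass. The sketch below is only that, a sketch: the function names polarity_variants and ocr_both_polarities are made up for illustration, the threshold and scale defaults are guesses to tune, and it has not been run against the linked screenshot.

```python
from PIL import Image, ImageOps


def polarity_variants(img, threshold=128, scale=2):
    """Return two binarized copies of img: one as-is, one inverted.

    Dark-on-light text ends up black on white in the first copy,
    light-on-dark text ends up black on white in the second.
    threshold and scale are assumptions to tune, not recommended values.
    """
    gray = img.convert('L')
    # Upscale so small glyphs (player numbers, small $ values) get more pixels.
    gray = gray.resize((gray.width * scale, gray.height * scale), Image.LANCZOS)
    variants = []
    for candidate in (gray, ImageOps.invert(gray)):
        # Pixels above the threshold become white (255), the rest black (0).
        variants.append(candidate.point(lambda p: 255 if p > threshold else 0))
    return variants


def ocr_both_polarities(
        img, config='--psm 11 -c tessedit_char_whitelist=0123456789$.'):
    """Run tesseract on each variant and return the non-empty lines found."""
    import pytesseract  # imported here so the preprocessing works without it
    found = []
    for variant in polarity_variants(img):
        text = pytesseract.image_to_string(variant, lang='eng', config=config)
        found.extend(line for line in text.splitlines() if line.strip())
    return found
```

Calling ocr_both_polarities(Image.open(path)) would return the lines from both passes in one list. A number that is readable in both variants may show up twice, so the results would still need de-duplicating, for example by using image_to_data to compare positions.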


