[tesseract-ocr] Extracting text from digital display

Berend Berendsen Sun, 22 Nov 2015 05:01:36 -0800

I am trying to extract text from a digital display (not seven-segment). A
camera pointed at the display takes a picture every X seconds, and each
picture has to be processed. An example of the display is:

<https://lh3.googleusercontent.com/-SvVO4ZPJFd0/VlEGZ-lxLHI/AAAAAAAAAsE/v96MxmSp_34/s1600/checkweigherdisplay.tif>

There are three segments I am interested in, which I cut out of the image
before giving it to Tesseract:

1) The number after "No."

2) The number after "Total"

3) The number at the right side of the display
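
As a rough sketch of the cutting step, assuming the camera and display do
not move so the three regions always sit at the same pixel coordinates: the
box values and the names REGIONS, crop and cut_regions below are
hypothetical, and the image is modelled as a plain list of pixel rows so the
example has no dependencies.

```python
# Hypothetical pixel boxes (left, top, right, bottom). The real values would
# have to be measured once on a reference photo of the display.
REGIONS = {
    "no": (10, 5, 60, 25),
    "total": (10, 30, 90, 50),
    "weight": (100, 5, 200, 50),
}


def crop(image, box):
    """Return the sub-image inside box; image is a list of rows of pixels."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def cut_regions(image, regions=REGIONS):
    """Cut every named region out of one full-size picture."""
    return {name: crop(image, box) for name, box in regions.items()}
```

With an imaging library the same idea applies. Pillow, for example, takes
the same (left, upper, right, lower) box in Image.crop.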

Cutting out the regions, preprocessing them (grayscale, invert, contrast
adjustment) and running Tesseract in psm mode 6 with digits only works well
for 1) and 3). However, 2) is a challenge. I think the font causes Tesseract
to see disjointed characters. I also wonder whether I am overcomplicating
the problem: the images will have a fixed size, and the areas I am
interested in are at fixed locations, so would pattern matching work
better?
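
To make the pattern matching idea concrete, here is a minimal sketch of what
it could look like. It is not a tested solution for this display. It assumes
lit characters on a dark background, a fixed character pitch, and one
reference bitmap per digit. The 3x5 glyphs, the threshold of 128 and all
function names are made up for illustration.

```python
# Hypothetical 3x5 glyph templates ('#' = lit pixel). Real templates would be
# cut from reference photos of the display at its actual resolution.
TEMPLATES = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": ["..#", "..#", "..#", "..#", "..#"],
    "7": ["###", "..#", "..#", "..#", "..#"],
}
CELL_WIDTH = 3  # assumed fixed character pitch in pixels


def binarize(gray, threshold=128):
    """Turn grayscale rows (0-255) into rows of booleans; True = lit pixel."""
    return [[value >= threshold for value in row] for row in gray]


def split_cells(bitmap, cell_width=CELL_WIDTH):
    """Cut the bitmap into equally wide character cells, left to right."""
    width = len(bitmap[0])
    return [
        [row[x:x + cell_width] for row in bitmap]
        for x in range(0, width - cell_width + 1, cell_width)
    ]


def score(cell, template):
    """Count the pixels on which the cell and the template agree."""
    return sum(
        pixel == (mark == "#")
        for cell_row, template_row in zip(cell, template)
        for pixel, mark in zip(cell_row, template_row)
    )


def read_digits(gray):
    """Label every cell with the best-scoring template character."""
    cells = split_cells(binarize(gray))
    return "".join(
        max(TEMPLATES, key=lambda ch: score(cell, TEMPLATES[ch]))
        for cell in cells
    )
```

Because each cell is simply given the closest template, a few noisy pixels
do not change the result, but a shifted or rescaled picture would. That is
why this only makes sense if size and location really are fixed.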

I could train Tesseract on the font of 2), or does anyone have other
suggestions for the best plan of attack?

Cut out version of 2):

<https://lh3.googleusercontent.com/-dGOqlCa_738/VlEHbBgYQbI/AAAAAAAAAsM/0L2jXNPzBgY/s1600/checkweigheritem2.jpg>

Thanks and regards!

berend

