I am trying to extract text from a digital display (not seven segment). The 
use case is that there will be a camera pointed at the display taking a 
picture every X seconds which has to be processed. An example of a display 
is: 

<https://lh3.googleusercontent.com/-SvVO4ZPJFd0/VlEGZ-lxLHI/AAAAAAAAAsE/v96MxmSp_34/s1600/checkweigherdisplay.tif>

There are three segments I am interested in, which I cut out of the image 
before giving it to Tesseract:

1) The number after "No."

2) The number after "Total"

3) The number at the right side of the display
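
To make the cut-out step concrete, here is a minimal sketch of what I mean 
by fixed regions. The box coordinates and region names are made-up 
placeholders (the real ones would be measured once for the camera 
position), and the image is modeled as a plain list of pixel rows rather 
than a real image object:

```python
# Hypothetical pixel boxes as (left, top, right, bottom).
# These values are placeholders, not measured from the real display.
REGIONS = {
    "no": (1, 0, 3, 1),
    "total": (1, 1, 4, 2),
    "right": (4, 0, 6, 3),
}


def crop(image, box):
    """Return the sub-image inside box; image is a list of pixel rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def cut_regions(image, regions=REGIONS):
    """Cut every named region out of one frame."""
    return {name: crop(image, box) for name, box in regions.items()}
```

Because the camera and display do not move, the same boxes could be reused 
for every picture.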


Cropping the regions, preprocessing them (grayscale, invert, contrast 
adjustment), and running Tesseract in page segmentation mode 6 (psm 6) 
restricted to digits works well for 1) and 3). However, 2) is a challenge. 
I think this is because the font makes Tesseract see disjointed characters. 
I am wondering whether I am overcomplicating the problem: the images will 
have a fixed size, and the areas I am interested in are at fixed locations, 
so would pattern matching work better? 
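
As a rough illustration of the pattern-matching idea, here is a minimal 
sketch, not a tested solution for this display. It assumes every digit 
sits in a cell of fixed width with a fixed gap, and that reference glyphs 
are available. The tiny 3x5 templates, CELL_W, GAP, and the threshold are 
all invented for the example; real templates would have to be cut from 
sample photos of the display.

```python
# Hypothetical 3x5 reference glyphs ('#' = lit pixel, '.' = background).
TEMPLATES = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": ["..#", "..#", "..#", "..#", "..#"],
    "7": ["###", "..#", "..#", "..#", "..#"],
}
CELL_W = 3  # assumed fixed glyph width in pixels
GAP = 1     # assumed fixed spacing between glyphs


def binarize(gray, threshold=128):
    """Turn rows of 0-255 grey values into rows of '#'/'.' characters."""
    return ["".join("#" if v >= threshold else "." for v in row)
            for row in gray]


def mismatches(cell, template):
    """Count the pixels that differ between a cell and a template."""
    return sum(a != b
               for cell_row, tmpl_row in zip(cell, template)
               for a, b in zip(cell_row, tmpl_row))


def read_digits(bitmap):
    """Classify each fixed-width cell by its closest template."""
    width = len(bitmap[0])
    result = []
    for x in range(0, width - CELL_W + 1, CELL_W + GAP):
        cell = [row[x:x + CELL_W] for row in bitmap]
        best = min(TEMPLATES, key=lambda ch: mismatches(cell, TEMPLATES[ch]))
        result.append(best)
    return "".join(result)
```

Picking the template with the fewest differing pixels tolerates a few 
noisy pixels per glyph, but it depends on the crop being aligned the same 
way in every picture.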


I could train Tesseract on the font of 2), or does anyone have other 
suggestions on the best plan of attack for this?


Cropped version of 2):

<https://lh3.googleusercontent.com/-dGOqlCa_738/VlEHbBgYQbI/AAAAAAAAAsM/0L2jXNPzBgY/s1600/checkweigheritem2.jpg>


Thanks and regards!

berend


