I am trying to extract text from a digital display (not a seven-segment one). The use case is that a camera will be pointed at the display, taking a picture every X seconds, and each picture has to be processed. An example of a display is:
<https://lh3.googleusercontent.com/-SvVO4ZPJFd0/VlEGZ-lxLHI/AAAAAAAAAsE/v96MxmSp_34/s1600/checkweigherdisplay.tif>

There are three segments I am interested in, which I cut out of the image before giving them to Tesseract:

1) the number after "No."
2) the number after "Total"
3) the number at the right side of the display

Extracting these regions, preprocessing them (grayscale, invert, adjust contrast) and running Tesseract with psm mode 6 and digits only works well for 1) and 3). However, 2) is a challenge. I think this is because of its font, which causes Tesseract to see disjointed characters.

I wonder whether I am overcomplicating the problem: the images will have a fixed size, and the areas I am interested in are at fixed locations. Would pattern matching work better? Should I train Tesseract on the font of 2), or does anyone have suggestions for the best plan of attack?

Cut-out version of 2):
<https://lh3.googleusercontent.com/-dGOqlCa_738/VlEHbBgYQbI/AAAAAAAAAsM/0L2jXNPzBgY/s1600/checkweigheritem2.jpg>

Thanks and regards!
berend
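If you stay with Tesseract, one common trick for dot-matrix style fonts like the one in 2) is to dilate the binarized crop so that neighbouring dots merge into continuous strokes before OCR. The sketch below is an illustration, not a tested fix for this display: it is pure Python on a list-of-lists image so it runs without dependencies, and the helper names (`binarize`, `dilate`, `count_components`) and the `threshold`/`radius` defaults are hypothetical. In practice OpenCV's `cv2.dilate` or Pillow's `ImageFilter.MaxFilter` / `MinFilter` (depending on whether the text is light or dark) do the same job.

```python
from typing import List

Grid = List[List[int]]  # 1 = "on" (text) pixel, 0 = background


def binarize(gray: List[List[int]], threshold: int = 128) -> Grid:
    """Map a grayscale image (0-255) to 1 where the pixel is brighter than threshold.
    Assumes light text on a dark background (i.e. before any inversion)."""
    return [[1 if px > threshold else 0 for px in row] for row in gray]


def dilate(img: Grid, radius: int = 1) -> Grid:
    """Binary dilation with a (2*radius+1) x (2*radius+1) square element.

    A pixel becomes 1 if any pixel within `radius` of it is 1, which bridges
    gaps of up to 2*radius pixels between the dots of a dot-matrix font.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx]:
                        out[y][x] = 1
                        break
                if out[y][x]:
                    break
    return out


def count_components(img: Grid) -> int:
    """Count 8-connected groups of 1-pixels, a rough proxy for how many
    separate blobs an OCR engine would see."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                count += 1
                stack = [(y, x)]
                seen[y][x] = True
                while stack:
                    cy, cx = stack.pop()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                stack.append((ny, nx))
    return count
```

For example, a vertical stroke drawn as three dots separated by one-pixel gaps counts as three blobs after `binarize` and as a single blob after `dilate(..., radius=1)`. The radius has to be at least half the gap between dots in your crop; too large a value will start merging neighbouring digits, so it is worth measuring the dot spacing on a real image first.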

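Because the crop size and digit positions are fixed, plain template matching may be enough for field 2) without involving Tesseract at all. This is a minimal sketch under several assumptions: the region is already cropped and binarized, every digit occupies a cell of known width separated by a known number of blank columns, and you have captured one reference bitmap per digit from real display images. `read_digits`, `split_cells`, `cell_width`, `gap`, `max_distance` and the templates are all hypothetical names and values you would measure from your own images. If the digit positions drift between shots, OpenCV's `cv2.matchTemplate` is a more robust off-the-shelf alternative.

```python
from typing import Dict, List

Bitmap = List[List[int]]  # 1 = lit pixel, 0 = background


def hamming(a: Bitmap, b: Bitmap) -> int:
    """Number of pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def split_cells(region: Bitmap, cell_width: int, gap: int = 0) -> List[Bitmap]:
    """Cut a fixed-layout region into character cells of `cell_width` columns,
    separated by `gap` blank columns (both measured once from the display)."""
    width = len(region[0])
    cells = []
    x = 0
    while x + cell_width <= width:
        cells.append([row[x:x + cell_width] for row in region])
        x += cell_width + gap
    return cells


def read_digits(region: Bitmap, templates: Dict[str, Bitmap], cell_width: int,
                gap: int = 0, max_distance: int = 3) -> str:
    """Label each cell with the template at the smallest Hamming distance.

    Empty cells (e.g. unused leading positions) become spaces and are stripped
    from the ends; cells too far from every template become '?'.
    """
    out = []
    for cell in split_cells(region, cell_width, gap):
        if not any(any(row) for row in cell):
            out.append(" ")
            continue
        label, dist = min(((k, hamming(cell, t)) for k, t in templates.items()),
                          key=lambda kv: kv[1])
        out.append(label if dist <= max_distance else "?")
    return "".join(out).strip()
```

Returning `'?'` instead of a best guess makes misreads visible, which matters when a new picture arrives every X seconds and nobody checks each one by hand. With fixed-position digits a nearest-template classifier like this tends to be simpler to maintain than training Tesseract on a new font, but that trade-off depends on how stable the camera position and lighting are.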
