Hello,
I'm trying to recognize the machine-readable part of a passport (see the last line in this picture: http://s.hswstatic.com/gif/passport-11.jpg). I'm using Tesseract on Android (tess-two) and take the picture with a 5-megapixel phone camera. Unfortunately, the accuracy is not high enough.

To improve recognition, I have tried cropping the picture and retraining Tesseract for the font used in passports (OCR-B). Both raise accuracy, but still not to an acceptable level.

Here is a typical cropped picture I hand to Tesseract for OCR:
<https://lh3.googleusercontent.com/-DjwyoGe0dYQ/VSU_mcxzkMI/AAAAAAAAAAM/3HpmT04hzBM/s1600/croppic6.gif>

The binarized picture that Tesseract creates for the actual recognition looks like this:
<https://lh3.googleusercontent.com/-DwGxUTaDcK0/VSU_5W9pnpI/AAAAAAAAAAU/ffmiLw6yuLo/s1600/tessinput6.tif>

This is what Tesseract recognizes:

* 09 1 M 1 907 1 8 F8 F857<4 < W<B<O <UME QVWBBENO W JMGHJ <RBP6W9BQR ED*

I found that the thin line at the bottom badly confuses Tesseract. If I cut off the line manually and then run OCR, the results are perfect: all characters are recognized.

My question is: how can I automatically find and remove that line when it appears in the cropped picture? This has to be done on an Android phone.

Any help will be appreciated!

Mirko

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/4eba59f2-0fbe-461a-bde8-1bee207ef1ad%40googlegroups.com.
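One common way to locate a thin ruling line like the one described is a horizontal projection profile: count the dark pixels in each row of the binarized image. A ruling line shows up as a row that is dark across most of the width, while rows through text stay comparatively sparse. The Java sketch below (plain Java, so it runs off-device) is only an illustration under assumptions: the image is already binarized into a row-major int[] with 1 = dark, the line lies in the bottom part of the crop and is close to horizontal, and the thresholds (70% fill, bottom half searched) are guesses that would need tuning. The class and method names are hypothetical and are not part of Tesseract or tess-two. It crops above the line rather than whitening only the line rows, which mirrors the manual fix described in the post.

```java
import java.util.Arrays;

// Minimal sketch (hypothetical class, not part of Tesseract or tess-two):
// detect a thin, nearly full-width horizontal line near the bottom of a
// binarized image with a horizontal projection profile, then crop above it.
final class BottomLineRemover {

    // Converts ARGB pixels (the format Android's Bitmap.getPixels fills in) to
    // 1 = dark, 0 = light using a fixed global luminance threshold (assumed;
    // an adaptive method such as Otsu would be more robust).
    static int[] binarize(int[] argb, int threshold) {
        int[] out = new int[argb.length];
        for (int i = 0; i < argb.length; i++) {
            int r = (argb[i] >> 16) & 0xFF;
            int g = (argb[i] >> 8) & 0xFF;
            int b = argb[i] & 0xFF;
            int luma = (299 * r + 587 * g + 114 * b) / 1000;
            out[i] = luma < threshold ? 1 : 0;
        }
        return out;
    }

    // Returns how many rows to keep: the index of the topmost "line row" found
    // in the bottom searchFrac of the image, or height if there is none.
    // A line row is one where at least minFill of the pixels are dark.
    static int cropHeightAboveBottomLine(int[] bin, int width, int height,
                                         double minFill, double searchFrac) {
        int firstSearchRow = (int) Math.floor(height * (1.0 - searchFrac));
        int keep = height;
        for (int y = height - 1; y >= firstSearchRow; y--) {
            int dark = 0;
            for (int x = 0; x < width; x++) {
                dark += bin[y * width + x]; // projection profile: dark pixels in row y
            }
            if (dark >= minFill * width) {
                keep = y; // rows 0..y-1 survive the crop
            }
        }
        return keep;
    }

    // Keeps the first `rows` rows of a row-major image.
    static int[] keepTopRows(int[] bin, int width, int rows) {
        return Arrays.copyOf(bin, rows * width);
    }

    public static void main(String[] args) {
        int w = 10, h = 6;
        int[] img = {
            0, 1, 0, 0, 1, 0, 0, 1, 0, 0, // text-like rows: sparse ink
            0, 1, 0, 0, 1, 0, 0, 1, 0, 0,
            0, 1, 1, 0, 1, 1, 0, 1, 1, 0,
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            1, 1, 1, 1, 1, 1, 1, 1, 1, 0  // thin line across the bottom
        };
        int keep = cropHeightAboveBottomLine(img, w, h, 0.7, 0.5);
        int[] cropped = keepTopRows(img, w, keep);
        System.out.println(keep + " rows kept, " + cropped.length + " pixels"); // prints "5 rows kept, 50 pixels"
    }
}
```

On a device, the ARGB pixels could come from `Bitmap.getPixels(...)`, and the crop itself could be done with `Bitmap.createBitmap(source, 0, 0, width, keptRows)` before the bitmap is passed to tess-two. If the line is noticeably skewed, its ink is spread over several rows and no single row may reach the fill threshold; deskewing first or lowering `minFill` would be needed in that case.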

