Hello,

I'm trying to recognize the machine-readable zone (MRZ) of a passport (see the 
last line in this picture: http://s.hswstatic.com/gif/passport-11.jpg ).

I'm using Tesseract on Android (tess-two) and take the picture with a 
5-megapixel phone camera. Unfortunately, the accuracy is not high enough. 
To improve recognition, I have tried cropping the picture and retraining 
Tesseract for the font used in passports (OCR-B). Both raise accuracy, but 
still not to an acceptable level.
Here is a typical cropped picture I hand to Tesseract for OCR:

<https://lh3.googleusercontent.com/-DjwyoGe0dYQ/VSU_mcxzkMI/AAAAAAAAAAM/3HpmT04hzBM/s1600/croppic6.gif>

The binarized picture that Tesseract creates for the actual recognition looks 
like this:

<https://lh3.googleusercontent.com/-DwGxUTaDcK0/VSU_5W9pnpI/AAAAAAAAAAU/ffmiLw6yuLo/s1600/tessinput6.tif>

This is what Tesseract recognizes:


*  09 1 M 1 907 1 8  F8 F857<4 < W<B<O <UME  QVWBBENO W JMGHJ <RBP6W9BQR ED*


I figured out that the thin line at the bottom of the picture badly confuses 
Tesseract. If I cut off the line manually and run OCR again, the results are 
perfectly fine and all characters are recognized.
My question is: how can I find and remove that line automatically when it 
appears in the cropped picture? This has to be done on an Android phone.
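
To make the question concrete, here is a rough sketch of the kind of 
filter I have in mind, written in plain Java so it does not depend on any 
Android class. It is only an illustration under assumptions: the class and 
method names are hypothetical, the image is assumed to be already binarized 
into a boolean array (true = dark pixel, which on Android could be filled 
from Bitmap.getPixels), and the 0.5 width fraction is a guess. It erases 
any horizontal run of dark pixels that spans a large part of the image 
width, on the assumption that OCR-B characters are separated by gaps and 
never form runs that long.

```java
import java.util.Arrays;

final class LineRemover {

    /**
     * Returns a copy of a binarized image in which every horizontal run of
     * dark pixels that is at least minRunFraction of the row width long has
     * been reset to background. dark[y][x] == true means a dark pixel.
     * The input array is not modified.
     */
    static boolean[][] removeLongHorizontalRuns(boolean[][] dark, double minRunFraction) {
        boolean[][] out = new boolean[dark.length][];
        for (int y = 0; y < dark.length; y++) {
            boolean[] row = dark[y].clone();
            // Shortest run (in pixels) that is treated as part of a line.
            int minRun = Math.max(1, (int) Math.ceil(row.length * minRunFraction));
            int x = 0;
            while (x < row.length) {
                if (!row[x]) {
                    x++;
                    continue;
                }
                // Walk to the end of the current run of dark pixels.
                int start = x;
                while (x < row.length && row[x]) {
                    x++;
                }
                // Erase the run only if it is long enough to be a line.
                if (x - start >= minRun) {
                    Arrays.fill(row, start, x, false);
                }
            }
            out[y] = row;
        }
        return out;
    }
}
```

A call would look like LineRemover.removeLongHorizontalRuns(pixels, 0.5). 
A weakness I can already see: if the picture is slightly rotated, the line 
is split into shorter runs over several rows and may fall below the 
threshold, so this probably needs deskewing first or a more robust method.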

Any help will be appreciated!
Mirko
