I have to read sets of numbers from a very large number of cards, and I 
need good accuracy.  Each card has a 15 digit number and a 4 digit pin.  
There's no check digit, 
but some digits are the same on every card and the font and spacing are the 
same.  I've attached a sample image below.  I tested tesseract on that 
image and several others like it, and I'm getting pretty poor accuracy, 
sometimes 80% or less.  I know better cropping and getting the rotation correct 
will help, but I'm still getting poor accuracy after manually cropping 
them.  I also thought about processing the pin part separately and pulling 
the crop in closer.  This is tricky because, although the spacing between 
the 15 digit part and the pin is consistent, the whole set of numbers is 
not located in precisely the same place on each card.  I'm sure I can write 
some code that would use the first number as a reference point and crop the 
pin separately and much tighter, but I'd rather not write it if it won't 
help.
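
To make the reference-point idea concrete, this is roughly what I have in 
mind.  It is only a sketch: the offsets and sizes are made-up placeholder 
values that would have to be measured on a real card, and finding the 
bounding box of the first digit is assumed to happen somewhere else.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Pixel rectangle as (left, top, right, bottom)."""
    left: int
    top: int
    right: int
    bottom: int


# Hypothetical layout constants in pixels. They are placeholders, not
# measurements from the sample image, and must be measured on real cards.
PIN_DX = 520      # horizontal distance from the first digit to the pin
PIN_DY = -6       # vertical distance from the first digit to the pin
PIN_WIDTH = 130   # width of the 4 digit pin
PIN_HEIGHT = 44   # height of the pin digits
MARGIN = 4        # padding kept around the pin on every side


def pin_crop_box(first_digit: Box, image_width: int, image_height: int) -> Box:
    """Derive a tight crop box for the pin from the first digit's position.

    This relies on the spacing between the 15 digit part and the pin being
    the same on every card, so only the reference point moves.
    """
    left = first_digit.left + PIN_DX - MARGIN
    top = first_digit.top + PIN_DY - MARGIN
    right = left + PIN_WIDTH + 2 * MARGIN
    bottom = top + PIN_HEIGHT + 2 * MARGIN
    # Clamp the box so it never extends past the image edges.
    return Box(max(0, left), max(0, top),
               min(image_width, right), min(image_height, bottom))
```

The resulting tuple is in (left, top, right, bottom) order, so it could be 
handed to whatever crop function the capture program already uses.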

<https://lh3.googleusercontent.com/-4fDWYjRZGG0/VV-s2SuWo1I/AAAAAAAAIg0/PAECkwyqjwg/s1600/cardimg.tiff>
 
I've already written a little program so I can put a card under the camera, 
press a key, and it will display the cropped image above the tesseract 
output so I can manually confirm.  I just need to figure out what I need to 
do to improve tesseract's performance, because so far I haven't had a 
single card recognized accurately.  I expected some difficulty with the 
background noise around the pin, but I'm surprised at getting poor 
recognition even on the first 15 digits.  I've got a better camera on 
order, and I'm going to make a little frame to hold the cards so I'll be 
able to get perfectly cropped and rotated images, and much better image 
quality.  What else can I do to improve my accuracy in this situation?  Is 
this a case where training would help?  I'm open to any idea that can be 
automated.  
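
Since there's no check digit, the only automatic sanity check I can think 
of is the known layout: 15 digits, then a 4 digit pin, with certain digits 
identical on every card.  Below is a rough sketch of that check, so that 
only results that pass it need no second look.  The fixed positions and 
values in FIXED_DIGITS are invented for illustration and are not the real 
ones from my cards.

```python
import re
from typing import Optional, Tuple

# Hypothetical: 0-based positions in the 15 digit number that carry the
# same value on every card. These values are placeholders for illustration.
FIXED_DIGITS = {0: "6", 1: "0", 2: "3"}


def parse_card_text(ocr_text: str) -> Optional[Tuple[str, str]]:
    """Return (number, pin) if the OCR text fits the card layout, else None."""
    # Drop spaces and newlines, but keep every other character so that a
    # misread such as the letter O in place of a zero is still detected.
    compact = "".join(ocr_text.split())
    # Expect exactly 15 + 4 = 19 ASCII digits and nothing else.
    if not re.fullmatch(r"[0-9]{19}", compact):
        return None
    number, pin = compact[:15], compact[15:]
    # Reject the result if any digit known to be constant was read wrongly.
    for position, expected in FIXED_DIGITS.items():
        if number[position] != expected:
            return None
    return number, pin
```

A None result would mean the card has to be rescanned or typed in by hand.  
A result that passes can still be wrong in the digits that vary, so this 
only catches some of the errors.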




