I have to read sets of numbers from a very large number of cards, and I need good accuracy. Each card has a 15-digit number and a 4-digit PIN. There's no check digit, but some digits are the same on every card, and the font and spacing are the same. I've attached a sample image below. I tested tesseract on that image and several others like it, and I'm getting pretty poor accuracy, sometimes 80% or less. I know better cropping and correct rotation will help, but I'm still getting poor accuracy after manually cropping the images. I also thought about processing the PIN separately with a tighter crop. This is tricky: the spacing between the 15-digit number and the PIN is consistent, but the whole set of numbers is not located in precisely the same place on each card. I'm sure I can write some code that uses the first digit as a reference point and crops the PIN separately and much tighter, but I'd rather not write it if it won't help.
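For illustration, the reference-point crop described above could be sketched as follows. This is only a sketch under stated assumptions: the `Box` type, the function name, and the offset, size, and margin parameters are all hypothetical, and the real values would have to be measured once from a sample card. It assumes the bounding box of the first digit has already been located by some other means.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Pixel box in (left, top, right, bottom) order."""
    left: int
    top: int
    right: int
    bottom: int


def pin_crop_box(first_digit: Box, offset_x: int, offset_y: int,
                 pin_width: int, pin_height: int, margin: int = 4) -> Box:
    """Derive a tight crop box for the PIN from the first digit's position.

    offset_x / offset_y are the (hypothetical, measured once) distances from
    the first digit's top-left corner to the PIN's top-left corner. Because
    the spacing is the same on every card, they stay constant even when the
    whole number block shifts.
    """
    left = first_digit.left + offset_x - margin
    top = first_digit.top + offset_y - margin
    right = first_digit.left + offset_x + pin_width + margin
    bottom = first_digit.top + offset_y + pin_height + margin
    # Clamp to the image origin so the margin never produces negative coordinates.
    return Box(max(left, 0), max(top, 0), right, bottom)


# Example with made-up measurements:
box = pin_crop_box(Box(100, 50, 112, 70), offset_x=300, offset_y=0,
                   pin_width=60, pin_height=20)
print(box)
```

The tuple order matches what Pillow's `Image.crop` expects, so the result could be passed to it directly. Whether the tighter crop actually improves recognition is exactly the open question here.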
<https://lh3.googleusercontent.com/-4fDWYjRZGG0/VV-s2SuWo1I/AAAAAAAAIg0/PAECkwyqjwg/s1600/cardimg.tiff>

I've already written a little program so I can put a card under the camera, press a key, and see the cropped image displayed above the tesseract output for manual confirmation. I just need to figure out how to improve tesseract's performance, because so far I haven't had a single card recognized accurately. I expected some difficulty with the background noise around the PIN, but I'm surprised at getting poor recognition even on the first 15 digits. I've got a better camera on order, and I'm going to make a little frame to hold the cards, so I'll be able to get perfectly cropped and rotated images and much better image quality. What else can I do to improve my accuracy in this situation? Is this a case where training would help? I'm open to any idea that can be automated.
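Since there is no check digit, the digits that are identical on every card are the only built-in sanity check on a reading. A minimal sketch of an automated filter along those lines is below. The positions and values in `FIXED_DIGITS` are made up for illustration (the real ones would come from the cards), and `check_reading` is a hypothetical helper that works on the raw OCR text, not part of tesseract.

```python
import re

# Hypothetical: 0-based positions in the 15-digit number whose value is the
# same on every card. Replace with the real positions and digits.
FIXED_DIGITS = {0: "6", 1: "0"}


def check_reading(text, fixed=FIXED_DIGITS):
    """Return (number, pin) if the OCR text looks plausible, else None.

    A reading is accepted only if, after removing whitespace, it is exactly
    19 ASCII digits (15 + 4) and every known fixed digit matches.
    """
    digits = re.sub(r"\s+", "", text)
    if not re.fullmatch(r"[0-9]{19}", digits):
        return None
    for pos, expected in fixed.items():
        if digits[pos] != expected:
            return None
    return digits[:15], digits[15:]


print(check_reading("601234567890123 4567"))  # well-formed reading
print(check_reading("6O1234567890123 4567"))  # letter O instead of zero
```

A check like this cannot catch a misread in a non-fixed position, so it would only reduce, not replace, the manual confirmation step.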