Hi, I am working on a project that requires OCR.  I have not used Tesseract 
much before, apart from running the command line tool on some basic 
examples.  My goal is to run OCR on insurance cards to capture all of the 
characters and then pull specific information, such as the cardholder's ID, 
out of the output.  Accuracy is critical here, as a single misread 
character invalidates the entire ID.  

My concern stems from this need for extreme accuracy, which, according to 
this discussion thread 
<https://groups.google.com/forum/#!topic/tesseract-ocr/YO9XhsAWW_k>, 
appears to be achievable only by running character recognition on each 
individual character on the card.  The following quote is the source of 
most of my worries:

> But if accuracy is critical in your app, in the long run I would absolutely 
> avoid using any parts of Tesseract except char classifier. I.e. crop every 
> single char out of your source image and run Tess in the single char PSM. I 
> think it's should be easy as long as location of every character is quite 
> stable among your source images. ImageMagick/shell scripts would suffice.
>
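
For reference, here is how I understand that suggestion, as a rough sketch 
rather than anything I have validated.  It assumes a field sits at a known 
bounding box and is printed in a fixed-pitch font, so each character cell 
has equal width.  The function names, the example coordinates, and the 
equal-width assumption are all mine, not from the thread.  The actual 
cropping would still be done with ImageMagick or similar.  The command uses 
the `--psm 10` (single character) spelling from recent Tesseract releases; 
older 3.x builds spell it `-psm`, and whitelist support varies by version.

```python
import subprocess
from typing import List, Tuple

# (left, top, right, bottom) in pixels; a hypothetical convention for this sketch
Box = Tuple[int, int, int, int]


def split_into_char_boxes(field: Box, n_chars: int) -> List[Box]:
    """Divide a field's bounding box into n_chars equal-width cells.

    Assumes a fixed-pitch font and a known field location, which is the
    "stable layout" condition the quoted advice depends on.
    """
    left, top, right, bottom = field
    width = right - left
    return [
        (left + (i * width) // n_chars, top,
         left + ((i + 1) * width) // n_chars, bottom)
        for i in range(n_chars)
    ]


def single_char_command(image_path: str, whitelist: str = "") -> List[str]:
    """Build a tesseract CLI call that treats the image as one character."""
    cmd = ["tesseract", image_path, "stdout", "--psm", "10"]
    if whitelist:
        # tessedit_char_whitelist restricts the characters Tesseract may output
        cmd += ["-c", "tessedit_char_whitelist=" + whitelist]
    return cmd


def read_single_char(image_path: str, whitelist: str = "") -> str:
    """Run tesseract on an already-cropped single-character image."""
    result = subprocess.run(
        single_char_command(image_path, whitelist),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

For a 4-character field at `(10, 0, 50, 20)`, `split_into_char_boxes` 
yields four 10-pixel-wide cells, each of which would be cropped to its own 
image and passed to `read_single_char`.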

However, the images I will be processing differ vastly in layout; they are 
not stable like the example in the thread I linked to.  Some examples of 
how the format may differ follow:
 
<https://lh3.googleusercontent.com/-mPGe6BSmfSU/VWiQQMzkD8I/AAAAAAAAAA8/1WwUjQpPRkE/s1600/Sample_Card_2.jpg>
 
<https://lh3.googleusercontent.com/-ovzD1qb6x8g/VWiQWG6zP-I/AAAAAAAAABE/Sb6vNLozPoY/s1600/Sample_Card_3.jpg>
 
<https://lh3.googleusercontent.com/-K78wt72YzXA/VWiQinq_wiI/AAAAAAAAABM/wcYKEzXBYdI/s1600/Sample_Card_4.jpg>
 

I have run Tesseract on samples, and while it gets most of the characters 
right, there are cases where it misreads a single character (such as 
registering an "H" when the character is a "W") or, even worse, an entire 
phrase (such as registering "No New Rum" when the phrase is actually "No 
Referral Required").  Because of errors like these, I would not be able to 
use the output that Tesseract currently gives me.

Is there a realistic way to use Tesseract for this kind of endeavor?

Thanks for taking the time to read,
Scott
 
