Hi Scott,

It can be done, but it involves a lot of R&D. Use a layout template for each
card type. For individual fields, use patterns, either Tesseract's own or
ones based on your own logic. If a card type's layout is too flexible, locate
fields using the relative positions of layout elements, foreground/background
color, connected-component (CC) analysis, frame/table borders, font size,
dense text regions, etc. At a given stage you can run OCR on the whole card,
search the output for a pattern, and then look for the field in a narrower
subregion. You will probably also want to cross-check against other fields
on the card or against a database. In other words, increase the probability
of a correct read in any way you can. Be inventive. Decent accuracy can be
achieved. You should accept, though, that the accuracy rate will be less
than 100%.
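
A minimal sketch of the "pattern plus cross-check" idea, in Python, working on OCR text you already have. It assumes a hypothetical card template whose member ID is printed after an "ID" label as 3 letters followed by 9 digits. The confusable-character maps and the `known_ids` set (standing in for a database lookup) are also illustrative assumptions, not part of Tesseract. Where the template says a position must be a digit, commonly confused letters are coerced to digits (and vice versa), and a candidate is accepted only if an independent source confirms it.

```python
import re
from typing import Iterable, Optional

# Hypothetical format for ONE card template: an "ID" label, then a
# 12-character token made of 3 letters followed by 9 digits.
ID_PATTERN = re.compile(r"\bID\b[\s:#.]*([A-Z0-9]{12})\b")

# Assumed OCR confusions, applied only where the template fixes the
# character class (letters in the prefix, digits in the number).
TO_DIGIT = str.maketrans({"O": "0", "D": "0", "I": "1", "L": "1",
                          "S": "5", "B": "8", "Z": "2"})
TO_LETTER = str.maketrans({"0": "O", "1": "I", "5": "S", "8": "B", "2": "Z"})


def normalize_id(raw: str) -> Optional[str]:
    """Coerce a 12-char candidate into the assumed LLL999999999 shape."""
    prefix = raw[:3].translate(TO_LETTER)
    digits = raw[3:].translate(TO_DIGIT)
    if prefix.isalpha() and digits.isdigit():
        return prefix + digits
    return None  # still does not fit the template: reject, don't guess


def extract_member_id(ocr_text: str, known_ids: Iterable[str]) -> Optional[str]:
    """Return an ID only if it matches the pattern AND the cross-check."""
    known = set(known_ids)  # stand-in for a database lookup
    for match in ID_PATTERN.finditer(ocr_text.upper()):
        candidate = normalize_id(match.group(1))
        if candidate is not None and candidate in known:
            return candidate
    return None
```

Returning `None` instead of a best guess lets the caller route the card to manual review, which is one way to live with a less than 100% accuracy rate. In practice you would keep one pattern and one normalization rule per card template.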

Best regards,
Dmitri Silaev
www.CustomOCR.com

On Fri, May 29, 2015 at 10:57 PM, S Kirkwood <[email protected]> wrote:

> Hi, I am working on a project that requires OCR. I have not used
> Tesseract much before, aside from running some basic examples with the
> command line tool. My goal is to run OCR on insurance cards to get all of
> the characters and then extract certain information, such as the
> cardholder's ID, from the output. Accuracy is critical here, as a single
> misread character ruins the entire ID.
>
> My concern stems from this need for extreme accuracy, which, judging from
> this discussion thread
> <https://groups.google.com/forum/#!topic/tesseract-ocr/YO9XhsAWW_k>,
> appears to be possible only by running character recognition on each
> individual character on the card. The following quote is the main source
> of my worries:
>
> But if accuracy is critical in your app, in the long run I would
>> absolutely avoid using any parts of Tesseract except char classifier. I.e.
>> crop every single char out of your source image and run Tess in the single
>> char PSM. I think it should be easy as long as the location of every
>> character is quite stable among your source images. ImageMagick/shell
>> scripts would suffice.
>>
>
> However, the images I will be processing differ vastly in layout; they are
> not stable like the example I linked to. Here are some examples of how the
> format may differ:
>
>
> <https://lh3.googleusercontent.com/-mPGe6BSmfSU/VWiQQMzkD8I/AAAAAAAAAA8/1WwUjQpPRkE/s1600/Sample_Card_2.jpg>
> <https://lh3.googleusercontent.com/-ovzD1qb6x8g/VWiQWG6zP-I/AAAAAAAAABE/Sb6vNLozPoY/s1600/Sample_Card_3.jpg>
> <https://lh3.googleusercontent.com/-K78wt72YzXA/VWiQinq_wiI/AAAAAAAAABM/wcYKEzXBYdI/s1600/Sample_Card_4.jpg>
>
>
> I have run Tesseract on samples, and while it reads most of the
> characters correctly, there are cases where it misreads a single character
> (such as reading an "H" when the character is a "W") or, even worse, an
> entire phrase (such as reading "No New Rum" when the phrase is actually
> "No Referral Required"). Because of errors like these, I would not be able
> to use the output that Tesseract currently gives me.
>
> Is there a realistic way to use Tesseract for this kind of endeavor?
>
> Thanks for taking the time to read,
> Scott
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/d24aebd0-8e45-4ec4-8afa-6a583a5b9298%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/d24aebd0-8e45-4ec4-8afa-6a583a5b9298%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

