Hi Grant,
The accuracy you get depends on the kind of data you're processing.
People who run large data sets through Tesseract typically train it
for their specific domain. The exception is high-quality scans with
common fonts, where most people report around 95-98% accuracy without
training. Many people also post-process the OCR'ed text; for a
point-and-click solution, try VietOCR, which uses Tesseract. Several
people have gotten close to 100% accuracy with training. Numbers tend
to throw a wrench in things because they're often formatted
inconsistently. There is a standard EuroText sample document
(eurotext.tif) on the website that shows which symbols it can process
easily.
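
If you want to work out a percentage yourself, one common approach
(a general OCR evaluation convention, not something built into
Tesseract) is character accuracy: type up the correct text for a page
by hand, run the page through the OCR engine, and count how many
single-character edits separate the two. Below is a minimal sketch in
Python, assuming you already have both texts as strings. The function
names `edit_distance` and `character_accuracy` are made up for this
example.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn `hypothesis`
    into `reference`."""
    # previous[j] = distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Percent character accuracy of OCR output against hand-typed
    ground truth, clamped at 0 for very noisy output."""
    if not reference:
        raise ValueError("reference (ground truth) text must not be empty")
    errors = edit_distance(reference, hypothesis)
    return max(0.0, 100.0 * (1 - errors / len(reference)))


if __name__ == "__main__":
    # Hypothetical strings for illustration, not real Tesseract output.
    ground_truth = "The quick brown fox"
    ocr_output = "The qu1ck brown f0x"
    print(f"{character_accuracy(ground_truth, ocr_output):.1f}% character accuracy")
```

The score is measured against the length of the ground truth, so extra
junk characters in the OCR output count as errors too. Whitespace and
line breaks are compared literally here; you may want to normalize them
first if layout differences don't matter to you.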
-_Sven

On Fri, May 18, 2012 at 8:42 AM, Grant Fletcher <[email protected]> wrote:
> Hi All,
>
> Does anyone have a set of sample data I can have to test, & in understanding
> how to work out a percentage accuracy of the OCR engine..
>
> Any assistance on this would be appreciated.
>
> Thanks
> Grant
