improving accuracy when working with a single font

dustin Thu, 11 Aug 2011 19:55:13 -0700

I'm running into nearly the same issues Philip described in the thread
linked below (2.04 being far more accurate than 3.0, yet less stable
than 3.0):


http://groups.google.com/group/tesseract-ocr/browse_thread/thread/10466ace326a6c88/1c392c1f0fb7fd2c?lnk=gst&q=accuracy+of+3.0#1c392c1f0fb7fd2c

Luckily, my OCR project sounds much smaller in scope than Philip's.
Mine involves OCR'ing documents (and then parsing the results and
loading them into a SQL database) whose text is all set in exactly
the same font.

Would training solely on this single font improve accuracy?  I
believe I know which font my documents use (or at least a font that
very closely resembles it).  Is there a way to go through the
existing language data manually and strip out all the other fonts?

Failing that, is training data compatible between 2.04 and 3.00?
That is, could I simply copy the appropriate config and/or data
files from my 2.04 installation into my 3.00 installation and get
comparable accuracy?

I am not yet familiar with tesseract's config/data files or its
training procedure, so please forgive me if this should be obvious...

Thanks,
Dustin

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.