I'm running into nearly the same issues Philip mentioned in the post below (2.04 being far more accurate than 3.00, yet less stable):
http://groups.google.com/group/tesseract-ocr/browse_thread/thread/10466ace326a6c88/1c392c1f0fb7fd2c?lnk=gst&q=accuracy+of+3.0#1c392c1f0fb7fd2c

Luckily, the scope of my OCR project sounds much smaller than Philip's. Mine involves OCR'ing documents that all consist of text in the exact same font (and then parsing the results and putting them into a SQL db).

Would accuracy be improved by training solely on this single font? I believe I know which font my documents use (or at least one that very closely resembles it). Is there a way to manually go through the existing language data and pull out all the other fonts?

Barring this, is the training data compatible between 2.04 and 3.00? That is, could I simply copy over the appropriate config and/or data files from my 2.04 installation into my 3.00 installation and get comparable accuracy?

I am not yet familiar with tesseract's config/data files or its training procedure, so please forgive me if this should be obvious...

Thanks,
Dustin

