I looked at that, but there is precious little there I can use. I am grabbing English text from a program (approx. 8 pt bitmap font), writing it to an image file, and passing the image file through tesseract. Following the instructions, I have scaled the image up. That gets me from about 5% accuracy (original size) to about 10-12% accuracy (scaled up). The characters are clear, distinct, and exactly the same every single time. I had expected tesseract to do a decent job of it, but that has not proven to be the case.
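For reference, the scaling step can be sketched as below. This is only an illustration under assumptions, not the exact procedure used above: it assumes the captured text is available as a grid of grayscale values, and the name `upscale_nearest` and the sample glyph are made up for the example. It enlarges by a whole-number factor using nearest-neighbour copying, which repeats each pixel rather than blending neighbours, so bitmap-font edges stay hard.

```python
def upscale_nearest(pixels, factor):
    """Enlarge a 2-D grid of pixel values by an integer factor.

    Each source pixel becomes a factor x factor block of the same value,
    so no intermediate grey levels are introduced.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    scaled = []
    for row in pixels:
        # Repeat every value horizontally...
        wide_row = [value for value in row for _ in range(factor)]
        # ...then repeat the widened row vertically (copying each time).
        scaled.extend(list(wide_row) for _ in range(factor))
    return scaled


# Hypothetical 3x3 fragment of a glyph: 0 is ink, 255 is background.
glyph = [
    [255, 0, 255],
    [0, 0, 0],
    [255, 0, 255],
]
big = upscale_nearest(glyph, 4)
print(len(big), len(big[0]))
```

The enlarged grid would still have to be saved in an image format tesseract accepts, and whether hard edges or smoothed edges give better recognition for a particular font is something to test rather than assume.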
I have *once* (and only once) managed to get the training steps to work, completely by accident, and I have not been able to recreate that since. While editing the .box file for my first font, I wondered if I was going through all this trouble for nothing, so I took what I had and ran through the steps to train it. I noticed a significant increase in accuracy (70-80%), even with an incomplete/inaccurate .tif/.box file as a basis. Satisfied this was worth the effort, I finished editing the .box file and proceeded to work through the steps again. I'll be damned if I can now get through the steps successfully.

So I am stuck with a completed .tif/.box combination I spent a *lot* of time on that is doing nothing for me, since I can't get tesseract to train on the blasted thing. Frustrating indeed.

Side note: contrary to the instructions in the "Run Tesseract for Training" section, unicharset_extractor would crash and burn every single time for me unless I edited the file and replaced UnknownFont with the name of the font.

On Friday, January 10, 2014 6:03:03 AM UTC-4, Nick White wrote:
>
> On Thu, Jan 09, 2014 at 11:46:17AM -0800, Doug . wrote:
> > And I am still not clear why I have to create a new "language"? I have a number
> > of bitmap (not truetype) English fonts that Tesseract does a mediocre job on
> > "out of the box".
>
> How different are these fonts you're using from ordinary English
> fonts? Unless they're substantially different you're unlikely to get
> large gains from training for the new fonts, and your time would be
> better spent checking the common issues at this page:
> https://code.google.com/p/tesseract-ocr/wiki/PoorQuality
>

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

