I looked at those instructions and there was little in there I could do other than scale the image up. That took accuracy from about 5% to 10-15%.
Let me start this explanation over...

I am doing a screen capture of a program and processing it to remove the background, which leaves me with just the text I am interested in as an image file. Writing this image out to a file and passing it through Tesseract using the default English language *should* give me the text. (Since the text is approximately 8 pt, I scaled the image up as per the suggestions before writing it to file.) The individual characters are clear, crisp, and exactly the same each and every time, so I expected decent results "out of the box". This is not the case.

I do have all the characters of the font in a single image file, which I thought to use as a basis for creating my own training file. Not surprisingly, the generated .box file for this image contained a lot of "wrong" guesses about which letter each individual character represents, which meant some "quality" time with jTessBoxEditor to correct the file.

Partway through this process I thought to "test" this to see if it was even worthwhile. I was successful in following the steps (with some alterations that, to my regret, I did not write down) and the results were amazing. Even with the only partially corrected .box file, accuracy shot up to around 70-80%.

I have since finished editing the .box file with jTessBoxEditor, so what is in the .box file now matches what is in the source image. And now I will be damned if I can get through the steps to create the training file. Several attempts later I am well and truly frustrated. I do recall I had to deviate from the "official" instructions to make it work, but not what those changes were.

Which is why I asked: if I have these files named like this, what are the commands I have to execute to make this process work?

On Friday, January 10, 2014 6:03:03 AM UTC-4, Nick White wrote:
>
> On Thu, Jan 09, 2014 at 11:46:17AM -0800, Doug . wrote:
> > And I am still not clear why I have to create a new "language"?
> > I have a number of bitmap (not truetype) English fonts that Tesseract
> > does a mediocre job on "out of the box".
>
> How different are these fonts you're using from ordinary English
> fonts? Unless they're substantially different you're unlikely to get
> large gains from training for the new fonts, and your time would be
> better spent checking the common issues at this page:
> https://code.google.com/p/tesseract-ocr/wiki/PoorQuality

