[tesseract-ocr] Ideal config settings for finetuned monospace text?

Dustin Spicuzza Thu, 12 Sep 2019 23:01:45 -0700

Hey,

Using @shreeshrii's excellent examples at 
https://github.com/Shreeshrii/tessdata_shreetest, I've fine-tuned on a 
single monospace font with a giant pile of representative data. With very 
little effort, the recognition results have been significantly better than 
using the stock English data -- just a few errors per page. Thanks so much!


However, I'd like to get even closer to zero errors. I've been trying to 
constrain my problem in an effort to get better results:

   - Known monospaced font, font size, page size
   - Known character set (ASCII)
   - Data layout is fairly consistent

Are there configuration settings that I can use to give tesseract hints 
about the nature of the data? I don't really want it to do layout analysis 
or block detection or anything particularly fancy; I just want it to 
recognize all the text and give it to me. I've been using page segmentation 
mode 6 (Assume a single uniform block of text). I've been going through the 
wiki, but I haven't been able to make much more progress there.
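
For concreteness, here is a minimal sketch of the kind of invocation described above: a fine-tuned model run with page segmentation mode 6. The model name `monofont`, the `./tessdata` directory, and the `page.png` input are hypothetical placeholders, not names from the actual setup. The sketch only builds and prints the command line; it does not run tesseract.

```python
import shlex


def build_tesseract_command(image_path, model="monofont",
                            tessdata_dir="./tessdata", psm=6):
    """Build a tesseract CLI invocation as an argument list.

    `model` and `tessdata_dir` are hypothetical placeholders for the
    fine-tuned traineddata file and the directory that contains it.
    """
    return [
        "tesseract",
        image_path,
        "stdout",                        # write recognized text to stdout
        "--tessdata-dir", tessdata_dir,  # where <model>.traineddata lives
        "-l", model,                     # use the fine-tuned model
        "--psm", str(psm),               # 6 = single uniform block of text
    ]


cmd = build_tesseract_command("page.png")
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be passed to `subprocess.run` as-is, which avoids shell quoting problems when file names contain spaces.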

Thanks for any tips!

Dustin

