Hey,

Using @shreeshrii's excellent examples at https://github.com/Shreeshrii/tessdata_shreetest, I've fine-tuned on a single monospace font with a giant pile of representative data. With very little effort, the recognition results have been significantly better than with the stock English data: just a few errors per page. Thanks so much!

However, I'd like to get even closer to zero errors. I've been trying to constrain my problem in an effort to get better results:

- Known monospaced font, font size, and page size
- Known character set (ASCII)
- Data layout is fairly consistent

Are there configuration settings that I can use to give tesseract hints about the nature of the data? I don't really want it to do layout or blocks or anything particularly fancy, I just want it to recognize all the text and give it to me. I've been using page segmentation mode 6 (Assume a single uniform block of text).

I've been going through the wiki but I haven't been able to make much more progress there.

Thanks for any tips!

Dustin
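
To make the setup above concrete, here is a minimal sketch of the kind of invocation being described, not the exact command used. The model name `mono` and the file names are hypothetical placeholders. `--psm 6`, `tessedit_char_whitelist`, and `preserve_interword_spaces` are existing Tesseract options, but whether the whitelist is honored by the LSTM engine depends on the Tesseract version (I believe it needs 4.1 or later), so treat that part as an assumption to verify.

```python
import shlex
import string
import subprocess

# Printable ASCII without the space character (spaces are not whitelisted here).
PRINTABLE_ASCII = string.ascii_letters + string.digits + string.punctuation


def build_tesseract_command(image_path, output_base, lang="mono", psm=6,
                            whitelist=PRINTABLE_ASCII):
    """Build an argv list for the tesseract CLI.

    `lang="mono"` is a hypothetical name for the fine-tuned traineddata file.
    """
    return [
        "tesseract", image_path, output_base,
        "-l", lang,
        "--psm", str(psm),  # 6 = assume a single uniform block of text
        "-c", f"tessedit_char_whitelist={whitelist}",
        "-c", "preserve_interword_spaces=1",  # keep column spacing in the output
    ]


def run_ocr(image_path, output_base):
    """Run tesseract; requires the binary and the model to be installed."""
    subprocess.run(build_tesseract_command(image_path, output_base), check=True)


# Show the command that would be run, quoted for a POSIX shell.
print(shlex.join(build_tesseract_command("page.png", "page")))
```

Passing the arguments as a list avoids shell quoting problems with the punctuation in the whitelist.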