Have you tried PSM 13? On my dataset it is a few percent more accurate than PSM 6. Also, what kind of image preprocessing are you doing? I've recovered a lot of accuracy by fine-tuning my preprocessing. Would you mind posting some pictures of what you're recognizing?
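In case it helps with comparing the two modes side by side, here is a rough sketch that only builds the command lines and does not run them. The file names, the `eng` model name, and the helper `build_tesseract_cmd` are placeholders of mine; substitute your fine-tuned traineddata name. Keep in mind that PSM 13 treats the image as a single raw text line, so it expects cropped line images, not whole pages. The `tessedit_char_whitelist` variable is optional here, and whether it has any effect depends on the Tesseract version and engine, so treat that part as an assumption to verify.

```python
import shlex
import string

# Printable ASCII without whitespace. This is an assumption standing in for
# the "known character set (ASCII)" from the quoted message.
ASCII_WHITELIST = string.digits + string.ascii_letters + string.punctuation


def build_tesseract_cmd(image, output_base, psm, lang="eng", whitelist=None):
    """Return the argv list for one tesseract run (hypothetical helper).

    Nothing is executed here; pass the list to subprocess.run() yourself.
    """
    cmd = ["tesseract", image, output_base, "-l", lang, "--psm", str(psm)]
    if whitelist:
        # -c sets a config variable; support for this one varies by version.
        cmd += ["-c", "tessedit_char_whitelist=" + whitelist]
    return cmd


if __name__ == "__main__":
    # PSM 6 on a full page versus PSM 13 on a single cropped line
    # (placeholder file names).
    for psm, image in ((6, "page.png"), (13, "line_0001.png")):
        cmd = build_tesseract_cmd(image, "out_psm%d" % psm, psm)
        print(" ".join(shlex.quote(part) for part in cmd))
```

Running each command over the same ground-truth set and diffing the output against the expected text would show whether the mode change helps on your data.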

On Fri, Sep 13, 2019 at 2:00 AM Dustin Spicuzza <dustin.spicu...@gmail.com> wrote:

> Hey,
>
> Using @shreeshrii's excellent examples at
> https://github.com/Shreeshrii/tessdata_shreetest, I've fine tuned on a
> single monospace font with a giant pile of representative data. With very
> little effort the recognition results have been significantly better than
> using the stock english data -- just a few errors per page. Thanks so much!
>
> However, I'd like to get even closer to zero errors. I've been trying to
> constrain my problem in an effort to get better results:
>
> - Known monospaced font, font size, page size
> - Known character set (ASCII)
> - Data layout is fairly consistent
>
> Are there configuration settings that I can use to provide hints to
> tesseract about the nature of the data? I don't really want it to do layout
> or blocks or anything particularly fancy, I just want it to recognize all
> the text and give it to me. I've been using page segment mode 6 (Assume a
> single uniform block of text). I've been going through the wiki but I
> haven't been able to make much more progress there.
>
> Thanks for any tips!
>
> Dustin
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/4bfaf2ed-a8a0-429b-8b8f-cc9db11ba5a8%40googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CABtjQ9%2BTy%2BYQUZNtE--OMr6zFhTDO4%2B5_RYdTnNaHqtGN7-8Wg%40mail.gmail.com.