See https://github.com/tesseract-ocr/tesseract/pull/2294
On Fri, 29 Mar 2019, 11:17 Martin Emmerson, <[email protected]> wrote:

> Is there a way to restrict the character set that tesseract-ocr will
> attempt to identify? I'm scanning USA-based receipts which have a fairly
> simple set of monospaced characters but, for example, often '1' will get
> misidentified as '|', and a whole host of other simple substitution
> errors. If I could just restrict tesseract to [-a-zA-Z0-9,.$()/] it would
> be an immediate boost to accuracy. (Hoping for a way that doesn't involve
> having to retrain from scratch on the limited set.)
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/2180d37f-50fd-47e6-9f48-c3ff73b1569e%40googlegroups.com
> For more options, visit https://groups.google.com/d/optout.
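For reference, here is a rough sketch of how the restriction asked about above might be passed on the command line. It assumes the `tessedit_char_whitelist` config variable is what applies here, and that the installed build honours it for the engine in use (older LSTM-only builds may ignore it). The helper names `expand_whitelist` and `build_tesseract_command` are made up for illustration. As far as I know the variable takes literal characters rather than regex-style ranges, so the class `[-a-zA-Z0-9,.$()/]` is spelled out in full.

```python
import string
from typing import List


def expand_whitelist() -> str:
    """Spell out the class [-a-zA-Z0-9,.$()/] as literal characters."""
    return (
        "-"
        + string.ascii_lowercase
        + string.ascii_uppercase
        + string.digits
        + ",.$()/"
    )


def build_tesseract_command(image_path: str, whitelist: str) -> List[str]:
    """Build an argument list for the tesseract CLI (hypothetical helper).

    "stdout" as the output base makes tesseract print the recognised text.
    Returning a list (rather than one shell string) avoids having to quote
    characters such as '$', '(' and ')' for the shell.
    """
    return [
        "tesseract",
        image_path,
        "stdout",
        "-c",
        "tessedit_char_whitelist=" + whitelist,
    ]


WHITELIST = expand_whitelist()

if __name__ == "__main__":
    import subprocess
    import sys

    # Only run OCR when an image path is given, e.g.:
    #   python whitelist_sketch.py receipt.png
    if len(sys.argv) > 1:
        cmd = build_tesseract_command(sys.argv[1], WHITELIST)
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        print(result.stdout)
```

Note that '|' is not in the whitelist, so a '1' can no longer come out as '|' if the variable is respected. Whether accuracy actually improves on a given set of receipts would need to be checked.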

