See https://github.com/tesseract-ocr/tesseract/pull/2294
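
In case it helps, here is a minimal sketch of restricting the character set from a script, using Tesseract's existing `tessedit_char_whitelist` config variable passed with `-c`. Some assumptions on my part: whether the whitelist is honored depends on your Tesseract version and engine (the legacy engine supports it; for the LSTM engine you may need a build that includes the change linked above), and `build_tesseract_command` and `receipt.png` are hypothetical names used only for illustration. The variable takes a literal list of characters, not a regex class, so the ranges from the question are expanded explicitly.

```python
import string

# The set [-a-zA-Z0-9,.$()/] from the question, written out character by
# character because the whitelist is a plain list, not a regex class.
WHITELIST = string.ascii_letters + string.digits + "-,.$()/"


def build_tesseract_command(image_path, whitelist=WHITELIST):
    """Return an argv list that runs the tesseract CLI with a whitelist.

    Hypothetical helper. Returning a list (instead of a shell string)
    avoids having to quote characters such as '$', '(' and ')'.
    """
    return [
        "tesseract",
        image_path,
        "stdout",  # write recognized text to standard output
        "-c",
        f"tessedit_char_whitelist={whitelist}",
    ]
```

The list can then be handed to `subprocess.run(build_tesseract_command("receipt.png"), capture_output=True, text=True)` and the text read from `.stdout`. Since '|' is not in the whitelist, the '1' vs '|' confusion should no longer be possible if the engine honors the setting.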

On Fri, 29 Mar 2019, 11:17 Martin Emmerson, <[email protected]> wrote:

> Is there a way to restrict the character set that tesseract-ocr will
> attempt to identify?  I'm scanning USA-based receipts which have a fairly
> simple set of monospaced characters but, for example, often '1' will get
> misidentified as '|', and a whole host of other simple substitution
> errors.  If I could just restrict tesseract to [-a-zA-Z0-9,.$()/] it would
> be an immediate boost to accuracy.  (Hoping for a way that doesn't involve
> having to retrain from scratch on the limited set.)
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/2180d37f-50fd-47e6-9f48-c3ff73b1569e%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/2180d37f-50fd-47e6-9f48-c3ff73b1569e%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

