[tesseract-ocr] Settings for non-language recognition of short codes

Martin Camitz Tue, 21 Aug 2018 12:27:18 -0700

Hello,

I'm using Tesseract (actually the tesseract.js port) to recognize short 6-character
codes like this:


F65M0P

What are the optimal settings for this in terms of speed and correctness?

Things to note:

- Language is irrelevant.
- Codes are always 6 characters long, uppercase, both digits and letters.
- The font is chosen for its prevalence in OCR contexts.
- This image consists of the code and nothing else.
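
Because the format is fixed, a recognized string can at least be checked before it is accepted. Below is a minimal sketch in TypeScript. It assumes the OCR result arrives as a plain string. The names `normalizeCode` and `isValidCode` are hypothetical helpers and are not part of tesseract.js:

```typescript
// Hypothetical helpers: check a raw OCR result against the known code format
// (exactly 6 characters, each an uppercase letter A-Z or a digit 0-9).
const CODE_PATTERN = /^[A-Z0-9]{6}$/;

function normalizeCode(raw: string): string {
  // Drop all whitespace (OCR output often ends with a newline) and
  // uppercase, since the codes are known to be uppercase only.
  return raw.replace(/\s+/g, "").toUpperCase();
}

function isValidCode(raw: string): boolean {
  return CODE_PATTERN.test(normalizeCode(raw));
}

console.log(isValidCode("F65M0P\n")); // true
console.log(isValidCode("F65M0"));   // false: only 5 characters
console.log(isValidCode("F65-0P"));   // false: '-' is not allowed
```

This only rejects malformed output. It cannot tell whether an S should have been a 5.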

Tesseract seems quite slow here compared to larger texts, which I suspect
has to do with it trying to force out a dictionary word. It mostly prefers
letters over digits, for example S instead of 5.

I am unfamiliar with Tesseract and OCR, and you might be unfamiliar with
the JS port. I don't think I can train the engine, but I can set options
like language_model_penalty_non_dict_word.
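
The kind of options meant here can be gathered into one settings object. The following is a hedged sketch. The keys are real Tesseract parameter names. The grouping and the name `shortCodeParams` are hypothetical. It is an assumption that tesseract.js forwards every one of them to the engine, so check this against the version in use. Some of these, such as `load_system_dawg`, are only read when the engine is initialized.

```typescript
// Hypothetical settings for recognizing a short uppercase alphanumeric code.
// Keys are Tesseract parameter names; values are strings because Tesseract
// reads its parameters as text.
const ALPHANUMERIC_UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

const shortCodeParams: Record<string, string> = {
  // Restrict output to the characters a code can contain.
  tessedit_char_whitelist: ALPHANUMERIC_UPPER,
  // Page segmentation mode 7: treat the image as a single line of text.
  tessedit_pageseg_mode: "7",
  // Do not load the word-list dictionaries (init-time parameters).
  load_system_dawg: "0",
  load_freq_dawg: "0",
  // Stop penalizing strings that are not dictionary words.
  language_model_penalty_non_dict_word: "0",
};

console.log(shortCodeParams);
```

The exact call that accepts such an object differs between tesseract.js versions, so the object is shown on its own here.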

Thanks for your help.

Martin
