[tesseract-ocr] How do I use custom words list (whitelist only) to recognise only 1 word from image?

Erikas Rudinskas Sat, 22 Sep 2018 09:32:47 -0700

Hi,

I am trying to find a way to define my own word list for Tesseract. 
I want it to use only the words I define and pick the most likely one.


I have a small image with a single word in it. I process it with this 
command to get a pure black-on-white image:

$ convert -colorspace gray -auto-level -threshold 60% -type bilevel -depth 8 image.png newimage.png

Then I try to extract a single word from that image:

$ tesseract newimage.png -psm 8 stdout

and it returns a single word (which is great), but slightly incorrect:

*Expectation*: nieko
*Result*: ﬂieko

I've just spent 5+ hours trying to find documentation or a tutorial 
on how to set a whitelist dictionary for word recognition. Any tips on 
that?
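For the "pick the most likely one" part, one rough workaround is to snap the raw Tesseract output to the nearest entry of your own list by edit distance. Below is a minimal bash sketch, assuming a plain-text word list with one word per line. The file name words.txt and the helpers levenshtein and closest_word are hypothetical, not part of Tesseract. Tesseract 4 also has a --user-words option (and the load_system_dawg / load_freq_dawg parameters for turning off its built-in dictionaries), but as far as I know these only bias recognition toward the listed words rather than strictly limiting the output to them, so a post-processing step like this may still be needed.

```shell
#!/usr/bin/env bash
# Minimal sketch: snap an OCR result to the nearest word in a custom list.
# Assumption: the list is a plain-text file with one word per line.
# levenshtein and closest_word are hypothetical helpers, not Tesseract features.

# levenshtein A B: print the edit distance (insert/delete/substitute) of A and B.
# Character positions follow the current locale, so use a UTF-8 locale for
# non-ASCII text such as the "ﬂ" ligature.
levenshtein() {
  local a=$1 b=$2
  local la=${#a} lb=${#b}
  local -a prev cur
  local i j cost best
  for ((j = 0; j <= lb; j++)); do prev[j]=$j; done
  for ((i = 1; i <= la; i++)); do
    cur=("$i")
    for ((j = 1; j <= lb; j++)); do
      if [[ "${a:i-1:1}" == "${b:j-1:1}" ]]; then cost=0; else cost=1; fi
      best=$(( prev[j-1] + cost ))                            # match/substitute
      (( prev[j] + 1 < best )) && best=$(( prev[j] + 1 ))     # delete from A
      (( cur[j-1] + 1 < best )) && best=$(( cur[j-1] + 1 ))   # insert into A
      cur[j]=$best
    done
    prev=("${cur[@]}")
  done
  echo "${prev[lb]}"
}

# closest_word WORD FILE: print the entry in FILE with the smallest edit
# distance to WORD (the first such entry wins on ties).
closest_word() {
  local word=$1 list=$2 cand d best="" best_d=-1
  while IFS= read -r cand || [[ -n $cand ]]; do
    [[ -z $cand ]] && continue   # skip blank lines
    d=$(levenshtein "$word" "$cand")
    if (( best_d < 0 || d < best_d )); then
      best=$cand
      best_d=$d
    fi
  done < "$list"
  echo "$best"
}

# Example (assumes tesseract is installed and words.txt is your list):
#   raw=$(tesseract newimage.png -psm 8 stdout | tr -d '[:space:]')
#   closest_word "$raw" words.txt
```

If words.txt contains nieko, the result ﬂieko is a single substitution away from it (in a UTF-8 locale), so it should be mapped back to nieko unless an earlier entry is equally close. A linear scan like this is fine for a short list; it rescans the whole file for every word, so it will not scale to large dictionaries.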

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/f0147f58-f586-4346-bb35-366d571bf0ef%40googlegroups.com.
