Re: options to specify possible characters

Nick White Wed, 19 Mar 2014 11:53:38 -0700

Hi Roger,

Tesseract has loads of options, but most of them are set with the -c
flag, as -c name=value. A lot of those options are really meant for
debugging or development, though, so it's best to check whether what
you want to do is mentioned in the wiki (in this case it is, at [0]
and [1]).
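For anyone scripting this, here is a minimal sketch in Python that builds the argument list. It assumes a Tesseract version whose command line accepts -c NAME=VALUE and lets you repeat it once per variable. The function names tesseract_command and run_tesseract are hypothetical helpers, not part of Tesseract.

```python
import subprocess
from typing import Dict, List, Optional


def tesseract_command(image: str, output_base: str,
                      config: Optional[Dict[str, str]] = None) -> List[str]:
    """Build an argv list for the tesseract CLI (hypothetical helper).

    Each config variable becomes its own "-c NAME=VALUE" pair, in the
    order the dict yields them.
    """
    cmd = ["tesseract", image, output_base]
    for name, value in (config or {}).items():
        cmd += ["-c", f"{name}={value}"]
    return cmd


def run_tesseract(image: str, output_base: str,
                  config: Optional[Dict[str, str]] = None) -> None:
    # Needs the tesseract binary on PATH: raises FileNotFoundError if it is
    # missing and CalledProcessError if tesseract exits with an error.
    subprocess.run(tesseract_command(image, output_base, config), check=True)


if __name__ == "__main__":
    # Only prints the command; nothing is executed here.
    print(tesseract_command("image.png", "output",
                            {"tessedit_char_whitelist": "0123456789"}))
```

Passing a list to subprocess avoids shell quoting problems when a value contains characters like "-" or spaces.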


> "-C string
> only recognise characters from string, this is a filter function in cases
> where the interest is only to a part of the character alphabet, you can use
> 0-9 or a-z to specify ranges, use -- to detect the minus sign"

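You can do this in Tesseract with a command like this:

  tesseract image output -c tessedit_char_whitelist=string

Note that -c takes name=value, and that the whitelist is a plain list of
characters. Ranges like 0-9 aren't expanded, so for digits only you'd
use tessedit_char_whitelist=0123456789.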
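If you already have gocr-style character specs, a small sketch like the one below could expand them into the explicit list the whitelist expects. The parsing rules ("a-z" is a range, "--" is a literal minus) are my reading of the gocr description quoted above, not gocr's actual code. expand_charset and whitelist_option are hypothetical names.

```python
from typing import List


def expand_charset(spec: str) -> str:
    """Expand a gocr-style spec such as "0-9a-f--" into explicit characters.

    Assumed rules: "x-y" is an inclusive range, "--" is a literal minus
    sign, anything else is taken literally. Duplicates are dropped.
    """
    chars: List[str] = []
    i = 0
    while i < len(spec):
        if spec.startswith("--", i):
            chars.append("-")  # "--" stands for the minus sign itself
            i += 2
        elif i + 2 < len(spec) and spec[i + 1] == "-" and spec[i + 2] != "-":
            lo, hi = ord(spec[i]), ord(spec[i + 2])
            if lo > hi:
                raise ValueError(f"bad range {spec[i:i + 3]!r}")
            chars.extend(chr(c) for c in range(lo, hi + 1))
            i += 3
        else:
            chars.append(spec[i])  # a plain literal character
            i += 1
    return "".join(dict.fromkeys(chars))  # de-duplicate, keep first order


def whitelist_option(spec: str) -> List[str]:
    """Return the ["-c", "tessedit_char_whitelist=..."] pair for a spec."""
    return ["-c", "tessedit_char_whitelist=" + expand_charset(spec)]


if __name__ == "__main__":
    print(whitelist_option("0-9--"))
```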

> "-l level
> set grey level to level (0<160<=255, default: 0 for autodetect), darker pixels
> belong to characters, brighter pixels are interpreted as background of the
> input image"

Tesseract essentially always autodetects this. If it gets the
binarisation (converting from grey to black and white) wrong, you'll need
to preprocess the image until it works ;)
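If you want something like gocr's fixed grey level, one option is to binarise the image yourself before handing it to Tesseract. Below is a rough sketch in plain Python. It assumes the image is already loaded as rows of 8-bit grey values (0 = black). Following the quoted description, pixels darker than level become ink. A level of 0 falls back to Otsu's method, a standard automatic threshold. The helper names and the "strictly darker than level" comparison are my assumptions, not gocr's or Tesseract's code.

```python
from typing import List

Image = List[List[int]]  # rows of 8-bit grey values, 0 = black, 255 = white


def otsu_threshold(pixels: Image) -> int:
    """Return t such that values <= t form the dark class (Otsu's method).

    Falls back to 127 if the image contains only one grey value.
    """
    hist = [0] * 256
    for row in pixels:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 127, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        if w_dark == 0:
            continue  # no dark pixels yet
        w_light = total - w_dark
        if w_light == 0:
            break  # every pixel is now in the dark class
        sum_dark += t * hist[t]
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance; Otsu picks the t that maximises it.
        between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarise(pixels: Image, level: int = 0) -> Image:
    """Map pixels darker than `level` to ink (0) and the rest to 255.

    level == 0 means "autodetect", done here with Otsu's method.
    """
    if not 0 <= level <= 255:
        raise ValueError("level must be in 0..255")
    if level == 0:
        level = otsu_threshold(pixels) + 1  # dark class is <= t, i.e. < t + 1
    return [[0 if p < level else 255 for p in row] for row in pixels]
```

The result would still need to be saved as an image with whatever imaging library you use, then fed to tesseract as usual.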

Hope that helps,

Nick

0. https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality#Dictionaries,_word_lists,_and_patterns
1. https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_recognize_only_digits?

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en

