Re: Tesseract with weird/stylized fonts?

Nick White Wed, 06 Jun 2012 04:22:18 -0700

Hi Joseph,

> Is there any easy way to simply tell Tesseract "this is what all the 
> letters look like in this font", so that it can know that an O with a + 
> inside is really just an O?


That would be training. If the source files for the English training
were available, you could just add an extra png+box set for your new
font and regenerate the training, but unfortunately they aren't
available for Tesseract 3. There are source tiff+box files for
Tesseract 2 (see boxtiff-2.01.eng.tar.gz in the Downloads section),
though these don't include things like the word lists, so they wouldn't
be much help to you. That's OK, though: a training built from just a
png+box set, without word lists, could well be accurate enough for your
needs.

If I were you I'd create a new training, using one of the tools
mentioned in a recent thread "Scripts to semi-automate training" to
help you, and see how that works out.
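
As a rough illustration of what preparing a box file for a stylized font
can involve, here is a minimal Python sketch. It assumes the Tesseract 3
box format of one glyph per line, written as
"<glyph> <left> <bottom> <right> <top> <page>", and it assumes the
automatically generated box file has mislabelled the decorated O as some
other symbol ("@" here is purely hypothetical). The helper name
relabel_boxes is made up for this example. A blanket mapping like this
also relabels any genuine "@" in the image, so the result still needs
checking by hand.

```python
def relabel_boxes(box_text, mapping):
    """Replace glyph labels in box-file text, leaving coordinates untouched.

    Assumes each line looks like "<glyph> <left> <bottom> <right> <top> <page>".
    `mapping` maps a wrong label to the correct one, e.g. {"@": "O"}.
    """
    fixed_lines = []
    for line in box_text.splitlines():
        parts = line.split(" ")
        # Only touch lines that look like box entries with a known wrong label.
        if len(parts) >= 5 and parts[0] in mapping:
            parts[0] = mapping[parts[0]]
        fixed_lines.append(" ".join(parts))
    return "\n".join(fixed_lines) + "\n"


if __name__ == "__main__":
    sample = "@ 10 20 30 40 0\nK 35 20 50 40 0\n"
    print(relabel_boxes(sample, {"@": "O"}), end="")
```

The corrected box file would then be fed into the usual training steps
alongside its image.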

> Is there a flag or setting where I can specify "only normal A-Z characters 
> and numbers 0-9"?

There is: you can set the characters you want in a config file at
tessdata/configs/somename. See
http://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_recognize_only_digits?
for an example of that.
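
A minimal Python sketch of that approach follows. It writes a config
file that sets the tessedit_char_whitelist variable to A-Z and 0-9, then
builds the command line that passes the config name as a trailing
argument. The config name "alnum", the file names, and both helper
functions are assumptions made up for this example; some older versions
expect "nobatch" before the config name, so check the usage message of
your build.

```python
import string
from pathlib import Path


def write_whitelist_config(configs_dir, name="alnum"):
    """Write a config file restricting recognition to A-Z and 0-9.

    `configs_dir` should be your tessdata/configs directory; `name` is a
    hypothetical config name chosen for this example.
    """
    allowed = string.ascii_uppercase + string.digits
    path = Path(configs_dir) / name
    path.write_text(f"tessedit_char_whitelist {allowed}\n")
    return path


def tesseract_command(image, output_base, config_name):
    """Build the argument list; the config name goes after the output base."""
    return ["tesseract", image, output_base, config_name]


if __name__ == "__main__":
    # Writes into the current directory for demonstration only.
    print(write_whitelist_config(".", "alnum").read_text(), end="")
    print(" ".join(tesseract_command("screenshot.png", "out", "alnum")))
```

Add the lowercase letters to the whitelist if your text uses them.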

Out of curiosity, are you planning to use Tesseract to read from screen
grabs of Diablo? I wonder whether there would be any trouble with
the texture and colouring of the background. I would be interested
to know...

Best of luck, and let us know how you get on.

Nick

