Hi Joseph,

> Is there any easy way to simply tell Tesseract "this is what all the
> letters look like in this font", so that it can know that an O with a +
> inside is really just an O?

That would be training. If the source files for the English training were available, you could just add an extra png+box set with your new font and regenerate the training, but unfortunately they aren't available for Tesseract 3. There are source tiff+box files for Tesseract 2 (see boxtiff-2.01.eng.tar.gz in the Downloads section), though these don't include things like the word lists, so they wouldn't be much help to you.

But that's OK: a png+box set on its own, without things like word lists, could well give accuracy that is fine for your needs. If I were you I'd create a new training, using one of the tools mentioned in the recent thread "Scripts to semi-automate training" to help, and see how that works out.

> Is there a flag or setting where I can specify "only normal A-Z characters
> and numbers 0-9"?

There is. You can list the characters you want in a file in tessdata/configs/somename; see http://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_recognize_only_digits? for an example of that.

Out of curiosity, are you planning to use Tesseract to read from screen grabs of Diablo? I wonder whether there would be any trouble with the texture and colouring of the background. I would be interested to know...

Best of luck, and let us know how you get on.

Nick
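
P.S. In case it helps with the A-Z/0-9 question, here is a minimal sketch of generating such a config file. It assumes the whitelist is set through the tessedit_char_whitelist variable, as in the digits example in the FAQ. The function name, the config name "alnum" and the directory argument are made up for illustration, so adjust them to your install.

```python
import string
from pathlib import Path


def write_whitelist_config(tessdata_dir, name="alnum"):
    """Write a Tesseract config file limiting output to A-Z and 0-9.

    `tessdata_dir` and `name` are hypothetical placeholders; point
    `tessdata_dir` at wherever your tessdata directory actually lives.
    """
    # Characters Tesseract is allowed to output: uppercase letters and digits.
    whitelist = string.ascii_uppercase + string.digits

    # Config files live under tessdata/configs/<name>.
    config_path = Path(tessdata_dir) / "configs" / name
    config_path.parent.mkdir(parents=True, exist_ok=True)

    # One "variable value" pair per line.
    config_path.write_text(f"tessedit_char_whitelist {whitelist}\n")
    return config_path
```

With that file in place, the config name goes at the end of the command line, so something like "tesseract screenshot.png out alnum" should pick it up (the image and output names are placeholders too).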

