Anybody have instructions on training Tesseract that actually work?

The original instructions 
(TrainingTesseract3<https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3>) 
are not at all clear, and in at least one case just plain wrong. Matthew 
Christy's instructions (Training with Tesseract | 
eMOP<http://emop.tamu.edu/node/47>) 
are incomplete, contradict the original instructions, and assume the use 
of Franken+ and Aletheia. I haven't been able to get my hands on Aletheia 
yet, so those instructions do me absolutely no good. To say this is not 
newbie-friendly is an understatement of epic proportions.

And I am still not clear on why I have to create a new "language". I have a 
number of bitmap (not TrueType) English fonts that Tesseract does a 
mediocre job on "out of the box". All I want to do is add these few 
fonts and work with them. I suppose if the files used to create the English 
training data were available, I could add my fonts to the English language 
files, but they don't appear to exist. Is there a tool to extract them from 
the training data file?

What I have: a number of bitmap fonts in which every character is exactly 
the same size, shape, and quality every single time, with zero 
variation. For each font I have an image containing all of the font's 
characters. Using 
jTessBoxEditor<http://vietocr.sourceforge.net/training.html>, I have a working 
.box file for each font. For simplicity's sake, say I have 
Font1.tif/Font1.box, Font2.tif/Font2.box, and Font3.tif/Font3.box. How 
exactly do I add these three fonts to Tesseract?
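
For reference, this is my reading of the command sequence on the 
TrainingTesseract3 page, written as a small Python sketch that only 
assembles the commands and does not run them. The language code "bmp", 
the renamed file names (bmp.Font1.exp0.tif and so on), and the all-zero 
font_properties flags are my own assumptions, and I can't confirm the 
sequence actually works for bitmap fonts.

```python
def training_commands(lang, fonts):
    """Assemble the Tesseract 3.0x training commands as argv lists.

    Nothing is executed; each list could be passed to subprocess.run().
    """
    # Assumed naming convention: <lang>.<fontname>.exp<N>.tif / .box,
    # so Font1.tif / Font1.box would be renamed to bmp.Font1.exp0.tif / .box.
    bases = [f"{lang}.{font}.exp0" for font in fonts]
    boxes = [f"{b}.box" for b in bases]
    trs = [f"{b}.tr" for b in bases]

    cmds = []
    # 1. One .tr feature file per font, from each image + box pair.
    for b in bases:
        cmds.append(["tesseract", f"{b}.tif", b, "nobatch", "box.train"])
    # 2. Character set collected from all box files.
    cmds.append(["unicharset_extractor", *boxes])
    # 3. Clustering (shapeclustering only exists from 3.02 on).
    cmds.append(["shapeclustering", "-F", "font_properties",
                 "-U", "unicharset", *trs])
    cmds.append(["mftraining", "-F", "font_properties", "-U", "unicharset",
                 "-O", f"{lang}.unicharset", *trs])
    cmds.append(["cntraining", *trs])
    # 4. Once inttemp, normproto, pffmtable and shapetable have been renamed
    #    with the "<lang>." prefix, pack everything into <lang>.traineddata.
    cmds.append(["combine_tessdata", f"{lang}."])
    return cmds


def font_properties(fonts):
    # One line per font: <fontname> <italic> <bold> <fixed> <serif> <fraktur>
    # All flags set to 0 here as a placeholder assumption.
    return "".join(f"{font} 0 0 0 0 0\n" for font in fonts)


if __name__ == "__main__":
    for cmd in training_commands("bmp", ["Font1", "Font2", "Font3"]):
        print(" ".join(cmd))
```

If that is right, the resulting bmp.traineddata would go in the tessdata 
directory and be selected with -l bmp.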

(Yes, I am a *tiny* bit frustrated with the crap documentation at this point.)

-- 
-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
