I doubt that Google will release their (full) training set :-( Have a look in svn at the file eng.cube.size [1]. There you can see the names of the fonts that were used to train English in 3.01. As far as I understand, there is an (unpublished/unreleased) possibility to train language data directly on font files. Unfortunately, there are no details about the "cube" part of training.
Zd.

[1] 12.4 MB! http://code.google.com/p/tesseract-ocr/source/browse/trunk/tessdata/eng.cube.size

On Wed, Feb 9, 2011 at 5:48 PM, Sly_bzh <[email protected]> wrote:
> I would like to train tesseract for English with some special fonts.
> Tesseract training documentation says that a text should be prepared
> and it must follow some important points (see
> http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Generate_Training_Images
> )
>
> Could someone provide to the community the content of a good and
> efficient text for english training ?
>
> Note : I think it could be useful to provide the texts that have been
> used to build the training files that could be downloaded in the
> "Download" section (http://code.google.com/p/tesseract-ocr/downloads/list).
> What do you think about that ?
>
> Thanks !
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.

