I doubt that Google will release their (full) training set :-(

Have a look at the file eng.cube.size in svn [1]. It lists the names of the
fonts that were used to train English in 3.01. As far as I understand, there
is an (unpublished/unreleased) way to train language data directly from
font files. Unfortunately, there are no details about the "cube" part of
training.

Zd.

[1] 12.4 MB!
http://code.google.com/p/tesseract-ocr/source/browse/trunk/tessdata/eng.cube.size

On Wed, Feb 9, 2011 at 5:48 PM, Sly_bzh <[email protected]> wrote:

> I would like to train tesseract for English with some special fonts.
> Tesseract training documentation says that a text should be prepared
> and it must follow some important points (see
>
> http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Generate_Training_Images
> )
>
> Could someone provide to the community the content of a good and
> efficient text for english training ?
>
> Note : I think it could be useful to provide the texts that have been
> used to build the training files that could be downloaded in the
> "Download" section (http://code.google.com/p/tesseract-ocr/downloads/
> list). What do you think about that ?
>
> Thanks !
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.
>
>

