Hi Thomas,

On Mon, Aug 18, 2014 at 02:17:19PM -0700, Thomas Bruno wrote: 
> Where can I find the box/tif combo for the eng.traineddata that Tesseract 3.02
> provides for download?

The tif/box files used to create the eng.traineddata for 3.02 are 
not available, and are very unlikely to be made so, because they 
were automatically generated using a program that was specific to 
Google's infrastructure.

The good news is that the training image generation program has 
recently been added to the code repository[0] and works with regular 
Linux distributions. Most[1] of the information needed to recreate 
the training tif/box files is now available as well[2]. If you can 
get that working, you can just add your own training tif/box files 
alongside the regenerated ones.
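
As a rough sketch of what a text2image run might look like: the 
function below only prints a command line rather than running it. The 
function name, the font and the file names are placeholders of mine, 
so check the flags against the help output of the text2image you 
build before relying on them.

```shell
# Sketch only: prints a text2image command line instead of running it.
# text2image_cmd is a hypothetical helper, not part of tesseract.
text2image_cmd() {
  text_file=$1   # a training text, e.g. one taken from the langdata repository
  font=$2        # a font installed on your system (placeholder)
  outbase=$3     # output base; text2image writes outbase.tif and outbase.box
  printf "text2image --text=%s --font='%s' --outputbase=%s\n" \
    "$text_file" "$font" "$outbase"
}

# Example with assumed names; paste the printed line into a shell to run it.
text2image_cmd eng.training_text "Times New Roman" eng.TimesNewRoman.exp0
```

Each font you want to train on would need its own run with a 
different output base.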

I plan to update the TrainingTesseract3 wiki page soon to make this 
clearer, but haven't done so yet.

An alternative would be to train on just your new data, give the 
resulting traineddata file a different name, and use it alongside the 
official eng.traineddata, so you call tesseract like this:
  tesseract -l eng+mycustomeng image.png outbase
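
For that to work, mycustomeng.traineddata has to sit in the same 
tessdata directory as eng.traineddata. A small sketch that checks this 
before building the -l argument follows. The helper name and the 
directory in the usage comment are assumptions of mine, not something 
tesseract provides.

```shell
# Sketch only: join_langs is a hypothetical helper, not part of tesseract.
# It checks that each language has a .traineddata file in the given
# directory, then prints the names joined with '+' for use with -l.
join_langs() {
  dir=$1; shift
  for lang in "$@"; do
    if [ ! -f "$dir/$lang.traineddata" ]; then
      echo "missing: $dir/$lang.traineddata" >&2
      return 1
    fi
  done
  # "$*" joins the arguments using the first character of IFS.
  (IFS=+; printf '%s\n' "$*")
}

# Usage (the tessdata path is an assumption; adjust it for your system):
#   langs=$(join_langs /usr/share/tesseract-ocr/tessdata eng mycustomeng) &&
#     tesseract -l "$langs" image.png outbase
```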

Nick

0. See the training/text2image tool in the main code repository
1. https://groups.google.com/forum/#!topic/tesseract-dev/VhUk9IxFt8Y
2. See the langdata repository
