Hi Thomas,

On Mon, Aug 18, 2014 at 02:17:19PM -0700, Thomas Bruno wrote:
> Where can I find the box/tif combo for the eng.traineddata that Tesseract 3.02
> provides for download?

The tif/box files used to create the eng.traineddata for 3.02 are not
available, and are very unlikely to be made so, because they were
automatically generated using a program that was specific to Google's
infrastructure.

The good news is that the training image generation program has recently
been added to the code repository[0] and works with regular Linux
distributions. Most[1] of the information needed to recreate the
training tif/box files has been added as well[2]. If you can get that
working, you can just add your own training tif/box files alongside the
generated ones. I plan to update the TrainingTesseract3 wiki page soon
to make this clearer, but haven't done so yet.

An alternative would be to leave the official eng.traineddata as it is,
give your new training a different name, and use the two together, so
you call tesseract like this:

    tesseract -l eng+mycustomeng image.png outbase

Nick

0. See the training/text2image tool in the main code repository
1. https://groups.google.com/forum/#!topic/tesseract-dev/VhUk9IxFt8Y
2. See the langdata repository
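
For anyone scripting the second option (loading a custom traineddata file together with the official one), here is a minimal sketch in Python. It only assembles and runs the same command line given in the message. The helper names build_tesseract_command and run_ocr are hypothetical, and it assumes that mycustomeng.traineddata has already been placed in the same tessdata directory as eng.traineddata.

```python
import shutil
import subprocess


def build_tesseract_command(image, outbase, langs):
    """Return the argument list for a multi-language tesseract call.

    Hypothetical helper: it mirrors the command from the message,
    tesseract -l eng+mycustomeng image.png outbase
    """
    if not langs:
        raise ValueError("at least one language is required")
    # Several languages are passed to -l joined with '+'.
    return ["tesseract", "-l", "+".join(langs), image, outbase]


def run_ocr(image, outbase, langs):
    """Run tesseract if it is on PATH and return the command that was used.

    Assumption: every name in langs has a matching .traineddata file
    in the tessdata directory that tesseract searches.
    """
    cmd = build_tesseract_command(image, outbase, langs)
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(cmd, check=True)
    return cmd
```

Calling run_ocr("image.png", "outbase", ["eng", "mycustomeng"]) would then run the command shown in the message, with the recognised text written to files starting with outbase.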

