I dug around and found the raw tiff/box package on the downloads page: http://code.google.com/p/tesseract-ocr/downloads/detail?name=eng.traineddata.gz&can=2&q=
But without a batch file to build the .tr files, rebuilding all 32 fonts from the command line would be terrifying, and I don't see an alternative other than biting the bullet. The plan is to take one of the fonts out of the package, replace it with my own tif/box pair, and then buy a super cheap, one-time-use keyboard for this assignment -- I don't want to wear out my Logitech keyboard.

On Sep 7, 4:08 pm, Dmitri Silaev <[email protected]> wrote:
> Yes, unfortunately you are right. There is no way to do this unless you
> have the source English tiff/box pairs, and these are held back by Google.
> Consider detecting OCR-A areas and feeding them to Tesseract separately
> from other text. It is possible to switch between language files
> during a single program run.
>
> Warm regards,
> Dmitri Silaev
> www.CustomOCR.com
>
> On Wed, Sep 7, 2011 at 7:36 AM, haoest <[email protected]> wrote:
> > I read the instructions
> > (http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3) several
> > times before attempting this, but I am still uncertain.
> >
> > I am trying to add a new font, OCR-A, to the existing eng.traineddata
> > file. All I need is the digits 0 to 9, so I made a tif file
> > consisting of those 10 characters, made a box file and a .tr file out of
> > it, and this is where I hit a roadblock.
> >
> > I don't think I can simply append the output of cntraining or
> > mftraining to the existing eng.inttemp/normproto. I need to rebuild
> > ALL the .tr files from the original English tif/box package and then
> > feed all of them, including my own .tr file, into the training
> > programs to reproduce the inttemp and normproto files.
> >
> > Is this correct, and is there an easier way? I just want 10 characters
> > in OCR-A (http://en.wikipedia.org/wiki/OCR-A_font).

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
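The missing batch file mentioned in the reply could be approximated with a small script. This is a minimal sketch, not something from the thread, written in Python and assuming: every training image `foo.tif` sits next to its box file `foo.box` in one folder (the new OCR-A pair included); a .tr file is produced with `tesseract foo.tif foo nobatch box.train`; and the clustering step follows the TrainingTesseract3 wiki form `unicharset_extractor *.box`, `mftraining -U unicharset -O eng.unicharset *.tr`, `cntraining *.tr`. The exact arguments differ between Tesseract releases, so check them against the wiki for the version in use. The script name `rebuild_tr.py` and its helper functions are hypothetical. It prints the commands by default and only runs them when `--run` is given; the final renaming and `combine_tessdata` step is left out.

```python
"""Hypothetical batch helper (rebuild_tr.py): rebuild every .tr file, then
feed ALL of them (including the new OCR-A one) to one clustering run.

Assumed command forms, taken from the TrainingTesseract3 wiki; verify them
against your Tesseract version before using --run.
"""
import subprocess
import sys
from pathlib import Path


def find_pairs(folder):
    """Return base paths (no extension) that have both a .tif and a .box file."""
    folder = Path(folder)
    return sorted(
        tif.with_suffix("")  # eng.arial.exp0.tif -> eng.arial.exp0
        for tif in folder.glob("*.tif")
        if tif.with_suffix(".box").exists()  # skip images without a box file
    )


def build_commands(folder):
    """Build the argv lists for the whole rebuild without running anything."""
    bases = find_pairs(folder)
    # Assumed per-font step: one tesseract call per tif/box pair -> base.tr
    commands = [
        ["tesseract", str(base) + ".tif", str(base), "nobatch", "box.train"]
        for base in bases
    ]
    tr_files = [str(base) + ".tr" for base in bases]
    box_files = [str(base) + ".box" for base in bases]
    # Assumed clustering steps: every .tr file goes into a single run, so the
    # new font is trained together with the original English fonts.
    commands.append(["unicharset_extractor"] + box_files)
    commands.append(["mftraining", "-U", "unicharset", "-O", "eng.unicharset"] + tr_files)
    commands.append(["cntraining"] + tr_files)
    return commands


def main(folder, dry_run=True):
    for argv in build_commands(folder):
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # raises and stops at the first failing step


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python rebuild_tr.py <folder-with-tif-box-pairs> [--run]
    main(sys.argv[1], dry_run="--run" not in sys.argv)
```

Running `python rebuild_tr.py path/to/pairs` first and reading the printed commands is a cheap way to confirm that all 32 fonts plus the OCR-A pair were picked up before anything is executed. Images without a matching .box file are skipped rather than reported, which keeps the sketch short but may hide a misnamed file.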

