Yes, unfortunately you are right. There is no way to do this unless you have the original English tiff/box pairs, and Google has not released them. Consider detecting the OCR-A areas and feeding them to Tesseract separately from the other text. It is possible to switch between language files within a single program run.
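A minimal sketch of that idea, not a definitive implementation. It assumes the pytesseract wrapper and Pillow are installed, that you have already located the OCR-A areas yourself, and that you have trained a separate language file for the digits, here hypothetically named `ocra`. The helper names `tesseract_ocr` and `ocr_regions` are made up for illustration.

```python
from typing import Callable, Iterable, List, Tuple

# (left, upper, right, lower) in pixels, the box format Pillow's Image.crop expects
Box = Tuple[int, int, int, int]


def tesseract_ocr(image, lang: str) -> str:
    """Run Tesseract on a PIL image using the given language file."""
    # Imported lazily so the routing logic below can be used without pytesseract.
    import pytesseract
    return pytesseract.image_to_string(image, lang=lang)


def ocr_regions(image,
                regions: Iterable[Tuple[Box, str]],
                ocr: Callable = tesseract_ocr) -> List[str]:
    """OCR each region of the page with its own language file.

    `regions` pairs a crop box with a language name, for example "eng" for
    ordinary text and "ocra" (an assumed, separately trained file) for the
    OCR-A digits. Results come back in the same order as `regions`.
    """
    results = []
    for box, lang in regions:
        # Crop the area and hand it to the engine with that area's language.
        results.append(ocr(image.crop(box), lang).strip())
    return results
```

Usage would look roughly like `ocr_regions(Image.open("form.tif"), [((0, 0, 400, 60), "eng"), ((0, 60, 400, 120), "ocra")])`, where the file name and the boxes are placeholders. Keeping the OCR-A digits in their own language file avoids having to rebuild the English training data at all.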
Warm regards,
Dmitri Silaev
www.CustomOCR.com

On Wed, Sep 7, 2011 at 7:36 AM, haoest <[email protected]> wrote:
> I read the instructions
> (http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3)
> several times over before I attempted, but am still uncertain.
>
> I am trying to add a new font, OCR-A, to the existing eng.traineddata
> file. All I need is the digits from 0 to 9, so I made a tif file
> consisting of those 10 characters, made a box file and .tr file out of
> it, and this is where I hit the road block.
>
> I don't think I can simply append the output of cntraining or
> mftraining into the existing eng.inttemp/normproto. I need to rebuild
> ALL the .tr files from the original English tif/box package and then
> feed all of them, including my own .tr file, into the training
> program to re-produce the inttemp and proto files.
>
> Is this correct, and is there an easier way? I just want 10 characters
> in OCR-A (http://en.wikipedia.org/wiki/OCR-A_font)
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en

