Yes, unfortunately you are right. There is no way to do this unless
you have the source English tiff/box pairs, and Google has not
released them. Consider detecting the OCR-A areas and feeding them to
Tesseract separately from the other text. It is possible to switch
between language files during a single program run.
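
As a rough illustration of routing regions to different language
files, here is a minimal Python sketch. It assumes the OCR-A areas
have already been detected and cropped into their own image files, and
that a separate language file has been trained for them; the name
"ocra", the region kinds and the helper functions are all hypothetical.
For simplicity it calls the tesseract command-line tool once per
region from one script (using its -l option); switching languages
inside one process would go through the library API instead.

```python
import subprocess

# Hypothetical mapping from region kind to Tesseract language name.
# "ocra" is an assumed name for a separately trained OCR-A language
# file; "eng" is the stock English one.
LANG_FOR_KIND = {"text": "eng", "ocr_a": "ocra"}


def build_command(image_path, output_base, kind):
    """Return the tesseract argument list for one pre-cropped region."""
    lang = LANG_FOR_KIND[kind]
    # Usage: tesseract imagename outputbase -l lang
    return ["tesseract", image_path, output_base, "-l", lang]


def recognize_regions(regions, run=subprocess.check_call):
    """Run tesseract once per region and return the commands used.

    regions is a list of (image_path, output_base, kind) tuples.
    run is injectable so the routing can be checked without Tesseract.
    """
    commands = [build_command(*region) for region in regions]
    for cmd in commands:
        run(cmd)
    return commands
```

Each call writes its result to output_base.txt, so the digits read
with the OCR-A language file stay separate from the English text and
can be merged afterwards in whatever order the layout requires.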

Warm regards,
Dmitri Silaev
www.CustomOCR.com

On Wed, Sep 7, 2011 at 7:36 AM, haoest <[email protected]> wrote:
> I read the instructions (http://code.google.com/p/tesseract-ocr/wiki/
> TrainingTesseract3) several times over before I attempted, but am
> still uncertain.
>
> I am trying to add a new font, OCR-A, to the existing eng.traineddata
> file. All I need is the digits from 0 to 9, so I made a tif file
> consisting of those 10 characters, made a box file and .tr file out of
> it, and this is where I hit a roadblock.
>
> I don't think I can simply append the output of cntraining or
> mftraining into the existing eng.inttemp/normproto. I need to rebuild
> ALL the .tr files from the original English tif/box package and then
> feed all of them, including my own .tr file, into the training
> program to reproduce the inttemp and proto files.
>
> Is this correct, and is there an easier way? I just want 10 characters
> in OCR-A (http://en.wikipedia.org/wiki/OCR-A_font)
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>

