I dug around and found the raw tiff/box package on the downloads
page:

http://code.google.com/p/tesseract-ocr/downloads/detail?name=eng.traineddata.gz&can=2&q=

But without a batch file to build the .tr files, rebuilding all 32
fonts from the command line would be terrifying. Still, I don't see an
alternative other than to bite the bullet. The plan is to take one
of the fonts out of the package and replace it with my own tif/box pair,
then buy a super cheap one-time-use keyboard for this
assignment -- I don't want to wear out my Logitech keyboard.
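
To spare the keyboard, the per-font step could probably be scripted. Below is
a rough sketch, not a tested recipe. It assumes the package unpacks into a
single folder of matching .tif/.box pairs, and that the command shape given on
the TrainingTesseract3 wiki (tesseract <image> <output base> nobatch box.train)
applies to the installed version. The function names box_train_commands and
run_all are made up for illustration.

```python
from pathlib import Path
import subprocess


def box_train_commands(folder):
    """Return one tesseract training command per .tif that has a matching .box."""
    commands = []
    for tif in sorted(Path(folder).glob("*.tif")):
        box = tif.with_suffix(".box")
        if not box.exists():
            # No box coordinates for this image, so it cannot be trained on.
            continue
        # Assumed command shape: tesseract <image> <output base> nobatch box.train
        # The output base is the image name minus ".tif", giving <base>.tr.
        commands.append(["tesseract", tif.name, tif.stem, "nobatch", "box.train"])
    return commands


def run_all(folder):
    """Run every command inside `folder`, stopping at the first failure."""
    for cmd in box_train_commands(folder):
        subprocess.run(cmd, cwd=folder, check=True)
```

Printing the result of box_train_commands("path/to/package") first shows
exactly what would run; run_all is only worth calling once that list looks
right. The resulting .tr files, including the one for my own font, would then
go to mftraining and cntraining as the wiki describes.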



On Sep 7, 4:08 pm, Dmitri Silaev <[email protected]> wrote:
> Yes, unfortunately you are right. No way to do this unless you have
> source English tiff/box pairs, and these are held back by Google.
> Consider detecting OCR-A areas to feed them to Tesseract separately
> from other text. It is possible to switch between language files
> during the single program run.
>
> Warm regards,
> Dmitri Silaev
> www.CustomOCR.com
>
> On Wed, Sep 7, 2011 at 7:36 AM, haoest <[email protected]> wrote:
> > I read the instructions
> > (http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3)
> > several times over before I attempted, but am still uncertain.
>
> > I am trying to add a new font, OCR-A, to the existing eng.traineddata
> > file. All I need is the digits from 0 to 9, so I made a tif file
> > consisting of those 10 characters, made a box file and .tr file out of
> > it, and this is where I hit a roadblock.
>
> > I don't think I can simply append the output of cntraining or
> > mftraining into the existing eng.inttemp/normproto. I need to rebuild
> > ALL the .tr files from the original English tif/box package and then
> > feed all of them, including my own .tr file, into the training
> > program to reproduce the inttemp and proto files.
>
> > Is this correct, and is there an easier way? I just want 10 characters
> > in OCR-A (http://en.wikipedia.org/wiki/OCR-A_font)
>
> > --
> > You received this message because you are subscribed to the Google
> > Groups "tesseract-ocr" group.
> > To post to this group, send email to [email protected]
> > To unsubscribe from this group, send email to
> > [email protected]
> > For more options, visit this group at
> > http://groups.google.com/group/tesseract-ocr?hl=en

