When a command like

combine_tessdata lang.

is issued, the "combine_tessdata" utility simply looks for files whose
names start with "lang." and concatenates them into a single
".traineddata" file. Any intermediate file without that prefix is
left out, hence the small size in your case. You therefore need to
prefix the names of all your intermediate files with "med." and then
run "combine_tessdata" again. The resulting file should then have a
reasonable size. Mine was about 901K.
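
The renaming step can be sketched roughly as follows. This is only a minimal sketch: "prefix_training_files" is a hypothetical helper name, and the file names in the usage comment (inttemp, pffmtable, normproto) are my assumption about what mftraining and cntraining usually write in Tesseract 3.0x, so substitute whatever un-prefixed files you actually have. med.unicharset is not listed because your mftraining call already wrote it via -O.

```shell
#!/bin/sh
# Hypothetical helper: give intermediate training files a "<lang>." prefix
# so that combine_tessdata can pick them up.
#   $1    language prefix, e.g. "med"
#   $2    directory that holds the training output
#   $3... un-prefixed file names to rename
prefix_training_files() {
    lang="$1"
    dir="$2"
    shift 2
    for name in "$@"; do
        # Rename only files that exist, and never overwrite a file
        # that already carries the prefix.
        if [ -f "$dir/$name" ] && [ ! -e "$dir/$lang.$name" ]; then
            mv "$dir/$name" "$dir/$lang.$name"
        fi
    done
}

# Usage (assumed file names, adjust to your own output):
#   prefix_training_files med . inttemp pffmtable normproto
#   combine_tessdata med.
```

Before running combine_tessdata again, a quick "ls med.*" shows which files it will find under that prefix.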

Warm regards,
Dmitri Silaev
www.CustomOCR.com





On Thu, Jun 16, 2011 at 10:27 AM, Erik Reisig <[email protected]> wrote:
> After that I run
>
> tesseract med.draft.tif med.arial nobatch box.train
>
> for every tif/box pair. This creates a .tr file for each pair.
>
> I attached the .tr file for my specific font.
>
> Then I run
>
> unicharset_extractor med.arial.box med.draft.box …
>
> With each box file as an argument.
>
> This creates the unicharset file I attached.
>
> After that I run
>
> mftraining -F font_properties -U unicharset -O med.unicharset
> med.arial.tr med.draft.tr …
>
> I attached the font_properties file and the mftraining output.
>
> After that I run
>
> cntraining med.arial.tr med.draft.tr …
>
> Also attached the cntraining output files.
>
> Since I currently don’t need any dictionary data, I don’t create any.
>
> Then I run
>
> combine_tessdata med.
>
> which generates the
>
> med.traineddata file.
>
> Unfortunately the file is only 2 KB in size, and every attempt to
> recognize text using it fails completely.
>
> Could somebody please point out at which point in my process I'm
> making a mistake?
>
> Any help would be greatly appreciated!
>
> Thanks,
>
> Erik
>
>
> Attachments:
>
> http://dl.dropbox.com/u/686228/training_tesseract_for_draft.zip
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
