When a command like

    combine_tessdata lang.

is issued, the combine_tessdata utility simply searches for files whose names start with "lang." and concatenates them into a single ".traineddata" file. Hence the small size in your case. You therefore need to prefix the names of all your intermediate files with "med." and then run combine_tessdata again. The resulting file should then be a reasonable size; mine was about 901K.

Warm regards,
Dmitri Silaev
www.CustomOCR.com

On Thu, Jun 16, 2011 at 10:27 AM, Erik Reisig <[email protected]> wrote:
> After that I run
>
> tesseract med.draft.tif med.arial nobatch box.train
>
> for every tif/box pair. This creates a .tr file for each pair.
>
> I attached the .tr file for my specific font.
>
> Then I run
>
> unicharset_extractor med.arial.box med.draft.box ...
>
> with each box file as an argument.
>
> This creates the unicharset file I attached.
>
> After that I run
>
> mftraining -F font_properties -U unicharset -O med.unicharset
> med.arial.tr med.draft.tr ...
>
> I attached the font_properties file and the mftraining output.
>
> After that I run
>
> cntraining med.arial.tr med.draft.tr ...
>
> Also attached the cntraining output files.
>
> Since I currently don't need any dictionary data, I don't create any.
>
> Then I run
>
> combine_tessdata med.
>
> which generates the
>
> med.traineddata file.
>
> Unfortunately the file is only 2 KB and every attempt to
> recognize text using it fails completely.
>
> Could somebody please point out at which point in my process I'm making
> a mistake?
>
> Any help would be greatly appreciated!
>
> Thanks,
>
> Erik
>
>
> Attachments:
>
> http://dl.dropbox.com/u/686228/training_tesseract_for_draft.zip
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
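
For reference, the renaming step suggested in the reply could be sketched roughly as below. This is only a sketch under assumptions: it supposes the training tools left their output under the default un-prefixed names (inttemp and pffmtable from mftraining, normproto from cntraining, and shapetable on versions that produce one), and the function name prefix_training_files is made up for illustration. Check which files actually exist in your training directory before relying on it.

```shell
#!/bin/sh
# Give un-prefixed training outputs the "<prefix>." name that
# combine_tessdata looks for.
# Assumption: the outputs use the default names listed in the loop below;
# adjust the list to match what your Tesseract version actually wrote.
prefix_training_files() {
    prefix="$1"
    for f in inttemp normproto pffmtable shapetable; do
        # Skip any file this run did not produce.
        if [ -f "$f" ]; then
            mv "$f" "$prefix.$f"
        fi
    done
}

# Usage, from the directory holding the training output:
#   prefix_training_files med
#   combine_tessdata med.
```

The function only renames files that are present, so it is harmless to run when some of the listed outputs are missing. The unicharset is not in the list because the mftraining call in the quoted message already writes it as med.unicharset.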

