Pierre,

Please confirm whether training succeeded with a command line like:

tesseract OCRB.tif ./cst.OCRB.page001 nobatch box.train.logfile

(Please note that "logfile" is used on Windows platforms such as Windows XP.)

Kindly upload OCRB.tif so that I can try it hands-on. I would like to use your command line for an Indic language such as Kannada.

Thanks for your research, Pierre.

With regards,
sriranga (77 years old)

On Mon, Apr 12, 2010 at 9:02 PM, MARTIN Pierre <[email protected]> wrote:
> Replying to myself so you can understand why it fails. Solution follows.
>
> I'm getting:
> Tesseract Open Source OCR Engine with Leptonica
> APPLY_BOXES:
> Boxes read from boxfile: 290
> Initially labelled blobs: 290 in 8 rows
> Box failures detected: 0
> Duped blobs for rebalance: 0
> "<" has fewest samples: 1
> Total unlabelled words: 0
> Final labelled words: 290
> Generating training data
> And then it just crashes without an error message. I'm unable to debug the
> application (for some reason, the Visual Studio project shipped with the svn
> version can't read the debugging information; I've tried to dynamically read
> the debugging symbols with no luck).
>
> This is triggered in blobclass.cpp, in the function LearnBlob, when trying to
> get the "firstdot" variable from a "filename" variable.
> After debugging this, I figured out that the "filename" variable was set to
> "junk", because I just followed the wiki training doc.
> In fact, there seems to be a new filename format, as stated in the comment
> in this C++ file:
> // filename is expected to be of the form [lang].[fontname].exp[num]
> // The [lang], [fontname] and [num] fields should not have '.' characters.
> So instead of calling:
> tesseract OCRB.tif junk nobatch box.train.stderr
> You have to call:
> tesseract OCRB.tif ./cst.OCRB.page001 nobatch box.train.stderr
>
> Thanks me,
> Me.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to [email protected].
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.
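
The naming rule quoted in Pierre's message ([lang].[fontname].exp[num], with no '.' inside the fields) can be checked before running tesseract. The following is only a minimal sketch in Python, not part of Tesseract: the helper names split_training_basename and is_strict_form are hypothetical, and "kan.OCRB.exp0" is a made-up example name. Note that Pierre's working name, cst.OCRB.page001, has the three dot-separated fields but does not end in a literal exp[num]; the thread does not settle whether that literal is required, so the sketch reports the two checks separately.

```python
import os
import re

# Strict pattern taken from the C++ comment quoted in the thread:
# [lang].[fontname].exp[num], with no '.' inside any field.
STRICT_FORM = re.compile(r"^[^.]+\.[^.]+\.exp\d+$")


def split_training_basename(path):
    """Return the three dot-separated fields of the output base name,
    or None if the name does not have exactly three non-empty fields."""
    base = os.path.basename(path)
    parts = base.split(".")
    if len(parts) != 3 or not all(parts):
        return None
    return tuple(parts)


def is_strict_form(path):
    """True if the base name matches [lang].[fontname].exp[num] literally."""
    return STRICT_FORM.match(os.path.basename(path)) is not None


if __name__ == "__main__":
    for name in ("junk", "./cst.OCRB.page001", "kan.OCRB.exp0"):
        print(name, split_training_basename(name), is_strict_form(name))
```

With this check, "junk" is rejected outright because it contains no dots, which matches the crash described above when the code looks for the first dot in the name.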

