Replying to myself so you can understand why it fails. The solution follows.

> I'm getting:
>
> Tesseract Open Source OCR Engine with Leptonica
> APPLY_BOXES:
> Boxes read from boxfile: 290
> Initially labelled blobs: 290 in 8 rows
> Box failures detected: 0
> Duped blobs for rebalance: 0
> "<" has fewest samples: 1
> Total unlabelled words: 0
> Final labelled words: 290
> Generating training data
>
> And then it just crashes without an error message. I'm unable to debug the
> application (for some reason, the Visual Studio project shipped with the svn
> version can't read the debugging information; I've tried to dynamically read
> the debugging symbols, with no luck).
This is triggered in blobclass.cpp, in the function LearnBlob, when it tries to compute the "firstdot" variable from the "filename" variable. After debugging this, I found that "filename" was set to "junk", because I had simply followed the wiki training doc, and "junk" contains no '.' character to find. In fact, there seems to be a new filename format, as stated by this comment in that C++ file:

    // filename is expected to be of the form [lang].[fontname].exp[num]
    // The [lang], [fontname] and [num] fields should not have '.' characters.

So instead of calling:

    tesseract OCRB.tif junk nobatch box.train.stderr

you have to call:

    tesseract OCRB.tif ./cst.OCRB.page001 nobatch box.train.stderr

Thanks me,
Me.
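For anyone who wants to check an output name before running the training step, here is a minimal sketch of the three-field rule described in that comment. It is not Tesseract's own code: the function name split_training_name is hypothetical, stripping the directory part is my assumption, and it only checks for three non-empty dot-separated fields (it does not insist on the literal "exp" prefix, since "page001" worked above).

```python
def split_training_name(output_base):
    """Split an output base such as './cst.OCRB.page001' into three fields.

    Hypothetical helper, not part of Tesseract. It only mirrors the rule
    quoted from blobclass.cpp: three fields separated by '.' characters.
    """
    # Assumption: any directory part is ignored, so only the last path
    # component is inspected.
    name = output_base.replace("\\", "/").rsplit("/", 1)[-1]
    parts = name.split(".")
    # Exactly three non-empty fields means exactly two dots, none inside a field.
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            "expected [lang].[fontname].exp[num], got %r" % output_base
        )
    lang, fontname, page = parts
    return lang, fontname, page
```

With this check, "./cst.OCRB.page001" splits into "cst", "OCRB" and "page001", while "junk" raises a ValueError instead of reaching the training step.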

