Fwd: Generating / Training box files.

MARTIN Pierre Mon, 12 Apr 2010 10:19:35 -0700

Replying to myself so you can understand why it fails. Solution follows.

> I'm getting:
> Tesseract Open Source OCR Engine with Leptonica
> APPLY_BOXES:
> Boxes read from boxfile:     290
> Initially labelled blobs:    290 in 8 rows
> Box failures detected:            0
> Duped blobs for rebalance:     0
> "<" has fewest samples:     1
> Total unlabelled words:        0
> Final labelled words:        290
> Generating training data
> And then it just crashes without an error message. I'm unable to debug the 
> application (for some reason, the Visual Studio project shipped with the SVN 
> version can't read the debugging information; I've tried to load the 
> debugging symbols dynamically, with no luck).


The crash is triggered in blobclass.cpp, in the function LearnBlob, when it 
tries to compute the "firstdot" variable from the "filename" variable.
After debugging this, I found that "filename" was set to "junk", because I had 
simply followed the wiki training doc. A name like "junk" contains no '.' 
character, so there is no first dot to find.
In fact, there seems to be a new filename format, as stated in this comment in 
the C++ file:
// filename is expected to be of the form [lang].[fontname].exp[num]
// The [lang], [fontname] and [num] fields should not have '.' characters.
So instead of calling:
tesseract OCRB.tif junk nobatch box.train.stderr
You have to call:
tesseract OCRB.tif ./cst.OCRB.page001 nobatch box.train.stderr
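
To catch this before Tesseract crashes silently, the output base name can be checked up front. The following is only a minimal sketch in Python, not Tesseract's own logic: the helper name split_training_base is hypothetical, and it assumes that the important part of the rule is three non-empty fields separated by '.', as in the comment quoted above. It does not insist on the "exp" prefix for the last field, since the working call above uses "page001" there.

```python
import os


def split_training_base(path):
    """Split an output base name into its three '.'-separated fields.

    Hypothetical helper, not part of Tesseract. Returns a tuple
    (lang, fontname, last_field), or None when the base name does not
    consist of exactly three non-empty fields.
    """
    # Only the final path component matters, so "./cst.OCRB.page001"
    # is treated the same as "cst.OCRB.page001".
    name = os.path.basename(path)
    fields = name.split(".")
    # Assumption: exactly three fields, none empty, which also means
    # none of them contains a '.' character.
    if len(fields) != 3 or not all(fields):
        return None
    return tuple(fields)


if __name__ == "__main__":
    for candidate in ("junk", "./cst.OCRB.page001"):
        print(candidate, "->", split_training_base(candidate))
```

With this check, "junk" is rejected and "./cst.OCRB.page001" is accepted, which matches the behaviour described above.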

Thanks me,
Me.
