OK! I have resampled the image to 300x300 dpi and binarized it, but when
training Tesseract I get this log:
Tesseract Open Source OCR Engine
Image has 1 * 1 bit per pixel, and size (2667,2000)
Resolution=300
APPLY_BOXES: boxfile 1/2/C ((919,900),(1047,1120)): FAILURE! box
overlaps blob in labelled word
APPLY_BOXES: ALSO ignoring corrupted char blk:1 row:1 "C"
APPLY_BOXES: boxfile 1/4/5 ((1276,907),(1401,1122)): FAILURE! box
overlaps blob in labelled word
APPLY_BOXES: ALSO ignoring corrupted char blk:1 row:1 "1"
APPLY_BOXES: boxfile 1/6/B ((1539,904),(1666,1122)): FAILURE! box
overlaps blob in labelled word
APPLY_BOXES: ALSO ignoring corrupted char blk:1 row:1 "4"
APPLY_BOXES: FATALITY - 0 labelled samples of "C" - target is 2:
C:[43]
APPLY_BOXES: FATALITY - 0 labelled samples of "1" - target is 1:
1:[31]
APPLY_BOXES: FATALITY - 0 labelled samples of "5" - target is 1:
5:[35]
APPLY_BOXES: FATALITY - 0 labelled samples of "4" - target is 1:
4:[34]
APPLY_BOXES: FATALITY - 0 labelled samples of "B" - target is 1:
B:[42]
APPLY_BOXES:
Boxes read from boxfile: 7
Initially labelled blobs: 1 in 1 rows
Box failures detected: 6
Duped blobs for rebalance: 0
"C" has fewest samples: 0
Total unlabelled words: 0
Final labelled words: 1
Generating training data
Box overlaps? Why? I used a Python script to find the boxes! What's the
problem now?
These are the two files:
http://groups.google.com/group/tesseract-ocr/web/t1%20%282%29.tif
http://groups.google.com/group/tesseract-ocr/web/t1%20%282%29.box
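
For anyone who wants to sanity-check a box file like this one before training, here is a minimal sketch. It assumes the usual box file layout of one "char left bottom right top [page]" entry per line, and it only checks that each box lies inside the image and that no two boxes overlap each other. The names parse_box_lines and check_boxes are made up for illustration, and the sample lines reuse coordinates printed in the log above purely as example values (they may not match the real t1 box file). It does not reproduce Tesseract's own blob check, so passing it does not guarantee that APPLY_BOXES will succeed.

```python
from typing import Iterable, List, NamedTuple


class Box(NamedTuple):
    char: str
    left: int
    bottom: int
    right: int
    top: int


def parse_box_lines(lines: Iterable[str]) -> List[Box]:
    """Parse 'char left bottom right top [page]' lines; skip short/blank lines."""
    boxes = []
    for line in lines:
        parts = line.split()
        if len(parts) < 5:
            continue
        left, bottom, right, top = (int(p) for p in parts[1:5])
        boxes.append(Box(parts[0], left, bottom, right, top))
    return boxes


def check_boxes(boxes: List[Box], width: int, height: int) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    # Box file coordinates are measured from the bottom-left corner of the
    # image, so bottom < top and left < right are expected for every box.
    for i, b in enumerate(boxes):
        inside = 0 <= b.left < b.right <= width and 0 <= b.bottom < b.top <= height
        if not inside:
            problems.append(
                f"box {i} '{b.char}' is degenerate or outside the {width}x{height} image"
            )
    # Pairwise rectangle intersection test between the boxes themselves.
    for i, a in enumerate(boxes):
        for j in range(i + 1, len(boxes)):
            c = boxes[j]
            if (a.left < c.right and c.left < a.right
                    and a.bottom < c.top and c.bottom < a.top):
                problems.append(f"box {i} '{a.char}' overlaps box {j} '{c.char}'")
    return problems


if __name__ == "__main__":
    # Illustrative values only, copied from the coordinates shown in the log.
    sample = [
        "C 919 900 1047 1120",
        "5 1276 907 1401 1122",
        "B 1539 904 1666 1122",
    ]
    # 2667 x 2000 is the image size reported in the log.
    for problem in check_boxes(parse_box_lines(sample), 2667, 2000):
        print(problem)
```

If the boxes were produced by a script that measures y from the top of the image, this check would not notice by itself; comparing one box against the image in a viewer is still needed for that.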