I am creating training data for GD&T symbols using Tesseract 3.05.01. One 
of the TIFF files I use for training is the attached 
gdt.symbols.exp10.tif. When I attempt to use this TIFF with the 
corresponding gdt.symbols.exp10.box, I get this output:

Tesseract Open Source OCR Engine v3.05.01 with Leptonica
Page 1
FAIL!
APPLY_BOXES: boxfile line 7/Ⓜ ((1153,69),(1431,346)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 10/Ⓜ ((1993,69),(2268,346)): FAILURE! Couldn't find a matching blob
APPLY_BOXES:
   Boxes read from boxfile:      10
   Boxes failed resegmentation:       2
   Found 8 good blobs.
Generated training data for 5 words


Basically, both circled M symbols are failing.
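
In case it helps anyone compare the failing boxes against the image, here is a minimal Python sketch that pulls the failed entries out of output like the above. The function name failed_boxes and the dictionary keys are made up for illustration, and the regular expression only assumes the line layout seen in this particular log. It does not assume which corner each coordinate pair refers to, so it only reports the raw numbers and their differences.

```python
import re

# Matches failure lines in the layout seen in the log above, e.g.
#   APPLY_BOXES: boxfile line 7/<symbol> ((1153,69),(1431,346)): FAILURE! ...
# Assumption: other Tesseract versions may format this line differently.
FAILURE_RE = re.compile(
    r"boxfile line (\d+)/(\S+) \(\((\d+),(\d+)\),\((\d+),(\d+)\)\): FAILURE!"
)


def failed_boxes(log_text):
    """Return one dict per box that the log reports as a FAILURE."""
    results = []
    for match in FAILURE_RE.finditer(log_text):
        line_no, symbol = match.group(1), match.group(2)
        x1, y1, x2, y2 = (int(v) for v in match.groups()[2:])
        results.append({
            "line": int(line_no),      # line number within the box file
            "symbol": symbol,          # character the box was labelled with
            "coords": (x1, y1, x2, y2),
            "width": x2 - x1,          # extent along the first coordinate
            "height": y2 - y1,         # extent along the second coordinate
        })
    return results


if __name__ == "__main__":
    sample = (
        "APPLY_BOXES: boxfile line 7/\u24c2 ((1153,69),(1431,346)): FAILURE! "
        "Couldn't find a matching blob\n"
    )
    for box in failed_boxes(sample):
        print(box)
```

Running it over the full output makes it easy to check the reported coordinates against what jTessBoxEditor shows for the same box file lines.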

I've attached ImagesWithBoxes.PNG which is a screen capture from 
jTessBoxEditor showing the TIFF image with boxes. As you can see, the boxes 
appear to be correct.

Why isn't Tesseract able to use the circled M symbols for training? Can I 
change the image of the symbols somehow to help Tesseract, for example by 
connecting the circle and the M with a line?

Thanks in advance.


Attachment: gdt.symbols.exp10.box
Description: Binary data
