I am creating training data for GD&T symbols using Tesseract 3.05.01. One of the TIFF files I use for training is the attached gdt.symbols.exp10.tif. When I try to use this TIFF with the corresponding gdt.symbols.exp10.box, I get this output:
Tesseract Open Source OCR Engine v3.05.01 with Leptonica
Page 1
FAIL!
APPLY_BOXES: boxfile line 7/Ⓜ ((1153,69),(1431,346)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 10/Ⓜ ((1993,69),(2268,346)): FAILURE! Couldn't find a matching blob
APPLY_BOXES:
   Boxes read from boxfile: 10
   Boxes failed resegmentation: 2
   Found 8 good blobs.
Generated training data for 5 words

Both circled M symbols are failing. I've attached ImagesWithBoxes.PNG, a screen capture from jTessBoxEditor showing the TIFF image with its boxes. As you can see, the boxes appear to be correct.

Why isn't Tesseract able to use the circled M symbols for training? Can I change the image of the symbols somehow to help Tesseract, for example by connecting the circle and the M with a line?

Thanks in advance.
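For anyone who wants to check the two failing entries without jTessBoxEditor, here is a minimal Python sketch that reads box file lines and reports each box's size. It assumes the usual Tesseract 3.x box line layout (symbol, left, bottom, right, top, page, with the origin at the bottom-left of the image). The two sample lines are reconstructed from the coordinates in the log above, not copied from the actual file, and the page number 0 is an assumption. The names `Box`, `parse_box_line`, and `box_size` are made up for illustration.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    # Split from the right so the symbol field is kept intact
    # even if it consists of more than one character.
    symbol, left, bottom, right, top, page = line.rstrip("\n").rsplit(" ", 5)
    return Box(symbol, int(left), int(bottom), int(right), int(top), int(page))


def box_size(box: Box) -> Tuple[int, int]:
    # Width and height in pixels.
    return box.right - box.left, box.top - box.bottom


# Hypothetical lines rebuilt from the coordinates in the APPLY_BOXES messages;
# the page number (0) is assumed.
sample_lines = [
    "Ⓜ 1153 69 1431 346 0",
    "Ⓜ 1993 69 2268 346 0",
]

boxes = [parse_box_line(line) for line in sample_lines]
for box in boxes:
    width, height = box_size(box)
    print(f"{box.symbol}: left={box.left} bottom={box.bottom} "
          f"width={width} height={height} page={box.page}")
```

This only confirms what the box file says; it does not show how Tesseract segments the image into blobs, which is where the failure is reported.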

