You need a much larger sample, in the hundreds or at least several dozen, so that even if some symbols hit "Couldn't find a matching blob" errors, the other samples still get picked up.
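
If it helps, here is a rough sketch for tallying which symbols fail once you are training on many pages. It assumes you have captured tesseract's console output as text and that each failure message sits on a single line in the format shown in your log below. The function name count_failures is just something I made up for illustration; it is not part of Tesseract.

```python
import re
from collections import Counter

# Matches failure lines such as:
#   APPLY_BOXES: boxfile line 7/Ⓜ ((1153,69),(1431,346)): FAILURE! Couldn't find a matching blob
# Assumption: the message is on one line, as tesseract 3.05 prints it.
FAIL_RE = re.compile(
    r"boxfile line (\d+)/(.+?) \(\((\d+),(\d+)\),\((\d+),(\d+)\)\): FAILURE!"
)


def count_failures(log_text):
    """Return a Counter mapping each symbol to its number of failed boxes."""
    return Counter(m.group(2) for m in FAIL_RE.finditer(log_text))


# Sample text copied from the output quoted below (wrapped lines rejoined).
sample_log = """\
Page 1
FAIL!
APPLY_BOXES: boxfile line 7/Ⓜ ((1153,69),(1431,346)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 10/Ⓜ ((1993,69),(2268,346)): FAILURE! Couldn't find a matching blob
APPLY_BOXES:
Boxes read from boxfile: 10
Boxes failed resegmentation: 2
"""

failures = count_failures(sample_log)
for symbol, count in failures.items():
    print(symbol, count)
```

Running this over the logs of all your training pages should show whether the failures stay limited to a few symbols while enough good samples remain for the rest.
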

On Saturday, May 26, 2018 at 1:52:39 AM UTC-5, Paul Kitchen wrote:
>
> I am creating training data for GD&T symbols using Tesseract 3.05.01. One
> of my TIFF files I use for training is in the attached
> gdt.symbols.exp10.tif. When I attempt to use this TIFF with the
> corresponding gdt.symbols.exp10.box, I get this output:
>
> Tesseract Open Source OCR Engine v3.05.01 with Leptonica
> Page 1
> FAIL!
> APPLY_BOXES: boxfile line 7/Ⓜ ((1153,69),(1431,346)): FAILURE! Couldn't
> find a matching blob
> FAIL!
> APPLY_BOXES: boxfile line 10/Ⓜ ((1993,69),(2268,346)): FAILURE! Couldn't
> find a matching blob
> APPLY_BOXES:
> Boxes read from boxfile: 10
> Boxes failed resegmentation: 2
> Found 8 good blobs.
> Generated training data for 5 words
>
>
> Basically, both circled M symbols are failing.
>
> I've attached ImagesWithBoxes.PNG which is a screen capture from
> jTessBoxEditor showing the TIFF image with boxes. As you can see, the boxes
> appear to be correct.
>
> Why isn't tesseract able to use the circle M symbols for training? Can I
> change the image of the symbols some how to help tesseract... maybe connect
> the circle and M parts with a line?
>
> Thanks in advance.
>