You need a much larger sample, in the range of hundreds or at least several 
dozens, so that even if some instances of a symbol fail with "Couldn't find a 
matching blob" errors, other samples of that symbol are still picked up.
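One way to check whether a training set meets that bar is to count how many boxes each symbol has across the box files. Below is a minimal sketch, assuming the Tesseract 3.x box format of one box per line, `<symbol> <left> <bottom> <right> <top> <page>`. The function names, the sample lines and the threshold of 24 are hypothetical, chosen only for illustration.

```python
from collections import Counter


def count_box_samples(box_text: str) -> Counter:
    """Count how many boxes (training samples) each symbol has in a box file.

    Assumes the Tesseract 3.x box format: one box per line,
    '<symbol> <left> <bottom> <right> <top> <page>'.
    """
    counts: Counter = Counter()
    for line in box_text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        counts[parts[0]] += 1
    return counts


def symbols_below(counts: Counter, minimum: int) -> list[str]:
    """Return the symbols that have fewer than `minimum` samples, sorted."""
    return sorted(sym for sym, n in counts.items() if n < minimum)


# Hypothetical box lines with made-up coordinates.
box = "Ⓜ 10 10 50 50 0\nⓂ 60 10 100 50 0\n⌀ 110 10 150 50 0\n"
counts = count_box_samples(box)
# 24 stands in for "several dozens"; it is an arbitrary example threshold.
print(symbols_below(counts, 24))
```

To cover a whole training set, read every `.box` file with `encoding="utf-8"` and add the resulting `Counter` objects together before applying the threshold.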

On Saturday, May 26, 2018 at 1:52:39 AM UTC-5, Paul Kitchen wrote:
>
> I am creating training data for GD&T symbols using Tesseract 3.05.01. One 
> of the TIFF files I use for training is the attached 
> gdt.symbols.exp10.tif. When I attempt to use this TIFF with the 
> corresponding gdt.symbols.exp10.box, I get this output:
>
> Tesseract Open Source OCR Engine v3.05.01 with Leptonica
> Page 1
> FAIL!
> APPLY_BOXES: boxfile line 7/Ⓜ ((1153,69),(1431,346)): FAILURE! Couldn't find a matching blob
> FAIL!
> APPLY_BOXES: boxfile line 10/Ⓜ ((1993,69),(2268,346)): FAILURE! Couldn't find a matching blob
> APPLY_BOXES:
>    Boxes read from boxfile:      10
>    Boxes failed resegmentation:       2
>    Found 8 good blobs.
> Generated training data for 5 words
>
>
> Basically, both circled M symbols are failing.
>
> I've attached ImagesWithBoxes.PNG, a screen capture from jTessBoxEditor 
> showing the TIFF image with its boxes. As you can see, the boxes appear to 
> be correct.
>
> Why isn't Tesseract able to use the circled M symbols for training? Can I 
> change the symbol images somehow to help Tesseract, for example by 
> connecting the circle and the M with a line?
>
> Thanks in advance.
>
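To see which symbols fail, and how often, across many training pages, the APPLY_BOXES failure lines can be collected and tallied per symbol. Below is a minimal sketch, assuming each failure is reported on one line of the form `APPLY_BOXES: boxfile line N/<symbol> ((l,b),(r,t)): FAILURE! ...` as in the quoted output, and assuming the two points are the bottom-left and top-right box corners. The function names are hypothetical and are not part of Tesseract.

```python
import re
from collections import Counter

# Matches one failure line from the APPLY_BOXES output, e.g.
# "APPLY_BOXES: boxfile line 7/Ⓜ ((1153,69),(1431,346)): FAILURE! ..."
FAILURE_RE = re.compile(
    r"APPLY_BOXES: boxfile line (\d+)/(\S+) "
    r"\(\((-?\d+),(-?\d+)\),\((-?\d+),(-?\d+)\)\): FAILURE!"
)


def failed_boxes(log_text: str) -> list[tuple[int, str, tuple[int, ...]]]:
    """Return (box file line, symbol, (left, bottom, right, top)) per failure."""
    results = []
    for m in FAILURE_RE.finditer(log_text):
        line_no, symbol = int(m.group(1)), m.group(2)
        box = tuple(int(v) for v in m.groups()[2:])
        results.append((line_no, symbol, box))
    return results


def failures_per_symbol(log_text: str) -> Counter:
    """Tally the failed boxes by symbol."""
    return Counter(symbol for _, symbol, _ in failed_boxes(log_text))
```

Comparing each symbol's failure count with its total number of boxes shows whether a symbol fails occasionally, which a larger sample can absorb, or on every page, which points to a problem with how that glyph is segmented.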
