On Tuesday, December 22, 2015 at 2:04:26 AM UTC-5, Utkarsh Sinha wrote:
> I'm trying to find out why Tesseract is rejecting certain blobs from the
> image here. The text "nestle" and "nesquik" have overlapping baselines. I
> suspect the overlap might be causing it to stop recognizing anything at all.
They're not only overlapping, they're also at something like a 30 degree angle to each other. It doesn't surprise me that Tesseract considers that an unreasonable amount of interline skew. Where would one see that in a normal text layout?

Additionally, the "Nesquik" isn't really text, but a stylized logotype. Perhaps consider using feature detectors such as SIFT or SURF from OpenCV?

Tom
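
If you want to put a number on the skew between the two lines rather than eyeball it, something along these lines might help. It is only a rough sketch: it fits a least-squares line through a few baseline points per word and compares the two angles. The coordinates are made up for illustration (they are not measured from your image), the function names are hypothetical, and this says nothing about what threshold Tesseract applies internally.

```python
import math

def baseline_angle(points):
    """Angle in degrees of the least-squares line through (x, y) points.

    Assumes the points span more than one x value; a vertical stack of
    points has no meaningful baseline slope.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    # sxx is never negative, so the result lies in [-90, 90] degrees.
    return math.degrees(math.atan2(sxy, sxx))

def interline_skew(line_a, line_b):
    """Smallest angle in degrees between the two fitted baselines."""
    diff = abs(baseline_angle(line_a) - baseline_angle(line_b)) % 180
    return min(diff, 180 - diff)

# Made-up baseline points: one horizontal word, one tilted by 30 degrees.
slope = math.tan(math.radians(30))
word_a = [(0, 100), (50, 100), (100, 100)]
word_b = [(x, x * slope) for x in (0, 40, 80, 120)]

print(round(interline_skew(word_a, word_b), 1))
```

In practice you would take the points from the bottoms of the character bounding boxes of each word. If the number comes out anywhere near 30, cropping and deskewing each word separately before passing it to the OCR engine is probably worth a try.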

