Tom, we did set "Force parallel baselines" to false. I was hoping that would keep Tesseract from discarding "Nesquik". Are there any other parameters I can try tweaking?
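For reference, here is a minimal sketch of one way to pass such an override on the command line. It assumes the variable behind "Force parallel baselines" is `textord_parallel_baselines` and that the installed Tesseract accepts `-c name=value`. The helper `build_tesseract_cmd` and the file name `nesquik.png` are hypothetical, used only for illustration.

```python
import shlex


def build_tesseract_cmd(image_path, overrides, output_base="stdout"):
    """Build a tesseract argument list with `-c name=value` overrides.

    Hypothetical helper: it only assembles the arguments and does not
    run Tesseract itself.
    """
    cmd = ["tesseract", image_path, output_base]
    for name, value in overrides.items():
        # Spell booleans as 1/0 on the command line.
        if isinstance(value, bool):
            value = "1" if value else "0"
        cmd += ["-c", f"{name}={value}"]
    return cmd


# Assumed variable name for the "Force parallel baselines" setting.
cmd = build_tesseract_cmd("nesquik.png", {"textord_parallel_baselines": False})
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd)` on a machine where Tesseract is installed. Keeping the overrides in a dictionary makes it easy to try other parameters one at a time.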
While SIFT/SURF/etc. are definitely options, I'm currently exploring OCR and its limits. Given enough training, SIFT/etc. would work just fine. However, we would first have to gather a lot of data, which isn't possible in our case. The data I'm working with reaches us first and only later becomes popular and available through Google Images, so scraping the internet might not be of much help to us.

On Tuesday, December 22, 2015 at 4:51:14 PM UTC-5, Tom Morris wrote:
>
> On Tuesday, December 22, 2015 at 2:04:26 AM UTC-5, Utkarsh Sinha wrote:
>>
>> I'm trying to find out why Tesseract is rejecting certain blobs from the
>> image here. The text "nestle" and "nesquik" have overlapping baselines. I
>> suspect the overlap might be causing it to stop recognizing anything at all.
>>
>
> They're not only overlapping, but they are at something like a 30 degree
> angle to each other. It doesn't surprise me that Tess considers that an
> unreasonable amount of interline skew. Where would one see that in a
> normal text layout? Additionally, the "Nesquick" isn't really text, but a
> stylized logotype.
>
> Perhaps consider using SIFT/SURF/etc detectors from OpenCV?
>
> Tom
>

