[tesseract-ocr] Re: Tesseract seems to be removing correctly segmented and oriented blocks for the final classification

Utkarsh Sinha Tue, 22 Dec 2015 20:42:11 -0800

Tom, we did set "Force parallel baselines" to false. I was hoping that 
would keep Tesseract from discarding Nesquik. Are there any other parameters 
I can try tweaking?
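
For reference, here is a minimal sketch of how that setting can be passed on the 
command line. It assumes the variable behind "Force parallel baselines" is 
textord_parallel_baselines and that it is overridden with the CLI's -c 
name=value option. The helper name tesseract_command and the file name 
label.png are made up for illustration, and the script only builds and prints 
the command; it does not run Tesseract.

```python
import shlex


def tesseract_command(image_path, output_base="stdout", overrides=None):
    """Build an argv list for the tesseract CLI with -c variable overrides.

    Hypothetical helper for illustration; it is not part of Tesseract.
    """
    cmd = ["tesseract", image_path, output_base]
    for name, value in (overrides or {}).items():
        # Write booleans as 0/1 on the command line.
        if isinstance(value, bool):
            value = int(value)
        cmd += ["-c", f"{name}={value}"]
    return cmd


# Assumed variable name for the "Force parallel baselines" setting.
cmd = tesseract_command(
    "label.png",
    overrides={"textord_parallel_baselines": False},
)
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run as-is, and further 
variables can be tried by adding entries to the overrides dict.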


While SIFT/SURF/etc. are definitely options, I'm currently exploring OCR 
and its limits. Given enough training, SIFT/etc. would work just fine. 
However, we would first have to gather a lot of data, which isn't possible 
in our case. The data I'm working with reaches us first and only later 
becomes popular and available through Google Images. So scraping the 
internet might not be of much help to us.

On Tuesday, December 22, 2015 at 4:51:14 PM UTC-5, Tom Morris wrote:
>
> On Tuesday, December 22, 2015 at 2:04:26 AM UTC-5, Utkarsh Sinha wrote:
>>
>> I'm trying to find out why Tesseract is rejecting certain blobs from the 
>> image here. The text "nestle" and "nesquik" have overlapping baselines. I 
>> suspect the overlap might be causing it to stop recognizing anything at all.
>>
>
> They're not only overlapping, but they are at something like a 30 degree 
> angle to each other.  It doesn't surprise me that Tess considers that an 
> unreasonable amount of interline skew.  Where would one see that in a 
> normal text layout? Additionally, the "Nesquik" isn't really text, but a 
> stylized logotype.
>
> Perhaps consider using SIFT/SURF/etc detectors from OpenCV?
>
> Tom
>
