[tesseract-ocr] Re: How to extract bounding box only? If I do not need the word/characters classifier.

Quan Nguyen Fri, 30 Oct 2015 03:23:29 -0700

Have you tried the GetComponentImages example?

https://code.google.com/p/tesseract-ocr/wiki/APIExample
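
For reference, here is a rough Python sketch of that idea, assuming the third-party tesserocr bindings are installed (the original poster may be using a different wrapper or the C++ API directly). GetComponentImages can be called without running recognition, so, as far as I know, only layout analysis is performed. The helper names component_boxes and text_area_ratio are made up for illustration, and the ratio simply sums box areas, so overlapping boxes would be counted twice.

```python
from typing import Dict, Iterable, List

# Box layout assumed to match what tesserocr returns: x, y, w, h in pixels.
Box = Dict[str, int]


def component_boxes(image_path: str, lang: str = "eng") -> List[Box]:
    """Return text-line boxes without calling the recognizer (hypothetical helper)."""
    # Imported lazily so text_area_ratio below works without tesserocr installed.
    from tesserocr import PyTessBaseAPI, RIL

    with PyTessBaseAPI(lang=lang) as api:
        api.SetImageFile(image_path)
        # Each entry is (image, box, block_id, paragraph_id).
        components = api.GetComponentImages(RIL.TEXTLINE, True)
        return [box for _, box, _, _ in components]


def text_area_ratio(boxes: Iterable[Box], width: int, height: int) -> float:
    """Fraction of the image covered by boxes (overlaps are counted twice)."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    covered = sum(b["w"] * b["h"] for b in boxes)
    # Cap at 1.0 because double-counted overlaps could exceed the image area.
    return min(1.0, covered / (width * height))
```

Typical use would be text_area_ratio(component_boxes("page.png"), width, height), with width and height taken from the image itself. Whether this is fast enough for Chinese or Russian pages is something you would need to measure.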


On Wednesday, October 28, 2015 at 3:46:14 AM UTC-5, [email protected] 
wrote:
>
> Hi,
>
> First, I have very little knowledge of OCR or Tesseract.
>
> We use Tesseract OCR to detect the text area of a given image, which we use
> to calculate image quality (the smaller the text area ratio, the better). We
> don't use the recognized text, only the bounding boxes of words.
>
> The problem is that some images contain a lot of Chinese or
> Russian characters. OCR on them often takes more than 20 seconds, which is
> unacceptable. As an online interactive service, we cannot make our
> customers wait that long.
>
> Are there parameters I can tweak to speed up OCR, given that we only need
> the text box areas? Or should I just call a method that performs only
> "page layout analysis"?
> Assume the text in the images is rarely rotated. The images come from
> customers' websites, and their readability is not bad.
>
> Please help.
>
>

