Have you tried the GetComponentImages example? https://code.google.com/p/tesseract-ocr/wiki/APIExample
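Once you have the boxes, the text-area ratio you describe can be computed from them. Here is a minimal C++ sketch. It assumes the word (or text-line) boxes have already been obtained, for example from `GetComponentImages`, and copied into a plain struct. `WordBox` and `TextAreaRatio` are hypothetical names for illustration, not part of the Tesseract or Leptonica API. It counts covered pixels with a mask so that overlapping boxes are not double-counted.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical plain box type (not Tesseract's/Leptonica's BOX):
// (x, y) is the top-left corner, w and h are width and height in pixels.
struct WordBox {
  int x, y, w, h;
};

// Fraction of the image covered by the union of the boxes.
// Overlapping areas are counted once; boxes are clipped to the image bounds.
// Returns 0.0 for an empty (zero-sized) image.
double TextAreaRatio(const std::vector<WordBox>& boxes, int img_w, int img_h) {
  if (img_w <= 0 || img_h <= 0) return 0.0;
  // One byte per pixel: 1 if some box already covers it.
  std::vector<std::uint8_t> covered(static_cast<std::size_t>(img_w) * img_h, 0);
  std::size_t count = 0;
  for (const WordBox& b : boxes) {
    const int x0 = std::max(b.x, 0);
    const int y0 = std::max(b.y, 0);
    const int x1 = std::min(b.x + b.w, img_w);
    const int y1 = std::min(b.y + b.h, img_h);
    for (int y = y0; y < y1; ++y) {
      for (int x = x0; x < x1; ++x) {
        std::uint8_t& cell = covered[static_cast<std::size_t>(y) * img_w + x];
        if (!cell) {  // count each pixel only the first time it is covered
          cell = 1;
          ++count;
        }
      }
    }
  }
  return static_cast<double>(count) /
         (static_cast<double>(img_w) * static_cast<double>(img_h));
}
```

For example, two 10x10 boxes at (0, 0) and (5, 5) in a 100x100 image overlap by 25 pixels, so they cover 175 of 10,000 pixels and the ratio is 0.0175. Simply summing box areas would give 0.02. The mask costs one byte per image pixel. That is usually acceptable next to the OCR itself, and it keeps the overlap handling obviously correct.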
On Wednesday, October 28, 2015 at 3:46:14 AM UTC-5, [email protected] wrote:
>
> Hi,
>
> First, I have very little knowledge about OCR/Tesseract.
>
> We use Tesseract OCR to detect the text area of a given image, which is used for calculating image quality (the smaller the text-area ratio, the better). We don't use the recognized text, only the bounding boxes of words.
>
> The problem is that in some cases there are a lot of Chinese or Russian characters in the images. OCR then often takes more than 20 seconds, which is unacceptable. As an online interactive service, we cannot let our users, who are our customers, wait that long.
>
> Are there parameters I can tweak to speed up OCR if we only need the text box areas? Or should I just call the method that performs "page layout analysis"?
> Assume the text in the images is rarely rotated. The images come from customers' websites, and their readability is not bad.
>
> Please help.

