Hey, I have the same two errors in an image that seems pretty readable. I have Tesseract v3.02.02 with Leptonica.
Preprocessed image is attached.

On Monday, January 27, 2014 at 2:04:35 AM UTC-6, zdenop wrote:
>
> As always - if you do not provide a sample image for your problem, nobody
> will look at the problem...
>
> Zdenko
>
>
> On Mon, Jan 27, 2014 at 8:46 AM, <[email protected]> wrote:
>
>> Hi everyone,
>>
>> I know this is quite an old topic by now, but this question still stands
>> and I saw no reason to create a new one for it.
>>
>> I use tesseract 3.0.2 (with leptonica 1.67, which was the recommended
>> version at the time of the installation) on CentOS 6.5. I convert large
>> PDF files to separate page PNGs, then use tesseract to scan for specific
>> keywords.
>>
>> A few pages have given me the following errors (the errors always come
>> together):
>>
>> Error in boxClipToRectangle: box outside rectangle
>> Error in pixScanForForeground: invalid box
>>
>> These pages seem to be OCRed correctly, with more or less the same
>> precision as the rest of the pages (~96% characters recognised), but I have
>> only found three pages with these errors so my sample is not very
>> significant.
>>
>> What do these errors mean?
>> Do these hint at a user error?
>> Is there any possibility they can mean a loss of precision?
>>
>> Thanks in advance for the help. :)
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "tesseract-ocr" group.
>> To post to this group, send email to [email protected]
>> To unsubscribe from this group, send email to [email protected]
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en
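
In case it helps anyone else hitting the same two messages, here is a rough Python sketch of the workflow described in the quoted post (OCR each page PNG, search for keywords) that also records which pages triggered the Leptonica errors. It is only an illustration under some assumptions: the function names and the report layout are made up for this example, the `tesseract` binary is assumed to be on the PATH and called as `tesseract <image> <outputbase>`, and the error messages are assumed to show up on stderr.

```python
import subprocess
from pathlib import Path

# The two Leptonica messages quoted in this thread.
LEPTONICA_ERRORS = (
    "Error in boxClipToRectangle: box outside rectangle",
    "Error in pixScanForForeground: invalid box",
)


def find_leptonica_errors(stderr_text):
    """Return the known error messages that occur in the given stderr text."""
    return [msg for msg in LEPTONICA_ERRORS if msg in stderr_text]


def ocr_page(png_path, out_base):
    """Run tesseract on one page image and return (text, errors).

    Assumption: tesseract writes its result to <out_base>.txt and prints
    the Leptonica messages to stderr.
    """
    result = subprocess.run(
        ["tesseract", str(png_path), str(out_base)],
        capture_output=True,
        text=True,
        check=False,
    )
    txt_path = Path(f"{out_base}.txt")
    # If tesseract produced no output file, treat the page as empty.
    text = ""
    if txt_path.exists():
        text = txt_path.read_text(encoding="utf-8", errors="replace")
    return text, find_leptonica_errors(result.stderr)


def scan_pages(page_dir, keywords):
    """OCR every PNG in page_dir; report keyword hits and flagged pages."""
    report = {}
    for png in sorted(Path(page_dir).glob("*.png")):
        # Use the image path without its extension as the output base.
        text, errors = ocr_page(png, png.with_suffix(""))
        lowered = text.lower()
        hits = [kw for kw in keywords if kw.lower() in lowered]
        report[png.name] = {"keywords": hits, "leptonica_errors": errors}
    return report
```

Calling something like `scan_pages("pages", ["invoice", "total"])` (directory and keywords are placeholders) would give one entry per page, so the pages with a non-empty `leptonica_errors` list can be collected and attached as samples.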

