On Mon, Feb 15, 2016 at 8:24 PM, viraf <[email protected]> wrote:
> Tom, the images are TIFF (CCITT T.6) images - 2509 x 3530 @ 300 dpi (1 bit
> - i.e. BW). The language is English.

So, roughly the same resolution and format as I used, but only 1/4 the speed. My test machine calls itself a mid-2014 MBP with 2.5 GHz Intel Core i7 (and no, it's not using OpenCL, the GPU, or multiple threads).

> I am using Tess4j 3.0, which includes Tesseract 3.0.4. I am instantiating
> a new Tesseract object for each page, however the cost was minimal (74ms)
> for the total run.

I'm not familiar with the Tess4J wrapper, but that sounds pretty low for initialization cost. Are you sure you're measuring the true cost (i.e. you're not being fooled by lazy initialization)? What happens when you combine all the pages into a single multi-page TIFF and OCR it (so you can be sure you've amortized the initialization cost)?

> When you state "taking a big hit on image processing" how would I be able
> to isolate the issue to image processing?

I was mainly talking about operations like thresholding, format conversion, etc. to get to a usable image. That's obviously not applicable if you're working with bitonal images (which you hadn't disclosed when I wrote my reply).
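
To illustrate the lazy-initialization concern, here is a minimal Java sketch. It does not use Tess4J at all: `LazyEngine`, `recognize`, and `elapsedMillis` are hypothetical names made up for this example, and the "model load" is just a simulated sleep. The only point is that if a wrapper defers its expensive setup until the first real call, timing the constructor alone will report a misleadingly small number, so the first and later calls should be timed separately.

```java
public class InitTiming {

    /** Hypothetical stand-in for an engine that defers its setup to first use. */
    static final class LazyEngine {
        private final long loadMillis;
        private boolean loaded = false;

        LazyEngine(long loadMillis) {
            // Cheap constructor: nothing expensive happens here.
            this.loadMillis = loadMillis;
        }

        String recognize(String page) {
            if (!loaded) {
                pause(loadMillis); // simulated one-time setup cost
                loaded = true;
            }
            return page.toUpperCase(); // placeholder for the real work
        }

        boolean isLoaded() {
            return loaded;
        }
    }

    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Wall-clock time of a single task, in whole milliseconds. */
    static long elapsedMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        final LazyEngine[] holder = new LazyEngine[1];

        // Timing only the constructor misses the deferred setup entirely.
        long construct = elapsedMillis(() -> holder[0] = new LazyEngine(200));
        // The first call pays for the setup; the second one does not.
        long firstCall = elapsedMillis(() -> holder[0].recognize("page 1"));
        long secondCall = elapsedMillis(() -> holder[0].recognize("page 2"));

        System.out.printf("construct=%d ms, first call=%d ms, second call=%d ms%n",
                construct, firstCall, secondCall);
    }
}
```

If the first call is consistently much slower than the following ones, the setup cost is hiding there rather than in the constructor. Whether Tess4J actually behaves this way is something the measurement has to show; the sketch assumes nothing about it.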

