Thanks for the clarification. I now know that 24 PPM on a single thread should be achievable. I'll update the post after trying a few options. Thanks for your help.
- viraf

On Tuesday, February 16, 2016 at 1:53:40 AM UTC-5, Tom Morris wrote:
>
> On Mon, Feb 15, 2016 at 8:24 PM, viraf <[email protected]> wrote:
>
>> Tom, the images are TIFF (CCITT T.6) images - 2509 x 3530 @ 300 dpi (1 bit - i.e. BW). The language is English.
>
> So, roughly the same resolution and format as I used, but only 1/4 the speed. My test machine calls itself a mid-2014 MBP with 2.5 GHz Intel Core i7 (and no, it's not using OpenCL, the GPU, or multiple threads).
>
>> I am using Tess4j 3.0, which includes Tesseract 3.0.4. I am instantiating a new Tesseract object for each page, however the cost was minimal (74ms) for the total run.
>
> I'm not familiar with the Tess4J wrapper, but that sounds pretty low for initialization cost. Are you sure you're measuring the true cost (i.e. you're not being fooled by lazy initialization)? What happens when you combine all the pages into a single multi-page TIFF and OCR it (so you can be sure you've amortized the initialization cost)?
>
>> When you state "taking a big hit on image processing" how would I be able to isolate the issue to image processing?
>
> I was mainly talking about operations like thresholding, format conversion, etc. to get to a usable image. That's obviously not applicable if you're working with bitonal images (which you hadn't disclosed when I wrote my reply).
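
A rough way to check for the lazy-initialization effect Tom describes is to time each page separately on one reused engine and compare the first call against the later ones. The sketch below is only an illustration: `FakeEngine`, its sleep durations, and the page names are hypothetical stand-ins (not Tess4J or Tesseract calls), so the numbers it prints say nothing about real OCR speed. Swap in the real per-page OCR call to measure an actual setup.

```python
import time
from statistics import mean


def time_pages(ocr_page, pages):
    """Time each call and return (first_call_seconds, mean_of_later_calls)."""
    timings = []
    for page in pages:
        start = time.perf_counter()
        ocr_page(page)
        timings.append(time.perf_counter() - start)
    first, rest = timings[0], timings[1:]
    return first, (mean(rest) if rest else 0.0)


class FakeEngine:
    """Hypothetical stand-in for an OCR wrapper that loads its data lazily."""

    def __init__(self):
        # Construction is cheap here: nothing is loaded until the first OCR call.
        self.loaded = False
        self.load_count = 0

    def ocr(self, page):
        if not self.loaded:
            time.sleep(0.05)   # simulated one-time model load (assumed cost)
            self.loaded = True
            self.load_count += 1
        time.sleep(0.005)      # simulated per-page recognition (assumed cost)
        return f"text of {page}"


engine = FakeEngine()
pages = ["p1.tif", "p2.tif", "p3.tif", "p4.tif"]  # hypothetical file names
first, steady = time_pages(engine.ocr, pages)
print(f"first call: {first * 1000:.0f} ms, later calls: {steady * 1000:.0f} ms each")
```

If the first call is much slower than the rest, the constructor timing alone understates the initialization cost, and creating a new engine per page would pay that cost on every page.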

