Thanks - I will investigate further. An initial test that I ran based on Tom's input showed about the same performance (I used a multi-page TIFF); however, the article you referenced indicated a 2x speedup.
Is there a way to have Tesseract process the pages in parallel?

On Thursday, February 18, 2016 at 9:58:12 PM UTC-5, Quan Nguyen wrote:
>
> If you can reduce or minimize initializing and disposing of Tesseract
> native instances for every run, you can achieve a significant performance
> increase.
>
> https://sourceforge.net/p/tess4j/discussion/1202294/thread/d32bd579/
>
> On Sunday, February 14, 2016 at 10:15:12 AM UTC-6, viraf wrote:
>>
>> I am new to Tesseract and am using it through Tess4J. I am trying to OCR
>> faxes where pages are represented as TIFF (CCITT T.6) images - 2509 x 3530
>> @ 300 dpi (1 bit - i.e. BW).
>>
>> I have two sets of questions
>>
>> *Speed*
>> On an Intel i7-4800MQ @ 2.7GHz I am getting approximately 6 PPM using 1
>> thread. I was looking for suggestions on how to speed up page processing.
>> I use parallelStream to process each page in a separate thread,
>>
>>
>> - viraf
>>
>
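To make the quoted advice about avoiding repeated initialization and disposal concrete, here is a rough sketch of one way to combine it with parallel page processing: each worker thread in a fixed pool lazily creates its own engine and reuses it for every page it handles. This is only an illustration under assumptions. `OcrEngine`, `newEngine`, and `ocrAll` are hypothetical names, not Tess4J API, and the stand-in engine does no real OCR. Whether one Tesseract instance can safely be shared between threads is not established in this thread, so the sketch keeps one engine per thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class ParallelOcrSketch {

    // Hypothetical stand-in for an OCR engine; NOT a real Tess4J type.
    interface OcrEngine {
        String recognize(String page);
    }

    // Counts engine constructions so the reuse is observable.
    static final AtomicInteger enginesCreated = new AtomicInteger();

    // Hypothetical factory; a real one would do the expensive native setup here.
    static OcrEngine newEngine() {
        enginesCreated.incrementAndGet();
        return page -> "text of " + page;
    }

    // One engine per worker thread, created on first use and reused afterwards.
    static final ThreadLocal<OcrEngine> ENGINE =
            ThreadLocal.withInitial(ParallelOcrSketch::newEngine);

    // Submits one task per page and returns the results in page order.
    static List<String> ocrAll(List<String> pages, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String page : pages) {
                Callable<String> task = () -> ENGINE.get().recognize(page);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // blocks until that page is done
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> pages = List.of("page-1", "page-2", "page-3",
                                     "page-4", "page-5", "page-6");
        List<String> text = ocrAll(pages, 2);
        System.out.println(text.size() + " pages, "
                + enginesCreated.get() + " engines created");
    }
}
```

The number of engines created is bounded by the pool size rather than by the page count, which is the point of the quoted suggestion. A real version would also need to dispose of each engine when the pool shuts down, which this sketch leaves out.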

