I created a large (1,800-page) multi-page TIFF and am feeding it to Tesseract via the command line (on Ubuntu) to measure Tesseract's performance directly. I am still getting only about 5-6 PPM (pages per minute). I will run the test on another machine to see whether the performance is the same. Is this the performance you are seeing for similar pages (details earlier in this thread)? It is roughly 25% of the throughput of a commercial engine I am evaluating, which gets about 24 PPM with 2 cores on my laptop, and the commercial engine's accuracy is also significantly better.
- viraf

On Friday, February 19, 2016 at 7:50:09 AM UTC-5, viraf wrote:
> Thanks. I will investigate further. The initial test I ran based on
> Tom's input showed around the same performance (I used a multi-page TIFF);
> however, the article you referenced indicated a speedup factor of 2x.
>
> Is there a way to have Tesseract process the pages in parallel?
>
> On Thursday, February 18, 2016 at 9:58:12 PM UTC-5, Quan Nguyen wrote:
>> If you can reduce or minimize initializing and disposing of Tesseract
>> native instances for every run, you can achieve a significant performance
>> increase.
>>
>> https://sourceforge.net/p/tess4j/discussion/1202294/thread/d32bd579/
>>
>> On Sunday, February 14, 2016 at 10:15:12 AM UTC-6, viraf wrote:
>>> I am new to Tesseract and am using it through Tess4J. I am trying to OCR
>>> faxes where pages are represented as TIFF (CCITT T.6) images: 2509 x 3530
>>> @ 300 dpi (1 bit, i.e. black and white).
>>>
>>> I have two sets of questions.
>>>
>>> *Speed*
>>> On an Intel i7-4800MQ @ 2.7 GHz I am getting approximately 6 PPM using 1
>>> thread. I was looking for suggestions on how to speed up page processing.
>>> I use parallelStream to process each page in a separate thread.
>>>
>>> - viraf

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/9e976f46-8205-4a11-9c17-b6616c46a85b%40googlegroups.com.
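Quan's suggestion (initialize the native engine once and reuse it) and the question about processing pages in parallel can be combined into one pattern: a fixed pool of worker threads, each owning a single engine created on first use. Below is a minimal Java sketch of that pattern under stated assumptions; it is not Tess4J's API. `OcrEngine`, `createEngine` and `OcrBatch` are hypothetical names, and `recognize` returns a placeholder string where a real program would call into Tesseract. It also assumes the engine object should not be shared between threads, which is why each worker gets its own through `ThreadLocal`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class OcrBatch {

    // Hypothetical stand-in for a native OCR handle (not a real Tess4J type).
    interface OcrEngine {
        String recognize(String page);
    }

    // Counts engine initializations so that reuse can be observed.
    static final AtomicInteger ENGINES_CREATED = new AtomicInteger();

    // Hypothetical factory: real code would do the expensive native
    // initialization here (data path, language, ...).
    static OcrEngine createEngine() {
        ENGINES_CREATED.incrementAndGet();
        return page -> "text of " + page; // placeholder instead of real OCR
    }

    // Each worker thread lazily creates one engine and reuses it for every
    // page it processes, instead of initializing/disposing per page.
    private static final ThreadLocal<OcrEngine> ENGINE =
            ThreadLocal.withInitial(OcrBatch::createEngine);

    // OCRs all pages on a fixed pool of `workers` threads; results keep page order.
    static List<String> ocrAll(List<String> pages, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String page : pages) {
                Callable<String> task = () -> ENGINE.get().recognize(page);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // waits for each page in submission order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    // Throughput in pages per minute (PPM) from wall-clock nanoseconds.
    static double pagesPerMinute(int pages, long elapsedNanos) {
        return pages / (elapsedNanos / 60e9);
    }

    public static void main(String[] args) {
        List<String> pages = new ArrayList<>();
        for (int i = 1; i <= 20; i++) {
            pages.add("page-" + i);
        }
        long start = System.nanoTime();
        List<String> texts = ocrAll(pages, 4);
        long elapsed = System.nanoTime() - start;
        System.out.println(texts.size() + " pages, " + ENGINES_CREATED.get() + " engines created");
        System.out.printf("%.1f PPM%n", pagesPerMinute(texts.size(), elapsed));
    }
}
```

A fixed pool makes the worker count explicit, whereas `parallelStream` runs on the shared common `ForkJoinPool`, which is sized from the number of available processors by default. Setting `workers` to the core count also makes the PPM figure easier to compare with the commercial engine's 2-core number. How much this helps in practice depends on how much of the per-page time was spent on initialization in the first place.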

