[tesseract-ocr] Multiple pages in parallel?

Matthew Lai Sat, 10 Mar 2018 11:51:03 -0800

Hello!

According to the FAQ[1], if I run tesseract on a multi-page image, it 
should process the pages in parallel.

I am converting a single 10-page TIFF file into a PDF, and looking at *top*, 
tesseract never seems to use more than about 250% CPU (my machine has 16 
cores / 32 threads).

Am I doing something wrong?

tesseract combined.tif out pdf
Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica
Page 1
Page 2
Page 3
Page 4
Page 5
Page 6
Page 7
Page 8
Page 9
Page 10
OSD: Weak margin (6.98) for 914 blob text block, but using orientation anyway: 0

tesseract -v (from Debian Testing):
tesseract 4.00.00alpha
 leptonica-1.74.1
  libgif 5.1.4 : libjpeg 6b (libjpeg-turbo 1.5.1) : libpng 1.6.28 : libtiff 4.0.8 : zlib 1.2.8 : libwebp 0.5.2 : libopenjp2 2.1.2

 Found AVX
 Found SSE
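
For comparison, one way to keep every core busy regardless of what a single
tesseract run does internally is to split the TIFF into single-page files and
start one tesseract process per page. The following is only a rough sketch of
that workaround, not something the FAQ describes. It assumes the pages were
split beforehand (for example with libtiff's tiffsplit, which would produce
names like page_aaa.tif), and that the build honours the standard OpenMP
variable OMP_THREAD_LIMIT. The helper names build_command and ocr_pages and
the "out" directory are made up for illustration.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(page, out_dir):
    """Return the tesseract command that turns one single-page TIFF into a PDF."""
    # Same invocation as in the message, but per page:
    #   tesseract <image> <output base> pdf
    return ["tesseract", str(page), str(Path(out_dir) / Path(page).stem), "pdf"]


def ocr_pages(pages, out_dir, workers=None, run=subprocess.run):
    """Run one tesseract process per page, at most `workers` at a time.

    `run` is injectable so the function can be exercised without tesseract.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: limiting each process to one OpenMP thread avoids the
    # processes competing with each other for cores.
    env = dict(os.environ, OMP_THREAD_LIMIT="1")

    def ocr_one(page):
        page = Path(page)
        run(build_command(page, out_dir), check=True, env=env)
        # tesseract appends ".pdf" to the output base name
        return out_dir / (page.stem + ".pdf")

    # Threads are sufficient: each worker only waits on an external process.
    with ThreadPoolExecutor(max_workers=workers or os.cpu_count()) as pool:
        return list(pool.map(ocr_one, pages))


if __name__ == "__main__":
    # Assumes something like `tiffsplit combined.tif page_` was run first.
    pages = sorted(Path(".").glob("page_*.tif"))
    if pages:
        for pdf in ocr_pages(pages, "out"):
            print(pdf)
```

The per-page PDFs would still have to be merged into one file afterwards
(a tool such as pdfunite from poppler-utils could do that), so this trades a
single command for a small pipeline.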

Thanks!
Matthew

[1]: https://github.com/tesseract-ocr/tesseract/wiki/FAQ#can-i-increase-speed-of-ocr

