If you're using Linux, "man gprof" explains how to collect
profile data that shows where the program is spending its
time.  Building with debugging enabled lets you step through
the code as it runs, but that gives only a rough (and
possibly inaccurate) idea of what takes a long time to
compute.
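
As a rough sketch of the gprof route: compile and link with
-pg, run the program once so that it writes gmon.out into the
current directory, then run "gprof <binary> gmon.out" to get
the report.  For tesseract I'd assume the flag can be passed
through CXXFLAGS and LDFLAGS at configure time, and that a
static build makes the library code show up in the report,
but I haven't checked either.  The Python helper below is
only an illustration (the function names are made up): it
runs gprof and pulls the most expensive functions out of the
"flat profile" table, assuming the usual column layout
(% time, cumulative seconds, self seconds, calls, two
per-call columns, name).

```python
import re
import subprocess
from typing import List, NamedTuple


class FlatEntry(NamedTuple):
    percent: float       # share of total run time spent in the function
    self_seconds: float  # time spent in the function itself
    name: str            # function name as printed by gprof


# One flat-profile row: three numeric columns, then optionally the
# calls / per-call columns (gprof leaves them blank for functions
# that were not compiled with -pg), then the function name.
_ROW = re.compile(
    r"^\s*(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+"
    r"(?:(\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+)?(\S.*)$"
)


def run_gprof(binary: str, gmon: str = "gmon.out") -> str:
    """Run gprof on a -pg binary and its gmon.out; return the report text."""
    result = subprocess.run(
        ["gprof", binary, gmon], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_flat_profile(report: str) -> List[FlatEntry]:
    """Extract the rows of the flat profile table from a gprof report."""
    entries: List[FlatEntry] = []
    for line in report.splitlines():
        match = _ROW.match(line)
        if match:
            entries.append(
                FlatEntry(
                    percent=float(match.group(1)),
                    self_seconds=float(match.group(3)),
                    name=match.group(7).strip(),
                )
            )
        elif entries:
            # First non-row line after the table started: the table is over.
            break
    return entries


def hottest(report: str, n: int = 5) -> List[FlatEntry]:
    """Return the n functions with the largest self time."""
    rows = parse_flat_profile(report)
    return sorted(rows, key=lambda e: e.self_seconds, reverse=True)[:n]
```

Something like hottest(run_gprof("./tesseract")) would then
list the top five functions, where "./tesseract" stands in
for wherever the profiled binary actually ends up.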

If you don't want to rebuild tesseract with profiling
enabled, the "oprofile" package on Linux can collect
profiling data from the existing binary.  It's more
complicated than gprof, but also much more powerful.
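
For the oprofile route, a minimal sketch, assuming a version
of oprofile recent enough to ship the "operf" tool: run the
unmodified program under operf, then ask opreport for a
per-symbol breakdown.  The helper name and the image/output
names in the usage note are placeholders, and the symbol
names will only be as good as the debug information in the
binary.

```python
import subprocess
from typing import List, Sequence


def oprofile_commands(program: Sequence[str]) -> List[List[str]]:
    """Build the two commands for a single-process oprofile run.

    `program` is the command line to profile, e.g. the tesseract
    binary followed by its arguments.
    """
    if not program:
        raise ValueError("program command line must not be empty")
    return [
        ["operf", *program],        # collects samples into ./oprofile_data
        ["opreport", "--symbols"],  # per-symbol report from that data
    ]


def profile(program: Sequence[str]) -> None:
    """Run the program under operf, then print the per-symbol report."""
    for command in oprofile_commands(program):
        subprocess.run(command, check=True)
```

Calling profile(["tesseract", "page.png", "out"]) would
profile one OCR run on page.png and print the report
afterwards.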

Cheers,
Rob Komar

On Thu, 29 Mar 2018, Patrick Ramsey wrote:

Hi!

So, I am running tesseract4 on clean, 1-bit images of
rasterized text (not printed and scanned).  I'm getting very
accurate output, as expected, but tesseract is taking
about 1 second to process a single page on a Core i7 CPU,
and that seems a lot longer than I'd have expected.

I've been trying to enable debug output so that I can see
what's taking the most time, to see if there is anything
that I could get away with turning off to speed it up
(since I don't need to account for e.g. dirt on the lens),
but thus far I'm feeling pretty stupid.  So:

A) is there any straightforward way to get more
information on what tesseract is actually doing? (I've
built with --enable-debug and it doesn't seem to have
changed the output on the command line)
B) are there any control parameters you folks would
suggest setting to speed up image processing/turn off
unnecessary work, given the inputs I've described?

Many thanks,

PTR
