If you're using Linux, then "man gprof" will tell you how to get profile data that shows where the program is spending its time. Enabling debugging will help you step through the code as it runs, but that gives only a rough (and possibly inaccurate) guess about what takes a long time to compute.
If you don't want to rebuild tesseract with profiling enabled, then the "oprofile" package on Linux can be used to get profiling data. It's more complicated than gprof, but also much more powerful.

Cheers,
Rob Komar

On Thu, 29 Mar 2018, Patrick Ramsey wrote:
Hi!

So, I am running tesseract4 on clean, 1-bit images of rasterized text (not printed and scanned). I'm getting very accurate output, as expected, but tesseract is taking about 1 second to process a single page on a Core i7 CPU, and that seems a lot longer than I'd have expected. I've been trying to enable debug output so that I can see what's taking the most time, to see if there is anything that I could get away with turning off to speed it up (since I don't need to account for e.g. dirt on the lens), but thus far I'm feeling pretty stupid. So:

A) Is there any straightforward way to get more information on what tesseract is actually doing? (I've built with --enable-debug and it doesn't seem to have changed the output on the command line.)

B) Are there any control parameters you folks would suggest setting to speed up image processing/turn off unnecessary work, given the inputs I've described?

Many thanks,
PTR

