Thank you for the detailed info. My suggestion is to try recognition with eng.traineddata from the tessdata_fast repository with --oem 1.
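
To make the suggestion concrete, here is a minimal sketch of how the run could be scripted and timed. It assumes the `tesseract` binary is on the PATH and that eng.traineddata from https://github.com/tesseract-ocr/tessdata_fast has been saved into a local directory. The directory name `tessdata_fast`, the file names `page.png` and `page_out`, and the helper names `build_command` and `time_page` are placeholders of mine, not anything from your setup. `--oem 1` selects the LSTM engine only.

```python
import shlex
import subprocess
import time
from pathlib import Path


def build_command(image, out_base, tessdata_dir, lang="eng", oem=1):
    """Return the tesseract argument list for recognizing one page."""
    return [
        "tesseract",
        str(image),                           # input image
        str(out_base),                        # output base name (tesseract appends .txt)
        "--tessdata-dir", str(tessdata_dir),  # directory holding eng.traineddata
        "-l", lang,
        "--oem", str(oem),                    # 1 = LSTM engine only
    ]


def time_page(cmd):
    """Run one tesseract command and return the wall-clock seconds it took."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


# Hypothetical paths: adjust to wherever the image and traineddata actually live.
cmd = build_command("page.png", "page_out", Path("tessdata_fast"))
print(shlex.join(cmd))
```

Calling `time_page(cmd)` once with the current traineddata directory and once with the tessdata_fast one should show whether the per-page time changes on your images.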

On Tue 3 Apr, 2018, 3:13 AM Patrick Ramsey, <[email protected]> wrote:

> Answers below inline. And thank you very much for your help :)
>
> |PTR
>
> On Friday, March 30, 2018 at 2:00:18 AM UTC-7, shree wrote:
>>
>> Please check GitHub/issues for similar reports and suggestions.
>>
>> Also specify,
>>
>> Which version/commit of tesseract 4
>
> commit hash: 40f43111e05b3dd2f2f8aeae3aba33016523c881
> tag: 4.0.0-beta.1
>
>> Which traineddata file, from which repo
>
> eng.traineddata from https://github.com/tesseract-ocr/tessdata at commit
> 9b2e3f6642285b3e9a7a5852e5b10259e42d5510
>
>> Which o/s
>
> Ubuntu 17.10 on amd64
>
>> tesseract -v
>
> tesseract 4.0.0-beta.1
>  leptonica-1.74.4
>   libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.1) : libpng 1.6.34 :
>   libtiff 4.0.8 : zlib 1.2.11 : libwebp 0.6.0 : libopenjp2 2.2.0
>
>  Found AVX2
>  Found AVX
>  Found SSE
>
>> On Fri 30 Mar, 2018, 2:19 PM Patrick Ramsey, <[email protected]>
>> wrote:
>>
>>> Hi!
>>>
>>> So, I am running tesseract4 on clean, 1-bit images of rasterized text
>>> (not printed and scanned). I'm getting very accurate output, as expected,
>>> but tesseract is taking about 1 second to process a single page on a core
>>> i7 cpu, and that seems a lot longer than I'd have expected.
>>>
>>> I've been trying to enable debug output so that I can see what's taking
>>> the most time, to see if there is anything that I could get away with
>>> turning off to speed it up (since I don't need to account for e.g. dirt on
>>> the lens), but thus far I'm feeling pretty stupid. So:
>>>
>>> A) is there any straightforward way to get more information on what
>>> tesseract is actually doing? (I've built with --enable-debug and it doesn't
>>> seem to have changed the output on the command line)
>>> B) are there any control parameters you folks would suggest setting to
>>> speed up image processing/turn off unnecessary work, given the inputs I've
>>> described?
>>>
>>> Many thanks,
>>>
>>> PTR
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/893cf5f7-8f64-428e-b1fe-5e6214215059%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.

