Without AVX support, Tesseract 4/5 will be slow(er), so try to focus on this first. Using more than one language will slow down OCR too.
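A minimal sketch in Python of how both points could be checked. It assumes that `tesseract -v` reports SIMD support on lines of the form `Found AVX2`, as in the version output quoted further down in this thread. The helper names (`simd_features`, `tesseract_version_text`, `slowdown`) are made up for illustration, and the timings are copied from the benchmark quoted below, not measured again.

```python
import subprocess


def simd_features(version_text):
    """Collect names from 'Found <NAME>' lines of `tesseract -v` output.

    Lines with more than two tokens (for example the libarchive line)
    are skipped, so only the SIMD feature names remain.
    """
    features = []
    for line in version_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "Found":
            features.append(parts[1])
    return features


def tesseract_version_text(exe="tesseract"):
    """Run `<exe> -v` and return its output, or '' if exe is not found."""
    try:
        result = subprocess.run(
            [exe, "-v"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # some builds print the version on stderr
            text=True,
            check=False,
        )
    except OSError:
        return ""
    return result.stdout


# Durations copied from the quoted benchmark (-l eng versus -l eng+rus).
ENG = {
    "tessdata_best": 22.839419659999997,
    "tessdata_fast": 3.3998838399999984,
    "tessdata": 5.028869279999998,
}
ENG_RUS = {
    "tessdata_best": 42.03311656,
    "tessdata_fast": 4.122473539999999,
    "tessdata": 9.4696169,
}


def slowdown(single, multi):
    """Ratio of multi-language to single-language duration per model set."""
    return {name: multi[name] / single[name] for name in single}


for name, ratio in slowdown(ENG, ENG_RUS).items():
    print(f"{name}: eng+rus takes {ratio:.2f}x the time of eng")
```

Calling `simd_features(tesseract_version_text())` on the machine in question should show whether AVX appears at all. An empty list would match the version output from the test exe, which has no `Found ...` lines.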
Zdenko

On Tue, Apr 14, 2020 at 5:56, Ravil R <[email protected]> wrote:

> Oh you gave so much info, thanks!
> My test exe file shows this version information:
> tesseract 5.0.0
> leptonica-1.79.0 (Apr 14 2020, 06:42:43) [MSC v.1900 LIB Debug x86]
> libjpeg 9b : libpng 1.6.32 : libtiff 4.0.7 : zlib 1.2.11
>
>
> Looks like I need to add (upgrade) the whole package
>
> On Monday, April 13, 2020 at 21:02:42 UTC+3, zdenop wrote:
>>
>> OS Name: Microsoft Windows 10 Pro
>> OS Version: 10.0.18362 N/A Build 18362
>> System Model: Latitude E5570
>> System Type: x64-based PC
>> Processor(s): 1 Processor(s) Installed.
>> [01]: Intel64 Family 6 Model 78 Stepping 3
>> GenuineIntel ~2801 Mhz
>>
>> *tesseract -v*
>> tesseract 5.0.0-alpha-638-gef4f
>> leptonica-1.80.0 (Mar 12 2020, 12:47:16) [MSC v.1916 LIB Release x64]
>> libgif 5.1.2 : libjpeg 6b (libjpeg-turbo 2.0.2) : libpng 1.6.36 :
>> libtiff 4.1.0 : zlib 1.2.11 : libwebp 1.0.2 : libopenjp2 2.3.0
>> Found AVX2
>> Found AVX
>> Found FMA
>> Found SSE
>> Found libarchive 3.3.3 zlib/1.2.11 liblzma/5.2.4 libzstd/1.3.8
>>
>> *-l eng:*
>> tessdata_best duration: 22.839419659999997
>> tessdata_fast duration: 3.3998838399999984
>> tessdata duration: 5.028869279999998
>>
>> *-l eng+rus:*
>> tessdata_best duration: 42.03311656
>> tessdata_fast duration: 4.122473539999999
>> tessdata duration: 9.4696169
>>
>> *-l eng+rus -c tessedit_do_invert=0*
>> tessdata_best duration: 33.66898392
>> tessdata_fast duration: 1.7703644200000042
>> tessdata duration: 6.849705899999998
>>
>> tested with script:
>>
>> https://github.com/tesseract-ocr/tesseract/issues/263#issuecomment-536197289
>>
>> I built tesseract with cmake and clang 10 with VS 2017 compatibility.
>>
>> Zdenko
>>
>>
>> On Mon, Apr 13, 2020 at 9:50, Ravil R <[email protected]> wrote:
>>
>>> Sorry, I have only just seen your full answer with the questions.
>>> Yesterday I only got an email with the advice to go to the forum, which I
>>> did.
>>> Now the answers:
>>> 1) I tested the latest 5.0.0-alpha build using all types of data files,
>>> modern (best, fast, normal) and old (for the 3.0 version).
>>> 2) Yesterday I also tested versions 3.05 (with old tessdata files) and 4.0
>>> (with both old data files and modern "Fast" data files).
>>> 3) My PC is a notebook: i7-7700HQ, 32 GB, Windows 10, MS VC 2015. During
>>> recognition, one core is fully loaded.
>>> 4) I read the issues regarding performance but didn't find them useful; when
>>> someone complains that 2 seconds is too slow, it just makes me laugh.
>>> 5) 2 minutes for page recognition with "Fast" data is an approximate
>>> value. If the tested app is compiled as a Release build it is 30% faster,
>>> but still very slow. Recognition with "Best" data files takes around 5 minutes.
>>> 6) The Tesseract version doesn't significantly affect the results.
>>> 7) The old data files are around the size of the "best" data files,
>>> work a little faster than the "fast" data files, but produce worse output
>>> than "fast". So the quality of recognition is improving.
>>>
>>> On Monday, April 13, 2020 at 10:08:08 UTC+3, zdenop wrote:
>>>>
>>>> Why did you decide to ignore the instructions in comment
>>>>
>>>> https://github.com/tesseract-ocr/tesseract/issues/2946#issuecomment-612613461
>>>> ?
>>>> Why should we care about your problems if you do not care?
>>>>
>>>> Zdenko
>>>>
>>>>
>>>> On Sun, Apr 12, 2020 at 16:00, Ravil R <[email protected]> wrote:
>>>>
>>>>> I have my own simple Windows dll based on the tesseractmain.cpp code. It
>>>>> has worked fine since Tesseract 3.x (I have now moved it to the latest 5 build) and the
>>>>> only issue that still persists is its low speed: a 1-page TIFF takes around 2
>>>>> minutes even with the Fast version of tessdata ('eng+rus'). Is this how it
>>>>> actually works or is there something I don't understand?
>>>>> Almost all of the time is spent in this line:
>>>>> api.ProcessPages("c:\\1.tif", NULL, 0, NULL);
>>>>> Sample file is attached
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/759d47df-da5f-4683-ab13-0f8ffb08b159%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/759d47df-da5f-4683-ab13-0f8ffb08b159%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8wSmLwGT1vhfE4BP%3DHwH3HDkYG0b7esM853c3tzc7_AFw%40mail.gmail.com.

