OK, got it: no need to pay much attention to the libraries other than tesseract itself.
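For anyone who wants to confirm AVX2 availability from their own code rather than from the `tesseract -v` output: here is a minimal sketch, assuming an x86 or x64 target. The function name `cpu_reports_avx2` is my own and is not part of tesseract. It only reads the CPUID feature bit (leaf 7, sub-leaf 0, EBX bit 5) and does not check whether the OS saves the YMM registers, so treat it as a rough check and not as the way tesseract itself detects AVX2.

```cpp
#include <cassert>

#if defined(_MSC_VER)
#include <intrin.h>   // __cpuid, __cpuidex (MSVC and clang-cl)
#elif defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>    // __get_cpuid_count (GCC and Clang)
#endif

// Hypothetical helper: returns true when CPUID reports the AVX2 feature bit.
// Simplification: OS support for YMM state (XGETBV) is not checked here.
bool cpu_reports_avx2() {
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, 0);                  // regs[0] = highest supported basic leaf
    if (regs[0] < 7) return false;
    __cpuidex(regs, 7, 0);             // leaf 7, sub-leaf 0: extended features
    return (regs[1] & (1 << 5)) != 0;  // EBX bit 5 = AVX2
#elif defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // Returns 0 when leaf 7 is not supported by this CPU.
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) return false;
    return (ebx & (1u << 5)) != 0;     // EBX bit 5 = AVX2
#else
    return false;                      // not an x86 target
#endif
}
```

Note that a CPU reporting AVX2 is not enough on its own: as said in this thread, tesseract has to be rebuilt with AVX2 enabled in the compiler before it can use it.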
On Wednesday, April 15, 2020 at 21:45:39 UTC+3, zdenop wrote:
>
> Just for future reference: for AVX (and ...) support, only tesseract itself
> needs to be rebuilt; it depends on the compiler and the hardware.
> Of course it makes sense to use the latest versions of tesseract's
> dependencies (for security, bug fixes, etc.), but they have (AFAIK)
> minimal effect on tesseract speed (they are used for reading input images).
>
> Zdenko
>
>
> On Wed, Apr 15, 2020 at 19:10, Ravil R <[email protected]> wrote:
>
>> Yes, exactly. I updated the libraries (without turbojpeg and libarchive) and
>> added AVX2 support; now it works at least 10 times faster than before.
>> Problem solved. Thank you very much!
>> Ravil
>>
>> On Tuesday, April 14, 2020 at 13:25:03 UTC+3, zdenop wrote:
>>>
>>> Without AVX support tesseract 4/5 will be slow(er), so try to focus on
>>> this.
>>> Using more than one language will slow OCR down too...
>>>
>>> Zdenko
>>>
>>>
>>> On Tue, Apr 14, 2020 at 5:56, Ravil R <[email protected]> wrote:
>>>
>>>> Oh, you gave so much info, thanks!
>>>> My test exe file shows this version information:
>>>> tesseract 5.0.0
>>>> leptonica-1.79.0 (Apr 14 2020, 06:42:43) [MSC v.1900 LIB Debug x86]
>>>> libjpeg 9b : libpng 1.6.32 : libtiff 4.0.7 : zlib 1.2.11
>>>>
>>>>
>>>> Looks like I need to add (upgrade) the whole package.
>>>>
>>>> On Monday, April 13, 2020 at 21:02:42 UTC+3, zdenop wrote:
>>>>>
>>>>> OS Name: Microsoft Windows 10 Pro
>>>>> OS Version: 10.0.18362 N/A Build 18362
>>>>> System Model: Latitude E5570
>>>>> System Type: x64-based PC
>>>>> Processor(s): 1 Processor(s) Installed.
>>>>> [01]: Intel64 Family 6 Model 78 Stepping 3 GenuineIntel ~2801 Mhz
>>>>>
>>>>> *tesseract -v*
>>>>> tesseract 5.0.0-alpha-638-gef4f
>>>>> leptonica-1.80.0 (Mar 12 2020, 12:47:16) [MSC v.1916 LIB Release x64]
>>>>> libgif 5.1.2 : libjpeg 6b (libjpeg-turbo 2.0.2) : libpng 1.6.36 :
>>>>> libtiff 4.1.0 : zlib 1.2.11 : libwebp 1.0.2 : libopenjp2 2.3.0
>>>>> Found AVX2
>>>>> Found AVX
>>>>> Found FMA
>>>>> Found SSE
>>>>> Found libarchive 3.3.3 zlib/1.2.11 liblzma/5.2.4 libzstd/1.3.8
>>>>>
>>>>> *-l eng:*
>>>>> tessdata_best duration: 22.839419659999997
>>>>> tessdata_fast duration: 3.3998838399999984
>>>>> tessdata duration: 5.028869279999998
>>>>>
>>>>> *-l eng+rus:*
>>>>> tessdata_best duration: 42.03311656
>>>>> tessdata_fast duration: 4.122473539999999
>>>>> tessdata duration: 9.4696169
>>>>>
>>>>> *-l eng+rus -c tessedit_do_invert=0*
>>>>> tessdata_best duration: 33.66898392
>>>>> tessdata_fast duration: 1.7703644200000042
>>>>> tessdata duration: 6.849705899999998
>>>>>
>>>>> Tested with the script:
>>>>>
>>>>> https://github.com/tesseract-ocr/tesseract/issues/263#issuecomment-536197289
>>>>>
>>>>> I built tesseract with cmake and clang 10 with VS 2017 compatibility.
>>>>>
>>>>> Zdenko
>>>>>
>>>>>
>>>>> On Mon, Apr 13, 2020 at 9:50, Ravil R <[email protected]> wrote:
>>>>>
>>>>>> Sorry, I have only just now seen your full answer with the questions;
>>>>>> yesterday I only got an email with the advice to go to the forum,
>>>>>> which I did.
>>>>>> Now the answers:
>>>>>> 1) I tested the latest 5.0.0-alpha build using all types of data
>>>>>> files: the modern ones (best, fast, normal) and the old ones for version 3.0.
>>>>>> 2) Yesterday I also tested versions 3.05 (with old tessdata files) and 4.0
>>>>>> (both with old data files and with modern "Fast" data files).
>>>>>> 3) My PC is a notebook: i7-7700HQ, 32 GB, Windows 10, MS VC 2015. During
>>>>>> recognition, one core is fully loaded.
>>>>>> 4) I read the issues regarding performance but didn't find them useful;
>>>>>> when someone complains that 2 seconds is too slow, it just makes me
>>>>>> laugh.
>>>>>> 5) 2 minutes for page recognition with "Fast" data is an approximate
>>>>>> value; if the tested app is compiled as a Release build it is 30% faster,
>>>>>> but still very slow. Recognition with "Best" data files takes around 5
>>>>>> minutes.
>>>>>> 6) The Tesseract version doesn't significantly affect the results.
>>>>>> 7) Old data files are about the size of the "best" data files,
>>>>>> work a little faster than the "fast" data files, but produce worse output
>>>>>> than "fast". So the quality of recognition is improving.
>>>>>>
>>>>>> On Monday, April 13, 2020 at 10:08:08 UTC+3, zdenop wrote:
>>>>>>>
>>>>>>> Why did you decide to ignore the instructions in the comment
>>>>>>> https://github.com/tesseract-ocr/tesseract/issues/2946#issuecomment-612613461
>>>>>>> ?
>>>>>>> Why should we care about your problems if you do not care?
>>>>>>>
>>>>>>> Zdenko
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Apr 12, 2020 at 16:00, Ravil R <[email protected]> wrote:
>>>>>>>
>>>>>>>> I have my own simple Windows DLL based on the tesseractmain.cpp code.
>>>>>>>> It has worked fine since Tesseract 3.x (I have now moved it to the latest 5 build),
>>>>>>>> and the only issue that still persists is its low speed: a 1-page TIFF takes
>>>>>>>> around 2 minutes even with the Fast version of tessdata ('eng+rus'). Is this
>>>>>>>> how it actually works, or is there something I don't understand?
>>>>>>>> Almost all of the time is spent on this line:
>>>>>>>> api.ProcessPages("c:\\1.tif", NULL, 0, NULL);
>>>>>>>> Sample file is attached
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/05d8e7cf-8a38-4a22-9d9a-9465e35a8c09%40googlegroups.com.
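For future readers comparing builds the way this thread did (before and after enabling AVX2, or with and without `tessedit_do_invert=0`): a small timing helper makes the numbers easy to reproduce. This is only a sketch; `seconds_taken` and `speedup` are names I made up, and the tesseract calls appear only in a comment because they need the tesseract headers and a real image file.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in seconds, measured with a monotonic clock.
template <typename Work>
double seconds_taken(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical helper: how many times faster `after` is than `before`.
double speedup(double before_seconds, double after_seconds) {
    assert(after_seconds > 0.0);
    return before_seconds / after_seconds;
}

// Possible use with the call from this thread (needs <tesseract/baseapi.h>):
//
//   tesseract::TessBaseAPI api;
//   api.Init(nullptr, "eng+rus");
//   api.SetVariable("tessedit_do_invert", "0");
//   double t = seconds_taken([&] {
//       api.ProcessPages("c:\\1.tif", nullptr, 0, nullptr);
//   });
```

Timing the same page once with each build (Debug vs Release, with and without AVX2) and passing the two results to `speedup` gives a figure directly comparable to the "at least 10 times faster" reported above.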

