OK, got it: I shouldn't pay too much attention to the libraries other than 
tesseract itself.

On Wednesday, April 15, 2020 at 21:45:39 UTC+3, zdenop wrote:
>
> Just for future reference: for AVX (and ...) support, only tesseract needs 
> to be rebuilt; it depends on the compiler and the hardware.
> Of course it makes sense to use the latest versions of tesseract's 
> dependencies (for security, bug fixes, etc.), but (AFAIK) they have 
> minimal effect on tesseract speed (they are used for reading input images).
>
> Zdenko
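zdenop's point that AVX support "depends on compiler and HW" has two halves. The first is whether the code was built with AVX2 enabled. The second is whether the CPU running it actually has AVX2. Below is a minimal sketch that checks both. It assumes GCC or Clang on x86: `__builtin_cpu_supports` is a GCC/Clang builtin, and MSVC would need `__cpuid` instead, so under MSVC the runtime fields simply stay false. `SimdReport`, `detect_simd` and `print_simd_report` are hypothetical names. This is not Tesseract's own detection code. Tesseract has its own runtime check, and that is what prints the "Found AVX2" lines in the `tesseract -v` output further down the thread.

```cpp
#include <cassert>
#include <ostream>

// Hypothetical report type (not part of Tesseract): which SIMD features were
// enabled at compile time for this translation unit, and which ones the CPU
// reports at run time.
struct SimdReport {
  bool compiled_avx2 = false;  // built with e.g. -mavx2, -march=..., or MSVC /arch:AVX2
  bool cpu_sse = false;
  bool cpu_avx = false;
  bool cpu_avx2 = false;
  bool cpu_fma = false;
};

SimdReport detect_simd() {
  SimdReport r;
#if defined(__AVX2__)
  // Compiler side: this translation unit may contain AVX2 instructions.
  r.compiled_avx2 = true;
#endif
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  // Hardware side: ask the CPU (via CPUID) through GCC/Clang builtins.
  __builtin_cpu_init();  // strictly required only before static constructors run
  r.cpu_sse = __builtin_cpu_supports("sse") != 0;
  r.cpu_avx = __builtin_cpu_supports("avx") != 0;
  r.cpu_avx2 = __builtin_cpu_supports("avx2") != 0;
  r.cpu_fma = __builtin_cpu_supports("fma") != 0;
#endif
  // On other compilers/architectures (e.g. MSVC) the CPU fields stay false.
  return r;
}

// Prints one "yes"/"no" line per feature.
void print_simd_report(std::ostream& out, const SimdReport& r) {
  out << "built with AVX2: " << (r.compiled_avx2 ? "yes" : "no") << '\n'
      << "CPU SSE:  " << (r.cpu_sse ? "yes" : "no") << '\n'
      << "CPU AVX:  " << (r.cpu_avx ? "yes" : "no") << '\n'
      << "CPU AVX2: " << (r.cpu_avx2 ? "yes" : "no") << '\n'
      << "CPU FMA:  " << (r.cpu_fma ? "yes" : "no") << '\n';
}
```

On a GCC or Clang build, include `<iostream>` and call `print_simd_report(std::cout, detect_simd())`. The output shows at a glance which of the two halves is missing.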
>
>
> On Wed, 15 Apr 2020 at 19:10, Ravil R <[email protected]> wrote:
>
>> Yes, exactly. I updated the libraries (except turbojpeg and libarchive) and 
>> added AVX2 support; now it works at least 10 times faster than before. 
>> Problem solved. Thank you very much!
>> Ravil
>>
>> On Tuesday, April 14, 2020 at 13:25:03 UTC+3, zdenop wrote:
>>>
>>> Without AVX support, tesseract 4/5 will be slow(er), so try to focus on 
>>> this.
>>> Using more than one language will slow down OCR too...
>>>
>>> Zdenko
>>>
>>>
>>> On Tue, 14 Apr 2020 at 5:56, Ravil R <[email protected]> wrote:
>>>
>>>> Oh, you gave so much info, thanks!
>>>> My test exe file shows this version information:
>>>> tesseract 5.0.0
>>>>  leptonica-1.79.0 (Apr 14 2020, 06:42:43) [MSC v.1900 LIB Debug x86]
>>>>   libjpeg 9b : libpng 1.6.32 : libtiff 4.0.7 : zlib 1.2.11
>>>>
>>>>
>>>> Looks like I need to add (upgrade) the whole package
>>>>
>>>> On Monday, April 13, 2020 at 21:02:42 UTC+3, zdenop 
>>>> wrote:
>>>>>
>>>>> OS Name:                   Microsoft Windows 10 Pro
>>>>> OS Version:                10.0.18362 N/A Build 18362
>>>>> System Model:              Latitude E5570
>>>>> System Type:               x64-based PC
>>>>> Processor(s):              1 Processor(s) Installed.
>>>>>                            [01]: Intel64 Family 6 Model 78 Stepping 3 
>>>>> GenuineIntel ~2801 Mhz
>>>>>
>>>>> *tesseract -v*
>>>>> tesseract 5.0.0-alpha-638-gef4f
>>>>>  leptonica-1.80.0 (Mar 12 2020, 12:47:16) [MSC v.1916 LIB Release x64]
>>>>>   libgif 5.1.2 : libjpeg 6b (libjpeg-turbo 2.0.2) : libpng 1.6.36 : 
>>>>> libtiff 4.1.0 : zlib 1.2.11 : libwebp 1.0.2 : libopenjp2 2.3.0
>>>>>  Found AVX2
>>>>>  Found AVX
>>>>>  Found FMA
>>>>>  Found SSE
>>>>>  Found libarchive 3.3.3 zlib/1.2.11 liblzma/5.2.4 libzstd/1.3.8
>>>>>
>>>>> *-l eng:*
>>>>> tessdata_best duration: 22.839419659999997
>>>>> tessdata_fast duration: 3.3998838399999984
>>>>> tessdata duration: 5.028869279999998
>>>>>
>>>>> *-l eng+rus:*
>>>>> tessdata_best duration: 42.03311656
>>>>> tessdata_fast duration: 4.122473539999999
>>>>> tessdata duration: 9.4696169
>>>>>
>>>>> *-l eng+rus -c tessedit_do_invert=0*
>>>>> tessdata_best duration: 33.66898392
>>>>> tessdata_fast duration: 1.7703644200000042
>>>>> tessdata duration: 6.849705899999998
>>>>>
>>>>> Tested with this script:
>>>>>
>>>>> https://github.com/tesseract-ocr/tesseract/issues/263#issuecomment-536197289
>>>>>
>>>>> I built tesseract with cmake and clang 10 with VS 2017 compatibility.
>>>>>
>>>>> Zdenko
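zdenop's numbers come from the script linked above, which is not reproduced here. You may prefer to measure inside your own program, for example the DLL wrapping `api.ProcessPages` described further down the thread. In that case, a small C++ timing helper along these lines could compare tessdata variants or options such as `tessedit_do_invert=0`. This is a sketch under assumptions. `mean_seconds` and `speedup` are hypothetical names, and the callable you pass in stands in for one OCR run.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: runs `work` `runs` times and returns the mean
// wall-clock duration per run, in seconds. In a real benchmark `work`
// would perform one OCR pass (e.g. a call into your own wrapper DLL).
double mean_seconds(const std::function<void()>& work, int runs) {
  assert(runs > 0);
  using clock = std::chrono::steady_clock;  // monotonic: unaffected by clock changes
  const auto start = clock::now();
  for (int i = 0; i < runs; ++i) {
    work();
  }
  const std::chrono::duration<double> total = clock::now() - start;  // seconds
  return total.count() / runs;
}

// Hypothetical helper: how many times faster `candidate` is than `baseline`
// (both durations must be in the same unit).
double speedup(double baseline, double candidate) {
  assert(candidate > 0.0);
  return baseline / candidate;
}
```

For instance, `speedup(22.839419659999997, 3.3998838399999984)` on zdenop's `-l eng` figures gives about 6.7. In other words, tessdata_fast ran roughly 6.7 times faster than tessdata_best in his test.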
>>>>>
>>>>>
>>>>> On Mon, 13 Apr 2020 at 9:50, Ravil R <[email protected]> wrote:
>>>>>
>>>>>> Sorry, I only just saw your full answer with the questions; 
>>>>>> yesterday I only got an email advising me to go to the forum, 
>>>>>> which I 
>>>>>> did.
>>>>>> Now the answers:
>>>>>> 1) I tested the latest 5.0.0-alpha build using all types of data 
>>>>>> files: modern (best, fast, normal) and old (for version 3.0).
>>>>>> 2) Yesterday I also tested 3.05 (with old tessdata files) and 4.0 
>>>>>> (with both old data files and modern "Fast" data files).
>>>>>> 3) My PC is an i7-7700HQ notebook, 32 GB, Windows 10, MS VC 2015. During 
>>>>>> recognition, one core is fully loaded.
>>>>>> 4) I read the issues regarding performance but didn't find them useful; 
>>>>>> when someone complains that 2 seconds is too slow, it just makes me 
>>>>>> laugh.
>>>>>> 5) 2 minutes per page with "Fast" data is an approximate 
>>>>>> value; if the test app is compiled as a Release build it is 30% faster, 
>>>>>> but still very slow. Recognition with "Best" data files takes around 5 
>>>>>> minutes.
>>>>>> 6) The Tesseract version doesn't significantly affect the results.
>>>>>> 7) Old data files are roughly the same size as "best" data files, 
>>>>>> work a little faster than "fast" data files, but produce worse results 
>>>>>> than "fast". So recognition quality is improving.
>>>>>>
>>>>>> On Monday, April 13, 2020 at 10:08:08 UTC+3, zdenop 
>>>>>> wrote:
>>>>>>>
>>>>>>> Why did you decide to ignore the instructions in the comment
>>>>>>>
>>>>>>> https://github.com/tesseract-ocr/tesseract/issues/2946#issuecomment-612613461
>>>>>>> ?
>>>>>>> Why should we care about your problems if you don't?
>>>>>>>
>>>>>>> Zdenko
>>>>>>>
>>>>>>>
>>>>>>> On Sun, 12 Apr 2020 at 16:00, Ravil R <[email protected]> wrote:
>>>>>>>
>>>>>>>> I have my own simple Windows DLL based on the tesseractmain.cpp code. 
>>>>>>>> It has worked fine since Tesseract 3.x (now I have moved it to the 
>>>>>>>> latest 5 build), 
>>>>>>>> and the only issue that still persists is its low speed: a 1-page 
>>>>>>>> TIFF takes around 2 
>>>>>>>> minutes even with the Fast version of tessdata ('eng+rus'). Is this 
>>>>>>>> how it 
>>>>>>>> actually works, or is there something I don't understand?
>>>>>>>> Almost all the time is spent on this line: 
>>>>>>>> api.ProcessPages("c:\\1.tif", NULL, 0, NULL);
>>>>>>>> Sample file is attached.
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>>> Groups "tesseract-ocr" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>>> send an email to [email protected].
>>>>>>>> To view this discussion on the web visit 
>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/759d47df-da5f-4683-ab13-0f8ffb08b159%40googlegroups.com
>>>>>>>>  
>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/759d47df-da5f-4683-ab13-0f8ffb08b159%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>> .
>>>>>>>>
>>>>>>
>>>>
>>
>

