OS Name: Microsoft Windows 10 Pro
OS Version: 10.0.18362 N/A Build 18362
System Model: Latitude E5570
System Type: x64-based PC
Processor(s): 1 Processor(s) Installed.
[01]: Intel64 Family 6 Model 78 Stepping 3 GenuineIntel ~2801 Mhz
*tesseract -v*
tesseract 5.0.0-alpha-638-gef4f
leptonica-1.80.0 (Mar 12 2020, 12:47:16) [MSC v.1916 LIB Release x64]
libgif 5.1.2 : libjpeg 6b (libjpeg-turbo 2.0.2) : libpng 1.6.36 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 1.0.2 : libopenjp2 2.3.0
Found AVX2
Found AVX
Found FMA
Found SSE
Found libarchive 3.3.3 zlib/1.2.11 liblzma/5.2.4 libzstd/1.3.8
*-l eng:*
tessdata_best duration: 22.839419659999997
tessdata_fast duration: 3.3998838399999984
tessdata duration: 5.028869279999998
*-l eng+rus:*
tessdata_best duration: 42.03311656
tessdata_fast duration: 4.122473539999999
tessdata duration: 9.4696169
*-l eng+rus -c tessedit_do_invert=0*
tessdata_best duration: 33.66898392
tessdata_fast duration: 1.7703644200000042
tessdata duration: 6.849705899999998
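To make the effect of tessedit_do_invert=0 easier to read, the two eng+rus sets of timings above can be compared directly. This is only a small arithmetic sketch over the numbers reported in this message; the dictionary and function names are made up for illustration.

```python
# Durations in seconds, copied from the eng+rus measurements above.
default = {
    "tessdata_best": 42.03311656,
    "tessdata_fast": 4.122473539999999,
    "tessdata": 9.4696169,
}
no_invert = {
    "tessdata_best": 33.66898392,
    "tessdata_fast": 1.7703644200000042,
    "tessdata": 6.849705899999998,
}


def time_saved(before, after):
    """Return the fraction of run time saved for each model set."""
    return {name: 1 - after[name] / before[name] for name in before}


savings = time_saved(default, no_invert)
for name, fraction in savings.items():
    print(f"{name}: {fraction:.1%} less time with tessedit_do_invert=0")
```

On these numbers, tessdata_fast gains the most from disabling inversion, cutting its run time by more than half.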
Tested with this script:
https://github.com/tesseract-ocr/tesseract/issues/263#issuecomment-536197289
I built tesseract with cmake and clang 10 with VS 2017 compatibility.
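For anyone who wants to repeat this kind of measurement, here is a minimal sketch of a timing harness. It is not the script linked above. It assumes the tesseract executable is on PATH, that the three model sets sit in directories named tessdata_best, tessdata_fast and tessdata, and that the image is called 1.tif; all of those names, and the choice of averaging over 5 runs, are assumptions.

```python
import subprocess
import time


def build_command(image, lang, tessdata_dir, config=()):
    """Build a tesseract command line that writes the recognized text to stdout."""
    cmd = ["tesseract", str(image), "stdout",
           "-l", lang, "--tessdata-dir", str(tessdata_dir)]
    for setting in config:  # e.g. "tessedit_do_invert=0"
        cmd += ["-c", setting]
    return cmd


def time_command(cmd, runs=5, runner=subprocess.run):
    """Run cmd `runs` times and return the mean wall-clock time in seconds."""
    start = time.perf_counter()
    for _ in range(runs):
        runner(cmd, check=True,
               stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    image = "1.tif"  # assumed file name
    # Assumed directory layout: one directory per model set.
    for model_dir in ("tessdata_best", "tessdata_fast", "tessdata"):
        cmd = build_command(image, "eng+rus", model_dir,
                            ["tessedit_do_invert=0"])
        print(f"{model_dir} duration: {time_command(cmd)}")
```

The measured time includes process start-up and model loading on every run, so it is not directly comparable to timing a single ProcessPages call inside a long-lived application.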
Zdenko
On Mon, 13 Apr 2020 at 9:50, Ravil R <[email protected]> wrote:
> Sorry, I have only now seen your full answer with the questions. Yesterday
> I just got an email advising me to go to the forum, which I did.
> Now the answers:
> 1) I tested the latest 5.0.0-alpha build with all types of data files,
> modern (best, fast, normal) and old (for version 3.0).
> 2) Yesterday I also tested versions 3.05 (with old tessdata files) and 4.0
> (with both old data files and modern "Fast" data files).
> 3) My PC is an i7-7700HQ notebook, 32 GB, Windows 10, MS VC 2015. During
> recognition, one core is fully loaded.
> 4) I read the issues regarding performance but didn't find them useful. When
> someone complains that 2 seconds is too slow, it just makes me laugh.
> 5) 2 minutes for page recognition with "Fast" data is an approximate
> value. If the test app is compiled as a Release build it is 30% faster,
> but still very slow. Recognition with "Best" data files takes around 5 minutes.
> 6) The Tesseract version doesn't significantly affect the results.
> 7) The old data files are about the same size as the "best" data files and work
> a little faster than the "fast" data files, but produce worse results
> than "fast". So the quality of recognition is improving.
>
> On Monday, 13 April 2020 at 10:08:08 UTC+3, zdenop wrote:
>>
>> Why did you decide to ignore the instructions in the comment
>>
>> https://github.com/tesseract-ocr/tesseract/issues/2946#issuecomment-612613461
>> ?
>> Why should we care about your problems if you do not care?
>>
>> Zdenko
>>
>>
>> On Sun, 12 Apr 2020 at 16:00, Ravil R <[email protected]> wrote:
>>
>>> I have my own simple Windows DLL based on the tesseractmain.cpp code. It
>>> has worked fine since Tesseract 3.x (I have now moved it to the latest 5
>>> build), and the only issue that still persists is its low speed: a 1-page
>>> TIFF takes around 2 minutes even with the Fast version of tessdata
>>> ('eng+rus'). Is this how it actually works, or is there something I don't
>>> understand?
>>> Almost all of the time is spent in this line:
>>> api.ProcessPages("c:\\1.tif", NULL, 0, NULL);
>>> A sample file is attached.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/759d47df-da5f-4683-ab13-0f8ffb08b159%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/759d47df-da5f-4683-ab13-0f8ffb08b159%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>