Hello zdenop,

My idea is to use multithreading for multi-page TIFFs, e.g. a file containing 
30 pages. Currently Tesseract works on 1 thread and does not use the machine's 
full potential on Windows. We could take, for example, half of the available 
system threads and automatically assign the pages of the TIFF file to them as 
independent images. The results would be collected into 1 structure, for 
example a map of results. The implementation could be carried out at the 
level of a wrapper class, which has been prepared for communication with the 
OCR engine. An example functional diagram for a 4-core processor is shown in 
the schema below. Is running several Tesseract OCR instances simultaneously 
a good direction?
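
The scheme described above could be sketched roughly as follows. This is only 
an illustration under assumptions, not a tested Tesseract integration: 
`ocr_page` is a hypothetical callable standing in for the wrapper class's call 
into the OCR engine, and the sketch assumes each call uses its own engine 
instance rather than sharing one between threads. The names 
`default_worker_count` and `ocr_pages_parallel` are made up for this example.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def default_worker_count():
    """Use half of the logical CPUs reported by the OS, but at least one."""
    return max(1, (os.cpu_count() or 2) // 2)


def ocr_pages_parallel(pages, ocr_page, workers=None):
    """Run ocr_page on every page concurrently; return {page_index: result}.

    pages    -- sequence of page images (any objects ocr_page accepts)
    ocr_page -- hypothetical callable wrapping one OCR engine call
    workers  -- number of threads; defaults to half of the logical CPUs
    """
    workers = workers or default_worker_count()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so enumerate() pairs each
        # result with the index of the page it came from.
        return dict(enumerate(pool.map(ocr_page, pages)))
```

The pages of the TIFF would be passed in as a list of already-split images, and 
the returned map is keyed by page index, so the results can be reassembled in 
document order no matter which thread finished first.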

[image: Test.png]

On Monday, 9 May 2022 at 18:44:05 UTC+2, zdenop wrote:

> Hello,
>
> 1) Search the issue tracker for openmp[1] reports for more details. 
> Experiences differ. To me it seems that it does not help on 
> Linux (and Mac?), it just consumes CPU. My experience[2] is that it helps 
> on Windows, but maybe that is a question of HW & SW configuration. To be on 
> the safe side, OpenMP is turned off by default, so if somebody turns it 
> on, that user/developer is responsible for the consequences ;-)
>
> 2) I ran some tests with multithreading in tesserocr in Python and it did 
> not work for me. It works only with 1 thread (I never use multithreading, 
> so maybe the problem is on my side).
>
> Anyway, any contribution in this area (OpenMP) is warmly welcome.
>
> [1] https://github.com/tesseract-ocr/tesseract/issues?q=is%3Aissue+openmp
> [2] https://github.com/tesseract-ocr/tessdoc/blob/main/Benchmarks.md
>
> Zdenko
>
>
> On Sun, 8 May 2022 at 14:24, Krzysztof J <[email protected]> wrote:
>
>> I have some problems & questions:
>>
>> 1). Question 1: While preparing the build, I noticed that the 
>> "OPENMP_BUILD" setting is not enabled when building the solution, see below:
>>
>> [image: configuration_tesseract.png]
>>
>> Can anyone say more about it? Is using multiprocessing recommended at the 
>> moment? What is its current state? I only saw issue #1662 
>> <https://github.com/tesseract-ocr/tesseract/issues/1662> where it was 
>> turned off, but that was 4 years ago :o
>>
>> 2). Question 2: Are there any other ways to take advantage of 
>> multithreading in Tesseract 5.1.0 besides OpenMP? Does anyone have 
>> experience with this topic? For now I am working on 1 thread, but ultimately 
>> I would like to switch to multiple threads.
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/190a9765-e00c-4f1b-b784-b81851d2a0c4n%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/tesseract-ocr/190a9765-e00c-4f1b-b784-b81851d2a0c4n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>

