Hello zdenop,

My idea is to use multithreading for multi-page TIFFs, e.g. files containing 30 pages. Currently Tesseract works on 1 thread, so it does not reach its full potential on Windows. We could take, for example, half of the available system threads and automatically assign each of them some of the pages of the TIFF file as independent images. The results would be collected into 1 structure, for example a map of results keyed by page. This could be implemented at the level of a wrapper class that has been prepared for communication with the OCR engine.

The diagram below shows an example of how this would work on a 4-core processor. Is running several Tesseract OCR instances simultaneously a good direction?
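To make the proposal concrete, here is a minimal sketch of the scheme in Python (chosen only because tesserocr in Python comes up in this thread). It is an illustration under assumptions, not a tested Tesseract integration: `recognize_page` is a hypothetical callable supplied by the wrapper class, assumed to OCR one page using its own engine instance and return the text. `default_worker_count` and `ocr_pages_parallel` are likewise names made up for this sketch.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def default_worker_count():
    """Return half of the logical CPUs reported by the OS, but at least 1."""
    return max(1, (os.cpu_count() or 1) // 2)


def ocr_pages_parallel(page_count, recognize_page, max_workers=None):
    """Call recognize_page(page_index) for every page, in parallel.

    recognize_page is a hypothetical callable (an assumption of this
    sketch): it takes a 0-based page index and returns the recognized
    text, using an engine instance that is not shared with other calls.

    Returns a dict mapping page index -> recognized text.
    """
    workers = max_workers or default_worker_count()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of the inputs, so
        # enumerate() pairs each text with the page it came from.
        texts = pool.map(recognize_page, range(page_count))
        return dict(enumerate(texts))
```

With a 30-page TIFF this would be called as `ocr_pages_parallel(30, recognize_page)`. A thread pool is enough if `recognize_page` starts a separate Tesseract process per page. If it calls the engine in-process instead, whether the threads actually run in parallel depends on the binding, which I have not verified.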
[image: Test.png]

On Monday, 9 May 2022 at 18:44:05 UTC+2, zdenop wrote:

> Hello,
>
> 1) Search the issue tracker for OpenMP[1] reports for more details. There
> are different experiences. To me it seems that it does not help on
> Linux (and Mac?); it just consumes the CPU. My experience[2] is that it
> helps on Windows, but maybe that is a question of HW & SW configuration.
> To be on the safe side, OpenMP is turned off by default, so if somebody
> turns it on, that user/developer should be responsible for the
> consequences ;-)
>
> 2) I made some tests with multithreading of tesserocr in Python and it
> does not work for me. It works only with 1 thread (I never use
> multithreading, so maybe the problem is on my side).
>
> Anyway, any contribution in this area (OpenMP) is warmly welcomed.
>
> [1] https://github.com/tesseract-ocr/tesseract/issues?q=is%3Aissue+openmp
> [2] https://github.com/tesseract-ocr/tessdoc/blob/main/Benchmarks.md
>
> Zdenko
>
> On Sun, 8 May 2022 at 14:24, Krzysztof J <[email protected]> wrote:
>
>> have the problems & questions:
>>
>> 1) Question 1: While preparing the build, I noticed that the
>> "OPENMP_BUILD" setting is not included when building the solution, see
>> below:
>>
>> [image: configuration_tesseract.png]
>>
>> Can anyone say something more about it? Is using multiprocessing
>> recommended at the moment? What is its current state? I only saw issue
>> #1662 <https://github.com/tesseract-ocr/tesseract/issues/1662>, where it
>> was turned off, but that was 4 years ago :o
>>
>> 2) Question 2: Are there any other ways to take advantage of
>> multithreading in Tesseract 5.1.0 besides OpenMP? Does anyone have
>> experience with this topic? For now I am working on 1 thread, but
>> ultimately I would like to switch to multiple threads.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/1d6812e6-2b78-49e5-a341-2f0c5505126bn%40googlegroups.com.

