Hello, I apologize in advance if this is the wrong place to post this. It is Tesseract-related, but the fault may lie more with .NET than with Tesseract. However, I have found almost no one else with this particular issue, and I'm running out of options.
The issue and code are described in detail in my Stack Overflow post (https://stackoverflow.com/questions/60456829/why-does-calling-the-tesseract-process-cause-this-service-to-crash-randomly); I will summarize here rather than copy-paste everything.

We run Tesseract 4.00 in multiple threads on an Ubuntu 18.04 VM, calling it as an external process from a .NET Core 2.1 application (I have also tried upgrading to 3.1, but that did not seem to make a difference). I am aware of the "OMP_THREAD_LIMIT" variable, but we want to process multiple pages from a split document file at once, so we call Tesseract from multiple threads (currently with a degree of parallelism of 8).

This didn't cause any issues in the past, but recently I have been making changes to reduce the number of disk reads/writes in the service, and now it randomly crashes with the message "Error while reaping child" while processing a file. The stack trace is in the SO post. Occasionally it doesn't happen at all, but usually it does (more often on larger files, since the processes need to run more frequently). It can occur at the very start of processing a document or at the very end.

I have tried the prerelease of the API wrapper found here https://github.com/charlesw/tesseract which uses a recent version of Tesseract, but it does not seem to handle multithreading very well (I may just be using it wrong, but it does not let me process multiple pages simultaneously without disposing the first page).

It seems like an issue with the Process class in .NET cleaning up child resources when a process ends; Tesseract runs as a child process of the dotnet process. However, I'm really not sure what I can do to make .NET clean up the children without throwing an error.
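For reference, the setup described above (one external Tesseract process per page, at most 8 running at once) looks roughly like the following. This is a sketch in Python rather than C#, and the helper name `ocr_pages` and its parameters are hypothetical. It assumes the `tesseract` binary is on PATH; passing `stdout` as the output base makes Tesseract print the recognized text instead of writing a `.txt` file, which fits the goal of fewer disk writes. Setting `OMP_THREAD_LIMIT=1` in each child's environment is an assumed trade-off: parallelism comes from the number of processes rather than from OpenMP threads inside each one.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ocr_pages(page_paths, max_workers=8, base_cmd=("tesseract",)):
    """Hypothetical helper: OCR each page by spawning one process per page,
    with at most `max_workers` processes running at the same time."""
    # Limit each child to one OpenMP thread so that N parallel processes
    # do not each start their own OpenMP thread pool.
    env = dict(os.environ, OMP_THREAD_LIMIT="1")

    def ocr_one(path):
        # "stdout" as the output base: text goes to standard output,
        # so nothing is written to disk by the child.
        result = subprocess.run(
            [*base_cmd, path, "stdout"],
            env=env,
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
        return result.stdout

    # pool.map preserves input order, so results line up with page_paths.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(ocr_one, page_paths))
```

For example, `ocr_pages(["page1.png", "page2.png"])` would return the text of both pages in input order. In .NET, the per-page call would go through `System.Diagnostics.Process`, with the variable set via `ProcessStartInfo.Environment`.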
I was reading the .NET Core source code, and it notes that a global lock is needed to add or remove process references (https://github.com/dotnet/runtime/blob/master/src/libraries/System.Diagnostics.Process/src/System/Diagnostics/ProcessWaitState.Unix.cs). I'm wondering whether some interaction between multithreading, possibly the GC, and this global reference table is causing the issue.
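To make the "reaping" part concrete: on Linux, a parent collects an exited child's status with `waitpid()`, and each child can be reaped only once. If some other code path in the same process has already collected the child, a later `waitpid()` on that PID fails with `ECHILD`. The Python sketch below is not the .NET implementation; `register`, `try_reap` and the other names are hypothetical. It keeps a process-wide PID table behind one lock, loosely in the spirit of the global table mentioned in ProcessWaitState.Unix.cs, and shows both a normal reap and one that fails because the child was collected outside the table. Whether something like this happens in the service above is an assumption, not something established here.

```python
import errno
import os
import threading
import time

# Hypothetical sketch, not the .NET implementation: one process-wide table
# of child PIDs guarded by a single lock, loosely modelled on the global
# table described in ProcessWaitState.Unix.cs.
_table_lock = threading.Lock()
_wait_status = {}  # pid -> raw wait status, or None while not yet reaped


def register(pid):
    with _table_lock:
        _wait_status[pid] = None


def try_reap(pid):
    """Reap `pid` without blocking; return its wait status, or None if still running."""
    with _table_lock:
        if _wait_status[pid] is not None:
            return _wait_status[pid]  # already reaped through this table
        # Raises ChildProcessError (errno ECHILD) if the child was already
        # reaped by some other waitpid() call in this process.
        reaped_pid, status = os.waitpid(pid, os.WNOHANG)
        if reaped_pid == 0:
            return None  # child has not exited yet
        _wait_status[pid] = status
        return status


def spawn_trivial_child():
    """Fork a child process that exits immediately with exit code 0."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)
    return pid


def reap_through_table():
    """Normal path: only the table reaps the child; returns its exit code."""
    pid = spawn_trivial_child()
    register(pid)
    status = try_reap(pid)
    while status is None:
        time.sleep(0.01)
        status = try_reap(pid)
    return os.WEXITSTATUS(status)


def reap_after_outside_waitpid():
    """Failure path: an unrelated waitpid() collects the child first."""
    pid = spawn_trivial_child()
    register(pid)
    os.waitpid(pid, 0)  # simulates another component reaping the child
    try:
        try_reap(pid)
    except ChildProcessError as exc:
        return errno.errorcode[exc.errno]  # symbolic name, e.g. "ECHILD"
    return None
```

Because `try_reap` holds the lock around the non-blocking `waitpid()` and records the result, threads sharing the table cannot reap the same child twice; in this sketch the `ECHILD` case can only come from a reaper that bypasses the table.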

