Hello, I apologize in advance if this is the wrong place to post this. It 
is Tesseract-related, but the fault seems to lie more with .NET than with 
Tesseract. However, I have found almost no one else with this particular 
issue, and I'm running out of options.

The issue and code are described in detail here: 
https://stackoverflow.com/questions/60456829/why-does-calling-the-tesseract-process-cause-this-service-to-crash-randomly

I will summarize here to avoid copy-pasting everything from my SO post. We 
run Tesseract 4.00 on an Ubuntu 18.04 VM, calling it as an external process 
from a .NET Core 2.1 application (I have also tried upgrading to 3.1, but 
that did not seem to make a difference). I am aware of the 
"OMP_THREAD_LIMIT" variable, but we want to process multiple pages from a 
split document file at once, so we call Tesseract from multiple threads 
(currently with a degree of parallelism of 8). This caused no issues in the 
past, but recently I have been making changes to reduce the number of disk 
reads/writes in the service, and now the service crashes at random points 
while processing a file, with the message "Error while reaping child". The 
stack trace is in the SO post. Occasionally the crash does not happen at 
all, but usually it does, and it is more likely on larger files since more 
Tesseract processes have to be run. It can occur at the very start of 
processing a document or at the very end.
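
For readers who want the calling pattern at a glance, here is a minimal 
sketch of it. It is not the service code (that is C# and is in the SO post); 
it is a Python illustration under stated assumptions: the helper names 
(run_ocr_command, ocr_pages, tesseract_cmd) are hypothetical, setting 
OMP_THREAD_LIMIT to 1 per process is an assumption rather than the service's 
actual setting, and passing "stdout" as the output base is just one way of 
avoiding an output file on disk.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_ocr_command(cmd, omp_thread_limit="1"):
    """Run one external command and return (exit code, stdout, stderr).

    OMP_THREAD_LIMIT is set only in the child's environment, so each
    Tesseract process is limited while the caller provides the parallelism.
    """
    env = dict(os.environ, OMP_THREAD_LIMIT=omp_thread_limit)
    proc = subprocess.run(cmd, env=env, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


def tesseract_cmd(page_path):
    """Build a Tesseract command line (hypothetical helper).

    Using "stdout" as the output base makes Tesseract write the recognized
    text to standard output instead of creating an output file.
    """
    return ["tesseract", page_path, "stdout"]


def ocr_pages(page_paths, build_cmd=tesseract_cmd, max_parallel=8):
    """Run one child process per page, at most max_parallel at a time.

    Results come back in the same order as page_paths.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(lambda p: run_ocr_command(build_cmd(p)), page_paths))
```

Calling ocr_pages(["page-1.png", "page-2.png"]) would start one Tesseract 
process per page, each waited on by its own worker thread. That is the 
situation in which the crash appears in the .NET service: several child 
processes starting and exiting concurrently under one parent.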

I have tried using the prerelease of the API wrapper found at 
https://github.com/charlesw/tesseract, which uses a recent version of 
Tesseract, but it does not seem to handle multithreading very well (I 
suppose I could just be using it wrong, but it does not allow me to process 
multiple pages simultaneously without disposing the first page).

It seems like an issue with how the Process class in .NET cleans up child 
resources when a process ends. Tesseract is a child process of the dotnet 
process when it is called. However, I'm really not sure what I can do to 
make .NET clean up the children without throwing an error. I was reading 
the .NET Core source code, which mentions that a global lock must be taken 
in order to add/remove process references:
https://github.com/dotnet/runtime/blob/master/src/libraries/System.Diagnostics.Process/src/System/Diagnostics/ProcessWaitState.Unix.cs

I'm wondering if there is some interaction between multithreading, possibly 
the GC, and this global ref table that causes the issue.
