Hi, Alberto. At program end time, POE runs a quick waitpid() check for child processes that may have leaked. This check was added after a bug report where POE locked up a server after several days of running. It turned out to be the reporter's application, but it was hard to debug.
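
To illustrate the kind of check involved, here is a minimal sketch in plain Perl. It is not POE's actual code. It forks two throwaway children, deliberately leaves them unreaped, and then collects them with a non-blocking waitpid() loop. The helper name reap_leftover_children() and the one-second sleep are assumptions made for the demonstration only.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";    # exports WNOHANG

# Hypothetical helper: collect every child that has already exited,
# without blocking, and return the PIDs that were reaped.
sub reap_leftover_children {
    my @reaped;
    # waitpid(-1, WNOHANG) returns a PID for each exited child,
    # 0 if children exist but none has exited, and -1 if none remain.
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        push @reaped, $pid;
    }
    return @reaped;
}

# Simulate a leak: fork two children that exit at once and are never reaped.
my @children;
for (1 .. 2) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    POSIX::_exit(0) if $pid == 0;    # child: exit immediately
    push @children, $pid;
}

sleep 1;    # crude, for the demo only: give the children time to exit

my @reaped = reap_leftover_children();
print "Reaped leftover child PID $_\n" for @reaped;
```

Note the argument order: waitpid() takes the PID (or -1 for "any child") first and the flags second. Any PID this loop returns belongs to a child that had exited but had not yet been waited for.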
Your program seems to have created two processes that it didn't reap: PIDs 5373 and 5374.

The ideal solution is to reap those processes before exiting. Your program can do this using POE::Kernel's sig_child() method.

In some cases, a third-party library will create processes and not properly clean them up. It can be impossible to solve this case without modifying other people's code.

If you just want to ignore the problem, this might do the trick. Put these lines in your last _stop handler. They should reap the processes you've leaked before POE's check:

    use POSIX ":sys_wait_h";
    1 while waitpid(-1, WNOHANG) > 0;

It's a bit of a pain, but I think it's better to explicitly ignore the problem than for it to go unnoticed by default.

Please let me know whether that resolves your problem. It may not. For example, the processes may still be open until an object is destroyed at global destruction time.

--
Rocco Caputo <rcap...@pobox.com>

On Mar 24, 2014, at 05:46, albertocurro <albertocu...@zoho.com> wrote:

> Guys,
>
> We have a product developed using POE as a base framework, with some other
> tool libraries as log4perl; basically is a forward proxy, composed of several
> modules, each one of them comprising a POE::Session; all of them share an
> internal queue of tasks to be performed. Each module performs several tasks
> on initialization, and if anything goes wrong, croak() is called to stop the
> service -> this is considered ok, since croak() is only called during
> initialization, when validation is being performed.
>
> The product is stable and works really fine, but recently I updated POE to
> the latest version, and since then we can see this message in the logs:
>
> registering pdu failed: 263!
> === 5267 === 5 -> on_handle (from Handler/StoreRemote.pm at 87)
> === 5267 === 5 -> on_retry (from Handler/StoreRemote.pm at 141)
> === 5267 === 9 -> on_handle (from Handler/StoreRemote.pm at 87)
> === 5267 === 9 -> on_retry (from Handler/StoreRemote.pm at 141)
> === 5267 === !!! Kernel has child processes.
> === 5267 === !!! Stopped child process (PID 5373) reaped when
> POE::Kernel->run() is ready to return.
> === 5267 === !!! Stopped child process (PID 5374) reaped when
> POE::Kernel->run() is ready to return.
> === 5267 === !!! At least one child process is still running when
> POE::Kernel->run() is ready to return.
> === 5267 === !!! Be sure to use sig_child() to reap child processes.
> === 5267 === !!! In extreme cases, failure to reap child processes has
> === 5267 === !!! resulted in a slow 'fork bomb' that has halted systems.
> mkdir /mnt/nfs99: Permission denied at Handler/Store.pm line 147
>
> first lines and last line above are the errors itself, but this part is new
> since the upgrading:
>
> === 5267 === !!! Kernel has child processes.
> === 5267 === !!! Stopped child process (PID 5373) reaped when
> POE::Kernel->run() is ready to return.
> === 5267 === !!! Stopped child process (PID 5374) reaped when
> POE::Kernel->run() is ready to return.
> === 5267 === !!! At least one child process is still running when
> POE::Kernel->run() is ready to return.
> === 5267 === !!! Be sure to use sig_child() to reap child processes.
> === 5267 === !!! In extreme cases, failure to reap child processes has
> === 5267 === !!! resulted in a slow 'fork bomb' that has halted systems.
>
> I can see it every time the service is stopped because of an unhandled
> condition, even when POE's event loop has been already running for hours. It
> was not visible before, and I can't get rid of it in any way. I've tried
> different ways to avoid it with no luck.
>
> Any advice or alternative approach on this?
>
> Many thanks
> Alberto
>