Hi Rocco,

 Many thanks for your quick answer! Unfortunately, the suggested solution only 
works partially: in some cases the "fork bomb" message still appears :(

  One of the cases is this: under some configurations our product starts an 
nginx instance, so it writes a configuration file and then starts nginx 
pointing to that file. BUT, if the configuration file cannot be written (the 
directory does not exist, etc.), the error below is raised, and I have not 
found any way to handle it:

DEBUG - Created nginx temporary directory /opt/tmp/pull/instance1
DEBUG - Created nginx configuration directory /opt/etc/pull/instance1
DEBUG - Created nginx log directory /opt/log/pull/instance1
DEBUG - creating nginx configfile for instance 1 in /opt/etc/pull/instance1
=== 13991 === !!! Kernel has 1 child process(es).
=== 13991 === !!! At least one child process is still running when POE::Kernel->run() is ready to return.
=== 13991 === !!! Be sure to use sig_child() to reap child processes.
=== 13991 === !!! In extreme cases, failure to reap child processes has
=== 13991 === !!! resulted in a slow 'fork bomb' that has halted systems.
Could not open file: No such file or directory
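
A minimal sketch of one way to keep this failure out of the exception path: 
wrap the write in eval and report the error to the caller, so nginx is only 
started when the file really exists. The helper name write_config_file and 
the path used below are hypothetical, not part of our product:

```perl
use strict;
use warnings;

# Hypothetical helper: try to write $content to $path.
# Returns (1, undef) on success or (0, $error_message) on failure,
# so the caller never sees an uncaught die.
sub write_config_file {
    my ($path, $content) = @_;
    my $ok = eval {
        open my $fh, '>', $path
            or die "Could not open file $path: $!\n";
        print {$fh} $content
            or die "Could not write file $path: $!\n";
        close $fh
            or die "Could not close file $path: $!\n";
        1;
    };
    return $ok ? (1, undef) : (0, $@ || "unknown error\n");
}

# Usage: only start nginx when the configuration file was written.
my ($ok, $err) = write_config_file('/nonexistent-dir/nginx.conf',
                                   "worker_processes 1;\n");
print $ok ? "config written, safe to start nginx\n"
          : "not starting nginx: $err";
```

This only moves the decision to the caller; it does not reap a child process 
that has already been started.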

 I've added a DIE handler in the main session to try to handle this:

 $sig_session = POE::Session->create(
    inline_states => {
        _start => sub {
            $_[HEAP]{RELOADED} = 0;
            $_[KERNEL]->sig(TERM => '_sigterm');
            $_[KERNEL]->sig(INT => '_sigterm');
            $_[KERNEL]->sig(DIE => '_sigdie');
            $_[KERNEL]->sig(nginx_reload => '_sig_nginx_reload');
            $_[KERNEL]->alias_set('sighandler');
        },
        _sigdie => sub {
            print "Handling exception, calling stop";
            POE::Kernel->call($sig_session, '_stop');
        },
        _stop => sub {
            # Reap any existing pid (# 1825119)
            print "Handling stop";
            POE::Kernel->sig_child();
            use POSIX ":sys_wait_h";
            1 while waitpid(-1, WNOHANG) > 0;

            # Clear signal handlers...
            $_[KERNEL]->sig('TERM');

But, as said above, it's not working. Checking POE's code, I can see that the 
message lines are generated in Resources/Signals.pm, in the 
_data_sig_finalize() method (where POE already does what you recommended, 
waiting for the pid).
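
For reference, a minimal sketch of a non-blocking reap loop in plain Perl. Note 
that Perl's waitpid() takes the PID first and the flags second. The helper 
name reap_exited_children is hypothetical:

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";

# Hypothetical helper: reap every child that has ALREADY exited, without
# blocking. waitpid(-1, WNOHANG) returns the PID of a reaped child, 0 when
# children exist but none has exited yet, and -1 when there are no children.
sub reap_exited_children {
    my @reaped;
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        push @reaped, $pid;
    }
    return @reaped;
}
```

A child that is still running (a live nginx, for instance) is not collected by 
this loop: waitpid() returns 0 for it, so it would have to be signalled and 
then waited for explicitly.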

However, _data_sig_finalize() is called in Kernel.pm just after all the 
signals have been unregistered (Kernel.pm => _finalize_kernel):

  my $self = shift;

  # Disable signal watching since there's now no place for them to go.
  foreach ($self->_data_sig_get_safe_signals()) {
    $self->loop_ignore_signal($_);
  }

  # Remove the kernel session's signal watcher.
  $self->_data_sig_remove($self->ID, "IDLE");

  # The main loop is done, no matter which event library ran it.
  # sig before loop so that it clears the signal_pipe file handler
  $self->_data_sig_finalize();
  $self->loop_finalize();

 Once here, none of my signal handlers in the main session can run, as the 
signals have already been unregistered. So when an exception (die) is thrown 
while POE::Kernel->run() is running, how can I handle it?
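
For context, a minimal sketch of the mechanism POE documents for exceptions 
raised inside event handlers: the DIE pseudo-signal together with 
sig_handled(). The event names write_config and on_die and the variable 
$last_error are hypothetical, and it is only an assumption that the failing 
code in the real product runs inside a POE event handler; a die() outside one 
is not turned into a DIE signal.

```perl
use strict;
use warnings;
use POE;

my $last_error;    # illustration only: records what the handler saw

POE::Session->create(
    inline_states => {
        _start => sub {
            # Route exceptions thrown by this session's handlers to on_die.
            $_[KERNEL]->sig(DIE => 'on_die');
            $_[KERNEL]->yield('write_config');
        },
        write_config => sub {
            # Stand-in for the failing nginx configuration write.
            die "Could not open file: No such file or directory\n";
        },
        on_die => sub {
            # ARG1 is a hash reference describing the exception.
            my ($kernel, $ex) = @_[KERNEL, ARG1];
            $last_error = $ex->{error_str};
            warn "Exception in event '$ex->{event}': $last_error";
            # Cleanup (stopping or reaping children) would go here, while
            # the session is still alive.
            $kernel->sig('DIE');       # drop the watcher so the session can stop
            $kernel->sig_handled();    # mark the exception as handled
        },
    },
);

POE::Kernel->run();
```

The point of the sketch is that the handler runs while the kernel is still 
dispatching events, that is, before _finalize_kernel unregisters anything.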

 Thanks a lot
 Alberto




---- On Mon, 24 Mar 2014 13:45:45 +0100, Rocco Caputo wrote ---- 

>Hi, Alberto. 
> 
>At program end time, POE runs a quick waitpid() check for child processes that 
>may have leaked. This check was added after a bug report where POE locked up a 
>server after several days of running. It turned out to be the reporter's 
>application, but it was hard to debug. 
> 
>Your program seems to have created two processes that it didn't reap: PIDs 
>5373 and 5374. The ideal solution is to reap those processes before exiting. 
>Your program can do this using POE::Kernel's sig_child() method. 
> 
>In some cases, a third-party library will create processes and not properly 
>clean them up. It can be impossible to solve this case without modifying other 
>people's code. 
> 
>If you just want to ignore the problem, this might do the trick. Put these 
>lines in your last _stop handler. They should reap the processes you've leaked 
>before POE's check: 
> 
>use POSIX ":sys_wait_h"; 
>1 while waitpid(-1, WNOHANG) > 0; 
> 
>It's a bit of a pain, but I think it's better to explicitly ignore the problem 
>than for it to go unnoticed by default. 
> 
>Please let me know whether that resolves your problem. It may not. For 
>example, the processes may still be open until an object is destroyed at 
>global destruction time. 
> 
>-- 
>Rocco Caputo  
> 
>On Mar 24, 2014, at 05:46, albertocurro  wrote: 
> 
>> Guys, 
>> 
>> We have a product developed using POE as a base framework, with some other 
>> tool libraries such as log4perl; basically it is a forward proxy, composed of 
>> several modules, each one of them comprising a POE::Session; all of them 
>> share an internal queue of tasks to be performed. Each module performs 
>> several tasks on initialization, and if anything goes wrong, croak() is 
>> called to stop the service -> this is considered ok, since croak() is only 
>> called during initialization, when validation is being performed. 
>> 
>> The product is stable and works really fine, but recently I updated POE to 
>> the latest version, and since then we can see this message in the logs: 
>> 
>> registering pdu failed: 263! 
>> === 5267 === 5 -> on_handle (from Handler/StoreRemote.pm at 87) 
>> === 5267 === 5 -> on_retry (from Handler/StoreRemote.pm at 141) 
>> === 5267 === 9 -> on_handle (from Handler/StoreRemote.pm at 87) 
>> === 5267 === 9 -> on_retry (from Handler/StoreRemote.pm at 141) 
>> === 5267 === !!! Kernel has child processes. 
>> === 5267 === !!! Stopped child process (PID 5373) reaped when 
>> POE::Kernel->run() is ready to return. 
>> === 5267 === !!! Stopped child process (PID 5374) reaped when 
>> POE::Kernel->run() is ready to return. 
>> === 5267 === !!! At least one child process is still running when 
>> POE::Kernel->run() is ready to return. 
>> === 5267 === !!! Be sure to use sig_child() to reap child processes. 
>> === 5267 === !!! In extreme cases, failure to reap child processes has 
>> === 5267 === !!! resulted in a slow 'fork bomb' that has halted systems. 
>> mkdir /mnt/nfs99: Permission denied at Handler/Store.pm line 147 
>> 
>> The first lines and the last line above are the errors themselves, but this 
>> part is new since the upgrade: 
>> 
>> === 5267 === !!! Kernel has child processes. 
>> === 5267 === !!! Stopped child process (PID 5373) reaped when 
>> POE::Kernel->run() is ready to return. 
>> === 5267 === !!! Stopped child process (PID 5374) reaped when 
>> POE::Kernel->run() is ready to return. 
>> === 5267 === !!! At least one child process is still running when 
>> POE::Kernel->run() is ready to return. 
>> === 5267 === !!! Be sure to use sig_child() to reap child processes. 
>> === 5267 === !!! In extreme cases, failure to reap child processes has 
>> === 5267 === !!! resulted in a slow 'fork bomb' that has halted systems. 
>> 
>> I can see it every time the service is stopped because of an unhandled 
>> condition, even when POE's event loop has already been running for hours. It 
>> was not visible before, and I can't get rid of it in any way. I've tried 
>> different ways to avoid it with no luck. 
>> 
>> Any advice or alternative approach on this? 
>> 
>> Many thanks 
>> Alberto 
>> 
> 
>
