Hello,

I have a Cygwin installation under which I'm running the 
Net::Server::Fork daemon "munin-node".  For those not aware, munin is a 
monitoring system which is really easy to use and configure 
(http://munin.projects.linpro.no).

That said, it's not working properly.

Here's the trouble I'm having - maybe somebody has seen it before and 
can push me in the right direction?

The basic flow of the daemon is:

<start munin-node which attaches to a port and runs fine>
<connect from other machine1>
<machine1 request plugin>
<server forks() and then exec() the 'plugin' to gather data>
<plugin returns data to parent via its STDOUT, which the parent reads through a pipe>
<plugin exits>
<parent returns data to machine1>
<sing songs, drink beer>

Now, this is breaking unfortunately, so I never get to sing songs and 
drink beer.

What seems to happen is that the child gets fork()ed and then the plugin 
code is exec()'d.  The data then comes back up the pipe (the child's 
STDOUT) to the parent. However, despite the child finishing execution 
(I've made sure all sockets are closed and even tried a die()), the child 
process never exits.

I've made sure the data is actually coming back by putting a print in 
the while loop, and that shows it arriving from the child. All the data 
makes it back, but the while loop doesn't finish and the timeout alarm 
fires, so the child gets reaped.  When it's reaped, the error reported 
is "Interrupted system call".

I've tried replacing the exec() with a dirty hack of system();exit(); but 
exactly the same thing happens.

The relevant code which does the running of the plugin is below:

(Full code: 
http://munin.projects.linpro.no/browser/branches/1.2-stable/node/munin-node.in)


..
     print "# Forking .. \n" if $DEBUG;
     if ($child = open (CHILD, "-|")) {
       eval {
           local $SIG{ALRM} = sub { $timed_out=1; die "$!\n"};
           alarm($timeout);
           while(<CHILD>) {
             #last if $_ eq "# DONE";
             if ($_ eq "# DONE") { close(CHILD); }
             push @lines,$_;
             print "#DEBUG CHILD: $_" if $DEBUG;
           }
           print "# Finished gathering data from Child\n" if $DEBUG;
       };
       if( $timed_out ) {
           print "# Child timed out - calling reap_children $@ \n" if $DEBUG;
           reap_children($child, "$service $command: $@");
           close (CHILD);
           return ();
       }
       unless (close CHILD)
       {
           if ($!)
           {
               # If Net::Server::Fork is currently taking care of reaping,
               # we get false errors. Filter them out.
               unless (defined $autoreap and $autoreap)
               {
                   logger ("Error while executing plugin \"$service\": $!");
               }
           }
           else
           {
               logger ("Plugin \"$service\" exited with status $?. [EMAIL PROTECTED]");
           }
       }
     }
     else {
       if ($child == 0) {
         my $timenow = localtime();
         print "# Child forked as $$ - $timenow\n" if $DEBUG;
         # New process group...
         POSIX::setsid();

         ..
         <child stuff here>
         ..

             print "# Execing $servicedir/$service $command\n" if $DEBUG;
             exec ("$servicedir/$service", $command);

     ..
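
In case it helps anyone reproduce this outside munin-node, here is a minimal standalone sketch of the same fork-and-read pattern. It is not the munin-node code: run_with_timeout is a name I made up for illustration, and the command it runs is whatever you pass in. It relies only on core Perl (open with "-|", alarm, exec) plus POSIX::_exit.

```perl
#!/usr/bin/perl
# Minimal sketch of the open("-|") fork pattern with an alarm timeout.
# Hypothetical helper, not taken from munin-node.
use strict;
use warnings;
use POSIX ();

sub run_with_timeout {
    my ($timeout, @cmd) = @_;
    my @lines;
    my $timed_out = 0;

    # open with "-|" forks: the parent gets the child's pid and reads the
    # child's STDOUT through $fh; the child gets 0.
    my $pid = open(my $fh, "-|");
    die "fork failed: $!\n" unless defined $pid;

    if ($pid == 0) {
        # Child: replace ourselves with the command. Its STDOUT is the pipe.
        exec(@cmd) or POSIX::_exit(127);
    }

    # Parent: read until end-of-file or until the alarm fires.
    eval {
        local $SIG{ALRM} = sub { $timed_out = 1; die "timeout\n" };
        alarm($timeout);
        while (my $line = <$fh>) {
            chomp $line;
            push @lines, $line;
        }
        alarm(0);
    };
    alarm(0);

    if ($timed_out) {
        kill 'KILL', $pid;   # make sure close() below does not block
        close($fh);
        return;
    }
    close($fh);              # also waits for the child
    return @lines;
}

# Usage: run a trivial child and print what it wrote.
print "$_\n" for run_with_timeout(5, $^X, '-e', 'print "hello\n"');
```

If the read loop in this sketch also hangs until the alarm under Cygwin, that would point at fork/pipe handling rather than anything munin-specific.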

Now, after doing some more debugging:

**********************************************
Program name: C:\cygwin\bin\perl.exe (windows pid 5936)
App version:  1005.24, api: 0.156
DLL version:  1005.24, api: 0.156
DLL build:    2007-01-31 10:57
OS version:   Windows NT-5.2
Date/Time:    2007-08-15 18:04:46
**********************************************
   114     376 [main] perl (5936) child_copy: cygheap - hp 0x67C low 0x611668E0, high 0x6116BBF8, res 1
    47     423 [main] perl (5936) child_copy: done
    70     493 [main] perl (5936) open_shared: name (null), n 4, shared 0x60000000 (wanted 0x60000000), h 0xEC
    99     592 [main] perl (5936) heap_init: heap base 0x10410000, heap top 0x10760000
    62     654 [main] perl (5936) open_shared: name (null), n 1, shared 0x60010000 (wanted 0x60010000), h 0xF0
    43     697 [main] perl (5936) user_shared_initialize: opening user shared for '' at 0x60010000
    44     741 [main] perl (5936) user_shared_initialize: user shared version B1D50001
    58     799 [main] perl (5936) open_shared: name (null), n 2, shared 0x60040000 (wanted 0x60040000), h 0xF4
   186     985 [main] perl (5936) open_shared: name Global\cygwin1S4.cygpid.5936, n 5936, shared 0x60030000 (wanted 0x60030000), h 0x768
    54    1039 [main] perl 5936 set_myself: myself->dwProcessId 5936
    84    1123 [main] perl 5936 child_copy: dll data - hp 0x67C low 0x61100000, high 0x61104BA0, res 1
12277544 12278667 [main] perl 5936 child_copy: dll bss - hp 0x67C low 0x6113F000, high 0x611483D0, res 1
  6188 12284855 [main] perl 5936 child_copy: user heap - hp 0x67C low 0x10410000, high 0x10760000, res 1
    92 12284947 [main] perl 5936 child_copy: done
   108 12285055 [main] perl 5936 child_copy: data - hp 0x67C low 0x408000, high 0x408010, res 1
    98 12285153 [main] perl 5936 child_copy: bss - hp 0x67C low 0x40A000, high 0x40A0F0, res 1
    56 12285209 [main] perl 5936 child_copy: done

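It would seem the dll bss copy is taking about 12 seconds (the first 
column is the delta in microseconds since the previous line, and it reads 
12277544 on the "dll bss" line), which causes the client to time out.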
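
To pick the slow step out of a longer trace, something like the sketch below should do. It assumes each trace line starts with the delta and cumulative time in microseconds followed by a bracketed thread name, as in the output above. slowest_step is just a name I chose, and the sample lines are copied from the trace.

```perl
#!/usr/bin/perl
# Sketch: find the trace line with the largest delta (first column, usec).
use strict;
use warnings;

# Returns ($delta_usec, $message) for the line with the largest delta.
sub slowest_step {
    my (@trace) = @_;
    my ($max, $msg) = (-1, undef);
    for my $line (@trace) {
        # Expected shape: <delta usec> <cumulative usec> [thread] message
        next unless $line =~ /^\s*(\d+)\s+(\d+)\s+\[\w+\]\s+(.*)$/;
        ($max, $msg) = ($1, $3) if $1 > $max;
    }
    return ($max, $msg);
}

# Three lines taken from the trace above.
my @sample = (
    '    84    1123 [main] perl 5936 child_copy: dll data - hp 0x67C low 0x61100000, high 0x61104BA0, res 1',
    '12277544 12278667 [main] perl 5936 child_copy: dll bss - hp 0x67C low 0x6113F000, high 0x611483D0, res 1',
    '  6188 12284855 [main] perl 5936 child_copy: user heap - hp 0x67C low 0x10410000, high 0x10760000, res 1',
);

my ($usec, $what) = slowest_step(@sample);
printf "%.1f s before: %s\n", $usec / 1e6, $what;
```

The delta is the time between the previous trace line and this one, so strictly it says about 12 seconds passed before the "dll bss" line was logged.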

The machine running this is a quad Xeon Woodcrest with 16 GB of RAM, so 
it shouldn't be short of resources (CPU usage is very low).

Can I give any data to help debug this or has anybody seen this issue 
before?

Thanks,

George
_______________________________________________
Perl-Win32-Users mailing list
Perl-Win32-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
