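To fix this, I suggest changing the apr_poll call at peruser.c:2111
(which currently passes timeout=-1, as the backtrace below shows) to:

ret = apr_poll(pollset, num_listensocks, &n, 1000000);

so that it gets a timeout of 1000000 microseconds (one second). Right
below it, add a check for whether the main process is still running.
If it is not, the child should quit.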
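
Roughly what I have in mind, as a minimal standalone sketch rather than
a patch against peruser.c. To keep it self-contained it uses plain POSIX
poll() instead of apr_poll (note that poll() takes milliseconds, while
apr_poll takes microseconds). The names parent_is_alive and
wait_for_work are made up for illustration, and comparing getppid()
against the pid recorded at fork time is just one assumed way of
detecting a dead parent; the real code may want a different check.

```c
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 while the process that forked us is
 * still our parent. When the parent dies, the child is re-parented
 * (typically to init), so getppid() no longer matches the pid that
 * was recorded at startup. */
int parent_is_alive(pid_t original_parent)
{
    return getppid() == original_parent;
}

/* Hypothetical helper: waits for activity on the listening sockets,
 * waking up once per second to check on the parent.
 * Returns >0 if descriptors are ready, 0 if the parent is gone,
 * and -1 on a poll error. */
int wait_for_work(struct pollfd *fds, nfds_t nfds, pid_t original_parent)
{
    for (;;) {
        int ready = poll(fds, nfds, 1000);  /* 1000 ms = one second */

        if (ready > 0)
            return ready;                   /* something to accept */
        if (ready < 0 && errno != EINTR)
            return -1;                      /* real error */

        /* Timed out (or interrupted by a signal): is the parent still there? */
        if (!parent_is_alive(original_parent))
            return 0;                       /* caller should clean up and exit */
    }
}
```

The child would record getppid() once right after the fork and pass it
in on every call. When wait_for_work returns 0, the child closes its
listeners and exits, which frees the listen port for a restart.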

Hannes


On 25 January 2011 16:56, Hannes Landeholm <[email protected]> wrote:
> Oh yeah, and it's even worse, since my automatic watchdog script
> doesn't even know that apache has stopped working: it sees live
> httpd processes running and thinks everything is hunky dory.
>
> Hannes
>
> On 25 January 2011 16:55, Hannes Landeholm <[email protected]> wrote:
>> Hi,
>>
>> I see a whole bunch of loose children that are stopped and refuse to
>> exit even though their parent process has died a long long time ago.
>> This has happened multiple times. I think it happens when the parent
>> exits ungracefully like when it's crashed. Can you add a check that
>> terminates child processes when the parent is killed? This is
>> exceptionally annoying when multiplexers do this since they block
>> apache from restarting as they block the listen port. Since not even
>> automatic watchdog scripts can bring back apache to life when that
>> happens I'd say this is a critical/major bug.
>>
>> Here's a backtrace for one of the borked kids:
>>
>> #0  0x00007f04fd20ef58 in *__GI___poll (fds=0x7fffe4e50740, nfds=2,
>> timeout=<value optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:83
>> #1  0x00007f04fdd3c230 in apr_poll (aprset=0x1c44670, num=2,
>> nsds=0x7fffe4e508c8, timeout=-1) at poll/unix/poll.c:120
>> #2  0x000000000046c0da in child_main (child_num_arg=<value optimized
>> out>) at peruser.c:2111
>> #3  0x000000000046cfe9 in make_child (s=0xd34b38, slot=14) at peruser.c:2534
>> ...
>>
>> It's also probably related to the earlier mutex warning/critical child error.
>>
>> Hannes
>>
>
_______________________________________________
Peruser mailing list
[email protected]
http://www.telana.com/mailman/listinfo/peruser
