Re: Master prefork problems; CLOSE_WAIT problems

2002-05-05 Thread Helmut Apfelholz

Hi,

--- Jeremy Howard [EMAIL PROTECTED] wrote:
 Jeremy Howard wrote:

   I've seen a couple of problems over the last few weeks with master
   apparently failing to correctly maintain the prefork pool. We
   particularly see this problem with pop3d, which has more
   connects/disconnects than IMAP because of the nature of the protocol.

 ...

   The third issue is that when a process fails to shut down correctly,
   such as if it segfaults, master does not seem to correctly keep track
   of the child process count. As a result, eventually the pool runs out
   and no more connections are accepted.

 I've found a way to fix this. In master.c reap_child, add:

    c->s->ready_workers--;

 and

    t->s->ready_workers--;

 ...immediately after the corresponding nactive--. This resolves the
 problem for me in the limited testing I've done to date.

I've applied the patch to our server and I must say it
doesn't work. The master process creates more and more
processes with this patch.

This is not the right solution to the problem.

BTW, is anyone working on the 'seen files locking' problem?
Thanks

Helmut.






Re: Master prefork problems; CLOSE_WAIT problems

2002-05-05 Thread Jeremy Howard

Helmut Apfelholz wrote:

I've found a way to fix this. In master.c reap_child, add:

   c->s->ready_workers--;

and

   t->s->ready_workers--;

...immediately after the corresponding nactive--. This resolves the
problem for me in the limited testing I've done to date.


I've applied the patch to our server and I must say it
doesn't work. The master process creates more and more
processes with this patch.

This is not the right solution to the problem.
  

That's correct. We've got a proper fix which I'll be posting in a few hours.





Re: Master prefork problems; CLOSE_WAIT problems

2002-05-02 Thread Scott Adkins

--On Thursday, May 02, 2002 9:52 AM +1000 Jeremy Howard 
[EMAIL PROTECTED] wrote:

 I've seen a couple of problems over the last few weeks with master
 apparently failing to correctly maintain the prefork pool. We
 particularly see this problem with pop3d, which has more
 connects/disconnects than IMAP because of the nature of the protocol.

 The first issue is that in shut_down() sockets are not closed. It seems
 that this can leave sockets in CLOSE_WAIT state in certain error
 situations where popd_reset() is not called.

 The second issue is that we sometimes see sockets remain in a CLOSE_WAIT
 state because there is still data to be read. It appears that prot_fill()
 should be called in popd_reset() and shut_down().

 The third issue is that when a process fails to shut down correctly, such
 as if it segfaults, master does not seem to correctly keep track of the
 child process count. As a result, eventually the pool runs out and no
 more connections are accepted.

 Do the resolutions to the first two issues sound correct (we have made
 these changes and it seems to have fixed things for us)? Does anyone have
 a fix for the third issue?

YES!  I believe you have hit the nail on the head on all 3 of the issues
above!  Good job!

We have been particularly bitten by the third issue with the master process
losing track of the number of child processes in each service maintained.
There has to be a better way for the master process to manage its children.

I was thinking that it would be nice if the cyrus server used shared memory
to keep track of the children: which ones are active or idle, which ones
haven't checked in with the master in a while (a possible problem), etc.  If
the master had an incoming client queue, the children could pick up the
next client and run with it.  Furthermore, all the client information, such
as the number of clients each child has handled, would be available to the
master, which is the only cyrus process that currently has SNMP support.
This would make SNMP stats far more useful.
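
A minimal sketch of what such a shared table might look like, assuming the
master creates an anonymous shared mapping before it forks any children.
Every name here (struct scoreboard, scoreboard_checkin, scoreboard_stale,
MAX_CHILDREN) is hypothetical; none of it is existing Cyrus code. Each child
is assumed to write only its own slot, so the sketch does no locking, which a
real implementation would have to revisit.

```c
/* Ask glibc to expose MAP_ANONYMOUS even under strict -std= modes. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1
#endif

#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON   /* older BSD spelling */
#endif

#define MAX_CHILDREN 64          /* hypothetical upper bound on children */

enum slot_state { SLOT_FREE = 0, SLOT_IDLE, SLOT_BUSY };

struct slot {
    pid_t pid;                   /* child that owns this slot */
    enum slot_state state;       /* free, idle, or serving a client */
    time_t last_checkin;         /* last time the child updated the slot */
    unsigned long nclients;      /* clients this child has handled so far */
};

struct scoreboard {
    struct slot slots[MAX_CHILDREN];
};

/* Called by the master before forking, so every child inherits the mapping.
 * Anonymous mappings start zero-filled, i.e. every slot is SLOT_FREE. */
static struct scoreboard *scoreboard_create(void)
{
    void *p = mmap(NULL, sizeof(struct scoreboard), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Called by a child on its own slot whenever its state changes. */
static void scoreboard_checkin(struct scoreboard *sb, size_t i,
                               enum slot_state st)
{
    struct slot *s;

    assert(i < MAX_CHILDREN);
    s = &sb->slots[i];
    if (st == SLOT_BUSY && s->state != SLOT_BUSY)
        s->nclients++;           /* count each newly accepted client once */
    s->pid = getpid();
    s->state = st;
    s->last_checkin = time(NULL);
}

/* Called by the master: how many live slots have not checked in recently? */
static size_t scoreboard_stale(const struct scoreboard *sb, time_t now,
                               time_t max_age)
{
    size_t i, n = 0;

    for (i = 0; i < MAX_CHILDREN; i++) {
        const struct slot *s = &sb->slots[i];
        if (s->state != SLOT_FREE && now - s->last_checkin > max_age)
            n++;
    }
    return n;
}
```

With a table like this the master could read the per-child client counts
directly when answering SNMP queries, instead of relying on messages from
the children.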

Anyway, I wish I had the time right now to dive into the master/child
communication problem, but I am glad somebody else has seen the problem too!

Scott
--
 +-=-=-=-=-=-=-=-=-=+=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=+=-=-=-=-=-=-=-=-+
  Scott W. Adkins          http://www.cns.ohiou.edu/~sadkins/
   UNIX Systems Engineer  mailto:[EMAIL PROTECTED]
ICQ 7626282 Work (740)593-9478 Fax (740)593-1944
 +-=-=-=-=-=-=-=-=-=+=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=+=-=-=-=-=-=-=-=-+
 PGP Public Key available at http://www.cns.ohiou.edu/~sadkins/pgp/




Master prefork problems; CLOSE_WAIT problems

2002-05-01 Thread Jeremy Howard

I've seen a couple of problems over the last few weeks with master 
apparently failing to correctly maintain the prefork pool. We 
particularly see this problem with pop3d, which has more 
connects/disconnects than IMAP because of the nature of the protocol.

The first issue is that in shut_down() sockets are not closed. It seems 
that this can leave sockets in CLOSE_WAIT state in certain error 
situations where popd_reset() is not called.

The second issue is that we sometimes see sockets remain in a CLOSE_WAIT 
state because there is still data to be read. It appears that 
prot_fill() should be called in popd_reset() and shut_down().

The third issue is that when a process fails to shut down correctly, such
as if it segfaults, master does not seem to correctly keep track of the 
child process count. As a result, eventually the pool runs out and no 
more connections are accepted.

Do the resolutions to the first two issues sound correct (we have made 
these changes and it seems to have fixed things for us)? Does anyone 
have a fix for the third issue?
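
To make the first two points concrete, here is a rough sketch of the general
pattern: read whatever input is still pending, then close the descriptor so
the socket cannot linger in CLOSE_WAIT. It uses plain POSIX calls rather
than the Cyrus prot layer, and drain_and_close() is a hypothetical helper,
not a function in pop3d or master. The read is blocking, so a real server
would want to bound it with a timeout.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not Cyrus code): half-close the connection, discard
 * any input the peer already sent, then release the descriptor.
 * Returns the result of close(): 0 on success, -1 on error. */
static int drain_and_close(int fd)
{
    char buf[4096];
    ssize_t n;

    assert(fd >= 0);

    /* Signal that we will send nothing more. Errors are ignored on
     * purpose: the peer may already have gone away. */
    (void)shutdown(fd, SHUT_WR);

    /* Discard pending input until end-of-file or a real error.
     * Interrupted reads are simply retried. */
    for (;;) {
        n = read(fd, buf, sizeof buf);
        if (n > 0)
            continue;
        if (n == -1 && errno == EINTR)
            continue;
        break;
    }

    /* Closing the descriptor is what actually lets the kernel leave
     * CLOSE_WAIT. */
    return close(fd);
}
```

Calling a helper like this from every exit path, including the error paths
that skip the normal reset, is the point of the first issue above.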





Re: Master prefork problems; CLOSE_WAIT problems

2002-05-01 Thread Jeremy Howard

Jeremy Howard wrote:

  I've seen a couple of problems over the last few weeks with master
  apparently failing to correctly maintain the prefork pool. We
  particularly see this problem with pop3d, which has more
  connects/disconnects than IMAP because of the nature of the protocol.

...

  The third issue is that when a process fails to shut down correctly,
  such as if it segfaults, master does not seem to correctly keep track
  of the child process count. As a result, eventually the pool runs out
  and no more connections are accepted.

I've found a way to fix this. In master.c reap_child, add:

   c->s->ready_workers--;

and

   t->s->ready_workers--;

...immediately after the corresponding nactive--. This resolves the
problem for me in the limited testing I've done to date.
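
One possible explanation for a counter going wrong with a change like this
(an assumption, not something confirmed in this thread) is that a child that
dies while serving a client was already removed from the ready count, so
decrementing ready_workers unconditionally on reap would subtract it twice.
The sketch below records each child's state so the reap path only undoes
what was actually counted. The field names nactive and ready_workers come
from the messages above; the struct layouts, the state enum, and the
child_* helpers are hypothetical and are not the real master.c.

```c
#include <assert.h>
#include <stddef.h>

/* Field names follow the thread; the layouts are assumptions. */
struct service {
    int nactive;         /* children currently alive for this service */
    int ready_workers;   /* children idle and waiting for a connection */
};

enum child_state { CHILD_READY, CHILD_BUSY };

struct child {
    struct service *s;   /* NULL once the child has been reaped */
    enum child_state state;
};

/* A freshly forked child is alive and ready to accept. */
static void child_spawned(struct child *c, struct service *s)
{
    c->s = s;
    c->state = CHILD_READY;
    s->nactive++;
    s->ready_workers++;
}

/* The child reported that it picked up a connection. */
static void child_took_connection(struct child *c)
{
    assert(c->s != NULL && c->state == CHILD_READY);
    c->state = CHILD_BUSY;
    c->s->ready_workers--;
}

/* The child exited, cleanly or by crashing. Safe to call more than once. */
static void child_reaped(struct child *c)
{
    if (c->s == NULL)
        return;                      /* already accounted for */
    c->s->nactive--;
    if (c->state == CHILD_READY)
        c->s->ready_workers--;       /* only if it was counted as ready */
    c->s = NULL;
}
```

In a real master, child_reaped() would be called from the waitpid() loop for
every exited pid, whether the child exited normally or was killed by a
signal.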