Bump. A scenario in which this might occur: say web data is served over NFS and the NFS server becomes unresponsive, whether because of a problem local to the backend host or a problem on the NFS server itself. As a result, all requests coming in to Apache stall until they time out.
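To make the hang case concrete, here is a minimal sketch in Python (not Pound's code; all function names are made up for illustration). It contrasts a connect-only liveness check with a stricter one that requires the start of an HTTP response before a deadline. Whether Pound's resurrect check behaves like the connect-only variant is an assumption on my part, not something I have confirmed in the source.

```python
import socket


def tcp_connect_ok(host, port, timeout=1.0):
    """Connect-only check: passes once the TCP handshake completes."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def http_response_ok(host, port, timeout=1.0):
    """Stricter check: the backend must start an HTTP reply in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.settimeout(timeout)
            s.sendall(b"HEAD / HTTP/1.0\r\n\r\n")
            # A single recv() is enough for a sketch; a real check
            # would loop until a full status line has arrived.
            return s.recv(16).startswith(b"HTTP/")
    except OSError:  # includes the read timeout
        return False


def demo_hung_backend():
    """Simulate a hung backend: a socket that listens but never accepts."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(5)  # the kernel completes handshakes; nobody calls accept()
    host, port = srv.getsockname()
    try:
        return (tcp_connect_ok(host, port, 0.5),
                http_response_ok(host, port, 0.5))
    finally:
        srv.close()


if __name__ == "__main__":
    print(demo_hung_backend())
```

Against the simulated hung listener the connect-only check passes while the response check fails, which is the kind of gap a stricter resurrect condition would need to close. Note that a hung backend can also fail at connect time once its accept queue fills up, so the sketch covers only one of the ways a hang can show up.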
The bottom line is that I think the condition for resurrecting a backend must be stricter. Can anyone provide more insight into this matter?

Kind regards,
Xiwen

On Mon, Apr 06, 2009 at 03:13:19PM +0200, Xiwen Cheng wrote:
> Product: Pound-2.4.4
>
> To avoid confusion I will define some terms up front:
> * backend hangs: the backend is still active, listening for connections,
>   but doesn't respond. As a result all requests time out.
> * backend died: the backend processes are dead. Their associated ports
>   were released back to the system.
>
> When a backend hangs, pound doesn't seem to mark it as "dead". But
> later in time it resurrects the backend:
> Apr 5 06:08:22 web pound: (4079d950) connect_nb: error after getsockopt: Connection timed out
> Apr 5 06:08:22 web pound: (4079d950) backend XXX.XXX.XXX.XXX:80 connect: Connection timed out
> Apr 5 06:08:36 web pound: (40185950) e500 response error read from XXX.XXX.XXX.XXX:80/GET /jun2001/ HTTP/1.1: Connection reset by peer (224.085 secs)
> Apr 5 06:08:39 web pound: BackEnd XXX.XXX.XXX.XXX:80 resurrect
> Apr 5 06:08:41 web pound: (405d6950) e500 response error read from XXX.XXX.XXX.XXX:80/GET / HTTP/1.1: Connection reset by peer (215.073 secs)
>
> Pound sees that the backend doesn't respond but doesn't mark it as dead.
> No occurrence of "dead" was found prior to "resurrect". Even though pound
> reports that the backend couldn't handle the requests (timeouts or
> connection reset by peer), pound still dispatches them to this faulty
> backend.
>
> A checklist based on the possible log messages, to trace what could have
> happened in connect_nb():
>
> Line         Message                   Present in syslog?
> svc.c:787    fcntl GETFL failed        n
> svc.c:791    fcntl SETFL failed        n
> svc.c:798    connect failed            n
> svc.c:805    fcntl reSETFL failed      n
> svc.c:817    poll timed out            n
> svc.c:820    poll failed               n
> svc.c:827    getsockopt failed         n
> svc.c:833    fcntl reSETFL failed      n
> svc.c:840    error after getsockopt    y**
>
> ** see the log snippet at the beginning
>
> connect_nb() returned -1 several times when it was called, but why isn't
> the backend marked as dead?

-- 
Xiwen Cheng
System Administrator   ;" Enthusiasm is contagious,
Mathematical Institute ;  but hype is a disease. "
Leiden University      ;E-mail: [email protected]
Niels Bohrweg 1 K210   ;Office: (+31) 715277134
2333 CA Leiden         ;Mobile: (+31) 611119991
The Netherlands        ;GPG Key id: 194F572B
