Bump.

A scenario in which this might occur: say the web data is served over NFS and
the NFS mount becomes unresponsive, whether the cause is local to the backend
host or lies with the NFS server itself. As a result, all requests coming in
to Apache stall until they time out, while Apache keeps accepting connections.

The bottom line is that I think the condition for resurrecting a backend
must be stricter.
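
To make "stricter" concrete, here is a minimal sketch in C of one possible
policy: mark a backend dead after a number of consecutive failures
(timeouts included) and resurrect it only after several consecutive probes
that returned a complete response in time. The type backend_health, the
functions health_init() and health_record(), and the thresholds are all
hypothetical names chosen for illustration; none of this is taken from
Pound's source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical health tracker; names and thresholds are illustrative only. */
typedef struct {
    bool alive;        /* current verdict for this backend */
    int  fail_streak;  /* consecutive failures seen while alive */
    int  ok_streak;    /* consecutive successful probes seen while dead */
    int  fail_limit;   /* failures needed to mark the backend dead */
    int  ok_limit;     /* successes needed to resurrect the backend */
} backend_health;

void health_init(backend_health *h, int fail_limit, int ok_limit)
{
    assert(h != NULL && fail_limit > 0 && ok_limit > 0);
    h->alive = true;
    h->fail_streak = 0;
    h->ok_streak = 0;
    h->fail_limit = fail_limit;
    h->ok_limit = ok_limit;
}

/* Record one outcome and return the new verdict. "ok" is meant to be true
 * only when a complete response arrived in time, not merely when the TCP
 * connect succeeded. */
bool health_record(backend_health *h, bool ok)
{
    assert(h != NULL);
    if (ok) {
        h->fail_streak = 0;
        if (!h->alive && ++h->ok_streak >= h->ok_limit) {
            h->alive = true;      /* enough good probes in a row: resurrect */
            h->ok_streak = 0;
        }
    } else {
        h->ok_streak = 0;         /* any failure restarts the resurrect count */
        if (h->alive && ++h->fail_streak >= h->fail_limit) {
            h->alive = false;     /* enough failures in a row: mark dead */
            h->fail_streak = 0;
        }
    }
    return h->alive;
}
```

With health_init(&h, 2, 3), two failed requests in a row mark the backend
dead, and it comes back only after three successful probes with no failure
in between. A hanging backend that still accepts connections would never
pass such a probe, so it would stay dead.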

Can anyone provide more insight into this matter?

Kind regards,
Xiwen

On Mon, Apr 06, 2009 at 03:13:19PM +0200, Xiwen Cheng wrote:
> Product: Pound-2.4.4
> 
> To avoid confusion I will define some terms up front:
> * backend hangs: the backend is still active listening for connections
>   but doesn't respond. As a result all requests time out.
> * backend died: the backend processes are dead. Its associated ports were
>   released back to the system.
> 
> When a backend hangs, pound doesn't seem to mark it as "dead". Yet
> some time later it resurrects the backend:
> Apr  5 06:08:22 web pound: (4079d950) connect_nb: error after getsockopt: Connection timed out
> Apr  5 06:08:22 web pound: (4079d950) backend XXX.XXX.XXX.XXX:80 connect: Connection timed out
> Apr  5 06:08:36 web pound: (40185950) e500 response error read from XXX.XXX.XXX.XXX:80/GET /jun2001/ HTTP/1.1: Connection reset by peer (224.085 secs)
> Apr  5 06:08:39 web pound: BackEnd XXX.XXX.XXX.XXX:80 resurrect
> Apr  5 06:08:41 web pound: (405d6950) e500 response error read from XXX.XXX.XXX.XXX:80/GET / HTTP/1.1: Connection reset by peer (215.073 secs)
> 
> Pound sees that the backend doesn't respond but doesn't mark it as dead.
> No occurrence of "dead" was found prior to "resurrect". Even though pound
> reports that the backend couldn't handle the requests (timeouts or
> connection reset by peer), it still dispatches them to this faulty backend.
> 
> A checklist based on the possible log-messages to trace what could have
> happened in connect_nb():
> Line          message                 Present in syslog?
> svc.c:787     fcntl GETFL failed      n
> svc.c:791     fcntl SETFL failed      n
> svc.c:798     connect failed          n
> svc.c:805     fcntl reSETFL failed    n
> svc.c:817     poll timed out          n
> svc.c:820     poll failed             n
> svc.c:827     getsockopt failed       n
> svc.c:833     fcntl reSETFL failed    n
> svc.c:840     error after getsockopt  y**
> 
> ** see the log snippet at the beginning
> 
> connect_nb() returned -1 several times when it was called, but why isn't
> the backend marked as dead?
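
For readers without the source at hand, the messages in the checklist
suggest the usual non-blocking connect sequence: fcntl to set O_NONBLOCK,
connect, poll, getsockopt(SO_ERROR), then fcntl to restore the flags. Below
is a generic sketch of that pattern in C. It is not Pound's connect_nb();
the function name connect_with_timeout() and its signature are made up for
illustration, and the order of steps differs slightly from the checklist.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Illustrative helper, not Pound's connect_nb(): connect to ip:port with a
 * timeout. Returns a connected, blocking socket, or -1 on any failure. */
int connect_with_timeout(const char *ip, unsigned short port, int timeout_ms)
{
    struct sockaddr_in sa;
    struct pollfd p;
    int fd, flags, rc, so_err = 0;
    socklen_t so_len = sizeof(so_err);

    assert(ip != NULL && timeout_ms >= 0);

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return -1;
    if ((fd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
        return -1;

    /* Switch to non-blocking mode so connect() cannot stall the caller. */
    if ((flags = fcntl(fd, F_GETFL, 0)) < 0 ||
        fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto fail;

    rc = connect(fd, (struct sockaddr *)&sa, sizeof(sa));
    if (rc < 0) {
        if (errno != EINPROGRESS)
            goto fail;              /* immediate failure, e.g. refused */

        /* Wait until the socket is writable or the timeout expires. */
        p.fd = fd;
        p.events = POLLOUT;
        p.revents = 0;
        do {
            rc = poll(&p, 1, timeout_ms);
        } while (rc < 0 && errno == EINTR);
        if (rc <= 0)
            goto fail;              /* 0: poll timed out, <0: poll failed */

        /* Writable does not mean connected: fetch the deferred result. */
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_err, &so_len) < 0 ||
            so_err != 0)
            goto fail;              /* e.g. ETIMEDOUT or ECONNREFUSED */
    }

    /* Restore the original (blocking) flags before returning the socket. */
    if (fcntl(fd, F_SETFL, flags) < 0)
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}
```

In this pattern a non-zero SO_ERROR after poll() is the step that would
match the "error after getsockopt" message. Note that a hanging backend as
defined above normally still completes the TCP handshake, so a connect-only
check like this would report success; telling "hangs" apart from "healthy"
needs a timed read of an actual response.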

-- 
--
Xiwen Cheng
System Administrator            ;" Enthusiasm is contagious,
Mathematical Institute          ;  but hype is a disease. "
Leiden University               ;E-mail: [email protected]
Niels Bohrweg 1 K210            ;Office: (+31) 715277134
2333 CA Leiden                  ;Mobile: (+31) 611119991
The Netherlands                 ;GPG Key id: 194F572B
++

