Pound will only mark a backend dead if the TCP connection to the backend fails.
(For instance, I add an iptables rule on the backend to REJECT connections to
the HTTP port when doing maintenance.)  Similarly, the resurrect check only
tests whether a TCP connection to the backend succeeds.

What you're talking about would happen if the TCP connection succeeded and the 
httpd could not return data.  This could also happen if a backend process were 
running and generating content, but took a long time to complete.  (This
happens a lot in my situation.)  I wouldn't want my backend to be marked dead
because someone ran a large report.
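
To make the distinction concrete, here is a minimal Python sketch of a
connect-only liveness check. It is not Pound's code; tcp_alive and the
throwaway listener are hypothetical names used only for illustration. The
listener binds and listens but never calls accept(), which is roughly how a
hung httpd looks from the outside. On typical TCP stacks the kernel completes
the handshake for a listening socket, so I would expect a check like this to
keep reporting the backend as alive.

```python
import socket


def tcp_alive(host, port, timeout=2.0):
    """Connect-only liveness check: True if the TCP handshake completes."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat the backend as dead.
        return False


# Simulate a hung backend: the socket listens, but nothing ever accepts
# or answers the connections that arrive.
hung = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
hung.bind(("127.0.0.1", 0))          # port 0 = let the OS pick a free port
hung.listen(16)
hung_port = hung.getsockname()[1]

print("hung backend passes connect check:", tcp_alive("127.0.0.1", hung_port))
```

A check that also waited for an HTTP response would catch the hang, but it
would equally flag a backend that is just busy with a slow request.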

That is why Pound's liveness checks are so rudimentary, and also why there is
an HAPort directive.  You can craft a simple Perl script that listens on a
different port, tries to read a dummy file from NFS on each connection attempt,
and calls accept() only if the check succeeds.  If the check fails, the backend
will be marked dead and stay dead until the check succeeds again.

This question comes up a lot. I'm sure there are plenty of examples in the list 
archives.

It is interesting, however, that kill_be does not log that it is killing a
backend.  That should probably be logged.

Take care!
Joe

> -----Original Message-----
> From: Xiwen Cheng [mailto:[email protected]]
> Sent: Monday, May 04, 2009 5:28 AM
> To: [email protected]
> Subject: Re: [Pound Mailing List] when backend hangs
>
> Bump.
>
> One scenario where this might occur: say web data is served over NFS and
> the NFS server becomes unresponsive, whether the problem is local to the
> backend host or on the NFS server itself. As a result, requests coming in
> to Apache all stall until they time out.
>
> The bottom line is, I think the condition to resurrect a backend must be
> stricter.
>
> Can anyone provide more insight into this matter?
>
> Kind regards,
> Xiwen
>
> On Mon, Apr 06, 2009 at 03:13:19PM +0200, Xiwen Cheng wrote:
> > Product: Pound-2.4.4
> >
> > To avoid confusion I will define some terms up front:
> > * backend hangs: the backend is still actively listening for connections
> >   but doesn't respond. As a result all requests time out.
> > * backend died: the backend processes are dead. Their associated ports
> >   were released back to the system.
> >
> > When a backend hangs, pound doesn't seem to mark it as "dead", yet it
> > later resurrects the backend:
> > Apr  5 06:08:22 web pound: (4079d950) connect_nb: error after getsockopt: Connection timed out
> > Apr  5 06:08:22 web pound: (4079d950) backend XXX.XXX.XXX.XXX:80 connect: Connection timed out
> > Apr  5 06:08:36 web pound: (40185950) e500 response error read from XXX.XXX.XXX.XXX:80/GET /jun2001/ HTTP/1.1: Connection reset by peer (224.085 secs)
> > Apr  5 06:08:39 web pound: BackEnd XXX.XXX.XXX.XXX:80 resurrect
> > Apr  5 06:08:41 web pound: (405d6950) e500 response error read from XXX.XXX.XXX.XXX:80/GET / HTTP/1.1: Connection reset by peer (215.073 secs)
> >
> > Pound sees that the backend doesn't respond but doesn't mark it as dead.
> > No occurrence of "dead" was found prior to "resurrect". Even though pound
> > reports the backend couldn't handle the requests (timeouts or connection
> > reset by peer), pound still dispatches them to this faulty backend.
> >
> > A checklist based on the possible log messages to trace what could have
> > happened in connect_nb():
> > Line                message                 Present in syslog?
> > svc.c:787   fcntl GETFL failed      n
> > svc.c:791   fcntl SETFL failed      n
> > svc.c:798   connect failed          n
> > svc.c:805   fcntl reSETFL failed    n
> > svc.c:817   poll timed out          n
> > svc.c:820   poll failed             n
> > svc.c:827   getsockopt failed       n
> > svc.c:833   fcntl reSETFL failed    n
> > svc.c:840   error after getsockopt  y**
> >
> > ** see log-snippet in the beginning
> >
> > connect_nb() returned -1 several times when it was called, but why isn't
> > the backend marked as dead?
>
> --
> --
> Xiwen Cheng
> System Administrator          ;" Enthusiasm is contagious,
> Mathematical Institute                ;  but hype is a disease. "
> Leiden University             ;E-mail: [email protected]
> Niels Bohrweg 1 K210          ;Office: (+31) 715277134
> 2333 CA Leiden                        ;Mobile: (+31) 611119991
> The Netherlands                       ;GPG Key id: 194F572B
> ++
>
>
> --
> To unsubscribe send an email with subject unsubscribe to
> [email protected].
> Please contact [email protected] for questions.
>
