> -----Original Message-----
> From: Xiwen Cheng [mailto:[email protected]]
> Sent: Wednesday, May 06, 2009 4:41 AM
> To: [email protected]
> Subject: Re: [Pound Mailing List] when backend hangs
>
> On Mon, May 04, 2009 at 09:49:35AM -0400, Joe Gooch wrote:
> >
> > Pound will only mark a backend dead if the TCP connection to the backend
> > fails.  (For instance, I'll add an iptables rule on the backend to REJECT
> > connections to the http port when doing maintenance.)  Similarly, resurrect
> > checks for a TCP connection to the backend.
> >
> > What you're talking about would happen if the TCP connection succeeded and
> > the httpd could not return data.  This could also happen if a backend
> > process were running and generating content, but took a long time to
> > complete.  (This happens a lot in my situation.)  I wouldn't want my
> > backend to be marked dead because someone ran a large report.
> >
> > Which is why the checks for life are so rudimentary in Pound.  But it's
> > also why there's a HAPort directive.  You can craft a simple perl script
> > that listens on a different port, tries to read a dummy file from NFS on
> > connect attempts, and only accept()s the connection if the check succeeds.
> > If it doesn't, the backend will be marked dead and stay dead until that
> > check succeeds.
> I understand the need for the HAPort directive. I actually considered it at
> some point. But if applied, the number of connections made to the backend
> will increase drastically, I would even say unnecessarily. More connections
> mean more activity, and so more overhead in the end. I think it would be
> cheaper, in terms of resources, to use the information gathered from
> ongoing requests to determine the status of a backend. I didn't look
> through the source code to determine the precise behaviour of having
> HAPort defined, so if I'm wrong, please correct me.

The man page documents the behavior under "HIGH-AVAILABILITY".  Basically, it 
polls that port every "Alive" seconds.

But the idea is that the port you check with HAPort is not the same port as the 
HTTP daemon.
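For illustration, a minimal config fragment wiring this up might look like the
following (the addresses, ports, and Alive interval are placeholders, not
values from this thread):

```
# Pound polls each backend's HAPort every "Alive" seconds
Alive 30

ListenHTTP
    Address 0.0.0.0
    Port    80

    Service
        BackEnd
            Address 192.168.1.10
            Port    80
            HAPort  8081    # health-check port, distinct from the HTTP port
        End
    End
End
```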

> After all, in this case we're only interested in the availability of the
> backend, as in: can it handle incoming HTTP requests? Using an external
> program to determine the availability of the data source still doesn't
> imply the availability of the backend itself, for example when the backend
> is under heavy load (your example) or when the web-serving daemon ends up
> in a race condition.
>
> I don't think ignoring backend timeouts, as seems to be the current
> behaviour in Pound, is desirable, nor is the weak condition for
> resurrecting a backend. Sure, someone may be generating a large report
> which renders the server unresponsive for a limited period of time, but
> that doesn't change the fact that the backend isn't responding. So it's
> logical to mark it as dead, or give it a status that avoids requests being
> forwarded to it, and _only_ resurrect it when it answers HTTP requests.
>

True, but if your data source isn't available, your backend isn't going to be 
able to serve the data, right?

In my case, one request running a long report may mean that one request takes a 
while to complete, but my other request threads are behaving as normal.  Unless 
all of my request threads are full, which Pound isn't going to know, because 
the vast majority of the requests are succeeding.  Plus, since I'm using 
session affinity, I would want Pound to be *very* sure that none of the 
requests are going to succeed to that backend before breaking sessions.

It's entirely possible that dynamic scaling might help you, as that tries to 
use timeout values to determine the better/best backend for any given 
connection.

It's also possible that your HAPort script doesn't just check NFS, or 
datasources.  It could also run a simple HTTP request against the backend to 
verify it responds.  I think the flexibility of the system was the reason it 
was done this way...  Since all applications/backends are very specific to 
their use, it's hard to implement a solution that would work for everyone.
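As a sketch of that idea, a HAPort helper could itself probe the backend over
HTTP and only keep its health port open while the probe succeeds, so Pound's
TCP connect to HAPort fails whenever the real check fails.  All names, ports,
and the probe URL below are illustrative assumptions, not part of Pound:

```python
#!/usr/bin/env python3
# Hypothetical HAPort helper: listen on a separate health port only while
# the backend answers a simple HTTP request. When the probe fails, the
# listener is closed, so Pound's connect to HAPort fails and the backend
# is marked dead until the probe succeeds again.
import socket
import time
import urllib.request

BACKEND_URL = "http://127.0.0.1:80/"  # URL to probe on the real backend
HEALTH_PORT = 8081                    # the port named in HAPort
INTERVAL = 10                         # seconds between probes


def backend_ok(url: str = BACKEND_URL, timeout: float = 5.0) -> bool:
    """Return True if the backend answers the probe URL with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False


def serve() -> None:
    listener = None
    while True:
        if backend_ok():
            if listener is None:
                listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
                listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
                listener.bind(("", HEALTH_PORT))
                listener.listen(5)
            listener.settimeout(INTERVAL)
            try:
                conn, _ = listener.accept()  # Pound's alive check connects here
                conn.close()
            except socket.timeout:
                pass
        else:
            if listener is not None:
                listener.close()  # connect attempts now fail -> marked dead
                listener = None
            time.sleep(INTERVAL)

# In real use, serve() would run forever under your init system:
# serve()
```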


> An efficient solution is the ability to define a URL for an availability
> check, which is _only_ used once a backend has been marked
> dead/unavailable. This mechanism looks similar to any other monitoring
> solution, like Nagios. The difference, which makes this superior to the
> existing solution, is that the backend won't be unnecessarily bombarded
> with availability-check requests.

I don't really see any reason this couldn't be done.  It just means 
thr_resurect() in svc.c would need some additional code: if the connect 
succeeds, it sends a GET request to a URL (if defined in the config), and it 
would then need to know the success conditions.  (An HTTP status code of 200?  
A response in n seconds or less?)  It might be worth having Robert weigh in.

Then again, this was suggested 11/05/2005.
http://www.apsis.ch/pound/pound_list/archive/2005/2005-11/1131177343000

I think the difference in your case is that it would only check the Alive URL 
to resurrect, not regularly.
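The proposed condition could be sketched roughly as follows (a hypothetical
illustration of the logic only; `check_url` and `timeout_s` stand in for
configuration directives that do not exist in Pound today):

```python
# Hypothetical resurrect-time check: a dead backend comes back only if a
# configured URL answers HTTP 200 within a deadline.
import http.client


def resurrect_ok(host: str, port: int,
                 check_url: str = "/", timeout_s: float = 5.0) -> bool:
    """Return True if the backend answers check_url with 200 in time."""
    try:
        conn = http.client.HTTPConnection(host, port, timeout=timeout_s)
        conn.request("GET", check_url)
        status = conn.getresponse().status
        conn.close()
        return status == 200
    except OSError:
        # connect failure, refused connection, or timeout
        return False
```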

> > This question comes up a lot. I'm sure there are plenty of examples in
> > the list archives.
> Couldn't find them. Maybe I should look harder.

http://osdir.com/ml/web.pound.general/2006-03/msg00055.html
http://www.apsis.ch/pound/pound_list/archive/2006/2006-06/1151100017000/index_html?fullMode=1
http://www.apsis.ch/pound/pound_list/archive/2006/2006-12/1165505787000

> > It is interesting however that kill_be does not log that it is killing a
> > backend... That should likely happen.
> Indeed. Maybe someone else can shed some light on this kind of behaviour?

I've put a patch on my site that should add additional log messages.

http://users.k12system.com/mrwizard/pound/pound24.html


Joe
--
To unsubscribe send an email with subject unsubscribe to [email protected].
Please contact [email protected] for questions.
