Re: Problems with HAProxy, down servers and 503 errors

2009-01-25 Thread Willy Tarreau
On Sun, Jan 25, 2009 at 07:06:23PM -0500, John Marrett wrote:
> Willy, 
> 
> > No problem, no time wasted yet !
> 
> Well, none of your time :) It took me far longer than it should have to
> realise my error. Regrettable, packet captures are usually my first
> diagnostic tool. A mistake I won't make again any time soon.
> 
> > Have you at least found a solution to your issue ?
> 
> I've found a partial solution to my issue, and in fact, now I have a
> question that's relevant to the list. The backend server is IIS, if
> you're getting 503s during shutdowns, you can use this solution to turn
> them into RSTs [1]. 
> 
> The RST is sent by IIS after it receives the full client request from
> HAProxy (I suspect that it may want to see the Host header before it
> decides how it wants to treat the request). When HAProxy receives the
> RST it returns a 503 to the client (respecting the errorfile!). Despite
> the presence of "option redispatch", HAProxy does not send the request
> to another backend server.
> 
> If there was a way to get HAProxy to send the request to another
> functional real server at this time it would be great, though I fear
> that HAProxy no longer has the request information after having sent it
> to the server.

You're perfectly right, redispatch only happens when the request is still
in haproxy. Once it has been sent, it cannot be performed. It must not
be performed either for non-idempotent requests, because there is no way
to know whether some processing has begun on the server before it died
and returned an RST.

> Any further advice would be much appreciated, I can provide packet
> captures off list if required.

Shouldn't you include the Host header in the health checks, in order to
solicit the final server and get a chance to see it fail ?

Regards,
Willy
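
A minimal sketch of the health check suggested above, with the Host
header embedded in the check request so that the check exercises the
same virtual host as real traffic. The backend name and Host value come
from the configuration excerpt quoted elsewhere in this thread; the
server names, addresses and check interval are assumptions.

```haproxy
backend qa_site_com_http
    mode http
    # "\r\n" ends the request line and "\ " escapes the space, so the
    # check request carries a "Host: qa.site.com" header.
    option httpchk GET /index.html HTTP/1.0\r\nHost:\ qa.site.com
    # Hypothetical servers; "fall 2" marks a server down after two
    # consecutive failed checks.
    server web1 192.168.0.11:80 check inter 10s fall 2
    server web2 192.168.0.12:80 check inter 10s fall 2
```

If the server answers such a check with a 503 or an RST while it is
shutting down, it should fail its checks and leave the rotation after
two intervals, rather than failing only on client requests.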




RE: Problems with HAProxy, down servers and 503 errors

2009-01-25 Thread John Marrett
Forgot the link for the IIS 503 / RST solution:

http://technet.microsoft.com/en-us/library/cc757659.aspx

I believe that our application itself (currently) throws 503s, so we
couldn't use some kind of down server on 503 response type solution,
though we could probably change that if it might afford us a solution.

-JohnF




RE: Problems with HAProxy, down servers and 503 errors

2009-01-25 Thread John Marrett
Willy, 

> No problem, no time wasted yet !

Well, none of your time :) It took me far longer than it should have to
realise my error. Regrettable, packet captures are usually my first
diagnostic tool. A mistake I won't make again any time soon.

> Have you at least found a solution to your issue ?

I've found a partial solution to my issue, and in fact, now I have a
question that's relevant to the list. The backend server is IIS, if
you're getting 503s during shutdowns, you can use this solution to turn
them into RSTs [1]. 

The RST is sent by IIS after it receives the full client request from
HAProxy (I suspect that it may want to see the Host header before it
decides how it wants to treat the request). When HAProxy receives the
RST it returns a 503 to the client (respecting the errorfile!). Despite
the presence of "option redispatch", HAProxy does not send the request
to another backend server.

If there was a way to get HAProxy to send the request to another
functional real server at this time it would be great, though I fear
that HAProxy no longer has the request information after having sent it
to the server.

Any further advice would be much appreciated, I can provide packet
captures off list if required.

-JohnF



Re: official paper of haproxy?

2009-01-25 Thread Willy Tarreau
Hi Monika,

On Thu, Jan 22, 2009 at 11:30:17AM +0100, Monika Spechtenhauser wrote:
> Dear Willy Tarreau!
> I am setting up a system with high availability and load balancing and I 
> would really like to use haproxy. As I am also writing my thesis about 
> this topic I would like to ask you if there exists an official or 
> scientific paper of haproxy?

What type of paper are you looking for ? You can download the
documentation. If you're more interested in architecture in general,
you could also download the old article I wrote about load-balancing,
it might help you get started.

Regards,
Willy




Re: Stunnel + HAProxy + Apache + Tomcat

2009-01-25 Thread Willy Tarreau
Hi Jill,

On Thu, Jan 22, 2009 at 02:30:55PM -0500, Jill Rochelle wrote:
> I'm just getting started with all this; I thought I had this working 
> last year, but having issues now.
> 
> When using stunnel and xforwardfor with haproxy, is the URL supposed to 
> stay https or will it change to http?  If it changes to http, is it 
> secure; no lock shows in browser?

The URL used by the browser is still https, as it only defines the
protocol to use.

> Also, has anybody got this working along side Apache and Tomcat where 
> Apache is routing everything to tomcat as the main application is 
> running in tomcat only?
> Routing is port 80 (apache) > 85 (haproxy) >  (server for proxy - 
> goes back to apache) > mod_jk to tomcat

I see nothing abnormal in your description, though I've never used
tomcat.

> Parts of the application are http and other parts are https.  I need the URL 
> to remain https (when entering that part of app) so that it is secure 
> and the lock for the certificate appears in the browser.
> 
> I feel like I'm missing something, but I can't put my finger on it.

If you're using apache's mod_proxy, you might have difficulties setting
up ProxyPass and ProxyPassReverse to make https appear as such. I don't
remember the exact details, but I know people who're constantly annoyed
by the fact that apache rewrites the URL when passing the request, instead
of leaving it untouched. This could be what you had in mind.

Regards,
Willy




Re: HAProxy and SSL

2009-01-25 Thread Willy Tarreau
Hi Nicholas,

On Sat, Jan 24, 2009 at 12:38:12AM +1300, Nicholas Fauchelle wrote:
> Good Evening,
> 
> Yet another haproxy and SSL question.
> I have looked over the archives and want to post this.
> 
> We are going to set up our own little cluster with several machines to  
> host several domains, and a couple of these will need SSL.
> 
> We plan on running a haproxy passing to a handful of apache machines.
> 
> We are going to have to use name based  vhosts for this config, and  
> that is where our first issue with SSL starts however I believe that  
> can be solved by using a different port.
> 
> Eg.
> We might use *:443 for one domain, and *:445 for another.

I'd recommend that you use ports above 1024 instead of reusing ports
already assigned to other protocols, because 445 is used by a lot of
Windows services.

> So in haproxy's config which will have all the different ips I should  
> be able to
> 
> listen domain1 72.x.x.1:443
>     mode http
>     option httpchk HEAD /check.txt HTTP/1.0
>     server www1 10.0.0.1:443 check
>     server www2 10.0.0.2:443 check
>     server www3 10.0.0.3:443 check
> 
> 
> listen domain2 72.x.x.2:443
>     mode http
>     option httpchk HEAD /check.txt HTTP/1.0
>     server www1 10.0.0.1:445 check
>     server www2 10.0.0.2:445 check
>     server www3 10.0.0.3:445 check
> 
> So to the user when they type in domain1.com:443 or domain2.com:443  
> they would both be done with SSL, and the request is just passed  
> onto apache on the correct port.
> 
> Would I need to change the mode from http to tcp?

Yes you need to use "mode tcp". Also, you need to change "option httpchk"
to "option ssl-hello-chk" in order to test the presence of SSL protocol
on the server.

> Is this a workable solution?

Yes it is. Keep in mind that your apache logs will reflect haproxy's IP
instead of the client's. If this is problematic, you can try to patch
your kernel with the tproxy patch.

> The user always connecting to the same backend apache server isn't a  
> problem for sessions.

You could even improve performance by using "balance source" so that
most users keep their SSL IDs in sync with the server's.

Hoping this helps,
Willy
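
Putting the advice above together, the two listeners might look roughly
like the sketch below. Addresses are the placeholders from the quoted
message (72.x.x.N is not a literal address); port 8445 is an assumed
example of a port above 1024 replacing 445, and timeouts and other
defaults are left out.

```haproxy
listen domain1 72.x.x.1:443
    mode tcp                  # SSL is passed through, not decoded
    balance source            # same client IP -> same server (SSL session reuse)
    option ssl-hello-chk      # check that the server answers an SSL hello
    server www1 10.0.0.1:443 check
    server www2 10.0.0.2:443 check
    server www3 10.0.0.3:443 check

listen domain2 72.x.x.2:443
    mode tcp
    balance source
    option ssl-hello-chk
    # 8445 is a hypothetical replacement for port 445
    server www1 10.0.0.1:8445 check
    server www2 10.0.0.2:8445 check
    server www3 10.0.0.3:8445 check
```

In this mode haproxy cannot see or add HTTP headers, which is why the
apache logs show haproxy's address unless something like tproxy is used.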




Re: Check on Port 60000 not responding in time

2009-01-25 Thread Willy Tarreau
Hi Joseph,

On Fri, Jan 23, 2009 at 07:21:08PM -0500, Joseph Hardeman wrote:
> Hi Guys,
> 
> Here is a question I am hoping someone has either seen before or has a 
> suggestion for me.
> 
> For the first time since we put haproxy in months ago, the primary 
> haproxy we have did not respond in 10 seconds for the check on port 
> 60000, which we have set as our health check port:
> 
> listen health_check 0.0.0.0:60000
>     mode health
> 
> When nagios checks port 60000 it looks for the OK in the response.  Two 
> days ago, it did not get the OK in the max 10 second timeout.

Do you have an idea if the connection did at least establish ? I suspect
it hung waiting for haproxy to accept it. Either the system's backlog
was full and a few SYNs were dropped, or the process's maxconn was reached
and no listener was accepting any more connections.

In fact, listeners in "mode health" are not scheduled at all and reply
immediately after the accept. That's why I suspect one of the issues
above.

> I am 
> running haproxy on a Dell R200, Dual Core 2.4GHz, with 2G of memory.  I 
> have gone over the system logs and have not been able to find anything 
> wrong.  I do have a script that is called via SNMP that calculates the 
> number of unique IPs hitting the external IP on port 80 and at the time 
> it failed over there were only 2 IPs hitting it.  Because this is for a 
> client who can not have any down time I allow a single time out, 
> checking the status every minute, before failing haproxy over to the 
> backup system.  On the next check haproxy responded ok, but it had 
> already fallen over and no traffic was hitting it. 

It is very dangerous to take such a decision on a single fault. If you
need it to fail over very fast, you should check twice as often and
tolerate at least one failure. There are multiple reasons for such a
failure to occur. The system might have been doing backups or swapping,
or the network interface's transceiver might have been renegotiating
due to a transient error, etc. It is also possible that the nagios
probe itself was having difficulties (swap, network, CPU, ...) and
was a victim of its own load.

> Has anyone else seen this happen where haproxy did not respond back or 
> it has taken longer than 10 seconds to respond?  I would think it might 
> have been internet traffic, but the checks are from another system that 
> is on the same network over Gig ports.

You could check the switch's error counters on each port, and the
servers' counters as well. At least you seem to have a high quality
network if you caught such an anomaly only once in several months.

Regards,
Willy
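
For reference, a sketch of the kind of listener being discussed, with
the port taken from the subject line. The global maxconn value is an
assumption chosen only to illustrate the process-wide limit mentioned
above; it is not a recommendation.

```haproxy
global
    # Assumed value. Once the process reaches this many connections,
    # listeners stop accepting, including the health listener below.
    maxconn 20000

listen health_check 0.0.0.0:60000
    # Replies "OK" right after accept and closes the connection.
    # With "option httpchk" set, the reply is "HTTP/1.0 200 OK" instead.
    mode health
```

A probe that times out against such a listener therefore points at the
accept path (backlog, maxconn) or the network rather than at request
processing.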




Re: Problems with HAProxy, down servers and 503 errors

2009-01-25 Thread Willy Tarreau
Hi John,

On Sun, Jan 25, 2009 at 11:23:24AM -0500, John Marrett wrote:
> I'm embarrassed to report that this is not an HAProxy issue.

Don't feel embarrassed. I'm glad that you found the issue. And it's
kind to send us an update.

> In addition to the changes being made on the load balancing level, we
> have also upgraded the backend real servers. It seems there has been a
> change in their shutdown procedure, where before they would stop
> responding immediately when a shutdown was initiated they now return 503
> errors during the (protracted) shut down process.
> 
> This explains both unusual issues perfectly clearly.
> 
> I'm very sorry for any time wasted looking into this issue.

No problem, no time wasted yet !

Have you at least found a solution to your issue ?

Regards,
Willy




RE: Problems with HAProxy, down servers and 503 errors

2009-01-25 Thread John Marrett
I'm embarrassed to report that this is not an HAProxy issue.

In addition to the changes being made on the load balancing level, we
have also upgraded the backend real servers. It seems there has been a
change in their shutdown procedure, where before they would stop
responding immediately when a shutdown was initiated they now return 503
errors during the (protracted) shut down process.

This explains both unusual issues perfectly clearly.

I'm very sorry for any time wasted looking into this issue.

-JohnF 

> -----Original Message-----
> From: John Marrett [mailto:jmarr...@mediagrif.com] 
> Sent: January 23, 2009 5:38 PM
> To: haproxy@formilux.org
> Subject: Problems with HAProxy, down servers and 503 errors
> 
> We have been using HAProxy in a production environment, without issue
> for a long period. Thanks for a wonderful product!
> 
> Unfortunately we recently encountered some issues as we have worked on
> the migration of one of our sites onto a new HAProxy based load
> balancing solution. We've started to notice issues related to
> persistent cookies, client requests and down backend servers.
> 
> This new application requires that users remain on the same web server
> to avoid losing session information, which is not shared between
> backend servers. If we stop one of the backend servers (port 80 is no
> longer listening, HAProxy receives a RST packet from the server when
> sending a health check or client request) the clients who have a
> persistent session on this web server will continue to be sent to that
> server until it is formally declared down (after two health checks
> fail, as controlled by the fall 2 parameter).
> 
> Here's where we run into our issue:
> 
> While HAProxy is receiving RST responses, it sends a 503 response to
> the client. We're not very eager to send this error response to the
> client. It appeared, from my reading of the documentation, that by
> setting "option redispatch" and "retries 3" (or greater than 1) we
> should get HAProxy to retry, and, in the event of explicit connection
> failure from the backend server, move on to the next functioning
> server on the final retry. This doesn't appear to be the case.
> 
> To make matters worse, when HAProxy throws a 503 response because of a
> RST it ignores the errorfile directive. If you have two servers, and
> stop one you will receive an extremely plain 503 error response. If no
> backend is available at all the errorfile directive functions
> properly, and the "pretty" error message is returned.
> 
> Ideally, in the event that a backend server is returning RSTs, we'd
> like to move to the next server. HAProxy could either do this
> immediately or buffer the request until it makes the final
> determination that the backend server is down and can send it to
> another server. If that isn't possible, we'd really like the 503
> response returned to the client to be the one specified in the
> errorfile.
> 
> I'm going to investigate the possibility of creating a patch for this
> issue tonight, though if more experienced hands could help, either with
> the patch, or with something obvious that I've missed in my
> configuration, I'd greatly appreciate it. 
> 
> A few other notes:
> 
> Interestingly, even upon receiving a RST to a client request to the
> backend server, HAProxy doesn't consider the server as having failed a
> health check until it performs its next health check. So, if you have
> a health checking interval of 10 seconds, if a customer makes a
> request 1 second after the first health check, with fall 2 set, it
> will take 29 seconds before the backend server is declared down and
> the client moved on to the next server.
> 
> The documentation for the track command could also be made a bit
> clearer, it took me a while (and a colleague's examination of the
> source) to determine that the  is another backend, in the case that
> you are trying to reference a server from a different backend.
> (Perhaps, depending on your configuration, it's not always a backend,
> but could be something else?).
> 
> Thank you for taking the time to read this novella :), configuration
> follows, thanks in advance for your help,
> 
> -JohnF
> 
> Configuration Details
> 
> We are running 1.3.15.7, with the following configuration (excerpt):
> 
> global
>   stats socket /var/run/haproxy.stat
> 
> defaults
>   balance roundrobin
>   cookie SERVERID insert indirect
>   option httpchk GET /index.html HTTP/1.0
>   timeout client 10m
>   timeout server 10m
>   timeout connect 3s
> 
> frontend http_frontend *:80
>   mode http
>   reqirep ^Host:([^:]*) Host:\1
>   #Traffic matching ACLs
> [...]
>   acl host_qa_site_com hdr(host) -i qa.site.com
>   use_backend qa_site_com_http if host_qa_site_com
> frontend ssl_frontend *:81
>   mode http
>   reqirep ^Host:([^:]*) Host:\1
>   #Traffic matching ACLs
> [...]
>   acl host_qa_site_com hdr(host) -i qa.site.com
>   use_backend qa2_site_com_ssl if host_qa_site_com
> 
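
For reference, the settings discussed in the original message could be
combined roughly as follows. This is a sketch, not the poster's actual
backend sections (which are not shown in the excerpt): the server names,
addresses, ports, check interval and errorfile path are all assumptions.

```haproxy
backend qa_site_com_http
    mode http
    retries 3              # connection attempts before giving up
    option redispatch      # the last retry may go to another server,
                           # even if the cookie names a different one
    errorfile 503 /etc/haproxy/errors/503.http    # assumed path
    # Hypothetical servers: checked every 10s, down after 2 failures
    server web1 192.168.0.11:80 check inter 10s fall 2
    server web2 192.168.0.12:80 check inter 10s fall 2

backend qa2_site_com_ssl
    mode http
    # "track" reuses the health state of a server defined in another
    # proxy, written as <proxy>/<server>, instead of checking it again.
    server web1 192.168.0.11:8081 track qa_site_com_http/web1
    server web2 192.168.0.12:8081 track qa_site_com_http/web2
```

As explained elsewhere in the thread, retries and redispatch only apply
while the request is still in haproxy. Once the request has been sent to
the server, an RST results in a 503 to the client.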

Re: Makefile.bsd patch for SS 20081208

2009-01-25 Thread Willy Tarreau
On Thu, Jan 22, 2009 at 06:32:41PM -0500, Ross West wrote:
> 
> Did a full compile of the 1.3.15.7 - 20081208 snapshot on Freebsd-7.x
> recently, and noted that there needs to be a quick patch done on the
> Makefile for bsd machines.
> 
> This was due to the stream_interface replacing the send data commands
> in the rewrite Willy did a while ago.
> 
> Simple fix, and it compiled cleanly otherwise.  Thanks for the work
> Willy!

Thanks Ross, and I've applied it to Makefile.osx too.

Cheers,
Willy