httpchk failures
We have 5 Apache servers behind haproxy and we're trying to use the httpchk option along with some performance monitoring. For some reason, haproxy keeps thinking that 3/5 Apache servers are down, even though it's obvious that haproxy is asking the questions and the servers are answering. Is there a way to log httpchk failures? How can I ask haproxy why it thinks that several Apache servers are down?

Our config: CentOS 6.x, recently updated, 64-bit.

Performing an agent-check manually seems to give good results. The result below is immediate:

    [root@xr1 ~]# telnet 10.1.1.12 9333
    Trying 10.1.1.12...
    Connected to 10.1.1.12.
    Escape character is '^]'.
    up 78%
    Connection closed by foreign host.

I can see that xinetd on the logic server handled the check:

    Apr 13 18:45:02 curie xinetd[21890]: EXIT: calcload333 status=0 pid=25693 duration=0(sec)
    Apr 13 18:45:06 curie xinetd[21890]: START: calcload333 pid=26590 from=:::10.1.1.1

I can see that Apache is serving happy replies to the load balancer:

    [root@curie ~]# tail -f /var/log/httpd/access_log | grep -i 10.1.1.1
    10.1.1.1 - - [13/Apr/2015:18:47:15 +] OPTIONS / HTTP/1.0 302 - - -
    10.1.1.1 - - [13/Apr/2015:18:47:17 +] OPTIONS / HTTP/1.0 302 - - -
    10.1.1.1 - - [13/Apr/2015:18:47:19 +] OPTIONS / HTTP/1.0 302 - - -
    ^C

Why is haproxy pulling this server out of rotation and setting its status to DRAIN?
Here's the stats output for the server:

    [pxname] = logic333
    [svname] = server12
    [qcur] = 0
    [qmax] = 0
    [scur] = 0
    [smax] = 0
    [slim] = 256
    [stot] = 0
    [bin] = 0
    [bout] = 0
    [dreq] =
    [dresp] = 0
    [ereq] =
    [econ] = 0
    [eresp] = 0
    [wretr] = 0
    [wredis] = 0
    [status] = DRAIN
    [weight] = 0
    [act] = 1
    [bck] = 0
    [chkfail] = 0
    [chkdown] = 0
    [lastchg] = 997
    [downtime] = 0
    [qlimit] =
    [pid] = 1
    [iid] = 5
    [sid] = 3
    [throttle] =
    [lbtot] = 0
    [tracked] =
    [type] = 2
    [rate] = 0
    [rate_lim] =
    [rate_max] = 0
    [check_status] = L7OK
    [check_code] = 302
    [check_duration] = 16
    [hrsp_1xx] = 0
    [hrsp_2xx] = 0
    [hrsp_3xx] = 0
    [hrsp_4xx] = 0
    [hrsp_5xx] = 0
    [hrsp_other] = 0
    [hanafail] = 0
    [req_rate] =
    [req_rate_max] =
    [req_tot] =
    [cli_abrt] = 0
    [srv_abrt] = 0
    [comp_in] =
    [comp_out] =
    [comp_byp] =
    [comp_rsp] =
    [lastsess] = -1
    [last_chk] = Found
    [last_agt] = via agent : up
    [qtime] = 0
    [ctime] = 0
    [rtime] = 0
    [ttime] = 0

    [root@xr1 ~]# haproxy -v
    HA-Proxy version 1.5.6 2014/10/18
    Copyright 2000-2014 Willy Tarreau w...@1wt.eu

Relevant parts of /etc/haproxy/haproxy.conf:

    -- SNIP --
    frontend FrontProd333
        mode http
        option httplog
        option dontlognull
        option forwardfor
        bind [MYIP]:443 ssl crt star.mydomain.com.pem no-sslv3
        default_backend logic333
        stats uri /haproxy?stats
        stats realm StrictlyPrivate
        stats auth [SNIP]
        errorfile 503 /etc/haproxy/errorfile.http

    backend logic333
        mode http
        option httpchk OPTIONS /
        server server10 10.1.1.10:20333 maxconn 256 check agent-check agent-port 9333 agent-inter 4000
        server server11 10.1.1.11:20333 maxconn 256 check agent-check agent-port 9333 agent-inter 4000
        server server12 10.1.1.12:20333 maxconn 256 check agent-check agent-port 9333 agent-inter 4000
        server server13 10.1.1.13:20333 maxconn 256 check agent-check agent-port 9333 agent-inter 4000
        server server14 10.1.1.14:20333 maxconn 256 check agent-check agent-port 9333 agent-inter 4000
    -- /SNIP --

Is there any other information I could provide to help resolve this?

Thanks,
Ben Smith
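A note that may bear on the DRAIN status: in HAProxy 1.5, the percentage in an agent-check reply rescales the server's effective weight relative to its configured initial weight, and a server whose effective weight reaches 0 is displayed as DRAIN (which matches the [weight] = 0 in the stats above). With no explicit weight on the server lines, the initial weight defaults to 1, so a reply like "up 78%" can round down to an effective weight of 0. As a sketch only (the real calcload333 logic isn't shown in the thread, so the load formula below is purely hypothetical), an agent responder might look like:

```python
#!/usr/bin/env python3
# Hypothetical sketch of an agent-check responder in the spirit of the
# calcload333 xinetd service above; the load-to-percentage formula is an
# assumption, not the thread's actual logic. HAProxy 1.5 reads a reply
# like "up 78%" as: server is up, effective weight = 78% of the initial
# weight. With the default initial weight of 1, anything below 100% can
# truncate to weight 0, which the stats page displays as DRAIN.
import os

def agent_reply(load1, ncpus):
    """Map a 1-minute load average to an 'up NN%' agent-check reply."""
    # Hypothetical formula: spare capacity as a fraction of CPU count.
    spare = max(0.0, 1.0 - load1 / ncpus)
    pct = int(round(spare * 100))
    return "up %d%%\n" % pct

if __name__ == "__main__":
    # xinetd runs the handler with the socket on stdin/stdout,
    # so writing the reply to stdout is all that's needed.
    load1 = os.getloadavg()[0]
    print(agent_reply(load1, os.cpu_count() or 1), end="")
```

If the rounding-to-zero explanation applies here, giving each server an explicit larger initial weight (e.g. `weight 100` on the server lines) would make the percentage scaling meaningful.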
Re: tcp reset errors
Willy Tarreau w at 1wt.eu writes:

Hi Franky,

On Thu, Sep 11, 2014 at 01:08:09PM +0200, Franky Van Liedekerke wrote:

> On Thu, Sep 11, 2014 at 11:40 AM, Franky Van Liedekerke liedekef@... wrote:
>> After doing tcpdump on both servers (no ldap errors anywhere in the ldap logs), I see that the ldap server sends out resets, and so do the clients connecting to haproxy. This might be related to one another. Each client seems to send 2 RST packets at the end of an LDAP TLS session (over port 389); does that sound familiar?
>> Franky
>
> Ok, after much trial and error, I pinned it down to the following: we have lots of servers doing LDAP lookups for authentication, also when connecting via ssh. Now on EL5 servers this auth is done via a call to /usr/libexec/openssh/ssh-ldap-wrapper. Apparently this binary causes the resets to be shown in the haproxy error logs. I switched to the sssd version for EL5 servers, but that version did not include ssh-keys support, so the resets persisted. Again the internet to the rescue: version 1.9.6 for el5 can be found at http://copr-be.cloud.fedoraproject.org/results/sgallagh/sssd-1.9-rhel5/epel-5-x86_64 , and that version does support ssh correctly. Installing it, changing the ssh config, et voila: no more resets.
>
> So the bug is in the ssh-ldap-wrapper, but I understand that sending an RST at the end is not bad, just not good either... A side effect of the new sssd is that far fewer LDAP queries are made (as sudo and ssh use sssd too then), but I'll leave it up to the management to decide whether or not to go for that solution.

Thanks for sending the details of your diagnostic. As you say, RSTs are not necessarily bad. When a client closes first, it has two options:

  - either send an RST,
  - or have the source port unusable for 2 minutes.

Most of the time you choose the first option.
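The trade-off Willy describes can be demonstrated at the socket level: the common way an application produces an RST-on-close (instead of a FIN followed by TIME_WAIT tying up the source port) is SO_LINGER with a zero timeout. A minimal sketch, not from the thread:

```python
# Demonstrates the two close behaviors described above: a normal close()
# sends a FIN and leaves the connection in TIME_WAIT, while SO_LINGER
# with l_onoff=1 and l_linger=0 makes close() abort the connection with
# an RST, freeing the source port immediately. Sketch only.
import socket
import struct

def enable_rst_on_close(sock):
    """Arm SO_LINGER(on=1, timeout=0): close() will abort with an RST."""
    linger = struct.pack("ii", 1, 0)  # l_onoff=1, l_linger=0
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, linger)

def linger_settings(sock):
    """Read back (l_onoff, l_linger) for verification."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER, 8)
    return struct.unpack("ii", raw)

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_rst_on_close(s)
    print(linger_settings(s))  # (1, 0): close() now resets
    s.close()
```

This is why an RST at session end, as seen from ssh-ldap-wrapper, is a deliberate behavior in many clients rather than an error in itself.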
In your case, since you were seeing SD flags, it means the reset came from the server; maybe the client was speaking inappropriately on the connection, causing the server to abort it. If so, it proves that the behaviour was properly chosen, because it allowed you to detect the anomaly in the logs and to fix it, which is quite good.

Regards,
Willy

I have been having a similar issue with RSTs on LDAP connections but have not been able to pin it down any further than haproxy. LDAP seems to be the only issue at the moment, although I had identical symptoms with SMTP that were resolved after I lowered the MTU on the interface. I have a mail appliance on one side and AD LDAP on the other of this haproxy:

    spam_filter --- haproxy(service) --- AD LDAP

When I run an LDAP test, it doesn't seem to matter if it is plain text or SSL; I will randomly get an RST sent by the server during transfer. I have run the LDAP test dozens of times and there is no pattern: randomly, about 50% fail mid-stream, and a packet capture shows the RST coming from the server. To further test, I routed the same test through the haproxy machine (ip_forwarding) to the same destination as the haproxy backend, and all tests succeeded 100%.

    spam_filter --- haproxy(ip_forward) --- AD LDAP

Does anyone have any advice on what to check next? My haproxy.cfg only has mode tcp defined; do I require other options for LDAP? I need LDAPS and have found that option ldap-check does not work, but LDAP vs LDAPS does not affect my problem.

Thanks,
Steve
Re: Achieving Zero Downtime Restarts at Yelp
Wow, this is a really informative blog post. Thanks for sharing!

I'm curious, did you weigh the costs of simply converting your proxies to run on one of the BSDs? As I understand it, their implementation of SO_REUSEPORT would mean zero-downtime reloads just work as hoped-for/expected.

On Mon, Apr 13, 2015 at 10:24 AM, Joseph Lynch joe.e.ly...@gmail.com wrote:

> Hello, I published an article today on Yelp's engineering blog (http://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html) that shows a technique we use for low-latency, zero-downtime restarts of HAProxy. This solves the "when I restart HAProxy some of my clients get RSTs" problems that can occur. We built it to solve the RSTs in our internal load balancing, so there is a little more work to be done to modify the method to work with external traffic, which I talk about in the post.
>
> The solution basically consists of using Linux queuing disciplines to delay SYN packets for the duration of the restart. It can definitely be improved by further tuning the qdiscs or replacing the iptables mangle with a u8/u32 tc filter, but I decided it was better to talk about the idea and, if the community likes it, then we can optimize it further.
>
> -Joey
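The SO_REUSEPORT behavior David refers to, multiple processes holding listeners on the same port so a new process can bind before the old one exits, can be sketched directly (Linux has supported the option since kernel 3.9, though its load-distribution semantics differ from the BSDs'). A minimal illustration, not HAProxy code:

```python
# Sketch of the SO_REUSEPORT reload idea: two listening sockets (standing
# in for the old and new haproxy processes during a reload) bound to the
# same address and port at the same time, so the new listener can start
# accepting before the old one goes away. Requires Linux >= 3.9 or a BSD.
import socket

def bind_reuseport(port):
    """Create a TCP listener on 127.0.0.1:port with SO_REUSEPORT set."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    s.listen(16)
    return s

if __name__ == "__main__":
    old = bind_reuseport(0)            # "old" process's listener
    port = old.getsockname()[1]
    new = bind_reuseport(port)         # "new" process binds the same port
    print(old.getsockname()[1] == new.getsockname()[1])  # True
    old.close()
    new.close()
```

On Linux the kernel hashes incoming connections across all sockets sharing the port, which is part of why reload behavior there differs from the BSD semantics David mentions.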
Re: Achieving Zero Downtime Restarts at Yelp
Thanks for sharing this. This is a great and useful article!
Re: Achieving Zero Downtime Restarts at Yelp
Hi David,

On Mon, Apr 13, 2015 at 12:53 PM, David Birdsong david.birds...@gmail.com wrote:

> I'm curious, did you weigh the costs of simply converting your proxies to run on one of the BSDs? As I understand it, their implementation of SO_REUSEPORT would mean zero-downtime reloads just work as hoped-for/expected.

It was considered. Unfortunately, as part of our service-oriented architecture we run HAProxy on every machine and use it for routing requests to service instances, which means that we have to run on the same underlying platform that all our services run on, which is Linux. The sheer number of packages we'd have to port to run on a BSD was frankly a bit staggering, so we decided against it. We may have been able to work around this with proper containerization, but we're not quite there yet.

-Joey
Re: HA proxy - Need infromation
On Tue, Apr 14, 2015 at 12:55 AM, Thibault Labrut thibault.lab...@enioka.com wrote:

> Hello, I am currently installing HAProxy with keepalived for one of my clients. To facilitate the administration of this tool, I would like to know if you can recommend an administration web GUI for HAProxy.

Look for "stats" in the HAProxy documentation.

> Thank you for your help. Best regards,
> --
> Thibault Labrut
> enioka
> 24 galerie Saint-Marc
> 75002 Paris
> +33 615 700 935
> +33 144 618 314
Achieving Zero Downtime Restarts at Yelp
Hello,

I published an article today on Yelp's engineering blog (http://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html) that shows a technique we use for low-latency, zero-downtime restarts of HAProxy. This solves the "when I restart HAProxy some of my clients get RSTs" problems that can occur. We built it to solve the RSTs in our internal load balancing, so there is a little more work to be done to modify the method to work with external traffic, which I talk about in the post.

The solution basically consists of using Linux queuing disciplines to delay SYN packets for the duration of the restart. It can definitely be improved by further tuning the qdiscs or replacing the iptables mangle with a u8/u32 tc filter, but I decided it was better to talk about the idea and, if the community likes it, then we can optimize it further.

-Joey
HAProxy 1.4.18 performance issue
Hi,

I'm experiencing latency problems while running HAProxy 1.4.18. Our backend servers reply to HAProxy almost instantly (~4ms), but some of those replies are sent to the clients more than 100ms later. We have approx. 50k sessions open at any time, with an HTTP request coming in approximately every minute over each session.

I realize the version we're running is very old and should be updated. I'd like to know if there was a bug that could explain what we're seeing? If so, which version of HAProxy fixed it?

Best regards,
--
Michał Ślizak
DaftCode Sp. z o.o.
ul. Domaniewska 34a, 02-672 Warszawa
tel. +48 506 175 074
Re: HAProxy 1.4.18 performance issue
On Mon, Apr 13, 2015 at 6:09 PM, Lukas Tribus luky...@hotmail.com wrote:

>>>> Hi, I'm experiencing latency problems while running HAProxy 1.4.18. Our backend servers reply to HAProxy almost instantly (~4ms), but some of those replies are sent to the clients more than 100ms later. We have approx. 50k sessions open at any time, with an HTTP request coming in approximately every minute over each session.
>>>
>>> I suggest you try "option http-no-delay", but really try to understand the implications:
>>> http://cbonte.github.io/haproxy-dconv/configuration-1.4.html#option%20http-no-delay
>>
>> Thanks Lukas, "option http-no-delay" seems to have solved the problem.
>
> Good, but this will just hide the real problem and may cause others (as per the documentation). Both 1.4.23 and 1.4.20 fix latency-related (MSG_MORE/DONTWAIT) problems; also, if you always expect zero latency from the proxy then you are misusing HTTP. I strongly suggest you consider upgrading to the latest stable (either 1.4, or better yet 1.5) and retry without this option. You didn't provide your configuration, so it's not possible to tell if you are running into those already-fixed bugs, or if you simply need zero latency in all cases by application design.

Thank you for your suggestions, I'll discuss upgrading with our sysops team.

Unfortunately, misusing HTTP keep-alive is what we do ;-) We're running a real-time bidding system which has to reply to auctions (HTTP GET or POST requests) over thousands of connections within 75ms. Each connection doesn't carry much traffic, maybe one request per minute, and both the requests and our replies are very small. Waiting for a full reply packet will never work in our case. It would be much more efficient to decrease the number of concurrent connections and send requests more often, but unfortunately we have no control over this.

Best regards,
--
Michał Ślizak
DaftCode Sp. z o.o.
ul. Domaniewska 34a, 02-672 Warszawa
tel. +48 506 175 074
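For readers following the "option http-no-delay" discussion: that option stops HAProxy from deliberately coalescing small segments (its MSG_MORE usage). The analogous latency knob on a plain TCP socket is Nagle's algorithm, disabled with TCP_NODELAY; a minimal sketch (not HAProxy code) of the same small-message trade-off the bidding workload above runs into:

```python
# Small, latency-sensitive messages like the bidding replies described
# above are exactly the case where segment coalescing hurts. On a raw
# TCP socket the equivalent of "option http-no-delay" is disabling
# Nagle's algorithm via TCP_NODELAY. Sketch only, not HAProxy code.
import socket

def make_low_latency(sock):
    """Disable Nagle so small writes are sent immediately."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    make_low_latency(s)
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))  # nonzero
    s.close()
```

As Lukas notes, forcing immediate sends everywhere trades throughput for latency, which is why the option carries a documentation warning rather than being the default.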
RE: HAProxy 1.4.18 performance issue
On Mon, Apr 13, 2015 at 4:58 PM, Lukas Tribus luky...@hotmail.com wrote:

>>> Hi, I'm experiencing latency problems while running HAProxy 1.4.18. Our backend servers reply to HAProxy almost instantly (~4ms), but some of those replies are sent to the clients more than 100ms later. We have approx. 50k sessions open at any time, with an HTTP request coming in approximately every minute over each session.
>>
>> I suggest you try "option http-no-delay", but really try to understand the implications:
>> http://cbonte.github.io/haproxy-dconv/configuration-1.4.html#option%20http-no-delay
>
> Thanks Lukas, "option http-no-delay" seems to have solved the problem.

Good, but this will just hide the real problem and may cause others (as per the documentation). Both 1.4.23 and 1.4.20 fix latency-related (MSG_MORE/DONTWAIT) problems; also, if you always expect zero latency from the proxy then you are misusing HTTP. I strongly suggest you consider upgrading to the latest stable (either 1.4, or better yet 1.5) and retry without this option. You didn't provide your configuration, so it's not possible to tell if you are running into those already-fixed bugs, or if you simply need zero latency in all cases by application design.

Lukas
HA proxy - Need infromation
Hello,

I am currently installing HAProxy with keepalived for one of my clients. To facilitate the administration of this tool, I would like to know if you can recommend an administration web GUI for HAProxy.

Thank you for your help.

Best regards,
--
Thibault Labrut
enioka
24 galerie Saint-Marc
75002 Paris
+33 615 700 935
+33 144 618 314
RE: HAProxy 1.4.18 performance issue
> Hi, I'm experiencing latency problems while running HAProxy 1.4.18. Our backend servers reply to HAProxy almost instantly (~4ms), but some of those replies are sent to the clients more than 100ms later. We have approx. 50k sessions open at any time, with an HTTP request coming in approximately every minute over each session.

I suggest you try "option http-no-delay", but really try to understand the implications:
http://cbonte.github.io/haproxy-dconv/configuration-1.4.html#option%20http-no-delay

Lukas
Re: HAProxy 1.4.18 performance issue
On Mon, Apr 13, 2015 at 4:58 PM, Lukas Tribus luky...@hotmail.com wrote:

>> Hi, I'm experiencing latency problems while running HAProxy 1.4.18. Our backend servers reply to HAProxy almost instantly (~4ms), but some of those replies are sent to the clients more than 100ms later. We have approx. 50k sessions open at any time, with an HTTP request coming in approximately every minute over each session.
>
> I suggest you try "option http-no-delay", but really try to understand the implications:
> http://cbonte.github.io/haproxy-dconv/configuration-1.4.html#option%20http-no-delay

Thanks Lukas, "option http-no-delay" seems to have solved the problem.

--
Michał Ślizak
DaftCode Sp. z o.o.
ul. Domaniewska 34a, 02-672 Warszawa
tel. +48 506 175 074