httpchk failures

2015-04-13 Thread Benjamin Smith
We have 5 Apache servers behind haproxy and we're trying to enable the 
httpchk option along with some performance monitoring. For some reason, 
haproxy keeps thinking that 3 of the 5 Apache servers are down, even though 
haproxy is clearly sending the checks and the servers are clearly 
answering them. 

Is there a way to log httpchk failures? How can I ask haproxy why it seems to 
think that several apache servers are down? 

Our config: 
CentOS 6.x recently updated, 64 bit. 

Performing an agent-check manually seems to give good results. The below 
result is immediate: 
[root@xr1 ~]# telnet 10.1.1.12 9333 
Trying 10.1.1.12...
Connected to 10.1.1.12.
Escape character is '^]'.
up 78%
Connection closed by foreign host.
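
The manual telnet check above can also be scripted. What follows is a minimal Python sketch, not part of the original setup. The helper names `read_agent_reply` and `parse_agent_reply` are hypothetical, and the set of state words is an assumption taken from the agent-check documentation; it may differ between HAProxy versions.

```python
import socket

# Assumed set of agent state words; check the documentation for your version.
KNOWN_STATES = {"up", "down", "drain", "failed", "stopped"}


def read_agent_reply(host, port, timeout=2.0):
    """Connect to the agent port and return the first reply line as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rb") as stream:
            data = stream.readline()
    return data.decode("ascii", errors="replace").strip()


def parse_agent_reply(reply):
    """Split a reply such as 'up 78%' into (state, weight_percent).

    Either element is None when the reply does not contain it.
    """
    state, weight = None, None
    for word in reply.replace(",", " ").split():
        if word.endswith("%") and word[:-1].isdigit():
            weight = int(word[:-1])
        elif word.lower() in KNOWN_STATES:
            state = word.lower()
    return state, weight
```

For the server in this report, `parse_agent_reply(read_agent_reply("10.1.1.12", 9333))` would show exactly which state word and percentage the agent hands to haproxy on each poll.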


I can see that xinetd on the logic server got the response: 
Apr 13 18:45:02 curie xinetd[21890]: EXIT: calcload333 status=0 pid=25693 duration=0(sec)
Apr 13 18:45:06 curie xinetd[21890]: START: calcload333 pid=26590 from=:::10.1.1.1


I can see that apache is serving happy replies to the load balancer: 
[root@curie ~]# tail -f /var/log/httpd/access_log | grep -i 10.1.1.1 
10.1.1.1 - - [13/Apr/2015:18:47:15 +] OPTIONS / HTTP/1.0 302 - - -
10.1.1.1 - - [13/Apr/2015:18:47:17 +] OPTIONS / HTTP/1.0 302 - - -
10.1.1.1 - - [13/Apr/2015:18:47:19 +] OPTIONS / HTTP/1.0 302 - - -
^C

Why is haproxy pulling this server out of the queue and setting status to 
DRAIN? Here's the stats output for the server: 


[pxname] = logic333
[svname] = server12
[qcur] = 0
[qmax] = 0
[scur] = 0
[smax] = 0
[slim] = 256
[stot] = 0
[bin] = 0
[bout] = 0
[dreq] = 
[dresp] = 0
[ereq] = 
[econ] = 0
[eresp] = 0
[wretr] = 0
[wredis] = 0
[status] = DRAIN
[weight] = 0
[act] = 1
[bck] = 0
[chkfail] = 0
[chkdown] = 0
[lastchg] = 997
[downtime] = 0
[qlimit] = 
[pid] = 1
[iid] = 5
[sid] = 3
[throttle] = 
[lbtot] = 0
[tracked] = 
[type] = 2
[rate] = 0
[rate_lim] = 
[rate_max] = 0
[check_status] = L7OK
[check_code] = 302
[check_duration] = 16
[hrsp_1xx] = 0
[hrsp_2xx] = 0
[hrsp_3xx] = 0
[hrsp_4xx] = 0
[hrsp_5xx] = 0
[hrsp_other] = 0
[hanafail] = 0
[req_rate] = 
[req_rate_max] = 
[req_tot] = 
[cli_abrt] = 0
[srv_abrt] = 0
[comp_in] = 
[comp_out] = 
[comp_byp] = 
[comp_rsp] = 
[lastsess] = -1
[last_chk] = Found
[last_agt] = via agent : up
[qtime] = 0
[ctime] = 0
[rtime] = 0
[ttime] = 0
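
A dump like the one above is easier to read when the fields that disagree are pulled out. The following Python sketch assumes the `[field] = value` layout shown here; `parse_stats_dump` and `summarize` are hypothetical helper names, and the interpretation in the notes (a weight of 0 keeps new traffic away from the server) is stated as a general expectation, not as a diagnosis of this setup.

```python
import re

# Matches lines such as "[status] = DRAIN" or "[dreq] = " (empty value).
FIELD_RE = re.compile(r"^\s*\[(\w+)\]\s*=\s*(.*?)\s*$")


def parse_stats_dump(text):
    """Turn a '[field] = value' dump into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def summarize(fields):
    """Return human-readable notes about fields that look contradictory."""
    notes = []
    check = fields.get("check_status", "")
    status = fields.get("status", "")
    if check.endswith("OK") and status != "UP":
        notes.append("health check passes (%s) but status is %s" % (check, status))
    if fields.get("weight") == "0":
        notes.append("weight is 0, so no new traffic should be balanced here")
    if fields.get("last_agt"):
        notes.append("last agent reply: %s" % fields["last_agt"])
    return notes
```

Applied to the dump above, this would report that the HTTP check passes (L7OK, chkfail 0) while the status is DRAIN with weight 0, which points away from httpchk itself and toward whatever sets the weight.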




[root@xr1 ~]# haproxy -v
HA-Proxy version 1.5.6 2014/10/18
Copyright 2000-2014 Willy Tarreau w...@1wt.eu


relevant parts of /etc/haproxy/haproxy.conf 
-- SNIP -- 
frontend FrontProd333
mode http
option httplog
option dontlognull
option forwardfor
bind [MYIP]:443 ssl crt star.mydomain.com.pem no-sslv3
default_backend logic333
stats uri /haproxy?stats
stats realm StrictlyPrivate
stats auth [SNIP]
errorfile 503 /etc/haproxy/errorfile.http

backend logic333
  mode http
  option httpchk OPTIONS /
  server server10 10.1.1.10:20333 maxconn 256 check agent-check agent-port 9333 agent-inter 4000
  server server11 10.1.1.11:20333 maxconn 256 check agent-check agent-port 9333 agent-inter 4000
  server server12 10.1.1.12:20333 maxconn 256 check agent-check agent-port 9333 agent-inter 4000
  server server13 10.1.1.13:20333 maxconn 256 check agent-check agent-port 9333 agent-inter 4000
  server server14 10.1.1.14:20333 maxconn 256 check agent-check agent-port 9333 agent-inter 4000
-- /SNIP -- 


Is there any other information I could provide to help resolve? 

Thanks, 
Ben Smith 



Re: tcp reset errors

2015-04-13 Thread Steve
Willy Tarreau w at 1wt.eu writes:

 
 Hi Franky,
 
 On Thu, Sep 11, 2014 at 01:08:09PM +0200, Franky Van Liedekerke wrote:
  On Thu, Sep 11, 2014 at 11:40 AM, Franky Van Liedekerke
  liedekef@... wrote:
   After doing tcpdump on both servers (no ldap errors anywhere in the
   ldap logs), I see that the ldap server sends out resets and the
   clients connecting to haproxy. This might be related to one another.
   Each client seems to send 2 RST packets at the end of a LDAP TLS
   session (over port 389), does that sound familiar?
  
   Franky
  
  Ok, after much trial and error, I pinned it down to the following: we
  have lots of servers doing ldap lookup for authentication, also when
  connecting via ssh. Now on EL5 servers this auth is done via a call to
  /usr/libexec/openssh/ssh-ldap-wrapper.
  Apparently this binary causes the resets to be shown in the haproxy
  error logs. I switched to the sssd version for EL5 servers, but that
  version did not include ssh-keys support, so the resets persisted.
  Again to the internet for the rescue: the version 1.9.6 for el5 can be
  found at
http://copr-be.cloud.fedoraproject.org/results/sgallagh/sssd-1.9-rhel5/epel-5-x86_64
  , and that version does support ssh correctly. Installing it, changing
  the ssh config et voila: no more resets.
  So the bug is in the ssh-ldap-wrapper, but I understand that doing a
  RST at the end is not bad, just not good either ... the side-effect
  of the new sssd is that far fewer ldap queries are made (as sudo and
  ssh use sssd too then), but I'll leave it up to the management to
  decide whether or not to go for that solution.
 
 Thanks for sending the details of your diagnostic. As you say, RST are
 not necessarily bad. When a client closes first, it has two options :
   - either send RST
   - or have the source port unusable for 2 minutes.
 
 Most of the time you choose the first option. In your case since you were
 seeing SD flags, it means the reset came from the server, maybe the client
 was speaking inappropriately on the connection, causing the server to
 abort it. If so, it proves that the behaviour was properly chosen, because
 it allowed you to detect the anomaly in the logs and to fix it, which is
 quite good.
 
 Regards,
 Willy
 
 

I have been having a similar issue with RST on LDAP connections but have
not been able to pin it down any further than haproxy. LDAP seems to be the
only issue at the moment, although I had identical symptoms with SMTP that
were resolved after I lowered the MTU on the interface.

I have a mail appliance on one side of this haproxy and AD LDAP on the other:
spam_filter -- haproxy(service) -- AD LDAP
When I run an LDAP test, it doesn't seem to matter if it is plain text or
SSL: I will randomly get a RST sent by the server during transfer. I have
run the LDAP test dozens of times and there is no pattern. About 50% of the
runs fail mid-stream, and a packet capture shows a RST from the server.

To test further, I routed the same test through the haproxy machine using
ip_forwarding, to the same destination as the haproxy backend, and all tests
succeeded 100%.
spam_filter -- haproxy(ip_forward) -- AD LDAP

Does anyone have any advice on what to check next?

My haproxy.cfg only defines mode tcp. Do I require other options for LDAP?

I need LDAPS and have found that the option ldap-check does not work, but
LDAP vs LDAPS does not affect my problem.

Thanks
Steve





Re: Achieving Zero Downtime Restarts at Yelp

2015-04-13 Thread David Birdsong
Wow, this is a really informative blog post. Thanks for sharing!

I'm curious, did you weigh the costs of simply converting your proxies to
run on one of the BSDs? As I understand it, their implementation of
SO_REUSEPORT would mean zero downtime reloads just work as
hoped-for/expected.

On Mon, Apr 13, 2015 at 10:24 AM, Joseph Lynch joe.e.ly...@gmail.com
wrote:

 Hello,

 I published an article today on Yelp's engineering blog (
 http://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html)
 that shows a technique we use for low latency, zero downtime restarts of
 HAProxy. This solves the "when I restart HAProxy some of my clients get
 RSTs" problems that can occur. We built it to solve the RSTs in our
 internal load balancing, so there is a little more work to be done to
 modify the method to work with external traffic, which I talk about in the
 post.

 The solution basically consists of using Linux queuing disciplines to
 delay SYN packets for the duration of the restart. It can definitely be
 improved by further tuning the qdiscs or replacing the iptables mangle with
 a u8/u32 tc filter, but I decided it was better to talk about the idea and
 if the community likes it, then we can optimize it further.

 -Joey



Re: Achieving Zero Downtime Restarts at Yelp

2015-04-13 Thread Nicolas Grilly
Thanks for sharing this. This is a great and useful article!


Re: Achieving Zero Downtime Restarts at Yelp

2015-04-13 Thread Joseph Lynch
Hi David,

On Mon, Apr 13, 2015 at 12:53 PM, David Birdsong
david.birds...@gmail.com wrote:
 I'm curious, did you weigh the costs of simply converting your proxies to
 run on one of the BSDs? As I understand it, their implementation of
 SO_REUSEPORT would mean zero downtime reloads just work as
 hoped-for/expected.

It was considered. Unfortunately, as part of our service oriented
architecture we run HAProxy on every machine and use it for routing
requests to service instances, which means that we have to run on the
same underlying platform that all our services run on, which is Linux.
The sheer number of packages we'd have to port to run on a BSD was
frankly a bit staggering so we decided against it. We may have been
able to work around this with proper containerization but we're not
quite there yet.

-Joey



Re: HA proxy - Need infromation

2015-04-13 Thread Igor Cicimov
On Tue, Apr 14, 2015 at 12:55 AM, Thibault Labrut 
thibault.lab...@enioka.com wrote:

 Hello,

 I am currently installing HAProxy with keepalived for one of my clients.

 To make this tool easier to administer, I would like to know if you
 can recommend a web administration GUI for HAProxy.


Look for stats in the HAP documentation.



 Thank you for your help.

 Best regards,
 --
 Thibault Labrut
 enioka
 24 galerie Saint-Marc
 75002 Paris
 +33 615 700 935
 +33 144 618 314



Achieving Zero Downtime Restarts at Yelp

2015-04-13 Thread Joseph Lynch
Hello,

I published an article today on Yelp's engineering blog (
http://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html)
that shows a technique we use for low latency, zero downtime restarts of
HAProxy. This solves the "when I restart HAProxy some of my clients get
RSTs" problems that can occur. We built it to solve the RSTs in our
internal load balancing, so there is a little more work to be done to
modify the method to work with external traffic, which I talk about in the
post.

The solution basically consists of using Linux queuing disciplines to delay
SYN packets for the duration of the restart. It can definitely be improved
by further tuning the qdiscs or replacing the iptables mangle with a u8/u32
tc filter, but I decided it was better to talk about the idea and if the
community likes it, then we can optimize it further.
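
The idea of holding SYN packets during the restart and releasing them afterwards can be pictured with a toy model. The Python sketch below is only an illustration of the buffering behaviour under simple assumptions; it is not the qdisc or iptables setup from the article, and the class name `PlugQueue` and the packet dictionaries are hypothetical.

```python
from collections import deque


class PlugQueue:
    """Toy model: hold SYN packets while plugged, pass everything else."""

    def __init__(self):
        self.plugged = False
        self.buffer = deque()   # SYNs held back during the restart
        self.delivered = []     # packets handed on to the listener

    def plug(self):
        """Start buffering new connection attempts (restart begins)."""
        self.plugged = True

    def unplug(self):
        """Release buffered SYNs in arrival order (restart finished)."""
        self.plugged = False
        while self.buffer:
            self.delivered.append(self.buffer.popleft())

    def enqueue(self, packet):
        """Deliver a packet now, or hold it if it is a SYN and we are plugged."""
        if self.plugged and packet.get("syn"):
            self.buffer.append(packet)
        else:
            self.delivered.append(packet)
```

In this model a client that connects during the restart sees extra latency on its SYN but never a RST, while packets on already established connections are not delayed.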

-Joey


HAProxy 1.4.18 performance issue

2015-04-13 Thread Michał Ślizak
Hi,

I'm experiencing latency problems while running HAProxy 1.4.18.

Our backend servers reply to HAProxy almost instantly (~4ms), but some of
those replies are sent to the clients more than 100ms later.

We have approx. 50k sessions opened at any time, with a HTTP request coming
in approximately every minute over each session.

I realize the version we're running is very old and should be updated.
I'd like to know if there was a bug that could explain what we're seeing?
If so, which version of HAProxy fixed it?

Best regards,

-- 

*Michał Ślizak*
DaftCode Sp. z o.o.
ul. Domaniewska 34a, 02-672 Warszawa
tel. +48 506 175 074


Re: HAProxy 1.4.18 performance issue

2015-04-13 Thread Michał Ślizak
On Mon, Apr 13, 2015 at 6:09 PM, Lukas Tribus luky...@hotmail.com wrote:

  On Mon, Apr 13, 2015 at 4:58 PM, Lukas Tribus
  luky...@hotmail.com wrote:
  Hi,
 
  I'm experiencing latency problems while running HAProxy 1.4.18.
 
  Our backend servers reply to HAProxy almost instantly (~4ms), but some
  of those replies are sent to the clients more than 100ms later.
 
  We have approx. 50k sessions opened at any time, with a HTTP request
  coming in approximately every minute over each session.
 
  I suggest you try option http-no-delay but really try to understand
  the implications:
 
 http://cbonte.github.io/haproxy-dconv/configuration-1.4.html#option%20http-no-delay

 
 
  Thanks Lucas, option http-no-delay seems to have solved the problem.

 Good, but this will just hide the real problem and may cause others (as
 per the documentation).
 Both 1.4.23 and 1.4.20 fix latency (MSG_MORE/DONTWAIT) related problems.
 Also, if you always expect zero latency from the proxy, then you are
 misusing HTTP.

 I strongly suggest you consider upgrading to the latest stable (either 1.4
 or, better yet, 1.5) and retrying without this option.

 You didn't provide your configuration, so it's not possible to tell if you
 are running into those already fixed bugs, or if you simply need zero
 latency in all cases per application design.



Thank you for your suggestions, I'll discuss upgrading with our sysops team.

Unfortunately misusing HTTP keep-alive is what we do ;-)
We're running a real-time bidding system which has to reply to auctions
(HTTP GET or POST requests) over thousands of connections within 75ms.
Each connection doesn't carry much traffic, maybe one request per minute,
and both the requests and our replies are very small.
Waiting for a full reply packet will never work in our case.
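
The trade-off discussed here, small replies waiting in the hope of being merged into fuller packets, also exists at the plain socket level. The Python sketch below shows the general technique of disabling that coalescing with TCP_NODELAY on an ordinary TCP socket. It is an analogy only and makes no claim about how HAProxy implements `option http-no-delay`; the helper names are hypothetical.

```python
import socket


def make_low_latency_socket():
    """Create a TCP socket that sends small writes immediately (Nagle off)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nodelay_enabled(sock):
    """Report whether TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
```

Sending each small reply at once costs more packets per byte, which is the price paid for meeting a tight per-request deadline.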

It would be much more efficient to decrease the number of concurrent
connections and send requests more often, but unfortunately we have no
control over this.

Best regards,

-- 

*Michał Ślizak*
DaftCode Sp. z o.o.
ul. Domaniewska 34a, 02-672 Warszawa
tel. +48 506 175 074


RE: HAProxy 1.4.18 performance issue

2015-04-13 Thread Lukas Tribus
 On Mon, Apr 13, 2015 at 4:58 PM, Lukas Tribus 
 luky...@hotmail.com wrote: 
 Hi, 
 
 I'm experiencing latency problems while running HAProxy 1.4.18. 
 
 Our backend servers reply to HAProxy almost instantly (~4ms), but some 
 of those replies are sent to the clients more than 100ms later. 
 
 We have approx. 50k sessions opened at any time, with a HTTP request 
 coming in approximately every minute over each session. 
 
 I suggest you try option http-no-delay but really try to understand 
 the implications: 
 http://cbonte.github.io/haproxy-dconv/configuration-1.4.html#option%20http-no-delay
  
 
 
 Thanks Lucas, option http-no-delay seems to have solved the problem.

Good, but this will just hide the real problem and may cause others (as per
the documentation).
Both 1.4.23 and 1.4.20 fix latency (MSG_MORE/DONTWAIT) related problems.
Also, if you always expect zero latency from the proxy, then you are
misusing HTTP.

I strongly suggest you consider upgrading to the latest stable (either 1.4
or, better yet, 1.5) and retrying without this option.

You didn't provide your configuration, so it's not possible to tell if you
are running into those already fixed bugs, or if you simply need zero
latency in all cases per application design.



Lukas

  


HA proxy - Need infromation

2015-04-13 Thread Thibault Labrut
Hello,

I am currently installing HAProxy with keepalived for one of my clients.

To make this tool easier to administer, I would like to know if you
can recommend a web administration GUI for HAProxy.

Thank you for your help.

Best regards,
-- 
Thibault Labrut
enioka
24 galerie Saint-Marc
75002 Paris
+33 615 700 935
+33 144 618 314




RE: HAProxy 1.4.18 performance issue

2015-04-13 Thread Lukas Tribus
 Hi, 
 
 I'm experiencing latency problems while running HAProxy 1.4.18. 
 
 Our backend servers reply to HAProxy almost instantly (~4ms), but some 
 of those replies are sent to the clients more than 100ms later. 
 
 We have approx. 50k sessions opened at any time, with a HTTP request 
 coming in approximately every minute over each session. 

I suggest you try option http-no-delay but really try to understand
the implications:
http://cbonte.github.io/haproxy-dconv/configuration-1.4.html#option%20http-no-delay



Lukas

  


Re: HAProxy 1.4.18 performance issue

2015-04-13 Thread Michał Ślizak
On Mon, Apr 13, 2015 at 4:58 PM, Lukas Tribus luky...@hotmail.com wrote:

  Hi,
 
  I'm experiencing latency problems while running HAProxy 1.4.18.
 
  Our backend servers reply to HAProxy almost instantly (~4ms), but some
  of those replies are sent to the clients more than 100ms later.
 
  We have approx. 50k sessions opened at any time, with a HTTP request
  coming in approximately every minute over each session.

 I suggest you try option http-no-delay but really try to understand
 the implications:

 http://cbonte.github.io/haproxy-dconv/configuration-1.4.html#option%20http-no-delay


Thanks Lucas, option http-no-delay seems to have solved the problem.

-- 

*Michał Ślizak*
DaftCode Sp. z o.o.
ul. Domaniewska 34a, 02-672 Warszawa
tel. +48 506 175 074