Re: Help! HAProxy randomly failing health checks!
Hey guys, just following up. Still running into the issue. From: Zachary Punches <zpunc...@getcake.com<mailto:zpunc...@getcake.com>> Date: Friday, March 18, 2016 at 6:07 PM To: Igor Cicimov <ig...@encompasscorporation.com<mailto:ig...@encompasscorporation.com>> Cc: Baptiste <bed...@gmail.com<mailto:bed...@gmail.com>>, "haproxy@formilux.org<mailto:haproxy@formilux.org>" <haproxy@formilux.org<mailto:haproxy@formilux.org>> Subject: Re: Help! HAProxy randomly failing health checks! Ok! Here is a bunch of info that might better assist with the issue: Each of our clients has an HAProxy install that forwards requests for 80 and 443 to 1025 and 1026 respectively. These requests are forwarded over TCP using proxy protocol to our HAP instances. Our HAP instances then SSL term the request and forward them off to our backend on port 80. See attached diagram which should better explain the entire flow. During an outage due to the SSL handshakes failing, I was running TCPDump so I could look through and discover what was causing the failure, I was able to discover that we are receiving connection resets on some SSL connections. We then tested all the SSL certs from our client side to our side to verify that there is not a mismatched cert. This test was completed with no issues. Here is a connection reset packet I found in that TCP Dump 29525 158.09621710.1.4.11954.239.21.251TCP5438740 → 443 [RST] Seq=3533 Win=0 Len=0 Frame 29523: 54 bytes on wire (432 bits), 54 bytes captured (432 bits) Encapsulation type: Ethernet (1) Arrival Time: Mar 17, 2016 14:58:07.34584 PDT [Time shift for this packet: 0.0 seconds] Epoch Time: 1458251887.34584 seconds [Time delta from previous captured frame: 0.2 seconds] [Time delta from previous displayed frame: 0.021655000 seconds] [Time since reference or first frame: 158.096184000 seconds] Frame Number: 29523 Frame Length: 54 bytes (432 bits) Capture Length: 54 bytes (432 bits) [Frame is marked: False] [Frame is ignored: False] [Protocols in frame: eth:ethertype:ip:tcp] [Coloring Rule Name: TCP RST] [Coloring Rule String: tcp.flags.reset eq 1] Ethernet II, Src: 12:8d:18:05:0f:91 (12:8d:18:05:0f:91), Dst: 1e:8f:a6:6b:52:58 (1e:8f:a6:6b:52:58) Destination: 1e:8f:a6:6b:52:58 (1e:8f:a6:6b:52:58) Address: 1e:8f:a6:6b:52:58 (1e:8f:a6:6b:52:58) ..1. = LG bit: Locally administered address (this is NOT the factory default) ...0 = IG bit: Individual address (unicast) Source: 12:8d:18:05:0f:91 (12:8d:18:05:0f:91) Address: 12:8d:18:05:0f:91 (12:8d:18:05:0f:91) ..1. = LG bit: Locally administered address (this is NOT the factory default) ...0 = IG bit: Individual address (unicast) Type: IPv4 (0x0800) Internet Protocol Version 4, Src: $SRC IP Dst: $DST IP 0100 = Version: 4 0101 = Header Length: 20 bytes Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT) 00.. = Differentiated Services Codepoint: Default (0) ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0) Total Length: 40 Identification: 0x5f56 (24406) Flags: 0x02 (Don't Fragment) 0... = Reserved bit: Not set .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: TCP (6) Header checksum: 0x8018 [validation disabled] [Good: False] [Bad: False] Source: $SourceIP Destination: $DestinationIP [Source GeoIP: Unknown] [Destination GeoIP: Unknown] Transmission Control Protocol, Src Port: 38740 (38740), Dst Port: 443 (443), Seq: 3533, Len: 0 Source Port: 38740 Destination Port: 443 [Stream index: 2799] [TCP Segment Len: 0] Sequence number: 3533(relative sequence number) Acknowledgment number: 0 Header Length: 20 bytes Flags: 0x004 (RST) 000. = Reserved: Not set ...0 = Nonce: Not set 0... = Congestion Window Reduced (CWR): Not set .0.. = ECN-Echo: Not set ..0. = Urgent: Not set ...0 = Acknowledgment: Not set 0... = Push: Not set .1.. = Reset: Set [Expert Info (Warn/Sequence): Connection reset (RST)] [Connection reset (RST)] [Severity level: Warn] [Group: Sequence] ..0. = Syn: Not set ...0 = Fin: Not set [TCP Flags: *R**] Window size value: 0 [Calculated window size: 0] [Window size scaling factor: 128] Checksum: 0x5c2f [validation disabled] [Good Checksum: False] [Bad Che
Re: Help! HAProxy randomly failing health checks!
hake Mar 17 18:37:45 localhost haproxy[28703]: 54.218.28.138:60376 [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL handshake Mar 17 18:37:45 localhost haproxy[28703]: 68.116.153.225:57824 [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL handshake Mar 17 18:37:45 localhost haproxy[28703]: 54.218.28.138:60365 [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL handshake Mar 17 18:37:45 localhost haproxy[28703]: 54.218.28.138:60364 [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL handshake Mar 17 18:37:45 localhost haproxy[28703]: 54.218.28.138:60362 [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL handshake Mar 17 18:37:45 localhost haproxy[28703]: 208.100.26.237:37490 [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL handshake Mar 17 18:37:45 localhost haproxy[28703]: 108.59.8.48:43566 [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL handshake Mar 17 18:37:45 localhost haproxy[28703]: 208.100.26.237:59763 [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL handshake Mar 17 18:37:45 localhost haproxy[28703]: 208.100.26.237:59760 [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL handshake Mar 17 18:37:45 localhost haproxy[28703]: 54.218.28.138:60319 [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL handshake Mar 17 18:37:45 localhost haproxy[28703]: 54.218.28.138:60299 [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL handshake Mar 17 18:37:45 localhost haproxy[28703]: 54.218.28.138:60293 [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL handshake Mar 17 18:37:45 localhost haproxy[28703]: 54.218.28.138:60292 [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL handshake Mar 17 18:37:45 localhost haproxy[28703]: 54.218.28.138:60284 [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL handshake Mar 17 18:37:45 localhost haproxy[28703]: 54.218.28.138:60282 [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL handshake Mar 17 18:37:45 localhost haproxy[28703]: 208.100.26.237:38664 [17/Mar/2016:18:37:45.736] shared_incoming/2: SSL handshake failure Mar 17 18:37:45 localhost haproxy[28703]: 54.218.28.138:60270 [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL handshake Mar 17 18:37:45 localhost haproxy[28703]: 208.100.26.237:33270 [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL handshake Mar 17 18:37:45 localhost haproxy[28703]: 208.100.26.237:33273 [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL handshake Mar 17 18:37:45 localhost haproxy[28703]: 54.218.28.138:60089 [17/Mar/2016:18:37:06.938] shared_incoming~ shared_incoming/ -1/-1/-1/-1/0 400 187 - - CR-- 314/314/0/0/0 0/0 "" Mar 17 18:37:45 localhost haproxy[28703]: 109.154.74.227:53964 [17/Mar/2016:18:37:06.938] shared_incoming shared_incoming/ -1/-1/-1/-1/0 400 0 - - CR-- 313/313/0/0/0 0/0 "" Mar 17 18:37:45 localhost haproxy[28703]: 66.87.151.25:3325 [17/Mar/2016:18:37:06.938] shared_incoming shared_incoming/ -1/-1/-1/-1/0 400 0 - - CR-- 312/312/0/0/0 0/0 "" Mar 17 18:37:45 localhost haproxy[28703]: 108.59.8.48:33611 [17/Mar/2016:18:36:55.938] shared_incoming provedmedia/provedmedia_http 279/0/0/-1/279 -1 0 - - CD-- 311/311/91/91/0 0/0 "GET /?a=61=22008=7346_0_1=1_0_0_0_0_2102824_0_571_61811_0_0 HTTP/1.1" From: Igor Cicimov <ig...@encompasscorporation.com<mailto:ig...@encompasscorporation.com>> Date: Wednesday, March 16, 2016 at 5:01 PM To: Zachary Punches <zpunc...@getcake.com<mailto:zpunc...@getcake.com>> Cc: Baptiste <bed...@gmail.com<mailto:bed...@gmail.com>>, "haproxy@formilux.org<mailto:haproxy@formilux.org>" <haproxy@formilux.org<mailto:haproxy@formilux.org>> Subject: Re: Help! HAProxy randomly failing health checks! On Thu, Mar 17, 2016 at 10:55 AM, Zachary Punches <zpunc...@getcake.com<mailto:zpunc...@getcake.com>> wrote: Thanks for the reply! Ok so based on what you saw in my config, does it look like we’re misconfigured enough to cause this to happen? If we were misconfigured, one would assume we would go down all the time yeah? From: Igor Cicimov <ig...@encompasscorporation.com<mailto:ig...@encompasscorporation.com>> Date: Wednesday, March 16, 2016 at 4:50 PM To: Zachary Punches <zpunc...@getcake.com<mailto:zpunc...@getcake.com>> Cc: Baptiste <bed...@gmail.com<mailto:bed...@gmail.com>>, "haproxy@formilux.org<mailto:haproxy@formilux.org>" <haproxy@formilux.org<mailto:haproxy@formilux.org>> Subject: Re: Help! HAProxy randomly failing health checks! On Thu, Mar 17, 2016 at 10:47 AM, Igor Ci
Re: Help! HAProxy randomly failing health checks!
On Thu, Mar 17, 2016 at 10:47 AM, Igor Cicimov < ig...@encompasscorporation.com> wrote: > > > On Thu, Mar 17, 2016 at 5:29 AM, Zachary Punches> wrote: > >> I’m not, these guys aren’t sitting behind an ELB. They sit behind route53 >> routing. If one of the proxy boxes fails 3 checks in 30 seconds (with 4 >> checks done a second) then Route53 changes its routing from the first proxy >> box to the second >> >> >> >> >> On 3/15/16, 9:46 PM, "Baptiste" wrote: >> >> >Maybe you're checking a third party VM :) >> > >> > > AFAIK the Route53 health checks come from different points around the > globe and it is possible that at some time of the day AWS has scheduled > some specific end points to perform the HC. And it is possible that those > ones have different SSL settings from the ones performing the HC during > your day time. I would suggest you bring up this issue with AWS support, > let them know your SSL cypher settings in HAP and ask if they are > compatible with ALL their servers performing SSL health checks. > > I personally haven't seen any issues with failed SSL handshakes coming > from AWS servers and have HAP's running in AU and UK regions. > > Igor > That is if you are absolutely sure that the failed handshakes are not caused by overload or misconfigured (system) settings on HAP
Re: Help! HAProxy randomly failing health checks!
On Thu, Mar 17, 2016 at 10:55 AM, Zachary Punches <zpunc...@getcake.com> wrote: > Thanks for the reply! > > Ok so based on what you saw in my config, does it look like we’re > misconfigured enough to cause this to happen? > > If we were misconfigured, one would assume we would go down all the time > yeah? > > From: Igor Cicimov <ig...@encompasscorporation.com> > Date: Wednesday, March 16, 2016 at 4:50 PM > To: Zachary Punches <zpunc...@getcake.com> > Cc: Baptiste <bed...@gmail.com>, "haproxy@formilux.org" < > haproxy@formilux.org> > Subject: Re: Help! HAProxy randomly failing health checks! > > > > On Thu, Mar 17, 2016 at 10:47 AM, Igor Cicimov < > ig...@encompasscorporation.com> wrote: > >> >> >> On Thu, Mar 17, 2016 at 5:29 AM, Zachary Punches <zpunc...@getcake.com> >> wrote: >> >>> I’m not, these guys aren’t sitting behind an ELB. They sit behind >>> route53 routing. If one of the proxy boxes fails 3 checks in 30 seconds >>> (with 4 checks done a second) then Route53 changes its routing from the >>> first proxy box to the second >>> >>> >>> >>> >>> On 3/15/16, 9:46 PM, "Baptiste" <bed...@gmail.com> wrote: >>> >>> >Maybe you're checking a third party VM :) >>> > >>> >> >> AFAIK the Route53 health checks come from different points around the >> globe and it is possible that at some time of the day AWS has scheduled >> some specific end points to perform the HC. And it is possible that those >> ones have different SSL settings from the ones performing the HC during >> your day time. I would suggest you bring up this issue with AWS support, >> let them know your SSL cypher settings in HAP and ask if they are >> compatible with ALL their servers performing SSL health checks. >> >> I personally haven't seen any issues with failed SSL handshakes coming >> from AWS servers and have HAP's running in AU and UK regions. >> >> Igor >> > > That is if you are absolutely sure that the failed handshakes are not > caused by overload or misconfigured (system) settings on HAP > > I was saying this in regards to system (kernel) settings. For example, assuming Unix/Linux is your net.core.somaxconn actually set *higher* than your maxconn which is set to 3 and 15000 in HAP? Any other kernel settings you might have changed? (output of "sysctl -p" command) What is your pick hour load, how many connections/sessions are you seeing on each HAP? Another suggestion is maybe set tune.ssl.default-dh-param to 1024 and see if that helps.
Re: Help! HAProxy randomly failing health checks!
I went ahead and added the performance tuning you recommended (changing the maxconn to 1024). Hopefully this adds some stability As for the port, we’re using 1027 for our SSL traffic vs 443. We are currently getting SSL traffic that isn’t always failing on handshake. As for what is in front of our HAP: Our clients have HAP servers that are forwarding requests off to our HAP setup From: Igor Cicimov <ig...@encompasscorporation.com<mailto:ig...@encompasscorporation.com>> Date: Wednesday, March 16, 2016 at 6:51 PM To: Zachary Punches <zpunc...@getcake.com<mailto:zpunc...@getcake.com>> Cc: Baptiste <bed...@gmail.com<mailto:bed...@gmail.com>>, "haproxy@formilux.org<mailto:haproxy@formilux.org>" <haproxy@formilux.org<mailto:haproxy@formilux.org>> Subject: Re: Help! HAProxy randomly failing health checks! On Thu, Mar 17, 2016 at 12:46 PM, Igor Cicimov <ig...@encompasscorporation.com<mailto:ig...@encompasscorporation.com>> wrote: On Thu, Mar 17, 2016 at 11:14 AM, Zachary Punches <zpunc...@getcake.com<mailto:zpunc...@getcake.com>> wrote: I wanna say average is like 4-6 connections a second? Super minimal From what I’ve seen in the logs during the SSL errors, the log hangs then outputs a bunch of SSL errors all at once. Here it the output from sysctl –p net.ipv4.ip_forward = 0 net.ipv4.conf.default.rp_filter = 1 net.ipv4.conf.default.accept_source_route = 0 kernel.sysrq = 0 kernel.core_uses_pid = 1 net.ipv4.tcp_syncookies = 1 error: "net.bridge.bridge-nf-call-ip6tables" is an unknown key error: "net.bridge.bridge-nf-call-iptables" is an unknown key error: "net.bridge.bridge-nf-call-arptables" is an unknown key kernel.msgmnb = 65536 kernel.msgmax = 65536 kernel.shmmax = 68719476736 kernel.shmall = 4294967296 What would lowering the tune.ssl.default-dh-param to 1024 do? From: Igor Cicimov <ig...@encompasscorporation.com<mailto:ig...@encompasscorporation.com>> Date: Wednesday, March 16, 2016 at 5:01 PM To: Zachary Punches <zpunc...@getcake.com<mailto:zpunc...@getcake.com>> Cc: Baptiste <bed...@gmail.com<mailto:bed...@gmail.com>>, "haproxy@formilux.org<mailto:haproxy@formilux.org>" <haproxy@formilux.org<mailto:haproxy@formilux.org>> Subject: Re: Help! HAProxy randomly failing health checks! On Thu, Mar 17, 2016 at 10:55 AM, Zachary Punches <zpunc...@getcake.com<mailto:zpunc...@getcake.com>> wrote: Thanks for the reply! Ok so based on what you saw in my config, does it look like we’re misconfigured enough to cause this to happen? If we were misconfigured, one would assume we would go down all the time yeah? From: Igor Cicimov <ig...@encompasscorporation.com<mailto:ig...@encompasscorporation.com>> Date: Wednesday, March 16, 2016 at 4:50 PM To: Zachary Punches <zpunc...@getcake.com<mailto:zpunc...@getcake.com>> Cc: Baptiste <bed...@gmail.com<mailto:bed...@gmail.com>>, "haproxy@formilux.org<mailto:haproxy@formilux.org>" <haproxy@formilux.org<mailto:haproxy@formilux.org>> Subject: Re: Help! HAProxy randomly failing health checks! On Thu, Mar 17, 2016 at 10:47 AM, Igor Cicimov <ig...@encompasscorporation.com<mailto:ig...@encompasscorporation.com>> wrote: On Thu, Mar 17, 2016 at 5:29 AM, Zachary Punches <zpunc...@getcake.com<mailto:zpunc...@getcake.com>> wrote: I’m not, these guys aren’t sitting behind an ELB. They sit behind route53 routing. If one of the proxy boxes fails 3 checks in 30 seconds (with 4 checks done a second) then Route53 changes its routing from the first proxy box to the second On 3/15/16, 9:46 PM, "Baptiste" <bed...@gmail.com<mailto:bed...@gmail.com>> wrote: >Maybe you're checking a third party VM :) > AFAIK the Route53 health checks come from different points around the globe and it is possible that at some time of the day AWS has scheduled some specific end points to perform the HC. And it is possible that those ones have different SSL settings from the ones performing the HC during your day time. I would suggest you bring up this issue with AWS support, let them know your SSL cypher settings in HAP and ask if they are compatible with ALL their servers performing SSL health checks. I personally haven't seen any issues with failed SSL handshakes coming from AWS servers and have HAP's running in AU and UK regions. Igor That is if you are absolutely sure that the failed handshakes are not caused by overload or misconfigured (system) settings on HAP I was saying this in regards to system (kernel) settings. For example, assuming Unix/Linux is your net.core.somaxconn actually set *higher* than your maxconn which is set to 3 and 15000 in HAP? Any other kernel settings you might have changed? (output of "sysctl -p" com
Re: Help! HAProxy randomly failing health checks!
ng/2: Connection closed during SSL > handshake > Mar 17 18:37:45 localhost haproxy[28703]: 208.100.26.237:59222 > [17/Mar/2016:18:37:45.736] shared_incoming/2: SSL handshake failure > Mar 17 18:37:45 localhost haproxy[28703]: 54.218.28.138:60388 > [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL > handshake > Mar 17 18:37:45 localhost haproxy[28703]: 54.218.28.138:60379 > [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL > handshake > Mar 17 18:37:45 localhost haproxy[28703]: 54.218.28.138:60376 > [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL > handshake > Mar 17 18:37:45 localhost haproxy[28703]: 68.116.153.225:57824 > [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL > handshake > Mar 17 18:37:45 localhost haproxy[28703]: 54.218.28.138:60365 > [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL > handshake > Mar 17 18:37:45 localhost haproxy[28703]: 54.218.28.138:60364 > [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL > handshake > Mar 17 18:37:45 localhost haproxy[28703]: 54.218.28.138:60362 > [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL > handshake > Mar 17 18:37:45 localhost haproxy[28703]: 208.100.26.237:37490 > [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL > handshake > Mar 17 18:37:45 localhost haproxy[28703]: 108.59.8.48:43566 > [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL > handshake > Mar 17 18:37:45 localhost haproxy[28703]: 208.100.26.237:59763 > [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL > handshake > Mar 17 18:37:45 localhost haproxy[28703]: 208.100.26.237:59760 > [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL > handshake > Mar 17 18:37:45 localhost haproxy[28703]: 54.218.28.138:60319 > [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL > handshake > Mar 17 18:37:45 localhost haproxy[28703]: 54.218.28.138:60299 > [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL > handshake > Mar 17 18:37:45 localhost haproxy[28703]: 54.218.28.138:60293 > [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL > handshake > Mar 17 18:37:45 localhost haproxy[28703]: 54.218.28.138:60292 > [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL > handshake > Mar 17 18:37:45 localhost haproxy[28703]: 54.218.28.138:60284 > [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL > handshake > Mar 17 18:37:45 localhost haproxy[28703]: 54.218.28.138:60282 > [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL > handshake > Mar 17 18:37:45 localhost haproxy[28703]: 208.100.26.237:38664 > [17/Mar/2016:18:37:45.736] shared_incoming/2: SSL handshake failure > Mar 17 18:37:45 localhost haproxy[28703]: 54.218.28.138:60270 > [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL > handshake > Mar 17 18:37:45 localhost haproxy[28703]: 208.100.26.237:33270 > [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL > handshake > Mar 17 18:37:45 localhost haproxy[28703]: 208.100.26.237:33273 > [17/Mar/2016:18:37:45.736] shared_incoming/2: Connection error during SSL > handshake > Mar 17 18:37:45 localhost haproxy[28703]: 54.218.28.138:60089 > [17/Mar/2016:18:37:06.938] shared_incoming~ shared_incoming/ > -1/-1/-1/-1/0 400 187 - - CR-- 314/314/0/0/0 0/0 "" > Mar 17 18:37:45 localhost haproxy[28703]: 109.154.74.227:53964 > [17/Mar/2016:18:37:06.938] shared_incoming shared_incoming/ > -1/-1/-1/-1/0 400 0 - - CR-- 313/313/0/0/0 0/0 "" > Mar 17 18:37:45 localhost haproxy[28703]: 66.87.151.25:3325 > [17/Mar/2016:18:37:06.938] shared_incoming shared_incoming/ > -1/-1/-1/-1/0 400 0 - - CR-- 312/312/0/0/0 0/0 "" > Mar 17 18:37:45 localhost haproxy[28703]: 108.59.8.48:33611 > [17/Mar/2016:18:36:55.938] shared_incoming provedmedia/provedmedia_http > 279/0/0/-1/279 -1 0 - - CD-- 311/311/91/91/0 0/0 "GET > /?a=61=22008=7346_0_1=1_0_0_0_0_2102824_0_571_61811_0_0 HTTP/1.1" > > From: Igor Cicimov <ig...@encompasscorporation.com> > Date: Wednesday, March 16, 2016 at 5:01 PM > To: Zachary Punches <zpunc...@getcake.com> > Cc: Baptiste <bed...@gmail.com>, "haproxy@formilux.org" < > haproxy@formilux.org> > Subject: Re: Help! HAProxy randomly failing health checks! > > > > On Thu, Mar 17, 2016 at 10:55 AM, Zachary Punches <zpunc...@getcake.com> > wrote: > >> Thanks for the reply! >> >> Ok so based on what you saw in my config, does it loo
Re: Help! HAProxy randomly failing health checks!
On Thu, Mar 17, 2016 at 12:46 PM, Igor Cicimov < ig...@encompasscorporation.com> wrote: > > > On Thu, Mar 17, 2016 at 11:14 AM, Zachary Punches <zpunc...@getcake.com> > wrote: > >> I wanna say average is like 4-6 connections a second? Super minimal >> >> From what I’ve seen in the logs during the SSL errors, the log hangs then >> outputs a bunch of SSL errors all at once. >> >> Here it the output from sysctl –p >> net.ipv4.ip_forward = 0 >> net.ipv4.conf.default.rp_filter = 1 >> net.ipv4.conf.default.accept_source_route = 0 >> kernel.sysrq = 0 >> kernel.core_uses_pid = 1 >> net.ipv4.tcp_syncookies = 1 >> error: "net.bridge.bridge-nf-call-ip6tables" is an unknown key >> error: "net.bridge.bridge-nf-call-iptables" is an unknown key >> error: "net.bridge.bridge-nf-call-arptables" is an unknown key >> kernel.msgmnb = 65536 >> kernel.msgmax = 65536 >> kernel.shmmax = 68719476736 >> kernel.shmall = 4294967296 >> >> What would lowering the tune.ssl.default-dh-param to 1024 do? >> From: Igor Cicimov <ig...@encompasscorporation.com> >> Date: Wednesday, March 16, 2016 at 5:01 PM >> To: Zachary Punches <zpunc...@getcake.com> >> Cc: Baptiste <bed...@gmail.com>, "haproxy@formilux.org" < >> haproxy@formilux.org> >> Subject: Re: Help! HAProxy randomly failing health checks! >> >> >> >> On Thu, Mar 17, 2016 at 10:55 AM, Zachary Punches <zpunc...@getcake.com> >> wrote: >> >>> Thanks for the reply! >>> >>> Ok so based on what you saw in my config, does it look like we’re >>> misconfigured enough to cause this to happen? >>> >>> If we were misconfigured, one would assume we would go down all the time >>> yeah? >>> >>> From: Igor Cicimov <ig...@encompasscorporation.com> >>> Date: Wednesday, March 16, 2016 at 4:50 PM >>> To: Zachary Punches <zpunc...@getcake.com> >>> Cc: Baptiste <bed...@gmail.com>, "haproxy@formilux.org" < >>> haproxy@formilux.org> >>> Subject: Re: Help! HAProxy randomly failing health checks! >>> >>> >>> >>> On Thu, Mar 17, 2016 at 10:47 AM, Igor Cicimov < >>> ig...@encompasscorporation.com> wrote: >>> >>>> >>>> >>>> On Thu, Mar 17, 2016 at 5:29 AM, Zachary Punches <zpunc...@getcake.com> >>>> wrote: >>>> >>>>> I’m not, these guys aren’t sitting behind an ELB. They sit behind >>>>> route53 routing. If one of the proxy boxes fails 3 checks in 30 seconds >>>>> (with 4 checks done a second) then Route53 changes its routing from the >>>>> first proxy box to the second >>>>> >>>>> >>>>> >>>>> >>>>> On 3/15/16, 9:46 PM, "Baptiste" <bed...@gmail.com> wrote: >>>>> >>>>> >Maybe you're checking a third party VM :) >>>>> > >>>>> >>>> >>>> AFAIK the Route53 health checks come from different points around the >>>> globe and it is possible that at some time of the day AWS has scheduled >>>> some specific end points to perform the HC. And it is possible that those >>>> ones have different SSL settings from the ones performing the HC during >>>> your day time. I would suggest you bring up this issue with AWS support, >>>> let them know your SSL cypher settings in HAP and ask if they are >>>> compatible with ALL their servers performing SSL health checks. >>>> >>>> I personally haven't seen any issues with failed SSL handshakes coming >>>> from AWS servers and have HAP's running in AU and UK regions. >>>> >>>> Igor >>>> >>> >>> That is if you are absolutely sure that the failed handshakes are not >>> caused by overload or misconfigured (system) settings on HAP >>> >>> >> I was saying this in regards to system (kernel) settings. For example, >> assuming Unix/Linux is your net.core.somaxconn actually set *higher* than >> your maxconn which is set to 3 and 15000 in HAP? Any other kernel >> settings you might have changed? (output of "sysctl -p" command) >> >> What is your pick hour load, how many connections/sessions are you seeing >> on each HAP? >> >> Another suggestion is maybe set tune.ssl.default-dh-param to 1024 and see >> if that helps. >> > > > Ok, so on default ubuntu cloud instance this is what we have: > > # sysctl -a | grep maxconn > net.core.somaxconn = 128 > > which is too low for production server. Check your value and adjust it to > your needs. > > By the way, what is accept-proxy doing there in your setup? Is there > anything else in front of HAP using PROXY protocol? > > Wait a minute: bind *:1027 # Health checking port are you maybe using https health check on a non SSL port?
Re: Help! HAProxy randomly failing health checks!
On Fri, Mar 18, 2016 at 1:38 PM, Igor Cicimov < ig...@encompasscorporation.com> wrote: > > > On Fri, Mar 18, 2016 at 12:04 PM, Zachary Punches <zpunc...@getcake.com> > wrote: > >> Yeah port 1027 is used for health checks over SSL. >> >> This HAP forwards requests off to our databases. The databases have a >> string in a table that indicates that the HAP instance can move all the way >> through the entire process before it lights as green. >> >> Our health checks in route 53 are setup to ping 1027 as the SSL port >> >> From: Igor Cicimov <ig...@encompasscorporation.com> >> Date: Thursday, March 17, 2016 at 4:18 PM >> To: Zachary Punches <zpunc...@getcake.com> >> Cc: Baptiste <bed...@gmail.com>, "haproxy@formilux.org" < >> haproxy@formilux.org> >> Subject: Re: Help! HAProxy randomly failing health checks! >> >> So is port 1027 used for health checks over SSL or not? I don't see any >> ssl settings on that port. >> > > > I see. In that case, since you are doing ssl pass-through in http mode, > you need to add ssl to your server line: > > backend server0 ## added to allow gs ssl meta tag verification > reqrep ^GET\ /.*\ (HTTP/.*)GET\ /GlobalSignVerification\ \1 > server server0_http server0.domain.com:80/GlobalSignVerification/ > > so it becomes: > > server server0_http server0.domain.com:80/GlobalSignVerification/ ssl > > if I understood your intention correctly and server0 is ssl enabled db. > # Route traffic based on domain use_backend gs_verify if gs_texthtml or gs_user_agent## allow gs meta tag verification Where is this backend in your config? Which domain is used fore the health checks?
Re: Help! HAProxy randomly failing health checks!
Yeah port 1027 is used for health checks over SSL. This HAP forwards requests off to our databases. The databases have a string in a table that indicates that the HAP instance can move all the way through the entire process before it lights as green. Our health checks in route 53 are setup to ping 1027 as the SSL port From: Igor Cicimov <ig...@encompasscorporation.com<mailto:ig...@encompasscorporation.com>> Date: Thursday, March 17, 2016 at 4:18 PM To: Zachary Punches <zpunc...@getcake.com<mailto:zpunc...@getcake.com>> Cc: Baptiste <bed...@gmail.com<mailto:bed...@gmail.com>>, "haproxy@formilux.org<mailto:haproxy@formilux.org>" <haproxy@formilux.org<mailto:haproxy@formilux.org>> Subject: Re: Help! HAProxy randomly failing health checks! So is port 1027 used for health checks over SSL or not? I don't see any ssl settings on that port.
Re: Help! HAProxy randomly failing health checks!
I’m not, these guys aren’t sitting behind an ELB. They sit behind route53 routing. If one of the proxy boxes fails 3 checks in 30 seconds (with 4 checks done a second) then Route53 changes its routing from the first proxy box to the second On 3/15/16, 9:46 PM, "Baptiste"wrote: >Maybe you're checking a third party VM :) >
Re: Help! HAProxy randomly failing health checks!
On Fri, Mar 18, 2016 at 12:04 PM, Zachary Punches <zpunc...@getcake.com> wrote: > Yeah port 1027 is used for health checks over SSL. > > This HAP forwards requests off to our databases. The databases have a > string in a table that indicates that the HAP instance can move all the way > through the entire process before it lights as green. > > Our health checks in route 53 are setup to ping 1027 as the SSL port > > From: Igor Cicimov <ig...@encompasscorporation.com> > Date: Thursday, March 17, 2016 at 4:18 PM > To: Zachary Punches <zpunc...@getcake.com> > Cc: Baptiste <bed...@gmail.com>, "haproxy@formilux.org" < > haproxy@formilux.org> > Subject: Re: Help! HAProxy randomly failing health checks! > > So is port 1027 used for health checks over SSL or not? I don't see any > ssl settings on that port. > I see. In that case, since you are doing ssl pass-through in http mode, you need to add ssl to your server line: backend server0 ## added to allow gs ssl meta tag verification reqrep ^GET\ /.*\ (HTTP/.*)GET\ /GlobalSignVerification\ \1 server server0_http server0.domain.com:80/GlobalSignVerification/ so it becomes: server server0_http server0.domain.com:80/GlobalSignVerification/ ssl if I understood your intention correctly and server0 is ssl enabled db.
Re: Help! HAProxy randomly failing health checks!
I wanna say average is like 4-6 connections a second? Super minimal From what I’ve seen in the logs during the SSL errors, the log hangs then outputs a bunch of SSL errors all at once. Here it the output from sysctl –p net.ipv4.ip_forward = 0 net.ipv4.conf.default.rp_filter = 1 net.ipv4.conf.default.accept_source_route = 0 kernel.sysrq = 0 kernel.core_uses_pid = 1 net.ipv4.tcp_syncookies = 1 error: "net.bridge.bridge-nf-call-ip6tables" is an unknown key error: "net.bridge.bridge-nf-call-iptables" is an unknown key error: "net.bridge.bridge-nf-call-arptables" is an unknown key kernel.msgmnb = 65536 kernel.msgmax = 65536 kernel.shmmax = 68719476736 kernel.shmall = 4294967296 What would lowering the tune.ssl.default-dh-param to 1024 do? From: Igor Cicimov <ig...@encompasscorporation.com<mailto:ig...@encompasscorporation.com>> Date: Wednesday, March 16, 2016 at 5:01 PM To: Zachary Punches <zpunc...@getcake.com<mailto:zpunc...@getcake.com>> Cc: Baptiste <bed...@gmail.com<mailto:bed...@gmail.com>>, "haproxy@formilux.org<mailto:haproxy@formilux.org>" <haproxy@formilux.org<mailto:haproxy@formilux.org>> Subject: Re: Help! HAProxy randomly failing health checks! On Thu, Mar 17, 2016 at 10:55 AM, Zachary Punches <zpunc...@getcake.com<mailto:zpunc...@getcake.com>> wrote: Thanks for the reply! Ok so based on what you saw in my config, does it look like we’re misconfigured enough to cause this to happen? If we were misconfigured, one would assume we would go down all the time yeah? From: Igor Cicimov <ig...@encompasscorporation.com<mailto:ig...@encompasscorporation.com>> Date: Wednesday, March 16, 2016 at 4:50 PM To: Zachary Punches <zpunc...@getcake.com<mailto:zpunc...@getcake.com>> Cc: Baptiste <bed...@gmail.com<mailto:bed...@gmail.com>>, "haproxy@formilux.org<mailto:haproxy@formilux.org>" <haproxy@formilux.org<mailto:haproxy@formilux.org>> Subject: Re: Help! HAProxy randomly failing health checks! On Thu, Mar 17, 2016 at 10:47 AM, Igor Cicimov <ig...@encompasscorporation.com<mailto:ig...@encompasscorporation.com>> wrote: On Thu, Mar 17, 2016 at 5:29 AM, Zachary Punches <zpunc...@getcake.com<mailto:zpunc...@getcake.com>> wrote: I’m not, these guys aren’t sitting behind an ELB. They sit behind route53 routing. If one of the proxy boxes fails 3 checks in 30 seconds (with 4 checks done a second) then Route53 changes its routing from the first proxy box to the second On 3/15/16, 9:46 PM, "Baptiste" <bed...@gmail.com<mailto:bed...@gmail.com>> wrote: >Maybe you're checking a third party VM :) > AFAIK the Route53 health checks come from different points around the globe and it is possible that at some time of the day AWS has scheduled some specific end points to perform the HC. And it is possible that those ones have different SSL settings from the ones performing the HC during your day time. I would suggest you bring up this issue with AWS support, let them know your SSL cypher settings in HAP and ask if they are compatible with ALL their servers performing SSL health checks. I personally haven't seen any issues with failed SSL handshakes coming from AWS servers and have HAP's running in AU and UK regions. Igor That is if you are absolutely sure that the failed handshakes are not caused by overload or misconfigured (system) settings on HAP I was saying this in regards to system (kernel) settings. For example, assuming Unix/Linux is your net.core.somaxconn actually set *higher* than your maxconn which is set to 3 and 15000 in HAP? Any other kernel settings you might have changed? (output of "sysctl -p" command) What is your pick hour load, how many connections/sessions are you seeing on each HAP? Another suggestion is maybe set tune.ssl.default-dh-param to 1024 and see if that helps.
Re: Help! HAProxy randomly failing health checks!
Thanks for the reply! Ok so based on what you saw in my config, does it look like we’re misconfigured enough to cause this to happen? If we were misconfigured, one would assume we would go down all the time yeah? From: Igor Cicimov <ig...@encompasscorporation.com<mailto:ig...@encompasscorporation.com>> Date: Wednesday, March 16, 2016 at 4:50 PM To: Zachary Punches <zpunc...@getcake.com<mailto:zpunc...@getcake.com>> Cc: Baptiste <bed...@gmail.com<mailto:bed...@gmail.com>>, "haproxy@formilux.org<mailto:haproxy@formilux.org>" <haproxy@formilux.org<mailto:haproxy@formilux.org>> Subject: Re: Help! HAProxy randomly failing health checks! On Thu, Mar 17, 2016 at 10:47 AM, Igor Cicimov <ig...@encompasscorporation.com<mailto:ig...@encompasscorporation.com>> wrote: On Thu, Mar 17, 2016 at 5:29 AM, Zachary Punches <zpunc...@getcake.com<mailto:zpunc...@getcake.com>> wrote: I’m not, these guys aren’t sitting behind an ELB. They sit behind route53 routing. If one of the proxy boxes fails 3 checks in 30 seconds (with 4 checks done a second) then Route53 changes its routing from the first proxy box to the second On 3/15/16, 9:46 PM, "Baptiste" <bed...@gmail.com<mailto:bed...@gmail.com>> wrote: >Maybe you're checking a third party VM :) > AFAIK the Route53 health checks come from different points around the globe and it is possible that at some time of the day AWS has scheduled some specific end points to perform the HC. And it is possible that those ones have different SSL settings from the ones performing the HC during your day time. I would suggest you bring up this issue with AWS support, let them know your SSL cypher settings in HAP and ask if they are compatible with ALL their servers performing SSL health checks. I personally haven't seen any issues with failed SSL handshakes coming from AWS servers and have HAP's running in AU and UK regions. Igor That is if you are absolutely sure that the failed handshakes are not caused by overload or misconfigured (system) settings on HAP
Re: Help! HAProxy randomly failing health checks!
On Thu, Mar 17, 2016 at 11:14 AM, Zachary Punches <zpunc...@getcake.com> wrote: > I wanna say average is like 4-6 connections a second? Super minimal > > From what I’ve seen in the logs during the SSL errors, the log hangs then > outputs a bunch of SSL errors all at once. > > Here it the output from sysctl –p > net.ipv4.ip_forward = 0 > net.ipv4.conf.default.rp_filter = 1 > net.ipv4.conf.default.accept_source_route = 0 > kernel.sysrq = 0 > kernel.core_uses_pid = 1 > net.ipv4.tcp_syncookies = 1 > error: "net.bridge.bridge-nf-call-ip6tables" is an unknown key > error: "net.bridge.bridge-nf-call-iptables" is an unknown key > error: "net.bridge.bridge-nf-call-arptables" is an unknown key > kernel.msgmnb = 65536 > kernel.msgmax = 65536 > kernel.shmmax = 68719476736 > kernel.shmall = 4294967296 > > What would lowering the tune.ssl.default-dh-param to 1024 do? > From: Igor Cicimov <ig...@encompasscorporation.com> > Date: Wednesday, March 16, 2016 at 5:01 PM > To: Zachary Punches <zpunc...@getcake.com> > Cc: Baptiste <bed...@gmail.com>, "haproxy@formilux.org" < > haproxy@formilux.org> > Subject: Re: Help! HAProxy randomly failing health checks! > > > > On Thu, Mar 17, 2016 at 10:55 AM, Zachary Punches <zpunc...@getcake.com> > wrote: > >> Thanks for the reply! >> >> Ok so based on what you saw in my config, does it look like we’re >> misconfigured enough to cause this to happen? >> >> If we were misconfigured, one would assume we would go down all the time >> yeah? >> >> From: Igor Cicimov <ig...@encompasscorporation.com> >> Date: Wednesday, March 16, 2016 at 4:50 PM >> To: Zachary Punches <zpunc...@getcake.com> >> Cc: Baptiste <bed...@gmail.com>, "haproxy@formilux.org" < >> haproxy@formilux.org> >> Subject: Re: Help! HAProxy randomly failing health checks! >> >> >> >> On Thu, Mar 17, 2016 at 10:47 AM, Igor Cicimov < >> ig...@encompasscorporation.com> wrote: >> >>> >>> >>> On Thu, Mar 17, 2016 at 5:29 AM, Zachary Punches <zpunc...@getcake.com> >>> wrote: >>> >>>> I’m not, these guys aren’t sitting behind an ELB. They sit behind >>>> route53 routing. If one of the proxy boxes fails 3 checks in 30 seconds >>>> (with 4 checks done a second) then Route53 changes its routing from the >>>> first proxy box to the second >>>> >>>> >>>> >>>> >>>> On 3/15/16, 9:46 PM, "Baptiste" <bed...@gmail.com> wrote: >>>> >>>> >Maybe you're checking a third party VM :) >>>> > >>>> >>> >>> AFAIK the Route53 health checks come from different points around the >>> globe and it is possible that at some time of the day AWS has scheduled >>> some specific end points to perform the HC. And it is possible that those >>> ones have different SSL settings from the ones performing the HC during >>> your day time. I would suggest you bring up this issue with AWS support, >>> let them know your SSL cypher settings in HAP and ask if they are >>> compatible with ALL their servers performing SSL health checks. >>> >>> I personally haven't seen any issues with failed SSL handshakes coming >>> from AWS servers and have HAP's running in AU and UK regions. >>> >>> Igor >>> >> >> That is if you are absolutely sure that the failed handshakes are not >> caused by overload or misconfigured (system) settings on HAP >> >> > I was saying this in regards to system (kernel) settings. For example, > assuming Unix/Linux is your net.core.somaxconn actually set *higher* than > your maxconn which is set to 3 and 15000 in HAP? Any other kernel > settings you might have changed? (output of "sysctl -p" command) > > What is your pick hour load, how many connections/sessions are you seeing > on each HAP? > > Another suggestion is maybe set tune.ssl.default-dh-param to 1024 and see > if that helps. > Ok, so on default ubuntu cloud instance this is what we have: # sysctl -a | grep maxconn net.core.somaxconn = 128 which is too low for production server. Check your value and adjust it to your needs. By the way, what is accept-proxy doing there in your setup? Is there anything else in front of HAP using PROXY protocol?
Re: Help! HAProxy randomly failing health checks!
On Thu, Mar 17, 2016 at 5:29 AM, Zachary Puncheswrote: > I’m not, these guys aren’t sitting behind an ELB. They sit behind route53 > routing. If one of the proxy boxes fails 3 checks in 30 seconds (with 4 > checks done a second) then Route53 changes its routing from the first proxy > box to the second > > > > > On 3/15/16, 9:46 PM, "Baptiste" wrote: > > >Maybe you're checking a third party VM :) > > > AFAIK the Route53 health checks come from different points around the globe and it is possible that at some time of the day AWS has scheduled some specific end points to perform the HC. And it is possible that those ones have different SSL settings from the ones performing the HC during your day time. I would suggest you bring up this issue with AWS support, let them know your SSL cypher settings in HAP and ask if they are compatible with ALL their servers performing SSL health checks. I personally haven't seen any issues with failed SSL handshakes coming from AWS servers and have HAP's running in AU and UK regions. Igor
Re: Help! HAProxy randomly failing health checks!
Ok! Here is a bunch of info that might better assist with the issue: Each of our clients has an HAProxy install that forwards requests for 80 and 443 to 1025 and 1026 respectively. These requests are forwarded over TCP using proxy protocol to our HAP instances. Our HAP instances then SSL term the request and forward them off to our backend on port 80. See attached diagram which should better explain the entire flow. During an outage due to the SSL handshakes failing, I was running TCPDump so I could look through and discover what was causing the failure, I was able to discover that we are receiving connection resets on some SSL connections. We then tested all the SSL certs from our client side to our side to verify that there is not a mismatched cert. This test was completed with no issues. Here is a connection reset packet I found in that TCP Dump 29525 158.09621710.1.4.119 54.239.21.251TCP 5438740 → 443 [RST] Seq=3533 Win=0 Len=0 Frame 29523: 54 bytes on wire (432 bits), 54 bytes captured (432 bits) Encapsulation type: Ethernet (1) Arrival Time: Mar 17, 2016 14:58:07.34584 PDT [Time shift for this packet: 0.0 seconds] Epoch Time: 1458251887.34584 seconds [Time delta from previous captured frame: 0.2 seconds] [Time delta from previous displayed frame: 0.021655000 seconds] [Time since reference or first frame: 158.096184000 seconds] Frame Number: 29523 Frame Length: 54 bytes (432 bits) Capture Length: 54 bytes (432 bits) [Frame is marked: False] [Frame is ignored: False] [Protocols in frame: eth:ethertype:ip:tcp] [Coloring Rule Name: TCP RST] [Coloring Rule String: tcp.flags.reset eq 1] Ethernet II, Src: 12:8d:18:05:0f:91 (12:8d:18:05:0f:91), Dst: 1e:8f:a6:6b:52:58 (1e:8f:a6:6b:52:58) Destination: 1e:8f:a6:6b:52:58 (1e:8f:a6:6b:52:58) Address: 1e:8f:a6:6b:52:58 (1e:8f:a6:6b:52:58) ..1. = LG bit: Locally administered address (this is NOT the factory default) ...0 = IG bit: Individual address (unicast) Source: 12:8d:18:05:0f:91 (12:8d:18:05:0f:91) Address: 12:8d:18:05:0f:91 (12:8d:18:05:0f:91) ..1. = LG bit: Locally administered address (this is NOT the factory default) ...0 = IG bit: Individual address (unicast) Type: IPv4 (0x0800) Internet Protocol Version 4, Src: $SRC IP Dst: $DST IP 0100 = Version: 4 0101 = Header Length: 20 bytes Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT) 00.. = Differentiated Services Codepoint: Default (0) ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0) Total Length: 40 Identification: 0x5f56 (24406) Flags: 0x02 (Don't Fragment) 0... = Reserved bit: Not set .1.. = Don't fragment: Set ..0. = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: TCP (6) Header checksum: 0x8018 [validation disabled] [Good: False] [Bad: False] Source: $SourceIP Destination: $DestinationIP [Source GeoIP: Unknown] [Destination GeoIP: Unknown] Transmission Control Protocol, Src Port: 38740 (38740), Dst Port: 443 (443), Seq: 3533, Len: 0 Source Port: 38740 Destination Port: 443 [Stream index: 2799] [TCP Segment Len: 0] Sequence number: 3533(relative sequence number) Acknowledgment number: 0 Header Length: 20 bytes Flags: 0x004 (RST) 000. = Reserved: Not set ...0 = Nonce: Not set 0... = Congestion Window Reduced (CWR): Not set .0.. = ECN-Echo: Not set ..0. = Urgent: Not set ...0 = Acknowledgment: Not set 0... = Push: Not set .1.. = Reset: Set [Expert Info (Warn/Sequence): Connection reset (RST)] [Connection reset (RST)] [Severity level: Warn] [Group: Sequence] ..0. = Syn: Not set ...0 = Fin: Not set [TCP Flags: *R**] Window size value: 0 [Calculated window size: 0] [Window size scaling factor: 128] Checksum: 0x5c2f [validation disabled] [Good Checksum: False] [Bad Checksum: False] Urgent pointer: 0 From: Igor Cicimov <ig...@encompasscorporation.com<mailto:ig...@encompasscorporation.com>> Date: Thursday, March 17, 2016 at 8:40 PM To: Zachary Punches <zpunc...@getcake.com<mailto:zpunc...@getcake.com>> Cc: Baptiste <bed...@gmail.com<mailto:bed...@gmail.com>>, "haproxy@formilux.org<mailto:haproxy@formilux.org>" <haproxy@formilux.org<mailto:haproxy@formilux.org>> Subject: Re: Help! HAProxy randomly failing health checks! On Fri, Mar 18, 2016 at
Re: Help! HAProxy randomly failing health checks!
Has ELB changed its IP address??? Maybe you're checking a third party VM :) Baptiste
Re: Help! HAProxy randomly failing health checks!
On Wed, Mar 16, 2016 at 5:54 AM, Zachary Puncheswrote: > Hello! > > > > My name is Zack, and I have been in the middle of an on going HAProxy > issue that has me scratching my head. > > > > Here is the setup: > > > > Our setup is hosted by amazon, and our HAProxy (1.6.3) boxes are in each > region in 3 regions. We have 2 HAProxy boxes per region for a total of 6 > proxy boxes. > > > > These boxes are routed information through route 53. Their entire job is > to forward data from one of our clients to our database backend. It handles > this absolutely fine, except between the hours of 7pm PST and 7am PST. > During these hours, our route53 health checks time out thus causing the > traffic to switch to the other HAProxy box inside of the same region. > > > > During the other 12 hours of the day, we receive 0 alerts from our health > checks. > > > > I have noticed that we get a series of SSL handshake failures (though this > happens throughout the entire day) that causes the server to hang for a > second, thus causing the health checks to fail. During the day our SSL > failures do not cause the server to hang long enough to go fail the checks, > they only fail at night. I have attached my HAProxy config hoping that you > guys have an answer for me. Lemme know if you need any more info. > > > > I have done a few tcpdump captures during the SSL handshake failures (not > at night during it failing, but during the day when it still gets the SSL > handshake failures, but doesn’t fail the health check) and it seems there > is a d/c and a reconnect during the handshake. > > > > Here is my config, I will be running a tcpdump tonight to capture the > packets during the failure and will attach it if you guys need more info. > > > > #- > > # Example configuration for a possible web application. See the > > # full configuration options online. > > # > > # http://haproxy.1wt.eu/download/1.4/doc/configuration.txt > > # > > #- > > > > #- > > # Global settings > > #- > > global > > log 127.0.0.1 local2 > > > > pidfile /var/run/haproxy.pid > > maxconn 3 > > userhaproxy > > group haproxy > > daemon > > ssl-default-bind-options no-sslv3 no-tls-tickets > > tune.ssl.default-dh-param 2048 > > > > # turn on stats unix socket > > #stats socket /var/lib/haproxy/stats` > > > > #- > > # common defaults that all the 'listen' and 'backend' sections will > > # use if not designated in their block > > #- > > defaults > > modehttp > > log global > > option httplog > > retries 3 > > timeout http-request5s > > timeout queue 1m > > timeout connect 31s > > timeout client 31s > > timeout server 31s > > maxconn 15000 > > > > # Stats > > statsenable > > stats uri /haproxy?stats > > stats realm Strictly\ Private > > stats auth$StatsUser:$StatsPass > > > > #- > > # main frontend which proxys to the backends > > #- > > > > frontend shared_incoming > > maxconn 15000 > > timeout http-request 5s > > > > #Bind ports of incoming traffic > > bind *:1025 accept-proxy # http > > bind *:1026 accept-proxy ssl crt /path/to/default/ssl/cert.pem ssl crt > /path/to/cert/folder/ # https > > bind *:1027 # Health checking port > > acl gs_texthtml url_reg \/gstext\.html## allow gs to do meta tag > verififcation > > acl gs_user_agent hdr_sub(User-Agent) -i globalsign## > allow gs to do meta tag verififcation > > > > # Add headers > > http-request set-header $Proxy-Header-Ip %[src] > > http-request set-header $Proxy-Header-Proto http if !{ ssl_fc } > > http-request set-header $Proxy-Header-Proto https if { ssl_fc } > > > > # Route traffic based on domain > > use_backend gs_verify if gs_texthtml or gs_user_agent## allow gs > meta tag verification > > use_backend > %[req.hdr(host),lower,map_dom(/path/to/map/file.map,unknown_domain)] > > > > # Drop unrecognized traffic > > default_backend unknown_domain > > > > #- > > # Backends > > #- > > > > backend
Re: Help! HAProxy randomly failing health checks!
Greetings, On 03/15/2016 02:54 PM, Zachary Punches wrote: Hello! My name is Zack, and I have been in the middle of an on going HAProxy issue that has me scratching my head. Here is the setup: Our setup is hosted by amazon, and our HAProxy (1.6.3) boxes are in each region in 3 regions. We have 2 HAProxy boxes per region for a total of 6 proxy boxes. These boxes are routed information through route 53. Their entire job is to forward data from one of our clients to our database backend. It handles this absolutely fine, except between the hours of 7pm PST and 7am PST. During these hours, our route53 health checks time out thus causing the traffic to switch to the other HAProxy box inside of the same region. During the other 12 hours of the day, we receive 0 alerts from our health checks. I have noticed that we get a series of SSL handshake failures (though this happens throughout the entire day) that causes the server to hang for a second, thus causing the health checks to fail. During the day our SSL failures do not cause the server to hang long enough to go fail the checks, they only fail at night. I have attached my HAProxy config hoping that you guys have an answer for me. Lemme know if you need any more info. Before thinking about less obvious potential causes, the CPU of the instance isn't close to getting capped out during the time in question? Also, are the connection counts under 15,000 (otherwise I could see it ending up with a timeout and trying again)? - Chad I have done a few tcpdump captures during the SSL handshake failures (not at night during it failing, but during the day when it still gets the SSL handshake failures, but doesn’t fail the health check) and it seems there is a d/c and a reconnect during the handshake. Here is my config, I will be running a tcpdump tonight to capture the packets during the failure and will attach it if you guys need more info. #- # Example configuration for a possible web application. See the # full configuration options online. # # http://haproxy.1wt.eu/download/1.4/doc/configuration.txt # #- #- # Global settings #- global log 127.0.0.1 local2 pidfile /var/run/haproxy.pid maxconn 3 userhaproxy group haproxy daemon ssl-default-bind-options no-sslv3 no-tls-tickets tune.ssl.default-dh-param 2048 # turn on stats unix socket #stats socket /var/lib/haproxy/stats` #- # common defaults that all the 'listen' and 'backend' sections will # use if not designated in their block #- defaults modehttp log global option httplog retries 3 timeout http-request5s timeout queue 1m timeout connect 31s timeout client 31s timeout server 31s maxconn 15000 # Stats stats enable stats uri /haproxy?stats stats realm Strictly\ Private stats auth $StatsUser:$StatsPass #- # main frontend which proxys to the backends #- frontend shared_incoming maxconn 15000 timeout http-request 5s #Bind ports of incoming traffic bind *:1025 accept-proxy # http bind *:1026 accept-proxy ssl crt /path/to/default/ssl/cert.pem ssl crt /path/to/cert/folder/ # https bind *:1027 # Health checking port acl gs_texthtml url_reg \/gstext\.html## allow gs to do meta tag verififcation acl gs_user_agent hdr_sub(User-Agent) -i globalsign## allow gs to do meta tag verififcation # Add headers http-request set-header $Proxy-Header-Ip %[src] http-request set-header $Proxy-Header-Proto http if !{ ssl_fc } http-request set-header $Proxy-Header-Proto https if { ssl_fc } # Route traffic based on domain use_backend gs_verify if gs_texthtml or gs_user_agent## allow gs meta tag verification use_backend %[req.hdr(host),lower,map_dom(/path/to/map/file.map,unknown_domain)] # Drop unrecognized traffic default_backend unknown_domain #- # Backends #- backend server0 ## added to allow gs ssl meta tag verification reqrep ^GET\ /.*\ (HTTP/.*)GET\ /GlobalSignVerification\ \1 server server0_http
Re: Help haproxy
Yes, i have just the option that's make gone right. It's just the option as you know. Regards, Mathieu 2015-02-05 10:03 GMT+01:00 Yuan Long yuan.l...@chinanetcloud.com: Do you have the words option forward for in your config. http://cbonte.github.io/haproxy-dconv/configuration-1.5.html#option%20forwardfor Can you copy/paste your config (without sensitive info if needed). Regards, Long Wu Yuan 龙 武 缘 Sr. Linux Engineer 高级工程师 ChinaNetCloud 云络网络科技(上海)有限公司 | www.ChinaNetCloud.com1238 Xietu Lu, X2 Space 1-601, Shanghai, China | 中国上海市徐汇区斜土路1238号X2空 间1-601室 24x7 Support Hotline: +86-400-618-0024 | Office Tel: +86-(21)-6422-1946 We are hiring! http://careers.chinanetcloud.com | Customer Portal - https://customer-portal.service.chinanetcloud.com/ On Mon, Feb 2, 2015 at 11:45 PM, Sander Klein roe...@roedie.nl wrote: On 02.02.2015 16:33, Mathieu Sergent wrote: Hi Sander, Yes i reloaded the haproxy and my web server too. But no change. And i'm not using proxy protocol. To give you more precisions, on my web server i used tcpdump functions which give me back the header of the requete http. And in this i found my client's address. But this is really strange that i can do it without the forwardfor. The only other thing that I can think of is that your client is behind a proxy server which adds the X-Forward-For header for you... Or you got something strange in your config... Sander
Re: Help haproxy
Maybe that is why the clinet ipaddress is forwarded. What is your webserver (ngix/apache) software and what is the logformat configuration in it. Regards, Long Wu Yuan 龙 武 缘 Sr. Linux Engineer 高级工程师 ChinaNetCloud 云络网络科技(上海)有限公司 | www.ChinaNetCloud.com1238 Xietu Lu, X2 Space 1-601, Shanghai, China | 中国上海市徐汇区斜土路1238号X2空 间1-601室 24x7 Support Hotline: +86-400-618-0024 | Office Tel: +86-(21)-6422-1946 We are hiring! http://careers.chinanetcloud.com | Customer Portal - https://customer-portal.service.chinanetcloud.com/ On Mon, Feb 9, 2015 at 5:28 PM, Mathieu Sergent mathieu.sergent...@gmail.com wrote: Yes, i have just the option that's make gone right. It's just the option as you know. Regards, Mathieu 2015-02-05 10:03 GMT+01:00 Yuan Long yuan.l...@chinanetcloud.com: Do you have the words option forward for in your config. http://cbonte.github.io/haproxy-dconv/configuration-1.5.html#option%20forwardfor Can you copy/paste your config (without sensitive info if needed). Regards, Long Wu Yuan 龙 武 缘 Sr. Linux Engineer 高级工程师 ChinaNetCloud 云络网络科技(上海)有限公司 | www.ChinaNetCloud.com1238 Xietu Lu, X2 Space 1-601, Shanghai, China | 中国上海市徐汇区斜土路1238号X2空 间1-601室 24x7 Support Hotline: +86-400-618-0024 | Office Tel: +86-(21)-6422-1946 We are hiring! http://careers.chinanetcloud.com | Customer Portal - https://customer-portal.service.chinanetcloud.com/ On Mon, Feb 2, 2015 at 11:45 PM, Sander Klein roe...@roedie.nl wrote: On 02.02.2015 16:33, Mathieu Sergent wrote: Hi Sander, Yes i reloaded the haproxy and my web server too. But no change. And i'm not using proxy protocol. To give you more precisions, on my web server i used tcpdump functions which give me back the header of the requete http. And in this i found my client's address. But this is really strange that i can do it without the forwardfor. The only other thing that I can think of is that your client is behind a proxy server which adds the X-Forward-For header for you... Or you got something strange in your config... Sander
Re: Help haproxy
Do you have the words option forward for in your config. http://cbonte.github.io/haproxy-dconv/configuration-1.5.html#option%20forwardfor Can you copy/paste your config (without sensitive info if needed). Regards, Long Wu Yuan 龙 武 缘 Sr. Linux Engineer 高级工程师 ChinaNetCloud 云络网络科技(上海)有限公司 | www.ChinaNetCloud.com1238 Xietu Lu, X2 Space 1-601, Shanghai, China | 中国上海市徐汇区斜土路1238号X2空 间1-601室 24x7 Support Hotline: +86-400-618-0024 | Office Tel: +86-(21)-6422-1946 We are hiring! http://careers.chinanetcloud.com | Customer Portal - https://customer-portal.service.chinanetcloud.com/ On Mon, Feb 2, 2015 at 11:45 PM, Sander Klein roe...@roedie.nl wrote: On 02.02.2015 16:33, Mathieu Sergent wrote: Hi Sander, Yes i reloaded the haproxy and my web server too. But no change. And i'm not using proxy protocol. To give you more precisions, on my web server i used tcpdump functions which give me back the header of the requete http. And in this i found my client's address. But this is really strange that i can do it without the forwardfor. The only other thing that I can think of is that your client is behind a proxy server which adds the X-Forward-For header for you... Or you got something strange in your config... Sander
Re: Help haproxy
2015-02-02 16:45 GMT+01:00 Sander Klein roe...@roedie.nl: The only other thing that I can think of is that your client is behind a proxy server which adds the X-Forward-For header for you... Or you got something strange in your config... Sander You're totally right. My client have a proxy server. I feel very sorry for this stupid mistake. Thanks for your time and your help. Regards, Mathieu
Re: Help haproxy
On 02.02.2015 12:09, Mathieu Sergent wrote: Hi, I try to set up a load balancing with HAProxy and 3 web servers. I want to receive on my web servers the address' client. I read that it is possible with the option source ip usesrc but you need to be root. If you want to not be root, you have to used HAProxy with Tproxy. But Tproxy demand too much system configuration. There is an other solution ? I hope that you have understood my problem. Yours sincerely. Mathieu Sergent PS : Sorry for my English. Your English is no problem. ;-) You can add an X-Forwarded-For header using haproxy. If you then use mod_rpaf for apache or realip on nginx you can easily substitute the loadbalancer ip with the ip of the client. Regards, Sander
Re: Help haproxy
Hi, On Mon, Feb 02, Sander Klein wrote: On 02.02.2015 12:09, Mathieu Sergent wrote: Hi, I try to set up a load balancing with HAProxy and 3 web servers. I want to receive on my web servers the address' client. I read that it is possible with the option source ip usesrc but you need to be root. If you want to not be root, you have to used HAProxy with Tproxy. But Tproxy demand too much system configuration. There is an other solution ? I hope that you have understood my problem. Yours sincerely. Mathieu Sergent PS : Sorry for my English. Your English is no problem. ;-) You can add an X-Forwarded-For header using haproxy. If you then use mod_rpaf for apache or realip on nginx you can easily substitute the loadbalancer ip with the ip of the client. Or if you're running apache 2.4 then it should come with mod_remoteip: http://httpd.apache.org/docs/current/mod/mod_remoteip.html And for tomcat there's: https://tomcat.apache.org/tomcat-7.0-doc/api/org/apache/catalina/valves/RemoteIpValve.html -Jarno -- Jarno Huuskonen
Re: Help haproxy
Hi Sander, Yes i reloaded the haproxy and my web server too. But no change. And i'm not using proxy protocol. To give you more precisions, on my web server i used tcpdump functions which give me back the header of the requete http. And in this i found my client's address. But this is really strange that i can do it without the forwardfor. Regards, Mathieu 2015-02-02 16:15 GMT+01:00 Sander Klein roe...@roedie.nl: Hi Mathieu, Pleas keep the list in the CC. On 02.02.2015 15:26, Mathieu Sergent wrote: Thanks for your reply. I just used the option forwardfor in the haproxy configuration. And i can find client's address from my web server (with tcpdump). But if i don't use the option forwardfor, the web server still find the client's address. That's make any sense ? To be honest, that doesn't make any sense to me. Are you sure you have reloaded the haproxy process after you removed the forwardfor? Or, could it be you are using the proxy protocol (send-proxy)? Greets, Sander
Re: Help haproxy
Hi Mathieu, Pleas keep the list in the CC. On 02.02.2015 15:26, Mathieu Sergent wrote: Thanks for your reply. I just used the option forwardfor in the haproxy configuration. And i can find client's address from my web server (with tcpdump). But if i don't use the option forwardfor, the web server still find the client's address. That's make any sense ? To be honest, that doesn't make any sense to me. Are you sure you have reloaded the haproxy process after you removed the forwardfor? Or, could it be you are using the proxy protocol (send-proxy)? Greets, Sander
Re: Help haproxy
On 02.02.2015 16:33, Mathieu Sergent wrote: Hi Sander, Yes i reloaded the haproxy and my web server too. But no change. And i'm not using proxy protocol. To give you more precisions, on my web server i used tcpdump functions which give me back the header of the requete http. And in this i found my client's address. But this is really strange that i can do it without the forwardfor. The only other thing that I can think of is that your client is behind a proxy server which adds the X-Forward-For header for you... Or you got something strange in your config... Sander
Re: Help! haproxy+nginx+tomcats cluster problems.
Hi, On Sat, Jan 23, 2010 at 12:05:16PM +0800, ËïéªËÉ wrote: Hello ! I have questions ! please help me ! thank you very much ! my cluster works , but not excellent. that's please see this architecture below my questions first. Q1:on the tomcats ,there are always 500~800 TIME_WAIT connections from haproxy; Time_wait sockets are harmless and normal. 800 is very low and you don't have to worry about them. I have already reached 4 millions :-) Q2:when I use loadrunner to test this cluster , the result disappointed my very much. you have to describe a bit more what happened. (...) global log 127.0.0.1 local0 notice maxconn 40960 chroot /usr/local/haproxy/chroot uid 99 gid 99 nbproc 8 remove nbproc, it can only cause you trouble. pidfile /usr/local/haproxy/logs/haproxy.pid stats socket /var/run/haproxy.stat mode 600 stats maxconn 65536 you don't need to support 64k connections to the stats socket, that's nonsense. Simply remove this statement. ulimit-n 65536 you can safely remove that one too since it's automatically computed from maxconn. daemon #debug #quiet nosepoll You can save some CPU cycles by commenting out nosepoll. defaults log global mode http option httplog please add log global here, otherwise you won't log. retries 3 option redispatch option abortonclose maxconn 40960 Please always ensure that any defaults or listen or frontend's maxconn is lower than the global one. contimeout 5000 clitimeout 3 srvtimeout 3 timeout check 2000 option dontlognull frontend haproxy bind 0.0.0.0:80 mode http option httplog option httpclose option forwardfor maxconn5 same here about maxconn. clitimeout 3 acl statcs url_reg ^[^?]*\.(jpg|html|htm|png|gif|css|shtml)([?]|$) This one can be simplified a lot and without an expensive regex : acl statcs path_end .jpg .html .htm .png .gif .css .shtml OK, your config is otherwise correct. What load did you reach with loadrunner, in terms of requests per second and concurrent connections ? What response time did you observe ? What haproxy version are you running on ? You may have to start a new test with the fixed parameters above and with logging enabled (please check that your syslog daemon correctly logs). Otherwise you won't know where your issues happen. Regards, Willy