Re: HAProxy 1.8.4 crashing

2018-07-05 Thread Holger Just
Hi Praveen,

There are several fixes for segfaults which might occur in your version
of HAProxy. Before checking anything else, you should upgrade to the
latest version of HAProxy 1.8 (currently 1.8.12).

See http://www.haproxy.org/bugs/bugs-1.8.4.html for bugs fixed in this
version compared to your current version.

Regards,
Holger

UPPALAPATI, PRAVEEN wrote:
> 
> Hi Haproxy Team,
> 
> Our Prod Haproxy instance is crashing with following error in 
> /var/log/messages:
> 
> Jun 28 17:52:30 zlp32359 kernel: haproxy[55940]: segfault at 60 ip 
> 0045b0a9 sp 7f4ef6b9f010 error 4 in haproxy[40+12b000]
> Jun 28 17:56:01 zlp32359 systemd: Started Session 73792 of user root.
> Jun 28 17:56:01 zlp32359 systemd: Starting Session 73792 of user root.
> Jun 28 17:56:01 zlp32359 LMC: Hardware Manufacturer = VMWARE
> Jun 28 17:56:01 zlp32359 LMC: Hardware Product Name = VMware Virtual Platform
> Jun 28 17:56:01 zlp32359 LMC: Hardware Serial # = VMware-42 29 ea 5e 6c 7b 5b 
> 49-ca 32 48 fb 5a 9d e7 d6
> Jun 28 17:56:01 zlp32359 LMC: ### NO PID_MAX ISSUES FOUND ###
> Jun 28 17:56:01 zlp32359 LMC: ### NO READ ONLY FILE SYSTEM ISSUES FOUND ###
> Jun 28 17:56:01 zlp32359 LMC: ### NO SCSI ABORT ISSUES FOUND ###
> Jun 28 17:56:02 zlp32359 LMC: ### NO SCSI ERROR ISSUES FOUND ###
> 
> HaProxyVersion:
> 
> haproxy -v
> HA-Proxy version 1.8.4-1deb90d 2018/02/08
> Copyright 2000-2018 Willy Tarreau 
> 
> Cmd to run haproxy :
> 
> //opt/app/haproxy/sbin/haproxy -D -f //opt/app/haproxy/etc/haproxy.cfg -f 
> //opt/app/haproxy/etc/ haproxy-healthcheck.cfg -p 
> //opt/app/haproxy/log/haprox.pid
> 
> Let me know if you need more information and if you need more logging let me 
> know how can enable it. I am not able to reproduce this in our dev 
> box(Probably I am not able to replicate the traffic on dev).
> 
> Thanks,
> Praveen.
> 
> 
> 
> 



Re: HAProxy 1.8.4 crashing

2018-07-05 Thread Willy Tarreau
Hi Praveen,

On Thu, Jul 05, 2018 at 04:13:25PM +, UPPALAPATI, PRAVEEN wrote:
> 
> 
> Hi Haproxy Team,
> 
> Our Prod Haproxy instance is crashing with following error in 
> /var/log/messages:
> 
> Jun 28 17:52:30 zlp32359 kernel: haproxy[55940]: segfault at 60 ip 
> 0045b0a9 sp 7f4ef6b9f010 error 4 in haproxy[40+12b000]
> Jun 28 17:56:01 zlp32359 systemd: Started Session 73792 of user root.
> Jun 28 17:56:01 zlp32359 systemd: Starting Session 73792 of user root.
> Jun 28 17:56:01 zlp32359 LMC: Hardware Manufacturer = VMWARE
> Jun 28 17:56:01 zlp32359 LMC: Hardware Product Name = VMware Virtual Platform
> Jun 28 17:56:01 zlp32359 LMC: Hardware Serial # = VMware-42 29 ea 5e 6c 7b 5b 
> 49-ca 32 48 fb 5a 9d e7 d6
> Jun 28 17:56:01 zlp32359 LMC: ### NO PID_MAX ISSUES FOUND ###
> Jun 28 17:56:01 zlp32359 LMC: ### NO READ ONLY FILE SYSTEM ISSUES FOUND ###
> Jun 28 17:56:01 zlp32359 LMC: ### NO SCSI ABORT ISSUES FOUND ###
> Jun 28 17:56:02 zlp32359 LMC: ### NO SCSI ERROR ISSUES FOUND ###
> 
> HaProxyVersion:
> 
> haproxy -v
> HA-Proxy version 1.8.4-1deb90d 2018/02/08


Well, as you can see, at least 109 bugs have been fixed since this version,
including one critical and 11 major, all potentially able to provoke a crash :

   http://www.haproxy.org/bugs/bugs-1.8.4.html

I think it's *really* time for you to apply maintenance updates and try again.

Regards,
Willy



HAProxy 1.8.4 crashing

2018-07-05 Thread UPPALAPATI, PRAVEEN



Hi Haproxy Team,

Our production HAProxy instance is crashing with the following error in /var/log/messages:

Jun 28 17:52:30 zlp32359 kernel: haproxy[55940]: segfault at 60 ip 
0045b0a9 sp 7f4ef6b9f010 error 4 in haproxy[40+12b000]
Jun 28 17:56:01 zlp32359 systemd: Started Session 73792 of user root.
Jun 28 17:56:01 zlp32359 systemd: Starting Session 73792 of user root.
Jun 28 17:56:01 zlp32359 LMC: Hardware Manufacturer = VMWARE
Jun 28 17:56:01 zlp32359 LMC: Hardware Product Name = VMware Virtual Platform
Jun 28 17:56:01 zlp32359 LMC: Hardware Serial # = VMware-42 29 ea 5e 6c 7b 5b 
49-ca 32 48 fb 5a 9d e7 d6
Jun 28 17:56:01 zlp32359 LMC: ### NO PID_MAX ISSUES FOUND ###
Jun 28 17:56:01 zlp32359 LMC: ### NO READ ONLY FILE SYSTEM ISSUES FOUND ###
Jun 28 17:56:01 zlp32359 LMC: ### NO SCSI ABORT ISSUES FOUND ###
Jun 28 17:56:02 zlp32359 LMC: ### NO SCSI ERROR ISSUES FOUND ###
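For readers unfamiliar with the kernel's segfault line, here is a minimal Python sketch that decodes it. The function name and regular expression are illustrative assumptions, not part of any HAProxy tooling; the bit meanings are the x86 page-fault error code bits (bit 0: page present, bit 1: write access, bit 2: user mode).

```python
import re

# Illustrative pattern for the "segfault at ..." line printed by the Linux kernel.
SEGFAULT_RE = re.compile(
    r"(?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+)"
)

def decode_segfault(line):
    """Return the decoded fields of a kernel segfault line, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"))
    return {
        "program": m.group("prog"),
        "pid": int(m.group("pid")),
        "fault_address": int(m.group("addr"), 16),
        "instruction_pointer": int(m.group("ip"), 16),
        "page_present": bool(err & 1),            # bit 0: 0 means page not mapped
        "access": "write" if err & 2 else "read",  # bit 1
        "mode": "user" if err & 4 else "kernel",   # bit 2
    }

# The log line from the report, with the wrapped line joined back together.
line = ("Jun 28 17:52:30 zlp32359 kernel: haproxy[55940]: segfault at 60 ip "
        "0045b0a9 sp 7f4ef6b9f010 error 4 in haproxy[40+12b000]")
info = decode_segfault(line)
print(info)
```

Here "error 4" decodes to a user-mode read of an unmapped page at address 0x60. A fault address that small often suggests a field access through a NULL pointer, though only a core dump or backtrace can confirm that.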

HaProxyVersion:

haproxy -v
HA-Proxy version 1.8.4-1deb90d 2018/02/08
Copyright 2000-2018 Willy Tarreau 

Cmd to run haproxy :

//opt/app/haproxy/sbin/haproxy -D -f //opt/app/haproxy/etc/haproxy.cfg -f 
//opt/app/haproxy/etc/ haproxy-healthcheck.cfg -p 
//opt/app/haproxy/log/haprox.pid

Let me know if you need more information, and if you need more logging, let me 
know how I can enable it. I am not able to reproduce this on our dev box 
(probably because I cannot replicate the production traffic on dev).

Thanks,
Praveen.






Re: Issue with parsing DNS from AWS

2018-07-05 Thread Jim Deville
Hi Baptiste,


I appreciate you taking the time for this. We had tried increasing the response 
size, but I believe we left hold obsolete at its default, and that probably led 
to the flapping. How often does HAProxy re-poll DNS for this? I'm curious what 
limits this really sets on how many servers we can scale to. Also, will DNS 
over TCP help at all? It seems it would still need roughly the same settings, 
given the round-robin responses.


In the meantime, we will look into these settings to see if we can make them 
work as well.


Jim


From: Baptiste 
Sent: Tuesday, July 3, 2018 9:20:53 AM
To: Jim Deville
Cc: haproxy@formilux.org; Jonathan Works
Subject: Re: Issue with parsing DNS from AWS

Ah yes, I also added the following "init-addr none" statement on the 
server-template line.
This prevents HAProxy from using the libc resolver, which might end up in 
unpredictable behavior in that environment.

Baptiste

On Tue, Jul 3, 2018 at 3:18 PM, Baptiste <bed...@gmail.com> wrote:
Well, I can partially reproduce the issue you're facing and I can see some 
weird behavior of AWS's DNS servers.

First, by default, HAProxy only supports DNS over UDP and accepts up to 512 
bytes of payload in the DNS response.
DNS over TCP is not yet available, but the accepted payload size can be 
increased using the EDNS0 extension.

There is a "magic" number of SRV records with AWS and HAProxy's default accepted 
payload size: at around 4 SRV records, the response payload may be bigger than 
512 bytes.
When that happens, the AWS DNS server does not return any data; it simply 
returns an empty response with the TRUNCATED flag set.
In such a case, a client is supposed to replay the request over TCP...
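To illustrate the truncated-response case described above, here is a minimal Python sketch that reads the 12-byte DNS header and checks the TC (truncation) bit defined in RFC 1035. The packet is synthetic and the function name is an assumption made for illustration; this is not HAProxy's parser.

```python
import struct

TC_FLAG = 0x0200  # truncation bit in the 16-bit DNS header flags field (RFC 1035)

def parse_dns_header(packet):
    """Decode the fixed 12-byte DNS header and report the truncation state."""
    if len(packet) < 12:
        raise ValueError("a DNS header needs at least 12 bytes")
    # ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT, all big-endian 16-bit.
    ident, flags, qdcount, ancount, nscount, arcount = struct.unpack("!6H", packet[:12])
    return {
        "id": ident,
        "truncated": bool(flags & TC_FLAG),
        "answers": ancount,
    }

# Synthetic header: a response (QR=1) with TC, RD and RA set and zero answers,
# which is the shape of the "empty but truncated" reply described above.
flags = 0x8000 | TC_FLAG | 0x0100 | 0x0080
packet = struct.pack("!6H", 0x1234, flags, 1, 0, 0, 0)
header = parse_dns_header(packet)
print(header)
```

A client that sees "truncated" with no answers would normally retry over TCP or, with EDNS0, advertise a larger accepted payload size over UDP.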

Another magic value with AWS DNS servers is that they won't return more than 8 
SRV records, even if you have 10 servers in your service (even over TCP).
The AWS DNS servers simply return a round-robin list of the records: some will 
disappear, some will reappear at some point in time.


Conclusion: to make HAProxy work in such an environment, you want to configure 
it this way:
resolvers awsdns
  nameserver dns0 NAMESERVER:53   # <=== please remove the double quotes
  accepted_payload_size 8192      # <=== workaround for too short accepted payload
  hold obsolete 30s               # <=== workaround for limited number of records returned by AWS
You may want to read the documentation of HAProxy's resolvers. There are a few 
other timeouts and hold periods you could tune.

With the configuration above, I could easily scale from 2 to 10, back to 2, 
passing through 4, 8, etc... successfully and without any server flapping.
I did not try to go higher than 10. Bear in mind the "hold obsolete" period is 
the period during which HAProxy considers a server as available even if the DNS 
server did not return it in the SRV record list.
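The effect of the hold period can be sketched with a toy Python model. This is a simplified simulation, not HAProxy's resolver logic: the poll interval is an assumption (the thread does not state it), the rotation of the DNS answer by one record per poll is an assumption, and all names are hypothetical.

```python
def simulate_flapping(total_servers, max_records, poll_interval, hold_obsolete, polls):
    """Count how often a server is dropped because DNS stopped listing it.

    Each poll returns a rotating window of at most `max_records` servers.
    A server is dropped once it has been unseen for longer than `hold_obsolete`.
    """
    last_seen = {s: 0.0 for s in range(total_servers)}  # assume all known at t=0
    active = set(range(total_servers))
    removals = 0
    window = min(max_records, total_servers)
    for i in range(polls):
        now = i * poll_interval
        # Round-robin answer: the window start shifts by one record per poll.
        returned = {(i + k) % total_servers for k in range(window)}
        for s in returned:
            last_seen[s] = now
            active.add(s)  # a dropped server comes back when it reappears
        for s in list(active):
            if now - last_seen[s] > hold_obsolete:
                active.discard(s)
                removals += 1
    return removals

# 10 servers, 8 records per answer, assumed 10 s between polls.
print(simulate_flapping(10, 8, 10, hold_obsolete=30, polls=50))
print(simulate_flapping(10, 8, 10, hold_obsolete=0, polls=50))
```

In this model a server is missing from at most two consecutive answers, so a hold period longer than two poll intervals avoids any removal, while a hold period of zero drops servers on the first answer that omits them.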

Baptiste







On Tue, Jul 3, 2018 at 1:26 PM, Baptiste <bed...@gmail.com> wrote:
Answering myself... I found my way in the menu to be able to allow port 9000 to 
read the stats page and to find the public IP associated to my "app".
That said, I still can't get a shell on the running container, but I think I 
found an AWS documentation page for this purpose.

I'll keep you updated.

On Tue, Jul 3, 2018 at 1:06 PM, Baptiste <bed...@gmail.com> wrote:
Hi Jim,

I think I have something running...
At least, terraform did not complain and I can see "stuff" in my AWS dashboard.
Now, I have no idea how I can connect to my running HAProxy container, 
nor how I can troubleshoot what's happening :)

Any help would be (again) appreciated.

Baptiste



On Tue, Jul 3, 2018 at 11:39 AM, Baptiste <bed...@gmail.com> wrote:
Hi Jim,

Sorry for the long pause :)
I was dealing with some travel, conferences and catching up on my backlog.
So, the good news, is that this issue is now my priority :)

I'll try to first reproduce it and come back to you if I have any issue during 
that step.
(by the way, thanks for the github repo to help me speed up in that step).

Baptiste




On Mon, Jun 25, 2018 at 10:54 PM, Jim Deville <jdevi...@malwarebytes.com> wrote:

Hi Baptiste,


I just wanted to follow up to see if you were able to repro and perhaps had a 
patch we could try?


Jim


From: Jim Deville
Sent: Thursday, June 21, 2018 1:05:49 PM
To: Baptiste
Cc: haproxy@formilux.org; Jonathan Works
Subject: Re: Issue with parsing DNS from AWS


Thanks for the reply. We were able to extract a minimal repro to demonstrate 
the problem: https://github.com/jgworks/haproxy-servicediscovery



The docker folder contains a version of the config we're using and a startup 
script to determine the local private DNS zone (AWS puts it at the subnet's +2).
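The "+2" rule mentioned above can be sketched in Python as follows. The CIDR value is a hypothetical example and the function name is an assumption; the actual startup script in the repository may work differently.

```python
import ipaddress

def resolver_address(cidr):
    """Return the network base address plus two for the given CIDR block."""
    network = ipaddress.ip_network(cidr)
    return network.network_address + 2

# Hypothetical network range, used only for illustration.
print(resolver_address("10.0.0.0/16"))
```

The result is the address that would replace the NAMESERVER placeholder in a resolvers section.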


Jim


From: Baptiste <bed...@gmail.com>
Sent: Thursday, June 21, 2018 11:02:26 AM
To: Jim Deville
Cc: