Re: Weird issues with UNIX-Sockets on 2.1.x
On 2020-03-27 16:58, Christian Ruppert wrote:
> On 2020-03-27 16:49, Olivier Houchard wrote:
>> On Fri, Mar 27, 2020 at 04:32:21PM +0100, Christian Ruppert wrote:
>>> On 2020-03-27 16:27, Olivier Houchard wrote:
>>>> On Fri, Mar 27, 2020 at 04:21:20PM +0100, Christian Ruppert wrote:
>>>>> During the reload I just found something in the daemon log:
>>>>> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) :
>>>>> Starting proxy someotherlistener: cannot bind socket [0.0.0.0:18540]
>>>>> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) :
>>>>> Starting proxy someotherlistener: cannot bind socket [:::18540]
>>>>>
>>>>> So during the reload, this happened and seems to have caused any
>>>>> further issues/trouble.
>>>>
>>>> That would make sense. Does that mean you have old processes hanging
>>>> around ? Do you use seamless reload ? If so, it shouldn't attempt to
>>>> bind the socket, but get them from the old process.
>>>
>>> I remember that it was necessary to have a systemd wrapper around, as it
>>> caused trouble otherwise, due to PID being changed etc.
>>> Not sure if that wrapper is still in use. In this case it's systemd
>>> though.
>>> [Unit]
>>> Description=HAProxy Load Balancer
>>> After=network.target
>>>
>>> [Service]
>>> Environment="CONFIG=/etc/haproxy/haproxy.cfg" "PIDFILE=/run/haproxy.pid"
>>> ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q
>>> ExecStart=/usr/sbin/haproxy -Ws -f $CONFIG -p $PIDFILE
>>> ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q
>>> ExecReload=/bin/kill -USR2 $MAINPID
>>> KillMode=mixed
>>> Restart=always
>>> SuccessExitStatus=143
>>> TimeoutStopSec=30
>>> Type=notify
>> [...]
>>> We've added the TimeoutStopSec=30 for some reason (I'd have to ask my
>>> colleague, something took longer or something like that, since we have
>>> quite a lot of frontends/listener/backend)
>>> Only the two processes I mentioned before are / were running. Seems like
>>> the fallback didn't work properly?
>>
>> The wrapper is no longer needed, it has been superseded by the
>> master-worker (which you seem to be using, given you're using -Ws).
>> It is possible the old process refuses to die, and you end up hitting the
>> timeout and it gets killed eventually, but it's too late.
>> Do you have "expose-fd listeners" on the unix stats socket ? Using it will
>> allow the new process to connect to the old process' stats socket, and
>> get all the listening sockets, so that it won't have to bind them.
>
> Oh, that sounds quite handy. I wasn't aware of it. I'll add it soonish.
> Thanks for the hint!

https://www.haproxy.com/de/blog/hitless-reloads-with-haproxy-howto/
"Please note that this step does not need to be performed if your HAProxy
configuration already contains the directive “master-worker”, or if it is
started with the option -W."

I have steps to reproduce it:

A C sample to bind the socket (nc doesn't work for some reason):

    #include <stdio.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <unistd.h>

    int main()
    {
        int sock;
        struct sockaddr_in server;

        sock = socket(AF_INET, SOCK_STREAM, 0);
        if (sock == -1) {
            printf("Failed to create socket!\n");
        }

        server.sin_family = AF_INET;
        server.sin_addr.s_addr = INADDR_ANY;
        server.sin_port = htons(1338);

        /* Occupy port 1338 so that HAProxy cannot bind it later. */
        if (bind(sock, (struct sockaddr *) &server, sizeof(server)) == -1) {
            printf("Failed to bind socket!\n");
        }

        while (1) {
            sleep(1);
        }

        return 0;
    }

    gcc socket.c -o socket
    ./socket

Having an initial HAProxy config:

    global
        user haproxy
        group haproxy
        log-send-hostname
        log 127.0.0.1 len 65535 local0
        stats socket unix@/run/haproxy.stat user haproxy gid haproxy mode 600 level admin

    frontend unixsocket_reload
        bind 127.0.0.1:1337
        bind unix@/run/haproxy-sockettest.sock user haproxy group root mode 600
        mode http
        log global

And starting it with systemd, ending up in:

    /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid

Testing:

    curl --unix-socket /run/haproxy-sockettest.sock http://127.0.0.1 -vs
    echo help | socat unix-connect:/run/haproxy.stat stdio

Adding a second frontend to the haproxy.cfg:

    frontend unixsocket_reload2
        bind 127.0.0.1:1338
        bind unix@/run/haproxy-sockettest-2.sock user haproxy group root mode 600
        mode http
        log global

    systemctl reload haproxy

curl and socat don't work anymore, while the TCP socket still works.

Now restarting HAProxy with the initial config but with the adjusted stats
socket:

    stats socket unix@/run/haproxy.stat user haproxy gid haproxy mode 600 level admin expose-fd listeners

Note that the -x will be appended automatically (at least for systemd -Ws).

And doing the same again: curl and socat still work. The new frontend does
not, even though its UNIX socket was created.
I think the way that works is OK for me then. Thanks for pointing out the
expose-fd listeners!

>> Regards,
>>
>> Olivier

--
Regards,
Christian Ruppert
Re: Weird issues with UNIX-Sockets on 2.1.x
On 2020-03-27 16:49, Olivier Houchard wrote:
> On Fri, Mar 27, 2020 at 04:32:21PM +0100, Christian Ruppert wrote:
>> On 2020-03-27 16:27, Olivier Houchard wrote:
>>> On Fri, Mar 27, 2020 at 04:21:20PM +0100, Christian Ruppert wrote:
>>>> During the reload I just found something in the daemon log:
>>>> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) :
>>>> Starting proxy someotherlistener: cannot bind socket [0.0.0.0:18540]
>>>> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) :
>>>> Starting proxy someotherlistener: cannot bind socket [:::18540]
>>>>
>>>> So during the reload, this happened and seems to have caused any
>>>> further issues/trouble.
>>>
>>> That would make sense. Does that mean you have old processes hanging
>>> around ? Do you use seamless reload ? If so, it shouldn't attempt to
>>> bind the socket, but get them from the old process.
>>
>> I remember that it was necessary to have a systemd wrapper around, as it
>> caused trouble otherwise, due to PID being changed etc.
>> Not sure if that wrapper is still in use. In this case it's systemd
>> though.
>> [Unit]
>> Description=HAProxy Load Balancer
>> After=network.target
>>
>> [Service]
>> Environment="CONFIG=/etc/haproxy/haproxy.cfg" "PIDFILE=/run/haproxy.pid"
>> ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q
>> ExecStart=/usr/sbin/haproxy -Ws -f $CONFIG -p $PIDFILE
>> ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q
>> ExecReload=/bin/kill -USR2 $MAINPID
>> KillMode=mixed
>> Restart=always
>> SuccessExitStatus=143
>> TimeoutStopSec=30
>> Type=notify
> [...]
>> We've added the TimeoutStopSec=30 for some reason (I'd have to ask my
>> colleague, something took longer or something like that, since we have
>> quite a lot of frontends/listener/backend)
>> Only the two processes I mentioned before are / were running. Seems like
>> the fallback didn't work properly?
>
> The wrapper is no longer needed, it has been superseded by the
> master-worker (which you seem to be using, given you're using -Ws).
> It is possible the old process refuses to die, and you end up hitting the
> timeout and it gets killed eventually, but it's too late.
> Do you have "expose-fd listeners" on the unix stats socket ? Using it will
> allow the new process to connect to the old process' stats socket, and
> get all the listening sockets, so that it won't have to bind them.

Oh, that sounds quite handy. I wasn't aware of it. I'll add it soonish.
Thanks for the hint!

> Regards,
>
> Olivier

--
Regards,
Christian Ruppert
Re: Weird issues with UNIX-Sockets on 2.1.x
On Fri, Mar 27, 2020 at 04:32:21PM +0100, Christian Ruppert wrote:
> On 2020-03-27 16:27, Olivier Houchard wrote:
> > On Fri, Mar 27, 2020 at 04:21:20PM +0100, Christian Ruppert wrote:
> >> During the reload I just found something in the daemon log:
> >> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) :
> >> Starting proxy someotherlistener: cannot bind socket [0.0.0.0:18540]
> >> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) :
> >> Starting proxy someotherlistener: cannot bind socket [:::18540]
> >>
> >> So during the reload, this happened and seems to have caused any
> >> further
> >> issues/trouble.
> >>
> >
> > That would make sense. Does that mean you have old processes hanging
> > around ? Do you use seamless reload ? If so, it shouldn't attempt to
> > bind the socket, but get them from the old process.
>
> I remember that it was necessary to have a systemd wrapper around, as it
> caused trouble otherwise, due to PID being changed etc.
> Not sure if that wrapper is still in use. In this case it's systemd
> though.
> [Unit]
> Description=HAProxy Load Balancer
> After=network.target
>
> [Service]
> Environment="CONFIG=/etc/haproxy/haproxy.cfg" "PIDFILE=/run/haproxy.pid"
> ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q
> ExecStart=/usr/sbin/haproxy -Ws -f $CONFIG -p $PIDFILE
> ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q
> ExecReload=/bin/kill -USR2 $MAINPID
> KillMode=mixed
> Restart=always
> SuccessExitStatus=143
> TimeoutStopSec=30
> Type=notify
[...]
> We've added the TimeoutStopSec=30 for some reason (I'd have to ask my
> colleague, something took longer or something like that, since we have
> quite a lot of frontends/listener/backend)
> Only the two processes I mentioned before are / were running. Seems like
> the fallback didn't work properly?
>

The wrapper is no longer needed, it has been superseded by the
master-worker (which you seem to be using, given you're using -Ws).
It is possible the old process refuses to die, and you end up hitting the
timeout and it gets killed eventually, but it's too late.
Do you have "expose-fd listeners" on the unix stats socket ? Using it will
allow the new process to connect to the old process' stats socket, and
get all the listening sockets, so that it won't have to bind them.

Regards,

Olivier
Re: Weird issues with UNIX-Sockets on 2.1.x
On 2020-03-27 16:27, Olivier Houchard wrote:
> On Fri, Mar 27, 2020 at 04:21:20PM +0100, Christian Ruppert wrote:
>> During the reload I just found something in the daemon log:
>> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) :
>> Starting proxy someotherlistener: cannot bind socket [0.0.0.0:18540]
>> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) :
>> Starting proxy someotherlistener: cannot bind socket [:::18540]
>>
>> So during the reload, this happened and seems to have caused any
>> further issues/trouble.
>
> That would make sense. Does that mean you have old processes hanging
> around ? Do you use seamless reload ? If so, it shouldn't attempt to
> bind the socket, but get them from the old process.

I remember that it was necessary to have a systemd wrapper around, as it
caused trouble otherwise, due to PID being changed etc.
Not sure if that wrapper is still in use. In this case it's systemd
though.

    [Unit]
    Description=HAProxy Load Balancer
    After=network.target

    [Service]
    Environment="CONFIG=/etc/haproxy/haproxy.cfg" "PIDFILE=/run/haproxy.pid"
    ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q
    ExecStart=/usr/sbin/haproxy -Ws -f $CONFIG -p $PIDFILE
    ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q
    ExecReload=/bin/kill -USR2 $MAINPID
    KillMode=mixed
    Restart=always
    SuccessExitStatus=143
    TimeoutStopSec=30
    Type=notify

    # The following lines leverage SystemD's sandboxing options to provide
    # defense in depth protection at the expense of restricting some flexibility
    # in your setup (e.g. placement of your configuration files) or possibly
    # reduced performance. See systemd.service(5) and systemd.exec(5) for further
    # information.

    # NoNewPrivileges=true
    # ProtectHome=true
    # If you want to use 'ProtectSystem=strict' you should whitelist the PIDFILE,
    # any state files and any other files written using 'ReadWritePaths' or
    # 'RuntimeDirectory'.
    # ProtectSystem=true
    # ProtectKernelTunables=true
    # ProtectKernelModules=true
    # ProtectControlGroups=true
    # If your SystemD version supports them, you can add: @reboot, @swap, @sync
    # SystemCallFilter=~@cpu-emulation @keyring @module @obsolete @raw-io

    [Install]
    WantedBy=multi-user.target

We've added the TimeoutStopSec=30 for some reason (I'd have to ask my
colleague, something took longer or something like that, since we have
quite a lot of frontends/listener/backend)
Only the two processes I mentioned before are / were running. Seems like
the fallback didn't work properly?

> Regards,
>
> Olivier

--
Regards,
Christian Ruppert
Re: Weird issues with UNIX-Sockets on 2.1.x
On Fri, Mar 27, 2020 at 04:21:20PM +0100, Christian Ruppert wrote:
> During the reload I just found something in the daemon log:
> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) :
> Starting proxy someotherlistener: cannot bind socket [0.0.0.0:18540]
> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) :
> Starting proxy someotherlistener: cannot bind socket [:::18540]
>
> So during the reload, this happened and seems to have caused any further
> issues/trouble.
>

That would make sense. Does that mean you have old processes hanging
around ? Do you use seamless reload ? If so, it shouldn't attempt to
bind the socket, but get them from the old process.

Regards,

Olivier
Re: Weird issues with UNIX-Sockets on 2.1.x
During the reload I just found something in the daemon log:
Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) :
Starting proxy someotherlistener: cannot bind socket [0.0.0.0:18540]
Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) :
Starting proxy someotherlistener: cannot bind socket [:::18540]

So during the reload, this happened and seems to have caused any further
issues/trouble.

On 2020-03-27 15:10, Christian Ruppert wrote:
> So now I looked for more of those "SC"'s in the log, from our monitoring
> and it appeared first around 13:38:01. Around 13:37:54 a reload was
> issued by puppet or rundeck. So right now, it seems that something
> happened during the reload which affected UNIX sockets.
>
> On 2020-03-27 15:00, Christian Ruppert wrote:
>> Hi Olivier,
>>
>> On 2020-03-27 14:50, Olivier Houchard wrote:
>>> Hi Christian,
>>>
>>> On Fri, Mar 27, 2020 at 02:37:41PM +0100, Christian Ruppert wrote:
>>>> Hi list,
>>>>
>>>> we have some weird issues now, for the second time: *some* SSL sockets
>>>> seem to be broken, as well as stats sockets.
>>>> HTTP still seems to work fine; the SSL ones are broken, however. It
>>>> happened at least on 2.1.3 and *perhaps* on 2.1.2 as well. We're not
>>>> sure whether the first time was on 2.1.2 or 2.1.3.
>>>> The one that failed today was updated yesterday, so HAProxy has an
>>>> uptime of about 24h.
>>>> We're using threads. default + HTTP is using 1 thread, 1 is dedicated
>>>> for a TCP listener/Layer-4, one is for RSA only and all the rest is for
>>>> ECC.
>>> [...]
>>>> The problem occurred around 13:40 (CET, in case it matters at some point)
>>>>
>>>> Any ideas so far?
>>>
>>> So basically, it used to work, and suddenly you get errors on any TLS
>>> connection ?
>>
>> Yeah, right now it looks that way.
>>
>>> If you still have the TCP stat socket working, can you show the output
>>> of "show fd" ?
>>
>> Oh, it's the http stats listener that's still working. Not sure whether
>> it accepts any commands to be honest.
>>
>> pid = 21313 (process #1, nbproc = 1, nbthread = 8)
>> uptime = 0d 1h56m48s
>> system limits: memmax = unlimited; ulimit-n = 1574819
>> maxsock = 1574819; maxconn = 786432; maxpipes = 0
>> current conns = 6; current pipes = 0/0; conn rate = 43/sec; bit rate = 219.704 kbps
>> Running tasks: 1/1158; idle = 100 %
>>
>>> Thanks !
>>>
>>> Olivier

--
Regards,
Christian Ruppert
Re: Weird issues with UNIX-Sockets on 2.1.x
So now I looked for more of those "SC"'s in the log, from our monitoring
and it appeared first around 13:38:01. Around 13:37:54 a reload was
issued by puppet or rundeck. So right now, it seems that something
happened during the reload which affected UNIX sockets.

On 2020-03-27 15:00, Christian Ruppert wrote:
> Hi Olivier,
>
> On 2020-03-27 14:50, Olivier Houchard wrote:
>> Hi Christian,
>>
>> On Fri, Mar 27, 2020 at 02:37:41PM +0100, Christian Ruppert wrote:
>>> Hi list,
>>>
>>> we have some weird issues now, for the second time: *some* SSL sockets
>>> seem to be broken, as well as stats sockets.
>>> HTTP still seems to work fine; the SSL ones are broken, however. It
>>> happened at least on 2.1.3 and *perhaps* on 2.1.2 as well. We're not
>>> sure whether the first time was on 2.1.2 or 2.1.3.
>>> The one that failed today was updated yesterday, so HAProxy has an
>>> uptime of about 24h.
>>> We're using threads. default + HTTP is using 1 thread, 1 is dedicated
>>> for a TCP listener/Layer-4, one is for RSA only and all the rest is for
>>> ECC.
>> [...]
>>> The problem occurred around 13:40 (CET, in case it matters at some point)
>>>
>>> Any ideas so far?
>>
>> So basically, it used to work, and suddenly you get errors on any TLS
>> connection ?
>
> Yeah, right now it looks that way.
>
>> If you still have the TCP stat socket working, can you show the output
>> of "show fd" ?
>
> Oh, it's the http stats listener that's still working. Not sure whether
> it accepts any commands to be honest.
>
> pid = 21313 (process #1, nbproc = 1, nbthread = 8)
> uptime = 0d 1h56m48s
> system limits: memmax = unlimited; ulimit-n = 1574819
> maxsock = 1574819; maxconn = 786432; maxpipes = 0
> current conns = 6; current pipes = 0/0; conn rate = 43/sec; bit rate = 219.704 kbps
> Running tasks: 1/1158; idle = 100 %
>
>> Thanks !
>>
>> Olivier

--
Regards,
Christian Ruppert
Re: Weird issues with UNIX-Sockets on 2.1.x
Hi Olivier,

On 2020-03-27 14:50, Olivier Houchard wrote:
> Hi Christian,
>
> On Fri, Mar 27, 2020 at 02:37:41PM +0100, Christian Ruppert wrote:
>> Hi list,
>>
>> we have some weird issues now, for the second time: *some* SSL sockets
>> seem to be broken, as well as stats sockets.
>> HTTP still seems to work fine; the SSL ones are broken, however. It
>> happened at least on 2.1.3 and *perhaps* on 2.1.2 as well. We're not
>> sure whether the first time was on 2.1.2 or 2.1.3.
>> The one that failed today was updated yesterday, so HAProxy has an
>> uptime of about 24h.
>> We're using threads. default + HTTP is using 1 thread, 1 is dedicated
>> for a TCP listener/Layer-4, one is for RSA only and all the rest is for
>> ECC.
> [...]
>> The problem occurred around 13:40 (CET, in case it matters at some point)
>>
>> Any ideas so far?
>
> So basically, it used to work, and suddenly you get errors on any TLS
> connection ?

Yeah, right now it looks that way.

> If you still have the TCP stat socket working, can you show the output
> of "show fd" ?

Oh, it's the http stats listener that's still working. Not sure whether
it accepts any commands to be honest.

    pid = 21313 (process #1, nbproc = 1, nbthread = 8)
    uptime = 0d 1h56m48s
    system limits: memmax = unlimited; ulimit-n = 1574819
    maxsock = 1574819; maxconn = 786432; maxpipes = 0
    current conns = 6; current pipes = 0/0; conn rate = 43/sec; bit rate = 219.704 kbps
    Running tasks: 1/1158; idle = 100 %

> Thanks !
>
> Olivier

--
Regards,
Christian Ruppert
Re: Weird issues with UNIX-Sockets on 2.1.x
Hi Christian,

On Fri, Mar 27, 2020 at 02:37:41PM +0100, Christian Ruppert wrote:
> Hi list,
>
> we have some weird issues now, for the second time: *some* SSL sockets
> seem to be broken, as well as stats sockets.
> HTTP still seems to work fine; the SSL ones are broken, however. It
> happened at least on 2.1.3 and *perhaps* on 2.1.2 as well. We're not
> sure whether the first time was on 2.1.2 or 2.1.3.
> The one that failed today was updated yesterday, so HAProxy has an
> uptime of about 24h.
> We're using threads. default + HTTP is using 1 thread, 1 is dedicated
> for a TCP listener/Layer-4, one is for RSA only and all the rest is for
> ECC.
[...]
> The problem occurred around 13:40 (CET, in case it matters at some point)
>
> Any ideas so far?
>

So basically, it used to work, and suddenly you get errors on any TLS
connection ?
If you still have the TCP stat socket working, can you show the output
of "show fd" ?

Thanks !

Olivier