
Re: Weird issues with UNIX-Sockets on 2.1.x

2020-03-30 Thread Christian Ruppert


On 2020-03-27 16:58, Christian Ruppert wrote:

On 2020-03-27 16:49, Olivier Houchard wrote:

On Fri, Mar 27, 2020 at 04:32:21PM +0100, Christian Ruppert wrote:

On 2020-03-27 16:27, Olivier Houchard wrote:
> On Fri, Mar 27, 2020 at 04:21:20PM +0100, Christian Ruppert wrote:
>> During the reload I just found something in the daemon log:
>> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) :
>> Starting proxy someotherlistener: cannot bind socket [0.0.0.0:18540]
>> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) :
>> Starting proxy someotherlistener: cannot bind socket [:::18540]
>>
>> So during the reload, this happened and seems to have caused any
>> further
>> issues/trouble.
>>
>
> That would make sense. Does that mean you have old processes hanging
> around? Do you use seamless reload? If so, it shouldn't attempt to
> bind the socket, but get them from the old process.

I remember that it was necessary to have a systemd wrapper around, as it
caused trouble otherwise, due to PID being changed etc.
Not sure if that wrapper is still in use. In this case it's systemd
though.
[Unit]
Description=HAProxy Load Balancer
After=network.target

[Service]
Environment="CONFIG=/etc/haproxy/haproxy.cfg" "PIDFILE=/run/haproxy.pid"

ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q
ExecStart=/usr/sbin/haproxy -Ws -f $CONFIG -p $PIDFILE
ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q
ExecReload=/bin/kill -USR2 $MAINPID
KillMode=mixed
Restart=always
SuccessExitStatus=143
TimeoutStopSec=30
Type=notify


[...]


We've added the TimeoutStopSec=30 for some reason (I'd have to ask my
colleague, something took longer or something like that, since we have
quite a lot of frontends/listeners/backends).
Only the two processes I mentioned before are / were running. Seems like
the fallback didn't work properly?



The wrapper is no longer needed, it has been superseded by the
master-worker (which you seem to be using, given you're using -Ws).
It is possible the old process refuses to die, and you end up hitting the
timeout and it gets killed eventually, but it's too late.
Do you have "expose-fd listeners" on the unix stats socket? Using it
will allow the new process to connect to the old process' stats socket,
and get all the listening sockets, so that it won't have to bind them.



Oh, that sounds quite handy. I wasn't aware of it. I'll add it
soonish. Thanks for the hint!


https://www.haproxy.com/de/blog/hitless-reloads-with-haproxy-howto/
"Please note that this step does not need to be performed if your 
HAProxy configuration already contains the directive “master-worker”, or 
if it is started with the option -W."


I have steps to reproduce it:
A C sample to bind the socket (nc doesn't work for some reason):
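#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main() {
	int sock;
	struct sockaddr_in server = {0};

	/* Create the TCP socket that will occupy the port */
	sock = socket(AF_INET, SOCK_STREAM, 0);
	if (sock == -1) {
		printf("Failed to create socket!\n");
		return 1;
	}

	server.sin_family = AF_INET;
	server.sin_addr.s_addr = INADDR_ANY;
	server.sin_port = htons(1338);

	/* Bind only, without listen(), to occupy port 1338 */
	if (bind(sock, (struct sockaddr *)&server, sizeof(server)) == -1) {
		printf("Failed to bind socket!\n");
		return 1;
	}

	/* Keep the process, and with it the bound socket, alive */
	while (1) {
		sleep(1);
	}

	return 0;
}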

gcc socket.c -o socket
./socket
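
For anyone without a compiler at hand, the same port-holding step can be sketched in Python. This is only an illustration and not part of the original reproduction: the helper names hold_port and can_bind are made up here, and like the C sample the holder binds without ever calling listen().

```python
import errno
import socket


def hold_port(port, host="0.0.0.0"):
    """Bind a TCP socket to (host, port) and return it without listening.

    Hypothetical helper mirroring the C sample: as long as the returned
    socket stays open, another plain bind() to the same address fails.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))
    return sock


def can_bind(port, host="0.0.0.0"):
    """Return True if a fresh socket can bind (host, port), else False."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
        return True
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return False
        raise
    finally:
        probe.close()
```

Calling hold_port(1338) in an interactive session and keeping the returned socket open should have the same effect as leaving ./socket running.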

Having an initial HAProxy config:
global
user haproxy
group haproxy

log-send-hostname

log 127.0.0.1 len 65535 local0

   stats socket unix@/run/haproxy.stat user haproxy gid haproxy mode 600 level admin



frontend unixsocket_reload
bind 127.0.0.1:1337
bind unix@/run/haproxy-sockettest.sock user haproxy group root mode 600
mode http
log global


And starting it with systemd, which ends up running:
/usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid

Testing:
curl --unix-socket /run/haproxy-sockettest.sock http://127.0.0.1 -vs
echo help | socat unix-connect:/run/haproxy.stat stdio
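
The socat check can also be scripted. Below is a minimal Python sketch of the same idea, assuming the stats socket closes the connection after answering a single command (the usual behaviour when no interactive prompt is requested). The function name stats_command and the timeout value are made up for this example.

```python
import socket


def stats_command(path, command, timeout=5.0):
    """Send one command to a HAProxy stats socket and return the reply.

    Rough equivalent of `echo <command> | socat unix-connect:<path> stdio`.
    Assumes the server closes the connection once the reply is complete.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)
        sock.sendall(command.encode() + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # peer closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")
```

With the config above, stats_command("/run/haproxy.stat", "help") would be the counterpart of the socat line. A connection error or timeout here corresponds to the "socat doesn't work anymore" symptom.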

Adding a second frontend to the haproxy.cfg:
frontend unixsocket_reload2
bind 127.0.0.1:1338
bind unix@/run/haproxy-sockettest-2.sock user haproxy group root mode 600
mode http
log global

systemctl reload haproxy

curl and socat don't work anymore, while the TCP socket still works.

Now restarting HAProxy with the initial config but with the adjusted stats socket:
stats socket unix@/run/haproxy.stat user haproxy gid haproxy mode 600 level admin expose-fd listeners


Note that the -x will be appended automatically (at least for systemd -Ws)


And doing the same again. curl and socat still work. The new frontend
does not, even though its UNIX socket got created.
I think the way that works is OK for me then. Thanks for pointing out
the expose-fd listeners!





Regards,

Olivier


--
Regards,
Christian Ruppert

Re: Weird issues with UNIX-Sockets on 2.1.x

2020-03-27 Thread Christian Ruppert

On 2020-03-27 16:49, Olivier Houchard wrote:

On Fri, Mar 27, 2020 at 04:32:21PM +0100, Christian Ruppert wrote:

On 2020-03-27 16:27, Olivier Houchard wrote:
> On Fri, Mar 27, 2020 at 04:21:20PM +0100, Christian Ruppert wrote:
>> During the reload I just found something in the daemon log:
>> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) :
>> Starting proxy someotherlistener: cannot bind socket [0.0.0.0:18540]
>> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) :
>> Starting proxy someotherlistener: cannot bind socket [:::18540]
>>
>> So during the reload, this happened and seems to have caused any
>> further
>> issues/trouble.
>>
>
> That would make sense. Does that mean you have old processes hanging
> around? Do you use seamless reload? If so, it shouldn't attempt to
> bind the socket, but get them from the old process.

I remember that it was necessary to have a systemd wrapper around, as it
caused trouble otherwise, due to PID being changed etc.
Not sure if that wrapper is still in use. In this case it's systemd
though.
[Unit]
Description=HAProxy Load Balancer
After=network.target

[Service]
Environment="CONFIG=/etc/haproxy/haproxy.cfg" "PIDFILE=/run/haproxy.pid"

ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q
ExecStart=/usr/sbin/haproxy -Ws -f $CONFIG -p $PIDFILE
ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q
ExecReload=/bin/kill -USR2 $MAINPID
KillMode=mixed
Restart=always
SuccessExitStatus=143
TimeoutStopSec=30
Type=notify

[...]

We've added the TimeoutStopSec=30 for some reason (I'd have to ask my
colleague, something took longer or something like that, since we have
quite a lot of frontends/listeners/backends).
Only the two processes I mentioned before are / were running. Seems like
the fallback didn't work properly?

The wrapper is no longer needed, it has been superseded by the
master-worker (which you seem to be using, given you're using -Ws).
It is possible the old process refuses to die, and you end up hitting the
timeout and it gets killed eventually, but it's too late.
Do you have "expose-fd listeners" on the unix stats socket? Using it
will allow the new process to connect to the old process' stats socket,
and get all the listening sockets, so that it won't have to bind them.

Oh, that sounds quite handy. I wasn't aware of it. I'll add it soonish. 
Thanks for the hint!

Regards,

Olivier

--
Regards,
Christian Ruppert

Re: Weird issues with UNIX-Sockets on 2.1.x

2020-03-27 Thread Olivier Houchard

On Fri, Mar 27, 2020 at 04:32:21PM +0100, Christian Ruppert wrote:
> On 2020-03-27 16:27, Olivier Houchard wrote:
> > On Fri, Mar 27, 2020 at 04:21:20PM +0100, Christian Ruppert wrote:
> >> During the reload I just found something in the daemon log:
> >> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) :
> >> Starting proxy someotherlistener: cannot bind socket [0.0.0.0:18540]
> >> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) :
> >> Starting proxy someotherlistener: cannot bind socket [:::18540]
> >> 
> >> So during the reload, this happened and seems to have caused any 
> >> further
> >> issues/trouble.
> >> 
> > 
> > That would make sense. Does that mean you have old processes hanging
> > around? Do you use seamless reload? If so, it shouldn't attempt to
> > bind the socket, but get them from the old process.
> 
> I remember that it was necessary to have a systemd wrapper around, as it 
> caused trouble otherwise, due to PID being changed etc.
> Not sure if that wrapper is still in use. In this case it's systemd 
> though.
> [Unit]
> Description=HAProxy Load Balancer
> After=network.target
> 
> [Service]
> Environment="CONFIG=/etc/haproxy/haproxy.cfg" "PIDFILE=/run/haproxy.pid"
> ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q
> ExecStart=/usr/sbin/haproxy -Ws -f $CONFIG -p $PIDFILE
> ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q
> ExecReload=/bin/kill -USR2 $MAINPID
> KillMode=mixed
> Restart=always
> SuccessExitStatus=143
> TimeoutStopSec=30
> Type=notify

[...]

> We've added the TimeoutStopSec=30 for some reason (I'd have to ask my
> colleague, something took longer or something like that, since we have
> quite a lot of frontends/listeners/backends).
> Only the two processes I mentioned before are / were running. Seems like 
> the fallback didn't work properly?
> 

The wrapper is no longer needed, it has been superseded by the
master-worker (which you seem to be using, given you're using -Ws).
It is possible the old process refuses to die, and you end up hitting the
timeout and it gets killed eventually, but it's too late.
Do you have "expose-fd listeners" on the unix stats socket? Using it
will allow the new process to connect to the old process' stats socket,
and get all the listening sockets, so that it won't have to bind them.
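
To make the idea of handing over listening sockets more concrete, here is a small Python sketch of descriptor passing over a UNIX socket (SCM_RIGHTS), which is the general kernel mechanism such a handover relies on. It is not HAProxy code: the socketpair merely stands in for the stats socket, both "processes" live in one function, and pass_listener_demo is a made-up name. It assumes Python 3.9 or newer for socket.send_fds and socket.recv_fds.

```python
import socket


def pass_listener_demo():
    """Pass a listening TCP socket between two endpoints via SCM_RIGHTS.

    Returns (port bound by the "old" side, port seen by the "new" side).
    """
    # "Old process": owns an already-bound, listening socket.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(8)
    old_port = listener.getsockname()[1]

    # A connected UNIX socket pair stands in for the stats socket.
    old_end, new_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

    # The old side sends the descriptor as ancillary data.
    socket.send_fds(old_end, [b"listener"], [listener.fileno()])

    # "New process": receives a duplicate descriptor for the same socket,
    # so it never has to call bind() itself.
    _msg, fds, _flags, _addr = socket.recv_fds(new_end, 1024, 1)
    inherited = socket.socket(fileno=fds[0])
    new_port = inherited.getsockname()[1]

    # The original descriptor can go away; the inherited one keeps listening.
    listener.close()
    client = socket.create_connection(("127.0.0.1", new_port), timeout=2)
    conn, _ = inherited.accept()

    for s in (client, conn, inherited, old_end, new_end):
        s.close()
    return old_port, new_port
```

Since the receiving side never calls bind(), a port that something else would otherwise contend for cannot produce a "cannot bind socket" error for sockets inherited this way.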

Regards,

Olivier

Re: Weird issues with UNIX-Sockets on 2.1.x

2020-03-27 Thread Christian Ruppert


On 2020-03-27 16:27, Olivier Houchard wrote:

On Fri, Mar 27, 2020 at 04:21:20PM +0100, Christian Ruppert wrote:

During the reload I just found something in the daemon log:
Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) :
Starting proxy someotherlistener: cannot bind socket [0.0.0.0:18540]
Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) :
Starting proxy someotherlistener: cannot bind socket [:::18540]

So during the reload, this happened and seems to have caused any further
issues/trouble.

That would make sense. Does that mean you have old processes hanging
around? Do you use seamless reload? If so, it shouldn't attempt to
bind the socket, but get them from the old process.


I remember that it was necessary to have a systemd wrapper around, as it 
caused trouble otherwise, due to PID being changed etc.
Not sure if that wrapper is still in use. In this case it's systemd 
though.

[Unit]
Description=HAProxy Load Balancer
After=network.target

[Service]
Environment="CONFIG=/etc/haproxy/haproxy.cfg" "PIDFILE=/run/haproxy.pid"
ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q
ExecStart=/usr/sbin/haproxy -Ws -f $CONFIG -p $PIDFILE
ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q
ExecReload=/bin/kill -USR2 $MAINPID
KillMode=mixed
Restart=always
SuccessExitStatus=143
TimeoutStopSec=30
Type=notify

# The following lines leverage SystemD's sandboxing options to provide
# defense in depth protection at the expense of restricting some flexibility
# in your setup (e.g. placement of your configuration files) or possibly
# reduced performance. See systemd.service(5) and systemd.exec(5) for further
# information.

# NoNewPrivileges=true
# ProtectHome=true
# If you want to use 'ProtectSystem=strict' you should whitelist the PIDFILE,
# any state files and any other files written using 'ReadWritePaths' or
# 'RuntimeDirectory'.
# ProtectSystem=true
# ProtectKernelTunables=true
# ProtectKernelModules=true
# ProtectControlGroups=true
# If your SystemD version supports them, you can add: @reboot, @swap, @sync
# SystemCallFilter=~@cpu-emulation @keyring @module @obsolete @raw-io

[Install]
WantedBy=multi-user.target


We've added the TimeoutStopSec=30 for some reason (I'd have to ask my
colleague, something took longer or something like that, since we have
quite a lot of frontends/listeners/backends).
Only the two processes I mentioned before are / were running. Seems like 
the fallback didn't work properly?




Regards,

Olivier


--
Regards,
Christian Ruppert

Re: Weird issues with UNIX-Sockets on 2.1.x

2020-03-27 Thread Olivier Houchard

On Fri, Mar 27, 2020 at 04:21:20PM +0100, Christian Ruppert wrote:
> During the reload I just found something in the daemon log:
> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) : 
> Starting proxy someotherlistener: cannot bind socket [0.0.0.0:18540]
> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) : 
> Starting proxy someotherlistener: cannot bind socket [:::18540]
> 
> So during the reload, this happened and seems to have caused any further 
> issues/trouble.
> 

That would make sense. Does that mean you have old processes hanging
around? Do you use seamless reload? If so, it shouldn't attempt to
bind the socket, but get them from the old process.

Regards,

Olivier

Re: Weird issues with UNIX-Sockets on 2.1.x

2020-03-27 Thread Christian Ruppert


During the reload I just found something in the daemon log:
Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) : 
Starting proxy someotherlistener: cannot bind socket [0.0.0.0:18540]
Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) : 
Starting proxy someotherlistener: cannot bind socket [:::18540]


So during the reload, this happened and seems to have caused any further 
issues/trouble.
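
To spot such failed reloads in the daemon log automatically, something like the following Python sketch could be used. The pattern is an assumption derived only from the two ALERT lines above (each taken as a single unwrapped log line), and the names BIND_ALERT and find_bind_failures are made up for this example.

```python
import re

# Pattern for the ALERT lines quoted above; the exact log layout is an
# assumption based on those two samples only.
BIND_ALERT = re.compile(
    r"\[ALERT\].*?Starting proxy (?P<proxy>\S+): "
    r"cannot bind socket \[(?P<addr>.+):(?P<port>\d+)\]"
)


def find_bind_failures(lines):
    """Return (proxy, address, port) tuples for every bind failure found."""
    failures = []
    for line in lines:
        match = BIND_ALERT.search(line)
        if match:
            failures.append(
                (match["proxy"], match["addr"], int(match["port"]))
            )
    return failures
```

Fed with the two log lines above, this should yield one entry for 0.0.0.0 and one for the IPv6 wildcard address, both for someotherlistener on port 18540.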


On 2020-03-27 15:10, Christian Ruppert wrote:

So now I looked for more of those "SC"'s in the log, from our
monitoring and it appeared first around 13:38:01.
Around 13:37:54 a reload was issued by puppet or rundeck.
So right now, it seems that something happened during the reload which
affected UNIX sockets.

On 2020-03-27 15:00, Christian Ruppert wrote:

Hi Olivier,

On 2020-03-27 14:50, Olivier Houchard wrote:

Hi Christian,

On Fri, Mar 27, 2020 at 02:37:41PM +0100, Christian Ruppert wrote:

Hi list,

we have some weird issues now, the second time, that *some* SSL sockets
seem to be broken as well as stats sockets.
HTTP seems to work fine, still, SSL ones are broken however. It happened
at least on 2.1.3 and *perhaps* on 2.1.2 as well. We're not sure whether
the first time was on 2.1.2 or 2.1.3.
The one that failed today was updated yesterday, so HAProxy has an
uptime of about 24h.
We're using threads. default + HTTP is using 1 thread, 1 is dedicated
for a TCP listener/Layer-4, one is for RSA only and all the rest is for
ECC.

[...]
The problem occurred around 13:40 (CET, in case it matters at some point)


Any ideas so far?



So basically, it used to work, and suddenly you get errors on any TLS
connection ?


Yeah, right now it looks like that way.

If you still have the TCP stat socket working, can you show the output
of "show fd"?


Oh, it's the http stats listener that's still working. Not sure
whether it accepts any commands to be honest.
pid = 21313 (process #1, nbproc = 1, nbthread = 8)
uptime = 0d 1h56m48s
system limits: memmax = unlimited; ulimit-n = 1574819
maxsock = 1574819; maxconn = 786432; maxpipes = 0
current conns = 6; current pipes = 0/0; conn rate = 43/sec; bit rate = 219.704 kbps
Running tasks: 1/1158; idle = 100 %



Thanks !

Olivier


--
Regards,
Christian Ruppert

Re: Weird issues with UNIX-Sockets on 2.1.x

2020-03-27 Thread Christian Ruppert

So now I looked for more of those "SC"'s in the log, from our monitoring 
and it appeared first around 13:38:01.

Around 13:37:54 a reload was issued by puppet or rundeck.
So right now, it seems that something happened during the reload which 
affected UNIX sockets.


On 2020-03-27 15:00, Christian Ruppert wrote:

Hi Olivier,

On 2020-03-27 14:50, Olivier Houchard wrote:

Hi Christian,

On Fri, Mar 27, 2020 at 02:37:41PM +0100, Christian Ruppert wrote:

Hi list,

we have some weird issues now, the second time, that *some* SSL sockets
seem to be broken as well as stats sockets.
HTTP seems to work fine, still, SSL ones are broken however. It happened
at least on 2.1.3 and *perhaps* on 2.1.2 as well. We're not sure whether
the first time was on 2.1.2 or 2.1.3.
The one that failed today was updated yesterday, so HAProxy has an
uptime of about 24h.
We're using threads. default + HTTP is using 1 thread, 1 is dedicated
for a TCP listener/Layer-4, one is for RSA only and all the rest is for
ECC.

[...]
The problem occurred around 13:40 (CET, in case it matters at some point)


Any ideas so far?



So basically, it used to work, and suddenly you get errors on any TLS
connection ?


Yeah, right now it looks like that way.


If you still have the TCP stat socket working, can you show the output
of "show fd" ?


Oh, it's the http stats listener that's still working. Not sure
whether it accepts any commands to be honest.
pid = 21313 (process #1, nbproc = 1, nbthread = 8)
uptime = 0d 1h56m48s
system limits: memmax = unlimited; ulimit-n = 1574819
maxsock = 1574819; maxconn = 786432; maxpipes = 0
current conns = 6; current pipes = 0/0; conn rate = 43/sec; bit rate = 219.704 kbps
Running tasks: 1/1158; idle = 100 %



Thanks !

Olivier


--
Regards,
Christian Ruppert

Re: Weird issues with UNIX-Sockets on 2.1.x

2020-03-27 Thread Christian Ruppert


Hi Olivier,

On 2020-03-27 14:50, Olivier Houchard wrote:

Hi Christian,

On Fri, Mar 27, 2020 at 02:37:41PM +0100, Christian Ruppert wrote:

Hi list,

we have some weird issues now, the second time, that *some* SSL sockets
seem to be broken as well as stats sockets.
HTTP seems to work fine, still, SSL ones are broken however. It happened
at least on 2.1.3 and *perhaps* on 2.1.2 as well. We're not sure whether
the first time was on 2.1.2 or 2.1.3.
The one that failed today was updated yesterday, so HAProxy has an
uptime of about 24h.
We're using threads. default + HTTP is using 1 thread, 1 is dedicated
for a TCP listener/Layer-4, one is for RSA only and all the rest is for
ECC.

[...]
The problem occurred around 13:40 (CET, in case it matters at some point)


Any ideas so far?



So basically, it used to work, and suddenly you get errors on any TLS
connection ?


Yeah, right now it looks like that way.


If you still have the TCP stat socket working, can you show the output
of "show fd" ?


Oh, it's the http stats listener that's still working. Not sure whether 
it accepts any commands to be honest.

pid = 21313 (process #1, nbproc = 1, nbthread = 8)
uptime = 0d 1h56m48s
system limits: memmax = unlimited; ulimit-n = 1574819
maxsock = 1574819; maxconn = 786432; maxpipes = 0
current conns = 6; current pipes = 0/0; conn rate = 43/sec; bit rate = 219.704 kbps
Running tasks: 1/1158; idle = 100 %



Thanks !

Olivier


--
Regards,
Christian Ruppert

Re: Weird issues with UNIX-Sockets on 2.1.x

2020-03-27 Thread Olivier Houchard

Hi Christian,

On Fri, Mar 27, 2020 at 02:37:41PM +0100, Christian Ruppert wrote:
> Hi list,
> 
> we have some weird issues now, the second time, that *some* SSL sockets 
> seem to be broken as well as stats sockets.
> HTTP seems to work fine, still, SSL ones are broken however. It happened 
> at least on 2.1.3 and *perhaps* on 2.1.2 as well. We're not sure whether 
> the first time was on 2.1.2 or 2.1.3.
> The one that failed today was updated yesterday, so HAProxy has an 
> uptime of about 24h.
> We're using threads. default + HTTP is using 1 thread, 1 is dedicated 
> for a TCP listener/Layer-4, one is for RSA only and all the rest is for 
> ECC.
[...]
> The problem occurred around 13:40 (CET, in case it matters at some point)
> 
> Any ideas so far?
> 

So basically, it used to work, and suddenly you get errors on any TLS
connection ?
If you still have the TCP stat socket working, can you show the output
of "show fd" ?

Thanks !

Olivier
