Re: HAProxy reloads lets old and outdated processes

2016-10-25 Thread Willy Tarreau
Hi Pierre,

> Apart from that, we exchanged off-list with Willy about the submitted patch.
> It seems that it fixes the issue. I now have only one instance bound to the
> TCP sockets after the reloads, the others are there just to terminate the
> existing connections.

And thank you for the quick tests in the live environment, that was very
helpful. Here's the patch series I have added to address the issue (there
were other abominations in this wrapper that had to be dealt with) :

  7643d09 BUG/MINOR: systemd: make the wrapper return a non-null status code on error
  4351ea6 BUG/MINOR: systemd: always restore signals before execve()
  3747ea0 BUG/MINOR: systemd: check return value of calloc()
  a785269 MINOR: systemd: report it when execve() fails
  b957109 BUG/MEDIUM: systemd: let the wrapper know that haproxy has completed or failed

I intend to backport this soon into 1.6 and even 1.5. Normally the
wrapper is expected to be exactly the same, so the patches should
apply (unless we missed some fixes of course). Those facing the issue
on these versions are welcome to test; if more issues remain we'll have
to address them anyway.

cheers,
Willy



RE: HAProxy reloads lets old and outdated processes

2016-10-25 Thread Pierre Cheynier
Hi,


I hadn't subscribed to the list and noticed that there were several exchanges on
this thread that I hadn't read so far.


To share a bit more of our context:


* we do not reload every 2 ms; this was the setting used to reproduce the issue
easily and in a short period of time. Our reload average is more around one every
5 to 10 s, which seems consistent to me on relatively big setups (I'm talking about
a hundred physical nodes per DC, running up to a thousand app instances).


* true, it's something that becomes very common as IaaS/PaaS-style
architectures are adopted. On our side we work with Apache Mesos and schedulers
that add/remove backends whenever the end user scales their application, or when
nodes/apps fail, are under maintenance, etc.


By the way, I noticed that a lot of these "trending" projects are using HAProxy
as their external load balancing stack (and most of them are also usually run
on systemd-based distros), so it seems to me that this will fix some setups
(which apparently rely on the Yelp approach to 'safely restart' their haproxy - but
that induces latencies).


Apart from that, we exchanged off-list with Willy about the submitted patch. It
seems that it fixes the issue. I now have only one instance bound to the TCP
sockets after the reloads; the others are there just to terminate the existing
connections.


Pierre


Re: HAProxy reloads lets old and outdated processes

2016-10-25 Thread Willy Tarreau
Hi Holger,

On Tue, Oct 25, 2016 at 12:38:26PM +0200, Holger Just wrote:
> Hey Willy,
> 
> Willy Tarreau wrote:
> > I absolutely despise systemd and each time I have to work on the
> > wrapper I feel like I'm going to throw up. So for me working on this
> > crap is a huge pain each time. But I'm really fed up with seeing
> > people having problems in this crazy environment because one
> > clueless guy decided that he knew better than all others how a daemon
> should reload, so whatever we can do to make our users' lives easier
> > in captivity should at least be considered.
> 
> Just to be sure, I don't like systemd for mostly the reasons you
> mentioned. However, I do use the systemd wrapper to reliably run HAProxy
> under runit for a couple of years now.

This feedback is actually useful because if we later decide to improve
the design, we must absolutely take such other use cases into consideration.

> Since they (similar to most service managers) also expect a service to
> have one stable parent process even after reloading,

In my opinion this is a flawed expectation. I really don't understand how
a *service* manager confuses a *service* and a *process*. There is not a
1-to-1 relation between the two at all. A service is something that is
delivered. We don't care how it is delivered. Some kernel daemons do not
even involve any process at all. Further, certain settings like IP
forwarding are services in my opinion yet they just end up being a sysctl.

> the systemd wrapper
> acts as a nice workaround to facilitate reloading.

I've been using it from time to time to get multi-process *and* stdout/stderr
available for debugging.

> The same wrapper
> allows simple service handling with Solaris's SMF and is a much better
> solution than the crude python script I wrote a couple of years ago for
> this simple process.

Good point.

> I guess what I've been trying to say is: the wrapper is absolutely
> useful for about any process manager, not just systemd and I would love
> to see it stay compatible with other process managers like runit.

Do you see any specific value that I'm overlooking in using the wrapper
as it is now compared to having it natively integrated into haproxy ?
The only value I'm seeing is a possibly lower memory consumption for very
large configurations after the copy-on-write happens in the children. I
think it's a small benefit to be honest.

Thanks for your feedback,
Willy



Re: HAProxy reloads lets old and outdated processes

2016-10-25 Thread Holger Just
Hey Willy,

Willy Tarreau wrote:
> I absolutely despise systemd and each time I have to work on the
> wrapper I feel like I'm going to throw up. So for me working on this
> crap is a huge pain each time. But I'm really fed up with seeing
> people having problems in this crazy environment because one
> clueless guy decided that he knew better than all others how a daemon
> should reload, so whatever we can do to make our users' lives easier
> in captivity should at least be considered.

Just to be sure, I don't like systemd, mostly for the reasons you
mentioned. However, I have been using the systemd wrapper to reliably run
HAProxy under runit for a couple of years now.

Since they (similar to most service managers) also expect a service to
have one stable parent process even after reloading, the systemd wrapper
acts as a nice workaround to facilitate reloading. The same wrapper
allows simple service handling with Solaris's SMF and is a much better
solution than the crude python script I wrote a couple of years ago for
this simple process.

I guess what I've been trying to say is: the wrapper is absolutely
useful for about any process manager, not just systemd and I would love
to see it stay compatible with other process managers like runit.

Thanks for the great work Willy, here and on the Kernel.

Regards, Holger



Re: HAProxy reloads lets old and outdated processes

2016-10-25 Thread Willy Tarreau
Hi Jarno,

On Tue, Oct 25, 2016 at 11:43:44AM +0300, Jarno Huuskonen wrote:
> This is probably a bit off topic, but there's sd_notify call
> (and [Service] Type=notify)
> where service can notify systemd that it's done starting/reloading
> configuration:
> https://www.freedesktop.org/software/systemd/man/sd_notify.html

Thank you, that can be useful for future improvements on the wrapper.
And no, it's not off-topic, quite the opposite.

> I don't know if systemd-wrapper would call:
> sd_notify(0, "RELOADING=1");
> ... restart
> sd_notify(0, "READY=1");
> if this would prevent systemd from trying to do multiple reloads before
> haproxy has finished starting.

Yes, however for now the problem is that the wrapper doesn't even know
whether or not haproxy has finished restarting since it stays attached.

I'm starting to think that the wrapper was a good idea to address short
term incompatibilities, but over the long term we may have to think again
about a master-worker architecture that would address this. And this,
combined with Simon's previous work on the socket server, could possibly
also help address the RST on close issue.

Thanks,
willy



Re: HAProxy reloads lets old and outdated processes

2016-10-25 Thread Jarno Huuskonen
Hi,

On Sat, Oct 22, Willy Tarreau wrote:
> Another important point, when you say you restart every 2ms, are you
> certain you have a way to ensure that everything is completely started
> before you issue your signal to kill the old process ? I'm asking because
> thanks to the principle that the wrapper must stay in foreground (smart
> design choice from systemd), there's no way for a service manager to
> know whether all processes are fully started or not. With a normal init,
> when the process returns, all sub-processes have been created.

This is probably a bit off topic, but there's sd_notify call
(and [Service] Type=notify)
where service can notify systemd that it's done starting/reloading
configuration:
https://www.freedesktop.org/software/systemd/man/sd_notify.html

I don't know whether the systemd-wrapper could call:
sd_notify(0, "RELOADING=1");
... restart ...
sd_notify(0, "READY=1");
and whether this would prevent systemd from trying to do multiple reloads before
haproxy has finished starting.
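
For illustration only, here is a minimal sketch of what such calls could look
like in a wrapper linked against libsystemd (this is an assumption of mine:
nothing like it exists in the current haproxy-systemd-wrapper, and the unit
would need Type=notify for systemd to pay attention to it):

#include <systemd/sd-daemon.h>

/* Hypothetical sketch: tell systemd about reload progress. sd_notify() is
 * a no-op and returns 0 when not started by systemd (no NOTIFY_SOCKET). */
static void notify_reload(void)
{
    sd_notify(0, "RELOADING=1");    /* a reload has started */

    /* ... re-execute haproxy with -sf <old pids> here and wait until the
     * new processes are known to be up ... */

    sd_notify(0, "READY=1");        /* reload done, new reloads may be accepted */
}

int main(void)
{
    notify_reload();                /* harmless when run outside of systemd */
    return 0;
}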

-Jarno

-- 
Jarno Huuskonen



Re: HAProxy reloads lets old and outdated processes

2016-10-25 Thread Pavlos Parissis
On 25/10/2016 01:21 πμ, Willy Tarreau wrote:
> Hi guys,
> 
> On Tue, Oct 25, 2016 at 12:42:26AM +0200, Lukas Tribus wrote:
>> Not fixing *real world issues* because we don't agree with the use-case or
>> there is a design misconception somewhere else is dangerous. We don't have
>> to support every single obscure use-case out there, that's not what I am
>> saying, but systemd is a reality; as is docker and periodic reloads.
> (...)
> 
> Thank you both for your insights. There are indeed valid points in both
> of your arguments. I too am afraid of breaking things for people who do
> really abuse, but at the same time we cannot blame the users who get
> caught by systemd lying to them. I really don't care about people who
> would reload every 2 ms to be honest, but I'm concerned about people
> shooting themselves in the foot because they use very large configs or
> because as you say Lukas, they accidentally run the command twice. This
> is something we used to deal with in the past, it's hard to lose this
> robustness. I've seen configs taking minutes to start (300k backends),
> reduced to a few seconds after the backends were moved to a binary
> tree. But these seconds remain a period of uncertainty and that's not
> nice for the user.
> 
> I think the patch I sent this evening covers both of your concerns. It's
> quite simple, relies on a process *closing* a file descriptor, which also
> covers a dying/crashing process (because I never trust any design consisting
> in saying "I promise I will tell you when I die"). And it doesn't
> significantly change things. I'm really interested in feedback on it.
> Pavlos, please honestly tell me if it really scares you and I'd like
> to address this (even if that means doing it differently). Let's consider
> that I want something backportable into HAPEE, you know that users there
> can be very demanding regarding reliability :-)
> 

Well, I have full confidence in the quality of your code (assuming you will
polish the patch to handle errors as you mentioned :-) ) and I am willing to
test it in our environment when it arrives in HAPEE. But we will never hit the
conditions that trigger this behavior, as our configuration tool for haproxy
doesn't allow reloading very often; we allow 1 reload per minute (this is
configurable of course). We did that also in order to address the case of too
many live processes on a cluster of haproxies which has a lot of long-lived TCP
connections [1].


> I'm really open to suggestions. I absolutely despise systemd and each time
> I have to work on the wrapper I feel like I'm going to throw up. So for me
> working on this crap is a huge pain each time. But I'm really fed up with
> seeing people having problems in this crazy environment because one clueless
> guy decided that he knew better than all others how a daemon should reload,
> so whatever we can do to make our users' lives easier in captivity should
> at least be considered.
> 

Have you considered reporting this to systemd? Maybe they have a solution, I
don't know.

To sum up, go ahead with the patch as it addresses real problems for users, and
you can count on me to test HAPEE in our environment.

Cheers,
Pavlos

[1] I have mentioned before that we balance rsyslog traffic from 20K clients,
and every time we reload haproxy we see old processes staying alive for days.
This is because the frontend/backend runs in TCP mode and the rsyslog daemon on
the clients doesn't terminate the connection until it is restarted; it opens one
long-lived TCP connection against the frontend. haproxy can't close the connection
on shutdown as it does in HTTP mode, since it doesn't understand the protocol and
tries to play nice and graceful with the clients: "I will wait for you to close
the connection."





Re: HAProxy reloads lets old and outdated processes

2016-10-25 Thread Pavlos Parissis
Good morning,

Got my coffee ready before I read and reply :-)

On 25/10/2016 12:42 πμ, Lukas Tribus wrote:
> Hello,
> 
> 
> On 24.10.2016 at 22:32, Pavlos Parissis wrote:
>> 
>> IMHO: Ask the users to not perform reloads every 2 milliseconds. It is
>> insane. You may spend X hours on this which will make the code a bit more
>> complex and cause possible breakages somewhere else.
> 
> Not fixing *real world issues* because we don't agree with the use-case or
> there is a design misconception somewhere else is dangerous. We don't have to
> support every single obscure use-case out there, that's not what I am saying,
> but systemd is a reality; as is docker and periodic reloads.
> 
> You are talking about 2 milliseconds, but that is just the testcase here,
> think about how long haproxy would need to start when it has to load
> thousands of certificates. Probably more than a few seconds (I don't have any
> clue), and it would be pretty easy to create a mess of processes, not because
> of docker/cloud orchestration/whatever, but in SSH by hitting reload two
> times in a row manually.
> 
> I don't want to be scared of hitting reload two times even if I'm on a
> systemd based box with heavy SSL traffic. In fact, I *do* wanna be able to
> reload haproxy every 2 ms, not because I need it, but because the alternative
> would mean I need to remember to be "always careful" about that "strange
> issue with systemd which is not our fault" and make sure my colleague is not
> doing the same thing I'm doing and we reload simultaneously. I don't want to
> run my infrastructure like a house of cards.
> 
> This is not limited to fancy new cloud orchestration technologies and it is
> not a minor issue either.
> 
> 

All valid points. The bottom line is being able to trust that the reload process
won't cause unexpected behavior, regardless of the frequency of reloads and the
wall-clock duration of a single reload.


> 
>> I am pretty sure 90% of the cases which require so often reload are the
>> ones which try to integrate HAProxy with docker stuff, where servers in the
>> pools are treated as ephemeral nodes, appear and disappear very often and
>> at high volume.
> 
> Not sure if I understand you here correctly, but this sounds like you are 
> implying that we shouldn't spend time fixing issues related to docker (and 
> similar technologies). I have to disagree.
> 

On the contrary, I have asked for an ETA on the DNS SRV functionality, which allows
extending and shrinking the backend without a reload, and I have also requested
the ability to add/remove servers via the socket. All this because I need to
support docker in my environments :-)

The high frequency of reloads in docker environments is the result of missing the
above two functionalities.

> 
> We may not like systemd and we may not like docker. But that doesn't mean
> it's not worth looking into those issues.
> 

On the contrary, I *do* love systemd. I am not joking here.

Cheers,
Pavlos





Re: HAProxy reloads lets old and outdated processes

2016-10-24 Thread Willy Tarreau
Hi guys,

On Tue, Oct 25, 2016 at 12:42:26AM +0200, Lukas Tribus wrote:
> Not fixing *real world issues* because we don't agree with the use-case or
> there is a design misconception somewhere else is dangerous. We don't have
> to support every single obscure use-case out there, that's not what I am
> saying, but systemd is a reality; as is docker and periodic reloads.
(...)

Thank you both for your insights. There are indeed valid points in both
of your arguments. I too am afraid of breaking things for people who do
really abuse, but at the same time we cannot blame the users who get
caught by systemd lying to them. I really don't care about people who
would reload every 2 ms to be honest, but I'm concerned about people
shooting themselves in the foot because they use very large configs or
because as you say Lukas, they accidentally run the command twice. This
is something we used to deal with in the past, it's hard to lose this
robustness. I've seen configs taking minutes to start (300k backends),
reduced to a few seconds after the backends were moved to a binary
tree. But these seconds remain a period of uncertainty and that's not
nice for the user.

I think the patch I sent this evening covers both of your concerns. It's
quite simple, relies on a process *closing* a file descriptor, which also
covers a dying/crashing process (because I never trust any design consisting
in saying "I promise I will tell you when I die"). And it doesn't
significantly change things. I'm really interested in feedback on it.
Pavlos, please honestly tell me if it really scares you and I'd like
to address this (even if that means doing it differently). Let's consider
that I want something backportable into HAPEE, you know that users there
can be very demanding regarding reliability :-)

I'm really open to suggestions. I absolutely despise systemd and each time
I have to work on the wrapper I feel like I'm going to throw up. So for me
working on this crap is a huge pain each time. But I'm really fed up with
seeing people having problems in this crazy environment because one clueless
guy decided that he knew better than all others how a daemon should reload,
so whatever we can do to make our users' lives easier in captivity should
at least be considered.

Cheers,
Willy



Re: HAProxy reloads lets old and outdated processes

2016-10-24 Thread Lukas Tribus

Hello,


On 24.10.2016 at 22:32, Pavlos Parissis wrote:


IMHO: Ask the users to not perform reloads every 2 milliseconds. It is insane.
You may spend X hours on this which will make the code a bit more complex and
cause possible breakages somewhere else.


Not fixing *real world issues* because we don't agree with the use-case 
or there is a design misconception somewhere else is dangerous. We don't 
have to support every single obscure use-case out there, that's not what 
I am saying, but systemd is a reality; as is docker and periodic reloads.


You are talking about 2 milliseconds, but that is just the testcase 
here, think about how long haproxy would need to start when it has to 
load thousands of certificates. Probably more than a few seconds (I 
don't have any clue), and it would be pretty easy to create a mess of 
processes, not because of docker/cloud orchestration/whatever, but in 
SSH by hitting reload two times in a row manually.


I don't want to be scared of hitting reload two times even if I'm on a 
systemd based box with heavy SSL traffic. In fact, I *do* wanna be able 
to reload haproxy every 2 ms, not because I need it, but because the 
alternative would mean I need to remember to be "always careful" about 
that "strange issue with systemd which is not our fault" and make sure 
my colleague is not doing the same thing I'm doing and we reload 
simultaneously. I don't want to run my infrastructure like a house of cards.


This is not limited to fancy new cloud orchestration technologies and it 
is not a minor issue either.





I am pretty sure 90% of the cases which require such frequent reloads are the ones
which try to integrate HAProxy with docker stuff, where servers in the pools are
treated as ephemeral nodes, appear and disappear very often and at high volume.


Not sure if I understand you here correctly, but this sounds like you 
are implying that we shouldn't spend time fixing issues related to 
docker (and similar technologies). I have to disagree.



We may not like systemd and we may not like docker. But that doesn't 
mean it's not worth looking into those issues.





So now I'm wondering what to do with all this mess. Declaring systemd
misdesigned and born with some serious trauma will not help us progress
on this, so we need to work around this pile of crap which tries to prevent
us from dealing with a simple service.


This.


Just my two cents,
Lukas




Re: HAProxy reloads lets old and outdated processes

2016-10-24 Thread Pavlos Parissis
On 24/10/2016 09:13 μμ, Willy Tarreau wrote:
> Hi again,
> 
> On Mon, Oct 24, 2016 at 07:41:06PM +0200, Willy Tarreau wrote:
>> I don't know if this is something you're interested in experimenting
>> with. This is achieved using fcntl(F_SETLKW). It should be done in the
>> wrapper as well.
> 
> Finally I did it and it doesn't help at all. The signal-based asynchronous
> reload is fundamentally flawed. It's amazing to see how systemd managed to
> break something simple and robust for the sake of reliability, by introducing
> asynchronous signal delivery...
> 
> The problem is not even with overlapping writes (well, it very likely
> happens) but it is related to the fact that you never know whom you're
> sending your signals at all and that the children may not even be started
> yet, or may not have had the time to process the whole config file, etc.
> 
> So now I'm wondering what to do with all this mess. Declaring systemd
> misdesigned and born with some serious trauma will not help us progress
> on this, so we need to work around this pile of crap which tries to prevent
> us from dealing with a simple service.
> 
> Either we find a way to completely redesign the wrapper, even possibly the
> relation between the wrapper and the sub-processes, or we'll simply have
> to get rid of the reload action under systemd and reroute it to a restart.
> 
> I've thought about something which could possibly work though I'm far from
> being sure for now.
> 
> Let's say that the wrapper tries to take an exclusive lock on the pidfile
> upon receipt of SIGUSR2. It then keeps the file open and passes this FD to
> all the haproxy sub-processes. Ideally the FD num is passed as an argument
> to the child.
> 
> Once it fork()+exec(), it can simply close its fd. The exclusive lock is still
> maintained by the children so it's not lost. The benefit is that at this
> point, until the sub-processes have closed the pid file, there's no way for
> the wrapper to pick the same lock again. Thus it can *know* the processes
> have not finished booting. This will cause further SIGUSR2 processing to
> wait for the children processes to either start or die. Sort of a way to
> "pass" the lock to the sub-processes.
> 
> Here we don't even care if signals are sent in storm because only one of
> them will be used and will have to wait for the previous one to be dealt
> with.
> 
> The model is not perfect and ideally a lock file would be better than using
> the pidfile since the pidfile currently is opened late in haproxy and requires
> an unlinking in case of successful startup. But I suspect that using extra
> files will just make things worse. And I don't know if it's possible to flock
> something else (eg: a pipe).
> 
> BTW, that just makes me realize that we also have another possibility for this
> precisely using a pipe (which are more portable than mandatory locks). Let's
> see if that would work. The wrapper creates a pipe then forks. The child
> closes the read side, the parent the write side. Then the parent performs a
> read() on this fd and waits until it returns zero. The child execve() and
> calls the haproxy sub-processes. The FD is closed after the pidfile is updated
> (and in children). After the last close, the wrapper receives a zero on this
> pipe. If haproxy dies, the pipe is closed as well. We could even (ab)use it
> to let the wrapper know whether the process properly started or not, or pass
> the pids there (though that just needlessly complicates operations).
> 
> Any opinion on this ?
> 
> Willy
> 

IMHO: Ask the users to not perform reloads every 2 milliseconds. It is insane.
You may spend X hours on this which will make the code a bit more complex and
cause possible breakages somewhere else.

I am pretty sure 90% of the cases which require such frequent reloads are the ones
which try to integrate HAProxy with docker stuff, where servers in the pools are
treated as ephemeral nodes, appear and disappear very often and at high volume.

@Pierre, what is your use-case for so many reloads?


My 0.02cents,
Pavlos






Testers needed [Re: HAProxy reloads lets old and outdated processes]

2016-10-24 Thread Willy Tarreau
On Mon, Oct 24, 2016 at 09:13:13PM +0200, Willy Tarreau wrote:
> BTW, that just makes me realize that we also have another possibility for this
> precisely using a pipe (which are more portable than mandatory locks). Let's
> see if that would work. The wrapper creates a pipe then forks. The child
> closes the read side, the parent the write side. Then the parent performs a
> read() on this fd and waits until it returns zero. The child execve() and
> calls the haproxy sub-processes. The FD is closed after the pidfile is updated
> (and in children). After the last close, the wrapper receives a zero on this
> pipe. If haproxy dies, the pipe is closed as well. We could even (ab)use it
> to let the wrapper know whether the process properly started or not, or pass
> the pids there (though that just needlessly complicates operations).

So before taking the road back home I decided to give it a try. Here comes the
patch. "It works for me"(TM). Instead of seeing my kill loop leave over 500
processes while spinning like mad, now I constantly have 10 for nbproc=10,
which is good. I also fixed a small bug by which the wrapper could get killed
by its own signal during the reexec where signals are unmasked.

The patch is ugly for now, it's just a proof of concept, it lacks any error
checking. I cannot make it fail anymore, so please try it, hammer it, torture
it. If it works fine I'm even willing to backport it to 1.6 given that it's
not much intrusive and will fix the problems for all victims of systemd.

Regards,
Willy
From 50970eb815dabfab9443aa3e80ce710d8af99feb Mon Sep 17 00:00:00 2001
From: Willy Tarreau 
Date: Mon, 24 Oct 2016 22:17:17 +0200
Subject: EXP: maintain a pipe between the wrapper and its kids to notify when
 it's safe to reload

Using this trick, the wrapper refrains from performing new reloads until the
currently starting processes either succeed or die.

This command used to leave >500 kids for a nbproc=10 config involving SSL
(slower to start), now it leaves exactly 10 :

 $ for i in {1..100}; do killall -USR2 haproxy-systemd-wrapper;done

It also fixes a bug where the newly re-executed wrapper could catch the
signal before intercepting it and die. Now we ignore it just before doing
the execve.
---
 src/haproxy-systemd-wrapper.c | 29 +
 src/haproxy.c | 13 +
 2 files changed, 42 insertions(+)

diff --git a/src/haproxy-systemd-wrapper.c b/src/haproxy-systemd-wrapper.c
index d118ec6..b2065d9 100644
--- a/src/haproxy-systemd-wrapper.c
+++ b/src/haproxy-systemd-wrapper.c
@@ -66,16 +66,28 @@ static void spawn_haproxy(char **pid_strv, int nb_pid)
pid_t pid;
int main_argc;
char **main_argv;
+   int pipefd[2];
+   char fdstr[20];
+   int ret;
 
main_argc = wrapper_argc - 1;
main_argv = wrapper_argv + 1;
 
+   if (pipe(pipefd) != 0)
+   exit(1);
+
pid = fork();
if (!pid) {
/* 3 for "haproxy -Ds -sf" */
char **argv = calloc(4 + main_argc + nb_pid + 1, sizeof(char *));
int i;
int argno = 0;
+
+   close(pipefd[0]); /* close the read side */
+
+   snprintf(fdstr, sizeof(fdstr), "%d", pipefd[1]);
+   setenv("HAPROXY_WRAPPER_FD", fdstr, 1);
+
locate_haproxy(haproxy_bin, 512);
argv[argno++] = haproxy_bin;
for (i = 0; i < main_argc; ++i)
@@ -96,6 +108,19 @@ static void spawn_haproxy(char **pid_strv, int nb_pid)
execv(argv[0], argv);
exit(0);
}
+
+   /* The parent closes the write side and waits for the child to close it
+* as well. Also deal the case where the fd would unexpectedly be 1 or 2
+* by silently draining all data.
+*/
+   close(pipefd[1]);
+
+   do {
+   char c;
ret = read(pipefd[0], &c, sizeof(c));
+   } while ((ret > 0) || (ret == -1 && errno == EINTR));
+   /* the child has finished starting up */
+   close(pipefd[0]);
 }
 
 static int read_pids(char ***pid_strv)
@@ -134,6 +159,10 @@ static void do_restart(int sig)
fprintf(stderr, SD_NOTICE "haproxy-systemd-wrapper: re-executing on %s.\n",
sig == SIGUSR2 ? "SIGUSR2" : "SIGHUP");
 
+   /* don't let the other process take one of those signals by accident */
+   signal(SIGUSR2, SIG_IGN);
+   signal(SIGHUP, SIG_IGN);
+   signal(SIGINT, SIG_IGN);
execv(wrapper_argv[0], wrapper_argv);
 }
 
diff --git a/src/haproxy.c b/src/haproxy.c
index b1c10b6..522fad0 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -1998,6 +1998,19 @@ int main(int argc, char **argv)
close(pidfd);
}
 
+   /* each child must notify the wrapper that it's ready by closing the requested fd */
+   {
+

Re: HAProxy reloads lets old and outdated processes

2016-10-24 Thread Willy Tarreau
Hi again,

On Mon, Oct 24, 2016 at 07:41:06PM +0200, Willy Tarreau wrote:
> I don't know if this is something you're interested in experimenting
> with. This is achieved using fcntl(F_SETLKW). It should be done in the
> wrapper as well.

Finally I did it and it doesn't help at all. The signal-based asynchronous
reload is fundamentally flawed. It's amazing to see how systemd managed to
break something simple and robust for the sake of reliability, by introducing
asynchronous signal delivery...

The problem is not even with overlapping writes (well, it very likely
happens) but it is related to the fact that you never know whom you're
sending your signals at all and that the children may not even be started
yet, or may not have had the time to process the whole config file, etc.

So now I'm wondering what to do with all this mess. Declaring systemd
misdesigned and born with some serious trauma will not help us progress
on this, so we need to work around this pile of crap which tries to prevent
us from dealing with a simple service.

Either we find a way to completely redesign the wrapper, even possibly the
relation between the wrapper and the sub-processes, or we'll simply have
to get rid of the reload action under systemd and reroute it to a restart.

I've thought about something which could possibly work though I'm far from
being sure for now.

Let's say that the wrapper tries to take an exclusive lock on the pidfile
upon receipt of SIGUSR2. It then keeps the file open and passes this FD to
all the haproxy sub-processes. Ideally the FD num is passed as an argument
to the child.

Once it fork()+exec(), it can simply close its fd. The exclusive lock is still
maintained by the children so it's not lost. The benefit is that at this
point, until the sub-processes have closed the pid file, there's no way for
the wrapper to pick the same lock again. Thus it can *know* the processes
have not finished booting. This will cause further SIGUSR2 processing to
wait for the children processes to either start or die. Sort of a way to
"pass" the lock to the sub-processes.

Here we don't even care if signals are sent in storm because only one of
them will be used and will have to wait for the previous one to be dealt
with.

The model is not perfect and ideally a lock file would be better than using
the pidfile since the pidfile currently is opened late in haproxy and requires
an unlinking in case of successful startup. But I suspect that using extra
files will just make things worse. And I don't know if it's possible to flock
something else (eg: a pipe).

BTW, that just makes me realize that we also have another possibility for this
precisely using a pipe (which are more portable than mandatory locks). Let's
see if that would work. The wrapper creates a pipe then forks. The child
closes the read side, the parent the write side. Then the parent performs a
read() on this fd and waits until it returns zero. The child execve() and
calls the haproxy sub-processes. The FD is closed after the pidfile is updated
(and in children). After the last close, the wrapper receives a zero on this
pipe. If haproxy dies, the pipe is closed as well. We could even (ab)use it
to let the wrapper know whether the process properly started or not, or pass
the pids there (though that just needlessly complicates operations).
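
For readers who want to experiment with this idea, here is a small standalone
sketch of that pipe mechanism (an illustration only, not the actual wrapper
patch): the parent's read() only returns 0 once every process holding the
write side has closed it or died.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    int pipefd[2];
    pid_t pid;

    if (pipe(pipefd) != 0)
        exit(1);

    pid = fork();
    if (pid < 0)
        exit(1);
    if (pid == 0) {
        /* child: simulates the haproxy side. The write side survives an
         * execve() and is closed once startup is finished (or on death). */
        close(pipefd[0]);
        sleep(2);               /* pretend we're parsing a huge config */
        close(pipefd[1]);       /* "finished starting" */
        _exit(0);
    }

    /* parent (wrapper side): close the write side, then block until no
     * writer is left on the pipe. */
    close(pipefd[1]);
    while (1) {
        char c;
        ssize_t ret = read(pipefd[0], &c, sizeof(c));

        if (ret == 0)
            break;              /* all writers gone: safe to handle the next reload */
        if (ret < 0 && errno != EINTR)
            break;
    }
    close(pipefd[0]);
    printf("children finished starting (or died), reload can proceed\n");
    return 0;
}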

Any opinion on this ?

Willy



Re: HAProxy reloads lets old and outdated processes

2016-10-24 Thread Willy Tarreau
Hi Pierre,

On Mon, Oct 24, 2016 at 12:46:34PM +, Pierre Cheynier wrote:
> Unfortunately, I remember we had the same issue (but less frequently) on
> CentOS6 which is init-based.

OK but be careful, we used to have other issues with signals in the
past, it's not necessarily exactly the same thing.

> I tried to reproduce, but didn't succeed... So let's ignore that for now, it
> was maybe related to something else.

Yes I prefer not to mix all issues :-)

> > Ah this is getting very interesting. Maybe we should hack systemd-wrapper
> > to log the signals it receives and the signals and pids it sends to see
> > what is happening here. It may also be that the signal is properly sent
> > but never received (but why ?).
> 
> Clearly. Apparently I sometimes have a wrong information in the pidfile...
> 
> Have a look at journald logs: 
> 
> Oct 24 12:26:57 haproxys01e02-par haproxy-systemd-wrapper[44319]: 
> haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f 
> /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 44941
> Oct 24 12:26:57 haproxys01e02-par haproxy-systemd-wrapper[44319]: [WARNING] 
> 297/122657 (44951) : config : 'option forwardfor' ignored for frontend 
> 'https-in' as it requires HTTP mode.
> Oct 24 12:27:00 haproxys01e02-par haproxy-systemd-wrapper[44319]: 
> haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f 
> /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 44952
> Oct 24 12:27:00 haproxys01e02-par haproxy-systemd-wrapper[44319]: [WARNING] 
> 297/122700 (44978) : config : 'option forwardfor' ignored for frontend 
> 'https-in' as it requires HTTP mode.
> Oct 24 12:27:05 haproxys01e02-par haproxy-systemd-wrapper[44319]: 
> haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f 
> /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 44983
> Oct 24 12:27:05 haproxys01e02-par haproxy-systemd-wrapper[44319]: [WARNING] 
> 297/122705 (45131) : config : 'option forwardfor' ignored for frontend 
> 'https-in' as it requires HTTP mode.
> Oct 24 12:27:09 haproxys01e02-par haproxy-systemd-wrapper[44319]: 
> haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f 
> /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 45132
> Oct 24 12:27:09 haproxys01e02-par haproxy-systemd-wrapper[44319]: [WARNING] 
> 297/122709 (45146) : config : 'option forwardfor' ignored for frontend 
> 'https-in' as it requires HTTP mode.
> 
> Luckily I have an error in my config, which lets me see the process of the
> first child :).
> Here we can see that:
> * 44978 references (-sf) 44952 (child of 44951)
> * 45131 references 44983 = nobody we've seen in the logs... (so 44978 and
> its child will stay alive forever!)
> * 45146 references 45132 (child of 45131)

That's completely strange... but very interesting. I guess we're getting
closer to the root cause of this problem. The fact that you didn't see
44983 in the log is not an issue by itself because it very likely is a
child of 44978. What surprises me however is that 44978 was not seen.
It would be useful to take a timestamped snapshot of your pid file before
each such reload. I'm interested in knowing whether these entries *really*
exist.

What I'm suspecting now is that for some reason your config sometimes takes
time to load (many certs, huge ACL files, FQDN host names for servers adding
a dependency on external DNS, etc). And sometimes a new process might be
restarted before the first one is completely ready. In this case we could
imagine something like this (approximately, still guessing) :

  Process A (old)Process B (new)

open(pidfile, O_TRUNC)
write(parent=44978)
write(child=44979)
write(child=44980)
write(child=44981)
write(child=44982)
 open(pidfile, O_TRUNC)
write(child=44983)
 write(child=45131)
 write(child=45132)
 write(child=45133)
...

You see the point. In the end you'd have 44983, 45131, 45132, etc... in
the pid file.

This problem could not happen in the past because the reload was synchronous.
But now that it's asynchronous with systemd, there's no way to ensure the
caller waits for the operation to be completed.

At best what we could do is try to lock the pidfile using mandatory
locks (and ignore the error if locks are not implemented as is common
on many embedded systems). This would at least serialize the access to
the pid file, ensuring that nobody writes past each other and that the
file is only read once complete.

I don't know if this is something you're interested in experimenting
with. This is achieved using fcntl(F_SETLKW). It should be done in the
wrapper as well.
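
As an illustration of that suggestion (my own sketch, not haproxy code), here
is a minimal example of taking an exclusive fcntl() lock on the pidfile;
whether the lock ends up advisory or mandatory depends on the filesystem
mount options and the file's mode bits:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int lock_pidfile(const char *path)
{
    struct flock fl;
    int fd = open(path, O_RDWR | O_CREAT, 0644);

    if (fd < 0)
        return -1;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;      /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;             /* 0 = lock the whole file */

    /* F_SETLKW blocks until the lock can be taken, so a second writer
     * waits instead of truncating the file under the first one. */
    if (fcntl(fd, F_SETLKW, &fl) < 0) {
        close(fd);
        return -1;
    }
    return fd;                /* the lock is released on close() or exit */
}

int main(void)
{
    int fd = lock_pidfile("/run/haproxy.pid");

    if (fd < 0) {
        perror("lock_pidfile");
        return 1;
    }
    /* ... rewrite the pidfile here, then close(fd) to release the lock ... */
    close(fd);
    return 0;
}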

Best regards,
Willy



Re: HAProxy reloads lets old and outdated processes

2016-10-24 Thread Beluc
Hi,

here is the idea :

[...]
ExecReload=
ExecReload=/bin/kill -USR2 $MAINPID
[...]

not the sexiest solution, but it does the job and I never got the problem
again ;)

2016-10-24 17:43 GMT+02:00 Pierre Cheynier :
>> A solution I use is to delay next reload in systemd unit until a
>> reload is in progress.
>
> Unfortunately, even when doing this you can end up in the situation described 
> before, because for systemd a reload is basically a SIGUSR2 to send. You do 
> not wait for some callback saying "I'm now OK and fully reloaded" (if I'm 
> wrong, I could be interested in your systemd setup).
>
> I naively tried such approach by adding a grace period of 2s (sleep) and 
> avoid to send another reload during that period, but at some point you'll 
> encounter the same issue when upstream contention will be higher (meaning 
> that you'll have ton of things to reload, you'll then add delay and decrease 
> real-time aspect of your solution etc etc).



RE: HAProxy reloads lets old and outdated processes

2016-10-24 Thread Pierre Cheynier
> A solution I use is to delay next reload in systemd unit until a
> reload is in progress.

Unfortunately, even when doing this you can end up in the situation described 
before, because for systemd a reload is basically a SIGUSR2 to send. You do not 
wait for some callback saying "I'm now OK and fully reloaded" (if I'm wrong, I 
could be interested in your systemd setup).

I naively tried such an approach by adding a grace period of 2 s (sleep) and avoiding
sending another reload during that period, but at some point you'll encounter
the same issue when upstream contention gets higher (meaning that you'll
have a ton of things to reload; you'll then add delay and decrease the real-time
aspect of your solution, etc.).


Re: HAProxy reloads lets old and outdated processes

2016-10-24 Thread Beluc
Hi,
I had similar issues when reloading haproxy with a lot of SSL (long to fork).
A solution I use is to delay the next reload in the systemd unit while a
reload is in progress.

2016-10-24 17:06 GMT+02:00 Willy Tarreau :
> On Mon, Oct 24, 2016 at 01:09:59PM +, Pierre Cheynier wrote:
>> > Same for all of them. Very interesting, SIGUSR2 (12) is set
>> > in SigIgn :-)  One question is "why", but at least we know we
>> > have a workaround consisting in unblocking these signals in
>> > haproxy-systemd-wrapper, as we did in haproxy.
>>
>> > Care to retry with the attached patch ?
>>
>> Same behaviour.
>>
>> SigIgn is still 1000 (which is probably normal, I assume the 
>> goal was to ignore that).
>
> No, the goal was to ensure we don't block anything. I was a bit quick at
> copy-pasting it from haproxy before going to a meeting, I can recheck, but
> there's something odd here.
>
> Willy
>



Re: HAProxy reloads lets old and outdated processes

2016-10-24 Thread Willy Tarreau
On Mon, Oct 24, 2016 at 01:09:59PM +, Pierre Cheynier wrote:
> > Same for all of them. Very interesting, SIGUSR2 (12) is set
> > in SigIgn :-)  One question is "why", but at least we know we
> > have a workaround consisting in unblocking these signals in
> > haproxy-systemd-wrapper, as we did in haproxy.
> 
> > Care to retry with the attached patch ?
> 
> Same behaviour.
> 
> SigIgn is still 1000 (which is probably normal, I assume the goal 
> was to ignore that).

No, the goal was to ensure we don't block anything. I was a bit quick at
copy-pasting it from haproxy before going to a meeting, I can recheck, but
there's something odd here.

Willy



Re: HAProxy reloads lets old and outdated processes

2016-10-24 Thread Simon Dick
On 24 October 2016 at 13:46, Pierre Cheynier  wrote:

> Hi,
>
> Sorry, wrong order in the answers.
>
> > Yes it has something to do with it because it's the systemd-wrapper which
> > delivers the signal to the old processes in this mode, while in the
> normal
> > mode the processes get the signal directly from the new process. Another
> > important point is that exactly *all* users having problem with zombie
> > processes are systemd users, with no exception. And this problem has
> never
> > existed over the first 15 years where systems were using a sane init
> > instead and still do not exist on non-systemd OSes.
>
> Unfortunately, I remember we had the same issue (but less frequently) on
> CentOS6 which is init-based.
> I tried to reproduce, but didn't succeed... So let's ignore that for now,
> it was maybe related to something else.
>
>
I had similar problems in my last job when I was reloading haproxy pretty
frequently using standard init.d scripts from the consul-template program.
I even updated it to the latest 1.6 at the time without noticeable
improvements.


RE: HAProxy reloads lets old and outdated processes

2016-10-24 Thread Pierre Cheynier
> Same for all of them. Very interesting, SIGUSR2 (12) is set
> in SigIgn :-)  One question is "why", but at least we know we
> have a workaround consisting in unblocking these signals in
> haproxy-systemd-wrapper, as we did in haproxy.

> Care to retry with the attached patch ?

Same behaviour.

SigIgn is still 1000 (which is probably normal, I assume the goal 
was to ignore that).

Pierre



RE: HAProxy reloads lets old and outdated processes

2016-10-24 Thread Pierre Cheynier
Hi, 

Sorry, wrong order in the answers.

> Yes it has something to do with it because it's the systemd-wrapper which
> delivers the signal to the old processes in this mode, while in the normal
> mode the processes get the signal directly from the new process. Another
> important point is that exactly *all* users having problem with zombie
> processes are systemd users, with no exception. And this problem has never
> existed over the first 15 years where systems were using a sane init
> instead and still do not exist on non-systemd OSes.

Unfortunately, I remember we had the same issue (but less frequently) on 
CentOS6 which is init-based.
I tried to reproduce, but didn't succeed... So let's ignore that for now, it 
was maybe related to something else.

> OK that's interesting. And when this happens, they stay there forever ?

Yes, these process are never stopped and are still bound to the socket.

> Ah this is getting very interesting. Maybe we should hack systemd-wrapper
> to log the signals it receives and the signals and pids it sends to see
> what is happening here. It may also be that the signal is properly sent
> but never received (but why ?).

Clearly. Apparently I sometimes have wrong information in the pidfile...

Have a look at journald logs: 

Oct 24 12:26:57 haproxys01e02-par haproxy-systemd-wrapper[44319]: 
haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 44941
Oct 24 12:26:57 haproxys01e02-par haproxy-systemd-wrapper[44319]: [WARNING] 
297/122657 (44951) : config : 'option forwardfor' ignored for frontend 
'https-in' as it requires HTTP mode.
Oct 24 12:27:00 haproxys01e02-par haproxy-systemd-wrapper[44319]: 
haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 44952
Oct 24 12:27:00 haproxys01e02-par haproxy-systemd-wrapper[44319]: [WARNING] 
297/122700 (44978) : config : 'option forwardfor' ignored for frontend 
'https-in' as it requires HTTP mode.
Oct 24 12:27:05 haproxys01e02-par haproxy-systemd-wrapper[44319]: 
haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 44983
Oct 24 12:27:05 haproxys01e02-par haproxy-systemd-wrapper[44319]: [WARNING] 
297/122705 (45131) : config : 'option forwardfor' ignored for frontend 
'https-in' as it requires HTTP mode.
Oct 24 12:27:09 haproxys01e02-par haproxy-systemd-wrapper[44319]: 
haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 45132
Oct 24 12:27:09 haproxys01e02-par haproxy-systemd-wrapper[44319]: [WARNING] 
297/122709 (45146) : config : 'option forwardfor' ignored for frontend 
'https-in' as it requires HTTP mode.

Luckily I have an error in my config, which lets me see the process of the first
child :).
Here we can see that:
* 44978 references (-sf) 44952 (child of 44951)
* 45131 references 44983 = nobody we've seen in the logs... (so 44978 and
its child will stay alive forever!)
* 45146 references 45132 (child of 45131)

> That's very kind, thank you. However I don't have access to a docker
> machine but I know some people on the list do so I hope we'll quickly
> find the cause and hopefully be able to fix it (unless it's another
> smart invention from systemd to further annoy running daemons).

> Another important point, when you say you restart every 2ms, are you
> certain you have a way to ensure that everything is completely started
> before you issue your signal to kill the old process ? 
> (..)
> So at 2ms I could easily imagine that we're delivering signals to a
> starting process, maybe even before it has the time to register a signal
> handler, and that these signals are lost before the sub-processes are
> started. 

Clearly not; my test is trivial, but since I observe the behaviour on a platform
that operates at a different time scale (a reload every 1 to 10 seconds on average),
it was just a way to reproduce the issue and be able to investigate in the
container, for example with gdb.

> Regards,
> Willy

Thanks !
Pierre


Re: HAProxy reloads lets old and outdated processes

2016-10-24 Thread Willy Tarreau
Hi Pierre,

On Mon, Oct 24, 2016 at 12:16:32PM +, Pierre Cheynier wrote:
> $ grep ^Sig /proc/43135/status
> SigQ:    0/192473
> SigPnd:    
> SigBlk:    
> SigIgn:    1000
> SigCgt:    000180004803
(...)

Same for all of them. Very interesting, SIGUSR2 (12) is set
in SigIgn :-)  One question is "why", but at least we know we
have a workaround consisting in unblocking these signals in
haproxy-systemd-wrapper, as we did in haproxy.

Care to retry with the attached patch ?

Willy
diff --git a/src/haproxy-systemd-wrapper.c b/src/haproxy-systemd-wrapper.c
index d118ec6..f42723b 100644
--- a/src/haproxy-systemd-wrapper.c
+++ b/src/haproxy-systemd-wrapper.c
@@ -169,6 +169,14 @@ int main(int argc, char **argv)
 {
int status;
struct sigaction sa;
+   sigset_t blocked_sig;
+
+   /* Ensure signals are not blocked. Some shells or service managers may
+* accidently block all of our signals unfortunately, causing lots of
+* zombie processes to remain in the background during reloads.
+*/
+   sigemptyset(&blocked_sig);
+   sigprocmask(SIG_SETMASK, &blocked_sig, NULL);
 
wrapper_argc = argc;
wrapper_argv = argv;


RE: HAProxy reloads lets old and outdated processes

2016-10-24 Thread Pierre Cheynier
Hi,

> Pierre, could you please issue "grep ^Sig /proc/pid/status" for each
> wrapper and haproxy process ? I'm interested in seeing SigIgn and
> SigBlk particularly.
> 

Sure, here is the output for the following pstree: 

$ ps fauxww | grep haproxy | grep -v grep
root 43135  0.0  0.0  46340  1820 ?    Ss   12:11   0:00 
/usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg -p 
/run/haproxy.pid
haproxy  43136  0.0  0.0  88988 15732 ?    S    12:11   0:00  \_ 
/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
haproxy  43137  0.8  0.0  88988 14200 ?    Ss   12:11   0:00  |   \_ 
/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
haproxy  43190  0.1  0.0  88988 15720 ?    S    12:11   0:00  \_ 
/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 43163
haproxy  43191  0.6  0.0  88988 14132 ?    Ss   12:11   0:00  |   \_ 
/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 43163
haproxy  43235  0.3  0.0  88988 15720 ?    S    12:11   0:00  \_ 
/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 43228
haproxy  43236  1.3  0.0  88988 14096 ?    Ss   12:11   0:00  \_ 
/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 43228

$ grep ^Sig /proc/43135/status
SigQ:    0/192473
SigPnd:    
SigBlk:    
SigIgn:    1000
SigCgt:    000180004803

$ grep ^Sig /proc/43136/status
SigQ:    0/192473
SigPnd:    
SigBlk:    
SigIgn:    1000
SigCgt:    000180300205

$ grep ^Sig /proc/43137/status
SigQ:    0/192473
SigPnd:    
SigBlk:    
SigIgn:    1000
SigCgt:    000180300205

$ grep ^Sig /proc/43190/status
SigQ:    0/192473
SigPnd:    
SigBlk:    
SigIgn:    1000
SigCgt:    000180300205

$ grep ^Sig /proc/43191/status
SigQ:    0/192473
SigPnd:    
SigBlk:    
SigIgn:    1000
SigCgt:    000180300205
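
As an aside (my own illustration, not part of the exchange): these SigIgn/SigBlk/SigCgt
values can be decoded by noting that, in /proc/<pid>/status, signal number N
corresponds to bit N-1 of the hex mask. A small sketch of such a decoder:

#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    uint64_t mask;
    int sig;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <hex mask from /proc/<pid>/status>\n", argv[0]);
        return 1;
    }

    /* parse the mask and list the signals whose bit is set */
    mask = strtoull(argv[1], NULL, 16);
    for (sig = 1; sig <= 64; sig++)
        if (mask & (1ULL << (sig - 1)))
            printf("signal %d%s is set\n", sig,
                   sig == SIGUSR2 ? " (SIGUSR2)" : "");
    return 0;
}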




Re: HAProxy reloads lets old and outdated processes

2016-10-21 Thread Willy Tarreau
Hi Maciej,

On Fri, Oct 21, 2016 at 03:44:22PM -0700, Maciej Katafiasz wrote:
> There was a similar issue with reloads in Docker that I reported a while
> back: https://www.mail-archive.com/haproxy@formilux.org/msg21485.html . It
> was ultimately tracked down to a faulty Golang compiler version, which
> messed up signal masks of spawned processes. This is the direction I'd look
> in; given all the hackery systemd engages in and the wanton disregard it
> shows for everything that wasn't specifically written to live in the brave
> new systemd world, I wouldn't put it past the wrapper to do something nasty
> to signals there. The good outcome here is that it's a bug and gets fixed
> eventually and then works its way to distros. The bad outcome is that it's
> intentional, and systemd maintainers tell anyone whose code got broken to
> go away, as they tend to.

I was about to say that we already worked around this issue in haproxy,
but now I don't remember having done the same to systemd-wrapper.

Pierre, could you please issue "grep ^Sig /proc/pid/status" for each
wrapper and haproxy process ? I'm interested in seeing SigIgn and
SigBlk particularly.

Cheers,
Willy



Re: HAProxy reloads lets old and outdated processes

2016-10-21 Thread Maciej Katafiasz
On 21 October 2016 at 15:33, Willy Tarreau  wrote:

>
> On Fri, Oct 21, 2016 at 03:05:55PM +, Pierre Cheynier wrote:
> > First let's clarify again: we are on systemd-based OS (centOS7), so
> reload is
> > done by sending SIGUSR2 to haproxy-systemd-wrapper.
> > Theoretically, this has absolutely no relation with our current issue
> (if I
> > understand well the way the old process are managed)
>
> Yes it has something to do with it because it's the systemd-wrapper which
> delivers the signal to the old processes in this mode, while in the normal
> mode the processes get the signal directly from the new process. Another
> important point is that exactly *all* users having problem with zombie
> processes are systemd users, with no exception. And this problem has never
> existed over the first 15 years where systems were using a sane init
> instead and still do not exist on non-systemd OSes.
>
> > This happens on servers with live traffic, but with a reasonable amount
> of
> > connections. I'm also able to reproduce with no connections, but I've to
> be a
> > bit more aggressive with the reloads frequency (probably because
> children are
> > faster to die).
>
> OK that's interesting. And when this happens, they stay there forever ?
>
> > For me the problem is not that we still have connections or not, it is
> that
> > in this case some old processes are never "aware" that they should die,
> so
> > they continues to listen for incoming requests, thanks to SO_REUSEPORT.
> > Consequently, you end up with N process listening with different configs.
>
> Ah this is getting very interesting. Maybe we should hack systemd-wrapper
> to log the signals it receives and the signals and pids it sends to see
> what is happening here. It may also be that the signal is properly sent
> but never received (but why ?).


There was a similar issue with reloads in Docker that I reported a while
back: https://www.mail-archive.com/haproxy@formilux.org/msg21485.html . It
was ultimately tracked down to a faulty Golang compiler version, which
messed up signal masks of spawned processes. This is the direction I'd look
in; given all the hackery systemd engages in and the wanton disregard it
shows for everything that wasn't specifically written to live in the brave
new systemd world, I wouldn't put it past the wrapper to do something nasty
to signals there. The good outcome here is that it's a bug and gets fixed
eventually and then works its way to distros. The bad outcome is that it's
intentional, and systemd maintainers tell anyone whose code got broken to
go away, as they tend to.

Cheers,
Maciej


Re: HAProxy reloads lets old and outdated processes

2016-10-21 Thread Willy Tarreau
Hi Pierre,

On Fri, Oct 21, 2016 at 03:05:55PM +, Pierre Cheynier wrote:
> First let's clarify again: we are on systemd-based OS (centOS7), so reload is
> done by sending SIGUSR2 to haproxy-systemd-wrapper.
> Theoretically, this has absolutely no relation with our current issue (if I
> understand well the way the old process are managed)

Yes it has something to do with it because it's the systemd-wrapper which
delivers the signal to the old processes in this mode, while in the normal
mode the processes get the signal directly from the new process. Another
important point is that exactly *all* users having problems with zombie
processes are systemd users, with no exception. And this problem never
existed over the first 15 years when systems were using a sane init
instead, and it still does not exist on non-systemd OSes.

> This happens on servers with live traffic, but with a reasonable amount of
> connections. I'm also able to reproduce with no connections, but I've to be a
> bit more aggressive with the reloads frequency (probably because children are
> faster to die).

OK that's interesting. And when this happens, they stay there forever ?

> For me the problem is not that we still have connections or not, it is that
> in this case some old processes are never "aware" that they should die, so
> they continues to listen for incoming requests, thanks to SO_REUSEPORT.
> Consequently, you end up with N process listening with different configs.

Ah this is getting very interesting. Maybe we should hack systemd-wrapper
to log the signals it receives and the signals and pids it sends to see
what is happening here. It may also be that the signal is properly sent
but never received (but why ?).

> In the pstree I pasted in the previous message, there are 3 minutes between
> the first living instance and the last (and as you can see, we are quite
> aggressive with long connections) :
> 
>  timeout client 2s
>  timeout server 5s
>  timeout connect 200ms
>  timeout http-keep-alive 200ms
> 
> Here is a Dockerfile that can be used to reproduce (where I use 
> haproxy-systemd-wrapper, just run with default settings - ie nb of 
> reloads=300 and interval between each=2ms -) :
> 
> https://github.com/pierrecdn/haproxy-reload-issue
> 
> docker build -t haproxy-reload-issue . && docker run --rm -ti 
> haproxy-reload-issue

That's very kind, thank you. However I don't have access to a docker
machine but I know some people on the list do so I hope we'll quickly
find the cause and hopefully be able to fix it (unless it's another
smart invention from systemd to further annoy running daemons).

Another important point, when you say you restart every 2ms, are you
certain you have a way to ensure that everything is completely started
before you issue your signal to kill the old process ? I'm asking because
thanks to the principle that the wrapper must stay in foreground (smart
design choice from systemd), there's no way for a service manager to
know whether all processes are fully started or not. With a normal init,
when the process returns, all sub-processes have been created.

So at 2ms I could easily imagine that we're delivering signals to a
starting process, maybe even before it has the time to register a signal
handler, and that these signals are lost before the sub-processes are
started. Of course that's just a guess, but I don't see a clean way to
work around this, except of course by switching back to a reliable
service manager :-/
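
If that race is indeed the culprit, one classic way out (sketched below under
the assumption that an early SIGUSR2 being lost is the problem; this is not a
statement about what haproxy or the wrapper currently do) is to block the
signal as the very first thing, so anything sent during start-up stays
pending instead of being lost or taking the default action, and only gets
delivered once the handler is installed:

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t reload;

static void on_reload(int sig) { (void)sig; reload = 1; }

int main(void)
{
    sigset_t set, old;
    struct sigaction sa;

    /* Block SIGUSR2 before doing anything else: a signal arriving during
     * start-up is now kept pending by the kernel instead of being lost. */
    sigemptyset(&set);
    sigaddset(&set, SIGUSR2);
    sigprocmask(SIG_BLOCK, &set, &old);

    /* ... long start-up: parse config, bind sockets, fork children ... */
    sleep(1);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_reload;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR2, &sa, NULL);

    /* Unblock: any SIGUSR2 that arrived during start-up is delivered here. */
    sigprocmask(SIG_SETMASK, &old, NULL);

    if (reload)
        fprintf(stderr, "a reload was requested during start-up\n");
    return 0;
}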

Regards,
Willy




RE: HAProxy reloads lets old and outdated processes

2016-10-21 Thread Pierre Cheynier
Hi Willy,

Thanks for your answer and sorry for my delay.

First let's clarify again: we are on a systemd-based OS (CentOS 7), so the
reload is done by sending SIGUSR2 to haproxy-systemd-wrapper.
Theoretically, this has absolutely no relation to our current issue (if I
understand correctly how the old processes are managed).

This happens on servers with live traffic, but with a reasonable amount of
connections. I'm also able to reproduce with no connections, but I have to be
a bit more aggressive with the reload frequency (probably because the
children die faster).

For me the problem is not whether we still have connections or not, it is
that in this case some old processes are never "aware" that they should die,
so they continue to listen for incoming requests, thanks to SO_REUSEPORT.

Consequently, you end up with N processes listening with different configs.
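
That behaviour is easy to demonstrate outside of haproxy. The small
standalone sketch below (port and backlog are arbitrary; it only illustrates
the mechanism) binds with SO_REUSEPORT, so every extra copy you start binds
the same port successfully, whereas without the option the second bind()
would fail with EADDRINUSE:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int one = 1;
    struct sockaddr_in addr;

    /* With this option set, a second (third, ...) process can bind the
     * same port; without it, the later bind() fails with EADDRINUSE. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        return 1;
    }
    listen(fd, 128);
    printf("pid %d is listening on :8080\n", (int)getpid());
    pause();   /* start several copies: the kernel spreads connections across them */
    return 0;
}

Run two or three copies at once and each prints that it is listening; new
connections then land on whichever process the kernel picks, which is exactly
why an old generation that never received its shutdown signal keeps serving
its stale config.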

In the pstree I pasted in the previous message, there are 3 minutes between the 
first living instance and the last (and as you can see, we are quite aggressive 
with long connections) :

 timeout client 2s
 timeout server 5s
 timeout connect 200ms
 timeout http-keep-alive 200ms

Here is a Dockerfile that can be used to reproduce (where I use
haproxy-systemd-wrapper; just run with the default settings, i.e. number of
reloads=300 and interval between each=2ms):

https://github.com/pierrecdn/haproxy-reload-issue

docker build -t haproxy-reload-issue . && docker run --rm -ti 
haproxy-reload-issue

Thanks,

Pierre
    
> Hi Pierre,
>
> (...)
>
> Is this with live traffic or on a test machine ? Could you please check
> whether these instances have one connection attached ? I don't see any
> valid reason for a dying process not to leave once it doesn't have any
> more connection. And during my last attempts at fixing such issues by
> carefully reviewing the code and hammering the systemd-wrapper like mad,
> I couldn't get this behaviour to happen a single time. Thus it would be
> nice to know what these processes are doing there and why they don't
> stop.
> 
> Regards,
> Willy
 


Re: HAProxy reloads lets old and outdated processes

2016-10-18 Thread Willy Tarreau
Hi Pierre,

On Fri, Oct 14, 2016 at 10:54:43AM +, Pierre Cheynier wrote:
> Hi Lukas,
> 
> > I did not mean no-reuseport to work around or "solve" the problem
> > definitively, but rather to see if the problems can still be triggered,
> > since you can reproduce the problem easily.
> 
> This still happens using snapshot 20161005 with no-reuseport set, a bit
> less often, probably because the reload is faster.
> 
> Here is what I observe after reloading 50 times, waiting 0.1 sec between 
> each: 
> 
> $ ps fauxww | tail -9
> root 50253  0.1  0.0  46340  1820 ?Ss   10:43   0:00 
> /usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg -p 
> /run/haproxy.pid
> haproxy  51003  0.0  0.0  78256  9144 ?S10:44   0:00  \_ 
> /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 
> 51000
> haproxy  51025  0.3  0.0  78256  9208 ?Ss   10:44   0:00  |   \_ 
> /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 
> 51000
> haproxy  51777  0.0  0.0  78256  9144 ?S10:44   0:00  \_ 
> /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 
> 51771
> haproxy  51834  0.3  0.0  78256  9208 ?Ss   10:44   0:00  |   \_ 
> /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 
> 51771
> haproxy  51800  0.0  0.0  78256  9140 ?S10:44   0:00  \_ 
> /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 
> 51785
> haproxy  51819  0.3  0.0  78256  9204 ?Ss   10:44   0:00  |   \_ 
> /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 
> 51785
> haproxy  52083  0.0  0.0  78256  9144 ?S10:47   0:00  \_ 
> /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 
> 52076
> haproxy  52084  0.3  0.0  78256  3308 ?Ss   10:47   0:00  \_ 
> /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 
> 52076
> 
> $ sudo ss -tanp |grep -i listen | grep 80
> LISTEN 0  128  *:80   *:* 
>   users:(("haproxy",pid=52084,fd=8))
> LISTEN 0  128  *:8080 *:* 
>   users:(("haproxy",pid=52084,fd=6))
> LISTEN 0  12810.5.6.7:8000 *:*
>users:(("haproxy",pid=52084,fd=7))
> 
> $ head -12 /etc/haproxy/haproxy.cfg
> global
>  log 127.0.0.1 local0 warning
>  log 127.0.0.1 local1 notice
>  maxconn 262144
>  user haproxy
>  group haproxy
>  nbproc 1
>  chroot /var/lib/haproxy
>  pidfile /var/run/haproxy.pid
>  stats socket /var/lib/haproxy/stats
>  noreuseport
> 
> Definitely, some instances seem to be "lost" (not referenced by another)
> and will never be stopped.

Is this with live traffic or on a test machine ? Could you please check
whether these instances have one connection attached ? I don't see any
valid reason for a dying process not to leave once it doesn't have any
more connection. And during my last attempts at fixing such issues by
carefully reviewing the code and hammering the systemd-wrapper like mad,
I couldn't get this behaviour to happen a single time. Thus it would be
nice to know what these processes are doing there and why they don't
stop.

Regards,
Willy



RE: HAProxy reloads lets old and outdated processes

2016-10-18 Thread Pierre Cheynier
Hi,
Any updates/findings on that issue ?

Many thanks,

Pierre

> From : Pierre Cheynier
> To: Lukas Tribus; haproxy@formilux.org
> Sent: Friday, October 14, 2016 12:54 PM
> Subject: RE: HAProxy reloads lets old and outdated processes
>     
> Hi Lukas,
> 
> > I did not mean no-reuseport to work around or "solve" the problem
> > definitively, but rather to see if the problems can still be triggered,
> > since you can reproduce the problem easily.
> 
> This still happens using snapshot 20161005 with no-reuseport set, a bit
> less often, probably because the reload is faster.
> 
> Here is what I observe after reloading 50 times, waiting 0.1 sec between 
> each: 
> 
> $ ps fauxww | tail -9
> root 50253  0.1  0.0  46340  1820 ?    Ss   10:43   0:00 
> /usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg -p 
> /run/haproxy.pid
> haproxy  51003  0.0  0.0  78256  9144 ?    S    10:44   0:00  \_ 
> /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 
> 51000
> haproxy  51025  0.3  0.0  78256  9208 ?    Ss   10:44   0:00  |   \_ 
> /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 
> 51000
> haproxy  51777  0.0  0.0  78256  9144 ?    S    10:44   0:00  \_ 
> /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 
> 51771
> haproxy  51834  0.3  0.0  78256  9208 ?    Ss   10:44   0:00  |   \_ 
> /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 
> 51771
> haproxy  51800  0.0  0.0  78256  9140 ?    S    10:44   0:00  \_ 
> /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 
> 51785
> haproxy  51819  0.3  0.0  78256  9204 ?    Ss   10:44   0:00  |   \_ 
> /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 
> 51785
> haproxy  52083  0.0  0.0  78256  9144 ?    S    10:47   0:00  \_ 
> /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 
> 52076
> haproxy  52084  0.3  0.0  78256  3308 ?    Ss   10:47   0:00  \_ 
> /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 
> 52076
> 
> $ sudo ss -tanp |grep -i listen | grep 80
> LISTEN 0  128  *:80   *:* 
>   users:(("haproxy",pid=52084,fd=8))
> LISTEN 0  128  *:8080 *:* 
>   users:(("haproxy",pid=52084,fd=6))
> LISTEN 0  128    10.5.6.7:8000 *:*
>    users:(("haproxy",pid=52084,fd=7))
> 
> $ head -12 /etc/haproxy/haproxy.cfg
> global
>  log 127.0.0.1 local0 warning
>  log 127.0.0.1 local1 notice
>  maxconn 262144
>  user haproxy
>  group haproxy
>  nbproc 1
>  chroot /var/lib/haproxy
>  pidfile /var/run/haproxy.pid
>  stats socket /var/lib/haproxy/stats
>  noreuseport
> 
> Definitely, some instances seem to be "lost" (not referenced by another)
> and will never be stopped.
> 
> In that case it will not impact the config consistency as only one is bound 
> to the socket, but the reload is far less transparent from a network point of 
> view.
> 
> Pierre



RE: HAProxy reloads lets old and outdated processes

2016-10-14 Thread Pierre Cheynier
Hi Lukas,

> I did not mean no-reuseport to work around or "solve" the problem
> definitively, but rather to see if the problems can still be triggered,
> since you can reproduce the problem easily.

This still happens using snapshot 20161005 with no-reuseport set, a bit
less often, probably because the reload is faster.

Here is what I observe after reloading 50 times, waiting 0.1 sec between each: 

$ ps fauxww | tail -9
root 50253  0.1  0.0  46340  1820 ?Ss   10:43   0:00 
/usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg -p 
/run/haproxy.pid
haproxy  51003  0.0  0.0  78256  9144 ?S10:44   0:00  \_ 
/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 51000
haproxy  51025  0.3  0.0  78256  9208 ?Ss   10:44   0:00  |   \_ 
/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 51000
haproxy  51777  0.0  0.0  78256  9144 ?S10:44   0:00  \_ 
/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 51771
haproxy  51834  0.3  0.0  78256  9208 ?Ss   10:44   0:00  |   \_ 
/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 51771
haproxy  51800  0.0  0.0  78256  9140 ?S10:44   0:00  \_ 
/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 51785
haproxy  51819  0.3  0.0  78256  9204 ?Ss   10:44   0:00  |   \_ 
/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 51785
haproxy  52083  0.0  0.0  78256  9144 ?S10:47   0:00  \_ 
/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 52076
haproxy  52084  0.3  0.0  78256  3308 ?Ss   10:47   0:00  \_ 
/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 52076

$ sudo ss -tanp |grep -i listen | grep 80
LISTEN 0  128  *:80   *:*   
users:(("haproxy",pid=52084,fd=8))
LISTEN 0  128  *:8080 *:*   
users:(("haproxy",pid=52084,fd=6))
LISTEN 0  12810.5.6.7:8000 *:*  
 users:(("haproxy",pid=52084,fd=7))

$ head -12 /etc/haproxy/haproxy.cfg
global
 log 127.0.0.1 local0 warning
 log 127.0.0.1 local1 notice
 maxconn 262144
 user haproxy
 group haproxy
 nbproc 1
 chroot /var/lib/haproxy
 pidfile /var/run/haproxy.pid
 stats socket /var/lib/haproxy/stats
 noreuseport

Definitely, some instances seem to be "lost" (not referenced by another)
and will never be stopped.

In that case it will not impact the config consistency as only one is bound to 
the socket, but the reload is far less transparent from a network point of view.

Pierre



Re: HAProxy reloads lets old and outdated processes

2016-10-13 Thread Lukas Tribus

Hi Pierre,


Am 13.10.2016 um 18:56 schrieb Pierre Cheynier:

This becomes impossible in a PaaS-like approach where many events occur and
may trigger reloads every second. BTW, the new "no-reuseport" feature does not
help in my case (nor do the ip/nftables or tc workarounds) because it
introduces latency spikes potentially every second.


I did not mean no-reuseport to work around or "solve" the problem
definitively, but rather to see if the problems can still be triggered,
since you can reproduce the problem easily.


Without SO_REUSEPORT it may come to a hard error earlier in the code
path, which could be important information to have.




Regards,
Lukas