Re: Lua: forcing garbage collector after socket i/o
We are running this patch on top of 1.9.13, where it is needed. I will report back if/when we have anything to add. Until then, consider no news good news. Dave.

On Tue, Jan 14, 2020 at 9:37 AM Willy Tarreau wrote: > > On Tue, Jan 14, 2020 at 09:31:07AM -0600, Dave Chiluk wrote: > > Can we get this backported onto the 2.0 and 1.9 stable streams? It > > looks like it mostly cleanly patches. *(aside from line numbers). > > Given that the risk of regression is far from zero (hence the tag "medium"), > I'd rather avoid for a while and observe instead. Very few users will notice > an improvement, maybe only two, but every Lua user would have to accept the > risk of a possible regression, so care is mandatory. We'd do it to 2.1 first, > and after a few releases possibly to 2.0 if there is compelling demand for > this. By then 1.9 will likely be dead anyway. > > If you're facing a high enough Lua-based connection rate that would make this > a nice improvement to the point where you'd be taking the risk to use the > backport, I think everyone would very much appreciate that you run with this > patch for a while to help confirm it doesn't break anything. > > Thanks, > Willy
Re: Lua: forcing garbage collector after socket i/o
Can we get this backported onto the 2.0 and 1.9 stable streams? It looks like it mostly applies cleanly (aside from line numbers). Thanks, Dave

On Tue, Jan 14, 2020 at 3:49 AM Willy Tarreau wrote: > > On Mon, Jan 13, 2020 at 10:11:57AM -0800, Sadasiva Gujjarlapudi wrote: > > Sounds good to me. > > Thank you so much once again. > > OK now merged. Thanks guys! > > Willy >
Re: Haproxy nbthreads + multi-threading lua?
After a bit more research I discovered that the Lua scripts are actually from Signal Sciences. You should have a conversation with Signal Sciences about how they are doing ingress capture through HAProxy. https://docs.signalsciences.net/install-guides/other-modules/haproxy-module/ Dave.

p.s. Yes, we did meet at KubeCon, and I really appreciated your suggestions on healthchecking. I just haven't had a chance to check/test them because of higher-priority issues that have arisen (isn't this always the case?). And no, this isn't even one of those higher-priority issues.

On Wed, Dec 11, 2019 at 2:35 AM Baptiste wrote: > > On Mon, Dec 2, 2019 at 5:15 PM Dave Chiluk wrote: >> >> Since 2.0 nbproc and nbthreads are now mutually exclusive, are there >> any ways to make lua multi-threaded? >> >> One of our proxy's makes heavy use of lua scripting. I'm not sure if >> this is still the case, but in earlier versions of HAProxy lua was >> single threaded per process. Because of this we were running that >> proxy with nbproc=4, and nbthread=4. This allowed us to scale without >> being limited by lua. >> >> Has lua single-threaded-ness now been solved? Are there other options >> I should be aware of related to that? What's the preferred way around >> this? >> >> Thanks, >> Dave. >> > > Hi Dave, > (I think we met at kubecon) > > What's your use case for Lua exactly? > Can't it be replaced by SPOE at some point? (which is compatible with > nbthread and can run heavy processing outside of the HAProxy process)? > > You can answer me privately if you don't want such info to be public. > > Baptiste
Haproxy nbthreads + multi-threading lua?
Since 2.0 nbproc and nbthreads are now mutually exclusive, are there any ways to make Lua multi-threaded? One of our proxies makes heavy use of Lua scripting. I'm not sure if this is still the case, but in earlier versions of HAProxy, Lua was single-threaded per process. Because of this we were running that proxy with nbproc=4 and nbthread=4. This allowed us to scale without being limited by Lua. Has Lua's single-threadedness now been solved? Are there other options I should be aware of related to that? What's the preferred way around this? Thanks, Dave.
Re: Status of 1.5 ?
Ubuntu 16.04 is on 1.6 which is bug-fix "supported" till 2021. It's probably fine to deprecate next year. Ubuntu 18.04 is on 1.8 which is bug-fix "supported" till 2023. Debian has 1.8 in their stable and 2.0.9 in unstable, but I'm not as familiar with their release cycles. RHEL/Centos 7 haproxy package is on 1.5, but they've also provided a rh-haproxy18 which provides 1.8. AFAICT from a distro perspective you are pretty good to kill off 1.5. Dave. FYI, I'm an Ubuntu Dev if you ever need one. On Tue, Nov 26, 2019 at 7:00 AM Willy Tarreau wrote: > > Hi Vincent, > > On Tue, Nov 26, 2019 at 01:33:30PM +0100, Vincent Bernat wrote: > > ? 25 octobre 2019 11:27 +02, Willy Tarreau : > > > > > Now I'm wondering, is anyone interested in this branch to still be > > > maintained ? Should I emit a new release with a few pending fixes > > > just to flush the pipe and pursue its "critical fixes only" status a > > > bit further, or should we simply declare it unmaintained ? I'm fine > > > with either option, it's just that I hate working for no reason, and > > > this version was released a bit more than 5 years ago now, so I can > > > easily expect that it has few to no user by now. > > > > > > Please just let me know what you think, > > > > What's the conclusion? :) > > Oh you're right, I wanted to mention it yesterday but the e-mail delivery > issues derailed my focus a bit... > > So it looks like the most reasonable thing to do is to drop it at the end > of this year, or exactly 3 years after the last update to the branch! I > don't expect it to require any new fix at all to be honest. 
Those using > it for SSL should really upgrade to something more recent, at least to > benefit from more recent openssl versions (1.0.1 was probably the last > supported one) and those who don't need SSL likely didn't even upgrade > to 1.5 anyway ;-) > > So we could say that if anything really critical must happen to 1.5, it > must happen within one month for it to get a fix and after that it's too > late. > > Cheers, > Willy >
Re: Increase in sockets in TIME_WAIT with 1.9.x
I was able to bisect this down to 53216e7 as the problematic commit, using the number of setsockopt(... SO_LINGER ...) calls as the test metric, counted with the following command:

$ sudo timeout 60s strace -e setsockopt,close -p $(ps -lf -C haproxy | tail -n 1 | awk -e '{print $4}') 2>&1 | tee 1.9-${V} ; grep LINGER 1.9-${V} | wc -l

53216e7 = 1
81a15af6b = 69

Interestingly, 1.8.17 only has roughly 17. I'll see if I can do a bisection for that tomorrow. Hope that helps. Dave.

On Thu, Jun 13, 2019 at 3:30 PM Willy Tarreau wrote: > On Thu, Jun 13, 2019 at 03:20:20PM -0500, Dave Chiluk wrote: > > I've attached an haproxy.cfg that is as minimal as I felt comfortable. > (...) > > many thanks for this, Dave, I truly appreciate it. I'll have a look at > it hopefully tomorrow morning. > > Willy >
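The counting method above can also be run offline against a saved strace capture. A minimal sketch — the log contents below are fabricated to mirror the traces quoted in this thread, not real output:

```shell
# Count setsockopt(..., SO_LINGER, ...) calls in a saved strace log.
# The sample log is made up for illustration; in practice it would be
# the file produced by the strace | tee pipeline shown above.
cat > /tmp/strace-sample.log <<'EOF'
setsockopt(17, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(17, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0
close(17) = 0
setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
close(4) = 0
EOF
# -c prints the number of matching lines (one SO_LINGER call in this sample)
grep -c SO_LINGER /tmp/strace-sample.log
```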
Re: Increase in sockets in TIME_WAIT with 1.9.x
I've attached an haproxy.cfg that is as minimal as I felt comfortable with. We are using admin sockets for dynamic configuration of backends, so I left the server-templating in, but no other application was configured to orchestrate haproxy at the time of testing. I've also attached output from:

$ sudo timeout 60s strace -e setsockopt,close -p $(ps -lf -C haproxy | tail -n 1 | awk -e '{print $4}') 2>&1 | tee 1.8.17

which shows the significant decrease in setting of SO_LINGER. I guess I lied earlier when I said there were none, but over 60s it looks like 1.9.8 had 1/17th the number of SO_LINGER setsockopt calls vs 1.8.17. Unfortunately the number of sockets sitting in TIME_WAIT fluctuates to the point where there's not a great metric to use. Looking at SO_LINGER settings does appear to be consistent though. I bet if I spawned 700 backend instances instead of 7 it would be more pronounced. I got perf stack traces for setsockopt from both versions on our production servers, but inlining made those traces mostly useless. Let me know if there's anything else I can grab. Dave.

On Thu, Jun 13, 2019 at 1:30 AM Willy Tarreau wrote: > On Wed, Jun 12, 2019 at 12:08:03PM -0500, Dave Chiluk wrote: > > I did a bit more introspection on our TIME_WAIT connections. The > increase > > in sockets in TIME_WAIT is definitely from old connections to our backend > > server instances. Considering the fact that this server is doesn't > > actually serve real traffic we can make a reasonable assumptions that > this > > is almost entirely due to increases in healthchecks. > > Great! 
> > > Doing an strace on haproxy 1.8.17 we see > > > > sudo strace -e setsockopt,close -p 15743 > > strace: Process 15743 attached > > setsockopt(17, SOL_TCP, TCP_NODELAY, [1], 4) = 0 > > setsockopt(17, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0 > > close(17) = 0 > > > > > > Doing the same strace on 1.9.8 we see > > > > sudo strace -e setsockopt,close -p 6670 > > strace: Process 6670 attached > > setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 > > close(4)= 0 > > > > > > The calls to setsockopt(17, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, > 8) > > = 0 > > appear to be missing. > > Awesome, that's exactly the info I was missing. I suspected that for > whatever reason the lingering was not disabled, at least now we have > a proof of this! Now the trick is to figure why :-/ > > > We are running centos 7 with kernel 3.10.0-957.1.3.el7.x86_64. > > OK, and with the setsockopt it should behave properly. > > > I'll keep digging into this, and see if I can get stack traces that > result > > in teh setsockopt calls on 1.8.17 so the stack can be more closely > > inspected. > > Don't worry for this now, this is something we at least need to resolve > before issuing 2.0 or it will cause some trouble. Then we'll backport the > fix once the cause is figured out. > > However when I try here I don't have the problem, either in 1.9.8 or > 2.0-dev7 : > > 08:27:30.212570 connect(14, {sa_family=AF_INET, sin_port=htons(9003), > sin_addr=inet_addr("127.0.0.1")}, 16) = 0 > 08:27:30.212590 recvfrom(14, NULL, 2147483647, > MSG_TRUNC|MSG_DONTWAIT|MSG_NOSIGNAL, NULL, NULL) = -1 EAGAIN (Resource > temporarily unavailable) > 08:27:30.212610 setsockopt(14, SOL_SOCKET, SO_LINGER, {l_onoff=1, > l_linger=0}, 8) = 0 > 08:27:30.212630 close(14) = 0 > 08:27:30.212659 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, > tv_nsec=6993282}) = 0 > > So it must depend on the type of check. Could you please share a > minimalistic > config that reproduces this ? 
> > Thanks, > Willy >

[Attachments: 1.8.17, domain_map, 1.9.8, haproxy.cfg (binary data)]
Re: Increase in sockets in TIME_WAIT with 1.9.x
I did a bit more introspection on our TIME_WAIT connections. The increase in sockets in TIME_WAIT is definitely from old connections to our backend server instances. Considering that this server doesn't actually serve real traffic, we can make a reasonable assumption that this is almost entirely due to increases in healthchecks.

Doing an strace on haproxy 1.8.17 we see:

sudo strace -e setsockopt,close -p 15743
strace: Process 15743 attached
setsockopt(17, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(17, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0
close(17) = 0

Doing the same strace on 1.9.8 we see:

sudo strace -e setsockopt,close -p 6670
strace: Process 6670 attached
setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
close(4) = 0

The calls to setsockopt(17, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0 appear to be missing. We are running CentOS 7 with kernel 3.10.0-957.1.3.el7.x86_64. I'll keep digging into this, and see if I can get stack traces that result in the setsockopt calls on 1.8.17 so the stack can be more closely inspected. Thanks for any help, Dave

On Tue, Jun 11, 2019 at 2:29 AM Willy Tarreau wrote: > On Mon, Jun 10, 2019 at 04:01:27PM -0500, Dave Chiluk wrote: > > We are in the process of evaluating upgrading to 1.9.8 from 1.8.17, > > and we are seeing a roughly 70% increase in sockets in TIME_WAIT on > > our haproxy servers with a mostly idle server cluster > > $ sudo netstat | grep 'TIME_WAIT' | wc -l > > Be careful, TIME_WAIT on the frontend is neither important nor > representative of anything, only the backend counts. > > > Looking at the source/destination of this it seems likely that this > > comes from healthchecks. We also see a corresponding load increase on > > the backend applications serving the healthchecks. > > It's very possible and problematic at the same time. > > > Checking the git logs for healthcheck was unfruitful. Any clue what > > might be going on? 
> > Normally we make lots of efforts to close health-check responses with > a TCP RST (by disabling lingering before closing). I don't see why it > wouldn't be done here. What OS are you running on and what do your > health checks look like in the configuration ? > > Thanks, > Willy >
Increase in sockets in TIME_WAIT with 1.9.x
We are in the process of evaluating upgrading to 1.9.8 from 1.8.17, and we are seeing a roughly 70% increase in sockets in TIME_WAIT on our haproxy servers with a mostly idle server cluster:

$ sudo netstat | grep 'TIME_WAIT' | wc -l

Looking at the source/destination of these, it seems likely that they come from healthchecks. We also see a corresponding load increase on the backend applications serving the healthchecks. Checking the git logs for healthcheck was unfruitful. Any clue what might be going on? Thanks, Dave.
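Counting only the sockets toward the backend port gives a more targeted number than grepping all of netstat. A sketch — port 7070 and the netstat lines below are assumptions fabricated for illustration:

```shell
# Count TIME_WAIT sockets toward a single backend port (7070 here, an
# assumed value) rather than every TIME_WAIT socket on the box.
# The sample netstat output is fabricated for illustration.
cat > /tmp/netstat-sample.txt <<'EOF'
tcp        0      0 10.40.40.1:51234      10.40.40.2:7070       TIME_WAIT
tcp        0      0 10.40.40.1:51235      10.40.40.2:7070       TIME_WAIT
tcp        0      0 192.0.2.10:443        203.0.113.5:55012     TIME_WAIT
EOF
# keep only connections whose remote endpoint is port 7070, then count
grep ':7070 ' /tmp/netstat-sample.txt | grep -c 'TIME_WAIT'
```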
Re: What to look out for when going from 1.6 to 1.8?
We have the same use case as Alex (Mesos load balancing), and can also confirm that our config worked without change from 1.6 to 1.8. Given our testing, you should consider the seamless reload -x option and the dynamic server configuration APIs. Both have greatly alleviated issues we've faced in our microservices-based cloud. Dave.

On Mon, Jul 16, 2018 at 8:47 AM Alex Evonosky wrote: > Tim- > > I can speak from a production point of view that we had HAproxy on the 1.6 > branch inside docker containers for mesos load balancing with pretty much > the same requirements as you spoke of. After compiling Haproxy to the 1.8x > branch the same config worked without issues. > > -Alex > > > On Mon, Jul 16, 2018 at 9:39 AM, Tim Verhoeven > wrote: > >> Hello all, >> >> We have been running the 1.6 branch of HAProxy, without any issues, for a >> while now. And reading the updates around 1.8 here in the mailing list it >> looks like its time to upgrade to this branch. >> >> So I was wondering if there are any things I need to look of for when >> doing this upgrade? We are not doing anything special with HAProxy (I >> think). We run it as a single process, we use SSL/TLS termination, some >> ACL's and a bunch of backends. We only use HTTP 1.1 and TCP connections. >> >> From what I've been able to gather my current config will works just as >> good with 1.8. But some extra input from all the experts here is always >> appreciated. >> >> Thanks, >> Tim >> > >
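The dynamic server configuration mentioned above is driven through the runtime API over the admin socket. A sketch of the kind of commands involved — the backend/server names, address, and socket path are made up for illustration, and in production the commands would be piped to the socket with socat:

```shell
# Hypothetical runtime-API commands for repointing a pre-allocated server
# slot. Names, address, and socket path are illustrative assumptions.
SOCK=/var/run/haproxy-admin.sock
cmds() {
    echo "set server app_be/slot8 addr 10.40.40.9 port 7070"
    echo "set server app_be/slot8 state ready"
}
# In production: cmds | socat stdio "unix-connect:${SOCK}"
cmds
```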
[PATCH] [MINOR] Some spelling cleanup in the comments.
Signed-off-by: Dave Chiluk
---
 include/common/cfgparse.h | 2 +-
 src/session.c             | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/common/cfgparse.h b/include/common/cfgparse.h
index c003bd3b0..6e35bc948 100644
--- a/include/common/cfgparse.h
+++ b/include/common/cfgparse.h
@@ -92,7 +92,7 @@ int parse_process_number(const char *arg, unsigned long *proc, int *autoinc, cha
 /*
  * Sends a warning if proxy does not have at least one of the
- * capabilities in . An optionnal may be added at the end
+ * capabilities in . An optional may be added at the end
  * of the warning to help the user. Returns 1 if a warning was emitted
  * or 0 if the condition is valid.
  */
diff --git a/src/session.c b/src/session.c
index c1bd2d6b5..ae2d9e1d9 100644
--- a/src/session.c
+++ b/src/session.c
@@ -114,11 +114,11 @@ static void session_count_new(struct session *sess)
 }

 /* This function is called from the protocol layer accept() in order to
- * instanciate a new session on behalf of a given listener and frontend. It
+ * instantiate a new session on behalf of a given listener and frontend. It
  * returns a positive value upon success, 0 if the connection can be ignored,
  * or a negative value upon critical failure. The accepted file descriptor is
  * closed if we return <= 0. If no handshake is needed, it immediately tries
- * to instanciate a new stream. The created connection's owner points to the
+ * to instantiate a new stream. The created connection's owner points to the
  * new session until the upper layers are created.
  */
 int session_accept_fd(struct listener *l, int cfd, struct sockaddr_storage *addr)
--
2.17.1
Re: [PATCH] [MINOR] Some spelling cleanup in comments.
I'm sorry, I just realized I applied this against the 1.8 stable branch. I'll send another patch for 1.9.

On Thu, Jun 21, 2018 at 10:55 AM Dave Chiluk wrote:
> Some spelling cleanup in comments.
>
> Signed-off-by: Dave Chiluk
> ---
>  include/common/cfgparse.h | 2 +-
>  include/types/task.h      | 2 +-
>  src/session.c             | 4 ++--
>  3 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/include/common/cfgparse.h b/include/common/cfgparse.h
> index c3355ca4..3022b8d8 100644
> --- a/include/common/cfgparse.h
> +++ b/include/common/cfgparse.h
> @@ -90,7 +90,7 @@ int parse_process_number(const char *arg, unsigned long *proc, int *autoinc, cha
>
>  /*
>   * Sends a warning if proxy does not have at least one of the
> - * capabilities in . An optionnal may be added at the end
> + * capabilities in . An optional may be added at the end
>   * of the warning to help the user. Returns 1 if a warning was emitted
>   * or 0 if the condition is valid.
>   */
> diff --git a/include/types/task.h b/include/types/task.h
> index 991e3a46..ac8c4339 100644
> --- a/include/types/task.h
> +++ b/include/types/task.h
> @@ -64,7 +64,7 @@ struct notification {
>  struct task {
>      struct eb32sc_node rq;        /* ebtree node used to hold the task in the run queue */
>      unsigned short state;         /* task state : bit field of TASK_* */
> -    unsigned short pending_state; /* pending states for running talk */
> +    unsigned short pending_state; /* pending states for running task */
>      short nice;                   /* the task's current nice value from -1024 to +1024 */
>      unsigned int calls;           /* number of times ->process() was called */
>      struct task * (*process)(struct task *t); /* the function which processes the task */
> diff --git a/src/session.c b/src/session.c
> index 318c1716..898dbaab 100644
> --- a/src/session.c
> +++ b/src/session.c
> @@ -114,11 +114,11 @@ static void session_count_new(struct session *sess)
>  }
>
>  /* This function is called from the protocol layer accept() in order to
> - * instanciate a new session on behalf of a given listener and frontend. It
> + * instantiate a new session on behalf of a given listener and frontend. It
>   * returns a positive value upon success, 0 if the connection can be ignored,
>   * or a negative value upon critical failure. The accepted file descriptor is
>   * closed if we return <= 0. If no handshake is needed, it immediately tries
> - * to instanciate a new stream. The created connection's owner points to the
> + * to instantiate a new stream. The created connection's owner points to the
>   * new session until the upper layers are created.
>   */
>  int session_accept_fd(struct listener *l, int cfd, struct sockaddr_storage *addr)
> --
> 2.17.1
[PATCH] [MINOR] Some spelling cleanup in comments.
Some spelling cleanup in comments.

Signed-off-by: Dave Chiluk
---
 include/common/cfgparse.h | 2 +-
 include/types/task.h      | 2 +-
 src/session.c             | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/common/cfgparse.h b/include/common/cfgparse.h
index c3355ca4..3022b8d8 100644
--- a/include/common/cfgparse.h
+++ b/include/common/cfgparse.h
@@ -90,7 +90,7 @@ int parse_process_number(const char *arg, unsigned long *proc, int *autoinc, cha
 /*
  * Sends a warning if proxy does not have at least one of the
- * capabilities in . An optionnal may be added at the end
+ * capabilities in . An optional may be added at the end
  * of the warning to help the user. Returns 1 if a warning was emitted
  * or 0 if the condition is valid.
  */
diff --git a/include/types/task.h b/include/types/task.h
index 991e3a46..ac8c4339 100644
--- a/include/types/task.h
+++ b/include/types/task.h
@@ -64,7 +64,7 @@ struct notification {
 struct task {
     struct eb32sc_node rq;        /* ebtree node used to hold the task in the run queue */
     unsigned short state;         /* task state : bit field of TASK_* */
-    unsigned short pending_state; /* pending states for running talk */
+    unsigned short pending_state; /* pending states for running task */
     short nice;                   /* the task's current nice value from -1024 to +1024 */
     unsigned int calls;           /* number of times ->process() was called */
     struct task * (*process)(struct task *t); /* the function which processes the task */
diff --git a/src/session.c b/src/session.c
index 318c1716..898dbaab 100644
--- a/src/session.c
+++ b/src/session.c
@@ -114,11 +114,11 @@ static void session_count_new(struct session *sess)
 }

 /* This function is called from the protocol layer accept() in order to
- * instanciate a new session on behalf of a given listener and frontend. It
+ * instantiate a new session on behalf of a given listener and frontend. It
  * returns a positive value upon success, 0 if the connection can be ignored,
  * or a negative value upon critical failure. The accepted file descriptor is
  * closed if we return <= 0. If no handshake is needed, it immediately tries
- * to instanciate a new stream. The created connection's owner points to the
+ * to instantiate a new stream. The created connection's owner points to the
  * new session until the upper layers are created.
  */
 int session_accept_fd(struct listener *l, int cfd, struct sockaddr_storage *addr)
--
2.17.1
Re: Truly seamless reloads
The patches are all cherry-picks from the 1.8 branch that I backported to the 1.7 branch. They are all documented with the original development tree SHA as well. Have fun, Dave.

On Fri, Jun 1, 2018, 6:16 AM Veiko Kukk wrote: > On 31/05/18 23:15, William Lallemand wrote: > > Sorry but unfortunately we are not backporting features in stable branches, > > those are only meant for maintenance. > > > > People who want to use the seamless reload should migrate to HAProxy 1.8, the > > stable team won't support this feature in previous branches. > > > I've been keeping eye on this list about 1.8 related bugs and it does > not seem to me that 1.8 stable enough yet for production use. Too many > reports about high CPU usage and/or crashes. > We are still using 1.6 which finally seems to have stabilized enough for > production. When we started using 1.6 some years ago, we had many issues > with it which caused service interruptions. Would not want to repeat > that again. > > Even with 1.7, processes would hang forever after reload (days, > sometimes weeks or until reboot). Really hard to debug, happens only > under production load. > > I will look at patches provided by Dave. We are building HAproxy rpm-s > for ourselves anyway, applying some patches in spec file does not seem > to be that much additional work if indeed those would provide truly > seamless reloads. > > Best regards, > Veiko > >
Re: Truly seamless reloads
I backported the necessary patchset for seamless reloads on top of 1.7.9 a while back. It was used in production without issue for quite some time. I just rebased those patches on top of haproxy-1.7 development and pushed the result to a seamless_reload branch on GitHub. They apply cleanly, but I have not built or tested them, nor do I have the time to do so at the moment. https://github.com/chiluk/haproxy-1.7 I've also attached the patchset for completeness. Happy reloading. I think the 1.7 maintainer should pick these patches up, as the hard work has already been done. Dave.

On Mon, Apr 30, 2018 at 4:26 AM William Lallemand wrote: > On Mon, Apr 30, 2018 at 10:35:37AM +0300, Veiko Kukk wrote: > > On 26/04/18 17:11, Veiko Kukk wrote: > > > Hi, > > > > > > According to > > > > https://www.haproxy.com/blog/truly-seamless-reloads-with-haproxy-no-more-hacks/ > > > : > > > > > > "The patchset has already been merged into the HAProxy 1.8 development > > > branch and will soon be backported to HAProxy Enterprise Edition 1.7r1 > > > and possibly 1.6r2." > > > > > > Has it been backported to 1.7 and/or 1.6? > > > > > > If yes, then should seamless reload also work with multiprocess > > > configurations? (nbproc > 1). > > > > Can i assume the answer is no for both questions? > > > > > > Veiko > > > > Hello Veiko, > > Indeed, the seamless reload is only available since HAProxy 1.8. > > It supports multiprocess configuration. > > > -- > William Lallemand > >

From cd0e6748ad7ce13ff9db07b7e32e56a0c77f1afe Mon Sep 17 00:00:00 2001
From: William Lallemand
Date: Fri, 26 May 2017 18:19:55 +0200
Subject: [PATCH 10/10] MEDIUM: proxy: zombify proxies only when the expose-fd socket is bound

When HAProxy is running with multiple processes and some listeners are bound to processes, the unused sockets were not closed in the other processes. The aim was to be able to send those listening sockets using the -x option.
However to ensure the previous behavior which was to close those sockets, we provided the "no-unused-socket" global option. This patch changes this behavior, it will close unused sockets which are not in the same process as an expose-fd socket, making the "no-unused-socket" option useless. The "no-unused-socket" option was removed in this patch. (cherry picked from commit 7f80eb2383bb54ddafecf0e7df6b3b3ef4b4f6e5) Signed-off-by: Dave Chiluk --- doc/configuration.txt | 7 --- src/cfgparse.c| 5 - src/haproxy.c | 19 ++- 3 files changed, 18 insertions(+), 13 deletions(-) diff --git a/doc/configuration.txt b/doc/configuration.txt index 9bb9cb9..980b253 100644 --- a/doc/configuration.txt +++ b/doc/configuration.txt @@ -587,7 +587,6 @@ The following keywords are supported in the "global" section : - nosplice - nogetaddrinfo - noreuseport - - no-unused-socket - spread-checks - server-state-base - server-state-file @@ -1250,12 +1249,6 @@ noreuseport Disables the use of SO_REUSEPORT - see socket(7). It is equivalent to the command line argument "-dR". -no-unused-socket - By default, each haproxy process keeps all sockets opened, event those that - are only used by another processes, so that any process can provide all the - sockets, to make reloads seamless. This option disables this, and close all - unused sockets, to save some file descriptors. 
- spread-checks <0..50, in percent> Sometimes it is desirable to avoid sending agent and health checks to servers at exact intervals, for instance when many logical servers are diff --git a/src/cfgparse.c b/src/cfgparse.c index be21088..8c0906b 100644 --- a/src/cfgparse.c +++ b/src/cfgparse.c @@ -671,11 +671,6 @@ int cfg_parse_global(const char *file, int linenum, char **args, int kwm) goto out; global.tune.options &= ~GTUNE_USE_REUSEPORT; } - else if (!strcmp(args[0], "no-unused-socket")) { - if (alertif_too_many_args(0, file, linenum, args, &err_code)) - goto out; - global.tune.options &= ~GTUNE_SOCKET_TRANSFER; - } else if (!strcmp(args[0], "quiet")) { if (alertif_too_many_args(0, file, linenum, args, &err_code)) goto out; diff --git a/src/haproxy.c b/src/haproxy.c index 2091573..f7605e0 100644 --- a/src/haproxy.c +++ b/src/haproxy.c @@ -975,7 +975,6 @@ void init(int argc, char **argv) #if defined(SO_REUSEPORT) global.tune.options |= GTUNE_USE_REUSEPORT; #endif - global.tune.options |= GTUNE_SOCKET_TRANSFER; pid = getpid(); progname = *argv; @@ -2306,6 +2305,24 @@ int main(int argc, char **argv) exit(0); /* parent must leave */ } + /* pass through every cli socket, and check if it's bound to + * the
Re: remaining process after (seamless) reload
We've battled the same issue with our haproxies. We root-caused it to slow DNS lookups while parsing the config: they made config parsing take so long that we were attempting to reload again before the original reload had completed. I'm still not sure why or where the signals to the old haproxy are getting dropped, but we found that by installing a DNS cache on our haproxy nodes we were able to greatly decrease the likelihood of creating zombie haproxy instances. We further improved on that by rearchitecting our micro-service architecture to make use of the haproxy dynamic scaling APIs, allocating dummy slots for future expansion, similar to https://www.haproxy.com/blog/dynamic-scaling-for-microservices-with-runtime-api/ . Good luck, I hope that's the answer to your problem. Dave.

On Tue, May 29, 2018 at 10:12 AM William Dauchy wrote: > Hello William, > > Sorry for the last answer. > > > Are the problematical workers leaving when you reload a second time? > > no, they seems to stay for a long time (forever?) > > > Did you try to kill -USR1 the worker ? It should exits with "Former worker $PID > > exited with code 0" on stderr. > > If not, could you check the Sig* lines in /proc/$PID/status for this worker? > > will try. I need to put the setup back in shape, and maybe test > without multiple binding. > > > Do you know how much time take haproxy to load its configuration, and do you > > think you tried a reload before it finished to parse and load the config? > > Type=notify in your systemd unit file should help for this case. If I remember > > well it checks that the service is 'ready' before trying to reload. > > We are using Type=notify. I however cannot guarantee we do not trigger > a new reload, before the previous one is done. Is there a way to check > the "ready" state you mentioned? 
> (We are talking about a reload every 10 seconds maximum though) > > > I suspect the SIGUSR1 signal is not received by the worker, but I'm not > sure > > either if it's the master that didn't send it or if the worker blocked > it. > > good to know. > > Best, > -- > William > >
Re: haproxy startup at boot too quick
Assuming you are running an Ubuntu archive version of haproxy you should consider opening a bug in launchpad as well. https://launchpad.net/ubuntu/+source/haproxy/+filebug It sounds like there's a missing dependency in the unit file against DNS or network, but I haven't looked into it other than what you've mentioned here. Dave. On Mon, May 7, 2018 at 7:57 PM Bill Waggoner wrote: > On Mon, May 7, 2018 at 8:44 PM Kevin Decherf wrote: > >> Hello, >> >> On 8 May 2018 02:32:01 CEST, Bill Waggoner wrote: >> >> >Anyway, when the system boots haproxy fails to start. Unfortunately I >> >forgot to save the systemctl status message but the impression I get is >> >that it's starting too soon. >> >> You can find all past logs of your service using `journalctl -u >> haproxy.service`. If journal persistence is off you'll not be able to look >> at logs sent before the last boot. >> >> >> -- >> Sent from my mobile. Please excuse my brevity. >> > > Thank you, that was very helpful. I am new to systemd so please forgive my > lack of knowledge. > > Looking at the messages it looks like one server was failing to start. > That one happens to have a name instead of a static address in the server > definition. My guess is that DNS isn't available yet when haproxy was > starting and the retries are so quick that it didn't have time to recover. > > I'll simply change that to a literal IP address as all the others are. > > Thanks! > > Bill Waggoner > -- > Bill Waggoner > ad...@greybeard.org > {Even Old Dogs can learn new tricks!} >
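If the missing-dependency theory above holds, a systemd drop-in along these lines would delay haproxy startup until the network (and thus DNS) is likely up. This is a sketch, not the packaged unit: the drop-in path is hypothetical, and whether network-online.target is populated correctly depends on the distro's network manager.

```ini
# /etc/systemd/system/haproxy.service.d/wait-for-network.conf (hypothetical path)
# Delay startup until the system reports the network as online.
[Unit]
After=network-online.target
Wants=network-online.target
```

After adding a drop-in like this, `systemctl daemon-reload` is needed for it to take effect.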
Re: Health Checks not run before attempting to use backend
Well, after having read your thread, that's disappointing. As an alternative to forcing healthchecks before the bind, it would be nice to have an option to initially start all servers in the down state unless explicitly loaded as up via a "show servers state"/"load-server-state-from-file" mechanism. Additionally, in a "seamless reload" configuration as we are using, would it be possible for the new haproxy to complete a healthcheck on backends after it has bound to the socket, but before it has signaled the old haproxy? Or am I missing another gotcha there? Also, we are doing all this using 1.8.7. Thanks, Dave

On Fri, Apr 13, 2018 at 12:35 PM Jonathan Matthews wrote: > On Fri, 13 Apr 2018 at 00:01, Dave Chiluk > wrote: > >> Is there a way to force haproxy to not use a backend until it passes a >> healthcheck? I'm also worried about the side affects this might cause as >> requests start to queue up in the haproxy >> > > I asked about this in 2014 ("Current solutions to the > soft-restart-healthcheck-spread problem?") and I don't recall seeing a fix > since then. Very interested in whatever you find out! > > > J > >> -- > Jonathan Matthews > London, UK > http://www.jpluscplusm.com/contact.html >
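For reference, the server-state mechanism mentioned above looks roughly like this in the configuration. A sketch: the directives are real in haproxy 1.6+, but the paths and the socat invocation are assumptions, and this only helps once a previous run's state has been dumped — which is exactly why an "initially down" default would be useful.

```
# Sketch: reuse the previous process's server states across a reload.
# Paths and socket name are made up for illustration.
global
    server-state-file /var/lib/haproxy/server-state

defaults
    load-server-state-from-file global

# Before reloading, dump the running state, e.g.:
#   echo "show servers state" | socat stdio /var/run/haproxy-admin.sock \
#       > /var/lib/haproxy/server-state
```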
Health Checks not run before attempting to use backend
Hi, we're evaluating haproxy for use as the load balancer in front of our Mesos cluster. What we are finding is that even though we have requested the check option in the server line, haproxy attempts to serve traffic to the server on startup, until the first healthcheck completes:

server slot1 10.40.40.2:7070 check inter 1000 rise 3 fall 3 maxconn 32

This is because we are adding servers to haproxy as they are started in Mesos, but before our backend application itself is ready to serve connections. This results in spurious 503s being handed to clients as we add backends via the admin socket or restart haproxy. I looked into possibly forcing a healthcheck during the cfgparse constructors, but that seems like it would require some significant rearchitecting. Is there a way to force haproxy to not use a backend until it passes a healthcheck? I'm also worried about the side effects this might cause as requests start to queue up in haproxy. Thanks, Dave
Seamless reloads and init scripts, and nbproc > 1
I'm trying to write what amounts to an init/startup script for haproxy with a patched version of 1.7.8 that includes the seamless reload patches described in this blog post: https://www.haproxy.com/blog/truly-seamless-reloads-with-haproxy-no-more-hacks/

#1. If haproxy dies or was killed for some reason, the stats socket still exists, and when you try to relaunch haproxy with the -x option you get:

[ALERT] 249/165956 (2750) : Failed to get the sockets from the old process!

It's not impossible, but it's pretty messy to determine whether the stats socket has a valid old process listening on it when trying to relaunch/reload haproxy. Is there a solution for this that I'm not seeing? Otherwise, when you first launch haproxy you have to do so without -x, and later have to conditionally include it and then check whether you succeeded. Here's an excerpt from a bash init script as an example of the pain I'm going through:

unset RELOADSOCK
if [ -e "${STATSFILE}" ] ; then
    RELOADSOCK="-x ${STATSFILE}"
    sudo -u haproxy -g haproxy haproxy -f $HAPROXY_CONFIG_FILE $RELOADSOCK -p $HAPROXY_PID_FILE -sf $(cat $HAPROXY_PID_FILE)
    if [ $? -eq 1 ] ; then
        # We likely had difficulty reading the stats file. Delete it and run normally.
        rm ${STATSFILE}
        sudo -u haproxy -g haproxy haproxy -f $HAPROXY_CONFIG_FILE -p $HAPROXY_PID_FILE -sf $(cat $HAPROXY_PID_FILE)
    fi
else
    sudo -u haproxy -g haproxy haproxy -f $HAPROXY_CONFIG_FILE $RELOADSOCK -p $HAPROXY_PID_FILE -sf $(cat $HAPROXY_PID_FILE)
fi

Other than that, I have seen no ill effects yet when using -x for socket passing, and I can confirm that it has resolved some disconnects. Thanks, Dave. p.s. The above script is not for Ubuntu, but for my day job.
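One way to make the "pass -x or not" decision less messy is to actively probe the stats socket before using it. A sketch: probe_stats_socket is a hypothetical helper, socat is an assumed dependency, and the socket path below is a demo value that won't exist on most systems, so the probe fails and -x is omitted.

```shell
# Hypothetical helper: succeed only if a live process answers on the socket.
probe_stats_socket() {
    [ -S "$1" ] || return 1    # not a socket at all: stale file or absent
    echo "show info" | socat -t 1 stdio "unix-connect:$1" >/dev/null 2>&1
}

STATSFILE=/tmp/demo-haproxy-stats.sock    # demo path, assumed absent
RELOADSOCK=""
if probe_stats_socket "$STATSFILE"; then
    RELOADSOCK="-x ${STATSFILE}"          # only pass -x when someone is listening
fi
echo "RELOADSOCK='${RELOADSOCK}'"
```

With a probe like this, the launch path collapses to a single haproxy invocation using $RELOADSOCK, empty or not.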