ERR 20: Auth Rejected Credentials (client should begin new session)

2018-04-16 Thread TomK

Hey All,

Wondering if anyone has seen this message in a tcpdump of a simple mount 
session:


psql01: mount nfs-c01:/n /m

Yields this message

ERR 20: Auth Rejected Credentials (client should begin new session)

and the mount attempt never exits and never mounts /m.  nfs-c01 is a 
VIP that's serviced by HAProxy / keepalived.  Using one of the 
underlying hosts directly works and the mount succeeds.


--
Cheers,
Tom K.
-

Living on earth is expensive, but it includes a free trip around the sun.




Re: [PATCH] BUG/MINOR: cli: Ensure appctx->ctx.cli.err is always set when using CLI_ST_PRINT_FREE

2018-04-16 Thread Willy Tarreau
On Mon, Apr 16, 2018 at 07:19:15PM +0200, Aurélien Nephtali wrote:
> Hello Willy (not being rude this time :p),

Great, now applied, thank you!

Willy
PS: you were not rude (or I didn't sense it at least)



Re: [PATCH] BUG/MINOR: cli: Ensure appctx->ctx.cli.err is always set when using CLI_ST_PRINT_FREE

2018-04-16 Thread Aurélien Nephtali
Hello Willy (not being rude this time :p),

On Mon, Apr 16, 2018 at 05:01:18PM +0200, Willy Tarreau wrote:
> I agree on the principle, but memprintf(&err, "foo") will set err to NULL
> if there's no more memory. And I personally care a lot about staying rock
> solid even under harsh memory conditions, because it's always when you have
> the most visitors on your site that you have the least memory left and you
> don't want so many witnesses of your lack of RAM. That's why I'm thinking
> that the "out of memory" error message could more or less serve as a real
> indicator of what happened and as a motivation for developers never to use
> it by default (while "internal error" could be tempting on a lazy day).
> 

Here are the two patches with the changes you proposed.

Thanks !

-- 
Aurélien.
From 29719bded4ff2b96cb4ac258373c01f0be18428b Mon Sep 17 00:00:00 2001
From: Aurélien Nephtali 
Date: Mon, 16 Apr 2018 18:50:19 +0200
Subject: [PATCH 1/2] BUG/MINOR: cli: Guard against NULL messages when using
 CLI_ST_PRINT_FREE
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Some error paths (especially those followed when running out of memory)
can set the error message to NULL. In order to avoid a crash, use a
generic message ("Out of memory") when this case arises.

It should be backported to 1.8.

Signed-off-by: Aurélien Nephtali 
---
 src/cli.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/cli.c b/src/cli.c
index 018d508d3..965709ec8 100644
--- a/src/cli.c
+++ b/src/cli.c
@@ -625,14 +625,20 @@ static void cli_io_handler(struct appctx *appctx)
 else
 	si_applet_cant_put(si);
 break;
-			case CLI_ST_PRINT_FREE:
-if (cli_output_msg(res, appctx->ctx.cli.err, LOG_ERR, cli_get_severity_output(appctx)) != -1) {
+			case CLI_ST_PRINT_FREE: {
+const char *msg = appctx->ctx.cli.err;
+
+if (!msg)
+	msg = "Out of memory.\n";
+
+if (cli_output_msg(res, msg, LOG_ERR, cli_get_severity_output(appctx)) != -1) {
 	free(appctx->ctx.cli.err);
 	appctx->st0 = CLI_ST_PROMPT;
 }
 else
 	si_applet_cant_put(si);
 break;
+			}
 			case CLI_ST_CALLBACK: /* use custom pointer */
 if (appctx->io_handler)
 	if (appctx->io_handler(appctx)) {
-- 
2.11.0

From a9b9825d3b6257ccd42d2b56827e23fa91d4768c Mon Sep 17 00:00:00 2001
From: Aurélien Nephtali 
Date: Mon, 16 Apr 2018 19:02:42 +0200
Subject: [PATCH 2/2] MINOR: cli: Ensure the CLI always outputs an error when
 it should
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When using the CLI_ST_PRINT_FREE state, always output something back
if the faulty function did not fill the 'err' variable.
The map/acl code could lead to a crash whereas the SSL code was silently
failing.

Signed-off-by: Aurélien Nephtali 
---
 src/map.c  | 38 ++++++++++++++++++++++++++++----------
 src/ssl_sock.c |  5 +++++
 2 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/src/map.c b/src/map.c
index 9313dc87e..7953c2a0b 100644
--- a/src/map.c
+++ b/src/map.c
@@ -723,15 +723,21 @@ static int cli_parse_set_map(char **args, struct appctx *appctx, void *private)
 return 1;
 			}
 
-			/* Try to delete the entry. */
+			/* Try to modify the entry. */
 			err = NULL;
 			HA_SPIN_LOCK(PATREF_LOCK, &appctx->ctx.map.ref->lock);
 			if (!pat_ref_set_by_id(appctx->ctx.map.ref, ref, args[4], &err)) {
 HA_SPIN_UNLOCK(PATREF_LOCK, &appctx->ctx.map.ref->lock);
-if (err)
+if (err) {
 	memprintf(&err, "%s.\n", err);
-appctx->ctx.cli.err = err;
-appctx->st0 = CLI_ST_PRINT_FREE;
+	appctx->ctx.cli.err = err;
+	appctx->st0 = CLI_ST_PRINT_FREE;
+}
+else {
+	appctx->ctx.cli.severity = LOG_ERR;
+	appctx->ctx.cli.msg = "Failed to update an entry.\n";
+	appctx->st0 = CLI_ST_PRINT;
+}
 return 1;
 			}
 			HA_SPIN_UNLOCK(PATREF_LOCK, &appctx->ctx.map.ref->lock);
@@ -744,10 +750,16 @@ static int cli_parse_set_map(char **args, struct appctx *appctx, void *private)
 			HA_SPIN_LOCK(PATREF_LOCK, &appctx->ctx.map.ref->lock);
 			if (!pat_ref_set(appctx->ctx.map.ref, args[3], args[4], &err)) {
 HA_SPIN_UNLOCK(PATREF_LOCK, &appctx->ctx.map.ref->lock);
-if (err)
+if (err) {
 	memprintf(&err, "%s.\n", err);
-appctx->ctx.cli.err = err;
-appctx->st0 = CLI_ST_PRINT_FREE;
+	appctx->ctx.cli.err = err;
+	appctx->st0 = CLI_ST_PRINT_FREE;
+}
+else {
+	appctx->ctx.cli.severity = LOG_ERR;
+	appctx->ctx.cli.msg = "Failed to update an entry.\n";
+	appctx->st0 = CLI_ST_PRINT;
+}
 return 1;
 			}
 			HA_SPIN_UNLOCK(PATREF_LOCK, &appctx->ctx.map.ref->lock);
@@ -829,10 +841,16 @@ static int cli_parse_add_map(char **args, struct appctx *appctx, void *private)
 			ret = pat_ref_add(appctx->ctx.map.ref, args[3], NULL, &err);
 		HA_SPIN_UNLOCK(PATREF_LOCK, &appctx->ctx.map.ref->lock);

Re: Version 1.5.12, getting 502 when server check fails, but server is still working

2018-04-16 Thread Lukas Tribus
Hello Shawn,



On 16 April 2018 at 17:39, Shawn Heisey  wrote:
> I enabled the admin socket so that I could renew OCSP stapling. As far as I
> understand, it can only be used on the load balancer machine itself, and I
> think this is the only way to renew stapling other than restarting the
> program, which isn't something I want to do.
>
> As for the possible security issue: If somebody were to compromise the back
> end server and the back end server had knowledge about the load balancer

Why would the backend need to have any knowledge about the
load-balancer? You'd adjust your workflow and command the switch from
the load-balancer instead of your backend application, that's it. Your
backend does not need to access the load-balancer in any way.



> then the attacker might have enough information to fiddle with the load
> balancer for *other* things the load balancer is handling that are more
> sensitive.
>
>> I think your original issue may be due to the "retries 1"
>> configuration you have in there. I would recommend removing that.
>
>
> The documentation for 1.5 says the default value for retries is 3.  Wouldn't
> removing it make whatever problems a retry causes *worse*?  If retries are
> bad, then perhaps I should set it to 0.  I have no recollection about why I
> have this setting in the config.  The default/global settings were created
> years ago and don't change much.

Retries are a good thing and the default of 3 is a good value.
Changing this to a non-default value will have an impact, especially in
cases where the server is going down or is about to go down.

By removing the "retries" configuration altogether you are using the
default value of 3, which is the recommended configuration.



Regards,
Lukas



Re: 1.8.7 http-tunnel doesn't seem to work? (but default http-keep-alive does)

2018-04-16 Thread Olivier Houchard
Hi,

On Mon, Apr 16, 2018 at 03:37:34PM +0200, Olivier Houchard wrote:
> Hi Pieter,
> 
> On Fri, Apr 13, 2018 at 06:50:50AM +, Pi Ba wrote:
> > Using poll (startup with -dk) the request works properly.
> 
> After some discussion with Willy, we came up with a solution that may fix your
> problem with kqueue.
> Can you test the attached patch and let me know if it fixes it for you ?
> 
> Thanks !
> 
> Olivier


Minor variation of the patch, that uses EV_RECEIPT if available, to avoid
scanning needlessly the kqueue.

Regards,

Olivier
From 2229159329ec539c7875943af08c539064dcd76b Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Mon, 16 Apr 2018 13:24:48 +0200
Subject: [PATCH] BUG/MEDIUM: When adding new events, provide an output to get
 errors.

When adding new events using kevent(), if there's an error, because we're
trying to delete an event that wasn't there, or because the fd has already
been closed, kevent() will either add an event in the eventlist array if
there's enough room for it, and keep on handling other events, or stop and
return -1.
We want it to process all the events, so give it a large-enough array to
store any error.

This should be backported to 1.8.
---
 src/ev_kqueue.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/src/ev_kqueue.c b/src/ev_kqueue.c
index a103ece9d..4306c4372 100644
--- a/src/ev_kqueue.c
+++ b/src/ev_kqueue.c
@@ -31,6 +31,7 @@
 /* private data */
 static int kqueue_fd[MAX_THREADS]; // per-thread kqueue_fd
 static THREAD_LOCAL struct kevent *kev = NULL;
+static struct kevent *kev_out = NULL; // Trash buffer for kevent() to write the eventlist in
 
 /*
  * kqueue() poller
@@ -43,6 +44,8 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
int updt_idx, en;
int changes = 0;
 
+   timeout.tv_sec  = 0;
+   timeout.tv_nsec = 0;
/* first, scan the update list to find changes */
for (updt_idx = 0; updt_idx < fd_nbupdt; updt_idx++) {
fd = fd_updt[updt_idx];
@@ -81,13 +84,15 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
HA_ATOMIC_OR([fd].polled_mask, tid_bit);
}
}
-   if (changes)
-   kevent(kqueue_fd[tid], kev, changes, NULL, 0, NULL);
+   if (changes) {
+#ifdef EV_RECEIPT
+   kev[0].flags |= EV_RECEIPT;
+#endif
+   kevent(kqueue_fd[tid], kev, changes, kev_out, changes, &timeout);
+   }
fd_nbupdt = 0;
 
delta_ms= 0;
-   timeout.tv_sec  = 0;
-   timeout.tv_nsec = 0;
 
if (!exp) {
delta_ms= MAX_DELAY_MS;
@@ -194,6 +199,10 @@ REGPRM1 static int _do_init(struct poller *p)
 {
p->private = NULL;
 
+   kev_out = calloc(1, sizeof(struct kevent) * 2 * global.maxsock);
+   if (!kev_out)
+   goto fail_alloc;
+
kqueue_fd[tid] = kqueue();
if (kqueue_fd[tid] < 0)
goto fail_fd;
@@ -203,6 +212,9 @@ REGPRM1 static int _do_init(struct poller *p)
return 1;
 
  fail_fd:
+   free(kev_out);
+   kev_out = NULL;
+fail_alloc:
p->pref = 0;
return 0;
 }
@@ -220,6 +232,10 @@ REGPRM1 static void _do_term(struct poller *p)
 
p->private = NULL;
p->pref = 0;
+   if (kev_out) {
+   free(kev_out);
+   kev_out = NULL;
+   }
 }
 
 /*
@@ -250,6 +266,7 @@ REGPRM1 static int _do_fork(struct poller *p)
return 1;
 }
 
+
 /*
  * It is a constructor, which means that it will automatically be called before
  * main(). This is GCC-specific but it works at least since 2.95.
-- 
2.14.3



Re: Version 1.5.12, getting 502 when server check fails, but server is still working

2018-04-16 Thread Shawn Heisey

On 4/16/2018 9:15 AM, Lukas Tribus wrote:

Hello Shawn,

please keep the mailing-list in the loop.


Sorry about that.  Looks like the haproxy list doesn't set a Reply-To 
header directing replies to the list.  Most mailing lists I have dealt 
with do this, so just hitting "reply" does the right thing.  I sometimes 
forget to use the "reply list" option.



I don't follow? Why is using a restricted admin socket a security issue?

You are already exposing the admin socket locally in your
configuration on line 16:
stats socket /etc/haproxy/stats.socket level admin

My suggestion was to use that admin interface to send the "set server" command.


I enabled the admin socket so that I could renew OCSP stapling. As far 
as I understand, it can only be used on the load balancer machine 
itself, and I think this is the only way to renew stapling other than 
restarting the program, which isn't something I want to do.


As for the possible security issue: If somebody were to compromise the 
back end server and the back end server had knowledge about the load 
balancer, then the attacker might have enough information to fiddle with 
the load balancer for *other* things the load balancer is handling that 
are more sensitive.



I think your original issue may be due to the "retries 1"
configuration you have in there. I would recommend removing that.


The documentation for 1.5 says the default value for retries is 3.  
Wouldn't removing it make whatever problems a retry causes *worse*?  If 
retries are bad, then perhaps I should set it to 0.  I have no 
recollection about why I have this setting in the config.  The 
default/global settings were created years ago and don't change much.


Thanks,
Shawn




Re: Version 1.5.12, getting 502 when server check fails, but server is still working

2018-04-16 Thread Lukas Tribus
Hello Shawn,



please keep the mailing-list in the loop.



On 16 April 2018 at 16:53, Shawn Heisey  wrote:
>> Having said that, you'd be better off setting the server to
>> maintenance mode instead of letting the health check fail (via
>> webinterface or stats socket):
>>
>>
>> http://cbonte.github.io/haproxy-dconv/1.5/configuration.html#9.2-set%20server
>
>
> The back end servers don't know anything about the load balancer.  And since
> the load balancer does send them requests from the Internet, I think it
> would be a potential security issue if it was able to affect the load
> balancer -- that load balancer handles a lot more than just this service.

I don't follow? Why is using a restricted admin socket a security issue?

You are already exposing the admin socket locally in your
configuration on line 16:
stats socket /etc/haproxy/stats.socket level admin

My suggestion was to use that admin interface to send the "set server" command.
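For reference, such a command is issued locally on the load balancer host itself. A sketch of what that looks like (the backend/server names here are placeholders, not from the thread; note the exact CLI command set varies by version, and 1.5 also offers the equivalent "disable server"):

```shell
# Placeholder backend/server names -- substitute your own.
BACKEND=be-app
SERVER=srv1

# The CLI command that puts a server into maintenance mode.
CMD="set server ${BACKEND}/${SERVER} state maint"
echo "$CMD"

# On the load balancer itself this would be piped into the admin socket:
#   echo "$CMD" | socat stdio /etc/haproxy/stats.socket
```

Since the socket is a local unix socket with `level admin`, nothing on the backend servers ever needs network access to it.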



> The disable-on-404 setting that Jarno mentioned might do what we need.  I
> will give it a try.  That's very easy to do in my application.

Yes, that may be more elegant depending on the environment, the final
result is the same: to put the server into maintenance mode.



> I have placed a slightly redacted version of my config here:

I think your original issue may be due to the "retries 1"
configuration you have in there. I would recommend removing that.




Regards,
Lukas



Re: [PATCH] BUG/MINOR: cli: Ensure appctx->ctx.cli.err is always set when using CLI_ST_PRINT_FREE

2018-04-16 Thread Aurélien Nephtali
On Mon, Apr 16, 2018 at 4:19 PM, Willy Tarreau  wrote:
> Hi Aurélien,
>
> On Sun, Apr 15, 2018 at 09:58:49AM +0200, Aurélien Nephtali wrote:
>> Hello,
>>
>> Here is a small patch to fix a potential crash when using
>> CLI_ST_PRINT_FREE in an error path in the 'map' code.
>> The problematic part is in the 'add' feature but all other usages have
>> been modified the same way to be consistent.
>
> Interesting one. In fact, while it does provide a friendlier error message
> to the user, the real issue in my opinion is in the cli_io_handler() where
> it handles CLI_ST_PRINT_FREE, where it's not defensive enough against a
> NULL in appctx->ctx.cli.err. And even with your patch this situation can
> arise if an out of memory condition happens in the final memprintf() of
> the map code.
>
> Thus what I'd suggest would be instead to check for NULL there and to fall
> back to a generic "out of memory" error message (if that makes sense, maybe
> other situations may lead to this, I don't know) as a first patch,

This was my first idea, using "Internal error" or something like that
but I had the feeling it was covering some cases that should be
properly handled.
As it's "internal code" I bet on the fact that it should not happen.

From what I saw briefly, all error paths fill 'err', but I may have
overlooked some cases.

> then another one which is just a small improvement to make error messages more
> relevant for map and ocsp (which is exactly what your patch does).
>
> I'm just having a small comment below :
>
>> - if (err)
>> + if (err) {
>>   memprintf(&err, "%s.\n", err);
>> - appctx->ctx.cli.err = err;
>> - appctx->st0 = CLI_ST_PRINT_FREE;
>> +appctx->ctx.cli.err = err;
>> +appctx->st0 = CLI_ST_PRINT_FREE;
>> +}
>> +else {
>> + appctx->ctx.cli.severity = LOG_ERR;
>> + appctx->ctx.cli.msg = "Failed to 
>> update an entry.\n";
>> + appctx->st0 = CLI_ST_PRINT;
>> +}
>>   return 1;
>
> Please be careful above, as you can see, the lines are filled with spaces,
> maybe the code was copy-pasted there (it's the same at other locations).
>

Arg!#@ sorry, I thought I got them all.. My indent rules were reset at
some point and these lines slipped through.
I usually have a rule to show extra tabs when I should use spaces but
not the inverse :).

-- 
Aurélien Nephtali



Re: [PATCH] BUG/MINOR: cli: Ensure appctx->ctx.cli.err is always set when using CLI_ST_PRINT_FREE

2018-04-16 Thread Willy Tarreau
On Mon, Apr 16, 2018 at 04:41:27PM +0200, Aurélien Nephtali wrote:
> On Mon, Apr 16, 2018 at 4:19 PM, Willy Tarreau  wrote:
> > Hi Aurélien,
> >
> > On Sun, Apr 15, 2018 at 09:58:49AM +0200, Aurélien Nephtali wrote:
> >> Hello,
> >>
> >> Here is a small patch to fix a potential crash when using
> >> CLI_ST_PRINT_FREE in an error path in the 'map' code.
> >> The problematic part is in the 'add' feature but all other usages have
> >> been modified the same way to be consistent.
> >
> > Interesting one. In fact, while it does provide a friendlier error message
> > to the user, the real issue in my opinion is in the cli_io_handler() where
> > it handles CLI_ST_PRINT_FREE, where it's not defensive enough against a
> > NULL in appctx->ctx.cli.err. And even with your patch this situation can
> > arise if an out of memory condition happens in the final memprintf() of
> > the map code.
> >
> > Thus what I'd suggest would be instead to check for NULL there and to fall
> > back to a generic "out of memory" error message (if that makes sense, maybe
> > other situations may lead to this, I don't know) as a first patch,
> 
> This was my first idea, using "Internal error" or something like that
> but I had the feeling it was covering some cases that should be
> properly handled.
> As it's "internal code" I bet on the fact that it should not happen.

I agree on the principle, but memprintf(&err, "foo") will set err to NULL
if there's no more memory. And I personally care a lot about staying rock
solid even under harsh memory conditions, because it's always when you have
the most visitors on your site that you have the least memory left and you
don't want so many witnesses of your lack of RAM. That's why I'm thinking
that the "out of memory" error message could more or less serve as a real
indicator of what happened and as a motivation for developers never to use
it by default (while "internal error" could be tempting on a lazy day).

> Arg!#@ sorry, I thought I got them all.. My indent rules were reset at
> some point and these lines slipped through.

No problem :-)

Willy



Re: Version 1.5.12, getting 502 when server check fails, but server is still working

2018-04-16 Thread Shawn Heisey

On 4/16/2018 6:43 AM, Jarno Huuskonen wrote:

There's also http-check disable-on-404
(http://cbonte.github.io/haproxy-dconv/1.5/configuration.html#4.2-http-check%20disable-on-404)

So maybe first set a flag that returns 404 on the health check, and only 
after thirty seconds fail the health check.


This looks really promising, but then I saw this in the documentation 
for that option:


"If the server responds 2xx or 3xx again, it will immediately be 
reinserted into the farm."


Is that referring to a 2xx or 3xx on the health check, or a 2xx/3xx on 
external requests already sent to that server?  If it's the former, then 
there's no problem, but if it's the latter, then that isn't what I want 
at all.  My guess about this is that it's the former, but I'd like 
confirmation.


Thanks,
Shawn




Re: Version 1.5.12, getting 502 when server check fails, but server is still working

2018-04-16 Thread Shawn Heisey

On 4/16/2018 6:43 AM, Jarno Huuskonen wrote:

There's also http-check disable-on-404
(http://cbonte.github.io/haproxy-dconv/1.5/configuration.html#4.2-http-check%20disable-on-404)


I couldn't get this to work at first.  If I put the disable-on-404 
option in the actual back end, it complains like this:


[WARNING] 105/095152 (5379) : config : 'disable-on-404' will be ignored 
for backend 'be-cdn-9000' (requires 'option httpchk').


That makes sense, because I'm using tracking, not actual health checks 
in that back end.  So I moved it to the check back end, and it gave me 
much worse errors:


[ALERT] 105/095234 (7186) : config : backend 'be-cdn-9000', server 
'planet': unable to use chk-cdn-9000/planet for tracking: disable-on-404 
option inconsistency.
[ALERT] 105/095234 (7186) : config : backend 'be-cdn-9000', server 
'hollywood': unable to use chk-cdn-9000/hollywood for tracking: 
disable-on-404 option inconsistency.

[ALERT] 105/095234 (7186) : Fatal errors found in configuration.

Eliminating the "track" config and doing the health checks in the actual 
back end has fixed that.  I need to do some testing to see whether it 
does what I want it to do.


I am curious about why I couldn't use "track".

Thanks,
Shawn




Re: Fix building haproxy 1.8.5 with LibreSSL 2.6.4

2018-04-16 Thread Dmitry Sivachenko

> On 07 Apr 2018, at 17:38, Emmanuel Hocdet  wrote:
> 
> 
> Hi Andy,
> 
>> Le 31 mars 2018 à 16:43, Andy Postnikov  a écrit :
>> 
>> I reworked a previous patch from Alpine Linux to build with the latest
>> stable libressl,
>> but found no way to run tests with openssl, which is the primary library as I see.
>> Is it possible to accept the patch upstream or get a review on it?
>> 
>> 
> 
> 
> @@ -2208,7 +2223,7 @@
> #else
>   cipher = SSL_CIPHER_find(ssl, cipher_suites);
> #endif
> - if (cipher && SSL_CIPHER_get_auth_nid(cipher) == 
> NID_auth_ecdsa) {
> + if (cipher && SSL_CIPHER_is_ECDSA(cipher)) {
>   has_ecdsa = 1;
>   break;
>   }
> 
> No, it’s a regression in lib compatibility.
> 


Hello,

it would be nice if you could come to an acceptable solution and finally 
merge LibreSSL support.
There have been several attempts to propose LibreSSL support in the past, and 
every time the discussion dies with no result.

Thanks :)





Re: Version 1.5.12, getting 502 when server check fails, but server is still working

2018-04-16 Thread Jarno Huuskonen
Hi,

On Mon, Apr 16, Lukas Tribus wrote:
> On 15 April 2018 at 21:53, Shawn Heisey  wrote:
> > I'm working on making my application capable of handling service restarts on
> > the back end with zero loss or interruption.  It runs on two servers behind
> > haproxy.
> >
> > At application shutdown, I'm setting a flag that makes the healthcheck fail,
> > and then keeping the application running for thirty seconds in order to
> > finish up all requests that the server has already received.
> >
> > It seems that when haproxy's health check fails while a request is underway,
> > the machine making the request will be sent a 502 response instead of the
> > good response that the server WILL make. This is probably a good thing for
> > haproxy to do in general, but in this case, I know that my application's
> > shutdown hook *WILL* allow enough time for the request to finish before it
> > forcibly halts the application.
> 
> You'll have to share your entire configuration for us to be able to
> comment on the behavior.
> 
> Having said that, you'd be better off setting the server to
> maintenance mode instead of letting the health check fail (via
> webinterface or stats socket):
> 
> http://cbonte.github.io/haproxy-dconv/1.5/configuration.html#9.2-set%20server

There's also http-check disable-on-404
(http://cbonte.github.io/haproxy-dconv/1.5/configuration.html#4.2-http-check%20disable-on-404)

So maybe first set a flag that returns 404 on the health check, and only 
after thirty seconds fail the health check.
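A minimal sketch of that arrangement (the health-check URL, names, addresses and ports are assumptions, not from the thread):

```
backend be-app
    option httpchk GET /health     # required for disable-on-404
    http-check disable-on-404
    server app1 10.0.0.1:8080 check
    server app2 10.0.0.2:8080 check
```

While a server answers 404 on the check URL it stops receiving new traffic but keeps serving in-flight and persistent connections; once it answers 2xx/3xx again it is reinserted, and letting the check fail outright after the drain window takes it fully down.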

-Jarno

-- 
Jarno Huuskonen



Re: [PATCH][MINOR]: config: Warn if resolvers section has no namerservers configured

2018-04-16 Thread Willy Tarreau
Hi Ben,

On Fri, Apr 13, 2018 at 03:51:17PM -0600, Ben Draut wrote:
> This implements a simple warning for 'resolvers' sections that have no
> nameservers.

Thank you, now merged. However :

> (Also trimmed lines with trailing whitespace in this file.)

Please don't do this, it needlessly inflates the patch, complicates
the review process and possibly makes backports more painful. While
it can sometimes be fine to fix these where you are editing, it's not
much welcome in other places, especially mixed with a feature. As a
rule of thumb, if a patch contains some hunks irrelevant to the patch's
initial purpose, these changes should be dropped.

Thanks,
Willy



Re: 1.8.7 http-tunnel doesn't seem to work? (but default http-keep-alive does)

2018-04-16 Thread Olivier Houchard
Hi Pieter,

On Fri, Apr 13, 2018 at 06:50:50AM +, Pi Ba wrote:
> Using poll (startup with -dk) the request works properly.

After some discussion with Willy, we came up with a solution that may fix your
problem with kqueue.
Can you test the attached patch and let me know if it fixes it for you ?

Thanks !

Olivier
From 3c0a505e5f163989239ffb5267ddf7c1ed549fb9 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Mon, 16 Apr 2018 13:24:48 +0200
Subject: [PATCH] BUG/MEDIUM: When adding new events, provide an output to get
 errors.

When adding new events using kevent(), if there's an error, because we're
trying to delete an event that wasn't there, or because the fd has already
been closed, kevent() will either add an event in the eventlist array if
there's enough room for it, and keep on handling other events, or stop and
return -1.
We want it to process all the events, so give it a large-enough array to
store any error.

This should be backported to 1.8.
---
 src/ev_kqueue.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/src/ev_kqueue.c b/src/ev_kqueue.c
index a103ece9d..faf3ba2c2 100644
--- a/src/ev_kqueue.c
+++ b/src/ev_kqueue.c
@@ -31,6 +31,7 @@
 /* private data */
 static int kqueue_fd[MAX_THREADS]; // per-thread kqueue_fd
 static THREAD_LOCAL struct kevent *kev = NULL;
+static struct kevent *kev_out = NULL; // Trash buffer for kevent() to write the eventlist in
 
 /*
  * kqueue() poller
@@ -43,6 +44,8 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
int updt_idx, en;
int changes = 0;
 
+   timeout.tv_sec  = 0;
+   timeout.tv_nsec = 0;
/* first, scan the update list to find changes */
for (updt_idx = 0; updt_idx < fd_nbupdt; updt_idx++) {
fd = fd_updt[updt_idx];
@@ -82,12 +85,10 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
}
}
if (changes)
-   kevent(kqueue_fd[tid], kev, changes, NULL, 0, NULL);
+   kevent(kqueue_fd[tid], kev, changes, kev_out, changes, &timeout);
fd_nbupdt = 0;
 
delta_ms= 0;
-   timeout.tv_sec  = 0;
-   timeout.tv_nsec = 0;
 
if (!exp) {
delta_ms= MAX_DELAY_MS;
@@ -194,6 +195,10 @@ REGPRM1 static int _do_init(struct poller *p)
 {
p->private = NULL;
 
+   kev_out = calloc(1, sizeof(struct kevent) * 2 * global.maxsock);
+   if (!kev_out)
+   goto fail_alloc;
+
kqueue_fd[tid] = kqueue();
if (kqueue_fd[tid] < 0)
goto fail_fd;
@@ -203,6 +208,9 @@ REGPRM1 static int _do_init(struct poller *p)
return 1;
 
  fail_fd:
+   free(kev_out);
+   kev_out = NULL;
+fail_alloc:
p->pref = 0;
return 0;
 }
@@ -220,6 +228,10 @@ REGPRM1 static void _do_term(struct poller *p)
 
p->private = NULL;
p->pref = 0;
+   if (kev_out) {
+   free(kev_out);
+   kev_out = NULL;
+   }
 }
 
 /*
-- 
2.16.3



Re: MINOR: proxy: Add fe_defbe fetcher

2018-04-16 Thread Willy Tarreau
Hi Marcin,

On Fri, Apr 13, 2018 at 03:41:18PM +0200, Marcin Deranek wrote:
> Hi,
> 
> A new fetcher which adds the ability to retrieve the default backend name
> for a frontend. Should cleanly apply to both the 1.8 & 1.9 branches.

Now merged, thank you!

Willy



Re: Version 1.5.12, getting 502 when server check fails, but server is still working

2018-04-16 Thread Willy Tarreau
On Mon, Apr 16, 2018 at 10:03:44AM -0600, Shawn Heisey wrote:
> I am curious about why I couldn't use "track".

"track" means that your current server will always be in the same state
as the designated one. It will never run its own checks, and will receive
notifications from the other one's state change events.

So you can simply not have any check-specific stuff on a server tracking
another one. However if you use disable-on-404 on the tracked one, the
tracking one will obviously adapt.
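In configuration terms, the split Shawn describes would look roughly like this (names follow the thread; addresses, ports and the check URL are assumptions — and note that Shawn reports further down that 1.5.12 rejects this exact combination with a "disable-on-404 option inconsistency" error):

```
# Dedicated check backend: runs the actual health checks.
backend chk-cdn-9000
    option httpchk GET /health
    http-check disable-on-404
    server planet 10.0.0.1:9000 check
    server hollywood 10.0.0.2:9000 check

# Load-balancing backend: no checks of its own, mirrors the tracked states.
backend be-cdn-9000
    server planet 10.0.0.1:9000 track chk-cdn-9000/planet
    server hollywood 10.0.0.2:9000 track chk-cdn-9000/hollywood
```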

Willy



Re: 1.8.7 http-tunnel doesn't seem to work? (but default http-keep-alive does)

2018-04-16 Thread PiBa-NL

Hi Olivier,

Op 16-4-2018 om 17:09 schreef Olivier Houchard:

After some discussion with Willy, we came up with a solution that may fix your
problem with kqueue.
Can you test the attached patch and let me know if it fixes it for you ?

Minor variation of the patch, that uses EV_RECEIPT if available, to avoid
scanning needlessly the kqueue.

Regards,

Olivier


Thanks, the patch solves the issue I experienced, at least for the 
test case that I had. (And it doesn't seem to cause obvious new issues 
that I could quickly spot.) Both with and without EV_RECEIPT on kev[0] 
it seems to work the same for my test case.


Just a few thoughts though:
Now only the first kev[0] gets the EV_RECEIPT flag; shouldn't it be 
added to all items in the array? Sometimes 3 changes are sent and only 
2 'results' are reported back. If I read right, EV_RECEIPT should 
'force' a result for each change sent. Also, is there a reason you put 
it inside an '#ifdef'? It seems to me a hard requirement not to read 
any possible pending events when sending the list of updated filters 
at that moment. Or perhaps it's possible to call kevent only once, 
both sending changes and receiving new events in one big go, without 
the RECEIPT flag?


There are now more 'changes' sent than required, which kqueue has to 
disregard with an error flag. Doesn't that (slightly) affect 
performance? Or would checking a bitmask beforehand not be cheaper 
than what kevent itself needs to do to ignore an item and report an 
error for some of the changes? I've not tried to measure this, but 
technically I think there will be a few more CPU operations needed 
overall this way.


Regards,

PiBa-NL (Pieter)




Re: Version 1.5.12, getting 502 when server check fails, but server is still working

2018-04-16 Thread Shawn Heisey
On 4/16/2018 1:46 PM, Willy Tarreau wrote:
> On Mon, Apr 16, 2018 at 10:03:44AM -0600, Shawn Heisey wrote:
>> I am curious about why I couldn't use "track".
> "track" means that your current server will always be in the same state
> as the designated one. It will never run its own checks, and will receive
> notifications from the other one's state change events.
>
> So you can simply not have any check-specific stuff on a server tracking
> another one. However if you use disable-on-404 on the tracked one, the
> tracking one will obviously adapt.

Thanks to you and everyone else who has replied.

That didn't work.  I tried to use disable-on-404 on the tracked
backend.  I got the fatal configuration error I mentioned before:

[ALERT] 105/095234 (7186) : config : backend 'be-cdn-9000', server
'planet': unable to use chk-cdn-9000/planet for tracking: disable-on-404
option inconsistency.
[ALERT] 105/095234 (7186) : config : backend 'be-cdn-9000', server
'hollywood': unable to use chk-cdn-9000/hollywood for tracking:
disable-on-404 option inconsistency.
[ALERT] 105/095234 (7186) : Fatal errors found in configuration.

I also tried it with that option in both backend configurations.  That
didn't work either.  I don't recall the error, but it was probably the
same as one of the other errors I had gotten before.

This is on 1.5.12, and I can't really blame you if the "standard" mental
map you keep of the project doesn't include that version.  It's got to
be hard enough keeping that straight for just 1.8 and 1.9-dev!  Maybe
the error I encountered would be solved by upgrading.  Upgrading is on
the (really long) todo list.

I do have a config that works.  I'm no longer tracking another backend,
but doing the health checks in the load-balancing backend.  The whole
reason I had migrated server checks to dedicated back ends was because I
wanted to reduce the number of check requests being sent, and I'm
sharing the check backends with multiple balancing backends in some
cases.  For the one I've been describing, I don't need to share the
check backend.
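For what it's worth, here is a hedged sketch of the tracked-check layout being attempted, using the backend and server names from the alerts above (addresses, ports and the check URL are placeholders I made up). Per Willy's earlier reply, tracking servers run no checks of their own, and on 1.5 the disable-on-404 setting apparently has to be consistent between the two backends, which is what the alert complains about:

```
backend chk-cdn-9000
    option httpchk GET /lbstatus
    http-check disable-on-404
    server planet    10.0.0.1:9000 check inter 2s
    server hollywood 10.0.0.2:9000 check inter 2s

backend be-cdn-9000
    # no "check" keyword here: state is inherited from the tracked servers
    server planet    10.0.0.1:9000 track chk-cdn-9000/planet
    server hollywood 10.0.0.2:9000 track chk-cdn-9000/hollywood
```

Whether this exact shape parses on 1.5.12 is precisely what is in question here; it is the layout the documentation suggests, not a config I have verified on that version.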

I ran into other problems on the application side with how process
shutdowns work, but resolved those by adding an endpoint into my app
with the URL path of "/lbdisable" and handling the disable/pause in the
init script instead of the application.  I can now restart my custom
application at will without any loss, and without a client even noticing
there was a problem.
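The drain pattern described above can be sketched minimally in code. This is an illustrative mock, not Shawn's actual application: the /lbdisable endpoint name comes from the mail, but the class, method names and return values are assumptions.

```python
# Minimal sketch of a drain flag: /lbdisable flips the flag so the
# health check starts failing, while in-flight requests keep being
# served until the init script finally stops the process.
import threading

class DrainState:
    def __init__(self):
        self._draining = threading.Event()

    def lbdisable(self):
        # Hit by the init script before it restarts the service.
        self._draining.set()

    def healthcheck(self):
        # HAProxy's HTTP check polls this; a 503 fails the check so
        # no new requests are routed here while existing ones finish.
        if self._draining.is_set():
            return 503, "DRAINING"
        return 200, "OK"

state = DrainState()
print(state.healthcheck())  # (200, 'OK')
state.lbdisable()
print(state.healthcheck())  # (503, 'DRAINING')
```

The init script then calls /lbdisable, waits long enough for HAProxy to mark the server down and for in-flight requests to drain, restarts the process, and lets the health check bring the server back up.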

As of a little while ago, I have solved all the problems I encountered
on the road to graceful application restarts except the one where a
backup server is not promoted to active as soon as the primary servers
are all down.  I described that issue in a separate message to the
list.  I do have a workaround to that issue -- I'm no longer using
"backup" on any server entries for this service.

Thanks,
Shawn




Re: Question regarding haproxy backend behaviour

2018-04-16 Thread Ayush Goyal
Hi Moemen,

Thanks for your response. But I think I need to clarify a few things here.

On Mon, Apr 16, 2018 at 4:33 AM Moemen MHEDHBI  wrote:

> Hi
>
> On 12/04/2018 19:16, Ayush Goyal wrote:
>
> Hi,
>
> I have a question regarding haproxy backend connection behaviour. We have
> following setup:
>
>   +---------+      +-------+
>   | haproxy |----->| nginx |
>   +---------+      +-------+
>
> We use a haproxy cluster for ssl off-loading and then load balance request
> to
> nginx cluster. We are currently benchmarking this setup with 3 nodes for
> haproxy
> cluster and 1 nginx node. Each haproxy node has two frontend/backend pair.
> First
> frontend is a router for ssl connection which redistributes request to the
>  second
> frontend in the haproxy cluster. The second frontend is for ssl handshake
> and
> routing requests to nginx servers. Our configuration is as follows:
>
> ```
> global
> maxconn 10
> user haproxy
> group haproxy
> nbproc 2
> cpu-map 1 1
> cpu-map 2 2
>
> defaults
> mode http
> option forwardfor
> timeout connect 5s
> timeout client 30s
> timeout server 30s
> timeout tunnel 30m
> timeout client-fin 5s
>
> frontend ssl_sess_id_router
> bind *:443
> bind-process 1
> mode tcp
> maxconn 10
> log global
> option tcp-smart-accept
> option splice-request
> option splice-response
> default_backend ssl_sess_id_router_backend
>
> backend ssl_sess_id_router_backend
> bind-process 1
> mode tcp
> fullconn 5
> balance roundrobin
> ..
> option tcp-smart-connect
> server lbtest01 :8443 weight 1 check send-proxy
> server lbtest02 :8443 weight 1 check send-proxy
> server lbtest03 :8443 weight 1 check send-proxy
>
> frontend nginx_ssl_fe
> bind *:8443 ssl 
> maxconn 10
> bind-process 2
> option tcp-smart-accept
> option splice-request
> option splice-response
> option forwardfor
> reqadd X-Forwarded-Proto:\ https
> timeout client-fin 5s
> timeout http-request 8s
> timeout http-keep-alive 30s
> default_backend nginx_backend
>
> backend nginx_backend
> bind-process 2
> balance roundrobin
> http-reuse safe
> option tcp-smart-connect
> option splice-request
> option splice-response
> timeout tunnel 30m
> timeout http-request 8s
> timeout http-keep-alive 30s
> server testnginx :80  weight 1 check
> ```
>
> The nginx node has nginx with 4 workers and 8192 max clients, therefore
> the max
> number of connection it can accept is 32768.
>
> For benchmark, we are generating ~3k new connections per second where each
> connection makes 1 http request and then holds the connection for next 30
> seconds. This results in a high established connection on the first
> frontend,
> ssl_sess_id_router, ~25k per haproxy node (Total ~77k connections on 3
> haproxy
> nodes). The second frontend (nginx_ssl_fe) receives the same number of
> connection on the frontend. On nginx node, we see that active connections
> increase to ~32k.
>
> Our understanding is that haproxy should keep a 1:1 connection mapping for
> each
> new connection in frontend/backend. But there is a connection count
> mismatch
> between haproxy and nginx (Total 77k connections in all 3 haproxy for both
> frontends vs 32k connections in nginx made by nginx_backend), We are still
> not
> facing any major 5xx or connection errors. We are assuming that this is
> happening because haproxy is terminating old idle ssl connections to serve
> the
> new ones. We have following questions:
>
> 1. How the nginx_backend connections are being terminated to serve the new
> connections?
>
> Connections are usually terminated when the client receives the whole
> response. Closing the connection can be initiated by the client, server or
> HAProxy (timeouts, etc..)
>

Client connections are keep-alive here for 30 seconds from client side.
Various timeout values in both nginx and haproxy are sufficiently high, of
the order of 60 seconds. Still what we are observing here is that nginx is
closing the connection after 7-14 seconds to serve new client requests. Not
sure why nginx or haproxy will close existing keep-alive connections to
serve new requests when timeouts are sufficiently high?

> 2. Why haproxy is not terminating connections on the frontend to keep it
> them at 32k
> for 1:1 mapping?
>
> I think there is no 1:1 mapping between the number of connections in
> haproxy and nginx. This is because you are chaining the two front/back pairs
> in haproxy, so when the client establishes 1 connctions with haproxy you
> will see 2 established connections in haproxy stats. This explains why the
> number of connections in haproxy is the double of the ones in nginx.
>

I want to 

[PATCH] BUG/MINOR: http: Return an error in proxy mode when url2sa fails

2018-04-16 Thread Christopher Faulet

Hi,

Here is a patch fixing an old bug in proxy mode, when you mix valid 
requests (using an IP) with invalid ones (with a domain name for instance).


With following configuration:

  listen test
  mode http
  bind *:
  option http_proxy

try to do:

  $> printf "GET http://pouet.com/ HTTP/1.0\r\n\r\n" | nc 127.0.0.1 

You will have a 503 returned by HAProxy, because there is no DNS 
resolution, as stated in the documentation. Then try:


  $> printf "GET http://51.15.8.218 HTTP/1.0\r\nHost: 
www.haproxy.org\r\n\r\n" | nc 127.0.0.1 


You will get a haproxy.org homepage. Then retry the first request. It 
should return the same error...


  $> printf "GET http://pouet.com/ HTTP/1.0\r\n\r\n" | nc 127.0.0.1 

... But, instead you will have a 404 from haproxy.org (because there is 
no Host header).


With the patches, HAProxy always return an error 400 in this case.

--
Christopher Faulet
>From d1e5d8cb24ad0e71706cf5d4472142a4e048a7f1 Mon Sep 17 00:00:00 2001
From: Christopher Faulet 
Date: Fri, 13 Apr 2018 15:53:12 +0200
Subject: [PATCH] BUG/MINOR: http: Return an error in proxy mode when url2sa
 fails

In proxy mode, the result of url2sa is never checked. So when the function fails
to resolve the destination server from the URL, we continue. Depending on the
internal state of the connection, we get different behaviours. With a newly
allocated connection, the field  is not set. So we will get a HTTP
error. The status code is 503 instead of 400, but it's not really critical. But,
if it's a recycled connection, we will reuse the previous value of ,
opening a connection on an unexpected server.

To fix the bug, we return an error when url2sa fails.

This patch should be backported in all version from 1.5.
---
 src/proto_http.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/proto_http.c b/src/proto_http.c
index 80e001d69..8370889b4 100644
--- a/src/proto_http.c
+++ b/src/proto_http.c
@@ -3720,9 +3720,11 @@ int http_process_request(struct stream *s, struct channel *req, int an_bit)
 		}
 
 		path = http_get_path(txn);
-		url2sa(req->buf->p + msg->sl.rq.u,
-		   path ? path - (req->buf->p + msg->sl.rq.u) : msg->sl.rq.u_l,
-		   >addr.to, NULL);
+		if (url2sa(req->buf->p + msg->sl.rq.u,
+			   path ? path - (req->buf->p + msg->sl.rq.u) : msg->sl.rq.u_l,
+			   >addr.to, NULL) == -1)
+			goto return_bad_req;
+
 		/* if the path was found, we have to remove everything between
 		 * req->buf->p + msg->sl.rq.u and path (excluded). If it was not
 		 * found, we need to replace from req->buf->p + msg->sl.rq.u for
-- 
2.14.3



Re: [PATCH] BUG/MINOR: http: Return an error in proxy mode when url2sa fails

2018-04-16 Thread Willy Tarreau
On Mon, Apr 16, 2018 at 10:29:11AM +0200, Christopher Faulet wrote:
> Here is a patch fixing an old bug in proxy mode, when you mix valid requests
> (using an IP) with invalid ones (with a domain name for instance).
> 
> With following configuration:
> 
>   listen test
>   mode http
>   bind *:
>   option http_proxy
(...)

Now applied, thank you Christopher.

Willy



Re: [PATCH] BUG/MINOR: cli: Ensure appctx->ctx.cli.err is always set when using CLI_ST_PRINT_FREE

2018-04-16 Thread Willy Tarreau
Hi Aurélien,

On Sun, Apr 15, 2018 at 09:58:49AM +0200, Aurélien Nephtali wrote:
> Hello,
> 
> Here is a small patch to fix a potential crash when using
> CLI_ST_PRINT_FREE in an error path in the 'map' code.
> The problematic part is in the 'add' feature but all other usages have
> been modified the same way to be consistent.

Interesting one. In fact, while it does provide a friendlier error message
to the user, the real issue in my opinion is in the cli_io_handler() where
it handles CLI_ST_PRINT_FREE, where it's not defensive enough against a
NULL in appctx->ctx.cli.err. And even with your patch this situation can
arise if an out of memory condition happens in the final memprintf() of
the map code.

Thus what I'd suggest would be instead to check for NULL there and to fall
back to a generic "out of memory" error message (if that makes sense, maybe
other situations may lead to this, I don't know) as a first patch, then
another one which is just a small improvement to make error messages more
relevant for map and ocsp (which is exactly what your patch does).

I'm just having a small comment below :

> - if (err)
> + if (err) {
>   memprintf(, "%s.\n", err);
> - appctx->ctx.cli.err = err;
> - appctx->st0 = CLI_ST_PRINT_FREE;
> +appctx->ctx.cli.err = err;
> +appctx->st0 = CLI_ST_PRINT_FREE;
> +}
> +else {
> + appctx->ctx.cli.severity = LOG_ERR;
> + appctx->ctx.cli.msg = "Failed to update 
> an entry.\n";
> + appctx->st0 = CLI_ST_PRINT;
> +}
>   return 1;

Please be careful above, as you can see, the lines are filled with spaces,
maybe the code was copy-pasted there (it's the same at other locations). 

Thanks!
Willy



Re: Haproxy 1.8 with OpenSSL 1.1.1-pre4 stops working after 1 hour

2018-04-16 Thread Sander Hoentjen
Reading my email again it looks like somehow I messed up part of it,
retrying:

Hi all,

I built Haproxy (1.8.7) against openssl 1.1.1-pre4, and now after 1 hour
running haproxy stops accepting new SSL connections. When I restart it
works again for almost(?) exactly 1 hour, then stops. Any idea what
might be causing this, or where I should look? Especially the part that
it works for one hour seems weird to me. Next to that, only SSL
connections stop working, the plain ones continue to work. The setup has
one frontend that accepts both http and https, using:
    tcp-request inspect-delay 500ms
    tcp-request content accept if HTTP
    tcp-request content accept if { req.ssl_hello_type 1 }
Maybe this has something to do with it?
Exactly the same config, with only difference being built agains openssl
1.1.0 works without any issues.

Any help appreciated.

Regards,
Sander


On 04/13/2018 10:27 AM, Sander Hoentjen wrote:
> Hi all,
>
> I built Haproxy (1.8.7) against openssl 1.1.1-pre4, and now after 1 hour
> running haproxy stops accepting new SSL connections. When I restart it
> works again for almost(?) exactly 1 hour, then stops.
> Any idea what might be causing this, or where I should look
>
> # haproxy -vv
> HA-Proxy version 1.8.7 2018/04/07
> Copyright 2000-2018 Willy Tarreau 
>
> Build options :
>   TARGET  = linux2628
>   CPU = generic
>   CC  = gcc
>   CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
> -fwrapv -fno-strict-overflow -Wno-unused-label
>   OPTIONS = USE_LINUX_TPROXY=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1
> USE_PCRE=1
>
> Default settings :
>   maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
>
> Built with network namespace support.
> Built with zlib version : 1.2.3
> Running on zlib version : 1.2.3
> Compression algorithms supported : identity("identity"),
> deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
> Built with PCRE version : 7.8 2008-09-05
> Running on PCRE version : 7.8 2008-09-05
> PCRE library supports JIT : no (USE_PCRE_JIT not set)
> Built with multi-threading support.
> Encrypted password support via crypt(3): yes
> Built with transparent proxy support using: IP_TRANSPARENT
> IPV6_TRANSPARENT IP_FREEBIND
> Built with OpenSSL version : OpenSSL 1.1.1-pre4 (beta) 3 Apr 2018
> Running on OpenSSL version : OpenSSL 1.1.1-pre4 (beta) 3 Apr 2018
> OpenSSL library supports TLS extensions : yes
> OpenSSL library supports SNI : yes
> OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
>
> Available polling systems :
>   epoll : pref=300,  test result OK
>    poll : pref=200,  test result OK
>  select : pref=150,  test result OK
> Total: 3 (3 usable), will use epoll.
>
> Available filters :
>     [TRACE] trace
>     [COMP] compression
>     [SPOE] spoe
>
> Regards,
> Sander Hoentjen
>




Re: Question regarding haproxy backend behaviour

2018-04-16 Thread Igor Cicimov
On Mon, 16 Apr 2018 6:09 pm Ayush Goyal  wrote:

> Hi Moemen,
>
> Thanks for your response. But I think I need to clarify a few things here.
>
> On Mon, Apr 16, 2018 at 4:33 AM Moemen MHEDHBI 
> wrote:
>
>> Hi
>>
>> On 12/04/2018 19:16, Ayush Goyal wrote:
>>
>> Hi,
>>
>> I have a question regarding haproxy backend connection behaviour. We have
>> following setup:
>>
>>   +---------+      +-------+
>>   | haproxy |----->| nginx |
>>   +---------+      +-------+
>>
>> We use a haproxy cluster for ssl off-loading and then load balance
>> request to
>> nginx cluster. We are currently benchmarking this setup with 3 nodes for
>> haproxy
>> cluster and 1 nginx node. Each haproxy node has two frontend/backend
>> pair. First
>> frontend is a router for ssl connection which redistributes request to the
>>  second
>> frontend in the haproxy cluster. The second frontend is for ssl
>> handshake and
>> routing requests to nginx servers. Our configuration is as follows:
>>
>> ```
>> global
>> maxconn 10
>> user haproxy
>> group haproxy
>> nbproc 2
>> cpu-map 1 1
>> cpu-map 2 2
>>
>> defaults
>> mode http
>> option forwardfor
>> timeout connect 5s
>> timeout client 30s
>> timeout server 30s
>> timeout tunnel 30m
>> timeout client-fin 5s
>>
>> frontend ssl_sess_id_router
>> bind *:443
>> bind-process 1
>> mode tcp
>> maxconn 10
>> log global
>> option tcp-smart-accept
>> option splice-request
>> option splice-response
>> default_backend ssl_sess_id_router_backend
>>
>> backend ssl_sess_id_router_backend
>> bind-process 1
>> mode tcp
>> fullconn 5
>> balance roundrobin
>> ..
>> option tcp-smart-connect
>> server lbtest01 :8443 weight 1 check send-proxy
>> server lbtest02 :8443 weight 1 check send-proxy
>> server lbtest03 :8443 weight 1 check send-proxy
>>
>> frontend nginx_ssl_fe
>> bind *:8443 ssl 
>> maxconn 10
>> bind-process 2
>> option tcp-smart-accept
>> option splice-request
>> option splice-response
>> option forwardfor
>> reqadd X-Forwarded-Proto:\ https
>> timeout client-fin 5s
>> timeout http-request 8s
>> timeout http-keep-alive 30s
>> default_backend nginx_backend
>>
>> backend nginx_backend
>> bind-process 2
>> balance roundrobin
>> http-reuse safe
>> option tcp-smart-connect
>> option splice-request
>> option splice-response
>> timeout tunnel 30m
>> timeout http-request 8s
>> timeout http-keep-alive 30s
>> server testnginx :80  weight 1 check
>> ```
>>
>> The nginx node has nginx with 4 workers and 8192 max clients, therefore
>> the max
>> number of connection it can accept is 32768.
>>
>> For benchmark, we are generating ~3k new connections per second where each
>> connection makes 1 http request and then holds the connection for next 30
>> seconds. This results in a high established connection on the first
>> frontend,
>> ssl_sess_id_router, ~25k per haproxy node (Total ~77k connections on 3
>> haproxy
>> nodes). The second frontend (nginx_ssl_fe) receives the same number of
>> connection on the frontend. On nginx node, we see that active connections
>> increase to ~32k.
>>
>> Our understanding is that haproxy should keep a 1:1 connection mapping
>> for each
>> new connection in frontend/backend. But there is a connection count
>> mismatch
>> between haproxy and nginx (Total 77k connections in all 3 haproxy for both
>> frontends vs 32k connections in nginx made by nginx_backend), We are
>> still not
>> facing any major 5xx or connection errors. We are assuming that this is
>> happening because haproxy is terminating old idle ssl connections to
>> serve the
>> new ones. We have following questions:
>>
>> 1. How the nginx_backend connections are being terminated to serve the new
>> connections?
>>
>> Connections are usually terminated when the client receives the whole
>> response. Closing the connection can be initiated by the client, server or
>> HAProxy (timeouts, etc..)
>>
>
> Client connections are keep-alive here for 30 seconds from client side.
> Various timeout values in both nginx and haproxy are sufficiently high of
> the order of 60 seconds. Still what we are observing here is that nginx is
> closing the connection after 7-14 seconds to serve new client requests. Not
> sure why nginx or haproxy will close existing keep-alive connections to
> serve new requests when timeouts are sufficiently high?
>
>> 2. Why haproxy is not terminating connections on the frontend to keep it
>> them at 32k
>> for 1:1 mapping?
>>
>> I think there is no 1:1 mapping between the number of connections in
>> haproxy and nginx. This is because you are chaining the two front/back pairs
>> in haproxy, 

Re: Version 1.5.12, getting 502 when server check fails, but server is still working

2018-04-16 Thread Jonathan Matthews
On Sun, 15 Apr 2018 at 20:56, Shawn Heisey  wrote:

> Would I need to upgrade beyond 1.5 to get that working?


I don't have any info about your precise problem, but here's a quote from
Willy's 1.9 thread within the last couple of months:

"Oh, before I forget, since nobody asked for 1.4 to continue to be
maintained, I've just marked it "unmaintained", and 1.5 now entered
the "critical fixes only" status. 1.4 will have lived almost 8 years
(1.4.0 was released on 2010-02-26). Given that it doesn't support
SSL, it's unlikely to be found exposed to HTTP traffic in sensitive
places anymore. If you still use it, there's nothing wrong for now,
as it's been one of the most stable versions of all times. But please
at least regularly watch the activity on the newer ones and consider
upgrading it once you see that some issues might affect it. For those
who can really not risk to face a bug, 1.6 is a very good candidate
now and is still well supported 2 years after its birth."
>
>
You might get a solution to this and your other 1.5 problem on the list -
it has a very helpful and knowledgeable population :-)

But if you can possibly upgrade to 1.6 or later, I suspect the frequency of
answers you get and the flexibility they'll have to help you will improve
markedly.

HTH!
J
-- 
Jonathan Matthews
London, UK
http://www.jpluscplusm.com/contact.html


Re: Version 1.5.12, getting 502 when server check fails, but server is still working

2018-04-16 Thread Lukas Tribus
Hello,


On 15 April 2018 at 21:53, Shawn Heisey  wrote:
> I'm working on making my application capable of handling service restarts on
> the back end with zero loss or interruption.  It runs on two servers behind
> haproxy.
>
> At application shutdown, I'm setting a flag that makes the healthcheck fail,
> and then keeping the application running for thirty seconds in order to
> finish up all requests that the server has already received.
>
> It seems that when haproxy's health check fails while a request is underway,
> the machine making the request will be sent a 502 response instead of the
> good response that the server WILL make. This is probably a good thing for
> haproxy to do in general, but in this case, I know that my application's
> shutdown hook *WILL* allow enough time for the request to finish before it
> forcibly halts the application.

You'll have to share your entire configuration for us to be able to
comment on the behavior.

Having said that, you'd be better off setting the server to
maintenance mode instead of letting the health check fail (via
webinterface or stats socket):

http://cbonte.github.io/haproxy-dconv/1.5/configuration.html#9.2-set%20server
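As a rough sketch of that approach (the socket path and names are placeholders; it requires a "stats socket ... level admin" line in the global section — note that on 1.5 the socket command is "disable server", while "set server ... state maint" from the linked doc is, if I recall correctly, only available in later versions):

```
$ echo "disable server be-cdn-9000/planet" | socat stdio /var/run/haproxy.sock
# ... restart the application, then put the server back into rotation:
$ echo "enable server be-cdn-9000/planet" | socat stdio /var/run/haproxy.sock
```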



The upgrade to a more recent build is, thanks to Vincent's work, very
simple on debian and Ubuntu:
https://haproxy.debian.net



Cheers,
Lukas