Re: [PATCH] Add the possibility to compress requests

2023-04-06 Thread Olivier Houchard
On Thu, Apr 06, 2023 at 09:17:34PM +0200, Willy Tarreau wrote:
> > > > +	/* limit compression rate */
> > > > +	if (global.comp_rate_lim > 0)
> > > > +		if (read_freq_ctr(&comp_bps_in) > global.comp_rate_lim)
> > > > +			goto fail;
> > > 
> > > I had a doubt regarding this one but I think it's indeed reasonable to
> > > use the same global value for all directions as the goal was precisely
> > > to be able to enforce a total limit on all compressed streams to limit
> > > the resource impacts.
> >  
> > I think it makes sense, unless someone wants to give more priority to
> > request compression vs response compression, or vice versa, but I'll
> > rethink it if somebody gives me a valid use-case.
> 
> We'll see, but I strongly doubt this would ever happen because those who
> want to enforce speed limits are often hosting providers who don't want
> a single site to eat all their CPU, so they configure maximum compression
> bandwidths. It dates back to the era when it was still common to start
> several daemons, which is no longer the case these days. So if anything
> changes in the short term, it would rather be to have per-frontend
> and/or per-backend total limits rather than splitting req/res, which
> still concern the same application.
 
Yeah that makes sense.

> > > I have not seen anything updating the stats, so I suspect that request
> > > and response compression stats are accounted as response stats, which
> > > can be a problem showing compression rates larger than 100%. If you hover
> > > over your frontend's "Bytes / Out" fields, you'll see a yellow box with
> > > numbers such as:
> > > 
> > >   Response bytes in:  468197882863
> > >   Compression in:     117157038990
> > >   Compression out:    93277104103 (79%)
> > >   Compression bypass: 0
> > >   Total bytes saved:  23879934887 (5%)
> > > 
> > > Here I'm figuring that we'll need to add the same for the upload, so
> > > these are extra stats. I can possibly help locate where to place them,
> > > but I'll need that you figure where they are updated.
> > 
> > Ok so I've had mixed feelings about this, because I thought a global
> > counter could be useful too. But ultimately I think it's better to keep
> > them separated, if only so that the percentages make sense.
> 
> Yes, and another point is that when you start with a global one and
> finally decide to introduce a separate one for one direction so that
> you can subtract it from the total to get the other one, it's not
> uncommon to see negative numbers due to rounding errors or timings,
> and that's extremely confusing for those who deal with that. Thus I
> prefer to let users sum data on their side rather than providing
> totals and separate ones later.
 
That's a good point.

> > We could
> > have had Request + Response bytes there, but I think it's less useful.
> > So I added the counters, but do not expose them for requests yet, as I'm
> > unsure where to do that exactly, but that can easily be addressed with a
> > separate commit.
> 
> Yes, we can have a look later. I think we'll add new metrics in the stats
> that will appear in a new tooltip on the bytes/in column of the stats
> page, and that will be OK.
 
Yeah I was considering something like this, but wasn't so sure.

> Just a few tiny cosmetic comments below:

[Took care of unaligned comments]

> > From: Olivier Houchard 
> > Date: Thu, 6 Apr 2023 00:33:48 +0200
> > Subject: [PATCH 5/5] MEDIUM: compression: Make it so we can compress 
> > requests
> >  as well.
> (...)
> > diff --git a/doc/configuration.txt b/doc/configuration.txt
> > index 3468c78f3..63e8d3440 100644
> > --- a/doc/configuration.txt
> > +++ b/doc/configuration.txt
> > @@ -5099,7 +5111,15 @@ compression offload
> >If this setting is used in a defaults section, a warning is emitted and 
> > the
> >option is ignored.
> >  
> > -  See also : "compression type", "compression algo"
> > +  See also : "compression type", "compression algo", "compression "
>  ^^^
> A word is missing here, I suspect it was "direction" since that one
> references "offload" which is where we are.
 
Ahahah yeah, pretty sure I removed "tocompress" to replace it with
"direction", but was probably distracted before I did so, and forgot.

> OK otherwise it looks good to me. I suggest you adjust these cosmetic
> details and directly push it.
 
Done, thanks!

> I'm having one question by the way: how did you manage to test this ?
> Did you find a server that supports decompressing requests, or do you
> only have some in-house stuff for this ? I'm wondering if we could manage
> to make a regtest for this, at least by putting some expect rules on
> the content-encoding header probably.

My test server was actually an nginx with custom Lua code to decompress
stuff. Maybe we can do something similar with haproxy?

Regards,

Olivier



Re: [PATCH] Add the possibility to compress requests

2023-04-05 Thread Olivier Houchard
Hi,

On Tue, Apr 04, 2023 at 09:45:24PM +0200, Willy Tarreau wrote:
> Hi Olivier,
> 
> On Tue, Apr 04, 2023 at 12:29:15AM +0200, Olivier Houchard wrote:
> > Hi,
> > 
> > The attached patchset is a first attempt at adding the possibility to
> > compress requests, as well as responses.
> 
> This is pretty cool, I definitely see how this can be super useful :-)
 
:)

> > It adds a new keyword for compression, "tocompress" (any better
> > alternative name would be appreciated).
> 
> I would call it "direction" which I find more natural and understandable.
 
I like that a lot more. Thank you!

> > The valid values are "request",
> > if you want to compress requests, "response" if you want to compress
> > responses or "both", if you want to compress both.
> > The default is to compress responses only.
> 
> I have a few comments below (besides renaming "tocompress").
 
[...]

> > diff --git a/include/haproxy/compression-t.h 
> > b/include/haproxy/compression-t.h
> > index cdbf0d14e..d15d15ee3 100644
> > --- a/include/haproxy/compression-t.h
> > +++ b/include/haproxy/compression-t.h
> > @@ -37,6 +37,9 @@
> >  /* Compression flags */
> >  
> >  #define COMP_FL_OFFLOAD0x0001 /* Compression offload */
> > +#define COMP_FL_COMPRESS_REQ   0x0002 /* Compress requests */
> > +#define COMP_FL_COMPRESS_RES   0x0004 /* Compress responses */
> 
> So these ones would be COMP_FL_DIR_{REQ,RES}.
 
Done!

> >  struct comp {
> > struct comp_algo *algos;
> > struct comp_type *types;
> > diff --git a/src/flt_http_comp.c b/src/flt_http_comp.c
> > index 507009692..7680e241f 100644
> > --- a/src/flt_http_comp.c
> > +++ b/src/flt_http_comp.c
> > @@ -33,7 +33,9 @@ struct flt_ops comp_ops;
> >  
> >  struct comp_state {
> > struct comp_ctx  *comp_ctx;   /* compression context */
> > +   struct comp_ctx  *comp_ctx_req; /* compression context for request */
> > struct comp_algo *comp_algo;  /* compression algorithm if not NULL */
> > +   struct comp_algo *comp_algo_req; /* Algo to be used for request */
> > unsigned int  flags;  /* COMP_STATE_* */
> 
> I find it confusing to have comp_ctx and comp_ctx_req (and same for algo).
> I'd rather see a preliminary patch renaming comp_ctx to comp_ctx_res and
> comp_algo to comp_algo_res (just perform a mechanical rename). Or you could
> also have an array [2] for each, which will even remove some ifs in the
> code.
 
Ok I made that an array.

> Also I don't understand how you can have different algos for each direction
> since the config language allows you to define "compression algo" and
> "compression tocompress" so you cannot (apparently) have a situation where
> the two algos differ. I think it can make sense to use different algos for
> each direction (at least compression settings, e.g. expensive compression
> for uploads, relayed over expensive links, little compression for downloads
> that possibly comes from a cache).
 
The idea was that while only one algo could be used for requests, the
response algorithm would depend on what was supported by the client.

> Thus maybe an approach could be to use the algo to indicate the direction
> at the same time:
> 
>compression algo foobar   => sets response only, for compatibility
>compression algo-res foobar   => sets response only, explicit syntax
>compression algo-req foobar   => sets request only, explicit syntax
> 
> Then there's probably no need to have a keyword to set it at once in
> both directions since both actions are totally independent and it's
> probably not desired to encourage users to conflate the two directions
> in a single keyword.
> 
> You might need the same for "compression types" I think, so that you
> don't try to compress regular forms or login:password that will confuse
> applications, but only large data.
 
But I did that, so now there's even more reason to have separate
algorithms.

> > -static int set_compression_response_header(struct comp_state *st,
> > -  struct stream *s,
> > -  struct http_msg *msg);
> > +static int set_compression_header(struct comp_state *st,
> > + struct stream *s,
> > + struct http_msg *msg,
> > + int for_response);
>   
> For this one as well I would call it "direction" (or "dir"), as we
> already do in samples an

[PATCH] Add the possibility to compress requests

2023-04-03 Thread Olivier Houchard
Hi,

The attached patchset is a first attempt at adding the possibility to
compress requests, as well as responses.
It adds a new keyword for compression, "tocompress" (any better
alternative name would be appreciated). The valid values are "request",
if you want to compress requests, "response" if you want to compress
responses or "both", if you want to compress both.
The default is to compress responses only.

Any comment is more than welcome.

Thanks!

Olivier
From 6c3e62baa888359521091387ce6ac8376a001259 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Mon, 3 Apr 2023 22:22:24 +0200
Subject: [PATCH 1/2] MINOR: compression: Make compression offload a flag

Turn compression offload into a flag in struct comp, instead of using
an int just for it.
---
 include/haproxy/compression-t.h | 5 -
 src/flt_http_comp.c | 6 +++---
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/haproxy/compression-t.h b/include/haproxy/compression-t.h
index 062f17f76..cdbf0d14e 100644
--- a/include/haproxy/compression-t.h
+++ b/include/haproxy/compression-t.h
@@ -34,10 +34,13 @@
 
 #include 
 
+/* Compression flags */
+
+#define COMP_FL_OFFLOAD		0x0001 /* Compression offload */
 struct comp {
 	struct comp_algo *algos;
 	struct comp_type *types;
-	unsigned int offload;
+	unsigned int flags;
 };
 
 struct comp_ctx {
diff --git a/src/flt_http_comp.c b/src/flt_http_comp.c
index ceda3fd5e..507009692 100644
--- a/src/flt_http_comp.c
+++ b/src/flt_http_comp.c
@@ -451,8 +451,8 @@ select_compression_request_header(struct comp_state *st, struct stream *s, struc
 
 	/* remove all occurrences of the header when "compression offload" is set */
 	if (st->comp_algo) {
-		if ((s->be->comp && s->be->comp->offload) ||
-		(strm_fe(s)->comp && strm_fe(s)->comp->offload)) {
+		if ((s->be->comp && (s->be->comp->flags & COMP_FL_OFFLOAD)) ||
+		(strm_fe(s)->comp && (strm_fe(s)->comp->flags & COMP_FL_OFFLOAD))) {
 			http_remove_header(htx, &ctx);
 			ctx.blk = NULL;
 			while (http_find_header(htx, ist("Accept-Encoding"), &ctx, 1))
@@ -690,7 +690,7 @@ parse_compression_options(char **args, int section, struct proxy *proxy,
   args[0], args[1]);
 			ret = 1;
 		}
-		comp->offload = 1;
+		comp->flags |= COMP_FL_OFFLOAD;
 	}
 	else if (strcmp(args[1], "type") == 0) {
 		int cur_arg = 2;
-- 
2.39.1

From 77a5657592338db2ed53189c50828b3c0ed52716 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Tue, 4 Apr 2023 00:14:36 +0200
Subject: [PATCH 2/2] MEDIUM: compression: allow to compress requests too.

Make it so we can compress requests, as well as responses.
A new compress keyword is added, "tocompress". It takes exactly one
argument. The valid values for that arguments are "response", if we want
to compress only responses, "request", if we want to compress only
requests, or "both", if we want to compress both.
The default value is "response".
---
 doc/configuration.txt   |  12 ++-
 include/haproxy/compression-t.h |   3 +
 src/flt_http_comp.c | 165 +++-
 3 files changed, 154 insertions(+), 26 deletions(-)

diff --git a/doc/configuration.txt b/doc/configuration.txt
index 586689ca2..8f86b285e 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -5077,7 +5077,7 @@ compression type  ...
 compression algo gzip
 compression type text/html text/plain
 
-  See also : "compression offload"
+  See also : "compression offload", "compression tocompress"
 
 compression offload
   Makes HAProxy work as a compression offloader only.
@@ -5099,7 +5099,15 @@ compression offload
   If this setting is used in a defaults section, a warning is emitted and the
   option is ignored.
 
-  See also : "compression type", "compression algo"
+  See also : "compression type", "compression algo", "compression tocompress"
+
+compression tocompress 
+  Makes haproxy able to compress both requests and responses.
+  Valid values are "request", to compress only requests, "response", to
+  compress only responses, or "both", when you want to compress both.
+  The default value is "response".
+
+  See also : "compression type", "compression algo", "compression offload"
 
 cookie  [ rewrite | insert | prefix ] [ indirect ] [ nocache ]
   [ postonly ] [ preserve ] [ httponly ] [ secure ]
diff --git a/include/haproxy/compression-t.h b/include/haproxy/compression-t.h
index cdbf0d14e..d15d15ee3 100644
--- a/include/haproxy/compression-t.h
+++ b/include/haproxy/compression-t.h
@@ -37,6 +37,9 @@
 /* Compression flags */
 
 #define COMP_FL_OFFLOAD		0x0001 /* Compression offload */
+#

Re: HAProxy performance on OpenBSD

2023-01-24 Thread Olivier Houchard
On Tue, Jan 24, 2023 at 11:05:37PM +0100, Willy Tarreau wrote:
> On Tue, Jan 24, 2023 at 02:15:08PM -0600, Marc West wrote:
> > > Stupid question but I prefer to ask in order to be certain, are all of
> > > these 32 threads located on the same physical CPU ? I just want to be
> > > sure that locks (kernel or user) are not traveling between multiple CPU
> > > sockets, as that ruins scalability.
> > 
> > Very good observation. This is a 2 socket system, 20 cores per socket,
> > and since there is no affinity in OpenBSD unfortunately the 32 threads
> > are not on the same physical CPU.
> 
> Ah, then that's particularly bad. When you're saying "there is no
> affinity", do you mean the OS doesn't implement anything or only that
> haproxy doesn't support it ? Could you please check if you have man
> pages about "cpuset_setaffinity" or "sched_setaffinity" ? The former
> is the FreeBSD version and the second the Linux version.

Unfortunately, having a quick look at OpenBSD's sources, I don't think
they provide anything that could look like that. In fact, I'm not sure
there is any way to restrict a process to a specific CPU.

[...]

> Oh thank you very much. From what I'm seeing we're here:
> 
>   0x0af892c770b0 :   mov%r12,%rdi
>   0x0af892c770b3 :   callq  0xaf892c24e40 
> 
>   0x0af892c770b8 :   mov%rax,%r12
>   0x0af892c770bb :   test   %rax,%rax
>   0x0af892c770be :   je 0xaf892c770e5 
> 
>   0x0af892c770c0 :   mov0x18(%r12),%rax
> =>0x0af892c770c5 :   mov0xa0(%rax),%r11
>   0x0af892c770cc :   test   %r11,%r11
>   0x0af892c770cf :   je 0xaf892c770b0 
> 
>   0x0af892c770d1 :   mov%r12,%rdi
>   0x0af892c770d4 :   mov%r13d,%esi
> 
>   1229  conn = srv_lookup_conn(&srv->per_thr[i].safe_conns, hash);
>   1230  while (conn) {
>   1231 ==>>> if (conn->mux->takeover && conn->mux->takeover(conn, i) == 0) {
> ^^^
> 
> It's here. %rax==0, which is conn->mux when we're trying to dereference
> it to retrieve ->takeover. I can't see how it's possible to have a NULL
> mux here in an idle connection since they're placed there by the muxes
> themselves. But maybe there's a tiny race somewhere that's impossible to
> reach except under such an extreme contention between the two sockets.
> I really have no idea regarding this one for now, maybe Olivier could
> spot a race here ? Maybe we're missing a barrier somewhere.
 
One way I can see that happening, as unlikely as it may be, is if
h1_takeover() is called, but fails to allocate a task or a tasklet, and
will call h1_release(), which will set the mux to NULL and call
conn_free(). It was previously ok to happen, because conn_free() would
unconditionally remove the connection from any list, but since the idle
connections now stay in a tree, and no longer in a list, that no longer
happens.
Another possibility of h1_release() being called while still in the
idle tree is if h1_wake() gets called, but the only way I see that
happening is from mux_stopping_process(), whatever that is, so that's
unlikely to be the problem in this case.
Anyway, I suggest doing something along the line of the patch attached,
and add a BUG_ON() in conn_free() to catch any freeing of connection
still in the idle tree.

Olivier
From 2a0ac4b84b97aa05cd7befc5fcf45b03795f2e76 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Tue, 24 Jan 2023 23:59:32 +0100
Subject: [PATCH] MINOR: Add a BUG_ON() to detect destroying connection in idle
 list

Add a BUG_ON() in conn_free(), to check that when we're freeing a
connection, it is not still in the idle connections tree, otherwise the
next thread that will try to use it will probably crash.
---
 src/connection.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/connection.c b/src/connection.c
index 4a73dbcc8..97619ec26 100644
--- a/src/connection.c
+++ b/src/connection.c
@@ -498,6 +498,10 @@ void conn_free(struct connection *conn)
 	pool_free(pool_head_uniqueid, istptr(conn->proxy_unique_id));
 	conn->proxy_unique_id = IST_NULL;
 
+	/* Make sure the connection is not left in the idle connection tree */
+	if (conn->hash_node != NULL)
+		BUG_ON(conn->hash_node->node.node.leaf_p != NULL);
+
 	pool_free(pool_head_conn_hash_node, conn->hash_node);
 	conn->hash_node = NULL;
 
-- 
2.36.1



Re: HAProxy performance on OpenBSD

2023-01-23 Thread Olivier Houchard
Hi Marc,

On Mon, Jan 23, 2023 at 12:13:13AM -0600, Marc West wrote:
> Hi,
> 
> We have been running HAProxy on OpenBSD for several years (currently
> OpenBSD 7.2 / HAProxy 2.6.7) and everything has been working perfectly
> until a recent event of higher than normal traffic. It was an unexpected
> flood to one site and above ~1100 cur sessions we started to see major
> impacts to all sites behind haproxy in different frontends/backends than 
> the one with heavy traffic. All sites started to respond very slowly or 
> failed to load (including requests that haproxy denies before hitting 
> backend), health checks in various other backends started flapping, and
> once the heavy traffic stopped everything returned to normal. Total 
> bandwidth was less than 50Mbps on a 10G port (Intel 82599 ix fiber NIC).
> 
> After that issue we have been doing some load testing to try to gain
> more info and make any possible tweaks. Using the ddosify load testing
> tool (-d 60 -n 7) from a single machine (different from haproxy) is
> able to reproduce the same issue we saw with real traffic.
> 
> When starting a load test HAProxy handles 400-500 requests per second
> for about 3 seconds. After the first few seconds of heavy traffic, the
> test error rate immediately starts to shoot up to 75%+ connection/read
> timeouts and other haproxy sites/health checks start to be impacted. We
> had to stop the test after only 11 seconds to restore responsiveness to
> other sites. This is a physical server with 2x E5-2698 v4 20 core CPUs
> (hyperthreading disabled) and the haproxy process uses about 1545% CPU
> under this load. Overall CPU utilization is 21% user, 0% nice, 37% sys,
> 18% spin, 0.7% intr, 23.1% idle. There was no noticeable impact on
> connections to other services that this box NATs via PF to other servers
> outside of haproxy. 
> 
> Installing FreeBSD 13.1 on an identical machine and trying the same test
> on the same 2.6.7 with the same config and backend servers, the results
> are more what I would expect out of this hardware - haproxy has no
> problems handling over 10,000 req/sec and 40k connections without
> impacting any traffic/health checks, and only about 30% overall CPU
> usage at that traffic level. 100% success rate on the load tests with
> plenty of headroom on the haproxy box to handle way more. The backend
> servers were actually the bottleneck in the FreeBSD test.
> 
> I understand that raw performance on OpenBSD is sometimes not as high as
> other OSes in some scenarios, but the difference of 500 vs 10,000+
> req/sec and 1100 vs 40,000 connections here is very large so I wanted to
> see if there are any thoughts, known issues, or tunables that could
> possibly help improve HAProxy throughput on OpenBSD?
> 
> The usual OS tunables openfiles-cur/openfiles-max are raised to 200k,
> kern.maxfiles=205000 (openfiles peaked at 15k), and haproxy stats
> reports those as expected. PF state limit is raised to 1 million and
> peaked at 72k in use. BIOS power profile is set to max performance.
> 
> pid = 78180 (process #1, nbproc = 1, nbthread = 32)
> uptime = 1d 19h10m11s
> system limits: memmax = unlimited; ulimit-n = 200000
> maxsock = 200000; maxconn = 99904; maxpipes = 0
> 
> No errors that I can see in logs about hitting any limits. There is no
> change in results with http vs https, http/1.1 vs h2, with or without
> httplog, or reducing nbthread on this 40 core machine. If there are any
> other details I can provide please let me know.
> 
> Thanks in advance for any input!
> 
> 
> global
>   chroot  /var/haproxy
>   daemon  
>   log  127.0.0.1 local2
>   nbthread  32

I wonder if the problem comes from OpenBSD's thread implementation, I
fear you may have too much contention in the kernel, especially on a
NUMA machine such as yours. Does using, say, only 4 threads, make things
better?

Regards,

Olivier



Re: ACL block ignored in 2.0.25

2021-11-21 Thread Olivier Houchard
Hi Bart,

On Thu, Nov 18, 2021 at 08:06:55PM +0100, Bart van der Schans wrote:
> Here is the patch:
> 
> Fix "block" ACL by reverting accidental removal of code in
> 8ab2a364a8c8adf0965e74a41a2ff3cebd43e7a9
> ---
> diff --git a/src/cfgparse.c b/src/cfgparse.c
> index 857321e1..78f0ed76 100644
> --- a/src/cfgparse.c
> +++ b/src/cfgparse.c
> @@ -2823,6 +2823,16 @@ int check_config_validity()
> }
> }
> 
> +   /* move any "block" rules at the beginning of the http-request
> rules */
> +   if (!LIST_ISEMPTY(>block_rules)) {
> +   /* insert block_rules into http_req_rules at the beginning
> */
> +   curproxy->block_rules.p->n= curproxy->http_req_rules.n;
> +   curproxy->http_req_rules.n->p = curproxy->block_rules.p;
> +   curproxy->block_rules.n->p= >http_req_rules;
> +   curproxy->http_req_rules.n= curproxy->block_rules.n;
> +   LIST_INIT(>block_rules);
> +   }
> +
> /* check validity for 'tcp-request' layer 4/5/6/7 rules */
> cfgerr += check_action_rules(&curproxy->tcp_req.l4_rules,
> curproxy, &err_code);
> cfgerr += check_action_rules(&curproxy->tcp_req.l5_rules,
> curproxy, &err_code);

Your patch has been committed as feeccc6e40e4d61b1e9fa84aca961aceabe39371

Thanks a lot !

Olivier



Re: ACL block ignored in 2.0.25

2021-11-18 Thread Olivier Houchard
On Thu, Nov 18, 2021 at 05:59:24PM +0100, Bart van der Schans wrote:
> A bit more digging:
> 
> It looks like this commit broke it:
> http://git.haproxy.org/?p=haproxy-2.0.git;a=commitdiff;h=8ab2a364a8c8adf0965e74a41a2ff3cebd43e7a9
> 
> Re-adding the following block on line 2826 in cfgparse.c in 2.0.25 "solves"
> the issue but I have no idea if that is the right thing to do:
> 
> -   /* move any "block" rules at the beginning of the http-request rules */
> -   if (!LIST_ISEMPTY(&curproxy->block_rules)) {
> -   /* insert block_rules into http_req_rules at the beginning */
> -   curproxy->block_rules.p->n    = curproxy->http_req_rules.n;
> -   curproxy->http_req_rules.n->p = curproxy->block_rules.p;
> -   curproxy->block_rules.n->p    = &curproxy->http_req_rules;
> -   curproxy->http_req_rules.n    = curproxy->block_rules.n;
> -   LIST_INIT(&curproxy->block_rules);
> -   }
> 
> Thanks
> Bart
> 
> 
> Thanks
> Bart
> 
I think you are totally correct, and this is the right thing to do. That
block is unrelated to the other changes in that commit, and probably
should not have been removed.

Regards,

Olivier



Re: `ssl_fc_has_early` fetcher and 0rtt

2020-09-09 Thread Olivier Houchard
On Wed, Sep 09, 2020 at 05:35:28PM +0200, Willy Tarreau wrote:
> On Wed, Sep 09, 2020 at 04:57:58PM +0200, William Dauchy wrote:
> > > I think it's not easy to reproduce these tests, you need a high enough
> > > latency between haproxy and the client so that the handshake is not
> > > already completed when you evaluate the rule, and of course you need
> > > to make sure the client sends using early data. I don't remember how
> > > Olivier used to run his tests but I remember that it was a bit tricky,
> > > so it's very possible that you never fall into the situation where you
> > > can see the unvalidated early data yet.
> > 
> > It means my understanding of this fetcher was wrong indeed.
> > For me the protection was here:
> >   http-request wait-for-handshake if ! METH_GET
> > and the fetcher here to log whether it was a 0rtt request or not.
> > In reality, it means all our requests have completed the handshake
> > when the rule is evaluated (which is surprising looking at the number
> > we have).
> 
> That seems strange indeed but looking at the code that's what I'm
> seeing. Was your access to ssl_fc_has_early placed before or after the
> rule above ? If it's after it must indeed report false.
> 
> > So maybe we can possibly work on an alternative fetcher to know
> > whether this was a 0rtt request? Or is there another way?
> 
> I seem to remember there was one but can't find it, so I may have been
> confused. With this said, it doesn't provide a big information since
> once the handshake is completed, it's exactly identical to a regular
> one. But it can be nice for statistics at least.
> 

Pretty sure the original version always returned 1 if there were early
data, but we changed it so that it would return 0 once the handshake was
done, as we thought it was more useful; that may be what you're thinking
of.

Olivier



Re: compile issues on FreeBSD/i386

2020-06-20 Thread Olivier Houchard
Hi Dmitry,

On Sat, Jun 20, 2020 at 03:30:55PM +0300, Dmitry Sivachenko wrote:
> Hello!
> 
> I am trying to compile haproxy-2.2-dev10 on FreeBSD-12/i386 (i386 is 
> important here) with clang version 9.0.1.
> 
> I get the following linker error:
> 
>   LD  haproxy
> ld: error: undefined symbol: __atomic_fetch_add_8
> >>> referenced by backend.c
> >>>   src/backend.o:(assign_server)
> >>> referenced by backend.c
> >>>   src/backend.o:(assign_server)
> >>> referenced by backend.c
> >>>   src/backend.o:(assign_server_and_queue)
> >>> referenced by backend.c
> >>>   src/backend.o:(assign_server_and_queue)
> >>> referenced by backend.c
> >>>   src/backend.o:(assign_server_and_queue)
> >>> referenced by backend.c
> >>>   src/backend.o:(assign_server_and_queue)
> >>> referenced by backend.c
> >>>   src/backend.o:(connect_server)
> >>> referenced by backend.c
> >>>   src/backend.o:(connect_server)
> >>> referenced by backend.c
> >>>   src/backend.o:(connect_server)
> >>> referenced by backend.c
> >>>   src/backend.o:(srv_redispatch_connect)
> >>> referenced 233 more times
> 
> ld: error: undefined symbol: __atomic_store_8
> 
> For some time we apply the following patch to build on FreeBSD/i386:
> 
> --- include/common/hathreads.h.orig 2018-02-17 18:17:22.21940 +
> +++ include/common/hathreads.h  2018-02-17 18:18:44.598422000 +
> @@ -104,7 +104,7 @@ extern THREAD_LOCAL unsigned long tid_bit; /* The bit 
>  /* TODO: thread: For now, we rely on GCC builtins but it could be a good 
> idea to
>   * have a header file regrouping all functions dealing with threads. */
>  
> -#if defined(__GNUC__) && (__GNUC__ < 4 || __GNUC__ == 4 && __GNUC_MINOR__ < 7) && !defined(__clang__)
> +#if (defined(__GNUC__) && (__GNUC__ < 4 || __GNUC__ == 4 && __GNUC_MINOR__ < 7) && !defined(__clang__)) || (defined(__clang__) && defined(__i386__))
>  /* gcc < 4.7 */
>  
>  #define HA_ATOMIC_ADD(val, i)__sync_add_and_fetch(val, i)
> 
> (it is from older -dev but still applies to include/haproxy/atomic.h and 
> fixes the build).
> 
> If this patch is correct for i386, may be we include it to haproxy sources?
 
The problem seems to be that clang assumes that the atomic access may be
done unaligned. Fixing most of it is pretty straight forward, it mostly
involves adding __attribute__((aligned(64))) on all the relevant
structure members. However, no amount of cajoling helps for trace.c, as
long as we use ev_ptr, it always assumes it may be unaligned.
Even more funny, while
HA_ATOMIC_STORE(&src->report_events, 0); will work if report_events is
flagged as 64bits-aligned, HA_ATOMIC_STORE(ev_ptr == &src->report_events
? &src->report_events : &src->report_events, 0) will make clang unhappy.
So I don't know how to handle that properly. Willy, I think Dmitry's
solution of using the old builtins for clang/i386 might be the less
intrusive way of fixing it.

> PS:  with that patch applied I get the following warning which can have sense:
> 
> src/stick_table.c:3462:12: warning: result of comparison 'unsigned long' > 
> 4294967295 is always false [-Wtautological-type-limit-compare]
> val > 0xffffffff)
> ~~~ ^ ~~~~~~~~~~
> 
This one sounds mostly harmless; I guess we should use a long long
here, and strtoull().

Olivier





Re: [PATCH] CLEANUP: connections: align function declaration

2020-05-04 Thread Olivier Houchard
Hi William,

On Mon, May 04, 2020 at 02:08:54PM +0200, William Dauchy wrote:
> On Mon, May 04, 2020 at 01:56:10PM +0200, William Dauchy wrote:
> > Sorry for this one, I originally did update the server.h file, but I
> > don't know why I forgot to add it in my patch submission :/
> > then I saw you fixed it while pushing my commit, thank you for this.
> 
> I'm talking too fast, the .h was correctly sent.
> in fact it was a indeed a mistake added in between while pushing the
> fix. I lost myself between the different version here and there.
> Anyway :)

You did no wrong, this is actually my fault, I decided to make the
function static, as there were no need to export the symbol, so I
removed the changes from server.h, and added a static prototype at the
beginning of server.c, but quite obviously I forgot to add "static" to
the function definition :)
Your patch is now applied, thanks !

Olivier



Re: [PATCH 1/1] BUG/MEDIUM: connections: force connections cleanup on server changes

2020-05-02 Thread Olivier Houchard
On Sat, May 02, 2020 at 10:23:05PM +0200, William Dauchy wrote:
> On Sat, May 02, 2020 at 10:17:01PM +0200, Olivier Houchard wrote:
> > If that's the only change you have for a v2, don't bother, I already
> > integrated it in what I plan to push :)
> 
> ok, thanks a lot!
> 

It's now pushed, so any fix for the fix is now welcome :)

Thanks a lot !

Olivier



Re: [PATCH 1/1] BUG/MEDIUM: connections: force connections cleanup on server changes

2020-05-02 Thread Olivier Houchard
On Sat, May 02, 2020 at 10:14:15PM +0200, William Dauchy wrote:
> On Sat, May 02, 2020 at 09:52:36PM +0200, William Dauchy wrote:
> > +void srv_cleanup_connections(struct server *srv)
> > +{
> > +   struct connection *conn;
> > +   int did_remove;
> > +   int i;
> > +   int j;
> > +
> > +   HA_SPIN_LOCK(OTHER_LOCK, &idle_conn_srv_lock);
> > +   for (i = 0; i < global.nbthread; i++) {
> > +   did_remove = 0;
> > +   HA_SPIN_LOCK(OTHER_LOCK, &toremove_lock[i]);
> > +   for (j = 0; j < srv->curr_idle_conns; j++) {
> > +   conn = MT_LIST_POP(&srv->idle_conns[i], struct 
> > connection *, list);
> > +   if (!conn)
> > +   conn = MT_LIST_POP(&srv->safe_conns[i],
> > +  struct connection *, list);
> > +   if (!conn)
> > +   break;
> 
> just realised I forgot:
>   did_remove = 1;
> 
> but will wait before sending a possible v2 with other feedbacks.

If that's the only change you have for a v2, don't bother, I already
integrated it in what I plan to push :)

Regards,

Olivier



Re: [PATCH 0/1] fix idle connections cleanup

2020-05-02 Thread Olivier Houchard
Hi William,

On Sat, May 02, 2020 at 09:52:35PM +0200, William Dauchy wrote:
> Hello Olivier,
> 
> The following patch is an attempt to fix a change of behavior we encounter
> following commit:
> 
> 079cb9af22da6 ("MEDIUM: connections: Revamp the way idle connections are 
> killed")
> being the origin of this new behaviour.
> 
> All details are in the commit message itself.
> The approach is most likely not the good one but feel free to advise so
> I may improve it.
> 
This is a good catch !

And I think your fix is correct, so I will apply it as-is. However, I
think it should also be applied to previous branches, which already had
idle connection pools. It is probably less noticeable before 079cb9af22da6,
because before that patch one thread couldn't reuse a connection owned
by another thread, so to reproduce it you'd have to retry until you
hit the same thread twice, but I don't see why it wouldn't be an issue
there.

Thanks a lot !

Olivier



Re: Weird issues with UNIX-Sockets on 2.1.x

2020-03-27 Thread Olivier Houchard
On Fri, Mar 27, 2020 at 04:32:21PM +0100, Christian Ruppert wrote:
> On 2020-03-27 16:27, Olivier Houchard wrote:
> > On Fri, Mar 27, 2020 at 04:21:20PM +0100, Christian Ruppert wrote:
> >> During the reload I just found something in the daemon log:
> >> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) :
> >> Starting proxy someotherlistener: cannot bind socket [0.0.0.0:18540]
> >> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) :
> >> Starting proxy someotherlistener: cannot bind socket [:::18540]
> >> 
> >> So during the reload, this happened and seems to have caused any 
> >> further
> >> issues/trouble.
> >> 
> > 
> > That would make sense. Does that mean you have old processes hanging
> > around ? Do you use seamless reload ? If so, it shouldn't attempt to
> > bind the sockets, but get them from the old process.
> 
> I remember that it was necessary to have a systemd wrapper around, as it 
> caused trouble otherwise, due to PID being changed etc.
> Not sure if that wrapper is still in use. In this case it's systemd 
> though.
> [Unit]
> Description=HAProxy Load Balancer
> After=network.target
> 
> [Service]
> Environment="CONFIG=/etc/haproxy/haproxy.cfg" "PIDFILE=/run/haproxy.pid"
> ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q
> ExecStart=/usr/sbin/haproxy -Ws -f $CONFIG -p $PIDFILE
> ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q
> ExecReload=/bin/kill -USR2 $MAINPID
> KillMode=mixed
> Restart=always
> SuccessExitStatus=143
> TimeoutStopSec=30
> Type=notify

[...]

> We've added the TimeoutStopSec=30 for some reason (I'd have to ask my 
> college, something took longer or something like that, since we have 
> quite a lot of frontends/listener/backend)
> Only the two processes I mentioned before are / were running. Seems like 
> the fallback didn't work properly?
> 

The wrapper is no longer needed; it has been superseded by the
master-worker mode (which you seem to be using, given the -Ws flag).
It is possible the old process refuses to die, so you end up hitting the
timeout and it eventually gets killed, but too late.
Do you have "expose-fd listeners" on the unix stats socket ? Using it
will allow the new process to connect to the old process' stats socket
and get all the listening sockets, so that it won't have to bind them.
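For reference, a minimal global section enabling seamless reloads might look like this (the socket path and mode are illustrative):

```haproxy
global
    master-worker
    # "expose-fd listeners" lets the next process fetch the listening
    # sockets from the old one over the stats socket during a reload,
    # so it never has to bind (and possibly fail to bind) them itself
    stats socket /var/run/haproxy.sock mode 600 level admin expose-fd listeners
```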

Regards,

Olivier



Re: Weird issues with UNIX-Sockets on 2.1.x

2020-03-27 Thread Olivier Houchard
On Fri, Mar 27, 2020 at 04:21:20PM +0100, Christian Ruppert wrote:
> During the reload I just found something in the daemon log:
> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) : 
> Starting proxy someotherlistener: cannot bind socket [0.0.0.0:18540]
> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) : 
> Starting proxy someotherlistener: cannot bind socket [:::18540]
> 
> So during the reload, this happened and seems to have caused any further 
> issues/trouble.
> 

That would make sense. Does that mean you have old processes hanging
around ? Do you use seamless reload ? If so, it shouldn't attempt to
bind the sockets, but get them from the old process.

Regards,

Olivier



Re: Weird issues with UNIX-Sockets on 2.1.x

2020-03-27 Thread Olivier Houchard
Hi Christian,

On Fri, Mar 27, 2020 at 02:37:41PM +0100, Christian Ruppert wrote:
> Hi list,
> 
> we have some weird issues now, the second time, that *some* SSL sockets 
> seem to be broken as well as stats sockets.
> HTTP seems to work fine, still, SSL ones are broken however. It happened 
> at least on 2.1.3 and *perhaps* on 2.1.2 as well. We're not sure whether 
> the first time was on 2.1.2 or 2.1.3.
> The one that failed today was updated yesterday, so HAProxy has an 
> uptime of about 24h.
> We're using threads. default + HTTP is using 1 thread, 1 is dedicated 
> for a TCP listener/Layer-4, one is for RSA only and all the rest is for 
> ECC.
[...]
> The problem ocurred arount 13:40 (CET, in case it matters at some point)
> 
> Any ideas so far?
> 

So basically, it used to work, and suddenly you get errors on any TLS
connection ?
If you still have the TCP stat socket working, can you show the output
of "show fd" ?

Thanks !

Olivier



Re: commit 493d9dc makes a SVN-checkout stall..

2020-03-25 Thread Olivier Houchard
Hi Willy,

On Wed, Mar 25, 2020 at 02:03:56PM +0100, Willy Tarreau wrote:
> On Wed, Mar 25, 2020 at 12:40:33PM +0100, Olivier Houchard wrote:
> > I think I figured it out, and commit 
> > 69664419d209d2f64dbcd90f991e7ec2efff2ee2
> > should fix it, at least now I can do a full svn checkout.
> > 
> > Can you confirm it does the trick for you ?
> 
> Olivier, I think you made a last minute change after your fix because
> it broke the build:
> 
> https://travis-ci.com/github/haproxy/haproxy/jobs/302517378
> 
>   src/mux_h1.c:2495:23: error: incompatible pointer types passing 'struct 
> connection **' to parameter of type 'const struct buffer *' 
> [-Werror,-Wincompatible-pointer-types]
> if (unlikely(b_data(>conn)))
> ^~
>   include/common/compiler.h:98:40: note: expanded from macro 'unlikely'
>   #define unlikely(x) (__builtin_expect((x) != 0, 0))
>^
>   include/common/buf.h:100:50: note: passing argument to parameter 'b' here
>   static inline size_t b_data(const struct buffer *b)
>  ^
> I think it was h1c->dbuf not conn.
> 
> Willy

That is... interesting, not sure I reached such an outstanding result.

This is now fixed, sorry about that !

Olivier



Re: commit 493d9dc makes a SVN-checkout stall..

2020-03-25 Thread Olivier Houchard
Hi Pieter,

On Wed, Mar 25, 2020 at 01:42:14AM +0100, Olivier Houchard wrote:
> Hi Pieter,
> 
> On Wed, Mar 25, 2020 at 01:14:46AM +0100, PiBa-NL wrote:
> > Hi List, Willy,
> > 
> > Today i thought lets give v2.2-dev5 a try for my production environment ;).
> > Soon it turned out to cause SVN-Checkout to stall/disconnect for a
> > repository we run locally in a Collab-SVN server.
> > 
> > I tracked it down to this commit: 493d9dc (MEDIUM: mux-h1: do not blindly
> > wake up the tasklet at end of request anymore) causing the problem for the
> > first time. Is there something tricky there that can be suspected to cause
> > the issue.? Perhaps a patch i can try?
> > 
> > While 'dissecting' the issue i deleted the whole directory each time and
> > performed a new svn-checkout several times. It doesn't always stall at the
> > exact same point but usually after checking out around +- 20 files something
> > between 0.5 and 2 MB. , the commit before that one allows me to checkout
> > 500+MB through haproxy without issue.. A wireshark seems to show that
> > haproxy is sending several of RST,ACK packets for a 4 different connections
> > to the svn-server at the same milisecond after it was quiet for 2 seconds..
> > The whole issue happens in a timeframe of start of checkout till when it
> > stalls within 15 seconds.
> > 
> > The 'nokqueue' i usually try on my FreeBSD machine doesn't change anything.
> > 
> > Hope you have an idea where to look. Providing captures/logs is a bit
> > difficult without some careful scrubbing..
> > 
> > Regards,
> > PiBa-NL (Pieter)
> > 
> 
> No answer yet, but I just tried to do a checkout of the FreeBSD svn tree
> via haproxy, and it indeed doesn't work, it's a bit late right now, but
> I'll have a look tomorrow :)
> 

I think I figured it out, and commit 69664419d209d2f64dbcd90f991e7ec2efff2ee2
should fix it, at least now I can do a full svn checkout.

Can you confirm it does the trick for you ?

Thanks !

Olivier



Re: commit 493d9dc makes a SVN-checkout stall..

2020-03-24 Thread Olivier Houchard
Hi Pieter,

On Wed, Mar 25, 2020 at 01:14:46AM +0100, PiBa-NL wrote:
> Hi List, Willy,
> 
> Today i thought lets give v2.2-dev5 a try for my production environment ;).
> Soon it turned out to cause SVN-Checkout to stall/disconnect for a
> repository we run locally in a Collab-SVN server.
> 
> I tracked it down to this commit: 493d9dc (MEDIUM: mux-h1: do not blindly
> wake up the tasklet at end of request anymore) causing the problem for the
> first time. Is there something tricky there that can be suspected to cause
> the issue.? Perhaps a patch i can try?
> 
> While 'dissecting' the issue i deleted the whole directory each time and
> performed a new svn-checkout several times. It doesn't always stall at the
> exact same point but usually after checking out around +- 20 files something
> between 0.5 and 2 MB. , the commit before that one allows me to checkout
> 500+MB through haproxy without issue.. A wireshark seems to show that
> haproxy is sending several of RST,ACK packets for a 4 different connections
> to the svn-server at the same milisecond after it was quiet for 2 seconds..
> The whole issue happens in a timeframe of start of checkout till when it
> stalls within 15 seconds.
> 
> The 'nokqueue' i usually try on my FreeBSD machine doesn't change anything.
> 
> Hope you have an idea where to look. Providing captures/logs is a bit
> difficult without some careful scrubbing..
> 
> Regards,
> PiBa-NL (Pieter)
> 

No answer yet, but I just tried to do a checkout of the FreeBSD svn tree
via haproxy, and it indeed doesn't work, it's a bit late right now, but
I'll have a look tomorrow :)

Regards,

Olivier



Re: Clarification for ARM support

2020-03-09 Thread Olivier Houchard
Hi,

On Mon, Mar 09, 2020 at 10:33:55AM +0200, Emilio Fernandes wrote:
> Hello HAProxy team,
> 
> At https://www.haproxy.org/#plat I see that ARM architecture is supported
> for Linux with newer kernel.
> To avoid confusion could you please add next to it whether it is only 32
> bit, 64 bit or both ? Just like "x86, x86_64" before that.
> 
> Recently we started testing on ARM64 and so far it works great! Great job!
> Thank you!
> 

I guess this was written before arm64 was really a thing, but both 32-bit
and 64-bit ARM are actively supported.

Regards,

Olivier



Re: freebsd ci is broken - commit 08fa16e - curl download stalls in reg-tests/compression/lua_validation.vtc

2020-01-15 Thread Olivier Houchard
Hi guys,

On Tue, Jan 14, 2020 at 09:45:34PM +0100, Willy Tarreau wrote:
> Hi guys,
> 
> On Tue, Jan 14, 2020 at 08:02:51PM +0100, PiBa-NL wrote:
> > Below a part of the output that the test generates for me. The first curl
> > request seems to succeed, but the second one runs into a timeout..
> > When compiled with the commit before 08fa16e 
> > 
> 
> Ah, and unsurprizingly I'm the author :-/
> 
> I'm wondering why it only affects FreeBSD (very likely kqueue in fact, I
> suppose it works if you start with -dk). Maybe something subtle escaped
> me in the poller after the previous changes.
> 
> > Should i update to a newer FreeBSD version, or is it likely unrelated, and
> > in need of some developer attention.. Do you (Willy or anyone), need more
> > information from my side? Or is there a patch i can try to validate?
> 
> I don't think I need more info for now and your version has nothing to do
> with this (until proven otherwise). I apparently really broke something
> there. I think I have a FreeBSD VM somewhere, in the worst case I'll ask
> Olivier for some help :-)
> 

To give you a quick update, we are investigating this. I'm still not really
sure why it only affects FreeBSD, but we have fully understood the problem,
and it should be fixed by now.

Regards,

Olivier



Re: Connection reuse for HTTP/2?

2019-09-18 Thread Olivier Houchard
Hi David,

On Tue, Sep 17, 2019 at 04:58:28PM -0500, David Pirotte wrote:
> Hi all,
> 
> Is there a way to multiplex frontend HTTP/2 (GRPC) connections onto a shared 
> pool of backend HTTP/2 connections?
> 
> My testing shows that one frontend connection with multiple concurrent 
> requests will reuse one backend connection, but multiple frontend connections 
> will not share backend connections.
> 

HTTP/2 backend connections should be reused; however, be aware that there is
currently one connection pool per thread, so chances are, if you have 2 frontend
connections, they ended up on 2 different threads and so wouldn't reuse each
other's connection. I'm not sure how you ran your test, but using "nbthread 1"
should make connections be reused as you'd expect (or just having more frontend
connections than threads; at some point you should see some reuse).
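As a sketch (addresses, ports and certificate path are made up), a single-threaded setup in which all frontend H2 connections share one backend pool could look like:

```haproxy
global
    # one thread means one idle-connection pool, so H2 backend
    # connections can be reused across all frontend connections
    nbthread 1

defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 30s

frontend grpc_in
    bind :8443 ssl crt /etc/haproxy/cert.pem alpn h2
    default_backend grpc_out

backend grpc_out
    server app1 192.0.2.10:9443 ssl verify none alpn h2
```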

Regards,

Olivier




Re: [PATCH] BUG/MINOR: ssl: fix 0-RTT for BoringSSL

2019-08-07 Thread Olivier Houchard
On Wed, Aug 07, 2019 at 03:09:26PM +0200, Emmanuel Hocdet wrote:
> Hi,
> 
> Two patches to fix (and simplify) 0-RTT for BoringSSL.
> If you can consider them.
> 
> ++
> Manu
> 

Applied, thanks !

Olivier




Re: tfo on default-server settings

2019-07-04 Thread Olivier Houchard
Hi Fred,

On Thu, Jul 04, 2019 at 02:37:55PM +0200, Frederic Lecaille wrote:
> On 7/4/19 1:36 PM, Olivier Houchard wrote:
> > Hi William,
> > 
> > On Wed, Jul 03, 2019 at 04:52:14PM +, William Dauchy wrote:
> > > Hello,
> > > 
> > > On haproxy v2.0.x, while using tfo option in default-server:
> > >   default-server inter 5s fastinter 1s fall 3 slowstart 20s observe 
> > > layer7 error-limit 5 on-error fail-check pool-purge-delay 10s tfo
> > > we are getting:
> > >   'default-server inter' : 'tfo' option is not accepted in default-server 
> > > sections.
> > > 
> > > from https://cbonte.github.io/haproxy-dconv/2.0/configuration.html#5.2-tfo
> > > 
> > > Is tfo excluded from default-server options?
> > 
> > There's indeed no reason to disallow tfo on default-server, it was an
> > oversight, it is fixed in master with commit
> > 8d82db70a5f32427fb592a947d2a612bcb01d95e, and should be backported in 2.0
> > before the next release.
> > 
> > Regards,
> > 
> > Olivier
> > 
> 
> This makes me think we should perhaps add "no-tfo" option as this is already
> done for several other "server" options.
> 
> The attached patch should do the trick.
> 
> Fred.


Ooops, very good point.
I just pushed your patch, thanks a lot !
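With both patches in, a sketch of the resulting behaviour (backend name and addresses are made up) would be:

```haproxy
defaults
    # tfo is now accepted in default-server and applies to every server...
    default-server inter 5s fall 3 tfo

backend app
    server s1 192.0.2.5:443 check          # inherits tfo
    # ...and no-tfo lets a single server opt back out
    server s2 192.0.2.6:443 check no-tfo
```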

Olivier



Re: tfo on default-server settings

2019-07-04 Thread Olivier Houchard
Hi William,

On Wed, Jul 03, 2019 at 04:52:14PM +, William Dauchy wrote:
> Hello,
> 
> On haproxy v2.0.x, while using tfo option in default-server:
>  default-server inter 5s fastinter 1s fall 3 slowstart 20s observe layer7 
> error-limit 5 on-error fail-check pool-purge-delay 10s tfo
> we are getting:
>  'default-server inter' : 'tfo' option is not accepted in default-server 
> sections.
> 
> from https://cbonte.github.io/haproxy-dconv/2.0/configuration.html#5.2-tfo
> 
> Is tfo excluded from default-server options?

There's indeed no reason to disallow tfo on default-server; it was an
oversight. It is fixed in master with commit
8d82db70a5f32427fb592a947d2a612bcb01d95e, and should be backported to 2.0
before the next release.

Regards,

Olivier



Re: haproxy=2.0.1: socket leak

2019-06-28 Thread Olivier Houchard
Hi Maksim,

On Fri, Jun 28, 2019 at 04:09:48PM +0300, Максим Куприянов wrote:
> Hi!
> 
> I found out that in some situations under high rate of incoming connections
> haproxy=2.0.1 starts leaking sockets. It looks like haproxy doesn't close
> connections to its backends after request is finished (FIN received from
> client) thus leaving its server-sockets in close-wait state.
> 
> As an example this simple config starts leaking right from the start for
> me. And everything ends with messages: "local0.emerg Proxy
> server.local:18400 reached process FD limit (maxsock=4130). Please check
> 'ulimit-n' and restart."
> 
> My config look very similar to this:
> global
>   daemon
>   uid 120
>   gid 126
>   stats socket /var/run/haproxy.sock mode 700 level admin expose-fd
> listeners
>   master-worker
>   log 127.0.0.1:516 local0 warning
>   maxconn 2000
>   tune.ssl.default-dh-param 2048
> 
> defaults
>   log global
>   maxconn 4096
>   modetcp
>   retries 3
>   timeout client  1h
>   timeout connect 5s
>   timeout server  1h
>   option  redispatch
>   option  dontlognull
> 
> listen server.local:18400
>   bind ipv6@::1:18400 tfo
>   bind ipv4@127.0.0.1:18400 tfo
>   mode tcp
>   balance leastconn
> 
>   timeout server 24h
>   timeout client 24h
>   option  dontlog-normal
>   log 127.0.0.1:516 local1 info
>   option  httpchk GET /check HTTP/1.1\r\nHost:\ server.local
>   http-check send-state
>   http-check expect status 200
>   tcp-request inspect-delay 10s
>   tcp-request content reject if { nbsrv lt 1 }
>   default-server weight 50
>   server backend-server.local:17995 backend-server.local:17995 check port
> 17994
> 

What kind of requests do you do ? GET ? POST ? Others ?

Thanks !

Olivier



Re: Config Segmentation Fault [2.0.1]

2019-06-28 Thread Olivier Houchard
Hi Luke,

On Fri, Jun 28, 2019 at 07:05:32AM +0200, Luke Seelenbinder wrote:
> Hello all,
> 
> I've found a segfault in v2.0.1. I believe the issue is a no-ssl directive on 
> a server line after seeing check ssl on default-server in defaults. Here's 
> the snips of my config. I haven't been able to create a minimal config that 
> recreates it, since my config is rather complex.
> 
> defaults
>   log  global
>   mode http
>   default-server ca-file ca-certificates.crt resolvers default inter 5s 
> fastinter 2s downinter 10s init-addr libc,last check ssl check-alpn http/1.1 
> pool-purge-delay 60s max-reuse 1500 alpn http/1.1
> […snip…]
> backend varnish
>   server varnish_local   unix@/path-to-socket.sock no-check-ssl no-ssl
> 
> If I remove no-ssl, it starts up, but the check naturally fails. If I add it 
> back, I get a segmentation fault. I've tried this with and without unix 
> sockets to verify it wasn't something related to IP binding.
> 
> I'm happy to try alternatives / test things a bit.
> 
> Best,

Indeed, "check-alpn" failed to make sure we were really using a SSL connection
before attempting to change the ALPN. This should be fixed by commit
c50eb73b85f80ac1ac6e519fcab2ba6807f5de65, and should be backported to 2.0
soon.

Thanks a lot !

Olivier



Re: haproxy 2.0: SIGSEGV in ssl_subscribe

2019-06-25 Thread Olivier Houchard
Hi Maksim,

On Tue, Jun 25, 2019 at 01:29:24PM +0300, Максим Куприянов wrote:
> Hi!
> 
> Got SIGSEGV in ssl_subscribe function. Happens multiple times per day.
> Haproxy was built from trunk with commits upto:
> http://git.haproxy.org/?p=haproxy-2.0.git;a=commit;h=9eae8935663bc0b27c23018e8cc24ae9a3e31732
> 

It is believed to be fixed in master, and should be backported to 2.0 soon.
If you feel like it, you can apply commit
0ff28651c184f2b6cc7782b0960ed69cc907ca97, and let me know if you still have
issues or not.
You may want c31e2cbd28d53210b7184091bb64137c806d7957 too.

Regards,

Olivier



Re: Zero RTT in backend server side

2019-06-24 Thread Olivier Houchard
Hi Igor,

On Sun, Jun 23, 2019 at 08:42:46PM +0800, Igor Pav wrote:
> Hi Olivier,
> 
> The `retry-on 0rtt-rejected` will only work in tcp mode, is that
> possible to let it work in http mode too?
> 

It should work with HTTP too. What may happen is that you're using "alpn" on
the server line, and thus we have to wait until the handshake is done to
know whether we're using H1 or H2, so we can't send early data, because we
don't know its format yet.
If you only want to use H2, you can add "proto h2" on your server line, and
it should work.
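For example, a sketch of such a server line (address and port are placeholders):

```haproxy
backend tls
    mode http
    retry-on 0rtt-rejected conn-failure
    # "proto h2" pins the mux to H2 up front, so haproxy knows the
    # request's wire format before the TLS handshake finishes and can
    # send it as 0-RTT early data; "alpn h2" alone forces it to wait
    # for the handshake, defeating early data
    server default 192.0.2.1:4433 ssl allow-0rtt proto h2 verify none
```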

Regards,

Olivier



Re: ea8dd949e4ab7ddd94afdbf0e96087c883192217 breaks allow-0rtt

2019-06-15 Thread Olivier Houchard
Hi Igor,

On Sat, Jun 15, 2019 at 07:19:24PM +0800, Igor Pav wrote:
> Hi Olivier,
> 
> Still suffering from 2.0-dev7-b6563f-41 :(
> 

I can't seem to reproduce it.
I found a potential issue, and 965e84e2df7ba448d887bd5e9e03d76b206d3eee may
help.
If it does not, how do you reproduce it ? Which client and which server do
you use ?

Thanks !

Olivier



Re: ea8dd949e4ab7ddd94afdbf0e96087c883192217 breaks allow-0rtt

2019-06-15 Thread Olivier Houchard
Hi Igor,

On Sat, Jun 15, 2019 at 03:00:23AM +0800, Igor Pav wrote:
> Hello, dev
> 
> The commit of ea8dd949e4ab7ddd94afdbf0e96087c883192217 seems to break
> the allow-0rtt in server line, a connection will take very very long
> to complete. Remove allow-0rtt it turns normal.
> 
> conf like:
> 
> listen test
>  mode tcp
>  bind 0.0.0.0:88
>  default_backend tls
> backend tls
>   mode tcp
>   retry-on 0rtt-rejected conn-failure empty-response response-timeout
>   server default x.x.x.x:4433 allow-0rtt ssl check inter 300s alpn h2
> verify none
> 

Indeed, it was not working as expected when specifying the ALPN. I believe
this is now fixed, can you confirm it ?

Thanks !

Olivier



Re: 2.0-dev5-ea8dd94 - conn_fd_handler() - dumps core - Program terminated with signal 11, Segmentation fault.

2019-06-07 Thread Olivier Houchard
On Thu, Jun 06, 2019 at 08:33:26PM +0200, PiBa-NL wrote:
> Hi Olivier,
> 
> Op 6-6-2019 om 18:20 schreef Olivier Houchard:
> > Hi Pieter,
> > 
> > On Wed, Jun 05, 2019 at 09:00:22PM +0200, PiBa-NL wrote:
> > > Hi Olivier,
> > > 
> > > It seems this commit ea8dd94  broke something for my FreeBSD11 system.
> > > Before that commit (almost) all vtest's succeed. After it several cause
> > > core-dumps. (and keep doing that including the current HEAD: 03abf2d )
> > > 
> > > Can you take a look at the issue?
> > > 
> > > Below in this mail are the following:
> > > - gdb# bt full  of one of the crashed tests..
> > > - summary of failed tests
> > > 
> > > Regards,
> > > PiBa-NL (Pieter)
> > > 
> > Indeed, there were a few issues. I now pushed enough patches so that I only
> > fail one reg test, which also failed before the offending commit
> > (reg-tests/compression/basic.vtc).
> > Can you confirm it's doing better for you too ?
> > 
> > Thanks !
> > 
> > Olivier
> 
> Looks better for me :).
> 
> Testing with haproxy version: 2.0-dev5-7b3a79f
> 0 tests failed, 0 tests skipped, 36 tests passed
> 
> This includes the /compression/basic.vtc for me.
> 
> p.s.
> This result doesn't "always" happen. But at least it seems 'just as good' as
> before ea8dd94. For example i still see this on my tests:
> 1 tests failed, 0 tests skipped, 35 tests passed
> ## Gathering results ##
> ## Test case: 
> ./work/haproxy-7b3a79f/reg-tests/http-rules/converters_ipmask_concat_strcmp_field_word.vtc
> ##
> ## test results in:
> "/tmp/haregtests-2019-06-06_20-22-18.5Z9PR6/vtc.4579.056b0e93"
>  c1    0.3 EXPECT resp.status (504) == "200" failed
> But then another test run of the same binary again says '36 passed'.. so it
> seems some tests are rather timing sensitive, or maybe a other variable
> doesn't play nice.. Anyhow the core-dump as reported is fixed. Ill try and
> find why the testresults are a bit inconsistent when running them
> repeatedly.. Anyhow ill send a new mail for that if i find something
> conclusive :).
> 

Hi Pieter,

Yeah, I've had this one randomly failing for quite some time, on both Linux
and FreeBSD; I haven't bothered to investigate yet.

Regards,

Olivier



Re: 2.0-dev5-ea8dd94 - conn_fd_handler() - dumps core - Program terminated with signal 11, Segmentation fault.

2019-06-06 Thread Olivier Houchard
Hi Pieter,

On Wed, Jun 05, 2019 at 09:00:22PM +0200, PiBa-NL wrote:
> Hi Olivier,
> 
> It seems this commit ea8dd94  broke something for my FreeBSD11 system.
> Before that commit (almost) all vtest's succeed. After it several cause
> core-dumps. (and keep doing that including the current HEAD: 03abf2d )
> 
> Can you take a look at the issue?
> 
> Below in this mail are the following:
> - gdb# bt full  of one of the crashed tests..
> - summary of failed tests
> 
> Regards,
> PiBa-NL (Pieter)
> 

Indeed, there were a few issues. I now pushed enough patches so that I only
fail one reg test, which also failed before the offending commit
(reg-tests/compression/basic.vtc).
Can you confirm it's doing better for you too ?

Thanks !

Olivier



Re: Zero RTT in backend server side

2019-05-15 Thread Olivier Houchard
Hi William,

On Wed, May 15, 2019 at 01:10:37PM +0200, William Dauchy wrote:
> Hello Olivier,
> 
> In another subject related to 0rtt was wondering why it was not
> available in ssl-default-bind-options?
> 

We usually only add options to ssl-default-bind-options that can later be
overridden on a per-bind basis, and right now there's no option to disable
0-RTT.

Regards,

Olivier



Re: [1.9 HEAD] HAProxy using 100% CPU

2019-05-10 Thread Olivier Houchard
Hi Maciej,

On Thu, May 09, 2019 at 07:25:54PM +0200, Maciej Zdeb wrote:
> Hi again,
> 
> I have bad news, HAProxy 1.9.7-35b44da still looping :/
> 
> gdb session:
> h2_process_mux (h2c=0x1432420) at src/mux_h2.c:2609
> 2609list_for_each_entry_safe(h2s, h2s_back, >send_list, list) {
> (gdb) n
> 2610if (h2c->st0 >= H2_CS_ERROR || h2c->flags &
> H2_CF_MUX_BLOCK_ANY)
> (gdb)
> 2609list_for_each_entry_safe(h2s, h2s_back, >send_list, list) {
> (gdb)
> 2610if (h2c->st0 >= H2_CS_ERROR || h2c->flags &
> H2_CF_MUX_BLOCK_ANY)
> (gdb)
> 2609list_for_each_entry_safe(h2s, h2s_back, >send_list, list) {
> (gdb)
> 2613if (!LIST_ISEMPTY(>sending_list))
> (gdb)
> 2619if (!h2s->send_wait) {
> (gdb)
> 2620LIST_DEL_INIT(>list);
> (gdb)
> 2609list_for_each_entry_safe(h2s, h2s_back, >send_list, list) {
> (gdb)
> 2610if (h2c->st0 >= H2_CS_ERROR || h2c->flags &
> H2_CF_MUX_BLOCK_ANY)
> (gdb)
> 2609list_for_each_entry_safe(h2s, h2s_back, >send_list, list) {
> (gdb)
> 2610if (h2c->st0 >= H2_CS_ERROR || h2c->flags &
> H2_CF_MUX_BLOCK_ANY)
> (gdb)
> 2609list_for_each_entry_safe(h2s, h2s_back, >send_list, list) {
> (gdb)
> 2613if (!LIST_ISEMPTY(>sending_list))
> (gdb)
> 2619if (!h2s->send_wait) {
> (gdb)
> 2620LIST_DEL_INIT(>list);
> (gdb)
> 2609list_for_each_entry_safe(h2s, h2s_back, >send_list, list) {
> (gdb)
> 2610if (h2c->st0 >= H2_CS_ERROR || h2c->flags &
> H2_CF_MUX_BLOCK_ANY)
> (gdb)
> 2609list_for_each_entry_safe(h2s, h2s_back, >send_list, list) {
> (gdb)
> 2610if (h2c->st0 >= H2_CS_ERROR || h2c->flags &
> H2_CF_MUX_BLOCK_ANY)
> (gdb)
> 2609list_for_each_entry_safe(h2s, h2s_back, >send_list, list) {
> (gdb)
> 2613if (!LIST_ISEMPTY(>sending_list))
> (gdb)
> 2619if (!h2s->send_wait) {
> (gdb)
> 2620LIST_DEL_INIT(>list);
> (gdb)
> 2609list_for_each_entry_safe(h2s, h2s_back, >send_list, list) {
> (gdb) p *h2s
> $1 = {cs = 0x1499030, sess = 0x819580 , h2c = 0x1432420, h1m
> = {state = H1_MSG_DONE, flags = 29, curr_len = 0, body_len = 111976, next =
> 411, err_pos = -1, err_state = 0}, by_id = {node = {branches = {b =
> {0x13dcf50,
>   0x15b3120}}, node_p = 0x125ab90, leaf_p = 0x15b3121, bit = 1, pfx
> = 0}, key = 11}, id = 11, flags = 28675, mws = 977198, errcode =
> H2_ERR_NO_ERROR, st = H2_SS_CLOSED, status = 200, body_len = 0, rxbuf =
> {size = 0, area = 0x0,
> data = 0, head = 0}, wait_event = {task = 0x15077a0, handle = 0x0,
> events = 0}, recv_wait = 0x0, send_wait = 0x0, list = {n = 0x15b31a8, p =
> 0x15b31a8}, sending_list = {n = 0x15b31b8, p = 0x15b31b8}}
> (gdb) p *h2s_back
> $2 = {cs = 0x1499030, sess = 0x819580 , h2c = 0x1432420, h1m
> = {state = H1_MSG_DONE, flags = 29, curr_len = 0, body_len = 111976, next =
> 411, err_pos = -1, err_state = 0}, by_id = {node = {branches = {b =
> {0x13dcf50,
>   0x15b3120}}, node_p = 0x125ab90, leaf_p = 0x15b3121, bit = 1, pfx
> = 0}, key = 11}, id = 11, flags = 28675, mws = 977198, errcode =
> H2_ERR_NO_ERROR, st = H2_SS_CLOSED, status = 200, body_len = 0, rxbuf =
> {size = 0, area = 0x0,
> data = 0, head = 0}, wait_event = {task = 0x15077a0, handle = 0x0,
> events = 0}, recv_wait = 0x0, send_wait = 0x0, list = {n = 0x15b31a8, p =
> 0x15b31a8}, sending_list = {n = 0x15b31b8, p = 0x15b31b8}}
> (gdb) p *h2c
> $3 = {conn = 0x17e3310, st0 = H2_CS_FRAME_H, errcode = H2_ERR_NO_ERROR,
> flags = 0, streams_limit = 100, max_id = 13, rcvd_c = 0, rcvd_s = 0, ddht =
> 0x1e99a40, dbuf = {size = 0, area = 0x0, data = 0, head = 0}, dsi = 13, dfl
> = 4,
>   dft = 8 '\b', dff = 0 '\000', dpl = 0 '\000', last_sid = -1, mbuf = {size
> = 16384, area = 0x1e573a0 "", data = 13700, head = 0}, msi = -1, mfl = 0,
> mft = 0 '\000', mff = 0 '\000', miw = 65535, mws = 10159243, mfs = 16384,
>   timeout = 2, shut_timeout = 2, nb_streams = 2, nb_cs = 3,
> nb_reserved = 0, stream_cnt = 7, proxy = 0xb85fc0, task = 0x126aa30,
> streams_by_id = {b = {0x125ab91, 0x0}}, send_list = {n = 0x15b31a8, p =
> 0x125ac18}, fctl_list = {
> n = 0x14324f8, p = 0x14324f8}, sending_list = {n = 0x1432508, p =
> 0x1432508}, buf_wait = {target = 0x0, wakeup_cb = 0x0, list = {n =
> 0x1432528, p = 0x1432528}}, wait_event = {task = 0x1420fa0, handle = 0x0,
> events = 1}}
> (gdb) p list
> $4 = (int *) 0x0
> (gdb) n
> 2610if (h2c->st0 >= H2_CS_ERROR || h2c->flags &
> H2_CF_MUX_BLOCK_ANY)
> (gdb) n
> 2609list_for_each_entry_safe(h2s, h2s_back, >send_list, list) {
> (gdb) p *h2s
> $5 = {cs = 0x1499030, sess = 0x819580 , h2c = 0x1432420, h1m
> = {state = H1_MSG_DONE, flags = 29, curr_len = 0, body_len = 111976, next =
> 411, err_pos = -1, err_state = 0}, by_id = {node = {branches = {b =
> {0x13dcf50,
>   0x15b3120}}, node_p = 0x125ab90, leaf_p = 0x15b3121, bit = 1, pfx
> = 0}, key = 11}, id = 11, flags = 28675, 

Re: [1.9 HEAD] HAProxy using 100% CPU

2019-05-08 Thread Olivier Houchard
On Wed, May 08, 2019 at 02:30:07PM +0200, Willy Tarreau wrote:
> On Wed, May 08, 2019 at 01:56:05PM +0200, Olivier Houchard wrote:
> > > > One of processes stuck in infinite loop, admin socket is not responsive 
> > > > so
> > > > I've got information only from gdb:
> > > > 
> > > > 0x00484ab8 in h2_process_mux (h2c=0x2e8ff30) at 
> > > > src/mux_h2.c:2589
> > > > 2589if (h2s->send_wait->events & SUB_CALL_UNSUBSCRIBE)
> > > > (gdb) n
> > > (...)
> > > 
> > > CCing Olivier. Olivier, I'm wondering if this is not directly related to
> > > what you addressed with this fix merged in 2.0 but not backported :
> > > 
> > >   998410a ("BUG/MEDIUM: h2: Revamp the way send subscriptions works.")
> > > 
> > > From what I'm seeing there's no error, the stream is in the sending list,
> > > there's no blocking flag, well, everything looks OK, but we're looping
> > > on SUB_CALL_UNSUBSCRIBE which apprently should not if I understand it
> > > right. Do you think we should backport this patch ?
> > > 
> > > Remaining of the trace below for reference.
> > > 
> > > THanks,
> > > Willy
> > > 
> > 
> > I can't seem to remember :)
> 
> Given the number of bugs we've dealt with in the last few weeks, you're
> forgiven :-)
> 

I'm afraid I'm getting old :/

> > I remember a bug involving the h2s being added to the send_list twice, and
> > thus leading to an infinite loop when parsing the send_list, but it was
> > fixed by 9a0f559676c4d309edbe42ba33197e7dd8935f1c, and backported.
> > For me, this one was mostly to make me happier about the code, and stop
> > abusing SUB_CALL_UNSUBSCRIBE, and if I didn't mention it should be 
> > backported
> > it's probably I considered the code in 1.9 was OK (or I just forgot), but
> > it's still flagged as "BUG" too (but that could be me having "BUG/MEDIUM"
> > hardcoded in my finger).
> > I think that patch is safe to backport anyway, but I wouldn't swear it 
> > solves
> > Maciej's issue.
> 
> OK. I'll take it then so that we can study the situation under better
> conditions if the problem happens again. It may be possible that we've
> also overlooked other situations where this infinite loop could happen
> with SUB_CALL_UNSUBSCRIBE and that were addressed by your patch.
> 

Yes, it is certainly possible it helps, I just can't affirm it with confidence.

Regards,

Olivier



Re: [1.9 HEAD] HAProxy using 100% CPU

2019-05-08 Thread Olivier Houchard
Hi,

On Wed, May 08, 2019 at 07:11:27AM +0200, Willy Tarreau wrote:
> Hi Maciej,
> 
> On Tue, May 07, 2019 at 07:08:47PM +0200, Maciej Zdeb wrote:
> > Hi,
> > 
> > I've got another bug with 100% CPU on HAProxy process, it is built from
> > HEAD of 1.9 branch.
> > 
> > One of processes stuck in infinite loop, admin socket is not responsive so
> > I've got information only from gdb:
> > 
> > 0x00484ab8 in h2_process_mux (h2c=0x2e8ff30) at src/mux_h2.c:2589
> > 2589        if (h2s->send_wait->events & SUB_CALL_UNSUBSCRIBE)
> > (gdb) n
> (...)
> 
> CCing Olivier. Olivier, I'm wondering if this is not directly related to
> what you addressed with this fix merged in 2.0 but not backported :
> 
>   998410a ("BUG/MEDIUM: h2: Revamp the way send subscriptions works.")
> 
> From what I'm seeing there's no error, the stream is in the sending list,
> there's no blocking flag, well, everything looks OK, but we're looping
> on SUB_CALL_UNSUBSCRIBE which apparently should not if I understand it
> right. Do you think we should backport this patch ?
> 
> Remainder of the trace below for reference.
> 
> Thanks,
> Willy
> 

I can't seem to remember :)
I remember a bug involving the h2s being added to the send_list twice, and
thus leading to an infinite loop when parsing the send_list, but it was
fixed by 9a0f559676c4d309edbe42ba33197e7dd8935f1c, and backported.
For me, this one was mostly to make me happier about the code, and stop
abusing SUB_CALL_UNSUBSCRIBE, and if I didn't mention it should be backported
it's probably because I considered the code in 1.9 was OK (or I just forgot), but
it's still flagged as "BUG" too (but that could be me having "BUG/MEDIUM"
hardcoded in my fingers).
I think that patch is safe to backport anyway, but I wouldn't swear it solves
Maciej's issue.

Regards,

Olivier



Re: recent LibreSSL regressions

2019-05-06 Thread Olivier Houchard
Hi Ilya,

On Mon, May 06, 2019 at 12:54:56AM +0500, Илья Шипицин wrote:
> hello,
> 
> when I first sent LibreSSL patches (it was 27th April 2019), reg-tests were
> ok.
> I suspect recent 0RTT patches could break LibreSSL things
> 
> can someone have a look ?
> 
> https://travis-ci.org/chipitsine/haproxy-1/builds/528535120
> 
> thanks!

You're right indeed, it was definitely broken for libressl, it should be
fixed with commit 4cd2af4e5d785c42d2924492df987a7cd5832e23

Regards,

Olivier



Re: Zero RTT in backend server side

2019-05-05 Thread Olivier Houchard
Hi Igor,

On Mon, May 06, 2019 at 12:26:33AM +0800, Igor Pav wrote:
> Hi, Olivier, thanks for the effort. So can we force the server always
> to carry data to remote via 0RTT like below scenario(to protect
> http2http in unsecured env)?
> 
> listen http -- server default x.x ssl allow-0rtt (SSL) bind
> x.x ssl allow-0rtt -- http backend
> 

As it is currently, no. Haproxy will never attempt to use 0RTT on server
connections if the client didn't use 0RTT.
2.0, however, which should be released in a not too distant future, will let
you do that, with the new "retry-on" feature.

Regards,

Olivier




Re: Zero RTT in backend server side

2019-05-02 Thread Olivier Houchard
Hi Igor,

On Thu, May 02, 2019 at 08:39:58PM +0800, Igor Pav wrote:
> Hello, can we use TLS zero RTT in server-side now? Just want to reduce
> more latency when using SSL talk to the backend servers(also running
> haproxy).
> 
> Thanks in advance. Regards
> 

It should work if you add "allow-0rtt" on your server line. However it hasn't
been tested for some time, and was written with a development version of
OpenSSL 1.1.1, so I wouldn't be entirely surprised if it didn't work anymore.

Regards,

Olivier



Re: PATCH: fix typo

2019-04-30 Thread Olivier Houchard
Hi Ilya,

On Tue, Apr 30, 2019 at 09:23:21PM +0500, Илья Шипицин wrote:
> Hello,
> 
> typo was found by cppcheck
> 
> thanks!
> Ilya Shipitsin

> From 62c6b78843e9c1b6f829872a54175f68332a2859 Mon Sep 17 00:00:00 2001
> From: Ilya Shipitsin 
> Date: Tue, 30 Apr 2019 21:21:28 +0500
> Subject: [PATCH] MINOR: fix typo "src" instead of "srv"
> 
> ---
>  src/server.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/server.c b/src/server.c
> index 12a14adc..9916361d 100644
> --- a/src/server.c
> +++ b/src/server.c
> @@ -1639,7 +1639,7 @@ static void srv_settings_cpy(struct server *srv, struct 
> server *src, int srv_tmp
>   srv->check.port   = src->check.port;
>   srv->check.sni= src->check.sni;
>   srv->check.alpn_str   = src->check.alpn_str;
> - srv->check.alpn_len   = srv->check.alpn_len;
> + srv->check.alpn_len   = src->check.alpn_len;
>   /* Note: 'flags' field has potentially been already initialized. */
>   srv->flags   |= src->flags;
>   srv->do_check = src->do_check;

Oops, that seems right, it's been pushed, thanks a lot !

Olivier



Re: [1.9.7] One of haproxy processes using 100% CPU

2019-04-30 Thread Olivier Houchard
Hi Maciej,

On Tue, Apr 30, 2019 at 08:43:07AM +0200, Maciej Zdeb wrote:
> Filtered results from show fd for that particular virtual server:
> 
> 10 : st=0x22(R:pRa W:pRa) ev=0x01(heopI) [lc] cnext=-3 cprev=-2 tmask=0x1
> umask=0x0 owner=0x53a5690 iocb=0x59d689(conn_fd_handler) back=0
> cflg=0x80243300 fe=virtual-server_front mux=H2 ctx=0x6502860 h2c.st0=2
> .err=0 .maxid=17 .lastid=-1 .flg=0x1 .nbst=0 .nbcs=1 .fctl_cnt=0
> .send_cnt=0 .tree_cnt=1 .orph_cnt=0 .sub=0 .dsi=13 .dbuf=0@(nil)+0/0
> .msi=-1 .mbuf=0@(nil)+0/0 last_h2s=0x5907040 .id=13 .flg=0x4005
> .rxbuf=0@(nil)+0/0
> .cs=0x905b1b0 .cs.flg=0x00106a00 .cs.data=0x5d1d228
> 98 : st=0x22(R:pRa W:pRa) ev=0x01(heopI) [lc] cnext=-1809 cprev=-2
> tmask=0x1 umask=0x0 owner=0xa3bb7f0 iocb=0x59d689(conn_fd_handler) back=0
> cflg=0x80201300 fe=virtual-server_front mux=H2 ctx=0xa71f310 h2c.st0=3
> .err=0 .maxid=0 .lastid=-1 .flg=0x0008 .nbst=0 .nbcs=0 .fctl_cnt=0
> .send_cnt=0 .tree_cnt=0 .orph_cnt=0 .sub=0 .dsi=3
> .dbuf=16384@0x5873f10+61/16384
> .msi=-1 .mbuf=0@(nil)+0/0

I see that it seems to be HTTP/2. Not saying it's your problem, but a bug
that would cause haproxy to use 100% of the CPU has been fixed in the HTTP/2
code just after the 1.9.7 release was done.
Any chance you can see if it still happens with that commit :
commit c980b511bfef566e9890eb9a06d607c193d63828
Author: Willy Tarreau 
Date:   Mon Apr 29 10:20:21 2019 +0200

BUG/MEDIUM: mux-h2: properly deal with too large headers frames

Regards,

Olivier



Re: Infinite loop after 39cc020af BUG/MEDIUM: streams: Don't remove the SI_FL_ERR flag in si_update_both()

2019-04-12 Thread Olivier Houchard
Hi,

On Fri, Apr 12, 2019 at 08:37:10AM +0200, Maciej Zdeb wrote:
> Hi Richard,
> 
> Those patches from Olivier (in streams) are related to my report from
> thread "[1.9.6] One of haproxy processes using 100% CPU", but as it turned
> out it wasn't a single bug and issue is not entirely fixed yet.
> 
> Currently I'm testing some additional patches from Olivier which hopefully
> fix the issue definitely.
> 

Indeed, the removal of SI_FL_ERR in si_update_both() was bogus, and covered
misuses of it.
With the great help of Maciej, we investigated this, and I just pushed what
we fixed so far. Richard, I'd be really interested in knowing if you still
have issues with the latest master.

Thanks !

Olivier

> On Fri, 12 Apr 2019 at 00:01, Richard Russo wrote:
> 
> > It seems that after applying 39cc020af, if a stream gets the SI_FL_ERR
> > flag, process_stream can keep going back to redo around stream.c:line 2503:
> >
> > if (si_f->state == SI_ST_DIS || si_f->state != si_f_prev_state ||
> > si_b->state == SI_ST_DIS || si_b->state != si_b_prev_state ||
> > ((si_f->flags | si_b->flags) & SI_FL_ERR) ||
> > (((req->flags ^ rqf_last) | (res->flags ^ rpf_last)) &
> > CF_MASK_ANALYSER))
> >  goto redo;
> >
> > Now that si_update_both no longer clears the SI_FL_ERR flag, and nothing
> > else does, the goto will get called forever. I don't understand this
> > section enough to try to reproduce this, but I found several processes
> > stuck here on a machine testing from yesterday's HEAD.
> >
> > Richard
> >
> > --
> >   Richard Russo
> >   to...@enslaves.us
> >
> >



Re: [1.9.6] One of haproxy processes using 100% CPU

2019-04-05 Thread Olivier Houchard
Hi Maciej,

On Fri, Apr 05, 2019 at 01:33:29PM +0200, Maciej Zdeb wrote:
> I think I found something, please look at the session 0x2110edc0 and
> "calls", it never ends:
> 
> socat /run/haproxy/haproxy2.sock - <<< "show sess all"
> 0x2110edc0: [05/Apr/2019:12:03:32.141927] id=14505 proto=tcpv4
> source=S.S.S.S:52414
>   flags=0x4504e, conn_retries=3, srv_conn=0xe2e7fb0, pend_pos=(nil)
> waiting=0
>   frontend=dev_metrics_5044_front (id=1015 mode=tcp), listener=? (id=2)
> addr=B.B.B.B:5044
>   backend=B.B.B.B:5044_back (id=1016 mode=tcp) addr=X.X.X.X:16222
>   server=slot_5_1 (id=34) addr=N.N.N.N:31728
>   task=0x21105f90 (state=0x00 nice=0 calls=-1222180819 exp= tmask=0x1
> age=1h6m)
>   si[0]=0x2110f058 (state=EST flags=0x4 endp0=CS:0x210fe2a0 exp=
> et=0x000 sub=0)
>   si[1]=0x2110f098 (state=EST flags=0x80010 endp1=CS:0x2109eb10 exp=
> et=0x200 sub=0)
>   co0=0x210f8040 ctrl=tcpv4 xprt=RAW mux=PASS data=STRM
> target=LISTENER:0xe25a290
>   flags=0x00283300 fd=2404 fd.state=22 fd.cache=0 updt=0 fd.tmask=0x1
>   cs=0x210fe2a0 csf=0x0640 ctx=0x210fe2c0
>   co1=0x20d84cb0 ctrl=tcpv4 xprt=RAW mux=PASS data=STRM
> target=SERVER:0xe2e7fb0
>   flags=0x003c3310 fd=1959 fd.state=22 fd.cache=0 updt=0 fd.tmask=0x1
>   cs=0x2109eb10 csf=0x1000 ctx=(nil)
>   req=0x2110edd0 (f=0x848800 an=0x0 pipe=0 tofwd=-1 total=38767)
>   an_exp= rex= wex=?
>   buf=0x2110edd8 data=0x20e8a7e0 o=15360 p=15360 req.next=0 i=0
> size=16384
>   res=0x2110ee30 (f=0x8004a028 an=0x0 pipe=0 tofwd=0 total=66)
>   an_exp= rex= wex=
>   buf=0x2110ee38 data=(nil) o=0 p=0 rsp.next=0 i=0 size=0
> 0x20eec3f0: [05/Apr/2019:13:18:24.444808] id=29808 proto=unix_stream
> source=unix:2
>   flags=0x8, conn_retries=0, srv_conn=(nil), pend_pos=(nil) waiting=0
>   frontend=GLOBAL (id=0 mode=tcp), listener=? (id=2) addr=unix:2
>   backend= (id=-1 mode=-)
>   server= (id=-1)
>   task=0x21028880 (state=0x00 nice=-64 calls=1 exp=30s tmask=0x1 age=?)
>   si[0]=0x20eec688 (state=EST flags=0x28 endp0=CS:0x20cd4600
> exp= et=0x000 sub=1)
>   si[1]=0x20eec6c8 (state=EST flags=0x204018 endp1=APPCTX:0x20cdfe60
> exp= et=0x000 sub=0)
>   co0=0x210af6a0 ctrl=unix_stream xprt=RAW mux=PASS data=STRM
> target=LISTENER:0x11b7d20
>   flags=0x00203306 fd=10 fd.state=25 fd.cache=0 updt=0 fd.tmask=0x1
>   cs=0x20cd4600 csf=0x0200 ctx=0x20cd4620
>   app1=0x20cdfe60 st0=7 st1=0 st2=3 applet= tmask=0x1, nice=-64,
> calls=2, cpu=0, lat=0
>   req=0x20eec400 (f=0xc48202 an=0x0 pipe=0 tofwd=-1 total=14)
>   an_exp= rex=30s wex=
>   buf=0x20eec408 data=0x20e2a530 o=0 p=0 req.next=0 i=0 size=16384
>   res=0x20eec460 (f=0x80008002 an=0x0 pipe=0 tofwd=-1 total=1465)
>   an_exp= rex= wex=
>   buf=0x20eec468 data=0x20d5bbd0 o=1465 p=1465 rsp.next=0 i=0 size=16384
> 

This is indeed very strange. I'm quite interested in seeing the output of
"show fd", as you mentioned you had it.

Thanks !

Olivier



Re: 1.9.5: SIGSEGV in wake_srv_chk under heavy load

2019-03-28 Thread Olivier Houchard
Hi Maksim,

On Thu, Mar 28, 2019 at 04:36:16PM +0300,  ?? wrote:
> Hi!
> 
> We accidentally got a spike of client requests to our site and under that
> heavy load to ssl-protected backends haproxy=1.9.5 had fallen down :(
> 

I think I understand how it may have happened, and hopefully fixed it with
commit to master 06f6811d9fc3f36597b686e4ca9fe7fbccf091b0
It will be backported to 1.9 later, but if you need it right now, you can
probably apply it on 1.9 as-is.

Thanks a lot for reporting !

Olivier



Re: [PATCH]: BUILD/MINOR listener

2019-03-27 Thread Olivier Houchard
Hi David,

On Wed, Mar 27, 2019 at 04:13:28PM +, David CARLIER wrote:
> Hi Here a little patch proposal.
> 
> Kind regards.

Sounds good ! I pushed it.

Thanks a lot !

Olivier



Re: [Potential Spoof] [PATCH] BUG/MAJOR: fd/threads, task/threads: ensure all spin locks are unlocked

2019-02-25 Thread Olivier Houchard
Hi Richard,

On Wed, Feb 20, 2019 at 11:58:42PM +, Richard Russo wrote:
> While continuing to test this, I ended up with a crash in 
> listener.c:listener_accept on a closed/closing listen socket where 
> fdtab[fd].owner is NULL by the time the thread gets there. This is possible 
> because the fd.c: fdlist_process_cached_events unlocks the spinlock before 
> calling fdtab[fd].iocb(fd); allowing for a concurrent close.
> 
> I'm not sure if the right thing to do is to hold the FD's spinlock while 
> calling the callback, consider a read/write style lock for listen FDs, or 
> simply adding a check that l is not NULL in listener_accept similar to the 
> one in connection.c:conn_fd_handle. I'm testing the NULL check right now, 
> since it's simplest. I haven't had a crash since, but that doesn't mean it 
> won't crash.  This change would need to be applied to 1.8 as well, although I 
> haven't managed to observe a crash on 1.8.

I think you're right on all accounts.
I've pushed your patch, and after talking with Willy, came to the conclusion
checking if the listener is NULL before using it should be enough, so I
pushed a patch that did just that as well.
Can you confirm that your problems are gone in master ?

Thanks a lot !

Olivier




Re: Compilation fails on OS-X

2019-02-14 Thread Olivier Houchard
Hi Patrick,

On Thu, Feb 14, 2019 at 09:12:18AM -0500, Patrick Hemmer wrote:
> 
> 
> On 2019/2/14 08:20, Frederic Lecaille wrote:
> > On 2/14/19 1:32 PM, Frederic Lecaille wrote:
> >> On 2/13/19 7:30 PM, Patrick Hemmer wrote:
> >>>
> >>>
> >>> On 2019/2/13 10:29, Olivier Houchard wrote:
> >>>> Hi Patrick,
> >>>>
> >>>> On Wed, Feb 13, 2019 at 10:01:01AM -0500, Patrick Hemmer wrote:
> >>>>> On 2019/2/13 09:40, Aleksandar Lazic wrote:
> >>>>>> Am 13.02.2019 um 14:45 schrieb Patrick Hemmer:
> >>>>>>> Trying to compile haproxy on my local machine for testing
> >>>>>>> purposes and am
> >>>>>>> running into the following:
> >>>>>> Which compiler do you use?
> >>>>> # gcc -v
> >>>>> Configured with:
> >>>>> --prefix=/Applications/Xcode.app/Contents/Developer/usr
> >>>>> --with-gxx-include-dir=/usr/include/c++/4.2.1
> >>>>> Apple LLVM version 9.0.0 (clang-900.0.39.2)
> >>>>> Target: x86_64-apple-darwin17.7.0
> >>>>> Thread model: posix
> >>>>> InstalledDir:
> >>>>> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
> >>>>>
> >>>>>
> >>>>>>>  # make TARGET=osx
> >>>>>>>  src/proto_http.c:293:1: error: argument to 'section'
> >>>>>>> attribute is not
> >>>>>>> valid for this target: mach-o section specifier requires a
> >>>>>>> segment and section
> >>>>>>> separated by a comma
> >>>>>>>  DECLARE_POOL(pool_head_http_txn, "http_txn",
> >>>>>>> sizeof(struct http_txn));
> >>>>>>>  ^
> >>>>>>>  include/common/memory.h:128:2: note: expanded from
> >>>>>>> macro 'DECLARE_POOL'
> >>>>>>>  REGISTER_POOL(, name, size)
> >>>>>>>  ^
> >>>>>>>  include/common/memory.h:123:2: note: expanded from
> >>>>>>> macro 'REGISTER_POOL'
> >>>>>>>  INITCALL3(STG_POOL,
> >>>>>>> create_pool_callback, (ptr), (name),
> >>>>>>> (size))
> >>>>>>>  ^
> >>>>>>>  include/common/initcall.h:102:2: note: expanded from
> >>>>>>> macro 'INITCALL3'
> >>>>>>>  _DECLARE_INITCALL(stage, __LINE__,
> >>>>>>> function, arg1, arg2,
> >>>>>>> arg3)
> >>>>>>>  ^
> >>>>>>>  include/common/initcall.h:78:2: note: expanded from macro
> >>>>>>> '_DECLARE_INITCALL'
> >>>>>>>  __DECLARE_INITCALL(__VA_ARGS__)
> >>>>>>>  ^
> >>>>>>>  include/common/initcall.h:65:42: note: expanded from macro
> >>>>>>> '__DECLARE_INITCALL'
> >>>>>>> __attribute__((__used__,__section__("init_"#stg))) =   \
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Issue occurs on master, and the 1.9 branch
> >>>>>>>
> >>>>>>> -Patrick
> >>>> Does the (totally untested, because I have no Mac to test) patch
> >>>> work for
> >>>> you ?
> >>>
> >>> Unfortunately not. Just introduces a lot of new errors:
> >>>
> >>>
> >>>  In file included from src/ev_poll.c:22:
> >>>  In file included from include/common/hathreads.h:26:
> >>>  include/common/initcall.h:134:22: error: expected ')'
> >>>  DECLARE_INIT_SECTION(STG_PREPARE);
> >>>   ^
> >>>  include/common/initcall.h:134:1: note: to match this '('
> >>>  DECLARE_INIT_SECTION(STG_PREPARE);
> >>>  ^
> >>>  include/common/initcall.h:124:82: note: expanded from macro
> >>> 'DECLARE_INIT_SECTION'
> >>>   

Re: Compilation fails on OS-X

2019-02-13 Thread Olivier Houchard
Hi Patrick,

On Wed, Feb 13, 2019 at 10:01:01AM -0500, Patrick Hemmer wrote:
> 
> 
> On 2019/2/13 09:40, Aleksandar Lazic wrote:
> > Am 13.02.2019 um 14:45 schrieb Patrick Hemmer:
> >> Trying to compile haproxy on my local machine for testing purposes and am
> >> running into the following:
> > Which compiler do you use?
> 
>   # gcc -v
>   Configured with: 
> --prefix=/Applications/Xcode.app/Contents/Developer/usr 
> --with-gxx-include-dir=/usr/include/c++/4.2.1
>   Apple LLVM version 9.0.0 (clang-900.0.39.2)
>   Target: x86_64-apple-darwin17.7.0
>   Thread model: posix
>   InstalledDir: 
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
> 
> >> # make TARGET=osx
> >> src/proto_http.c:293:1: error: argument to 'section' attribute is 
> >> not
> >> valid for this target: mach-o section specifier requires a segment and 
> >> section
> >> separated by a comma
> >> DECLARE_POOL(pool_head_http_txn, "http_txn", sizeof(struct 
> >> http_txn));
> >> ^
> >> include/common/memory.h:128:2: note: expanded from macro 
> >> 'DECLARE_POOL'
> >> REGISTER_POOL(, name, size)
> >> ^
> >> include/common/memory.h:123:2: note: expanded from macro 
> >> 'REGISTER_POOL'
> >> INITCALL3(STG_POOL, create_pool_callback, (ptr), 
> >> (name),
> >> (size))
> >> ^
> >> include/common/initcall.h:102:2: note: expanded from macro 
> >> 'INITCALL3'
> >> _DECLARE_INITCALL(stage, __LINE__, function, arg1, 
> >> arg2,
> >> arg3)
> >> ^
> >> include/common/initcall.h:78:2: note: expanded from macro
> >> '_DECLARE_INITCALL'
> >> __DECLARE_INITCALL(__VA_ARGS__)
> >> ^
> >> include/common/initcall.h:65:42: note: expanded from macro
> >> '__DECLARE_INITCALL'
> >>    
> >> __attribute__((__used__,__section__("init_"#stg))) =   \
> >>
> >>
> >>
> >> Issue occurs on master, and the 1.9 branch
> >>
> >> -Patrick
> 

Does the (totally untested, because I have no Mac to test) patch work for
you ?

Thanks !

Olivier
>From 2f68108ae32a66782b4c7518f8830896732be64d Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Wed, 13 Feb 2019 16:22:17 +0100
Subject: [PATCH] BUILD/MEDIUM: initcall: Fix build on MacOS.

MacOS syntax for sections is a bit different, so implement it.

This should be backported to 1.9.
---
 include/common/initcall.h | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/common/initcall.h b/include/common/initcall.h
index 35d237fd..95e488ba 100644
--- a/include/common/initcall.h
+++ b/include/common/initcall.h
@@ -50,6 +50,12 @@ struct initcall {
void *arg3;
 };
 
+#ifdef __APPLE__
+#define HA_SECTION(s) __section__("__DATA, " s)
+#else
+#define HA_SECTION(s) __section__((s))
+#endif
+
 /* Declare a static variable in the init section dedicated to stage ,
  * with an element referencing function  and arguments .
  *  is needed to deduplicate entries created from a same file. The
@@ -62,7 +68,7 @@ struct initcall {
  */
 #define __DECLARE_INITCALL(stg, linenum, function, a1, a2, a3) \
 static const struct initcall *__initcb_##linenum   \
-   __attribute__((__used__,__section__("init_"#stg))) =   \
+   __attribute__((__used__,HA_SECTION("init_"#stg))) =\
(stg < STG_SIZE) ? &(const struct initcall) {  \
.fct = (void (*)(void *,void *,void *))function,   \
.arg1 = (void *)(a1),  \
@@ -113,9 +119,16 @@ struct initcall {
  * empty. The corresponding sections must contain exclusively pointers to
  * make sure each location may safely be visited by incrementing a pointer.
  */
+#ifdef __APPLE__
+#define DECLARE_INIT_SECTION(stg)  
 \
+   extern __attribute__((__weak__)) const struct initcall 
*__start_init_##stg __asm("section$start$__DATA$" stg); \
+   extern __attribute__((__weak__)) const struct initcall 
*__stop_init_##stg __asm("section$end$__DATA$" stg)
+
+#else
 #define DECLARE_INIT_SECTION(stg)  
 \
extern __attribute__((__weak__)) const struct initcall 
*__start_init_##stg; \
extern __attribute__((__weak__)) const struct initcall 
*__stop_init_##stg
+#endif
 
 /* Declare all initcall sections here */
 DECLARE_INIT_SECTION(STG_PREPARE);
-- 
2.20.1



Re: Segfault in assign_tproxy_address with h2 and source address .[1.9.2]

2019-01-17 Thread Olivier Houchard
On Thu, Jan 17, 2019 at 03:09:20PM +, Luke Seelenbinder wrote:
> Hi Oliver,
> 
> Yes! I can confirm the patch does indeed work—thanks for the quick turnaround.
> 
> Best,
> Luke
> 

Willy, can you push it ?

Thanks !

Olivier
>From 585bcc7f8ec84573d070d2d9d1e0b104fd18eb48 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Thu, 17 Jan 2019 15:59:13 +0100
Subject: [PATCH] BUG/MEDIUM: servers: Make assign_tproxy_address work when
 ALPN is set.

If an ALPN is set on the server line, then when we reach assign_tproxy_address,
the stream_interface's endpoint will be a connection, not a conn_stream,
so make sure assign_tproxy_address() handles both cases.

This should be backported to 1.9.
---
 src/backend.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/backend.c b/src/backend.c
index ab5a629c..3e327975 100644
--- a/src/backend.c
+++ b/src/backend.c
@@ -1021,7 +1021,12 @@ static void assign_tproxy_address(struct stream *s)
struct server *srv = objt_server(s->target);
struct conn_src *src;
struct connection *cli_conn;
-   struct connection *srv_conn = cs_conn(objt_cs(s->si[1].end));
+   struct connection *srv_conn;
+
+   if (objt_cs(s->si[1].end))
+   srv_conn = cs_conn(__objt_cs(s->si[1].end));
+   else
+   srv_conn = objt_conn(s->si[1].end);
 
if (srv && srv->conn_src.opts & CO_SRC_BIND)
src = >conn_src;
-- 
2.19.1



Re: Segfault in assign_tproxy_address with h2 and source address .[1.9.2]

2019-01-17 Thread Olivier Houchard
Hi Luke,

On Thu, Jan 17, 2019 at 02:35:38PM +, Luke Seelenbinder wrote:
> Hello all,
> 
> First, I wanted to say a huge thanks to the team for producing a quality 
> piece of software. My company just moved all of our traffic over, and the 
> performance and nimbleness of haproxy is impressive. I'm testing 1.9.2 for 
> migration as soon as it's stable for our use-case.
> 
> I'm experiencing a segfault when running in mode http, option http-use-htx, 
> with h2 backends (alpn h2) and an assigned source address. A sanitized config 
> is as follows:
> 
> defaults
>source []
>mode http
>http-reuse always
>option http-use-htx
> 
> listen test
>bind :443 ssl crt  alpn h2,http/1.1
>server backend ipv6@ check ssl crt  ca-file 
>  verifyhost  alpn h2,http/1.1 check-alpn http/1.1
> 
> If I disable h2 on the backend, it works correctly. If I disable the source 
> in defaults, it works correctly. I've attached the backtrace below.
> 
> Best,
> Luke
> 

I think I understand what's going on.
Does the attached patch fix it for you ?

Thanks a lot !

Olivier
>From 585bcc7f8ec84573d070d2d9d1e0b104fd18eb48 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Thu, 17 Jan 2019 15:59:13 +0100
Subject: [PATCH] BUG/MEDIUM: servers: Make assign_tproxy_address work when
 ALPN is set.

If an ALPN is set on the server line, then when we reach assign_tproxy_address,
the stream_interface's endpoint will be a connection, not a conn_stream,
so make sure assign_tproxy_address() handles both cases.

This should be backported to 1.9.
---
 src/backend.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/backend.c b/src/backend.c
index ab5a629c..3e327975 100644
--- a/src/backend.c
+++ b/src/backend.c
@@ -1021,7 +1021,12 @@ static void assign_tproxy_address(struct stream *s)
struct server *srv = objt_server(s->target);
struct conn_src *src;
struct connection *cli_conn;
-   struct connection *srv_conn = cs_conn(objt_cs(s->si[1].end));
+   struct connection *srv_conn;
+
+   if (objt_cs(s->si[1].end))
+   srv_conn = cs_conn(__objt_cs(s->si[1].end));
+   else
+   srv_conn = objt_conn(s->si[1].end);
 
if (srv && srv->conn_src.opts & CO_SRC_BIND)
src = >conn_src;
-- 
2.19.1



Re: Lots of mail from email alert on 1.9.x

2019-01-13 Thread Olivier Houchard
Hi,

On Sat, Jan 12, 2019 at 01:11:29PM +0100, Willy Tarreau wrote:
> Hi Pieter,
> 
> On Sat, Jan 12, 2019 at 01:00:33AM +0100, PiBa-NL wrote:
> > Thanks for this 'change in behavior' ;). Indeed the mailbomb is fixed, and
> > it seems the expected mails get generated and delivered, but a segfault also
> > happens on occasion. Not with the regtest as it was, but with a few minor
> > modifications (adding a unreachable mailserver, and giving it a little more
> > time seems to be the most reliable reproduction a.t.m.) it will crash
> > consistently after 11 seconds.. So i guess the patch needs a bit more
> > tweaking.
> > 
> > Regards,
> > PiBa-NL (Pieter)
> > 
> > Core was generated by `haproxy -d -f /tmp/vtc.37274.4b8a1a3a/h1/cfg'.
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0  0x00500955 in chk_report_conn_err (check=0x802616a10,
> > errno_bck=0, expired=1) at src/checks.c:689
> > 689 dns_trigger_resolution(check->server->dns_requester);
> > (gdb) bt full
> > #0  0x00500955 in chk_report_conn_err (check=0x802616a10,
> > errno_bck=0, expired=1) at src/checks.c:689
> 
> Indeed, this function should not have any special effect in this case,
> it is needed to prepend this at the beginning of chk_report_conn_err() :
> 
>   if (!check->server)
>   return;
> 
> We need to make sure that check->server is properly tested everywhere.
> With a bit of luck this one was the only remnant.
> 

I'd rather just avoid calling dns_trigger_resolution() if there's no server;
it seems it is the only use of check->server in chk_report_conn_err(), so
that set_server_check_status() is still called, and the check's status and
result may be updated. Not sure it is really needed, but I'd rather not
offend the Check Gods.
The attached patches are updated to do just that.

Regards,

Olivier
>From 7e7b41cac480029c5fd93338dd86a875fee0b5a7 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Fri, 11 Jan 2019 18:17:17 +0100
Subject: [PATCH 1/2] MINOR: checks: Store the proxy in checks.

Instead of assuming we have a server, store the proxy directly in struct
check, and use it instead of s->server.
This should be a no-op for now, but will be useful later when we change
mail checks to avoid having a server.

This should be backported to 1.9.
---
 include/types/checks.h |  1 +
 src/checks.c   | 32 +---
 src/server.c   |  1 +
 3 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/include/types/checks.h b/include/types/checks.h
index 6346fe33..f89abcba 100644
--- a/include/types/checks.h
+++ b/include/types/checks.h
@@ -180,6 +180,7 @@ struct check {
int send_string_len;/* length of agent command 
string */
char *send_string;  /* optionally send a string 
when connecting to the agent */
struct server *server;  /* back-pointer to server */
+   struct proxy *proxy;/* proxy to be used */
char **argv;/* the arguments to use if 
running a process-based check */
char **envp;/* the environment to use if 
running a process-based check */
struct pid_list *curpid;/* entry in pid_list used for 
current process-based test, or -1 if not in test */
diff --git a/src/checks.c b/src/checks.c
index 4baaf9fc..edb61b4f 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -2124,7 +2124,7 @@ static struct task *process_chk_proc(struct task *t, void 
*context, unsigned sho
 static struct task *process_chk_conn(struct task *t, void *context, unsigned 
short state)
 {
struct check *check = context;
-   struct server *s = check->server;
+   struct proxy *proxy = check->proxy;
struct conn_stream *cs = check->cs;
struct connection *conn = cs_conn(cs);
int rv;
@@ -2142,7 +2142,7 @@ static struct task *process_chk_conn(struct task *t, void 
*context, unsigned sho
 * is disabled.
 */
if (((check->state & (CHK_ST_ENABLED | CHK_ST_PAUSED)) != 
CHK_ST_ENABLED) ||
-   s->proxy->state == PR_STSTOPPED)
+   proxy->state == PR_STSTOPPED)
goto reschedule;
 
/* we'll initiate a new check */
@@ -2167,8 +2167,8 @@ static struct task *process_chk_conn(struct task *t, void 
*context, unsigned sho
 */
t->expire = tick_add(now_ms, MS_TO_TICKS(check->inter));
 
-   if (s->proxy->timeout.check && 
s->proxy->timeout.connect) {
-   int t_con = tick_add(now_ms, 
s->proxy->timeout.connect);
+

Re: Lots of mail from email alert on 1.9.x

2019-01-11 Thread Olivier Houchard
On Fri, Jan 11, 2019 at 06:53:11PM +0100, Olivier Houchard wrote:
> Hi,
> 
> On Fri, Jan 11, 2019 at 10:36:04AM +0100, Johan Hendriks wrote:
> > Thanks all.
> > No rush on my side as it is a test machine, at least we do know when a
> > backend server fails.
> > With all this mail the mail server are giving alarms too :D
> > 
> > I disable the mail feature for now.
> > 
> > Thanks again Pieter, Willy and Oliver for all your work.
> > 
> > 
> > On 10-01-19 at 20:05, PiBa-NL wrote:
> > > Hi Johan, Olivier, Willy,
> > >
> > > On 10-1-2019 at 17:00, Johan Hendriks wrote:
> > >> I just updated to 1.9.1 on my test system.
> > >>
> > >> We noticed that when a server fails we now get tons of mail, and with
> > >> tons we mean a lot.
> > >>
> > >> After a client backend server fails we usually get 1 mail on 1.8.x now
> > >> with 1.9.1 within 1 minute we have the following.
> > >>
> > >> mailq | grep -B2 l...@testdomain.nl | grep '^[A-F0-9]' | awk '{print
> > >> $1}' | sed 's/*//' | postsuper -d -
> > >> postsuper: Deleted: 19929 messages
> > >>
> > >> My setting from the backend part is as follows.
> > >>
> > >>  email-alert mailers alert-mailers
> > >>  email-alert from l...@testdomain.nl
> > >>  email-alert to not...@testdomain.nl
> > >>  server webserver09 11.22.33.44:80 check
> > >>
> > >> Has something changed in 1.9.x (it was on 1.9.0 also)
> > >>
> > >> regards
> > >> Johan Hendriks
> > >>
> > >>
> > > Its a 'known issue' see:
> > > https://www.mail-archive.com/haproxy@formilux.org/msg32290.html
> > > a 'regtest' is added in that mail thread also to aid developers in
> > > reproducing the issue and validating a possible fix.
> > >
> > > @Olivier, Willy, may i assume this mailbomb feature is 'planned' to
> > > get fixed in 1.9.2 ? (perhaps a bugtracker with a 'target version'
> > > would be nice ;) ?)
> > >
> > > Regards,
> > > PiBa-NL (Pieter)
> > 
> 
> Ok so erm, I'd be lying if I claimed I enjoy working on the check code, or
> that I understand it fully. However, after talking with Willy and Christopher,
> I think I may have come up with an acceptable solution, and the attached patch
> should fix it (at worst by getting haproxy to segfault, but it shouldn't
> mailbomb you anymore).
> Pieter, I'd be very interested to know if it still works with your setup.
> It's a different way of trying to fix what you tried to fix with
> 1714b9f28694d750d446917672dd59c46e16afd7 
> I'd like to be sure I didn't break it for you again :)
> 
> Regards,
> 
> Olivier

(Slightly modified patches; I think there was a potential race condition
when running with multiple threads.)

Olivier
>From 7e7b41cac480029c5fd93338dd86a875fee0b5a7 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Fri, 11 Jan 2019 18:17:17 +0100
Subject: [PATCH 1/2] MINOR: checks: Store the proxy in checks.

Instead of assuming we have a server, store the proxy directly in struct
check, and use it instead of s->server.
This should be a no-op for now, but will be useful later when we change
mail checks to avoid having a server.

This should be backported to 1.9.
---
 include/types/checks.h |  1 +
 src/checks.c   | 32 +---
 src/server.c   |  1 +
 3 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/include/types/checks.h b/include/types/checks.h
index 6346fe33..f89abcba 100644
--- a/include/types/checks.h
+++ b/include/types/checks.h
@@ -180,6 +180,7 @@ struct check {
int send_string_len;/* length of agent command 
string */
char *send_string;  /* optionally send a string 
when connecting to the agent */
struct server *server;  /* back-pointer to server */
+   struct proxy *proxy;/* proxy to be used */
char **argv;/* the arguments to use if 
running a process-based check */
char **envp;/* the environment to use if 
running a process-based check */
struct pid_list *curpid;/* entry in pid_list used for 
current process-based test, or -1 if not in test */
diff --git a/src/checks.c b/src/checks.c
index 4baaf9fc..edb61b4f 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -2124,7 +2124,7 @@ static struct task *process_chk_proc(struct task *t, void 
*context, unsigned sho
 static struct task *process_chk_conn(struct task *t, void *context, unsigned short state)

Re: Lots of mail from email alert on 1.9.x

2019-01-11 Thread Olivier Houchard
Hi,

On Fri, Jan 11, 2019 at 10:36:04AM +0100, Johan Hendriks wrote:
> Thanks all.
> No rush on my side as it is a test machine, at least we do know when a
> backend server fails.
> With all this mail the mail server are giving alarms too :D
> 
> I disable the mail feature for now.
> 
> Thanks again Pieter, Willy and Oliver for all your work.
> 
> 
> Op 10-01-19 om 20:05 schreef PiBa-NL:
> > Hi Johan, Olivier, Willy,
> >
> > Op 10-1-2019 om 17:00 schreef Johan Hendriks:
> >> I just updated to 1.9.1 on my test system.
> >>
> >> We noticed that when a server fails we now get tons of mail, and with
> >> tons we mean a lot.
> >>
> >> After a client backend server fails we usually get 1 mail on 1.8.x now
> >> with 1.9.1 within 1 minute we have the following.
> >>
> >> mailq | grep -B2 l...@testdomain.nl | grep '^[A-F0-9]' | awk '{print
> >> $1}' | sed 's/*//' | postsuper -d -
> >> postsuper: Deleted: 19929 messages
> >>
> >> My setting from the backend part is as follows.
> >>
> >>  email-alert mailers alert-mailers
> >>  email-alert from l...@testdomain.nl
> >>  email-alert to not...@testdomain.nl
> >>  server webserver09 11.22.33.44:80 check
> >>
> >> Has something changed in 1.9.x (it was on 1.9.0 also)
> >>
> >> regards
> >> Johan Hendriks
> >>
> >>
> > It's a 'known issue', see:
> > https://www.mail-archive.com/haproxy@formilux.org/msg32290.html
> > a 'regtest' is added in that mail thread also to aid developers in
> > reproducing the issue and validating a possible fix.
> >
> > @Olivier, Willy, may I assume this mailbomb feature is 'planned' to
> > get fixed in 1.9.2 ? (perhaps a bugtracker with a 'target version'
> > would be nice ;) ?)
> >
> > Regards,
> > PiBa-NL (Pieter)
> 

Ok so erm, I'd be lying if I claimed I enjoy working on the check code, or
that I understand it fully. However, after talking with Willy and Christopher,
I think I may have come up with an acceptable solution, and the attached patch
should fix it (at worst by getting haproxy to segfault, but it shouldn't
mailbomb you anymore).
Pieter, I'd be very interested to know if it still works with your setup.
It's a different way of trying to fix what you tried to fix with
1714b9f28694d750d446917672dd59c46e16afd7
I'd like to be sure I didn't break it for you again :)

Regards,

Olivier
>From 7e7b41cac480029c5fd93338dd86a875fee0b5a7 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Fri, 11 Jan 2019 18:17:17 +0100
Subject: [PATCH 1/2] MINOR: checks: Store the proxy in checks.

Instead of assuming we have a server, store the proxy directly in struct
check, and use it instead of s->server.
This should be a no-op for now, but will be useful later when we change
mail checks to avoid having a server.

This should be backported to 1.9.
---
 include/types/checks.h |  1 +
 src/checks.c   | 32 +---
 src/server.c   |  1 +
 3 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/include/types/checks.h b/include/types/checks.h
index 6346fe33..f89abcba 100644
--- a/include/types/checks.h
+++ b/include/types/checks.h
@@ -180,6 +180,7 @@ struct check {
int send_string_len;/* length of agent command 
string */
char *send_string;  /* optionally send a string 
when connecting to the agent */
struct server *server;  /* back-pointer to server */
+   struct proxy *proxy;/* proxy to be used */
char **argv;/* the arguments to use if 
running a process-based check */
char **envp;/* the environment to use if 
running a process-based check */
struct pid_list *curpid;/* entry in pid_list used for 
current process-based test, or -1 if not in test */
diff --git a/src/checks.c b/src/checks.c
index 4baaf9fc..edb61b4f 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -2124,7 +2124,7 @@ static struct task *process_chk_proc(struct task *t, void 
*context, unsigned sho
 static struct task *process_chk_conn(struct task *t, void *context, unsigned 
short state)
 {
struct check *check = context;
-   struct server *s = check->server;
+   struct proxy *proxy = check->proxy;
struct conn_stream *cs = check->cs;
struct connection *conn = cs_conn(cs);
int rv;
@@ -2142,7 +2142,7 @@ static struct task *process_chk_conn(struct task *t, void 
*context, unsigned sho
 * is disabled.
 */
if (((check->state & (CHK_ST_ENABLED | CHK_ST_PAUSED)) != 
CHK_ST_ENABLED) ||
-  

Re: State of 0-RTT TLS resumption with OpenSSL

2019-01-09 Thread Olivier Houchard
Hi Willy,

On Tue, Jan 08, 2019 at 03:44:07PM +0100, Willy Tarreau wrote:
> On Tue, Jan 08, 2019 at 03:27:58PM +0100, Olivier Houchard wrote:
> > On Tue, Jan 08, 2019 at 03:00:32PM +0100, Janusz Dziemidowicz wrote:
> > > On Fri, 4 Jan 2019 at 11:59, Olivier Houchard wrote:
> > > However, I believe in general this is a bit more complicated. RFC 8446
> > > described this in detail in section 8:
> > > https://tools.ietf.org/html/rfc8446#section-8
> > > My understanding is that RFC highly recommends anti-replay with 0-RTT.
> > > It seems that s_server implements single use tickets, which is exactly
> > > what is in section 8.1. The above patch disables anti-replay
> > > completely in haproxy, which might warrant some updates to
> > > documentation about allow-0rtt option?
> > > 
> > 
> > Hi Janusz,
> > 
> > Yes indeed, I thought I documented it better than that, but obviously I
> > didn't :)
> > The allow-0rtt option was added before OpenSSL added anti-replay protection,
> > and I'm pretty sure the RFC wasn't talking about it yet, it was mostly 
> > saying
> > it was a security concern, so it was designed with "only allow it for what
> > would be safe to replay", and the documentation should certainly reflect 
> > that.
> > I will make it explicit.
> > 
> > Thanks a lot !
> 
> To clear some of the confusion on this subject, when 0-RTT was designed
> in TLS 1.3, it was sort of a backport from QUIC, and it was decided that
> the replay protection was up to the application layer (here "application"
> is not in the sense of the web application you're using, but the HTTP layer
> on top of TLS). And RFC8470 (https://tools.ietf.org/html/rfc8470), which
> haproxy implements, was produced exactly to address this specific area.
> 
> Thus an HTTP implementation which properly implements RFC8470 does have
> TLS anti-replay covered by the application layer.
> 

Can you push the attached patches ?

Thanks !

Olivier
>From fe291b43e1cb9143edf0051a1d69247dde298118 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Wed, 2 Jan 2019 18:46:41 +0100
Subject: [PATCH 1/2] BUG/MEDIUM: ssl: Disable anti-replay protection and set
 max data with 0RTT.

When using early data, disable the OpenSSL anti-replay protection, and set
the max amount of early data we're ready to accept, based on the size of
buffers, or early data won't work with the released OpenSSL 1.1.1.

This should be backported to 1.8.
---
 src/ssl_sock.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/ssl_sock.c b/src/ssl_sock.c
index 282b85dd..13ce2e5b 100644
--- a/src/ssl_sock.c
+++ b/src/ssl_sock.c
@@ -3869,6 +3869,10 @@ ssl_sock_initial_ctx(struct bind_conf *bind_conf)
SSL_CTX_set_select_certificate_cb(ctx, ssl_sock_switchctx_cbk);
SSL_CTX_set_tlsext_servername_callback(ctx, ssl_sock_switchctx_err_cbk);
 #elif (OPENSSL_VERSION_NUMBER >= 0x10101000L)
+   if (bind_conf->ssl_conf.early_data) {
+   SSL_CTX_set_options(ctx, SSL_OP_NO_ANTI_REPLAY);
+   SSL_CTX_set_max_early_data(ctx, global.tune.bufsize - 
global.tune.maxrewrite);
+   }
    SSL_CTX_set_client_hello_cb(ctx, ssl_sock_switchctx_cbk, NULL);
SSL_CTX_set_tlsext_servername_callback(ctx, ssl_sock_switchctx_err_cbk);
 #else
-- 
2.14.4

>From c1105e9723a2c79e3b445287586350e0d3d3e291 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Tue, 8 Jan 2019 15:35:32 +0100
Subject: [PATCH 2/2] DOC: Be a bit more explicit about allow-0rtt security
 implications.

Document a bit better that allow-0rtt can trivially be used for replay attacks,
and so should only be used when it's safe to replay a request.

This should probably be backported to 1.8 and 1.9.
---
 doc/configuration.txt | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/doc/configuration.txt b/doc/configuration.txt
index 2447254c..888515fb 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -10768,7 +10768,10 @@ accept-proxy
 
 allow-0rtt
   Allow receiving early data when using TLSv1.3. This is disabled by default,
-  due to security considerations.
+  due to security considerations. Because it is vulnerable to replay attacks,
+  you should only allow it for requests that are safe to replay, i.e. requests
+  that are idempotent. You can use the "wait-for-handshake" action for any
+  request that wouldn't be safe with early data.
 
 alpn 
   This enables the TLS ALPN extension and advertises the specified protocol
-- 
2.14.4
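The doc patch above points at the "wait-for-handshake" action as the escape hatch for non-replayable requests. A minimal sketch of how the two directives combine in practice (frontend/backend names and the certificate path are made up for illustration; the directives themselves are real HAProxy keywords):

```haproxy
frontend fe_tls
    # Accept TLS 1.3 early data: it is replayable by design, so only
    # requests that are safe to replay should be processed from it.
    bind :443 ssl crt /etc/haproxy/site.pem allow-0rtt

    # Make non-idempotent requests wait until the full handshake has
    # completed, so they can no longer be satisfied from early data.
    http-request wait-for-handshake if ! METH_GET

    default_backend be_app
```

With this, a GET arriving as 0-RTT is served immediately, while a POST is held back until the handshake finishes, which neutralizes the replay window for it.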



Re: [PATCH] BUG/MEDIUM: init: Initialize idle_orphan_conns for first server in server-template

2019-01-09 Thread Olivier Houchard
Hi,

On Wed, Jan 09, 2019 at 01:44:08AM -0500, cripy wrote:
> Hi,
> 
> I found a segfault when using server-template within 1.9.x and 2.0-dev.
> This seems to be related to  "http-reuse" as when I set to "never" it does
> not crash anymore.
> 
> It appears that idle_orphan_conns is not being properly initialized for the
> first server within the server-template.  I was able to confirm this by
> creating a small server-template with 4 servers and setting all of the
> addresses except for the first one.  This did not result in a crash.  As soon
> as I set the first address and traffic was sent to it, it resulted in a crash.
> 
> I found that server_template_init() establishes everything fine for all
> servers (setting id from prefix with srv_set_id_from_prefix() , etc... )
> and then at the bottom of the function you can see it calls
> srv_set_id_from_prefix() to then establish the id for the first server --
> however, the first server doesn't get any of the logic to initialize the
> idle_orphan_conns.
> 
> My initial fix added the idle_orphan_conns initialization code to the
> bottom of server_template_init() (right below the srv_set_id_from_prefix()
> which sets the prefix specifically for the first server slot) -- however
> this seemed like it might be too messy.
> 
> I believe a better option is to remove the check for !srv->tmpl_info.prefix
> within server_finalize_init().  Patch attached.
> 
> Feel free to correct me if I am wrong on this assumption.
> 
> Here is the config which results in a crash:
> 
> listen fe_main
> mode http
> bind *:80
> timeout server 5ms
> timeout client 5ms
> timeout connect 5ms
> server-template srv 2 10.1.0.1:80
> 
> (Should segfault after the first request)
> 
> HA-Proxy version 2.0-dev0-251a6b-97 2019/01/08 - https://haproxy.org/
> Build options :
>   TARGET  = linux2628
>   CPU = generic
>   CC  = gcc
>   CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
> -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter
> -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered
> -Wno-missing-field-initializers -Wtype-limits
>   OPTIONS = USE_OPENSSL=1
> 
> Backtrace:
> [New LWP 14046]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Core was generated by `./haproxy -f crash.cfg -d'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x004f82fe in srv_add_to_idle_list (conn=0x2331320,
> srv=0x22aeb60) at include/proto/server.h:244
> 244LIST_ADDQ(>idle_orphan_conns[tid], >list);
> (gdb) bt
> #0  0x004f82fe in srv_add_to_idle_list (conn=0x2331320,
> srv=0x22aeb60) at include/proto/server.h:244
> #1  session_free (sess=0x2330970) at src/session.c:90
> #2  0x0050dca3 in mux_pt_destroy (ctx=0x2330920) at src/mux_pt.c:38
> #3  0x00446bdb in cs_destroy (cs=0x2331230) at
> include/proto/connection.h:708
> #4  si_release_endpoint (si=si@entry=0x2330cd8) at
> include/proto/stream_interface.h:170
> #5  0x0044c9ec in stream_free (s=0x2330a40) at src/stream.c:446
> #6  process_stream (t=t@entry=0x2330e30, context=0x2330a40,
> state=) at src/stream.c:2610
> #7  0x00509955 in process_runnable_tasks () at src/task.c:432
> #8  0x0048b485 in run_poll_loop () at src/haproxy.c:2619
> #9  run_thread_poll_loop (data=data@entry=0x23267d0) at src/haproxy.c:2684
> #10 0x0040aa0c in main (argc=, argv=0x7fffd8018e48)
> at src/haproxy.c:3313
> 
> (gdb) frame 0
> #0  0x004f82fe in srv_add_to_idle_list (conn=0x2331320,
> srv=0x22aeb60) at include/proto/server.h:244
> 244LIST_ADDQ(>idle_orphan_conns[tid], >list);
> 
> (gdb) print >idle_orphan_conns[tid]
> $1 = (struct list *) 0x0
> 
> (gdb) print >list
> $2 = (struct list *) 0x2331370


Oops, that seems right, and the patch looks fine, Willy can you push it ?

Thanks a lot !

Olivier



Re: State of 0-RTT TLS resumption with OpenSSL

2019-01-08 Thread Olivier Houchard
On Tue, Jan 08, 2019 at 03:00:32PM +0100, Janusz Dziemidowicz wrote:
> On Fri, 4 Jan 2019 at 11:59, Olivier Houchard wrote:
> > I understand the concern.
> > I checked and both nghttp2 and nginx disable the replay protection. The idea
> > is you're supposed to allow early data only on harmless requests anyway, ie
> > ones that could be replayed with no consequence.
> 
> Sorry for the late reply, I was pondering the problem ;) I'm pretty ok
> with this patch, especially since others seem to do the same. And my
> use case is DNS-over-TLS, which has no problems with replays anyway ;)
> 
> However, I believe in general this is a bit more complicated. RFC 8446
> described this in detail in section 8:
> https://tools.ietf.org/html/rfc8446#section-8
> My understanding is that RFC highly recommends anti-replay with 0-RTT.
> It seems that s_server implements single use tickets, which is exactly
> what is in section 8.1. The above patch disables anti-replay
> completely in haproxy, which might warrant some updates to
> documentation about allow-0rtt option?
> 

Hi Janusz,

Yes indeed, I thought I documented it better than that, but obviously I
didn't :)
The allow-0rtt option was added before OpenSSL added anti-replay protection,
and I'm pretty sure the RFC wasn't talking about it yet, it was mostly saying
it was a security concern, so it was designed with "only allow it for what
would be safe to replay", and the documentation should certainly reflect that.
I will make it explicit.

Thanks a lot !

Olivier



Re: State of 0-RTT TLS resumption with OpenSSL

2019-01-04 Thread Olivier Houchard
Hi Janusz,

On Fri, Jan 04, 2019 at 10:53:51AM +0100, Janusz Dziemidowicz wrote:
> On Thu, 3 Jan 2019 at 17:52, Olivier Houchard wrote:
> > Ah I think I figured it out.
> > OpenSSL added anti-replay protection when using early data, and it messes up
> > with the session handling.
> > With the updated attached patch, I get early data to work again. Is it 
> > better
> > for you ?
> 
> Now it works.
> However, I am a bit concerned about disabling something that sounds
> like an important safeguard.
> Reading this 
> https://www.openssl.org/docs/man1.1.1/man3/SSL_SESSION_get_max_early_data.html#REPLAY-PROTECTION
> suggests that it is really not a wise thing to do.
> 
> And again, s_server works differently. It does not use
> SSL_OP_NO_ANTI_REPLAY but the resumption, with early data, works,
> once. Then you get new session that you can resume again if you wish,
> but also once. You cannot resume the same session twice. With your
> patch I can resume single session as many times as I wish. Coupled
> with early data this is exactly something that TLS 1.3 RFC warns
> against. This probably is due to haproxy using external session
> management.
> 
> I'll try to dig more into this on weekend, now that I know where to look.
> 

I understand the concern.
I checked and both nghttp2 and nginx disable the replay protection. The idea
is you're supposed to allow early data only on harmless requests anyway, ie
ones that could be replayed with no consequence.

Regards,

Olivier



Re: State of 0-RTT TLS resumption with OpenSSL

2019-01-03 Thread Olivier Houchard
Hi Janusz,

On Thu, Jan 03, 2019 at 11:49:35AM +0100, Janusz Dziemidowicz wrote:
> On Wed, 2 Jan 2019 at 19:04, Olivier Houchard wrote:
> > You're right indeed. 0RTT was added with a development version of OpenSSL 
> > 1.1.1,
> > which had a default value for max early data of 16384, but it was changed to
> > 0 in the meanwhile.
> > Does the attached patch work for you ?
> 
> This indeed results in following when using s_client:
> Max Early Data: 16385
> 
> However, I believe it still does not work. I was trying again to test
> it with s_client.
> 
> Without allow-0rtt option I can resume TLS 1.3 session without problem:
> openssl s_client -connect host:port -sess_out sessfile
> openssl s_client -connect host:port -sess_in sessfile
> This results with:
> Reused, TLSv1.3, Cipher is TLS_CHACHA20_POLY1305_SHA256
> 
> As soon as I add allow-0rtt (and your patch) above s_client results
> always with a new session:
> New, TLSv1.3, Cipher is TLS_CHACHA20_POLY1305_SHA256
> No matter what I do I was not able to resume any session with allow-0rtt 
> active.
> 
> Just to rule out that I am using s_client in a wrong way I've made the
> same test against s_server. I was able to successfully resume session
> and even send early data that was accepted. So I believe that there is
> still something wrong in haproxy with TLS session handling.
> 

Ah I think I figured it out.
OpenSSL added anti-replay protection when using early data, and it messes up
with the session handling.
With the updated attached patch, I get early data to work again. Is it better
for you ?

Regards,

Olivier
>From 82126322107bc628e32ff300195951fd660a43ac Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Wed, 2 Jan 2019 18:46:41 +0100
Subject: [PATCH] BUG/MEDIUM: ssl: Disable anti-replay protection and set max
 data with 0RTT.

When using early data, disable the OpenSSL anti-replay protection, and set
the max amount of early data we're ready to accept, based on the size of
buffers, or early data won't work with the released OpenSSL 1.1.1.

This should be backported to 1.8.
---
 src/ssl_sock.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/ssl_sock.c b/src/ssl_sock.c
index 282b85dd..13ce2e5b 100644
--- a/src/ssl_sock.c
+++ b/src/ssl_sock.c
@@ -3869,6 +3869,10 @@ ssl_sock_initial_ctx(struct bind_conf *bind_conf)
SSL_CTX_set_select_certificate_cb(ctx, ssl_sock_switchctx_cbk);
SSL_CTX_set_tlsext_servername_callback(ctx, ssl_sock_switchctx_err_cbk);
 #elif (OPENSSL_VERSION_NUMBER >= 0x10101000L)
+   if (bind_conf->ssl_conf.early_data) {
+   SSL_CTX_set_options(ctx, SSL_OP_NO_ANTI_REPLAY);
+   SSL_CTX_set_max_early_data(ctx, global.tune.bufsize - 
global.tune.maxrewrite);
+   }
SSL_CTX_set_client_hello_cb(ctx, ssl_sock_switchctx_cbk, NULL);
SSL_CTX_set_tlsext_servername_callback(ctx, ssl_sock_switchctx_err_cbk);
 #else
-- 
2.14.4



Re: State of 0-RTT TLS resumption with OpenSSL

2019-01-02 Thread Olivier Houchard
Hi Janusz,

On Sun, Dec 30, 2018 at 05:38:26PM +0100, Janusz Dziemidowicz wrote:
> Hi,
> I've been trying to get 0-RTT resumption working with haproxy 1.8.16
> and OpenSSL 1.1.1a.
> No matter what I put in configuration file, testing with openssl
> s_client always results in:
> Max Early Data: 0
> 
> OK, let's look at ssl_sock.c
> The only thing that seems to try to enable 0-RTT is this:
> #ifdef OPENSSL_IS_BORINGSSL
> if (allow_early)
> SSL_set_early_data_enabled(ssl, 1);
> #else
> if (!allow_early)
> SSL_set_max_early_data(ssl, 0);
> #endif
> 
> But I fail to see how this is supposed to work. OpenSSL has 0-RTT
> disabled by default. To enable this one must call
> SSL_set_max_early_data with the amount of bytes it is willing to read.
> The above simply does... nothing.
> 
> Is it supposed to work at all or do I miss something? ;)
> 

You're right indeed. 0RTT was added with a development version of OpenSSL 1.1.1,
which had a default value for max early data of 16384, but it was changed to
0 in the meantime.
Does the attached patch work for you ?

Thanks !

Olivier
>From cdb864da7cebb97800aef2e114bae6f0d0f96814 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Wed, 2 Jan 2019 18:46:41 +0100
Subject: [PATCH] MEDIUM: ssl: Call SSL_CTX_set_max_early_data() to enable
 0RTT.

When we want to enable early data on a listener, explicitly call
SSL_CTX_set_max_early_data(), as the default is now 0.

This should be backported to 1.8.
---
 src/ssl_sock.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/ssl_sock.c b/src/ssl_sock.c
index 282b85dd..c24de955 100644
--- a/src/ssl_sock.c
+++ b/src/ssl_sock.c
@@ -3869,6 +3869,8 @@ ssl_sock_initial_ctx(struct bind_conf *bind_conf)
SSL_CTX_set_select_certificate_cb(ctx, ssl_sock_switchctx_cbk);
SSL_CTX_set_tlsext_servername_callback(ctx, ssl_sock_switchctx_err_cbk);
 #elif (OPENSSL_VERSION_NUMBER >= 0x10101000L)
+   if (bind_conf->ssl_conf.early_data)
+   SSL_CTX_set_max_early_data(ctx, global.tune.bufsize - 
global.tune.maxrewrite);
SSL_CTX_set_client_hello_cb(ctx, ssl_sock_switchctx_cbk, NULL);
SSL_CTX_set_tlsext_servername_callback(ctx, ssl_sock_switchctx_err_cbk);
 #else
-- 
2.14.4



Re: 1.9 BUG: redispatch broken

2018-12-24 Thread Olivier Houchard
Hi Lukas,

On Sat, Dec 22, 2018 at 11:16:09PM +0100, Lukas Tribus wrote:
> Hello Oliver,
> 
> 
> redispatch is broken since commit 25b401536 ("BUG/MEDIUM: connection:
> Just make sure we closed the fd on connection failure"). It simply
> fails to connect to the next server.
> 
> 1.9 is affected.
> 
> 
> Repro:
> 
> global
> log 10.0.0.4:514 len 65535 local1 debug
> maxconn 1000
> 
> defaults
> log global
> mode http
> option httplog
> timeout connect 1s
> timeout client 30s
> timeout server 30s
> retries 3
> option redispatch 1
> 
> frontend http-in
> bind :80
> default_backend mybak
> 
> backend mybak
> mode http
> balance first
> server primary-fail 10.0.0.199:80 # this server is unreachable
> server backup-ok 10.0.0.254:80
> 
> 
> 
> 
> cheers,

Oops, you're right indeed. The attached patch should fix it.

Thanks a lot for reporting !

Regards,

Olivier
>From 2276c53dac820d0079525730e9bd7abfd3ea408c Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Mon, 24 Dec 2018 13:32:13 +0100
Subject: [PATCH] BUG/MEDIUM: servers: Don't try to reuse connection if we
 switched server.

In connect_server(), don't attempt to reuse the old connection if it's
targeting a different server than the one we're supposed to access, or
we will never be able to connect to a server if the first one we tried failed.

This should be backported to 1.9.
---
 src/backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend.c b/src/backend.c
index 2407f8a32..bc38c5710 100644
--- a/src/backend.c
+++ b/src/backend.c
@@ -1129,7 +1129,7 @@ int connect_server(struct stream *s)
srv_cs = objt_cs(s->si[1].end);
if (srv_cs) {
old_conn = srv_conn = cs_conn(srv_cs);
-   if (old_conn) {
+   if (old_conn && (!old_conn->target || old_conn->target == 
s->target)) {
old_conn->flags &= ~(CO_FL_ERROR | CO_FL_SOCK_RD_SH | 
CO_FL_SOCK_WR_SH);
srv_cs->flags &= ~(CS_FL_ERROR | CS_FL_EOS | 
CS_FL_REOS);
reuse = 1;
-- 
2.19.2



Re: [PATCH] BUG/CRITICAL: SIGBUS crash on aarch64

2018-11-15 Thread Olivier Houchard
On Thu, Nov 15, 2018 at 02:26:59PM +0100, Willy Tarreau wrote:
> On Thu, Nov 15, 2018 at 11:58:36AM +0100, Olivier Houchard wrote:
> > Willy, can you push the attached patch ?
> 
> Applied, thanks. I've just slightly edited it to put parenthesis around
> "i" below :
> 
> > +#define round_ptr_size(i) ((i + (sizeof(void *) - 1)) &~ (sizeof(void *) - 
> > 1))
> 
> so that if someone decides to do round_ptr_size(x & mask) he doesn't end up
> doing (x & (mask + sizeof(void*)-1)) but ((x & mask) + sizeof(...))
> 
> :-)
> 

I give up, macros are way too complicated.

Olivier



Re: [PATCH] BUG/CRITICAL: SIGBUS crash on aarch64

2018-11-15 Thread Olivier Houchard
On Thu, Nov 15, 2018 at 09:48:20AM +, Paul Martin wrote:
> On Wed, Nov 14, 2018 at 06:05:00PM +0100, Olivier Houchard wrote:
> 
> > Oops, you're right indeed.
> > I'm not sure I'm a big fan of special-casing STD_T_UINT. For example,
> > STD_T_FRQP is probably 12 bytes too, so it'd be a problem.
> > Can you test the (untested, but hopefully right) patch attached ?
> 
> Yes, your patch works on aarch64 too.
> 

Thanks for testing !

Willy, can you push the attached patch ?

Olivier
>From 521b6527c71921a96ba4134048f6d9ea1d2689d3 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Wed, 14 Nov 2018 17:54:36 +0100
Subject: [PATCH] BUG/MEDIUM: Make sure stksess is properly aligned.

When we allocate struct stksess, we also allocate memory to store the
associated data before the struct itself.
As the data can be of different types, they can have different sizes. However,
we need the struct stksess to be properly aligned, as it can do 64-bit
loads/stores (including atomic load/stores) on 64-bit platforms, and some of
them don't support unaligned access.
So, when allocating the struct stksess, round the size up to the next
multiple of sizeof(void *), and make sure the struct stksess itself is
properly aligned.
Many thanks to Paul Martin for investigating and reporting that bug.

This should be backported to earlier releases.
---
 src/stick_table.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/stick_table.c b/src/stick_table.c
index 7f141631..4ec7861e 100644
--- a/src/stick_table.c
+++ b/src/stick_table.c
@@ -45,6 +45,7 @@
 /* structure used to return a table key built from a sample */
 static THREAD_LOCAL struct stktable_key static_table_key;
 
+#define round_ptr_size(i) ((i + (sizeof(void *) - 1)) &~ (sizeof(void *) - 1))
 /*
 * Free an allocated sticky session <ts>, and decrease sticky sessions counter
 * in table <t>.
@@ -52,7 +53,7 @@ static THREAD_LOCAL struct stktable_key static_table_key;
 void __stksess_free(struct stktable *t, struct stksess *ts)
 {
t->current--;
-   pool_free(t->pool, (void *)ts - t->data_size);
+   pool_free(t->pool, (void *)ts - round_ptr_size(t->data_size));
 }
 
 /*
@@ -230,7 +231,7 @@ struct stksess *__stksess_new(struct stktable *t, struct 
stktable_key *key)
ts = pool_alloc(t->pool);
if (ts) {
t->current++;
-   ts = (void *)ts + t->data_size;
+   ts = (void *)ts + round_ptr_size(t->data_size);
__stksess_init(t, ts);
if (key)
stksess_setkey(t, ts, key);
@@ -598,7 +599,7 @@ int stktable_init(struct stktable *t)
t->updates = EB_ROOT_UNIQUE;
HA_SPIN_INIT(>lock);
 
-   t->pool = create_pool("sticktables", sizeof(struct stksess) + 
t->data_size + t->key_size, MEM_F_SHARED);
+   t->pool = create_pool("sticktables", sizeof(struct stksess) + 
round_ptr_size(t->data_size) + t->key_size, MEM_F_SHARED);
 
t->exp_next = TICK_ETERNITY;
if ( t->expire ) {
-- 
2.17.1



Re: High CPU Usage followed by segfault error

2018-10-16 Thread Olivier Houchard
On Tue, Oct 16, 2018 at 05:02:30PM +0200, Willy Tarreau wrote:
> On Tue, Oct 16, 2018 at 04:11:20PM +0200, Willy Tarreau wrote:
> > Could you please apply the attached patch ? I'm going to merge it into 1.9
> > and we'll backport it to 1.8 later.
> 
> And please add the attached one as well, which is specific to 1.8. I
> suspect that different versions of compiler could emit inappropriate
> code due to the threads_want_sync variable not being marked volatile.
> 
> In your case the issue would manifest itself if you're having heavy
> server queueing ("maxconn" on the server lines) or if you're seeing
> a lot of "server up/down" events.
> 
> Thanks,
> Willy


Nice catch !

This one is a good candidate.

Regards,

Olivier



Re: High CPU Usage followed by segfault error

2018-10-16 Thread Olivier Houchard
Hi Soji,

On Mon, Oct 15, 2018 at 11:10:09PM +0530, Soji Antony wrote:
> Hi Olivier,
> 
> Many thanks for your reply.
> Please find the gdb output given below.
> 
> # gdb /usr/sbin/haproxy core.dump3.13871
> GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.3) 7.7.1
> Copyright (C) 2014 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> .
> Find the GDB manual and other documentation resources online at:
> .
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from /usr/sbin/haproxy...(no debugging symbols
> found)...done.
> [New LWP 13872]
> [New LWP 13873]
> [New LWP 13888]
> [New LWP 13889]
> [New LWP 13890]
> [New LWP 13892]
> [New LWP 13893]
> [New LWP 13894]
> [New LWP 13895]
> [New LWP 13871]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Core was generated by `/usr/sbin/haproxy'.
> #0  0x55b79086de43 in _start ()
> (gdb) thread apply all bt
> 
> Thread 10 (Thread 0x7f88ee327980 (LWP 13871)):
> #0  0x55b79086dee1 in _start ()
> 
> Thread 9 (Thread 0x7f88d37fe700 (LWP 13895)):
> #0  0x55b79086de46 in _start ()
> 
> Thread 8 (Thread 0x7f88cb7fe700 (LWP 13894)):
> #0  0x55b79086de46 in _start ()
> 
> Thread 7 (Thread 0x7f88d3fff700 (LWP 13893)):
> #0  0x55b79086de43 in _start ()
> 
> Thread 6 (Thread 0x7f88e8c25700 (LWP 13892)):
> #0  0x55b79086de43 in _start ()
> 
> Thread 5 (Thread 0x7f88e94c5700 (LWP 13890)):
> #0  0x55b79086de46 in _start ()
> 
> Thread 4 (Thread 0x7f88e9cc6700 (LWP 13889)):
> #0  0x55b79086de46 in _start ()
> 
> Thread 3 (Thread 0x7f88ea4c7700 (LWP 13888)):
> #0  0x55b79086de43 in _start ()
> 
> Thread 2 (Thread 0x7f88eacc8700 (LWP 13873)):
> #0  0x55b79086de43 in _start ()
> 
> Thread 1 (Thread 0x7f88eb4c9700 (LWP 13872)):
> #0  0x55b79086de43 in _start ()
> (gdb) info threads
>   Id   Target Id Frame
>   10   Thread 0x7f88ee327980 (LWP 13871) 0x55b79086dee1 in _start ()
>   9Thread 0x7f88d37fe700 (LWP 13895) 0x55b79086de46 in _start ()
>   8Thread 0x7f88cb7fe700 (LWP 13894) 0x55b79086de46 in _start ()
>   7Thread 0x7f88d3fff700 (LWP 13893) 0x55b79086de43 in _start ()
>   6Thread 0x7f88e8c25700 (LWP 13892) 0x55b79086de43 in _start ()
>   5Thread 0x7f88e94c5700 (LWP 13890) 0x55b79086de46 in _start ()
>   4Thread 0x7f88e9cc6700 (LWP 13889) 0x55b79086de46 in _start ()
>   3Thread 0x7f88ea4c7700 (LWP 13888) 0x55b79086de43 in _start ()
>   2Thread 0x7f88eacc8700 (LWP 13873) 0x55b79086de43 in _start ()
> * 1Thread 0x7f88eb4c9700 (LWP 13872) 0x55b79086de43 in _start ()
> (gdb)
> 

Unfortunately, as is often the case with gdb, that's less than useful :/
If you have that available, you may install the haproxy-dbg package, but
I'm not convinced it will yield better results.
Can you share your config, obscuring any confidential information, IP
addresses, etc.?
Do you do anything on the stats socket?
You mentioned you were getting a segfault; do you know how to reproduce it?
You also mentioned reloads are frequent; can you tell if the CPU spike happens
immediately after a reload?

Thanks !

Olivier



Re: High CPU Usage followed by segfault error

2018-10-15 Thread Olivier Houchard
> > [pid 114267] <... sched_yield resumed> ) = 0
> > [pid 114266] sched_yield( 
> > [pid 114265] <... sched_yield resumed> ) = 0
> > [pid 114267] sched_yield( 
> > [pid 114266] <... sched_yield resumed> ) = 0
> > [pid 114265] sched_yield( 
> > [pid 114267] <... sched_yield resumed> ) = 0
> > [pid 114266] sched_yield( 
> > [pid 114265] <... sched_yield resumed> ) = 0
> > [pid 114267] sched_yield( 
> > [pid 114266] <... sched_yield resumed> ) = 0
> > [pid 114267] <... sched_yield resumed> ) = 0
> > [pid 114266] sched_yield( 
> > [pid 114265] sched_yield( 
> > [pid 114267] sched_yield( 
> > [pid 114266] <... sched_yield resumed> ) = 0
> > [pid 114265] <... sched_yield resumed> ) = 0
> > [pid 114267] <... sched_yield resumed> ) = 0
> > [pid 114266] sched_yield( 
> > [pid 114265] sched_yield( 
> > [pid 114267] sched_yield( 
> > [pid 114266] <... sched_yield resumed> ) = 0
> > [pid 114265] <... sched_yield resumed> ) = 0
> > [pid 114267] <... sched_yield resumed> ) = 0
> > [pid 114266] sched_yield( 
> > [pid 114265] sched_yield( 
> >
> > kernel.log
> >
> > Oct 10 19:13:04 int16 kernel: [192997.62] sched: RT throttling
> > activated
> > Oct 10 19:16:28 int16 kernel: [193201.140115] INFO: task :1213
> > blocked for more than 120 seconds.
> > Oct 10 19:16:28 int16 kernel: [193201.144250]   Tainted: G
> > OE   
> > Oct 10 19:16:28 int16 kernel: [193201.147927] "echo 0 >
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Oct 10 19:16:28 int16 kernel: [193201.152389]  880768c878a8
> > 880768c87968 880766ae3700 880768c88000
> > Oct 10 19:16:28 int16 kernel: [193201.152392]  
> > 7fff 88078ffd50f0 817f1700
> > Oct 10 19:16:28 int16 kernel: [193201.152394]  880768c878c0
> > 817f0fb5 88076f917200 880768c87970
> > Oct 10 19:16:28 int16 kernel: [193201.152396] Call Trace:
> > Oct 10 19:16:28 int16 kernel: [193201.152402]  [] ?
> > bit_wait+0x50/0x50
> > Oct 10 19:16:28 int16 kernel: [193201.152404]  []
> > schedule+0x35/0x80
> > Oct 10 19:16:28 int16 kernel: [193201.152418]  []
> > schedule_timeout+0x23b/0x2d0
> > Oct 10 19:16:28 int16 kernel: [193201.152430]  [] ?
> > xen_clocksource_read+0x15/0x20
> > Oct 10 19:16:28 int16 kernel: [193201.152438]  [] ?
> > sched_clock+0x9/0x10
> > Oct 10 19:16:28 int16 kernel: [193201.152441]  [] ?
> > __blk_mq_alloc_request+0xe4/0x1f0
> > Oct 10 19:16:28 int16 kernel: [193201.152442]  [] ?
> > bit_wait+0x50/0x50
> > Oct 10 19:16:28 int16 kernel: [193201.152445]  []
> > io_schedule_timeout+0xa6/0x110
> > Oct 10 19:16:28 int16 kernel: [193201.152446]  []
> > bit_wait_io+0x1b/0x60
> > Oct 10 19:16:28 int16 kernel: [193201.152448]  []
> > __wait_on_bit+0x62/0x90
> > Oct 10 19:16:28 int16 kernel: [193201.152451]  []
> > wait_on_page_bit+0xc0/0xd0
> > Oct 10 19:16:28 int16 kernel: [193201.152454]  [] ?
> > autoremove_wake_function+0x40/0x40
> > Oct 10 19:16:28 int16 kernel: [193201.152456]  []
> > truncate_inode_pages_range+0x366/0x6d0
> > Oct 10 19:16:28 int16 kernel: [193201.152459]  [] ?
> > blkif_queue_rq+0x508/0x690
> > Oct 10 19:16:28 int16 kernel: [193201.152461]  [] ?
> > down_write+0x12/0x40
> > Oct 10 19:16:28 int16 kernel: [193201.152465]  [] ?
> > unmap_mapping_range+0x66/0x110
> > Oct 10 19:16:28 int16 kernel: [193201.152467]  []
> > truncate_pagecache+0x47/0x60
> > Oct 10 19:16:28 int16 kernel: [193201.152469]  []
> > truncate_setsize+0x32/0x40
> > Oct 10 19:16:28 int16 kernel: [193201.152509]  []
> > xfs_setattr_size+0x168/0x3d0 [xfs]
> > Oct 10 19:16:28 int16 kernel: [193201.152522]  []
> > xfs_vn_setattr+0x9f/0xb0 [xfs]
> > Oct 10 19:16:28 int16 kernel: [193201.152524]  []
> > notify_change+0x250/0x470
> > Oct 10 19:16:28 int16 kernel: [193201.152533]  []
> > do_truncate+0x66/0xa0
> > Oct 10 19:16:28 int16 kernel: [193201.152545]  []
> > path_openat+0x266/0x12e0
> > Oct 10 19:16:28 int16 kernel: [193201.152555]  [] ?
> > record_event_consumer.part.5+0x2ea/0x9e0 [sysdigcloud_probe]
> > Oct 10 19:16:28 int16 kernel: [193201.152564]  [] ?
> > generic_file_read_iter+0x5d5/0x670
> > Oct 10 19:16:28 int16 kernel: [193201.152566]  []
> > do_filp_open+0x7e/0xd0
> > Oct 10 19:16:28 int16 kernel: [193201.152569]  [] ?
> > __alloc_fd+0xc4/0x180
> > Oct 10 19:16:28 int16 kernel: [193201.152571]  []
> > do_sys_o

Re: High CPU Usage followed by segfault error

2018-10-02 Thread Olivier Houchard
Hi,

On Tue, Oct 02, 2018 at 08:26:12PM +0530, Soji Antony wrote:
> Hello,
> 
> We are currently using haproxy 1.8.3 with single process multithreaded
> configuration.
> We have 1 process and 10 threads each mapped to a separate core [0-9]. We
> are running our haproxy instances on a c4.4xlarge aws ec2 instance. The
> only other CPU intensive process running on this server is a log shipper
> which is explicity mapped to cpu cores 13 - 16 explicitly using taskset
> command. Also we have given 'SCHED_RR' priority 99 for haproxy processes.
> 
> OS: Ubuntu 14
> Kernel: 4.4.0-134-generic
> 
> The issue we are seeing with Haproxy is all of a sudden CPU usage spikes to
> 100% on cores which haproxy is using & causing latency spikes and high load
> on the server. We are seeing the following error messages in system /
> kernel logs when this issue happens.
> 
> haproxy[92558]: segfault at 8 ip 55f04b1f5da2 sp 7ffdab2bdd40 error
> 6 in haproxy[55f04b10100
> 0+17]
> 
> Sep 29 12:21:02 marathonlb-int21 kernel: [2223350.996059] sched: RT
> throttling activated
> 
> We are using marathonlb for auto discovery and reloads are quite frequent
> on this server. Last time when this issue happened we had seen haproxy
> using 750% of CPU and it went into D state. Also the old process was also
> taking cpu.
> 
> hard-stop-after was not set in our hap configuration and we were seeing
> multiple old pid's running on the server. After the last outage we had with
> CPU we set 'hard-stop-after' to 10s and now we are not seeing multiple hap
> instances running after reload. I would really appreciate if some one can
> explain us why the CPU usage spikes with the above segfault error & what
> this error exactly means.
> 
> FYI: There was no traffic spike on this hap instance when the issue
> happened. We have even seen the same issue in a non-prod hap where no
> traffic was coming & system went down due to CPU usage & found the same
> segfault error in the logs.
> 

A good first step would probably be to upgrade to the latest version if possible.
1.8.3 is quite old, and a bunch of bugs have been fixed since then,
especially when using multithreading.

Regards,

Olivier



Re: Crash report

2018-09-26 Thread Olivier Houchard
Hi Anton,

On Wed, Sep 26, 2018 at 12:09:07PM +0300, prog 76 wrote:
> 
> Hi
> First of all Thank you for this great product. We are very happy to use it 
> for years.
> Unfortunately from version 1.8.12 we have an issue. Sometimes haproxy crash.
> We tried  to upgrade to 1.8.13 and it also crashes.
> Not often, we have almot constant load and it crashes 1 times per week. We 
> have count of instances on different machines and enough amount of them are 
> ready. Also we use Monit service to restart crashed instances.
> But anyway not good because we broke connections.
> Our config haproxy.cfg attached. 
> Crash data haproxy.core.zip attached.
> 

Beware when sending the core on the mailing list: it may contain sensitive
information.

> It seems always crashes on the same URL last log lines from two crashes
> 
> proxy1web-next haproxy[12763]: x.x.x.x:6733 [07/Sep/2018:10:16:56.924] https~ 
> oldpool/web1-next 2634/0/1/82/3649 500 1434 - -  2/1/0/1/0 0/0 "POST 
> /Mobile/UpdateTechnicianPositionBatched HTTP/1.1"
> proxy2web-next haproxy[88028]: x.x.x.x:6750 [12/Sep/2018:10:55:56.696] https~ 
> oldpool/web2-next 1597/0/1/78/3802 500 1434 - -  4/3/0/1/0 0/0 "POST 
> /Mobile/UpdateTechnicianPositionBatched HTTP/1.1"
> 
> Back trace looks like
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Core was generated by `/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -D -p 
> /var/run/haproxy.pid'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x7f5e6c27981d in __memmove_ssse3_back ()
> at ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S:1881
> 1881 ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S: No such file or 
> directory.
> (gdb) bt
> #0 0x7f5e6c27981d in __memmove_ssse3_back ()
> at ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S:1881
> #1 0x556549c98e9b in memmove (__len=, __src=0x7f5e480594c7,
> __dest=0x7f5e480594e1) at /usr/include/x86_64-linux-gnu/bits/string3.h:57
> #2 buffer_insert_line2 (b=0x7f5e48055f70,
> pos=0x7f5e480594c7 
> "\r\n{\"location\":[{\"coords\":{\"speed\":-1,\"longitude\":-122.31412169240122,\"latitude\":47.311481963215847,\"accuracy\":70.4",
>  '0' , 
> "6,\"altitude_accuracy\":10,\"altitude\":121.7,\"heading\":-1},\"extras\":{},\"is_m"...,
> str=str@entry=0x556549ff0340 "X-Forwarded-Proto: https", len=len@entry=24)
> at src/buffer.c:158
> #3 0x556549bd35b7 in http_header_add_tail (msg=msg@entry=0x7f5e5004a570,
> hdr_idx=hdr_idx@entry=0x7f5e5004a510, text=0x556549ff0340 "X-Forwarded-Proto: 
> https")
> at src/proto_http.c:538
> #4 0x556549bdc9d8 in http_process_req_common (s=s@entry=0x7f5e50027470,
> req=req@entry=0x7f5e50027480, an_bit=an_bit@entry=16, px=0x556549fed430)
> at src/proto_http.c:3516
> #5 0x556549c0dc72 in process_stream (t=) at 
> src/stream.c:1905
> #6 0x556549c8ba07 in process_runnable_tasks () at src/task.c:317
> #7 0x556549c3ec85 in run_poll_loop () at src/haproxy.c:2403
> #8 run_thread_poll_loop (data=) at src/haproxy.c:2470
> #9 0x7f5e6cfa4184 in start_thread (arg=0x7f5e68412700) at 
> pthread_create.c:312
> #10 0x7f5e6c22237d in eventfd (count=26, flags=1208320820)
> at ../sysdeps/unix/sysv/linux/eventfd.c:55
> #11 0x in ?? ()
> (gdb)
>  
> And unfortunately we were not able to reproduce the issue manually. Tried 
> slowhttptest, curl, for now no luck.
> 
> Could you please suggest next steps that we can do to fix that?
> 

Any chance you can give us the binary that generated that core as well ? That
would be really useful.

Thanks !

Olivier



Re: Hang in haproxy 1.8.13

2018-09-11 Thread Olivier Houchard
Hi,

On Tue, Sep 11, 2018 at 12:58:40PM +0100, David King wrote:
> Hi,
> 
> > I just tested, and it seems to happen with the latest 1.8, but not with
> 1.9.
> > Not using kqueue (by using -dk) seems to work around the issue.
> > It's quite interesting it doesn't happen on Centos, it means it is
> probably
> > kqueue-specific (or that it works by accident with epoll/poll/select).
> > I'm investigating.
> >
> > Regards,
> >
> > Olivier
> 
> 
> so i've been running some tests from builds from source, it seems to be
> 1.8.13 specific, as 1.8.12 works fine, and as Oliver said 1.9 is OK as well
> 
> Thanks
> 
> Dave

Ok I think I figured it out.
The bug was present in master too, but was masked for some reason.

The patches attached should fix this. The first one is for master, and the
second one for 1.8, as the master patch didn't apply cleanly on 1.8.

Regards,

Olivier
>From d950da31340528c37173fc74d1c0f635c977cd03 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Tue, 11 Sep 2018 14:44:51 +0200
Subject: [PATCH] BUG/MAJOR: kqueue: Don't reset the changes number by
 accident.

In _update_fd(), if the fd wasn't polled, and we don't want it to be polled,
we just returned 0, however, we should return changes instead, or all previous
changes will be lost.

This should be backported to 1.8.
---
 src/ev_kqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/ev_kqueue.c b/src/ev_kqueue.c
index 087a07e7..e2f04f70 100644
--- a/src/ev_kqueue.c
+++ b/src/ev_kqueue.c
@@ -44,7 +44,7 @@ static int _update_fd(int fd, int start)
if (!(fdtab[fd].thread_mask & tid_bit) || !(en & FD_EV_POLLED_RW)) {
if (!(polled_mask[fd] & tid_bit)) {
/* fd was not watched, it's still not */
-   return 0;
+   return changes;
}
/* fd totally removed from poll list */
EV_SET(&kev[changes++], fd, EVFILT_READ, EV_DELETE, 0, 0, NULL);
-- 
2.14.3

>From e987a5bda41a1b560a352b4ec2d54a8ebcd5965a Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Tue, 11 Sep 2018 14:44:51 +0200
Subject: [PATCH] BUG/MAJOR: kqueue: Don't reset the changes number by
 accident.

In _update_fd(), if the fd wasn't polled, and we don't want it to be polled,
we just returned 0, however, we should return changes instead, or all previous
changes will be lost.

This should be backported to 1.8.
---
 src/ev_kqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/ev_kqueue.c b/src/ev_kqueue.c
index 1f4762e6..c88cf6f3 100644
--- a/src/ev_kqueue.c
+++ b/src/ev_kqueue.c
@@ -44,7 +44,7 @@ static int _update_fd(int fd, int start)
if (!(fdtab[fd].thread_mask & tid_bit) || !(en & FD_EV_POLLED_RW)) {
if (!(fdtab[fd].polled_mask & tid_bit)) {
/* fd was not watched, it's still not */
-   return 0;
+   return changes;
}
/* fd totally removed from poll list */
EV_SET(&kev[changes++], fd, EVFILT_READ, EV_DELETE, 0, 0, NULL);
-- 
2.14.3



Re: Hang in haproxy 1.8.13

2018-09-11 Thread Olivier Houchard
Hi,

On Tue, Sep 11, 2018 at 12:36:08PM +0200, Lukas Tribus wrote:
> On Tue, 11 Sep 2018 at 11:55, David King  wrote:
> >
> > Apologies, i forgot to mention this is running on FreeBSD 11.1
> >
> > I've just run the same tests on Centos and there is no issue
> 
> Could you retry with the current development tree (1.9) from git?
> There are a number of fixes waiting to be backported to 1.8 and also a
> number of already backported fixes (but post 1.8.13).
> 
> 

I just tested, and it seems to happen with the latest 1.8, but not with 1.9.
Not using kqueue (by using -dk) seems to work around the issue.
It's quite interesting it doesn't happen on Centos, it means it is probably
kqueue-specific (or that it works by accident with epoll/poll/select).
I'm investigating.

Regards,

Olivier
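For readers wanting to make the `-dk` workaround above persistent, the same
effect is available from the configuration file. A minimal sketch of a global
section (nokqueue is the config-file counterpart of the `-dk` command-line
flag):

```
global
    nokqueue    # skip the kqueue poller; haproxy falls back to poll/select
```

This disables only kqueue; the remaining pollers are still tried in their
usual preference order.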





Re: Help you generate more revenue for your haproxy.com.

2018-09-05 Thread Olivier Houchard
On Wed, Sep 05, 2018 at 10:14:55PM +1000, Rob Thomas wrote:
> You gotta wonder how this guy got this mailing list.  He must have actually
> LOOKED at the website, right?
> 
> Sigh. Spammers.
> 
> For anyone who cares, I don't think it's possible for haproxy to get MORE
> exposure on google.
> 
> [image: image.png]
> 

Notice how he mentioned haproxy.com, which is only 2nd in your google search.

I think we can trust somebody named FREDDIE KIRK, after all he is SEO
Strategist AND Business Development Manager.

Regards,

Olivier

> On Wed, 5 Sep 2018 at 22:07, FREDDIE KIRK 
> wrote:
> 
> > Hi *haproxy.com ,*
> >
> > *Do you need to know how your website currently ranks on search engine
> > result pages and how you can start beating your competitors right now?*
> >
> > Today, I went through your website *haproxy.com *;
> > you seem to have a great website, but only the thing is People are already
> > searching for your products and services, but if you don't use the Right
> > Keywords they're searching for on your site, it will be difficult for them
> > to find you.
> >
> > We will deliver you a huge ROI, high ranking, more traffic, clicks, page
> > views and most importantly converting those visitors into paying customers.
> >
> > Let me know if I should share a *Plan of Action* for your website
> >
> > Kind Regards
> >
> > SEO Strategist
> > Business Development Manager
> >





Re: How to verify HAProxy build on Solaris/SPARC ?

2018-08-31 Thread Olivier Houchard
Hi Lukas,

On Fri, Aug 31, 2018 at 02:03:47PM +0200, Lukas Tribus wrote:
> Hello,
> 
> 
> On Fri, 31 Aug 2018 at 04:30, Willy Tarreau  wrote:
> > I'd like to ask you to test something just in case it helps. Could
> > you please modify your makefile to add "-pthread" to "-DUSE_THREAD"
> > like this :
> >
> >  ifneq ($(USE_THREAD),)
> >  BUILD_OPTIONS   += $(call ignore_implicit,USE_THREAD)
> > -OPTIONS_CFLAGS  += -DUSE_THREAD
> > +OPTIONS_CFLAGS  += -DUSE_THREAD -pthread
> >  OPTIONS_LDFLAGS += -lpthread
> >  endif
> 
> Note that in a thread from July Oliver concluded that
> USE_PTHREAD_PSHARED=1 is needed under Solaris:
> 
> https://www.mail-archive.com/haproxy@formilux.org/msg30696.html
> 
> 
> Not sure if we are talking about the same exact issue here though.
> 

I think in this case it is fine, as its gcc is recent enough to have
the required builtins.

Regards,

Olivier



Re: [PATCH] BUG/MAJOR: thread: lua: Wrong SSL context initialization.

2018-08-29 Thread Olivier Houchard
On Wed, Aug 29, 2018 at 02:11:45PM +0200, Frederic Lecaille wrote:
> This patch is in relation with one bug reproduced by the reg testing file
> sent by Pieter in this thread:
> https://www.mail-archive.com/haproxy@formilux.org/msg31079.html
> 
> Must be checked by Thierry.
> Must be backported to 1.8.
> 
> Note that Pieter reg testing files reg-tests/lua/b2.* come with this
> patch.
> 
> 
> Fred.

> From d6d38a354a89b55f91bb9962c5832a089d960b60 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Fr=C3=A9d=C3=A9ric=20L=C3=A9caille?= 
> Date: Wed, 29 Aug 2018 13:46:24 +0200
> Subject: [PATCH] BUG/MAJOR: thread: lua: Wrong SSL context initialization.
> 
> When calling ->prepare_srv() callback for SSL server which
> depends on global "nbthread" value, this latter was not already parsed,
> so equal to 1 default value. This lead to bad memory accesses.
> 
> Thank you to Pieter (PiBa-NL) for having reported this issue and
> for having provided a very helpful reg testing file to reproduce
> this issue (reg-test/lua/b2.*).
> 

That sounds good, nice catch !

And yes thanks Pieter, as usual :)

Olivier



Re: lua script, 200% cpu usage with nbthread 3 - haproxy hangs - __spin_lock - HA-Proxy version 1.9-dev1-e3faf02 2018/08/25

2018-08-28 Thread Olivier Houchard
Hi,

On Mon, Aug 27, 2018 at 03:26:50PM +0200, Frederic Lecaille wrote:
[...]
> 
> According to Pieter traces, haproxy has registered HTTP service mode lua
> applets in HTTP mode. Your patch fixes a TCP service mode issue.
> reg-test/lua/b1.vtc script runs both HTTP and TCP lua applets. But this
> is the TCP mode one which makes sometimes fail this script.
> 
> > > > It seems one thread is stuck in lua_gc() while holding the global LUA 
> > > > lock,
> > > > but I don't know enough about LUA to guess what is going on.
> > > 
> > > What is suspect is that the HTTP and TCP applet functions
> > > hlua_applet_(http|tcp)_fct() are called several times even when the applet
> > > is done, or when the streams are disconnected or closed:
> > > 
> > >   /* If the stream is disconnect or closed, ldo nothing. */
> > >  if (unlikely(si->state == SI_ST_DIS || si->state == SI_ST_CLO))
> > >  return;
> > > 
> > > this leads to call hlua_ctx_resume() several times from the same thread I
> > > guess.
> > > 
> > 
> > But if hlua_applet_(http|tcp)_fct() just returns, who calls
> > hlua_ctx_resume() ? :)
> 
> hlua_applet_(http|tcp)_fct() functions. If your run the script previously
> mentioned, when it fails this is because hlua_applet_*tcp*_fct() is
> infinitely called.
> 
> 

Ok you're right, I have a patch for that problem, which should definitely
be different from Pieter's problem :)
Willy, I think it's safe to be applied, and should probably be backported
(albeit it should be adapted, given the API differences with buffers/channels)
to 1.8 and 1.7; I've been able to reproduce the problem on both.

Regards,

Olivier
>From bf62441f9d0b305e16a74dbe3341ee7933c04761 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Tue, 28 Aug 2018 14:41:31 +0200
Subject: [PATCH] BUG/MEDIUM: hlua: Make sure we drain the output buffer when
 done.

In hlua_applet_tcp_fct(), drain the output buffer when the applet is done
running, every time we're called.
Otherwise, there's a race condition, and the output buffer could be filled
after the applet ran, and as it is never cleared, the stream interface
will never be destroyed.

This should be backported to 1.8 and 1.7.
---
 src/hlua.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/hlua.c b/src/hlua.c
index edb4f68c..7bbc854d 100644
--- a/src/hlua.c
+++ b/src/hlua.c
@@ -6446,8 +6446,11 @@ static void hlua_applet_tcp_fct(struct appctx *ctx)
struct hlua *hlua = ctx->ctx.hlua_apptcp.hlua;
 
/* The applet execution is already done. */
-   if (ctx->ctx.hlua_apptcp.flags & APPLET_DONE)
+   if (ctx->ctx.hlua_apptcp.flags & APPLET_DONE) {
+   /* eat the whole request */
+   co_skip(si_oc(si), co_data(si_oc(si)));
return;
+   }
 
/* If the stream is disconnect or closed, ldo nothing. */
if (unlikely(si->state == SI_ST_DIS || si->state == SI_ST_CLO))
-- 
2.14.3



Re: lua script, 200% cpu usage with nbthread 3 - haproxy hangs - __spin_lock - HA-Proxy version 1.9-dev1-e3faf02 2018/08/25

2018-08-27 Thread Olivier Houchard
On Mon, Aug 27, 2018 at 02:29:42PM +0200, Frederic Lecaille wrote:
> On 08/27/2018 01:33 PM, Olivier Houchard wrote:
> > Hi Pieter,
> > 
> > On Sat, Aug 25, 2018 at 10:00:04PM +0200, PiBa-NL wrote:
> > > Hi List, Thierry, Olivier,
> > > 
> > > Using a lua-socket with connect_ssl and haproxy running with nbthread 3..
> > > results in haproxy hanging with 3 threads for me.
> > > 
> > > This while using both 1.9-7/30 version (with the 2 extra patches from
> > > Olivier avoiding 100% on a single thread.) and also a build of today's
> > > snapshot: HA-Proxy version 1.9-dev1-e3faf02 2018/08/25
> > > 
> > > Below info is at the bottom of the mail:
> > > - haproxy -vv
> > > - gdb backtraces
> > > 
> > > This one is easy to reproduce after just a few calls to the lua function
> > > with the lua code i'm writing on a test-box.. So if a 'simple' config that
> > > makes a reproduction is desired i can likely come up with one.
> > > Same lua code with nbthread 1 seems to work properly.
> > > 
> > > Is below info (the stack traces) enough to come up with a fix? If not 
> > > lemme
> > > know and ill try and make a small reproduction of it.
> > > 
> > > 
> > > root@freebsd11:~ # haproxy -vv
> > > HA-Proxy version 1.9-dev1-e3faf02 2018/08/25
> > > Copyright 2000-2018 Willy Tarreau 
> > > 
> > > Build options :
> > >    TARGET  = freebsd
> > >    CPU = generic
> > >    CC  = cc
> > >    CFLAGS  = -DDEBUG_THREAD -DDEBUG_MEMORY -pipe -g -fstack-protector
> > > -fno-strict-aliasing -fno-strict-aliasing -Wdeclaration-after-statement
> > > -fwrapv -fno-strict-overflow -Wno-address-of-packed-member
> > > -Wno-null-dereference -Wno-unused-label -DFREEBSD_PORTS -DFREEBSD_PORTS
> > >    OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_CPU_AFFINITY=1 USE_ACCEPT4=1
> > > USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1 USE_PCRE_JIT=1
> > > 
> > > Default settings :
> > >    maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
> > > 
> > > Built with network namespace support.
> > > Built with zlib version : 1.2.11
> > > Running on zlib version : 1.2.11
> > > Compression algorithms supported : identity("identity"), 
> > > deflate("deflate"),
> > > raw-deflate("deflate"), gzip("gzip")
> > > Built with PCRE version : 8.40 2017-01-11
> > > Running on PCRE version : 8.40 2017-01-11
> > > PCRE library supports JIT : yes
> > > Built with multi-threading support.
> > > Encrypted password support via crypt(3): yes
> > > Built with transparent proxy support using: IP_BINDANY IPV6_BINDANY
> > > Built with Lua version : Lua 5.3.4
> > > Built with OpenSSL version : OpenSSL 1.0.2k-freebsd  26 Jan 2017
> > > Running on OpenSSL version : OpenSSL 1.0.2k-freebsd  26 Jan 2017
> > > OpenSSL library supports TLS extensions : yes
> > > OpenSSL library supports SNI : yes
> > > OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
> > > 
> > > Available polling systems :
> > >   kqueue : pref=300,  test result OK
> > >     poll : pref=200,  test result OK
> > >   select : pref=150,  test result OK
> > > Total: 3 (3 usable), will use kqueue.
> > > 
> > > Available multiplexer protocols :
> > > (protocols markes as  cannot be specified using 'proto' keyword)
> > >      : mode=TCP|HTTP   side=FE|BE
> > >    h2 : mode=HTTP   side=FE
> > > 
> > > Available filters :
> > >      [TRACE] trace
> > >      [COMP] compression
> > >      [SPOE] spoe
> > > 
> > > root@freebsd11:~ # /usr/local/bin/gdb81 --pid 39649
> > > GNU gdb (GDB) 8.1 [GDB v8.1 for FreeBSD]
> > > Copyright (C) 2018 Free Software Foundation, Inc.
> > > License GPLv3+: GNU GPL version 3 or later
> > > <http://gnu.org/licenses/gpl.html>
> > > This is free software: you are free to change and redistribute it.
> > > There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> > > and "show warranty" for details.
> > > This GDB was configured as "x86_64-portbld-freebsd11.1".
> > > Type "show configuration" for configuration details.
> > > For bug reporting instructions, please see:
> > > <http://www.gnu.org/software/gdb/bugs

Re: lua script, 200% cpu usage with nbthread 3 - haproxy hangs - __spin_lock - HA-Proxy version 1.9-dev1-e3faf02 2018/08/25

2018-08-27 Thread Olivier Houchard
 No symbol table info available.
> #2  0x00080187b108 in ?? () from /usr/local/lib/liblua-5.3.so
> No symbol table info available.
> #3  0x000801873e30 in lua_gc () from /usr/local/lib/liblua-5.3.so
> No symbol table info available.
> #4  0x00438e45 in hlua_ctx_resume (lua=0x8024f4f80, yield_allowed=1)
> at src/hlua.c:1186
>     ret = 0
>     msg = 0x5a8a24  "\351\200"
>     trace = 0x7fffdfdfcc60 "\240\314\337\337\377\177"
> #5  0x0044887a in hlua_applet_http_fct (ctx=0x8024c5880) at
> src/hlua.c:6716
>     si = 0x802419540
>     strm = 0x802419200
>     res = 0x802419270
>     rule = 0x80242e360
>     px = 0x802512c00
>     hlua = 0x8024f4f80
>     blk1 = 0x7fffdfdfcca0 ""
>     len1 = 34397588737
>     blk2 = 0x802419278 ""
>     len2 = 34397590136
>     ret = 0
> #6  0x005a78a7 in task_run_applet (t=0x80242efe0,
> context=0x8024c5880, state=16385) at src/applet.c:49
>     app = 0x8024c5880
>     si = 0x802419540
> #7  0x005a49a6 in process_runnable_tasks () at src/task.c:384
>     t = 0x80242efe0
>     state = 16385
>     ctx = 0x8024c5880
>     process = 0x5a77f0 
>     t = 0x80242efe0
>     max_processed = 200
> #8  0x0051a6b2 in run_poll_loop () at src/haproxy.c:2386
>     next = 1923608730
>     exp = 1923608070
> #9  0x00517672 in run_thread_poll_loop (data=0x8024849f8) at
> src/haproxy.c:2451
>     start_lock = {lock = 0, info = {owner = 0, waiters = 0,
> last_location = {function = 0x0, file = 0x0, line = 0}}}
>     ptif = 0x8c1980 
>     ptdf = 0x800f177cc
> ---Type  to continue, or q  to quit---
> #10 0x000800f12bc5 in ?? () from /lib/libthr.so.3
> No symbol table info available.
> #11 0x in ?? ()
> No symbol table info available.
> Backtrace stopped: Cannot access memory at address 0x7fffdfdfd000
> (gdb)
> 
> 

I found one minor problem with the LUA code (patch attached) but I kinda
doubt this is your problem here.
It seems one thread is stuck in lua_gc() while holding the global LUA lock,
but I don't know enough about LUA to guess what is going on.
As Fred said, it would be really nice to have a sample configuration to
reproduce the problem.

Regards,

Olivier
>From a6269dc836b47daed507791e10c6feba35bbd356 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Mon, 27 Aug 2018 12:59:14 +0200
Subject: [PATCH] MINOR: hlua: Don't call RESET_SAFE_LJMP if SET_SAFE_LJMP
 returns 0.

If SET_SAFE_LJMP returns 0, the spinlock is already unlocked, and lua_atpanic
is already set back to hlua_panic_safe, so there's no need to call
RESET_SAFE_LJMP.
---
 src/hlua.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/hlua.c b/src/hlua.c
index 65986473a..d516ee6d7 100644
--- a/src/hlua.c
+++ b/src/hlua.c
@@ -6391,7 +6391,6 @@ static int hlua_applet_tcp_init(struct appctx *ctx, 
struct proxy *px, struct str
error = "critical error";
SEND_ERR(px, "Lua applet tcp '%s': %s.\n",
 ctx->rule->arg.hlua_rule->fcn.name, error);
-   RESET_SAFE_LJMP(hlua->T);
return 0;
}
 
-- 
2.14.3



Re: HTTP/2 issues and segfaults with current 1.9-dev [7ee465]

2018-08-21 Thread Olivier Houchard
Hi Cyril,

On Tue, Aug 21, 2018 at 12:51:13PM +0200, Cyril Bonté wrote:
> Hi Willy,
> 
> I'm afraid there's still some issues with HTTP/2 in the current dev branch :-(
> 
> This morning, I've upgraded a test server and discovered that some HTTPS 
> sites did not work anymore (content hangs and is not sent to the client), 
> I've also noticed some segfaults in haproxy.
> As this is a test server that I've used for several years with haproxy, the 
> configuration begins to be quite ugly, it won't be helpful to provide it in 
> its current state.
> 

I think I figured those out, well at least I don't have hangs anymore, and
I think I understood those segfaults.
The attached patchset should do the trick.

Willy, those should be mergeable.

Thanks !

Olivier
>From 159f4653cffdd26fb62a26d047ac3f87f7506e59 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Tue, 21 Aug 2018 14:25:52 +0200
Subject: [PATCH 1/5] BUG/MEDIUM: streams: Don't forget to remove the si from
 the wait list.

When freeing the stream, make sure we remove the stream interfaces from the
wait lists, in case it was in there.
---
 src/stream.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/stream.c b/src/stream.c
index bc0f1ac7..0f2ccf01 100644
--- a/src/stream.c
+++ b/src/stream.c
@@ -409,7 +409,9 @@ static void stream_free(struct stream *s)
session_free(sess);
 
tasklet_free(s->si[0].wait_list.task);
+   LIST_DEL(&s->si[0].wait_list.list);
tasklet_free(s->si[1].wait_list.task);
+   LIST_DEL(&s->si[1].wait_list.list);
pool_free(pool_head_stream, s);
 
/* We may want to free the maximum amount of pools if the proxy is 
stopping */
-- 
2.14.3

>From 3c93f5ad9f6dd14114c01217a11d56455c1fe62d Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Fri, 17 Aug 2018 18:57:51 +0200
Subject: [PATCH 2/5] BUG/MEDIUM: tasklets: Add the thread as active when
 waking a tasklet.

Set the flag for the current thread in active_threads_mask when waking a
tasklet, or we will never run it if no tasks are available.
---
 include/proto/task.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/proto/task.h b/include/proto/task.h
index 13398c53..eef4ded4 100644
--- a/include/proto/task.h
+++ b/include/proto/task.h
@@ -226,6 +226,7 @@ static inline void tasklet_wakeup(struct tasklet *tl)
return;
LIST_ADDQ(&task_list[tid], &tl->list);
task_list_size[tid]++;
+   HA_ATOMIC_OR(&active_tasks_mask, tid_bit);
HA_ATOMIC_ADD(&tasks_run_queue, 1);
 
 }
-- 
2.14.3

>From 722665b5f60b65ad0eb035b6dd2162062dcbf42f Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Tue, 21 Aug 2018 15:59:43 +0200
Subject: [PATCH 3/5] BUG/MEDIUM: Check if the conn_stream exist in
 si_cs_io_cb.

It is possible that the conn_stream gets detached from the stream_interface,
and as it subscribed to the wait list, si_cs_io_cb() gets called anyway,
so make sure we have a conn_stream before attempting to send more data.
---
 src/stream_interface.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/stream_interface.c b/src/stream_interface.c
index 4b5b760c..52aa7c43 100644
--- a/src/stream_interface.c
+++ b/src/stream_interface.c
@@ -760,8 +760,12 @@ wake_others:
 struct task *si_cs_io_cb(struct task *t, void *ctx, unsigned short state)
 {
struct stream_interface *si = ctx;
+   struct conn_stream *cs = objt_cs(si->end);
+
+   if (!cs)
+   return NULL;
if (!(si->wait_list.wait_reason & SUB_CAN_SEND))
-   si_cs_send(__objt_cs(si->end));
+   si_cs_send(cs);
return (NULL);
 }
 
-- 
2.14.3

>From bee74b66b702126dd7bd808d0f4037ba885441a2 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Tue, 21 Aug 2018 16:36:10 +0200
Subject: [PATCH 4/5] BUG/MEDIUM: H2: Activate polling after successful
 h2_snd_buf().

Make sure h2_send() is called after h2_snd_buf() by activating polling.
---
 src/mux_h2.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mux_h2.c b/src/mux_h2.c
index 4a3150a2..7824cfe4 100644
--- a/src/mux_h2.c
+++ b/src/mux_h2.c
@@ -3531,6 +3531,8 @@ static size_t h2_snd_buf(struct conn_stream *cs, struct buffer *buf, size_t count, int flags)
}
 
b_del(buf, total);
+   if (total > 0)
+   conn_xprt_want_send(h2s->h2c->conn);
return total;
 }
 
-- 
2.14.3

From 64220c175bdc43fc82efc73c633be3b0212ab3e2 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Tue, 21 Aug 2018 16:37:06 +0200
Subject: [PATCH 5/5] BUG/MEDIUM: stream_interface: Call the wake callback
 after sending.

If we subscribed to send, and the callback is called, call the wake callback
after, so that process_stream() may be woken up if needed.
---
 src/stream_interface.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/stream_interface.c b/src/stream_interface.c
index 52aa7c43..51f2300d 100644
--- a/src/stream_interface.c

Re: 100% cpu usage 1.9-dev0-48d92ee 2018/07/30, task.?. but keeps working.. (nbthread 1)

2018-08-21 Thread Olivier Houchard
Hi Pieter,

On Mon, Aug 20, 2018 at 09:33:49PM +0200, PiBa-NL wrote:
> Hi Olivier,
> 
> Op 17-8-2018 om 14:51 schreef Willy Tarreau:
> > On Fri, Aug 17, 2018 at 01:41:54PM +0200, Olivier Houchard wrote:
> > > That is true, this one is not a bug, but a pessimization, by using the 
> > > global
> > > update_list which is more costly than the local one.
> > > 
> > > Patches attached to do as suggest.
> > Applied, thank you!
> > willy
> 
> Just a little update :)
> 
> The '1.9-dev0-48d92ee 2018/07/30'+your initial 2 patches, is running
> correctly using as little cpu as can be expected for its little workload for
> 4+ days now. I think i can call it 'fix confirmed', as you already knew ;) ,
> previously the issue would have likely returned in this time period..
> 
> Ill keep it running for a few more days, and switch back to nbthread 3
> then.. Till next time ;)
> 
> Thanks again!
> 
> Best regards,
> 
> PiBa-NL (Pieter)
> 

Thanks for the feedback :)
Let us know how it goes with 3 threads !

Regards,

Olivier



Re: 100% cpu usage 1.9-dev0-48d92ee 2018/07/30, task.?. but keeps working.. (nbthread 1)

2018-08-17 Thread Olivier Houchard
On Thu, Aug 16, 2018 at 07:31:17PM +0200, Willy Tarreau wrote:
> Both patches applied, thanks guys!
> 
> Olivier, I have a suggestion for this one :
> On Thu, Aug 16, 2018 at 07:17:07PM +0200, Olivier Houchard wrote:
> > From 90fc92f72c6b47d88769bb73680702d7b8e6 Mon Sep 17 00:00:00 2001
> > From: Olivier Houchard 
> > Date: Thu, 16 Aug 2018 19:03:02 +0200
> > Subject: [PATCH 1/2] BUG/MEDIUM: tasks: Don't insert in the global rqueue if
> >  nbthread == 1
> > 
> > Make sure we don't insert a task in the global run queue if nbthread == 1,
> > as, as an optimisation, we avoid reading from it if nbthread == 1.
> > ---
> >  src/task.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/task.c b/src/task.c
> > index de097baf7..e357bc169 100644
> > --- a/src/task.c
> > +++ b/src/task.c
> > @@ -395,7 +395,8 @@ void process_runnable_tasks()
> > state = HA_ATOMIC_AND(&t->state, ~TASK_RUNNING);
> > if (state)
> >  #ifdef USE_THREAD
> > -   __task_wakeup(t, (t->thread_mask == tid_bit) ?
> > +   __task_wakeup(t, (t->thread_mask == tid_bit ||
> > +   global.nbthread == 1) ?
> > &rqueue_local[tid] : &rqueue);
> >  #else
> 
> I'm pretty sure we'll get caught by this sort of stuff in the future
> again. I think it would be safer to proceed like this :
> 
>  __task_wakeup(t, ((t->thread_mask & all_threads_mask) == tid_bit)
> 
> It should always be valid regardless of the number of threads. The same
> could be done in task_wakeup().
> 

Sure, why not.

> I suspect we may have a similar issue with the fd cache applied to listeners
> here :
> 
>static inline void updt_fd_polling(const int fd)
>{
> if (fdtab[fd].thread_mask == tid_bit) {
> unsigned int oldupdt;
> 
> /* note: we don't have a test-and-set yet in hathreads */
> 
> 
> In this case the thread_mask might be larger than all_threads_mask and
> we might fail this test. Or we may see that when threads exit and the
> threads mask changes.
> 
> Just my two cents, both patches were applied anyway.
> 

That is true, this one is not a bug, but a pessimization, by using the global
update_list which is more costly than the local one.

Patches attached to do as suggest.

Regards,

Olivier

>From c61f3f76f7e546aac14d7db33552dba11ce71e12 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Fri, 17 Aug 2018 13:36:08 +0200
Subject: [PATCH 1/2] MINOR: tasks: Don't special-case when nbthreads == 1

Instead of checking if nbthread == 1, just AND the thread_mask with
all_threads_mask to know whether we're supposed to add the task to the local
or the global runqueue.
---
 src/task.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/task.c b/src/task.c
index e357bc169..ce5b4f907 100644
--- a/src/task.c
+++ b/src/task.c
@@ -395,8 +395,7 @@ void process_runnable_tasks()
state = HA_ATOMIC_AND(&t->state, ~TASK_RUNNING);
if (state)
 #ifdef USE_THREAD
-   __task_wakeup(t, (t->thread_mask == tid_bit ||
-   global.nbthread == 1) ?
+   __task_wakeup(t, ((t->thread_mask & all_threads_mask) == tid_bit) ?
    &rqueue_local[tid] : &rqueue);
 #else
__task_wakeup(t, &rqueue_local[tid]);
-- 
2.14.3

From ea3b07fd2ed96b46ea107c9d862b5a1c27e728c2 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Fri, 17 Aug 2018 13:37:59 +0200
Subject: [PATCH 2/2] MINOR: fd cache: And the thread_mask with
 all_threads_mask.

When we choose to insert an fd in either the global or the local fd update list,
AND the thread_mask with all_threads_mask before checking if it's tid_bit;
that way, if we run with nbthread==1, we will always use the local list,
which is cheaper than the global one.
---
 include/proto/fd.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/proto/fd.h b/include/proto/fd.h
index a4cee3220..a3ec5e854 100644
--- a/include/proto/fd.h
+++ b/include/proto/fd.h
@@ -109,7 +109,7 @@ void fd_rm_from_fd_list(volatile struct fdlist *list, int fd, int off);
  */
 static inline void updt_fd_polling(const int fd)
 {
-   if (fdtab[fd].thread_mask == tid_bit) {
+   if ((fdtab[fd].thread_mask & all_threads_mask) == tid_bit) {
unsigned int oldupdt;
 
/* note: we don't have a test-and-set yet in hathreads */
-- 
2.14.3



Re: 100% cpu usage 1.9-dev0-48d92ee 2018/07/30, task.?. but keeps working.. (nbthread 1)

2018-08-16 Thread Olivier Houchard
Hi again,

On Thu, Aug 16, 2018 at 05:50:27PM +0200, Olivier Houchard wrote:
> Hi Pieter,
> 
> On Thu, Aug 16, 2018 at 12:24:04AM +0200, PiBa-NL wrote:
> > Hi List,
> > 
> > Anyone got an idea how to debug this further?
> > Currently its running at 100% again, any pointers to debug the process as
> > its running would be appreciated.
> > 
> > Or should i compile again from current master and 'hope' it doesn't return?
> > 
> > b.t.w. truss output is as follows:
> > kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> > kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> > kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> > kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> > kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> > kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> > kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> > 
> > Regards,
> > PiBa-NL (Pieter)
> 
> I'm interested in figuring that one out.
> From the look of it, it seems the scheduler thinks there's a task to be run,
> and so won't let the poller sleep in kevent().
> Can you
> - update to the latest master, even though I don't think any relevant fix
> was applied since Jul 30.
> - compile it with -O0, so that we can get meaningful information from gdb.
> - when/if that happens again, get a core and send it to me with the
> haproxy binary, assuming there's no confidential information in that core,
> of course.
> 
> Thanks !
> 

So after discussing on IRC, I'm pretty sure I figured it out. The two
attached patches should fix it.

Thanks a lot !

Olivier
From 90fc92f72c6b47d88769bb73680702d7b8e6 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Thu, 16 Aug 2018 19:03:02 +0200
Subject: [PATCH 1/2] BUG/MEDIUM: tasks: Don't insert in the global rqueue if
 nbthread == 1

Make sure we don't insert a task in the global run queue if nbthread == 1,
as, as an optimisation, we avoid reading from it if nbthread == 1.
---
 src/task.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/task.c b/src/task.c
index de097baf7..e357bc169 100644
--- a/src/task.c
+++ b/src/task.c
@@ -395,7 +395,8 @@ void process_runnable_tasks()
state = HA_ATOMIC_AND(&t->state, ~TASK_RUNNING);
if (state)
 #ifdef USE_THREAD
-   __task_wakeup(t, (t->thread_mask == tid_bit) ?
+   __task_wakeup(t, (t->thread_mask == tid_bit ||
+   global.nbthread == 1) ?
&rqueue_local[tid] : &rqueue);
 #else
__task_wakeup(t, &rqueue_local[tid]);
-- 
2.14.3

From 7640aa3de3c9ad00fe82cda4a50351e46fc0bf48 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Thu, 16 Aug 2018 19:03:50 +0200
Subject: [PATCH 2/2] BUG/MEDIUM: sessions: Don't use t->state.

In session_expire_embryonic(), don't use t->state, use the "state" argument
instead, as t->state has been cleaned before we're being called.
---
 src/session.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/session.c b/src/session.c
index 1d66b739f..d7d8544c7 100644
--- a/src/session.c
+++ b/src/session.c
@@ -388,7 +388,7 @@ static struct task *session_expire_embryonic(struct task *t, void *context, unsigned short state)
 {
struct session *sess = context;
 
-   if (!(t->state & TASK_WOKEN_TIMER))
+   if (!(state & TASK_WOKEN_TIMER))
return t;
 
session_kill_embryonic(sess);
-- 
2.14.3



Re: 100% cpu usage 1.9-dev0-48d92ee 2018/07/30, task.?. but keeps working.. (nbthread 1)

2018-08-16 Thread Olivier Houchard
Hi Pieter,

On Thu, Aug 16, 2018 at 12:24:04AM +0200, PiBa-NL wrote:
> Hi List,
> 
> Anyone got an idea how to debug this further?
> Currently its running at 100% again, any pointers to debug the process as
> its running would be appreciated.
> 
> Or should i compile again from current master and 'hope' it doesn't return?
> 
> b.t.w. truss output is as follows:
> kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> 
> Regards,
> PiBa-NL (Pieter)

I'm interested in figuring that one out.
From the look of it, it seems the scheduler thinks there's a task to be run,
and so won't let the poller sleep in kevent().
Can you
- update to the latest master, even though I don't think any relevant fix
was applied since Jul 30.
- compile it with -O0, so that we can get meaningful information from gdb.
- when/if that happens again, get a core and send it to me with the
haproxy binary, assuming there's no confidential information in that core,
of course.

Thanks !

Olivier



Re: Issue with TCP splicing

2018-07-25 Thread Olivier Houchard
On Wed, Jul 25, 2018 at 09:02:24AM -0400, Julien Semaan wrote:
> Hi Olivier,
> 
> Thanks for the time you're taking to check the issue.
> 
> I'll get an environment back with TCP splicing enabled and I'll run it in
> GDB and provide you a core dump
> 

That would be great, thank you !

Olivier



Re: Issue with TCP splicing

2018-07-25 Thread Olivier Houchard
Hi Julien,

On Tue, Jul 24, 2018 at 01:29:49PM -0400, Julien Semaan wrote:
> > Sorry, that was a "can" that really meant "can't" :) I can't reproduce it.
>     Aw well, I was surprised it was so easy :)
> 

yea, that would be too easy :)

> > Can you try to upgrade to 1.8.12 ? A number of bugs have been fixed since
>     I did try the upgrade to 1.8.12, got the same results (segfault)
> although I wasn't able to confirm it did segfault in the TCP splicing.
> 
> > What kind of load do you have when it segfaults?
>     Far from enormous, maximum 10 requests per second, but as I said in my
> first post, the amount of TCP retransmissions and resets is very large due
> to the fact we're black-holing the traffic since we use haproxy for our
> captive portal
>     I'd be happy to provide a pcap but for privacy reasons I can't extract
> it from a production environment and I can't see to replicate it in lab.
> 

I understand you don't want to send that kind of data.
I definitively can't seem to reproduce it, with a configuration very similar
to yours, including using netfilter to drop random packets to try to match
your setup as best as possible.
I'm afraid unless we're able to reproduce it, or you at least get a core,
it'll be very difficult to debug.

Regards,

Olivier



Re: Issue with TCP splicing

2018-07-24 Thread Olivier Houchard
Hi Julien,

On Tue, Jul 24, 2018 at 12:58:27PM -0400, Julien Semaan wrote:
> Hi Olivier,
> 
> Glad you're able to replicate it because I can't get it to happen
> consistently!
> I'd be happy if you could share the details of how it could be replicated if
> that's not too complex or hard to explain via email.

Sorry, that was a "can" that really meant "can't" :) I can't reproduce it.

> 
> Anyway, attached to this email, you'll find the haproxy configuration that
> has gotten the issue.
> Also, a lua script we do use and that is referenced in the configuration.
> 
> This is running on CentOS 7.5.1804 with kernel 3.10.0-862.6.3.el7.x86_64
> 
> It is running through systemd, unit file attached to this email as well.
> 
> Let me know if you need more infos, and I'll be glad to provide them or if
> you have a pcap I can replay to generate this issue in our lab.
> 
> Best Regards,
> 

Thanks a lot for the informations !
I'm now pretty sure it's not directly related to splicing; it's more likely
some memory corruption.
Can you try to upgrade to 1.8.12 ? A number of bugs have been fixed since
1.8.9.
What kind of load do you have when it segfaults?

Thanks !

Olivier



Re: Issue with TCP splicing

2018-07-24 Thread Olivier Houchard
Hi Julien,

On Mon, Jul 23, 2018 at 09:07:32AM -0400, Julien Semaan wrote:
> Hi all,
> 
> We're currently using haproxy in our project PacketFence
> (https://packetfence.org) and are currently experiencing an issue with
> haproxy segfaulting when TCP splicing is enabled.
> 
> We're currently running version 1.8.9 and are occasionally getting segfaults
> on this specific line in stream.c (line 2131):
> (objt_cs(si_b->end) && __objt_cs(si_b->end)->conn->xprt &&
> __objt_cs(si_b->end)->conn->xprt->snd_pipe) &&
> 
> I wasn't too bright when I found it through gdb and forgot to copy the
> backtrace, so I'm hoping that the issue can be found with this limited
> information.
> 
> After commenting out the code for TCP splicing with the patch attached to
> the email, then the issue stopped happening.
> 
> Best Regards,
> 

I can seem to reproduce this.
Care to share your configuration and your setup ?

Thanks !

Olivier



[PATCH] MINOR: server: Don't make "server" in frontend fatal.

2018-07-24 Thread Olivier Houchard
Hi,

Right now, when we have "server", "default-server", or "server-template"
in a frontend, we warn about it being ignored, only to be considered fatal
later.
That sounds a bit silly, so the attached patch makes it non-fatal.

Regards,

Olivier
From 9d2ab5b57dd4d14bce82923cb9b35bb74ac642bb Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Tue, 24 Jul 2018 16:48:59 +0200
Subject: [PATCH] BUG/MINOR: servers: Don't make "server" in a frontend fatal.

When parsing the configuration, if "server", "default-server" or
"server-template" are found in a frontend, we first warn that it will be
ignored, only to be considered a fatal error later. Be true to our word, and
just ignore it.

This should be backported to 1.8 and 1.7.
---
 src/server.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/server.c b/src/server.c
index d96edc77a..4498fd878 100644
--- a/src/server.c
+++ b/src/server.c
@@ -1937,7 +1937,7 @@ int parse_server(const char *file, int linenum, char **args, struct proxy *curproxy, struct proxy *defproxy)
goto out;
}
else if (warnifnotcap(curproxy, PR_CAP_BE, file, linenum, 
args[0], NULL))
-   err_code |= ERR_ALERT | ERR_FATAL;
+   err_code |= ERR_WARN;
 
/* There is no mandatory first arguments for default server. */
if (srv) {
-- 
2.14.3



Re: Building HAProxy 1.8 fails on Solaris

2018-07-20 Thread Olivier Houchard
Hi,

On Sat, Jul 21, 2018 at 12:51:53AM +0200, Lukas Tribus wrote:
> Hello,
> 
> On Fri, 20 Jul 2018 at 15:58, Olivier Houchard  wrote:
> >
> > Hi LuKas,
> >
> > On Fri, Jul 20, 2018 at 01:53:35PM +0200, Lukas Tribus wrote:
> > > Hello Oliver,
> > >
> > > On Fri, 20 Jul 2018 at 11:55, Olivier Houchard 
> > > wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Fri, Jul 20, 2018 at 12:22:20AM +, Thrawn wrote:
> > > > >  So...is there a way to adapt this patch so it won't cause random SSL
> > > errors and is suitable to apply to the trunk? We don't really want to run 
> > > a
> > > customised build in production...
> > > >
> > > > You don't need the patch, just using USE_PTHREAD_PSHARED=yes should be
> > > enough.
> > >
> > > I don't really understand, are you saying that the spinlock introduced
> > > in cd1a526a doesn't work properly (as in: causes random SSL errors)? How
> > > does this work on FreeBSD and OpenBSD? This sounds like a supported
> > > configuration on a supported OS causes random SSL errors when in
> > > multiprocess mode, but I imagine I got something wrong here.
> > >
> > > Please help me understand this issue.
> > >
> >
> > No, it works fine, but if you compile without USE_THREADS, the HA_ATOMIC*
> > macros I used in my patch are in fact not atomic at all, so that may cause
> > random and unpredictable failures if the SSL cache code uses them.
> 
> Ok, I guess my question is, how can we improve this so that SSL cache
> doesn't unpredictably fail?
> Threading is a huge feature, but behaving unpredictably when threading
> is disabled (or unsupported on the specific OS) is bad.
> 
> Can't the code handle the SSL cache like it did before threading was
> introduced, when threading is disabled?
> 
> 

My patch won't be applied, so it'll behave the exact same way it used to.
Reading the 1.6 code, the calls to __sync_lock* were already there, so
it probably did not compile on Solaris with a gcc older than 4.1, unless
USE_PTHREAD_PSHARED was defined.

Regards,

Olivier



Re: Building HAProxy 1.8 fails on Solaris

2018-07-20 Thread Olivier Houchard
Hi LuKas,

On Fri, Jul 20, 2018 at 01:53:35PM +0200, Lukas Tribus wrote:
> Hello Oliver,
> 
> On Fri, 20 Jul 2018 at 11:55, Olivier Houchard 
> wrote:
> >
> > Hi,
> >
> > On Fri, Jul 20, 2018 at 12:22:20AM +, Thrawn wrote:
> > >  So...is there a way to adapt this patch so it won't cause random SSL
> errors and is suitable to apply to the trunk? We don't really want to run a
> customised build in production...
> >
> > You don't need the patch, just using USE_PTHREAD_PSHARED=yes should be
> enough.
> 
> I don't really understand, are you saying that the spinlock introduced
> in cd1a526a doesn't work properly (as in: causes random SSL errors)? How
> does this work on FreeBSD and OpenBSD? This sounds like a supported
> configuration on a supported OS causes random SSL errors when in
> multiprocess mode, but I imagine I got something wrong here.
> 
> Please help me understand this issue.
> 

No, it works fine, but if you compile without USE_THREADS, the HA_ATOMIC*
macros I used in my patch are in fact not atomic at all, so that may cause
random and unpredictable failures if the SSL cache code uses them.

Regards,

Olivier



Re: Building HAProxy 1.8 fails on Solaris

2018-07-20 Thread Olivier Houchard
Hi,

On Fri, Jul 20, 2018 at 12:22:20AM +, Thrawn wrote:
>  So...is there a way to adapt this patch so it won't cause random SSL errors 
> and is suitable to apply to the trunk? We don't really want to run a 
> customised build in production...

You don't need the patch, just using USE_PTHREAD_PSHARED=yes should be enough.

Regards,

Olivier

> On Wednesday, 18 July 2018, 10:45:38 pm AEST, Olivier Houchard 
>  wrote:  
>  
>  Hi,
> 
> On Wed, Jul 18, 2018 at 02:17:59AM +, Thrawn wrote:
> > Mea culpa, I applied the patch incorrectly. After fixing that, I can 
> > successfully build with 'USE_THREAD=' but without 'USE_PTHREAD_PSHARED=yes' 
> > (although from what Olivier said, I probably shouldn't do that).  On 
> > Wednesday, 18 July 2018, 12:10:57 pm AEST, Thrawn 
> >  wrote:  
> 
> Yeah the patch may get you to experience random errors if you share a SSL
> cache across multiple process, USE_PTHREAD_PSHARED=yes should build as well,
> and should do the right thing.
> 
> Regards,
> 
> Olivier
>   



Re: Building HAProxy 1.8 fails on Solaris

2018-07-18 Thread Olivier Houchard
Hi,

On Wed, Jul 18, 2018 at 02:17:59AM +, Thrawn wrote:
> Mea culpa, I applied the patch incorrectly. After fixing that, I can 
> successfully build with 'USE_THREAD=' but without 'USE_PTHREAD_PSHARED=yes' 
> (although from what Olivier said, I probably shouldn't do that).   On 
> Wednesday, 18 July 2018, 12:10:57 pm AEST, Thrawn 
>  wrote:  

Yeah the patch may get you to experience random errors if you share a SSL
cache across multiple process, USE_PTHREAD_PSHARED=yes should build as well,
and should do the right thing.

Regards,

Olivier



Re: Building HAProxy 1.8 fails on Solaris

2018-07-17 Thread Olivier Houchard
Hi again,

On Tue, Jul 17, 2018 at 01:55:33PM +0200, Olivier Houchard wrote:
> Hi Lukas,
> 
> On Tue, Jul 17, 2018 at 01:08:39PM +0200, Lukas Tribus wrote:
> > On Tue, 17 Jul 2018 at 01:09, Thrawn  
> > wrote:
> > >
> > > Ah, indeed, the GCC version provided on our server is 3.4.3. But the 
> > > readme on https://github.com/haproxy/haproxy says "GCC between 2.95 and 
> > > 4.8". Can the build be changed to continue supporting older GCC, or do 
> > > the docs need an update?
> > 
> > Like I said earlier, "make clean" before compiling with different
> > parameters, like USE_THREAD=
> > 
> > Haproxy 1.8 is supposed to build fine with gcc 3 when disabling
> > threading, but if you just compiled with threads enabled, you do need
> > to "make clean":
> > 
> > 
> > > I think your gcc is just too old. Those appeared around gcc 4.1 or so.
> > 
> > With USE_THREAD= it is supposed to build fine. While threading will
> > not work with gcc 3, we did not drop support for gcc3 altogether.
> > 
> > 
> 
> Unfortunately it is not true. __sync_* was used in include/proto/shctx.h.
> The attached patch uses the haproxy macros instead, and so should get it
> to compile again with older gcc. Thrawn, can you please test it ?
> 

After talking with Willy a bit, we realized this patch might not work if
using a cache shared across multiple process.
Can you just add USE_PTHREAD_PSHARED=yes on your command line, it should do
the trick ?

Thanks !

Olivier



Re: Building HAProxy 1.8 fails on Solaris

2018-07-17 Thread Olivier Houchard
Hi Lukas,

On Tue, Jul 17, 2018 at 01:08:39PM +0200, Lukas Tribus wrote:
> On Tue, 17 Jul 2018 at 01:09, Thrawn  wrote:
> >
> > Ah, indeed, the GCC version provided on our server is 3.4.3. But the readme 
> > on https://github.com/haproxy/haproxy says "GCC between 2.95 and 4.8". Can 
> > the build be changed to continue supporting older GCC, or do the docs need 
> > an update?
> 
> Like I said earlier, "make clean" before compiling with different
> parameters, like USE_THREAD=
> 
> Haproxy 1.8 is supposed to build fine with gcc 3 when disabling
> threading, but if you just compiled with threads enabled, you do need
> to "make clean":
> 
> 
> > I think your gcc is just too old. Those appeared around gcc 4.1 or so.
> 
> With USE_THREAD= it is supposed to build fine. While threading will
> not work with gcc 3, we did not drop support for gcc3 altogether.
> 
> 

Unfortunately it is not true. __sync_* was used in include/proto/shctx.h.
The attached patch uses the haproxy macros instead, and so should get it
to compile again with older gcc. Thrawn, can you please test it ?

Thanks !

Olivier
From 91546285ceed74516f42a9e98fac14a5094133e0 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Tue, 17 Jul 2018 13:52:19 +0200
Subject: [PATCH] MINOR: shctx: Use HA_ATOMIC_* instead of using the gcc
 builtins.

When implementing atomic operations, use the HA_ATOMIC macros provided by
hathreads.h instead of the (old) gcc builtins. That way it may use the
newer builtins, or compile with an old gcc that doesn't provide any
atomic builtins.
---
 include/proto/shctx.h | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/proto/shctx.h b/include/proto/shctx.h
index 55cb2a77..4eed582b 100644
--- a/include/proto/shctx.h
+++ b/include/proto/shctx.h
@@ -132,17 +132,18 @@ static inline unsigned char atomic_dec(unsigned int *ptr)
 #else /* if no x86_64 or i586 arch: use less optimized gcc >= 4.1 built-ins */
 static inline unsigned int xchg(unsigned int *ptr, unsigned int x)
 {
-   return __sync_lock_test_and_set(ptr, x);
+   return HA_ATOMIC_XCHG(ptr, x);
 }
 
 static inline unsigned int cmpxchg(unsigned int *ptr, unsigned int old, 
unsigned int new)
 {
-   return __sync_val_compare_and_swap(ptr, old, new);
HA_ATOMIC_CAS(ptr, &old, new);
+   return old;
 }
 
 static inline unsigned char atomic_dec(unsigned int *ptr)
 {
-   return __sync_sub_and_fetch(ptr, 1) ? 1 : 0;
+   return HA_ATOMIC_SUB(ptr, 1) ? 1 : 0;
 }
 
 #endif
-- 
2.14.3



Re: Building HAProxy 1.8 fails on Solaris

2018-07-16 Thread Olivier Houchard
Hi,

On Mon, Jul 16, 2018 at 01:12:18AM +, Thrawn wrote:
>  Update: If I disable threading with
> USE_THREAD=
> then the build gets much further, but still fails eventually with:
> gcc  -g -o haproxy src/ev_poll.o ebtree/ebtree.o ebtree/eb32sctree.o 
> ebtree/eb32tree.o ebtree/eb64tree.o ebtree/ebmbtree.o ebtree/ebsttree.o 
> ebtree/ebimtree.o ebtree/ebistree.o src/proto_http.o src/cfgparse.o 
> src/server.o src/stream.o src/flt_spoe.o src/stick_table.o src/stats.o 
> src/mux_h2.o src/checks.o src/haproxy.o src/log.o src/dns.o src/peers.o 
> src/standard.o src/sample.o src/cli.o src/stream_interface.o src/proto_tcp.o 
> src/backend.o src/proxy.o src/tcp_rules.o src/listener.o src/flt_http_comp.o 
> src/pattern.o src/cache.o src/filters.o src/vars.o src/acl.o src/payload.o 
> src/connection.o src/raw_sock.o src/proto_uxst.o src/flt_trace.o 
> src/session.o src/ev_select.o src/channel.o src/task.o src/queue.o 
> src/applet.o src/map.o src/frontend.o src/freq_ctr.o src/lb_fwlc.o 
> src/mux_pt.o src/auth.o src/fd.o src/hpack-dec.o src/memory.o src/lb_fwrr.o 
> src/lb_chash.o src/lb_fas.o src/hathreads.o src/chunk.o src/lb_map.o 
> src/xxhash.o src/regex.o src/shctx.o src/buffer.o src/action.o src/h1.o 
> src/compression.o src/pipe.o src/namespace.o src/sha1.o src/hpack-tbl.o 
> src/hpack-enc.o src/uri_auth.o src/time.o src/proto_udp.o src/arg.o 
> src/signal.o src/protocol.o src/lru.o src/hdr_idx.o src/hpack-huff.o 
> src/mailers.o src/h2.o src/base64.o src/hash.o -lnsl -lsocket  -lcrypt 
> -L../pcre/lib -Wl,-Bstatic -lpcreposix -lpcre -Wl,-Bdynamic
> Undefined                       first referenced
>  symbol                             in file
> __sync_sub_and_fetch                src/cache.o
> __sync_val_compare_and_swap         src/cache.o
> __sync_lock_test_and_set            src/cache.o
> ld: fatal: symbol referencing errors. No output written to haproxy
> collect2: ld returned 1 exit status
> gmake: *** [haproxy] Error 1

I think your gcc is just too old. Those appeared around gcc 4.1 or so.

Regards,

Olivier



[PATCHES] Fix a few shortcomings in the tasklet code

2018-06-14 Thread Olivier Houchard
Hi,

Attached are 2 patches that fix a few bugs in the tasklet code.
It should have little incidence right now because tasklets are unused, but
will be useful for later work.

Regards,

Olivier
From fd2838a8b4eae2d9801592889285ae221fc3a7cb Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Fri, 8 Jun 2018 17:08:19 +0200
Subject: [PATCH 1/2] MINOR: tasks: Make sure we correctly init and deinit a
 tasklet.

Up until now, a tasklet couldn't be free'd while it was in the list; that is
no longer the case, so make sure we remove it from the list before freeing it.
To do so, we have to make sure we correctly initialize it, so use LIST_INIT
instead of setting the pointers to NULL.
---
 include/proto/task.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/proto/task.h b/include/proto/task.h
index 760b368b..47f74d4e 100644
--- a/include/proto/task.h
+++ b/include/proto/task.h
@@ -229,6 +229,7 @@ static inline void task_insert_into_tasklet_list(struct task *t)
 static inline void task_remove_from_task_list(struct task *t)
 {
LIST_DEL(&((struct tasklet *)t)->list);
+   LIST_INIT(&((struct tasklet *)t)->list);
task_list_size[tid]--;
HA_ATOMIC_SUB(&tasks_run_queue, 1);
if (!TASK_IS_TASKLET(t)) {
@@ -270,7 +271,7 @@ static inline void tasklet_init(struct tasklet *t)
t->nice = -32768;
t->calls = 0;
t->state = 0;
-   t->list.p = t->list.n = NULL;
+   LIST_INIT(&t->list);
 }
 
 static inline struct tasklet *tasklet_new(void)
@@ -321,9 +322,10 @@ static inline void task_free(struct task *t)
t->process = NULL;
 }
 
-
 static inline void tasklet_free(struct tasklet *tl)
 {
+   LIST_DEL(&tl->list);
+
pool_free(pool_head_tasklet, tl);
if (unlikely(stopping))
pool_flush(pool_head_tasklet);
-- 
2.14.3

From e09a118d1a7b120e4529f56c2ba4458f5059f5ec Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Thu, 14 Jun 2018 15:40:47 +0200
Subject: [PATCH 2/2] BUG/MINOR: tasklets: Just make sure we don't pass a
 tasklet to the handler.

We can't just set t to NULL if it's a tasklet, or we'd have a hard time
accessing t->process, so just make sure we pass NULL as the first parameter
of t->process if it's a tasklet.
This should be a non-issue at this point, as tasklets aren't used yet.
---
 src/task.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/src/task.c b/src/task.c
index fb484073..815df24c 100644
--- a/src/task.c
+++ b/src/task.c
@@ -342,13 +342,11 @@ void process_runnable_tasks()
rqueue_size[tid]--;
t->calls++;
curr_task = (struct task *)t;
-   if (TASK_IS_TASKLET(t))
-   t = NULL;
if (likely(process == process_stream))
t = process_stream(t, ctx, state);
else {
if (t->process != NULL)
-   t = process(t, ctx, state);
+   t = process(TASK_IS_TASKLET(t) ? NULL : t, ctx, state);
else {
__task_free(t);
t = NULL;
-- 
2.14.3



Re: haproxy-1.8.8 seamless reloads failing with abns@ sockets

2018-06-07 Thread Olivier Houchard
Hi Willy,

On Thu, Jun 07, 2018 at 11:45:39AM +0200, Willy Tarreau wrote:
> Hi Olivier,
> 
> On Wed, Jun 06, 2018 at 06:40:05PM +0200, Olivier Houchard wrote:
> > You're right indeed, that code was not written with abns sockets in mind.
> > The attached patch should fix it. It was created from master, but should
> > apply to 1.8 as well.
> > 
> > Thanks !
> > 
> > Olivier
> 
> > From 3ba0fbb7c9e854aafb8a6b98482ad7d23bbb414d Mon Sep 17 00:00:00 2001
> > From: Olivier Houchard 
> > Date: Wed, 6 Jun 2018 18:34:34 +0200
> > Subject: [PATCH] MINOR: unix: Make sure we can transfer abns sockets as 
> > well  on seamless reload.
> 
> Would you be so kind as to tag it "BUG" so that our beloved stable
> team catches it for the next 1.8 ? ;-)
> 

Sir yes sir.

> > diff --git a/src/proto_uxst.c b/src/proto_uxst.c
> > index 9fc50dff4..a1da337fe 100644
> > --- a/src/proto_uxst.c
> > +++ b/src/proto_uxst.c
> > @@ -146,7 +146,12 @@ static int uxst_find_compatible_fd(struct listener *l)
> > after_sockname++;
> > if (!strcmp(after_sockname, ".tmp"))
> > break;
> > -   }
> > +   /* abns sockets sun_path starts with a \0 */
> > +   } else if (un1->sun_path[0] == 0
> > +   && un2->sun_path[0] == 0
> > +   && !strncmp(&un1->sun_path[1], &un2->sun_path[1],
> > +   sizeof(un1->sun_path) - 1))
> > +   break;
> 
> It may still randomly fail here because null bytes are explicitly permitted
> in the sun_path. Instead I'd suggest this :
> 
>   } else if (un1->sun_path[0] == 0 &&
>  memcmp(un1->sun_path, un2->sun_path, sizeof(un1->sun_path)) == 0)
> 
> Jarno, if you still notice occasional failures, please try with this.
> 

You're right, as unlikely as it can be in our current scenario, better safe
than sorry.
The attached patch is updated to reflect that.

Regards,

Olivier
>From b6c8bd3102abcf1bb1660429b9b737fcd7a60b61 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Wed, 6 Jun 2018 18:34:34 +0200
Subject: [PATCH] BUG/MINOR: unix: Make sure we can transfer abns sockets on
 seamless reload.

When checking if a socket we got from the parent is suitable for a listener,
we just checked that the path matched sockname.tmp; however, this is
unsuitable for abns sockets, where we don't have to create a temporary
file and rename it later.
To detect that, check that the first character of the sun_path is 0 for
both, and if so, that &sun_path[1] is the same too.

This should be backported to 1.8.
---
 src/proto_uxst.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/proto_uxst.c b/src/proto_uxst.c
index 9fc50dff4..ab788bde7 100644
--- a/src/proto_uxst.c
+++ b/src/proto_uxst.c
@@ -146,7 +146,12 @@ static int uxst_find_compatible_fd(struct listener *l)
after_sockname++;
if (!strcmp(after_sockname, ".tmp"))
break;
-   }
+   /* abns sockets sun_path starts with a \0 */
+   } else if (un1->sun_path[0] == 0
+   && un2->sun_path[0] == 0
+   && !memcmp(&un1->sun_path[1], &un2->sun_path[1],
+   sizeof(un1->sun_path) - 1))
+   break;
}
xfer_sock = xfer_sock->next;
}
-- 
2.14.3



Re: haproxy-1.8.8 seamless reloads failing with abns@ sockets

2018-06-06 Thread Olivier Houchard
Hi Jarno,

On Sat, May 12, 2018 at 06:04:10PM +0300, Jarno Huuskonen wrote:
> Hi,
> 
> I'm testing 1.8.8(1.8.8-52ec357 snapshot) and seamless reloads
> (expose-fd listeners).
> 
> I'm testing with this config (missing some default timeouts):
> --8<
> global
> stats socket /tmp/stats level admin expose-fd listeners
> 
> defaults
> mode http
> log global
> option httplog
> retries 2
> timeout connect 1500ms
> timeout client  10s
> timeout server  10s
> 
> listen testme
> bind ipv4@127.0.0.1:8080
> server test_abns_server abns@wpproc1 send-proxy-v2
> 
> frontend test_abns
> bind abns@wpproc1 accept-proxy
> http-request deny deny_status 200
> --8<
> 
> Reloads (kill -USR2 $(cat /tmp/haproxy.pid)) are failing:
> "Starting frontend test_abns: cannot listen to socket []"
> (And request to 127.0.0.1:8080 timeout).
> 
> I guess the problem is that on reload, haproxy is trying
> to bind the abns socket again, because (proto_uxst.c) uxst_bind_listener /
> uxst_find_compatible_fd doesn't find existing (the one copied over from
> old process) file descriptor for this abns socket.
> 
> Is uxst_find_compatible_fd only looking for .X.tmp sockets
> and ignoring abns sockets where path starts with \0 ?
> 
> Using unix socket instead of abns socket makes the reload work.
> 

Sorry for the late answer.

You're right indeed, that code was not written with abns sockets in mind.
The attached patch should fix it. It was created from master, but should
apply to 1.8 as well.

Thanks !

Olivier
>From 3ba0fbb7c9e854aafb8a6b98482ad7d23bbb414d Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Wed, 6 Jun 2018 18:34:34 +0200
Subject: [PATCH] MINOR: unix: Make sure we can transfer abns sockets as well
 on seamless reload.

When checking if a socket we got from the parent is suitable for a listener,
we just checked that the path matched sockname.tmp, however this is
unsuitable for abns sockets, where we don't have to create a temporary
file and rename it later.
To detect that, check that the first character of the sun_path is 0 for
both, and if so, that &sun_path[1] is the same too.

This should be backported to 1.8.
---
 src/proto_uxst.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/proto_uxst.c b/src/proto_uxst.c
index 9fc50dff4..a1da337fe 100644
--- a/src/proto_uxst.c
+++ b/src/proto_uxst.c
@@ -146,7 +146,12 @@ static int uxst_find_compatible_fd(struct listener *l)
after_sockname++;
if (!strcmp(after_sockname, ".tmp"))
break;
-   }
+   /* abns sockets sun_path starts with a \0 */
+   } else if (un1->sun_path[0] == 0
+   && un2->sun_path[0] == 0
+   && !strncmp(&un1->sun_path[1], &un2->sun_path[1],
+   sizeof(un1->sun_path) - 1))
+   break;
}
xfer_sock = xfer_sock->next;
}
-- 
2.14.3



Re: haproxy requests hanging since b0bdae7

2018-06-06 Thread Olivier Houchard
On Wed, Jun 06, 2018 at 10:06:30AM -0400, Patrick Hemmer wrote:
> 
> 
> On 2018/6/6 08:24, Olivier Houchard wrote:
> > Hi Willy,
> >
> > On Wed, Jun 06, 2018 at 02:09:01PM +0200, Willy Tarreau wrote:
> >> On Wed, Jun 06, 2018 at 02:04:35PM +0200, Olivier Houchard wrote:
> >>> When building without threads enabled, instead of just using the global
> >>> runqueue, just use the local runqueue associated with the only thread, as
> >>> that's what is now expected for a single thread in 
> >>> process_runnable_tasks().
> >> Just out of curiosity, shouldn't we #ifdef out the global runqueue
> >> definition when running without threads in order to catch such cases
> >> in the future ?
> >>
> > I think this is actually a good idea.
> > My only concern is it adds quite a bit of #ifdef USE_THREAD, see the 
> > attached
> > patch.
> >
> > Regards,
> >
> > Olivier
> With this patch I'm getting:
> 
> include/proto/task.h:138:26: error: use of undeclared identifier 'rqueue'
> 
> With the previous patch, both reported issues are resolved.
> 

Hi Patrick,

The last patch depended on the first one, so without it that failure is
expected.

Thanks a lot for reporting and testing.

Willy, I think you can push both the patches.

Regards,

Olivier



Re: haproxy requests hanging since b0bdae7

2018-06-06 Thread Olivier Houchard
Hi Willy,

On Wed, Jun 06, 2018 at 02:09:01PM +0200, Willy Tarreau wrote:
> On Wed, Jun 06, 2018 at 02:04:35PM +0200, Olivier Houchard wrote:
> > When building without threads enabled, instead of just using the global
> > runqueue, just use the local runqueue associated with the only thread, as
> > that's what is now expected for a single thread in process_runnable_tasks().
> 
> Just out of curiosity, shouldn't we #ifdef out the global runqueue
> definition when running without threads in order to catch such cases
> in the future ?
> 

I think this is actually a good idea.
My only concern is it adds quite a bit of #ifdef USE_THREAD, see the attached
patch.

Regards,

Olivier
>From baeb750ed13307010bfef39de92ec9bb8af54022 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Wed, 6 Jun 2018 14:22:03 +0200
Subject: [PATCH] MINOR: tasks: Don't define rqueue if we're building without
 threads.

To make sure we don't inadvertently insert a task into the global runqueue
while only the local runqueue is used without threads, make its definition
and usage conditional on USE_THREAD.
---
 include/proto/task.h |  2 ++
 src/task.c   | 28 +---
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/include/proto/task.h b/include/proto/task.h
index dc8a54481..266246098 100644
--- a/include/proto/task.h
+++ b/include/proto/task.h
@@ -93,7 +93,9 @@ extern struct pool_head *pool_head_tasklet;
 extern struct pool_head *pool_head_notification;
extern THREAD_LOCAL struct task *curr_task; /* task currently running or NULL */
extern THREAD_LOCAL struct eb32sc_node *rq_next; /* Next task to be potentially run */
+#ifdef USE_THREAD
 extern struct eb_root rqueue;  /* tree constituting the run queue */
+#endif
 extern struct eb_root rqueue_local[MAX_THREADS]; /* tree constituting the per-thread run queue */
extern struct list task_list[MAX_THREADS]; /* List of tasks to be run, mixing tasks and tasklets */
extern int task_list_size[MAX_THREADS]; /* Number of tasks in the task_list */
diff --git a/src/task.c b/src/task.c
index 16c723230..c961725a1 100644
--- a/src/task.c
+++ b/src/task.c
@@ -49,9 +49,11 @@ __decl_hathreads(HA_SPINLOCK_T __attribute__((aligned(64))) rq_lock); /* spin lo
__decl_hathreads(HA_SPINLOCK_T __attribute__((aligned(64))) wq_lock); /* spin lock related to wait queue */
 
static struct eb_root timers;  /* sorted timers tree */
+#ifdef USE_THREAD
struct eb_root rqueue;  /* tree constituting the run queue */
-struct eb_root rqueue_local[MAX_THREADS]; /* tree constituting the per-thread run queue */
static int global_rqueue_size; /* Number of elements in the global runqueue */
+#endif
+struct eb_root rqueue_local[MAX_THREADS]; /* tree constituting the per-thread run queue */
static int rqueue_size[MAX_THREADS]; /* Number of elements in the per-thread run queue */
 static unsigned int rqueue_ticks;  /* insertion count */
 
@@ -68,10 +70,13 @@ void __task_wakeup(struct task *t, struct eb_root *root)
void *expected = NULL;
int *rq_size;
 
+#ifdef USE_THREAD
if (root == &rqueue) {
rq_size = &global_rqueue_size;
HA_SPIN_LOCK(TASK_RQ_LOCK, &rq_lock);
-   } else {
+   } else
+#endif
+   {
int nb = root - &rqueue_local[0];
rq_size = &rqueue_size[nb];
}
@@ -80,8 +85,10 @@ void __task_wakeup(struct task *t, struct eb_root *root)
 */
 redo:
if (unlikely(!HA_ATOMIC_CAS(&t->rq.node.leaf_p, &expected, (void *)0x1))) {
+#ifdef USE_THREAD
if (root == &rqueue)
HA_SPIN_UNLOCK(TASK_RQ_LOCK, &rq_lock);
+#endif
return;
}
/* There's a small race condition, when running a task, the thread
@@ -104,8 +111,10 @@ redo:
state = (volatile unsigned short)(t->state);
if (unlikely(state != 0 && !(state & TASK_RUNNING)))
goto redo;
+#ifdef USE_THREAD
if (root == &rqueue)
HA_SPIN_UNLOCK(TASK_RQ_LOCK, &rq_lock);
+#endif
return;
}
HA_ATOMIC_ADD(&tasks_run_queue, 1);
@@ -124,10 +133,13 @@ redo:
}
 
eb32sc_insert(root, &t->rq, t->thread_mask);
+#ifdef USE_THREAD
if (root == &rqueue) {
global_rqueue_size++;
HA_SPIN_UNLOCK(TASK_RQ_LOCK, &rq_lock);
-   } else {
+   } else
+#endif
+   {
int nb = root - &rqueue_local[0];
 
rqueue_size[nb]++;
@@ -239,7 +251,9 @@ void process_runnable_tasks()
 {
struct task *t;
int max_processed;
+#ifdef USE_THREAD
uint64_t average = 0;
+#endif
 
tasks_run_queue_cur = tasks_run_queue; /* keep a copy for reporting */
nb_tasks_cur = nb_tasks;
@@ -253,6 +267,7 @@ void process_runnable_tasks()
return;
}
 
+#ifdef USE_THREAD
average = tasks_run_queue / global.nbthread;
 

Re: haproxy requests hanging since b0bdae7

2018-06-06 Thread Olivier Houchard
Hi Patrick,

On Tue, Jun 05, 2018 at 05:02:41PM -0400, Patrick Hemmer wrote:
> It seems that commit b0bdae7 has completely broken haproxy for me. When
> I send a request to haproxy, it just sits there. The backend server
> receives nothing, and the client waits for a response.
> Running with debug enabled I see just a single line:
> 00000000:f1.accept(0004)=0005 from [127.0.0.1:63663] ALPN=<none>
> 
> commit b0bdae7b88d53cf8f18af0deab6d4c29ac25b7f9 (refs/bisect/bad)
> Author: Olivier Houchard 
> Date:   Fri May 18 18:45:28 2018 +0200
> 
> MAJOR: tasks: Introduce tasklets.
>
> Introduce tasklets, lightweight tasks. They have no notion of priority,
> they are just run as soon as possible, and will probably be used for I/O
> later.
>
> For the moment they're used to replace the temporary thread-local list
> that was used in the scheduler. The first part of the struct is common
> with tasks so that tasks can be cast to tasklets and queued in this
> list.
> Once a task is in the tasklet list, it has its leaf_p set to 0x1 so that
> it cannot accidently be confused as not in the queue.
>
> Pure tasklets are identifiable by their nice value of -32768 (which is
> normally not possible).
> 
> Issue reproducible with a very simple config:
> 
> defaults
>   mode http
> frontend f1
>   bind :8081
>   default_backend b1
> backend b1
>   server s1 127.0.0.1:8081
> 
> Compiled on OS-X with only a single make variable of TARGET=osx
> Compiler: clang-900.0.39.2
> 
> 

Oops, seems I broke haproxy when built without thread support.
The attached patch should fix both the issues you reported, can you confirm
it ?

Thanks a lot !

Olivier
>From d3c0abb18b44a942dcc7ead072be84f323184d0f Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Wed, 6 Jun 2018 14:01:08 +0200
Subject: [PATCH] BUG/MEDIUM: tasks: Use the local runqueue when building
 without threads.

When building without threads enabled, instead of just using the global
runqueue, just use the local runqueue associated with the only thread, as
that's what is now expected for a single thread in process_runnable_tasks().
This should fix haproxy when built without threads.
---
 include/proto/task.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/proto/task.h b/include/proto/task.h
index 0c2c5f28c..dc8a54481 100644
--- a/include/proto/task.h
+++ b/include/proto/task.h
@@ -133,7 +133,7 @@ static inline void task_wakeup(struct task *t, unsigned int f)
else
root = &rqueue;
 #else
-   struct eb_root *root = &rqueue;
+   struct eb_root *root = &rqueue_local[tid];
 #endif
 
state = HA_ATOMIC_OR(&t->state, f);
-- 
2.14.3



Re: [PATCH]: MINOR :task another explicit cast

2018-06-05 Thread Olivier Houchard
Hi,

On Tue, Jun 05, 2018 at 10:46:34AM +, David CARLIER wrote:
> Hi,
> 
> Did a full rebuild and caught it only.
> 
> Regards.



Oops, thanks a lot David, I hope it'll be the last one :)

Willy, can you please push it ?

Thanks !

Olivier


