Re: segfault in haproxy=1.8.4

2018-03-26 Thread Willy Tarreau
On Mon, Mar 26, 2018 at 08:53:07AM +0300, Максим Куприянов wrote:
> Hi!
> 
> It's been almost 2 weeks since I installed the patch and there have been no
> segfaults since then. It seems that the problem is fixed now. Thank you!

Thank you Maxim for the report, it's much appreciated!

Willy



Re: segfault in haproxy=1.8.4

2018-03-25 Thread Максим Куприянов
Hi!

It's been almost 2 weeks since I installed the patch and there have been no
segfaults since then. It seems that the problem is fixed now. Thank you!

2018-03-19 23:16 GMT+03:00 William Dauchy :

> On Mon, Mar 19, 2018 at 08:41:16PM +0100, Willy Tarreau wrote:
> > For me, "experimental" simply means "we did our best to ensure it works
> > but we're realistic and know that bug-free doesn't exist, so a risk remains
> > that a bug will be hard enough to fix so as to force you to disable the
> > feature for the time it takes to fix it". This issue between threads and
> > queue is one such example. Some of the bugs faced on H2 requiring some
> > heavy changes were other examples. But overall we know these features
> > are in high demand and we are committed to making them work fine :-)
>
> you are right, we probably magnified in our heads the different issues we
> had related to this.
>
> > I'm still interested in knowing if you find crazy last percentile values.
> > We had that a very long time ago (version 1.3 or so) when some pending
> > conns were accidentally skipped, so I know how queues can amplify small
> > issues. The only real risk here in my opinion is that the sync point was
> > only used for health checks till now so it was running at low loads and
> > if it had any issue, it would likely have remained unnoticed. But the code
> > is small enough to be audited, and after re-reading it this afternoon I
> > found it fine.
>
> will do; migrating some low-latency applications is more mid/long-term, but
> we will see how the first results look during the preparation tests.
>
> > If you want to run a quick test with epoll, just apply this dirty hack:
> >
> > diff --git a/src/ev_epoll.c b/src/ev_epoll.c
> > index b98ca8c..7bafd16 100644
> > --- a/src/ev_epoll.c
> > +++ b/src/ev_epoll.c
> > @@ -116,7 +116,9 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
> > fd_nbupdt = 0;
> >
> > /* compute the epoll_wait() timeout */
> > -   if (!exp)
> > +   if (1)
> > +   wait_time = 0;
> > +   else if (!exp)
> > wait_time = MAX_DELAY_MS;
> > else if (tick_is_expired(exp, now_ms)) {
> > activity[tid].poll_exp++;
> >
> > Please note that as is, it's suboptimal because it will increase the
> > contention in other places, causing the performance to be a bit lower in
> > certain situations. I do have some experimental code to loop on epoll
> > instead but it's not completely stable yet. We can exchange on this later
> > if you want. But feel free to apply this to your latency tests.
>
> thanks a lot, will give it a try!
>
> --
> William
>


Re: segfault in haproxy=1.8.4

2018-03-19 Thread William Dauchy
On Mon, Mar 19, 2018 at 08:41:16PM +0100, Willy Tarreau wrote:
> For me, "experimental" simply means "we did our best to ensure it works
> but we're realistic and know that bug-free doesn't exist, so a risk remains
> that a bug will be hard enough to fix so as to force you to disable the
> feature for the time it takes to fix it". This issue between threads and
> queue is one such example. Some of the bugs faced on H2 requiring some
> heavy changes were other examples. But overall we know these features
> are in high demand and we are committed to making them work fine :-)

you are right, we probably magnified in our heads the different issues we
had related to this.

> I'm still interested in knowing if you find crazy last percentile values.
> We had that a very long time ago (version 1.3 or so) when some pending
> conns were accidentally skipped, so I know how queues can amplify small
> issues. The only real risk here in my opinion is that the sync point was
> only used for health checks till now so it was running at low loads and
> if it had any issue, it would likely have remained unnoticed. But the code
> is small enough to be audited, and after re-reading it this afternoon I
> found it fine.

will do; migrating some low-latency applications is more mid/long-term, but
we will see how the first results look during the preparation tests.

> If you want to run a quick test with epoll, just apply this dirty hack:
>
> diff --git a/src/ev_epoll.c b/src/ev_epoll.c
> index b98ca8c..7bafd16 100644
> --- a/src/ev_epoll.c
> +++ b/src/ev_epoll.c
> @@ -116,7 +116,9 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
> fd_nbupdt = 0;
>
> /* compute the epoll_wait() timeout */
> -   if (!exp)
> +   if (1)
> +   wait_time = 0;
> +   else if (!exp)
> wait_time = MAX_DELAY_MS;
> else if (tick_is_expired(exp, now_ms)) {
> activity[tid].poll_exp++;
>
> Please note that as is, it's suboptimal because it will increase the
> contention in other places, causing the performance to be a bit lower in
> certain situations. I do have some experimental code to loop on epoll
> instead but it's not completely stable yet. We can exchange on this later
> if you want. But feel free to apply this to your latency tests.

thanks a lot, will give it a try!

-- 
William



Re: segfault in haproxy=1.8.4

2018-03-19 Thread Willy Tarreau
On Mon, Mar 19, 2018 at 08:28:14PM +0100, William Dauchy wrote:
> On Mon, Mar 19, 2018 at 07:28:16PM +0100, Willy Tarreau wrote:
> > Threading was clearly released with an experimental status, just like
> > H2, because we knew we'd be facing some post-release issues in these
> > two areas that are hard to get 100% right at once. However I consider
> > that the situation has got much better, and to confirm this, both of
> > these are now enabled by default in HapTech's products. With this said,
> > I expect that over time we'll continue to see a few bugs, but not more
> > than what we're seeing in various areas. For example, we haven't had a
> > single issue on haproxy.org since it was updated to 1.8.1 or so,
> > 3 months ago. So this is getting quite good.
> 
> OK, it was not clear to me that this was experimental since it was quite
> widely advertised in several blog posts, but I probably missed
> something.

For me, "experimental" simply means "we did our best to ensure it works
but we're realistic and know that bug-free doesn't exist, so a risk remains
that a bug will be hard enough to fix so as to force you to disable the
feature for the time it takes to fix it". This issue between threads and
queue is one such example. Some of the bugs faced on H2 requiring some
heavy changes were other examples. But overall we know these features
are in high demand and we are committed to making them work fine :-)

> > Also if you're running with nbproc > 1 instead, the maxconn setting is
> > not really respected since it becomes per-process. When you run with
> > 8 processes it doesn't mean much anymore, or you need to have small
> > maxconn settings, implying that sometimes a process might queue some
> > requests while there are available slots in other processes. Thus I'd
> > argue that the threads here significantly improve the situation by
> > allowing all connection slots to be used by all CPUs, which is a real
> > improvement which should theoretically show you lower latencies.
> 
> thanks for these details. We will run some tests on our side as well;
> the commit message made me worried about the last percentile of
> requests which might have crazy numbers sometimes.
> I now better understand we are speaking about 1.75 extra microseconds.

I'm still interested in knowing if you find crazy last percentile values.
We had that a very long time ago (version 1.3 or so) when some pending
conns were accidentally skipped, so I know how queues can amplify small
issues. The only real risk here in my opinion is that the sync point was
only used for health checks till now so it was running at low loads and
if it had any issue, it would likely have remained unnoticed. But the code
is small enough to be audited, and after re-reading it this afternoon I
found it fine.

> > Note that if this is of interest to you, it's trivial to make haproxy
> > run in busy polling mode, and in this case the performance increases to
> > 30900 conn/s, at the expense of eating all your CPU (which possibly you
> > don't care about if the latency is your worst enemy). We can possibly
> > even improve this to ensure that it's done only when there are existing
> > sessions on a given thread. Let me know if this is something that could
> > be of interest to you, as I think we could make this configurable and
> > bypass the sync point in this case.
> 
> It is definitely something of interest to us to make it configurable.
> I will try to have a look as well.

If you want to run a quick test with epoll, just apply this dirty hack:

diff --git a/src/ev_epoll.c b/src/ev_epoll.c
index b98ca8c..7bafd16 100644
--- a/src/ev_epoll.c
+++ b/src/ev_epoll.c
@@ -116,7 +116,9 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
fd_nbupdt = 0;
 
/* compute the epoll_wait() timeout */
-   if (!exp)
+   if (1)
+   wait_time = 0;
+   else if (!exp)
wait_time = MAX_DELAY_MS;
else if (tick_is_expired(exp, now_ms)) {
activity[tid].poll_exp++;

Please note that as is, it's suboptimal because it will increase the
contention in other places, causing the performance to be a bit lower in
certain situations. I do have some experimental code to loop on epoll
instead but it's not completely stable yet. We can exchange on this later
if you want. But feel free to apply this to your latency tests.

> > We noticed a nice performance boost on the last one with many cores
> > (24 threads, something like +40% on connection rate), but we'll probably
> > see even better once the rest is addressed.
> 
> indeed, I remember we spoke about those improvements at the last meetup.
> Nice work, 1.9 already looks interesting from this point of view!

In fact 1.8's target was to get threads working and in a good enough
shape to improve on them. 1.9's target will be to have them even faster :-)

Willy



Re: segfault in haproxy=1.8.4

2018-03-19 Thread William Dauchy
Hi Willy,

Thank your for your detailed answer.

On Mon, Mar 19, 2018 at 07:28:16PM +0100, Willy Tarreau wrote:
> Threading was clearly released with an experimental status, just like
> H2, because we knew we'd be facing some post-release issues in these
> two areas that are hard to get 100% right at once. However I consider
> that the situation has got much better, and to confirm this, both of
> these are now enabled by default in HapTech's products. With this said,
> I expect that over time we'll continue to see a few bugs, but not more
> than what we're seeing in various areas. For example, we haven't had a
> single issue on haproxy.org since it was updated to 1.8.1 or so,
> 3 months ago. So this is getting quite good.

OK, it was not clear to me that this was experimental since it was quite
widely advertised in several blog posts, but I probably missed
something. Thanks for the clarification though.
Since 1.8.1 the only issues we had with nbthread were indeed related
to using it along with seamless reload, but I will give it a second try with
the latest patches you released in the 1.8 tree today.

> I ran a stress test on this patch, with a single server running with
> "maxconn 1", with a frontend bound to two threads. I measure exactly
> 30000 conn/s with a single thread (keep in mind that there's a single
> connection at once), and 28500 with two threads. Thus the sync point
> takes on average an extra 1.75 microseconds, compared to the 35
> microseconds it takes on average to finish processing the request
> (connect, server processing, response, close).
> Also if you're running with nbproc > 1 instead, the maxconn setting is
> not really respected since it becomes per-process. When you run with
> 8 processes it doesn't mean much anymore, or you need to have small
> maxconn settings, implying that sometimes a process might queue some
> requests while there are available slots in other processes. Thus I'd
> argue that the threads here significantly improve the situation by
> allowing all connection slots to be used by all CPUs, which is a real
> improvement which should theoretically show you lower latencies.

thanks for these details. We will run some tests on our side as well;
the commit message made me worried about the last percentile of
requests which might have crazy numbers sometimes.
I now better understand we are speaking about 1.75 extra microseconds.

> Note that if this is of interest to you, it's trivial to make haproxy
> run in busy polling mode, and in this case the performance increases to
> 30900 conn/s, at the expense of eating all your CPU (which possibly you
> don't care about if the latency is your worst enemy). We can possibly
> even improve this to ensure that it's done only when there are existing
> sessions on a given thread. Let me know if this is something that could
> be of interest to you, as I think we could make this configurable and
> bypass the sync point in this case.

It is definitely something of interest to us to make it configurable.
I will try to have a look as well.

> No they're definitely not for 1.8 and still really touchy. We're
> progressively attacking locks wherever we can. Some further patches
> will refine the scheduler to make it more parallel, and even the code
> above will continue to change, see it as a first step in the right
> direction.

understood.

> We noticed a nice performance boost on the last one with many cores
> (24 threads, something like +40% on connection rate), but we'll probably
> see even better once the rest is addressed.

indeed, I remember we spoke about those improvements at the last meetup.
Nice work, 1.9 already looks interesting from this point of view!


Cheers,

-- 
William



Re: segfault in haproxy=1.8.4

2018-03-19 Thread Willy Tarreau
Hi William,

On Mon, Mar 19, 2018 at 06:57:50PM +0100, William Dauchy wrote:
> > However, be careful. This new implementation should be thread-safe
> > (hopefully...). But it is not optimal and in some situations, it could be
> > really slower in multi-threaded mode than in single-threaded mode. The
> > problem is that, when we try to dequeue pending connections, we process
> > them from the oldest one to the newest one independently of the thread's
> > affinity. So we need to wait for the other threads to wake up to really
> > process them. If threads are blocked in the poller, this will add
> > significant latency. This problem happens when maxconn values are very low.
> 
> Regarding this last section, we are a bit worried about the usability
> of the new `nbthread` feature in 1.8. It raised a few questions on our
> side:
> - Is it considered an experimental feature?

Threading was clearly released with an experimental status, just like
H2, because we knew we'd be facing some post-release issues in these
two areas that are hard to get 100% right at once. However I consider
that the situation has got much better, and to confirm this, both of
these are now enabled by default in HapTech's products. With this said,
I expect that over time we'll continue to see a few bugs, but not more
than what we're seeing in various areas. For example, we haven't had a
single issue on haproxy.org since it was updated to 1.8.1 or so,
3 months ago. So this is getting quite good.

> - Should we expect potential latency side effects in some situations,
> as described in your commit message? (and so avoid using it for
> low-latency usage)

I ran a stress test on this patch, with a single server running with
"maxconn 1", with a frontend bound to two threads. I measure exactly
30000 conn/s with a single thread (keep in mind that there's a single
connection at once), and 28500 with two threads. Thus the sync point
takes on average an extra 1.75 microseconds, compared to the 35
microseconds it takes on average to finish processing the request
(connect, server processing, response, close).
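
The 1.75 microsecond figure follows directly from the two connection rates; a
minimal standalone sketch (using the ~30000 conn/s single-threaded and 28500
conn/s dual-threaded figures above as assumptions) that recomputes the delta:

#include <stdio.h>

int main(void)
{
    /* assumed rates taken from the measurements quoted above */
    double without_sync = 1e6 / 30000.0;  /* ~33.33 us per request, 1 thread  */
    double with_sync    = 1e6 / 28500.0;  /* ~35.09 us per request, 2 threads */

    /* extra cost attributed to the sync point */
    printf("sync point overhead: %.2f us per request\n",
           with_sync - without_sync);     /* prints ~1.75 */
    return 0;
}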

Also if you're running with nbproc > 1 instead, the maxconn setting is
not really respected since it becomes per-process. When you run with
8 processes it doesn't mean much anymore, or you need to have small
maxconn settings, implying that sometimes a process might queue some
requests while there are available slots in other processes. Thus I'd
argue that the threads here significantly improve the situation by
allowing all connection slots to be used by all CPUs, which is a real
improvement which should theoretically show you lower latencies.

Note that if this is of interest to you, it's trivial to make haproxy
run in busy polling mode, and in this case the performance increases to
30900 conn/s, at the expense of eating all your CPU (which possibly you
don't care about if the latency is your worst enemy). We can possibly
even improve this to ensure that it's done only when there are existing
sessions on a given thread. Let me know if this is something that could
be of interest to you, as I think we could make this configurable and
bypass the sync point in this case.
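
"Busy polling" here just means calling epoll_wait() with a zero timeout in a
loop instead of sleeping for up to MAX_DELAY_MS, which is exactly what the
wait_time = 0 hack quoted earlier in this thread does. A minimal standalone
illustration of that pattern (not haproxy code):

#include <stdio.h>
#include <sys/epoll.h>

int main(void)
{
    int epfd = epoll_create1(0);
    struct epoll_event events[16];

    if (epfd < 0)
        return 1;

    for (int i = 0; i < 5; i++) {
        /* timeout 0: return immediately even when no fd is ready,
         * burning CPU in exchange for minimal wake-up latency */
        int n = epoll_wait(epfd, events, 16, 0);
        printf("pass %d: %d event(s) ready\n", i, n);
    }
    return 0;
}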

> - I saw some commits for the 1.9 release which probably improve the
> situation regarding lockless threading
> http://git.haproxy.org/?p=haproxy.git;a=commit;h=cf975d46bca2515056a4f55e55fedbbc7b4eda59
> http://git.haproxy.org/?p=haproxy.git;a=commit;h=4815c8cbfe7817939bcac7adc18fd9f86993e4fc
> But I guess they will not be backported to 1.8, right?

No they're definitely not for 1.8 and still really touchy. We're
progressively attacking locks wherever we can. Some further patches
will refine the scheduler to make it more parallel, and even the code
above will continue to change, see it as a first step in the right
direction.

We noticed a nice performance boost on the last one with many cores
(24 threads, something like +40% on connection rate), but we'll probably
see even better once the rest is addressed.

Cheers,
Willy



Re: segfault in haproxy=1.8.4

2018-03-19 Thread William Dauchy
Hi Christopher,

On Thu, Mar 15, 2018 at 04:05:04PM +0100, Christopher Faulet wrote:
> From 91b1349b6a1a64d43cc41e8546ff1d1ce17a8e14 Mon Sep 17 00:00:00 2001
> From: Christopher Faulet 
> Date: Wed, 14 Mar 2018 16:18:06 +0100
> Subject: [PATCH] BUG/MAJOR: threads/queue: Fix thread-safety issues on the
>  queues management

[...]

> However, be careful. This new implementation should be thread-safe
> (hopefully...). But it is not optimal and in some situations, it could be
> really slower in multi-threaded mode than in single-threaded mode. The
> problem is that, when we try to dequeue pending connections, we process
> them from the oldest one to the newest one independently of the thread's
> affinity. So we need to wait for the other threads to wake up to really
> process them. If threads are blocked in the poller, this will add
> significant latency. This problem happens when maxconn values are very low.

Regarding this last section, we are a bit worried about the usability
of the new `nbthread` feature in 1.8. It raised a few questions on our
side:
- Is it considered an experimental feature?
- Should we expect potential latency side effects in some situations,
as described in your commit message? (and so avoid using it for
low-latency usage)
- I saw some commits for the 1.9 release which probably improve the
situation regarding lockless threading
http://git.haproxy.org/?p=haproxy.git;a=commit;h=cf975d46bca2515056a4f55e55fedbbc7b4eda59
http://git.haproxy.org/?p=haproxy.git;a=commit;h=4815c8cbfe7817939bcac7adc18fd9f86993e4fc
But I guess they will not be backported to 1.8, right?

Best,
-- 
William



Re: segfault in haproxy=1.8.4

2018-03-15 Thread Christopher Faulet

On 15/03/2018 at 15:50, Willy Tarreau wrote:

On Thu, Mar 15, 2018 at 02:49:59PM +0100, Christopher Faulet wrote:

When we scan a queue, it is locked. So at your mark, the server queue is
already locked. To remove a pendconn from a queue, we also need to have a
lock on this queue, if it is still linked. Otherwise we can safely remove it;
nobody can fall on the pendconn by scanning the queue.


OK that's exactly the information I was missing, so I think all is good
then!


In fact, without locking the queue (the server one and/or the proxy one), it is
unsafe to loop on it. In the same way, we must lock the queue to remove a
pendconn from it. I will add comments to clarify everything. But I really
think it is thread-safe.


Absolutely. Thanks for the detailed explanation!



Nice, I'm relieved. You made me doubt myself :)
Here is the updated patch.

Thanks
--
Christopher Faulet
From 91b1349b6a1a64d43cc41e8546ff1d1ce17a8e14 Mon Sep 17 00:00:00 2001
From: Christopher Faulet 
Date: Wed, 14 Mar 2018 16:18:06 +0100
Subject: [PATCH] BUG/MAJOR: threads/queue: Fix thread-safety issues on the
 queues management

The management of the server and proxy queues was not thread-safe at
all. First, the accesses to ->pend_pos were not protected. So it was
possible to release it on one thread (for instance because the stream is
released) and to use it at the same time on another one (because we
redispatch pending connections for a server). Second, accessing the stream's
information (flags and target) from just anywhere is forbidden. To be safe,
the stream's state must always be updated in the context of process_stream.

So to fix these issues, the queue module has been refactored. A lock has been
added in the pendconn structure. And now, when we try to dequeue a pending
connection, we start by unlinking it from the server/proxy queue and we wake up
the stream. Then, it is the stream's responsibility to really dequeue it (or
release it). This way, we are sure that only the stream can create and release
its pend_pos field.

However, be careful. This new implementation should be thread-safe
(hopefully...). But it is not optimal and in some situations, it could be really
slower in multi-threaded mode than in single-threaded mode. The problem is that,
when we try to dequeue pending connections, we process them from the oldest one
to the newest one independently of the thread's affinity. So we need to wait for
the other threads to wake up to really process them. If threads are blocked in
the poller, this will add significant latency. This problem happens when maxconn
values are very low.

This patch must be backported in 1.8.
---
 include/common/hathreads.h |   2 +
 include/proto/queue.h  |   1 +
 include/types/queue.h  |  10 +-
 include/types/stream.h |   2 +-
 src/proto_http.c   |   2 -
 src/queue.c| 324 +++--
 src/stream.c   |   3 +-
 7 files changed, 209 insertions(+), 135 deletions(-)

diff --git a/include/common/hathreads.h b/include/common/hathreads.h
index 3da6dba4..19299db7 100644
--- a/include/common/hathreads.h
+++ b/include/common/hathreads.h
@@ -240,6 +240,7 @@ enum lock_label {
 	PIPES_LOCK,
 	START_LOCK,
 	TLSKEYS_REF_LOCK,
+	PENDCONN_LOCK,
 	LOCK_LABELS
 };
 struct lock_stat {
@@ -360,6 +361,7 @@ static inline const char *lock_label(enum lock_label label)
 	case PIPES_LOCK:   return "PIPES";
 	case START_LOCK:   return "START";
 	case TLSKEYS_REF_LOCK: return "TLSKEYS_REF";
+	case PENDCONN_LOCK:return "PENDCONN";
 	case LOCK_LABELS:  break; /* keep compiler happy */
 	};
 	/* only way to come here is consecutive to an internal bug */
diff --git a/include/proto/queue.h b/include/proto/queue.h
index f66d809f..2d4773a0 100644
--- a/include/proto/queue.h
+++ b/include/proto/queue.h
@@ -38,6 +38,7 @@ extern struct pool_head *pool_head_pendconn;
 
 int init_pendconn();
 struct pendconn *pendconn_add(struct stream *strm);
+int pendconn_dequeue(struct stream *strm);
 void pendconn_free(struct pendconn *p);
 void process_srv_queue(struct server *s);
 unsigned int srv_dynamic_maxconn(const struct server *s);
diff --git a/include/types/queue.h b/include/types/queue.h
index 4b354514..42dbbd04 100644
--- a/include/types/queue.h
+++ b/include/types/queue.h
@@ -24,15 +24,19 @@
 
 #include <common/config.h>
 #include <common/mini-clist.h>
+#include <common/hathreads.h>
 
 #include <types/server.h>
 
 struct stream;
 
 struct pendconn {
-	struct list list;		/* chaining ... */
-	struct stream *strm;		/* the stream waiting for a connection */
-	struct server *srv;		/* the server we are waiting for */
+	int            strm_flags;  /* stream flags */
+	struct stream *strm;
+	struct proxy  *px;
+	struct server *srv;         /* the server we are waiting for, may be NULL */
+	struct list    list;        /* next pendconn */
+	__decl_hathreads(HA_SPINLOCK_T lock);
 };
 
 #endif /* _TYPES_QUEUE_H */
diff --git a/include/types/stream.h b/include/types/stream.h
index 227b0ffb..0dbc79f4 100644
--- 

Re: segfault in haproxy=1.8.4

2018-03-15 Thread Willy Tarreau
On Thu, Mar 15, 2018 at 02:49:59PM +0100, Christopher Faulet wrote:
> When we scan a queue, it is locked. So at your mark, the server queue is
> already locked. To remove a pendconn from a queue, we also need to have a
> lock on this queue, if it is still linked. Otherwise we can safely remove it;
> nobody can fall on the pendconn by scanning the queue.

OK that's exactly the information I was missing, so I think all is good
then!

> In fact, without locking the queue (the server one and/or the proxy one), it is
> unsafe to loop on it. In the same way, we must lock the queue to remove a
> pendconn from it. I will add comments to clarify everything. But I really
> think it is thread-safe.

Absolutely. Thanks for the detailed explanation!

Willy



Re: segfault in haproxy=1.8.4

2018-03-15 Thread Christopher Faulet

On 15/03/2018 at 12:16, Willy Tarreau wrote:

+static struct stream *pendconn_process_next_strm(struct server *srv, struct proxy *px)
   {
+   struct pendconn *p = NULL;
+   struct server   *rsrv;
rsrv = srv->track;
if (!rsrv)
rsrv = srv;
+   if (srv->nbpend) {
+   list_for_each_entry(p, &srv->pendconns, list) {


-- mark here --


+   if (!HA_SPIN_TRYLOCK(PENDCONN_LOCK, &p->lock))
+   goto ps_found;
+   }
+   p = NULL;
+   }
+
+  ps_found:
+   if (srv_currently_usable(rsrv) && px->nbpend) {
+   struct pendconn *pp;
+
+   list_for_each_entry(pp, &px->pendconns, list) {
+   /* If the server pendconn is older than the proxy one,
+* we process the server one. */
+   if (p && !tv_islt(&pp->strm->logs.tv_request, &p->strm->logs.tv_request))
+   goto pendconn_found;


Here there is still a race: pp->strm is dereferenced outside the lock,
so the owner could very well decide to disappear in the meantime. I
think this check needs to be moved inside the trylock below. Or another
option (probably simpler and cleaner) would be to add a tv_request
field to the pendconn, and directly use it.


Well, not really. When we access a pendconn in a queue (proxy or server)
we always get a lock on it. On the other side, a stream must release its
pendconn, if any, before being released. To do so, the right queue must be
locked by the stream. So when we manipulate a pendconn in a queue we are
sure that the stream cannot release it at the same time. And if the stream
released it, it cannot be in the queue.

Everything should be thread-safe because I added a lock on the pendconn
_AND_ the pendconn is now only released by the stream itself. These 2
conditions are necessary and sufficient to guarantee thread-safety.


OK that's fine. I think that you should indicate for certain functions
that they must only be called under this or that lock (I remember seeing
one this morning in this situation, I don't remember which one).

But then what prevents a stream from removing itself from the queue at
the mark above? It takes the lock, removes itself, unlocks, but the pendconn
is still seen by the function, the trylock succeeds (on just-freed memory),
the stream is dereferenced again after being freed, and we may even
unlink it twice, immediately corrupting memory. I may be wrong but I
don't see what prevents this case from being possible. If it is, you may
need to use another lock when manipulating this list (likely the server
lock as you initially did), though that could become expensive.



When we scan a queue, it is locked. So at your mark, the server queue is
already locked. To remove a pendconn from a queue, we also need to have
a lock on this queue, if it is still linked. Otherwise we can safely remove
it; nobody can fall on the pendconn by scanning the queue.

In fact, without locking the queue (the server one and/or the proxy one), it is
unsafe to loop on it. In the same way, we must lock the queue to remove a
pendconn from it. I will add comments to clarify everything. But I
really think it is thread-safe.
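
To make that rule concrete, here is a small standalone sketch (simplified
types, not the actual haproxy code) of the scan described above: the list
itself stays locked for the whole walk, and each pendconn is additionally
taken with a trylock, so entries whose owner is busy are simply skipped:

#include <pthread.h>
#include <stdio.h>

struct pendconn {
    pthread_spinlock_t lock;
    struct pendconn *next;
    int id;
};

struct queue {
    pthread_mutex_t lock;         /* protects the list links */
    struct pendconn *head;
};

/* pick the first pendconn we can lock; holding the queue lock guarantees
 * nobody can unlink or free an entry while we are walking the list */
static struct pendconn *pick_next(struct queue *q)
{
    struct pendconn *p, *found = NULL;

    pthread_mutex_lock(&q->lock);
    for (p = q->head; p; p = p->next) {
        if (pthread_spin_trylock(&p->lock) == 0) {
            found = p;            /* returned locked; caller unlocks it */
            break;
        }
    }
    pthread_mutex_unlock(&q->lock);
    return found;
}

int main(void)
{
    struct queue q = { .head = NULL };
    struct pendconn a = { .id = 1 }, b = { .id = 2 };

    pthread_mutex_init(&q.lock, NULL);
    pthread_spin_init(&a.lock, PTHREAD_PROCESS_PRIVATE);
    pthread_spin_init(&b.lock, PTHREAD_PROCESS_PRIVATE);
    a.next = &b;
    q.head = &a;

    struct pendconn *p = pick_next(&q);
    printf("picked pendconn %d\n", p ? p->id : -1);
    if (p)
        pthread_spin_unlock(&p->lock);
    return 0;
}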


--
Christopher Faulet



Re: segfault in haproxy=1.8.4

2018-03-15 Thread Willy Tarreau
On Thu, Mar 15, 2018 at 11:32:41AM +0100, Christopher Faulet wrote:
> I tested with an ugly and really unsafe/buggy hack to migrate the stream and
> its client FD before waking it up, and this problem disappears. So this is
> definitely the right way to handle it. But for now, this could be a problem
> for everyone using maxconn with threads.

Well, it cannot be worse than the current situation! Also, don't worry, as
in general when working with very low maxconn, you face high wake-up rates
because operations are much less batched, so the workloads exhibiting this
case where some tasks are idle waiting for a connection shouldn't be that
common, and that will buy us time to try to address this better.

> > > +static struct stream *pendconn_process_next_strm(struct server *srv, struct proxy *px)
> > >   {
> > > + struct pendconn *p = NULL;
> > > + struct server   *rsrv;
> > >   rsrv = srv->track;
> > >   if (!rsrv)
> > >   rsrv = srv;
> > > + if (srv->nbpend) {
> > > + list_for_each_entry(p, &srv->pendconns, list) {

-- mark here --

> > > + if (!HA_SPIN_TRYLOCK(PENDCONN_LOCK, &p->lock))
> > > + goto ps_found;
> > > + }
> > > + p = NULL;
> > > + }
> > > +
> > > +  ps_found:
> > > + if (srv_currently_usable(rsrv) && px->nbpend) {
> > > + struct pendconn *pp;
> > > +
> > > + list_for_each_entry(pp, &px->pendconns, list) {
> > > + /* If the server pendconn is older than the proxy one,
> > > +  * we process the server one. */
> > > + if (p && !tv_islt(&pp->strm->logs.tv_request, &p->strm->logs.tv_request))
> > > + goto pendconn_found;
> > 
> > Here there is still a race: pp->strm is dereferenced outside the lock,
> > so the owner could very well decide to disappear in the meantime. I
> > think this check needs to be moved inside the trylock below. Or another
> > option (probably simpler and cleaner) would be to add a tv_request
> > field to the pendconn, and directly use it.
> 
> Well, not really. When we access a pendconn in a queue (proxy or server)
> we always get a lock on it. On the other side, a stream must release its
> pendconn, if any, before being released. To do so, the right queue must be
> locked by the stream. So when we manipulate a pendconn in a queue we are
> sure that the stream cannot release it at the same time. And if the stream
> released it, it cannot be in the queue.
> 
> Everything should be thread-safe because I added a lock on the pendconn
> _AND_ the pendconn is now only released by the stream itself. These 2
> conditions are necessary and sufficient to guarantee thread-safety.

OK that's fine. I think that you should indicate for certain functions
that they must only be called under this or that lock (I remember seeing
one this morning in this situation, I don't remember which one).

But then what prevents a stream from removing itself from the queue at
the mark above? It takes the lock, removes itself, unlocks, but the pendconn
is still seen by the function, the trylock succeeds (on just-freed memory),
the stream is dereferenced again after being freed, and we may even
unlink it twice, immediately corrupting memory. I may be wrong but I
don't see what prevents this case from being possible. If it is, you may
need to use another lock when manipulating this list (likely the server
lock as you initially did), though that could become expensive.

OK for the rest :-)

Willy



Re: segfault in haproxy=1.8.4

2018-03-15 Thread Christopher Faulet

On 15/03/2018 at 07:19, Willy Tarreau wrote:

Hi Christopher,

first, thank you for this one, I know how painful it can have been!


Hi Willy,

Thanks for your support :)


On Wed, Mar 14, 2018 at 09:56:19PM +0100, Christopher Faulet wrote:
(...)

But it is not optimal and in some situations, it could be really slower in
multi-threaded mode than in single-threaded mode. The problem is that, when we
try to dequeue pending connections, we process them from the oldest one to the
newest one independently of the thread's affinity. So we need to wait for the
other threads to wake up to really process them. If threads are blocked in the
poller, this will add significant latency. This problem happens when maxconn
values are very low.


This problem is not related to your patch but to the task scheduler, which
we need to improve^Wfix so that it can properly wake up a task on a sleeping
thread. We need to check this with Emeric and Olivier to see how to best
address it in 1.8 and how to ensure that 1.9 is clean regarding this.



I tested with an ugly and really unsafe/buggy hack to migrate the stream 
and its client FD before waking it up, and this problem disappears. So 
this is definitely the right way to handle it. But for now, this could 
be a problem for everyone using maxconn with threads.



I have a few comments on the code below:


@@ -97,44 +76,66 @@ static inline struct pendconn *pendconn_from_px(const struct proxy *px) {
  * The stream is immediately marked as "assigned", and both its <srv> and
  * <srv_conn> are set to <srv>,
   */
-static struct stream *pendconn_get_next_strm(struct server *srv, struct proxy *px)
+static struct stream *pendconn_process_next_strm(struct server *srv, struct proxy *px)
  {
-   struct pendconn *ps, *pp;
-   struct stream *strm;
-   struct server *rsrv;
+   struct pendconn *p = NULL;
+   struct server   *rsrv;
  
  	rsrv = srv->track;

if (!rsrv)
rsrv = srv;
  
-	ps = pendconn_from_srv(srv);

-   pp = pendconn_from_px(px);
-   /* we want to get the definitive pendconn in <ps> */
-   if (!pp || !srv_currently_usable(rsrv)) {
-   if (!ps)
-   return NULL;
-   } else {
-   /* pendconn exists in the proxy queue */
-   if (!ps || tv_islt(&pp->strm->logs.tv_request, &ps->strm->logs.tv_request))
-   ps = pp;
+   if (srv->nbpend) {
+   list_for_each_entry(p, &srv->pendconns, list) {
+   if (!HA_SPIN_TRYLOCK(PENDCONN_LOCK, &p->lock))
+   goto ps_found;
+   }
+   p = NULL;
+   }
+
+  ps_found:
+   if (srv_currently_usable(rsrv) && px->nbpend) {
+   struct pendconn *pp;
+
+   list_for_each_entry(pp, &px->pendconns, list) {
+   /* If the server pendconn is older than the proxy one,
+* we process the server one. */
+   if (p && !tv_islt(&pp->strm->logs.tv_request, &p->strm->logs.tv_request))
+   goto pendconn_found;


Here there is still a race: pp->strm is dereferenced outside the lock,
so the owner could very well decide to disappear in the meantime. I
think this check needs to be moved inside the trylock below. Or another
option (probably simpler and cleaner) would be to add a tv_request
field to the pendconn, and directly use it.


Well, not really. When we access a pendconn in a queue (proxy or server)
we always get a lock on it. On the other side, a stream must release its
pendconn, if any, before being released. To do so, the right queue must be
locked by the stream. So when we manipulate a pendconn in a queue we are
sure that the stream cannot release it at the same time. And if the stream
released it, it cannot be in the queue.

Everything should be thread-safe because I added a lock on the pendconn
_AND_ the pendconn is now only released by the stream itself. These 2
conditions are necessary and sufficient to guarantee thread-safety.
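
For illustration only, a standalone sketch of those two conditions with
simplified types (the names mirror the patch, but this is not the patch
code): the dequeuing side only unlinks the entry and records flags under the
pendconn lock before waking the stream, and the stream side is the only
place where the pendconn is ever freed:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct pendconn {
    pthread_spinlock_t lock;
    int linked;               /* still in the server/proxy queue? */
    int strm_flags;           /* flags to merge back into the stream */
};

struct stream {
    int flags;
    struct pendconn *pend_pos;
};

/* run by whatever thread grants the connection slot: unlink, record flags,
 * then wake the stream; it never frees the pendconn itself */
static void grant_slot(struct pendconn *p)
{
    pthread_spin_lock(&p->lock);
    p->linked = 0;            /* unlinked from the queue */
    p->strm_flags |= 0x2;     /* illustrative value standing in for SF_ASSIGNED */
    pthread_spin_unlock(&p->lock);
    /* a real implementation would wake the owning stream's task here */
}

/* stream side: the only place the pendconn is released */
static int pendconn_dequeue(struct stream *strm)
{
    struct pendconn *p = strm->pend_pos;

    pthread_spin_lock(&p->lock);
    if (p->linked) {          /* still queued: nothing to do yet */
        pthread_spin_unlock(&p->lock);
        return 1;
    }
    strm->flags |= p->strm_flags;
    strm->pend_pos = NULL;
    pthread_spin_unlock(&p->lock);
    free(p);
    return 0;
}

int main(void)
{
    struct stream strm = { .flags = 0 };
    struct pendconn *p = calloc(1, sizeof(*p));

    pthread_spin_init(&p->lock, PTHREAD_PROCESS_PRIVATE);
    p->linked = 1;
    strm.pend_pos = p;

    grant_slot(p);            /* another thread hands the stream a server slot */
    if (pendconn_dequeue(&strm) == 0)
        printf("dequeued, stream flags now %#x\n", strm.flags);
    return 0;
}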



+  pendconn_found:
+   pendconn_unlinked(p);
+   p->strm_flags  = (p->strm_flags & (SF_DIRECT | SF_ADDR_SET));
+   p->strm_flags |= SF_ASSIGNED;


I think that it would be simpler to consider that strm_flags starts with
the stream's initial flags, then only manipulate the ones we're interested
in, and only apply the final mask at the end. This way the enumeration of
all manipulated flags doesn't need to be known at each location. This means
you should only keep SF_ASSIGNED above and drop the line filtering on the
two other flags. See below for the other instances.


Right, I will change that.

[...]


-/*
- * Detaches pending connection <p>, decreases the pending count, and frees
- * the pending connection. The connection might have been queued to a specific
- * server as well as to the proxy. The stream also gets marked unqueued.
+/* Try to dequeue pending connection attached to the stream <strm>. It must
+ * always exists here. If the 

Re: segfault in haproxy=1.8.4

2018-03-15 Thread Willy Tarreau
Hi Christopher,

first, thank you for this one, I know how painful it can have been!

On Wed, Mar 14, 2018 at 09:56:19PM +0100, Christopher Faulet wrote:
(...)
> But it is not optimal and in some situations, it could be really slower in
> multi-threaded mode than in single-threaded mode. The problem is that, when we
> try to dequeue pending connections, we process them from the oldest one to the
> newest one independently of the thread's affinity. So we need to wait for the
> other threads to wake up to really process them. If threads are blocked in the
> poller, this will add significant latency. This problem happens when maxconn
> values are very low.

This problem is not related to your patch but to the task scheduler, which
we need to improve^Wfix so that it can properly wake up a task on a sleeping
thread. We need to check this with Emeric and Olivier to see how to best
address it in 1.8 and how to ensure that 1.9 is clean regarding this.

I have a few comments on the code below:

> @@ -97,44 +76,66 @@ static inline struct pendconn *pendconn_from_px(const struct proxy *px) {
>   * The stream is immediately marked as "assigned", and both its <srv> and
>   * <srv_conn> are set to <srv>,
>   */
> -static struct stream *pendconn_get_next_strm(struct server *srv, struct proxy *px)
> +static struct stream *pendconn_process_next_strm(struct server *srv, struct proxy *px)
>  {
> - struct pendconn *ps, *pp;
> - struct stream *strm;
> - struct server *rsrv;
> + struct pendconn *p = NULL;
> + struct server   *rsrv;
>  
>   rsrv = srv->track;
>   if (!rsrv)
>   rsrv = srv;
>  
> - ps = pendconn_from_srv(srv);
> - pp = pendconn_from_px(px);
> - /* we want to get the definitive pendconn in <ps> */
> - if (!pp || !srv_currently_usable(rsrv)) {
> - if (!ps)
> - return NULL;
> - } else {
> - /* pendconn exists in the proxy queue */
> - if (!ps || tv_islt(&pp->strm->logs.tv_request, &ps->strm->logs.tv_request))
> - ps = pp;
> + if (srv->nbpend) {
> + list_for_each_entry(p, &srv->pendconns, list) {
> + if (!HA_SPIN_TRYLOCK(PENDCONN_LOCK, &p->lock))
> + goto ps_found;
> + }
> + p = NULL;
> + }
> +
> +  ps_found:
> + if (srv_currently_usable(rsrv) && px->nbpend) {
> + struct pendconn *pp;
> +
> + list_for_each_entry(pp, &px->pendconns, list) {
> + /* If the server pendconn is older than the proxy one,
> +  * we process the server one. */
> + if (p && !tv_islt(&pp->strm->logs.tv_request, &p->strm->logs.tv_request))
> + goto pendconn_found;

Here there is still a race: pp->strm is dereferenced outside the lock,
so the owner could very well decide to disappear in the meantime. I
think this check needs to be moved inside the trylock below. Or another
option (probably simpler and cleaner) would be to add a tv_request
field to the pendconn, and directly use it.

> +
> + if (!HA_SPIN_TRYLOCK(PENDCONN_LOCK, &pp->lock)) {
> + /* Let's switch from the server pendconn to the
> +  * proxy pendconn. Don't forget to unlock the
> +  * server pendconn, if any. */
> + if (p)
> + HA_SPIN_UNLOCK(PENDCONN_LOCK, &p->lock);
> + p = pp;
> + goto pendconn_found;
> + }
> + }
>   }
> - strm = ps->strm;
> - __pendconn_free(ps);
>  
> - /* we want to note that the stream has now been assigned a server */
> - strm->flags |= SF_ASSIGNED;
> - strm->target = &srv->obj_type;
> - __stream_add_srv_conn(strm, srv);
> + if (!p)
> + return NULL;
> +
> +  pendconn_found:
> + pendconn_unlinked(p);
> + p->strm_flags  = (p->strm_flags & (SF_DIRECT | SF_ADDR_SET));
> + p->strm_flags |= SF_ASSIGNED;

I think that it would be simpler to consider that strm_flags starts with
the stream's initial flags, then only manipulate the ones we're interested
in, and only apply the final mask at the end. This way the enumeration of
all manipulated flags doesn't need to be known at each location. This means
you should only keep SF_ASSIGNED above and drop the line filtering on the
two other flags. See below for the other instances.
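
A tiny sketch of that idea (illustrative flag values, not haproxy's real
ones): start from the stream's current flags, flip only the bits of
interest, and apply a single mask when copying the result back:

#include <stdio.h>

/* illustrative values only */
#define SF_DIRECT     0x01
#define SF_ASSIGNED   0x02
#define SF_ADDR_SET   0x04
#define QUEUE_MASK    (SF_DIRECT | SF_ASSIGNED | SF_ADDR_SET)

int main(void)
{
    int stream_flags = SF_DIRECT | 0x80;   /* 0x80: some unrelated flag */
    int strm_flags   = stream_flags;       /* start from the initial flags */

    strm_flags |= SF_ASSIGNED;             /* only touch the bit we care about */

    /* copy back with one final mask instead of enumerating the preserved
     * flags at every call site */
    stream_flags = (stream_flags & ~QUEUE_MASK) | (strm_flags & QUEUE_MASK);
    printf("stream flags: %#x\n", stream_flags);   /* prints 0x83 */
    return 0;
}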

> @@ -206,26 +204,28 @@ struct pendconn *pendconn_add(struct stream *strm)
>   */
>  int pendconn_redistribute(struct server *s)
>  {
> - struct pendconn *pc, *pc_bck;
> + struct pendconn *p, *pback;
>   int xferred = 0;
>  
> + /* The REDISP option was specified. We will ignore cookie and force to
> +  * balance or use the dispatcher. */
> + if ((s->proxy->options & (PR_O_REDISP|PR_O_PERSIST)) != PR_O_REDISP)
> + return 

Re: segfault in haproxy=1.8.4

2018-03-14 Thread Максим Куприянов
Hi, Christopher!

Thank you very much for the patch. I'll apply it to my canary host today,
but it will take a week or even more to make sure that no crashes occur.
Either way, I'll write you back.

2018-03-14 23:56 GMT+03:00 Christopher Faulet :

> On 07/03/2018 at 09:58, Christopher Faulet wrote:
>
>> I found thread-safety bugs in the management of pending connections.
>> It is totally fucked up :) It needs to be entirely reworked. I'm on it.
>> I hope to propose a patch this afternoon.
>>
>
> Hi,
>
> Sorry for the lag. This issue was definitely harder to fix than I thought
> at first glance. Here is a proposal to fix the queues management. Could you
> check if it fixes your bug, please?
>
> --
> Christopher Faulet
>


Re: segfault in haproxy=1.8.4

2018-03-14 Thread Christopher Faulet

On 07/03/2018 at 09:58, Christopher Faulet wrote:

I found thread-safety bugs in the management of pending connections.
It is totally fucked up :) It needs to be entirely reworked. I'm on it.
I hope to propose a patch this afternoon.


Hi,

Sorry for the lag. This issue was definitely harder to fix than I
thought at first glance. Here is a proposal to fix the queues management.
Could you check if it fixes your bug, please?


--
Christopher Faulet
From 32926ada6a00c11980182d582a521d3833e36f23 Mon Sep 17 00:00:00 2001
From: Christopher Faulet 
Date: Wed, 14 Mar 2018 16:18:06 +0100
Subject: [PATCH] BUG/MAJOR: thread/queue: Fix thread-safety issues on the
 queues management

The management of the server and proxy queues was not thread-safe at
all. First, the accesses to ->pend_pos were not protected. So it was
possible to release it on one thread (for instance because the stream is
released) and to use it at the same time on another one (because we
redispatch pending connections for a server). Second, accessing the stream's
information (flags and target) from just anywhere is forbidden. To be safe,
the stream's state must always be updated in the context of process_stream.

So to fix these issues, the queue module has been refactored. A lock has been
added in the pendconn structure. And now, when we try to dequeue a pending
connection, we start by unlinking it from the server/proxy queue and we wake up
the stream. Then, it is the stream's responsibility to really dequeue it (or
release it). This way, we are sure that only the stream can create and release
its pend_pos field.

However, be careful. This new implementation should be thread-safe
(hopefully...). But it is not optimal and in some situations, it could be really
slower in multi-threaded mode than in single-threaded mode. The problem is that,
when we try to dequeue pending connections, we process them from the oldest one
to the newest one independently of the thread's affinity. So we need to wait for
the other threads to wake up to really process them. If threads are blocked in
the poller, this will add significant latency. This problem happens when maxconn
values are very low.

This patch must be backported in 1.8.
---
 include/common/hathreads.h |   2 +
 include/proto/queue.h  |   1 +
 include/types/queue.h  |   9 +-
 include/types/stream.h |   2 +-
 src/proto_http.c   |   2 -
 src/queue.c| 283 +++--
 src/stream.c   |   3 +-
 7 files changed, 180 insertions(+), 122 deletions(-)

diff --git a/include/common/hathreads.h b/include/common/hathreads.h
index 3da6dba4..19299db7 100644
--- a/include/common/hathreads.h
+++ b/include/common/hathreads.h
@@ -240,6 +240,7 @@ enum lock_label {
 	PIPES_LOCK,
 	START_LOCK,
 	TLSKEYS_REF_LOCK,
+	PENDCONN_LOCK,
 	LOCK_LABELS
 };
 struct lock_stat {
@@ -360,6 +361,7 @@ static inline const char *lock_label(enum lock_label label)
 	case PIPES_LOCK:   return "PIPES";
 	case START_LOCK:   return "START";
 	case TLSKEYS_REF_LOCK: return "TLSKEYS_REF";
+	case PENDCONN_LOCK:return "PENDCONN";
 	case LOCK_LABELS:  break; /* keep compiler happy */
 	};
 	/* only way to come here is consecutive to an internal bug */
diff --git a/include/proto/queue.h b/include/proto/queue.h
index f66d809f..2d4773a0 100644
--- a/include/proto/queue.h
+++ b/include/proto/queue.h
@@ -38,6 +38,7 @@ extern struct pool_head *pool_head_pendconn;
 
 int init_pendconn();
 struct pendconn *pendconn_add(struct stream *strm);
+int pendconn_dequeue(struct stream *strm);
 void pendconn_free(struct pendconn *p);
 void process_srv_queue(struct server *s);
 unsigned int srv_dynamic_maxconn(const struct server *s);
diff --git a/include/types/queue.h b/include/types/queue.h
index 4b354514..3e7d5a12 100644
--- a/include/types/queue.h
+++ b/include/types/queue.h
@@ -24,15 +24,18 @@
 
 #include <common/config.h>
 #include <common/mini-clist.h>
+#include <common/hathreads.h>
 
 #include <types/server.h>
 
 struct stream;
 
 struct pendconn {
-	struct list list;		/* chaining ... */
-	struct stream *strm;		/* the stream waiting for a connection */
-	struct server *srv;		/* the server we are waiting for */
+	int            strm_flags;  /* stream flags */
+	struct stream *strm;
+	struct server *srv;         /* the server we are waiting for, may be NULL */
+	struct list    list;        /* next pendconn */
+	__decl_hathreads(HA_SPINLOCK_T lock);
 };
 
 #endif /* _TYPES_QUEUE_H */
diff --git a/include/types/stream.h b/include/types/stream.h
index 227b0ffb..0dbc79f4 100644
--- a/include/types/stream.h
+++ b/include/types/stream.h
@@ -124,7 +124,7 @@ struct stream {
 	struct session *sess;   /* the session this stream is attached to */
 
 	struct server *srv_conn;/* stream already has a slot on a server and is not in queue */
-	struct pendconn *pend_pos;  /* if not NULL, points to the position in the pending queue */
+	struct pendconn *pend_pos;  /* if not NULL, points to the pending position in the 

Re: segfault in haproxy=1.8.4

2018-03-05 Thread Willy Tarreau
On Mon, Mar 05, 2018 at 09:19:16PM +0500, Максим Куприянов wrote:
> Hi Willy!
> 
> I have 2 more haproxy servers with exactly the same configuration and load.
> Both have threads compiled in but not enabled in the config (no nbthread). And
> there are no segfaults at all. So I'm sure everything is fine without
> threads.
> Haproxy's config file itself is way too large to find the exact
> proxy section where the fault occurred to tell you about the configuration :(
> But if it could help: we have a few heavily loaded proxy sections with hundreds
> of servers inside, round-robin algo and maxconn=2. Stick tables are
> enabled, but only to calculate average traffic characteristics and use them
> in ACLs, so no real stickiness on backends.
> I still have the core file available, so I can extract some more detailed
> traces if you need them.

OK, thanks for these details, this is already quite valuable information.
Please don't lose the executable file that produced the core, in case we'd
need to dig deeper into it.

Cheers,
Willy



Re: segfault in haproxy=1.8.4

2018-03-05 Thread Максим Куприянов
Hi Willy!

I have 2 more haproxy servers with exactly the same configuration and load.
Both have threads compiled in but not enabled in the config (no nbthread). And
there are no segfaults at all. So I'm sure everything is fine without
threads.
Haproxy's config file itself is way too large to find the exact
proxy section where the fault occurred to tell you about the configuration :(
But if it could help: we have a few heavily loaded proxy sections with hundreds
of servers inside, round-robin algo and maxconn=2. Stick tables are
enabled, but only to calculate average traffic characteristics and use them
in ACLs, so no real stickiness on backends.
I still have the core file available, so I can extract some more detailed
traces if you need them.

Thank you,
Maksim

2018-03-05 18:56 GMT+05:00 Willy Tarreau <w...@1wt.eu>:

> Hi Maksim,
>
> > On Mon, Mar 05, 2018 at 03:08:11PM +0300, Максим Куприянов wrote:
> > Hi!
> >
> > I have a backtrace for a segfault in haproxy=1.8.4 with 4 threads. It
> > usually happens under heavy load. Can you take a look?
> >
> > Using host libthread_db library "/lib/x86_64-linux-gnu/
> libthread_db.so.1".
> > Core was generated by `/usr/sbin/haproxy -f /etc/haproxy/haproxy-market.
> cfg
> > -p /var/run/haproxy-market'.
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0  __pendconn_free (p=0x55d61e970cd8) at src/queue.c:292
> > 292 HA_ATOMIC_SUB(&p->srv->nbpend, 1);
> > (gdb) bt
> > #0  __pendconn_free (p=0x55d61e970cd8) at src/queue.c:292
> > #1  0x55d61be491de in pendconn_get_next_strm (px=0x55d61e96fea0,
> > srv=0x55d61ea3d950) at src/queue.c:122
> > #2  process_srv_queue (s=0x55d61ea3d950) at src/queue.c:153
>
> It looks very likely that this could be directly related to the threads
> indeed, and pendconns can be recycled very quickly, so maybe we're reusing
> one a bit too fast (or maybe we have a very old use-after-free there that
> is magnified by the threads).
>
> Did you have the opportunity to get this one without threads, or conversely
> do you know if it works fine without threads?
>
> I guess you have maxconn enabled on your servers, could you tell us a bit
> more about the way the farm is set up (LB algo, #servers, average maxconn,
> stickiness or not).
>
> Thanks,
> Willy
>


Re: segfault in haproxy=1.8.4

2018-03-05 Thread Willy Tarreau
Hi Maksim,

On Mon, Mar 05, 2018 at 03:08:11PM +0300, Максим Куприянов wrote:
> Hi!
> 
> I have a backtrace for a segfault in haproxy=1.8.4 with 4 threads. It usually
> happens under heavy load. Can you take a look?
> 
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Core was generated by `/usr/sbin/haproxy -f /etc/haproxy/haproxy-market.cfg
> -p /var/run/haproxy-market'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  __pendconn_free (p=0x55d61e970cd8) at src/queue.c:292
> 292 HA_ATOMIC_SUB(&p->srv->nbpend, 1);
> (gdb) bt
> #0  __pendconn_free (p=0x55d61e970cd8) at src/queue.c:292
> #1  0x55d61be491de in pendconn_get_next_strm (px=0x55d61e96fea0,
> srv=0x55d61ea3d950) at src/queue.c:122
> #2  process_srv_queue (s=0x55d61ea3d950) at src/queue.c:153

It looks very likely that this could be directly related to the threads
indeed, and pendconns can be recycled very quickly, so maybe we're reusing
one a bit too fast (or maybe we have a very old use-after-free there that
is magnified by the threads).

Did you have the opportunity to get this one without threads, or conversely
do you know if it works fine without threads?

I guess you have maxconn enabled on your servers, could you tell us a bit
more about the way the farm is set up (LB algo, #servers, average maxconn,
stickiness or not).

Thanks,
Willy



segfault in haproxy=1.8.4

2018-03-05 Thread Максим Куприянов
Hi!

I have a backtrace for a segfault in haproxy=1.8.4 with 4 threads. It usually
happens under heavy load. Can you take a look?

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/haproxy -f /etc/haproxy/haproxy-market.cfg
-p /var/run/haproxy-market'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __pendconn_free (p=0x55d61e970cd8) at src/queue.c:292
292 HA_ATOMIC_SUB(&p->srv->nbpend, 1);
(gdb) bt
#0  __pendconn_free (p=0x55d61e970cd8) at src/queue.c:292
#1  0x55d61be491de in pendconn_get_next_strm (px=0x55d61e96fea0,
srv=0x55d61ea3d950) at src/queue.c:122
#2  process_srv_queue (s=0x55d61ea3d950) at src/queue.c:153
#3  0x55d61bdcfcf6 in stream_free (s=0x7f1a6899a440) at src/stream.c:315
#4  process_stream (t=<optimized out>) at src/stream.c:2513
#5  0x55d61be48aa8 in process_runnable_tasks () at src/task.c:311
#6  0x55d61bdfdc84 in run_poll_loop () at src/haproxy.c:2398
#7  run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2460
#8  0x7f1a7f91d184 in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#9  0x7f1a7eb9b03d in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) quit

root@a.local# haproxy -vv
HA-Proxy version 1.8.4-1 2018/02/08
Copyright 2000-2018 Willy Tarreau <wi...@haproxy.org>

Build options :
  TARGET  = linux2628
  CPU = generic
  CC  = gcc
  CFLAGS  = -g -O2 -fPIE -fstack-protector --param=ssp-buffer-size=4
-Wformat -Werror=format-security -D_FORTIFY_SOURCE=2
  OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_THREAD=1
USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1 USE_PCRE_JIT=1 USE_TFO=1 USE_NS=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
Running on OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with Lua version : Lua 5.3.1
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT
IP_FREEBIND
Encrypted password support via crypt(3): yes
Built with multi-threading support.
Built with PCRE version : 8.31 2012-07-06
Running on PCRE version : 8.31 2012-07-06
PCRE library supports JIT : no (libpcre build without JIT?)
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"),
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with network namespace support.

Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available filters :
[SPOE] spoe
[COMP] compression
[TRACE] trace

--
Best regards
Maksim Kupriianov