Re: segfault in haproxy=1.8.4
On Mon, Mar 26, 2018 at 08:53:07AM +0300, Maksim Kupriianov wrote:
> Hi!
>
> It's been almost 2 weeks since I've installed the patch and there were no
> segfaults since then. It seems that the problem is fixed now. Thank you!

Thank you Maksim for the report, it's much appreciated!

Willy
Re: segfault in haproxy=1.8.4
Hi!

It's been almost 2 weeks since I've installed the patch and there were no
segfaults since then. It seems that the problem is fixed now. Thank you!

2018-03-19 23:16 GMT+03:00 William Dauchy:
> On Mon, Mar 19, 2018 at 08:41:16PM +0100, Willy Tarreau wrote:
> > For me, "experimental" simply means "we did our best to ensure it works
> > but we're realistic and know that bug-free doesn't exist, so a risk
> > remains that a bug will be hard enough to fix so as to force you to
> > disable the feature for the time it takes to fix it". This issue between
> > threads and the queue is one such example. Some of the bugs faced on H2
> > requiring some heavy changes were other examples. But overall we know
> > these features are highly demanded and we are committed to making them
> > work fine :-)
>
> you are right, we probably magnified in our head the different issues we
> had related to this.
>
> > I'm still interested in knowing if you find crazy last-percentile values.
> > We've had that a very long time ago (version 1.3 or so) when some pending
> > conns were accidentally skipped, so I know how queues can amplify small
> > issues. The only real risk here in my opinion is that the sync point was
> > only used for health checks till now, so it was running at low loads,
> > and if it had any issue, it would likely have remained unnoticed. But
> > the code is small enough to be audited, and after re-reading it this
> > afternoon I found it fine.
>
> will do; migrating some low-latency applications is more mid/long-term,
> but we will see how the first results look during the preparation tests.
>
> > If you want to run a quick test with epoll, just apply this dirty hack:
> >
> > diff --git a/src/ev_epoll.c b/src/ev_epoll.c
> > index b98ca8c..7bafd16 100644
> > --- a/src/ev_epoll.c
> > +++ b/src/ev_epoll.c
> > @@ -116,7 +116,9 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
> >  	fd_nbupdt = 0;
> >
> >  	/* compute the epoll_wait() timeout */
> > -	if (!exp)
> > +	if (1)
> > +		wait_time = 0;
> > +	else if (!exp)
> >  		wait_time = MAX_DELAY_MS;
> >  	else if (tick_is_expired(exp, now_ms)) {
> >  		activity[tid].poll_exp++;
> >
> > Please note that as-is, it's suboptimal because it will increase the
> > contention in other places, causing the performance to be a bit lower
> > in certain situations. I do have some experimental code to loop on
> > epoll instead, but it's not completely stable yet. We can exchange on
> > this later if you want. But feel free to apply this to your latency
> > tests.
>
> thanks a lot, will give it a try!
>
> --
> William
Re: segfault in haproxy=1.8.4
On Mon, Mar 19, 2018 at 08:41:16PM +0100, Willy Tarreau wrote:
> For me, "experimental" simply means "we did our best to ensure it works
> but we're realistic and know that bug-free doesn't exist, so a risk
> remains that a bug will be hard enough to fix so as to force you to
> disable the feature for the time it takes to fix it". This issue between
> threads and the queue is one such example. Some of the bugs faced on H2
> requiring some heavy changes were other examples. But overall we know
> these features are highly demanded and we are committed to making them
> work fine :-)

you are right, we probably magnified in our head the different issues we
had related to this.

> I'm still interested in knowing if you find crazy last-percentile values.
> We've had that a very long time ago (version 1.3 or so) when some pending
> conns were accidentally skipped, so I know how queues can amplify small
> issues. The only real risk here in my opinion is that the sync point was
> only used for health checks till now, so it was running at low loads,
> and if it had any issue, it would likely have remained unnoticed. But
> the code is small enough to be audited, and after re-reading it this
> afternoon I found it fine.

will do; migrating some low-latency applications is more mid/long-term,
but we will see how the first results look during the preparation tests.

> If you want to run a quick test with epoll, just apply this dirty hack:
>
> diff --git a/src/ev_epoll.c b/src/ev_epoll.c
> index b98ca8c..7bafd16 100644
> --- a/src/ev_epoll.c
> +++ b/src/ev_epoll.c
> @@ -116,7 +116,9 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
>  	fd_nbupdt = 0;
>
>  	/* compute the epoll_wait() timeout */
> -	if (!exp)
> +	if (1)
> +		wait_time = 0;
> +	else if (!exp)
>  		wait_time = MAX_DELAY_MS;
>  	else if (tick_is_expired(exp, now_ms)) {
>  		activity[tid].poll_exp++;
>
> Please note that as-is, it's suboptimal because it will increase the
> contention in other places, causing the performance to be a bit lower in
> certain situations. I do have some experimental code to loop on epoll
> instead, but it's not completely stable yet. We can exchange on this
> later if you want. But feel free to apply this to your latency tests.

thanks a lot, will give it a try!

--
William
Re: segfault in haproxy=1.8.4
On Mon, Mar 19, 2018 at 08:28:14PM +0100, William Dauchy wrote:
> On Mon, Mar 19, 2018 at 07:28:16PM +0100, Willy Tarreau wrote:
> > Threading was clearly released with an experimental status, just like
> > H2, because we knew we'd be facing some post-release issues in these
> > two areas that are hard to get 100% right at once. However I consider
> > that the situation has got much better, and to confirm this, both of
> > these are now enabled by default in HapTech's products. With this said,
> > I expect that over time we'll continue to see a few bugs, but not more
> > than what we're seeing in various areas. For example, we didn't get a
> > single issue on haproxy.org since it was updated to 1.8.1 or so,
> > 3 months ago. So this is getting quite good.
>
> ok it was not clear to me as being experimental since it was quite
> widely advertised in several blog posts, but I probably missed
> something.

For me, "experimental" simply means "we did our best to ensure it works
but we're realistic and know that bug-free doesn't exist, so a risk
remains that a bug will be hard enough to fix so as to force you to
disable the feature for the time it takes to fix it". This issue between
threads and the queue is one such example. Some of the bugs faced on H2
requiring some heavy changes were other examples. But overall we know
these features are highly demanded and we are committed to making them
work fine :-)

> > Also if you're running with nbproc > 1 instead, the maxconn setting is
> > not really respected since it becomes per-process. When you run with
> > 8 processes it doesn't mean much anymore, or you need to have small
> > maxconn settings, implying that sometimes a process might queue some
> > requests while there are available slots in other processes. Thus I'd
> > argue that the threads here significantly improve the situation by
> > allowing all connection slots to be used by all CPUs, which is a real
> > improvement which should theoretically show you lower latencies.
>
> thanks for these details. We will run some tests on our side as well;
> the commit message made me worry about the last percentile of requests,
> which might show crazy numbers sometimes.
> I now better understand we are speaking about 1.75 extra microseconds.

I'm still interested in knowing if you find crazy last-percentile values.
We've had that a very long time ago (version 1.3 or so) when some pending
conns were accidentally skipped, so I know how queues can amplify small
issues. The only real risk here in my opinion is that the sync point was
only used for health checks till now, so it was running at low loads,
and if it had any issue, it would likely have remained unnoticed. But the
code is small enough to be audited, and after re-reading it this afternoon
I found it fine.

> > Note that if this is of interest to you, it's trivial to make haproxy
> > run in busy polling mode, and in this case the performance increases to
> > 30900 conn/s, at the expense of eating all your CPU (which possibly you
> > don't care about if latency is your worst enemy). We can possibly
> > even improve this to ensure that it's done only when there are existing
> > sessions on a given thread. Let me know if this is something that could
> > be of interest to you, as I think we could make this configurable and
> > bypass the sync point in this case.
>
> It is definitely something interesting for us to make it configurable.
> I will try to have a look as well.

If you want to run a quick test with epoll, just apply this dirty hack:

diff --git a/src/ev_epoll.c b/src/ev_epoll.c
index b98ca8c..7bafd16 100644
--- a/src/ev_epoll.c
+++ b/src/ev_epoll.c
@@ -116,7 +116,9 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
 	fd_nbupdt = 0;

 	/* compute the epoll_wait() timeout */
-	if (!exp)
+	if (1)
+		wait_time = 0;
+	else if (!exp)
 		wait_time = MAX_DELAY_MS;
 	else if (tick_is_expired(exp, now_ms)) {
 		activity[tid].poll_exp++;

Please note that as-is, it's suboptimal because it will increase the
contention in other places, causing the performance to be a bit lower in
certain situations. I do have some experimental code to loop on epoll
instead, but it's not completely stable yet. We can exchange on this later
if you want. But feel free to apply this to your latency tests.

> > We noticed a nice performance boost on the last one with many cores
> > (24 threads, something like +40% on connection rate), but we'll probably
> > see even better once the rest is addressed.
>
> indeed, I remember we spoke about those improvements at the last meetup.
> nice work, 1.9 looks already interesting from this point of view!

In fact 1.8's target was to get threads working and in a good enough shape
to improve on them. 1.9's target will be to have them even faster :-)

Willy
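What the hack above effectively does is turn the event loop into a
busy-polling one: with wait_time forced to 0, epoll_wait() never sleeps,
so a thread notices new work immediately instead of waiting to be woken
up. A minimal standalone sketch of that pattern (illustration only, not
haproxy code; the descriptor registration is left out):

#include <stdio.h>
#include <stdlib.h>
#include <sys/epoll.h>

/* Busy-polling loop: epoll_wait() with a 0 timeout returns immediately
 * whether or not events are pending, trading a fully loaded CPU core
 * for the wakeup latency of a sleeping poller. */
static int busy_poll(int epfd)
{
	struct epoll_event ev[64];

	for (;;) {
		/* timeout 0: never block, just check readiness and loop */
		int n = epoll_wait(epfd, ev, 64, 0);

		if (n < 0)
			return -1; /* real code would inspect errno here */
		for (int i = 0; i < n; i++) {
			/* handle ev[i].data.fd here */
		}
		/* no sleep: this thread now consumes a full CPU core */
	}
}

int main(void)
{
	int epfd = epoll_create1(0);

	if (epfd < 0) {
		perror("epoll_create1");
		return EXIT_FAILURE;
	}
	/* register descriptors with epoll_ctl(epfd, EPOLL_CTL_ADD, ...) */
	return busy_poll(epfd);
}

This is exactly the trade-off described in the thread: latency drops
because nobody ever blocks in the poller, but each polling thread eats
100% of a core even when idle.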
Re: segfault in haproxy=1.8.4
Hi Willy,

Thank you for your detailed answer.

On Mon, Mar 19, 2018 at 07:28:16PM +0100, Willy Tarreau wrote:
> Threading was clearly released with an experimental status, just like
> H2, because we knew we'd be facing some post-release issues in these
> two areas that are hard to get 100% right at once. However I consider
> that the situation has got much better, and to confirm this, both of
> these are now enabled by default in HapTech's products. With this said,
> I expect that over time we'll continue to see a few bugs, but not more
> than what we're seeing in various areas. For example, we didn't get a
> single issue on haproxy.org since it was updated to 1.8.1 or so,
> 3 months ago. So this is getting quite good.

ok it was not clear to me as being experimental since it was quite
widely advertised in several blog posts, but I probably missed
something. Thanks for the clarification though.
Since 1.8.1 the only issues we had with nbthread were indeed related to
using it along with seamless reload, but I will give a second try with
the latest patches you released in the 1.8 tree today.

> I ran a stress test on this patch, with a single server running with
> "maxconn 1", with a frontend bound to two threads. I measure exactly
> 30000 conn/s with a single thread (keep in mind that there's a single
> connection at once), and 28500 with two threads. Thus the sync point
> takes on average an extra 1.75 microseconds, compared to the 35
> microseconds it takes on average to finish processing the request
> (connect, server processing, response, close).

> Also if you're running with nbproc > 1 instead, the maxconn setting is
> not really respected since it becomes per-process. When you run with
> 8 processes it doesn't mean much anymore, or you need to have small
> maxconn settings, implying that sometimes a process might queue some
> requests while there are available slots in other processes. Thus I'd
> argue that the threads here significantly improve the situation by
> allowing all connection slots to be used by all CPUs, which is a real
> improvement which should theoretically show you lower latencies.

thanks for these details. We will run some tests on our side as well;
the commit message made me worry about the last percentile of requests,
which might show crazy numbers sometimes.
I now better understand we are speaking about 1.75 extra microseconds.

> Note that if this is of interest to you, it's trivial to make haproxy
> run in busy polling mode, and in this case the performance increases to
> 30900 conn/s, at the expense of eating all your CPU (which possibly you
> don't care about if latency is your worst enemy). We can possibly
> even improve this to ensure that it's done only when there are existing
> sessions on a given thread. Let me know if this is something that could
> be of interest to you, as I think we could make this configurable and
> bypass the sync point in this case.

It is definitely something interesting for us to make it configurable.
I will try to have a look as well.

> No, they're definitely not for 1.8 and still really touchy. We're
> progressively attacking locks wherever we can. Some further patches
> will refine the scheduler to make it more parallel, and even the code
> above will continue to change; see it as a first step in the right
> direction.

understood.

> We noticed a nice performance boost on the last one with many cores
> (24 threads, something like +40% on connection rate), but we'll probably
> see even better once the rest is addressed.

indeed, I remember we spoke about those improvements at the last meetup.
nice work, 1.9 looks already interesting from this point of view!

Cheers,
--
William
Re: segfault in haproxy=1.8.4
Hi William,

On Mon, Mar 19, 2018 at 06:57:50PM +0100, William Dauchy wrote:
> > However, be careful. This new implementation should be thread-safe
> > (hopefully...). But it is not optimal and in some situations, it could
> > be really slower in multi-threaded mode than in single-threaded one.
> > The problem is that, when we try to dequeue pending connections, we
> > process them from the oldest one to the newest one, independently of
> > the threads' affinity. So we need to wait for the other threads to
> > wake up to really process them. If threads are blocked in the poller,
> > this will add a significant latency. This problem happens when maxconn
> > values are very low.
>
> Regarding this last section, we are a bit worried about the usability
> of the new `nbthread` feature in 1.8. It raised a few questions on our
> side:
> - Is it considered an experimental feature?

Threading was clearly released with an experimental status, just like
H2, because we knew we'd be facing some post-release issues in these
two areas that are hard to get 100% right at once. However I consider
that the situation has got much better, and to confirm this, both of
these are now enabled by default in HapTech's products. With this said,
I expect that over time we'll continue to see a few bugs, but not more
than what we're seeing in various areas. For example, we didn't get a
single issue on haproxy.org since it was updated to 1.8.1 or so,
3 months ago. So this is getting quite good.

> - Should we expect potential latency side effects in some situations,
>   as described in your commit message? (and so avoid using it for low
>   latency usage)

I ran a stress test on this patch, with a single server running with
"maxconn 1", with a frontend bound to two threads. I measure exactly
30000 conn/s with a single thread (keep in mind that there's a single
connection at once), and 28500 with two threads. Thus the sync point
takes on average an extra 1.75 microseconds, compared to the 35
microseconds it takes on average to finish processing the request
(connect, server processing, response, close).

Also if you're running with nbproc > 1 instead, the maxconn setting is
not really respected since it becomes per-process. When you run with
8 processes it doesn't mean much anymore, or you need to have small
maxconn settings, implying that sometimes a process might queue some
requests while there are available slots in other processes. Thus I'd
argue that the threads here significantly improve the situation by
allowing all connection slots to be used by all CPUs, which is a real
improvement which should theoretically show you lower latencies.

Note that if this is of interest to you, it's trivial to make haproxy
run in busy polling mode, and in this case the performance increases to
30900 conn/s, at the expense of eating all your CPU (which possibly you
don't care about if latency is your worst enemy). We can possibly even
improve this to ensure that it's done only when there are existing
sessions on a given thread. Let me know if this is something that could
be of interest to you, as I think we could make this configurable and
bypass the sync point in this case.

> - I saw some commits for the 1.9 release which probably improve the
>   situation regarding lockless threading:
>   http://git.haproxy.org/?p=haproxy.git;a=commit;h=cf975d46bca2515056a4f55e55fedbbc7b4eda59
>   http://git.haproxy.org/?p=haproxy.git;a=commit;h=4815c8cbfe7817939bcac7adc18fd9f86993e4fc
>   But I guess they will not be backported to 1.8, right?

No, they're definitely not for 1.8 and still really touchy. We're
progressively attacking locks wherever we can. Some further patches
will refine the scheduler to make it more parallel, and even the code
above will continue to change; see it as a first step in the right
direction.

We noticed a nice performance boost on the last one with many cores
(24 threads, something like +40% on connection rate), but we'll probably
see even better once the rest is addressed.

Cheers,
Willy
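As an illustration of the stress test described above, a minimal
configuration of this kind should exercise the queue on nearly every
request (a sketch only; the names and addresses are made up, while
nbthread and the per-server maxconn are the actual 1.8 keywords):

global
    nbthread 2

defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend fe_test
    bind :8080
    default_backend be_test

backend be_test
    # a single connection slot: every concurrent request beyond the
    # first is queued, so the dequeuing path runs on almost every request
    server s1 192.0.2.10:80 maxconn 1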
Re: segfault in haproxy=1.8.4
Hi Christopher,

On Thu, Mar 15, 2018 at 04:05:04PM +0100, Christopher Faulet wrote:
> From 91b1349b6a1a64d43cc41e8546ff1d1ce17a8e14 Mon Sep 17 00:00:00 2001
> From: Christopher Faulet
> Date: Wed, 14 Mar 2018 16:18:06 +0100
> Subject: [PATCH] BUG/MAJOR: threads/queue: Fix thread-safety issues on the
>  queues management
[...]
> However, be careful. This new implementation should be thread-safe
> (hopefully...). But it is not optimal and in some situations, it could
> be really slower in multi-threaded mode than in single-threaded one.
> The problem is that, when we try to dequeue pending connections, we
> process them from the oldest one to the newest one, independently of
> the threads' affinity. So we need to wait for the other threads to wake
> up to really process them. If threads are blocked in the poller, this
> will add a significant latency. This problem happens when maxconn
> values are very low.

Regarding this last section, we are a bit worried about the usability
of the new `nbthread` feature in 1.8. It raised a few questions on our
side:
- Is it considered an experimental feature?
- Should we expect potential latency side effects in some situations,
  as described in your commit message? (and so avoid using it for low
  latency usage)
- I saw some commits for the 1.9 release which probably improve the
  situation regarding lockless threading:
  http://git.haproxy.org/?p=haproxy.git;a=commit;h=cf975d46bca2515056a4f55e55fedbbc7b4eda59
  http://git.haproxy.org/?p=haproxy.git;a=commit;h=4815c8cbfe7817939bcac7adc18fd9f86993e4fc
  But I guess they will not be backported to 1.8, right?

Best,
--
William
Re: segfault in haproxy=1.8.4
On 15/03/2018 at 15:50, Willy Tarreau wrote:
> On Thu, Mar 15, 2018 at 02:49:59PM +0100, Christopher Faulet wrote:
>> When we scan a queue, it is locked. So at your mark, the server queue is
>> already locked. To remove a pendconn from a queue, we also need to have
>> a lock on this queue, if it is still linked. Otherwise we can safely
>> remove it; nobody can fall on the pendconn by scanning the queue.
>
> OK, that's exactly the information I was missing, so I think all is good
> then!
>
>> In fact, without locking the queue (server one and/or proxy one), it is
>> unsafe to loop on it. In the same way, we must lock the queue to remove
>> a pendconn from it. I will add comments to clarify everything. But I
>> really think it is thread-safe.
>
> Absolutely. Thanks for the detailed explanation!

Nice, I'm relieved. You made me doubt myself :)

Here is the updated patch.

Thanks
--
Christopher Faulet

From 91b1349b6a1a64d43cc41e8546ff1d1ce17a8e14 Mon Sep 17 00:00:00 2001
From: Christopher Faulet
Date: Wed, 14 Mar 2018 16:18:06 +0100
Subject: [PATCH] BUG/MAJOR: threads/queue: Fix thread-safety issues on the
 queues management

The management of the servers and the proxies queues was not thread-safe
at all. First, the accesses to ->pend_pos were not protected. So it was
possible to release it on one thread (for instance because the stream is
released) and to use it at the same time on another one (because we
redispatch pending connections for a server).

Then, accessing the stream's information (flags and target) from anywhere
is forbidden. To be safe, the stream's state must always be updated in the
context of process_stream.

So to fix these issues, the queue module has been refactored. A lock has
been added in the pendconn structure. And now, when we try to dequeue a
pending connection, we start by unlinking it from the server/proxy queue
and we wake up the stream. Then, it is the stream's responsibility to
really dequeue it (or release it). This way, we are sure that only the
stream can create and release its <pend_pos> field.

However, be careful. This new implementation should be thread-safe
(hopefully...). But it is not optimal and in some situations, it could be
really slower in multi-threaded mode than in single-threaded one. The
problem is that, when we try to dequeue pending connections, we process
them from the oldest one to the newest one, independently of the threads'
affinity. So we need to wait for the other threads to wake up to really
process them. If threads are blocked in the poller, this will add a
significant latency. This problem happens when maxconn values are very
low.

This patch must be backported to 1.8.
---
 include/common/hathreads.h |   2 +
 include/proto/queue.h      |   1 +
 include/types/queue.h      |  10 +-
 include/types/stream.h     |   2 +-
 src/proto_http.c           |   2 -
 src/queue.c                | 324 +++--
 src/stream.c               |   3 +-
 7 files changed, 209 insertions(+), 135 deletions(-)

diff --git a/include/common/hathreads.h b/include/common/hathreads.h
index 3da6dba4..19299db7 100644
--- a/include/common/hathreads.h
+++ b/include/common/hathreads.h
@@ -240,6 +240,7 @@ enum lock_label {
 	PIPES_LOCK,
 	START_LOCK,
 	TLSKEYS_REF_LOCK,
+	PENDCONN_LOCK,
 	LOCK_LABELS
 };
 struct lock_stat {
@@ -360,6 +361,7 @@ static inline const char *lock_label(enum lock_label label)
 	case PIPES_LOCK:           return "PIPES";
 	case START_LOCK:           return "START";
 	case TLSKEYS_REF_LOCK:     return "TLSKEYS_REF";
+	case PENDCONN_LOCK:        return "PENDCONN";
 	case LOCK_LABELS:          break; /* keep compiler happy */
 	};
 	/* only way to come here is consecutive to an internal bug */
diff --git a/include/proto/queue.h b/include/proto/queue.h
index f66d809f..2d4773a0 100644
--- a/include/proto/queue.h
+++ b/include/proto/queue.h
@@ -38,6 +38,7 @@ extern struct pool_head *pool_head_pendconn;

 int init_pendconn();
 struct pendconn *pendconn_add(struct stream *strm);
+int pendconn_dequeue(struct stream *strm);
 void pendconn_free(struct pendconn *p);
 void process_srv_queue(struct server *s);
 unsigned int srv_dynamic_maxconn(const struct server *s);
diff --git a/include/types/queue.h b/include/types/queue.h
index 4b354514..42dbbd04 100644
--- a/include/types/queue.h
+++ b/include/types/queue.h
@@ -24,15 +24,19 @@
 #include
 #include
+#include
 #include

 struct stream;

 struct pendconn {
-	struct list    list;       /* chaining ... */
-	struct stream *strm;       /* the stream waiting for a connection */
-	struct server *srv;        /* the server we are waiting for */
+	int            strm_flags; /* stream flags */
+	struct stream *strm;
+	struct proxy  *px;
+	struct server *srv;        /* the server we are waiting for, may be NULL */
+	struct list    list;       /* next pendconn */
+	__decl_hathreads(HA_SPINLOCK_T lock);
 };

 #endif /* _TYPES_QUEUE_H */
diff --git a/include/types/stream.h b/include/types/stream.h
index 227b0ffb..0dbc79f4 100644
---
Re: segfault in haproxy=1.8.4
On Thu, Mar 15, 2018 at 02:49:59PM +0100, Christopher Faulet wrote:
> When we scan a queue, it is locked. So at your mark, the server queue is
> already locked. To remove a pendconn from a queue, we also need to have
> a lock on this queue, if it is still linked. Otherwise we can safely
> remove it; nobody can fall on the pendconn by scanning the queue.

OK, that's exactly the information I was missing, so I think all is good
then!

> In fact, without locking the queue (server one and/or proxy one), it is
> unsafe to loop on it. In the same way, we must lock the queue to remove
> a pendconn from it. I will add comments to clarify everything. But I
> really think it is thread-safe.

Absolutely. Thanks for the detailed explanation!

Willy
Re: segfault in haproxy=1.8.4
On 15/03/2018 at 12:16, Willy Tarreau wrote:
>> +static struct stream *pendconn_process_next_strm(struct server *srv, struct proxy *px)
>> +{
>> +	struct pendconn *p = NULL;
>> +	struct server   *rsrv;
>>
>>  	rsrv = srv->track;
>>  	if (!rsrv)
>>  		rsrv = srv;
>>
>> +	if (srv->nbpend) {
>> +		list_for_each_entry(p, &srv->pendconns, list) {    -- mark here --
>> +			if (!HA_SPIN_TRYLOCK(PENDCONN_LOCK, &p->lock))
>> +				goto ps_found;
>> +		}
>> +		p = NULL;
>> +	}
>> +
>> +  ps_found:
>> +	if (srv_currently_usable(rsrv) && px->nbpend) {
>> +		struct pendconn *pp;
>> +
>> +		list_for_each_entry(pp, &px->pendconns, list) {
>> +			/* If the server pendconn is older than the proxy one,
>> +			 * we process the server one. */
>> +			if (p && !tv_islt(&pp->strm->logs.tv_request, &p->strm->logs.tv_request))
>> +				goto pendconn_found;
>
> Here there is still a race: pp->strm is dereferenced out of the lock,
> so the owner could very well decide to disappear in the meantime. I
> think this check needs to be moved into the trylock below. Or another
> option (probably simpler and cleaner) would be to add a tv_request
> field to the pendconn, and directly use it.
>
>> Well, not really. When we access a pendconn in a queue (proxy or
>> server), we always hold a lock on it. On the other side, a stream must
>> release its pendconn, if any, before being released. To do so, the
>> right queue must be locked by the stream. So when we manipulate a
>> pendconn in a queue, we are sure that the stream cannot release it at
>> the same time. And if the stream released it, it cannot be in the
>> queue.
>>
>> Everything should be thread-safe because I added a lock on the
>> pendconn _AND_ the pendconn is now only released by the stream itself.
>> These 2 conditions are necessary and sufficient to guarantee the
>> thread-safety.
>
> OK that's fine. I think that you should indicate for certain functions
> that they must only be called under this or that lock (I remember
> seeing one this morning in this situation, I don't remember which one).
>
> But then what prevents a stream from removing itself from the queue at
> the mark above? It takes the lock, removes itself, unlocks, but is
> still seen by the function, the trylock succeeds (on just freed
> memory), the stream is dereferenced again after being freed, and we may
> even unlink it twice, immediately corrupting memory. I may be wrong but
> I don't see what prevents this case from being possible. If it is, you
> may need to use another lock when manipulating this list (likely the
> server lock as you initially did), though that could become expensive.

When we scan a queue, it is locked. So at your mark, the server queue is
already locked. To remove a pendconn from a queue, we also need to have a
lock on this queue, if it is still linked. Otherwise we can safely remove
it; nobody can fall on the pendconn by scanning the queue.

In fact, without locking the queue (server one and/or proxy one), it is
unsafe to loop on it. In the same way, we must lock the queue to remove a
pendconn from it. I will add comments to clarify everything. But I really
think it is thread-safe.

--
Christopher Faulet
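To make the two invariants above concrete, here is a simplified,
self-contained model of the protocol using POSIX spinlocks (assumed
structure and function names, not the actual haproxy code):

#include <pthread.h>
#include <stdlib.h>

/* Model of the rules described above:
 *  1. any walk of a queue holds the queue lock and try-locks entries;
 *  2. the owner unlinks its entry under the same queue lock, and only
 *     the owner ever frees it.
 * Together they guarantee a scanner never try-locks freed memory. */
struct pendconn_m {
	struct pendconn_m *next;    /* stand-in for the list linkage */
	pthread_spinlock_t lock;    /* the per-pendconn lock of the patch */
};

struct queue_m {
	pthread_spinlock_t lock;    /* the server/proxy queue lock */
	struct pendconn_m *head;
};

/* Rule 1: scan under the queue lock; an entry that can only be
 * unlinked under that same lock cannot be freed memory here. */
struct pendconn_m *pick_one(struct queue_m *q)
{
	pthread_spin_lock(&q->lock);
	for (struct pendconn_m *p = q->head; p; p = p->next) {
		if (pthread_spin_trylock(&p->lock) == 0) {
			pthread_spin_unlock(&q->lock);
			return p; /* caller unlocks p->lock when done */
		}
		/* trylock failed: the owner is busy with p, skip it */
	}
	pthread_spin_unlock(&q->lock);
	return NULL;
}

/* Rule 2: the owner unlinks under the queue lock before freeing, so
 * the entry cannot vanish while a scanner is walking the list. */
void release_own(struct queue_m *q, struct pendconn_m *p)
{
	pthread_spin_lock(&q->lock);
	for (struct pendconn_m **pp = &q->head; *pp; pp = &(*pp)->next) {
		if (*pp == p) {
			*pp = p->next; /* unlink */
			break;
		}
	}
	pthread_spin_unlock(&q->lock);
	pthread_spin_destroy(&p->lock);
	free(p);
}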
Re: segfault in haproxy=1.8.4
On Thu, Mar 15, 2018 at 11:32:41AM +0100, Christopher Faulet wrote:
> I tested with an ugly and really unsafe/buggy hack to migrate the stream
> and its client FD before waking it up, and this problem disappears. So
> this is definitely the right way to handle it. But for now, this could
> be a problem for everyone using maxconn with threads.

Well, it cannot be worse than the current situation! Also, don't worry:
in general, when working with very low maxconn, you face high wake-up
rates because operations are much less batched, so the workloads
exhibiting this case where some tasks are idle waiting for a connection
shouldn't be that common, and that will buy us time to try to address
this better.

> > > +static struct stream *pendconn_process_next_strm(struct server *srv, struct proxy *px)
> > > +{
> > > +	struct pendconn *p = NULL;
> > > +	struct server   *rsrv;
> > >
> > >  	rsrv = srv->track;
> > >  	if (!rsrv)
> > >  		rsrv = srv;
> > >
> > > +	if (srv->nbpend) {
> > > +		list_for_each_entry(p, &srv->pendconns, list) {    -- mark here --
> > > +			if (!HA_SPIN_TRYLOCK(PENDCONN_LOCK, &p->lock))
> > > +				goto ps_found;
> > > +		}
> > > +		p = NULL;
> > > +	}
> > > +
> > > +  ps_found:
> > > +	if (srv_currently_usable(rsrv) && px->nbpend) {
> > > +		struct pendconn *pp;
> > > +
> > > +		list_for_each_entry(pp, &px->pendconns, list) {
> > > +			/* If the server pendconn is older than the proxy one,
> > > +			 * we process the server one. */
> > > +			if (p && !tv_islt(&pp->strm->logs.tv_request, &p->strm->logs.tv_request))
> > > +				goto pendconn_found;
> >
> > Here there is still a race: pp->strm is dereferenced out of the lock,
> > so the owner could very well decide to disappear in the meantime. I
> > think this check needs to be moved into the trylock below. Or another
> > option (probably simpler and cleaner) would be to add a tv_request
> > field to the pendconn, and directly use it.
>
> Well, not really. When we access a pendconn in a queue (proxy or
> server), we always hold a lock on it. On the other side, a stream must
> release its pendconn, if any, before being released. To do so, the
> right queue must be locked by the stream. So when we manipulate a
> pendconn in a queue, we are sure that the stream cannot release it at
> the same time. And if the stream released it, it cannot be in the
> queue.
>
> Everything should be thread-safe because I added a lock on the pendconn
> _AND_ the pendconn is now only released by the stream itself. These 2
> conditions are necessary and sufficient to guarantee the thread-safety.

OK that's fine. I think that you should indicate for certain functions
that they must only be called under this or that lock (I remember seeing
one this morning in this situation, I don't remember which one).

But then what prevents a stream from removing itself from the queue at
the mark above? It takes the lock, removes itself, unlocks, but is still
seen by the function, the trylock succeeds (on just freed memory), the
stream is dereferenced again after being freed, and we may even unlink it
twice, immediately corrupting memory. I may be wrong but I don't see what
prevents this case from being possible. If it is, you may need to use
another lock when manipulating this list (likely the server lock as you
initially did), though that could become expensive.

OK for the rest :-)

Willy
Re: segfault in haproxy=1.8.4
On 15/03/2018 at 07:19, Willy Tarreau wrote:
> Hi Christopher,
>
> first, thank you for this one, I know how painful it can have been!

Hi Willy,

Thanks for your support :)

> On Wed, Mar 14, 2018 at 09:56:19PM +0100, Christopher Faulet wrote:
> (...)
>> But it is not optimal and in some situations, it could be really slower
>> in multi-threaded mode than in single-threaded one. The problem is
>> that, when we try to dequeue pending connections, we process them from
>> the oldest one to the newest one, independently of the threads'
>> affinity. So we need to wait for the other threads to wake up to really
>> process them. If threads are blocked in the poller, this will add a
>> significant latency. This problem happens when maxconn values are very
>> low.
>
> This problem is not related to your patch but to the task scheduler,
> which we need to improve^Wfix so that it can properly wake up a task on
> a sleeping thread. We need to check this with Emeric and Olivier to see
> how to best address it in 1.8 and how to ensure that 1.9 is clean
> regarding this.

I tested with an ugly and really unsafe/buggy hack to migrate the stream
and its client FD before waking it up, and this problem disappears. So
this is definitely the right way to handle it. But for now, this could be
a problem for everyone using maxconn with threads.

> I'm having a few comments on the code below:
>
>> @@ -97,44 +76,66 @@ static inline struct pendconn *pendconn_from_px(const struct proxy *px) {
>>   * The stream is immediately marked as "assigned", and both its and
>>   * are set to ,
>>   */
>> -static struct stream *pendconn_get_next_strm(struct server *srv, struct proxy *px)
>> +static struct stream *pendconn_process_next_strm(struct server *srv, struct proxy *px)
>>  {
>> -	struct pendconn *ps, *pp;
>> -	struct stream   *strm;
>> -	struct server   *rsrv;
>> +	struct pendconn *p = NULL;
>> +	struct server   *rsrv;
>>
>>  	rsrv = srv->track;
>>  	if (!rsrv)
>>  		rsrv = srv;
>>
>> -	ps = pendconn_from_srv(srv);
>> -	pp = pendconn_from_px(px);
>> -	/* we want to get the definitive pendconn in */
>> -	if (!pp || !srv_currently_usable(rsrv)) {
>> -		if (!ps)
>> -			return NULL;
>> -	} else {
>> -		/* pendconn exists in the proxy queue */
>> -		if (!ps || tv_islt(&pp->strm->logs.tv_request, &ps->strm->logs.tv_request))
>> -			ps = pp;
>> +	if (srv->nbpend) {
>> +		list_for_each_entry(p, &srv->pendconns, list) {
>> +			if (!HA_SPIN_TRYLOCK(PENDCONN_LOCK, &p->lock))
>> +				goto ps_found;
>> +		}
>> +		p = NULL;
>> +	}
>> +
>> +  ps_found:
>> +	if (srv_currently_usable(rsrv) && px->nbpend) {
>> +		struct pendconn *pp;
>> +
>> +		list_for_each_entry(pp, &px->pendconns, list) {
>> +			/* If the server pendconn is older than the proxy one,
>> +			 * we process the server one. */
>> +			if (p && !tv_islt(&pp->strm->logs.tv_request, &p->strm->logs.tv_request))
>> +				goto pendconn_found;
>
> Here there is still a race: pp->strm is dereferenced out of the lock,
> so the owner could very well decide to disappear in the meantime. I
> think this check needs to be moved into the trylock below. Or another
> option (probably simpler and cleaner) would be to add a tv_request
> field to the pendconn, and directly use it.

Well, not really. When we access a pendconn in a queue (proxy or server),
we always hold a lock on it. On the other side, a stream must release its
pendconn, if any, before being released. To do so, the right queue must
be locked by the stream. So when we manipulate a pendconn in a queue, we
are sure that the stream cannot release it at the same time. And if the
stream released it, it cannot be in the queue.

Everything should be thread-safe because I added a lock on the pendconn
_AND_ the pendconn is now only released by the stream itself. These 2
conditions are necessary and sufficient to guarantee the thread-safety.

>> +  pendconn_found:
>> +	pendconn_unlinked(p);
>> +	p->strm_flags  = (p->strm_flags & (SF_DIRECT | SF_ADDR_SET));
>> +	p->strm_flags |= SF_ASSIGNED;
>
> I think that it would be simpler to consider that strm_flags start with
> the stream's initial flags, then only manipulate the ones we're
> interested in, and only apply the final mask at the end. This way the
> enumeration of all manipulated flags doesn't need to be known at each
> location. This means you should only keep SF_ASSIGNED above and drop
> the line filtering on the two other flags. See below for the other
> instances.

Right, I will change that.

[...]

>> -/*
>> - * Detaches pending connection , decreases the pending count, and frees
>> - * the pending connection. The connection might have been queued to a
>> - * specific server as well as to the proxy. The stream also gets marked
>> - * unqueued.
>> +/* Try to dequeue pending connection attached to the stream . It must
>> + * always exist here. If the
Re: segfault in haproxy=1.8.4
Hi Christopher,

first, thank you for this one, I know how painful it can have been!

On Wed, Mar 14, 2018 at 09:56:19PM +0100, Christopher Faulet wrote:
(...)
> But it is not optimal and in some situations, it could be really slower
> in multi-threaded mode than in single-threaded one. The problem is
> that, when we try to dequeue pending connections, we process them from
> the oldest one to the newest one, independently of the threads'
> affinity. So we need to wait for the other threads to wake up to really
> process them. If threads are blocked in the poller, this will add a
> significant latency. This problem happens when maxconn values are very
> low.

This problem is not related to your patch but to the task scheduler,
which we need to improve^Wfix so that it can properly wake up a task on a
sleeping thread. We need to check this with Emeric and Olivier to see how
to best address it in 1.8 and how to ensure that 1.9 is clean regarding
this.

I'm having a few comments on the code below:

> @@ -97,44 +76,66 @@ static inline struct pendconn *pendconn_from_px(const struct proxy *px) {
>   * The stream is immediately marked as "assigned", and both its and
>   * are set to ,
>   */
> -static struct stream *pendconn_get_next_strm(struct server *srv, struct proxy *px)
> +static struct stream *pendconn_process_next_strm(struct server *srv, struct proxy *px)
>  {
> -	struct pendconn *ps, *pp;
> -	struct stream   *strm;
> -	struct server   *rsrv;
> +	struct pendconn *p = NULL;
> +	struct server   *rsrv;
>
>  	rsrv = srv->track;
>  	if (!rsrv)
>  		rsrv = srv;
>
> -	ps = pendconn_from_srv(srv);
> -	pp = pendconn_from_px(px);
> -	/* we want to get the definitive pendconn in */
> -	if (!pp || !srv_currently_usable(rsrv)) {
> -		if (!ps)
> -			return NULL;
> -	} else {
> -		/* pendconn exists in the proxy queue */
> -		if (!ps || tv_islt(&pp->strm->logs.tv_request, &ps->strm->logs.tv_request))
> -			ps = pp;
> +	if (srv->nbpend) {
> +		list_for_each_entry(p, &srv->pendconns, list) {
> +			if (!HA_SPIN_TRYLOCK(PENDCONN_LOCK, &p->lock))
> +				goto ps_found;
> +		}
> +		p = NULL;
> +	}
> +
> +  ps_found:
> +	if (srv_currently_usable(rsrv) && px->nbpend) {
> +		struct pendconn *pp;
> +
> +		list_for_each_entry(pp, &px->pendconns, list) {
> +			/* If the server pendconn is older than the proxy one,
> +			 * we process the server one. */
> +			if (p && !tv_islt(&pp->strm->logs.tv_request, &p->strm->logs.tv_request))
> +				goto pendconn_found;

Here there is still a race: pp->strm is dereferenced out of the lock, so
the owner could very well decide to disappear in the meantime. I think
this check needs to be moved into the trylock below. Or another option
(probably simpler and cleaner) would be to add a tv_request field to the
pendconn, and directly use it.

> +
> +			if (!HA_SPIN_TRYLOCK(PENDCONN_LOCK, &pp->lock)) {
> +				/* Let's switch from the server pendconn to the
> +				 * proxy pendconn. Don't forget to unlock the
> +				 * server pendconn, if any. */
> +				if (p)
> +					HA_SPIN_UNLOCK(PENDCONN_LOCK, &p->lock);
> +				p = pp;
> +				goto pendconn_found;
> +			}
> +		}
> 	}
> -	strm = ps->strm;
> -	__pendconn_free(ps);
>
> -	/* we want to note that the stream has now been assigned a server */
> -	strm->flags |= SF_ASSIGNED;
> -	strm->target = &srv->obj_type;
> -	__stream_add_srv_conn(strm, srv);
> +	if (!p)
> +		return NULL;
> +
> +  pendconn_found:
> +	pendconn_unlinked(p);
> +	p->strm_flags  = (p->strm_flags & (SF_DIRECT | SF_ADDR_SET));
> +	p->strm_flags |= SF_ASSIGNED;

I think that it would be simpler to consider that strm_flags start with
the stream's initial flags, then only manipulate the ones we're
interested in, and only apply the final mask at the end. This way the
enumeration of all manipulated flags doesn't need to be known at each
location. This means you should only keep SF_ASSIGNED above and drop the
line filtering on the two other flags. See below for the other instances.

> @@ -206,26 +204,28 @@ struct pendconn *pendconn_add(struct stream *strm)
>   */
>  int pendconn_redistribute(struct server *s)
>  {
> -	struct pendconn *pc, *pc_bck;
> +	struct pendconn *p, *pback;
>  	int xferred = 0;
>
> +	/* The REDISP option was specified. We will ignore cookie and force to
> +	 * balance or use the dispatcher. */
> +	if ((s->proxy->options & (PR_O_REDISP|PR_O_PERSIST)) != PR_O_REDISP)
> +		return
Re: segfault in haproxy=1.8.4
Hi, Christopher!

Thank you very much for the patch. I'll apply it to my canary host today,
but it will take a week or even more to make sure that no crashes occur.
Anyway, I'll write you back.

2018-03-14 23:56 GMT+03:00 Christopher Faulet:
> On 07/03/2018 at 09:58, Christopher Faulet wrote:
>> I found thread-safety bugs about the management of pending connections.
>> It is totally fucked up :) It needs to be entirely reworked. I'm on it.
>> I hope to propose a patch this afternoon.
>
> Hi,
>
> Sorry for the lag. This issue was definitely harder to fix than I
> thought at first glance. Here is a proposal to fix the queue management.
> Could you check whether it fixes your bug, please?
>
> --
> Christopher Faulet
Re: segfault in haproxy=1.8.4
On 07/03/2018 at 09:58, Christopher Faulet wrote:
> I found thread-safety bugs about the management of pending connections.
> It is totally fucked up :) It needs to be entirely reworked. I'm on it.
> I hope to propose a patch this afternoon.

Hi,

Sorry for the lag. This issue was definitely harder to fix than I thought
at first glance. Here is a proposal to fix the queue management. Could
you check whether it fixes your bug, please?

--
Christopher Faulet

From 32926ada6a00c11980182d582a521d3833e36f23 Mon Sep 17 00:00:00 2001
From: Christopher Faulet
Date: Wed, 14 Mar 2018 16:18:06 +0100
Subject: [PATCH] BUG/MAJOR: thread/queue: Fix thread-safety issues on the
 queues management

The management of the servers and the proxies queues was not thread-safe
at all. First, the accesses to ->pend_pos were not protected. So it was
possible to release it on one thread (for instance because the stream is
released) and to use it at the same time on another one (because we
redispatch pending connections for a server).

Then, accessing the stream's information (flags and target) from anywhere
is forbidden. To be safe, the stream's state must always be updated in the
context of process_stream.

So to fix these issues, the queue module has been refactored. A lock has
been added in the pendconn structure. And now, when we try to dequeue a
pending connection, we start by unlinking it from the server/proxy queue
and we wake up the stream. Then, it is the stream's responsibility to
really dequeue it (or release it). This way, we are sure that only the
stream can create and release its <pend_pos> field.

However, be careful. This new implementation should be thread-safe
(hopefully...). But it is not optimal and in some situations, it could be
really slower in multi-threaded mode than in single-threaded one. The
problem is that, when we try to dequeue pending connections, we process
them from the oldest one to the newest one, independently of the threads'
affinity. So we need to wait for the other threads to wake up to really
process them. If threads are blocked in the poller, this will add a
significant latency. This problem happens when maxconn values are very
low.

This patch must be backported to 1.8.
---
 include/common/hathreads.h |   2 +
 include/proto/queue.h      |   1 +
 include/types/queue.h      |   9 +-
 include/types/stream.h     |   2 +-
 src/proto_http.c           |   2 -
 src/queue.c                | 283 +++--
 src/stream.c               |   3 +-
 7 files changed, 180 insertions(+), 122 deletions(-)

diff --git a/include/common/hathreads.h b/include/common/hathreads.h
index 3da6dba4..19299db7 100644
--- a/include/common/hathreads.h
+++ b/include/common/hathreads.h
@@ -240,6 +240,7 @@ enum lock_label {
 	PIPES_LOCK,
 	START_LOCK,
 	TLSKEYS_REF_LOCK,
+	PENDCONN_LOCK,
 	LOCK_LABELS
 };
 struct lock_stat {
@@ -360,6 +361,7 @@ static inline const char *lock_label(enum lock_label label)
 	case PIPES_LOCK:           return "PIPES";
 	case START_LOCK:           return "START";
 	case TLSKEYS_REF_LOCK:     return "TLSKEYS_REF";
+	case PENDCONN_LOCK:        return "PENDCONN";
 	case LOCK_LABELS:          break; /* keep compiler happy */
 	};
 	/* only way to come here is consecutive to an internal bug */
diff --git a/include/proto/queue.h b/include/proto/queue.h
index f66d809f..2d4773a0 100644
--- a/include/proto/queue.h
+++ b/include/proto/queue.h
@@ -38,6 +38,7 @@ extern struct pool_head *pool_head_pendconn;

 int init_pendconn();
 struct pendconn *pendconn_add(struct stream *strm);
+int pendconn_dequeue(struct stream *strm);
 void pendconn_free(struct pendconn *p);
 void process_srv_queue(struct server *s);
 unsigned int srv_dynamic_maxconn(const struct server *s);
diff --git a/include/types/queue.h b/include/types/queue.h
index 4b354514..3e7d5a12 100644
--- a/include/types/queue.h
+++ b/include/types/queue.h
@@ -24,15 +24,18 @@
 #include
 #include
+#include
 #include

 struct stream;

 struct pendconn {
-	struct list    list;       /* chaining ... */
-	struct stream *strm;       /* the stream waiting for a connection */
-	struct server *srv;        /* the server we are waiting for */
+	int            strm_flags; /* stream flags */
+	struct stream *strm;
+	struct server *srv;        /* the server we are waiting for, may be NULL */
+	struct list    list;       /* next pendconn */
+	__decl_hathreads(HA_SPINLOCK_T lock);
 };

 #endif /* _TYPES_QUEUE_H */
diff --git a/include/types/stream.h b/include/types/stream.h
index 227b0ffb..0dbc79f4 100644
--- a/include/types/stream.h
+++ b/include/types/stream.h
@@ -124,7 +124,7 @@ struct stream {
 	struct session *sess;       /* the session this stream is attached to */

 	struct server *srv_conn;    /* stream already has a slot on a server and is not in queue */
-	struct pendconn *pend_pos;  /* if not NULL, points to the position in the pending queue */
+	struct pendconn *pend_pos;  /* if not NULL, points to the pending position in the
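The design described in the commit message splits dequeuing in two
phases. A simplified, self-contained model of that split (hypothetical
names; the real entry point is the pendconn_dequeue() declared in the
diff above):

#include <stdbool.h>

/* Phase 1 runs on whichever thread frees a connection slot; phase 2
 * runs later in the stream's own task context, so stream state is
 * only ever touched by its owner. */
struct pendconn_m {
	bool  unlinked;   /* set once removed from the server/proxy queue */
	bool  assigned;   /* the dequeuer's verdict: a slot was granted   */
	void *target_srv; /* which server granted the slot                */
};

/* Phase 1 -- any thread, queue lock held: record the outcome in the
 * pendconn, unlink it, and wake the owning task (task_wakeup() in
 * haproxy itself). The stream's own flags/target are NOT touched. */
static void grant_slot(struct pendconn_m *p, void *srv)
{
	p->target_srv = srv;
	p->assigned   = true;
	p->unlinked   = true;
	/* task_wakeup(owner) would go here */
}

/* Phase 2 -- the owning stream, once woken: apply the recorded
 * outcome to its private state, then release the pendconn. */
static int stream_side_dequeue(struct pendconn_m *p)
{
	if (!p->unlinked)
		return 0; /* still queued, nothing to do yet */
	/* safe: we are in the equivalent of process_stream(), the only
	 * place allowed to modify the stream's flags and target */
	return p->assigned;
}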
Re: segfault in haproxy=1.8.4
On Mon, Mar 05, 2018 at 09:19:16PM +0500, Maksim Kupriianov wrote:
> Hi Willy!
>
> I have 2 more haproxy servers with exactly the same configuration and
> load. Both have threads compiled in but not enabled in the config (no
> nbthread). And there are no segfaults at all. So I'm sure everything is
> fine without threads.
> Haproxy's config file itself is way too large to find out the exact
> proxy section where the fault occurred to tell you about the
> configuration :(
> But if it could help: we have a few heavily loaded proxy sections with
> hundreds of servers inside, round-robin algo and maxconn=2. Stick
> tables are enabled, but only to calculate average characteristics of
> the traffic and use them in ACLs, so no real stickiness on backends.
> I still have the core file available, so I can extract some more
> detailed traces if you need them.

OK, thanks for these details, this is already quite valuable information.
Please don't lose the executable file that produced the core, in case
we'd need to dig deeper into it.

Cheers,
Willy
Re: segfault in haproxy=1.8.4
Hi Willy!

I have 2 more haproxy servers with exactly the same configuration and
load. Both have threads compiled in but not enabled in the config (no
nbthread). And there are no segfaults at all. So I'm sure everything is
fine without threads.
Haproxy's config file itself is way too large to find out the exact proxy
section where the fault occurred to tell you about the configuration :(
But if it could help: we have a few heavily loaded proxy sections with
hundreds of servers inside, round-robin algo and maxconn=2. Stick tables
are enabled, but only to calculate average characteristics of the traffic
and use them in ACLs, so no real stickiness on backends.
I still have the core file available, so I can extract some more detailed
traces if you need them.

Thank you,
Maksim

2018-03-05 18:56 GMT+05:00 Willy Tarreau <w...@1wt.eu>:
> Hi Maksim,
>
> On Mon, Mar 05, 2018 at 03:08:11PM +0300, Maksim Kupriianov wrote:
> > Hi!
> >
> > I have a backtrace for a segfault in haproxy=1.8.4 with 4 threads. It
> > happens usually under heavy load. Can you take a look?
> >
> > Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> > Core was generated by `/usr/sbin/haproxy -f /etc/haproxy/haproxy-market.cfg -p /var/run/haproxy-market'.
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0  __pendconn_free (p=0x55d61e970cd8) at src/queue.c:292
> > 292             HA_ATOMIC_SUB(&p->srv->nbpend, 1);
> > (gdb) bt
> > #0  __pendconn_free (p=0x55d61e970cd8) at src/queue.c:292
> > #1  0x55d61be491de in pendconn_get_next_strm (px=0x55d61e96fea0, srv=0x55d61ea3d950) at src/queue.c:122
> > #2  process_srv_queue (s=0x55d61ea3d950) at src/queue.c:153
>
> It looks very likely that this could be directly related to the threads
> indeed, and pendconns can be recycled very quickly, so maybe we're
> reusing one a bit too fast (or maybe we have a very old use-after-free
> there that is magnified by the threads).
>
> Did you have the opportunity to get this one without threads, or
> conversely, do you know if it works fine without threads?
>
> I guess you have maxconn enabled on your servers; could you tell us a
> bit more about the way the farm is set up (LB algo, #servers, average
> maxconn, stickiness or not)?
>
> Thanks,
> Willy
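In case it helps with extracting those more detailed traces, standard
gdb commands against the saved core are enough (paths are illustrative):

$ gdb /usr/sbin/haproxy /path/to/core
(gdb) thread apply all bt    # backtraces for all 4 threads, not just the crashing one
(gdb) frame 0                # back to __pendconn_free
(gdb) print *p               # dump the pendconn being freed
(gdb) print *p->srv          # the server it points to, if still mapped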
Re: segfault in haproxy=1.8.4
Hi Maksim,

On Mon, Mar 05, 2018 at 03:08:11PM +0300, Maksim Kupriianov wrote:
> Hi!
>
> I have a backtrace for a segfault in haproxy=1.8.4 with 4 threads. It
> happens usually under heavy load. Can you take a look?
>
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Core was generated by `/usr/sbin/haproxy -f /etc/haproxy/haproxy-market.cfg -p /var/run/haproxy-market'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  __pendconn_free (p=0x55d61e970cd8) at src/queue.c:292
> 292             HA_ATOMIC_SUB(&p->srv->nbpend, 1);
> (gdb) bt
> #0  __pendconn_free (p=0x55d61e970cd8) at src/queue.c:292
> #1  0x55d61be491de in pendconn_get_next_strm (px=0x55d61e96fea0, srv=0x55d61ea3d950) at src/queue.c:122
> #2  process_srv_queue (s=0x55d61ea3d950) at src/queue.c:153

It looks very likely that this could be directly related to the threads
indeed, and pendconns can be recycled very quickly, so maybe we're reusing
one a bit too fast (or maybe we have a very old use-after-free there that
is magnified by the threads).

Did you have the opportunity to get this one without threads, or
conversely, do you know if it works fine without threads?

I guess you have maxconn enabled on your servers; could you tell us a bit
more about the way the farm is set up (LB algo, #servers, average maxconn,
stickiness or not)?

Thanks,
Willy
segfault in haproxy=1.8.4
Hi!

I have a backtrace for a segfault in haproxy=1.8.4 with 4 threads. It
happens usually under heavy load. Can you take a look?

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/haproxy -f /etc/haproxy/haproxy-market.cfg -p /var/run/haproxy-market'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __pendconn_free (p=0x55d61e970cd8) at src/queue.c:292
292             HA_ATOMIC_SUB(&p->srv->nbpend, 1);
(gdb) bt
#0  __pendconn_free (p=0x55d61e970cd8) at src/queue.c:292
#1  0x55d61be491de in pendconn_get_next_strm (px=0x55d61e96fea0, srv=0x55d61ea3d950) at src/queue.c:122
#2  process_srv_queue (s=0x55d61ea3d950) at src/queue.c:153
#3  0x55d61bdcfcf6 in stream_free (s=0x7f1a6899a440) at src/stream.c:315
#4  process_stream (t=<optimized out>) at src/stream.c:2513
#5  0x55d61be48aa8 in process_runnable_tasks () at src/task.c:311
#6  0x55d61bdfdc84 in run_poll_loop () at src/haproxy.c:2398
#7  run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2460
#8  0x7f1a7f91d184 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#9  0x7f1a7eb9b03d in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) quit

root@a.local# haproxy -vv
HA-Proxy version 1.8.4-1 2018/02/08
Copyright 2000-2018 Willy Tarreau <wi...@haproxy.org>

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -g -O2 -fPIE -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2
  OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_THREAD=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1 USE_PCRE_JIT=1 USE_TFO=1 USE_NS=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
Running on OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with Lua version : Lua 5.3.1
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Encrypted password support via crypt(3): yes
Built with multi-threading support.
Built with PCRE version : 8.31 2012-07-06
Running on PCRE version : 8.31 2012-07-06
PCRE library supports JIT : no (libpcre build without JIT?)
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with network namespace support.

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available filters :
        [SPOE] spoe
        [COMP] compression
        [TRACE] trace

--
Best regards
Maksim Kupriianov