Re: Help tracking "connection refused" under pressure on v2.9
Hi Willy,

On Fri, 29 Mar 2024 07:17:56 +0100 Willy Tarreau wrote:

> > These "connection refused" errors are from our watchdog; but the
> > effects are as perceptible from the outside. When our watchdog hits
> > this situation, it will forcefully restart HAProxy (we have 2
> > instances) because there will be a considerable service degradation.
> > If you remember, there's
> > https://github.com/haproxy/haproxy/issues/1895 and we talked briefly
> > about this in person, at HAProxyConf.
>
> Thanks for the context. I didn't remember about the issue. I remembered
> we discussed for a while but didn't remember about the issue in
> question, obviously, given the number of issues I'm dealing with :-/
>
> In the issue above I'm seeing an element from Felipe saying that telnet
> to port 80 can take up to 3 seconds to accept. That really makes me
> think about either the SYN queue being full, causing drops and
> retransmits, or a lack of socket memory to accept packets. That one
> could possibly be caused by tcp_mem not being large enough due to some
> transfers with high-latency fast clients taking a lot of RAM, but it
> should not affect the local UNIX socket. Also, killing the process
> means killing all the associated connections and will definitely result
> in freeing a huge amount of network buffers, so it could fuel that
> direction. If you have two instances, did you notice if the two start
> to behave badly at the same time? If that's the case, it would
> definitely indicate a possible resource-based cause like socket memory
> etc.

Of our 2 HAProxy instances, it is usually one (mostly the frontend one)
that exhibits this behavior. And as it is imperative that the corrective
action be as swift as possible, all instances are terminated (which can
include older instances, from graceful reloads), and new instances are
started. Very harsh, but at >50 Gbps, each full second of downtime adds
up considerably to network pressure.

So for context, our least capable machine has 256 GB RAM.
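One quick way to probe the socket-memory direction from the shell is to compare the kernel's TCP memory counters against their thresholds. This is a generic procfs sketch, not a command from the thread:

```shell
# tcp_mem thresholds are in pages: low / pressure / max
cat /proc/sys/net/ipv4/tcp_mem

# current usage: the "mem" field of the TCP line is also in pages
grep -E '^(sockets|TCP):' /proc/net/sockstat

# listen-queue drops would instead point at the full-SYN-queue theory
nstat -az TcpExtListenDrops TcpExtListenOverflows 2>/dev/null || true
```

If the sockstat "mem" figure approaches the tcp_mem "pressure" threshold during an incident, the socket-memory theory gains weight; if the ListenDrops/ListenOverflows counters jump instead, the accept-queue theory does.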
We have not seen any spikes in the metrics we monitor, and this issue
tends to happen at a very stable steady state, albeit a loaded one.
While it is currently outside of our range for detailed data, we didn't
notice anything unusual, especially regarding memory usage, in these
traps we reported. But of course, there could be a metric that we're not
yet aware of that correlates. Any suggestions from the dustier, darkest
corners that you know of? :-)

> > But this is incredibly elusive to reproduce; it comes and goes. It
> > might happen every few minutes, or not happen at all for months. Not
> > tied to a specific setup: different versions, kernels, machines. In
> > fact, we do not have better ways to detect the situation, at least
> > not as fast, reactive, and resilient.
>
> It might be useful to take periodic snapshots of /proc/slabinfo and
> see if something jumps during such incidents (grep for TCP, net, skbuff
> there). I guess you have not noticed any "out of socket memory" nor such
> indications in your kernel logs, of course :-/

We have no indications of memory pressure related to the network. At the
peak, we usually see something like 15~22% overall active memory (it
fails me, but it might take >70% of active memory for these machines to
actually degrade, maybe more). As for TCP, around 16~30k active sockets,
plus some 50~100k in TIME_WAIT, and still not creating any problems.

> Another one that could make sense to monitor is "PoolFailed" in
> "show info". It should always remain zero.

We collect this (all available metrics, actually); I don't remember this
one ever measuring more than zero. But we'll keep an eye on it.

In time: could this be somewhat unrelated to HAProxy? I.e., maybe the
kernel?

Cheers,

-- 
Ricardo Nabinger Sanchez
https://www.taghos.com.br/
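The periodic /proc/slabinfo snapshots suggested above could be scripted along these lines. A sketch: the output path and cadence are assumptions, and the source file is taken as an argument because reading /proc/slabinfo itself usually requires root:

```shell
# snapshot_net_slabs SRC DST: keep only network-related slab caches.
# SRC is normally /proc/slabinfo; passing it in makes the filter
# testable on a copy without root.
snapshot_net_slabs() {
    grep -E '^(TCP|TCPv6|net|skbuff)' "$1" > "$2"
}

# one timestamped snapshot, e.g. from a cron job or a watchdog hook:
# snapshot_net_slabs /proc/slabinfo "/var/tmp/slab.$(date +%Y%m%dT%H%M%S)"
```

Diffing consecutive snapshots around an incident would show whether any of the TCP/net/skbuff caches balloons right before the watchdog fires.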
Re: Help tracking "connection refused" under pressure on v2.9
Hi Willy,

On Thu, 28 Mar 2024 04:37:11 +0100 Willy Tarreau wrote:

> Thanks guys! So there seems to be an annoying bug. However I'm not sure
> how this is related to your "connection refused", except if you try to
> connect at the moment the process crashes and restarts, of course. I'm
> seeing that the bug here is stktable_requeue_exp() calling task_queue()
> with an invalid task expiration. I'm having a look now. I'll respond in
> the issue with what I can find, thanks for your report.

These "connection refused" errors are from our watchdog; but the effects
are as perceptible from the outside. When our watchdog hits this
situation, it will forcefully restart HAProxy (we have 2 instances)
because there will be a considerable service degradation. If you
remember, there's https://github.com/haproxy/haproxy/issues/1895 and we
talked briefly about this in person, at HAProxyConf.

But this is incredibly elusive to reproduce; it comes and goes. It might
happen every few minutes, or not happen at all for months. It is not
tied to a specific setup: different versions, kernels, machines. In
fact, we do not have better ways to detect the situation, at least not
as fast, reactive, and resilient.

> Since you were speaking about FD count and maxconn at 900k, please let
> me take this opportunity for a few extra sanity checks. By default we
> assign up to about 50% of the FDs to pipes (i.e. up to 25% pipes
> compared to connections), so if maxconn is 900k you can reach
> 1800 + 900 = 2700k FDs. One thing to keep in mind is that
> /proc/sys/fs/nr_open sets a per-process hard limit and usually is set
> to 1M, and that /proc/sys/fs/file-max sets a system-wide limit and
> depends on the amount of RAM, so both may interact with such a large
> setting. We could for example imagine that at ~256k connections with
> as many pipes you're reaching around 1M FDs and that the connection
> from socat to the CLI socket cannot be accepted and is rejected.
> Since you recently updated your kernel, it might be worth checking if
> the default values are still in line with your usage.

We set our defaults pretty high in anticipation:

/proc/sys/fs/file-max = 5M
/proc/sys/fs/nr_open  = 3M

Even with our software stack, we do not reach these limits. A long time
ago we did hit them (lower limits, back then) and the effects were
devastating.

Cheers,

-- 
Ricardo Nabinger Sanchez
https://www.taghos.com.br/
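The worst-case FD arithmetic from earlier in the thread (2 FDs per connection plus up to an equal number of pipe FDs, i.e. ~2700k for maxconn 900k) can be checked against the running kernel with a couple of lines. A sketch; the maxconn value is the one discussed in the thread:

```shell
maxconn=900000                            # value from the thread
worst_case=$(( maxconn * 2 + maxconn ))   # conn FDs (2 each) + pipe FDs

nr_open=$(cat /proc/sys/fs/nr_open)       # per-process hard limit
file_max=$(cat /proc/sys/fs/file-max)     # system-wide limit

echo "worst case: $worst_case  nr_open: $nr_open  file-max: $file_max"
[ "$nr_open"  -ge "$worst_case" ] || echo "WARNING: fs.nr_open below worst case"
[ "$file_max" -ge "$worst_case" ] || echo "WARNING: fs.file-max below worst case"
```

Running this after a kernel update is a cheap way to confirm the tuned limits survived.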
Re: Help tracking "connection refused" under pressure on v2.9
On Wed, 27 Mar 2024 11:06:39 -0300 Felipe Wilhelms Damasio wrote:

> kernel: traps: haproxy[2057993] trap invalid opcode ip:5b3e26
> sp:7fd7c002f100 error:0 in haproxy[42c000+1f7000]

We managed to get a core file, and so created ticket #2508
(https://github.com/haproxy/haproxy/issues/2508) with more details.

Cheers,

-- 
Ricardo Nabinger Sanchez
https://www.taghos.com.br/
Re: Help tracking "connection refused" under pressure on v2.9
On Wed, 27 Mar 2024 11:06:39 -0300 Felipe Wilhelms Damasio wrote:

> kernel: traps: haproxy[2057993] trap invalid opcode ip:5b3e26
> sp:7fd7c002f100 error:0 in haproxy[42c000+1f7000]

In our build, this is where the instruction pointer was:

(gdb) list *0x5b10e6
0x5b10e6 is in __task_queue (src/task.c:285).
280                (wq != &tg_ctx->timers && wq != &th_ctx->timers));
281     #endif
282             /* if this happens the process is doomed anyway, so better catch it now
283              * so that we have the caller in the stack.
284              */
285             BUG_ON(task->expire == TICK_ETERNITY);
286
287             if (likely(task_in_wq(task)))
288                     __task_unlink_wq(task);
289

However, we can't produce a stack trace from only the instruction
pointer; at least I don't know how (but would love to learn if it is
possible). We are trying to get a core dump, too.

Cheers,

-- 
Ricardo Nabinger Sanchez
https://www.taghos.com.br/
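On the "from only the instruction pointer" question: with an unstripped binary, addr2line (or gdb's `info line *ADDR`) can at least map the address to function and file:line, which is what the `list *0x5b10e6` above does. A full backtrace still needs a core or a live process, since the other frames live on the stack. A sketch; the haproxy path is an assumption, and for a PIE build the mapping base from the trap line (e.g. 42c000) must be subtracted first:

```shell
# resolve_ip BINARY ADDR: map an instruction pointer to the function
# name and source file:line, using the binary's symbol/debug info.
resolve_ip() {
    addr2line -f -e "$1" "$2"
}

# e.g.: resolve_ip /usr/sbin/haproxy 0x5b3e26
# equivalently inside gdb:  (gdb) info line *0x5b3e26
```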
Re: 1.8.19: Segfault at startup with nbthread > 1 and option tcp-check
Hi Leonhard,

On Mon, 1 Apr 2019 13:14:06 +0200 Leonhard Wimmer wrote:

> I also managed to get a stack trace with the current haproxy-1.8
> (a3cfe8f4):
>
> #0 tcpcheck_main (check=check@entry=0x55746608) at src/checks.c:2774
> #1 0x5561c464 in connect_conn_chk (t=0x5575d1f0) at src/checks.c:1549

This segfault might be fixed with this commit:
http://git.haproxy.org/?p=haproxy.git;a=commit;h=4bccea98912c74fa42c665ec25e417c2cca4eee7

Cheers,

-- 
Ricardo Nabinger Sanchez
http://www.taghos.com.br/
[PATCH] BUG/MAJOR: segfault during tcpcheck_main
Hello,

We have been chasing a segfault for a few weeks and today we were able
to track it down. There is a null-pointer dereference when using
tcp-check connect, although we don't know yet why the protocol family
might be unknown during tcpcheck_main(). On some of our servers, HAProxy
was crashing a few times per day, without any clear pattern, while there
were no crashes on other servers running the same version/configuration.

We manually checked every use of protocol_by_family() and only the one
in tcpcheck_main() was unprotected. To comply with the coding standards,
we used the same behavior found in another function in src/checks.c,
which also calls protocol_by_family() but tests for a NULL pointer
before dereferencing it.

Best regards,

-- 
Ricardo Nabinger Sanchez
http://www.taghos.com.br/

>From 29684452b9cf923e321537ffc1f4694c88131f4b Mon Sep 17 00:00:00 2001
From: Ricardo Nabinger Sanchez
Date: Thu, 28 Mar 2019 21:42:23 -0300
Subject: [PATCH] BUG/MAJOR: segfault during tcpcheck_main

When using TCP health checks (tcp-check connect), it is possible to
crash with a segfault when, for reasons yet to be understood, the
protocol family is unknown.

In the function tcpcheck_main(), proto is dereferenced without a prior
test in case it is NULL, leading to the segfault during proto->connect
dereference.

The line has been unmodified since it was introduced, in commit
69e273f3fcfbfb9cc0fb5a09668faad66cfbd36b. This was the only use of proto
(or more specifically, the return of protocol_by_family()) that was
unprotected; all other callsites perform the test for a NULL pointer.

This patch should be backported to 1.9, 1.8, 1.7, and 1.6.
---
 src/checks.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/checks.c b/src/checks.c
index 35744c6b..31004ddf 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -2839,7 +2839,7 @@ static int tcpcheck_main(struct check *check)
 		cs_attach(cs, check, &check_conn_cb);
 
 		ret = SF_ERR_INTERNAL;
-		if (proto->connect)
+		if (proto && proto->connect)
 			ret = proto->connect(conn, 1 /* I/O polling is always needed */,
 			                     (next && next->action == TCPCHK_ACT_EXPECT) ? 0 : 2);
 
-- 
2.21.0
Re: 1.9.2: Crash with 300% CPU and stuck agent-checks
On Thu, 14 Mar 2019 11:58:39 +0100 Willy Tarreau wrote:

> But I know it's possible to do better using "-ex" or something like
> this.

Here is an example, for "thread apply all backtrace":

$ gdb -batch -ex 't a a bt' /bin/sleep 4929
0x7fa7529904d4 in nanosleep () from /lib64/libc.so.6

Thread 1 (process 4929):
#0  0x7fa7529904d4 in nanosleep () from /lib64/libc.so.6
#1  0x004053d7 in ?? ()
#2  0x00405220 in ?? ()
#3  0x004024c0 in ?? ()
#4  0x7fa7528d8b5b in __libc_start_main () from /lib64/libc.so.6
#5  0x0040257a in ?? ()
[Inferior 1 (process 4929) detached]

Note the use of "-batch", so there won't be GDB prompts such as a
confirmation for detaching from the process.

Cheers,

-- 
Ricardo Nabinger Sanchez
http://www.taghos.com.br/
Re: possible use of uninitialized value in v2.0-dev0-274-g1a0fe3be
On Wed, 6 Feb 2019 19:12:31 +0100 Tim Düsterhus wrote:

> Line 4398 is missing here; it appends a marker (empty string) to mark
> the end of the array.
>
> > ...
> >
> > 4450         /* look for the Host header and place it in :authority */
> > 4451         auth = ist2(NULL, 0);
> > 4452         for (hdr = 0; hdr < sizeof(list)/sizeof(list[0]); hdr++) {
> > 4453                 if (isteq(list[hdr].n, ist("")))
> > // (here, assume the condition is false, so control keeps in this block...)
>
> We established that `list` is an array without holes, terminated by an
> empty string.
>
> Thus either:
> 1. The condition is false, then the value must be initialized; or
> 2. The condition is true, then the loop is exited.
>
> Thus I believe this is a false positive.

Thank you for checking this out and, by extension, Willy for his set of
replies. I missed the marker (and apparently, so did Clang).

Cheers,

-- 
Ricardo Nabinger Sanchez
http://www.taghos.com.br/
possible use of uninitialized value in v2.0-dev0-274-g1a0fe3be
Hello,

scan-build found a 28-step path where an uninitialized value could be
used in h2s_htx_bck_make_req_headers(). Here is a shortened version:

4378         idx = htx_get_head(htx); // returns the SL that we skip
4379         while ((idx = htx_get_next(htx, idx)) != -1) {
4380                 blk = htx_get_blk(htx, idx);
4381                 type = htx_get_blk_type(blk);
4382
4383                 if (type == HTX_BLK_UNUSED)
4384                         continue;
4385
4386                 if (type != HTX_BLK_HDR)
                     // (here, assume the condition is true, so control leaves the loop...)
4387                         break;
4388
4389                 if (unlikely(hdr >= sizeof(list)/sizeof(list[0]) - 1))
4390                         goto fail;
4391
                     // (... and list will not be initialized.)
4392                 list[hdr].n = htx_get_blk_name(htx, blk);
4393                 list[hdr].v = htx_get_blk_value(htx, blk);
4394                 hdr++;
4395         }

...

4450         /* look for the Host header and place it in :authority */
4451         auth = ist2(NULL, 0);
4452         for (hdr = 0; hdr < sizeof(list)/sizeof(list[0]); hdr++) {
4453                 if (isteq(list[hdr].n, ist("")))
                     // (here, assume the condition is false, so control keeps in this block...)
4454                         break; // end
4455
4456                 if (isteq(list[hdr].n, ist("host"))) {
4457                         auth = list[hdr].v;
                             // (... auth receives an uninitialized value from list ...)
4458                         break;
4459                 }
4460         }
4461 }
4462 else {
4463         /* for CONNECT, :authority is taken from the path */
4464         auth = path;
4465 }
4466
// (... and here auth is evaluated, but it contains whatever uninitialized
// data that list had, because its initialization has been jumped over.)
4467 if (auth.ptr && !hpack_encode_header(&outbuf, ist(":authority"), auth)) {
4468         /* output full */
4469         if (b_space_wraps(&h2c->mbuf))
4470                 goto realign_again;
4471         goto full;
4472 }

While this feels like a convoluted or unlikely scenario, the path
leading to the use of an uninitialized value seems to be correctly
unearthed by scan-build. Also, there might be a chance that this path
invokes undefined behavior, leading to further surprises.

Does this make sense?

Cheers,

-- 
Ricardo Nabinger Sanchez
http://www.taghos.com.br/
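A minimal standalone model shows why the report hinges on the end-of-array marker: if the first loop can break before writing any entry, the second loop must not scan the array unless a terminator was appended in between. All names here are illustrative, not the HTX API:

```c
#include <stddef.h>
#include <string.h>

struct pair { const char *n; const char *v; };

/* find_value: collect up to 7 name/value pairs, append an empty-string
 * marker (the step the analyzer overlooked in the real code), then scan
 * for "host". Without the marker assignment below, an early break would
 * leave list[] indeterminate and the scan would read uninitialized
 * memory -- exactly the path scan-build reported. */
static const char *find_value(const char **names, const char **values,
                              size_t n)
{
    struct pair list[8];            /* deliberately not initialized */
    size_t hdr = 0;

    for (size_t i = 0; i < n && hdr < 7; i++) {
        if (names[i] == NULL)
            break;                  /* analogous to the type != HTX_BLK_HDR break */
        list[hdr].n = names[i];
        list[hdr].v = values[i];
        hdr++;
    }
    list[hdr].n = "";               /* the terminating marker */
    list[hdr].v = "";

    for (hdr = 0; hdr < 8; hdr++) {
        if (strcmp(list[hdr].n, "") == 0)
            break;                  /* nothing beyond the marker is read */
        if (strcmp(list[hdr].n, "host") == 0)
            return list[hdr].v;
    }
    return NULL;
}
```

With the marker in place the two loops are safe together, which matches the "false positive" conclusion in the reply above.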
possible use-after-free on v1.9-dev0-3-ge78915a4
Hello,

While compile-testing 1.9-dev with the Clang/LLVM analyzer, it found the
following (possible) scenario:

1. In cfg_cache_postparser(), when entering the nested loop at line 914,
   cache_ptr is free()d and redefined to cache (line 918):

914         list_for_each_entry(cache, &caches, list) {
915                 if (!strcmp(cache->id, cache_ptr)) {
916                         /* there can be only one filter per cache, so we free it there */
917                         free(cache_ptr);
918                         cache_ptr = cache;
919                         break;
920                 }
921         }

2. The loop is interrupted on the break at line 919.

3. If the test at line 923 passes (comparing cache_ptr with the pointer
   fconf->conf), the ha_alert() call at line 924 will attempt to
   print/dereference the (recently freed) content pointed to by
   fconf->conf:

923         if (cache_ptr == fconf->conf) {
924                 ha_alert("Proxy '%s': unable to find the cache '%s' referenced by the filter 'cache'.\n",
925                          curproxy->id, (char *)fconf->conf);
926                 err++;
927         }

I'm not sure how realistic this might be (especially to exploit), but it
feels worthy of reporting.

Cheers,

-- 
Ricardo Nabinger Sanchez
http://www.taghos.com.br/