Re: Help tracking "connection refused" under pressure on v2.9

2024-03-29 Thread Ricardo Nabinger Sanchez
Hi Willy,

On Fri, 29 Mar 2024 07:17:56 +0100
Willy Tarreau  wrote:

> > These "connection refused" is from our watchdog; but the effects are as
> > perceptible from the outside.  When our watchdog hits this situation,
> > it will forcefully restart HAProxy (we have 2 instances) because there
> > will be a considerable service degradation.  If you remember, there's
> > https://github.com/haproxy/haproxy/issues/1895 and we talked briefly
> > about this in person, at HAProxyConf.  
> 
> Thanks for the context. I didn't remember the issue; I remembered we
> discussed for a while, but obviously not the specific issue in question,
> given the number of issues I'm dealing with :-/
> 
> In the issue above I'm seeing an element from Felipe saying that a telnet
> to port 80 can take 3 seconds or more to be accepted. That really makes me
> think about either the SYN queue being full, causing drops and retransmits,
> or a lack of socket memory to accept packets. The latter could possibly be
> caused by tcp_mem not being large enough, due to some transfers with
> high-latency clients taking a lot of RAM, but it should not affect the
> local UNIX socket. Also, killing the process means killing all the
> associated connections, which will definitely free a huge amount of
> network buffers, so that would fit this direction. If you have two
> instances, did you notice whether the two start to behave badly at the
> same time ? If that's the case, it would definitely indicate a possible
> resource-based cause like socket memory etc.

Of our 2 HAProxy instances, it is usually one (mostly the frontend one)
that exhibits this behavior.  And as it is imperative that the
corrective action be as swift as possible, all instances are terminated
(which can include older instances left over from graceful reloads) and
new instances are started.  Very harsh, but at >50 Gbps, each full
second of downtime adds considerably to network pressure.

For context, our least capable machine has 256 GB of RAM.  We have not
seen any spikes in the metrics we monitor, and this issue tends to
happen at a very stable steady state, albeit a loaded one.  While the
incidents are now outside our retention window for detailed data, we
didn't notice anything unusual, especially regarding memory usage, in
the traps we reported.

But of course, there could be a metric that we're not yet aware of that
correlates.  Any candidates from the dustier, darker corners that you
know of?  :-)


> 
> > But this is incredibly elusive to reproduce; it comes and goes.  It
> > might happen every few minutes, or not happen at all for months.  It is
> > not tied to a specific setup: different versions, kernels, machines.
> > In fact, we have no better way to detect the situation, at least none
> > as fast, reactive, and resilient as the watchdog.
> 
> It might be useful to take periodic snapshots of /proc/slabinfo and
> see if something jumps during such incidents (grep for TCP, net, skbuff
> there). I guess you have not noticed any "out of socket memory" or
> similar indications in your kernel logs, of course :-/

We have no indications of memory pressure related to networking.  At
peak, we usually see something like 15~22% of overall memory active
(the exact figure escapes me, but it might take >70% active memory for
these machines to actually degrade, maybe more).  As for TCP, around
16~30k active sockets, plus some 50~100k in TIME_WAIT, still without
causing any problems.
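
We'll start taking periodic snapshots along those lines; a minimal
sketch of what we have in mind (log path and interval are arbitrary,
and /proc/slabinfo is typically readable by root only):

    # sample the TCP/net/skbuff slab caches every 10 seconds
    while true; do
            date >> /var/tmp/slabinfo.log
            grep -E 'TCP|net|skbuff' /proc/slabinfo >> /var/tmp/slabinfo.log
            sleep 10
    done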


> 
> Another one that could make sense to monitor is "PoolFailed" in
> "show info". It should always remain zero.

We collect this one (all available metrics, actually); I don't remember
it ever measuring anything other than zero.  But we'll keep an eye on it.
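
For hand-checking, a one-liner such as this should do (the stats
socket path is an assumption):

    # PoolFailed should always print 0, per the above
    echo "show info" | socat stdio /var/run/haproxy.sock | grep PoolFailed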

Incidentally, could this be somewhat unrelated to HAProxy, i.e., maybe
the kernel?

Cheers,

-- 
Ricardo Nabinger Sanchez https://www.taghos.com.br/



Re: Help tracking "connection refused" under pressure on v2.9

2024-03-28 Thread Ricardo Nabinger Sanchez
Hi Willy,

On Thu, 28 Mar 2024 04:37:11 +0100
Willy Tarreau  wrote:

> Thanks guys! So there seems to be an annoying bug. However, I'm not sure
> how this is related to your "connection refused", unless you try to
> connect at the moment the process crashes and restarts, of course. I'm
> seeing that the bug here is stktable_requeue_exp() calling task_queue()
> with an invalid task expiration. I'm having a look now. I'll respond in
> the issue with what I can find, thanks for your report.

These "connection refused" is from our watchdog; but the effects are as
perceptible from the outside.  When our watchdog hits this situation,
it will forcefully restart HAProxy (we have 2 instances) because there
will be a considerable service degradation.  If you remember, there's
https://github.com/haproxy/haproxy/issues/1895 and we talked briefly
about this in person, at HAProxyConf.
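
To give an idea, here is a deliberately minimal sketch of that kind of
probe (not our actual watchdog; the address, timeout, and corrective
action are all made up for illustration):

    # restart if a frontend TCP connect does not succeed within 1 second;
    # the real watchdog logic is more involved than this
    if ! timeout 1 bash -c 'exec 3<>/dev/tcp/127.0.0.1/80' 2>/dev/null; then
            systemctl restart haproxy
    fi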

But this is incredibly elusive to reproduce; it comes and goes.  It
might happen every few minutes, or not happen at all for months.  It is
not tied to a specific setup: different versions, kernels, machines.
In fact, we have no better way to detect the situation, at least none
as fast, reactive, and resilient as the watchdog.


> 
> Since you were speaking about FD count and maxconn at 900k, please let
> me take this opportunity for a few extra sanity checks. By default we
> assign up to about 50% of the FDs to pipes (i.e. up to 25% pipes compared
> to connections), so if maxconn is 900k you can reach 1800k + 900k = 2700k
> FDs. One thing to keep in mind is that /proc/sys/fs/nr_open sets a
> per-process hard limit and is usually set to 1M, and that
> /proc/sys/fs/file-max sets a system-wide limit that depends on the amount
> of RAM, so both may interact with such a large setting. We could for
> example imagine that at ~256k connections with as many pipes you're
> reaching around 1M FDs, and that the connection from socat to the CLI
> socket cannot be accepted and is rejected. Since you recently updated
> your kernel, it might be worth checking that the default values are still
> in line with your usage.

We set our defaults pretty high, in anticipation:

/proc/sys/fs/file-max = 5M
/proc/sys/fs/nr_open = 3M

Even with our software stack, we do not reach these limits.  A long
time ago we did hit the (then lower) limits, and the effects were
devastating.
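
For reference, the equivalent one-shot commands, spelling 5M/3M out as
plain numbers (that reading is an assumption; persisting the values via
sysctl.conf or similar is omitted here):

    # illustrative values matching the limits quoted above
    sysctl -w fs.file-max=5000000
    sysctl -w fs.nr_open=3000000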

Cheers,

-- 
Ricardo Nabinger Sanchez https://www.taghos.com.br/



Re: Help tracking "connection refused" under pressure on v2.9

2024-03-27 Thread Ricardo Nabinger Sanchez
On Wed, 27 Mar 2024 11:06:39 -0300
Felipe Wilhelms Damasio  wrote:

> kernel: traps: haproxy[2057993] trap invalid opcode ip:5b3e26 sp:7fd7c002f100 error:0 in haproxy[42c000+1f7000]

We managed to get a core file, and so created ticket #2508
(https://github.com/haproxy/haproxy/issues/2508) with more details.

Cheers,

-- 
Ricardo Nabinger Sanchez https://www.taghos.com.br/



Re: Help tracking "connection refused" under pressure on v2.9

2024-03-27 Thread Ricardo Nabinger Sanchez
On Wed, 27 Mar 2024 11:06:39 -0300
Felipe Wilhelms Damasio  wrote:

> kernel: traps: haproxy[2057993] trap invalid opcode ip:5b3e26 sp:7fd7c002f100 error:0 in haproxy[42c000+1f7000]

In our build, this is where the instruction pointer was:

(gdb) list *0x5b10e6
0x5b10e6 is in __task_queue (src/task.c:285).
280                            (wq != &tg_ctx->timers && wq != &th_ctx->timers));
281     #endif
282             /* if this happens the process is doomed anyway, so better catch it now
283              * so that we have the caller in the stack.
284              */
285             BUG_ON(task->expire == TICK_ETERNITY);
286
287             if (likely(task_in_wq(task)))
288                     __task_unlink_wq(task);
289

However, we can't produce a stack trace from only the instruction
pointer; at least I don't know how (though I'd love to learn if it's
possible).

We are trying to get a core dump, too.
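
If we do capture one, pulling full backtraces out of it should be as
simple as this (paths are placeholders):

    # all-thread backtrace from a core file, non-interactively
    gdb /path/to/haproxy /path/to/core -batch -ex 'thread apply all bt full'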

Cheers,

-- 
Ricardo Nabinger Sanchez https://www.taghos.com.br/



Re: 1.8.19: Segfault at startup with nbthread > 1 and option tcp-check

2019-04-01 Thread Ricardo Nabinger Sanchez
Hi Leonhard,

On Mon, 1 Apr 2019 13:14:06 +0200
Leonhard Wimmer  wrote:

> I also managed to get a stack trace with the current haproxy-1.8 (a3cfe8f4):
> 
> #0  tcpcheck_main (check=check@entry=0x55746608) at src/checks.c:2774
> #1  0x5561c464 in connect_conn_chk (t=0x5575d1f0) at
> src/checks.c:1549

This segfault might be fixed by this commit:

http://git.haproxy.org/?p=haproxy.git;a=commit;h=4bccea98912c74fa42c665ec25e417c2cca4eee7
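
If it helps, a quick way to check whether a local tree already contains
that commit (hash as above):

    # exits 0 and prints the message if the fix is an ancestor of HEAD
    git merge-base --is-ancestor 4bccea98912c74fa42c665ec25e417c2cca4eee7 HEAD \
        && echo "fix present"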

Cheers,

-- 
Ricardo Nabinger Sanchez http://www.taghos.com.br/




[PATCH] BUG/MAJOR: segfault during tcpcheck_main

2019-03-28 Thread Ricardo Nabinger Sanchez
Hello,

We have been chasing a segfault for a few weeks, and today we were able
to track it down.  There is a null-pointer dereference when using
tcp-check connect, although we don't yet know why the protocol family
might be unknown during tcpcheck_main().

On some of our servers, HAProxy was crashing a few times per day,
without any clear pattern, while there were no crashes on other servers
running the same version/configuration.

We manually checked every use of protocol_by_family(), and only the one
in tcpcheck_main() was unprotected.  To comply with the coding
standards, we mirrored the behavior of another function in
src/checks.c, which also calls protocol_by_family() but tests the
result for NULL before dereferencing it.

Best regards,

-- 
Ricardo Nabinger Sanchez http://www.taghos.com.br/
From 29684452b9cf923e321537ffc1f4694c88131f4b Mon Sep 17 00:00:00 2001
From: Ricardo Nabinger Sanchez 
Date: Thu, 28 Mar 2019 21:42:23 -0300
Subject: [PATCH] BUG/MAJOR: segfault during tcpcheck_main

When using TCP health checks (tcp-check connect), it is possible to
crash with a segfault when, for reasons yet to be understood, the
protocol family is unknown.

In the function tcpcheck_main(), proto is dereferenced without a prior
test in case it is NULL, leading to the segfault during proto->connect
dereference.

The line has been unmodified since it was introduced in commit
69e273f3fcfbfb9cc0fb5a09668faad66cfbd36b.  This was the only use of
proto (or more specifically, of the return of protocol_by_family())
that was unprotected; all other call sites test for a NULL pointer
before dereferencing it.

This patch should be backported to 1.9, 1.8, 1.7, and 1.6.
---
 src/checks.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/checks.c b/src/checks.c
index 35744c6b..31004ddf 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -2839,7 +2839,7 @@ static int tcpcheck_main(struct check *check)
 			cs_attach(cs, check, &check_conn_cb);
 
 			ret = SF_ERR_INTERNAL;
-			if (proto->connect)
+			if (proto && proto->connect)
 				ret = proto->connect(conn,
 				                     1 /* I/O polling is always needed */,
 				                     (next && next->action == TCPCHK_ACT_EXPECT) ? 0 : 2);
-- 
2.21.0



Re: 1.9.2: Crash with 300% CPU and stuck agent-checks

2019-03-15 Thread Ricardo Nabinger Sanchez
On Thu, 14 Mar 2019 11:58:39 +0100
Willy Tarreau  wrote:

> But I know it's possible to do better
> using "-ex" or something like this.

Here is an example, for "thread apply all backtrace":

$ gdb -batch -ex 't a a bt' /bin/sleep 4929
0x7fa7529904d4 in nanosleep () from /lib64/libc.so.6

Thread 1 (process 4929):
#0  0x7fa7529904d4 in nanosleep () from /lib64/libc.so.6
#1  0x004053d7 in ?? ()
#2  0x00405220 in ?? ()
#3  0x004024c0 in ?? ()
#4  0x7fa7528d8b5b in __libc_start_main () from /lib64/libc.so.6
#5  0x0040257a in ?? ()
[Inferior 1 (process 4929) detached]

Note the use of "-batch", so there won't be GDB prompts such as
confirmation for detaching from the process.

Cheers,

-- 
Ricardo Nabinger Sanchez http://www.taghos.com.br/




Re: possible use of uninitialized value in v2.0-dev0-274-g1a0fe3be

2019-02-08 Thread Ricardo Nabinger Sanchez
On Wed, 6 Feb 2019 19:12:31 +0100
Tim Düsterhus  wrote:

> Line 4398 is missing here, it appends a marker (empty string) to mark
> the end of the array.
> 
> > ...
> > 
> > 4450 /* look for the Host header and place it in :authority 
> > */
> > 4451 auth = ist2(NULL, 0);
> > 4452 for (hdr = 0; hdr < sizeof(list)/sizeof(list[0]); 
> > hdr++) {
> > 4453 if (isteq(list[hdr].n, ist("")))
> > // (here, assume the condition is false, so control keeps in this block...) 
> >  
> 
> We established that `list` is an array without holes terminated by an
> empty string.
> 
> Thus either:
> 1. The Condition is false, then the value must be initialized
> or
> 2. The Condition is true, then the loop is exited.
> 
> Thus I believe this is a false-positive.

Thank you for checking this out, and by extension Willy for his set of
replies.  I missed the marker (and apparently, so did Clang).

Cheers,

-- 
Ricardo Nabinger Sanchez http://www.taghos.com.br/




possible use of uninitialized value in v2.0-dev0-274-g1a0fe3be

2019-02-06 Thread Ricardo Nabinger Sanchez
Hello,

scan-build found a 28-step path where an uninitialized value could be
used in h2s_htx_bck_make_req_headers().

Here is a shortened version:

4378 idx = htx_get_head(htx); // returns the SL that we skip
4379 while ((idx = htx_get_next(htx, idx)) != -1) {
4380 blk = htx_get_blk(htx, idx);
4381 type = htx_get_blk_type(blk);
4382 
4383 if (type == HTX_BLK_UNUSED)
4384 continue;
4385 
4386 if (type != HTX_BLK_HDR)
// (here, assume condition is true, so control leaves the loop...)
4387 break;
4388 
4389 if (unlikely(hdr >= sizeof(list)/sizeof(list[0]) - 1))
4390 goto fail;
4391 
// (... and list will not be initialized.)
4392 list[hdr].n = htx_get_blk_name(htx, blk);
4393 list[hdr].v = htx_get_blk_value(htx, blk);
4394 hdr++;
4395 }

...

4450 /* look for the Host header and place it in :authority */
4451 auth = ist2(NULL, 0);
4452 for (hdr = 0; hdr < sizeof(list)/sizeof(list[0]); hdr++) {
4453 if (isteq(list[hdr].n, ist("")))
// (here, assume the condition is false, so control keeps in this block...)
4454 break; // end
4455 
4456 if (isteq(list[hdr].n, ist("host"))) {
4457 auth = list[hdr].v;
// (... auth receives an uninitialized value from list ...)
4458 break;
4459 }
4460 }
4461 }
4462 else {
4463 /* for CONNECT, :authority is taken from the path */
4464 auth = path;
4465 }
4466 
// (... and here auth is evaluated, but it contains whatever uninitialized
//  data that list had, because its initialization has been jumped over.)
4467 if (auth.ptr && !hpack_encode_header(&outbuf, ist(":authority"), auth)) {
4468 /* output full */
4469 if (b_space_wraps(&h2c->mbuf))
4470 goto realign_again;
4471 goto full;
4472 }

While this feels like a convoluted or unlikely scenario, the path
leading to the use of an uninitialized value seems to be correctly
unearthed by scan-build.  There is also a chance that this path invokes
undefined behavior, leading to further surprises.
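
For reference, the report came from a plain scan-build run over the
build, along these lines (the make target is only an example and
depends on the platform):

    # illustrative invocation; HTML reports land under /tmp/haproxy-scan
    scan-build -o /tmp/haproxy-scan make TARGET=linux2628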

Does this make sense?

Cheers,

-- 
Ricardo Nabinger Sanchez http://www.taghos.com.br/




possible use-after-free on v1.9-dev0-3-ge78915a4

2017-11-27 Thread Ricardo Nabinger Sanchez
Hello,

While compile-testing 1.9-dev with the Clang/LLVM analyzer, the
analyzer found the following (possible) scenario:

1- in function cfg_cache_postparser(), when entering the nested loop at
line 914, cache_ptr is free()d and redefined to cache (line 918):

 914                 list_for_each_entry(cache, &caches, list) {
 915                         if (!strcmp(cache->id, cache_ptr)) {
 916                                 /* there can be only one filter per cache, so we free it there */
 917                                 free(cache_ptr);
 918                                 cache_ptr = cache;
 919                                 break;
 920                         }
 921                 }


2- the loop is interrupted by the break at line 919;

3- if the test at line 923 passes (comparing cache_ptr against the
pointer fconf->conf), the ha_alert() call at line 924 will attempt to
print/dereference the (recently freed) content pointed to by
fconf->conf:

 923                 if (cache_ptr == fconf->conf) {
 924                         ha_alert("Proxy '%s': unable to find the cache '%s' referenced by the filter 'cache'.\n",
 925                                  curproxy->id, (char *)fconf->conf);
 926                         err++;
 927                 }

I'm not sure how realistic this is (especially as an exploit), but it
feels worth reporting.

Cheers,

-- 
Ricardo Nabinger Sanchez http://www.taghos.com.br/