On Thu, Apr 15, 2021 at 07:53:15PM +, Robin H. Johnson wrote:
> But your thought of CPU pinning was good.
> I went to confirm it in the host, and I'm not certain if the cpu-map is
> working right.
Ignore me, long day and I didn't think to check each thread PID:
# ps -e -T | grep haproxy -w
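The per-thread check described above can also be done by reading procfs directly. The following is only a minimal sketch, assuming a Linux host where `/proc/<pid>/task/<tid>/status` exposes a `Cpus_allowed_list:` field; the helper names `parse_cpu_list` and `thread_cpu_list` are hypothetical and not part of HAProxy or of this thread.

```c
#include <stdio.h>
#include <string.h>

/* Scan an already-open status file for "Cpus_allowed_list:" and copy its
 * value (for example "0-3") into buf. Returns 0 on success, -1 otherwise. */
static int parse_cpu_list(FILE *f, char *buf, size_t len)
{
    static const char key[] = "Cpus_allowed_list:";
    char line[512];

    if (len == 0)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, key, sizeof(key) - 1) == 0) {
            const char *val = line + sizeof(key) - 1;

            /* skip the whitespace between the key and the value */
            while (*val == ' ' || *val == '\t')
                val++;
            snprintf(buf, len, "%s", val);
            buf[strcspn(buf, "\n")] = '\0'; /* drop the trailing newline */
            return 0;
        }
    }
    return -1;
}

/* Hypothetical wrapper: pid and tid are decimal strings. */
static int thread_cpu_list(const char *pid, const char *tid,
                           char *buf, size_t len)
{
    char path[128];
    FILE *f;
    int ret;

    snprintf(path, sizeof(path), "/proc/%s/task/%s/status", pid, tid);
    f = fopen(path, "r");
    if (!f)
        return -1;
    ret = parse_cpu_list(f, buf, len);
    fclose(f);
    return ret;
}
```

Calling `thread_cpu_list()` once per thread ID listed by `ps -e -T` would show whether each thread, and not just the main one, ended up with the expected CPU set.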
On Thu, Apr 15, 2021 at 09:23:07AM +0200, Willy Tarreau wrote:
> On Thu, Apr 15, 2021 at 07:13:53AM +, Robin H. Johnson wrote:
> > Thanks; I will need to catch it faster or automate this, because the
> > watchdog does a MUCH better job restarting it than before, less than 30
> > seconds of
On Thu, Apr 15, 2021 at 07:13:53AM +, Robin H. Johnson wrote:
> Thanks; I will need to catch it faster or automate this, because the
> watchdog does a MUCH better job restarting it than before, less than 30
> seconds of 100% CPU before the watchdog reliably kills it.
I see. Then collecting
On Thu, Apr 15, 2021 at 08:59:35AM +0200, Willy Tarreau wrote:
> On Wed, Apr 14, 2021 at 01:53:06PM +0200, Christopher Faulet wrote:
> > > nbthread=64, nbproc=1 on both 1.8/2.x
> >
> > It is thus surprising, if it is really a contention issue, that you never
> > observed slow down on the 1.8.
On Wed, Apr 14, 2021 at 01:53:06PM +0200, Christopher Faulet wrote:
> > nbthread=64, nbproc=1 on both 1.8/2.x
>
> It is thus surprising, if it is really a contention issue, that you never
> observed slow down on the 1.8. There is no watchdog, but the thread
> implementation is a bit awkward on
On 10/04/2021 at 00:34, Robin H. Johnson wrote:
On Fri, Apr 09, 2021 at 10:14:26PM +0200, Christopher Faulet wrote:
It seems you have a blocking call in one of your Lua scripts. The thread dump
shows many threads blocked in hlua_ctx_init. Many others are executing Lua.
Unfortunately, for a
On Fri, Apr 09, 2021 at 10:14:26PM +0200, Christopher Faulet wrote:
> It seems you have a blocking call in one of your Lua scripts. The thread dump
> shows many threads blocked in hlua_ctx_init. Many others are executing Lua.
> Unfortunately, for an unknown reason, there is no stack traceback.
On 09/04/2021 at 19:26, Robin H. Johnson wrote:
Hi,
Maciej had said they were going to create a new thread, but I didn't see
one yet.
I want to start by noting that the problem was much worse on 2.2.8 & 2.2.9, and
that 2.2.13 & 2.3.9 don't get entirely hung at 100% anymore: a big
thanks for that
Hi Robin,
On Fri, 9 Apr 2021 at 19:26, Robin H. Johnson wrote:
> Maciej had said they were going to create a new thread, but I didn't see
> one yet.
>
> I want to start by noting that the problem was much worse on 2.2.8 & 2.2.9, and
> that 2.2.13 & 2.3.9 don't get entirely hung at 100% anymore: a
Hi,
Maciej had said they were going to create a new thread, but I didn't see
one yet.
I want to start by noting that the problem was much worse on 2.2.8 & 2.2.9, and
that 2.2.13 & 2.3.9 don't get entirely hung at 100% anymore: a big
thanks for that initial work in fixing the issue.
As I mentioned in my
Hi Christopher,
Yes I know, my issues are always pretty weird. ;) Of course it's not
reproducible. :(
I'll try to collect more data and return to you. I will start a new thread
to not mix those two cases.
Kind regards,
On Fri, 2 Apr 2021 at 10:13, Christopher Faulet wrote:
> On 31/03/2021 at
On 31/03/2021 at 13:28, Maciej Zdeb wrote:
Hi,
Well, the situation is a bit better than before because only one thread is looping
forever and the rest are working properly. I've tried to verify where exactly the
thread looped but doing "n" in gdb fixed the problem :( After quitting gdb
I forgot to mention that the backtrace is from 2.2.11 built from
http://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=601704962bc9d82b3b1cc97d90d2763db0ae4479
On Wed, 31 Mar 2021 at 13:28, Maciej Zdeb wrote:
> Hi,
>
> Well, the situation is a bit better than before because only one thread is
>
Hi,
Well, the situation is a bit better than before because only one thread is
looping forever and the rest are working properly. I've tried to verify
where exactly the thread looped but doing "n" in gdb fixed the problem :(
After quitting the gdb session all threads were idle. Before I started gdb it
On 25/03/2021 at 13:38, Maciej Zdeb wrote:
Hi,
I deployed a patched (with volatile hlua_not_dumpable) HAProxy and so far so
good, no looping. Christopher, I saw new patches with hlua_traceback used
instead; they look much cleaner to me. Should I verify them instead? :)
Christopher & Willy, I've
Hi,
I deployed a patched (with volatile hlua_not_dumpable) HAProxy and so far
so good, no looping. Christopher, I saw new patches with hlua_traceback used
instead; they look much cleaner to me. Should I verify them instead? :)
Christopher & Willy, I've forgotten to thank you for your help!
Kind regards,
On Wed, 24 Mar 2021 at 10:37, Christopher Faulet wrote:
> However, reading the other trace Maciej sent (bussy_thread_peers.txt), it
> seems possible to stop a memory allocation from other places. Thus, I guess
> we must find a more global way to prevent the lua stack dump.
>
I'm not sure
On 24/03/2021 at 10:16, Willy Tarreau wrote:
On Wed, Mar 24, 2021 at 10:11:19AM +0100, Maciej Zdeb wrote:
Wow, that's it! :)
0x55d94949e965 <+53>: addl $0x1,%fs:0xfffdd688
0x55d94949e96e <+62>: callq 0x55d9494782c0
0x55d94949e973 <+67>: subl
On Wed, Mar 24, 2021 at 10:11:19AM +0100, Maciej Zdeb wrote:
> Wow, that's it! :)
>
> 0x55d94949e965 <+53>: addl $0x1,%fs:0xfffdd688
> 0x55d94949e96e <+62>: callq 0x55d9494782c0
> 0x55d94949e973 <+67>: subl $0x1,%fs:0xfffdd688
> [...]
>
Wow, that's it! :)
0x55d94949e965 <+53>: addl $0x1,%fs:0xfffdd688
0x55d94949e96e <+62>: callq 0x55d9494782c0
0x55d94949e973 <+67>: subl $0x1,%fs:0xfffdd688
[...]
0x55d94949e99f <+111>: ja 0x55d94949e9b0
0x55d94949e9a1 <+113>: mov
On Wed, Mar 24, 2021 at 09:52:22AM +0100, Willy Tarreau wrote:
> So yes, it looks like gcc decides that a function called "malloc" will
> never use your program's global variables but that "blablalloc" may. I
> have no explanation to this except "optimization craziness" resulting
> in breaking
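The pattern under discussion, a per-thread counter bumped around an allocator call, can be sketched as follows. This is an illustration only, not the patch from this thread: the names `not_dumpable`, `compiler_barrier` and `guarded_alloc` are hypothetical, and the code assumes GCC or Clang (`__thread` and the empty `asm` with a "memory" clobber are compiler extensions). It combines the two remedies mentioned in the thread, a volatile counter and explicit compiler barriers.

```c
#include <stdlib.h>

/* Hypothetical per-thread counter: non-zero means "inside the allocator,
 * do not try to dump the Lua stack of this thread right now". */
static __thread volatile unsigned int not_dumpable;

/* GCC/Clang compiler barrier: prevents the compiler from moving memory
 * accesses across this point. It emits no CPU instruction. */
#define compiler_barrier() __asm__ volatile("" ::: "memory")

/* Allocate memory while keeping the counter raised for the whole call. */
static void *guarded_alloc(size_t size)
{
    void *ptr;

    not_dumpable++;      /* must be visible before malloc() is entered */
    compiler_barrier();
    ptr = malloc(size);
    compiler_barrier();
    not_dumpable--;      /* must not be hoisted above malloc() */
    return ptr;
}
```

With this shape, the generated code is expected to keep an increment, the call, and a decrement in that order, which is the layout visible in the disassembly quoted elsewhere in the thread. A caller would simply use `guarded_alloc(n)` in place of `malloc(n)` and release the result with `free()`.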
On Wed, Mar 24, 2021 at 09:41:03AM +0100, Willy Tarreau wrote:
> This is particularly strange. Could you please disassemble hlua_alloc ?
> (dis hlua_alloc) ?
>
> You should find something like this:
>
> 0x004476c3 <+147>: add DWORD PTR fs:0xfffdd678,0x1
>
On Wed, Mar 24, 2021 at 08:55:33AM +0100, Maciej Zdeb wrote:
> After reading, I wasn't sure anymore whether I had even tested a properly
> patched package. :)
Hehe, I know that this happens quite a lot when starting to play with
different binaries.
> Fortunately I have a core file so I verified if
Hi,
On Tue, 23 Mar 2021 at 18:36, Willy Tarreau wrote:
> > It is most probably because of compiler optimizations. Some compiler
> > barriers are necessary to avoid instruction reordering. That is the
> > purpose of the attached patches. Sorry to ask you again, but could you
> > make some tests
On Tue, Mar 23, 2021 at 04:12:41PM +0100, Christopher Faulet wrote:
> On 23/03/2021 at 11:14, Maciej Zdeb wrote:
> > Hi Christopher,
> >
> > Bad news, the patches didn't help. Attaching stacktraces; it now looks like
> > the spoe that executes lua scripts (free_thread_spue_lua.txt) tried to
> > malloc
On 23/03/2021 at 11:14, Maciej Zdeb wrote:
Hi Christopher,
Bad news, the patches didn't help. Attaching stacktraces; it now looks like the spoe
that executes lua scripts (free_thread_spue_lua.txt) tried to malloc twice. :(
It is most probably because of compiler optimizations. Some compiler
Hi Christopher,
Bad news, the patches didn't help. Attaching stacktraces; it now looks like
the spoe that executes lua scripts (free_thread_spue_lua.txt) tried to malloc
twice. :(
Kind regards,
On Mon, 22 Mar 2021 at 08:39, Maciej Zdeb wrote:
> Hi Christopher,
>
> Thanks! I'm building a patched
Hi Christopher,
Thanks! I'm building a patched version and will return with feedback!
Kind regards,
On Fri, 19 Mar 2021 at 16:40, Christopher Faulet wrote:
> On 16/03/2021 at 13:46, Maciej Zdeb wrote:
> > Sorry for spam. In the last message I said that the old process (after
> > reload)
On 16/03/2021 at 13:46, Maciej Zdeb wrote:
Sorry for spam. In the last message I said that the old process (after reload)
is consuming cpu for lua processing and that's not true, it is processing other
things also.
I'll take a break. ;) Then I'll verify if the issue exists on 2.3 and maybe
Hi Christopher,
That's good news! If you need me to test a patch then let me know.
On my side I'm preparing to update HAProxy to 2.3 and solving some simple
issues like missing newlines at the end of the configuration. ;)
Kind regards,
On Wed, 17 Mar 2021 at 10:49, Christopher Faulet wrote:
>
On 16/03/2021 at 13:46, Maciej Zdeb wrote:
Sorry for spam. In the last message I said that the old process (after reload)
is consuming cpu for lua processing and that's not true, it is processing other
things also.
I'll take a break. ;) Then I'll verify if the issue exists on 2.3 and maybe
On Tue, Mar 16, 2021 at 01:46:48PM +0100, Maciej Zdeb wrote:
> In the last message I said that the old process (after
> reload) is consuming cpu for lua processing and that's not true, it is
> processing other things also.
>
> I'll take a break. ;) Then I'll verify if the issue exists on 2.3 and
Sorry for spam. In the last message I said that the old process (after
reload) is consuming cpu for lua processing and that's not true, it is
processing other things also.
I'll take a break. ;) Then I'll verify if the issue exists on 2.3 and maybe
2.4 branch. For each version I need a week or two
Below is the output from perf top - it happens during reload (all threads
on an old process spike and use 100% cpu, processing lua) and after 15-30
seconds old process exits. It is probably a different bug, because it
happens on an old process and I have no idea how it could affect the new
Sure, patch from Christopher attached. :)
On Tue, 16 Mar 2021 at 10:58, Willy Tarreau wrote:
> Hi Maciej,
>
> On Tue, Mar 16, 2021 at 10:55:11AM +0100, Maciej Zdeb wrote:
> > Hi,
> >
> > I'm returning with bad news, the patch did not help and the issue
> > occurred today (on patched 2.2.10).
Hi Maciej,
On Tue, Mar 16, 2021 at 10:55:11AM +0100, Maciej Zdeb wrote:
> Hi,
>
> I'm returning with bad news, the patch did not help and the issue occurred
> today (on patched 2.2.10). It is definitely related to reloads; however, it
> is a very rare issue: it worked flawlessly the whole week.
OK.
Hi,
I'm returning with bad news, the patch did not help and the issue occurred
today (on patched 2.2.10). It is definitely related to reloads; however, it
is a very rare issue: it worked flawlessly the whole week.
On Tue, 9 Mar 2021 at 09:17, Willy Tarreau wrote:
> On Tue, Mar 09, 2021 at
On Tue, Mar 09, 2021 at 09:04:43AM +0100, Maciej Zdeb wrote:
> Hi,
>
> After applying the patch, the issue did not occur, however I'm still not
> sure it is fixed. Unfortunately I don't have a reliable way to trigger it.
OK. If it's related, it's very possible that some of the issues we've
Hi,
After applying the patch, the issue did not occur, however I'm still not
sure it is fixed. Unfortunately I don't have a reliable way to trigger it.
On Fri, 5 Mar 2021 at 22:07, Willy Tarreau wrote:
> Note, before 2.4, only a single thread can execute Lua scripts at a time,
> with the others
On Fri, Mar 05, 2021 at 12:00:52PM +0100, Christopher Faulet wrote:
> On 05/03/2021 at 11:35, Maciej Zdeb wrote:
> > Hi Christopher,
> >
> > Thanks, I'll check but it'll take a couple days because the issue is
> > quite rare. I'll return with feedback!
> >
> > Maybe the patch is not backported
On 05/03/2021 at 11:35, Maciej Zdeb wrote:
Hi Christopher,
Thanks, I'll check but it'll take a couple days because the issue is quite rare.
I'll return with feedback!
Maybe the patch is not backported to 2.2 because the commit message mentions
only the 2.3 branch?
That's it. And it was
Hi Christopher,
Thanks, I'll check but it'll take a couple days because the issue is quite
rare. I'll return with feedback!
Maybe the patch is not backported to 2.2 because the commit message
mentions only the 2.3 branch?
Kind regards,
On Thu, 4 Mar 2021 at 22:34, Christopher Faulet wrote:
>
On 04/03/2021 at 14:01, Maciej Zdeb wrote:
Hi,
Sometimes after HAProxy reload it starts to loop infinitely, for example 9 of 10
threads using 100% CPU (gdb sessions attached). I've also dumped the core file
from gdb.
Hi Maciej,
The 2.2.10 is out. But I'm afraid that a fix is missing.