Re: 100% cpu usage 1.9-dev0-48d92ee 2018/07/30, task.?. but keeps working.. (nbthread 1)
Hi Pieter,

On Mon, Aug 20, 2018 at 09:33:49PM +0200, PiBa-NL wrote:
> Hi Olivier,
>
> On 17-8-2018 at 14:51, Willy Tarreau wrote:
> > On Fri, Aug 17, 2018 at 01:41:54PM +0200, Olivier Houchard wrote:
> > > That is true, this one is not a bug, but a pessimization, by using the
> > > global update_list which is more costly than the local one.
> > >
> > > Patches attached to do as suggested.
> > Applied, thank you!
> > willy
>
> Just a little update :)
>
> The '1.9-dev0-48d92ee 2018/07/30' + your initial 2 patches has been running
> correctly for 4+ days now, using as little cpu as can be expected for its
> small workload. I think I can call it 'fix confirmed', as you already knew ;)
> previously the issue would likely have returned within this time period.
>
> I'll keep it running for a few more days, and then switch back to nbthread 3.
> Till next time ;)
>
> Thanks again!
>
> Best regards,
> PiBa-NL (Pieter)

Thanks for the feedback :) Let us know how it goes with 3 threads!

Regards,
Olivier
Re: 100% cpu usage 1.9-dev0-48d92ee 2018/07/30, task.?. but keeps working.. (nbthread 1)
Hi Olivier,

On 17-8-2018 at 14:51, Willy Tarreau wrote:
> On Fri, Aug 17, 2018 at 01:41:54PM +0200, Olivier Houchard wrote:
> > That is true, this one is not a bug, but a pessimization, by using the
> > global update_list which is more costly than the local one.
> >
> > Patches attached to do as suggested.
> Applied, thank you!
> willy

Just a little update :)

The '1.9-dev0-48d92ee 2018/07/30' + your initial 2 patches has been running
correctly for 4+ days now, using as little cpu as can be expected for its
small workload. I think I can call it 'fix confirmed', as you already knew ;)
previously the issue would likely have returned within this time period.

I'll keep it running for a few more days, and then switch back to nbthread 3.
Till next time ;)

Thanks again!

Best regards,
PiBa-NL (Pieter)
Re: 100% cpu usage 1.9-dev0-48d92ee 2018/07/30, task.?. but keeps working.. (nbthread 1)
On Fri, Aug 17, 2018 at 01:41:54PM +0200, Olivier Houchard wrote:
> That is true, this one is not a bug, but a pessimization, by using the global
> update_list which is more costly than the local one.
>
> Patches attached to do as suggested.

Applied, thank you!
willy
Re: 100% cpu usage 1.9-dev0-48d92ee 2018/07/30, task.?. but keeps working.. (nbthread 1)
On Thu, Aug 16, 2018 at 07:31:17PM +0200, Willy Tarreau wrote:
> Both patches applied, thanks guys!
>
> Olivier, I have a suggestion for this one :
> On Thu, Aug 16, 2018 at 07:17:07PM +0200, Olivier Houchard wrote:
> > From 90fc92f72c6b47d88769bb73680702d7b8e6 Mon Sep 17 00:00:00 2001
> > From: Olivier Houchard
> > Date: Thu, 16 Aug 2018 19:03:02 +0200
> > Subject: [PATCH 1/2] BUG/MEDIUM: tasks: Don't insert in the global rqueue if
> >  nbthread == 1
> >
> > Make sure we don't insert a task in the global run queue if nbthread == 1,
> > as, as an optimisation, we avoid reading from it if nbthread == 1.
> > ---
> >  src/task.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/task.c b/src/task.c
> > index de097baf7..e357bc169 100644
> > --- a/src/task.c
> > +++ b/src/task.c
> > @@ -395,7 +395,8 @@ void process_runnable_tasks()
> >  		state = HA_ATOMIC_AND(&t->state, ~TASK_RUNNING);
> >  		if (state)
> >  #ifdef USE_THREAD
> > -			__task_wakeup(t, (t->thread_mask == tid_bit) ?
> > +			__task_wakeup(t, (t->thread_mask == tid_bit ||
> > +			    global.nbthread == 1) ?
> >  			    &rqueue_local[tid] : &rqueue);
> >  #else
>
> I'm pretty sure we'll get caught by this sort of stuff in the future
> again. I think it would be safer to proceed like this :
>
>     __task_wakeup(t, ((t->thread_mask & all_threads_mask) == tid_bit) ?
>
> It should always be valid regardless of the number of threads. The same
> could be done in task_wakeup().

Sure, why not.

> I suspect we may have a similar issue with the fd cache applied to listeners
> here :
>
>     static inline void updt_fd_polling(const int fd)
>     {
>     	if (fdtab[fd].thread_mask == tid_bit) {
>     		unsigned int oldupdt;
>
>     		/* note: we don't have a test-and-set yet in hathreads */
>
> In this case the thread_mask might be larger than all_threads_mask and
> we might fail this test. Or we may see that when threads exit and the
> threads mask changes.
>
> Just my two cents, both patches were applied anyway.
That is true, this one is not a bug, but a pessimization, by using the global
update_list which is more costly than the local one.

Patches attached to do as suggested.

Regards,
Olivier

>From c61f3f76f7e546aac14d7db33552dba11ce71e12 Mon Sep 17 00:00:00 2001
From: Olivier Houchard
Date: Fri, 17 Aug 2018 13:36:08 +0200
Subject: [PATCH 1/2] MINOR: tasks: Don't special-case when nbthreads == 1

Instead of checking if nbthreads == 1, just AND thread_mask with
all_threads_mask to know if we're supposed to add the task to the local or
the global runqueue.
---
 src/task.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/task.c b/src/task.c
index e357bc169..ce5b4f907 100644
--- a/src/task.c
+++ b/src/task.c
@@ -395,8 +395,7 @@ void process_runnable_tasks()
 		state = HA_ATOMIC_AND(&t->state, ~TASK_RUNNING);
 		if (state)
 #ifdef USE_THREAD
-			__task_wakeup(t, (t->thread_mask == tid_bit ||
-			    global.nbthread == 1) ?
+			__task_wakeup(t, ((t->thread_mask & all_threads_mask) == tid_bit) ?
 			    &rqueue_local[tid] : &rqueue);
 #else
 			__task_wakeup(t, &rqueue_local[tid]);
-- 
2.14.3

>From ea3b07fd2ed96b46ea107c9d862b5a1c27e728c2 Mon Sep 17 00:00:00 2001
From: Olivier Houchard
Date: Fri, 17 Aug 2018 13:37:59 +0200
Subject: [PATCH 2/2] MINOR: fd cache: AND the thread_mask with
 all_threads_mask.

When we choose to insert a fd in either the global or the local fd update
list, AND the thread_mask with all_threads_mask before checking whether it
equals tid_bit. That way, if we run with nbthreads == 1, we will always use
the local list, which is cheaper than the global one.
---
 include/proto/fd.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/proto/fd.h b/include/proto/fd.h
index a4cee3220..a3ec5e854 100644
--- a/include/proto/fd.h
+++ b/include/proto/fd.h
@@ -109,7 +109,7 @@ void fd_rm_from_fd_list(volatile struct fdlist *list, int fd, int off);
  */
 static inline void updt_fd_polling(const int fd)
 {
-	if (fdtab[fd].thread_mask == tid_bit) {
+	if ((fdtab[fd].thread_mask & all_threads_mask) == tid_bit) {
 		unsigned int oldupdt;
 
 		/* note: we don't have a test-and-set yet in hathreads */
-- 
2.14.3
Re: 100% cpu usage 1.9-dev0-48d92ee 2018/07/30, task.?. but keeps working.. (nbthread 1)
Both patches applied, thanks guys!

Olivier, I have a suggestion for this one :

On Thu, Aug 16, 2018 at 07:17:07PM +0200, Olivier Houchard wrote:
> From 90fc92f72c6b47d88769bb73680702d7b8e6 Mon Sep 17 00:00:00 2001
> From: Olivier Houchard
> Date: Thu, 16 Aug 2018 19:03:02 +0200
> Subject: [PATCH 1/2] BUG/MEDIUM: tasks: Don't insert in the global rqueue if
>  nbthread == 1
>
> Make sure we don't insert a task in the global run queue if nbthread == 1,
> as, as an optimisation, we avoid reading from it if nbthread == 1.
> ---
>  src/task.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/task.c b/src/task.c
> index de097baf7..e357bc169 100644
> --- a/src/task.c
> +++ b/src/task.c
> @@ -395,7 +395,8 @@ void process_runnable_tasks()
>  		state = HA_ATOMIC_AND(&t->state, ~TASK_RUNNING);
>  		if (state)
>  #ifdef USE_THREAD
> -			__task_wakeup(t, (t->thread_mask == tid_bit) ?
> +			__task_wakeup(t, (t->thread_mask == tid_bit ||
> +			    global.nbthread == 1) ?
>  			    &rqueue_local[tid] : &rqueue);
>  #else

I'm pretty sure we'll get caught by this sort of stuff in the future
again. I think it would be safer to proceed like this :

    __task_wakeup(t, ((t->thread_mask & all_threads_mask) == tid_bit) ?

It should always be valid regardless of the number of threads. The same
could be done in task_wakeup().

I suspect we may have a similar issue with the fd cache applied to listeners
here :

    static inline void updt_fd_polling(const int fd)
    {
    	if (fdtab[fd].thread_mask == tid_bit) {
    		unsigned int oldupdt;

    		/* note: we don't have a test-and-set yet in hathreads */

In this case the thread_mask might be larger than all_threads_mask and
we might fail this test. Or we may see that when threads exit and the
threads mask changes.

Just my two cents, both patches were applied anyway.

Willy
Re: 100% cpu usage 1.9-dev0-48d92ee 2018/07/30, task.?. but keeps working.. (nbthread 1)
Hi again,

On Thu, Aug 16, 2018 at 05:50:27PM +0200, Olivier Houchard wrote:
> Hi Pieter,
>
> On Thu, Aug 16, 2018 at 12:24:04AM +0200, PiBa-NL wrote:
> > Hi List,
> >
> > Anyone got an idea how to debug this further?
> > Currently it's running at 100% again; any pointers to debug the process
> > as it's running would be appreciated.
> >
> > Or should I compile again from current master and 'hope' it doesn't return?
> >
> > b.t.w. truss output is as follows:
> > kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> > kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> > kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> > kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> > kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> > kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> > kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> >
> > Regards,
> > PiBa-NL (Pieter)
>
> I'm interested in figuring that one out.
> From the look of it, it seems the scheduler thinks there's a task to be run,
> and so won't let the poller sleep in kevent().
> Can you
> - update to the latest master, even though I don't think any relevant fix
>   was applied since Jul 30.
> - compile it with -O0, so that we can get meaningful information from gdb.
> - when/if that happens again, get a core, and send it to me with the
>   haproxy binary, assuming there's no confidential information in that
>   core, of course.
>
> Thanks !

So after discussing on IRC, I'm pretty sure I figured it out. The two
attached patches should fix it.

Thanks a lot !

Olivier

>From 90fc92f72c6b47d88769bb73680702d7b8e6 Mon Sep 17 00:00:00 2001
From: Olivier Houchard
Date: Thu, 16 Aug 2018 19:03:02 +0200
Subject: [PATCH 1/2] BUG/MEDIUM: tasks: Don't insert in the global rqueue if
 nbthread == 1

Make sure we don't insert a task in the global run queue if nbthread == 1,
as, as an optimisation, we avoid reading from it if nbthread == 1.
---
 src/task.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/task.c b/src/task.c
index de097baf7..e357bc169 100644
--- a/src/task.c
+++ b/src/task.c
@@ -395,7 +395,8 @@ void process_runnable_tasks()
 		state = HA_ATOMIC_AND(&t->state, ~TASK_RUNNING);
 		if (state)
 #ifdef USE_THREAD
-			__task_wakeup(t, (t->thread_mask == tid_bit) ?
+			__task_wakeup(t, (t->thread_mask == tid_bit ||
+			    global.nbthread == 1) ?
 			    &rqueue_local[tid] : &rqueue);
 #else
 			__task_wakeup(t, &rqueue_local[tid]);
-- 
2.14.3

>From 7640aa3de3c9ad00fe82cda4a50351e46fc0bf48 Mon Sep 17 00:00:00 2001
From: Olivier Houchard
Date: Thu, 16 Aug 2018 19:03:50 +0200
Subject: [PATCH 2/2] BUG/MEDIUM: sessions: Don't use t->state.

In session_expire_embryonic(), don't use t->state, use the "state"
argument instead, as t->state has been cleaned before we're being called.
---
 src/session.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/session.c b/src/session.c
index 1d66b739f..d7d8544c7 100644
--- a/src/session.c
+++ b/src/session.c
@@ -388,7 +388,7 @@ static struct task *session_expire_embryonic(struct task *t, void *context, unsi
 {
 	struct session *sess = context;
 
-	if (!(t->state & TASK_WOKEN_TIMER))
+	if (!(state & TASK_WOKEN_TIMER))
 		return t;
 
 	session_kill_embryonic(sess);
-- 
2.14.3
Re: 100% cpu usage 1.9-dev0-48d92ee 2018/07/30, task.?. but keeps working.. (nbthread 1)
Hi Pieter,

On Thu, Aug 16, 2018 at 12:24:04AM +0200, PiBa-NL wrote:
> Hi List,
>
> Anyone got an idea how to debug this further?
> Currently it's running at 100% again; any pointers to debug the process
> as it's running would be appreciated.
>
> Or should I compile again from current master and 'hope' it doesn't return?
>
> b.t.w. truss output is as follows:
> kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
>
> Regards,
> PiBa-NL (Pieter)

I'm interested in figuring that one out.
From the look of it, it seems the scheduler thinks there's a task to be run,
and so won't let the poller sleep in kevent().
Can you
- update to the latest master, even though I don't think any relevant fix
  was applied since Jul 30.
- compile it with -O0, so that we can get meaningful information from gdb.
- when/if that happens again, get a core, and send it to me with the
  haproxy binary, assuming there's no confidential information in that
  core, of course.

Thanks !

Olivier
Re: 100% cpu usage 1.9-dev0-48d92ee 2018/07/30, task.?. but keeps working.. (nbthread 1)
Hi List,

Anyone got an idea how to debug this further?
Currently it's running at 100% again; any pointers to debug the process as
it's running would be appreciated.

Or should I compile again from current master and 'hope' it doesn't return?

b.t.w. truss output is as follows:
kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)

Regards,
PiBa-NL (Pieter)

On 8-8-2018 at 22:49, PiBa-NL wrote:

Hi List,

I've got a weird issue, and I'm not sure where/how to continue digging at
the moment... Using nbthread=1 nbproc=1, a few lua scripts, ssl
offloading / http traffic. Only a few connections (< 100). Sometimes
haproxy starts using 100% cpu, after a few days (which makes it hard to
debug). Currently running with version 'HA-Proxy version 1.9-dev0-48d92ee
2018/07/30'.

I've run some commands against the haproxy socket like 'show activity';
as can be seen there are lots of loops and tasks in just a second of time.

[2.4.3-RELEASE][root@pfsense_3]/root: /usr/local/pkg/haproxy/haproxy_socket.sh show activity
show activity
thread_id: 0
date_now: 1533754729.799405
loops: 828928664
wake_cache: 845396
wake_tasks: 827400248
wake_signal: 0
poll_exp: 828245644
poll_drop: 17451
poll_dead: 0
poll_skip: 0
fd_skip: 0
fd_lock: 0
fd_del: 0
conn_dead: 0
stream: 101147
empty_rq: 1242050
long_rq: 0

[2.4.3-RELEASE][root@pfsense_3]/root: /usr/local/pkg/haproxy/haproxy_socket.sh show activity
show activity
thread_id: 0
date_now: 1533754731.084664
loops: 829000230
wake_cache: 845398
wake_tasks: 827471812
wake_signal: 0
poll_exp: 828317210
poll_drop: 17452
poll_dead: 0
poll_skip: 0
fd_skip: 0
fd_lock: 0
fd_del: 0
conn_dead: 0
stream: 101149
empty_rq: 1242050
long_rq: 0

Other than that I've tried to attach gdb and step through / log some
functions; it passes through.
With a gdb 'command' file like below I created a little log of function
breakpoints hit:

set pagination off
set height 0
set logging on
delete
rbreak haproxy.c:.
rbreak session.c:.
rbreak hlua.c:.
rbreak task.c:.
commands 1-9
cont
end
cont

Which got me a log with the following content. As you can see it 'seems' to
be looping over the same task multiple times, which might not even be a
problem..? The t=0x802545a60 expires and wakes up, expires and wakes up.?.:

Breakpoint 249, __task_queue (task=0x802545a60) at src/task.c:185
185	src/task.c: No such file or directory.

Breakpoint 253, wake_expired_tasks () at src/task.c:209
209	in src/task.c

Breakpoint 250, __task_wakeup (t=0x802545a60, root=0x8ced50 ) at src/task.c:72
72	in src/task.c

Breakpoint 41, sync_poll_loop () at src/haproxy.c:2378
2378	src/haproxy.c: No such file or directory.

Breakpoint 252, process_runnable_tasks () at src/task.c:275
275	src/task.c: No such file or directory.

Breakpoint 51, session_expire_embryonic (t=0x802545a60, context=0x8024483a0, state=513) at src/session.c:389
389	src/session.c: No such file or directory.

Breakpoint 249, __task_queue (task=0x802545a60) at src/task.c:185
185	src/task.c: No such file or directory.

Breakpoint 253, wake_expired_tasks () at src/task.c:209
209	in src/task.c

Breakpoint 250, __task_wakeup (t=0x802545a60, root=0x8ced50 ) at src/task.c:72
72	in src/task.c

Breakpoint 41, sync_poll_loop () at src/haproxy.c:2378
2378	src/haproxy.c: No such file or directory.

Breakpoint 252, process_runnable_tasks () at src/task.c:275
275	src/task.c: No such file or directory.

Breakpoint 51, session_expire_embryonic (t=0x802545a60, context=0x8024483a0, state=513) at src/session.c:389
389	src/session.c: No such file or directory.

Breakpoint 249, __task_queue (task=0x802545a60) at src/task.c:185
185	src/task.c: No such file or directory.
Breakpoint 253, wake_expired_tasks () at src/task.c:209
209	in src/task.c

Breakpoint 250, __task_wakeup (t=0x802545a60, root=0x8ced50 ) at src/task.c:72
72	in src/task.c

Breakpoint 41, sync_poll_loop () at src/haproxy.c:2378
2378	src/haproxy.c: No such file or directory.

Breakpoint 252, process_runnable_tasks () at src/task.c:275
275	src/task.c: No such file or directory.

Breakpoint 51, session_expire_embryonic (t=0x802545a60, context=0x8024483a0, state=513) at src/session.c:389
389	src/session.c: No such file or directory.

Breakpoint 249, __task_queue (task=0x802545a60) at src/task.c:185
185	src/task.c: No such file or directory.

Breakpoint 253, wake_expired_tasks () at src/task.c:209
209	in src/task.c

Breakpoint 250, __task_wakeup (t=0x802545a60, root=0x8ced50 ) at src/task.c:72
72	in src/task.c

Breakpoint 41, sync_poll_loop () at src/haproxy.c:2378
2378	src/haprox