Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-09-07 Thread Julien Charbon
Hi Ben, On 8/31/17 12:04 PM, Ben RUBSON wrote: >> On 28 Aug 2017, at 11:27, Julien Charbon wrote: >> >> On 8/28/17 10:25 AM, Ben RUBSON wrote: On 16 Aug 2017, at 11:02, Ben RUBSON wrote: > On 15 Aug 2017, at 23:33, Julien Charbon

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-31 Thread Ben RUBSON
> On 28 Aug 2017, at 11:27, Julien Charbon wrote: > > On 8/28/17 10:25 AM, Ben RUBSON wrote: >>> On 16 Aug 2017, at 11:02, Ben RUBSON wrote: >>> On 15 Aug 2017, at 23:33, Julien Charbon wrote: On 8/11/17 11:32 AM, Ben

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-28 Thread Julien Charbon
Hi Ben, On 8/28/17 10:25 AM, Ben RUBSON wrote: >> On 16 Aug 2017, at 11:02, Ben RUBSON wrote: >> >>> On 15 Aug 2017, at 23:33, Julien Charbon wrote: >>> >>> On 8/11/17 11:32 AM, Ben RUBSON wrote: > On 08 Aug 2017, at 13:33, Julien Charbon

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-28 Thread Ben RUBSON
> On 16 Aug 2017, at 11:02, Ben RUBSON wrote: > >> On 15 Aug 2017, at 23:33, Julien Charbon wrote: >> >> On 8/11/17 11:32 AM, Ben RUBSON wrote: On 08 Aug 2017, at 13:33, Julien Charbon wrote: On 8/8/17 10:31 AM, Hans

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-16 Thread Ben RUBSON
> On 15 Aug 2017, at 23:33, Julien Charbon wrote: > > On 8/11/17 11:32 AM, Ben RUBSON wrote: >>> On 08 Aug 2017, at 13:33, Julien Charbon wrote: >>> >>> On 8/8/17 10:31 AM, Hans Petter Selasky wrote: Suggested fix attached. >>> >>> I agree with

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-15 Thread Julien Charbon
Hi Ben, On 8/11/17 11:32 AM, Ben RUBSON wrote: >> On 08 Aug 2017, at 13:33, Julien Charbon wrote: >> >> On 8/8/17 10:31 AM, Hans Petter Selasky wrote: >>> >>> Suggested fix attached. >> >> I agree with your conclusion. Just for the record, more precisely this >> regression

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-11 Thread Ben RUBSON
> On 08 Aug 2017, at 13:33, Julien Charbon wrote: > > Hi, > > On 8/8/17 10:31 AM, Hans Petter Selasky wrote: >> >> >> Suggested fix attached. > > I agree with your conclusion. Just for the record, more precisely this > regression seems to have been introduced with: > (...)

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-08 Thread Hans Petter Selasky
On 08/08/17 13:56, Slawa Olhovchenkov wrote: On Tue, Aug 08, 2017 at 01:49:08PM +0200, Hans Petter Selasky wrote: On 08/08/17 13:33, Slawa Olhovchenkov wrote: TW_RUNLOCK(V_tw_lock); and if (INP_INFO_TRY_WLOCK(&V_tcbinfo)) { `inp` can be invalidated, freed and this pointer may be invalid? If

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-08 Thread Julien Charbon
Hi, On 8/8/17 10:31 AM, Hans Petter Selasky wrote: > On 08/08/17 10:06, Ben RUBSON wrote: >>> On 08 Aug 2017, at 10:02, Hans Petter Selasky wrote: >>> >>> On 08/08/17 10:00, Ben RUBSON wrote: (kgdb) print *twq_2msl.tqh_first $2 = { tw_inpcb =

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-08 Thread Slawa Olhovchenkov
On Tue, Aug 08, 2017 at 01:49:08PM +0200, Hans Petter Selasky wrote: > On 08/08/17 13:33, Slawa Olhovchenkov wrote: > > TW_RUNLOCK(V_tw_lock); > > and > > if (INP_INFO_TRY_WLOCK(&V_tcbinfo)) { > > > > `inp` can be invalidated, freed and this pointer may be invalid? > > If you look one line up

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-08 Thread Hans Petter Selasky
On 08/08/17 13:33, Slawa Olhovchenkov wrote: TW_RUNLOCK(V_tw_lock); and if (INP_INFO_TRY_WLOCK(&V_tcbinfo)) { `inp` can be invalidated, freed and this pointer may be invalid? If you look one line up there is a pcbref ?? --HPS
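
The exchange above turns on whether `inp` can be freed between dropping the list lock and taking the next lock, and the reply points at the reference taken one line earlier. A minimal user-space sketch of that "reference before unlock" pattern follows. It is only an analogy: `struct pcb`, `scan_first`, `list_remove_head` and `pcb_rele` are hypothetical names invented here, pthread mutexes stand in for the kernel locks, and a `freed` flag replaces a real free so the state can be inspected. It is not the tcp_tw_2msl_scan() code from the thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a control block; not the kernel's struct inpcb. */
struct pcb {
	int  refcount;	/* protected by list_lock in this sketch */
	bool freed;	/* set instead of calling free() so callers can check it */
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct pcb *list_head;	/* one-element "list"; it owns one reference */

/* Drop one reference; returns true when this was the last one. */
bool
pcb_rele(struct pcb *p)
{
	bool last;

	pthread_mutex_lock(&list_lock);
	assert(p->refcount > 0);
	last = (--p->refcount == 0);
	if (last)
		p->freed = true;
	pthread_mutex_unlock(&list_lock);
	return (last);
}

/*
 * Pick the head and take a reference while the list lock is still held.
 * After the unlock the pointer stays valid even if another thread unlinks
 * the entry, because our reference keeps the count above zero.
 */
struct pcb *
scan_first(void)
{
	struct pcb *p;

	pthread_mutex_lock(&list_lock);
	p = list_head;
	if (p != NULL)
		p->refcount++;
	pthread_mutex_unlock(&list_lock);
	return (p);
}

/* Simulates another actor unlinking the entry and dropping the list's reference. */
void
list_remove_head(void)
{
	struct pcb *p;

	pthread_mutex_lock(&list_lock);
	p = list_head;
	list_head = NULL;
	pthread_mutex_unlock(&list_lock);
	if (p != NULL)
		(void)pcb_rele(p);
}
```

In this sketch a scanner that calls `scan_first()` can safely touch the returned object after the lock is dropped, and the object is only marked freed once the scanner itself calls `pcb_rele()`. Whether the real kernel path has additional hazards is exactly what the thread goes on to debate.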

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-08 Thread Slawa Olhovchenkov
On Tue, Aug 08, 2017 at 10:31:33AM +0200, Hans Petter Selasky wrote: > Here is the conclusion: > > The following code is going in an infinite loop: > > > > for (;;) { > > TW_RLOCK(V_tw_lock); > > tw = TAILQ_FIRST(&V_twq_2msl); > > if (tw ==

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-08 Thread Ben RUBSON
> On 08 Aug 2017, at 10:31, Hans Petter Selasky wrote: > > On 08/08/17 10:06, Ben RUBSON wrote: >>> On 08 Aug 2017, at 10:02, Hans Petter Selasky wrote: >>> >>> On 08/08/17 10:00, Ben RUBSON wrote: (kgdb) print *twq_2msl.tqh_first $2 = {

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-08 Thread Hans Petter Selasky
On 08/08/17 10:06, Ben RUBSON wrote: On 08 Aug 2017, at 10:02, Hans Petter Selasky wrote: On 08/08/17 10:00, Ben RUBSON wrote: (kgdb) print *twq_2msl.tqh_first $2 = { tw_inpcb = 0xf8031c570740, print *twq_2msl.tqh_first->tw_inpcb (kgdb) print

Re: mlx4en, timer irq @100%...

2017-08-08 Thread Ben RUBSON
> On 08 Aug 2017, at 10:02, Hans Petter Selasky wrote: > > On 08/08/17 10:00, Ben RUBSON wrote: >> (kgdb) print *twq_2msl.tqh_first >> $2 = { >> tw_inpcb = 0xf8031c570740, > > print *twq_2msl.tqh_first->tw_inpcb (kgdb) print *twq_2msl.tqh_first->tw_inpcb $3 = {

Re: mlx4en, timer irq @100%...

2017-08-08 Thread Hans Petter Selasky
On 08/08/17 10:00, Ben RUBSON wrote: (kgdb) print *twq_2msl.tqh_first $2 = { tw_inpcb = 0xf8031c570740, print *twq_2msl.tqh_first->tw_inpcb --HPS ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net

Re: mlx4en, timer irq @100%...

2017-08-08 Thread Ben RUBSON
> On 08 Aug 2017, at 09:54, Hans Petter Selasky wrote: > > On 08/08/17 09:43, Ben RUBSON wrote: >> OK. >> I'm quite (well, absolutely) new to kgdb, some clue on how I should proceed ? >> Thank you ! >> Ben > > print twq_2msl > print *twq_2msl.tqh_first (kgdb) set print

Re: mlx4en, timer irq @100%...

2017-08-08 Thread Hans Petter Selasky
On 08/08/17 09:43, Ben RUBSON wrote: OK. I'm quite (well, absolutely) new to kgdb, some clue on how I should proceed ? Thank you ! Ben print twq_2msl print *twq_2msl.tqh_first --HPS

Re: mlx4en, timer irq @100%...

2017-08-08 Thread Ben RUBSON
> On 08 Aug 2017, at 09:38, Hans Petter Selasky wrote: > > On 08/08/17 09:37, Ben RUBSON wrote: >>> On 08 Aug 2017, at 09:33, Hans Petter Selasky wrote: >>> >>> On 08/08/17 09:04, Ben RUBSON wrote: "print V_twq_2msl" returns the following : No

Re: mlx4en, timer irq @100%...

2017-08-08 Thread Ben RUBSON
> On 08 Aug 2017, at 09:38, Hans Petter Selasky wrote: > > On 08/08/17 09:37, Ben RUBSON wrote: >>> On 08 Aug 2017, at 09:33, Hans Petter Selasky wrote: >>> >>> On 08/08/17 09:04, Ben RUBSON wrote: "print V_twq_2msl" returns the following : No

Re: mlx4en, timer irq @100%...

2017-08-08 Thread Hans Petter Selasky
On 08/08/17 09:37, Ben RUBSON wrote: On 08 Aug 2017, at 09:33, Hans Petter Selasky wrote: On 08/08/17 09:04, Ben RUBSON wrote: "print V_twq_2msl" returns the following : No symbol "V_twq_2msl" in current context. Are you using VIMAGE ? No, GENERIC FreeBSD 11.0 on a

Re: mlx4en, timer irq @100%...

2017-08-08 Thread Ben RUBSON
> On 08 Aug 2017, at 09:33, Hans Petter Selasky wrote: > > On 08/08/17 09:04, Ben RUBSON wrote: >> "print V_twq_2msl" returns the following : >> No symbol "V_twq_2msl" in current context. > > Are you using VIMAGE ? No, GENERIC FreeBSD 11.0 on a physical server.

Re: mlx4en, timer irq @100%...

2017-08-08 Thread Hans Petter Selasky
On 08/08/17 09:04, Ben RUBSON wrote: Here is vmstat -z : https://benrubson.github.io/vmstatz.log From what I can see there are not TCP allocation failures. This rules out one class of bugs: socket: 864, 2092652, 105, 371, 2318298, 0, 0 unpcb:

Re: mlx4en, timer irq @100%...

2017-08-08 Thread Hans Petter Selasky
On 08/08/17 09:04, Ben RUBSON wrote: "print V_twq_2msl" returns the following : No symbol "V_twq_2msl" in current context. Are you using VIMAGE ? --HPS

Re: mlx4en, timer irq @100%...

2017-08-08 Thread Ben RUBSON
> On 08 Aug 2017, at 08:51, Hans Petter Selasky wrote: > > On 08/08/17 01:52, Ben RUBSON wrote: >>> On 07 Aug 2017, at 19:57, Hans Petter Selasky wrote: >>> >>> On 08/07/17 19:19, Ben RUBSON wrote: > On 07 Aug 2017, at 18:19, Matt Joras

Re: mlx4en, timer irq @100%...

2017-08-08 Thread Hans Petter Selasky
On 08/08/17 01:52, Ben RUBSON wrote: On 07 Aug 2017, at 19:57, Hans Petter Selasky wrote: On 08/07/17 19:19, Ben RUBSON wrote: On 07 Aug 2017, at 18:19, Matt Joras wrote: On 08/07/2017 09:11, Hans Petter Selasky wrote: Hi, Try to enter "kgdb" and

Re: mlx4en, timer irq @100%...

2017-08-07 Thread Ben RUBSON
> On 07 Aug 2017, at 19:57, Hans Petter Selasky wrote: > > On 08/07/17 19:19, Ben RUBSON wrote: >>> On 07 Aug 2017, at 18:19, Matt Joras wrote: >>> >>> On 08/07/2017 09:11, Hans Petter Selasky wrote: Hi, Try to enter "kgdb" and run:

Re: mlx4en, timer irq @100%...

2017-08-07 Thread Ben RUBSON
> On 07 Aug 2017, at 18:19, Matt Joras wrote: > > On 08/07/2017 09:11, Hans Petter Selasky wrote: >> Hi, >> >> Try to enter "kgdb" and run: >> >> thread apply all bt >> >> Look for the callout function in question. >> >> --HPS >> > If you don't have a way to attach kgdb

Re: mlx4en, timer irq @100%...

2017-08-07 Thread Hans Petter Selasky
On 08/04/17 21:09, Ben RUBSON wrote: On 04 Aug 2017, at 19:42, Ben RUBSON wrote: Feel free to ask me whatever you need to investigate on this ! I let this (production :/) server in this state to have a chance to get interesting traces. Server no more in production, I

Re: mlx4en, timer irq @100%...

2017-08-04 Thread Ben RUBSON
> On 04 Aug 2017, at 19:42, Ben RUBSON wrote: > > Feel free to ask me whatever you need to investigate on this ! > I let this (production :/) server in this state to have a chance to get > interesting traces. Server no more in production, I moved service to the standby

Re: mlx4en, timer irq @100%...

2017-08-04 Thread Ben RUBSON
> On 04 Aug 2017, at 19:45, Hans Petter Selasky wrote: > > On 08/04/17 19:42, Ben RUBSON wrote: >>> On 04 Aug 2017, at 19:31, Hans Petter Selasky wrote: >>> >>> On 08/04/17 19:13, Ben RUBSON wrote: 12 100029 intr swi4: clock (0)

Re: mlx4en, timer irq @100%...

2017-08-04 Thread Hans Petter Selasky
On 08/04/17 19:42, Ben RUBSON wrote: On 04 Aug 2017, at 19:31, Hans Petter Selasky wrote: On 08/04/17 19:13, Ben RUBSON wrote: 12 100029 intr swi4: clock (0) tcp_tw_2msl_scan pfslowtimo softclock_call_cc softclock intr_event_execute_handlers ithread_loop

Re: mlx4en, timer irq @100%...

2017-08-04 Thread Ben RUBSON
> On 04 Aug 2017, at 19:31, Hans Petter Selasky wrote: > > On 08/04/17 19:13, Ben RUBSON wrote: >> 12 100029 intr swi4: clock (0) tcp_tw_2msl_scan pfslowtimo >> softclock_call_cc softclock intr_event_execute_handlers ithread_loop >> fork_exit fork_trampoline

Re: mlx4en, timer irq @100%...

2017-08-04 Thread Hans Petter Selasky
On 08/04/17 19:13, Ben RUBSON wrote: 12 100029 intr swi4: clock (0) tcp_tw_2msl_scan pfslowtimo softclock_call_cc softclock intr_event_execute_handlers ithread_loop fork_exit fork_trampoline Hi, Can you "procstat -ak" a few times and grep for swi4. If the entry above does

Re: mlx4en, timer irq @100%...

2017-08-04 Thread Ben RUBSON
> On 04 Aug 2017, at 19:02, Hans Petter Selasky wrote: > > On 08/04/17 18:59, Ben RUBSON wrote: >> Hello, >> Not sure this is the right list, but as it seems related to a mlx4en >> device... >> # vmstat -i 1 >> (...) >> interrupt total rate >>

Re: mlx4en, timer irq @100%...

2017-08-04 Thread Hans Petter Selasky
On 08/04/17 18:59, Ben RUBSON wrote: Hello, Not sure this is the right list, but as it seems related to a mlx4en device... # vmstat -i 1 (...) interrupt total rate cpu23:timer 1198 1127 # top -P ALL (...) CPU 23: 0.0% user, 0.0%