Re: ppp panic: locking against myself

2021-11-28 Thread Theo de Raadt
Mark Kettenis  wrote:

> > Date: Sun, 28 Nov 2021 14:32:34 +0100
> > From: Martin Pieuchot 
> > 
> > On 08/09/21(Wed) 07:33, Anton Lindqvist wrote:
> > > On Tue, Sep 07, 2021 at 09:59:22PM -0500, j...@jcs.org wrote:
> > > > >Synopsis:  ppp panic: locking against myself
> > > > >Category:  kernel
> > > > >Environment:
> > > > System  : OpenBSD 6.9
> > > > > Details : OpenBSD 6.9 (GENERIC) #2: Tue Aug 10 08:12:32 MDT 2021
> > > >  
> > > > r...@syspatch-69-i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
> > > > 
> > > > Architecture: OpenBSD.i386
> > > > Machine : i386
> > > > >Description:
> > > > Running pppd over a serial modem. (What year is it?)
> > > > 
> > > > Ran pkg_add vim--no_x11, came back a half hour later and it had
> > > > panicked while installing the last dependency.
> > > > 
> > > > com0: 2 silo overflows, 0 ibuf overflows
> > > > com0: 2 silo overflows, 0 ibuf overflows
> > > > com0: 2 silo overflows, 0 ibuf overflows
> > > > com0: 1 silo overflow, 0 ibuf overflows
> > > > com0: 4 silo overflows, 0 ibuf overflows
> > > > panic: mtx 0xd14b3054: locking against myself
> > > > > Stopped at  db_enter+0x4:   popl    %ebp
> > > > > panic: mtx 0xd14b3054: locking against myself
> > > > > Stopped at  db_enter+0x4:   popl    %ebp
> > > > > TID  PID  UID  PRFLAGS  PFLAGS  CPU  COMMAND
> > > >  
> > > > * 67354   3343  0 0x14000  0x2000  softnet  
> > > >   
> > > > db_enter() at db_enter+0x4
> > > > panic(d0bc8c2b) at panic+0xd3
> > > > mtx_enter(d14b3054) at mtx_enter+0x4e
> > > > task_add(d14b3040,d0df4d7c) at task_add+0x1d
> > > > ppp_restart(d1511800) at ppp_restart+0x3a
> > > > pppstart(d17d2200) at pppstart+0x55
> > > > comintr(d14da000) at comintr+0x4a5
> > > > intr_handler(f17d69d8,d14b3740) at intr_handler+0x18
> > > > Xintr_legacy4_untramp() at Xintr_legacy4_untramp+0xfb
> > > > taskq_next_work(d14b3040,f17d6a40) at taskq_next_work+0x8d
> > > > taskq_thread(d14b3040) at taskq_thread+0x43
> > > > > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > > > > reports.  Insufficient info makes it difficult to find and fix bugs.
> > > 
> > > Looks like it's trying to schedule a task while already handling one.
> > > > The mutexes associated with the net task queues have their IPL set to
> > > > IPL_NET whereas IPL_TTY is probably needed here.
> > 
> > This sounds reasonable, or even IPL_HIGH because the same could happen
> > in any "real" interrupt handler, no?
> 
> No.  ppp(4) is special since it can be used for dialup connections
> through a serial port.  Interrupt handlers for normal network devices
> run at IPL_NET.
> 

So much lore forgotten

This is from the deleted if_sl.c, which states the problem nicely:

 * Note that splimp() is used throughout to block both (tty) input
 * interrupts and network activity; thus, splimp must be >= spltty.

There were many versions of this hack.  soft tty interrupts were
supposed to help with solving it.

The network aspect has to be deferred in a software way.  It looks like
PPPDISC was overlooked again.
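
To illustrate what "deferred in a software way" means here, below is a
minimal user-space sketch. It is not the ppp(4) or taskq code: the
struct, the sim_* functions and the numeric levels are all hypothetical,
and only the ordering IPL_NET < IPL_TTY is taken from the thread. The
serial interrupt either calls the task-queueing routine directly (the
pattern in the backtrace) or merely records a request that is serviced
after the mutex holder has finished.

```c
#include <stdbool.h>

/* Hypothetical levels for illustration; only the ordering NET < TTY matters. */
enum { IPL_NONE = 0, IPL_NET = 1, IPL_TTY = 2 };

struct sim {
	int  cur_ipl;       /* current interrupt priority level */
	bool mtx_held;      /* stand-in for the task queue mutex (IPL_NET) */
	bool soft_pending;  /* deferred "restart output" request */
	int  queued;        /* tasks successfully queued */
	bool self_lock;     /* true where a real kernel would panic */
};

/* Queue a task; needs the non-recursive mutex, like task_add() in the trace. */
static void
sim_task_add(struct sim *s)
{
	int old = s->cur_ipl;

	if (s->mtx_held) {              /* "locking against myself" */
		s->self_lock = true;
		return;
	}
	s->mtx_held = true;
	if (s->cur_ipl < IPL_NET)
		s->cur_ipl = IPL_NET;   /* the mutex raises the level */
	s->queued++;
	s->mtx_held = false;
	s->cur_ipl = old;
}

/* Serial interrupt at IPL_TTY; delivered only if it outranks cur_ipl. */
static void
sim_tty_intr(struct sim *s, bool defer)
{
	if (IPL_TTY <= s->cur_ipl)
		return;                 /* masked (real hardware would hold it pending) */
	if (defer)
		s->soft_pending = true; /* only record the request */
	else
		sim_task_add(s);        /* direct call, as in the backtrace */
}

/* Run deferred work once the level has dropped back to IPL_NONE. */
static void
sim_run_soft(struct sim *s)
{
	if (s->soft_pending && s->cur_ipl == IPL_NONE) {
		s->soft_pending = false;
		sim_task_add(s);
	}
}

/* The task thread holds the mutex when the serial interrupt arrives. */
static struct sim
simulate(bool defer)
{
	struct sim s = { IPL_NONE, false, false, 0, false };

	s.mtx_held = true;              /* the task thread takes the mutex */
	s.cur_ipl = IPL_NET;
	sim_tty_intr(&s, defer);        /* IPL_TTY > IPL_NET, so it is delivered */
	s.mtx_held = false;             /* mutex released */
	s.cur_ipl = IPL_NONE;
	sim_run_soft(&s);               /* deferred work runs here, if any */
	return s;
}
```

In this model simulate(false) ends with self_lock set and nothing
queued, while simulate(true) queues one task without ever touching the
mutex from interrupt context.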



Re: ppp panic: locking against myself

2021-11-28 Thread Mark Kettenis
> Date: Sun, 28 Nov 2021 14:32:34 +0100
> From: Martin Pieuchot 
> 
> On 08/09/21(Wed) 07:33, Anton Lindqvist wrote:
> > On Tue, Sep 07, 2021 at 09:59:22PM -0500, j...@jcs.org wrote:
> > > >Synopsis:ppp panic: locking against myself
> > > >Category:kernel
> > > >Environment:
> > >   System  : OpenBSD 6.9
> > >   Details : OpenBSD 6.9 (GENERIC) #2: Tue Aug 10 08:12:32 MDT 2021
> > >
> > > r...@syspatch-69-i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
> > > 
> > >   Architecture: OpenBSD.i386
> > >   Machine : i386
> > > >Description:
> > >   Running pppd over a serial modem. (What year is it?)
> > > 
> > >   Ran pkg_add vim--no_x11, came back a half hour later and it had
> > >   panicked while installing the last dependency.
> > > 
> > > com0: 2 silo overflows, 0 ibuf overflows
> > > com0: 2 silo overflows, 0 ibuf overflows
> > > com0: 2 silo overflows, 0 ibuf overflows
> > > com0: 1 silo overflow, 0 ibuf overflows
> > > com0: 4 silo overflows, 0 ibuf overflows
> > > panic: mtx 0xd14b3054: locking against myself
> > > > Stopped at  db_enter+0x4:   popl    %ebp
> > > > panic: mtx 0xd14b3054: locking against myself
> > > > Stopped at  db_enter+0x4:   popl    %ebp
> > > > TID  PID  UID  PRFLAGS  PFLAGS  CPU  COMMAND
> > >
> > > * 67354   3343  0 0x14000  0x2000  softnet
> > > 
> > > db_enter() at db_enter+0x4
> > > panic(d0bc8c2b) at panic+0xd3
> > > mtx_enter(d14b3054) at mtx_enter+0x4e
> > > task_add(d14b3040,d0df4d7c) at task_add+0x1d
> > > ppp_restart(d1511800) at ppp_restart+0x3a
> > > pppstart(d17d2200) at pppstart+0x55
> > > comintr(d14da000) at comintr+0x4a5
> > > intr_handler(f17d69d8,d14b3740) at intr_handler+0x18
> > > Xintr_legacy4_untramp() at Xintr_legacy4_untramp+0xfb
> > > taskq_next_work(d14b3040,f17d6a40) at taskq_next_work+0x8d
> > > taskq_thread(d14b3040) at taskq_thread+0x43
> > > > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > > > reports.  Insufficient info makes it difficult to find and fix bugs.
> > 
> > Looks like it's trying to schedule a task while already handling one.
> > > The mutexes associated with the net task queues have their IPL set to
> > > IPL_NET whereas IPL_TTY is probably needed here.
> 
> This sounds reasonable, or even IPL_HIGH because the same could happen
> in any "real" interrupt handler, no?

No.  ppp(4) is special since it can be used for dialup connections
through a serial port.  Interrupt handlers for normal network devices
run at IPL_NET.
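
The masking rule behind this can be written down as a small predicate.
This is only a sketch: the enum values are assumptions (real values are
machine dependent; only the ordering NET < TTY < HIGH is relied on) and
the function names are made up for illustration.

```c
#include <stdbool.h>

/* Assumed ordering only; real values are machine dependent. */
enum ipl { IPL_NONE, IPL_NET, IPL_TTY, IPL_HIGH };

/* An interrupt is delivered only if its level is above the current one. */
static bool
intr_deliverable(enum ipl cur, enum ipl intr)
{
	return intr > cur;
}

/*
 * A handler running at level `intr` that takes a mutex initialised at
 * `mtx_ipl` can interrupt the mutex holder on the same CPU, and so lock
 * against itself, exactly when the mutex does not mask that interrupt.
 */
static bool
can_lock_against_self(enum ipl mtx_ipl, enum ipl intr)
{
	return intr_deliverable(mtx_ipl, intr);
}
```

With these assumptions, can_lock_against_self(IPL_NET, IPL_NET) is
false (a normal network driver), can_lock_against_self(IPL_NET,
IPL_TTY) is true (the com(4) case in the trace), and raising the mutex
to IPL_TTY makes it false again.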



Re: panic: kernel diagnostic assertion "!ISSET(rt->rt_flags, RTF_UP)" failed: file "/usr/src/sys/net/route.c", line 506

2021-11-28 Thread Martin Pieuchot
On 26/11/21(Fri) 17:08, Alexander Bluhm wrote:
> On Fri, Nov 26, 2021 at 12:22:39PM +0100, Claudio Jeker wrote:
> > Guess someone introduced a double rtfree() somewhere.
> > > Only explanation for this panic.
> 
> Here is a report with OpenBSD 6.9.  Bug has been there for a long
> time.
> 
> https://marc.info/?l=openbsd-bugs&m=162435709704591&w=2

I wonder if there isn't a race with rtm_output() or a timeout.  It would
help a lot if one could monitor the routing messages to know which RTM_*
command is issued to the kernel prior to the panic.

It would also help if you could figure out which route (dst, src, flags)
is triggering the panic.



Re: ppp panic: locking against myself

2021-11-28 Thread Martin Pieuchot
On 08/09/21(Wed) 07:33, Anton Lindqvist wrote:
> On Tue, Sep 07, 2021 at 09:59:22PM -0500, j...@jcs.org wrote:
> > >Synopsis:  ppp panic: locking against myself
> > >Category:  kernel
> > >Environment:
> > System  : OpenBSD 6.9
> > Details : OpenBSD 6.9 (GENERIC) #2: Tue Aug 10 08:12:32 MDT 2021
> >  
> > r...@syspatch-69-i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
> > 
> > Architecture: OpenBSD.i386
> > Machine : i386
> > >Description:
> > Running pppd over a serial modem. (What year is it?)
> > 
> > Ran pkg_add vim--no_x11, came back a half hour later and it had
> > panicked while installing the last dependency.
> > 
> > com0: 2 silo overflows, 0 ibuf overflows
> > com0: 2 silo overflows, 0 ibuf overflows
> > com0: 2 silo overflows, 0 ibuf overflows
> > com0: 1 silo overflow, 0 ibuf overflows
> > com0: 4 silo overflows, 0 ibuf overflows
> > panic: mtx 0xd14b3054: locking against myself
> > > Stopped at  db_enter+0x4:   popl    %ebp
> > > panic: mtx 0xd14b3054: locking against myself
> > > Stopped at  db_enter+0x4:   popl    %ebp
> > > TID  PID  UID  PRFLAGS  PFLAGS  CPU  COMMAND
> >  
> > * 67354   3343  0 0x14000  0x2000  softnet  
> >   
> > db_enter() at db_enter+0x4
> > panic(d0bc8c2b) at panic+0xd3
> > mtx_enter(d14b3054) at mtx_enter+0x4e
> > task_add(d14b3040,d0df4d7c) at task_add+0x1d
> > ppp_restart(d1511800) at ppp_restart+0x3a
> > pppstart(d17d2200) at pppstart+0x55
> > comintr(d14da000) at comintr+0x4a5
> > intr_handler(f17d69d8,d14b3740) at intr_handler+0x18
> > Xintr_legacy4_untramp() at Xintr_legacy4_untramp+0xfb
> > taskq_next_work(d14b3040,f17d6a40) at taskq_next_work+0x8d
> > taskq_thread(d14b3040) at taskq_thread+0x43
> > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > reports.  Insufficient info makes it difficult to find and fix bugs.
> 
> Looks like it's trying to schedule a task while already handling one.
> The mutexes associated with the net task queues have their IPL set to
> IPL_NET whereas IPL_TTY is probably needed here.

This sounds reasonable, or even IPL_HIGH because the same could happen
in any "real" interrupt handler, no?