On Sun, Jun 05, 2022 at 03:57:39PM +0000, Visa Hankala wrote:
> On Sun, Jun 05, 2022 at 12:27:32PM +0200, Martin Pieuchot wrote:
> > On 05/06/22(Sun) 05:20, Visa Hankala wrote:
> > > Encountered the following panic:
> > > 
> > > panic: kernel diagnostic assertion "(p->p_flag & P_TIMEOUT) == 0" failed: 
> > > file "/usr/src/sys/kern/kern_synch.c", line 373
> > > Stopped at      db_enter+0x10:  popq    %rbp
> > >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > >  423109  57118     55         0x3          0    2  link
> > >  330695  30276     55         0x3          0    3  link
> > > * 46366  85501     55      0x1003  0x4080400    1  link
> > >  188803  85501     55      0x1003  0x4082000    0K link
> > > db_enter() at db_enter+0x10
> > > panic(ffffffff81f25d2b) at panic+0xbf
> > > __assert(ffffffff81f9a186,ffffffff81f372c8,175,ffffffff81f87c6c) at 
> > > __assert+0x25
> > > sleep_setup(ffff800022d64bf8,ffff800022d64c98,20,ffffffff81f66ac6,0) at 
> > > sleep_setup+0x1d8
> > > cond_wait(ffff800022d64c98,ffffffff81f66ac6) at cond_wait+0x46
> > > timeout_barrier(ffff8000228a28b0) at timeout_barrier+0x109
> > > timeout_del_barrier(ffff8000228a28b0) at timeout_del_barrier+0xa2
> > > sleep_finish(ffff800022d64d90,1) at sleep_finish+0x16d
> > > tsleep(ffffffff823a5130,120,ffffffff81f0b730,2) at tsleep+0xb2
> > > sys_nanosleep(ffff8000228a27f0,ffff800022d64ea0,ffff800022d64ef0) at 
> > > sys_nanosleep+0x12d
> > > syscall(ffff800022d64f60) at syscall+0x374
> > > 
> > > The panic is a regression of sys/kern/kern_timeout.c r1.84. Previously,
> > > soft-interrupt-driven timeouts could be deleted synchronously without
> > > blocking. Now, timeout_del_barrier() can sleep regardless of the type
> > > of the timeout.
> > > 
> > > It looks that with small adjustments timeout_del_barrier() can sleep
> > > in sleep_finish(). The management of run queues is not affected because
> > > the timeout clearing happens after it. As timeout_del_barrier() does not
> > > rely on a timeout or signal catching, there should be no risk of
> > > unbounded recursion or unwanted signal side effects within the sleep
> > > machinery. In a way, a sleep with a timeout is higher-level than
> > > one without.
> > 
> > I trust you on the analysis.  However this looks very fragile to me.
> > 
> > The use of timeout_del_barrier() which can sleep using the global sleep
> > queue is worrying me.  
> 
> I think the queue handling ends in sleep_finish() when SCHED_LOCK()
> is released. The timeout clearing is done outside of it.

That's ok.

> The extra sleeping point inside sleep_finish() is subtle. It should not
> matter in typical use. But is it permissible with the API? Also, if
> timeout_del_barrier() sleeps, the thread's priority can change.

What other options do we have at this point? Spin? Allocate the timeout
dynamically so sleep_finish doesn't have to wait for it and let the
handler clean up? How would you stop the timeout handler waking up the
thread if it's gone back to sleep again for some other reason?

Sleeping here is the least worst option.

As for timeout_del_barrier, if prio is a worry we can provide an
advanced version of it that lets you pass the prio in. I'd also
like to change timeout_barrier so it queues the barrier task at the
head of the pending lists rather than at the tail.

> Note that sleep_finish() already can take an additional nap when
> signal catching is enabled.
> 
> > > Note that endtsleep() can run and set P_TIMEOUT during
> > > timeout_del_barrier() when the thread is blocked in cond_wait().
> > > To avoid unnecessary atomic read-modify-write operations, the clearing
> > > of P_TIMEOUT could be conditional, but maybe that is an unnecessary
> > > optimization at this point.
> > 
> > I agree this optimization seems unnecessary at the moment.
> > 
> > > While it should be possible to make the code use timeout_del() instead
> > > of timeout_del_barrier(), the outcome might not be outright better. For
> > > example, sleep_setup() and endtsleep() would have to coordinate so that
> > > a late-running timeout from previous sleep cycle would not disturb the
> > > new cycle.
> > 
> > So that's the price for not having to sleep in sleep_finish(), right?
> 
> That is correct. Some synchronization is needed in any case.
> 
> > > To test the barrier path reliably, I made the code call
> > > timeout_del_barrier() twice in a row. The second call is guaranteed
> > > to sleep. Of course, this is not part of the patch.
> > 
> > ok mpi@
> > 
> > > Index: kern/kern_synch.c
> > > ===================================================================
> > > RCS file: src/sys/kern/kern_synch.c,v
> > > retrieving revision 1.187
> > > diff -u -p -r1.187 kern_synch.c
> > > --- kern/kern_synch.c     13 May 2022 15:32:00 -0000      1.187
> > > +++ kern/kern_synch.c     5 Jun 2022 05:04:45 -0000
> > > @@ -370,8 +370,8 @@ sleep_setup(struct sleep_state *sls, con
> > >   p->p_slppri = prio & PRIMASK;
> > >   TAILQ_INSERT_TAIL(&slpque[LOOKUP(ident)], p, p_runq);
> > >  
> > > - KASSERT((p->p_flag & P_TIMEOUT) == 0);
> > >   if (timo) {
> > > +         KASSERT((p->p_flag & P_TIMEOUT) == 0);
> > >           sls->sls_timeout = 1;
> > >           timeout_add(&p->p_sleep_to, timo);
> > >   }
> > > @@ -432,13 +432,12 @@ sleep_finish(struct sleep_state *sls, in
> > >  
> > >   if (sls->sls_timeout) {
> > >           if (p->p_flag & P_TIMEOUT) {
> > > -                 atomic_clearbits_int(&p->p_flag, P_TIMEOUT);
> > >                   error1 = EWOULDBLOCK;
> > >           } else {
> > > -                 /* This must not sleep. */
> > > +                 /* This can sleep. It must not use timeouts. */
> > >                   timeout_del_barrier(&p->p_sleep_to);
> > > -                 KASSERT((p->p_flag & P_TIMEOUT) == 0);
> > >           }
> > > +         atomic_clearbits_int(&p->p_flag, P_TIMEOUT);
> > >   }
> > >  
> > >   /* Check if thread was woken up because of a unwind or signal */
> > > 
> > 
> 

Reply via email to