On Wed, Mar 18, 2026 at 02:52:48PM -0700, Boqun Feng wrote:
[...]
> > Ah so it is an ABBA deadlock, not an ABA self-deadlock. I guess this is a
> > different issue from the NMI issue? It is more of an issue of calling
> > the call_srcu API with scheduler locks held.
> >
> > Something like below I think:
> >
> > CPU A (BPF tracepoint)                      CPU B (concurrent call_srcu)
> > ----------------------------                ------------------------------------
> > [1] holds &rq->__lock
> >                                             [2]
> >                                             -> call_srcu
> >                                             -> srcu_gp_start_if_needed
> >                                             -> srcu_funnel_gp_start
> >                                             -> spin_lock_irqsave_ssp_content...
> >                                             -> holds srcu locks
> >
> > [4] calls call_rcu_tasks_trace()            [5] srcu_funnel_gp_start (cont..)
> >                                                 -> queue_delayed_work
> >     -> call_srcu()                              -> __queue_work()
> >     -> srcu_gp_start_if_needed()                -> wake_up_worker()
> >     -> srcu_funnel_gp_start()                   -> try_to_wake_up()
> >     -> spin_lock_irqsave_ssp_contention()   [6] WANTS rq->__lock
> >     -> WANTS srcu locks
>
> I see, we can also have a self-deadlock even without CPU B, when CPU A
> is going to try_to_wake_up() a worker on the same CPU.
>
> An interesting observation is that the deadlock can be avoided if
> queue_delayed_work() uses a non-zero delay: that means a timer will be
> armed instead of acquiring the rq lock.
>
> (But I guess BPF also wants to run with timer base lock held, right? ;-)
> ;-) ;-)).
>
> /me going to check Paul's second fix at rcu/dev.
>
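To make the lock ordering above concrete, here is a minimal user-space sketch in Python (not kernel code) that records "held -> wanted" edges and checks for a cycle, in the spirit of a lock-order checker. The names LockOrderGraph and build_graph are hypothetical, and the lock labels are just strings taken from the diagram. It assumes, as stated above, that a non-zero delay arms a timer instead of taking the rq lock.

```python
from collections import defaultdict


class LockOrderGraph:
    """Toy lock-order tracker: records 'held -> wanted' edges, finds cycles."""

    def __init__(self):
        self.edges = defaultdict(set)

    def acquire_while_holding(self, held, wanted):
        # Record that some context tries to take `wanted` while holding `held`.
        self.edges[held].add(wanted)

    def has_cycle(self):
        # Depth-first search; a node seen again on the current path is a cycle.
        visited, on_path = set(), set()

        def visit(node):
            if node in on_path:
                return True
            if node in visited:
                return False
            visited.add(node)
            on_path.add(node)
            found = any(visit(nxt) for nxt in self.edges.get(node, ()))
            on_path.discard(node)
            return found

        return any(visit(node) for node in list(self.edges))


def build_graph(delayed_work_uses_timer):
    """Model the scenario from the diagram (hypothetical helper)."""
    g = LockOrderGraph()
    # CPU A: holds rq->__lock, then call_srcu() wants the srcu locks.
    g.acquire_while_holding("rq->__lock", "srcu locks")
    if delayed_work_uses_timer:
        # Assumption from the thread: a non-zero delay arms a timer
        # instead of acquiring the rq lock.
        g.acquire_while_holding("srcu locks", "timer base lock")
    else:
        # Zero delay: queue_delayed_work() ends in try_to_wake_up(),
        # which wants rq->__lock while the srcu locks are held.
        g.acquire_while_holding("srcu locks", "rq->__lock")
    return g


if __name__ == "__main__":
    print("zero delay, cycle:", build_graph(False).has_cycle())
    print("non-zero delay, cycle:", build_graph(True).has_cycle())
```

The model only captures ordering between the two lock classes. It does not cover the case where the caller already holds the timer base lock, which would add another edge and may close a different cycle.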
Oh, I misread: there is no second fix, just rcutorture changes. Let me
see if I can find a quick fix ;-)

Regards,
Boqun

> Regards,
> Boqun
>
> >
> > If I understand this, this looks like an issue that can happen independent
> > of the conversion of the spin locks.
> >
> > thanks,
> >
> > --
> > Joel Fernandes
