On Thu, 19 Mar 2026 at 17:48, Boqun Feng <[email protected]> wrote:
>
> On Thu, Mar 19, 2026 at 05:33:50PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2026-03-19 09:27:59 [-0700], Boqun Feng wrote:
> > > On Thu, Mar 19, 2026 at 10:03:15AM +0100, Sebastian Andrzej Siewior wrote:
> > > > Please just use the queue_delayed_work() with a delay >0.
> > > >
> > >
> > > That doesn't work since queue_delayed_work() with a positive delay will
> > > still acquire timer base lock, and we can have BPF instrument with timer
> > > base lock held i.e. calling call_srcu() with timer base lock.
> > >
> > > irq_work on the other hand doesn't use any locking.
> >
> > Could we please restrict BPF somehow so it does roam free? It is
> > absolutely awful to have irq_work() in call_srcu() just because it
> > might acquire locks.
> >
>
> I agree it's not RCU's fault ;-)
>
> I guess it'll be difficult to restrict BPF, however maybe BPF can call
> call_srcu() in irq_work instead? Or a more systematic defer mechanism
> that allows BPF to defer any lock holding functions to a different
> context. (We have a similar issue that BPF cannot call kfree_rcu() in
> some cases IIRC).
>
> But we need to fix this in v7.0, so this short-term fix is still needed.
>

I don't think this is an option, even in the longer term. We already
defer when it is incorrect to invoke call_rcu() or any other API in a
specific context (e.g., NMI, where we punt it using irq_work).

However, the case reported in this thread is different. It is an
existing user that worked fine before and is broken now. We were using
call_rcu_tasks_trace() just fine in scx callbacks where rq->lock is
held, so the underlying conversion to call_srcu() should remain
transparent in this respect.

> Regards,
> Boqun
>
> > > Regards,
> > > Boqun
> > >
> > Sebastian
