On Thu, Mar 19, 2026 at 06:02:44PM +0100, Sebastian Andrzej Siewior wrote:
> On 2026-03-19 09:48:16 [-0700], Boqun Feng wrote:
> > I agree it's not RCU's fault ;-)
>
> I never claimed it is anyone's fault. I just see that BPF should be able
> to do things which kgdb would not be allowed to.
>
> > I guess it'll be difficult to restrict BPF, however maybe BPF can call
> > call_srcu() in irq_work instead? Or a more systematic defer mechanism
> > that allows BPF to defer any lock holding functions to a different
> > context. (We have a similar issue that BPF cannot call kfree_rcu() in
> > some cases IIRC).
> >
> > But we need to fix this in v7.0, so this short-term fix is still needed.
>
> I would prefer something substantial before we rush to get a quick fix
> and move on.
>

The quick fix here is really "restore the previous behavior of
call_rcu_tasks_trace() in call_srcu()", and the future work will
naturally happen: if the extra irq_work layer turns out to cause issues
for other SRCU users, then we need to fix them as well. Otherwise, there
is no real need to avoid the extra irq_work hop. So I *think* it's OK
;-)

Cleaning up all the ad-hoc irq_work usages in BPF is another thing,
which can happen once we learn about all the cases and have a good
design.

> If we could get that irq_work() part only for BPF where it is required
> then it would be already a step forward.
>

I'm happy to include that (i.e. using Qiang's suggestion) if Joel also
agrees.

> Long term it would be nice if we could avoid calling this while locks
> are held. I think call_rcu() can't be used under rq/pi lock, but timers
> should be fine.
>
> Is this rq/pi locking originating from "regular" BPF code or sched_ext?
>

I think if you have any tracepoint (including traceable functions) under
rq/pi locking, then potentially BPF can call call_srcu() there.

The root cause of the issues is that BPF is actually like an NMI unless
the code is noinstr (there is a rabbit hole about BPF calling
call_srcu() while it's instrumenting call_srcu() itself). And the right
way to solve all the issues is to have a general defer mechanism for
BPF.

Regards,
Boqun

> > Regars,
> > Boqun
> >
> > > > Regards,
> > > > Boqun
>
> Sebastian
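
To make the "defer to a different context" idea concrete, here is a
minimal userspace sketch in plain C. It is not kernel code and does not
use the real irq_work or SRCU APIs: every name in it (deferred_work,
defer_queue(), defer_drain(), lock_taking_op(), in_restricted_context)
is hypothetical, and the flag only stands in for "rq/pi lock held". It
is single-threaded and ignores the concurrency a real mechanism would
have to handle. The point is only the shape: the restricted context
links an item onto a list without taking any lock, and the lock-taking
operation runs later from a context where that is allowed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is NOT the kernel irq_work/SRCU API. */
struct deferred_work {
	struct deferred_work *next;
	void (*func)(struct deferred_work *work);
	bool pending;
};

struct deferred_work *pending_head;
struct deferred_work **pending_tail = &pending_head;
bool in_restricted_context;	/* stands in for "rq/pi lock held" */
int callbacks_run;

/* Callable from the restricted context: only links the item, takes no lock. */
bool defer_queue(struct deferred_work *work)
{
	if (work->pending)
		return false;	/* already queued, nothing to do */
	work->pending = true;
	work->next = NULL;
	*pending_tail = work;
	pending_tail = &work->next;
	return true;
}

/* Runs later, from a context where taking locks is allowed. */
int defer_drain(void)
{
	struct deferred_work *work = pending_head;
	int n = 0;

	assert(!in_restricted_context);
	/* Detach the whole list first so callbacks may queue new work. */
	pending_head = NULL;
	pending_tail = &pending_head;
	while (work) {
		struct deferred_work *next = work->next;

		work->pending = false;
		work->func(work);
		n++;
		work = next;
	}
	return n;
}

/* Stand-in for the lock-taking operation (call_srcu() in this thread). */
void lock_taking_op(struct deferred_work *work)
{
	(void)work;
	assert(!in_restricted_context);
	callbacks_run++;
}
```

A caller in the restricted context would set up a deferred_work with
.func = lock_taking_op and call defer_queue() on it; defer_drain() is
then invoked once the restriction is gone. The cost is the same as the
extra irq_work hop discussed above: the operation happens one context
switch later than the caller asked for.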
