On Thu, Mar 19, 2026 at 06:02:44PM +0100, Sebastian Andrzej Siewior wrote:
> On 2026-03-19 09:48:16 [-0700], Boqun Feng wrote:
> > I agree it's not RCU's fault ;-)
> 
> I never claimed it is anyone's fault. I just see that BPF should be able
> to do things which kgdb would not be allowed to.
> 
> > I guess it'll be difficult to restrict BPF, however maybe BPF can call
> > call_srcu() in irq_work instead? Or a more systematic defer mechanism
> > that allows BPF to defer any lock holding functions to a different
> > context. (We have a similar issue that BPF cannot call kfree_rcu() in
> > some cases IIRC).
> > 
> > But we need to fix this in v7.0, so this short-term fix is still needed.
> 
> I would prefer something substantial before we rush to get a quick fix
> and move on.
> 

The quick fix here is really "restore the previous behavior of
call_rcu_tasks_trace() in call_srcu()", and the future work will
naturally happen: if the extra irq_work layer turns out to cause issues
for other SRCU users, then we need to fix them as well. Otherwise, there
is no real need to avoid the extra irq_work hop. So I *think* it's OK
;-)

Cleaning up all the ad-hoc irq_work usages in BPF is a separate matter,
which can happen once we learn about all the cases and have a good
design.

> If we could get that irq_work() part only for BPF where it is required
> then it would be already a step forward.
> 

I'm happy to include that (i.e. using Qiang's suggestion) if Joel also
agrees.

> Long term it would be nice if we could avoid calling this while locks
> are held. I think call_rcu() can't be used under rq/pi lock, but timers
> should be fine.
> 
> Is this rq/pi locking originating from "regular" BPF code or sched_ext?
> 

I think if you have any tracepoint (including traceable functions) under
rq/pi locking, then potentially BPF can call call_srcu() there.

The root cause of the issues is that BPF is actually like an NMI unless
the code is noinstr (there is a rabbit hole about BPF calling
call_srcu() while it's instrumenting call_srcu() itself). And the right
way to solve all the issues is to have a general defer mechanism for
BPF.
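
To make the "general defer mechanism" idea a bit more concrete, here is
a rough userspace model of the pattern, purely as an illustration. All
names (defer_node, defer_queue, defer_call(), defer_drain(), mark_ran())
are hypothetical and none of this is an existing kernel API. The
assumption is that the enqueue side may run in an NMI-like context and
so must not take locks, while the drain side runs later from a safe
context (in the kernel that would presumably be irq_work or similar):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace model; not kernel code. */
struct defer_node {
	struct defer_node *next;
	void (*func)(struct defer_node *node);
};

struct defer_queue {
	_Atomic(struct defer_node *) head;
};

/*
 * Enqueue side: safe to call from an NMI-like context because it only
 * uses a compare-and-swap loop and never takes a lock.
 * Returns 1 if the queue was empty before the push, i.e. the caller
 * would need to kick the drain side, and 0 otherwise.
 */
static int defer_call(struct defer_queue *q, struct defer_node *node,
		      void (*func)(struct defer_node *))
{
	struct defer_node *old;

	node->func = func;
	old = atomic_load_explicit(&q->head, memory_order_relaxed);
	do {
		node->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&q->head, &old, node,
							memory_order_release,
							memory_order_relaxed));
	return old == NULL;
}

/*
 * Drain side: runs in a context where taking locks is fine. It detaches
 * the whole list at once and runs the callbacks in LIFO order.
 * Returns the number of callbacks that were run.
 */
static int defer_drain(struct defer_queue *q)
{
	struct defer_node *node;
	int n = 0;

	node = atomic_exchange_explicit(&q->head, (struct defer_node *)NULL,
					memory_order_acquire);
	while (node) {
		/* Read ->next first: the callback may reuse the node. */
		struct defer_node *next = node->next;

		node->func(node);
		node = next;
		n++;
	}
	return n;
}

/* Example callback: stands in for a lock-taking call such as call_srcu(). */
static int ran;

static void mark_ran(struct defer_node *node)
{
	(void)node;
	ran++;
}
```

The point of the sketch is only that the caller never touches a lock:
it pushes a node and returns, and whatever needs the lock runs later
from defer_drain().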

Regards,
Boqun

> > Regards,
> > Boqun
> > 
> > > > Regards,
> > > > Boqun
> 
> Sebastian
