On Jan 28, 2026 / 08:42, Paul E. McKenney wrote:
> On Wed, Jan 28, 2026 at 05:55:01PM +0800, Kunwu Chan wrote:
> > On 1/26/26 19:30, Shinichiro Kawasaki wrote:
> > >  kernel: xfs/for-next, 51aba4ca399, v6.19-rc5+
> > >      block device: dm-linear on HDD (non-zoned)
> > >      xfs: zoned
> > 
> > I had a quick look at the attached logs. Across the different runs, the
> > stall traces consistently show CPUs spending extended time in
> > mm_get_cid() along the mm/sched context switch path.
> > 
> > This doesn’t seem to indicate an immediate RCU issue by itself, but it
> > raises the question of whether context switch completion can be delayed
> > for unusually long periods under these test configurations.
> 
> Thank you all!
> 
> Us RCU guys looked at this and it also looks to us that at least one
> part of this issue is that mm_get_cid() is spinning.  This is being
> investigated over here:
> 
> https://lore.kernel.org/all/877bt29cgv.ffs@tglx/
> https://lore.kernel.org/all/[email protected]/
> https://lore.kernel.org/all/87y0lh96xo.ffs@tglx/

Kunwu, Paul and RCU experts, thank you very much. It's good to know that a
similar issue is already under investigation. I hope that a fix becomes available
in a timely manner.

> I have seen the static-key pattern called out by Dave Chinner when running
> KASAN on large systems.  We worked around this by disabling KASAN's use
> of static keys.  In case you were running KASAN in these tests.

As to KASAN, yes, I enable it in my test runs. I found three static keys under
mm/kasan/*. I will check whether they can be disabled in my test runs. Thanks.
