On Thu, Jan 29, 2026 at 09:46:12AM -0800, Paul E. McKenney wrote:
> On Thu, Jan 29, 2026 at 05:27:04AM +0000, Shinichiro Kawasaki wrote:
> > On Jan 28, 2026 / 08:42, Paul E. McKenney wrote:
> > > On Wed, Jan 28, 2026 at 05:55:01PM +0800, Kunwu Chan wrote:
> > > > On 1/26/26 19:30, Shinichiro Kawasaki wrote:
> > > > >  kernel: xfs/for-next, 51aba4ca399, v6.19-rc5+
> > > > >      block device: dm-linear on HDD (non-zoned)
> > > > >      xfs: zoned
> > > > 
> > > > I had a quick look at the attached logs. Across the different runs, the
> > > > stall traces consistently show CPUs spending extended time in
> > > > mm_get_cid() along the mm/sched context switch path.
> > > > 
> > > > This doesn’t seem to indicate an immediate RCU issue by itself, but it
> > > > raises the question of whether context switch completion can be delayed
> > > > for unusually long periods under these test configurations.
> > > 
> > > Thank you all!
> > > 
> > > Us RCU guys looked at this and it also looks to us that at least one
> > > part of this issue is that mm_get_cid() is spinning.  This is being
> > > investigated over here:
> > > 
> > > https://lore.kernel.org/all/877bt29cgv.ffs@tglx/
> > > https://lore.kernel.org/all/[email protected]/
> > > https://lore.kernel.org/all/87y0lh96xo.ffs@tglx/
> > 
> > Kunwu, Paul and RCU experts, thank you very much. It's good to know that a
> > similar issue is already under investigation. I hope that a fix becomes
> > available in a timely manner.
> > 
> > > I have seen the static-key pattern called out by Dave Chinner when running
> > > KASAN on large systems.  We worked around this by disabling KASAN's use
> > > of static keys.  In case you were running KASAN in these tests.
> > 
> > As to KASAN, yes, I enable it in my test runs. I find three static keys
> > under mm/kasan/*. I will see whether they can be disabled in my test runs.
> > Thanks.
> 
> There is a set of Kconfig options that disables static branches.  If you
> cannot find them quickly, please let me know and I can look them up.

And Thomas Gleixner posted an alleged fix to the CID issue here:

https://lore.kernel.org/lkml/[email protected]/

Please let him know whether or not it helps.

                                                        Thanx, Paul
