On Wed, Jan 28, 2026 at 05:55:01PM +0800, Kunwu Chan wrote:
> On 1/26/26 19:30, Shinichiro Kawasaki wrote:
> >  kernel: xfs/for-next, 51aba4ca399, v6.19-rc5+
> >      block device: dm-linear on HDD (non-zoned)
> >      xfs: zoned
> 
> I had a quick look at the attached logs. Across the different runs, the
> stall traces consistently show CPUs spending extended time in
> mm_get_cid() along the mm/sched context switch path.
> 
> This doesn’t seem to indicate an immediate RCU issue by itself, but it
> raises the question of whether context switch completion can be delayed
> for unusually long periods under these test configurations.

Thank you all!

Us RCU guys looked at this and it also looks to us that at least one
part of this issue is that mm_get_cid() is spinning.  This is being
investigated over here:

https://lore.kernel.org/all/877bt29cgv.ffs@tglx/
https://lore.kernel.org/all/[email protected]/
https://lore.kernel.org/all/87y0lh96xo.ffs@tglx/

I have seen the static-key pattern called out by Dave Chinner when running
KASAN on large systems.  We worked around this by disabling KASAN's use
of static keys.  I mention this in case you were running KASAN in these
tests.

                                                        Thanx, Paul
