On Sat, Mar 02, 2024 at 04:19:17PM -0800, Paul E. McKenney wrote:
> On Sat, Mar 02, 2024 at 08:53:54PM +0100, Max Boone wrote:
> > 
> > Thank you so much for the quick reply!
> > 
> > I haven't filed a bug with Debian specifically, as I'm running the
> > Linux kernel built and provided by Microsoft, with Ubuntu as the OS on
> > top. If it helps with the search, I'd gladly run Debian and file a bug
> > there, but I would still need to build my own kernel, as WSL requires
> > some modules (such as Hyper-V storage and sockets) to be built into the
> > kernel (=y) rather than as modules (=m).
> 
> Ah, if you built your own kernel, then you are your own distro as far
> as kernel issues are concerned.  ;-)
> 
>                                                       Thanx, Paul
> 
> > I'll stick to using the rcu list from here on to avoid spam, thanks again!
> >
> > On Saturday, March 02, 2024 20:43 CET, "Paul E. McKenney" 
> > <paul...@kernel.org> wrote:
> >  [ Adding Boqun and the rcu list on CC. ]
> > 

Thanks, Paul.

> > On Sat, Mar 02, 2024 at 07:59:08PM +0100, Max Boone wrote:
> > >
> > > Dear Dr. McKenney,
> > >
> > > For a couple of years now, I've been the sometimes frustrated owner of a
> > > Microsoft Surface Pro X ARM64 device. It has been getting progressively
> > > better as more vendors target their builds at ARM64, but ever since the
> > > device was introduced, the Windows Subsystem for Linux (no more than an
> > > opinionated Hyper-V VM with extensive tooling) has had issues with
> > > locking up and hanging.
> > >
> > > When this happens, traces like the following are dumped in the kernel 
> > > messages:
> > > https://github.com/microsoft/WSL/issues/9454#issuecomment-1942222109
> > >
> > > In your talk "Decoding Those Inscrutable RCU CPU Stall Warnings", you
> > > mentioned that people should feel free to reach out when they run into
> > > such issues. So far, building other kernel releases, switching modules
> > > on and off, and adjusting the RCU grace-period timing haven't helped,
> > > either for me or for others in that thread.
> > >
> > > Anyway, I don't really know where to start looking, and the call stacks
> > > aren't very informative (to my eye) either. I'm hoping you might point me
> > > in the right direction to find the root of this problem.
> > 
> > I am assuming that you have filed a bug with the Debian folks, and before
> > doing that, searched for similar bug reports.
> > 
> > At first glance, this is because things were stuck here:
> > 
> > [ 967.115632] clear_rseq_cs.isra.0+0x4c/0x60
> > [ 967.116433] do_notify_resume+0xf8/0xeb0
> > [ 967.116960] el0_svc+0x3c/0x50
> > [ 967.117537] el0t_64_sync_handler+0x9c/0x120
> > [ 967.118323] el0t_64_sync+0x158/0x15c
> > 
> > So including these function names (clear_rseq_cs() and so on) in your
> > search for similar bug reports would be a good idea.
> > 
> > I am unfamiliar with that code.
> > 
> > So I added Boqun because he works with Linux on Hyper-V as part of his
> > day job and has a great deal of experience with RCU. He will likely
> > have quite a few questions for you, including exact versions, the
> > Debian bug number, the results of your web search, and so on. He might
> > also know an ARM person to get involved in this.
> > 
> > Or maybe he knows the solution off the top of his head!
> > 

I haven't seen this issue before. It looks to me like the stall is caused
by clear_rseq_cs(), which is basically a put_user(), but I don't have an
immediate theory. Could you share the kernel repo and configuration you
used, so that I can see whether I can reproduce this? (Note that I have
neither the exact device you have nor an ARM64 Windows system with the
exact Windows build you are using.)

Regards,
Boqun

> > Thanx, Paul
> > 
> > 
> >  
> 
