On Sat, Mar 02, 2024 at 04:19:17PM -0800, Paul E. McKenney wrote:
> On Sat, Mar 02, 2024 at 08:53:54PM +0100, Max Boone wrote:
> >
> > Thank you so much for the quick reply!
> >
> > I haven't filed a bug with Debian specifically as I'm running the
> > linux kernel built and provided by Microsoft and Ubuntu as OS on top. If it
> > helps with the search I'd gladly run Debian and file a bug there, but will
> > still need to build my own kernel as WSL requires some modules (such as
> > HyperV storage and sockets) to be built into the kernel instead (meaning
> > =y) of as modules (meaning =m).
>
> Ah, if you built your own kernel, then you are your own distro as far
> as kernel issues are concerned. ;-)
>
> Thanx, Paul
>
> > I'll stick to using the rcu list from here on to avoid spam, thanks again!
> >
> > On Saturday, March 02, 2024 20:43 CET, "Paul E. McKenney"
> > <paul...@kernel.org> wrote:
> > [ Adding Boqun and the rcu list on CC. ]
> >
Thanks, Paul.
> > On Sat, Mar 02, 2024 at 07:59:08PM +0100, Max Boone wrote:
> > >
> > > Dear Dr. McKenney,
> > >
> > > For a couple of years now I've been the sometimes frustrated owner of a
> > > Microsoft Surface Pro X ARM64 device, which has been getting
> > > progressively better as more vendors start targeting their builds at
> > > ARM64 architectures but since the introduction of the device there have
> > > been issues with the Windows Subsystem for Linux (not more than an
> > > opinionated Hyper-V VM with extensive tooling) locking up and hanging.
> > >
> > > When this happens, traces like the following are dumped in the kernel
> > > messages:
> > > https://github.com/microsoft/WSL/issues/9454#issuecomment-1942222109
> > >
> > > When watching your talk "Decoding Those Inscrutable RCU CPU Stall
> > > Warnings" you mentioned one can feel free reaching out when bumping into
> > > such issues. Building other kernel releases, switching off-and-on modules
> > > and playing with the RCU grace period times so far don't seem to work for
> > > me (or others in that thread).
> > >
> > > Anyways, I don't really know where to start looking and the call stacks
> > > aren't very informative (to my eye) either. I'm hoping you might help me
> > > find the direction to look for the root of this problem.
> >
> > I am assuming that you have filed a bug with the Debian folks, and before
> > doing that, searched for similar bug reports.
> >
> > At first glance, this is because things were stuck here:
> >
> > [ 967.115632] clear_rseq_cs.isra.0+0x4c/0x60
> > [ 967.116433] do_notify_resume+0xf8/0xeb0
> > [ 967.116960] el0_svc+0x3c/0x50
> > [ 967.117537] el0t_64_sync_handler+0x9c/0x120
> > [ 967.118323] el0t_64_sync+0x158/0x15c
> >
> > So including these function names (clear_rseq_cs() and so on) in your
> > search for similar bug reports would be a good idea.
> >
> > I am unfamiliar with that code.
> >
> > So I added Boqun because he works with Linux on HyperV as part of his
> > day job and has a great deal of experience with RCU. He will likely
> > have quite a number of questions for you including exact versions,
> > Debian bug number, the results of your web search, and so on. He might
> > also know an ARM person to get involved in this.
> >
> > Or maybe he knows the solution off the top of his head!
> >

I haven't seen this issue before. It looks to me like the stall is caused
by clear_rseq_cs(), which is basically a put_user(), and I don't have an
immediate theory. Could you share the kernel repo and configuration you
used, so that I can see if I can reproduce this? (Note that I have neither
the exact device you have nor an ARM64 Windows system with the exact
Windows build you are using.)

Regards,
Boqun

> > Thanx, Paul
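
For anyone following Paul's advice to search for similar reports by function name, here is a minimal sketch of pulling the names out of a pasted trace. The names `TRACE`, `FRAME_RE` and `trace_symbols` are made up for illustration, and the list of compiler clone suffixes it strips (`.isra`, `.constprop`, `.part`, `.cold`) is an assumption about what is worth removing before searching, not something taken from the thread.

```python
import re

# The frames quoted earlier in this thread, pasted as-is.
TRACE = """
[ 967.115632] clear_rseq_cs.isra.0+0x4c/0x60
[ 967.116433] do_notify_resume+0xf8/0xeb0
[ 967.116960] el0_svc+0x3c/0x50
[ 967.117537] el0t_64_sync_handler+0x9c/0x120
[ 967.118323] el0t_64_sync+0x158/0x15c
"""

# A frame looks like "symbol+0xOFFSET/0xSIZE".
FRAME_RE = re.compile(r"([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Assumed set of GCC clone suffixes to drop, e.g. ".isra.0".
SUFFIX_RE = re.compile(r"(\.(isra|constprop|part|cold)(\.\d+)?)+$")


def trace_symbols(trace: str) -> list[str]:
    """Return the function names found in a trace, in order, without repeats."""
    symbols = []
    for match in FRAME_RE.finditer(trace):
        name = SUFFIX_RE.sub("", match.group(1))
        if name not in symbols:
            symbols.append(name)
    return symbols


if __name__ == "__main__":
    # Print one search term per line.
    for symbol in trace_symbols(TRACE):
        print(symbol)
```

Stripping the suffix matters because a search for `clear_rseq_cs.isra.0` may miss reports from kernels where the compiler did not emit that particular clone.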