On 25/04/2025 1:36 pm, Alejandro Vallejo wrote:
> On Wed Apr 23, 2025 at 12:32 PM BST, Roger Pau Monne wrote:
>> There are several errata on Intel regarding the usage of the MONITOR/MWAIT
>> instructions, all having in common that stores to the monitored region
>> might not wake up the CPU.
>>
>> Fix them by forcing the sending of an IPI for the affected models.
>>
>> The Ice Lake issue has been reproduced internally on XenServer hardware,
>> and the fix does seem to prevent it. The symptom was APs getting stuck in
>> the idle loop immediately after bring up, which in turn prevented the BSP
>> from making progress.
> Ugh... so this is what it was... Awesome having this madness fixed.
>
> Do you happen to know if Linux has a similar fix in place?

https://lore.kernel.org/lkml/20250421192205.7cc1a...@davehans-spike.ostc.intel.com/T/#u

>> This would happen before the watchdog was initialized, and hence the
>> whole system would get stuck.
> That's nasty. It was the misassumption that the watchdog was already
> running that had me going in circles thinking it was a lockup rather
> than a livelock. Oh, well.
>
> I believe the kudos for finally being able to reproduce this goes to
> Frediano?

Of course.

The bit about the watchdog is a little bit of a red herring. The
rcu_barrier() loop processes softirqs, so the watchdog wouldn't have
fired even if it had been set up.

~Andrew