Trying to pick this thread up from the discussion "Debugging Windows HVM crashes on Ryzen 3xxx series CPUs." I'm trying to summarize what I see claimed, and my understanding of things, and am not necessarily speaking as an authority, so please correct me where I'm wrong.

Modern Windows guests (at least Windows 10 and Windows Server 2016) crash when running under Xen on AMD Ryzen 3xxx desktop-class CPUs (but not the corresponding server CPUs). This is true for all upstream releases of Xen (i.e., it has nothing to do with Xen 4.13 in particular). Linux guests seem to work just fine.

It seems that reverting patch ca2eee92df44 (from Xen 3.4!) fixes the issue for Steven. This would seem to indicate that Windows running on such systems is confused by the topology information presented to the guest.

A "proper fix" for this would involve presenting a coherent, rational topology to guests, which in turn relies on the long-awaited CPUID infrastructure, all of which is way out of scope for being fixed by the 4.13 release.

The revert in question *only* touches code in libxc; no Xen-side changes are required.

One issue that was raised concerns migration. But as upstream Xen doesn't have cpu leveling (?), it's already not possible to migrate *to* such a system from any other system. The main worry then would be making sure that we deal properly with migrating *away* from such a system to future systems in future versions.

But it's absolutely clear that we can't simply apply the change across the board; it must only be done on Ryzen 3xxx systems.

Given that, we have several approaches we could take:

1. Document that Xen 4.13 doesn't work with Ryzen, and punt the issue to 4.14.

2. Try to figure out exactly which changes allow Windows to work, and document that users should add those (temporarily) to xl.cfg files. (If setting these values is broken, this can be fixed.)

3. Have a libxl / xl flag indicating to apply the changes here as-is (or with the minimal changes necessary).

4. Have an environment variable the user can set which will cause the toolstack to do the above on versions that don't have a "proper" fix.

5. If we can make outgoing migrations forward-compatible, then we could think about automatically applying this feature only on the affected CPUs.

Thoughts?

I think the first step should be to identify the minimum set of changes that allow Windows to boot, and then see if we can't automatically apply the changes in a forward-compatible manner (#5). If we can't, then getting existing configurations to the point where you can specify the right bits is the next best option (#2); and having an environment variable (#4) would be the final fall-back.

It shouldn't be terribly difficult, given the patch, to "bisect" the minimum changes required to enable Windows guests to boot. Who wants to pick that up?

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
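
For whoever picks up the "bisect" step mentioned above, here is a rough sketch of one way it could be organised. It is not taken from the patch or from any Xen tooling. The hunk names and the guest_boots() callback are hypothetical placeholders for "apply this subset of the revert, rebuild the toolstack, and check whether the Windows guest boots". Strictly speaking it is a greedy one-at-a-time reduction rather than a true bisection, and it assumes that adding more of the revert never stops a working guest from booting.

```python
from typing import Callable, List, Set


def minimize_changes(
    changes: List[str],
    guest_boots: Callable[[Set[str]], bool],
) -> List[str]:
    """Return a subset of `changes` from which no single change can be dropped.

    `guest_boots` is a hypothetical callback: it receives the set of changes
    to apply and returns True if the Windows guest boots with exactly those.
    """
    kept = list(changes)
    if not guest_boots(set(kept)):
        raise ValueError("guest does not boot even with every change applied")

    for change in list(changes):
        # Try the current set without this one change.
        trial = [c for c in kept if c != change]
        if guest_boots(set(trial)):
            # Still boots, so this change was not needed.
            kept = trial
    return kept


if __name__ == "__main__":
    # Made-up hunk names and a fake boot test, purely for illustration.
    hunks = ["hunk-a", "hunk-b", "hunk-c", "hunk-d"]
    needed = {"hunk-b", "hunk-d"}

    def fake_boot(applied: Set[str]) -> bool:
        return needed <= applied

    print(minimize_changes(hunks, fake_boot))
```

Each call to guest_boots() stands for a full rebuild-and-boot cycle, so the cost is one boot test per hunk plus the initial check. If the revert turns out to have many independent hunks, a real halving search would cut that down.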