Hello,

I have a dual-socket server with the following processor:
[root@xrtmia-09-01 ~]# head /proc/cpuinfo
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 23
model		: 1
model name	: AMD EPYC 7281 16-Core Processor
stepping	: 2

This has highlighted an issue in the topology derivation logic. (It was actually discovered with Xen, but we share the same topology infrastructure and the issue is also present in Linux.)

There are a total of 64 threads in the system, made up of two 32-thread sockets. The APIC IDs for this system are sparse: 0x0-0x3, 0x8-0xb, 0x10-0x13 and so on, all the way up to 0x7b. This is because each socket is made of 4 nodes with 4 cores each, but space has been left in the layout for the maximum possible number of APIC IDs.

In particular, CPUID 0x80000008:ecx reports 0x0000601f. That is, an APIC ID shift of 6 (reporting a maximum of 64 threads per socket), and NC of 31 (reporting 32 threads per socket in the current configuration). c->x86_max_cores is derived from NC and shifted once to exclude threads, giving it a final value of 16 cores per socket.

Given the sparseness of the APIC IDs, it is unsafe to allocate an array of c->x86_max_cores entries and then index it with c->cpu_core_id, as half the cores in the system have a cpu_core_id greater than x86_max_cores. Nor is there any logical core ID derived during boot which would be safe to use as an index. Furthermore, the documentation indicates that these values are expected to be per-package, while in the EPYC case they are all actually per-socket (with up to 4 nodes per socket).

In the short term, my fix will be to stop using c->x86_max_cores to size the array, but I think this discovery warrants a discussion as to whether the current topology infrastructure/expectations are suitable.
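To make the numbers concrete, below is a minimal standalone sketch of the arithmetic involved. It is illustrative only, not the Linux or Xen code: the CPUID field layout (NC in ECX[7:0], the APIC ID shift in ECX[15:12]) follows AMD's documentation, and the cpu_core_id derivation simply mirrors the description above rather than copying either kernel's implementation.

#include <stdio.h>

int main(void)
{
    /*
     * CPUID 0x80000008:ecx as reported on this system.  Field layout
     * per AMD's documentation: NC in bits [7:0], ApicIdCoreIdSize
     * (the "APIC ID shift") in bits [15:12].
     */
    unsigned int ecx = 0x0000601f;

    unsigned int apicid_shift = (ecx >> 12) & 0xf;      /* 6  */
    unsigned int nc = ecx & 0xff;                       /* 31 */

    unsigned int threads_per_socket = nc + 1;           /* 32 */
    /* Shifted once to exclude threads, as described above. */
    unsigned int max_cores = threads_per_socket >> 1;   /* 16 */

    printf("APIC ID shift %u, NC %u => x86_max_cores %u\n",
           apicid_shift, nc, max_cores);

    /*
     * Walk the sparse APIC ID layout: in every group of 8 IDs, only
     * the low 4 are populated (2 cores x 2 threads), so an ID is
     * present iff bit 2 is clear.
     */
    unsigned int cores = 0, overflow = 0;
    for (unsigned int apicid = 0; apicid <= 0x7b; apicid++) {
        if (apicid & 0x4)       /* hole left for unimplemented IDs */
            continue;
        if (apicid & 0x1)       /* second SMT thread of the same core */
            continue;

        /*
         * cpu_core_id: position within the socket with the SMT bit
         * stripped.  An assumption mirroring the derivation described
         * in this mail, not the kernel's code verbatim.
         */
        unsigned int core_id = (apicid & ((1u << apicid_shift) - 1)) >> 1;

        cores++;
        if (core_id >= max_cores)   /* indexes past a max_cores array */
            overflow++;
    }

    printf("%u of %u cores overflow a c->x86_max_cores-sized array\n",
           overflow, cores);

    return 0;
}

On this layout it prints:

APIC ID shift 6, NC 31 => x86_max_cores 16
16 of 32 cores overflow a c->x86_max_cores-sized array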
Thanks,

~Andrew