Hello,

I have a dual-socket server with the following processor:

[root@xrtmia-09-01 ~]# head /proc/cpuinfo 
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 23
model           : 1
model name      : AMD EPYC 7281 16-Core Processor
stepping        : 2

This has highlighted an issue in the topology derivation logic.
(It was actually discovered with Xen, but we share the same topology
infrastructure, and the issue is present in Linux as well.)

There are a total of 64 threads in the system, made up of two
32-thread sockets.  The APIC IDs for this system are sparse: they are
0x0-0x3, 0x8-0xb, 0x10-0x13 etc., all the way up to 0x7b.

This is because each socket is made of 4 nodes with 4 cores each, but
space has been left in the layout for the maximum possible number of
APIC IDs.
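
To illustrate, here is a small standalone program (my reconstruction
of the observed pattern, not code from Xen or Linux) which reproduces
the populated IDs:

#include <stdio.h>

/* Reconstruction of the observed APIC ID layout: 2 sockets, with the
 * socket number at bit 6, and only the first 4 IDs of each group of 8
 * populated below that.  Prints all 64 populated IDs, 0x00 to 0x7b.
 */
int main(void)
{
    unsigned int socket, group, id;

    for ( socket = 0; socket < 2; socket++ )
        for ( group = 0; group < 8; group++ )
            for ( id = 0; id < 4; id++ )
                printf("APIC ID 0x%02x\n",
                       (socket << 6) | (group << 3) | id);

    return 0;
}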

In particular, CPUID 0x80000008:ecx reports 0x0000601f.  That is, an
APIC ID shift of 6 (reporting a maximum of 64 threads per socket), and
NC as 31 (reporting 32 threads per socket in the current configuration).

c->x86_max_cores is derived from NC and shifted once to exclude threads,
giving it a final value of 16 cores per socket.
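
For reference, the decode looks like this (a standalone sketch using
GCC's <cpuid.h>; field names are per the AMD documentation):

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if ( !__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx) )
        return 1;

    unsigned int nc    = ecx & 0xff;         /* NC: threads/socket - 1 */
    unsigned int shift = (ecx >> 12) & 0xf;  /* ApicIdCoreIdSize */

    /* On this system ecx = 0x0000601f, so shift = 6 (space for 64
     * APIC IDs per socket) and nc = 31 (32 threads populated).
     */
    printf("APIC ID shift %u => up to %u IDs/socket\n", shift, 1u << shift);
    printf("NC %u => %u threads/socket\n", nc, nc + 1);

    /* Shifted once to exclude threads, as described above. */
    printf("x86_max_cores = %u\n", (nc + 1) >> 1);

    return 0;
}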

Given the sparseness of the APIC IDs, it is unsafe to allocate an
array of c->x86_max_cores entries and then index it with
c->cpu_core_id, as half the cores in the system have a cpu_core_id at
or above x86_max_cores.  There is no logical core ID derived during
boot which would be safe to use as an index.
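
To make the hazard concrete, here is a standalone sketch (illustrative
only; that cpu_core_id is the APIC ID with the SMT bit stripped is my
assumption, consistent with the layout above) flagging the cores whose
ID falls outside an array of x86_max_cores entries:

#include <stdio.h>

#define X86_MAX_CORES 16   /* derived as above */

int main(void)
{
    unsigned int apicid;

    /* Walk one socket's 64-ID space, skipping the vacant half of each
     * 8-ID group and the second thread of each core.
     */
    for ( apicid = 0; apicid < 0x40; apicid++ )
    {
        if ( (apicid & 7) >= 4 || (apicid & 1) )
            continue;

        unsigned int core_id = apicid >> 1;

        printf("apicid 0x%02x -> cpu_core_id %2u%s\n", apicid, core_id,
               core_id >= X86_MAX_CORES ? "  (out of bounds!)" : "");
    }

    return 0;
}

Exactly half of the 16 cores per socket end up with a cpu_core_id of
16 or above.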

Furthermore, the documentation indicates that these values are expected
to be per-package, while they are all actually per-socket (with up to 4
nodes per socket) in the EPYC case.

In the short term, my fix will be not to use c->x86_max_cores for
sizing the array, but I think this discovery warrants a discussion as
to whether the current topology infrastructure and expectations are
suitable.
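
Something along these lines (a sketch only, not the final patch;
core_info is a hypothetical stand-in for the real per-core state):
size per-socket arrays by the core-ID space implied by the APIC ID
shift, rather than by the populated core count:

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-core record, for illustration only. */
struct core_info { unsigned int present; };

int main(void)
{
    /* From CPUID 0x80000008:ecx on this system (see above). */
    unsigned int apic_id_shift = 6;

    /* The full per-socket core-ID space is the APIC ID space with the
     * SMT bit stripped: 1 << (6 - 1) = 32 slots, so every sparse
     * cpu_core_id (0..29 here) indexes in bounds.
     */
    unsigned int nr_slots = 1u << (apic_id_shift - 1);

    struct core_info *cores = calloc(nr_slots, sizeof(*cores));

    if ( !cores )
        return 1;

    printf("allocated %u core slots per socket\n", nr_slots);
    free(cores);

    return 0;
}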

Thanks,

~Andrew