On Mon, Mar 09, 2020 at 05:00:37PM +1000, David Gwynne wrote:
> ive been running multi-cpu/core openbsd VMs on esxi on top of amd
> epyc cpus, and have noticed for a long time now that for some reason
> openbsd decides that all the virtual cpus are threads on the one core.
> this is annoying, cos setting hw.smt=1 feels dirty and wrong.
> 
> i spent the weekend figuring this out, and came up with the following.
> 
> our current code assumes that some cpuid fields on recent CPUs contain
> information about package and core topologies which we base the package
> and core ids on. turns out we're pretty much alone in this assumption. if
> you're running on real hardware, they do tend to contain useful info,
> but virtual machines don't fill them in properly.
> 
> every other operating system seems to rely on the one mechanism across
> all families. specifically, the initial local apic id provides a
> globally unique identifier that has bits representing at least the
> package (socket) the logical thread is on, plus the core, and if
> supported, the smt id. multi socket cpus provide information about
> how many bits there are before you get to the ones identifying the
> package, so we use that everywhere.
> 
> only the most recent generation of cpu (zen) supports SMT, but the cpuid
> that provides that information is available on at least the previous
> gen. fortunately they made it so reading the number of threads per
> core falls back gracefully. this means we can default to there being
> one thread per core, and increase that if the cpuid is available and
> appropriately set.
> 
> this seems to fix my vmware problem, but also still seems to work on my
> physical epyc boxes. unfortunately i do not have any other amd systems
> up and running at the moment, so i would appreciate testing on any and
> every amd based amd64 system.
> 
> tests please. ok?

of course hrvoje found a bug. i cleaned up my diff too hard before
sending it out, and ended up reading nthreads from the wrong bits.

this works better on his epyc 2 box, and works right on my epyc 1, esxi
on epyc 1, and on an apu1.

Index: identcpu.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/identcpu.c,v
retrieving revision 1.113
diff -u -p -r1.113 identcpu.c
--- identcpu.c  14 Jun 2019 18:13:55 -0000      1.113
+++ identcpu.c  9 Mar 2020 12:38:59 -0000
@@ -824,37 +824,31 @@ cpu_topology(struct cpu_info *ci)
        apicid = (ebx >> 24) & 0xff;
 
        if (strcmp(cpu_vendor, "AuthenticAMD") == 0) {
+               uint32_t nthreads = 1; /* per core */
+               uint32_t thread_id; /* within a package */
+
                /* We need at least apicid at CPUID 0x80000008 */
                if (ci->ci_pnfeatset < 0x80000008)
                        goto no_topology;
 
-               if (ci->ci_pnfeatset >= 0x8000001e) {
-                       struct cpu_info *ci_other;
-                       CPU_INFO_ITERATOR cii;
+               CPUID(0x80000008, eax, ebx, ecx, edx);
+               core_bits = (ecx >> 12) & 0xf;
 
+               if (ci->ci_pnfeatset >= 0x8000001e) {
                        CPUID(0x8000001e, eax, ebx, ecx, edx);
-                       ci->ci_core_id = ebx & 0xff;
-                       ci->ci_pkg_id = ecx & 0xff;
-                       ci->ci_smt_id = 0;
-                       CPU_INFO_FOREACH(cii, ci_other) {
-                               if (ci != ci_other &&
-                                   ci_other->ci_core_id == ci->ci_core_id &&
-                                   ci_other->ci_pkg_id == ci->ci_pkg_id)
-                                       ci->ci_smt_id++;
-                       }
-               } else {
-                       CPUID(0x80000008, eax, ebx, ecx, edx);
-                       core_bits = (ecx >> 12) & 0xf;
-                       if (core_bits == 0)
-                               goto no_topology;
-                       /* So coreidsize 2 gives 3, 3 gives 7... */
-                       core_mask = (1 << core_bits) - 1;
-                       /* Core id is the least significant considering mask */
-                       ci->ci_core_id = apicid & core_mask;
-                       /* Pkg id is the upper remaining bits */
-                       ci->ci_pkg_id = apicid & ~core_mask;
-                       ci->ci_pkg_id >>= core_bits;
+                       nthreads = ((ebx >> 8) & 0xf) + 1;
                }
+
+               /* Shift the core_bits off to get at the pkg bits */
+               ci->ci_pkg_id = apicid >> core_bits;
+
+               /* Get rid of the package bits */
+               core_mask = (1 << core_bits) - 1;
+               thread_id = apicid & core_mask;
+
+               /* Cut logical thread_id into core id, and smt id in a core */
+               ci->ci_core_id = thread_id / nthreads;
+               ci->ci_smt_id = thread_id % nthreads;
        } else if (strcmp(cpu_vendor, "GenuineIntel") == 0) {
                /* We only support leaf 1/4 detection */
                if (cpuid_level < 4)

Reply via email to