Re: [hwloc-users] AMD EPYC topology
Following up on this: indeed, with a recent kernel the error message goes away. The poor performance remains, though (only a few percent difference between 4.13 and 4.15rc5), and I'm at a loss as to whether it's related to MPI or not. I see oddities such as locking the job to the first 12 cores yielding 100% greater performance than locking it to the last 12 cores, which I can't explain; I can only suspect it is related to some kind of MPI cache-partitioning issue.

On Sat, Dec 30, 2017 at 8:59 AM, Brice Goglin wrote:
> On 29/12/2017 at 23:15, Bill Broadley wrote:
>>
>> Very interesting. I was running parallel finite-element code and was
>> seeing great performance compared to Intel in most cases, but on larger
>> runs it was 20x slower. This would explain it.
>>
>> Do you know which commit, or anything else that might help find any
>> related discussion? I tried a few google searches without luck.
>>
>> Is it specific to the 24-core? The slowdown I described happened on a
>> 32-core Epyc single socket as well as a dual-socket 24-core AMD Epyc
>> system.
>
> Hello
>
> Yes, it's 24-core specific (that's the only core count that doesn't have
> 8 cores per Zeppelin module).
>
> The commit in Linux git master is 2b83809a5e6d619a780876fcaf68cdc42b50d28c
>
> Brice
>
> commit 2b83809a5e6d619a780876fcaf68cdc42b50d28c
> Author: Suravee Suthikulpanit
> Date:   Mon Jul 31 10:51:59 2017 +0200
>
>     x86/cpu/amd: Derive L3 shared_cpu_map from cpu_llc_shared_mask
>
>     For systems with X86_FEATURE_TOPOEXT, current logic uses the APIC ID
>     to calculate shared_cpu_map. However, APIC IDs are not guaranteed to
>     be contiguous for cores across different L3s (e.g. family17h system
>     w/ downcore configuration). This breaks the logic, and results in an
>     incorrect L3 shared_cpu_map.
>
>     Instead, always use the previously calculated cpu_llc_shared_mask of
>     each CPU to derive the L3 shared_cpu_map.
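In case it helps anyone reproduce the first-12-vs-last-12 comparison, here is a minimal sketch using the hwloc binding API. The 12-core split and the core indices are assumptions matching the test described above, and hwloc's logical core numbering may differ from the OS numbering:

#include <hwloc.h>
#include <stdio.h>
#include <stdlib.h>

/* Bind the current process to hwloc logical cores [first, first+count).
 * Run the benchmark once with first=0 and once with first=12 to compare
 * the two halves of the 24-core Epyc. */
int main(int argc, char *argv[])
{
    int first = argc > 1 ? atoi(argv[1]) : 0;
    int count = argc > 2 ? atoi(argv[2]) : 12;

    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    hwloc_bitmap_t set = hwloc_bitmap_alloc();
    for (int i = first; i < first + count; i++) {
        hwloc_obj_t core = hwloc_get_obj_by_type(topo, HWLOC_OBJ_CORE, i);
        if (core)
            hwloc_bitmap_or(set, set, core->cpuset);
    }

    if (hwloc_set_cpubind(topo, set, HWLOC_CPUBIND_PROCESS))
        perror("hwloc_set_cpubind");

    /* ... run the computation here ... */

    hwloc_bitmap_free(set);
    hwloc_topology_destroy(topo);
    return 0;
}

Running the same workload under both bindings (and comparing against lstopo's reported L3 groups) should show whether the asymmetry follows the cache layout or something else.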
Re: [hwloc-users] AMD EPYC topology
On 29/12/2017 at 23:15, Bill Broadley wrote:
>
> Very interesting. I was running parallel finite-element code and was
> seeing great performance compared to Intel in most cases, but on larger
> runs it was 20x slower. This would explain it.
>
> Do you know which commit, or anything else that might help find any
> related discussion? I tried a few google searches without luck.
>
> Is it specific to the 24-core? The slowdown I described happened on a
> 32-core Epyc single socket as well as a dual-socket 24-core AMD Epyc
> system.

Hello

Yes, it's 24-core specific (that's the only core count that doesn't have 8 cores per Zeppelin module).

The commit in Linux git master is 2b83809a5e6d619a780876fcaf68cdc42b50d28c

Brice

commit 2b83809a5e6d619a780876fcaf68cdc42b50d28c
Author: Suravee Suthikulpanit
Date:   Mon Jul 31 10:51:59 2017 +0200

    x86/cpu/amd: Derive L3 shared_cpu_map from cpu_llc_shared_mask

    For systems with X86_FEATURE_TOPOEXT, current logic uses the APIC ID
    to calculate shared_cpu_map. However, APIC IDs are not guaranteed to
    be contiguous for cores across different L3s (e.g. family17h system
    w/ downcore configuration). This breaks the logic, and results in an
    incorrect L3 shared_cpu_map.

    Instead, always use the previously calculated cpu_llc_shared_mask of
    each CPU to derive the L3 shared_cpu_map.
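As a quick way to check whether a given kernel carries this fix, the L3 sharing map the commit corrects is exported in sysfs and can be read back directly. A small sketch, assuming cache index3 is the L3 (as it is on family 17h):

#include <stdio.h>

/* Print each CPU's L3 shared_cpu_list as reported by the kernel.
 * On a fixed kernel each list should contain only the PUs that truly
 * share that L3; on a broken kernel the lists come out wrong on the
 * 24-core parts. */
int main(void)
{
    char path[128], buf[256];
    for (int cpu = 0; ; cpu++) {
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%d/cache/index3/shared_cpu_list",
                 cpu);
        FILE *f = fopen(path, "r");
        if (!f)
            break; /* no more CPUs (or no index3 cache on this machine) */
        if (fgets(buf, sizeof buf, f))
            printf("cpu%d L3 shared with: %s", cpu, buf);
        fclose(f);
    }
    return 0;
}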
Re: [hwloc-users] AMD EPYC topology
Very interesting. I was running parallel finite-element code and was seeing great performance compared to Intel in most cases, but on larger runs it was 20x slower. This would explain it.

Do you know which commit, or anything else that might help find any related discussion? I tried a few google searches without luck.

Is it specific to the 24-core? The slowdown I described happened on a 32-core Epyc single socket as well as a dual-socket 24-core AMD Epyc system.
Re: [hwloc-users] AMD EPYC topology
Hello

Make sure you use a very recent Linux kernel. There was a bug regarding L3 caches on 24-core Epyc processors which was fixed in 4.14 and backported in 4.13.x (and maybe in distro kernels too). However, that alone would likely not cause a huge performance difference unless your application depends heavily on the L3 cache.

Brice

On 24 December 2017 12:46:01 GMT+01:00, Matthew Scutter wrote:
> I'm getting poor performance on OpenMPI tasks on a new AMD 7401P EPYC
> server. I suspect hwloc providing a poor topology may have something to
> do with it, as I receive the warning below when creating a job.
> Requested data files are available at http://static.skysight.io/out.tgz
> Cheers,
> Matthew
>
> ****************************************************************************
> * hwloc 1.11.8 has encountered what looks like an error from the operating system.
> *
> * L3 (cpuset 0x6060) intersects with NUMANode (P#0 cpuset 0x3f3f nodeset 0x0001) without inclusion!
> * Error occurred in topology.c line 1088
> *
> * The following FAQ entry in the hwloc documentation may help:
> *   What should I do when hwloc reports "operating system" warnings?
> * Otherwise please report this error message to the hwloc user's mailing list,
> * along with the files generated by the hwloc-gather-topology script.
> ****************************************************************************
>
> depth 0:          1 Machine (type #1)
>  depth 1:         1 Package (type #3)
>   depth 2:        4 NUMANode (type #2)
>    depth 3:       10 L3Cache (type #4)
>     depth 4:      24 L2Cache (type #4)
>      depth 5:     24 L1dCache (type #4)
>       depth 6:    24 L1iCache (type #4)
>        depth 7:   24 Core (type #5)
>         depth 8:  48 PU (type #6)
> Special depth -3: 12 Bridge (type #9)
> Special depth -4: 9 PCI Device (type #10)
> Special depth -5: 4 OS Device (type #11)
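For reference, the inclusion check that produces this warning can be reproduced against the hwloc 1.11 API in a few lines. This is a sketch, not the actual topology.c code; note that hwloc 1.11 uses a single CACHE object type with a depth attribute rather than a dedicated L3 type:

#include <hwloc.h>
#include <stdio.h>

/* Re-run the L3-vs-NUMANode consistency check that hwloc performs at
 * topology-build time: every L3 cpuset should either contain or be
 * contained in every NUMA node cpuset it intersects. */
int main(void)
{
    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    hwloc_obj_t l3 = NULL;
    while ((l3 = hwloc_get_next_obj_by_type(topo, HWLOC_OBJ_CACHE, l3))) {
        if (l3->attr->cache.depth != 3)
            continue; /* skip L1/L2, keep only L3 */
        hwloc_obj_t node = NULL;
        while ((node = hwloc_get_next_obj_by_type(topo, HWLOC_OBJ_NUMANODE, node))) {
            if (hwloc_bitmap_intersects(l3->cpuset, node->cpuset)
                && !hwloc_bitmap_isincluded(l3->cpuset, node->cpuset)
                && !hwloc_bitmap_isincluded(node->cpuset, l3->cpuset)) {
                char a[64], b[64];
                hwloc_bitmap_snprintf(a, sizeof a, l3->cpuset);
                hwloc_bitmap_snprintf(b, sizeof b, node->cpuset);
                printf("L3 %s intersects NUMANode %s without inclusion\n", a, b);
            }
        }
    }
    hwloc_topology_destroy(topo);
    return 0;
}

On a kernel with the broken shared_cpu_map this prints the same offending cpuset pairs as the warning above; on a fixed kernel it prints nothing.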