Hi Brice

Here are answers to your questions,
and my latest attempt to solve the problem:

1) Kernel version:

The nodes with new motherboards (node14 and node16) have the
same kernel as the nodes with original motherboards (e.g. node15),
as they were cloned from the same node image:

[root@node14 ~]# uname -a
Linux node14 2.6.32-358.2.1.el6.x86_64 #1 SMP Wed Mar 13 00:26:49 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

[root@node16 ~]# uname -a
Linux node16 2.6.32-358.2.1.el6.x86_64 #1 SMP Wed Mar 13 00:26:49 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

[root@node15 ~]# uname -a
Linux node15 2.6.32-358.2.1.el6.x86_64 #1 SMP Wed Mar 13 00:26:49 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

**

2) BIOS setup

Besides having different BIOS versions (AMI 3.5 in the new motherboards
vs. 3.0 in the old ones), there are slight differences in the BIOS setup.
However, the setup is identical in node14, which had the hwloc problems,
and node16, which didn't have hwloc problems.  So I am inclined to think
that any differences in BIOS setup are unlikely to cause the problem.

The only item in the BIOS setting that I think may tangentially affect this is in Advanced->Processor and Clock Settings, where the new motherboards set:

PowerNow     = enabled
C-state mode = C6
Power Cap    = P-state 0
HPC mode     = disabled

whereas the old motherboards have

PowerNow=disabled
[and the other three items above are hidden because of this setting]

Do you think this may cause the hwloc problem?

There are other minor differences in the BIOS setup, which I will remove,
but they are probably not relevant (IDE config, remote access, etc.).

**

3) dmesg|grep SRAT

I attach the results.
They are identical on nodes 14 and 16, and differ from node15
only on the first line:

[gus@galera ~]$ diff node14_dmesg_grep_SRAT node15_dmesg_grep_SRAT
1c1
< ACPI: SRAT 00000000dfeaa700 00320 (v02 AMD AGESA 00000001 AMD 00000001)
---
> ACPI: SRAT 00000000dfeaa6f0 00320 (v02 AMD AGESA 00000001 AMD 00000001)
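
The comparison above can also be done while ignoring the table address, which is the only thing that differs. Here is a minimal Python sketch, assuming the dumps are plain "dmesg | grep SRAT" output; the function names normalize_srat and same_srat are made up for illustration and are not part of hwloc or any other tool:

```python
import re

def normalize_srat(lines):
    """Return SRAT lines with timestamps, table address and padding removed."""
    out = []
    for line in lines:
        # Drop an optional dmesg timestamp such as "[    0.000000] ".
        line = re.sub(r"^\[\s*\d+\.\d+\]\s*", "", line.strip())
        if not line:
            continue
        # Collapse runs of spaces so differently padded dumps compare equal.
        line = " ".join(line.split())
        # Mask the physical address of the table on the header line.
        line = re.sub(r"^ACPI: SRAT [0-9a-fA-F]+", "ACPI: SRAT <addr>", line)
        out.append(line)
    return out

def same_srat(a, b):
    """True if two dumps describe the same CPU/memory to node mapping."""
    return normalize_srat(a) == normalize_srat(b)

# Short excerpts of the node14 and node15 dumps, for illustration only.
node14 = [
    "ACPI: SRAT 00000000dfeaa700 00320 (v02 AMD AGESA 00000001 AMD 00000001)",
    "SRAT: PXM 0 -> APIC 32 -> Node 0",
]
node15 = [
    "ACPI: SRAT 00000000dfeaa6f0 00320 (v02 AMD AGESA 00000001 AMD 00000001)",
    "SRAT: PXM 0 -> APIC 32 -> Node 0",
]
print(same_srat(node14, node15))
```

With full dumps, one would read each file with open(...).read().splitlines() and pass the two lists to same_srat.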

**

4) Cleaned/reseated processors, rebooted node14, ran hwloc-gather-topology again.

I opened node14, cleaned and re-seated the processors and heatsinks.
I can't see anything out of the ordinary there.

I rebooted the node and ran hwloc-gather-topology again.
This time it didn't throw any errors on the terminal window,
which may be a good sign.

[root@node14 ~]# hwloc-gather-topology /tmp/`date +"%Y%m%d%H%M"`.$(uname -n)
Hierarchy gathered in /tmp/201403031639.node14.tar.bz2 and kept in /tmp/tmp.FM97IQCCKc/201403031639.node14/
Expected topology output stored in /tmp/201403031639.node14.output

I attach the diagnostic files.
Was the problem fixed with the processor re-seating, or is it still there?


You characterized the hwloc error before as this:

On 02/28/2014 03:23 PM, Brice Goglin wrote:
> OK, the problem is that node14's BIOS reports invalid NUMA info. It
> properly detects 2 sockets with 16-cores each. But it reports 2 NUMA
> nodes total, instead of 2 per socket (4 total). And hwloc warns because
> the cores contained in these NUMA nodes are incompatible with sockets:
> socket0 contains 0-15
> socket1 contains 16-31
> NUMA node0 contains 0-7+16-23
> NUMA node1 contains 8-15+24-31
>

After reseating the processors, when I run lstopo on node14,
now it shows four NUMA nodes:

NUMA node L#0 with cores 0-7,
NUMA node L#1 with cores 8-15,
NUMA node L#2 with cores 16-23,
NUMA node L#3 with cores 24-31.

Is the lstopo output all I need to check?
Or do I need to sweep /sys subdirectories to see if it is consistent?
Which /sys subdirectories should I check?
Or alternatively which files in the hwloc-gather-topology output?
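
One possible cross-check, as a rough sketch only: the condition hwloc warned about is a NUMA node whose cores sit in more than one socket, and that can be tested directly. The Python below assumes the usual Linux sysfs layout (/sys/devices/system/node/nodeN/cpulist and /sys/devices/system/cpu/cpuN/topology/physical_package_id); the function names are made up for illustration, and the sample data assumes socket1 holds cores 16-31:

```python
import glob
import os
import re

def parse_cpulist(text):
    """Expand a sysfs cpulist such as '0-7,16-23' into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus

def nodes_spanning_sockets(node_cpus, cpu_socket):
    """Return {node: sorted socket ids} for NUMA nodes that span several sockets."""
    bad = {}
    for node, cpus in node_cpus.items():
        sockets = {cpu_socket[c] for c in cpus}
        if len(sockets) > 1:
            bad[node] = sorted(sockets)
    return bad

def read_sysfs(root="/sys/devices/system"):
    """Collect (node_cpus, cpu_socket) from sysfs; assumes the standard layout."""
    node_cpus = {}
    for path in glob.glob(os.path.join(root, "node", "node[0-9]*", "cpulist")):
        node = int(re.search(r"node(\d+)$", os.path.dirname(path)).group(1))
        with open(path) as f:
            node_cpus[node] = parse_cpulist(f.read())
    cpu_socket = {}
    pattern = os.path.join(root, "cpu", "cpu[0-9]*", "topology", "physical_package_id")
    for path in glob.glob(pattern):
        cpu = int(re.search(r"cpu(\d+)", path.split(os.sep)[-3]).group(1))
        with open(path) as f:
            cpu_socket[cpu] = int(f.read())
    return node_cpus, cpu_socket

# Data as described in the thread: two sockets of 16 cores each.
cpu_socket = {c: 0 if c < 16 else 1 for c in range(32)}
before = {0: parse_cpulist("0-7,16-23"), 1: parse_cpulist("8-15,24-31")}
after = {0: parse_cpulist("0-7"), 1: parse_cpulist("8-15"),
         2: parse_cpulist("16-23"), 3: parse_cpulist("24-31")}
print(nodes_spanning_sockets(before, cpu_socket))
print(nodes_spanning_sockets(after, cpu_socket))
```

On a live node, nodes_spanning_sockets(*read_sysfs()) should return an empty dictionary if the NUMA nodes nest cleanly inside the sockets. The same files also appear under the directory kept by hwloc-gather-topology, so read_sysfs could be pointed at that copy by changing root.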

Many thanks for your help,
Gus Correa


On 02/28/2014 03:53 PM, Brice Goglin wrote:
On 28/02/2014 21:30, Gus Correa wrote:
Hi Brice

The (pdf) output of lstopo shows one L1d (16k) for each core,
and one L1i (64k) for each *pair* of cores.
Is this wrong?

It's correct. AMD uses this "dual-core compute unit" where L2 and L1i
are shared but L1d isn't.

BTW, if there are any helpful web links, or references, or graphs
about the AMD cache structure, I would love to know.

I don't have a common place to find all information unfortunately. Cache
sizes are easy to find, but sharing isn't always specified. I often end
up reading early processor reviews on tech sites such as
http://www.anandtech.com/show/4955/the-bulldozer-review-amd-fx8150-tested

I am a bit skeptical that the BIOS is the culprit because I replaced
two motherboards (node14 and node16), and only node14 doesn't pass
the hwloc-gather-topology test.
Just in case, I attach the diagnostic for node16 also,

Hmmm that's very interesting. I assume you have the same kernels on all
these machines?
I have seen a couple cases where the kernel would change the topology
for the same version of the BIOS (for instance old kernels didn't know
that L1i is shared by pairs of cores on your CPU), but I have never seen
a case where the kernel changes and *breaks* things.

Can you compare the output of "dmesg | grep SRAT" (or grep SRAT
/var/log/dmesg or kern.log or whatever on your distro) on these nodes?
SRAT is the hardware table that the kernel reads before filling sysfs.
You'll see
[ 0.000000] SRAT: PXM 0 -> APIC 0x07 -> Node 0
which basically means that CPU7 is close to NUMA node 0.

If you only see Nodes 0-1 on node14, and Nodes 0-3 on node15 and node16,
that would at least confirm that the bug is in the hardware.
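
Counting the nodes in such a dump can be scripted. This is a minimal Python sketch, assuming lines of the form "SRAT: PXM n -> APIC id -> Node n" with the APIC id printed either in decimal or as 0x-prefixed hex; apics_per_node is a hypothetical helper name:

```python
import re
from collections import defaultdict

# Matches e.g. "[ 0.000000] SRAT: PXM 0 -> APIC 0x07 -> Node 0"
# as well as "SRAT: PXM 0 -> APIC 32 -> Node 0".
SRAT_CPU = re.compile(r"SRAT: PXM (\d+) -> APIC (0x[0-9a-fA-F]+|\d+) -> Node (\d+)")

def apics_per_node(lines):
    """Group APIC ids by the NUMA node the SRAT assigns them to."""
    nodes = defaultdict(list)
    for line in lines:
        m = SRAT_CPU.search(line)
        if not m:
            continue  # header and memory-range lines are skipped
        text = m.group(2)
        apic = int(text, 16) if text.lower().startswith("0x") else int(text)
        nodes[int(m.group(3))].append(apic)
    return dict(nodes)

sample = [
    "[ 0.000000] SRAT: PXM 0 -> APIC 0x07 -> Node 0",
    "SRAT: PXM 0 -> APIC 32 -> Node 0",
    "SRAT: PXM 1 -> APIC 40 -> Node 1",
    "SRAT: Node 0 PXM 0 0-a0000",
]
mapping = apics_per_node(sample)
print(sorted(mapping), {n: len(a) for n, a in mapping.items()})
```

The number of keys in the result is the number of NUMA nodes the SRAT describes, so a dump listing only Nodes 0-1 would give two keys and one listing Nodes 0-3 would give four.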

One last idea could be a different BIOS config, and the BIOS being buggy
only in one of these configs. I've seen that with "interleaved" NUMA
memory config in Supermicro BIOS several years ago.

Brice



if you want to take a look. :)

FYI, the two new motherboards (nodes 14 and 16)
have a *newer* BIOS version (AMI, version 3.5, 11/25/2013)
than the one in the
original nodes (node15 below) (AMI, version 3.0, 08/31/2012).
I even thought of upgrading the old nodes' BIOSes ...
... but now I am not so sure about this ... :(

New motherboards:

[root@node14 ~]# dmidecode -s bios-vendor
American Megatrends Inc.
[root@node14 ~]# dmidecode -s bios-version
3.5
[root@node14 ~]# dmidecode -s bios-release-date
11/25/2013

**

[root@node16 ~]# dmidecode -s bios-vendor
American Megatrends Inc.
[root@node16 ~]# dmidecode -s bios-version
3.5
[root@node16 ~]# dmidecode -s bios-release-date
11/25/2013

**

Original motherboard:

[root@node15 ~]# dmidecode -s bios-vendor
American Megatrends Inc.
[root@node15 ~]# dmidecode -s bios-version
3.0
[root@node15 ~]# dmidecode -s bios-release-date
08/31/2012

**

Thanks again for your help and advice.

Gus Correa

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




ACPI: SRAT 00000000dfeaa700 00320 (v02 AMD    AGESA    00000001 AMD  00000001)
SRAT: PXM 0 -> APIC 32 -> Node 0
SRAT: PXM 0 -> APIC 33 -> Node 0
SRAT: PXM 0 -> APIC 34 -> Node 0
SRAT: PXM 0 -> APIC 35 -> Node 0
SRAT: PXM 0 -> APIC 36 -> Node 0
SRAT: PXM 0 -> APIC 37 -> Node 0
SRAT: PXM 0 -> APIC 38 -> Node 0
SRAT: PXM 0 -> APIC 39 -> Node 0
SRAT: PXM 1 -> APIC 40 -> Node 1
SRAT: PXM 1 -> APIC 41 -> Node 1
SRAT: PXM 1 -> APIC 42 -> Node 1
SRAT: PXM 1 -> APIC 43 -> Node 1
SRAT: PXM 1 -> APIC 44 -> Node 1
SRAT: PXM 1 -> APIC 45 -> Node 1
SRAT: PXM 1 -> APIC 46 -> Node 1
SRAT: PXM 1 -> APIC 47 -> Node 1
SRAT: PXM 2 -> APIC 64 -> Node 2
SRAT: PXM 2 -> APIC 65 -> Node 2
SRAT: PXM 2 -> APIC 66 -> Node 2
SRAT: PXM 2 -> APIC 67 -> Node 2
SRAT: PXM 2 -> APIC 68 -> Node 2
SRAT: PXM 2 -> APIC 69 -> Node 2
SRAT: PXM 2 -> APIC 70 -> Node 2
SRAT: PXM 2 -> APIC 71 -> Node 2
SRAT: PXM 3 -> APIC 72 -> Node 3
SRAT: PXM 3 -> APIC 73 -> Node 3
SRAT: PXM 3 -> APIC 74 -> Node 3
SRAT: PXM 3 -> APIC 75 -> Node 3
SRAT: PXM 3 -> APIC 76 -> Node 3
SRAT: PXM 3 -> APIC 77 -> Node 3
SRAT: PXM 3 -> APIC 78 -> Node 3
SRAT: PXM 3 -> APIC 79 -> Node 3
SRAT: Node 0 PXM 0 0-a0000
SRAT: Node 0 PXM 0 100000-e0000000
SRAT: Node 0 PXM 0 100000000-820000000
SRAT: Node 1 PXM 1 820000000-1020000000
SRAT: Node 2 PXM 2 1020000000-1820000000
SRAT: Node 3 PXM 3 1820000000-201f000000
ACPI: SRAT 00000000dfeaa6f0 00320 (v02 AMD    AGESA    00000001 AMD  00000001)
SRAT: PXM 0 -> APIC 32 -> Node 0
SRAT: PXM 0 -> APIC 33 -> Node 0
SRAT: PXM 0 -> APIC 34 -> Node 0
SRAT: PXM 0 -> APIC 35 -> Node 0
SRAT: PXM 0 -> APIC 36 -> Node 0
SRAT: PXM 0 -> APIC 37 -> Node 0
SRAT: PXM 0 -> APIC 38 -> Node 0
SRAT: PXM 0 -> APIC 39 -> Node 0
SRAT: PXM 1 -> APIC 40 -> Node 1
SRAT: PXM 1 -> APIC 41 -> Node 1
SRAT: PXM 1 -> APIC 42 -> Node 1
SRAT: PXM 1 -> APIC 43 -> Node 1
SRAT: PXM 1 -> APIC 44 -> Node 1
SRAT: PXM 1 -> APIC 45 -> Node 1
SRAT: PXM 1 -> APIC 46 -> Node 1
SRAT: PXM 1 -> APIC 47 -> Node 1
SRAT: PXM 2 -> APIC 64 -> Node 2
SRAT: PXM 2 -> APIC 65 -> Node 2
SRAT: PXM 2 -> APIC 66 -> Node 2
SRAT: PXM 2 -> APIC 67 -> Node 2
SRAT: PXM 2 -> APIC 68 -> Node 2
SRAT: PXM 2 -> APIC 69 -> Node 2
SRAT: PXM 2 -> APIC 70 -> Node 2
SRAT: PXM 2 -> APIC 71 -> Node 2
SRAT: PXM 3 -> APIC 72 -> Node 3
SRAT: PXM 3 -> APIC 73 -> Node 3
SRAT: PXM 3 -> APIC 74 -> Node 3
SRAT: PXM 3 -> APIC 75 -> Node 3
SRAT: PXM 3 -> APIC 76 -> Node 3
SRAT: PXM 3 -> APIC 77 -> Node 3
SRAT: PXM 3 -> APIC 78 -> Node 3
SRAT: PXM 3 -> APIC 79 -> Node 3
SRAT: Node 0 PXM 0 0-a0000
SRAT: Node 0 PXM 0 100000-e0000000
SRAT: Node 0 PXM 0 100000000-820000000
SRAT: Node 1 PXM 1 820000000-1020000000
SRAT: Node 2 PXM 2 1020000000-1820000000
SRAT: Node 3 PXM 3 1820000000-201f000000
ACPI: SRAT 00000000dfeaa700 00320 (v02 AMD    AGESA    00000001 AMD  00000001)
SRAT: PXM 0 -> APIC 32 -> Node 0
SRAT: PXM 0 -> APIC 33 -> Node 0
SRAT: PXM 0 -> APIC 34 -> Node 0
SRAT: PXM 0 -> APIC 35 -> Node 0
SRAT: PXM 0 -> APIC 36 -> Node 0
SRAT: PXM 0 -> APIC 37 -> Node 0
SRAT: PXM 0 -> APIC 38 -> Node 0
SRAT: PXM 0 -> APIC 39 -> Node 0
SRAT: PXM 1 -> APIC 40 -> Node 1
SRAT: PXM 1 -> APIC 41 -> Node 1
SRAT: PXM 1 -> APIC 42 -> Node 1
SRAT: PXM 1 -> APIC 43 -> Node 1
SRAT: PXM 1 -> APIC 44 -> Node 1
SRAT: PXM 1 -> APIC 45 -> Node 1
SRAT: PXM 1 -> APIC 46 -> Node 1
SRAT: PXM 1 -> APIC 47 -> Node 1
SRAT: PXM 2 -> APIC 64 -> Node 2
SRAT: PXM 2 -> APIC 65 -> Node 2
SRAT: PXM 2 -> APIC 66 -> Node 2
SRAT: PXM 2 -> APIC 67 -> Node 2
SRAT: PXM 2 -> APIC 68 -> Node 2
SRAT: PXM 2 -> APIC 69 -> Node 2
SRAT: PXM 2 -> APIC 70 -> Node 2
SRAT: PXM 2 -> APIC 71 -> Node 2
SRAT: PXM 3 -> APIC 72 -> Node 3
SRAT: PXM 3 -> APIC 73 -> Node 3
SRAT: PXM 3 -> APIC 74 -> Node 3
SRAT: PXM 3 -> APIC 75 -> Node 3
SRAT: PXM 3 -> APIC 76 -> Node 3
SRAT: PXM 3 -> APIC 77 -> Node 3
SRAT: PXM 3 -> APIC 78 -> Node 3
SRAT: PXM 3 -> APIC 79 -> Node 3
SRAT: Node 0 PXM 0 0-a0000
SRAT: Node 0 PXM 0 100000-e0000000
SRAT: Node 0 PXM 0 100000000-820000000
SRAT: Node 1 PXM 1 820000000-1020000000
SRAT: Node 2 PXM 2 1020000000-1820000000
SRAT: Node 3 PXM 3 1820000000-201f000000
Machine (P#0 total=134199400KB DMIProductName=H8DGU DMIProductVersion=1234567890 DMIProductSerial=1234567890 DMIProductUUID=534D4349-0002-F190-2500-F1902500637D DMIBoardVendor=Supermicro DMIBoardName=H8DGU DMIBoardVersion=1234567890 DMIBoardSerial=NM141S600018 DMIBoardAssetTag="To Be Filled By O.E.M." DMIChassisVendor=Supermicro DMIChassisType=17 DMIChassisVersion=1234567890 DMIChassisSerial=1234567890 DMIChassisAssetTag="To Be Filled By O.E.M." DMIBIOSVendor="American Megatrends Inc." DMIBIOSVersion="3.5       " DMIBIOSDate=11/25/2013 DMISysVendor=Supermicro Backend=Linux LinuxCgroup=/)
  Socket L#0 (P#0 total=67106920KB CPUModel="AMD Opteron(tm) Processor 6376     
            ")
    NUMANode L#0 (P#0 local=33552488KB total=33552488KB)
      L3Cache L#0 (size=6144KB linesize=64 ways=64)
        L2Cache L#0 (size=2048KB linesize=64 ways=16)
          L1iCache L#0 (size=64KB linesize=64 ways=2)
            L1dCache L#0 (size=16KB linesize=64 ways=4)
              Core L#0 (P#0)
                PU L#0 (P#0)
            L1dCache L#1 (size=16KB linesize=64 ways=4)
              Core L#1 (P#1)
                PU L#1 (P#1)
        L2Cache L#1 (size=2048KB linesize=64 ways=16)
          L1iCache L#1 (size=64KB linesize=64 ways=2)
            L1dCache L#2 (size=16KB linesize=64 ways=4)
              Core L#2 (P#2)
                PU L#2 (P#2)
            L1dCache L#3 (size=16KB linesize=64 ways=4)
              Core L#3 (P#3)
                PU L#3 (P#3)
        L2Cache L#2 (size=2048KB linesize=64 ways=16)
          L1iCache L#2 (size=64KB linesize=64 ways=2)
            L1dCache L#4 (size=16KB linesize=64 ways=4)
              Core L#4 (P#4)
                PU L#4 (P#4)
            L1dCache L#5 (size=16KB linesize=64 ways=4)
              Core L#5 (P#5)
                PU L#5 (P#5)
        L2Cache L#3 (size=2048KB linesize=64 ways=16)
          L1iCache L#3 (size=64KB linesize=64 ways=2)
            L1dCache L#6 (size=16KB linesize=64 ways=4)
              Core L#6 (P#6)
                PU L#6 (P#6)
            L1dCache L#7 (size=16KB linesize=64 ways=4)
              Core L#7 (P#7)
                PU L#7 (P#7)
    NUMANode L#1 (P#1 local=33554432KB total=33554432KB)
      L3Cache L#1 (size=6144KB linesize=64 ways=64)
        L2Cache L#4 (size=2048KB linesize=64 ways=16)
          L1iCache L#4 (size=64KB linesize=64 ways=2)
            L1dCache L#8 (size=16KB linesize=64 ways=4)
              Core L#8 (P#0)
                PU L#8 (P#8)
            L1dCache L#9 (size=16KB linesize=64 ways=4)
              Core L#9 (P#1)
                PU L#9 (P#9)
        L2Cache L#5 (size=2048KB linesize=64 ways=16)
          L1iCache L#5 (size=64KB linesize=64 ways=2)
            L1dCache L#10 (size=16KB linesize=64 ways=4)
              Core L#10 (P#2)
                PU L#10 (P#10)
            L1dCache L#11 (size=16KB linesize=64 ways=4)
              Core L#11 (P#3)
                PU L#11 (P#11)
        L2Cache L#6 (size=2048KB linesize=64 ways=16)
          L1iCache L#6 (size=64KB linesize=64 ways=2)
            L1dCache L#12 (size=16KB linesize=64 ways=4)
              Core L#12 (P#4)
                PU L#12 (P#12)
            L1dCache L#13 (size=16KB linesize=64 ways=4)
              Core L#13 (P#5)
                PU L#13 (P#13)
        L2Cache L#7 (size=2048KB linesize=64 ways=16)
          L1iCache L#7 (size=64KB linesize=64 ways=2)
            L1dCache L#14 (size=16KB linesize=64 ways=4)
              Core L#14 (P#6)
                PU L#14 (P#14)
            L1dCache L#15 (size=16KB linesize=64 ways=4)
              Core L#15 (P#7)
                PU L#15 (P#15)
  Socket L#1 (P#1 total=67092480KB CPUModel="AMD Opteron(tm) Processor 6376     
            ")
    NUMANode L#2 (P#2 local=33554432KB total=33554432KB)
      L3Cache L#2 (size=6144KB linesize=64 ways=64)
        L2Cache L#8 (size=2048KB linesize=64 ways=16)
          L1iCache L#8 (size=64KB linesize=64 ways=2)
            L1dCache L#16 (size=16KB linesize=64 ways=4)
              Core L#16 (P#0)
                PU L#16 (P#16)
            L1dCache L#17 (size=16KB linesize=64 ways=4)
              Core L#17 (P#1)
                PU L#17 (P#17)
        L2Cache L#9 (size=2048KB linesize=64 ways=16)
          L1iCache L#9 (size=64KB linesize=64 ways=2)
            L1dCache L#18 (size=16KB linesize=64 ways=4)
              Core L#18 (P#2)
                PU L#18 (P#18)
            L1dCache L#19 (size=16KB linesize=64 ways=4)
              Core L#19 (P#3)
                PU L#19 (P#19)
        L2Cache L#10 (size=2048KB linesize=64 ways=16)
          L1iCache L#10 (size=64KB linesize=64 ways=2)
            L1dCache L#20 (size=16KB linesize=64 ways=4)
              Core L#20 (P#4)
                PU L#20 (P#20)
            L1dCache L#21 (size=16KB linesize=64 ways=4)
              Core L#21 (P#5)
                PU L#21 (P#21)
        L2Cache L#11 (size=2048KB linesize=64 ways=16)
          L1iCache L#11 (size=64KB linesize=64 ways=2)
            L1dCache L#22 (size=16KB linesize=64 ways=4)
              Core L#22 (P#6)
                PU L#22 (P#22)
            L1dCache L#23 (size=16KB linesize=64 ways=4)
              Core L#23 (P#7)
                PU L#23 (P#23)
    NUMANode L#3 (P#3 local=33538048KB total=33538048KB)
      L3Cache L#3 (size=6144KB linesize=64 ways=64)
        L2Cache L#12 (size=2048KB linesize=64 ways=16)
          L1iCache L#12 (size=64KB linesize=64 ways=2)
            L1dCache L#24 (size=16KB linesize=64 ways=4)
              Core L#24 (P#0)
                PU L#24 (P#24)
            L1dCache L#25 (size=16KB linesize=64 ways=4)
              Core L#25 (P#1)
                PU L#25 (P#25)
        L2Cache L#13 (size=2048KB linesize=64 ways=16)
          L1iCache L#13 (size=64KB linesize=64 ways=2)
            L1dCache L#26 (size=16KB linesize=64 ways=4)
              Core L#26 (P#2)
                PU L#26 (P#26)
            L1dCache L#27 (size=16KB linesize=64 ways=4)
              Core L#27 (P#3)
                PU L#27 (P#27)
        L2Cache L#14 (size=2048KB linesize=64 ways=16)
          L1iCache L#14 (size=64KB linesize=64 ways=2)
            L1dCache L#28 (size=16KB linesize=64 ways=4)
              Core L#28 (P#4)
                PU L#28 (P#28)
            L1dCache L#29 (size=16KB linesize=64 ways=4)
              Core L#29 (P#5)
                PU L#29 (P#29)
        L2Cache L#15 (size=2048KB linesize=64 ways=16)
          L1iCache L#15 (size=64KB linesize=64 ways=2)
            L1dCache L#30 (size=16KB linesize=64 ways=4)
              Core L#30 (P#6)
                PU L#30 (P#30)
            L1dCache L#31 (size=16KB linesize=64 ways=4)
              Core L#31 (P#7)
                PU L#31 (P#31)
depth 0:          1 Machine (type #1)
 depth 1:         2 Socket (type #3)
  depth 2:        4 NUMANode (type #2)
   depth 3:       4 L3Cache (type #4)
    depth 4:      16 L2Cache (type #4)
     depth 5:     16 L1iCache (type #4)
      depth 6:    32 L1dCache (type #4)
       depth 7:   32 Core (type #5)
        depth 8:  32 PU (type #6)
latency matrix between NUMANodes (depth 2) by logical indexes:
  index     0     1     2     3
      0 1.000 1.600 1.600 1.600
      1 1.600 1.000 1.600 1.600
      2 1.600 1.600 1.000 1.600
      3 1.600 1.600 1.600 1.000
Topology not from this system

Attachment: 201403031639.node14.tar.bz2
Description: application/bzip
