Re: [hwloc-users] node configuration differs form hardware

2014-05-28 Thread Craig Kapfer
Wait, I'm sorry, I must be missing something, please bear with me! By the way, your discussion of groups 1 and 2 below is wrong. Group 2 doesn't say that NUMA node == socket, and it doesn't report 8 sockets of 8 cores each. It reports 4 sockets containing 2 NUMA nodes each containing 8 cores
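To make the arithmetic in this correction concrete, here is a minimal Python sketch that models the layout Craig describes: 4 sockets, each holding 2 NUMA nodes of 8 cores. It compares that reading with the "NUMA node == socket" misreading. The constants and helper names are hypothetical illustrations, not hwloc's data model or API.

```python
# Illustrative model of the layout described above (not hwloc's API):
# sockets contain NUMA nodes, and NUMA nodes contain cores.
SOCKETS = 4
NUMA_PER_SOCKET = 2
CORES_PER_NUMA = 8


def total_cores(sockets: int, numa_per_socket: int, cores_per_numa: int) -> int:
    """Total cores in a socket -> NUMA node -> core hierarchy."""
    return sockets * numa_per_socket * cores_per_numa


def misread_as_sockets(sockets: int, numa_per_socket: int, cores_per_numa: int) -> tuple:
    """(socket count, cores per socket) if every NUMA node were taken for a socket."""
    return sockets * numa_per_socket, cores_per_numa


if __name__ == "__main__":
    print(total_cores(SOCKETS, NUMA_PER_SOCKET, CORES_PER_NUMA))         # 64
    print(NUMA_PER_SOCKET * CORES_PER_NUMA)                              # 16 cores per socket
    print(misread_as_sockets(SOCKETS, NUMA_PER_SOCKET, CORES_PER_NUMA))  # (8, 8)
```

Both readings give 64 cores in total. They differ only in how those cores are grouped into sockets.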

Re: [hwloc-users] node configuration differs form hardware

2014-05-28 Thread Kenneth A. Lloyd
You have found what we found (also in other areas of OpenMPI): Slurm has some “interesting” behaviors. If it were easy, anyone could do it… Ken

Kenneth A. Lloyd, Jr.
CEO - Director, Systems Science
Watt Systems Technologies Inc.

Re: [hwloc-users] node configuration differs form hardware

2014-05-28 Thread Brice Goglin
On 28/05/2014 14:57, Craig Kapfer wrote:
> Hmm ... the slurm config defines that all nodes have 4 sockets with 16
> cores per socket (which corresponds to the hardware--all nodes are the
> same). Slurm node config is as follows:
>
> NodeName=n[001-008] RealMemory=258452 Sockets=4

Re: [hwloc-users] node configuration differs form hardware

2014-05-28 Thread Craig Kapfer
Hmm ... the slurm config defines that all nodes have 4 sockets with 16 cores per socket (which corresponds to the hardware--all nodes are the same). Slurm node config is as follows:

NodeName=n[001-008] RealMemory=258452 Sockets=4 CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN
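As a quick sanity check on this node definition, the socket, core and thread counts should multiply out to the 64 CPUs the nodes physically have. The sketch below assumes that a slurm.conf node line is a set of whitespace-separated KEY=VALUE tokens, which holds for the line quoted here. It is a hypothetical helper, not Slurm's own parser.

```python
from __future__ import annotations

# The node definition quoted in the message above.
LINE = ("NodeName=n[001-008] RealMemory=258452 Sockets=4 "
        "CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN")


def parse_node_line(line: str) -> dict[str, str]:
    """Split a slurm.conf-style node line into a {KEY: VALUE} dict (assumed format)."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens without '='
            fields[key] = value
    return fields


def expected_cpus(fields: dict[str, str]) -> int:
    """Logical CPUs implied by Sockets x CoresPerSocket x ThreadsPerCore."""
    return (int(fields["Sockets"])
            * int(fields["CoresPerSocket"])
            * int(fields.get("ThreadsPerCore", "1")))


if __name__ == "__main__":
    fields = parse_node_line(LINE)
    print(fields["NodeName"], expected_cpus(fields))  # n[001-008] 64
```

A matching total only shows that the CPU count is consistent. It says nothing about whether the per-socket grouping matches what the hardware reports.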

Re: [hwloc-users] node configuration differs form hardware

2014-05-28 Thread Brice Goglin
On 28/05/2014 14:13, Craig Kapfer wrote:
> Interesting, quite right, thank you very much. Yes these are AMD 6300
> series. Same kernel but these boxes seem to have different BIOS
> versions, direct from the factory, delivered in the same physical
> enclosure even! Some are AMI 3.5 and some

Re: [hwloc-users] node configuration differs form hardware

2014-05-28 Thread Craig Kapfer
Interesting, quite right, thank you very much. Yes these are AMD 6300 series. Same kernel but these boxes seem to have different BIOS versions, direct from the factory, delivered in the same physical enclosure even! Some are AMI 3.5 and some are 3.0. So slurm is then incorrectly parsing

Re: [hwloc-users] node configuration differs form hardware

2014-05-28 Thread Brice Goglin
Aside from the BIOS config, are you sure that you have the exact same BIOS *version* in each node? (You can check in /sys/class/dmi/id/bios_*.) Same Linux kernel too? Also, recently we've seen somebody fix such problems by unplugging and replugging some CPUs on the motherboard. Seems crazy but it
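To run the suggested check on each node, you could use a small Python sketch like the one below. It prints the contents of every /sys/class/dmi/id/bios_* file together with the running kernel release, so you can diff the output across nodes. It assumes a Linux sysfs layout. Which bios_* entries exist (for example bios_vendor, bios_version, bios_date) depends on the kernel, so the script reads whatever it finds and quietly skips anything missing or unreadable. The function name is hypothetical.

```python
import glob
import os
import platform


def read_bios_info(dmi_dir: str = "/sys/class/dmi/id") -> dict:
    """Return {filename: contents} for every bios_* file found under dmi_dir."""
    info = {}
    for path in sorted(glob.glob(os.path.join(dmi_dir, "bios_*"))):
        try:
            with open(path) as f:
                info[os.path.basename(path)] = f.read().strip()
        except OSError:
            # Missing or unreadable entry: skip it rather than abort.
            continue
    return info


if __name__ == "__main__":
    print("host:", platform.node())
    print("kernel:", platform.release())  # answers "Same Linux kernel too?"
    for name, value in read_bios_info().items():
        print(f"{name}: {value}")
```

Running this on every node (for example via your usual parallel shell) and comparing the outputs should quickly show which machines carry a different BIOS version.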

[hwloc-users] node configuration differs form hardware

2014-05-28 Thread Craig Kapfer
We have a bunch of 64-core (quad-socket, 16 cores/socket) AMD servers and some of them are reporting the following error from slurm, which I gather gets its info from hwloc:

May 27 11:53:04 n001 slurmd[3629]: Node configuration differs from hardware: CPUs=64:64(hw) Boards=1:1(hw)
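To identify which fields actually disagree, it may help to parse the log line. The Python sketch below assumes that each KEY=A:B(hw) token holds the configured value A and the hardware-detected value B, which is how the "(hw)" suffix reads. The archive truncates the line, so only CPUs and Boards are visible here, and both agree. The mismatch that triggered the message is presumably in one of the fields that were cut off. The helper names are hypothetical.

```python
import re

# Tokens of the form KEY=<configured>:<detected>(hw), e.g. "CPUs=64:64(hw)".
FIELD_RE = re.compile(r"(\w+)=(\d+):(\d+)\(hw\)")

# The (truncated) slurmd line quoted in the message above.
LOG = ("May 27 11:53:04 n001 slurmd[3629]: Node configuration differs "
       "from hardware: CPUs=64:64(hw) Boards=1:1(hw)")


def parse_mismatch_line(line: str) -> dict:
    """Map each field name to a (configured, detected) pair of ints."""
    return {key: (int(cfg), int(hw)) for key, cfg, hw in FIELD_RE.findall(line)}


def differing_fields(fields: dict) -> dict:
    """Keep only the fields whose configured and detected values disagree."""
    return {key: pair for key, pair in fields.items() if pair[0] != pair[1]}


if __name__ == "__main__":
    fields = parse_mismatch_line(LOG)
    print(fields)                    # {'CPUs': (64, 64), 'Boards': (1, 1)}
    print(differing_fields(fields))  # {}
```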