..@open-mpi.org>
Sent: Wednesday, May 28, 2014 5:16 PM
Subject: Re: [hwloc-users] node configuration differs form hardware
On 28/05/2014 15:46, Craig Kapfer wrote:
Wait, I'm sorry, I must be missing something, please bear with me!
By the way, your discussion of groups 1 and 2 below is wrong. Group 2 doesn't
say that NUMA node == socket, and it doesn't report 8 sockets of 8 cores each.
It reports 4 sockets, each containing 2 NUMA nodes of 8 cores each.
[mailto:hwloc-users-boun...@open-mpi.org] On Behalf Of
Brice Goglin
Sent: Wednesday, May 28, 2014 7:01 AM
To: Craig Kapfer; Hardware locality user list
Subject: Re: [hwloc-users] node configuration differs form hardware
On 28/05/2014 14:57, Craig Kapfer wrote:
Hmm... the Slurm config defines all nodes as having 4 sockets with 16 cores
per socket (which matches the hardware; all nodes are the same). The Slurm
node config is as follows:
NodeName=n[001-008] RealMemory=258452 Sockets=4 CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN
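As a quick sanity check, the Slurm node line and the group 2 layout (4 sockets, 2 NUMA nodes per socket, 8 cores per NUMA node) describe the same core count. The sketch below is a hypothetical helper, not Slurm's own parser: it only splits a single NodeName line on whitespace and `=`, and assumes no comments, quoting, or line continuations.

```python
# Hypothetical helper: split one slurm.conf NodeName line into key=value pairs.
# Real slurm.conf parsing (comments, quoting, continuations) is NOT handled.
def parse_node_line(line: str) -> dict[str, str]:
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def slurm_cores(fields: dict[str, str]) -> int:
    # Total hardware threads per node as declared in the config line.
    return (int(fields["Sockets"])
            * int(fields["CoresPerSocket"])
            * int(fields.get("ThreadsPerCore", "1")))


node = parse_node_line(
    "NodeName=n[001-008] RealMemory=258452 Sockets=4 "
    "CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN"
)

# Group 2 layout as described in the thread.
hwloc_sockets, numa_per_socket, cores_per_numa = 4, 2, 8
hwloc_cores_per_socket = numa_per_socket * cores_per_numa

print(slurm_cores(node))                                      # 64
print(hwloc_sockets * hwloc_cores_per_socket)                 # 64
print(hwloc_cores_per_socket == int(node["CoresPerSocket"]))  # True
```

Both views give 64 cores per node and 16 cores per socket; the group 2 topology only adds a NUMA level inside each socket.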
On 28/05/2014 14:13, Craig Kapfer wrote:
Interesting, quite right, thank you very much. Yes, these are AMD 6300 series.
Same kernel, but these boxes seem to have different BIOS versions straight from
the factory, even though they were delivered in the same physical enclosure!
Some are AMI 3.5 and some are 3.0.
So slurm is then incorrectly parsing
Aside from the BIOS config, are you sure that you have the exact same BIOS
*version* in each node? (You can check in /sys/class/dmi/id/bios_*.) Same
Linux kernel too?
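To compare nodes, one option is to collect the bios_* entries from each one and diff the results. This is a minimal sketch, assuming a Linux host that exposes DMI data under /sys/class/dmi/id; the exact set of bios_* files varies by kernel version, and the function name is illustrative only.

```python
from pathlib import Path


def read_bios_info(dmi_dir: str = "/sys/class/dmi/id") -> dict[str, str]:
    """Return {filename: contents} for every bios_* file under dmi_dir."""
    info = {}
    # glob() yields nothing if the directory is missing (e.g. non-Linux hosts).
    for path in sorted(Path(dmi_dir).glob("bios_*")):
        try:
            info[path.name] = path.read_text().strip()
        except OSError:
            # Guard in case a DMI file cannot be read by the current user.
            info[path.name] = "<unreadable>"
    return info


if __name__ == "__main__":
    for name, value in read_bios_info().items():
        print(f"{name}: {value}")
```

Running it on each node and comparing the bios_version lines would show which nodes have the 3.5 firmware and which have 3.0; a plain `cat /sys/class/dmi/id/bios_*` gives the same information without Python.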
Also, recently we've seen somebody fix such problems by unplugging and
replugging some CPUs on the motherboard. Seems crazy but it