Re: [hwloc-users] node configuration differs from hardware

2014-05-29 Thread Craig Kapfer
Thanks very much.  G2 BIOS is more recent (v3.50 AMI BIOS).  I will upgrade and 
see how it goes.  

Thanks again,

Craig



 From: Brice Goglin <brice.gog...@inria.fr>
To: Craig Kapfer <c_kap...@yahoo.com>; Hardware locality user list 
<hwloc-us...@open-mpi.org> 
Sent: Wednesday, May 28, 2014 5:16 PM
Subject: Re: [hwloc-users] node configuration differs from hardware
 


On 28/05/2014 15:46, Craig Kapfer wrote:

> Wait, I'm sorry, I must be missing something, please bear with me!
>
>> By the way, your discussion of groups 1 and 2 below is wrong. Group 2 doesn't
>> say that NUMA node == socket, and it doesn't report 8 sockets of 8 cores each.
>> It reports 4 sockets containing 2 NUMA nodes each containing 8 cores each, and
>> that's likely what you have here (AMD Opteron 6300 or 6200 processors?).
>
> Output of lstopo from nodes of both BIOS versions seems to indicate that there
> are 4 sockets, but slurm is reporting on NUMA nodes, no?  If not, which
> version of the BIOS is correct?
>
Ah right, I misread group 1. Group 1 reports 4 sockets = 4 NUMA nodes
containing 16 cores each. That's wrong. There are 2 NUMA nodes in
each socket, and 8 cores in each NUMA node (instead of 1 NUMA node
in each socket and 16 cores in each NUMA node).

Slurm is indeed saying something wrong. I wonder if it confuses NUMA
nodes and sockets; I can't find anything like this on Google. On
Intel that doesn't matter. On AMD it does.

Anyway, G2 is correct, so its BIOS may be less buggy than G1's. Which
BIOS is more recent? Try updating the BIOS on one of the G1 machines to
see if that fixes the issue.


Brice
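
For reference, here is a minimal C sketch of how the counts Brice describes can
be checked directly. It is an illustration, not something posted in the thread,
and it assumes the hwloc 1.x API of that era (hwloc_get_nbobjs_by_type with the
HWLOC_OBJ_SOCKET/NODE/CORE types). On a node whose BIOS reports the topology
correctly it should print 4 sockets, 8 NUMA nodes and 64 cores (8 cores per
NUMA node); a node with the buggy BIOS would instead show 4 NUMA nodes with
16 cores each.

    /* Sketch only: count sockets, NUMA nodes and cores as hwloc sees them. */
    #include <stdio.h>
    #include <hwloc.h>

    int main(void)
    {
        hwloc_topology_t topology;
        int nsockets, nnodes, ncores;

        hwloc_topology_init(&topology);
        hwloc_topology_load(topology);

        nsockets = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_SOCKET);
        nnodes   = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_NODE);
        ncores   = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_CORE);

        printf("%d sockets, %d NUMA nodes, %d cores (%d cores per NUMA node)\n",
               nsockets, nnodes, ncores,
               nnodes > 0 ? ncores / nnodes : ncores);

        hwloc_topology_destroy(topology);
        return 0;
    }

Building it with something like "gcc counts.c -o counts $(pkg-config --cflags
--libs hwloc)" and running it on one node from each group should make the
disagreement visible without going through Slurm at all.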

Re: [hwloc-users] node configuration differs from hardware

2014-05-28 Thread Craig Kapfer
Wait, I'm sorry, I must be missing something, please bear with me!

> By the way, your discussion of groups 1 and 2 below is wrong. Group 2 doesn't
> say that NUMA node == socket, and it doesn't report 8 sockets of 8 cores each.
> It reports 4 sockets containing 2 NUMA nodes each containing 8 cores each, and
> that's likely what you have here (AMD Opteron 6300 or 6200 processors?).

Output of lstopo from nodes of both BIOS versions seems to indicate that there
are 4 sockets, but slurm is reporting on NUMA nodes, no?  If not, which version
of the BIOS is correct?


> SocketsPerBoard=4:8(hw) CoresPerSocket=16:8(hw)

This message indicates that slurm believes the hardware actually has 8
sockets and 8 cores per socket, no?

Complete lstopo info for groups 1 and 2 is attached for clarity.

If there is a problem with the BIOS I'd like to correct it so please let me 
know if the BIOS is actually at fault here.  

Thanks!

Craig


On Wednesday, May 28, 2014 4:01 PM, Brice Goglin  wrote:
 


On 28/05/2014 14:57, Craig Kapfer wrote:

> Hmm ... the slurm config defines that all nodes have 4 sockets with 16
> cores per socket (which corresponds to the hardware--all nodes are the
> same).  Slurm node config is as follows:
>
> NodeName=n[001-008] RealMemory=258452 Sockets=4 CoresPerSocket=16
> ThreadsPerCore=1 State=UNKNOWN Port=[17001-17008]
>
> But we get this error--so I suspect it's a parsing error on the slurm side?

No, it's slurm properly reading info from hwloc, but that info
doesn't match the actual hardware because the BIOS is buggy.


Brice

Machine (128GB)
  NUMANode L#0 (P#0 32GB) + Socket L#0
L3 L#0 (6144KB)
  L2 L#0 (2048KB) + L1i L#0 (64KB)
L1d L#0 (16KB) + Core L#0 + PU L#0 (P#0)
L1d L#1 (16KB) + Core L#1 + PU L#1 (P#1)
  L2 L#1 (2048KB) + L1i L#1 (64KB)
L1d L#2 (16KB) + Core L#2 + PU L#2 (P#2)
L1d L#3 (16KB) + Core L#3 + PU L#3 (P#3)
  L2 L#2 (2048KB) + L1i L#2 (64KB)
L1d L#4 (16KB) + Core L#4 + PU L#4 (P#4)
L1d L#5 (16KB) + Core L#5 + PU L#5 (P#5)
  L2 L#3 (2048KB) + L1i L#3 (64KB)
L1d L#6 (16KB) + Core L#6 + PU L#6 (P#6)
L1d L#7 (16KB) + Core L#7 + PU L#7 (P#7)
L3 L#1 (6144KB)
  L2 L#4 (2048KB) + L1i L#4 (64KB)
L1d L#8 (16KB) + Core L#8 + PU L#8 (P#8)
L1d L#9 (16KB) + Core L#9 + PU L#9 (P#9)
  L2 L#5 (2048KB) + L1i L#5 (64KB)
L1d L#10 (16KB) + Core L#10 + PU L#10 (P#10)
L1d L#11 (16KB) + Core L#11 + PU L#11 (P#11)
  L2 L#6 (2048KB) + L1i L#6 (64KB)
L1d L#12 (16KB) + Core L#12 + PU L#12 (P#12)
L1d L#13 (16KB) + Core L#13 + PU L#13 (P#13)
  L2 L#7 (2048KB) + L1i L#7 (64KB)
L1d L#14 (16KB) + Core L#14 + PU L#14 (P#14)
L1d L#15 (16KB) + Core L#15 + PU L#15 (P#15)
  NUMANode L#1 (P#2 32GB) + Socket L#1
L3 L#2 (6144KB)
  L2 L#8 (2048KB) + L1i L#8 (64KB)
L1d L#16 (16KB) + Core L#16 + PU L#16 (P#16)
L1d L#17 (16KB) + Core L#17 + PU L#17 (P#17)
  L2 L#9 (2048KB) + L1i L#9 (64KB)
L1d L#18 (16KB) + Core L#18 + PU L#18 (P#18)
L1d L#19 (16KB) + Core L#19 + PU L#19 (P#19)
  L2 L#10 (2048KB) + L1i L#10 (64KB)
L1d L#20 (16KB) + Core L#20 + PU L#20 (P#20)
L1d L#21 (16KB) + Core L#21 + PU L#21 (P#21)
  L2 L#11 (2048KB) + L1i L#11 (64KB)
L1d L#22 (16KB) + Core L#22 + PU L#22 (P#22)
L1d L#23 (16KB) + Core L#23 + PU L#23 (P#23)
L3 L#3 (6144KB)
  L2 L#12 (2048KB) + L1i L#12 (64KB)
L1d L#24 (16KB) + Core L#24 + PU L#24 (P#24)
L1d L#25 (16KB) + Core L#25 + PU L#25 (P#25)
  L2 L#13 (2048KB) + L1i L#13 (64KB)
L1d L#26 (16KB) + Core L#26 + PU L#26 (P#26)
L1d L#27 (16KB) + Core L#27 + PU L#27 (P#27)
  L2 L#14 (2048KB) + L1i L#14 (64KB)
L1d L#28 (16KB) + Core L#28 + PU L#28 (P#28)
L1d L#29 (16KB) + Core L#29 + PU L#29 (P#29)
  L2 L#15 (2048KB) + L1i L#15 (64KB)
L1d L#30 (16KB) + Core L#30 + PU L#30 (P#30)
L1d L#31 (16KB) + Core L#31 + PU L#31 (P#31)
  NUMANode L#2 (P#4 32GB) + Socket L#2
L3 L#4 (6144KB)
  L2 L#16 (2048KB) + L1i L#16 (64KB)
L1d L#32 (16KB) + Core L#32 + PU L#32 (P#32)
L1d L#33 (16KB) + Core L#33 + PU L#33 (P#33)
  L2 L#17 (2048KB) + L1i L#17 (64KB)
L1d L#34 (16KB) + Core L#34 + PU L#34 (P#34)
L1d L#35 (16KB) + Core L#35 + PU L#35 (P#35)
  L2 L#18 (2048KB) + L1i L#18 (64KB)
L1d L#36 (16KB) + Core L#36 + PU L#36 (P#36)
L1d L#37 (16KB) + Core L#37 + PU L#37 (P#37)
  L2 L#19 (2048KB) + L1i L#19 (64KB)
L1d L#38 (16KB) + Core L#38 + PU L#38 (P#38)
L1d L#39 (16KB) + Core L#39 + PU L#39 (P#39)
L3 L#5 (6144KB)
  L2 L#20 (2048KB) + L1i L#20 (64KB)
L1d L#40 (16KB) + Core L#40 + PU L#40 (P#40)
L1d L#41 (16KB) + Core L#41 + PU L#41 (P#41)
  L2 L#21 (2048KB) + L1i L#21 (64KB)
L1d L#42 (16KB) + Core 

Re: [hwloc-users] node configuration differs from hardware

2014-05-28 Thread Kenneth A. Lloyd
You have found what we found (also in other areas of OpenMPI) – that Slurm
has some “interesting” behaviors.

 

If it was easy, anyone could do it …

 

Ken

==

Kenneth A. Lloyd, Jr.

CEO - Director, Systems Science

Watt Systems Technologies Inc.

 

 

From: hwloc-users [mailto:hwloc-users-boun...@open-mpi.org] On Behalf Of
Brice Goglin
Sent: Wednesday, May 28, 2014 7:01 AM
To: Craig Kapfer; Hardware locality user list
Subject: Re: [hwloc-users] node configuration differs from hardware

 

On 28/05/2014 14:57, Craig Kapfer wrote:

 


Hmm ... the slurm config defines that all nodes have 4 sockets with 16 cores
per socket (which corresponds to the hardware--all nodes are the same).
Slurm node config is as follows:

 

NodeName=n[001-008] RealMemory=258452 Sockets=4 CoresPerSocket=16
ThreadsPerCore=1 State=UNKNOWN Port=[17001-17008]

 

But we get this error--so I suspect it's a parsing error on the slurm side?


No, it's slurm properly reading info from hwloc, but that info doesn't match
the actual hardware because the BIOS is buggy.

Brice



Re: [hwloc-users] node configuration differs from hardware

2014-05-28 Thread Brice Goglin
On 28/05/2014 14:57, Craig Kapfer wrote:
>
>
> Hmm ... the slurm config defines that all nodes have 4 sockets with 16
> cores per socket (which corresponds to the hardware--all nodes are the
> same).   Slurm node config is as follows:
>
> NodeName=n[001-008] RealMemory=258452 Sockets=4 CoresPerSocket=16
> ThreadsPerCore=1 State=UNKNOWN Port=[17001-17008]
>
> But we get this error--so I suspect it's a parsing error on the slurm
> side?

No, it's slurm properly reading info from hwloc, but that info doesn't
match the actual hardware because the BIOS is buggy.

Brice



Re: [hwloc-users] node configuration differs from hardware

2014-05-28 Thread Craig Kapfer



Hmm ... the slurm config defines that all nodes have 4 sockets with 16 cores 
per socket (which corresponds to the hardware--all nodes are the same).   Slurm 
node config is as follows:

NodeName=n[001-008] RealMemory=258452 Sockets=4 CoresPerSocket=16 
ThreadsPerCore=1 State=UNKNOWN Port=[17001-17008]

But we get this error--so I suspect it's a parsing error on the slurm side?:

May 27 11:53:04 n001 slurmd[3629]: Node configuration differs from hardware: 
CPUs=64:64(hw) Boards=1:1(hw) SocketsPerBoard=4:8(hw) CoresPerSocket=16:8(hw) 
ThreadsPerCore=1:1(hw)

Craig

On Wednesday, May 28, 2014 3:20 PM, Brice Goglin  wrote:
 


On 28/05/2014 14:13, Craig Kapfer wrote:

> Interesting, quite right, thank you very much.  Yes these are AMD 6300
> series.  Same kernel but these boxes seem to have different BIOS versions,
> direct from the factory, delivered in the same physical enclosure even!
> Some are AMI 3.5 and some are 3.0.
>
> So slurm is then incorrectly parsing correct output from lstopo to generate
> this message?
>
> May 27 11:53:04 n001 slurmd[3629]: Node configuration differs from hardware:
> CPUs=64:64(hw) Boards=1:1(hw) SocketsPerBoard=4:8(hw) CoresPerSocket=16:8(hw)
> ThreadsPerCore=1:1(hw)

It's saying "there are 8 sockets with 8 cores in hw instead of 4
sockets with 16 cores each in config"?
My feeling is that Slurm just has a (valid) config that says group2
while it was running on group1 in this case.

Brice




> Thanks much,
>
> Craig
>
>
> On Wednesday, May 28, 2014 1:39 PM, Brice Goglin wrote:
>
> Aside of the BIOS config, are you sure that you have the exact same BIOS
> *version* in each node? (can check in /sys/class/dmi/id/bios_*) Same Linux
> kernel too?
>
> Also, recently we've seen somebody fix such problems by unplugging and
> replugging some CPUs on the motherboard. Seems crazy but it happened for
> real...
>
> By the way, your discussion of groups 1 and 2 below is wrong. Group 2
> doesn't say that NUMA node == socket, and it doesn't report 8 sockets of
> 8 cores each. It reports 4 sockets containing 2 NUMA nodes each containing
> 8 cores each, and that's likely what you have here (AMD Opteron 6300 or
> 6200 processors?).
>
> Brice
>
>
> On 28/05/2014 12:27, Craig Kapfer wrote:
>
>> We have a bunch of 64-core (quad-socket, 16 cores/socket) AMD servers and
>> some of them are reporting the following error from slurm, which I gather
>> gets its info from hwloc:
>>
>> May 27 11:53:04 n001 slurmd[3629]: Node configuration differs from hardware:
>> CPUs=64:64(hw) Boards=1:1(hw) SocketsPerBoard=4:8(hw) CoresPerSocket=16:8(hw)
>> ThreadsPerCore=1:1(hw)
>>
>> All nodes have the exact same CPUs, motherboards and OS (PXE booted from the
>> same master image even).  The bios settings between nodes also look the
>> same.  The nodes only differ in the amount of memory and number of DIMMs.
>>
>> There are two sets of nodes with different output from lstopo:
>>
>> Group 1 (correct): reporting 4 sockets with 16 cores per socket
>> Group 2 (incorrect): reporting 8 sockets with 8 cores per socket
>>
>> Group 2 seems to be (incorrectly?) taking numanodes as sockets.
>>
>> The output of lstopo is slightly different in the two groups, note the extra
>> Socket layer for group 2:
>>
>> Group 1:
>> Machine (128GB)
>>   NUMANode L#0 (P#0 32GB) + Socket L#0
>>   #16 cores listed
>>
>>   NUMANode L#1 (P#2 32GB) + Socket L#1
>>   #16 cores listed
>>   etc
>>
>> Group 2:
>> Machine (256GB)
>>   Socket L#0 (64GB)
>>     NUMANode L#0 (P#0 32GB) + L3 L#0 (6144KB)
>>     # 8 cores listed
>>
>>     NUMANode L#1 (P#1 32GB) + L3 L#1 (6144KB)
>>     # 8 cores listed
>>
>>   Socket L#1 (64GB)
>>     NUMANode L#2 (P#2 32GB) + L3 L#2 (6144KB)
>>     # 8 cores listed
>>     etc
>>
>> The group 2 reporting doesn't match our hardware, at least as far as sockets
>> and cores per socket goes--is there a reason other than the memory
>> configuration that could cause this?
>>
>> Thanks,
>> Craig

Re: [hwloc-users] node configuration differs from hardware

2014-05-28 Thread Brice Goglin
On 28/05/2014 14:13, Craig Kapfer wrote:
> Interesting, quite right, thank you very much.  Yes these are AMD 6300
> series.  Same kernel but these boxes seem to have different BIOS
> versions, direct from the factory, delivered in the same physical
> enclosure even!  Some are AMI 3.5 and some are 3.0.
>
> So slurm is then incorrectly parsing correct output from lstopo to
> generate this message?
>>
>> May 27 11:53:04 n001 slurmd[3629]: Node configuration differs from 
>> hardware: CPUs=64:64(hw) Boards=1:1(hw) SocketsPerBoard=4:8(hw)
>>  CoresPerSocket=16:8(hw) ThreadsPerCore=1:1(hw)
>>

It's saying "there are 8 sockets with 8 cores in hw instead of 4 sockets
with 16 cores each in config"?
My feeling is that Slurm just has a (valid) config that says group2
while it was running on group1 in this case.

Brice


> Thanks much,
>
> Craig
>
>
> On Wednesday, May 28, 2014 1:39 PM, Brice Goglin
>  wrote:
>
>
> Aside of the BIOS config, are you sure that you have the exact same
> BIOS *version* in each node? (can check in /sys/class/dmi/id/bios_*)
> Same Linux kernel too?
>
> Also, recently we've seen somebody fix such problems by unplugging and
> replugging some CPUs on the motherboard. Seems crazy but it happened
> for real...
>
> By the way, your discussion of groups 1 and 2 below is wrong. Group 2
> doesn't say that NUMA node == socket, and it doesn't report 8 sockets
> of 8 cores each. It reports 4 sockets containing 2 NUMA nodes each
> containing 8 cores each, and that's likely what you have here (AMD
> Opteron 6300 or 6200 processors?).
>
> Brice
>
>
>
> On 28/05/2014 12:27, Craig Kapfer wrote:
>> We have a bunch of 64-core (quad-socket, 16 cores/socket) AMD servers and 
>> some of them are reporting the following error from slurm, which I gather 
>> gets its info from hwloc:
>>
>> May 27 11:53:04 n001 slurmd[3629]: Node configuration differs from 
>> hardware: CPUs=64:64(hw) Boards=1:1(hw) SocketsPerBoard=4:8(hw) 
>> CoresPerSocket=16:8(hw) ThreadsPerCore=1:1(hw)
>>
>> All nodes have the exact same CPUs, motherboards and OS (PXE booted from the 
>> same master image even).  The bios settings between nodes also look the 
>> same.  The nodes only differ in the amount of memory and number of DIMMs.  
>> There are two sets of nodes with different output from lstopo:
>>
>> Group 1 (correct): reporting 4 sockets with 16 cores per socket
>> Group 2 (incorrect): reporting 8 sockets with 8 cores per socket
>>
>> Group 2 seems to be (incorrectly?) taking numanodes as sockets.
>>
>> The output of lstopo is slightly different in the two groups, note the extra 
>> Socket layer for group 2:
>>
>> Group 1: 
>> Machine (128GB)
>>   NUMANode L#0 (P#0 32GB) + Socket L#0
>>   #16 cores listed
>>   
>>   NUMANode L#1 (P#2 32GB) + Socket L#1
>>   #16 cores listed
>>   etc
>> 
>>
>> Group 2:
>> Machine (256GB)
>>   Socket L#0 (64GB)
>> NUMANode L#0 (P#0 32GB) + L3 L#0 (6144KB)
>> # 8 cores listed
>> 
>> NUMANode L#1 (P#1 32GB) + L3 L#1 (6144KB)
>> # 8 cores listed
>> 
>>   Socket L#1 (64GB)
>> NUMANode L#2 (P#2 32GB) + L3 L#2 (6144KB)
>> # 8 cores listed
>> etc
>> 
>>
>> The group 2 reporting doesn't match our hardware, at least as far as sockets 
>> and cores per socket goes--is there a reason other than the memory 
>> configuration that could cause this? 
>> Thanks,
>> Craig
>>
>>
>>
>>
>
>
>



Re: [hwloc-users] node configuration differs from hardware

2014-05-28 Thread Craig Kapfer
Interesting, quite right, thank you very much.  Yes these are AMD 6300 series.  
Same kernel but these boxes seem to have different BIOS versions, direct from 
the factory, delivered in the same physical enclosure even!  Some are AMI 3.5 
and some are 3.0.

So slurm is then incorrectly parsing correct output from lstopo to generate 
this message?

May 27 11:53:04 n001 slurmd[3629]: Node configuration differs from hardware: 
CPUs=64:64(hw) Boards=1:1(hw) SocketsPerBoard=4:8(hw) CoresPerSocket=16:8(hw) 
ThreadsPerCore=1:1(hw)
Thanks much,


Craig


On Wednesday, May 28, 2014 1:39 PM, Brice Goglin  wrote:
 


Aside of the BIOS config, are you sure that you have the exact same BIOS 
*version* in each node? (can check in /sys/class/dmi/id/bios_*) Same Linux 
kernel too?

Also, recently we've seen somebody fix such problems by unplugging and
replugging some CPUs on the motherboard. Seems crazy but it happened for
real...

By the way, your discussion of groups 1 and 2 below is wrong. Group 2
doesn't say that NUMA node == socket, and it doesn't report 8 sockets of
8 cores each. It reports 4 sockets containing 2 NUMA nodes each containing
8 cores each, and that's likely what you have here (AMD Opteron 6300 or
6200 processors?).

Brice




On 28/05/2014 12:27, Craig Kapfer wrote:

> We have a bunch of 64-core (quad-socket, 16 cores/socket) AMD servers and some
> of them are reporting the following error from slurm, which I gather gets its
> info from hwloc:
>
> May 27 11:53:04 n001 slurmd[3629]: Node configuration differs from hardware:
> CPUs=64:64(hw) Boards=1:1(hw) SocketsPerBoard=4:8(hw) CoresPerSocket=16:8(hw)
> ThreadsPerCore=1:1(hw)
>
> All nodes have the exact same CPUs, motherboards and OS (PXE booted from the
> same master image even).  The bios settings between nodes also look the same.
> The nodes only differ in the amount of memory and number of DIMMs.
>
> There are two sets of nodes with different output from lstopo:
>
> Group 1 (correct): reporting 4 sockets with 16 cores per socket
> Group 2 (incorrect): reporting 8 sockets with 8 cores per socket
>
> Group 2 seems to be (incorrectly?) taking numanodes as sockets.
>
> The output of lstopo is slightly different in the two groups, note the extra
> Socket layer for group 2:
>
> Group 1:
> Machine (128GB)
>   NUMANode L#0 (P#0 32GB) + Socket L#0
>   #16 cores listed
>
>   NUMANode L#1 (P#2 32GB) + Socket L#1
>   #16 cores listed
>   etc
>
> Group 2:
> Machine (256GB)
>   Socket L#0 (64GB)
>     NUMANode L#0 (P#0 32GB) + L3 L#0 (6144KB)
>     # 8 cores listed
>
>     NUMANode L#1 (P#1 32GB) + L3 L#1 (6144KB)
>     # 8 cores listed
>
>   Socket L#1 (64GB)
>     NUMANode L#2 (P#2 32GB) + L3 L#2 (6144KB)
>     # 8 cores listed
>     etc
>
> The group 2 reporting doesn't match our hardware, at least as far as sockets
> and cores per socket goes--is there a reason other than the memory
> configuration that could cause this?
>
> Thanks,
> Craig

Re: [hwloc-users] node configuration differs from hardware

2014-05-28 Thread Brice Goglin
Aside of the BIOS config, are you sure that you have the exact same BIOS
*version* in each node? (can check in /sys/class/dmi/id/bios_*) Same
Linux kernel too?
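
(For comparing nodes quickly, here is a small sketch, not from the thread,
that prints those sysfs attributes. The file names it reads, bios_vendor,
bios_version and bios_date, are the standard Linux DMI entries under
/sys/class/dmi/id/; the message above only says "bios_*", so treat the exact
selection as an assumption.)

    /* Sketch only: print the BIOS identification Linux exposes in sysfs,
     * so it can be compared across nodes. */
    #include <stdio.h>

    static void print_dmi(const char *name)
    {
        char path[128], buf[128];
        FILE *f;

        snprintf(path, sizeof(path), "/sys/class/dmi/id/%s", name);
        f = fopen(path, "r");
        if (f == NULL) {
            printf("%s: (not readable)\n", name);
            return;
        }
        if (fgets(buf, sizeof(buf), f) != NULL)
            printf("%s: %s", name, buf);  /* sysfs values end with a newline */
        fclose(f);
    }

    int main(void)
    {
        print_dmi("bios_vendor");
        print_dmi("bios_version");
        print_dmi("bios_date");
        return 0;
    }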

Also, recently we've seen somebody fix such problems by unplugging and
replugging some CPUs on the motherboard. Seems crazy but it happened for
real...

By the way, your discussion of groups 1 and 2 below is wrong. Group 2
doesn't say that NUMA node == socket, and it doesn't report 8 sockets of
8 cores each. It reports 4 sockets containing 2 NUMA nodes each
containing 8 cores each, and that's likely what you have here (AMD
Opteron 6300 or 6200 processors?).

Brice



On 28/05/2014 12:27, Craig Kapfer wrote:
> We have a bunch of 64-core (quad-socket, 16 cores/socket) AMD servers and 
> some of them are reporting the following error from slurm, which I gather 
> gets its info from hwloc:
>
> May 27 11:53:04 n001 slurmd[3629]: Node configuration differs from 
> hardware: CPUs=64:64(hw) Boards=1:1(hw) SocketsPerBoard=4:8(hw) 
> CoresPerSocket=16:8(hw) ThreadsPerCore=1:1(hw)
>
> All nodes have the exact same CPUs, motherboards and OS (PXE booted from the 
> same master image even).  The bios settings between nodes also look the same. 
>  The nodes only differ in the amount of memory and number of DIMMs.  
> There are two sets of nodes with different output from lstopo:
>
> Group 1 (correct): reporting 4 sockets with 16 cores per socket
> Group 2 (incorrect): reporting 8 sockets with 8 cores per socket
>
> Group 2 seems to be (incorrectly?) taking numanodes as sockets.
>
> The output of lstopo is slightly different in the two groups, note the extra 
> Socket layer for group 2:
>
> Group 1: 
> Machine (128GB)
>   NUMANode L#0 (P#0 32GB) + Socket L#0
>   #16 cores listed
>   
>   NUMANode L#1 (P#2 32GB) + Socket L#1
>   #16 cores listed
>   etc
> 
>
> Group 2:
> Machine (256GB)
>   Socket L#0 (64GB)
> NUMANode L#0 (P#0 32GB) + L3 L#0 (6144KB)
> # 8 cores listed
> 
> NUMANode L#1 (P#1 32GB) + L3 L#1 (6144KB)
> # 8 cores listed
> 
>   Socket L#1 (64GB)
> NUMANode L#2 (P#2 32GB) + L3 L#2 (6144KB)
> # 8 cores listed
> etc
> 
>
> The group 2 reporting doesn't match our hardware, at least as far as sockets 
> and cores per socket goes--is there a reason other than the memory 
> configuration that could cause this? 
> Thanks,
> Craig
>
>
>
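
To make the difference between the two lstopo layouts quoted above explicit,
here is a minimal sketch (an illustration added for clarity, not something
posted in the thread, assuming the hwloc 1.x C API) that asks hwloc, for each
NUMA node, which object contains it and how many cores it holds. On the
group 2 layout each NUMA node should report a Socket as its parent and 8
cores; on the group 1 layout the NUMA node sits directly under the Machine
(the Socket is merged below it) and holds 16 cores.

    /* Sketch only: show where each NUMA node sits in the detected hierarchy
     * and how many cores it contains (hwloc 1.x API). */
    #include <stdio.h>
    #include <hwloc.h>

    int main(void)
    {
        hwloc_topology_t topology;
        hwloc_obj_t node;
        int i, nnodes, ncores;

        hwloc_topology_init(&topology);
        hwloc_topology_load(topology);

        nnodes = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_NODE);
        for (i = 0; i < nnodes; i++) {
            node = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NODE, i);
            ncores = hwloc_get_nbobjs_inside_cpuset_by_type(topology,
                                                            node->cpuset,
                                                            HWLOC_OBJ_CORE);
            printf("NUMANode L#%u: parent is a %s, %d cores\n",
                   node->logical_index,
                   hwloc_obj_type_string(node->parent->type),
                   ncores);
        }

        hwloc_topology_destroy(topology);
        return 0;
    }

Running this on one node from each group shows the "extra Socket layer"
Craig points out above as a plain difference in the parent type, which is
easier to compare across a cluster than full lstopo output.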