Hello,
I need to use a rankfile with Open MPI 1.5.4 to run some tests on a basic
architecture. I'm using a node for which lstopo reports the following:
Machine (24GB)
  NUMANode L#0 (P#0 12GB)
    Socket L#0 + L3 L#0 (8192KB)
      L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0)
      L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1 + PU L#1 (P#2)
      L2 L#2 (256KB) + L1 L#2 (32KB) + Core L#2 + PU L#2 (P#4)
      L2 L#3 (256KB) + L1 L#3 (32KB) + Core L#3 + PU L#3 (P#6)
    HostBridge L#0
      PCIBridge
        PCI 8086:10c9
          Net L#0 "eth0"
        PCI 8086:10c9
          Net L#1 "eth1"
      PCIBridge
        PCI 15b3:673c
          Net L#2 "ib0"
          Net L#3 "ib1"
          OpenFabrics L#4 "mlx4_0"
      PCIBridge
        PCI 102b:0522
      PCI 8086:3a22
        Block L#5 "sda"
        Block L#6 "sdb"
        Block L#7 "sdc"
        Block L#8 "sdd"
  NUMANode L#1 (P#1 12GB) + Socket L#1 + L3 L#1 (8192KB)
    L2 L#4 (256KB) + L1 L#4 (32KB) + Core L#4 + PU L#4 (P#1)
    L2 L#5 (256KB) + L1 L#5 (32KB) + Core L#5 + PU L#5 (P#3)
    L2 L#6 (256KB) + L1 L#6 (32KB) + Core L#6 + PU L#6 (P#5)
    L2 L#7 (256KB) + L1 L#7 (32KB) + Core L#7 + PU L#7 (P#7)
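To make the numbering explicit, the physical PU numbers per socket can be read
off that output with a few lines of Python. This is only a sketch: the regular
expressions assume the textual lstopo format shown above, the sample string is
an excerpt with the I/O devices left out, and pus_by_socket is a name made up
for illustration. Note that the socket index it reports is the logical one
(L#), since this lstopo output does not print a physical socket index.

```python
import re

# Excerpt of the lstopo output (I/O devices omitted; indentation is
# irrelevant to the parsing below).
LSTOPO = """\
Machine (24GB)
NUMANode L#0 (P#0 12GB)
Socket L#0 + L3 L#0 (8192KB)
L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0)
L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1 + PU L#1 (P#2)
L2 L#2 (256KB) + L1 L#2 (32KB) + Core L#2 + PU L#2 (P#4)
L2 L#3 (256KB) + L1 L#3 (32KB) + Core L#3 + PU L#3 (P#6)
NUMANode L#1 (P#1 12GB) + Socket L#1 + L3 L#1 (8192KB)
L2 L#4 (256KB) + L1 L#4 (32KB) + Core L#4 + PU L#4 (P#1)
L2 L#5 (256KB) + L1 L#5 (32KB) + Core L#5 + PU L#5 (P#3)
L2 L#6 (256KB) + L1 L#6 (32KB) + Core L#6 + PU L#6 (P#5)
L2 L#7 (256KB) + L1 L#7 (32KB) + Core L#7 + PU L#7 (P#7)
"""


def pus_by_socket(text):
    """Return {logical socket index: [physical PU ids]} from lstopo text."""
    mapping = {}
    current = None  # logical index of the socket we are currently inside
    for line in text.splitlines():
        sock = re.search(r"Socket L#(\d+)", line)
        if sock:
            current = int(sock.group(1))
            mapping.setdefault(current, [])
        pu = re.search(r"PU L#\d+ \(P#(\d+)\)", line)
        if pu and current is not None:
            mapping[current].append(int(pu.group(1)))
    return mapping


print(pus_by_socket(LSTOPO))
```

On this node the even physical PU numbers end up on the first socket and the
odd ones on the second.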
I would like to use the physical numbering. To do that, I created the
following rankfile:
rank 0=host1 slot=p0:0
rank 1=host1 slot=p0:2
rank 2=host1 slot=p0:4
rank 3=host1 slot=p0:6
rank 4=host1 slot=p1:1
rank 5=host1 slot=p1:3
rank 6=host1 slot=p1:5
rank 7=host1 slot=p1:7
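In case it helps to reproduce the problem, the rankfile above can be
regenerated with a short Python script. This is only a sketch: the host name
host1 and the socket-to-PU table are copied by hand from the lstopo output,
and make_rankfile is a hypothetical helper name. It reproduces my file as
written and makes no claim that this slot syntax is what Open MPI 1.5.4
expects.

```python
def make_rankfile(host, socket_pus):
    """Build rankfile text with one 'rank N=host slot=pS:C' line per PU."""
    lines = []
    rank = 0
    for socket in sorted(socket_pus):
        for pu in socket_pus[socket]:
            # 'p' prefix: physical numbering, as used in my rankfile.
            lines.append(f"rank {rank}={host} slot=p{socket}:{pu}")
            rank += 1
    return "\n".join(lines) + "\n"


# Assumption: table copied by hand from the lstopo output of the node.
SOCKET_PUS = {0: [0, 2, 4, 6], 1: [1, 3, 5, 7]}

rankfile_text = make_rankfile("host1", SOCKET_PUS)
print(rankfile_text, end="")
```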
But when I run my job with "mpiexec -np 8 --rankfile rankfile ./foo",
I encounter this error:
Specified slot list: p0:4
Error: Not found
This could mean that a non-existent processor was specified, or
that the specification had improper syntax.
Do you know what I did wrong?
Best regards,
François
--
___
François TESSIER
PhD Student at University of Bordeaux
Tel : 0033.5.24.57.41.52
francois.tess...@inria.fr