Hello,

I need to use a rankfile with Open MPI 1.5.4 to run some tests on a basic architecture. I'm using a node for which lstopo returns the following:

----------------
Machine (24GB)
  NUMANode L#0 (P#0 12GB)
    Socket L#0 + L3 L#0 (8192KB)
      L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0)
      L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1 + PU L#1 (P#2)
      L2 L#2 (256KB) + L1 L#2 (32KB) + Core L#2 + PU L#2 (P#4)
      L2 L#3 (256KB) + L1 L#3 (32KB) + Core L#3 + PU L#3 (P#6)
    HostBridge L#0
      PCIBridge
        PCI 8086:10c9
          Net L#0 "eth0"
        PCI 8086:10c9
          Net L#1 "eth1"
      PCIBridge
        PCI 15b3:673c
          Net L#2 "ib0"
          Net L#3 "ib1"
          OpenFabrics L#4 "mlx4_0"
      PCIBridge
        PCI 102b:0522
      PCI 8086:3a22
        Block L#5 "sda"
        Block L#6 "sdb"
        Block L#7 "sdc"
        Block L#8 "sdd"
  NUMANode L#1 (P#1 12GB) + Socket L#1 + L3 L#1 (8192KB)
    L2 L#4 (256KB) + L1 L#4 (32KB) + Core L#4 + PU L#4 (P#1)
    L2 L#5 (256KB) + L1 L#5 (32KB) + Core L#5 + PU L#5 (P#3)
    L2 L#6 (256KB) + L1 L#6 (32KB) + Core L#6 + PU L#6 (P#5)
    L2 L#7 (256KB) + L1 L#7 (32KB) + Core L#7 + PU L#7 (P#7)
----------------

I would like to use the physical numbering. To do that, I created a rankfile like this:

rank 0=host1 slot=p0:0
rank 1=host1 slot=p0:2
rank 2=host1 slot=p0:4
rank 3=host1 slot=p0:6
rank 4=host1 slot=p1:1
rank 5=host1 slot=p1:3
rank 6=host1 slot=p1:5
rank 7=host1 slot=p1:7

But when I run my job with "mpiexec -np 8 --rankfile rankfile ./foo", I get this error:

    Specified slot list: p0:4
    Error: Not found

    This could mean that a non-existent processor was specified, or
    that the specification had improper syntax.


Do you know what I did wrong?
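
For reference, the rankfile above follows mechanically from the lstopo output. Here is a minimal Python sketch of that mapping. It assumes that each NUMA node holds exactly one socket (as on this machine), so the NUMANode P# is used as the socket number and the PU P# as the second field of the slot. The function name rankfile_from_lstopo and its host parameter are made up for illustration. Whether this is the numbering Open MPI actually expects is exactly what I am unsure about.

```python
import re

# Patterns for the two kinds of lstopo lines that carry a physical index here.
NUMA_RE = re.compile(r"NUMANode L#\d+ \(P#(\d+)")
PU_RE = re.compile(r"PU L#\d+ \(P#(\d+)\)")


def rankfile_from_lstopo(lstopo_text, host="host1"):
    """Return rankfile lines 'rank N=<host> slot=p<S>:<P>'.

    S is the physical index of the enclosing NUMA node (assumed to be
    the socket number), P is the physical index of the PU.
    """
    lines = []
    socket = None
    for line in lstopo_text.splitlines():
        numa = NUMA_RE.search(line)
        if numa:
            # A new NUMA node starts; remember it for the PUs that follow.
            socket = int(numa.group(1))
        pu = PU_RE.search(line)
        if pu and socket is not None:
            rank = len(lines)  # ranks are assigned in order of appearance
            lines.append(f"rank {rank}={host} slot=p{socket}:{pu.group(1)}")
    return lines
```

Applied to the lstopo output above, this yields the same eight lines as my rankfile (p0:0, p0:2, p0:4, p0:6, then p1:1, p1:3, p1:5, p1:7).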

Best regards,

François

--
___________________
François TESSIER
PhD Student at University of Bordeaux
Tel : 0033.5.24.57.41.52
francois.tess...@inria.fr

