I've polluted the previous thread on GPU abilities with so many Intel/Phi bits that I decided a few new threads might be a good idea. First off, I think the following could be a FAQ entry.

If you have a cluster with Phi cards and are using the SCIF interface with OFED, OpenMPI between two hosts (not two Phi cards) is going to choose the wrong interface at runtime. I'll show this by example.

On a node that has a Phi card and has ofed-mic enabled, you end up with two IB interfaces.

[tim@phi001 osu]$ ibv_devices
    device                 node GUID
    ------              ----------------
    scif0               4c79bafffe300005
    mlx4_0              003048ffff95f98c

The scif0 interface is not the one you want to use, but it is the one that shows up first in the list. By default OpenMPI won't even know what to do with this interface.
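If you want to pick out the usable device name in a script rather than by eye, something like the following might work. This is only a sketch: non_scif_devices is a made-up helper name, and it assumes the ibv_devices output has the same two-column layout as shown above.

```shell
# Hypothetical helper (not part of OFED or Open MPI): read ibv_devices-style
# output on stdin and print every device name that does not start with "scif".
# Assumes the two-column layout shown above (header, dashed rule, then devices).
non_scif_devices() {
    awk 'NF >= 2 && $1 != "device" && $1 !~ /^-+$/ && $1 !~ /^scif/ { print $1 }'
}

# Usage on a node with the verbs utilities installed:
#   ibv_devices | non_scif_devices
```

On the node above this should leave only mlx4_0, which is the name to hand to Open MPI.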

$ mpicc osu_bw.c -o osu_bw.openmpi.x

$ mpirun -np 2 -hostfile hosts.nodes osu_bw.openmpi.x
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:

  Local host:            phi002.local
  Device name:           scif0
  Device vendor ID:      0x8086
  Device vendor part ID: 0


It completely fails. However, if you specify the correct interface (mlx4_0), you get the expected results.

$ mpirun -np 2 -hostfile hosts.nodes --mca btl openib,self,sm --mca btl_openib_if_include mlx4_0 osu_bw.openmpi.x
# OSU MPI Bandwidth Test
# Size        Bandwidth (MB/s)
1                         3.25
2                         6.40
4                        12.65
8                        25.53
16                       50.42
32                       97.06
64                      187.02
128                     357.88
256                     663.64
512                    1228.23
1024                   2142.46
2048                   3128.06
4096                   4110.78
8192                   4870.81
16384                  5864.45
32768                  6135.67
65536                  6264.35
131072                 6307.70
262144                 6340.24
524288                 6329.59
1048576                6343.71
2097152                6315.45
4194304                6322.65
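To avoid typing the MCA options on every run, the same settings can probably be made persistent. The sketch below uses Open MPI's OMPI_MCA_ environment variable convention, with the per-user parameter file shown in comments as an alternative. The values are just the ones from the command above; I have not checked this on every Open MPI version.

```shell
# Option 1: environment variables, read by mpirun started from this shell.
# Each MCA parameter "foo" can be set as OMPI_MCA_foo.
export OMPI_MCA_btl=openib,self,sm
export OMPI_MCA_btl_openib_if_include=mlx4_0

# Option 2 (alternative): put the same settings in the per-user file
# $HOME/.openmpi/mca-params.conf, one "name = value" pair per line:
#   btl = openib,self,sm
#   btl_openib_if_include = mlx4_0

# After either option the plain command should select mlx4_0:
#   mpirun -np 2 -hostfile hosts.nodes osu_bw.openmpi.x
```

Excluding scif0 with btl_openib_if_exclude instead of including mlx4_0 may be the better choice on clusters where the Mellanox device name differs between nodes, but I have only tried the include form shown above.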

Tim
