Do -cpu-set or -cpu-list work?  Or is there a better way than a rankfile?

I have a cluster with 24 cores and 1 GPU per node.  I would like to have
one core drive the GPU and the other 23 run OpenMP threads.  My setup is
described in my previous email to this list:

CentOS-8.2, gcc-8.3, openmpi-4.0.5
$ which mpirun
~/ompi/contrib-gcc830/openmpi-4.0.5-nodl/bin/mpirun

As noted in that email, I cannot get OMPI_Affinity_str to return the
affinities, but I can now get --report-bindings to work, so I can make
progress.  I have tried both -cpu-set and -cpu-list, but neither seems
to bind anything.  However, I did get a rankfile to work:

$ cat rankfile
rank 0=vcloud slot=0:0
rank 1=vcloud slot=0:1-23
$ mpirun -np 2 --report-bindings -rf rankfile affinity
[vcloud:3277858] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../../../../../../../../../../../../../../../../../..]
[vcloud:3277858] MCW rank 1 bound to socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]], socket 0[core 10[hwt 0-1]], socket 0[core 11[hwt 0-1]], socket 0[core 12[hwt 0-1]], socket 0[core 13[hwt 0-1]], socket 0[core 14[hwt 0-1]], socket 0[core 15[hwt 0-1]], socket 0[core 16[hwt 0-1]], socket 0[core 17[hwt 0-1]], socket 0[core 18[hwt 0-1]], socket 0[core 19[hwt 0-1]], socket 0[core 20[hwt 0-1]], socket 0[core 21[hwt 0-1]], socket 0[core 22[hwt 0-1]], socket 0[core 23[hwt 0-1]]: [../BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]

and OpenMP reports the expected number of threads in each process (2
for rank 0 and 46 for rank 1, i.e., one per hardware thread).
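Since OMPI_Affinity_str is not returning anything, the bindings can also be cross-checked from inside each launched process without the custom affinity program.  A minimal sketch, assuming Linux (it reads Cpus_allowed_list from /proc/self/status) and relying on the OMPI_COMM_WORLD_RANK variable that Open MPI sets for each rank; the script name report_cpus.sh and the function report_cpus are hypothetical:

```shell
#!/bin/sh
# Hypothetical script (e.g. saved as report_cpus.sh): print the CPU list
# this rank is allowed to run on.  Assumes Linux /proc.
report_cpus() {
    # /proc/self here is the awk process, which inherits its affinity mask
    # from this shell and therefore from the binding mpirun applied.
    allowed=$(awk '/^Cpus_allowed_list:/ { print $2 }' /proc/self/status)
    # OMPI_COMM_WORLD_RANK is set by Open MPI; fall back if run standalone.
    echo "rank ${OMPI_COMM_WORLD_RANK:-unknown}: CPUs ${allowed}"
}

report_cpus
```

Assuming the script sits on a filesystem visible to every node, it could be launched in place of the application, e.g. `mpirun -np 2 -rf rankfile sh ./report_cpus.sh`, so each rank prints the logical CPUs it was bound to.  The same launch with -cpu-set instead of -rf would show whether that option restricts anything at all.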

However, a rankfile is inconvenient for large runs, where I would have to
parse the host file and then generate a corresponding rankfile.
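Until a better option turns up, the parsing step can at least be scripted.  A minimal sketch, assuming a hostfile with the host name as the first field of each line (any slots= field is ignored), all cores on socket 0, 24 cores per node by default, and two ranks per node laid out exactly like the hand-written rankfile above; make_rankfile is a hypothetical helper name:

```shell
# Hypothetical helper: make_rankfile HOSTFILE [NCORES]
# Emits two ranks per distinct host, mirroring the example rankfile:
#   even rank -> core 0 (drives the GPU)
#   odd rank  -> cores 1..NCORES-1 (OpenMP threads)
# Assumes host name = first field, all cores on socket 0.
make_rankfile() {
    hostfile=$1
    ncores=${2:-24}
    awk -v last="$((ncores - 1))" '
        /^[[:space:]]*#/ { next }   # skip comment lines
        NF == 0 { next }            # skip blank lines
        !seen[$1]++ {               # first occurrence of each host only
            printf "rank %d=%s slot=0:0\n", r, $1; r++
            printf "rank %d=%s slot=0:1-%d\n", r, $1, last; r++
        }
    ' "$hostfile"
}

# Example usage (file names are illustrative):
#   make_rankfile hosts > rankfile
#   np=$(grep -c '^rank' rankfile)
#   mpirun -np "$np" --report-bindings -rf rankfile affinity
```

Deduplicating on the host name means a hostfile that lists a node more than once still yields exactly one GPU rank and one OpenMP rank per node.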

Is there a better way to do this?

Thanks.....John Cary

PS: With both -cpu-list and -cpu-set, I tried the following and got:

$ mpirun -np 2 --report-bindings -cpu-set 0,1-23 affinity
[vcloud:3849270] MCW rank 0 is not bound (or bound to all available processors)
[vcloud:3849270] MCW rank 1 is not bound (or bound to all available processors)
