Hi David,
On 03/02/2022 00:03, David Perozzi wrote:
Hello,
I'm trying to run a code implemented with OpenMPI and OpenMP (for
threading) on a large cluster that uses LSF for job scheduling and
dispatch. The problem with LSF is that it is not very straightforward to
allocate and bind the right number of threads to an MPI rank inside a
single node. Therefore, I have to create a rankfile myself as soon as
the (a priori unknown) resources are allocated.
So, after my job gets dispatched, I run:
mpirun -n "$nslots" -display-allocation -nooversubscribe \
  --map-by core:PE=1 --bind-to core mpi_allocation/show_numactl.sh \
  > mpi_allocation/allocation_files/allocation.txt
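For reference, an Open MPI rankfile of this kind contains one line per
rank, for example (placeholder hostnames and core ranges):

rank 0=node-a slot=0:0-3
rank 1=node-a slot=0:4-7
rank 2=node-b slot=0:0-3
rank 3=node-b slot=0:4-7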
Just out of curiosity: why do you not use the built-in LSF features to
do this mapping? Something like
#BSUB -n 4
#BSUB -R "span[block=1] affinity[core(4)]"
mpirun ./MyHybridApplication
This will give you 4 cores for each of your 4 MPI ranks, and it sets
OMP_NUM_THREADS=4 automatically. LSF's affinity support is even more
fine-grained, so you can specify that the 4 cores should be on one socket
(e.g. if your application is memory-bound and you want to make use of
more memory bandwidth). Check the LSF documentation for more details.
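For instance, something along these lines should request that each rank's
4 cores come from a single socket (a sketch based on the LSF affinity
syntax; adapt it to your site's setup):

#BSUB -n 4
#BSUB -R "span[block=1] affinity[core(4,same=socket)]"
mpirun ./MyHybridApplication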
Examples:
1) with span[block=...] (allows LSF to place all resources on one host)
#BSUB -n 4
#BSUB -R "span[block=1] affinity[core(4)]"
export OMP_DISPLAY_AFFINITY=true
export OMP_AFFINITY_FORMAT="host: %H PID: %P TID: %n affinity: %A"
mpirun --tag-output ./hello
gives this output (sorted):
[1,0]<stderr>:host: node-23-8 PID: 2798 TID: 0 affinity: 0
[1,0]<stderr>:host: node-23-8 PID: 2798 TID: 1 affinity: 1
[1,0]<stderr>:host: node-23-8 PID: 2798 TID: 2 affinity: 2
[1,0]<stderr>:host: node-23-8 PID: 2798 TID: 3 affinity: 3
[1,0]<stdout>:Hello world from thread 0!
[1,0]<stdout>:Hello world from thread 1!
[1,0]<stdout>:Hello world from thread 2!
[1,0]<stdout>:Hello world from thread 3!
[1,1]<stderr>:host: node-23-8 PID: 2799 TID: 0 affinity: 4
[1,1]<stderr>:host: node-23-8 PID: 2799 TID: 1 affinity: 5
[1,1]<stderr>:host: node-23-8 PID: 2799 TID: 2 affinity: 6
[1,1]<stderr>:host: node-23-8 PID: 2799 TID: 3 affinity: 7
[1,1]<stdout>:Hello world from thread 0!
[1,1]<stdout>:Hello world from thread 1!
[1,1]<stdout>:Hello world from thread 2!
[1,1]<stdout>:Hello world from thread 3!
[1,2]<stderr>:host: node-23-8 PID: 2803 TID: 0 affinity: 10
[1,2]<stderr>:host: node-23-8 PID: 2803 TID: 1 affinity: 11
[1,2]<stderr>:host: node-23-8 PID: 2803 TID: 2 affinity: 12
[1,2]<stderr>:host: node-23-8 PID: 2803 TID: 3 affinity: 13
[1,2]<stdout>:Hello world from thread 0!
[1,2]<stdout>:Hello world from thread 1!
[1,2]<stdout>:Hello world from thread 2!
[1,2]<stdout>:Hello world from thread 3!
[1,3]<stderr>:host: node-23-8 PID: 2807 TID: 0 affinity: 14
[1,3]<stderr>:host: node-23-8 PID: 2807 TID: 1 affinity: 15
[1,3]<stderr>:host: node-23-8 PID: 2807 TID: 2 affinity: 16
[1,3]<stderr>:host: node-23-8 PID: 2807 TID: 3 affinity: 17
[1,3]<stdout>:Hello world from thread 0!
[1,3]<stdout>:Hello world from thread 1!
[1,3]<stdout>:Hello world from thread 2!
[1,3]<stdout>:Hello world from thread 3!
I got 4 groups of 4 cores each, all on the same host!
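For reference, the ./hello test program can be something as simple as the
following hybrid MPI+OpenMP code (a minimal sketch, not necessarily the
exact program used for the output above), compiled e.g. with
"mpicc -fopenmp hello.c -o hello":

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    /* Each rank opens an OpenMP parallel region; with
       OMP_DISPLAY_AFFINITY=true the OpenMP runtime prints the
       per-thread affinity lines seen on stderr above. */
    #pragma omp parallel
    printf("Hello world from thread %d!\n", omp_get_thread_num());
    MPI_Finalize();
    return 0;
}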
2) with span[ptile=...] (forces LSF to distribute the ranks over several hosts)
#BSUB -n 4
#BSUB -R "span[ptile=1] affinity[core(4)]"
export OMP_DISPLAY_AFFINITY=true
export OMP_AFFINITY_FORMAT="host: %H PID: %P TID: %n affinity: %A"
mpirun --tag-output ./hello
gives this (sorted):
[1,0]<stderr>:host: node-23-8 PID: 2438 TID: 0 affinity: 0
[1,0]<stderr>:host: node-23-8 PID: 2438 TID: 1 affinity: 1
[1,0]<stderr>:host: node-23-8 PID: 2438 TID: 2 affinity: 2
[1,0]<stderr>:host: node-23-8 PID: 2438 TID: 3 affinity: 3
[1,0]<stdout>:Hello world from thread 0!
[1,0]<stdout>:Hello world from thread 1!
[1,0]<stdout>:Hello world from thread 2!
[1,0]<stdout>:Hello world from thread 3!
[1,1]<stderr>:host: node-23-7 PID: 19425 TID: 0 affinity: 0
[1,1]<stderr>:host: node-23-7 PID: 19425 TID: 1 affinity: 1
[1,1]<stderr>:host: node-23-7 PID: 19425 TID: 2 affinity: 2
[1,1]<stderr>:host: node-23-7 PID: 19425 TID: 3 affinity: 3
[1,1]<stdout>:Hello world from thread 0!
[1,1]<stdout>:Hello world from thread 1!
[1,1]<stdout>:Hello world from thread 2!
[1,1]<stdout>:Hello world from thread 3!
[1,2]<stderr>:host: node-23-6 PID: 23940 TID: 0 affinity: 0
[1,2]<stderr>:host: node-23-6 PID: 23940 TID: 1 affinity: 1
[1,2]<stderr>:host: node-23-6 PID: 23940 TID: 2 affinity: 2
[1,2]<stderr>:host: node-23-6 PID: 23940 TID: 3 affinity: 3
[1,2]<stdout>:Hello world from thread 0!
[1,2]<stdout>:Hello world from thread 1!
[1,2]<stdout>:Hello world from thread 2!
[1,2]<stdout>:Hello world from thread 3!
[1,3]<stderr>:host: node-23-5 PID: 30341 TID: 0 affinity: 0
[1,3]<stderr>:host: node-23-5 PID: 30341 TID: 1 affinity: 1
[1,3]<stderr>:host: node-23-5 PID: 30341 TID: 2 affinity: 2
[1,3]<stderr>:host: node-23-5 PID: 30341 TID: 3 affinity: 3
[1,3]<stdout>:Hello world from thread 0!
[1,3]<stdout>:Hello world from thread 1!
[1,3]<stdout>:Hello world from thread 2!
[1,3]<stdout>:Hello world from thread 3!
Here I got 4 groups of 4 cores on different hosts!
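In both cases the job script is submitted as usual, e.g. (assuming it is
saved as job.lsf):

bsub < job.lsf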
Maybe the above can serve as some inspiration to solve your problem in
a different way!
/Bernd