By default, OMPI will bind your procs to a single core. You probably want to at 
least bind to socket (for NUMA reasons), or not bind at all if you want to use 
all the cores on the node.

So add either "--bind-to socket" or "--bind-to none" to your command line.
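
For example (just a sketch reusing the hostnames, mapping, and thread count from your message below), the two-node run might become:

mpirun --host node1,node2 --map-by ppr:1:node --bind-to none -x OMP_NUM_THREADS=4 xhpl

Adding "--report-bindings" to the command line makes mpirun print where each rank is bound, which is a quick way to confirm the OpenMP threads actually have access to all the cores.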


On Aug 3, 2020, at 1:33 AM, John Duffy via users <users@lists.open-mpi.org> wrote:

Hi

I’m experimenting with hybrid OpenMPI/OpenMP Linpack benchmarks on my small 
cluster, and I’m a bit confused as to how to invoke mpirun.

I have compiled/linked HPL-2.3 with OpenMPI and libopenblas-openmp using the 
GCC -fopenmp option on Ubuntu 20.04 64-bit.

With P=1 and Q=1 in HPL.dat, if I use…

mpirun -x OMP_NUM_THREADS=4 xhpl

top reports...
 top - 08:03:59 up 1 day, 0 min,  1 user,  load average: 2.25, 1.23, 0.88
Tasks: 138 total,   2 running, 136 sleeping,   0 stopped,   0 zombie
%Cpu(s): 77.1 us, 22.2 sy,  0.0 ni,  0.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   3793.3 total,    434.0 free,   2814.1 used,    545.2 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.    919.9 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   5787 john      20   0 2959408   2.6g   8128 R 354.0  69.1   2:10.43 xhpl
   5789 john      20   0  263352   9960   7440 S  14.2   0.3   0:07.42 xhpl
   5788 john      20   0  263352   9844   7320 S  13.9   0.3   0:07.19 xhpl
   5790 john      20   0  263356   9896   7376 S  13.6   0.3   0:07.17 xhpl

… which seems reasonable, but I don’t understand why there are 4 xhpl processes.


In anticipation of adding more nodes, if I use…

mpirun --host node1 --map-by ppr:1:node -x OMP_NUM_THREADS=4 xhpl

top reports...

top - 07:56:27 up 23:52,  1 user,  load average: 1.00, 0.98, 0.68
Tasks: 133 total,   2 running, 131 sleeping,   0 stopped,   0 zombie
%Cpu(s): 25.1 us,  0.0 sy,  0.0 ni, 74.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   3793.3 total,    454.2 free,   2794.5 used,    544.7 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.    939.9 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   5770 john      20   0 2868700   2.5g   7668 R  99.7  68.7   5:20.37 xhpl

… a single xhpl process (as expected), but with only 25% CPU utilisation and no 
other processes running on the other 3 cores. It would appear OpenBLAS is not 
utilising the 4 cores as expected.


If I then scale it to 2 nodes, with P=1 and Q=2 in HPL.dat...

mpirun --host node1,node2 --map-by ppr:1:node -x OMP_NUM_THREADS=4 xhpl

… similarly, I get a single process on each node, with only 25% CPU utilisation.


Any advice/suggestions on how to invoke mpirun in a hybrid OpenMPI/OpenMP 
setup would be appreciated.

Kind regards



