If P=1 and Q=1, you're setting up a 1x1 process grid, which should only need a single processor. Something tells me you have 4 independent HPL jobs running, rather than one job using 4 threads. I think you want a 2x2 grid if you intend to use all 4 cores. For HPL, P * Q = the number of MPI processes being used.
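
For example, for a pure-MPI run on 4 cores, the process-grid section of HPL.dat would look something like this (a sketch based on the stock HPL.dat template):

1            # of process grids (P x Q)
2            Ps
2            Qs

launched with something like

mpirun -np 4 -x OMP_NUM_THREADS=1 xhpl

(assuming all 4 cores are on one node). Note that without -np, mpirun starts one rank per available slot, which is likely where your 4 xhpl processes came from.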

Prentice

On 8/3/20 4:33 AM, John Duffy via users wrote:
Hi

I’m experimenting with hybrid OpenMPI/OpenMP Linpack benchmarks on my small cluster, and I’m a bit confused as to how to invoke mpirun.

I have compiled/linked HPL-2.3 with OpenMPI and libopenblas-openmp using the GCC -fopenmp option on Ubuntu 20.04 64-bit.

With P=1 and Q=1 in HPL.dat, if I use…

mpirun -x OMP_NUM_THREADS=4 xhpl

top reports...
top - 08:03:59 up 1 day, 0 min,  1 user,  load average: 2.25, 1.23, 0.88
Tasks: 138 total,   2 running, 136 sleeping,   0 stopped,   0 zombie
%Cpu(s): 77.1 us, 22.2 sy,  0.0 ni,  0.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   3793.3 total,    434.0 free,   2814.1 used,   545.2 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   919.9 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM   TIME+ COMMAND
   5787 john      20   0 2959408   2.6g   8128 R 354.0  69.1   2:10.43 xhpl
   5789 john      20   0  263352   9960   7440 S  14.2   0.3   0:07.42 xhpl
   5788 john      20   0  263352   9844   7320 S  13.9   0.3   0:07.19 xhpl
   5790 john      20   0  263356   9896   7376 S  13.6   0.3   0:07.17 xhpl

… which seems reasonable, but I don’t understand why there are 4 xhpl processes.


In anticipation of adding more nodes, if I use…

mpirun --host node1 --map-by ppr:1:node -x OMP_NUM_THREADS=4 xhpl

top reports...

top - 07:56:27 up 23:52,  1 user,  load average: 1.00, 0.98, 0.68
Tasks: 133 total,   2 running, 131 sleeping,   0 stopped,   0 zombie
%Cpu(s): 25.1 us,  0.0 sy,  0.0 ni, 74.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   3793.3 total,    454.2 free,   2794.5 used,   544.7 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   939.9 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM   TIME+ COMMAND
   5770 john      20   0 2868700   2.5g   7668 R  99.7  68.7   5:20.37 xhpl

… a single xhpl process (as expected), but with only 25% CPU utilisation and nothing running on the other 3 cores. It would appear OpenBLAS is not utilising all 4 cores.
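
(A hunch, and the pe=4 syntax here is just my guess from the mpirun man page: I believe Open MPI binds a single rank to one core by default, which would confine all 4 OpenMP threads to that core. Something like

mpirun --host node1 --map-by ppr:1:node:pe=4 --bind-to core -x OMP_NUM_THREADS=4 xhpl

or simply --bind-to none might let the threads spread across the cores.)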


If I then scale it to 2 nodes, with P=1 and Q=2 in HPL.dat...

mpirun --host node1,node2 --map-by ppr:1:node -x OMP_NUM_THREADS=4 xhpl

… similarly, I get a single process on each node, with only 25% CPU utilisation.
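
(If the binding hunch above is right, presumably the same tweak would apply across both nodes:

mpirun --host node1,node2 --map-by ppr:1:node:pe=4 --bind-to core -x OMP_NUM_THREADS=4 xhpl
)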


Any advice/suggestions on how to invoke mpirun in a hybrid OpenMPI/OpenMP setup would be appreciated.

Kind regards



--
Prentice Bisbal
Lead Software Engineer
Research Computing
Princeton Plasma Physics Laboratory
http://www.pppl.gov
