'ps -eaf --forest' or indeed 'pstree' is a good way to see what is going on. Also 'htop' is a very useful utility, and it is well worth running 'lstopo' to look at the layout of cores and caches on your machines.
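For example (just a sketch, assuming hwloc and the usual procps tools are installed, and using the xhpl/mpirun process names from your output):

  ps -eaf --forest              # process tree: the mpirun/orted tree with the xhpl ranks under it
  ps -eLf | grep xhpl           # -L adds one line per thread (LWP/NLWP columns)
  pstree -ap $(pgrep -x mpirun) # same tree, with arguments and PIDs
  htop                          # press H to show/hide userland threads per process
  lstopo --no-io                # cores, caches and NUMA layout, without the I/O devices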
On Mon, 3 Aug 2020 at 09:40, John Duffy via users <users@lists.open-mpi.org> wrote:

> Hi
>
> I’m experimenting with hybrid OpenMPI/OpenMP Linpack benchmarks on my
> small cluster, and I’m a bit confused as to how to invoke mpirun.
>
> I have compiled/linked HPL-2.3 with OpenMPI and libopenblas-openmp using
> the GCC -fopenmp option on Ubuntu 20.04 64-bit.
>
> With P=1 and Q=1 in HPL.dat, if I use…
>
> mpirun -x OMP_NUM_THREADS=4 xhpl
>
> top reports...
>
> top - 08:03:59 up 1 day, 0 min, 1 user, load average: 2.25, 1.23, 0.88
> Tasks: 138 total, 2 running, 136 sleeping, 0 stopped, 0 zombie
> %Cpu(s): 77.1 us, 22.2 sy, 0.0 ni, 0.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> MiB Mem : 3793.3 total, 434.0 free, 2814.1 used, 545.2 buff/cache
> MiB Swap: 0.0 total, 0.0 free, 0.0 used. 919.9 avail Mem
>
>   PID USER PR NI    VIRT  RES  SHR S  %CPU %MEM   TIME+ COMMAND
>  5787 john 20  0 2959408 2.6g 8128 R 354.0 69.1 2:10.43 xhpl
>  5789 john 20  0  263352 9960 7440 S  14.2  0.3 0:07.42 xhpl
>  5788 john 20  0  263352 9844 7320 S  13.9  0.3 0:07.19 xhpl
>  5790 john 20  0  263356 9896 7376 S  13.6  0.3 0:07.17 xhpl
>
> … which seems reasonable, but I don’t understand why there are 4 xhpl
> processes.
>
> In anticipation of adding more nodes, if I use…
>
> mpirun --host node1 --map-by ppr:1:node -x OMP_NUM_THREADS=4 xhpl
>
> top reports...
>
> top - 07:56:27 up 23:52, 1 user, load average: 1.00, 0.98, 0.68
> Tasks: 133 total, 2 running, 131 sleeping, 0 stopped, 0 zombie
> %Cpu(s): 25.1 us, 0.0 sy, 0.0 ni, 74.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> MiB Mem : 3793.3 total, 454.2 free, 2794.5 used, 544.7 buff/cache
> MiB Swap: 0.0 total, 0.0 free, 0.0 used. 939.9 avail Mem
>
>   PID USER PR NI    VIRT  RES  SHR S  %CPU %MEM   TIME+ COMMAND
>  5770 john 20  0 2868700 2.5g 7668 R  99.7 68.7 5:20.37 xhpl
>
> … a single xhpl process (as expected), but with only 25% CPU utilisation
> and no other processes running on the other 3 cores. It would appear
> OpenBLAS is not utilising the 4 cores as expected.
>
> If I then scale it to 2 nodes, with P=1 and Q=2 in HPL.dat...
>
> mpirun --host node1,node2 --map-by ppr:1:node -x OMP_NUM_THREADS=4 xhpl
>
> … similarly, I get a single process on each node, with only 25% CPU
> utilisation.
>
> Any advice/suggestions on how to invoke mpirun in a hybrid OpenMPI/OpenMP
> setup would be appreciated.
>
> Kind regards
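Once xhpl is running, it is also worth checking what each rank is actually allowed to run on. A rough sketch (assuming util-linux and hwloc are installed; xhpl is the binary name from your output):

  for p in $(pgrep -x xhpl); do taskset -cp $p; done   # CPU affinity list of each xhpl rank
  hwloc-ps                                             # processes bound to a subset of the machine

If the single rank launched with --map-by ppr:1:node turns out to be bound to one core, its OpenMP threads would all be time-sharing that core, which would be consistent with the 25% utilisation you are seeing.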