Hello,

The size of your system is such that parallel execution should be beneficial for up to 50 to 60 MPI ranks without any further options. If you are planning to use something like 200 cores, the simplest yet effective approach would be to use an executable compiled with hybrid MPI + OpenMP parallelism and run it with 4 OpenMP threads per MPI rank.
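The Python sketch below illustrates the arithmetic behind this advice. It is not taken from the QE documentation. It picks a hybrid layout in which the number of MPI ranks times the number of OpenMP threads per rank never exceeds the available cores. For example, 200 cores with 4 threads per rank gives 50 ranks, which is consistent with the 50 to 60 rank range above, and 8 cores with 2 threads per rank gives 4 ranks. The helper names hybrid_layout and launch_command and the input file name pw.in are hypothetical. How OMP_NUM_THREADS reaches remote ranks depends on your MPI launcher and batch scheduler, so check their documentation.

```python
import shlex


def hybrid_layout(total_cores: int, threads_per_rank: int) -> tuple:
    """Return (mpi_ranks, omp_threads) so that ranks * threads <= total_cores.

    Hypothetical helper for illustration only.
    """
    if total_cores < 1 or threads_per_rank < 1:
        raise ValueError("core and thread counts must be positive")
    if threads_per_rank > total_cores:
        raise ValueError("more threads per rank than cores available")
    # Integer division guarantees we never oversubscribe the cores.
    ranks = total_cores // threads_per_rank
    return ranks, threads_per_rank


def launch_command(total_cores: int, threads_per_rank: int,
                   input_file: str = "pw.in") -> str:
    """Build the shell lines for a hybrid run (pw.in is a placeholder name)."""
    ranks, threads = hybrid_layout(total_cores, threads_per_rank)
    return (f"export OMP_NUM_THREADS={threads}\n"
            f"mpirun -n {ranks} pw.x -in {shlex.quote(input_file)}")


if __name__ == "__main__":
    # 200-core cluster job, and the two 8-core layouts discussed in this thread.
    for cores, threads in [(200, 4), (8, 2), (8, 1)]:
        print(launch_command(cores, threads))
        print()
```

Using integer division means any leftover cores stay idle rather than being oversubscribed. Oversubscription is the usual cause of a run with more processes taking longer than one with fewer.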
Regarding your test on the 8-core machine, your output suggests that you are using a multithreaded linear algebra library. If that is the case, you should make sure that the number of MPI ranks times the number of threads per rank does not exceed the number of cores (8 in your case). To set the number of OpenMP threads, set the OMP_NUM_THREADS environment variable, e.g. export OMP_NUM_THREADS=2 if you run with 4 MPI ranks, or export OMP_NUM_THREADS=1 if you run with 8 MPI ranks.

Hope this helps,
best regards
--
Pietro

________________________________
From: users <[email protected]> on behalf of Robert Fleming <[email protected]>
Sent: Wednesday, 13 April 2022 16:36
To: Quantum ESPRESSO users Forum <[email protected]>
Subject: [QE-users] Advice for Parallel Execution

Greetings,

I’m running scf calculations on an amorphous Si surface terminated with different functional groups (input file attached for context), and I’m seeing poor scaling in parallel. I’m running these jobs locally on an 8-core CPU to test them before scaling up the system size on an HPC cluster. I’ve noticed that increasing the number of MPI processes from 4 to 8 (mpirun -n 4 pw.x -in [myscript] vs. mpirun -n 8 pw.x -in [myscript]) makes the job take longer (~1 hr vs. ~1.5 hr).

Looking through the documentation, I see that there are several command-line switches for different levels of parallelization beyond just the number of MPI processes. While it’s possible that my system is too small to benefit from parallel execution (24 atoms, 103 electrons), I think it’s more likely that I’m not taking appropriate advantage of these switches. Would anyone be willing to share some advice or “rules of thumb” on how best to select these parallelization levels for small-to-medium-sized jobs (say, an 8-core CPU vs. 150-200 processors on an HPC platform)?

Thank you,

________________________________
Robert “Drew” Fleming, Ph.D.
Assistant Professor of Mechanical Engineering
College of Engineering & Computer Science
Arkansas State University
(870) 972-3743
[email protected]
