Re: [QE-users] Large (and seemingly random) differences between CPU and WALL time

Pietro Delugas Thu, 06 Jun 2019 05:59:55 -0700

Hello

This is strange behavior that does not depend on the program. There may be many reasons, and it is very hard to guess which one applies.


Starting from the most trivial things:

* It could be that some other application is using the same processors as you at the same time.

* You are using a file system that is not very efficient, e.g. the home filesystem instead of the scratch disk or something of the kind.

* You are using multithreading but you don't have enough processors for that. Try setting the environment variable OMP_NUM_THREADS to 1 before running.
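
To make the last suggestion concrete, here is a minimal sketch of launching a job with OMP_NUM_THREADS forced to 1 from a Python wrapper. The helper names, the launch line (mpirun, 6 processes) and the input file name relax.in are assumptions for illustration only; adapt them to your cluster and scheduler.

```python
import os
import subprocess


def single_thread_env(base_env=None):
    """Return a copy of the environment with OpenMP limited to one thread."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = "1"
    return env


def run_single_threaded(command):
    """Run `command` (a list of arguments) with OMP_NUM_THREADS=1."""
    return subprocess.run(command, env=single_thread_env(), check=False)


# Hypothetical launch line: launcher, process count and file name are assumptions.
EXAMPLE_COMMAND = ["mpirun", "-np", "6", "pw.x", "-in", "relax.in"]
```

Calling run_single_threaded(EXAMPLE_COMMAND) would start the job with one OpenMP thread per MPI process, which is equivalent to exporting the variable in the job script before the launch line.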


I hope it helps

Pietro


On 06/06/19 13:21, Julien Barbaud wrote:

Dear users,
I am still struggling to understand the parallel performance of QE on the cluster of my university. I have to say right off the bat that this problem might have more to do with the parallel scheduling on our cluster. However, after many discussions with the people responsible for the cluster, they don't seem to see where the problem would be on their side. So I want to check whether this could be a more common problem and whether you would have some suggestions about it.
The problem in a nutshell: the performance of a pw.x run seems completely random on our cluster. Launching the same job on the same number of procs can result in calculation times differing by a factor of 5 or more. This is of course a huge issue in planning how many cores I want to use, or just in trying to have a clue of what's going on.
When the speed is particularly low, it shows up as a WALL time much higher than the CPU time.
To exemplify, here is the same calculation run on 3, 6 and 9 cores, with the corresponding CPU and WALL time:
Procs   CPU time    WALL time
-----   ---------   ---------
3       6m56.69s    28m33.48s   → big difference: bad parallelization
6       4m 9.56s    4m20.65s    → good parallelization
9       5min42s     21m13.10s   → bad parallelization
The huge difference between CPU time and WALL time is an issue. But even looking at the CPU time alone, it doesn't seem to scale well, as I would not expect the 9 cores to be slower than the 6 (but I lack experience on this).
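
One way to put a number on this gap is the ratio of CPU time to WALL time: close to 1 means the processes were busy computing, while a small value means they spent most of the elapsed time waiting (on I/O, on other jobs, or on oversubscribed cores). A minimal sketch in Python, where the helper names to_seconds and cpu_over_wall are made up for illustration and the time strings are copied from the table above:

```python
import re

# Accepts strings such as "6m56.69s", "4m 9.56s" or "5min42s".
_TIME = re.compile(r"(\d+)\s*m(?:in)?\s*(\d+(?:\.\d+)?)\s*s?")


def to_seconds(text):
    """Convert a minutes/seconds string to a number of seconds."""
    match = _TIME.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized time format: {text!r}")
    minutes, seconds = match.groups()
    return 60 * int(minutes) + float(seconds)


def cpu_over_wall(cpu, wall):
    """Fraction of the elapsed (WALL) time accounted for by CPU time."""
    return to_seconds(cpu) / to_seconds(wall)


runs = [
    (3, "6m56.69s", "28m33.48s"),
    (6, "4m 9.56s", "4m20.65s"),
    (9, "5min42s", "21m13.10s"),
]
for procs, cpu, wall in runs:
    print(f"{procs} procs: CPU/WALL = {cpu_over_wall(cpu, wall):.2f}")
```

For the three runs above this gives roughly 0.24, 0.96 and 0.27, so in the slow runs about three quarters of the elapsed time is not spent computing.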
If I launch the job again right after on 6 cores, I get something much slower. This pattern shows up for different inputs, so it does not seem to be directly related to the input. The example is from a vc-relax run stopped after 4 iterations.
This all feels very random, but do you have an idea why this would happen? Am I doing something wrong?
Another example with a run of 3 iterations, for 3, 6 and 9 procs, repeated twice to show the "random" variations between 2 runs:
Procs   CPU time    WALL time
-----   ---------   ---------
3       6m25.61s    16m17.82s
6       3m18.12s    7m16.88s
9       2m31.85s    6m32.46s 10s

Procs   CPU time    WALL time
-----   ---------   ---------
3       7m17.83s    22m53.90s
6       3m42.18s    3m50.74s
9       5m38.31s    9m21.52s

Thanks in advance,

Julien


_______________________________________________
Quantum ESPRESSO is supported by MaX (www.max-centre.eu/quantum-espresso)
users mailing list [email protected]
https://lists.quantum-espresso.org/mailman/listinfo/users
