Hello
This is strange behavior that does not depend on the program. There may
be many reasons, and it is very hard to guess.
Starting from the most trivial things:
* Some other application could be using the same processors as you at
the same time.
* You are using a file system that is not very efficient, e.g. the home
filesystem instead of the scratch disk or something of the kind.
* You are using multithreading but do not have enough processors for
it. Try setting the environment variable OMP_NUM_THREADS to 1 before
running.
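As a minimal sketch of the last point, this is one way to force a single OpenMP thread per process from a Python launch script. The launcher name (mpirun), the input file name (pw.in) and the process count are placeholders, not taken from your setup; adapt them to whatever your cluster's scheduler expects.

```python
import os
import shlex


def single_thread_env(base_env=None):
    """Return a copy of the environment with OpenMP limited to one thread."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = "1"
    return env


def pw_command(nprocs, input_file="pw.in", launcher="mpirun"):
    """Build the launch command as an argument list.

    launcher and input_file are hypothetical defaults for illustration.
    """
    return [launcher, "-np", str(nprocs), "pw.x", "-i", input_file]


env = single_thread_env()
cmd = pw_command(6)
# Show what would be executed, without starting the calculation.
print("OMP_NUM_THREADS=" + env["OMP_NUM_THREADS"], shlex.join(cmd))
# To actually run it: subprocess.run(cmd, env=env, check=True)
```

The environment is copied rather than modified in place, so the setting only applies to the job you launch with it.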
I hope it helps
Pietro
On 06/06/19 13:21, Julien Barbaud wrote:
Dear users,
I am still struggling to understand the parallel performance of QE on
the cluster of my university. I have to say right off the bat that
this problem might have more to do with the parallel scheduling in our
cluster. However, after many discussions with the people responsible
for the cluster, they don’t seem to see where the problem would be on
their side. So I want to check if that could be a more common problem
and if you would have some suggestions about it.
The problem in a nutshell: the performance of a pw.x run seems
completely random on our cluster. Launching the same job on the same
number of procs can result in calculation times differing by a factor
of 5 or more. This is of course a huge issue in planning how many
cores I want to use, or just trying to have a clue of what’s going on.
When the speed is particularly low, it shows up as a WALL time much
higher than the CPU time.
To illustrate, here is the same code run on 3, 6 and 9 cores, with the
corresponding CPU and WALL time:
Procs   CPU time    WALL time
-----   ---------   ---------
3       6m56.69s    28m33.48s   -> big difference: bad parallelization
6       4m 9.56s    4m20.65s    -> good parallelization
9       5m42s       21m13.10s   -> bad parallelization
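The gap between the two columns can be made explicit as a WALL/CPU ratio. The following is only a sketch: the helper names are made up for illustration, and the parser assumes durations written like the ones in the table (optional hours, minutes as "m" or "min", seconds with an optional trailing "s").

```python
import re

# Accepts forms such as "6m56.69s", "4m 9.56s", "5min42s" or "1h 2m 3.0s".
_DURATION = re.compile(
    r"\s*(?:(\d+)\s*h)?\s*(?:(\d+)\s*m(?:in)?)?\s*(?:(\d+(?:\.\d*)?)\s*s?)?\s*"
)


def to_seconds(text):
    """Convert a duration string from the timing table into seconds."""
    match = _DURATION.fullmatch(text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


def wall_over_cpu(cpu, wall):
    """Ratio of WALL time to CPU time; values near 1 mean little waiting."""
    return to_seconds(wall) / to_seconds(cpu)


# Timings copied from the table above: (procs, CPU time, WALL time).
runs = [
    (3, "6m56.69s", "28m33.48s"),
    (6, "4m 9.56s", "4m20.65s"),
    (9, "5min42s", "21m13.10s"),
]

for procs, cpu, wall in runs:
    print(f"{procs} procs: WALL/CPU = {wall_over_cpu(cpu, wall):.2f}")
```

A ratio close to 1 suggests the processes were computing for most of the elapsed time, while a ratio well above 1 suggests they spent most of it waiting on something outside the calculation.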
The huge difference between CPU time and WALL time is an issue. But
even looking at the CPU time alone, it doesn’t seem to scale well, as
I would not expect the 9 cores to be slower than the 6 (but I lack
experience on this).
If I launch the job again right after on 6 cores, I get something much
slower. This pattern shows up for different inputs, so it does not seem
to be related to that directly. The example is from a vc-relax run
stopped after 4 iterations.
This all feels very random, but do you have an idea why this would
happen? Am I doing something wrong?
Another example with a run of 3 iterations, for 3, 6 and 9 procs,
repeated twice to show the “random” variations between 2 runs:
Procs   CPU time    WALL time
-----   ---------   ---------
3       6m25.61s    16m17.82s
6       3m18.12s    7m16.88s
9       2m31.85s    6m32.46s 10s
Procs   CPU time    WALL time
-----   ---------   ---------
3       7m17.83s    22m53.90s
6       3m42.18s    3m50.74s
9       5m38.31s    9m21.52s
Thanks in advance,
Julien
_______________________________________________
Quantum ESPRESSO is supported by MaX (www.max-centre.eu/quantum-espresso)
users mailing list [email protected]
https://lists.quantum-espresso.org/mailman/listinfo/users