Hello,

I am hoping to get some advice on how to run Quantum ESPRESSO (QE) efficiently
on multiple nodes in a cluster.  I have a simulation that I have been running
on 1 node/16 cores that I am looking to scale up to 2 nodes/32 cores.  Our
cluster has a GPFS
networked filesystem that has previously caused performance issues due to QE's 
large writes.  On one node, I solved this by copying the input files to the
node's local HDD/SSD, running the simulation there, and then moving the results
back to the networked filesystem.  When I run that same single-node script on
two nodes, the job crashes shortly after reading in the pseudopotentials.
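
For reference, the single-node workaround described above can be sketched
roughly as follows.  This is a minimal illustration, not the exact script: the
helper names stage_in and stage_out, the scratch path under /tmp, and the file
names scf.in/scf.out are all hypothetical placeholders.  Note that these helpers
only copy files on the node where the batch script itself runs.

```shell
#!/bin/bash
# Sketch of the single-node staging workflow: copy inputs to node-local
# scratch, run there, then copy results back to networked storage.
# All function names, paths, and file names are hypothetical.

# Copy everything in the submit directory into a local scratch directory.
stage_in() {
    local src="$1" scratch="$2"
    mkdir -p "$scratch"
    cp -r "$src"/. "$scratch"/
}

# Copy everything in local scratch back to a directory on networked storage.
stage_out() {
    local scratch="$1" dest="$2"
    mkdir -p "$dest"
    cp -r "$scratch"/. "$dest"/
}

# Possible use inside a SLURM batch script (left commented out so the
# helpers can be sourced and tried without SLURM or QE installed):
#   SCRATCH="/tmp/$SLURM_JOB_ID"            # assumed node-local path
#   stage_in "$SLURM_SUBMIT_DIR" "$SCRATCH"
#   cd "$SCRATCH" && mpirun -np 16 pw.x -in scf.in > scf.out
#   stage_out "$SCRATCH" "$SLURM_SUBMIT_DIR/results"
```

The commented lines show how the helpers might be called from a batch script;
SLURM_JOB_ID and SLURM_SUBMIT_DIR are standard SLURM environment variables,
while the mpirun launcher and core count are assumptions about the setup.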

I have been able to finish a test simulation using only the network storage, so 
I believe that QE is configured properly to run across multiple nodes.
However, I observed a significant performance hit with the networked storage:
the test took ~1 minute on 1 node with the local drive, but ~7 minutes on
2 nodes with networked storage.

I would like to be able to return to using the local drives on each node to 
avoid these issues with networked storage.  Is there a setting in QE that can
help with this, or is this something that I need to work with my cluster admin
team to resolve?

Additional info:  Our cluster uses SLURM for job submission, and I am currently
using pw.x and ph.x from QE.

Thanks,
Brad

--------------------------------------------------------
Bradly Baer
Graduate Research Assistant, Walker Lab
Interdisciplinary Materials Science
Vanderbilt University


_______________________________________________
Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
users mailing list [email protected]
https://lists.quantum-espresso.org/mailman/listinfo/users