Hello all, I'm a system administrator at the High Performance Computing Center of Universidade Federal do ABC, Brazil (http://hpc.ufabc.edu.br/). I'm not familiar with the internals of scientific research or the tools you use, but we have run into problems with the I/O behavior of pw.x for one of the users we support.
She has a simple job submission script that runs pw.x with an input file and an output file. After some hours of execution, the system load increases dramatically. We can see it is I/O related, but we could not figure out why it happens so late in the run, nor how to fix it. On this cluster we use NFS for both a "distributed scratch" area and the home directories (we know we should use a modern parallel file system, but that is not possible for the moment), and each node also has a large local scratch partition.

Some questions:

1. Why does the heavy I/O only start later in the execution of pw.x?

2. The documentation (here: http://www.quantum-espresso.org/wp-content/uploads/Doc/user_guide/node18.html#SECTION00043100000000000000) is not clear to me about "distributed" versus "collected" data. Although it has some tips, I still wonder what to suggest to our user as the best configuration. Which files can be distributed across the nodes, and which must remain in one place?

3. Is there any flag or configuration we can pass to pw.x to see what it is doing? Any debug flag?

4. Are the variables placed in the input file documented anywhere?

Thank you very much.

-- Silas Silva
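P.S. In case it helps to show what we are considering on our side: below is a minimal sketch of a change to the submission script that would redirect the pw.x scratch directory from NFS to the node-local disk. This is our assumption from reading the user guide (which says the ESPRESSO_TMPDIR environment variable provides the default for the 'outdir' variable), not something we have verified with pw.x yet; the paths and the mpirun line are placeholders for our setup.

```shell
#!/bin/sh
# Sketch (untested assumption on our part): create a per-job
# directory on the node-local scratch disk and point Quantum
# ESPRESSO at it via ESPRESSO_TMPDIR, which the user guide
# describes as the default for 'outdir'.
SCRATCH_BASE=${TMPDIR:-/tmp}            # would be our local scratch mount
JOB_SCRATCH="$SCRATCH_BASE/qe_job_$$"   # unique directory per job
mkdir -p "$JOB_SCRATCH"
export ESPRESSO_TMPDIR="$JOB_SCRATCH"

# placeholder for the actual run line in the user's script:
# mpirun -np 16 pw.x -in pw.in > pw.out

echo "using scratch: $ESPRESSO_TMPDIR"
```

We also noticed a 'disk_io' variable in the &CONTROL namelist of the input description; if reducing the amount written to disk is the right knob here, a pointer to the recommended setting would be very welcome.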
