This is not a QE problem: the Fortran code knows nothing about nodes and cores. The problem lies in the software setup for parallel execution on your machine.
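
As a rough sketch of how one might test the parallel setup independently of QE, the shell commands below look at the MPI launcher and the libraries the executable resolves at run time. Assumptions, none of them confirmed by the original report: `mpirun` and `ldd` are on the PATH, the `PW` variable simply reuses the path from the quoted message, and `report_tool` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: these checks say something about the MPI/MKL setup, not about QE.
# PW reuses the path from the quoted message; adjust it for your installation.
PW="${PW:-/opt/exp_soft/espresso-5.1/bin/pw.x}"

# Hypothetical helper: report where a command is found on PATH, or that it is missing.
report_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: $(command -v "$1")"
    else
        echo "$1: not found in PATH"
    fi
}

report_tool mpirun
report_tool ifort

# Which MPI and MKL shared libraries does the executable resolve at run time?
if [ -x "$PW" ] && command -v ldd >/dev/null 2>&1; then
    ldd "$PW" | grep -i -E 'mpi|mkl' || echo "no MPI/MKL shared libraries listed (static link?)"
else
    echo "skipping ldd: $PW is not executable or ldd is not available"
fi

# Can mpirun start 8 trivial processes at all? QE is not involved in this step.
if command -v mpirun >/dev/null 2>&1; then
    mpirun -n 8 hostname || echo "mpirun could not start 8 processes"
fi
```

If `mpirun -n 8 hostname` already fails, or `ldd` lists MPI or MKL libraries other than the ones used when pw.x was built, that would point to the environment rather than to pw.x itself.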

Paolo

On Thu, Jul 16, 2015 at 2:25 PM, mohaddeseh abbasnejad <[email protected]> wrote:
>
> Dear all,
>
> I have recently installed PWscf (version 5.1) on our cluster (4 nodes, 32
> cores).
> Ifort & mkl version 11.1 has been installed.
> When I run pw.x command on every node individually, for both the following
> command, it will work properly.
> 1- /opt/exp_soft/espresso-5.1/bin/pw.x -in scf.in
> 2- mpirun -n 4 /opt/exp_soft/espresso-5.1/bin/pw.x -in scf.in
> However, when I use the following command (again for each of them,
> separately),
> 3- mpirun -n 8 /opt/exp_soft/espresso-5.1/bin/pw.x -in scf.in
> it gives me such an error:
>
> [cluster:14752] *** Process received signal ***
> [cluster:14752] Signal: Segmentation fault (11)
> [cluster:14752] Signal code: (128)
> [cluster:14752] Failing at address: (nil)
> [cluster:14752] [ 0] /lib64/libpthread.so.0() [0x3a78c0f710]
> [cluster:14752] [ 1] /opt/intel/Compiler/11.1/064/mkl/lib/em64t/libmkl_mc3.so(mkl_blas_zdotc+0x79) [0x2b5e8e37d4f9]
> [cluster:14752] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 4 with PID 14752 on node
> cluster.khayam.local exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
>
> This error also exists when I use all the node with each other in parallel
> mode (using the following command):
> 4- mpirun -n 32 -hostfile testhost /opt/exp_soft/espresso-5.1/bin/pw.x -in scf.in
> The error:
>
> [cluster:14838] *** Process received signal ***
> [cluster:14838] Signal: Segmentation fault (11)
> [cluster:14838] Signal code: (128)
> [cluster:14838] Failing at address: (nil)
> [cluster:14838] [ 0] /lib64/libpthread.so.0() [0x3a78c0f710]
> [cluster:14838] [ 1] /opt/intel/Compiler/11.1/064/mkl/lib/em64t/libmkl_mc3.so(mkl_blas_zdotc+0x79) [0x2b04082cf4f9]
> [cluster:14838] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 24 with PID 14838 on node
> cluster.khayam.local exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
>
> Any help will be appreciated.
>
> Regards,
> Mohaddeseh
>
> ---------------------------------------------------------
>
> Mohaddeseh Abbasnejad,
> Room No. 323, Department of Physics,
> University of Tehran, North Karegar Ave.,
> Tehran, P.O. Box: 14395-547- IRAN
> Tel. No.: +98 21 6111 8634 & Fax No.: +98 21 8800 4781
> Cellphone: +98 917 731 7514
> E-Mail: [email protected]
> Website: http://physics.ut.ac.ir
>
> ---------------------------------------------------------
>
> _______________________________________________
> Pw_forum mailing list
> [email protected]
> http://pwscf.org/mailman/listinfo/pw_forum
>

--
Paolo Giannozzi, Dept. Chemistry&Physics&Environment,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222
