On Tue, Feb 21, 2012 at 3:55 AM, Mahmoud Payami <mpayami at aeoi.org.ir> wrote:
> Dear QE users,
>
> I am using sge for running a parallel job.
> My "file.qsub" contains the following lines:
> --
> #!/bin/bash
> #
> #$ -cwd
> #$ -j y
> #$ -S /bin/bash
> /opt/openmpi/bin/mpirun /opt/qe/bin/pw.x -npool 2 -ndiag 16 < /home/mahmoud/file.in
> --
> Then I use the orte parallel env and use the command:
> qsub -V -pe orte 32 file.qsub
> Everything is ok until the first david diagonalization, during which the load
> on some nodes exceeds the number of processors (that is, the node has
> 8 cores in total, but the load at crash time is more than 16),
> and then those nodes hang.

Are you sure you're not simply running out of memory and driving
the machines to swap until they give up?

axel.

> Any comment is highly appreciated.
>
> Best regards,
> Mahmoud Payami
>
> --------------------------------
> Mahmoud Payami
> Physics Group, AEOI,
> Tehran-Iran
>
> Email: mpayami at aeoi.org.ir
> Phone: +98 (0) 21 82064393
> ----------------------------------------------

--
Dr. Axel Kohlmeyer akohlmey at gmail.com http://goo.gl/1wk0
College of Science and Technology
Temple University, Philadelphia PA, USA.
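
One way to test the swap hypothesis raised above is to look at how much swap a compute node is using while the job runs. The following is only a minimal sketch, assuming Linux nodes that expose /proc/meminfo; the function name swap_used_kb and its optional file argument are hypothetical and not part of QE, Open MPI, or SGE.

```shell
#!/bin/bash
# Hypothetical helper (not from the original thread): print the amount of
# swap in use, in kB, computed from a /proc/meminfo-style file.
# Assumption: the file has "SwapTotal:" and "SwapFree:" lines with values in kB,
# as on Linux. The optional argument lets you point it at a saved copy.
swap_used_kb() {
    local meminfo="${1:-/proc/meminfo}"
    awk '/^SwapTotal:/ { t = $2 }
         /^SwapFree:/  { f = $2 }
         END           { print t - f }' "$meminfo"
}
```

Calling swap_used_kb on a node shortly before it stops responding, and seeing a value that keeps growing, would be consistent with the job running out of memory. A value near zero would point elsewhere.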
