Re: [QE-users] Issue with running parallel version

Martina Lessio Thu, 24 May 2018 10:00:12 -0700

Dear Paolo,

Thanks so much for your response and for testing my input file on your
machine with the latest version of Quantum Espresso. I should compile the
latest version on my machine.


Regarding the memory: that number is quite close to my memory limit (60 GB),
and such a large memory requirement probably explains why my calculations
are so slow. I will consider using a smaller unit cell and pseudopotentials
with fewer valence electrons to reduce the memory requirement. I have
actually been searching for a norm-conserving, fully relativistic
pseudopotential for tellurium with only s and p valence electrons (this
would reduce the number of valence electrons per Te atom from 16 to 6),
but I have not been able to find one. I would appreciate any suggestions
on this.
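
As a rough sketch of the arithmetic involved (not part of the actual run),
the shell snippet below counts valence electrons for the 50 Te and 25 Mo
atoms listed in the input file quoted further down, and divides the total
RAM estimate reported for 16 processes by the process count. The function
names are made up for illustration, and the Mo valence charge of 14 is an
assumption that should be checked against the z_valence entry in the header
of the Mo UPF file.

```shell
#!/bin/bash
# Total valence electrons for n_te Te atoms and n_mo Mo atoms,
# given the valence charge carried by each pseudopotential.
valence_electrons() {
    local n_te=$1 z_te=$2 n_mo=$3 z_mo=$4
    echo $(( n_te * z_te + n_mo * z_mo ))
}

# Average RAM per MPI process in GB, from a total estimate and a process count.
ram_per_process_gb() {
    LC_ALL=C awk -v total="$1" -v nproc="$2" 'BEGIN { printf "%.2f\n", total / nproc }'
}

Z_MO=14   # ASSUMPTION: verify against z_valence in the Mo UPF header

echo "valence electrons, Te with 16 e-: $(valence_electrons 50 16 25 "$Z_MO")"
echo "valence electrons, Te with  6 e-: $(valence_electrons 50 6 25 "$Z_MO")"
echo "RAM per process for 50.87 GB on 16 processes: $(ram_per_process_gb 50.87 16) GB"
```

Whatever the Mo valence charge turns out to be, switching Te from 16 to 6
valence electrons removes 50 x 10 = 500 electrons from the cell. The memory
estimate itself depends on the number of processes, so the figure for 24
processes may differ from the one obtained on 16.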

Thank you so much,
Martina


On Thu, May 24, 2018 at 4:30 AM, Paolo Giannozzi <[email protected]>
wrote:

> On 16 processors with the latest QE version, I get
>
>      Estimated max dynamical RAM per process >       3.18 GB
>      Estimated total dynamical RAM >      50.87 GB
>
> Do you have that much memory?
>
> Paolo
>
>
> On Tue, May 22, 2018 at 1:14 AM, Martina Lessio <[email protected]>
> wrote:
>
>> Dear Quantum Espresso community,
>>
>> I recently started running the parallel version of QE 5.4.0 and I am
>> getting the following error message in my output file (the error never
>> appeared when I ran the code in serial on one processor):
>>
>> [compute-0-6.local:31540] 23 more processes have sent help message
>> help-mpi-btl-base.txt / btl:no-nics
>>
>> [compute-0-6.local:31540] Set MCA parameter "orte_base_help_aggregate" to
>> 0 to see all help / error messages
>>
>> where "compute-0-1" is the name of the node I run my calculation on.
>> After the message is printed in the output, the calculation typically
>> continues normally and in some cases completes successfully. In other
>> cases, usually when I have a large supercell with about 100 atoms, the
>> calculation becomes extremely slow and does not finish in a reasonable
>> time. Therefore, I suspect that the error message also signals that the
>> calculation has fallen back to running on only one processor. However,
>> upon checking, I noticed that all the processors I requested are still
>> busy with the calculation after the error message shows up.
>>
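
A minimal sketch of how the hint in that log message could be followed: the
helper below only assembles an mpirun command line with the MCA parameter
"orte_base_help_aggregate" set to 0, so that every rank prints its full
help text. The helper name is hypothetical, and passing the input with
"pw.x -i" instead of the "<" redirection used in the submission script is a
substitution made here, not something taken from the original job.

```shell
#!/bin/bash
# Build (but do not run) an mpirun command line for pw.x with Open MPI
# help-message aggregation switched off, as the log message suggests.
build_pw_command() {
    local nproc=$1 input=$2
    printf 'mpirun --mca orte_base_help_aggregate 0 -np %s pw.x -i %s\n' "$nproc" "$input"
}

# Print the command for the 24-process job described in this thread.
build_pw_command 24 MoTe2ml_super551OPT.in
```

Turning aggregation off only makes the full help text visible; it does not
by itself change which network transport Open MPI ends up using.
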
>> I searched the forum and the FAQs for this type of issue but did not
>> find much. I would really appreciate it if anybody could share their
>> experience with this type of error.
>> I am providing below my submission script and input file (for a large
>> calculation that runs very slowly after the error message is printed).
>>
>> Thanks so much,
>> Martina
>>
>> --
>> Martina Lessio, Ph.D.
>> Frontiers of Science Lecturer in Discipline
>> Postdoctoral Research Scientist
>> Department of Chemistry
>> Columbia University
>>
>> *Submission script:*
>> #!/bin/bash
>> #SBATCH --job-name=QErun
>> #SBATCH -n 24   # number of tasks (MPI processes)
>> #SBATCH -p New # partition name
>> #SBATCH -o MoTe2ml_super551OPT.out
>> #SBATCH --mem=60000
>> module load openmpi
>> module load mkl
>> module load compilers
>> mpirun -np 24 pw.x < MoTe2ml_super551OPT.in
>>
>> *Input file:*
>>  &control
>>     calculation = 'relax'
>>     restart_mode='from_scratch',
>>     prefix='MoTe2ml_super5x5relax',
>>     pseudo_dir = '/home/mlessio/espresso-5.4.0/pseudo/',
>>     outdir='/home/mlessio/espresso-5.4.0/tempdir/'
>>  /
>>
>>  &system
>>     ibrav= 4, A=17.65, B=17.65, C=16.882, cosAB=-0.5, cosAC=0, cosBC=0,
>>     nat= 75, ntyp= 2,
>>     ecutwfc =60.
>>     lspinorb =.true., noncolin=.true.
>>  /
>>
>>  &electrons
>>     mixing_mode = 'plain'
>>     mixing_beta = 0.7
>>     conv_thr =  1.0d-10
>>     diago_david_ndim=2
>>     diagonalization='cg'
>>  /
>>
>>  &ions
>>  /
>>
>> ATOMIC_SPECIES
>>  Te  127.6 Te_ONCV_PBE_FR-1.1.upf
>>  Mo  95.96 Mo_ONCV_PBE_FR-1.0.upf
>>
>> ATOMIC_POSITIONS {crystal}
>> Te     0.133333330         0.066666657         0.313489588
>> Te     0.133333336         0.266666661         0.313489588
>> Te     0.133333334         0.466666672         0.313489588
>> Te     0.133333325         0.666666683         0.313489588
>> Te     0.133333336         0.866666694         0.313489588
>> Te     0.333333312         0.066666657         0.313489588
>> Te     0.333333306         0.266666661         0.313489588
>> Te     0.333333310         0.466666672         0.313489588
>> Te     0.333333304         0.666666683         0.313489588
>> Te     0.333333319         0.866666694         0.313489588
>> Te     0.533333329         0.066666657         0.313489588
>> Te     0.533333336         0.266666661         0.313489588
>> Te     0.533333320         0.466666672         0.313489588
>> Te     0.533333317         0.666666683         0.313489588
>> Te     0.533333335         0.866666694         0.313489588
>> Te     0.733333372         0.066666657         0.313489588
>> Te     0.733333352         0.266666661         0.313489588
>> Te     0.733333390         0.466666672         0.313489588
>> Te     0.733333374         0.666666683         0.313489588
>> Te     0.733333385         0.866666694         0.313489588
>> Te     0.933333361         0.066666657         0.313489588
>> Te     0.933333341         0.266666661         0.313489588
>> Te     0.933333379         0.466666672         0.313489588
>> Te     0.933333363         0.666666683         0.313489588
>> Te     0.933333347         0.866666694         0.313489588
>> Te     0.133333330         0.066666657         0.097661430
>> Te     0.133333336         0.266666661         0.097661430
>> Te     0.133333334         0.466666672         0.097661430
>> Te     0.133333325         0.666666683         0.097661430
>> Te     0.133333336         0.866666694         0.097661430
>> Te     0.333333312         0.066666657         0.097661430
>> Te     0.333333306         0.266666661         0.097661430
>> Te     0.333333310         0.466666672         0.097661430
>> Te     0.333333304         0.666666683         0.097661430
>> Te     0.333333319         0.866666694         0.097661430
>> Te     0.533333329         0.066666657         0.097661430
>> Te     0.533333336         0.266666661         0.097661430
>> Te     0.533333320         0.466666672         0.097661430
>> Te     0.533333317         0.666666683         0.097661430
>> Te     0.533333335         0.866666694         0.097661430
>> Te     0.733333372         0.066666657         0.097661430
>> Te     0.733333352         0.266666661         0.097661430
>> Te     0.733333390         0.466666672         0.097661430
>> Te     0.733333374         0.666666683         0.097661430
>> Te     0.733333385         0.866666694         0.097661430
>> Te     0.933333361         0.066666657         0.097661430
>> Te     0.933333341         0.266666661         0.097661430
>> Te     0.933333379         0.466666672         0.097661430
>> Te     0.933333363         0.666666683         0.097661430
>> Te     0.933333347         0.866666694         0.097661430
>> Mo     0.066666675         0.133333330         0.205570934
>> Mo     0.066666667         0.333333310         0.205570934
>> Mo     0.066666685         0.533333321         0.205570934
>> Mo     0.066666655         0.733333332         0.205570934
>> Mo     0.066666666         0.933333343         0.205570934
>> Mo     0.266666695         0.133333330         0.205570934
>> Mo     0.266666683         0.333333310         0.205570934
>> Mo     0.266666698         0.533333321         0.205570934
>> Mo     0.266666678         0.733333332         0.205570934
>> Mo     0.266666696         0.933333343         0.205570934
>> Mo     0.466666671         0.133333330         0.205570934
>> Mo     0.466666666         0.333333310         0.205570934
>> Mo     0.466666690         0.533333321         0.205570934
>> Mo     0.466666668         0.733333332         0.205570934
>> Mo     0.466666681         0.933333343         0.205570934
>> Mo     0.666666687         0.133333330         0.205570934
>> Mo     0.666666655         0.333333310         0.205570934
>> Mo     0.666666666         0.533333321         0.205570934
>> Mo     0.666666650         0.733333332         0.205570934
>> Mo     0.666666674         0.933333343         0.205570934
>> Mo     0.866666676         0.133333330         0.205570934
>> Mo     0.866666644         0.333333310         0.205570934
>> Mo     0.866666682         0.533333321         0.205570934
>> Mo     0.866666666         0.733333332         0.205570934
>> Mo     0.866666650         0.933333343         0.205570934
>>
>> K_POINTS {automatic}
>>   2 2 1 0 0 0
>>
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://lists.quantum-espresso.org/mailman/listinfo/users
>>
>
>
>
> --
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
>
>



-- 
Martina Lessio, Ph.D.
Frontiers of Science Lecturer in Discipline
Postdoctoral Research Scientist
Department of Chemistry
Columbia University

