Dear Vishal Gupta,

Which version of Quantum ESPRESSO do you use? Do you use an SVN version
downloaded on or after July 6th, 2015 (revision r11608 or later)? If yes,
then you may have the same problem as I do. My problem is due to the commit
on July 6th (r11608), and it occurs when I run QE on FERMI @CINECA
(BlueGene/Q architecture, Italian HPC) with 2048 cores (there is no problem
with 1024 cores).

http://qe-forge.org/gf/project/q-e/scmsvn/?action=browse&path=%2Ftrunk%2Fespresso%2FModules%2Fmp_world.f90&r1=11607&r2=11608

At the very beginning of the run, there is a message:

    4466173:ibm.runjob.client.Job: terminated by signal 11
    4466173:ibm.runjob.client.Job: abnormal termination by signal 11 from rank 2031

and the code crashes without producing any output. However, the problem
didn't occur on the other HPC systems I use.

I have solved the problem by going back to revision 11607, which means
reverting the change in the routine Modules/mp_world.f90, i.e. going back
from

    CALL MPI_Init_thread(MPI_THREAD_MULTIPLE, PROVIDED, ierr)

to

    CALL mpi_init_thread(MPI_THREAD_FUNNELED,PROVIDED,ierr)

You may also try making the same change in mp_world.f90 and testing the
code again.

HTH

Best regards,
Iurii Timrov
Postdoctoral Researcher
SISSA - International School for Advanced Studies
Condensed Matter Sector
Via Bonomea n. 265, Trieste 34151, Italy

On 2015-08-14 19:32, Axel Kohlmeyer wrote:
> On Fri, Aug 14, 2015 at 1:20 PM, Vishal Gupta
> <[email protected]> wrote:
>> Sorry, I should've mentioned.
>> I asked them but they said there might be something wrong with the QE
>> input file. If that was the case, the file shouldn't have been running
>> fine with 7 processors but it is. Could there really be something wrong
>> with the input file ?
>
> sysadmins often say this, so they don't have to check it out, or when
> they don't know what they are doing. if they *know* that there is
> something wrong with the input, then they should at the very least
> tell you what it is.
>
> but i agree that if it works with less processors, it should work with
> more.
> unless you are using some very unusual settings when launching
> the job. more likely is that you are running out of memory on the
> machine or are hitting a stack size limit or something similar. your
> system manager(s) should be able to figure this out and/or advise you
> how to run so that you are using less memory, or with a hybrid MPI plus
> OpenMP parallelization or whatever else is possible on the specific
> machine.
>
> in any case, it doesn't really sound like a QE problem.
>
> axel.
>
>> Sorry if I am asking stupid doubts but I am a little new at this.
>> Vishal Gupta
>> B.Tech. 3rd year Mechanical
>> Indian Institute of Technology Ropar
>> Rupnagar (140001), Punjab, India.
>> Email :- [email protected]
>>
>> On Fri, Aug 14, 2015 at 10:32 PM, Axel Kohlmeyer <[email protected]>
>> wrote:
>>>
>>> On Fri, Aug 14, 2015 at 12:58 PM, Vishal Gupta
>>> <[email protected]> wrote:
>>> > I've been running an SCF calculation for a fcc Ni system on High
>>> > performance cluster. The job runs fine with processors 7 or less
>>> > but it always leads to segmentation fault if the no of processors
>>> > exceeds 7.
>>> > The job takes 4-5 days for the run.
>>> > Is there any way to increase the no of processors so that it
>>> > doesn't lead to the error ?
>>> > mpirun noticed that process rank 0 with PID 6353 on node c7c
>>> > exited on signal 11 (Segmentation fault).
>>> > or excessive memory leakage.
>>>
>>> that is really a question you should ask the system manager(s) or
>>> user support people of the machine that you are running on.
>>>
>>> axel.
>>>
>>> >
>>> > Thank You
>>> > Vishal Gupta
>>> > B.Tech. 3rd year Mechanical
>>> > Indian Institute of Technology Ropar
>>> > Rupnagar (140001), Punjab, India.
>>> > Email :- [email protected]
>>> >
>>> > _______________________________________________
>>> > Pw_forum mailing list
>>> > [email protected]
>>> > http://pwscf.org/mailman/listinfo/pw_forum
>>>
>>> --
>>> Dr. Axel Kohlmeyer [email protected] http://goo.gl/1wk0
>>> College of Science & Technology, Temple University, Philadelphia PA,
>>> USA
>>> International Centre for Theoretical Physics, Trieste. Italy.
