Dear Joshua,

Thanks for your tip. Unfortunately, restarting with a different number of processes is not working for me. My goal is to migrate the execution between different machines, but as a first step I'm trying to stop the computation and restart it on the same machine, just with a different number of processes. If I succeed in this scenario, I'll move on to migration. I'm working with the latest version, 5.3.0, compiled on a CentOS 7 machine with gfortran 4.8.5 and OpenMPI 1.10.0.

The CONTROL namelist of my input file looks like this:

    &CONTROL
      prefix = "migration",
      restart_mode = "from_scratch",
      wf_collect = .TRUE.,
      outdir = "./scratch/",
      pseudo_dir = "./pseudopotentials.d",
    /

(you can find the full version here: http://pastebin.com/rxN7KCq3)

I started the execution with the following command:

    $ mpirun -np 2 ~/quantum/install/pw.x -inp test_4.in > test_4.out

I left it running for a few minutes. Then I stopped the calculation with:

    $ touch migration.EXIT

In the output file test_4.out, I can see that the execution reached the sixth iteration:

    iteration # 6 ecut= 25.00 Ry beta=0.30
    Davidson diagonalization with overlap
    ethr = 2.90E-04, avg # of iterations = 1.0

(full output here: http://pastebin.com/8YhkWTmr)

From a previous run, I know that there are 28 iterations in total.

After that, I changed the CONTROL namelist to this:

    &CONTROL
      prefix = "migration",
      restart_mode = "restart",
      wf_collect = .TRUE.,
      outdir = "./scratch/",
      pseudo_dir = "./pseudopotentials.d",
    /

I restarted the execution with the following command:

    $ mpirun -np 4 ~/quantum/install/pw.x -inp test_4.in > test_4_migration.out

As you can see, I'm requesting 4 processes in the second run instead of 2. Using the Linux tool 'top', I can see that four processes were created. The program seems to find the right iteration, since the output file test_4_migration.out contains the following:

    Starting wfc from file
    Calculation restarted from scf iteration # 7
    total cpu time spent up to now is 3.3 secs
    per-process dynamical memory: 44.9 Mb

(full output here: http://pastebin.com/GfBBqxYJ)

But even after several minutes, no new iterations are appended to the file, and there are no error messages either. Am I missing something?
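
In case it helps to reproduce the steps, here is a minimal Python sketch of the procedure described above. The helper names (set_restart_mode, request_stop, pw_command) are hypothetical and mine, not part of Quantum ESPRESSO; the sketch only edits the restart_mode entry, creates the $prefix.EXIT file, and assembles the same mpirun command line I typed by hand. It does not address the hang itself.

```python
import re
from pathlib import Path


def set_restart_mode(namelist_text, mode):
    """Return the input text with restart_mode set to `mode`.

    Only the two values used in this message are accepted.
    """
    if mode not in ("from_scratch", "restart"):
        raise ValueError("unsupported restart_mode: %r" % mode)
    # Match: restart_mode = "<anything>" (single or double quotes).
    pattern = re.compile(r'(restart_mode\s*=\s*)(["\'])[^"\']*\2')
    new_text, count = pattern.subn(
        lambda m: m.group(1) + m.group(2) + mode + m.group(2),
        namelist_text,
    )
    if count == 0:
        raise ValueError("restart_mode not found in input text")
    return new_text


def request_stop(workdir, prefix):
    """Create <prefix>.EXIT in workdir (same effect as `touch migration.EXIT`)."""
    exit_file = Path(workdir) / (prefix + ".EXIT")
    exit_file.touch()
    return exit_file


def pw_command(nprocs, pw_x, input_file):
    """Build the argument list for: mpirun -np N pw.x -inp input_file."""
    return ["mpirun", "-np", str(nprocs), pw_x, "-inp", input_file]


if __name__ == "__main__":
    control = (
        "&CONTROL\n"
        '  prefix = "migration",\n'
        '  restart_mode = "from_scratch",\n'
        "  wf_collect = .TRUE.,\n"
        "/\n"
    )
    # Second run: same input, restart_mode flipped, 4 processes instead of 2.
    print(set_restart_mode(control, "restart"))
    print(" ".join(pw_command(4, "~/quantum/install/pw.x", "test_4.in")))
```

The command is built as a list so it could be passed to subprocess.run with stdout redirected to test_4_migration.out; I left that call out so the sketch runs without pw.x installed.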

Cheers,

---------------------------------------------------
Name: Joaquim José Xavier
Institution: Faculdade de Educação, Ciências, e Letras do Sertão Central - Quixadá - Ceará - Brasil
http://www.uece.br/feclesc/
---------------------------------------------------

On Tue, Mar 8, 2016 at 5:12 PM, Joshua Davis <[email protected]> wrote:

> Dear Joaquim,
>
> you may want to look up the "wfcollect" option under &CONTROL
>
> http://www.quantum-espresso.org/wp-content/uploads/Doc/INPUT_PW.html#__top__
>
> Joshua Davis
> Michigan State University
>
> On Tue, Mar 8, 2016 at 2:11 PM Malicious Scientist <[email protected]> wrote:
>
>> Dear Nicola,
>>
>> Sorry, my mistake.
>>
>> ---------------------------------------------------
>> Name: Joaquim José Xavier
>> Institution: Faculdade de Educação, Ciências, e Letras do Sertão Central - Quixadá - Ceará - Brasil
>> http://www.uece.br/feclesc/
>> ---------------------------------------------------
>>
>> On Tue, Mar 8, 2016 at 3:42 PM, Nicola Marzari <[email protected]> wrote:
>>
>>> Dear Malicious,
>>>
>>> PLEASE see the posting guidelines:
>>> http://www.quantum-espresso.org/forum/#1.0
>>>
>>> *Sign your post with your name and affiliation.*
>>>
>>> nicola
>>>
>>> On 08/03/2016 19:37, Malicious Scientist wrote:
>>> > Hello Community,
>>> >
>>> > I would like to know if it is possible to stop a pw.x run, copy the
>>> > files to a different machine, and then restart the computation.
>>> >
>>> > For example, to stop the execution, I would create a $prefix.EXIT file
>>> > in the working directory (just as described at
>>> > http://www.quantum-espresso.org/wp-content/uploads/Doc/pw_user_guide/node19.html).
>>> >
>>> > After that, I would copy the entire working directory, including the
>>> > scratch dir, to a remote server with the same version of QE installed.
>>> > Then I would restart the computation, setting the 'restart_mode' flag to
>>> > 'restart' in the CONTROL namelist.
>>> >
>>> > Is this supposed to work? If so, may I restart the computation with a
>>> > different number of CPUs?
>>> >
>>> > Thank you for your attention.
>>> >
>>> > _______________________________________________
>>> > Pw_forum mailing list
>>> > [email protected]
>>> > http://pwscf.org/mailman/listinfo/pw_forum
>>>
>>> --
>>> ----------------------------------------------------------------------
>>> Prof Nicola Marzari, Chair of Theory and Simulation of Materials, EPFL
>>> Director, National Centre for Competence in Research NCCR MARVEL, EPFL
>>> http://theossrv1.epfl.ch/Main/Contact http://nccr-marvel.ch/en/project
