On Wed, 2013-02-27 at 11:10 +0000, Karttunen Antti wrote:
> Dear Andrea,
>
> I'm glad to hear that resetting current_q is no problem. While running some
> tests today, I realized that there is one more point I did not think about in
> the new only_init approach. We run the individual (q,irr) grid jobs in serial
> mode since this is simplest to achieve in a grid. However, it would be really
> helpful to execute the only_init run locally in parallel since in serial mode
> the epsilon + band structure calculation of the largest systems can take
> several days. So, I tried to run only_init in parallel and (q,irr) jobs in
> serial, but then ph.x fails for q>1 because the only_init run writes a
> separate wfc file for each parallel process, while openfilq looks for
> just one wfc file. My naive first attempt was to modify run_pwscf:
> twfcollect=.FALSE.
> IF (only_init) twfcollect=.TRUE.
> CALL punch( 'all' )
> but I realized that this just writes the data to _ph0/qdir/prefix.save in
> wf_collect format and does not produce the _ph0/qdir/prefix.wfc file that
> openfilq expects.
>
> I wonder if there would be any simple way to
> a) make run_pwscf write the _ph0/qdir/prefix.wfc file for a parallel
> only_init job
> or b) make openfilq (and phq_init) read wf_collect-style wavefunction data
> for q>1 if there is no _ph0/qdir/prefix.wfc file (or if a keyword tells it
> so)?
>
> Or would this just complicate things too much? I guess the latter option
> could be considered as an "internal" wf_collect option for ph.x, resulting in
> maximum flexibility.
>
I am not going to implement this in the SVN version, at least not now.
However, it seems that if you reopen the wavefunctions after saving them
with twfcollect=.true., with something like:

  CALL punch( 'all' )
  IF (only_init) THEN
     ! release the pw.x data and close the open files
     CALL clean_pw( .TRUE. )
     CALL close_files( .TRUE. )
     ! read the saved data back from the phonon scratch directory
     wfc_dir = tmp_dir_phq
     tmp_dir = tmp_dir_phq
     CALL read_file()
     IF (.NOT.lgamma_iq(iq).OR.(qplot.AND.iq>1)) &
        CALL set_small_group_of_q(nsymq,invsymq,minus_q)
  ENDIF

you can both run the epsilon calculation and the next ph.x runs with a
different number of processors. It is really inelegant, and I think
there are better ways to do this, but it seems to work.
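
To make the split between the parallel only_init run and the serial (q,irr)
jobs concrete, here is a minimal sketch of how the input for one (q,irr) job
might be generated. The helper name write_ph_input, the prefix 'al', the
4x4x4 q-grid and the file name al.dyn are placeholders, and the namelist is
only an assumption modelled on the GRID example; please check every variable
against INPUT_PH for your version before using it.

```shell
#!/bin/sh
# Hypothetical helper: print a ph.x input for a single (q,irr) grid job.
# Prefix, q-grid and fildyn are placeholders, not values from this thread.
write_ph_input() {
    q=$1        # index of the q-point
    irr=$2      # index of the irreducible representation
    outdir=$3   # scratch directory that holds _ph0
    cat <<EOF
phonons, q-point $q, irrep $irr
 &inputph
    prefix = 'al'
    outdir = '$outdir'
    fildyn = 'al.dyn'
    ldisp = .true.
    nq1 = 4, nq2 = 4, nq3 = 4
    start_q = $q
    last_q = $q
    start_irr = $irr
    last_irr = $irr
    recover = .true.
    low_directory_check = .true.
 /
EOF
}

# Example: show the input for irrep 2 of q-point 3.
write_ph_input 3 2 ./tmp
```

Redirecting the output to a file, for instance
write_ph_input 3 2 ./tmp > ph.3.2.in, gives one input per grid job; the
only_init input would be a separate file with only_init = .true. and no
start_irr/last_irr restriction.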
Best wishes,
Andrea
> Best wishes,
> Antti
>
> --
> Dr. Antti Karttunen
> Department of Chemistry
> University of Jyväskylä, Finland
> Tel: +358-50-3473475
> WWW: http://www.iki.fi/ankarttu
>
>
> -----Original Message-----
> From: pw_forum-bounces at pwscf.org [mailto:pw_forum-bounces at pwscf.org] On
> Behalf Of Andrea Dal Corso
> Sent: Wednesday, February 27, 2013 12:13 PM
> To: PWSCF Forum
> Subject: Re: [Pw_forum] ph.x: Avoiding the recalculation of the band
> structure in distributed phonon dispersion jobs
>
>
> On Wed, 2013-02-27 at 06:55 +0000, Karttunen Antti wrote:
> > Dear Andrea,
> >
> > Thank you very much for the bug fix and for introducing the low_directory_check
> > input variable. Now the process goes very smoothly and we can avoid all the
> > unnecessary band calculations in the future.
> >
> > I noticed that there is still some problem with the GRID_example
> > run_example_3: Looking at the reference output files, epsilon and bands are
> > actually recalculated at every q. I ran the example and it seems that there
> > is some problem with the management of the temporary directories. The
> > example actually runs nicely, if one completely omits the creation of the
> > separate $q.$irr directories and just runs with one single _ph0 directory
> > with one $prefix.phsave and all the qdirs.
> >
> > I also noticed that the run_example_3 always tries to keep the qdir of the
> > last q-point in the current temp directory:
> > cp -r $TMP_DIR/_ph0/$PREFIX.q_8 $TMP_DIR/$q.$irr/_ph0/
> > I guess the reason for this is that without this, ph.x crashes for q<8
> > because seqopn fails for $prefix.q_8/recover? I encountered this with my
> > own tests, too. It seems that after the only_init run, CURRENT_Q in
> > status_run.xml is set to the last q-point and ph.x would then like to have
> > $prefix.q_8 directory around in the following (q,irr) calculations. I would
> > rather not copy all qdirs into every (q,irr) _ph0 directory, so after the
> > only_init run I will reset CURRENT_Q to 1 in my scripts, for example with
> > something like
> >
> > sed -r -i '/<CURRENT_Q/,/<\/CURRENT_Q/s/[[:digit:]]+[[:space:]]*$/1/' \
> >     _ph0/$prefix.phsave/status_run.xml
> >
> > works nicely. Or maybe ph.x could reset CURRENT_Q to 1 at the end of a
> > successful only_init run? But this might have some side effects I'm not
> > aware of, so I'm also fine with using the above script. Anyway, thanks a
> > lot for all the great work with the grid implementation, this will
> > enormously speed up our work on the phonon calculations of large systems.
> >
> The script still had some problems; it should be OK now. The reset of
> current_q is also fine, and I have now committed the change.
>
> The reason for having different directories $q.$irr is that the GRID
> example should also work on different machines that do not share the
> same disk. They are not necessary when you work with many CPUs that
> share the same disk.
>
> Andrea
>
>
>
>
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum
--
Andrea Dal Corso Tel. 0039-040-3787428
SISSA, Via Bonomea 265 Fax. 0039-040-3787249
I-34136 Trieste (Italy) e-mail: dalcorso at sissa.it
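
For readers who want to script the CURRENT_Q reset quoted in this thread, a
minimal sketch follows. It wraps the sed expression from the thread in a
small function; the name reset_current_q is hypothetical, the -r and -i
options assume GNU sed, and the mock status_run.xml layout in the demo is an
assumption rather than real ph.x output.

```shell
#!/bin/sh
# Hypothetical helper: set CURRENT_Q back to 1 in a ph.x status_run.xml.
# Uses the sed expression quoted in the thread; -r and -i assume GNU sed.
reset_current_q() {
    status_file=$1
    if [ ! -f "$status_file" ]; then
        echo "reset_current_q: no such file: $status_file" >&2
        return 1
    fi
    # Only lines between <CURRENT_Q ...> and </CURRENT_Q> are touched; on
    # those lines a trailing run of digits is replaced by 1.
    sed -r -i '/<CURRENT_Q/,/<\/CURRENT_Q/s/[[:digit:]]+[[:space:]]*$/1/' "$status_file"
}

# Demo on a mock file; this layout is an assumption, not real ph.x output.
demo_dir=$(mktemp -d)
cat > "$demo_dir/status_run.xml" <<'EOF'
<Root>
  <CURRENT_Q type="integer" size="1">
    8
  </CURRENT_Q>
</Root>
EOF
reset_current_q "$demo_dir/status_run.xml"
cat "$demo_dir/status_run.xml"
rm -r "$demo_dir"
```

In a real run the argument would be _ph0/$prefix.phsave/status_run.xml, and
the call would go right after the only_init job has finished.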