On Wed, 2013-02-27 at 11:10 +0000, Karttunen Antti wrote:
> Dear Andrea,
> 
> I'm glad to hear that resetting current_q is no problem. While running some 
> tests today, I realized that there is one more point I did not think about in 
> the new only_init approach. We run the individual (q,irr) grid jobs in serial 
> mode since this is simplest to achieve in a grid. However, it would be really 
> helpful to execute the only_init run locally in parallel since in serial mode 
> the epsilon + band structure calculation of the largest systems can take 
> several days. So, I tried to run only_init in parallel and the (q,irr) jobs 
> in serial, but then ph.x fails for q>1 since the only_init run writes 
> separate wfc files for all parallel processes and openfilq is looking for 
> just one wfc file. My naive first attempt was to modify run_pwscf:
>   twfcollect=.FALSE.
>   IF (only_init) twfcollect=.TRUE.
>   CALL punch( 'all' )
> but I realized that this just writes the data in the _ph0/qdir/prefix.save in 
> the wf_collect-format and does not produce the _ph0/qdir/prefix.wfc file that 
> openfilq is waiting for. 
> 
> I wonder if there would be any simple way to 
> a) make run_pwscf write the _ph0/qdir/prefix.wfc file for a parallel 
> only_init job
> or b) make openfilq (and phq_init) read wf_collect-style wavefunction data 
> for q>1 if there is no _ph0/qdir/prefix.wfc file (or if a keyword tells it 
> to)?
> 
> Or would this just complicate things too much? I guess the latter option 
> could be considered as an "internal" wf_collect option for ph.x, resulting in 
> maximum flexibility.
> 

I am not going to implement this in the SVN version, at least not now.
However, it seems that if you reopen the wavefunctions after saving them
with twfcollect=.true., using something like:

     CALL punch( 'all' )
     IF (only_init) THEN
        CALL clean_pw( .TRUE. )
        CALL close_files(.true.)
        wfc_dir=tmp_dir_phq
        tmp_dir=tmp_dir_phq
        CALL read_file()
        IF (.NOT.lgamma_iq(iq).OR.(qplot.AND.iq>1)) &
           CALL set_small_group_of_q(nsymq,invsymq,minus_q)
     ENDIF

you can both run the epsilon calculation and the next ph.x runs with a
different number of processors. It is really inelegant, and I think
there are better ways to do this, but it seems to work.

Best wishes,

Andrea



> Best wishes,
> Antti
> 
> -- 
> Dr. Antti Karttunen
> Department of Chemistry
> University of Jyväskylä, Finland
> Tel: +358-50-3473475
> WWW: http://www.iki.fi/ankarttu 
> 
> 
> -----Original Message-----
> From: pw_forum-bounces at pwscf.org [mailto:pw_forum-bounces at pwscf.org] On 
> Behalf Of Andrea Dal Corso
> Sent: Wednesday, February 27, 2013 12:13 PM
> To: PWSCF Forum
> Subject: Re: [Pw_forum] ph.x: Avoiding the recalculation of the band 
> structure in distributed phonon dispersion jobs
> 
> 
> On Wed, 2013-02-27 at 06:55 +0000, Karttunen Antti wrote:
> > Dear Andrea,
> > 
> > Thank you very much for the bug fix and introducing the low_directory_check 
> > input variable. Now the process goes very smoothly and we can avoid all the 
> > unnecessary band calculations in the future.
> > 
> > I noticed that there is still some problem with the GRID_example 
> > run_example_3: Looking at the reference output files, epsilon and bands are 
> > actually recalculated at every q. I ran the example and it seems that there 
> > is some problem with the management of the temporary directories. The 
> > example actually runs nicely, if one completely omits the creation of the 
> > separate $q.$irr directories and just runs with one single _ph0 directory 
> > with one $prefix.phsave and all the qdirs. 
> > 
> > I also noticed that the run_example_3 always tries to keep the qdir of the 
> > last q-point in the current temp directory:
> >   cp -r $TMP_DIR/_ph0/$PREFIX.q_8 $TMP_DIR/$q.$irr/_ph0/
> > I guess the reason for this is that without this, ph.x crashes for q<8 
> > because seqopn fails for $prefix.q_8/recover? I encountered this with my 
> > own tests, too. It seems that after the only_init run, CURRENT_Q in 
> > status_run.xml is set to the last q-point and ph.x would then like to have 
> > $prefix.q_8 directory around in the following (q,irr) calculations. I'm 
> > planning that I don't want to move all qdirs into every (q,irr) _ph0 
> > directory, so after the only_init run, I will reset the CURRENT_Q to 1 in 
> > my scripts. For example something like
> > 
> > sed -r -i '/<CURRENT_Q/,/<\/CURRENT_Q/s/[[:digit:]]+[[:space:]]*$/1/' 
> > _ph0/$prefix.phsave/status_run.xml
> > 
> > works nicely. Or maybe ph.x could reset CURRENT_Q to 1 at the end of a 
> > successful only_init run? But this might have some side effects I'm not 
> > aware of, so I'm also fine with using the above script. Anyway, thanks a 
> > lot for all the great work with the grid implementation, this will 
> > enormously speed up our work on the phonon calculations of large systems.
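
The CURRENT_Q reset quoted above can be tried on a throwaway file before touching a real run. The following is only a minimal sketch: the helper name reset_current_q and the layout of the sample file (the value on its own line between the opening and closing CURRENT_Q tags) are assumptions made for illustration, not taken from an actual status_run.xml. It prints the result to stdout instead of editing in place with -i as the original command does.

```shell
#!/bin/sh
# Sketch: reset CURRENT_Q to 1 in a status_run.xml-like file.
# ASSUMPTION: the q-point index sits on its own line between the
# <CURRENT_Q ...> and </CURRENT_Q> tags (layout not confirmed by the thread).

# Hypothetical helper: prints the modified file to stdout, leaves $1 untouched.
reset_current_q() {
    # Same expression as the quoted command: inside the CURRENT_Q block,
    # replace a trailing run of digits (plus optional whitespace) with 1.
    sed -E '/<CURRENT_Q/,/<\/CURRENT_Q/s/[[:digit:]]+[[:space:]]*$/1/' "$1"
}

# Demo on a temporary file with the assumed layout.
demo=$(mktemp)
cat > "$demo" <<'EOF'
<CURRENT_Q type="integer" size="1">
         8
</CURRENT_Q>
EOF
reset_current_q "$demo"
rm -f "$demo"
```

Once the printed output looks right, the same expression can be applied in place to _ph0/$prefix.phsave/status_run.xml as in the original command.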
> > 
> The script still had some problems; now it should be OK. Resetting
> current_q is also fine; I have now committed the change.
> 
> The reason for having different directories $q.$irr is that the GRID
> example should also work on different machines that do not share the
> same disk, but it is not necessary to use them when you work with many
> CPUs that share the same disk.
> 
> Andrea
> 
> 
> 
> 
-- 
Andrea Dal Corso                    Tel. 0039-040-3787428
SISSA, Via Bonomea 265              Fax. 0039-040-3787249
I-34136 Trieste (Italy)             e-mail: dalcorso at sissa.it

