Dear Andrea,

I'm glad to hear that resetting current_q is no problem. While running some 
tests today, I realized that there is one more point I did not think about in 
the new only_init approach. We run the individual (q,irr) grid jobs in serial 
mode since this is simplest to achieve in a grid. However, it would be really 
helpful to execute the only_init run locally in parallel since in serial mode 
the epsilon + band structure calculation of the largest systems can take 
several days. So, I tried to run only_init in parallel and the (q,irr) jobs in 
serial, but then ph.x fails for q>1 because the only_init run writes a separate 
wfc file for each parallel process, while openfilq looks for a single wfc 
file. My naive first attempt was to modify run_pwscf:
  twfcollect=.FALSE.
  IF (only_init) twfcollect=.TRUE.
  CALL punch( 'all' )
but I realized that this only writes the data to _ph0/qdir/prefix.save in 
the wf_collect format and does not produce the _ph0/qdir/prefix.wfc file that 
openfilq is waiting for. 

I wonder if there would be any simple way to 
a) make run_pwscf write the _ph0/qdir/prefix.wfc file for a parallel 
only_init job
or b) make openfilq (and phq_init) read wf_collect-style wavefunction data 
for q>1 if there is no _ph0/qdir/prefix.wfc file (or if a keyword tells it to)?

Or would this just complicate things too much? I guess the latter option could 
be considered as an "internal" wf_collect option for ph.x, resulting in maximum 
flexibility.
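
In the meantime, a small pre-flight check before submitting the serial (q,irr) 
jobs can at least catch the problem early. The following is only a sketch: the 
function name check_wfc and the prefix "demo" are made up for illustration, and 
the per-process file names demo.wfc1 and demo.wfc2 in the demo are an assumption 
about how a parallel run names its files. The only thing the check relies on is 
the _ph0/$prefix.q_N/$prefix.wfc layout described above.

```shell
#!/bin/sh
# check_wfc PH0_DIR PREFIX (hypothetical helper)
# For every q-point directory PH0_DIR/PREFIX.q_*, report whether the single
# PREFIX.wfc file that a serial ph.x job expects is present.
# Returns 0 if all qdirs have it, 1 otherwise.
check_wfc() {
    ph0=$1
    prefix=$2
    missing=0
    for qdir in "$ph0"/"$prefix".q_*; do
        [ -d "$qdir" ] || continue          # skip if the glob matched nothing
        if [ -f "$qdir/$prefix.wfc" ]; then
            echo "ok: $qdir"
        else
            echo "missing: $qdir/$prefix.wfc"
            missing=1
        fi
    done
    return $missing
}

# Demo on a throwaway directory tree (file names are illustrative only).
tmp=$(mktemp -d)
mkdir -p "$tmp/_ph0/demo.q_2" "$tmp/_ph0/demo.q_3"
touch "$tmp/_ph0/demo.q_2/demo.wfc"                              # serial-style file
touch "$tmp/_ph0/demo.q_3/demo.wfc1" "$tmp/_ph0/demo.q_3/demo.wfc2"  # assumed parallel-style files
check_wfc "$tmp/_ph0" demo || echo "at least one qdir lacks a single wfc file"
```

This does not fix anything by itself; it only tells which qdirs a serial job 
would fail on.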

Best wishes,
Antti

-- 
Dr. Antti Karttunen
Department of Chemistry
University of Jyväskylä, Finland
Tel: +358-50-3473475
WWW: http://www.iki.fi/ankarttu 


-----Original Message-----
From: pw_forum-bounces at pwscf.org On Behalf Of Andrea Dal Corso
Sent: Wednesday, February 27, 2013 12:13 PM
To: PWSCF Forum
Subject: Re: [Pw_forum] ph.x: Avoiding the recalculation of the band structure 
in distributed phonon dispersion jobs


On Wed, 2013-02-27 at 06:55 +0000, Karttunen Antti wrote:
> Dear Andrea,
> 
> Thank you very much for the bug fix and introducing the low_directory_check 
> input variable. Now the process goes very smoothly and we can avoid all the 
> unnecessary band calculations in the future.
> 
> I noticed that there is still some problem with the GRID_example 
> run_example_3: Looking at the reference output files, epsilon and bands are 
> actually recalculated at every q. I ran the example and it seems that there 
> is some problem with the management of the temporary directories. The example 
> actually runs nicely, if one completely omits the creation of the separate 
> $q.$irr directories and just runs with one single _ph0 directory with one 
> $prefix.phsave and all the qdirs. 
> 
> I also noticed that the run_example_3 always tries to keep the qdir of the 
> last q-point in the current temp directory:
>   cp -r $TMP_DIR/_ph0/$PREFIX.q_8 $TMP_DIR/$q.$irr/_ph0/
> I guess the reason for this is that without this, ph.x crashes for q<8 
> because seqopn fails for $prefix.q_8/recover? I encountered this with my own 
> tests, too. It seems that after the only_init run, CURRENT_Q in 
> status_run.xml is set to the last q-point and ph.x would then like to have 
> $prefix.q_8 directory around in the following (q,irr) calculations. I would 
> rather not move all qdirs into every (q,irr) _ph0 directory, so after the 
> only_init run I will reset CURRENT_Q to 1 in my scripts. For example, 
> something like
> 
> sed -r -i '/<CURRENT_Q/,/<\/CURRENT_Q/s/[[:digit:]]+[[:space:]]*$/1/' 
> _ph0/$prefix.phsave/status_run.xml
> 
> works nicely. Or maybe ph.x could reset CURRENT_Q to 1 at the end of a 
> successful only_init-run? But this might have some side effects I'm not aware 
> of, so I'm also fine with using the above script. Anyway, thanks a lot for 
> all the great work with the grid implementation, this will enormously speed 
> up our work on the phonon calculations of large systems.
> 
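
For older ph.x versions that do not reset current_q themselves, the sed command 
quoted above can be wrapped in a small function and tried on a throwaway file 
first. This is a minimal sketch: the function name reset_current_q is made up, 
and the three-line XML fragment is only an assumed example of how the CURRENT_Q 
element might be laid out, not a copy of a real status_run.xml. The -r and -i 
options require GNU sed.

```shell
#!/bin/sh
# reset_current_q FILE (hypothetical helper)
# Between the opening and closing CURRENT_Q tags, replace a trailing integer
# (plus optional trailing whitespace) with 1. Lines outside that range are
# left untouched.
reset_current_q() {
    sed -r -i '/<CURRENT_Q/,/<\/CURRENT_Q/s/[[:digit:]]+[[:space:]]*$/1/' "$1"
}

# Demo on a temporary file with an assumed layout of the element.
demo=$(mktemp)
cat > "$demo" <<'EOF'
<CURRENT_Q type="integer" size="1">
           8
</CURRENT_Q>
EOF
reset_current_q "$demo"
cat "$demo"
```

In a real run the argument would be _ph0/$prefix.phsave/status_run.xml, as in 
the command quoted above.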
The script still had some problems; now it should be OK. The reset of
current_q is also fine, and I have now committed the change.

The reason for having different directories $q.$irr is that the GRID
example should also work on different machines that do not share the
same disk, but it is not necessary to use them when you work with many
CPUs that share the same disk.

Andrea



