Dear all,
I'm doing some test runs on bulk silicon with QE 4.0.5. I need to calculate
the eigenvalues on dense k-point grids,
to test the convergence of some properties. So I did simple scf + nscf runs
with increasing (automatic) k-point grids
(from 8 8 8 1 1 1 to 24 24 24 1 1 1). The calculation is run in parallel
on 8 CPUs.
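For reference, the runs are launched more or less like this (mpirun and the
output file names are of course just what I use on my machine):

# scf run, then nscf run on the dense grid, on 8 CPUs
mpirun -np 8 pw.x < Si.scf.in  > Si.scf.out
mpirun -np 8 pw.x < Si.nscf.in > Si.nscf.out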
All the grids with more than 804 k-points show a strange behaviour, namely,
once the 804th k-point is reached
(i.e. I see the "Computing kpt #: 804" line in the output), the calculation
keeps running but the output files are no longer updated.
Has anybody experienced this kind of problem?
Just to help (me and you) understand what is going on, I did some debugging:
i) the problem disappears if I use -ndiag 1
(note that, if -ndiag is not specified, the parallel algorithm is used by
default; the output header then contains

Iterative solution of the eigenvalue problem
a parallel distributed memory algorithm will be used,
eigenstates matrixes will be distributed block like on
ortho sub-group = 2* 2 procs

). Of course, I don't know whether in this case (that is, with -ndiag 1) the
problem could still appear with much denser grids (e.g. 64 64 64 1 1 1), but
I would say not (the exact command line I use is shown after point iv).
ii) the problem might be a memory (allocation/deallocation?) issue, because
(see also point iv), when the parallel algorithm is used (which is what the
code chooses by default), the problem also disappears if I decrease the
cutoff from 30 Ry to 15 Ry
iii) the code hangs (in the sense that it keeps running without doing
anything) in SUBROUTINE pcegterg (PW/cegterg.f90), at the call

CALL zsqmred( nbase, vl, desc_old( nlax_ ), desc_old, nbase+notcnv, hl, nx, desc )

(meaning that, for the 804th k-point, the line immediately before this call
is executed, but not the one immediately after). Of course, all this provided
I did my debugging correctly.
iv) with QE 4.1.2 the code runs about 25% slower and stops at exactly the
same k-point, but in this case it does not keep running: it aborts with a
segmentation fault error.
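To be explicit about the workaround in point i, this is how I launch the nscf
run with serial diagonalization (again, mpirun and the file names are just
what I use on my machine):

# same nscf run, but with the distributed-memory diagonalization disabled
mpirun -np 8 pw.x -ndiag 1 < Si.nscf.in > Si.nscf.out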
Another issue I would like to point out is that, in the cases where the
calculation finishes correctly, the run with -ndiag 1 is much
faster (I'm not sure, but maybe half the time), so it might be better to set
up the code so that, in these "not-expensive" cases, the
parallel diagonalization is disabled by default.
Giovanni
PS these are my input files:
>>>>>>>>>>>>>>>> Si.scf.in
&CONTROL
calculation = 'scf'
title = 'Si'
restart_mode = 'from_scratch'
outdir = '/scratch/cantele/prova'
prefix = 'Si'
pseudo_dir = '/home/nm_settings/software/CODES/Quantum-ESPRESSO/pseudo'
wf_collect = .true.
verbosity = 'high'
/
&SYSTEM
ibrav = 2
celldm(1) = 10.20927
nat = 2
ntyp = 1
ecutwfc = 30.0
/
&ELECTRONS
conv_thr = 1.0d-8
mixing_beta = 0.7
/
ATOMIC_SPECIES
Si 28.0855 Si.pz-vbc.UPF
ATOMIC_POSITIONS { alat }
Si 0.00 0.00 0.00
Si 0.25 0.25 0.25
K_POINTS { automatic }
8 8 8 0 0 0
>>>>>>>>>>>>>>>> Si.nscf.in
&CONTROL
calculation = 'nscf'
title = 'Si'
restart_mode = 'from_scratch'
outdir = '/scratch/cantele/prova'
prefix = 'Si'
pseudo_dir = '/home/nm_settings/software/CODES/Quantum-ESPRESSO/pseudo'
wf_collect = .true.
verbosity = 'high'
/
&SYSTEM
ibrav = 2
celldm(1) = 10.20927
nat = 2
ntyp = 1
ecutwfc = 30.0
nbnd = 60
/
&ELECTRONS
diago_full_acc = .true.
diago_thr_init = 1.0d-6
/
ATOMIC_SPECIES
Si 28.0855 Si.pz-vbc.UPF
ATOMIC_POSITIONS { alat }
Si 0.00 0.00 0.00
Si 0.25 0.25 0.25
K_POINTS { automatic }
24 24 24 1 1 1
--
Dr. Giovanni Cantele
Coherentia CNR-INFM and Dipartimento di Scienze Fisiche
Universita' di Napoli "Federico II"
Complesso Universitario di Monte S. Angelo - Ed. 6
Via Cintia, I-80126, Napoli, Italy
Phone: +39 081 676910
Fax: +39 081 676346
E-mail: giovanni.cantele at cnr.it
giovanni.cantele at na.infn.it
Web: http://people.na.infn.it/~cantele
Research Group: http://www.nanomat.unina.it
Skype contact: giocan74