The bug was just fixed. If you have time, please try the latest svn version and report back if it doesn't work as expected.
Paolo

On Wed, Oct 21, 2015 at 3:09 PM, Kane O'Donnell <[email protected]> wrote:

> Works with 5.2.0 (Cray PrgEnv-gnu, fortran 4.9, Cray libsci) with
> -ndiag > 1, -ntg > 1. Thanks! Lesson learned…
>
> Kane
>
> On 21 Oct 2015, at 17:27, Paolo Giannozzi <[email protected]> wrote:
>
> Unless you need new developments that are available in the svn version
> only, please try if it works with the 5.2.0 version. We just found a
> problem (also affecting v.5.2.1) with "task groups" that may lead to
> strange crashes.
>
> Paolo
>
> On Wed, Oct 21, 2015 at 11:04 AM, Kane O'Donnell <[email protected]>
> wrote:
>
>> Hi all,
>>
>> Wondering if I can get some help trying to diagnose a crash. I’m running
>> the SVN latest on a Cray XC40 (Magnus -
>> https://www.pawsey.org.au/our-systems/magnus-technical-specifications/).
>> Usually no problems, but I have difficulties getting the attached slab
>> calculation to run past the first few davidson diagonalizations. It’s a 3x3
>> c-oriented slab of Na3Bi, the minimum I can use to capture a certain
>> adsorbate reconstruction (this is just the bare slab). It’s only a 72 atom
>> cell but Bi has a lot of electrons (I think there’s about ~800 electrons
>> and ~900 bands). I have spin-orbit coupling switched on (important for this
>> solid), and I have been able to do calculations on the smaller unit cell
>> using the library pseudopotentials listed in the species block.
>> Calculations on systems of this size (e.g. O(1000) electrons, bands) are
>> routine on Magnus, so I think I’m probably just doing something stupid but
>> can’t seem to figure it out.
>>
>> Typical run conditions are with 384 processors (16 nodes, 24 cores), with
>> -nk 3 -ndiag 100 -ntg 8. Moving down to ~12 nodes leads to an out-of-memory
>> crash just as the code reports it is allocating random wf’s at the
>> beginning. From 16 nodes and upwards, the crashes happen around the
>> diagonalization step. Switching to CG works but the slowdown is
>> astronomical (~10000 seconds per SCF step, not feasible for a relaxation).
>> A typical output is attached. With -ndiag > 1, the error is “problems
>> computing cholesky”, with -ndiag = 1, the error is “S matrix not positive
>> definite”, both from cdiaghg. A search of the forums suggests this issue
>> comes up every now and then on wildly different systems and is usually
>> blamed on the user/compiler/lapack/blas/scalapack. So, details: QE was
>> compiled by me on Magnus with PrgEnv-gnu (fortran 4.9.0) against the Cray
>> libsci (includes fftw, scalapack, etc), with:
>>
>> ./configure --enable-parallel --with-scalapack=yes FC=ftn CC=cc
>>
>> and all tests are passed with no problems.
>>
>> Any ideas? Let me know if there is any further information necessary.
>>
>> Best regards,
>>
>> Kane
>>
>> *Kane O'Donnell*
>> *Postdoctoral Research Fellow | Department of Physics, Astronomy and
>> Medical Radiation Science*
>>
>> *Curtin University*
>> *Tel |* +61 8 9266 1381
>> *Fax |* +61 8 9266 2377
>>
>> *Email |* [email protected]
>>
>> Curtin University is a trademark of Curtin University of Technology
>> CRICOS Provider Code 00301J
>>
>> &control
>>   calculation = 'relax',
>>   title = '',
>>   outdir = './',
>>   prefix = 'Na3Bi_331',
>>   pseudo_dir = '/group/partner1197/kodonnell/qe_pseudos/PSEUDOPOTENTIALS/',
>>   wf_collect = .true.
>> /
>> &system
>>   ibrav = 0,
>>   nat = 72,
>>   ntyp = 2,
>>   nbnd = 896,
>>   ecutwfc = 50,
>>   !ecutrho = 280,
>>   !tot_charge=+1.0,
>>   occupations = 'smearing',
>>   smearing = 'mv',
>>   degauss = 0.0073,
>>   lspinorb = .true.,
>>   noncolin = .true.,
>>   starting_magnetization(1) = 0.0,
>>   starting_magnetization(2) = 0.0
>> /
>> &electrons
>>   conv_thr = 1.0D-7
>> /
>> &ions
>> /
>> ATOMIC_SPECIES
>> Bi 1.0 Bi.rel-pbe-dn-kjpaw_psl.0.2.2.UPF
>> Na 1.0 Na.rel-pbe-spn-kjpaw_psl.0.2.UPF
>> CELL_PARAMETERS angstrom
>> 16.344 0 0
>> -8.172 14.1543 0
>> 1.77359e-15 3.07196e-15 28.965
>> K_POINTS automatic
>> 2 2 1 0 0 0
>> ATOMIC_POSITIONS angstrom
>> Bi 0 3.146 2.414 0 0 0
>> Na 2.724 4.718 2.414 0 0 0
>> Na 0 3.146 5.629
>> Bi 2.724 1.573 7.241
>> Na 2.724 4.718 7.241
>> Na 2.724 1.573 0.801 0 0 0
>> Na 2.724 1.573 4.026
>> Na 0 3.146 8.854
>> Bi -2.724 7.864 2.414 0 0 0
>> Na 0 9.436 2.414 0 0 0
>> Na -2.724 7.864 5.629
>> Bi 0 6.291 7.241
>> Na 0 9.436 7.241
>> Na 0 6.291 0.801 0 0 0
>> Na 0 6.291 4.026
>> Na -2.724 7.864 8.854
>> Bi -5.448 12.582 2.414 0 0 0
>> Na -2.724 14.154 2.414 0 0 0
>> Na -5.448 12.582 5.629
>> Bi -2.724 11.009 7.241
>> Na -2.724 14.154 7.241
>> Na -2.724 11.009 0.801 0 0 0
>> Na -2.724 11.009 4.026
>> Na -5.448 12.582 8.854
>> Bi 5.448 3.146 2.414 0 0 0
>> Na 8.172 4.718 2.414 0 0 0
>> Na 5.448 3.146 5.629
>> Bi 8.172 1.573 7.241
>> Na 8.172 4.718 7.241
>> Na 8.172 1.573 0.801 0 0 0
>> Na 8.172 1.573 4.026
>> Na 5.448 3.146 8.854
>> Bi 2.724 7.864 2.414 0 0 0
>> Na 5.448 9.436 2.414 0 0 0
>> Na 2.724 7.864 5.629
>> Bi 5.448 6.291 7.241
>> Na 5.448 9.436 7.241
>> Na 5.448 6.291 0.801 0 0 0
>> Na 5.448 6.291 4.026
>> Na 2.724 7.864 8.854
>> Bi 0 12.582 2.414 0 0 0
>> Na 2.724 14.154 2.414 0 0 0
>> Na 0 12.582 5.629
>> Bi 2.724 11.009 7.241
>> Na 2.724 14.154 7.241
>> Na 2.724 11.009 0.801 0 0 0
>> Na 2.724 11.009 4.026
>> Na 0 12.582 8.854
>> Bi 10.896 3.146 2.414 0 0 0
>> Na 13.62 4.718 2.414 0 0 0
>> Na 10.896 3.146 5.629
>> Bi 13.62 1.573 7.241
>> Na 13.62 4.718 7.241
>> Na 13.62 1.573 0.801 0 0 0
>> Na 13.62 1.573 4.026
>> Na 10.896 3.146 8.854
>> Bi 8.172 7.864 2.414 0 0 0
>> Na 10.896 9.436 2.414 0 0 0
>> Na 8.172 7.864 5.629
>> Bi 10.896 6.291 7.241
>> Na 10.896 9.436 7.241
>> Na 10.896 6.291 0.801 0 0 0
>> Na 10.896 6.291 4.026
>> Na 8.172 7.864 8.854
>> Bi 5.448 12.582 2.414 0 0 0
>> Na 8.172 14.154 2.414 0 0 0
>> Na 5.448 12.582 5.629
>> Bi 8.172 11.009 7.241
>> Na 8.172 14.154 7.241
>> Na 8.172 11.009 0.801 0 0 0
>> Na 8.172 11.009 4.026
>> Na 5.448 12.582 8.854
>>
>> _______________________________________________
>> Pw_forum mailing list
>> [email protected]
>> http://pwscf.org/mailman/listinfo/pw_forum
>
> --
> Paolo Giannozzi, Dept. Chemistry&Physics&Environment,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222

--
Paolo Giannozzi, Dept. Chemistry&Physics&Environment,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222
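
For anyone checking a similar setup: the run conditions quoted in the thread (384 processors with -nk 3 -ndiag 100 -ntg 8) can be sanity-checked with a little arithmetic before submitting a job. The sketch below is only an illustration, not part of Quantum ESPRESSO. The function name `check_layout` is hypothetical, and the three rules it applies are assumptions based on my reading of the user guide: the processor count should be divisible by the number of pools, -ndiag should be a perfect square no larger than the processors in one pool, and -ntg should divide the processors in one pool. Passing these checks says nothing about the task-group bug discussed in the thread.

```python
import math


def check_layout(nproc, nk, ndiag, ntg):
    """Return a list of warnings for a pw.x-style parallel layout.

    Hypothetical helper: the rules below are assumptions taken from the
    user guide as I understand it, not checks performed by the code itself.
    """
    problems = []

    # Pools (-nk): processors are split evenly among k-point pools.
    if nproc % nk != 0:
        problems.append(f"{nproc} processors not divisible by {nk} pools")
        return problems
    per_pool = nproc // nk

    # Linear algebra group (-ndiag): assumed to be a square grid n x n.
    side = math.isqrt(ndiag)
    if side * side != ndiag:
        problems.append(f"-ndiag {ndiag} is not a perfect square")
    if ndiag > per_pool:
        problems.append(f"-ndiag {ndiag} exceeds {per_pool} processors per pool")

    # Task groups (-ntg): assumed to divide the processors in one pool.
    if per_pool % ntg != 0:
        problems.append(f"-ntg {ntg} does not divide {per_pool} processors per pool")

    return problems


# Layout quoted in the thread: 16 nodes x 24 cores = 384 processors.
print(check_layout(16 * 24, nk=3, ndiag=100, ntg=8))
```

With the values from the thread this gives 128 processors per pool, a 10 x 10 diagonalization grid, and 8 task groups that divide 128 evenly, so the list of warnings is empty. In other words the layout itself looks consistent under these assumptions, which fits the diagnosis that the crash came from the code rather than from the command line.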
