I'm hoping to get some help diagnosing a crash. I'm running the latest SVN version on a Cray XC40 (Magnus - https://www.pawsey.org.au/our-systems/magnus-technical-specifications/). I usually have no problems, but I can't get the attached slab calculation to run past the first few Davidson diagonalizations.

The system is a 3x3 c-oriented slab of Na3Bi, the minimum I can use to capture a certain adsorbate reconstruction (this is just the bare slab). It's only a 72-atom cell, but Bi has a lot of electrons (I think there are about 800 electrons and about 900 bands). I have spin-orbit coupling switched on (important for this solid), and I have been able to do calculations on the smaller unit cell using the library pseudopotentials listed in the species block. Calculations on systems of this size (e.g. O(1000) electrons and bands) are routine on Magnus, so I'm probably just doing something stupid, but I can't figure out what.

Typical run conditions are 384 processors (16 nodes x 24 cores) with -nk 3 -ndiag 100 -ntg 8. Moving down to ~12 nodes leads to an out-of-memory crash just as the code reports that it is allocating random wf's at the beginning. From 16 nodes upwards, the crashes happen around the diagonalization step. Switching to CG works, but the slowdown is astronomical (~10000 seconds per SCF step, not feasible for a relaxation). A typical output is attached.

With -ndiag > 1 the error is "problems computing cholesky"; with -ndiag = 1 the error is "S matrix not positive definite". Both come from cdiaghg. A search of the forum suggests this issue comes up every now and then on wildly different systems and is usually blamed on the user, the compiler, or LAPACK/BLAS/ScaLAPACK.

So, details: I compiled QE myself on Magnus with PrgEnv-gnu (Fortran 4.9.0) against the Cray libsci (includes FFTW, ScaLAPACK, etc.), with:

./configure --enable-parallel --with-scalapack=yes FC=ftn CC=cc

All tests pass with no problems. Any ideas?
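As a sanity check on the run configuration quoted above, the following Python sketch works through the processor arithmetic implied by 384 processors with -nk 3 -ndiag 100 -ntg 8. The function name describe_layout is hypothetical, and the rules it checks (pools divide the processor count evenly, the diagonalization group is a square grid that fits inside one pool, task groups divide the pool evenly) are one reading of the pw.x parallelization levels rather than anything taken from the code, so they should be treated as assumptions.

```python
import math


def describe_layout(nproc, nk, ndiag, ntg):
    """Summarise an assumed pw.x processor layout (illustrative helper only)."""
    if nproc % nk != 0:
        raise ValueError("number of pools (-nk) must divide the processor count")
    per_pool = nproc // nk
    # Assumption: the diagonalization group is the largest square grid
    # with no more than the requested number of processors.
    side = math.isqrt(ndiag)
    ntg_ok = per_pool % ntg == 0
    return {
        "procs_per_pool": per_pool,
        "diag_grid": (side, side),
        "diag_fits_in_pool": side * side <= per_pool,
        "ntg_divides_pool": ntg_ok,
        "procs_per_task_group": per_pool // ntg if ntg_ok else None,
    }


# Values quoted in the post: 16 nodes x 24 cores, -nk 3 -ndiag 100 -ntg 8.
layout = describe_layout(nproc=16 * 24, nk=3, ndiag=100, ntg=8)
for key, value in layout.items():
    print(f"{key}: {value}")
```

Under these assumptions the reported settings give 128 processors per pool, a 10 x 10 diagonalization grid, and 16 processors per task group, so the layout itself looks internally consistent.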
Let me know if any further information is needed.

Best regards,
Kane

Kane O'Donnell
Curtin University is a trademark of Curtin University of Technology
CRICOS Provider Code 00301J

&control
  calculation = 'relax',
  title = '',
  outdir = './',
  prefix = 'Na3Bi_331',
  pseudo_dir = '/group/partner1197/kodonnell/qe_pseudos/PSEUDOPOTENTIALS/',
  wf_collect = .true.
/
&system
  ibrav = 0,
  nat = 72,
  ntyp = 2,
  nbnd = 896,
  ecutwfc = 50,
  !ecutrho = 280,
  !tot_charge=+1.0,
  occupations = 'smearing',
  smearing = 'mv',
  degauss = 0.0073,
  lspinorb = .true.,
  noncolin = .true.,
  starting_magnetization(1) = 0.0,
  starting_magnetization(2) = 0.0
/
&electrons
  conv_thr = 1.0D-7
/
&ions
/
ATOMIC_SPECIES
Bi 1.0 Bi.rel-pbe-dn-kjpaw_psl.0.2.2.UPF
Na 1.0 Na.rel-pbe-spn-kjpaw_psl.0.2.UPF
CELL_PARAMETERS angstrom
16.344 0 0
-8.172 14.1543 0
1.77359e-15 3.07196e-15 28.965
K_POINTS automatic
2 2 1 0 0 0
ATOMIC_POSITIONS angstrom
Bi 0 3.146 2.414 0 0 0
Na 2.724 4.718 2.414 0 0 0
Na 0 3.146 5.629
Bi 2.724 1.573 7.241
Na 2.724 4.718 7.241
Na 2.724 1.573 0.801 0 0 0
Na 2.724 1.573 4.026
Na 0 3.146 8.854
Bi -2.724 7.864 2.414 0 0 0
Na 0 9.436 2.414 0 0 0
Na -2.724 7.864 5.629
Bi 0 6.291 7.241
Na 0 9.436 7.241
Na 0 6.291 0.801 0 0 0
Na 0 6.291 4.026
Na -2.724 7.864 8.854
Bi -5.448 12.582 2.414 0 0 0
Na -2.724 14.154 2.414 0 0 0
Na -5.448 12.582 5.629
Bi -2.724 11.009 7.241
Na -2.724 14.154 7.241
Na -2.724 11.009 0.801 0 0 0
Na -2.724 11.009 4.026
Na -5.448 12.582 8.854
Bi 5.448 3.146 2.414 0 0 0
Na 8.172 4.718 2.414 0 0 0
Na 5.448 3.146 5.629
Bi 8.172 1.573 7.241
Na 8.172 4.718 7.241
Na 8.172 1.573 0.801 0 0 0
Na 8.172 1.573 4.026
Na 5.448 3.146 8.854
Bi 2.724 7.864 2.414 0 0 0
Na 5.448 9.436 2.414 0 0 0
Na 2.724 7.864 5.629
Bi 5.448 6.291 7.241
Na 5.448 9.436 7.241
Na 5.448 6.291 0.801 0 0 0
Na 5.448 6.291 4.026
Na 2.724 7.864 8.854
Bi 0 12.582 2.414 0 0 0
Na 2.724 14.154 2.414 0 0 0
Na 0 12.582 5.629
Bi 2.724 11.009 7.241
Na 2.724 14.154 7.241
Na 2.724 11.009 0.801 0 0 0
Na 2.724 11.009 4.026
Na 0 12.582 8.854
Bi 10.896 3.146 2.414 0 0 0
Na 13.62 4.718 2.414 0 0 0
Na 10.896 3.146 5.629
Bi 13.62 1.573 7.241
Na 13.62 4.718 7.241
Na 13.62 1.573 0.801 0 0 0
Na 13.62 1.573 4.026
Na 10.896 3.146 8.854
Bi 8.172 7.864 2.414 0 0 0
Na 10.896 9.436 2.414 0 0 0
Na 8.172 7.864 5.629
Bi 10.896 6.291 7.241
Na 10.896 9.436 7.241
Na 10.896 6.291 0.801 0 0 0
Na 10.896 6.291 4.026
Na 8.172 7.864 8.854
Bi 5.448 12.582 2.414 0 0 0
Na 8.172 14.154 2.414 0 0 0
Na 5.448 12.582 5.629
Bi 8.172 11.009 7.241
Na 8.172 14.154 7.241
Na 8.172 11.009 0.801 0 0 0
Na 8.172 11.009 4.026
Na 5.448 12.582 8.854
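
The electron and band counts mentioned in the message can be roughly cross-checked against this input. The sketch below assumes 15 valence electrons for the Bi "dn" pseudopotential and 9 for the Na "spn" one; these values are assumptions, not stated in the post, and should be checked against the z_valence entry in each UPF header. It also assumes that with noncolin = .true. each band holds at most one electron.

```python
# Atom counts read off ATOMIC_POSITIONS: the 3x3 slab repeats a block of
# 2 Bi + 6 Na atoms nine times (72 atoms in total).
n_cells = 9
atoms_per_cell = {"Bi": 2, "Na": 6}

# ASSUMED valence charges (verify against z_valence in the UPF files).
z_valence = {"Bi": 15, "Na": 9}

nbnd = 896  # from the &system namelist

n_atoms = {species: n_cells * n for species, n in atoms_per_cell.items()}
total_atoms = sum(n_atoms.values())
n_electrons = sum(n_atoms[species] * z_valence[species] for species in n_atoms)

# Assumption: one electron per band in a noncollinear calculation,
# so the surplus of bands over electrons is the number of empty bands.
empty_fraction = nbnd / n_electrons - 1.0

print(f"atoms: {total_atoms}")
print(f"valence electrons (assumed): {n_electrons}")
print(f"empty bands relative to electrons: {empty_fraction:.1%}")
```

Under these assumptions the slab has 756 valence electrons, so nbnd = 896 leaves roughly 18.5% of the bands empty, in line with the "about 800 electrons, about 900 bands" estimate in the message.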
Na3Bi_331.relax.out
Description: Binary data
_______________________________________________
Pw_forum mailing list
[email protected]
http://pwscf.org/mailman/listinfo/pw_forum
