The bug has just been fixed. If you have time, please try the latest svn
version and report back if it doesn't work as expected.

Paolo
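
As a side note on the run conditions quoted below (384 processes with
-nk 3 -ndiag 100 -ntg 8), here is a minimal Python sketch of the process
layout those flags imply. The function name describe_layout and the three
checks are illustrative assumptions, not code from Quantum ESPRESSO: it
assumes the process count must divide evenly into pools, that the task-group
count divides the pool size, and that -ndiag is a perfect square no larger
than the pool size. The real code may handle these cases differently (for
example by adjusting -ndiag rather than rejecting it), so check the user
guide for your version.

```python
import math


def describe_layout(nproc, nk, ndiag, ntg):
    """Return the process layout implied by the pw.x parallelization flags.

    Hypothetical helper for illustration only; the divisibility rules below
    are simplifying assumptions, not the checks pw.x itself performs.
    """
    if nproc % nk != 0:
        raise ValueError("nproc must be divisible by the number of pools (-nk)")
    per_pool = nproc // nk

    if per_pool % ntg != 0:
        raise ValueError("processes per pool must be divisible by -ntg")

    # Assumption: the linear-algebra group is a square grid of processes.
    side = math.isqrt(ndiag)
    if side * side != ndiag:
        raise ValueError("-ndiag is assumed to be a perfect square")
    if ndiag > per_pool:
        raise ValueError("-ndiag cannot exceed the processes per pool")

    return {
        "procs_per_pool": per_pool,
        "procs_per_task_group": per_pool // ntg,
        "diag_grid": (side, side),
    }


# Run conditions from the original report: 16 nodes x 24 cores = 384.
layout = describe_layout(nproc=384, nk=3, ndiag=100, ntg=8)
print(layout)
# {'procs_per_pool': 128, 'procs_per_task_group': 16, 'diag_grid': (10, 10)}
```

Under these assumptions the reported settings are self-consistent: 128
processes per pool, task groups of 16, and a 10x10 diagonalization grid.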

On Wed, Oct 21, 2015 at 3:09 PM, Kane O'Donnell <[email protected]>
wrote:

>
> Works with 5.2.0 (Cray PrgEnv-gnu, fortran 4.9, Cray libsci) with
> -ndiag > 1, -ntg > 1. Thanks! Lesson learned…
>
> Kane
>
>
> On 21 Oct 2015, at 17:27, Paolo Giannozzi <[email protected]> wrote:
>
> Unless you need new developments that are available only in the svn
> version, please check whether it works with the 5.2.0 version. We just
> found a problem (also affecting v.5.2.1) with "task groups" that may lead
> to strange crashes.
>
> Paolo
>
> On Wed, Oct 21, 2015 at 11:04 AM, Kane O'Donnell <[email protected]>
>  wrote:
>
>>
>> Hi all,
>>
>> Wondering if I can get some help trying to diagnose a crash. I’m running
>> the SVN latest on a Cray XC40 (Magnus -
>> https://www.pawsey.org.au/our-systems/magnus-technical-specifications/).
>> Usually no problems, but I have difficulties getting the attached slab
>> calculation to run past the first few Davidson diagonalizations. It’s a 3x3
>> c-oriented slab of Na3Bi, the minimum I can use to capture a certain
>> adsorbate reconstruction (this is just the bare slab). It’s only a 72-atom
>> cell but Bi has a lot of electrons (I think there are about 800 electrons
>> and ~900 bands). I have spin-orbit coupling switched on (important for this
>> solid), and I have been able to do calculations on the smaller unit cell
>> using the library pseudopotentials listed in the species block.
>> Calculations on systems of this size (e.g. O(1000) electrons, bands) are
>> routine on Magnus, so I think I’m probably just doing something stupid but
>> can’t seem to figure it out.
>>
>> Typical run conditions are with 384 processors (16 nodes, 24 cores each),
>> with -nk 3 -ndiag 100 -ntg 8. Moving down to ~12 nodes leads to an
>> out-of-memory crash just as the code reports it is allocating random wf’s
>> at the beginning. From 16 nodes and upwards, the crashes happen around the
>> diagonalization step. Switching to CG works but the slowdown is
>> astronomical (~10000 seconds per SCF step, not feasible for a relaxation).
>> A typical output is attached. With -ndiag > 1, the error is “problems
>> computing cholesky”; with -ndiag = 1, the error is “S matrix not positive
>> definite”, both from cdiaghg. A search of the forums suggests this issue
>> comes up every now and then on wildly different systems and is usually
>> blamed on the user/compiler/lapack/blas/scalapack. So, details: QE was
>> compiled by me on Magnus with PrgEnv-gnu (fortran 4.9.0) against the Cray
>> libsci (includes fftw, scalapack, etc), with:
>>
>> ./configure --enable-parallel --with-scalapack=yes FC=ftn CC=cc
>>
>> and all tests pass with no problems.
>>
>> Any ideas? Let me know if there is any further information necessary.
>>
>> Best regards,
>>
>> Kane
>>
>> *Kane O'Donnell*
>> *Postdoctoral Research Fellow | Department of Physics, Astronomy and
>> Medical Radiation Science*
>>
>> *Curtin University*
>> *Tel |* +61 8 9266 1381
>> *Fax |* +61 8 9266 2377
>>
>> *Email |* [email protected]
>>
>>
>>
>>
>> Curtin University is a trademark of Curtin University of Technology
>> CRICOS Provider Code 00301J
>>
>> &control
>>   calculation = 'relax',
>>   title = '',
>>   outdir = './',
>>   prefix = 'Na3Bi_331',
>>   pseudo_dir =
>> '/group/partner1197/kodonnell/qe_pseudos/PSEUDOPOTENTIALS/',
>>   wf_collect = .true.
>> /
>> &system
>>   ibrav = 0,
>>   nat = 72,
>>   ntyp = 2,
>>   nbnd = 896,
>>   ecutwfc = 50,
>>   !ecutrho = 280,
>>   !tot_charge=+1.0,
>>   occupations = 'smearing',
>>   smearing = 'mv',
>>   degauss = 0.0073,
>>   lspinorb = .true.,
>>   noncolin = .true.,
>>   starting_magnetization(1) = 0.0,
>>   starting_magnetization(2) = 0.0
>> /
>> &electrons
>>   conv_thr = 1.0D-7
>> /
>> &ions
>> /
>> ATOMIC_SPECIES
>>   Bi 1.0 Bi.rel-pbe-dn-kjpaw_psl.0.2.2.UPF
>>   Na 1.0 Na.rel-pbe-spn-kjpaw_psl.0.2.UPF
>> CELL_PARAMETERS angstrom
>>   16.344 0 0
>>   -8.172 14.1543 0
>>   1.77359e-15 3.07196e-15 28.965
>> K_POINTS automatic
>>   2 2 1 0 0 0
>> ATOMIC_POSITIONS angstrom
>> Bi    0    3.146    2.414    0    0    0
>> Na    2.724    4.718    2.414    0    0    0
>> Na    0    3.146    5.629
>> Bi    2.724    1.573    7.241
>> Na    2.724    4.718    7.241
>> Na    2.724    1.573    0.801    0    0    0
>> Na    2.724    1.573    4.026
>> Na    0    3.146    8.854
>> Bi    -2.724    7.864    2.414    0    0    0
>> Na    0    9.436    2.414    0    0    0
>> Na    -2.724    7.864    5.629
>> Bi    0    6.291    7.241
>> Na    0    9.436    7.241
>> Na    0    6.291    0.801    0    0    0
>> Na    0    6.291    4.026
>> Na    -2.724    7.864    8.854
>> Bi    -5.448    12.582    2.414    0    0    0
>> Na    -2.724    14.154    2.414    0    0    0
>> Na    -5.448    12.582    5.629
>> Bi    -2.724    11.009    7.241
>> Na    -2.724    14.154    7.241
>> Na    -2.724    11.009    0.801    0    0    0
>> Na    -2.724    11.009    4.026
>> Na    -5.448    12.582    8.854
>> Bi    5.448    3.146    2.414    0    0    0
>> Na    8.172    4.718    2.414    0    0    0
>> Na    5.448    3.146    5.629
>> Bi    8.172    1.573    7.241
>> Na    8.172    4.718    7.241
>> Na    8.172    1.573    0.801    0    0    0
>> Na    8.172    1.573    4.026
>> Na    5.448    3.146    8.854
>> Bi    2.724    7.864    2.414    0    0    0
>> Na    5.448    9.436    2.414    0    0    0
>> Na    2.724    7.864    5.629
>> Bi    5.448    6.291    7.241
>> Na    5.448    9.436    7.241
>> Na    5.448    6.291    0.801    0    0    0
>> Na    5.448    6.291    4.026
>> Na    2.724    7.864    8.854
>> Bi    0    12.582    2.414    0    0    0
>> Na    2.724    14.154    2.414    0    0    0
>> Na    0    12.582    5.629
>> Bi    2.724    11.009    7.241
>> Na    2.724    14.154    7.241
>> Na    2.724    11.009    0.801    0    0    0
>> Na    2.724    11.009    4.026
>> Na    0    12.582    8.854
>> Bi    10.896    3.146    2.414    0    0    0
>> Na    13.62    4.718    2.414    0    0    0
>> Na    10.896    3.146    5.629
>> Bi    13.62    1.573    7.241
>> Na    13.62    4.718    7.241
>> Na    13.62    1.573    0.801    0    0    0
>> Na    13.62    1.573    4.026
>> Na    10.896    3.146    8.854
>> Bi    8.172    7.864    2.414    0    0    0
>> Na    10.896    9.436    2.414    0    0    0
>> Na    8.172    7.864    5.629
>> Bi    10.896    6.291    7.241
>> Na    10.896    9.436    7.241
>> Na    10.896    6.291    0.801    0    0    0
>> Na    10.896    6.291    4.026
>> Na    8.172    7.864    8.854
>> Bi    5.448    12.582    2.414    0    0    0
>> Na    8.172    14.154    2.414    0    0    0
>> Na    5.448    12.582    5.629
>> Bi    8.172    11.009    7.241
>> Na    8.172    14.154    7.241
>> Na    8.172    11.009    0.801    0    0    0
>> Na    8.172    11.009    4.026
>> Na    5.448    12.582    8.854
>>
>>
>>
>>
>> _______________________________________________
>> Pw_forum mailing list
>> [email protected]
>> http://pwscf.org/mailman/listinfo/pw_forum
>>
>
>
>
> --
> Paolo Giannozzi, Dept. Chemistry&Physics&Environment,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222



-- 
Paolo Giannozzi, Dept. Chemistry&Physics&Environment,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222
