On Sat, 2014-03-01 at 14:27 -0600, Peng Chen wrote:

> And the system is not that large (32 atoms, 400 nband, 8*8*8 k-points),
> which is run on 128 cores. I think you are probably right that QE is
> trying to allocate a large array somehow.
... and?

> On Fri, Feb 28, 2014 at 10:35 AM, Paolo Giannozzi
> <paolo.giannozzi at uniud.it> wrote:
>
> > On Fri, 2014-02-28 at 09:12 -0600, Peng Chen wrote:
> >
> > > I think it is memory, because the error message is like:
> > > 02/27/2014 14:06:20| main|zeta27|W|job 221982 exceeds job hard
> > > limit "h_vmem" of queue (2871259136.00000 > limit:2147483648.00000)
> > > - sending SIGKILL
> >
> > There are a few hints on how to reduce memory usage to the strict
> > minimum here:
> > http://www.quantum-espresso.org/wp-content/uploads/Doc/pw_user_guide/node19.html#SECTION000600100000000000000
> > If the FFT grid is large, reduce mixing_ndim from its default value
> > (8) to 4 or so. If the number of bands is large, distribute nbnd*nbnd
> > matrices using "-ndiag". If you have many k-points, save to disk with
> > disk_io='medium'. The message you get, "2871259136 >
> > limit:2147483648", makes me think that you crash when trying to
> > allocate an array whose size is at least 2871259136-2147483648 = a
> > lot. It shouldn't be difficult to figure out where such a large
> > array comes from.
> >
> > Paolo
> >
> > > I normally used h_stack=128M; it is working fine.
> > >
> > > On Fri, Feb 28, 2014 at 7:30 AM, Paolo Giannozzi
> > > <paolo.giannozzi at uniud.it> wrote:
> > >
> > > > On Thu, 2014-02-27 at 17:30 -0600, Peng Chen wrote:
> > > >
> > > > > P.S. Most of the jobs failed at the beginning of the scf
> > > > > calculation, and the length of the output scf file is zero.
> > > >
> > > > Are you sure the problem is the size of the RAM and not the
> > > > size of the stack?
> > > >
> > > > P.
> > > >
> > > > > On Thu, Feb 27, 2014 at 5:09 PM, Peng Chen
> > > > > <pchen229 at illinois.edu> wrote:
> > > > >
> > > > > > Dear QE users,
> > > > > >
> > > > > > Recently, our workstation was updated and there is a hard
> > > > > > limit on memory (2 GB per core). Some QE jobs constantly
> > > > > > fail (though not always) because one of the MPI processes
> > > > > > exceeds the RAM limit and is killed. I am wondering if
> > > > > > there is a way to distribute memory usage more evenly
> > > > > > across the cores.

--
Paolo Giannozzi, Dept. Chemistry&Physics&Environment,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222
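[For readers hitting the same limit: the settings Paolo names go in the
pw.x input namelists and on the command line. A minimal sketch follows;
only the memory-relevant lines are shown, and the process counts, file
names, and the -ndiag value of 16 are illustrative, not taken from the
thread.]

  &CONTROL
    disk_io = 'medium'   ! keep wavefunctions on disk, one k-point at a
                         ! time, instead of all of them in RAM
  /
  &ELECTRONS
    mixing_ndim = 4      ! default is 8; fewer stored mixing vectors
                         ! saves memory when the FFT grid is large
  /

invoked, for example, as

  mpirun -np 128 pw.x -ndiag 16 -in scf.in > scf.out

Here -ndiag 16 distributes the nbnd*nbnd subspace matrices over a 4x4
grid of processes rather than replicating them on every core; pw.x uses
a square number of processes for this, so square values no larger than
the total MPI process count are the natural choice.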
