On Fri, Sep 23, 2016 at 5:48 PM, Francesco Pelizza <[email protected]> wrote:

> After 7 days, neb.x stops working: no closing/cleaning comments,
> nothing else, it just stops working.
>

Reproducibly, for all jobs, after exactly 7 days and never 9 or 4? Under
which conditions: serial, parallel, with image parallelization, ...?

A parallel code may hang if any two processes that should proceed in
parallel for some reason don't. This may be caused by a subtle buildup of
numerical differences, coupled with replicated checks, or by a process
dying for whatever reason.

Paolo

> It happens with several systems. It is not a problem of convergence of a
> particular image or anything else; the output file is just
> interrupted/truncated.
>
> I can easily restart the job anyway, so it's not an issue of losing CPU
> time, but when you queue for HPC it becomes a problem.
>
> Anyway, has somebody encountered this problem? Is it a known problem?
>
> Some feedback?
>
> Thank you
>
> Francesco Pelizza
>
> _______________________________________________
> Pw_forum mailing list
> [email protected]
> http://pwscf.org/mailman/listinfo/pw_forum
>

-- 
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222
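
To illustrate the "numerical differences coupled to replicated checks" case mentioned above, here is a minimal sketch in plain Python. It is only a simulation: it uses no MPI and no Quantum ESPRESSO code, and the function names, the tolerance, and the per-process error values are all hypothetical. It shows how processes that each test their own copy of the "same" quantity can reach different decisions, while taking the decision on one process and sending it to the others keeps them in step.

```python
# Illustrative simulation only: plain Python, no MPI, no Quantum ESPRESSO code.
# All names and numbers below are hypothetical.

def replicated_decisions(local_errors, tol):
    """Each process tests its own copy of the error, so decisions can differ."""
    return [err < tol for err in local_errors]


def broadcast_decision(local_errors, tol, root=0):
    """Only the root tests; every process adopts the root's decision."""
    decision = local_errors[root] < tol
    return [decision for _ in local_errors]


def would_hang(decisions):
    """A collective step hangs if some processes enter it and others do not."""
    return len(set(decisions)) > 1


tol = 1.0e-6
# Per-process copies of the "same" quantity, differing only by round-off.
local_errors = [0.9999999e-6, 1.0000001e-6, 0.9999999e-6, 1.0000001e-6]

replicated = replicated_decisions(local_errors, tol)
agreed = broadcast_decision(local_errors, tol)

print("replicated check:", replicated, "-> hang:", would_hang(replicated))
print("broadcast check: ", agreed, "-> hang:", would_hang(agreed))
```

In a real MPI code the second variant would correspond to evaluating the test on one process and broadcasting the result before any branch that leads to a collective call.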
