Rolly,

I assume you use some sort of script to submit or run your calculation. Do not 
run for 500K seconds: split the run into a sequence of short ones and keep 
max_seconds within 12h~24h (43200 to 86400 seconds). That way, if something 
goes wrong in one run of your long relaxation, you can always resume safely 
without having to recompute too much.
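
As a rough illustration of the splitting, the Python sketch below plans a 
chain of short runs. It assumes the pw.x input variables max_seconds and 
restart_mode in &CONTROL; the name plan_runs and the 500000 s total are only 
illustrative, and in practice the total time of a relaxation is not known in 
advance.

```python
import math


def plan_runs(estimated_total_s, max_seconds):
    """Plan a chain of short runs covering an estimated total run time.

    Returns one (restart_mode, max_seconds) pair per run: the first run
    starts from scratch, every later one resumes from the previous stop.
    """
    if max_seconds <= 0:
        raise ValueError("max_seconds must be positive")
    n_runs = max(1, math.ceil(estimated_total_s / max_seconds))
    return [("from_scratch" if i == 0 else "restart", max_seconds)
            for i in range(n_runs)]


if __name__ == "__main__":
    # Hypothetical example: 500000 s split into 12 h (43200 s) pieces.
    for i, (mode, secs) in enumerate(plan_runs(500_000, 12 * 3600), start=1):
        print(f"run {i}: restart_mode = '{mode}', max_seconds = {secs}")
```

With 12 h pieces the 500000 s estimate becomes 12 runs, with 24 h pieces 6 
runs, so a failure should cost roughly one piece rather than the whole job.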

This suggestion is driven by common sense, not by how QE or QE-GPU work.

HTH

--
Filippo SPIGA
* Sent from my iPhone, sorry for typos *

> On 16 Feb 2016, at 07:01, Rolly Ng <roll...@gmail.com> wrote:
> 
> Dear Paolo,
>  
> Thank you for the clarification, I will give it a try.
>  
> Regards,
> Rolly
>  
> PhD, Research Fellow,
> Department of Physics and Materials Science,
> City University of Hong Kong
> Tel: +852 3442 4000
> Fax:+852 3442 0538
>  
> From: pw_forum-boun...@pwscf.org [mailto:pw_forum-boun...@pwscf.org] On 
> Behalf Of Paolo Giannozzi
> Sent: Tuesday, February 16, 2016 2:51 PM
> To: PWSCF Forum
> Subject: Re: [Pw_forum] Geometry optimization on QE530-GPU with memory 
> allocation error?
>  
> You do not need to update the atomic coordinates: if you restart from a 
> previous run (after a clean stop), the code will read and use the latest 
> set of coordinates.
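
The automatic pickup described above applies to a clean stop. If a run dies 
mid-SCF instead, one possible fallback is to copy the last coordinates printed 
in the output into a fresh from_scratch input. A minimal Python sketch, 
assuming the relax output prints blocks headed by a line starting with 
ATOMIC_POSITIONS, one line per atom, ended by a blank line or an 
"End final coordinates" line; last_positions is a hypothetical helper, not 
part of QE.

```python
def last_positions(output_text):
    """Return the last ATOMIC_POSITIONS block found in a pw.x output.

    Assumed layout: a header line starting with ATOMIC_POSITIONS, then
    one line per atom, terminated by a blank line or a line starting
    with 'End' (as in 'End final coordinates').
    """
    lines = output_text.splitlines()
    starts = [i for i, line in enumerate(lines)
              if line.strip().startswith("ATOMIC_POSITIONS")]
    if not starts:
        raise ValueError("no ATOMIC_POSITIONS block found")
    block = [lines[starts[-1]].strip()]
    for line in lines[starts[-1] + 1:]:
        stripped = line.strip()
        if not stripped or stripped.startswith("End"):
            break
        block.append(stripped)
    return "\n".join(block)
```

The returned text can be pasted over the ATOMIC_POSITIONS card of the input; 
check that the units in the header match before reusing it.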
> 
> Paolo
>  
> On Tue, Feb 16, 2016 at 6:39 AM, Rolly Ng <roll...@gmail.com> wrote:
> Dear Filippo,
>  
> Thanks for the quick tip.
>  
> I would like to know the correct way to stop and restart a geometry 
> optimization.
>  
> 1) Initially, add max_seconds = 500000 to the &CONTROL section
> 
> 2) Add restart_mode = 'from_scratch' to the &CONTROL section
> 
> 3) Run pw-gpu.x and wait for the run to stop after 500000 seconds
> 
> 4) Change restart_mode to 'restart' in the &CONTROL section
> 
> 5) Rerun pw-gpu.x and wait for the run to stop after 500000 seconds
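
The edit in steps 2 and 4 can be scripted. A minimal Python sketch, assuming 
the input keeps one assignment per line inside &CONTROL and closes the 
namelist with a "/" on its own line; set_control_value is a hypothetical 
helper, not part of QE.

```python
import re


def set_control_value(input_text, name, value):
    """Set `name = value` inside the &CONTROL namelist of a pw.x input.

    An existing assignment is replaced; otherwise a new line is added
    right after the &CONTROL header. `value` is inserted verbatim, so
    strings must carry their own quotes (e.g. "'restart'").
    """
    block = re.search(r"(?ims)^[ \t]*&CONTROL\b.*?^[ \t]*/", input_text)
    if block is None:
        raise ValueError("no &CONTROL namelist found")
    body = block.group(0)
    assignment = re.compile(
        rf"(?im)^([ \t]*{re.escape(name)}[ \t]*=[ \t]*)[^,\n]*")
    if assignment.search(body):
        # Keep the left-hand side, swap only the value.
        new_body = assignment.sub(lambda m: m.group(1) + value, body, count=1)
    else:
        header_end = body.index("\n") + 1
        new_body = (body[:header_end] + f"    {name} = {value}\n"
                    + body[header_end:])
    return input_text[:block.start()] + new_body + input_text[block.end():]
```

For example, set_control_value(text, "restart_mode", "'restart'") turns the 
first input into the one used for every later run.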
> 
>  
> What I am not sure about is the atomic coordinates used when restarting the 
> calculation. Since I am doing a geometry optimization, the positions of the 
> atoms change. Do I need to update the input manually with the latest 
> coordinates at 500000 seconds? If so, how can I do that?
>  
> Thanks,
> Rolly
>  
> PhD, Research Fellow,
> Department of Physics and Materials Science,
> City University of Hong Kong
> Tel: +852 3442 4000
> Fax:+852 3442 0538
>  
> From: pw_forum-boun...@pwscf.org [mailto:pw_forum-boun...@pwscf.org] On 
> Behalf Of Filippo Spiga
> Sent: Tuesday, February 16, 2016 12:20 PM
> To: PWSCF Forum
> Subject: Re: [Pw_forum] Geometry optimization on QE530-GPU with memory 
> allocation error?
>  
> Dear Rolly,
>  
> sorry to hear about your problem, I imagine the frustration of losing so much 
> time and being unable to recover because of an error that happened in the 
> middle of an SCF step. It is hard to guess what went wrong at that point, 
> especially after the calculation ran continuously on multiple GPUs for almost 
> 7 days without stopping.
>  
> Just a consideration, valid with or without GPU: unless it is impossible, 
> _never_ run continuously for so long. It is a bad idea for multiple reasons. 
> Always safely checkpoint/restart your calculation more often.
>  
> Cheers
>  
> --
> Filippo SPIGA
> * Sent from my iPhone, sorry for typos *
> 
> On 16 Feb 2016, at 04:01, Rolly Ng <roll...@gmail.com> wrote:
> 
> Dear Filippo and QE-GPU users,
>  
> I am running a geometry optimization on a system of 128 atoms. It runs fine 
> until the time spent reaches 590,000 seconds, then it stops with the error 
> below and the job fails to complete :( I have hit this error 3 times, in 3 
> different cases.
>  
> “Error in memory allocation, program will be terminated (2) !!! Bye…”
>  
> I can confirm the error only appears after running for more than 560,000 
> seconds, so all the previous effort is wasted :( if I cannot restart the 
> optimization :(
>  
> I have not seen such a problem with QE520-GPU, or maybe my previous runs did 
> not last so long.
>  
> Could you please check my input file? Thank you!
>  
> &CONTROL
>     calculation = 'relax' ,
>     outdir = '/home/zgdeng/Rolly/TiNSurf200',
>     pseudo_dir = '/home/zgdeng/SSSP_acc_PBE' ,
>     prefix = 'TiNSurf200+Biotin',
>     verbosity = 'low' ,
>     etot_conv_thr = 1.0D-3 ,
>     forc_conv_thr = 1.0D-2 ,
>     nstep = 100 ,
>     tstress = .false. ,
>     tprnfor = .false. ,
> /
> &SYSTEM
>     ibrav = 14,
>     celldm(1) = 22.9288029598d0,
>     celldm(2) = 1.2990423130d0,
>     celldm(3) = 5.2512156527d0,
>     celldm(4) = 0.0000000000d0,
>     celldm(5) = 0.0000000000d0,
>     celldm(6) = 0.0000000000d0,
>     nat = 128,
>     ntyp = 6,
>     ecutwfc = 30d0 ,
>     ecutrho = 240d0 ,
>     nosym = .true. ,
>     nbnd = 600,
>     input_dft = 'PBE' ,
>     occupations = 'smearing' ,
>     degauss = 0.015d0 ,
>     smearing = 'gaussian' ,
> /
> &ELECTRONS
>     electron_maxstep = 1000,
>     conv_thr = 1d-06 ,
>     mixing_mode = 'local-TF' ,
>     mixing_beta = 0.300d0 ,
>     diagonalization = 'david' ,
> /
> &IONS
>     ion_dynamics = 'bfgs' ,
>     upscale = 100.D0 ,
>     bfgs_ndim = 3 ,
> /
> ATOMIC_SPECIES
> C 12.010700d0 C_pbe_v1.2.uspp.F.UPF
> H 1.007940d0 H.pbe-rrkjus_psl.0.1.UPF
> N 14.006700d0 N.pbe.theos.UPF
> O 15.999400d0 O.pbe-n-kjpaw_psl.0.1.UPF
> S 32.065000d0 S_pbe_v1.2.uspp.F.UPF
> Ti 47.867000d0 ti_pbe_v1.4.uspp.F.UPF
> ATOMIC_POSITIONS {alat}
> Ti   0.0000000000d0   0.0000000000d0   0.1021361444d0   0   0   0
> Ti   0.1250000000d0   0.2165113823d0   0.1021361444d0   0   0   0
> Ti   0.0000000000d0   0.1443365914d0   0.3062508969d0   1   1   1
> Ti   0.1250000000d0   0.3608479737d0   0.3062508969d0   1   1   1
> N    0.0000000000d0   0.1443365914d0   0.0001050243d0   0   0   0
> N    0.1250000000d0   0.3608479737d0   0.0001050243d0   0   0   0
> N    0.1250000000d0   0.0721747909d0   0.2042197767d0   1   1   1
> N    0.0000000000d0   0.2886731828d0   0.2042197767d0   1   1   1
> Ti   0.2500000000d0   0.0000000000d0   0.1021361444d0   0   0   0
> Ti   0.3750000000d0   0.2165113823d0   0.1021361444d0   0   0   0
> Ti   0.2500000000d0   0.1443365914d0   0.3062508969d0   1   1   1
> Ti   0.3750000000d0   0.3608479737d0   0.3062508969d0   1   1   1
> N    0.2500000000d0   0.1443365914d0   0.0001050243d0   0   0   0
> N    0.3750000000d0   0.3608479737d0   0.0001050243d0   0   0   0
> N    0.3750000000d0   0.0721747909d0   0.2042197767d0   1   1   1
> N    0.2500000000d0   0.2886731828d0   0.2042197767d0   1   1   1
> Ti   0.5000000000d0   0.0000000000d0   0.1021361444d0   0   0   0
> Ti   0.6250000000d0   0.2165113823d0   0.1021361444d0   0   0   0
> Ti   0.5000000000d0   0.1443365914d0   0.3062508969d0   1   1   1
> Ti   0.6250000000d0   0.3608479737d0   0.3062508969d0   1   1   1
> N    0.5000000000d0   0.1443365914d0   0.0001050243d0   0   0   0
> N    0.6250000000d0   0.3608479737d0   0.0001050243d0   0   0   0
> N    0.6250000000d0   0.0721747909d0   0.2042197767d0   1   1   1
> N    0.5000000000d0   0.2886731828d0   0.2042197767d0   1   1   1
> Ti   0.7500000000d0   0.0000000000d0   0.1021361444d0   0   0   0
> Ti   0.8750000000d0   0.2165113823d0   0.1021361444d0   0   0   0
> Ti   0.7500000000d0   0.1443365914d0   0.3062508969d0   1   1   1
> Ti   0.8750000000d0   0.3608479737d0   0.3062508969d0   1   1   1
> N    0.7500000000d0   0.1443365914d0   0.0001050243d0   0   0   0
> N    0.8750000000d0   0.3608479737d0   0.0001050243d0   0   0   0
> N    0.8750000000d0   0.0721747909d0   0.2042197767d0   1   1   1
> N    0.7500000000d0   0.2886731828d0   0.2042197767d0   1   1   1
> Ti   0.0000000000d0   0.4330097742d0   0.1021361444d0   0   0   0
> Ti   0.1250000000d0   0.6495211565d0   0.1021361444d0   0   0   0
> Ti   0.0000000000d0   0.5773463656d0   0.3062508969d0   1   1   1
> Ti   0.1250000000d0   0.7938577479d0   0.3062508969d0   1   1   1
> N    0.0000000000d0   0.5773463656d0   0.0001050243d0   0   0   0
> N    0.1250000000d0   0.7938577479d0   0.0001050243d0   0   0   0
> N    0.1250000000d0   0.5051845651d0   0.2042197767d0   1   1   1
> N    0.0000000000d0   0.7216959474d0   0.2042197767d0   1   1   1
> Ti   0.2500000000d0   0.4330097742d0   0.1021361444d0   0   0   0
> Ti   0.3750000000d0   0.6495211565d0   0.1021361444d0   0   0   0
> Ti   0.2500000000d0   0.5773463656d0   0.3062508969d0   1   1   1
> Ti   0.3750000000d0   0.7938577479d0   0.3062508969d0   1   1   1
> N    0.2500000000d0   0.5773463656d0   0.0001050243d0   0   0   0
> N    0.3750000000d0   0.7938577479d0   0.0001050243d0   0   0   0
> N    0.3750000000d0   0.5051845651d0   0.2042197767d0   1   1   1
> N    0.2500000000d0   0.7216959474d0   0.2042197767d0   1   1   1
> Ti   0.5000000000d0   0.4330097742d0   0.1021361444d0   0   0   0
> Ti   0.6250000000d0   0.6495211565d0   0.1021361444d0   0   0   0
> Ti   0.5000000000d0   0.5773463656d0   0.3062508969d0   1   1   1
> Ti   0.6250000000d0   0.7938577479d0   0.3062508969d0   1   1   1
> N    0.5000000000d0   0.5773463656d0   0.0001050243d0   0   0   0
> N    0.6250000000d0   0.7938577479d0   0.0001050243d0   0   0   0
> N    0.6250000000d0   0.5051845651d0   0.2042197767d0   1   1   1
> N    0.5000000000d0   0.7216959474d0   0.2042197767d0   1   1   1
> Ti   0.7500000000d0   0.4330097742d0   0.1021361444d0   0   0   0
> Ti   0.8750000000d0   0.6495211565d0   0.1021361444d0   0   0   0
> Ti   0.7500000000d0   0.5773463656d0   0.3062508969d0   1   1   1
> Ti   0.8750000000d0   0.7938577479d0   0.3062508969d0   1   1   1
> N    0.7500000000d0   0.5773463656d0   0.0001050243d0   0   0   0
> N    0.8750000000d0   0.7938577479d0   0.0001050243d0   0   0   0
> N    0.8750000000d0   0.5051845651d0   0.2042197767d0   1   1   1
> N    0.7500000000d0   0.7216959474d0   0.2042197767d0   1   1   1
> Ti   0.0000000000d0   0.8660325388d0   0.1021361444d0   0   0   0
> Ti   0.1250000000d0   1.0825309307d0   0.1021361444d0   0   0   0
> Ti   0.0000000000d0   1.0103691302d0   0.3062508969d0   1   1   1
> Ti   0.1250000000d0   1.2268675220d0   0.3062508969d0   1   1   1
> N    0.0000000000d0   1.0103691302d0   0.0001050243d0   0   0   0
> N    0.1250000000d0   1.2268675220d0   0.0001050243d0   0   0   0
> N    0.1250000000d0   0.9381943393d0   0.2042197767d0   1   1   1
> N    0.0000000000d0   1.1547057216d0   0.2042197767d0   1   1   1
> Ti   0.2500000000d0   0.8660325388d0   0.1021361444d0   0   0   0
> Ti   0.3750000000d0   1.0825309307d0   0.1021361444d0   0   0   0
> Ti   0.2500000000d0   1.0103691302d0   0.3062508969d0   1   1   1
> Ti   0.3750000000d0   1.2268675220d0   0.3062508969d0   1   1   1
> N    0.2500000000d0   1.0103691302d0   0.0001050243d0   0   0   0
> N    0.3750000000d0   1.2268675220d0   0.0001050243d0   0   0   0
> N    0.3750000000d0   0.9381943393d0   0.2042197767d0   1   1   1
> N    0.2500000000d0   1.1547057216d0   0.2042197767d0   1   1   1
> Ti   0.5000000000d0   0.8660325388d0   0.1021361444d0   0   0   0
> Ti   0.6250000000d0   1.0825309307d0   0.1021361444d0   0   0   0
> Ti   0.5000000000d0   1.0103691302d0   0.3062508969d0   1   1   1
> Ti   0.6250000000d0   1.2268675220d0   0.3062508969d0   1   1   1
> N    0.5000000000d0   1.0103691302d0   0.0001050243d0   0   0   0
> N    0.6250000000d0   1.2268675220d0   0.0001050243d0   0   0   0
> N    0.6250000000d0   0.9381943393d0   0.2042197767d0   1   1   1
> N    0.5000000000d0   1.1547057216d0   0.2042197767d0   1   1   1
> Ti   0.7500000000d0   0.8660325388d0   0.1021361444d0   0   0   0
> Ti   0.8750000000d0   1.0825309307d0   0.1021361444d0   0   0   0
> Ti   0.7500000000d0   1.0103691302d0   0.3062508969d0   1   1   1
> Ti   0.8750000000d0   1.2268675220d0   0.3062508969d0   1   1   1
> N    0.7500000000d0   1.0103691302d0   0.0001050243d0   0   0   0
> N    0.8750000000d0   1.2268675220d0   0.0001050243d0   0   0   0
> N    0.8750000000d0   0.9381943393d0   0.2042197767d0   1   1   1
> N    0.7500000000d0   1.1547057216d0   0.2042197767d0   1   1   1
> N    0.4062600000d0   0.9896104340d0   0.6937906120d0   1   1   1
> C    0.4092000000d0   0.9020160108d0   0.6045199459d0   1   1   1
> C    0.4577300000d0   0.7953906178d0   0.6618107087d0   1   1   1
> N    0.4939900000d0   0.8337513373d0   0.7754470154d0   1   1   1
> C    0.4605200000d0   0.9497168446d0   0.7956116835d0   1   1   1
> C    0.5499000000d0   0.7467544736d0   0.5886612747d0   1   1   1
> S    0.5127800000d0   0.7970274111d0   0.4537050324d0   1   1   1
> C    0.4869600000d0   0.9325824765d0   0.5090003332d0   1   1   1
> C    0.5593700000d0   0.6202537332d0   0.5940700268d0   1   1   1
> C    0.5857900000d0   0.5794118428d0   0.7112246480d0   1   1   1
> C    0.5913300000d0   0.4526253131d0   0.7064460418d0   1   1   1
> C    0.6159700000d0   0.4036254371d0   0.8208700308d0   1   1   1
> C    0.6181100000d0   0.2770987158d0   0.8104726238d0   1   1   1
> O    0.6709500000d0   0.2080416264d0   0.8994807291d0   1   1   1
> O    0.5738500000d0   0.2226038907d0   0.7076538214d0   1   1   1
> O    0.4792600000d0   1.0152795101d0   0.8997958021d0   1   1   1
> H    0.3676800000d0   1.0720216783d0   0.6843909360d0   1   1   1
> H    0.3244700000d0   0.8813742285d0   0.5695993618d0   1   1   1
> H    0.3864400000d0   0.7347123514d0   0.6695825079d0   1   1   1
> H    0.5416000000d0   0.7826340223d0   0.8344706794d0   1   1   1
> H    0.6311400000d0   0.7881549521d0   0.6112940141d0   1   1   1
> H    0.4487000000d0   0.9936374652d0   0.4486113532d0   1   1   1
> H    0.5677800000d0   0.9656950650d0   0.5436058444d0   1   1   1
> H    0.6272000000d0   0.5918826491d0   0.5355189723d0   1   1   1
> H    0.4775600000d0   0.5827503816d0   0.5669737540d0   1   1   1
> H    0.5177200000d0   0.6062890283d0   0.7701432876d0   1   1   1
> H    0.6681700000d0   0.6144340236d0   0.7398437733d0   1   1   1
> H    0.6588100000d0   0.4267094190d0   0.6464771590d0   1   1   1
> H    0.5087600000d0   0.4194737533d0   0.6762515517d0   1   1   1
> H    0.5487800000d0   0.4294374078d0   0.8812590108d0   1   1   1
> H    0.6993600000d0   0.4344257303d0   0.8513270816d0   1   1   1
> H    0.5063400000d0   0.2734743877d0   0.6728382616d0   1   1   1
> K_POINTS {automatic}
> 4 4 1 0 0 0
>  
> <QE530-GPU memory error.png>
> _______________________________________________
> Pw_forum mailing list
> Pw_forum@pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum
> 
> --
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
> 