Reuti,

On Apr 8, 2011, at 1:53 AM, Reuti wrote:

> Hi,
>
> On 07.04.2011 at 18:38, William Deegan wrote:
>
>> By timed out I mean exceeding the h_rt specified with the job
>>
>> qsub -l h_rt=5 script_to_sleep_10.sh
>
> This will be 3), i.e. killed, and you can see it only in the messages files
> of the involved nodes. For parallel jobs this can be any of the involved
> nodes. The same holds for an exceeded h_vmem limit.
>
> Did you mean a `qdel` by the original "3) killed" then?
Yes.

-Bill

>
> -- Reuti
>
>>
>> -Bill
>> On Apr 7, 2011, at 2:10 AM, Reuti wrote:
>>
>>> Hi,
>>>
>>> On 06.04.2011 at 23:31, William Deegan wrote:
>>>
>>>> Is qacct the best way to pull information about a job to see if it is:
>>>> 1) completed
>>>> 2) timed out
>>>> 3) killed
>>>> 4) still running
>>>
>>> All you can get with `qacct` or `qstat` is what SGE thinks the job is/was
>>> doing. If you want all information you have to check various places:
>>>
>>> 4) `ps -e f`: maybe it jumped out of the process tree and is still there
>>> while SGE thinks it's over.
>>>
>>> 3) the "messages" files of all involved exechosts.
>>>
>>> 2) What do you mean by timed out? Hanging around on a node doing nothing?
>>>
>>> 1) From SGE's point of view: yes, but there is a short delay of some
>>> seconds between the real end of a job and its entry in the accounting file.
>>>
>>> -- Reuti
>>>
>>>> Thanks,
>>>> Bill
>>>> _______________________________________________
>>>> users mailing list
>>>> [email protected]
>>>> https://gridengine.org/mailman/listinfo/users
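
For anyone scripting the `qacct` part of the above, here is a minimal Python sketch. It assumes `qacct -j <job_id>` prints one "key value" pair per line and that the record contains the fields `failed`, `exit_status` and `ru_wallclock`; the helper names `parse_qacct` and `accounting_summary` are made up for illustration. It only reports what SGE thinks happened, so a job killed for exceeding h_rt or h_vmem still has to be confirmed in the messages files of the exechosts.

```python
import subprocess


def parse_qacct(text):
    """Parse 'key value' lines as printed by `qacct -j` into a dict.

    Assumption: one pair per line, separated by whitespace. For array
    jobs with several records, later records overwrite earlier ones.
    """
    record = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        # Skip separator lines ("=====") and lines that carry no value.
        if len(parts) == 2 and not parts[0].startswith("="):
            record[parts[0]] = parts[1].strip()
    return record


def accounting_summary(job_id):
    """Return SGE's accounting view of a job, or None if no record exists.

    None can mean the job is still running, or that the accounting entry
    has not been written yet (there is a delay of some seconds).
    """
    result = subprocess.run(
        ["qacct", "-j", str(job_id)], capture_output=True, text=True
    )
    record = parse_qacct(result.stdout)
    if "exit_status" not in record:
        return None
    return {key: record.get(key) for key in ("failed", "exit_status", "ru_wallclock")}
```

If `accounting_summary` returns None, checking `qstat` and then `ps -e f` on the node is still the way to tell "running" apart from "not accounted yet".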
