Hi,

On 07.04.2011 at 18:38, William Deegan wrote:


> By timed out I mean exceeding the h_rt specified with the job
> 
> qsub -l h_rt=5  script_to_sleep_10.sh

This will be 3), i.e. killed, and you can only see it in the messages files of 
the involved nodes; for parallel jobs it can be any of them. The same applies 
to an exceeded h_vmem limit.
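A rough sketch for pulling those lines out, assuming the execd spool directories of all nodes are reachable below one directory (for example the default `$SGE_ROOT/$SGE_CELL/spool/<hostname>/messages` layout on a shared filesystem); with local spooling you would run the grep on each node instead. The function name `find_job_messages` is made up for illustration, and since the wording of the log entries varies between versions it only matches the job id as a whole word.

```shell
# find_job_messages SPOOL_DIR JOBID   (hypothetical helper)
# Print every line mentioning JOBID from the per-host "messages" files,
# prefixed with the host directory name.
# ASSUMPTION: all spool dirs are visible under SPOOL_DIR (shared filesystem).
find_job_messages() {
  spool=$1
  jobid=$2
  for f in "$spool"/*/messages; do
    [ -f "$f" ] || continue          # skip if the glob matched nothing
    host=$(basename "$(dirname "$f")")
    # -w: match the id as a whole word, so 123 does not match 1234
    grep -w -- "$jobid" "$f" | sed "s/^/$host: /"
  done
}
```

Usage, e.g.: `find_job_messages "$SGE_ROOT/$SGE_CELL/spool" 1234`. In a default setup the qmaster's own `messages` file sits below the same spool directory, so its matching lines show up too, prefixed with `qmaster:`.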

Did you mean a `qdel` by the original "3) killed" then?
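On the accounting side, a `qdel` and an exceeded h_rt presumably look the same: by default both end the job with SIGKILL, which a shell reports as exit status 137 (128 + 9). A hypothetical helper `classify_job` that reads `qacct -j <jobid>` output on stdin and looks only at the `failed` and `exit_status` fields could look like this. Treat the mapping as an assumption, since the `failed` code recorded for deleted or limit-killed jobs may differ between versions.

```shell
# classify_job (hypothetical helper): read `qacct -j JOBID` output on stdin
# and print a rough verdict based only on the "failed" and "exit_status" fields.
classify_job() {
  awk '
    $1 == "failed"      { failed = $2 }
    $1 == "exit_status" { status = $2 }
    END {
      if (failed == "")      { print "no-record"; exit }
      if (failed + 0 != 0)   { print "failed:" failed; exit }
      if (status + 0 == 0)   { print "completed"; exit }
      # ASSUMPTION: 137 = 128 + 9, i.e. SIGKILL (qdel or a hard limit like h_rt)
      if (status + 0 == 137) { print "killed"; exit }
      print "exit:" status
    }'
}
```

Usage: `qacct -j 1234 | classify_job`. A result of `killed` still does not say whether it was a `qdel` or a limit; for that the messages files are needed.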

-- Reuti

> 
> -Bill
> On Apr 7, 2011, at 2:10 AM, Reuti wrote:
> 
>> Hi,
>> 
>> On 06.04.2011 at 23:31, William Deegan wrote:
>> 
>>> Is qacct the best way to pull information about a job to see if it's:
>>> 1) completed 
>>> 2) timedout
>>> 3) killed
>>> 4) still running
>> 
>> all you can get with `qacct` or `qstat` is what SGE thinks the job is/was 
>> doing. If you want the full picture, you have to check various places:
>> 
>> 4) `ps -e f`: maybe it jumped out of the process tree and is still there 
>> while SGE thinks it's over.
>> 
>> 3) the "messages" files of all involved exechosts.
>> 
>> 2) what do you mean by timedout? Hanging around on a node doing nothing?
>> 
>> 1) From SGE's point of view: yes, but there is a delay of a few seconds 
>> between the real end of a job and its entry in the accounting file.
>> 
>> -- Reuti
>> 
>> 
>>> Thanks,
>>> Bill
>>> _______________________________________________
>>> users mailing list
>>> [email protected]
>>> https://gridengine.org/mailman/listinfo/users
>> 
> 
> 
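Regarding the delay of a few seconds before a finished job appears in the accounting file (point 1 in the reply quoted above): a script that queries `qacct` right after the job has left `qstat` may find nothing yet. A minimal sketch that simply retries, assuming `qacct -j` prints a line starting with `jobnumber` only when it found a record; `wait_for_accounting`, the default of 30 tries and the 2-second interval are arbitrary illustration choices.

```shell
# wait_for_accounting JOBID [TRIES] [INTERVAL]   (hypothetical helper)
# Poll `qacct -j JOBID` until an accounting record shows up.
# Returns 0 once found, 1 after TRIES unsuccessful attempts.
wait_for_accounting() {
  jobid=$1
  tries=${2:-30}      # arbitrary default
  interval=${3:-2}    # seconds between attempts, arbitrary default
  while [ "$tries" -gt 0 ]; do
    # ASSUMPTION: a found record contains a line starting with "jobnumber"
    if qacct -j "$jobid" 2>/dev/null | grep -q '^jobnumber'; then
      return 0
    fi
    tries=$((tries - 1))
    sleep "$interval"
  done
  return 1
}
```

Usage: `wait_for_accounting 1234 && qacct -j 1234`.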

