Reuti,
On Apr 8, 2011, at 1:53 AM, Reuti wrote:

> Hi,
> 
> On 07.04.2011 at 18:38, William Deegan wrote:
> 
> 
>> By "timed out" I mean exceeding the h_rt limit specified with the job:
>> 
>> qsub -l h_rt=5  script_to_sleep_10.sh
> 
> This will be 3), i.e. killed, and you can see it only in the messages files
> of the involved nodes. For parallel jobs this can be any of the involved nodes.
> The same applies to an exceeded h_vmem limit.
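A minimal sketch of that search, assuming you have already collected the messages file of each involved exechost (typically found in the execd spool directory; the exact path depends on your installation) into a dict keyed by host name. It also assumes the relevant lines refer to the job as `job <id>.<task>`; that line format and the helper name `lines_for_job` are assumptions, so adjust the pattern to what your messages files actually contain.

```python
import re
from typing import Dict, List


def lines_for_job(messages_by_host: Dict[str, str], job_id: int) -> Dict[str, List[str]]:
    """Return, per host, the messages-file lines that mention the given job.

    Hypothetical helper: assumes lines name the job as "job <id>.<task>".
    """
    # Requiring "." right after the id keeps job 1234 from matching job 12345.1
    pattern = re.compile(rf"\bjob {job_id}\.\d+")
    hits: Dict[str, List[str]] = {}
    for host, text in messages_by_host.items():
        matching = [line for line in text.splitlines() if pattern.search(line)]
        if matching:
            hits[host] = matching
    return hits
```

Hosts with no matching lines are left out of the result, so for a parallel job the keys show directly which nodes logged something about it.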
> 
> Did you mean a `qdel` by the original "3) killed" then?

Yes.

-Bill

> 
> -- Reuti
> 
>> 
>> -Bill
>> On Apr 7, 2011, at 2:10 AM, Reuti wrote:
>> 
>>> Hi,
>>> 
>>> On 06.04.2011 at 23:31, William Deegan wrote:
>>> 
>>>> Is qacct the best way to pull information about a job to see if it's:
>>>> 1) completed
>>>> 2) timed out
>>>> 3) killed
>>>> 4) still running
>>> 
>>> All you can get with `qacct` or `qstat` is what SGE thinks the job is or was
>>> doing. If you want the full picture, you have to check several places:
>>> 
>>> 4) `ps -e f`: maybe it jumped out of the process tree and is still there
>>> while SGE thinks it's over.
>>> 
>>> 3) the "messages" files of all involved exechosts.
>>> 
>>> 2) What do you mean by "timed out"? Hanging around on a node doing nothing?
>>> 
>>> 1) From SGE's point of view, yes, but there is a delay of a few seconds
>>> between the real end of a job and its entry in the accounting file.
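For point 4), one heuristic follow-up, sketched under assumptions: processes that escaped the job's process tree are usually re-parented to init (PID 1), so you can list a user's processes whose parent PID is 1. The sketch parses the text printed by `ps -e -o pid=,ppid=,user=,args=` (procps syntax; the `=` suppresses the header line). The function name is hypothetical, and a PPID of 1 is only a hint: ordinary daemons have it too, some systems re-parent orphans to a subreaper instead, and `ps` may truncate long user names.

```python
from typing import List, Tuple


def orphaned_processes(ps_text: str, user: str) -> List[Tuple[int, str]]:
    """Return (pid, command) for `user`'s processes whose parent PID is 1.

    Hypothetical helper; expects the output of: ps -e -o pid=,ppid=,user=,args=
    """
    found: List[Tuple[int, str]] = []
    for line in ps_text.splitlines():
        # pid, ppid and user are whitespace-separated; args may contain spaces
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        pid, ppid, owner = parts[0], parts[1], parts[2]
        args = parts[3] if len(parts) == 4 else ""
        if owner == user and ppid == "1":
            found.append((int(pid), args))
    return found
```

On an exechost you could feed it `subprocess.run(["ps", "-e", "-o", "pid=,ppid=,user=,args="], capture_output=True, text=True).stdout` and compare the hits against what `qstat` still reports for that user.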
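For point 1) and the accounting delay, a minimal sketch that polls `qacct -j <job_id>` until the record appears and then summarizes it. It assumes that `qacct` exits non-zero while the job is not (yet) in the accounting file and otherwise prints `field value` lines (including `failed` and `exit_status`) after a separator line of `=` characters. The helper names are hypothetical. It reads an `exit_status` above 128 as 128 plus a signal number; since an exceeded h_rt and a `qdel` of a running job both typically end with SIGKILL, both show up as signal 9, which is why the messages files are still needed to tell them apart.

```python
import subprocess
import time
from typing import Callable, Dict, Optional


def parse_qacct(text: str) -> Dict[str, str]:
    """Parse the first record of `qacct -j` output into a {field: value} dict."""
    record: Dict[str, str] = {}
    for line in text.splitlines():
        if line.startswith("="):
            if record:
                break  # the next separator ends the first record
            continue
        parts = line.split(None, 1)
        if parts:
            record[parts[0]] = parts[1].strip() if len(parts) == 2 else ""
    return record


def run_qacct(job_id: int) -> Optional[str]:
    """Return `qacct -j` output, or None if qacct exits non-zero (assumed: not accounted yet)."""
    proc = subprocess.run(["qacct", "-j", str(job_id)], capture_output=True, text=True)
    return proc.stdout if proc.returncode == 0 else None


def wait_for_record(job_id: int, attempts: int = 10, delay: float = 2.0,
                    fetch: Callable[[int], Optional[str]] = run_qacct) -> Optional[Dict[str, str]]:
    """Retry a few times to cover the short gap between job end and its accounting entry."""
    for i in range(attempts):
        out = fetch(job_id)
        if out:
            return parse_qacct(out)
        if i < attempts - 1:
            time.sleep(delay)
    return None


def describe(record: Dict[str, str]) -> str:
    """Rough summary of what SGE recorded; says nothing about escaped processes."""
    status = int((record.get("exit_status") or "0").split()[0])
    if status > 128:
        # Convention: 128 + signal number, so 137 means SIGKILL
        return f"killed by signal {status - 128}"
    failed = record.get("failed") or "0"
    if failed.split()[0] != "0":
        return f"failed: {failed}"
    return f"exited with status {status}"
```

The `fetch` parameter exists only so the polling loop can be exercised without a cluster; in normal use you would call `wait_for_record(job_id)` and pass the result to `describe`.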
>>> 
>>> -- Reuti
>>> 
>>> 
>>>> Thanks,
>>>> Bill
>>>> _______________________________________________
>>>> users mailing list
>>>> [email protected]
>>>> https://gridengine.org/mailman/listinfo/users
>>> 
>>> 
>> 
>> 
> 
> 


