Hi,

On 03.04.2013 at 19:24, Joshua Baker-LePain wrote:

> Howdy all.  We're running GE 2011.11p1, and either I've hit a bug or I'm not 
> understanding the documentation regarding hold_jid correctly.  The qsub man 
> page states, regarding hold_jid, that "If any of the referenced jobs exits  
> with  exit code 100, the submitted job will remain ineligible for execution." 
>  But a simple test seems to dispute that.
> 
> 1) Submit a job that does simply "sleep 30", but include "-l h_rt=10", so
>   that SGE will kill the job.
> 
> 2) Submit a second job where -hold_jid references the first job.
> 
> Given the runtime limit, the first job gets killed and qacct shows:
> 
> failed       100 : assumedly after job

This is not the exit code the man page refers to. The exit code is shown one 
line below, in the `exit_status` field of the `qacct` output.

The "failed" states are explained in table 6-5 here: 
http://docs.oracle.com/cd/E19080-01/n1.grid.eng6/817-6117/6mlhdapso/
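
To check the two fields separately in a script, something like the following 
minimal sketch might help. It only parses text in the `qacct -j` layout; the 
helper name `parse_qacct_fields` is made up for this example, and the sample 
text is illustrative only (the `failed` line is the one quoted above, while 
the `exit_status` value of 137 is an assumed placeholder, not taken from the 
real job).

```python
def parse_qacct_fields(text):
    """Return a dict mapping each qacct field name to its raw value string.

    Hypothetical helper for illustration; it splits every line on the first
    run of whitespace and ignores lines that have no value part.
    """
    fields = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            fields[parts[0]] = parts[1].strip()
    return fields


# Illustrative sample only: the exit_status value is an assumption.
SAMPLE = """\
failed       100 : assumedly after job
exit_status  137
"""

fields = parse_qacct_fields(SAMPLE)

# "failed" carries a code plus a description; keep only the numeric code.
failed_code = int(fields["failed"].split(":")[0])
# "exit_status" is the exit code that the qsub man page talks about.
exit_status = int(fields["exit_status"])

print("failed code:", failed_code)
print("exit_status:", exit_status)
```

In practice the text would come from running `qacct -j` with the job id 
rather than from a hard-coded string.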


> However the second job ends up running anyway.  Am I correct in thinking that 
> it shouldn't do so?

As far as SGE is concerned, the second job is simply waiting for the 
referenced job to leave the system. Whether it leaves by being aborted or by 
finishing properly doesn't matter.

If the job exits with an exit code of 100, it would still show up in `qstat` 
and prevent the second job from starting.
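
The rule described above can be written down as a toy model. This is only a 
sketch of the reasoning, not SGE code: the function names are invented for 
the example, and it deliberately ignores every other case (for instance jobs 
that are rescheduled) and looks only at exit code 100.

```python
HOLD_EXIT_CODE = 100  # the exit code named in the qsub man page


def predecessor_left_system(exit_code):
    """Toy model: a job stays visible in qstat only if it exited with 100."""
    return exit_code != HOLD_EXIT_CODE


def successor_may_start(predecessor_exit_codes):
    """The held job becomes eligible once every referenced job has left."""
    return all(predecessor_left_system(code) for code in predecessor_exit_codes)


# A job killed by the runtime limit does not exit with 100 itself,
# so in this model the dependent job is released.
print(successor_may_start([137]))
# A job whose script exits with 100 keeps the dependent job on hold.
print(successor_may_start([100]))
```

The exit code 137 here is again just an assumed example of a non-100 exit 
status.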

-- Reuti


> 
> Thanks.
> 
> -- 
> Joshua Baker-LePain
> QB3 Shared Cluster Sysadmin
> UCSF
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

