Hi,

On 03.04.2013 at 19:24, Joshua Baker-LePain wrote:

> Howdy all. We're running GE 2011.11p1, and either I've hit a bug or I'm not
> understanding the documentation regarding hold_jid correctly. The qsub man
> page states, regarding hold_jid, that "If any of the referenced jobs exits
> with exit code 100, the submitted job will remain ineligible for execution."
> But a simple test seems to dispute that.
>
> 1) Submit a job that does simply "sleep 30", but include "-l h_rt=10", so
> that SGE will kill the job.
>
> 2) Submit a second job where -hold_jid references the first job.
>
> Given the runtime limit, the first job gets killed and qacct shows:
>
> failed 100 : assumedly after job

This is not the exit code the man page refers to. The exit code is one line below in the `qacct` output. The "failed" states are explained here:

http://docs.oracle.com/cd/E19080-01/n1.grid.eng6/817-6117/6mlhdapso/

=> table 6-5

> However the second job ends up running anyway. Am I correct in thinking that
> it shouldn't do so?

In terms of SGE, the second job is only waiting for the specified job to leave the system. Whether that job ends abnormally or finishes properly doesn't matter. If the job exits with an exit code of 100, it would still show up in `qstat` and so prevent the second job from starting.

-- Reuti

> Thanks.
>
> --
> Joshua Baker-LePain
> QB3 Shared Cluster Sysadmin
> UCSF
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
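
To make the difference between the two `qacct` fields concrete, here is a minimal sketch in Python that reads the `failed` and `exit_status` lines from one accounting record and checks only the latter against 100. The function name `parse_qacct_record` and the sample record are made up for illustration. The `failed` line is the one quoted above, while the job name and the exit status of 137 are assumed values, not output from the actual job.

```python
def parse_qacct_record(text):
    """Extract the numeric 'failed' and 'exit_status' fields from one
    qacct-style record given as text (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # field name, then the rest of the line
        if len(parts) == 2 and parts[0] in ("failed", "exit_status"):
            # The value may carry a trailing explanation,
            # e.g. "100 : assumedly after job"; keep only the leading number.
            fields[parts[0]] = int(parts[1].split()[0])
    return fields


# Illustrative record: only the 'failed' line comes from the thread,
# the other two lines are assumptions for the sake of the example.
SAMPLE = """\
jobname      sleeper.sh
failed       100 : assumedly after job
exit_status  137
"""

record = parse_qacct_record(SAMPLE)

# The man page sentence is about the job's exit code, i.e. 'exit_status',
# not about the 'failed' field.
exited_with_100 = record.get("exit_status") == 100
print(record, exited_with_100)
```

With a record like this, `failed` is 100 but `exit_status` is not, so the condition described in the man page does not apply and the dependent job is free to start once the first job has left the system.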
