Also, according to the queue_conf man page:

Exit codes for the epilog attribute can be interpreted based on the
following exit values:
    0: Success
    99: Reschedule job
    100: Put job in error state
    Anything else: Put queue in error state

I've had no luck with the "Anything else" exit codes; they never seem
to put the queue into an error state.  To pause things (and enable the
existing job to be re-run), I've decided to have the epilog exit with
code 100 when the job has a non-zero exit code, and to suspend the
queue using qmod -s <queue>.
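
For anyone wanting to try the same approach, here is a rough sketch of
what such an epilog might look like in Python.  It is not my exact
script.  It assumes the usage file in $SGE_JOB_SPOOL_DIR contains an
"exit_status=<val>" line (as described in my earlier mail below), and
that the queue name is available to the epilog as $QUEUE.  The function
names are made up for illustration, and treating a missing exit status
as success is just one possible choice.

```python
#!/usr/bin/env python3
"""Epilog sketch: flag the job and suspend the queue when the job failed."""
import os
import subprocess

EPILOG_OK = 0           # queue_conf: success
EPILOG_JOB_ERROR = 100  # queue_conf: put job in error state


def read_exit_status(usage_text):
    """Return the integer from the 'exit_status=<val>' line, or None."""
    for line in usage_text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key == "exit_status":
            try:
                return int(value)
            except ValueError:
                return None
    return None


def epilog_exit_code(job_status):
    """Map the job's exit status to the code the epilog should exit with."""
    # Assumption: an unknown status is treated like success.
    if job_status is None or job_status == 0:
        return EPILOG_OK
    return EPILOG_JOB_ERROR


def main():
    spool_dir = os.environ.get("SGE_JOB_SPOOL_DIR")
    queue = os.environ.get("QUEUE")  # assumption: queue name is in $QUEUE
    if not spool_dir:
        return EPILOG_OK
    try:
        with open(os.path.join(spool_dir, "usage")) as fh:
            status = read_exit_status(fh.read())
    except OSError:
        return EPILOG_OK
    code = epilog_exit_code(status)
    if code == EPILOG_JOB_ERROR and queue:
        # Suspend the queue so nothing else starts until someone looks.
        subprocess.call(["qmod", "-s", queue])
    return code
```

To use it as an actual epilog, the script would end with
"import sys; sys.exit(main())".  The user running the epilog also needs
permission to run qmod -s on that queue.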

Thanks,
David


On Thu, Aug 2, 2012 at 1:58 PM, David Erickson <[email protected]> wrote:
> Following up on this, in my spool directory I had an exit_status file,
> but it was always empty (bug?).  Fortunately the usage file in the
> same directory had an "exit_status=<val>" line that I could use to get
> the exit status from the script.
>
> On Tue, Jul 10, 2012 at 3:41 PM, David Erickson <[email protected]> wrote:
>> Great info, will be hacking on this this afternoon.
>>
>> Thanks!
>>
>> On Tue, Jul 10, 2012 at 11:43 AM, Rayson Ho <[email protected]> wrote:
>>> On Tue, Jul 10, 2012 at 5:45 AM, Reuti <[email protected]> wrote:
>>>>
>>>> Just to note, that the path can be accessed by $SGE_JOB_SPOOL_DIR.
>>>
>>>
>>> Thanks Reuti - it will be useful to David.
>>>
>>> I forgot this environment var as I have not used this hack for almost
>>> a year... basically since getting the job exit status in epilog was
>>> added in GE 2011.11 last year I stopped referring to
>>> $SGE_JOB_SPOOL_DIR.
>>>
>>> Rayson
>>>
>>>
>>>>
>>>> -- Reuti
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
