Hi,

On 11.09.2012 at 19:10, Joseph Farran wrote:

> Is there a way (hopefully an easy one) to have Grid Engine give an 
> informative message when a job has gone past a limit and been killed, such as 
> when a job exceeds the wall time limit?
> 
> When I get an email from Grid Engine about a job that has gone past its wall 
> time limit, it is not very informative:
> 
> Job 3568 (TEST) Aborted
> Exit Status      = 0
> Signal           = USR1
> User             = me
> Queue            = [email protected]
> Host             = compute-1-1.local
> Start Time       = 09/11/2012 09:54:01
> End Time         = 09/11/2012 09:56:02
> CPU              = 00:00:00
> Max vmem         = 124.145M
> failed assumedly after job because:
> job 3568.1 died through signal USR1 (10)

You can use a mail wrapper that scans the messages file on the node and 
appends the relevant line to the email:

#!/bin/sh

#
# Distinguish between normal jobs and an array job.
#

case `echo "$2" | cut -d " " -f 1` in

      Job) JOB_ID=`echo "$2" | cut -d " " -f 2`
           CONDITION=`echo "$2" | cut -d " " -f 4` ;;

Job-array) JOB_ID=`echo "$2" | cut -d " " -f 3`
           CONDITION=`echo "$2" | cut -d " " -f 5` ;;

        *) ;;

esac

if [ "$CONDITION" = "Aborted" ]; then
    if [ -f /var/spool/sge/$HOSTNAME/messages -a -r /var/spool/sge/$HOSTNAME/messages ]; then
        APPENDIX=`egrep "[|]job $JOB_ID([.][[:digit:]]+)? exceed" /var/spool/sge/$HOSTNAME/messages | head -n 1`
    fi
    if [ -z "$APPENDIX" ]; then
        APPENDIX="Unknown, no entry found in messages file on the master node of the job."
    fi
fi

#
# Now construct and send the email.
#
 
if [ -n "$APPENDIX" ]; then
    (cat; echo; echo "Reason for job abort:"; echo "$APPENDIX") | mail -s "$2" "$3"
else
    mail -s "$2" "$3"
fi

if [ -f /var/spool/sge/context/$JOB_ID -a -w /var/spool/sge/context/$JOB_ID ]; then
    rm -f /var/spool/sge/context/$JOB_ID
fi
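
To try the two moving parts of the wrapper without a running cluster, something like the following sketch may help. It is not part of the wrapper itself: the helper names parse_subject and find_reason are made up for illustration, the array-task subject format and the sample messages lines are assumptions rather than captured Grid Engine output, and it uses `grep -E` and `$(...)` in place of `egrep` and backticks.

```shell
#!/bin/sh
# Offline sketch of the wrapper's subject parsing and messages lookup.
# Helper names and sample data are hypothetical, for illustration only.

# parse_subject: print "<job id> <condition>" for a mail subject, using the
# same field positions as the wrapper (fields 2 and 4 for "Job ...",
# fields 3 and 5 for "Job-array ..."). Prints nothing for other subjects.
parse_subject() {
    case $(echo "$1" | cut -d " " -f 1) in
        Job)       echo "$1" | cut -d " " -f 2,4 ;;
        Job-array) echo "$1" | cut -d " " -f 3,5 ;;
        *)         ;;
    esac
}

# find_reason: print the first line in messages file $2 that reports job $1
# (with an optional ".<task>" suffix) as having exceeded a limit.
find_reason() {
    grep -E "[|]job $1([.][[:digit:]]+)? exceed" "$2" | head -n 1
}

# Build a small stand-in for the execd messages file (assumed line format).
MESSAGES=$(mktemp)
cat > "$MESSAGES" <<'EOF'
09/11/2012 09:54:01|  main|compute-1-1|I|some unrelated entry
09/11/2012 09:56:01|  main|compute-1-1|W|job 3568.1 exceeded hard wallclock time - initiate terminate method
EOF

# Split the parsed result into $1 (job id) and $2 (condition).
set -- $(parse_subject "Job 3568 (TEST) Aborted")
echo "job id: $1, condition: $2"

# Look up the reason exactly as the wrapper would for an aborted job.
if [ "$2" = "Aborted" ]; then
    find_reason "$1" "$MESSAGES"
fi

rm -f "$MESSAGES"
```

Note that the field positions only hold as long as the job name in the subject contains no spaces, which is the same limitation the wrapper has.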


-- Reuti
