Thanks Reuti.

I think this sends an additional email, correct? Is there an easy way to append
or check for "-m bea", in case a user does not want the email?

Joseph

On 09/11/2012 11:21 AM, Reuti wrote:
Hi,

On 11.09.2012 at 19:10, Joseph Farran wrote:

Is there a way (hopefully an easy one) to have Grid Engine give an
informative message when a job has exceeded a limit and been killed, for
example when a job goes over the wall time limit?

When I get an email from Grid Engine for a job that has gone past its wall time
limit, it is not very informative:

Job 3568 (TEST) Aborted
Exit Status      = 0
Signal           = USR1
User             = me
Queue            = [email protected]
Host             = compute-1-1.local
Start Time       = 09/11/2012 09:54:01
End Time         = 09/11/2012 09:56:02
CPU              = 00:00:00
Max vmem         = 124.145M
failed assumedly after job because:
job 3568.1 died through signal USR1 (10)

You can scan the messages file on the node and add the relevant lines to the
email in the mail wrapper:

#!/bin/sh

#
# Distinguish between normal jobs and an array job.
#

case `echo "$2" | cut -d " " -f 1` in

      Job) JOB_ID=`echo "$2" | cut -d " " -f 2`
           CONDITION=`echo "$2" | cut -d " " -f 4` ;;

Job-array) JOB_ID=`echo "$2" | cut -d " " -f 3`
           CONDITION=`echo "$2" | cut -d " " -f 5` ;;

         *) ;;

esac

if [ "$CONDITION" = "Aborted" ]; then
     if [ -f /var/spool/sge/$HOSTNAME/messages -a -r /var/spool/sge/$HOSTNAME/messages ]; then
         APPENDIX=`egrep "[|]job $JOB_ID([.][[:digit:]]+)? exceed" /var/spool/sge/$HOSTNAME/messages | head -n 1`
     fi
     if [ -z "$APPENDIX" ]; then
         APPENDIX="Unknown, no entry found in messages file on the master node of the job."
     fi
fi

#
# Now construct and send the email.
#

if [ -n "$APPENDIX" ]; then
     (cat; echo; echo "Reason for job abort:"; echo "$APPENDIX") | mail -s "$2" "$3"
else
     mail -s "$2" "$3"
fi

if [ -f /var/spool/sge/context/$JOB_ID -a -w /var/spool/sge/context/$JOB_ID ]; then
     rm -f /var/spool/sge/context/$JOB_ID
fi
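
The subject parsing and the messages lookup can be tried outside Grid Engine
with a small sketch like the one below. The function names parse_subject and
find_reason are hypothetical, and the two sample messages lines are made up
for illustration; entries in a real messages file may look different.

```shell
#!/bin/sh
# Sketch only: parse_subject and find_reason are hypothetical helper names.

# Print "<job id> <condition>" for a mail subject, or nothing if the
# subject starts with neither "Job" nor "Job-array".
parse_subject() {
    first=$(echo "$1" | cut -d " " -f 1)
    case "$first" in
        Job)       echo "$1" | cut -d " " -f 2,4 ;;
        Job-array) echo "$1" | cut -d " " -f 3,5 ;;
    esac
}

# Print the first "exceed" entry for job $1 found in messages file $2.
find_reason() {
    grep -E "[|]job $1([.][[:digit:]]+)? exceed" "$2" | head -n 1
}

# Demo with a temporary file standing in for the messages file.
# Both lines below are invented sample data, not real Grid Engine output.
MESSAGES=$(mktemp)
cat > "$MESSAGES" <<'EOF'
09/11/2012 09:54:01|sample|compute-1-1|I|made-up line unrelated to the job
09/11/2012 09:56:02|sample|compute-1-1|W|job 3568.1 exceeded a made-up limit
EOF

# Split "<job id> <condition>" into $1 and $2.
set -- $(parse_subject "Job 3568 (TEST) Aborted")
echo "job=$1 condition=$2"
find_reason "$1" "$MESSAGES"
rm -f "$MESSAGES"
```

Keeping the two steps in separate functions makes it easy to check the
pattern against a copy of a real messages file before putting it into the
mail wrapper.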


-- Reuti


