Thanks Reuti.
I think this sends an additional email, correct? Is there an easy way to append or
check for "-m bea" in case users do not want the email?
Joseph
On 09/11/2012 11:21 AM, Reuti wrote:
Hi,
On 11.09.2012 at 19:10, Joseph Farran wrote:
Is there a way (hopefully an easy one) to have Grid Engine give an
informative message when a job has gone past a limit and been killed, for
example when a job exceeds the wall time limit?
When I get an email from Grid Engine about a job that has gone past its wall time
limit, it is not very informative:
Job 3568 (TEST) Aborted
Exit Status = 0
Signal = USR1
User = me
Queue = [email protected]
Host = compute-1-1.local
Start Time = 09/11/2012 09:54:01
End Time = 09/11/2012 09:56:02
CPU = 00:00:00
Max vmem = 124.145M
failed assumedly after job because:
job 3568.1 died through signal USR1 (10)
You can scan the messages file on the node and put the relevant lines in the
email in the mail-wrapper:
#!/bin/sh
#
# Distinguish between normal jobs and an array job.
# $2 is the mail subject, $3 the recipient.
#
case `echo "$2" | cut -d " " -f 1` in
    Job) JOB_ID=`echo "$2" | cut -d " " -f 2`
         CONDITION=`echo "$2" | cut -d " " -f 4` ;;
    Job-array) JOB_ID=`echo "$2" | cut -d " " -f 3`
               CONDITION=`echo "$2" | cut -d " " -f 5` ;;
    *) ;;
esac
if [ "$CONDITION" = "Aborted" ]; then
    if [ -f /var/spool/sge/$HOSTNAME/messages -a -r /var/spool/sge/$HOSTNAME/messages ]; then
        APPENDIX=`egrep "[|]job $JOB_ID([.][[:digit:]]+)? exceed" /var/spool/sge/$HOSTNAME/messages | head -n 1`
    fi
    if [ -z "$APPENDIX" ]; then
        APPENDIX="Unknown, no entry found in messages file on the master node of the job."
    fi
fi
#
# Now construct and send the email.
#
if [ -n "$APPENDIX" ]; then
    (cat; echo; echo "Reason for job abort:"; echo "$APPENDIX") | mail -s "$2" "$3"
else
    mail -s "$2" "$3"
fi
if [ -f /var/spool/sge/context/$JOB_ID -a -w /var/spool/sge/context/$JOB_ID ]; then
    rm -f /var/spool/sge/context/$JOB_ID
fi
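
To try the parsing and the messages lookup without sending any mail, something along these lines may help. It is only a sketch: the function names parse_subject and find_reason are made up for illustration, the array-job subject format is inferred from the field positions used in the wrapper, and the sample messages lines are hypothetical (the real wording in your execd messages file may differ, the pattern only relies on "job <id> exceed").

```shell
#!/bin/sh
# Sketch for dry-running the wrapper logic; nothing here sends mail.

# Print "<job id> <condition>" for a mail subject, using the same
# field positions as the wrapper's case statement.
parse_subject() {
    kind=$(echo "$1" | cut -d " " -f 1)
    case "$kind" in
        Job)       echo "$1" | cut -d " " -f 2,4 ;;
        Job-array) echo "$1" | cut -d " " -f 3,5 ;;
        *)         echo "unknown" ;;
    esac
}

# Print the first "exceed" line for job id $1 found in messages file $2.
find_reason() {
    grep -E "[|]job $1([.][[:digit:]]+)? exceed" "$2" | head -n 1
}

# Hypothetical sample data standing in for /var/spool/sge/$HOSTNAME/messages.
MESSAGES=$(mktemp)
cat > "$MESSAGES" <<'EOF'
09/11/2012 09:54:01|  main|compute-1-1|I|starting up
09/11/2012 09:56:02|  main|compute-1-1|W|job 3568.1 exceeded hard wallclock time - initiate terminate method
09/11/2012 09:56:03|  main|compute-1-1|W|job 35680.1 exceeded hard wallclock time - initiate terminate method
EOF

parse_subject "Job 3568 (TEST) Aborted"                # prints: 3568 Aborted
parse_subject "Job-array task 3568.1 (TEST) Aborted"   # prints: 3568.1 Aborted
find_reason 3568 "$MESSAGES"                           # prints only the 3568.1 line

rm -f "$MESSAGES"
```

Note that the space required before "exceed" keeps job 3568 from also matching job 35680 in the sample file.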
-- Reuti
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users