On 12.06.2012 at 15:59, Mazouzi wrote:
> Or define "h_core 0" in the queue definition to disable it by default.
>
> Hi
>
> In this case, I noticed that when the application (VASP) exceeds the
> requested h_vmem, SGE kills the job, and I can see this output from qacct:
>
> failed 100 : assumedly after job
>
> But nothing is returned to the user, such as "Segmentation fault" or
> "memory exceeded".
>
> Is there a way to show a custom message to the user when they exceed the
> requested h_vmem?

Yes, you will need a mail wrapper which looks into the messages file of the
exechost and scans it for an entry for this particular job. For a parallel job
this will work for the master node only (where the job script also ran).
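
For context: Grid Engine calls the program named by the "mailer" parameter of
the cluster configuration as "mailer -s <subject> <recipient>", with the
message body on standard input. That is why the script below takes the subject
from $2 and the recipient from $3. A minimal sketch of hooking the wrapper in,
assuming it is installed under the hypothetical path /usr/local/sge/mailer.sh
on every exec host:

```shell
# Show the mailer currently set in the global configuration.
qconf -sconf | grep mailer

# Open the global configuration in an editor and set the line
#   mailer   /usr/local/sge/mailer.sh
# (hypothetical path, adjust to where the wrapper actually lives).
qconf -mconf
```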

$ cat mailer.sh
#!/bin/sh
#
# Distinguish between normal jobs and an array job.
# $2 is the mail subject, $3 the recipient.
#
case `echo "$2" | cut -d " " -f 1` in
    Job)       JOB_ID=`echo "$2" | cut -d " " -f 2`
               CONDITION=`echo "$2" | cut -d " " -f 4` ;;
    Job-array) JOB_ID=`echo "$2" | cut -d " " -f 3`
               CONDITION=`echo "$2" | cut -d " " -f 5` ;;
    *)         ;;
esac
#
# Get the reason if the job was aborted.
#
MESSAGES=/var/spool/sge/$HOSTNAME/messages
if [ "$CONDITION" = "Aborted" ]; then
    if [ -f "$MESSAGES" ] && [ -r "$MESSAGES" ]; then
        APPENDIX=`grep -E "[|]job $JOB_ID([.][[:digit:]]+)? exceed" "$MESSAGES" | head -n 1`
    fi
    if [ -z "$APPENDIX" ]; then
        APPENDIX="Unknown, no entry found in messages file on the master node of the job."
    fi
fi
#
# Now construct and send the email.
#
if [ -n "$APPENDIX" ]; then
    (cat; echo; echo "Reason for job abort:"; echo "$APPENDIX") | mail -s "$2" "$3"
else
    mail -s "$2" "$3"
fi
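
To try the parsing outside of a running cluster, the two lookups can be pulled
into functions and fed with sample data. This is only a sketch: the subject
strings and the messages line are assumed examples, chosen to fit the field
positions and the pattern used in the script, and the function names
parse_subject and find_reason are made up for this illustration.

```shell
#!/bin/sh
# Sketch only: sample subjects and the sample messages line are assumptions.

# parse_subject SUBJECT
# Sets JOB_ID and CONDITION with the same field positions as mailer.sh.
parse_subject() {
    JOB_ID=
    CONDITION=
    case `echo "$1" | cut -d " " -f 1` in
        Job)       JOB_ID=`echo "$1" | cut -d " " -f 2`
                   CONDITION=`echo "$1" | cut -d " " -f 4` ;;
        Job-array) JOB_ID=`echo "$1" | cut -d " " -f 3`
                   CONDITION=`echo "$1" | cut -d " " -f 5` ;;
        *)         ;;
    esac
}

# find_reason JOB_ID FILE
# Prints the first "exceed" entry for the job found in FILE, if any.
find_reason() {
    grep -E "[|]job $1([.][[:digit:]]+)? exceed" "$2" | head -n 1
}

# Assumed one-line stand-in for the messages file of an exec host.
SAMPLE=`mktemp`
cat > "$SAMPLE" <<'EOF'
06/12/2012 15:40:02|  main|node01|W|job 4711.1 exceeds job hard limit "h_vmem" of queue "all.q@node01" (1200000000.00000 > limit:1073741824.00000) - sending SIGKILL
EOF

parse_subject "Job 4711 (vasp_run) Aborted"
echo "$JOB_ID $CONDITION"          # prints: 4711 Aborted
find_reason "$JOB_ID" "$SAMPLE"    # prints the sample line above

parse_subject "Job-array task 4711.3 (vasp_run) Aborted"
echo "$JOB_ID $CONDITION"          # prints: 4711.3 Aborted

rm -f "$SAMPLE"
```

Note that for an array job JOB_ID already carries the task number, so the
optional "([.][[:digit:]]+)?" part of the pattern only matters for normal jobs.
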
-- Reuti