On 12.06.2012 at 15:59, Mazouzi wrote:

> Or define "h_core 0" in the queue definition to disable it by default.
> 
> Hi  
> 
> In this case, I noticed that when the application (VASP) exceeds the
> requested h_vmem, SGE kills the job, and I can see this output from qacct:
> 
>   failed       100 : assumedly after job
> 
> But nothing is returned to the user, such as "Segmentation fault" or
> "memory exceeded".
> 
> Is there a way to show a custom message to the user when they exceed the 
> requested h_vmem?


Yes, you will need a mail wrapper that looks into the messages file of the 
exechost and scans for an entry for this particular job. For a parallel job 
this works only on the master node (where the job script also ran).
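
For context, a rough sketch of how such a wrapper is usually put into service: 
SGE calls whatever program is named in the "mailer" entry of the cluster 
configuration as "mailer -s <subject> <recipient>", which is why the script 
reads the subject from $2 and the recipient from $3. The install path, mail 
address and job script name here are placeholders, not taken from this thread.

```shell
# Show the mail program currently configured in the global configuration.
qconf -sconf | grep mailer

# Open the configuration in an editor and point the mailer entry at the
# wrapper, for example (hypothetical install path):
#   mailer   /usr/local/bin/mailer.sh
qconf -mconf

# Request mail on abort at submission time (placeholder address and script).
qsub -m a -M user@example.com job.sh
```

The wrapper must be readable and executable on every exechost that may send 
mail for a job.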

$ cat mailer.sh 
#!/bin/sh

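#
# SGE calls the mailer as: mailer.sh -s <subject> <recipient>
# so $2 holds the subject and $3 the recipient.
# Distinguish between normal jobs and an array job.
#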

case `echo "$2" | cut -d " " -f 1` in

      Job) JOB_ID=`echo "$2" | cut -d " " -f 2`
           CONDITION=`echo "$2" | cut -d " " -f 4` ;;

Job-array) JOB_ID=`echo "$2" | cut -d " " -f 3`
           CONDITION=`echo "$2" | cut -d " " -f 5` ;;

        *) ;;

esac

#
# Get the reason in case of an abortion of the job.
#

if [ "$CONDITION" = "Aborted" ]; then
    MESSAGES=/var/spool/sge/$HOSTNAME/messages

    if [ -f "$MESSAGES" -a -r "$MESSAGES" ]; then
        APPENDIX=`egrep "[|]job $JOB_ID([.][[:digit:]]+)? exceed" "$MESSAGES" | head -n 1`
    fi

    if [ -z "$APPENDIX" ]; then
        APPENDIX="Unknown, no entry found in messages file on the master node of the job."
    fi
fi

#
# Now construct and send the email.
#
 
if [ -n "$APPENDIX" ]; then
    (cat; echo; echo "Reason for job abort:"; echo "$APPENDIX") | mail -s "$2" "$3"
else
    mail -s "$2" "$3"
fi
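
Before installing the wrapper, the parsing and the lookup can be tried 
offline. The following is only a sketch: the subject strings follow the field 
positions the script above assumes, and the messages lines are made-up samples 
rather than real execd output. The helper names parse_subject and find_reason 
are introduced here for illustration only.

```shell
#!/bin/sh
# Offline check of the subject parsing and the messages-file lookup.
# All subjects and log lines below are invented samples.

# Split a mail subject into JOB_ID and CONDITION, mirroring the case
# statement in mailer.sh.
parse_subject() {
    JOB_ID=""
    CONDITION=""
    case $(echo "$1" | cut -d " " -f 1) in
        Job)       JOB_ID=$(echo "$1" | cut -d " " -f 2)
                   CONDITION=$(echo "$1" | cut -d " " -f 4) ;;
        Job-array) JOB_ID=$(echo "$1" | cut -d " " -f 3)
                   CONDITION=$(echo "$1" | cut -d " " -f 5) ;;
    esac
}

# Print the first "exceed" entry for job $1 found in messages file $2.
find_reason() {
    grep -E "[|]job $1([.][[:digit:]]+)? exceed" "$2" | head -n 1
}

# Build a small sample messages file in a temporary location.
MESSAGES=$(mktemp)
cat > "$MESSAGES" <<'EOF'
06/12/2012 15:40:01|  main|node01|I|starting up (sample line)
06/12/2012 15:58:47|  main|node01|W|job 4711 exceeds job hard limit "h_vmem" (sample line)
EOF

# Plain job: ID in field 2, condition in field 4.
parse_subject "Job 4711 (vasp_run) Aborted"
echo "job=$JOB_ID condition=$CONDITION"
REASON=$(find_reason "$JOB_ID" "$MESSAGES")
echo "reason=$REASON"

# Array task: ID in field 3, condition in field 5.
parse_subject "Job-array task 4712.3 (vasp_sweep) Aborted"
echo "job=$JOB_ID condition=$CONDITION"

rm -f "$MESSAGES"
```

If the real subjects or log lines on your cluster look different, the cut 
field numbers and the grep pattern are the two places to adjust.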


-- Reuti
