Greetings,

We have the terminate_method for our queue set to SIGTERM, so that when a job running the following submission script is killed, it copies all generated files back to the original directory. The signal is indeed caught and the copy-back starts, but the job often dies a short time later, before the copy has completed.

# BEGIN SCRIPT
#============

# standard gridengine script with automatic copying back of data
#$ -S /bin/bash
#$ -N grid_job
#$ -pe singlenode 16
#$ -cwd
#$ -j y
#$ -R y -r n
# set up scratch directory
WORK=/scratch/${USER}/WORK/${JOB_ID}
ORIG=$PWD
function setup_workdir() {
    echo "-- [$(date)] setting up $WORK"
    mkdir -p $WORK
    test -d $WORK || { echo "EE ERROR: Failed to make tmpdir"; exit 1; }
    cp $TPR $DEFFNM.cpt $DEFFNM.xtc $DEFFNM.trr $DEFFNM.edr $DEFFNM.log $WORK
    copy_success="True"
}
function cleanup_exit() {
    # ensure that we don't overwrite complete files with partial ones if job
    # killed mid-copy
    echo "-- [$(date)] cleaning up: $WORK --> $ORIG"
    cp $WORK/* $ORIG || { echo "EE ERROR: Did not copy $WORK --- check manually!"; exit 1; }
    cd $ORIG
    rm -r $WORK
    exit 0
}
# make sure that killing the job copies back everything; won't copy back if job
# killed while copying to workstation (a good thing!)
# (GE must be configured to use SIGTERM for killing jobs!)
trap cleanup_exit TERM

setup_workdir
cd $WORK || { echo "EE ERROR: failed to cd $WORK"; exit 2; }

# MAIN COMPUTATION RUNS HERE

cleanup_exit


#============
# END SCRIPT

What is happening here? Is a second SIGTERM sent by gridengine after some time? If so, what is the best way to ensure this copy-back completes on qdel?

As a note, I have tried sending SIGTERM as a notification instead, and setting the `notify` queue configuration key to 24:00:00 (basically, REALLY LONG). This seems to work in some of my tests, but it has failed in actual use when copying back large data files.
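
To make the failure mode concrete, here is a minimal, self-contained sketch of a handler that ignores repeated catchable signals once the copy-back has started. It is only an illustration under assumptions, not a confirmed fix: the demo directories, the marker files, the `wait_for` helper and the `sleep` stand-ins are all hypothetical, and USR1/USR2 are included on the assumption that the job is submitted with `qsub -notify`. It cannot protect against SIGKILL, which cannot be trapped or ignored.

```shell
#!/bin/bash
# Demo only: every path and file name below is a hypothetical stand-in.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/work" "$demo_dir/orig" "$demo_dir/flags"
echo "partial results" > "$demo_dir/work/result.txt"

# Miniature "job" whose handler ignores repeat signals while copying back.
cat > "$demo_dir/job.sh" <<'EOF'
#!/bin/bash
WORK=$1
ORIG=$2
FLAGS=$3
function cleanup_exit() {
    # From here on, ignore further catchable signals; child processes such
    # as cp inherit the ignored disposition. SIGKILL cannot be ignored.
    trap '' TERM USR1 USR2
    touch "$FLAGS/copying"        # marker: copy-back has started
    sleep 3                       # stand-in for a slow copy of large files
    cp "$WORK"/* "$ORIG" || exit 1
    exit 0
}
trap cleanup_exit TERM USR2
touch "$FLAGS/ready"              # marker: trap is installed
while true; do sleep 1; done      # stand-in for the main computation
EOF

# Poll until a marker file exists.
wait_for() { until [ -e "$1" ]; do sleep 1; done; }

bash "$demo_dir/job.sh" "$demo_dir/work" "$demo_dir/orig" "$demo_dir/flags" &
pid=$!

wait_for "$demo_dir/flags/ready"
kill -TERM "$pid"                 # first signal: starts the copy-back
wait_for "$demo_dir/flags/copying"
kill -TERM "$pid"                 # second signal: arrives mid-copy, ignored
wait "$pid"
status=$?

echo "job exit status: $status"
ls "$demo_dir/orig"
```

If the copy-back still dies with a handler like this in place, that would suggest the second signal is one that cannot be caught (SIGKILL), rather than a repeated SIGTERM.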

David

--
David L. Dotson
Center for Biological Physics
Arizona State University

Email:[email protected]

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
