On 10/10/2012 03:49 PM, Orion Poplawski wrote:

The next thing I'd like to tackle is where to set using the dmtcp_starter
script.  Ideally this would be set in the dmtcp checkpointing configuration
file, but there is no current starter_method/command setting there.  Perhaps
worth an RFE?  Otherwise I may be stuck doing it on a per queue basis or
globally which presents cleanup issues - although it looks like I can use
SGE_CKPT_ENV=dmtcp to see if dmtcp checkpointing has been called for.  Looks
like the thing to do for now.

Okay, this uses SGE_CKPT_ENV:

#!/bin/bash
# dmtcp_starter - dmtcp job starter - runs jobs under dmtcp checkpointing

# starter_methods need to be installed per queue, but we only
# want to setup dmtcp checkpointing for jobs that use it
if [ "$SGE_CKPT_ENV" = dmtcp ]
then
  # Get the base from the config file
  eval `grep ^ckpt_dir= $SGE_JOB_SPOOL_DIR/config`

  # Make the per task checkpoint directory if it doesn't already exist
  CKPTDIR=${ckpt_dir}/${JOB_ID}.${SGE_TASK_ID/undefined/1}
  mkdir -p $CKPTDIR

  # Setup dmtcp_coordinator - this will get killed by the shepherd
export DMTCP_PORT=`dmtcp_coordinator --port 0 --ckptdir $CKPTDIR --exit-on-last --interval 0 --background 2>&1 | grep "Port:" | /bin/sed -e 's/Port://g' -e 's/[ \t]//g'`

  # Record the port for later use by checkpointing scripts
  echo $DMTCP_PORT > $TMPDIR/dmtcp_port

  if [ "$RESTARTED" -eq 2 ]
  then
    # Override the setting in dmtcp_restart_script.sh
    export DMTCP_HOST=$HOSTNAME
    # Restart the job
    exec $CKPTDIR/dmtcp_restart_script.sh
  else
    # We need to move the job script to remove the hostname from the path
    cp $1 $CKPTDIR/jobscript
    shift
    # Start the job (TODO - be able to set the argv[0] for login shell)
    exec dmtcp_checkpoint --quiet $SGE_STARTER_SHELL_PATH $CKPTDIR/jobscript 
"$@"
  fi
else
  # Start the job normally with proper login shell handling
  if [ "$SGE_STARTER_USE_LOGIN_SHELL" == true ]
  then
    shellname=$(basename $SGE_STARTER_SHELL_PATH)
    exec -a -${shellname} $SGE_STARTER_SHELL_PATH "$@"
  else
    exec $SGE_STARTER_SHELL_PATH "$@"
  fi
fi


I've set up a repository here:

https://github.com/opoplawski/gridengine_dmtcp

This is all very raw, hot off the press, but I hope to do more testing in the coming days.

--
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA, Boulder Office                  FAX: 303-415-9702
3380 Mitchell Lane                       [email protected]
Boulder, CO 80301                   http://www.nwra.com
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users

Reply via email to