There is another possibility that we use with some of our ISVs on site.

You can build a small application that gets the remaining time
periodically in the batch script and triggers the applicative
checkpoint when not enough time is left to continue. SLURM lets
you get the remaining time using a dedicated call in
slurm.h/libslurm : slurm_get_rem_time().

Here is an example of how to get the remaining time in a job:

[hautreuxm@leaf sandbox]$ cat tremain.c
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <stdint.h>
#include <slurm/slurm.h>

int main(int argc, char** argv) {

        char * jobid_str;
        char * end;
        long jobid_l;
        uint32_t jobid;
        long rem;

        /* SLURM exports the job ID into the environment of batch jobs */
        jobid_str = getenv("SLURM_JOBID");
        if ( jobid_str == NULL ) {
                fprintf(stderr,"Not a SLURM job\n");
                exit(1);
        }

        errno = 0;
        jobid_l = strtol(jobid_str,&end,10);
        if ( errno != 0 || end == jobid_str || jobid_l <= 0 ) {
                fprintf(stderr,"Invalid SLURM job ID : %s\n",
                        jobid_str);
                exit(2);
        }
        jobid = (uint32_t) jobid_l;

        /* a single RPC to the slurm controller: seconds left, -1 on error */
        rem = slurm_get_rem_time(jobid);
        if ( rem == -1 ) {
                fprintf(stderr,"Unable to get remaining time\n");
                exit(3);
        }
        else {
                fprintf(stdout,"%li\n",rem);
                exit(0);
        }
}
[hautreuxm@leaf sandbox]$ gcc -Wall -o tremain tremain.c -lslurm
[hautreuxm@leaf sandbox]$ srun -t 1 ./tremain
58
[hautreuxm@leaf sandbox]$

You could start a background process in your batch job that
periodically checks the remaining time using this tremain app and
touches the stop file when the returned value drops below the time
required for a clean checkpoint plus a safety margin.
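For example, here is a minimal sketch of such a watchdog for the batch
script. The checkpoint time and safety margin are assumed values to
adapt; stop_scf matches the stop file used in the script quoted below:

CKPT_SECS=300      # assumed time needed for a clean applicative checkpoint
MARGIN_SECS=60     # assumed safety margin

(
  while rem=$(./tremain); do
        if [ "$rem" -lt $((CKPT_SECS + MARGIN_SECS)) ]; then
                touch stop_scf   # ask the application to checkpoint and stop
                break
        fi
        sleep 60                 # poll slowly, see the PS below
  done
) &
WATCHDOG_PID=$!

# ... launch the application here ...

kill $WATCHDOG_PID 2>/dev/null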

HTH
Matthieu

PS: do not use this kind of logic in parallel, as each call involves an
RPC to the slurm controller and could generate a very large load if
you run a huge parallel application. That is why you should restrict
this helper task to the batch script only and do the test every minute
or every two minutes...


2011/10/17 Moe Jette <je...@schedmd.com>:
> This is from the scancel man page:
>
>> -b, --batch
>>       Signal the batch job shell and its child processes.
>
> You have a few options to do what you want:
> * Pick a signal that does not cause problems for any of the child processes
> (perhaps SIGUSR1 or SIGUSR2)
> * Write a checkpoint/intel_mpi plugin that creates your empty file and
> integrates with SLURM's checkpoint logic
> * Hack the SLURM code so that under specific conditions it only signals the
> parent process; this could break various other functions, so proceed with
> caution
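
A minimal sketch of the first option, assuming the application watches
for a stop_scf file as in the script quoted below (the signal choice is
only an example):

    trap 'touch stop_scf' SIGUSR1

    # then, from outside the job:
    #   scancel --batch --signal=USR1 JOBID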
>
>
> Quoting Domingos <ddc...@gmail.com>:
>
>> Dear community,
>>
>> I am trying to design a batch script that launches a parallel job with
>> mpirun (the Intel MPI version I'm using does not have a PMI interface,
>> so I can't launch via srun). The application I'm using offers a feature
>> to stop the calculation smoothly with proper checkpointing. Basically,
>> I have to write an empty file in the working directory; the application
>> detects it and then takes the proper action to abort cleanly. I thought
>> of designing a script that traps a signal sent via scancel, for example
>> scancel --batch -s TERM JOBID, but unfortunately, in my particular
>> case, slurm sends the signal to the child processes too. So all the
>> child MPI processes seem to be killed or stopped externally by slurm
>> instead of letting my job script do it.
>> Can anybody point me to the right track?
>>
>> The version of slurm I am using was packaged by BULL, v2.0.5, and
>> below I include a sketch of my job script.
>>
>> Thanks,
>> Domingos
>>
>> --------------------------------------------------------------
>>    #!/bin/bash
>>    #
>>    #SBATCH -o Si_liquid-%N-%j.out
>>    #SBATCH -J Si_liquid
>>    #SBATCH --ntasks=8
>>    #SBATCH --nodes=1
>>    #SBATCH --cpus-per-task=1
>>
>>    source /opt/intel/Compiler/11.1/069/bin/iccvars.sh intel64
>>    source /opt/intel/Compiler/11.1/069/bin/ifortvars.sh intel64
>>    source /opt/intel/impi/4.0.0.028/intel64/bin/mpivars.sh
>>    export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
>>    export I_MPI_FABRICS=shm:dapl
>>
>>    ulimit -s unlimited
>>    ulimit -a
>>
>>    ...
>>    ...
>>
>>    stagein()
>>    {
>>      ...
>>      ...
>>    }
>>
>>    stageout()
>>    {
>>      ...
>>      ...
>>    }
>>
>>    early()
>>    {
>>        echo ' '
>>        echo ' ############ WARNING:  EARLY TERMINATION #############'
>>        echo ' '
>>
>>        touch stop_scf
>>        sleep 120
>>        # and so parsec does a clean kill ...
>>    }
>>    trap 'early; stageout' SIGTERM
>>
>>    stagein
>>    #-------
>>    HOSTFILE=/tmp/hosts.$SLURM_JOB_ID
>>    srun hostname -s | sort -u > ${HOSTFILE}
>>
>>    mpdboot -n ${SLURM_NNODES} -f ${HOSTFILE} -r ssh
>>    mpdtrace -l
>>    mpiexec -np ${SLURM_NPROCS} ./${EXEC_BIN}
>>    mpdallexit
>>    #-------
>>    stageout
>>
>>    exit
>> --------------------------------------------------------------
>>
