You have several options:
1. Epilog: a script automatically executed on job completion
http://slurm.schedmd.com/prolog_epilog.html
You would probably want to set EpilogSlurmctld in slurm.conf, which is
executed by slurmctld on the head node, rather than the regular
epilog executed by slurmd on each node.
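A minimal sketch (the script path, log path, and "myanalysis" command
are placeholders of my own, not anything Slurm ships):

In slurm.conf:
EpilogSlurmctld=/etc/slurm/job_finished.sh

/etc/slurm/job_finished.sh:
#!/bin/bash
# Runs on the head node (as SlurmUser) whenever a job completes;
# slurmctld exports SLURM_JOB_ID and related variables into the
# script's environment.
/usr/local/bin/myanalysis "${SLURM_JOB_ID}" >> /var/log/slurm/analysis.log 2>&1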
2. Add the analysis as a dependent job:
From the sbatch man page (http://slurm.schedmd.com/sbatch.html):
--dependency=<dependency_list>
    Defer the start of this job until the specified dependencies have
    been satisfied. <dependency_list> is of the form
    <type:job_id[:job_id][,type:job_id[:job_id]]>. Many jobs can share
    the same dependency and these jobs may even belong to different
    users. The value may be changed after job submission using the
    scontrol command.
    after:job_id[:jobid...]
        This job can begin execution after the specified jobs have
        begun execution.
    afterany:job_id[:jobid...]
        This job can begin execution after the specified jobs have
        terminated.
    afternotok:job_id[:jobid...]
        This job can begin execution after the specified jobs have
        terminated in some failed state (non-zero exit code, node
        failure, timed out, etc).
    afterok:job_id[:jobid...]
        This job can begin execution after the specified jobs have
        successfully executed (ran to completion with an exit code
        of zero).
    expand:job_id
        Resources allocated to this job should be used to expand the
        specified job. The job to expand must share the same QOS
        (Quality of Service) and partition. Gang scheduling of
        resources in the partition is also not supported.
    singleton
        This job can begin execution after any previously launched
        jobs sharing the same job name and user have terminated.
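For example, to submit a job and its analysis as a chained pair (the
script names are placeholders; sbatch prints "Submitted batch job <id>",
hence the awk):

#!/bin/bash
# Submit the main job and capture its job ID.
JOB_ID=$(sbatch myapp.sbatch | awk '{print $4}')
# The analysis stays pending until the job exits with code zero.
sbatch --dependency=afterok:${JOB_ID} myanalysis.sbatch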
3. Using strigger:
http://slurm.schedmd.com/strigger.html
Set a trigger on job completion with strigger --set --fini ...
From the strigger man page:
-f, --fini
    Trigger an event when the specified job completes execution.
-j, --jobid=<id>
    Job ID of interest. NOTE: The --jobid option cannot be used in
    conjunction with the --node option. When the --jobid option is
    used in conjunction with the --up or --down option, all nodes
    allocated to that job will be considered the nodes used as a
    trigger event.
-p, --program=<path>
    Execute the program at the specified fully qualified pathname when
    the event occurs. You may quote the path and include extra program
    arguments if desired. The program will be executed as the user who
    sets the trigger. If the program fails to terminate within 5
    minutes, it will be killed along with any spawned processes.
--set
    Register an event trigger based upon the supplied options. NOTE: An
    event is only triggered once. A new event trigger must be
    established for future events of the same type to be processed.
    Triggers can only be set if the command is run by the user
    SlurmUser unless SlurmUser is configured as user root.
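Putting those options together, a trigger for a single job might look
like this (the job ID and program path are placeholders; note the
SlurmUser restriction above):

strigger --set --jobid=12345 --fini --program=/usr/local/bin/myanalysis.sh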
Depending on your setup, #3 may be the least desirable option because:
1. the trigger is not immediate, but only fires after polling (which
might not be a concern)
2. --set is one-time only - you will have to create a new trigger each
time you want this functionality (though this can be automated)
In my opinion, point 2 marks the biggest difference between #1 and #2
for your particular use case.
If you always want to trigger analysis after each job, #1 is probably
the most suitable: define it once and forget it.
#2 is more suitable if only some of the jobs require analysis: create
your own sbatch template for starting a job+analysis pair, and use a
different sbatch template (job only) otherwise.
A simplified example of such a template (steps in a batch script run
sequentially, so no dependency is needed within a single job; '&&'
skips the analysis if the app fails, the equivalent of afterok):
#!/bin/bash
srun myapp && srun myanalysis
####
The other issue is default resource allocation: with #2 (unless you
modify the analysis step), the analysis will use the same allocation
as the app.
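If the analysis needs less than the full allocation, you can restrict
its step, e.g. (a sketch; the values are purely illustrative):

srun --ntasks=1 myanalysis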
On 05/13/2015 10:39 PM, Franco Broi wrote:
Try strigger.
On 14 May 2015 3:23 am, Trevor Gale <[email protected]> wrote:
No, I haven’t. What is epilog?
Thanks,
Trevor
> On May 13, 2015, at 3:21 PM, Daniel Letai <[email protected]> wrote:
>
>
> Have you looked into epilog as a means to start your analysis
> automatically?
>
> On 05/13/2015 05:33 PM, Trevor Gale wrote:
>> Hey all,
>>
>> I was just wondering if there is any mechanism built into slurm
>> to signal to the user when jobs are done (other than email). I’m
>> making a script to run a series of jobs and want to run some
>> analysis on the results after the jobs return. I was wondering if
>> there is a way to signal that a job submission has ended so my
>> program can just start the analysis and not have to have the
>> analysis executed separately.
>>
>> Thanks,
>> Trevor