Hello,

I'm trying to understand what happens when a job is submitted.

On the one hand, I've read the *Job Launch Design Guide* (
https://computing.llnl.gov/linux/slurm/job_launch.html) and I've found that
there are five points:

   - Job allocation
   - Step allocation
   - Task allocation
   - Job Step termination
   - Job termination


On the other hand, when we submit a job, for instance:

> srun --gres=gpu:1  sleep 5

we see the following output in our log file:

> sched: _slurm_rpc_allocate_resources JobId=238 NodeList=matraca1 usec=244
> debug:  Configuration for job 238 complete
> debug:  laying out the 1 tasks on 1 hosts matraca1 dist 1
> sched: _slurm_rpc_job_step_create: StepId=238.0 matraca1 usec=456
> debug:  Processing RPC: REQUEST_STEP_COMPLETE for 238.0 nodes 0-0 rc=0 uid=0
> sched: _slurm_rpc_step_complete StepId=238.0 usec=48
> completing job 238
> sched: job_complete for JobId=238 successful
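
In case it helps with the comparison, here is a rough Python sketch of how I am
grouping those lines by the job or step ID they mention. The regular
expressions and the helper names (`classify`, `group_lines`) are my own
assumptions, based only on the lines quoted above and not on any documented
log format, and the sketch does not try to map lines to the five points of the
guide.

```python
import re
from collections import OrderedDict

# The log lines quoted above (the wrapped "uid=0" line is re-joined).
LOG = """\
sched: _slurm_rpc_allocate_resources JobId=238 NodeList=matraca1 usec=244
debug:  Configuration for job 238 complete
debug:  laying out the 1 tasks on 1 hosts matraca1 dist 1
sched: _slurm_rpc_job_step_create: StepId=238.0 matraca1 usec=456
debug:  Processing RPC: REQUEST_STEP_COMPLETE for 238.0 nodes 0-0 rc=0 uid=0
sched: _slurm_rpc_step_complete StepId=238.0 usec=48
completing job 238
sched: job_complete for JobId=238 successful
"""

# Assumed patterns: a step ID looks like "<job>.<step>", a job ID follows
# "JobId=" or the word "job". These are guesses from the sample only.
STEP_RE = re.compile(r"\b(\d+)\.(\d+)\b")
JOB_RE = re.compile(r"(?:JobId=|job )(\d+)\b")


def classify(line):
    """Return 'step J.S', 'job J', or 'unknown' for one log line."""
    step = STEP_RE.search(line)
    if step:
        return "step " + step.group(0)
    job = JOB_RE.search(line)
    if job:
        return "job " + job.group(1)
    return "unknown"


def group_lines(log):
    """Group log lines by the ID they mention, keeping first-seen order."""
    groups = OrderedDict()
    for line in log.splitlines():
        if line.strip():
            groups.setdefault(classify(line), []).append(line)
    return groups


if __name__ == "__main__":
    for key, lines in group_lines(LOG).items():
        print(key)
        for line in lines:
            print("    " + line)
```

With this grouping, the "laying out the 1 tasks" line ends up under `unknown`
because it carries no ID, which is one of the lines I cannot place.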


I believe the two are related. Can anybody tell me which lines of the log
correspond to each point of the job launch process?

I'd also like to know what role job steps play in an execution, and how
they work.

Thank you,
    Regards!
-- 
*Sergio Iserte Agut, assistant researcher,*
*High Performance Computing & Architecture, University Jaume I (Spain)*
