Great!
You've done a good job!
Thank you!

2012/6/6 Moe Jette <[email protected]>

>
> Most of the RPCs are logged using debug2() messages, so increasing the
> SlurmctldDebug value by one should make this much more clear and you
> should see something like this:
>
> Job allocation:
> slurmctld: debug2: sched: Processing RPC: REQUEST_RESOURCE_ALLOCATION from uid=1001
> slurmctld: debug2: found 4 usable nodes from config containing smd[1-4]
> slurmctld: debug2: sched: JobId=1267485 allocated resources: NodeList=smd1
> slurmctld: sched: _slurm_rpc_allocate_resources JobId=1267485 NodeList=smd1 usec=198
> slurmctld: debug2: _slurm_rpc_job_ready(1267485)=3 usec=3
>
> Step allocation:
> slurmctld: debug2: Processing RPC: REQUEST_JOB_STEP_CREATE from uid=1001
> slurmctld: debug:  Configuration for job 1267485 complete
>
> Task layout (done as part of step creation):
> slurmctld: debug:  laying out the 1 tasks on 1 hosts smd1 dist 1
> slurmctld: sched: _slurm_rpc_job_step_create: StepId=1267485.0 smd1 usec=224
>
> Step termination:
> slurmctld: debug:  Processing RPC: REQUEST_STEP_COMPLETE for 1267485.0 nodes 0-0 rc=0 uid=0
> slurmctld: sched: _slurm_rpc_step_complete StepId=1267485.0 usec=56
>
> Job termination:
> slurmctld: debug2: Processing RPC: REQUEST_COMPLETE_JOB_ALLOCATION from uid=1001, JobId=1267485 rc=0
> slurmctld: completing job 1267485
> slurmctld: debug2: Spawning RPC agent for msg_type 7004
> slurmctld: debug2: Spawning RPC agent for msg_type 6011
> slurmctld: sched: job_complete for JobId=1267485 successful
> slurmctld: debug2: _slurm_rpc_complete_job_allocation JobId=1267485 usec=270
> slurmctld: debug2: got 1 threads to send out
> slurmctld: debug2: got 1 threads to send out
> slurmctld: debug2: Tree head got back 0 looking for 1
> slurmctld: debug2: Tree head got back 1
> slurmctld: debug2: Tree head got them all
> slurmctld: debug2: node_did_resp smd1
> slurmctld: debug2: Processing RPC: MESSAGE_EPILOG_COMPLETE uid=0
> slurmctld: debug2: _slurm_rpc_epilog_complete JobId=1267485 Node=smd1 usec=45
>
>
> Quoting Sergio Iserte Agut <[email protected]>:
>
> > Hello,
> >
> > I'm trying to understand what happens when a job is submitted.
> >
> > On the one hand, I've read the *Job Launch Design Guide* (
> > https://computing.llnl.gov/linux/slurm/job_launch.html) and I've found that
> > there are five points:
> >
> >    - Job allocation
> >    - Step allocation
> >    - Task allocation
> >    - Job Step termination
> >    - Job termination
> >
> >
> > On the other hand, when we submit a job, for instance:
> >
> >> srun --gres=gpu:1  sleep 5
> >
> > we are able to see the next output in our log file:
> >
> >> sched: _slurm_rpc_allocate_resources JobId=238 NodeList=matraca1 usec=244
> >> debug:  Configuration for job 238 complete
> >> debug:  laying out the 1 tasks on 1 hosts matraca1 dist 1
> >> sched: _slurm_rpc_job_step_create: StepId=238.0 matraca1 usec=456
> >> debug:  Processing RPC: REQUEST_STEP_COMPLETE for 238.0 nodes 0-0 rc=0 uid=0
> >> sched: _slurm_rpc_step_complete StepId=238.0 usec=48
> >> completing job 238
> >> sched: job_complete for JobId=238 successful
> >
> >
> > I believe they are related, but can anybody say which lines from the
> > debug file correspond to each point of the job launch process?
> >
> > Moreover, I'd like to know what the role of job steps is in an
> > execution (or how they work).
> >
> > Thank you,
> >     Regards!
> > --
> > *Sergio Iserte Agut, assistant researcher,*
> > *High Performance Computing & Architecture, University Jaume I (Spain)*
> >
>
>
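
The grouping in the quoted debug2 excerpt can be applied to any slurmctld log by matching on the RPC and function names that appear in it. The following is a minimal Python sketch of that idea, not part of Slurm itself: the names PHASE_MARKERS and classify are hypothetical, the marker strings are copied from the log lines quoted above, and the assignment of each marker to a phase simply follows the headings used in that excerpt. Other Slurm versions may word these messages differently.

```python
# Marker substrings copied from the quoted slurmctld excerpt. The phase each
# one is filed under follows the headings in that excerpt (an assumption,
# not an official Slurm mapping).
PHASE_MARKERS = [
    ("Job allocation", (
        "REQUEST_RESOURCE_ALLOCATION",
        "_slurm_rpc_allocate_resources",
        "_slurm_rpc_job_ready",
    )),
    ("Step allocation", (
        "REQUEST_JOB_STEP_CREATE",
        "Configuration for job",
    )),
    ("Task layout", (
        "laying out the",
        "_slurm_rpc_job_step_create",
    )),
    ("Step termination", (
        "REQUEST_STEP_COMPLETE",
        "_slurm_rpc_step_complete",
    )),
    ("Job termination", (
        "REQUEST_COMPLETE_JOB_ALLOCATION",
        "completing job",
        "job_complete for",
        "_slurm_rpc_complete_job_allocation",
        "MESSAGE_EPILOG_COMPLETE",
        "_slurm_rpc_epilog_complete",
    )),
]


def classify(line):
    """Return the launch phase a log line belongs to, or None if unknown."""
    for phase, markers in PHASE_MARKERS:
        # Case-sensitive substring match, so REQUEST_* names and
        # _slurm_rpc_* function names never collide.
        if any(marker in line for marker in markers):
            return phase
    return None


if __name__ == "__main__":
    # Lines from the original question (JobId=238).
    sample = [
        "sched: _slurm_rpc_allocate_resources JobId=238 NodeList=matraca1 usec=244",
        "debug:  Configuration for job 238 complete",
        "debug:  laying out the 1 tasks on 1 hosts matraca1 dist 1",
        "sched: _slurm_rpc_job_step_create: StepId=238.0 matraca1 usec=456",
        "debug:  Processing RPC: REQUEST_STEP_COMPLETE for 238.0 nodes 0-0 rc=0 uid=0",
        "sched: _slurm_rpc_step_complete StepId=238.0 usec=48",
        "completing job 238",
        "sched: job_complete for JobId=238 successful",
    ]
    for entry in sample:
        print(f"{classify(entry) or 'unclassified'}: {entry}")
```

Lines that carry none of the markers, such as the "Tree head" messages, come back as None rather than being guessed into a phase.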


-- 
*Sergio Iserte Agut, assistant researcher,*
*High Performance Computing & Architecture, University Jaume I (Spain)*
