Sorry, I didn't notice you wanted real-time profiling.

As mentioned, you can use sstat; or, if you wish to probe each node for the resources used locally, you could run something like:


xdsh $(scontrol show hostnames $(squeue -h -j $SLURM_JOBID -o %N) | paste -sd,) \
    "ps -o pid,pcpu,rss,args -q \$(scontrol listpids $SLURM_JOBID | awk 'NR>1{print \$1}' | paste -sd,)"
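
(Note the escaped \$(...) so that scontrol listpids runs on each node rather than on the submit host.) If per-step aggregates are enough and you don't need per-process detail, sstat alone may do; a minimal sketch, assuming a JobAcctGatherType is configured and using field names from the sstat man page:

sstat -a -j $SLURM_JOBID --format=JobID,AveCPU,AveRSS,MaxRSS,MaxRSSNode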


On 09/19/2016 08:28 AM, Daniel Letai wrote:

You should use the HDF5 profiling plugin:

http://slurm.schedmd.com/hdf5_profile_user_guide.html
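
A rough sketch of the workflow (assuming AcctGatherProfileType=acct_gather_profile/hdf5 and ProfileHDF5Dir are set in slurm.conf; see the guide above for the details):

sbatch --profile=task job.sh
sh5util -j <jobid> -o job_profile.h5

With --profile=task, per-task CPU and memory samples are collected while the job runs; sh5util then merges the per-node HDF5 files so you can inspect the time series afterwards.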



On 09/19/2016 03:41 AM, Igor Yakushin wrote:
Hi All,

I'd like to be able to see, for a given jobid, how much CPU and RAM a job is currently using on each node it is running on. Is there a way to do that?

So far it looks like I have to script it: get the list of involved nodes using, for example, squeue or qstat, then ssh to each node and find all of the user's processes (though there is no guarantee they belong to the job I am interested in: is there a way to find the UNIX pids corresponding to a Slurm jobid?).
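
For the pid mapping, scontrol looks like it might do this when run on the node itself (apparently depending on the proctrack plugin in use), e.g.:

ssh <node> scontrol listpids <jobid>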

Another question: is there a Python API for Slurm? I found pyslurm, but so far it will not build against my version of Slurm.

Thank you,
Igor


