We use this script that we cobbled together:
https://github.com/BYUHPC/slurm-random/blob/master/rjobstat. It assumes
that you're using cgroups. It uses ssh to connect to each node, so it's
not very scalable, but it works well enough for us.
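For anyone scripting something similar themselves: when Slurm's cgroup plugins are in use, each job's usage counters live under a per-job cgroup on every node, so reading them is just reading files. A minimal sketch, assuming the cgroup v1 layout `/sys/fs/cgroup/<controller>/slurm/uid_<uid>/job_<jobid>/` that Slurm's cgroup plugins commonly create; the exact paths depend on your cgroup setup and Slurm configuration, so check on your own nodes first:

```python
import os

# ASSUMPTION: cgroup v1 hierarchy as laid out by Slurm's cgroup plugins;
# adjust CGROUP_ROOT and the path components to match your site.
CGROUP_ROOT = "/sys/fs/cgroup"

def job_cgroup_path(controller, uid, jobid):
    """Build the per-job cgroup directory for one controller
    (e.g. 'memory' or 'cpuacct') under the assumed layout."""
    return os.path.join(CGROUP_ROOT, controller,
                        "slurm", "uid_%d" % uid, "job_%d" % jobid)

def read_counter(path):
    """Read a single-value cgroup counter file and return it as an int."""
    with open(path) as f:
        return int(f.read().strip())

def job_memory_bytes(uid, jobid):
    """Current memory usage of a job on this node, in bytes
    (memory.usage_in_bytes is the cgroup v1 counter name)."""
    return read_counter(os.path.join(
        job_cgroup_path("memory", uid, jobid), "memory.usage_in_bytes"))
```

Run per node (over ssh, as rjobstat does) and you get the live per-node picture; summing across nodes gives the whole job.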
On 09/18/2016 06:42 PM, Igor Yakushin wrote:
How do I monitor CPU/RAM usage on each node of a Slurm job? Is there a Python API?
I'd like to be able to see, for a given jobid, how many resources the
job is currently using on each node it is running on. Is there a way
to do that?
So far it looks like I have to script it: get the list of the involved
nodes (using, for example, squeue or qstat), ssh to each node, and find
all the user's processes there. Even then it's not 100% guaranteed that
those processes belong to the job I'm interested in: is there a way to
find the UNIX PIDs corresponding to a Slurm jobid?
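On the PID question: Slurm's `scontrol listpids <jobid>`, run on a compute node, lists the PIDs belonging to that job on the local node (it needs a proctrack plugin that tracks PIDs, such as the cgroup one). Combined with `squeue -o %N` and `scontrol show hostnames` to expand the node list, the scripted approach looks roughly like this sketch; the `parse_listpids` helper assumes `listpids` prints a whitespace-separated table with a header row and the PID in the first column, which you should verify on your Slurm version:

```python
import subprocess

def job_nodes(jobid):
    """Expand a job's node list into individual hostnames.
    squeue -o %N may print a compressed list (e.g. node[01-03]);
    scontrol show hostnames expands it, one hostname per line."""
    nodelist = subprocess.check_output(
        ["squeue", "-h", "-j", str(jobid), "-o", "%N"]).decode().strip()
    out = subprocess.check_output(["scontrol", "show", "hostnames", nodelist])
    return out.decode().split()

def parse_listpids(text):
    """Parse `scontrol listpids <jobid>` output into a list of PIDs.
    ASSUMPTION: header row first, PID in the first column."""
    pids = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields:
            pids.append(int(fields[0]))
    return pids

def job_pids_on_node(host, jobid):
    """Run scontrol listpids on one node over ssh and return its PIDs."""
    out = subprocess.check_output(
        ["ssh", host, "scontrol", "listpids", str(jobid)])
    return parse_listpids(out.decode())
```

If accounting is configured with a jobacct_gather plugin, `sstat -j <jobid> --format=AveCPU,AveRSS,MaxRSS` also reports live usage for running jobs without any ssh at all, though only aggregated per step rather than per node.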
Another question: is there a Python API for Slurm? I found pyslurm,
but so far it won't build against my version of Slurm.