Would sstat give you what you need?  (http://slurm.schedmd.com/sstat.html)  It 
doesn't update instantaneously, but it does refresh at least a few times a minute.
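As a sketch of how you might poll it, sstat's -p flag emits pipe-delimited output that is easy to parse; the field names below come from the sstat docs, and the sample output is purely illustrative:

```python
# Parse pipe-delimited output like that produced by:
#   sstat -j <jobid> -p --format=JobID,AveCPU,AveRSS,MaxRSS
# The sample below is made up for illustration.
sample = """JobID|AveCPU|AveRSS|MaxRSS|
1234.0|00:05:12|1024K|2048K|
"""

def parse_sstat(text):
    """Turn sstat -p output into a list of dicts keyed by column name."""
    rows = [line.rstrip("|").split("|") for line in text.strip().splitlines()]
    header, records = rows[0], rows[1:]
    return [dict(zip(header, rec)) for rec in records]

for rec in parse_sstat(sample):
    print(rec["JobID"], rec["MaxRSS"])
```

In a real script you would replace the sample string with the output of a subprocess call to sstat, run on whatever polling interval you need.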

If you want to get fancy, I believe that xdmod can integrate with TACC-stats to 
provide graphs about what is happening inside a job but I'm not sure whether 
that updates in "real" time.

One of our summer interns created a custom ganglia interface that checked what 
nodes a job was running on and graphed several relevant variables selected from 
the ganglia RRD files for those nodes.  If you're interested in seeing that 
work, I can look into whether we can share it.

So there are some existing ways of going about this.


From: Igor Yakushin <igor.2...@gmail.com>
Reply-To: slurm-dev <slurm-dev@schedmd.com>
Date: Sunday, September 18, 2016 at 6:42 PM
To: slurm-dev <slurm-dev@schedmd.com>
Subject: [slurm-dev] how to monitor CPU/RAM usage on each node of a slurm job? 
python API?

Hi All,

I'd like to be able to see, for a given jobid, how many resources the job is 
currently using on each node it is running on. Is there a way to do that?

So far it looks like I have to script it: get the list of involved nodes 
using, for example, squeue or qstat, then ssh to each node and find all of the 
user's processes (though it is not 100% guaranteed that they belong to the job 
I am interested in: is there a way to find the UNIX pids corresponding to a 
Slurm jobid?).
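One piece of the scripted approach: `squeue -j <jobid> -h -o %N` prints the job's nodelist in compressed form (e.g. node[01-03]), which needs expanding before you can ssh to each node. A minimal sketch of that expansion in Python, handling only a single simple bracket range (real nodelists can be comma-separated and nested; `scontrol show hostnames` handles the general case):

```python
import re

def expand_nodelist(nodelist):
    """Expand a simple Slurm nodelist like 'node[01-03]' into hostnames.
    Only handles one bracketed numeric range; anything else is returned
    as-is. For full nodelist syntax, shell out to `scontrol show hostnames`.
    """
    m = re.fullmatch(r"(\w+)\[(\d+)-(\d+)\]", nodelist)
    if not m:
        return [nodelist]
    prefix, lo, hi = m.group(1), m.group(2), m.group(3)
    width = len(lo)  # preserve zero-padding, e.g. 01, 02, 03
    return [f"{prefix}{i:0{width}d}" for i in range(int(lo), int(hi) + 1)]

print(expand_nodelist("node[01-03]"))  # ['node01', 'node02', 'node03']
```

As for mapping a jobid to local pids, `scontrol listpids <jobid>`, run on a compute node, lists the pids Slurm is tracking for that job, which avoids guessing from the user's process list.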

Another question: is there a Python API for Slurm? I found pyslurm, but so far 
it will not build against my version of Slurm.
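Until pyslurm builds, a common stopgap is to shell out to the Slurm CLI tools from Python and parse their output. A minimal sketch (the jobid and output below are hypothetical; the `_run` parameter exists only so the function can be exercised without a cluster):

```python
import subprocess

def job_nodes(jobid, _run=subprocess.run):
    """Return the raw (compressed) nodelist for a job via squeue.
    -h suppresses the header line; -o %N prints only the nodelist.
    `_run` defaults to subprocess.run and can be swapped for testing."""
    result = _run(["squeue", "-j", str(jobid), "-h", "-o", "%N"],
                  capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

This is obviously less pleasant than a native binding, but the CLI output formats are stable enough that it works as a bridge.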

Thank you,
