A Ganglia plugin would be interesting. How do Ganglia clients on different
nodes communicate? Typically they do not talk to each other, only to the
central node. But to decide that they are part of the same job, wouldn't
they somehow need to talk to each other?
On Sun, Sep 18, 2016 at 10:01 PM, Peter A Ruprecht wrote:
> Would sstat give you what you need? (http://slurm.schedmd.com/sstat.html)
> It doesn't update instantaneously but at least a few times a minute.
> If you want to get fancy, I believe that xdmod can integrate with
> TACC-stats to provide graphs about what is happening inside a job but I'm
> not sure whether that updates in "real" time.
> One of our summer interns created a custom ganglia interface that checked
> what nodes a job was running on and graphed several relevant variables
> selected from the ganglia RRD files for those nodes. If you're interested
> in seeing that work, I can look into whether we can share it.
> So there are some existing ways of going at this.
> From: Igor Yakushin <igor.2...@gmail.com>
> Reply-To: slurm-dev <firstname.lastname@example.org>
> Date: Sunday, September 18, 2016 at 6:42 PM
> To: slurm-dev <email@example.com>
> Subject: [slurm-dev] how to monitor CPU/RAM usage on each node of a slurm
> job? python API?
> Hi All,
> I'd like to be able to see, for a given jobid, how many resources a job is
> currently using on each node it is running on. Is there a way to do this?
> So far it looks like I have to script it: get the list of the involved
> nodes using, for example, squeue or qstat, ssh to each node and find all
> the user processes (not 100% guaranteed that they would be from the job I
> am interested in: is there a way to find UNIX pids corresponding to Slurm
> Another question: is there a Python API for Slurm? I found pyslurm, but so
> far it will not build with my version of Slurm.
> Thank you,
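
For anyone who lands on this thread later, here is a rough Python sketch of the two command-line routes touched on above: polling sstat for a running job, and asking Slurm which UNIX PIDs belong to a job on the local node via "scontrol listpids". It is only a sketch under some assumptions: the helper names (sstat_command, parse_parsable2, parse_listpids, job_usage, local_job_pids) are made up for illustration, the sstat field list is just one reasonable choice, and sstat only returns data when job accounting gathering is enabled on the cluster. Note that sstat reports per-step aggregates (for example MaxRSS together with the node where it occurred), not a full per-node breakdown, and that "scontrol listpids" has to be run on the compute node itself.

```python
import subprocess

# Fields requested from sstat; this particular selection is an assumption,
# any of the documented sstat format fields could be used instead.
SSTAT_FIELDS = ["JobID", "NTasks", "AveCPU", "AveRSS", "MaxRSS", "MaxRSSNode"]


def sstat_command(jobid):
    """Build the sstat call: all steps, '|'-separated output with a header row."""
    return ["sstat", "--allsteps", "--parsable2",
            "--jobs", str(jobid), "--format", ",".join(SSTAT_FIELDS)]


def parse_parsable2(text):
    """Turn '|'-separated output with a header row into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split("|")
    return [dict(zip(header, line.split("|"))) for line in lines[1:]]


def parse_listpids(text):
    """Extract PIDs from 'scontrol listpids' output (PID is the first column)."""
    pids = []
    for line in text.splitlines():
        fields = line.split()
        # The header row starts with the word "PID", so it is skipped here.
        if fields and fields[0].isdigit():
            pids.append(int(fields[0]))
    return pids


def run(cmd):
    """Run a command and return its stdout; raises CalledProcessError on failure."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def job_usage(jobid):
    """Per-step usage records for a running job, as reported by sstat."""
    return parse_parsable2(run(sstat_command(jobid)))


def local_job_pids(jobid):
    """PIDs belonging to the job on the node where this function is executed."""
    return parse_listpids(run(["scontrol", "listpids", str(jobid)]))
```

Calling job_usage(jobid) from a login node every few seconds gives a crude live view. local_job_pids(jobid) would have to be run on each compute node (for example over ssh), after which the PIDs can be handed to ps or top.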