Gah, yes. sstat, not sinfo.

The most dangerous phrase in the language is, "We've always done it this way."

- Grace Hopper

On 19 September 2016 at 13:00, Peter A Ruprecht <
> wrote:

> Igor,
> Would sstat give you what you need?
> It doesn't update instantaneously, but it does update at least a few times a minute.
> If you want to get fancy, I believe that xdmod can integrate with
> TACC-stats to provide graphs of what is happening inside a job, but I'm
> not sure whether that updates in "real" time.
> One of our summer interns created a custom ganglia interface that checked
> what nodes a job was running on and graphed several relevant variables
> selected from the ganglia RRD files for those nodes.  If you're interested
> in seeing that work, I can look into whether we can share it.
> So there are some existing ways of going about this.
> Pete
> From: Igor Yakushin <>
> Reply-To: slurm-dev <>
> Date: Sunday, September 18, 2016 at 6:42 PM
> To: slurm-dev <>
> Subject: [slurm-dev] how to monitor CPU/RAM usage on each node of a slurm
> job? python API?
> Hi All,
> I'd like to be able to see, for a given jobid, how many resources the
> job is currently using on each node it runs on. Is there a way to do
> that?
> So far it looks like I have to script it: get the list of involved
> nodes (using, for example, squeue or qstat), ssh to each node, and find
> all the user's processes. Even then it isn't guaranteed that those
> processes belong to the job I'm interested in: is there a way to find
> the UNIX pids corresponding to a Slurm jobid?
> Another question: is there a Python API for Slurm? I found pyslurm,
> but so far it will not build against my version of Slurm.
> Thank you,
> Igor
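
Igor's PID question has an answer that avoids guessing from `ps` output: `scontrol listpids <jobid>`, run on a node the job occupies, reports the local UNIX PIDs belonging to that job's steps. A minimal sketch of using it from Python follows; the exact column layout (PID JOBID STEPID LOCALID GLOBALID) is an assumption here, so check it against your Slurm version:

```python
# Sketch: map a Slurm jobid to local UNIX PIDs via `scontrol listpids`.
# The assumed column order is PID JOBID STEPID LOCALID GLOBALID; verify
# against the output of your Slurm installation.
import subprocess


def pids_for_job(jobid, listpids_output=None):
    """Return the UNIX PIDs that `scontrol listpids` reports for jobid.

    If listpids_output is None, run `scontrol listpids <jobid>` on this
    node (requires Slurm); otherwise parse the supplied text, which makes
    the function testable off-cluster.
    """
    if listpids_output is None:
        listpids_output = subprocess.check_output(
            ["scontrol", "listpids", str(jobid)], text=True)
    pids = []
    for line in listpids_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Keep rows whose JOBID column matches the job we asked about.
        if len(fields) >= 2 and fields[1] == str(jobid):
            pids.append(int(fields[0]))
    return pids
```

With the PIDs in hand, per-node CPU and RSS figures can then be read from `/proc/<pid>/stat` and `/proc/<pid>/status` without any ambiguity about which user processes belong to the job.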
