[slurm-dev] Re: how to monitor CPU/RAM usage on each node of a slurm job? python API?

2016-09-18 Thread Igor Yakushin
Hi Peter,

A Ganglia plugin would be interesting. How do Ganglia clients on different nodes communicate? Typically they do not talk to each other, only to the central node. However, to decide that they are part of the same job, wouldn't they somehow need to talk to each other?

Thank you,
Igor

On Sun, Sep

[slurm-dev] Re: how to monitor CPU/RAM usage on each node of a slurm job? python API?

2016-09-18 Thread Lachlan Musicman
Gah, yes: sstat, not sinfo.

--
The most dangerous phrase in the language is, "We've always done it this way." - Grace Hopper

On 19 September 2016 at 13:00, Peter A Ruprecht wrote:

> Igor,
>
> Would sstat give you what you need?

[slurm-dev] Re: how to monitor CPU/RAM usage on each node of a slurm job? python API?

2016-09-18 Thread Peter A Ruprecht
Igor,

Would sstat give you what you need? (http://slurm.schedmd.com/sstat.html) It doesn't update instantaneously, but it does refresh at least a few times a minute.

If you want to get fancy, I believe that xdmod can integrate with TACC-stats to provide graphs about what is happening inside a job, but I'm not
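For example, a point-in-time snapshot can be taken from Python by shelling out to sstat and parsing its parsable output. A minimal sketch, not from the thread: it assumes sstat is on the PATH and the job is still running, and uses field names from the sstat man page. Note that sstat only reports on job steps, so the steps need to have been launched with srun.

    import subprocess
    import sys

    def sstat_snapshot(jobid):
        """Return one dict per job step with the requested sstat fields."""
        fields = ["JobID", "AveCPU", "MaxRSS", "MaxRSSNode", "NTasks"]
        out = subprocess.run(
            ["sstat", "--allsteps", "--noheader", "--parsable2",
             "--format=" + ",".join(fields), "-j", str(jobid)],
            capture_output=True, text=True, check=True,
        ).stdout
        # --parsable2 gives pipe-delimited rows, one per job step.
        return [dict(zip(fields, line.split("|")))
                for line in out.splitlines() if line.strip()]

    if __name__ == "__main__":
        for row in sstat_snapshot(sys.argv[1]):
            print(row)

Run it as "python3 sstat_snapshot.py <jobid>" and call it periodically if you want a rough time series.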

[slurm-dev] Re: how to monitor CPU/RAM usage on each node of a slurm job? python API?

2016-09-18 Thread Lachlan Musicman
Also, if you have Slurm installed on a deb-based distro, you can try this: https://github.com/edf-hpc/slurm-web

I tried to get it running on RPM (CentOS) but it is too tightly coupled to deb for me to be able to port it.

cheers
L.

--
The most dangerous phrase in the language is, "We've always

[slurm-dev] Re: how to monitor CPU/RAM usage on each node of a slurm job? python API?

2016-09-18 Thread Lachlan Musicman
I think you need a couple of things going on:

1. You have to have some sort of accounting organised and set up.
2. Your sbatch scripts need to launch commands with srun, not just call them directly (see the sketch below).
3. sinfo should then work on the job number.

When I asked, that was the response, IIRC.

cheers
L.

--
The most dangerous phrase
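As an illustration of point 2, a minimal sbatch sketch (the job name, resource requests, and ./my_app are placeholders, not from the thread): launching the application through srun creates a job step that accounting, and therefore sstat, can see.

    #!/bin/bash
    #SBATCH --job-name=example
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=4

    # Run the application as a job step via srun rather than invoking it
    # directly, so that per-step usage shows up in sstat/sacct.
    srun ./my_app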

[slurm-dev] how to monitor CPU/RAM usage on each node of a slurm job? python API?

2016-09-18 Thread Igor Yakushin
Hi All,

I'd like to be able to see, for a given jobid, how much CPU and RAM the job is currently using on each node it is running on. Is there a way to do it?

So far it looks like I have to script it: get the list of the involved nodes using, for example, squeue or qstat, then ssh to each node and
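For reference, a rough Python sketch of the scripting approach described above: expand the job's node list with squeue and scontrol, then ssh to each node and sum the CPU% and resident memory of the job owner's processes with ps. This is not from the thread; it assumes passwordless ssh to the compute nodes, and filtering ps by user is a simplification (it also counts that user's processes that are not part of the job).

    import subprocess
    import sys

    def run(cmd):
        return subprocess.run(cmd, capture_output=True, text=True,
                              check=True).stdout.strip()

    def job_nodes(jobid):
        # squeue prints the compressed nodelist; scontrol expands it.
        nodelist = run(["squeue", "-h", "-j", str(jobid), "-o", "%N"])
        return run(["scontrol", "show", "hostnames", nodelist]).splitlines()

    def job_user(jobid):
        return run(["squeue", "-h", "-j", str(jobid), "-o", "%u"])

    def node_usage(node, user):
        # Sum %CPU and RSS (KiB) of the user's processes on the given node.
        ps = run(["ssh", node, "ps", "-u", user, "-o", "pcpu=,rss="])
        cpu = rss_kib = 0.0
        for line in ps.splitlines():
            if not line.strip():
                continue
            pcpu, rss = line.split()
            cpu += float(pcpu)
            rss_kib += float(rss)
        return cpu, rss_kib / 1024.0

    if __name__ == "__main__":
        jid = sys.argv[1]
        user = job_user(jid)
        for node in job_nodes(jid):
            cpu, mem_mib = node_usage(node, user)
            print(f"{node}: {cpu:.0f}% CPU, {mem_mib:.0f} MiB RSS")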

[slurm-dev] Re: Slurm 15.08.12 - Issue after upgrading to 15.08 - only one job per node is running

2016-09-18 Thread Christopher Samuel
On 18/09/16 03:45, John DeSantis wrote:
> Try adding a "DefMemPerCPU" statement in your partition definitions, e.g.

You can also set that globally in slurm.conf:

# Global default for jobs - request 2GB per core wanted.
DefMemPerCPU=2048

All the best,
Chris

--
Christopher Samuel    Senior Systems
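And a sketch of the partition-level form John suggests (the partition and node names are made up; DefMemPerCPU is the relevant parameter):

    PartitionName=batch Nodes=node[001-016] Default=YES State=UP DefMemPerCPU=2048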