Hi Giovani,
We have just upgraded to 16.05.4.
When I try building pyslurm, it says that version 2.6 of Slurm is required.
Thank you,
Igor


On Mon, Sep 19, 2016 at 8:09 AM, Torres, Giovanni <giovanni.tor...@nih.gov>
wrote:

> On 9/18/16, 8:41 PM, "Igor Yakushin" <igor.2...@gmail.com> wrote:
> >
> > Hi All,
> >
> >
> > I'd like to be able to see, for a given jobid, how many resources a job
> is using at this moment on each node it is running on. Is there a way to
> do this?
> >
> > So far it looks like I have to script it: get the list of the nodes
> involved using, for example, squeue or qstat, ssh to each node, and find
> all of the user's processes (which are not guaranteed to belong to the job
> I am interested in: is there a way to find the UNIX PIDs corresponding to
> a Slurm jobid?).
> >
> You can do `scontrol listpids` on a node.  It will return a mapping of
> PIDs to JobIDs.  But from a script, you would have to fork a subshell to
> execute scontrol and then parse the output.
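>
> For example, something along these lines (an untested sketch; it assumes
> the default `scontrol listpids` output, a whitespace-separated table whose
> header includes PID and JOBID columns):
>
>     import subprocess
>
>     def pids_by_job():
>         """Map each Slurm JobID to the PIDs it owns on this node."""
>         # scontrol prints a header line followed by one row per PID
>         out = subprocess.check_output(["scontrol", "listpids"],
>                                       universal_newlines=True)
>         lines = out.strip().splitlines()
>         header = lines[0].split()
>         pid_col = header.index("PID")    # column names are an assumption
>         job_col = header.index("JOBID")
>         mapping = {}
>         for row in lines[1:]:
>             fields = row.split()
>             mapping.setdefault(fields[job_col], []).append(int(fields[pid_col]))
>         return mapping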
>
> If you are using the cgroup task plugin, a better way would be to parse
> the output of the cgroup hierarchy (/cgroup or /sys/fs/cgroup, depending on
> your OS) on each compute node.  There is a Python API to libcgroup (
> https://git.fedorahosted.org/git/python-libcgroup.git) but I don’t think
> it is complete and I’m not sure of its status (whether it is maintained or
> not).  If you are doing this from Python, however, I find it easier and
> faster to just glob the cgroup hierarchy and read cgroup.procs and
> memory.stat under the slurm tasks.  You still need to get the CPU state for
> each process or thread under a given job in order to get the “cpu load” for
> that job.
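>
> As a rough illustration (untested, and it assumes cgroup v1 with the
> memory controller mounted at /sys/fs/cgroup/memory and the usual
> slurm/uid_*/job_* layout created by the cgroup task plugin):
>
>     import glob
>     import os
>
>     def job_memory_stats():
>         """Read cgroup.procs and memory.stat for each Slurm job on this node."""
>         stats = {}
>         # the path layout below is an assumption; adjust for /cgroup
>         # or for cgroup v2 as needed
>         for job_dir in glob.glob("/sys/fs/cgroup/memory/slurm/uid_*/job_*"):
>             jobid = os.path.basename(job_dir).replace("job_", "")
>             with open(os.path.join(job_dir, "cgroup.procs")) as f:
>                 pids = [int(line) for line in f if line.strip()]
>             mem = {}
>             with open(os.path.join(job_dir, "memory.stat")) as f:
>                 for line in f:
>                     key, value = line.split()
>                     mem[key] = int(value)
>             # per-process CPU state (e.g. /proc/<pid>/stat) is not shown here
>             stats[jobid] = {"pids": pids, "rss": mem.get("total_rss")}
>         return stats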
>
> My take on this was to write a small daemon that runs on each node.  It
> gathers metrics for all running slurm processes on a node and aggregates
> them by job.  The daemon then sends the info periodically (every 30
> seconds) to a Redis database in JSON format.  From there, I can write
> command-line utilities or web tools that query Redis instead of slurmctld.  This
> makes for a stateless monitoring environment.  Given that Redis runs
> in-memory, if Redis goes down, all metrics are lost.  However, as long as
> the daemon is running on each compute node, Redis will be fully repopulated
> in 30 seconds.
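>
> In outline, the daemon's main loop looks something like this (only a
> sketch; the Redis host, the key names, and collect_job_metrics() are
> placeholders rather than the real code):
>
>     import json
>     import socket
>     import time
>
>     import redis  # redis-py
>
>     r = redis.Redis(host="redis.example.com")  # placeholder host
>     node = socket.gethostname()
>     INTERVAL = 30  # seconds, the reporting period mentioned above
>
>     def collect_job_metrics():
>         # placeholder: aggregate CPU/memory per job on this node,
>         # e.g. from the cgroup hierarchy as described earlier
>         return {}  # {jobid: {"cpu": ..., "rss": ...}}
>
>     while True:
>         for jobid, data in collect_job_metrics().items():
>             # expire keys after two intervals so stale jobs age out on their own
>             r.setex("job:%s:%s" % (jobid, node), 2 * INTERVAL, json.dumps(data))
>         time.sleep(INTERVAL)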
>
> I have some code that does all this already, but I don’t think it is ready
> for mass consumption.  I could put it on GitHub if anyone is interested.
>
> >
> > Another question: is there a Python API to Slurm?  I found PySlurm, but
> so far it would not build with my version of Slurm.
>
> What version of Slurm are you running?  If you are having problems
> building PySlurm, feel free to post questions here:
> https://groups.google.com/forum/#!forum/pyslurm
>
> We’d be happy to help you get PySlurm going.
>
> Best,
> Giovanni
>
