Thank you Jette. I had forgotten that I already tried sstat some time ago,
but it didn't work.

If I run sstat on a jobid that has job steps, all goes fine and I see the
stats of the job; but if I run sstat on a jobid without job steps, I can't
get the stats.
I use ProctrackType=proctrack/linuxproc

I get the error: sstat: error: no steps running for job 11352

I tried it as a normal user and as root.
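
For reference, this is roughly what I run (the format fields are just the
ones I happen to look at, and the batch-step and --allsteps variants are
guesses on my part that I haven't verified on my SLURM version):

  # works when the job has regular steps
  sstat -j 11352 --format=JobID,AveCPU,MaxRSS

  # things I will try for step-less jobs
  sstat -j 11352.batch --format=JobID,AveCPU,MaxRSS
  sstat --allsteps -j 11352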

On the other hand, with sacct I can see the stats of all my finished jobs,
but not the stats of those that are running or pending. I see the job line,
but with the statistics fields empty.
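
In case it matters, the sacct call I use is along these lines (the field
names are just the ones I happen to query):

  sacct -j 11352 --format=JobID,State,Elapsed,MaxRSS,AveCPU

For running or pending jobs, the MaxRSS/AveCPU columns come back empty.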

Thank you

2012/1/28 je...@schedmd.com <je...@schedmd.com>

> Look at the sstat and sacct commands
> --
> Sent from my Android phone. Please excuse my brevity and typos.
>
>
> Felip Moll <lip...@gmail.com> wrote:
>>
>> Related to this, is it possible to know how much CPU and memory a job
>> is currently using?
>>
>>
>>
>> 2012/1/24 Mark Nelson <mdnels...@gmail.com>
>>
>>> Hi Moe,
>>>
>>> That was exactly what we were after. (hmmm, I really should read through
>>> the man-pages more carefully...)
>>>
>>> I'll pass on the idea of adding an XML output option.
>>>
>>> Many thanks!
>>> Mark.
>>>
>>>
>>> On 25/01/12 03:52, Moe Jette wrote:
>>>
>>>> Use the command "scontrol show job --detail". The output will contain a
>>>> line like this for each node allocated to each job:
>>>> Nodes=tux123 CPU_IDs=2-5 Mem=2048
>>>> While the data does exist, that's not going to be particularly simple to
>>>> parse and work with. There has been talk about adding an "--xml" option
>>>> for XML output from scontrol, but that has never been done. Since SLURM
>>>> is open source, you could modify scontrol to add an "--xml" option or
>>>> build a new tool for your particular application.
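>>>>
>>>> For instance, just to illustrate the parsing involved (an untested
>>>> sketch, not a supported interface), something along these lines
>>>> would pull out the per-node CPU map:
>>>>
>>>> # one "Nodes=... CPU_IDs=... Mem=..." line per node per job
>>>> scontrol show job --detail | grep 'CPU_IDs='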
>>>>
>>>> Moe Jette
>>>> SchedMD
>>>>
>>>> Quoting Mark Nelson <mdnels...@gmail.com>:
>>>>
>>>>  Hi there,
>>>>>
>>>>> My colleague came up with the question below about running jobs on
>>>>> a normal x86-based cluster. Hopefully someone here can shed some
>>>>> light on this.
>>>>>
>>>>> When running SLURM on a multi-core/multi-socket cluster, is there
>>>>> any way of finding out which cores are allocated to a particular
>>>>> job? Using "scontrol show job" I can find out which nodes are
>>>>> allocated and the total number of cores, but I have no way of
>>>>> knowing how these cores might be distributed across the nodes.
>>>>> While the system seems to allocate cores consecutively, across
>>>>> multiple jobs there is no way of knowing which cores are assigned
>>>>> to which job. For example, in an 8-core multi-node system, if I ask
>>>>> for 3 cores across 2 nodes (salloc -n 3 -N 2), how do I know
>>>>> whether 2 cores are allocated from the first node and 1 core from
>>>>> the second, or vice versa? Also, as nodes fill up with other jobs,
>>>>> and jobs finish at different times, there is no way of mapping jobs
>>>>> to particular cores. I've seen from other postings that SLURM core
>>>>> numbering might not match the physical hardware core numbering, but
>>>>> for my purposes this is not a problem, as long as the numbering is
>>>>> consistent.
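>>>>>
>>>>> To make the ambiguity concrete (an untested sketch; the field names
>>>>> in the comment are from memory):
>>>>>
>>>>> salloc -n 3 -N 2
>>>>> # inside the allocation:
>>>>> scontrol show job $SLURM_JOB_ID
>>>>> # reports NumNodes=2 NumCPUs=3 and the node list, but not which
>>>>> # cores are used on which node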
>>>>>
>>>>> The reason I'm asking is that I'm trying to integrate SLURM with
>>>>> PTP (Eclipse Parallel Tools Platform) system monitoring, which
>>>>> expects to map jobs to nodes and cores in a graphical interface.
>>>>> Therefore, for jobs on a multi-core cluster, I need to report which
>>>>> cores and nodes a particular job is running on, in a specified XML
>>>>> format.
>>>>>
>>>>>
>>>>> Many thanks!
>>>>> Mark.
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
