Related to this, is it possible to know how much CPU and memory the
job is currently using?
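
(For what it's worth, one way to query this, assuming SLURM's job
accounting gather plugin is configured, is the sstat command, which
reports CPU time and memory statistics for the steps of a running job.
A minimal sketch, with 1234 as a hypothetical job ID:

    sstat --jobs=1234 --format=JobID,AveCPU,AveRSS,MaxRSS

AveCPU is the average CPU time and AveRSS/MaxRSS the average and peak
resident memory across a step's tasks.)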



2012/1/24 Mark Nelson <mdnels...@gmail.com>

> Hi Moe,
>
> That was exactly what we were after. (hmmm, I really should read through
> the man-pages more carefully...)
>
> I'll pass on the idea of adding an XML output option.
>
> Many thanks!
> Mark.
>
>
> On 25/01/12 03:52, Moe Jette wrote:
>
>> Use the command "scontrol show job --detail". The output will contain a
>> line like this for each node allocated to each job:
>> Nodes=tux123 CPU_IDs=2-5 Mem=2048
>> While the data does exist, that's not going to be particularly simple to
>> parse and work with. There has been talk about adding an "--xml" option
>> for XML output from scontrol, but that has never been done. Since SLURM
>> is open source, you could modify scontrol to add an "--xml" option or
>> build a new tool for your particular application.
>>
>> Moe Jette
>> SchedMD
>>
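
(A minimal, untested Python sketch of the kind of tool suggested
above: it shells out to scontrol for the detailed job output and pulls
the per-node core assignments from the Nodes=/CPU_IDs= lines; the job
ID 1234 is hypothetical:

    import re
    import subprocess

    def job_layout(jobid):
        """Return a (node, cpu_ids, mem) tuple for each node in the job."""
        # "scontrol -d show job" produces the detailed output described above.
        out = subprocess.check_output(
            ["scontrol", "-d", "show", "job", str(jobid)])
        # Lines of interest look like: "  Nodes=tux123 CPU_IDs=2-5 Mem=2048"
        return re.findall(r"Nodes=(\S+)\s+CPU_IDs=(\S+)\s+Mem=(\S+)",
                          out.decode())

    print(job_layout(1234))

Emitting XML from the returned tuples would then be straightforward.)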
>> Quoting Mark Nelson <mdnels...@gmail.com>:
>>
>>> Hi there,
>>>
>>> My colleague came up with the question below about running jobs on a
>>> normal x86 based cluster. Hopefully someone here can shed some light
>>> on this.
>>>
>>> When running SLURM on a multi-core/multi-socket cluster, is there any
>>> way of finding out which cores are allocated to a particular job?
>>> Using "scontrol show job" I can find out which nodes are allocated and
>>> the total number of cores, but I have no way of knowing how these cores
>>> might be distributed across the nodes. While the system seems to
>>> allocate cores consecutively, across multiple jobs there is no way of
>>> knowing which cores are assigned to which job. For example, in an
>>> 8-core multi-node system, if I ask for 3 cores across 2 nodes (salloc
>>> -n 3 -N 2), how do I know whether 2 cores are allocated from the first
>>> node and 1 core from the second, or vice versa? Also, as nodes are filled
>>> with other jobs, and jobs finish at different times, there is no way
>>> of mapping jobs to particular cores. I've seen from other postings
>>> that SLURM core numbering might not match the physical hardware core
>>> numbering, but for my purposes this is not a problem, as long as the
>>> numbering is consistent.
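
(One partial answer: the per-node CPU counts, though not the specific
core IDs, are exposed to the job itself in the SLURM_JOB_CPUS_PER_NODE
environment variable. A minimal illustration for the request above;
the "2,1" output is hypothetical:

    $ salloc -n 3 -N 2
    $ echo $SLURM_JOB_CPUS_PER_NODE
    2,1

which would mean 2 CPUs on the first allocated node and 1 on the
second.)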
>>>
>>> The reason I'm asking is that I'm trying to integrate SLURM with PTP
>>> (Eclipse Parallel Tools Platform) system monitoring, which expects to
>>> map jobs to nodes and cores in a graphical interface. Therefore, for
>>> jobs on a multi-core cluster, I need to report which nodes and cores a
>>> particular job is running on, in a specified XML format.
>>>
>>>
>>> Many thanks!
>>> Mark.
>>>
>>>
>>
>>
>>
>
