Hi there,
My colleague came up with the question below about running jobs on a
normal x86 based cluster. Hopefully someone here can shed some light on
this.
When running SLURM on a multi-core/multi-socket cluster system, is there
any way of finding out which cores are allocated to a particular job? Using
"scontrol show job" I can find out which nodes are allocated and a total
number of cores, but have no way of knowing how these cores might be
distributed across the nodes. While the system seems to allocate cores
consecutively, across multiple jobs there is no way of knowing which
cores are assigned to which job. For example, in an 8-core multi-node
system, if I ask for 3 cores across 2 nodes (salloc -n 3 -N 2), how do I
know whether 2 cores are allocated from the first node and 1 core from the
second, or vice versa? Also, as nodes fill up with other jobs and
jobs finish at different times, there is no way of mapping jobs to
particular cores. I've seen from other postings that SLURM core
numbering might not match the physical hardware core numbering, but for
my purposes this is not a problem, as long as the numbering is consistent.
The reason I'm asking is that I'm trying to integrate SLURM
with PTP (Eclipse Parallel Tools Platform) system monitoring that
expects to map jobs to nodes and cores in a graphical interface.
Therefore, for jobs on a multi-core cluster, I need to report which
cores and nodes a particular job is running on, in a specified XML format.
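
To make the requirement concrete, here is a rough Python sketch of the kind of
mapping I'm after. It assumes I can somehow obtain, for each node of a job, a
core range string such as "0-1"; that input format, the helper names
(expand_cpu_ranges, job_to_xml) and the XML element names are all made up for
illustration and are neither actual SLURM output nor the real PTP schema.

```python
import xml.etree.ElementTree as ET


def expand_cpu_ranges(spec):
    """Expand a range string such as "0-1,4" into the list [0, 1, 4]."""
    cores = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cores.extend(range(int(first), int(last) + 1))
        else:
            cores.append(int(part))
    return cores


def job_to_xml(job_id, cores_by_node):
    """Return an XML string listing every node and core used by one job.

    Element and attribute names are hypothetical, not the PTP schema.
    """
    job = ET.Element("job", id=str(job_id))
    for node_name, spec in cores_by_node.items():
        node = ET.SubElement(job, "node", name=node_name)
        for core in expand_cpu_ranges(spec):
            ET.SubElement(node, "core", id=str(core))
    return ET.tostring(job, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical result of "salloc -n 3 -N 2": 2 cores on one node, 1 on the other.
    print(job_to_xml(1234, {"node1": "0-1", "node2": "0"}))
```

The missing piece is the per-node core list itself (the dictionary passed to
job_to_xml above), which is exactly what I don't know how to get from SLURM.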
Many thanks!
Mark.