Hi there,

My colleague came up with the question below about running jobs on a normal x86-based cluster. Hopefully someone here can shed some light on this.

When running SLURM on a multi-core, multi-socket cluster, is there any way of finding out which cores have been allocated to a particular job? Using "scontrol show job" I can see which nodes are allocated and the total number of cores, but I have no way of knowing how those cores are distributed across the nodes. The system seems to allocate cores consecutively, but across multiple jobs there is no way of knowing which cores are assigned to which job. For example, on a cluster of 8-core nodes, if I ask for 3 cores across 2 nodes (salloc -n 3 -N 2), how do I know whether 2 cores are allocated from the first node and 1 from the second, or vice versa? Also, as nodes fill up with other jobs and jobs finish at different times, there is no way of mapping jobs to particular cores. I've seen from other postings that SLURM core numbering might not match the physical hardware core numbering, but for my purposes that's not a problem, as long as the numbering is consistent.
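To illustrate the kind of mapping I'm after, here is a rough Python sketch. It assumes per-node detail lines of the form "Nodes=... CPU_IDs=..." (like those "scontrol show job -dd <jobid>" can print on some installations; the exact output depends on the SLURM version and plugin configuration) and parses them into a node-to-cores map. The sample text and node names are hypothetical.

```python
import re

# Hypothetical sample of the per-node detail lines; real output
# varies with SLURM version and select/affinity plugin configuration.
sample = """\
   Nodes=node01 CPU_IDs=0-1 Mem=0
   Nodes=node02 CPU_IDs=0 Mem=0
"""

def parse_cpu_ids(text):
    """Map each node name to the list of SLURM core IDs allocated on it."""
    alloc = {}
    for node, ids in re.findall(r"Nodes=(\S+)\s+CPU_IDs=(\S+)", text):
        cores = []
        for part in ids.split(","):          # CPU_IDs can be "0,2" or "0-3"
            if "-" in part:
                lo, hi = map(int, part.split("-"))
                cores.extend(range(lo, hi + 1))
            else:
                cores.append(int(part))
        alloc[node] = cores
    return alloc

print(parse_cpu_ids(sample))
# {'node01': [0, 1], 'node02': [0]}
```

Note this sketch doesn't handle hostlist expressions like "node[01-02]" in the Nodes= field; a real tool would need to expand those as well.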

The reason I'm asking is that I'm trying to integrate SLURM with the system monitoring in PTP (Eclipse Parallel Tools Platform), which expects to map jobs to nodes and cores in a graphical interface. For jobs on a multi-core cluster, I therefore need to report which cores and nodes a particular job is running on, in a specified XML format.


Many thanks!
Mark.