Hi Rohan,

We use:

    SelectTypeParameters=CR_CORE_DEFAULT_DIST_BLOCK

to try to keep jobs on one socket if possible. The documentation describes it like this:

    CR_CORE_DEFAULT_DIST_BLOCK
    Allocate cores within a node using block distribution by default.
    This is a pseudo-best-fit algorithm that minimizes the number of
    boards and minimizes the number of sockets (within minimum boards)
    used for the allocation. This default behavior can be overridden by
    specifying a particular "-m" parameter with srun/salloc/sbatch.
    Without this option, cores will be allocated cyclically across the
    sockets.

To get 8 cores, you would probably run something like this:

    #!/bin/bash
    #SBATCH --ntasks-per-core=1
    #SBATCH --ntasks=8
    srun /path/to/application1

(that, or just --ntasks=16).

Cheers,
Ben

-----Original Message-----
From: Rohan Garg [mailto:rohg...@ccs.neu.edu]
Sent: 11 February 2016 16:06
To: slurm-dev
Subject: [slurm-dev] Re: Setting up SLURM for a single multi-core node

Thanks for the quick response, Benjamin.

My intent in using the `--sockets-per-node=1` option was to try to force the allocation of a job onto a single socket for spatial locality. But this is not a hard constraint for me. As I said before, my main goal is to be able to occupy all the available cores on a multi-core machine, given enough jobs.

> You have 3 jobs that each request 1 socket = 3 sockets.
> You have 2 phys. sockets.

I'm not sure this is how it's behaving, because:

a) just specifying the `--cores-per-socket` option and the `--threads-per-core` option doesn't change the behavior of the scheduler; and

b) in another setting, on a 24-core machine (12 cores on each of 2 sockets), depending on how I submit my jobs I am able to occupy up to 20 cores by submitting 5 jobs, each of which uses the `--sockets-per-node=1` option (along with the other two options).
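[For context: CR_CORE_DEFAULT_DIST_BLOCK is combined with a core-level consumable-resource setting in slurm.conf. A minimal sketch of such a fragment follows; the node name and State value are assumptions for illustration, with socket/core/thread counts matching the 16-core, 2-socket, HyperThreaded machine discussed in this thread -- not a tested configuration.]

```
# slurm.conf fragment (sketch) -- core-level allocation with
# block (pseudo-best-fit) distribution across sockets:
SelectType=select/cons_res
SelectTypeParameters=CR_Core,CR_CORE_DEFAULT_DIST_BLOCK

# Hypothetical node definition for a 2-socket, 8-cores-per-socket,
# HyperThreaded machine (node name is an assumption):
NodeName=node01 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 State=UNKNOWN
```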
----- Original Message -----
From: "Benjamin Redling" <benjamin.ra...@uni-jena.de>
To: "slurm-dev" <slurm-dev@schedmd.com>
Sent: Thursday, February 11, 2016 8:18:02 AM
Subject: [slurm-dev] Re: Setting up SLURM for a single multi-core node

On 2016-02-11 07:36, Rohan Garg wrote:
>
> Hello,
>
> I'm trying to set up SLURM-15.08.1 on a single multi-core node to
> manage multi-threaded jobs. The machine has 16 physical cores
> on 2 sockets with HyperThreading enabled. I'm using the EASY
> scheduling algorithm with backfilling. The goal is to fully utilize
> all the available cores at all times.
>
> Given a list of three jobs with requirements of 8 cores, 2 cores,
> and 4 cores, the expectation is that the jobs should be co-scheduled
> to utilize 14 of the 16 available cores. However, I can't seem to
> get SLURM to work as expected. SLURM runs the latter two jobs
> together but refuses to schedule the first job until they finish.
> (Is this the expected behavior of the EASY-backfilling algorithm?)
>
> Here's the list of jobs:
>
> $ cat job1.batch
>
> #!/bin/bash
> #SBATCH --sockets-per-node=1
> #SBATCH --cores-per-socket=8
> #SBATCH --threads-per-core=1
> srun /path/to/application1
>
> $ cat job2.batch
>
> #!/bin/bash
> #SBATCH --sockets-per-node=1
> #SBATCH --cores-per-socket=2
> #SBATCH --threads-per-core=1
> srun /path/to/application2
>
> $ cat job3.batch
>
> #!/bin/bash
> #SBATCH --sockets-per-node=1
> #SBATCH --cores-per-socket=4
> #SBATCH --threads-per-core=1
> srun /path/to/application3

At a quick glance: in general, let the scheduler do the work. Don't micro-manage. Be aware that your SBATCH settings are constraints -- not hints.

You have 3 jobs that each request 1 socket = 3 sockets.
You have 2 phys. sockets.

/Benjamin
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321
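[Summing up the advice in this thread: the three jobs above each request a whole socket (3 sockets requested, 2 physical sockets available), which is why job1 waits for the others. Dropping the per-socket constraints and requesting core counts directly, as in Ben's earlier example, lets all three jobs (8 + 2 + 4 = 14 cores) fit on the 16 cores at once. A sketch of job1 rewritten this way -- the application paths are carried over from the thread, and this has not been tested against this cluster:]

```
#!/bin/bash
# job1.batch (sketch) -- request 8 cores anywhere on the node;
# --ntasks-per-core=1 avoids placing two tasks on one HyperThreaded core.
#SBATCH --ntasks=8
#SBATCH --ntasks-per-core=1
srun /path/to/application1
```

job2.batch and job3.batch would be analogous with --ntasks=2 and --ntasks=4; with CR_CORE_DEFAULT_DIST_BLOCK the scheduler still prefers packing each job onto as few sockets as possible, without making single-socket placement a hard constraint.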