Hi Rémi,

The way we handle it is to use the cgroup plugins with 
'ConstrainCores=yes' in the cgroup.conf file:

http://slurm.schedmd.com/cgroups.html
http://slurm.schedmd.com/cgroup.conf.html

A job is locked down to its allocated cores, so if the user launches too many 
intensive processes they just slow each other down.
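For reference, the relevant pieces of our setup look roughly like this 
(a minimal sketch rather than our exact files -- adjust for your site, 
and see the two pages above for the full parameter lists):

     # slurm.conf -- hand task launch over to the cgroup plugin
     TaskPlugin=task/cgroup

     # cgroup.conf -- confine each job to its allocated cores
     CgroupAutomount=yes
     ConstrainCores=yes

ConstrainRAMSpace=yes does the analogous thing for memory, if you want 
that enforced as well.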

Cheers,

Ben

-----Original Message-----
From: Rémi Piatek [mailto:[email protected]] 
Sent: 23 June 2015 12:44
To: slurm-dev
Subject: [slurm-dev] Re: SLURM allows jobs to start even if they use more CPUs 
than requested


I had considered this simple explanation, but it seemed unlikely to me, 
as it would imply that we have to rely completely on users to correctly 
specify the number of CPUs they need. People use my server for 
CPU-intensive jobs, so it is important for me to make sure that 
resources are fairly shared. I was hoping Slurm would allow me to do 
this and prevent people from free-riding (as it stands, it would be easy 
to request a small number of CPUs and use a much larger number, thus 
slowing down the other users).

I read that when jobs exceed the memory requested and allocated by 
slurm, they are automatically interrupted. Is there nothing similar for 
the use of CPUs?

Thanks for the help! Much appreciated.


On 06/23/2015 12:05 PM, Loris Bennett wrote:
>
> Rémi Piatek <[email protected]> writes:
>
>> Hello,
>>
>> I am getting started with SLURM and I am having a hard time understanding
>> how it allocates CPUs to users depending on the resources they request. The
>> problem I am facing can be summarized as follows. Consider a bash script
>> test.sh that requests 8 CPUs but actually starts a job that uses 10 CPUs:
>>
>>      #!/bin/sh
>>      #SBATCH --ntasks=8
>>      stress -c 10
>>
>> On a server with 32 CPUs, if I start this script 5 times with sbatch
>> test.sh, 4 of them start running right away and the last one appears as
>> pending, as shown by the squeue command:
>>
>>      JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>>          5      main  test.sh     jack PD       0:00      1 (Resources)
>>          1      main  test.sh     jack  R       0:08      1 server
>>          2      main  test.sh     jack  R       0:08      1 server
>>          3      main  test.sh     jack  R       0:05      1 server
>>          4      main  test.sh     jack  R       0:05      1 server
>>
>> The problem is that these 4 jobs are actually using 40 CPUs and overload
>> the server. I would on the contrary expect SLURM to either not start the
>> jobs that are actually using more resources than requested by the user, or
>> to put them on hold until there are enough resources to start them. How can
>> I make sure that the users of my server do not start jobs that use too many
>> CPUs?
>>
>> Some useful details about my slurm.conf file:
>>
>>      # SCHEDULING
>>      #DefMemPerCPU=0
>>      FastSchedule=1
>>      #MaxMemPerCPU=0
>>      SchedulerType=sched/backfill
>>      SchedulerPort=7321
>>      SelectType=select/cons_res
>>      SelectTypeParameters=CR_CPU
>>      # COMPUTE NODES
>>      NodeName=server CPUs=32 RealMemory=10000 State=UNKNOWN
>>      # PARTITIONS
>>      PartitionName=main Nodes=server Default=YES Shared=YES MaxTime=INFINITE
>> State=UP
>>
>> I am probably making a trivial mistake in the configuration file, or just
>> misunderstanding a basic concept of SLURM. Any help or advice would be much
>> appreciated.
>>
>> Many thanks in advance!
> Slurm just keeps track of how many cores have been assigned to running
> jobs - it doesn't check how many processes are actually started within a
> given job.  So, it is up to the user to make sure she starts the correct
> number of processes.
>
> Cheers,
>
> Loris
>
