Hello Markus,

The scheduling section of my slurm.conf reads:

# SCHEDULING
#DefMemPerCPU=0
FastSchedule=1
#MaxMemPerCPU=0
#SchedulerRootFilter=1
#SchedulerTimeSlice=30
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/cons_res
SelectTypeParameters=CR_Core

so I don't specify anything about memory, and I use CR_Core rather than
CR_Core_Memory.  Is that bad?
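
If I read your suggestion correctly, my section would then become
something like the sketch below.  I have not tested this: the value
DefMemPerCPU=1000 is simply copied from your mail and not tuned for my
nodes, and all other lines are unchanged from my current file.

```
# SCHEDULING (sketch only, untested)
# DefMemPerCPU value copied from Markus's mail; an assumption for my nodes
DefMemPerCPU=1000
FastSchedule=1
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/cons_res
# changed from CR_Core so that memory is tracked as a consumable resource
SelectTypeParameters=CR_Core_Memory
```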

Also, I am running Slurm 14.03.9, as shipped in the Debian package.  Is that too old?

---david

On Tue, Dec 6, 2016 at 11:53 AM, Markus Koeberl
<markus.koeb...@tugraz.at> wrote:
> On Tuesday 06 December 2016 10:49:33 David van Leeuwen wrote:
>>
>> Hello,
>>
>> I can't understand why jobs---even without asking for GPU
>> resources---don't get scheduled.   I must have something fundamentally
>> wrong in the configuration.  Maybe someone can help.
>>
>> I have 2 machines (physically different apparatuses---I suppose in
>> SLURM parlance this is a node, but I am not sure about that), with
>> resp. 1 and 2 GPUs, and each with (I believe) 6 hyperthreaded CPUs.
>>
>> I would like to be able to schedule either normal CPU jobs
>> (gres=gpu:0) at a granularity 1 job / CPU (so that I can run 12 jobs
>> in parallel), or GPU jobs (gres=gpu:1) at a granularity 1 job / GPU,
>> requiring additionally 1 CPU, so that I can run 3 GPU jobs in
>> parallel.  In that case, there should be still room for 9
>> single-threaded jobs on the cluster (well, maybe not a cluster, but
>> rather a binary system).
>>
>> But since I tried to tell SLURM about the GPUs, it has completely
>> stopped scheduling jobs, even jobs for which I don't want a GPU.  Slurm
>> claims that "gres/gpu count too low (0 < 1)"---but I have no clue as
>> to what the 0 and the 1 refer to
>> (claimed/detected/physical/reserved/available/required gpus?).
>>
>> # grep gpu /etc/slurm-llnl/slurm.conf
>>
>> GresTypes=gpu
>>
>> NodeName=deep-novo-1 RealMemory=32145 CPUS=12 Sockets=1
>> CoresPerSocket=6 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:1
>>
>> NodeName=deep-novo-2 RealMemory=129105 CPUS=12 Sockets=1
>> CoresPerSocket=6 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:2
>
> here it is working using slurm 16.05
> do you have these settings defined in your /etc/slurm-llnl/slurm.conf?
>
> DefMemPerCPU=1000
> SchedulerType=sched/backfill
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core_Memory
>
>
> regards
> Markus Köberl
> --
> Markus Koeberl
> Graz University of Technology
> Signal Processing and Speech Communication Laboratory
> E-mail: markus.koeb...@tugraz.at



-- 
David van Leeuwen
