Thanks. The field size was just changed for the next major release.

HUMMEL Michel <[email protected]> wrote:
>I think I've found the problem: it comes from a comparison between
>integers of different types. Here is a patch which solves it:
>
>--- slurm-slurm-13-12-0-0pre4/src/plugins/sched/backfill/backfill.c.org	2013-11-18 17:56:09.741413223 +0100
>+++ slurm-slurm-13-12-0-0pre4/src/plugins/sched/backfill/backfill.c	2013-11-18 17:57:42.903468026 +0100
>@@ -712,7 +712,7 @@
>                       continue;       /* started in other partition */
>               if (!avail_front_end(job_ptr))
>                       continue;       /* No available frontend for this job */
>-              if (job_ptr->array_task_id != (uint16_t) NO_VAL) {
>+              if (job_ptr->array_task_id != (uint32_t) NO_VAL) {
>                       if (reject_array_job_id == job_ptr->array_job_id)
>                               continue;  /* already rejected array element */
>                       /* assume reject whole array for now, clear if OK */
>
>
> 
>Regards,
>
>[@@ THALES GROUP INTERNAL @@]
>
>De : HUMMEL Michel [mailto:[email protected]] 
>Envoyé : lundi 18 novembre 2013 16:32
>À : slurm-dev
>Objet : [slurm-dev] unable to configure backfill
>
>I am trying the backfill scheduler without success.
>I just want to test it with the simplest configuration possible*
>(see slurm.conf at the end).
>7 homogenous nodes, 12 CPU per node
>
>I submit three jobs; the last one should be backfilled, but it stays pending:
>$ sbatch --nice=0 -N 5  -c 12 --time-min="09:00" --time="10:00"
>~/slurm/job.sh 
>Submitted batch job 65574
>$ sbatch --nice=0 -N 5  -c 12 --time-min="09:00" --time="10:00"
>~/slurm/job.sh
>Submitted batch job 65575
>$ sbatch --nice=0 -N 1  -c 12 --time-min="00:40" --time="01:00"
>~/slurm/job.sh
>Submitted batch job 65576
>$ squeue 
>             JOBID PARTITION     NAME     USER ST       TIME  NODES
>NODELIST(REASON)
>             65575    prod.q   job.sh  hummelm PD       0:00      5
>(Resources)
>             65576    prod.q   job.sh  hummelm PD       0:00      1
>(Priority)
>             65574    prod.q   job.sh  hummelm  R       7:54      5
>OGSE[1-5]
>
>I hope someone here can show me the error I've made, thanks.
>
>(* slurm.conf )
>ControlMachine=OGSE1
>#
>AuthType=auth/munge
>CryptoType=crypto/munge
>MailProg=/bin/mail
>MpiDefault=none
>ProctrackType=proctrack/pgid
>ReturnToService=1
>SlurmctldPidFile=/var/run/slurmctld.pid
>SlurmctldPort=6817
>SlurmdPidFile=/var/run/slurmd.pid
>SlurmdPort=6818
>SlurmdSpoolDir=/var/spool/slurmd
>SlurmUser=root
>StateSaveLocation=/var/spool
>SwitchType=switch/none
>TaskPlugin=task/none
>
>InactiveLimit=0
>KillWait=30
>MinJobAge=300
>SlurmctldTimeout=120
>SlurmdTimeout=300
>Waittime=0
>#
>#
># SCHEDULING
>FastSchedule=1
>SchedulerType=sched/backfill
>SchedulerParameters=bf_interval=20,bf_resolution=10
>SchedulerPort=7321
>##### Round robin select for nodes
>#SelectType=select/cons_res
>#SelectTypeParameters=CR_LLN
>#
>#
># JOB PRIORITY
>PriorityType=priority/multifactor
>PriorityWeightPartition=1000
>############
>#
>#Preemption
>#PreemptMode=REQUEUE
>#PreemptType=preempt/partition_prio
>#
># LOGGING AND ACCOUNTING
>AccountingStorageType=accounting_storage/none
>
>ClusterName=cluster
>DebugFlags=Backfill
>JobCompType=jobcomp/none
>JobAcctGatherType=jobacct_gather/none
>SlurmctldDebug=6
>SlurmdDebug=1
>#
># COMPUTE NODES
>NodeName=OGSE[1-7] CPUs=12 State=UNKNOWN
>PartitionName=prod.q Nodes=OGSE[1-7] Default=YES MaxTime="01:00:00"
>State=UP Priority=10
>PartitionName=urgent.q Nodes=OGSE[1-7] Default=NO MaxTime="01:00:00"
>State=UP Priority=20
>
>

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
