We have something similar, with several groups of users that each take priority on their own group's nodes.
Our solution uses a dedicated partition for those nodes, together with node-specific features and some work done at job submission by the job_submit lua plugin. Users from those groups submit jobs requesting the features associated with their own group's nodes. As a result, some jobs are submitted to more than one partition and have a different priority in each one.

This works for us, but it adds scheduling work, because a job submitted to more than one partition is handled by the scheduler as several jobs. Working with several partitions causes problems when there is a large number of jobs to schedule (tens of thousands). I have worked on improving the scheduler for this case, but it is only beta code for now.

This could be a good point to discuss at the next Slurm User Meeting.

On 07/09/2013 08:38 PM, Neil Van Lysel wrote:
> Hello,
>
> Is it possible to grant a user priority on X cores? For example, we have
> a small 768 core SLURM cluster, and we would like to give user A
> priority on only 512 cores. I am currently using QOS to give specific
> users priority on all cores, but I do not know how specify priority on X
> cores.
>
> Here's my slurm.conf file:
>
> ClusterName="aci"
> ControlMachine=aci-service-1
> BackupController=aci-service-2
> SlurmUser=slurm
> SlurmctldPort=6817
> SlurmdPort=6818
> AuthType=auth/munge
> StateSaveLocation=/tmp/slurmstate
> SlurmdSpoolDir=/tmp/slurmd
> SwitchType=switch/none
> MpiDefault=none
> MpiParams=ports=12000-13999
> SlurmctldPidFile=/var/run/slurmctld.pid
> SlurmdPidFile=/var/run/slurmd.pid
> ProctrackType=proctrack/pgid
> CacheGroups=0
> ReturnToService=0
> PropagateResourceLimitsExcept=MEMLOCK,NOFILE
> UsePAM=1
> SlurmctldTimeout=120
> SlurmdTimeout=300
> InactiveLimit=0
> MinJobAge=300
> KillWait=30
> Waittime=0
> SchedulerType=sched/backfill
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core
> FastSchedule=1
> PriorityType=priority/multifactor
> PriorityWeightQOS=1
> PreemptType=preempt/qos
> PreemptMode=cancel
> SlurmctldDebug=4
> SlurmctldLogFile=/var/log/slurm/slurmctld.log
> SlurmdDebug=4
> SlurmdLogFile=/var/log/slurm/slurmd.log
> JobCompType=jobcomp/none
> AccountingStorageType=accounting_storage/slurmdbd
> AccountingStorageHost=aci-service-1
> AccountingStorageLoc=slurm_acct_db
> JobAcctGatherType=jobacct_gather/linux
> JobAcctGatherFrequency=30
> NodeName=aci-[001-048] Sockets=2 CoresPerSocket=8 ThreadsPerCore=1
> CPUs=16 State=UNKNOWN
> PartitionName=aci Nodes=aci-[001-048] Default=YES MaxTime=INFINITE State=UP
>
> [root ~]# sacctmgr -p list qos
> Name|Priority|GraceTime|Preempt|PreemptMode|Flags|UsageThres|UsageFactor|...
> normal|0|00:00:00|low|cluster|||1.000000|||||||||||||||||
> low|0|00:00:00||cancel|||1.000000|||||||||||||||||
>
> If it matters, all of the machines in this cluster are running
> Scientific Linux 6.3 and running SLURM version 2.5.1.
>
> Any help is greatly appreciated.
>
> Thanks,
>
> Neil Van Lysel
> Center for High Throughput Computing
> University of Wisconsin - Madison
> [email protected]
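
P.S. To make the partition-plus-features idea from the top of this reply more concrete, here is a rough slurm.conf sketch adapted to the 768-core cluster in the question. It is only an illustration under assumptions: the split of nodes, the "groupA" feature, group and partition names, and the Priority values are all hypothetical and are not our production configuration.

```
# Sketch only. The node split, the "groupA" feature/group/partition names
# and the Priority values are assumptions made up for this example.

# Load a site-written job_submit.lua script at submission time
JobSubmitPlugins=lua

# 32 nodes x 16 cores = 512 cores tagged with the hypothetical group feature
NodeName=aci-[001-032] Sockets=2 CoresPerSocket=8 ThreadsPerCore=1 Feature=groupA State=UNKNOWN
NodeName=aci-[033-048] Sockets=2 CoresPerSocket=8 ThreadsPerCore=1 State=UNKNOWN

# General partition over all 768 cores, lower partition priority
PartitionName=aci Nodes=aci-[001-048] Default=YES MaxTime=INFINITE State=UP Priority=1

# Group partition over the 512 tagged cores, higher partition priority
PartitionName=groupA Nodes=aci-[001-032] AllowGroups=groupA MaxTime=INFINITE State=UP Priority=10
```

With a layout like this, the job_submit script would be the place to notice a job that requests the groupA feature and set its partition list to something like "groupA,aci", which is what produces the one-job-in-several-partitions behaviour I mentioned. How well partition priority interacts with the QOS-based preemption in the original configuration would need to be tested on the site.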
