Weights appear to work well in Slurm 15.08, although I haven't tested them extensively.
If you wanted users to submit to a single compute partition, you could compile slurm with lua support and use a job_submit.lua file similar to this:
function slurm_job_submit(job_desc, part_list, submit_uid)
    if job_desc.partition == "compute" then
        job_desc.partition = "omnipath-nodes,infiniband-nodes"
    end
    return slurm.SUCCESS
end

Using this allows the user to submit to both partitions (unknowingly) and keeps the job from spanning both types of nodes.
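For reference, wiring the plugin up looks roughly like this (the /etc/slurm path below is only an example; the script just has to sit in the same directory as your slurm.conf, wherever that lives on your install):

# slurm.conf
JobSubmitPlugins=lua

# save the function above as job_submit.lua next to slurm.conf, e.g.
#   /etc/slurm/job_submit.lua
# then restart slurmctld so the plugin gets loaded.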
Create a compute partition that contains all of your nodes, as well as hidden "omnipath-nodes" and "infiniband-nodes" partitions. Set the weights to prefer the infiniband nodes (all else being equal, Slurm allocates lower-weight nodes first).
NodeName=n[1-40] Weight=2 Feature=opa ...(fill in the rest)
NodeName=n[41-80] Weight=1 ...(fill in the rest)
PartitionName=compute Nodes=n[1-80] State=UP
PartitionName=omnipath-nodes Nodes=n[1-40] State=UP Hidden=YES
PartitionName=infiniband-nodes Nodes=n[41-80] State=UP Hidden=YES

The "Feature=opa" set on the node line tags the omnipath nodes. If a user needs an omnipath node they can submit a job like 'sbatch -p compute --constraint=opa'. If the user just submits 'sbatch -p compute', the job should run on the infiniband nodes first if any are available.
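A quick way to sanity-check that the weights and features ended up where you expect (assuming a reasonably current sinfo):

sinfo -p compute -N -o "%N %f %w"

The columns are node name, features and scheduling weight; the n[41-80] nodes should show weight 1 and no opa feature.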
I just typed all this from memory so you might want to double-check, but this is how I would do it.
-------------------
Nicholas McCollum
HPC Systems Administrator
Alabama Supercomputer Authority

On Wed, 6 Jul 2016, Benjamin Redling wrote:
Hi,

On 07/06/2016 11:17, Laurent Facq wrote:
> i would like to use only one partition with the 80 nodes, and that users
> who need OPA nodes could add a constraint "OPA+IB" to choose OPA+IB nodes,
> and that users who don't need OPA are given IB nodes if some are free, and
> OPA+IB nodes ONLY if no more IB are free. the goal is to do a best effort
> to reserve the OPA nodes to users who need them.

In theory you could define different "weight"s for the node types -- and OPA as a GRES -- but only recently somebody on the list wrote that "weight" is _not_ working. Sadly I am missing test setups for all Slurm versions after 14.03.

(Or drop that restriction and use two partitions: one for the IB-only nodes with higher priority and one for the OPA nodes as an alternative with lower priority [see "Priority" and "Alternate" as parameters to partitions]. That way users wouldn't even need to specify a constraint explicitly -- just choose the lower-priority OPA partition.)

> i thought that if slurm searches free nodes from 1 to n, putting the
> OPA+IB nodes first in the numbering would do the trick, but it seems a
> little bit more complicated.

We are still on 2.3 (Debian 7.9) and I observe that allocation is done in the order I specify the nodes (I can test 14.03 in the days to come). E.g. (sched/backfill, select/cons_res, CR_Core_Memory):

PartitionName=express Alternate=QC32GBp Shared=NO Priority=200 Nodes=darwin,s17 Default=NO MaxTime=INFINITE State=UP
PartitionName=QC32GBp Default=NO Shared=NO Priority=100 Nodes=s17,s[2-7,9-13] MaxTime=INFINITE State=UP
PartitionName=MC20GBplus Shared=NO Priority=50 Nodes=s17,s[2-7,9-13],stemnet1 Default=YES MaxTime=INFINITE State=UP

s17 on the default partition was always first. And if users specified "express" the next lower-priority partition got used too as soon as it filled up (and yes: s17 is intentionally in all three).

Regards,
Benjamin

--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321
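(For anyone following along: adapted to the 80-node OPA/IB cluster discussed above, the two-partition idea would look something like the sketch below. Partition names and priority values are made up purely for illustration.)

PartitionName=ib  Nodes=n[41-80] Priority=200 Default=YES MaxTime=INFINITE State=UP
PartitionName=opa Nodes=n[1-40]  Priority=100 Default=NO  MaxTime=INFINITE State=UP

Jobs that need Omni-Path would be submitted with 'sbatch -p opa'; everything else lands on the default ib partition.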