Weights appear to work well in Slurm 15.08, although I haven't extensively tested it.

If you wanted users to submit to a single compute partition, you could compile slurm with lua support and use a job_submit.lua file similar to this:

function slurm_job_submit(job_desc, part_list, submit_uid)
  if job_desc.partition == "compute" then
    -- either hidden partition may run the job, but it will not span both node types
    job_desc.partition = "omnipath-nodes,infiniband-nodes"
  end
  return slurm.SUCCESS
end

-- the Lua job_submit plugin also expects slurm_job_modify to be defined
function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
  return slurm.SUCCESS
end
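
The plugin also needs to be enabled in slurm.conf (JobSubmitPlugins=lua), and the job_submit.lua file normally lives in the same directory as slurm.conf.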

Using this allows the user to submit to both partitions (unknowingly) and keeps the job from spanning both types of nodes.

Create a compute partition that contains all of your nodes, as well as hidden "omnipath-nodes" and "infiniband-nodes" partitions. Set the node weights to prefer the InfiniBand nodes (Slurm allocates the lowest-weight nodes that satisfy a job first).

NodeName=n[1-40] Weight=2 ...(fill in the rest) Feature=opa
NodeName=n[41-80] Weight=1 ...(fill in the rest)
PartitionName=compute Nodes=n[1-80] State=UP
PartitionName=omnipath-nodes Nodes=n[1-40] State=UP Hidden=YES
PartitionName=infiniband-nodes Nodes=n[41-80] State=UP Hidden=YES

You could then set a feature on the OmniPath nodes in the node list, like "Feature=opa". If a user needs an OmniPath node, they can submit a job like 'sbatch -p compute --constraint=opa'. If the user just submits 'sbatch -p compute', the job should run on the InfiniBand nodes first if they are available.
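
For example, assuming the configuration above ("job.sh" is just a placeholder for the job script):

# needs OmniPath: request the "opa" feature explicitly
sbatch -p compute --constraint=opa job.sh

# no constraint: the lower-weight InfiniBand nodes are preferred while they are free
sbatch -p compute job.sh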

I just typed all this from memory, so you might want to double-check, but this is how I would do it.

-------------------
Nicholas McCollum
HPC Systems Administrator
Alabama Supercomputer Authority

On Wed, 6 Jul 2016, Benjamin Redling wrote:


Hi,

On 07/06/2016 11:17, Laurent Facq wrote:
I would like to use only one partition with the 80 nodes, so that users
who need OPA nodes can add a constraint ("OPA+IB") to choose OPA+IB
nodes, and users who don't need OPA are given IB nodes if some are free,
and OPA+IB nodes ONLY if no more IB nodes are free.

The goal is to make a best effort to reserve the OPA nodes for users who
need them.

In theory you could define different "Weight"s for the node types -- and
OPA as a GRES -- but only recently somebody on the list wrote that
"Weight" was _not_ working. Sadly I am missing test setups for all Slurm
versions after 14.03.
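
A rough sketch of what that might look like in slurm.conf (node names,
counts and weights are invented here, and given the report above about
"Weight" this is untested):

GresTypes=opa
# lower Weight should be allocated first, so the IB-only nodes are preferred
NodeName=opa[01-40] Weight=20 Gres=opa:1 ...(fill in the rest)
NodeName=ib[01-40]  Weight=10 ...(fill in the rest)
PartitionName=compute Nodes=opa[01-40],ib[01-40] Default=YES State=UP

Users who really need OmniPath would then request it with something like
'sbatch --gres=opa:1' (older Slurm versions may also want a matching
gres.conf entry on the OPA nodes).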

(Or drop that restriction and use two partitions:
one for the IB-only nodes with higher priority and one for the OPA nodes
as an alternative with lower priority [see "Priority" and "Alternate" as
partition parameters].
That way users wouldn't even need to specify a constraint explicitly --
those who need OPA just choose the lower-priority OPA partition.)
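
Something along these lines (partition and node names invented; check the
exact Priority/Alternate semantics for your Slurm version in slurm.conf(5)):

PartitionName=ib  Nodes=ib[01-40]  Priority=200 Alternate=opa Default=YES State=UP
PartitionName=opa Nodes=opa[01-40] Priority=100 State=UP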


I thought that if Slurm searches free nodes from 1 to n, putting the OPA+IB
nodes first in the numbering would do the trick,
but it seems a little bit more complicated.

We are still on 2.3 (Debian 7.9) and I observe that allocation is done
in the order I specify the nodes (I can test 14.03 in the days to come):

E.g. (sched/backfill, select/cons_res, CR_Core_Memory)
PartitionName=express Alternate=QC32GBp Shared=NO Priority=200
Nodes=darwin,s17 Default=NO MaxTime=INFINITE State=UP
PartitionName=QC32GBp  Default=NO Shared=NO Priority=100
Nodes=s17,s[2-7,9-13] MaxTime=INFINITE State=UP
PartitionName=MC20GBplus Shared=NO Priority=50
Nodes=s17,s[2-7,9-13],stemnet1 Default=YES MaxTime=INFINITE State=UP

s17 on the default partition was always allocated first.
And if users specified "express", the next-lower-priority partition got
used too as soon as express filled up (and yes: s17 is intentionally in all
three).

Regards,
Benjamin
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321
