Christopher Samuel <[email protected]> writes: > On 05/02/14 09:27, Lyn Gerner wrote: > >> You might check out the Weight parameter in the Node section of the >> slurm.conf documentation. I believe you could just give the fat nodes >> a higher node weight than the thinner nodes, to achieve your goal. > > We use it to ensure that our Xeon Phi nodes are allocated after nodes > that don't have them, and that our 512GB nodes are allocated after the > 256GB nodes. Of course the 1 node that has both Xeon Phi AND 512GB is > very heavily weighted against. :-) > > Here's the snippet from our slurm.conf (you can see from the Gres and > RealMemory directives which are which): > > NodeName=barcoo[001-058] NodeAddr=barcoo[001-058] RealMemory=250000 Weight=2 > NodeName=barcoo[059-060] NodeAddr=barcoo[059-060] RealMemory=500000 > Weight=1000 > NodeName=barcoo061 NodeAddr=barcoo061 RealMemory=500000 > Gres=mic:2 Weight=100000 > NodeName=barcoo[062-070] NodeAddr=barcoo[062-070] RealMemory=250000 > Gres=mic:2 Weight=100 > > cheers, > Chris
We do already use weighting, but my understanding is that it only affects
the order in which nodes are selected; it does not hold a job back when
resources are available. I assume there is some valid reason for the job
waiting, but it is not apparent to me. It would be helpful to see exactly
which resources a job is waiting for, but I haven't come across a way to
do that.

Cheers,

Loris

--
This signature is currently under construction.
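[A minimal sketch of how to inspect why a job is still pending, using
standard Slurm commands; the job ID 12345 is a placeholder:]

    # Show the pending job's state and the scheduler's stated reason
    # (%R prints the pending reason, e.g. Resources or Priority)
    squeue -j 12345 -o "%.10i %.9P %.2t %R"

    # The full job record also carries a Reason= field
    scontrol show job 12345 | grep -i Reason

    # Estimated start time for the pending job, if the scheduler
    # can compute one
    squeue --start -j 12345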
