Correction: a GrpNodes or MaxNodes limit on the normal qos will not cause the PartitionNodeLimit problem. It is the qos's PartitionMaxNodes flag that exempts a job from the partition's node limits.
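As a sketch of how to check this (assuming `sacctmgr` is available, the QOS is named `normal`, and a Slurm version of this era where GrpNodes/MaxNodes are still QOS fields):

```shell
# Show the flags and node limits on each QOS. If the PartitionMaxNodes
# flag is set, jobs under that QOS are exempt from partition node limits.
sacctmgr show qos format=Name,Flags,GrpNodes,MaxNodes

# Hypothetical example: add the PartitionMaxNodes flag to the "normal"
# QOS (requires Slurm administrator privileges).
sacctmgr modify qos normal set flags+=PartitionMaxNodes
```

The exact field names accepted by `format=` vary between Slurm releases, so check the sacctmgr man page for your version.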
> -----Original Message-----
> From: [email protected] [mailto:owner-slurm-
> [email protected]] On Behalf Of Lipari, Don
> Sent: Monday, October 17, 2011 8:54 AM
> To: [email protected]
> Subject: RE: [slurm-dev] Some jobs lose their priority with
> Reason=PartitionNodeLimit
>
> Lennart,
>
> Your slurmctld.log may contain more info regarding whether a max or min
> nodes limit is being exceeded:
>
> Example: Job xxx requested too many nodes of partition...
>
> However, it could also be caused by a node limit on the normal qos:
>
> Run "sacctmgr show qos" to see whether there's a GrpNodes or MaxNodes
> limit set for the normal qos.
>
> Don
>
> > -----Original Message-----
> > From: [email protected] [mailto:owner-slurm-
> > [email protected]] On Behalf Of Lennart Karlsson
> > Sent: Saturday, October 15, 2011 3:23 AM
> > To: [email protected]
> > Cc: HAUTREUX Matthieu
> > Subject: Re: [slurm-dev] Some jobs lose their priority with
> > Reason=PartitionNodeLimit
> >
> > On 10/14/2011 03:02 PM, HAUTREUX Matthieu wrote:
> > > Lennart,
> > >
> > > I might be wrong, but it seems that your nodes are already allocated,
> > > as they all have the "alloc" state in "salloc -p node".
> > >
> > > As you have configured the partition with "Shared=Exclusive", as
> > > soon as one core is allocated on a node, the whole node is allocated
> > > exclusively.
> > > As a result, your submission is pending, waiting for free nodes.
> > > As soon as a running job holding 4 nodes finishes, your job
> > > should start.
> > >
> > > HTH
> > > Matthieu
> >
> > Matthieu,
> >
> > You answer why my job is not starting, a straightforward
> > answer to one of the most common questions you get.
> > Thanks for answering, but I worry about a more specific
> > detail: my PartitionNodeLimit problem.
> >
> > My question is why the job loses its normal priority, drops
> > to the lowest priority, and gets marked with the
> > PartitionNodeLimit label.
> >
> > I would like the job to keep its normal priority, so it has
> > a normal chance to start later on.
> >
> > Cheers,
> > -- Lennart Karlsson, UPPMAX, Uppsala University, Sweden
> > http://www.uppmax.uu.se
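For anyone hitting the same symptom, a minimal sketch of how to see the reason and priority Lennart describes (assuming a running Slurm cluster; the job id 12345 here is hypothetical):

```shell
# Show job id, state, pending reason, and priority for one job.
# %r prints the Reason field (e.g. PartitionNodeLimit); %Q the priority.
squeue -j 12345 -o "%.10i %.10T %.20r %.10Q"

# scontrol reports the same Reason and Priority fields in more detail.
scontrol show job 12345
```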
