Correction: a GrpNodes or MaxNodes limit on the normal qos will not cause
the PartitionNodeLimit problem. It is the qos's PartitionMaxNodes flag that
exempts the job from the partition's node limits.
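
For anyone who wants to check this on their own cluster, here is a sketch
of the relevant sacctmgr commands. The qos name "normal" follows the thread
above; the format= field list and the flags-= syntax assume a reasonably
recent sacctmgr, so adjust to your site:

    # Show the flags and node limits on each qos; look for
    # PartitionMaxNodes in the Flags column of the normal qos.
    sacctmgr show qos format=Name,Flags,GrpNodes,MaxNodes

    # If the flag is set and you do NOT want jobs exempted from the
    # partition's node limits, clear it (requires admin privileges):
    sacctmgr modify qos normal set flags-=PartitionMaxNodes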

> -----Original Message-----
> From: [email protected] [mailto:owner-slurm-
> [email protected]] On Behalf Of Lipari, Don
> Sent: Monday, October 17, 2011 8:54 AM
> To: [email protected]
> Subject: RE: [slurm-dev] Some jobs lose their priority with
> Reason=PartitionNodeLimit
> 
> Lennart,
> 
> Your slurmctld.log may contain more info regarding whether a max or min
> nodes limit is being exceeded:
> 
> Example:  Job xxx requested too many nodes of partition...
> 
> However, it could also be caused by a node limit on the normal qos:
> 
> Run "sacctmgr show qos" to see whether there's a GrpNodes or MaxNodes
> limit set for the normal qos.
> 
> Don
> 
> > -----Original Message-----
> > From: [email protected] [mailto:owner-slurm-
> > [email protected]] On Behalf Of Lennart Karlsson
> > Sent: Saturday, October 15, 2011 3:23 AM
> > To: [email protected]
> > Cc: HAUTREUX Matthieu
> > Subject: Re: [slurm-dev] Some jobs lose their priority with
> > Reason=PartitionNodeLimit
> >
> > On 10/14/2011 03:02 PM, HAUTREUX Matthieu wrote:
> > > Lennart,
> > >
> > > I might be wrong, but it seems that your nodes are already
> > > allocated, as they all have the "alloc" state in "salloc -p node".
> > >
> > > As you have configured the partition asking for "Shared=Exclusive",
> > > as soon as one core is allocated on a node, the whole node is
> > > allocated exclusively.
> > > As a result, your submission is pending, waiting for free nodes to
> > > run. As soon as a running job holding 4 nodes finishes, your job
> > > should be started.
> > >
> > > HTH
> > > Matthieu
> >
> >
> > Matthieu,
> >
> > You have answered why my job is not starting, which is a
> > straightforward answer to one of the most common questions you get.
> > Thanks for answering, but my worry is a more specific
> > detail: my PartitionNodeLimit problem.
> >
> > My question is why the job loses its normal priority, goes
> > down to the lowest priority and gets marked with
> > the PartitionNodeLimit label.
> >
> > I would like the job to keep its normal priority, so it has
> > a normal chance to start later on.
> >
> > Cheers,
> > -- Lennart Karlsson, UPPMAX, Uppsala University, Sweden
> >     http://www.uppmax.uu.se
> 

