A bit more data: it seems the users are requesting both an 
allocation of cores and memory when submitting jobs but there is no guarantee 
(that I’m aware of) that the application is actually limited to the memory 
requested. Could this be the root cause of this state? 
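
For reference, a minimal sketch of one way the requested memory could be 
enforced, assuming the cluster is able to use Slurm's cgroup plugins (an 
assumption; nothing in this thread confirms how the cluster is set up). With 
CR_Core_Memory, memory is treated as a consumable resource when scheduling; 
confining a job to its request is configured separately:

# slurm.conf (sketch, not this cluster's actual file)
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup

# cgroup.conf (sketch)
ConstrainCores=yes
ConstrainRAMSpace=yes

These settings only address whether a job can exceed its request; whether 
that relates to the node state is a separate question.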

 

            Thanks,

            ~Mike C. 

 

From: Michael Colonno [mailto:[email protected]] 
Sent: Friday, September 26, 2014 11:49 AM
To: slurm-dev
Subject: [slurm-dev] Re: change in node sharing with new(er) version?

 

            Relevant portion of the config file is below – pretty vanilla and I 
don’t think that’s the cause after some more time spent debugging. The nodes in 
question are in state “drng” which I have not seen before. sinfo reports “Low 
RealMemory” for them and this must be the reason additional jobs aren’t being 
scheduled on the offending nodes. So it seems there have been some changes in 
resource monitoring. Prior to the upgrade more than one job would coexist on 
these systems without this warning (and may have been fighting for memory 
sometimes – unknown). 
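
A sketch of how this state can come about, assuming hypothetical node names 
and sizes (the node definitions are not shown in this thread). As far as I 
understand, "Low RealMemory" is recorded when slurmd registers with less 
memory than the RealMemory value configured in slurm.conf, and "drng" is 
sinfo's abbreviation for a draining node: running jobs finish, but no new 
ones are started.

# COMPUTE NODES (hypothetical values)
# If a node actually reports less than 64000 MB, it would be drained
# with reason "Low RealMemory".
NodeName=node[01-04] CPUs=16 RealMemory=64000 State=UNKNOWN

Running "slurmd -C" on a node prints the configuration slurmd detects, which 
can be compared against the configured line.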

 

            Thanks,

            ~Mike C. 

 

# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
FastSchedule=1

 

 

From: Morris Jette [mailto:[email protected]] 
Sent: Friday, September 26, 2014 11:44 AM
To: slurm-dev
Subject: [slurm-dev] Re: change in node sharing with new(er) version?

 

I can't think of any relevant changes. Your config files would help a lot.

On September 26, 2014 11:32:38 AM PDT, Michael Colonno <[email protected]> 
wrote:


 Hi All ~

 I just upgraded a cluster several versions (from 2.5.2 to 14.03.8); no 
changes were made to the config file (slurm.conf). Prior to the upgrade the 
cluster was configured to allow more than one job to run on a given node 
(specifying cores, memory, etc.). After the upgrade all jobs seem to be 
allocated as if they require exclusive nodes (or as if the --exclusive flag 
was used) and don't seem to be sharing nodes. I'm guessing there was a change 
in the config file syntax for resource allocation but I can't find anything in 
the docs. Any thoughts? 

 Thanks,
 ~Mike C. 


-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.