This was seen with -N 1, restricting to one node. I'm not even
certain what to call this feature / issue.

 

            Thanks,

            ~Mike C. 

 

From: David Bigagli
Sent: Friday, March 22, 2013 9:42 AM
To: slurm-dev
Subject: [slurm-dev] Re: node switching / selection

 

Is it possible the job runs on several nodes, say -N 3, then one node is lost
so it ends up running on only 2 nodes? Such a job should have been submitted
with --no-kill.
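
A hedged example of such a submission (job.sh is a hypothetical batch script):

    # request 3 nodes, but keep the job running if an allocated node fails
    sbatch -N 3 --no-kill job.sh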

 

/David

On Fri, Mar 22, 2013 at 4:06 PM, Michael Colonno <[email protected]> wrote:


        I actually did mean node below. The job launched on a node and then, with
no user input, later appeared to be running (or trying to run) on a different
node. This is rare but happens from time to time. I'm not sure if this is the
default scheduling algorithm trying to make things fit better.

        Cheers,
        ~Mike C.


-----Original Message-----
From: Marcin Stolarek
Sent: Friday, March 22, 2013 1:43 AM
To: slurm-dev
Subject: [slurm-dev] Re: node switching / selection


2013/3/22 Michael Colonno <[email protected]>
>
>
>         Hi Folks ~

Hi,
>
>
>         A couple of (hopefully) simple questions; I can't find anything that
> obviously / easily solves these in the man pages. I have a fairly ordinary
> deployment in which scheduling is done by core so some high-memory systems
> can be shared.
>
>         - Users have observed that sometimes jobs are being moved from one
> node to another while running. This makes the particular tool being used
> unhappy. Is there a way to prevent this either with a flag or config file
> entry?

By node do you mean CPU?

If so, using ProctrackType=proctrack/cgroup (check the man page for cgroup.conf)
should solve your problem. If you are using a non-cgroup-aware kernel (for
instance RHEL 5), you can use the cpuset SPANK plugin.
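
A hedged sketch of what that could look like (TaskPlugin=task/cgroup and the
cgroup.conf values are my assumptions about a typical setup; check man slurm.conf
and man cgroup.conf for your version):

    # slurm.conf (sketch)
    ProctrackType=proctrack/cgroup
    TaskPlugin=task/cgroup      # assumed: confines each task to its allocated cores

    # cgroup.conf (sketch)
    CgroupAutomount=yes
    ConstrainCores=yes          # keep tasks on the cores Slurm assigned them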


>         - When scheduling by core the default behavior seems to be to fill up
> the first node with tasks, then move to the second, etc. Since memory is
> being shared between tasks it would be preferable to select a node on which
> no other jobs (or the minimum number of other jobs) are running before piling
> onto a node already running a job(s). How can I tell SLURM the equivalent of
> "pick an unused node first if available"?


I'm not sure if it's possible. Do we have the possibility of changing the node
allocation algorithm in Slurm (like in Moab/Maui)?
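
For what it's worth, a hedged sketch, assuming a Slurm release that exposes a
least-loaded-node selection parameter (CR_LLN); older releases may not have it:

    # slurm.conf (sketch)
    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory,CR_LLN   # prefer the least loaded nodes first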


cheers,
marcin

 
