Slurm does not move running jobs between nodes unless someone explicitly
removes nodes from, or adds nodes to, the job allocation.
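For reference, explicit changes of that kind go through scontrol. A minimal
sketch, assuming a Slurm release with resizable-job support (the job ID and
node count are purely illustrative):

    # Shrink running job 1234 down to 2 nodes; Slurm releases the excess
    # node(s) but never migrates a job to different nodes on its own.
    scontrol update JobId=1234 NumNodes=2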


Quoting Michael Colonno <[email protected]>:

>             This was seen with -N 1, restricting the job to one node. I'm
> not even certain what to call this feature / issue.
>
>
>
>             Thanks,
>
>             ~Mike C.
>
>
>
> From: David Bigagli
> Sent: Friday, March 22, 2013 9:42 AM
> To: slurm-dev
> Subject: [slurm-dev] Re: node switching / selection
>
>
>
> Is it possible the job runs on several nodes, say -N 3, and then one
> node is lost, so it ends up running on only 2 nodes? Such a job
> should have been submitted with --no-kill.
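A minimal submission sketch for that case (the script name and node count
are illustrative):

    # Request 3 nodes, but keep the job running on the surviving nodes if
    # one of them fails, instead of killing the whole job.
    sbatch -N 3 --no-kill myjob.sh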
>
>
>
> /David
>
> On Fri, Mar 22, 2013 at 4:06 PM, Michael Colonno
> <[email protected]> wrote:
>
>
>         I actually did mean node below. The job launched on a node and
> then, with no user input, later appeared to be running (or trying to
> run) on a different node. This is rare but happens from time to
> time. I'm not sure if this is the default scheduling algorithm
> trying to make things fit better.
>
>         Cheers,
>         ~Mike C.
>
>
> -----Original Message-----
> From: Marcin Stolarek
> Sent: Friday, March 22, 2013 1:43 AM
> To: slurm-dev
> Subject: [slurm-dev] Re: node switching / selection
>
>
> 2013/3/22 Michael Colonno <[email protected]>
>>
>>
>>         Hi Folks ~
>
> Hi,
>>
>>
>>         A couple of (hopefully) simple questions; I can't find
>> anything that obviously or easily solves these in the man pages. I
>> have a fairly ordinary deployment in which scheduling is done by
>> core so that some high-memory systems can be shared.
>>
>>         - Users have observed that jobs are sometimes being moved
>> from one node to another while running. This makes the particular
>> tool being used unhappy. Is there a way to prevent this, either
>> with a flag or a config file entry?
>
> By node do you mean CPU?
>
> If so, using ProctrackType=proctrack/cgroup (see man cgroup.conf)
> should solve your problem. If you are using a non-cgroup-aware kernel
> (for instance RHEL 5), you can use the cpuset SPANK plugin instead.
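A minimal configuration sketch along those lines (exact options and defaults
vary by Slurm version, so treat this as illustrative and check man slurm.conf
and man cgroup.conf):

    # slurm.conf -- track job processes via cgroups; task/cgroup is commonly
    # paired with it to confine tasks to their allocated resources
    ProctrackType=proctrack/cgroup
    TaskPlugin=task/cgroup

    # cgroup.conf -- keep tasks on the cores and memory they were allocated
    ConstrainCores=yes
    ConstrainRAMSpace=yes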
>
>
>>         - When scheduling by core, the default behavior seems to be
>> to fill up the first node with tasks, then move to the second, etc.
>> Since memory is being shared between tasks, it would be preferable
>> to select a node on which no other jobs (or the minimum number of
>> other jobs) are running before piling onto a node already running a
>> job(s). How can I tell Slurm the equivalent of "pick an unused
>> node first if available"?
>
>
> I'm not sure if it's possible. Do we have the possibility of changing
> the node allocation algorithm in Slurm (as in Moab/Maui)?
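One possibility, if the installed release supports it: the consumable
resources plugin has a least-loaded-nodes option that approximates "pick an
unused node first". A hedged slurm.conf sketch (CR_LLN does not exist in
older Slurm releases, so check man slurm.conf for your version):

    # slurm.conf -- schedule by core and memory, preferring the least
    # loaded nodes when allocating
    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory,CR_LLN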
>
>
> cheers,
> marcin
>
>
>
>
>
