That was certainly my opinion as well, and I would not have believed it
had the users not shown me two different squeue outputs: the job running
on one node at one point and on a different node later. They claim that
nothing was done to change the job after launching it. Unfortunately I
didn't have any accounting set up at the time, so I will have to wait
until it happens again (if it does) to look at the detailed history. I
will update the thread if new data emerges.
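
In the meantime, for reference, a minimal accounting setup along the
following lines should capture the node history next time around (a
sketch only; the storage type, file path, and job ID are placeholders
I haven't verified on this cluster):

    # slurm.conf -- minimal job accounting (sketch)
    JobAcctGatherType=jobacct_gather/linux
    AccountingStorageType=accounting_storage/filetxt
    AccountingStorageLoc=/var/log/slurm/accounting

    # afterwards, the allocated node list can be checked with e.g.:
    sacct -j <jobid> --format=JobID,NodeList,Start,End,State
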
Thanks,
~Mike C.
-----Original Message-----
From: Moe Jette
Sent: Friday, March 22, 2013 12:46 PM
To: slurm-dev; Michael Colonno
Subject: Re: [slurm-dev] Re: node switching / selection
Slurm does not move jobs between nodes without someone explicitly removing
nodes and adding other nodes to the job allocation.
Quoting Michael Colonno <[email protected]>:
> This was seen with -N 1, restricting the job to one node. I'm not
> even certain what to call this feature / issue.
>
>
>
> Thanks,
>
> ~Mike C.
>
>
>
> From: David Bigagli
> Sent: Friday, March 22, 2013 9:42 AM
> To: slurm-dev
> Subject: [slurm-dev] Re: node switching / selection
>
>
>
> Is it possible the job runs on several nodes, say -N 3, then one node
> is lost so it ends up running on 2 nodes only? Such a job should have
> been submitted with --no-kill.
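>
> A minimal example of the submission David describes (job.sh here is
> just a placeholder batch script):
>
>     sbatch -N 3 --no-kill job.sh
>
> With --no-kill the job is not automatically terminated if one of its
> allocated nodes fails; it keeps running on the nodes that remain.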
>
>
>
> /David
>
> On Fri, Mar 22, 2013 at 4:06 PM, Michael Colonno
> <[email protected]> wrote:
>
>
> I actually did mean node below. The job launched on a node and
> then, with no user input, later appeared to be running (or trying to
> run) on a different node. This is rare but happens from time to
> time. I'm not sure if this is the default scheduling algorithm
> trying to make things fit better.
>
> Cheers,
> ~Mike C.
>
>
> -----Original Message-----
> From: Marcin Stolarek
> Sent: Friday, March 22, 2013 1:43 AM
> To: slurm-dev
> Subject: [slurm-dev] Re: node switching / selection
>
>
> 2013/3/22 Michael Colonno <[email protected]>
>>
>>
>> Hi Folks ~
>
> Hi,
>>
>>
>> A couple of (hopefully) simple questions; I can't find
>> anything in the man pages that obviously / easily solves these. I
>> have a fairly ordinary deployment in which scheduling is done by
>> core so that some high-memory systems can be shared.
>>
>> - Users have observed that sometimes jobs are being moved
>> from one node to another while running. This makes the particular
>> tool being used unhappy. Is there a way to prevent this, either
>> with a flag or a config file entry?
>
> By node do you mean CPU?
>
> If so, using ProctrackType=proctrack/cgroup (see the cgroup.conf man
> page) should solve your problem. If you are running a kernel that is
> not cgroup-aware (for instance RHEL 5), you can use the cpuset SPANK
> plugin instead.
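>
> A minimal sketch of the relevant configuration (assuming a
> cgroup-capable kernel; the TaskPlugin line is an addition here,
> commonly paired with the cgroup proctrack plugin, and it is what
> actually confines tasks to their allocated cores):
>
>     # slurm.conf
>     ProctrackType=proctrack/cgroup
>     TaskPlugin=task/cgroup
>
>     # cgroup.conf
>     CgroupAutomount=yes
>     ConstrainCores=yes
>
> After slurmctld and the slurmd daemons are restarted, each job step is
> pinned to the CPUs it was allocated, so it cannot drift onto cores
> belonging to another job.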
>
>
>> - When scheduling by core the default behavior seems to be
>> to fill up the first node with tasks, then move to the second, etc.
>> Since memory is being shared between tasks it would be preferable
>> to select a node on which no other jobs (or the minimum number of
>> other jobs) are running before piling onto a node already running a
>> job(s). How can I tell SLURM the equivalent of "pick an unused
>> node first if available"?
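>>
>> (For what it's worth, one blunt workaround sketch would be to request
>> nodes exclusively so nothing else lands on them, e.g.
>>
>>     sbatch --exclusive -n 4 job.sh
>>
>> but that gives up the core-level sharing the cluster was set up for,
>> so a "least-loaded node first" policy would still be preferable.)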
>
>
> I'm not sure if it's possible. Is there a way to change the node
> allocation algorithm in Slurm (like in Moab/Maui)?
>
>
> cheers,
> marcin
>
>
>
>
>