Can it be related to a restart of the job due to node failure?
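By default Slurm will requeue a batch job after a node failure, in which case the restarted job can land on a different node than the one it started on; whether that applies here depends on the local configuration. A minimal sketch of the relevant controls, to be checked against the installed version's man pages:

    # slurm.conf: 1 = requeue batch jobs after a node failure (the default), 0 = let them fail
    JobRequeue=1

    # Per-job override at submission time, if a restart on another node is unwanted:
    sbatch --no-requeue job.sh

    # Check what the running controller is actually using:
    scontrol show config | grep -i JobRequeue

If a job was requeued this way, squeue would show it on a new node with no user action involved.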
On 22/03/2013 22:12, "Michael Colonno" <[email protected]> wrote:

> That was certainly my opinion as well, and I would not have believed it
> unless the users showed me two different squeue outputs showing the job
> running on a certain node at one point and then a different node later.
> They claim that nothing was done to change the job after launching it.
> Unfortunately I didn't have any accounting method set up at the time, so
> I will have to wait until if / when it happens again to look at the
> detailed history. I will update the thread if new data emerges.
>
> Thanks,
> ~Mike C.
>
> -----Original Message-----
> From: Moe Jette
> Sent: Friday, March 22, 2013 12:46 PM
> To: slurm-dev; Michael Colonno
> Subject: Re: [slurm-dev] Re: node switching / selection
>
> Slurm does not move jobs between nodes without someone explicitly
> removing nodes and adding other nodes to the job allocation.
>
> Quoting Michael Colonno <[email protected]>:
>
> > This was seen with -N 1, restricting to one node. I'm not even certain
> > what to call this feature / issue.
> >
> > Thanks,
> > ~Mike C.
> >
> > From: David Bigagli
> > Sent: Friday, March 22, 2013 9:42 AM
> > To: slurm-dev
> > Subject: [slurm-dev] Re: node switching / selection
> >
> > Is it possible the job runs on several nodes, say -N 3, then one node
> > is lost so it ends up running on 2 nodes only? Such a job should have
> > been submitted with --no-kill.
> >
> > /David
> >
> > On Fri, Mar 22, 2013 at 4:06 PM, Michael Colonno
> > <[email protected]> wrote:
> >
> > Actually, I did mean node below. The job launched on a node and then,
> > with no user input, later appeared to be running (or trying to run) on
> > a different node. This is rare but happens from time to time. I'm not
> > sure if this is the default scheduling algorithm trying to make things
> > fit better.
> >
> > Cheers,
> > ~Mike C.
> >
> > -----Original Message-----
> > From: Marcin Stolarek
> > Sent: Friday, March 22, 2013 1:43 AM
> > To: slurm-dev
> > Subject: [slurm-dev] Re: node switching / selection
> >
> > 2013/3/22 Michael Colonno <[email protected]>:
> >
> >> Hi Folks ~
> >
> > Hi,
> >
> >> A couple of (hopefully) simple questions; I can't find anything that
> >> obviously / easily solves these in the man pages. I have a fairly
> >> ordinary deployment in which scheduling is done by core so some
> >> high-memory systems can be shared.
> >>
> >> - Users have observed that sometimes jobs are being moved from one
> >> node to another while running. This makes the particular tool being
> >> used unhappy. Is there a way to prevent this, either with a flag or a
> >> config file entry?
> >
> > By node do you mean CPU? If so, using ProctrackType=proctrack/cgroup
> > (check the man page for cgroup.conf) should solve your problem; if you
> > are using a non-cgroup-aware kernel (for instance RHEL 5) you can use
> > the cpuset SPANK plugin.
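A minimal sketch of the cgroup-based setup described above; the option names follow the slurm.conf and cgroup.conf man pages, but exactly which are available depends on the Slurm version, so treat this as a starting point rather than a drop-in config:

    # slurm.conf
    ProctrackType=proctrack/cgroup     # track job processes with cgroups
    TaskPlugin=task/cgroup             # confine tasks to their allocated resources

    # cgroup.conf
    CgroupAutomount=yes
    ConstrainCores=yes                 # keep tasks on the CPUs they were allocated

With ConstrainCores=yes the kernel keeps a job's processes on the cores Slurm allocated, so tasks cannot drift onto CPUs belonging to other jobs on the same node.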
> >> - When scheduling by core, the default behavior seems to be to fill
> >> up the first node with tasks, then move to the second, etc. Since
> >> memory is being shared between tasks, it would be preferable to
> >> select a node on which no other jobs (or the minimum number of other
> >> jobs) are running before piling onto a node already running a job(s).
> >> How can I tell SLURM the equivalent of "pick an unused node first if
> >> available"?
> >
> > I'm not sure if it's possible. Do we have the possibility of changing
> > the node allocation algorithm in Slurm (like in Moab/Maui)?
> >
> > cheers,
> > marcin
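On the "pick an unused node first" question: with select/cons_res the default really is to pack jobs onto the fewest possible nodes. Two possibilities, sketched under the assumption of a cons_res setup and subject to what the installed release supports (the CR_LLN and per-partition LLN options only exist in newer Slurm versions; partition, node, and script names below are placeholders):

    # slurm.conf: prefer the least-loaded nodes when allocating (newer releases only)
    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory,CR_LLN

    # or per partition:
    PartitionName=highmem Nodes=hm[01-04] LLN=yes State=UP

    # Per-job workaround on any version: take whole nodes so nothing else shares them
    sbatch --exclusive -N 1 job.sh

The --exclusive route trades utilization for isolation, so it only makes sense when memory contention hurts more than leaving cores idle.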

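Mike mentions above that no accounting was set up when this happened. For next time, even a simple accounting or job-completion backend makes it possible to see which nodes a job actually touched; a sketch, with the caveat that the plugin names should be checked against the local installation:

    # slurm.conf: full accounting via slurmdbd, or a plain-text job completion log
    AccountingStorageType=accounting_storage/slurmdbd
    # JobCompType=jobcomp/filetxt
    # JobCompLoc=/var/log/slurm/jobcomp.log

    # Once accounting is on, the node history of a finished job can be pulled with:
    sacct -j <jobid> --format=JobID,Start,End,NodeList,State

A requeue after a node failure should be visible there (earlier runs of the same job id may require sacct's --duplicates flag), which would settle whether a restart is what the users saw.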