Can it be related to a restart of the job due to node failure?
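By default Slurm will requeue a batch job after a node failure, in which case the restarted job can land on a different node than the one it started on; whether that applies here depends on the local configuration. A minimal sketch of the relevant controls, to be checked against the installed version's man pages:

    # slurm.conf: 1 = requeue batch jobs after a node failure (the default), 0 = let them fail
    JobRequeue=1

    # Per-job override at submission time, if a restart on another node is unwanted:
    sbatch --no-requeue job.sh

    # Check what the running controller is actually using:
    scontrol show config | grep -i JobRequeue

If a job was requeued this way, squeue would show it on a new node with no user action involved.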
On 22/03/2013 22:12, "Michael Colonno" <[email protected]> wrote:

> That was certainly my opinion as well, and I would not have believed it
> unless the users showed me two different squeue outputs showing the job
> running on a certain node at one point and then a different node later.
> They claim that nothing was done to change the job after launching it.
> Unfortunately I didn't have any accounting method set up at the time, so
> I will have to wait until if / when it happens again to look at the
> detailed history. I will update the thread if new data emerges.
>
> Thanks,
> ~Mike C.
>
> -----Original Message-----
> From: Moe Jette
> Sent: Friday, March 22, 2013 12:46 PM
> To: slurm-dev; Michael Colonno
> Subject: Re: [slurm-dev] Re: node switching / selection
>
> Slurm does not move jobs between nodes without someone explicitly
> removing nodes and adding other nodes to the job allocation.
>
> Quoting Michael Colonno <[email protected]>:
>
> > This was seen with -N 1, restricting to one node. I'm not even certain
> > what to call this feature / issue.
> >
> > Thanks,
> > ~Mike C.
> >
> > From: David Bigagli
> > Sent: Friday, March 22, 2013 9:42 AM
> > To: slurm-dev
> > Subject: [slurm-dev] Re: node switching / selection
> >
> > Is it possible the job runs on several nodes, say -N 3, then one node
> > is lost so it ends up running on 2 nodes only? Such a job should have
> > been submitted with --no-kill.
> >
> > /David
> >
> > On Fri, Mar 22, 2013 at 4:06 PM, Michael Colonno
> > <[email protected]> wrote:
> >
> > Actually, I did mean node below. The job launched on a node and then,
> > with no user input, later appeared to be running (or trying to run) on
> > a different node. This is rare but happens from time to time. I'm not
> > sure if this is the default scheduling algorithm trying to make things
> > fit better.
> >
> > Cheers,
> > ~Mike C.
> >
> > -----Original Message-----
> > From: Marcin Stolarek
> > Sent: Friday, March 22, 2013 1:43 AM
> > To: slurm-dev
> > Subject: [slurm-dev] Re: node switching / selection
> >
> > 2013/3/22 Michael Colonno <[email protected]>:
> >
> >> Hi Folks ~
> >
> > Hi,
> >
> >> A couple of (hopefully) simple questions; I can't find anything that
> >> obviously / easily solves these in the man pages. I have a fairly
> >> ordinary deployment in which scheduling is done by core so some
> >> high-memory systems can be shared.
> >>
> >> - Users have observed that sometimes jobs are being moved from one
> >> node to another while running. This makes the particular tool being
> >> used unhappy. Is there a way to prevent this, either with a flag or a
> >> config file entry?
> >
> > By node do you mean CPU? If so, using ProctrackType=proctrack/cgroup
> > (check the man page for cgroup.conf) should solve your problem; if you
> > are using a non-cgroup-aware kernel (for instance RHEL 5) you can use
> > the cpuset SPANK plugin.
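A minimal sketch of the cgroup-based setup described above; the option names follow the slurm.conf and cgroup.conf man pages, but exactly which are available depends on the Slurm version, so treat this as a starting point rather than a drop-in config:

    # slurm.conf
    ProctrackType=proctrack/cgroup     # track job processes with cgroups
    TaskPlugin=task/cgroup             # confine tasks to their allocated resources

    # cgroup.conf
    CgroupAutomount=yes
    ConstrainCores=yes                 # keep tasks on the CPUs they were allocated

With ConstrainCores=yes the kernel keeps a job's processes on the cores Slurm allocated, so tasks cannot drift onto CPUs belonging to other jobs on the same node.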
> >> - When scheduling by core, the default behavior seems to be to fill
> >> up the first node with tasks, then move to the second, etc. Since
> >> memory is being shared between tasks, it would be preferable to
> >> select a node on which no other jobs (or the minimum number of other
> >> jobs) are running before piling onto a node already running a job(s).
> >> How can I tell SLURM the equivalent of "pick an unused node first if
> >> available"?
> >
> > I'm not sure if it's possible. Do we have the possibility of changing
> > the node allocation algorithm in Slurm (like in Moab/Maui)?
> >
> > cheers,
> > marcin
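On the "pick an unused node first" question: with select/cons_res the default really is to pack jobs onto the fewest possible nodes. Two possibilities, sketched under the assumption of a cons_res setup and subject to what the installed release supports (the CR_LLN and per-partition LLN options only exist in newer Slurm versions; partition, node, and script names below are placeholders):

    # slurm.conf: prefer the least-loaded nodes when allocating (newer releases only)
    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory,CR_LLN

    # or per partition:
    PartitionName=highmem Nodes=hm[01-04] LLN=yes State=UP

    # Per-job workaround on any version: take whole nodes so nothing else shares them
    sbatch --exclusive -N 1 job.sh

The --exclusive route trades utilization for isolation, so it only makes sense when memory contention hurts more than leaving cores idle.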

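Mike mentions above that no accounting was set up when this happened. For next time, even a simple accounting or job-completion backend makes it possible to see which nodes a job actually touched; a sketch, with the caveat that the plugin names should be checked against the local installation:

    # slurm.conf: full accounting via slurmdbd, or a plain-text job completion log
    AccountingStorageType=accounting_storage/slurmdbd
    # JobCompType=jobcomp/filetxt
    # JobCompLoc=/var/log/slurm/jobcomp.log

    # Once accounting is on, the node history of a finished job can be pulled with:
    sacct -j <jobid> --format=JobID,Start,End,NodeList,State

A requeue after a node failure should be visible there (earlier runs of the same job id may require sacct's --duplicates flag), which would settle whether a restart is what the users saw.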