[slurm-dev] Re: Running jobs are stopped and requeued when adding new nodes

2017-10-23 Thread Ole Holm Nielsen


Hi Jin,

I think that I always do your steps 3 and 4 in the opposite order: restart 
slurmctld first, then slurmd on the nodes:


> 3. Restart the slurmd on all nodes
> 4. Restart the slurmctld
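
For concreteness, a minimal sketch of that order (the host names, pdsh and 
the systemd units are assumptions on my part, not taken from your setup):

  # 1-2: edit slurm.conf and copy it to every node first
  # then restart the controller so it picks up the new node list
  systemctl restart slurmctld
  # then restart slurmd on all compute nodes
  pdsh -w compute[001-004] systemctl restart slurmd
  # finally verify that the controller and the nodes agree
  sinfo -N -l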

Since you run a very old Slurm 15.08, perhaps you should upgrade 
15.08 -> 16.05 -> 17.02.  17.11 will be released soon.  FYI: I wrote some 
notes about upgrading: 
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm
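
For each hop the generally documented order is: slurmdbd first, then 
slurmctld, then slurmd on the nodes.  A rough sketch, assuming systemd and 
package-based installs (back up the StateSaveLocation and the accounting 
database before you start):

  systemctl stop slurmdbd      # upgrade the slurmdbd package, then:
  systemctl start slurmdbd
  systemctl stop slurmctld     # upgrade the controller packages, then:
  systemctl start slurmctld
  # after upgrading the slurmd packages on the compute nodes:
  pdsh -w compute[001-004] systemctl restart slurmd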


/Ole



On 10/23/2017 02:55 PM, JinSung Kang wrote:

Hi

Thanks, everyone, for your responses. I have also tested removing nodes 
from the cluster, and the same thing happens.


*To answer some of the previous questions.*
"Node compute004 appears to have a different slurm.conf than the 
slurmctld" error comes up when I replace slurm.conf in all the devices, 
but it goes away when I restart slurmctld.


The Slurm version I'm running is 15.08.7.

I've included the slurm.conf rather than slurmdbd.conf.

Cheers,

Jin


On Mon, Oct 23, 2017 at 8:25 AM Ole Holm Nielsen wrote:



Hi Jin,

Your slurmctld.log says "Node compute004 appears to have a different
slurm.conf than the slurmctld" etc.  This will happen if you didn't copy
the slurm.conf correctly to all the nodes.  Please correct this
potential error.

Also, please specify which version of Slurm you're running.

/Ole

On 10/22/2017 08:44 PM, JinSung Kang wrote:
 > I am having trouble adding new nodes to the Slurm cluster without
 > killing the jobs that are currently running.
 >
 > Right now I
 >
 > 1. Update the slurm.conf and add a new node to it
 > 2. Copy new slurm.conf to all the nodes,
 > 3. Restart the slurmd on all nodes
 > 4. Restart the slurmctld
 >
 > But when I restart slurmctld, all the jobs that were running are
 > requeued, with (Begin Time) as the reason for not running.  The newly
 > added node works perfectly fine.
 >
 > I've included the slurm.conf. I've also included slurmctld.log output
 > when I'm trying to add the new node.



[slurm-dev] Re: Running jobs are stopped and requeued when adding new nodes

2017-10-23 Thread JinSung Kang
Hi

Thanks, everyone, for your responses. I have also tested removing nodes
from the cluster, and the same thing happens.

*To answer some of the previous questions.*
"Node compute004 appears to have a different slurm.conf than the slurmctld"
error comes up when I replace slurm.conf in all the devices, but it goes
away when I restart slurmctld.

The Slurm version I'm running is 15.08.7.

I've included the slurm.conf rather than slurmdbd.conf.

Cheers,

Jin


On Mon, Oct 23, 2017 at 8:25 AM Ole Holm Nielsen wrote:

>
> Hi Jin,
>
> Your slurmctld.log says "Node compute004 appears to have a different
> slurm.conf than the slurmctld" etc.  This will happen if you didn't copy
> the slurm.conf correctly to all the nodes.  Please correct this potential
> error.
>
> Also, please specify which version of Slurm you're running.
>
> /Ole
>
> On 10/22/2017 08:44 PM, JinSung Kang wrote:
> > I am having trouble adding new nodes to the Slurm cluster without
> > killing the jobs that are currently running.
> >
> > Right now I
> >
> > 1. Update the slurm.conf and add a new node to it
> > 2. Copy new slurm.conf to all the nodes,
> > 3. Restart the slurmd on all nodes
> > 4. Restart the slurmctld
> >
> > But when I restart slurmctld, all the jobs that were running are
> > requeued, with (Begin Time) as the reason for not running. The newly
> > added node works perfectly fine.
> >
> > I've included the slurm.conf. I've also included slurmctld.log output
> > when I'm trying to add the new node.
>




[slurm-dev] Re: Running jobs are stopped and requeued when adding new nodes

2017-10-23 Thread Ole Holm Nielsen


Hi Jin,

Your slurmctld.log says "Node compute004 appears to have a different
slurm.conf than the slurmctld" etc.  This will happen if you didn't copy 
the slurm.conf correctly to all the nodes.  Please correct this potential error.
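
One quick way to check this (the config path and host names below are just 
examples) is to compare checksums of slurm.conf on the controller and on 
the nodes:

  md5sum /etc/slurm/slurm.conf
  pdsh -w compute[001-004] md5sum /etc/slurm/slurm.conf | dshbak -c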


Also, please specify which version of Slurm you're running.

/Ole

On 10/22/2017 08:44 PM, JinSung Kang wrote:
I am having trouble adding new nodes to the Slurm cluster without 
killing the jobs that are currently running.


Right now I

1. Update the slurm.conf and add a new node to it
2. Copy new slurm.conf to all the nodes,
3. Restart the slurmd on all nodes
4. Restart the slurmctld

But when I restart slurmctld, all the jobs that were running are 
requeued, with (Begin Time) as the reason for not running.  The newly 
added node works perfectly fine.


I've included the slurm.conf. I've also included slurmctld.log output 
when I'm trying to add the new node.


[slurm-dev] Re: Running jobs are stopped and requeued when adding new nodes

2017-10-23 Thread Bjørn-Helge Mevik
Ole Holm Nielsen writes:

> I have added nodes to an existing partition several times using the same
> procedure which you describe, and no bad side effects have been noticed. This
> is a very normal kind of operation in a cluster, where hardware may be added
> or retired from time to time, while the cluster of course continues its normal
> production.  We must be able to do this, especially when transferring existing
> nodes into a new Slurm cluster.

I too have done the same a lot of times, and never seen any problem like
this.

> Douglas Jacobsen explained very well why problems may arise.  It seems to me
> that this completely rigid nodelist bit mask in the network is a Slurm design
> problem, and that it ought to be fixed.

The bitmask design is for speed, and given the problem of getting the
backfiller to be fast enough under certain loads (lots of small,
distributed jobs running, and a long queue of pending jobs), I
personally wouldn't want SchedMD to sacrifice that just to make node-list
updates easier.  Especially since I haven't seen the problem JinSung
Kang reports. :)

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo




[slurm-dev] Re: Running jobs are stopped and requeued when adding new nodes

2017-10-23 Thread Ole Holm Nielsen


I have added nodes to an existing partition several times using the same 
procedure which you describe, and no bad side effects have been noticed. 
 This is a very normal kind of operation in a cluster, where hardware 
may be added or retired from time to time, while the cluster of course 
continues its normal production.  We must be able to do this, especially 
when transferring existing nodes into a new Slurm cluster.


Douglas Jacobsen explained very well why problems may arise.  It seems 
to me that this completely rigid nodelist bit mask in the network is a 
Slurm design problem, and that it ought to be fixed.


Question: How can we pinpoint the problem more precisely in a bug report 
to SchedMD (for support customers only :-)?


/Ole


On 10/22/2017 08:44 PM, JinSung Kang wrote:
I am having trouble adding new nodes to the Slurm cluster without 
killing the jobs that are currently running.


Right now I

1. Update the slurm.conf and add a new node to it
2. Copy new slurm.conf to all the nodes,
3. Restart the slurmd on all nodes
4. Restart the slurmctld

But when I restart slurmctld, all the jobs that were running are 
requeued, with (Begin Time) as the reason for not running.  The newly 
added node works perfectly fine.


I've included the slurm.conf. I've also included slurmctld.log output 
when I'm trying to add the new node.


[slurm-dev] Re: Running jobs are stopped and requeued when adding new nodes

2017-10-23 Thread Merlin Hartley
A workaround is to pre-configure future nodes and mark them as down; then, 
when you add them, you can simply mark them as up again 
(see the DownNodes parameter).
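
A rough slurm.conf sketch of that idea (node names and hardware sizes are 
made up for illustration):

  # define the spare nodes up front so the node list never changes later
  NodeName=compute[001-004] CPUs=16 RealMemory=64000 State=UNKNOWN
  NodeName=compute[005-008] CPUs=16 RealMemory=64000 State=UNKNOWN
  DownNodes=compute[005-008] State=DOWN Reason="reserved for future hardware"
  PartitionName=batch Nodes=compute[001-008] Default=YES State=UP

  # when the hardware actually arrives, bring a node up without touching the node list
  scontrol update NodeName=compute005 State=RESUME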

Hope this helps!


Merlin
--
Merlin Hartley
Computer Officer
MRC Mitochondrial Biology Unit
Cambridge, CB2 0XY
United Kingdom

> On 22 Oct 2017, at 19:55, Douglas Jacobsen  wrote:
> 
> You cannot change the nodelist without draining the system of running jobs 
> (terminating all slurmstepd) and restarting all slurmd and slurmctld.  This 
> is because Slurm uses a bit mask to represent the nodelist and a 
> hierarchical overlay communication network. If all daemons don't have the 
> same idea of that network, you can run into communication problems, which can 
> cause nodes to be marked down, killing the jobs running on them.
> 
> I think if you are not using message aggregation, you might be able to get 
> away with leaving jobs running and just restarting all slurmd and slurmctld.  
> But the tricky thing is you'll need to quiesce a lot of the RPCs on the 
> system, which can partially be done by marking partitions down, but not 
> completely.
> 
> If you are thinking of adding nodes, I think you should look at the future 
> state that nodes can take. I haven't played with this, but I suspect it might 
> buy you some flexibility.
> 
> On Oct 22, 2017 11:43, "JinSung Kang"  wrote:
> Hello,
> 
> I am having trouble adding new nodes to the Slurm cluster without killing 
> the jobs that are currently running.
> 
> Right now I 
> 
> 1. Update the slurm.conf and add a new node to it
> 2. Copy new slurm.conf to all the nodes,
> 3. Restart the slurmd on all nodes
> 4. Restart the slurmctld
> 
> But when I restart slurmctld, all the jobs that were running are 
> requeued, with (Begin Time) as the reason for not running. The newly added node 
> works perfectly fine.
> 
> I've included the slurm.conf. I've also included slurmctld.log output when 
> I'm trying to add the new node.
> 
> Cheers,
> 
> Jin



[slurm-dev] Re: Running jobs are stopped and requeued when adding new nodes

2017-10-22 Thread Douglas Jacobsen
You cannot change the nodelist without draining the system of running jobs
(terminating all slurmstepd) and restarting all slurmd and slurmctld.  This
is because Slurm uses a bit mask to represent the nodelist and a
hierarchical overlay communication network. If all daemons don't have the
same idea of that network, you can run into communication problems, which can
cause nodes to be marked down, killing the jobs running on them.

I think if you are not using message aggregation, you might be able to get
away with leaving jobs running and just restarting all slurmd and
slurmctld.  But the tricky thing is you'll need to quiesce a lot of the
RPCs on the system, which can partially be done by marking partitions down,
but not completely.

If you are thinking of adding nodes, I think you should look at the future
state that nodes can take. I haven't played with this, but I suspect it
might buy you some flexibility.
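
Along those lines, a rough sketch of what I mean (names and sizes are 
placeholders, and as said I haven't tested this):

  # slurm.conf: nodes declared with State=FUTURE are defined in the
  # configuration up front but are not scheduled until put into service
  NodeName=compute[005-008] CPUs=16 RealMemory=64000 State=FUTURE

  # later, depending on the Slurm version, bring a node into service with
  # scontrol, or by changing its State in slurm.conf and reconfiguring
  scontrol update NodeName=compute005 State=RESUME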

On Oct 22, 2017 11:43, "JinSung Kang"  wrote:

> Hello,
>
> I am having trouble adding new nodes to the Slurm cluster without
> killing the jobs that are currently running.
>
> Right now I
>
> 1. Update the slurm.conf and add a new node to it
> 2. Copy new slurm.conf to all the nodes,
> 3. Restart the slurmd on all nodes
> 4. Restart the slurmctld
>
> But when I restart slurmctld, all the jobs that were running are
> requeued, with (Begin Time) as the reason for not running. The newly added node
> works perfectly fine.
>
> I've included the slurm.conf. I've also included slurmctld.log output when
> I'm trying to add the new node.
>
> Cheers,
>
> Jin
>