Re: [slurm-users] Migration of slurm communication network / Steps / how to

2023-04-23 Thread Ole Holm Nielsen

On 4/24/23 06:58, Purvesh Parmar wrote:
> thank you, but its change of hostnames as well, apart from ip addresses
> as well of the slurm server, database server name and slurmd compute
> nodes as well.


I suggest that you talk to your networking people and request that the old 
DNS names be created in the new network's DNS for your Slurm cluster. 
Then Ryan's solution will work.  Changing DNS names is a very simple matter!
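
A minimal sketch of the same idea using hosts files instead of DNS. The node names node01..node08 and the address scheme are hypothetical; only the 172.16.1.x prefix and the node count come from this thread:

```shell
#!/bin/sh
# Print hosts-file style lines that keep the OLD host names pointing at
# NEW addresses. Node names and host numbers below are hypothetical.
new_prefix="172.16.1"   # new LAN prefix mentioned in the thread

# Emit "address<TAB>name" pairs for node01..node08 (8 nodes, as in the thread).
gen_hosts() {
    i=1
    while [ "$i" -le 8 ]; do
        # Hypothetical scheme: node N gets host number 10+N.
        printf '%s.%d\tnode%02d\n' "$new_prefix" "$((10 + i))" "$i"
        i=$((i + 1))
    done
}

gen_hosts
```

The output could be reviewed and then appended to the hosts file on every machine, or used as a checklist for the DNS records requested from the networking people.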


My 2 cents,
Ole


On Mon, 24 Apr 2023 at 10:04, Ryan Novosielski wrote:


I think it’s easier than all of this. Are you actually changing names
of all of these things, or just IP addresses? If they all resolve to
an IP now and you can bring everything down and change the hosts files
or DNS, it seems to me that if the names aren’t changing, that’s that.
I know that “scontrol show cluster” will show the wrong IP address but
I think that updates itself.

The names of the servers are in slurm.conf, but again, if the names
don’t change, that won’t matter. If you have IPs there, you will need
to change them.

Sent from my iPhone

 > On Apr 23, 2023, at 14:01, Purvesh Parmar <purveshp0...@gmail.com> wrote:
 > 
 > Hello,
 >
 > We have slurm 21.08 on ubuntu 20. We have a cluster of 8 nodes.
 > Entire slurm communication happens over 192.168.5.x network (LAN).
 > However as per requirement, now we are migrating the cluster to other
 > premises and there we have 172.16.1.x (LAN). I have to migrate the
 > entire network including SLURMDBD (mariadb), SLURMCTLD, SLURMD. ALso
 > the cluster network is also changing from 192.168.5.x to 172.16.1.x
 > and each node will be assigned the ip address from the 172.16.1.x
 > network.
 > The cluster has been running for the last 3 months and it is
 > required to maintain the old usage stats as well.
 >
 >
 >  Is the procedure correct as below :
 >
 > 1) Stop slurm
 > 2) suspend all the queued jobs
 > 3) backup slurm database
 > 4) change the slurm & munge configuration i.e. munge conf, mariadb
 > conf, slurmdbd.conf, slurmctld.conf, slurmd.conf (on compute nodes),
 > gres.conf, service file
 > 5) Later, do the update in the slurm database by executing below
 > command
 > sacctmgr modify node where node=old_name set name=new_name
 > for all the nodes.
 > ALso, I think, slurm server name and slurmdbd server names are also
 > required to be updated. How to do it, still checking
 > 6) Finally, start slurmdbd, slurmctld on server and slurmd on
 > compute nodes
 >
 > Please help and guide for above.
 >
 > Regards,
 >
 > Purvesh Parmar
 > INHAIT




Re: [slurm-users] sview not installed

2023-04-23 Thread Angel de Vicente
Hello,

mohammed shambakey writes:

> I appreciate your help. Actually, it is built from the source repo
> (and I'm using Ubuntu 22.04). It is solved another way: after the
> regular building using configure, make, make install, I changed the
> directory to the sview folder (/src/sview), then ran "make
> install" just for the sview). After that, sview is installed in the
> correct location.

That is weird. I'm also on Ubuntu 22.04, and after configure, make, make
install there was no problem with sview (i.e. it was properly installed
without having to run another "make install" as you describe).

Cheers,
-- 
Ángel de Vicente
 Research Software Engineer (Supercomputing and BigData)
 Tel.: +34 922-605-747
 Web.: http://research.iac.es/proyecto/polmag/

 GPG: 0x8BDC390B69033F52




Re: [slurm-users] Migration of slurm communication network / Steps / how to

2023-04-23 Thread Purvesh Parmar
Thank you, but the hostnames are changing as well, not just the IP
addresses: those of the Slurm server, the database server, and the slurmd
compute nodes.

On Mon, 24 Apr 2023 at 10:04, Ryan Novosielski wrote:

> I think it’s easier than all of this. Are you actually changing names of
> all of these things, or just IP addresses? If they all resolve to an IP now
> and you can bring everything down and change the hosts files or DNS, it
> seems to me that if the names aren’t changing, that’s that. I know that
> “scontrol show cluster” will show the wrong IP address but I think that
> updates itself.
>
> The names of the servers are in slurm.conf, but again, if the names don’t
> change, that won’t matter. If you have IPs there, you will need to change
> them.
>
> Sent from my iPhone
>
> > On Apr 23, 2023, at 14:01, Purvesh Parmar wrote:
> > 
> > Hello,
> >
> > We have slurm 21.08 on ubuntu 20. We have a cluster of 8 nodes. Entire
> > slurm communication happens over 192.168.5.x network (LAN). However as per
> > requirement, now we are migrating the cluster to other premises and there
> > we have 172.16.1.x (LAN). I have to migrate the entire network including
> > SLURMDBD (mariadb), SLURMCTLD, SLURMD. ALso the cluster network is also
> > changing from 192.168.5.x to 172.16.1.x and each node will be assigned the
> > ip address from the 172.16.1.x network.
> > The cluster has been running for the last 3 months and it is required to
> > maintain the old usage stats as well.
> >
> >
> >  Is the procedure correct as below :
> >
> > 1) Stop slurm
> > 2) suspend all the queued jobs
> > 3) backup slurm database
> > 4) change the slurm & munge configuration i.e. munge conf, mariadb conf,
> > slurmdbd.conf, slurmctld.conf, slurmd.conf (on compute nodes), gres.conf,
> > service file
> > 5) Later, do the update in the slurm database by executing below command
> > sacctmgr modify node where node=old_name set name=new_name
> > for all the nodes.
> > ALso, I think, slurm server name and slurmdbd server names are also
> > required to be updated. How to do it, still checking
> > 6) Finally, start slurmdbd, slurmctld on server and slurmd on compute
> > nodes
> >
> > Please help and guide for above.
> >
> > Regards,
> >
> > Purvesh Parmar
> > INHAIT
>


Re: [slurm-users] Migration of slurm communication network / Steps / how to

2023-04-23 Thread Ryan Novosielski
I think it’s easier than all of this. Are you actually changing names of all of 
these things, or just IP addresses? If they all resolve to an IP now and you 
can bring everything down and change the hosts files or DNS, it seems to me 
that if the names aren’t changing, that’s that. I know that “scontrol show 
cluster” will show the wrong IP address but I think that updates itself. 

The names of the servers are in slurm.conf, but again, if the names don’t 
change, that won’t matter. If you have IPs there, you will need to change them. 
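
A minimal sketch for spotting literal old addresses in a config file before restarting anything. The default path /etc/slurm/slurm.conf is an assumption; pass whichever file the site actually uses:

```shell
#!/bin/sh
# List lines in a config file that contain an address from the old
# 192.168.5.x network, so they can be edited before restarting Slurm.
# The default path is an assumption; pass the real file as an argument.
find_old_ips() {
    conf="${1:-/etc/slurm/slurm.conf}"
    # -n prints line numbers; the dots are escaped so they match literally.
    grep -nE '192\.168\.5\.[0-9]{1,3}' "$conf"
}
```

For example, find_old_ips /etc/slurm/slurm.conf prints each matching line with its line number; no output means that file contains no old addresses.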

Sent from my iPhone

> On Apr 23, 2023, at 14:01, Purvesh Parmar wrote:
> 
> Hello,
> 
> We have slurm 21.08 on ubuntu 20. We have a cluster of 8 nodes. Entire slurm 
> communication happens over 192.168.5.x network (LAN). However as per 
> requirement, now we are migrating the cluster to other premises and there we 
> have 172.16.1.x (LAN). I have to migrate the entire network including 
> SLURMDBD (mariadb), SLURMCTLD, SLURMD. ALso the cluster network is also 
> changing from 192.168.5.x to 172.16.1.x and each node will be assigned the ip 
> address from the 172.16.1.x network. 
> The cluster has been running for the last 3 months and it is required to 
> maintain the old usage stats as well.
> 
> 
>  Is the procedure correct as below :
> 
> 1) Stop slurm
> 2) suspend all the queued jobs
> 3) backup slurm database
> 4) change the slurm & munge configuration i.e. munge conf, mariadb conf, 
> slurmdbd.conf, slurmctld.conf, slurmd.conf (on compute nodes), gres.conf, 
> service file 
> 5) Later, do the update in the slurm database by executing below command
> sacctmgr modify node where node=old_name set name=new_name
> for all the nodes.
> ALso, I think, slurm server name and slurmdbd server names are also required 
> to be updated. How to do it, still checking
> 6) Finally, start slurmdbd, slurmctld on server and slurmd on compute nodes
> 
> Please help and guide for above.
> 
> Regards,
> 
> Purvesh Parmar
> INHAIT


[slurm-users] Migration of slurm communication network / Steps / how to

2023-04-23 Thread Purvesh Parmar
Hello,

We have Slurm 21.08 on Ubuntu 20, on a cluster of 8 nodes. All Slurm
communication happens over the 192.168.5.x network (LAN). We are now
required to migrate the cluster to other premises, where the LAN is
172.16.1.x. I have to migrate the entire setup, including SLURMDBD
(MariaDB), SLURMCTLD and SLURMD. The cluster network is changing from
192.168.5.x to 172.16.1.x, and each node will be assigned an IP address
from the 172.16.1.x network.
The cluster has been running for the last 3 months, and the old usage
stats must be preserved.


Is the procedure below correct?

1) Stop slurm
2) suspend all the queued jobs
3) backup slurm database
4) change the slurm & munge configuration i.e. munge conf, mariadb conf,
slurmdbd.conf, slurmctld.conf, slurmd.conf (on compute nodes), gres.conf,
service file
5) Later, do the update in the slurm database by executing below command
sacctmgr modify node where node=old_name set name=new_name
for all the nodes.
Also, I think the Slurm server name and the slurmdbd server name need to be
updated as well. I am still checking how to do that.
6) Finally, start slurmdbd, slurmctld on server and slurmd on compute nodes
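
Steps 1 and 3 above could be sketched roughly as follows. This is a dry run that only prints the commands. The systemd unit names and the database name slurm_acct_db (the slurmdbd default) are assumptions to check against the actual setup:

```shell
#!/bin/sh
# Dry-run sketch of the "stop, then back up" part of the plan.
# With DRY_RUN=1 (the default here) commands are only printed.
# Assumptions: systemd units named slurmctld and slurmdbd, and the default
# accounting database name slurm_acct_db (see StorageLoc in slurmdbd.conf).
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

backup_plan() {
    run systemctl stop slurmctld
    run systemctl stop slurmdbd
    # Dump the accounting database to a file before touching any config.
    run mysqldump --single-transaction --result-file=slurm_acct_db.sql slurm_acct_db
}

backup_plan
```

Setting DRY_RUN=0 would execute the commands for real, so the printed list is worth reviewing first.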

Please help and guide for above.

Regards,

Purvesh Parmar
INHAIT


Re: [slurm-users] sview not installed

2023-04-23 Thread mohammed shambakey
Hi

I appreciate your help. Actually, I built it from the source repo (and I'm
using Ubuntu 22.04). I solved it another way: after the regular build
using configure, make, make install, I changed directory to the sview
folder (/src/sview), then ran "make install" just for sview.
After that, sview is installed in the correct location.
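
A quick way to confirm where a command such as sview ended up, as a minimal sketch (where_is is a hypothetical helper name, not part of Slurm):

```shell
#!/bin/sh
# Report where a command would be found on PATH, or say that it is missing.
# where_is is a hypothetical helper, not a Slurm tool.
where_is() {
    if path=$(command -v "$1" 2>/dev/null); then
        printf '%s: %s\n' "$1" "$path"
    else
        printf '%s: not found in PATH\n' "$1"
    fi
}

where_is sview
```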

Regards

On Sun, Apr 23, 2023 at 10:50 AM Ole Holm Nielsen <
ole.h.niel...@fysik.dtu.dk> wrote:

> On 23-04-2023 02:43, mohammed shambakey wrote:
> > I installed slurm 23.11.0-0rc1, and sview is not installed, despite it
> > exists in /src/sview/sview. I can execute it from that path but
> > not /bin (because it does not exist there).
> >
> > I tried just copying it to /bin, but it
> > complained about being just a wrapper.
> >
> > I wonder if I'm missing something?
>
> If your system is RPM based, you will build Slurm packages like this:
>
> $ rpmbuild -ta slurm-23.02.1.tar.bz2  --with mysql --with slurmrestd
>
> The /usr/bin/sview command is located in the
> slurm-23.02.1-1.el7.x86_64.rpm package.
>
> /Ole
>
>

-- 
Mohammed


Re: [slurm-users] sview not installed

2023-04-23 Thread Ole Holm Nielsen

On 23-04-2023 02:43, mohammed shambakey wrote:
> I installed slurm 23.11.0-0rc1, and sview is not installed, despite it
> exists in /src/sview/sview. I can execute it from that path but
> not /bin (because it does not exist there).
>
> I tried just copying it to /bin, but it
> complained about being just a wrapper.
>
> I wonder if I'm missing something?


If your system is RPM based, you will build Slurm packages like this:

$ rpmbuild -ta slurm-23.02.1.tar.bz2  --with mysql --with slurmrestd

The /usr/bin/sview command is located in the 
slurm-23.02.1-1.el7.x86_64.rpm package.


/Ole