Re: [slurm-users] changes in slurm.

2020-07-10 Thread navin srivastava
Thanks. Either I can use the value slurmd -C gives (because I see the same set of
nodes reporting different values), or I can choose the available memory, i.e.
251*1024.

Regards
Navin

On Fri, Jul 10, 2020, 20:34 Stephan Roth  wrote:

> It's recommended to round RealMemory down to the next lower gigabyte
> value to prevent nodes from entering a drain state after rebooting with
> a bios- or kernel-update.
>
> Source: https://slurm.schedmd.com/SLUG17/FieldNotes.pdf, "Node
> configuration"
>
> Stephan


Re: [slurm-users] changes in slurm.

2020-07-10 Thread Stephan Roth
It's recommended to round RealMemory down to the next lower gigabyte 
value to prevent nodes from entering a drain state after rebooting with 
a BIOS or kernel update.


Source: https://slurm.schedmd.com/SLUG17/FieldNotes.pdf, "Node 
configuration"
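A quick sketch of that rounding, reading total memory from /proc/meminfo (the awk/arithmetic one-liner is mine, not from Slurm's documentation):

```shell
# Floor the node's physical memory to the next lower GiB and express it
# in MiB, which is the unit RealMemory expects in slurm.conf.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
realmemory_mb=$(( mem_kb / 1024 / 1024 * 1024 ))
echo "RealMemory=${realmemory_mb}"
```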


Stephan

---
Stephan Roth | ISG.EE D-ITET ETH Zurich | http://www.isg.ee.ethz.ch
+4144 632 30 59  |  ETF D 104  |  Sternwartstrasse 7  | 8092 Zurich
---



Re: [slurm-users] changes in slurm.

2020-07-10 Thread Sarlo, Jeffrey S
If you run  slurmd -C  on the compute node, it should tell you what 
Slurm thinks the RealMemory number is.

Jeff




Re: [slurm-users] changes in slurm.

2020-07-10 Thread navin srivastava
Thank you for the answers.

Is RealMemory decided from the total memory value or the total usable
memory value?

I mean, if a node has 256 GB of RAM, free -g reports only 251 GB:
deda1x1591:~ # free -g
              total       used       free     shared    buffers     cached
Mem:            251         67        184          6          0         47

So should we add 251*1024 MB or 256*1024 MB? Or is there a Slurm
command that will give me the value to add?

Regards
Navin.





Re: [slurm-users] changes in slurm.

2020-07-09 Thread Brian Andrus

Navin,

1. You will need to restart slurmctld when you make changes to the 
physical definition of a node. This can be done without affecting 
running jobs.
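Concretely, with the node definition from the thread, the edit and restart might look like this (the RealMemory value of 257024 MiB, i.e. 251 GiB, is illustrative):

```shell
# slurm.conf: add RealMemory (in MiB) to the existing node line, e.g.:
#   NodeName=Node[1-12] NodeHostname=deda1x[1450-1461] NodeAddr=Node[1-12]
#   Sockets=2 CoresPerSocket=10 RealMemory=257024 State=UNKNOWN
# Then restart the daemons; running jobs are not affected:
systemctl restart slurmctld          # on the controller
systemctl restart slurmd             # on each compute node
```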


2. You can have a node in more than one partition; that will not hurt 
anything. Jobs are allocated to nodes, not partitions; the partition is 
used to determine which node(s) to use and to filter/order jobs. You 
should add the node to the new partition, but also leave it in the 
'test' partition. If you are looking to remove the 'test' partition, 
set it to down and, once all the running jobs in it finish, remove it.
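As a sketch, keeping the nodes in both partitions and later winding down 'test' could look like this (partition and node names taken from the thread; the commands assume admin rights):

```shell
# slurm.conf: the same nodes may appear in both partitions:
#   PartitionName=normal Nodes=Node[1-12] Default=YES State=UP
#   PartitionName=test   Nodes=Node[1-12] State=UP
# Stop new jobs from starting in 'test' while running ones finish:
scontrol update PartitionName=test State=DOWN
# Once it is empty, remove it from slurm.conf (or delete it on the fly):
scontrol delete PartitionName=test
```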


Brian Andrus


[slurm-users] changes in slurm.

2020-07-09 Thread navin srivastava
Hi Team,

I have two small queries. Because of the lack of a testing environment I am
unable to test these scenarios; I am working on setting up a test environment.

1. In my environment I am unable to pass the #SBATCH --mem=2GB option.
I found the reason is that there is no RealMemory entry in the node
definition in slurm.conf.

NodeName=Node[1-12] NodeHostname=deda1x[1450-1461] NodeAddr=Node[1-12]
Sockets=2 CoresPerSocket=10 State=UNKNOWN

If I add RealMemory, it should be picked up. So my query here is: is it
possible to add RealMemory to the definition at any time while jobs are in
progress, then execute scontrol reconfigure and reload the daemon on the
client nodes? Or do we need to take downtime (which I don't think is necessary)?

2. Also, I would like to know what will happen if some jobs are running in a
partition (say test) and I move the associated node to some other partition
(say normal) without draining the node. Or if I suspend the job, change the
node's partition, and then resume the job. I am not deleting the partition
here.

Regards
Navin.