Re: [slurm-users] Job limit in slurm.

2020-02-17 Thread navin srivastava
Hi,

Thanks for your script.
With it I am able to see the limit that I set, but the limit is not being enforced:

MaxJobs =3, current value = 0

Regards
Navin.

On Mon, Feb 17, 2020 at 4:13 PM Ole Holm Nielsen 
wrote:

> On 2/17/20 11:16 AM, navin srivastava wrote:
> > i have an issue with the slurm job limit. i applied the Maxjobs limit on
> > user using
> >
> >   sacctmgr modify user navin1 set maxjobs=3
> >
> > but still i see this is not getting applied. i am still bale to submit
> > more jobs.
> > Slurm version is 17.11.x
> >
> > Let me know what setting is required to implement this.
>
> The tool "showuserlimits" tells you all user limits in the Slurm database.
>   You can download it from
> https://github.com/OleHolmNielsen/Slurm_tools/tree/master/showuserlimits
> and give it a try:
>
> $ showuserlimits -u navin1
>
> /Ole
>
>


Re: [slurm-users] Job limit in slurm.

2020-02-17 Thread navin srivastava
Hi Ole,

I am submitting around 100 jobs and I see all of them start at the same time;
every job goes into the running state.
If the MaxJobs limit is set, it should allow only 3 running jobs at any point in time.
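
For reference, a hedged sketch of the two sacctmgr limits involved (per the
distinction explained in the reply below, MaxJobs caps running jobs while
MaxSubmitJobs caps submissions; the value 3 is just the example used in this
thread):

   sacctmgr modify user navin1 set maxjobs=3
   sacctmgr modify user navin1 set maxsubmitjobs=3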

Regards
Navin.




On Mon, Feb 17, 2020 at 4:48 PM Ole Holm Nielsen 
wrote:

> Hi Navin,
>
> Why do you think the limit is not working?  The MaxJobs limits the number
> of running jobs to 3, but you can still submit as many jobs as you like!
>
> See "man sacctmgr" for definitions of the limits MaxJobs as well as
> MaxSubmitJobs.
>
> /Ole
>
> On 2/17/20 12:04 PM, navin srivastava wrote:
> > Hi,
> >
> > Thanks for your script.
> > with this i am able to show the limit what i set. but this limt is
> > not working.
> >
> > MaxJobs =3, current value = 0
> >
> > Regards
> > Navin.
> >
> > On Mon, Feb 17, 2020 at 4:13 PM Ole Holm Nielsen
> > mailto:ole.h.niel...@fysik.dtu.dk>> wrote:
> >
> > On 2/17/20 11:16 AM, navin srivastava wrote:
> >  > i have an issue with the slurm job limit. i applied the Maxjobs
> > limit on
> >  > user using
> >  >
> >  >   sacctmgr modify user navin1 set maxjobs=3
> >  >
> >  > but still i see this is not getting applied. i am still bale to
> submit
> >  > more jobs.
> >  > Slurm version is 17.11.x
> >  >
> >  > Let me know what setting is required to implement this.
> >
> > The tool "showuserlimits" tells you all user limits in the Slurm
> > database.
> >You can download it from
> >
> https://github.com/OleHolmNielsen/Slurm_tools/tree/master/showuserlimits
> > and give it a try:
> >
> > $ showuserlimits -u navin1
>
>


[slurm-users] Job limit in slurm.

2020-02-17 Thread navin srivastava
Hi Team,

I have an issue with the Slurm job limit. I applied the MaxJobs limit on a
user using

 sacctmgr modify user navin1 set maxjobs=3

but I still see that it is not being applied; I am still able to submit more
jobs.
Slurm version is 17.11.x.

Let me know what setting is required to implement this.

Regards
Navin.


Re: [slurm-users] Job limit in slurm.

2020-02-17 Thread navin srivastava
Hi Ole,

Thanks, Ole.
After setting the enforcement option it worked.
I am new to Slurm, so thanks for helping me.
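
For reference, a minimal sketch of the slurm.conf setting this thread is about
(the exact option list is site-specific; 'limits' is the part that makes
MaxJobs enforceable, and slurmctld must be restarted or reconfigured after
changing it):

   AccountingStorageEnforce=associations,limits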


Regards
Navin


On Mon, Feb 17, 2020 at 5:36 PM Ole Holm Nielsen 
wrote:

> Hi Navin,
>
> I wonder if you have configured the Slurm database and the slurmdbd
> daemon?  I think the limit enforcement requires the use of the database.
>
> What is the output of:
>
> $ scontrol show config | grep AccountingStorageEnforce
>
> See also https://slurm.schedmd.com/accounting.html#limit-enforcement
>
> Limit Enforcement
>
> Various limits and limit enforcement are described in the Resource Limits
> web page.
>
> To enable any limit enforcement you must at least have
> AccountingStorageEnforce=limits in your slurm.conf, otherwise, even if you
> have limits set, they will not be enforced. Other options for
> AccountingStorageEnforce and the explanation for each are found on the
> Resource Limits document.
>
> /Ole
>
> On 2/17/20 12:20 PM, navin srivastava wrote:
> > Hi ole,
> >
> > i am submitting 100 of jobs are i see all jobs starting at the same time
> > and all job is going into the run state.
> > if Maxjobs limit is set it should allow only 3 jobs at any point of time.
> >
> > Regards
> > Navin.
> >
> >
> >
> >
> > On Mon, Feb 17, 2020 at 4:48 PM Ole Holm Nielsen
> > mailto:ole.h.niel...@fysik.dtu.dk>> wrote:
> >
> > Hi Navin,
> >
> > Why do you think the limit is not working?  The MaxJobs limits the
> number
> > of running jobs to 3, but you can still submit as many jobs as you
> like!
> >
> > See "man sacctmgr" for definitions of the limits MaxJobs as well as
> > MaxSubmitJobs.
> >
> > /Ole
> >
> > On 2/17/20 12:04 PM, navin srivastava wrote:
> >  > Hi,
> >  >
> >  > Thanks for your script.
> >  > with this i am able to show the limit what i set. but this limt is
> >  > not working.
> >  >
> >  > MaxJobs =3, current value = 0
> >  >
> >  > Regards
> >  > Navin.
> >  >
> >  > On Mon, Feb 17, 2020 at 4:13 PM Ole Holm Nielsen
> >  > mailto:ole.h.niel...@fysik.dtu.dk>
> > <mailto:ole.h.niel...@fysik.dtu.dk
> > <mailto:ole.h.niel...@fysik.dtu.dk>>> wrote:
> >  >
> >  > On 2/17/20 11:16 AM, navin srivastava wrote:
> >  >  > i have an issue with the slurm job limit. i applied the
> Maxjobs
> >  > limit on
> >  >  > user using
> >  >  >
> >  >  >   sacctmgr modify user navin1 set maxjobs=3
> >  >  >
> >  >  > but still i see this is not getting applied. i am still
> bale
> > to submit
> >  >  > more jobs.
> >  >  > Slurm version is 17.11.x
> >  >  >
> >  >  > Let me know what setting is required to implement this.
> >  >
> >  > The tool "showuserlimits" tells you all user limits in the
> Slurm
> >  > database.
> >  >You can download it from
> >  >
> >
> https://github.com/OleHolmNielsen/Slurm_tools/tree/master/showuserlimits
> >  > and give it a try:
> >  >
> >  > $ showuserlimits -u navin1
> >
>
>
>


Re: [slurm-users] How to request for the allocation of scratch .

2020-04-15 Thread navin srivastava
Thanks, Erik.

Last night I made the changes.

I defined the following in slurm.conf on all the nodes as well as on the Slurm server:

TmpFS=/lscratch

 NodeName=node[01-10]  CPUs=44  RealMemory=257380 Sockets=2
CoresPerSocket=22 ThreadsPerCore=1 TmpDisk=160 State=UNKNOWN
Feature=P4000 Gres=gpu:2

These nodes have 1.6 TB of local scratch. I did an scontrol reconfig for all
the nodes, but after some time we saw that all nodes went into the drain
state, so I reverted back to the old configuration.

Jobs were running on all nodes and the local scratch is only 20-25% in use.
We already have a cleanup script in crontab that cleans the scratch space
regularly.

Is anything wrong here?
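
A hedged sketch of what to check first (the node name is only an example):

   # show why Slurm drained the nodes
   sinfo -R
   scontrol show node node01 | grep -E 'State|Reason|TmpDisk'

One assumption worth verifying, since TmpDisk is specified in MB: for a 1.6 TB
/lscratch the node definition would carry something on the order of
TmpDisk=1600000 rather than TmpDisk=160.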


Regards
Navin.









On Thu, Apr 16, 2020 at 12:26 AM Ellestad, Erik 
wrote:

> The default value for TmpDisk is 0, so if you want local scratch
> available on a node, the amount of TmpDisk space must be defined in the
> node configuration in slurm.conf.
>
> example:
>
> NodeName=TestNode01 CPUs=8 Boards=1 SocketsPerBoard=2 CoresPerSocket=4
> ThreadsPerCore=1 RealMemory=24099 TmpDisk=15
>
> The configuration value for the node definition is in MB.
>
> https://slurm.schedmd.com/slurm.conf.html
>
> *TmpDisk* Total size of temporary disk storage in *TmpFS* in megabytes
> (e.g. "16384"). *TmpFS* (for "Temporary File System") identifies the
> location which jobs should use for temporary storage. Note this does not
> indicate the amount of free space available to the user on the node, only
> the total file system size. The system administration should ensure this
> file system is purged as needed so that user jobs have access to most of
> this space. The Prolog and/or Epilog programs (specified in the
> configuration file) might be used to ensure the file system is kept clean.
> The default value is 0.
>
> When requesting --tmp with srun or sbatch, it can be done in various size
> formats:
>
> *--tmp*=<*size[units]*> Specify a minimum amount of temporary disk space
> per node. Default units are megabytes unless the SchedulerParameters
> configuration parameter includes the "default_gbytes" option for gigabytes.
> Different units can be specified using the suffix [K|M|G|T].
> https://slurm.schedmd.com/sbatch.html
>
>
>
> ---
> Erik Ellestad
> Wynton Cluster SysAdmin
> UCSF
> --
> *From:* slurm-users  on behalf of
> navin srivastava 
> *Sent:* Tuesday, April 14, 2020 11:19 PM
> *To:* Slurm User Community List 
> *Subject:* Re: [slurm-users] How to request for the allocation of scratch
> .
>
> Thank you Erik.
>
> To define the local scratch on all the compute node is not mandatory? only
> on slurm server is enough right?
> Also the TMPdisk should be defined in MB or can be defined in GB as well
>
> while requesting --tmp , we can use the value in GB right?
>
> Regards
> Navin.
>
>
>
> On Tue, Apr 14, 2020 at 11:04 PM Ellestad, Erik 
> wrote:
>
> Have you defined the TmpDisk value for each node?
>
> As far as I know, local disk space is not a valid type for GRES.
>
> https://slurm.schedmd.com/gres.html
>
> "Generic resource (GRES) scheduling is supported through a flexible plugin
> mechanism. Support is currently provided for Graphics Processing Units
> (GPUs), CUDA Multi-Process Service (MPS), and Intel® Many Integrated Core
> (MIC) processors."
>
> The only valid solution I've found for scratch is to:
>
> In slurm.conf, define the location of local scratch globally via TmpFS.
>
> And then the amount per host is defined via TmpDisk=xxx.
>
> Then the request for srun/sbatch via --tmp=X
>
>
>
> ---
> Erik Ellestad
> Wynton Cluster SysAdmin
> UCSF
> --
> *From:* slurm-users  on behalf of
> navin srivastava 
> *Sent:* Tuesday, April 14, 2020 7:32 AM
> *To:* Slurm User Community List 
> *Subject:* Re: [slurm-users] How to request for the allocation of scratch
> .
>
>
> Any suggestion on the above query.need help to understand it.
> Does TmpFS=/scratch   and the request is #SBATCH --tmp=500GB  then it will
> reserve the 500GB from scratch.
> let me know if my assumption is correct?
>
> Regards
> Navin.
>
>
> On Mon, Apr 13, 2020 at 11:10 AM navin srivastava 
> wrote:
>
> Hi Team,
>
> i wanted to define a mechanism to request the local disk space while
> submitting the job.
>
> we have dedicated /scratch of 1.2 TB file system for the execution of the
> job on each of

Re: [slurm-users] How to request for the allocation of scratch .

2020-04-15 Thread navin srivastava
Thank you, Erik.

Is it mandatory to define the local scratch on all the compute nodes, or is
defining it only on the Slurm server enough?
Also, must TmpDisk be defined in MB, or can it be defined in GB as well?

When requesting --tmp, can we use a value in GB?
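
For reference, a hedged example of a GB-sized request (per the sbatch
documentation quoted further down, size suffixes K|M|G|T are accepted; 500G is
only an illustrative amount):

   #SBATCH --tmp=500G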

Regards
Navin.



On Tue, Apr 14, 2020 at 11:04 PM Ellestad, Erik 
wrote:

> Have you defined the TmpDisk value for each node?
>
> As far as I know, local disk space is not a valid type for GRES.
>
> https://slurm.schedmd.com/gres.html
>
> "Generic resource (GRES) scheduling is supported through a flexible plugin
> mechanism. Support is currently provided for Graphics Processing Units
> (GPUs), CUDA Multi-Process Service (MPS), and Intel® Many Integrated Core
> (MIC) processors."
>
> The only valid solution I've found for scratch is to:
>
> In slurm.conf, define the location of local scratch globally via TmpFS.
>
> And then the amount per host is defined via TmpDisk=xxx.
>
> Then the request for srun/sbatch via --tmp=X
>
>
>
> ---
> Erik Ellestad
> Wynton Cluster SysAdmin
> UCSF
> --
> *From:* slurm-users  on behalf of
> navin srivastava 
> *Sent:* Tuesday, April 14, 2020 7:32 AM
> *To:* Slurm User Community List 
> *Subject:* Re: [slurm-users] How to request for the allocation of scratch
> .
>
>
> Any suggestion on the above query.need help to understand it.
> Does TmpFS=/scratch   and the request is #SBATCH --tmp=500GB  then it will
> reserve the 500GB from scratch.
> let me know if my assumption is correct?
>
> Regards
> Navin.
>
>
> On Mon, Apr 13, 2020 at 11:10 AM navin srivastava 
> wrote:
>
> Hi Team,
>
> i wanted to define a mechanism to request the local disk space while
> submitting the job.
>
> we have dedicated /scratch of 1.2 TB file system for the execution of the
> job on each of the compute nodes other than / and other file system.
> i have defined in slurm.conf as TmpFS=/scratch  and then wanted to use
> #SBATCH --scratch =10GB   in the request.
> but it seems it is not accepting this variable except /tmp.
>
> Then i have opted the mechanism of gres.conf
>
> GresTypes=gpu,scratch
>
> and defined each node the scratch value and then requested using
> --gres=lscratch:10GB
> but in this scenario if requesting both gres resources gpu as well as
> scratch it show me only scratch in my Gres resource not gpu.
> does it using the gpu also as a gres resource?
>
> could anybody please advice which is the correct method to achieve the
> same?
> Also, is scratch will be able to calculate the actual usage value on the
> node.
>
> REgards
> Navin.
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>


[slurm-users] How to request for the allocation of scratch .

2020-04-12 Thread navin srivastava
Hi Team,

I wanted to define a mechanism to request local disk space when submitting a
job.

We have a dedicated /scratch file system of 1.2 TB on each compute node,
separate from / and the other file systems, for job execution.
I defined TmpFS=/scratch in slurm.conf and then wanted to use
#SBATCH --scratch=10GB in the request,
but it seems such an option is not accepted; only /tmp appears to be handled.

Then I opted for the gres.conf mechanism:

GresTypes=gpu,scratch

and defined the scratch value for each node, then requested it using
--gres=lscratch:10GB.
But in this scenario, when requesting both GRES resources (gpu as well as
scratch), the job shows only scratch in its Gres field, not gpu.
Is the GPU still being accounted for as a GRES resource?
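
For what it's worth, a hedged sketch of requesting more than one GRES type:
--gres takes a comma-delimited list, so both types go in a single option (the
counts below are only examples, and lscratch assumes the gres.conf entry
described above):

   #SBATCH --gres=gpu:1,lscratch:10G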

Could anybody please advise which is the correct method to achieve this?
Also, will scratch usage be tracked against the actual value used on the
node?

REgards
Navin.


Re: [slurm-users] How to request for the allocation of scratch .

2020-04-20 Thread navin srivastava
I attempted it again and it succeeded.
Thanks for your help.

On Thu, Apr 16, 2020 at 9:45 PM Ellestad, Erik 
wrote:

> That all seems fine to me.
>
> I would check into your slurm logs to try and determine why slurm put your
> nodes into drain state.
>
> Erik
>
> ---
> Erik Ellestad
> Wynton Cluster SysAdmin
> UCSF
> --
> *From:* slurm-users  on behalf of
> navin srivastava 
> *Sent:* Wednesday, April 15, 2020 10:37 PM
> *To:* Slurm User Community List 
> *Subject:* Re: [slurm-users] How to request for the allocation of scratch
> .
>
> Thanks Erik.
>
> Last night i made the changes.
>
> i defined in slurm.conf on all the nodes as well as on the slurm server.
>
> TmpFS=/lscratch
>
>  NodeName=node[01-10]  CPUs=44  RealMemory=257380 Sockets=2
> CoresPerSocket=22 ThreadsPerCore=1 TmpDisk=160 State=UNKNOWN
> Feature=P4000 Gres=gpu:2
>
> These nodes having 1.6TB local scratch. i did a scontrol reconfig on all
> the nodes but after sometime we saw all nodes went into drain state.then i
> revert back the changes with old one.
>
> on all nodes jobs were running and the localsctratch is 20-25% in use.
> we have already cleanup script in crontab which used to clean the scratch
> space regularly.
>
> is anything wrong here?
>
>
> Regards
> Navin.
>
>
>
>
>
>
>
>
>
> On Thu, Apr 16, 2020 at 12:26 AM Ellestad, Erik 
> wrote:
>
> The default value for TmpDisk is 0, so if you want local scratch
> available on a node, the amount of TmpDisk space must be defined in the
> node configuration in slurm.conf.
>
> example:
>
> NodeName=TestNode01 CPUs=8 Boards=1 SocketsPerBoard=2 CoresPerSocket=4
> ThreadsPerCore=1 RealMemory=24099 TmpDisk=15
>
> The configuration value for the node definition is in MB.
>
> https://slurm.schedmd.com/slurm.conf.html
>
> *TmpDisk* Total size of temporary disk storage in *TmpFS* in megabytes
> (e.g. "16384"). *TmpFS* (for "Temporary File System") identifies the
> location which jobs should use for temporary storage. Note this does not
> indicate the amount of free space available to the user on the node, only
> the total file system size. The system administration should ensure this
> file system is purged as needed so that user jobs have access to most of
> this space. The Prolog and/or Epilog programs (specified in the
> configuration file) might be used to ensure the file system is kept clean.
> The default value is 0.
>
> When requesting --tmp with srun or sbatch, it can be done in various size
> formats:
>
> *--tmp*=<*size[units]*> Specify a minimum amount of temporary disk space
> per node. Default units are megabytes unless the SchedulerParameters
> configuration parameter includes the "default_gbytes" option for gigabytes.
> Different units can be specified using the suffix [K|M|G|T].
> https://slurm.schedmd.com/sbatch.html
>
>
>
> ---
> Erik Ellestad
> Wynton Cluster SysAdmin
> UCSF
> --
> *From:* slurm-users  on behalf of
> navin srivastava 
> *Sent:* Tuesday, April 14, 2020 11:19 PM
> *To:* Slurm User Community List 
> *Subject:* Re: [slurm-users] How to request for the allocation of scratch
> .
>
> Thank you Erik.
>
> To define the local scratch on all the compute node is not mandatory? only
> on slurm server is enough right?
> Also the TMPdisk should be defined in MB or can be defined in GB as well
>
> while requesting --tmp , we can use the value in GB right?
>
> Regards
> Navin.
>
>
>
> On Tue, Apr 14, 2020 at 11:04 PM Ellestad, Erik 
> wrote:
>
> Have you defined the TmpDisk value for each node?
>
> As far as I know, local disk space is not a valid type for GRES.
>
> https://slurm.schedmd.com/gres.html
>
> "Generic resource (GRES) scheduling is supported through a flexible plugin
> mechanism. Support is curr

[slurm-users] log rotation for slurmctld.

2020-03-13 Thread navin srivastava
Hi,

I wanted to understand how log rotation for slurmctld works.
In my environment there is no log rotation for slurmctld.log, and the log file
has now reached 125 GB.

Can I move the log file to another location and then restart/reload the Slurm
service so that it starts a new log file? I think this should work without any
issues.
Am I right, or will it create a problem?
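
For reference, a minimal sketch of that manual approach, assuming the log path
below and that slurmctld runs under systemd (a controller restart does not
kill running jobs, it only pauses scheduling briefly):

   mv /var/log/slurm/slurmctld.log /var/log/slurm/slurmctld.log.old
   systemctl restart slurmctld
   gzip /var/log/slurm/slurmctld.log.old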

I also need to set up logrotate. Does the config below work as it is?
I need to do this on a production environment, so I am asking to make sure it
will work without any issues.

/var/log/slurm/slurmctld.log {
weekly
missingok
notifempty
sharedscripts
create 0600 slurm slurm
rotate 8
compress
postrotate
  /bin/systemctl reload slurmctld.service > /dev/null 2>/dev/null || true
endscript
}



Regards
Navin.


Re: [slurm-users] Resources are free but Job is not getting scheduled.

2020-04-04 Thread navin srivastava
I missed adding the scheduling parameters:


SchedulerType=sched/builtin
#SchedulerParameters=enable_user_top
SelectType=select/cons_res
#SelectTypeParameters=CR_Core_Memory
SelectTypeParameters=CR_Core

# JOB PRIORITY
PriorityType=priority/multifactor
PriorityDecayHalfLife=2
PriorityUsageResetPeriod=DAILY
PriorityWeightFairshare=50
PriorityFlags=FAIR_TREE

Could you please also clarify: if the scheduling policy is fair-share, is the
partition priority still taken into account?

Regards
Navin.


On Sat, Apr 4, 2020 at 8:34 PM navin srivastava 
wrote:

> Hi Team,
>
> I am facing one issue in my environment. our slurm version is 17.11.x
>
> My question is i have 2 partition:
>
> Queue A  with  node1 and node2  with Priority=1000 shared=yes
> Queue B with node1 and node2  with priority=100. shared =yes
>
> Problem is when job from A partition is running then the job from
> partition B is not going through eventhough the cpu is available on node1
> and node2.
>
> it is only accepting the job from partition A but not from B.
> Oversubscription=No tells job will run from both the partition then it
> should allow.
>
> Any suggestion.
>
> Regards
> Navin.
>
>
>
>
>
>
>
>


[slurm-users] Resources are free but Job is not getting scheduled.

2020-04-04 Thread navin srivastava
Hi Team,

I am facing an issue in my environment. Our Slurm version is 17.11.x.

I have 2 partitions:

Queue A with node1 and node2, Priority=1000, Shared=YES
Queue B with node1 and node2, Priority=100, Shared=YES

The problem is that while a job from partition A is running, jobs from
partition B do not go through, even though CPUs are available on node1 and
node2.

Only jobs from partition A are being started, not jobs from B.
With OverSubscribe=NO the documentation says jobs from both partitions can
run, so it should be allowed.

Any suggestions?

Regards
Navin.


[slurm-users] not allocating the node for job execution even resources are available.

2020-03-31 Thread navin srivastava
Hi,

I have an issue with resource allocation.

The environment has partitions like the ones below:

PartitionName=small_jobs Nodes=Node[17,20]  Default=NO MaxTime=INFINITE
State=UP Shared=YES Priority=8000
PartitionName=large_jobs Nodes=Node[17,20]  Default=NO MaxTime=INFINITE
State=UP Shared=YES Priority=100

Also, the node has only a few CPUs allocated and plenty of CPU resources still available:

NodeName=Node17 Arch=x86_64 CoresPerSocket=18
   CPUAlloc=4 CPUErr=0 CPUTot=36 CPULoad=4.09
   AvailableFeatures=K2200
   ActiveFeatures=K2200
   Gres=gpu:2
   NodeAddr=Node1717 NodeHostName=Node17 Version=17.11
   OS=Linux 4.12.14-94.41-default #1 SMP Wed Oct 31 12:25:04 UTC 2018
(3090901)
   RealMemory=1 AllocMem=0 FreeMem=225552 Sockets=2 Boards=1
   State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=small_jobs,large_jobs
   BootTime=2020-03-21T18:56:48 SlurmdStartTime=2020-03-31T09:07:03
   CfgTRES=cpu=36,mem=1M,billing=36
   AllocTRES=cpu=4
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

There are no other jobs in the small_jobs partition, but several jobs are
pending in large_jobs; the resources are available, yet the jobs are not going
through.

The output for one of the pending jobs is:

scontrol show job 1250258
   JobId=1250258 JobName=import_workflow
   UserId=m209767(100468) GroupId=oled(4289) MCS_label=N/A
   Priority=363157 Nice=0 Account=oledgrp QOS=normal
   JobState=PENDING Reason=Priority Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2020-03-28T22:00:13 EligibleTime=2020-03-28T22:00:13
   StartTime=2070-03-19T11:59:09 EndTime=Unknown Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2020-03-31T12:58:48
   Partition=large_jobs AllocNode:Sid=deda1x1466:62260
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null)
   NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=1,node=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   Gres=(null) Reservation=(null)
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)

These are the scheduling settings in my slurm.conf:


SchedulerType=sched/builtin
#SchedulerParameters=enable_user_top
SelectType=select/cons_res
#SelectTypeParameters=CR_Core_Memory
SelectTypeParameters=CR_Core


Any idea why the job is not starting execution even though CPU cores are available?
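
A hedged sketch of commands that help narrow this down (the format string and
job id are just examples):

   # pending jobs with their priority and pending reason
   squeue -t PD -o "%.10i %.12P %.10Q %.20r"
   # scheduling details for a specific pending job
   scontrol show job 1250258 | grep -E 'Reason|StartTime|LastSchedEval'

With SchedulerType=sched/builtin (FIFO, no backfill), a higher-priority job
that cannot start blocks the lower-priority jobs queued behind it in the same
partition, which is consistent with Reason=Priority above.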

I would also like to know: if jobs are running on a particular node and I
restart the slurmd service, in what scenario would those jobs get killed?
Generally it should not kill the jobs.

Regards
Navin.


Re: [slurm-users] not allocating the node for job execution even resources are available.

2020-04-01 Thread navin srivastava
In addition to the above problem: OverSubscribe is NO, and according to the
documentation quoted below that should still allow jobs from either partition.
In this scenario, however, even when resources are available, jobs from the
other partition are not accepted. I even set the same priority for both
partitions, but it didn't help. Any suggestions here?

Slurm Workload Manager - Sharing Consumable Resources:
Two OverSubscribe=NO partitions assigned the same set of nodes: Jobs from
either partition will be assigned to all available consumable resources. No
consumable resource will be shared. One node could have 2 jobs running on
it, and each job could be from a different partition.

On Tue, Mar 31, 2020 at 4:34 PM navin srivastava 
wrote:

> Hi ,
>
> have an issue with the resource allocation.
>
> In the environment have partition like below:
>
> PartitionName=small_jobs Nodes=Node[17,20]  Default=NO MaxTime=INFINITE
> State=UP Shared=YES Priority=8000
> PartitionName=large_jobs Nodes=Node[17,20]  Default=NO MaxTime=INFINITE
> State=UP Shared=YES Priority=100
>
> Also the node allocated with less cpu and lot of cpu resources available
>
> NodeName=Node17 Arch=x86_64 CoresPerSocket=18
>CPUAlloc=4 CPUErr=0 CPUTot=36 CPULoad=4.09
>AvailableFeatures=K2200
>ActiveFeatures=K2200
>Gres=gpu:2
>NodeAddr=Node1717 NodeHostName=Node17 Version=17.11
>OS=Linux 4.12.14-94.41-default #1 SMP Wed Oct 31 12:25:04 UTC 2018
> (3090901)
>RealMemory=1 AllocMem=0 FreeMem=225552 Sockets=2 Boards=1
>State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
>Partitions=small_jobs,large_jobs
>BootTime=2020-03-21T18:56:48 SlurmdStartTime=2020-03-31T09:07:03
>CfgTRES=cpu=36,mem=1M,billing=36
>AllocTRES=cpu=4
>CapWatts=n/a
>CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
> there is no other job in small_jobs partition but several jobs are in
> pending in the large_jobs and the resources are available but jobs are not
> going through.
>
> one of the job pening output is:
>
> scontrol show job 1250258
>JobId=1250258 JobName=import_workflow
>UserId=m209767(100468) GroupId=oled(4289) MCS_label=N/A
>Priority=363157 Nice=0 Account=oledgrp QOS=normal
>JobState=PENDING Reason=Priority Dependency=(null)
>Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
>RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
>SubmitTime=2020-03-28T22:00:13 EligibleTime=2020-03-28T22:00:13
>StartTime=2070-03-19T11:59:09 EndTime=Unknown Deadline=N/A
>PreemptTime=None SuspendTime=None SecsPreSuspend=0
>LastSchedEval=2020-03-31T12:58:48
>Partition=large_jobs AllocNode:Sid=deda1x1466:62260
>ReqNodeList=(null) ExcNodeList=(null)
>NodeList=(null)
>NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>TRES=cpu=1,node=1
>Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
>MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
>Features=(null) DelayBoot=00:00:00
>Gres=(null) Reservation=(null)
>OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
>
> this is my slurm.conf file for scheduling.
>
>
> SchedulerType=sched/builtin
> #SchedulerParameters=enable_user_top
> SelectType=select/cons_res
> #SelectTypeParameters=CR_Core_Memory
> SelectTypeParameters=CR_Core
>
>
> Any idea why the job is not going for execution if cpu cores are avaiable.
>
> Also would like to know if any jobs are running on a particular node and
> if i restart the Slurmd service then in what scenario my job will get
> killed. Generally it should not kill the job.
>
> Regards
> Navin.
>
>
>
>
>


Re: [slurm-users] not allocating jobs even resources are free

2020-04-26 Thread navin srivastava
Thanks Brian,

As suggested, I went through the documentation, and what I understood is that
Fair Tree drives the fairshare mechanism, and jobs should be scheduled based
on that.

So job scheduling will be FIFO, but priority will be decided by fairshare; I
am not sure whether the two conflict here. The normal jobs' priority is lower
than the GPUsmall priority, so if resources are available in the GPUsmall
partition those jobs should start. No job is pending because of GPU resources;
the jobs do not even request GPUs.

Is there any article where I can see how fairshare works and which settings
should not conflict with it?
The documentation never says that FIFO should be disabled when fair-share is
applied.
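
For reference, a hedged sketch of the standard commands for inspecting how
Fair Tree is being applied (both are stock Slurm tools):

   # fairshare usage and factors per account/user
   sshare -a
   # breakdown of each priority component for pending jobs
   sprio -l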

Regards
Navin.





On Sat, Apr 25, 2020 at 12:47 AM Brian W. Johanson  wrote:

>
> If you haven't looked at the man page for slurm.conf, it will answer most
> if not all your questions.
> https://slurm.schedmd.com/slurm.conf.html but I would depend on the the
> manual version that was distributed with the version you have installed as
> options do change.
>
> There is a ton of information that is tedious to get through but reading
> through it multiple times opens many doors.
>
> DefaultTime is listed in there as a Partition option.
> If you are scheduling gres/gpu resources, it's quite possible there are
> cores available with no corresponding gpus avail.
>
> -b
>
> On 4/24/20 2:49 PM, navin srivastava wrote:
>
> Thanks Brian.
>
> I need  to check the jobs order.
>
> Is there  any way to define the default timeline of the job if user  not
> specifying time limit.
>
> Also what does the meaning of fairtree  in priorities in slurm.Conf file.
>
> The set of nodes are different in partitions.FIFO  does  not care for any
> partitiong.
> Is it like strict odering means the job came 1st will go and until  it
> runs it will  not allow others.
>
> Also priorities is high for gpusmall partition and low for normal jobs and
> the nodes of the normal partition is full but gpusmall cores are available.
>
> Regards
> Navin
>
> On Fri, Apr 24, 2020, 23:49 Brian W. Johanson  wrote:
>
>> Without seeing the jobs in your queue, I would expect the next job in
>> FIFO order to be too large to fit in the current idle resources.
>>
>> Configure it to use the backfill scheduler: SchedulerType=sched/backfill
>>
>>   SchedulerType
>>   Identifies  the type of scheduler to be used.  Note the
>> slurmctld daemon must be restarted for a change in scheduler type to become
>> effective (reconfiguring a running daemon has no effect for this
>> parameter).  The scontrol command can be used to manually change job
>> priorities if desired.  Acceptable values include:
>>
>>   sched/backfill
>>  For a backfill scheduling module to augment the
>> default FIFO scheduling.  Backfill scheduling will initiate lower-priority
>> jobs if doing so does not delay the expected initiation time of any
>> higher  priority  job.   Effectiveness  of  backfill scheduling is
>> dependent upon users specifying job time limits, otherwise all jobs will
>> have the same time limit and backfilling is impossible.  Note documentation
>> for the SchedulerParameters option above.  This is the default
>> configuration.
>>
>>   sched/builtin
>>  This  is  the  FIFO scheduler which initiates jobs
>> in priority order.  If any job in the partition can not be scheduled, no
>> lower priority job in that partition will be scheduled.  An exception is
>> made for jobs that can not run due to partition constraints (e.g. the time
>> limit) or down/drained nodes.  In that case, lower priority jobs can be
>> initiated and not impact the higher priority job.
>>
>>
>>
>> Your partitions are set with maxtime=INFINITE, if your users are not
>> specifying a reasonable timelimit to their jobs, this won't help either.
>>
>>
>> -b
>>
>>
>> On 4/24/20 1:52 PM, navin srivastava wrote:
>>
>> In addition to the above when i see the sprio of both the jobs it says :-
>>
>> for normal queue jobs all jobs showing the same priority
>>
>>  JOBID PARTITION   PRIORITY  FAIRSHARE
>> 1291352 normal   15789  15789
>>
>> for GPUsmall all jobs showing the same priority.
>>
>>  JOBID PARTITION   PRIORITY  FAIRSHARE
>> 1291339 GPUsmall  21052  21053
>>
>> On Fri, Apr 24, 2020 at 11:14 PM navin srivastava 
>> wrote:
>>
>>> Hi Team,
>>>
>>> we are facing some issu

Re: [slurm-users] not allocating jobs even resources are free

2020-04-24 Thread navin srivastava
In addition to the above, when I look at sprio for jobs in both partitions it
shows the following.

For the normal queue, all jobs show the same priority:

 JOBID PARTITION   PRIORITY  FAIRSHARE
1291352 normal   15789  15789

For GPUsmall, all jobs show the same priority:

 JOBID PARTITION   PRIORITY  FAIRSHARE
1291339 GPUsmall  21052  21053

On Fri, Apr 24, 2020 at 11:14 PM navin srivastava 
wrote:

> Hi Team,
>
> we are facing some issue in our environment. The resources are free but
> job is going into the QUEUE state but not running.
>
> i have attached the slurm.conf file here.
>
> scenario:-
>
> There are job only in the 2 partitions:
>  344 jobs are in PD state in normal partition and the node belongs
> from the normal partitions are full and no more job can run.
>
> 1300 JOBS are in GPUsmall partition are in queue and enough CPU is
> avaiable to execute the jobs but i see the jobs are not scheduling on free
> nodes.
>
> Rest there are no pend jobs in any other partition .
> eg:-
> node status:- node18
>
> NodeName=node18 Arch=x86_64 CoresPerSocket=18
>CPUAlloc=6 CPUErr=0 CPUTot=36 CPULoad=4.07
>AvailableFeatures=K2200
>ActiveFeatures=K2200
>Gres=gpu:2
>NodeAddr=node18 NodeHostName=node18 Version=17.11
>OS=Linux 4.4.140-94.42-default #1 SMP Tue Jul 17 07:44:50 UTC 2018
> (0b375e4)
>RealMemory=1 AllocMem=0 FreeMem=79532 Sockets=2 Boards=1
>State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
>Partitions=GPUsmall,pm_shared
>BootTime=2019-12-10T14:16:37 SlurmdStartTime=2019-12-10T14:24:08
>CfgTRES=cpu=36,mem=1M,billing=36
>AllocTRES=cpu=6
>CapWatts=n/a
>CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
> node19:-
>
> NodeName=node19 Arch=x86_64 CoresPerSocket=18
>CPUAlloc=16 CPUErr=0 CPUTot=36 CPULoad=15.43
>AvailableFeatures=K2200
>ActiveFeatures=K2200
>Gres=gpu:2
>NodeAddr=node19 NodeHostName=node19 Version=17.11
>OS=Linux 4.12.14-94.41-default #1 SMP Wed Oct 31 12:25:04 UTC 2018
> (3090901)
>RealMemory=1 AllocMem=0 FreeMem=63998 Sockets=2 Boards=1
>State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
>Partitions=GPUsmall,pm_shared
>BootTime=2020-03-12T06:51:54 SlurmdStartTime=2020-03-12T06:53:14
>CfgTRES=cpu=36,mem=1M,billing=36
>AllocTRES=cpu=16
>CapWatts=n/a
>CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
> could you please help me to understand what could be the reason?
>
>
>
>
>
>
>
>
>
>


[slurm-users] not allocating jobs even resources are free

2020-04-24 Thread navin srivastava
Hi Team,

We are facing an issue in our environment: resources are free, but jobs stay
in the queued state and do not run.

I have attached the slurm.conf file here.

Scenario:

There are jobs in only 2 partitions:
344 jobs are in the PD state in the normal partition, and the nodes belonging
to the normal partition are full, so no more jobs can run there.

1300 jobs in the GPUsmall partition are queued, and enough CPU is available to
execute them, but I see the jobs are not being scheduled on the free nodes.

There are no pending jobs in any other partition.
For example, the node status of node18:

NodeName=node18 Arch=x86_64 CoresPerSocket=18
   CPUAlloc=6 CPUErr=0 CPUTot=36 CPULoad=4.07
   AvailableFeatures=K2200
   ActiveFeatures=K2200
   Gres=gpu:2
   NodeAddr=node18 NodeHostName=node18 Version=17.11
   OS=Linux 4.4.140-94.42-default #1 SMP Tue Jul 17 07:44:50 UTC 2018
(0b375e4)
   RealMemory=1 AllocMem=0 FreeMem=79532 Sockets=2 Boards=1
   State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=GPUsmall,pm_shared
   BootTime=2019-12-10T14:16:37 SlurmdStartTime=2019-12-10T14:24:08
   CfgTRES=cpu=36,mem=1M,billing=36
   AllocTRES=cpu=6
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

node19:-

NodeName=node19 Arch=x86_64 CoresPerSocket=18
   CPUAlloc=16 CPUErr=0 CPUTot=36 CPULoad=15.43
   AvailableFeatures=K2200
   ActiveFeatures=K2200
   Gres=gpu:2
   NodeAddr=node19 NodeHostName=node19 Version=17.11
   OS=Linux 4.12.14-94.41-default #1 SMP Wed Oct 31 12:25:04 UTC 2018
(3090901)
   RealMemory=1 AllocMem=0 FreeMem=63998 Sockets=2 Boards=1
   State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=GPUsmall,pm_shared
   BootTime=2020-03-12T06:51:54 SlurmdStartTime=2020-03-12T06:53:14
   CfgTRES=cpu=36,mem=1M,billing=36
   AllocTRES=cpu=16
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

Could you please help me understand what the reason could be?
 cat /etc/slurm/slurm.conf
# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
#Running_config_start
#ControlMachine=node0
ControlMachine=slurmmaster
ControlAddr=192.168.150.21
AuthType=auth/munge
CryptoType=crypto/munge
CacheGroups=1
ReturnToService=0
ProctrackType=proctrack/linuxproc
SlurmctldPort=6817
SlurmdPort=6818
SchedulerPort=7321
SlurmctldPidFile=/var/slurm/slurmctld.pid
SlurmdPidFile=/var/slurm/slurmd.pid
SlurmdSpoolDir=/var/slurm/spool/slurmd.%n.spool
StateSaveLocation=/var/slurm/state
SlurmctldLogFile=/var/slurm/log/slurmctld.log
SlurmdLogFile=/var/slurm/log/slurmd.%n.log.%h
SlurmUser=hpcadmin
MpiDefault=none
SwitchType=switch/none
TaskPlugin=task/affinity
TaskPluginParam=Sched
SlurmctldTimeout=120
SlurmdTimeout=300
InactiveLimit=0
KillWait=30
MinJobAge=3600
FastSchedule=1
SchedulerType=sched/builtin
#SchedulerParameters=enable_user_top
SelectType=select/cons_res
#SelectTypeParameters=CR_Core_Memory
SelectTypeParameters=CR_Core
AccountingStorageEnforce=associations
AccountingStorageHost=155.250.126.30
AccountingStorageType=accounting_storage/slurmdbd
#AccountingStoreJobComment=YES
ClusterName=merckhpc
JobCompType=jobcomp/slurmdbd
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmctldDebug=5
SlurmdDebug=5
Waittime=0
#Running_config_end
#ControlAddr=
#BackupController=
#BackupAddr=
#
#CheckpointType=checkpoint/none
#DisableRootJobs=NO
#EnforcePartLimits=NO
Epilog=/etc/slurm/slurm.epilog.clean
#EpilogSlurmctld=
#FirstJobId=1
#MaxJobId=99
GresTypes=gpu
#GroupUpdateForce=0
#GroupUpdateTime=600
#JobCheckpointDir=/var/slurm/checkpoint
#JobCredentialPrivateKey=
#JobCredentialPublicCertificate=
#JobFileAppend=0
#JobRequeue=1
#JobSubmitPlugins=1
#KillOnBadExit=0
#Licenses=foo*4,bar
#MailProg=/bin/mail
#MaxJobCount=5000
MaxJobCount=500
#MaxStepCount=4
#MaxTasksPerNode=128
#MpiParams=ports=#-#
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
#Prolog=
#PrologSlurmctld=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
#SallocDefaultCommand=
#SrunEpilog=
#SrunProlog=
#TaskEpilog=
#TaskProlog=
#TopologyPlugin=topology/tree
#TmpFs=/tmp
#TrackWCKey=no
#TreeWidth=
#UnkillableStepProgram=
#UsePAM=0
#UsePAM=0
#
#
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
MessageTimeout=100
#ResvOverRun=0
#OverTimeLimit=0
#UnkillableStepTimeout=60
#VSizeFactor=0
SchedulerParameters=enable_user_top,default_queue_depth=100
#
#
# SCHEDULING
#DefMemPerCPU=0
#MaxMemPerCPU=0
#SchedulerRootFilter=1
#SchedulerTimeSlice=30
#
#
# JOB PRIORITY
PriorityType=priority/multifactor
#PriortyFlags=Ticket_Based
#PriorityDecayHalfLife=1-0
PriorityDecayHalfLife=2
#PriorityCalcPeriod=
#PriorityFavorSmall=YES
#PriorityMaxAge=7-0
PriorityUsageResetPeriod=DAILY

Re: [slurm-users] not allocating jobs even resources are free

2020-04-24 Thread navin srivastava
Thanks Brian.

I need to check the job order.

Is there any way to define a default time limit for a job if the user does not
specify one?
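
For reference, a hedged sketch of the DefaultTime partition option in
slurm.conf (the partition line here is illustrative, not the cluster's actual
definition):

   PartitionName=GPUsmall Nodes=node[18-19] DefaultTime=01:00:00 MaxTime=INFINITE State=UP

Jobs submitted without --time then inherit DefaultTime, which also gives the
backfill scheduler a wall time to work with.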

Also, what does fairtree mean in the priority settings in slurm.conf?

The sets of nodes in the partitions are different; FIFO does not care about
partitioning.
Is it strict ordering, meaning the job that came first will go, and until it
runs no others are allowed?

Also, the priority is high for the GPUsmall partition and low for normal jobs,
and the nodes of the normal partition are full, but GPUsmall cores are
available.

Regards
Navin

On Fri, Apr 24, 2020, 23:49 Brian W. Johanson  wrote:

> Without seeing the jobs in your queue, I would expect the next job in FIFO
> order to be too large to fit in the current idle resources.
>
> Configure it to use the backfill scheduler: SchedulerType=sched/backfill
>
>   SchedulerType
>   Identifies  the type of scheduler to be used.  Note the
> slurmctld daemon must be restarted for a change in scheduler type to become
> effective (reconfiguring a running daemon has no effect for this
> parameter).  The scontrol command can be used to manually change job
> priorities if desired.  Acceptable values include:
>
>   sched/backfill
>  For a backfill scheduling module to augment the
> default FIFO scheduling.  Backfill scheduling will initiate lower-priority
> jobs if doing so does not delay the expected initiation time of any
> higher  priority  job.   Effectiveness  of  backfill scheduling is
> dependent upon users specifying job time limits, otherwise all jobs will
> have the same time limit and backfilling is impossible.  Note documentation
> for the SchedulerParameters option above.  This is the default
> configuration.
>
>   sched/builtin
>  This  is  the  FIFO scheduler which initiates jobs in
> priority order.  If any job in the partition can not be scheduled, no lower
> priority job in that partition will be scheduled.  An exception is made for
> jobs that can not run due to partition constraints (e.g. the time limit) or
> down/drained nodes.  In that case, lower priority jobs can be initiated and
> not impact the higher priority job.
>
>
>
> Your partitions are set with maxtime=INFINITE, if your users are not
> specifying a reasonable timelimit to their jobs, this won't help either.
>
>
> -b
>
>
> On 4/24/20 1:52 PM, navin srivastava wrote:
>
> In addition to the above when i see the sprio of both the jobs it says :-
>
> for normal queue jobs all jobs showing the same priority
>
>  JOBID PARTITION   PRIORITY  FAIRSHARE
> 1291352 normal   15789  15789
>
> for GPUsmall all jobs showing the same priority.
>
>  JOBID PARTITION   PRIORITY  FAIRSHARE
> 1291339 GPUsmall  21052  21053
>
> On Fri, Apr 24, 2020 at 11:14 PM navin srivastava 
> wrote:
>
>> Hi Team,
>>
>> we are facing some issue in our environment. The resources are free but
>> job is going into the QUEUE state but not running.
>>
>> i have attached the slurm.conf file here.
>>
>> scenario:-
>>
>> There are job only in the 2 partitions:
>>  344 jobs are in PD state in normal partition and the node belongs
>> from the normal partitions are full and no more job can run.
>>
>> 1300 JOBS are in GPUsmall partition are in queue and enough CPU is
>> avaiable to execute the jobs but i see the jobs are not scheduling on free
>> nodes.
>>
>> Rest there are no pend jobs in any other partition .
>> eg:-
>> node status:- node18
>>
>> NodeName=node18 Arch=x86_64 CoresPerSocket=18
>>CPUAlloc=6 CPUErr=0 CPUTot=36 CPULoad=4.07
>>AvailableFeatures=K2200
>>ActiveFeatures=K2200
>>Gres=gpu:2
>>NodeAddr=node18 NodeHostName=node18 Version=17.11
>>OS=Linux 4.4.140-94.42-default #1 SMP Tue Jul 17 07:44:50 UTC 2018
>> (0b375e4)
>>RealMemory=1 AllocMem=0 FreeMem=79532 Sockets=2 Boards=1
>>State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
>>Partitions=GPUsmall,pm_shared
>>BootTime=2019-12-10T14:16:37 SlurmdStartTime=2019-12-10T14:24:08
>>CfgTRES=cpu=36,mem=1M,billing=36
>>AllocTRES=cpu=6
>>CapWatts=n/a
>>CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>>ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>>
>> node19:-
>>
>> NodeName=node19 Arch=x86_64 CoresPerSocket=18
>>CPUAlloc=16 CPUErr=0 CPUTot=36 CPULoad=15.43
>>AvailableFeatures=K2200
>>ActiveFeatures=K2200
>>Gres=gpu:2
>>NodeAddr=node19 NodeHostName=node19 Version=17.11

Re: [slurm-users] not allocating jobs even resources are free

2020-05-04 Thread navin srivastava
Thanks, Daniel, for the detailed description.

Regards
Navin

On Sun, May 3, 2020, 13:35 Daniel Letai  wrote:

>
> On 29/04/2020 12:00:13, navin srivastava wrote:
>
> Thanks Daniel.
>
> All jobs went into run state so unable to provide the details but
> definitely will reach out later if we see similar issue.
>
> i am more interested to understand the FIFO with Fair Tree.it will be good
> if anybody provide some insight on this combination and also if we will
> enable the backfilling here how the behaviour will change.
>
> what is the role of the Fair tree here?
>
> Fair tree is the algorithm used to calculate the interim priority, before
> applying weight, but I think after the halflife decay.
>
>
> To make it simple - fifo without fairshare would assign priority based
> only on submission time. With faishare, that naive priority is adjusted
> based on prior usage by the applicable entities (users/departments -
> accounts).
>
>
> Backfill will let you utilize your resources better, since it will allow
> "inserting" low priority jobs before higher priority jobs, provided all
> jobs have defined wall times, and any inserted job doesn't affect in any
> way the start time of a higher priority job, thus allowing utilization of
> "holes" when the scheduler waits for resources to free up, in order to
> insert some large job.
>
>
> Suppose the system is at 60% utilization of cores, and the next fifo job
> requires 42% - it will wait until 2% are free so it can begin, meanwhile
> not allowing any job to start, even if it would tke only 30% of the
> resources (whic are currently free) and would finish before the 2% are free
> anyway.
>
> Backfill would allow such job to start, as long as it's wall time ensures
> it would finish before the 42% job would've started.
>
>
> Fairtree in either case (fifo or backfill) calculates the priority for
> each job the same - if the account had used more resources recently (the
> halflife decay factor) it would get a lower priority even though it was
> submitted earlier than a job from an account that didn't use any resources
> recently.
>
>
> As can be expected, backtree has to loop over all jobs in the queue, in
> order to see if any job can fit out of order. In very busy/active systems,
> that can lead to poor response times, unless tuned correctly in slurm conf
> - look at SchedulerParameters, all params starting with bf_ and in
> particular bf_max_job_test= ,bf_max_time= and bf_continue (but bf_window=
> can also have some impact if set too high).
>
> see the man page at
> https://slurm.schedmd.com/slurm.conf.html#OPT_SchedulerParameters
>
>
> PriorityType=priority/multifactor
> PriorityDecayHalfLife=2
> PriorityUsageResetPeriod=DAILY
> PriorityWeightFairshare=50
> PriorityFlags=FAIR_TREE
>
> Regards
> Navin.
>
>
>
> On Mon, Apr 27, 2020 at 9:37 PM Daniel Letai  wrote:
>
>> Are you sure there are enough resources available? The node is in mixed
>> state, so it's configured for both partitions - it's possible that earlier
>> lower priority jobs are already running thus blocking the later jobs,
>> especially since it's fifo.
>>
>>
>> It would really help if you pasted the results of:
>>
>> squeue
>>
>> sinfo
>>
>>
>> As well as the exact sbatch line, so we can see how many resources per
>> node are requested.
>>
>>
>> On 26/04/2020 12:00:06, navin srivastava wrote:
>>
>> Thanks Brian,
>>
>> As suggested i gone through document and what i understood  that the fair
>> tree leads to the Fairshare mechanism and based on that the job should be
>> scheduling.
>>
>> so it mean job scheduling will be based on FIFO but priority will be
>> decided on the Fairshare. i am not sure if both conflicts here.if i see the
>> normal jobs priority is lower than the GPUsmall priority. so resources are
>> available with gpusmall partition then it should go. there is no job pend
>> due to gpu resources. the gpu resources itself not asked with the job.
>>
>> is there any article where i can see how the fairshare works and which
>> are setting should not be conflict with this.
>> According to document it never says that if fair-share is applied then
>> FIFO should be disabled.
>>
>> Regards
>> Navin.
>>
>>
>>
>>
>>
>> On Sat, Apr 25, 2020 at 12:47 AM Brian W. Johanson 
>> wrote:
>>
>>>
>>> If you haven't looked at the man page for slurm.conf, it will answer
>>> most if not all your questions.
>>> https://slurm.schedmd.com/slurm.conf.html but I would depend o

Re: [slurm-users] not allocating jobs even resources are free

2020-04-29 Thread navin srivastava
Thanks Daniel.

All jobs have gone into the running state, so I am unable to provide the
details, but I will definitely reach out later if we see a similar issue.

I am more interested in understanding FIFO combined with Fair Tree. It would
be good if anybody could provide some insight on this combination, and also on
how the behaviour will change if we enable backfilling here.

What is the role of Fair Tree here, given these settings?

PriorityType=priority/multifactor
PriorityDecayHalfLife=2
PriorityUsageResetPeriod=DAILY
PriorityWeightFairshare=50
PriorityFlags=FAIR_TREE
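
For reference, a hedged sketch of enabling backfill alongside the multifactor
settings above (the bf_ values are illustrative tuning knobs, not
recommendations):

   SchedulerType=sched/backfill
   SchedulerParameters=bf_continue,bf_max_job_test=500,bf_window=1440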

Regards
Navin.



On Mon, Apr 27, 2020 at 9:37 PM Daniel Letai  wrote:

> Are you sure there are enough resources available? The node is in mixed
> state, so it's configured for both partitions - it's possible that earlier
> lower priority jobs are already running thus blocking the later jobs,
> especially since it's fifo.
>
>
> It would really help if you pasted the results of:
>
> squeue
>
> sinfo
>
>
> As well as the exact sbatch line, so we can see how many resources per
> node are requested.
>
>
> On 26/04/2020 12:00:06, navin srivastava wrote:
>
> Thanks Brian,
>
> As suggested i gone through document and what i understood  that the fair
> tree leads to the Fairshare mechanism and based on that the job should be
> scheduling.
>
> so it mean job scheduling will be based on FIFO but priority will be
> decided on the Fairshare. i am not sure if both conflicts here.if i see the
> normal jobs priority is lower than the GPUsmall priority. so resources are
> available with gpusmall partition then it should go. there is no job pend
> due to gpu resources. the gpu resources itself not asked with the job.
>
> is there any article where i can see how the fairshare works and which are
> setting should not be conflict with this.
> According to document it never says that if fair-share is applied then
> FIFO should be disabled.
>
> Regards
> Navin.
>
>
>
>
>
> On Sat, Apr 25, 2020 at 12:47 AM Brian W. Johanson 
> wrote:
>
>>
>> If you haven't looked at the man page for slurm.conf, it will answer most
>> if not all your questions.
>> https://slurm.schedmd.com/slurm.conf.html but I would depend on the the
>> manual version that was distributed with the version you have installed as
>> options do change.
>>
>> There is a ton of information that is tedious to get through but reading
>> through it multiple times opens many doors.
>>
>> DefaultTime is listed in there as a Partition option.
>> If you are scheduling gres/gpu resources, it's quite possible there are
>> cores available with no corresponding gpus avail.
>>
>> -b
>>
>> On 4/24/20 2:49 PM, navin srivastava wrote:
>>
>> Thanks Brian.
>>
>> I need  to check the jobs order.
>>
>> Is there  any way to define the default timeline of the job if user  not
>> specifying time limit.
>>
>> Also what does the meaning of fairtree  in priorities in slurm.Conf file.
>>
>> The set of nodes are different in partitions.FIFO  does  not care for
>> any  partitiong.
>> Is it like strict odering means the job came 1st will go and until  it
>> runs it will  not allow others.
>>
>> Also priorities is high for gpusmall partition and low for normal jobs
>> and the nodes of the normal partition is full but gpusmall cores are
>> available.
>>
>> Regards
>> Navin
>>
>> On Fri, Apr 24, 2020, 23:49 Brian W. Johanson  wrote:
>>
>>> Without seeing the jobs in your queue, I would expect the next job in
>>> FIFO order to be too large to fit in the current idle resources.
>>>
>>> Configure it to use the backfill scheduler: SchedulerType=sched/backfill
>>>
>>>   SchedulerType
>>>   Identifies  the type of scheduler to be used.  Note the
>>> slurmctld daemon must be restarted for a change in scheduler type to become
>>> effective (reconfiguring a running daemon has no effect for this
>>> parameter).  The scontrol command can be used to manually change job
>>> priorities if desired.  Acceptable values include:
>>>
>>>   sched/backfill
>>>  For a backfill scheduling module to augment the
>>> default FIFO scheduling.  Backfill scheduling will initiate lower-priority
>>> jobs if doing so does not delay the expected initiation time of any
>>> higher  priority  job.   Effectiveness  of  backfill scheduling is
>>> dependent upon users specifying job time limits, otherwise all jobs will
>>> have the same time limit and backfilling is impossible.  Note documentation
>

Re: [slurm-users] How to request for the allocation of scratch .

2020-04-14 Thread navin srivastava
Any suggestions on the above query? I need help understanding it.
If TmpFS=/scratch and the request is #SBATCH --tmp=500GB, will it reserve
500 GB from scratch?
Let me know if my assumption is correct.

Regards
Navin.


On Mon, Apr 13, 2020 at 11:10 AM navin srivastava 
wrote:

> Hi Team,
>
> i wanted to define a mechanism to request the local disk space while
> submitting the job.
>
> we have dedicated /scratch of 1.2 TB file system for the execution of the
> job on each of the compute nodes other than / and other file system.
> i have defined in slurm.conf as TmpFS=/scratch  and then wanted to use
> #SBATCH --scratch =10GB   in the request.
> but it seems it is not accepting this variable except /tmp.
>
> Then i have opted the mechanism of gres.conf
>
> GresTypes=gpu,scratch
>
> and defined each node the scratch value and then requested using
> --gres=lscratch:10GB
> but in this scenario if requesting both gres resources gpu as well as
> scratch it show me only scratch in my Gres resource not gpu.
> does it using the gpu also as a gres resource?
>
> could anybody please advice which is the correct method to achieve the
> same?
> Also, is scratch will be able to calculate the actual usage value on the
> node.
>
> REgards
> Navin.
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>


Re: [slurm-users] how to restrict jobs

2020-05-05 Thread navin srivastava
Thanks Michael,

Yes, I have gone through it, but the licenses are remote licenses and they are
also used outside of Slurm, not only inside it.
So basically I am interested in knowing how we can update the database
dynamically to get the exact value at any point in time.
I mean query the license server and update the database accordingly. Does
Slurm automatically update the value based on usage?
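
For reference, a hedged sketch of declaring a remote, database-tracked license
with sacctmgr (the names, counts and hosts are placeholders; as far as I know
the count is not polled from the license server automatically, so sites
typically update it themselves, e.g. from a cron script):

   sacctmgr add resource name=app1 server=licsrv1 servertype=flexlm type=license count=80 percentallowed=100 cluster=mycluster
   # later, if the total on the license server changes:
   sacctmgr modify resource name=app1 server=licsrv1 set count=60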


Regards
Navin.


On Tue, May 5, 2020 at 7:00 PM Renfro, Michael  wrote:

> Have you seen https://slurm.schedmd.com/licenses.html already? If the
> software is just for use inside the cluster, one Licenses= line in
> slurm.conf plus users submitting with the -L flag should suffice. Should be
> able to set that license value is 4 if it’s licensed per node and you can
> run up to 4 jobs simultaneously, or 4*NCPUS if it’s licensed per CPU, or 1
> if it’s a single license good for one run from 1-4 nodes.
>
> There are also options to query a FlexLM or RLM server for license
> management.
>
> --
> Mike Renfro, PhD / HPC Systems Administrator, Information Technology
> Services
> 931 372-3601 / Tennessee Tech University
>
> > On May 5, 2020, at 7:54 AM, navin srivastava 
> wrote:
> >
> > Hi Team,
> >
> > we have an application whose licenses is limited .it scales upto 4
> nodes(~80 cores).
> > so if 4 nodes are full, in 5th node job used to get fail.
> > we want to put a restriction so that the application can't go for the
> execution beyond the 4 nodes and fail it should be in queue state.
> > i do not want to keep a separate partition to achieve this config.is
> there a way to achieve this scenario using some dynamic resource which can
> call the license variable on the fly and if it is reached it should keep
> the job in queue.
> >
> > Regards
> > Navin.
> >
> >
> >
>
>


Re: [slurm-users] how to restrict jobs

2020-05-06 Thread navin srivastava
Thanks Michael.

Actually, one application's licenses are node-based and we have a 4-node
license (not fixed to specific nodes). We have several nodes, but once jobs
land on any 4 random nodes, the application runs on those nodes only. After
that, it fails if a job goes to other nodes.

Can we define a custom variable, set it at the node level, and have the user
pass that variable at submission so that the job will land onto those
specific nodes?
I do not want to create a separate partition.

Is there any way to achieve this by some other method?

Regards
Navin.


Regards
Navin.

On Tue, May 5, 2020 at 7:46 PM Renfro, Michael  wrote:

> Haven’t done it yet myself, but it’s on my todo list.
>
> But I’d assume that if you use the FlexLM or RLM parts of that
> documentation, that Slurm would query the remote license server
> periodically and hold the job until the necessary licenses were available.
>
> > On May 5, 2020, at 8:37 AM, navin srivastava 
> wrote:
> >
> > External Email Warning
> > This email originated from outside the university. Please use caution
> when opening attachments, clicking links, or responding to requests.
> > Thanks Michael,
> >
> > yes i have gone through but the licenses are remote license and it will
> be used by outside as well not only in slurm.
> > so basically i am interested to know how we can update the database
> dynamically to get the exact value at that point of time.
> > i mean query the license server and update the database accordingly.
> does slurm automatically updated the value based on usage?
> >
> >
> > Regards
> > Navin.
> >
> >
> > On Tue, May 5, 2020 at 7:00 PM Renfro, Michael 
> wrote:
> > Have you seen https://slurm.schedmd.com/licenses.html already? If the
> software is just for use inside the cluster, one Licenses= line in
> slurm.conf plus users submitting with the -L flag should suffice. Should be
> able to set that license value is 4 if it’s licensed per node and you can
> run up to 4 jobs simultaneously, or 4*NCPUS if it’s licensed per CPU, or 1
> if it’s a single license good for one run from 1-4 nodes.
> >
> > There are also options to query a FlexLM or RLM server for license
> management.
> >
> > --
> > Mike Renfro, PhD / HPC Systems Administrator, Information Technology
> Services
> > 931 372-3601 / Tennessee Tech University
> >
> > > On May 5, 2020, at 7:54 AM, navin srivastava 
> wrote:
> > >
> > > Hi Team,
> > >
> > > we have an application whose licenses is limited .it scales upto 4
> nodes(~80 cores).
> > > so if 4 nodes are full, in 5th node job used to get fail.
> > > we want to put a restriction so that the application can't go for the
> execution beyond the 4 nodes and fail it should be in queue state.
> > > i do not want to keep a separate partition to achieve this config.is
> there a way to achieve this scenario using some dynamic resource which can
> call the license variable on the fly and if it is reached it should keep
> the job in queue.
> > >
> > > Regards
> > > Navin.
> > >
> > >
> > >
> >
>
>


[slurm-users] how to restrict jobs

2020-05-05 Thread navin srivastava
Hi Team,

We have an application whose licenses are limited; it scales up to 4
nodes (~80 cores).
So if 4 nodes are full, a job on a 5th node fails.
We want to put a restriction in place so that the application cannot be
executed beyond 4 nodes and fail; instead the job should stay in the queue.
I do not want to keep a separate partition to achieve this. Is there
a way to achieve this scenario using some dynamic resource that can check
the license count on the fly and, if the limit is reached, keep the job
in the queue?

Regards
Navin.


Re: [slurm-users] how to restrict jobs

2020-05-06 Thread navin srivastava
To explain with more details:

Jobs are submitted based on cores and can go to any random nodes at any
time, but the license is limited to 4 nodes only. (The license has some
intelligence: it counts the nodes, and once it reaches 4 it will not allow
any more nodes. It does not depend on the number of cores available on the
nodes.)

Case 1: 4 jobs are running with 4 cores each on 4 nodes [node1, node2, node3
and node4].

 If a fifth job is assigned by Slurm with 4 cores on any one of
node1, node2, node3 or node4, then the license is allowed.

Case 2: 4 jobs are running with 4 cores each on 4 nodes [node1, node2, node3
and node4].

 If a fifth job is assigned by Slurm on node5 with 4 cores, then
the license is not allowed [a "license not found" error came up in this case].


Regards
Navin.


On Wed, May 6, 2020 at 7:47 PM Renfro, Michael  wrote:

> To make sure I’m reading this correctly, you have a software license that
> lets you run jobs on up to 4 nodes at once, regardless of how many CPUs you
> use? That is, you could run any one of the following sets of jobs:
>
> - four 1-node jobs,
> - two 2-node jobs,
> - one 1-node and one 3-node job,
> - two 1-node and one 2-node jobs,
> - one 4-node job,
>
> simultaneously? And the license isn’t node-locked to specific nodes by MAC
> address or anything similar? But if you try to run jobs beyond what I’ve
> listed above, you run out of licenses, and you want those later jobs to be
> held until licenses are freed up?
>
> If all of those questions have an answer of ‘yes’, I think you want the
> remote license part of the https://slurm.schedmd.com/licenses.html,
> something like:
>
>   sacctmgr add resource name=software_name count=4 percentallowed=100
> server=flex_host servertype=flexlm type=license
>
> and submit jobs with a '-L software_name:N’ flag where N is the number of
> nodes you want to run on.
>
> > On May 6, 2020, at 5:33 AM, navin srivastava 
> wrote:
> >
> > Thanks Micheal.
> >
> > Actually one application license are based on node and we have 4 Node
> license( not a fix node). we have several nodes but when job lands on any 4
> random nodes it runs on those nodes only. After that it fails if it goes to
> other nodes.
> >
> > can we define a custom variable and set it on the node level and when
> user submit it will pass that variable and then job will and onto those
> specific nodes?
> > i do not want to create a separate partition.
> >
> > is there any way to achieve this by any other method?
> >
> > Regards
> > Navin.
> >
> >
> > Regards
> > Navin.
> >
> > On Tue, May 5, 2020 at 7:46 PM Renfro, Michael 
> wrote:
> > Haven’t done it yet myself, but it’s on my todo list.
> >
> > But I’d assume that if you use the FlexLM or RLM parts of that
> documentation, that Slurm would query the remote license server
> periodically and hold the job until the necessary licenses were available.
> >
> > > On May 5, 2020, at 8:37 AM, navin srivastava 
> wrote:
> > >
> > > External Email Warning
> > > This email originated from outside the university. Please use caution
> when opening attachments, clicking links, or responding to requests.
> > > Thanks Michael,
> > >
> > > yes i have gone through but the licenses are remote license and it
> will be used by outside as well not only in slurm.
> > > so basically i am interested to know how we can update the database
> dynamically to get the exact value at that point of time.
> > > i mean query the license server and update the database accordingly.
> does slurm automatically updated the value based on usage?
> > >
> > >
> > > Regards
> > > Navin.
> > >
> > >
> > > On Tue, May 5, 2020 at 7:00 PM Renfro, Michael 
> wrote:
> > > Have you seen https://slurm.schedmd.com/licenses.html already? If the
> software is just for use inside the cluster, one Licenses= line in
> slurm.conf plus users submitting with the -L flag should suffice. Should be
> able to set that license value is 4 if it’s licensed per node and you can
> run up to 4 jobs simultaneously, or 4*NCPUS if it’s licensed per CPU, or 1
> if it’s a single license good for one run from 1-4 nodes.
> > >
> > > There are also options to query a FlexLM or RLM server for license
> management.
> > >
> > > --
> > > Mike Renfro, PhD / HPC Systems Administrator, Information Technology
> Services
> > > 931 372-3601 / Tennessee Tech University
> > >
> > > > On May 5, 2020, at 7:54 AM, navin srivastava 
> wrote:
> > > >
> > > > Hi Team,
> > 

Re: [slurm-users] how to restrict jobs

2020-05-06 Thread navin srivastava
Is there no way to set or define a custom variable at the node level and
then pass the same variable in the job request so that the job will land
onto those nodes only?
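For example, is the node Feature / --constraint mechanism what I should be
looking at here? Something like this (the feature name "applic" is made up):

   # slurm.conf
   NodeName=node[1-4] ... Feature=applic

   # job submission
   sbatch --constraint=applic job.sh

Or is there a better way to steer these jobs without a dedicated partition?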


Regards
Navin

On Wed, May 6, 2020, 21:04 Renfro, Michael  wrote:

> Ok, then regular license accounting won’t work.
>
> Somewhat tested, but should work or at least be a starting point. Given a
> job number JOBID that’s already running with this license on one or more
> nodes:
>
>   sbatch -w $(scontrol show job JOBID | grep ' NodeList=' | cut -d= -f2)
> -N 1
>
> should start a one-node job on an available node being used by JOBID. Add
> other parameters as required for cpus-per-task, time limits, or whatever
> else is needed. If you start the larger jobs first, and let the later jobs
> fill in on idle CPUs on those nodes, it should work.
>
> > On May 6, 2020, at 9:46 AM, navin srivastava 
> wrote:
> >
> > To explain with more details.
> >
> > job will be submitted based on core at any time but it will go to any
> random nodes but limited to 4 Nodes only.(license having some intelligence
> that it calculate the nodes and if it reached to 4 then it will not allow
> any more nodes. yes it didn't depend on the no of core available on nodes.
> >
> > Case-1 if 4 jobs running with 4 cores each on 4 nodes [node1, node2,
> node3 and node4]
> >  Again Fifth job assigned by SLURM with 4 cores on any one
> node of node1, node2, node3 and node4 then license will be allowed.
> >
> > Case-2 if 4 jobs running with 4 cores each on 4 nodes [node1, node2,
> node3 and node4]
> >  Again Fifth job assigned by SLURM on node5 with 4 cores
> then license will not allowed [ license not found error came in this case]
> >
> > Regards
> > Navin.
> >
> >
> > On Wed, May 6, 2020 at 7:47 PM Renfro, Michael 
> wrote:
> > To make sure I’m reading this correctly, you have a software license
> that lets you run jobs on up to 4 nodes at once, regardless of how many
> CPUs you use? That is, you could run any one of the following sets of jobs:
> >
> > - four 1-node jobs,
> > - two 2-node jobs,
> > - one 1-node and one 3-node job,
> > - two 1-node and one 2-node jobs,
> > - one 4-node job,
> >
> > simultaneously? And the license isn’t node-locked to specific nodes by
> MAC address or anything similar? But if you try to run jobs beyond what
> I’ve listed above, you run out of licenses, and you want those later jobs
> to be held until licenses are freed up?
> >
> > If all of those questions have an answer of ‘yes’, I think you want the
> remote license part of the https://slurm.schedmd.com/licenses.html,
> something like:
> >
> >   sacctmgr add resource name=software_name count=4 percentallowed=100
> server=flex_host servertype=flexlm type=license
> >
> > and submit jobs with a '-L software_name:N’ flag where N is the number
> of nodes you want to run on.
> >
> > > On May 6, 2020, at 5:33 AM, navin srivastava 
> wrote:
> > >
> > > Thanks Micheal.
> > >
> > > Actually one application license are based on node and we have 4 Node
> license( not a fix node). we have several nodes but when job lands on any 4
> random nodes it runs on those nodes only. After that it fails if it goes to
> other nodes.
> > >
> > > can we define a custom variable and set it on the node level and when
> user submit it will pass that variable and then job will and onto those
> specific nodes?
> > > i do not want to create a separate partition.
> > >
> > > is there any way to achieve this by any other method?
> > >
> > > Regards
> > > Navin.
> > >
> > >
> > > Regards
> > > Navin.
> > >
> > > On Tue, May 5, 2020 at 7:46 PM Renfro, Michael 
> wrote:
> > > Haven’t done it yet myself, but it’s on my todo list.
> > >
> > > But I’d assume that if you use the FlexLM or RLM parts of that
> documentation, that Slurm would query the remote license server
> periodically and hold the job until the necessary licenses were available.
> > >
> > > > On May 5, 2020, at 8:37 AM, navin srivastava 
> wrote:
> > > >
> > > > External Email Warning
> > > > This email originated from outside the university. Please use
> caution when opening attachments, clicking links, or responding to requests.
> > > > Thanks Michael,
> > > >
> > > > yes i have gone through but the licenses are remote license and it
> will be used by outside as well not only in slurm.
> > > > so basically i am inter

[slurm-users] is there a way to delay the scheduling.

2020-08-28 Thread navin srivastava
Hi Team,

We are facing one issue. Several users are submitting 2 jobs in a single
batch job, and these are very short jobs (say 1-2 sec). While submitting
more jobs, slurmctld becomes unresponsive and starts giving these messages:

Sending job 6e508a88155d9bec40d752c8331d7ae8 to queue.
sbatch: error: Batch job submission failed: Unable to contact slurm
controller (connect failure)
Sending job 6e51ed0e322c87802b0f3a2f23a7967f to queue.
sbatch: error: Batch job submission failed: Unable to contact slurm
controller (connect failure)
Sending job 6e638939f90cd59e60c23b8450af9839 to queue.
sbatch: error: Batch job submission failed: Unable to contact slurm
controller (connect failure)
Sending job 6e6acf36bc7e1394a92155a95feb1c92 to queue.
sbatch: error: Batch job submission failed: Unable to contact slurm
controller (connect failure)
Sending job 6e6c646a29f0ad4e9df35001c367a9f5 to queue.
sbatch: error: Batch job submission failed: Unable to contact slurm
controller (connect failure)
Sending job 6ebcecb4c27d88f0f48d402e2b079c52 to queue.

At the same time, the slurmctld process starts consuming more than 100% CPU.
I found that the nodes are not able to acknowledge to the server
immediately; they are slow moving from the completing (comp) state to idle.
So my thought is that delaying the scheduling cycle will help here. Any idea
how that can be done?

Is there any other solution available for such issues?
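From the scheduler documentation, the knobs I was thinking of are something
like this in slurm.conf (the values are only guesses, not tested):

   SchedulerParameters=defer,sched_min_interval=2000000,batch_sched_delay=10

i.e. defer the per-submission scheduling attempt and rate-limit the main
scheduling loop. Is that the right direction?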

Regards
Navin.


[slurm-users] slurm Report

2020-09-24 Thread navin srivastava
Hi team,

I have extracted the % utilization report and found that the idle time is at
the higher end, so I wanted to check: is there any way we can find per-node
utilization?

It will help us to figure out which nodes are underutilized.
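For reference, the kind of report I mean is the cluster-wide one, e.g.:

   sreport -t percent cluster utilization start=2020-09-01 end=2020-09-24

For per-node numbers, my current idea is to sum up allocated time per node
from sacct output, roughly:

   sacct -a -X -S 2020-09-01 -E 2020-09-24 -o JobID,NodeList,AllocCPUS,Elapsed

(the dates are just examples), but that needs parsing of the NodeList field,
so I wanted to check whether there is a cleaner built-in way.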

Regards
Navin.


[slurm-users] federation cluster management

2020-09-21 Thread navin srivastava
Dear all,

I read about the concept of federated clusters in Slurm. Is it really
helpful for maximizing cluster usage?

We have 4 independent Slurm clusters, each working with its own local
storage, and we want to build a federation so that we can
utilize the free compute power of nodes when they are idle. Can I
achieve that by building a federated cluster?

My worry is how I can handle the read/write operations on storage if it is
local to each cluster.

Any idea or suggestion will be welcome

Regards
Navin.


Re: [slurm-users] ignore gpu resources to scheduled the cpu based jobs

2020-06-30 Thread navin srivastava
Hi Team,

I have separated the CPU-only nodes and the GPU nodes into two different
queues.

Now I have 20 nodes with CPUs only (20 cores each) and no GPU.
Another set of nodes has GPU+CPU: some nodes have 2 GPUs and 20 CPUs,
and some have 8 GPUs and 48 CPUs, all assigned to the GPU queue.

Users are facing issues in the GPU queue. The scenario is as below:

Users submit jobs with 4CPU+1GPU and also jobs with 4CPU
only. The situation arises when all the GPUs are full: the jobs
submitted with GPU resources are waiting in the queue, and although a large
amount of CPU is available, the jobs that require only CPUs are not
going through, because the 4CPU+1GPU jobs have higher priority than the
CPU-only ones.

Is there any mechanism so that once all GPUs are in use, the
CPU-only jobs are allowed to run?

Regards
Navin.






On Mon, Jun 22, 2020 at 6:09 PM Diego Zuccato 
wrote:

> Il 16/06/20 16:23, Loris Bennett ha scritto:
>
> > Thanks for pointing this out - I hadn't been aware of this.  Is there
> > anywhere in the documentation where this is explicitly stated?
> I don't remember. Seems Michael's experience is different. Possibly some
> other setting influences that behaviour. Maybe different partition
> priorities?
> But on the small cluster I'm managing it's this way. I'm not an expert
> and I'd like to understand.
>
> --
> Diego Zuccato
> DIFA - Dip. di Fisica e Astronomia
> Servizi Informatici
> Alma Mater Studiorum - Università di Bologna
> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> tel.: +39 051 20 95786
>
>


Re: [slurm-users] changes in slurm.

2020-07-10 Thread navin srivastava
Thank you for the answers.

Should RealMemory be set based on the total memory value or the total usable
memory value?

I mean, if a node has 256GB RAM, free -g will report only 251 GB:
deda1x1591:~ # free -g
 total   used   free sharedbuffers cached
Mem:   251 67184  6  0 47

So should we add the value as 251*1024 MB or 256*1024 MB? Or is there any
Slurm command which will provide me the value to add?

Regards
Navin.



On Thu, Jul 9, 2020 at 8:01 PM Brian Andrus  wrote:

> Navin,
>
> 1. you will need to restart slurmctld when you make changes to the
> physical definition of a node. This can be done without affecting
> running jobs.
>
> 2. You can have a node in more than one partition. That will not hurt
> anything. Jobs are allocated to nodes, not partitions, the partition is
> used to determine which node(s) and filter/order jobs. You should add
> the node to the new partition, but also leave it in the 'test'
> partition. If you are looking to remove the 'test' partition, set it to
> down and once all the running jobs that are in it finish, then remove it.
>
> Brian Andrus
>
> On 7/8/2020 10:57 PM, navin srivastava wrote:
> > Hi Team,
> >
> > i have 2 small query.because of the lack of testing environment i am
> > unable to test the scenario. working on to set up a test environment.
> >
> > 1. In my environment i am unable to pass #SBATCH --mem-2GB option.
> > i found the reason is because there is no RealMemory entry in the node
> > definition of the slurm.
> >
> > NodeName=Node[1-12] NodeHostname=deda1x[1450-1461] NodeAddr=Node[1-12]
> > Sockets=2 CoresPerSocket=10 State=UNKNOWN
> >
> > if i add the RealMemory it should be able to pick. So my query here
> > is, is it possible to add RealMemory in the definition anytime while
> > the jobs are in progres and execute the scontrol reconfigure and
> > reload the daemon on client node?  or do we need to take a
> > downtime?(which i don't think so)
> >
> > 2. Also I would like to know what will happen if some jobs are running
> > in a partition(say test) and I will move the associated node to some
> > other partition(say normal) without draining the node.or if i suspend
> > the job and then change the node partition and will resume the job. I
> > am not deleting the partition here.
> >
> > Regards
> > Navin.
> >
> >
> >
> >
> >
> >
> >
>
>


[slurm-users] CPU allocation for the GPU jobs.

2020-07-13 Thread navin srivastava
Hi Team,

We have separate partitions for the GPU nodes and the CPU-only nodes.

Scenario: the jobs submitted in our environment request 4CPU+1GPU as well as
4CPU only, in nodeGPUsmall and nodeGPUbig. When all the GPUs are exhausted,
the remaining GPU jobs queue up waiting for GPU
resources, and the jobs submitted with only CPUs do not go through even
though plenty of CPU resources are available; the CPU-only jobs stay
pending behind these GPU-based jobs (the priority of the GPU
jobs is higher than the CPU-only ones).

Is there any option here so that when all GPU resources are
exhausted, the CPU-only jobs are still allowed to run? Is there a way to
deal with it, or some custom solution we could think of? There is no issue
with the CPU-only partitions.

Below is the my slurm configuration file


NodeName=node[1-12] NodeAddr=node[1-12] Sockets=2 CoresPerSocket=10
RealMemory=128833 State=UNKNOWN
NodeName=node[13-16] NodeAddr=node[13-16] Sockets=2 CoresPerSocket=10
RealMemory=515954 Feature=HIGHMEM State=UNKNOWN
NodeName=node[28-32]  NodeAddr=node[28-32] Sockets=2 CoresPerSocket=28
RealMemory=257389
NodeName=node[32-33]  NodeAddr=node[32-33] Sockets=2 CoresPerSocket=24
RealMemory=773418
NodeName=node[17-27]  NodeAddr=node[17-27] Sockets=2 CoresPerSocket=18
RealMemory=257687 Feature=K2200 Gres=gpu:2
NodeName=node[34]  NodeAddr=node34 Sockets=2 CoresPerSocket=24
RealMemory=773410 Feature=RTX Gres=gpu:8


PartitionName=node Nodes=node[1-10,14-16,28-33,35]  Default=YES
MaxTime=INFINITE State=UP Shared=YES
PartitionName=nodeGPUsmall Nodes=node[17-27]  Default=NO MaxTime=INFINITE
State=UP Shared=YES
PartitionName=nodeGPUbig Nodes=node[34]  Default=NO MaxTime=INFINITE
State=UP Shared=YES
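One idea I am considering on top of this configuration is an extra
overlapping partition on the GPU nodes, capped with MaxCPUsPerNode, so that
CPU-only jobs can use the idle cores without competing in the GPU
partition's priority order. A rough sketch (the partition name and the
12-core cap are only examples):

PartitionName=nodeGPUsmallCPU Nodes=node[17-27] MaxCPUsPerNode=12 Default=NO
MaxTime=INFINITE State=UP Shared=YES

Would that be a sane approach, or is there something better?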

Regards
Navin.


Re: [slurm-users] changes in slurm.

2020-07-10 Thread navin srivastava
Thanks. Either I can use the value that slurmd -C gives (because I see the
same set of nodes giving different values), or I can choose the available
memory, i.e. 251*1024.

Regards
Navin

On Fri, Jul 10, 2020, 20:34 Stephan Roth  wrote:

> It's recommended to round RealMemory down to the next lower gigabyte
> value to prevent nodes from entering a drain state after rebooting with
> a bios- or kernel-update.
>
> Source: https://slurm.schedmd.com/SLUG17/FieldNotes.pdf, "Node
> configuration"
>
> Stephan
>
> On 10.07.20 13:46, Sarlo, Jeffrey S wrote:
> > If you run  slurmd -C  on the compute node, it should tell you what
> > slurm thinks the RealMemory number is.
> >
> > Jeff
> >
> > --------
> > *From:* slurm-users  on behalf
> of
> > navin srivastava 
> > *Sent:* Friday, July 10, 2020 6:24 AM
> > *To:* Slurm User Community List 
> > *Subject:* Re: [slurm-users] changes in slurm.
> > Thank you for the answers.
> >
> > is the RealMemory will be decided on the Total Memory value or total
> > usable memory value.
> >
> > i mean if a node having 256GB RAM but free -g will tell about only 251
> GB.
> > deda1x1591:~ # free -g
> >   total   used   free sharedbuffers
> cached
> > Mem:   251 67184  6  0 47
> >
> > so we can add the value is 251*1024 MB  or 256*1024MB.  or is there any
> > slurm command which will provide me the value to add.
> >
> > Regards
> > Navin.
> >
> >
> >
> > On Thu, Jul 9, 2020 at 8:01 PM Brian Andrus  > <mailto:toomuc...@gmail.com>> wrote:
> >
> > Navin,
> >
> > 1. you will need to restart slurmctld when you make changes to the
> > physical definition of a node. This can be done without affecting
> > running jobs.
> >
> > 2. You can have a node in more than one partition. That will not hurt
> > anything. Jobs are allocated to nodes, not partitions, the partition
> is
> > used to determine which node(s) and filter/order jobs. You should add
> > the node to the new partition, but also leave it in the 'test'
> > partition. If you are looking to remove the 'test' partition, set it
> to
> > down and once all the running jobs that are in it finish, then
> > remove it.
> >
> > Brian Andrus
> >
> > On 7/8/2020 10:57 PM, navin srivastava wrote:
> >  > Hi Team,
> >  >
> >  > i have 2 small query.because of the lack of testing environment i
> am
> >  > unable to test the scenario. working on to set up a test
> environment.
> >  >
> >  > 1. In my environment i am unable to pass #SBATCH --mem-2GB option.
> >  > i found the reason is because there is no RealMemory entry in the
> > node
> >  > definition of the slurm.
> >  >
> >  > NodeName=Node[1-12] NodeHostname=deda1x[1450-1461]
> > NodeAddr=Node[1-12]
> >  > Sockets=2 CoresPerSocket=10 State=UNKNOWN
> >  >
> >  > if i add the RealMemory it should be able to pick. So my
> query here
> >  > is, is it possible to add RealMemory in the definition anytime
> while
> >  > the jobs are in progres and execute the scontrol reconfigure and
> >  > reload the daemon on client node?  or do we need to take a
> >  > downtime?(which i don't think so)
> >  >
> >  > 2. Also I would like to know what will happen if some jobs are
> > running
> >  > in a partition(say test) and I will move the associated node to
> some
> >  > other partition(say normal) without draining the node.or if i
> > suspend
> >  > the job and then change the node partition and will resume the
> > job. I
> >  > am not deleting the partition here.
> >  >
> >  > Regards
> >  > Navin.
> >  >
> >  >
> >  >
> >  >
> >  >
> >  >
> >  >
> >
>
>
> ---
> Stephan Roth | ISG.EE D-ITET ETH Zurich | http://www.isg.ee.ethz.ch
> +4144 632 30 59  |  ETF D 104  |  Sternwartstrasse 7  | 8092 Zurich
> ---
>
>


Re: [slurm-users] CPU allocation for the GPU jobs.

2020-07-13 Thread navin srivastava
Thanks Renfro. My scheduling policy is below.

SchedulerType=sched/builtin
SelectType=select/cons_res
SelectTypeParameters=CR_Core
AccountingStorageEnforce=associations
AccountingStorageHost=192.168.150.223
AccountingStorageType=accounting_storage/slurmdbd
ClusterName=hpc
JobCompType=jobcomp/slurmdbd
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmctldDebug=5
SlurmdDebug=5
Waittime=0
Epilog=/etc/slurm/slurm.epilog.clean
GresTypes=gpu
MaxJobCount=500
SchedulerParameters=enable_user_top,default_queue_depth=100

# JOB PRIORITY
PriorityType=priority/multifactor
PriorityDecayHalfLife=2
PriorityUsageResetPeriod=DAILY
PriorityWeightFairshare=50
PriorityFlags=FAIR_TREE

Let me try changing it to backfill and see if it helps.
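I.e. roughly this change in slurm.conf (the bf_ values are only a first
guess):

SchedulerType=sched/backfill
SchedulerParameters=enable_user_top,default_queue_depth=100,bf_continue,bf_window=4320

together with making sure that jobs set a realistic --time limit, so that
backfill has something to work with.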


Regards
Navin.





On Mon, Jul 13, 2020 at 5:16 PM Renfro, Michael  wrote:

> “The *SchedulerType* configuration parameter specifies the scheduler
> plugin to use. Options are sched/backfill, which performs backfill
> scheduling, and sched/builtin, which attempts to schedule jobs in a strict
> priority order within each partition/queue.”
>
> https://slurm.schedmd.com/sched_config.html
>
> If you’re using the builtin scheduler, lower priority jobs have no way to
> run ahead of higher priority jobs. If you’re using the backfill scheduler,
> your jobs will need specific wall times specified, since the idea with
> backfill is to run lower priority jobs ahead of time if and only if they
> can complete without delaying the estimated start time of higher priority
> jobs.
>
> On Jul 13, 2020, at 4:18 AM, navin srivastava 
> wrote:
>
> Hi Team,
>
> We have separate partitions for the GPU nodes and only CPU nodes .
>
> scenario: the jobs submitted in our environment is 4CPU+1GPU  as well as
> 4CPU only in  nodeGPUsmall and nodeGPUbig. so when all the GPU exhausted
> and rest other jobs are in queue waiting for the availability of GPU
> resources.the job submitted with only CPU is not going through even
> though plenty of CPU resources are available but the job which is only
> looking CPU, also on pend because of these GPU based jobs( priority of GPU
> jobs is higher than CPU one).
>
> Is there any option here we can do,so that when all GPU resources are
> exhausted then it should allow the CPU jobs. Is there a way to deal with
> it? or some custom solution which we can think of.  There is no issue with
> CPU only partitions.
>
> Below is the my slurm configuration file
>
>
> NodeName=node[1-12] NodeAddr=node[1-12] Sockets=2 CoresPerSocket=10
> RealMemory=128833 State=UNKNOWN
> NodeName=node[13-16] NodeAddr=node[13-16] Sockets=2 CoresPerSocket=10
> RealMemory=515954 Feature=HIGHMEM State=UNKNOWN
> NodeName=node[28-32]  NodeAddr=node[28-32] Sockets=2 CoresPerSocket=28
> RealMemory=257389
> NodeName=node[32-33]  NodeAddr=node[32-33] Sockets=2 CoresPerSocket=24
> RealMemory=773418
> NodeName=node[17-27]  NodeAddr=node[17-27] Sockets=2 CoresPerSocket=18
> RealMemory=257687 Feature=K2200 Gres=gpu:2
> NodeName=node[34]  NodeAddr=node34 Sockets=2 CoresPerSocket=24
> RealMemory=773410 Feature=RTX Gres=gpu:8
>
>
> PartitionName=node Nodes=node[1-10,14-16,28-33,35]  Default=YES
> MaxTime=INFINITE State=UP Shared=YES
> PartitionName=nodeGPUsmall Nodes=node[17-27]  Default=NO MaxTime=INFINITE
> State=UP Shared=YES
> PartitionName=nodeGPUbig Nodes=node[34]  Default=NO MaxTime=INFINITE
> State=UP Shared=YES
>
> Regards
> Navin.
>
>
>


[slurm-users] changes in slurm.

2020-07-09 Thread navin srivastava
Hi Team,

I have 2 small queries. Because of the lack of a testing environment I am
unable to test these scenarios myself; I am working on setting up a test
environment.

1. In my environment I am unable to pass the #SBATCH --mem=2GB option.
I found the reason is that there is no RealMemory entry in the node
definition in slurm.conf.

NodeName=Node[1-12] NodeHostname=deda1x[1450-1461] NodeAddr=Node[1-12]
Sockets=2 CoresPerSocket=10 State=UNKNOWN

If I add RealMemory, it should be able to pick it up (see the sketch at the
end of this mail). So my query here is: is it possible to add RealMemory to
the definition at any time while jobs are in progress, and then execute
scontrol reconfigure and reload the daemon on the client nodes? Or do we
need to take a downtime (which I don't think so)?

2. Also, I would like to know what will happen if some jobs are running in a
partition (say test) and I move the associated node to some other
partition (say normal) without draining the node, or if I suspend the job,
then change the node's partition and resume the job. I am not deleting
the partition here.
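For question 1, the change I have in mind is only this (the RealMemory value
is just an example, not the real number for these nodes):

NodeName=Node[1-12] NodeHostname=deda1x[1450-1461] NodeAddr=Node[1-12]
Sockets=2 CoresPerSocket=10 RealMemory=257000 State=UNKNOWN

followed by scontrol reconfigure on the controller.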

Regards
Navin.


Re: [slurm-users] ignore gpu resources to scheduled the cpu based jobs

2020-06-15 Thread navin srivastava
Thanks Renfro.

I will apply a similar setting and see how it goes.

Regards

On Mon, Jun 15, 2020, 23:02 Renfro, Michael  wrote:

> So if a GPU job is submitted to a partition containing only GPU nodes, and
> a non-GPU job is submitted to a partition containing at least some nodes
> without GPUs, both jobs should be able to run. Priorities should be
> evaluated on a per-partition basis. I can 100% guarantee that in our HPC,
> pending GPU jobs don't block non-GPU jobs, and vice versa.
>
> I could see a problem if the GPU job was submitted to a partition
> containing both types of nodes: if that job was assigned the highest
> priority for whatever reason (fair share, age, etc.), other jobs in the
> same partition would have to wait until that job started.
>
> A simple solution would be to make a GPU partition containing only GPU
> nodes, and a non-GPU partition containing only non-GPU nodes. Submit GPU
> jobs to the GPU partition, and non-GPU jobs to the non-GPU partition.
>
> Once that works, you could make a partition that includes both types of
> nodes to reduce idle resources, but jobs submitted to that partition would
> have to (a) not require a GPU, (b) require a limited number of CPUs per
> node, so that you'd have some CPUs available for GPU jobs on the nodes
> containing GPUs.
>
> ----------
> *From:* slurm-users  on behalf of
> navin srivastava 
> *Sent:* Saturday, June 13, 2020 10:47 AM
> *To:* Slurm User Community List 
> *Subject:* Re: [slurm-users] ignore gpu resources to scheduled the cpu
> based jobs
>
>
> Yes we have separate partitions. Some are specific to gpu having 2 nodes
> with 8 gpu and another partitions are mix of both,nodes with 2 gpu and very
> few nodes are without any gpu.
>
> Regards
> Navin
>
>
> On Sat, Jun 13, 2020, 21:11 navin srivastava 
> wrote:
>
> Thanks Renfro.
>
> Yes we have both types of nodes with gpu and nongpu.
> Also some users job require gpu and some applications use only CPU.
>
> So the issue happens when user priority is high and waiting for gpu
> resources which is not available and the job with lower priority is waiting
> even though enough CPU is available which need only CPU resources.
>
> When I hold gpu  jobs the cpu  jobs will go through.
>
> Regards
> Navin
>
> On Sat, Jun 13, 2020, 20:37 Renfro, Michael  wrote:
>
> Will probably need more information to find a solution.
>
> To start, do you have separate partitions for GPU and non-GPU jobs? Do you
> have nodes without GPUs?
>
> On Jun 13, 2020, at 12:28 AM, navin srivastava 
> wrote:
>
> Hi All,
>
> In our environment we have GPU. so what i found is if the user having high
> priority and his job is in queue and waiting for the GPU resources which
> are almost full and not available. so the other user submitted the job
> which does not require the GPU resources are in queue even though lots of
> cpu resources are available.
>
> our scheduling mechanism is FIFO and Fair tree enabled. Is there any way
> we can make some changes so that the cpu based job should go through and
> GPU based job can wait till the GPU resources are free.
>
> Regards
> Navin.
>
>
>
>
>


[slurm-users] Changing job order

2020-06-17 Thread navin srivastava
Hi Team,

Is there a way to change the job order in Slurm, similar to sorder in PBS?

I want to swap my job with another job at the top of the queue.

Regards
Navin


Re: [slurm-users] Changing job order

2020-06-18 Thread navin srivastava
Thanks Ole.

Regards
Navin


On Thu, Jun 18, 2020 at 11:56 AM Ole Holm Nielsen <
ole.h.niel...@fysik.dtu.dk> wrote:

> The scontrol command to set the nice level is on the list here:
> https://wiki.fysik.dtu.dk/niflheim/SLURM#useful-commands
>
> /Ole
>
> On 6/18/20 8:05 AM, navin srivastava wrote:
> > Thanks **
> > What is the command to modify the Nice value of an already submitted job.
> >
> > Regards
> > Navin
> >
> > On Thu, Jun 18, 2020 at 4:00 AM Rodrigo Santibáñez
> > mailto:rsantibanez.uch...@gmail.com>>
> wrote:
> >
> > HI Navin,
> >
> > You could set the nice value of both jobs to change the priority and
> > modify the order of execution.
> >
> > El mié., 17 jun. 2020 a las 12:31, navin srivastava
> > (mailto:navin.alt...@gmail.com>>) escribió:
> >
> > Hi Team,
> >
> > Is their a way to change the job order in slurm.similar to sorder
> > in PBS.
> >
> > I want to swap my job from the other top job.
>
>


Re: [slurm-users] Changing job order

2020-06-18 Thread navin srivastava
Thanks.
What is the command to modify the nice value of an already submitted job?
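Is it something like the following? (The job id and nice value are made up.)

   scontrol update JobId=12345 Nice=100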

Regards
Navin

On Thu, Jun 18, 2020 at 4:00 AM Rodrigo Santibáñez <
rsantibanez.uch...@gmail.com> wrote:

> HI Navin,
>
> You could set the nice value of both jobs to change the priority and
> modify the order of execution.
>
> El mié., 17 jun. 2020 a las 12:31, navin srivastava (<
> navin.alt...@gmail.com>) escribió:
>
>> Hi Team,
>>
>> Is their a way to change the job order in slurm.similar to sorder in PBS.
>>
>> I want to swap my job from the other top job.
>>
>> Regards
>> Navin
>>
>>


[slurm-users] Job failure issue in Slurm

2020-06-04 Thread navin srivastava
Hi Team,

I am seeing a weird issue in my environment.
One of the Gaussian jobs is failing under Slurm within a minute after it
starts executing, without writing anything, and I am unable to figure out
the reason.
The same job works fine without Slurm on the same node.

slurmctld.log

[2020-06-03T19:14:33.170] debug:  Job 1357498 has more than one partition
(normal)(21052)
[2020-06-03T19:14:33.170] debug:  Job 1357498 has more than one partition
(normalGPUsmall)(21052)
[2020-06-03T19:14:33.170] debug:  Job 1357498 has more than one partition
(normalGPUbig)(21052)
[2020-06-03T19:15:12.497] debug:  sched: JobId=1357498. State=PENDING.
Reason=Priority, Priority=21052.
Partition=normal,normalGPUsmall,normalGPUbig.
[2020-06-03T19:15:12.497] debug:  sched: JobId=1357498. State=PENDING.
Reason=Priority, Priority=21052.
Partition=normal,normalGPUsmall,normalGPUbig.
[2020-06-03T19:15:12.497] debug:  sched: JobId=1357498. State=PENDING.
Reason=Priority, Priority=21052.
Partition=normal,normalGPUsmall,normalGPUbig.
[2020-06-03T19:16:12.626] debug:  sched: JobId=1357498. State=PENDING.
Reason=Priority, Priority=21052.
Partition=normal,normalGPUsmall,normalGPUbig.
[2020-06-03T19:17:12.753] debug:  sched: JobId=1357498. State=PENDING.
Reason=Priority, Priority=21052.
Partition=normal,normalGPUsmall,normalGPUbig.
[2020-06-03T19:18:12.882] debug:  sched: JobId=1357498. State=PENDING.
Reason=Priority, Priority=21052.
Partition=normal,normalGPUsmall,normalGPUbig.
[2020-06-03T19:19:13.633] sched: Allocate JobID=1357498 NodeList=oled4
#CPUs=4 Partition=normal
[2020-06-04T12:25:36.961] _job_complete: JobID=1357498 State=0x1 NodeCnt=1
WEXITSTATUS 2
[2020-06-04T12:25:36.961]  SLURM Job_id=1357498 Name=job1 Ended, Run time
17:06:23, FAILED, ExitCode 2
[2020-06-04T12:25:36.962] _job_complete: JobID=1357498 State=0x8005
NodeCnt=1 done

slurmd.log

[2020-06-04T12:22:43.712] [1357498.batch] debug:  jag_common_poll_data:
Task average frequency = 2769 pid 64084 mem size 4625724 23696420 time
164642.84(164537+105)
[2020-06-04T12:23:13.712] [1357498.batch] debug:  jag_common_poll_data:
Task average frequency = 2769 pid 64084 mem size 4625724 23696420 time
164762.82(164657+105)
[2020-06-04T12:23:43.712] [1357498.batch] debug:  jag_common_poll_data:
Task average frequency = 2769 pid 64084 mem size 4625724 23696420 time
164882.81(164777+105)
[2020-06-04T12:24:13.712] [1357498.batch] debug:  jag_common_poll_data:
Task average frequency = 2769 pid 64084 mem size 4625724 23696420 time
165002.79(164897+105)
[2020-06-04T12:24:43.712] [1357498.batch] debug:  jag_common_poll_data:
Task average frequency = 2769 pid 64084 mem size 4625724 23696420 time
165122.77(165016+105)
[2020-06-04T12:25:13.713] [1357498.batch] debug:  jag_common_poll_data:
Task average frequency = 2769 pid 64084 mem size 4625724 23696420 time
165242.75(165136+105)
[2020-06-04T12:25:36.955] [1357498.batch] task 0 (64084) exited with exit
code 2.
[2020-06-04T12:25:36.955] [1357498.batch] debug:  task_p_post_term:
affinity 1357498.4294967294, task 0
[2020-06-04T12:25:36.960] [1357498.batch] debug:
 step_terminate_monitor_stop signaling condition
[2020-06-04T12:25:36.960] [1357498.batch] job 1357498 completed with
slurm_rc = 0, job_rc = 512
[2020-06-04T12:25:36.960] [1357498.batch] sending
REQUEST_COMPLETE_BATCH_SCRIPT, error:0 status 512
[2020-06-04T12:25:36.961] [1357498.batch] debug:  Message thread exited
[2020-06-04T12:25:36.962] [1357498.batch] done with job
[2020-06-04T12:25:36.962] debug:  task_p_slurmd_release_resources: affinity
jobid 1357498
[2020-06-04T12:25:36.962] debug:  credential for job 1357498 revoked
[2020-06-04T12:25:36.963] debug:  Waiting for job 1357498's prolog to
complete
[2020-06-04T12:25:36.963] debug:  Finished wait for job 1357498's prolog to
complete
[2020-06-04T12:25:36.963] debug:  [job 1357498] attempting to run epilog
[/etc/slurm/slurm.epilog.clean]
[2020-06-04T12:25:37.254] debug:  completed epilog for jobid 1357498
[2020-06-04T12:25:37.254] debug:  Job 1357498: sent epilog complete msg: rc
= 0

Any suggestions to help troubleshoot this issue further are welcome.

Regards
Navin.


Re: [slurm-users] Job failure issue in Slurm

2020-06-08 Thread navin srivastava
Thanks Sathish.

All other jobs are running fine across the cluster, so I don't think it is
related to any PAM module issue. I am investigating the issue further and
will come back to you with more details.

Regards
Navin


On Mon, Jun 8, 2020, 19:24 sathish  wrote:

> Hi Navin,
>
> Was this working earlier or is this the first time are you trying ?
> Are you using pam module ? if yes, try disabling the pam module and see
> if it works.
>
> Thanks
> Sathish
>
> On Thu, Jun 4, 2020 at 10:47 PM navin srivastava 
> wrote:
>
>> Hi Team,
>>
>> i am seeing a weird issue in my environment.
>> one of the gaussian job is failing with the slurm within a minute after
>> it go for the execution without writing anything and unable to figure out
>> the reason.
>> The same job works fine without slurm on the same node.
>>
>> slurmctld.log
>>
>> [2020-06-03T19:14:33.170] debug:  Job 1357498 has more than one partition
>> (normal)(21052)
>> [2020-06-03T19:14:33.170] debug:  Job 1357498 has more than one partition
>> (normalGPUsmall)(21052)
>> [2020-06-03T19:14:33.170] debug:  Job 1357498 has more than one partition
>> (normalGPUbig)(21052)
>> [2020-06-03T19:15:12.497] debug:  sched: JobId=1357498. State=PENDING.
>> Reason=Priority, Priority=21052.
>> Partition=normal,normalGPUsmall,normalGPUbig.
>> [2020-06-03T19:15:12.497] debug:  sched: JobId=1357498. State=PENDING.
>> Reason=Priority, Priority=21052.
>> Partition=normal,normalGPUsmall,normalGPUbig.
>> [2020-06-03T19:15:12.497] debug:  sched: JobId=1357498. State=PENDING.
>> Reason=Priority, Priority=21052.
>> Partition=normal,normalGPUsmall,normalGPUbig.
>> [2020-06-03T19:16:12.626] debug:  sched: JobId=1357498. State=PENDING.
>> Reason=Priority, Priority=21052.
>> Partition=normal,normalGPUsmall,normalGPUbig.
>> [2020-06-03T19:17:12.753] debug:  sched: JobId=1357498. State=PENDING.
>> Reason=Priority, Priority=21052.
>> Partition=normal,normalGPUsmall,normalGPUbig.
>> [2020-06-03T19:18:12.882] debug:  sched: JobId=1357498. State=PENDING.
>> Reason=Priority, Priority=21052.
>> Partition=normal,normalGPUsmall,normalGPUbig.
>> [2020-06-03T19:19:13.633] sched: Allocate JobID=1357498 NodeList=oled4
>> #CPUs=4 Partition=normal
>> [2020-06-04T12:25:36.961] _job_complete: JobID=1357498 State=0x1
>> NodeCnt=1 WEXITSTATUS 2
>> [2020-06-04T12:25:36.961]  SLURM Job_id=1357498 Name=job1 Ended, Run time
>> 17:06:23, FAILED, ExitCode 2
>> [2020-06-04T12:25:36.962] _job_complete: JobID=1357498 State=0x8005
>> NodeCnt=1 done
>>
>> slurmd.log
>>
>> [2020-06-04T12:22:43.712] [1357498.batch] debug:  jag_common_poll_data:
>> Task average frequency = 2769 pid 64084 mem size 4625724 23696420 time
>> 164642.84(164537+105)
>> [2020-06-04T12:23:13.712] [1357498.batch] debug:  jag_common_poll_data:
>> Task average frequency = 2769 pid 64084 mem size 4625724 23696420 time
>> 164762.82(164657+105)
>> [2020-06-04T12:23:43.712] [1357498.batch] debug:  jag_common_poll_data:
>> Task average frequency = 2769 pid 64084 mem size 4625724 23696420 time
>> 164882.81(164777+105)
>> [2020-06-04T12:24:13.712] [1357498.batch] debug:  jag_common_poll_data:
>> Task average frequency = 2769 pid 64084 mem size 4625724 23696420 time
>> 165002.79(164897+105)
>> [2020-06-04T12:24:43.712] [1357498.batch] debug:  jag_common_poll_data:
>> Task average frequency = 2769 pid 64084 mem size 4625724 23696420 time
>> 165122.77(165016+105)
>> [2020-06-04T12:25:13.713] [1357498.batch] debug:  jag_common_poll_data:
>> Task average frequency = 2769 pid 64084 mem size 4625724 23696420 time
>> 165242.75(165136+105)
>> [2020-06-04T12:25:36.955] [1357498.batch] task 0 (64084) exited with exit
>> code 2.
>> [2020-06-04T12:25:36.955] [1357498.batch] debug:  task_p_post_term:
>> affinity 1357498.4294967294, task 0
>> [2020-06-04T12:25:36.960] [1357498.batch] debug:
>>  step_terminate_monitor_stop signaling condition
>> [2020-06-04T12:25:36.960] [1357498.batch] job 1357498 completed with
>> slurm_rc = 0, job_rc = 512
>> [2020-06-04T12:25:36.960] [1357498.batch] sending
>> REQUEST_COMPLETE_BATCH_SCRIPT, error:0 status 512
>> [2020-06-04T12:25:36.961] [1357498.batch] debug:  Message thread exited
>> [2020-06-04T12:25:36.962] [1357498.batch] done with job
>> [2020-06-04T12:25:36.962] debug:  task_p_slurmd_release_resources:
>> affinity jobid 1357498
>> [2020-06-04T12:25:36.962] debug:  credential for job 1357498 revoked
>> [2020-06-04T12:25:36.963] debug:  Waiting for job 1357498's prolog to
>> complete
>> [2020-06-04T12:25:36.963] debug:  Finished wait for job 1357498's prolog
>> to complete
>> [2020-06-04T12:25:36.963] debug:  [job 1357498] attempting to run epilog
>> [/etc/slurm/slurm.epilog.clean]
>> [2020-06-04T12:25:37.254] debug:  completed epilog for jobid 1357498
>> [2020-06-04T12:25:37.254] debug:  Job 1357498: sent epilog complete msg:
>> rc = 0
>>
>> any suggestion will be welcome to troubleshoot this issue further.
>>
>> Regards
>> Navin.
>>
>>
>>
>>
>
> --
> Regards.
> Sathish
>


Re: [slurm-users] unable to start slurmd process.

2020-06-11 Thread navin srivastava
I tried running it in debug mode, but there it is also not writing
anything.

I waited for about 5-10 minutes:

deda1x1452:/etc/sysconfig # /usr/sbin/slurmd -v -v

No output on terminal.

The OS is SLES12-SP4. All firewall services are disabled.

The recent change concerns the hostnames: earlier the nodes used local
hostnames (node1, node2, etc.), but we have moved to DNS-based hostnames
(the deda1x names):

NodeName=node[1-12] NodeHostname=deda1x[1450-1461] NodeAddr=node[1-12]
Sockets=2 CoresPerSocket=10 State=UNKNOWN

Other than this it is fine; after that change I have started the slurmd
process on the node several times and it worked fine, but now I am seeing
this issue today.

Regards
Navin.









On Thu, Jun 11, 2020 at 6:06 PM Riebs, Andy  wrote:

> Navin,
>
>
>
> As you can see, systemd provides very little service-specific information.
> For slurm, you really need to go to the slurm logs to find out what
> happened.
>
>
>
> Hint: A quick way to identify problems like this with slurmd and slurmctld
> is to run them with the “-Dvvv” option, causing them to log to your window,
> and usually causing the problem to become immediately obvious.
>
>
>
> For example,
>
>
>
> # /usr/local/slurm/sbin/slurmd -D
>
>
>
> Just it ^C when you’re done, if necessary. Of course, if it doesn’t fail
> when you run it this way, it’s time to look elsewhere.
>
>
>
> Andy
>
>
>
> *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] *On
> Behalf Of *navin srivastava
> *Sent:* Thursday, June 11, 2020 8:25 AM
> *To:* Slurm User Community List 
> *Subject:* [slurm-users] unable to start slurmd process.
>
>
>
> Hi Team,
>
>
>
> when i am trying to start the slurmd process i am getting the below error.
>
>
>
> 2020-06-11T13:11:58.652711+02:00 oled3 systemd[1]: Starting Slurm node
> daemon...
> 2020-06-11T13:13:28.683840+02:00 oled3 systemd[1]: slurmd.service: Start
> operation timed out. Terminating.
> 2020-06-11T13:13:28.684479+02:00 oled3 systemd[1]: Failed to start Slurm
> node daemon.
> 2020-06-11T13:13:28.684759+02:00 oled3 systemd[1]: slurmd.service: Unit
> entered failed state.
> 2020-06-11T13:13:28.684917+02:00 oled3 systemd[1]: slurmd.service: Failed
> with result 'timeout'.
> 2020-06-11T13:15:01.437172+02:00 oled3 cron[8094]:
> pam_unix(crond:session): session opened for user root by (uid=0)
>
>
>
> Slurm version is 17.11.8
>
>
>
> The server and slurm is running from long time and we have not made any
> changes but today when i am starting it is giving this error message.
>
> Any idea what could be wrong here.
>
>
>
> Regards
>
> Navin.
>
>
>
>
>
>
>
>
>


[slurm-users] unable to start slurmd process.

2020-06-11 Thread navin srivastava
Hi Team,

When I am trying to start the slurmd process I am getting the below error:

2020-06-11T13:11:58.652711+02:00 oled3 systemd[1]: Starting Slurm node
daemon...
2020-06-11T13:13:28.683840+02:00 oled3 systemd[1]: slurmd.service: Start
operation timed out. Terminating.
2020-06-11T13:13:28.684479+02:00 oled3 systemd[1]: Failed to start Slurm
node daemon.
2020-06-11T13:13:28.684759+02:00 oled3 systemd[1]: slurmd.service: Unit
entered failed state.
2020-06-11T13:13:28.684917+02:00 oled3 systemd[1]: slurmd.service: Failed
with result 'timeout'.
2020-06-11T13:15:01.437172+02:00 oled3 cron[8094]: pam_unix(crond:session):
session opened for user root by (uid=0)

Slurm version is 17.11.8

The server and Slurm have been running for a long time and we have not made
any changes, but today when I try to start it, it gives this error message.
Any idea what could be wrong here?

Regards
Navin.


Re: [slurm-users] unable to start slurmd process.

2020-06-11 Thread navin srivastava
I collected the log from slurmctld and it says the below:

[2020-06-10T20:10:38.501] Resending TERMINATE_JOB request JobId=1252284
Nodelist=oled3
[2020-06-10T20:14:38.901] Resending TERMINATE_JOB request JobId=1252284
Nodelist=oled3
[2020-06-10T20:18:38.255] Resending TERMINATE_JOB request JobId=1252284
Nodelist=oled3
[2020-06-10T20:22:38.624] Resending TERMINATE_JOB request JobId=1252284
Nodelist=oled3
[2020-06-10T20:26:38.902] Resending TERMINATE_JOB request JobId=1252284
Nodelist=oled3
[2020-06-10T20:30:38.230] Resending TERMINATE_JOB request JobId=1252284
Nodelist=oled3
[2020-06-10T20:34:38.594] Resending TERMINATE_JOB request JobId=1252284
Nodelist=oled3
[2020-06-10T20:38:38.986] Resending TERMINATE_JOB request JobId=1252284
Nodelist=oled3
[2020-06-10T20:42:38.402] Resending TERMINATE_JOB request JobId=1252284
Nodelist=oled3
[2020-06-10T20:46:38.764] Resending TERMINATE_JOB request JobId=1252284
Nodelist=oled3
[2020-06-10T20:50:38.094] Resending TERMINATE_JOB request JobId=1252284
Nodelist=oled3
[2020-06-10T21:26:38.839] Resending TERMINATE_JOB request JobId=1252284
Nodelist=oled3
[2020-06-10T21:30:38.225] Resending TERMINATE_JOB request JobId=1252284
Nodelist=oled3
[2020-06-10T21:34:38.582] Resending TERMINATE_JOB request JobId=1252284
Nodelist=oled3
[2020-06-10T21:38:38.914] Resending TERMINATE_JOB request JobId=1252284
Nodelist=oled3
[2020-06-10T21:42:38.292] Resending TERMINATE_JOB request JobId=1252284
Nodelist=oled3
[2020-06-10T21:46:38.542] Resending TERMINATE_JOB request JobId=1252284
Nodelist=oled3
[2020-06-10T21:50:38.869] Resending TERMINATE_JOB request JobId=1252284
Nodelist=oled3
[2020-06-10T21:54:38.227] Resending TERMINATE_JOB request JobId=1252284
Nodelist=oled3
[2020-06-10T21:58:38.628] Resending TERMINATE_JOB request JobId=1252284
Nodelist=oled3
[2020-06-11T06:54:39.012] Resending TERMINATE_JOB request JobId=1252284
Nodelist=oled3
[2020-06-11T06:58:39.411] Resending TERMINATE_JOB request JobId=1252284
Nodelist=oled3
[2020-06-11T07:02:39.106] Resending TERMINATE_JOB request JobId=1252284
Nodelist=oled3
[2020-06-11T07:06:39.495] Resending TERMINATE_JOB request JobId=1252284
Nodelist=oled3
[2020-06-11T07:10:39.814] Resending TERMINATE_JOB request JobId=1252284
Nodelist=oled3
[2020-06-11T07:14:39.188] Resending TERMINATE_JOB request JobId=1252284
Nodelist=oled3
[2020-06-11T07:14:49.204] agent/is_node_resp: node:oled3
RPC:REQUEST_TERMINATE_JOB : Communication connection failure
[2020-06-11T07:14:50.210] error: Nodes oled3 not responding
[2020-06-11T07:15:54.313] error: Nodes oled3 not responding
[2020-06-11T07:17:34.407] error: Nodes oled3 not responding
[2020-06-11T07:19:14.637] error: Nodes oled3 not responding
[2020-06-11T07:19:54.313] update_node: node oled3 reason set to:
reboot-required
[2020-06-11T07:19:54.313] update_node: node oled3 state set to DRAINING*
[2020-06-11T07:20:43.788] requeue job 1316970 due to failure of node oled3
[2020-06-11T07:20:43.788] requeue job 1349322 due to failure of node oled3
[2020-06-11T07:20:43.789] error: Nodes oled3 not responding, setting DOWN

sinfo says

OLED*   up   infinite  1 drain* oled3

While checking the node, it looks healthy to me.

Regards
Navin

On Thu, Jun 11, 2020 at 7:21 PM Riebs, Andy  wrote:

> Weird. “slurmd -Dvvv” ought to report a whole lot of data; I can’t guess
> how to interpret it not reporting anything but the “log file” and “munge”
> messages. When you have it running attached to your window, is there any
> chance that sinfo or scontrol suggest that the node is actually all right?
> Perhaps something in /etc/sysconfig/slurm or the like is messed up?
>
>
>
> If that’s not the case, I think my next step would be to follow up on
> someone else’s suggestion, and scan the slurmctld.log file for the problem
> node name.
>
>
>
> *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] *On
> Behalf Of *navin srivastava
> *Sent:* Thursday, June 11, 2020 9:26 AM
> *To:* Slurm User Community List 
> *Subject:* Re: [slurm-users] unable to start slurmd process.
>
>
>
> Sorry Andy I missed to add.
>
> 1st i tried the  slurmd -Dvvv and it is not written anything
>
> slurmd: debug:  Log file re-opened
> slurmd: debug:  Munge authentication plugin loaded
>
>
>
> After that I waited for 10-20 minutes but no output and finally i pressed
> Ctrl^c.
>
>
>
> My doubt is in slurm.conf file:
>
>
>
> ControlMachine=deda1x1466
> ControlAddr=192.168.150.253
>
>
>
> The deda1x1466 is having a different interface with different IP which
> compute node is unable to ping but IP is pingable.
>
> could be one of the reason?
>
>
>
> but other nodes having the same config and there i am able to start the
> slurmd. so bit of confusion.
>
>
>
> Regards
>
> Navin.
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> Regards
>

Re: [slurm-users] unable to start slurmd process.

2020-06-11 Thread navin srivastava
I am able to get the output of "scontrol show node oled3",
and oled3 is pinging fine.

The "scontrol ping" output shows:

Slurmctld(primary/backup) at deda1x1466/(NULL) are UP/DOWN

So all looks OK to me.

Regards
Navin.



On Thu, Jun 11, 2020 at 8:38 PM Riebs, Andy  wrote:

> So there seems to be a failure to communicate between slurmctld and the
> oled3 slurmd.
>
>
>
> From oled3, try “scontrol ping” to confirm that it can see the slurmctld
> daemon.
>
>
>
> From the head node, try “scontrol show node oled3”, and then ping the
> address that is shown for “NodeAddr=”
>
>
>
> *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] *On
> Behalf Of *navin srivastava
> *Sent:* Thursday, June 11, 2020 10:40 AM
> *To:* Slurm User Community List 
> *Subject:* Re: [slurm-users] unable to start slurmd process.
>
>
>
> i collected the log from slurmctld and it says below
>
>
>
> [2020-06-10T20:10:38.501] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T20:14:38.901] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T20:18:38.255] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T20:22:38.624] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T20:26:38.902] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T20:30:38.230] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T20:34:38.594] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T20:38:38.986] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T20:42:38.402] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T20:46:38.764] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T20:50:38.094] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T21:26:38.839] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T21:30:38.225] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T21:34:38.582] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T21:38:38.914] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T21:42:38.292] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T21:46:38.542] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T21:50:38.869] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T21:54:38.227] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T21:58:38.628] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-11T06:54:39.012] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-11T06:58:39.411] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-11T07:02:39.106] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-11T07:06:39.495] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-11T07:10:39.814] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-11T07:14:39.188] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-11T07:14:49.204] agent/is_node_resp: node:oled3
> RPC:REQUEST_TERMINATE_JOB : Communication connection failure
> [2020-06-11T07:14:50.210] error: Nodes oled3 not responding
> [2020-06-11T07:15:54.313] error: Nodes oled3 not responding
> [2020-06-11T07:17:34.407] error: Nodes oled3 not responding
> [2020-06-11T07:19:14.637] error: Nodes oled3 not responding
> [2020-06-11T07:19:54.313] update_node: node oled3 reason set to:
> reboot-required
> [2020-06-11T07:19:54.313] update_node: node oled3 state set to DRAINING*
> [2020-06-11T07:20:43.788] requeue job 1316970 due to failure of node oled3
> [2020-06-11T07:20:43.788] requeue job 1349322 due to failure of node oled3
> [2020-06-11T07:20:43.789] error: Nodes oled3 not responding, setting DOWN
>
>
>
> sinfo says
>
>
>
> OLED*   up   infinite  1 drain* oled3
>
>
>
> while checking the node i feel node is healthy.
>
>
>
> Regards
>
> Navin
>
>
>
> On Thu, Jun 11, 2020 at 7:21 PM Riebs, Andy  wrote:
>
> Weird. “slurmd -Dvvv” ought to report a whole lot of data; I can’t guess
> how to interpret it not reporting anything but the “log file” and “munge”
> messages. When you have it running attached to your window, is there any
> chance that sinfo or scontrol suggest that the node is actually all right?
> Perhaps something in /etc/sysconfig/slurm or the like is messed up?
>
>
>
> If tha

Re: [slurm-users] unable to start slurmd process.

2020-06-12 Thread navin srivastava
Hi Team,

After my analysis, I found that a user had used the qdel command (a wrapper we
use alongside Slurm) and the job was not killed cleanly, which left its
slurmstepd processes in a hung state; that is why slurmd would not start.
After killing those stuck processes, slurmd started without any issues.
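
For anyone who hits the same thing, the cleanup was roughly along these lines
(PIDs, paths and the service name will differ per site):

# find any leftover step daemons on the node
ps -ef | grep [s]lurmstepd
# kill the hung step daemons found above
kill -9 <pid_of_hung_slurmstepd>
# then start slurmd again
systemctl start slurmd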

Regards
Navin.




On Thu, Jun 11, 2020 at 9:23 PM Riebs, Andy  wrote:

> Short of getting on the system and kicking the tires myself, I’m fresh out
> of ideas. Does “sinfo -R” offer any hints?
>
>
>
> *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] *On
> Behalf Of *navin srivastava
> *Sent:* Thursday, June 11, 2020 11:31 AM
> *To:* Slurm User Community List 
> *Subject:* Re: [slurm-users] unable to start slurmd process.
>
>
>
> i am able to get the output scontrol show node oled3
>
> also the oled3 is pinging fine
>
>
>
> and scontrol ping output showing like
>
>
>
> Slurmctld(primary/backup) at deda1x1466/(NULL) are UP/DOWN
>
>
>
> so all looks ok to me.
>
>
>
> REgards
>
> Navin.
>
>
>
>
>
>
>
> On Thu, Jun 11, 2020 at 8:38 PM Riebs, Andy  wrote:
>
> So there seems to be a failure to communicate between slurmctld and the
> oled3 slurmd.
>
>
>
> From oled3, try “scontrol ping” to confirm that it can see the slurmctld
> daemon.
>
>
>
> From the head node, try “scontrol show node oled3”, and then ping the
> address that is shown for “NodeAddr=”
>
>
>
> *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] *On
> Behalf Of *navin srivastava
> *Sent:* Thursday, June 11, 2020 10:40 AM
> *To:* Slurm User Community List 
> *Subject:* Re: [slurm-users] unable to start slurmd process.
>
>
>
> i collected the log from slurmctld and it says below
>
>
>
> [2020-06-10T20:10:38.501] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T20:14:38.901] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T20:18:38.255] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T20:22:38.624] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T20:26:38.902] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T20:30:38.230] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T20:34:38.594] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T20:38:38.986] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T20:42:38.402] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T20:46:38.764] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T20:50:38.094] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T21:26:38.839] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T21:30:38.225] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T21:34:38.582] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T21:38:38.914] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T21:42:38.292] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T21:46:38.542] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T21:50:38.869] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T21:54:38.227] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-10T21:58:38.628] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-11T06:54:39.012] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-11T06:58:39.411] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-11T07:02:39.106] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-11T07:06:39.495] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-11T07:10:39.814] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-11T07:14:39.188] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [2020-06-11T07:14:49.204] agent/is_node_resp: node:oled3
> RPC:REQUEST_TERMINATE_JOB : Communication connection failure
> [2020-06-11T07:14:50.210] error: Nodes oled3 not responding
> [2020-06-11T07:15:54.313] error: Nodes oled3 not responding
> [2020-06-11T07:17:34.407] error: Nodes oled3 not responding
> [2020-06-11T07:19:14.637] error: Nodes oled3 not responding
> [2020-06-11T07:19:54.313] update_node: node oled3 reason set to:
> reboot-required
> [2020-06-11T07:19:54.313] update_node: no

[slurm-users] ignore gpu resources to scheduled the cpu based jobs

2020-06-12 Thread navin srivastava
Hi All,

In our environment we have GPU nodes. What I found is that when a
high-priority user's job is queued and waiting for GPU resources, which are
almost fully occupied and therefore not available, jobs submitted by other
users that do not require any GPU also sit in the queue, even though plenty of
CPU resources are available.

Our scheduling mechanism is FIFO with fair tree enabled. Is there any way we
can change the configuration so that CPU-only jobs go through while GPU-based
jobs wait until GPU resources are free?
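
For context, the GPU jobs here request a GRES while the CPU-only jobs do not;
the submissions look roughly like this (partition names are illustrative, not
our exact setup):

# GPU job
sbatch --gres=gpu:1 -p gpu_partition job_gpu.sh
# CPU-only job
sbatch -p cpu_partition job_cpu.sh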

Regards
Navin.


Re: [slurm-users] ignore gpu resources to scheduled the cpu based jobs

2020-06-13 Thread navin srivastava
Yes, we have separate partitions. Some are GPU-specific, with 2 nodes of 8
GPUs each, and the other partitions are a mix: nodes with 2 GPUs plus a very
few nodes without any GPU.
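
Roughly, the layout is along these lines (node names, CPU counts and partition
names are placeholders, not our exact slurm.conf):

NodeName=gpunode[1-2]  CPUs=36 Gres=gpu:8
NodeName=mixnode[1-10] CPUs=36 Gres=gpu:2
NodeName=cpunode[1-3]  CPUs=36
PartitionName=gpu   Nodes=gpunode[1-2] State=UP
PartitionName=mixed Nodes=mixnode[1-10],cpunode[1-3] State=UP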

Regards
Navin


On Sat, Jun 13, 2020, 21:11 navin srivastava  wrote:

> Thanks Renfro.
>
> Yes we have both types of nodes with gpu and nongpu.
> Also some users job require gpu and some applications use only CPU.
>
> So the issue happens when user priority is high and waiting for gpu
> resources which is not available and the job with lower priority is waiting
> even though enough CPU is available which need only CPU resources.
>
> When I hold gpu  jobs the cpu  jobs will go through.
>
> Regards
> Navin
>
> On Sat, Jun 13, 2020, 20:37 Renfro, Michael  wrote:
>
>> Will probably need more information to find a solution.
>>
>> To start, do you have separate partitions for GPU and non-GPU jobs? Do
>> you have nodes without GPUs?
>>
>> On Jun 13, 2020, at 12:28 AM, navin srivastava 
>> wrote:
>>
>> Hi All,
>>
>> In our environment we have GPU. so what i found is if the user having
>> high priority and his job is in queue and waiting for the GPU resources
>> which are almost full and not available. so the other user submitted the
>> job which does not require the GPU resources are in queue even though lots
>> of cpu resources are available.
>>
>> our scheduling mechanism is FIFO and Fair tree enabled. Is there any way
>> we can make some changes so that the cpu based job should go through and
>> GPU based job can wait till the GPU resources are free.
>>
>> Regards
>> Navin.
>>
>>
>>
>>
>>


Re: [slurm-users] ignore gpu resources to scheduled the cpu based jobs

2020-06-13 Thread navin srivastava
Thanks Renfro.

Yes, we have both types of nodes, with and without GPUs.
Also, some users' jobs require a GPU while other applications use only CPUs.

So the issue happens when a high-priority job is waiting for GPU resources
that are not available, and a lower-priority job that needs only CPU resources
keeps waiting even though enough CPUs are free.

When I hold the GPU jobs, the CPU jobs go through.
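
(The hold is done by hand with something like "scontrol hold <jobid>" on the
pending GPU jobs, and released again later with "scontrol release <jobid>".)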

Regards
Navin

On Sat, Jun 13, 2020, 20:37 Renfro, Michael  wrote:

> Will probably need more information to find a solution.
>
> To start, do you have separate partitions for GPU and non-GPU jobs? Do you
> have nodes without GPUs?
>
> On Jun 13, 2020, at 12:28 AM, navin srivastava 
> wrote:
>
> Hi All,
>
> In our environment we have GPU. so what i found is if the user having high
> priority and his job is in queue and waiting for the GPU resources which
> are almost full and not available. so the other user submitted the job
> which does not require the GPU resources are in queue even though lots of
> cpu resources are available.
>
> our scheduling mechanism is FIFO and Fair tree enabled. Is there any way
> we can make some changes so that the cpu based job should go through and
> GPU based job can wait till the GPU resources are free.
>
> Regards
> Navin.
>
>
>
>
>


Re: [slurm-users] missing info from sacct

2020-11-18 Thread navin srivastava
Thank you Andy.

But when I try to get the utilization for these months it says 100%.
When I instead look at the utilization by user it gives me a very
different set of values which I am unable to understand.

deda1x1466:~ # sreport cluster AccountUtilizationByUser start=10/02/20 end=10/02/20 cluster=hpc2 -t HOUR --tres=cpu

Cluster/Account/User Utilization 2020-10-02T00:00:00 - 2020-10-02T00:59:59 (3600 secs)
Usage reported in TRES Hours

  Cluster   Account     Login       Proper Name   TRES Name   Used
--------- ---------- --------- ----------------- ---------- ------
     hpc2  root                                   cpu        68159
     hpc2  stdg_acc                               cpu        68159
     hpc2  stdg_acc   m219018   Harbach Philipp   cpu          317
     hpc2  stdg_acc   m253000   Morin Valerie     cpu           12
     hpc2  stdg_acc   m254746   Lippolis Eleon+   cpu            9
     hpc2  stdg_acc   m258464   Wurl Andreas      cpu           96
     hpc2  stdg_acc   m262230   Schmelzer Maxi+   cpu            2
     hpc2  stdg_acc   m270962   Heidrich Johan+   cpu        67647
     hpc2  stdg_acc   m271803   Hermsen Marko     cpu           46
     hpc2  stdg_acc   m275696   Ploetz Tobias     cpu           10
     hpc2  stdg_acc   m278452   Brandenburg Ja+   cpu           19
     hpc2  stdg_acc   m290493                     cpu            1

How is it calculating the hours in a day?
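
(My understanding is that the per-user "Used" column is in TRES-hours, i.e.
the CPUs allocated to a job multiplied by the elapsed seconds inside the
reporting window, divided by 3600; for example a job holding 4 CPUs for the
whole 3600-second window would contribute 4. Please correct me if that reading
is wrong.)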

Regards
Navin.



On Wed, Nov 18, 2020 at 7:51 PM Andy Riebs  wrote:

> I see from your subsequent post that you're using a pair of clusters
> with a single database, so yes, you are using federation.
>
> The high order bits of the Job ID identify the cluster that ran the job,
> so you will typically have a huge gap between ranges of Job IDs.
>
> Andy
>
> On 11/18/2020 9:15 AM, Andy Riebs wrote:
> > Are you using federated clusters? If not, check slurm.conf -- do you
> > have FirstJobId set?
> >
> > Andy
> >
> > On 11/18/2020 8:42 AM, navin srivastava wrote:
> >> While running the sacct we found that some jobid are not listing.
> >>
> >> 5535566  SYNTHLIBT+  stdg_defq   stdg_acc  1  COMPLETED
> >>0:0
> >> 5535567  SYNTHLIBT+  stdg_defq   stdg_acc  1  COMPLETED
> >>0:0
> >> 11016496 jupyter-s+  stdg_defq   stdg_acc  1  RUNNING
> >>  0:0
> >> 11016496.ex+ extern  stdg_acc  1  COMPLETED
> >>0:0
> >>
> >>  Not able to see the jobid in between these range in sacct info.
> >>  Any hint what went wrong here.
> >>
> >> Regards
> >> Navin.
>
>


[slurm-users] missing info from sacct

2020-11-18 Thread navin srivastava
While running sacct we found that some job IDs are not being listed.


5535566  SYNTHLIBT+  stdg_defq   stdg_acc  1  COMPLETED  0:0
5535567  SYNTHLIBT+  stdg_defq   stdg_acc  1  COMPLETED  0:0
11016496 jupyter-s+  stdg_defq   stdg_acc  1RUNNING  0:0
11016496.ex+ extern  stdg_acc  1  COMPLETED  0:0

 We are not able to see any job IDs between these two ranges in the sacct output.
 Any hint as to what went wrong here?
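
(The listing above comes from a plain sacct call; would something along the
lines of "sacct --clusters=all -X -S <start> -E <end>" be needed to see the
job IDs issued by the other cluster that shares the database? I am not sure
how the job IDs are numbered in that case.)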

Regards
Navin.


Re: [slurm-users] Sreport Query

2020-11-17 Thread navin srivastava
Is there a way to find the utilization per node?
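
(Something like "sacct -a -N <nodename> -X -S 2020-10-01 -E 2020-10-31
-o JobID,Elapsed,AllocCPUS" lists the jobs that ran on a given node, but I am
after an aggregated utilization figure per node, if such a report exists.)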

Regards
Navin.

On Wed, Nov 18, 2020 at 10:37 AM navin srivastava 
wrote:

> Dear All,
>
> Good Day!
>
> i am seeing one strange behaviour in my environment.
>
> we have 2 clusters in our environment one acting as a database server and
> have pointed the 2nd cluster to the same database.
>
> -- -
>   hpc1  155.250.126.30 6817  8192 1
> normal
>   hpc2  155.250.168.57 6817  8192 1
> normal
>
> While generating the report I am able to generate for the local
> cluster(hpc1) without any issue  and it looks good. but from the second
> cluster data it always shows me 100% utilization from june onwards ,earlier
> data is fine.which is definitely wrong.
>
> sreport cluster utilization start=06/01/20 end=06/30/20 cluster=hpc2 -t
> percent | grep hpc2
> hpc2 100.00%0.00%0.00%0.00%0.00% 99.82%
>
> any suggestion what went wrong here. how to troubleshoot this issue.
>
> Regards
> Navin.
>
>
>
>
>
>
>
>
>


[slurm-users] Slurm Upgrade

2020-11-02 Thread navin srivastava
Dear All,

Currently we are running Slurm version 17.11.x and we want to move to 20.x.

We are building a new server with Slurm 20.2 and planning to upgrade the
client nodes from 17.x to 20.x.

We wanted to check whether we can upgrade the clients from 17.x to 20.x
directly, or whether we need to step through 18.x and 19.x before moving to
20.x.

Regards
Navin.


Re: [slurm-users] Slurm Upgrade

2020-11-04 Thread navin srivastava
Thank you all for the response.

But my question here is:

I have already built a new server with Slurm 20.2 and the latest database.
Should I load a mysqldump from the existing server running Slurm 17.11.8 into
this new server and then upgrade all the clients step by step through 18.x and
19.x to 20.x, or can I simply uninstall Slurm 17.11.8 and install 20.2
directly on all compute nodes?
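
For reference, the dump/restore step I have in mind is just the standard one,
roughly as follows (this assumes the default database name slurm_acct_db and
that slurmdbd is stopped on both sides; adjust for your site):

# on the old 17.11.8 server
mysqldump -u slurm -p slurm_acct_db > slurm_acct_db.sql
# copy the file across, then on the new 20.2 server
mysql -u slurm -p slurm_acct_db < slurm_acct_db.sql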

Regards
Navin.









On Tue, Nov 3, 2020 at 12:31 PM Ole Holm Nielsen 
wrote:

> On 11/2/20 2:25 PM, navin srivastava wrote:
> > Currently we are running slurm version 17.11.x and wanted to move to
> 20.x.
> >
> > We are building the New server with Slurm 20.2 version and planning to
> > upgrade the client nodes from 17.x to 20.x.
> >
> > wanted to check if we can upgrade the Client from 17.x to 20.x directly
> or
> > we need to go through 17.x to 18.x and 19.x then 20.x
>
> I have described the Slurm upgrade process in my Wiki page:
> https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm
>
> It's based upon experiences and Slurm documentation and seems to work
> correctly.
>
> /Ole
>
>


[slurm-users] Sinfo or squeue stuck for some seconds

2021-08-29 Thread navin srivastava
Dear slurm community users,

We are  using slurm version 20.02.x.

We see the message below appearing many times in the slurmctld log, and we
found that whenever this message appears the sinfo/squeue output becomes slow.
There are no timeouts, as I have kept MessageTimeout at 100.

Warning: Note very large processing time from load_part_uid_allow_list:
usec=10800885 began=16:27:55.952
[2021-08-29T16:28:06.753] Warning: Note very large processing time from
_slurmctld_background: usec=10801120 began=16:27:55.952

Is this a bug or a configuration issue? Has anybody faced a similar issue and
could throw some light on this?
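
(From the function name I would guess user/group lookups are involved; timing
something like "time getent group <groupname>" and "time getent passwd
<username>" on the controller might show whether NSS/LDAP resolution is the
slow part, but that is only a guess on my side.)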

Please find our slurm.conf attached below.

Regards
Navin.
ClusterName=merckhpc
ControlMachine=Master
ControlAddr=localhost
AuthType=auth/munge
CredType=cred/munge
CacheGroups=1
ReturnToService=0
ProctrackType=proctrack/linuxproc
SlurmctldPort=6817
SlurmdPort=6818
SchedulerPort=7321

SlurmctldPidFile=/var/slurm/slurmctld.pid
SlurmdPidFile=/var/slurm/slurmd.%n.pid
SlurmdSpoolDir=/var/slurm/spool/slurmd.%n.spool
StateSaveLocation=/var/slurm/state
SlurmctldLogFile=/var/slurm/log/slurmctld.log
SlurmdLogFile=/var/slurm/log/slurmd.%n.log.%h
SlurmUser=hpcadmin
MpiDefault=none

SwitchType=switch/none
TaskPlugin=task/affinity
TaskPluginParam=Sched
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
KillWait=30
MinJobAge=3600


SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core

AccountingStorageEnforce=associations
AccountingStorageHost=localhost
AccountingStorageType=accounting_storage/slurmdbd
AccountingStoreJobComment=YES


JobCompType=jobcomp/slurmdbd
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmdDebug=5
SlurmctldDebug=5
Waittime=0

Epilog=/etc/slurm/slurm.epilog.clean
GresTypes=gpu
MaxArraySize=1
MaxJobCount=500
MessageTimeout=100


SchedulerParameters=enable_user_top,default_queue_depth=100
PriorityType=priority/multifactor
PriorityDecayHalfLife=2
PriorityUsageResetPeriod=DAILY
PriorityWeightFairshare=50
PriorityFlags=FAIR_TREE


NodeName=node[35-40] NodeHostname=bng1x[1847-1852] NodeAddr=node[35-40] CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=1 RealMemory=386626
NodeName=node[17-26] NodeHostName=bng1x[1590-1599] NodeAddr=node[17-26] CPUs=36 Boards=1 SocketsPerBoard=2 CoresPerSocket=18 ThreadsPerCore=1 RealMemory=257680 Feature=K2200 Gres=gpu:2
NodeName=node41 NodeHostName=bng1x1855 NodeAddr=node41 CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=1 RealMemory=386643 Feature=V100S Gres=gpu:2
NodeName=node[32-33] NodeHostname=bng1x[1793-1794] NodeAddr=node[32-33] Sockets=2 CoresPerSocket=24 RealMemory=773690
NodeName=node[28-31] NodeHostname=bng1x[1737-1740] NodeAddr=node[28-31] Sockets=2 CoresPerSocket=28 RealMemory=257586
NodeName=node[27] NodeHostname=bng1x1600 NodeAddr=node27 Sockets=2 CoresPerSocket=18 RealMemory=515728 Feature=K40 Gres=gpu:2
NodeName=node[34] NodeHostname=bng1x1795 NodeAddr=node34 Sockets=2 CoresPerSocket=24 RealMemory=773682 Feature=RTX Gres=gpu:8
PartitionName=Normal Nodes=node[28-33,35-40] Default=Yes MaxTime=INFINITE State=UP Shared=YES OverSubscribe=NO
PartitionName=testq Nodes=node41 Default=NO MaxTime=INFINITE State=UP Shared=YES
PartitionName=smallgpu Nodes=node[34] Default=NO MaxTime=INFINITE State=UP Shared=YES OverSubscribe=NO
PartitionName=biggpu Nodes=node[17-27] Default=NO MaxTime=INFINITE State=UP Shared=YES OverSubscribe=NO


[slurm-users] Slurm Multi-cluster implementation

2021-10-28 Thread navin srivastava
Hi ,

I am looking for a step-by-step guide to set up a multi-cluster implementation.
We want to set up 3 clusters and one login node from which jobs are submitted
using the -M <cluster> option.
Does anybody have such a setup who can share some insight into how it works,
and whether it is really a stable solution?
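
(The intended workflow, if I understand the -M option correctly, is that from
the shared login node one would run e.g. "sbatch -M cluster2 job.sh" to submit
to a particular cluster and "squeue -M all" to monitor all of them.)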


Regards
Navin.


Re: [slurm-users] Slurm Multi-cluster implementation

2021-10-28 Thread navin srivastava
Thank you Tina.

So if I understood correctly, the database is global to both clusters. Is it
running on the login node, or is it running on one of the master nodes and
shared with the other master node?

But as far as I have read, the Slurm database can also be kept separate on
each master, using the parameter AccountingStorageExternalHost so that both
databases are aware of each other.
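
For that separate-database variant, what I have in mind is roughly the
following line in each cluster's slurm.conf, pointing at the other side's
slurmdbd (hostname and port are placeholders, and this is untested on my side):

AccountingStorageExternalHost=other_dbd_host:6819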

Also, on the login node, which slurmctld does the slurm.conf point to?
Would it be possible to share a sample slurm.conf for the login node?

Regards
Navin.








On Thu, Oct 28, 2021 at 7:06 PM Tina Friedrich 
wrote:

> Hi Navin,
>
> well, I have two clusters & login nodes that allow access to both. That
> do? I don't think a third would make any difference in setup.
>
> They need to share a database. As long as the share a database, the
> clusters have 'knowledge' of each other.
>
> So if you set up one database server (running slurmdbd), and then a
> SLURM controller for each cluster (running slurmctld) using that one
> central database, the '-M' option should work.
>
> Tina
>
> On 28/10/2021 10:54, navin srivastava wrote:
> > Hi ,
> >
> > I am looking for a stepwise guide to setup multi cluster implementation.
> > We wanted to set up 3 clusters and one Login Node to run the job using
> > -M cluster option.
> > can anybody have such a setup and can share some insight into how it
> > works and it is really a stable solution.
> >
> >
> > Regards
> > Navin.
>
> --
> Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator
>
> Research Computing and Support Services
> IT Services, University of Oxford
> http://www.arc.ox.ac.uk http://www.it.ox.ac.uk
>
>


Re: [slurm-users] Slurm Multi-cluster implementation

2021-10-28 Thread navin srivastava
Thank you Tina.
It will really help

Regards
Navin

On Thu, Oct 28, 2021, 22:01 Tina Friedrich 
wrote:

> Hello,
>
> I have the database on a separate server (it runs the database and the
> database only). The login nodes run nothing SLURM related, they simply
> have the binaries installed & a SLURM config.
>
> I've never looked into having multiple databases & using
> AccountingStorageExternalHost (in fact I'd forgotten you could do that),
> so I can't comment on that (maybe someone else can); I think that works,
> yes, but as I said never tested that (didn't see much point in running
> multiple databases if one would do the job).
>
> I actually have specific login nodes for both of my clusters, to make it
> easier for users (especially those with not much experience using the
> HPC environment); so I have one login node connecting to cluster 1 and
> one connecting to cluster 1.
>
> I think the relevant bits of slurm.conf Relevant config entries (if I'm
> not mistaken) on the login nodes are probably:
>
> The differences in the slurm config files (that haven't got to do with
> topology & nodes & scheduler tuning) are
>
> ClusterName=cluster1
> ControlMachine=cluster1-slurm
> ControlAddr=/IP_OF_SLURM_CONTROLLER/
>
> ClusterName=cluster2
> ControlMachine=cluster2-slurm
> ControlAddr=/IP_OF_SLURM_CONTROLLER/
>
> (where IP_OF_SLURM_CONTROLLER is the IP address of host cluster1-slurm,
> same for cluster2)
>
> And then the have common entries for the AccountingStorageHost:
>
> AccountingStorageHost=slurm-db-prod
> AccountingStorageBackupHost=slurm-db-prod
> AccountingStoragePort=7030
> AccountingStorageType=accounting_storage/slurmdbd
>
> (slurm-db-prod is simply the hostname of the SLURM database server)
>
> Does that help?
>
> Tina
>
> On 28/10/2021 14:59, navin srivastava wrote:
> > Thank you Tina.
> >
> > so if i understood correctly.Database is global to both cluster and
> > running on login Node?
> > or is the database running on one of the master Node and shared with
> > another master server Node?
> >
> > but as far I have read that the slurm database can also be separate on
> > both the master and just use the parameter
> > AccountingStorageExternalHost so that both databases are aware of each
> > other.
> >
> > Also on the login node in slurm .conf file pointed to which Slurmctld?
> > is it possible to share the  sample slurm.conf file of login Node.
> >
> > Regards
> > Navin.
> >
> >
> >
> >
> >
> >
> >
> >
> > On Thu, Oct 28, 2021 at 7:06 PM Tina Friedrich
> > mailto:tina.friedr...@it.ox.ac.uk>> wrote:
> >
> > Hi Navin,
> >
> > well, I have two clusters & login nodes that allow access to both.
> That
> > do? I don't think a third would make any difference in setup.
> >
> > They need to share a database. As long as the share a database, the
> > clusters have 'knowledge' of each other.
> >
> > So if you set up one database server (running slurmdbd), and then a
> > SLURM controller for each cluster (running slurmctld) using that one
> > central database, the '-M' option should work.
> >
> > Tina
> >
> > On 28/10/2021 10:54, navin srivastava wrote:
> >  > Hi ,
> >  >
> >  > I am looking for a stepwise guide to setup multi cluster
> > implementation.
> >  > We wanted to set up 3 clusters and one Login Node to run the job
> > using
> >  > -M cluster option.
> >  > can anybody have such a setup and can share some insight into how
> it
> >  > works and it is really a stable solution.
> >  >
> >  >
> >  > Regards
> >  > Navin.
> >
> > --
> > Tina Friedrich, Advanced Research Computing Snr HPC Systems
> > Administrator
> >
> > Research Computing and Support Services
> > IT Services, University of Oxford
> > http://www.arc.ox.ac.uk <http://www.arc.ox.ac.uk>
> > http://www.it.ox.ac.uk <http://www.it.ox.ac.uk>
> >
>
> --
> Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator
>
> Research Computing and Support Services
> IT Services, University of Oxford
> http://www.arc.ox.ac.uk http://www.it.ox.ac.uk
>
>


[slurm-users] maridb version compatibility with Slurm version

2022-08-24 Thread navin srivastava
Hi,

I have a question about MariaDB vs. Slurm version compatibility.
Is there any compatibility matrix available?

We are running Slurm version 20.02 in our environment on SLES 15 SP3 with
MariaDB 10.5.x. We are upgrading the OS from SLES 15 SP3 to SP4, which brings
MariaDB 10.6.x, while we are not upgrading the Slurm version.

What is the best way to deal with this? We patch the servers quarterly and
keep the Slurm version unchanged (I have locked the Slurm packages at the OS
level), but the MariaDB version does get updated, and as far as I can see this
has had no impact. Is it a good idea to also keep the MariaDB version pinned
alongside the Slurm version?
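
(If pinning turns out to be necessary, I assume the same zypper lock approach
we already use for the Slurm packages would also work for MariaDB, e.g.
"zypper addlock 'mariadb*'", so that the quarterly patch run leaves it
untouched.)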

Regards
Navin.