Re: [slurm-users] Job limit in slurm.
Hi,

Thanks for your script. With it I can display the limit I set, but the limit is not being enforced:

    MaxJobs = 3, current value = 0

Regards,
Navin

On Mon, Feb 17, 2020 at 4:13 PM Ole Holm Nielsen wrote:
> On 2/17/20 11:16 AM, navin srivastava wrote:
> > I have an issue with the Slurm job limit. I applied the MaxJobs limit
> > on a user using
> >
> >     sacctmgr modify user navin1 set maxjobs=3
> >
> > but it is not being applied: I am still able to submit more jobs.
> > The Slurm version is 17.11.x.
> >
> > Let me know what setting is required to implement this.
>
> The tool "showuserlimits" shows all user limits in the Slurm database.
> You can download it from
> https://github.com/OleHolmNielsen/Slurm_tools/tree/master/showuserlimits
> and give it a try:
>
>     $ showuserlimits -u navin1
>
> /Ole
Re: [slurm-users] Job limit in slurm.
Hi Ole,

I am submitting on the order of 100 jobs, and I see all of them starting at the same time and going into the run state. If the MaxJobs limit were in effect, only 3 jobs should be running at any point in time.

Regards,
Navin

On Mon, Feb 17, 2020 at 4:48 PM Ole Holm Nielsen wrote:
> Hi Navin,
>
> Why do you think the limit is not working? MaxJobs limits the number
> of running jobs to 3, but you can still submit as many jobs as you like!
>
> See "man sacctmgr" for definitions of the limits MaxJobs as well as
> MaxSubmitJobs.
>
> /Ole
[slurm-users] Job limit in slurm.
Hi Team,

I have an issue with the Slurm job limit. I applied the MaxJobs limit on a user using

    sacctmgr modify user navin1 set maxjobs=3

but it is not being applied: I am still able to submit more jobs. The Slurm version is 17.11.x.

Let me know what setting is required to implement this.

Regards,
Navin
Re: [slurm-users] Job limit in slurm.
Hi Ole,

Thanks! After setting the enforcement option it worked. I am new to Slurm, so thanks for helping me.

Regards,
Navin

On Mon, Feb 17, 2020 at 5:36 PM Ole Holm Nielsen wrote:
> Hi Navin,
>
> I wonder if you have configured the Slurm database and the slurmdbd
> daemon? I think limit enforcement requires the use of the database.
>
> What is the output of:
>
>     $ scontrol show config | grep AccountingStorageEnforce
>
> See also https://slurm.schedmd.com/accounting.html#limit-enforcement
>
> Limit Enforcement
>
> Various limits and limit enforcement are described in the Resource
> Limits web page.
>
> To enable any limit enforcement you must at least have
> AccountingStorageEnforce=limits in your slurm.conf; otherwise, even if
> you have limits set, they will not be enforced. Other options for
> AccountingStorageEnforce and the explanation for each are found in the
> Resource Limits document.
>
> /Ole
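The resolution confirmed above can be summarised as a sketch. The option names come from the documentation quoted in this thread; the surrounding accounting lines are assumptions about a typical slurmdbd setup, not a copy of the actual cluster config:

```
# slurm.conf on the controller -- hypothetical minimal excerpt.
# Limits stored with sacctmgr are only enforced when "limits" is listed here:
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageEnforce=limits
```

After restarting slurmctld, the per-user limit can be set and verified with:

    sacctmgr modify user navin1 set maxjobs=3
    scontrol show config | grep AccountingStorageEnforce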
Re: [slurm-users] How to request the allocation of scratch space.
Thanks Erik.

Last night I made the changes. I defined this in slurm.conf on all the nodes as well as on the Slurm server:

    TmpFS=/lscratch

    NodeName=node[01-10] CPUs=44 RealMemory=257380 Sockets=2
    CoresPerSocket=22 ThreadsPerCore=1 TmpDisk=160 State=UNKNOWN
    Feature=P4000 Gres=gpu:2

These nodes have 1.6 TB of local scratch. I did an "scontrol reconfig" on all the nodes, but after some time we saw all nodes go into the drain state, so I reverted to the old configuration.

Jobs were running on all nodes and the local scratch was only 20-25% in use. We already have a cleanup script in crontab which cleans the scratch space regularly.

Is anything wrong here?

Regards,
Navin

On Thu, Apr 16, 2020 at 12:26 AM Ellestad, Erik wrote:
> The default value for TmpDisk is 0, so if you want local scratch
> available on a node, the amount of TmpDisk space must be defined in the
> node configuration in slurm.conf.
>
> Example:
>
>     NodeName=TestNode01 CPUs=8 Boards=1 SocketsPerBoard=2 CoresPerSocket=4
>     ThreadsPerCore=1 RealMemory=24099 TmpDisk=15
>
> The configuration value for the node definition is in MB.
>
> https://slurm.schedmd.com/slurm.conf.html
>
> TmpDisk: Total size of temporary disk storage in TmpFS in megabytes
> (e.g. "16384"). TmpFS (for "Temporary File System") identifies the
> location which jobs should use for temporary storage. Note this does not
> indicate the amount of free space available to the user on the node,
> only the total file system size. The system administration should ensure
> this file system is purged as needed so that user jobs have access to
> most of this space. The Prolog and/or Epilog programs (specified in the
> configuration file) might be used to ensure the file system is kept
> clean. The default value is 0.
>
> When requesting --tmp with srun or sbatch, it can be done in various
> size formats:
>
> --tmp=<size[units]>: Specify a minimum amount of temporary disk space
> per node. Default units are megabytes unless the SchedulerParameters
> configuration parameter includes the "default_gbytes" option for
> gigabytes. Different units can be specified using the suffix [K|M|G|T].
> https://slurm.schedmd.com/sbatch.html
>
> ---
> Erik Ellestad
> Wynton Cluster SysAdmin
> UCSF
Re: [slurm-users] How to request the allocation of scratch space.
Thank you Erik.

Is it mandatory to define the local scratch on all the compute nodes, or is defining it on the Slurm server enough? Also, should TmpDisk be defined in MB, or can it be defined in GB as well? And when requesting --tmp, can we use a value in GB?

Regards,
Navin

On Tue, Apr 14, 2020 at 11:04 PM Ellestad, Erik wrote:
> Have you defined the TmpDisk value for each node?
>
> As far as I know, local disk space is not a valid type for GRES.
>
> https://slurm.schedmd.com/gres.html
>
> "Generic resource (GRES) scheduling is supported through a flexible
> plugin mechanism. Support is currently provided for Graphics Processing
> Units (GPUs), CUDA Multi-Process Service (MPS), and Intel® Many
> Integrated Core (MIC) processors."
>
> The only valid solution I've found for scratch is to:
>
> In slurm.conf, define the location of local scratch globally via TmpFS,
> then define the amount per host via TmpDisk=xxx in the node definition,
> and finally request it for srun/sbatch via --tmp=X.
>
> ---
> Erik Ellestad
> Wynton Cluster SysAdmin
> UCSF
[slurm-users] How to request the allocation of scratch space.
Hi Team,

I want to define a mechanism for requesting local disk space when submitting a job.

We have a dedicated /scratch file system of 1.2 TB on each of the compute nodes, separate from / and the other file systems, for job execution. I defined TmpFS=/scratch in slurm.conf and then wanted to use "#SBATCH --scratch=10GB" in the request, but it seems this option is not accepted; only /tmp is considered.

I then tried the gres.conf mechanism:

    GresTypes=gpu,scratch

defined the scratch value for each node, and requested it using --gres=lscratch:10GB. But in this scenario, when requesting both GRES resources (gpu as well as scratch), only scratch shows up in my Gres resources, not gpu. Is the gpu still treated as a GRES resource?

Could anybody please advise which is the correct method to achieve this? Also, will scratch accounting reflect the actual usage on the node?

Regards,
Navin
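Based on Erik's replies later in this thread, the working approach is TmpFS/TmpDisk plus --tmp rather than a GRES. A sketch follows; the node line and sizes are illustrative assumptions, and TmpDisk is in MB per the slurm.conf documentation quoted below:

```
# slurm.conf -- sketch, values are assumptions:
TmpFS=/scratch                  # where jobs should write temporary data

# TmpDisk is the total size of TmpFS in MB; 1.2 TB is roughly 1200000 MB:
NodeName=node[01-10] CPUs=44 RealMemory=257380 TmpDisk=1200000
```

A job would then request scratch space with, e.g., "#SBATCH --tmp=10G", using the [K|M|G|T] unit suffixes described in the sbatch documentation.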
Re: [slurm-users] How to request the allocation of scratch space.
I attempted it again and it succeeded. Thanks for your help.

On Thu, Apr 16, 2020 at 9:45 PM Ellestad, Erik wrote:
> That all seems fine to me.
>
> I would check your Slurm logs to try and determine why Slurm put your
> nodes into the drain state.
>
> Erik
>
> ---
> Erik Ellestad
> Wynton Cluster SysAdmin
> UCSF
[slurm-users] log rotation for slurmctld.
Hi,

I want to understand how log rotation for slurmctld works. In my environment there is no log rotation for slurmctld.log, and the log file has now reached 125 GB.

Can I move the log file to another location and then restart/reload the slurm service so that it starts a new log file? I think this should work without any issues. Am I right, or will it cause a problem?

I also need to set up logrotate. Will the config below work as-is? I need to do this in a production environment, so I am asking to make sure it will work without any issues.

    /var/log/slurm/slurmctld.log {
        weekly
        missingok
        notifempty
        sharedscripts
        create 0600 slurm slurm
        rotate 8
        compress
        postrotate
            /bin/systemctl reload slurmctld.service > /dev/null 2>/dev/null || true
        endscript
    }

Regards,
Navin
Re: [slurm-users] Resources are free but Job is not getting scheduled.
I missed adding the scheduling parameters:

    SchedulerType=sched/builtin
    #SchedulerParameters=enable_user_top
    SelectType=select/cons_res
    #SelectTypeParameters=CR_Core_Memory
    SelectTypeParameters=CR_Core

    # JOB PRIORITY
    PriorityType=priority/multifactor
    PriorityDecayHalfLife=2
    PriorityUsageResetPeriod=DAILY
    PriorityWeightFairshare=50
    PriorityFlags=FAIR_TREE

Could you please also advise: if the scheduling policy is fairshare, will the partition priority still be considered?

Regards,
Navin

On Sat, Apr 4, 2020 at 8:34 PM navin srivastava wrote:
> Hi Team,
>
> I am facing an issue in my environment. Our Slurm version is 17.11.x.
>
> I have 2 partitions:
>
>     Queue A with node1 and node2, Priority=1000, Shared=YES
>     Queue B with node1 and node2, Priority=100, Shared=YES
>
> The problem is that when jobs from partition A are running, jobs from
> partition B do not start, even though CPUs are available on node1 and
> node2. Only jobs from partition A are started, not from B. With
> OverSubscribe=NO the documentation says jobs from both partitions
> should be able to run.
>
> Any suggestions?
>
> Regards,
> Navin
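One detail relevant to the question above: with priority/multifactor, each factor contributes to job priority only if its weight is non-zero. The config above sets only the fairshare weight, so a partition's Priority would not feed into job priority unless a partition weight is also set. A hedged sketch, with weights that are illustrative assumptions rather than recommendations:

```
# slurm.conf -- sketch: give the partition factor a non-zero weight so that
# a partition's Priority contributes to the multifactor job priority.
PriorityType=priority/multifactor
PriorityWeightFairshare=50000
PriorityWeightPartition=10000   # hypothetical value for illustration
```

Note that very small weights (like the 50 above) leave little room to differentiate jobs; the slurm.conf man page suggests weights in the thousands or higher.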
[slurm-users] Resources are free but Job is not getting scheduled.
Hi Team,

I am facing an issue in my environment. Our Slurm version is 17.11.x.

I have 2 partitions:

    Queue A with node1 and node2, Priority=1000, Shared=YES
    Queue B with node1 and node2, Priority=100, Shared=YES

The problem is that when jobs from partition A are running, jobs from partition B do not start, even though CPUs are available on node1 and node2. Only jobs from partition A are started, not from B. With OverSubscribe=NO the documentation says jobs from both partitions should be able to run, so it should allow this.

Any suggestions?

Regards,
Navin
[slurm-users] not allocating the node for job execution even when resources are available.
Hi,

I have an issue with resource allocation.

The environment has partitions like below:

    PartitionName=small_jobs Nodes=Node[17,20] Default=NO MaxTime=INFINITE
        State=UP Shared=YES Priority=8000
    PartitionName=large_jobs Nodes=Node[17,20] Default=NO MaxTime=INFINITE
        State=UP Shared=YES Priority=100

The nodes have only a few CPUs allocated and plenty of CPU resources available:

    NodeName=Node17 Arch=x86_64 CoresPerSocket=18
       CPUAlloc=4 CPUErr=0 CPUTot=36 CPULoad=4.09
       AvailableFeatures=K2200
       ActiveFeatures=K2200
       Gres=gpu:2
       NodeAddr=Node17 NodeHostName=Node17 Version=17.11
       OS=Linux 4.12.14-94.41-default #1 SMP Wed Oct 31 12:25:04 UTC 2018 (3090901)
       RealMemory=1 AllocMem=0 FreeMem=225552 Sockets=2 Boards=1
       State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
       Partitions=small_jobs,large_jobs
       BootTime=2020-03-21T18:56:48 SlurmdStartTime=2020-03-31T09:07:03
       CfgTRES=cpu=36,mem=1M,billing=36
       AllocTRES=cpu=4
       CapWatts=n/a
       CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
       ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

There are no jobs in the small_jobs partition, but several jobs are pending in large_jobs; the resources are available, yet the jobs are not starting.

The output for one of the pending jobs is:

    scontrol show job 1250258
    JobId=1250258 JobName=import_workflow
       UserId=m209767(100468) GroupId=oled(4289) MCS_label=N/A
       Priority=363157 Nice=0 Account=oledgrp QOS=normal
       JobState=PENDING Reason=Priority Dependency=(null)
       Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
       RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
       SubmitTime=2020-03-28T22:00:13 EligibleTime=2020-03-28T22:00:13
       StartTime=2070-03-19T11:59:09 EndTime=Unknown Deadline=N/A
       PreemptTime=None SuspendTime=None SecsPreSuspend=0
       LastSchedEval=2020-03-31T12:58:48
       Partition=large_jobs AllocNode:Sid=deda1x1466:62260
       ReqNodeList=(null) ExcNodeList=(null)
       NodeList=(null)
       NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
       TRES=cpu=1,node=1
       Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
       MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
       Features=(null) DelayBoot=00:00:00
       Gres=(null) Reservation=(null)
       OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)

This is the scheduling section of my slurm.conf:

    SchedulerType=sched/builtin
    #SchedulerParameters=enable_user_top
    SelectType=select/cons_res
    #SelectTypeParameters=CR_Core_Memory
    SelectTypeParameters=CR_Core

Any idea why the job is not starting even though CPU cores are available?

I would also like to know: if jobs are running on a particular node and I restart the slurmd service, in which scenarios would my jobs get killed? Generally it should not kill the jobs.

Regards,
Navin
Re: [slurm-users] not allocating the node for job execution even when resources are available.
In addition to the above problem: even though OverSubscribe=NO should allow this according to the documentation, jobs from the other partition are not accepted even when resources are available. Making the priority the same for both partitions didn't help either. Any suggestions?

From "Slurm Workload Manager - Sharing Consumable Resources":

    Two OverSubscribe=NO partitions assigned the same set of nodes:
    Jobs from either partition will be assigned to all available consumable
    resources. No consumable resource will be shared. One node could have
    2 jobs running on it, and each job could be from a different partition.

On Tue, Mar 31, 2020 at 4:34 PM navin srivastava wrote:
> Hi,
>
> I have an issue with resource allocation.
>
> In the environment the partitions are:
>
>     PartitionName=small_jobs Nodes=Node[17,20] Default=NO MaxTime=INFINITE
>         State=UP Shared=YES Priority=8000
>     PartitionName=large_jobs Nodes=Node[17,20] Default=NO MaxTime=INFINITE
>         State=UP Shared=YES Priority=100
>
> There are no jobs in the small_jobs partition, but several jobs are
> pending in large_jobs; the resources are available, yet the jobs are
> not starting.
>
> Regards,
> Navin
Re: [slurm-users] not allocating jobs even when resources are free
Thanks Brian.

As suggested, I went through the documentation. What I understood is that FAIR_TREE enables the fairshare mechanism, and job priority is decided based on it. So job scheduling will be FIFO, but priority is decided by fairshare; I am not sure whether the two conflict here. The normal jobs have a lower priority than the GPUsmall jobs, so when resources are available in the GPUsmall partition those jobs should start. No job is pending because of GPU resources; the jobs do not even request GPUs.

Is there an article where I can see how fairshare works and which settings would conflict with it? The documentation never says that FIFO should be disabled when fairshare is applied.

Regards,
Navin

On Sat, Apr 25, 2020 at 12:47 AM Brian W. Johanson wrote:
> If you haven't looked at the man page for slurm.conf, it will answer
> most if not all your questions: https://slurm.schedmd.com/slurm.conf.html
> -- but I would depend on the manual version that was distributed with
> the version you have installed, as options do change.
>
> There is a ton of information that is tedious to get through, but
> reading through it multiple times opens many doors.
>
> DefaultTime is listed in there as a partition option.
> If you are scheduling gres/gpu resources, it's quite possible there are
> cores available with no corresponding GPUs available.
>
> -b
>
> On 4/24/20 2:49 PM, navin srivastava wrote:
> > Thanks Brian.
> >
> > I need to check the job order.
> >
> > Is there any way to define a default time limit for jobs whose users
> > do not specify one? Also, what does "fairtree" mean in the priorities
> > in slurm.conf?
> >
> > The sets of nodes are different in the partitions; FIFO does not care
> > about partitions. Is it strict ordering, meaning the job that came
> > first will go, and until it runs no others are allowed?
> >
> > Also, the priority is high for the GPUsmall partition and low for the
> > normal jobs, and the nodes of the normal partition are full while
> > GPUsmall cores are available.
> >
> > Regards,
> > Navin
> >
> > On Fri, Apr 24, 2020, 23:49 Brian W. Johanson wrote:
> >> Without seeing the jobs in your queue, I would expect the next job in
> >> FIFO order to be too large to fit in the current idle resources.
> >>
> >> Configure it to use the backfill scheduler:
> >> SchedulerType=sched/backfill
> >>
> >> SchedulerType
> >>     Identifies the type of scheduler to be used. Note the slurmctld
> >>     daemon must be restarted for a change in scheduler type to become
> >>     effective (reconfiguring a running daemon has no effect for this
> >>     parameter). The scontrol command can be used to manually change
> >>     job priorities if desired. Acceptable values include:
> >>
> >>     sched/backfill
> >>         For a backfill scheduling module to augment the default FIFO
> >>         scheduling. Backfill scheduling will initiate lower-priority
> >>         jobs if doing so does not delay the expected initiation time
> >>         of any higher priority job. Effectiveness of backfill
> >>         scheduling is dependent upon users specifying job time
> >>         limits, otherwise all jobs will have the same time limit and
> >>         backfilling is impossible. Note documentation for the
> >>         SchedulerParameters option above. This is the default
> >>         configuration.
> >>
> >>     sched/builtin
> >>         This is the FIFO scheduler which initiates jobs in priority
> >>         order. If any job in the partition can not be scheduled, no
> >>         lower priority job in that partition will be scheduled. An
> >>         exception is made for jobs that can not run due to partition
> >>         constraints (e.g. the time limit) or down/drained nodes. In
> >>         that case, lower priority jobs can be initiated and not
> >>         impact the higher priority job.
> >>
> >> Your partitions are set with MaxTime=INFINITE; if your users are not
> >> specifying a reasonable time limit for their jobs, this won't help
> >> either.
> >>
> >> -b
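Brian's suggestion above amounts to a small slurm.conf change. A sketch follows; the partition line, node list, and time values are purely illustrative, and backfill only helps when jobs end up with finite time limits:

```
# slurm.conf -- sketch of the suggested change (restart slurmctld afterwards):
SchedulerType=sched/backfill      # instead of sched/builtin (pure FIFO)

# Backfill needs real time limits; DefaultTime covers jobs that specify none.
# The node list and times below are assumptions for illustration:
PartitionName=GPUsmall Nodes=node[18,19] DefaultTime=04:00:00 MaxTime=7-00:00:00
```

With backfill, a low-priority job that fits in currently idle cores can start as long as doing so does not delay the expected start of any higher-priority job.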
Re: [slurm-users] not allocating jobs even when resources are free
In addition to the above, when I look at sprio for jobs in both partitions:

For the normal queue, all jobs show the same priority:

    JOBID   PARTITION  PRIORITY  FAIRSHARE
    1291352 normal     15789     15789

For GPUsmall, all jobs also show the same priority:

    JOBID   PARTITION  PRIORITY  FAIRSHARE
    1291339 GPUsmall   21052     21053

On Fri, Apr 24, 2020 at 11:14 PM navin srivastava wrote:
> Hi Team,
>
> We are facing an issue in our environment. Resources are free, but jobs
> go into the queued (PD) state and do not run.
>
> I have attached the slurm.conf file here.
>
> Scenario:
>
> There are jobs in only 2 partitions:
> - 344 jobs are in the PD state in the normal partition; the nodes
>   belonging to the normal partition are full and no more jobs can run.
> - 1300 jobs in the GPUsmall partition are queued, and enough CPUs are
>   available to execute them, but the jobs are not being scheduled on
>   the free nodes.
>
> There are no pending jobs in any other partition.
>
> Node status, e.g. node18:
>
>     NodeName=node18 Arch=x86_64 CoresPerSocket=18
>        CPUAlloc=6 CPUErr=0 CPUTot=36 CPULoad=4.07
>        Gres=gpu:2
>        Partitions=GPUsmall,pm_shared
>        CfgTRES=cpu=36,mem=1M,billing=36
>        AllocTRES=cpu=6
>
> node19:
>
>     NodeName=node19 Arch=x86_64 CoresPerSocket=18
>        CPUAlloc=16 CPUErr=0 CPUTot=36 CPULoad=15.43
>        Gres=gpu:2
>        Partitions=GPUsmall,pm_shared
>        CfgTRES=cpu=36,mem=1M,billing=36
>        AllocTRES=cpu=16
>
> Could you please help me understand what could be the reason?
>
> Regards,
> Navin
[slurm-users] not allocating jobs even resources are free
Hi Team,

we are facing an issue in our environment. The resources are free, but jobs are going into the queued (PD) state and not running. I have attached the slurm.conf file here.

Scenario: there are jobs in only 2 partitions:
- 344 jobs are in PD state in the normal partition; the nodes belonging to the normal partition are full and no more jobs can run there.
- 1300 jobs in the GPUsmall partition are queued, and enough CPUs are available to execute them, but the jobs are not being scheduled on the free nodes.

There are no pending jobs in any other partition.

Example node status, node18:

NodeName=node18 Arch=x86_64 CoresPerSocket=18
   CPUAlloc=6 CPUErr=0 CPUTot=36 CPULoad=4.07
   AvailableFeatures=K2200
   ActiveFeatures=K2200
   Gres=gpu:2
   NodeAddr=node18 NodeHostName=node18 Version=17.11
   OS=Linux 4.4.140-94.42-default #1 SMP Tue Jul 17 07:44:50 UTC 2018 (0b375e4)
   RealMemory=1 AllocMem=0 FreeMem=79532 Sockets=2 Boards=1
   State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=GPUsmall,pm_shared
   BootTime=2019-12-10T14:16:37 SlurmdStartTime=2019-12-10T14:24:08
   CfgTRES=cpu=36,mem=1M,billing=36
   AllocTRES=cpu=6
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

node19:

NodeName=node19 Arch=x86_64 CoresPerSocket=18
   CPUAlloc=16 CPUErr=0 CPUTot=36 CPULoad=15.43
   AvailableFeatures=K2200
   ActiveFeatures=K2200
   Gres=gpu:2
   NodeAddr=node19 NodeHostName=node19 Version=17.11
   OS=Linux 4.12.14-94.41-default #1 SMP Wed Oct 31 12:25:04 UTC 2018 (3090901)
   RealMemory=1 AllocMem=0 FreeMem=63998 Sockets=2 Boards=1
   State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=GPUsmall,pm_shared
   BootTime=2020-03-12T06:51:54 SlurmdStartTime=2020-03-12T06:53:14
   CfgTRES=cpu=36,mem=1M,billing=36
   AllocTRES=cpu=16
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

Could you please help me to understand what could be the reason?
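From the two records above, the free-CPU count on each node is simply CPUTot minus CPUAlloc. A small throwaway parser (my own sketch, not a Slurm tool) can pull that out of `scontrol show node` style output:

```python
# Sketch: compute free CPUs per node from "scontrol show node" style output.
# Field names (CPUAlloc, CPUTot) match the records above; the parsing is a
# plain-text approximation, not an official Slurm API.
import re

def free_cpus(scontrol_text):
    """Return {node: idle CPU count} parsed from scontrol show node output."""
    result = {}
    for record in scontrol_text.split("NodeName=")[1:]:
        name = record.split()[0]
        alloc = int(re.search(r"CPUAlloc=(\d+)", record).group(1))
        total = int(re.search(r"CPUTot=(\d+)", record).group(1))
        result[name] = total - alloc
    return result

sample = """NodeName=node18 CPUAlloc=6 CPUErr=0 CPUTot=36
NodeName=node19 CPUAlloc=16 CPUErr=0 CPUTot=36"""
print(free_cpus(sample))  # {'node18': 30, 'node19': 20}
```

So node18 has 30 idle cores and node19 has 20, which is why the pending GPUsmall jobs look schedulable at first glance.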
cat /etc/slurm/slurm.conf

# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
#Running_config_start
#ControlMachine=node0
ControlMachine=slurmmaster
ControlAddr=192.168.150.21
AuthType=auth/munge
CryptoType=crypto/munge
CacheGroups=1
ReturnToService=0
ProctrackType=proctrack/linuxproc
SlurmctldPort=6817
SlurmdPort=6818
SchedulerPort=7321
SlurmctldPidFile=/var/slurm/slurmctld.pid
SlurmdPidFile=/var/slurm/slurmd.pid
SlurmdSpoolDir=/var/slurm/spool/slurmd.%n.spool
StateSaveLocation=/var/slurm/state
SlurmctldLogFile=/var/slurm/log/slurmctld.log
SlurmdLogFile=/var/slurm/log/slurmd.%n.log.%h
SlurmUser=hpcadmin
MpiDefault=none
SwitchType=switch/none
TaskPlugin=task/affinity
TaskPluginParam=Sched
SlurmctldTimeout=120
SlurmdTimeout=300
InactiveLimit=0
KillWait=30
MinJobAge=3600
FastSchedule=1
SchedulerType=sched/builtin
#SchedulerParameters=enable_user_top
SelectType=select/cons_res
#SelectTypeParameters=CR_Core_Memory
SelectTypeParameters=CR_Core
AccountingStorageEnforce=associations
AccountingStorageHost=155.250.126.30
AccountingStorageType=accounting_storage/slurmdbd
#AccountingStoreJobComment=YES
ClusterName=merckhpc
JobCompType=jobcomp/slurmdbd
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmctldDebug=5
SlurmdDebug=5
Waittime=0
#Running_config_end
#ControlAddr=
#BackupController=
#BackupAddr=
#
#CheckpointType=checkpoint/none
#DisableRootJobs=NO
#EnforcePartLimits=NO
Epilog=/etc/slurm/slurm.epilog.clean
#EpilogSlurmctld=
#FirstJobId=1
#MaxJobId=99
GresTypes=gpu
#GroupUpdateForce=0
#GroupUpdateTime=600
#JobCheckpointDir=/var/slurm/checkpoint
#JobCredentialPrivateKey=
#JobCredentialPublicCertificate=
#JobFileAppend=0
#JobRequeue=1
#JobSubmitPlugins=1
#KillOnBadExit=0
#Licenses=foo*4,bar
#MailProg=/bin/mail
#MaxJobCount=5000
MaxJobCount=500
#MaxStepCount=4
#MaxTasksPerNode=128
#MpiParams=ports=#-#
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
#Prolog=
#PrologSlurmctld=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
#SallocDefaultCommand=
#SrunEpilog=
#SrunProlog=
#TaskEpilog=
#TaskProlog=
#TopologyPlugin=topology/tree
#TmpFs=/tmp
#TrackWCKey=no
#TreeWidth=
#UnkillableStepProgram=
#UsePAM=0
#
#
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
MessageTimeout=100
#ResvOverRun=0
#OverTimeLimit=0
#UnkillableStepTimeout=60
#VSizeFactor=0
SchedulerParameters=enable_user_top,default_queue_depth=100
#
#
# SCHEDULING
#DefMemPerCPU=0
#MaxMemPerCPU=0
#SchedulerRootFilter=1
#SchedulerTimeSlice=30
#
#
# JOB PRIORITY
PriorityType=priority/multifactor
#PriortyFlags=Ticket_Based
#PriorityDecayHalfLife=1-0
PriorityDecayHalfLife=2
#PriorityCalcPeriod=
#PriorityFavorSmall=YES
#PriorityMaxAge=7-0
PriorityUsageResetPeriod=DAILY
Re: [slurm-users] not allocating jobs even resources are free
Thanks, Brian.

I need to check the job order.

Is there any way to define a default time limit for jobs when the user does not specify one?

Also, what is the meaning of fair tree among the priority settings in slurm.conf?

The sets of nodes in the partitions are different, and FIFO does not care about partitioning. Is it strict ordering, i.e. the job that came first goes first, and until it runs no other job is allowed?

Also, priority is high for the GPUsmall partition and low for normal jobs, and the nodes of the normal partition are full while GPUsmall cores are available.

Regards
Navin

On Fri, Apr 24, 2020, 23:49 Brian W. Johanson wrote:
> Without seeing the jobs in your queue, I would expect the next job in FIFO
> order to be too large to fit in the current idle resources.
>
> Configure it to use the backfill scheduler: SchedulerType=sched/backfill
>
> SchedulerType
>    Identifies the type of scheduler to be used. Note the slurmctld daemon
>    must be restarted for a change in scheduler type to become effective
>    (reconfiguring a running daemon has no effect for this parameter). The
>    scontrol command can be used to manually change job priorities if
>    desired. Acceptable values include:
>
>    sched/backfill
>       For a backfill scheduling module to augment the default FIFO
>       scheduling. Backfill scheduling will initiate lower-priority jobs
>       if doing so does not delay the expected initiation time of any
>       higher priority job. Effectiveness of backfill scheduling is
>       dependent upon users specifying job time limits, otherwise all jobs
>       will have the same time limit and backfilling is impossible. Note
>       documentation for the SchedulerParameters option above. This is the
>       default configuration.
>
>    sched/builtin
>       This is the FIFO scheduler which initiates jobs in priority order.
>       If any job in the partition can not be scheduled, no lower priority
>       job in that partition will be scheduled. An exception is made for
>       jobs that can not run due to partition constraints (e.g. the time
>       limit) or down/drained nodes. In that case, lower priority jobs can
>       be initiated and not impact the higher priority job.
>
> Your partitions are set with maxtime=INFINITE; if your users are not
> specifying a reasonable time limit on their jobs, this won't help either.
>
> -b
>
> On 4/24/20 1:52 PM, navin srivastava wrote:
> > In addition to the above, when i see the sprio of the jobs, all jobs
> > show the same priority. [...]
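Brian's suggestion amounts to a one-line change in slurm.conf, followed by a slurmctld restart. A minimal sketch (the bf_* tuning values below are illustrative starting points I chose, not tuned recommendations):

```
# Replace the builtin FIFO scheduler with backfill:
SchedulerType=sched/backfill

# Optional tuning; see SchedulerParameters in "man slurm.conf".
# enable_user_top and default_queue_depth are kept from the existing config.
SchedulerParameters=enable_user_top,default_queue_depth=100,bf_continue,bf_max_job_test=500
```

Backfill only helps if jobs carry meaningful time limits, which is why Brian also flags the maxtime=INFINITE partitions.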
Re: [slurm-users] not allocating jobs even resources are free
Thanks, Daniel, for the detailed description.

Regards
Navin

On Sun, May 3, 2020, 13:35 Daniel Letai wrote:
>
> On 29/04/2020 12:00:13, navin srivastava wrote:
>
> Thanks Daniel.
>
> All jobs went into the run state, so I am unable to provide the details,
> but I will definitely reach out later if we see a similar issue.
>
> i am more interested to understand FIFO with Fair Tree. it will be good
> if anybody can provide some insight on this combination, and also how the
> behaviour will change if we enable backfilling here.
>
> what is the role of the Fair tree here?
>
> Fair tree is the algorithm used to calculate the interim priority, before
> applying weight, but I think after the half-life decay.
>
> To make it simple - FIFO without fairshare would assign priority based
> only on submission time. With fairshare, that naive priority is adjusted
> based on prior usage by the applicable entities (users/departments -
> accounts).
>
> Backfill will let you utilize your resources better, since it will allow
> "inserting" low priority jobs before higher priority jobs, provided all
> jobs have defined wall times, and any inserted job doesn't affect in any
> way the start time of a higher priority job, thus allowing utilization of
> "holes" when the scheduler waits for resources to free up in order to
> insert some large job.
>
> Suppose the system is at 60% utilization of cores, and the next FIFO job
> requires 42% - it will wait until 2% are free so it can begin, meanwhile
> not allowing any job to start, even one that would take only 30% of the
> resources (which are currently free) and would finish before the 2% are
> free anyway.
>
> Backfill would allow such a job to start, as long as its wall time
> ensures it would finish before the 42% job would have started.
>
> Fair tree in either case (FIFO or backfill) calculates the priority for
> each job the same way - if the account has used more resources recently
> (the half-life decay factor) it gets a lower priority, even if its job
> was submitted earlier than a job from an account that didn't use any
> resources recently.
>
> As can be expected, backfill has to loop over all jobs in the queue, in
> order to see if any job can fit out of order. In very busy/active
> systems, that can lead to poor response times unless tuned correctly in
> slurm.conf - look at SchedulerParameters, all params starting with bf_
> and in particular bf_max_job_test=, bf_max_time= and bf_continue (but
> bf_window= can also have some impact if set too high).
>
> See the man page at
> https://slurm.schedmd.com/slurm.conf.html#OPT_SchedulerParameters
>
> PriorityType=priority/multifactor
> PriorityDecayHalfLife=2
> PriorityUsageResetPeriod=DAILY
> PriorityWeightFairshare=50
> PriorityFlags=FAIR_TREE
>
> Regards
> Navin.
>
> On Mon, Apr 27, 2020 at 9:37 PM Daniel Letai wrote:
> > Are you sure there are enough resources available? [...]
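Daniel's half-life decay can be made concrete with a toy calculation (my own sketch; Slurm applies the decay internally on its own schedule, this just illustrates the arithmetic):

```python
# Illustrative sketch of half-life decay of recorded usage, as described
# above. With PriorityDecayHalfLife=2 (days), usage recorded t days ago
# contributes usage * 0.5**(t / 2) to the fair-share calculation.
def decayed_usage(usage, days_ago, halflife_days=2):
    """Return the decayed weight of past usage under a given half-life."""
    return usage * 0.5 ** (days_ago / halflife_days)

print(decayed_usage(1000.0, 0))  # 1000.0 (just recorded: full weight)
print(decayed_usage(1000.0, 2))  # 500.0  (one half-life later)
print(decayed_usage(1000.0, 4))  # 250.0  (two half-lives later)
```

So an account that burned a lot of CPU two days ago already counts at half weight against its current jobs, which is how a later submission from a lightly-used account can outrank an earlier one.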
Re: [slurm-users] not allocating jobs even resources are free
Thanks Daniel.

All jobs went into the run state, so I am unable to provide the details, but I will definitely reach out later if we see a similar issue.

I am more interested to understand FIFO with Fair Tree. It would be good if anybody could provide some insight into this combination, and also how the behaviour will change if we enable backfilling here.

What is the role of the Fair tree here?

PriorityType=priority/multifactor
PriorityDecayHalfLife=2
PriorityUsageResetPeriod=DAILY
PriorityWeightFairshare=50
PriorityFlags=FAIR_TREE

Regards
Navin.

On Mon, Apr 27, 2020 at 9:37 PM Daniel Letai wrote:
> Are you sure there are enough resources available? The node is in mixed
> state, so it's configured for both partitions - it's possible that
> earlier lower priority jobs are already running, thus blocking the later
> jobs, especially since it's FIFO.
>
> It would really help if you pasted the results of:
>
> squeue
> sinfo
>
> As well as the exact sbatch line, so we can see how many resources per
> node are requested.
>
> On 26/04/2020 12:00:06, navin srivastava wrote:
> > Thanks Brian,
> >
> > As suggested, I went through the document, and what I understood is
> > that fair tree leads to the fairshare mechanism, and jobs should be
> > scheduled based on that.
> >
> > So it means job scheduling will be FIFO, but priority will be decided
> > by fairshare. I am not sure if the two conflict here. The normal jobs'
> > priority is lower than the GPUsmall priority, so if resources are
> > available in the gpusmall partition, those jobs should go. There are
> > no jobs pending on GPU resources; the jobs do not request GPUs at all.
> >
> > Is there any article where I can see how fairshare works and which
> > settings conflict with it? According to the documentation, it never
> > says that FIFO should be disabled when fair-share is applied.
> >
> > Regards
> > Navin.
> >
> > On Sat, Apr 25, 2020 at 12:47 AM Brian W. Johanson wrote:
> > > If you haven't looked at the man page for slurm.conf, it will answer
> > > most if not all your questions.
> > > https://slurm.schedmd.com/slurm.conf.html - but I would depend on
> > > the manual version that was distributed with the version you have
> > > installed, as options do change.
> > >
> > > There is a ton of information that is tedious to get through, but
> > > reading through it multiple times opens many doors.
> > >
> > > DefaultTime is listed in there as a Partition option.
> > > If you are scheduling gres/gpu resources, it's quite possible there
> > > are cores available with no corresponding gpus available.
> > >
> > > -b
> > >
> > > On 4/24/20 2:49 PM, navin srivastava wrote:
> > > > Thanks Brian. [...]
Re: [slurm-users] How to request the allocation of scratch.
Any suggestions on the above query? I need help understanding it.

If TmpFS=/scratch is set and the job requests #SBATCH --tmp=500GB, will it reserve the 500GB from scratch? Let me know if my assumption is correct.

Regards
Navin.

On Mon, Apr 13, 2020 at 11:10 AM navin srivastava wrote:
> Hi Team,
>
> i wanted to define a mechanism to request local disk space while
> submitting a job.
>
> we have a dedicated /scratch file system of 1.2 TB on each of the compute
> nodes, apart from / and the other file systems, for the execution of
> jobs. i defined TmpFS=/scratch in slurm.conf and then wanted to use
> #SBATCH --scratch=10GB in the request, but it seems this option is not
> accepted; only /tmp works.
>
> Then i opted for the gres.conf mechanism:
>
> GresTypes=gpu,scratch
>
> and defined the scratch value on each node, then requested it using
> --gres=lscratch:10GB
> but in this scenario, when requesting both gres resources (gpu as well
> as scratch), it shows me only scratch among my Gres resources, not gpu.
> does it still use the gpu as a gres resource?
>
> could anybody please advise which is the correct method to achieve this?
> Also, will scratch be able to calculate the actual usage value on the
> node?
>
> Regards
> Navin.
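For reference, my understanding of the --tmp route (please verify against the sbatch and slurm.conf man pages for your version): --tmp only filters for nodes whose configured TmpDisk is at least the requested size; as far as I know it does not reserve the space or enforce actual usage. A sketch, where the node line is illustrative:

```
# slurm.conf: report /scratch (not the default /tmp) as the temporary
# file system, and declare its size per node (TmpDisk is in MB):
TmpFS=/scratch
NodeName=node18 ... TmpDisk=1200000   # ~1.2 TB, illustrative node line

# job script: only schedule on nodes with at least 500 GB of scratch
#SBATCH --tmp=500G
```

Enforcing actual consumption would need something extra (e.g. a prolog/epilog or cgroup-based setup), since TmpDisk is a static node attribute.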
Re: [slurm-users] how to restrict jobs
Thanks Michael,

yes, i have gone through it, but the licenses are remote licenses, and they will be used outside Slurm as well, not only inside.
so basically i am interested to know how we can update the database dynamically to get the exact value at that point in time.
i mean, query the license server and update the database accordingly. does slurm automatically update the value based on usage?

Regards
Navin.

On Tue, May 5, 2020 at 7:00 PM Renfro, Michael wrote:
> Have you seen https://slurm.schedmd.com/licenses.html already? If the
> software is just for use inside the cluster, one Licenses= line in
> slurm.conf plus users submitting with the -L flag should suffice. You
> should be able to set that license value to 4 if it's licensed per node
> and you can run up to 4 jobs simultaneously, or 4*NCPUS if it's licensed
> per CPU, or 1 if it's a single license good for one run on 1-4 nodes.
>
> There are also options to query a FlexLM or RLM server for license
> management.
>
> --
> Mike Renfro, PhD / HPC Systems Administrator, Information Technology
> Services
> 931 372-3601 / Tennessee Tech University
>
> > On May 5, 2020, at 7:54 AM, navin srivastava wrote:
> > Hi Team,
> >
> > we have an application whose licenses are limited. [...]
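For the simple in-cluster case Michael describes first, the setup would look something like this (the license name appX is a placeholder):

```
# slurm.conf: a cluster-local pool of 4 license tokens named "appX"
Licenses=appX:4
```

Jobs then request tokens at submission, e.g. `sbatch -L appX:1 job.sh`; a fifth concurrent job would wait in the queue until a token frees up. As noted above, this does not track consumption by clients outside Slurm - that needs the remote (FlexLM/RLM) variant configured through sacctmgr.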
Re: [slurm-users] how to restrict jobs
Thanks, Michael.

Actually, one application's licenses are based on nodes, and we have a 4-node license (not fixed to specific nodes). We have several nodes, but when a job lands on any 4 random nodes, it runs on those nodes only. After that, it fails if it goes to other nodes.

Can we define a custom variable, set it at the node level, and have the user pass that variable at submission so the job will land onto those specific nodes? I do not want to create a separate partition.

Is there any way to achieve this by any other method?

Regards
Navin.

On Tue, May 5, 2020 at 7:46 PM Renfro, Michael wrote:
> Haven't done it yet myself, but it's on my todo list.
>
> But I'd assume that if you use the FlexLM or RLM parts of that
> documentation, Slurm would query the remote license server periodically
> and hold the job until the necessary licenses were available.
>
> > On May 5, 2020, at 8:37 AM, navin srivastava wrote:
> > Thanks Michael,
> >
> > yes i have gone through it, but the licenses are remote licenses and
> > will be used outside Slurm as well, not only in Slurm. [...]
[slurm-users] how to restrict jobs
Hi Team,

we have an application whose licenses are limited. It scales up to 4 nodes (~80 cores), so when 4 nodes are full, a job on a 5th node fails. We want to put a restriction in place so that the application cannot execute beyond 4 nodes; instead of failing, the job should stay in the queue.

i do not want to keep a separate partition to achieve this. is there a way to achieve this scenario using some dynamic resource that can check the license value on the fly and, if the limit is reached, keep the job in the queue?

Regards
Navin.
Re: [slurm-users] how to restrict jobs
To explain with more details:

Jobs are submitted based on cores and can go to any random nodes, but they are limited to 4 nodes only. (The license has some intelligence: it counts the nodes, and once it reaches 4 it will not allow any more nodes.) It does not depend on the number of cores available on the nodes.

Case 1: 4 jobs are running with 4 cores each on 4 nodes [node1, node2, node3 and node4]. A fifth job assigned by Slurm with 4 cores on any one of node1, node2, node3 or node4 will be allowed by the license.

Case 2: 4 jobs are running with 4 cores each on 4 nodes [node1, node2, node3 and node4]. A fifth job assigned by Slurm on node5 with 4 cores will not be allowed [a "license not found" error comes up in this case].

Regards
Navin.

On Wed, May 6, 2020 at 7:47 PM Renfro, Michael wrote:
> To make sure I'm reading this correctly, you have a software license that
> lets you run jobs on up to 4 nodes at once, regardless of how many CPUs
> you use? That is, you could run any one of the following sets of jobs:
>
> - four 1-node jobs,
> - two 2-node jobs,
> - one 1-node and one 3-node job,
> - two 1-node and one 2-node jobs,
> - one 4-node job,
>
> simultaneously? And the license isn't node-locked to specific nodes by
> MAC address or anything similar? But if you try to run jobs beyond what
> I've listed above, you run out of licenses, and you want those later jobs
> to be held until licenses are freed up?
>
> If all of those questions have an answer of 'yes', I think you want the
> remote license part of https://slurm.schedmd.com/licenses.html,
> something like:
>
> sacctmgr add resource name=software_name count=4 percentallowed=100
> server=flex_host servertype=flexlm type=license
>
> and submit jobs with a '-L software_name:N' flag, where N is the number
> of nodes you want to run on.
>
> > On May 6, 2020, at 5:33 AM, navin srivastava wrote:
> > Thanks, Michael.
> >
> > Actually, one application's licenses are based on nodes, and we have a
> > 4-node license (not fixed to specific nodes). [...]
Re: [slurm-users] how to restrict jobs
Is there no way to set or define a custom variable like at node level and then you pass the same variable in the job request so that it will land into those nodes only. Regards Navin On Wed, May 6, 2020, 21:04 Renfro, Michael wrote: > Ok, then regular license accounting won’t work. > > Somewhat tested, but should work or at least be a starting point. Given a > job number JOBID that’s already running with this license on one or more > nodes: > > sbatch -w $(scontrol show job JOBID | grep ' NodeList=' | cut -d= -f2) > -N 1 > > should start a one-node job on an available node being used by JOBID. Add > other parameters as required for cpus-per-task, time limits, or whatever > else is needed. If you start the larger jobs first, and let the later jobs > fill in on idle CPUs on those nodes, it should work. > > > On May 6, 2020, at 9:46 AM, navin srivastava > wrote: > > > > To explain with more details. > > > > job will be submitted based on core at any time but it will go to any > random nodes but limited to 4 Nodes only.(license having some intelligence > that it calculate the nodes and if it reached to 4 then it will not allow > any more nodes. yes it didn't depend on the no of core available on nodes. > > > > Case-1 if 4 jobs running with 4 cores each on 4 nodes [node1, node2, > node3 and node4] > > Again Fifth job assigned by SLURM with 4 cores on any one > node of node1, node2, node3 and node4 then license will be allowed. > > > > Case-2 if 4 jobs running with 4 cores each on 4 nodes [node1, node2, > node3 and node4] > > Again Fifth job assigned by SLURM on node5 with 4 cores > then license will not allowed [ license not found error came in this case] > > > > Regards > > Navin. > > > > > > On Wed, May 6, 2020 at 7:47 PM Renfro, Michael > wrote: > > To make sure I’m reading this correctly, you have a software license > that lets you run jobs on up to 4 nodes at once, regardless of how many > CPUs you use? 
That is, you could run any one of the following sets of jobs: > > > > - four 1-node jobs, > > - two 2-node jobs, > > - one 1-node and one 3-node job, > > - two 1-node and one 2-node jobs, > > - one 4-node job, > > > > simultaneously? And the license isn’t node-locked to specific nodes by > MAC address or anything similar? But if you try to run jobs beyond what > I’ve listed above, you run out of licenses, and you want those later jobs > to be held until licenses are freed up? > > > > If all of those questions have an answer of ‘yes’, I think you want the > remote license part of the https://slurm.schedmd.com/licenses.html, > something like: > > > > sacctmgr add resource name=software_name count=4 percentallowed=100 > server=flex_host servertype=flexlm type=license > > > > and submit jobs with a '-L software_name:N’ flag where N is the number > of nodes you want to run on. > > > > > On May 6, 2020, at 5:33 AM, navin srivastava > wrote: > > > > > > Thanks Micheal. > > > > > > Actually one application license are based on node and we have 4 Node > license( not a fix node). we have several nodes but when job lands on any 4 > random nodes it runs on those nodes only. After that it fails if it goes to > other nodes. > > > > > > can we define a custom variable and set it on the node level and when > user submit it will pass that variable and then job will and onto those > specific nodes? > > > i do not want to create a separate partition. > > > > > > is there any way to achieve this by any other method? > > > > > > Regards > > > Navin. > > > > > > > > > Regards > > > Navin. > > > > > > On Tue, May 5, 2020 at 7:46 PM Renfro, Michael > wrote: > > > Haven’t done it yet myself, but it’s on my todo list. > > > > > > But I’d assume that if you use the FlexLM or RLM parts of that > documentation, that Slurm would query the remote license server > periodically and hold the job until the necessary licenses were available. 
> > > > > > > On May 5, 2020, at 8:37 AM, navin srivastava > wrote: > > > > > > > > External Email Warning > > > > This email originated from outside the university. Please use > caution when opening attachments, clicking links, or responding to requests. > > > > Thanks Michael, > > > > > > > > yes i have gone through but the licenses are remote license and it > will be used by outside as well not only in slurm. > > > > so basically i am inter
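Michael's one-liner above can be wrapped into a small submission sketch. This assumes an already running job (`JOBID`) that holds the license and a hypothetical job script `job.sh`; the parsing of the `scontrol show job` output is exactly the quoted grep/cut pipeline.

```shell
# Sketch: reuse the node set of the job that already holds the license,
# so a follow-on job lands only on nodes the license already covers.
# JOBID and job.sh are illustrative assumptions.
JOBID=1357498
LICENSE_NODES=$(scontrol show job "$JOBID" | grep ' NodeList=' | cut -d= -f2)
sbatch -w "$LICENSE_NODES" -N 1 job.sh
```

As Michael notes, this only works well if the larger jobs start first and later jobs fill in on idle CPUs of those same nodes.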
[slurm-users] is there a way to delay the scheduling.
Hi Team,

We are facing an issue: several users are each submitting many very short jobs (around 1-2 seconds each) in batch. Under this submission load slurmctld becomes unresponsive and sbatch starts reporting:

Sending job 6e508a88155d9bec40d752c8331d7ae8 to queue.
sbatch: error: Batch job submission failed: Unable to contact slurm controller (connect failure)
Sending job 6e51ed0e322c87802b0f3a2f23a7967f to queue.
sbatch: error: Batch job submission failed: Unable to contact slurm controller (connect failure)
Sending job 6e638939f90cd59e60c23b8450af9839 to queue.
sbatch: error: Batch job submission failed: Unable to contact slurm controller (connect failure)
Sending job 6e6acf36bc7e1394a92155a95feb1c92 to queue.
sbatch: error: Batch job submission failed: Unable to contact slurm controller (connect failure)
Sending job 6e6c646a29f0ad4e9df35001c367a9f5 to queue.
sbatch: error: Batch job submission failed: Unable to contact slurm controller (connect failure)
Sending job 6ebcecb4c27d88f0f48d402e2b079c52 to queue.

While this happens the slurmctld process consumes more than 100% CPU, and nodes are slow to acknowledge to the controller: they linger in the "comp" (completing) state before returning to "idle". My thought is that delaying the scheduling cycle would help here. Any idea how that can be done? And is there any other solution for issues like this?

Regards
Navin.
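For reference, the scheduling cycle can be throttled via `SchedulerParameters` in slurm.conf. The options below are documented tunables in the slurm.conf(5) man page, but the specific values are illustrative assumptions, not a tested recommendation:

```
# slurm.conf sketch -- reduce per-submission scheduling overhead.
# defer             : skip the scheduling attempt triggered by each submission
# sched_min_interval: minimum microseconds between main scheduling cycles
# batch_sched_delay : seconds by which batch-job scheduling may be delayed
SchedulerParameters=defer,sched_min_interval=2000000,batch_sched_delay=10
```

After editing slurm.conf, `scontrol reconfigure` applies the change.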
[slurm-users] slurm Report
Hi Team,

I have extracted the %utilization report and found that the idle time is on the higher end, so I wanted to check: is there any way to find node-based utilization? It would help us figure out which nodes are underutilized.

Regards
Navin.
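`sreport` only reports cluster-level utilization, so per-node figures have to be approximated. One rough sketch (my own approximation, not a standard Slurm report): count how often each node list appears in the accounting records over a period. Nodes that rarely or never appear are candidates for "underutilized". The date range is illustrative, and this does not weight by job duration.

```shell
# List the NodeList of every allocation in a period and count
# occurrences per node list, most-used first.
sacct -a -X -n -S 2020-06-01 -E 2020-07-01 -o NodeList%40 \
  | sort | uniq -c | sort -rn
```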
[slurm-users] federation cluster management
Dear all,

I have read about the concept of federated clusters in Slurm. Is it really helpful for maximizing cluster usage? We have 4 independent Slurm clusters, each with its own local storage, and we want to build a federation so the free compute power of idle nodes can be used across clusters. Can I achieve this by setting up a federated cluster? My worry is how to handle storage read/write operations when the storage is local to each cluster.

Any idea or suggestion is welcome.

Regards
Navin.
Re: [slurm-users] ignore gpu resources to scheduled the cpu based jobs
Hi Team,

I have separated the CPU nodes and GPU nodes into two different queues. 20 nodes have CPUs only (20 cores each) and no GPU. Another set of nodes has both GPU and CPU: some with 2 GPUs and 20 CPUs, some with 8 GPUs and 48 CPUs, all assigned to the GPU queue.

Users face issues in the GPU queue. The scenario is as below: users submit jobs requesting 4 CPUs + 1 GPU, and also jobs requesting 4 CPUs only. When all GPUs are in use, the jobs requesting GPU resources wait in the queue, and even though a large amount of CPU is still available, the CPU-only jobs do not go through, because the 4CPU+1GPU jobs have higher priority.

Is there any mechanism so that once all GPUs are in use, CPU-based jobs are allowed to run?

Regards
Navin.

On Mon, Jun 22, 2020 at 6:09 PM Diego Zuccato wrote:

> On 16/06/20 16:23, Loris Bennett wrote:
> >> Thanks for pointing this out - I hadn't been aware of this. Is there
> >> anywhere in the documentation where this is explicitly stated?
> I don't remember. Seems Michael's experience is different. Possibly some
> other setting influences that behaviour. Maybe different partition
> priorities?
> But on the small cluster I'm managing it's this way. I'm not an expert
> and I'd like to understand.
>
> --
> Diego Zuccato
> DIFA - Dip. di Fisica e Astronomia
> Servizi Informatici
> Alma Mater Studiorum - Università di Bologna
> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> tel.: +39 051 20 95786
Re: [slurm-users] changes in slurm.
Thank you for the answers.

Should RealMemory be based on the total memory value or the total usable memory value? For example, a node has 256 GB RAM but free -g reports only 251 GB:

deda1x1591:~ # free -g
             total       used       free     shared    buffers     cached
Mem:           251         67        184          6          0         47

So should we set the value to 251*1024 MB or 256*1024 MB? Or is there a Slurm command that reports the value to add?

Regards
Navin.

On Thu, Jul 9, 2020 at 8:01 PM Brian Andrus wrote:

> Navin,
>
> 1. you will need to restart slurmctld when you make changes to the
> physical definition of a node. This can be done without affecting
> running jobs.
>
> 2. You can have a node in more than one partition. That will not hurt
> anything. Jobs are allocated to nodes, not partitions; the partition is
> used to determine which node(s) and filter/order jobs. You should add
> the node to the new partition, but also leave it in the 'test'
> partition. If you are looking to remove the 'test' partition, set it to
> down and once all the running jobs that are in it finish, then remove it.
>
> Brian Andrus
>
> On 7/8/2020 10:57 PM, navin srivastava wrote:
> > Hi Team,
> >
> > i have 2 small query.because of the lack of testing environment i am
> > unable to test the scenario. working on to set up a test environment.
> >
> > 1. In my environment i am unable to pass #SBATCH --mem-2GB option.
> > i found the reason is because there is no RealMemory entry in the node
> > definition of the slurm.
> >
> > NodeName=Node[1-12] NodeHostname=deda1x[1450-1461] NodeAddr=Node[1-12]
> > Sockets=2 CoresPerSocket=10 State=UNKNOWN
> >
> > if i add the RealMemory it should be able to pick. So my query here
> > is, is it possible to add RealMemory in the definition anytime while
> > the jobs are in progres and execute the scontrol reconfigure and
> > reload the daemon on client node? or do we need to take a
> > downtime?(which i don't think so)
> >
> > 2. Also I would like to know what will happen if some jobs are running
> > in a partition(say test) and I will move the associated node to some
> > other partition(say normal) without draining the node.or if i suspend
> > the job and then change the node partition and will resume the job. I
> > am not deleting the partition here.
> >
> > Regards
> > Navin.
[slurm-users] CPU allocation for the GPU jobs.
Hi Team,

We have separate partitions for the GPU nodes and the CPU-only nodes.

Scenario: jobs submitted in our environment request either 4 CPUs + 1 GPU or 4 CPUs only, in the nodeGPUsmall and nodeGPUbig partitions. When all GPUs are exhausted, the GPU jobs wait in the queue for GPU resources, and the CPU-only jobs do not go through even though plenty of CPU resources are available; they also pend because the 4CPU+1GPU jobs have higher priority than the CPU-only ones.

Is there any option so that once all GPU resources are exhausted, CPU-only jobs are allowed to run? Is there a way to deal with this, or some custom solution we could think of? There is no issue with the CPU-only partitions.

Below is my Slurm configuration:

NodeName=node[1-12] NodeAddr=node[1-12] Sockets=2 CoresPerSocket=10 RealMemory=128833 State=UNKNOWN
NodeName=node[13-16] NodeAddr=node[13-16] Sockets=2 CoresPerSocket=10 RealMemory=515954 Feature=HIGHMEM State=UNKNOWN
NodeName=node[28-32] NodeAddr=node[28-32] Sockets=2 CoresPerSocket=28 RealMemory=257389
NodeName=node[32-33] NodeAddr=node[32-33] Sockets=2 CoresPerSocket=24 RealMemory=773418
NodeName=node[17-27] NodeAddr=node[17-27] Sockets=2 CoresPerSocket=18 RealMemory=257687 Feature=K2200 Gres=gpu:2
NodeName=node[34] NodeAddr=node34 Sockets=2 CoresPerSocket=24 RealMemory=773410 Feature=RTX Gres=gpu:8

PartitionName=node Nodes=node[1-10,14-16,28-33,35] Default=YES MaxTime=INFINITE State=UP Shared=YES
PartitionName=nodeGPUsmall Nodes=node[17-27] Default=NO MaxTime=INFINITE State=UP Shared=YES
PartitionName=nodeGPUbig Nodes=node[34] Default=NO MaxTime=INFINITE State=UP Shared=YES

Regards
Navin.
Re: [slurm-users] changes in slurm.
Thanks either I can use which slurmd -C gives because I see same set of node giving different value.or I can also choose the available memory I mean 251*1024 Regards Navin On Fri, Jul 10, 2020, 20:34 Stephan Roth wrote: > It's recommended to round RealMemory down to the next lower gigabyte > value to prevent nodes from entering a drain state after rebooting with > a bios- or kernel-update. > > Source: https://slurm.schedmd.com/SLUG17/FieldNotes.pdf, "Node > configuration" > > Stephan > > On 10.07.20 13:46, Sarlo, Jeffrey S wrote: > > If you run slurmd -C on the compute node, it should tell you what > > slurm thinks the RealMemory number is. > > > > Jeff > > > > -------- > > *From:* slurm-users on behalf > of > > navin srivastava > > *Sent:* Friday, July 10, 2020 6:24 AM > > *To:* Slurm User Community List > > *Subject:* Re: [slurm-users] changes in slurm. > > Thank you for the answers. > > > > is the RealMemory will be decided on the Total Memory value or total > > usable memory value. > > > > i mean if a node having 256GB RAM but free -g will tell about only 251 > GB. > > deda1x1591:~ # free -g > > total used free sharedbuffers > cached > > Mem: 251 67184 6 0 47 > > > > so we can add the value is 251*1024 MB or 256*1024MB. or is there any > > slurm command which will provide me the value to add. > > > > Regards > > Navin. > > > > > > > > On Thu, Jul 9, 2020 at 8:01 PM Brian Andrus > <mailto:toomuc...@gmail.com>> wrote: > > > > Navin, > > > > 1. you will need to restart slurmctld when you make changes to the > > physical definition of a node. This can be done without affecting > > running jobs. > > > > 2. You can have a node in more than one partition. That will not hurt > > anything. Jobs are allocated to nodes, not partitions, the partition > is > > used to determine which node(s) and filter/order jobs. You should add > > the node to the new partition, but also leave it in the 'test' > > partition. 
If you are looking to remove the 'test' partition, set it > to > > down and once all the running jobs that are in it finish, then > > remove it. > > > > Brian Andrus > > > > On 7/8/2020 10:57 PM, navin srivastava wrote: > > > Hi Team, > > > > > > i have 2 small query.because of the lack of testing environment i > am > > > unable to test the scenario. working on to set up a test > environment. > > > > > > 1. In my environment i am unable to pass #SBATCH --mem-2GB option. > > > i found the reason is because there is no RealMemory entry in the > > node > > > definition of the slurm. > > > > > > NodeName=Node[1-12] NodeHostname=deda1x[1450-1461] > > NodeAddr=Node[1-12] > > > Sockets=2 CoresPerSocket=10 State=UNKNOWN > > > > > > if i add the RealMemory it should be able to pick. So my > query here > > > is, is it possible to add RealMemory in the definition anytime > while > > > the jobs are in progres and execute the scontrol reconfigure and > > > reload the daemon on client node? or do we need to take a > > > downtime?(which i don't think so) > > > > > > 2. Also I would like to know what will happen if some jobs are > > running > > > in a partition(say test) and I will move the associated node to > some > > > other partition(say normal) without draining the node.or if i > > suspend > > > the job and then change the node partition and will resume the > > job. I > > > am not deleting the partition here. > > > > > > Regards > > > Navin. > > > > > > > > > > > > > > > > > > > > > > > > > > --- > Stephan Roth | ISG.EE D-ITET ETH Zurich | http://www.isg.ee.ethz.ch > +4144 632 30 59 | ETF D 104 | Sternwartstrasse 7 | 8092 Zurich > --- > >
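Following Stephan's advice to round down to the next lower gigabyte, here is a small sketch for computing a safe RealMemory value (in MB) from the kernel-reported total. Reading `/proc/meminfo` and the rounding arithmetic are my own assumptions, not something stated in the thread; `slurmd -C` remains the authoritative source for what Slurm itself detects.

```shell
# Read total memory in KB from /proc/meminfo, round down to a whole
# number of GB, and express the result in MB for slurm.conf RealMemory.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
real_memory_mb=$(( mem_kb / 1024 / 1024 * 1024 ))
echo "RealMemory=${real_memory_mb}"
```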
Re: [slurm-users] CPU allocation for the GPU jobs.
Thanks Renfro. My scheduling policy is below:

SchedulerType=sched/builtin
SelectType=select/cons_res
SelectTypeParameters=CR_Core
AccountingStorageEnforce=associations
AccountingStorageHost=192.168.150.223
AccountingStorageType=accounting_storage/slurmdbd
ClusterName=hpc
JobCompType=jobcomp/slurmdbd
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmctldDebug=5
SlurmdDebug=5
Waittime=0
Epilog=/etc/slurm/slurm.epilog.clean
GresTypes=gpu
MaxJobCount=500
SchedulerParameters=enable_user_top,default_queue_depth=100
# JOB PRIORITY
PriorityType=priority/multifactor
PriorityDecayHalfLife=2
PriorityUsageResetPeriod=DAILY
PriorityWeightFairshare=50
PriorityFlags=FAIR_TREE

Let me try changing it to backfill and see if that helps.

Regards
Navin.

On Mon, Jul 13, 2020 at 5:16 PM Renfro, Michael wrote:

> "The *SchedulerType* configuration parameter specifies the scheduler
> plugin to use. Options are sched/backfill, which performs backfill
> scheduling, and sched/builtin, which attempts to schedule jobs in a strict
> priority order within each partition/queue."
>
> https://slurm.schedmd.com/sched_config.html
>
> If you're using the builtin scheduler, lower priority jobs have no way to
> run ahead of higher priority jobs. If you're using the backfill scheduler,
> your jobs will need specific wall times specified, since the idea with
> backfill is to run lower priority jobs ahead of time if and only if they
> can complete without delaying the estimated start time of higher priority
> jobs.
>
> On Jul 13, 2020, at 4:18 AM, navin srivastava wrote:
>
> Hi Team,
>
> We have separate partitions for the GPU nodes and the CPU-only nodes.
>
> Scenario: jobs submitted in our environment request either 4 CPUs + 1 GPU
> or 4 CPUs only, in the nodeGPUsmall and nodeGPUbig partitions. When all
> GPUs are exhausted, the GPU jobs wait in the queue for GPU resources, and
> the CPU-only jobs do not go through even though plenty of CPU resources
> are available; they also pend because the 4CPU+1GPU jobs have higher
> priority than the CPU-only ones.
>
> Is there any option so that once all GPU resources are exhausted, CPU-only
> jobs are allowed to run? Is there a way to deal with this, or some custom
> solution we could think of? There is no issue with the CPU-only partitions.
>
> Below is my Slurm configuration:
>
> NodeName=node[1-12] NodeAddr=node[1-12] Sockets=2 CoresPerSocket=10
> RealMemory=128833 State=UNKNOWN
> NodeName=node[13-16] NodeAddr=node[13-16] Sockets=2 CoresPerSocket=10
> RealMemory=515954 Feature=HIGHMEM State=UNKNOWN
> NodeName=node[28-32] NodeAddr=node[28-32] Sockets=2 CoresPerSocket=28
> RealMemory=257389
> NodeName=node[32-33] NodeAddr=node[32-33] Sockets=2 CoresPerSocket=24
> RealMemory=773418
> NodeName=node[17-27] NodeAddr=node[17-27] Sockets=2 CoresPerSocket=18
> RealMemory=257687 Feature=K2200 Gres=gpu:2
> NodeName=node[34] NodeAddr=node34 Sockets=2 CoresPerSocket=24
> RealMemory=773410 Feature=RTX Gres=gpu:8
>
> PartitionName=node Nodes=node[1-10,14-16,28-33,35] Default=YES
> MaxTime=INFINITE State=UP Shared=YES
> PartitionName=nodeGPUsmall Nodes=node[17-27] Default=NO MaxTime=INFINITE
> State=UP Shared=YES
> PartitionName=nodeGPUbig Nodes=node[34] Default=NO MaxTime=INFINITE
> State=UP Shared=YES
>
> Regards
> Navin.
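The change Michael describes amounts to one slurm.conf line plus wall times on submitted jobs. A sketch of the relevant settings, keeping the SchedulerParameters already shown in this thread (the backfill behaviour itself is documented; treat this as illustrative, not a drop-in config):

```
# slurm.conf sketch -- enable backfill so lower-priority CPU-only jobs
# can start ahead of pending higher-priority GPU jobs when they fit
# without delaying those GPU jobs' estimated start times.
SchedulerType=sched/backfill
SchedulerParameters=enable_user_top,default_queue_depth=100
# Backfill only considers jobs that carry a wall time, e.g.:
#   sbatch --time=02:00:00 job.sh
```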
[slurm-users] changes in slurm.
Hi Team,

I have 2 small queries. Because of the lack of a testing environment I am unable to test these scenarios (I am working on setting up a test environment).

1. In my environment I am unable to pass the #SBATCH --mem=2GB option. The reason is that there is no RealMemory entry in the node definition in slurm.conf:

NodeName=Node[1-12] NodeHostname=deda1x[1450-1461] NodeAddr=Node[1-12] Sockets=2 CoresPerSocket=10 State=UNKNOWN

If I add RealMemory, the option should be picked up. So my query here is: is it possible to add RealMemory to the definition at any time while jobs are in progress, then run scontrol reconfigure and reload the daemon on the client nodes? Or do we need to take downtime (which I don't think so)?

2. Also, I would like to know what will happen if some jobs are running in a partition (say test) and I move the associated node to some other partition (say normal) without draining the node. Or if I suspend the job, change the node's partition, and then resume the job. I am not deleting the partition here.

Regards
Navin.
Re: [slurm-users] ignore gpu resources to scheduled the cpu based jobs
Thanks Renfro. I will perform similar setting and let us see how it goes. Regards On Mon, Jun 15, 2020, 23:02 Renfro, Michael wrote: > So if a GPU job is submitted to a partition containing only GPU nodes, and > a non-GPU job is submitted to a partition containing at least some nodes > without GPUs, both jobs should be able to run. Priorities should be > evaluated on a per-partition basis. I can 100% guarantee that in our HPC, > pending GPU jobs don't block non-GPU jobs, and vice versa. > > I could see a problem if the GPU job was submitted to a partition > containing both types of nodes: if that job was assigned the highest > priority for whatever reason (fair share, age, etc.), other jobs in the > same partition would have to wait until that job started. > > A simple solution would be to make a GPU partition containing only GPU > nodes, and a non-GPU partition containing only non-GPU nodes. Submit GPU > jobs to the GPU partition, and non-GPU jobs to the non-GPU partition. > > Once that works, you could make a partition that includes both types of > nodes to reduce idle resources, but jobs submitted to that partition would > have to (a) not require a GPU, (b) require a limited number of CPUs per > node, so that you'd have some CPUs available for GPU jobs on the nodes > containing GPUs. > > ---------- > *From:* slurm-users on behalf of > navin srivastava > *Sent:* Saturday, June 13, 2020 10:47 AM > *To:* Slurm User Community List > *Subject:* Re: [slurm-users] ignore gpu resources to scheduled the cpu > based jobs > > > Yes we have separate partitions. Some are specific to gpu having 2 nodes > with 8 gpu and another partitions are mix of both,nodes with 2 gpu and very > few nodes are without any gpu. > > Regards > Navin > > > On Sat, Jun 13, 2020, 21:11 navin srivastava > wrote: > > Thanks Renfro. > > Yes we have both types of nodes with gpu and nongpu. > Also some users job require gpu and some applications use only CPU. 
> > So the issue happens when user priority is high and waiting for gpu > resources which is not available and the job with lower priority is waiting > even though enough CPU is available which need only CPU resources. > > When I hold gpu jobs the cpu jobs will go through. > > Regards > Navin > > On Sat, Jun 13, 2020, 20:37 Renfro, Michael wrote: > > Will probably need more information to find a solution. > > To start, do you have separate partitions for GPU and non-GPU jobs? Do you > have nodes without GPUs? > > On Jun 13, 2020, at 12:28 AM, navin srivastava > wrote: > > Hi All, > > In our environment we have GPU. so what i found is if the user having high > priority and his job is in queue and waiting for the GPU resources which > are almost full and not available. so the other user submitted the job > which does not require the GPU resources are in queue even though lots of > cpu resources are available. > > our scheduling mechanism is FIFO and Fair tree enabled. Is there any way > we can make some changes so that the cpu based job should go through and > GPU based job can wait till the GPU resources are free. > > Regards > Navin. > > > > >
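Michael's option (b), a shared partition that leaves cores free for GPU jobs, maps onto the `MaxCPUsPerNode` partition parameter in slurm.conf. The partition name, node names, and the value 12 below are purely illustrative assumptions:

```
# slurm.conf sketch -- a CPU-only overflow partition that may use GPU
# nodes but never takes more than 12 CPUs on any one of them, leaving
# the remaining cores available for GPU jobs on those nodes.
PartitionName=cpu_overflow Nodes=gpu[01-04] MaxCPUsPerNode=12 MaxTime=INFINITE State=UP
```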
[slurm-users] Changing job order
Hi Team,

Is there a way to change the job order in Slurm, similar to sorder in PBS? I want to swap my job with the job at the top of the queue.

Regards
Navin
Re: [slurm-users] Changing job order
Thanks Ole. Regards Navin On Thu, Jun 18, 2020 at 11:56 AM Ole Holm Nielsen < ole.h.niel...@fysik.dtu.dk> wrote: > The scontrol command to set the nice level is on the list here: > https://wiki.fysik.dtu.dk/niflheim/SLURM#useful-commands > > /Ole > > On 6/18/20 8:05 AM, navin srivastava wrote: > > Thanks ** > > What is the command to modify the Nice value of an already submitted job. > > > > Regards > > Navin > > > > On Thu, Jun 18, 2020 at 4:00 AM Rodrigo Santibáñez > > mailto:rsantibanez.uch...@gmail.com>> > wrote: > > > > HI Navin, > > > > You could set the nice value of both jobs to change the priority and > > modify the order of execution. > > > > El mié., 17 jun. 2020 a las 12:31, navin srivastava > > (mailto:navin.alt...@gmail.com>>) escribió: > > > > Hi Team, > > > > Is their a way to change the job order in slurm.similar to sorder > > in PBS. > > > > I want to swap my job from the other top job. > >
Re: [slurm-users] Changing job order
Thanks. What is the command to modify the nice value of an already submitted job?

Regards
Navin

On Thu, Jun 18, 2020 at 4:00 AM Rodrigo Santibáñez <rsantibanez.uch...@gmail.com> wrote:

> Hi Navin,
>
> You could set the nice value of both jobs to change the priority and
> modify the order of execution.
>
> On Wed, Jun 17, 2020 at 12:31, navin srivastava (<navin.alt...@gmail.com>) wrote:
>
>> Hi Team,
>>
>> Is there a way to change the job order in slurm, similar to sorder in PBS?
>>
>> I want to swap my job with the other top job.
>>
>> Regards
>> Navin
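For reference, the nice value Rodrigo mentions is set with `scontrol update` (the job IDs below are illustrative). A positive nice lowers a job's priority and is all a regular user can set on their own jobs; negative values require administrator rights.

```
# Push job 12345 back in the queue (any user, own jobs only):
scontrol update JobId=12345 Nice=100
# Raise priority again with a negative nice (administrators only):
scontrol update JobId=12345 Nice=-50
```

With `SchedulerParameters=enable_user_top` set (as it is in the configuration shown elsewhere in this archive), `scontrol top <jobid>` can also move one of your own pending jobs ahead of your other jobs.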
[slurm-users] Job failure issue in Slurm
Hi Team,

I am seeing a weird issue in my environment. One of the Gaussian jobs fails under Slurm shortly after it starts executing, without writing anything, and I am unable to figure out the reason. The same job works fine without Slurm on the same node.

slurmctld.log:

[2020-06-03T19:14:33.170] debug: Job 1357498 has more than one partition (normal)(21052)
[2020-06-03T19:14:33.170] debug: Job 1357498 has more than one partition (normalGPUsmall)(21052)
[2020-06-03T19:14:33.170] debug: Job 1357498 has more than one partition (normalGPUbig)(21052)
[2020-06-03T19:15:12.497] debug: sched: JobId=1357498. State=PENDING. Reason=Priority, Priority=21052. Partition=normal,normalGPUsmall,normalGPUbig.
[2020-06-03T19:15:12.497] debug: sched: JobId=1357498. State=PENDING. Reason=Priority, Priority=21052. Partition=normal,normalGPUsmall,normalGPUbig.
[2020-06-03T19:15:12.497] debug: sched: JobId=1357498. State=PENDING. Reason=Priority, Priority=21052. Partition=normal,normalGPUsmall,normalGPUbig.
[2020-06-03T19:16:12.626] debug: sched: JobId=1357498. State=PENDING. Reason=Priority, Priority=21052. Partition=normal,normalGPUsmall,normalGPUbig.
[2020-06-03T19:17:12.753] debug: sched: JobId=1357498. State=PENDING. Reason=Priority, Priority=21052. Partition=normal,normalGPUsmall,normalGPUbig.
[2020-06-03T19:18:12.882] debug: sched: JobId=1357498. State=PENDING. Reason=Priority, Priority=21052. Partition=normal,normalGPUsmall,normalGPUbig.
[2020-06-03T19:19:13.633] sched: Allocate JobID=1357498 NodeList=oled4 #CPUs=4 Partition=normal
[2020-06-04T12:25:36.961] _job_complete: JobID=1357498 State=0x1 NodeCnt=1 WEXITSTATUS 2
[2020-06-04T12:25:36.961] SLURM Job_id=1357498 Name=job1 Ended, Run time 17:06:23, FAILED, ExitCode 2
[2020-06-04T12:25:36.962] _job_complete: JobID=1357498 State=0x8005 NodeCnt=1 done

slurmd.log:

[2020-06-04T12:22:43.712] [1357498.batch] debug: jag_common_poll_data: Task average frequency = 2769 pid 64084 mem size 4625724 23696420 time 164642.84(164537+105)
[2020-06-04T12:23:13.712] [1357498.batch] debug: jag_common_poll_data: Task average frequency = 2769 pid 64084 mem size 4625724 23696420 time 164762.82(164657+105)
[2020-06-04T12:23:43.712] [1357498.batch] debug: jag_common_poll_data: Task average frequency = 2769 pid 64084 mem size 4625724 23696420 time 164882.81(164777+105)
[2020-06-04T12:24:13.712] [1357498.batch] debug: jag_common_poll_data: Task average frequency = 2769 pid 64084 mem size 4625724 23696420 time 165002.79(164897+105)
[2020-06-04T12:24:43.712] [1357498.batch] debug: jag_common_poll_data: Task average frequency = 2769 pid 64084 mem size 4625724 23696420 time 165122.77(165016+105)
[2020-06-04T12:25:13.713] [1357498.batch] debug: jag_common_poll_data: Task average frequency = 2769 pid 64084 mem size 4625724 23696420 time 165242.75(165136+105)
[2020-06-04T12:25:36.955] [1357498.batch] task 0 (64084) exited with exit code 2.
[2020-06-04T12:25:36.955] [1357498.batch] debug: task_p_post_term: affinity 1357498.4294967294, task 0
[2020-06-04T12:25:36.960] [1357498.batch] debug: step_terminate_monitor_stop signaling condition
[2020-06-04T12:25:36.960] [1357498.batch] job 1357498 completed with slurm_rc = 0, job_rc = 512
[2020-06-04T12:25:36.960] [1357498.batch] sending REQUEST_COMPLETE_BATCH_SCRIPT, error:0 status 512
[2020-06-04T12:25:36.961] [1357498.batch] debug: Message thread exited
[2020-06-04T12:25:36.962] [1357498.batch] done with job
[2020-06-04T12:25:36.962] debug: task_p_slurmd_release_resources: affinity jobid 1357498
[2020-06-04T12:25:36.962] debug: credential for job 1357498 revoked
[2020-06-04T12:25:36.963] debug: Waiting for job 1357498's prolog to complete
[2020-06-04T12:25:36.963] debug: Finished wait for job 1357498's prolog to complete
[2020-06-04T12:25:36.963] debug: [job 1357498] attempting to run epilog [/etc/slurm/slurm.epilog.clean]
[2020-06-04T12:25:37.254] debug: completed epilog for jobid 1357498
[2020-06-04T12:25:37.254] debug: Job 1357498: sent epilog complete msg: rc = 0

Any suggestion to troubleshoot this issue further is welcome.

Regards
Navin.
Re: [slurm-users] Job failure issue in Slurm
Thanks sathish. All other jobs are running fine across the cluster so I don't think it is related to any pam module issue. I am investigating issue further.i will come back to you with more details Regards Navin On Mon, Jun 8, 2020, 19:24 sathish wrote: > Hi Navin, > > Was this working earlier or is this the first time are you trying ? > Are you using pam module ? if yes, try disabling the pam module and see > if it works. > > Thanks > Sathish > > On Thu, Jun 4, 2020 at 10:47 PM navin srivastava > wrote: > >> Hi Team, >> >> i am seeing a weird issue in my environment. >> one of the gaussian job is failing with the slurm within a minute after >> it go for the execution without writing anything and unable to figure out >> the reason. >> The same job works fine without slurm on the same node. >> >> slurmctld.log >> >> [2020-06-03T19:14:33.170] debug: Job 1357498 has more than one partition >> (normal)(21052) >> [2020-06-03T19:14:33.170] debug: Job 1357498 has more than one partition >> (normalGPUsmall)(21052) >> [2020-06-03T19:14:33.170] debug: Job 1357498 has more than one partition >> (normalGPUbig)(21052) >> [2020-06-03T19:15:12.497] debug: sched: JobId=1357498. State=PENDING. >> Reason=Priority, Priority=21052. >> Partition=normal,normalGPUsmall,normalGPUbig. >> [2020-06-03T19:15:12.497] debug: sched: JobId=1357498. State=PENDING. >> Reason=Priority, Priority=21052. >> Partition=normal,normalGPUsmall,normalGPUbig. >> [2020-06-03T19:15:12.497] debug: sched: JobId=1357498. State=PENDING. >> Reason=Priority, Priority=21052. >> Partition=normal,normalGPUsmall,normalGPUbig. >> [2020-06-03T19:16:12.626] debug: sched: JobId=1357498. State=PENDING. >> Reason=Priority, Priority=21052. >> Partition=normal,normalGPUsmall,normalGPUbig. >> [2020-06-03T19:17:12.753] debug: sched: JobId=1357498. State=PENDING. >> Reason=Priority, Priority=21052. >> Partition=normal,normalGPUsmall,normalGPUbig. >> [2020-06-03T19:18:12.882] debug: sched: JobId=1357498. State=PENDING. 
>> Reason=Priority, Priority=21052. >> Partition=normal,normalGPUsmall,normalGPUbig. >> [2020-06-03T19:19:13.633] sched: Allocate JobID=1357498 NodeList=oled4 >> #CPUs=4 Partition=normal >> [2020-06-04T12:25:36.961] _job_complete: JobID=1357498 State=0x1 >> NodeCnt=1 WEXITSTATUS 2 >> [2020-06-04T12:25:36.961] SLURM Job_id=1357498 Name=job1 Ended, Run time >> 17:06:23, FAILED, ExitCode 2 >> [2020-06-04T12:25:36.962] _job_complete: JobID=1357498 State=0x8005 >> NodeCnt=1 done >> >> slurmd.log >> >> [2020-06-04T12:22:43.712] [1357498.batch] debug: jag_common_poll_data: >> Task average frequency = 2769 pid 64084 mem size 4625724 23696420 time >> 164642.84(164537+105) >> [2020-06-04T12:23:13.712] [1357498.batch] debug: jag_common_poll_data: >> Task average frequency = 2769 pid 64084 mem size 4625724 23696420 time >> 164762.82(164657+105) >> [2020-06-04T12:23:43.712] [1357498.batch] debug: jag_common_poll_data: >> Task average frequency = 2769 pid 64084 mem size 4625724 23696420 time >> 164882.81(164777+105) >> [2020-06-04T12:24:13.712] [1357498.batch] debug: jag_common_poll_data: >> Task average frequency = 2769 pid 64084 mem size 4625724 23696420 time >> 165002.79(164897+105) >> [2020-06-04T12:24:43.712] [1357498.batch] debug: jag_common_poll_data: >> Task average frequency = 2769 pid 64084 mem size 4625724 23696420 time >> 165122.77(165016+105) >> [2020-06-04T12:25:13.713] [1357498.batch] debug: jag_common_poll_data: >> Task average frequency = 2769 pid 64084 mem size 4625724 23696420 time >> 165242.75(165136+105) >> [2020-06-04T12:25:36.955] [1357498.batch] task 0 (64084) exited with exit >> code 2. 
>> [2020-06-04T12:25:36.955] [1357498.batch] debug: task_p_post_term: >> affinity 1357498.4294967294, task 0 >> [2020-06-04T12:25:36.960] [1357498.batch] debug: >> step_terminate_monitor_stop signaling condition >> [2020-06-04T12:25:36.960] [1357498.batch] job 1357498 completed with >> slurm_rc = 0, job_rc = 512 >> [2020-06-04T12:25:36.960] [1357498.batch] sending >> REQUEST_COMPLETE_BATCH_SCRIPT, error:0 status 512 >> [2020-06-04T12:25:36.961] [1357498.batch] debug: Message thread exited >> [2020-06-04T12:25:36.962] [1357498.batch] done with job >> [2020-06-04T12:25:36.962] debug: task_p_slurmd_release_resources: >> affinity jobid 1357498 >> [2020-06-04T12:25:36.962] debug: credential for job 1357498 revoked >> [2020-06-04T12:25:36.963] debug: Waiting for job 1357498's prolog to >> complete >> [2020-06-04T12:25:36.963] debug: Finished wait for job 1357498's prolog >> to complete >> [2020-06-04T12:25:36.963] debug: [job 1357498] attempting to run epilog >> [/etc/slurm/slurm.epilog.clean] >> [2020-06-04T12:25:37.254] debug: completed epilog for jobid 1357498 >> [2020-06-04T12:25:37.254] debug: Job 1357498: sent epilog complete msg: >> rc = 0 >> >> any suggestion will be welcome to troubleshoot this issue further. >> >> Regards >> Navin. >> >> >> >> > > -- > Regards. > Sathish >
Re: [slurm-users] unable to start slurmd process.
I tried executing it in debug mode, but there too it does not write anything. I waited for about 5-10 minutes:

deda1x1452:/etc/sysconfig # /usr/sbin/slurmd -v -v

No output on the terminal. The OS is SLES12-SP4. All firewall services are disabled.

The recent change is to the hostnames: earlier the nodes used local hostnames (node1, node2, etc.), but we have moved to DNS-based hostnames (deda*):

NodeName=node[1-12] NodeHostname=deda1x[1450-1461] NodeAddr=node[1-12] Sockets=2 CoresPerSocket=10 State=UNKNOWN

Other than this nothing changed. Since that change I have started the slurmd process on the node several times and it worked fine, but today I am seeing this issue.

Regards
Navin.

On Thu, Jun 11, 2020 at 6:06 PM Riebs, Andy wrote:

> Navin,
>
> As you can see, systemd provides very little service-specific information.
> For slurm, you really need to go to the slurm logs to find out what
> happened.
>
> Hint: A quick way to identify problems like this with slurmd and slurmctld
> is to run them with the "-Dvvv" option, causing them to log to your window,
> and usually causing the problem to become immediately obvious.
>
> For example,
>
> # /usr/local/slurm/sbin/slurmd -D
>
> Just hit ^C when you're done, if necessary. Of course, if it doesn't fail
> when you run it this way, it's time to look elsewhere.
>
> Andy
>
> *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] *On
> Behalf Of *navin srivastava
> *Sent:* Thursday, June 11, 2020 8:25 AM
> *To:* Slurm User Community List
> *Subject:* [slurm-users] unable to start slurmd process.
>
> Hi Team,
>
> when i am trying to start the slurmd process i am getting the below error.
>
> 2020-06-11T13:11:58.652711+02:00 oled3 systemd[1]: Starting Slurm node daemon...
> 2020-06-11T13:13:28.683840+02:00 oled3 systemd[1]: slurmd.service: Start operation timed out. Terminating.
> 2020-06-11T13:13:28.684479+02:00 oled3 systemd[1]: Failed to start Slurm node daemon.
> 2020-06-11T13:13:28.684759+02:00 oled3 systemd[1]: slurmd.service: Unit
> entered failed state.
> 2020-06-11T13:13:28.684917+02:00 oled3 systemd[1]: slurmd.service: Failed
> with result 'timeout'.
> 2020-06-11T13:15:01.437172+02:00 oled3 cron[8094]:
> pam_unix(crond:session): session opened for user root by (uid=0)
>
> Slurm version is 17.11.8
>
> The server and slurm have been running for a long time and we have not
> made any changes, but today when I am starting it, it is giving this error
> message. Any idea what could be wrong here.
>
> Regards
> Navin.
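A side note on the timeout above: the two systemd timestamps in the quoted log are exactly 90 seconds apart, which matches systemd's default start timeout. A small sketch (GNU date; timestamp values copied from the log above) that computes the gap, with the foreground-debug step shown as a comment since it needs root and a Slurm install:

```shell
# Timestamps copied from the systemd log quoted above:
start="2020-06-11T13:11:58"
fail="2020-06-11T13:13:28"
# GNU date converts ISO timestamps to epoch seconds:
gap=$(( $(date -d "$fail" +%s) - $(date -d "$start" +%s) ))
echo "systemd waited ${gap}s before giving up"
# To rule systemd out entirely, run the daemon in the foreground
# (path may differ on your install):
#   /usr/sbin/slurmd -D -vvv
```

If the gap matches your systemd `TimeoutStartSec`, the daemon is hanging during startup rather than crashing, which fits the "no output" symptom.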
[slurm-users] unable to start slurmd process.
Hi Team,

when I am trying to start the slurmd process I am getting the below error:

2020-06-11T13:11:58.652711+02:00 oled3 systemd[1]: Starting Slurm node daemon...
2020-06-11T13:13:28.683840+02:00 oled3 systemd[1]: slurmd.service: Start operation timed out. Terminating.
2020-06-11T13:13:28.684479+02:00 oled3 systemd[1]: Failed to start Slurm node daemon.
2020-06-11T13:13:28.684759+02:00 oled3 systemd[1]: slurmd.service: Unit entered failed state.
2020-06-11T13:13:28.684917+02:00 oled3 systemd[1]: slurmd.service: Failed with result 'timeout'.
2020-06-11T13:15:01.437172+02:00 oled3 cron[8094]: pam_unix(crond:session): session opened for user root by (uid=0)

Slurm version is 17.11.8.

The server and Slurm have been running for a long time and we have not made any changes, but today when I start it, it gives this error message. Any idea what could be wrong here?

Regards
Navin.
Re: [slurm-users] unable to start slurmd process.
I collected the log from slurmctld and it shows the following:

[2020-06-10T20:10:38.501] Resending TERMINATE_JOB request JobId=1252284 Nodelist=oled3
[2020-06-10T20:14:38.901] Resending TERMINATE_JOB request JobId=1252284 Nodelist=oled3
[2020-06-10T20:18:38.255] Resending TERMINATE_JOB request JobId=1252284 Nodelist=oled3
[2020-06-10T20:22:38.624] Resending TERMINATE_JOB request JobId=1252284 Nodelist=oled3
[2020-06-10T20:26:38.902] Resending TERMINATE_JOB request JobId=1252284 Nodelist=oled3
[2020-06-10T20:30:38.230] Resending TERMINATE_JOB request JobId=1252284 Nodelist=oled3
[2020-06-10T20:34:38.594] Resending TERMINATE_JOB request JobId=1252284 Nodelist=oled3
[2020-06-10T20:38:38.986] Resending TERMINATE_JOB request JobId=1252284 Nodelist=oled3
[2020-06-10T20:42:38.402] Resending TERMINATE_JOB request JobId=1252284 Nodelist=oled3
[2020-06-10T20:46:38.764] Resending TERMINATE_JOB request JobId=1252284 Nodelist=oled3
[2020-06-10T20:50:38.094] Resending TERMINATE_JOB request JobId=1252284 Nodelist=oled3
[2020-06-10T21:26:38.839] Resending TERMINATE_JOB request JobId=1252284 Nodelist=oled3
[2020-06-10T21:30:38.225] Resending TERMINATE_JOB request JobId=1252284 Nodelist=oled3
[2020-06-10T21:34:38.582] Resending TERMINATE_JOB request JobId=1252284 Nodelist=oled3
[2020-06-10T21:38:38.914] Resending TERMINATE_JOB request JobId=1252284 Nodelist=oled3
[2020-06-10T21:42:38.292] Resending TERMINATE_JOB request JobId=1252284 Nodelist=oled3
[2020-06-10T21:46:38.542] Resending TERMINATE_JOB request JobId=1252284 Nodelist=oled3
[2020-06-10T21:50:38.869] Resending TERMINATE_JOB request JobId=1252284 Nodelist=oled3
[2020-06-10T21:54:38.227] Resending TERMINATE_JOB request JobId=1252284 Nodelist=oled3
[2020-06-10T21:58:38.628] Resending TERMINATE_JOB request JobId=1252284 Nodelist=oled3
[2020-06-11T06:54:39.012] Resending TERMINATE_JOB request JobId=1252284 Nodelist=oled3
[2020-06-11T06:58:39.411] Resending TERMINATE_JOB request JobId=1252284 Nodelist=oled3
[2020-06-11T07:02:39.106] Resending TERMINATE_JOB request JobId=1252284 Nodelist=oled3
[2020-06-11T07:06:39.495] Resending TERMINATE_JOB request JobId=1252284 Nodelist=oled3
[2020-06-11T07:10:39.814] Resending TERMINATE_JOB request JobId=1252284 Nodelist=oled3
[2020-06-11T07:14:39.188] Resending TERMINATE_JOB request JobId=1252284 Nodelist=oled3
[2020-06-11T07:14:49.204] agent/is_node_resp: node:oled3 RPC:REQUEST_TERMINATE_JOB : Communication connection failure
[2020-06-11T07:14:50.210] error: Nodes oled3 not responding
[2020-06-11T07:15:54.313] error: Nodes oled3 not responding
[2020-06-11T07:17:34.407] error: Nodes oled3 not responding
[2020-06-11T07:19:14.637] error: Nodes oled3 not responding
[2020-06-11T07:19:54.313] update_node: node oled3 reason set to: reboot-required
[2020-06-11T07:19:54.313] update_node: node oled3 state set to DRAINING*
[2020-06-11T07:20:43.788] requeue job 1316970 due to failure of node oled3
[2020-06-11T07:20:43.788] requeue job 1349322 due to failure of node oled3
[2020-06-11T07:20:43.789] error: Nodes oled3 not responding, setting DOWN

sinfo says:

OLED*    up   infinite    1  drain*  oled3

When checking the node itself, it looks healthy to me.

Regards
Navin

On Thu, Jun 11, 2020 at 7:21 PM Riebs, Andy wrote:
> Weird. "slurmd -Dvvv" ought to report a whole lot of data; I can't guess
> how to interpret it not reporting anything but the "log file" and "munge"
> messages. When you have it running attached to your window, is there any
> chance that sinfo or scontrol suggest that the node is actually all right?
> Perhaps something in /etc/sysconfig/slurm or the like is messed up?
>
> If that's not the case, I think my next step would be to follow up on
> someone else's suggestion, and scan the slurmctld.log file for the problem
> node name.
>
> *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] *On
> Behalf Of *navin srivastava
> *Sent:* Thursday, June 11, 2020 9:26 AM
> *To:* Slurm User Community List
> *Subject:* Re: [slurm-users] unable to start slurmd process.
>
> Sorry Andy, I missed adding this.
>
> First I tried slurmd -Dvvv, and it wrote nothing beyond:
>
> slurmd: debug: Log file re-opened
> slurmd: debug: Munge authentication plugin loaded
>
> After that I waited for 10-20 minutes with no output, and finally I
> pressed Ctrl-C.
>
> My doubt is about the slurm.conf file:
>
> ControlMachine=deda1x1466
> ControlAddr=192.168.150.253
>
> deda1x1466 has a different interface with a different IP; the compute
> node is unable to ping the hostname, but the IP is pingable. Could that
> be one of the reasons?
>
> But other nodes have the same config and there I am able to start slurmd,
> so it is a bit confusing.
>
> Regards
> Navin.
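Andy's suggestion of scanning slurmctld.log for the problem node can be sketched with grep. The commands below run over an inline sample copied from the log above, so they are runnable as-is; on a real controller the input would be a path such as /var/log/slurm/slurmctld.log (location varies by install):

```shell
# Inline sample taken from the slurmctld log quoted above:
log='[2020-06-10T20:10:38.501] Resending TERMINATE_JOB request JobId=1252284 Nodelist=oled3
[2020-06-10T20:14:38.901] Resending TERMINATE_JOB request JobId=1252284 Nodelist=oled3
[2020-06-11T07:14:50.210] error: Nodes oled3 not responding'

# Count resend attempts and not-responding errors for the node:
resends=$(printf '%s\n' "$log" | grep -c 'Nodelist=oled3')
noresp=$(printf '%s\n' "$log" | grep -c 'not responding')
echo "oled3: $resends resend attempts, $noresp not-responding errors"
```

On a real system the same two greps against slurmctld.log quickly show whether the controller has been failing to reach the node for hours (as it had been here since the previous evening).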
Re: [slurm-users] unable to start slurmd process.
I am able to get the output of "scontrol show node oled3", and oled3 pings fine.

The "scontrol ping" output shows:

Slurmctld(primary/backup) at deda1x1466/(NULL) are UP/DOWN

so it all looks OK to me.

Regards
Navin.

On Thu, Jun 11, 2020 at 8:38 PM Riebs, Andy wrote:
> So there seems to be a failure to communicate between slurmctld and the
> oled3 slurmd.
>
> From oled3, try "scontrol ping" to confirm that it can see the slurmctld
> daemon.
>
> From the head node, try "scontrol show node oled3", and then ping the
> address that is shown for "NodeAddr="
>
> *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] *On
> Behalf Of *navin srivastava
> *Sent:* Thursday, June 11, 2020 10:40 AM
> *To:* Slurm User Community List
> *Subject:* Re: [slurm-users] unable to start slurmd process.
>
> i collected the log from slurmctld and it says below
>
> [2020-06-10T20:10:38.501] Resending TERMINATE_JOB request JobId=1252284
> Nodelist=oled3
> [... repeated TERMINATE_JOB / "Nodes oled3 not responding" entries, quoted
> in full earlier in the thread ...]
> [2020-06-11T07:20:43.789] error: Nodes oled3 not responding, setting DOWN
>
> sinfo says
>
> OLED* up infinite 1 drain* oled3
>
> while checking the node i feel node is healthy.
>
> Regards
> Navin
>
> On Thu, Jun 11, 2020 at 7:21 PM Riebs, Andy wrote:
>
> Weird. "slurmd -Dvvv" ought to report a whole lot of data; I can't guess
> how to interpret it not reporting anything but the "log file" and "munge"
> messages. When you have it running attached to your window, is there any
> chance that sinfo or scontrol suggest that the node is actually all right?
> Perhaps something in /etc/sysconfig/slurm or the like is messed up?
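Andy's two-direction check (node-to-controller, controller-to-node) can be sketched as below. The scontrol and ping calls need a live cluster, so the runnable part here extracts NodeAddr from a captured "scontrol show node" sample (field layout as Slurm prints it; values are from this thread):

```shell
# Captured sample of "scontrol show node oled3" output (abridged):
sample='NodeName=oled3 Arch=x86_64 CoresPerSocket=10
   NodeAddr=oled3 NodeHostName=oled3 Version=17.11'

# Pull out the NodeAddr value, the address slurmctld actually contacts:
addr=$(printf '%s\n' "$sample" | grep -o 'NodeAddr=[^ ]*' | cut -d= -f2)
echo "would now run: ping -c1 $addr"

# And on the node itself, the reverse direction:
#   scontrol ping      # checks that slurmd can reach slurmctld
```

The point of pinging NodeAddr specifically (rather than the node's name) is that a multi-homed controller or stale DNS can make "ping nodename" succeed while Slurm's own traffic still fails, which matches Navin's observation that the hostname and IP behave differently.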
Re: [slurm-users] unable to start slurmd process.
Hi Team,

After my analysis I found that the user had used the qdel command (a wrapper plugin used with Slurm); the job was not killed properly, and that left the slurmstepd processes in a kind of hung state. That is why slurmd would not start. After killing those processes, slurmd started without any issues.

Regards
Navin.

On Thu, Jun 11, 2020 at 9:23 PM Riebs, Andy wrote:
> Short of getting on the system and kicking the tires myself, I'm fresh out
> of ideas. Does "sinfo -R" offer any hints?
>
> *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] *On
> Behalf Of *navin srivastava
> *Sent:* Thursday, June 11, 2020 11:31 AM
> *To:* Slurm User Community List
> *Subject:* Re: [slurm-users] unable to start slurmd process.
>
> i am able to get the output scontrol show node oled3
>
> also the oled3 is pinging fine
>
> and scontrol ping output showing like
>
> Slurmctld(primary/backup) at deda1x1466/(NULL) are UP/DOWN
>
> so all looks ok to me.
>
> Regards
> Navin.
>
> On Thu, Jun 11, 2020 at 8:38 PM Riebs, Andy wrote:
>
> So there seems to be a failure to communicate between slurmctld and the
> oled3 slurmd.
>
> From oled3, try "scontrol ping" to confirm that it can see the slurmctld
> daemon.
>
> From the head node, try "scontrol show node oled3", and then ping the
> address that is shown for "NodeAddr="
>
> [... remainder of the quoted thread, including the slurmctld log quoted in
> full earlier, trimmed ...]
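The fix Navin describes above — finding and killing leftover slurmstepd processes before restarting slurmd — can be sketched as follows. The real commands need root on the affected node, so they are shown as comments; the runnable part extracts PIDs from a captured process-list sample (the PIDs and job IDs are made up for illustration):

```shell
# Captured sample of a process listing (pid, state, command); D/hung
# slurmstepd entries like these are what blocked slurmd from starting:
ps_out=' 4321 Ds   slurmstepd: [1252284]
 4399 Ds   slurmstepd: [1252284.0]'

# Extract the PIDs to act on:
pids=$(printf '%s\n' "$ps_out" | awk '{print $1}')
echo $pids

# On the real node, the equivalent would be roughly:
#   pgrep -a slurmstepd      # list leftover step daemons
#   kill <pid>... ; kill -9 <pid>...   # SIGKILL only as a last resort
#   systemctl start slurmd
```

One caveat worth noting: a process stuck in uninterruptible sleep ("D" state, often a hung filesystem) cannot be killed even with -9 until the underlying I/O completes, so if the kill has no effect, look at shared filesystems before rebooting.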
[slurm-users] ignore gpu resources to scheduled the cpu based jobs
Hi All,

In our environment we have GPUs. What I found is that if a user with high priority has a job queued waiting for GPU resources (which are almost full and not available), then jobs submitted by other users that do not require GPU resources also stay queued, even though lots of CPU resources are available.

Our scheduling mechanism is FIFO with Fair Tree enabled. Is there any way we can make changes so that the CPU-based jobs go through while the GPU-based jobs wait until GPU resources are free?

Regards
Navin.
Re: [slurm-users] ignore gpu resources to scheduled the cpu based jobs
Yes, we have separate partitions. Some are GPU-specific, with 2 nodes of 8 GPUs each; other partitions are a mix of both: nodes with 2 GPUs and a few nodes without any GPU.

Regards
Navin

On Sat, Jun 13, 2020, 21:11 navin srivastava wrote:
> Thanks Renfro.
>
> Yes we have both types of nodes with gpu and nongpu.
> Also some users job require gpu and some applications use only CPU.
>
> So the issue happens when user priority is high and waiting for gpu
> resources which is not available and the job with lower priority is waiting
> even though enough CPU is available which need only CPU resources.
>
> When I hold gpu jobs the cpu jobs will go through.
>
> Regards
> Navin
>
> On Sat, Jun 13, 2020, 20:37 Renfro, Michael wrote:
>
>> Will probably need more information to find a solution.
>>
>> To start, do you have separate partitions for GPU and non-GPU jobs? Do
>> you have nodes without GPUs?
>>
>> On Jun 13, 2020, at 12:28 AM, navin srivastava
>> wrote:
>>
>> Hi All,
>>
>> In our environment we have GPU. so what i found is if the user having
>> high priority and his job is in queue and waiting for the GPU resources
>> which are almost full and not available. so the other user submitted the
>> job which does not require the GPU resources are in queue even though lots
>> of cpu resources are available.
>>
>> our scheduling mechanism is FIFO and Fair tree enabled. Is there any way
>> we can make some changes so that the cpu based job should go through and
>> GPU based job can wait till the GPU resources are free.
>>
>> Regards
>> Navin.
Re: [slurm-users] ignore gpu resources to scheduled the cpu based jobs
Thanks Renfro.

Yes, we have both types of nodes, with GPUs and without. Also, some users' jobs require a GPU and some applications use only CPUs.

So the issue happens when a high-priority user is waiting for GPU resources that are not available: a lower-priority job that needs only CPU resources keeps waiting, even though enough CPUs are free.

When I hold the GPU jobs, the CPU jobs go through.

Regards
Navin

On Sat, Jun 13, 2020, 20:37 Renfro, Michael wrote:
> Will probably need more information to find a solution.
>
> To start, do you have separate partitions for GPU and non-GPU jobs? Do you
> have nodes without GPUs?
>
> On Jun 13, 2020, at 12:28 AM, navin srivastava
> wrote:
>
> Hi All,
>
> In our environment we have GPU. so what i found is if the user having high
> priority and his job is in queue and waiting for the GPU resources which
> are almost full and not available. so the other user submitted the job
> which does not require the GPU resources are in queue even though lots of
> cpu resources are available.
>
> our scheduling mechanism is FIFO and Fair tree enabled. Is there any way
> we can make some changes so that the cpu based job should go through and
> GPU based job can wait till the GPU resources are free.
>
> Regards
> Navin.
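The behaviour described in this thread — lower-priority CPU jobs stuck behind a high-priority job waiting for GPUs — is what FIFO scheduling does by design: it never starts a later job ahead of an earlier one. The backfill scheduler relaxes exactly this, starting lower-priority jobs whenever doing so does not delay the expected start of higher-priority ones. A hedged slurm.conf sketch (node and partition names are invented for illustration, not taken from this cluster; backfill also needs jobs to have reasonable time limits to estimate start times):

```
# Switch from FIFO to backfill so CPU-only jobs can run while a
# higher-priority job waits for GPUs:
SchedulerType=sched/backfill
# bf_continue lets backfill keep scanning deeper into the queue:
SchedulerParameters=bf_continue

# Keep GPU and non-GPU nodes in separate partitions (names/nodes assumed):
PartitionName=cpu Nodes=node[01-20]    Default=YES MaxTime=INFINITE State=UP
PartitionName=gpu Nodes=gpunode[01-04]             MaxTime=INFINITE State=UP
```

With this layout, a GPU-partition job waiting on GPUs never blocks the cpu partition at all, and within a shared partition backfill can slot CPU jobs around the waiting GPU job.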
Re: [slurm-users] missing info from sacct
Thank you, Andy. But when I try to get the utilization for the month it says 100%, and when I try to find the utilization by user it gives very different values which I am unable to understand.

deda1x1466:~ # sreport cluster AccountUtilizationByUser start=10/02/20 end=10/02/20 cluster=hpc2 -t HOUR --tres=cpu

Cluster/Account/User Utilization 2020-10-02T00:00:00 - 2020-10-02T00:59:59 (3600 secs)
Usage reported in TRES Hours

Cluster  Account   Login     Proper Name       TRES Name   Used
-------- --------- --------- ----------------- ---------- ------
hpc2     root                                  cpu         68159
hpc2     stdg_acc                              cpu         68159
hpc2     stdg_acc  m219018   Harbach Philipp   cpu           317
hpc2     stdg_acc  m253000   Morin Valerie     cpu            12
hpc2     stdg_acc  m254746   Lippolis Eleon+   cpu             9
hpc2     stdg_acc  m258464   Wurl Andreas      cpu            96
hpc2     stdg_acc  m262230   Schmelzer Maxi+   cpu             2
hpc2     stdg_acc  m270962   Heidrich Johan+   cpu         67647
hpc2     stdg_acc  m271803   Hermsen Marko     cpu            46
hpc2     stdg_acc  m275696   Ploetz Tobias     cpu            10
hpc2     stdg_acc  m278452   Brandenburg Ja+   cpu            19
hpc2     stdg_acc  m290493                     cpu             1

How is it calculating the hours in a day?

Regards
Navin.

On Wed, Nov 18, 2020 at 7:51 PM Andy Riebs wrote:
> I see from your subsequent post that you're using a pair of clusters
> with a single database, so yes, you are using federation.
>
> The high order bits of the Job ID identify the cluster that ran the job,
> so you will typically have a huge gap between ranges of Job IDs.
>
> Andy
>
> On 11/18/2020 9:15 AM, Andy Riebs wrote:
> > Are you using federated clusters? If not, check slurm.conf -- do you
> > have FirstJobId set?
> >
> > Andy
> >
> > On 11/18/2020 8:42 AM, navin srivastava wrote:
> >> While running the sacct we found that some jobid are not listing.
> >>
> >> 5535566      SYNTHLIBT+ stdg_defq  stdg_acc  1  COMPLETED  0:0
> >> 5535567      SYNTHLIBT+ stdg_defq  stdg_acc  1  COMPLETED  0:0
> >> 11016496     jupyter-s+ stdg_defq  stdg_acc  1  RUNNING    0:0
> >> 11016496.ex+ extern                stdg_acc  1  COMPLETED  0:0
> >>
> >> Not able to see the jobid in between these range in sacct info.
> >> Any hint what went wrong here.
> >>
> >> Regards
> >> Navin.
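On the "how is it calculating the hours" question: sreport's TRES Hours column is allocated-resource time, not wall time — roughly allocated CPUs × elapsed seconds / 3600, summed over all jobs in the reporting window. So a single one-hour window can legitimately report tens of thousands of cpu-hours if many CPUs were allocated at once. A tiny sketch of the arithmetic (job sizes here are invented examples, not from this cluster):

```shell
# One 40-core job running for the full 3600 s window contributes
# 40 * 3600 / 3600 = 40 cpu-hours, regardless of what the cores did:
cpu_hours=$(awk 'BEGIN { cpus = 40; elapsed = 3600; printf "%d", cpus * elapsed / 3600 }')
echo "$cpu_hours cpu-hours"
```

Note also that start=10/02/20 end=10/02/20 gave a window of only one hour (the header says "3600 secs"), so to cover a whole day the end date needs to be the following day.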
[slurm-users] missing info from sacct
While running sacct we found that some job IDs are not listed:

5535566      SYNTHLIBT+ stdg_defq  stdg_acc  1  COMPLETED  0:0
5535567      SYNTHLIBT+ stdg_defq  stdg_acc  1  COMPLETED  0:0
11016496     jupyter-s+ stdg_defq  stdg_acc  1  RUNNING    0:0
11016496.ex+ extern                stdg_acc  1  COMPLETED  0:0

We are not able to see the job IDs in between this range in the sacct output. Any hint on what went wrong here?

Regards
Navin.
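Andy's answer in the reply above — that in a multi-cluster/federated database the high-order bits of the job ID identify the cluster — explains gaps like this. In Slurm's federation design a federated job ID packs the cluster ID above a 26-bit local ID; the exact bit split is an assumption about this particular Slurm version, so treat the sketch below as illustrative arithmetic, not a guaranteed layout:

```shell
# Federation-style job ID packing (26-bit local ID is assumed):
# fed_id = (cluster_id << 26) | local_id
fed_id=$(( (1 << 26) | 12345 ))
echo "cluster 1, local job 12345 -> federation-wide id $fed_id"
# So each cluster's jobs occupy a disjoint, widely separated ID range,
# and 'sacct -M <cluster>' (or -M all) is needed to see every range.
```

On a non-federated cluster, a sudden jump in job IDs is more often FirstJobId in slurm.conf (Andy's other suggestion) or a fresh state directory resetting the counter.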
Re: [slurm-users] Sreport Query
Is there a way to find the utilization per node?

Regards
Navin.

On Wed, Nov 18, 2020 at 10:37 AM navin srivastava wrote:
> Dear All,
>
> Good Day!
>
> I am seeing one strange behaviour in my environment.
>
> We have 2 clusters in our environment, one acting as the database server,
> and we have pointed the 2nd cluster to the same database:
>
> hpc1 155.250.126.30 6817 8192 1 normal
> hpc2 155.250.168.57 6817 8192 1 normal
>
> While generating the report, I am able to generate it for the local
> cluster (hpc1) without any issue and it looks good. But the second
> cluster's data always shows me 100% utilization from June onwards
> (earlier data is fine), which is definitely wrong:
>
> sreport cluster utilization start=06/01/20 end=06/30/20 cluster=hpc2 -t
> percent | grep hpc2
> hpc2 100.00% 0.00% 0.00% 0.00% 0.00% 99.82%
>
> Any suggestion what went wrong here, and how to troubleshoot this issue?
>
> Regards
> Navin.
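On the per-node question: sreport reports by cluster/account/user, not by node. One hedged workaround is to sum sacct's CPUTimeRAW (CPU-seconds per job) for jobs that touched a given node; note this over-counts multi-node jobs, since --nodelist matches any job that used the node. The sacct invocation is shown as a comment (it needs the live accounting DB); the runnable part demonstrates the awk aggregation on captured output:

```shell
# Real query would be roughly:
#   sacct -a -X -S 2020-06-01 -E 2020-06-30 --nodelist=node1 \
#         -o CPUTimeRAW -n | awk '{s+=$1} END {print s/3600 " cpu-hours"}'

# Demo of the aggregation step on two captured CPUTimeRAW values:
total=$(printf '7200\n3600\n' | awk '{s += $1} END {printf "%g", s/3600}')
echo "$total cpu-hours"
```

Dividing that sum by (cores on the node × hours in the period) gives an approximate per-node utilization fraction.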
[slurm-users] Slurm Upgrade
Dear All,

Currently we are running Slurm version 17.11.x and want to move to 20.x.

We are building a new server with Slurm 20.02 and planning to upgrade the client nodes from 17.x to 20.x.

I wanted to check whether we can upgrade the clients from 17.x to 20.x directly, or whether we need to go through 17.x to 18.x, then 19.x, then 20.x.

Regards
Navin.
Re: [slurm-users] Slurm Upgrade
Thank you all for the responses, but my question here is: I have already built a new server with Slurm 20.02 and the latest DB. Shall I do a mysqldump into this server from the existing server running Slurm 17.11.8, and then upgrade all the clients to 20.x by stepping through 18.x and 19.x? Or can I simply uninstall Slurm 17.11.8 and install 20.02 directly on all compute nodes?

Regards
Navin.

On Tue, Nov 3, 2020 at 12:31 PM Ole Holm Nielsen wrote:
> On 11/2/20 2:25 PM, navin srivastava wrote:
> > Currently we are running slurm version 17.11.x and wanted to move to
> > 20.x.
> >
> > We are building the New server with Slurm 20.2 version and planning to
> > upgrade the client nodes from 17.x to 20.x.
> >
> > wanted to check if we can upgrade the Client from 17.x to 20.x directly
> > or we need to go through 17.x to 18.x and 19.x then 20.x
>
> I have described the Slurm upgrade process in my Wiki page:
> https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm
>
> It's based upon experiences and Slurm documentation and seems to work
> correctly.
>
> /Ole
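As background to the staged-upgrade question: Slurm's documentation states that slurmdbd (and slurmctld) can upgrade state and database schemas from at most the two previous major releases. Counting the releases between 17.11 and 20.02 (18.08, 19.05, 20.02) shows why a direct jump is outside that window; in practice one intermediate stop (e.g. at 19.05) would satisfy the two-release rule, as I understand the upgrade docs. A small sketch of the reasoning, with the dump/restore step shown as comments (db/user names are assumptions):

```shell
# Release sequence after 17.11 is 18.08, 19.05, 20.02 -> 3 majors ahead:
old=17.11; new=20.02; hops=3
if [ "$hops" -le 2 ]; then
  plan="direct upgrade OK"
else
  plan="staged upgrade needed"
fi
echo "$old -> $new: $plan"

# The dump/restore itself would be roughly (run slurmdbd afterwards in the
# foreground and let it convert the tables, watching for errors):
#   mysqldump -u slurm -p slurm_acct_db > slurm_acct_db.sql   # old server
#   mysql -u slurm -p slurm_acct_db < slurm_acct_db.sql       # new server
#   slurmdbd -D -vvv
```

Compute nodes are less constrained: once the controller and DB are on 20.02, reinstalling slurmd on the clients directly at 20.02 is generally fine, since slurmctld supports slurmd up to two releases older but not newer.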
[slurm-users] Sinfo or squeue stuck for some seconds
Dear slurm community users,

We are using slurm version 20.02.x. We see the below message appearing many times in the slurmctld log, and whenever this message appears the sinfo/squeue output gets slow. There is no timeout, as I kept MessageTimeout at 100.

Warning: Note very large processing time from load_part_uid_allow_list: usec=10800885 began=16:27:55.952
[2021-08-29T16:28:06.753] Warning: Note very large processing time from _slurmctld_background: usec=10801120 began=16:27:55.952

Is this a bug or some config issue? Has anybody faced a similar issue? Could anybody throw some light on this?

Please find the attached slurm.conf below.

Regards
Navin.

ClusterName=merckhpc
ControlMachine=Master
ControlAddr=localhost
AuthType=auth/munge
CredType=cred/munge
CacheGroups=1
ReturnToService=0
ProctrackType=proctrack/linuxproc
SlurmctldPort=6817
SlurmdPort=6818
SchedulerPort=7321
SlurmctldPidFile=/var/slurm/slurmctld.pid
SlurmdPidFile=/var/slurm/slurmd.%n.pid
SlurmdSpoolDir=/var/slurm/spool/slurmd.%n.spool
StateSaveLocation=/var/slurm/state
SlurmctldLogFile=/var/slurm/log/slurmctld.log
SlurmdLogFile=/var/slurm/log/slurmd.%n.log.%h
SlurmUser=hpcadmin
MpiDefault=none
SwitchType=switch/none
TaskPlugin=task/affinity
TaskPluginParam=Sched
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
KillWait=30
MinJobAge=3600
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
AccountingStorageEnforce=associations
AccountingStorageHost=localhost
AccountingStorageType=accounting_storage/slurmdbd
AccountingStoreJobComment=YES
JobCompType=jobcomp/slurmdbd
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmdDebug=5
SlurmctldDebug=5
Waittime=0
Epilog=/etc/slurm/slurm.epilog.clean
GresTypes=gpu
MaxArraySize=1
MaxJobCount=500
MessageTimeout=100
SchedulerParameters=enable_user_top,default_queue_depth=100
PriorityType=priority/multifactor
PriorityDecayHalfLife=2
PriorityUsageResetPeriod=DAILY
PriorityWeightFairshare=50
PriorityFlags=FAIR_TREE
NodeName=node[35-40] NodeHostname=bng1x[1847-1852] NodeAddr=node[35-40] CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=1 RealMemory=386626
NodeName=node[17-26] NodeHostName=bng1x[1590-1599] NodeAddr=node[17-26] CPUs=36 Boards=1 SocketsPerBoard=2 CoresPerSocket=18 ThreadsPerCore=1 RealMemory=257680 Feature=K2200 Gres=gpu:2
NodeName=node41 NodeHostName=bng1x1855 NodeAddr=node41 CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=1 RealMemory=386643 Feature=V100S Gres=gpu:2
NodeName=node[32-33] NodeHostname=bng1x[1793-1794] NodeAddr=node[32-33] Sockets=2 CoresPerSocket=24 RealMemory=773690
NodeName=node[28-31] NodeHostname=bng1x[1737-1740] NodeAddr=node[28-31] Sockets=2 CoresPerSocket=28 RealMemory=257586
NodeName=node[27] NodeHostname=bng1x1600 NodeAddr=node27 Sockets=2 CoresPerSocket=18 RealMemory=515728 Feature=K40 Gres=gpu:2
NodeName=node[34] NodeHostname=bng1x1795 NodeAddr=node34 Sockets=2 CoresPerSocket=24 RealMemory=773682 Feature=RTX Gres=gpu:8
PartitionName=Normal Nodes=node[28-33,35-40] Default=Yes MaxTime=INFINITE State=UP Shared=YES OverSubscribe=NO
PartitionName=testq Nodes=node41 Default=NO MaxTime=INFINITE State=UP Shared=YES
PartitionName=smallgpu Nodes=node[34] Default=NO MaxTime=INFINITE State=UP Shared=YES OverSubscribe=NO
PartitionName=biggpu Nodes=node[17-27] Default=NO MaxTime=INFINITE State=UP Shared=YES OverSubscribe=NO
[slurm-users] Slurm Multi-cluster implementation
Hi,

I am looking for a stepwise guide to set up a multi-cluster implementation. We want to set up 3 clusters and one login node, and to run jobs using the -M cluster option.

Does anybody have such a setup? Can you share some insight into how it works, and whether it is really a stable solution?

Regards
Navin.
Re: [slurm-users] Slurm Multi-cluster implementation
Thank you Tina.

So, if I understood correctly, the database is global to both clusters; is it running on the login node? Or is the database running on one of the master nodes and shared with the other master node?

As far as I have read, the Slurm database can also be separate on each master, using the AccountingStorageExternalHost parameter so that both databases are aware of each other.

Also, which slurmctld does slurm.conf on the login node point to? Is it possible to share a sample slurm.conf of the login node?

Regards
Navin.

On Thu, Oct 28, 2021 at 7:06 PM Tina Friedrich wrote:
> Hi Navin,
>
> well, I have two clusters & login nodes that allow access to both. That
> do? I don't think a third would make any difference in setup.
>
> They need to share a database. As long as they share a database, the
> clusters have 'knowledge' of each other.
>
> So if you set up one database server (running slurmdbd), and then a
> SLURM controller for each cluster (running slurmctld) using that one
> central database, the '-M' option should work.
>
> Tina
>
> On 28/10/2021 10:54, navin srivastava wrote:
> > Hi ,
> >
> > I am looking for a stepwise guide to setup multi cluster implementation.
> > We wanted to set up 3 clusters and one Login Node to run the job using
> > -M cluster option.
> > can anybody have such a setup and can share some insight into how it
> > works and it is really a stable solution.
> >
> > Regards
> > Navin.
>
> --
> Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator
>
> Research Computing and Support Services
> IT Services, University of Oxford
> http://www.arc.ox.ac.uk http://www.it.ox.ac.uk
Re: [slurm-users] Slurm Multi-cluster implementation
Thank you Tina. It will really help.

Regards
Navin

On Thu, Oct 28, 2021, 22:01 Tina Friedrich wrote:
> Hello,
>
> I have the database on a separate server (it runs the database and the
> database only). The login nodes run nothing SLURM related; they simply
> have the binaries installed & a SLURM config.
>
> I've never looked into having multiple databases & using
> AccountingStorageExternalHost (in fact I'd forgotten you could do that),
> so I can't comment on that (maybe someone else can); I think that works,
> yes, but as I said never tested that (didn't see much point in running
> multiple databases if one would do the job).
>
> I actually have specific login nodes for both of my clusters, to make it
> easier for users (especially those with not much experience using the
> HPC environment); so I have one login node connecting to cluster 1 and
> one connecting to cluster 2.
>
> The relevant config entries on the login nodes (if I'm not mistaken) are
> the ones that differ between the two slurm.conf files, aside from
> topology & nodes & scheduler tuning:
>
> ClusterName=cluster1
> ControlMachine=cluster1-slurm
> ControlAddr=/IP_OF_SLURM_CONTROLLER/
>
> ClusterName=cluster2
> ControlMachine=cluster2-slurm
> ControlAddr=/IP_OF_SLURM_CONTROLLER/
>
> (where IP_OF_SLURM_CONTROLLER is the IP address of host cluster1-slurm,
> same for cluster2)
>
> And then they have common entries for the AccountingStorageHost:
>
> AccountingStorageHost=slurm-db-prod
> AccountingStorageBackupHost=slurm-db-prod
> AccountingStoragePort=7030
> AccountingStorageType=accounting_storage/slurmdbd
>
> (slurm-db-prod is simply the hostname of the SLURM database server)
>
> Does that help?
>
> Tina
>
> On 28/10/2021 14:59, navin srivastava wrote:
> > Thank you Tina.
> >
> > so if i understood correctly.Database is global to both cluster and
> > running on login Node?
> > or is the database running on one of the master Node and shared with > > another master server Node? > > > > but as far I have read that the slurm database can also be separate on > > both the master and just use the parameter > > AccountingStorageExternalHost so that both databases are aware of each > > other. > > > > Also on the login node in slurm .conf file pointed to which Slurmctld? > > is it possible to share the sample slurm.conf file of login Node. > > > > Regards > > Navin. > > > > > > > > > > > > > > > > > > On Thu, Oct 28, 2021 at 7:06 PM Tina Friedrich > > mailto:tina.friedr...@it.ox.ac.uk>> wrote: > > > > Hi Navin, > > > > well, I have two clusters & login nodes that allow access to both. > That > > do? I don't think a third would make any difference in setup. > > > > They need to share a database. As long as the share a database, the > > clusters have 'knowledge' of each other. > > > > So if you set up one database server (running slurmdbd), and then a > > SLURM controller for each cluster (running slurmctld) using that one > > central database, the '-M' option should work. > > > > Tina > > > > On 28/10/2021 10:54, navin srivastava wrote: > > > Hi , > > > > > > I am looking for a stepwise guide to setup multi cluster > > implementation. > > > We wanted to set up 3 clusters and one Login Node to run the job > > using > > > -M cluster option. > > > can anybody have such a setup and can share some insight into how > it > > > works and it is really a stable solution. > > > > > > > > > Regards > > > Navin. 
> > > > -- > > Tina Friedrich, Advanced Research Computing Snr HPC Systems > > Administrator > > > > Research Computing and Support Services > > IT Services, University of Oxford > > http://www.arc.ox.ac.uk <http://www.arc.ox.ac.uk> > > http://www.it.ox.ac.uk <http://www.it.ox.ac.uk> > > > > -- > Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator > > Research Computing and Support Services > IT Services, University of Oxford > http://www.arc.ox.ac.uk http://www.it.ox.ac.uk > >
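With Tina's layout in place (each cluster's slurmctld registered in one shared slurmdbd), the -M flag on the submission commands selects the target cluster. A small usage sketch — the cluster names here are assumptions matching the example config above, and the runnable part just builds the command string:

```shell
# Pick the target cluster and build the submit command (names assumed):
CLUSTER=cluster2
cmd="sbatch -M $CLUSTER job.sh"
echo "$cmd"

# Other useful multi-cluster invocations (need a live setup):
#   squeue -M all              # queues of every cluster in the shared DB
#   sacctmgr show clusters     # the names that -M will accept
```

The key prerequisite is that each cluster has been added to the accounting database (they self-register with slurmdbd on first start), which is what makes the -M names resolvable from the login node.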
[slurm-users] maridb version compatibility with Slurm version
Hi,

I have a question about MariaDB vs Slurm version compatibility. Is there a compatibility matrix available?

We are running Slurm version 20.02 in our environment on SLES15 SP3, with MariaDB 10.5.x. We are upgrading the OS from SLES15 SP3 to SP4, which brings MariaDB 10.6.x, and we are not upgrading the Slurm version.

What is the best way to deal with this? We patch the servers quarterly and keep the Slurm version unchanged (I locked it at the OS level), but the MariaDB version gets updated, and as far as I can see this has had no impact. Is it a good idea to keep the MariaDB version pinned alongside the Slurm version?

Regards
Navin.
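If the decision is to pin MariaDB alongside Slurm, on SLES the package-lock mechanism does this; the zypper commands below need root so they are shown as comments, and the runnable part is a small series check of the kind a pre-patch script might run (version strings are examples, not from this system):

```shell
# SLES package locks (same mechanism as locking the slurm packages):
#   zypper addlock 'mariadb*'    # pin mariadb so quarterly patching skips it
#   zypper locks                 # verify the active lock list

# Quick check that the installed version is still in the expected series:
want=10.5
have=10.5.19
case "$have" in
  "$want".*) status="match";   echo "mariadb $have is in the $want series" ;;
  *)         status="changed"; echo "series changed: $have" ;;
esac
```

Pinning is the conservative choice: slurmdbd mostly uses plain SQL and minor MariaDB bumps are usually harmless, but taking a mysqldump before any series jump (10.5 to 10.6) costs little and protects the accounting data either way.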