Thanks Erik. Last night I made the changes.
I defined this in slurm.conf on all the nodes as well as on the Slurm server:

TmpFS=/lscratch
NodeName=node[01-10] CPUs=44 RealMemory=257380 Sockets=2 CoresPerSocket=22 ThreadsPerCore=1 TmpDisk=1600000 State=UNKNOWN Feature=P4000 Gres=gpu:2

These nodes have 1.6 TB of local scratch. I ran "scontrol reconfig" on all the nodes, but after some time we saw all the nodes go into the drain state, so I reverted to the old configuration. Jobs were running on all the nodes and the local scratch was only 20-25% in use. We already have a cleanup script in crontab that purges the scratch space regularly. Is anything wrong here?

Regards
Navin.
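For reference, a minimal sketch of a job script that would request this scratch via --tmp once TmpDisk is defined; the 100G figure, the per-job directory under /lscratch, and the single-GPU request are illustrative assumptions only, not values taken from this thread:

#!/bin/bash
#SBATCH --job-name=scratch-test
#SBATCH --ntasks=1
#SBATCH --tmp=100G          # minimum local scratch (TmpDisk) required on the node
#SBATCH --gres=gpu:1        # GPUs are still requested separately via GRES

# Work under the file system named by TmpFS (here /lscratch, per the config above);
# the per-job subdirectory is just a convention, Slurm does not create it for you.
WORKDIR="/lscratch/${SLURM_JOB_ID}"
mkdir -p "${WORKDIR}"
cd "${WORKDIR}"

# ... run the real application here ...

# Remove the job's scratch so the regular cleanup has less to do.
rm -rf "${WORKDIR}"

The --tmp value is matched against the node's configured TmpDisk, while the GPU is still requested through --gres as before.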
On Thu, Apr 16, 2020 at 12:26 AM Ellestad, Erik <erik.elles...@ucsf.edu> wrote:

> The default value for TmpDisk is 0, so if you want local scratch available on a node, the amount of TmpDisk space must be defined in the node configuration in slurm.conf.
>
> example:
>
> NodeName=TestNode01 CPUs=8 Boards=1 SocketsPerBoard=2 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=24099 TmpDisk=150000
>
> The configuration value for the node definition is in MB.
>
> https://slurm.schedmd.com/slurm.conf.html
>
> *TmpDisk* Total size of temporary disk storage in *TmpFS* in megabytes (e.g. "16384"). *TmpFS* (for "Temporary File System") identifies the location which jobs should use for temporary storage. Note this does not indicate the amount of free space available to the user on the node, only the total file system size. The system administration should ensure this file system is purged as needed so that user jobs have access to most of this space. The Prolog and/or Epilog programs (specified in the configuration file) might be used to ensure the file system is kept clean. The default value is 0.
>
> When requesting --tmp with srun or sbatch, it can be done in various size formats:
>
> *--tmp*=<*size[units]*> Specify a minimum amount of temporary disk space per node. Default units are megabytes unless the SchedulerParameters configuration parameter includes the "default_gbytes" option for gigabytes. Different units can be specified using the suffix [K|M|G|T].
> https://slurm.schedmd.com/sbatch.html
>
> ---
> Erik Ellestad
> Wynton Cluster SysAdmin
> UCSF
> ------------------------------
> *From:* slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of navin srivastava <navin.alt...@gmail.com>
> *Sent:* Tuesday, April 14, 2020 11:19 PM
> *To:* Slurm User Community List <slurm-users@lists.schedmd.com>
> *Subject:* Re: [slurm-users] How to request for the allocation of scratch .
>
> Thank you Erik.
>
> To define the local scratch on all the compute node is not mandatory? only on slurm server is enough right?
> Also the TMPdisk should be defined in MB or can be defined in GB as well
>
> while requesting --tmp , we can use the value in GB right?
>
> Regards
> Navin.
>
> On Tue, Apr 14, 2020 at 11:04 PM Ellestad, Erik <erik.elles...@ucsf.edu> wrote:
>
> Have you defined the TmpDisk value for each node?
>
> As far as I know, local disk space is not a valid type for GRES.
>
> https://slurm.schedmd.com/gres.html
>
> "Generic resource (GRES) scheduling is supported through a flexible plugin mechanism. Support is currently provided for Graphics Processing Units (GPUs), CUDA Multi-Process Service (MPS), and Intel® Many Integrated Core (MIC) processors."
>
> The only valid solution I've found for scratch is to:
>
> In slurm.conf, define the location of local scratch globally via TmpFS.
>
> And then the amount per host is defined via TmpDisk=xxx.
>
> Then the request for srun/sbatch via --tmp=X
>
> ---
> Erik Ellestad
> Wynton Cluster SysAdmin
> UCSF
> ------------------------------
> *From:* slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of navin srivastava <navin.alt...@gmail.com>
> *Sent:* Tuesday, April 14, 2020 7:32 AM
> *To:* Slurm User Community List <slurm-users@lists.schedmd.com>
> *Subject:* Re: [slurm-users] How to request for the allocation of scratch .
>
> Any suggestion on the above query.need help to understand it.
> Does TmpFS=/scratch and the request is #SBATCH --tmp=500GB then it will reserve the 500GB from scratch.
> let me know if my assumption is correct?
>
> Regards
> Navin.
>
> On Mon, Apr 13, 2020 at 11:10 AM navin srivastava <navin.alt...@gmail.com> wrote:
>
> Hi Team,
>
> i wanted to define a mechanism to request the local disk space while submitting the job.
>
> we have dedicated /scratch of 1.2 TB file system for the execution of the job on each of the compute nodes other than / and other file system.
> i have defined in slurm.conf as TmpFS=/scratch and then wanted to use #SBATCH --scratch =10GB in the request.
> but it seems it is not accepting this variable except /tmp.
>
> Then i have opted the mechanism of gres.conf
>
> GresTypes=gpu,scratch
>
> and defined each node the scratch value and then requested using --gres=lscratch:10GB
> but in this scenario if requesting both gres resources gpu as well as scratch it show me only scratch in my Gres resource not gpu.
> does it using the gpu also as a gres resource?
>
> could anybody please advice which is the correct method to achieve the same?
> Also, is scratch will be able to calculate the actual usage value on the node.
>
> REgards
> Navin.
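As a footnote to the TmpDisk documentation quoted above, which points to the Prolog/Epilog programs for keeping TmpFS clean, a minimal Epilog sketch might look like the following; the /etc/slurm/epilog.sh path and the per-job /lscratch/$SLURM_JOB_ID layout are assumptions for illustration, not settings taken from this thread:

#!/bin/bash
# Example Epilog script, referenced from slurm.conf as e.g. Epilog=/etc/slurm/epilog.sh.
# Assumes each job keeps its temporary files under /lscratch/$SLURM_JOB_ID.
SCRATCH_DIR="/lscratch/${SLURM_JOB_ID}"

# SLURM_JOB_ID is set by slurmd when it runs the Epilog; do nothing if it is missing.
if [ -n "${SLURM_JOB_ID}" ] && [ -d "${SCRATCH_DIR}" ]; then
    rm -rf "${SCRATCH_DIR}"
fi

exit 0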