Re: [slurm-users] Job limit in slurm.

2020-02-17 Thread navin srivastava
Hi, thanks for your script. With it I am able to show the limit that I set, but the limit is not working. MaxJobs = 3, current value = 0. Regards, Navin. On Mon, Feb 17, 2020 at 4:13 PM Ole Holm Nielsen wrote: > On 2/17/20 11:16 AM, navin srivastava wrote: > > i have

Re: [slurm-users] Job limit in slurm.

2020-02-17 Thread navin srivastava
> Why do you think the limit is not working? The MaxJobs limits the number > of running jobs to 3, but you can still submit as many jobs as you like! > > See "man sacctmgr" for definitions of the limits MaxJobs as well as > MaxSubmitJobs. > > /Ole > > On 2/17/

[slurm-users] Job limit in slurm.

2020-02-17 Thread navin srivastava
Hi Team, I have an issue with the Slurm job limit. I applied the MaxJobs limit on a user using `sacctmgr modify user navin1 set maxjobs=3`, but I see it is still not being applied: I am still able to submit more jobs. The Slurm version is 17.11.x. Let me know what setting is required to implement
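[Editor's note] A minimal sketch of applying and verifying the limit, assuming accounting is done through slurmdbd (the user name `navin1` is taken from the post). Association limits such as MaxJobs are only enforced when `AccountingStorageEnforce` in slurm.conf includes `limits`, which is the usual reason a limit shows up in sacctmgr but has no effect:

```shell
# Apply the per-user running-job limit (as in the post)
sacctmgr modify user navin1 set maxjobs=3

# Verify the association limit is actually stored
sacctmgr show assoc where user=navin1 format=user,maxjobs

# In slurm.conf, enforcement must be enabled, e.g.:
#   AccountingStorageEnforce=limits
# then apply the change:
scontrol reconfigure
```

With enforcement enabled, the 4th job submitted by the user stays pending with reason AssocMaxJobsLimit rather than starting.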

Re: [slurm-users] Job limit in slurm.

2020-02-17 Thread navin srivastava
the explanation for each are found on the > Resource Limits document. > > /Ole > > On 2/17/20 12:20 PM, navin srivastava wrote: > > Hi ole, > > > > i am submitting 100 of jobs are i see all jobs starting at the same time > > and all job is going into the run s

Re: [slurm-users] How to request for the allocation of scratch .

2020-04-15 Thread navin srivastava
es unless the SchedulerParameters > configuration parameter includes the "default_gbytes" option for gigabytes. > Different units can be specified using the suffix [K|M|G|T]. > https://slurm.schedmd.com/sbatch.html > > > > --- > Erik Ellestad > Wynton Cluster

Re: [slurm-users] How to request for the allocation of scratch .

2020-04-15 Thread navin srivastava
ine the location of local scratch globally via TmpFS. > > And then the amount per host is defined via TmpDisk=xxx. > > Then the request for srun/sbatch via --tmp=X > > > > --- > Erik Ellestad > Wynton Cluster SysAdmin > UCSF > -- > *From:* slurm-users on

[slurm-users] How to request for the allocation of scratch .

2020-04-12 Thread navin srivastava
Hi Team, I wanted to define a mechanism to request local disk space while submitting a job. We have a dedicated /scratch file system of 1.2 TB for job execution on each of the compute nodes, separate from / and the other file systems. I have defined in slurm.conf TmpFS=/scratch and
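[Editor's note] A sketch of the configuration the thread converges on (node names and sizes are placeholders; TmpDisk is in MB). Note that `--tmp` only steers scheduling toward nodes whose advertised TmpDisk is large enough; it does not enforce a disk quota on the job:

```shell
# slurm.conf (sketch): point Slurm's temporary space at /scratch
#   TmpFS=/scratch
# Advertise per-node scratch capacity in MB (1.2 TB ~= 1200000 MB):
#   NodeName=node[01-10] CPUs=20 TmpDisk=1200000 State=UNKNOWN

# In the job script, request scratch space; units K|M|G|T are accepted:
#   #SBATCH --tmp=500G
```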

Re: [slurm-users] How to request for the allocation of scratch .

2020-04-20 Thread navin srivastava
> Erik Ellestad > Wynton Cluster SysAdmin > UCSF > -- > *From:* slurm-users on behalf of > navin srivastava > *Sent:* Wednesday, April 15, 2020 10:37 PM > *To:* Slurm User Community List > *Subject:* Re: [slurm-users] How to request for the alloca

[slurm-users] log rotation for slurmctld.

2020-03-13 Thread navin srivastava
Hi, I wanted to understand how log rotation of slurmctld works. In my environment I don't have any log rotation for slurmctld.log, and the log file size has now reached 125 GB. Can I move the log file to some other location, and will a restart/reload of the slurm service start a new log file? I
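[Editor's note] A sketch of a logrotate configuration for this case; the log path is an assumption and should match LogFile in slurm.conf. Slurm daemons re-open their log file on SIGUSR2, so no restart of slurmctld is needed:

```shell
# /etc/logrotate.d/slurmctld (sketch; path is an assumption)
# /var/log/slurm/slurmctld.log {
#     weekly
#     rotate 8
#     compress
#     missingok
#     notifempty
#     postrotate
#         # SIGUSR2 makes slurmctld re-open its log file
#         pkill --signal SIGUSR2 -x slurmctld
#     endscript
# }
```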

Re: [slurm-users] Resources are free but Job is not getting scheduled.

2020-04-04 Thread navin srivastava
PriorityUsageResetPeriod=DAILY PriorityWeightFairshare=50 PriorityFlags=FAIR_TREE Could you please also clarify: if the scheduling policy is fairshare, will it still consider the priority over the partition? Regards, Navin. On Sat, Apr 4, 2020 at 8:34 PM navin srivastava wrote: > Hi Team, > > I

[slurm-users] Resources are free but Job is not getting scheduled.

2020-04-04 Thread navin srivastava
Hi Team, I am facing one issue in my environment. Our Slurm version is 17.11.x. My question: I have 2 partitions, queue A with node1 and node2 (Priority=1000, Shared=yes) and queue B with node1 and node2 (Priority=100, Shared=yes). The problem is that when a job from partition A is running, the

[slurm-users] not allocating the node for job execution even resources are available.

2020-03-31 Thread navin srivastava
Hi, I have an issue with resource allocation. The environment has partitions like the following: PartitionName=small_jobs Nodes=Node[17,20] Default=NO MaxTime=INFINITE State=UP Shared=YES Priority=8000 PartitionName=large_jobs Nodes=Node[17,20] Default=NO MaxTime=INFINITE State=UP Shared=YES

Re: [slurm-users] not allocating the node for job execution even resources are available.

2020-04-01 Thread navin srivastava
a different partition. On Tue, Mar 31, 2020 at 4:34 PM navin srivastava wrote: > Hi , > > have an issue with the resource allocation. > > In the environment have partition like below: > > PartitionName=small_jobs Nodes=Node[17,20] Default=NO MaxTime=INFINITE > State=UP

Re: [slurm-users] not allocating jobs even resources are free

2020-04-26 Thread navin srivastava
get through but reading > through it multiple times opens many doors. > > DefaultTime is listed in there as a Partition option. > If you are scheduling gres/gpu resources, it's quite possible there are > cores available with no corresponding gpus avail. > > -b > > On 4/24/2

Re: [slurm-users] not allocating jobs even resources are free

2020-04-24 Thread navin srivastava
PRIORITY FAIRSHARE 1291339 GPUsmall 21052 21053 On Fri, Apr 24, 2020 at 11:14 PM navin srivastava wrote: > Hi Team, > > we are facing some issue in our environment. The resources are free but > job is going into the QUEUE state but not running. > > i have attached t

[slurm-users] not allocating jobs even resources are free

2020-04-24 Thread navin srivastava
Hi Team, we are facing an issue in our environment. The resources are free, but jobs are going into the queued state and not running. I have attached the slurm.conf file here. Scenario: there are jobs in only the 2 partitions; 344 jobs are in PD state in the normal partition, and the node belongs

Re: [slurm-users] not allocating jobs even resources are free

2020-04-24 Thread navin srivastava
r users are not > specifying a reasonable timelimit to their jobs, this won't help either. > > > -b > > > On 4/24/20 1:52 PM, navin srivastava wrote: > > In addition to the above when i see the sprio of both the jobs it says :- > > for normal queue jobs all jobs showing the sa

Re: [slurm-users] not allocating jobs even resources are free

2020-05-04 Thread navin srivastava
Thanks, Daniel, for the detailed description. Regards Navin On Sun, May 3, 2020, 13:35 Daniel Letai wrote: > > On 29/04/2020 12:00:13, navin srivastava wrote: > > Thanks Daniel. > > All jobs went into run state so unable to provide the details but > definitely will reach out la

Re: [slurm-users] not allocating jobs even resources are free

2020-04-29 Thread navin srivastava
ly help if you pasted the results of: > > squeue > > sinfo > > > As well as the exact sbatch line, so we can see how many resources per > node are requested. > > > On 26/04/2020 12:00:06, navin srivastava wrote: > > Thanks Brian, > > As suggested i gone

Re: [slurm-users] How to request for the allocation of scratch .

2020-04-14 Thread navin srivastava
Any suggestions on the above query? I need help understanding it. If TmpFS=/scratch and the request is #SBATCH --tmp=500GB, will it reserve the 500GB from scratch? Let me know if my assumption is correct. Regards, Navin. On Mon, Apr 13, 2020 at 11:10 AM navin srivastava wrote: > Hi T

Re: [slurm-users] how to restrict jobs

2020-05-05 Thread navin srivastava
run from 1-4 nodes. > > There are also options to query a FlexLM or RLM server for license > management. > > -- > Mike Renfro, PhD / HPC Systems Administrator, Information Technology > Services > 931 372-3601 / Tennessee Tech University > > > On May 5, 2020, at

Re: [slurm-users] how to restrict jobs

2020-05-06 Thread navin srivastava
On May 5, 2020, at 8:37 AM, navin srivastava > wrote: > > > > External Email Warning > > This email originated from outside the university. Please use caution > when opening attachments, clicking links, or responding to requests. > > Thanks Michael, > > >

[slurm-users] how to restrict jobs

2020-05-05 Thread navin srivastava
Hi Team, we have an application whose licenses are limited. It scales up to 4 nodes (~80 cores), so when 4 nodes are full, a job on a 5th node used to fail. We want to put a restriction in place so that the application can't execute beyond the 4 nodes; instead of failing, the job should stay in the queue state. I do not
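[Editor's note] One common approach, suggested later in the thread, is Slurm's built-in license counting. A sketch, assuming a locally managed pool (the license name `app_lic` is a placeholder) sized to what the application supports:

```shell
# slurm.conf (sketch): define a cluster-local license pool
#   Licenses=app_lic:4

# Each job requests one license token; once 4 are in use,
# further jobs wait in the queue with reason "Licenses":
sbatch -L app_lic:1 job.sh
```

The thread also mentions that Slurm can query a FlexLM or RLM server directly via `sacctmgr add resource ... servertype=flexlm type=license` when the license pool is shared across clusters.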

Re: [slurm-users] how to restrict jobs

2020-05-06 Thread navin srivastava
er=flex_host servertype=flexlm type=license > > and submit jobs with a '-L software_name:N’ flag where N is the number of > nodes you want to run on. > > > On May 6, 2020, at 5:33 AM, navin srivastava > wrote: > > > > Thanks Micheal. > > > > Actually on

Re: [slurm-users] how to restrict jobs

2020-05-06 Thread navin srivastava
n an available node being used by JOBID. Add > other parameters as required for cpus-per-task, time limits, or whatever > else is needed. If you start the larger jobs first, and let the later jobs > fill in on idle CPUs on those nodes, it should work. > > > On May 6, 2020, at 9:46 A

[slurm-users] is there a way to delay the scheduling.

2020-08-28 Thread navin srivastava
Hi Team, I am facing one issue. Several users are submitting 2 jobs in a single batch job, and these are very short jobs (say 1-2 sec). While more jobs are being submitted, slurmctld becomes unresponsive and starts giving the message "ending job 6e508a88155d9bec40d752c8331d7ae8 to queue. sbatch: error: Batch job
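[Editor's note] Two common mitigations for slurmctld being overwhelmed by floods of very short jobs, offered here as a hedged sketch rather than a confirmed fix for this case: batch the short tasks into job arrays, and defer per-submission scheduling so the controller batches its scheduling work:

```shell
# slurm.conf (sketch): avoid attempting to schedule on every submit,
# and rate-limit main scheduler runs (sched_min_interval is in
# microseconds; the value below is an example, not a recommendation):
#   SchedulerParameters=defer,sched_min_interval=2000000

# On the user side, replace N separate sbatch calls with one array:
sbatch --array=1-1000 short_task.sh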

[slurm-users] slurm Report

2020-09-24 Thread navin srivastava
Hi team, I have extracted the %utilization report and found that the idle time is at the higher end, so I wanted to check: is there any way we can find node-based utilization? It would help us to figure out which nodes are underutilized. Regards navin.

[slurm-users] federation cluster management

2020-09-21 Thread navin srivastava
Dear all, I read about the concept of federated clusters in Slurm. Is it really helpful for maximizing cluster usage? We have 4 independent Slurm clusters which work with local storage, and we want to build a federated cluster where we can utilize the free available compute

Re: [slurm-users] ignore gpu resources to scheduled the cpu based jobs

2020-06-30 Thread navin srivastava
Hi Team, I have separated the CPU nodes and GPU nodes into two different queues. Now I have 20 nodes having CPUs (20 cores) only and no GPU. Another set of nodes has GPU+CPU: some nodes with 2 GPUs and 20 CPUs, and some with 8 GPUs and 48 CPUs, assigned to the GPU queue. Users are facing issues

Re: [slurm-users] changes in slurm.

2020-07-10 Thread navin srivastava
> > Brian Andrus > > On 7/8/2020 10:57 PM, navin srivastava wrote: > > Hi Team, > > > > i have 2 small query.because of the lack of testing environment i am > > unable to test the scenario. working on to set up a test environment. > > > > 1. In

[slurm-users] CPU allocation for the GPU jobs.

2020-07-13 Thread navin srivastava
Hi Team, We have separate partitions for the GPU nodes and the CPU-only nodes. Scenario: the jobs submitted in our environment are 4 CPU + 1 GPU as well as 4 CPU only, on nodeGPUsmall and nodeGPUbig. So when all the GPUs are exhausted, the rest of the jobs are in the queue waiting for the availability of GPU

Re: [slurm-users] changes in slurm.

2020-07-10 Thread navin srivastava
> If you run slurmd -C on the compute node, it should tell you what > slurm thinks the RealMemory number is. > > Jeff > > -------- > > *From:* slurm-users on behalf > of > > navi

Re: [slurm-users] CPU allocation for the GPU jobs.

2020-07-13 Thread navin srivastava
y > can complete without delaying the estimated start time of higher priority > jobs. > > On Jul 13, 2020, at 4:18 AM, navin srivastava > wrote: > > Hi Team, > > We have separate partitions for the GPU nodes and only CPU nodes . > > scenario: the jobs submitted in our

[slurm-users] changes in slurm.

2020-07-09 Thread navin srivastava
Hi Team, I have 2 small queries. Because of the lack of a testing environment I am unable to test the scenarios; I am working on setting up a test environment. 1. In my environment I am unable to pass the #SBATCH --mem=2GB option. I found the reason is that there is no RealMemory entry in the node definition
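[Editor's note] A sketch tying together the advice in the replies: `slurmd -C` prints the hardware values Slurm detects on a node, and RealMemory (in MB) must appear in the node definition for memory requests to work. The node name and sizes below are placeholders:

```shell
# On the compute node: print what slurmd detects, including RealMemory
slurmd -C

# slurm.conf (sketch): include RealMemory so --mem=... can be honored
#   NodeName=node01 CPUs=20 RealMemory=128000 State=UNKNOWN
```

For memory to actually be enforced as a consumable resource, SelectTypeParameters generally also needs a memory-aware setting such as CR_Core_Memory.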

Re: [slurm-users] ignore gpu resources to scheduled the cpu based jobs

2020-06-15 Thread navin srivastava
would > have to (a) not require a GPU, (b) require a limited number of CPUs per > node, so that you'd have some CPUs available for GPU jobs on the nodes > containing GPUs. > > ---------- > *From:* slurm-users on behalf of > navin srivastava > *Sent:* Saturday

[slurm-users] Changing job order

2020-06-17 Thread navin srivastava
Hi Team, Is there a way to change the job order in Slurm, similar to sorder in PBS? I want to swap my job with the other top job. Regards Navin

Re: [slurm-users] Changing job order

2020-06-18 Thread navin srivastava
Thanks Ole. Regards Navin On Thu, Jun 18, 2020 at 11:56 AM Ole Holm Nielsen < ole.h.niel...@fysik.dtu.dk> wrote: > The scontrol command to set the nice level is on the list here: > https://wiki.fysik.dtu.dk/niflheim/SLURM#useful-commands > > /Ole > > On 6/18/20 8:05 AM

Re: [slurm-users] Changing job order

2020-06-18 Thread navin srivastava
; modify the order of execution. > > El mié., 17 jun. 2020 a las 12:31, navin srivastava (< > navin.alt...@gmail.com>) escribió: > >> Hi Team, >> >> Is their a way to change the job order in slurm.similar to sorder in PBS. >> >> I want to swap my job from the other top job. >> >> Regards >> Navin >> >>

[slurm-users] Job failure issue in Slurm

2020-06-04 Thread navin srivastava
Hi Team, I am seeing a weird issue in my environment. One of the Gaussian jobs is failing under Slurm within a minute after it goes into execution, without writing anything, and I am unable to figure out the reason. The same job works fine without Slurm on the same node. slurmctld.log

Re: [slurm-users] Job failure issue in Slurm

2020-06-08 Thread navin srivastava
s working earlier or is this the first time are you trying ? > Are you using pam module ? if yes, try disabling the pam module and see > if it works. > > Thanks > Sathish > > On Thu, Jun 4, 2020 at 10:47 PM navin srivastava > wrote: > >> Hi Team, >> >&g

Re: [slurm-users] unable to start slurmd process.

2020-06-11 Thread navin srivastava
; > For example, > > > > # /usr/local/slurm/sbin/slurmd -D > > > > Just it ^C when you’re done, if necessary. Of course, if it doesn’t fail > when you run it this way, it’s time to look elsewhere. > > > > Andy > > > > *From:* slurm-users [mailt

[slurm-users] unable to start slurmd process.

2020-06-11 Thread navin srivastava
Hi Team, when I am trying to start the slurmd process I am getting the below error. 2020-06-11T13:11:58.652711+02:00 oled3 systemd[1]: Starting Slurm node daemon... 2020-06-11T13:13:28.683840+02:00 oled3 systemd[1]: slurmd.service: Start operation timed out. Terminating.
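[Editor's note] The replies in this thread suggest running slurmd in the foreground to see why startup times out, bypassing systemd's timeout. A sketch (the binary path may differ per installation):

```shell
# Run slurmd in the foreground with verbose debug output (as root);
# press Ctrl-C when done.
/usr/sbin/slurmd -D -vvv
```

Typical causes surfaced this way include a hostname that matches no NodeName in slurm.conf, an unreachable slurmctld, or munge authentication problems; these are common possibilities, not a diagnosis of this specific case.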

Re: [slurm-users] unable to start slurmd process.

2020-06-11 Thread navin srivastava
urm or the like is messed up? > > > > If that’s not the case, I think my next step would be to follow up on > someone else’s suggestion, and scan the slurmctld.log file for the problem > node name. > > > > *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.c

Re: [slurm-users] unable to start slurmd process.

2020-06-11 Thread navin srivastava
shown for “NodeAddr=” > > > > *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] *On > Behalf Of *navin srivastava > *Sent:* Thursday, June 11, 2020 10:40 AM > *To:* Slurm User Community List > *Subject:* Re: [slurm-users] unable to start slurmd process.

Re: [slurm-users] unable to start slurmd process.

2020-06-12 Thread navin srivastava
o:slurm-users-boun...@lists.schedmd.com] *On > Behalf Of *navin srivastava > *Sent:* Thursday, June 11, 2020 11:31 AM > *To:* Slurm User Community List > *Subject:* Re: [slurm-users] unable to start slurmd process. > > > > i am able to get the output scontrol show nod

[slurm-users] ignore gpu resources to scheduled the cpu based jobs

2020-06-12 Thread navin srivastava
Hi All, In our environment we have GPUs. What I found is that if a user with high priority has a job in the queue waiting for GPU resources (which are almost full and not available), then jobs submitted by other users that do not require GPU resources stay in the queue even though lots

Re: [slurm-users] ignore gpu resources to scheduled the cpu based jobs

2020-06-13 Thread navin srivastava
Yes, we have separate partitions. Some are specific to GPU, having 2 nodes with 8 GPUs each; other partitions are a mix of both, with nodes having 2 GPUs and very few nodes without any GPU. Regards Navin On Sat, Jun 13, 2020, 21:11 navin srivastava wrote: > Thanks Renfro. > > Yes we have b

Re: [slurm-users] ignore gpu resources to scheduled the cpu based jobs

2020-06-13 Thread navin srivastava
d non-GPU jobs? Do you > have nodes without GPUs? > > On Jun 13, 2020, at 12:28 AM, navin srivastava > wrote: > > Hi All, > > In our environment we have GPU. so what i found is if the user having high > priority and his job is in queue and waiting for the G

Re: [slurm-users] missing info from sacct

2020-11-18 Thread navin srivastava
Are you using federated clusters? If not, check slurm.conf -- do you > > have FirstJobId set? > > > > Andy > > > > On 11/18/2020 8:42 AM, navin srivastava wrote: > >> While running the sacct we found that some jobid are not listing. > >> > >> 55

[slurm-users] missing info from sacct

2020-11-18 Thread navin srivastava
While running sacct we found that some job IDs are not listed.
5535566   SYNTHLIBT+  stdg_defq  stdg_acc  1  COMPLETED  0:0
5535567   SYNTHLIBT+  stdg_defq  stdg_acc  1  COMPLETED  0:0
11016496  jupyter-s+  stdg_defq  stdg_acc  1  RUNNING    0:0

Re: [slurm-users] Sreport Query

2020-11-17 Thread navin srivastava
Is there a way to find the utilization per node? Regards Navin. On Wed, Nov 18, 2020 at 10:37 AM navin srivastava wrote: > Dear All, > > Good Day! > > i am seeing one strange behaviour in my environment. > > we have 2 clusters in our environment one acting as a datab

[slurm-users] Slurm Upgrade

2020-11-02 Thread navin srivastava
Dear All, Currently we are running Slurm version 17.11.x and want to move to 20.x. We are building a new server with Slurm version 20.2 and planning to upgrade the client nodes from 17.x to 20.x. I wanted to check whether we can upgrade the clients from 17.x to 20.x directly, or whether we need to go

Re: [slurm-users] Slurm Upgrade

2020-11-04 Thread navin srivastava
via 18.x and 19.x, or whether I can uninstall Slurm 17.11.8 and install 20.2 on all compute nodes. Regards Navin. On Tue, Nov 3, 2020 at 12:31 PM Ole Holm Nielsen wrote: > On 11/2/20 2:25 PM, navin srivastava wrote: > > Currently we are running slurm version 17.11.x and wanted to mov

[slurm-users] Sinfo or squeue stuck for some seconds

2021-08-29 Thread navin srivastava
Dear Slurm community users, We are using Slurm version 20.02.x. We see the below message appearing many times in the slurmctld log, and we found that whenever this message appears the sinfo/squeue output gets slow. No timeout, as I kept the value at 100. Warning: Note very large processing time from

[slurm-users] Slurm Multi-cluster implementation

2021-10-28 Thread navin srivastava
Hi, I am looking for a stepwise guide to set up a multi-cluster implementation. We want to set up 3 clusters and one login node to run jobs using the -M cluster option. Does anybody have such a setup who can share some insight into how it works, and whether it is really a stable solution? Regards Navin.
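[Editor's note] As the replies explain, multi-cluster operation requires all clusters to share one slurmdbd; each cluster keeps its own slurmctld. A sketch with placeholder cluster names:

```shell
# On the shared slurmdbd host: register each cluster
sacctmgr add cluster cluster1
sacctmgr add cluster cluster2
sacctmgr add cluster cluster3

# From the login node (its slurm.conf pointing at the same slurmdbd):
sacctmgr show clusters          # verify all clusters are registered
sbatch -M cluster2 job.sh       # submit to a specific cluster
squeue -M cluster1,cluster2     # view queues across clusters
```

Each cluster's ClusterName in slurm.conf must match the name registered in the database.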

Re: [slurm-users] Slurm Multi-cluster implementation

2021-10-28 Thread navin srivastava
each other. > > So if you set up one database server (running slurmdbd), and then a > SLURM controller for each cluster (running slurmctld) using that one > central database, the '-M' option should work. > > Tina > > On 28/10/2021 10:54, navin srivastava wrote: > >

Re: [slurm-users] Slurm Multi-cluster implementation

2021-10-28 Thread navin srivastava
low access to both. > That > > do? I don't think a third would make any difference in setup. > > > > They need to share a database. As long as the share a database, the > > clusters have 'knowledge' of each other. > > > > So if you set up one databas

[slurm-users] mariadb version compatibility with Slurm version

2022-08-24 Thread navin srivastava
Hi, I have a question related to mariadb vs. Slurm version compatibility. Is there a compatibility matrix available? We are running Slurm version 20.02 in our environment on SLES15SP3 with mariadb 10.5.x. We are upgrading the OS from SLES15SP3 to SP4, and with this we see the mariadb version is