Re: [slurm-users] "sacctmgr add cluster" crashing slurmdbd

2020-05-05 Thread Chris Samuel
On Tuesday, 5 May 2020 3:21:45 PM PDT Dustin Lang wrote: > Since this happens on a fresh new database, I just don't understand how I > can get back to a basic functional state. This is exceedingly frustrating. I have to say that if you're seeing this with 17.11, 18.08 and 19.05 and this only

Re: [slurm-users] Job Step Resource Requests are Ignored

2020-05-05 Thread Chris Samuel
On Tuesday, 5 May 2020 4:47:12 PM PDT Maria Semple wrote: > I'd like to set different resource limits for different steps of my job. A > sample script might look like this (e.g. job.sh): > > #!/bin/bash > srun --cpus-per-task=1 --mem=1 echo "Starting..." > srun --cpus-per-task=4 --mem=250

Re: [slurm-users] [EXT] Re: Limit the number of GPUS per user per partition

2020-05-05 Thread Chris Samuel
On Tuesday, 5 May 2020 3:48:22 PM PDT Sean Crosby wrote: > sacctmgr modify qos gpujobs set MaxTRESPerUser=gres/gpu=4 Also don't forget you need to tell Slurm to enforce QOS limits with: AccountingStorageEnforce=safe,qos in your Slurm configuration ("safe" is good to set, and turns on
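For context, a minimal sketch of the slurm.conf lines involved, assuming a QOS named gpujobs as elsewhere in this thread (how the QOS is attached is site-specific):
    # slurm.conf -- make the controller enforce association and QOS limits
    AccountingStorageEnforce=safe,qos
    # the QOS carrying the per-user GPU cap must be attached somewhere, e.g.:
    #   PartitionName=gpu ... QOS=gpujobs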

[slurm-users] Job Step Resource Requests are Ignored

2020-05-05 Thread Maria Semple
Hi! I'd like to set different resource limits for different steps of my job. A sample script might look like this (e.g. job.sh): #!/bin/bash srun --cpus-per-task=1 --mem=1 echo "Starting..." srun --cpus-per-task=4 --mem=250 --exclusive srun --cpus-per-task=1 --mem=1 echo "Finished." Then I
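For reference, a hedged sketch of the kind of script being described, with placeholder step commands and the default memory unit of megabytes; note that a step can never receive more than the enclosing job's allocation, so the job itself must request at least the resources of the largest step:
    #!/bin/bash
    #SBATCH --cpus-per-task=4 --mem=500            # allocation must cover the biggest step
    srun --cpus-per-task=1 --mem=1 echo "Starting..."
    srun --cpus-per-task=4 --mem=250 --exclusive ./work_step   # placeholder command
    srun --cpus-per-task=1 --mem=1 echo "Finished."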

Re: [slurm-users] Major newbie - Slurm/jupyterhub

2020-05-05 Thread Lisa Kay Weihl
Hi Michael, I get the gist of everything you mentioned but now I feel even more overwhelmed. Can I not get jupyterhub up and running without all those modules pieces? I was hoping to have a base kernel in jupyterhub that contained a lot of the data science packages. I'm going to have some

Re: [slurm-users] [EXT] Re: Limit the number of GPUS per user per partition

2020-05-05 Thread Sean Crosby
Hi Thomas, That value should be sacctmgr modify qos gpujobs set MaxTRESPerUser=gres/gpu=4 Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing Services | Business Services The University of Melbourne, Victoria 3010 Australia On Wed, 6 May 2020 at 04:53,
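A hedged end-to-end sketch of that approach (the QOS name comes from the thread; the partition line is a placeholder):
    # create the QOS if needed, then cap GPUs per user
    sacctmgr add qos gpujobs
    sacctmgr modify qos gpujobs set MaxTRESPerUser=gres/gpu=4
    # attach the QOS, e.g. in slurm.conf:  PartitionName=gpu ... QOS=gpujobs
    # and enforce it:                      AccountingStorageEnforce=safe,qos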

[slurm-users] "sacctmgr add cluster" crashing slurmdbd

2020-05-05 Thread Dustin Lang
Hi, I've just upgraded to slurm 19.05.5. With either my old database, OR creating an entirely new database, I am unable to create a new 'cluster' entry in the database -- slurmdbd is segfaulting! # sacctmgr add cluster test3 Adding Cluster(s) Name = test3 Would you like to commit

Re: [slurm-users] slurmdbd crashes with segmentation fault following DBD_GET_ASSOCS

2020-05-05 Thread Dustin Lang
I tried upgrading Slurm to 18.08.9 and I am still getting this Segmentation Fault! On Tue, May 5, 2020 at 2:39 PM Dustin Lang wrote: > Hi, > > Apparently my colleague upgraded the mysql client and server, but, as far > as I can tell, this was only 5.7.29 to 5.7.30, and checking the mysql >

Re: [slurm-users] Limit the number of GPUS per user per partition

2020-05-05 Thread Theis, Thomas
Hey Killian, I tried to limit the number of GPUs a user can run on at a time by adding MaxTRESPerUser = gres:gpu4 to both the user and the QOS. I restarted the Slurm control daemon and unfortunately I am still able to run on all the GPUs in the partition. Any other ideas? Thomas Theis From:

Re: [slurm-users] slurmdbd crashes with segmentation fault following DBD_GET_ASSOCS

2020-05-05 Thread Dustin Lang
Hi, Apparently my colleague upgraded the mysql client and server, but, as far as I can tell, this was only 5.7.29 to 5.7.30, and checking the mysql release notes I don't see anything that looks suspicious there... cheers, --dustin On Tue, May 5, 2020 at 1:37 PM Dustin Lang wrote: > Hi, > >

[slurm-users] slurmdbd crashes with segmentation fault following DBD_GET_ASSOCS

2020-05-05 Thread Dustin Lang
Hi, We're running Slurm 17.11.12. Everything has been working fine, and then suddenly slurmctld is crashing and slurmdbd is crashing. We use fair-share as part of the queuing policy, and previously set up accounts with sacctmgr; that has been working fine for months. If I run slurmdbd in debug
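For anyone reproducing this, one way to capture such a crash is to stop the services and run the daemons in the foreground with extra verbosity:
    # run slurmdbd (and slurmctld) in the foreground with verbose logging
    slurmdbd -D -vvv
    slurmctld -D -vvv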

Re: [slurm-users] Major newbie - Slurm/jupyterhub

2020-05-05 Thread Renfro, Michael
Aside from any Slurm configuration, I’d recommend setting up a modules [1 or 2] folder structure for CUDA and other third-party software. That handles LD_LIBRARY_PATH and other similar variables, reduces the chances for library conflicts, and lets users decide their environment on a per-job
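A hedged sketch of what that looks like day to day, assuming an Environment Modules or Lmod install and placeholder paths/versions:
    # one modulefile per CUDA version lives under, e.g., /opt/modulefiles/cuda/
    module use /opt/modulefiles      # add the tree to MODULEPATH
    module avail cuda                # list the versions on offer
    module load cuda/10.2            # sets PATH, LD_LIBRARY_PATH, etc. for that version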

Re: [slurm-users] Major newbie - Slurm/jupyterhub

2020-05-05 Thread Lisa Kay Weihl
Thanks Guy, I did find that there was a jupyterhub_slurmspawner log in my home directory. That enabled me to find out that it could not find the path for batchspawner-singleuser, so I added this to jupyter_config.py: export PATH=/opt/rh/rh-python36/root/bin:$PATH That seemed to now allow
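A hedged way to confirm the helper really is visible to the spawned job (the interpreter path is the SCL Python mentioned above):
    # list the files batchspawner installed and check the helper is on PATH
    /opt/rh/rh-python36/root/bin/python -m pip show -f batchspawner | grep singleuser
    which batchspawner-singleuser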

Re: [slurm-users] how to restrict jobs

2020-05-05 Thread Renfro, Michael
Haven’t done it yet myself, but it’s on my to-do list. I’d assume that if you use the FlexLM or RLM parts of that documentation, Slurm would query the remote license server periodically and hold the job until the necessary licenses were available. > On May 5, 2020, at 8:37 AM, navin

Re: [slurm-users] how to restrict jobs

2020-05-05 Thread navin srivastava
Thanks Michael, yes I have gone through it, but these are remote licenses and they will be used outside as well, not only by Slurm. So basically I am interested to know how we can update the database dynamically to get the exact value at that point in time, I mean query the license server and
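For the remote-server case, the licenses guide tracks pools as database "resources"; a hedged sketch with placeholder names and counts (the count is static, so reflecting outside usage would need a site script re-running sacctmgr):
    # register a remotely served license pool in slurmdbd
    sacctmgr add resource name=appx count=4 server=flex_host servertype=flexlm type=license percentallowed=100
    # a cron job or script could later adjust it, e.g.:
    #   sacctmgr modify resource name=appx server=flex_host set count=3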

Re: [slurm-users] how to restrict jobs

2020-05-05 Thread Renfro, Michael
Have you seen https://slurm.schedmd.com/licenses.html already? If the software is just for use inside the cluster, one Licenses= line in slurm.conf plus users submitting with the -L flag should suffice. You should be able to set that license value to 4 if it’s licensed per node and you can run up
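A minimal sketch of the purely local case (the license name and per-job count are placeholders; the count of 4 mirrors the 4-node limit in the question):
    # slurm.conf -- statically counted, cluster-local licenses
    Licenses=appx:4
    # users request them at submit time:
    #   sbatch -L appx:1 job.sh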

[slurm-users] how to restrict jobs

2020-05-05 Thread navin srivastava
Hi Team, we have an application whose licenses are limited. It scales up to 4 nodes (~80 cores), so if 4 nodes are full, a job on a 5th node fails. We want to put a restriction in place so that the application can't execute beyond the 4 nodes and fail; instead it should stay in the queued state. I do not

Re: [slurm-users] Major newbie - Slurm/jupyterhub

2020-05-05 Thread Josef Dvoracek
Hi, please post also the stdout/stderr of job 7117. What I don't see in your config and I do have there is: c.SlurmSpawner.hub_connect_ip = '192.168.1.1' #- the IP where the slurm job will try to connect to jupyterhub. Also check if port 8081 is reachable from the compute nodes. -- josef On
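A quick hedged check of that reachability from the cluster side (IP and port are the ones quoted above; nc must exist on the compute nodes):
    # run the probe on a compute node via Slurm
    srun -N1 nc -zv 192.168.1.1 8081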

Re: [slurm-users] Major newbie - Slurm/jupyterhub

2020-05-05 Thread Guy Coates
Hi Lisa, Below is my jupyterhub slurm config. It uses the profiles, which allow you to spawn different sized jobs. I found the most useful thing for debugging is to make sure that the --output option is being honoured; any jupyter python errors will end up there, and to explicitly set the