On Tuesday, 5 May 2020 3:21:45 PM PDT Dustin Lang wrote:
> Since this happens on a fresh new database, I just don't understand how I
> can get back to a basic functional state. This is exceedingly frustrating.
I have to say that if you're seeing this with 17.11, 18.08 and 19.05, and this only
On Tuesday, 5 May 2020 4:47:12 PM PDT Maria Semple wrote:
> I'd like to set different resource limits for different steps of my job. A
> sample script might look like this (e.g. job.sh):
>
> #!/bin/bash
> srun --cpus-per-task=1 --mem=1 echo "Starting..."
> srun --cpus-per-task=4 --mem=250
On Tuesday, 5 May 2020 3:48:22 PM PDT Sean Crosby wrote:
> sacctmgr modify qos gpujobs set MaxTRESPerUser=gres/gpu=4
Also don't forget you need to tell Slurm to enforce QOS limits with:
AccountingStorageEnforce=safe,qos
in your Slurm configuration ("safe" is good to set, and turns on a check that jobs are only started if they can run to completion within the association's limits).
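To confirm the limit actually landed, one option is to parse the pipe-delimited output of `sacctmgr -nP show qos format=Name,MaxTRESPU` (the `-n`/`-P` flags and field names are standard sacctmgr; the helper below is a hypothetical sketch, shown here on a canned sample string):

```python
# Hypothetical helper: parse pipe-delimited output of
#   sacctmgr -nP show qos format=Name,MaxTRESPU
# into {qos_name: {tres_name: count}}. -n drops the header, -P uses '|'.
def parse_qos_limits(output: str) -> dict:
    limits = {}
    for line in output.strip().splitlines():
        name, maxtrespu = (line.split("|") + [""])[:2]
        tres = {}
        for item in filter(None, maxtrespu.split(",")):
            key, _, value = item.partition("=")
            tres[key] = int(value) if value.isdigit() else value
        limits[name] = tres
    return limits

sample = "normal|\ngpujobs|gres/gpu=4\n"
print(parse_qos_limits(sample)["gpujobs"])  # {'gres/gpu': 4}
```

In practice you would feed it the real command's stdout and check that `gres/gpu` shows the value you set rather than the malformed `gres:gpu4` form.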
Hi!
I'd like to set different resource limits for different steps of my job. A
sample script might look like this (e.g. job.sh):
#!/bin/bash
srun --cpus-per-task=1 --mem=1 echo "Starting..."
srun --cpus-per-task=4 --mem=250 --exclusive
srun --cpus-per-task=1 --mem=1 echo "Finished."
Then I
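One way to drive the script above (a sketch, assuming `--mem` values are in megabytes and that `job.sh` is the batch script shown): the job-level allocation must be at least as large as the biggest step, since `srun` can only carve steps out of what `sbatch` allocated.

```shell
# Hypothetical submission for the job.sh above: the allocation must cover
# the largest step (4 CPUs, 250 MB), or that step can never start.
sbatch --cpus-per-task=4 --mem=250M job.sh
```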
Hi Michael,
I get the gist of everything you mentioned, but now I feel even more
overwhelmed. Can I not get JupyterHub up and running without all those module
pieces? I was hoping to have a base kernel in JupyterHub that contained a lot
of the data science packages. I'm going to have some
Hi Thomas,
That value should be
sacctmgr modify qos gpujobs set MaxTRESPerUser=gres/gpu=4
Sean
--
Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne, Victoria 3010 Australia
On Wed, 6 May 2020 at 04:53,
Hi,
I've just upgraded to slurm 19.05.5.
With either my old database, OR creating an entirely new database, I am
unable to create a new 'cluster' entry in the database -- slurmdbd is
segfaulting!
# sacctmgr add cluster test3
Adding Cluster(s)
Name = test3
Would you like to commit
I tried upgrading Slurm to 18.08.9 and I am still getting this Segmentation
Fault!
On Tue, May 5, 2020 at 2:39 PM Dustin Lang wrote:
> Hi,
>
> Apparently my colleague upgraded the mysql client and server, but, as far
> as I can tell, this was only 5.7.29 to 5.7.30, and checking the mysql
>
Hey Killian,
I tried to limit the number of GPUs a user can run on at a time by adding
MaxTRESPerUser = gres:gpu4 to both the user and the QOS. I restarted the Slurm
control daemon, and unfortunately I am still able to run on all the GPUs in the
partition. Any other ideas?
Thomas Theis
From:
Hi,
Apparently my colleague upgraded the mysql client and server, but, as far
as I can tell, this was only 5.7.29 to 5.7.30, and checking the mysql
release notes I don't see anything that looks suspicious there...
cheers,
--dustin
On Tue, May 5, 2020 at 1:37 PM Dustin Lang wrote:
> Hi,
>
>
Hi,
We're running Slurm 17.11.12. Everything has been working fine, and then
suddenly both slurmctld and slurmdbd are crashing.
We use fair-share as part of the queuing policy, and previously set up
accounts with sacctmgr; that has been working fine for months.
If I run slurmdbd in debug
Aside from any Slurm configuration, I’d recommend setting up a modules [1 or 2]
folder structure for CUDA and other third-party software. That handles
LD_LIBRARY_PATH and other similar variables, reduces the chances for library
conflicts, and lets users decide their environment on a per-job
Thanks Guy, I did find that there was a jupyterhub_slurmspawner log in my home
directory. That enabled me to find out that it could not find the path for
batchspawner-singleuser.
So I added this to jupyter_config.py
export PATH=/opt/rh/rh-python36/root/bin:$PATH
That seemed to now allow
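For reference, a way to do the same thing natively in the config rather than with a shell `export` line (a sketch: `c.Spawner.environment` is a standard JupyterHub trait, but the path is site-specific and whether it reaches the batch job depends on your batch script template):

```python
# Hypothetical jupyterhub_config.py fragment: prepend the SCL Python bin
# directory so the spawned job can find batchspawner-singleuser.
import os

c.SlurmSpawner.environment = {
    "PATH": "/opt/rh/rh-python36/root/bin:" + os.environ.get("PATH", ""),
}
```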
Haven’t done it yet myself, but it’s on my todo list.
But I’d assume that if you use the FlexLM or RLM parts of that documentation,
that Slurm would query the remote license server periodically and hold the job
until the necessary licenses were available.
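On the FlexLM side, a sketch of what the polling half might look like, assuming `lmstat -a`-style summary lines (the regex and helper are hypothetical; feeding the resulting free count back into Slurm is left out here):

```python
import re

# Hypothetical parser for FlexLM-style "lmstat -a" summary lines such as:
#   Users of appfeature:  (Total of 80 licenses issued;  Total of 64 licenses in use)
LMSTAT_LINE = re.compile(
    r"Users of (?P<feature>\S+):\s+\(Total of (?P<issued>\d+) licenses? issued;"
    r"\s+Total of (?P<used>\d+) licenses? in use\)"
)

def parse_lmstat(output: str) -> dict:
    """Return {feature: free_license_count} for every feature line found."""
    counts = {}
    for m in LMSTAT_LINE.finditer(output):
        counts[m.group("feature")] = int(m.group("issued")) - int(m.group("used"))
    return counts

sample = "Users of appfeature:  (Total of 80 licenses issued;  Total of 64 licenses in use)\n"
print(parse_lmstat(sample))  # {'appfeature': 16}
```

A cron job or daemon could run this against the real `lmstat` output and update the remote-license count Slurm sees, per the licenses documentation.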
> On May 5, 2020, at 8:37 AM, navin
Thanks Michael,
Yes, I have gone through it, but the licenses are served by a remote license
server and are used outside the cluster as well, not only by Slurm.
So basically I am interested in knowing how we can update the database
dynamically to get the exact value at that point in time.
I mean, query the license server and
Have you seen https://slurm.schedmd.com/licenses.html already? If the software
is just for use inside the cluster, one Licenses= line in slurm.conf plus users
submitting with the -L flag should suffice. You should be able to set that license
value to 4 if it's licensed per node and you can run up
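The Licenses= approach above might look like this (a sketch; `appname` is a placeholder for the real feature name):

```
# slurm.conf (hypothetical feature name): 4 local consumable licenses
Licenses=appname:4
```

Users would then submit with `sbatch -L appname:1 ...`, and once all 4 are allocated Slurm keeps further jobs pending instead of letting them start and fail.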
Hi Team,
We have an application whose licenses are limited; it scales up to 4
nodes (~80 cores).
So if 4 nodes are full, a job on a 5th node used to fail.
We want to put a restriction in place so that the application can't
execute beyond the 4 nodes; instead of failing, the job should stay in the queue.
I do not
Hi,
Please also post the stdout/stderr of job 7117.
What I don't see in your config, and I do have in mine, is:
c.SlurmSpawner.hub_connect_ip = '192.168.1.1'  # the IP the Slurm job
will try to connect to JupyterHub on.
Also check that port 8081 is reachable from the compute nodes.
--
josef
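The reachability check suggested above can be scripted from a compute node (a sketch; the helper is hypothetical, and 192.168.1.1:8081 are the values from the config in that message):

```python
import socket

# Hypothetical reachability check: run on a compute node to verify the
# hub's API port is open, e.g. port_open("192.168.1.1", 8081).
def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If this returns False from the node, the spawned notebook can never report back to the hub, which matches the symptom of jobs starting and then being culled.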
On
Hi Lisa,
Below is my JupyterHub Slurm config. It uses profiles, which allow you
to spawn different-sized jobs. I found the most useful thing for debugging
is to make sure that the --output option is being honoured; any Jupyter
Python errors will end up there, and to explicitly set the
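A profiles setup of the kind described above generally takes this shape (a sketch, assuming batchspawner plus the wrapspawner package; the profile names and resource values are invented):

```python
# Hypothetical jupyterhub_config.py fragment: ProfilesSpawner offers a menu
# of differently sized Slurm jobs, each backed by batchspawner's SlurmSpawner.
c.JupyterHub.spawner_class = "wrapspawner.ProfilesSpawner"
c.ProfilesSpawner.profiles = [
    # (display name, key, spawner class, spawner config overrides)
    ("Small - 1 core, 4 GB", "small", "batchspawner.SlurmSpawner",
     dict(req_nprocs="1", req_memory="4G", req_runtime="4:00:00")),
    ("Large - 8 cores, 32 GB", "large", "batchspawner.SlurmSpawner",
     dict(req_nprocs="8", req_memory="32G", req_runtime="8:00:00")),
]
```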