https://slurm.schedmd.com/slurm.conf.html#OPT_DefMemPerCPU
From: Alison Peterson
Date: Thursday, April 4, 2024 at 11:58 AM
To: Renfro, Michael
Subject: Re: [EXT] Re: [slurm-users] SLURM configuration help
What does “scontrol show node cusco” and “scontrol show job PENDING_JOB_ID”
show?
On one job we currently have that’s pending due to Resources, that job has
requested 90 CPUs and 180 GB of memory as seen in its ReqTRES= value, but the
node it wants to run on only has 37 CPUs available (seen by
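As a sketch of that kind of comparison (node name from above; PENDING_JOB_ID is a placeholder, as in the question):
=
# what the node has configured vs. already allocated
scontrol show node cusco | grep -E 'CfgTRES|AllocTRES'
# what the pending job is asking for
scontrol show job PENDING_JOB_ID | grep -E 'ReqTRES|NumCPUs'
=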
“An LDAP user can login to the login, slurmctld and compute nodes, but when
they try to submit jobs, slurmctld logs an error about invalid account or
partition for user.”
Since I don’t think it was mentioned below, does a non-LDAP user get the same
error, or does it work by default?
We don’t
You can attack this in a few different stages. A lot of what you’re interested
in will be found at various university or national lab sites (I Googled “sbatch
example” for the one below)
1. If you’re good with doing a “make -j” to parallelize a make compilation
over multiple CPUs in a
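Where that applies, a minimal sbatch script might look like this (untested sketch; resource numbers are illustrative):
=
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
# run make with one parallel job per reserved CPU
make -j "${SLURM_CPUS_PER_TASK}"
=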
Is this Northwestern’s Quest HPC or another one? I know at least a few of the
people involved with Quest, and I wouldn’t have thought they’d be in dire need
of coaching.
And to follow on with Davide’s point, this really sounds like a case for
submitting multiple jobs with dependencies between
I’d probably default to OpenHPC just for the community around it, but I’ll also
note that TrinityX might not have had any commits in their GitHub for an
18-month period (unless I’m reading something wrong).
On Oct 3, 2023, at 5:51 AM, John Joseph wrote:
Given a job ID:
scontrol show hostnames $(scontrol show job some_job_id | grep ' NodeList=' | cut -d= -f2) | paste -sd,
Maybe there’s something more built-in than this, but it gets the job done.
From: slurm-users on behalf of Alain O'Miniussi
Date: Thursday, August 17, 2023 at 7:46 AM
To:
If there’s a fairshare component to job priorities, and there’s a share
assigned to each user under the account, wouldn’t the light user’s jobs move
ahead of any of the heavy user’s pending jobs automatically?
From: slurm-users on behalf of "Groner, Rob"
Reply-To: Slurm User Community List
Going in a completely different direction than you’d planned, but for the same
goal, what about making a script (shell, Python, or otherwise) that could
validate all the constraints and call the scontrol program if appropriate, and
then run that script via “sudo” as one of the regular users?
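An untested sketch of that idea (script name, validation rule, and the scontrol action are all hypothetical):
=
#!/bin/bash
# safe-scontrol.sh: validate input, then run one whitelisted scontrol action
jobid="$1"
# hypothetical constraint: accept only purely numeric job IDs
[[ "${jobid}" =~ ^[0-9]+$ ]] || { echo "bad job id" >&2; exit 1; }
exec scontrol hold "${jobid}"
=
A matching sudoers entry would then let regular users run only that script, not scontrol itself.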
This should work:
sacctmgr add user someuser account=newaccount # adds user to new account
sacctmgr modify user where user=someuser set defaultaccount=newaccount # change default
sacctmgr remove user where user=someuser and account=oldaccount # remove from old account
From: slurm-users on
Someone else may see another option, but NVIDIA MIG seems like the
straightforward option. That would require both a Slurm upgrade and the
purchase of MIG-capable cards.
https://slurm.schedmd.com/gres.html#MIG_Management
Would be able to host 7 users per A100 card, IIRC.
On Apr 3, 2022, at
Slurm supports an l3_cache_as_socket [1] parameter in recent releases. That
would make an Epyc system, for example, appear to have many more sockets than
physically exist, and that should help ensure threads in a single task share a
cache.
You’d want to run slurmd -C on a node with that setting
For later reference, [1] should be the (current) authoritative source on data
types for the job_desc values: some strings, some numbers, some booleans.
[1]
https://github.com/SchedMD/slurm/blob/4c21239d420962246e1ac951eda90476283e7af0/src/plugins/job_submit/lua/job_submit_lua.c#L450
From:
mstadt
Tel: +49 6151 16-21469
Alarich-Weiss-Straße 10
64287 Darmstadt
Office: L2|06 410
On 1/30/22 21:14, Renfro, Michael wrote:
You can. We use:
sacctmgr show assoc where account=researchgroup format=user,share
to see current fairshare within the account, and:
sacctmgr modify user where name=someuser account=researchgroup
set fairshare=N
to modify a particular user's fairshare within the account.
Since there's only 9 factors to assign priority weights to, one way around this
might be to set up separate partitions for high memory and low memory jobs
(with a max memory allowed for the low memory partition), and then use
partition weights to separate those jobs out.
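A rough slurm.conf sketch of that idea (node lists, the memory cap, and the factors are invented; PriorityWeightPartition has to be nonzero for the factors to matter):
=
PartitionName=lowmem Nodes=node[001-020] MaxMemPerNode=64000 PriorityJobFactor=1
PartitionName=highmem Nodes=node[001-020] PriorityJobFactor=10
=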
From: slurm-users on
Not answering every question below, but for (1) we're at 200 on a cluster with
a few dozen nodes and around 1k cores, as per
https://lists.schedmd.com/pipermail/slurm-users/2021-June/007463.html -- there
may be other settings in that email that could be beneficial. We had a lot of
idle
Untested, but given a common service account with a GPG key pair, a user with a
GPG key pair, and the EncFS encrypted with a password, the user could encrypt a
password with their own private key and the service account's public key, and
leave it alongside the EncFS.
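A sketch of that encryption step (untested, as stated; key IDs and file name are made up):
=
# sign with the user's key, encrypt to the service account's key
echo -n "${ENCFS_PASSWORD}" | gpg --sign --encrypt \
  --local-user user@example.edu --recipient svcacct@example.edu \
  --output encfs_password.gpg
=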
If the service account is
CurrentWatts=0 AveWatts=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
From: slurm-users On Behalf Of Renfro,
Michael
Sent: Friday, November 26, 2021 8:15 AM
To: Slurm User Community List
Subject: [EXTERNAL] Re: [slurm-users] Reserving cores without immediately
launching
The end of the MPICH section at [1] shows an example using salloc [2].
Worst case, you should be able to take the output of “scontrol show hostnames”
[3] and use that data to build mpiexec command parameters to run one rank per
node, similar to what’s shown at the end of the synopsis section of
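For example (untested; assumes MPICH's Hydra mpiexec and a program named ./a.out):
=
HOSTS=$(scontrol show hostnames "${SLURM_JOB_NODELIST}" | paste -sd,)
mpiexec -hosts "${HOSTS}" -ppn 1 ./a.out
=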
return slurm.ERROR
end
end
end
Fritz Ratnasamy
Data Scientist
Information Technology
The University of Chicago
Booth School of Business
5807 S. Woodlawn
Chicago, Illinois 60637
Phone: +(1) 773-834-4556
On Mon, Sep 27, 2021 at 1:40 PM Renfro, Michael
mailto:ren...@tntec
m.conf/ is there any Slurm service to restart after that?
Thanks again
On Sat, Sep 25, 2021 at 11:08 AM Renfro, Michael
If you haven't already seen it there's an example Lua script from SchedMD at
[1], and I've got a copy of our local script at [2]. Otherwise, in the order
you asked:
1. That seems reasonable, but our script just checks if there's a gres at
all. I don't *think* any gres other than gres=gpu
If you're not the cluster admin, you'll want to check with them, but that
should be related to a limit in how many node-hours an association (a unique
combination of user, cluster, partition, and account) can have in running or
pending state. Further jobs would get blocked to allow others' jobs
I can imagine at least the following causing differences in the estimated time
and the actual start time:
* If running users have overestimated their job times, and their jobs
finish earlier than expected, the original estimate will be high.
* If another user's job submission gets
Not a solution to your exact problem, but we document partitions for
interactive, debug, and batch, and have a job_submit.lua [1] that routes
GPU-reserving jobs to gpu-interactive, gpu-debug, and gpu partitions
automatically. Since our GPU nodes have extra memory slots, and have tended to
run
Did Diego's suggestion from [1] not help narrow things down?
[1] https://lists.schedmd.com/pipermail/slurm-users/2021-August/007708.html
From: slurm-users on behalf of Jack
Chen
Date: Tuesday, August 10, 2021 at 10:08 AM
To: Slurm User Community List
Subject: Re: [slurm-users] Compact
Not sure it would work out to 60k queued jobs, but we're using:
SchedulerParameters=bf_window=43200,bf_resolution=2160,bf_max_job_user=80,bf_continue,default_queue_depth=200
in our setup. bf_window is driven by our 30-day max job time, bf_resolution is
at 5% of that time, and the other values
From: slurm-users On Behalf Of Renfro,
Michael
Sent: Tuesday, 8 June 2021 20:12
To: Slurm User Community List
Subject: Re: [slurm-users] Kill job when child process gets OOM-killed
Any reason *not* to create an array of 100k jobs and let the scheduler just
handle things? Current versions of Slurm support arrays of up to 4M jobs, and
you can limit the number of jobs running simultaneously with the '%' specifier
in your array= sbatch parameter.
From: slurm-users on behalf
Untested, but prior experience with cgroups indicates that if things are
working correctly, even if your code tries to run as many processes as you have
cores, those processes will be confined to the cores you reserve.
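One quick way to see that confinement, assuming cgroup/affinity enforcement is configured (job parameters illustrative):
=
# reserve 2 CPUs, then show which CPUs the task may actually use
srun -n 1 -c 2 grep Cpus_allowed_list /proc/self/status
=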
Try a more compute-intensive worker function that will take some seconds or
could inquire at.
[1] https://github.com/ubccr/xdmod/releases/tag/v9.5.0-rc.4
From: Diego Zuccato
Date: Wednesday, May 12, 2021 at 8:37 AM
To: Renfro, Michael
Cc: Slurm User Community List
Subject: Re: [slurm-users] Cluster usage, filtered by partition
On 12/05/21 13:30, Diego Zuccato wrote:
https://xdmod.ccr.buffalo.edu/ — may be the easiest way to explore it.
On May 12, 2021, at 3:52 AM, Diego Zuccato wrote:
On 11/05/21 21:20, Renfro, Michael wrote:
In a word, nothing that's guaranteed to be stable. I got my start from
this reply on the XDMoD list in November 2019. Worked on 8.0:
Tks for the hint
usage, filtered by partition
On Tue, May 11, 2021 at 5:55 AM Renfro, Michael wrote:
XDMoD [1] is useful for this, but it’s not a simple script. It does have some
user-accessible APIs if you want some report automation. I’m using that to
create a lightning-talk-style slide at [2].
[1] https://open.xdmod.org/
[2] https://github.com/mikerenfro/one-page-presentation-hpc
On May
I’ve used the structure at
https://gist.github.com/mikerenfro/92d70562f9bb3f721ad1b221a1356de5 to handle
basic test/production branching. I can isolate the new behavior down to just a
specific set of UIDs that way.
Factoring out code into separate functions helps, too.
I’ve seen others go so
You'll definitely need to get slurmd and slurmctld working before proceeding
further. slurmctld is the Slurm controller mentioned when you do the srun.
Though there's probably some other steps you can take to make the slurmd and
slurmctld system services available, it might be simpler to do the
I can't speak to what happens on node failure, but I can at least get you a
greatly simplified pair of scripts that will run only one copy on each node
allocated:
#!/bin/bash
# notarray.sh
#SBATCH --nodes=28
#SBATCH --ntasks-per-node=1
#SBATCH --no-kill
echo "notarray.sh is running on
I'll never miss an opportunity to plug XDMoD for anyone who doesn't want to
write custom analytics for every metric. I've managed to get a little bit into
its API to extract current values for number of jobs completed and the number
of CPU-hours provided, and insert those into a single slide
I'd probably write a shell function that would calculate the time required, and
add it as a command-line parameter to sbatch. We do a similar thing for easier
interactive shells in our /etc/profile.d folder on the login node:
function hpcshell() {
  srun --partition=interactive "$@" --pty bash -i
}
Just a starting guess, but are you certain the MATLAB script didn’t try to
allocate enormous amounts of memory for variables? That’d be about 16e9
floating point values, if I did the units correctly.
On Mar 15, 2021, at 12:53 PM, Chin,David wrote:
There may be prettier ways, but this gets the job done. Captures the output
from each sbatch command to get a job ID, colon separates the ones in the
second group, and removes the trailing colon before submitting the last job:
#!/bin/bash
JOB1=$(sbatch job1.sh | awk '{print $NF}')
DEPS="" # job IDs of the second group, colon-separated (file names beyond job1.sh assumed)
for j in job2.sh job3.sh; do DEPS="${DEPS}$(sbatch --dependency=afterok:${JOB1} ${j} | awk '{print $NF}'):"; done
sbatch --dependency=afterok:${DEPS%:} job4.sh # trailing colon stripped
We have overlapping partitions for GPU work and some kinds of non-GPU work
(both large-memory and regular-memory jobs).
For 28-core nodes with 2 GPUs, we have:
PartitionName=gpu MaxCPUsPerNode=16 … Nodes=gpunode[001-004]
PartitionName=any-interactive MaxCPUsPerNode=12 …
Yesterday, I posted
Harvard's Arts & Sciences Research Computing group has a good explanation of
these columns at https://docs.rc.fas.harvard.edu/kb/fairshare/ -- might not
answer your exact question, but it does go into how the FairShare column is
calculated.
From: slurm-users
Date: Tuesday, December 1, 2020 at
I think the answer depends on why you’re trying to prevent the observed
behavior:
* Do you want to ensure that one job requesting 9 tasks (and 1 CPU per
task) can’t overstep its reservation and take resources away from other jobs on
those nodes? Cgroups [1] should be able to confine the
From any node you can run scontrol from, what does ‘scontrol show node
GPUNODENAME | grep -i gres’ return? Mine return lines for both “Gres=” and
“CfgTRES=”.
From: slurm-users on behalf of Sajesh
Singh
Reply-To: Slurm User Community List
Date: Thursday, October 8, 2020 at 3:33 PM
To: Slurm
Depending on the users who will be on this cluster, I'd probably adjust the
partition to have a defined, non-infinite MaxTime, and maybe a lower
DefaultTime. Otherwise, it would be very easy for someone to start a job that
reserves all cores until the nodes get rebooted, since all they have to
I could have missed a detail on my description, but we definitely don’t enable
oversubscribe, or shared, or exclusiveuser. All three of those are set to “no”
on all active queues.
Current subset of slurm.conf and squeue output:
=
# egrep '^PartitionName=(gpu|any-interactive) '
Untested, but a combination of a QOS with MaxTRESPerJob=cpu=X and a partition
that allows or denies that QOS may work. A job_submit.lua should be able to
adjust the QOS of a submitted job, too.
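An untested sketch of that combination (QOS name, limit, and partition are illustrative):
=
sacctmgr add qos cpu16
sacctmgr modify qos cpu16 set MaxTRESPerJob=cpu=16
# then in slurm.conf, something like: PartitionName=somepart ... AllowQos=cpu16
=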
On 9/30/20, 10:50 AM, "slurm-users on behalf of Paul Edmon"
wrote:
Not having a separate test environment, I put logic into my job_submit.lua to
use either the production settings or the ones under development or testing,
based off the UID of the user submitting the job:
=
function slurm_job_submit(job_desc, part_list, submit_uid)
test_user_table = {}
We set DefMemPerCPU in each partition to approximately the amount of RAM in a
node divided by the number of cores in the node. For heterogeneous partitions,
we use a lower limit, and we always reserve a bit of RAM for the OS, too. So
for a 64 GB node with 28 cores, we default to 2000 M per CPU,
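In slurm.conf terms, that works out to something like (partition and node names invented):
=
PartitionName=batch DefMemPerCPU=2000 Nodes=node[001-040]
=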
One pending job in this partition should have a reason of “Resources”. That job
has the highest priority, and if your job below would delay the
highest-priority job’s start, it’ll get pushed back like you see here.
On Aug 31, 2020, at 12:13 PM, Holtgrewe, Manuel
wrote:
Dear all,
I'm seeing
The PowerShell script I use to provision new users adds them to an Active
Directory group for HPC, ssh-es to the management node to do the sacctmgr
changes, and emails the user. Never had it fail, and I've looped over entire
class sections in PowerShell. Granted, there are some inherent delays
We’ve run a similar setup since I moved to Slurm 3 years ago, with no issues.
Could you share partition definitions from your slurm.conf?
When you see a bunch of jobs pending, which ones have a reason of “Resources”?
Those should be the next ones to run, and ones with a reason of “Priority” are
I’ve only got 2 GPUs in my nodes, but I’ve always used non-overlapping CPUs= or
COREs= settings. Currently, they’re:
NodeName=gpunode00[1-4] Name=gpu Type=k80 File=/dev/nvidia[0-1] COREs=0-7,9-15
and I’ve got 2 jobs currently running on each node that’s available.
So maybe:
NodeName=c0005
Untested, but you should be able to use a job_submit.lua file to detect if the
job was started with srun or sbatch:
* Check with (job_desc.script == nil or job_desc.script == '')
* Adjust job_desc.time_limit accordingly
Here, I just gave people a shell function "hpcshell", which
Probably unrelated to slurm entirely, and most likely has to do with
lower-level network diagnostics. I can guarantee that it’s possible to access
Internet resources from a compute node. Notes and things to check:
1. Both ping and http/https are IP protocols, but are very different (ping
isn’t
If the 500 parameters happened to be filenames, you could adapt something like
this (appropriated from somewhere else, but I can’t find the reference quickly):
=
#!/bin/bash
# get count of files in this directory
NUMFILES=$(ls -1 *.inp | wc -l)
# subtract 1 as we have to use zero-based indexing (first file is index 0)
ZBNUMFILES=$((NUMFILES - 1))
# submit one array task per file (job script name is illustrative)
sbatch --array=0-${ZBNUMFILES} job.sh
“The SchedulerType configuration parameter specifies the scheduler plugin to
use. Options are sched/backfill, which performs backfill scheduling, and
sched/builtin, which attempts to schedule jobs in a strict priority order
within each partition/queue.”
There’s a --nice flag to sbatch and srun, at least. Documentation indicates it
decreases priority by 100 by default.
And untested, but it may be possible to use a job_submit.lua [1] to adjust nice
values automatically. At least I can see a nice property in [2], which I assume
means it'd be
Will probably need more information to find a solution.
To start, do you have separate partitions for GPU and non-GPU jobs? Do you have
nodes without GPUs?
On Jun 13, 2020, at 12:28 AM, navin srivastava wrote:
Hi All,
In our environment we have GPU. so what i found is if the user having
I think that’s correct. From notes I’ve got for how we want to handle our
fairshare in the future:
Setting up a funded account (which can be assigned a fairshare):
sacctmgr add account member1 Description="Member1 Description" FairShare=N
Adding/removing a user to/from the funded
subscribe should be sufficient.
> If you can't spare a single node then a VM would do the job.
>
> -Paul Edmon-
>
> On 6/11/2020 9:28 AM, Renfro, Michael wrote:
That’s close to what we’re doing, but without dedicated nodes. We have three
back-end partitions (interactive, any-interactive, and gpu-interactive), but
the users typically don’t have to consider that, due to our job_submit.lua
plugin.
All three partitions have a default of 2 hours, 1 core, 2
Even without the slurm-bank system, you can enforce a limit on resources with a
QOS applied to those users. Something like:
=
sacctmgr add qos bank1 flags=NoDecay,DenyOnLimit
sacctmgr modify qos bank1 set grptresmins=cpu=1000
sacctmgr add account bank1
sacctmgr modify account name=bank1 set qos=bank1 # completion assumed: attach the QOS to the account
I’d compare the RealMemory part of ’scontrol show node
abhi-HP-EliteBook-840-G2’ to the RealMemory part of your slurm.conf:
> Nodes which register to the system with less than the configured resources
> (e.g. too little memory), will be placed in the "DOWN" state to avoid
> scheduling jobs on
restart.
Thanks.
> On May 8, 2020, at 11:47 AM, Renfro, Michael wrote:
>
> Working on something like that now. From an SQL export, I see 16 jobs from
> my user that have a state of 7. Both states 3 and 7 show up as COMPLETED in
> sacct, and may also have some duplicate job en
f,to,pr"
> # Get Slurm individual job accounting records using the "sacct" command
> sacct $partitionselect -n -X -a -S $start_time -E $end_time -o $FORMAT
> -s $STATE
>
> There are numerous output fields which you can inquire, see "sacct -e".
>
> /Ole
still get counted against the user's current requests.
From: Ole Holm Nielsen
Sent: Friday, May 8, 2020 9:27 AM
To: slurm-users@lists.schedmd.com
Cc: Renfro, Michael
Subject: Re: [slurm-users] scontrol show assoc_mgr showing more resources in
use than squeue
re printed in detail by showuserlimits.
These tools are available from https://github.com/OleHolmNielsen/Slurm_tools
/Ole
Hey, folks. I've had a 1000 CPU-day (1,440,000 CPU-minute) GrpTRESMins limit
applied to each user for years. It generally works as intended, but I have one
user I've noticed whose usage is highly inflated from reality, causing the
GrpTRESMins limit to be enforced much earlier than necessary:
There are MinNodes and MaxNodes settings that can be defined for each partition
listed in slurm.conf [1]. Set both to 1 and you should end up with the non-MPI
partitions you want.
[1] https://slurm.schedmd.com/slurm.conf.html
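For instance (partition and node names invented):
=
PartitionName=serial Nodes=node[001-010] MinNodes=1 MaxNodes=1
=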
From: slurm-users on behalf of
>
> Regards
> Navin.
>
>
> On Wed, May 6, 2020 at 7:47 PM Renfro, Michael wrote:
> To make sure I’m reading this correctly, you have a software license that
> lets you run jobs on up to 4 nodes at once, regardless of how many CPUs you
> use? That is, you could run an
pecific
> nodes?
> i do not want to create a separate partition.
>
> is there any way to achieve this by any other method?
>
> Regards
> Navin.
>
>
> Regards
> Navin.
>
> On Tue, May 5, 2020 at 7:46 PM Renfro, Michael wrote:
> Haven’t done it yet mysel
Aside from any Slurm configuration, I’d recommend setting up a modules [1 or 2]
folder structure for CUDA and other third-party software. That handles
LD_LIBRARY_PATH and other similar variables, reduces the chances for library
conflicts, and lets users decide their environment on a per-job
ically updated the value based on usage?
>
>
> Regards
> Navin.
>
>
Have you seen https://slurm.schedmd.com/licenses.html already? If the software
is just for use inside the cluster, one Licenses= line in slurm.conf plus users
submitting with the -L flag should suffice. Should be able to set that license
value to 4 if it’s licensed per node and you can run up
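Concretely, that might look like the following (license name invented):
=
# slurm.conf:
Licenses=mysoftware:4
# job submission:
sbatch -L mysoftware:1 job.sh
=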
Assuming you need a scheduler for whatever size your user population is: do
they need literal JupyterHub, or would they all be satisfied running regular
Jupyter notebooks?
On May 4, 2020, at 7:25 PM, Lisa Kay Weihl wrote:
d have to specify this when submitting, right? I.e. 'sbatch
> --exclusive myjob.sh', if I understand correctly. Would there be a way to
> simply enforce this, i.e. at the slurm.conf level or something?
>
> Thanks again!
>
> Rutger
>
> On Wed, Apr 29, 2020 at 10:06 PM Renfr
That’s a *really* old version, but
https://slurm.schedmd.com/archive/slurm-15.08.13/sbatch.html indicates there’s
an exclusive flag you can set.
On Apr 29, 2020, at 1:54 PM, Rutger Vos wrote:
Hi,
for a smallish machine that has been having degraded performance we want to
implement a
Someone else might see more than I do, but from what you’ve posted, it’s clear
that compute-0-0 will be used only after other lower-weighted nodes are too
full to accept a particular job.
I assume you’ve already submitted a set of jobs requesting enough resources to
fill up all the nodes, and
Can’t speak for everyone, but I went to Slurm 19.05 some months back, and
haven't had any problems with CUDA 10.0 or 10.1 (or 8.0, 9.0, or 9.1).
> On Apr 17, 2020, at 8:46 AM, Lisa Kay Weihl wrote:
>
Unless I’m misreading it, you have a wall time limit of 2 days, and jobs that
use up to 32 CPUs. So a total CPU time of up to 64 CPU-days would be possible
for a single job.
So if you want total wall time for jobs instead of CPU time, then you’ll want
to use the Elapsed attribute, not CPUTime.
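For example (user name and date range are placeholders):
=
sacct -u someuser -S 2020-04-01 -E 2020-04-30 -X -o JobID,Elapsed,CPUTime
=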
All of this is subject to scheduler configuration, but: what has job 409978
requested, in terms of resources and time? It looks like it's the highest
priority pending job in the interactive partition, and I’d expect the
interactive partition has a higher priority than the regress partition.
As
Others might have more ideas, but anything I can think of would require a lot
of manual steps to avoid mutual interference with jobs in the other partitions
(allocating resources for a dummy job in the other partition, modifying the MPI
host list to include nodes in the other partition, etc.).
Rather than configure it to only run one job at a time, you can use job
dependencies to make sure only one job of a particular type runs at a time. A
singleton dependency [1, 2] should work for this. From [1]:
#SBATCH --dependency=singleton --job-name=big-youtube-upload
in any job script would
In addition to Sean’s recommendation, your user might want to use job arrays
[1]. That’s less stress on the scheduler, and throughput should be equivalent
to independent jobs.
[1] https://slurm.schedmd.com/job_array.html
--
Mike Renfro, PhD / HPC Systems Administrator, Information Technology
The release notes at https://slurm.schedmd.com/archive/slurm-19.05.5/news.html
indicate you can upgrade from 17.11 or 18.08 to 19.05. I didn’t find equivalent
release notes for 17.11.7, but upgrades over one major release should work.
> On Mar 11, 2020, at 2:01 PM, Will Dennis wrote:
>
>
I’m going to guess the job directive changed between earlier releases and
20.02. A version of the page from last year [1] has no mention of hetjob, and
uses packjob instead.
On a related note, is there a canonical location for older versions of Slurm
documentation? My local man pages are
We have a shared gres.conf that includes node names, which should have the
flexibility to specify node-specific settings for GPUs:
=
NodeName=gpunode00[1-4] Name=gpu Type=k80 File=/dev/nvidia0 COREs=0-7
NodeName=gpunode00[1-4] Name=gpu Type=k80 File=/dev/nvidia1 COREs=8-15
=
See the
When I made similar queues, and only wanted my GPU jobs to use up to 8 cores
per GPU, I set Cores=0-7 and 8-15 for each of the two GPU devices in gres.conf.
Have you tried reducing those values to Cores=0 and Cores=20?
> On Feb 27, 2020, at 9:51 PM, Pavel Vashchenkov wrote:
>
> External Email
If that 32 GB is main system RAM, and not GPU RAM, then yes. Since our GPU
nodes are over-provisioned in terms of both RAM and CPU, we end up using the
excess resources for non-GPU jobs.
If that 32 GB is GPU RAM, then I have no experience with that, but I suspect
MPS would be required.
> On
Hey, Matthias. I’m having to translate a bit, so if I get a meaning wrong,
please correct me.
You should be able to set the minimum and maximum number of nodes used for jobs
on a per-partition basis, or to set a default for all partitions. My most
commonly used partition has:
If you want to rigidly define which 20 nodes are available to the one group of
users, you could define a 20-node partition for them, and a 35-node partition
for the priority group, and restrict access by Unix group membership:
PartitionName=restricted Nodes=node0[01-20] AllowGroups=ALL
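Presumably that would be paired with a second, group-restricted partition along these lines (group name invented):
=
PartitionName=priority Nodes=node0[01-35] AllowGroups=prioritygroup
=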
early
> release of v18.
>
> Best regards,
> David
>
> From: slurm-users on behalf of
> Renfro, Michael
> Sent: 31 January 2020 17:23:05
> To: Slurm User Community List
> Subject: Re: [slurm-users] Longer queuing times for larger jobs
>
> I missed reading w
s at the
> expense of the small fry for example, however that is a difficult decision
> that means that someone has got to wait longer for results..
>
> Best regards,
> David
> From: slurm-users on behalf of
> Renfro, Michael
> Sent: 31 January 2020 13:27
> T
Greetings, fellow general university resource administrator.
Couple things come to mind from my experience:
1) does your serial partition share nodes with the other non-serial partitions?
2) what’s your maximum job time allowed, for serial (if the previous answer was
“yes”) and non-serial
> cgroups is the solution I suppose.
>
> On Tue, Jan 28, 2020 at 7:42 PM Renfro, Michael wrote:
> For the first question: you should be able to define each node’s core count,
> hyperthreading, or other details in slurm.conf. That would allow Slurm to
> schedule (well-behaved) tas