You can check with something like this from inside a job:
cat /sys/fs/cgroup/cpuset/slurm/uid_$UID/job_$SLURM_JOB_ID/cpuset.cpus
That lists which CPUs you have access to.
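As a usage sketch, a batch script along these lines prints the cpuset from inside the allocation (the path is the cgroup-v1 layout from the command above; the taskset/nproc cross-checks and the resource requests are illustrative additions, not from the original message):

#!/bin/bash
#SBATCH --cpus-per-task=6
#SBATCH --time=00:05:00

# CPUs the cgroup actually grants this job (cgroup-v1 path as shown above)
cat /sys/fs/cgroup/cpuset/slurm/uid_$UID/job_$SLURM_JOB_ID/cpuset.cpus

# Cross-checks: the affinity mask and CPU count the shell itself sees
taskset -cp $$
nproc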
On 5/14/21 4:40 PM, Renfro, Michael wrote:
Untested, but prior experience with cgroups indicates that if things
are working correctly, even if your code tries to run as many processes as
you have cores, those processes will be confined to the cores you reserve.
On 5/14/21 1:45 am, Diego Zuccato wrote:
Usage reported in Percentage of Total
--------------------------------------------------------------------------------
  Cluster      TRES Name   Allocated     Down  PLND Dow        Idle    Reserved   Reported
--------- -------------- ----------- -------- --------- ----------- ----------- ----------
On 5/14/21 1:45 am, Diego Zuccato wrote:
It just doesn't recognize 'ALL'. It works if I specify the resources.
That's odd, what does this say?
sreport --version
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Untested, but prior experience with cgroups indicates that if things are
working correctly, even if your code tries to run as many processes as you have
cores, those processes will be confined to the cores you reserve.
Try a more compute-intensive worker function that will take some seconds or
longer to complete.
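A minimal, untested sketch in that spirit, using plain busy-loop processes in place of a Python worker (the core count and sleep duration are placeholders): submit it with fewer CPUs than the node has and check that every process reports an affinity limited to the reserved cores.

#!/bin/bash
#SBATCH --cpus-per-task=4
#SBATCH --time=00:02:00

# Deliberately start one busy process per installed core on the node,
# i.e. more processes than this job reserved
for i in $(seq 1 "$(nproc --all)"); do
    yes > /dev/null &
done
sleep 30

# If confinement is working, each PID reports only the reserved cores
for pid in $(jobs -p); do
    taskset -cp "$pid"
done
kill $(jobs -p) 2>/dev/null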
Hi you all,
I'm replying to get notifications for answers to this question. I have a user
whose Python script used almost all CPUs, even though the job was configured to
use only 6 CPUs per task. I reviewed the code, and it doesn't have an explicit
call to multiprocessing or anything similar, so the user is unaware of this
behavior.
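One possible explanation, not confirmed anywhere in this thread: numerical libraries (OpenMP or BLAS backends used by numpy and similar packages) start their own threads even when the script never calls multiprocessing. A hedged sketch of capping those threads in the batch script, assuming that is the cause (my_script.py is a placeholder name):

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OPENBLAS_NUM_THREADS=$SLURM_CPUS_PER_TASK
export MKL_NUM_THREADS=$SLURM_CPUS_PER_TASK
python my_script.py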
Hi,
Frequently all of our GPU nodes (8xGPU each) are in MIXED state and there
is no IDLE node. Some jobs require a complete node (all 8 GPUs) and such
jobs therefore have to wait really long before they can run.
Is there a way of improving this situation? E.g. by not blocking IDLE nodes
with jobs that need only a subset of the GPUs?
Hi Folks,
We are currently running Slurm 20.11.6 with cgroup constraints for
memory and CPU/core. Can the scheduler expose only the requested number of
CPU/core resources to a job? We have some users that employ Python scripts
with the multiprocessing module, and the scripts apparently use more cores
than they requested.
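For reference, per-job core confinement in Slurm normally comes from the task/cgroup plugin; a hedged sketch of the usual settings (these are standard option names, but the combination shown is illustrative and not taken from the poster's cluster):

# slurm.conf
TaskPlugin=task/affinity,task/cgroup
ProctrackType=proctrack/cgroup

# cgroup.conf
ConstrainCores=yes
ConstrainRAMSpace=yes

With that in place, a process that tries to fan out across the whole node is still only scheduled on the cores the job requested.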
XDMoD can give these sorts of stats. I also have some Diamond
collectors we use in concert with Grafana to pull data and plot it, which
is useful for seeing large-scale usage trends:
https://github.com/fasrc/slurm-diamond-collector
-Paul Edmon-
On 5/13/2021 6:08 PM, Sid Young wrote:
Hi All,
Hello, FWIW we did this with gres.conf and slurm.conf:
in node's /etc/slurm/gres.conf:
AutoDetect=off
Name=gpu Type=quadro_k620 File=/dev/nvidia0 CPUs=0-0
Name=gpu Type=nvs_510 File=/dev/nvidia1 CPUs=1-1
Name=gpu Type=nvs_510 File=/dev/nvidia2 CPUs=2-2
in server's slurm.conf:
NodeName=gputesth
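The NodeName line is cut off above; on the slurm.conf side it would typically end up looking something like the following, where the hostname and CPU count are placeholders and only the Gres string follows the gres.conf entries shown:

# slurm.conf (hypothetical completion of the truncated line)
NodeName=gputesthost CPUs=4 Gres=gpu:quadro_k620:1,gpu:nvs_510:2 State=UNKNOWN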
Dear all,
We currently have a single GPU-capable server with 10x RTX2080Ti in it. One of
our research groups wants to replace one of these cards with an RTX3090, but
only if we can give them a higher priority on that particular card.
Is it possible to set up a queue that only includes a specific card?
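Not an answer from the thread, but for what it's worth: typed GRES (as in the gres.conf example earlier in this digest) lets jobs target a specific card model. A hedged sketch, with device paths and type names as placeholders:

# node's gres.conf
Name=gpu Type=rtx3090    File=/dev/nvidia0
Name=gpu Type=rtx2080_ti File=/dev/nvidia1
# ... remaining RTX2080Ti cards ...

# jobs can then request that card explicitly
sbatch --gres=gpu:rtx3090:1 job.sh

Whether the group can also be given higher priority specifically on that card (e.g. via partition or QOS policy) is a separate question this sketch does not cover.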
On 14/05/21 10:24, Ole Holm Nielsen wrote:
Referring to https://slurm.schedmd.com/tres.html, which TRES are defined
on your cluster?
It just doesn't recognize 'ALL'. It works if I specify the resources.
root@str957-cluster:/var/log# sacctmgr show tres
    Type            Name     ID
-------- --------------- ------
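For reference, the explicit-TRES form that works would look something like this, with the TRES names adjusted to whatever sacctmgr show tres actually reports on the cluster:

$ sreport -t percent -T cpu,mem,gres/gpu cluster utilization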
On 14-05-2021 08:52, Diego Zuccato wrote:
On 14/05/2021 08:19, Christopher Samuel wrote:
sreport -t percent -T ALL cluster utilization
"sreport: fatal: No valid TRES given" :(
This works correctly on our cluster:
$ sreport -t percent -T ALL cluster utilization