Re: [slurm-users] Exposing only requested CPUs to a job on a given node.

2021-05-14 Thread Ryan Cox
You can check with something like this inside of a job: cat /sys/fs/cgroup/cpuset/slurm/uid_$UID/job_$SLURM_JOB_ID/cpuset.cpus. That lists which CPUs you have access to. On 5/14/21 4:40 PM, Renfro, Michael wrote: Untested, but prior experience with cgroups indicates that if things are worki
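The same check can be sketched in Python. This is a minimal sketch that assumes the cgroup v1 cpuset path quoted in the message exists on the node; parse_cpu_list and job_cpuset_path are hypothetical helper names for illustration, not part of Slurm.

```python
import os


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-3,8' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty file or stray comma
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def job_cpuset_path():
    """Build the cgroup v1 path quoted in the message (assumed layout)."""
    uid = os.getuid()
    job_id = os.environ.get("SLURM_JOB_ID", "")
    return f"/sys/fs/cgroup/cpuset/slurm/uid_{uid}/job_{job_id}/cpuset.cpus"


if __name__ == "__main__":
    path = job_cpuset_path()
    try:
        with open(path) as fh:
            print(sorted(parse_cpu_list(fh.read())))
    except OSError as exc:
        # Outside a job, or on a different cgroup layout, the file is absent.
        print(f"could not read {path}: {exc}")
```

Run inside a job step, this prints the CPU ids the job's cpuset allows; anywhere else it only reports that the file could not be read.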

Re: [slurm-users] Determining Cluster Usage Rate

2021-05-14 Thread Christopher Samuel
On 5/14/21 1:45 am, Diego Zuccato wrote: Usage reported in Percentage of Total   Cluster  TRES Name    Allocated Down PLND Dow    Idle Reserved Reported - --

Re: [slurm-users] Determining Cluster Usage Rate

2021-05-14 Thread Christopher Samuel
On 5/14/21 1:45 am, Diego Zuccato wrote: It just doesn't recognize 'ALL'. It works if I specify the resources. That's odd, what does this say? sreport --version All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Exposing only requested CPUs to a job on a given node.

2021-05-14 Thread Renfro, Michael
Untested, but prior experience with cgroups indicates that if things are working correctly, even if your code tries to run as many processes as you have cores, those processes will be confined to the cores you reserve. Try a more compute-intensive worker function that will take some seconds or
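A minimal Python sketch of how a script could see this confinement for itself and size its worker pool to match. It assumes Linux, where os.sched_getaffinity is available; usable_cpu_count and burn are hypothetical names for illustration, and the workload is arbitrary.

```python
import os
from multiprocessing import Pool


def usable_cpu_count():
    """Return the number of CPUs this process is allowed to run on."""
    # os.sched_getaffinity is Linux-only; fall back to os.cpu_count() elsewhere.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def burn(n):
    """Hypothetical CPU-bound worker: sum of squares below n."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    workers = usable_cpu_count()
    # os.cpu_count() reports every CPU on the node, not only the allowed ones.
    print(f"os.cpu_count() = {os.cpu_count()}, usable = {workers}")
    with Pool(processes=workers) as pool:
        results = pool.map(burn, [200_000] * workers)
    print(f"ran {len(results)} tasks on {workers} workers")
```

A script that sizes its pool from os.cpu_count() still starts one process per core on the node. If the confinement works, those processes share the reserved cores, so sizing from the affinity mask avoids the oversubscription.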

Re: [slurm-users] Exposing only requested CPUs to a job on a given node.

2021-05-14 Thread Rodrigo Santibáñez
Hi all, I'm replying so that I get notified of answers to this question. I have a user whose Python script used almost all CPUs even though the job was configured to use only 6 CPUs per task. I reviewed the code, and it doesn't have an explicit call to multiprocessing or similar. So the user is unaware of this behavi

[slurm-users] schedule mixed nodes first

2021-05-14 Thread Durai Arasan
Hi, Frequently all of our GPU nodes (8xGPU each) are in MIXED state and there is no IDLE node. Some jobs require a complete node (all 8 GPUs), and such jobs therefore have to wait a very long time before they can run. Is there a way of improving this situation? E.g. by not blocking IDLE nodes with jobs

[slurm-users] Exposing only requested CPUs to a job on a given node.

2021-05-14 Thread Luis R. Torres
Hi Folks, We are currently running SLURM 20.11.6 with cgroup constraints for memory and CPU/Core. Can the scheduler expose only the requested number of CPU/Core resources to a job? We have some users that employ Python scripts with the multiprocessing module, and the scripts apparently use

Re: [slurm-users] Determining Cluster Usage Rate

2021-05-14 Thread Paul Edmon
XDMod can give these sorts of stats.  I also have some diamond collectors we use in concert with grafana to pull data and plot it which is useful for seeing large scale usage trends: https://github.com/fasrc/slurm-diamond-collector -Paul Edmon- On 5/13/2021 6:08 PM, Sid Young wrote: Hi All,

Re: [slurm-users] Different GPU types on the same server

2021-05-14 Thread David Gauchard
Hello, FWIW we did this with gres.conf and slurm.conf: in node's /etc/slurm/gres.conf: AutoDetect=off Name=gpu Type=quadro_k620 File=/dev/nvidia0 CPUs=0-0 Name=gpu Type=nvs_510 File=/dev/nvidia1 CPUs=1-1 Name=gpu Type=nvs_510 File=/dev/nvidia2 CPUs=2-2 in server's slurm.conf: NodeName=gputesth

[slurm-users] Different GPU types on the same server

2021-05-14 Thread Emyr James
Dear all, We currently have a single GPU-capable server with 10x RTX2080Ti in it. One of our research groups wants to replace one of these cards with an RTX3090 but only if we can give them a higher priority on that particular card. Is it possible to set up a queue that only includes a specifi

Re: [slurm-users] Determining Cluster Usage Rate

2021-05-14 Thread Diego Zuccato
On 14/05/21 10:24, Ole Holm Nielsen wrote: Referring to https://slurm.schedmd.com/tres.html, which TRES are defined on your cluster? It just doesn't recognize 'ALL'. It works if I specify the resources. root@str957-cluster:/var/log# sacctmgr show tres TypeName ID -

Re: [slurm-users] Determining Cluster Usage Rate

2021-05-14 Thread Ole Holm Nielsen
On 14-05-2021 08:52, Diego Zuccato wrote: On 14/05/2021 08:19, Christopher Samuel wrote: sreport -t percent -T ALL cluster utilization "sreport: fatal: No valid TRES given" :( This works correctly on our cluster: $ sreport -t percent -T ALL cluster utilization