Hi Bill,

Here are a couple of ideas:
- At job end, compare each job's memory specification against its actual use and follow up with the offending users.
- Configure DefaultMemPerCPU and MaxMemPerCPU to match the CPU and memory allocations (e.g. if a node has 8 CPUs and 8GB, set both parameters to 1G). Then if someone requests all of the memory on a node, they will be allocated all of its CPUs as well.
- In a job_submit plugin, set a nice value for jobs requesting a lot of memory (i.e. lower their scheduling priority); a rough sketch is below.
- You could configure MaxMemPerNode, but that would probably impact users who really need a lot of memory.
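A rough sketch of the job_submit approach (the 6GB threshold and the nice value are placeholders, and the job_desc field that holds --mem varies by release: pn_min_memory on older ones, min_mem_per_node on newer ones, so check the example job_submit.lua shipped with your Slurm version):

    -- job_submit.lua sketch: lower the scheduling priority of jobs that
    -- request a lot of memory per node. Field names and sentinels below
    -- are assumptions; verify them against your release's job_submit.lua.
    BIG_MEM_MB   = 6 * 1024   -- site-specific cutoff for "big memory"
    NICE_PENALTY = 1000       -- nice value applied to such jobs

    function slurm_job_submit(job_desc, part_list, submit_uid)
       local mem = job_desc.min_mem_per_node  -- --mem in MB; nil/NO_VAL64 if unset
       if mem ~= nil and mem ~= slurm.NO_VAL64 and mem > BIG_MEM_MB then
          job_desc.nice = NICE_PENALTY
          slurm.log_info("job_submit: uid " .. submit_uid .. " requested " ..
                         mem .. " MB/node, setting nice=" .. NICE_PENALTY)
       end
       return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
       return slurm.SUCCESS
    end

The same hook could return an error instead of adjusting nice if you decide to reject such jobs outright.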

Moe Jette
SchedMD

Quoting Bill Wichser <[email protected]>:

In doing accounting on past jobs, we are trying to figure out how to account for memory usage as well as core usage. What began as an anomaly has now turned into something my users have found works quite effectively for their jobs: adding the line

#SBATCH --mem=MaxMemPerNode

We do share our nodes, so this is an unacceptable specification.

Before going down the path of adding yet another check to the job_submit.lua script, I am wondering if there isn't a better way. Currently I do not have this value configured, so "scontrol show config" reports it as UNLIMITED, which is not at all what I want. Ideally I'd set it to some small value, but I suspect that would have repercussions later on, when users who really do need a lot of memory request an amount that exceeds the low MaxMemPerNode value I'd set.


Yes, I could just inform my users that this is unacceptable behavior. But we all know that without policing it will arise again, so I'd much rather deal with this once and for all, either by adding the "right" value to slurm.conf or by rejecting jobs that use this variable altogether.

Thanks,
Bill
