Trey,

This sounds like a good starting point. I’ll have a look next week and see 
whether this gets me anywhere.

The reason for this complicated setup is that the cluster was bought by several 
research groups pooling their grants. The GPU nodes were approximately twice as 
expensive as the CPU-only nodes. To be fair to all groups, a job using 20 CPU 
cores (one full node) should cost about the same as a job using one CPU core 
plus two GPUs. Currently the GPU job only counts as 1/20 of a node in the 
fairshare accounting.

Cheers,

Jens

On 27 Jun 2014, at 17:24 , Trey Dockendorf <[email protected]> wrote:

> 
> Jens,
> 
> If I understand you correctly, you're wishing to update the SlurmDBD after a 
> job completes, with a modified "usage" based on some criteria?
> 
> My guess is that this could be done with a plugin, but I'm unsure how.
> 
> If you wish to modify the job BEFORE it runs, which should then upload the 
> correct accounting data upon completion, you could try using the 
> "JobSubmitPlugins" parameter in slurm.conf.
> 
> For example:
> 
> JobSubmitPlugins=lua
> 
> Then in /etc/slurm/job_submit.lua you use some logic that says "if gpu 
> needed, increase max_cpus by 10".
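> 
> A minimal sketch of such a script (untested; the exact job_desc field 
> names, e.g. "gres" and "max_cpus", follow the linked examples and may 
> vary between Slurm versions):
> 
>    -- /etc/slurm/job_submit.lua (sketch)
>    function slurm_job_submit(job_desc, part_list, submit_uid)
>       -- if the job requests GPUs, pad its CPU request so the
>       -- fairshare charge reflects the GPU usage
>       if job_desc.gres ~= nil and string.find(job_desc.gres, "gpu") then
>          job_desc.max_cpus = job_desc.max_cpus + 10
>       end
>       return slurm.SUCCESS
>    end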
> 
> The SLURM source contains an example job_submit.lua [1].  I also found 
> another good example [2] using Google.
> 
> Might not be the approach you're looking for, but hopefully it sparks an idea :)
> 
> - Trey
> 
> [1] - https://github.com/SchedMD/slurm/blob/master/contribs/lua/job_submit.lua
> [2] - 
> https://github.com/edf-hpc/slurm-llnl-misc-plugins/blob/master/job_submit.lua
> 
> =============================
> 
> Trey Dockendorf 
> Systems Analyst I 
> Texas A&M University 
> Academy for Advanced Telecommunications and Learning Technologies 
> Phone: (979)458-2396 
> Email: [email protected] 
> Jabber: [email protected]
> 
> ----- Original Message -----
>> From: "Jens Svalgaard Kohrt" <[email protected]>
>> To: "slurm-dev" <[email protected]>
>> Sent: Friday, June 27, 2014 9:18:06 AM
>> Subject: [slurm-dev] Slurm GPU accounting
>> 
>> 
>> Hi,
>> 
>> We are trying to set up GPU accounting in a mixed environment with 12
>> CPU-only nodes and 12 nodes that each have two GPUs. All nodes have 20
>> CPU cores.
>> 
>> Jobs are submitted to a partition containing all nodes, and are
>> allocated as
>> * if a GPU is needed: on the GPU nodes
>> * if no GPU is needed: on any node (but only on GPU nodes if all CPU
>> nodes are in use)
>> 
>> Everything seems to work, apart from the fact that the GPUs are "free to
>> use" as far as Slurm's fairshare accounting is concerned.
>> 
>> Is it somehow possible to set this up such that, accounting-wise,
>> getting a GPU corresponds to getting, e.g., 10 CPU cores extra?
>> Using Google I've only been able to find GPU accounting mentioned as
>> future work.
>> 
>> In an ideal world it would be nice to be able to write a job
>> submit/completion script that, given information about the
>> requested/allocated
>> * # CPU cores
>> * # GPUs
>> * memory
>> * QOS
>> * maximum/actual running time
>> calculates the cost of running the job and updates the SlurmDBD
>> database.
>> In my particular context, only something like this is needed:
>> 
>>   cost_of_job = time_used * (total_cpus + 10*total_gpus)
>> 
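>> The same formula as a tiny Lua sketch (a hypothetical helper, not part
>> of Slurm itself):
>> 
>>   -- cost of a job: each GPU is counted as 10 extra CPU cores
>>   local function job_cost(time_used, total_cpus, total_gpus)
>>      return time_used * (total_cpus + 10 * total_gpus)
>>   end
>>   -- e.g. a 2-hour job using 4 CPU cores and 2 GPUs:
>>   -- job_cost(2*3600, 4, 2)  --> 172800
>> 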
>> Can somebody give a hint on how to do this (if possible)?
>> If not, maybe point me to where in the slurm source code I should
>> start digging?
>> 
>> Thanks!
>> 
>> Jens
