Many thanks for the very good explanation. We are now implementing a
driver for our self-service portal. One question came up: what is the
best way to handle a user deciding to "cancel" an allocation? Should the
allocation be deleted outright (potentially losing accounting data), or
should its limit be set to the current usage, preventing further usage?
Is there any "best practice" for this?
On 18 July 2017 at 16:39, Thomas M. Payerle <paye...@umd.edu> wrote:
> Basically, you create an allocation account for each of the organizations
> and projects, with each project having as its "parent"
> the allocation account corresponding to the organization it
> belongs to. Then set the appropriate limits on the allocation
> accounts, and create associations for users with the allocation
> accounts for each project (not org) they belong to.
> Typically, to limit CPU-hours one would set GrpCPUMins (older
> Slurm versions) or GrpTRESMins (newer Slurm versions, with the cpu
> TRES set in this parameter). All of the above is set up with the
> sacctmgr command.
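
A minimal sketch of the setup described above, in Python. It builds the sacctmgr calls for one organization, its projects, and the user associations, using GrpTRESMins with the cpu TRES (so it assumes a newer Slurm). The function names, account names, and input shapes are hypothetical; the commands are returned as argument lists rather than executed.

```python
def cpu_hours_to_minutes(cpu_hours: float) -> int:
    """GrpTRESMins is expressed in minutes, so convert CPU-hours."""
    return int(round(cpu_hours * 60))


def limit_command(account: str, cpu_hours: float) -> list[str]:
    """sacctmgr call that sets the CPU-minute limit on one account."""
    minutes = cpu_hours_to_minutes(cpu_hours)
    return [
        "sacctmgr", "-i", "modify", "account",
        "where", f"name={account}",
        "set", f"GrpTRESMins=cpu={minutes}",
    ]


def build_setup_commands(org, org_cpu_hours, projects, members):
    """Return sacctmgr calls for an org, its projects, and their users.

    projects: {project_name: cpu_hours}      (hypothetical input shape)
    members:  {project_name: [user, ...]}    (hypothetical input shape)
    """
    cmds = [["sacctmgr", "-i", "add", "account", f"name={org}"]]
    cmds.append(limit_command(org, org_cpu_hours))
    for project, cpu_hours in projects.items():
        # Each project account has the organization account as its parent.
        cmds.append(["sacctmgr", "-i", "add", "account",
                     f"name={project}", f"parent={org}"])
        cmds.append(limit_command(project, cpu_hours))
        # Users are associated with project accounts, not with the org.
        for user in members.get(project, []):
            cmds.append(["sacctmgr", "-i", "add", "user",
                         f"name={user}", f"account={project}"])
    return cmds
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` on a host where sacctmgr is available.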
> Slurm will then allow users to submit jobs charging against the
> allocation accounts for projects they belong to (i.e. have
> an association with in Slurm's DB), and Slurm will track usage
> and prevent jobs from starting (if the rest of the Slurm config is set
> up properly) if a new job would cause usage to exceed the limit. I have
> not thoroughly tested this, but I believe that if you oversubscribe an
> organization's SUs (i.e. the sum of the SU limits on the projects
> belonging to the org is greater than the SU limit on the org), Slurm
> will ensure both limits are enforced.
> This could be confusing to users, as I believe most utilities for
> examining allocation account "balances" do not handle multiple layers
> (on the todo list for my utility, but not enough manpower SUs :) ).
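
The oversubscription case above can be illustrated with a toy model in Python. This is not Slurm's scheduler logic, only a sketch of the stated belief that a job must fit under the limit of its project account and of every parent account. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Account:
    name: str
    limit_cpu_minutes: int
    used_cpu_minutes: int = 0
    parent: Optional["Account"] = None


def charge(account: Account, cpu_minutes: int) -> None:
    """Record usage on the account and on every ancestor."""
    node = account
    while node is not None:
        node.used_cpu_minutes += cpu_minutes
        node = node.parent


def blocking_account(account: Account, job_cpu_minutes: int) -> Optional[str]:
    """Return the first account whose limit the job would exceed, else None."""
    node = account
    while node is not None:
        if node.used_cpu_minutes + job_cpu_minutes > node.limit_cpu_minutes:
            return node.name
        node = node.parent
    return None


# Oversubscribed org: project limits sum to 160, org limit is 100.
org = Account("org", 100)
proj_a = Account("proj_a", 80, parent=org)
proj_b = Account("proj_b", 80, parent=org)
charge(proj_a, 60)
charge(proj_b, 30)  # org usage is now 90

print(blocking_account(proj_b, 20))  # org: proj_b has room, the org does not
print(blocking_account(proj_b, 10))  # None: fits under both limits
print(blocking_account(proj_a, 25))  # proj_a: its own limit is hit first
```

The first case is the confusing one for users: the project "balance" looks healthy, yet the job is held because of the parent.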
> To my knowledge, Slurm makes no attempt to integrate with external
> user databases of any kind, so if you wanted something a bit more
> automated you would need to write your own cron job or something to check
> for users added to/removed from any of the projects you have allocations
> for and issue the sacctmgr commands to update the Slurm DB. The complexity
> depends on how complex your environment is (how many Slurm partitions,
> etc.), but should not be too bad.
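
A minimal sketch of the core of such a cron job, in Python: given the desired membership (from the external source) and the current one (from Slurm), it works out which sacctmgr calls are needed. How the two inputs are fetched is left out, and the function name and input shapes are assumptions for illustration.

```python
def plan_membership_changes(desired, current):
    """Return sacctmgr calls that bring Slurm in line with `desired`.

    desired: {project: set of users} from the external user database
    current: {project: set of users} as currently associated in Slurm
    Both shapes are hypothetical; fetching them is site-specific.
    """
    cmds = []
    for project in sorted(set(desired) | set(current)):
        want = desired.get(project, set())
        have = current.get(project, set())
        # Users present upstream but missing in Slurm get an association.
        for user in sorted(want - have):
            cmds.append(["sacctmgr", "-i", "add", "user",
                         f"name={user}", f"account={project}"])
        # Users no longer in the project lose their association.
        for user in sorted(have - want):
            cmds.append(["sacctmgr", "-i", "delete", "user",
                         f"name={user}", f"account={project}"])
    return cmds
```

Sorting the projects and users keeps the output deterministic, which makes it easy to log or dry-run the plan before applying it.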
> I have Slurm-Sacctmgr and Slurm-Sshare Perl modules on CPAN which
> provide Perl-friendly wrappers to the sacctmgr and sshare commands
> which might be helpful for such a script (I also have similar modules
> for a few other Slurm utilities which I have not had time to polish
> enough for submission to CPAN).
> On Tue, 18 Jul 2017, Ilja Livenson wrote:
>> Newbie here.
>> We are working on integrating SLURM with our self-service portal. Our
>> model is as follows:
>> - a user can belong to one or more projects.
>> - projects belong to a particular organization.
>> - limits on usage can be set at the project and organization level (e.g.
>> 1000 CPU-hours per month in project A and 2000 CPU-hours in project B).
>> - information about user membership and limits is available over several
>> protocols, including REST/LDAP/FreeIPA.
>> - a user logs in to a SLURM submission node and submits a job.
>> Now, we are not quite clear on what the correct approach is for enforcing
>> those limits in SLURM. Perhaps someone wiser could suggest an approach or
>> point to a project where something similar has been achieved? Googling
>> didn't help much with the latter.
> Tom Payerle
> DIT-ATI-Research Computing paye...@umd.edu
> 4254 Stadium Dr (301) 405-6135
> University of Maryland
> College Park, MD 20742-4111