Hi Danny,
Apologies for top posting on this message, but it might be easier to do so.
We had a think about the previous responses in this thread and decided that we
could cook up some scripts to wrap up the functionality and give us a simple
banking system: just enough to provide a credit/debit system for
groups/projects that relies on slurm and nothing else.
Please take a look at
https://github.com/jcftang/slurm-bank
SLURM Bank is a collection of wrapper scripts that give slurm GOLD-like
capabilities for managing resources.
With the scripts we are able to provide a simple banking system where we can
deposit hours into an account. Users are associated with these accounts and
draw on them to run jobs. If users do not have an account, or if they do not
have hours in their account, then they cannot run jobs.
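To make the model concrete, here is a minimal sketch in Python of the
deposit/balance logic described above. It is not the slurm-bank code itself
(that is bash and perl): the Account class and the function names are made up
for illustration, and the sacctmgr command line is only built as a string,
never executed. The one idea it leans on from the thread below is that
GrpCPUMins is an absolute value, so a deposit means computing the new total
and setting that.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical in-memory stand-in for a slurm account (not a slurm API)."""
    name: str
    limit_cpu_mins: int = 0          # the value GrpCPUMins would be set to
    used_cpu_mins: int = 0           # usage accumulated by jobs so far
    members: set = field(default_factory=set)


def deposit(account: Account, hours: int) -> str:
    """Add hours to the account and return the sacctmgr command that would
    apply the new absolute limit. The command is returned, not executed."""
    if hours <= 0:
        raise ValueError("deposit must be a positive number of hours")
    account.limit_cpu_mins += hours * 60
    # Assumed invocation, modelled on 'sacctmgr modify ... set GrpCPUMins=XXX'.
    return (f"sacctmgr -i modify account name={account.name} "
            f"set GrpCPUMins={account.limit_cpu_mins}")


def balance_hours(account: Account) -> float:
    """Remaining hours: deposited limit minus accumulated usage."""
    return (account.limit_cpu_mins - account.used_cpu_mins) / 60


def can_run(account: Account, user: str) -> bool:
    """A user may run jobs only if they belong to the account and it has hours."""
    return user in account.members and balance_hours(account) > 0
```

Usage would look like deposit(Account("proj1"), 100), which raises the limit
by 6000 CPU minutes and returns the command an admin would then run.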
Requirements (tested with)
• SLURM 2.2.0
• Scientific Linux 5.4 (bash, rsync, perl)
Here are some notes from when we were cooking up ideas:
http://thammuz.tchpc.tcd.ie/mirrors/slurm-bank/slurm-bank-1.1.1.2/html/design.html
http://thammuz.tchpc.tcd.ie/mirrors/slurm-bank/slurm-bank-1.1.1.2/html/walkthrough.html
(this was just me thinking things through)
The system is pretty simple and dumb but it does appear to work (at least at
our site). Most of the complicated problems, such as refunding hours after
failed jobs or system downtime, are left out; we think these should probably
be a user/people issue rather than a technical one. We're planning on rolling
out these scripts into full production use in a few weeks' time and we hope
that they will be of use to others.
The full documentation for the scripts is at
http://thammuz.tchpc.tcd.ie/mirrors/slurm-bank/slurm-bank-1.1.1.2/html/index.html
and a tarball for *usage* can be got from
http://thammuz.tchpc.tcd.ie/mirrors/slurm-bank/slurm-bank-1.1.1.2/
I think slurm, together with slurm-bank, provides enough reporting and
transaction logging functionality for us to migrate completely away from a
setup which requires GOLD and maui.
Regards,
Jimmy Tang
--
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/
On 11 May 2011, at 15:34, Danny Auble wrote:
> Hey Paddy,
>
>> Hi Danny, Mark,
>>
>> Just to follow up on this question a little..
>>
>>>> At the moment we are using SLURM with Moab (for scheduling) and Gold
>>>> (for accounting), but I'm having a look at whether we can move to an
>>>> all-SLURM setup to do the same thing.
>>
>> We have the same setup (but with maui instead of moab), and would also like
>> to move to an all-SLURM setup if possible.
>>
>> Banking/reporting is a requirement for our setup, due to the nature of our
>> funding.
>
> That makes sense.
>
>>
>>>> Using this fairly simple setup I can enforce user's jobs to be run
>>>> against only accounts that they're associated with. The next thing I
>>>> need to do (that I'm currently stuck on) is work out how to assign
>>>> quotas to these accounts (a number of Core-Hours if you like) that
>>>> decrease by an appropriate amount every time a job is run. The
>>>> documentation often refers to accounts as "bank accounts" which makes
>>>> me think that this can be done and I just haven't worked out how to yet.
>>>
>>> What documentation are you looking at is one question. You can
>>> probably look at
>>> https://computing.llnl.gov/linux/slurm/accounting.html to get a good
>>> idea how to do what you want. Each association and QOS have a
>>> GrpCPUMin limit. If you decide you don't want to do fairshare (the
>>> preferred way of doing things) you can do the hard limit stuff using
>>> this limit along with the priority/multifactor plugin explained
>>> here https://computing.llnl.gov/linux/slurm/priority_multifactor.html.
>>>
>>> Look primarily at the PriorityDecayHalfLife and
>>> PriorityUsageResetPeriod options for the slurm.conf.
>>
>> The functionality that I think SLURM will do easily is this:
>>
>> * have individual associations/accounts, with one or more users in each
>> * potentially hierarchical accounts (not a biggie at present)
>
> Yes, SLURM does both of these really well.
>
>>
>>
>> The (Gold) functionality that I'm not sure about:
>>
>> * being able to easily `deposit' resources (e.g. CPU hours) into accounts --
>> although maybe 'sacctmgr modify ... set GrpCPUMins=XXX' will do that (it
>> would be nice if the value can be additive/subtractive, rather than
>> absolute)
>
> Yes, currently there is only the absolute, but adding or subtracting could be
> added. Right now you can alter the amount of accumulated time for a
> particular association tree starting at either a user or account using
> sacctmgr (see sacctmgr modify user/account set RawUsage=). Currently this
> will only set the usage to zero, but it could also be altered to support
> various alterations.
>
>>
>> * being able to see the current `balance' for a given account, for both the
>> user and the site admin
>
> sshare will currently give you usage stats with respect to fairshare usage.
> This could also be altered if the cluster is set up to do this kind of
> accounting to print out the limit values and such instead of the fairshare
> information.
>
>>
>> * lifetimes/deadlines for deposited CPU hours (e.g. a project might only
>> have a lifetime of 6 months)
>
> sacctmgr will display these by listing the associations. But this could also
> be part of the sshare changes since it would be nice if the user only had to
> learn one tool.
>
>>
>> * (optional) different charge rates based on node features (e.g. if most
>> nodes are uniform, but you have a subset of large memory or more cores; or
>> different rates per partitions)
>
> You can do something like this with QOS using the UsageFactor option. This
> idea currently doesn't exist at the node level, but it could probably be
> added as well.
>
>>
>> * (optional) having an extra charge for reservations, and/or a charge for
>> un-used hours in a reservation
>
> Same idea here. No usage factor but it could probably be added.
>
>>
>> * (optional) the equivalent of Gold's audit trail, of being able to see the
>> history of when the account/association was updated
>
> sacctmgr list transactions
>
>>
>>
>> I'm thinking that setting PriorityDecayHalfLife=0 so that the resources (e.g.
>> GrpCPUMins) are absolute will get us most of the way there, but I'm not sure
>> about these other requirements.
>>
>> Any ideas about these features? Are they already there in slurm?
>
> I think you are close to what you need, there are a few things missing, but
> you may be able to get around them by just using something different than you
> have in the past.
>
> Let me know if you have any other questions,
> Danny
>
>>
>> Thanks,
>> Paddy
>>
>> --
>> Paddy Doyle
>> Trinity Centre for High Performance Computing,
>> Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
>> http://www.tchpc.tcd.ie/
>>
>>
>> ----- End forwarded message -----
>>
>>
>>
>