Mucking about some more this morning, I found that I had to add an
Account, attach all the QOS to that Account, and add a User to the
database as a member of that Account.
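The setup described above can be scripted roughly as follows. This is a minimal sketch, not a tested recipe: it assumes the cluster is already registered in slurmdbd and the QOS already exists, and the helper name setup_association and the example names are hypothetical. The -i flag makes sacctmgr commit without its confirmation prompt.

```shell
#!/usr/bin/env bash
# Sketch: create the account/QOS/user association needed before jobs are accepted.
# ASSUMPTIONS: cluster already registered in slurmdbd, QOS already created,
# setup_association is a hypothetical helper name.
set -euo pipefail

# setup_association ACCOUNT QOS_LIST USER
#   QOS_LIST may be comma separated, e.g. "test,long".
setup_association() {
  local account=$1 qos_list=$2 user=$3
  # -i commits immediately instead of asking for confirmation.
  sacctmgr -i add account "$account" Description="$account" Organization="$account"
  # Attach the QOS list to the account; user associations created under it
  # are expected to inherit this list unless they set their own.
  sacctmgr -i modify account where name="$account" set QOS="$qos_list"
  # Each submitting user needs an association with the account.
  sacctmgr -i add user "$user" Account="$account"
}
```

For example, `setup_association all test bill` would create account "all", attach QOS "test", and add user "bill" to it.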
So what this tells me is that each and every user needs an entry in
this database before any kind of limits can be imposed. This certainly
is an incentive for a master DB which serves several clusters. Yet it
seems overkill given that all users are basically just
using the QOS for job limits in the exact same way. Allowing any user
to assign themselves to any Account, provided the account allowed this,
would suit my needs much better than micro-managing each user individually.
I will play by the rules as written and see where that takes us.
Thanks for all your responses thus far.
Sincerely,
Bill
On 02/18/2014 05:17 PM, Bill Wichser wrote:
Okay then, what's a novice to do now? Lol. I have the QOS defined but
expected it to just work. Obviously I'm not yet assigning something in
the database that applies these QOS settings to users, or else setting
some default. I'd really, really rather not have to set each and every user.
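If every user does need an association, one way to avoid entering them by hand is to loop over a list. This is a minimal sketch, assuming the account already exists with its QOS attached; add_users is a hypothetical helper name, and where the user list comes from is left to the site.

```shell
#!/usr/bin/env bash
# Sketch: bulk-add users to an existing account instead of one at a time.
# ASSUMPTIONS: the account already exists with its QOS attached;
# add_users is a hypothetical helper name.
set -euo pipefail

# add_users ACCOUNT USER [USER...]
add_users() {
  local account=$1 user
  shift
  for user in "$@"; do
    # One association per user; -i skips the confirmation prompt.
    # Report a failure (e.g. the user may already exist) and keep going.
    sacctmgr -i add user "$user" Account="$account" \
      || printf 'could not add %s (may already exist)\n' "$user" >&2
  done
}
```

A possible call is `add_users all $(getent passwd | awk -F: '$3 >= 1000 { print $1 }')`, assuming regular users have UIDs of 1000 or more and that getent can enumerate the user directory on that host.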
Thanks!
Bill
On 02/18/2014 04:31 PM, Danny Auble wrote:
Yep, that will do it. What happened makes more sense now.
Doing what Lyn proposed would do the same thing for pending jobs that
are outside of the limit.
I would suggest trying changes on a test cluster (perhaps just on your
desktop) before pushing it into the wild.
On 02/18/14 13:22, Bill Wichser wrote:
Let's start again if I may, this time on a not-yet-in-production cluster.
Version is 2.6.5, OS is RH6.
Single partition.
AccountingStorageEnforce=qos
sacctmgr add qos test priority=1000 MaxNodesPerJob=2 MaxCpusPerJob=40 \
    MaxJobsPerUser=2 MaxCpusPerUser=8 \
    Flags=DenyOnLimit,EnforceUsageThreshold
My job_submit.lua script chooses the QOS by walltime and assigns it.
Without AccountingStorageEnforce=qos, the jobs actually run, with
no limits being imposed. The correct QOS is indeed assigned.
With AccountingStorageEnforce=qos set, I cannot submit.
$ sbatch test.slurm
sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified
From slurmctld.log:
[2014-02-18T16:13:16.714] job_submit.lua: slurm_job_submit: job from uid 14119, setting default account value: all
[2014-02-18T16:13:16.714] job_submit.lua: slurm_job_submit: job from uid 14119, new qos value: test
[2014-02-18T16:13:16.714] _job_create: invalid account or partition for user 14119, account 'all', and partition 'all'
[2014-02-18T16:13:16.714] _slurm_rpc_submit_batch_job: Invalid account or account/partition combination specified
And all pending jobs in the queue now have an InvalidAccount reason.
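A quick way to check the two things behind that error is sketched below. As I read the slurm.conf documentation, enforcing qos also turns on association enforcement, so a user with no association for account 'all' is rejected even though the QOS itself is valid. The helper names enforce_setting and has_association are hypothetical; the commands they wrap (scontrol show config, sacctmgr show associations with -n and -P) are standard.

```shell
#!/usr/bin/env bash
# Diagnostic sketch for "Invalid account or account/partition combination".
# enforce_setting and has_association are hypothetical helper names.
set -euo pipefail

# Print the AccountingStorageEnforce value the running slurmctld reports.
# "scontrol show config" prints lines of the form "Key = value".
enforce_setting() {
  scontrol show config | awk -F' *= *' '$1 == "AccountingStorageEnforce" { print $2 }'
}

# Succeed if the accounting database has an association for USER under ACCOUNT.
# -n drops the header and -P gives "|"-separated output, so any row means a match.
has_association() {
  local user=$1 account=$2 rows
  rows=$(sacctmgr -n -P show associations where user="$user" account="$account" format=User,Account)
  [[ -n "$rows" ]]
}
```

For the job in the log, something like `has_association "$(getent passwd 14119 | cut -d: -f1)" all || echo "no association"` would show whether uid 14119 has an entry under account 'all'.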
Bill