Mucking about some more this morning, I found that adding an Account and adding all the QOS to that account was required, along with adding a User to the database which gets to be a part of that Account.
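
For anyone landing on this thread later, the steps described above might look roughly like the sacctmgr calls below. This is only a sketch under assumptions: the account name "all" and the QOS name "test" are taken from the log and the command quoted further down, the user name "bill" and the account description are placeholders, and the cluster is assumed to be registered in the accounting database already. The small run wrapper is hypothetical; it just prints each command unless DRY_RUN=0 is set, so the sequence can be reviewed first.

```shell
#!/bin/sh
# Sketch only: prints the sacctmgr commands unless DRY_RUN=0 is set.
# "all" and "test" appear in this thread; "bill" is a placeholder user name.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

ACCOUNT=all
QOS=test
USER_NAME=bill

# 1. Create the account that job_submit.lua assigns by default.
run sacctmgr -i add account name="$ACCOUNT" description=default

# 2. Allow the QOS on that account.
run sacctmgr -i modify account where name="$ACCOUNT" set qos+="$QOS"

# 3. Add the user to the account, which creates the association.
run sacctmgr -i add user name="$USER_NAME" account="$ACCOUNT"
```

With the default DRY_RUN=1 nothing is changed in the database; the three commands are only printed.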

So what this tells me is that each and every user needs an entry in this database before any kind of limits can be imposed. This certainly is an incentive for a master DB which serves multiple different clusters. Yet it seems overkill given that all users are basically just using the QOS for job limits in the exact same way. Allowing any user to assign themselves to any Account, provided the account allowed this, would suit my needs much better than micro-managing each user individually.
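
One way to soften the per-user bookkeeping is to generate the "add user" commands from the password database rather than typing them one at a time. The following is a sketch, not a tested site procedure: gen_user_cmds is a hypothetical helper, the minimum uid of 500 is an assumption for RH6 regular accounts, and the account name "all" is the one seen in the log below. It only prints commands, so the list can be checked (and system accounts with high uids weeded out) before anything is run.

```shell
#!/bin/sh
# Hypothetical helper: read passwd-format lines on stdin and print one
# "sacctmgr add user" command per login whose uid is >= the given minimum.
gen_user_cmds() {
    account="$1"
    min_uid="${2:-500}"   # assumed cutoff for regular users on RH6
    awk -F: -v acct="$account" -v min="$min_uid" \
        '$3 >= min { printf "sacctmgr -i add user name=%s account=%s\n", $1, acct }'
}

# Example usage (prints commands only; review, then pipe to sh if they look right):
#   getent passwd | gen_user_cmds all 500
```

Since every user ends up in the same account with the same QOS, the generated list is uniform and can be regenerated whenever new logins appear.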

I will play by the rules as written and see where this takes us.

Thanks for all your responses thus far.

Sincerely,
Bill

On 02/18/2014 05:17 PM, Bill Wichser wrote:

Okay then, what's a novice to do now? Lol.  I have the QOS defined but
expected it to just work. Obviously I'm not assigning something in the
database yet which needs to set these QOS factors for users.  Or else
set some default. I'd really, really rather not have to set each and every user.

Thanks!
Bill

On 02/18/2014 04:31 PM, Danny Auble wrote:

Yeap, that will do it.  It makes more sense now what happened.

Doing what Lyn proposed would do the same thing for pending jobs that
are outside of the limit.

I would suggest trying changes on a test cluster (perhaps just on your
desktop) before pushing it into the wild.



On 02/18/14 13:22, Bill Wichser wrote:

Let's start again if I may, this time on a not-yet-in-production cluster.

Version is 2.6.5, OS is RH6.

Single partition.
AccountingStorageEnforce=qos

sacctmgr add qos test priority=1000 MaxNodesPerJob=2 MaxCpusPerJob=40 \
    MaxJobsPerUser=2 MaxCpusPerUser=8 \
    Flags=DenyOnLimit,EnforceUsageThreshold

My job_submit.lua script chooses the qos by walltime and assigns.


Without the AccountingStorageEnforce=qos, the jobs actually run, with
no limits being imposed.  The correct qos is indeed assigned.

With AccountingStorageEnforce=qos set, I cannot submit.

$ sbatch test.slurm
sbatch: error: Batch job submission failed: Invalid account or
account/partition combination specified

From slurmctld.log
[2014-02-18T16:13:16.714] job_submit.lua: slurm_job_submit: job from
uid 14119, setting default account value: all
[2014-02-18T16:13:16.714] job_submit.lua: slurm_job_submit: job from
uid 14119, new qos value: test
[2014-02-18T16:13:16.714] _job_create: invalid account or partition
for user 14119, account 'all', and partition 'all'
[2014-02-18T16:13:16.714] _slurm_rpc_submit_batch_job: Invalid
account or account/partition combination specified

And all pending jobs in the queue now have an InvalidAccount reason.

Bill
