Okay then, what's a novice to do now? Lol. I have the QOS defined but
expected it to just work. Obviously I'm not assigning something in the
database yet which needs to set these QOS factors for users. Or else
set some default. I'd really, really rather not have to set this for each
and every user.
Thanks!
Bill
On 02/18/2014 04:31 PM, Danny Auble wrote:
Yep, that will do it. It makes more sense now what happened.
Doing what Lyn proposed would do the same thing for pending jobs that
are outside of the limit.
I would suggest trying changes on a test cluster (perhaps just on your
desktop) before pushing it into the wild.
On 02/18/14 13:22, Bill Wichser wrote:
Let's start again if I may, this time on a not-yet-in-production cluster.
Version is 2.6.5, OS is RH6.
Single partition.
AccountingStorageEnforce=qos
sacctmgr add qos test priority=1000 MaxNodesPerJob=2 MaxCpusPerJob=40 \
    MaxJobsPerUser=2 MaxCpusPerUser=8 \
    Flags=DenyOnLimit,EnforceUsageThreshold
My job_submit.lua script chooses the qos by walltime and assigns it.
Without the AccountingStorageEnforce=qos, the jobs actually run, with
no limits being imposed. The correct qos is indeed assigned.
With AccountingStorageEnforce=qos set, I cannot submit.
$ sbatch test.slurm
sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified
From slurmctld.log:
[2014-02-18T16:13:16.714] job_submit.lua: slurm_job_submit: job from uid 14119, setting default account value: all
[2014-02-18T16:13:16.714] job_submit.lua: slurm_job_submit: job from uid 14119, new qos value: test
[2014-02-18T16:13:16.714] _job_create: invalid account or partition for user 14119, account 'all', and partition 'all'
[2014-02-18T16:13:16.714] _slurm_rpc_submit_batch_job: Invalid account or account/partition combination specified
And all pending jobs in the queue now have an InvalidAccount reason.
Bill
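
A possible way forward, sketched under assumptions rather than confirmed
anywhere in this thread: as the slurm.conf documentation describes it,
AccountingStorageEnforce=qos also turns on association enforcement, so the
"invalid account" error above would be consistent with uid 14119 having no
association for account 'all' in the accounting database. The helper below
only prints the sacctmgr commands that would create such associations and
attach the qos once at the account level (which user associations under
that account should inherit), so nothing is changed until the output has
been reviewed on a test cluster. The function name print_assoc_cmds, the
cluster name "mycluster" and the user names are hypothetical; 'all' and
'test' are taken from the log above.

```shell
#!/bin/sh
# Dry-run sketch: prints sacctmgr commands, it does NOT execute them.
# ASSUMPTIONS: cluster name and user names are placeholders; account
# "all" and qos "test" come from the slurmctld.log excerpt in the thread.

print_assoc_cmds() {
    cluster=$1
    account=$2
    qos=$3
    shift 3   # remaining arguments are user names

    # Create the account once for this cluster.
    echo "sacctmgr -i add account name=$account cluster=$cluster"

    # Allow the qos at the account level so it does not have to be
    # granted to each user individually.
    echo "sacctmgr -i modify account where name=$account cluster=$cluster set qos+=$qos"

    # Every submitting user still needs an association with the account.
    for user in "$@"; do
        echo "sacctmgr -i add user name=$user account=$account cluster=$cluster"
    done
}

# Example: review this output first, then run the commands by hand.
print_assoc_cmds mycluster all test alice bob
```

Afterwards, "sacctmgr show assoc" can be used to check which accounts,
users and qos values ended up in the database before turning
AccountingStorageEnforce=qos back on.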