Okay then, what's a novice to do now? Lol. I have the QOS defined but expected it to just work. Obviously I'm not yet assigning something in the database that is needed to set these QOS factors for users, or else to set some default. I'd really, really rather not have to set this for each and every user.

Thanks!
Bill

On 02/18/2014 04:31 PM, Danny Auble wrote:

Yeap, that will do it.  It makes more sense now what happened.

Doing what Lyn proposed would do the same thing for pending jobs that are outside of the limit.

I would suggest trying changes on a test cluster (perhaps just on your desktop) before pushing them into the wild.



On 02/18/14 13:22, Bill Wichser wrote:

Let's start again if I may, this time on a not-yet-in-production cluster.

Version is 2.6.5, OS is RH6.

Single partition.
AccountingStorageEnforce=qos

sacctmgr add qos test priority=1000 MaxNodesPerJob=2 MaxCpusPerJob=40 MaxJobsPerUser=2 MaxCpusPerUser=8 Flags=DenyOnLimit,EnforceUsageThreshold

My job_submit.lua script chooses the qos by walltime and assigns.


Without the AccountingStorageEnforce=qos, the jobs actually run, with no limits being imposed. The correct qos is indeed assigned.

With AccountingStorageEnforce=qos set, I cannot submit.

$ sbatch test.slurm
sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified

From slurmctld.log
[2014-02-18T16:13:16.714] job_submit.lua: slurm_job_submit: job from uid 14119, setting default account value: all
[2014-02-18T16:13:16.714] job_submit.lua: slurm_job_submit: job from uid 14119, new qos value: test
[2014-02-18T16:13:16.714] _job_create: invalid account or partition for user 14119, account 'all', and partition 'all'
[2014-02-18T16:13:16.714] _slurm_rpc_submit_batch_job: Invalid account or account/partition combination specified

And all pending jobs in the queue now have an InvalidAccount reason.

Bill
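
One plausible reading of the log above is that enforcing qos also requires each submitting user to have an association (user + account) in the accounting database, and uid 14119 has none for account 'all'. The sketch below shows one way that might be set up without configuring a QOS on every user: the QOS is attached once at the account level, on the assumption that user associations under that account inherit it. The user names are placeholders, the Description and Organization values are made up, and the cluster is assumed to be registered with sacctmgr already. The script only prints the commands unless APPLY=1 is set, so it can be reviewed (and tried on a test cluster) first.

```shell
#!/bin/sh
# Dry run by default: each sacctmgr command is printed, not executed.
# Set APPLY=1 to actually run them (try a test cluster first).
# ACCOUNT and QOS mirror the names seen in the thread; USERS is a placeholder list.
ACCOUNT="all"
QOS="test"
USERS="alice bob"

run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

setup_associations() {
    # Create the account (Description/Organization values are illustrative).
    run sacctmgr -i add account "$ACCOUNT" Description=everyone Organization=site
    # Attach the QOS once, at the account level.
    run sacctmgr -i modify account where name="$ACCOUNT" set qos="$QOS"
    # One association per user; no per-user QOS settings here.
    for u in $USERS; do
        run sacctmgr -i add user "$u" account="$ACCOUNT"
    done
}

setup_associations
```

Whether the inherited QOS is enough for a given site depends on the Slurm version and on any per-user overrides, so checking the result with sacctmgr show assoc before resubmitting is probably worthwhile.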
