Hello all,

We have recently been getting reports of breaching our MaxJobCount (which is
already set very high).  Some standard checks show that there are between
2,000 and 3,000 jobs in running and pending status at any given moment,
using squeue with the format string '%u:%t' and the options "-S U -h".
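
For anyone who wants to repeat that tally, here is a minimal Python sketch of
how the "user:state" lines could be counted.  It assumes the squeue output
(format '%u:%t', no header) has already been captured into a list of lines;
the function names and the sample data are my own and purely illustrative.

```python
from collections import Counter


def count_by_user_state(squeue_lines):
    """Tally lines of the form 'user:state' (squeue format '%u:%t', no header)."""
    counts = Counter()
    for line in squeue_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, _, state = line.partition(":")
        counts[(user, state)] += 1
    return counts


def total_running_and_pending(counts):
    """Sum only the entries whose compact state is R (running) or PD (pending)."""
    return sum(n for (_, state), n in counts.items() if state in ("R", "PD"))


if __name__ == "__main__":
    # Hypothetical sample; real input would come from squeue itself.
    sample = ["alice:R", "alice:PD", "bob:R", "alice:R", "carol:CG"]
    counts = count_by_user_state(sample)
    for (user, state), n in sorted(counts.items()):
        print(user, state, n)
    print("running + pending:", total_running_and_pending(counts))
```

The total is only as accurate as the lines squeue prints, so it may not match
what the controller counts against MaxJobCount.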

I started looking into this and believe there may be an issue with
specifying the number of simultaneous tasks.  I have found that if I use
"--array=1-20%1", squeue reports 201 tasks [1].  If I don't use "%1", squeue
reports 20 tasks [2].

If I understand "MaxJobCount" in slurm.conf correctly via its man page,
each of these tasks will be considered as 1 job.  If my understanding is
correct, then using "--array=1-20%1" counts as 201 "jobs" in the system
while using "--array=1-20" only counts as 2 "jobs" in the system.
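
As a sanity check on what the array specification itself asks for, here is a
small Python sketch that expands a spec such as "1-20%1" into its task
indices and the simultaneous-task limit.  It follows the syntax described in
the sbatch man page (ranges, comma lists, ":step", "%limit"); it is not how
slurmctld counts jobs internally, and the function name is my own.

```python
def parse_array_spec(spec):
    """Return (sorted task indices, simultaneous-task limit or None)."""
    limit = None
    if "%" in spec:
        # Everything after '%' is the cap on concurrently running tasks.
        spec, limit_text = spec.split("%", 1)
        limit = int(limit_text)
    ids = set()
    for part in spec.split(","):
        step = 1
        if ":" in part:
            part, step_text = part.split(":", 1)
            step = int(step_text)
        if "-" in part:
            lo, hi = part.split("-", 1)
            ids.update(range(int(lo), int(hi) + 1, step))
        else:
            ids.add(int(part))
    return sorted(ids), limit


if __name__ == "__main__":
    for spec in ("1-20%1", "1-20"):
        ids, limit = parse_array_spec(spec)
        print(spec, "->", len(ids), "tasks, limit", limit)
```

Both "1-20%1" and "1-20" expand to the same 20 indices here; the "%1" only
changes the limit, which is why I would have expected the same task count in
both cases.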

Currently, I have restricted two users who are heavily utilizing array jobs
to a limited number of submissions in order to preserve stability; other
users have already raised concerns about receiving sbatch errors since
Friday.

[1]
http://s3.enemy.org/~mrfusion/client_snippets/squeue_scriptandoutput-1.txt
[2]
http://s3.enemy.org/~mrfusion/client_snippets/squeue_scriptandoutput-2.txt

Thank you,
John DeSantis
