Hello all,

We have recently been getting reports of jobs breaching our MaxJobCount, which is already set very high. Some standard checks show between 2,000 and 3,000 jobs in running and pending status at any given moment, counted using the squeue options "'%u:%t' -S U -h".
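For reference, a minimal sketch in Python of how such a per-user tally could be scripted. It assumes the input is header-less squeue output with one "user:state" record per line; the function name and the sample data are hypothetical and are not taken from our cluster. Note that, as far as I understand the squeue man page, pending array elements are combined on one line unless "-r" is given, so a plain line count may not match what MaxJobCount counts.

```python
from collections import Counter


def tally_jobs(squeue_output: str) -> Counter:
    """Count squeue records per (user, state) from 'user:state' lines."""
    counts = Counter()
    for line in squeue_output.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first ':' only; the state code follows it.
        user, _, state = line.partition(":")
        counts[(user, state)] += 1
    return counts


# Hypothetical sample; real input would come from something like
#   squeue -h -o '%u:%t' -S U
sample = "alice:R\nalice:PD\nalice:PD\nbob:R\n"
counts = tally_jobs(sample)
total = sum(counts.values())

for (user, state), n in sorted(counts.items()):
    print(f"{user:10s} {state:3s} {n}")
print("total records:", total)
```

Comparing the total against MaxJobCount, with and without "-r", would show how close the system is to the limit under each way of counting.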
I started looking into this and believe there may be an issue with specifying the number of simultaneous tasks. I have found that if I use "--array=1-20%1", squeue reports 201 tasks [1]. If I don't use "%1", squeue reports 20 tasks [2].

If I understand "MaxJobCount" in slurm.conf correctly from its man page, each of these tasks is counted as 1 job. If my understanding is correct, then using "--array=1-20%1" counts as 201 "jobs" in the system, while using "--array=1-20" counts as only 2 "jobs".

Currently, I have restricted two users who rely heavily on array jobs to a limited number of submissions in order to preserve stability; other users have been raising concerns about receiving sbatch errors since Friday.

[1] http://s3.enemy.org/~mrfusion/client_snippets/squeue_scriptandoutput-1.txt
[2] http://s3.enemy.org/~mrfusion/client_snippets/squeue_scriptandoutput-2.txt

Thank you,
John DeSantis
