On 05/02/2014 09:45 AM, Chris Harwell wrote:
max jobs? lightest weight method for determining number pending?
Hi,

Just curious: what is the maximum number of pending jobs you have seen where slurm still holds together? I think we had trouble quite a while back when the count got into the 40k-80k pending (PD) job range, but we haven't looked recently, and we have since moved the spool onto an SSD.
I know of sites regularly running or pending 200k+ jobs with little issue. The machine running slurmctld and the way things are configured make a big difference; the backfill options in particular have a large impact on how well large job counts are handled.
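One way to see which of these settings a running controller is actually using is to filter the output of `scontrol show config`. The sketch below assumes that output uses the usual `Key = value` layout. The function name `scheduler_settings` is hypothetical, and the list of keys (MaxJobCount, MinJobAge, SchedulerType, and SchedulerParameters, which is where backfill options such as bf_interval are set) is a starting point, not a complete tuning checklist.

```shell
#!/bin/sh
# Hypothetical helper: keep only the scheduler/job-count related lines
# from `scontrol show config` output read on stdin.
# Assumes each line looks like "Key            = value".
scheduler_settings() {
    grep -E '^(MaxJobCount|MinJobAge|SchedulerType|SchedulerParameters)[ ]*='
}
```

On a live system this would be run as `scontrol show config | scheduler_settings`.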

Have you found that anything, such as submitting jobs in a held state or with dependencies, reduces the impact enough that you could have an order of magnitude more? Any other tricks?

Is this still the best reference? http://slurm.schedmd.com/high_throughput.html
Yes

Is the 14.x series any better? We're still using 2.6.7.
Outrageously. A lot of work went into increasing the number of jobs the system can handle and evaluate in a timely fashion.

Somewhat related: when the cluster is loaded up but you still want to monitor the number of pending jobs, what is the lightest-weight way to do that?

I was thinking perhaps this?
/opt/slurmcl2/bin/sdiag | awk '/Last queue length:/ { print $4 }' | head -1

Though I usually just do this:
squeue -o '%A' -h -r -t pd | wc -l
I would expect sdiag to be faster, since it only looks at and sends a small amount of data. squeue sends every job, which could be very heavy.
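If you do go the sdiag route, a slightly more defensive form of the awk pipeline above stops at the first match instead of relying on `head -1`. sdiag can also print a "Last queue length" line in its backfill section, which appears to be why the original pipeline keeps only the first one. The function name `last_queue_length` is hypothetical. It reads sdiag output on stdin, so it can be tried without a cluster.

```shell
#!/bin/sh
# Hypothetical helper: print the first "Last queue length" value from
# sdiag output on stdin (the main scheduler's figure), then stop.
last_queue_length() {
    awk -F: '/Last queue length:/ { gsub(/[ \t]+/, "", $2); print $2; exit }'
}
```

On a live system this would be run as `sdiag | last_queue_length`.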

But I wouldn't expect sdiag to always give you the stat you are looking for: it only counts jobs that are eligible to run. Perhaps that is what you want, but consider this scenario on a 10-node system:

sbatch -N10 --exclusive --begin=tomorrow test.sh
sbatch -N10 --exclusive test.sh
sbatch -N10 --exclusive test.sh
sbatch -N10 --exclusive test.sh

You get this...

sdiag | awk '/Last queue length:/ { print $4 }' | head -1
2

squeue -o '%A' -h -r -t pd | wc -l
3

The difference is that sdiag doesn't count the job that isn't eligible yet (the one submitted with --begin=tomorrow). But perhaps this is exactly what you want.
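To see why pending jobs are pending, and so which ones sdiag would likely leave out, squeue can print each job's pending reason with the `%r` format field. This still pulls every pending job from slurmctld, so it costs about as much as the squeue count above. The sketch below only tallies the reasons, and the function name `pending_by_reason` is hypothetical. In the 10-node example, one would expect the `--begin=tomorrow` job to show up with reason `BeginTime`.

```shell
#!/bin/sh
# Hypothetical helper: count pending jobs per pending reason.
# Input on stdin: one reason per line, e.g. from
#   squeue -h -r -t pd -o '%r'
# Output: "<count> <reason>" lines, most common reason first.
pending_by_reason() {
    sort | uniq -c | sort -rn | sed 's/^ *//'
}
```

On a live system this would be run as `squeue -h -r -t pd -o '%r' | pending_by_reason`.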

Danny

Thanks in advance,
Chris
