Re: [slurm-dev] Re: Squeue taking 2+ minutes with 30K jobs

HAUTREUX Matthieu Tue, 06 Dec 2011 08:45:36 -0800

Andrej N. Gritsenko wrote:

    Hello!


Sometime around Mon, Nov 7, 23:36, Clay Teeter wrote:

Here is my patch.  Does this look ok, for now anyway?

And 20K jobs are now being pulled in around 1s!!!!!!


    Thank you. With that patch applied to squeue, in our case with 20k
jobs in the queue they are shown in about 15s, most of which is spent
in slurmctld, which goes to 99% CPU load during that time. Job
submission has also become too slow: once the queue grows to 20k jobs,
slurmctld accepts only about 3 jobs per second, while with an empty
queue it can accept tens of jobs per second. Unfortunately we don't
have a profiler there to find out which function consumes the CPU, so
no solution has been found yet.

    With best wishes.
    Andriy.

Hi,

If not already done, you should probably consider using SchedulerParameters=defer in the controller's slurm.conf. Without it, every submission triggers an attempt of the scheduler logic, which probably takes some time when there are 20k jobs to manage. With that option, you should no longer pay this cost at submission time: only the internal scheduling thread will do the scheduling, every 30 seconds or so, and your jobs should be submitted more quickly. One problem we experienced with that option is that the -I (--immediate) parameter of srun was not working properly in defer mode, so you can no longer use it. I do not know if this is still the case with later versions of slurm.
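For reference, a minimal sketch of what the change could look like in the controller's slurm.conf. This assumes no other SchedulerParameters are set yet; the option takes a comma-separated list, so if the line already exists, append defer to it rather than adding a second SchedulerParameters line. The exact interval of the periodic scheduling pass depends on the Slurm version.

```
# slurm.conf on the controller (fragment, assumes no existing SchedulerParameters)
# Do not try to schedule each job individually at submit time;
# leave scheduling to the periodic pass of the scheduling thread.
SchedulerParameters=defer
```

After editing, the change can be picked up with `scontrol reconfigure` (or by restarting slurmctld), and checked with `scontrol show config | grep SchedulerParameters`.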


Regards,
Matthieu
