Maybe the backfill algorithm is the problem. It runs while holding all
of the slurmctld locks, and under high load it can easily take 10
seconds to release them. Normally backfill runs only every 30 seconds,
but when jobs are being submitted at a high rate, the algorithm stops
and starts again from the first job in the priority queue on each
scheduling cycle.

A good solution would probably be to hold submitted jobs for a while
and then insert them all into the job queue at the same time.
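
To illustrate the idea of holding submissions, here is a minimal sketch
in Python. It is not SLURM code: SubmissionBatcher, enqueue_batch and
hold_seconds are hypothetical names, and the 30-second window is only an
assumption chosen to match the backfill interval mentioned above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SubmissionBatcher:
    """Hold submitted jobs and release them to the queue in one batch.

    Hypothetical helper for illustration only; not part of SLURM.
    """
    enqueue_batch: Callable[[List[str]], None]  # called once per flush
    hold_seconds: float = 30.0                  # assumed hold window
    _held: List[str] = field(default_factory=list)
    _window_start: Optional[float] = None

    def submit(self, job: str, now: float) -> None:
        # The first job after a flush opens a new hold window.
        if self._window_start is None:
            self._window_start = now
        self._held.append(job)
        self.flush_if_due(now)

    def flush_if_due(self, now: float) -> bool:
        # Release all held jobs at once when the window has elapsed.
        if self._window_start is None:
            return False
        if now - self._window_start < self.hold_seconds:
            return False
        batch, self._held = self._held, []
        self._window_start = None
        self.enqueue_batch(batch)
        return True


# Usage: one submission per second for a minute. The queue is touched
# once per elapsed window instead of once per submission.
queue_updates: List[List[str]] = []
batcher = SubmissionBatcher(enqueue_batch=queue_updates.append)
for t in range(60):
    batcher.submit(f"job{t}", now=float(t))
```

With this shape the scheduler would see one queue change per window
rather than one per job, so a backfill pass would be interrupted far
less often.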

I have been testing a similar situation, and a new tool, sdiag, has
been developed. It may give you more information about what is going
on.



On 12/13/2011 01:54 PM, Andrej N. Gritsenko wrote:
>     Hello!
>
> HAUTREUX Matthieu has written once (Tuesday,  6 December, 17:45):
>   
>> If not already done, you should probably consider using
>> SchedulerParameters=defer in the controller's slurm.conf. Without it,
>> every submission triggers a run of the scheduler logic, which
>> probably takes some time to work through the 20k jobs. With that option
>> you should no longer have this overhead: only the internal scheduling
>> thread does the scheduling, every 30 seconds or so, and your jobs
>> should be submitted more quickly. One problem we experienced with
>> that option is that the -I parameter of srun did not work properly
>> in defer mode, so you can no longer use it. I do not know whether that
>> is still the case with later versions of slurm.
>>     
>     Thank you for the suggestion. We tried it, but unfortunately the submit
> rate stayed the same, ~3 jobs/s with 25k jobs in the queue, so probably
> something else slows it down.
>
>     Andriy.
>   

