How many nodes, and how many cores per node? What type of jobs?

With that many jobs, I guess most of them will be waiting
for resources. A backfilling cycle can take a really long time if you
set max_job_bf to 15000, but as the backfilling algorithm itself takes
some measures to avoid a long execution, I cannot see a problem here.
On the other hand, if you are submitting jobs at a high frequency, the
backfilling algorithm stops and restarts from the beginning whenever a
new job is submitted.
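
For reference, the two options you describe would look roughly like this
in slurm.conf. This is only a sketch: the values are illustrative, and I
am assuming that bf_window and OverTimeLimit are both given in minutes,
so please check the slurm.conf man page for your version.

    # Option 1: consider more queued jobs and look further ahead
    # (43200 minutes = 30 days)
    SchedulerType=sched/backfill
    SchedulerParameters=max_job_bf=15000,bf_window=43200

    # Option 2: submit jobs with realistic time limits and allow
    # them to overrun by up to one day (1440 minutes)
    OverTimeLimit=1440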


On 11/17/2011 01:21 PM, Yuri D'Elia wrote:
> I was reading the documentation about SchedulerParameters for sched/backfill.
>
> I'm currently testing SLURM 2.4 for my workload.
> Usually we submit batches of ~10k (to ~30k) jobs within slurm with a default 
> time limit of 2 days. Though, *usually*, jobs terminate within 2-3 hours 
> (with some exceptions that can take as much as 12).
>
> What would be more desirable between these two options:
>
> - raising max_job_bf (to, say, half the queue size) and bf_window to a month 
> or so,
> - introducing an OverTimeLimit of ~1 day and setting all jobs to their average 
> execution time?
>   

