Hi all,

A while ago I asked about the existence of a cpu_factor concept in
Slurm. Since time normalization does not seem to be implemented in
Slurm, I'm looking for alternative solutions (this is essential for our
scenario).

** Our time limits come from the partition (queue) defaults, so if a
job goes to the long queue it gets that queue's hard limit. We use
slurmdbd.

Another, more experienced admin suggested using the prolog to modify
the job's time limit. Something like:

squeue -h -o %l -j $SLURM_JOB_ID
[... my operations ...]
scontrol update jobid=$SLURM_JOB_ID timelimit=<new time>

This is simple enough, and it works great.
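For what it's worth, the "my operations" step in the middle (converting the queried limit to minutes, scaling it, and producing a value scontrol will accept) could be sketched roughly like this. This is only an illustration under my own assumptions: the NODE_FACTOR variable and the to_minutes/scale_timelimit helpers are hypothetical names, not anything Slurm provides.

```shell
#!/bin/sh
# Sketch: rescale a Slurm time limit string by a per-node speed factor.
# Handles the [days-]HH:MM:SS, MM:SS, and plain-minutes formats that
# squeue -o %l can print.

# Convert a Slurm time limit string to whole minutes (seconds truncated).
to_minutes() {
    awk -v t="$1" 'BEGIN {
        days = 0
        if (t ~ /-/) { split(t, d, "-"); days = d[1]; t = d[2] }
        n = split(t, p, ":")
        if (n == 3)      m = p[1] * 60 + p[2]   # HH:MM:SS
        else             m = p[1]               # MM:SS or plain minutes
        print days * 24 * 60 + m
    }'
}

# Multiply the limit by a factor, rounding up to a whole minute;
# scontrol accepts a plain minutes value for timelimit.
scale_timelimit() {
    awk -v m="$(to_minutes "$1")" -v f="$2" 'BEGIN {
        s = m * f
        printf "%d\n", (s == int(s)) ? s : int(s) + 1
    }'
}

NODE_FACTOR=1.5   # hypothetical slowdown factor for this node type
NEW_LIMIT=$(scale_timelimit "1-02:00:00" "$NODE_FACTOR")
echo "$NEW_LIMIT"   # 1560 min * 1.5 = 2340
# The prolog would then run something like:
#   scontrol update jobid=$SLURM_JOB_ID timelimit=$NEW_LIMIT
```

Keeping the arithmetic in a helper like this means the prolog itself stays a couple of lines, whatever the factor policy ends up being.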

But now I'm wondering whether this method is robust enough. For
example, what could happen if more than 100 jobs (a conservative
number) start at once? How many concurrent scontrol connections can
slurmctld handle?

Maybe it's better to implement this inside Slurm's code...

Any suggestion is welcome.

TIA,
Arnau
