Most of the discussions have actually been about supporting a higher
job throughput rate, but those same changes would increase SLURM's
ability to handle larger job counts. None of that work has moved
beyond the discussion stage.
Moe
Quoting Chris Harwell <super...@gmail.com>:
We've wondered if that was the case. Is there any plan or willingness to
implement finer-grained locking?
On Jan 18, 2012 2:02 PM, "Moe Jette" <je...@schedmd.com> wrote:
We have held some discussions on this subject and it isn't simple to
resolve. The best way to do this would probably be to establish
finer-grained locking so there can be more parallelism, say by locking
individual job records rather than the entire job list. That would impact
quite a few sub-systems, for example how we preserve job state.
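Purely as an illustration of the idea (this is nothing like the real
slurmctld code, just a sketch of per-record versus whole-list locking):

/* Illustrative only -- not slurmctld's actual data structures or code. */
#include <pthread.h>
#include <stddef.h>

struct job_record {
    unsigned int       job_id;
    int                state;
    pthread_mutex_t    lock;        /* per-record lock */
    struct job_record *next;
};

/* Today, conceptually: one lock serializes all access to the job list. */
static pthread_mutex_t job_list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct job_record *job_list = NULL;

/* Finer-grained version: hold the list lock only long enough to find the
 * record, then work under the per-job lock, so threads touching different
 * jobs no longer serialize on each other. */
static void update_job_state(unsigned int job_id, int new_state)
{
    struct job_record *j;

    pthread_mutex_lock(&job_list_lock);
    for (j = job_list; j != NULL; j = j->next)
        if (j->job_id == job_id)
            break;
    if (j != NULL)
        pthread_mutex_lock(&j->lock);
    pthread_mutex_unlock(&job_list_lock);

    if (j == NULL)
        return;
    j->state = new_state;
    pthread_mutex_unlock(&j->lock);
}

Extending that pattern across slurmctld is where it touches the other
sub-systems mentioned above.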
If you could submit a smaller number of jobs that each have many job
steps, that could address your problem today (say submitting 1000 jobs each
with 1000 steps).
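Roughly, each of those jobs could be a batch script that runs its work as
job steps, something like this (the command name, step count and resources
are placeholders to adapt to your workload):

#!/bin/bash
#SBATCH --ntasks=1

# Run the work as job steps of this single job; "./my_task" and the
# count of 1000 are placeholders.
for i in $(seq 1 1000); do
    srun --ntasks=1 ./my_task $i
done

Each srun launched inside the allocation becomes a job step rather than a
separate job, so the controller has far fewer job records to track.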
Moe Jette
Quoting Yuri D'Elia <wav...@thregr.org>:
Hi everyone. I'm trying to increase the number of jobs that can be queued
with SLURM. I'm submitting a lot of very small jobs (each taking about 10
minutes) in batches of ~100k. I would like to be able to queue around 500k
to 1M jobs if possible, but I'm having a very hard time going beyond 100k
with both 2.3.1 and 2.4.
As a test, I've raised MaxJobCount to 200000 and MessageTimeout to 60, and
reduced MinJobAge to 60. Of course, SchedulerParameters already includes
"defer", and I've also tried setting max_job_bf and interval to 10 and 600
respectively.
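For reference, the relevant slurm.conf lines currently look roughly like
this:

MaxJobCount=200000
MinJobAge=60
MessageTimeout=60
SchedulerParameters=defer,max_job_bf=10,interval=600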
After going beyond ~100k jobs, slurmctld becomes CPU-bound and starts to
time out on any request. I noticed that only one CPU is being used: is
there a way to spread the work across multiple CPUs?
Is there any other feature that affects performance? I'm using cons_res and
multifactor priority along with accounting, but I would gladly switch to a
simpler scheduler and fewer features if that would let me go beyond the
current limit (which still looks to be far below my target).
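To make that concrete, this is the kind of trade I have in mind in
slurm.conf (plugin names as I understand them from the docs; the accounting
backend line is just an example of what I'd drop):

# current, feature-rich setup
SelectType=select/cons_res
PriorityType=priority/multifactor
AccountingStorageType=accounting_storage/slurmdbd

# stripped-down alternative, if it would buy a higher job count
SelectType=select/linear
PriorityType=priority/basic
AccountingStorageType=accounting_storage/none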
Thanks again.