Alex,

This can definitely be added. The best place for a test of this sort is probably the function job_independent() in src/slurmctld/job_mgr.c. A test in that location would apply not only to the backfill scheduling logic, but also to newly submitted jobs and the default (FIFO) scheduling logic.

Quoting Alex Besogonov <alex.besogo...@gmail.com>:

I'm working on Amazon EC2 integration with Slurm. I've found several issues
(like the inability to work with CLOUD nodes without DNS names), but they
look fairly easy to fix. CLOUD mode with suspend/restore works OK too.

However, I have another question: is it possible to make Slurm work in a
'reluctant' mode? Let me explain. Nodes on Amazon EC2 are billed in
one-hour increments, so if I start 10 "srun sleep 10" jobs, Slurm is going
to resume 10 nodes, causing me to be billed for 20 hours of CPU time even
though all the jobs could be completed on a single host in the time it
takes to start all the EC2 nodes.

I've tried to play with ResumeRate but it simply doesn't work well enough.

So I'm thinking about a scheduler that would work in conjunction with the
backfill scheduler. It would start resuming new nodes only once at least
one task in the queue has been awaiting execution for more than N seconds.

Is it feasible or is there a better way to do it?
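
The waiting rule proposed above could be sketched roughly as follows. This is only an illustration of the decision itself, not Slurm code: PendingJob, should_resume_nodes and max_wait_secs (the "N seconds" threshold) are hypothetical names, and a real implementation would live inside slurmctld and use its own job records.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PendingJob:
    """Hypothetical stand-in for a queued job; not a Slurm structure."""
    job_id: int
    submit_time: float  # seconds since the epoch


def should_resume_nodes(pending: Iterable[PendingJob], now: float,
                        max_wait_secs: float) -> bool:
    """Return True once any pending job has waited longer than max_wait_secs."""
    return any(now - job.submit_time > max_wait_secs for job in pending)


# Two jobs submitted at t=100 and t=130, with a 60-second threshold (assumed).
queue = [PendingJob(1, 100.0), PendingJob(2, 130.0)]
early = should_resume_nodes(queue, now=150.0, max_wait_secs=60.0)  # waits: 50s, 20s
late = should_resume_nodes(queue, now=170.0, max_wait_secs=60.0)   # waits: 70s, 40s
```

With these assumed values, no node would be resumed at t=150, because no job has yet waited more than 60 seconds, while at t=170 the first job has crossed the threshold and resuming would be allowed.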



