After years and years of using PBS, it is time to modernize. After
speaking with a few of the developers at SC13, we have started the
switch: on two new clusters that will be deployed soon, we will not
install Torque/Moab but will try Slurm instead.
Naturally, things are quite different. I've managed to write a
job_submit.lua script that emulates a PBS-style routing queue, keyed on
the job's requested time.
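For anyone who has not built a PBS routing queue, the logic boils down to a lookup from requested wall time to a destination. The Python below is only an illustration of that lookup (the real hook is Lua, in job_submit.lua). The tier names, the thresholds, the default for jobs that request no time, and the assumption that the requested time arrives in minutes are all hypothetical.

```python
from bisect import bisect_left

# Hypothetical routing table: (upper bound on requested minutes, destination),
# sorted by increasing bound. Names and bounds are illustrative only.
ROUTES = [
    (60, "short"),
    (24 * 60, "medium"),
    (7 * 24 * 60, "long"),
]
DEFAULT_DEST = "short"  # hypothetical choice when the job requests no time


def route(time_limit_min):
    """Return the destination for a job requesting time_limit_min minutes.

    Like a PBS routing queue, the first tier whose bound is >= the request
    wins. Returns None when the request exceeds every tier, which a real
    submit hook would reject.
    """
    if time_limit_min is None:
        return DEFAULT_DEST
    bounds = [bound for bound, _ in ROUTES]
    i = bisect_left(bounds, time_limit_min)  # index of first bound >= request
    if i == len(ROUTES):
        return None
    return ROUTES[i][1]


if __name__ == "__main__":
    for t in (None, 30, 60, 61, 2000, 20000):
        print(t, route(t))
```

In the single-partition variant, the same table would hand back a QOS name instead of a partition name.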
But rather than simply converting my current setup, maybe there is a
better way: for instance, a single partition, with a QOS defined for
each range of time_limit lengths. Obviously there are many ways to skin
this cat, just as there are with other resource managers/schedulers.
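One way to think about the single-partition alternative is to treat each QOS as a bundle: a wall-time ceiling plus caps on running jobs, per user and in total. The model below is a hypothetical sketch, not Slurm's implementation. The `Qos` class and its fields are invented for illustration and only loosely echo the MaxWall, MaxJobsPerUser and GrpJobs QOS limits that sacctmgr exposes. It shows only the admission decision for one more job, and fairshare is not modeled.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Qos:
    """Hypothetical model of one time-length QOS and its limits."""
    name: str
    max_wall_min: int        # ceiling on a job's requested minutes
    max_jobs_per_user: int   # running jobs any single user may hold
    grp_jobs: int            # running jobs allowed in this QOS overall
    running: Counter = field(default_factory=Counter)  # user -> running jobs

    def can_start(self, user, time_limit_min):
        """Return (ok, reason) for starting one more job for `user`."""
        if time_limit_min > self.max_wall_min:
            return False, "exceeds wall-time ceiling"
        if sum(self.running.values()) >= self.grp_jobs:
            return False, "QOS total job cap reached"
        if self.running[user] >= self.max_jobs_per_user:
            return False, "per-user job cap reached"
        return True, "ok"

    def start(self, user, time_limit_min):
        """Admit the job if allowed and record it as running."""
        ok, reason = self.can_start(user, time_limit_min)
        if ok:
            self.running[user] += 1
        return ok, reason


if __name__ == "__main__":
    # Illustrative numbers: a week-long QOS, 2 jobs per user, 3 in total.
    long_qos = Qos("long", max_wall_min=7 * 24 * 60,
                   max_jobs_per_user=2, grp_jobs=3)
    for user, minutes in [("alice", 600), ("alice", 600), ("alice", 600),
                          ("bob", 600), ("carol", 600), ("bob", 20000)]:
        print(user, minutes, long_qos.start(user, minutes))
```

The wall-time check comes first because it depends only on the job itself. The two count checks depend on what is already running, which is where "total jobs of length T" becomes a per-QOS cap.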
What I am hoping to find is some solid advice from those of you already
running Slurm. For now I need fairshare for groups, plus limits per user
and on the total number of jobs of length T, and that's about it. While
all of this could be implemented in a variety of ways, is there anything
in the overall layout I should be aware of that would make things easier
down the road, or should I just keep porting what I built over years of
running PBS?
Sincerely,
Bill Wichser
- [slurm-dev] Trying to be a convert, initial config philosophy... Bill Wichser