After years of PBS use, it is time to modernize. After speaking with a few of the developers at SC13, we have started the switch: two new clusters are about to be deployed, and instead of installing Torque/Moab on them we will try Slurm.

Naturally, things are quite different. I've managed to implement a job_submit.lua script that emulates a PBS-style routing queue keyed on the job's requested time.
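To make the idea concrete, the routing logic amounts to roughly the following. This is only a Python sketch of the decision, not the Lua script itself; the partition names and time thresholds are made up for illustration, and the real job_submit.lua would apply the same kind of rule to the job's requested time limit inside Slurm.

```python
# Hypothetical routing table: (maximum minutes, partition name), shortest first.
# These names and thresholds are illustrative assumptions, not a real site config.
ROUTES = [
    (60, "short"),        # up to 1 hour
    (24 * 60, "medium"),  # up to 24 hours
    (72 * 60, "long"),    # up to 72 hours
]


def route_partition(time_limit_minutes):
    """Return the first partition whose limit covers the requested time."""
    for max_minutes, partition in ROUTES:
        if time_limit_minutes <= max_minutes:
            return partition
    # Nothing fits: reject, much as a routing queue would refuse the job.
    raise ValueError(
        f"requested time of {time_limit_minutes} minutes exceeds all partition limits"
    )
```

A job asking for 30 minutes would land in "short", and one asking for 61 minutes in "medium".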

But instead of simply converting my current setup, maybe there is a better way: for instance, a single partition with a QOS defined for each time_limit length. Obviously there are many ways to skin this cat, just as there are with other resource managers/schedulers.

What I am hoping to find is some solid advice from those of you who are running Slurm. For now I need fairshare for groups, plus limits on users and on the total number of jobs of length T, and that's about it. While this could all be implemented in a variety of ways, is there something I should be aware of in the overall layout that would make things easier down the road, or should I just continue porting what I have built up over years of PBS?


Sincerely,
Bill Wichser
