Thanks Moe. I am aware of this aspect and have tested it. I am more
concerned about the initial backend setup. As I mentioned, I already
have an implementation running with 4 partitions: test, small, medium,
and large. I have written the code for the Lua plugin to set the
partition during job submission. I have overlapping nodes assigned to
these partitions as well as user limits set. I just have not ventured
into the database yet.
Since almost everything I've already attempted here could be done with
QOS and fairshare, and I'll need fairshare anyway, maybe the partitions
are redundant. I could just change the Lua script to assign the correct
QOS to the job, and who cares which node it lands on (just like my PBS
setups).
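To make that concrete, roughly what I have in mind for the Lua plugin is
something like the sketch below; the QOS names and time cutoffs are just
placeholders, not my actual configuration:

-- job_submit.lua sketch: pick a QOS from the requested walltime instead
-- of routing the job to a partition (names and cutoffs are illustrative)
function slurm_job_submit(job_desc, part_list, submit_uid)
    local minutes = job_desc.time_limit  -- requested walltime, in minutes
    if minutes == nil or minutes == slurm.NO_VAL then
        job_desc.qos = "medium"          -- no time limit given; use a default
    elseif minutes <= 60 then
        job_desc.qos = "test"
    elseif minutes <= 24 * 60 then
        job_desc.qos = "small"
    elseif minutes <= 72 * 60 then
        job_desc.qos = "medium"
    else
        job_desc.qos = "large"
    end
    slurm.log_info("slurm_job_submit: uid %d assigned QOS %s",
                   submit_uid, job_desc.qos)
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end

The walltime and per-user limits would then live on the QOS records in
the database (MaxWall and friends) rather than on partitions, with
fairshare handling priority.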
So I'm more curious about how folks have set up their initial layout,
and I'm open to advice.
Bill
On 01/09/2014 11:32 AM, Moe Jette wrote:
I will just add that Slurm has wrappers for the common PBS user
commands, recognizes the #PBS options in the batch scripts, and can set
PBS environment variables as well.
With the appropriate Slurm plugins and packages, the conversion should
be relatively transparent to users.
Moe Jette
SchedMD
Quoting Chris Read <[email protected]>:
After trying a few job managers and Slurm configurations, we've settled
on a single partition with a handful of QOS defined. We use the QOS
for time limits and for giving specific classes of jobs higher priority.
Two QOS get most of the jobs, and we use FairShare to keep everyone
happy. We have everything in a single group right now, but from what I
can see it should scale out nicely to multiple groups too.
On the flip side, though, I have no experience at all with PBS...
On Thu, Jan 9, 2014 at 9:34 AM, Bill Wichser <[email protected]> wrote:
After years and years of PBS use, it is time to modernize. Having spoken
with a few of the developers at SC13, we have started the switch on two
new clusters soon to be deployed; we will not install Torque/Moab on
these but will attempt Slurm instead.
Naturally, things are quite different. I've managed to implement the
job_submit.lua script to emulate a PBS-style routing queue, keyed on the
job's requested time.
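Schematically, it just maps the requested walltime onto one of the
partitions, something like this simplified version (the partition names
and cutoffs here are placeholders):

function slurm_job_submit(job_desc, part_list, submit_uid)
    local minutes = job_desc.time_limit  -- requested walltime, in minutes
    if minutes == nil or minutes == slurm.NO_VAL then
        job_desc.partition = "medium"    -- default when no walltime given
    elseif minutes <= 60 then
        job_desc.partition = "test"
    elseif minutes <= 24 * 60 then
        job_desc.partition = "small"
    elseif minutes <= 72 * 60 then
        job_desc.partition = "medium"
    else
        job_desc.partition = "large"
    end
    return slurm.SUCCESS
end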
But instead of simply converting my current setup, maybe there is a
better way. For instance, a single partition and a set of QOS defined
for the various time_limit lengths instead. Obviously there are many
ways to skin the cat, the same as with other resource
managers/schedulers.
What I am hoping to find is just some solid advice from you folks who
are running Slurm. I need fairshare for groups, limits for users, and
limits on the total number of jobs of length T, and that's about it for
now. And while this could all be implemented in a variety of ways, is
there something I should be aware of in the overall layout to make this
easier down the road, or should I just continue to port things over from
my years of doing PBS?
Sincerely,
Bill Wichser