We use the classic fairshare algorithm here, with users' shares set to parent so that they pull from the group pool rather than each user having their own fairshare (you can see our doc here: https://docs.rc.fas.harvard.edu/kb/fairshare/). This has worked very well for us for many years. However, there is one use case where it doesn't work, namely breaking ties within a group. We have a lot of private partitions owned by a specific group, and when that group has many users the queue on those partitions turns into FIFO instead of letting lower-usage users go first. This happens because, with the parent flag, every user in the group has the same fairshare. This is obviously solved by giving every user their own fairshare, but that has a downside. It hurts users' priority back on the partitions shared with other groups: they can no longer use their group's full fairshare and are instead stuck with their own. For example, their group fairshare may be something like 0.4, but their personal fairshare is stuck at 0 because they are one of the heaviest users in the lab.
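
To make the tie concrete, here is a simplified sketch using the classic fairshare factor F = 2^(-U/S), where U is normalized effective usage and S is normalized shares. The lab, its share value, and the per-user usage numbers are made up for illustration. Real Slurm normalization (usage decay, usage relative to cluster totals, the account hierarchy) is more involved than this.

```python
# Simplified model of classic fairshare: F = 2**(-U/S).
# Hypothetical lab: normalized group shares and per-user normalized usage.
GROUP_SHARES = 0.10
USER_USAGE = {"alice": 0.10, "bob": 0.02, "carol": 0.01, "dave": 0.0}


def classic_fairshare(usage: float, shares: float) -> float:
    """Return the classic fairshare factor for the given usage and shares."""
    return 2.0 ** (-usage / shares) if shares > 0 else 0.0


# The group's factor is computed from the whole group's usage.
group_usage = sum(USER_USAGE.values())
group_factor = classic_fairshare(group_usage, GROUP_SHARES)

# Shares set to parent: every user inherits the group's factor, so all tie.
parent_factors = {user: group_factor for user in USER_USAGE}

# Per-user shares (group shares split evenly): each user is judged on own usage.
per_user_shares = GROUP_SHARES / len(USER_USAGE)
own_factors = {
    user: classic_fairshare(usage, per_user_shares)
    for user, usage in USER_USAGE.items()
}

for user in USER_USAGE:
    print(f"{user}: parent={parent_factors[user]:.3f} own={own_factors[user]:.3f}")
```

With shares set to parent, all four users get the group's factor (about 0.41), so jobs on the private partition fall back to FIFO. With even per-user shares, alice drops to 0.0625 while dave sits at 1.0. That breaks the tie, but it also means alice no longer benefits from the group's 0.41 on shared partitions.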

Now, I get the feeling that Fair Tree might solve this, but I can't move to it, as it has taken years for our users to even understand and accept the classic fairshare model. So I'm trying to come up with solutions that work within the model. One option I have been considering is using the job_submit.lua script to set a Nice value on every job based on that user's usage. The nice value would break ties within the group and allow non-FIFO scheduling inside an account without impacting the group's overall fairshare relative to other groups.
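
The Nice idea can be sketched as a mapping from each user's share of their account's usage to a nice value. This is a minimal sketch under stated assumptions. The function name `nice_values`, the `max_nice` cap of 1000, and the proportional mapping are all hypothetical choices. The usage numbers could come from something like the RawUsage column that `sshare` reports. The code that would actually read usage and set the nice value inside job_submit.lua is not shown.

```python
# Hypothetical mapping from per-user usage within ONE account to nice values.
# Heavier users in the account get a larger nice, i.e. a larger priority penalty.


def nice_values(raw_usage: dict[str, float], max_nice: int = 1000) -> dict[str, int]:
    """Scale each user's fraction of the account's usage to [0, max_nice]."""
    total = sum(raw_usage.values())
    if total <= 0:
        # No usage recorded yet: leave every user's priority untouched.
        return {user: 0 for user in raw_usage}
    return {user: round(max_nice * usage / total) for user, usage in raw_usage.items()}


# Example with made-up raw usage for a four-person lab.
lab_usage = {"alice": 800.0, "bob": 150.0, "carol": 50.0, "dave": 0.0}
print(nice_values(lab_usage))
```

Slurm subtracts the nice value from a job's priority, so the heaviest user in the account gets the largest penalty and the idle user gets none. The nice value applies on every partition, not just the private ones. Keeping `max_nice` small relative to the spread that the fairshare weight produces between groups therefore limits how much it can reorder jobs across groups.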

Before I start experimenting with this, though, I wanted to tap the wisdom of the group: how do others handle tie-breaking within an account/group/lab? What solutions have people used for this?

-Paul Edmon-
