Janne,

Thank you for your patch and the explanation. While I have yet to examine the code changes, I did read through the new HTML page. It was very helpful that you supplied an example based on the one from the first multifactor plugin.

I agree that, in theory, both of the alternative algorithms you have proposed are an improvement over the first multifactor plugin. I have been looking to run some tests on the first one you sent, but have not had the time. I concur that this second, ticket-based algorithm looks to be the better of the two.

I particularly like the way the ticketing algorithm considers pending jobs. That is a nice way to keep over-serviced users from being penalized if few of their siblings have jobs in the queue. I wonder, though, whether the ticket-based algorithm might suffer the same danger at very deep hierarchies (more than 5 levels) as the number of tickets starts to be spread pretty thin. Performance is also an open question, to be answered as each is tested under large numbers of pending jobs.

In any case, since this is a separate plugin, we will all have a chance to try it out and perform our own A/B comparisons. I will let you know when I have some test results to share.

Don

> -----Original Message-----
> From: Blomqvist Janne [mailto:[email protected]]
> Sent: Wednesday, September 12, 2012 3:49 AM
> To: slurm-dev
> Subject: [slurm-dev] Making hierarchical fair share hierarchical, take 2
>
> Hi,
>
> so here is an updated patch based on the feedback received on our previous
> patch, and doing some more thinking on our own. For some background,
> we're seeing problems with the current priority/multifactor plugin where
> some accounts are under-served partially due to having a different
> distribution of usage among their users than other accounts. See
>
> http://comments.gmane.org/gmane.comp.distributed.slurm.devel/2365
>
> for the previous thread with more details. So this updated patch
>
> 1) Is implemented as a separate plugin, unimaginatively called
> priority/multifactor2 (suggestions for a better name welcome!)
>
> 2) Implements the ticket scheduling approach that I briefly mentioned in the
> previous thread.
> Compared to the algorithm presented in the original patch,
> this one should handle arbitrarily deep account hierarchies without running
> into limits on the number of bits used to represent the priority, and also the
> priorities should be more evenly distributed.
>
> The documentation part of the patch can also be seen at
>
> http://tfy.tkk.fi/~job/slurm/doc/html/priority_multifactor2.shtml
>
> where one can see an explanation of the algorithm. This algorithm is slightly
> more expensive than the current one, the major thing AFAICS is that it
> iterates over the job list twice instead of once, although when I tested it
> even with 20000 jobs in the queue it completed almost instantaneously, so
> I'm not sure this is a problem in practice.
>
> Also, the way the algorithm works is that jobs get non-zero priorities only
> once the algorithm has had a chance to run, so one probably wants to run it
> reasonably often (once per minute could be a good default, maybe).
> Currently the algorithm runs as part of the decay handler thread, which by
> default runs only every 5 mins (PriorityCalcPeriod). For the moment I have
> just decreased PriorityCalcPeriod, but it should probably be fixed in some
> other way (e.g. run the decay handler thread every PriorityCalcPeriod/5
> mins, run the priority calculation every time and the decay handling logic
> every 5th iteration, or something like that). Suggestions?
>
> I have tested the patch on my own test setup, but we haven't deployed it to
> production yet.
>
> --
> Janne Blomqvist
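
The scheduling suggestion at the end of the quoted message (wake up every PriorityCalcPeriod/5 minutes, recompute priorities on every wake-up, and apply the decay logic only on every 5th one) can be sketched roughly as below. This is a minimal Python illustration of the cadence only, not Slurm code: `plan_iterations`, `calc_period_min`, `subdivisions`, and `n_iterations` are hypothetical names, and the 5-minute period is simply the default mentioned in the message.

```python
def plan_iterations(calc_period_min=5.0, subdivisions=5, n_iterations=10):
    """Plan the wake-ups of a hypothetical decay handler thread.

    Returns a list of (time_in_minutes, recalc_priorities, apply_decay)
    tuples, one per wake-up. All names here are illustrative assumptions.
    """
    # The thread wakes up 'subdivisions' times per PriorityCalcPeriod.
    tick = calc_period_min / subdivisions
    plan = []
    for i in range(1, n_iterations + 1):
        # Priorities are recomputed on every wake-up, so newly submitted
        # jobs do not sit at priority zero for a full period.
        recalc_priorities = True
        # Usage decay keeps its original cadence: every 'subdivisions'-th
        # wake-up, i.e. once per full PriorityCalcPeriod.
        apply_decay = (i % subdivisions == 0)
        plan.append((i * tick, recalc_priorities, apply_decay))
    return plan


if __name__ == "__main__":
    for t, prio, decay in plan_iterations():
        print(f"t={t:4.1f} min  recalc_priorities={prio}  apply_decay={decay}")
```

With these defaults, priorities would be refreshed once per minute while usage decay would still be applied every 5 minutes, which matches the "once per minute" default floated in the message without changing the decay behaviour.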
