Hey Paul,

Unless you have a very busy cluster (hundreds of jobs per second) or are running very large jobs (>2000 nodes), I don't think message aggregation will be very useful. That said, I would expect MsgAggregationParams=WindowMsgs=10,WindowTime=10 to be closer to what you want; WindowTime=100 may be too long a wait. I am surprised at the threading of your slurmctld, though. I would expect it to have much less threading.

Be sure to restart all of your slurmds as well as the slurmctld when you change the parameter. Try again and see whether lowering WindowTime improves your situation.
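
As a rough sketch, the suggested setting would sit in slurm.conf along these lines. Only the MsgAggregationParams value is taken from this message; the comments on what each field means (including WindowTime being in milliseconds) are an assumption based on the slurm.conf man page and should be checked against the documentation for your Slurm version.

```conf
# slurm.conf (fragment)
# WindowMsgs: assumed to be the maximum number of messages collected
#             before the aggregate is forwarded.
# WindowTime: assumed to be the maximum time, in milliseconds, that a
#             message is held before being sent.
MsgAggregationParams=WindowMsgs=10,WindowTime=10

# After changing this line, restart slurmctld and every slurmd so that
# all daemons pick up the same settings.
```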

Danny

On 12/08/15 10:25, Paul Edmon wrote:

We recently upgraded from 14.11 to 15.08.4 and wanted to try out MsgAggregation to see whether it would improve cluster throughput and responsiveness. However, when we turned it on with WindowMsgs=10 and WindowTime=100, everything slowed to a crawl and it looked like the slurmctld was threading like crazy. When we turned it off, everything returned to normal. Does anyone have suggestions or guidelines for what to set MsgAggregationParams to? I'm guessing it depends on the size of the cluster, since we have the same settings on our test cluster, but it has about 10 times fewer nodes than our main one. I'm guessing this is a scaling problem.

Thoughts?  Anyone else using MsgAggregation?

-Paul Edmon-
