I will have to try that out. Thanks for the info.
-Paul Edmon-
On 12/08/2015 01:54 PM, Danny Auble wrote:
Hey Paul, unless you have a very busy cluster (hundreds of jobs a second)
or are running very large jobs (>2000 nodes), I don't think this will
be very useful. That said, I would expect
MsgAggregationParams=WindowMsgs=10,WindowTime=10 to be closer to what you
want; WindowTime=100 may be too long a wait. I am surprised by the
threading of your slurmctld, though; I would expect much less of it. Be
sure to restart all of your slurmd daemons as well as the slurmctld when
you change the parameter. Try again and see whether lowering WindowTime
improves your situation.
Danny
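For reference, here is a minimal slurm.conf sketch of the setting Danny suggests. It assumes the same line is deployed in the slurm.conf read by slurmctld and by every slurmd. The values are the ones proposed in this thread, not tuned recommendations for any particular cluster size.

```
# slurm.conf (sketch): keep this line identical on the controller and
# on all compute nodes.
# WindowMsgs: number of messages collected before the aggregated
#             message is sent.
# WindowTime: maximum wait, in milliseconds, before the aggregated
#             message is sent even if WindowMsgs has not been reached.
MsgAggregationParams=WindowMsgs=10,WindowTime=10
```

After editing, restart slurmctld and all slurmd daemons, as noted above. The restart mechanism (init scripts, systemd units, a parallel shell, etc.) is site-specific and not covered in the thread.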
On 12/08/15 10:25, Paul Edmon wrote:
We recently upgraded to 15.08.4 from 14.11 and wanted to try out
MsgAggregation to see whether it would improve cluster throughput
and responsiveness. However, when we turned it on with
WindowMsgs=10 and WindowTime=100, everything slowed to a crawl and
it looked like the slurmctld was threading like crazy. When we
turned it off, everything returned to normal. Does anyone have any
suggestions or guidelines for what to set MsgAggregationParams
to? I'm guessing it depends on the size of the cluster: we have
the same settings on our test cluster, but it has about 10 times
fewer nodes than our main one, so I suspect this is a scaling
problem.
Thoughts? Anyone else using MsgAggregation?
-Paul Edmon-