[slurm-dev] Re: MsgAggregation Parameters

2015-12-08 Thread Danny Auble


Hey Paul,  Unless you have a very busy cluster (hundreds of jobs a 
second) or are running very large jobs (>2000 nodes), I don't think 
this will be very useful.  That said, I would expect 
MsgAggregationParams=WindowMsgs=10,WindowTime=10 to be closer to what 
you want; WindowTime=100 may be too long a wait.  I am surprised at the 
threading of your slurmctld, though; I would expect it to have much 
less threading.  Be sure to restart all your slurmds as well as the 
slurmctld when you change the parameter.  Try again and see if lowering 
WindowTime improves your situation.
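
To make the WindowMsgs/WindowTime trade-off concrete, here is a toy Python model of a single collection window. It is only a sketch, not Slurm's implementation. It assumes a window opens when the first message arrives and is sent once it holds WindowMsgs messages or WindowTime milliseconds have passed, whichever comes first. The function names (aggregate, max_added_delay) and the arrival times are made up for illustration.

```python
from typing import List, Tuple

Batch = Tuple[float, List[float]]  # (send time in ms, arrival times in the batch)


def aggregate(arrivals_ms: List[float], window_msgs: int,
              window_time_ms: float) -> List[Batch]:
    """Group message arrival times into batches (toy model, see assumptions)."""
    batches: List[Batch] = []
    current: List[float] = []
    window_open = 0.0
    for t in sorted(arrivals_ms):
        # The timer ran out before this message arrived: send what we have.
        if current and t - window_open >= window_time_ms:
            batches.append((window_open + window_time_ms, current))
            current = []
        if not current:
            window_open = t  # first message opens a new window
        current.append(t)
        # The window is full: send immediately.
        if len(current) >= window_msgs:
            batches.append((t, current))
            current = []
    if current:
        # Leftover messages wait for the timer to expire.
        batches.append((window_open + window_time_ms, current))
    return batches


def max_added_delay(batches: List[Batch]) -> float:
    """Longest time any message sat in a window before being sent."""
    return max(send - msg for send, msgs in batches for msg in msgs)


# Hypothetical traffic patterns, times in milliseconds.
sparse = [0, 5, 50]          # quiet cluster: a few scattered messages
busy = list(range(20))       # busy cluster: one message per millisecond

for name, arrivals in (("sparse", sparse), ("busy", busy)):
    for window_time in (100, 10):
        delay = max_added_delay(aggregate(arrivals, 10, window_time))
        print(f"{name:6s} WindowTime={window_time:3d} max added delay={delay} ms")
```

In this model, sparse traffic never fills a 10-message window, so every message waits out the timer and the added delay tracks WindowTime directly. Busy traffic fills the window first, so the timer hardly matters. That is at least consistent with the advice above that aggregation mostly pays off on very busy clusters and that WindowTime=100 may be too long a wait.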


Danny

On 12/08/15 10:25, Paul Edmon wrote:


We recently upgraded from 14.11 to 15.08.4 and wanted to try out 
MsgAggregation to see if it would improve cluster throughput and 
responsiveness.  However, when we turned it on with WindowMsgs=10 and 
WindowTime=100, everything slowed to a crawl and it looked like the 
slurmctld was threading like crazy.  When we turned it off, everything 
returned to normal.  Does anyone have any suggestions or guidelines for 
what to set MsgAggregationParams to?  I'm guessing it depends on the 
size of the cluster, as we have the same settings on our test cluster, 
but it has about 10 times fewer nodes than our main one.  I'm guessing 
this is a scaling problem.


Thoughts?  Anyone else using MsgAggregation?

-Paul Edmon-


[slurm-dev] Re: MsgAggregation Parameters

2015-12-08 Thread Paul Edmon


I will have to try that out.  Thanks for the info.

-Paul Edmon-

On 12/08/2015 01:54 PM, Danny Auble wrote: