For #2, I think MemstoreSizeCostFunction belongs to the same category if we are to adopt a moving average.
Some factors to consider:

The data structure used by StochasticLoadBalancer should be concise. The number of regions in a cluster can be expected to approach 1 million, so we cannot afford to store a long history of read / write requests in the master.

Cost calculation should be efficient. The balancer goes through many cost functions, so each one is expected to return quickly. Otherwise we would not come up with proper region movement plan(s) in time.

Cheers

On Wed, Jan 11, 2017 at 5:51 PM, Ted Yu <[email protected]> wrote:

> For #2, I think it makes sense to try out using request rates for cost
> calculation.
>
> If the experiment result turns out to be better, we can consider using
> such a measure.
>
> Thanks
>
> On Wed, Jan 11, 2017 at 5:34 PM, Timothy Brown <[email protected]>
> wrote:
>
>> Hi,
>>
>> I have a couple of questions about the StochasticLoadBalancer.
>>
>> 1) In CostFromRegionLoadFunction.getRegionLoadCost the cost weights
>> later samples of the RegionLoad more than previous ones. For example, with
>> a queue size of 4 it would be (.5 * load1 + .25*load2 + .125*load3 +
>> .125*load4). Is this the intended behavior?
>>
>> 2) Would it make more sense to calculate the ReadRequestCost and
>> WriteRequestCost as rates? Right now it looks like the cost is just based
>> off the total number of read/write requests a region has gotten over its
>> lifetime.
>>
>> -Tim
>>
>
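To make the rate and moving-average idea from this thread concrete, here is a minimal sketch of one way to turn lifetime request counters into a smoothed rate while keeping only constant-size state per region. It is not HBase code: RequestRateSketch, RegionState, update and the alpha parameter are all hypothetical names chosen for illustration, and the exponential moving average is just one possible choice of moving average.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not HBase code: every name here is made up for illustration.
final class RequestRateSketch {

  /** Constant-size state per region, so memory grows only with the region count. */
  private static final class RegionState {
    long lastCount;   // cumulative request count seen in the previous report
    long lastTimeMs;  // time of the previous report, in milliseconds
    double avgRate;   // moving average of requests per second
    boolean seeded;   // true once avgRate holds at least one sample
  }

  private final double alpha; // weight of the newest sample, in (0, 1]
  private final Map<String, RegionState> states = new HashMap<>();

  RequestRateSketch(double alpha) {
    if (!(alpha > 0.0 && alpha <= 1.0)) {
      throw new IllegalArgumentException("alpha must be in (0, 1]");
    }
    this.alpha = alpha;
  }

  /**
   * Folds one report of a lifetime request counter into the region's average.
   * Returns the current average rate in requests per second.
   */
  double update(String region, long cumulativeCount, long nowMs) {
    RegionState s = states.get(region);
    if (s == null) {
      // First report: only a baseline is known, so no rate can be computed yet.
      s = new RegionState();
      s.lastCount = cumulativeCount;
      s.lastTimeMs = nowMs;
      states.put(region, s);
      return 0.0;
    }
    long elapsedMs = nowMs - s.lastTimeMs;
    if (elapsedMs <= 0) {
      return s.avgRate; // ignore duplicate or out-of-order reports
    }
    long delta = cumulativeCount - s.lastCount;
    if (delta >= 0) {
      double rate = delta * 1000.0 / elapsedMs;
      // The first rate seeds the average; later rates are blended in with weight alpha.
      s.avgRate = s.seeded ? alpha * rate + (1.0 - alpha) * s.avgRate : rate;
      s.seeded = true;
    }
    // A negative delta is treated as a counter reset: keep the old average, re-baseline.
    s.lastCount = cumulativeCount;
    s.lastTimeMs = nowMs;
    return s.avgRate;
  }

  public static void main(String[] args) {
    RequestRateSketch tracker = new RequestRateSketch(0.5);
    long[] counts = {0, 100, 300, 300, 1100}; // lifetime totals, one report per second
    for (int i = 0; i < counts.length; i++) {
      System.out.println(tracker.update("region-a", counts[i], i * 1000L));
    }
  }
}
```

Each update is O(1) and no history is retained, which is in line with the size and speed concerns above. With alpha = 0.5 and four rate samples, the weights from newest to oldest work out to 0.5, 0.25, 0.125 and 0.125, the same pattern as the example in question 1.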
