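If there are only two spout executors and they are placed on node 1 and node 2, that would explain the imbalance. With localOrShuffleGrouping, if the target bolt has one or more tasks in the same worker process as the emitting spout, tuples are shuffled only among those in-process tasks; it falls back to a normal shuffle across the network only when there is no local task. So most of the tuples would stay on node 1 and node 2, and node 3 would sit idle.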
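To make the routing rule concrete, here is a small standalone Java sketch (no Storm dependency) that models it. It assumes one worker per node, two spout executors placed on node 1 and node 2, and two split-bolt executors per node; the class and method names are hypothetical, and round-robin stands in for Storm's randomized shuffle so the counts are deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalOrShuffleSketch {

    /**
     * Returns the indices of the bolt tasks a spout on spoutNode may send to,
     * mimicking the localOrShuffleGrouping rule: if any target task is in the
     * same worker, only those tasks are eligible; otherwise all tasks are.
     * Assumption: one worker per node, so "same worker" == "same node".
     */
    public static List<Integer> candidates(int spoutNode, int[] taskNodes) {
        List<Integer> local = new ArrayList<>();
        List<Integer> all = new ArrayList<>();
        for (int t = 0; t < taskNodes.length; t++) {
            all.add(t);
            if (taskNodes[t] == spoutNode) {
                local.add(t);
            }
        }
        return local.isEmpty() ? all : local;
    }

    /** Counts how many tuples the bolt tasks on each node receive. */
    public static Map<Integer, Integer> route(int[] spoutNodes, int[] taskNodes, int tuplesPerSpout) {
        Map<Integer, Integer> perNode = new TreeMap<>();
        for (int node : taskNodes) {
            perNode.put(node, 0);
        }
        for (int spoutNode : spoutNodes) {
            List<Integer> eligible = candidates(spoutNode, taskNodes);
            // Round-robin over the eligible tasks (Storm itself shuffles randomly).
            for (int i = 0; i < tuplesPerSpout; i++) {
                int task = eligible.get(i % eligible.size());
                perNode.merge(taskNodes[task], 1, Integer::sum);
            }
        }
        return perNode;
    }

    public static void main(String[] args) {
        // Six split-bolt executors, two per node, as described in the question.
        int[] splitTaskNodes = {1, 1, 2, 2, 3, 3};
        // Hypothetical placement: two spout executors, on node 1 and node 2.
        int[] spoutNodes = {1, 2};
        // Prints {1=1000, 2=1000, 3=0}: node 3's tasks never receive a tuple.
        System.out.println(route(spoutNodes, splitTaskNodes, 1000));
    }
}
```

In this simplified model node 3 gets nothing; on a real cluster the exact numbers depend on how many workers run per node and where Storm places each executor, so it only illustrates the direction of the skew.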
2016-02-22 9:09 GMT-06:00 Muhammad Bilal <mbilalce....@gmail.com>:

> Hi,
>
> I am running the RollingCount Benchmark from this set of benchmarks
> <https://github.com/intel-hadoop/storm-benchmark>. Here is the relevant
> piece of code:
>
>     spout = new FileReadSpout(BenchmarkUtils.ifAckEnabled(config));
>
>     TopologyBuilder builder = new TopologyBuilder();
>
>     builder.setSpout(SPOUT_ID, spout, spoutNum);
>     builder.setBolt(SPLIT_ID, new WordCount.SplitSentence(), spBoltNum)
>             .localOrShuffleGrouping(SPOUT_ID);
>     builder.setBolt(COUNTER_ID, new RollingCountBolt(windowLength, emitFreq),
>             rcBoltNum)
>             .fieldsGrouping(SPLIT_ID, new Fields(WordCount.SplitSentence.FIELDS));
>
> The FileReadSpout simply reads text from a file.
>
> I have a three node setup with a total of 96 cores with spBoltNum = 6 and
> rcBoltNum = 6. After a run, I see that there is a significant imbalance in the
> capacity metric reported for each executor of the split bolt. Even though
> each node has 2 executors for split bolt. I see the following numbers for
> capacity of split bolt executors on each node:
>
> Node 1 ~ 0.95
>
> Node 2 ~ 0.7
>
> Node 3 ~ 0.25
>
> I do not understand this imbalance in utilization as the grouping for
> split bolt is localOrShuffleGrouping, I was expecting the capacity reported
> for each executor to be more or less equal. What am I missing here?
>
> Here is the link to Stack Overflow question
> <http://stackoverflow.com/questions/35556817/utilization-imbalance-in-storm-bolt-executors>
> that I have posted.
>
> Thanks.
>
> Regards,
>
> Bilal

-- 
Rodrigo Valladares Cotta
Master's Student, Computer Science
University of Nebraska-Lincoln