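If there are only two spout executors and they are placed on node 1 and node 2,
that would explain the imbalance. localOrShuffleGrouping routes tuples only to
split-bolt tasks in the same worker process when any exist, and falls back to a
regular shuffle over the network only when there are none. So most of the
tuples would stay on node 1 and node 2, while node 3 would sit largely idle.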
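To make the routing argument concrete, here is a toy model in plain Java. It is not Storm code. It assumes one worker per node, so "same worker" means "same node". It also assumes two spout executors, on node 1 and node 2, and two split executors per node. Round-robin stands in for Storm's shuffle. The class and method names (`LocalOrShuffleSim`, `route`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model of localOrShuffleGrouping routing (not Storm code).
 * Assumes one worker per node, so "same worker" == "same node".
 */
public class LocalOrShuffleSim {

    /** Returns the number of tuples received by each split-bolt executor. */
    static int[] route(int[] spoutNodes, int[] boltNodes, int tuplesPerSpout) {
        int[] received = new int[boltNodes.length];
        List<Integer> all = new ArrayList<>();
        for (int b = 0; b < boltNodes.length; b++) all.add(b);

        for (int spoutNode : spoutNodes) {
            // Collect the bolt executors that live in this spout's worker.
            List<Integer> local = new ArrayList<>();
            for (int b = 0; b < boltNodes.length; b++) {
                if (boltNodes[b] == spoutNode) local.add(b);
            }
            // Only when no executor is local does the grouping consider every executor.
            List<Integer> targets = local.isEmpty() ? all : local;
            // Round-robin is a simplification standing in for Storm's shuffle.
            for (int t = 0; t < tuplesPerSpout; t++) {
                received[targets.get(t % targets.size())]++;
            }
        }
        return received;
    }

    public static void main(String[] args) {
        int[] spoutNodes = {1, 2};            // assumed: two spout executors, on node 1 and node 2
        int[] boltNodes = {1, 1, 2, 2, 3, 3}; // two split executors per node
        int[] received = route(spoutNodes, boltNodes, 1000);
        for (int b = 0; b < received.length; b++) {
            System.out.println("split executor " + b + " (node " + boltNodes[b] + "): " + received[b]);
        }
    }
}
```

In this model the four executors on node 1 and node 2 each receive 500 tuples and the two on node 3 receive none. The model only shows why node 3 is starved. It does not account for the gap between node 1 and node 2, which would depend on details such as how fast each spout emits.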

2016-02-22 9:09 GMT-06:00 Muhammad Bilal <mbilalce....@gmail.com>:

> Hi,
>
> I am running the RollingCount Benchmark from this set of benchmarks
> <https://github.com/intel-hadoop/storm-benchmark>. Here is the relevant
> piece of code:
>
> spout = new FileReadSpout(BenchmarkUtils.ifAckEnabled(config));
>
> TopologyBuilder builder = new TopologyBuilder();
>
> builder.setSpout(SPOUT_ID, spout, spoutNum);
> builder.setBolt(SPLIT_ID, new WordCount.SplitSentence(), spBoltNum)
>         .localOrShuffleGrouping(SPOUT_ID);
> builder.setBolt(COUNTER_ID, new RollingCountBolt(windowLength, emitFreq), rcBoltNum)
>         .fieldsGrouping(SPLIT_ID, new Fields(WordCount.SplitSentence.FIELDS));
>
> The FileReadSpout simply reads text from a file.
>
> I have a three-node setup with a total of 96 cores, with spBoltNum = 6 and
> rcBoltNum = 6. After a run, I see a significant imbalance in the capacity
> metric reported for each executor of the split bolt, even though each node
> has 2 executors of the split bolt. I see the following capacity numbers for
> the split bolt executors on each node:
>
> Node 1 ~ 0.95
>
> Node 2 ~ 0.7
>
> Node 3 ~ 0.25
>
> I do not understand this imbalance in utilization. Since the grouping for
> the split bolt is localOrShuffleGrouping, I was expecting the capacity
> reported for each executor to be more or less equal. What am I missing here?
>
> Here is the link to Stack Overflow question
> <http://stackoverflow.com/questions/35556817/utilization-imbalance-in-storm-bolt-executors>
> that I have posted.
>
> Thanks.
>
> Regards,
>
> Bilal
>



-- 
Rodrigo Valladares Cotta
Master's Student, Computer Science
University of Nebraska-Lincoln
