It is likely a combination of the parallelism of your components versus the
number of workers you have, the scheduler you are using, and the fact that
not all of your components use the same amount of CPU.  In other words, there
is no short answer.

The default scheduler in Storm does round-robin scheduling of executors to
workers.  For the most part this is fine, but if the parallelism of your
bolts/spouts is not a multiple of the number of workers, then not all of the
workers will be homogeneous.  If you then have one bolt or spout that uses a
lot more CPU than the others, you will see this kind of imbalance.
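
To make that concrete, here is a toy simulation (not Storm's actual scheduler
code) of round-robin placement, with made-up executor counts loosely based on
your numbers: 4 executors of a CPU-heavy bolt plus 25 lighter ones dealt out
to 9 workers.  Only 4 of the 9 workers end up holding a heavy executor.

    import java.util.ArrayList;
    import java.util.List;

    public class RoundRobinSketch {
        public static void main(String[] args) {
            int workers = 9;
            List<List<String>> assignment = new ArrayList<>();
            for (int i = 0; i < workers; i++) {
                assignment.add(new ArrayList<>());
            }

            int slot = 0;
            // 4 executors of the CPU-heavy bolt, then 25 of a lighter one,
            // dealt out to workers in round-robin order.
            for (int i = 0; i < 4; i++) {
                assignment.get(slot++ % workers).add("heavy");
            }
            for (int i = 0; i < 25; i++) {
                assignment.get(slot++ % workers).add("light");
            }

            // Workers 0-3 each carry a "heavy" executor on top of their share
            // of "light" ones, so their CPU usage is noticeably higher.
            for (int i = 0; i < workers; i++) {
                System.out.println("worker " + i + ": " + assignment.get(i));
            }
        }
    }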

In your case, and I am just guessing here, sparrow_firehose has a parallelism
of 4 but you have 9 workers.  The capacity for that bolt is also much higher
than for the others, which suggests to me that its CPU utilization is a lot
higher too (again, just a guess).  So I would suspect that the 4 workers
running at 40% CPU are the ones hosting the 4 sparrow_firehose executors.

There are a few ways to "fix" this.  You can always make the parallelism of
your components a multiple of the number of workers you have, but if you have
data skew that still will not balance things out.  Or you can look at moving
to the Resource Aware Scheduler, which takes into account the amount of memory
and CPU each component needs and tries to make sure that a node is never
overloaded (see the sketch below).  It does not try to make everything
perfectly even, but it will make sure you never overcommit a single node.
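
For the Resource Aware Scheduler route, here is a minimal sketch of what the
opt-in looks like.  The spout/bolt classes and the CPU/memory numbers below
are placeholders I made up, not your real components; you would measure your
own per-executor usage and plug those numbers in, and the scheduler class name
goes in storm.yaml on the cluster side.

    // storm.yaml on the cluster:
    //   storm.scheduler: "org.apache.storm.scheduler.resource.ResourceAwareScheduler"

    import org.apache.storm.Config;
    import org.apache.storm.testing.TestWordSpout;
    import org.apache.storm.topology.BasicOutputCollector;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.topology.base.BaseBasicBolt;
    import org.apache.storm.tuple.Tuple;

    public class RasSketch {

        // Trivial stand-in bolt; your real bolts go here.
        public static class NoopBolt extends BaseBasicBolt {
            @Override
            public void execute(Tuple tuple, BasicOutputCollector collector) { }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) { }
        }

        public static void main(String[] args) {
            TopologyBuilder builder = new TopologyBuilder();

            // Per-executor resource hints; the scheduler packs executors onto
            // nodes without exceeding each supervisor's capacity.
            builder.setSpout("kinesis_spout", new TestWordSpout(), 60)
                    .setCPULoad(10.0)       // ~10% of one core per executor (made up)
                    .setMemoryLoad(128.0);  // 128 MB on-heap per executor (made up)

            builder.setBolt("sparrow_firehose", new NoopBolt(), 4)
                    .setCPULoad(100.0)      // roughly one full core per executor (made up)
                    .setMemoryLoad(512.0)
                    .shuffleGrouping("kinesis_spout");

            Config conf = new Config();
            conf.setNumWorkers(9);
            // StormSubmitter.submitTopology("ras-sketch", conf, builder.createTopology());
        }
    }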

We are also working on elasticity support, where Storm can rebalance your
topology based on actual measurements of the resources it is using, but that
is still a ways off.

- Bobby

On Mon, Oct 30, 2017 at 12:33 PM Noppanit Charassinvichai <
[email protected]> wrote:

> I meant workers and not supervisor. Sorry for the confusion.
>
> On Mon, 30 Oct 2017 at 13:32 Noppanit Charassinvichai <
> [email protected]> wrote:
>
>> Ah sorry I mixed up supervisor and workers. Those are all workers.
>>
>> On Mon, 30 Oct 2017 at 13:06 Priyank Shah <[email protected]> wrote:
>>
>>> Hi Noppanit,
>>>
>>>
>>>
>>> Supervisor processes do not run your spouts and bolts. If those CPU
>>> usage percentages are for supervisor processes, you should check the same
>>> for worker processes. Hope this helps.
>>>
>>>
>>>
>>> *From: *Noppanit Charassinvichai <[email protected]>
>>> *Reply-To: *"[email protected]" <[email protected]>
>>> *Date: *Monday, October 30, 2017 at 7:52 AM
>>> *To: *"[email protected]" <[email protected]>
>>> *Subject: *How to distribute the workload equally for Storm Cluster
>>>
>>>
>>>
>>> Hi!
>>>
>>>
>>>
>>> I have a Storm cluster which processes events from Kinesis and forwards
>>> events to other Kinesis Streams and Druid. I have 9 supervisors running. My
>>> process is mostly CPU bound because it's just forwarding events.
>>>
>>>
>>>
>>> I noticed that my supervisors are not all working equally.
>>>
>>>
>>>
>>> 4 of them running at 40% CPU
>>>
>>> 4 of them running at 12% CPU
>>>
>>> 1 is barely touching 8%
>>>
>>>
>>>
>>> I wonder if there's a way to distribute the workload equally so I can
>>> reduce the number of supervisors running.
>>>
>>>
>>>
>>> Here are the details of my cluster
>>>
>>>
>>>
>>> - I'm running one Topology only so I'm not sharing resources.
>>>
>>> - I'm running 158 Executors
>>>
>>>  - 60 for Kinesis Spout
>>>
>>>  - 25 to Parse the events
>>>
>>>  - 45 to Send to one Kinesis Stream
>>>
>>>  - 15 to Send to another Kinesis Stream (It has significantly fewer
>>> events)
>>>
>>>  - 4 to Send to Druid
>>>
>>>
>>>
>>> The complete latency is around 140ms which is pretty good.
>>>
>>>
>>>
>>> I have also attached the screenshot of my cluster
>>>
>>>
>>>
>>> Here's the topology configuration
>>>
>>>
>>>
>>>         builder.setSpout("kinesis_spout", spout,
>>>                 parseInt(property.getSpoutExecutorNumber()));
>>>
>>>         builder.setBolt("parse_bolt", new ParsingDruidBolt(),
>>>                 parseInt(property.getDruidParseBoltExecutorNumber()))
>>>                 .shuffleGrouping("kinesis_spout");
>>>
>>>         builder.setBolt("send_to_kinesis", new EmitPageViewBolt(),
>>>                 property.getPageViewBoltExecutorNumber())
>>>                 .shuffleGrouping("parse_bolt", STREAM_PAGE_VIEW);
>>>
>>>         builder.setBolt("send_to_variations_kinesis", new EmitVariationsBolt(),
>>>                 property.getVariationsBoltExecutorNumber())
>>>                 .shuffleGrouping("parse_bolt", STREAM_VARIATIONS);
>>>
>>>         builder.setBolt("sparrow_firehose", new BeamBolt<>(new DruidBeamFactory(topologyConfig)),
>>>                 parseInt(property.getDruidSparrowBoltExecutorNumber()))
>>>                 .shuffleGrouping("parse_bolt", STREAM_EVENT);
>>>
>>> [image: Screen Shot 2017-10-30 at 10.48.15 AM.png]
>>>
>>
