It is likely a combination of the parallelism of your components vs. the number of workers you have, the scheduler you are using, and the fact that not all of your components use the same amount of CPU. In other words, there is no short answer.
The default scheduler in Storm does round-robin scheduling of executors to workers. For the most part this is fine, but if the parallelism of your bolts/spouts is not a multiple of the number of workers, then not all of the workers will be homogeneous. If you then have one bolt or spout that uses a lot more CPU than the others, you will see this.

In your case, and I am just guessing here, sparrow_firehose has a parallelism of 4, but you have 9 workers. The capacity for that bolt is also much higher than the others, which indicates to me that its CPU utilization might be a lot higher too (just a guess). So I would suspect that the 4 workers at 40% CPU are the ones hosting the 4 sparrow_firehose executors.

There are a few ways to "fix" this. You can always make the parallelism of your components a multiple of the number of workers you have, but if you have data skew that still won't be enough. Or you can look at moving to the resource aware scheduler, which takes into account the amount of memory and CPU each component needs and tries to make sure that a node is never overloaded. It does not try to make everything perfectly even, but it will make sure you never overcommit a single node. We are also working on elasticity support, where Storm can rebalance your topology based on actual measurements of the resources it is actually using, but that is still a ways off. (Hypothetical sketches of the first two options are appended after the quoted thread below.)

- Bobby

On Mon, Oct 30, 2017 at 12:33 PM Noppanit Charassinvichai <[email protected]> wrote:

> I meant workers and not supervisors. Sorry for the confusion.
>
> On Mon, 30 Oct 2017 at 13:32 Noppanit Charassinvichai <[email protected]> wrote:
>
>> Ah sorry, I mixed up supervisors and workers. Those are all workers.
>>
>> On Mon, 30 Oct 2017 at 13:06 Priyank Shah <[email protected]> wrote:
>>
>>> Hi Noppanit,
>>>
>>> Supervisor processes do not run your spouts and bolts. If those CPU usage percentages are for supervisor processes, you should check the same for the worker processes. Hope this helps.
>>>
>>> From: Noppanit Charassinvichai <[email protected]>
>>> Reply-To: "[email protected]" <[email protected]>
>>> Date: Monday, October 30, 2017 at 7:52 AM
>>> To: "[email protected]" <[email protected]>
>>> Subject: How to distribute the workload equally for Storm Cluster
>>>
>>> Hi!
>>>
>>> I have a Storm cluster which processes events from Kinesis and forwards them to other Kinesis streams and to Druid. I have 9 supervisors running. My process is mostly CPU bound because it is just forwarding events.
>>>
>>> I noticed that my supervisors are not all working equally:
>>>
>>> 4 of them are running at 40% CPU
>>> 4 of them are running at 12% CPU
>>> 1 is barely touching 8%
>>>
>>> I wonder if there is a way to distribute the workload equally so I can reduce the number of supervisors running.
>>>
>>> Here are the details of my cluster:
>>>
>>> - I'm running only one topology, so I'm not sharing resources.
>>> - I'm running 158 executors:
>>>   - 60 for the Kinesis spout
>>>   - 25 to parse the events
>>>   - 45 to send to one Kinesis stream
>>>   - 15 to send to another Kinesis stream (it has significantly fewer events)
>>>   - 4 to send to Druid
>>>
>>> The complete latency is around 140 ms, which is pretty good.
>>> I have also attached the screenshot of my cluster.
>>>
>>> Here's the topology configuration:
>>>
>>> builder.setSpout("kinesis_spout", spout,
>>>         parseInt(property.getSpoutExecutorNumber()));
>>>
>>> builder.setBolt("parse_bolt", new ParsingDruidBolt(),
>>>         parseInt(property.getDruidParseBoltExecutorNumber()))
>>>     .shuffleGrouping("kinesis_spout");
>>>
>>> builder.setBolt("send_to_kinesis", new EmitPageViewBolt(),
>>>         property.getPageViewBoltExecutorNumber())
>>>     .shuffleGrouping("parse_bolt", STREAM_PAGE_VIEW);
>>>
>>> builder.setBolt("send_to_variations_kinesis", new EmitVariationsBolt(),
>>>         property.getVariationsBoltExecutorNumber())
>>>     .shuffleGrouping("parse_bolt", STREAM_VARIATIONS);
>>>
>>> builder.setBolt("sparrow_firehose", new BeamBolt<>(new DruidBeamFactory(topologyConfig)),
>>>         parseInt(property.getDruidSparrowBoltExecutorNumber()))
>>>     .shuffleGrouping("parse_bolt", STREAM_EVENT);
>>>
>>> [image: Screen Shot 2017-10-30 at 10.48.15 AM.png]
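To make the first of the suggestions above concrete, here is a minimal, hypothetical sketch of sizing parallelism as a multiple of the worker count. It reuses the component names, the spout variable, topologyConfig and STREAM_EVENT from the quoted topology; the executor counts (63, 27, 9) and the topology name are illustrative assumptions, not the poster's actual settings.

    import org.apache.storm.Config;
    import org.apache.storm.topology.TopologyBuilder;

    // Sketch: with 9 workers, pick executor counts that are multiples of 9 so the
    // default round-robin scheduler gives every worker the same mix of executors.
    // The counts below are assumed round-ups of the original 60, 25 and 4.
    int numWorkers = 9;

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("kinesis_spout", spout, 7 * numWorkers);              // 63 executors
    builder.setBolt("parse_bolt", new ParsingDruidBolt(), 3 * numWorkers)  // 27 executors
           .shuffleGrouping("kinesis_spout");
    builder.setBolt("sparrow_firehose",
                    new BeamBolt<>(new DruidBeamFactory(topologyConfig)),
                    numWorkers)                                            // one per worker
           .shuffleGrouping("parse_bolt", STREAM_EVENT);
    // ... remaining bolts from the quoted topology elided ...

    Config conf = new Config();
    conf.setNumWorkers(numWorkers);
    // StormSubmitter.submitTopology("forwarding-topology", conf, builder.createTopology());

As noted in the reply, this only helps if load is spread evenly across executors; data skew (for example one hot Kinesis shard) can still leave some workers busier than others.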

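And a similarly minimal sketch of the resource aware scheduler route, assuming Storm 1.x RAS APIs (setCPULoad / setMemoryLoad) and a cluster whose storm.yaml has been switched to the ResourceAwareScheduler. The CPU and memory figures below are placeholder guesses; real values should come from measuring the running topology.

    import org.apache.storm.Config;
    import org.apache.storm.topology.TopologyBuilder;

    // Cluster side (storm.yaml), roughly:
    //   storm.scheduler: "org.apache.storm.scheduler.resource.ResourceAwareScheduler"
    //   supervisor.cpu.capacity: 400.0         # assumed 4 cores, 100 points per core
    //   supervisor.memory.capacity.mb: 4096.0  # assumed memory available per supervisor

    // Topology side: declare what each component needs so the scheduler can pack
    // executors onto nodes without overcommitting any of them. All numbers are guesses.
    TopologyBuilder builder = new TopologyBuilder();

    builder.setSpout("kinesis_spout", spout, 60)
           .setCPULoad(15.0)         // ~15% of a core per spout executor
           .setMemoryLoad(128.0);    // 128 MB on-heap per spout executor

    builder.setBolt("sparrow_firehose",
                    new BeamBolt<>(new DruidBeamFactory(topologyConfig)), 4)
           .shuffleGrouping("parse_bolt", STREAM_EVENT)
           .setCPULoad(100.0)        // the CPU-heavy bolt: roughly a full core per executor
           .setMemoryLoad(512.0);
    // ... remaining components from the quoted topology elided ...

    Config conf = new Config();
    conf.setNumWorkers(9);
    conf.setTopologyWorkerMaxHeapSize(2048.0); // cap per-worker heap so executors stay packable

With declarations like these the scheduler will refuse to place more declared load on a node than it has capacity for, which addresses the overcommit problem even when parallelism is not a neat multiple of the worker count.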