There is no magic number; it depends on the specific problem you are trying to solve. Start with a reasonable value for the parallelism and tune it based on your requirements. You could also start with a higher number of “tasks” than the parallelism; you can then rebalance your topology and adjust the parallelism on the fly to scale up or down.
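To make the tasks-versus-parallelism point concrete, here is a minimal sketch in plain Java. It does not use the Storm API. It assumes a component declared with 8 tasks (in Storm's Java API the task count is set with setNumTasks on the bolt declarer) and shows how those tasks could be spread over a varying number of executor threads. The class name ParallelismSketch, the helper assign, and the round-robin split are assumptions made for illustration, not Storm's actual scheduler. The one rule taken from the Storm documentation is that a component never has more executors than tasks.

```java
import java.util.ArrayList;
import java.util.List;

class ParallelismSketch {
    // Hypothetical helper: spread task ids 0..numTasks-1 over the executors
    // round-robin. This is an illustrative model, not Storm's real assignment.
    static List<List<Integer>> assign(int numTasks, int numExecutors) {
        if (numExecutors < 1 || numExecutors > numTasks) {
            // The task count is fixed up front, so it caps the usable executors.
            throw new IllegalArgumentException("executors must be between 1 and numTasks");
        }
        List<List<Integer>> executors = new ArrayList<>();
        for (int e = 0; e < numExecutors; e++) {
            executors.add(new ArrayList<>());
        }
        for (int t = 0; t < numTasks; t++) {
            executors.get(t % numExecutors).add(t);
        }
        return executors;
    }

    public static void main(String[] args) {
        int numTasks = 8; // assumed value, chosen higher than the initial parallelism
        // Scaling up or down changes only the executor count; the tasks stay the same.
        for (int numExecutors : new int[] {2, 4, 8}) {
            System.out.println(numExecutors + " executors -> " + assign(numTasks, numExecutors));
        }
    }
}
```

In this model, declaring more tasks than executors up front is what leaves headroom to scale: a later rebalance can raise the executor count up to the task count without redeploying the topology.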
See the slides from Taylor’s “Scaling Storm” presentation; you might find them useful: http://www.slideshare.net/ptgoetz/scaling-apache-storm-strata-hadoopworld-2014

From: sam mohel <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, January 23, 2017 at 4:58 PM
To: "[email protected]" <[email protected]>
Subject: Re: simple question about grouping

Many thanks, but how and when can I decide whether this number is right for me or not?

On Mon, Jan 23, 2017 at 1:27 PM, Arun Mahadevan <[email protected]> wrote:

> builder.setBolt("MyBolt", new MyBolt(), 4).shuffleGrouping("MySpout");
> I found this example but couldn't work out why it uses the number 4.

This is the “parallelism hint” (the number of threads) for “MyBolt”. So in your example there will be 4 threads executing “MyBolt” across the workers in your cluster, and the tuples from “MySpout” will be randomly distributed across all 4 instances of your bolt.

Also see http://storm.apache.org/releases/1.0.1/Understanding-the-parallelism-of-a-Storm-topology.html

From: sam mohel <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, January 23, 2017 at 4:47 PM
To: "[email protected]" <[email protected]>
Subject: Re: simple question about grouping

Excuse me, if I have a single spout and a single bolt, and the bolt does 2 processes, can I do it like this?

builder.setSpout("MySpout", new mySpout(), 1);
builder.setBolt("MyBolt", new MyBolt(), 4).shuffleGrouping("MySpout");

I found this example but couldn't work out why it uses the number 4.

On Mon, Jan 23, 2017 at 1:13 PM, sam mohel <[email protected]> wrote:

Thanks for replying.

On Mon, Jan 23, 2017 at 1:14 PM, Arun Mahadevan <[email protected]> wrote:

Grouping makes sense only when you have more than one task for a bolt. If your bolt has more than one task, the grouping decides how the tuples from the spout are distributed to the individual tasks of the bolt (shuffle = random, fields = keyed on some field, and so on).
See http://storm.apache.org/releases/current/Concepts.html

Thanks,
Arun

From: sam mohel <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, January 23, 2017 at 3:09 PM
To: "[email protected]" <[email protected]>, "[email protected]" <[email protected]>
Subject: simple question about grouping

I have a text file containing data; its size is 3.5 MB. My topology consists of one spout and one bolt. Is it possible to do all the processing in one bolt, and in that case what is the role of grouping?

Thanks in advance
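As an illustration of the grouping explanation given in the thread (shuffle = random, fields = keyed on some field), the following plain-Java sketch models how a tuple could be routed to one of a bolt's tasks. It does not use the Storm API. The class GroupingSketch, the helpers shuffleTask and fieldsTask, and the sample words are all hypothetical. Hashing the key modulo the task count is a simplified stand-in for Storm's fields grouping, and the purely random pick ignores any load balancing a real shuffle grouping may do.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

class GroupingSketch {
    // Shuffle grouping model: any of the bolt's tasks may receive the tuple.
    static int shuffleTask(Random rng, int numTasks) {
        return rng.nextInt(numTasks);
    }

    // Fields grouping model: the same key value always lands on the same task.
    static int fieldsTask(String key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4; // mirrors the parallelism hint of 4 discussed in the thread
        Random rng = new Random();
        List<String> words = Arrays.asList("storm", "bolt", "storm", "spout", "storm");
        for (String word : words) {
            System.out.println(word
                    + " shuffle -> task " + shuffleTask(rng, numTasks)
                    + ", fields -> task " + fieldsTask(word, numTasks));
        }
    }
}
```

With a single task both functions can only return 0, which matches the point made in the thread: grouping only matters once a bolt has more than one task.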
