If your bolts are evenly spread and there is at least one on each worker using localOrShuffleGrouping can be a huge optimization.
Consider this example. You have 10 bolt A tasks, 10 bolt B tasks, and 10 workers. Each worker has 1 of each task. Bolt b subscribes to bolt a. In shuffle grouping, you have 1/10 chance that any given tuple will stay within the worker. 90% of your inter-bolt traffic must be serialized, sent over the network, and deserialized. With localOrShuffleGrouping, that number is 0%. On Wed, Jul 1, 2015 at 3:15 PM, Abhishek Raj <[email protected]> wrote: > I understand the difference between "shuffle grouping" and "local or > shuffle grouping". But I haven't yet come across any instances where I > would prefer using localOrShuffleGrouping instead of shuffleGrouping. The > way I see it, shuffleGrouping helps increasing the concurrency if the tasks > of a given component are distributed across several workers. > > Could anyone throw some light on why the additional grouping method > "localOrShuffleGrouping" was created and probably some use cases? > > Thanks! > Abhishek. >
