You may want to look at the partial key grouping feature to reduce hotspots, i.e. cases where key skew makes one bolt instance straggle while the others sit idle. This feature is in 0.9.4.
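
To make the idea concrete, here is a rough sketch of the routing rule behind partial key grouping, as I understand it: hash each key twice to get two candidate tasks, then send the tuple to whichever candidate has received fewer tuples so far. This is not the Storm implementation and uses no Storm API; the class name PartialKeySketch, the method chooseTask and the hash seeds are all made up for illustration.

```java
import java.util.Arrays;

// Illustrative only: a standalone model of "two choices per key" routing.
// None of these names come from Storm.
public class PartialKeySketch {
    // Number of tuples routed to each downstream task so far.
    private final long[] load;

    public PartialKeySketch(int numTasks) {
        if (numTasks < 1) {
            throw new IllegalArgumentException("numTasks must be >= 1");
        }
        this.load = new long[numTasks];
    }

    // Mix the key's hash with a seed to get one candidate task index.
    // The seeds and mixing constants are arbitrary assumptions.
    private int candidate(String key, int seed) {
        int h = key.hashCode() * 31 + seed;
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        return Math.floorMod(h, load.length);
    }

    // Route one tuple: pick the less loaded of the key's two candidates.
    public int chooseTask(String key) {
        int a = candidate(key, 0x9747b28c);
        int b = candidate(key, 0x3c6ef372);
        int pick = load[a] <= load[b] ? a : b;
        load[pick]++;
        return pick;
    }

    public long[] loads() {
        return load.clone();
    }

    public static void main(String[] args) {
        PartialKeySketch grouping = new PartialKeySketch(4);
        // Worst case from the question: every tuple carries the same key.
        for (int i = 0; i < 100_000; i++) {
            grouping.chooseTask("hot-key");
        }
        System.out.println(Arrays.toString(grouping.loads()));
    }
}
```

Two caveats. A single key is spread over at most two tasks, so with one dominant key you get at best a 2x improvement, not a spread across every instance. And because the same key now lands on two instances, each one only holds a partial aggregate, so you would need one more step downstream to merge the two partial results per key.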
-Rajiv

> On Apr 8, 2015, at 11:52 AM, Kashyap Mhaisekar <[email protected]> wrote:
>
> Hi,
> My topology is like the following ->
> Spout -> Bolt A -> Bolt B -> Bolt C -> Bolt D
>
> the groupings between Bolt C -> Bolt D is a field grouping as Bolt D is doing
> an aggregation while everything else is a shuffleGrouping.
>
> Use Case:
> If the spout emits 100K tuples such that the emits all are grouped on the
> same field, then Bolt D will need to take all the load and hence becomes very
> slow. IN this case, increasing the no. of instances of Bolt D will not help
> as the grouping is for the instance of Bolt D.
>
> Question: How can this be optimized?
>
> Did anyone face such a use case? Please recommend.
>
> Thanks
> Kashyap
