I also think option 2 is better. Besides the smaller payload that goes across the wire, there is another reason to choose it. Today bolt A might split the stream 1:1 between B and C. But if that later becomes, say, 1:2, having a separate stream for C lets you scale bolt C (more parallelism) to improve throughput. With only one stream, you could give one message to B and two messages (as a list) to C, but there would be no way to scale C: even if you added more parallelism, throughput wouldn't improve, because each C task would still have to process the two messages serially.
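To make the routing argument concrete, here is a small dependency-free model of it. This is only a sketch: `StreamRoutingSketch`, `Component`, and the round-robin delivery are hypothetical stand-ins invented for illustration, not Storm classes. In a real topology the same shape would come from `declareStream` on the `OutputFieldsDeclarer`, emitting with a stream id, and a grouping such as `shuffleGrouping(componentId, streamId)` on the subscriber.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of named-stream routing. None of these types are Storm classes.
public class StreamRoutingSketch {

    // One downstream component: a name plus a fixed number of parallel tasks,
    // each with its own inbox of received payloads.
    public static final class Component {
        public final String name;
        public final List<List<String>> taskInboxes = new ArrayList<>();
        private int next = 0; // round-robin cursor, standing in for a shuffle grouping

        public Component(String name, int parallelism) {
            this.name = name;
            for (int i = 0; i < parallelism; i++) {
                taskInboxes.add(new ArrayList<>());
            }
        }

        // Hand the payload to the next task in round-robin order.
        void deliver(String payload) {
            taskInboxes.get(next).add(payload);
            next = (next + 1) % taskInboxes.size();
        }
    }

    // streamId -> components subscribed to that stream
    private final Map<String, List<Component>> subscribers = new HashMap<>();

    public void subscribe(String streamId, Component component) {
        subscribers.computeIfAbsent(streamId, k -> new ArrayList<>()).add(component);
    }

    public void emit(String streamId, String payload) {
        for (Component c : subscribers.getOrDefault(streamId, new ArrayList<>())) {
            c.deliver(payload);
        }
    }

    // Bolt A with a 1:2 split: one tuple for B and two separate tuples for C per input.
    public void boltAExecute(String input) {
        emit("streamB", input + "-forB");
        emit("streamC", input + "-forC-1");
        emit("streamC", input + "-forC-2");
    }

    public static void main(String[] args) {
        StreamRoutingSketch topology = new StreamRoutingSketch();
        Component boltB = new Component("B", 1);
        Component boltC = new Component("C", 2); // C is scaled independently of B
        topology.subscribe("streamB", boltB);
        topology.subscribe("streamC", boltC);

        topology.boltAExecute("msg0");

        System.out.println("B task 0: " + boltB.taskInboxes.get(0));
        System.out.println("C task 0: " + boltC.taskInboxes.get(0));
        System.out.println("C task 1: " + boltC.taskInboxes.get(1));
    }
}
```

Because the two C messages are emitted as separate tuples on their own stream, they can land on different C tasks. If they were packed as a list into one tuple, that whole tuple would go to a single task.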
I do not think there is a cost to having more streams, so choosing the second option might be better.

On Fri, Aug 7, 2015 at 12:01 PM, Javier Gonzalez <[email protected]> wrote:

> Hi all,
>
> Suppose I have a bolt A that has to send information to two bolts B and C.
> Each bolt must receive different information from the original A bolt.
> Which of these strategies is more efficient?
>
> Strategy 1:
> - have A declare a single output stream, with fields "forB" and "forC".
> - Emit all the information in a single tuple, putting the information for
>   Bolt B in "forB" and the information for bolt C in "forC".
> - Have Bolt B and Bolt C subscribe to Bolt A's single output channel.
> - In Bolt B and Bolt C execute method read only the relevant part of the
>   input tuple.
>
> Strategy 2:
> - have A declare two output streams, "streamB" and "streamC".
> - emit one tuple with the information for bolt B in streamB, and one with
>   the information for Bolt C in streamC.
> - Have each bolt subscribe only to their relevant stream.
> - Each bolt works as usual with their payload in their execute methods.
>
> A priori I would think Strategy 2 is better (as we would be emitting
> smaller tuples), but I'm not sure if there's a hidden cost/benefit in
> having multiple subscribers to a single stream
>
> Thank you,
> Javier
