I also think option 2 is better. There is another reason for choosing it,
besides the smaller payload that goes across the wire. Today Bolt A might
split the stream 1:1 for B and C, but if that later becomes 1:2, for
example, having a separate stream for C lets you scale Bolt C (more
parallelism) to improve throughput. If you had only one stream, you could
give one message to B and two messages (as a list) to C, but there would be
no way to scale C: even if you added more parallelism, the throughput
wouldn't improve, because a single task would still have to process both
messages serially.
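To make the serial-processing point concrete, here is a minimal sketch in plain Java. It is a toy model, not the Storm API: it assumes C's tasks receive tuples round-robin (a simplification of shuffle grouping), and the class and method names (StreamScalingModel, maxItemsPerTask) and the parallelism of 2 are hypothetical. It reports the largest number of work items any single C task has to process for one emission from A:

```java
import java.util.List;

// Toy model (not the Storm API): distributes tuples over the tasks of bolt C
// round-robin, a simplification of Storm's shuffle grouping, and reports the
// largest number of work items any single task must process serially.
public class StreamScalingModel {

    // Each inner list is one tuple; its size is the number of work items
    // the receiving task has to handle for that tuple, one after another.
    static int maxItemsPerTask(List<List<String>> tuples, int tasks) {
        int[] load = new int[tasks];
        for (int i = 0; i < tuples.size(); i++) {
            load[i % tasks] += tuples.get(i).size();
        }
        int max = 0;
        for (int l : load) {
            max = Math.max(max, l);
        }
        return max;
    }

    public static void main(String[] args) {
        int tasksForC = 2; // hypothetical parallelism hint for bolt C

        // Single stream: the two items for C travel together as one list.
        List<List<String>> batched = List.of(List.of("c1", "c2"));

        // Dedicated stream for C: each item is emitted as its own tuple.
        List<List<String>> separate = List.of(List.of("c1"), List.of("c2"));

        System.out.println("batched:  " + maxItemsPerTask(batched, tasksForC));
        System.out.println("separate: " + maxItemsPerTask(separate, tasksForC));
    }
}
```

With two tasks, the list-in-one-tuple version leaves one task with both items (2), while separate tuples let each task take one (1). In this model, adding more tasks never helps the first case, because a single tuple always lands on a single task.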

I do not think there is a cost to having more streams, so choosing the
second option might be better.

On Fri, Aug 7, 2015 at 12:01 PM, Javier Gonzalez <[email protected]> wrote:

> Hi all,
>
> Suppose I have a bolt A that has to send information to two bolts B and C.
> Each bolt must receive different information from the original A bolt.
> Which of these strategies is more efficient?
>
> Strategy 1:
> - Have A declare a single output stream, with fields "forB" and "forC".
> - Emit all the information in a single tuple, putting the information for
> Bolt B in "forB" and the information for Bolt C in "forC".
> - Have Bolt B and Bolt C subscribe to Bolt A's single output stream.
> - In the execute methods of Bolt B and Bolt C, read only the relevant part
> of the input tuple.
>
> Strategy 2:
> - Have A declare two output streams, "streamB" and "streamC".
> - Emit one tuple with the information for Bolt B in streamB, and one with
> the information for Bolt C in streamC.
> - Have each bolt subscribe only to its relevant stream.
> - Each bolt works as usual with its payload in its execute method.
>
> A priori I would think Strategy 2 is better (as we would be emitting
> smaller tuples), but I'm not sure whether there's a hidden cost/benefit in
> having multiple subscribers to a single stream.
>
> Thank you,
> Javier
>
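The payload-size argument in the question can be sketched as a toy model in plain Java, not the Storm API. It assumes that each component subscribed to a stream receives a full copy of every tuple emitted on it, it counts characters as a rough proxy for bytes on the wire, and the payload sizes, the "default" stream name used for Strategy 1, the "forB"/"forC" field names in Strategy 2, and the class name SubscriptionModel are all hypothetical:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (not the Storm API): every component subscribed to a stream
// receives a full copy of each tuple emitted on that stream.
public class SubscriptionModel {

    private final Map<String, List<String>> subscribers = new HashMap<>();
    private final Map<String, Integer> charsReceived = new HashMap<>();

    void subscribe(String component, String stream) {
        subscribers.computeIfAbsent(stream, s -> new ArrayList<>()).add(component);
    }

    // A tuple is modeled as field name -> string value; its "size" is the
    // total length of its values, used as a rough proxy for bytes sent.
    void emit(String stream, Map<String, String> tuple) {
        int size = tuple.values().stream().mapToInt(String::length).sum();
        for (String component : subscribers.getOrDefault(stream, List.of())) {
            charsReceived.merge(component, size, Integer::sum);
        }
    }

    int received(String component) {
        return charsReceived.getOrDefault(component, 0);
    }

    public static void main(String[] args) {
        String forB = "x".repeat(100); // hypothetical 100-char payload for B
        String forC = "y".repeat(300); // hypothetical 300-char payload for C

        // Strategy 1: one stream with fields "forB" and "forC"; B and C both subscribe.
        SubscriptionModel s1 = new SubscriptionModel();
        s1.subscribe("B", "default");
        s1.subscribe("C", "default");
        s1.emit("default", Map.of("forB", forB, "forC", forC));

        // Strategy 2: "streamB" and "streamC"; each bolt subscribes only to its own.
        SubscriptionModel s2 = new SubscriptionModel();
        s2.subscribe("B", "streamB");
        s2.subscribe("C", "streamC");
        s2.emit("streamB", Map.of("forB", forB));
        s2.emit("streamC", Map.of("forC", forC));

        System.out.println("Strategy 1: B=" + s1.received("B") + " C=" + s1.received("C"));
        System.out.println("Strategy 2: B=" + s2.received("B") + " C=" + s2.received("C"));
    }
}
```

Under these assumptions it prints B=400 C=400 for Strategy 1 and B=100 C=300 for Strategy 2: with a single stream, each bolt also receives the other bolt's data only to ignore it.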
