With a single worker, bolts A and B receive a reference to the same tuple, 
since they run in the same JVM (the splitter bolt emits it once, and the 
threads/tasks running A and B get the same instance of the tuple). If you 
then mutate any value in the tuple (the collection of scores, in your case) 
and emit, both bolts end up mutating the same list, which is why you are 
seeing the results from both paths in your output. With two workers, a tuple 
sent to a bolt in the other worker is serialized and deserialized on the way, 
so that bolt works on its own copy, which is why the lists can differ there.

You should avoid mutating the values in tuples. Always treat tuple values 
as immutable (e.g. create a new collection with the updated scores), or 
better, make your collections immutable.
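
To illustrate the difference, here is a minimal sketch in plain Java. It does 
not use the Storm API at all: the lists simply stand in for the score list 
carried in your tuple, and the class and method names (SharedTupleDemo, 
scoreInPlace, scoreWithCopy) are made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SharedTupleDemo {

    // Mutating style: appends to the very list it was handed, so every
    // holder of that reference sees the change.
    static List<String> scoreInPlace(List<String> scores, String score) {
        scores.add(score);
        return scores;
    }

    // Copying style: leaves the input untouched and returns a new,
    // unmodifiable list containing the old scores plus the new one.
    static List<String> scoreWithCopy(List<String> scores, String score) {
        List<String> updated = new ArrayList<>(scores);
        updated.add(score);
        return Collections.unmodifiableList(updated);
    }

    public static void main(String[] args) {
        // Two "paths" sharing one list instance, as in a single JVM.
        List<String> shared = new ArrayList<>();
        List<String> pathA = scoreInPlace(shared, "A");
        List<String> pathB = scoreInPlace(shared, "B");
        System.out.println(pathA + " " + pathB); // prints "[A, B] [A, B]"

        // Same two "paths", but each one builds its own copy.
        List<String> original = Collections.unmodifiableList(new ArrayList<String>());
        List<String> copyA = scoreWithCopy(original, "A");
        List<String> copyB = scoreWithCopy(original, "B");
        System.out.println(copyA + " " + copyB); // prints "[A] [B]"
    }
}
```

In a bolt, the equivalent would be to build the new list from the incoming 
value and emit that, rather than adding to the list you received.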

Regards,
Arun

On 3/26/17, 5:37 PM, "Michael Sweeney" <[email protected]> wrote:

>
>Hi Guys,
>
>I am new to Storm and am looking at using it on a project for processing 
>NetFlow data. During my initial experimentation with topologies I have 
>encountered some behaviour that I am unsure of.
>
>
>My topology is as follows
>
>       Spout
>         |
>    SplitterBolt
>     |        |
>   BoltA    BoltB
>     |        |
>     |      BoltC
>     |        |
>     |      BoltD
>     |        |
>     OutputBolt
>
>My tuples consist of two Java classes, a parent object and a collection 
>of child objects contained in a List attribute called Scores in the 
>parent object (a Java List).
>
>The SplitterBolt sends each Tuple down both paths. Each of the bolts A-D 
>in the topology tests some attribute in the parent object and then adds 
>a relevant entry to the list to reflect the outcome of the test. This is 
>not my final design but it does reflect the route I will be following as 
>I progress and I will be adding more paths as I proceed.
>
>When I run the above topology with a single worker I note that two 
>instances of each tuple arrive at the OutputBolt a few milliseconds 
>apart. In each case the collection of values in the Score List is 
>exactly the same, containing scores from both paths.
>
>If I change the number of workers in the topology to 2 I see a different 
>outcome. I still see two instances of each tuple arrive at the 
>OutputBolt but this time the entries in the Score List are different and 
>either contain only scores from the distinct paths or scores from both.
>
>My questions are:
>
>1. in the first case (single worker) it appears that the tuples sent 
>down the two different paths are the same object in memory even though 
>two versions of the tuple move through the topology - is this correct?
>
>2. in the second case (two workers) I am guessing that when tuples move 
>between workers and are updated then they are indeed different objects 
>(they are after all in different JVMs as I understand). Is this also 
>correct?
>
>Any insight would be appreciated.
>
>Regards
>M
>

