Hi Arun, thanks for the quick feedback.

After reading your response I had another look at the documentation and noticed that I had missed the immutability requirement.

Regards


On 2017/03/26 9:28 PM, Arun Mahadevan wrote:
With a single worker, bolts A & B receive a reference to the same tuple 
since they are running in the same JVM (the splitter bolt emits it once and the threads/tasks 
running A & B get the same instance of the tuple). Now if you mutate any value in 
the tuple (the collection of scores in your case) and emit, both bolts end up 
mutating the same list, which is why you are seeing the results from both paths in 
your output.
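
The effect can be reproduced in plain Java without any Storm classes. The
sketch below is only an illustration under stated assumptions: FlowRecord,
scoreInPlace and run are hypothetical names standing in for the parent object
and the bolts, not Storm APIs or the original topology code.

```java
import java.util.ArrayList;
import java.util.List;

final class SharedReferenceDemo {
    /** Hypothetical parent object carried as the tuple value. */
    static final class FlowRecord {
        final List<String> scores = new ArrayList<>();
    }

    /** Stand-in for a bolt that appends its score to the value it received. */
    static FlowRecord scoreInPlace(FlowRecord record, String score) {
        record.scores.add(score); // mutates the shared instance
        return record;
    }

    /** Simulates one emit being delivered to two bolts in the same JVM. */
    static List<List<String>> run() {
        FlowRecord emitted = new FlowRecord();            // emitted once
        FlowRecord viaA = scoreInPlace(emitted, "A");     // path A
        FlowRecord viaB = scoreInPlace(emitted, "B");     // path B, same object
        List<List<String>> seen = new ArrayList<>();
        seen.add(viaA.scores);
        seen.add(viaB.scores);
        return seen;
    }

    public static void main(String[] args) {
        for (List<String> scores : run()) {
            System.out.println(scores); // prints [A, B] both times
        }
    }
}
```

Both "paths" hold the same list instance, so each sees the other's score.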

You should avoid mutating the values in the tuples. Always treat the tuple 
values as immutable (e.g. create a new collection with the updated scores) 
or, better, make your collections immutable.
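
A minimal sketch of the copy-on-update approach, again in plain Java. The
names CopyOnUpdateDemo and withScore are hypothetical helpers for
illustration; how the new list is attached to the emitted values depends on
your own bolt code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class CopyOnUpdateDemo {
    /**
     * Returns a new read-only list containing the received scores plus one
     * more. The list that arrived in the tuple is never modified.
     */
    static List<String> withScore(List<String> received, String score) {
        List<String> copy = new ArrayList<>(received); // copy, do not alias
        copy.add(score);
        return Collections.unmodifiableList(copy);     // guard against later mutation
    }

    public static void main(String[] args) {
        List<String> fromSplitter = withScore(new ArrayList<>(), "split");
        List<String> pathA = withScore(fromSplitter, "A");
        List<String> pathB = withScore(fromSplitter, "B");
        System.out.println(fromSplitter); // prints [split]
        System.out.println(pathA);        // prints [split, A]
        System.out.println(pathB);        // prints [split, B]
    }
}
```

Each path now emits its own list, so the result no longer depends on whether
the bolts share a JVM. Wrapping the copy with Collections.unmodifiableList
makes an accidental add() downstream fail with UnsupportedOperationException
instead of silently changing shared state.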

Regards,
Arun

On 3/26/17, 5:37 PM, "Michael Sweeney" <[email protected]> wrote:


Hi Guys,

I am new to Storm and am looking at using it on a project for processing
NetFlow data. During my initial experimentation with topologies I have
encountered some behaviour that I am unsure of.


My topology is as follows

      Spout
        |
   SplitterBolt
    |        |
  BoltA    BoltB
    |        |
    |      BoltC
    |        |
    |      BoltD
    |        |
    OutputBolt

My tuples consist of two Java classes, a parent object and a collection
of child objects contained in a List attribute called Scores in the
parent object (a Java List).

The SplitterBolt sends each Tuple down both paths. Each of the bolts A-D
in the topology tests some attribute in the parent object and then adds
a relevant entry to the list to reflect the outcome of the test. This is
not my final design but it does reflect the route I will be following as
I progress and I will be adding more paths as I proceed.

When I run the above topology with a single worker I note that two
instances of each tuple arrive at the OutputBolt a few milliseconds
apart. In each case the collection of values in the Score List is
exactly the same, containing scores from both paths.

If I change the number of workers in the topology to 2 I see a different
outcome. I still see two instances of each tuple arrive at the
OutputBolt but this time the entries in the Score List differ: they
contain either scores from only one of the paths or scores from both.

My questions are:

1. in the first case (single worker) it appears that the tuples sent
down the two different paths are the same object in memory even though
two versions of the tuple move through the topology - is this correct?

2. in the second case (two workers) I am guessing that when tuples move
between workers and are updated then they are indeed different objects
(they are after all in different JVMs as I understand). Is this also
correct?

Any insight would be appreciated.

Regards
M




--
Michael Sweeney
Verify Dynamics (Pty) Ltd
www.verifydynamics.com
[email protected]
mobile: +27 83 300-1898
office: +27 10 593-4448
