Tuple processing

Michael Sweeney Sun, 26 Mar 2017 05:07:48 -0700


Hi Guys,

I am new to Storm and am looking at using it on a project for processing NetFlow data. During my initial experimentation with topologies I have encountered some behaviour that I am unsure of.



My topology is as follows:

      Spout
        |
   SplitterBolt
    |        |
  BoltA    BoltB
    |        |
    |      BoltC
    |        |
    |      BoltD
    |        |
    OutputBolt

My tuples are built from two Java classes: a parent object, and a collection of child objects held in a List attribute of the parent called Scores.

The SplitterBolt sends each Tuple down both paths. Each of the bolts A-D in the topology tests some attribute in the parent object and then adds a relevant entry to the list to reflect the outcome of the test. This is not my final design but it does reflect the route I will be following as I progress and I will be adding more paths as I proceed.

When I run the above topology with a single worker I note that two instances of each tuple arrive at the OutputBolt a few milliseconds apart. In each case the collection of values in the Scores List is exactly the same, containing scores from both paths.

If I change the number of workers in the topology to 2 I see a different outcome. I still see two instances of each tuple arrive at the OutputBolt, but this time the entries in the Scores List differ between the two instances: a list contains either only the scores from its own path or the scores from both paths.


My questions are:

1. In the first case (single worker) it appears that the tuples sent down the two different paths are the same object in memory, even though two versions of the tuple move through the topology. Is this correct?

2. In the second case (two workers) I am guessing that when tuples move between workers and are updated they are indeed different objects (they are, after all, in different JVMs as I understand it). Is this also correct?
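
To make the two guesses concrete, here is a minimal plain-Java sketch of what I suspect is happening. It does not use any Storm classes: FlowRecord and TupleSharingSketch are hypothetical names made up for this example, and the round trip through Java serialization is only my stand-in for whatever Storm actually does when a tuple crosses between workers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for my parent object; not a Storm class.
class FlowRecord implements Serializable {
    private static final long serialVersionUID = 1L;
    final List<String> scores = new ArrayList<>();
}

class TupleSharingSketch {

    // Assumption: a hop between JVMs behaves like a serialize/deserialize
    // round trip, so the receiver gets an independent copy.
    static FlowRecord copyViaSerialization(FlowRecord original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (FlowRecord) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("copy failed", e);
        }
    }

    // Adds one score per bolt on each path, mirroring the topology above.
    static void runBothPaths(FlowRecord pathA, FlowRecord pathB) {
        pathA.scores.add("A"); // BoltA
        pathB.scores.add("B"); // BoltB
        pathB.scores.add("C"); // BoltC
        pathB.scores.add("D"); // BoltD
    }

    // Guess 1: in one JVM both paths hold a reference to the same object.
    static FlowRecord[] sameJvm() {
        FlowRecord record = new FlowRecord();
        runBothPaths(record, record);
        return new FlowRecord[] { record, record };
    }

    // Guess 2: across JVMs each path works on its own copy.
    static FlowRecord[] separateJvms() {
        FlowRecord record = new FlowRecord();
        FlowRecord pathA = copyViaSerialization(record);
        FlowRecord pathB = copyViaSerialization(record);
        runBothPaths(pathA, pathB);
        return new FlowRecord[] { pathA, pathB };
    }

    public static void main(String[] args) {
        FlowRecord[] same = sameJvm();
        System.out.println(same[0].scores + " " + same[1].scores);
        // prints: [A, B, C, D] [A, B, C, D]

        FlowRecord[] split = separateJvms();
        System.out.println(split[0].scores + " " + split[1].scores);
        // prints: [A] [B, C, D]
    }
}
```

The first line of output matches what I see with a single worker. The second shows only the fully separated case; with two workers I also sometimes see scores from both paths, which I assume happens when some of the bolts end up in the same worker.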


Any insight would be appreciated.

Regards
M
