Hi Guys,

I am new to Storm and am looking at using it on a project for processing NetFlow data. During my initial experimentation with topologies I have encountered some behaviour that I am unsure of.


My topology is as follows:

      Spout
        |
   SplitterBolt
    |        |
  BoltA    BoltB
    |        |
    |      BoltC
    |        |
    |      BoltD
    |        |
    OutputBolt

My tuples consist of two Java classes: a parent object and a collection of child objects, which are held in a java.util.List attribute of the parent called Scores.

The SplitterBolt sends each tuple down both paths. Each of the bolts A-D tests some attribute of the parent object and then adds an entry to the Scores list to record the outcome of the test. This is not my final design, but it does reflect the route I will be following, and I will be adding more paths as I proceed.

When I run the above topology with a single worker, I see two instances of each tuple arrive at the OutputBolt a few milliseconds apart. In each case the Scores list holds exactly the same values, containing scores from both paths.
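To show what I suspect is happening in the single-worker case, here is a minimal plain-Java sketch. It does not use Storm at all, and FlowRecord, SharedReferenceDemo and runBothPaths are hypothetical names made up for this illustration. It only assumes that both paths end up holding a reference to the same parent object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for my parent object; the real class has more fields.
class FlowRecord {
    final List<String> scores = new ArrayList<>();
}

class SharedReferenceDemo {
    // Simulates both paths receiving the same payload inside one JVM:
    // no copy is made, only the reference is handed on.
    static List<FlowRecord> runBothPaths(FlowRecord record) {
        FlowRecord viaA = record; // path through BoltA
        FlowRecord viaB = record; // path through BoltB, BoltC, BoltD

        viaA.scores.add("A");
        viaB.scores.add("B");
        viaB.scores.add("C");
        viaB.scores.add("D");

        // What the OutputBolt would see: two arrivals.
        List<FlowRecord> arrivals = new ArrayList<>();
        arrivals.add(viaA);
        arrivals.add(viaB);
        return arrivals;
    }

    public static void main(String[] args) {
        List<FlowRecord> arrivals = runBothPaths(new FlowRecord());
        System.out.println(arrivals.get(0) == arrivals.get(1)); // prints "true"
        System.out.println(arrivals.get(0).scores);             // prints "[A, B, C, D]"
        System.out.println(arrivals.get(1).scores);             // prints "[A, B, C, D]"
    }
}
```

If this is what happens inside a single worker, it would explain why both arrivals carry the scores from both paths.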

If I change the number of workers in the topology to 2, I see a different outcome. Two instances of each tuple still arrive at the OutputBolt, but this time the entries in the Scores list differ: an instance contains either the scores from only one of the paths or the scores from both.
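For comparison, here is a similar plain-Java sketch of what I imagine happens when a tuple crosses from one worker to another. Java serialization is used purely as a stand-in for whatever Storm actually does on the wire, and ScoredRecord, SerializedCopyDemo and roundTrip are hypothetical names made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for my parent object.
class ScoredRecord implements Serializable {
    private static final long serialVersionUID = 1L;
    final List<String> scores = new ArrayList<>();
}

class SerializedCopyDemo {
    // Writes the record to bytes and reads it back, as a stand-in for
    // a tuple being shipped to a bolt running in another JVM.
    static ScoredRecord roundTrip(ScoredRecord record) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(record);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (ScoredRecord) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        ScoredRecord local = new ScoredRecord();
        ScoredRecord remote = roundTrip(local); // copy seen by the "other worker"

        local.scores.add("A");  // update made on one path
        remote.scores.add("B"); // update made on the other path

        System.out.println(local == remote); // prints "false"
        System.out.println(local.scores);    // prints "[A]"
        System.out.println(remote.scores);   // prints "[B]"
    }
}
```

In this sketch the two copies diverge as soon as one of them has been serialized, which looks like the mixed results I get with two workers.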

My questions are:

1. In the first case (single worker), it appears that the tuples sent down the two different paths refer to the same object in memory, even though two versions of the tuple move through the topology. Is this correct?

2. In the second case (two workers), I am guessing that when tuples move between workers and are then updated, they are indeed different objects (they are, after all, in different JVMs as I understand it). Is this also correct?

Any insight would be appreciated.

Regards
M
