Hi Adam,

The use case you are describing will result in a lot of network transfer, which 
could adversely affect performance/throughput.

I would suggest taking a look at Storm's micro-batching/transactional API 
(a.k.a. Trident).

With Trident, your topology will get optimized to minimize network transfer. It 
will (behind the scenes) pipeline stream operations such that they run on a 
single node, and only resort to network data transfer when necessary. This is 
one reason why Trident is roughly 2x faster in terms of throughput.
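
As a rough illustration of that pipelining idea, here is a plain-Java sketch 
(it does not use the Storm or Trident API; PipelineSketch, Stage and runBatch 
are hypothetical names made up for this example). Each stage returns only the 
one new value it computes, and the surrounding loop appends it to the tuple, 
so all stages run back to back in one process with no transfer in between. 
In Trident, the analogous construct is a function applied with each(), whose 
output fields are appended to the input tuple.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Plain-Java model only; none of these names come from Storm or Trident.
public class PipelineSketch {

    /** A stage reads the tuple built so far and returns ONLY its new value. */
    public interface Stage extends Function<List<Object>, Object> {}

    /**
     * Runs every stage over every event of a batch inside one process.
     * The loop, not the stage, carries the earlier fields forward.
     */
    public static List<List<Object>> runBatch(List<Object> events, List<Stage> stages) {
        List<List<Object>> out = new ArrayList<>();
        for (Object event : events) {
            List<Object> tuple = new ArrayList<>();
            tuple.add(event);                    // field 1: the raw event
            for (Stage stage : stages) {
                tuple.add(stage.apply(tuple));   // append the stage's single new field
            }
            out.add(tuple);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Stage> stages = Arrays.asList(
            t -> ((String) t.get(0)).length(),       // adds field 2
            t -> ((String) t.get(0)).toUpperCase()   // adds field 3
        );
        // prints [[click, 5, CLICK], [view, 4, VIEW]]
        System.out.println(runBatch(Arrays.asList("click", "view"), stages));
    }
}
```

Because no stage re-emits or re-validates the pass-through fields, adding an 
eighth stage does not require touching the first seven.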

The cost is latency, but it's entirely tunable. Sub-second (< 250 ms) latency 
is easy with Trident. The higher your latency tolerance, the higher the 
throughput you can achieve (to a point). 
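
The "to a point" part can be seen in a toy cost model, sketched below in plain 
Java. It assumes each batch costs a fixed overhead plus a per-tuple cost; the 
class name, the method names and both cost figures (10 ms and 0.01 ms) are 
hypothetical values chosen for illustration, not measurements of Storm or 
Trident.

```java
// Toy model only: batch time = fixed overhead + batchSize * per-tuple cost.
public class BatchTradeoffSketch {

    /** Milliseconds to process one batch; a stand-in for added latency. */
    public static double batchMillis(int batchSize, double fixedMs, double perTupleMs) {
        return fixedMs + batchSize * perTupleMs;
    }

    /** Tuples per second achieved at a given batch size under the same model. */
    public static double throughputPerSec(int batchSize, double fixedMs, double perTupleMs) {
        return batchSize / batchMillis(batchSize, fixedMs, perTupleMs) * 1000.0;
    }

    public static void main(String[] args) {
        double fixedMs = 10.0;     // assumed per-batch overhead
        double perTupleMs = 0.01;  // assumed per-tuple processing cost
        for (int size : new int[] {100, 1_000, 10_000, 100_000}) {
            System.out.printf("batch=%d  latency~%.0f ms  throughput~%.0f tuples/s%n",
                size,
                batchMillis(size, fixedMs, perTupleMs),
                throughputPerSec(size, fixedMs, perTupleMs));
        }
    }
}
```

Under these assumed numbers, larger batches amortize the fixed overhead, so 
throughput rises with batch size but flattens out just below 100,000 tuples/s 
(the limit set by the per-tuple cost), while batch time keeps growing.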

This is true of most, if not all, streaming frameworks. The difference is that 
Storm, being a pure streaming framework (i.e. not limited to a batch paradigm 
like Spark Streaming), allows you to choose the balance between 
throughput/latency that best fits your use case.

As it stands, the current Trident documentation has proven difficult for many 
people to grok, but I hope to change that in the near future. And as always, 
contributions are more than welcome.

-Taylor


> On Apr 28, 2015, at 12:27 PM, Adam Mitchell <[email protected]> 
> wrote:
> 
> I've got a topology that starts with one single-field input tuple from a 
> spout, then chains a bunch of bolts together.  
> 
> * The spout emits a piece of event data,
> * The first bolt, in declareOutputFields(), declares that it will emit two 
> fields - the input to the bolt plus one new field,
> * The next bolt does the same thing - takes two input fields and adds a third.
> 
> Eventually I've got a bolt that declares it will emit 8 fields, though the 
> first 7 are just pass-throughs from the previous bolt.
> 
> Everything works fine, but since I'm declaring that I will emit 8 fields, I'm 
> adding checks during processBolt() to validate the first 7 fields before 
> doing the work and adding the 8th field that my bolt is meant to do.
> 
> It doesn't feel like a good pattern.
> 
> Is this chaining of one bolt to another the right way to go?  Or should bolts 
> only emit the new fields that they generate?
> 
> (trying to attach an image to this question to illustrate the chaining)
> 
> 
> 
> <storm_topo.png>
