Hi.  In the project I'm working on, we have a lot of code that basically:

    * consumes normal tuples as they come in, building up some sort of
      aggregated representation of what was in those tuples
    * then, when a tick tuple comes in, publishes the whole set of data
      (e.g., it sends the aggregates to some other bolt for processing, or
      publishes to Kafka or Cassandra, whatever)

Of course, with the most straightforward implementation of that, since the 
bolts typically start at more or less the same time, the tick tuples all get 
delivered at the same time.  So it's really easy to end up in a situation 
where some downstream consumer spends 59 seconds out of 60 doing nothing, 
then gets completely pounded on for a second, then spends the next 59 
seconds doing nothing.

In our use cases, generally we want to do things like aggregate data for 60 
seconds, but the aggregates don't all need to line up.

I keep thinking that if there were a way to tell Storm that we want a tick 
tuple every 60 seconds, but to delay for a random number of seconds between 0 
and 60 before sending the first one, that'd just fix this right up.  But I 
don't see an obvious way to do that.

Clearly there are ways in which we can take care of this in our own code; 
they just involve more code. (-:
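
To make that concrete, here is roughly the kind of workaround I have in mind, 
as a minimal sketch rather than anything we actually run.  The idea: ask Storm 
for a tick tuple every second instead of every 60 (by setting 
Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS to 1 in the bolt's 
getComponentConfiguration()), have each bolt task pick its own random offset 
between 0 and 59, and only publish on the ticks that line up with that offset.  
The class and method names below (JitteredFlushSchedule, onTick, 
getOffsetTicks) are made up for illustration and are not part of Storm; the 
class is plain Java so it can be tried without a cluster.

```java
import java.util.concurrent.ThreadLocalRandom;

/**
 * Hypothetical helper (not a Storm API): given one tick per second, decides
 * which ticks should trigger a publish so that each bolt task flushes once
 * per period, but at its own randomly chosen offset within that period.
 */
final class JitteredFlushSchedule {
    private final int periodTicks;  // e.g. 60 for "flush every 60 ticks"
    private final int offsetTicks;  // this task's offset, 0 <= offset < period
    private long ticksSeen = 0;

    /** Picks a random offset, so tasks started together spread out. */
    JitteredFlushSchedule(int periodTicks) {
        this(periodTicks, randomOffset(periodTicks));
    }

    /** Explicit offset, mainly useful for testing. */
    JitteredFlushSchedule(int periodTicks, int offsetTicks) {
        if (periodTicks <= 0 || offsetTicks < 0 || offsetTicks >= periodTicks) {
            throw new IllegalArgumentException("need 0 <= offset < period");
        }
        this.periodTicks = periodTicks;
        this.offsetTicks = offsetTicks;
    }

    private static int randomOffset(int periodTicks) {
        if (periodTicks <= 0) {
            throw new IllegalArgumentException("period must be positive");
        }
        return ThreadLocalRandom.current().nextInt(periodTicks);
    }

    /** Call once per tick tuple; returns true when this task should publish. */
    boolean onTick() {
        boolean flush = (ticksSeen % periodTicks) == offsetTicks;
        ticksSeen++;
        return flush;
    }

    int getOffsetTicks() {
        return offsetTicks;
    }
}
```

In a bolt, the assumption is that you'd create one of these in prepare(), call 
onTick() whenever execute() sees a tick tuple, and publish the aggregates only 
when it returns true.  The cost is 60 times as many tick tuples, almost all of 
which are ignored, which is part of why I'd rather Storm did this for us.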

It seems like this would be a common use case.  Are there better approaches?  
Is there some trick that would make it possible to smear the tick tuples out 
over time?  If you're in this situation, how do you handle it?

I'd love to be missing something easy and obvious.

Thanks!

        -Steve
