Sounds like you are setting the tick tuple frequency globally for the whole topology. You could switch to bolt-level configuration instead, setting the value for each bolt as you like. That would be a small amount of new code per bolt.
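For reference, a minimal sketch of what that looks like. In storm-core the per-component hook is getComponentConfiguration() on the bolt, and the relevant key is Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS (the string "topology.tick.tuple.freq.secs"). The class name and periods below are illustrative; this is stdlib-only so the key is spelled out rather than imported:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of per-bolt tick configuration. In a real topology this method is
// an @Override of getComponentConfiguration() on your bolt (e.g. a
// BaseRichBolt subclass); Storm merges its result into that bolt's config.
public class AggregatingBoltConfig {
    // Storm's tick-frequency key; storm-core exposes it as
    // Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS.
    static final String TICK_FREQ_KEY = "topology.tick.tuple.freq.secs";

    public static Map<String, Object> getComponentConfiguration(int tickSeconds) {
        Map<String, Object> conf = new HashMap<>();
        conf.put(TICK_FREQ_KEY, tickSeconds);  // this bolt alone ticks every tickSeconds
        return conf;
    }

    public static void main(String[] args) {
        // Each bolt class can return a different value, so the topology-wide
        // setting is no longer needed.
        System.out.println(getComponentConfiguration(60));
    }
}
```

Giving different bolts slightly different periods (say 57, 60, 63 seconds) also keeps their ticks from staying aligned for long.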
On 1/11/16, 9:59 AM, "Steve Miller" <[email protected]> wrote:
>Hi. In the project I'm working on, we have a lot of code that basically:
>
> * consumes normal tuples as they come in, building up some sort of
>aggregated representation of what was in those tuples
> * then when a tick tuple comes in, it publishes the whole set of data
>(e.g., it sends the aggregates to some other bolt for processing, or
>publishes to Kafka or Cassandra, whatever)
>
>Of course, given the most straightforward implementation of that, given
>that the bolts typically start at more or less the same time, the tick
>tuples all get delivered at the same time. So it's really easy to end up
>in a circumstance where some downstream consumer spends 59 seconds out of
>60 doing nothing, then gets completely pounded on for a second, then
>spends the next 59 seconds doing nothing.
>
>In our use cases, generally we want to do things like aggregate data for
>60 seconds, but the aggregates don't all need to line up.
>
>I keep thinking that if there was a way to tell Storm that we want a tick
>tuple every 60 seconds, but delay for a random number of seconds between
>0 and 60 before you send the first one, that'd just fix this right up.
>But I don't see an obvious way to do that.
>
>Clearly there are ways in which we can take care of this in our code,
>they just involve more code. (-:
>
>It seems like this would be a common use case. Are there better
>approaches? Is there some trick that would make it possible to smear the
>tick tuples out over time? If you're in this situation, how do you
>handle it?
>
>I'd love to be missing something easy and obvious.
>
>Thanks!
>
>        -Steve
>
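One way to get the random-offset behavior the original post asks for, without any new Storm feature, is to configure cheap 1-second ticks and have each bolt instance flush only when its own randomly chosen offset within the window comes up. A stdlib-only sketch of that timing logic, with made-up names (in a real bolt, onTick() would be called from execute() whenever the incoming tuple is a tick tuple):

```java
import java.util.Random;

// Sketch of "smeared" periodic flushes: ticks arrive every second, and each
// bolt instance publishes only at its own random offset within the period.
// Instances started at the same time thus flush at different wall-clock
// seconds. Names here are illustrative, not Storm API.
public class SmearedFlush {
    private final int periodSeconds;
    private final int offsetSeconds;   // fixed per instance, random in [0, period)
    private int elapsedSeconds = 0;

    public SmearedFlush(int periodSeconds, Random rng) {
        this.periodSeconds = periodSeconds;
        this.offsetSeconds = rng.nextInt(periodSeconds);
    }

    // Call once per 1-second tick tuple; returns true when this instance
    // should publish its aggregates.
    public boolean onTick() {
        boolean flush = (elapsedSeconds % periodSeconds) == offsetSeconds;
        elapsedSeconds++;
        return flush;
    }

    public static void main(String[] args) {
        SmearedFlush bolt = new SmearedFlush(60, new Random());
        int flushes = 0;
        for (int s = 0; s < 180; s++) {      // simulate 3 minutes of ticks
            if (bolt.onTick()) flushes++;
        }
        System.out.println(flushes);         // one flush per 60-second window
    }
}
```

The cost is a tick tuple per second instead of per minute, which is usually negligible; the benefit is that downstream consumers see a roughly uniform trickle of flushes rather than a synchronized burst.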
