We have a similar need, where we want (likely need) to stagger writes to a NoSQL database that are triggered by tick tuples. Not an immediate need, but I'm going to see what's out there and maybe write it myself.
--John

On Mon, Jan 11, 2016 at 1:07 PM, Aaron.Dossett <[email protected]> wrote:

> Ah, I see. I don't believe that is possible at all with tick tuples, but perhaps someone else will know.
>
> On 1/11/16, 11:40 AM, "Steve Miller" <[email protected]> wrote:
>
>> Thanks. It seems like that'd help if we wanted different values per bolt, but we want the same value -- we just want the actual times on the tuples to happen at different times. That is, if we want stats over a 60-second period, we want the tick tuple interval to be 60 seconds everywhere.
>>
>> But let's say we have 60 bolts. What we have now is that if all the bolts started running at 00:00:23, they all emit/publish at 00:01:23, 00:02:23, and so forth. What I'd like is to have one bolt publish at 00:01:23, one at 00:01:24, and so forth up to 00:02:22, or a close approximation.
>>
>> -Steve
>>
>> On Mon, Jan 11, 2016 at 04:42:03PM +0000, Aaron.Dossett wrote:
>>> Sounds like you are setting the tick tuple value globally for the whole topology. You could enable bolt-level configuration and then set the values of each bolt as you like. That would be a small amount of new code per bolt.
>>>
>>> On 1/11/16, 9:59 AM, "Steve Miller" <[email protected]> wrote:
>>>
>>>> Hi. In the project I'm working on, we have a lot of code that basically:
>>>>
>>>> * consumes normal tuples as they come in, building up some sort of aggregated representation of what was in those tuples
>>>> * then when a tick tuple comes in, it publishes the whole set of data (e.g., it sends the aggregates to some other bolt for processing, or publishes to Kafka or Cassandra, whatever)
>>>>
>>>> Of course, given the most straightforward implementation of that, given that the bolts typically start at more or less the same time, the tick tuples all get delivered at the same time.
>>>> So it's really easy to end up in a circumstance where some downstream consumer spends 59 seconds out of 60 doing nothing, then gets completely pounded on for a second, then spends the next 59 seconds doing nothing.
>>>>
>>>> In our use cases, generally we want to do things like aggregate data for 60 seconds, but the aggregates don't all need to line up.
>>>>
>>>> I keep thinking that if there was a way to tell Storm that we want a tick tuple every 60 seconds, but delay for a random number of seconds between 0 and 60 before you send the first one, that'd just fix this right up. But I don't see an obvious way to do that.
>>>>
>>>> Clearly there are ways in which we can take care of this in our code, they just involve more code. (-:
>>>>
>>>> It seems like this would be a common use case. Are there better approaches? Is there some trick that would make it possible to smear the tick tuples out over time? If you're in this situation, how do you handle it?
>>>>
>>>> I'd love to be missing something easy and obvious.
>>>>
>>>> Thanks!
>>>>
>>>> -Steve
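For reference, Aaron's suggestion about bolt-level configuration maps onto Storm's `getComponentConfiguration()` hook: a bolt that overrides it gets its own tick interval instead of the topology-wide one. A minimal sketch of just that override (the string key is what `Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS` resolves to; the surrounding bolt class and any per-bolt interval value are assumed):

```java
// Inside each bolt class: give this bolt its own tick frequency.
// (The Config class lives in backtype.storm in Storm 0.x and
// org.apache.storm in 1.x+.)
@Override
public Map<String, Object> getComponentConfiguration() {
    Map<String, Object> conf = new HashMap<>();
    // Same as conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 60);
    conf.put("topology.tick.tuple.freq.secs", 60);
    return conf;
}
```

As Steve notes, though, this only lets different bolts use different *intervals*; it doesn't offset the phase of bolts that all want the same 60-second interval.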
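As for smearing the ticks themselves: Storm has no initial-delay knob for tick tuples, but one workaround in the spirit of Steve's idea is to request ticks every 1 second and have each bolt instance flush only when the tick count lands on its own randomly chosen second within the 60-second window. A sketch of just the phase logic, independent of Storm (the `TickSmearing` class and its method names are hypothetical, not Storm API; in a real bolt you would call `onTick()` from `execute()` whenever the incoming tuple is a tick tuple):

```java
import java.util.Random;

// Hypothetical helper: decides, once per 1-second tick, whether this
// bolt instance should flush its aggregates. Each instance picks a
// random phase offset in [0, period), so e.g. 60 instances flush at
// roughly uniformly spread seconds instead of all at once.
public class TickSmearing {
    private final int periodSecs;   // aggregation window, e.g. 60
    private final int offsetSecs;   // this instance's random phase
    private long tickCount = 0;

    public TickSmearing(int periodSecs, Random rng) {
        this.periodSecs = periodSecs;
        this.offsetSecs = rng.nextInt(periodSecs);
    }

    // Call once per tick tuple (tick interval = 1 second).
    // Returns true when this instance should publish its window.
    public boolean onTick() {
        boolean flush = (tickCount % periodSecs) == offsetSecs;
        tickCount++;
        return flush;
    }

    public static void main(String[] args) {
        TickSmearing s = new TickSmearing(60, new Random());
        int flushes = 0;
        for (int i = 0; i < 180; i++) {
            if (s.onTick()) flushes++;
        }
        // Whatever the random offset, 180 ticks cover exactly three
        // full 60-second windows, so there are exactly three flushes.
        System.out.println("flushes in 180 ticks: " + flushes);
    }
}
```

The cost of this scheme is 60x more tick tuples per bolt, but tick tuples are cheap relative to a flush, and each instance still publishes exactly once per 60-second window, just at its own phase.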
