We have a similar need where we want (likely need) to stagger writes to a
NoSQL database that are triggered by tick tuples. Not an immediate need, but
gonna see what's out there and maybe write it myself.
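For what it's worth, the smearing logic itself is small: each bolt instance picks a random offset within the interval and flushes on its own schedule, driven by a frequent (say, 1-second) tick tuple rather than a 60-second one. A rough, untested sketch — the class and method names below are made up, not from Storm or any existing library:

```java
// Sketch of per-bolt smearing: drive every bolt with a frequent tick
// (e.g. 1 second), and have each instance publish its aggregates on its
// own randomly-offset 60-second schedule. Names here are hypothetical.
public class SmearedFlush {
    private final long intervalMs;
    private long nextFlushMs;

    // offsetMs is when the first flush happens, relative to start.
    // For a random smear, pass e.g. new java.util.Random().nextInt(60_000).
    public SmearedFlush(long intervalMs, long offsetMs) {
        this.intervalMs = intervalMs;
        this.nextFlushMs = offsetMs;
    }

    // Call on every tick with the elapsed ms since the bolt started;
    // returns true when it's this instance's turn to publish.
    public boolean shouldFlush(long elapsedMs) {
        if (elapsedMs >= nextFlushMs) {
            nextFlushMs += intervalMs;   // schedule the next flush
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // With a 60s interval and a 23s offset, flushes land at 23s, 83s, ...
        SmearedFlush f = new SmearedFlush(60_000, 23_000);
        System.out.println(f.shouldFlush(10_000)); // false
        System.out.println(f.shouldFlush(23_000)); // true
        System.out.println(f.shouldFlush(24_000)); // false
        System.out.println(f.shouldFlush(83_500)); // true
    }
}
```

With 60 bolts each drawing a different offset, the flushes spread roughly evenly over the minute instead of all landing on the same second.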

--John

On Mon, Jan 11, 2016 at 1:07 PM, Aaron.Dossett <[email protected]>
wrote:

> Ah, I see.  I don't believe that is possible at all with tick tuples, but
> perhaps someone else will know.
>
> On 1/11/16, 11:40 AM, "Steve Miller" <[email protected]> wrote:
>
> >Thanks.  It seems like that'd help if we wanted different values per
> >bolt, but we want the same value -- we just want the actual times on the
> >tuples to happen at different times.  That is, if we want stats over a
> >60-second period, we want the tick tuple interval to be 60 seconds
> >everywhere.
> >
> >But let's say we have 60 bolts.  What we have now is that if all the
> >bolts started running at 00:00:23, they all emit/publish at 00:01:23,
> >00:02:23, and so forth.  What I'd like is to have one bolt publish at
> >00:01:23, one at 00:01:24, and so forth up to 00:02:22, or a close
> >approximation.
> >
> >    -Steve
> >
> >On Mon, Jan 11, 2016 at 04:42:03PM +0000, Aaron.Dossett wrote:
> >> Sounds like you are setting the tick tuple value globally for the whole
> >> topology.  You could enable bolt-level configuration and then set the
> >> values of each bolt as you like.  That would be a small amount of new
> >>code
> >> per bolt.
> >>
> >> On 1/11/16, 9:59 AM, "Steve Miller" <[email protected]> wrote:
> >>
> >> >Hi.  In the project I'm working on, we have a lot of code that
> >>basically:
> >> >
> >> >    * consumes normal tuples as they come in, building up some sort of
> >> >aggregated representation of what was in those tuples
> >> >    * then when a tick tuple comes in, it publishes the whole set of
> >>data
> >> >(e.g., it sends the aggregates to some other bolt for processing, or
> >> >publishes to Kafka or Cassandra, whatever)
> >> >
> >> >Of course, with the most straightforward implementation of that, given
> >> >that the bolts typically start at more or less the same time, the tick
> >> >tuples all get delivered at the same time.  So it's really easy to end
> >>up
> >> >in a circumstance where some downstream consumer spends 59 seconds out
> >>of
> >> >60 doing nothing, then gets completely pounded on for a second, then
> >> >spends the next 59 seconds doing nothing.
> >> >
> >> >In our use cases, generally we want to do things like aggregate data
> >>for
> >> >60 seconds, but the aggregates don't all need to line up.
> >> >
> >> >I keep thinking that if there was a way to tell Storm that we want a
> >>tick
> >> >tuple every 60 seconds, but delay for a random number of seconds
> >>between
> >> >0 and 60 before you send the first one, that'd just fix this right up.
> >> >But I don't see an obvious way to do that.
> >> >
> >> >Clearly there are ways in which we can take care of this in our code,
> >> >they just involve more code. (-:
> >> >
> >> >It seems like this would be a common use case.  Are there better
> >> >approaches?  Is there some trick that would make it possible to smear
> >>the
> >> >tick tuples out over time?  If you're in this situation, how do you
> >> >handle it?
> >> >
> >> >I'd love to be missing something easy and obvious.
> >> >
> >> >Thanks!
> >> >
> >> >    -Steve
> >> >
> >>
> >
>
>
