Re: Spreading tick tuples out?

Aaron.Dossett Mon, 11 Jan 2016 10:09:07 -0800

Ah, I see.  I don't believe that is possible at all with tick tuples, but
perhaps someone else will know.
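
For what it's worth, the "take care of this in our code" route from the
original question might look roughly like the sketch below. It assumes the
bolt asks for a tick tuple every second (via the
topology.tick.tuple.freq.secs setting in its component configuration) and
counts ticks itself, publishing only when the count reaches a per-task
offset. The class and method names (StaggeredFlush, onTick, offsetSecs) are
hypothetical and not part of Storm; in a real bolt the task index and task
count would presumably come from the TopologyContext passed to prepare().

```java
/**
 * Hypothetical helper, not part of Storm: decides on which 1-second tick
 * a given bolt task should publish, so that tasks are spread over the
 * aggregation interval instead of all publishing at once.
 */
final class StaggeredFlush {
    private final int intervalSecs; // aggregation window, e.g. 60
    private final int offsetSecs;   // this task's slot within the window
    private long ticksSeen = 0;     // number of 1-second ticks received

    StaggeredFlush(int taskIndex, int numTasks, int intervalSecs) {
        if (numTasks <= 0 || intervalSecs <= 0
                || taskIndex < 0 || taskIndex >= numTasks) {
            throw new IllegalArgumentException("invalid task or interval");
        }
        this.intervalSecs = intervalSecs;
        // Spread the tasks evenly across the interval: with 60 tasks and a
        // 60-second interval, task i gets offset i.
        this.offsetSecs = (int) ((long) taskIndex * intervalSecs / numTasks);
    }

    int offsetSecs() {
        return offsetSecs;
    }

    /**
     * Call once for every 1-second tick tuple. Returns true when this task
     * should publish its aggregates. Only the first window is shorter than
     * intervalSecs; after that each task publishes every intervalSecs ticks.
     */
    boolean onTick() {
        ticksSeen++;
        return ticksSeen % intervalSecs == offsetSecs;
    }
}
```

A random offset in [0, intervalSecs) picked once at startup would work the
same way if the task index is not convenient; the cost either way is that
every bolt now receives 60 tick tuples per minute instead of one.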


On 1/11/16, 11:40 AM, "Steve Miller" <[email protected]> wrote:

>Thanks.  It seems like that'd help if we wanted different values per
>bolt, but we want the same value -- we just want the actual times on the
>tuples to happen at different times.  That is, if we want stats over a
>60-second period, we want the tick tuple interval to be 60 seconds
>everywhere.
>
>But let's say we have 60 bolts.  What we have now is that if all the
>bolts started running at 00:00:23, they all emit/publish at 00:01:23,
>00:02:23, and so forth.  What I'd like is to have one bolt publish at
>00:01:23, one at 00:01:24, and so forth up to 00:02:22, or a close
>approximation.
>
>    -Steve
>
>On Mon, Jan 11, 2016 at 04:42:03PM +0000, Aaron.Dossett wrote:
>> Sounds like you are setting the tick tuple value globally for the whole
>> topology.  You could enable bolt-level configuration and then set the
>> values of each bolt as you like.  That would be a small amount of new
>> code per bolt.
>> 
>> On 1/11/16, 9:59 AM, "Steve Miller" <[email protected]> wrote:
>> 
>> >Hi.  In the project I'm working on, we have a lot of code that
>> >basically:
>> >
>> >    * consumes normal tuples as they come in, building up some sort of
>> >aggregated representation of what was in those tuples
>> >    * then when a tick tuple comes in, it publishes the whole set of
>> >data (e.g., it sends the aggregates to some other bolt for processing,
>> >or publishes to Kafka or Cassandra, whatever)
>> >
>> >Of course, given the most straightforward implementation of that, given
>> >that the bolts typically start at more or less the same time, the tick
>> >tuples all get delivered at the same time.  So it's really easy to end
>> >up in a circumstance where some downstream consumer spends 59 seconds
>> >out of 60 doing nothing, then gets completely pounded on for a second,
>> >then spends the next 59 seconds doing nothing.
>> >
>> >In our use cases, generally we want to do things like aggregate data
>> >for 60 seconds, but the aggregates don't all need to line up.
>> >
>> >I keep thinking that if there was a way to tell Storm that we want a
>> >tick tuple every 60 seconds, but delay for a random number of seconds
>> >between 0 and 60 before you send the first one, that'd just fix this
>> >right up.  But I don't see an obvious way to do that.
>> >
>> >Clearly there are ways in which we can take care of this in our code,
>> >they just involve more code. (-:
>> >
>> >It seems like this would be a common use case.  Are there better
>> >approaches?  Is there some trick that would make it possible to smear
>> >the tick tuples out over time?  If you're in this situation, how do you
>> >handle it?
>> >
>> >I'd love to be missing something easy and obvious.
>> >
>> >Thanks!
>> >
>> >    -Steve
>> >
>> 
>
