Re: Flush aggregated data every X seconds

Raphael Hsieh Thu, 24 Apr 2014 20:54:08 -0700

Thank you very much for your quick reply Corey,
Unfortunately I don't believe the TickSpout exists within Trident yet. I
have seen the threads discussing the implementation of a sliding window and
I've read Michael Noll's
blog<http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/>about
it as well. I don't need a sliding window, as much as just multiple
window chunks if that makes sense haha.


What I'm thinking about resorting to is increasing my Batch size to be much
larger than the throughput of the spout, then at the end of my topology,
doing an aggregation such that everything aggregates to a single tuple, and
running a ".each" on that single tuple with the function just sleeping for
X time.

My theory is that this should allow the stream to back up enough such that
the next batch takes (roughly) the entire next X time amount of data.

Can anyone validate that this technique will work ?


On Thu, Apr 24, 2014 at 8:36 PM, Corey Nolet <[email protected]> wrote:

> Raphael, in your case it sounds like a "TickSpout" could be useful where
> you emit a tuple every n time slices and then sleep until needing to emit
> another. I'm not sure how that'd work in a Trident aggregator, however.
>
> I'm not sure if this is something Nathan or the community would approve
> of, but I've been writing my own framework for doing sliding/tumbling
> windows in Storm that allow aggregations and triggering/eviction by count,
> time, and other policies like "when the time difference between the first
> item and the last item in a window is less than x". The bolts could easily
> be ripped out for doing your own aggregations.
>
> It's located here: https://github.com/calrissian/flowbox
>
> It's very much so in the proof of concept stage. My other requirement (and
> the reason I cared so much to implement this) was that the rules need to be
> dynamic and the topology needs to be static as to make the best use of
> resources while users are defining that they need.
>
>
>
> On Thu, Apr 24, 2014 at 11:27 PM, Raphael Hsieh <[email protected]>wrote:
>
>> Is there a way in Storm Trident to aggregate data over a certain time
>> period and have it flush the data out to an external data store after that
>> time period is up ?
>>
>> Trident does not have the functionality of Tick Tuples yet, so I cannot
>> use that. Everything I've been researching leads to believe that this is
>> not possible in Storm/Trident, however this seems to me to be a fairly
>> standard use case of any streaming map reduce library.
>>
>> For example,
>> If I am receiving a stream of integers
>> I want to aggregate all those integers over a period of 1 second, then
>> persist it into an external datastore.
>>
>> This is not in order to count how much it will add up to over X amount of
>> time, rather I would like to minimize the read/write/updates I do to said
>> datastore.
>>
>> There are many ways in order to reduce these variables, however all of
>> them force me to modify my schema in ways that are unpleasant. Also, I
>> would rather not have my final external datastore be my scratch space,
>> where my program is reading/updating/writing and checking to make sure that
>> the transaction id's line up.
>> Instead I want that scratch work to be done separately, then the final
>> result stored into a final database that no longer needs to do constant
>> updating.
>>
>> Thanks
>> --
>> Raphael Hsieh
>>
>>
>>
>>
>
>


-- 
Raphael Hsieh

Re: Flush aggregated data every X seconds

Reply via email to