Thank you Sanket for the feedback.
Regards
SM

> On 19-Nov-2015, at 1:57 PM, Sanket Patil <[email protected]> wrote:
>
> Hey Sandip:
>
> TD has already outlined the right approach, but let me add a couple of
> thoughts as I recently worked on a similar project. I had to compute some
> real-time metrics on streaming data. Also, these metrics had to be aggregated
> for hour/day/week/month. My data pipeline was Kafka --> Spark Streaming -->
> Cassandra.
>
> I had a Spark Streaming job that did the following: (1) receive a window of
> raw streaming data and write it to Cassandra, and (2) do only the basic
> computations that need to be shown on a real-time dashboard, and store the
> results in Cassandra. (I had to use a sliding window as my computation
> involved joining data that might occur in different time windows.)
>
> I had a separate set of Spark jobs that pulled the raw data from Cassandra,
> computed the aggregations and more complex metrics, and wrote them back to
> the relevant Cassandra tables. These jobs ran periodically every few minutes.
>
> Regards,
> Sanket
>
> On Thu, Nov 19, 2015 at 8:09 AM, Sandip Mehta <[email protected]> wrote:
> Thank you TD for your time and help.
>
> SM
>> On 19-Nov-2015, at 6:58 AM, Tathagata Das <[email protected]> wrote:
>>
>> There are different ways to do the rollups. Either update rollups from the
>> streaming application, or generate rollups in a later process, say a
>> periodic Spark job every hour. Or you could just generate rollups on demand,
>> when they are queried.
>> The whole thing depends on your downstream requirements: if you always need
>> up-to-date rollups to show up in the dashboard (even day-level stuff), then
>> the first approach is better. Otherwise, the second and third approaches are
>> more efficient.
>>
>> TD
>>
>>
>> On Wed, Nov 18, 2015 at 7:15 AM, Sandip Mehta <[email protected]> wrote:
>> TD thank you for your reply.
>>
>> I agree on the data store requirement. I am using HBase as the underlying
>> store.
>>
>> So for every batch interval of, say, 10 seconds:
>>
>> - Calculate the time dimension (minute, hour, day, week, month and
>>   quarter) along with other dimensions and metrics
>> - Update the relevant base table at each batch interval for the relevant
>>   metrics for a given set of dimensions.
>>
>> The only caveat I see is that I’ll have to update each of the different
>> rollup tables for each batch window.
>>
>> Is this a valid approach for calculating time series aggregation?
>>
>> Regards
>> SM
>>
>> For minute-level aggregates I have set up a streaming window of, say, 10
>> seconds and am storing minute-level aggregates across multiple dimensions
>> in HBase at every window interval.
>>
>>> On 18-Nov-2015, at 7:45 AM, Tathagata Das <[email protected]> wrote:
>>>
>>> For this sort of long-term aggregation you should use a dedicated data
>>> storage system, like a database or a key-value store. Spark Streaming
>>> would just aggregate and push the necessary data to the data store.
>>>
>>> TD
>>>
>>> On Sat, Nov 14, 2015 at 9:32 PM, Sandip Mehta <[email protected]> wrote:
>>> Hi,
>>>
>>> I am working on a requirement to calculate real-time metrics and am
>>> building a prototype on Spark Streaming. I need to build aggregates at the
>>> second, minute, hour and day level.
>>>
>>> I am not sure whether I should calculate all these aggregates as different
>>> windowed functions on the input DStream or use the updateStateByKey
>>> function for the same. If I have to use updateStateByKey for these time
>>> series aggregations, how can I remove keys from the state after their
>>> respective time periods have elapsed?
>>>
>>> Please suggest.
>>>
>>> Regards
>>> SM

> --
> SuperReceptionist is now available on Android mobiles. Track your business on
> the go with call analytics, recordings, insights and more: Download the app
> here <https://play.google.com/store/apps/details?id=com.superreceptionist>
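
For anyone reading this thread later, the per-batch rollup Sandip describes (derive the time dimensions for each event, then increment every rollup table) can be sketched in a few lines. This is only an illustration in plain Python, not code from anyone on the thread: the names `time_buckets` and `rollup_batch` are hypothetical, a `defaultdict` stands in for the HBase or Cassandra counter tables, timestamps are assumed to be UTC, and weeks are assumed to start on Monday.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def time_buckets(ts):
    """Return the start of each rollup bucket that contains ts."""
    minute = ts.replace(second=0, microsecond=0)
    hour = minute.replace(minute=0)
    day = hour.replace(hour=0)
    # Assumption: weeks start on Monday (weekday() == 0).
    week = day - timedelta(days=day.weekday())
    month = day.replace(day=1)
    # Quarters start in months 1, 4, 7 and 10.
    quarter = month.replace(month=3 * ((month.month - 1) // 3) + 1)
    return {
        "minute": minute,
        "hour": hour,
        "day": day,
        "week": week,
        "month": month,
        "quarter": quarter,
    }


def rollup_batch(events, store):
    """Fold one micro-batch of (timestamp, dimension, value) into store.

    store maps (level, bucket_start, dimension) to a running sum; in a
    real pipeline each update would be an increment against the data store.
    """
    for ts, dimension, value in events:
        for level, start in time_buckets(ts).items():
            store[(level, start, dimension)] += value
    return store


if __name__ == "__main__":
    batch = [
        (datetime(2015, 11, 18, 7, 15, 3, tzinfo=timezone.utc), "page_a", 1.0),
        (datetime(2015, 11, 18, 7, 15, 9, tzinfo=timezone.utc), "page_a", 2.0),
    ]
    totals = rollup_batch(batch, defaultdict(float))
    for key in sorted(totals, key=str):
        print(key, totals[key])
```

Every event touches one row per rollup level, which is the write amplification Sandip calls out as the caveat. The periodic-job approach TD and Sanket describe trades that for some staleness by writing only the finest level from the streaming job and deriving the rest later.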
