1. "What do people generally do once they have the data in Spark to enable real-time analytics? Do you store it in some persistent storage and analyze it within some window (let's say the last five minutes) after enough has been aggregated or...?" >>> It depends on your application. If you have a dashboarding/alerting application, you would push the aggregated results to a UI or message queue. However, if you want these results to be available for later queries, they would need to be persisted in some storage such as HBase.
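The "aggregate per window, then persist for later queries" pattern above can be sketched in plain Python. The five-minute bucketing, the `aggregate_and_persist` helper, and the dict standing in for the sink are all illustrative assumptions; in a real pipeline the aggregation would run inside Spark Streaming and the sink would be something like HBase or a message queue.

```python
# Illustrative sketch (not Spark code): bucket events into fixed five-minute
# windows, count per (window, event_type), and write each result to a sink.
from collections import Counter

WINDOW_SECONDS = 300  # "the last five minutes"

def window_key(timestamp):
    """Bucket a Unix timestamp into the start of its five-minute window."""
    return timestamp - (timestamp % WINDOW_SECONDS)

def aggregate_and_persist(events, store):
    """Count events per (window_start, event_type) and write each to `store`.

    `events` is an iterable of (timestamp, event_type) pairs; `store` is any
    dict-like sink keyed by (window_start, event_type) -- a stand-in for an
    HBase put or a message-queue publish.
    """
    counts = Counter((window_key(ts), etype) for ts, etype in events)
    for key, count in counts.items():
        store[key] = count
    return counts
```

For example, events at timestamps 0, 10, and 310 land in two separate five-minute windows, so two rows reach the store.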
2. "If I want to count the number of occurrences of an event within a given time frame within a streaming context - does Spark support this and how?" >>> Spark Streaming supports windowing as well as counting (e.g. countByWindow).

On Thu, Jan 9, 2014 at 11:07 AM, Ognen Duzlevski <[email protected]> wrote:

> Hello,
>
> I am new to Spark and have a few questions that are fairly general in
> nature:
>
> I am trying to set up a real-time data analysis pipeline where I have
> clients sending events to a collection point (load balanced) and onward the
> "collectors" send the data to a Spark cluster via zeromq pub/sub (just an
> experiment).
>
> What do people generally do once they have the data in Spark to enable
> real-time analytics? Do you store it in some persistent storage and analyze
> it within some window (let's say the last five minutes) after enough has
> been aggregated or...?
>
> If I want to count the number of occurrences of an event within a given
> time frame within a streaming context - does Spark support this and how?
>
> General guidelines are OK, and any experiences, knowledge and advice are
> greatly appreciated!
>
> Thanks,
> Ognen
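The windowed counting mentioned in answer 2 (what Spark Streaming exposes as countByWindow, or reduceByKeyAndWindow for per-key counts) can be sketched in plain Python. The function below is an illustrative stand-in, not Spark's implementation: it re-counts every event whose timestamp falls inside the last `window` seconds, re-evaluated every `slide` seconds.

```python
# Illustrative sketch of sliding-window counting semantics (countByWindow
# in Spark Streaming evaluates this incrementally over a DStream).
def sliding_window_counts(timestamps, window, slide, end):
    """Yield (window_end, count) pairs, one per slide interval up to `end`.

    Each count covers events in the half-open interval
    (window_end - window, window_end].
    """
    t = slide
    while t <= end:
        count = sum(1 for ts in timestamps if t - window < ts <= t)
        yield (t, count)
        t += slide
```

With events at timestamps [1, 2, 8, 9], a 5-second window sliding every 5 seconds counts two events per window; widening the window to 10 seconds makes the second evaluation see all four events, which is the effect of overlapping windows.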
