Re: [Spark Streaming] Session based windowing like in google dataflow

2015-08-07 Thread Tathagata Das
You can use Spark Streaming's updateStateByKey to do arbitrary
sessionization.
See the example -
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/StatefulNetworkWordCount.scala
All it does is store a single number (the count of each word seen since the
beginning), but you can extend it to store arbitrary state data. You can then
use that state to keep track of gaps, windows, etc.
People have done a lot of sessionization using this. I am sure others can
chime in.
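
To make the idea concrete, here is a minimal sketch of a per-key update
function that tracks session gaps. It is written as a plain Python function
with the (new values, previous state) shape that PySpark's updateStateByKey
expects, so it can be tried without a cluster. The names update_session,
SessionState and GAP_SECONDS, the 30-minute gap, and the choice of state
tuple are all assumptions made for illustration; they are not taken from the
linked example.

```python
from typing import List, Optional, Tuple

# Hypothetical gap: an event arriving more than 30 minutes after the
# previous one starts a new session.
GAP_SECONDS = 30 * 60

# Per-key state (an assumed layout): (session_start, last_event_time, event_count)
SessionState = Tuple[float, float, int]


def update_session(new_timestamps: List[float],
                   state: Optional[SessionState]) -> Optional[SessionState]:
    """Fold this batch's event timestamps for one key into its session state."""
    for ts in sorted(new_timestamps):
        if state is None or ts - state[1] > GAP_SECONDS:
            # No session yet, or the gap was exceeded: start a new session.
            # The previous session is simply overwritten in this sketch.
            state = (ts, ts, 1)
        else:
            # Event falls within the gap: extend the current session.
            start, last, count = state
            state = (start, max(last, ts), count + 1)
    return state
```

On a DStream of (user_id, timestamp) pairs with checkpointing enabled, this
could be plugged in as stream.updateStateByKey(update_session). A real job
would also need to emit a session somewhere before it is overwritten and to
return None for keys that have been idle long enough, which removes their
state; both are left out here.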

On Fri, Aug 7, 2015 at 10:48 AM, Ankur Chauhan wrote:

> Hi all,
>
> I am trying to figure out how to perform the equivalent of "Session windows"
> (as mentioned in https://cloud.google.com/dataflow/model/windowing) using
> Spark Streaming. Is it even possible, i.e. can it be done efficiently at
> scale? Just to expand on the definition:
>
> Taken from the Google Dataflow documentation:
>
> The simplest kind of session windowing specifies a minimum gap duration.
> All data arriving below a minimum threshold of time delay is grouped into
> the same window. If data arrives after the minimum specified gap duration
> time, this initiates the start of a new window.
>
> Any help would be appreciated.
>
> -- Ankur Chauhan
>


[Spark Streaming] Session based windowing like in google dataflow

2015-08-07 Thread Ankur Chauhan
Hi all,

I am trying to figure out how to perform the equivalent of "Session windows" (as
mentioned in https://cloud.google.com/dataflow/model/windowing) using Spark
Streaming. Is it even possible, i.e. can it be done efficiently at scale? Just
to expand on the definition:

Taken from the Google Dataflow documentation:

The simplest kind of session windowing specifies a minimum gap duration. All 
data arriving below a minimum threshold of time delay is grouped into the same 
window. If data arrives after the minimum specified gap duration time, this 
initiates the start of a new window.
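
For illustration, the definition above can be sketched on a plain list of
event timestamps for a single key, with no streaming involved. The function
name split_sessions and the decision to keep an event in the current session
when it arrives exactly at the gap boundary are assumptions; the quoted text
does not say how that boundary case is handled.

```python
from typing import List


def split_sessions(timestamps: List[float], gap: float) -> List[List[float]]:
    """Group timestamps into sessions separated by more than `gap`."""
    sessions: List[List[float]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            # Within the gap of the previous event: same session window.
            sessions[-1].append(ts)
        else:
            # First event, or the gap was exceeded: open a new window.
            sessions.append([ts])
    return sessions


if __name__ == "__main__":
    print(split_sessions([1, 2, 3, 10, 11, 30], gap=5))
```

With a gap of 5, the timestamps 1, 2, 3, 10, 11, 30 fall into three windows:
[1, 2, 3], [10, 11] and [30].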

Any help would be appreciated.

-- Ankur Chauhan

