Re: Spark streaming vs. spark usage

Patrick Wendell Wed, 18 Dec 2013 13:50:28 -0800

Hey Nathan,

>  If I understand correctly, that calls existingBatchStuff on each RDD from
> which the window is made, not on the window as a whole.  If I want a
> combined result across the whole window, I'm not sure how to do that
> directly.


Actually, once you call window() you get a new sequence of RDD's where
each RDD actually represents the *entire* sliding window at that time.
So if you are doing aggregations either inside of your batch function
or on the subsequent DStream, it applies to the entire window.

For instance, if you do this:

inputStream.window(Seconds(30)).transform(rdd =>
existingBatchStuff(rdd)).reduce(_ + _)

Your existingBatchStuff as well as the reduce applies to the entire
rolling window.

This is the basic model is that you can compose your existing code
with the concept of rolling windows.

- Patrick

Re: Spark streaming vs. spark usage

Reply via email to