Hey Nathan,

A DStream adds extra semantics that don't exist on RDDs, for instance windowing, so there needs to be some extra type information to support those.
You can directly call existing functions on RDDs using the 'transform' operator. So for instance:

    def existingBatchStuff(rdd: RDD[X]) = rdd.map(...).filter(...)
    inputStream.window(...).transform(rdd => existingBatchStuff(rdd))

Could you speak a bit about what type of computation you are doing? Totally open to ideas about how we could better integrate.

- Patrick

On Tue, Dec 17, 2013 at 8:00 PM, Nathan Kronenfeld <[email protected]> wrote:
> Hi, Folks.
>
> We've just started looking at Spark Streaming, and I find myself a little confused.
>
> As I understood it, one of the main points of the system was that one could use the same code when streaming, doing batch processing, or whatnot.
>
> Yet when we try to apply a batch processor that analyzes RDDs to the streaming case, we have to copy the code and replace RDDs with DStreams everywhere, or dig into the details of the component RDDs of which the DStream is composed.
>
> Is there an intention of a common interface between RDD and DStream that we could eventually use? Or is there a different paradigm for working with both that I'm just missing?
>
> -Thanks,
> Nathan
>
>
> --
> Nathan Kronenfeld
> Senior Visualization Developer
> Oculus Info Inc
> 2 Berkeley Street, Suite 600,
> Toronto, Ontario M5A 4J5
> Phone: +1-416-203-3003 x 238
> Email: [email protected]
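
A fuller, runnable sketch of the transform pattern described above might look like the following. The socket source on localhost:9999, the one-second batch interval, the 30-second window, and the body of existingBatchStuff are all placeholder assumptions for illustration, not details from the thread:

    import org.apache.spark.SparkConf
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object ReuseBatchLogic {
      // Existing batch logic, written once against RDDs. It knows
      // nothing about streaming. (Placeholder logic: line lengths.)
      def existingBatchStuff(rdd: RDD[String]): RDD[Int] =
        rdd.map(_.length).filter(_ > 0)

      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("ReuseBatchLogic").setMaster("local[2]")
        val ssc = new StreamingContext(conf, Seconds(1))  // 1-second batches (placeholder)

        // Placeholder source: lines of text arriving on a socket.
        val lines = ssc.socketTextStream("localhost", 9999)

        // Reuse the same batch function on each RDD of the stream;
        // window(Seconds(30)) makes every batch cover the last 30 seconds.
        val results = lines.window(Seconds(30)).transform(rdd => existingBatchStuff(rdd))

        results.print()
        ssc.start()
        ssc.awaitTermination()
      }
    }

The key point is that existingBatchStuff never mentions DStreams, so the same function can be called unchanged from a plain batch job that builds its RDD from, say, a file.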
