> partitionData.iterator
> }
>
> }
>
> *From: *Wenchen Fan <cloud0...@gmail.com>
> *Date: *Wednesday, February 28, 2018 at 12:25 PM
> *To: *"Thakrar, Jayesh" <jthak...@conversantmedia.com>
> *Cc: *"dev@spark.apache.org" <dev@spark.apache.org>
> *Subject: *Re: SparkContext - parameter for RDD, but not serializable, why?
My understanding:
RDD, like SparkContext, is a driver-side construct; it works as a handle to
your distributed data on the cluster.
However, `RDD.compute` (which defines how to produce the data for each
partition) needs to be executed on the remote nodes. It's more convenient to
make RDD serializable, and
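To make this concrete, here is a minimal sketch of a custom RDD. The class and partition names (`ToyRDD`, `ToyPartition`) are hypothetical, invented for illustration; the key point is that the `SparkContext` constructor argument is marked `@transient` (as it is in Spark's own `RDD` base class), so it is dropped when the RDD object is serialized and shipped to the executors, while `compute` runs remotely using only the serialized partition data:

```scala
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical partition that carries its own data, for illustration only.
class ToyPartition(val index: Int, val data: Seq[Int]) extends Partition

// Sketch: @transient means the SparkContext reference is not serialized
// with the RDD, avoiding a NotSerializableException when tasks are shipped.
class ToyRDD(@transient sc: SparkContext, data: Seq[Int])
  extends RDD[Int](sc, Nil) {

  // Runs on the driver: plan the partitions.
  override protected def getPartitions: Array[Partition] =
    Array(new ToyPartition(0, data))

  // Runs on the executors: must rely only on what was serialized with
  // the task (the partition), never on the driver-side SparkContext.
  override def compute(split: Partition, context: TaskContext): Iterator[Int] =
    split.asInstanceOf[ToyPartition].data.iterator
}
```

This mirrors the pattern of Spark's `ParallelCollectionRDD`: the context is a driver-only handle, and everything `compute` touches must travel with the serialized task.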
Hi All,
I was just toying with creating a very rudimentary RDD data source to
understand the inner workings of RDDs.
It seems that one of the constructors for RDD has a parameter of type
SparkContext, but it (apparently) exists on the driver only and is not
serializable.
Consequently, any