Re: SparkContext - parameter for RDD, but not serializable, why?

2018-02-28 Thread Wenchen Fan
> partitionData.iterator
> }
> }

> From: Wenchen Fan <cloud0...@gmail.com>
> Date: Wednesday, February 28, 2018 at 12:25 PM
> To: "Thakrar, Jayesh" <jthak...@conversantmedia.com>
> Cc: "dev@spark.apache.org" ...

Re: SparkContext - parameter for RDD, but not serializable, why?

2018-02-28 Thread Thakrar, Jayesh
> My understanding: RDD is also a driver-side construct, like SparkContext; it works like a handle to your distributed ...

Re: SparkContext - parameter for RDD, but not serializable, why?

2018-02-28 Thread Wenchen Fan
My understanding: RDD is also a driver-side construct, like SparkContext; it works like a handle to your distributed data on the cluster. However, `RDD.compute` (which defines how to produce the data for each partition) needs to be executed on the remote nodes. It's more convenient to make RDD serializable, and ...
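A minimal Scala sketch of the pattern described above. The class name TenNumbersRDD is made up for illustration; this mirrors, but is not, Spark's own RDD source, which likewise keeps its SparkContext in a @transient field so the context is dropped when the RDD is serialized for task dispatch:

import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Minimal custom RDD. The SparkContext is handed to the RDD superclass,
// which stores it transiently, so serializing this RDD for tasks does not
// drag the (non-serializable) context along.
class TenNumbersRDD(@transient private val sc: SparkContext)
  extends RDD[Int](sc, Nil) {

  // Driver side: describe the partitions (a single partition here).
  override protected def getPartitions: Array[Partition] =
    Array(new Partition { override def index: Int = 0 })

  // Executor side: produce the data for one partition. It references no
  // driver-only state, so shipping this RDD to executors is safe.
  override def compute(split: Partition, context: TaskContext): Iterator[Int] =
    (1 to 10).iterator
}

Calling new TenNumbersRDD(sc).collect() on the driver would return 1 to 10; the point is that only compute (plus the Partition objects) ever runs on the remote nodes.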

SparkContext - parameter for RDD, but not serializable, why?

2018-02-28 Thread Thakrar, Jayesh
Hi All, I was just toying with creating a very rudimentary RDD datasource to understand the inner workings of RDDs. It seems that one of the constructors for RDD takes a parameter of type SparkContext, but SparkContext (apparently) exists on the driver only and is not serializable. Consequently, any ...
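A small illustrative sketch (the object name and setup below are made up) of the consequence the question hints at: the RDD itself can take a SparkContext in its constructor, but a task closure that captures the context fails serialization:

import org.apache.spark.{SparkConf, SparkContext}

object SerializationPitfall {
  def main(args: Array[String]): Unit = {
    // Driver-only handle to the cluster; SparkContext is not serializable.
    val sc = new SparkContext(
      new SparkConf().setAppName("rdd-serialization-demo").setMaster("local[2]"))

    val rdd = sc.parallelize(1 to 10)

    // Fine: the closure captures nothing beyond serializable values.
    rdd.map(_ * 2).collect()

    // Fails with org.apache.spark.SparkException: Task not serializable,
    // because the closure captures the non-serializable SparkContext.
    // rdd.map(x => x + sc.defaultParallelism).collect()

    sc.stop()
  }
}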