List(x.next).iterator is giving you the first element from each partition,
which would be 1, 4 and 7 respectively.
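A plain-Scala sketch of what is happening (assuming, as in the thread, an RDD of the numbers 1 to 9 split into three partitions; the nested lists below are a stand-in for those partitions):

```scala
// Hypothetical stand-in for an RDD of 1 to 9 in three partitions.
val partitions = List(List(1, 2, 3), List(4, 5, 6), List(7, 8, 9))

// mapPartitions hands the function one iterator per partition;
// List(x.next()).iterator keeps only the first element of each.
val firstPerPartition = partitions.map { p =>
  val x = p.iterator
  List(x.next()).iterator
}.flatMap(_.toList)
// firstPerPartition == List(1, 4, 7)
```

So the function runs once per partition, not once per element, which is why only 1, 4 and 7 come back.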
On 3/18/15, 10:19 AM, "ashish.usoni" wrote:
>I am trying to understand mapPartitions, but I am still not sure how it
>works.
>
>In the example below it creates three partitions.
>
What's the best way to go from:
RDD[(A, B)] to (RDD[A], RDD[B])
If I do:
def separate[A, B](k: RDD[(A, B)]) = (k.map(_._1), k.map(_._2))
which is the obvious solution, but it runs two separate maps over the data in
the cluster. Can I do some kind of a fold instead:
def separate[A, B](l: List[(A, B)]) =
  l.foldLeft((List.empty[A], List.empty[B])) { case ((as, bs), (a, b)) => (as :+ a, bs :+ b) }
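For what it's worth, the single-pass fold works in plain Scala as sketched below (foldRight keeps the original element order; note the standard library's unzip already does exactly this for ordinary collections):

```scala
// Single pass over the list, accumulating both halves of each pair.
def separate[A, B](l: List[(A, B)]): (List[A], List[B]) =
  l.foldRight((List.empty[A], List.empty[B])) { case ((a, b), (as, bs)) =>
    (a :: as, b :: bs)
  }

val (nums, letters) = separate(List((1, "a"), (2, "b"), (3, "c")))
// nums == List(1, 2, 3), letters == List("a", "b", "c")

// Equivalent one-liner from the standard library:
val (nums2, letters2) = List((1, "a"), (2, "b"), (3, "c")).unzip
```

For an RDD, though, there is no such single-pass split into two RDDs; the two-map version (or caching the input before mapping twice) is the usual approach.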
So the settings page,
http://spark.apache.org/docs/1.2.1/configuration.html, seems not to apply
when running local contexts. I have a shell script that starts my job:
export SPARK_MASTER_OPTS="-Dsun.io.serialization.extendedDebugInfo=true"
export SPARK_WORKER_OPTS="-Dsun.io.serialization.extendedDebugInfo=true"
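One likely explanation: SPARK_MASTER_OPTS and SPARK_WORKER_OPTS configure the standalone cluster daemons, so they have no effect when the master is local[*], where everything runs inside the driver JVM. A sketch of passing the flag to the driver instead (my-job.jar is a placeholder for your application jar):

```shell
# With a local master the whole job runs in the driver JVM,
# so the system property has to reach the driver itself:
spark-submit \
  --master "local[*]" \
  --driver-java-options "-Dsun.io.serialization.extendedDebugInfo=true" \
  my-job.jar
```

The equivalent configuration-file setting is spark.driver.extraJavaOptions.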