Hi Pro,

One map() operation in my Spark app takes an RDD[A] as input and maps each element of RDD[A] to an object of type B using a custom mapping function func(x: A): B.

I have received lots of OutOfMemory errors, and after some debugging I found that this is because func() requires a significant amount of memory.
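A minimal sketch of the setup described, to make the discussion concrete; A, B, func, and the buffer size are stand-ins for your own types and logic, with the large allocation mimicking a memory-heavy mapper:

import org.apache.spark.rdd.RDD

case class A(data: Array[Byte])   // hypothetical input element type
case class B(summary: String)     // hypothetical output element type

// Hypothetical memory-heavy mapper: each call allocates a large temporary
// buffer, so peak executor memory is roughly
// (concurrent tasks per executor) x (per-call allocation).
def func(x: A): B = {
  val scratch = new Array[Byte](256 * 1024 * 1024)
  B(s"input ${x.data.length} bytes, scratch ${scratch.length} bytes")
}

val rddA: RDD[A] = sc.parallelize(Seq.fill(8)(A(new Array[Byte](16))))
val rddB: RDD[B] = rddA.map(func)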
Hi Pro,
I have a question about calling cache()/persist() on an RDD. All RDDs in Spark are lazily evaluated, but does calling cache()/persist() on an RDD trigger its immediate evaluation?

My Spark app looks something like this:
val rdd = sc.textFile().map()
rdd.persist()
while (true) {
  val c
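For what it's worth, persist() is itself lazy: it only records the desired storage level on the RDD, and the data is actually computed and cached the first time an action runs. A minimal sketch (the input path and mapping function are hypothetical):

val rdd = sc.textFile("hdfs:///tmp/input.txt").map(_.length)
rdd.persist()            // lazy: only marks the storage level, nothing runs yet
val first = rdd.count()  // first action: evaluates the lineage and fills the cache
val again = rdd.count()  // later actions read the cached partitions instead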
Hi Pro,

I want to merge elements in a Spark RDD when two elements satisfy a certain condition.

Suppose there is an RDD[Seq[Int]] where some of the Seq[Int] contain overlapping elements. The task is to merge all overlapping Seq[Int] in this RDD and store the result in a new RDD.
For example, Seq(1, 2, 3) and Seq(3, 4, 5) overlap because they share the element 3, so they should be merged into a single Seq(1, 2, 3, 4, 5).
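The question is cut off here, but assuming "overlapping" means sharing at least one element, one common way to attack this kind of merge is to model it as connected components, e.g. with GraphX. A sketch under that assumption (the input data is hypothetical, and the Seqs are assumed non-empty):

import org.apache.spark.graphx.Graph
import org.apache.spark.rdd.RDD

val seqs: RDD[Seq[Int]] = sc.parallelize(Seq(
  Seq(1, 2, 3), Seq(3, 4, 5), Seq(7, 8)))

// Link every element of a Seq to that Seq's first element (including a
// self-loop for the head), so each Seq becomes one connected blob; Seqs that
// share an element then land in the same connected component.
val edges = seqs.flatMap(s => s.map(x => (s.head.toLong, x.toLong)))

val graph = Graph.fromEdgeTuples(edges, defaultValue = 0)

// connectedComponents() labels every vertex with the smallest vertex id in
// its component; grouping elements by that label yields the merged Seqs.
val merged: RDD[Seq[Int]] = graph.connectedComponents().vertices
  .map { case (elem, comp) => (comp, elem.toInt) }
  .groupByKey()
  .map { case (_, elems) => elems.toSeq.sorted }

With the sample input above, merged would contain Seq(1, 2, 3, 4, 5) and Seq(7, 8).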