Hi,
Thanks Andrew and Daniel for the response.
Setting spark.shuffle.spill to false didn't make any difference. The 5-day job
completed in 6 min, and the 10-day job was stuck after around 1 hr.
Daniel, in my current use case I can't read all the files into a single RDD. But I
have another use case where I did it
I've been wondering about this. Is there a difference in performance
between these two?
val rdd1 = sc.textFile(files.mkString(","))
val rdd2 = sc.union(files.map(sc.textFile(_)))
I don't know about your use-case, Meethu, but it may be worth trying to see
if reading all the files into one RDD (li
How long does it get stuck for? That is a common sign of the OS thrashing
because it is running out of memory. If you keep it running longer, does it
throw an error?
Depending on how large your other RDD is (and your join operation), memory
pressure may or may not be the problem at all. It could be t
Hi all,
I want to do a recursive leftOuterJoin between an RDD (created from a file)
with 9 million rows (the file is 100 MB) and 30 other RDDs (created from
30 different files, one in each iteration of a loop) varying from 1 to 6 million rows.
When I run it for 5 RDDs, it runs successfully in
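For anyone following the thread, the shape of the loop being described can be sketched with plain Scala collections standing in for the RDDs (a minimal sketch only; in Spark each step would be base.leftOuterJoin(other), and all names and data here are hypothetical):

```scala
// Sketch of the recursive leftOuterJoin pattern, with Maps in place of
// pair RDDs. Each iteration folds one file's data into the accumulated
// rows, appending Some(value) on a key match and None otherwise.
object RecursiveJoinSketch {
  type K = Int

  // One leftOuterJoin step: keep every key of `base`, look it up in `other`.
  def leftOuterJoin(base: Map[K, List[Option[String]]],
                    other: Map[K, String]): Map[K, List[Option[String]]] =
    base.map { case (k, vs) => k -> (vs :+ other.get(k)) }

  def main(args: Array[String]): Unit = {
    val base: Map[K, List[Option[String]]] =
      Map(1 -> Nil, 2 -> Nil, 3 -> Nil)            // stands in for the 9M-row RDD
    val others: Seq[Map[K, String]] =
      Seq(Map(1 -> "a", 2 -> "b"), Map(2 -> "c"))  // stands in for the per-file RDDs

    // The loop over 30 files is a fold: one join per iteration.
    val result = others.foldLeft(base)(leftOuterJoin)
    println(result(1))  // List(Some(a), None)
  }
}
```

One thing worth noting about this pattern in Spark itself: chaining 30 joins in a loop grows the RDD lineage with every iteration, and periodically calling checkpoint (or persisting intermediate results) to truncate that lineage is a common mitigation when such loops slow down or hang.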