Hi,
Thanks Andrew and Daniel for the response.
Setting spark.shuffle.spill to false didn't make any difference. The 5-day job
completed in 6 min, and the 10-day job was stuck after around 1 hr.
Daniel, in my current use case I can't read all the files into a single RDD. But I
have another use case where I did it
I've been wondering about this. Is there a difference in performance
between these two?
val rdd1 = sc.textFile(files.mkString(","))
val rdd2 = sc.union(files.map(sc.textFile(_)))
I don't know about your use-case, Meethu, but it may be worth trying to see
if reading all the files into one RDD (li
How long does it get stuck for? That is a common sign of the OS thrashing
because it is running out of memory. If you keep it running longer, does it
throw an error?
Depending on how large your other RDD is (and your join operation), memory
pressure may or may not be the problem at all. It could be t
Hi all,
I want to do a recursive leftOuterJoin between an RDD (created from a file)
with 9 million rows (the file is 100 MB) and 30 other RDDs (created from
30 different files, one in each iteration of a loop) varying from 1 to 6 million rows.
When I run it for 5 RDDs, it runs successfully in
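For anyone following the thread, the shape of the loop being described can be sketched with plain Scala collections standing in for the RDDs (a minimal sketch only; in Spark each step would be base.leftOuterJoin(other), and all names and data here are hypothetical):

```scala
// Sketch of the recursive leftOuterJoin pattern, with Maps in place of
// pair RDDs. Each iteration folds one file's data into the accumulated
// rows, appending Some(value) on a key match and None otherwise.
object RecursiveJoinSketch {
  type K = Int

  // One leftOuterJoin step: keep every key of `base`, look it up in `other`.
  def leftOuterJoin(base: Map[K, List[Option[String]]],
                    other: Map[K, String]): Map[K, List[Option[String]]] =
    base.map { case (k, vs) => k -> (vs :+ other.get(k)) }

  def main(args: Array[String]): Unit = {
    val base: Map[K, List[Option[String]]] =
      Map(1 -> Nil, 2 -> Nil, 3 -> Nil)            // stands in for the 9M-row RDD
    val others: Seq[Map[K, String]] =
      Seq(Map(1 -> "a", 2 -> "b"), Map(2 -> "c"))  // stands in for the per-file RDDs

    // The loop over 30 files is a fold: one join per iteration.
    val result = others.foldLeft(base)(leftOuterJoin)
    println(result(1))  // List(Some(a), None)
  }
}
```

One thing worth noting about this pattern in Spark itself: chaining 30 joins in a loop grows the RDD lineage with every iteration, and periodically calling checkpoint (or persisting intermediate results) to truncate that lineage is a common mitigation when such loops slow down or hang.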