Re: Unable to broadcast a very large variable
I am not using pyspark. The job is written in Scala.
Re: Question about relationship between number of files and initial tasks(partitions)
Extending Arthur's question, I am facing the same problem: the number of partitions was huge (960 cores, 16000 partitions). I tried to decrease the number of partitions with coalesce, but the problem is unbalanced data. After using coalesce, I get a Java out-of-heap-space error. There was no out of h