Re: Unable to broadcast a very large variable

2019-04-11 Thread V0lleyBallJunki3
I am not using pyspark. The job is written in Scala.

Re: Question about relationship between number of files and initial tasks(partitions)

2019-04-11 Thread Sagar Grover
Extending Arthur's question, I am facing the same problem (the number of partitions was huge: 960 cores, 16000 partitions). I tried to decrease the number of partitions with coalesce, but the problem is unbalanced data. After using coalesce, I get a Java out-of-heap-space error. There was no out of h