Hi, I've experimented with the parameters provided, but we are still seeing the same problem: data is still spilling to disk even though there is clearly enough memory on the worker nodes.
Please note that the data is distributed evenly among the 6 Hadoop nodes (about 5 GB per node). Are there any workarounds, or clues as to why this is still happening?

Thanks,
Majd
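For context, here is a minimal sketch of the kind of caching setup in question. The memory values, input path, and app name below are only placeholders, not the exact parameters from this thread; the point is that persisting with a pure MEMORY_ONLY storage level makes an undersized cache visible as dropped or recomputed blocks in the web UI's Storage tab, rather than silent spilling.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    object CacheCheck {
      def main(args: Array[String]): Unit = {
        // Placeholder settings; substitute the actual values used on the cluster.
        val conf = new SparkConf()
          .setAppName("CacheCheck")
          .set("spark.executor.memory", "8g")          // per-executor heap
          .set("spark.storage.memoryFraction", "0.6")  // fraction of the heap usable for cached blocks

        val sc = new SparkContext(conf)

        // MEMORY_ONLY drops partitions that do not fit instead of spilling them,
        // so an undersized cache shows up clearly in the Storage tab.
        val data = sc.textFile("hdfs:///path/to/input")
          .persist(StorageLevel.MEMORY_ONLY)

        println(s"records: ${data.count()}")
        sc.stop()
      }
    }

If the Storage tab shows the full RDD fitting in memory with that storage level, then the disk writes being observed would be coming from shuffle output rather than from cache eviction.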