I'm running a Spark Streaming job on Spark 1.3.1 that uses an updateStateByKey operation. The job works perfectly fine at first, but at some point (after a few runs) it starts shuffling to disk, no matter how much memory I give the executors.
I have tried changing --executor-memory on spark-submit, as well as spark.shuffle.memoryFraction, spark.storage.memoryFraction, and spark.storage.unrollFraction. But no matter how I configure these, it always spills to disk at around 2.5 GB.

What is the best way to avoid spilling shuffle data to disk?
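For reference, the settings mentioned above can also be set programmatically on a SparkConf before the streaming context is created. This is a minimal Scala sketch of that mapping only. The app name and every numeric value are hypothetical placeholders, not the configuration actually tried here.

```scala
import org.apache.spark.SparkConf

// Sketch only: the app name and all values below are placeholders.
val conf = new SparkConf()
  .setAppName("StatefulStreamingJob")            // hypothetical app name
  .set("spark.executor.memory", "8g")            // same setting as --executor-memory on spark-submit
  .set("spark.shuffle.memoryFraction", "0.5")    // heap fraction for shuffle aggregation before spilling (default 0.2)
  .set("spark.storage.memoryFraction", "0.3")    // heap fraction for cached RDDs (default 0.6)
  .set("spark.storage.unrollFraction", "0.2")    // share of storage memory used to unroll blocks (default 0.2)
```

The shuffle and storage fractions are both fractions of the same executor heap, so raising one usually means lowering the other.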