Spark Streaming Shuffle to Disk

spearson23 Fri, 04 Dec 2015 09:33:56 -0800

I'm running a Spark Streaming job on Spark 1.3.1 that uses
updateStateByKey.  The job works perfectly fine at first, but at some point
(after a few runs) it starts shuffling to disk no matter how much memory I
give the executors.


I have tried changing --executor-memory on spark-submit, as well as
spark.shuffle.memoryFraction, spark.storage.memoryFraction, and
spark.storage.unrollFraction.  However I configure these, it always
spills to disk at around 2.5 GB.
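
For reference, here is a rough sketch of the kind of invocation I mean. The
class name, jar name, and every value are placeholders for illustration
only, not the settings of the actual job:

```shell
# Hypothetical spark-submit call for a Spark 1.3.x streaming job.
# com.example.StreamingJob and streaming-job.jar are made-up names;
# the memory size and fractions are illustrative values, not recommendations.
spark-submit \
  --class com.example.StreamingJob \
  --executor-memory 8g \
  --conf spark.shuffle.memoryFraction=0.5 \
  --conf spark.storage.memoryFraction=0.3 \
  --conf spark.storage.unrollFraction=0.2 \
  streaming-job.jar
```

In Spark 1.x these fractions are all taken relative to the executor heap, so
raising --executor-memory scales the shuffle and storage budgets together.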

What is the best way to avoid spilling shuffle to disk?




