Thanks. We've run into timeout issues at scale as well. We were able to work around them by setting the following JVM options:
-Dspark.akka.askTimeout=300 -Dspark.akka.timeout=300 -Dspark.worker.timeout=300

NOTE: these JVM options *must* be set on the worker nodes (and not just the driver/master) for the settings to take effect.

Allen

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Using-Spark-on-Data-size-larger-than-Memory-size-tp6589p7435.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
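For anyone wondering where to put these: one way (a sketch, assuming a standalone deployment where each worker reads conf/spark-env.sh; the SPARK_JAVA_OPTS mechanism applies to older Spark releases and was later superseded by spark-defaults.conf and spark.executor.extraJavaOptions) is:

```shell
# conf/spark-env.sh -- this file must exist on EVERY worker node,
# not only on the driver/master, or the timeouts will not take effect.
# Values are in seconds.
SPARK_JAVA_OPTS="-Dspark.akka.askTimeout=300 -Dspark.akka.timeout=300 -Dspark.worker.timeout=300"
export SPARK_JAVA_OPTS
```

Restart the workers after editing the file so the new options are picked up.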