Thanks. We've run into timeout issues at scale as well. We were able to
work around them by setting the following JVM options:
-Dspark.akka.askTimeout=300
-Dspark.akka.timeout=300
-Dspark.worker.timeout=300
NOTE: these JVM options *must* be set on worker nodes (and not just the
driver/master) for
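For reference, a sketch of how options like these can be applied per node (this assumes an older Spark release where conf/spark-env.sh and SPARK_JAVA_OPTS are honored; the exact mechanism varies by version):

```shell
# conf/spark-env.sh on EACH worker node (and the master/driver).
# Sketch only: timeout values are the ones that worked for us; tune to taste.
SPARK_JAVA_OPTS="-Dspark.akka.askTimeout=300 -Dspark.akka.timeout=300 -Dspark.worker.timeout=300"
export SPARK_JAVA_OPTS
```

After editing the file on every node, the daemons need a restart for the options to take effect.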
Thanks for the clarification.
What is the proper way to configure RDDs when your aggregate data size
exceeds your available working memory size? In particular, in addition to
typical operations, I'm performing cogroups, joins, and coalesces/shuffles.
I see that the default storage level for RDD
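(For context, the usual knob for working sets larger than memory is an explicit storage level that spills to disk rather than recomputing or OOM-ing. A sketch, assuming an existing SparkContext named `sc` and a hypothetical input path:)

```scala
import org.apache.spark.storage.StorageLevel

// Sketch only: `sc` is an existing SparkContext; the path is hypothetical.
val big = sc.textFile("hdfs:///data/events")
  .map(line => (line.split("\t")(0), line))
  // Partitions that don't fit in memory spill to local disk instead of
  // being recomputed; the _SER variant stores partitions serialized,
  // trading CPU for a smaller memory footprint.
  .persist(StorageLevel.MEMORY_AND_DISK_SER)
```

Whether the plain or serialized variant wins depends on object overhead versus deserialization cost, so it's worth measuring both.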
We're running into an issue where periodically the master loses connectivity
with workers in the spark cluster. We believe this issue tends to manifest
when the cluster is under heavy load, but we're not entirely sure when it
happens. I've seen one or two other messages to this list about this issue
Hi all,
I had a question: if I have an RDD containing mutable values, and I run a
function over the RDD which mucks with the mutable values, what happens?
What happens in the case of a cogroup? e.g.:
inputRdd.cogroup(inputRdd2).flatMapValues(functionThatModifiesValues())
Will this result in un
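(The aliasing hazard being asked about can be sketched without Spark. Here a local Seq stands in for the RDD, and a groupBy mimics the grouped values a cogroup could hand back without copying; whether Spark actually aliases depends on serialization and caching, so this only illustrates the Scala-level behavior:)

```scala
import scala.collection.mutable.ArrayBuffer

// Stand-in for RDD records with mutable values.
val records = Seq("a" -> ArrayBuffer(1, 2), "b" -> ArrayBuffer(3))

// A cogroup-style grouping that keeps references to the ORIGINAL buffers
// rather than copying them.
val grouped = records.groupBy(_._1).map { case (k, pairs) => k -> pairs.map(_._2) }

// "functionThatModifiesValues": mutates each grouped value in place.
grouped.values.flatten.foreach(buf => buf += 99)

// The mutation is visible through the original records too, because the
// grouped view aliases the same mutable objects.
println(records.head._2)  // the "a" buffer now also contains 99
```

If the values came back aliased like this, mutating them inside flatMapValues would silently change what any cached copy of the input sees.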