GitHub user viirya opened a pull request:
https://github.com/apache/spark/pull/1418
[SPARK-2490] Change recursive visiting on RDD dependencies to iterative
approach
When an RDD is transformed repeatedly over many iterations, its chain of
dependencies can become very long. Recursively visiting these dependencies in
Spark core can then easily cause a StackOverflowError. For example:
var rdd = sc.makeRDD(Array(1))
for (i <- 1 to 1000) {
  rdd = rdd.coalesce(1).cache()
  rdd.collect()
}
This PR changes the recursive visiting of an RDD's dependencies to an
iterative approach to avoid the StackOverflowError.
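As a rough illustration of the idea (not the code in this PR), the sketch below walks a dependency chain with an explicit stack instead of recursion, so the traversal depth is bounded by heap rather than by the JVM call stack. `Node`, `IterativeVisit`, `visitAll` and `chain` are hypothetical names that stand in for RDDs and their dependencies; no Spark API is used.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an RDD: a node that knows its parent nodes.
final class Node(val id: Int, val parents: Seq[Node])

object IterativeVisit {
  // Visits every node reachable from `root` exactly once, using an explicit
  // stack instead of recursive calls. Returns the ids in visiting order.
  def visitAll(root: Node): Seq[Int] = {
    val visited = mutable.HashSet[Int]()
    val order = mutable.ArrayBuffer[Int]()
    val waiting = mutable.Stack[Node]()
    waiting.push(root)
    while (waiting.nonEmpty) {
      val node = waiting.pop()
      // HashSet.add returns true only if the id was not already present.
      if (visited.add(node.id)) {
        order += node.id
        node.parents.foreach(p => waiting.push(p))
      }
    }
    order.toSeq
  }

  // Builds a linear chain of `length` nodes and returns the last one,
  // mimicking an RDD produced by many successive transformations.
  def chain(length: Int): Node = {
    var node = new Node(0, Nil)
    for (i <- 1 until length) node = new Node(i, Seq(node))
    node
  }
}
```

For example, `IterativeVisit.visitAll(IterativeVisit.chain(100000))` walks a chain of 100000 nodes in a single loop, with no nested calls per dependency level.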
Besides the recursive visiting, the Java serializer has a
known [bug](http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4152790) that
also causes StackOverflowError when serializing or deserializing a large graph
of objects, so this PR solves only part of the problem. Replacing the Java
serializer with KryoSerializer might help. However, since KryoSerializer is
not currently supported for `spark.closure.serializer`, I cannot test whether
KryoSerializer completely avoids the Java serializer's problem.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/viirya/spark-1 remove_recursive_visit
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/1418.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1418
----
commit 900538bbcb61683bf1418534c2466463a630569f
Author: Liang-Chi Hsieh <[email protected]>
Date: 2014-07-15T10:58:45Z
change recursive visiting on rdd's dependencies to iterative approach to
avoid stackoverflowerror.
----