Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3884#discussion_r22447976
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -76,10 +76,22 @@ import org.apache.spark.util.random.{BernoulliSampler, PoissonSampler, Bernoulli
* on RDD internals.
*/
abstract class RDD[T: ClassTag](
- @transient private var sc: SparkContext,
+ @transient private var _sc: SparkContext,
@transient private var deps: Seq[Dependency[_]]
) extends Serializable with Logging {
+ if (classOf[RDD[_]].isAssignableFrom(elementClassTag.runtimeClass)) {
+ throw new SparkException("Spark does not support nested RDDs (see SPARK-5063)")
+ }
+
+ private def sc: SparkContext = {
+ if (_sc == null) {
+ throw new SparkException(
+ "Can only define RDDs and perform actions on the driver, not in tasks (see SPARK-5063)")
--- End diff --
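The two guards in the diff rely on standard JVM behavior. A `ClassTag` exposes the element type's runtime class while the constructor runs, so a nested element type can be rejected up front. Java serialization skips `@transient` fields, so `_sc` is `null` once an RDD has been shipped to an executor inside a task closure. Below is a minimal, stdlib-only Scala sketch of that behavior. `DriverContext`, `GuardedDataset`, and `GuardDemo` are hypothetical stand-ins, not Spark classes, and a serialize/deserialize round trip stands in for task shipping.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.reflect.ClassTag
import scala.util.Try

// Hypothetical stand-in for SparkContext: deliberately NOT Serializable.
class DriverContext

// Hypothetical stand-in for RDD that models only the two guards from the diff.
class GuardedDataset[T: ClassTag](@transient private var _ctx: DriverContext) extends Serializable {
  // Guard 1: runs in the constructor, so a nested element type is rejected at creation time.
  if (classOf[GuardedDataset[_]].isAssignableFrom(implicitly[ClassTag[T]].runtimeClass)) {
    throw new IllegalArgumentException("nested GuardedDatasets are not supported")
  }

  // Guard 2: Java serialization skips @transient fields, so after the object is
  // deserialized elsewhere (a "task"), _ctx is null and we fail with a clear message.
  private def ctx: DriverContext = {
    if (_ctx == null) {
      throw new IllegalStateException("can only be used on the driver, not in tasks")
    }
    _ctx
  }

  // Any operation that needs the context goes through the guarded accessor.
  def describe(): String = s"dataset bound to ${ctx.getClass.getSimpleName}"
}

object GuardDemo {
  // Simulates shipping an object to an executor: serialize it, then deserialize it.
  def roundTrip[A](value: A): A = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[A] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val ctx = new DriverContext
    val ds = new GuardedDataset[Int](ctx)
    println(ds.describe())                                     // succeeds on the "driver"
    println(Try(roundTrip(ds).describe()))                     // a Failure wrapping IllegalStateException
    println(Try(new GuardedDataset[GuardedDataset[Int]](ctx))) // a Failure wrapping IllegalArgumentException
  }
}
```

Deserialization does not re-run the constructor, which is why the `null` check lives in the accessor rather than in the constructor.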
Sure. How about this:
> RDD transformations and actions can only be invoked by the driver, not
> inside of other transformations; for example,
> `rdd1.map(x => rdd2.values.count() * x)` is invalid because the `values`
> transformation and `count` action cannot be performed inside of the
> `rdd1.map` transformation. For more information, see SPARK-5063.
It's kind of verbose, but I think an example might be the clearest way to
explain this, especially to someone unfamiliar with the terminology.
It might be nice to keep the JIRA reference since it will make the
exception easier to search for (I'm kind of inspired by React.js's error
messages, which include URL-shortened links to the documentation).
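For the pattern named in the proposed message, the usual fix is to run the inner action on the driver and capture only its result in the closure. A sketch, assuming Spark 1.3 or later (where pair-RDD methods such as `values` resolve without an extra import) and a local master; `HoistActionExample` and the sample data are illustrative only.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object HoistActionExample {
  // Scales each element of rdd1 by the number of values in rdd2.
  def run(sc: SparkContext): Array[Long] = {
    val rdd1 = sc.parallelize(Seq(1L, 2L, 3L))
    val rdd2 = sc.parallelize(Seq("a" -> 10L, "b" -> 20L))

    // Invalid (SPARK-5063): rdd1.map(x => rdd2.values.count() * x)
    // would try to run rdd2's transformation and action inside an rdd1 task.

    // Valid: run the action on the driver first...
    val n: Long = rdd2.values.count()
    // ...then capture only the resulting Long in the closure.
    rdd1.map(x => n * x).collect()
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hoist-action").setMaster("local[2]"))
    try println(run(sc).mkString(",")) // prints 2,4,6
    finally sc.stop()
  }
}
```

The closure now ships a single `Long` to the executors instead of a reference to `rdd2`, so no RDD operation runs inside a task.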