[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-70951044 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25928/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-70951031

[Test build #25928 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25928/consoleFull) for PR 3884 at commit [`2d0d7f7`](https://github.com/apache/spark/commit/2d0d7f78535a193e96309c81b3a6a5fded71fe48).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-70943337

[Test build #25928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25928/consoleFull) for PR 3884 at commit [`2d0d7f7`](https://github.com/apache/spark/commit/2d0d7f78535a193e96309c81b3a6a5fded71fe48).

* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-70943192

I audited the uses of `assertNotStopped` and removed a bunch of calls in methods that sometimes didn't throw exceptions on Spark 1.2.0. Pending Jenkins, I'm planning to commit this slightly smaller patch to branch-1.2 for inclusion in 1.2.1.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3884#discussion_r23343271

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -1466,17 +1531,29 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
         }
       }

    -  def getCheckpointDir = checkpointDir
    +  def getCheckpointDir = {
    +    assertNotStopped()
    +    checkpointDir
    +  }

       /** Default level of parallelism to use when not given by user (e.g. parallelize and makeRDD). */
    -  def defaultParallelism: Int = taskScheduler.defaultParallelism
    +  def defaultParallelism: Int = {
--- End diff --

This throws an exception because `taskScheduler` is null:

```
scala> sc.defaultParallelism
java.lang.NullPointerException
    at org.apache.spark.SparkContext.defaultParallelism(SparkContext.scala:1461)
    at $iwC$$iwC$$iwC$$iwC.(:13)
    at $iwC$$iwC$$iwC.(:18)
    at $iwC$$iwC.(:20)
    at $iwC.(:22)
    at (:24)
    at .(:28)
    at .()
    at .(:7)
    at .()
    at $print()
```
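To make the failure mode above concrete, here is a minimal sketch in plain Scala with no Spark dependency. `ToyContext`, `ToyScheduler`, `ToyContextDemo` and the error message are hypothetical stand-ins rather than Spark's actual classes or wording, and the sketch assumes that stopping the context nulls out the scheduler reference, which is what the trace suggests.

```scala
// Hypothetical stand-ins; none of these names are Spark's real classes.
class ToyScheduler {
  def defaultParallelism: Int = 4
}

class ToyContext {
  // Assumption: stop() clears the scheduler reference.
  private var taskScheduler: ToyScheduler = new ToyScheduler
  private var stopped = false

  def stop(): Unit = {
    stopped = true
    taskScheduler = null
  }

  // Illustrative guard; the message text is made up for this sketch.
  private def assertNotStopped(): Unit = {
    if (stopped) {
      throw new IllegalStateException("Cannot call methods on a stopped context")
    }
  }

  // Unguarded: dereferences the null field after stop().
  def defaultParallelismUnguarded: Int = taskScheduler.defaultParallelism

  // Guarded: fails fast with a descriptive error instead of an NPE.
  def defaultParallelism: Int = {
    assertNotStopped()
    taskScheduler.defaultParallelism
  }
}

object ToyContextDemo {
  // Evaluates `body` and reports "ok" or the simple name of the exception thrown.
  def describe(body: => Any): String = {
    try {
      body
      "ok"
    } catch {
      case e: Exception => e.getClass.getSimpleName
    }
  }
}
```

After `stop()`, the unguarded accessor surfaces a bare `NullPointerException`, while the guarded one reports that the context is stopped. Both calls fail either way, so in this sketch the guard changes only the error, not whether the call succeeds.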
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3884#discussion_r23343194

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -1199,6 +1260,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
        */
       @deprecated("adding jars no longer creates local copies that need to be deleted", "1.0.0")
       def clearJars() {
    +    assertNotStopped()
--- End diff --

I'll revert this.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3884#discussion_r23343236

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -1458,6 +1522,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
        * be a HDFS path if running on a cluster.
        */
       def setCheckpointDir(directory: String) {
    +    assertNotStopped()
--- End diff --

This actually works, so I'm removing this check.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3884#discussion_r23343173

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -1146,6 +1206,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
        * filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
        */
       def addJar(path: String) {
    +    assertNotStopped()
--- End diff --

This is another one of those toss-up cases: it works some of the time, so I'll remove this check for conservatism's sake.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3884#discussion_r23343083

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -,6 +1170,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
        */
       @deprecated("adding files no longer creates local copies that need to be deleted", "1.0.0")
       def clearFiles() {
    +    assertNotStopped()
--- End diff --

I'll remove this.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3884#discussion_r23343038

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -1068,7 +1120,10 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
        * Returns an immutable map of RDDs that have marked themselves as persistent via cache() call.
        * Note that this does not necessarily mean the caching or computation was successful.
        */
    -  def getPersistentRDDs: Map[Int, RDD[_]] = persistentRdds.toMap
--- End diff --

This is safe, so I'll revert this error-checking.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3884#discussion_r23343023

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -1059,6 +1110,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
        */
       @DeveloperApi
       def getRDDStorageInfo: Array[RDDInfo] = {
    +    assertNotStopped()
--- End diff --

Same here:

```
scala> sc.getRDDStorageInfo
org.apache.spark.SparkException: Error sending message as actor is null [message = GetStorageStatus]
    at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:178)
    at org.apache.spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:221)
    at org.apache.spark.storage.BlockManagerMaster.getStorageStatus(BlockManagerMaster.scala:152)
    at org.apache.spark.SparkContext.getExecutorStorageStatus(SparkContext.scala:1068)
    at org.apache.spark.SparkContext.getRDDStorageInfo(SparkContext.scala:1052)
    at $iwC$$iwC$$iwC$$iwC.(:13)
    at $iwC$$iwC$$iwC.(:18)
    at $iwC$$iwC.(:20)
    at $iwC.(:22)
    at (:24)
    at .(:28)
    at .()
    at .(:7)
    at .()
    at $print()
    at sun.reflect.NativeMethodAccessorImpl.invoke0(N
```
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3884#discussion_r23343005

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -1047,6 +1097,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
        * memory available for caching.
        */
       def getExecutorMemoryStatus: Map[String, (Long, Long)] = {
--- End diff --

This throws an error, so I'll keep it:

```
scala> sc.getExecutorMemoryStatus
org.apache.spark.SparkException: Error sending message as actor is null [message = GetMemoryStatus]
    at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:178)
    at org.apache.spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:221)
    at org.apache.spark.storage.BlockManagerMaster.getMemoryStatus(BlockManagerMaster.scala:148)
    at org.apache.spark.SparkContext.getExecutorMemoryStatus(SparkContext.scala:1039)
    at $iwC$$iwC$$iwC$$iwC.(:13)
    at $iwC$$iwC$$iwC.(:18)
    at $iwC$$iwC.(:20)
    at $iwC.(:22)
    at (:24)
    at .(:28)
    at .()
    at .(:7)
    at .()
    at $print()
```
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3884#discussion_r23342970

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -1002,6 +1047,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
        */
       @DeveloperApi
       override def requestExecutors(numAdditionalExecutors: Int): Boolean = {
    +    assertNotStopped()
--- End diff --

This is a toss-up, since I'd expect this to throw some sort of error after the scheduler backend is stopped, but there are many cases where it's a no-op and doesn't throw an error. Therefore, I'll remove this, too.
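The toss-up described above can be illustrated with a small sketch of a call that is a no-op on some backends and an error on others. Everything here is hypothetical: `ToyBackend`, `ScalableBackend`, `FixedBackend` and `ToyRequests` are invented for illustration, and the assumption that a stopped scalable backend throws is mine, not something the thread confirms.

```scala
// Hypothetical backends; the names and behavior are illustrative only.
trait ToyBackend

class ScalableBackend extends ToyBackend {
  private var stopped = false

  def stop(): Unit = {
    stopped = true
  }

  // Assumption: a stopped scalable backend rejects new requests.
  def requestExecutors(n: Int): Boolean = {
    if (stopped) {
      throw new IllegalStateException("backend stopped")
    }
    true
  }
}

// A backend with no notion of adding executors.
class FixedBackend extends ToyBackend

object ToyRequests {
  // Returns false (a no-op) when the backend cannot scale, instead of throwing.
  def requestExecutors(backend: ToyBackend, n: Int): Boolean = backend match {
    case b: ScalableBackend => b.requestExecutors(n)
    case _ => false
  }
}
```

In this sketch, putting an unconditional stopped-check in front of `ToyRequests.requestExecutors` would turn the quiet `false` of the `FixedBackend` path into an exception, which is the kind of behavior change a maintenance release would want to avoid.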
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3884#discussion_r23342907

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -992,6 +1036,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
        */
       @DeveloperApi
       def addSparkListener(listener: SparkListener) {
    +    assertNotStopped()
--- End diff --

`addSparkListener` technically works, so I'll remove this, too.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3884#discussion_r23342885

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -969,6 +1012,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
        * use `SparkFiles.get(fileName)` to find its download location.
        */
       def addFile(path: String) {
--- End diff --

This throws an NPE:

```
scala> sc.addFile("/usr/share/dict/words")
java.lang.NullPointerException
    at org.apache.spark.SparkFiles$.getRootDirectory(SparkFiles.scala:37)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:975)
    at $iwC$$iwC$$iwC$$iwC.(:13)
    at $iwC$$iwC$$iwC.(:18)
    at $iwC$$iwC.(:20)
    at $iwC.(:22)
    at (:24)
    at .(:28)
    at .()
    at .(:7)
    at .()
    at $print()
```
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3884#discussion_r23342851

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -955,6 +993,11 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
        * The variable will be sent to each cluster only once.
        */
       def broadcast[T: ClassTag](value: T): Broadcast[T] = {
--- End diff --

Broadcast, on the other hand, throws an NPE:

```
scala> sc.broadcast(0)
java.lang.NullPointerException
    at org.apache.spark.broadcast.TorrentBroadcast.(TorrentBroadcast.scala:79)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
    at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
    at org.apache.spark.SparkContext.broadcast(SparkContext.scala:951)
    at $iwC$$iwC$$iwC$$iwC.(:13)
    at $iwC$$iwC$$iwC.(:18)
    at $iwC$$iwC.(:20)
    at $iwC.(:22)
    at (:24)
    at .(:28)
    at .()
    at .(:7)
    at .()
    at $print()
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
```
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3884#discussion_r23342825

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -906,8 +936,10 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
        * Create an [[org.apache.spark.Accumulator]] variable of a given type, which tasks can "add"
        * values to using the `+=` method. Only the driver can access the accumulator's `value`.
        */
    -  def accumulator[T](initialValue: T)(implicit param: AccumulatorParam[T]) =
--- End diff --

Same for these accumulator methods, so I'll revert these changes.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3884#discussion_r23342756

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -891,14 +913,22 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
       }

       /** Build the union of a list of RDDs. */
    -  def union[T: ClassTag](rdds: Seq[RDD[T]]): RDD[T] = new UnionRDD(this, rdds)
    +  def union[T: ClassTag](rdds: Seq[RDD[T]]): RDD[T] = {
--- End diff --

Same here; instantiating new UnionRDDs doesn't cause an error if SC is stopped.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3884#discussion_r23342706

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -891,14 +913,22 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
       }

       /** Build the union of a list of RDDs. */
    -  def union[T: ClassTag](rdds: Seq[RDD[T]]): RDD[T] = new UnionRDD(this, rdds)
    +  def union[T: ClassTag](rdds: Seq[RDD[T]]): RDD[T] = {
    +    assertNotStopped()
    +    new UnionRDD(this, rdds)
    +  }

       /** Build the union of a list of RDDs passed as variable-length arguments. */
    -  def union[T: ClassTag](first: RDD[T], rest: RDD[T]*): RDD[T] =
    +  def union[T: ClassTag](first: RDD[T], rest: RDD[T]*): RDD[T] = {
    +    assertNotStopped()
         new UnionRDD(this, Seq(first) ++ rest)
    +  }

       /** Get an RDD that has no partitions or elements. */
    -  def emptyRDD[T: ClassTag] = new EmptyRDD[T](this)
    +  def emptyRDD[T: ClassTag] = {
--- End diff --

This did _not_ cause an exception in 1.2, so I'll remove this `assertNotStopped` call.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3884#discussion_r23342620

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -550,6 +560,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
        * Hadoop-supported file system URI, and return it as an RDD of Strings.
        */
       def textFile(path: String, minPartitions: Int = defaultMinPartitions): RDD[String] = {
    +    assertNotStopped()
--- End diff --

Same for textFile:

```
scala> sc.textFile("/usr/share/dict/words")
java.lang.NullPointerException
    at org.apache.spark.SparkContext.defaultParallelism(SparkContext.scala:1461)
    at org.apache.spark.SparkContext.defaultMinPartitions(SparkContext.scala:1468)
    at org.apache.spark.SparkContext.textFile$default$2(SparkContext.scala:545)
```
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3884#discussion_r23342589

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -526,6 +534,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
        * the argument to avoid this.
        */
       def parallelize[T: ClassTag](seq: Seq[T], numSlices: Int = defaultParallelism): RDD[T] = {
--- End diff --

In 1.2, calling this when SparkContext was stopped would throw a NullPointerException:

```
scala> sc.parallelize(1 to 100)
java.lang.NullPointerException
    at org.apache.spark.SparkContext.defaultParallelism(SparkContext.scala:1461)
    at org.apache.spark.SparkContext.parallelize$default$2(SparkContext.scala:521)
    at $iwC$$iwC$$iwC$$iwC.(:13)
    at $iwC$$iwC$$iwC.(:18)
    at $iwC$$iwC.(:20)
    at $iwC.(:22)
    at (:24)
    at .(:28)
    at .()
    at .(:7)
    at .()
    at $print()
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:873)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:785)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:628)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:636)
    at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:641)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:968)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:916)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1011)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-70910247

@andrewor14 @pwendell @tdas How do you feel about committing this patch, as-is, for 1.2.1? I think it could be a huge support burden reducer / usability improver for many users, since a lot of these issues are really hard to debug. If you'd like, I can grab a copy of branch-1.2 and manually check that all of the `assertNotStopped` methods threw errors (just to make sure that we're not missing any corner-cases where we change behavior for something that used to work).
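The manual check proposed above amounts to running each candidate method against a stopped context and sorting the methods by whether they threw. A minimal sketch of that bookkeeping in plain Scala follows; `StoppedAudit` and `classify` are hypothetical names for this illustration, and the operations are passed as plain closures so the sketch does not depend on Spark.

```scala
import scala.util.Try

object StoppedAudit {
  // Runs each named operation once and splits the names by outcome.
  // Returns (names whose operation threw, names whose operation completed).
  def classify(ops: Seq[(String, () => Any)]): (Seq[String], Seq[String]) = {
    val (failed, passed) = ops.partition { case (_, op) => Try(op()).isFailure }
    (failed.map(_._1), passed.map(_._1))
  }
}
```

Under this scheme, names in the first group already failed on the old branch, so guarding them changes only the error message; names in the second group used to work, so a guard there would change behavior. Note that `Try` catches only non-fatal exceptions, so a fatal error would still propagate out of `classify`.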
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-69684994 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25441/ Test PASSed.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-69684987

[Test build #25441 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25441/consoleFull) for PR 3884 at commit [`8cff41a`](https://github.com/apache/spark/commit/8cff41aa0b2a22573e61e413e972c727eb6782a8).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-69678415

[Test build #25441 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25441/consoleFull) for PR 3884 at commit [`8cff41a`](https://github.com/apache/spark/commit/8cff41aa0b2a22573e61e413e972c727eb6782a8).

* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-69677921 **[Test build #25433 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25433/consoleFull)** for PR 3884 at commit [`6ef68d0`](https://github.com/apache/spark/commit/6ef68d050ca5ef52b25f10537c9e0ac44562ebc0) after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-69677929 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25433/ Test FAILed.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-69664661 [Test build #25433 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25433/consoleFull) for PR 3884 at commit [`6ef68d0`](https://github.com/apache/spark/commit/6ef68d050ca5ef52b25f10537c9e0ac44562ebc0). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-69637285 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25426/ Test FAILed.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-69637280 [Test build #25426 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25426/consoleFull) for PR 3884 at commit [`9f6a0b8`](https://github.com/apache/spark/commit/9f6a0b8d3501b2872e75f1eff0bf1e4b765183e0). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-69637150 [Test build #25426 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25426/consoleFull) for PR 3884 at commit [`9f6a0b8`](https://github.com/apache/spark/commit/9f6a0b8d3501b2872e75f1eff0bf1e4b765183e0). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-69636824 Alright, I've updated this to throw IllegalStateException when methods are called on a stopped SparkContext. I've also added some more helpful error messages to PySpark for when users attempt to misuse SparkContext or RDDs from actions, transformations, or broadcast variables. I plan to merge this into master, then backport a smaller patch which excludes most of the `assertNotStopped()` calls.
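The fail-fast pattern described in this comment can be sketched in plain Scala. This is only an illustration under stated assumptions: `ToyContext`, its `parallelize` method, and the error text are hypothetical stand-ins, not Spark's `SparkContext` or the wording used in the patch. Only the idea of a guard named `assertNotStopped()` that throws `IllegalStateException` comes from the thread.

```scala
// Hypothetical stand-in for a context object; this is NOT Spark's SparkContext.
final class ToyContext {
  @volatile private var stopped = false

  // Fail fast with a descriptive message, rather than letting a later
  // dereference of torn-down state surface as a NullPointerException.
  private def assertNotStopped(): Unit = {
    if (stopped) {
      throw new IllegalStateException("Cannot call methods on a stopped ToyContext")
    }
  }

  def stop(): Unit = { stopped = true }

  // Every public entry point calls the guard first.
  def parallelize(data: Seq[Int]): Seq[Int] = {
    assertNotStopped()
    data
  }
}

object ToyContextDemo {
  // Returns the exception message observed when using a stopped context.
  def messageAfterStop(): String = {
    val ctx = new ToyContext
    ctx.stop()
    try {
      ctx.parallelize(Seq(1, 2, 3))
      "no error"
    } catch {
      case e: IllegalStateException => e.getMessage
    }
  }
}
```

Because each guarded method can now throw where it previously might have appeared to work, adding such guards widely is a behaviour change, which may be why the comment proposes leaving most of them out of the backport.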
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-69081096 Maybe IllegalStateException?
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-69080412 Any opinions on the `assertNotStopped()` checks here? I'd like to backport this patch to other branches since I think it's a huge usability improvement. If there are any changes here that you think might break user programs that used to work, then I'll remove them and re-add them in a separate PR. (Note: I still need to do the PySpark half of the "nested RDDs" and "nested actions" checks)
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-68946424 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25114/ Test PASSed.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-68946403 [Test build #25114 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25114/consoleFull) for PR 3884 at commit [`b39e041`](https://github.com/apache/spark/commit/b39e04172d46b036c467b1650f7c27f799bfdfc0). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-68936082 [Test build #25114 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25114/consoleFull) for PR 3884 at commit [`b39e041`](https://github.com/apache/spark/commit/b39e04172d46b036c467b1650f7c27f799bfdfc0). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-68935347 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-68820890 [Test build #25085 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25085/consoleFull) for PR 3884 at commit [`b39e041`](https://github.com/apache/spark/commit/b39e04172d46b036c467b1650f7c27f799bfdfc0). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-68820897 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25085/ Test FAILed.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-68820376 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25084/ Test FAILed.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-68820371 [Test build #25084 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25084/consoleFull) for PR 3884 at commit [`99cc09f`](https://github.com/apache/spark/commit/99cc09f6996706f5d067d878d486d8f5dc2c31f7). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-68817012 [Test build #25085 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25085/consoleFull) for PR 3884 at commit [`b39e041`](https://github.com/apache/spark/commit/b39e04172d46b036c467b1650f7c27f799bfdfc0). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-68816630 I've added some additional tests to prevent users from calling methods on a stopped SparkContext, since this usually resulted in confusing NullPointerExceptions.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-68816618 [Test build #25084 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25084/consoleFull) for PR 3884 at commit [`99cc09f`](https://github.com/apache/spark/commit/99cc09f6996706f5d067d878d486d8f5dc2c31f7). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/3884#discussion_r22448625

    --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
    @@ -76,10 +76,22 @@ import org.apache.spark.util.random.{BernoulliSampler, PoissonSampler, Bernoulli
      * on RDD internals.
      */
     abstract class RDD[T: ClassTag](
    -    @transient private var sc: SparkContext,
    +    @transient private var _sc: SparkContext,
         @transient private var deps: Seq[Dependency[_]]
       ) extends Serializable with Logging {
     
    +  if (classOf[RDD[_]].isAssignableFrom(elementClassTag.runtimeClass)) {
    +    throw new SparkException("Spark does not support nested RDDs (see SPARK-5063)")
    +  }
    +
    +  private def sc: SparkContext = {
    +    if (_sc == null) {
    +      throw new SparkException(
    +        "Can only define RDDs and perform actions on the driver, not in tasks (see SPARK-5063)")
    --- End diff --

Looks good!
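The first check in the diff above (rejecting an element type that is itself an RDD) relies on comparing the runtime class carried by a `ClassTag`. A minimal sketch of that technique in plain Scala follows; `ToyDataset` and its error message are hypothetical and exist only for illustration, and the sketch uses `classTag[T]` directly where the diff goes through an `elementClassTag` member.

```scala
import scala.reflect.{ClassTag, classTag}

// Hypothetical toy container, used only to illustrate the check.
class ToyDataset[T: ClassTag](val elements: Seq[T]) {
  // Reject element types that are themselves ToyDataset (or a subclass of it).
  if (classOf[ToyDataset[_]].isAssignableFrom(classTag[T].runtimeClass)) {
    throw new IllegalArgumentException("ToyDataset does not support nested ToyDatasets")
  }
}

object NestedCheckDemo {
  // True if constructing a ToyDataset[ToyDataset[Int]] is rejected.
  def nestedIsRejected(): Boolean = {
    val inner = new ToyDataset[Int](Seq(1, 2, 3))
    try {
      new ToyDataset[ToyDataset[Int]](Seq(inner))
      false
    } catch {
      case _: IllegalArgumentException => true
    }
  }
}
```

Note that a check of this kind only sees the static element type captured by the `ClassTag`: in this toy, a `ToyDataset[Any]` that happens to hold other datasets would not be caught.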
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3884#discussion_r22447994

    --- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala ---
    @@ -897,4 +897,23 @@ class RDDSuite extends FunSuite with SharedSparkContext {
           mutableDependencies += dep
         }
       }
    +
    +  test("Nested RDDs are not supported (SPARK-5063)") {
    --- End diff --

A quick `git grep` suggests that every suite uses its own style and that there's not an obvious dominant style. I'll just change these tests to the lowercase convention to match RDDSuite, but leave the BroadcastSuite ones as-is.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3884#discussion_r22447976

    --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
    @@ -76,10 +76,22 @@ import org.apache.spark.util.random.{BernoulliSampler, PoissonSampler, Bernoulli
      * on RDD internals.
      */
     abstract class RDD[T: ClassTag](
    -    @transient private var sc: SparkContext,
    +    @transient private var _sc: SparkContext,
         @transient private var deps: Seq[Dependency[_]]
       ) extends Serializable with Logging {
     
    +  if (classOf[RDD[_]].isAssignableFrom(elementClassTag.runtimeClass)) {
    +    throw new SparkException("Spark does not support nested RDDs (see SPARK-5063)")
    +  }
    +
    +  private def sc: SparkContext = {
    +    if (_sc == null) {
    +      throw new SparkException(
    +        "Can only define RDDs and perform actions on the driver, not in tasks (see SPARK-5063)")
    --- End diff --

Sure. How about this:

> RDD transformations and actions can only be invoked by the driver, not inside of other transformations; for example, `rdd1.map(x => rdd2.values.count() * x)` is invalid because the `values` transformation and `count` action cannot be performed inside of the `rdd1.map` transformation. For more information, see SPARK-5063.

Kind of verbose, but I think an example might be the clearest way to explain this, esp. to someone unfamiliar with the terminology. It might be nice to keep the JIRA reference since it will make the exception easier to search for (I'm kind of inspired by React.js's error messages, which include URL-shortened links to the documentation).
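For readers who hit this error, the usual way to restructure the invalid example is to run the inner action on the driver first and capture only its plain result in the closure. The sketch below assumes a Spark dependency on the classpath and a local master; the object name, sample data, and app name are made up for illustration, and the `values` step from the quoted example is dropped to keep the sketch short.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DriverSideCount {
  // Computes rdd2's count on the driver, then uses the plain Long inside map().
  def scaleByCount(sc: SparkContext): Array[Long] = {
    val rdd1 = sc.parallelize(Seq(1L, 2L, 3L))
    val rdd2 = sc.parallelize(Seq("a", "b"))

    // Invalid: rdd1.map(x => rdd2.count() * x) would invoke an action
    // on rdd2 from inside the rdd1.map transformation.
    val n = rdd2.count() // the action runs here, on the driver
    rdd1.map(x => n * x).collect()
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are arbitrary choices for this sketch.
    val conf = new SparkConf().setMaster("local[2]").setAppName("driver-side-count")
    val sc = new SparkContext(conf)
    try println(scaleByCount(sc).mkString(", "))
    finally sc.stop()
  }
}
```

The closure passed to `map` now captures only `n`, an ordinary serializable value, so no RDD or SparkContext reference needs to reach the tasks.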
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3884#discussion_r22447845

    --- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala ---
    @@ -897,4 +897,23 @@ class RDDSuite extends FunSuite with SharedSparkContext {
           mutableDependencies += dep
         }
       }
    +
    +  test("Nested RDDs are not supported (SPARK-5063)") {
    --- End diff --

It varies from suite-to-suite; most start with lowercase because they start with method names. If you look at BroadcastSuite, though, most use uppercase style like I've done here.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/3884#discussion_r22447826

    --- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala ---
    @@ -897,4 +897,23 @@ class RDDSuite extends FunSuite with SharedSparkContext {
           mutableDependencies += dep
         }
       }
    +
    +  test("Nested RDDs are not supported (SPARK-5063)") {
    --- End diff --

A nitpick: I don't think we have a standard, but so far test case names start with lowercase.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/3884#discussion_r22447822

    --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
    @@ -76,10 +76,22 @@ import org.apache.spark.util.random.{BernoulliSampler, PoissonSampler, Bernoulli
      * on RDD internals.
      */
     abstract class RDD[T: ClassTag](
    -    @transient private var sc: SparkContext,
    +    @transient private var _sc: SparkContext,
         @transient private var deps: Seq[Dependency[_]]
       ) extends Serializable with Logging {
     
    +  if (classOf[RDD[_]].isAssignableFrom(elementClassTag.runtimeClass)) {
    +    throw new SparkException("Spark does not support nested RDDs (see SPARK-5063)")
    +  }
    +
    +  private def sc: SparkContext = {
    +    if (_sc == null) {
    +      throw new SparkException(
    +        "Can only define RDDs and perform actions on the driver, not in tasks (see SPARK-5063)")
    --- End diff --

Pointing to a JIRA ticket might not be the most user-friendly approach. Maybe make it more verbose and explain it in one or two lines?
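The second half of the diff under discussion turns a `@transient` field into a guarded accessor, so that a reference lost during serialization produces an explanatory exception instead of a bare NullPointerException. The plain-Scala sketch below reproduces that mechanism with Java serialization; `ToyDriverContext`, `ToyHandle`, and the message text are hypothetical, and the round trip only approximates what shipping an object to a task does.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.io.{ObjectInputStream, ObjectOutputStream, ObjectStreamClass}

// Hypothetical driver-only resource; deliberately not Serializable.
class ToyDriverContext(val name: String)

// Hypothetical handle that is serialized without its context.
class ToyHandle(@transient private var _ctx: ToyDriverContext) extends Serializable {
  // Guarded accessor: explains the failure instead of handing back null.
  def ctx: ToyDriverContext = {
    if (_ctx == null) {
      throw new IllegalStateException(
        "ToyHandle can only be used where it was created, not after being shipped to a task")
    }
    _ctx
  }
}

object TransientGuardDemo {
  // Round-trips a value through Java serialization.
  def roundTrip[A <: AnyRef](value: A): A = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray)) {
      // Resolve classes via the value's own class loader first, so the sketch
      // also works when launched from a REPL or a build-tool class loader.
      override def resolveClass(desc: ObjectStreamClass): Class[_] =
        try Class.forName(desc.getName, false, value.getClass.getClassLoader)
        catch { case _: ClassNotFoundException => super.resolveClass(desc) }
    }
    try in.readObject().asInstanceOf[A] finally in.close()
  }

  // Returns the message seen when the deserialized handle touches its context.
  def messageAfterShipping(): String = {
    val local = new ToyHandle(new ToyDriverContext("driver"))
    require(local.ctx.name == "driver") // the accessor works before serialization
    val shipped = roundTrip(local)
    try {
      shipped.ctx
      "no error"
    } catch {
      case e: IllegalStateException => e.getMessage
    }
  }
}
```

The transient field is skipped during serialization and comes back as `null`, which is exactly the state the accessor's null check detects.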
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-68668004 Haha, the `org.apache.spark.broadcast.BroadcastSuite.Using broadcast after destroy prints callsite` test actually broadcasts an RDD (which is invalid), which is what caused that test failure. I'll fix this up in my next commit.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-68650048 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25036/ Test FAILed.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-68650044 [Test build #25036 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25036/consoleFull) for PR 3884 at commit [`57cc8a1`](https://github.com/apache/spark/commit/57cc8a11266770e145c7ca810bec3b95aeefabb3). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-68647672 @sryza Good idea; I've added a new check which prevents RDDs from being broadcast directly. I should probably add these checks to PySpark, too. I'm not actually sure what happens if you try to do these invalid things in PySpark, so I should probably try them first and add their errors / stacktraces to the JIRA so that it's easier for me / the support team to pattern-match to this issue.
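As a hedged illustration of the mistake discussed here and its usual remedy: instead of passing an RDD to `broadcast`, collect its contents on the driver and broadcast the resulting local collection. The sketch assumes a Spark dependency on the classpath; the object name, method name, and sample data are invented for this example.

```scala
import org.apache.spark.SparkContext

object BroadcastCollected {
  // Looks up each key in a small table that is broadcast as a local Map.
  def lookupValues(sc: SparkContext): Array[Int] = {
    val small = sc.parallelize(Seq("a" -> 1, "b" -> 2))

    // Invalid: sc.broadcast(small) would broadcast the RDD handle, not its data.
    val table = sc.broadcast(small.collect().toMap) // collect on the driver first

    sc.parallelize(Seq("a", "b", "c"))
      .map(k => table.value.getOrElse(k, 0))
      .collect()
  }
}
```

This only makes sense when the collected data is small enough to fit in memory on the driver and on each executor.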
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-68647489 [Test build #25036 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25036/consoleFull) for PR 3884 at commit [`57cc8a1`](https://github.com/apache/spark/commit/57cc8a11266770e145c7ca810bec3b95aeefabb3). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-68646490 Will this work for broadcast variables as well? One thing I often see is users trying to directly broadcast an RDD without collecting it.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-68584489 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25005/ Test PASSed.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-68584486 [Test build #25005 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25005/consoleFull) for PR 3884 at commit [`15b2e6b`](https://github.com/apache/spark/commit/15b2e6b38d9587790357182abe7918853688722e). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-68582760 [Test build #25005 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25005/consoleFull) for PR 3884 at commit [`15b2e6b`](https://github.com/apache/spark/commit/15b2e6b38d9587790357182abe7918853688722e). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/3884 [SPARK-5063] Useful error messages for nested RDDs and actions inside of transformations This patch adds more helpful error messages for invalid programs that define nested RDDs or perform actions inside of transformations (e.g. calling `count()` from inside of `map()`). Currently, these invalid programs lead to confusing NullPointerExceptions at runtime and have been a major source of questions on the mailing list and StackOverflow. You can merge this pull request into a Git repository by running: $ git pull https://github.com/JoshRosen/spark SPARK-5063 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3884.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3884 commit 15b2e6b38d9587790357182abe7918853688722e Author: Josh Rosen Date: 2015-01-03T04:14:27Z [SPARK-5063] Useful error messages for nested RDDs and actions inside of transformations
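For readers unfamiliar with the failure mode the pull request describes, the following is a minimal, dependency-free sketch of the idea, not Spark's actual implementation. `ToyRDD`, `as_seen_by_task`, `_check_driver_side`, and the wording of the error message are all hypothetical names invented for illustration; the sketch only assumes that an RDD referenced from inside a task no longer has a usable driver-side context, and shows how an explicit check can replace an opaque null-reference failure with an explanatory error.

```python
class ToyRDD:
    """Toy stand-in for an RDD. Hypothetical; this is not Spark's API."""

    def __init__(self, context, data):
        # context=None models an RDD that was captured by a closure and
        # shipped to a task, where the driver-side handle is unavailable.
        self._context = context
        self._data = list(data)

    def _check_driver_side(self):
        # Fail early with an explanation instead of a null-reference error.
        # The message text is invented for this sketch.
        if self._context is None:
            raise RuntimeError(
                "This RDD is being used inside a transformation or action "
                "of another RDD; RDD operations can only be invoked on the "
                "driver. See SPARK-5063."
            )

    def as_seen_by_task(self):
        """Return a copy without a context, as a task would see it."""
        return ToyRDD(None, self._data)

    def count(self):
        self._check_driver_side()
        return len(self._data)

    def map(self, fn):
        # Evaluated eagerly here for simplicity; real transformations are lazy.
        self._check_driver_side()
        return ToyRDD(self._context, [fn(x) for x in self._data])


driver_context = object()  # placeholder for a driver-side context
rdd1 = ToyRDD(driver_context, [1, 2, 3])
rdd2 = ToyRDD(driver_context, ["a", "b"])

# Invalid pattern: count() is invoked from inside map().
captured = rdd2.as_seen_by_task()
try:
    rdd1.map(lambda x: x * captured.count())
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

# Valid pattern: run the action on the driver, then close over the plain value.
n = rdd2.count()
scaled = rdd1.map(lambda x: x * n)
```

In this toy model the fix for the invalid program is the same one usually recommended for real jobs: compute the inner result on the driver first and let the closure capture only the resulting plain value.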