[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-70951044
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25928/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-70951031
  
  [Test build #25928 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25928/consoleFull) for PR 3884 at commit [`2d0d7f7`](https://github.com/apache/spark/commit/2d0d7f78535a193e96309c81b3a6a5fded71fe48).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-70943337
  
  [Test build #25928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25928/consoleFull) for PR 3884 at commit [`2d0d7f7`](https://github.com/apache/spark/commit/2d0d7f78535a193e96309c81b3a6a5fded71fe48).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-21 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-70943192
  
I audited the uses of `assertNotStopped` and removed a bunch of calls in 
methods that sometimes didn't throw exceptions on Spark 1.2.0.  Pending 
Jenkins, I'm planning to commit this slightly smaller patch to branch-1.2 for 
inclusion in 1.2.1.
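
The guard being audited here can be pictured with a minimal sketch. This is not Spark's implementation: `MiniContext`, its `broadcast` placeholder, and the exception type and message are assumptions chosen for illustration.

```scala
// Hypothetical stand-in for SparkContext; not Spark's actual implementation.
class MiniContext {
  @volatile private var stopped = false

  def stop(): Unit = { stopped = true }

  // Fail fast with a descriptive message instead of letting a later
  // null dereference surface as a NullPointerException.
  // The exception type and message are assumptions for this sketch.
  private def assertNotStopped(): Unit = {
    if (stopped) {
      throw new IllegalStateException("Cannot call methods on a stopped MiniContext")
    }
  }

  // A guarded method: the check runs before any real work.
  def broadcast[T](value: T): T = {
    assertNotStopped()
    value // placeholder for the real work
  }
}
```

Calling `broadcast` before `stop()` returns normally; calling it afterwards raises the descriptive exception from the guard.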





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-21 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3884#discussion_r23343271
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1466,17 +1531,29 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
     }
   }
 
-  def getCheckpointDir = checkpointDir
+  def getCheckpointDir = {
+    assertNotStopped()
+    checkpointDir
+  }
 
   /** Default level of parallelism to use when not given by user (e.g. parallelize and makeRDD). */
-  def defaultParallelism: Int = taskScheduler.defaultParallelism
+  def defaultParallelism: Int = {
--- End diff --

This throws an exception because `taskScheduler` is null:

```
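scala> sc.defaultParallelism
java.lang.NullPointerException
    at org.apache.spark.SparkContext.defaultParallelism(SparkContext.scala:1461)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:13)
    at $iwC$$iwC$$iwC.<init>(<console>:18)
    at $iwC$$iwC.<init>(<console>:20)
    at $iwC.<init>(<console>:22)
    at <init>(<console>:24)
    at .<init>(<console>:28)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)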
```
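
The failure mode in this trace can be reproduced in miniature. The following is a minimal sketch, not Spark's code: `FakeScheduler`, `UnguardedContext`, and `UnguardedDemo` are hypothetical names, and the sketch assumes that stopping a context nulls out its scheduler reference, as the comment indicates for `taskScheduler`.

```scala
// Hypothetical stand-ins; class and field names are illustrative only.
class FakeScheduler {
  def defaultParallelism: Int = 8
}

class UnguardedContext {
  private var taskScheduler: FakeScheduler = new FakeScheduler

  // Assumption: stopping tears down internal state by nulling the reference.
  def stop(): Unit = { taskScheduler = null }

  // No guard: after stop() this dereferences null.
  def defaultParallelism: Int = taskScheduler.defaultParallelism
}

object UnguardedDemo {
  // Returns the name of the exception a caller sees after stop().
  def describeFailure(): String = {
    val ctx = new UnguardedContext
    ctx.stop()
    try {
      ctx.defaultParallelism.toString
    } catch {
      case _: NullPointerException => "NullPointerException"
    }
  }
}
```

The caller gets a bare `NullPointerException` with no hint that the context was stopped, which is the usability problem a descriptive check is meant to address.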





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-21 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3884#discussion_r23343194
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1199,6 +1260,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    */
   @deprecated("adding jars no longer creates local copies that need to be deleted", "1.0.0")
   def clearJars() {
+    assertNotStopped()
--- End diff --

I'll revert this.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-21 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3884#discussion_r23343236
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1458,6 +1522,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    * be a HDFS path if running on a cluster.
    */
   def setCheckpointDir(directory: String) {
+    assertNotStopped()
--- End diff --

This actually works, so I'm removing this check.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-21 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3884#discussion_r23343173
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1146,6 +1206,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    * filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
    */
   def addJar(path: String) {
+    assertNotStopped()
--- End diff --

This is another one of those toss-up cases: it works some of the time, so 
I'll remove this check for conservatism's sake.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-21 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3884#discussion_r23343083
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -,6 +1170,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    */
   @deprecated("adding files no longer creates local copies that need to be deleted", "1.0.0")
   def clearFiles() {
+    assertNotStopped()
--- End diff --

I'll remove this.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-21 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3884#discussion_r23343038
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1068,7 +1120,10 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    * Returns an immutable map of RDDs that have marked themselves as persistent via cache() call.
    * Note that this does not necessarily mean the caching or computation was successful.
    */
-  def getPersistentRDDs: Map[Int, RDD[_]] = persistentRdds.toMap
--- End diff --

This is safe, so I'll revert this error-checking.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-21 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3884#discussion_r23343023
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1059,6 +1110,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    */
   @DeveloperApi
   def getRDDStorageInfo: Array[RDDInfo] = {
+    assertNotStopped()
--- End diff --

Same here:

```
scala> sc.getRDDStorageInfo
org.apache.spark.SparkException: Error sending message as actor is null [message = GetStorageStatus]
    at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:178)
    at org.apache.spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:221)
    at org.apache.spark.storage.BlockManagerMaster.getStorageStatus(BlockManagerMaster.scala:152)
    at org.apache.spark.SparkContext.getExecutorStorageStatus(SparkContext.scala:1068)
    at org.apache.spark.SparkContext.getRDDStorageInfo(SparkContext.scala:1052)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:13)
    at $iwC$$iwC$$iwC.<init>(<console>:18)
    at $iwC$$iwC.<init>(<console>:20)
    at $iwC.<init>(<console>:22)
    at <init>(<console>:24)
    at .<init>(<console>:28)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(N
```





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-21 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3884#discussion_r23343005
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1047,6 +1097,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    * memory available for caching.
    */
   def getExecutorMemoryStatus: Map[String, (Long, Long)] = {
--- End diff --

This throws an error, so I'll keep it:

```
scala> sc.getExecutorMemoryStatus
org.apache.spark.SparkException: Error sending message as actor is null [message = GetMemoryStatus]
    at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:178)
    at org.apache.spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:221)
    at org.apache.spark.storage.BlockManagerMaster.getMemoryStatus(BlockManagerMaster.scala:148)
    at org.apache.spark.SparkContext.getExecutorMemoryStatus(SparkContext.scala:1039)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:13)
    at $iwC$$iwC$$iwC.<init>(<console>:18)
    at $iwC$$iwC.<init>(<console>:20)
    at $iwC.<init>(<console>:22)
    at <init>(<console>:24)
    at .<init>(<console>:28)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
```





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-21 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3884#discussion_r23342970
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1002,6 +1047,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    */
   @DeveloperApi
   override def requestExecutors(numAdditionalExecutors: Int): Boolean = {
+    assertNotStopped()
--- End diff --

This is a toss-up, since I'd expect this to throw some sort of error after the 
scheduler backend is stopped, but there are many cases where it's a no-op and 
doesn't throw an error.  Therefore, I'll remove this, too.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-21 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3884#discussion_r23342907
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -992,6 +1036,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    */
   @DeveloperApi
   def addSparkListener(listener: SparkListener) {
+    assertNotStopped()
--- End diff --

`addSparkListener` technically works, so I'll remove this, too.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-21 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3884#discussion_r23342885
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -969,6 +1012,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    * use `SparkFiles.get(fileName)` to find its download location.
    */
   def addFile(path: String) {
--- End diff --

This throws an NPE:

```
scala> sc.addFile("/usr/share/dict/words")
java.lang.NullPointerException
    at org.apache.spark.SparkFiles$.getRootDirectory(SparkFiles.scala:37)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:975)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:13)
    at $iwC$$iwC$$iwC.<init>(<console>:18)
    at $iwC$$iwC.<init>(<console>:20)
    at $iwC.<init>(<console>:22)
    at <init>(<console>:24)
    at .<init>(<console>:28)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
```





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-21 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3884#discussion_r23342851
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -955,6 +993,11 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    * The variable will be sent to each cluster only once.
    */
   def broadcast[T: ClassTag](value: T): Broadcast[T] = {
--- End diff --

Broadcast, on the other hand, throws an NPE:

```
scala> sc.broadcast(0)
java.lang.NullPointerException
    at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:79)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
    at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
    at org.apache.spark.SparkContext.broadcast(SparkContext.scala:951)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:13)
    at $iwC$$iwC$$iwC.<init>(<console>:18)
    at $iwC$$iwC.<init>(<console>:20)
    at $iwC.<init>(<console>:22)
    at <init>(<console>:24)
    at .<init>(<console>:28)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
```





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-21 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3884#discussion_r23342825
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -906,8 +936,10 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    * Create an [[org.apache.spark.Accumulator]] variable of a given type, which tasks can "add"
    * values to using the `+=` method. Only the driver can access the accumulator's `value`.
    */
-  def accumulator[T](initialValue: T)(implicit param: AccumulatorParam[T]) =
--- End diff --

Same for these accumulator methods, so I'll revert these changes.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-21 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3884#discussion_r23342756
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -891,14 +913,22 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
   }
 
   /** Build the union of a list of RDDs. */
-  def union[T: ClassTag](rdds: Seq[RDD[T]]): RDD[T] = new UnionRDD(this, rdds)
+  def union[T: ClassTag](rdds: Seq[RDD[T]]): RDD[T] = {
--- End diff --

Same here; instantiating new UnionRDDs doesn't cause an error if the 
SparkContext is stopped.
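
One plausible reading of why construction succeeds on a stopped context is that building the object only records a reference, and nothing touches the context's resources until a job runs. The sketch below illustrates that reading only; `TinyContext` and `TinyEmptyDataset` are hypothetical names, not Spark classes, and the thread itself does not confirm this explanation.

```scala
// Hypothetical stand-ins; not Spark classes.
class TinyContext {
  private var running = true

  def stop(): Unit = { running = false }

  // Only running a job needs the context to be alive.
  def runJob(): Int = {
    if (!running) throw new IllegalStateException("context is stopped")
    0
  }
}

// Construction only stores a reference; no context resources are touched.
class TinyEmptyDataset(ctx: TinyContext) {
  def count(): Int = ctx.runJob()
}
```

Under this assumption, `new TinyEmptyDataset(ctx)` works after `ctx.stop()` and only `count()` fails, so a new check in the constructor path would change behavior for code that used to work.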





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-21 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3884#discussion_r23342706
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -891,14 +913,22 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
   }
 
   /** Build the union of a list of RDDs. */
-  def union[T: ClassTag](rdds: Seq[RDD[T]]): RDD[T] = new UnionRDD(this, rdds)
+  def union[T: ClassTag](rdds: Seq[RDD[T]]): RDD[T] = {
+    assertNotStopped()
+    new UnionRDD(this, rdds)
+  }
 
   /** Build the union of a list of RDDs passed as variable-length arguments. */
-  def union[T: ClassTag](first: RDD[T], rest: RDD[T]*): RDD[T] =
+  def union[T: ClassTag](first: RDD[T], rest: RDD[T]*): RDD[T] = {
+    assertNotStopped()
     new UnionRDD(this, Seq(first) ++ rest)
+  }
 
   /** Get an RDD that has no partitions or elements. */
-  def emptyRDD[T: ClassTag] = new EmptyRDD[T](this)
+  def emptyRDD[T: ClassTag] = {
--- End diff --

This did _not_ cause an exception in 1.2, so I'll remove this 
`assertNotStopped` call.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-21 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3884#discussion_r23342620
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -550,6 +560,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    * Hadoop-supported file system URI, and return it as an RDD of Strings.
    */
   def textFile(path: String, minPartitions: Int = defaultMinPartitions): RDD[String] = {
+    assertNotStopped()
--- End diff --

Same for textFile:

```
scala> sc.textFile("/usr/share/dict/words")
java.lang.NullPointerException
    at org.apache.spark.SparkContext.defaultParallelism(SparkContext.scala:1461)
    at org.apache.spark.SparkContext.defaultMinPartitions(SparkContext.scala:1468)
    at org.apache.spark.SparkContext.textFile$default$2(SparkContext.scala:545)
```
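
The `textFile$default$2` frame in this trace is the compiler-generated accessor for the default value of the second parameter. Scala evaluates default arguments at the call site, before the method body starts, so the exception here is raised before any statement inside `textFile` runs. The sketch below demonstrates only that ordering; `DefaultArgDemo` and its members are hypothetical names that mirror the shape of the signature in the diff.

```scala
object DefaultArgDemo {
  // Records the order in which things run.
  val events = scala.collection.mutable.ArrayBuffer.empty[String]

  // Stand-in for a default that depends on context state.
  def defaultMinPartitions: Int = {
    events += "default argument evaluated"
    2
  }

  def textFile(path: String, minPartitions: Int = defaultMinPartitions): String = {
    events += "method body entered"
    s"$path with $minPartitions partitions"
  }
}
```

Calling `DefaultArgDemo.textFile("words")` records the default-argument event first and the method-body event second. This suggests that a check at the top of a method body cannot intercept a failure that happens while its default argument is computed, so the method that supplies the default would need its own check.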





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-21 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3884#discussion_r23342589
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -526,6 +534,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    * the argument to avoid this.
    */
   def parallelize[T: ClassTag](seq: Seq[T], numSlices: Int = defaultParallelism): RDD[T] = {
--- End diff --

In 1.2, calling this when SparkContext was stopped would throw a 
NullPointerException:

```
scala> sc.parallelize(1 to 100)
java.lang.NullPointerException
    at org.apache.spark.SparkContext.defaultParallelism(SparkContext.scala:1461)
    at org.apache.spark.SparkContext.parallelize$default$2(SparkContext.scala:521)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:13)
    at $iwC$$iwC$$iwC.<init>(<console>:18)
    at $iwC$$iwC.<init>(<console>:20)
    at $iwC.<init>(<console>:22)
    at <init>(<console>:24)
    at .<init>(<console>:28)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:873)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:785)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:628)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:636)
    at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:641)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:968)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:916)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1011)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```






[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-21 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-70910247
  
@andrewor14 @pwendell @tdas How do you feel about committing this patch, 
as-is, for 1.2.1?  I think it could be a huge support burden reducer / 
usability improver for many users, since a lot of these issues are really hard 
to debug.

If you'd like, I can grab a copy of branch-1.2 and manually check that all 
of the methods that now call `assertNotStopped` already threw errors there 
(just to make sure that we're not missing any corner cases where we change 
behavior for something that used to work).
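
The manual check proposed here could also be scripted. The following is a minimal sketch under stated assumptions: `StoppedContextAudit` and its `audit` helper are hypothetical, and the calls passed in would be whatever methods are under review, invoked on a context that has already been stopped.

```scala
import scala.util.{Failure, Success, Try}

object StoppedContextAudit {
  // Runs each named call once and reports whether it threw a non-fatal exception.
  // A method that did not throw before the patch is one where a new check
  // would change existing behavior.
  def audit(calls: Seq[(String, () => Any)]): Map[String, Boolean] =
    calls.map { case (name, call) =>
      val threw = Try(call()) match {
        case Failure(_) => true
        case Success(_) => false
      }
      name -> threw
    }.toMap
}
```

Each entry maps a method name to `true` if the call threw and `false` if it completed. Methods that map to `false` on the old branch are the ones where adding a check would break something that used to work.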





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-69684994
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25441/





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-69684987
  
  [Test build #25441 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25441/consoleFull) for PR 3884 at commit [`8cff41a`](https://github.com/apache/spark/commit/8cff41aa0b2a22573e61e413e972c727eb6782a8).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-69678415
  
  [Test build #25441 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25441/consoleFull)
 for   PR 3884 at commit 
[`8cff41a`](https://github.com/apache/spark/commit/8cff41aa0b2a22573e61e413e972c727eb6782a8).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-69677921
  
**[Test build #25433 timed 
out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25433/consoleFull)**
 for PR 3884 at commit 
[`6ef68d0`](https://github.com/apache/spark/commit/6ef68d050ca5ef52b25f10537c9e0ac44562ebc0)
 after a configured wait of `120m`.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-69677929
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25433/
Test FAILed.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-69664661
  
  [Test build #25433 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25433/consoleFull)
 for   PR 3884 at commit 
[`6ef68d0`](https://github.com/apache/spark/commit/6ef68d050ca5ef52b25f10537c9e0ac44562ebc0).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-69637285
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25426/
Test FAILed.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-69637280
  
  [Test build #25426 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25426/consoleFull)
 for   PR 3884 at commit 
[`9f6a0b8`](https://github.com/apache/spark/commit/9f6a0b8d3501b2872e75f1eff0bf1e4b765183e0).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-69637150
  
  [Test build #25426 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25426/consoleFull)
 for   PR 3884 at commit 
[`9f6a0b8`](https://github.com/apache/spark/commit/9f6a0b8d3501b2872e75f1eff0bf1e4b765183e0).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-12 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-69636824
  
Alright, I've updated this to throw IllegalStateException when methods are 
called on a stopped SparkContext.  I've also added some more helpful error 
messages to PySpark when users attempt to misuse SparkContext or RDDs from 
within actions, transformations, or broadcast variables.

I plan to merge this into master, then backport a smaller patch which 
excludes most of the `assertNotStopped()` calls.
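
As a rough illustration of the fail-fast pattern described above, here is a minimal sketch. `MiniContext` and its methods are hypothetical stand-ins introduced only for this example; this is not Spark's actual `SparkContext` code, and the error message text is an assumption.

```scala
// Hypothetical stand-in for a context object; not Spark's actual SparkContext.
final class MiniContext {
  @volatile private var stopped = false

  // Fail fast with a descriptive error instead of a confusing
  // NullPointerException somewhere deeper in the call chain.
  private def assertNotStopped(): Unit = {
    if (stopped) {
      throw new IllegalStateException("Cannot call methods on a stopped MiniContext")
    }
  }

  // Every public entry point checks the state before doing any work.
  def parallelize(data: Seq[Int]): Seq[Int] = {
    assertNotStopped()
    data
  }

  def stop(): Unit = { stopped = true }
}
```

With a guard like this, calling `parallelize` after `stop()` raises an `IllegalStateException` immediately, which is presumably the behavior the `assertNotStopped()` calls are meant to provide.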





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-07 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-69081096
  
Maybe IllegalStateException?





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-07 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-69080412
  
Any opinions on the `assertNotStopped()` checks here?  I'd like to backport 
this patch to other branches since I think it's a huge usability improvement.  
If there are any changes here that you think might break user programs that 
used to work, then I'll remove them and re-add them in a separate PR.

(Note: I still need to do the PySpark half of the "nested RDDs" and "nested 
actions" checks)





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-68946424
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25114/
Test PASSed.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-68946403
  
  [Test build #25114 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25114/consoleFull)
 for   PR 3884 at commit 
[`b39e041`](https://github.com/apache/spark/commit/b39e04172d46b036c467b1650f7c27f799bfdfc0).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-68936082
  
  [Test build #25114 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25114/consoleFull)
 for   PR 3884 at commit 
[`b39e041`](https://github.com/apache/spark/commit/b39e04172d46b036c467b1650f7c27f799bfdfc0).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-06 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-68935347
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-68820890
  
  [Test build #25085 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25085/consoleFull)
 for   PR 3884 at commit 
[`b39e041`](https://github.com/apache/spark/commit/b39e04172d46b036c467b1650f7c27f799bfdfc0).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-68820897
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25085/
Test FAILed.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-68820376
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25084/
Test FAILed.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-68820371
  
  [Test build #25084 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25084/consoleFull)
 for   PR 3884 at commit 
[`99cc09f`](https://github.com/apache/spark/commit/99cc09f6996706f5d067d878d486d8f5dc2c31f7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-68817012
  
  [Test build #25085 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25085/consoleFull)
 for   PR 3884 at commit 
[`b39e041`](https://github.com/apache/spark/commit/b39e04172d46b036c467b1650f7c27f799bfdfc0).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-05 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-68816630
  
I've added some additional tests to prevent users from calling methods on a 
stopped SparkContext, since this usually resulted in confusing 
NullPointerExceptions.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-68816618
  
  [Test build #25084 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25084/consoleFull)
 for   PR 3884 at commit 
[`99cc09f`](https://github.com/apache/spark/commit/99cc09f6996706f5d067d878d486d8f5dc2c31f7).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-04 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3884#discussion_r22448625
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -76,10 +76,22 @@ import org.apache.spark.util.random.{BernoulliSampler, PoissonSampler, Bernoulli
  * on RDD internals.
  */
 abstract class RDD[T: ClassTag](
-    @transient private var sc: SparkContext,
+    @transient private var _sc: SparkContext,
     @transient private var deps: Seq[Dependency[_]]
   ) extends Serializable with Logging {
 
+  if (classOf[RDD[_]].isAssignableFrom(elementClassTag.runtimeClass)) {
+    throw new SparkException("Spark does not support nested RDDs (see SPARK-5063)")
+  }
+
+  private def sc: SparkContext = {
+    if (_sc == null) {
+      throw new SparkException(
+        "Can only define RDDs and perform actions on the driver, not in tasks (see SPARK-5063)")
--- End diff --

Looks good!





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-04 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3884#discussion_r22447994
  
--- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala ---
@@ -897,4 +897,23 @@ class RDDSuite extends FunSuite with SharedSparkContext {
       mutableDependencies += dep
     }
   }
+
+  test("Nested RDDs are not supported (SPARK-5063)") {
--- End diff --

A quick `git grep` suggests that every suite uses its own style and that 
there's not an obvious dominant style.  I'll just change these tests to the 
lowercase convention to match RDDSuite, but leave the BroadcastSuite ones as-is.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-04 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3884#discussion_r22447976
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -76,10 +76,22 @@ import org.apache.spark.util.random.{BernoulliSampler, PoissonSampler, Bernoulli
  * on RDD internals.
  */
 abstract class RDD[T: ClassTag](
-    @transient private var sc: SparkContext,
+    @transient private var _sc: SparkContext,
     @transient private var deps: Seq[Dependency[_]]
   ) extends Serializable with Logging {
 
+  if (classOf[RDD[_]].isAssignableFrom(elementClassTag.runtimeClass)) {
+    throw new SparkException("Spark does not support nested RDDs (see SPARK-5063)")
+  }
+
+  private def sc: SparkContext = {
+    if (_sc == null) {
+      throw new SparkException(
+        "Can only define RDDs and perform actions on the driver, not in tasks (see SPARK-5063)")
--- End diff --

Sure.  How about this:

> RDD transformations and actions can only be invoked by the driver, not 
inside of other transformations; for example, `rdd1.map(x => 
rdd2.values.count() * x)` is invalid because the `values` transformation and 
`count` action cannot be performed inside of the `rdd1.map` transformation.  
For more information, see SPARK-5063.

Kind of verbose, but I think an example might be the clearest way to 
explain this, esp. to someone unfamiliar with the terminology.

It might be nice to keep the JIRA reference since it will make the 
exception easier to search for (I'm kind of inspired by React.js's error 
messages, which include URL-shortened links to the documentation).
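
The `_sc == null` check in the diff quoted above presumably works because `_sc` is marked `@transient`: a transient field is skipped by Java serialization, so a copy of the object that has been shipped to a task sees `null` there. The sketch below demonstrates that mechanism in isolation. `DriverHandle`, `DriverBoundDataset`, and `TransientDemo` are hypothetical names made up for this example, and the round trip through a byte array only simulates sending an object to an executor.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-ins; none of these are Spark classes.
class DriverHandle // deliberately not Serializable, like a driver-only resource

class DriverBoundDataset(@transient private var _handle: DriverHandle) extends Serializable {
  // Accessor that turns a would-be NullPointerException into a descriptive error.
  def handle: DriverHandle = {
    if (_handle == null) {
      throw new IllegalStateException(
        "This dataset can only be used on the driver, not inside a task")
    }
    _handle
  }
}

object TransientDemo {
  // Simulates shipping an object to an executor via Java serialization.
  def roundTrip[A <: AnyRef](value: A): A = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[A] finally in.close()
  }
}
```

On the original object, `handle` returns the driver-side reference; on the deserialized copy the transient field is `null`, so the accessor throws the descriptive exception instead.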





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-04 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3884#discussion_r22447845
  
--- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala ---
@@ -897,4 +897,23 @@ class RDDSuite extends FunSuite with SharedSparkContext {
       mutableDependencies += dep
     }
   }
+
+  test("Nested RDDs are not supported (SPARK-5063)") {
--- End diff --

It varies from suite to suite; most start with lowercase because they start 
with method names.  If you look at BroadcastSuite, though, most use the 
uppercase style like I've done here.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-04 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3884#discussion_r22447826
  
--- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala ---
@@ -897,4 +897,23 @@ class RDDSuite extends FunSuite with SharedSparkContext {
       mutableDependencies += dep
     }
   }
+
+  test("Nested RDDs are not supported (SPARK-5063)") {
--- End diff --

A nitpick: I don't think we have a standard, but so far test case names 
start with lowercase.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-04 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3884#discussion_r22447822
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -76,10 +76,22 @@ import org.apache.spark.util.random.{BernoulliSampler, PoissonSampler, Bernoulli
  * on RDD internals.
  */
 abstract class RDD[T: ClassTag](
-    @transient private var sc: SparkContext,
+    @transient private var _sc: SparkContext,
     @transient private var deps: Seq[Dependency[_]]
   ) extends Serializable with Logging {
 
+  if (classOf[RDD[_]].isAssignableFrom(elementClassTag.runtimeClass)) {
+    throw new SparkException("Spark does not support nested RDDs (see SPARK-5063)")
+  }
+
+  private def sc: SparkContext = {
+    if (_sc == null) {
+      throw new SparkException(
+        "Can only define RDDs and perform actions on the driver, not in tasks (see SPARK-5063)")
--- End diff --

Pointing to a JIRA ticket might not be the friendliest approach for users. 
Maybe make the message more verbose and explain the problem in one or two lines?





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-04 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-68668004
  
Haha, the `org.apache.spark.broadcast.BroadcastSuite.Using broadcast after 
destroy prints callsite` test actually broadcasts an RDD (which is invalid), 
which is what caused that test failure.  I'll fix this up in my next commit.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-68650048
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25036/
Test FAILed.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-68650044
  
  [Test build #25036 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25036/consoleFull)
 for   PR 3884 at commit 
[`57cc8a1`](https://github.com/apache/spark/commit/57cc8a11266770e145c7ca810bec3b95aeefabb3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-04 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-68647672
  
@sryza Good idea; I've added a new check which prevents RDDs from being 
broadcast directly.

I should probably add these checks to PySpark, too.  I'm not actually sure 
what happens if you try to do these invalid things in PySpark, so I should 
probably try them first and add their errors / stacktraces to the JIRA so that 
it's easier for me / the support team to pattern-match to this issue.
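
For illustration, a type-based guard of the kind discussed in this thread can be sketched with a `ClassTag`, mirroring the `isAssignableFrom` test quoted in the RDD.scala review diffs. `MiniRdd` and `MiniBroadcast` are hypothetical names used only here; the exception types and messages are assumptions, not what the patch actually emits.

```scala
import scala.reflect.{ClassTag, classTag}

// Hypothetical stand-in for a distributed dataset; not Spark's RDD.
class MiniRdd[T: ClassTag](val items: Seq[T]) {
  // Reject element types that are themselves datasets (the "nested" case).
  if (classOf[MiniRdd[_]].isAssignableFrom(classTag[T].runtimeClass)) {
    throw new IllegalArgumentException("Nested datasets are not supported")
  }
}

object MiniBroadcast {
  // Reject attempts to broadcast a dataset handle instead of its collected contents.
  def broadcast[T: ClassTag](value: T): T = {
    if (classOf[MiniRdd[_]].isAssignableFrom(classTag[T].runtimeClass)) {
      throw new IllegalArgumentException(
        "Cannot broadcast a dataset directly; collect it and broadcast the result")
    }
    value
  }
}
```

In this sketch, `MiniBroadcast.broadcast(rdd)` fails immediately, while `MiniBroadcast.broadcast(rdd.items)` succeeds, which corresponds to the usual advice of collecting a dataset before broadcasting it. Because the guard looks only at the static element type, a dataset hidden inside another container (for example a `Seq` of datasets) would not be caught by it.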





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-68647489
  
  [Test build #25036 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25036/consoleFull)
 for PR 3884 at commit [`57cc8a1`](https://github.com/apache/spark/commit/57cc8a11266770e145c7ca810bec3b95aeefabb3).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-04 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-68646490
  
Will this work for broadcast variables as well?  One thing I often see is 
users trying to directly broadcast an RDD without collecting it.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-68584489
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25005/





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-68584486
  
  [Test build #25005 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25005/consoleFull)
 for PR 3884 at commit [`15b2e6b`](https://github.com/apache/spark/commit/15b2e6b38d9587790357182abe7918853688722e).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3884#issuecomment-68582760
  
  [Test build #25005 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25005/consoleFull)
 for PR 3884 at commit [`15b2e6b`](https://github.com/apache/spark/commit/15b2e6b38d9587790357182abe7918853688722e).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5063] Useful error messages for nested ...

2015-01-02 Thread JoshRosen
GitHub user JoshRosen opened a pull request:

https://github.com/apache/spark/pull/3884

[SPARK-5063] Useful error messages for nested RDDs and actions inside of 
transformations

This patch adds more helpful error messages for invalid programs that 
define nested RDDs or perform actions inside of transformations (e.g. calling 
`count()` from inside of `map()`).  Currently, these invalid programs lead to 
confusing NullPointerExceptions at runtime and have been a major source of 
questions on the mailing list and StackOverflow.
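
As a rough illustration of the failure mode described above, the following Python sketch uses a toy model and not Spark itself. `ToyContext`, `ToyRDD`, `ship_to_worker` and the error text are hypothetical names invented for this example. The sketch only assumes that an RDD's handle to its driver-side context does not survive being shipped into a task, so an action invoked inside a transformation can fail fast with an explanatory message and not a null dereference.

```python
class ToyContext:
    """Hypothetical stand-in for a driver-side context; not a Spark class."""

    def run_count(self, data):
        return len(data)


class ToyRDD:
    """Hypothetical stand-in for an RDD holding a driver-only context handle."""

    def __init__(self, context, data):
        self._context = context
        self._data = list(data)

    def _require_context(self):
        # Fail fast with an explanatory message, not a confusing
        # null-dereference somewhere deep inside the action.
        if self._context is None:
            raise RuntimeError(
                "Actions and transformations can only be invoked on the "
                "driver, not inside other transformations (for example, "
                "calling count() from inside map())."
            )
        return self._context

    def count(self):
        return self._require_context().run_count(self._data)


def ship_to_worker(rdd):
    # Assumption: shipping an RDD inside a task closure drops its context
    # handle, so the copy seen by the worker has no context.
    return ToyRDD(None, rdd._data)


driver_rdd = ToyRDD(ToyContext(), [1, 2, 3])
driver_count = driver_rdd.count()      # valid: action invoked on the "driver"

worker_copy = ship_to_worker(driver_rdd)
try:
    worker_copy.count()                # invalid: action on a shipped copy
    worker_error = None
except RuntimeError as exc:
    worker_error = str(exc)
```

Checking for the missing context at the point of use is one simple way to turn an opaque runtime failure into a message that names the actual mistake.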

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JoshRosen/spark SPARK-5063

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3884.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3884


commit 15b2e6b38d9587790357182abe7918853688722e
Author: Josh Rosen 
Date:   2015-01-03T04:14:27Z

[SPARK-5063] Useful error messages for nested RDDs and actions inside of 
transformations



