[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-12-15 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-164965793
  
By the way, @reggert how is this related to SPARK-4514?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-12-15 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-164965667
  
Thanks, merging into master.





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-12-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/9264





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-12-15 Thread reggert
Github user reggert commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-164967910
  
@andrewor14 I included a couple lines to propagate the local properties 
(from the thread that created the `ComplexFutureAction`) to each of the spawned 
jobs (all but the first of which may be launched from other threads, losing the 
local properties unless `setLocalProperties` is called). AFAIK this resolves 
SPARK-4514.
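
A minimal sketch of the propagation described above, using only the Scala standard library rather than Spark itself. `LocalPropertiesSketch`, its thread-local, and the `job.group` key are hypothetical stand-ins and not Spark's actual implementation; the point is only that a thread-local value set in the creating thread is invisible on a pool thread unless it is captured and set again there.

```scala
import java.util.Properties
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object LocalPropertiesSketch {
  // Hypothetical stand-in for per-thread "local properties".
  private val localProps = new ThreadLocal[Properties] {
    override protected def initialValue(): Properties = new Properties()
  }

  def setLocalProperties(p: Properties): Unit = localProps.set(p)
  def getLocalProperties: Properties = localProps.get()

  /** Returns (value seen without propagation, value seen with propagation). */
  def demo(): (Option[String], Option[String]) = {
    val pool = Executors.newSingleThreadExecutor()
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      val props = new Properties()
      props.setProperty("job.group", "demo")
      setLocalProperties(props)

      // Capture the properties in the thread that creates the action.
      val captured = getLocalProperties

      // Without propagation the pool thread sees its own empty Properties.
      val lost = Await.result(
        Future(Option(getLocalProperties.getProperty("job.group"))), 5.seconds)

      // With propagation the captured properties are set before the work runs.
      val kept = Await.result(Future {
        setLocalProperties(captured)
        Option(getLocalProperties.getProperty("job.group"))
      }, 5.seconds)

      (lost, kept)
    } finally pool.shutdown()
  }
}
```

Under these assumptions, `LocalPropertiesSketch.demo()` yields `None` for the unpropagated read and `Some("demo")` for the propagated one.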





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-12-14 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-164512853
  
LGTM





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-12-13 Thread reggert
Github user reggert commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r47446916
  
--- Diff: core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala ---
@@ -197,4 +197,34 @@ class AsyncRDDActionsSuite extends SparkFunSuite with BeforeAndAfterAll with Tim
       Await.result(f, Duration(20, "milliseconds"))
     }
   }
+
+  private def testAsyncAction[R](action: RDD[Int] => FutureAction[R]): Unit = {
+    val executionContextInvoked = Promise[Unit]
+    val fakeExecutionContext = new ExecutionContext {
+      override def execute(runnable: Runnable): Unit = {
+        executionContextInvoked.success(())
+      }
+      override def reportFailure(t: Throwable): Unit = ()
+    }
+    val starter = Smuggle(new Semaphore(0))
+    starter.drainPermits()
+    val rdd = sc.parallelize(1 to 100, 4).mapPartitions {itr => starter.acquire(1); itr}
+    val f = action(rdd)
+    f.onComplete(_ => ())(fakeExecutionContext)
+    // Here we verify that registering the callback didn't cause a thread to be consumed.
+    assert(!executionContextInvoked.isCompleted)
+    // Now allow the executors to proceed with task processing.
+    starter.release(rdd.partitions.length)
+    // Waiting for the result verifies that the tasks were successfully processed.
+    // This mainly exists to verify that we didn't break task deserialization.
--- End diff --

Oh, is that all? :-)

Comment line removed.





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-164276707
  
**[Test build #47629 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47629/consoleFull)** for PR 9264 at commit [`539ac43`](https://github.com/apache/spark/commit/539ac43c3be54abb61b5a44450cc13cc196113b2).





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-164285868
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47629/





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-164285867
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-164285830
  
**[Test build #47629 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47629/consoleFull)** for PR 9264 at commit [`539ac43`](https://github.com/apache/spark/commit/539ac43c3be54abb61b5a44450cc13cc196113b2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `trait JobSubmitter`
   * `class ComplexFutureAction[T](run : JobSubmitter => Future[T])`





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-12-10 Thread reggert
Github user reggert commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r47322111
  
--- Diff: core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala ---
@@ -197,4 +197,34 @@ class AsyncRDDActionsSuite extends SparkFunSuite with BeforeAndAfterAll with Tim
       Await.result(f, Duration(20, "milliseconds"))
     }
   }
+
+  private def testAsyncAction[R](action: RDD[Int] => FutureAction[R]): Unit = {
+    val executionContextInvoked = Promise[Unit]
+    val fakeExecutionContext = new ExecutionContext {
+      override def execute(runnable: Runnable): Unit = {
+        executionContextInvoked.success(())
+      }
+      override def reportFailure(t: Throwable): Unit = ()
+    }
+    val starter = Smuggle(new Semaphore(0))
+    starter.drainPermits()
+    val rdd = sc.parallelize(1 to 100, 4).mapPartitions {itr => starter.acquire(1); itr}
+    val f = action(rdd)
+    f.onComplete(_ => ())(fakeExecutionContext)
+    // Here we verify that registering the callback didn't cause a thread to be consumed.
+    assert(!executionContextInvoked.isCompleted)
+    // Now allow the executors to proceed with task processing.
+    starter.release(rdd.partitions.length)
+    // Waiting for the result verifies that the tasks were successfully processed.
+    // This mainly exists to verify that we didn't break task deserialization.
--- End diff --

If `mapPartitions` throws an exception, won't that cause the test to fail 
before we even get to `Await.result`?





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-12-10 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r47322325
  
--- Diff: core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala ---
@@ -197,4 +197,34 @@ class AsyncRDDActionsSuite extends SparkFunSuite with BeforeAndAfterAll with Tim
       Await.result(f, Duration(20, "milliseconds"))
     }
   }
+
+  private def testAsyncAction[R](action: RDD[Int] => FutureAction[R]): Unit = {
+    val executionContextInvoked = Promise[Unit]
+    val fakeExecutionContext = new ExecutionContext {
+      override def execute(runnable: Runnable): Unit = {
+        executionContextInvoked.success(())
+      }
+      override def reportFailure(t: Throwable): Unit = ()
+    }
+    val starter = Smuggle(new Semaphore(0))
+    starter.drainPermits()
+    val rdd = sc.parallelize(1 to 100, 4).mapPartitions {itr => starter.acquire(1); itr}
+    val f = action(rdd)
+    f.onComplete(_ => ())(fakeExecutionContext)
+    // Here we verify that registering the callback didn't cause a thread to be consumed.
+    assert(!executionContextInvoked.isCompleted)
+    // Now allow the executors to proceed with task processing.
+    starter.release(rdd.partitions.length)
+    // Waiting for the result verifies that the tasks were successfully processed.
+    // This mainly exists to verify that we didn't break task deserialization.
--- End diff --

> If mapPartitions throws an exception, won't that cause the test to fail before we even get to Await.result?

I'm not against `Await.result` here. I just wanted to point out the comment 
is wrong.





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-12-09 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r47192789
  
--- Diff: core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala ---
@@ -197,4 +197,34 @@ class AsyncRDDActionsSuite extends SparkFunSuite with BeforeAndAfterAll with Tim
       Await.result(f, Duration(20, "milliseconds"))
     }
   }
+
+  private def testAsyncAction[R](action: RDD[Int] => FutureAction[R]): Unit = {
+    val executionContextInvoked = Promise[Unit]
+    val fakeExecutionContext = new ExecutionContext {
+      override def execute(runnable: Runnable): Unit = {
+        executionContextInvoked.success(())
+      }
+      override def reportFailure(t: Throwable): Unit = ()
+    }
+    val starter = Smuggle(new Semaphore(0))
+    starter.drainPermits()
+    val rdd = sc.parallelize(1 to 100, 4).mapPartitions {itr => starter.acquire(1); itr}
+    val f = action(rdd)
+    f.onComplete(_ => ())(fakeExecutionContext)
+    // Here we verify that registering the callback didn't cause a thread to be consumed.
+    assert(!executionContextInvoked.isCompleted)
+    // Now allow the executors to proceed with task processing.
+    starter.release(rdd.partitions.length)
+    // Waiting for the result verifies that the tasks were successfully processed.
+    // This mainly exists to verify that we didn't break task deserialization.
--- End diff --

The problem here is that `Await.result` is called on 
`executionContextInvoked.future`, but the only place that completes the 
Promise is `executionContextInvoked.success(())`.





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-12-09 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r47190819
  
--- Diff: core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala ---
@@ -197,4 +197,34 @@ class AsyncRDDActionsSuite extends SparkFunSuite with BeforeAndAfterAll with Tim
       Await.result(f, Duration(20, "milliseconds"))
     }
   }
+
+  private def testAsyncAction[R](action: RDD[Int] => FutureAction[R]): Unit = {
+    val executionContextInvoked = Promise[Unit]
+    val fakeExecutionContext = new ExecutionContext {
+      override def execute(runnable: Runnable): Unit = {
+        executionContextInvoked.success(())
+      }
+      override def reportFailure(t: Throwable): Unit = ()
+    }
+    val starter = Smuggle(new Semaphore(0))
+    starter.drainPermits()
+    val rdd = sc.parallelize(1 to 100, 4).mapPartitions {itr => starter.acquire(1); itr}
+    val f = action(rdd)
+    f.onComplete(_ => ())(fakeExecutionContext)
+    // Here we verify that registering the callback didn't cause a thread to be consumed.
+    assert(!executionContextInvoked.isCompleted)
+    // Now allow the executors to proceed with task processing.
+    starter.release(rdd.partitions.length)
+    // Waiting for the result verifies that the tasks were successfully processed.
+    // This mainly exists to verify that we didn't break task deserialization.
--- End diff --

Just tested the logic locally. `Task not serializable` will be thrown from 
`mapPartitions` if I add a non-serializable reference to the closure. 

Since nothing calls `executionContextInvoked.failure`, the only exception 
that can be thrown from `Await.result(executionContextInvoked.future, 
atMost = 15.seconds)` is a `TimeoutException`.
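
A small standalone sketch of this point, assuming only the Scala standard library (`SwallowedFailureSketch` and its method names are made up for illustration and are not part of the patch). A promise that is only ever completed with `success` cannot surface another future's failure, so awaiting it can only time out, whereas awaiting the failing future directly rethrows the underlying exception.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Try}

object SwallowedFailureSketch {
  private def failingJob(): Future[Int] =
    Future { throw new IllegalStateException("task failed") }

  /** Awaits a promise that is only completed on success; returns what was observed. */
  def awaitSignalPromise(): String = {
    val invoked = Promise[Unit]()
    // Nothing ever calls invoked.failure, so the job's exception is not forwarded.
    failingJob().foreach(_ => invoked.success(()))
    Try(Await.result(invoked.future, 200.millis)) match {
      case Failure(e) => e.getClass.getSimpleName
      case _          => "completed"
    }
  }

  /** Awaits the failing future itself; returns what was observed. */
  def awaitJobDirectly(): String =
    Try(Await.result(failingJob(), 5.seconds)) match {
      case Failure(e) => e.getClass.getSimpleName
      case _          => "completed"
    }
}
```

In this sketch the first call reports a `TimeoutException` and the second an `IllegalStateException`, which is the distinction being drawn between awaiting `executionContextInvoked.future` and awaiting `f`.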





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-12-09 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r47192089
  
--- Diff: core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala ---
@@ -197,4 +197,34 @@ class AsyncRDDActionsSuite extends SparkFunSuite with BeforeAndAfterAll with Tim
       Await.result(f, Duration(20, "milliseconds"))
     }
   }
+
+  private def testAsyncAction[R](action: RDD[Int] => FutureAction[R]): Unit = {
+    val executionContextInvoked = Promise[Unit]
+    val fakeExecutionContext = new ExecutionContext {
+      override def execute(runnable: Runnable): Unit = {
+        executionContextInvoked.success(())
+      }
+      override def reportFailure(t: Throwable): Unit = ()
+    }
+    val starter = Smuggle(new Semaphore(0))
+    starter.drainPermits()
+    val rdd = sc.parallelize(1 to 100, 4).mapPartitions {itr => starter.acquire(1); itr}
+    val f = action(rdd)
+    f.onComplete(_ => ())(fakeExecutionContext)
+    // Here we verify that registering the callback didn't cause a thread to be consumed.
+    assert(!executionContextInvoked.isCompleted)
+    // Now allow the executors to proceed with task processing.
+    starter.release(rdd.partitions.length)
+    // Waiting for the result verifies that the tasks were successfully processed.
+    // This mainly exists to verify that we didn't break task deserialization.
--- End diff --

How did you test it?  My expectation is the same as Richard's, that 
`Await.result` will throw the underlying exception.  For example:
```
scala> val fi: Future[Int] = Future { throw new java.lang.RuntimeException() }
fi: scala.concurrent.Future[Int] = scala.concurrent.impl.Promise$DefaultPromise@3f200884

scala> try { Await.result(fi, 3 seconds) } catch {
     | case _: java.util.concurrent.TimeoutException => println("Timeout")
     | case _: java.lang.RuntimeException => println("Runtime")
     | case _: Throwable => println("Huh?")
     | }
Runtime
res1: AnyVal = ()
```





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-12-09 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r47193488
  
--- Diff: core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala ---
@@ -197,4 +197,34 @@ class AsyncRDDActionsSuite extends SparkFunSuite with BeforeAndAfterAll with Tim
       Await.result(f, Duration(20, "milliseconds"))
     }
   }
+
+  private def testAsyncAction[R](action: RDD[Int] => FutureAction[R]): Unit = {
+    val executionContextInvoked = Promise[Unit]
+    val fakeExecutionContext = new ExecutionContext {
+      override def execute(runnable: Runnable): Unit = {
+        executionContextInvoked.success(())
+      }
+      override def reportFailure(t: Throwable): Unit = ()
+    }
+    val starter = Smuggle(new Semaphore(0))
+    starter.drainPermits()
+    val rdd = sc.parallelize(1 to 100, 4).mapPartitions {itr => starter.acquire(1); itr}
+    val f = action(rdd)
+    f.onComplete(_ => ())(fakeExecutionContext)
+    // Here we verify that registering the callback didn't cause a thread to be consumed.
+    assert(!executionContextInvoked.isCompleted)
+    // Now allow the executors to proceed with task processing.
+    starter.release(rdd.partitions.length)
+    // Waiting for the result verifies that the tasks were successfully processed.
+    // This mainly exists to verify that we didn't break task deserialization.
--- End diff --

Ah, got it -- yes, that does appear to be swallowing any exception from 
`mapPartitions`.  Good eyes! 





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-12-09 Thread reggert
Github user reggert commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r47117656
  
--- Diff: core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala ---
@@ -197,4 +197,34 @@ class AsyncRDDActionsSuite extends SparkFunSuite with BeforeAndAfterAll with Tim
       Await.result(f, Duration(20, "milliseconds"))
     }
   }
+
+  private def testAsyncAction[R](action: RDD[Int] => FutureAction[R]): Unit = {
+    val executionContextInvoked = Promise[Unit]
+    val fakeExecutionContext = new ExecutionContext {
+      override def execute(runnable: Runnable): Unit = {
+        executionContextInvoked.success(())
+      }
+      override def reportFailure(t: Throwable): Unit = ()
+    }
+    val starter = Smuggle(new Semaphore(0))
+    starter.drainPermits()
+    val rdd = sc.parallelize(1 to 100, 4).mapPartitions {itr => starter.acquire(1); itr}
+    val f = action(rdd)
+    f.onComplete(_ => ())(fakeExecutionContext)
+    // Here we verify that registering the callback didn't cause a thread to be consumed.
+    assert(!executionContextInvoked.isCompleted)
+    // Now allow the executors to proceed with task processing.
+    starter.release(rdd.partitions.length)
+    // Waiting for the result verifies that the tasks were successfully processed.
+    // This mainly exists to verify that we didn't break task deserialization.
--- End diff --

To clarify, in the event that we've broken task deserialization, I would 
expect to see a `NotSerializableException` or similar error thrown from the 
`Await.result` call.





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-12-07 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r46861956
  
--- Diff: core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala ---
@@ -197,4 +197,34 @@ class AsyncRDDActionsSuite extends SparkFunSuite with BeforeAndAfterAll with Tim
       Await.result(f, Duration(20, "milliseconds"))
     }
   }
+
+  private def testAsyncAction[R](action: RDD[Int] => FutureAction[R]): Unit = {
+    val executionContextInvoked = Promise[Unit]
+    val fakeExecutionContext = new ExecutionContext {
+      override def execute(runnable: Runnable): Unit = {
+        executionContextInvoked.success(())
+      }
+      override def reportFailure(t: Throwable): Unit = ()
+    }
+    val starter = Smuggle(new Semaphore(0))
+    starter.drainPermits()
+    val rdd = sc.parallelize(1 to 100, 4).mapPartitions {itr => starter.acquire(1); itr}
+    val f = action(rdd)
+    f.onComplete(_ => ())(fakeExecutionContext)
+    // Here we verify that registering the callback didn't cause a thread to be consumed.
+    assert(!executionContextInvoked.isCompleted)
+    // Now allow the executors to proceed with task processing.
+    starter.release(rdd.partitions.length)
+    // Waiting for the result verifies that the tasks were successfully processed.
+    // This mainly exists to verify that we didn't break task deserialization.
--- End diff --

Since the callback doesn't run, how does this verify that we didn't break 
task deserialization?
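
For context, the fake-execution-context technique used in the diff can be sketched in isolation with the Scala standard library alone. `FakeExecutionContextSketch` and the `pending` promise are hypothetical names that stand in for the suite and the `FutureAction`. The fake context only records that it was asked to run something and never runs the callback, so it can show when a callback was scheduled but cannot report whether the underlying work succeeded.

```scala
import scala.concurrent.{ExecutionContext, Promise}

object FakeExecutionContextSketch {
  /** Returns (context invoked before completion, context invoked after completion). */
  def run(): (Boolean, Boolean) = {
    val executionContextInvoked = Promise[Unit]()
    val fakeExecutionContext = new ExecutionContext {
      // Record the submission only; the runnable is deliberately never executed.
      override def execute(runnable: Runnable): Unit = {
        executionContextInvoked.trySuccess(())
        ()
      }
      override def reportFailure(t: Throwable): Unit = ()
    }

    // Stand-in for a future whose result is not yet available.
    val pending = Promise[Int]()
    pending.future.onComplete(_ => ())(fakeExecutionContext)

    // Registering the callback alone should not touch the execution context.
    val before = executionContextInvoked.isCompleted

    // Completing the future schedules the callback on the fake context.
    pending.success(42)
    val after = executionContextInvoked.isCompleted

    (before, after)
  }
}
```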





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-12-07 Thread reggert
Github user reggert commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r46909976
  
--- Diff: core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala ---
@@ -197,4 +197,34 @@ class AsyncRDDActionsSuite extends SparkFunSuite with BeforeAndAfterAll with Tim
       Await.result(f, Duration(20, "milliseconds"))
     }
   }
+
+  private def testAsyncAction[R](action: RDD[Int] => FutureAction[R]): Unit = {
+    val executionContextInvoked = Promise[Unit]
+    val fakeExecutionContext = new ExecutionContext {
+      override def execute(runnable: Runnable): Unit = {
+        executionContextInvoked.success(())
+      }
+      override def reportFailure(t: Throwable): Unit = ()
+    }
+    val starter = Smuggle(new Semaphore(0))
+    starter.drainPermits()
+    val rdd = sc.parallelize(1 to 100, 4).mapPartitions {itr => starter.acquire(1); itr}
+    val f = action(rdd)
+    f.onComplete(_ => ())(fakeExecutionContext)
+    // Here we verify that registering the callback didn't cause a thread to be consumed.
+    assert(!executionContextInvoked.isCompleted)
+    // Now allow the executors to proceed with task processing.
+    starter.release(rdd.partitions.length)
+    // Waiting for the result verifies that the tasks were successfully processed.
+    // This mainly exists to verify that we didn't break task deserialization.
--- End diff --

Unless I'm mistaken, the `Await.result` call will throw an exception if the 
job fails.





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-12-07 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r46867432
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/AsyncRDDActions.scala ---
@@ -65,17 +64,23 @@ class AsyncRDDActions[T: ClassTag](self: RDD[T]) extends Serializable with Loggi
    * Returns a future for retrieving the first num elements of the RDD.
    */
   def takeAsync(num: Int): FutureAction[Seq[T]] = self.withScope {
-    val f = new ComplexFutureAction[Seq[T]]
     val callSite = self.context.getCallSite
-
-    f.run {
-      // This is a blocking action so we should use "AsyncRDDActions.futureExecutionContext" which
-      // is a cached thread pool.
-      val results = new ArrayBuffer[T](num)
-      val totalParts = self.partitions.length
-      var partsScanned = 0
-      self.context.setCallSite(callSite)
-      while (results.size < num && partsScanned < totalParts) {
+    val localProperties = self.context.getLocalProperties
+    // Cached thread pool to handle aggregation of subtasks.
+    implicit val executionContext = AsyncRDDActions.futureExecutionContext
--- End diff --

Not necessary for this patch, but I think we can remove `executionContext` 
in the future, since it no longer makes sense to use a cached thread pool.
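
One way to read this remark, sketched with the Scala standard library only: once the partition scan is expressed as chained futures instead of a blocking `while` loop, the callbacks are short and never block, so a thread pool sized for blocking work matters less. `ChainedTakeSketch`, `SubmitPartition`, and the one-partition-per-step scan are simplifying assumptions for illustration, not the actual `takeAsync` implementation.

```scala
import scala.concurrent.{ExecutionContext, Future}

object ChainedTakeSketch {
  // Hypothetical stand-in for submitting a job over a single partition index.
  type SubmitPartition = Int => Future[Seq[Int]]

  /** Collects up to `num` elements by scanning partitions in order, without blocking. */
  def takeAsync(num: Int, totalParts: Int, submit: SubmitPartition)
               (implicit ec: ExecutionContext): Future[Seq[Int]] = {
    def loop(results: Seq[Int], partsScanned: Int): Future[Seq[Int]] =
      if (results.size >= num || partsScanned >= totalParts) {
        // Enough elements collected, or nothing left to scan.
        Future.successful(results.take(num))
      } else {
        // Chain the next scan onto the completion of the current one.
        submit(partsScanned).flatMap(part => loop(results ++ part, partsScanned + 1))
      }
    loop(Vector.empty, 0)
  }
}
```

With a `submit` function that returns two elements per partition, asking for three elements would scan two partitions and then stop, and no thread waits on a job in between.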





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-12-07 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-162639094
  
@reggert sorry for the delay. LGTM except for a nit in the test. Thanks very much.





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-12-06 Thread reggert
Github user reggert commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-162326651
  
All I want for Christmas is a code review. :-)





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-12-02 Thread reggert
Github user reggert commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-161528329
  
@zsxwing @JoshRosen I just want to make sure that you guys haven't 
forgotten about this. I haven't heard anything in a week and a half.





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-11-30 Thread reggert
Github user reggert commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-160829564
  
Any feedback?





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-11-23 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-159088473
  
@zsxwing, could you take a look at this patch? I think you'd be a good 
reviewer for this given your experience with similar patches in streaming.





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-11-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-158817738
  
**[Test build #46499 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46499/consoleFull)** for PR 9264 at commit [`8fe8000`](https://github.com/apache/spark/commit/8fe8000ee5ada0f82ab171a27cab75e8142debcb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `trait JobSubmitter`
   * `class ComplexFutureAction[T](run : JobSubmitter => Future[T])`





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-158817770
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-158817771
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46499/
Test PASSed.





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-158817052
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-158817053
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46498/
Test PASSed.





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-11-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-158817011
  
**[Test build #46498 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46498/consoleFull)** for PR 9264 at commit [`5816489`](https://github.com/apache/spark/commit/58164891be00627845da1fd8bce8906c20678d12).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `trait JobSubmitter`
   * `class ComplexFutureAction[T](run : JobSubmitter => Future[T])`





[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...

2015-11-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-158801223
  
**[Test build #46499 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46499/consoleFull)** for PR 9264 at commit [`8fe8000`](https://github.com/apache/spark/commit/8fe8000ee5ada0f82ab171a27cab75e8142debcb).

