[GitHub] spark pull request: [RFC] SPARK-1772 Stop catching Throwable, let ...

2014-05-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/715#issuecomment-42716336
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: [RFC] SPARK-1772 Stop catching Throwable, let ...

2014-05-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/715#issuecomment-42712525
  
 Merged build triggered. 




[GitHub] spark pull request: [RFC] SPARK-1772 Stop catching Throwable, let ...

2014-05-12 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/715#issuecomment-42799372
  
Looks good to me too!




[GitHub] spark pull request: [RFC] SPARK-1772 Stop catching Throwable, let ...

2014-05-12 Thread aarondav
Github user aarondav commented on a diff in the pull request:

https://github.com/apache/spark/pull/715#discussion_r12515148
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -259,19 +238,30 @@ private[spark] class Executor(
         }

         case t: Throwable => {
-          val serviceTime = System.currentTimeMillis() - taskStart
-          val metrics = attemptedTask.flatMap(t => t.metrics)
-          for (m <- metrics) {
-            m.executorRunTime = serviceTime
-            m.jvmGCTime = gcTime - startGCTime
-          }
-          val reason = ExceptionFailure(t.getClass.getName, t.toString, t.getStackTrace, metrics)
-          execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))
+          // Attempt to exit cleanly by informing the driver of our failure.
+          // If anything goes wrong (or this was a fatal exception), we will delegate to
+          // the default uncaught exception handler, which will terminate the Executor.
+          try {
+            logError("Exception in task ID " + taskId, t)
+
+            val serviceTime = System.currentTimeMillis() - taskStart
+            val metrics = attemptedTask.flatMap(t => t.metrics)
+            for (m <- metrics) {
+              m.executorRunTime = serviceTime
+              m.jvmGCTime = gcTime - startGCTime
+            }
+            val reason = ExceptionFailure(t.getClass.getName, t.toString, t.getStackTrace, metrics)
+            execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))

-          // TODO: Should we exit the whole executor here? On the one hand, the failed task may
-          // have left some weird state around depending on when the exception was thrown, but on
-          // the other hand, maybe we could detect that when future tasks fail and exit then.
-          logError("Exception in task ID " + taskId, t)
+            // Don't forcibly exit unless the exception was inherently fatal, to avoid
+            // stopping other tasks unnecessarily.
+            if (Utils.isFatalError(t)) {
+              ExecutorUncaughtExceptionHandler.uncaughtException(t)
+            }
+          } catch {
+            case t2: Throwable =>
+              ExecutorUncaughtExceptionHandler.uncaughtException(t2)
--- End diff --

Hmm, good point. I kind of like being explicit over relying on the globally 
set uncaught exception handler. I could be happy with getting rid of this and 
replacing it with a comment, though.




[GitHub] spark pull request: [RFC] SPARK-1772 Stop catching Throwable, let ...

2014-05-12 Thread aarondav
Github user aarondav commented on a diff in the pull request:

https://github.com/apache/spark/pull/715#discussion_r12515152
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -259,19 +238,30 @@ private[spark] class Executor(
         }

         case t: Throwable => {
-          val serviceTime = System.currentTimeMillis() - taskStart
-          val metrics = attemptedTask.flatMap(t => t.metrics)
-          for (m <- metrics) {
-            m.executorRunTime = serviceTime
-            m.jvmGCTime = gcTime - startGCTime
-          }
-          val reason = ExceptionFailure(t.getClass.getName, t.toString, t.getStackTrace, metrics)
-          execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))
+          // Attempt to exit cleanly by informing the driver of our failure.
+          // If anything goes wrong (or this was a fatal exception), we will delegate to
+          // the default uncaught exception handler, which will terminate the Executor.
+          try {
+            logError("Exception in task ID " + taskId, t)
+
+            val serviceTime = System.currentTimeMillis() - taskStart
+            val metrics = attemptedTask.flatMap(t => t.metrics)
+            for (m <- metrics) {
+              m.executorRunTime = serviceTime
+              m.jvmGCTime = gcTime - startGCTime
+            }
+            val reason = ExceptionFailure(t.getClass.getName, t.toString, t.getStackTrace, metrics)
+            execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))

-          // TODO: Should we exit the whole executor here? On the one hand, the failed task may
-          // have left some weird state around depending on when the exception was thrown, but on
-          // the other hand, maybe we could detect that when future tasks fail and exit then.
-          logError("Exception in task ID " + taskId, t)
+            // Don't forcibly exit unless the exception was inherently fatal, to avoid
+            // stopping other tasks unnecessarily.
+            if (Utils.isFatalError(t)) {
+              ExecutorUncaughtExceptionHandler.uncaughtException(t)
+            }
+          } catch {
+            case t2: Throwable =>
+              ExecutorUncaughtExceptionHandler.uncaughtException(t2)
--- End diff --

Actually just realized we basically already have that comment, just 
interpreted in a different way :)




[GitHub] spark pull request: [RFC] SPARK-1772 Stop catching Throwable, let ...

2014-05-11 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/715#issuecomment-42763346
  
/cc @scrapcodes, who worked on the IndestructibleActorSystem




[GitHub] spark pull request: [RFC] SPARK-1772 Stop catching Throwable, let ...

2014-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/715#issuecomment-42716337
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14854/




[GitHub] spark pull request: [RFC] SPARK-1772 Stop catching Throwable, let ...

2014-05-11 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request:

https://github.com/apache/spark/pull/715#discussion_r12514419
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -71,7 +71,7 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
           stopDaemon()
           startDaemon()
           new Socket(daemonHost, daemonPort)
-        case e: Throwable => throw e
+        case e: Exception => throw e
--- End diff --

Do we really need this line?
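
A minimal Scala sketch of the point behind that question (the helper name `connectToDaemon`, the `restart` parameter, and the single-retry policy are illustrative assumptions, not the real PythonWorkerFactory code): a clause that only rethrows adds nothing, because unmatched exceptions propagate anyway, and `scala.util.control.NonFatal` expresses "handle only recoverable failures" directly.

```scala
import java.net.Socket

import scala.util.control.NonFatal

object NonFatalSketch {
  // Hypothetical stand-in for the real daemon-restart logic.
  def connectToDaemon(host: String, port: Int, restart: () => Unit): Socket = {
    try {
      new Socket(host, port)
    } catch {
      case NonFatal(_) =>
        // Handle only non-fatal failures: restart the daemon and retry once.
        restart()
        new Socket(host, port)
      // No `case e: Exception => throw e` clause is needed: fatal errors and
      // anything else left unmatched simply propagate to the caller.
    }
  }
}
```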




[GitHub] spark pull request: [RFC] SPARK-1772 Stop catching Throwable, let ...

2014-05-11 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/715#issuecomment-42797560
  
Looks pretty good to me, just made one small comment. I think it's good to 
eliminate these now. I haven't seen many cases where they're super useful.




[GitHub] spark pull request: [RFC] SPARK-1772 Stop catching Throwable, let ...

2014-05-11 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/715#issuecomment-42797519
  
> It also turns out that it is unlikely that the IndestructibleActorSystem actually works, given testing (here).

It does work, but in the case of OOMs the behavior can be very sporadic. The only reason it was needed was that in the Akka 2.0.x days Netty was tolerating OOMs, so Akka rarely even got the *chance* to deal with them: Netty almost always hit them first and mostly managed to eat them. Very weird things. Just saying this for posterity.




[GitHub] spark pull request: [RFC] SPARK-1772 Stop catching Throwable, let ...

2014-05-11 Thread mateiz
Github user mateiz commented on a diff in the pull request:

https://github.com/apache/spark/pull/715#discussion_r12514626
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -259,19 +238,30 @@ private[spark] class Executor(
         }

         case t: Throwable => {
-          val serviceTime = System.currentTimeMillis() - taskStart
-          val metrics = attemptedTask.flatMap(t => t.metrics)
-          for (m <- metrics) {
-            m.executorRunTime = serviceTime
-            m.jvmGCTime = gcTime - startGCTime
-          }
-          val reason = ExceptionFailure(t.getClass.getName, t.toString, t.getStackTrace, metrics)
-          execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))
+          // Attempt to exit cleanly by informing the driver of our failure.
+          // If anything goes wrong (or this was a fatal exception), we will delegate to
+          // the default uncaught exception handler, which will terminate the Executor.
+          try {
+            logError("Exception in task ID " + taskId, t)
+
+            val serviceTime = System.currentTimeMillis() - taskStart
+            val metrics = attemptedTask.flatMap(t => t.metrics)
+            for (m <- metrics) {
+              m.executorRunTime = serviceTime
+              m.jvmGCTime = gcTime - startGCTime
+            }
+            val reason = ExceptionFailure(t.getClass.getName, t.toString, t.getStackTrace, metrics)
+            execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))

-          // TODO: Should we exit the whole executor here? On the one hand, the failed task may
-          // have left some weird state around depending on when the exception was thrown, but on
-          // the other hand, maybe we could detect that when future tasks fail and exit then.
-          logError("Exception in task ID " + taskId, t)
+            // Don't forcibly exit unless the exception was inherently fatal, to avoid
+            // stopping other tasks unnecessarily.
+            if (Utils.isFatalError(t)) {
+              ExecutorUncaughtExceptionHandler.uncaughtException(t)
+            }
+          } catch {
+            case t2: Throwable =>
+              ExecutorUncaughtExceptionHandler.uncaughtException(t2)
--- End diff --

Can't the uncaught exception handler for this thread be set to deal with 
this, instead of another catch?
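
A rough sketch of that suggestion (assumed semantics only; the handler object and logging below are illustrative stand-ins, not this PR's ExecutorUncaughtExceptionHandler): once a handler is installed on the task thread, or as the JVM-wide default, any Throwable that escapes the task body reaches it without a second catch.

```scala
object UncaughtHandlerSketch {
  // Illustrative handler; a real executor would log and halt the JVM on fatal errors.
  private val handler = new Thread.UncaughtExceptionHandler {
    override def uncaughtException(thread: Thread, t: Throwable): Unit =
      System.err.println(s"Uncaught exception in ${thread.getName}: $t")
  }

  def main(args: Array[String]): Unit = {
    val worker = new Thread(new Runnable {
      override def run(): Unit = throw new OutOfMemoryError("simulated fatal error")
    })
    worker.setUncaughtExceptionHandler(handler)            // per-thread handler
    // Thread.setDefaultUncaughtExceptionHandler(handler)  // or the JVM-wide default
    worker.start()
    worker.join()
  }
}
```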




[GitHub] spark pull request: [RFC] SPARK-1772 Stop catching Throwable, let ...

2014-05-10 Thread aarondav
GitHub user aarondav opened a pull request:

https://github.com/apache/spark/pull/715

[RFC] SPARK-1772 Stop catching Throwable, let Executors die

The main issue this patch fixes is 
[SPARK-1772](https://issues.apache.org/jira/browse/SPARK-1772), in which 
Executors may not die when fatal exceptions (e.g., OOM) are thrown. This patch 
causes Executors to delegate to the ExecutorUncaughtExceptionHandler when a 
fatal exception is thrown.

This patch also continues the fight in the never-ending war against `case t: Throwable =>`, by only catching Exceptions in many places, and adding a wrapper for Threads and Runnables to make sure any uncaught exceptions are at least printed to the logs.

It also turns out that it is unlikely that the IndestructibleActorSystem 
actually works, given testing 
([here](https://gist.github.com/aarondav/ca1f0cdcd50727f89c0d)). The 
uncaughtExceptionHandler is not called from the places that we expected it 
would be.
[SPARK-1620](https://issues.apache.org/jira/browse/SPARK-1620) deals with 
part of this issue, but refactoring our Actor Systems to ensure that exceptions 
are dealt with properly is a much bigger change, outside the scope of this PR.
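
As a hedged illustration of the Thread/Runnable wrapper described above (the name `loggingRunnable` and the stderr logging are made up here; this is a sketch of the general idea, not the utility the patch actually adds):

```scala
object LoggingRunnableSketch {
  // Wrap a unit of work so that any escaping Throwable is at least logged
  // before being rethrown to the thread's uncaught exception handler.
  def loggingRunnable(body: => Unit): Runnable = new Runnable {
    override def run(): Unit = {
      try {
        body
      } catch {
        case t: Throwable =>
          System.err.println(
            s"Uncaught exception in thread ${Thread.currentThread().getName}: $t")
          throw t
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // Usage: wrap the work submitted to a Thread (or an ExecutorService).
    val worker = new Thread(loggingRunnable { sys.error("boom") }, "worker-0")
    worker.start()
    worker.join()
  }
}
```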

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aarondav/spark throwable

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/715.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #715


commit 1867867a00241ff1bd20d2ac3ac610ed126a9280
Author: Aaron Davidson aa...@databricks.com
Date:   2014-05-09T20:28:26Z

[RFC] SPARK-1772 Stop catching Throwable, let Executors die

The main issue this patch fixes is 
[SPARK-1772](https://issues.apache.org/jira/browse/SPARK-1772),
in which Executors may not die when fatal exceptions (e.g., OOM) are 
thrown. This patch causes
Executors to delegate to the ExecutorUncaughtExceptionHandler when a fatal 
exception is thrown.

This patch also continues the fight in the never-ending war against `case t: Throwable =>`, by only catching Exceptions in many places, and adding a wrapper for Threads and Runnables to make sure any uncaught exceptions are at least printed to the logs.

It also turns out that it is unlikely that the IndestructibleActorSystem 
actually works,
given testing 
([here](https://gist.github.com/aarondav/ca1f0cdcd50727f89c0d)). The
uncaughtExceptionHandler is not called from the places that we expected it 
would be.
[SPARK-1620](https://issues.apache.org/jira/browse/SPARK-1620) deals with 
part of this
issue, but refactoring our Actor Systems to ensure that exceptions are 
dealt with properly
is a much bigger change, outside the scope of this PR.






[GitHub] spark pull request: [RFC] SPARK-1772 Stop catching Throwable, let ...

2014-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/715#issuecomment-42712542
  
Merged build started. 

