[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-09-08 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/1360#discussion_r17250396
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala 
---
@@ -295,28 +295,34 @@ private[spark] class Master(
   val execOption = idToApp.get(appId).flatMap(app = 
app.executors.get(execId))
   execOption match {
 case Some(exec) = {
+  val appInfo = idToApp(appId)
--- End diff --

This will crash if `idToApp` doesn't already contain `appId`. Maybe we 
should log a warning instead of throwing a key not found exception.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-09-08 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/1360#discussion_r17250993
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala 
---
@@ -234,7 +234,7 @@ private[spark] class Worker(
 try {
   logInfo(Asked to launch executor %s/%d for %s.format(appId, 
execId, appDesc.name))
   val manager = new ExecutorRunner(appId, execId, appDesc, cores_, 
memory_,
-self, workerId, host, sparkHome, workDir, akkaUrl, conf, 
ExecutorState.RUNNING)
+self, workerId, host, sparkHome, workDir, akkaUrl, conf, 
ExecutorState.LOADING)
--- End diff --

Is this part of fixing the bug, or just to make the logic in the existing 
state machine clearer?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-09-08 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/1360#discussion_r17254222
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala 
---
@@ -295,28 +295,34 @@ private[spark] class Master(
   val execOption = idToApp.get(appId).flatMap(app = 
app.executors.get(execId))
   execOption match {
 case Some(exec) = {
+  val appInfo = idToApp(appId)
   exec.state = state
+  if (state == ExecutorState.RUNNING) { appInfo.resetRetryCount() }
   exec.application.driver ! ExecutorUpdated(execId, state, 
message, exitStatus)
   if (ExecutorState.isFinished(state)) {
-val appInfo = idToApp(appId)
 // Remove this executor from the worker and app
-logInfo(Removing executor  + exec.fullId +  because it is  
+ state)
+logInfo(sRemoving executor ${exec.fullId} because it is 
$state)
 appInfo.removeExecutor(exec)
 exec.worker.removeExecutor(exec)
 
-val normalExit = exitStatus.exists(_ == 0)
+val normalExit = exitStatus == Some(0)
--- End diff --

Just seemed even clearer to directly compare Options instead of using a HOF.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-09-08 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/1360#discussion_r17255027
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala 
---
@@ -234,7 +234,7 @@ private[spark] class Worker(
 try {
   logInfo(Asked to launch executor %s/%d for %s.format(appId, 
execId, appDesc.name))
   val manager = new ExecutorRunner(appId, execId, appDesc, cores_, 
memory_,
-self, workerId, host, sparkHome, workDir, akkaUrl, conf, 
ExecutorState.RUNNING)
+self, workerId, host, sparkHome, workDir, akkaUrl, conf, 
ExecutorState.LOADING)
--- End diff --

It's part of the fix.  ExecutorRunners now start in state LOADING and don't 
become RUNNING until their process is running and stderr  stdout are 
successfully opened -- see the changes in this PR to ExecutorRunner.scala.  If 
ExecutorRunners were to continue to be created in state RUNNING, then new check 
for RUNNING Executors in Master.scala will always prevent Applications from 
being killed even though useful Executors doing actual work may not exist.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-09-08 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/1360#discussion_r17255465
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala 
---
@@ -295,28 +295,34 @@ private[spark] class Master(
   val execOption = idToApp.get(appId).flatMap(app = 
app.executors.get(execId))
   execOption match {
 case Some(exec) = {
+  val appInfo = idToApp(appId)
--- End diff --

That's not necessary.  If `idToApp` does not contain `appId`, then 
`execOption` will be `None` and we'll never end up in this `case Some(exec)`. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-54868231
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19987/consoleFull)
 for   PR 1360 at commit 
[`f099c0b`](https://github.com/apache/spark/commit/f099c0b2654759ab5cbfe2bc91cedac10f3bf77f).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-54876396
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19987/consoleFull)
 for   PR 1360 at commit 
[`f099c0b`](https://github.com/apache/spark/commit/f099c0b2654759ab5cbfe2bc91cedac10f3bf77f).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-09-08 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-54908417
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-54909182
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20006/consoleFull)
 for   PR 1360 at commit 
[`f099c0b`](https://github.com/apache/spark/commit/f099c0b2654759ab5cbfe2bc91cedac10f3bf77f).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-54914053
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20006/consoleFull)
 for   PR 1360 at commit 
[`f099c0b`](https://github.com/apache/spark/commit/f099c0b2654759ab5cbfe2bc91cedac10f3bf77f).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-09-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1360


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-09-08 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-54921279
  
Thanks, I merged this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-08-23 Thread markhamstra
Github user markhamstra commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-53163415
  
I'm not sure I'm following, @mridulm.  The problem is not one of removing 
Executors, but rather of removing Applications that could and should still be 
left running even though some (but not all) Executors assigned to an 
Application are dying.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-08-23 Thread mridulm
Github user mridulm commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-53163562
  
@markhamstra In our cluster, this usually happens due to one or more 
executor being in a bad state : either due to insufficient disk for finishing a 
task or it is in process of cleaning up and exit'ing.
When task fails, usually due to locality, the same task gets re-assigned to 
the executor where it just failed usually due to locality match. And this 
repeated re-schedule on failing executor, fail loop causes application to fail 
since it hits the maximum number of failed tasks for application, or maximum 
number of task failures for a specific task (iirc there are two params).

We alleviate this by setting blacklist timeout to a non trivially 
appropriate value : this prevents the rapid reschedule of a failing task on the 
executor (and usually some other executor picks up the task - the timeout is 
chosen so that this is possible).
If the executor is healthy but cant execute this specific task, then 
blacklist works fine.
If executor is unhealthy and going to exit, then we will still have rapid 
task failures until executor notifies master when it exits - but the failure 
count per task is not hit (iirc the number of failed tasks for app is much 
higher than number of failed attempts per tasks).


Ofcourse, not sure if this is completely applicable in this case.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-08-23 Thread markhamstra
Github user markhamstra commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-53164183
  
@mridulm Is this blacklisting behavior a customization that you have made 
to Spark?  If not, could you point me to where and how it is implemented?

What you are describing seems to be orthogonal and probably complementary 
to this PR: Yours, a means to prevent rescheduling of a task on an Executor 
where it cannot run successfully vs. this one, a means to prevent the killing 
of a running Application when some Executors die but others are still running 
the Application successfully.  Sounds to me like we want both of those means.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-08-23 Thread mridulm
Github user mridulm commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-53164487
  
Take a look at 'spark.scheduler.executorTaskBlacklistTime' in 
TaskSetManager.
Since I run mostly in yarn-cluster mode, and there is only single 
application there; I was not sure how relevant black-listing was in your case 
actually ! (multiple apps via standalone I guess ?)

Note that we actually need a third case, which is not yet handled, slow 
executors/stragglers - particularly for low latency stages, they really kill 
execution times for some of our ML jobs (a 50x speedup becomes much much lower 
due to these).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-08-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-53087747
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19085/consoleFull)
 for   PR 1360 at commit 
[`206a448`](https://github.com/apache/spark/commit/206a44853edf5caea9ce2d8ae83af3d999f5cf41).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-08-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-53094602
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19085/consoleFull)
 for   PR 1360 at commit 
[`206a448`](https://github.com/apache/spark/commit/206a44853edf5caea9ce2d8ae83af3d999f5cf41).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-08-22 Thread mridulm
Github user mridulm commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-53101677
  
Will blacklisting executors with appropriate value not solve this ? (If
executor is still usable, but some tasks fail on the specific executor for
whatever reason).

The more general problem of detecting straddlers and/or misbehaving
executors is something we have yet to do in spark.
On 22-Aug-2014 11:10 pm, Apache Spark QA notificati...@github.com wrote:

 QA tests have finished
 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19085/consoleFull
 for PR 1360 at commit 206a448
 
https://github.com/apache/spark/commit/206a44853edf5caea9ce2d8ae83af3d999f5cf41
 .

- This patch *passes* unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/spark/pull/1360#issuecomment-53094602.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-08-12 Thread markhamstra
Github user markhamstra commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-51959659
  
@pwendell Still should go into 1.1.0...  The change is fairly small, and 
the unpatched behavior is pretty nasty for long-running applications.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-08-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-51960128
  
QA tests have started for PR 1360. This patch merges cleanly. brView 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18382/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-08-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-51967252
  
QA results for PR 1360:br- This patch PASSES unit tests.br- This patch 
merges cleanlybr- This patch adds no public classesbrbrFor more 
information see test 
ouptut:brhttps://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18382/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-08-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-50915845
  
QA tests have started for PR 1360. This patch merges cleanly. brView 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17686/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-08-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-50921999
  
QA results for PR 1360:br- This patch PASSES unit tests.br- This patch 
merges cleanlybr- This patch adds no public classesbrbrFor more 
information see test 
ouptut:brhttps://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17686/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-07-27 Thread markhamstra
Github user markhamstra commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-50281668
  
ping

Probably too late for a 1.0.2-rc, but this should go into 1.0.3 and 1.1.0.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-48848235
  
QA tests have started for PR 1360. This patch merges cleanly. brView 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16606/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-48850708
  
QA results for PR 1360:br- This patch PASSES unit tests.br- This patch 
merges cleanlybr- This patch adds no public classesbrbrFor more 
information see test 
ouptut:brhttps://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16606/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-07-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-48762929
  
QA tests have started for PR 1360. This patch merges cleanly. brView 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16571/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-07-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-48773641
  
QA results for PR 1360:br- This patch FAILED unit tests.br- This patch 
merges cleanlybr- This patch adds no public classesbrbrFor more 
information see test 
ouptut:brhttps://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16571/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-07-10 Thread markhamstra
GitHub user markhamstra opened a pull request:

https://github.com/apache/spark/pull/1360

SPARK-2425 Don't kill a still-running Application because of some 
misbehaving Executors 

Introduces a LOADING - RUNNING ApplicationState transition and prevents 
Master from removing an Application with RUNNING Executors.

Two basic changes: 1) Instead of allowing MAX_NUM_RETRY abnormal Executor 
exits over the entire lifetime of the Application, allow that many since any 
Executor successfully began running the Application; 2) Don't remove the 
Application while Master still thinks that there are RUNNING Executors.

This should be fine as long as the ApplicationInfo doesn't believe any 
Executors are forever RUNNING when they are not.  I think that any non-RUNNING 
Executors will eventually no longer be RUNNING in Master's accounting, but 
another set of eyes should confirm that.  This PR also doesn't try to detect 
which nodes have gone rogue or to kill off bad Workers, so repeatedly failing 
Executors will continue to fail and fill up log files with failure reports as 
long as the Application keeps running.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/markhamstra/spark SPARK-2425

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1360.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1360


commit 5b85534d376d682b7e1f97f98acd532a305349f8
Author: Mark Hamstra markhams...@gmail.com
Date:   2014-07-09T23:02:43Z

SPARK-2425 introduce LOADING - RUNNING ApplicationState transition
and prevent Master from removing Application with RUNNING Executors




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-48643572
  
 Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-48643589
  
Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-07-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-48643785
  
QA tests have started for PR 1360. This patch merges cleanly. brView 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16508/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-48656115
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16508/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-48656112
  
Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---