[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-21 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/21068
  
merged to master.  Thanks @attilapiros !


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-19 Thread attilapiros
Github user attilapiros commented on the issue:

https://github.com/apache/spark/pull/21068
  
Here is the new task for the metrics: 
https://issues.apache.org/jira/browse/SPARK-24594.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92040/
Test PASSed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #92040 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92040/testReport)**
 for PR 21068 at commit 
[`f71c7c5`](https://github.com/apache/spark/commit/f71c7c547f173c902eec101d510c87d50c7abb86).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #92040 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92040/testReport)**
 for PR 21068 at commit 
[`f71c7c5`](https://github.com/apache/spark/commit/f71c7c547f173c902eec101d510c87d50c7abb86).


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-18 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/21068
  
retest this please


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92031/
Test FAILed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #92031 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92031/testReport)**
 for PR 21068 at commit 
[`f71c7c5`](https://github.com/apache/spark/commit/f71c7c547f173c902eec101d510c87d50c7abb86).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #92031 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92031/testReport)**
 for PR 21068 at commit 
[`f71c7c5`](https://github.com/apache/spark/commit/f71c7c547f173c902eec101d510c87d50c7abb86).


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-18 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21068
  
Looks like it was modified to kill the application if all nodes are 
blacklisted, so I'm good with this approach. 


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91920/
Test PASSed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #91920 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91920/testReport)**
 for PR 21068 at commit 
[`a462ce0`](https://github.com/apache/spark/commit/a462ce0f929fbd18e708dfc19ca6ad3af8b41315).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #91920 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91920/testReport)**
 for PR 21068 at commit 
[`a462ce0`](https://github.com/apache/spark/commit/a462ce0f929fbd18e708dfc19ca6ad3af8b41315).


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-15 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/21068
  
lgtm

will leave open for a couple of days to let @tgravescs take a look


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-15 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/21068
  
Jenkins, retest this please


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91907/
Test FAILed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #91907 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91907/testReport)**
 for PR 21068 at commit 
[`a462ce0`](https://github.com/apache/spark/commit/a462ce0f929fbd18e708dfc19ca6ad3af8b41315).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Build finished. Test FAILed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91905/
Test FAILed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #91905 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91905/testReport)**
 for PR 21068 at commit 
[`848d050`](https://github.com/apache/spark/commit/848d050eda54f31b14286af966dc9358e35658a6).
 * This patch **fails Spark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-15 Thread attilapiros
Github user attilapiros commented on the issue:

https://github.com/apache/spark/pull/21068
  
Retested manually on a cluster and updated the PR's description with the 
result. 


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #91907 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91907/testReport)**
 for PR 21068 at commit 
[`a462ce0`](https://github.com/apache/spark/commit/a462ce0f929fbd18e708dfc19ca6ad3af8b41315).


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #91905 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91905/testReport)**
 for PR 21068 at commit 
[`848d050`](https://github.com/apache/spark/commit/848d050eda54f31b14286af966dc9358e35658a6).


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91860/
Test PASSed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #91860 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91860/testReport)**
 for PR 21068 at commit 
[`aa52f6e`](https://github.com/apache/spark/commit/aa52f6edb998d21e51d0d9a73351548034515a8e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #91860 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91860/testReport)**
 for PR 21068 at commit 
[`aa52f6e`](https://github.com/apache/spark/commit/aa52f6edb998d21e51d0d9a73351548034515a8e).


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91779/
Test PASSed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #91779 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91779/testReport)**
 for PR 21068 at commit 
[`7fce4ee`](https://github.com/apache/spark/commit/7fce4eec7294abb071200f1674293bfc2089f82b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #91779 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91779/testReport)**
 for PR 21068 at commit 
[`7fce4ee`](https://github.com/apache/spark/commit/7fce4eec7294abb071200f1674293bfc2089f82b).


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91764/
Test FAILed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #91764 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91764/testReport)**
 for PR 21068 at commit 
[`61f3d17`](https://github.com/apache/spark/commit/61f3d1718072c252298b6d8ddcca333d1cf122a3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #91764 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91764/testReport)**
 for PR 21068 at commit 
[`61f3d17`](https://github.com/apache/spark/commit/61f3d1718072c252298b6d8ddcca333d1cf122a3).


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-12 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/21068
  
Tom and I had a chance to discuss this in person, and after some back and 
forth I think we decided that maybe it's best to remove the limit but have the 
application fail if the entire cluster is blacklisted.  @tgravescs does that 
sound correct?

I mentioned this briefly to @attilapiros and he mentioned that might be 
hard, but instead you could stop allocation blacklisting, which would result in 
the usual yarn app failure from too many executor failures.  He's going to look 
at this a little more closely and report back here.  I'd be OK with that -- the 
main goal is just to make sure that an app doesn't hang if you've blacklisted 
the entire cluster.  I'm pretty sure that's @tgravescs's main concern as well.  
(If the only reasonable way to do that is with the existing limit, I'm fine w/ 
that too.)
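
A minimal sketch of the "fail if the entire cluster is blacklisted" check 
discussed above, written as a toy Python model rather than Spark's Scala 
code.  The function names and the guard for an unknown (zero) node count are 
assumptions made for illustration only.

```python
def is_all_node_blacklisted(num_cluster_nodes, blacklisted_nodes):
    """True when every known cluster node is on the blacklist.

    Hypothetical helper: the zero-node guard is an assumption so that an
    unknown cluster size is never treated as "everything is blacklisted".
    """
    return num_cluster_nodes > 0 and len(set(blacklisted_nodes)) >= num_cluster_nodes


def next_action(num_cluster_nodes, blacklisted_nodes):
    # Failing fast avoids an application that waits forever for containers
    # it has itself ruled out.
    if is_all_node_blacklisted(num_cluster_nodes, blacklisted_nodes):
        return "fail-application"
    return "keep-allocating"


partially_blacklisted = next_action(3, ["node1", "node2"])
fully_blacklisted = next_action(3, ["node1", "node2", "node3"])
```

With three nodes, blacklisting two keeps the application allocating, while 
blacklisting all three triggers the failure path instead of a hang.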


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-05-31 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/21068
  
hey sorry I have been meaning to respond to this but keep getting 
sidetracked.  As Tom and I are going to meet in person next week anyway, I 
figure at this point it makes sense to just wait till we chat directly to make 
sure we're on the same page.  It sounds like we're in agreement but at this 
point might as well wait a couple more days, as I haven't had a chance to do a 
final review anyway


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91307/
Test PASSed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-05-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #91307 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91307/testReport)**
 for PR 21068 at commit 
[`0e78b38`](https://github.com/apache/spark/commit/0e78b383b6f00cbcf7bab53885e7b38da0544dde).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-05-30 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21068
  
Well, the downside to that, compared with just failing the application, is 
similar to what @squito was mentioning: if the cluster is just busy and you 
can't get containers on those last few nodes, it could hang there for a long 
time.  More than likely all you are going to leave is a single node not 
blacklisted.  But I guess you can have that even with the BLACKLIST_RATIO, 
it's just that you can control that better.  I guess perhaps for this PR we 
just leave it as is, like @squito mentioned, and have it off by default, with 
a followup to add notification into the driver.  


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-05-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #91307 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91307/testReport)**
 for PR 21068 at commit 
[`0e78b38`](https://github.com/apache/spark/commit/0e78b383b6f00cbcf7bab53885e7b38da0544dde).


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-05-30 Thread attilapiros
Github user attilapiros commented on the issue:

https://github.com/apache/spark/pull/21068
  
@tgravescs what about removing the YARN_BLACKLIST_MAX_NODE_BLACKLIST_RATIO 
config, and when the set of blacklisted nodes reaches numClusterNodes I stop 
synchronising the blacklisted nodes toward YARN?  There would then still be 
some nodes not blacklisted (YARN keeps the previous blacklisted state, so it 
still differs from the state shown in the UI, but only for a short time), and 
the failures will be counted, so eventually the old mechanism using 
MAX_EXECUTOR_FAILURES (if configured) would stop the app. 

This way mostRelevantSubsetOfBlacklistedNodes() and the Expiry from the 
scheduler blacklisted nodes can be removed from the code.
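
A rough sketch of this proposal as a toy Python model, assuming a simplified 
tracker.  The class and its fields are hypothetical and are not Spark's 
`YarnAllocatorBlacklistTracker`; only the idea of skipping the sync once every 
node is blacklisted, and letting the executor-failure count end the app, comes 
from the comment above.

```python
class AllocatorBlacklistModel:
    """Toy model of the proposal; all names here are illustrative."""

    def __init__(self, num_cluster_nodes, max_executor_failures):
        self.num_cluster_nodes = num_cluster_nodes
        self.max_executor_failures = max_executor_failures
        self.blacklisted = set()      # nodes the application considers bad
        self.synced_to_yarn = set()   # last blacklist actually sent to YARN
        self.executor_failures = 0

    def record_launch_failure(self, node):
        self.executor_failures += 1
        self.blacklisted.add(node)
        # Synchronise only while at least one node would stay usable;
        # otherwise YARN keeps the previous (smaller) blacklist.
        if len(self.blacklisted) < self.num_cluster_nodes:
            self.synced_to_yarn = set(self.blacklisted)

    def should_fail_app(self):
        # The pre-existing failure limit is what finally stops the app.
        return self.executor_failures >= self.max_executor_failures


model = AllocatorBlacklistModel(num_cluster_nodes=2, max_executor_failures=3)
model.record_launch_failure("node1")
model.record_launch_failure("node2")  # whole cluster blacklisted: sync skipped
model.record_launch_failure("node2")  # YARN may still hand out node2
```

In this run YARN only ever sees `node1` on the blacklist, so it can still 
offer `node2`, and the third failure reaches the configured limit.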


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-05-29 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21068
  
so specifically on the limit, I'm ok with removing it as long as we have 
the basic check to fail.  I guess perhaps you are saying the limit and that 
check are essentially the same thing?   I was thinking that they were different 
in that if you remove the limit from yarn, then the driver and UI side wouldn't 
get out of sync since the only thing the yarn side would do is fail if it hit 
the condition that all nodes are blacklisted.   If you leave the limit as is, 
like you mention it could be a bit confusing to the user as it could acquire an 
executor on the node that was blacklisted but on the yarn side we don't enforce 
due to the limit.
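
To make the out-of-sync case concrete, here is a small Python sketch of a 
ratio-based limit.  It assumes the subset sent to YARN is chosen by latest 
expiry time; that ordering, the function name, and the parameters are 
illustrative assumptions, not the PR's actual 
`mostRelevantSubsetOfBlacklistedNodes` logic.

```python
def nodes_sent_to_yarn(blacklist_expiry, num_cluster_nodes, max_ratio):
    """Pick which blacklisted nodes are actually enforced on the YARN side.

    blacklist_expiry maps node name -> expiry timestamp (hypothetical shape).
    """
    limit = int(num_cluster_nodes * max_ratio)
    # Assumed relevance order: latest expiry first, node name as tie-breaker.
    ranked = sorted(blacklist_expiry, key=lambda n: (-blacklist_expiry[n], n))
    return set(ranked[:limit])


driver_view = {"a": 100, "b": 300, "c": 200}
enforced = nodes_sent_to_yarn(driver_view, num_cluster_nodes=4, max_ratio=0.5)
not_enforced = set(driver_view) - enforced
```

Here the driver and UI list three blacklisted nodes, but only two are 
enforced, so an executor could still be acquired on node `a`.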




---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-05-23 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/21068
  
I mean when `YarnAllocatorBlacklistTracker` decides to blacklist because of 
allocation failures, it doesn't send any message back to the driver -- so the 
driver doesn't have a msg in the logs, nor in the event log nor a UI update.  
So in client mode, the user would need to get AM logs to know what was going on.

Attila wanted to do it this way because of 
`mostRelevantSubsetOfBlacklistedNodes` -- it seemed weird to send an update to 
the driver when the blacklisting wasn't necessarily even in effect.  Though now 
that I'm thinking about this, maybe it should just send the update anyway, even 
though that blacklist may effectively be ignored.

Re: starvation -- I agree, though "eventually" for resources can be so long 
in practice that to users it all looks the same.

Anyway, though you say you're OK with removing the limit, it seems like you 
feel more strongly about this than I do.  So I think we can keep it; I don't 
think it prevents us from doing something else down the road.

I do think we should add the notification to the driver, including a 
listener event, which just ignores `mostRelevantSubsetOfBlacklistedNodes`, 
unless anyone has a reason for not doing it.  I suggest @attilapiros does that 
in a followup.

If that plan sounds OK, then this is probably nearly ready to merge.  But 
it's been a little while since I've looked closely so I'll do another pass 
(probably tomorrow).


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-05-23 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21068
  
What do you mean by adding notification to the driver? Like I mentioned, I'm 
fine with removing the limit for now, but I think we have to do something here 
if the entire cluster gets blacklisted, otherwise users' jobs will just hang.  
It's one thing if resources aren't available at the moment (as that can happen 
regardless of blacklisting) and the assumption is they will eventually become 
available, but if spark has blacklisted all the nodes in the cluster we should 
just fail if we aren't going to run. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-05-23 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/21068
  
I totally understand your motivation for wanting the limit.  But I'm trying 
to balance that against behavior which might not really achieve the desired 
effect and be even more confusing in some cases.

It won't achieve the desired effect if your cluster has more nodes, but 
they're all tied up in other applications.  It'll be confusing to users if they 
see notification about blacklisting in the logs and UI, but then still see 
spark trying to use those nodes anyway.  I wonder if putting this in will make 
it hard

All that said, I don't have a great alternative now, other than just 
removing the limit entirely for the moment and adding notification to the 
driver.  We could have a more general starvation detector, which wouldn't only 
look at node count, but also look at delays in acquiring containers and finding 
places to schedule tasks (related to SPARK-15815 & SPARK-22148), but I don't 
want to tackle all of that here.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-05-22 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21068
  
Ah, sorry, haven't had time to get back to this.  Yeah, the driver 
interaction could be an issue.  But whether it's the limit or just the yarn side 
blacklisting I think you would need some interaction there, right?  Or you 
would have to have similar logic that says all nodes are blacklisted on the yarn 
side and tells the application to fail.  Otherwise you could blacklist the 
entire cluster based on container launch failures and it would be stuck because 
the driver blacklist wouldn't know about it.  

Personally I'd rather see a limit than the current failure, as I 
think it would be more robust.  In my opinion I would rather try it at some 
point and have it just fail on the max task failures than not try at all.  I've 
seen jobs fail if they only have 1 executor that gets blacklisted that could 
have worked fine if retried. The blacklisting logic isn't perfect.  We do have 
the kill on blacklist, which I haven't used much at this point, which would also 
help that I guess.

I guess for this I'm fine with removing the limit for now, since that is 
the current behavior on the driver side and communicating back to the driver 
blacklist could be complicated.  We do need to handle the issue of all nodes 
being blacklisted on the yarn side, though.  

I was going to say this could just be handled by making sure 
spark.yarn.max.executor.failures is sane, but I don't think that is really 
the case now: with dynamic allocation it's just based on Int.MaxValue or 
whatever the user specifies, which could have nothing to do with the actual 
cluster size.  You might also have a small cluster where someone wants to try 
hard and allow it to fail twice per node or something like that if the yarn 
blacklisting is off.  So do we just need another check to fail if all nodes, or 
a certain percentage of them, are blacklisted?  Did you have something in mind 
to replace the limit?
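
For illustration only, the kind of check asked about above could look roughly like the Scala sketch below. `BlacklistStarvationCheck`, `shouldFail` and `maxBlacklistedRatio` are hypothetical names invented for this example and are not existing Spark APIs or configs; the sketch also assumes the allocator knows the total node count, which may not hold for every cluster manager.

```scala
// Hypothetical helper: names and the ratio parameter are illustrative, not Spark APIs.
object BlacklistStarvationCheck {

  /**
   * Returns true when the application should give up: the cluster size is
   * known and the blacklisted share of nodes has reached `maxBlacklistedRatio`.
   * With the default ratio of 1.0 this only fires when every node is blacklisted.
   */
  def shouldFail(
      blacklistedNodes: Set[String],
      numClusterNodes: Int,
      maxBlacklistedRatio: Double = 1.0): Boolean = {
    require(maxBlacklistedRatio > 0.0 && maxBlacklistedRatio <= 1.0,
      "maxBlacklistedRatio must be in (0, 1]")
    // An unknown or empty cluster gives no basis for failing the application.
    numClusterNodes > 0 &&
      blacklistedNodes.size.toDouble / numClusterNodes >= maxBlacklistedRatio
  }
}
```

A node-count check like this is best effort at most: the nodes that are not blacklisted may still be busy or too small to run the executors.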





---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-05-21 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/21068
  
ping @tgravescs .  honestly I still don't love the blacklist limit, 
especially since it makes reporting back to the driver pretty confusing, and I 
don't think it buys us much.  But I can live with it.  and otherwise I think 
this is ready.

I've also looked at Attila's tests on a real cluster


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90125/
Test PASSed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-05-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #90125 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90125/testReport)**
 for PR 21068 at commit 
[`2a8ab8d`](https://github.com/apache/spark/commit/2a8ab8d818fa92e563e31e2d904d3ca6871b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-05-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #90125 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90125/testReport)**
 for PR 21068 at commit 
[`2a8ab8d`](https://github.com/apache/spark/commit/2a8ab8d818fa92e563e31e2d904d3ca6871b).


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-05-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90062/
Test FAILed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-05-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-05-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #90062 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90062/testReport)**
 for PR 21068 at commit 
[`5760d22`](https://github.com/apache/spark/commit/5760d2285f6e473d1e79a5e9680960069d6205f3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-05-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #90062 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90062/testReport)**
 for PR 21068 at commit 
[`5760d22`](https://github.com/apache/spark/commit/5760d2285f6e473d1e79a5e9680960069d6205f3).


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-05-02 Thread attilapiros
Github user attilapiros commented on the issue:

https://github.com/apache/spark/pull/21068
  
Jenkins, retest this please


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-05-02 Thread attilapiros
Github user attilapiros commented on the issue:

https://github.com/apache/spark/pull/21068
  
I assume it is just a flaky R test.
Jenkins retest this please


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-05-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90043/
Test FAILed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-05-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-05-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #90043 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90043/testReport)**
 for PR 21068 at commit 
[`5760d22`](https://github.com/apache/spark/commit/5760d2285f6e473d1e79a5e9680960069d6205f3).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-05-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #90043 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90043/testReport)**
 for PR 21068 at commit 
[`5760d22`](https://github.com/apache/spark/commit/5760d2285f6e473d1e79a5e9680960069d6205f3).


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89903/
Test PASSed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #89903 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89903/testReport)**
 for PR 21068 at commit 
[`17bbbee`](https://github.com/apache/spark/commit/17bbbee0cf952a32e44fd0767bba08814e351da2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89898/
Test FAILed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #89898 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89898/testReport)**
 for PR 21068 at commit 
[`4df2311`](https://github.com/apache/spark/commit/4df231177343e6be04ec76d8c65e886763a5a152).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #89903 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89903/testReport)**
 for PR 21068 at commit 
[`17bbbee`](https://github.com/apache/spark/commit/17bbbee0cf952a32e44fd0767bba08814e351da2).


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #89898 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89898/testReport)**
 for PR 21068 at commit 
[`4df2311`](https://github.com/apache/spark/commit/4df231177343e6be04ec76d8c65e886763a5a152).


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89889/
Test FAILed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #89889 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89889/testReport)**
 for PR 21068 at commit 
[`0ba8510`](https://github.com/apache/spark/commit/0ba85108584d4e2c5649679a10543f9d2cfe367c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #89889 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89889/testReport)**
 for PR 21068 at commit 
[`0ba8510`](https://github.com/apache/spark/commit/0ba85108584d4e2c5649679a10543f9d2cfe367c).


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-25 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/21068
  
A couple more high-level thoughts:

1) Do we want to have an event posted about the node getting blacklisted?  I 
think it would be useful.  But then there needs to be a msg from the 
YarnAllocator back to the driver about the blacklisting.

2) I was thinking about how this interacts with 
[SPARK-13669](https://issues.apache.org/jira/browse/SPARK-13669).  at first I 
was thinking this makes that entirely unnecessary, but I guess that is not true 
-- that is still useful if the external shuffle service goes down *after* the 
executor is started.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-20 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21068
  
ok sounds fine to me, so we should review as is then


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-19 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/21068
  
@tgravescs on the blacklist ratio for task-based blacklisting -- there is 
nothing, but there are some related jiras: 
[SPARK-22148](https://issues.apache.org/jira/browse/SPARK-22148) & 
[SPARK-15815](https://issues.apache.org/jira/browse/SPARK-15815)

to be honest I have doubts about the utility of the ratio ... if you really 
want to make sure blacklisting doesn't lead to starvation, you've got to have 
some other mechanism, as you could easily have the remaining nodes be occupied 
or have insufficient resources.

Kubernetes doesn't do anything with the node blacklisting currently: 
[SPARK-23485](https://issues.apache.org/jira/browse/SPARK-23485)

Mesos already has a notion of blacklisting nodes for failing to allocate 
containers, but it's currently at odds with the task-based blacklist.  
https://github.com/apache/spark/pull/20640 is somewhat stalled because 
blacklisting based on allocation failures is missing in a general sense.

In any case, I still think we shouldn't make the code more complex for 
something other cluster managers *might* use in the future, and that the 
current overall organization is fine.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89514/
Test PASSed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #89514 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89514/testReport)**
 for PR 21068 at commit 
[`c92a090`](https://github.com/apache/spark/commit/c92a090e6e3c1dc5776eef1946a28b45731e128b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-18 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21068
  
thanks for filing that jira @squito, I agree we should have blacklisting 
work with dynamic allocation disabled as well. (A bit of a tangent from this 
jira) I'm actually wondering now about the scheduler blacklisting and whether 
it should have a max blacklisted Ratio as well.  I don't remember if we 
discussed this previously.  

For this, I'm fine either way.  If there are people interested in doing the 
mesos/kubernetes stuff now, we could certainly coordinate with them to see if 
there is something common we could do now.  I haven't had time to keep up with 
those jiras to know, though.  Otherwise this isn't public facing, so we can do 
that when they decide to implement it.



---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-18 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/21068
  
I think Tom makes a good case for why this should live in the YarnAllocator 
as you have it.

I also don't think you need to worry about creating an abstract class yet, 
that refactoring can be done when another cluster manager tries to share some 
code ... it would just be helpful to keep that use in mind.

also I filed https://issues.apache.org/jira/browse/SPARK-24016 for updating 
the task-based node blacklist even with static allocation


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #89514 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89514/testReport)**
 for PR 21068 at commit 
[`c92a090`](https://github.com/apache/spark/commit/c92a090e6e3c1dc5776eef1946a28b45731e128b).


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-17 Thread attilapiros
Github user attilapiros commented on the issue:

https://github.com/apache/spark/pull/21068
  
Yes, we can create an abstract class from `YarnAllocatorBlacklistTracker` 
(like `AbstractAllocatorBlacklistTracker`) where the method 
`synchronizeBlacklistedNodes` can have different implementations. In this case 
the core and the messages can stay as they are. As I see it, this is the less 
risky and cheaper solution. On the other hand, having the complete blacklisting 
in the driver gives a more centralized and clearer design. 

We just have to make up our minds about where to go from here. Any help and 
suggestions are welcome for the decision.
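
To make the abstract-class option concrete, here is a minimal, self-contained Scala sketch. Only the names `AbstractAllocatorBlacklistTracker` and `synchronizeBlacklistedNodes` come from this discussion; the constructor parameter, the failure counting, and `RecordingBlacklistTracker` are assumptions made up for illustration and do not reflect the code in this PR.

```scala
import scala.collection.mutable

// Hypothetical sketch: shared bookkeeping lives in the base class, and each
// cluster manager only decides how the merged blacklist is pushed out.
abstract class AbstractAllocatorBlacklistTracker(maxFailuresPerHost: Int) {
  private val failuresPerHost = mutable.Map.empty[String, Int].withDefaultValue(0)
  private var schedulerBlacklist: Set[String] = Set.empty

  /** Nodes blacklisted because executor allocation failed on them too often. */
  def allocatorBlacklist: Set[String] =
    failuresPerHost.filter(_._2 >= maxFailuresPerHost).keys.toSet

  /** Records one allocation failure and pushes the merged blacklist. */
  def handleAllocationFailure(host: String): Unit = {
    failuresPerHost(host) += 1
    synchronizeBlacklistedNodes(allocatorBlacklist ++ schedulerBlacklist)
  }

  /** Replaces the scheduler-side (task-based) blacklist and pushes the merged set. */
  def setSchedulerBlacklistedNodes(nodes: Set[String]): Unit = {
    schedulerBlacklist = nodes
    synchronizeBlacklistedNodes(allocatorBlacklist ++ schedulerBlacklist)
  }

  /** Cluster-manager specific: tell the resource manager which nodes to avoid. */
  protected def synchronizeBlacklistedNodes(nodes: Set[String]): Unit
}

// Stand-in implementation that only records what would be sent to the
// resource manager; a YARN version would call into its client here instead.
class RecordingBlacklistTracker(maxFailures: Int)
    extends AbstractAllocatorBlacklistTracker(maxFailures) {
  var lastPushed: Set[String] = Set.empty

  override protected def synchronizeBlacklistedNodes(nodes: Set[String]): Unit = {
    lastPushed = nodes
  }
}
```

In this shape the core and the messages stay untouched, and a Mesos or Kubernetes tracker would only need to override `synchronizeBlacklistedNodes`.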


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-17 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/21068
  
> actually the only other thing I need to make sure is there aren't any 
delays if we now send the information from yarn allocator back to scheduler and 
then I assume it would need to get it back again from scheduler. During that 
the yarn allocator could be calling allocate() and updating things. So we need 
to make sure it gets the most up to date blacklist.

> also I need to double check but the blacklist information isn't being 
sent to the yarn allocator when dynamic allocation is off right? We would want 
that to happen.

yeah both good points.  actually, don't we want to update the general node 
blacklist on the yarn allocator even when dynamic allocation is off?  I don't 
think it gets updated at all unless dynamic allocation is on, it seems all the 
updates originate in `ExecutorAllocationManager`, the blacklist never actively 
pushes updates to the yarn allocator.  That seems like an existing shortcoming.

> do you know if mesos and/or kubernetes can provide this same information?

I don't know about kubernetes at all.  Mesos does provide info when a 
container fails.  I don't think it lets you know the total cluster size, but 
that should be optional.  Btw, node count is never going to be totally 
sufficient, as the remaining nodes might not actually be able to run your 
executors (smaller hardware, always taken up by higher priority applications, 
other constraints in a framework like mesos), so it's always going to be best 
effort.

@attilapiros and I discussed this briefly yesterday, an alternative to 
moving everything into the BlacklistTracker on the driver is to just have some 
abstract base class, which is changed slightly for each cluster manager.  Then 
you could keep the flow like it is here, with the extra blacklisting living in 
YarnAllocator still.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-17 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21068
  
actually the only other thing I need to make sure is there aren't any 
delays if we now send the information from yarn allocator back to scheduler and 
then I assume it would need to get it back again from scheduler.  During that 
the yarn allocator could be calling allocate() and updating things.  So we need 
to make sure it gets the most up to date blacklist.

also I need to double check but the blacklist information isn't being sent 
to the yarn allocator when dynamic allocation is off right?  We would want that 
to happen.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89373/
Test PASSed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #89373 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89373/testReport)**
 for PR 21068 at commit 
[`57086bb`](https://github.com/apache/spark/commit/57086bb1369a522e19bc92f64607b453743605c7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #89373 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89373/testReport)**
 for PR 21068 at commit 
[`57086bb`](https://github.com/apache/spark/commit/57086bb1369a522e19bc92f64607b453743605c7).


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89355/
Test PASSed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #89355 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89355/testReport)**
 for PR 21068 at commit 
[`e49bd0d`](https://github.com/apache/spark/commit/e49bd0de5c25df4eb65ba975e948e043c0e076cf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89350/
Test FAILed.


---



