[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-06-26 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20640
  
great, any other comment before merging?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-06-21 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/20640
  
fyi I just merged SPARK-16330, which is currently very yarn specific, but I 
think you could easily refactor that code to reuse a lot for mesos too.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20640
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91930/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20640
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20640
  
**[Test build #91930 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91930/testReport)**
 for PR 20640 at commit 
[`5eda874`](https://github.com/apache/spark/commit/5eda874e1b9b05396c57413b743995201e02ec3d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20640
  
**[Test build #91930 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91930/testReport)**
 for PR 20640 at commit 
[`5eda874`](https://github.com/apache/spark/commit/5eda874e1b9b05396c57413b743995201e02ec3d).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-06-15 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/20640
  
> @squito reading the code here: 
https://github.com/apache/spark/pull/21068/files
is there an option to update the info about blacklisted node when there is 
a mesos task failure.
It is a bit inconvenient to lose such events and wait for spark tasks to 
fail which may never launch since you dont have any executors running anyway.

That change does not cover mesos, to keep the scope a little smaller and 
because none of us really know how to test on mesos.  But it should be pretty 
straightforward to refactor whats there so you could use the same general logic 
in mesos.  I think it would be easy for you to unify a lot of that, even the 
failure validity interval stuff.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-06-15 Thread skonto
Github user skonto commented on the issue:

https://github.com/apache/spark/pull/20640
  
@IgorBerman I saw the ticket. We need to cover mesos task failures because 
they are also pretty common.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20640
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20640
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91911/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20640
  
**[Test build #91911 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91911/testReport)**
 for PR 20640 at commit 
[`2c47271`](https://github.com/apache/spark/commit/2c47271176b82e4859667ede9bb02b28b8fba50e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20640
  
**[Test build #91911 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91911/testReport)**
 for PR 20640 at commit 
[`2c47271`](https://github.com/apache/spark/commit/2c47271176b82e4859667ede9bb02b28b8fba50e).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-06-15 Thread IgorBerman
Github user IgorBerman commented on the issue:

https://github.com/apache/spark/pull/20640
  
@felixcheung sure, no problem. I'll open jira, and will add comment


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-06-12 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20640
  
@IgorBerman any thought on this comment? 
https://github.com/apache/spark/pull/20640#discussion_r191272487


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-06-10 Thread IgorBerman
Github user IgorBerman commented on the issue:

https://github.com/apache/spark/pull/20640
  
@felixcheung sorry I missed something? 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20640
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-06-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20640
  
**[Test build #91639 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91639/testReport)**
 for PR 20640 at commit 
[`a7ff8cc`](https://github.com/apache/spark/commit/a7ff8cccd1b7e5564880c40c503c169c6bed46b9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20640
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91639/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-06-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20640
  
**[Test build #91639 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91639/testReport)**
 for PR 20640 at commit 
[`a7ff8cc`](https://github.com/apache/spark/commit/a7ff8cccd1b7e5564880c40c503c169c6bed46b9).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-06-10 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20640
  
ping @IgorBerman 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-06-10 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20640
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20640
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-05-25 Thread susanxhuynh
Github user susanxhuynh commented on the issue:

https://github.com/apache/spark/pull/20640
  
I'm in favor of merging this. The hardcoded limit is pretty bad - 
particularly for streaming jobs; it would be preferable to remove it ASAP even 
though it may not be a complete solution.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-05-25 Thread skonto
Github user skonto commented on the issue:

https://github.com/apache/spark/pull/20640
  
sure @squito .


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-05-21 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/20640
  
thats fine with me, but as I'm neither a user of mesos nor am I in touch w/ 
many mesos users, I'd like wait a bit for more opinions, given the 
ramifications of this change. (that shouldn't block work on the better version, 
if anybody wants to take that on ...) @IgorBerman @susanxhuynh

blacklisting in yarn (at least, what spark does internally already) is 
really not much more sophisticated, at least before SPARK-16630; spark does 
tell yarn which hosts it has blacklisted so it wants to avoid for future 
executors, but thats about it.  Yarn itself is doing a little more as well, as 
it has its own disk health checker etc., and it'll try to exclude resources 
from *all* applications if it thinks they are bad.  but that is independent of 
changes within spark itself.

also I'd like to see a jira for mesos to discuss the other improvements 
we've discussed here to be more like SPARK-16630 so we don't forget about it.  
@skonto can you file that jira and try to capture some of the points that have 
been raised in the discussion here? (or maybe that jira exists already?)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-05-19 Thread skonto
Github user skonto commented on the issue:

https://github.com/apache/spark/pull/20640
  
@felixcheung  @squito let's merge this to avoid the limitation of 
MAX_SLAVE_FAILURES = 2 (this is not good at all) and then for sake of 
completeness and correctness we adapt the fix in SPARK-16630 to mesos.
I think blacklisting code for yarn is more sophisticated anyway. Thoughts?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-04-19 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/20640
  
Hi Everyone, just wanted to let you know that SPARK-16630 is progressing 
now https://github.com/apache/spark/pull/21068 and after some discussion, the 
implementation will actually need to live inside the cluster manager code for 
yarn.  But Mesos should be able to do something very similar, in fact I suspect 
you could refactor a lot of that code so that its used by mesos as well -- in 
principle the allocation blacklist just needs to know when container allocation 
fails, and optionally the node count of the cluster.  I don't totally 
understand where things live in mesos and if the ApplicationMaster / Driver 
distinction is present there as well.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-04-01 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20640
  
@skonto, @squito , @kayousterhout ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-03-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20640
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-03-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20640
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88628/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-03-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20640
  
**[Test build #88628 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88628/testReport)**
 for PR 20640 at commit 
[`a7ff8cc`](https://github.com/apache/spark/commit/a7ff8cccd1b7e5564880c40c503c169c6bed46b9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-03-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20640
  
**[Test build #88628 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88628/testReport)**
 for PR 20640 at commit 
[`a7ff8cc`](https://github.com/apache/spark/commit/a7ff8cccd1b7e5564880c40c503c169c6bed46b9).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-03-27 Thread hantuzun
Github user hantuzun commented on the issue:

https://github.com/apache/spark/pull/20640
  
Thanks a lot for the work here.

Wouldn't it be better if we change the log in the last commit 
(https://github.com/apache/spark/pull/20640/commits/66ed5afae1a5b4856e84c805c7858d552f38b26a)
 like below? 
```diff
- Mesos slave $slaveId failed
+ Task $taskId failed on Mesos slave $slaveId.
```

I'm not sure about logging convention of the project but I feel like 
logging it as `error` could be an option.

---

OOT: Having Spark 2.3 released, how could we have 
https://spark.apache.org/versioning-policy.html updated with Spark 2.4 release 
window?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20640
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88486/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-03-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20640
  
**[Test build #88486 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88486/testReport)**
 for PR 20640 at commit 
[`66ed5af`](https://github.com/apache/spark/commit/66ed5afae1a5b4856e84c805c7858d552f38b26a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20640
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-03-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20640
  
**[Test build #88486 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88486/testReport)**
 for PR 20640 at commit 
[`66ed5af`](https://github.com/apache/spark/commit/66ed5afae1a5b4856e84c805c7858d552f38b26a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-03-21 Thread IgorBerman
Github user IgorBerman commented on the issue:

https://github.com/apache/spark/pull/20640
  
@skonto added logging when tasks failed


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-03-20 Thread skonto
Github user skonto commented on the issue:

https://github.com/apache/spark/pull/20640
  
@squito @IgorBerman let's move on with this.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-03-13 Thread susanxhuynh
Github user susanxhuynh commented on the issue:

https://github.com/apache/spark/pull/20640
  
@skonto Yes, I'm ok with that. Sorry for the delayed response.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-03-13 Thread skonto
Github user skonto commented on the issue:

https://github.com/apache/spark/pull/20640
  
@susanxhuynh are you ok with that?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-26 Thread IgorBerman
Github user IgorBerman commented on the issue:

https://github.com/apache/spark/pull/20640
  
@skonto, @susanxhuynh, @squito So let's agree that:
1. I'll revert log when there is some failure. I'll reword it to be 
something without "blacklisting"
2. The blacklisting itself will be moved to BlacklistTracker(as it now)

bottom line the only thing missing - adding log in a case of failure(but 
without counting number of failures etc)

WDYT?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-22 Thread skonto
Github user skonto commented on the issue:

https://github.com/apache/spark/pull/20640
  
@susanxhuynh I agree in case it is not enabled we can log failures as 
usual, but not for blacklisting as it is disabled it wouldnt make sense. User 
should have this option not to care. 

> In the case where an executor fails before entering Spark code (for 
example, Mesos agent failed to create the sandbox), would it be detected?

Good question forgot to mention this. In this scenario a task status update 
will be given
eg. 
[REASON_CONTAINER_LAUNCH_FAILED](https://github.com/apache/mesos/blob/5e5a8102c3281db25a37157dac123b0ca546e030/docs/task-state-reasons.md)

This is done implicitly 
[here](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala#L658)
  in status update which then calls [removeExecutor 
](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala#L728)
 which then sends a message to drivers point to remove the executor and then 
this 
[line](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L330)
 is called which then will calls another helper 
[method](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L536),
 which calls this [one](ht
 
tps://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L595)
 and in there blacklist info is updated.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-22 Thread susanxhuynh
Github user susanxhuynh commented on the issue:

https://github.com/apache/spark/pull/20640
  
@skonto We should not remove the logging. The logging 
[here](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala#L194)
 is only available if blacklisting is enabled, but by default blacklisting is 
disabled. The BlacklistTracker object [is not 
created](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L748)
 if blacklisting is disabled. But, we might still want to see the log of 
executor failure in this case.

In the case where an executor fails before entering Spark code (for 
example, Mesos agent failed to create the sandbox), would it be detected?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-22 Thread skonto
Github user skonto commented on the issue:

https://github.com/apache/spark/pull/20640
  
@squito @IgorBerman a few points:
a) This should be configurable. SPARK-16630 is not closed, it seems also 
kubernetes needs to support checking for blacklisted nodes when launching 
tasks. From what I see [here 
](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L172)
 though, the CoarseGrainedSchedulerBackend checks if the executor is acceptable 
when it registers back to the backend. So if it is launched on a black listed 
node it is removed immediately. Of course we want to fail fast and thus, in 
kubernetes case we should also have code to exclude the blacklisted nodes when 
launching pods. See 
[here](https://github.com/kubernetes/kubernetes/issues/14573) and 
[here](https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/).
 This is essentially the same concept like in 
[yarn](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/resource-managers/yarn/src/main/
 scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala#L127-L133).
 
b) I see TaskSchedulerImpl a higher level abstraction compared 
CoarseGrainedSchedulerBackend (this is subclassed by all major backends).  The 
thing is blacklist info is kept in TaskSchedulerImpl and all backends update it 
implicitly in different paths.  CoarseGrainedSchedulerBackend updates it when 
there is a new status update for 
[example](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L122).
 At the end of this path 
[handleFailedTask](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L772)
 is called which updates the related info.
A more interesting path for the discussion is when an executor disconnects 
and this is where 
[Yarn](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala#L213)
 and 
[Kubernetes](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala#L428)
 override the driver's endpoint to set 
[criteria](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L356-L366)
 for when a executor is considered lost and task failure should be taken into 
consideration for blacklisting. The reason you want this is preemption, or for 
mesos ,besides that, could be a scenario where we have an agent restarting 
([process is restarte
 d](http://mesos.apache.org/documentation/latest/agent-recovery/)) and then 
executors are killed (framework checkpointing not enabled, eg. when upgrading). 
In the latter case if I restart my agent process I dont want to blacklist it, 
my slave is ready to go as soon as it reconnects to master. I also dont want to 
wait for blacklisting to expire. 

To move forward, I suggest:
a) remove the handling logic for blacklisting in mesos. Logging is done 
when a node is blacklisted 
[here](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala#L194).
  
b) override logic at driver's point when there is a 
[need](http://mesos.apache.org/documentation/latest/oversubscription/) for 
doing so. 



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-21 Thread IgorBerman
Github user IgorBerman commented on the issue:

https://github.com/apache/spark/pull/20640
  
@squito, @susanxhuynh  no worries, I have my internal patch.

As a side note, I'm starting to think that my first proposition of 
introducing new config that will just override this hardcoded constant(2) will 
be most pragmatic approach in this situation. It's not clear to me when 
SPARK-16630 will be resolved(open from 1.6). On the other hand, the price would 
be to introduce new config(maybe without documenting it) for people that need 
fix which will be deprecated as soon as SPARK-16630 + this change-set will 
provide as better systematic solution.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-21 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/20640
  
sure @skonto, great to have somebody more knowledgable on mesos taking a 
closer look at this.

sorry @IgorBerman I promised a quick fix here, but have realized this is 
more complicated than we originally thought.  but the discussion is moving 
forward at least, so thanks for driving it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-21 Thread skonto
Github user skonto commented on the issue:

https://github.com/apache/spark/pull/20640
  
@squito before you merge anything give me some time to have a look. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-21 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/20640
  
cc @attilapiros , you may be interested b/c of how this relates to 
SPARK-16630


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-21 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/20640
  
ok hmm ... so actually this change would lose some important functionality 
then.  unfortunately I don't have a clear picture yet of how to solve 
SPARK-16630 along with the other blacklisting.

sorry this might actually need a more general solution, need to think about 
it a bit more ...


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-21 Thread IgorBerman
Github user IgorBerman commented on the issue:

https://github.com/apache/spark/pull/20640
  
@squito, yes failed to start mesos executors caused nodes(slaves) to be 
blacklisted


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-21 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/20640
  
@susanxhuynh good point about changing default behavior.  I'd rather have 
the change so we have more unified behavior between mesos and other cluster 
managers.  But I have never run spark on mesos or even had much interaction 
with users of spark on mesos, so I will defer to others judgement.  Another 
option: we could leave the old behavior, unless a user sets 
spark.blacklist.enabled=true.  its a little wonky, but that also guarantees you 
always get some blacklisting.

I've also been considering turning blacklisting on by default in spark 2.4. 
 So far I've had good feedback from users running it (though we'll get way more 
feedback when its on by default).

btw, one hole in the general blacklisting is handling cases when the 
executors fail to even start: https://issues.apache.org/jira/browse/SPARK-16630

would that be covered by mesos with the old code?  just want to make sure 
we aren't losing that ability.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-21 Thread IgorBerman
Github user IgorBerman commented on the issue:

https://github.com/apache/spark/pull/20640
  
@squito I'm with debug logging and see declines. Overall it's working. 
However there are different possible reasons for declines not necessary due to 
blacklisted node(Actually there is jira that talks about enriching this 
information about declines)

Regarding "much effect" - I mean that killing executor manually is not 
necessary causing BlacklistTracker to add it to the blacklist(driver usually 
accepts offer from other slave), so chances that I'll cause enough failures on 
same executor are pretty low. Maybe I'll reduce all blacklist.max* to be 1, 
thus aggressively blacklisting only 1 failure. WDYT?

Regarding "other bug", which is probably not a bug...My understanding was 
that in coarse grained mode only 1 executor possible per application per node, 
but might be I'm wrong.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-21 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/20640
  
thanks @IgorBerman, description looks fine to me now, maybe I saw it wrong 
before.

your test sounds pretty good to me ... you could turn on debug logging for 
MesosCoarseGrainedSchedulerBackend and look for these log msgs:


https://github.com/apache/spark/blob/master/resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala#L603

What do you mean "it didn't have much effect" -- sounds like it did exactly 
the right thing?

Sorry, I don't really understand description of the other bug you 
mentioned.  Why shouldn't it start a 2nd executor on the same slave for the 
same application?  That seems fine until you have enough failures for the node 
blacklisting to take effect.  There is also a small race (that is relatively 
benign) that would allow you to get a an executor on a node which you are in 
the middle of blacklisting.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-21 Thread IgorBerman
Github user IgorBerman commented on the issue:

https://github.com/apache/spark/pull/20640
  
@squito so I've run internal workload with blacklisting on, with killing 
executors for the framework manually. It didn't have much effect(only once got 
blacklisting 1 executor and then after timeout it removed it)
If you have some idea how to test it properly I can try to do it 
specifically aiming to this scenario of rejecting offers from mesos master for 
blacklisted slave resource.

ps: not strictly connected to this conversation, but as a side note: I've 
probably found bug that running in coarse grained mode on mesos + 
dyn.allocation on + blacklisting on(not sure if it's relevant) + client mode 
and killing manually executors on some slaves makes spark driver to start 2nd 
executor on same slave for the same application(which is against constraints of 
coarse grained mode?)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-21 Thread IgorBerman
Github user IgorBerman commented on the issue:

https://github.com/apache/spark/pull/20640
  
@squito updated to your wording. Where do you see headers duplicated? 
I'm going to test custom build that contains this patch(actually on top of 
2.2.0) to verify that basic blacklisting works for executor level blacklisting 
and that offers are rejected up to timeout duration.
So I'll update you about my findings/testing results


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20640
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20640
  
**[Test build #87580 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87580/testReport)**
 for PR 20640 at commit 
[`e2ddc1b`](https://github.com/apache/spark/commit/e2ddc1be19e2f978df4fe84073aff3f5b46afe45).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20640
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87580/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20640
  
**[Test build #87580 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87580/testReport)**
 for PR 20640 at commit 
[`e2ddc1b`](https://github.com/apache/spark/commit/e2ddc1be19e2f978df4fe84073aff3f5b46afe45).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/20640
  
lgtm

@IgorBerman can you cleanup the PR description a little?  headers got 
duplicated.  And I'd reword a bit to something like

> This updates the Mesos scheduler to integrate with the common logic for 
node-blacklisting across all cluster managers in BlacklistTracker.  
Specifically, it removes a hardcoded MAX_SLAVE_FAILURES = 2 in 
MesosCoarseGrainedSchedulerBackend, and uses the blacklist from the 
BlacklistTracker, as Yarn does.

> This closes https://github.com/apache/spark/pull/17619

(the last "this closes" bit is useful for some tools we have, it will close 
the other one when this is merged.)

thanks for doing this.  I will probably leave this open for a bit if more 
mesos users have thoughts


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20640
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87564/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20640
  
**[Test build #87564 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87564/testReport)**
 for PR 20640 at commit 
[`2f1db89`](https://github.com/apache/spark/commit/2f1db89de162a05a1cee4c1221de52eacbcbdb68).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20640
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20640
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20640
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87563/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20640
  
**[Test build #87563 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87563/testReport)**
 for PR 20640 at commit 
[`1eb701c`](https://github.com/apache/spark/commit/1eb701cdd07652d4f196bc960b7d8218c0c46ba0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20640
  
**[Test build #87564 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87564/testReport)**
 for PR 20640 at commit 
[`2f1db89`](https://github.com/apache/spark/commit/2f1db89de162a05a1cee4c1221de52eacbcbdb68).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20640
  
**[Test build #87563 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87563/testReport)**
 for PR 20640 at commit 
[`1eb701c`](https://github.com/apache/spark/commit/1eb701cdd07652d4f196bc960b7d8218c0c46ba0).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20640
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread IgorBerman
Github user IgorBerman commented on the issue:

https://github.com/apache/spark/pull/20640
  
@squito ok, I've cherry-picked @ 2 commits, so the history will contain his 
contribution and added on top of it commit with your proposition, hope now it's 
ok


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20640
  
**[Test build #87562 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87562/testReport)**
 for PR 20640 at commit 
[`918e066`](https://github.com/apache/spark/commit/918e066cbb661110b6779c558a5377147a5b1d1e).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20640
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87562/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20640
  
**[Test build #87562 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87562/testReport)**
 for PR 20640 at commit 
[`918e066`](https://github.com/apache/spark/commit/918e066cbb661110b6779c558a5377147a5b1d1e).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/20640
  
thanks for updating.  can you also update the PR description?

yeah its fine to just update this one.  You can't in general update others' 
prs, unless they give you push permissions to their repos.  You *can* start 
from their branch, and then add your changes on top -- that is a little 
preferable as that way the commit history includes their work, so when someone 
merges its a bit more obvious.  If you can adjust this PR that way, that would 
be nice -- but otherwise its OK, I will try to remember to adjust attribution 
when merging


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20640
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87558/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20640
  
**[Test build #87558 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87558/testReport)**
 for PR 20640 at commit 
[`2240259`](https://github.com/apache/spark/commit/22402594645b3ee14106da61cb401d6555ba2e1b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20640
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20640
  
**[Test build #87558 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87558/testReport)**
 for PR 20640 at commit 
[`2240259`](https://github.com/apache/spark/commit/22402594645b3ee14106da61cb401d6555ba2e1b).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread IgorBerman
Github user IgorBerman commented on the issue:

https://github.com/apache/spark/pull/20640
  
@squito updated


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread IgorBerman
Github user IgorBerman commented on the issue:

https://github.com/apache/spark/pull/20640
  
@squito ok, I can do this small fix to the PR of @timout. How can I update 
his PR/his code? Or I can use this one? WDYT?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/20640
  
I understand if you want to do something like this for yourself to unblock, 
but I think I'm -1 on merging this because of adding more configs just for a 
stopgap.

but I think we agree on the right solution here -- if you post that I will 
try to review promptly since I've got this paged in (though it might take a bit 
to merge as we figure out how to test ...)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread IgorBerman
Github user IgorBerman commented on the issue:

https://github.com/apache/spark/pull/20640
  
@squito In general I agree with you, however current status of hardcoded 
max 2 failures is blocking, so I've created simple fix that will demand less 
integration testing. I can close it.
I think in context of MesosCoarseGrainedSchedulerBackend the 
TaskSchedulerImpl called scheduler so to test if slaveId is blacklisted 
probably following is needed:
{code}
!scheduler.nodeBlacklist().contains(slaveId)
{code}


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20640
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20640
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87556/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20640
  
**[Test build #87556 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87556/testReport)**
 for PR 20640 at commit 
[`ca4eba2`](https://github.com/apache/spark/commit/ca4eba2ac59eadd1c1f3c36a5960174bbc9e662f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/20640
  
@IgorBerman I actually think that 
https://github.com/apache/spark/pull/17619 is the right approach.  As @timout 
pointed out on that one, this functionality doesn't need to be covered in mesos 
specific code at all, as its covered by the BlacklistTracker.  I don't like 
introducing new configs when we don't really need them.  Other than this being 
less invasive, is there another advantage here?

The modifications I suggested to that PR are relatively small -- I think 
its fine if you want to open a PR that is the original updated with my 
suggestions (credit still to @timout), as I'm not sure if they're still working 
on it.  (my fault too as there was such a long delay for a proper review.)

While I had some open questions, I think its a clear improvement in any 
case.  I just need to get a little help on mesos testing, we can ask on the dev 
list.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20640
  
**[Test build #87556 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87556/testReport)**
 for PR 20640 at commit 
[`ca4eba2`](https://github.com/apache/spark/commit/ca4eba2ac59eadd1c1f3c36a5960174bbc9e662f).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/20640
  
Jenkins, ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20640
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20640
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org