[GitHub] spark issue #17619: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2017-12-22 Thread timout
Github user timout commented on the issue:

https://github.com/apache/spark/pull/17619
  
That does exactly what it is supposed to do. And you are absolutely right, it is 
related to executors.
I am sorry if that was not clear from my previous explanations.
Let us say we have a Spark Streaming app, i.e. a very long-running app:
the driver, started by Marathon from a Docker image, schedules (in the Mesos 
sense) executors that also run from Docker images (net=HOST), so every executor 
is started from a Docker image on some Mesos agent.
Now suppose some recoverable error happens, for instance:
ExecutorLostFailure (executor 40 exited caused by one of the running tasks) 
Reason: Remote RPC client disassociated... (I do not know about other 
environments, but it happens relatively often in mine.)
As a result the executor is dead, and after 2 such failures the Mesos agent node 
is added to the MesosCoarseGrainedSchedulerBackend blacklist and the driver 
will never schedule (in the Mesos sense) an executor on it again. So the app 
starves... and, notably, it does not die.
That is exactly what happened with my streaming apps before this patch.
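
To make the failure mode concrete, here is a minimal sketch of the kind of 
hardcoded check being described, under a simplified agent-state model; the 
names AgentState, taskFailures, onExecutorFailed and canLaunchOn are 
illustrative, not the actual Spark internals:

```scala
// Simplified illustration of the behaviour described above; the real
// MesosCoarseGrainedSchedulerBackend is more involved, and the names below
// are illustrative rather than the actual Spark fields.
object BlacklistSketch {
  // Hardcoded limit: two executor failures and the agent is never used again.
  val MAX_SLAVE_FAILURES = 2

  final case class AgentState(agentId: String, var taskFailures: Int = 0)

  // Called when an executor on this agent is reported lost or failed.
  def onExecutorFailed(agent: AgentState): Unit = {
    agent.taskFailures += 1
  }

  // Called when deciding whether a Mesos offer from this agent is usable.
  // Once the threshold is hit there is no recovery path, which is what
  // starves a long-running streaming app after transient failures.
  def canLaunchOn(agent: AgentState): Boolean =
    agent.taskFailures < MAX_SLAVE_FAILURES
}
```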

This patch may already be incompatible with master, but I can fix that if 
needed.




---




[GitHub] spark pull request #17619: [SPARK-19755][Mesos] Blacklist is always active f...

2017-04-12 Thread timout
GitHub user timout opened a pull request:

https://github.com/apache/spark/pull/17619

[SPARK-19755][Mesos] Blacklist is always active for 
MesosCoarseGrainedSchedulerBackend.

## What changes were proposed in this pull request?
MesosCoarseGrainedSchedulerBackend ignored the spark.blacklist.enabled 
configuration property and used a hardcoded MAX_SLAVE_FAILURES = 2. The purpose 
of this fix is to remove that hard-coded behaviour; BlacklistTracker is 
responsible for blacklist functionality.
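
For reference, with the hardcoded limit removed, blacklisting becomes opt-in 
through the configuration that BlacklistTracker reads. A minimal sketch of 
enabling it in driver code (key names as in Spark 2.x; the exact set available 
depends on the Spark version):

```scala
import org.apache.spark.SparkConf

// Minimal sketch: node/executor blacklisting is controlled by configuration
// read by BlacklistTracker instead of a hardcoded per-agent failure count.
// spark.blacklist.enabled defaults to false.
val conf = new SparkConf()
  .setAppName("streaming-app")
  .set("spark.blacklist.enabled", "true")
  // How long a blacklisted executor/node stays excluded before being retried.
  .set("spark.blacklist.timeout", "1h")
  // Per-application thresholds before an executor or node is blacklisted.
  .set("spark.blacklist.application.maxFailedTasksPerExecutor", "2")
  .set("spark.blacklist.application.maxFailedExecutorsPerNode", "2")
```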

## How was this patch tested?
Unit tests, manual testing.
This patch is a cleanup; the blacklist functionality itself is covered by the 
BlacklistTracker tests.

Author: tabaku...@pulsepoint.com


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/timout/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17619.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17619


commit 70ed95ad8c73c1c1ff46dcf191b26f96c51ea09e
Author: antiout 
Date:   2017-04-12T06:17:30Z

Removed hardcoded blacklist functionality, must be controlled by 
BlacklistTracker

commit 078634e63aaeacc1b2361a16e1999f0213284ecf
Author: antiout 
Date:   2017-04-12T06:17:34Z

Merge remote-tracking branch 'upstream/master'

commit df2f319a518e1a533dae04d5d6bfa019a8b6845c
Author: antiout 
Date:   2017-04-12T06:35:28Z

Removed hardcoded blacklist functionality, must be controlled by 
BlacklistTracker




---