[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20640 great, any other comment before merging? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user squito commented on the issue: https://github.com/apache/spark/pull/20640 FYI, I just merged SPARK-16630, which is currently very YARN-specific, but I think you could easily refactor that code to reuse a lot of it for Mesos too.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91930/
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20640 Merged build finished. Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20640 **[Test build #91930 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91930/testReport)** for PR 20640 at commit [`5eda874`](https://github.com/apache/spark/commit/5eda874e1b9b05396c57413b743995201e02ec3d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20640 **[Test build #91930 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91930/testReport)** for PR 20640 at commit [`5eda874`](https://github.com/apache/spark/commit/5eda874e1b9b05396c57413b743995201e02ec3d).
Github user squito commented on the issue: https://github.com/apache/spark/pull/20640
> @squito reading the code here: https://github.com/apache/spark/pull/21068/files is there an option to update the info about blacklisted nodes when there is a Mesos task failure? It is a bit inconvenient to lose such events and wait for Spark tasks to fail, which may never launch since you don't have any executors running anyway.

That change does not cover Mesos, to keep the scope a little smaller and because none of us really knows how to test on Mesos. But it should be pretty straightforward to refactor what's there so you could use the same general logic in Mesos. I think it would be easy for you to unify a lot of that, even the failure validity interval stuff.
Github user skonto commented on the issue: https://github.com/apache/spark/pull/20640 @IgorBerman I saw the ticket. We need to cover mesos task failures because they are also pretty common.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20640 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91911/
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20640 **[Test build #91911 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91911/testReport)** for PR 20640 at commit [`2c47271`](https://github.com/apache/spark/commit/2c47271176b82e4859667ede9bb02b28b8fba50e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20640 **[Test build #91911 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91911/testReport)** for PR 20640 at commit [`2c47271`](https://github.com/apache/spark/commit/2c47271176b82e4859667ede9bb02b28b8fba50e).
Github user IgorBerman commented on the issue: https://github.com/apache/spark/pull/20640 @felixcheung sure, no problem. I'll open a JIRA and add a comment.
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20640 @IgorBerman any thought on this comment? https://github.com/apache/spark/pull/20640#discussion_r191272487
Github user IgorBerman commented on the issue: https://github.com/apache/spark/pull/20640 @felixcheung sorry, did I miss something?
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20640 Merged build finished. Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20640 **[Test build #91639 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91639/testReport)** for PR 20640 at commit [`a7ff8cc`](https://github.com/apache/spark/commit/a7ff8cccd1b7e5564880c40c503c169c6bed46b9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91639/
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20640 **[Test build #91639 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91639/testReport)** for PR 20640 at commit [`a7ff8cc`](https://github.com/apache/spark/commit/a7ff8cccd1b7e5564880c40c503c169c6bed46b9).
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20640 ping @IgorBerman
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20640 ok to test
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20640 Can one of the admins verify this patch?
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/20640 I'm in favor of merging this. The hardcoded limit is pretty bad - particularly for streaming jobs; it would be preferable to remove it ASAP even though it may not be a complete solution.
Github user skonto commented on the issue: https://github.com/apache/spark/pull/20640 sure @squito .
Github user squito commented on the issue: https://github.com/apache/spark/pull/20640 That's fine with me, but as I'm neither a user of Mesos nor in touch with many Mesos users, I'd like to wait a bit for more opinions, given the ramifications of this change. (That shouldn't block work on the better version, if anybody wants to take that on ...)
@IgorBerman @susanxhuynh blacklisting in YARN (at least, what Spark does internally already) is really not much more sophisticated, at least before SPARK-16630; Spark does tell YARN which hosts it has blacklisted, so that YARN avoids them for future executors, but that's about it. YARN itself does a little more as well, as it has its own disk health checker etc., and it'll try to exclude resources from *all* applications if it thinks they are bad. But that is independent of changes within Spark itself.
Also, I'd like to see a JIRA for Mesos to discuss the other improvements we've discussed here to be more like SPARK-16630, so we don't forget about it. @skonto can you file that JIRA and try to capture some of the points that have been raised in the discussion here? (Or maybe that JIRA exists already?)
Github user skonto commented on the issue: https://github.com/apache/spark/pull/20640 @felixcheung @squito let's merge this to avoid the limitation of MAX_SLAVE_FAILURES = 2 (this is not good at all), and then, for the sake of completeness and correctness, we adapt the fix in SPARK-16630 to Mesos. I think the blacklisting code for YARN is more sophisticated anyway. Thoughts?
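To make the limitation named in the comment above concrete, here is a minimal Python sketch of a hardcoded per-slave failure limit. It is a toy model, not the Mesos backend's code: the class and function names are hypothetical, and it assumes (as the thread implies for streaming jobs) that the failure count is never reset and has no timeout.

```python
from collections import defaultdict

MAX_SLAVE_FAILURES = 2  # the hardcoded limit named in the thread


class HardcodedSlaveBlacklist:
    """Toy model (hypothetical): a slave is refused forever once it hits the limit."""

    def __init__(self, max_failures=MAX_SLAVE_FAILURES):
        self.max_failures = max_failures
        # slave id -> failure count; assumption: never reset, no timeout
        self.failures = defaultdict(int)

    def record_task_failure(self, slave_id):
        self.failures[slave_id] += 1

    def accepts_offer(self, slave_id):
        # Not configurable and never expiring: the count only ever grows.
        return self.failures[slave_id] < self.max_failures


def usable_slaves(blacklist, slaves):
    """Return the slaves whose offers would still be accepted."""
    return [s for s in slaves if blacklist.accepts_offer(s)]
```

Under these assumptions, a long-running job that sees two failures on each slave over its lifetime ends up with no usable slaves at all, which is why the limit is described as particularly bad for streaming jobs.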
Github user squito commented on the issue: https://github.com/apache/spark/pull/20640 Hi everyone, just wanted to let you know that SPARK-16630 is progressing now (https://github.com/apache/spark/pull/21068), and after some discussion, the implementation will actually need to live inside the cluster manager code for YARN. But Mesos should be able to do something very similar; in fact I suspect you could refactor a lot of that code so that it's used by Mesos as well. In principle the allocation blacklist just needs to know when container allocation fails, and optionally the node count of the cluster. I don't totally understand where things live in Mesos and whether the ApplicationMaster / Driver distinction is present there as well.
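The two inputs mentioned in the comment above (allocation failures and, optionally, the cluster's node count) can be sketched as a small cluster-manager-agnostic tracker. This is a hypothetical Python model, not the code from the linked pull request: the class name, the thresholds, and the use of a "failure validity interval" as a sliding time window are all assumptions made for illustration.

```python
import time


class AllocationFailureTracker:
    """Hypothetical tracker: blacklists a node after repeated allocation failures."""

    def __init__(self, max_failures=2, validity_interval_s=3600.0,
                 node_count=None, clock=time.monotonic):
        self.max_failures = max_failures
        self.validity_interval_s = validity_interval_s
        self.node_count = node_count  # optional: cluster size, if known
        self._clock = clock
        self._failures = {}  # node -> list of failure timestamps

    def record_allocation_failure(self, node):
        self._failures.setdefault(node, []).append(self._clock())

    def _recent_failure_count(self, node):
        # Only failures inside the validity interval are counted.
        cutoff = self._clock() - self.validity_interval_s
        return sum(1 for t in self._failures.get(node, []) if t > cutoff)

    def blacklisted_nodes(self):
        return {n for n in self._failures
                if self._recent_failure_count(n) >= self.max_failures}

    def all_nodes_blacklisted(self):
        # Without a node count there is no way to tell, so report False.
        if self.node_count is None:
            return False
        return len(self.blacklisted_nodes()) >= self.node_count
```

Injecting the clock keeps the sketch testable; a backend would call `record_allocation_failure` from whatever callback reports a failed container or executor launch.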
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20640 @skonto, @squito , @kayousterhout ?
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20640 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88628/
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20640 **[Test build #88628 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88628/testReport)** for PR 20640 at commit [`a7ff8cc`](https://github.com/apache/spark/commit/a7ff8cccd1b7e5564880c40c503c169c6bed46b9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20640 **[Test build #88628 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88628/testReport)** for PR 20640 at commit [`a7ff8cc`](https://github.com/apache/spark/commit/a7ff8cccd1b7e5564880c40c503c169c6bed46b9).
Github user hantuzun commented on the issue: https://github.com/apache/spark/pull/20640 Thanks a lot for the work here. Wouldn't it be better if we changed the log message in the last commit (https://github.com/apache/spark/pull/20640/commits/66ed5afae1a5b4856e84c805c7858d552f38b26a) as below?
```diff
- Mesos slave $slaveId failed
+ Task $taskId failed on Mesos slave $slaveId.
```
I'm not sure about the logging convention of the project, but I feel like logging it as `error` could be an option.
---
OOT: With Spark 2.3 released, how could we have https://spark.apache.org/versioning-policy.html updated with the Spark 2.4 release window?
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88486/
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20640 **[Test build #88486 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88486/testReport)** for PR 20640 at commit [`66ed5af`](https://github.com/apache/spark/commit/66ed5afae1a5b4856e84c805c7858d552f38b26a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20640 Merged build finished. Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20640 **[Test build #88486 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88486/testReport)** for PR 20640 at commit [`66ed5af`](https://github.com/apache/spark/commit/66ed5afae1a5b4856e84c805c7858d552f38b26a).
Github user IgorBerman commented on the issue: https://github.com/apache/spark/pull/20640 @skonto added logging when tasks failed
Github user skonto commented on the issue: https://github.com/apache/spark/pull/20640 @squito @IgorBerman let's move on with this.
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/20640 @skonto Yes, I'm ok with that. Sorry for the delayed response.
Github user skonto commented on the issue: https://github.com/apache/spark/pull/20640 @susanxhuynh are you ok with that?
Github user IgorBerman commented on the issue: https://github.com/apache/spark/pull/20640 @skonto, @susanxhuynh, @squito So let's agree that:
1. I'll restore the log message for when there is a failure, rewording it to something without "blacklisting".
2. The blacklisting itself will be handled by BlacklistTracker (as it is now).
Bottom line, the only thing missing is adding a log message in case of a failure (but without counting the number of failures etc.). WDYT?
Github user skonto commented on the issue: https://github.com/apache/spark/pull/20640 @susanxhuynh I agree that when it is not enabled we can log failures as usual, but not for blacklisting; as it is disabled, that wouldn't make sense. The user should have the option not to care.
> In the case where an executor fails before entering Spark code (for example, Mesos agent failed to create the sandbox), would it be detected?

Good question, I forgot to mention this. In this scenario a task status update will be given, e.g. [REASON_CONTAINER_LAUNCH_FAILED](https://github.com/apache/mesos/blob/5e5a8102c3281db25a37157dac123b0ca546e030/docs/task-state-reasons.md). This is handled implicitly [here](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala#L658) in the status update, which then calls [removeExecutor](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala#L728), which sends a message to the driver's endpoint to remove the executor. Then this [line](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L330) is called, which calls another helper [method](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L536), which calls this [one](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L595), and in there the blacklist info is updated.
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/20640 @skonto We should not remove the logging. The logging [here](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala#L194) is only available if blacklisting is enabled, but by default blacklisting is disabled. The BlacklistTracker object [is not created](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L748) if blacklisting is disabled. But, we might still want to see the log of executor failure in this case. In the case where an executor fails before entering Spark code (for example, Mesos agent failed to create the sandbox), would it be detected?
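The point in the comment above (no tracker object exists when blacklisting is disabled, yet the failure should still be visible in the logs) can be sketched as follows. This is an illustrative Python model only: `BlacklistTrackerSketch`, `make_tracker`, and `on_executor_failed` are hypothetical names, and only the `spark.blacklist.enabled` key comes from the thread.

```python
import logging

logger = logging.getLogger("mesos_backend_sketch")


class BlacklistTrackerSketch:
    """Hypothetical stand-in for the tracker; exists only when blacklisting is on."""

    def __init__(self):
        self.failed_executors = []

    def record_executor_failure(self, executor_id, slave_id):
        self.failed_executors.append((executor_id, slave_id))


def make_tracker(conf):
    # Mirrors the point above: no tracker object when blacklisting is disabled.
    if conf.get("spark.blacklist.enabled", "false") == "true":
        return BlacklistTrackerSketch()
    return None


def on_executor_failed(tracker, executor_id, slave_id):
    # Always log, so the failure is visible even with blacklisting disabled.
    logger.warning("Executor %s failed on Mesos slave %s", executor_id, slave_id)
    # Counting and blacklisting decisions are left entirely to the tracker.
    if tracker is not None:
        tracker.record_executor_failure(executor_id, slave_id)
```

Keeping the log statement outside the tracker means the backend does no failure counting of its own, which matches the split proposed in the discussion.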
Github user skonto commented on the issue: https://github.com/apache/spark/pull/20640 @squito @IgorBerman a few points:
a) This should be configurable. SPARK-16630 is not closed; it seems Kubernetes also needs to support checking for blacklisted nodes when launching tasks. From what I see [here](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L172) though, the CoarseGrainedSchedulerBackend checks whether the executor is acceptable when it registers back with the backend. So if it is launched on a blacklisted node, it is removed immediately. Of course we want to fail fast, and thus in the Kubernetes case we should also have code to exclude the blacklisted nodes when launching pods. See [here](https://github.com/kubernetes/kubernetes/issues/14573) and [here](https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/). This is essentially the same concept as in [yarn](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala#L127-L133).
b) I see TaskSchedulerImpl as a higher-level abstraction compared to CoarseGrainedSchedulerBackend (which is subclassed by all major backends). The thing is, blacklist info is kept in TaskSchedulerImpl and all backends update it implicitly via different paths. CoarseGrainedSchedulerBackend updates it when there is a new status update, for [example](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L122). At the end of this path [handleFailedTask](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L772) is called, which updates the related info.
A more interesting path for the discussion is when an executor disconnects, and this is where [Yarn](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala#L213) and [Kubernetes](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala#L428) override the driver's endpoint to set [criteria](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L356-L366) for when an executor is considered lost and task failure should be taken into consideration for blacklisting. The reason you want this is preemption, or, for Mesos, besides that, a scenario where an agent restarts ([process is restarted](http://mesos.apache.org/documentation/latest/agent-recovery/)) and then executors are killed (framework checkpointing not enabled, e.g. when upgrading). In the latter case, if I restart my agent process I don't want to blacklist it; my slave is ready to go as soon as it reconnects to the master. I also don't want to wait for the blacklisting to expire.
To move forward, I suggest:
a) Remove the handling logic for blacklisting in Mesos. Logging is done when a node is blacklisted [here](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala#L194).
b) Override the logic at the driver's endpoint when there is a [need](http://mesos.apache.org/documentation/latest/oversubscription/) for doing so.
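Two ideas from the comment above lend themselves to a small sketch: rejecting an executor that registers from a blacklisted node, and not counting executor losses caused by preemption or an agent restart toward blacklisting. The Python below is a hypothetical model; the enum values and function names are invented for illustration and do not correspond to Spark or Mesos identifiers.

```python
from enum import Enum


class LossReason(Enum):
    """Hypothetical reasons an executor can be lost."""
    TASK_FAILED = "task_failed"
    PREEMPTED = "preempted"
    AGENT_RESTARTED = "agent_restarted"


# Losses that say nothing about the health of the node itself.
NOT_COUNTED = {LossReason.PREEMPTED, LossReason.AGENT_RESTARTED}


def counts_toward_blacklist(reason):
    """Only losses attributable to the node or the task should be counted."""
    return reason not in NOT_COUNTED


def accept_registration(executor_host, blacklisted_hosts):
    """Reject an executor that registers from a node already blacklisted."""
    return executor_host not in blacklisted_hosts
```

With criteria like these, restarting an agent during an upgrade would kill its executors without the node being blacklisted, so it is usable again as soon as it reconnects.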
Github user IgorBerman commented on the issue: https://github.com/apache/spark/pull/20640 @squito, @susanxhuynh no worries, I have my internal patch. As a side note, I'm starting to think that my first proposal, introducing a new config that just overrides this hardcoded constant (2), would be the most pragmatic approach in this situation. It's not clear to me when SPARK-16630 will be resolved (it has been open since 1.6). On the other hand, the price would be introducing a new config (maybe without documenting it) for people who need the fix, which would be deprecated as soon as SPARK-16630 plus this change set provide a better, systematic solution.
Github user squito commented on the issue: https://github.com/apache/spark/pull/20640 Sure @skonto, great to have somebody more knowledgeable on Mesos taking a closer look at this. Sorry @IgorBerman, I promised a quick fix here, but have realized this is more complicated than we originally thought. But the discussion is moving forward at least, so thanks for driving it.
Github user skonto commented on the issue: https://github.com/apache/spark/pull/20640 @squito before you merge anything give me some time to have a look.
Github user squito commented on the issue: https://github.com/apache/spark/pull/20640 cc @attilapiros , you may be interested b/c of how this relates to SPARK-16630
Github user squito commented on the issue: https://github.com/apache/spark/pull/20640 OK hmm ... so actually this change would lose some important functionality then. Unfortunately I don't have a clear picture yet of how to solve SPARK-16630 along with the other blacklisting. Sorry, this might actually need a more general solution; I need to think about it a bit more ...
Github user IgorBerman commented on the issue: https://github.com/apache/spark/pull/20640 @squito, yes, Mesos executors that failed to start caused nodes (slaves) to be blacklisted.
Github user squito commented on the issue: https://github.com/apache/spark/pull/20640 @susanxhuynh good point about changing default behavior. I'd rather have the change so we have more unified behavior between Mesos and other cluster managers. But I have never run Spark on Mesos or even had much interaction with users of Spark on Mesos, so I will defer to others' judgement. Another option: we could leave the old behavior, unless a user sets spark.blacklist.enabled=true. It's a little wonky, but that also guarantees you always get some blacklisting. I've also been considering turning blacklisting on by default in Spark 2.4. So far I've had good feedback from users running it (though we'll get way more feedback when it's on by default). BTW, one hole in the general blacklisting is handling cases when the executors fail to even start: https://issues.apache.org/jira/browse/SPARK-16630. Would that be covered by Mesos with the old code? Just want to make sure we aren't losing that ability.
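The "another option" in the comment above amounts to a single switch on `spark.blacklist.enabled`. A minimal Python sketch of that selection follows; the function name and the two mode labels are hypothetical, and the default of "false" reflects the statement elsewhere in the thread that blacklisting is disabled by default.

```python
def select_blacklist_mode(conf):
    """Pick the behaviour under the proposal: old Mesos limit unless opted in."""
    raw = conf.get("spark.blacklist.enabled", "false")
    enabled = raw.strip().lower() == "true"
    # Either branch yields some form of blacklisting, as the comment notes.
    return "blacklist_tracker" if enabled else "legacy_mesos_limit"
```

For example, `select_blacklist_mode({})` selects the legacy limit, while passing `{"spark.blacklist.enabled": "true"}` hands control to the general blacklisting.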
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user IgorBerman commented on the issue: https://github.com/apache/spark/pull/20640 @squito I'm running with debug logging and see declines. Overall it's working. However, there are different possible reasons for declines, not necessarily a blacklisted node (there is actually a JIRA about enriching the information on declines). Regarding "much effect": I mean that killing an executor manually does not necessarily cause BlacklistTracker to add it to the blacklist (the driver usually accepts an offer from another slave), so the chances that I'll cause enough failures on the same executor are pretty low. Maybe I'll reduce all blacklist.max* to 1, thus aggressively blacklisting after only 1 failure. WDYT? Regarding the "other bug", which is probably not a bug: my understanding was that in coarse-grained mode only 1 executor is possible per application per node, but I might be wrong.
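For reference, the "reduce all blacklist.max* to 1" idea could be written down roughly as follows. This is a sketch only: the keys are the `spark.blacklist.*` settings as documented for Spark 2.2 through 2.4 (worth checking against the version in use), the timeout value is an arbitrary assumption, and `toSubmitArgs` is a hypothetical helper rather than anything in Spark.

```scala
object AggressiveBlacklistConf {
  // Blacklist settings with every max* threshold lowered to 1, so that a
  // single failure is enough to blacklist an executor or a node.
  val settings: Seq[(String, String)] = Seq(
    "spark.blacklist.enabled" -> "true",
    "spark.blacklist.timeout" -> "10m", // assumed value, tune as needed
    "spark.blacklist.task.maxTaskAttemptsPerExecutor" -> "1",
    "spark.blacklist.task.maxTaskAttemptsPerNode" -> "1",
    "spark.blacklist.stage.maxFailedTasksPerExecutor" -> "1",
    "spark.blacklist.stage.maxFailedExecutorsPerNode" -> "1",
    "spark.blacklist.application.maxFailedTasksPerExecutor" -> "1",
    "spark.blacklist.application.maxFailedExecutorsPerNode" -> "1"
  )

  // Hypothetical helper: render the settings as spark-submit arguments.
  def toSubmitArgs(conf: Seq[(String, String)]): Seq[String] =
    conf.flatMap { case (key, value) => Seq("--conf", s"$key=$value") }

  def main(args: Array[String]): Unit =
    println(toSubmitArgs(settings).mkString(" "))
}
```

Thresholds this low are probably only useful for a test like the one described, since one transient failure would then take a node out of use until the timeout expires.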
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user squito commented on the issue: https://github.com/apache/spark/pull/20640 Thanks @IgorBerman, the description looks fine to me now, maybe I saw it wrong before. Your test sounds pretty good to me ... you could turn on debug logging for MesosCoarseGrainedSchedulerBackend and look for these log msgs: https://github.com/apache/spark/blob/master/resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala#L603 What do you mean by "it didn't have much effect" -- sounds like it did exactly the right thing? Sorry, I don't really understand the description of the other bug you mentioned. Why shouldn't it start a 2nd executor on the same slave for the same application? That seems fine until you have enough failures for the node blacklisting to take effect. There is also a small race (that is relatively benign) that would allow you to get an executor on a node which you are in the middle of blacklisting.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user IgorBerman commented on the issue: https://github.com/apache/spark/pull/20640 @squito so I've run an internal workload with blacklisting on, killing executors for the framework manually. It didn't have much effect (only once did it blacklist 1 executor, and after the timeout it removed it). If you have some idea how to test it properly, I can try to do it, specifically aiming at this scenario of rejecting offers from the Mesos master for a blacklisted slave's resources. PS: not strictly connected to this conversation, but as a side note: I've probably found a bug where running in coarse-grained mode on Mesos + dynamic allocation on + blacklisting on (not sure if it's relevant) + client mode, and manually killing executors on some slaves, makes the Spark driver start a 2nd executor on the same slave for the same application (which is against the constraints of coarse-grained mode?)
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user IgorBerman commented on the issue: https://github.com/apache/spark/pull/20640 @squito updated to your wording. Where do you see headers duplicated? I'm going to test a custom build that contains this patch (actually on top of 2.2.0) to verify that basic blacklisting works for executor-level blacklisting and that offers are rejected up to the timeout duration. I'll update you on my findings/testing results.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20640 Merged build finished. Test PASSed.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20640 **[Test build #87580 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87580/testReport)** for PR 20640 at commit [`e2ddc1b`](https://github.com/apache/spark/commit/e2ddc1be19e2f978df4fe84073aff3f5b46afe45). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87580/ Test PASSed.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20640 **[Test build #87580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87580/testReport)** for PR 20640 at commit [`e2ddc1b`](https://github.com/apache/spark/commit/e2ddc1be19e2f978df4fe84073aff3f5b46afe45).
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user squito commented on the issue: https://github.com/apache/spark/pull/20640 LGTM. @IgorBerman can you clean up the PR description a little? Headers got duplicated. And I'd reword a bit to something like > This updates the Mesos scheduler to integrate with the common logic for node-blacklisting across all cluster managers in BlacklistTracker. Specifically, it removes a hardcoded MAX_SLAVE_FAILURES = 2 in MesosCoarseGrainedSchedulerBackend, and uses the blacklist from the BlacklistTracker, as Yarn does. > This closes https://github.com/apache/spark/pull/17619 (the last "this closes" bit is useful for some tools we have, it will close the other one when this is merged.) Thanks for doing this. I will probably leave this open for a bit in case more Mesos users have thoughts.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87564/ Test PASSed.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20640 **[Test build #87564 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87564/testReport)** for PR 20640 at commit [`2f1db89`](https://github.com/apache/spark/commit/2f1db89de162a05a1cee4c1221de52eacbcbdb68). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20640 Merged build finished. Test PASSed.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20640 Merged build finished. Test PASSed.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87563/ Test PASSed.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20640 **[Test build #87563 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87563/testReport)** for PR 20640 at commit [`1eb701c`](https://github.com/apache/spark/commit/1eb701cdd07652d4f196bc960b7d8218c0c46ba0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20640 **[Test build #87564 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87564/testReport)** for PR 20640 at commit [`2f1db89`](https://github.com/apache/spark/commit/2f1db89de162a05a1cee4c1221de52eacbcbdb68).
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20640 **[Test build #87563 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87563/testReport)** for PR 20640 at commit [`1eb701c`](https://github.com/apache/spark/commit/1eb701cdd07652d4f196bc960b7d8218c0c46ba0).
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20640 Merged build finished. Test FAILed.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user IgorBerman commented on the issue: https://github.com/apache/spark/pull/20640 @squito ok, I've cherry-picked @ 2 commits, so the history will contain his contribution, and added a commit with your proposal on top of it. Hope it's OK now.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20640 **[Test build #87562 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87562/testReport)** for PR 20640 at commit [`918e066`](https://github.com/apache/spark/commit/918e066cbb661110b6779c558a5377147a5b1d1e). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20640 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87562/ Test FAILed.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20640 **[Test build #87562 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87562/testReport)** for PR 20640 at commit [`918e066`](https://github.com/apache/spark/commit/918e066cbb661110b6779c558a5377147a5b1d1e).
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user squito commented on the issue: https://github.com/apache/spark/pull/20640 Thanks for updating. Can you also update the PR description? Yeah, it's fine to just update this one. You can't in general update others' PRs unless they give you push permissions to their repos. You *can* start from their branch and then add your changes on top -- that is a little preferable, as that way the commit history includes their work, so when someone merges it's a bit more obvious. If you can adjust this PR that way, that would be nice -- but otherwise it's OK, I will try to remember to adjust attribution when merging.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87558/ Test PASSed.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20640 **[Test build #87558 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87558/testReport)** for PR 20640 at commit [`2240259`](https://github.com/apache/spark/commit/22402594645b3ee14106da61cb401d6555ba2e1b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20640 Merged build finished. Test PASSed.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20640 **[Test build #87558 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87558/testReport)** for PR 20640 at commit [`2240259`](https://github.com/apache/spark/commit/22402594645b3ee14106da61cb401d6555ba2e1b).
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user IgorBerman commented on the issue: https://github.com/apache/spark/pull/20640 @squito updated
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user IgorBerman commented on the issue: https://github.com/apache/spark/pull/20640 @squito ok, I can make this small fix to @timout's PR. How can I update his PR/his code? Or can I use this one? WDYT?
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user squito commented on the issue: https://github.com/apache/spark/pull/20640 I understand if you want to do something like this for yourself to unblock, but I think I'm -1 on merging this because it adds more configs just for a stopgap. But I think we agree on the right solution here -- if you post that, I will try to review promptly since I've got this paged in (though it might take a bit to merge as we figure out how to test ...)
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user IgorBerman commented on the issue: https://github.com/apache/spark/pull/20640 @squito In general I agree with you. However, the current hardcoded maximum of 2 failures is blocking, so I've created a simple fix that will demand less integration testing. I can close it. I think that in the context of MesosCoarseGrainedSchedulerBackend the TaskSchedulerImpl is called scheduler, so to check that a slaveId is not blacklisted, something like the following is probably needed: {code} !scheduler.nodeBlacklist().contains(slaveId) {code}
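The one-line check above can be expanded into a minimal, self-contained sketch of filtering resource offers against a node blacklist. Everything except the `nodeBlacklist()` call shape quoted in the comment is an assumption made for illustration: `NodeBlacklistSource` is a stand-in trait (not the real TaskSchedulerImpl), and `Offer` and `partitionOffers` are hypothetical names. The entries in the real blacklist may also be hostnames rather than Mesos slave IDs, in which case the lookup key would have to change; that is worth verifying against the Spark version in use.

```scala
object OfferFilter {
  // Stand-in for the single scheduler method used here; the real class
  // lives in Spark and is not reproduced in this sketch.
  trait NodeBlacklistSource {
    def nodeBlacklist(): Set[String]
  }

  // Hypothetical, simplified offer carrying only the fields the sketch needs.
  final case class Offer(offerId: String, slaveId: String)

  // Split offers into (usable, declined) by checking each offer's slave
  // against a snapshot of the current node blacklist.
  def partitionOffers(
      scheduler: NodeBlacklistSource,
      offers: Seq[Offer]): (Seq[Offer], Seq[Offer]) = {
    val blacklisted = scheduler.nodeBlacklist() // read once per batch of offers
    offers.partition(offer => !blacklisted.contains(offer.slaveId))
  }

  def main(args: Array[String]): Unit = {
    val scheduler = new NodeBlacklistSource {
      def nodeBlacklist(): Set[String] = Set("slave-2")
    }
    val offers = Seq(Offer("o1", "slave-1"), Offer("o2", "slave-2"))
    val (usable, declined) = partitionOffers(scheduler, offers)
    println(s"usable=${usable.map(_.offerId)} declined=${declined.map(_.offerId)}")
  }
}
```

Reading the blacklist once per batch keeps the decision consistent across all offers in that batch, at the cost of possibly acting on a slightly stale view.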
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20640 Merged build finished. Test PASSed.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87556/ Test PASSed.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20640 **[Test build #87556 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87556/testReport)** for PR 20640 at commit [`ca4eba2`](https://github.com/apache/spark/commit/ca4eba2ac59eadd1c1f3c36a5960174bbc9e662f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user squito commented on the issue: https://github.com/apache/spark/pull/20640 @IgorBerman I actually think that https://github.com/apache/spark/pull/17619 is the right approach. As @timout pointed out on that one, this functionality doesn't need to be covered in Mesos-specific code at all, as it's covered by the BlacklistTracker. I don't like introducing new configs when we don't really need them. Other than this being less invasive, is there another advantage here? The modifications I suggested to that PR are relatively small -- I think it's fine if you want to open a PR that is the original updated with my suggestions (credit still to @timout), as I'm not sure if they're still working on it. (My fault too, as there was such a long delay for a proper review.) While I had some open questions, I think it's a clear improvement in any case. I just need to get a little help on Mesos testing; we can ask on the dev list.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20640 **[Test build #87556 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87556/testReport)** for PR 20640 at commit [`ca4eba2`](https://github.com/apache/spark/commit/ca4eba2ac59eadd1c1f3c36a5960174bbc9e662f).
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user squito commented on the issue: https://github.com/apache/spark/pull/20640 Jenkins, ok to test
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20640 Can one of the admins verify this patch?