[jira] [Assigned] (MESOS-4965) Support resizing of an existing persistent volume
[ https://issues.apache.org/jira/browse/MESOS-4965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhitao Li reassigned MESOS-4965:
--------------------------------

    Assignee: Zhitao Li

> Support resizing of an existing persistent volume
> -------------------------------------------------
>
> Key: MESOS-4965
> URL: https://issues.apache.org/jira/browse/MESOS-4965
> Project: Mesos
> Issue Type: Improvement
> Components: storage
> Reporter: Zhitao Li
> Assignee: Zhitao Li
> Priority: Major
> Labels: mesosphere, persistent-volumes, storage
>
> We need a mechanism to update the size of a persistent volume.
> The increase case is generally more interesting to us (as long as there is
> still available disk resource on the same disk).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Comment Edited] (MESOS-8488) Docker bug can cause unkillable tasks.
[ https://issues.apache.org/jira/browse/MESOS-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16384432#comment-16384432 ]

Andrew Schwartzmeyer edited comment on MESOS-8488 at 3/3/18 2:17 AM:
---------------------------------------------------------------------

Commit 1daf6cb03
Author: Akash Gupta <akash-gu...@hotmail.com>
Date: Sun Feb 25 13:37:42 2018 -0800

    Windows: Fixed flaky Docker command health check test.

    The `DockerContainerizerHealthCheckTest.ROOT_DOCKER_DockerHealthStatusChange`
    test was flaky on Windows, because the Docker executor manually reaps the
    container exit code in case that `docker run` fails to get the exit code.
    This logic doesn't work on Windows, since the process might not be visible
    to the container host machine, causing `TASK_FAILED` to get sent. By
    removing the reaping logic on Windows, the test is much more reliable.

    Review: https://reviews.apache.org/r/65733/

was (Author: andschwa):
{noformat}
commit 1daf6cb03
Author: Akash Gupta <akash-gu...@hotmail.com>
Date: Sun Feb 25 13:37:42 2018 -0800

    Windows: Fixed flaky Docker command health check test.

    The `DockerContainerizerHealthCheckTest.ROOT_DOCKER_DockerHealthStatusChange`
    test was flaky on Windows, because the Docker executor manually reaps the
    container exit code in case that `docker run` fails to get the exit code.
    This logic doesn't work on Windows, since the process might not be visible
    to the container host machine, causing `TASK_FAILED` to get sent. By
    removing the reaping logic on Windows, the test is much more reliable.

    Review: https://reviews.apache.org/r/65733/
{noformat}

> Docker bug can cause unkillable tasks.
> --------------------------------------
>
> Key: MESOS-8488
> URL: https://issues.apache.org/jira/browse/MESOS-8488
> Project: Mesos
> Issue Type: Improvement
> Components: containerization
> Affects Versions: 1.5.0
> Reporter: Greg Mann
> Assignee: Qian Zhang
> Priority: Major
> Labels: mesosphere
> Fix For: 1.6.0
>
> Due to an [issue on the Moby project|https://github.com/moby/moby/issues/33820], it's possible for Docker versions 1.13 and later to fail to catch a container exit, so that the {{docker run}} command which was used to launch the container will never return. This can lead to the Docker executor becoming stuck in a state where it believes the container is still running and cannot be killed.
> We should update the Docker executor to ensure that containers stuck in such a state cannot cause unkillable Docker executors/tasks.
> One way to do this would be a timeout, after which the Docker executor will commit suicide if a kill task attempt has not succeeded. However, if we do this we should also ensure that in the case that the container was actually still running, either the Docker daemon or the DockerContainerizer would clean up the container when it does exit.
> Another option might be for the Docker executor to directly {{wait()}} on the container's Linux PID, in order to notice when the container exits.
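The last option mentioned, noticing the container's exit by watching its Linux PID, can be sketched as follows. This is a hedged illustration, not Mesos code: a plain `wait()` only works on child processes, so the sketch uses the `kill(pid, 0)` existence probe instead, and the helper name is hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: kill(pid, 0) delivers no signal, but reports
// whether the process still exists. ESRCH means the PID is gone, so a
// monitor loop polling this can notice a container exit even when the
// Docker daemon never reports it and the process is not our child.
bool processExited(pid_t pid) {
  return ::kill(pid, 0) == -1 && errno == ESRCH;
}
```

A monitoring loop in the executor could poll such a check and trigger the same teardown path that a returning `docker run` would.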
[jira] [Commented] (MESOS-8488) Docker bug can cause unkillable tasks.
[ https://issues.apache.org/jira/browse/MESOS-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16384432#comment-16384432 ]

Andrew Schwartzmeyer commented on MESOS-8488:
---------------------------------------------

{noformat}
commit 1daf6cb03
Author: Akash Gupta <akash-gu...@hotmail.com>
Date: Sun Feb 25 13:37:42 2018 -0800

    Windows: Fixed flaky Docker command health check test.

    The `DockerContainerizerHealthCheckTest.ROOT_DOCKER_DockerHealthStatusChange`
    test was flaky on Windows, because the Docker executor manually reaps the
    container exit code in case that `docker run` fails to get the exit code.
    This logic doesn't work on Windows, since the process might not be visible
    to the container host machine, causing `TASK_FAILED` to get sent. By
    removing the reaping logic on Windows, the test is much more reliable.

    Review: https://reviews.apache.org/r/65733/
{noformat}
[jira] [Created] (MESOS-8633) Investigate using HTTP server library (e.g. h2o) in libprocess.
Benjamin Mahler created MESOS-8633:
-----------------------------------

Summary: Investigate using HTTP server library (e.g. h2o) in libprocess.
Key: MESOS-8633
URL: https://issues.apache.org/jira/browse/MESOS-8633
Project: Mesos
Issue Type: Task
Components: libprocess
Reporter: Benjamin Mahler

Currently libprocess provides its own HTTP server implementation that leverages libraries (e.g. http-parser) where possible to simplify this. However, even simpler from a maintainability perspective would be to leverage an HTTP server library.

https://github.com/h2o/h2o/ is a high-performance HTTP server that can be used as a library. This would also provide a significant performance benefit.
[jira] [Assigned] (MESOS-5460) Add HDFS support in Windows builds.
[ https://issues.apache.org/jira/browse/MESOS-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Coffler reassigned MESOS-5460:
-----------------------------------

    Assignee: (was: Jeff Coffler)

> Add HDFS support in Windows builds.
> -----------------------------------
>
> Key: MESOS-5460
> URL: https://issues.apache.org/jira/browse/MESOS-5460
> Project: Mesos
> Issue Type: Task
> Reporter: Andrew Schwartzmeyer
> Priority: Minor
> Labels: agent, fetcher, mesosphere, windows
>
> Right now we have a bunch of #ifdefs throughout the codebase around the HDFS code, because Windows doesn't support it. We should explore adding support for (e.g.) fetching from HDFS.
[jira] [Commented] (MESOS-5460) Add HDFS support in Windows builds.
[ https://issues.apache.org/jira/browse/MESOS-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16384399#comment-16384399 ]

Jeff Coffler commented on MESOS-5460:
-------------------------------------

More information from Joe: The Hadoop (soft) dependency is a catch-all in case the URI is not supported by other fetch methods (like HTTP). We used to see this used for `hdfs://` and `s3://` URIs. There are definitely ways to fetch these URIs without the Hadoop client. We will defer this for now.
[jira] [Commented] (MESOS-8576) Improve discard handling of 'Docker::inspect()'
[ https://issues.apache.org/jira/browse/MESOS-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16384340#comment-16384340 ]

Gilbert Song commented on MESOS-8576:
-------------------------------------

{noformat}
commit 8346ab0c812559ef73e1bbd30718f6c74a023079
Author: Greg Mann
Date: Fri Mar 2 15:39:58 2018 -0800

    Avoided orphan subprocess in the Docker library.

    This patch ensures that `Docker::inspect` will not leave orphan
    subprocesses behind.

    Review: https://reviews.apache.org/r/65887/
{noformat}

> Improve discard handling of 'Docker::inspect()'
> -----------------------------------------------
>
> Key: MESOS-8576
> URL: https://issues.apache.org/jira/browse/MESOS-8576
> Project: Mesos
> Issue Type: Improvement
> Components: containerization, docker
> Affects Versions: 1.5.0
> Reporter: Greg Mann
> Assignee: Greg Mann
> Priority: Major
> Labels: mesosphere
> Fix For: 1.6.0
>
> In the call path of {{Docker::inspect()}}, each continuation currently checks if {{promise->future().hasDiscard()}}, where the {{promise}} is associated with the output of the {{docker inspect}} call. However, if the call to {{docker inspect}} becomes hung indefinitely, then continuations are never invoked, and a subsequent discard of the returned {{Future}} will have no effect. We should add proper {{onDiscard}} handling to that {{Future}} so that appropriate cleanup is performed in such cases.
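The requested {{onDiscard}} behavior, running cleanup when the caller abandons the result even though the underlying `docker inspect` never completes, can be illustrated with a toy future type. This is a sketch only; libprocess's real `Future`/`Promise` API differs.

```cpp
#include <cassert>
#include <functional>

// Toy stand-in for a discardable future: cleanup registered via
// onDiscard() runs when the caller discards, independently of whether
// the producing operation ever completes.
class ToyFuture {
 public:
  void onDiscard(std::function<void()> cleanup) { cleanup_ = cleanup; }

  void discard() {
    if (!discarded_ && cleanup_) {
      cleanup_();  // e.g. kill the hung `docker inspect` subprocess
    }
    discarded_ = true;
  }

  bool hasDiscard() const { return discarded_; }

 private:
  bool discarded_ = false;
  std::function<void()> cleanup_;
};
```

The point of the pattern is that cleanup is keyed off the consumer's discard, not off a continuation that a hung subprocess may never trigger.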
[jira] [Created] (MESOS-8632) Add unit tests which exercise resources dynamically
Andrew Schwartzmeyer created MESOS-8632:
----------------------------------------

Summary: Add unit tests which exercise resources dynamically
Key: MESOS-8632
URL: https://issues.apache.org/jira/browse/MESOS-8632
Project: Mesos
Issue Type: Task
Reporter: Andrew Schwartzmeyer
Assignee: Andrew Schwartzmeyer

As discovered in MESOS-8631, we lack tests which exercise the actual reported resources of a machine. We want a test that:

1. Launches an agent that auto-detects resources.
2. Is parameterized to use different sets of isolators.
3. Launches a task that accepts the offered resources and uses all of them.
[jira] [Commented] (MESOS-6422) cgroups_tests not correctly tearing down testing hierarchies
[ https://issues.apache.org/jira/browse/MESOS-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16384307#comment-16384307 ]

Yan Xu commented on MESOS-6422:
-------------------------------

Sorry, this is low priority for me right now so I am unassigning.

> cgroups_tests not correctly tearing down testing hierarchies
> ------------------------------------------------------------
>
> Key: MESOS-6422
> URL: https://issues.apache.org/jira/browse/MESOS-6422
> Project: Mesos
> Issue Type: Bug
> Components: cgroups, containerization
> Reporter: Yan Xu
> Assignee: Yan Xu
> Priority: Minor
> Labels: cgroups
>
> We currently do the following in [CgroupsTest::TearDownTestCase()|https://github.com/apache/mesos/blob/5e850a362edbf494921fedff4037cf4b53088c10/src/tests/containerizer/cgroups_tests.cpp#L83]:
> {code}
> static void TearDownTestCase()
> {
>   AWAIT_READY(cgroups::cleanup(TEST_CGROUPS_HIERARCHY));
> }
> {code}
> One of its derived tests, {{CgroupsNoHierarchyTest}}, treats {{TEST_CGROUPS_HIERARCHY}} as a hierarchy, so it's able to clean it up as a hierarchy.
> However, another derived test, {{CgroupsAnyHierarchyTest}}, would create new hierarchies (if none is available) using {{TEST_CGROUPS_HIERARCHY}} as a parent directory (i.e., base hierarchy) and not as a hierarchy, so when it's time to clean up, it fails:
> {noformat}
> [ OK ] CgroupsAnyHierarchyTest.ROOT_CGROUPS_Subsystems (1 ms)
> ../../src/tests/containerizer/cgroups_tests.cpp:88: Failure
> (cgroups::cleanup(TEST_CGROUPS_HIERARCHY)).failure(): Operation not permitted
> {noformat}
[jira] [Assigned] (MESOS-6422) cgroups_tests not correctly tearing down testing hierarchies
[ https://issues.apache.org/jira/browse/MESOS-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yan Xu reassigned MESOS-6422:
-----------------------------

    Assignee: (was: Yan Xu)
[jira] [Commented] (MESOS-8631) Can't start a task with every CPU on a Windows machine
[ https://issues.apache.org/jira/browse/MESOS-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16384301#comment-16384301 ]

Andrew Schwartzmeyer commented on MESOS-8631:
---------------------------------------------

I believe this should be back-ported to Mesos 1.5.1.

> Can't start a task with every CPU on a Windows machine
> ------------------------------------------------------
>
> Key: MESOS-8631
> URL: https://issues.apache.org/jira/browse/MESOS-8631
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 1.5.0
> Environment: Windows 10 using the CPU isolator
> Reporter: Andrew Schwartzmeyer
> Assignee: Andrew Schwartzmeyer
> Priority: Critical
> Labels: cpu, isolator, windows
> Fix For: 1.5.1
>
> We have an edge case that existing unit tests don't cover: starting a single task which consumes every reported CPU. The problem is that the executor overcommits by 0.1 CPUs, so when we go to set the CPU limit on, say, a 12-core machine, we set it to 12.1/12 * 10,000 CPU cycles, which results in an invalid number of cycles (10,083, but the max is 10,000).
> We were bounds-checking the minimum (1 cycle) but not the maximum, as the function did not expect the user to request more CPUs than available.
> We'll fix this by checking the max bound. So if, for example, 12.1 CPUs are requested, we set the limit to 10,000 CPU cycles, no more.
[jira] [Created] (MESOS-8631) Can't start a task with every CPU on a Windows machine
Andrew Schwartzmeyer created MESOS-8631:
----------------------------------------

Summary: Can't start a task with every CPU on a Windows machine
Key: MESOS-8631
URL: https://issues.apache.org/jira/browse/MESOS-8631
Project: Mesos
Issue Type: Bug
Affects Versions: 1.5.0
Environment: Windows 10 using the CPU isolator
Reporter: Andrew Schwartzmeyer
Assignee: Andrew Schwartzmeyer
Fix For: 1.5.1

We have an edge case that existing unit tests don't cover: starting a single task which consumes every reported CPU. The problem is that the executor overcommits by 0.1 CPUs, so when we go to set the CPU limit on, say, a 12-core machine, we set it to 12.1/12 * 10,000 CPU cycles, which results in an invalid number of cycles (10,083, but the max is 10,000).

We were bounds-checking the minimum (1 cycle) but not the maximum, as the function did not expect the user to request more CPUs than available.

We'll fix this by checking the max bound. So if, for example, 12.1 CPUs are requested, we set the limit to 10,000 CPU cycles, no more.
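The fix described above, clamping the computed cycle count to the maximum, can be sketched as follows. This is a hedged sketch: the function and constant names are illustrative, though the 10,000-cycle cap for 100% CPU matches the numbers in the report.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Windows job objects express a CPU rate in units of 1/10,000 of total
// CPU capacity, so 10,000 "cycles" means 100% of the machine.
constexpr uint32_t kMaxCpuCycles = 10000;

// Hypothetical helper: convert a requested CPU count into a cycle
// count, clamped to [1, 10000] so that requests exceeding the core
// count (e.g. 12.1 CPUs on 12 cores) no longer produce an invalid
// value such as 10,083.
uint32_t cpuLimitCycles(double requestedCpus, double totalCpus) {
  const double cycles = requestedCpus / totalCpus * kMaxCpuCycles;
  return std::min<uint32_t>(
      kMaxCpuCycles,
      std::max<uint32_t>(1, static_cast<uint32_t>(cycles)));
}
```

With this clamp, 12.1 requested CPUs on a 12-core machine yields exactly 10,000 cycles instead of the invalid 10,083.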
[jira] [Comment Edited] (MESOS-8619) Docker on Windows uses USERPROFILE instead of HOME for credentials
[ https://issues.apache.org/jira/browse/MESOS-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383927#comment-16383927 ]

Andrew Schwartzmeyer edited comment on MESOS-8619 at 3/2/18 6:21 PM:
---------------------------------------------------------------------

Review: https://reviews.apache.org/r/65872

was (Author: andschwa):
{noformat}
commit 9bd5d8f9b (HEAD -> master, apache/master)
Author: Andrew Schwartzmeyer
Date: Wed Feb 28 16:45:49 2018 -0800

    Windows: Fixed location of Docker's `config.json` file.

    Per MESOS-8619, Docker checks `$USERPROFILE/.docker/config.json`
    instead of `$HOME`. Mesos overrides this environment variable in
    order to point Docker to a `config.json` file in another location,
    so we have to fix the assumption we made about Docker.

    We do not add this constant to stout, because it is not consistent
    across Windows applications. This particular logic is specific to
    the implementation of Docker. Other applications might check `$HOME`
    or `$HOMEPATH` on Windows.

    Review: https://reviews.apache.org/r/65872
{noformat}

> Docker on Windows uses USERPROFILE instead of HOME for credentials
> ------------------------------------------------------------------
>
> Key: MESOS-8619
> URL: https://issues.apache.org/jira/browse/MESOS-8619
> Project: Mesos
> Issue Type: Bug
> Environment: Windows 10 with Docker version 17.12.0-ce, build c97c6d6.
> Reporter: Andrew Schwartzmeyer
> Assignee: Andrew Schwartzmeyer
> Priority: Major
> Labels: docker, windows
> Fix For: 1.6.0
>
> The logic for doing a {{docker pull}} of an image for a private registry assumes that the {{.docker/config.json}} is to be found in {{$HOME}} (according to the [Mesosphere instructions|https://mesosphere.github.io/marathon/docs/native-docker-private-registry.html#docker-containerizer] and the [code|https://github.com/apache/mesos/blob/b7933c176d719766bdb6459048ede6e94f6a7763/src/docker/docker.cpp#L1710]).
> However, this assumption is only true for Linux per the [Docker code|https://github.com/moby/moby/blob/3a633a712c8bbb863fe7e57ec132dd87a9c4eff7/pkg/homedir/homedir_unix.go#L14]; on Windows, Docker explicitly looks at the {{USERPROFILE}} environment variable, again [per the Docker code|https://github.com/moby/moby/blob/3a633a712c8bbb863fe7e57ec132dd87a9c4eff7/pkg/homedir/homedir_windows.go#L10].
> So in order for Docker to pick up the config file correctly, we need to change the variable used on Windows in the Docker containerizer.
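The platform difference above boils down to which environment variable Docker consults; a minimal sketch (the function name is illustrative, not the actual Mesos helper):

```cpp
#include <cassert>
#include <string>

// Docker resolves `.docker/config.json` relative to $HOME on POSIX
// systems but relative to %USERPROFILE% on Windows, so code that
// points Docker at an alternate config directory must override the
// right variable for the current platform.
std::string dockerHomeVariable() {
#ifdef _WIN32
  return "USERPROFILE";
#else
  return "HOME";
#endif
}
```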
[jira] [Created] (MESOS-8630) All subsequent registry operations fail after the registrar is aborted after a failed update
Yan Xu created MESOS-8630:
--------------------------

Summary: All subsequent registry operations fail after the registrar is aborted after a failed update
Key: MESOS-8630
URL: https://issues.apache.org/jira/browse/MESOS-8630
Project: Mesos
Issue Type: Bug
Components: master
Reporter: Yan Xu

Failure to update the registry always aborts the registrar but doesn't always abort the master process. When the registrar fails to update the registry, it aborts the actor and fails all future operations. The rationale is explained here: https://github.com/apache/mesos/commit/5eaf1eb346fc2f46c852c1246bdff12a89216b60

{quote}
In this event, the Master won't commit suicide until the initial failure is processed. However, in the interim, subsequent operations are potentially being performed against the Registrar. This could lead to fighting between masters if a "demoted" master re-attempts to acquire log-leadership!
{quote}

However, when the registry update is requested by an operator API call (maintenance, quota update, etc.), the master process doesn't shut down (a 500 error is returned to the client instead) and all subsequent operations will fail!
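The failure mode described, one failed update latching the registrar so every later operation fails, can be modeled with a toy class (illustrative only, not the actual registrar code):

```cpp
#include <cassert>

// Toy model of the registrar's abort-on-failure behavior: after one
// failed registry update the actor latches into a failed state, so a
// master that survives (e.g. returning HTTP 500 to an operator API
// call instead of shutting down) can never apply another operation.
class ToyRegistrar {
 public:
  // `updateSucceeds` stands in for the outcome of the registry write.
  bool apply(bool updateSucceeds) {
    if (aborted_) {
      return false;  // all subsequent operations fail
    }
    if (!updateSucceeds) {
      aborted_ = true;  // latch the failure; the actor is aborted
    }
    return updateSucceeds;
  }

 private:
  bool aborted_ = false;
};
```

This illustrates why the abort is safe when the master commits suicide, but leaves the master wedged when it stays alive after an operator API failure.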
[jira] [Commented] (MESOS-8240) Add an option to build the new CLI and run unit tests.
[ https://issues.apache.org/jira/browse/MESOS-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383600#comment-16383600 ]

Armand Grillet commented on MESOS-8240:
---------------------------------------

https://reviews.apache.org/r/65705/

> Add an option to build the new CLI and run unit tests.
> ------------------------------------------------------
>
> Key: MESOS-8240
> URL: https://issues.apache.org/jira/browse/MESOS-8240
> Project: Mesos
> Issue Type: Improvement
> Components: cli
> Reporter: Armand Grillet
> Assignee: Armand Grillet
> Priority: Major
>
> An update of the discarded [https://reviews.apache.org/r/52543/].
> Also needs to be available for CMake.
[jira] [Commented] (MESOS-5158) Provide XFS quota support for persistent volumes.
[ https://issues.apache.org/jira/browse/MESOS-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383595#comment-16383595 ]

Harold Dost III commented on MESOS-5158:
----------------------------------------

Hey [~jamespeach],

On a related note, one of the things mentioned in the source is that the isolator does not necessarily have knowledge of whether the persistent volume is available. Should we consider making some general utilities, inspired by the ones in the XFS isolator, that could be refactored so that they may be used by frameworks and isolators alike?

Sounds like right now there is no reference count of the number of containers accessing a persistent volume. Maybe we would want some sort of reference counting, and a way for volumes to in essence manage themselves.

I am still learning the lifecycles of some of the different components, but I think it may be better, when persistent volumes are created in {{src/common/sources.cpp}}, to allocate a project ID for the directory, and then have tasks that use the same volume increase its usage count. As containers are cleaned up, reduce the count.

> Provide XFS quota support for persistent volumes.
> -------------------------------------------------
>
> Key: MESOS-5158
> URL: https://issues.apache.org/jira/browse/MESOS-5158
> Project: Mesos
> Issue Type: Improvement
> Components: containerization
> Reporter: Yan Xu
> Priority: Major
>
> Given that the lifecycle of persistent volumes is managed outside of the isolator, we may need to further abstract out the quota management functionality to do it outside the XFS isolator.
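The reference-counting idea in the comment could look roughly like this (a hedged sketch; the class and method names are hypothetical, not existing Mesos APIs):

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-volume reference counter: a project ID is allocated
// when a persistent volume is created, each container using the volume
// bumps the count, and the quota/project ID is released only when the
// last container is cleaned up.
class VolumeRefCounter {
 public:
  void attach(const std::string& volume) { ++counts_[volume]; }

  // Returns true when the last user detached and the volume's
  // project ID can be reclaimed.
  bool detach(const std::string& volume) {
    auto it = counts_.find(volume);
    if (it == counts_.end()) {
      return false;
    }
    if (--it->second == 0) {
      counts_.erase(it);
      return true;
    }
    return false;
  }

 private:
  std::map<std::string, int> counts_;
};
```

This lets volumes "manage themselves" in the sense the comment suggests: quota state follows the count of attached containers rather than any single isolator's lifecycle.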
[jira] [Commented] (MESOS-8158) Mesos Agent in docker neglects to retry discovering Task docker containers
[ https://issues.apache.org/jira/browse/MESOS-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383511#comment-16383511 ]

Stefan Eder commented on MESOS-8158:
------------------------------------

We experience the same issue with Mesos 1.4.1 (it did not work with previous versions, e.g. 1.0 and 1.3, either). The Mesos agent is started like:

{noformat}
ExecStart=/usr/bin/docker run \
  --name=mesos_agent \
  --net=host \
  --pid=host \
  # does not work without this as well
  --privileged \
  -v /cgroup:/cgroup \
  -v /sys:/sys \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /mnt/data/mesos/slave/work:/mnt/data/mesos/slave/work \
  -v /mnt/data/mesos/slave/logs:/mnt/data/mesos/slave/logs \
  infonova/mesos-agent:1.4.1-docker-17.09.0-ce \
  --ip=10.0.0.100 \
  --logging_level=INFO \
  --advertise_ip=10.0.0.100 \
  --port=5051 \
  --advertise_port=5051 \
  --master=zk://10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181/MesosDevelopment \
  --containerizers=docker \
  --recover=reconnect \
  --resources=cpus:6;mem:45000;ports:[8000-9000] \
  --log_dir=/mnt/data/mesos/slave/logs \
  --work_dir=/mnt/data/mesos/slave/work \
  --docker_remove_delay=10mins \
  --executor_registration_timeout=20mins \
  --executor_shutdown_grace_period=10mins \
  --hostname=mesos-dev-agent-01 \
  --attributes=host:mesos-dev-agent-01;type:openstacknode;dockerfs:overlayfs \
  --credential=file:///etc/mesos-slave/password \
  --recovery_timeout=24hrs \
  --docker_config=file:///home/jenkins/.docker/config.json \
  --no-systemd_enable_support \
  --docker_mesos_image=infonova/mesos:1.4.1-docker-17.09.0-ce
{noformat}

Logs of the mesos-agent indicate that it does not find the actual task container:

{noformat}
I0302 09:33:34.609871  6 docker.cpp:1136] Starting container 'f00102b3-f1bd-4575-86fd-769f198db674' for task 'jenkins-mesos-agent-01' (and executor 'jenkins-mesos-agent-01') of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
I0302 09:33:35.321282  7 slave.cpp:3928] Got registration for executor 'jenkins-mesos-agent-01' of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012 from executor(1)@10.0.0.100:45619
I0302 09:33:35.322042 11 docker.cpp:1616] Ignoring updating container f00102b3-f1bd-4575-86fd-769f198db674 because resources passed to update are identical to existing resources
I0302 09:33:35.322141 11 slave.cpp:2598] Sending queued task 'jenkins-mesos-agent-01' to executor 'jenkins-mesos-agent-01' of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012 at executor(1)@10.0.0.100:45619
E0302 09:33:56.000396 10 slave.cpp:5285] Container 'f00102b3-f1bd-4575-86fd-769f198db674' for executor 'jenkins-mesos-agent-01' of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012 failed to start: Failed to run 'docker -H unix:///var/run/docker.sock inspect mesos-f00102b3-f1bd-4575-86fd-769f198db674': exited with status 1; stderr='Error: No such object: mesos-f00102b3-f1bd-4575-86fd-769f198db674'
I0302 09:33:56.000726  7 slave.cpp:5398] Executor 'jenkins-mesos-agent-01' of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012 has terminated with unknown status
I0302 09:33:56.000792  7 slave.cpp:4392] Handling status update TASK_FAILED (UUID: d5c38f0b-80b8-4b4b-8bf0-34d08196f053) for task jenkins-mesos-agent-01 of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012 from @0.0.0.0:0
W0302 09:33:56.001066 12 docker.cpp:1603] Ignoring updating unknown container f00102b3-f1bd-4575-86fd-769f198db674
I0302 09:33:56.001164 10 status_update_manager.cpp:323] Received status update TASK_FAILED (UUID: d5c38f0b-80b8-4b4b-8bf0-34d08196f053) for task jenkins-mesos-agent-01 of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
I0302 09:33:56.001391 10 status_update_manager.cpp:834] Checkpointing UPDATE for status update TASK_FAILED (UUID: d5c38f0b-80b8-4b4b-8bf0-34d08196f053) for task jenkins-mesos-agent-01 of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
I0302 09:33:56.001545 10 slave.cpp:4873] Forwarding the update TASK_FAILED (UUID: d5c38f0b-80b8-4b4b-8bf0-34d08196f053) for task jenkins-mesos-agent-01 of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012 to master@10.0.0.81:5050
I0302 09:33:56.023033  7 status_update_manager.cpp:395] Received status update acknowledgement (UUID: d5c38f0b-80b8-4b4b-8bf0-34d08196f053) for task jenkins-mesos-agent-01 of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
I0302 09:33:56.023077  7 status_update_manager.cpp:834] Checkpointing ACK for status update TASK_FAILED (UUID: d5c38f0b-80b8-4b4b-8bf0-34d08196f053) for task jenkins-mesos-agent-01 of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
I0302 09:33:56.023187  8 slave.cpp:5509] Cleaning up executor 'jenkins-mesos-agent-01' of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012 at executor(1)@10.0.0.100:45619
{noformat}

Funny thing is though, the
[jira] [Comment Edited] (MESOS-8190) Update the master to accept OfferOperationIDs from frameworks.
[ https://issues.apache.org/jira/browse/MESOS-8190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16270038#comment-16270038 ]

Greg Mann edited comment on MESOS-8190 at 3/2/18 8:09 AM:
----------------------------------------------------------

Reviews here:
https://reviews.apache.org/r/63991/
https://reviews.apache.org/r/63992/
https://reviews.apache.org/r/63994/

was (Author: greggomann):
Reviews here:
https://reviews.apache.org/r/63990/
https://reviews.apache.org/r/63991/
https://reviews.apache.org/r/63992/
https://reviews.apache.org/r/63994/

> Update the master to accept OfferOperationIDs from frameworks.
> --------------------------------------------------------------
>
> Key: MESOS-8190
> URL: https://issues.apache.org/jira/browse/MESOS-8190
> Project: Mesos
> Issue Type: Task
> Reporter: Gastón Kleiman
> Assignee: Greg Mann
> Priority: Major
> Labels: mesosphere
>
> Master's {{ACCEPT}} handler should send failed operation updates when a framework sets the {{OfferOperationID}} on an operation destined for an agent without the {{RESOURCE_PROVIDER}} capability.