[jira] [Created] (MESOS-8607) Port mesos-execute to Windows
Andrew Schwartzmeyer created MESOS-8607: --- Summary: Port mesos-execute to Windows Key: MESOS-8607 URL: https://issues.apache.org/jira/browse/MESOS-8607 Project: Mesos Issue Type: Improvement Components: cli Environment: Windows Reporter: Andrew Schwartzmeyer {{mesos-execute}}, part of the Mesos CLI, is a useful developer tool. It is a command-line, stand-alone framework, meaning you can use it to launch a task without standing up a framework such as Marathon. Right now, it doesn't build on Windows, though it would be useful there. The starting point would be to enable it for Windows in the build system (https://github.com/apache/mesos/blob/master/src/cli/CMakeLists.txt) and see how far the compilation gets. Classic cross-platform work: where it fails, identify how to fix the code that's failing to compile (should it be ported, removed, etc.?) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MESOS-8601) Master crashes during slave reregistration after failover.
[ https://issues.apache.org/jira/browse/MESOS-8601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375270#comment-16375270 ] Greg Mann commented on MESOS-8601: -- {code} commit b4e210678c04e57c2fa9f277b44f6d011da1846a (HEAD -> master, origin/master, origin/HEAD, merge) Author: Chun-Hung Hsiao Date: Fri Feb 23 18:37:17 2018 -0800 Added a master API test for agent re-registration after master failover. This test verifies that subscribing to the 'api/v1' endpoint between a master failover and an agent re-registration won't cause the master to crash. Review: https://reviews.apache.org/r/65775/ {code} {code} commit f2ec2b288e823424b2efe71d62ef90101b7a863f Author: Chun-Hung Hsiao Date: Fri Feb 23 18:37:12 2018 -0800 Fixed a master API bug for agent re-registration after master failover. When the master fails over and a client subscribes to the master before agent re-registration, the master will crash when sending `TASK_ADDED` because the framework info might not have been added to the master yet. This patch fixes this bug. Review: https://reviews.apache.org/r/65774/ {code} > Master crashes during slave reregistration after failover. > -- > > Key: MESOS-8601 > URL: https://issues.apache.org/jira/browse/MESOS-8601 > Project: Mesos > Issue Type: Bug > Components: master >Affects Versions: 1.5.0 >Reporter: Chun-Hung Hsiao >Assignee: Chun-Hung Hsiao >Priority: Blocker > Labels: master > > The following happened after a master failover.
> During slave reregistration, new tasks were added and the new leading master > notified all of its subscribers, and triggered the following check failure: > {noformat} > F0222 15:53:44.440387 2805 master.cpp:11190] Check failed: 'framework' Must > be non NULL > *** Check failure stack trace: *** > @ 0x7f1357be521d google::LogMessage::Fail() > @ 0x7f1357be704d google::LogMessage::SendToLog() > @ 0x7f1357be4e0c google::LogMessage::Flush() > @ 0x7f1357be7949 google::LogMessageFatal::~LogMessageFatal() > @ 0x7f1356c80e2d google::CheckNotNull<>() > @ 0x7f1356ce2666 mesos::internal::master::Master::Subscribers::send() > @ 0x7f1356cece83 mesos::internal::master::Slave::addTask() > @ 0x7f1356cf3206 mesos::internal::master::Slave::Slave() > @ 0x7f1356cf5b90 mesos::internal::master::Master::__reregisterSlave() > @ 0x7f1356d02cf8 mesos::internal::master::Master::_reregisterSlave() > @ 0x7f1357b43761 process::ProcessBase::consume() > @ 0x7f1357b5248c process::ProcessManager::resume() > @ 0x7f1357b579f6 > _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv > @ 0x7f1354e6c230 (unknown) > @ 0x7f135468ae25 start_thread > @ 0x7f13543b834d __clone > {noformat} > This was because the master tried to get the framework info when sending the > notification: > https://github.com/apache/mesos/blob/1.5.x/src/master/master.cpp#L11190 > But it added the framework after that: > https://github.com/apache/mesos/blob/1.5.x/src/master/master.cpp#L6963
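The crash above is an ordering problem: while announcing `TASK_ADDED`, the master looked up a framework it had not yet added, and the `CheckNotNull` in the stack trace aborted the process. The following is a minimal C++ sketch of that hazard and of one way to avoid it. It assumes heavily simplified, hypothetical types (`MasterSketch`, `FrameworkInfo`, `sendTaskAdded()`) rather than the real master code, and it is not necessarily what commit f2ec2b2 does.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-ins; these are NOT the Mesos types.
struct FrameworkInfo {
  std::string id;
  std::string name;
};

struct MasterSketch {
  std::map<std::string, FrameworkInfo> frameworks;  // Keyed by framework ID.
  std::vector<std::string> events;                  // Events sent to subscribers.

  // Instead of asserting that the framework exists (which aborts the master),
  // report whether the event could be built at all.
  bool sendTaskAdded(const std::string& frameworkId, const std::string& taskId) {
    auto it = frameworks.find(frameworkId);
    if (it == frameworks.end()) {
      return false;  // Framework not known yet: skip rather than crash.
    }
    events.push_back("TASK_ADDED " + taskId + " (" + it->second.name + ")");
    return true;
  }

  // Re-registration path: add the frameworks reported by the agent *before*
  // adding (and announcing) the agent's tasks.
  void reregisterAgent(
      const std::vector<FrameworkInfo>& agentFrameworks,
      const std::vector<std::pair<std::string, std::string>>& tasks) {
    for (const FrameworkInfo& framework : agentFrameworks) {
      frameworks.emplace(framework.id, framework);
    }
    for (const auto& [frameworkId, taskId] : tasks) {
      sendTaskAdded(frameworkId, taskId);
    }
  }
};
```

With the frameworks added first, every `TASK_ADDED` lookup during re-registration finds its framework; the defensive check only matters if some other path announces a task early.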
[jira] [Commented] (MESOS-8576) Improve discard handling of 'Docker::inspect()'
[ https://issues.apache.org/jira/browse/MESOS-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375252#comment-16375252 ] Greg Mann commented on MESOS-8576: -- Still working on this one. The problem is that {{Docker::inspect()}} has retry logic embedded within the library function, since we often call it before a container has started running in order to detect that the container is up. So, to avoid repeatedly registering {{onDiscard}} callbacks with every retry (which would constitute a memory leak), we need to pass the "context" of the current {{docker inspect}} call through the async call chain, and also make it accessible to the {{onDiscard}} callback which we install onto the returned future. Since the Docker library is not currently a libprocess actor, this is a bit difficult. WIP patch here: https://reviews.apache.org/r/65683/ > Improve discard handling of 'Docker::inspect()' > --- > > Key: MESOS-8576 > URL: https://issues.apache.org/jira/browse/MESOS-8576 > Project: Mesos > Issue Type: Improvement > Components: containerization, docker >Affects Versions: 1.5.0 >Reporter: Greg Mann >Assignee: Greg Mann >Priority: Major > Labels: mesosphere > > In the call path of {{Docker::inspect()}}, each continuation currently checks > if {{promise->future().hasDiscard()}}, where the {{promise}} is associated > with the output of the {{docker inspect}} call. However, if the call to > {{docker inspect}} becomes hung indefinitely, then continuations are never > invoked, and a subsequent discard of the returned {{Future}} will have no > effect. We should add proper {{onDiscard}} handling to that {{Future}} so > that appropriate cleanup is performed in such cases.
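The "one discard callback plus a shared context" approach described in the comment can be sketched as follows. This is a simplified, single-threaded model, not the libprocess API: `DiscardHook`, `InspectContext`, and `InspectSketch` are hypothetical names, and recording a PID in `killed` stands in for killing the hung `docker inspect` subprocess.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a future's discard hook: registered callbacks run
// when the caller discards the returned result.
struct DiscardHook {
  std::vector<std::function<void()>> callbacks;
  void onDiscard(std::function<void()> callback) {
    callbacks.push_back(std::move(callback));
  }
  void discard() {
    for (auto& callback : callbacks) callback();
  }
};

// Shared "context" for one logical inspect() call, threaded through every
// retry so the single discard callback always sees the latest attempt.
struct InspectContext {
  std::optional<int> currentPid;  // PID of the in-flight `docker inspect`, if any.
  bool discarded = false;
};

class InspectSketch {
 public:
  std::vector<int> killed;  // PIDs the discard callback asked to kill.

  // Registers exactly one discard callback, however many retries follow.
  std::shared_ptr<InspectContext> start(DiscardHook& hook) {
    auto context = std::make_shared<InspectContext>();
    hook.onDiscard([this, context]() {
      context->discarded = true;
      if (context->currentPid) {
        killed.push_back(*context->currentPid);
      }
    });
    return context;
  }

  // One retry: records the new subprocess in the shared context instead of
  // registering another callback. Returns false once the call was discarded.
  bool retry(const std::shared_ptr<InspectContext>& context, int pid) {
    if (context->discarded) {
      return false;
    }
    context->currentPid = pid;
    return true;
  }
};
```

However many retries happen, only one callback is ever registered, and a discard acts on whichever subprocess is current at that moment.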
[jira] [Comment Edited] (MESOS-8575) Improve discard handling for 'Docker::stop' and 'Docker::pull'
[ https://issues.apache.org/jira/browse/MESOS-8575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375226#comment-16375226 ] Greg Mann edited comment on MESOS-8575 at 2/24/18 1:52 AM: --- {code} commit a20a317765d4c577e3d7140e4d7a9b7c5bc85a56 Author: Greg Mann Date: Fri Feb 23 16:41:46 2018 -0800 Updated discard handling for Docker 'stop' and 'pull' commands. Review: https://reviews.apache.org/r/65787/ {code} {code} commit 9a414ab2f9dcaa599afa2ef3d5dd38e66e0258c7 Author: Greg Mann Date: Fri Feb 23 16:41:44 2018 -0800 Prevented Docker library from terminating incorrect processes. Previously, the Docker library might call `os::killtree()` on a PID after the associated subprocess had already terminated, which could lead to an unknown process being incorrectly killed. Review: https://reviews.apache.org/r/65786/ {code} was (Author: greggomann): {code} commit a20a317765d4c577e3d7140e4d7a9b7c5bc85a56 Author: Greg Mann Date: Fri Feb 23 16:41:46 2018 -0800 Updated discard handling for Docker 'stop' and 'pull' commands. Review: https://reviews.apache.org/r/65787/ {code} > Improve discard handling for 'Docker::stop' and 'Docker::pull' > -- > > Key: MESOS-8575 > URL: https://issues.apache.org/jira/browse/MESOS-8575 > Project: Mesos > Issue Type: Improvement >Affects Versions: 1.5.0 >Reporter: Greg Mann >Assignee: Greg Mann >Priority: Major > Labels: mesosphere > Fix For: 1.6.0 > > > The functions in the Docker library which issue Docker CLI commands should be > updated so that when the {{Future}} they return is discarded, any > subprocesses which have been spawned will be cleaned up.
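The second commit's concern is the classic PID-reuse hazard. Once a child has been reaped, the kernel may hand its PID to an unrelated process, so signalling that PID later can hit the wrong target. Below is a minimal POSIX sketch of a guard. It assumes a hypothetical `ChildHandle` that tracks whether the child has been reaped, uses a single `kill()` rather than Mesos's `os::killtree()`, and assumes no other code (such as a separate reaper) collects the child concurrently.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical handle for a spawned Docker CLI subprocess. It remembers
// whether waitpid() has already collected the child, after which the PID
// may be recycled by the kernel.
struct ChildHandle {
  pid_t pid = -1;
  bool reaped = false;
};

// Non-blocking check that reaps the child if it has already exited.
void pollChild(ChildHandle& child) {
  if (child.reaped) return;
  int status = 0;
  if (waitpid(child.pid, &status, WNOHANG) == child.pid) {
    child.reaped = true;
  }
}

// Discard-time cleanup: only signal the PID while it still refers to our
// unreaped child. Returns true if a signal was sent.
bool killIfStillOurs(ChildHandle& child) {
  pollChild(child);
  if (child.reaped) {
    return false;  // The PID may belong to someone else now; leave it alone.
  }
  ::kill(child.pid, SIGKILL);
  int status = 0;
  waitpid(child.pid, &status, 0);  // Reap so no zombie is left behind.
  child.reaped = true;
  return true;
}
```

An unreaped child, even a zombie, keeps its PID reserved. That is why checking `reaped` before signalling is sufficient under the stated assumption.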
[jira] [Commented] (MESOS-8573) Container stuck in PULLING when Docker daemon hangs
[ https://issues.apache.org/jira/browse/MESOS-8573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375227#comment-16375227 ] Greg Mann commented on MESOS-8573: -- This issue is resolved by MESOS-8575, and verified by the test in the commit cited above. > Container stuck in PULLING when Docker daemon hangs > --- > > Key: MESOS-8573 > URL: https://issues.apache.org/jira/browse/MESOS-8573 > Project: Mesos > Issue Type: Improvement >Affects Versions: 1.5.0 >Reporter: Greg Mann >Assignee: Gilbert Song >Priority: Major > Labels: mesosphere > > When the {{force}} argument is not set to {{true}}, {{Docker::pull}} will > always perform a {{docker inspect}} call before it does a {{docker pull}}. If > either of these two Docker CLI calls hangs indefinitely, the Docker container > will be stuck in the PULLING state. This means that we make no further > progress in the {{launch()}} call path, so the executor binary is never > executed, the {{Future}} associated with the {{launch()}} call is never > failed or satisfied, and {{wait()}} is never called on the container. The > agent chains the executor cleanup onto that {{wait()}} call which is never > made. So, when the executor registration timeout elapses, > {{containerizer->destroy()}} is called on the executor container, but the > rest of the executor cleanup is never performed, and no terminal task status > update is sent. > This leaves the task destined for that Docker executor stuck in TASK_STAGING > from the framework's perspective, and attempts to kill the task will fail.
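The call sequence described in MESOS-8573 can be modelled as below. This is a hypothetical sketch. `PullSteps` and `pullSequence()` are illustrative names, and the two callbacks stand in for the Docker CLI calls. It also assumes, based on our reading of `Docker::pull`, that a successful `docker inspect` means no `docker pull` is issued. With `force` unset there are two points at which a hung CLI call blocks everything after it, including the executor launch.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for the two Docker CLI calls. In the real library
// each is an asynchronous subprocess that may never complete.
struct PullSteps {
  std::function<bool(const std::string&)> inspectImage;  // true if present locally
  std::function<void(const std::string&)> pullImage;
};

// Returns the CLI calls that would be issued, in order. If any of them hung,
// nothing after it in this sequence would ever run.
std::vector<std::string> pullSequence(
    const PullSteps& steps, const std::string& image, bool force) {
  std::vector<std::string> calls;
  if (!force) {
    calls.push_back("docker inspect");
    if (steps.inspectImage(image)) {
      return calls;  // Image already present locally: skip the pull.
    }
  }
  calls.push_back("docker pull");
  steps.pullImage(image);
  return calls;
}
```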
[jira] [Commented] (MESOS-8575) Improve discard handling for 'Docker::stop' and 'Docker::pull'
[ https://issues.apache.org/jira/browse/MESOS-8575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375226#comment-16375226 ] Greg Mann commented on MESOS-8575: -- {code} commit a20a317765d4c577e3d7140e4d7a9b7c5bc85a56 Author: Greg Mann Date: Fri Feb 23 16:41:46 2018 -0800 Updated discard handling for Docker 'stop' and 'pull' commands. Review: https://reviews.apache.org/r/65787/ {code} > Improve discard handling for 'Docker::stop' and 'Docker::pull' > -- > > Key: MESOS-8575 > URL: https://issues.apache.org/jira/browse/MESOS-8575 > Project: Mesos > Issue Type: Improvement >Affects Versions: 1.5.0 >Reporter: Greg Mann >Assignee: Greg Mann >Priority: Major > Labels: mesosphere > Fix For: 1.6.0 > > > The functions in the Docker library which issue Docker CLI commands should be > updated so that when the {{Future}} they return is discarded, any > subprocesses which have been spawned will be cleaned up.
[jira] [Commented] (MESOS-8591) Add infra to test a hung Docker daemon
[ https://issues.apache.org/jira/browse/MESOS-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375224#comment-16375224 ] Greg Mann commented on MESOS-8591: -- {code} commit 8c793a75bb59e59d4a2b8a63afea38b64b100a97 Author: Greg Mann Date: Fri Feb 23 16:41:48 2018 -0800 Added test fixture for a hung Docker daemon. The new 'HungDockerTest' class allows test authors to force certain Docker daemon calls to be delayed for a specified duration. Review: https://reviews.apache.org/r/65751/ {code} > Add infra to test a hung Docker daemon > -- > > Key: MESOS-8591 > URL: https://issues.apache.org/jira/browse/MESOS-8591 > Project: Mesos > Issue Type: Improvement > Components: docker, test >Affects Versions: 1.5.0 >Reporter: Greg Mann >Assignee: Greg Mann >Priority: Major > Labels: mesosphere > Fix For: 1.6.0 > > > We should add infrastructure to our tests which enables us to test the > behavior of the Docker executor and containerizer in the presence of a hung > Docker daemon. > One possible first-order solution is to build a simple binary which never > returns. We could initialize the agent/executor with this binary instead of > the Docker CLI in order to simulate a Docker daemon which hangs on every call.
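A stub along the lines of the "simple binary which never returns" proposed above might look like the sketch below. The environment variable `FAKE_DOCKER_DELAY_MS` and both function names are hypothetical, and the actual `HungDockerTest` fixture from commit 8c793a7 may work quite differently. Leaving the variable unset makes the stub hang forever, while setting it approximates a daemon call that is delayed for a specified duration.

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <optional>
#include <thread>

// Parses the (hypothetical) FAKE_DOCKER_DELAY_MS value. Unset or unparsable
// values yield nullopt, which the stub treats as "hang forever".
std::optional<std::chrono::milliseconds> parseDelay(const char* value) {
  if (value == nullptr) {
    return std::nullopt;
  }
  char* end = nullptr;
  const long ms = std::strtol(value, &end, 10);
  if (end == value || *end != '\0' || ms < 0) {
    return std::nullopt;
  }
  return std::chrono::milliseconds(ms);
}

// What the stub would do in place of a real Docker CLI call: block for the
// configured delay (or forever), then exit 0 as if the call had succeeded.
int runFakeDocker(std::optional<std::chrono::milliseconds> delay) {
  if (!delay) {
    for (;;) {
      std::this_thread::sleep_for(std::chrono::hours(1));
    }
  }
  std::this_thread::sleep_for(*delay);
  return 0;
}
```

The stub's entry point would then be `int main() { return runFakeDocker(parseDelay(std::getenv("FAKE_DOCKER_DELAY_MS"))); }`. As the issue suggests, a test would use the resulting binary in place of the real Docker CLI.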
[jira] [Commented] (MESOS-8573) Container stuck in PULLING when Docker daemon hangs
[ https://issues.apache.org/jira/browse/MESOS-8573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375222#comment-16375222 ] Greg Mann commented on MESOS-8573: -- {code} commit 28ab19b11ca369751f21e921cc4594e00ca667cb Author: Greg Mann Date: Fri Feb 23 16:41:51 2018 -0800 Added test for hung 'docker inspect' call during container pull. Review: https://reviews.apache.org/r/65750/ {code} > Container stuck in PULLING when Docker daemon hangs > --- > > Key: MESOS-8573 > URL: https://issues.apache.org/jira/browse/MESOS-8573 > Project: Mesos > Issue Type: Improvement >Affects Versions: 1.5.0 >Reporter: Greg Mann >Assignee: Gilbert Song >Priority: Major > Labels: mesosphere > > When the {{force}} argument is not set to {{true}}, {{Docker::pull}} will > always perform a {{docker inspect}} call before it does a {{docker pull}}. If > either of these two Docker CLI calls hangs indefinitely, the Docker container > will be stuck in the PULLING state. This means that we make no further > progress in the {{launch()}} call path, so the executor binary is never > executed, the {{Future}} associated with the {{launch()}} call is never > failed or satisfied, and {{wait()}} is never called on the container. The > agent chains the executor cleanup onto that {{wait()}} call which is never > made. So, when the executor registration timeout elapses, > {{containerizer->destroy()}} is called on the executor container, but the > rest of the executor cleanup is never performed, and no terminal task status > update is sent. > This leaves the task destined for that Docker executor stuck in TASK_STAGING > from the framework's perspective, and attempts to kill the task will fail.
[jira] [Assigned] (MESOS-8442) Source tree contains generated endpoint documentation
[ https://issues.apache.org/jira/browse/MESOS-8442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler reassigned MESOS-8442: -- Assignee: Benjamin Mahler (was: Benjamin Bannier) > Source tree contains generated endpoint documentation > - > > Key: MESOS-8442 > URL: https://issues.apache.org/jira/browse/MESOS-8442 > Project: Mesos > Issue Type: Task > Components: documentation >Reporter: Benjamin Bannier >Assignee: Benjamin Mahler >Priority: Major > > Even though we generate documentation automatically in CI, the source tree > still contains checked-in, generated endpoint documentation in > {{docs/endpoints}}. > We should remove these source files from the tree. We need to make sure to > * not break automatic website generation with > {{support/mesos-website/build.sh}}, > * not break the local website generation workflow with > {{site/mesos-website-dev.sh}}, and > * not break the local website generation workflow with {{rake}} via > {{site/Rakefile}}.
[jira] [Created] (MESOS-8605) Terminal task status update will not send if 'docker inspect' is hung
Greg Mann created MESOS-8605: Summary: Terminal task status update will not send if 'docker inspect' is hung Key: MESOS-8605 URL: https://issues.apache.org/jira/browse/MESOS-8605 Project: Mesos Issue Type: Improvement Components: docker Affects Versions: 1.5.0 Reporter: Greg Mann When the agent processes a terminal status update for a task, it calls {{containerizer->update()}} on the container before it forwards the update: https://github.com/apache/mesos/blob/9635d4a2d12fc77935c3d5d166469258634c6b7e/src/slave/slave.cpp#L5509-L5514 In the Docker containerizer, {{update()}} calls {{Docker::inspect()}}, which means that if the inspect call hangs, the terminal update will not be sent: https://github.com/apache/mesos/blob/9635d4a2d12fc77935c3d5d166469258634c6b7e/src/slave/containerizer/docker.cpp#L1714
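The dependency described in MESOS-8605 can be sketched with a minimal, single-threaded future model. `SketchFuture` and `handleTerminalUpdate()` are hypothetical stand-ins for a libprocess `Future` and the agent's status-update path. The only point being illustrated is that forwarding runs as a continuation of `containerizer->update()`, so it never runs if that future is never satisfied.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical, single-threaded stand-in for a libprocess-style future:
// continuations run only when (and if) the future becomes ready.
class SketchFuture {
 public:
  void then(std::function<void()> continuation) {
    if (ready_) {
      continuation();
    } else {
      pending_.push_back(std::move(continuation));
    }
  }

  void satisfy() {
    ready_ = true;
    std::vector<std::function<void()>> callbacks = std::move(pending_);
    pending_.clear();
    for (auto& callback : callbacks) callback();
  }

 private:
  bool ready_ = false;
  std::vector<std::function<void()>> pending_;
};

// Agent-side ordering: the terminal update is forwarded only as a continuation
// of containerizer->update(). If update() is stuck behind a hung
// `docker inspect`, the future is never satisfied and nothing is forwarded.
void handleTerminalUpdate(SketchFuture& containerizerUpdate,
                          std::function<void()> forward) {
  containerizerUpdate.then(std::move(forward));
}
```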
[jira] [Commented] (MESOS-8575) Improve discard handling for 'Docker::stop' and 'Docker::pull'
[ https://issues.apache.org/jira/browse/MESOS-8575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375102#comment-16375102 ] Greg Mann commented on MESOS-8575: -- Review here: https://reviews.apache.org/r/65787/ > Improve discard handling for 'Docker::stop' and 'Docker::pull' > -- > > Key: MESOS-8575 > URL: https://issues.apache.org/jira/browse/MESOS-8575 > Project: Mesos > Issue Type: Improvement >Affects Versions: 1.5.0 >Reporter: Greg Mann >Assignee: Greg Mann >Priority: Major > Labels: mesosphere > > The functions in the Docker library which issue Docker CLI commands should be > updated so that when the {{Future}} they return is discarded, any > subprocesses which have been spawned will be cleaned up.
[jira] [Issue Comment Deleted] (MESOS-8575) Improve discard handling for 'Docker::stop' and 'Docker::pull'
[ https://issues.apache.org/jira/browse/MESOS-8575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-8575: - Comment: was deleted (was: Review: https://reviews.apache.org/r/65683/) > Improve discard handling for 'Docker::stop' and 'Docker::pull' > -- > > Key: MESOS-8575 > URL: https://issues.apache.org/jira/browse/MESOS-8575 > Project: Mesos > Issue Type: Improvement >Affects Versions: 1.5.0 >Reporter: Greg Mann >Assignee: Greg Mann >Priority: Major > Labels: mesosphere > > The functions in the Docker library which issue Docker CLI commands should be > updated so that when the {{Future}} they return is discarded, any > subprocesses which have been spawned will be cleaned up.
[jira] [Commented] (MESOS-8573) Container stuck in PULLING when Docker daemon hangs
[ https://issues.apache.org/jira/browse/MESOS-8573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375041#comment-16375041 ] Greg Mann commented on MESOS-8573: -- The following test probes this failure scenario: https://reviews.apache.org/r/65750 > Container stuck in PULLING when Docker daemon hangs > --- > > Key: MESOS-8573 > URL: https://issues.apache.org/jira/browse/MESOS-8573 > Project: Mesos > Issue Type: Improvement >Affects Versions: 1.5.0 >Reporter: Greg Mann >Assignee: Gilbert Song >Priority: Major > Labels: mesosphere > > When the {{force}} argument is not set to {{true}}, {{Docker::pull}} will > always perform a {{docker inspect}} call before it does a {{docker pull}}. If > either of these two Docker CLI calls hangs indefinitely, the Docker container > will be stuck in the PULLING state. This means that we make no further > progress in the {{launch()}} call path, so the executor binary is never > executed, the {{Future}} associated with the {{launch()}} call is never > failed or satisfied, and {{wait()}} is never called on the container. The > agent chains the executor cleanup onto that {{wait()}} call which is never > made. So, when the executor registration timeout elapses, > {{containerizer->destroy()}} is called on the executor container, but the > rest of the executor cleanup is never performed, and no terminal task status > update is sent. > This leaves the task destined for that Docker executor stuck in TASK_STAGING > from the framework's perspective, and attempts to kill the task will fail.
[jira] [Created] (MESOS-8604) Quota headroom tracking may be incorrect in the presence of hierarchical reservation.
Meng Zhu created MESOS-8604: --- Summary: Quota headroom tracking may be incorrect in the presence of hierarchical reservation. Key: MESOS-8604 URL: https://issues.apache.org/jira/browse/MESOS-8604 Project: Mesos Issue Type: Bug Components: allocation Affects Versions: 1.5.0 Reporter: Meng Zhu Assignee: Meng Zhu When calculating the global quota headroom, we subtract all unallocated reservations by doing ``` for each role with reservation availableHeadroom -= role total reservation - role allocated reservation; ``` We only traverse roles with reservations. In the presence of hierarchical reservation, this is problematic. Consider a child role (e.g. "a/b") with no reservations: it can still get reserved resources if its ancestor has reservations (e.g. "a" has reservations). However, the allocated reserved resources of role "a/b" will be ignored by the above code.
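Here is a worked example of the problem, as a hypothetical C++ sketch that tracks CPUs only (`RoleMap` and both function names are illustrative, not allocator code). Suppose role "a" holds 10 reserved CPUs and 4 of them are allocated to its child "a/b". The quoted loop visits only "a" and finds no reserved allocation recorded under it, so it treats all 10 CPUs as unallocated reservations. The true figure is 6, which means the headroom is under-reported by 4 CPUs. Summing reserved allocations across all roles is one possible correction, assuming every allocated reserved resource counts against some role's reservation.

```cpp
#include <cassert>
#include <map>
#include <string>

// `reserved` is keyed by the role holding the reservation; `allocatedReserved`
// is keyed by the role the reserved resources are allocated to, which may be
// a descendant (e.g. "a/b" using resources reserved for "a").
using RoleMap = std::map<std::string, double>;

// Mirrors the quoted loop: only roles that hold reservations are visited, so
// reserved allocations made to descendants are never subtracted.
double unallocatedReservationsBuggy(const RoleMap& reserved,
                                    const RoleMap& allocatedReserved) {
  double total = 0.0;
  for (const auto& [role, amount] : reserved) {
    auto it = allocatedReserved.find(role);
    total += amount - (it == allocatedReserved.end() ? 0.0 : it->second);
  }
  return total;
}

// One possible correction: subtract the reserved allocations of *every* role.
double unallocatedReservationsFixed(const RoleMap& reserved,
                                    const RoleMap& allocatedReserved) {
  double total = 0.0;
  for (const auto& entry : reserved) total += entry.second;
  for (const auto& entry : allocatedReserved) total -= entry.second;
  return total;
}
```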
[jira] [Issue Comment Deleted] (MESOS-8576) Improve discard handling of 'Docker::inspect()'
[ https://issues.apache.org/jira/browse/MESOS-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-8576: - Comment: was deleted (was: Closing in favor of MESOS-8575.) > Improve discard handling of 'Docker::inspect()' > --- > > Key: MESOS-8576 > URL: https://issues.apache.org/jira/browse/MESOS-8576 > Project: Mesos > Issue Type: Improvement > Components: containerization, docker >Affects Versions: 1.5.0 >Reporter: Greg Mann >Assignee: Greg Mann >Priority: Major > Labels: mesosphere > > In the call path of {{Docker::inspect()}}, each continuation currently checks > if {{promise->future().hasDiscard()}}, where the {{promise}} is associated > with the output of the {{docker inspect}} call. However, if the call to > {{docker inspect}} becomes hung indefinitely, then continuations are never > invoked, and a subsequent discard of the returned {{Future}} will have no > effect. We should add proper {{onDiscard}} handling to that {{Future}} so > that appropriate cleanup is performed in such cases.
[jira] [Comment Edited] (MESOS-7518) Mesos master and slave get uninstalled by ubuntu 16.04
[ https://issues.apache.org/jira/browse/MESOS-7518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374819#comment-16374819 ] Martin Tapp edited comment on MESOS-7518 at 2/23/18 6:46 PM: - Ok, so we're still experiencing this issue after reporting it 10 months ago, any help please? was (Author: doctapp): Ok, so we're still experiencing this issue after reporting it 6 months ago, any help please? > Mesos master and slave get uninstalled by ubuntu 16.04 > -- > > Key: MESOS-7518 > URL: https://issues.apache.org/jira/browse/MESOS-7518 > Project: Mesos > Issue Type: Bug > Components: agent, master >Affects Versions: 1.2.0 > Environment: Ubuntu 16.04 >Reporter: Martin Tapp >Priority: Major > > Since we've upgraded to Mesos 1.2 (from 1.0) on Ubuntu 16.04, the master and > agent/slave sometimes get uninstalled automatically. This always happens > after a reboot, when you restart the service, and maybe every other day > otherwise on its own. We're running on bare metal servers and VMs with the > same result. > sudo service mesos-master status yields > Warning: mesos-master.service changed on disk. Run 'systemctl daemon-reload' > to reload units. > Same for mesos-slave. > Solution is to re-run our mesos provisioning automatically. apt-get install > mesos re-installs it all the time. > We're using apt source 'deb http://repos.mesosphere.io/ubuntu xenial main' > with key > 'http://keyserver.ubuntu.com/pks/lookup?op=get=on=0xE56151BF' > Any idea? > Thanks
[jira] [Commented] (MESOS-7518) Mesos master and slave get uninstalled by ubuntu 16.04
[ https://issues.apache.org/jira/browse/MESOS-7518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374819#comment-16374819 ] Martin Tapp commented on MESOS-7518: Ok, so we're still experiencing this issue after reporting it 6 months ago, any help please? > Mesos master and slave get uninstalled by ubuntu 16.04 > -- > > Key: MESOS-7518 > URL: https://issues.apache.org/jira/browse/MESOS-7518 > Project: Mesos > Issue Type: Bug > Components: agent, master >Affects Versions: 1.2.0 > Environment: Ubuntu 16.04 >Reporter: Martin Tapp >Priority: Major > > Since we've upgraded to Mesos 1.2 (from 1.0) on Ubuntu 16.04, the master and > agent/slave sometimes get uninstalled automatically. This always happens > after a reboot, when you restart the service, and maybe every other day > otherwise on its own. We're running on bare metal servers and VMs with the > same result. > sudo service mesos-master status yields > Warning: mesos-master.service changed on disk. Run 'systemctl daemon-reload' > to reload units. > Same for mesos-slave. > Solution is to re-run our mesos provisioning automatically. apt-get install > mesos re-installs it all the time. > We're using apt source 'deb http://repos.mesosphere.io/ubuntu xenial main' > with key > 'http://keyserver.ubuntu.com/pks/lookup?op=get=on=0xE56151BF' > Any idea? > Thanks
[jira] [Created] (MESOS-8603) SlaveTest.TerminalTaskContainerizerUpdateFailsWithGone and SlaveTest.TerminalTaskContainerizerUpdateFailsWithLost are flaky
Jan Schlicht created MESOS-8603: --- Summary: SlaveTest.TerminalTaskContainerizerUpdateFailsWithGone and SlaveTest.TerminalTaskContainerizerUpdateFailsWithLost are flaky Key: MESOS-8603 URL: https://issues.apache.org/jira/browse/MESOS-8603 Project: Mesos Issue Type: Bug Components: test Reporter: Jan Schlicht Attachments: TerminalTaskContainerizerUpdateFailsWithGone, TerminalTaskContainerizerUpdateFailsWithLost Both tests fail from time to time. Verbose test output from the failures is attached.
[jira] [Commented] (MESOS-7176) Add versioning support to network/cni isolator
[ https://issues.apache.org/jira/browse/MESOS-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374228#comment-16374228 ] Qian Zhang commented on MESOS-7176: --- According to the [CNI spec|https://github.com/containernetworking/cni/blob/master/SPEC.md#released-versions], one of the major changes introduced in CNI spec 0.3.0 is a rich result type. The result type of CNI spec 0.3.0 ([https://github.com/containernetworking/cni/blob/spec-v0.3.0/SPEC.md#result]) is different from that of CNI spec 0.2.0. The CNI isolator in Mesos currently uses CNI spec 0.2.0; see [here|https://github.com/apache/mesos/blob/1.5.0/src/slave/containerizer/mesos/isolators/network/cni/spec.proto#L63:L67] for details. As a result, the CNI isolator currently can NOT support CNI network configurations whose version is 0.3.0+. Suppose the CNI isolator invokes a CNI plugin (which also supports CNI spec 0.3.0+) with a CNI network configuration of version 0.3.0+ as its input (see the example below). The plugin will return a result that conforms to the same CNI spec version as the input network configuration (i.e., 0.3.0 in the example below), but the CNI isolator always uses CNI spec 0.2.0 to parse the result (see [here|https://github.com/apache/mesos/blob/1.5.0/src/slave/containerizer/mesos/isolators/network/cni/spec.cpp#L46:L59] for details), so parsing will fail. {code:java} { "cniVersion": "0.3.0", "name": "dbnet", "type": "bridge", "bridge": "cni0", "ipam": { "type": "dhcp" } }{code} So I think we should improve the CNI isolator to support CNI spec 0.3.0 as well, and parse the result returned by the CNI plugin based on the CNI spec version of the result.
> Add versioning support to network/cni isolator > -- > > Key: MESOS-7176 > URL: https://issues.apache.org/jira/browse/MESOS-7176 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Avinash Sridharan >Assignee: Deepak Goel >Priority: Major > > Currently the network/cni isolator supports CNI SPEC version 0.2. The CNI > SPEC version 0.3 has already been ratified and introduces new features such > as CNI service chaining and CNI plugin capabilities. However, CNI spec > version 0.3 is incompatible with CNI spec 0.2. Hence we need to introduce > versioning support in the `network/cni` isolator in order to make it backward > compatible.
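The comment proposes choosing the result parser from the reported spec version. That could start from a small version check like the one below. This is a hypothetical sketch: `CniResultFormat`, `parseCniVersion()`, and `selectResultFormat()` are illustrative names, and `Spec030` stands for "the 0.3.0-or-later result type". It assumes the version string comes from the `cniVersion` field of the plugin's result and leaves JSON parsing out entirely.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>
#include <tuple>

// Hypothetical result formats; Spec030 means "the 0.3.0-or-later result type".
enum class CniResultFormat { Spec020, Spec030 };

// Parses "MAJOR.MINOR.PATCH"; anything else yields nullopt.
std::optional<std::tuple<int, int, int>> parseCniVersion(const std::string& version) {
  std::istringstream in(version);
  int vMajor = 0, vMinor = 0, vPatch = 0;
  char dot1 = 0, dot2 = 0;
  if (!(in >> vMajor >> dot1 >> vMinor >> dot2 >> vPatch)) {
    return std::nullopt;
  }
  // Reject separators other than '.', trailing text such as "-rc1", and
  // negative components.
  if (dot1 != '.' || dot2 != '.' ||
      in.peek() != std::char_traits<char>::eof() ||
      vMajor < 0 || vMinor < 0 || vPatch < 0) {
    return std::nullopt;
  }
  return std::make_tuple(vMajor, vMinor, vPatch);
}

// Chooses how to parse a plugin's result from the version it reports,
// instead of always assuming the 0.2.0 result type.
std::optional<CniResultFormat> selectResultFormat(const std::string& cniVersion) {
  const auto version = parseCniVersion(cniVersion);
  if (!version) {
    return std::nullopt;
  }
  return *version < std::make_tuple(0, 3, 0) ? CniResultFormat::Spec020
                                              : CniResultFormat::Spec030;
}
```

Returning `nullopt` for an unrecognized version lets the caller fail with a clear error instead of mis-parsing the result with the wrong schema.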