[jira] [Created] (MESOS-8607) Port mesos-execute to Windows

2018-02-23 Thread Andrew Schwartzmeyer (JIRA)
Andrew Schwartzmeyer created MESOS-8607:
---

 Summary: Port mesos-execute to Windows
 Key: MESOS-8607
 URL: https://issues.apache.org/jira/browse/MESOS-8607
 Project: Mesos
  Issue Type: Improvement
  Components: cli
 Environment: Windows
Reporter: Andrew Schwartzmeyer


The Mesos CLI tool {{mesos-execute}} is a useful developer tool. It is a 
stand-alone, command-line framework, meaning you can use it to launch a task 
without standing up e.g. Marathon. Right now it does not build on Windows, 
though it would be useful there.

The starting point would be to enable it for Windows in the build system 
(https://github.com/apache/mesos/blob/master/src/cli/CMakeLists.txt) and see 
how far the compilation gets. This is classic cross-platform work: wherever 
compilation fails, decide how to fix the failing code (should it be ported, 
removed, etc.?).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-8601) Master crashes during slave reregistration after failover.

2018-02-23 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375270#comment-16375270
 ] 

Greg Mann commented on MESOS-8601:
--

{code}
commit b4e210678c04e57c2fa9f277b44f6d011da1846a (HEAD -> master, origin/master, origin/HEAD, merge)
Author: Chun-Hung Hsiao 
Date:   Fri Feb 23 18:37:17 2018 -0800

Added a master API test for agent re-registration after master failover.

This test verifies that subscribing to the 'api/v1' endpoint between a
master failover and an agent re-registration won't cause the master to
crash.

Review: https://reviews.apache.org/r/65775/
{code}
{code}
commit f2ec2b288e823424b2efe71d62ef90101b7a863f
Author: Chun-Hung Hsiao 
Date:   Fri Feb 23 18:37:12 2018 -0800

Fixed a master API bug for agent re-registration after master failover.

When the master fails over and a client subscribes to the master before
agent re-registration, the master will crash when sending `TASK_ADDED`
because the framework info might not have been added to the master yet.
This patch fixes this bug.

Review: https://reviews.apache.org/r/65774/
{code}

> Master crashes during slave reregistration after failover.
> --
>
> Key: MESOS-8601
> URL: https://issues.apache.org/jira/browse/MESOS-8601
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.5.0
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>Priority: Blocker
>  Labels: master
>
> The following happened after a master failover.
> During slave reregistration, new tasks were added and the new leading master 
> notified all of its subscribers, and triggered the following check failure:
> {noformat}
> F0222 15:53:44.440387  2805 master.cpp:11190] Check failed: 'framework' Must be non NULL
> *** Check failure stack trace: ***
> @ 0x7f1357be521d  google::LogMessage::Fail()
> @ 0x7f1357be704d  google::LogMessage::SendToLog()
> @ 0x7f1357be4e0c  google::LogMessage::Flush()
> @ 0x7f1357be7949  google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f1356c80e2d  google::CheckNotNull<>()
> @ 0x7f1356ce2666  mesos::internal::master::Master::Subscribers::send()
> @ 0x7f1356cece83  mesos::internal::master::Slave::addTask()
> @ 0x7f1356cf3206  mesos::internal::master::Slave::Slave()
> @ 0x7f1356cf5b90  mesos::internal::master::Master::__reregisterSlave()
> @ 0x7f1356d02cf8  mesos::internal::master::Master::_reregisterSlave()
> @ 0x7f1357b43761  process::ProcessBase::consume()
> @ 0x7f1357b5248c  process::ProcessManager::resume()
> @ 0x7f1357b579f6  _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
> @ 0x7f1354e6c230  (unknown)
> @ 0x7f135468ae25  start_thread
> @ 0x7f13543b834d  __clone
> {noformat}
> This was because the master tried to get the framework info when sending the 
> notification: 
> https://github.com/apache/mesos/blob/1.5.x/src/master/master.cpp#L11190
> But it added the framework after that:
> https://github.com/apache/mesos/blob/1.5.x/src/master/master.cpp#L6963
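
The ordering problem described above can be illustrated with a small sketch. This is not the Mesos code: the `Master` class, its `frameworks` dict, and the method names are hypothetical stand-ins, and Python is used only for brevity. It shows why announcing a task before its framework has been recorded trips a non-null check. Recording the framework first is one possible way to avoid that, not necessarily what the actual patch does.

```python
class Master:
    """Toy model of the master's bookkeeping; all names here are hypothetical."""

    def __init__(self):
        self.frameworks = {}  # framework_id -> framework info
        self.sent = []        # events delivered to subscribers

    def notify_task_added(self, task):
        # Mirrors the failing check: the framework must already be known.
        framework = self.frameworks.get(task["framework_id"])
        if framework is None:
            raise RuntimeError("Check failed: 'framework' must be non-null")
        self.sent.append(("TASK_ADDED", task["id"], framework["name"]))

    def reregister_agent_buggy(self, tasks, frameworks):
        # Tasks are announced first; frameworks are recorded afterwards.
        for task in tasks:
            self.notify_task_added(task)
        self.frameworks.update(frameworks)

    def reregister_agent_reordered(self, tasks, frameworks):
        # Record frameworks first so the lookup in the notification succeeds.
        self.frameworks.update(frameworks)
        for task in tasks:
            self.notify_task_added(task)


tasks = [{"id": "t1", "framework_id": "f1"}]
frameworks = {"f1": {"name": "example-framework"}}

try:
    Master().reregister_agent_buggy(tasks, frameworks)
    buggy_failed = False
except RuntimeError:
    buggy_failed = True

fixed = Master()
fixed.reregister_agent_reordered(tasks, frameworks)
```

In the buggy ordering the lookup fails on the first task, while the reordered variant delivers the event with the framework info attached.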





[jira] [Commented] (MESOS-8576) Improve discard handling of 'Docker::inspect()'

2018-02-23 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375252#comment-16375252
 ] 

Greg Mann commented on MESOS-8576:
--

Still working on this one. The problem is that {{Docker::inspect()}} has retry 
logic embedded within the library function, since we often call it before a 
container has started running in order to detect that the container is up. So, 
to avoid repeatedly registering {{onDiscard}} callbacks with every retry (which 
would constitute a memory leak), we need to pass the "context" of the current 
{{docker inspect}} call through the async call chain, and also make it 
accessible to the {{onDiscard}} callback which we install onto the returned 
future. Since the Docker library is not currently a libprocess actor, this is a 
bit difficult.

WIP patch here: https://reviews.apache.org/r/65683/
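
To make the "context" idea above concrete, here is a minimal sketch in Python. It is not libprocess and not the WIP patch: `Future`, `InspectContext`, and `inspect_with_retries` are hypothetical names, and the retries run synchronously for simplicity. The point it illustrates is that the discard callback is registered once and reads the current attempt from shared state, rather than a new callback being registered on every retry.

```python
class Future:
    """Minimal stand-in for a discardable future (hypothetical, not libprocess)."""

    def __init__(self):
        self.discard_callbacks = []
        self.discarded = False

    def on_discard(self, callback):
        self.discard_callbacks.append(callback)

    def discard(self):
        self.discarded = True
        for callback in self.discard_callbacks:
            callback()


class InspectContext:
    """State shared by every retry of one logical inspect call."""

    def __init__(self):
        self.current_attempt = None  # handle of the in-flight attempt, if any
        self.cancelled = []          # attempts that were cleaned up on discard

    def cancel_current(self):
        if self.current_attempt is not None:
            self.cancelled.append(self.current_attempt)
            self.current_attempt = None


def inspect_with_retries(attempts):
    future = Future()
    context = InspectContext()
    # Registered once, outside the retry loop, so retries do not pile up callbacks.
    future.on_discard(context.cancel_current)
    for attempt in range(attempts):
        # Each retry only swaps the handle stored in the shared context.
        context.current_attempt = f"inspect-attempt-{attempt}"
    return future, context


future, context = inspect_with_retries(attempts=5)
future.discard()
```

After five retries there is still a single registered callback, and discarding the future cleans up only the most recent attempt.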

> Improve discard handling of 'Docker::inspect()'
> ---
>
> Key: MESOS-8576
> URL: https://issues.apache.org/jira/browse/MESOS-8576
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization, docker
>Affects Versions: 1.5.0
>Reporter: Greg Mann
>Assignee: Greg Mann
>Priority: Major
>  Labels: mesosphere
>
> In the call path of {{Docker::inspect()}}, each continuation currently checks 
> if {{promise->future().hasDiscard()}}, where the {{promise}} is associated 
> with the output of the {{docker inspect}} call. However, if the call to 
> {{docker inspect}} becomes hung indefinitely, then continuations are never 
> invoked, and a subsequent discard of the returned {{Future}} will have no 
> effect. We should add proper {{onDiscard}} handling to that {{Future}} so 
> that appropriate cleanup is performed in such cases.





[jira] [Comment Edited] (MESOS-8575) Improve discard handling for 'Docker::stop' and 'Docker::pull'

2018-02-23 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375226#comment-16375226
 ] 

Greg Mann edited comment on MESOS-8575 at 2/24/18 1:52 AM:
---

{code}
commit a20a317765d4c577e3d7140e4d7a9b7c5bc85a56
Author: Greg Mann 
Date:   Fri Feb 23 16:41:46 2018 -0800

Updated discard handling for Docker 'stop' and 'pull' commands.

Review: https://reviews.apache.org/r/65787/
{code}
{code}
commit 9a414ab2f9dcaa599afa2ef3d5dd38e66e0258c7
Author: Greg Mann 
Date:   Fri Feb 23 16:41:44 2018 -0800

Prevented Docker library from terminating incorrect processes.

Previously, the Docker library might call `os::killtree()` on a
PID after the associated subprocess had already terminated, which
could lead to an unknown process being incorrectly killed.

Review: https://reviews.apache.org/r/65786/
{code}
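
The hazard described in the second commit can be sketched as follows. This is an illustration in Python using the standard `subprocess` module, not the Docker library code; `terminate_if_running` is a hypothetical helper. Once a child has exited and been reaped, the OS may reuse its PID, so a kill should only be sent while the child is known to be alive.

```python
import subprocess
import sys


def terminate_if_running(process):
    """Signal the child only if it has not exited yet.

    Returns True if a signal was sent, False if the child had already exited.
    """
    # poll() returns the exit code once the child has exited; None means it
    # is still running and the PID still refers to our child.
    if process.poll() is not None:
        return False
    process.kill()
    process.wait()
    return True


# A short-lived child that has already exited by the time we try to stop it.
finished = subprocess.Popen([sys.executable, "-c", "pass"])
finished.wait()

# A long-running child standing in for a hung CLI command.
hung = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

signalled_finished = terminate_if_running(finished)
signalled_hung = terminate_if_running(hung)
```

Only the still-running child is signalled; the finished one is left alone because its PID can no longer be trusted.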


was (Author: greggomann):
{code}
commit a20a317765d4c577e3d7140e4d7a9b7c5bc85a56
Author: Greg Mann 
Date:   Fri Feb 23 16:41:46 2018 -0800

Updated discard handling for Docker 'stop' and 'pull' commands.

Review: https://reviews.apache.org/r/65787/
{code}

> Improve discard handling for 'Docker::stop' and 'Docker::pull'
> --
>
> Key: MESOS-8575
> URL: https://issues.apache.org/jira/browse/MESOS-8575
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 1.5.0
>Reporter: Greg Mann
>Assignee: Greg Mann
>Priority: Major
>  Labels: mesosphere
> Fix For: 1.6.0
>
>
> The functions in the Docker library which issue Docker CLI commands should be 
> updated so that when the {{Future}} they return is discarded, any 
> subprocesses which have been spawned will be cleaned up.





[jira] [Commented] (MESOS-8573) Container stuck in PULLING when Docker daemon hangs

2018-02-23 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375227#comment-16375227
 ] 

Greg Mann commented on MESOS-8573:
--

This issue is resolved by MESOS-8575, and verified by the test in the commit 
cited above.

> Container stuck in PULLING when Docker daemon hangs
> ---
>
> Key: MESOS-8573
> URL: https://issues.apache.org/jira/browse/MESOS-8573
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 1.5.0
>Reporter: Greg Mann
>Assignee: Gilbert Song
>Priority: Major
>  Labels: mesosphere
>
> When the {{force}} argument is not set to {{true}}, {{Docker::pull}} will 
> always perform a {{docker inspect}} call before it does a {{docker pull}}. If 
> either of these two Docker CLI calls hangs indefinitely, the Docker container 
> will be stuck in the PULLING state. This means that we make no further 
> progress in the {{launch()}} call path, so the executor binary is never 
> executed, the {{Future}} associated with the {{launch()}} call is never 
> failed or satisfied, and {{wait()}} is never called on the container. The 
> agent chains the executor cleanup onto that {{wait()}} call which is never 
> made. So, when the executor registration timeout elapses, 
> {{containerizer->destroy()}} is called on the executor container, but the 
> rest of the executor cleanup is never performed, and no terminal task status 
> update is sent.
> This leaves the task destined for that Docker executor stuck in TASK_STAGING 
> from the framework's perspective, and attempts to kill the task will fail.





[jira] [Commented] (MESOS-8575) Improve discard handling for 'Docker::stop' and 'Docker::pull'

2018-02-23 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375226#comment-16375226
 ] 

Greg Mann commented on MESOS-8575:
--

{code}
commit a20a317765d4c577e3d7140e4d7a9b7c5bc85a56
Author: Greg Mann 
Date:   Fri Feb 23 16:41:46 2018 -0800

Updated discard handling for Docker 'stop' and 'pull' commands.

Review: https://reviews.apache.org/r/65787/
{code}

> Improve discard handling for 'Docker::stop' and 'Docker::pull'
> --
>
> Key: MESOS-8575
> URL: https://issues.apache.org/jira/browse/MESOS-8575
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 1.5.0
>Reporter: Greg Mann
>Assignee: Greg Mann
>Priority: Major
>  Labels: mesosphere
> Fix For: 1.6.0
>
>
> The functions in the Docker library which issue Docker CLI commands should be 
> updated so that when the {{Future}} they return is discarded, any 
> subprocesses which have been spawned will be cleaned up.





[jira] [Commented] (MESOS-8591) Add infra to test a hung Docker daemon

2018-02-23 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375224#comment-16375224
 ] 

Greg Mann commented on MESOS-8591:
--

{code}
commit 8c793a75bb59e59d4a2b8a63afea38b64b100a97
Author: Greg Mann 
Date:   Fri Feb 23 16:41:48 2018 -0800

Added test fixture for a hung Docker daemon.

The new 'HungDockerTest' class allows test authors to force
certain Docker daemon calls to be delayed for a specified
duration.

Review: https://reviews.apache.org/r/65751/
{code}

> Add infra to test a hung Docker daemon
> --
>
> Key: MESOS-8591
> URL: https://issues.apache.org/jira/browse/MESOS-8591
> Project: Mesos
>  Issue Type: Improvement
>  Components: docker, test
>Affects Versions: 1.5.0
>Reporter: Greg Mann
>Assignee: Greg Mann
>Priority: Major
>  Labels: mesosphere
> Fix For: 1.6.0
>
>
> We should add infrastructure to our tests which enables us to test the 
> behavior of the Docker executor and containerizer in the presence of a hung 
> Docker daemon.
> One possible first-order solution is to build a simple binary which never 
> returns. We could initialize the agent/executor with this binary instead of 
> the Docker CLI in order to simulate a Docker daemon which hangs on every call.
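
A rough sketch of the "binary which never returns" idea, in Python for brevity. This is not the 'HungDockerTest' fixture: `HUNG_CLI` and `call_cli` are hypothetical names, and the stand-in sleeps for an hour rather than forever. It shows how a test could substitute a hanging command for the real CLI and observe that a bounded caller gives up and cleans up.

```python
import subprocess
import sys

# Stand-in for a Docker CLI that hangs: a child process that sleeps far longer
# than any caller is willing to wait. (Hypothetical, for illustration only.)
HUNG_CLI = [sys.executable, "-c", "import time; time.sleep(3600)"]


def call_cli(command, timeout_seconds):
    """Run `command`; return its exit code, or None if it did not finish in time."""
    process = subprocess.Popen(command)
    try:
        return process.wait(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # Clean up the hung child so the test does not leak processes.
        process.kill()
        process.wait()
        return None


result = call_cli(HUNG_CLI, timeout_seconds=0.5)
```

Here `result` is `None`, meaning the caller timed out instead of blocking on the hung command.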





[jira] [Commented] (MESOS-8573) Container stuck in PULLING when Docker daemon hangs

2018-02-23 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375222#comment-16375222
 ] 

Greg Mann commented on MESOS-8573:
--

{code}
commit 28ab19b11ca369751f21e921cc4594e00ca667cb
Author: Greg Mann 
Date:   Fri Feb 23 16:41:51 2018 -0800

Added test for hung 'docker inspect' call during container pull.

Review: https://reviews.apache.org/r/65750/
{code}

> Container stuck in PULLING when Docker daemon hangs
> ---
>
> Key: MESOS-8573
> URL: https://issues.apache.org/jira/browse/MESOS-8573
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 1.5.0
>Reporter: Greg Mann
>Assignee: Gilbert Song
>Priority: Major
>  Labels: mesosphere
>
> When the {{force}} argument is not set to {{true}}, {{Docker::pull}} will 
> always perform a {{docker inspect}} call before it does a {{docker pull}}. If 
> either of these two Docker CLI calls hangs indefinitely, the Docker container 
> will be stuck in the PULLING state. This means that we make no further 
> progress in the {{launch()}} call path, so the executor binary is never 
> executed, the {{Future}} associated with the {{launch()}} call is never 
> failed or satisfied, and {{wait()}} is never called on the container. The 
> agent chains the executor cleanup onto that {{wait()}} call which is never 
> made. So, when the executor registration timeout elapses, 
> {{containerizer->destroy()}} is called on the executor container, but the 
> rest of the executor cleanup is never performed, and no terminal task status 
> update is sent.
> This leaves the task destined for that Docker executor stuck in TASK_STAGING 
> from the framework's perspective, and attempts to kill the task will fail.





[jira] [Assigned] (MESOS-8442) Source tree contains generated endpoint documentation

2018-02-23 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler reassigned MESOS-8442:
--

Assignee: Benjamin Mahler  (was: Benjamin Bannier)

> Source tree contains generated endpoint documentation
> -
>
> Key: MESOS-8442
> URL: https://issues.apache.org/jira/browse/MESOS-8442
> Project: Mesos
>  Issue Type: Task
>  Components: documentation
>Reporter: Benjamin Bannier
>Assignee: Benjamin Mahler
>Priority: Major
>
> Even though we generate documentation automatically in CI, the source tree 
> still contains checked in, generated endpoint documentation in 
> {{docs/endpoints}}.
> We should remove these source files from the tree. We need to make sure to
> * not break automatic website generation with 
> {{support/mesos-website/build.sh}},
> * not break the local website generation workflow with 
> {{site/mesos-website-dev.sh}}, and
> * not break local website generation workflow with {{rake}} via 
> {{site/Rakefile}}.





[jira] [Created] (MESOS-8605) Terminal task status update will not send if 'docker inspect' is hung

2018-02-23 Thread Greg Mann (JIRA)
Greg Mann created MESOS-8605:


 Summary: Terminal task status update will not send if 'docker 
inspect' is hung
 Key: MESOS-8605
 URL: https://issues.apache.org/jira/browse/MESOS-8605
 Project: Mesos
  Issue Type: Improvement
  Components: docker
Affects Versions: 1.5.0
Reporter: Greg Mann


When the agent processes a terminal status update for a task, it calls 
{{containerizer->update()}} on the container before it forwards the update: 
https://github.com/apache/mesos/blob/9635d4a2d12fc77935c3d5d166469258634c6b7e/src/slave/slave.cpp#L5509-L5514

In the Docker containerizer, {{update()}} calls {{Docker::inspect()}}, which 
means that if the inspect call hangs, the terminal update will not be sent: 
https://github.com/apache/mesos/blob/9635d4a2d12fc77935c3d5d166469258634c6b7e/src/slave/containerizer/docker.cpp#L1714





[jira] [Commented] (MESOS-8575) Improve discard handling for 'Docker::stop' and 'Docker::pull'

2018-02-23 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375102#comment-16375102
 ] 

Greg Mann commented on MESOS-8575:
--

Review here:
https://reviews.apache.org/r/65787/

> Improve discard handling for 'Docker::stop' and 'Docker::pull'
> --
>
> Key: MESOS-8575
> URL: https://issues.apache.org/jira/browse/MESOS-8575
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 1.5.0
>Reporter: Greg Mann
>Assignee: Greg Mann
>Priority: Major
>  Labels: mesosphere
>
> The functions in the Docker library which issue Docker CLI commands should be 
> updated so that when the {{Future}} they return is discarded, any 
> subprocesses which have been spawned will be cleaned up.





[jira] [Issue Comment Deleted] (MESOS-8575) Improve discard handling for 'Docker::stop' and 'Docker::pull'

2018-02-23 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-8575:
-
Comment: was deleted

(was: Review: https://reviews.apache.org/r/65683/)

> Improve discard handling for 'Docker::stop' and 'Docker::pull'
> --
>
> Key: MESOS-8575
> URL: https://issues.apache.org/jira/browse/MESOS-8575
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 1.5.0
>Reporter: Greg Mann
>Assignee: Greg Mann
>Priority: Major
>  Labels: mesosphere
>
> The functions in the Docker library which issue Docker CLI commands should be 
> updated so that when the {{Future}} they return is discarded, any 
> subprocesses which have been spawned will be cleaned up.





[jira] [Commented] (MESOS-8573) Container stuck in PULLING when Docker daemon hangs

2018-02-23 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375041#comment-16375041
 ] 

Greg Mann commented on MESOS-8573:
--

The following test probes this failure scenario: 
https://reviews.apache.org/r/65750

> Container stuck in PULLING when Docker daemon hangs
> ---
>
> Key: MESOS-8573
> URL: https://issues.apache.org/jira/browse/MESOS-8573
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 1.5.0
>Reporter: Greg Mann
>Assignee: Gilbert Song
>Priority: Major
>  Labels: mesosphere
>
> When the {{force}} argument is not set to {{true}}, {{Docker::pull}} will 
> always perform a {{docker inspect}} call before it does a {{docker pull}}. If 
> either of these two Docker CLI calls hangs indefinitely, the Docker container 
> will be stuck in the PULLING state. This means that we make no further 
> progress in the {{launch()}} call path, so the executor binary is never 
> executed, the {{Future}} associated with the {{launch()}} call is never 
> failed or satisfied, and {{wait()}} is never called on the container. The 
> agent chains the executor cleanup onto that {{wait()}} call which is never 
> made. So, when the executor registration timeout elapses, 
> {{containerizer->destroy()}} is called on the executor container, but the 
> rest of the executor cleanup is never performed, and no terminal task status 
> update is sent.
> This leaves the task destined for that Docker executor stuck in TASK_STAGING 
> from the framework's perspective, and attempts to kill the task will fail.





[jira] [Created] (MESOS-8604) Quota headroom tracking may be incorrect in the presence of hierarchical reservation.

2018-02-23 Thread Meng Zhu (JIRA)
Meng Zhu created MESOS-8604:
---

 Summary: Quota headroom tracking may be incorrect in the presence 
of hierarchical reservation.
 Key: MESOS-8604
 URL: https://issues.apache.org/jira/browse/MESOS-8604
 Project: Mesos
  Issue Type: Bug
  Components: allocation
Affects Versions: 1.5.0
Reporter: Meng Zhu
Assignee: Meng Zhu


When calculating the global quota headroom, we subtract all unallocated 
reservations by doing
```
for each role with reservation
    availableHeadroom -= role total reservation - role allocated reservation;
```

We only traverse roles with reservations. In the presence of hierarchical 
reservations, this is problematic. Consider a child role (e.g. "a/b") with no 
reservations: it can still get reserved resources if its ancestor has 
reservations (e.g. "a" has reservations). However, the allocated reserved 
resources of role "a/b" will be ignored by the above code.
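
A small numeric sketch of the miscount, in Python for brevity. This is not the allocator code: the function names and the scalar "resources" are hypothetical, and the hierarchical variant assumes that no descendant of a reserving role holds its own reservation.

```python
def unallocated_reservations_naive(reserved, allocated_reserved):
    # Only roles that hold reservations are visited, as in the pseudocode above.
    return sum(total - allocated_reserved.get(role, 0.0)
               for role, total in reserved.items())


def is_same_or_descendant(role, ancestor):
    return role == ancestor or role.startswith(ancestor + "/")


def unallocated_reservations_hierarchical(reserved, allocated_reserved):
    # Also charge allocations made to descendant roles against the ancestor's
    # reservation. Simplification: descendants hold no reservations themselves.
    result = 0.0
    for role, total in reserved.items():
        used = sum(amount for consumer, amount in allocated_reserved.items()
                   if is_same_or_descendant(consumer, role))
        result += total - used
    return result


reserved = {"a": 10.0}                       # only "a" holds a reservation
allocated_reserved = {"a": 2.0, "a/b": 3.0}  # "a/b" consumes part of it

naive = unallocated_reservations_naive(reserved, allocated_reserved)
hierarchical = unallocated_reservations_hierarchical(reserved, allocated_reserved)
```

With these numbers the naive traversal reports 8.0 unallocated reserved units, while only 5.0 are actually unallocated, so the headroom would be reduced by 3.0 too much.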





[jira] [Issue Comment Deleted] (MESOS-8576) Improve discard handling of 'Docker::inspect()'

2018-02-23 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-8576:
-
Comment: was deleted

(was: Closing in favor of MESOS-8575.)

> Improve discard handling of 'Docker::inspect()'
> ---
>
> Key: MESOS-8576
> URL: https://issues.apache.org/jira/browse/MESOS-8576
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization, docker
>Affects Versions: 1.5.0
>Reporter: Greg Mann
>Assignee: Greg Mann
>Priority: Major
>  Labels: mesosphere
>
> In the call path of {{Docker::inspect()}}, each continuation currently checks 
> if {{promise->future().hasDiscard()}}, where the {{promise}} is associated 
> with the output of the {{docker inspect}} call. However, if the call to 
> {{docker inspect}} becomes hung indefinitely, then continuations are never 
> invoked, and a subsequent discard of the returned {{Future}} will have no 
> effect. We should add proper {{onDiscard}} handling to that {{Future}} so 
> that appropriate cleanup is performed in such cases.





[jira] [Comment Edited] (MESOS-7518) Mesos master and slave get uninstalled by ubuntu 16.04

2018-02-23 Thread Martin Tapp (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374819#comment-16374819
 ] 

Martin Tapp edited comment on MESOS-7518 at 2/23/18 6:46 PM:
-

Ok, so we're still experiencing this issue after reporting it 10 months ago, 
any help please?


was (Author: doctapp):
Ok, so we're still experiencing this issue after reporting it 6 months ago, any 
help please?

> Mesos master and slave get uninstalled by ubuntu 16.04
> --
>
> Key: MESOS-7518
> URL: https://issues.apache.org/jira/browse/MESOS-7518
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, master
>Affects Versions: 1.2.0
> Environment: Ubuntu 16.04
>Reporter: Martin Tapp
>Priority: Major
>
> Since we've upgraded to Mesos 1.2 (from 1.0) on Ubuntu 16.04, the master and 
> agent/slave sometimes get uninstalled automatically. This always happens 
> after a reboot, when you restart the service, and maybe every other day 
> otherwise on its own. We're running on bare metal servers and VMs with the 
> same result.
> sudo service mesos-master status yields
> Warning: mesos-master.service changed on disk. Run 'systemctl daemon-reload' 
> to reload units.
> Same for mesos-slave.
> Solution is to re-run our mesos provisioning automatically. apt-get install 
> mesos re-installs it all the time.
> We're using apt source 'deb http://repos.mesosphere.io/ubuntu xenial main' 
> with key 
> 'http://keyserver.ubuntu.com/pks/lookup?op=get=on=0xE56151BF'
> Any idea?
> Thanks





[jira] [Commented] (MESOS-7518) Mesos master and slave get uninstalled by ubuntu 16.04

2018-02-23 Thread Martin Tapp (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374819#comment-16374819
 ] 

Martin Tapp commented on MESOS-7518:


Ok, so we're still experiencing this issue after reporting it 6 months ago, any 
help please?

> Mesos master and slave get uninstalled by ubuntu 16.04
> --
>
> Key: MESOS-7518
> URL: https://issues.apache.org/jira/browse/MESOS-7518
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, master
>Affects Versions: 1.2.0
> Environment: Ubuntu 16.04
>Reporter: Martin Tapp
>Priority: Major
>
> Since we've upgraded to Mesos 1.2 (from 1.0) on Ubuntu 16.04, the master and 
> agent/slave sometimes get uninstalled automatically. This always happens 
> after a reboot, when you restart the service, and maybe every other day 
> otherwise on its own. We're running on bare metal servers and VMs with the 
> same result.
> sudo service mesos-master status yields
> Warning: mesos-master.service changed on disk. Run 'systemctl daemon-reload' 
> to reload units.
> Same for mesos-slave.
> Solution is to re-run our mesos provisioning automatically. apt-get install 
> mesos re-installs it all the time.
> We're using apt source 'deb http://repos.mesosphere.io/ubuntu xenial main' 
> with key 
> 'http://keyserver.ubuntu.com/pks/lookup?op=get=on=0xE56151BF'
> Any idea?
> Thanks





[jira] [Created] (MESOS-8603) SlaveTest.TerminalTaskContainerizerUpdateFailsWithGone and SlaveTest.TerminalTaskContainerizerUpdateFailsWithLost are flaky

2018-02-23 Thread Jan Schlicht (JIRA)
Jan Schlicht created MESOS-8603:
---

 Summary: SlaveTest.TerminalTaskContainerizerUpdateFailsWithGone 
and SlaveTest.TerminalTaskContainerizerUpdateFailsWithLost are flaky
 Key: MESOS-8603
 URL: https://issues.apache.org/jira/browse/MESOS-8603
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Jan Schlicht
 Attachments: TerminalTaskContainerizerUpdateFailsWithGone, 
TerminalTaskContainerizerUpdateFailsWithLost

Both tests fail from time to time. Attached is the verbose test output of the failures.





[jira] [Commented] (MESOS-7176) Add versioning support to network/cni isolator

2018-02-23 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374228#comment-16374228
 ] 

Qian Zhang commented on MESOS-7176:
---

According to the [CNI 
spec|https://github.com/containernetworking/cni/blob/master/SPEC.md#released-versions],
 one of the major changes introduced in CNI spec 0.3.0 is a rich result type. 
The result type of CNI spec 0.3.0 is described 
[here|https://github.com/containernetworking/cni/blob/spec-v0.3.0/SPEC.md#result],
 and it differs from that of CNI spec 0.2.0. The CNI isolator in Mesos uses 
CNI spec 0.2.0; see 
[here|https://github.com/apache/mesos/blob/1.5.0/src/slave/containerizer/mesos/isolators/network/cni/spec.proto#L63:L67]
 for details.

As a result, the CNI isolator currently cannot support a CNI network 
configuration whose version is 0.3.0+. If the CNI isolator invokes a CNI plugin 
(suppose it also supports CNI spec 0.3.0+) with a CNI network configuration of 
version 0.3.0+ (see the example below) as its input, the CNI plugin will return 
a result which conforms to the same CNI spec version as the input network 
configuration (i.e., 0.3.0 in the example below). However, the CNI isolator 
always uses CNI spec 0.2.0 to parse the result (see 
[here|https://github.com/apache/mesos/blob/1.5.0/src/slave/containerizer/mesos/isolators/network/cni/spec.cpp#L46:L59]
 for details), so parsing will fail.
{code:java}
{
  "cniVersion": "0.3.0",
  "name": "dbnet",
  "type": "bridge",
  "bridge": "cni0",
  "ipam": {
    "type": "dhcp"
  }
}{code}
So I think we should improve the CNI isolator to support CNI spec 0.3.0 as 
well, and parse the result returned by the CNI plugin based on the CNI spec 
version of the result.
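
Parsing based on the version of the result could look roughly like the sketch below. It is written in Python for brevity and is not the isolator code: `parse_ipv4_address` is a hypothetical helper, the sample payloads are abbreviated according to my reading of the 0.2.0 and 0.3.0 specs, and defaulting a missing `cniVersion` to 0.2.0 is an assumption made for this example.

```python
import json


def parse_ipv4_address(result_json):
    """Extract the IPv4 address from a CNI result, dispatching on its version."""
    result = json.loads(result_json)
    # Assumption for this sketch: treat a missing version as the 0.2.0 layout.
    version = result.get("cniVersion", "0.2.0")
    major_minor = tuple(int(part) for part in version.split(".")[:2])

    if major_minor < (0, 3):
        # 0.2.0 layout: a single "ip4" object with an "ip" field.
        return result["ip4"]["ip"]

    # 0.3.x layout: a list of "ips" entries tagged with an IP version.
    for entry in result["ips"]:
        if entry["version"] == "4":
            return entry["address"]
    raise ValueError("no IPv4 address in CNI result")


old_result = '{"cniVersion": "0.2.0", "ip4": {"ip": "10.1.0.5/16"}}'
new_result = ('{"cniVersion": "0.3.0", '
              '"ips": [{"version": "4", "address": "10.1.0.5/16"}]}')

old_address = parse_ipv4_address(old_result)
new_address = parse_ipv4_address(new_result)
```

Both payloads yield the same address, which is the behavior a version-aware parser would need in order to stay backward compatible with 0.2.0 plugins.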

> Add versioning support to network/cni isolator
> --
>
> Key: MESOS-7176
> URL: https://issues.apache.org/jira/browse/MESOS-7176
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Avinash Sridharan
>Assignee: Deepak Goel
>Priority: Major
>
> Currently the network/cni isolator supports CNI spec version 0.2. CNI 
> spec version 0.3 has already been ratified and introduces new features such 
> as CNI service chaining and CNI plugin capabilities. However, CNI spec 
> version 0.3 is incompatible with CNI spec 0.2. Hence we need to introduce 
> versioning support in the `network/cni` isolator in order to keep it backward 
> compatible.


