[jira] [Assigned] (MESOS-5231) Create Design Doc for "Manage offers in allocator"

2018-02-08 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu reassigned MESOS-5231:


Assignee: Joseph Wu

> Create Design Doc for "Manage offers in allocator"
> --------------------------------------------------
>
> Key: MESOS-5231
> URL: https://issues.apache.org/jira/browse/MESOS-5231
> Project: Mesos
>  Issue Type: Bug
>Reporter: Klaus Ma
>Assignee: Joseph Wu
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (MESOS-8557) Default executor should allow decreasing the escalation grace period of a terminating task

2018-02-08 Thread Gastón Kleiman (JIRA)
Gastón Kleiman created MESOS-8557:
-

 Summary: Default executor should allow decreasing the escalation 
grace period of a terminating task
 Key: MESOS-8557
 URL: https://issues.apache.org/jira/browse/MESOS-8557
 Project: Mesos
  Issue Type: Bug
Reporter: Gastón Kleiman


The command executor supports [decreasing the escalation grace period of a 
terminating 
task|https://github.com/apache/mesos/blob/c665dd6c22715fa941200020a8f7209f1f5b1ca1/src/launcher/executor.cpp#L800-L803].

For consistency, this should also be supported by the default executor.
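
To illustrate the expected semantics with a minimal standalone sketch (not 
the executor's actual code): a later kill request carrying a shorter grace 
period should move the SIGTERM-to-SIGKILL escalation deadline earlier, while 
a longer one should never extend it.

{code}
#include <algorithm>
#include <chrono>
#include <iostream>

using Clock = std::chrono::steady_clock;

int main()
{
  // Deadline at which SIGTERM escalates to SIGKILL for a terminating task.
  Clock::time_point escalation = Clock::now() + std::chrono::seconds(30);

  // A subsequent kill request may only ever shorten the deadline.
  auto onKill = [&escalation](std::chrono::seconds gracePeriod) {
    escalation = std::min(escalation, Clock::now() + gracePeriod);
  };

  onKill(std::chrono::seconds(5));   // shortens the escalation deadline
  onKill(std::chrono::seconds(60));  // ignored: would extend the deadline

  std::cout << std::chrono::duration_cast<std::chrono::seconds>(
                   escalation - Clock::now()).count()
            << "s until escalation" << std::endl;  // roughly 5s
}
{code}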



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-4553) Manage offers in allocator.

2018-02-08 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu reassigned MESOS-4553:


Assignee: Joseph Wu

> Manage offers in allocator.
> ---
>
> Key: MESOS-4553
> URL: https://issues.apache.org/jira/browse/MESOS-4553
> Project: Mesos
>  Issue Type: Epic
>  Components: master
>Reporter: Klaus Ma
>Assignee: Joseph Wu
>Priority: Major
>
> Currently, the {{offers}} are managed by the {{Master}}, which introduces two 
> issues:
> 1. In Quota, the master rescinds more offers than necessary to address a race 
> condition.
> 2. The allocator cannot modify offers: resources must be returned to the 
> allocator and offered again, which impacts resource utilisation & performance 
> (MESOS-3078).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-8534) Allow nested containers in TaskGroups to have separate network namespaces

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1635#comment-1635
 ] 

ASF GitHub Bot commented on MESOS-8534:
---

GitHub user sagar8192 opened a pull request:

https://github.com/apache/mesos/pull/263

Allow nested containers in pods to have separate namespaces (Ref: 
MESOS-8534).

This change allows nested containers to have separate network and mount 
namespaces. It also retains the existing functionality, where a nested 
container can attach to its parent's network and mount namespaces.

I have not fixed/added tests for this. First, I want to make sure that this 
is the right approach. If this change looks good to the reviewers, I will 
fix/add unit tests.

After this change is shipped, the docs also need to be updated.
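
For orientation, a minimal Linux-only sketch (illustrative, not this PR's 
code; requires root) of the underlying primitive: creating a child with its 
own network and mount namespaces via clone(2), as opposed to joining the 
parent's via setns(2).

{code}
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>

static char stack[1024 * 1024];

static int child(void* arg)
{
  // This process runs in its own network and mount namespaces.
  printf("child running with separate netns + mntns\n");
  return 0;
}

int main()
{
  // CLONE_NEWNET | CLONE_NEWNS requests separate namespaces; omitting
  // them (and calling setns(2)) would attach to the parent's instead.
  pid_t pid = clone(child, stack + sizeof(stack),
                    CLONE_NEWNET | CLONE_NEWNS | SIGCHLD, NULL);
  if (pid == -1) {
    perror("clone (requires root/CAP_SYS_ADMIN)");
    return 1;
  }
  waitpid(pid, NULL, 0);
  return 0;
}
{code}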

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sagar8192/mesos 
sagarp-MESOS-MESOS-8534-allow-ip-per-container-in-pods

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/mesos/pull/263.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #263


commit 68deedb42212c42c6c7719e8432fd6b0031239c4
Author: Sagar Sadashiv Patwardhan 
Date:   2018-02-09T00:50:44Z

Allow nested containers in pods to have separate namespaces.




> Allow nested containers in TaskGroups to have separate network namespaces
> -
>
> Key: MESOS-8534
> URL: https://issues.apache.org/jira/browse/MESOS-8534
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Sagar Sadashiv Patwardhan
>Priority: Minor
>  Labels: cni
>
> As per the discussion with [~jieyu] and [~avinash.mesos], I am going to 
> allow nested containers in TaskGroups to have separate namespaces. I am also 
> going to retain the existing functionality, where nested containers can 
> connect to the parent/root container's namespace.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-6838) Reconsider the semantics of `subprocess` on Windows

2018-02-08 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357741#comment-16357741
 ] 

Andrew Schwartzmeyer commented on MESOS-6838:
-

This issue will be resolved with https://reviews.apache.org/r/65574/ by fixing 
the TODO assertion.

> Reconsider the semantics of `subprocess` on Windows
> ---
>
> Key: MESOS-6838
> URL: https://issues.apache.org/jira/browse/MESOS-6838
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>Priority: Major
>  Labels: microsoft
>
> Right now, throughout the codebase, we are passing Windows shell commands 
> into `subprocess`'s `argv` parameter, and ignoring the `path` parameter. For 
> example, we might do something like:
> {code}
> subprocess("", "cmd /c mesos-containerizer.exe", ...)
> {code}
> The `cmd /c` here is required. This obviously diverges from the Unix 
> semantics of `subprocess`, so we should consider ways to clean this up.
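> For contrast, a hedged sketch of the intended Unix-style call shape (the 
> binary path and argv list here are made up for illustration; trailing 
> arguments elided as above):
> {code}
> subprocess(
>     "/usr/libexec/mesos/mesos-containerizer",   // `path`: the binary to exec
>     {"mesos-containerizer", "launch"},          // `argv`: argv[0] + arguments
>     ...)
> {code}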



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-8378) ExamplesTest.PythonFramework gets stuck.

2018-02-08 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357714#comment-16357714
 ] 

Till Toenshoff commented on MESOS-8378:
---

The above was seen on El Capitan with Apple clang 8.x.

I currently cannot reproduce this. It may very well be an issue specific to 
the OS or Apple clang version. All machines at my disposal run High Sierra 
and Apple clang 9.x.

Closing this until it pops up again.


> ExamplesTest.PythonFramework gets stuck.
> ----------------------------------------
>
> Key: MESOS-8378
> URL: https://issues.apache.org/jira/browse/MESOS-8378
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: MacOS with SSL
>Reporter: Alexander Rukletsov
>Assignee: Till Toenshoff
>Priority: Major
>  Labels: flaky-test
> Attachments: ExamplesTest.PythonFramework-badrun1.txt, 
> ExamplesTest.PythonFramework-badrun2.txt
>
>
> Observed this failure today twice on MacOS box. Full logs attached. These 
> lines look suspicious to me:
> {noformat}
> 10:22:22 W0103 02:22:22.359180 3747840 sched.cpp:526] Authentication timed out
> 10:22:22 I0103 02:22:22.359292 1064960 sched.cpp:466] Failed to authenticate 
> with master master@10.0.49.4:62351: Authentication discarded
> 10:22:22 E0103 02:22:22.559609 528384 process.cpp:2922] libprocess: 
> slave(2)@10.0.49.4:62351 terminating due to unordered_map::at: key not found
> 10:22:22 E0103 02:22:22.947485 1064960 process.cpp:2922] libprocess: 
> slave(3)@10.0.49.4:62351 terminating due to unordered_map::at: key not found
> 10:22:23 E0103 02:22:23.008870 528384 process.cpp:2922] libprocess: 
> slave(1)@10.0.49.4:62351 terminating due to unordered_map::at: key not found
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-8512) Fetcher doesn't log its stdout/stderr properly to the log file

2018-02-08 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357712#comment-16357712
 ] 

Andrew Schwartzmeyer commented on MESOS-8512:
-

Going to add unit tests that check for stderr/stdout output, as they would have 
caught this bug.

> Fetcher doesn't log its stdout/stderr properly to the log file
> ---------------------------------------------------------------
>
> Key: MESOS-8512
> URL: https://issues.apache.org/jira/browse/MESOS-8512
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
> Environment: Windows 10
>Reporter: Jeff Coffler
>Assignee: Andrew Schwartzmeyer
>Priority: Major
>  Labels: windows
>
> The fetcher doesn't log its stdout or stderr to the task's output files as 
> it does on Linux. This makes it extraordinarily difficult to diagnose fetcher 
> failures (bad URI, or permissions problems, or whatever).
> It does not appear to be a glog issue. I added output to the fetcher via cout 
> and cerr, and that output didn't show up in the log files either. So it 
> appears to be a logging capture issue.
> Note that the container launcher, launched from 
> src/slave/containerizer/mesos/launcher.cpp, does appear to log properly. 
> However, when launching the fetcher itself from 
> src/slave/containerizer/fetcher.cpp (FetcherProcess::run), logging does not 
> happen properly.
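> A sketch of the redirection that presumably needs to happen when the fetcher 
> is launched (libprocess `Subprocess::PATH` writes a stream to a file; the 
> variable names here are assumed):
> {code}
> Try<Subprocess> fetcher = subprocess(
>     path,
>     argv,
>     Subprocess::FD(STDIN_FILENO),
>     Subprocess::PATH(path::join(sandboxDirectory, "stdout")),
>     Subprocess::PATH(path::join(sandboxDirectory, "stderr")));
> {code}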



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (MESOS-8556) Boost emits warning repeatedly

2018-02-08 Thread Andrew Schwartzmeyer (JIRA)
Andrew Schwartzmeyer created MESOS-8556:
---

 Summary: Boost emits warning repeatedly
 Key: MESOS-8556
 URL: https://issues.apache.org/jira/browse/MESOS-8556
 Project: Mesos
  Issue Type: Improvement
 Environment: Windows 10 with Boost 1.65.0
Reporter: Andrew Schwartzmeyer
Assignee: Andrew Schwartzmeyer


Boost emits the following warning when it's included in our build:

> Unknown compiler version - please run the configure tests and report the 
> results

It's not a bug, and it doesn't break anything, but it's annoying. It's still 
present in 1.65.0 and will be fixed in 1.65.1. It's just due to an out-of-date 
configuration file failing to detect the MSVC version (in this case, VS 2017).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-8069) Role-related endpoints need to reflect hierarchical accounting.

2018-02-08 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone reassigned MESOS-8069:
-

Assignee: Till Toenshoff

> Role-related endpoints need to reflect hierarchical accounting.
> ---
>
> Key: MESOS-8069
> URL: https://issues.apache.org/jira/browse/MESOS-8069
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, HTTP API, master
>Reporter: Benjamin Mahler
>Assignee: Till Toenshoff
>Priority: Major
>  Labels: multitenancy
>
> With the introduction of hierarchical roles, the role-related endpoints need 
> to be updated to provide aggregated accounting information.
> For example, information about how many resources are allocated to "/eng" 
> should include the resources allocated to "/eng/frontend" and "/eng/backend", 
> since quota guarantees and limits are also applied at the aggregate level.
> This also affects the UI display, for example the 'Roles' tab.
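> A toy standalone illustration of the desired roll-up (not Mesos code; the 
> numbers are made up): a parent role's reported allocation should sum its own 
> allocation and that of all of its descendants.
> {code}
> #include <iostream>
> #include <map>
> #include <string>
> 
> int main()
> {
>   // Allocated CPUs per role, keyed by hierarchical role name.
>   std::map<std::string, double> allocated =
>     {{"/eng", 2.0}, {"/eng/frontend", 4.0}, {"/eng/backend", 6.0}};
> 
>   double total = 0.0;
>   for (const auto& [role, cpus] : allocated) {
>     // "/eng" itself plus any descendant role "/eng/...".
>     if (role == "/eng" || role.rfind("/eng/", 0) == 0) {
>       total += cpus;
>     }
>   }
>   std::cout << "/eng aggregate: " << total << " cpus" << std::endl;  // 12
> }
> {code}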



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (MESOS-8555) Support UCR on Windows

2018-02-08 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-8555:


 Summary: Support UCR on Windows
 Key: MESOS-8555
 URL: https://issues.apache.org/jira/browse/MESOS-8555
 Project: Mesos
  Issue Type: Epic
  Components: containerization
Reporter: Joseph Wu


Docker container support on Windows relies on calling this shim:
https://github.com/Microsoft/hcsshim

If we want to support container images in the Mesos Containerizer on Windows, 
we may need to look into doing the same thing (or similar).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-5371) Implement `fcntl.hpp`

2018-02-08 Thread Li Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Li reassigned MESOS-5371:


Assignee: Andrew Schwartzmeyer  (was: Alex Clemmer)

> Implement `fcntl.hpp`
> -
>
> Key: MESOS-5371
> URL: https://issues.apache.org/jira/browse/MESOS-5371
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>Priority: Minor
>  Labels: mesosphere, stout, windows-mvp
>
> `fcntl.hpp` has a bunch of functions that will never work on Windows. We will 
> need to work around them, either by working around specific call sites of 
> functions like `os::cloexec`, or by implementing something that keeps track 
> of which file descriptors are cloexec, and which aren't.
> NOTE: We have elected to log warnings for these functions when we call them, 
> so that it is obvious they have done nothing. This carries a performance 
> penalty, especially for the master; when we resolve this issue, it is 
> important that we remove the logging as well.
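> A minimal sketch of the second option above (illustration only; real code 
> would need synchronization and hooks wherever descriptors are created, 
> duplicated, or closed):
> {code}
> #include <set>
> 
> // Process-wide registry of descriptors that are conceptually cloexec,
> // for platforms (Windows) with no fcntl(F_SETFD, FD_CLOEXEC).
> static std::set<int> cloexecFds;
> 
> void cloexec(int fd)   { cloexecFds.insert(fd); }
> bool isCloexec(int fd) { return cloexecFds.count(fd) > 0; }
> {code}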



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-3386) Port remaining Stout and libprocess tests to Windows

2018-02-08 Thread Li Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Li reassigned MESOS-3386:


Assignee: Eric Mumau  (was: John Kordich)

> Port remaining Stout and libprocess tests to Windows
> 
>
> Key: MESOS-3386
> URL: https://issues.apache.org/jira/browse/MESOS-3386
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Alex Clemmer
>Assignee: Eric Mumau
>Priority: Major
>  Labels: build, mesosphere, microsoft, tests
>
> We will need to go through all the test files and investigate any test that's 
> marked `TEST_TEMP_DISABLED_ON_WINDOWS`.
> Additionally, here is a concise list of the Stout test files that don't 
> compile as of 12/5/2016:
> {quote}
> Stout:
> path_tests.cpp
> protobuf_tests.cpp
> protobuf_tests.pb.cc
> svn_tests.cpp
> os/sendfile_tests.cpp
> os/signals_tests.cpp
> libprocess:
> io_tests.cpp
> reap_tests.cpp
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-3396) Fully separate out libprocess and Stout CMake build system from the Mesos build system

2018-02-08 Thread Li Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Li reassigned MESOS-3396:


Assignee: Joseph Wu

> Fully separate out libprocess and Stout CMake build system from the Mesos 
> build system
> --
>
> Key: MESOS-3396
> URL: https://issues.apache.org/jira/browse/MESOS-3396
> Project: Mesos
>  Issue Type: Task
>  Components: cmake
>Reporter: Alex Clemmer
>Assignee: Joseph Wu
>Priority: Minor
>  Labels: cmake, mesosphere
>
> The official goal is to be able to put libprocess and stout into a call to 
> `ExternalProject_Add`, rather than having them built in-tree as they are now. 
> Since Libprocess and Stout depend on a few variables being defined by the 
> project that is building against them (such as, e.g., the `LINUX` variable), 
> this will involve, at minimum, figuring out which `-D` flags have to be 
> passed through the `ExternalProject_Add` call.
> NOTE: This goal may not be feasible. We will need to trigger a rebuild of 
> many source files if we change a header in Libprocess or Stout, and a relink 
> if we change a .cpp file in the source files of Libprocess. This might 
> require a fair bit of effort.
> Another complication is that `StoutConfigure` manages the dependencies of 
> Stout; if Stout is built through `ExternalProject_Add`, we will need to make 
> sure these dependencies are managed in roughly the same way they are now.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-3396) Fully separate out libprocess and Stout CMake build system from the Mesos build system

2018-02-08 Thread Li Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Li reassigned MESOS-3396:


Assignee: (was: Andrew Schwartzmeyer)

> Fully separate out libprocess and Stout CMake build system from the Mesos 
> build system
> --
>
> Key: MESOS-3396
> URL: https://issues.apache.org/jira/browse/MESOS-3396
> Project: Mesos
>  Issue Type: Task
>  Components: cmake
>Reporter: Alex Clemmer
>Priority: Minor
>  Labels: cmake, mesosphere
>
> The official goal is to be able to put libprocess and stout into a call to 
> `ExternalProject_Add`, rather than having them built in-tree as they are now. 
> Since Libprocess and Stout depend on a few variables being defined by the 
> project that is building against them (such as, e.g., the `LINUX` variable), 
> this will involve, at minimum, figuring out which `-D` flags have to be 
> passed through the `ExternalProject_Add` call.
> NOTE: This goal may not be feasible. We will need to trigger a rebuild of 
> many source files if we change a header in Libprocess or Stout, and a relink 
> if we change a .cpp file in the source files of Libprocess. This might 
> require a fair bit of effort.
> Another complication is that `StoutConfigure` manages the dependencies of 
> Stout; if Stout is built through `ExternalProject_Add`, we will need to make 
> sure these dependencies are managed in roughly the same way they are now.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-5610) Transition Docker image "whiteout" deletion from FTS to something Windows-compatible

2018-02-08 Thread Li Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Li reassigned MESOS-5610:


Assignee: (was: Andrew Schwartzmeyer)

> Transition Docker image "whiteout" deletion from FTS to something 
> Windows-compatible
> 
>
> Key: MESOS-5610
> URL: https://issues.apache.org/jira/browse/MESOS-5610
> Project: Mesos
>  Issue Type: Wish
>  Components: containerization
>Reporter: Alex Clemmer
>Priority: Minor
>  Labels: mesosphere, windows-mvp
>
> In `slave/containerizer/mesos/provisioner/provisioner.cpp`, the function 
> `ProvisionerProcess::__provision` uses the FTS API to walk the directory and 
> delete some files and directories per the Docker v1 spec[1].
> The FTS API conforms to BSD; it is not available on Windows.
> We therefore have two options: we can (1) prove we don't need this for 
> Windows Containers, or (2) implement a version of this code that does not 
> depend on FTS, in a manner similar to our modifications to `os::rmdir` (see 
> the sketch below).
> [1] https://github.com/docker/docker/blob/master/image/spec/v1.md
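> For reference, a standalone sketch (assuming C++17 `std::filesystem`, which 
> is available on Windows) of walking a rootfs and spotting whiteout markers 
> without FTS:
> {code}
> #include <filesystem>
> #include <iostream>
> #include <string>
> #include <system_error>
> 
> namespace fs = std::filesystem;
> 
> int main()
> {
>   // Per the Docker v1 spec, a file named ".wh.<name>" marks <name> as
>   // deleted in the layer below; the walk must find and handle these.
>   std::error_code ec;
>   for (const fs::directory_entry& entry :
>        fs::recursive_directory_iterator("rootfs", ec)) {
>     const std::string name = entry.path().filename().string();
>     if (name.rfind(".wh.", 0) == 0) {
>       std::cout << "whiteout marker: " << entry.path() << std::endl;
>     }
>   }
> }
> {code}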



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-6699) Port `command_utils_test.cpp`

2018-02-08 Thread Li Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Li reassigned MESOS-6699:


Assignee: (was: Andrew Schwartzmeyer)

> Port `command_utils_test.cpp`
> -
>
> Key: MESOS-6699
> URL: https://issues.apache.org/jira/browse/MESOS-6699
> Project: Mesos
>  Issue Type: Wish
>  Components: agent
>Reporter: Alex Clemmer
>Priority: Minor
>  Labels: microsoft, windows-mvp
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-6692) Install module dependencies during build

2018-02-08 Thread Li Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357597#comment-16357597
 ] 

Li Li commented on MESOS-6692:
--

Besides the Windows work, there is some infra work that needs to be done. 
Joseph knows the details.

> Install module dependencies during build
> 
>
> Key: MESOS-6692
> URL: https://issues.apache.org/jira/browse/MESOS-6692
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent
>Reporter: Alex Clemmer
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-6706) Port `files_tests.cpp`

2018-02-08 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357598#comment-16357598
 ] 

Joseph Wu commented on MESOS-6706:
--

Two remaining disabled tests in this file:
{code}
TEST_F_TEMP_DISABLED_ON_WINDOWS(FilesTest, ResolveTest)
TEST_F_TEMP_DISABLED_ON_WINDOWS(FilesTest, BrowseTest)
{code}

> Port `files_tests.cpp`
> --
>
> Key: MESOS-6706
> URL: https://issues.apache.org/jira/browse/MESOS-6706
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>Priority: Major
>  Labels: microsoft, windows-mvp
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-6721) Group source files into folders for IDEs

2018-02-08 Thread Li Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Li reassigned MESOS-6721:


Assignee: (was: Alex Clemmer)

> Group source files into folders for IDEs
> -----------------------------------------
>
> Key: MESOS-6721
> URL: https://issues.apache.org/jira/browse/MESOS-6721
> Project: Mesos
>  Issue Type: Bug
>  Components: cmake
>Reporter: Alex Clemmer
>Priority: Minor
>  Labels: cmake, microsoft
>
> CMake has good facilities for organizing source files in a project into 
> folders, but we don't really make use of them. This is especially bad for 
> IDEs like Xcode and Visual Studio, where the source files will just end up in 
> a folder with literally everything that's included.
> For every executable and library we make, we should do something like this 
> (and it might be wrong, because my memory is hazy here):
> {code}
> set_property(TARGET ${AGENT_TARGET} PROPERTY FOLDER "src/slave")
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-6771) Add and vet `install` target

2018-02-08 Thread Li Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Li reassigned MESOS-6771:


Assignee: Joseph Wu  (was: Andrew Schwartzmeyer)

> Add and vet `install` target
> 
>
> Key: MESOS-6771
> URL: https://issues.apache.org/jira/browse/MESOS-6771
> Project: Mesos
>  Issue Type: Bug
>  Components: cmake
>Reporter: Alex Clemmer
>Assignee: Joseph Wu
>Priority: Major
>  Labels: cmake, microsoft
>
> We need to be able to do something like `make install`; while CMake comes 
> with something like this out of the box, we do need to vet it (at the very 
> least).
> As a general note (as jpeach suggests), we should take care to not generate a 
> separate binary for `mesos-slave` and `mesos-agent`. If it exists at all, it 
> should be a symlink generated upon `install`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-6973) Fix BOOST random generator initialization on Windows

2018-02-08 Thread Li Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Li reassigned MESOS-6973:


Assignee: Joseph Wu  (was: Li Li)

> Fix BOOST random generator initialization on Windows
> 
>
> Key: MESOS-6973
> URL: https://issues.apache.org/jira/browse/MESOS-6973
> Project: Mesos
>  Issue Type: Bug
>Reporter: Daniel Pravat
>Assignee: Joseph Wu
>Priority: Major
>  Labels: Windows, microsoft
>
> seed_rng::seed_rng does not produce the expected result on Windows, since it 
> is using the `/dev/urandom` file.
> {noformat}
> 0:005> k
>  # Child-SP  RetAddr   Call Site
> 00 0049`22dfc108 7ff6`5193822f kernel32!CreateFileW
> ...
> 0e 0049`22dfc660 7ff6`502228fd 
> mesos_agent!boost::uuids::detail::seed_rng::seed_rng+0x3d 
> [d:\repositories\mesoswin\build\3rdparty\boost-1.53.0\src\boost-1.53.0\boost\uuid\seed_rng.hpp
>  @ 80]
> 0f 0049`22dfc690 7ff6`502591e3 
> mesos_agent!boost::uuids::detail::seed<boost::random::mersenne_twister_engine<unsigned 
> int,32,624,397,31,2567483615,11,4294967295,7,2636928640,15,4022730752,18,1812433253>
>  >+0x4d 
> [d:\repositories\mesoswin\build\3rdparty\boost-1.53.0\src\boost-1.53.0\boost\uuid\seed_rng.hpp
>  @ 246]
> 10 0049`22dfc790 7ff6`50395518 
> mesos_agent!boost::uuids::basic_random_generator<boost::random::mersenne_twister_engine<unsigned 
> int,32,624,397,31,2567483615,11,4294967295,7,2636928640,15,4022730752,18,1812433253>
>  >::basic_random_generator
>  >+0xd3 
> [d:\repositories\mesoswin\build\3rdparty\boost-1.53.0\src\boost-1.53.0\boost\uuid\random_generator.hpp
>  @ 50]
> 11 0049`22dfc800 7ff6`500ad140 mesos_agent!id::UUID::random+0x78 
> [d:\repositories\mesoswin\3rdparty\stout\include\stout\uuid.hpp @ 49]
> 12 0049`22dfc870 7ff6`5007ff55 
> mesos_agent!mesos::internal::slave::Framework::launchExecutor+0x70 
> [d:\repositories\mesoswin\src\slave\slave.cpp @ 6301]
> 13 0049`22dfd520 7ff6`502a0a35 
> mesos_agent!mesos::internal::slave::Slave::_run+0x2455 
> [d:\repositories\mesoswin\src\slave\slave.cpp @ 1990]
> ...
> 0:005> du @rcx
> 01d7`cc55fb60  "/dev/urandom"
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-7833) stderr/stdout logs are failing to be served to Marathon

2018-02-08 Thread Li Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Li reassigned MESOS-7833:


Assignee: Andrew Schwartzmeyer  (was: John Kordich)

> stderr/stdout logs are failing to be served to Marathon
> ---
>
> Key: MESOS-7833
> URL: https://issues.apache.org/jira/browse/MESOS-7833
> Project: Mesos
>  Issue Type: Bug
> Environment: Windows 10 mesos-agent using the Mesos Containerizer
> CentOS 7 Marathon + mesos-master + zookeeper
> Deployed following [this 
> guide|https://github.com/Microsoft/mesos-log/blob/master/notes/deployment.md].
>Reporter: Andrew Schwartzmeyer
>Assignee: Andrew Schwartzmeyer
>Priority: Major
>  Labels: microsoft, windows
>
> Given an app in Marathon with the command {{powershell -noexit -c 
> get-process}}, we expect it to deploy, and the "Error Log" and "Output Log" 
> of the running instance to return the {{stderr}} and {{stdout}} files from 
> the agent.
> While the files exist on the agent with the appropriate contents, e.g. 
> {{work_dir\slaves\ff198863-667e-46b9-a64d-e22fdff3b3cb-S4\frameworks\ff198863-667e-46b9-a64d-e22fdff3b3cb-\executors\get-process.4211c4e3-7181-11e7-b702-00155dafc802\runs\7fc924b4-4ec1-4be6-9386-d4f7cc17d5ad}}
>  has {{stderr}} and {{stdout}}, and the latter has the output of 
> {{get-process}}, Marathon is unable to retrieve them.
> Clicking the link for the instance returns the error: "Sorry there was a 
> problem retrieving file. Click to retry."
> The Mesos master is receiving the request {{I0725 14:54:49.627329 226319 
> http.cpp:1133] HTTP GET for /master/state?jsonp=jsonp_15d7bbed282 from 
> 10.123.175.200:55885 ...}}, but no further logging is displayed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-4781) Executor env variables should not be leaked to the command task.

2018-02-08 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357563#comment-16357563
 ] 

Jie Yu commented on MESOS-4781:
---

[~gilbert], are you still working on this?

> Executor env variables should not be leaked to the command task.
> 
>
> Key: MESOS-4781
> URL: https://issues.apache.org/jira/browse/MESOS-4781
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>Priority: Major
>  Labels: mesosphere
>
> Currently, the command task inherits the env variables of the command 
> executor. This is less than ideal because the command executor environment 
> variables include some Mesos-internal env variables like MESOS_XXX and 
> LIBPROCESS_XXX. Also, this behavior does not match what the Docker 
> containerizer does. We should construct the env variables from scratch for 
> the command task, rather than relying on inheriting them from the command 
> executor.
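> A sketch of the construct-from-scratch idea (hypothetical helper, not the 
> actual patch): start empty and add only sane defaults plus the task's own 
> {{CommandInfo}} environment, so that MESOS_XXX / LIBPROCESS_XXX variables 
> never leak in.
> {code}
> #include <map>
> #include <string>
> 
> std::map<std::string, std::string> taskEnvironment(
>     const std::map<std::string, std::string>& commandInfoEnv)
> {
>   std::map<std::string, std::string> env;  // NOT a copy of the executor's
>   env["PATH"] = "/usr/local/bin:/usr/bin:/bin";  // sane default
>   for (const auto& [name, value] : commandInfoEnv) {
>     env[name] = value;  // only what the task explicitly asked for
>   }
>   return env;
> }
> {code}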



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (MESOS-8554) Enhance V1 scheduler send API to receive sync response from Master

2018-02-08 Thread Kapil Arya (JIRA)
Kapil Arya created MESOS-8554:
-

 Summary: Enhance V1 scheduler send API to receive sync response 
from Master
 Key: MESOS-8554
 URL: https://issues.apache.org/jira/browse/MESOS-8554
 Project: Mesos
  Issue Type: Task
  Components: HTTP API
Reporter: Kapil Arya
Assignee: Kapil Arya


The current scheduler HTTP API doesn't provide a way for the scheduler to get 
a synchronous response back from the Master. A synchronous API means the 
scheduler wouldn't have to wait on the event stream to check the status of 
operations that require master-only validation/approval/rejection.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-7450) Docker containerizer will leak dangling symlinks if restarted with a colon in the sandbox path

2018-02-08 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357440#comment-16357440
 ] 

Joseph Wu commented on MESOS-7450:
--

Yes.  Here's a relevant TODO:
https://github.com/apache/mesos/blob/c665dd6c22715fa941200020a8f7209f1f5b1ca1/src/slave/containerizer/docker.hpp#L445-L448

> Docker containerizer will leak dangling symlinks if restarted with a colon in 
> the sandbox path
> --
>
> Key: MESOS-7450
> URL: https://issues.apache.org/jira/browse/MESOS-7450
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.21.0, 1.2.0
>Reporter: Joseph Wu
>Priority: Major
>  Labels: mesosphere
>
> The Docker CLI has a limitation, which was worked around in MESOS-1833.
> TL;DR: If you launch a container with a colon ({{:}}) in the sandbox path, we 
> will create a symlink to that path and mount that symlink into the Docker 
> container.
> However, when you restart the Mesos agent after launching a container like 
> the above, the Docker containerizer will "forget" about the symlink and 
> thereby not clean it up when the container exits.  We will still GC the 
> actual sandbox, but not the symlink.
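> For context, a standalone sketch (C++17; the paths are hypothetical) of the 
> MESOS-1833 workaround being described -- Docker's {{-v}} flag cannot parse a 
> host path containing a colon, so a colon-free symlink is mounted instead:
> {code}
> #include <filesystem>
> 
> namespace fs = std::filesystem;
> 
> int main()
> {
>   // Sandbox path containing a ':' (e.g. from an executor ID).
>   const fs::path sandbox = "/tmp/slaves/S1/frameworks/f/executors/e:1/runs/r1";
>   const fs::path link = "/tmp/mesos-docker-r1";  // colon-free alias
> 
>   fs::create_directories(sandbox);
>   fs::create_directory_symlink(sandbox, link);
>   // `link` is what gets passed to `docker run -v <link>:/mnt/mesos/sandbox`.
>   // After an agent restart, knowledge of `link` is lost -- hence the leak.
> }
> {code}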



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-8550) Bug in `Master::detected()` leads to coredump in `MasterZooKeeperTest.MasterInfoAddress`

2018-02-08 Thread Andrei Budnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrei Budnik reassigned MESOS-8550:


Assignee: Benno Evers

> Bug in `Master::detected()` leads to coredump in 
> `MasterZooKeeperTest.MasterInfoAddress`
> 
>
> Key: MESOS-8550
> URL: https://issues.apache.org/jira/browse/MESOS-8550
> Project: Mesos
>  Issue Type: Bug
>  Components: leader election, master
>Reporter: Andrei Budnik
>Assignee: Benno Evers
>Priority: Major
> Attachments: MasterZooKeeperTest.MasterInfoAddress-badrun.txt
>
>
> {code:java}
> 15:55:17 Assertion failed: (isSome()), function get, file 
> ../../3rdparty/stout/include/stout/option.hpp, line 119.
> 15:55:17 *** Aborted at 1518018924 (unix time) try "date -d @1518018924" if 
> you are using GNU date ***
> 15:55:17 PC: @ 0x7fff4f8f2e3e __pthread_kill
> 15:55:17 *** SIGABRT (@0x7fff4f8f2e3e) received by PID 39896 (TID 
> 0x70427000) stack trace: ***
> 15:55:17 @ 0x7fff4fa24f5a _sigtramp
> 15:55:17 I0207 07:55:24.945252 4890624 group.cpp:511] ZooKeeper session 
> expired
> 15:55:17 @ 0x70425500 (unknown)
> 15:55:17 2018-02-07 07:55:24,945:39896(0x70633000):ZOO_INFO@log_env@794: 
> Client 
> environment:user.dir=/private/var/folders/6w/rw03zh013y38ys6cyn8qppf8gn/T/1mHCvU
> 15:55:17 @ 0x7fff4f84f312 abort
> 15:55:17 2018-02-07 
> 07:55:24,945:39896(0x70633000):ZOO_INFO@zookeeper_init@827: Initiating 
> client connection, host=127.0.0.1:52197 sessionTimeout=1 
> watcher=0x10d916590 sessionId=0 sessionPasswd= context=0x7fe1bda706a0 
> flags=0
> 15:55:17 @ 0x7fff4f817368 __assert_rtn
> 15:55:17 @0x10b9cff97 _ZNR6OptionIN5mesos10MasterInfoEE3getEv
> 15:55:17 @0x10bbb04b5 Option<>::operator->()
> 15:55:17 @0x10bd4514a mesos::internal::master::Master::detected()
> 15:55:17 @0x10bf54558 
> _ZZN7process8dispatchIN5mesos8internal6master6MasterERKNS_6FutureI6OptionINS1_10MasterInfoSB_EEvRKNS_3PIDIT_EEMSD_FvT0_EOT1_ENKUlOS9_PNS_11ProcessBaseEE_clESM_SO_
> 15:55:17 @0x10bf54310 
> _ZN5cpp176invokeIZN7process8dispatchIN5mesos8internal6master6MasterERKNS1_6FutureI6OptionINS3_10MasterInfoSD_EEvRKNS1_3PIDIT_EEMSF_FvT0_EOT1_EUlOSB_PNS1_11ProcessBaseEE_JSB_SQ_EEEDTclclsr3stdE7forwardISF_Efp_Espclsr3stdE7forwardIT0_Efp0_EEEOSF_DpOSS_
> 15:55:17 @0x10bf542bb 
> _ZN6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master6MasterERKNS2_6FutureI6OptionINS4_10MasterInfoSE_EEvRKNS2_3PIDIT_EEMSG_FvT0_EOT1_EUlOSC_PNS2_11ProcessBaseEE_JSC_NSt3__112placeholders4__phILi1E13invoke_expandISS_NST_5tupleIJSC_SW_EEENSZ_IJOSR_EEEJLm0ELm1DTclsr5cpp17E6invokeclsr3stdE7forwardISG_Efp_Espcl6expandclsr3stdE3getIXT2_EEclsr3stdE7forwardISK_Efp0_EEclsr3stdE7forwardISN_Efp2_OSG_OSK_N5cpp1416integer_sequenceImJXspT2_SO_
> 15:55:17 @0x10bf541f3 
> _ZNO6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master6MasterERKNS2_6FutureI6OptionINS4_10MasterInfoSE_EEvRKNS2_3PIDIT_EEMSG_FvT0_EOT1_EUlOSC_PNS2_11ProcessBaseEE_JSC_NSt3__112placeholders4__phILi1EclIJSR_EEEDTcl13invoke_expandclL_ZNST_4moveIRSS_EEONST_16remove_referenceISG_E4typeEOSG_EdtdefpT1fEclL_ZNSZ_IRNST_5tupleIJSC_SW_ES14_S15_EdtdefpT10bound_argsEcvN5cpp1416integer_sequenceImJLm0ELm1_Eclsr3stdE16forward_as_tuplespclsr3stdE7forwardIT_Efp_DpOS1C_
> 15:55:17 @0x10bf540bd 
> _ZN5cpp176invokeIN6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master6MasterERKNS4_6FutureI6OptionINS6_10MasterInfoSG_EEvRKNS4_3PIDIT_EEMSI_FvT0_EOT1_EUlOSE_PNS4_11ProcessBaseEE_JSE_NSt3__112placeholders4__phILi1EEJST_EEEDTclclsr3stdE7forwardISI_Efp_Espclsr3stdE7forwardIT0_Efp0_EEEOSI_DpOS10_
> 15:55:17 @0x10bf54081 
> _ZN6lambda8internal6InvokeIvEclINS0_7PartialIZN7process8dispatchIN5mesos8internal6master6MasterERKNS5_6FutureI6OptionINS7_10MasterInfoSH_EEvRKNS5_3PIDIT_EEMSJ_FvT0_EOT1_EUlOSF_PNS5_11ProcessBaseEE_JSF_NSt3__112placeholders4__phILi1EEJSU_EEEvOSJ_DpOT0_
> 15:55:17 @0x10bf53e06 
> _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8dispatchIN5mesos8internal6master6MasterERKNS1_6FutureI6OptionINSA_10MasterInfoSK_EEvRKNS1_3PIDIT_EEMSM_FvT0_EOT1_EUlOSI_S3_E_JSI_NSt3__112placeholders4__phILi1EEEclEOS3_
> 15:55:17 @0x10ebf464f 
> _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEEclES3_
> 15:55:17 @0x10ebf44c4 process::ProcessBase::consume()
> 15:55:17 @0x10ec6f4d9 
> _ZNO7process13DispatchEvent7consumeEPNS_13EventConsumerE
> 15:55:17 @0x10b0b2389 process::ProcessBase::serve()
> 15:55:17 @0x10ebe 

[jira] [Issue Comment Deleted] (MESOS-8550) Bug in `Master::detected()` leads to coredump in `MasterZooKeeperTest.MasterInfoAddress`

2018-02-08 Thread Andrei Budnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrei Budnik updated MESOS-8550:
-
Comment: was deleted

(was: [https://reviews.apache.org/r/65571/])

> Bug in `Master::detected()` leads to coredump in 
> `MasterZooKeeperTest.MasterInfoAddress`
> 
>
> Key: MESOS-8550
> URL: https://issues.apache.org/jira/browse/MESOS-8550
> Project: Mesos
>  Issue Type: Bug
>  Components: leader election, master
>Reporter: Andrei Budnik
>Assignee: Benno Evers
>Priority: Major
> Attachments: MasterZooKeeperTest.MasterInfoAddress-badrun.txt
>
>
> {code:java}
> 15:55:17 Assertion failed: (isSome()), function get, file 
> ../../3rdparty/stout/include/stout/option.hpp, line 119.
> 15:55:17 *** Aborted at 1518018924 (unix time) try "date -d @1518018924" if 
> you are using GNU date ***
> 15:55:17 PC: @ 0x7fff4f8f2e3e __pthread_kill
> 15:55:17 *** SIGABRT (@0x7fff4f8f2e3e) received by PID 39896 (TID 
> 0x70427000) stack trace: ***
> 15:55:17 @ 0x7fff4fa24f5a _sigtramp
> 15:55:17 I0207 07:55:24.945252 4890624 group.cpp:511] ZooKeeper session 
> expired
> 15:55:17 @ 0x70425500 (unknown)
> 15:55:17 2018-02-07 07:55:24,945:39896(0x70633000):ZOO_INFO@log_env@794: 
> Client 
> environment:user.dir=/private/var/folders/6w/rw03zh013y38ys6cyn8qppf8gn/T/1mHCvU
> 15:55:17 @ 0x7fff4f84f312 abort
> 15:55:17 2018-02-07 
> 07:55:24,945:39896(0x70633000):ZOO_INFO@zookeeper_init@827: Initiating 
> client connection, host=127.0.0.1:52197 sessionTimeout=1 
> watcher=0x10d916590 sessionId=0 sessionPasswd= context=0x7fe1bda706a0 
> flags=0
> 15:55:17 @ 0x7fff4f817368 __assert_rtn
> 15:55:17 @0x10b9cff97 _ZNR6OptionIN5mesos10MasterInfoEE3getEv
> 15:55:17 @0x10bbb04b5 Option<>::operator->()
> 15:55:17 @0x10bd4514a mesos::internal::master::Master::detected()
> 15:55:17 @0x10bf54558 
> _ZZN7process8dispatchIN5mesos8internal6master6MasterERKNS_6FutureI6OptionINS1_10MasterInfoSB_EEvRKNS_3PIDIT_EEMSD_FvT0_EOT1_ENKUlOS9_PNS_11ProcessBaseEE_clESM_SO_
> 15:55:17 @0x10bf54310 
> _ZN5cpp176invokeIZN7process8dispatchIN5mesos8internal6master6MasterERKNS1_6FutureI6OptionINS3_10MasterInfoSD_EEvRKNS1_3PIDIT_EEMSF_FvT0_EOT1_EUlOSB_PNS1_11ProcessBaseEE_JSB_SQ_EEEDTclclsr3stdE7forwardISF_Efp_Espclsr3stdE7forwardIT0_Efp0_EEEOSF_DpOSS_
> 15:55:17 @0x10bf542bb 
> _ZN6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master6MasterERKNS2_6FutureI6OptionINS4_10MasterInfoSE_EEvRKNS2_3PIDIT_EEMSG_FvT0_EOT1_EUlOSC_PNS2_11ProcessBaseEE_JSC_NSt3__112placeholders4__phILi1E13invoke_expandISS_NST_5tupleIJSC_SW_EEENSZ_IJOSR_EEEJLm0ELm1DTclsr5cpp17E6invokeclsr3stdE7forwardISG_Efp_Espcl6expandclsr3stdE3getIXT2_EEclsr3stdE7forwardISK_Efp0_EEclsr3stdE7forwardISN_Efp2_OSG_OSK_N5cpp1416integer_sequenceImJXspT2_SO_
> 15:55:17 @0x10bf541f3 
> _ZNO6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master6MasterERKNS2_6FutureI6OptionINS4_10MasterInfoSE_EEvRKNS2_3PIDIT_EEMSG_FvT0_EOT1_EUlOSC_PNS2_11ProcessBaseEE_JSC_NSt3__112placeholders4__phILi1EclIJSR_EEEDTcl13invoke_expandclL_ZNST_4moveIRSS_EEONST_16remove_referenceISG_E4typeEOSG_EdtdefpT1fEclL_ZNSZ_IRNST_5tupleIJSC_SW_ES14_S15_EdtdefpT10bound_argsEcvN5cpp1416integer_sequenceImJLm0ELm1_Eclsr3stdE16forward_as_tuplespclsr3stdE7forwardIT_Efp_DpOS1C_
> 15:55:17 @0x10bf540bd 
> _ZN5cpp176invokeIN6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master6MasterERKNS4_6FutureI6OptionINS6_10MasterInfoSG_EEvRKNS4_3PIDIT_EEMSI_FvT0_EOT1_EUlOSE_PNS4_11ProcessBaseEE_JSE_NSt3__112placeholders4__phILi1EEJST_EEEDTclclsr3stdE7forwardISI_Efp_Espclsr3stdE7forwardIT0_Efp0_EEEOSI_DpOS10_
> 15:55:17 @0x10bf54081 
> _ZN6lambda8internal6InvokeIvEclINS0_7PartialIZN7process8dispatchIN5mesos8internal6master6MasterERKNS5_6FutureI6OptionINS7_10MasterInfoSH_EEvRKNS5_3PIDIT_EEMSJ_FvT0_EOT1_EUlOSF_PNS5_11ProcessBaseEE_JSF_NSt3__112placeholders4__phILi1EEJSU_EEEvOSJ_DpOT0_
> 15:55:17 @0x10bf53e06 
> _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8dispatchIN5mesos8internal6master6MasterERKNS1_6FutureI6OptionINSA_10MasterInfoSK_EEvRKNS1_3PIDIT_EEMSM_FvT0_EOT1_EUlOSI_S3_E_JSI_NSt3__112placeholders4__phILi1EEEclEOS3_
> 15:55:17 @0x10ebf464f 
> _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEEclES3_
> 15:55:17 @0x10ebf44c4 process::ProcessBase::consume()
> 15:55:17 @0x10ec6f4d9 
> _ZNO7process13DispatchEvent7consumeEPNS_13EventConsumerE
> 15:55:17 @0x10b0b2389 process::ProcessBase::serve()
> 

[jira] [Comment Edited] (MESOS-8463) Test MasterAllocatorTest/1.SingleFramework is flaky

2018-02-08 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357317#comment-16357317
 ] 

Till Toenshoff edited comment on MESOS-8463 at 2/8/18 6:00 PM:
---

This was fixed before on some of the allocator tests, but not all - I wonder 
why not all of them?

Note that the agent registers after a backoff delay.

1. pause the timer
2. advance the timer after the slave-startup by the authentication 
backoff-factor + registration backoff-factor
3. expect at least one {{addSlave}} on the allocator - but not exactly one - 
{{WillOnce(DoAll(InvokeAddSlave();}} could do here
(4.) satisfy a future during the {{addSlave}} invocation to allow explicit 
waiting for that event on the allocator -- this could ease the stress on later 
awaits and hence would increase the sturdiness

For the scheduler registration, we need to do similar things - however without 
advancing the timer, as we don't need to cover a backoff
2. expect at least one {{addFramework}} on the allocator - but again not 
exactly one
(3.) satisfy a future during the {{addFramework}}  invocation to allow explicit 
waiting for that event on the allocator


was (Author: tillt):
This was fixed before on some of the allocator tests, but not all - I wonder 
why not all of them?

Note that the agent registers after a backoff delay.

1. pause the timer
2. advance the timer after the slave-startup by the authentication 
backoff-factor + registration backoff-factor
3. expect at least one {{addSlave}} on the allocator - but not exactly one - 
{{WillOnce(DoAll(InvokeAddSlave();}} could do here
(4.) satisfy a future during the `addSlave` invocation to allow explicit 
waiting for that event on the allocator -- this could ease the stress on later 
awaits and hence would increase the sturdiness

For the scheduler registration, we need to do similar things - however without 
advancing the timer, as we don't need to cover a backoff
2. expect at least one {{addFramework}} on the allocator - but again not 
exactly one
(3.) satisfy a future during the `addFramework` invocation to allow explicit 
waiting for that event on the allocator -- this could ease the stress on later 
awaits and hence would increase the sturdiness



> Test MasterAllocatorTest/1.SingleFramework is flaky
> ---
>
> Key: MESOS-8463
> URL: https://issues.apache.org/jira/browse/MESOS-8463
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation, test
>Affects Versions: 1.5.0
>Reporter: Benjamin Bannier
>Assignee: Till Toenshoff
>Priority: Major
>  Labels: flaky-test
> Attachments: consoleText.txt
>
>
> Observed in our internal CI on a ubuntu-16 setup in a plain autotools build,
> {noformat}
> ../../src/tests/master_allocator_tests.cpp:175
> Mock function called more times than expected - taking default action 
> specified at:
> ../../src/tests/allocator.hpp:273:
> Function call: addSlave(@0x7fe8dc03d0e8 
> 1eb6ab2c-293d-4b99-b76b-87bd939a1a19-S1, @0x7fe8dc03d108 hostname: 
> "ip-172-16-10-65.ec2.internal"
> resources {
>   name: "cpus"
>   type: SCALAR
>   scalar {
> value: 2
>   }
> }
> resources {
>   name: "mem"
>   type: SCALAR
>   scalar {
> value: 1024
>   }
> }
> resources {
>   name: "ports"
>   type: RANGES
>   ranges {
> range {
>   begin: 31000
>   end: 32000
> }
>   }
> }
> id {
>   value: "1eb6ab2c-293d-4b99-b76b-87bd939a1a19-S1"
> }
> checkpoint: true
> port: 40262
> , @0x7fe8ffa276c0 { 32-byte object <48-94 7D-0E E9-7F 00-00 00-00 00-00 00-00 
> 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00>, 32-byte object <48-94 
> 7D-0E E9-7F 00-00 00-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 02-00 00-00 
> 00-00 00-00>, 32-byte object <48-94 7D-0E E9-7F 00-00 00-00 00-00 00-00 00-00 
> 01-00 00-00 00-00 00-00 03-00 00-00 73-79 73-74> }, @0x7fe8ffa27720 48-byte 
> object <01-00 00-00 E8-7F 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
> 00-00 08-7A A2-FF E8-7F 00-00 A0-32 24-7D 62-55 00-00 DE-3C 11-0A E9-7F 
> 00-00>, @0x7fe8dc03d4c8 { cpus:2, mem:1024, ports:[31000-32000] }, 
> @0x7fe8dc03d460 {})
>  Expected: to be called once
>Actual: called twice - over-saturated and active
> Stacktrace
> ../../src/tests/master_allocator_tests.cpp:175
> Mock function called more times than expected - taking default action 
> specified at:
> ../../src/tests/allocator.hpp:273:
> Function call: addSlave(@0x7fe8dc03d0e8 
> 1eb6ab2c-293d-4b99-b76b-87bd939a1a19-S1, @0x7fe8dc03d108 hostname: 
> "ip-172-16-10-65.ec2.internal"
> resources {
>   name: "cpus"
>   type: SCALAR
>   scalar {
> value: 2
>   }
> }
> resources {
>   name: "mem"
>   type: SCALAR
>   scalar {
> value: 1024
>   }
> }
> resources {
>   name: "ports"
>   type: RANGES
>   ranges {
> 

[jira] [Commented] (MESOS-5268) Cgroups CpushareIsolator doesn't take effect on SLES 11 SP2 SP3

2018-02-08 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357318#comment-16357318
 ] 

Jie Yu commented on MESOS-5268:
---

[~AndyPang] is this still an issue for you? Do you plan to work on that?

> Cgroups CpushareIsolator doesn't take effect on SLES 11 SP2 SP3
> ----------------------------------------------------------------
>
> Key: MESOS-5268
> URL: https://issues.apache.org/jira/browse/MESOS-5268
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 0.27.0
> Environment: suse 3.0.101-0.47.71-default #1 SMP Thu Nov 12 12:22:22 
> UTC 2015 (b5b212e) x86_64 x86_64 x86_64 GNU/Linux
>Reporter: AndyPang
>Assignee: AndyPang
>Priority: Major
>  Labels: cgroups
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Mesos runs on SLES 11 SP2/SP3 (kernel version 3.0.13/3.076), and the 
> cpushare isolator doesn't take effect. With two frameworks whose cpu.shares 
> proportion is 1:3, we find that the cpu.shares values inside the Mesos 
> containers are correct, but when we use "top" to observe the result, the CPU 
> usage is not 1:3. Our application is multithreaded and can fill its CPU 
> quota when run alone.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-7069) The linux filesystem isolator should set mode and ownership for host volumes.

2018-02-08 Thread Ilya Pronin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357316#comment-16357316
 ] 

Ilya Pronin commented on MESOS-7069:


[~jieyu] I believe this was fixed in https://reviews.apache.org/r/61122/. 
Closing this issue.

> The linux filesystem isolator should set mode and ownership for host volumes.
> -
>
> Key: MESOS-7069
> URL: https://issues.apache.org/jira/browse/MESOS-7069
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Ilya Pronin
>Priority: Major
>  Labels: filesystem, linux, volumes
>
> If the host path is a relative path, the linux filesystem isolator should set 
> the mode and ownership for this host volume, so that non-root users can 
> write to the volume. Note that this is the case of sharing the host 
> filesystem (without rootfs).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-8463) Test MasterAllocatorTest/1.SingleFramework is flaky

2018-02-08 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357317#comment-16357317
 ] 

Till Toenshoff commented on MESOS-8463:
---

This was fixed before on some of the allocator tests, but not all - I wonder 
why not all of them?

Note that the agent registers after a backoff delay.

1. pause the timer
2. advance the timer after the slave-startup by the authentication 
backoff-factor + registration backoff-factor
3. expect at least one {{addSlave}} on the allocator - but not exactly one - 
{{WillOnce(DoAll(InvokeAddSlave();}} could do here (see the sketch below)
(4.) satisfy a future during the `addSlave` invocation to allow explicit 
waiting for that event on the allocator -- this could ease the stress on later 
awaits and hence would increase the sturdiness

For the scheduler registration, we need to do similar things - however without 
advancing the timer, as we don't need to cover a backoff
2. expect at least one {{addFramework}} on the allocator - but again not 
exactly one
(3.) satisfy a future during the `addFramework` invocation to allow explicit 
waiting for that event on the allocator -- this could ease the stress on later 
awaits and hence would increase the sturdiness
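
A hedged sketch of that expectation (helper names like {{InvokeAddSlave}} and 
{{FutureSatisfy}} are taken from the Mesos test helpers; the exact 
{{addSlave}} arity may differ between versions):
{code}
Future<Nothing> addSlave;
EXPECT_CALL(allocator, addSlave(_, _, _, _, _, _))
  .WillOnce(DoAll(InvokeAddSlave(&allocator),
                  FutureSatisfy(&addSlave)))
  .WillRepeatedly(DoDefault());

// ... start the agent, advance the clock past the backoffs ...

AWAIT_READY(addSlave);  // wait explicitly instead of racing the expectation
{code}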



> Test MasterAllocatorTest/1.SingleFramework is flaky
> ---
>
> Key: MESOS-8463
> URL: https://issues.apache.org/jira/browse/MESOS-8463
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation, test
>Affects Versions: 1.5.0
>Reporter: Benjamin Bannier
>Assignee: Till Toenshoff
>Priority: Major
>  Labels: flaky-test
> Attachments: consoleText.txt
>
>
> Observed in our internal CI on a ubuntu-16 setup in a plain autotools build,
> {noformat}
> ../../src/tests/master_allocator_tests.cpp:175
> Mock function called more times than expected - taking default action 
> specified at:
> ../../src/tests/allocator.hpp:273:
> Function call: addSlave(@0x7fe8dc03d0e8 
> 1eb6ab2c-293d-4b99-b76b-87bd939a1a19-S1, @0x7fe8dc03d108 hostname: 
> "ip-172-16-10-65.ec2.internal"
> resources {
>   name: "cpus"
>   type: SCALAR
>   scalar {
> value: 2
>   }
> }
> resources {
>   name: "mem"
>   type: SCALAR
>   scalar {
> value: 1024
>   }
> }
> resources {
>   name: "ports"
>   type: RANGES
>   ranges {
> range {
>   begin: 31000
>   end: 32000
> }
>   }
> }
> id {
>   value: "1eb6ab2c-293d-4b99-b76b-87bd939a1a19-S1"
> }
> checkpoint: true
> port: 40262
> , @0x7fe8ffa276c0 { 32-byte object <48-94 7D-0E E9-7F 00-00 00-00 00-00 00-00 
> 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00>, 32-byte object <48-94 
> 7D-0E E9-7F 00-00 00-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 02-00 00-00 
> 00-00 00-00>, 32-byte object <48-94 7D-0E E9-7F 00-00 00-00 00-00 00-00 00-00 
> 01-00 00-00 00-00 00-00 03-00 00-00 73-79 73-74> }, @0x7fe8ffa27720 48-byte 
> object <01-00 00-00 E8-7F 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
> 00-00 08-7A A2-FF E8-7F 00-00 A0-32 24-7D 62-55 00-00 DE-3C 11-0A E9-7F 
> 00-00>, @0x7fe8dc03d4c8 { cpus:2, mem:1024, ports:[31000-32000] }, 
> @0x7fe8dc03d460 {})
>  Expected: to be called once
>Actual: called twice - over-saturated and active
> Stacktrace
> ../../src/tests/master_allocator_tests.cpp:175
> Mock function called more times than expected - taking default action 
> specified at:
> ../../src/tests/allocator.hpp:273:
> Function call: addSlave(@0x7fe8dc03d0e8 
> 1eb6ab2c-293d-4b99-b76b-87bd939a1a19-S1, @0x7fe8dc03d108 hostname: 
> "ip-172-16-10-65.ec2.internal"
> resources {
>   name: "cpus"
>   type: SCALAR
>   scalar {
> value: 2
>   }
> }
> resources {
>   name: "mem"
>   type: SCALAR
>   scalar {
> value: 1024
>   }
> }
> resources {
>   name: "ports"
>   type: RANGES
>   ranges {
> range {
>   begin: 31000
>   end: 32000
> }
>   }
> }
> id {
>   value: "1eb6ab2c-293d-4b99-b76b-87bd939a1a19-S1"
> }
> checkpoint: true
> port: 40262
> , @0x7fe8ffa276c0 { 32-byte object <48-94 7D-0E E9-7F 00-00 00-00 00-00 00-00 
> 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00>, 32-byte object <48-94 
> 7D-0E E9-7F 00-00 00-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 02-00 00-00 
> 00-00 00-00>, 32-byte object <48-94 7D-0E E9-7F 00-00 00-00 00-00 00-00 00-00 
> 01-00 00-00 00-00 00-00 03-00 00-00 73-79 73-74> }, @0x7fe8ffa27720 48-byte 
> object <01-00 00-00 E8-7F 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
> 00-00 08-7A A2-FF E8-7F 00-00 A0-32 24-7D 62-55 00-00 DE-3C 11-0A E9-7F 
> 00-00>, @0x7fe8dc03d4c8 { cpus:2, mem:1024, ports:[31000-32000] }, 
> @0x7fe8dc03d460 {})
>  Expected: to be called once
>Actual: called twice - over-saturated and active
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-6656) Nested containers can become unkillable

2018-02-08 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357313#comment-16357313
 ] 

Greg Mann commented on MESOS-6656:
--

This seems to have been resolved by MESOS-7858.

> Nested containers can become unkillable
> ---
>
> Key: MESOS-6656
> URL: https://issues.apache.org/jira/browse/MESOS-6656
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, containerization
>Reporter: Greg Mann
>Assignee: Greg Mann
>Priority: Major
>  Labels: nested
>
> An incident occurred recently in a cluster running a build of Mesos based on 
> commit {{757319357471227c0a1e906076eae8f9aa2fdbd6}} from master. A task group 
> of five tasks was launched via Marathon. After the tasks were launched, one 
> of the containers quickly exited and was successfully destroyed. A couple 
> minutes later, the task group was killed manually via Marathon, and the agent 
> can then be seen repeatedly attempting to kill the tasks for hours. No calls 
> to {{WAIT_NESTED_CONTAINER}} are visible in the agent logs, and the executor 
> logs do not indicate at any point that the nested containers were launched 
> successfully.
> Agent logs:
> {code}
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.890911  
> 6406 slave.cpp:1539] Got assigned task group containing tasks [ 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server1, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server2, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server3, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server4, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server5 ] for 
> framework ce4bd8be-1198-4819-81d4-9a8439439741-
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.892299  
> 6406 gc.cpp:83] Unscheduling 
> '/var/lib/mesos/slave/slaves/ce4bd8be-1198-4819-81d4-9a8439439741-S1/frameworks/ce4bd8be-1198-4819-81d4-9a8439439741-'
>  from gc
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.892379  
> 6406 gc.cpp:83] Unscheduling 
> '/var/lib/mesos/slave/meta/slaves/ce4bd8be-1198-4819-81d4-9a8439439741-S1/frameworks/ce4bd8be-1198-4819-81d4-9a8439439741-'
>  from gc
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.893131  
> 6405 slave.cpp:1701] Launching task group containing tasks [ 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server1, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server2, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server3, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server4, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server5 ] for 
> framework ce4bd8be-1198-4819-81d4-9a8439439741-
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.893435  
> 6405 paths.cpp:536] Trying to chown 
> '/var/lib/mesos/slave/slaves/ce4bd8be-1198-4819-81d4-9a8439439741-S1/frameworks/ce4bd8be-1198-4819-81d4-9a8439439741-/executors/instance-dat_scout.e57be1fe-b5e8-11e6-995b-70b3d581/runs/8750c2a7-8bef-4a69-8ef2-b873f884bf91'
>  to user 'root'
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.898026  
> 6405 slave.cpp:6179] Launching executor 
> 'instance-dat_scout.e57be1fe-b5e8-11e6-995b-70b3d581' of framework 
> ce4bd8be-1198-4819-81d4-9a8439439741- with resources cpus(*):0.1; 
> mem(*):32; disk(*):10; ports(*):[21421-21425] in work directory 
> '/var/lib/mesos/slave/slaves/ce4bd8be-1198-4819-81d4-9a8439439741-S1/frameworks/ce4bd8be-1198-4819-81d4-9a8439439741-/executors/instance-dat_scout.e57be1fe-b5e8-11e6-995b-70b3d581/runs/8750c2a7-8bef-4a69-8ef2-b873f884bf91'
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.898731  
> 6407 docker.cpp:1000] Skipping non-docker container
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.899050  
> 6407 containerizer.cpp:938] Starting container 
> 8750c2a7-8bef-4a69-8ef2-b873f884bf91 for executor 
> 'instance-dat_scout.e57be1fe-b5e8-11e6-995b-70b3d581' of framework 
> ce4bd8be-1198-4819-81d4-9a8439439741-
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.899909  
> 6405 slave.cpp:1987] Queued task group containing tasks [ 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server1, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server2, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server3, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server4, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server5 ] for 
> executor 'instance-dat_scout.e57be1fe-b5e8-11e6-995b-70b3d581' of 
> framework 

[jira] [Assigned] (MESOS-5754) CommandInfo.user not honored in docker containerizer

2018-02-08 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-5754:
-

Assignee: (was: Gilbert Song)

> CommandInfo.user not honored in docker containerizer
> 
>
> Key: MESOS-5754
> URL: https://issues.apache.org/jira/browse/MESOS-5754
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, containerization, docker
>Affects Versions: 1.0.0, 1.2.3, 1.3.1, 1.4.1, 1.5.0
>Reporter: Michael Gummelt
>Priority: Major
>  Labels: mesosphere
>
> Repro by creating a framework that starts a task with CommandInfo.user set, 
> and observing that the dockerized executor still runs as the default user 
> (e.g. root).
> cc [~kaysoky]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-5866) MESOS_DIRECTORY set to a host path when using a docker image w/ unified containerizer

2018-02-08 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357302#comment-16357302
 ] 

Jie Yu commented on MESOS-5866:
---

MESOS_SANDBOX should be the one you use; MESOS_DIRECTORY is deprecated. Closing 
this one, as it's documented already.

> MESOS_DIRECTORY set to a host path when using a docker image w/ unified 
> containerizer
> -
>
> Key: MESOS-5866
> URL: https://issues.apache.org/jira/browse/MESOS-5866
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 0.28.2
>Reporter: Michael Gummelt
>Priority: Major
>
> Running Spark with the unified containerizer, it fails with:
> {code}
> 16/07/19 21:03:09 INFO DAGScheduler: ResultStage 0 (reduce at 
> SparkPi.scala:36) failed in Unknown s due to Job aborted due to stage 
> failure: Task serialization failed: java.io.IOException: Failed to create 
> local dir in 
> /var/lib/mesos/slave/slaves/003ebcc2-64e2-488f-87b9-f6fa7630c01b-S0/frameworks/003ebcc2-64e2-488f-87b9-f6fa7630c01b-0001/executors/driver-20160719210109-0002/runs/8f21b32e-b929-4369-bce9-9f49a3a8844f/blockmgr-e3a611d4-e0de-48cb-b17a-1e41d97e84c2/11.
> {code}
> This is because MESOS_DIRECTORY is set to /var/lib/mesos/, which is a 
> host path.  The container can't see the host path.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-5953) Default work dir is not root for unified containerizer and docker

2018-02-08 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-5953:
-

Assignee: Gilbert Song

> Default work dir is not root for unified containerizer and docker
> -
>
> Key: MESOS-5953
> URL: https://issues.apache.org/jira/browse/MESOS-5953
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.0.0
>Reporter: Philip Winder
>Assignee: Gilbert Song
>Priority: Major
>
> According to the docker spec, the default working directory (WORKDIR) is root 
> /. https://docs.docker.com/engine/reference/run/#/workdir
> The unified containerizer with the docker runtime isolator sets the default 
> working directory to /tmp/mesos/sandbox.
> Hence, Dockerfiles that rely on the default workdir will not work because the 
> working directory is changed by Mesos.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-6340) Set HOME for Mesos tasks

2018-02-08 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-6340:
-

Assignee: Jie Yu

> Set HOME for Mesos tasks
> 
>
> Key: MESOS-6340
> URL: https://issues.apache.org/jira/browse/MESOS-6340
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, containerization
>Reporter: Cody Maloney
>Assignee: Jie Yu
>Priority: Major
>
> Quite a few programs assume {{$HOME}} points to a user-editable data file 
> directory.
> One example is Python, which tries to look up $HOME to find user-installed 
> packages; if that fails, it tries to look up the user in the passwd database, 
> which often goes badly (the container is running under the `nobody` user):
> {code}
> if i == 1:
> if 'HOME' not in os.environ:
> import pwd
> userhome = pwd.getpwuid(os.getuid()).pw_dir
> else:
> userhome = os.environ['HOME']
> {code}
> Just setting HOME to WORK_DIR by default would enable more software to work 
> correctly out of the box. Software that needs to change it (or schedulers with 
> specific preferences) should still be able to set it arbitrarily, and anything 
> a scheduler explicitly sets should override the default value of $WORK_DIR.
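
A minimal sketch of that defaulting rule (the function name and the environment 
representation are illustrative, not the actual agent code):

{code:cpp}
#include <map>
#include <string>

// Only set HOME when the scheduler (or anything else) has not already
// provided one; the work directory is merely a fallback.
void defaultHome(
    std::map<std::string, std::string>& environment,
    const std::string& workDirectory)
{
  if (environment.count("HOME") == 0) {
    environment["HOME"] = workDirectory;
  }
}
{code}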



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-6555) Namespace 'mnt' is not supported

2018-02-08 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-6555:
-

Assignee: James Peach

> Namespace 'mnt' is not supported
> 
>
> Key: MESOS-6555
> URL: https://issues.apache.org/jira/browse/MESOS-6555
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups, containerization
>Affects Versions: 1.0.0, 1.2.3, 1.3.1, 1.4.1, 1.5.0
> Environment: suse11 sp3,  kernel: 3.0.101-0.47.71-default #1 SMP Thu 
> Nov 12 12:22:22 UTC 2015 (b5b212e) x86_64 x86_64 x86_64 GNU/Linux 
>Reporter: AndyPang
>Assignee: James Peach
>Priority: Minor
>
> The same code runs fine on Debian with kernel version '4.1.0-0', but fails on 
> SUSE 11 SP3 with the error below.
> {code:title=mesos-execute|borderStyle=solid}
> ./mesos-execute --command="sleep 100" --master=:xxx  --name=sleep 
> --docker_image=ubuntu
> I1105 11:26:21.090703 194814 scheduler.cpp:172] Version: 1.0.0
> I1105 11:26:21.092821 194837 scheduler.cpp:461] New master detected at 
> master@:xxx
> Subscribed with ID 'fdb8546d-ca11-4a51-a297-8401e53b7692-'
> Submitted task 'sleep' to agent 'fdb8546d-ca11-4a51-a297-8401e53b7692-S0'
> Received status update TASK_FAILED for task 'sleep'
>   message: 'Failed to launch container: Collect failed: Failed to setup 
> hostname and network files: Failed to enter the mount namespace of pid 
> 194976: Namespace 'mnt' is not supported
> ; Executor terminated'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-6555) Namespace 'mnt' is not supported

2018-02-08 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357283#comment-16357283
 ] 

Jie Yu commented on MESOS-6555:
---

We should add a check for the required namespace support during agent startup.
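
A minimal sketch of what such a check could look like, assuming it is enough to 
probe /proc/self/ns for the namespaces the containerizer relies on; the helper 
and the exact set of namespaces are hypothetical:

{code:cpp}
#include <sys/stat.h>

#include <iostream>
#include <string>
#include <vector>

// Returns true if the running kernel exposes the given namespace type.
bool namespaceSupported(const std::string& ns)
{
  struct stat s;
  return ::stat(("/proc/self/ns/" + ns).c_str(), &s) == 0;
}

int main()
{
  // Namespaces the isolators in use would need.
  const std::vector<std::string> required = {"mnt", "pid", "net"};

  for (const std::string& ns : required) {
    if (!namespaceSupported(ns)) {
      std::cerr << "Namespace '" << ns << "' is not supported by this kernel"
                << std::endl;
      return 1; // An agent would refuse to start here.
    }
  }

  return 0;
}
{code}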

> Namespace 'mnt' is not supported
> 
>
> Key: MESOS-6555
> URL: https://issues.apache.org/jira/browse/MESOS-6555
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups, containerization
>Affects Versions: 1.0.0, 1.2.3, 1.3.1, 1.4.1, 1.5.0
> Environment: suse11 sp3,  kernel: 3.0.101-0.47.71-default #1 SMP Thu 
> Nov 12 12:22:22 UTC 2015 (b5b212e) x86_64 x86_64 x86_64 GNU/Linux 
>Reporter: AndyPang
>Priority: Major
>
> The same code runs fine on Debian with kernel version '4.1.0-0', but fails on 
> SUSE 11 SP3 with the error below.
> {code:title=mesos-execute|borderStyle=solid}
> ./mesos-execute --command="sleep 100" --master=:xxx  --name=sleep 
> --docker_image=ubuntu
> I1105 11:26:21.090703 194814 scheduler.cpp:172] Version: 1.0.0
> I1105 11:26:21.092821 194837 scheduler.cpp:461] New master detected at 
> master@:xxx
> Subscribed with ID 'fdb8546d-ca11-4a51-a297-8401e53b7692-'
> Submitted task 'sleep' to agent 'fdb8546d-ca11-4a51-a297-8401e53b7692-S0'
> Received status update TASK_FAILED for task 'sleep'
>   message: 'Failed to launch container: Collect failed: Failed to setup 
> hostname and network files: Failed to enter the mount namespace of pid 
> 194976: Namespace 'mnt' is not supported
> ; Executor terminated'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-6422) cgroups_tests not correctly tearing down testing hierarchies

2018-02-08 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357286#comment-16357286
 ] 

Jie Yu commented on MESOS-6422:
---

[~xujyan], do you plan to work on this one?

> cgroups_tests not correctly tearing down testing hierarchies
> 
>
> Key: MESOS-6422
> URL: https://issues.apache.org/jira/browse/MESOS-6422
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups, containerization
>Reporter: Yan Xu
>Assignee: Yan Xu
>Priority: Minor
>  Labels: cgroups
>
> We currently do the following in 
> [CgroupsTest::TearDownTestCase()|https://github.com/apache/mesos/blob/5e850a362edbf494921fedff4037cf4b53088c10/src/tests/containerizer/cgroups_tests.cpp#L83]
> {code:title=}
> static void TearDownTestCase()
> {
>   AWAIT_READY(cgroups::cleanup(TEST_CGROUPS_HIERARCHY));
> }
> {code}
> One of its derived test {{CgroupsNoHierarchyTest}} treats 
> {{TEST_CGROUPS_HIERARCHY}} as a hierarchy so it's able to clean it up as a 
> hierarchy.
> However another derived test {{CgroupsAnyHierarchyTest}} would create new 
> hierarchies (if none is available) using {{TEST_CGROUPS_HIERARCHY}} as a 
> parent directory (i.e., base hierarchy) and not as a hierarchy, so when it's 
> time to clean up, it fails:
> {noformat:title=}
> [   OK ] CgroupsAnyHierarchyTest.ROOT_CGROUPS_Subsystems (1 ms)
> ../../src/tests/containerizer/cgroups_tests.cpp:88: Failure
> (cgroups::cleanup(TEST_CGROUPS_HIERARCHY)).failure(): Operation not permitted
> {noformat}
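
A sketch of a teardown that would handle both derived tests, reusing the 
existing helpers {{cgroups::mounted()}}, {{cgroups::cleanup()}} and 
{{os::ls()}}; the control flow is illustrative, not a tested patch:

{code:cpp}
static void TearDownTestCase()
{
  Try<bool> mounted = cgroups::mounted(TEST_CGROUPS_HIERARCHY);

  if (mounted.isSome() && mounted.get()) {
    // CgroupsNoHierarchyTest: the directory is itself a hierarchy.
    AWAIT_READY(cgroups::cleanup(TEST_CGROUPS_HIERARCHY));
  } else {
    // CgroupsAnyHierarchyTest: the directory is only a base directory,
    // so clean up each hierarchy that was created underneath it.
    Try<std::list<std::string>> entries = os::ls(TEST_CGROUPS_HIERARCHY);
    ASSERT_SOME(entries);

    foreach (const std::string& entry, entries.get()) {
      AWAIT_READY(cgroups::cleanup(path::join(TEST_CGROUPS_HIERARCHY, entry)));
    }
  }
}
{code}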



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-6656) Nested containers can become unkillable

2018-02-08 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-6656:
-

Assignee: Jie Yu

> Nested containers can become unkillable
> ---
>
> Key: MESOS-6656
> URL: https://issues.apache.org/jira/browse/MESOS-6656
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, containerization
>Reporter: Greg Mann
>Assignee: Jie Yu
>Priority: Major
>  Labels: nested
>
> An incident occurred recently in a cluster running a build of Mesos based on 
> commit {{757319357471227c0a1e906076eae8f9aa2fdbd6}} from master. A task group 
> of five tasks was launched via Marathon. After the tasks were launched, one 
> of the containers quickly exited and was successfully destroyed. A couple 
> minutes later, the task group was killed manually via Marathon, and the agent 
> can then be seen repeatedly attempting to kill the tasks for hours. No calls 
> to {{WAIT_NESTED_CONTAINER}} are visible in the agent logs, and the executor 
> logs do not indicate at any point that the nested containers were launched 
> successfully.
> Agent logs:
> {code}
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.890911  
> 6406 slave.cpp:1539] Got assigned task group containing tasks [ 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server1, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server2, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server3, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server4, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server5 ] for 
> framework ce4bd8be-1198-4819-81d4-9a8439439741-
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.892299  
> 6406 gc.cpp:83] Unscheduling 
> '/var/lib/mesos/slave/slaves/ce4bd8be-1198-4819-81d4-9a8439439741-S1/frameworks/ce4bd8be-1198-4819-81d4-9a8439439741-'
>  from gc
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.892379  
> 6406 gc.cpp:83] Unscheduling 
> '/var/lib/mesos/slave/meta/slaves/ce4bd8be-1198-4819-81d4-9a8439439741-S1/frameworks/ce4bd8be-1198-4819-81d4-9a8439439741-'
>  from gc
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.893131  
> 6405 slave.cpp:1701] Launching task group containing tasks [ 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server1, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server2, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server3, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server4, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server5 ] for 
> framework ce4bd8be-1198-4819-81d4-9a8439439741-
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.893435  
> 6405 paths.cpp:536] Trying to chown 
> '/var/lib/mesos/slave/slaves/ce4bd8be-1198-4819-81d4-9a8439439741-S1/frameworks/ce4bd8be-1198-4819-81d4-9a8439439741-/executors/instance-dat_scout.e57be1fe-b5e8-11e6-995b-70b3d581/runs/8750c2a7-8bef-4a69-8ef2-b873f884bf91'
>  to user 'root'
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.898026  
> 6405 slave.cpp:6179] Launching executor 
> 'instance-dat_scout.e57be1fe-b5e8-11e6-995b-70b3d581' of framework 
> ce4bd8be-1198-4819-81d4-9a8439439741- with resources cpus(*):0.1; 
> mem(*):32; disk(*):10; ports(*):[21421-21425] in work directory 
> '/var/lib/mesos/slave/slaves/ce4bd8be-1198-4819-81d4-9a8439439741-S1/frameworks/ce4bd8be-1198-4819-81d4-9a8439439741-/executors/instance-dat_scout.e57be1fe-b5e8-11e6-995b-70b3d581/runs/8750c2a7-8bef-4a69-8ef2-b873f884bf91'
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.898731  
> 6407 docker.cpp:1000] Skipping non-docker container
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.899050  
> 6407 containerizer.cpp:938] Starting container 
> 8750c2a7-8bef-4a69-8ef2-b873f884bf91 for executor 
> 'instance-dat_scout.e57be1fe-b5e8-11e6-995b-70b3d581' of framework 
> ce4bd8be-1198-4819-81d4-9a8439439741-
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.899909  
> 6405 slave.cpp:1987] Queued task group containing tasks [ 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server1, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server2, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server3, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server4, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server5 ] for 
> executor 'instance-dat_scout.e57be1fe-b5e8-11e6-995b-70b3d581' of 
> framework ce4bd8be-1198-4819-81d4-9a8439439741-
> Nov 29 04:04:16 

[jira] [Assigned] (MESOS-6656) Nested containers can become unkillable

2018-02-08 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-6656:
-

Assignee: Greg Mann  (was: Jie Yu)

> Nested containers can become unkillable
> ---
>
> Key: MESOS-6656
> URL: https://issues.apache.org/jira/browse/MESOS-6656
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, containerization
>Reporter: Greg Mann
>Assignee: Greg Mann
>Priority: Major
>  Labels: nested
>
> An incident occurred recently in a cluster running a build of Mesos based on 
> commit {{757319357471227c0a1e906076eae8f9aa2fdbd6}} from master. A task group 
> of five tasks was launched via Marathon. After the tasks were launched, one 
> of the containers quickly exited and was successfully destroyed. A couple 
> minutes later, the task group was killed manually via Marathon, and the agent 
> can then be seen repeatedly attempting to kill the tasks for hours. No calls 
> to {{WAIT_NESTED_CONTAINER}} are visible in the agent logs, and the executor 
> logs do not indicate at any point that the nested containers were launched 
> successfully.
> Agent logs:
> {code}
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.890911  
> 6406 slave.cpp:1539] Got assigned task group containing tasks [ 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server1, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server2, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server3, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server4, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server5 ] for 
> framework ce4bd8be-1198-4819-81d4-9a8439439741-
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.892299  
> 6406 gc.cpp:83] Unscheduling 
> '/var/lib/mesos/slave/slaves/ce4bd8be-1198-4819-81d4-9a8439439741-S1/frameworks/ce4bd8be-1198-4819-81d4-9a8439439741-'
>  from gc
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.892379  
> 6406 gc.cpp:83] Unscheduling 
> '/var/lib/mesos/slave/meta/slaves/ce4bd8be-1198-4819-81d4-9a8439439741-S1/frameworks/ce4bd8be-1198-4819-81d4-9a8439439741-'
>  from gc
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.893131  
> 6405 slave.cpp:1701] Launching task group containing tasks [ 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server1, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server2, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server3, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server4, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server5 ] for 
> framework ce4bd8be-1198-4819-81d4-9a8439439741-
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.893435  
> 6405 paths.cpp:536] Trying to chown 
> '/var/lib/mesos/slave/slaves/ce4bd8be-1198-4819-81d4-9a8439439741-S1/frameworks/ce4bd8be-1198-4819-81d4-9a8439439741-/executors/instance-dat_scout.e57be1fe-b5e8-11e6-995b-70b3d581/runs/8750c2a7-8bef-4a69-8ef2-b873f884bf91'
>  to user 'root'
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.898026  
> 6405 slave.cpp:6179] Launching executor 
> 'instance-dat_scout.e57be1fe-b5e8-11e6-995b-70b3d581' of framework 
> ce4bd8be-1198-4819-81d4-9a8439439741- with resources cpus(*):0.1; 
> mem(*):32; disk(*):10; ports(*):[21421-21425] in work directory 
> '/var/lib/mesos/slave/slaves/ce4bd8be-1198-4819-81d4-9a8439439741-S1/frameworks/ce4bd8be-1198-4819-81d4-9a8439439741-/executors/instance-dat_scout.e57be1fe-b5e8-11e6-995b-70b3d581/runs/8750c2a7-8bef-4a69-8ef2-b873f884bf91'
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.898731  
> 6407 docker.cpp:1000] Skipping non-docker container
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.899050  
> 6407 containerizer.cpp:938] Starting container 
> 8750c2a7-8bef-4a69-8ef2-b873f884bf91 for executor 
> 'instance-dat_scout.e57be1fe-b5e8-11e6-995b-70b3d581' of framework 
> ce4bd8be-1198-4819-81d4-9a8439439741-
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.899909  
> 6405 slave.cpp:1987] Queued task group containing tasks [ 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server1, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server2, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server3, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server4, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server5 ] for 
> executor 'instance-dat_scout.e57be1fe-b5e8-11e6-995b-70b3d581' of 
> framework ce4bd8be-1198-4819-81d4-9a8439439741-
> Nov 29 

[jira] [Assigned] (MESOS-6798) Volumes in `/dev/shm` overridden by mesos containerizer

2018-02-08 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-6798:
-

Assignee: Jason Lai

> Volumes in `/dev/shm` overridden by mesos containerizer
> ---
>
> Key: MESOS-6798
> URL: https://issues.apache.org/jira/browse/MESOS-6798
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.1.1, 1.2.0, 1.3.1, 1.4.1, 1.5.0
>Reporter: Zhongbo Tian
>Assignee: Jason Lai
>Priority: Major
>
> When mounting a volume into `/dev/shm`, the volume is overridden by the 
> default mount point.
> For example:
> {code}
> mesos-execute --master=mesos-master --name=test --docker_image=busybox 
> --volumes='[{"container_path":"/tmp/hosts", "host_path":"/etc/hosts", 
> "mode":"RO"}]' --command="cat /tmp/hosts"
> {code}
> This fails with a "No such file or directory" error.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-6874) Agent silently ignores FS isolation when protobuf is malformed

2018-02-08 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-6874:
-

Assignee: (was: Gilbert Song)

> Agent silently ignores FS isolation when protobuf is malformed
> --
>
> Key: MESOS-6874
> URL: https://issues.apache.org/jira/browse/MESOS-6874
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.1.0
>Reporter: Michael Gummelt
>Priority: Minor
>  Labels: newbie
>
> cc [~vinodkone]
> I accidentally set my Mesos ContainerInfo to include a DockerInfo instead of 
> a MesosInfo:
> {code}
> executorInfoBuilder.setContainer(
>  Protos.ContainerInfo.newBuilder()
>  .setType(Protos.ContainerInfo.Type.MESOS)
>  .setDocker(Protos.ContainerInfo.DockerInfo.newBuilder()
>  
> .setImage(podSpec.getContainer().get().getImageName()))
> {code}
> I would have expected a validation error before or during containerization, 
> but instead, the agent silently decided to ignore filesystem isolation 
> altogether, and launch my executor on the host filesystem. 
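
A validation along the lines below would catch the mismatch; 
{{validateContainerInfo}} is hypothetical, while the protobuf accessors are the 
real generated API:

{code:cpp}
#include <string>

#include <mesos/mesos.pb.h>

#include <stout/option.hpp>

// Returns an error message if the ContainerInfo is inconsistent.
Option<std::string> validateContainerInfo(
    const mesos::ContainerInfo& container)
{
  // A MESOS-type container must not carry a DockerInfo; silently
  // ignoring the mismatched field hides the user's intent.
  if (container.type() == mesos::ContainerInfo::MESOS &&
      container.has_docker()) {
    return std::string(
        "ContainerInfo of type MESOS must not set the 'docker' field; "
        "did you mean to set 'mesos' with an image?");
  }

  if (container.type() == mesos::ContainerInfo::DOCKER &&
      !container.has_docker()) {
    return std::string(
        "ContainerInfo of type DOCKER requires the 'docker' field");
  }

  return None();
}
{code}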



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-7069) The linux filesystem isolator should set mode and ownership for host volumes.

2018-02-08 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357272#comment-16357272
 ] 

Jie Yu commented on MESOS-7069:
---

[~ipronin] is this still an issue? Or we can close this one?

> The linux filesystem isolator should set mode and ownership for host volumes.
> -
>
> Key: MESOS-7069
> URL: https://issues.apache.org/jira/browse/MESOS-7069
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Ilya Pronin
>Priority: Major
>  Labels: filesystem, linux, volumes
>
> If the host path is a relative path, the linux filesystem isolator should set 
> the mode and ownership for this host volume so that non-root users can write 
> to the volume. Note that this is the case of sharing the host filesystem 
> (without rootfs).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-7450) Docker containerizer will leak dangling symlinks if restarted with a colon in the sandbox path

2018-02-08 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357265#comment-16357265
 ] 

Jie Yu commented on MESOS-7450:
---

[~kaysoky] is this still an issue?

> Docker containerizer will leak dangling symlinks if restarted with a colon in 
> the sandbox path
> --
>
> Key: MESOS-7450
> URL: https://issues.apache.org/jira/browse/MESOS-7450
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.21.0, 1.2.0
>Reporter: Joseph Wu
>Priority: Major
>  Labels: mesosphere
>
> The Docker CLI has a limitation, which was worked around in MESOS-1833.
> TL;DR: If you launch a container with a colon ({{:}}) in the sandbox path, we 
> will create a symlink to that path and mount that symlink into the Docker 
> container.
> However, when you restart the Mesos agent after launching a container like 
> the above, the Docker containerizer will "forget" about the symlink and 
> thereby not clean it up when the container exits.  We will still GC the 
> actual sandbox, but not the symlink.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-7543) Allow isolators to specify secret environment

2018-02-08 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357262#comment-16357262
 ] 

Jie Yu commented on MESOS-7543:
---

[~karya], what's this ticket about? Re-open if this is still valid.

> Allow isolators to specify secret environment
> -
>
> Key: MESOS-7543
> URL: https://issues.apache.org/jira/browse/MESOS-7543
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, security
>Reporter: Kapil Arya
>Priority: Major
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-7599) Mesos Containerizer Cannot Pull from Certain Registries

2018-02-08 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-7599:
-

Assignee: Gilbert Song

> Mesos Containerizer Cannot Pull from Certain Registries 
> 
>
> Key: MESOS-7599
> URL: https://issues.apache.org/jira/browse/MESOS-7599
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.2.0
>Reporter: Max Ehrlich
>Assignee: Gilbert Song
>Priority: Major
>
> I have a docker image that is on a registry hosted by gitlab. When I try to 
> start this container using the Mesos containerizer, it is never scheduled. I 
> have a feeling this is from the unusual name for the image that gitlab uses, 
> but I haven't had time to look into the code yet. I have also tried this with 
> a private gitlab instance that is password protected and I have a similar 
> issue (there also seems to be an unrelated issue that the Mesos containerizer 
> doesn't support password protected registries).
> Example image names are as follows
> * registry.gitlab.com/queuecumber/page/excon (public image)
> * gitlab..com:5005/sri/registry/baseline_combo01 (private, password 
> protected)
> The images seem to work using the Docker containerizer, and again I suspect 
> this is related to those long names with lots of '/' in them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-7617) UCR cannot read docker images containing long file paths

2018-02-08 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-7617:
-

Assignee: Chun-Hung Hsiao

> UCR cannot read docker images containing long file paths
> 
>
> Key: MESOS-7617
> URL: https://issues.apache.org/jira/browse/MESOS-7617
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.1.2, 1.2.0, 1.3.0, 1.3.1
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>Priority: Major
>  Labels: containerizer, triaged
>
> The latest Docker uses go 1.7.5 
> (https://github.com/moby/moby/blob/master/CHANGELOG.md#contrib-1), in which 
> the {{archive/tar}} package has a bug that cannot handle file paths longer 
> than 100 characters (https://github.com/golang/go/issues/17630). As a result, 
> Docker will generate images containing ill-formed tar files (details below) 
> when there are long paths. Docker itself understands the ill-formed image 
> fine, but a standard tar program will interpret the image as if all files 
> with long paths are placed under the root directory 
> (https://github.com/moby/moby/issues/29360).
> This bug has been fixed in go 1.8, but since Docker is still using the bugged 
> version, we might need to handle these ill-formed images created by Docker 
> utilities.
> NOTE: It is confirmed that the {{archive/tar}} package in go 1.8 cannot 
> correctly extract the ill-formed tar files, but the one in go 1.7.5 could.
> Details: the {{archive/tar}} package uses {{USTAR}} format to handle files 
> with 100+-character-long paths (by only putting file name in the {{name}} 
> field and the path in the {{prefix}} field in the tar header), but uses 
> {{OLDGNU}}'s magic string, which does not understand the {{prefix}} field, so 
> a standard tar program will extract such files under the current directory.
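
A minimal sketch of a workaround, assuming the standard USTAR header layout 
(name at offset 0, magic at offset 257, prefix at offset 345): honor a 
non-empty {{prefix}} field even when the magic string is the OLDGNU one that 
normally implies the field is unused:

{code:cpp}
#include <cstring>

#include <string>

std::string entryPath(const char (&header)[512])
{
  auto field = [&](size_t offset, size_t size) {
    // Tar fields are NUL-terminated unless they fill the field entirely.
    size_t length = ::strnlen(header + offset, size);
    return std::string(header + offset, length);
  };

  const std::string name = field(0, 100);
  const std::string prefix = field(345, 155);

  // A strict reader would only consult 'prefix' when the magic at offset
  // 257 is "ustar\0"; here we use it whenever it is non-empty.
  return prefix.empty() ? name : prefix + "/" + name;
}
{code}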



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-7645) Support RO mode for bind mount volumes with filesystem/linux isolator

2018-02-08 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-7645:
-

Assignee: Jie Yu  (was: Charles Raimbert)

> Support RO mode for bind mount volumes with filesystem/linux isolator
> -
>
> Key: MESOS-7645
> URL: https://issues.apache.org/jira/browse/MESOS-7645
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.3.1, 1.4.1
>Reporter: Charles Raimbert
>Assignee: Jie Yu
>Priority: Major
>  Labels: storage
>
> The filesystem/linux isolator currently creates all bind mount volumes as RW, 
> even if a volume mode is set as RO.
> The TODO in the isolator code helps to spot the missing capability:
> https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/isolators/filesystem/linux.cpp#L587
> {code}
> // TODO(jieyu): Consider the mode in the volume.
> {code}
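
For reference, a minimal sketch of what honoring the RO mode takes on Linux: a 
bind mount ignores MS_RDONLY on the initial mount, so a second remount pass is 
required (error handling reduced to a bool for brevity):

{code:cpp}
#include <sys/mount.h>

#include <string>

bool bindMount(
    const std::string& source,
    const std::string& target,
    bool readOnly)
{
  if (::mount(source.c_str(), target.c_str(), nullptr, MS_BIND, nullptr) != 0) {
    return false;
  }

  if (readOnly) {
    // The read-only flag only takes effect on a remount of the bind mount.
    if (::mount(nullptr, target.c_str(), nullptr,
                MS_BIND | MS_REMOUNT | MS_RDONLY, nullptr) != 0) {
      return false;
    }
  }

  return true;
}
{code}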



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-7685) Issue using S3FS from docker container with the mesos containerizer

2018-02-08 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357255#comment-16357255
 ] 

Jie Yu commented on MESOS-7685:
---

If you are using the cgroups devices isolator, /dev/fuse won't be accessible.

> Issue using S3FS from docker container with the mesos containerizer
> ---
>
> Key: MESOS-7685
> URL: https://issues.apache.org/jira/browse/MESOS-7685
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 1.1.0
>Reporter: Andrei Filip
>Priority: Major
>
> I have a docker image which uses S3FS to mount an Amazon S3 bucket for use as 
> a local filesystem. Playing around with this container manually, using 
> docker, I am able to use S3FS as expected.
> When trying to use this image with the mesos containerizer, I get the 
> following error:
> fuse: device not found, try 'modprobe fuse' first
> The way I'm launching a job that runs this s3fs command is via the aurora 
> scheduler. Somehow it seems that docker is able to use the fuse kernel 
> plugin, but the mesos containerizer does not.
> I've also created a stackoverflow topic about this issue here: 
> https://stackoverflow.com/questions/44569238/using-s3fs-in-a-docker-container-ran-by-the-mesos-containerizer/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-7685) Issue using S3FS from docker container with the mesos containerizer

2018-02-08 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-7685:
-

Assignee: Jie Yu

> Issue using S3FS from docker container with the mesos containerizer
> ---
>
> Key: MESOS-7685
> URL: https://issues.apache.org/jira/browse/MESOS-7685
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 1.1.0
>Reporter: Andrei Filip
>Assignee: Jie Yu
>Priority: Major
>
> I have a docker image which uses S3FS to mount an Amazon S3 bucket for use as 
> a local filesystem. Playing around with this container manually, using 
> docker, I am able to use S3FS as expected.
> When trying to use this image with the mesos containerizer, I get the 
> following error:
> fuse: device not found, try 'modprobe fuse' first
> The way I'm launching a job that runs this s3fs command is via the aurora 
> scheduler. Somehow it seems that docker is able to use the fuse kernel 
> plugin, but the mesos containerizer does not.
> I've also created a stackoverflow topic about this issue here: 
> https://stackoverflow.com/questions/44569238/using-s3fs-in-a-docker-container-ran-by-the-mesos-containerizer/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-8105) Docker containerizer fails with "Unable to get executor pid after launch"

2018-02-08 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-8105:
-

Assignee: (was: Jie Yu)

> Docker containerizer fails with "Unable to get executor pid after launch"
> -
>
> Key: MESOS-8105
> URL: https://issues.apache.org/jira/browse/MESOS-8105
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: maybob
>Priority: Major
>  Labels: docker
>
> When running many commands at the same time, each using the same executor 
> with a different executorId via Docker, some executors fail with the error 
> "Unable to get executor pid after launch". 
> One possible reason is that "docker inspect" hangs or exits 0 with pid 0. 
> Another is that many Docker containers consume too many resources, e.g. file 
> descriptors.
> {color:red}Log:{color}
> {code:java}
> I1012 16:15:01.003931 124081 slave.cpp:1619] Got assigned task '920860' for 
> framework framework-id-daily
> I1012 16:15:01.006091 124081 slave.cpp:1900] Authorizing task '920860' for 
> framework framework-id-daily
> I1012 16:15:01.008281 124081 slave.cpp:2087] Launching task '920860' for 
> framework framework-id-daily
> I1012 16:15:01.008779 124081 paths.cpp:573] Trying to chown 
> '/volumes/sdb1/mesos/slaves/89192f68-d28f-498c-808f-442a1ef576b3-S2/frameworks/framework-id-daily/executors/Executor_920860/runs/29c82b61-1242-4de9-80cf-16f46c30e7e3'
>  to user 'maybob'
> I1012 16:15:01.009027 124081 slave.cpp:7401] Checkpointing ExecutorInfo to 
> '/volumes/sdb1/mesos/meta/slaves/89192f68-d28f-498c-808f-442a1ef576b3-S2/frameworks/framework-id-daily/executors/Executor_920860/executor.info'
> I1012 16:15:01.009546 124081 slave.cpp:7038] Launching executor 
> 'Executor_920860' of framework framework-id-daily with resources {} in work 
> directory 
> '/volumes/sdb1/mesos/slaves/89192f68-d28f-498c-808f-442a1ef576b3-S2/frameworks/framework-id-daily/executors/Executor_920860/runs/29c82b61-1242-4de9-80cf-16f46c30e7e3'
> I1012 16:15:01.010339 124081 slave.cpp:7429] Checkpointing TaskInfo to 
> '/volumes/sdb1/mesos/meta/slaves/89192f68-d28f-498c-808f-442a1ef576b3-S2/frameworks/framework-id-daily/executors/Executor_920860/runs/29c82b61-1242-4de9-80cf-16f46c30e7e3/tasks/920860/task.info'
> I1012 16:15:01.010726 124081 slave.cpp:2316] Queued task '920860' for 
> executor 'Executor_920860' of framework framework-id-daily
> I1012 16:15:01.011740 124088 docker.cpp:1175] Starting container 
> '29c82b61-1242-4de9-80cf-16f46c30e7e3' for executor 'Executor_920860' and 
> framework framework-id-daily
> I1012 16:15:01.013123 124081 slave.cpp:877] Successfully attached file 
> '/volumes/sdb1/mesos/slaves/89192f68-d28f-498c-808f-442a1ef576b3-S2/frameworks/framework-id-daily/executors/Executor_920860/runs/29c82b61-1242-4de9-80cf-16f46c30e7e3'
> I1012 16:15:01.013290 124080 fetcher.cpp:353] Starting to fetch URIs for 
> container: 29c82b61-1242-4de9-80cf-16f46c30e7e3, directory: 
> /volumes/sdb1/mesos/slaves/89192f68-d28f-498c-808f-442a1ef576b3-S2/frameworks/framework-id-daily/executors/Executor_920860/runs/29c82b61-1242-4de9-80cf-16f46c30e7e3
> I1012 16:15:01.706429 124071 docker.cpp:909] Running docker -H 
> unix:///var/run/docker.sock run --cpu-shares 378 --memory 427819008 -e 
> LIBPROCESS_PORT=0 -e MESOS_AGENT_ENDPOINT=xxx.xxx.xxx.xxx:5051 -e 
> MESOS_CHECKPOINT=1 -e 
> MESOS_CONTAINER_NAME=mesos-89192f68-d28f-498c-808f-442a1ef576b3-S2.29c82b61-1242-4de9-80cf-16f46c30e7e3
>  -e 
> MESOS_DIRECTORY=/volumes/sdb1/mesos/slaves/89192f68-d28f-498c-808f-442a1ef576b3-S2/frameworks/framework-id-daily/executors/Executor_920860/runs/29c82b61-1242-4de9-80cf-16f46c30e7e3
>  -e MESOS_EXECUTOR_ID=Executor_920860 -e 
> MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD=5secs -e 
> MESOS_FRAMEWORK_ID=framework-id-daily -e MESOS_HTTP_COMMAND_EXECUTOR=0 -e 
> MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos-1.3.1.so -e 
> MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos-1.3.1.so -e 
> MESOS_RECOVERY_TIMEOUT=15mins -e MESOS_SANDBOX=/mnt/mesos/sandbox -e 
> MESOS_SLAVE_ID=89192f68-d28f-498c-808f-442a1ef576b3-S2 -e 
> MESOS_SLAVE_PID=slave(1)@xxx.xxx.xxx.xxx:5051 -e 
> MESOS_SUBSCRIPTION_BACKOFF_MAX=2secs -v 
> /volumes/sdb1/mesos/slaves/89192f68-d28f-498c-808f-442a1ef576b3-S2/frameworks/framework-id-daily/executors/Executor_920860/runs/29c82b61-1242-4de9-80cf-16f46c30e7e3:/mnt/mesos/sandbox
>  --net host --entrypoint /bin/sh --name 
> mesos-89192f68-d28f-498c-808f-442a1ef576b3-S2.29c82b61-1242-4de9-80cf-16f46c30e7e3
>  reg.docker.xxx/xx/executor:v25 -c env && cd $MESOS_SANDBOX && 
> ./executor.sh
> I1012 16:15:01.717859 124071 docker.cpp:1071] Running docker -H 
> unix:///var/run/docker.sock inspect 
> mesos-89192f68-d28f-498c-808f-442a1ef576b3-S2.29c82b61-1242-4de9-80cf-16f46c30e7e3
> I1012 16:15:02.033951 

[jira] [Assigned] (MESOS-8158) Mesos Agent in docker neglects to retry discovering Task docker containers

2018-02-08 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-8158:
-

Assignee: (was: Gilbert Song)

> Mesos Agent in docker neglects to retry discovering Task docker containers
> --
>
> Key: MESOS-8158
> URL: https://issues.apache.org/jira/browse/MESOS-8158
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, containerization, docker, executor
>Affects Versions: 1.4.0
> Environment: Windows 10 with Docker version 17.09.0-ce, build afdb6d4
>Reporter: Charles Allen
>Priority: Major
>
> I have attempted to launch Mesos agents inside a docker container in such a 
> way that the agent container can be replaced and recovered. Unfortunately I 
> hit a major snag in the way the mesos docker launching works.
> To test basic functionality, a marathon app is set up with the following 
> command: {{date && python -m SimpleHTTPServer $PORT0}}
> That way the HTTP port can be accessed to assure things are being assigned 
> correctly, and the date is printed out in the log.
> When I attempt to start this marathon app, the mesos agent (inside a docker 
> container) properly launches an executor which properly creates a second task 
> that launches the python code. Here's the output from the executor logs (this 
> looks correct):
> {code}
> I1101 20:34:03.420210 68270 exec.cpp:162] Version: 1.4.0
> I1101 20:34:03.427455 68281 exec.cpp:237] Executor registered on agent 
> d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0
> I1101 20:34:03.428414 68283 executor.cpp:120] Registered docker executor on 
> 10.0.75.2
> I1101 20:34:03.428680 68281 executor.cpp:160] Starting task 
> testapp.fe35282f-bf43-11e7-a24b-0242ac110002
> I1101 20:34:03.428941 68281 docker.cpp:1080] Running docker -H 
> unix:///var/run/docker.sock run --cpu-shares 1024 --memory 134217728 -e 
> HOST=10.0.75.2 -e MARATHON_APP_DOCKER_IMAGE=python:2 -e 
> MARATHON_APP_ID=/testapp -e MARATHON_APP_LABELS= -e MARATHON_APP_RESOURCE_CPUS
> =1.0 -e MARATHON_APP_RESOURCE_DISK=0.0 -e MARATHON_APP_RESOURCE_GPUS=0 -e 
> MARATHON_APP_RESOURCE_MEM=128.0 -e 
> MARATHON_APP_VERSION=2017-11-01T20:33:44.869Z -e 
> MESOS_CONTAINER_NAME=mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75 -e 
> MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_TA
> SK_ID=testapp.fe35282f-bf43-11e7-a24b-0242ac110002 -e PORT=31464 -e 
> PORT0=31464 -e PORTS=31464 -e PORT_1=31464 -e PORT_HTTP=31464 -v 
> /var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp
> .fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75:/mnt/mesos/sandbox
>  --net host --entrypoint /bin/sh --name 
> mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75 
> --label=MESOS_TASK_ID=testapp.fe35282f-bf43-11e7-a24b-0242ac110002 python:2 
> -c date && p
> ython -m SimpleHTTPServer $PORT0
> I1101 20:34:03.430402 68281 docker.cpp:1243] Running docker -H 
> unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> I1101 20:34:03.520303 68286 docker.cpp:1290] Retrying inspect with non-zero 
> status code. cmd: 'docker -H unix:///var/run/docker.sock inspect 
> mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75', interval: 500ms
> I1101 20:34:04.021216 68288 docker.cpp:1243] Running docker -H 
> unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> I1101 20:34:04.124490 68281 docker.cpp:1290] Retrying inspect with non-zero 
> status code. cmd: 'docker -H unix:///var/run/docker.sock inspect 
> mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75', interval: 500ms
> I1101 20:34:04.624964 68288 docker.cpp:1243] Running docker -H 
> unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> I1101 20:34:04.934087 68286 docker.cpp:1345] Retrying inspect since container 
> not yet started. cmd: 'docker -H unix:///var/run/docker.sock inspect 
> mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75', interval: 500ms
> I1101 20:34:05.435145 68288 docker.cpp:1243] Running docker -H 
> unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> Wed Nov  1 20:34:06 UTC 2017
> {code}
> But, somehow there is a TASK_FAILED message sent to marathon.
> Upon further investigation, the following snippet can be found in the agent 
> logs (running in a docker container)
> {code}
> I1101 20:34:00.949129 9 slave.cpp:1736] Got assigned task 
> 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' for framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:00.950150 9 gc.cpp:93] Unscheduling 
> '/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001'
>  from gc
> I1101 20:34:00.950225 9 gc.cpp:93] Unscheduling 
> 

[jira] [Assigned] (MESOS-8398) External volumes (through docker/volume isolator) might not be accessible by non-root users.

2018-02-08 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-8398:
-

Assignee: Jie Yu

> External volumes (through docker/volume isolator) might not be accessible by 
> non-root users.
> 
>
> Key: MESOS-8398
> URL: https://issues.apache.org/jira/browse/MESOS-8398
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.3.1, 1.4.1
>Reporter: Jie Yu
>Assignee: Jie Yu
>Priority: Major
>
> That's because we don't perform chown/chmod for external volumes at the 
> moment (because a volume might be shared across multiple containers). If the 
> container is launched as a non-root user, it might not be able to access the 
> external volume.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-8552) Failing `CGROUPS_ROOT_PidNamespaceForward` and `CGROUPS_ROOT_PidNamespaceBackward`

2018-02-08 Thread Meng Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Meng Zhu reassigned MESOS-8552:
---

Assignee: Meng Zhu

> Failing `CGROUPS_ROOT_PidNamespaceForward` and 
> `CGROUPS_ROOT_PidNamespaceBackward`
> --
>
> Key: MESOS-8552
> URL: https://issues.apache.org/jira/browse/MESOS-8552
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Andrei Budnik
>Assignee: Meng Zhu
>Priority: Critical
>  Labels: flaky-test, mesosphere
> Attachments: CGROUPS_ROOT_PidNamespaceBackward-badrun.txt, 
> CGROUPS_ROOT_PidNamespaceForward-badrun.txt
>
>
> {code:java}
> W0208 04:41:06.970381 348 containerizer.cpp:2335] Attempted to destroy 
> unknown container 001fdfaf-7dab-45b9-ab5c-baa3527d50fb
> ../../src/tests/slave_recovery_tests.cpp:5189: Failure
> termination.get() is NONE
> {code}
> {code:java}
> W0208 04:41:10.058873 348 containerizer.cpp:2335] Attempted to destroy 
> unknown container e51afc11-cffe-4861-ba13-b124116522b0
> ../../src/tests/slave_recovery_tests.cpp:5294: Failure
> termination.get() is NONE
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-8522) `prepareMounts` in Mesos containerizer is flaky.

2018-02-08 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357239#comment-16357239
 ] 

Greg Mann commented on MESOS-8522:
--

As a mitigation, we could re-scan the mount table after the first pass, and 
allow these failures if the failed entry no longer exists.
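
A sketch of that mitigation; {{readMountTable()}} and {{markSlave()}} are 
hypothetical stand-ins for the real mount-table scan and the MS_SLAVE remount 
in launch.cpp, and only the re-scan logic is the point:

{code:cpp}
#include <set>
#include <string>

#include <stout/error.hpp>
#include <stout/nothing.hpp>
#include <stout/try.hpp>

// Hypothetical helpers: e.g. parse /proc/self/mountinfo, and call
// mount(2) with MS_SLAVE on a single target.
std::set<std::string> readMountTable();
Try<Nothing> markSlave(const std::string& target);

Try<Nothing> prepareMountsWithRescan()
{
  for (const std::string& target : readMountTable()) {
    Try<Nothing> marked = markSlave(target);

    if (marked.isError()) {
      // Re-scan the mount table: if the entry has disappeared (e.g. a
      // racing container teardown unmounted it), the failure is benign.
      if (readMountTable().count(target) == 0) {
        continue;
      }

      return Error(
          "Failed to mark '" + target + "' as slave: " + marked.error());
    }
  }

  return Nothing();
}
{code}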

> `prepareMounts` in Mesos containerizer is flaky.
> 
>
> Key: MESOS-8522
> URL: https://issues.apache.org/jira/browse/MESOS-8522
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.5.0
>Reporter: Chun-Hung Hsiao
>Assignee: Jie Yu
>Priority: Critical
>  Labels: mesosphere, storage
>
> The 
> [{{prepareMount()}}|https://github.com/apache/mesos/blob/1.5.x/src/slave/containerizer/mesos/launch.cpp#L244]
>  function in {{src/slave/containerizer/mesos/launch.cpp}} sometimes fails 
> with the following error:
> {noformat}
> Failed to prepare mounts: Failed to mark 
> '/home/docker/containers/af78db6ebc1aff572e576b773d1378121a66bb755ed63b3278e759907e5fe7b6/shm'
>  as slave: Invalid argument
> {noformat}
> The error message comes from 
> https://github.com/apache/mesos/blob/1.5.x/src/slave/containerizer/mesos/launch.cpp#L326.
> Although it does not happen frequently, it can be reproduced by running tests 
> that need to clone mount namespaces in repetition. For example, I just 
> reproduced the bug with the following command after 17 minutes:
> {noformat}
> sudo bin/mesos-tests.sh --gtest_filter='*ROOT_PublishResourcesRecovery' 
> --gtest_break_on_failure --gtest_repeat=-1 --verbose
> {noformat}
> Note that in this example, the test itself does not involve any docker image 
> or the docker containerizer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-8497) Docker parameter `name` does not work with Docker Containerizer.

2018-02-08 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-8497:
-

Assignee: Gilbert Song

> Docker parameter `name` does not work with Docker Containerizer.
> 
>
> Key: MESOS-8497
> URL: https://issues.apache.org/jira/browse/MESOS-8497
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Jörg Schad
>Assignee: Gilbert Song
>Priority: Major
>  Labels: containerizer
> Attachments: agent.log, master.log
>
>
> When deploying a marathon app with Docker Containerizer (need to check Mesos 
> Containerizer) and the parameter name set, Mesos is not able to 
> recognize/control/kill the started container.
> Steps to reproduce 
>  # Deploy the below marathon app definition
>  # Watch the task get stuck in staging, with Mesos unable to kill it or 
> communicate with it
>  ## 
> {quote}e.g., Agent Logs: W0126 18:38:50.00  4988 slave.cpp:6750] Failed 
> to get resource statistics for executor 
> ‘instana-agent.1a1f8d22-02c8-11e8-b607-923c3c523109’ of framework 
> 41f1b534-5f9d-4b5e-bb74-a0e387d5739f-0001: Failed to run ‘docker -H 
> unix:///var/run/docker.sock inspect 
> mesos-1c6f894d-9a3e-408c-8146-47ebab2f28be’: exited with status 1; 
> stderr=’Error: No such image, container or task: 
> mesos-1c6f894d-9a3e-408c-8146-47ebab2f28be{quote}
>  # Check on node and see container running, but not being recognized by mesos
> {noformat}
> {
> "id": "/docker-test",
> "instances": 1,
> "portDefinitions": [],
> "container": {
> "type": "DOCKER",
> "volumes": [],
> "docker": {
> "image": "ubuntu:16.04",
> "parameters": [
> {
> "key": "name",
> "value": "myname"
> }
> ]
> }
> },
> "cpus": 0.1,
> "mem": 128,
> "requirePorts": false,
> "networks": [],
> "healthChecks": [],
> "fetch": [],
> "constraints": [],
> "cmd": "sleep 1000"
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-8522) `prepareMounts` in Mesos containerizer is flaky.

2018-02-08 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-8522:
-

Assignee: Jie Yu

> `prepareMounts` in Mesos containerizer is flaky.
> 
>
> Key: MESOS-8522
> URL: https://issues.apache.org/jira/browse/MESOS-8522
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.5.0
>Reporter: Chun-Hung Hsiao
>Assignee: Jie Yu
>Priority: Critical
>  Labels: mesosphere, storage
>
> The 
> [{{prepareMount()}}|https://github.com/apache/mesos/blob/1.5.x/src/slave/containerizer/mesos/launch.cpp#L244]
>  function in {{src/slave/containerizer/mesos/launch.cpp}} sometimes fails 
> with the following error:
> {noformat}
> Failed to prepare mounts: Failed to mark 
> '/home/docker/containers/af78db6ebc1aff572e576b773d1378121a66bb755ed63b3278e759907e5fe7b6/shm'
>  as slave: Invalid argument
> {noformat}
> The error message comes from 
> https://github.com/apache/mesos/blob/1.5.x/src/slave/containerizer/mesos/launch.cpp#L326.
> Although it does not happen frequently, it can be reproduced by running tests 
> that need to clone mount namespaces in repetition. For example, I just 
> reproduced the bug with the following command after 17 minutes:
> {noformat}
> sudo bin/mesos-tests.sh --gtest_filter='*ROOT_PublishResourcesRecovery' 
> --gtest_break_on_failure --gtest_repeat=-1 --verbose
> {noformat}
> Note that in this example, the test itself does not involve any docker image 
> or the docker containerizer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-8550) Bug in `Master::detected()` leads to coredump in `MasterZooKeeperTest.MasterInfoAddress`

2018-02-08 Thread Benno Evers (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357207#comment-16357207
 ] 

Benno Evers commented on MESOS-8550:


Andrei's analysis seems to be right: the code was indeed calling 
`leader->has_domain()` on an `Option<MasterInfo>` without checking that it was 
not `None` first.

I posted a fix in the following review: https://reviews.apache.org/r/65571/
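
A minimal, self-contained sketch of the guard, with a simplified stand-in for 
{{mesos::MasterInfo}} (the real change lives in {{Master::detected()}}):

{code:cpp}
#include <iostream>

#include <stout/option.hpp>

// Simplified stand-in for mesos::MasterInfo.
struct MasterInfo
{
  bool hasDomain = false;
};

void detected(const Option<MasterInfo>& leader)
{
  // During a ZooKeeper session expiration the detector can report
  // "no leader", i.e. leader == None(), so guard before dereferencing.
  if (leader.isNone()) {
    std::cout << "Lost leading master" << std::endl;
    return;
  }

  if (leader->hasDomain) {
    std::cout << "Elected master has a fault domain configured" << std::endl;
  }
}
{code}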

> Bug in `Master::detected()` leads to coredump in 
> `MasterZooKeeperTest.MasterInfoAddress`
> 
>
> Key: MESOS-8550
> URL: https://issues.apache.org/jira/browse/MESOS-8550
> Project: Mesos
>  Issue Type: Bug
>  Components: leader election, master
>Reporter: Andrei Budnik
>Priority: Major
> Attachments: MasterZooKeeperTest.MasterInfoAddress-badrun.txt
>
>
> {code:java}
> 15:55:17 Assertion failed: (isSome()), function get, file 
> ../../3rdparty/stout/include/stout/option.hpp, line 119.
> 15:55:17 *** Aborted at 1518018924 (unix time) try "date -d @1518018924" if 
> you are using GNU date ***
> 15:55:17 PC: @ 0x7fff4f8f2e3e __pthread_kill
> 15:55:17 *** SIGABRT (@0x7fff4f8f2e3e) received by PID 39896 (TID 
> 0x70427000) stack trace: ***
> 15:55:17 @ 0x7fff4fa24f5a _sigtramp
> 15:55:17 I0207 07:55:24.945252 4890624 group.cpp:511] ZooKeeper session 
> expired
> 15:55:17 @ 0x70425500 (unknown)
> 15:55:17 2018-02-07 07:55:24,945:39896(0x70633000):ZOO_INFO@log_env@794: 
> Client 
> environment:user.dir=/private/var/folders/6w/rw03zh013y38ys6cyn8qppf8gn/T/1mHCvU
> 15:55:17 @ 0x7fff4f84f312 abort
> 15:55:17 2018-02-07 
> 07:55:24,945:39896(0x70633000):ZOO_INFO@zookeeper_init@827: Initiating 
> client connection, host=127.0.0.1:52197 sessionTimeout=1 
> watcher=0x10d916590 sessionId=0 sessionPasswd= context=0x7fe1bda706a0 
> flags=0
> 15:55:17 @ 0x7fff4f817368 __assert_rtn
> 15:55:17 @0x10b9cff97 _ZNR6OptionIN5mesos10MasterInfoEE3getEv
> 15:55:17 @0x10bbb04b5 Option<>::operator->()
> 15:55:17 @0x10bd4514a mesos::internal::master::Master::detected()
> 15:55:17 @0x10bf54558 
> _ZZN7process8dispatchIN5mesos8internal6master6MasterERKNS_6FutureI6OptionINS1_10MasterInfoSB_EEvRKNS_3PIDIT_EEMSD_FvT0_EOT1_ENKUlOS9_PNS_11ProcessBaseEE_clESM_SO_
> 15:55:17 @0x10bf54310 
> _ZN5cpp176invokeIZN7process8dispatchIN5mesos8internal6master6MasterERKNS1_6FutureI6OptionINS3_10MasterInfoSD_EEvRKNS1_3PIDIT_EEMSF_FvT0_EOT1_EUlOSB_PNS1_11ProcessBaseEE_JSB_SQ_EEEDTclclsr3stdE7forwardISF_Efp_Espclsr3stdE7forwardIT0_Efp0_EEEOSF_DpOSS_
> 15:55:17 @0x10bf542bb 
> _ZN6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master6MasterERKNS2_6FutureI6OptionINS4_10MasterInfoSE_EEvRKNS2_3PIDIT_EEMSG_FvT0_EOT1_EUlOSC_PNS2_11ProcessBaseEE_JSC_NSt3__112placeholders4__phILi1E13invoke_expandISS_NST_5tupleIJSC_SW_EEENSZ_IJOSR_EEEJLm0ELm1DTclsr5cpp17E6invokeclsr3stdE7forwardISG_Efp_Espcl6expandclsr3stdE3getIXT2_EEclsr3stdE7forwardISK_Efp0_EEclsr3stdE7forwardISN_Efp2_OSG_OSK_N5cpp1416integer_sequenceImJXspT2_SO_
> 15:55:17 @0x10bf541f3 
> _ZNO6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master6MasterERKNS2_6FutureI6OptionINS4_10MasterInfoSE_EEvRKNS2_3PIDIT_EEMSG_FvT0_EOT1_EUlOSC_PNS2_11ProcessBaseEE_JSC_NSt3__112placeholders4__phILi1EclIJSR_EEEDTcl13invoke_expandclL_ZNST_4moveIRSS_EEONST_16remove_referenceISG_E4typeEOSG_EdtdefpT1fEclL_ZNSZ_IRNST_5tupleIJSC_SW_ES14_S15_EdtdefpT10bound_argsEcvN5cpp1416integer_sequenceImJLm0ELm1_Eclsr3stdE16forward_as_tuplespclsr3stdE7forwardIT_Efp_DpOS1C_
> 15:55:17 @0x10bf540bd 
> _ZN5cpp176invokeIN6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master6MasterERKNS4_6FutureI6OptionINS6_10MasterInfoSG_EEvRKNS4_3PIDIT_EEMSI_FvT0_EOT1_EUlOSE_PNS4_11ProcessBaseEE_JSE_NSt3__112placeholders4__phILi1EEJST_EEEDTclclsr3stdE7forwardISI_Efp_Espclsr3stdE7forwardIT0_Efp0_EEEOSI_DpOS10_
> 15:55:17 @0x10bf54081 
> _ZN6lambda8internal6InvokeIvEclINS0_7PartialIZN7process8dispatchIN5mesos8internal6master6MasterERKNS5_6FutureI6OptionINS7_10MasterInfoSH_EEvRKNS5_3PIDIT_EEMSJ_FvT0_EOT1_EUlOSF_PNS5_11ProcessBaseEE_JSF_NSt3__112placeholders4__phILi1EEJSU_EEEvOSJ_DpOT0_
> 15:55:17 @0x10bf53e06 
> _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8dispatchIN5mesos8internal6master6MasterERKNS1_6FutureI6OptionINSA_10MasterInfoSK_EEvRKNS1_3PIDIT_EEMSM_FvT0_EOT1_EUlOSI_S3_E_JSI_NSt3__112placeholders4__phILi1EEEclEOS3_
> 15:55:17 @0x10ebf464f 
> _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEEclES3_
> 15:55:17 @0x10ebf44c4 process::ProcessBase::consume()
> 

[jira] [Created] (MESOS-8553) Implement a test to reproduce a bug in launch nested container call.

2018-02-08 Thread Andrei Budnik (JIRA)
Andrei Budnik created MESOS-8553:


 Summary: Implement a test to reproduce a bug in launch nested 
container call.
 Key: MESOS-8553
 URL: https://issues.apache.org/jira/browse/MESOS-8553
 Project: Mesos
  Issue Type: Task
  Components: test
Reporter: Andrei Budnik


It's known that in some circumstances an attempt to launch a nested container 
session might fail with the following error message:
{code:java}
Failed to enter mount namespace: Failed to open '/proc/29473/ns/mnt': No such 
file or directory
{code}
That message is written by [linux 
launcher|https://github.com/apache/mesos/blob/f7dbd29bd9809d1dd254041537ca875e7ea26613/src/slave/containerizer/mesos/launch.cpp#L742-L743]
 to stdout. This bug is most likely caused by 
[getMountNamespaceTarget()|https://github.com/apache/mesos/blob/f7dbd29bd9809d1dd254041537ca875e7ea26613/src/slave/containerizer/mesos/utils.cpp#L59].

Steps for the test could be:
1) Start a long-running task in its own container (e.g. `sleep 1000`)
2) Start a new short-lived nested container via `LAUNCH_NESTED_CONTAINER` 
(e.g. `echo echo`)
3) Call `WAIT_NESTED_CONTAINER` on that nested container
4) Start a long-lived nested container via `LAUNCH_NESTED_CONTAINER` (e.g. `cat`)
5) Kill that nested container via `KILL_NESTED_CONTAINER`
6) Start another long-lived nested container via 
`LAUNCH_NESTED_CONTAINER_SESSION` (e.g. `cat`)
7) Attach to that container via `ATTACH_CONTAINER_INPUT` and write a non-empty 
message M to the container's stdin
8) Check the output of the nested container: it should contain message M

The bug might pop up during step 8.
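
As a sketch, building the call for step 2 with the v1 agent API could look like 
the following; posting it to the agent's {{/api/v1}} endpoint is elided, and 
the container ID value is a placeholder:

{code:cpp}
#include <mesos/v1/agent/agent.pb.h>
#include <mesos/v1/mesos.pb.h>

mesos::v1::agent::Call launchNestedContainer(
    const mesos::v1::ContainerID& parent)
{
  mesos::v1::agent::Call call;
  call.set_type(mesos::v1::agent::Call::LAUNCH_NESTED_CONTAINER);

  mesos::v1::agent::Call::LaunchNestedContainer* launch =
    call.mutable_launch_nested_container();

  // Nested container IDs embed their parent's ID.
  launch->mutable_container_id()->mutable_parent()->CopyFrom(parent);
  launch->mutable_container_id()->set_value("nested-child"); // placeholder

  // The short-lived command from step 2.
  launch->mutable_command()->set_value("echo echo");

  return call;
}
{code}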



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (MESOS-8279) Persistent volumes are not visible in Mesos UI using default executor on Linux.

2018-02-08 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356942#comment-16356942
 ] 

Qian Zhang edited comment on MESOS-8279 at 2/8/18 1:49 PM:
---

The above fix can handle this case: Framework launches a task group which has a 
task using a persistent volume disk, like below:
{code:java}
{
  "tasks":[
{
  "name": "test",
  "task_id": {"value" : "test"},
  "agent_id": {"value" : ""},
  "resources": [
{
  "name": "cpus",
  "type": "SCALAR",
  "scalar": {
"value": 0.1
  }
},
{
  "name": "mem",
  "type": "SCALAR",
  "scalar": {
"value": 32
  }
},
{
  "name": "disk",
  "type": "SCALAR",
  "scalar": {
"value": 1024
  },
  "disk": {
"persistence": {
  "id" : "pv1"
},
"volume": {
  "mode": "RW",
  "container_path": "xxx"
}
  }
}
  ],
  "command": {
"value": "echo hello > xxx/data && sleep 1000"
  }
}
  ]
}
{code}
But it cannot handle the case where a framework launches a task group in 
which the executor uses a persistent volume disk resource and the task's 
{{containerInfo}} has a volume of {{SANDBOX_PATH}} type (like below).
{code:java}
  "container": {
"type": "MESOS",
"volumes": [
  {
"mode": "RW",
"container_path": "xxx",
"source": {
  "type": "SANDBOX_PATH",
  "sandbox_path": {
"type": "PARENT",
"path": "foo"
  }
}
  }
]
  }
{code}
I think this is how Marathon launches a pod (task group). If you use Marathon 
to launch a pod and specify a PV for the container inside the pod, the 
executor will actually have a disk resource that uses that PV: Marathon will 
create the PV and mount it into the executor's sandbox, and the task info 
created by Marathon will have a volume of {{SANDBOX_PATH}} type like the above, 
so the task can share the PV with the executor. In this case, the fix for this 
ticket will not do the file attach, since it can only handle the case of a task 
using a persistent volume disk resource; see [this 
code|https://github.com/apache/mesos/blob/32b85a2b06f676b68a16deaa8359ae64a1e8ead9/src/slave/slave.cpp#L1035:L1039]
 for details.

To fix this issue, I think we need to improve 
{{Slave::attachTaskVolumeDirectory}} to handle tasks whose {{containerInfo}} 
has a volume of {{SANDBOX_PATH}} type: for such a task, call {{Files::attach()}} 
to attach the executor's volume path to the task's volume path. A rough sketch 
of this path mapping is shown below.
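As an illustration of that mapping only (this is a hedged sketch, not the actual {{Slave::attachTaskVolumeDirectory}} change; all names and paths here are hypothetical):
{code:python}
import os

def task_volume_attach_points(executor_dir, task_dir, volume):
    """For a SANDBOX_PATH/PARENT task volume, return (real_path, virtual_path):
    the directory that actually holds the data (inside the executor's sandbox)
    and the path under the task's sandbox where it should be exposed, i.e. the
    pair that would be handed to something like Files::attach()."""
    real_path = os.path.join(
        executor_dir, volume["source"]["sandbox_path"]["path"])  # e.g. ".../foo"
    virtual_path = os.path.join(
        task_dir, volume["container_path"])                      # e.g. ".../xxx"
    return real_path, virtual_path

# Example using the volume from the JSON above:
volume = {
    "container_path": "xxx",
    "source": {"type": "SANDBOX_PATH",
               "sandbox_path": {"type": "PARENT", "path": "foo"}},
}
print(task_volume_attach_points(
    "/var/lib/mesos/sandbox/executor",             # hypothetical executor dir
    "/var/lib/mesos/sandbox/executor/tasks/test",  # hypothetical task dir
    volume))
{code}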


was (Author: qianzhang):
The above fix can handle this case: a framework launches a task group containing a 
task that uses a persistent volume disk resource, like below:
{code:java}
{
  "tasks":[
{
  "name": "test",
  "task_id": {"value" : "test"},
  "agent_id": {"value" : ""},
  "resources": [
{
  "name": "cpus",
  "type": "SCALAR",
  "scalar": {
"value": 0.1
  }
},
{
  "name": "mem",
  "type": "SCALAR",
  "scalar": {
"value": 32
  }
},
{
  "name": "disk",
  "type": "SCALAR",
  "scalar": {
"value": 1024
  },
  "disk": {
"persistence": {
  "id" : "pv1"
},
"volume": {
  "mode": "RW",
  "container_path": "xxx"
}
  }
}
  ],
  "command": {
"value": "echo hello > xxx/data && sleep 1000"
  }
}
  ]
}
{code}
But it cannot handle the case where a framework launches a task group in 
which the executor uses a persistent volume disk resource and the task's 
{{containerInfo}} has a volume of {{SANDBOX_PATH}} type (like below).
{code:java}
  "container": {
"type": "MESOS",
"volumes": [
  {
"mode": "RW",
"container_path": "xxx",
"source": {
  "type": "SANDBOX_PATH",
  "sandbox_path": {
"type": "PARENT",
"path": "foo"
  }
}
  }
]
  }
{code}
 I think this is how Marathon launches a pod (task group). If you use Marathon 
to launch a pod and specify a PV for the container inside the pod, the executor 
will actually have a disk resource which has that PV; Marathon will create the 
PV and mount it into the executor's sandbox, and the task info created by 
Marathon will have a volume of {{SANDBOX_PATH}} type like the above so that the 
task can share the PV with the executor. In this case, the fix to this ticket 
will not do the file attach since it can only 

[jira] [Commented] (MESOS-8279) Persistent volumes are not visible in Mesos UI using default executor on Linux.

2018-02-08 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356942#comment-16356942
 ] 

Qian Zhang commented on MESOS-8279:
---

The above fix can handle this case: a framework launches a task group containing a 
task that uses a persistent volume disk resource, like below:
{code:java}
{
  "tasks":[
{
  "name": "test",
  "task_id": {"value" : "test"},
  "agent_id": {"value" : ""},
  "resources": [
{
  "name": "cpus",
  "type": "SCALAR",
  "scalar": {
"value": 0.1
  }
},
{
  "name": "mem",
  "type": "SCALAR",
  "scalar": {
"value": 32
  }
},
{
  "name": "disk",
  "type": "SCALAR",
  "scalar": {
"value": 1024
  },
  "disk": {
"persistence": {
  "id" : "pv1"
},
"volume": {
  "mode": "RW",
  "container_path": "xxx"
}
  }
}
  ],
  "command": {
"value": "echo hello > xxx/data && sleep 1000"
  }
}
  ]
}
{code}
But it cannot handle the case where a framework launches a task group in 
which the executor uses a persistent volume disk resource and the task's 
{{containerInfo}} has a volume of {{SANDBOX_PATH}} type (like below).
{code:java}
  "container": {
"type": "MESOS",
"volumes": [
  {
"mode": "RW",
"container_path": "xxx",
"source": {
  "type": "SANDBOX_PATH",
  "sandbox_path": {
"type": "PARENT",
"path": "foo"
  }
}
  }
]
  }
{code}
 I think this is how Marathon launches a pod (task group). If you use Marathon 
to launch a pod and specify a PV for the container inside the pod, the executor 
will actually have a disk resource which has that PV; Marathon will create the 
PV and mount it into the executor's sandbox, and the task info created by 
Marathon will have a volume of {{SANDBOX_PATH}} type like the above so that the 
task can share the PV with the executor. In this case, the fix to this ticket 
will not do the file attach since it can only handle the case of a task using a 
persistent volume disk resource; see [this 
code|https://github.com/apache/mesos/blob/32b85a2b06f676b68a16deaa8359ae64a1e8ead9/src/slave/slave.cpp#L1035:L1039]
 for details.

To fix this issue, I was thinking of improving 
{{Slave::attachTaskVolumeDirectory}} to handle tasks whose {{containerInfo}} 
has a volume of {{SANDBOX_PATH}} type: for such a task, call {{Files::attach()}} 
to attach the executor's volume path to the task's volume path. The problem 
is that the attach will fail because the executor's volume path has not been 
created yet at that moment (when the task is sent to the executor); it will 
instead be created by the {{volume/sandbox_path}} isolator when the nested 
container corresponding to the task is launched.

> Persistent volumes are not visible in Mesos UI using default executor on 
> Linux.
> ---
>
> Key: MESOS-8279
> URL: https://issues.apache.org/jira/browse/MESOS-8279
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.2.2, 1.3.1, 1.4.1
>Reporter: Jie Yu
>Assignee: Qian Zhang
>Priority: Major
> Fix For: 1.5.0, 1.6.0
>
>
> The reason is that on Linux, if multiple containers in a default executor 
> want to share a persistent volume, it'll use a SANDBOX_PATH volume source 
> with type PARENT. This is translated into a bind mount in the nested 
> container's mount namespace, and is thus not visible in the host mount 
> namespace, where the Mesos UI operates.
> One potential solution is to create a symlink (instead of just a mkdir) in 
> the sandbox. The symlink will be shadowed by the bind mount in the nested 
> container, but in the host mount namespace it'll point to the corresponding 
> persistent volume.
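
The symlink idea quoted above can be sketched as follows; this is a minimal illustration with assumed, hypothetical path arguments, not the Mesos implementation:
{code:python}
import os

def expose_parent_volume(task_sandbox, executor_sandbox,
                         source_path, container_path):
    """Hypothetical helper: create a symlink in the task sandbox pointing at
    the executor's volume directory. Inside the nested container the link is
    shadowed by the bind mount; in the host mount namespace (where the Mesos
    UI operates) it resolves to the shared volume."""
    target = os.path.join(executor_sandbox, source_path)   # e.g. ".../foo"
    link = os.path.join(task_sandbox, container_path)      # e.g. ".../xxx"
    os.symlink(target, link)
{code}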



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (MESOS-8552) Failing `CGROUPS_ROOT_PidNamespaceForward` and `CGROUPS_ROOT_PidNamespaceBackward`

2018-02-08 Thread Andrei Budnik (JIRA)
Andrei Budnik created MESOS-8552:


 Summary: Failing `CGROUPS_ROOT_PidNamespaceForward` and 
`CGROUPS_ROOT_PidNamespaceBackward`
 Key: MESOS-8552
 URL: https://issues.apache.org/jira/browse/MESOS-8552
 Project: Mesos
  Issue Type: Bug
Reporter: Andrei Budnik


{code:java}
W0208 04:41:06.970381 348 containerizer.cpp:2335] Attempted to destroy unknown 
container 001fdfaf-7dab-45b9-ab5c-baa3527d50fb
../../src/tests/slave_recovery_tests.cpp:5189: Failure
termination.get() is NONE
{code}
{code:java}
W0208 04:41:10.058873 348 containerizer.cpp:2335] Attempted to destroy unknown 
container e51afc11-cffe-4861-ba13-b124116522b0
../../src/tests/slave_recovery_tests.cpp:5294: Failure
termination.get() is NONE
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)