[jira] [Assigned] (MESOS-6636) Validate that tasks / executors / reservations do not mix Resource.allocation_info.roles.
[ https://issues.apache.org/jira/browse/MESOS-6636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Guo reassigned MESOS-6636: -- Assignee: Jay Guo > Validate that tasks / executors / reservations do not mix > Resource.allocation_info.roles. > - > > Key: MESOS-6636 > URL: https://issues.apache.org/jira/browse/MESOS-6636 > Project: Mesos > Issue Type: Task > Components: master >Reporter: Benjamin Mahler >Assignee: Jay Guo > > With support for multi-role frameworks, we need to make sure that individual > tasks and executors cannot mix roles. Likewise, we do not want to allow a > scheduler to make a reservation based on resources with different allocated > roles. > We will however allow tasks from one role to run on executors from another > role. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-6637) Validate that schedulers cannot perform operations on offers with different allocation roles.
[ https://issues.apache.org/jira/browse/MESOS-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Guo reassigned MESOS-6637: -- Assignee: Jay Guo > Validate that schedulers cannot perform operations on offers with different > allocation roles. > - > > Key: MESOS-6637 > URL: https://issues.apache.org/jira/browse/MESOS-6637 > Project: Mesos > Issue Type: Task > Components: master >Reporter: Benjamin Mahler >Assignee: Jay Guo > > With support for multi-role frameworks, offers contain allocation info > (currently just the role that the offer is being made to). > In theory, schedulers could perform offer operations across multiple roles, > so long as the tasks, executors, and reservations individually don't mix > roles. However, there doesn't seem to be a clear reason to allow this. So, we > will validate against combining offers from multiple roles. This also makes > it semantically consistent with single-role frameworks (since they do not do > this either). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6742) Adding support for s390x architecture
[ https://issues.apache.org/jira/browse/MESOS-6742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayanampudi Varsha updated MESOS-6742: - Description: There are 2 issues: 1. LdcacheTest.Parse test case fails on s390x machines. 2. From the value of flag docker_registry in slave/flags.cpp, amd64 images get downloaded due to which test cases fail on s390x with "Exec format Error" was: There are 2 issues: 1. LdcacheTest.Parse test case fails on s390x machines. 2. From the value of flag docker_registry in slave.cpp, amd64 images get downloaded due to which test cases fail on s390x with "Exec format Error" > Adding support for s390x architecture > -- > > Key: MESOS-6742 > URL: https://issues.apache.org/jira/browse/MESOS-6742 > Project: Mesos > Issue Type: Bug >Reporter: Ayanampudi Varsha > > There are 2 issues: > 1. LdcacheTest.Parse test case fails on s390x machines. > 2. From the value of flag docker_registry in slave/flags.cpp, amd64 images > get downloaded due to which test cases fail on s390x with "Exec format Error" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6695) Light up Windows agent tests
[ https://issues.apache.org/jira/browse/MESOS-6695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730900#comment-15730900 ] Joseph Wu commented on MESOS-6695: -- {code} commit 1648491e2f194f5ba9d62cb1e099066fb7f16272 Author: Alex Clemmer Date: Wed Dec 7 16:13:22 2016 -0800 Changed registrar backend to `in_memory` by default in tests. Currently, all instances of the Master in tests set the `--registry` flag to the default value (`replicated_log`). When the `replicated_log` value is set, Masters in tests will back the registrar with the disk, specifically via levelDB. Only a small subset of tests actually require the `replicated_log`; these are tests which expect the master to persist data across failovers. A majority of tests can be run with an `in_memory` registrar backend. Changing the default to `in_memory` will serve multiple purposes: * It will speed up the test suite by ~10-15%. * It will reduce the flakiness observed on the ASF CI. These machines sometimes run into disk contention, which causes registrar reads/writes to time out. * It will unblock a majority of tests from being run on Windows, which currently does not implement a persistent registrar backend. This review supersedes and revives: https://reviews.apache.org/r/41665/ Review: https://reviews.apache.org/r/54453/ {code} > Light up Windows agent tests > > > Key: MESOS-6695 > URL: https://issues.apache.org/jira/browse/MESOS-6695 > Project: Mesos > Issue Type: Epic >Reporter: Alex Clemmer >Assignee: Alex Clemmer > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3447) Port svn_tests
[ https://issues.apache.org/jira/browse/MESOS-3447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730898#comment-15730898 ] Joseph Wu commented on MESOS-3447: -- {code} commit b5b1ead3a8c28b8b65f514fd7b030324a735a26d Author: Alex Clemmer Date: Wed Dec 7 16:57:37 2016 -0800 Windows: Added APR include path to libprocess configuration. Partially addresses MESOS-3447, as APR is a dependency of the SVN facilities of Stout. On Unix builds, APR is expected to have been installed on the system prior to building Mesos (usually by a package manager). Since Windows does not have a package manager or a reasonable way of automatically discovering where a package is installed (aside from the registry), our CMake build system takes it upon itself to manage these system dependencies. This means that on Windows, we need to configure the build to look for the APR headers in our custom-downloaded APR repository. Currently, though, we are not doing this, so we'll hit a compile-time error if we try to build (e.g.) `svn.hpp`. This commit will introduce the APR include paths as part of the build against Stout. Since Stout is a header-only library, it is (right now) incumbent on whoever is bundling Stout up to manage the third-party dependencies of Stout. In our current implementation, libprocess manages the APR dependency for Stout, hence, we put this logic in libprocess. Review: https://reviews.apache.org/r/54462/ {code} > Port svn_tests > -- > > Key: MESOS-3447 > URL: https://issues.apache.org/jira/browse/MESOS-3447 > Project: Mesos > Issue Type: Task > Components: stout >Reporter: Alex Clemmer >Assignee: Alex Clemmer > Labels: mesosphere, stout > > Should be trivial if we have libapr and libsvn building and linking correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6717) Add Windows support to agent test harness
[ https://issues.apache.org/jira/browse/MESOS-6717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730897#comment-15730897 ] Joseph Wu commented on MESOS-6717: -- {code} commit 4c0e453296e3ac7c5eda48a98eb7ad570c303d0a Author: Alex Clemmer Date: Wed Dec 7 17:21:01 2016 -0800 Windows: Fixed default isolators in Agent. This commit sets the default isolators on Windows to Windows-specific values, rather than POSIX-specific values. This is a convenience for users on Windows (who no longer need to specify `--isolation=windows/cpu,filesystem/windows`) and will allow tests to exercise the default set of Agent flags. In particular, this commit will transition Windows builds of the agent away from using the `posix/cpu`, `posix/mem`, and `filesystem/posix` isolators by default, replacing them with `windows/cpu` and `filesystem/windows` (sadly, there is not yet a memory isolator for Windows). Review: https://reviews.apache.org/r/54470/ {code} > Add Windows support to agent test harness > - > > Key: MESOS-6717 > URL: https://issues.apache.org/jira/browse/MESOS-6717 > Project: Mesos > Issue Type: Bug > Components: agent >Reporter: Alex Clemmer >Assignee: Alex Clemmer > Labels: microsoft, windows-mvp > > Of particular interest in `src/tests/CMakeLists.txt` is supporting enough of > the following that we can successfully run agent tests: > TEST_HELPER_SRC > MESOS_TESTS_UTILS_SRC -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6756) I/O switchboard should deal with the case when reaping of the server failed.
[ https://issues.apache.org/jira/browse/MESOS-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-6756: -- Story Points: 3 > I/O switchboard should deal with the case when reaping of the server failed. > > > Key: MESOS-6756 > URL: https://issues.apache.org/jira/browse/MESOS-6756 > Project: Mesos > Issue Type: Bug >Reporter: Jie Yu >Assignee: Jie Yu > Fix For: 1.2.0 > > > Currently, we don't deal with the reaping failure, which we should. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6750) Metrics on the Agent view of the Mesos web UI flickers between empty and non-empty states
[ https://issues.apache.org/jira/browse/MESOS-6750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730835#comment-15730835 ] haosdent commented on MESOS-6750: - Thanks a lot! Let me verify your patch :- ) > Metrics on the Agent view of the Mesos web UI flickers between empty and > non-empty states > - > > Key: MESOS-6750 > URL: https://issues.apache.org/jira/browse/MESOS-6750 > Project: Mesos > Issue Type: Bug > Components: webui >Affects Versions: 1.0.2, 1.1.0 >Reporter: Joseph Wu >Assignee: haosdent >Priority: Minor > Attachments: patch.diff > > > When viewing a specific agent on the Mesos WebUI, the metrics panel on the > left side of the UI will alternate between having values and being empty. > This is due to two different callbacks that run: > * This one sets the metrics into the {{$scope.state}} variable: > https://github.com/apache/mesos/blob/1.1.x/src/webui/master/static/js/controllers.js#L564-L577 > * This one blows away the {{$scope.state}} in favor of a new one: > https://github.com/apache/mesos/blob/1.1.x/src/webui/master/static/js/controllers.js#L521 > The metrics callback should simply assign to a different variable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6757) Consider using CMake to configure test scripts in the `bin/` directory
Alex Clemmer created MESOS-6757: --- Summary: Consider using CMake to configure test scripts in the `bin/` directory Key: MESOS-6757 URL: https://issues.apache.org/jira/browse/MESOS-6757 Project: Mesos Issue Type: Bug Components: cmake Reporter: Alex Clemmer Assignee: Alex Clemmer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6756) I/O switchboard should deal with the case when reaping of the server failed.
Jie Yu created MESOS-6756: - Summary: I/O switchboard should deal with the case when reaping of the server failed. Key: MESOS-6756 URL: https://issues.apache.org/jira/browse/MESOS-6756 Project: Mesos Issue Type: Bug Reporter: Jie Yu Assignee: Jie Yu Currently, we don't deal with the reaping failure, which we should. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5646) Build `network/cni` isolator with `libnl` support
[ https://issues.apache.org/jira/browse/MESOS-5646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730703#comment-15730703 ] Qian Zhang commented on MESOS-5646: --- [~avin...@mesosphere.io], I am not working on it now, please feel free to take it over :-) > Build `network/cni` isolator with `libnl` support > - > > Key: MESOS-5646 > URL: https://issues.apache.org/jira/browse/MESOS-5646 > Project: Mesos > Issue Type: Task > Components: containerization >Affects Versions: 1.0.0 > Environment: linux >Reporter: Avinash Sridharan >Assignee: Qian Zhang > Labels: mesosphere > > Currently, the `network/cni` isolator does not have the ability to collect > network statistics for containers launched on a CNI network. We need to give > the `network/cni` isolator the ability to query interfaces, route tables and > statistics in the container's network namespace. To achieve this, the > `network/cni` isolator will need to talk `netlink`. > To enable the `netlink` API, we need the `network/cni` isolator to be built > with libnl support. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6686) Add comments about the meanings of TaskStatus.Reason in mesos.proto
[ https://issues.apache.org/jira/browse/MESOS-6686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-6686: - Labels: documentation newbie (was: ) Component/s: documentation > Add comments about the meanings of TaskStatus.Reason in mesos.proto > --- > > Key: MESOS-6686 > URL: https://issues.apache.org/jira/browse/MESOS-6686 > Project: Mesos > Issue Type: Improvement > Components: documentation >Reporter: haosdent >Priority: Minor > Labels: documentation, newbie > > Some enums in {{TaskStatus.Reason}} are not clear and we should add some > comments in it to describe what it means and when it happens. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5646) Build `network/cni` isolator with `libnl` support
[ https://issues.apache.org/jira/browse/MESOS-5646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730479#comment-15730479 ] Avinash Sridharan commented on MESOS-5646: -- Hi Qian, are you still working on the review? I see that [~jieyu] has a comment, but there has not been much progress on the review. I wanted to finish this ticket up so that we can make progress on support for network statistics. If you don't have cycles, I will take it over. Thanks, Avinash > Build `network/cni` isolator with `libnl` support > - > > Key: MESOS-5646 > URL: https://issues.apache.org/jira/browse/MESOS-5646 > Project: Mesos > Issue Type: Task > Components: containerization >Affects Versions: 1.0.0 > Environment: linux >Reporter: Avinash Sridharan >Assignee: Qian Zhang > Labels: mesosphere > > Currently, the `network/cni` isolator does not have the ability to collect > network statistics for containers launched on a CNI network. We need to give > the `network/cni` isolator the ability to query interfaces, route tables and > statistics in the container's network namespace. To achieve this, the > `network/cni` isolator will need to talk `netlink`. > To enable the `netlink` API, we need the `network/cni` isolator to be built > with libnl support. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5533) Agent fails to start on CentOS 6 due to missing cgroup hierarchy.
[ https://issues.apache.org/jira/browse/MESOS-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730467#comment-15730467 ] Avinash Sridharan commented on MESOS-5533: -- [~karya], can we mark this as "Resolved"/"Not reproducible"? We haven't seen this being hit in the CI for quite some time, and we don't have any data to make progress on it. > Agent fails to start on CentOS 6 due to missing cgroup hierarchy. > - > > Key: MESOS-5533 > URL: https://issues.apache.org/jira/browse/MESOS-5533 > Project: Mesos > Issue Type: Bug > Components: build, isolation >Reporter: Kapil Arya >Assignee: Jie Yu > Labels: mesosphere > > With the network CNI isolator, the agent now _requires_ cgroups to be installed > on the system. Can we add some check(s) to either automatically disable the CNI > module if cgroup hierarchies are not available or ask the user to > install/enable cgroup hierarchies? > On CentOS 6, cgroup tools aren't installed by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5647) Expose network statistics for containers on CNI network in the `network/cni` isolator.
[ https://issues.apache.org/jira/browse/MESOS-5647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avinash Sridharan updated MESOS-5647: - Summary: Expose network statistics for containers on CNI network in the `network/cni` isolator. (was: Expose a network statistics in the `network/cni` isolator.) > Expose network statistics for containers on CNI network in the `network/cni` > isolator. > -- > > Key: MESOS-5647 > URL: https://issues.apache.org/jira/browse/MESOS-5647 > Project: Mesos > Issue Type: Task > Components: containerization >Affects Versions: 1.0.0 > Environment: linux >Reporter: Avinash Sridharan >Assignee: Avinash Sridharan > Labels: mesosphere > > We need to implement the `usage` method in the `network/cni` isolator to > expose metrics relating to a container's network traffic. > On receiving a request for getting `usage` for a given container, the > `network/cni` isolator could use NETLINK system calls to query the kernel for > interface and routing statistics for a given container's network namespace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6755) Capturing various tickets for improving CNI support for `MesosContainerizer`
Avinash Sridharan created MESOS-6755: Summary: Capturing various tickets for improving CNI support for `MesosContainerizer` Key: MESOS-6755 URL: https://issues.apache.org/jira/browse/MESOS-6755 Project: Mesos Issue Type: Epic Components: containerization Reporter: Avinash Sridharan Assignee: Avinash Sridharan This is a ticket to capture the ongoing effort to improve CNI support for `MesosContainerizer`. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5647) Expose a network statistics in the `network/cni` isolator.
[ https://issues.apache.org/jira/browse/MESOS-5647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avinash Sridharan updated MESOS-5647: - Description: We need to implement the `usage` method in the `network/cni` isolator to expose metrics relating to a container's network traffic. On receiving a request for getting `usage` for a given container, the `network/cni` isolator could use NETLINK system calls to query the kernel for interface and routing statistics for a given container's network namespace. was: We need a statistics endpoint in the `network/cni` isolator to expose metrics relating to a container's network traffic. On receiving a request for a given container the `network/cni` isolator could use NETLINK system calls to query the kernel for interface and routing statistics for a given container's network namespace. > Expose a network statistics in the `network/cni` isolator. > -- > > Key: MESOS-5647 > URL: https://issues.apache.org/jira/browse/MESOS-5647 > Project: Mesos > Issue Type: Task > Components: containerization >Affects Versions: 1.0.0 > Environment: linux >Reporter: Avinash Sridharan >Assignee: Avinash Sridharan > Labels: mesosphere > > We need to implement the `usage` method in the `network/cni` isolator to > expose metrics relating to a container's network traffic. > On receiving a request for getting `usage` for a given container, the > `network/cni` isolator could use NETLINK system calls to query the kernel for > interface and routing statistics for a given container's network namespace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-5647) Expose a network statistics in the `network/cni` isolator.
[ https://issues.apache.org/jira/browse/MESOS-5647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avinash Sridharan reassigned MESOS-5647: Assignee: Avinash Sridharan (was: Qian Zhang) > Expose a network statistics in the `network/cni` isolator. > -- > > Key: MESOS-5647 > URL: https://issues.apache.org/jira/browse/MESOS-5647 > Project: Mesos > Issue Type: Task > Components: containerization >Affects Versions: 1.0.0 > Environment: linux >Reporter: Avinash Sridharan >Assignee: Avinash Sridharan > Labels: mesosphere > > We need a statistics endpoint in the `network/cni` isolator to expose metrics > relating to a containers network traffic. > On receiving a request for a given container the `network/cni` isolator could > use NETLINK system calls to query the kernel for interface and routing > statistics for a given container's network namespace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5647) Expose a network statistics in the `network/cni` isolator.
[ https://issues.apache.org/jira/browse/MESOS-5647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avinash Sridharan updated MESOS-5647: - Summary: Expose a network statistics in the `network/cni` isolator. (was: Expose a statistics endpoint on the `network/cni` isolator.) > Expose a network statistics in the `network/cni` isolator. > -- > > Key: MESOS-5647 > URL: https://issues.apache.org/jira/browse/MESOS-5647 > Project: Mesos > Issue Type: Task > Components: containerization >Affects Versions: 1.0.0 > Environment: linux >Reporter: Avinash Sridharan >Assignee: Qian Zhang > Labels: mesosphere > > We need a statistics endpoint in the `network/cni` isolator to expose metrics > relating to a containers network traffic. > On receiving a request for a given container the `network/cni` isolator could > use NETLINK system calls to query the kernel for interface and routing > statistics for a given container's network namespace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-6567) Actively Scan for CNI Configurations
[ https://issues.apache.org/jira/browse/MESOS-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avinash Sridharan reassigned MESOS-6567: Assignee: Avinash Sridharan > Actively Scan for CNI Configurations > > > Key: MESOS-6567 > URL: https://issues.apache.org/jira/browse/MESOS-6567 > Project: Mesos > Issue Type: Improvement >Reporter: Dan Osborne >Assignee: Avinash Sridharan > > Mesos-Agent currently loads the CNI configs into memory at startup. After > this point, new configurations that are added will remain unknown to the > Mesos Agent process until it is restarted. > This ticket is to request that the Mesos Agent process scan the CNI config > directory each time it is networking a task, so that modifying, adding, and > removing networks will not require a slave reboot. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6754) Include command in task's state.json entry
Michael Gummelt created MESOS-6754: -- Summary: Include command in task's state.json entry Key: MESOS-6754 URL: https://issues.apache.org/jira/browse/MESOS-6754 Project: Mesos Issue Type: Improvement Components: master Reporter: Michael Gummelt I often would like to determine which command a task is running w/o having to SSH into the box and {{ps}}. I'm currently doing this for HDFS, for example. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6665) io::redirect might cause stack overflow.
[ https://issues.apache.org/jira/browse/MESOS-6665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730198#comment-15730198 ] Adam B commented on MESOS-6665: --- Any update [~benjaminhindman]? > io::redirect might cause stack overflow. > > > Key: MESOS-6665 > URL: https://issues.apache.org/jira/browse/MESOS-6665 > Project: Mesos > Issue Type: Bug >Reporter: Jie Yu >Assignee: Benjamin Hindman > > Can reproduce this on macOS sierra: > {noformat} > [--] 6 tests from IOTest > [ RUN ] IOTest.Poll > [ OK ] IOTest.Poll (0 ms) > [ RUN ] IOTest.Read > [ OK ] IOTest.Read (3 ms) > [ RUN ] IOTest.BufferedRead > [ OK ] IOTest.BufferedRead (5 ms) > [ RUN ] IOTest.Write > [ OK ] IOTest.Write (1 ms) > [ RUN ] IOTest.Redirect > make[6]: *** [check-local] Illegal instruction: 4 > make[5]: *** [check-am] Error 2 > make[4]: *** [check-recursive] Error 1 > make[3]: *** [check] Error 2 > make[2]: *** [check-recursive] Error 1 > make[1]: *** [check] Error 2 > make: *** [check-recursive] Error 1 > (reverse-i-search)`k': make check -j3 > Jies-MacBook-Pro:build jie$ lldb 3rdparty/libprocess/libprocess-tests > (lldb) target create "3rdparty/libprocess/libprocess-tests" > Current executable set to '3rdparty/libprocess/libprocess-tests' (x86_64). > (lldb) run --gtest_filter=IOTest.Redirect > Process 26064 launched: > '/Users/jie/workspace/dist/mesos/build/3rdparty/libprocess/libprocess-tests' > (x86_64) > Note: Google Test filter = IOTest.Redirect > [==] Running 1 test from 1 test case. > [--] Global test environment set-up. 
> [--] 1 test from IOTest > [ RUN ] IOTest.Redirect > Process 26064 stopped > * thread #2: tid = 0x152c5c, 0x7fffd6d463e0 > libsystem_malloc.dylib`szone_malloc_should_clear + 78, stop reason = > EXC_BAD_ACCESS (code=2, address=0x7eb16ff8) > frame #0: 0x7fffd6d463e0 > libsystem_malloc.dylib`szone_malloc_should_clear + 78 > libsystem_malloc.dylib`szone_malloc_should_clear: > -> 0x7fffd6d463e0 <+78>: movq %rax, -0x78(%rbp) > 0x7fffd6d463e4 <+82>: movq 0x10f0(%r12), %r13 > 0x7fffd6d463ec <+90>: leaq (%rax,%rax,4), %r14 > 0x7fffd6d463f0 <+94>: shlq $0x9, %r14 > (lldb) bt > . > frame #2794: 0x7fffd6ddb221 libsystem_pthread.dylib`thread_start + 13 > {noformat} > Changing the test to redirect just 1KB of data will hide the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6753) Refactor duplicated code for framework registration in master
[ https://issues.apache.org/jira/browse/MESOS-6753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-6753: --- Description: It would be nice to refactor the code and eliminate some/all of these redundancies: * {{Master::activateRecoveredFramework}} and {{Master::\_failoverFramework}} duplicate some code. * {{Master::\_subscribe}} for PID-based schedulers has a code path that contains code that is _very_ similar to the {{Master::failoverFramework}} logic, but is not identical. * The logic around {{updateConnection}} could stand to be cleaned up. e.g., it seems like {{updateConnection}} could/should be responsible for linking to the target PID and/or setting up the {{closed}} callback (for PID or HTTP schedulers, respectively). was: It would be nice to refactor the code and eliminate some/all of these redundancies: * {{Master::activateRecoveredFramework}} and {{Master::_failoverFramework}} duplicate some code. * {{Master::_subscribe}} for PID-based schedulers has a code path that contains code that is _very_ similar to the {{Master::failoverFramework}} logic, but is not identical. * The logic around {{updateConnection}} could stand to be cleaned up. e.g., it seems like {{updateConnection}} could/should be responsible for linking to the target PID and/or setting up the {{closed}} callback (for PID or HTTP schedulers, respectively). > Refactor duplicated code for framework registration in master > - > > Key: MESOS-6753 > URL: https://issues.apache.org/jira/browse/MESOS-6753 > Project: Mesos > Issue Type: Bug > Components: master >Reporter: Neil Conway > Labels: mesosphere > > It would be nice to refactor the code and eliminate some/all of these > redundancies: > * {{Master::activateRecoveredFramework}} and {{Master::\_failoverFramework}} > duplicate some code. 
> * {{Master::\_subscribe}} for PID-based schedulers has a code path that > contains code that is _very_ similar to the {{Master::failoverFramework}} > logic, but is not identical. > * The logic around {{updateConnection}} could stand to be cleaned up. e.g., > it seems like {{updateConnection}} could/should be responsible for linking to > the target PID and/or setting up the {{closed}} callback (for PID or HTTP > schedulers, respectively). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6753) Refactor duplicated code for framework registration in master
Neil Conway created MESOS-6753: -- Summary: Refactor duplicated code for framework registration in master Key: MESOS-6753 URL: https://issues.apache.org/jira/browse/MESOS-6753 Project: Mesos Issue Type: Bug Components: master Reporter: Neil Conway It would be nice to refactor the code and eliminate some/all of these redundancies: * {{Master::activateRecoveredFramework}} and {{Master::_failoverFramework}} duplicate some code. * {{Master::_subscribe}} for PID-based schedulers has a code path that contains code that is _very_ similar to the {{Master::failoverFramework}} logic, but is not identical. * The logic around {{updateConnection}} could stand to be cleaned up. e.g., it seems like {{updateConnection}} could/should be responsible for linking to the target PID and/or setting up the {{closed}} callback (for PID or HTTP schedulers, respectively). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6752) Add a `post()` overload to libprocess for streaming requests
Anand Mazumdar created MESOS-6752: - Summary: Add a `post()` overload to libprocess for streaming requests Key: MESOS-6752 URL: https://issues.apache.org/jira/browse/MESOS-6752 Project: Mesos Issue Type: Improvement Components: HTTP API, libprocess Reporter: Anand Mazumdar Currently, the {{post}}/{{streaming::post}} overloads in [libprocess | https://github.com/apache/mesos/blob/master/3rdparty/libprocess/include/process/http.hpp] don't work for streaming requests. The {{streaming::post}} overload works only for streaming responses. We should add another overload to handle streaming requests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6746) IOSwitchboard doesn't properly flush data on ATTACH_CONTAINER_OUTPUT
[ https://issues.apache.org/jira/browse/MESOS-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar updated MESOS-6746: -- Shepherd: Vinod Kone Sprint: Mesosphere Sprint 47 > IOSwitchboard doesn't properly flush data on ATTACH_CONTAINER_OUTPUT > > > Key: MESOS-6746 > URL: https://issues.apache.org/jira/browse/MESOS-6746 > Project: Mesos > Issue Type: Bug >Reporter: Kevin Klues >Assignee: Anand Mazumdar > Labels: debugging, mesosphere > Fix For: 1.2.0 > > > Currently we are doing a close on the write end of all connection pipes when > we exit the switchboard, but we don't wait until the read is flushed before > exiting. This can cause some data to get dropped since the process may exit > before the reader is flushed. The current code is: > {noformat} > void IOSwitchboardServerProcess::finalize() > { > foreach (HttpConnection& connection, outputConnections) { > connection.close(); > } > > if (failure.isSome()) { > promise.fail(failure->message); > } else { > promise.set(Nothing()); > } > } > {noformat} > We should change it to: > {noformat} > void IOSwitchboardServerProcess::finalize() > { > foreach (HttpConnection& connection, outputConnections) { > connection.close(); > connection.closed().await(); > } > > if (failure.isSome()) { > promise.fail(failure->message); > } else { > promise.set(Nothing()); > } > } > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-6635) Update allocator to handle multi-role frameworks.
[ https://issues.apache.org/jira/browse/MESOS-6635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler reassigned MESOS-6635: -- Assignee: Benjamin Mahler (was: Jay Guo) > Update allocator to handle multi-role frameworks. > - > > Key: MESOS-6635 > URL: https://issues.apache.org/jira/browse/MESOS-6635 > Project: Mesos > Issue Type: Task >Reporter: Benjamin Mahler >Assignee: Benjamin Mahler > > The allocator needs to be adjusted once we allow frameworks to have multiple > roles: > (1) When adding a framework, we need to store all of its roles and add it to > multiple role sorters. > (2) We will CHECK that the framework does not modify its roles when updating > the framework (much like we do for single-role frameworks). > (3) When performing an allocation, the allocator will set > allocation_info.role. When recovering resources, the allocator will unset > allocation_info.role. > (4) The allocator will send AllocationInfo alongside offers that it sends to > the master, so that the master can easily augment {{Offer}} with allocation > info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6751) Mesos should allow for selective environment inheritance.
Till Toenshoff created MESOS-6751: - Summary: Mesos should allow for selective environment inheritance. Key: MESOS-6751 URL: https://issues.apache.org/jira/browse/MESOS-6751 Project: Mesos Issue Type: Improvement Reporter: Till Toenshoff We have often run into issues with environment variables inherited by subprocesses, which in certain setups cause problems. VERY recent examples are: - MESOS-6747 - MESOS-6748 The pattern for solving an inheritance that covers bases like PATH, LD_LIBRARY_PATH and DYLD_LIBRARY_PATH but at the same time carves out traps like LIBPROCESS_-related variables and maybe also MESOS_-related variables is relatively simple. {noformat} map<string, string> environment; foreachpair (const string& key, const string& value, os::environment()) { if (!strings::startsWith(key, "LIBPROCESS_") && !strings::startsWith(key, "MESOS_")) { environment.emplace(key, value); } } {noformat} But maybe we can somehow force the use of such a pattern to make this kind of bug less frequent in new code that forks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6750) Metrics on the Agent view of the Mesos web UI flickers between empty and non-empty states
[ https://issues.apache.org/jira/browse/MESOS-6750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Wu updated MESOS-6750: - Attachment: patch.diff Attached a diff to show the parts of the JS that will probably be affected. > Metrics on the Agent view of the Mesos web UI flickers between empty and > non-empty states > - > > Key: MESOS-6750 > URL: https://issues.apache.org/jira/browse/MESOS-6750 > Project: Mesos > Issue Type: Bug > Components: webui >Affects Versions: 1.0.2, 1.1.0 >Reporter: Joseph Wu >Assignee: haosdent >Priority: Minor > Attachments: patch.diff > > > When viewing a specific agent on the Mesos WebUI, the metrics panel on the > left side of the UI will alternate between having values and being empty. > This is due to two different callbacks that run: > * This one sets the metrics into the {{$scope.state}} variable: > https://github.com/apache/mesos/blob/1.1.x/src/webui/master/static/js/controllers.js#L564-L577 > * This one blows away the {{$scope.state}} in favor of a new one: > https://github.com/apache/mesos/blob/1.1.x/src/webui/master/static/js/controllers.js#L521 > The metrics callback should simply assign to a different variable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6750) Metrics on the Agent view of the Mesos web UI flickers between empty and non-empty states
Joseph Wu created MESOS-6750: Summary: Metrics on the Agent view of the Mesos web UI flickers between empty and non-empty states Key: MESOS-6750 URL: https://issues.apache.org/jira/browse/MESOS-6750 Project: Mesos Issue Type: Bug Components: webui Affects Versions: 1.1.0, 1.0.2 Reporter: Joseph Wu Assignee: haosdent Priority: Minor When viewing a specific agent on the Mesos WebUI, the metrics panel on the left side of the UI will alternate between having values and being empty. This is due to two different callbacks that run: * This one sets the metrics into the {{$scope.state}} variable: https://github.com/apache/mesos/blob/1.1.x/src/webui/master/static/js/controllers.js#L564-L577 * This one blows away the {{$scope.state}} in favor of a new one: https://github.com/apache/mesos/blob/1.1.x/src/webui/master/static/js/controllers.js#L521 The metrics callback should simply assign to a different variable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6743) Docker executor hangs forever if `docker stop` fails.
[ https://issues.apache.org/jira/browse/MESOS-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729768#comment-15729768 ] Joseph Wu commented on MESOS-6743: -- Just food for thought: A timeout and retry for {{docker stop}} is definitely something we want to add (1). Suppose, however, that {{docker stop}} is completely and forever broken. In this case, it may be best for the executor to *kill the agent* (or somehow trigger the agent's death). When the agent is restarted, it will then detect some orphan docker tasks (given {{--docker_kill_orphans}}), and attempt to kill them. If that fails, the agent will fail to recover and start flapping (restart, detect orphans, fail to kill, suicide, ...). ^ This is preferable to me, compared to (2) and (3). > Docker executor hangs forever if `docker stop` fails. > - > > Key: MESOS-6743 > URL: https://issues.apache.org/jira/browse/MESOS-6743 > Project: Mesos > Issue Type: Bug > Components: docker >Affects Versions: 1.0.1, 1.1.0 >Reporter: Alexander Rukletsov > Labels: mesosphere > > If {{docker stop}} finishes with an error status, the executor should catch > this and react instead of indefinitely waiting for {{reaped}} to return. > An interesting question is _how_ to react. Here are possible solutions. > 1. Retry {{docker stop}}. In this case it is unclear how many times to retry > and what to do if {{docker stop}} continues to fail. > 2. Unmark task as {{killed}}. This will allow frameworks to retry the kill. > However, in this case it is unclear what status updates we should send: > {{TASK_KILLING}} for every kill retry? an extra update when we failed to kill > a task? or set a specific reason in {{TASK_KILLING}}? > 3. Clean up and exit. In this case we should make sure the task container is > killed or notify the framework and the operator that the container may still > be running. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6749) Update master and agent endpoints to expose FrameworkInfo.roles.
Benjamin Mahler created MESOS-6749: -- Summary: Update master and agent endpoints to expose FrameworkInfo.roles. Key: MESOS-6749 URL: https://issues.apache.org/jira/browse/MESOS-6749 Project: Mesos Issue Type: Task Components: agent, master Reporter: Benjamin Mahler Assignee: Benjamin Bannier With the addition of the FrameworkInfo.roles field, all of the endpoints that expose the framework information need to be updated to expose this additional field. It should be the case that for the v1-style operator calls, the new field will be automatically visible thanks to the direct mapping from protobuf (we should verify this). We can track the updates to metrics separately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6676) Always re-link with scheduler during re-registration
[ https://issues.apache.org/jira/browse/MESOS-6676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-6676: --- Shepherd: Vinod Kone > Always re-link with scheduler during re-registration > > > Key: MESOS-6676 > URL: https://issues.apache.org/jira/browse/MESOS-6676 > Project: Mesos > Issue Type: Bug > Components: master >Reporter: Neil Conway >Assignee: Neil Conway > Labels: mesosphere > > Scenario: > # Framework registers with master using a non-zero {{failover_timeout}} and > is assigned a FrameworkID. > # The master sees an {{ExitedEvent}} for the master->scheduler link. This > could happen due to some transient network error, e.g., 1-way partition. The > master sends a {{FrameworkErrorMessage}} to the framework. The master marks > the framework as disconnected, but keeps the {{Framework*}} for it around in > {{frameworks.registered}}. > # The framework doesn't receive the {{FrameworkErrorMessage}} because it is > dropped by the network. > # The scheduler might receive an {{ExitedEvent}} for the scheduler -> master > link, but it ignores this anyway (see MESOS-887). > # The scheduler sees a new-master-detected event and re-registers with the > master. It does _not_ set the {{force}} flag. This means we follow [this > code > path|https://github.com/apache/mesos/blob/a6bab9015cd63121081495b8291635f386b95a92/src/master/master.cpp#L2771] > in the master, which does _not_ relink with the scheduler. > The result is that scheduler re-registration succeeds, but the master -> > scheduler link is never re-established. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-6676) Always re-link with scheduler during re-registration
[ https://issues.apache.org/jira/browse/MESOS-6676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway reassigned MESOS-6676: -- Assignee: Neil Conway > Always re-link with scheduler during re-registration > > > Key: MESOS-6676 > URL: https://issues.apache.org/jira/browse/MESOS-6676 > Project: Mesos > Issue Type: Bug > Components: master >Reporter: Neil Conway >Assignee: Neil Conway > Labels: mesosphere > > Scenario: > # Framework registers with master using a non-zero {{failover_timeout}} and > is assigned a FrameworkID. > # The master sees an {{ExitedEvent}} for the master->scheduler link. This > could happen due to some transient network error, e.g., 1-way partition. The > master sends a {{FrameworkErrorMessage}} to the framework. The master marks > the framework as disconnected, but keeps the {{Framework*}} for it around in > {{frameworks.registered}}. > # The framework doesn't receive the {{FrameworkErrorMessage}} because it is > dropped by the network. > # The scheduler might receive an {{ExitedEvent}} for the scheduler -> master > link, but it ignores this anyway (see MESOS-887). > # The scheduler sees a new-master-detected event and re-registers with the > master. It does _not_ set the {{force}} flag. This means we follow [this > code > path|https://github.com/apache/mesos/blob/a6bab9015cd63121081495b8291635f386b95a92/src/master/master.cpp#L2771] > in the master, which does _not_ relink with the scheduler. > The result is that scheduler re-registration succeeds, but the master -> > scheduler link is never re-established. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6742) Adding support for s390x architecture
[ https://issues.apache.org/jira/browse/MESOS-6742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729643#comment-15729643 ] Abhishek Dasgupta commented on MESOS-6742: -- To be added to the dev contributors list, you may raise a pull request on GitHub. Please also send a mail to the dev mailing list mentioning your ReviewBoard id and JIRA id to get contributor access. > Adding support for s390x architecture > -- > > Key: MESOS-6742 > URL: https://issues.apache.org/jira/browse/MESOS-6742 > Project: Mesos > Issue Type: Bug >Reporter: Ayanampudi Varsha > > There are 2 issues: > 1. LdcacheTest.Parse test case fails on s390x machines. > 2. From the value of the docker_registry flag in slave.cpp, amd64 images get > downloaded, causing test cases to fail on s390x with "Exec format Error" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6747) ContainerLogger runnable must not inherit the slave environment.
[ https://issues.apache.org/jira/browse/MESOS-6747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-6747: -- Shepherd: Joseph Wu Assignee: Till Toenshoff > ContainerLogger runnable must not inherit the slave environment. > > > Key: MESOS-6747 > URL: https://issues.apache.org/jira/browse/MESOS-6747 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.2.0 >Reporter: Till Toenshoff >Assignee: Till Toenshoff >Priority: Blocker > Labels: libprocess, logger > > The ContainerLogger module which forks a child process named > "mesos-logrotate-logger" does inherit the slave's environment. Specifically > things like {{LIBPROCESS_SSL_}} variables are not meant to be picked up > by that runnable and cause issues as soon as the owning user is not the same > as the one owning the agent process. > So if the agent has an SSL key setup via {{LIBPROCESS_SSL_KEY_FILE}} and if > that key-file is readable by the agent user (root) only, then the > {{mesos-logrotate-logger}} will try to read that file as well even though it > is being run as nobody - that action will then fail the runnable and hence > fail the entire task. > {noformat} > Could not load key file '/my/funky/key/path/key.key' (OpenSSL error > #33558541): error:0200100D:system library:fopen:Permission denied > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6679) Allow `network/cni` isolator to dynamically load CNI configuration
[ https://issues.apache.org/jira/browse/MESOS-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avinash Sridharan updated MESOS-6679: - Summary: Allow `network/cni` isolator to dynamically load CNI configuration (was: All `network/cni` isolator to dynamically load CNI configuration) > Allow `network/cni` isolator to dynamically load CNI configuration > -- > > Key: MESOS-6679 > URL: https://issues.apache.org/jira/browse/MESOS-6679 > Project: Mesos > Issue Type: Improvement > Components: containerization >Reporter: Avinash Sridharan >Assignee: Avinash Sridharan > Labels: mesosphere > > Currently the `network/cni` isolator learns the CNI config at startup. In > case the CNI config changes after the agent has started, the agent needs to > be restarted in order to learn any modifications to the CNI config. > We would like the `network/cni` isolator to be able to load CNI config on the > fly without a restart. To achieve this we plan to introduce a new endpoint on > the `network/cni` isolator that would allow the operator to explicitly ask > the `network/cni` isolator to reload the configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-6748) I/O switchboard should inherit agent environment variables.
[ https://issues.apache.org/jira/browse/MESOS-6748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu reassigned MESOS-6748: - Assignee: Jie Yu > I/O switchboard should inherit agent environment variables. > --- > > Key: MESOS-6748 > URL: https://issues.apache.org/jira/browse/MESOS-6748 > Project: Mesos > Issue Type: Bug >Reporter: Jie Yu >Assignee: Jie Yu > > Since it is a libexec binary that is owned by Mesos, the agent might have > some environment variables (e.g., LD_LIBRARY_PATH) that are needed by the I/O > switchboard server process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6748) I/O switchboard should inherit agent environment variables.
Jie Yu created MESOS-6748: - Summary: I/O switchboard should inherit agent environment variables. Key: MESOS-6748 URL: https://issues.apache.org/jira/browse/MESOS-6748 Project: Mesos Issue Type: Bug Reporter: Jie Yu Since it is a libexec binary that is owned by Mesos, the agent might have some environment variables (e.g., LD_LIBRARY_PATH) that are needed by the I/O switchboard server process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6747) ContainerLogger runnable must not inherit the slave environment.
[ https://issues.apache.org/jira/browse/MESOS-6747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729409#comment-15729409 ] Till Toenshoff commented on MESOS-6747: --- Here is some information on how I got to the root of this problem. The agent is setup for using SSL via the {{LIBPROCESS_SSL...}} variables. The output within the agent log whenever a task is about to get run: {noformat} [...] Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain mesos-agent[2411]: I1207 18:58:31.491056 2455 container_state_cache_impl.cpp:134] Writing container file[/var/run/mesos/isolators/com_mesosphere_MetricsIsolatorModule/containers/b8f97301-c477-49bc-87ed-1e7ea49366bf] with endpoint[198.51.100.1:37476] Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain mesos-agent[2411]: I1207 18:58:31.491147 2455 sync_util.hpp:136] Result for ticket 1297 complete, returning value. Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain mesos-agent[2411]: I1207 18:58:31.491202 2421 sync_util.hpp:83] Dispatch result obtained for ticket 1297 after waiting <=5s: register_and_update_cache Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain mesos-agent[2411]: I1207 18:58:31.534430 2424 systemd.cpp:96] Assigned child process '16285' to 'mesos_executors.slice' Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain mesos-agent[2411]: I1207 18:58:31.541784 2424 systemd.cpp:96] Assigned child process '16286' to 'mesos_executors.slice' Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain mesos-agent[2411]: I1207 18:58:31.551643 2425 linux_launcher.cpp:429] Launching container b8f97301-c477-49bc-87ed-1e7ea49366bf and cloning with namespaces CLONE_NEWNS Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain mesos-agent[2411]: I1207 18:58:31.555986 2425 systemd.cpp:96] Assigned child process '16288' to 'mesos_executors.slice' Dec 07 18:58:31 
test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain mesos-agent[2411]: I1207 18:58:31.559031 2420 containerizer.cpp:1577] Checkpointing container's forked pid 16288 to '/var/lib/mesos/slave/meta/slaves/b79c72c2-5566-4a2f-86da-668611ee4e78-S0/frameworks/b79c72c2-5566-4a2f-86da-668611ee4e78-0001/executors/aoaoaoaoaoao.c3a39772-bca6-11e6-9988-70b3d581/runs/b8f97301-c477-49bc-87ed-1e7ea49366bf/pids/forked.pid' Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain mesos-agent[2411]: WARNING: Logging before InitGoogleLogging() is written to STDERR Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain mesos-agent[2411]: I1207 18:58:31.597594 16286 openssl.cpp:424] CA directory path unspecified! NOTE: Set CA directory path with LIBPROCESS_SSL_CA_DIR= Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain mesos-agent[2411]: I1207 18:58:31.597721 16286 openssl.cpp:429] Will not verify peer certificate! Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain mesos-agent[2411]: NOTE: Set LIBPROCESS_SSL_VERIFY_CERT=1 to enable peer certificate verification Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain mesos-agent[2411]: I1207 18:58:31.597728 16286 openssl.cpp:435] Will only verify peer certificate if presented! Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain mesos-agent[2411]: NOTE: Set LIBPROCESS_SSL_REQUIRE_CERT=1 to require peer certificate verification Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain mesos-agent[2411]: WARNING: Logging before InitGoogleLogging() is written to STDERR Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain mesos-agent[2411]: I1207 18:58:31.597862 16285 openssl.cpp:424] CA directory path unspecified! 
NOTE: Set CA directory path with LIBPROCESS_SSL_CA_DIR= Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain mesos-agent[2411]: I1207 18:58:31.597965 16285 openssl.cpp:429] Will not verify peer certificate! Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain mesos-agent[2411]: NOTE: Set LIBPROCESS_SSL_VERIFY_CERT=1 to enable peer certificate verification Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain mesos-agent[2411]: I1207 18:58:31.597975 16285 openssl.cpp:435] Will only verify peer certificate if presented! Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain mesos-agent[2411]: NOTE: Set LIBPROCESS_SSL_REQUIRE_CERT=1 to require peer certificate verification Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain mesos-agent[2411]: Could not load key file '/run/dcos/pki/tls/private/mesos-slave.key' (OpenSSL error #33558541): error:0200100D:system library:fopen:Permission denied [...] {noformat} Get the agent pid: {noformat} $ ps aux |grep mesos-agent root 2412 3.0 0.9 1169092 142968 ? Sl 17:35 1:41
[jira] [Updated] (MESOS-6747) ContainerLogger runnable must not inherit the slave environment.
[ https://issues.apache.org/jira/browse/MESOS-6747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-6747: -- Affects Version/s: 1.2.0 > ContainerLogger runnable must not inherit the slave environment. > > > Key: MESOS-6747 > URL: https://issues.apache.org/jira/browse/MESOS-6747 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.2.0 >Reporter: Till Toenshoff >Priority: Blocker > Labels: libprocess, logger > > The ContainerLogger module which forks a child process named > "mesos-logrotate-logger" does inherit the slave's environment. Specifically > things like {{LIBPROCESS_SSL_}} variables are not meant to be picked up > by that runnable and cause issues as soon as the owning user is not the same > as the one owning the agent process. > So if the agent has an SSL key setup via {{LIBPROCESS_SSL_KEY_FILE}} and if > that key-file is readable by the agent user (root) only, then the > {{mesos-logrotate-logger}} will try to read that file as well even though it > is being run as nobody - that action will then fail the runnable and hence > fail the entire task. > {noformat} > Could not load key file '/my/funky/key/path/key.key' (OpenSSL error > #33558541): error:0200100D:system library:fopen:Permission denied > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6747) ContainerLogger runnable must not inherit the slave environment.
[ https://issues.apache.org/jira/browse/MESOS-6747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-6747: -- Labels: libprocess logger (was: ) > ContainerLogger runnable must not inherit the slave environment. > > > Key: MESOS-6747 > URL: https://issues.apache.org/jira/browse/MESOS-6747 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.2.0 >Reporter: Till Toenshoff >Priority: Blocker > Labels: libprocess, logger > > The ContainerLogger module which forks a child process named > "mesos-logrotate-logger" does inherit the slave's environment. Specifically > things like {{LIBPROCESS_SSL_}} variables are not meant to be picked up > by that runnable and cause issues as soon as the owning user is not the same > as the one owning the agent process. > So if the agent has an SSL key setup via {{LIBPROCESS_SSL_KEY_FILE}} and if > that key-file is readable by the agent user (root) only, then the > {{mesos-logrotate-logger}} will try to read that file as well even though it > is being run as nobody - that action will then fail the runnable and hence > fail the entire task. > {noformat} > Could not load key file '/my/funky/key/path/key.key' (OpenSSL error > #33558541): error:0200100D:system library:fopen:Permission denied > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6747) ContainerLogger runnable must not inherit the slave environment.
[ https://issues.apache.org/jira/browse/MESOS-6747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729392#comment-15729392 ] Till Toenshoff commented on MESOS-6747: --- This did not pop up earlier because originally the mesos-logrotate-logger was running in the agent context and as the agent user, hence it had no issues accessing the key-file. Now that https://issues.apache.org/jira/browse/MESOS-5856 has landed, the logger is running as a different user, causing this problem to surface. > ContainerLogger runnable must not inherit the slave environment. > > > Key: MESOS-6747 > URL: https://issues.apache.org/jira/browse/MESOS-6747 > Project: Mesos > Issue Type: Bug >Reporter: Till Toenshoff >Priority: Blocker > > The ContainerLogger module which forks a child process named > "mesos-logrotate-logger" does inherit the slave's environment. Specifically > things like {{LIBPROCESS_SSL_}} variables are not meant to be picked up > by that runnable and cause issues as soon as the owning user is not the same > as the one owning the agent process. > So if the agent has an SSL key setup via {{LIBPROCESS_SSL_KEY_FILE}} and if > that key-file is readable by the agent user (root) only, then the > {{mesos-logrotate-logger}} will try to read that file as well even though it > is being run as nobody - that action will then fail the runnable and hence > fail the entire task. > {noformat} > Could not load key file '/my/funky/key/path/key.key' (OpenSSL error > #33558541): error:0200100D:system library:fopen:Permission denied > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6747) ContainerLogger runnable must not inherit the slave environment.
Till Toenshoff created MESOS-6747: - Summary: ContainerLogger runnable must not inherit the slave environment. Key: MESOS-6747 URL: https://issues.apache.org/jira/browse/MESOS-6747 Project: Mesos Issue Type: Bug Reporter: Till Toenshoff Priority: Blocker The ContainerLogger module which forks a child process named "mesos-logrotate-logger" does inherit the slave's environment. Specifically things like {{LIBPROCESS_SSL_}} variables are not meant to be picked up by that runnable and cause issues as soon as the owning user is not the same as the one owning the agent process. So if the agent has an SSL key setup via {{LIBPROCESS_SSL_KEY_FILE}} and if that key-file is readable by the agent user (root) only, then the {{mesos-logrotate-logger}} will try to read that file as well even though it is being run as nobody - that action will then fail the runnable and hence fail the entire task. {noformat} Could not load key file '/my/funky/key/path/key.key' (OpenSSL error #33558541): error:0200100D:system library:fopen:Permission denied {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6746) IOSwitchboard doesn't properly flush data on ATTACH_CONTAINER_OUTPUT
Kevin Klues created MESOS-6746: -- Summary: IOSwitchboard doesn't properly flush data on ATTACH_CONTAINER_OUTPUT Key: MESOS-6746 URL: https://issues.apache.org/jira/browse/MESOS-6746 Project: Mesos Issue Type: Bug Reporter: Kevin Klues Assignee: Anand Mazumdar Currently we are doing a close on the write end of all connection pipes when we exit the switchboard, but we don't wait until the read is flushed before exiting. This can cause some data to get dropped since the process may exit before the reader is flushed. The current code is: {noformat} void IOSwitchboardServerProcess::finalize() { foreach (HttpConnection& connection, outputConnections) { connection.close(); } if (failure.isSome()) { promise.fail(failure->message); } else { promise.set(Nothing()); } } {noformat} We should change it to: {noformat} void IOSwitchboardServerProcess::finalize() { foreach (HttpConnection& connection, outputConnections) { connection.close(); connection.closed().await(); } if (failure.isSome()) { promise.fail(failure->message); } else { promise.set(Nothing()); } } {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6726) IOSwitchboardServerFlags adds flags for non-optional fields w/o providing a default value
[ https://issues.apache.org/jira/browse/MESOS-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729252#comment-15729252 ] Kevin Klues commented on MESOS-6726: How do I test that I've fixed it? > IOSwitchboardServerFlags adds flags for non-optional fields w/o providing a > default value > - > > Key: MESOS-6726 > URL: https://issues.apache.org/jira/browse/MESOS-6726 > Project: Mesos > Issue Type: Bug >Reporter: Benjamin Bannier >Assignee: Kevin Klues > Labels: tech-debt > > The class {{IOSwitchboardFlags}} contains a number of members of > non-{{Option}}, fundamental type (i.e., types which do not have > constructors). As customary for a {{Flags}} class, these fields are not > initialized since usually the initialization is done by calling the > correct overload of {{FlagsBase::add}} taking a default value. > The class {{IOSwitchboardFlags}} calls an {{add}} overload acting on a > non-{{Option}} member which does not take the default value. This can lead to > the members containing garbage values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-6726) IOSwitchboardServerFlags adds flags for non-optional fields w/o providing a default value
[ https://issues.apache.org/jira/browse/MESOS-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Klues reassigned MESOS-6726: -- Assignee: Kevin Klues > IOSwitchboardServerFlags adds flags for non-optional fields w/o providing a > default value > - > > Key: MESOS-6726 > URL: https://issues.apache.org/jira/browse/MESOS-6726 > Project: Mesos > Issue Type: Bug >Reporter: Benjamin Bannier >Assignee: Kevin Klues > Labels: tech-debt > > The class {{IOSwitchboardFlags}} contains a number of members of > non-{{Option}}, fundamental type (i.e., types which do not have > constructors). As customary for a {{Flags}} class, these fields are not > initialized since usually the initialization is done by calling the > correct overload of {{FlagsBase::add}} taking a default value. > The class {{IOSwitchboardFlags}} calls an {{add}} overload acting on a > non-{{Option}} member which does not take the default value. This can lead to > the members containing garbage values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6744) DefaultExecutorTest.KillTaskGroupOnTaskFailure is flaky
[ https://issues.apache.org/jira/browse/MESOS-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729231#comment-15729231 ] Anand Mazumdar commented on MESOS-6744: --- From the logs, this looks like a separate issue from what we fixed in MESOS-6576 (around status update reordering). > DefaultExecutorTest.KillTaskGroupOnTaskFailure is flaky > --- > > Key: MESOS-6744 > URL: https://issues.apache.org/jira/browse/MESOS-6744 > Project: Mesos > Issue Type: Bug > Environment: Recent Arch Linux VM, amd64. >Reporter: Neil Conway > Labels: mesosphere > > This repros consistently for me (~10 test iterations or fewer). Test log: > {noformat} > [ RUN ] DefaultExecutorTest.KillTaskGroupOnTaskFailure > I1208 03:26:47.461477 28632 cluster.cpp:160] Creating default 'local' > authorizer > I1208 03:26:47.462673 28632 replica.cpp:776] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I1208 03:26:47.463248 28650 recover.cpp:451] Starting replica recovery > I1208 03:26:47.463537 28650 recover.cpp:477] Replica is in EMPTY status > I1208 03:26:47.476333 28651 replica.cpp:673] Replica in EMPTY status received > a broadcasted recover request from __req_res__(64)@10.0.2.15:46643 > I1208 03:26:47.476618 28650 recover.cpp:197] Received a recover response from > a replica in EMPTY status > I1208 03:26:47.477242 28649 recover.cpp:568] Updating replica status to > STARTING > I1208 03:26:47.477496 28649 replica.cpp:320] Persisted replica status to > STARTING > I1208 03:26:47.477607 28649 recover.cpp:477] Replica is in STARTING status > I1208 03:26:47.478910 28653 replica.cpp:673] Replica in STARTING status > received a broadcasted recover request from __req_res__(65)@10.0.2.15:46643 > I1208 03:26:47.479385 28651 recover.cpp:197] Received a recover response from > a replica in STARTING status > I1208 03:26:47.479717 28647 recover.cpp:568] Updating replica status to VOTING > I1208 03:26:47.479996 28648 replica.cpp:320] Persisted replica status to > 
VOTING > I1208 03:26:47.480077 28648 recover.cpp:582] Successfully joined the Paxos > group > I1208 03:26:47.763380 28651 master.cpp:380] Master > 0bcb0250-4cf5-4209-92fe-ce260518b50f (archlinux.vagrant.vm) started on > 10.0.2.15:46643 > I1208 03:26:47.763463 28651 master.cpp:382] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="true" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/7lpy50/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --http_authenticators="basic" --http_framework_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" > --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" > --quiet="false" --recovery_agent_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" > --registry_max_agent_count="102400" --registry_store_timeout="100secs" > --registry_strict="false" --root_submissions="true" --user_sorter="drf" > --version="false" --webui_dir="/usr/local/share/mesos/webui" > --work_dir="/tmp/7lpy50/master" --zk_session_timeout="10secs" > I1208 03:26:47.764010 28651 master.cpp:432] Master only allowing > authenticated frameworks to register > I1208 03:26:47.764070 28651 master.cpp:446] Master only allowing > authenticated agents to register > I1208 03:26:47.764076 28651 master.cpp:459] Master only allowing > authenticated HTTP frameworks to register > I1208 03:26:47.764081 28651 credentials.hpp:37] Loading credentials for > authentication from '/tmp/7lpy50/credentials' > I1208 03:26:47.764482 28651 
master.cpp:504] Using default 'crammd5' > authenticator > I1208 03:26:47.764659 28651 http.cpp:922] Using default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I1208 03:26:47.764981 28651 http.cpp:922] Using default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I1208 03:26:47.765136 28651 http.cpp:922] Using default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I1208 03:26:47.765231 28651 master.cpp:584] Authorization enabled > I1208 03:26:47.768061 28651 master.cpp:2043] Elected as the leading master! > I1208 03:26:47.768097 28651 master.cpp:1566] Recovering from registrar > I1208 03:26:47.768766 28648 log.cpp:553] Attempting to start the writer > I1208 03:26:47.769899 28653 replica.cpp:493] Replica
[jira] [Commented] (MESOS-6614) ExamplesTest.DiskFullFramework hangs on FreeBSD
[ https://issues.apache.org/jira/browse/MESOS-6614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729202#comment-15729202 ] David Forsythe commented on MESOS-6614: --- This is actually only happening inside a jail, since it wants 127.0.0.1 as the master address. > ExamplesTest.DiskFullFramework hangs on FreeBSD > --- > > Key: MESOS-6614 > URL: https://issues.apache.org/jira/browse/MESOS-6614 > Project: Mesos > Issue Type: Bug >Reporter: David Forsythe >Assignee: David Forsythe > > ExamplesTest.DiskFullFramework hangs during gmake check on FreeBSD. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6745) MesosContainerizer/DefaultExecutorTest.KillTask/0 is flaky
[ https://issues.apache.org/jira/browse/MESOS-6745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729193#comment-15729193 ] Neil Conway commented on MESOS-6745: CC [~anandmazumdar] [~bbannier] > MesosContainerizer/DefaultExecutorTest.KillTask/0 is flaky > -- > > Key: MESOS-6745 > URL: https://issues.apache.org/jira/browse/MESOS-6745 > Project: Mesos > Issue Type: Bug > Environment: Recent Arch Linux VM >Reporter: Neil Conway > Labels: mesosphere > > This repros consistently for me (< 20 test iterations), using {{master}} as > of {{ab79d58c9df0ffb8ad35f6662541e7a5c3ea4a80}}. Test log: > {noformat} > [--] 1 test from MesosContainerizer/DefaultExecutorTest > [ RUN ] MesosContainerizer/DefaultExecutorTest.KillTask/0 > I1208 03:32:34.943745 29285 cluster.cpp:160] Creating default 'local' > authorizer > I1208 03:32:34.944695 29285 replica.cpp:776] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I1208 03:32:34.945287 29306 recover.cpp:451] Starting replica recovery > I1208 03:32:34.945431 29306 recover.cpp:477] Replica is in EMPTY status > I1208 03:32:34.946542 29300 replica.cpp:673] Replica in EMPTY status received > a broadcasted recover request from __req_res__(127)@10.0.2.15:36807 > I1208 03:32:34.946768 29301 recover.cpp:197] Received a recover response from > a replica in EMPTY status > I1208 03:32:34.947377 29299 recover.cpp:568] Updating replica status to > STARTING > I1208 03:32:34.947746 29306 replica.cpp:320] Persisted replica status to > STARTING > I1208 03:32:34.947887 29306 recover.cpp:477] Replica is in STARTING status > I1208 03:32:34.948559 29306 replica.cpp:673] Replica in STARTING status > received a broadcasted recover request from __req_res__(128)@10.0.2.15:36807 > I1208 03:32:34.948771 29299 recover.cpp:197] Received a recover response from > a replica in STARTING status > I1208 03:32:34.949097 29302 recover.cpp:568] Updating replica status to VOTING > I1208 03:32:34.949385 29306 replica.cpp:320] 
Persisted replica status to > VOTING > I1208 03:32:34.949467 29306 recover.cpp:582] Successfully joined the Paxos > group > I1208 03:32:34.971436 29301 master.cpp:380] Master > 67de7bda-9b5b-4fe9-aede-390ec9ca7290 (archlinux.vagrant.vm) started on > 10.0.2.15:36807 > I1208 03:32:34.971519 29301 master.cpp:382] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="true" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/8oMk6W/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --http_authenticators="basic" --http_framework_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" > --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" > --quiet="false" --recovery_agent_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" > --registry_max_agent_count="102400" --registry_store_timeout="100secs" > --registry_strict="false" --root_submissions="true" --user_sorter="drf" > --version="false" --webui_dir="/usr/local/share/mesos/webui" > --work_dir="/tmp/8oMk6W/master" --zk_session_timeout="10secs" > I1208 03:32:34.971824 29301 master.cpp:432] Master only allowing > authenticated frameworks to register > I1208 03:32:34.971832 29301 master.cpp:446] Master only allowing > authenticated agents to register > I1208 03:32:34.971837 29301 master.cpp:459] Master only allowing > authenticated HTTP frameworks to register > I1208 03:32:34.971842 29301 credentials.hpp:37] Loading credentials for > authentication from '/tmp/8oMk6W/credentials' > 
I1208 03:32:34.972051 29301 master.cpp:504] Using default 'crammd5' > authenticator > I1208 03:32:34.972198 29301 http.cpp:922] Using default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I1208 03:32:34.972327 29301 http.cpp:922] Using default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I1208 03:32:34.972436 29301 http.cpp:922] Using default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I1208 03:32:34.972561 29301 master.cpp:584] Authorization enabled > I1208 03:32:34.974555 29300 master.cpp:2043] Elected as the leading master! > I1208 03:32:34.974586 29300 master.cpp:1566] Recovering from registrar > I1208 03:32:34.975244 29306 log.cpp:553] Attempting to start the writer > I1208
[jira] [Created] (MESOS-6745) MesosContainerizer/DefaultExecutorTest.KillTask/0 is flaky
Neil Conway created MESOS-6745: -- Summary: MesosContainerizer/DefaultExecutorTest.KillTask/0 is flaky Key: MESOS-6745 URL: https://issues.apache.org/jira/browse/MESOS-6745 Project: Mesos Issue Type: Bug Environment: Recent Arch Linux VM Reporter: Neil Conway This repros consistently for me (< 20 test iterations), using {{master}} as of {{ab79d58c9df0ffb8ad35f6662541e7a5c3ea4a80}}. Test log: {noformat} [--] 1 test from MesosContainerizer/DefaultExecutorTest [ RUN ] MesosContainerizer/DefaultExecutorTest.KillTask/0 I1208 03:32:34.943745 29285 cluster.cpp:160] Creating default 'local' authorizer I1208 03:32:34.944695 29285 replica.cpp:776] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I1208 03:32:34.945287 29306 recover.cpp:451] Starting replica recovery I1208 03:32:34.945431 29306 recover.cpp:477] Replica is in EMPTY status I1208 03:32:34.946542 29300 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from __req_res__(127)@10.0.2.15:36807 I1208 03:32:34.946768 29301 recover.cpp:197] Received a recover response from a replica in EMPTY status I1208 03:32:34.947377 29299 recover.cpp:568] Updating replica status to STARTING I1208 03:32:34.947746 29306 replica.cpp:320] Persisted replica status to STARTING I1208 03:32:34.947887 29306 recover.cpp:477] Replica is in STARTING status I1208 03:32:34.948559 29306 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from __req_res__(128)@10.0.2.15:36807 I1208 03:32:34.948771 29299 recover.cpp:197] Received a recover response from a replica in STARTING status I1208 03:32:34.949097 29302 recover.cpp:568] Updating replica status to VOTING I1208 03:32:34.949385 29306 replica.cpp:320] Persisted replica status to VOTING I1208 03:32:34.949467 29306 recover.cpp:582] Successfully joined the Paxos group I1208 03:32:34.971436 29301 master.cpp:380] Master 67de7bda-9b5b-4fe9-aede-390ec9ca7290 (archlinux.vagrant.vm) started on 10.0.2.15:36807 I1208 
03:32:34.971519 29301 master.cpp:382] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/8oMk6W/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --quiet="false" --recovery_agent_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/8oMk6W/master" --zk_session_timeout="10secs" I1208 03:32:34.971824 29301 master.cpp:432] Master only allowing authenticated frameworks to register I1208 03:32:34.971832 29301 master.cpp:446] Master only allowing authenticated agents to register I1208 03:32:34.971837 29301 master.cpp:459] Master only allowing authenticated HTTP frameworks to register I1208 03:32:34.971842 29301 credentials.hpp:37] Loading credentials for authentication from '/tmp/8oMk6W/credentials' I1208 03:32:34.972051 29301 master.cpp:504] Using default 'crammd5' authenticator I1208 03:32:34.972198 29301 http.cpp:922] Using default 'basic' HTTP authenticator for realm 'mesos-master-readonly' I1208 03:32:34.972327 29301 http.cpp:922] Using default 'basic' HTTP authenticator for realm 'mesos-master-readwrite' I1208 
03:32:34.972436 29301 http.cpp:922] Using default 'basic' HTTP authenticator for realm 'mesos-master-scheduler' I1208 03:32:34.972561 29301 master.cpp:584] Authorization enabled I1208 03:32:34.974555 29300 master.cpp:2043] Elected as the leading master! I1208 03:32:34.974586 29300 master.cpp:1566] Recovering from registrar I1208 03:32:34.975244 29306 log.cpp:553] Attempting to start the writer I1208 03:32:34.976706 29304 replica.cpp:493] Replica received implicit promise request from __req_res__(129)@10.0.2.15:36807 with proposal 1 I1208 03:32:34.976793 29304 replica.cpp:342] Persisted promised to 1 I1208 03:32:34.977449 29300 coordinator.cpp:238] Coordinator attempting to fill missing positions I1208 03:32:34.978907 29303 replica.cpp:388] Replica received explicit promise request from __req_res__(130)@10.0.2.15:36807 for
[jira] [Commented] (MESOS-6744) DefaultExecutorTest.KillTaskGroupOnTaskFailure is flaky
[ https://issues.apache.org/jira/browse/MESOS-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729180#comment-15729180 ] Neil Conway commented on MESOS-6744: CC [~anandmazumdar] [~bbannier] > DefaultExecutorTest.KillTaskGroupOnTaskFailure is flaky > --- > > Key: MESOS-6744 > URL: https://issues.apache.org/jira/browse/MESOS-6744 > Project: Mesos > Issue Type: Bug > Environment: Recent Arch Linux VM, amd64. >Reporter: Neil Conway > Labels: mesosphere > > This repros consistently for me (~10 test iterations or fewer). Test log: > {noformat} > [ RUN ] DefaultExecutorTest.KillTaskGroupOnTaskFailure > I1208 03:26:47.461477 28632 cluster.cpp:160] Creating default 'local' > authorizer > I1208 03:26:47.462673 28632 replica.cpp:776] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I1208 03:26:47.463248 28650 recover.cpp:451] Starting replica recovery > I1208 03:26:47.463537 28650 recover.cpp:477] Replica is in EMPTY status > I1208 03:26:47.476333 28651 replica.cpp:673] Replica in EMPTY status received > a broadcasted recover request from __req_res__(64)@10.0.2.15:46643 > I1208 03:26:47.476618 28650 recover.cpp:197] Received a recover response from > a replica in EMPTY status > I1208 03:26:47.477242 28649 recover.cpp:568] Updating replica status to > STARTING > I1208 03:26:47.477496 28649 replica.cpp:320] Persisted replica status to > STARTING > I1208 03:26:47.477607 28649 recover.cpp:477] Replica is in STARTING status > I1208 03:26:47.478910 28653 replica.cpp:673] Replica in STARTING status > received a broadcasted recover request from __req_res__(65)@10.0.2.15:46643 > I1208 03:26:47.479385 28651 recover.cpp:197] Received a recover response from > a replica in STARTING status > I1208 03:26:47.479717 28647 recover.cpp:568] Updating replica status to VOTING > I1208 03:26:47.479996 28648 replica.cpp:320] Persisted replica status to > VOTING > I1208 03:26:47.480077 28648 recover.cpp:582] Successfully joined the Paxos > 
group > I1208 03:26:47.763380 28651 master.cpp:380] Master > 0bcb0250-4cf5-4209-92fe-ce260518b50f (archlinux.vagrant.vm) started on > 10.0.2.15:46643 > I1208 03:26:47.763463 28651 master.cpp:382] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="true" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/7lpy50/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --http_authenticators="basic" --http_framework_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" > --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" > --quiet="false" --recovery_agent_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" > --registry_max_agent_count="102400" --registry_store_timeout="100secs" > --registry_strict="false" --root_submissions="true" --user_sorter="drf" > --version="false" --webui_dir="/usr/local/share/mesos/webui" > --work_dir="/tmp/7lpy50/master" --zk_session_timeout="10secs" > I1208 03:26:47.764010 28651 master.cpp:432] Master only allowing > authenticated frameworks to register > I1208 03:26:47.764070 28651 master.cpp:446] Master only allowing > authenticated agents to register > I1208 03:26:47.764076 28651 master.cpp:459] Master only allowing > authenticated HTTP frameworks to register > I1208 03:26:47.764081 28651 credentials.hpp:37] Loading credentials for > authentication from '/tmp/7lpy50/credentials' > I1208 03:26:47.764482 28651 master.cpp:504] Using default 'crammd5' > authenticator > I1208 03:26:47.764659 28651 
http.cpp:922] Using default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I1208 03:26:47.764981 28651 http.cpp:922] Using default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I1208 03:26:47.765136 28651 http.cpp:922] Using default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I1208 03:26:47.765231 28651 master.cpp:584] Authorization enabled > I1208 03:26:47.768061 28651 master.cpp:2043] Elected as the leading master! > I1208 03:26:47.768097 28651 master.cpp:1566] Recovering from registrar > I1208 03:26:47.768766 28648 log.cpp:553] Attempting to start the writer > I1208 03:26:47.769899 28653 replica.cpp:493] Replica received implicit > promise request from __req_res__(66)@10.0.2.15:46643 with proposal 1 >
[jira] [Created] (MESOS-6744) DefaultExecutorTest.KillTaskGroupOnTaskFailure is flaky
Neil Conway created MESOS-6744: -- Summary: DefaultExecutorTest.KillTaskGroupOnTaskFailure is flaky Key: MESOS-6744 URL: https://issues.apache.org/jira/browse/MESOS-6744 Project: Mesos Issue Type: Bug Environment: Recent Arch Linux VM, amd64. Reporter: Neil Conway This repros consistently for me (~10 test iterations or fewer). Test log: {noformat} [ RUN ] DefaultExecutorTest.KillTaskGroupOnTaskFailure I1208 03:26:47.461477 28632 cluster.cpp:160] Creating default 'local' authorizer I1208 03:26:47.462673 28632 replica.cpp:776] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I1208 03:26:47.463248 28650 recover.cpp:451] Starting replica recovery I1208 03:26:47.463537 28650 recover.cpp:477] Replica is in EMPTY status I1208 03:26:47.476333 28651 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from __req_res__(64)@10.0.2.15:46643 I1208 03:26:47.476618 28650 recover.cpp:197] Received a recover response from a replica in EMPTY status I1208 03:26:47.477242 28649 recover.cpp:568] Updating replica status to STARTING I1208 03:26:47.477496 28649 replica.cpp:320] Persisted replica status to STARTING I1208 03:26:47.477607 28649 recover.cpp:477] Replica is in STARTING status I1208 03:26:47.478910 28653 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from __req_res__(65)@10.0.2.15:46643 I1208 03:26:47.479385 28651 recover.cpp:197] Received a recover response from a replica in STARTING status I1208 03:26:47.479717 28647 recover.cpp:568] Updating replica status to VOTING I1208 03:26:47.479996 28648 replica.cpp:320] Persisted replica status to VOTING I1208 03:26:47.480077 28648 recover.cpp:582] Successfully joined the Paxos group I1208 03:26:47.763380 28651 master.cpp:380] Master 0bcb0250-4cf5-4209-92fe-ce260518b50f (archlinux.vagrant.vm) started on 10.0.2.15:46643 I1208 03:26:47.763463 28651 master.cpp:382] Flags at startup: --acls="" --agent_ping_timeout="15secs" 
--agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/7lpy50/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --quiet="false" --recovery_agent_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/7lpy50/master" --zk_session_timeout="10secs" I1208 03:26:47.764010 28651 master.cpp:432] Master only allowing authenticated frameworks to register I1208 03:26:47.764070 28651 master.cpp:446] Master only allowing authenticated agents to register I1208 03:26:47.764076 28651 master.cpp:459] Master only allowing authenticated HTTP frameworks to register I1208 03:26:47.764081 28651 credentials.hpp:37] Loading credentials for authentication from '/tmp/7lpy50/credentials' I1208 03:26:47.764482 28651 master.cpp:504] Using default 'crammd5' authenticator I1208 03:26:47.764659 28651 http.cpp:922] Using default 'basic' HTTP authenticator for realm 'mesos-master-readonly' I1208 03:26:47.764981 28651 http.cpp:922] Using default 'basic' HTTP authenticator for realm 'mesos-master-readwrite' I1208 03:26:47.765136 28651 http.cpp:922] Using default 'basic' HTTP authenticator for realm 
'mesos-master-scheduler' I1208 03:26:47.765231 28651 master.cpp:584] Authorization enabled I1208 03:26:47.768061 28651 master.cpp:2043] Elected as the leading master! I1208 03:26:47.768097 28651 master.cpp:1566] Recovering from registrar I1208 03:26:47.768766 28648 log.cpp:553] Attempting to start the writer I1208 03:26:47.769899 28653 replica.cpp:493] Replica received implicit promise request from __req_res__(66)@10.0.2.15:46643 with proposal 1 I1208 03:26:47.769984 28653 replica.cpp:342] Persisted promised to 1 I1208 03:26:47.770534 28652 coordinator.cpp:238] Coordinator attempting to fill missing positions I1208 03:26:47.771479 28652 replica.cpp:388] Replica received explicit promise request from __req_res__(67)@10.0.2.15:46643 for position 0 with proposal 2 I1208 03:26:47.772897 28650 replica.cpp:537] Replica received write request for position 0 from
[jira] [Commented] (MESOS-6726) IOSwitchboardServerFlags adds flags for non-optional fields w/o providing a default value
[ https://issues.apache.org/jira/browse/MESOS-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729162#comment-15729162 ] Benjamin Bannier commented on MESOS-6726: - [~klueska]: Could you find some time to address these? > IOSwitchboardServerFlags adds flags for non-optional fields w/o providing a > default value > - > > Key: MESOS-6726 > URL: https://issues.apache.org/jira/browse/MESOS-6726 > Project: Mesos > Issue Type: Bug >Reporter: Benjamin Bannier > Labels: tech-debt > > The class {{IOSwitchboardFlags}} contains a number of members of > non-{{Option}}, fundamental type (i.e., types which do not have > constructors). As customary for a {{Flags}} class, these fields are not > initialized, since usually the initialization is done by calling the > correct overload of {{FlagsBase::add}} taking a default value. > The class {{IOSwitchboardFlags}} calls an {{add}} overload acting on a > non-{{Option}} member that does not take a default value. This can lead to > the members containing garbage values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
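The failure mode described above is easy to reproduce in isolation. A minimal sketch (the struct and member names below are hypothetical stand-ins, not the actual stout {{FlagsBase}} API): a fundamental-type member with no default is left uninitialized and holds garbage, while an in-class default or an Option-like wrapper keeps the unset state well-defined.

```cpp
#include <optional>

// Hypothetical stand-in for IOSwitchboardServerFlags; illustrates the
// bug pattern only, not the real stout FlagsBase API.
struct SwitchboardFlags {
  // BAD: non-Option fundamental member with no default supplied.
  // Until a flag parser assigns it, reading it yields garbage.
  int heartbeat_interval;

  // OK: a default, as the add() overload taking a default would set.
  int wait_timeout = 30;

  // OK: an Option-like member, so "unset" is representable.
  std::optional<int> buffer_size;
};
```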
[jira] [Created] (MESOS-6743) Docker executor hangs forever if `docker stop` fails.
Alexander Rukletsov created MESOS-6743: -- Summary: Docker executor hangs forever if `docker stop` fails. Key: MESOS-6743 URL: https://issues.apache.org/jira/browse/MESOS-6743 Project: Mesos Issue Type: Bug Components: docker Affects Versions: 1.0.1, 1.1.0 Reporter: Alexander Rukletsov If {{docker stop}} finishes with an error status, the executor should catch this and react instead of indefinitely waiting for {{reaped}} to return. An interesting question is _how_ to react. Here are possible solutions. 1. Retry {{docker stop}}. In this case it is unclear how many times to retry and what to do if {{docker stop}} continues to fail. 2. Unmark the task as {{killed}}. This will allow frameworks to retry the kill. However, in this case it is unclear what status updates we should send: {{TASK_KILLING}} for every kill retry? an extra update when we failed to kill a task? or set a specific reason in {{TASK_KILLING}}? 3. Clean up and exit. In this case we should make sure the task container is killed or notify the framework and the operator that the container may still be running. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
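Option (1) could at least be bounded rather than open-ended. A rough sketch, assuming the kill path can run a command and observe its exit status (the real executor drives {{docker stop}} through libprocess and {{reaped}}, so the {{std::system}}-based helper below is purely illustrative):

```cpp
#include <cstdlib>
#include <iostream>
#include <string>

// Illustrative only: run a command (e.g. "docker stop <container>") up
// to maxAttempts times, returning true on the first clean exit. The
// real executor uses libprocess futures, not std::system().
bool runWithRetries(const std::string& command, int maxAttempts)
{
  for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
    if (std::system(command.c_str()) == 0) {
      return true;  // Command succeeded; stop retrying.
    }
    std::cerr << "'" << command << "' failed (attempt "
              << attempt << " of " << maxAttempts << ")" << std::endl;
  }
  return false;  // Retries exhausted: fail loudly instead of hanging.
}
```

On a final {{false}}, the executor could then fall through to option (2) or (3) instead of waiting forever.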
[jira] [Commented] (MESOS-6742) Adding support for s390x architecture
[ https://issues.apache.org/jira/browse/MESOS-6742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15728914#comment-15728914 ] Ayanampudi Varsha commented on MESOS-6742: -- Hi, I would like to submit a code change to add s390x support to Mesos, and I need to be added to the dev contributors list to do so. Thanks, > Adding support for s390x architecture > -- > > Key: MESOS-6742 > URL: https://issues.apache.org/jira/browse/MESOS-6742 > Project: Mesos > Issue Type: Bug >Reporter: Ayanampudi Varsha > > There are two issues: > 1. The LdcacheTest.Parse test case fails on s390x machines. > 2. Because of the value of the docker_registry flag in slave.cpp, amd64 images get > downloaded, which causes test cases to fail on s390x with "Exec format error" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4812) Mesos fails to escape command health checks
[ https://issues.apache.org/jira/browse/MESOS-4812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15728864#comment-15728864 ] haosdent commented on MESOS-4812: - Thanks [~lloesche]'s help. I could reproduce by this application definition {code} { "id": "/test", "cmd": null, "cpus": 1, "mem": 128, "disk": 0, "instances": 1, "executor": null, "fetch": null, "constraints": null, "acceptedResourceRoles": null, "user": null, "container": { "docker": { "image": "nginx", "forcePullImage": false, "privileged": false, "portMappings": [ { "containerPort": 80, "protocol": "tcp" } ], "network": "BRIDGE" } }, "labels": null, "healthChecks": [ { "protocol": "COMMAND", "command": { "value": "bash -c \" commandArguments; commandArguments.push_back(docker->getPath()); commandArguments.push_back("exec"); commandArguments.push_back(containerName); if (command.shell()) { commandArguments.push_back("sh"); commandArguments.push_back("-c"); commandArguments.push_back("\""); commandArguments.push_back(command.value()); commandArguments.push_back("\""); } else { commandArguments.push_back(command.value()); foreach (const string& argument, command.arguments()) { commandArguments.push_back(argument); } } healthCheck.mutable_command()->set_shell(true); <-- Cause problem. healthCheck.mutable_command()->clear_arguments(); healthCheck.mutable_command()->set_value( strings::join(" ", commandArguments)); <-- Cause problem. 
{code} Then it would generate the health check command {code} sh -c 'docker exec mesos-ce13aa71-ebba-4361-b6dd-8d4ce57ea4ab-S9.566f6c77-a6c9-46e0-bc40-5fe95a1aa9ae sh -c " bash -c " Mesos fails to escape command health checks > --- > > Key: MESOS-4812 > URL: https://issues.apache.org/jira/browse/MESOS-4812 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 >Reporter: Lukas Loesche >Assignee: haosdent > Labels: health-check > Attachments: health_task.gif > > > As described in https://github.com/mesosphere/marathon/issues/ > I would like to run a command health check > {noformat} > /bin/bash -c " {noformat} > The health check fails because Mesos, while running the command inside double > quotes of a sh -c "" doesn't escape the double quotes in the command. > If I escape the double quotes myself the command health check succeeds. But > this would mean that the user needs intimate knowledge of how Mesos executes > his commands which can't be right. > I was told this is not a Marathon but a Mesos issue so am opening this JIRA. > I don't know if this only affects the command health check. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
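For reference, the standard way to make an arbitrary command safe inside {{sh -c}} is to single-quote it and rewrite every embedded single quote as {{'\''}}. A generic POSIX-quoting sketch (this helper is illustrative, not the fix Mesos actually shipped):

```cpp
#include <string>

// Quote a string so it survives unchanged through one level of POSIX
// shell evaluation: wrap it in single quotes, and replace each embedded
// single quote with '\'' (close quote, escaped quote, reopen quote).
std::string shellQuote(const std::string& s)
{
  std::string quoted = "'";
  for (char c : s) {
    if (c == '\'') {
      quoted += "'\\''";
    } else {
      quoted += c;
    }
  }
  quoted += "'";
  return quoted;
}
```

With this, double quotes in the user's health-check command need no special treatment at all, since they carry no meaning inside single quotes.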
[jira] [Created] (MESOS-6742) Adding support for s390x architecture
Ayanampudi Varsha created MESOS-6742: Summary: Adding support for s390x architecture Key: MESOS-6742 URL: https://issues.apache.org/jira/browse/MESOS-6742 Project: Mesos Issue Type: Bug Reporter: Ayanampudi Varsha There are two issues: 1. The LdcacheTest.Parse test case fails on s390x machines. 2. Because of the value of the docker_registry flag in slave.cpp, amd64 images get downloaded, which causes test cases to fail on s390x with "Exec format error" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6741) Authorize v1 SET_LOGGING_LEVEL call
Adam B created MESOS-6741: - Summary: Authorize v1 SET_LOGGING_LEVEL call Key: MESOS-6741 URL: https://issues.apache.org/jira/browse/MESOS-6741 Project: Mesos Issue Type: Bug Components: agent, security Reporter: Adam B We need to add authz to this call to prevent unauthorized users from cranking the log level way up to take down an agent/master. In the v0 API, we protected the /logging/toggle endpoint with a "coarse-grained" GET_ENDPOINT_WITH_PATH ACL, but that cannot be reused (directly) in the v1 API. We could add an analogous coarse-grained V1_CALL_WITH_ACTION ACL, but we're probably better off just adding a trivial SET_LOG_LEVEL Authorization::Action and ACL. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
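The "trivial action" route could look roughly like the sketch below. The enum value, the {{Acl}} struct, and the {{authorize()}} helper are all hypothetical placeholders for illustration; Mesos' real Authorization::Action and ACL protobufs differ.

```cpp
#include <set>
#include <string>

// Hypothetical model of a dedicated SET_LOG_LEVEL action: an ACL lists
// the principals allowed to perform it, and authorize() checks the
// caller against that list. Not the actual Mesos authorizer interface.
enum class Action { SET_LOG_LEVEL };

struct Acl {
  std::set<std::string> principals;  // who may perform the action
};

bool authorize(const Acl& acl, Action action, const std::string& principal)
{
  return action == Action::SET_LOG_LEVEL &&
         acl.principals.count(principal) > 0;
}
```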
[jira] [Updated] (MESOS-6739) Authorize v1 GET_CONTAINERS call
[ https://issues.apache.org/jira/browse/MESOS-6739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-6739: -- Priority: Critical (was: Major) > Authorize v1 GET_CONTAINERS call > > > Key: MESOS-6739 > URL: https://issues.apache.org/jira/browse/MESOS-6739 > Project: Mesos > Issue Type: Bug > Components: agent, security >Reporter: Adam B >Priority: Critical > Labels: security > > We need some kind of authorization for GET_CONTAINERS. > a. Coarse-grained like we already did for /containers. With this you could > say that Alice can GET_CONTAINERS for any/all containers on the cluster, but > Bob cannot see any containers' info. > b. Fine-grained authz like we have for /state and /tasks. With this you could > say that Alice can GET_CONTAINERS and see filtered results where user=alice, > but Bob can only see filtered results where user=bob. It would be nice to > port this to /containers as well if/when we add it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6741) Authorize v1 SET_LOGGING_LEVEL call
[ https://issues.apache.org/jira/browse/MESOS-6741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-6741: -- Priority: Minor (was: Major) > Authorize v1 SET_LOGGING_LEVEL call > --- > > Key: MESOS-6741 > URL: https://issues.apache.org/jira/browse/MESOS-6741 > Project: Mesos > Issue Type: Bug > Components: agent, security >Reporter: Adam B >Priority: Minor > Labels: security > > We need to add authz to this call to prevent unauthorized users from cranking > the log level way up to take down an agent/master. > In the v0 API, we protected the /logging/toggle endpoint with a > "coarse-grained" GET_ENDPOINT_WITH_PATH ACL, but that cannot be reused > (directly) in the v1 API. > We could add an analogous coarse-grained V1_CALL_WITH_ACTION ACL, but we're > probably better off just adding a trivial SET_LOG_LEVEL Authorization::Action > and ACL. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6740) Authorize v1 GET_FLAGS call
Adam B created MESOS-6740: - Summary: Authorize v1 GET_FLAGS call Key: MESOS-6740 URL: https://issues.apache.org/jira/browse/MESOS-6740 Project: Mesos Issue Type: Bug Components: agent, security Reporter: Adam B We already have a VIEW_FLAGS ACL that we use for /flags and the flags part of /state. Let's add authz to the v1 GET_FLAGS API call (on agent and master) and reuse that ACL. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6739) Authorize v1 GET_CONTAINERS call
Adam B created MESOS-6739: - Summary: Authorize v1 GET_CONTAINERS call Key: MESOS-6739 URL: https://issues.apache.org/jira/browse/MESOS-6739 Project: Mesos Issue Type: Bug Components: agent, security Reporter: Adam B We need some kind of authorization for GET_CONTAINERS. a. Coarse-grained like we already did for /containers. With this you could say that Alice can GET_CONTAINERS for any/all containers on the cluster, but Bob cannot see any containers' info. b. Fine-grained authz like we have for /state and /tasks. With this you could say that Alice can GET_CONTAINERS and see filtered results where user=alice, but Bob can only see filtered results where user=bob. It would be nice to port this to /containers as well if/when we add it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
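Option (b)'s per-user filtering could be sketched as follows. The {{Container}} struct and {{visibleTo}} helper are invented for illustration; the agent's real container state types and authorizer plumbing differ.

```cpp
#include <string>
#include <vector>

// Fine-grained filtering sketch: each caller sees only the containers
// whose user field matches their principal (Alice sees user=alice,
// Bob sees user=bob), mirroring the /state and /tasks authz model.
struct Container {
  std::string id;
  std::string user;
};

std::vector<Container> visibleTo(
    const std::vector<Container>& all, const std::string& principal)
{
  std::vector<Container> visible;
  for (const Container& container : all) {
    if (container.user == principal) {
      visible.push_back(container);
    }
  }
  return visible;
}
```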