[jira] [Commented] (MESOS-8737) Update composing containerizer tests.
[ https://issues.apache.org/jira/browse/MESOS-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490100#comment-16490100 ] Qian Zhang commented on MESOS-8737: --- commit bf901dad031e4f9d55832fa9eb38455ba9639809 Author: Andrei Budnik Date: Fri May 25 09:08:08 2018 +0800 Updated composing containerizer tests. This patch updates composing containerizer tests in order to be consistent with the unification of `destroy()` and `wait()` return types. Review: https://reviews.apache.org/r/66671/ > Update composing containerizer tests. > - > > Key: MESOS-8737 > URL: https://issues.apache.org/jira/browse/MESOS-8737 > Project: Mesos > Issue Type: Task >Reporter: Andrei Budnik >Assignee: Andrei Budnik >Priority: Major > Labels: mesosphere, test > Fix For: 1.7.0 > > > Composing containerizer tests need to be updated after changing type and > semantics of return value for `destroy()` method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (MESOS-8953) Mesos HTTP health check may fail on CoreOS 1576.5.
Gilbert Song created MESOS-8953: --- Summary: Mesos HTTP health check may fail on CoreOS 1576.5. Key: MESOS-8953 URL: https://issues.apache.org/jira/browse/MESOS-8953 Project: Mesos Issue Type: Bug Environment: CoreOS 1576.5 Reporter: Gilbert Song This may be due to the curl/libcurl version. Also, the logging should be improved: we should print out the curl command in the failure message. Here is the log: {noformat} W0516 14:58:34.680620 4512 health_checker.cpp:316] HTTP health check for task 'spark.77d25e9c-5919-11e8-8664-ba6dbafa9da4' failed: curl exited with status 48: curl: (48) An unknown option was passed in to libcurl W0516 14:58:34.680663 4512 health_checker.cpp:348] HTTP health check for task 'spark.77d25e9c-5919-11e8-8664-ba6dbafa9da4' failed 1 times consecutively I0516 14:58:34.680708 4512 executor.cpp:352] Received task health update, healthy: false {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
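The logging improvement suggested above could look like the minimal sketch below. It assumes the health checker has the curl argv at hand when it reports a failure. `joinCommand` and `formatCurlFailure` are hypothetical helpers, not functions from health_checker.cpp. The point is that the exact command line ends up in the warning, so an error such as curl's exit status 48 (unknown option) can be traced to the flag that triggered it.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not the real health_checker.cpp code): joins the
// curl argv with spaces so the exact command shows up in the log.
std::string joinCommand(const std::vector<std::string>& argv)
{
  std::ostringstream out;
  for (std::size_t i = 0; i < argv.size(); ++i) {
    if (i > 0) {
      out << ' ';
    }
    out << argv[i];
  }
  return out.str();
}

// Hypothetical formatter: reports the command next to curl's exit status
// and stderr, instead of only "curl exited with status N".
std::string formatCurlFailure(
    const std::string& taskId,
    const std::vector<std::string>& argv,
    int exitStatus,
    const std::string& stderrOutput)
{
  assert(!argv.empty());  // At least the program name must be present.

  std::ostringstream out;
  out << "HTTP health check for task '" << taskId << "' failed: '"
      << joinCommand(argv) << "' exited with status " << exitStatus
      << ": " << stderrOutput;
  return out.str();
}
```

The URL and flags in any call to these helpers would be whatever the health check was configured with. They are not reproduced from the log above.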
[jira] [Commented] (MESOS-8713) Synchronize result of `wait` and `destroy` composing c'zer methods
[ https://issues.apache.org/jira/browse/MESOS-8713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490098#comment-16490098 ] Qian Zhang commented on MESOS-8713: --- commit a4492f7767ef056bc4ea11f17d61521d547f38aa Author: Andrei Budnik Date: Fri May 25 09:08:02 2018 +0800 Ensured that `wait()` and `destroy()` return the same result. We need to return the same `ContainerTermination` result for both `wait()` and `destroy()` for a terminated container. This patch ensures that for a terminated nested container `destroy()` returns the same result as for `wait()`. Review: https://reviews.apache.org/r/66670/ > Synchronize result of `wait` and `destroy` composing c'zer methods > -- > > Key: MESOS-8713 > URL: https://issues.apache.org/jira/browse/MESOS-8713 > Project: Mesos > Issue Type: Task >Reporter: Andrei Budnik >Assignee: Andrei Budnik >Priority: Major > Fix For: 1.7.0 > > > Make sure both `wait` and `destroy` methods always return the same result. > For example, if we call `destroy` for a terminated nested container, then > composing c'zer returns `false`/`None`, while `wait` method calls `wait` for > a parent container, which might read a container termination status from the > file. Probably, we need to implement a test for this case, if it doesn't > exist. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
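The contract above can be modeled with one shared termination record per container. This toy is not the composing or Mesos containerizer. `ToyContainerizer`, `Termination` and `TerminationState` are hypothetical stand-ins, and a reference-counted record takes the place of the container's termination promise (a real implementation would hand out a libprocess future so callers can block). Because `wait()` and `destroy()` return the same record, a `destroy()` on an already-terminated container reports the recorded result instead of an empty one.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for the ContainerTermination protobuf.
struct Termination
{
  int status;
};

// Shared termination record; plays the role of the container's single
// termination promise in this toy model.
struct TerminationState
{
  bool ready = false;
  Termination termination = {0};
};

class ToyContainerizer
{
public:
  void launch(const std::string& id)
  {
    containers[id] = std::make_shared<TerminationState>();
  }

  // Returns nullptr for an unknown container (a stand-in for None).
  std::shared_ptr<const TerminationState> wait(const std::string& id) const
  {
    auto it = containers.find(id);
    if (it == containers.end()) {
      return nullptr;
    }
    return it->second;
  }

  std::shared_ptr<const TerminationState> destroy(
      const std::string& id, int status)
  {
    auto it = containers.find(id);
    if (it == containers.end()) {
      return nullptr;
    }

    TerminationState& state = *it->second;
    if (!state.ready) {
      // Only the first destroy() records a result.
      state.ready = true;
      state.termination = Termination{status};
    }

    // Later destroy() or wait() calls observe the same record.
    return it->second;
  }

private:
  std::map<std::string, std::shared_ptr<TerminationState>> containers;
};
```

A `wait()` issued before `destroy()` gets the same record with `ready == false`, which loosely corresponds to a pending future.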
[jira] [Commented] (MESOS-8736) Implement a test which ensures that `wait` and `destroy` return the same result for a terminated nested container.
[ https://issues.apache.org/jira/browse/MESOS-8736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490108#comment-16490108 ] Qian Zhang commented on MESOS-8736: --- commit d2ab700cdc76056155d60b77fe0f4a210b59723d Author: Andrei Budnik Date: Fri May 25 09:08:43 2018 +0800 Added test to verify presence of nested container termination status. This test verifies that both mesos and composing containerizers maintain the contract described in the Containerizer API regarding availability of a termination status for terminated nested containers. Review: https://reviews.apache.org/r/67135/ > Implement a test which ensures that `wait` and `destroy` return the same > result for a terminated nested container. > -- > > Key: MESOS-8736 > URL: https://issues.apache.org/jira/browse/MESOS-8736 > Project: Mesos > Issue Type: Task >Reporter: Andrei Budnik >Assignee: Andrei Budnik >Priority: Major > Labels: mesosphere, test > Fix For: 1.7.0 > > > This test verifies that both mesos and composing containerizers maintain the > contract described in the Containerizer API regarding availability of a > termination status for terminated nested containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MESOS-8734) Restore `WaitAfterDestroy` test to check termination status of a terminated nested container.
[ https://issues.apache.org/jira/browse/MESOS-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490107#comment-16490107 ] Qian Zhang commented on MESOS-8734: --- commit a1ce9ad6227d7871d8409fcfee519a63dc812a0c Author: Andrei Budnik Date: Fri May 25 09:08:37 2018 +0800 Restored `WaitAfterDestroy` test for a nested container. This test was removed in fd4b9af147, but it's important to check that after termination of a nested container, its termination status is available. This property is used by the default executor. Review: https://reviews.apache.org/r/65505/ > Restore `WaitAfterDestroy` test to check termination status of a terminated > nested container. > - > > Key: MESOS-8734 > URL: https://issues.apache.org/jira/browse/MESOS-8734 > Project: Mesos > Issue Type: Task >Reporter: Andrei Budnik >Assignee: Andrei Budnik >Priority: Major > Labels: mesosphere, test > Fix For: 1.7.0 > > > It's important to check that after termination of a nested container, its > termination status is available. This property is used by the default executor. > Note that the test uses Mesos c'zer and checks the above-mentioned property only > for Mesos c'zer. > Right now, if we remove [this section of > code|https://github.com/apache/mesos/blob/5b655ce062ff55cdefed119d97ad923aeeb2efb5/src/slave/containerizer/mesos/containerizer.cpp#L2093-L2111], > no test will be broken! > https://reviews.apache.org/r/65505 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MESOS-8740) Update description of a Containerizer interface.
[ https://issues.apache.org/jira/browse/MESOS-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490106#comment-16490106 ] Qian Zhang commented on MESOS-8740: --- commit b549c9cac15100cc0497aaa41073be16f678e14a Author: Andrei Budnik Date: Fri May 25 09:08:28 2018 +0800 Updated comments related to `wait`, `destroy` containerizer methods. This patch updates the description of the `wait()` and `destroy()` methods of the containerizer API. Review: https://reviews.apache.org/r/67130/ > Update description of a Containerizer interface. > > > Key: MESOS-8740 > URL: https://issues.apache.org/jira/browse/MESOS-8740 > Project: Mesos > Issue Type: Documentation >Reporter: Andrei Budnik >Assignee: Andrei Budnik >Priority: Major > Labels: documentaion, mesosphere > > [Containerizer > interface|https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.hpp] > must be updated with respect to the latest changes. In addition, it should > clearly describe the semantics of the `wait()` and `destroy()` methods, including > cases with nested containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MESOS-8732) Use composing containerizer by default in tests.
[ https://issues.apache.org/jira/browse/MESOS-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490103#comment-16490103 ] Qian Zhang commented on MESOS-8732: --- commit 3d156db1b02fdd607246a899d399fe5010a1c560 Author: Andrei Budnik Date: Fri May 25 09:08:13 2018 +0800 Enabled composing containerizer as a default containerizer in tests. This patch forces all tests that start an agent to use the composing containerizer. This is needed to make sure that the composing containerizer is adequately covered by tests. Review: https://reviews.apache.org/r/66817/ > Use composing containerizer by default in tests. > > > Key: MESOS-8732 > URL: https://issues.apache.org/jira/browse/MESOS-8732 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Andrei Budnik >Assignee: Andrei Budnik >Priority: Major > Labels: containerizer, mesosphere, tests > > If we assign "docker,mesos" to the `containerizers` flag for an agent, then > `ComposingContainerizer` will be used for the many tests that do not specify the > `containerizers` flag. That's the goal of this task. > I tried to do that by adding [`flags.containerizers = > "docker,mesos";`|https://github.com/apache/mesos/blob/master/src/tests/mesos.cpp#L273], > but it turned out that some tests started to hang due to paused > clocks, since the docker c'zer and docker library use libprocess clocks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MESOS-8829) Get rid of extra `containerizer->wait()` calls in tests.
[ https://issues.apache.org/jira/browse/MESOS-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490105#comment-16490105 ] Qian Zhang commented on MESOS-8829: --- commit a8eac5bb33fd23e4fc114e8941cde0a666a17e84 Author: Andrei Budnik Date: Fri May 25 09:08:18 2018 +0800 Removed extra `containerizer->wait()` calls in tests. Previously, `wait()` and `destroy()` containerizer methods returned different types, so it was necessary to call `wait()` before calling `destroy()` to get the process's exit status. Now, as both methods return `ContainerTermination`, we can get rid of redundant `wait()` calls. Review: https://reviews.apache.org/r/67128/ > Get rid of extra `containerizer->wait()` calls in tests. > > > Key: MESOS-8829 > URL: https://issues.apache.org/jira/browse/MESOS-8829 > Project: Mesos > Issue Type: Improvement >Reporter: Andrei Budnik >Assignee: Andrei Budnik >Priority: Major > Fix For: 1.7.0 > > > Since both `wait()` and `destroy()` return the same result, we can get rid of > extra `containerizer->wait()` calls in tests, e.g., > [here|https://github.com/apache/mesos/blob/c662048ae365630e3249b51102c9f7f962cc24d3/src/tests/slave_recovery_tests.cpp#L2292-L2300] > and > [there|https://github.com/apache/mesos/blob/c662048ae365630e3249b51102c9f7f962cc24d3/src/tests/cluster.cpp#L654-L668] > as well as in some other places. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MESOS-8712) Remove `destroyed` promise from `Container` struct
[ https://issues.apache.org/jira/browse/MESOS-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490094#comment-16490094 ] Qian Zhang commented on MESOS-8712: --- commit 896c593c7918dd14d44740af22d63f82c0d4813b Author: Andrei Budnik Date: Fri May 25 09:07:36 2018 +0800 Removed `destroyed` from `Container` struct in composing containerizer. Previously, we stored a `destroyed` promise for each container and used it to guarantee that `destroy()` returns a non-empty value when the destroy-in-progress stops a launch-in-progress using the next containerizer. Since `wait()` and `destroy()` return the same `ContainerTermination` value when called with the same ContainerID argument, we can remove the `destroyed` promise and add callbacks to clean up the `containers_` map instead. Moreover, we added a cleanup for terminated containers that have been recovered after the agent's restart. Review: https://reviews.apache.org/r/8/ > Remove `destroyed` promise from `Container` struct > -- > > Key: MESOS-8712 > URL: https://issues.apache.org/jira/browse/MESOS-8712 > Project: Mesos > Issue Type: Task >Reporter: Andrei Budnik >Assignee: Andrei Budnik >Priority: Major > > [`destroyed` > promise|https://github.com/apache/mesos/blob/5d8a9c1b77f96151da859b4c0c3607d22c36cd18/src/slave/containerizer/composing.cpp#L138] > is not needed anymore, since we can use the property that the `wait` and > `destroy` methods depend on the same container termination promise. This > change should affect only the composing c'zer. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MESOS-8770) Use Python3 for Mesos support scripts
[ https://issues.apache.org/jira/browse/MESOS-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489938#comment-16489938 ] Andrew Schwartzmeyer commented on MESOS-8770: - We now have copies of the scripts ported to Python 3 under `scripts/python3`, and a version checker to emit a warning to urge developers to (1) install Python 3 and (2) use/test the Python 3 scripts, because the Python 2 ones will be deprecated on July 1st, at which point we can resolve this issue. > Use Python3 for Mesos support scripts > - > > Key: MESOS-8770 > URL: https://issues.apache.org/jira/browse/MESOS-8770 > Project: Mesos > Issue Type: Task >Reporter: Benjamin Bannier >Assignee: Armand Grillet >Priority: Major > > Our Python scripts under {{support/}} currently implicitly assume that > developers have a python2 environment as their primary Python installation. > We should consider updating these scripts so that they can be used with a > python3 installation as well. There exist [some > resources|http://python-future.org/overview.html#automatic-conversion-to-py2-3-compatible-code] > on the web documenting best practices and tools for automatic rewrites which > should get us a long way. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MESOS-8770) Use Python3 for Mesos support scripts
[ https://issues.apache.org/jira/browse/MESOS-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489934#comment-16489934 ] Andrew Schwartzmeyer commented on MESOS-8770: - {noformat} commit ae4e7956a Author: Andrew Schwartzmeyer Date: Thu May 24 14:58:34 2018 -0700 Added warning to Python version checker script. After the user has installed Python 3, if they use the Python 2 scripts, we want to alert them to start using (and therefore testing) the ported Python 3 scripts. Review: https://reviews.apache.org/r/67292/ commit 960df5c48 Author: Armand Grillet Date: Thu May 24 14:58:13 2018 -0700 Ported all support scripts to Python 3. The scripts are in a temporary directory, support/python3. The scripts have been ported using 2to3, the official tool to do so. Many of these scripts require testing from the community before being used by default. The script building the virtual environment and the git hooks have been updated to use the new scripts if the environment variable `MESOSSUPPORTPYTHON` is set to `3` by the user. Review: https://reviews.apache.org/r/67059/ commit 71315eb0c Author: Armand Grillet Date: Thu May 24 14:58:09 2018 -0700 Added python3 to list of Pylint excluded files. This change ensures that pylint will not try to lint the new Python 3 support scripts if it is not run with Python 3. Having such a situation results in unexpected errors such as "Unnecessary parens after 'print' keyword". This change will not be applied in the Python 3 mesos-style. Review: https://reviews.apache.org/r/67282/ commit 2820028e9 Author: Armand Grillet Date: Thu May 24 14:58:03 2018 -0700 Updated support scripts to check for Python 3. Review: https://reviews.apache.org/r/67099/ commit 7ecb2fef2 Author: Armand Grillet Date: Thu May 24 14:58:00 2018 -0700 Added support script to check if Python >= 3.6 is available. 
Review: [https://reviews.apache.org/r/67247/] {noformat} > Use Python3 for Mesos support scripts > - > > Key: MESOS-8770 > URL: https://issues.apache.org/jira/browse/MESOS-8770 > Project: Mesos > Issue Type: Task >Reporter: Benjamin Bannier >Assignee: Armand Grillet >Priority: Major > > Our Python scripts under {{support/}} currently implicitly assume that > developers have a python2 environment as their primary Python installation. > We should consider updating these scripts so that they can be used with a > python3 installation as well. There exist [some > resources|http://python-future.org/overview.html#automatic-conversion-to-py2-3-compatible-code] > on the web documenting best practices and tools for automatic rewrites which > should get us a long way. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (MESOS-8948) Mesos Container should support changing the size of /dev/shm
[ https://issues.apache.org/jira/browse/MESOS-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lai reassigned MESOS-8948: Assignee: Jason Lai > Mesos Container should support changing the size of /dev/shm > > > Key: MESOS-8948 > URL: https://issues.apache.org/jira/browse/MESOS-8948 > Project: Mesos > Issue Type: Improvement > Components: agent, containerization >Reporter: chenmingjie >Assignee: Jason Lai >Priority: Minor > Labels: containerizer > > Some programs, like PyTorch, use a large amount of /dev/shm to create shared memory. > Docker supports the --shm-size flag to change the size of /dev/shm, but the Mesos > container does not. > The Mesos container should support changing the size of /dev/shm. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (MESOS-8952) process::await/collect n^2 performance issue
[ https://issues.apache.org/jira/browse/MESOS-8952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler reassigned MESOS-8952: -- Assignee: Benjamin Mahler > process::await/collect n^2 performance issue > > > Key: MESOS-8952 > URL: https://issues.apache.org/jira/browse/MESOS-8952 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Benjamin Mahler >Assignee: Benjamin Mahler >Priority: Major > > Due to the use of std::list::size (which appears to be linear in complexity > even with g++ and c++11), process::await and process::collect suffer from n^2 > complexity. A minimal patch to switch to std::vector shows the following > improvement: > {noformat: Title=Before} > Registered 2000 frameworks > Finished launching the tasks; Sleep 10 seconds ... > Start collecting metrics ... > v0 '/metrics/snapshot' response took 17.751689014secs > v1 'master::call::GetMetrics' application/x-protobuf response took > 17.523928635secs > v1 'master::call::GetMetrics' application/json response took 18.111901732secs > {noformat} > {noformat: Title=After} > Registered 2000 frameworks > Finished launching the tasks; Sleep 10 seconds ... > Start collecting metrics ... > v0 '/metrics/snapshot' response took 1.730948431secs > v1 'master::call::GetMetrics' application/x-protobuf response took > 1.697177667secs > v1 'master::call::GetMetrics' application/json response took 2.160314525secs > {noformat} > A follow-up to switch the interface to std::vector would be beneficial, since > we don't need any of the std::list benefits. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (MESOS-8952) process::await/collect n^2 performance issue
Benjamin Mahler created MESOS-8952: -- Summary: process::await/collect n^2 performance issue Key: MESOS-8952 URL: https://issues.apache.org/jira/browse/MESOS-8952 Project: Mesos Issue Type: Bug Components: libprocess Reporter: Benjamin Mahler Due to the use of std::list::size (which appears to be linear in complexity even with g++ and c++11), process::await and process::collect suffer from n^2 complexity. A minimal patch to switch to std::vector shows the following improvement: {noformat: Title=Before} Registered 2000 frameworks Finished launching the tasks; Sleep 10 seconds ... Start collecting metrics ... v0 '/metrics/snapshot' response took 17.751689014secs v1 'master::call::GetMetrics' application/x-protobuf response took 17.523928635secs v1 'master::call::GetMetrics' application/json response took 18.111901732secs {noformat} {noformat: Title=After} Registered 2000 frameworks Finished launching the tasks; Sleep 10 seconds ... Start collecting metrics ... v0 '/metrics/snapshot' response took 1.730948431secs v1 'master::call::GetMetrics' application/x-protobuf response took 1.697177667secs v1 'master::call::GetMetrics' application/json response took 2.160314525secs {noformat} A follow-up to switch the interface to std::vector would be beneficial, since we don't need any of the std::list benefits. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
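The sketch below shows why a per-completion `size()` check becomes quadratic. `collectAll` is a hypothetical toy, not libprocess's `collect()`/`await()`. It bumps a ready counter as each input "completes" and compares it with `inputs.size()`. If `size()` walks the list, n completions cost O(n^2) in total; with `std::vector`, each check is O(1). One likely reason `std::list::size` is still linear with g++ is libstdc++'s old (pre-C++11) ABI. That ABI keeps the O(n) implementation for compatibility, even though C++11 requires constant-time `size()`.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Toy model of a collect()-style combinator (not libprocess's code).
// Each simulated completion compares a ready counter with inputs.size().
// If size() is O(n), n completions cost O(n^2) in total; with
// std::vector, size() is O(1) and the loop stays O(n).
template <typename Container>
std::vector<int> collectAll(const Container& inputs)
{
  std::vector<int> values;
  std::size_t ready = 0;

  for (int value : inputs) {
    values.push_back(value);  // One input "completes".
    ++ready;

    // Per-completion check; this is where a linear size() hurts.
    if (ready == inputs.size()) {
      break;  // Every input is ready; a real collect() would now
              // complete its promise with the collected values.
    }
  }

  return values;
}
```

Both container types produce the same result. Only the cost of the repeated `size()` call differs, which matches the report that switching to `std::vector` was enough to fix the slowdown.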
[jira] [Commented] (MESOS-8883) Send email to dev list concerning the Python 3 update
[ https://issues.apache.org/jira/browse/MESOS-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489794#comment-16489794 ] Armand Grillet commented on MESOS-8883: --- I have prepared something: {code} Hi all, Python 2.7 will retire on January 1, 2020, and we currently use it for our support scripts, our Python bindings, and our new CLI. Starting July 1, 2018, you will need to have Python 3.6 on your computer in order to use the support scripts. It is available on all the operating systems we support and is even preinstalled on most recent Linux distributions. This change is due to issues with our support scripts on Windows and upcoming work on the CLI. If you already have Python 3.6 installed on your machine, great. Otherwise, you will see a deprecation message when you use the support scripts and the related git hooks. Don't worry: these messages and the switch to Python 3 do not change how the scripts work. We now have a patch chain ready that offers Python 3 support scripts alongside the existing ones. Maintaining a duplicated codebase is not sustainable, so we plan on deprecating the Python 2 support scripts by July 1st. We want a few weeks to test these new scripts thoroughly and to give you time to install Python 3.6, which is why we have decided to keep both codebases for a few weeks. If you want to use the new scripts, set the environment variable `MESOSSUPPORTPYTHON` to `3` and rerun the bash support script `build-virtualenv`. You will then use the Python 3 scripts by default. If you have any questions, please reply to this thread or join the Mesos Slack channel #python3. PS: This Python 3 switch does not apply to the rest of our codebase yet. As we have seen in a previous thread, some developers still rely on the Python 2 bindings, and we do not want to disrupt them.
{code} > Send email to dev list concerning the Python 3 update > - > > Key: MESOS-8883 > URL: https://issues.apache.org/jira/browse/MESOS-8883 > Project: Mesos > Issue Type: Task >Affects Versions: 1.6.0 >Reporter: Armand Grillet >Assignee: Andrew Schwartzmeyer >Priority: Major > > Let's prepare an email for the dev community to express our wish and reasons > to add a {{python3}} dependency to the Mesos codebase. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (MESOS-6823) bool/UserContainerLoggerTest.ROOT_LOGROTATE_RotateWithSwitchUserTrueOrFalse/0 is flaky
[ https://issues.apache.org/jira/browse/MESOS-6823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Peach reassigned MESOS-6823: -- Resolution: Fixed Assignee: Jie Yu Fix Version/s: 1.7.0 {noformat} commit 32d4305b87e79ed02cc686e0c29b027e31c6b3a4 Author: Jie Yu Date: Thu May 24 10:05:17 2018 -0700 Adjusted the tests that use nobody. Used `$SUDO_USER` instead because `nobody` sometimes cannot access directories under `$HOME` of the current user running the tests. Review: https://reviews.apache.org/r/67291 {noformat} > bool/UserContainerLoggerTest.ROOT_LOGROTATE_RotateWithSwitchUserTrueOrFalse/0 > is flaky > -- > > Key: MESOS-6823 > URL: https://issues.apache.org/jira/browse/MESOS-6823 > Project: Mesos > Issue Type: Bug > Environment: Ubuntu 12/14 both with/without SSL >Reporter: Anand Mazumdar >Assignee: Jie Yu >Priority: Major > Labels: flaky, flaky-test, newbie > Fix For: 1.7.0 > > > This showed up on our internal CI > {code} > [23:13:01] : [Step 11/11] [ RUN ] > bool/UserContainerLoggerTest.ROOT_LOGROTATE_RotateWithSwitchUserTrueOrFalse/0 > [23:13:01] : [Step 11/11] I1219 23:13:01.653230 25712 cluster.cpp:160] > Creating default 'local' authorizer > [23:13:01] : [Step 11/11] I1219 23:13:01.654103 25732 master.cpp:380] > Master c590a129-814c-4903-9681-e16da4da4c94 (ip-172-16-10-213.mesosphere.io) > started on 172.16.10.213:45407 > [23:13:01] : [Step 11/11] I1219 23:13:01.654119 25732 master.cpp:382] Flags > at startup: --acls="" --agent_ping_timeout="15secs" > --agent_reregister_timeout="10mins" --allocation_interval="1secs" > --allocator="HierarchicalDRF" --authenticate_agents="true" > --authenticate_frameworks="true" --authenticate_http_frameworks="true" > --authenticate_http_readonly="true" --authenticate_http_readwrite="true" > --authenticators="crammd5" --authorizers="local" > --credentials="/mnt/teamcity/temp/buildTmp/ev3icd/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > 
--http_authenticators="basic" --http_framework_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" > --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" > --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" > --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" > --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" --registry_strict="false" > --root_submissions="true" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" > --work_dir="/mnt/teamcity/temp/buildTmp/ev3icd/master" > --zk_session_timeout="10secs" > [23:13:01] : [Step 11/11] I1219 23:13:01.654248 25732 master.cpp:432] > Master only allowing authenticated frameworks to register > [23:13:01] : [Step 11/11] I1219 23:13:01.654254 25732 master.cpp:446] > Master only allowing authenticated agents to register > [23:13:01] : [Step 11/11] I1219 23:13:01.654258 25732 master.cpp:459] > Master only allowing authenticated HTTP frameworks to register > [23:13:01] : [Step 11/11] I1219 23:13:01.654261 25732 credentials.hpp:37] > Loading credentials for authentication from > '/mnt/teamcity/temp/buildTmp/ev3icd/credentials' > [23:13:01] : [Step 11/11] I1219 23:13:01.654343 25732 master.cpp:504] Using > default 'crammd5' authenticator > [23:13:01] : [Step 11/11] I1219 23:13:01.654386 25732 http.cpp:922] Using > default 'basic' HTTP authenticator for realm 'mesos-master-readonly' > [23:13:01] : [Step 11/11] I1219 23:13:01.654429 25732 http.cpp:922] Using > default 'basic' HTTP authenticator for realm 'mesos-master-readwrite' > [23:13:01] : [Step 11/11] I1219 23:13:01.654458 25732 http.cpp:922] Using > default 'basic' HTTP authenticator for realm 'mesos-master-scheduler' > [23:13:01] : [Step 11/11] I1219 23:13:01.654477 25732 master.cpp:584] > Authorization enabled > [23:13:01] 
: [Step 11/11] I1219 23:13:01.654551 25733 > whitelist_watcher.cpp:77] No whitelist given > [23:13:01] : [Step 11/11] I1219 23:13:01.654582 25730 hierarchical.cpp:149] > Initialized hierarchical allocator process > [23:13:01] : [Step 11/11] I1219 23:13:01.655076 25732 master.cpp:2046] > Elected as the leading master! > [23:13:01] : [Step 11/11] I1219 23:13:01.655086 25732 master.cpp:1568] > Recovering from registrar > [23:13:01] : [Step 11/11] I1219 23:13:01.655124 25729 registrar.cpp:329] > Recovering registrar > [23:13:01] : [Step 11/11] I1219 23:13:01.655354 25731 registrar.cpp:362] > Successfully fetched the registry (0B) in 210944n
[jira] [Created] (MESOS-8951) Flaky `AgentContainerAPITest.RecoverNestedContainer`
Andrei Budnik created MESOS-8951: Summary: Flaky `AgentContainerAPITest.RecoverNestedContainer` Key: MESOS-8951 URL: https://issues.apache.org/jira/browse/MESOS-8951 Project: Mesos Issue Type: Bug Environment: internal CI master-668030da Reporter: Andrei Budnik Attachments: AgentContainerAPITest.RecoverNestedContainer-badrun1.txt, AgentContainerAPITest.RecoverNestedContainer-badrun2.txt {code:java} [ FAILED ] ParentChildContainerTypeAndContentType/AgentContainerAPITest.RecoverNestedContainer/9, where GetParam() = (1, 0, application/json, ("cgroups/cpu,cgroups/mem,filesystem/linux,namespaces/pid", "linux", "ROOT_CGROUPS_")) (15297 ms) [ FAILED ] ParentChildContainerTypeAndContentType/AgentContainerAPITest.RecoverNestedContainer/13, where GetParam() = (1, 1, application/json, ("cgroups/cpu,cgroups/mem,filesystem/linux,namespaces/pid", "linux", "ROOT_CGROUPS_")) (15275 ms){code} {code:java} ../../src/tests/agent_container_api_tests.cpp:596 Failed to wait 15secs for wait {code} There is no call of `WAIT_CONTAINER` in agent logs. It looks like the request wasn't delivered to the agent. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
[ https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489437#comment-16489437 ] Jie Yu commented on MESOS-2199: --- https://reviews.apache.org/r/67291/ > Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser > --- > > Key: MESOS-2199 > URL: https://issues.apache.org/jira/browse/MESOS-2199 > Project: Mesos > Issue Type: Bug > Components: test >Reporter: Ian Downes >Assignee: Jie Yu >Priority: Major > Labels: disabled-test, mesosphere > > Appears that running the executor as {{nobody}} is not supported. > [~nnielsen] can you take a look? > Executor log: > {noformat} > [root@hostname build]# cat > /tmp/SlaveTest_ROOT_RunTaskWithCommandInfoWithUser_cxF1dY/slaves/20141219-005206-2081170186-60487-11862-S0/frameworks/20141219-005206-2081170186-60 > 487-11862-/executors/1/runs/latest/std* > sh: /home/idownes/workspace/mesos/build/src/mesos-executor: Permission denied > {noformat} > Test output: > {noformat} > [==] Running 1 test from 1 test case. > [--] Global test environment set-up. > [--] 1 test from SlaveTest > [ RUN ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser > ../../src/tests/slave_tests.cpp:680: Failure > Value of: statusRunning.get().state() > Actual: TASK_FAILED > Expected: TASK_RUNNING > ../../src/tests/slave_tests.cpp:682: Failure > Failed to wait 10secs for statusFinished > ../../src/tests/slave_tests.cpp:673: Failure > Actual function call count doesn't match EXPECT_CALL(sched, > statusUpdate(&driver, _))... > Expected: to be called twice >Actual: called once - unsatisfied and active > [ FAILED ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser (10641 ms) > [--] 1 test from SlaveTest (10641 ms total) > [--] Global test environment tear-down > [==] 1 test from 1 test case ran. (10658 ms total) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
[ https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489438#comment-16489438 ] Jie Yu commented on MESOS-2199: --- The solution (tip from [~jpe...@apache.org]) is to use $SUDO_USER instead of `nobody`. > Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser > --- > > Key: MESOS-2199 > URL: https://issues.apache.org/jira/browse/MESOS-2199 > Project: Mesos > Issue Type: Bug > Components: test >Reporter: Ian Downes >Assignee: Jie Yu >Priority: Major > Labels: disabled-test, mesosphere > > Appears that running the executor as {{nobody}} is not supported. > [~nnielsen] can you take a look? > Executor log: > {noformat} > [root@hostname build]# cat > /tmp/SlaveTest_ROOT_RunTaskWithCommandInfoWithUser_cxF1dY/slaves/20141219-005206-2081170186-60487-11862-S0/frameworks/20141219-005206-2081170186-60 > 487-11862-/executors/1/runs/latest/std* > sh: /home/idownes/workspace/mesos/build/src/mesos-executor: Permission denied > {noformat} > Test output: > {noformat} > [==] Running 1 test from 1 test case. > [--] Global test environment set-up. > [--] 1 test from SlaveTest > [ RUN ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser > ../../src/tests/slave_tests.cpp:680: Failure > Value of: statusRunning.get().state() > Actual: TASK_FAILED > Expected: TASK_RUNNING > ../../src/tests/slave_tests.cpp:682: Failure > Failed to wait 10secs for statusFinished > ../../src/tests/slave_tests.cpp:673: Failure > Actual function call count doesn't match EXPECT_CALL(sched, > statusUpdate(&driver, _))... > Expected: to be called twice >Actual: called once - unsatisfied and active > [ FAILED ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser (10641 ms) > [--] 1 test from SlaveTest (10641 ms total) > [--] Global test environment tear-down > [==] 1 test from 1 test case ran. (10658 ms total) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
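The `$SUDO_USER` approach could look like the minimal sketch below. It assumes the ROOT_ tests are run under `sudo`, which sets `SUDO_USER` to the invoking user's login name. `testUser()` is a hypothetical helper, not the code from r/67291. It returns an empty string when the variable is unset, so a caller can skip the test rather than fall back to `nobody`, which often cannot access the build tree under the invoking user's `$HOME`.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper, not the code from r/67291: returns the login name
// of the user who invoked sudo, or an empty string if SUDO_USER is unset
// or empty, so a ROOT_ test can skip instead of falling back to `nobody`.
std::string testUser()
{
  const char* sudoUser = std::getenv("SUDO_USER");
  if (sudoUser == nullptr || *sudoUser == '\0') {
    return std::string();
  }
  return std::string(sudoUser);
}
```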
[jira] [Assigned] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
[ https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jie Yu reassigned MESOS-2199:
-

Assignee: Jie Yu

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Assigned] (MESOS-3475) TestContainerizer should not modify global environment variables
[ https://issues.apache.org/jira/browse/MESOS-3475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrei Budnik reassigned MESOS-3475:

Assignee: Andrei Budnik

> TestContainerizer should not modify global environment variables
>
>
> Key: MESOS-3475
> URL: https://issues.apache.org/jira/browse/MESOS-3475
> Project: Mesos
> Issue Type: Bug
> Reporter: Joris Van Remoortere
> Assignee: Andrei Budnik
> Priority: Major
>
> Currently the {{TestContainerizer}} modifies the process's environment variables. Since
> these are global, other threads reading them can get inconsistent results, or even
> segfault if they happen to read while the environment is being changed.
> Synchronizing within the TestContainerizer is not sufficient, because code outside it
> can still read the environment without taking that lock. We should instead pass the
> environment variables into a fork, or set them on the command line of the executed command.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
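One way to follow the "pass the environment into a fork" suggestion, sketched with POSIX `posix_spawn` (a swapped-in mechanism, not Mesos's own subprocess utilities): the child's environment is built as an explicit `envp` array, so the parent's global `environ` is never written and concurrent readers in other threads are unaffected. The function name `runWithEnvironment` is hypothetical.

```cpp
#include <cerrno>
#include <cstdlib>
#include <spawn.h>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <vector>

// Runs `argv` (argv[0] must be an absolute path) in a child process whose
// environment is exactly `env` (entries of the form "KEY=VALUE").
// The parent's environment is left untouched. Returns the child's exit
// status, or -1 if spawning or waiting fails or the child did not exit.
int runWithEnvironment(const std::vector<std::string>& argv,
                       const std::vector<std::string>& env) {
  std::vector<char*> args;
  for (const std::string& arg : argv) {
    args.push_back(const_cast<char*>(arg.c_str()));
  }
  args.push_back(nullptr);

  std::vector<char*> envp;
  for (const std::string& entry : env) {
    envp.push_back(const_cast<char*>(entry.c_str()));
  }
  envp.push_back(nullptr);

  pid_t pid;
  // posix_spawn hands `envp` to the child only; no setenv() in the parent.
  if (posix_spawn(&pid, args[0], nullptr, nullptr, args.data(), envp.data()) != 0) {
    return -1;
  }

  int status = 0;
  while (waitpid(pid, &status, 0) == -1) {
    if (errno != EINTR) {
      return -1;
    }
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because nothing in the parent mutates `environ`, there is no need for the (insufficient) synchronization inside the containerizer that the issue describes.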
[jira] [Created] (MESOS-8950) Framework operations can make resources unallocatable
Benjamin Bannier created MESOS-8950:
---

Summary: Framework operations can make resources unallocatable
Key: MESOS-8950
URL: https://issues.apache.org/jira/browse/MESOS-8950
Project: Mesos
Issue Type: Bug
Components: allocation, master
Reporter: Benjamin Bannier

The allocator does not offer {{cpus}} or {{mem}} resources smaller than certain fixed minimum sizes. For framework operations, we do not enforce the same minimum size constraints, which can lead to resources becoming unavailable for any future allocations. This behavior seems most pronounced when a framework can register in many roles.

Example:
* A single multirole framework which can register in any role, e.g., any role in a certain role subhierarchy.
* A single agent with {{cpus:1.5*MIN_CPUS}} and {{mem:1.5*MIN_MEM}}.
* The framework is offered all resources and performs a {{RESERVE}} on {{cpus:0.5*MIN_CPUS}}. It then changes its role.
* The framework does the same in the next two offer cycles. All {{cpus}} are then reserved for different roles in unallocatable amounts.
* The last offer will be just for {{mem:1.5*MIN_MEM}}; the framework reserves 0.6 of these to another role. This fragments the {{mem}} resources as well.
* No allocatable resources are left in the cluster.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
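A minimal sketch of the missing constraint, assuming a hypothetical master-side check (`reserveLeavesNoFragments` is not a Mesos function) that rejects a {{RESERVE}} whose reserved part, or the remainder it leaves behind, is nonzero but below the allocator's minimum. Amounts are integers (milli-cpus and MB) to avoid floating-point comparisons; the minimums of 0.01 cpus and 32 MB are illustrative stand-ins for `MIN_CPUS` and `MIN_MEM`.

```cpp
#include <cstdint>

// Illustrative minimums standing in for the allocator's MIN_CPUS / MIN_MEM.
constexpr int64_t kMinMilliCpus = 10;  // 0.01 cpus
constexpr int64_t kMinMemMb = 32;

// An amount is harmless if it is zero or large enough to be offered.
bool zeroOrAllocatable(int64_t amount, int64_t minimum) {
  return amount == 0 || amount >= minimum;
}

// Hypothetical validation for reserving `reserved` out of `available` units
// of one scalar resource: neither the newly reserved part nor the part left
// unreserved may end up as a nonzero fragment below `minimum`.
bool reserveLeavesNoFragments(int64_t available, int64_t reserved, int64_t minimum) {
  if (reserved < 0 || reserved > available) {
    return false;  // Malformed operation.
  }
  return zeroOrAllocatable(reserved, minimum) &&
         zeroOrAllocatable(available - reserved, minimum);
}
```

Under this rule, the first step of the example, reserving {{cpus:0.5*MIN_CPUS}} (5 milli-cpus) out of 15, would already be rejected. Whether the master should reject such operations, round them, or only warn is a design question this sketch does not settle.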
[jira] [Created] (MESOS-8949) DefaultExecutorCheckTest.CommandCheckSharesWorkDirWithTask is flaky
Benjamin Bannier created MESOS-8949:
---

Summary: DefaultExecutorCheckTest.CommandCheckSharesWorkDirWithTask is flaky
Key: MESOS-8949
URL: https://issues.apache.org/jira/browse/MESOS-8949
Project: Mesos
Issue Type: Bug
Components: test
Reporter: Benjamin Bannier

I see {{DefaultExecutorCheckTest.CommandCheckSharesWorkDirWithTask}} fail pretty quickly when it is run under high system load. This seems only peripherally related to, e.g., MESOS-7500, as the binaries I am using were produced by a CMake build, which does not use libtool wrappers at all.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)