[jira] [Commented] (MESOS-8737) Update composing containerizer tests.

2018-05-24 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490100#comment-16490100
 ] 

Qian Zhang commented on MESOS-8737:
---

commit bf901dad031e4f9d55832fa9eb38455ba9639809
Author: Andrei Budnik 
Date: Fri May 25 09:08:08 2018 +0800

Updated composing containerizer tests.
 
 This patch updates composing containerizer tests in order to be
 consistent with the unification of `destroy()` and `wait()` return
 types.
 
 Review: https://reviews.apache.org/r/66671/

> Update composing containerizer tests.
> -
>
> Key: MESOS-8737
> URL: https://issues.apache.org/jira/browse/MESOS-8737
> Project: Mesos
>  Issue Type: Task
>Reporter: Andrei Budnik
>Assignee: Andrei Budnik
>Priority: Major
>  Labels: mesosphere, test
> Fix For: 1.7.0
>
>
> Composing containerizer tests need to be updated after changing the type and 
> semantics of the return value of the `destroy()` method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (MESOS-8953) Mesos HTTP health check may fail on CoreOS 1576.5.

2018-05-24 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-8953:
---

 Summary: Mesos HTTP health check may fail on CoreOS 1576.5.
 Key: MESOS-8953
 URL: https://issues.apache.org/jira/browse/MESOS-8953
 Project: Mesos
  Issue Type: Bug
 Environment: CoreOS 1576.5
Reporter: Gilbert Song


This may be due to the curl/libcurl version. Also, the logging should be improved: 
we should print out the curl command in the failure message.

Here is the log:
{noformat}
W0516 14:58:34.680620  4512 health_checker.cpp:316] HTTP health check for task 
'spark.77d25e9c-5919-11e8-8664-ba6dbafa9da4' failed: curl exited with status 
48: curl: (48) An unknown option was passed in to libcurl
W0516 14:58:34.680663  4512 health_checker.cpp:348] HTTP health check for task 
'spark.77d25e9c-5919-11e8-8664-ba6dbafa9da4' failed 1 times consecutively
I0516 14:58:34.680708  4512 executor.cpp:352] Received task health update, 
healthy: false
{noformat}
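The logging improvement proposed above could look roughly like the following minimal Python sketch. It assumes the check runs as an external command. `run_http_health_check` is a hypothetical helper and is not part of Mesos's health checker. On failure it reports the full, shell-quoted command line along with the exit status and stderr, so a failure such as curl's status 48 can be reproduced by hand. It requires Python 3.8+ for `shlex.join`.

```python
import shlex
import subprocess
import sys


def run_http_health_check(argv, timeout=10):
    """Run a health-check command (hypothetically, a curl invocation).

    Returns (healthy, message). On failure, the message includes the full,
    shell-quoted command line so it can be re-run by hand.
    """
    command = shlex.join(argv)
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError:
        return False, "command not found: " + command
    except subprocess.TimeoutExpired:
        return False, "command timed out after %ss: %s" % (timeout, command)

    if result.returncode != 0:
        return False, "command [%s] exited with status %d: %s" % (
            command, result.returncode, result.stderr.strip())
    return True, "healthy"


if __name__ == "__main__":
    # Simulate curl's exit status 48 with a stand-in command.
    print(run_http_health_check(
        [sys.executable, "-c", "import sys; sys.exit(48)"]))
```

Including the exact command line makes it easy to tell whether a failure comes from the options passed to curl or from the curl/libcurl build on the host.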





[jira] [Commented] (MESOS-8713) Synchronize result of `wait` and `destroy` composing c'zer methods

2018-05-24 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490098#comment-16490098
 ] 

Qian Zhang commented on MESOS-8713:
---

commit a4492f7767ef056bc4ea11f17d61521d547f38aa
Author: Andrei Budnik 
Date: Fri May 25 09:08:02 2018 +0800

Ensured that `wait()` and `destroy()` return the same result.
 
 We need to return the same `ContainerTermination` result for both
 `wait()` and `destroy()` for a terminated container. This patch
 ensures that for a terminated nested container `destroy()` returns
 the same result as for `wait()`.
 
 Review: https://reviews.apache.org/r/66670/

> Synchronize result of `wait` and `destroy` composing c'zer methods
> --
>
> Key: MESOS-8713
> URL: https://issues.apache.org/jira/browse/MESOS-8713
> Project: Mesos
>  Issue Type: Task
>Reporter: Andrei Budnik
>Assignee: Andrei Budnik
>Priority: Major
> Fix For: 1.7.0
>
>
> Make sure both `wait` and `destroy` methods always return the same result.
> For example, if we call `destroy` for a terminated nested container, then the 
> composing c'zer returns `false`/`None`, while the `wait` method calls `wait` for 
> the parent container, which might read the container termination status from a 
> file. We probably need to implement a test for this case if one doesn't 
> exist.
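The contract described above can be illustrated with a toy Python model. The model assumes a single termination future per container. Because `wait()` and `destroy()` hand out the same future, calling `destroy()` on an already-terminated nested container reports the same status as `wait()` instead of an empty result. The class and its methods are hypothetical. The real containerizer is C++, and in it both methods now return a `ContainerTermination` result.

```python
from concurrent.futures import Future


class ToyComposingContainerizer:
    """Toy model, not the Mesos C++ API: one termination future per
    container, shared by wait() and destroy(), so both always agree."""

    def __init__(self):
        self._terminations = {}  # container id -> Future of exit status

    def launch(self, container_id):
        self._terminations[container_id] = Future()

    def on_exit(self, container_id, status):
        # The container terminated by itself (e.g. a nested task finished).
        self._terminations[container_id].set_result(status)

    def wait(self, container_id):
        # None stands in for an empty result for an unknown container.
        return self._terminations.get(container_id)

    def destroy(self, container_id):
        termination = self._terminations.get(container_id)
        if termination is None:
            return None
        if not termination.done():
            termination.set_result("killed")  # pretend the kill succeeded
        # Same future that wait() returns, even if the container had
        # already terminated before destroy() was called.
        return termination
```

In the scenario from the description, `destroy()` on a nested container that already exited with status `0` yields `0`, the same value `wait()` yields.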





[jira] [Commented] (MESOS-8736) Implement a test which ensures that `wait` and `destroy` return the same result for a terminated nested container.

2018-05-24 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490108#comment-16490108
 ] 

Qian Zhang commented on MESOS-8736:
---

commit d2ab700cdc76056155d60b77fe0f4a210b59723d
Author: Andrei Budnik 
Date: Fri May 25 09:08:43 2018 +0800

Added test to verify presence of nested container termination status.
 
 This test verifies that both mesos and composing containerizers
 maintain the contract described in the Containerizer API regarding
 availability of a termination status for terminated nested containers.
 
 Review: https://reviews.apache.org/r/67135/

> Implement a test which ensures that `wait` and `destroy` return the same 
> result for a terminated nested container.
> --
>
> Key: MESOS-8736
> URL: https://issues.apache.org/jira/browse/MESOS-8736
> Project: Mesos
>  Issue Type: Task
>Reporter: Andrei Budnik
>Assignee: Andrei Budnik
>Priority: Major
>  Labels: mesosphere, test
> Fix For: 1.7.0
>
>
> This test verifies that both mesos and composing containerizers maintain the 
> contract described in the Containerizer API regarding availability of a 
> termination status for terminated nested containers.





[jira] [Commented] (MESOS-8734) Restore `WaitAfterDestroy` test to check termination status of a terminated nested container.

2018-05-24 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490107#comment-16490107
 ] 

Qian Zhang commented on MESOS-8734:
---

commit a1ce9ad6227d7871d8409fcfee519a63dc812a0c
Author: Andrei Budnik 
Date: Fri May 25 09:08:37 2018 +0800

Restored `WaitAfterDestroy` test for a nested container.
 
 This test was removed in fd4b9af147, but it's important to check that
 after termination of a nested container, its termination status is
 available. This property is used by the default executor.
 
 Review: https://reviews.apache.org/r/65505/

> Restore `WaitAfterDestroy` test to check termination status of a terminated 
> nested container.
> -
>
> Key: MESOS-8734
> URL: https://issues.apache.org/jira/browse/MESOS-8734
> Project: Mesos
>  Issue Type: Task
>Reporter: Andrei Budnik
>Assignee: Andrei Budnik
>Priority: Major
>  Labels: mesosphere, test
> Fix For: 1.7.0
>
>
> It's important to check that after termination of a nested container, its 
> termination status is available. This property is used by the default executor.
> Note that the test uses the Mesos c'zer and checks the above-mentioned property 
> only for the Mesos c'zer.
> Right now, if we remove [this section of 
> code|https://github.com/apache/mesos/blob/5b655ce062ff55cdefed119d97ad923aeeb2efb5/src/slave/containerizer/mesos/containerizer.cpp#L2093-L2111],
>  no test will be broken!
> https://reviews.apache.org/r/65505





[jira] [Commented] (MESOS-8740) Update description of a Containerizer interface.

2018-05-24 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490106#comment-16490106
 ] 

Qian Zhang commented on MESOS-8740:
---

commit b549c9cac15100cc0497aaa41073be16f678e14a
Author: Andrei Budnik 
Date: Fri May 25 09:08:28 2018 +0800

Updated comments related to `wait`, `destroy` containerizer methods.
 
 This patch updates the description of the `wait()` and `destroy()`
 methods of the containerizer API.
 
 Review: https://reviews.apache.org/r/67130/

> Update description of a Containerizer interface.
> 
>
> Key: MESOS-8740
> URL: https://issues.apache.org/jira/browse/MESOS-8740
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Andrei Budnik
>Assignee: Andrei Budnik
>Priority: Major
>  Labels: documentaion, mesosphere
>
> [Containerizer 
> interface|https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.hpp]
>  must be updated with respect to the latest changes. In addition, it should 
> clearly describe the semantics of the `wait()` and `destroy()` methods, 
> including cases with nested containers.





[jira] [Commented] (MESOS-8732) Use composing containerizer by default in tests.

2018-05-24 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490103#comment-16490103
 ] 

Qian Zhang commented on MESOS-8732:
---

commit 3d156db1b02fdd607246a899d399fe5010a1c560
Author: Andrei Budnik 
Date: Fri May 25 09:08:13 2018 +0800

Enabled composing containerizer as a default containerizer in tests.
 
 This patch forces all tests that start an agent to use the composing
 containerizer. This is needed to make sure that the composing containerizer
 is adequately covered by tests.
 
 Review: https://reviews.apache.org/r/66817/

> Use composing containerizer by default in tests.
> 
>
> Key: MESOS-8732
> URL: https://issues.apache.org/jira/browse/MESOS-8732
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Andrei Budnik
>Assignee: Andrei Budnik
>Priority: Major
>  Labels: containerizer, mesosphere, tests
>
> If we assign "docker,mesos" to the `containerizers` flag for an agent, then 
> `ComposingContainerizer` will be used for the many tests that do not specify the 
> `containerizers` flag. That's the goal of this task.
> I tried to do that by adding [`flags.containerizers = 
> "docker,mesos";`|https://github.com/apache/mesos/blob/master/src/tests/mesos.cpp#L273],
>  but it turned out that some tests started to hang due to paused 
> clocks, because the docker c'zer and docker library use libprocess clocks.
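The following toy Python model shows why paused clocks can hang such tests. It is a loose analogy and not the libprocess `Clock` API. Timers scheduled against a paused virtual clock fire only when the test advances the clock explicitly. Code that relies on them, such as a hypothetical periodic poll in the docker c'zer, therefore waits indefinitely.

```python
import heapq


class ToyClock:
    """Toy virtual clock: scheduled timers fire only when time moves forward."""

    def __init__(self):
        self.now = 0.0
        self.paused = False
        self._timers = []  # heap of (deadline, sequence number, callback)
        self._seq = 0

    def delay(self, seconds, callback):
        heapq.heappush(self._timers, (self.now + seconds, self._seq, callback))
        self._seq += 1

    def tick(self, real_elapsed):
        # Wall-clock time passing only moves a clock that is not paused.
        if not self.paused:
            self._move_to(self.now + real_elapsed)

    def advance(self, seconds):
        # A test must explicitly advance a paused clock.
        self._move_to(self.now + seconds)

    def _move_to(self, t):
        self.now = t
        while self._timers and self._timers[0][0] <= self.now:
            _, _, callback = heapq.heappop(self._timers)
            callback()
```

With `paused = True`, any amount of `tick()` leaves a 5-second timer pending. Only `advance(5)` fires it, which is the situation a test hits when code it does not control schedules timers on the paused clock.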





[jira] [Commented] (MESOS-8829) Get rid of extra `containerizer->wait()` calls in tests.

2018-05-24 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490105#comment-16490105
 ] 

Qian Zhang commented on MESOS-8829:
---

commit a8eac5bb33fd23e4fc114e8941cde0a666a17e84
Author: Andrei Budnik 
Date: Fri May 25 09:08:18 2018 +0800

Removed extra `containerizer->wait()` calls in tests.
 
 Previously, `wait()` and `destroy()` containerizer methods returned
 different types, so it was necessary to call `wait()` before calling
 `destroy()` to get the process's exit status. Now, as both methods
 return `ContainerTermination`, we can get rid of redundant `wait()`
 calls.
 
 Review: https://reviews.apache.org/r/67128/

> Get rid of extra `containerizer->wait()` calls in tests.
> 
>
> Key: MESOS-8829
> URL: https://issues.apache.org/jira/browse/MESOS-8829
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Andrei Budnik
>Assignee: Andrei Budnik
>Priority: Major
> Fix For: 1.7.0
>
>
> Since both `wait()` and `destroy()` return the same result, we can get rid of 
> the extra `containerizer->wait()` calls in tests, e.g. 
> [here|https://github.com/apache/mesos/blob/c662048ae365630e3249b51102c9f7f962cc24d3/src/tests/slave_recovery_tests.cpp#L2292-L2300]
>  and 
> [there|https://github.com/apache/mesos/blob/c662048ae365630e3249b51102c9f7f962cc24d3/src/tests/cluster.cpp#L654-L668]
>  as well as in some other places.





[jira] [Commented] (MESOS-8712) Remove `destroyed` promise from `Container` struct

2018-05-24 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490094#comment-16490094
 ] 

Qian Zhang commented on MESOS-8712:
---

commit 896c593c7918dd14d44740af22d63f82c0d4813b
Author: Andrei Budnik 
Date: Fri May 25 09:07:36 2018 +0800

Removed `destroyed` from `Container` struct in composing containerizer.
 
 Previously, we stored a `destroyed` promise for each container and used
 it to guarantee that `destroy()` returns a non-empty value when the
 destroy-in-progress stops a launch-in-progress using the next
 containerizer. Since `wait()` and `destroy()` return the same
 `ContainerTermination` value when called with the same ContainerID
 argument, we can remove the `destroyed` promise and add callbacks to
 clean up the `containers_` map instead.
 
 Moreover, we added a cleanup for terminated containers that have
 been recovered after the agent's restart.
 
 Review: https://reviews.apache.org/r/8/

> Remove `destroyed` promise from `Container` struct
> --
>
> Key: MESOS-8712
> URL: https://issues.apache.org/jira/browse/MESOS-8712
> Project: Mesos
>  Issue Type: Task
>Reporter: Andrei Budnik
>Assignee: Andrei Budnik
>Priority: Major
>
> [`destroyed` 
> promise|https://github.com/apache/mesos/blob/5d8a9c1b77f96151da859b4c0c3607d22c36cd18/src/slave/containerizer/composing.cpp#L138]
>  is not needed anymore, since we can use the property that `wait` and 
> `destroy` methods depend on the same container termination promise. This 
> change should affect only composing c'zer.
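Here is a hypothetical Python sketch of the cleanup approach described in the commit above. It is a toy model and not the composing containerizer's C++ code. Instead of keeping a separate `destroyed` promise, a callback attached to each container's termination future removes the container from the map once it terminates. A container recovered in an already-terminated state goes through the same path.

```python
from concurrent.futures import Future


class ToyContainerRegistry:
    """Hypothetical model: no per-container `destroyed` promise; a callback
    on the termination future erases the map entry instead."""

    def __init__(self):
        self.containers = {}  # container id -> termination Future

    def track(self, container_id):
        termination = Future()
        self.containers[container_id] = termination
        # Runs when the future completes, or immediately if already done.
        termination.add_done_callback(
            lambda _future, cid=container_id: self.containers.pop(cid, None))
        return termination

    def recover(self, container_id, exit_status=None):
        # A container recovered after an agent restart may already have
        # terminated; completing its future triggers the same cleanup.
        termination = self.track(container_id)
        if exit_status is not None:
            termination.set_result(exit_status)
        return termination
```

Tying cleanup to the termination future means there is only one source of truth for "this container is gone", which is the property the description relies on.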





[jira] [Commented] (MESOS-8770) Use Python3 for Mesos support scripts

2018-05-24 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489938#comment-16489938
 ] 

Andrew Schwartzmeyer commented on MESOS-8770:
-

We now have copies of the scripts ported to Python 3 under `support/python3`, 
and a version checker that emits a warning urging developers to (1) install 
Python 3 and (2) use and test the Python 3 scripts. The Python 2 scripts will 
be deprecated on July 1st, at which point we can resolve this issue.
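The version checker mentioned above could be sketched as follows. This is a hypothetical helper; the actual Mesos checker script may differ. It returns a warning message when the interpreter is older than Python 3.6 instead of failing, and it avoids Python 3-only syntax so it can also run under Python 2.

```python
import sys


def version_warning(version_info, minimum=(3, 6)):
    """Return a warning string if `version_info` is older than `minimum`,
    or None if the interpreter is recent enough."""
    if tuple(version_info[:2]) >= tuple(minimum):
        return None
    return (
        "Warning: running with Python %d.%d. Please install Python >= %d.%d "
        "and use the Python 3 support scripts; the Python 2 versions are "
        "being deprecated." % (version_info[0], version_info[1],
                               minimum[0], minimum[1]))


if __name__ == "__main__":
    message = version_warning(sys.version_info)
    if message:
        sys.stderr.write(message + "\n")
```

Returning the message rather than printing it keeps the check easy to test and lets each caller decide whether to warn or abort.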

> Use Python3 for Mesos support scripts
> -
>
> Key: MESOS-8770
> URL: https://issues.apache.org/jira/browse/MESOS-8770
> Project: Mesos
>  Issue Type: Task
>Reporter: Benjamin Bannier
>Assignee: Armand Grillet
>Priority: Major
>
> Our Python scripts under {{support/}} currently implicitly assume that 
> developers have a python2 environment as their primary Python installation.
> We should consider updating these scripts so that they can be used with a 
> python3 installation as well. There exist [some 
> resources|http://python-future.org/overview.html#automatic-conversion-to-py2-3-compatible-code]
>  on the web documenting best practices and tools for automatic rewrites which 
> should get us a long way.





[jira] [Commented] (MESOS-8770) Use Python3 for Mesos support scripts

2018-05-24 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489934#comment-16489934
 ] 

Andrew Schwartzmeyer commented on MESOS-8770:
-

{noformat}
commit ae4e7956a
Author: Andrew Schwartzmeyer 
Date:   Thu May 24 14:58:34 2018 -0700

    Added warning to Python version checker script.

    After the user has installed Python 3, if they use the Python 2
    scripts, we want to alert them to start using (and therefore testing)
    the ported Python 3 scripts.

    Review: https://reviews.apache.org/r/67292/

commit 960df5c48
Author: Armand Grillet 
Date:   Thu May 24 14:58:13 2018 -0700

    Ported all support scripts to Python 3.

    The scripts are in a temporary directory, support/python3.

    The scripts have been ported using 2to3, the official tool to do so.
    Many of these scripts require testing from the community before being
    used by default.

    The script building the virtual environment and the git hooks have
    been updated to use the new scripts if the environment variable
    `MESOSSUPPORTPYTHON` is set to `3` by the user.

    Review: https://reviews.apache.org/r/67059/

commit 71315eb0c
Author: Armand Grillet 
Date:   Thu May 24 14:58:09 2018 -0700

    Added python3 to list of Pylint excluded files.

    This change ensures that pylint will not try to lint the new Python 3
    support scripts if it is not run with Python 3. Having such a situation
    results in unexpected errors such as "Unnecessary parens after 'print'
    keyword". This change will not be applied in the Python 3 mesos-style.

    Review: https://reviews.apache.org/r/67282/

commit 2820028e9
Author: Armand Grillet 
Date:   Thu May 24 14:58:03 2018 -0700

    Updated support scripts to check for Python 3.

    Review: https://reviews.apache.org/r/67099/

commit 7ecb2fef2
Author: Armand Grillet 
Date:   Thu May 24 14:58:00 2018 -0700

    Added support script to check if Python >= 3.6 is available.

    Review: https://reviews.apache.org/r/67247/
{noformat}
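The opt-in described in commit 960df5c48 uses the `MESOSSUPPORTPYTHON` environment variable. The selection rule can be sketched in Python as follows. In Mesos the choice is actually made by the bash `build-virtualenv` script and the git hooks; `support_scripts_dir` is a hypothetical helper that only mirrors the rule stated in the commit message.

```python
import os


def support_scripts_dir(root, environ=None):
    """Use the Python 3 ports in support/python3 only when
    MESOSSUPPORTPYTHON is set to "3"; otherwise use support/."""
    if environ is None:
        environ = os.environ
    if environ.get("MESOSSUPPORTPYTHON") == "3":
        return os.path.join(root, "support", "python3")
    return os.path.join(root, "support")
```

Any other value, or an unset variable, keeps the existing Python 2 scripts, so the switch is opt-in until the deprecation date.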







[jira] [Assigned] (MESOS-8948) Mesos Container should support changing the size of /dev/shm

2018-05-24 Thread Jason Lai (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lai reassigned MESOS-8948:


Assignee: Jason Lai

> Mesos Container should support changing the size of /dev/shm
> 
>
> Key: MESOS-8948
> URL: https://issues.apache.org/jira/browse/MESOS-8948
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent, containerization
>Reporter: chenmingjie
>Assignee: Jason Lai
>Priority: Minor
>  Labels: containerizer
>
> Some programs, like PyTorch, use a large amount of /dev/shm to create shared 
> memory. Docker supports the --shm-size flag to change the size of /dev/shm, 
> but the Mesos container does not. The Mesos container should support 
> changing the size of /dev/shm.
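As a related aid, a task can check how much shared memory it actually received before starting a shared-memory-hungry workload such as a PyTorch data loader. The minimal Python sketch below assumes a POSIX host where /dev/shm is a mounted filesystem, typically tmpfs. It only reads the size; it does not change it.

```python
import os


def shm_size_bytes(path="/dev/shm"):
    """Return the total size, in bytes, of the filesystem mounted at `path`."""
    stats = os.statvfs(path)
    # f_blocks is counted in units of the fundamental block size f_frsize.
    return stats.f_frsize * stats.f_blocks


if __name__ == "__main__":
    print("%s: %d bytes" % ("/dev/shm", shm_size_bytes()))
```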





[jira] [Assigned] (MESOS-8952) process::await/collect n^2 performance issue

2018-05-24 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler reassigned MESOS-8952:
--

Assignee: Benjamin Mahler

> process::await/collect n^2 performance issue
> 
>
> Key: MESOS-8952
> URL: https://issues.apache.org/jira/browse/MESOS-8952
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: Benjamin Mahler
>Assignee: Benjamin Mahler
>Priority: Major
>
> Due to the use of std::list::size (which appears to be linear in complexity 
> even with g++ and c++11), process::await and process::collect suffer from n^2 
> complexity. A minimal patch to switch to std::vector shows the following 
> improvement:
> {noformat: Title=Before}
> Registered 2000 frameworks
> Finished launching the tasks; Sleep 10 seconds ...
> Start collecting metrics ...
> v0 '/metrics/snapshot' response took 17.751689014secs
> v1 'master::call::GetMetrics' application/x-protobuf response took 
> 17.523928635secs
> v1 'master::call::GetMetrics' application/json response took 18.111901732secs
> {noformat}
> {noformat: Title=After}
> Registered 2000 frameworks
> Finished launching the tasks; Sleep 10 seconds ...
> Start collecting metrics ...
> v0 '/metrics/snapshot' response took 1.730948431secs
> v1 'master::call::GetMetrics' application/x-protobuf response took 
> 1.697177667secs
> v1 'master::call::GetMetrics' application/json response took 2.160314525secs
> {noformat}
> A follow-up to switch the interface to std::vector would be beneficial, since 
> we don't need any of the std::list benefits.





[jira] [Created] (MESOS-8952) process::await/collect n^2 performance issue

2018-05-24 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-8952:
--

 Summary: process::await/collect n^2 performance issue
 Key: MESOS-8952
 URL: https://issues.apache.org/jira/browse/MESOS-8952
 Project: Mesos
  Issue Type: Bug
  Components: libprocess
Reporter: Benjamin Mahler


Due to the use of std::list::size (which appears to be linear in complexity 
even with g++ and c++11), process::await and process::collect suffer from n^2 
complexity. A minimal patch to switch to std::vector shows the following 
improvement:

{noformat: Title=Before}
Registered 2000 frameworks
Finished launching the tasks; Sleep 10 seconds ...
Start collecting metrics ...
v0 '/metrics/snapshot' response took 17.751689014secs
v1 'master::call::GetMetrics' application/x-protobuf response took 
17.523928635secs
v1 'master::call::GetMetrics' application/json response took 18.111901732secs
{noformat}

{noformat: Title=After}
Registered 2000 frameworks
Finished launching the tasks; Sleep 10 seconds ...
Start collecting metrics ...
v0 '/metrics/snapshot' response took 1.730948431secs
v1 'master::call::GetMetrics' application/x-protobuf response took 
1.697177667secs
v1 'master::call::GetMetrics' application/json response took 2.160314525secs
{noformat}

A follow-up to switch the interface to std::vector would be beneficial, since we 
don't need any of the std::list benefits.
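The quadratic behaviour can be reproduced with a simplified Python model. The model assumes, as the issue suggests, that each completion compares a ready counter against the size of the futures container. With an O(n) size operation, n completions cost about n² steps; with a cached length they cost n. The function names are hypothetical, and this is not libprocess code.

```python
def make_counting_size(linear):
    """Return (size, counter): a size function plus a dict recording how
    many elementary steps it performed."""
    counter = {"steps": 0}

    def size(seq):
        if linear:
            n = 0
            for _ in seq:  # walk every element, like an O(n) list size()
                n += 1
                counter["steps"] += 1
            return n
        counter["steps"] += 1  # O(1): the length is cached
        return len(seq)

    return size, counter


def collect(futures, size):
    """Simplified completion loop: after each future becomes ready, compare
    the ready count against size(futures)."""
    ready = 0
    for _ in futures:  # each future completes exactly once
        ready += 1
        if ready == size(futures):
            return ready
    return ready


if __name__ == "__main__":
    for linear in (True, False):
        size, counter = make_counting_size(linear)
        collect(list(range(2000)), size)
        print("linear" if linear else "cached", counter["steps"])
```

For 2,000 futures, the linear variant performs 4,000,000 steps versus 2,000 for the cached one. The before/after timings above show the same kind of gap.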





[jira] [Commented] (MESOS-8883) Send email to dev list concerning the Python 3 update

2018-05-24 Thread Armand Grillet (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489794#comment-16489794
 ] 

Armand Grillet commented on MESOS-8883:
---

I have prepared something:

{code}
Hi all,

Python 2.7 will retire on January 1, 2020 and we currently use it for
our support scripts, our Python bindings, and our new CLI.

Starting July 1, 2018, you will need Python 3.6 on your computer 
in order to use the support scripts. It is available on all the 
operating systems we support and is even preinstalled on most recent Linux 
distributions. This change is motivated by issues with our support scripts 
on Windows and by upcoming work on the CLI.

If you already have Python 3.6 installed on your machine, great. 
Otherwise, you will see a deprecation message when you use the support
scripts and the related git hooks. Don't worry, these messages and the 
switch to Python 3 do not change how the scripts work.

We now have a chain ready offering Python 3 support scripts alongside 
the existing ones. Having a duplicated codebase is not sustainable and 
we thus plan on deprecating the Python 2 support scripts by July 1st. 

We want a few weeks to test these new scripts thoroughly and to give
you time to install Python 3.6; this is why we have decided to keep both
codebases for a few weeks.

If you want to use the new scripts, set the environment variable
`MESOSSUPPORTPYTHON` to `3` and run the bash support script
`build-virtualenv` again. You will then use the Python 3 scripts by default.

If you have any questions, please reply to this thread or join the
Mesos Slack channel #python3.

PS: This Python 3 switch does not apply to the rest of our codebase yet. 
As we have seen in a previous thread, some developers still rely on the 
Python 2 bindings and we do not want to disturb that.
{code}

> Send email to dev list concerning the Python 3 update
> -
>
> Key: MESOS-8883
> URL: https://issues.apache.org/jira/browse/MESOS-8883
> Project: Mesos
>  Issue Type: Task
>Affects Versions: 1.6.0
>Reporter: Armand Grillet
>Assignee: Andrew Schwartzmeyer
>Priority: Major
>
> Let's prepare an email for the dev community to express our wish and reasons 
> to add a {{python3}} dependency to the Mesos codebase.





[jira] [Assigned] (MESOS-6823) bool/UserContainerLoggerTest.ROOT_LOGROTATE_RotateWithSwitchUserTrueOrFalse/0 is flaky

2018-05-24 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach reassigned MESOS-6823:
--

   Resolution: Fixed
 Assignee: Jie Yu
Fix Version/s: 1.7.0

{noformat}
commit 32d4305b87e79ed02cc686e0c29b027e31c6b3a4
Author: Jie Yu 
Date:   Thu May 24 10:05:17 2018 -0700

Adjusted the tests that use nobody.

Used `$SUDO_USER` instead because `nobody` sometimes cannot access
directories under `$HOME` of the current user running the tests.

Review: https://reviews.apache.org/r/67291
{noformat}
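The user-selection rule from this commit can be sketched in Python. The Mesos tests themselves are C++, and `unprivileged_test_user` is a hypothetical helper. The sketch prefers the account that invoked sudo (`$SUDO_USER`), which can normally read the build tree under its own `$HOME`, unlike `nobody`. It returns None when no suitable user exists so a test can be skipped. Excluding `root` is an assumption of this sketch.

```python
import os
import pwd


def unprivileged_test_user(environ=None):
    """Return $SUDO_USER if it names an existing non-root account, else None."""
    if environ is None:
        environ = os.environ
    user = environ.get("SUDO_USER")
    if not user or user == "root":
        return None
    try:
        pwd.getpwnam(user)  # make sure the account exists on this host
    except KeyError:
        return None
    return user
```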

> bool/UserContainerLoggerTest.ROOT_LOGROTATE_RotateWithSwitchUserTrueOrFalse/0 
> is flaky
> --
>
> Key: MESOS-6823
> URL: https://issues.apache.org/jira/browse/MESOS-6823
> Project: Mesos
>  Issue Type: Bug
> Environment: Ubuntu 12/14 both with/without SSL
>Reporter: Anand Mazumdar
>Assignee: Jie Yu
>Priority: Major
>  Labels: flaky, flaky-test, newbie
> Fix For: 1.7.0
>
>
> This showed up on our internal CI
> {code}
> [23:13:01] :   [Step 11/11] [ RUN  ] 
> bool/UserContainerLoggerTest.ROOT_LOGROTATE_RotateWithSwitchUserTrueOrFalse/0
> [23:13:01] :   [Step 11/11] I1219 23:13:01.653230 25712 cluster.cpp:160] 
> Creating default 'local' authorizer
> [23:13:01] :   [Step 11/11] I1219 23:13:01.654103 25732 master.cpp:380] 
> Master c590a129-814c-4903-9681-e16da4da4c94 (ip-172-16-10-213.mesosphere.io) 
> started on 172.16.10.213:45407
> [23:13:01] :   [Step 11/11] I1219 23:13:01.654119 25732 master.cpp:382] Flags 
> at startup: --acls="" --agent_ping_timeout="15secs" 
> --agent_reregister_timeout="10mins" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate_agents="true" 
> --authenticate_frameworks="true" --authenticate_http_frameworks="true" 
> --authenticate_http_readonly="true" --authenticate_http_readwrite="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/mnt/teamcity/temp/buildTmp/ev3icd/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/mnt/teamcity/temp/buildTmp/ev3icd/master" 
> --zk_session_timeout="10secs"
> [23:13:01] :   [Step 11/11] I1219 23:13:01.654248 25732 master.cpp:432] 
> Master only allowing authenticated frameworks to register
> [23:13:01] :   [Step 11/11] I1219 23:13:01.654254 25732 master.cpp:446] 
> Master only allowing authenticated agents to register
> [23:13:01] :   [Step 11/11] I1219 23:13:01.654258 25732 master.cpp:459] 
> Master only allowing authenticated HTTP frameworks to register
> [23:13:01] :   [Step 11/11] I1219 23:13:01.654261 25732 credentials.hpp:37] 
> Loading credentials for authentication from 
> '/mnt/teamcity/temp/buildTmp/ev3icd/credentials'
> [23:13:01] :   [Step 11/11] I1219 23:13:01.654343 25732 master.cpp:504] Using 
> default 'crammd5' authenticator
> [23:13:01] :   [Step 11/11] I1219 23:13:01.654386 25732 http.cpp:922] Using 
> default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
> [23:13:01] :   [Step 11/11] I1219 23:13:01.654429 25732 http.cpp:922] Using 
> default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
> [23:13:01] :   [Step 11/11] I1219 23:13:01.654458 25732 http.cpp:922] Using 
> default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
> [23:13:01] :   [Step 11/11] I1219 23:13:01.654477 25732 master.cpp:584] 
> Authorization enabled
> [23:13:01] :   [Step 11/11] I1219 23:13:01.654551 25733 
> whitelist_watcher.cpp:77] No whitelist given
> [23:13:01] :   [Step 11/11] I1219 23:13:01.654582 25730 hierarchical.cpp:149] 
> Initialized hierarchical allocator process
> [23:13:01] :   [Step 11/11] I1219 23:13:01.655076 25732 master.cpp:2046] 
> Elected as the leading master!
> [23:13:01] :   [Step 11/11] I1219 23:13:01.655086 25732 master.cpp:1568] 
> Recovering from registrar
> [23:13:01] :   [Step 11/11] I1219 23:13:01.655124 25729 registrar.cpp:329] 
> Recovering registrar
> [23:13:01] :   [Step 11/11] I1219 23:13:01.655354 25731 registrar.cpp:362] 
> Successfully fetched the registry (0B) in 210944n

[jira] [Created] (MESOS-8951) Flaky `AgentContainerAPITest.RecoverNestedContainer`

2018-05-24 Thread Andrei Budnik (JIRA)
Andrei Budnik created MESOS-8951:


 Summary: Flaky `AgentContainerAPITest.RecoverNestedContainer`
 Key: MESOS-8951
 URL: https://issues.apache.org/jira/browse/MESOS-8951
 Project: Mesos
  Issue Type: Bug
 Environment: internal CI
 master-668030da
Reporter: Andrei Budnik
 Attachments: AgentContainerAPITest.RecoverNestedContainer-badrun1.txt, 
AgentContainerAPITest.RecoverNestedContainer-badrun2.txt

{code:java}
[  FAILED  ] 
ParentChildContainerTypeAndContentType/AgentContainerAPITest.RecoverNestedContainer/9,
 where GetParam() = (1, 0, application/json, 
("cgroups/cpu,cgroups/mem,filesystem/linux,namespaces/pid", "linux", 
"ROOT_CGROUPS_")) (15297 ms)
[  FAILED  ] 
ParentChildContainerTypeAndContentType/AgentContainerAPITest.RecoverNestedContainer/13,
 where GetParam() = (1, 1, application/json, 
("cgroups/cpu,cgroups/mem,filesystem/linux,namespaces/pid", "linux", 
"ROOT_CGROUPS_")) (15275 ms){code}
{code:java}
../../src/tests/agent_container_api_tests.cpp:596
Failed to wait 15secs for wait
{code}
There is no `WAIT_CONTAINER` call in the agent logs. It looks like the request 
wasn't delivered to the agent.





[jira] [Commented] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser

2018-05-24 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489437#comment-16489437
 ] 

Jie Yu commented on MESOS-2199:
---

https://reviews.apache.org/r/67291/

> Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
> ---
>
> Key: MESOS-2199
> URL: https://issues.apache.org/jira/browse/MESOS-2199
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Ian Downes
>Assignee: Jie Yu
>Priority: Major
>  Labels: disabled-test, mesosphere
>
> It appears that running the executor as {{nobody}} is not supported.
> [~nnielsen] can you take a look?
> Executor log:
> {noformat}
> [root@hostname build]# cat 
> /tmp/SlaveTest_ROOT_RunTaskWithCommandInfoWithUser_cxF1dY/slaves/20141219-005206-2081170186-60487-11862-S0/frameworks/20141219-005206-2081170186-60
> 487-11862-/executors/1/runs/latest/std*
> sh: /home/idownes/workspace/mesos/build/src/mesos-executor: Permission denied
> {noformat}
> Test output:
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from SlaveTest
> [ RUN  ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
> ../../src/tests/slave_tests.cpp:680: Failure
> Value of: statusRunning.get().state()
>   Actual: TASK_FAILED
> Expected: TASK_RUNNING
> ../../src/tests/slave_tests.cpp:682: Failure
> Failed to wait 10secs for statusFinished
> ../../src/tests/slave_tests.cpp:673: Failure
> Actual function call count doesn't match EXPECT_CALL(sched, statusUpdate(&driver, _))...
>  Expected: to be called twice
>Actual: called once - unsatisfied and active
> [  FAILED  ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser (10641 ms)
> [--] 1 test from SlaveTest (10641 ms total)
> [--] Global test environment tear-down
> [==] 1 test from 1 test case ran. (10658 ms total)
> {noformat}



[jira] [Commented] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser

2018-05-24 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489438#comment-16489438
 ] 

Jie Yu commented on MESOS-2199:
---

The solution (tip from [~jpe...@apache.org]) is to use $SUDO_USER instead of 
`nobody`.
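A minimal sketch of how a ROOT_ test could pick that user, assuming the suite is launched via `sudo` so that `SUDO_USER` names the invoking, non-root user who owns the build tree (and can therefore execute `mesos-executor`, unlike `nobody`). The helper name `testUser()` is hypothetical, `std::optional` (C++17) stands in for Mesos's own `Option` to keep it self-contained, and this is not taken from the actual patch.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper: returns the user a ROOT_ test should run its task as.
// When the test binary is started with `sudo`, SUDO_USER holds the name of
// the invoking user, who can normally read and execute files in their own
// build directory.
std::optional<std::string> testUser()
{
  const char* sudoUser = std::getenv("SUDO_USER");

  // Unset or empty when the tests were not started via sudo (e.g. from a
  // root shell); the caller should then skip the test rather than guess.
  if (sudoUser == nullptr || *sudoUser == '\0') {
    return std::nullopt;
  }

  return std::string(sudoUser);
}
```

A caller would then put the returned name into the task's `CommandInfo` user field, or skip the test when no user is available.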

[jira] [Assigned] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser

2018-05-24 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-2199:
-

Assignee: Jie Yu

[jira] [Assigned] (MESOS-3475) TestContainerizer should not modify global environment variables

2018-05-24 Thread Andrei Budnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrei Budnik reassigned MESOS-3475:


Assignee: Andrei Budnik

> TestContainerizer should not modify global environment variables
> 
>
> Key: MESOS-3475
> URL: https://issues.apache.org/jira/browse/MESOS-3475
> Project: Mesos
>  Issue Type: Bug
>Reporter: Joris Van Remoortere
>Assignee: Andrei Budnik
>Priority: Major
>
> Currently, the {{TestContainerizer}} modifies the process's environment 
> variables. Since these are global, other threads reading them can get 
> inconsistent results, or even segfault if they happen to read while the 
> environment is being changed.
> Synchronizing within the {{TestContainerizer}} is not sufficient, because 
> other code reads the environment without taking that lock. We should instead 
> pass the environment variables into a fork, or set them on the command line of 
> the executed command.
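A minimal POSIX sketch of the first suggestion, assuming a Linux host and C++17: the child's environment is built as a private `envp` array before `fork()` and handed to `execve()`, so the parent's global environment is never touched and other threads never observe a half-updated `environ`. The names `buildEnvp()` and `runWithEnv()` are hypothetical; this is not how `TestContainerizer` (or libprocess's subprocess support) is actually implemented.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Builds a null-terminated "KEY=VALUE" array for execve(). The strings are
// owned by `storage`, which must outlive the returned pointers.
std::vector<char*> buildEnvp(
    const std::map<std::string, std::string>& env,
    std::vector<std::string>& storage)
{
  storage.clear();
  for (const auto& entry : env) {
    storage.push_back(entry.first + "=" + entry.second);
  }

  // Take pointers only after `storage` has stopped growing, so they stay valid.
  std::vector<char*> envp;
  for (std::string& item : storage) {
    envp.push_back(item.data());
  }
  envp.push_back(nullptr);
  return envp;
}

// Runs `path` with `args` (args[0] is the program name) in a child whose
// environment is exactly `env`; the parent's environment is never modified.
// Returns the child's exit status, or -1 if it could not be run or waited on.
int runWithEnv(
    const std::string& path,
    const std::vector<std::string>& args,
    const std::map<std::string, std::string>& env)
{
  // Allocate everything before fork(): in the child of a multi-threaded
  // process only async-signal-safe calls (like execve and _exit) are safe.
  std::vector<std::string> argStorage(args);
  std::vector<char*> argv;
  for (std::string& arg : argStorage) {
    argv.push_back(arg.data());
  }
  argv.push_back(nullptr);

  std::vector<std::string> envStorage;
  std::vector<char*> envp = buildEnvp(env, envStorage);

  pid_t pid = fork();
  if (pid < 0) {
    return -1;
  }

  if (pid == 0) {
    execve(path.c_str(), argv.data(), envp.data());
    _exit(127); // Reached only if execve() failed.
  }

  int status = 0;
  if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
    return -1;
  }
  return WEXITSTATUS(status);
}
```

For example, running `/bin/sh -c 'test "$FOO" = bar'` with the map `{{"FOO", "bar"}}` should exit with 0 while `FOO` stays unset in the calling process.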



[jira] [Created] (MESOS-8950) Framework operations can make resources unallocatable

2018-05-24 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-8950:
---

 Summary: Framework operations can make resources unallocatable
 Key: MESOS-8950
 URL: https://issues.apache.org/jira/browse/MESOS-8950
 Project: Mesos
  Issue Type: Bug
  Components: allocation, master
Reporter: Benjamin Bannier


The allocator does not offer {{cpus}} or {{mem}} resources smaller than 
certain fixed minimum sizes. Framework operations are not subject to the same 
minimum size constraints, so they can leave resources in amounts that are 
unavailable to any future allocation. This behavior seems most pronounced when 
a framework can register in many roles.

Example: 

* A single multi-role framework that can register in any role, e.g., any role 
in a certain role subhierarchy.
* A single agent with {{cpus:1.5*MIN_CPUS}} and {{mem:1.5*MIN_MEM}}.
* The framework is offered all resources and performs a {{RESERVE}} of 
{{cpus:0.5*MIN_CPUS}}. It then changes its role.
* The framework does the same in the next two offer cycles. All {{cpus}} are 
then reserved for different roles in unallocatable amounts.
* The last offer contains only {{mem:1.5*MIN_MEM}}; the framework reserves 0.6 
of these for another role. This fragments the {{mem}} resources as well.
* No allocatable resources left in cluster.
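A toy model of the scenario above, to make the arithmetic concrete. It assumes (modeled on, not copied from, the allocator) that a pool of resources can be offered only if it holds at least `MIN_CPUS` of `cpus` or at least `MIN_MEM` of `mem`. Amounts are normalized so that 1.0 equals the respective minimum, the role names are made up, and "0.6 of these" is read as `0.6*MIN_MEM`. `createsUnallocatablePool()` is a hypothetical check of the kind that could be applied to `RESERVE` operations, not an existing Mesos function.

```cpp
#include <cassert>
#include <map>
#include <string>

// Amounts are in multiples of the allocator minimums:
// cpus == 1.0 means MIN_CPUS, mem == 1.0 means MIN_MEM.
struct Pool
{
  double cpus = 0.0;
  double mem = 0.0;
};

using Agent = std::map<std::string, Pool>; // role -> pool; "*" is unreserved.

// Assumed rule: a pool can be offered if it meets either minimum.
bool allocatable(const Pool& pool)
{
  return pool.cpus >= 1.0 || pool.mem >= 1.0;
}

bool anyAllocatable(const Agent& agent)
{
  for (const auto& entry : agent) {
    if (allocatable(entry.second)) {
      return true;
    }
  }
  return false;
}

// RESERVE: move resources from the unreserved pool to `role`.
void reserve(Agent& agent, const std::string& role, double cpus, double mem)
{
  agent["*"].cpus -= cpus;
  agent["*"].mem -= mem;
  agent[role].cpus += cpus;
  agent[role].mem += mem;
}

// Hypothetical guard: would this RESERVE leave a non-empty pool (either the
// reservation or the unreserved remainder) that can never be offered?
bool createsUnallocatablePool(
    const Agent& agent, const std::string& role, double cpus, double mem)
{
  Agent after = agent;
  reserve(after, role, cpus, mem);

  for (const std::string& name : {std::string("*"), role}) {
    const Pool& pool = after[name];
    const bool empty = pool.cpus <= 0.0 && pool.mem <= 0.0;
    if (!empty && !allocatable(pool)) {
      return true;
    }
  }
  return false;
}

// Replays the example: three 0.5*MIN_CPUS reservations for different roles,
// then a 0.6*MIN_MEM reservation for a fourth role.
Agent replayScenario()
{
  Agent agent;
  agent["*"] = Pool{1.5, 1.5};

  reserve(agent, "role1", 0.5, 0.0);
  reserve(agent, "role2", 0.5, 0.0);
  reserve(agent, "role3", 0.5, 0.0);
  reserve(agent, "role4", 0.0, 0.6);

  return agent;
}
```

Replaying the example leaves `cpus: 0.5` in each of three roles, `mem: 0.6` in a fourth and about `mem: 0.9` unreserved, none of which passes the assumed check, while the guard would already have flagged the first `0.5*MIN_CPUS` reservation.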



[jira] [Created] (MESOS-8949) DefaultExecutorCheckTest.CommandCheckSharesWorkDirWithTask is flaky

2018-05-24 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-8949:
---

 Summary: DefaultExecutorCheckTest.CommandCheckSharesWorkDirWithTask is flaky
 Key: MESOS-8949
 URL: https://issues.apache.org/jira/browse/MESOS-8949
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Benjamin Bannier


I see {{DefaultExecutorCheckTest.CommandCheckSharesWorkDirWithTask}} fail 
fairly quickly when it runs under high system load.

This seems to be only peripherally related to, e.g., MESOS-7500, since the 
binaries I am using were produced by a CMake build, which does not use libtool 
wrappers at all.


