[jira] [Issue Comment Deleted] (MESOS-10213) configure for Python 3

2021-05-20 Thread Benjamin Bannier (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-10213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-10213:
-
Comment: was deleted

(was: If I can help you then, just hit me (aventer.biz). :-) )

> configure for Python 3 
> ---
>
> Key: MESOS-10213
> URL: https://issues.apache.org/jira/browse/MESOS-10213
> Project: Mesos
> Issue Type: Bug
> Reporter: Lutz Weischer
> Priority: Trivial
>
> How exactly do I run 'configure' with Python 3? It currently fails with:
> configure: error: Mesos requires Python < 3.0
> ---
> The detected Python version is 3.9.
> If you already have Python 2.6+ installed (and it's the default python
> on the path), you might want to check if you have the PYTHON environment
> variable set to a version of Python greater than 3.0.
> ---
> Thanks. 
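
The version constraint reported by configure above can be restated as a small sketch. This is not the actual check from the Mesos configure script: the function names are hypothetical, and the lower bound of 2.6 is only inferred from the hint in the error message.

```python
def parse_version(text):
    """Turn a dotted version string such as '3.9' or '2.7.18' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def passes_python_check(version_text):
    """Return True if the version satisfies 'Python < 3.0' as quoted above.

    Assumption: 2.6 is the minimum, based on the 'Python 2.6+' hint in the
    error message.
    """
    version = parse_version(version_text)
    return (2, 6) <= version < (3, 0)
```

Under this reading a detected version of 3.9 is rejected, which matches the report. The error message suggests that the PYTHON environment variable selects the interpreter configure inspects, so it would have to point at a Python 2 interpreter for this check to pass.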



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (MESOS-9006) The agent's GET_AGENT leaks resource information when using authorization

2020-06-16 Thread Benjamin Bannier (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-9006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136594#comment-17136594
 ] 

Benjamin Bannier commented on MESOS-9006:
-

[~dzhu], this is about a leak related to the {{VIEW_ROLE}} authorizer action. 
To see the issue, reserve some resources for a role, then query the agent info 
via {{GET_AGENT}} with a principal not authorized to view that role.

> The agent's GET_AGENT leaks resource information when using authorization
> -
>
> Key: MESOS-9006
> URL: https://issues.apache.org/jira/browse/MESOS-9006
> Project: Mesos
> Issue Type: Bug
> Reporter: Benjamin Bannier
> Priority: Critical
> Labels: agent, integration, security
>
> While e.g., the master's {{GET_AGENTS}} call filters resources (by using an 
> approver with {{VIEW_ROLE}}) so that it does not leak resources the querying 
> principal should not be able to see, no such filtering is done in the 
> agent's corresponding {{GET_AGENT}} call.
> This call should be authorized as well so that it does not expose information 
> we expect to be hidden.
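
The kind of filtering described above can be sketched as follows. This is an illustration only, not Mesos code: `Resource`, `filter_viewable`, and the `can_view_role` callback (standing in for an approver for the {{VIEW_ROLE}} action) are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str      # e.g. "cpus" or "disk"
    role: str      # role the resource is reserved for
    amount: float


def filter_viewable(resources, can_view_role):
    """Keep only resources whose role the querying principal may view.

    `can_view_role` is a hypothetical stand-in for a VIEW_ROLE approver:
    it takes a role name and returns True or False.
    """
    return [resource for resource in resources if can_view_role(resource.role)]
```

A response built from `filter_viewable(...)` instead of the full resource list would no longer reveal reservations for roles the principal cannot view.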





[jira] [Assigned] (MESOS-10091) CI builds on ubuntu 14.04 fail to create Java bindings

2020-05-18 Thread Benjamin Bannier (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-10091:


Assignee: (was: Benjamin Bannier)

> CI builds on ubuntu 14.04 fail to create Java bindings
> --
>
> Key: MESOS-10091
> URL: https://issues.apache.org/jira/browse/MESOS-10091
> Project: Mesos
> Issue Type: Bug
> Components: java api, reviewbot
> Reporter: Benjamin Bannier
> Priority: Major
>
> Builds with Java bindings enabled fail on ubuntu-14.04 (this includes e.g., 
> reviewbot builds) with the following error:
> {noformat}
> 22:28:09 Building mesos-1.10.0.jar ...
> 22:28:09 /bin/sed -i.bak 's/mesos\.mesos_pb2/mesos_pb2/' 
> python/interface/src/mesos/v1/interface/scheduler_pb2.py && rm 
> python/interface/src/mesos/v1/interface/scheduler_pb2.py.bak
> 22:28:15 [ERROR] The build could not read 1 project -> [Help 1]
> 22:28:15 [ERROR]   
> 22:28:15 [ERROR]   The project org.apache.mesos:mesos:1.10.0 
> (/home/ubuntu/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mesos-ec2-ubuntu-14.04/mesos/build/src/java/mesos.pom)
>  has 1 error
> 22:28:15 [ERROR] Non-resolvable parent POM: Could not transfer artifact 
> org.apache:apache:pom:11 from/to central 
> (http://repo.maven.apache.org/maven2): Failed to transfer file: 
> http://repo.maven.apache.org/maven2/org/apache/apache/11/apache-11.pom. 
> Return code is: 501 , ReasonPhrase:HTTPS Required. and 'parent.relativePath' 
> points at wrong local POM @ line 18, column 11 -> [Help 2]
> 22:28:15 [ERROR] 
> 22:28:15 [ERROR] To see the full stack trace of the errors, re-run Maven with 
> the -e switch.
> 22:28:15 [ERROR] Re-run Maven using the -X switch to enable full debug 
> logging.
> 22:28:15 [ERROR] 
> 22:28:15 [ERROR] For more information about the errors and possible 
> solutions, please read the following articles:
> 22:28:15 [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
> 22:28:15 [ERROR] [Help 2] 
> http://cwiki.apache.org/confluence/display/MAVEN/UnresolvableModelException
> 22:28:15 make[1]: *** [java/target/mesos-1.10.0.jar] Error 1
> 22:28:15 make[1]: Leaving directory 
> `/home/ubuntu/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mesos-ec2-ubuntu-14.04/mesos/build/src'
> 22:28:15 make: *** [all-recursive] Error 1
> {noformat}
> The error seems to be due to the Maven version we use in ubuntu-14.04 CI 
> images not using HTTPS by default, [which seems to be required since 
> 2020-01-15|https://support.sonatype.com/hc/en-us/articles/360041287334]:
> {quote}
> Question
> As of January 15, 2020 I am receiving the following responses upon making 
> requests to The Central Repository:
> Requests to http://repo1.maven.org/maven2/ return a 501 HTTPS Required status 
> and a body:
> 501 HTTPS Required. 
> Use https://repo1.maven.org/maven2/
> More information at https://links.sonatype.com/central/501-https-required
> Requests to http://repo.maven.apache.org/maven2/ return a 501 HTTPS Required 
> status and a body:
> 501 HTTPS Required. 
> Use https://repo.maven.apache.org/maven2/
> More information at https://links.sonatype.com/central/501-https-required
> How do I satisfy this requirement so that I can regain access to Central?
> Answer
> Effective January 15, 2020, The Central Repository no longer supports 
> insecure communication over plain HTTP and requires that all requests to the 
> repository are encrypted over HTTPS.
> If you're receiving this error, then you need to replace all URL references 
> to Maven Central with their canonical HTTPS counterparts:
> Replace http://repo1.maven.org/maven2/ with https://repo1.maven.org/maven2/
> Replace http://repo.maven.apache.org/maven2/ with 
> https://repo.maven.apache.org/maven2/
> If for any reason your environment cannot support HTTPS, you have the option 
> of using our dedicated insecure endpoint at 
> http://insecure.repo1.maven.org/maven2/
> For further context around the move to HTTPS, please see 
> https://blog.sonatype.com/central-repository-moving-to-https.
> {quote}
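
The replacement described in the quoted answer can be sketched as a small text rewrite. This is a minimal illustration, not part of the Mesos build; the function name is hypothetical, and only the two hosts named in the notice are rewritten.

```python
import re

# Hosts named in the Sonatype notice quoted above.
_CENTRAL_HOSTS = ("repo1.maven.org", "repo.maven.apache.org")

# Matches plain-HTTP URLs whose host is exactly one of the Central hosts.
_PLAIN_HTTP = re.compile(
    r"http://(" + "|".join(re.escape(host) for host in _CENTRAL_HOSTS) + r")/"
)


def use_https_for_central(text):
    """Rewrite plain-HTTP Maven Central URLs in `text` to HTTPS."""
    return _PLAIN_HTTP.sub(r"https://\1/", text)
```

The dedicated insecure endpoint and unrelated hosts are left untouched, since the pattern only matches the two Central host names directly after `http://`.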





[jira] [Assigned] (MESOS-8464) Automate building of mesos-tidy docker image

2020-05-18 Thread Benjamin Bannier (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-8464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-8464:
---

Assignee: (was: Benjamin Bannier)

> Automate building of mesos-tidy docker image
> 
>
> Key: MESOS-8464
> URL: https://issues.apache.org/jira/browse/MESOS-8464
> Project: Mesos
> Issue Type: Bug
> Reporter: Benjamin Bannier
> Priority: Major
>
> The script {{support/mesos-tidy.sh}} relies on the docker image 
> {{mesos/mesos-tidy}}. This image is currently built manually from the files 
> in {{support/mesos-tidy}} and then uploaded to dockerhub. The manual step 
> creates unnecessary friction when rolling out updates to the mesos-tidy setup; 
> while e.g., every committer can update the setup, not all committers are able 
> to update the image.
> We should investigate how to automate creating this image whenever source 
> files under {{support/mesos-tidy/}} are updated. 





[jira] [Assigned] (MESOS-8400) Handle plugin crashes gracefully in SLRP recovery.

2020-05-18 Thread Benjamin Bannier (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-8400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-8400:
---

Assignee: (was: Benjamin Bannier)

> Handle plugin crashes gracefully in SLRP recovery.
> --
>
> Key: MESOS-8400
> URL: https://issues.apache.org/jira/browse/MESOS-8400
> Project: Mesos
> Issue Type: Improvement
> Reporter: Chun-Hung Hsiao
> Priority: Blocker
> Labels: mesosphere, mesosphere-dss-post-ga, storage
>
> When a CSI plugin crashes, the container daemon in SLRP will reset its 
> corresponding {{csi::Client}} service future. However, if a CSI call races 
> with a plugin crash, the call may be issued before the service future is 
> reset, resulting in a failure for that CSI call. MESOS-9517 partly addresses 
> this for {{CreateVolume}} and {{DeleteVolume}} calls, but calls in the SLRP 
> recovery path, e.g., {{ListVolume}}, {{GetCapacity}}, {{Probe}}, could make 
> the SLRP unrecoverable.
> There are two main issues:
>  1. For {{Probe}}, we should investigate whether a few retry attempts are 
> needed; if they fail, we should recover from the failed attempts (e.g., kill 
> the plugin container) and make the container daemon relaunch the plugin 
> instead of failing the daemon.
>  2. For other calls in the recovery path, we should either retry the call or 
> enable the local resource provider daemon to restart the SLRP after it 
> fails.
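
The retry-then-recover idea from the two items above can be sketched generically. This is not SLRP code: `call_with_retries` and its parameters are hypothetical, with `call` standing in for a CSI call such as {{Probe}} and `on_exhausted` standing in for a recovery action such as killing the plugin container so that it gets relaunched.

```python
def call_with_retries(call, max_attempts, on_exhausted):
    """Invoke `call` up to `max_attempts` times and return its first result.

    If every attempt raises, run the recovery hook `on_exhausted` with the
    last error and then re-raise that error to the caller.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    last_error = None
    for _ in range(max_attempts):
        try:
            return call()
        except Exception as error:
            # Remember the failure and try again.
            last_error = error

    # All attempts failed: trigger recovery, then report the failure.
    on_exhausted(last_error)
    raise last_error
```

Whether retries are actually needed, and how many, is exactly what the issue proposes to investigate; the sketch only shows where a recovery step would sit relative to the retries.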





[jira] [Commented] (MESOS-9630) Consider moving linter setup to pre-commit

2020-05-18 Thread Benjamin Bannier (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-9630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110274#comment-17110274
 ] 

Benjamin Bannier commented on MESOS-9630:
-

{noformat}
commit 83359534cb1b3303fcbae34af3fadd81b7c8cb85
Author: Benjamin Bannier bbann...@apache.org
Date:   Wed May 6 17:47:37 2020 +0200

Removed mesos-style transition script.
Review: https://reviews.apache.org/r/71300/
{noformat}

> Consider moving linter setup to pre-commit
> --
>
> Key: MESOS-9630
> URL: https://issues.apache.org/jira/browse/MESOS-9630
> Project: Mesos
> Issue Type: Wish
> Reporter: Benjamin Bannier
> Assignee: Benjamin Bannier
> Priority: Minor
> Fix For: 1.10.0
>
>
> Mesos currently uses a mix of hand-crafted git commit hooks and mesos-style 
> to perform linting. While this has served us well, our current approach also 
> has some drawbacks, e.g.,
>  * the linter setup is spread between hooks and {{support/mesos-style.py}}
>  * adding new linters can be cumbersome
>  * mesos-style.py creates a single virtualenv, tied to the source tree, in 
> which it installs linters
>  * linter dependencies are only cached to an extent and it is easy to run 
> into a situation where one needs to update linter dependencies over the 
> network even though one has successfully linted a revision before
>  * {{support/mesos-style.py}} lacks a number of features, e.g., running over 
> only staged files, running linters in parallel for improved throughput, 
> running only specific linters or disabling certain linters, and the 
> parameterization of the linters is strongly coupled to implementation of the 
> style checker itself.
> The [pre-commit tool|https://pre-commit.com/] solves most of these issues and 
> using it in Mesos would not only allow us to get rid of tooling which is hard 
> to maintain, but also unlock other features. It is licensed under an MIT 
> license. We should consider moving our linting setup over to pre-commit.
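
Two of the features listed above as missing from {{support/mesos-style.py}}, selecting or disabling specific linters and running them in parallel, can be sketched as follows. This is neither how pre-commit is implemented nor Mesos code; `run_linters` and the representation of a linter as a Python callable taking a list of files are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_linters(linters, files, only=None, skip=()):
    """Run the selected linters concurrently and return their results by name.

    `linters` maps a linter name to a callable taking the list of files.
    `only` restricts the run to the given names; `skip` disables names.
    """
    selected = {
        name: lint
        for name, lint in linters.items()
        if (only is None or name in only) and name not in skip
    }

    with ThreadPoolExecutor() as pool:
        # Submit everything first so the linters overlap in time.
        futures = {name: pool.submit(lint, files) for name, lint in selected.items()}
        return {name: future.result() for name, future in futures.items()}
```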





[jira] [Assigned] (MESOS-10090) Mesos build on Windows appears to be broken.

2020-01-31 Thread Benjamin Bannier (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-10090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-10090:


Component/s: agent
   Assignee: Benjamin Bannier

Reopening as this is still an issue.

> Mesos build on Windows appears to be broken.
> 
>
> Key: MESOS-10090
> URL: https://issues.apache.org/jira/browse/MESOS-10090
> Project: Mesos
> Issue Type: Bug
> Components: agent
> Environment: Windows, MSVC
> Reporter: Till Toenshoff
> Assignee: Benjamin Bannier
> Priority: Blocker
> Fix For: 1.10
>
>
> I was told that when trying to build the latest Mesos (master - 1.10 WIP), 
> MSVC complains about our use of domain sockets;
> {noformat}
> mesos\src\slave/slave.hpp(133,40): error C3083: ‘unix’: the symbol to the 
> left of a ‘::’ must be a type
> mesos\src\slave/slave.hpp(877,28): error C3083: ‘unix’: the symbol to the 
> left of a ‘::’ must be a type
> \mesos\src\slave\slave.cpp(203,45): error C3083: ‘unix’: the symbol to the 
> left of a ‘::’ must be a type
> mesos\3rdparty\libprocess\src\http.cpp(1628,18): error C3083: ‘unix’: the 
> symbol to the left of a ‘::’ must be a type
> mesos\3rdparty\libprocess\src\http.cpp(1629,16): error C3083: ‘unix’: the 
> symbol to the left of a ‘::’ must be a type
> {noformat}
> This entirely prevents building with MSVC, and no workaround is known; 
> declaring this a blocker for those reasons.
>  





[jira] [Commented] (MESOS-10091) CI builds on ubuntu 14.04 fail to create Java bindings

2020-01-28 Thread Benjamin Bannier (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025204#comment-17025204
 ] 

Benjamin Bannier commented on MESOS-10091:
--

I am unable to figure out a way to make this work without overwriting multiple 
{{repositories}} in our package dependency chain.

If a user runs into this, they need to upgrade to a newer Maven version 
({{>=maven-3.2}} seems to work). I successfully tested this manually with the 
unofficial PPA posted here, 
https://launchpad.net/~andrei-pozolotin/+archive/ubuntu/maven3.

> CI builds on ubuntu 14.04 fail to create Java bindings
> --
>
> Key: MESOS-10091
> URL: https://issues.apache.org/jira/browse/MESOS-10091
> Project: Mesos
> Issue Type: Bug
> Components: java api, reviewbot
> Reporter: Benjamin Bannier
> Assignee: Benjamin Bannier
> Priority: Major
>





[jira] [Commented] (MESOS-10091) CI builds on ubuntu 14.04 fail to create Java bindings

2020-01-28 Thread Benjamin Bannier (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025114#comment-17025114
 ] 

Benjamin Bannier commented on MESOS-10091:
--

Posted slightly related https://reviews.apache.org/r/72054/ which fixes 
reviewbot, at least until we have addressed the underlying issue here.

> CI builds on ubuntu 14.04 fail to create Java bindings
> --
>
> Key: MESOS-10091
> URL: https://issues.apache.org/jira/browse/MESOS-10091
> Project: Mesos
> Issue Type: Bug
> Components: java api, reviewbot
> Reporter: Benjamin Bannier
> Assignee: Benjamin Bannier
> Priority: Major
>





[jira] [Created] (MESOS-10091) CI builds on ubuntu 14.04 fail to create Java bindings

2020-01-28 Thread Benjamin Bannier (Jira)
Benjamin Bannier created MESOS-10091:


 Summary: CI builds on ubuntu 14.04 fail to create Java bindings
 Key: MESOS-10091
 URL: https://issues.apache.org/jira/browse/MESOS-10091
 Project: Mesos
  Issue Type: Bug
  Components: java api, reviewbot
Reporter: Benjamin Bannier


Builds with Java bindings enabled fail on ubuntu-14.04 (this includes e.g., 
reviewbot builds) with the following error:

{noformat}
22:28:09 Building mesos-1.10.0.jar ...
22:28:09 /bin/sed -i.bak 's/mesos\.mesos_pb2/mesos_pb2/' 
python/interface/src/mesos/v1/interface/scheduler_pb2.py && rm 
python/interface/src/mesos/v1/interface/scheduler_pb2.py.bak
22:28:15 [ERROR] The build could not read 1 project -> [Help 1]
22:28:15 [ERROR]   
22:28:15 [ERROR]   The project org.apache.mesos:mesos:1.10.0 
(/home/ubuntu/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mesos-ec2-ubuntu-14.04/mesos/build/src/java/mesos.pom)
 has 1 error
22:28:15 [ERROR] Non-resolvable parent POM: Could not transfer artifact 
org.apache:apache:pom:11 from/to central (http://repo.maven.apache.org/maven2): 
Failed to transfer file: 
http://repo.maven.apache.org/maven2/org/apache/apache/11/apache-11.pom. Return 
code is: 501 , ReasonPhrase:HTTPS Required. and 'parent.relativePath' points at 
wrong local POM @ line 18, column 11 -> [Help 2]
22:28:15 [ERROR] 
22:28:15 [ERROR] To see the full stack trace of the errors, re-run Maven with 
the -e switch.
22:28:15 [ERROR] Re-run Maven using the -X switch to enable full debug logging.
22:28:15 [ERROR] 
22:28:15 [ERROR] For more information about the errors and possible solutions, 
please read the following articles:
22:28:15 [ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
22:28:15 [ERROR] [Help 2] 
http://cwiki.apache.org/confluence/display/MAVEN/UnresolvableModelException
22:28:15 make[1]: *** [java/target/mesos-1.10.0.jar] Error 1
22:28:15 make[1]: Leaving directory 
`/home/ubuntu/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mesos-ec2-ubuntu-14.04/mesos/build/src'
22:28:15 make: *** [all-recursive] Error 1
{noformat}

The error seems to be due to the Maven version we use in ubuntu-14.04 CI images 
not using HTTPS by default, [which seems to be required since 
2020-01-15|https://support.sonatype.com/hc/en-us/articles/360041287334]:

{quote}
Question

As of January 15, 2020 I am receiving the following responses upon making 
requests to The Central Repository:

Requests to http://repo1.maven.org/maven2/ return a 501 HTTPS Required status 
and a body:

501 HTTPS Required. 
Use https://repo1.maven.org/maven2/
More information at https://links.sonatype.com/central/501-https-required

Requests to http://repo.maven.apache.org/maven2/ return a 501 HTTPS Required 
status and a body:

501 HTTPS Required. 
Use https://repo.maven.apache.org/maven2/
More information at https://links.sonatype.com/central/501-https-required

How do I satisfy this requirement so that I can regain access to Central?
Answer

Effective January 15, 2020, The Central Repository no longer supports insecure 
communication over plain HTTP and requires that all requests to the repository 
are encrypted over HTTPS.

If you're receiving this error, then you need to replace all URL references to 
Maven Central with their canonical HTTPS counterparts:

Replace http://repo1.maven.org/maven2/ with https://repo1.maven.org/maven2/

Replace http://repo.maven.apache.org/maven2/ with 
https://repo.maven.apache.org/maven2/

If for any reason your environment cannot support HTTPS, you have the option of 
using our dedicated insecure endpoint at http://insecure.repo1.maven.org/maven2/

For further context around the move to HTTPS, please see 
https://blog.sonatype.com/central-repository-moving-to-https.
{quote}





[jira] [Assigned] (MESOS-10091) CI builds on ubuntu 14.04 fail to create Java bindings

2020-01-28 Thread Benjamin Bannier (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-10091:


Assignee: Benjamin Bannier

> CI builds on ubuntu 14.04 fail to create Java bindings
> --
>
> Key: MESOS-10091
> URL: https://issues.apache.org/jira/browse/MESOS-10091
> Project: Mesos
> Issue Type: Bug
> Components: java api, reviewbot
> Reporter: Benjamin Bannier
> Assignee: Benjamin Bannier
> Priority: Major
>





[jira] [Comment Edited] (MESOS-10084) Detecting whether executor is generated for command task should work when the launcher_dir changes

2020-01-27 Thread Benjamin Bannier (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17022937#comment-17022937
 ] 

Benjamin Bannier edited comment on MESOS-10084 at 1/27/20 2:27 PM:
---

{{1.5.x}}:
{noformat}
commit 2f2146ac61abd54bc3296d8cbb5429cd10584db1
Author: Benjamin Bannier 
Date:   Thu Jan 23 14:19:57 2020 +0100

Remembered whether an executor was agent-generated.

This patch adds code to pass on whether an executor was generated in the agent from
the point where the executor is generated to the point where we create
an actual `slave::Executor` instance. This allows us to persist this
information in the executor state.

Review: https://reviews.apache.org/r/72035/

commit 6294c319047f4c23ac28c9f20c39c88dfad2a66b
Author: Benjamin Bannier 
Date:   Thu Jan 23 14:19:51 2020 +0100

Sync'd whether an executor was generated to and from disk.

This patch introduces an `ExecutorState` variable signifying whether an
executor was generated by the agent (and is thus unknown to the master).
Currently we still detect agent-generated executors with a heuristic,
but will adapt that heuristic in a follow-up patch making use of the
additional state we now persist.

Review: https://reviews.apache.org/r/72034/

commit 095021dcadde9a19ed3298c65dcbed303651faa8
Author: Benjamin Bannier 
Date:   Thu Jan 23 14:19:43 2020 +0100

Decoupled detection of generated executors from Mesos install location.

We previously were detecting executors generated for command tasks by
checking whether their command matched the full path of
`mesos-executor`.  This approach can lead to misdetection if e.g.,
during an upgrade a new installation location is chosen.

This patch adjusts the heuristic by now only relying on the fact that
the executor command should end in `mesos-executor`. In order to cut
down on false positives we now additionally check that the executor name
looks similar to the ones we generate for command tasks.

Review: https://reviews.apache.org/r/72033/
{noformat}

{{1.6.x}}:
{noformat}
commit 7e4d380c11c20f4b9e20b06f6b6c67a4657af24b
Author: Benjamin Bannier 
Date:   Thu Jan 23 14:19:57 2020 +0100

Remembered whether an executor was agent-generated.

This patch adds code to pass on whether an executor was generated in the agent from
the point where the executor is generated to the point where we create
an actual `slave::Executor` instance. This allows us to persist this
information in the executor state.

Review: https://reviews.apache.org/r/72035/

commit 305f2b5e88ed9256b60a02afbdad06e2333937b7
Author: Benjamin Bannier 
Date:   Thu Jan 23 14:19:51 2020 +0100

Sync'd whether an executor was generated to and from disk.

This patch introduces an `ExecutorState` variable signifying whether an
executor was generated by the agent (and is thus unknown to the master).
Currently we still detect agent-generated executors with a heuristic,
but will adapt that heuristic in a follow-up patch making use of the
additional state we now persist.

Review: https://reviews.apache.org/r/72034/

commit efbb1e697af7082f3da1d509df9cd25ccbe0aab8
Author: Benjamin Bannier 
Date:   Thu Jan 23 14:19:43 2020 +0100

Decoupled detection of generated executors from Mesos install location.

We previously were detecting executors generated for command tasks by
checking whether their command matched the full path of
`mesos-executor`.  This approach can lead to misdetection if e.g.,
during an upgrade a new installation location is chosen.

This patch adjusts the heuristic by now only relying on the fact that
the executor command should end in `mesos-executor`. In order to cut
down on false positives we now additionally check that the executor name
looks similar to the ones we generate for command tasks.

Review: https://reviews.apache.org/r/72033/
{noformat}

{{1.7.x}}:
{noformat}
commit 47f2a11e6ade71c455efe285898b52c256681e31
Author: Benjamin Bannier 
Date:   Thu Jan 23 14:19:57 2020 +0100

Remembered whether an executor was agent-generated.

This patch adds code to pass on whether an executor was generated in the agent from
the point where the executor is generated to the point where we create
an actual `slave::Executor` instance. This allows us to persist this
information in the executor state.

Review: https://reviews.apache.org/r/72035/

commit 989ccb5c04c25dfde2d9c530cc84218268c403a6
Author: Benjamin Bannier 
Date:   Thu Jan 23 14:19:51 2020 +0100

Sync'd whether an executor was generated to and from disk.

This patch introduces an `ExecutorState` variable signifying whether an
executor was generated by the agent (and is thus unknown to the master).
Currently we still detect agent-generated 

[jira] [Comment Edited] (MESOS-10084) Detecting whether executor is generated for command task should work when the launcher_dir changes

2020-01-21 Thread Benjamin Bannier (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015863#comment-17015863
 ] 

Benjamin Bannier edited comment on MESOS-10084 at 1/21/20 12:21 PM:


Reviews:
https://reviews.apache.org/r/72033/
https://reviews.apache.org/r/72034/
https://reviews.apache.org/r/72035/


was (Author: bbannier):
Reviews:
https://reviews.apache.org/r/72002/
https://reviews.apache.org/r/72003/

> Detecting whether executor is generated for command task should work when the 
> launcher_dir changes
> --
>
> Key: MESOS-10084
> URL: https://issues.apache.org/jira/browse/MESOS-10084
> Project: Mesos
>  Issue Type: Bug
>Reporter: Andrei Sekretenko
>Assignee: Benjamin Bannier
>Priority: Critical
>
> As currently implemented, on recovery Mesos agent determines that the 
> executor is generated for command task by comparing the executor command with 
> a current path to Mesos executor:
> https://github.com/apache/mesos/blob/1.7.x/src/slave/slave.cpp#L9635
> During upgrade of production cluster we observed this check to break due to 
> the new launcher_dir being different from the one of checkpointed executor.
> This can cause problems of various kind: for example, after such upgrade, 
> Mesos master can begin to treat the checkpointed command executors as subject 
> to resource quota.
> Design considerations:
>  - proper solution is to checkpoint the flag indicating whether the executor 
> is a command/docker one.
>  - for correct upgrade from older Mesos versions, we will need some kind of 
> workaround to detect command executors after upgrade; the workaround logic 
> should be skipped if there is a checkpointed flag.
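
A minimal sketch of the two design considerations above, assuming a
three-state flag in the executor state and the suffix check described in the
backported commits. `GeneratedFlag` and `isGeneratedForCommandTask` are
hypothetical names used for illustration and are not the Mesos
implementation; the additional check on the executor name mentioned in the
commit messages is left out.

```cpp
#include <string>

// Hypothetical three-state value for a flag persisted in the executor state.
// `UNKNOWN` stands for state written by an agent that predates the flag.
enum class GeneratedFlag { UNKNOWN, NO, YES };

// Decides whether an executor was generated by the agent for a command task.
bool isGeneratedForCommandTask(
    GeneratedFlag checkpointed,
    const std::string& command)
{
  // Prefer the checkpointed flag; the workaround is skipped when it exists.
  if (checkpointed != GeneratedFlag::UNKNOWN) {
    return checkpointed == GeneratedFlag::YES;
  }

  // Workaround for upgrades from older versions: only look at the end of
  // the command so that a changed `launcher_dir` does not matter.
  const std::string suffix = "mesos-executor";
  return command.size() >= suffix.size() &&
         command.compare(
             command.size() - suffix.size(), suffix.size(), suffix) == 0;
}
```

With this shape an executor checkpointed under an old `launcher_dir` is still
recognized after the upgrade, while a checkpointed flag always wins over the
heuristic.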





[jira] [Assigned] (MESOS-10084) Detecting whether executor is generated for command task should work when the launcher_dir changes

2020-01-14 Thread Benjamin Bannier (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-10084:


Assignee: Benjamin Bannier

> Detecting whether executor is generated for command task should work when the 
> launcher_dir changes
> --
>
> Key: MESOS-10084
> URL: https://issues.apache.org/jira/browse/MESOS-10084
> Project: Mesos
>  Issue Type: Bug
>Reporter: Andrei Sekretenko
>Assignee: Benjamin Bannier
>Priority: Critical
>
> As currently implemented, on recovery Mesos agent determines that the 
> executor is generated for command task by comparing the executor command with 
> a current path to Mesos executor:
> https://github.com/apache/mesos/blob/1.7.x/src/slave/slave.cpp#L9635
> During upgrade of production cluster we observed this check to break due to 
> the new launcher_dir being different from the one of checkpointed executor.
> This can cause problems of various kind: for example, after such upgrade, 
> Mesos master can begin to treat the checkpointed command executors as subject 
> to resource quota.
> Design considerations:
>  - proper solution is to checkpoint the flag indicating whether the executor 
> is a command/docker one.
>  - for correct upgrade from older Mesos versions, we will need some kind of 
> workaround to detect command executors after upgrade; the workaround logic 
> should be skipped if there is a checkpointed flag.





[jira] [Commented] (MESOS-10062) Implement relative path computation for stout

2019-12-05 Thread Benjamin Bannier (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988766#comment-16988766
 ] 

Benjamin Bannier commented on MESOS-10062:
--

Reviews: 
https://reviews.apache.org/r/71878/
https://reviews.apache.org/r/71879/
https://reviews.apache.org/r/71880/
https://reviews.apache.org/r/71881/
https://reviews.apache.org/r/71882/

> Implement relative path computation for stout
> -
>
> Key: MESOS-10062
> URL: https://issues.apache.org/jira/browse/MESOS-10062
> Project: Mesos
>  Issue Type: Task
>Reporter: Benno Evers
>Assignee: Benjamin Bannier
>Priority: Major
>
> When using executor domain sockets, we might need to specify relative paths 
> in order to stay below the path length limit of 108 characters.
> To do so, we should implement a `path::relative_path()` function in stout 
> that can compute the relative path between two directories.
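
A rough sketch of what such a computation could look like, assuming both
inputs are absolute paths without `..` components or symlinks. The free
function `relativePath` and its helper `components` are hypothetical names
for illustration and do not reflect the signature that was eventually added
to stout.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Splits a path on '/' and drops empty components and ".".
std::vector<std::string> components(const std::string& path)
{
  std::vector<std::string> result;
  std::istringstream stream(path);
  std::string component;
  while (std::getline(stream, component, '/')) {
    if (!component.empty() && component != ".") {
      result.push_back(component);
    }
  }
  return result;
}

// Returns the path leading from directory `base` to `target`.
std::string relativePath(const std::string& target, const std::string& base)
{
  const std::vector<std::string> t = components(target);
  const std::vector<std::string> b = components(base);

  // Length of the common leading part of both paths.
  std::size_t common = 0;
  while (common < t.size() && common < b.size() && t[common] == b[common]) {
    ++common;
  }

  std::vector<std::string> parts;
  // Step up out of what is left of `base` ...
  for (std::size_t i = common; i < b.size(); ++i) {
    parts.push_back("..");
  }
  // ... and then down into what is left of `target`.
  for (std::size_t i = common; i < t.size(); ++i) {
    parts.push_back(t[i]);
  }

  if (parts.empty()) {
    return ".";
  }

  std::string result = parts[0];
  for (std::size_t i = 1; i < parts.size(); ++i) {
    result += "/" + parts[i];
  }
  return result;
}
```

A socket deep inside a sandbox can then be addressed relative to a nearby
working directory, which keeps the string passed to the socket API short.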





[jira] [Assigned] (MESOS-10061) Implement chmod() support for stout

2019-12-03 Thread Benjamin Bannier (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-10061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-10061:


Assignee: Benno Evers  (was: Benjamin Bannier)

> Implement chmod() support for stout
> ---
>
> Key: MESOS-10061
> URL: https://issues.apache.org/jira/browse/MESOS-10061
> Project: Mesos
>  Issue Type: Task
>Reporter: Benno Evers
>Assignee: Benno Evers
>Priority: Major
>
> When using executor domain sockets, we need to be able to change permissions 
> on the domain socket to 0600. To do that, we should implement a new function 
> `os::chmod()` in stout.





[jira] [Assigned] (MESOS-10062) Implement relative path computation for stout

2019-12-03 Thread Benjamin Bannier (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-10062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-10062:


  Sprint: Studio 4: RI-21 60
Story Points: 2
Assignee: Benjamin Bannier

> Implement relative path computation for stout
> -
>
> Key: MESOS-10062
> URL: https://issues.apache.org/jira/browse/MESOS-10062
> Project: Mesos
>  Issue Type: Task
>Reporter: Benno Evers
>Assignee: Benjamin Bannier
>Priority: Major
>
> When using executor domain sockets, we might need to specify relative paths 
> in order to stay below the path length limit of 108 characters.
> To do so, we should implement a `path::relative_path()` function in stout 
> that can compute the relative path between two directories.





[jira] [Assigned] (MESOS-10061) Implement chown() support for stout

2019-12-03 Thread Benjamin Bannier (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-10061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-10061:


Assignee: Benjamin Bannier

> Implement chown() support for stout
> ---
>
> Key: MESOS-10061
> URL: https://issues.apache.org/jira/browse/MESOS-10061
> Project: Mesos
>  Issue Type: Task
>Reporter: Benno Evers
>Assignee: Benjamin Bannier
>Priority: Major
>
> When using executor domain sockets, we need to be able to change permissions 
> on the domain socket to 0600. To do that, we should implement a new function 
> `os::chown()` in stout.





[jira] [Assigned] (MESOS-10059) Let the command executor connect through a domain socket when available

2019-12-03 Thread Benjamin Bannier (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-10059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-10059:


Assignee: Benno Evers

> Let the command executor connect through a domain socket when available
> ---
>
> Key: MESOS-10059
> URL: https://issues.apache.org/jira/browse/MESOS-10059
> Project: Mesos
>  Issue Type: Task
>Reporter: Benno Evers
>Assignee: Benno Evers
>Priority: Major
>
> If the command executor is using the v1 API (--http_command_executors agent 
> flag) and the MESOS_DOMAIN_SOCKET environment variable is set, the command 
> executor should use the domain socket to communicate with the agent or die 
> trying.
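
The selection rule described above could be sketched as follows. Only the
`MESOS_DOMAIN_SOCKET` variable comes from the issue; `AgentEndpoint`,
`selectEndpoint`, and treating an empty variable as unset are assumptions
made for illustration.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical description of how an executor reaches the agent.
struct AgentEndpoint
{
  bool useDomainSocket;
  std::string address; // Socket path, or "host:port" for TCP.
};

// Picks the domain socket whenever the v1 API is in use and
// `MESOS_DOMAIN_SOCKET` is set. There is deliberately no fallback to TCP in
// that case, mirroring the "or die trying" wording of the issue.
AgentEndpoint selectEndpoint(
    bool httpCommandExecutor,
    const std::string& tcpAddress)
{
  const char* socketPath = std::getenv("MESOS_DOMAIN_SOCKET");

  if (httpCommandExecutor && socketPath != nullptr && socketPath[0] != '\0') {
    return AgentEndpoint{true, socketPath};
  }

  return AgentEndpoint{false, tcpAddress};
}
```

A failure to connect to the returned socket path would then be treated as
fatal by the caller instead of silently retrying over TCP.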





[jira] [Commented] (MESOS-9956) CSI plugins reporting duplicated volumes will crash the agent.

2019-11-14 Thread Benjamin Bannier (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-9956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974505#comment-16974505
 ] 

Benjamin Bannier commented on MESOS-9956:
-

Review for backport to {{1.8.x}}: https://reviews.apache.org/r/71769/

> CSI plugins reporting duplicated volumes will crash the agent.
> --
>
> Key: MESOS-9956
> URL: https://issues.apache.org/jira/browse/MESOS-9956
> Project: Mesos
>  Issue Type: Bug
>  Components: storage
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>Priority: Blocker
>  Labels: mesosphere, storage
> Fix For: 1.9.0
>
>
> The CSI spec requires volumes to be uniquely identifiable by ID, and thus 
> SLRP currently assumes that a {{ListVolumes}} call does not return duplicated 
> volumes. However, if a SLRP uses a non-conforming CSI plugin that reports 
> duplicated volumes, these volumes would corrupt the SLRP checkpoint and cause 
> the agent to crash at the next reconciliation:
> {noformat}
>  F0829 07:13:55.171332 12721 provider.cpp:1089] Check failed: 
> !checkpointedMap.contains(resource.disk().source().id()){noformat}
> MESOS-9254 introduces periodic reconciliation which makes this problem much 
> more likely to manifest.
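
One way to guard against such a plugin is to look for repeated IDs before the
result of `ListVolumes` is used any further. The sketch below is illustrative
only; `duplicatedIds` is a hypothetical helper and says nothing about how the
actual fix in SLRP is implemented.

```cpp
#include <set>
#include <string>
#include <vector>

// Returns the volume IDs that occur more than once in `volumeIds`, e.g. in
// the result of a `ListVolumes` call from a non-conforming plugin.
std::set<std::string> duplicatedIds(const std::vector<std::string>& volumeIds)
{
  std::set<std::string> seen;
  std::set<std::string> duplicates;

  for (const std::string& id : volumeIds) {
    // `insert` reports whether the ID was new; a failed insert means the
    // plugin returned this ID before.
    if (!seen.insert(id).second) {
      duplicates.insert(id);
    }
  }

  return duplicates;
}
```

A caller could log and drop the reported duplicates instead of writing them
into the checkpoint.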





[jira] [Assigned] (MESOS-9940) Framework removal may lead to inconsistent task states between master and agent.

2019-11-07 Thread Benjamin Bannier (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-9940:
---

Assignee: (was: Benjamin Bannier)

> Framework removal may lead to inconsistent task states between master and 
> agent.
> 
>
> Key: MESOS-9940
> URL: https://issues.apache.org/jira/browse/MESOS-9940
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Meng Zhu
>Priority: Major
>  Labels: foundations
>
> When a framework is removed from the master (say due to disconnection), 
> master sends a `ShutdownFrameworkMessage` to the agent. At the same time, 
> master would transition the task status to e.g. KILLED. 
> (https://github.com/apache/mesos/blob/master/src/master/master.cpp#L11247-L11291)
> When the agent gets the shutdown message, it tries to shut down all the 
> executors and destroy all the containers. The tasks' status is updated after 
> all of this is done. 
> (https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L7914-L7922)
> However, if the executor shutdown gets stuck (e.g. due to a hanging docker 
> daemon), the task status transition will never happen, and master and agent 
> will have diverging views of these tasks.
> One consequence is that masters may try to schedule more workloads onto the 
> problematic agent (because it thinks those task resources are freed up). 
> Since we do not have overcommit check on agent, agent will comply and launch 
> those tasks. This will lead to over-allocation.
> One possible solution is to hold on the master status update until the agent 
> is done with the framework shutdown.





[jira] [Assigned] (MESOS-9993) Update operator API documentation for re-reservations

2019-11-06 Thread Benjamin Bannier (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-9993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-9993:
---

Assignee: Benjamin Bannier

> Update operator API documentation for re-reservations
> -
>
> Key: MESOS-9993
> URL: https://issues.apache.org/jira/browse/MESOS-9993
> Project: Mesos
>  Issue Type: Task
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Major
>






[jira] [Commented] (MESOS-10028) Mesos failed to build due to error C3493 on windows with MSVC

2019-11-04 Thread Benjamin Bannier (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966519#comment-16966519
 ] 

Benjamin Bannier commented on MESOS-10028:
--

This issue is due to disagreement between GCC & Clang, and MSVC around this 
piece of code:
{code}
  const size_t childRoleLength = 36u;
  vector<string> roles(param.roleCount);
  std::generate(roles.begin(), roles.end(), []() {
    return "role-" + randString(childRoleLength);
  });
{code}

I believe {{childRoleLength}} is ODR-used here since it is {{const}} and 
initialized by a constant expression. With that, I believe it should be 
implicitly captured, http://eel.is/c++draft/expr.prim.lambda.capture#7.3, but 
MSVC seems to disagree.

We could work around this by either capturing {{childRoleLength}} by value and 
marking the lambda with some {{NOLINT}}, or maybe simpler, remove the {{const}} 
qualifier from {{childRoleLength}}.

Could you look into this [~mzhu]?
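
The first workaround mentioned above, capturing explicitly by value, could
look like the following self-contained sketch. `randString` is replaced by a
deterministic stand-in and `makeRoles` is a hypothetical wrapper, so this is
not the test code itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the `randString` helper used by the test; the real helper
// returns random characters.
std::string randString(std::size_t length)
{
  return std::string(length, 'x');
}

// Hypothetical wrapper around the snippet quoted above.
std::vector<std::string> makeRoles(std::size_t roleCount)
{
  const std::size_t childRoleLength = 36u;
  std::vector<std::string> roles(roleCount);

  // The explicit by-value capture makes the lambda independent of whether
  // the compiler allows reading the constant without any capture.
  std::generate(roles.begin(), roles.end(), [childRoleLength]() {
    return "role-" + randString(childRoleLength);
  });

  return roles;
}
```

Clang can warn about a capture like this one via `-Wunused-lambda-capture`,
so builds that treat warnings as errors may need to suppress that diagnostic
for this lambda.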

> Mesos failed to build due to error C3493 on windows with MSVC
> -
>
> Key: MESOS-10028
> URL: https://issues.apache.org/jira/browse/MESOS-10028
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: master
> Environment: VS 2017 + Windows Server 2016
>Reporter: LinGao
>Priority: Major
> Attachments: log_x64_build.log
>
>
> Mesos failed to build due to error C3493: 'childRoleLength' cannot be 
> implicitly captured because no default capture mode has been specified on 
> Windows using MSVC. It can first be reproduced at revision 69e92ae on the master 
> branch. Could you please take a look at this issue? Thanks a lot!
>  
> Reproduce steps:
> 1. git clone -c core.autocrlf=true https://github.com/apache/mesos 
> D:\mesos\src
> 2. Open a VS 2017 x64 command prompt as admin and browse to D:\mesos
> 3. cd src
> 4. .\bootstrap.bat
> 5. cd ..
> 6. mkdir build_x64 && pushd build_x64
> 7. cmake ..\src -G "Visual Studio 15 2017 Win64" 
> -DCMAKE_SYSTEM_VERSION=10.0.17134.0 -DENABLE_LIBEVENT=1 
> -DHAS_AUTHENTICATION=0 -DPATCHEXE_PATH="C:\gnuwin32\bin" -T host=x64
> 8. msbuild Mesos.sln /p:Configuration=Debug /p:Platform=x64 /maxcpucount:4 
> /t:Rebuild
>  
> ErrorMessage:
> D:\mesos\src\src\tests\hierarchical_allocator_tests.cpp(8455): error C3493: 
> 'childRoleLength' cannot be implicitly captured because no default capture 
> mode has been specified





[jira] [Assigned] (MESOS-9991) Update 'Master::authorizeReserveResources' for re-reservations

2019-11-01 Thread Benjamin Bannier (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-9991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-9991:
---

Assignee: Benjamin Bannier

> Update 'Master::authorizeReserveResources' for re-reservations
> --
>
> Key: MESOS-9991
> URL: https://issues.apache.org/jira/browse/MESOS-9991
> Project: Mesos
>  Issue Type: Task
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Major
>
> We need to authorize all modifications to bring {{source}} to the common 
> ancestor, and from the common ancestor to {{resources}}.
>  * each removed reservation needs to be authorized as an {{unreserve}} 
> operation
>  * each added reservation needs to be authorized as a {{reserve}} operation
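
The two rules above can be sketched on a simplified model in which a
reservation stack is a list of role names, outermost reservation first. The
types `Reservations` and `Authorization` and the function
`requiredAuthorizations` are hypothetical and only illustrate the idea; the
real code operates on Mesos resource objects.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified reservation stack: role names, outermost reservation first.
typedef std::vector<std::string> Reservations;

// One authorization request: "unreserve" or "reserve" for a given role.
struct Authorization
{
  std::string action;
  std::string role;
};

// Lists the authorizations needed to get from `source` to `target`.
std::vector<Authorization> requiredAuthorizations(
    const Reservations& source,
    const Reservations& target)
{
  // The common ancestor is the longest shared prefix of both stacks.
  std::size_t common = 0;
  while (common < source.size() && common < target.size() &&
         source[common] == target[common]) {
    ++common;
  }

  std::vector<Authorization> result;

  // Pop reservations off `source`, innermost first, down to the ancestor.
  for (std::size_t i = source.size(); i > common; --i) {
    result.push_back(Authorization{"unreserve", source[i - 1]});
  }

  // Push the reservations that `target` has on top of the ancestor.
  for (std::size_t i = common; i < target.size(); ++i) {
    result.push_back(Authorization{"reserve", target[i]});
  }

  return result;
}
```

For a source stack `[a, a/b, a/b/c]` and a target stack `[a, a/d]` this
yields two `unreserve` requests followed by one `reserve` request.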





[jira] [Commented] (MESOS-9948) master::Slave::hasExecutor occupies 37% of a 150 second perf sample.

2019-11-01 Thread Benjamin Bannier (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964806#comment-16964806
 ] 

Benjamin Bannier commented on MESOS-9948:
-

Backports by [~bmahler]:

* {{1.5.x}}
{noformat}
commit 2b3a1feb4c3be8aafdbabdb041ff7dc083dc884e
Author: Benjamin Mahler 
Date:   Fri Oct 4 13:53:52 2019 -0400

Fixed master::Slave::hasExecutor performance issue.

This is a backport of the broader fix in MESOS-9948.
{noformat}
* {{1.6.x}}
{noformat}
commit 86ce596a352ed7f6ebd87702c21d669d6c4be7af
Author: Benjamin Mahler 
Date:   Fri Oct 4 13:53:19 2019 -0400

Fixed master::Slave::hasExecutor performance issue.

This is a backport of the broader fix in MESOS-9948.
{noformat}
* {{1.7.x}}
{noformat}
commit 7e036afedcba87ee5365c399da884643a5e5497f
Author: Benjamin Mahler 
Date:   Fri Oct 4 13:52:20 2019 -0400

Fixed master::Slave::hasExecutor performance issue.

This is a backport of the broader fix in MESOS-9948.
{noformat}
* {{1.8.x}}
{noformat}
commit 9a849dd570c53e21ae8d952f8d581691ccbd7b1e
Author: Benjamin Mahler 
Date:   Fri Oct 4 13:51:34 2019 -0400

Fixed master::Slave::hasExecutor performance issue.

This is a backport of the broader fix in MESOS-9948.
{noformat}
* {{1.9.x}}
{noformat}
commit 2c1eb0cdc57e24983a55bfade9e0a8f9a24c0a0d
Author: Benjamin Mahler 
Date:   Fri Oct 4 13:50:38 2019 -0400

Fixed master::Slave::hasExecutor performance issue.

This is a backport of the broader fix in MESOS-9948.
{noformat}

> master::Slave::hasExecutor occupies 37% of a 150 second perf sample.
> 
>
> Key: MESOS-9948
> URL: https://issues.apache.org/jira/browse/MESOS-9948
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Benjamin Mahler
>Assignee: Benjamin Bannier
>Priority: Major
>  Labels: foundations, performance
> Fix For: 1.5.4, 1.6.3, 1.7.3, 1.8.2, 1.9.1, 1.10.0
>
> Attachments: long-fei-enable-debug-slow-master.gz
>
>
> If you drop the attached perf stacks into flamescope, you can see that 
> mesos::internal::master::Slave::hasExecutor occupies 37% of the overall 
> samples!
> This function does 3 hashmap lookups, 1 can be eliminated for a quick win. 
> However, the larger improvement here will come from eliminating many of the 
> calls to this function.
> This was reported by [~carlone].
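
As a generic illustration of how one of three lookups can be saved (this is
not the Mesos code, and the `Executors` layout is an assumption), compare a
check-then-access pattern on a nested hash map with a version that reuses the
iterator returned by `find`:

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical layout: framework ID -> set of executor IDs.
typedef std::unordered_map<std::string, std::unordered_set<std::string>>
  Executors;

// Three hash lookups: `count`, `at`, and the inner `count`.
bool hasExecutorThreeLookups(
    const Executors& executors,
    const std::string& frameworkId,
    const std::string& executorId)
{
  return executors.count(frameworkId) > 0 &&
         executors.at(frameworkId).count(executorId) > 0;
}

// Two hash lookups: `find` on the outer map and `count` on the inner set.
bool hasExecutor(
    const Executors& executors,
    const std::string& frameworkId,
    const std::string& executorId)
{
  Executors::const_iterator it = executors.find(frameworkId);
  return it != executors.end() && it->second.count(executorId) > 0;
}
```

Both functions return the same result; the second simply avoids hashing the
framework ID twice.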





[jira] [Commented] (MESOS-9940) Framework removal may lead to inconsistent task states between master and agent.

2019-10-29 Thread Benjamin Bannier (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962162#comment-16962162
 ] 

Benjamin Bannier commented on MESOS-9940:
-

[~greggomann], let's put this back into the backlog for now and reestimate it.

> Framework removal may lead to inconsistent task states between master and 
> agent.
> 
>
> Key: MESOS-9940
> URL: https://issues.apache.org/jira/browse/MESOS-9940
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Meng Zhu
>Assignee: Benjamin Bannier
>Priority: Major
>  Labels: foundations
>
> When a framework is removed from the master (say due to disconnection), 
> master sends a `ShutdownFrameworkMessage` to the agent. At the same time, 
> master would transition the task status to e.g. KILLED. 
> (https://github.com/apache/mesos/blob/master/src/master/master.cpp#L11247-L11291)
> When the agent gets the shutdown message, it tries to shut down all the 
> executors and destroy all the containers. The tasks' status is updated after 
> all of this is done. 
> (https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L7914-L7922)
> However, if the executor shutdown gets stuck (e.g. due to a hanging docker 
> daemon), the task status transition will never happen, and master and agent 
> will have diverging views of these tasks.
> One consequence is that masters may try to schedule more workloads onto the 
> problematic agent (because it thinks those task resources are freed up). 
> Since we do not have overcommit check on agent, agent will comply and launch 
> those tasks. This will lead to over-allocation.
> One possible solution is to hold on the master status update until the agent 
> is done with the framework shutdown.





[jira] [Commented] (MESOS-9940) Framework removal may lead to inconsistent task states between master and agent.

2019-10-29 Thread Benjamin Bannier (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962161#comment-16962161
 ] 

Benjamin Bannier commented on MESOS-9940:
-

As Meng Zhu wrote, the issue is that the master currently transitions all 
framework tasks and executors to a terminal state immediately after sending 
out {{ShutdownFrameworkMessage}}s to the agents. The master does not wait for 
agent responses to confirm that the framework was indeed shut down on all 
agents.

Possible solutions need to introduce some feedback mechanism so the master can 
make sure the agents have carried out the framework removal:

1. {{Master::removeFramework}} instructs agents to shut down the framework and 
transitions all master-owned operations and task launches (these seem to be the 
ones pending authorization); introduce a new framework state like {{REMOVING}} 
and transition the framework to it (but keep the framework around).

Whenever an agent reregisters with a framework in {{REMOVING}} state, the 
master would send it a {{ShutdownFrameworkMessage}} as well. If the master 
knows about unreachable agents with tasks from the framework it could

2. Either
2a. Introduce an ACK message for {{ShutdownFrameworkMessage}} and make the 
master wait for it before carrying out the final removal and transitions of 
tasks, executors, and operations. The agent might send this message after it 
has successfully terminated tasks and executors.
2b. Have the master wait for terminal executor, task, and operation updates 
from the agent; if required the master would acknowledge them. This requires 
modifications to the agent to make sure these updates are sent even though the 
framework is in {{TERMINATING}} state on the agent side (e.g., around its 
{{TaskStatusUpdateManager}}). Ideally this work would remove some knowledge 
about a framework's subscription status from the agent. This approach seems 
to introduce additional coupling between agent and master as they need to have 
a common idea of what constitutes an active vs. terminated framework.
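
A minimal sketch of the bookkeeping option 2a would need on the master side,
assuming one acknowledgement per agent. The class `FrameworkRemoval` and its
methods are hypothetical and only illustrate the waiting logic, not an
existing Mesos interface.

```cpp
#include <set>
#include <string>

// Hypothetical state kept for a framework while it is being removed.
class FrameworkRemoval
{
public:
  // `agents` are the agents that were sent a shutdown message.
  explicit FrameworkRemoval(const std::set<std::string>& agents)
    : pending(agents) {}

  // Records the acknowledgement of one agent. Returns true once every agent
  // has acknowledged, i.e. when the final removal can be carried out.
  // Repeated or unexpected acknowledgements are ignored.
  bool acknowledge(const std::string& agentId)
  {
    pending.erase(agentId);
    return pending.empty();
  }

  bool done() const { return pending.empty(); }

private:
  std::set<std::string> pending;
};
```

Tasks, executors, and operations of the framework would only be transitioned
to their final state once `done()` returns true; how unreachable agents are
handled is left open here, as in the comment above.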

> Framework removal may lead to inconsistent task states between master and 
> agent.
> 
>
> Key: MESOS-9940
> URL: https://issues.apache.org/jira/browse/MESOS-9940
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Meng Zhu
>Assignee: Benjamin Bannier
>Priority: Major
>  Labels: foundations
>
> When a framework is removed from the master (say due to disconnection), 
> master sends a `ShutdownFrameworkMessage` to the agent. At the same time, 
> master would transition the task status to e.g. KILLED. 
> (https://github.com/apache/mesos/blob/master/src/master/master.cpp#L11247-L11291)
> When the agent gets the shutdown message, it tries to shut down all the 
> executors and destroy all the containers. The tasks' status is updated after 
> all of this is done. 
> (https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L7914-L7922)
> However, if the executor shutdown gets stuck (e.g. due to a hanging docker 
> daemon), the task status transition will never happen, and master and agent 
> will have diverging views of these tasks.
> One consequence is that masters may try to schedule more workloads onto the 
> problematic agent (because it thinks those task resources are freed up). 
> Since we do not have overcommit check on agent, agent will comply and launch 
> those tasks. This will lead to over-allocation.
> One possible solution is to hold on the master status update until the agent 
> is done with the framework shutdown.





[jira] [Issue Comment Deleted] (MESOS-9962) Mesos may report completed task as running in the state.

2019-10-23 Thread Benjamin Bannier (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-9962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-9962:

Comment: was deleted

(was: Review: https://reviews.apache.org/r/71641/)

> Mesos may report completed task as running in the state.
> 
>
> Key: MESOS-9962
> URL: https://issues.apache.org/jira/browse/MESOS-9962
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Meng Zhu
>Priority: Major
>  Labels: foundations
>
> When the following steps occur:
> 1) A graceful shutdown is initiated on the agent (i.e. SIGUSR1 or 
> /master/machine/down).
> 2) The executor is sent a kill, and the agent counts down on 
> executor_shutdown_grace_period.
> 3) The executor exits, before all terminal status updates reach the agent. 
> This is more likely if executor_shutdown_grace_period passes.
> This results in a completed executor, with non-terminal tasks (according to 
> status updates).
> This would produce a confusing report where completed tasks are still 
> TASK_RUNNING.





[jira] [Assigned] (MESOS-9962) Mesos may report completed task as running in the state.

2019-10-23 Thread Benjamin Bannier (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-9962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-9962:
---

Assignee: (was: Benjamin Bannier)

> Mesos may report completed task as running in the state.
> 
>
> Key: MESOS-9962
> URL: https://issues.apache.org/jira/browse/MESOS-9962
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Meng Zhu
>Priority: Major
>  Labels: foundations
>
> When the following steps occur:
> 1) A graceful shutdown is initiated on the agent (i.e. SIGUSR1 or 
> /master/machine/down).
> 2) The executor is sent a kill, and the agent counts down on 
> executor_shutdown_grace_period.
> 3) The executor exits, before all terminal status updates reach the agent. 
> This is more likely if executor_shutdown_grace_period passes.
> This results in a completed executor, with non-terminal tasks (according to 
> status updates).
> This would produce a confusing report where completed tasks are still 
> TASK_RUNNING.





[jira] [Assigned] (MESOS-9940) Framework removal may lead to inconsistent task states between master and agent.

2019-10-23 Thread Benjamin Bannier (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-9940:
---

Assignee: Benjamin Bannier

> Framework removal may lead to inconsistent task states between master and 
> agent.
> 
>
> Key: MESOS-9940
> URL: https://issues.apache.org/jira/browse/MESOS-9940
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Meng Zhu
>Assignee: Benjamin Bannier
>Priority: Major
>  Labels: foundations
>
> When a framework is removed from the master (say due to disconnection), 
> master sends a `ShutdownFrameworkMessage` to the agent. At the same time, 
> master would transition the task status to e.g. KILLED. 
> (https://github.com/apache/mesos/blob/master/src/master/master.cpp#L11247-L11291)
> When the agent gets the shutdown message, it tries to shut down all the 
> executors and destroy all the containers. The tasks' status is updated after 
> all of this is done. 
> (https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L7914-L7922)
> However, if the executor shutdown gets stuck (e.g. due to a hanging docker 
> daemon), the task status transition will never happen, and master and agent 
> will have diverging views of these tasks.
> One consequence is that masters may try to schedule more workloads onto the 
> problematic agent (because it thinks those task resources are freed up). 
> Since we do not have overcommit check on agent, agent will comply and launch 
> those tasks. This will lead to over-allocation.
> One possible solution is to hold on the master status update until the agent 
> is done with the framework shutdown.





[jira] [Assigned] (MESOS-10018) Duplicate tasks if agent partitioned during maintenance down period

2019-10-23 Thread Benjamin Bannier (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-10018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-10018:


Shepherd: Benno Evers
  Sprint: Foundations: RI-19 57
Assignee: Benjamin Bannier

> Duplicate tasks if agent partitioned during maintenance down period
> ---
>
> Key: MESOS-10018
> URL: https://issues.apache.org/jira/browse/MESOS-10018
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Major
>
> When the master starts maintenance for a node it
> (1) sends a {{ShutdownMessage}} message to the agent, and
> (2) removes the slave, which transitions all tasks to {{TASK_LOST}} and moves 
> them to the completed task set.
> If the {{ShutdownMessage}} isn't fully processed on the agent (e.g., message 
> dropped between (1) and (2), or agent process killed before the executor has 
> shut down), the agent could come back with the lost task running. It would 
> report the task on registration with the master, which would add it to the 
> list of active tasks. With that the same task could be both completed and 
> active.





[jira] [Created] (MESOS-10018) Duplicate tasks if agent partitioned during maintenance down period

2019-10-23 Thread Benjamin Bannier (Jira)
Benjamin Bannier created MESOS-10018:


 Summary: Duplicate tasks if agent partitioned during maintenance 
down period
 Key: MESOS-10018
 URL: https://issues.apache.org/jira/browse/MESOS-10018
 Project: Mesos
  Issue Type: Bug
Reporter: Benjamin Bannier


When the master starts maintenance for a node it

(1) sends a {{ShutdownMessage}} message to the agent, and
(2) removes the slave, which transitions all tasks to {{TASK_LOST}} and moves 
them to the completed task set.

If the {{ShutdownMessage}} isn't fully processed on the agent (e.g., message 
dropped between (1) and (2), or agent process killed before the executor has 
shut down), the agent could come back with the lost task running. It would 
report the task on registration with the master, which would add it to the list 
of active tasks. With that the same task could be both completed and active.






[jira] [Commented] (MESOS-9962) Mesos may report completed task as running in the state.

2019-10-04 Thread Benjamin Bannier (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-9962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944641#comment-16944641
 ] 

Benjamin Bannier commented on MESOS-9962:
-

A related issue is the early exit for the case where the framework is not 
connected, 
https://github.com/apache/mesos/blob/f1789b0fe5cad221b79a0bc2adfe2036cce6f33d/src/slave/slave.cpp#L5803-L5810.

> Mesos may report completed task as running in the state.
> 
>
> Key: MESOS-9962
> URL: https://issues.apache.org/jira/browse/MESOS-9962
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Meng Zhu
>Assignee: Benjamin Bannier
>Priority: Major
>  Labels: foundations
>
> When the following steps occur:
> 1) A graceful shutdown is initiated on the agent (i.e. SIGUSR1 or 
> /master/machine/down).
> 2) The executor is sent a kill, and the agent counts down on 
> executor_shutdown_grace_period.
> 3) The executor exits, before all terminal status updates reach the agent. 
> This is more likely if executor_shutdown_grace_period passes.
> This results in a completed executor, with non-terminal tasks (according to 
> status updates).
> This would produce a confusing report where completed tasks are still 
> TASK_RUNNING.





[jira] [Assigned] (MESOS-9962) Mesos may report completed task as running in the state.

2019-10-02 Thread Benjamin Bannier (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-9962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-9962:
---

Assignee: Benjamin Bannier

> Mesos may report completed task as running in the state.
> 
>
> Key: MESOS-9962
> URL: https://issues.apache.org/jira/browse/MESOS-9962
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Meng Zhu
>Assignee: Benjamin Bannier
>Priority: Major
>  Labels: foundations
>
> When the following steps occur:
> 1) A graceful shutdown is initiated on the agent (i.e. SIGUSR1 or 
> /master/machine/down).
> 2) The executor is sent a kill, and the agent counts down on 
> executor_shutdown_grace_period.
> 3) The executor exits, before all terminal status updates reach the agent. 
> This is more likely if executor_shutdown_grace_period passes.
> This results in a completed executor, with non-terminal tasks (according to 
> status updates).
> This would produce a confusing report where completed tasks are still 
> TASK_RUNNING.





[jira] [Created] (MESOS-10000) Consider removing 'hashmap::get'

2019-10-01 Thread Benjamin Bannier (Jira)
Benjamin Bannier created MESOS-10000:


 Summary: Consider removing 'hashmap::get'
 Key: MESOS-10000
 URL: https://issues.apache.org/jira/browse/MESOS-10000
 Project: Mesos
  Issue Type: Bug
  Components: stout
Reporter: Benjamin Bannier


{{hashmap::get}} returns an {{Option}} and is a convenient shortcut to 
avoid a {{contains}} check followed by e.g., {{hashmap::operator[]}}. Since it 
always creates a copy it can, however, be unnecessarily costly if the caller is 
not interested in storing the returned value, but e.g., only interested in 
invoking or accessing a member, in which case the {{contains}} check and 
{{operator[]}} are always preferable for performance reasons.

We should consider removing {{hashmap::get}} since it can easily lead to 
expensive code.
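
The cost described above can be illustrated with a minimal sketch. It uses 
std::unordered_map and std::optional as stand-ins for stout's hashmap and 
Option; the helper names getCopy, countViaCopy and countViaFind are 
hypothetical and not part of stout.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for a stout-style `get`: returns a *copy* of the mapped value
// wrapped in an optional (assumption: this mirrors `hashmap::get`).
template <typename K, typename V>
std::optional<V> getCopy(const std::unordered_map<K, V>& map, const K& key)
{
  auto it = map.find(key);
  if (it == map.end()) {
    return std::nullopt;
  }
  return it->second; // Copies the whole mapped value.
}

using Tasks = std::vector<std::string>;

// Copies the full vector just to read its size.
std::size_t countViaCopy(
    const std::unordered_map<std::string, Tasks>& map,
    const std::string& key)
{
  std::optional<Tasks> tasks = getCopy(map, key);
  return tasks ? tasks->size() : 0;
}

// One lookup and no copy: reads the size through the iterator.
std::size_t countViaFind(
    const std::unordered_map<std::string, Tasks>& map,
    const std::string& key)
{
  auto it = map.find(key);
  return it != map.end() ? it->second.size() : 0;
}
```

Both functions return the same result; only countViaCopy pays for copying the 
mapped value, which grows with the size of the stored container.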





[jira] [Created] (MESOS-9999) Update scheduler API documentation for re-reservations

2019-10-01 Thread Benjamin Bannier (Jira)
Benjamin Bannier created MESOS-9999:
---

 Summary: Update scheduler API documentation for re-reservations
 Key: MESOS-9999
 URL: https://issues.apache.org/jira/browse/MESOS-9999
 Project: Mesos
  Issue Type: Task
Reporter: Benjamin Bannier








[jira] [Created] (MESOS-9998) End to end test of framework exercising re-reservations

2019-10-01 Thread Benjamin Bannier (Jira)
Benjamin Bannier created MESOS-9998:
---

 Summary: End to end test of framework exercising re-reservations
 Key: MESOS-9998
 URL: https://issues.apache.org/jira/browse/MESOS-9998
 Project: Mesos
  Issue Type: Task
Reporter: Benjamin Bannier








[jira] [Created] (MESOS-9997) Remove 'contains' CHECKS in 'update' in the 'Sorter' impls

2019-10-01 Thread Benjamin Bannier (Jira)
Benjamin Bannier created MESOS-9997:
---

 Summary: Remove 'contains' CHECKS in 'update' in the 'Sorter' impls
 Key: MESOS-9997
 URL: https://issues.apache.org/jira/browse/MESOS-9997
 Project: Mesos
  Issue Type: Task
Reporter: Benjamin Bannier


We need to be able to update allocations so they can change a {{Resource}}'s 
reservation role. For that we need to remove existing {{contains}} checks in 
{{Sorter}} implementations.





[jira] [Created] (MESOS-9995) Update 'injectAllocationInfo' to also update a reservation's 'source' field

2019-10-01 Thread Benjamin Bannier (Jira)
Benjamin Bannier created MESOS-9995:
---

 Summary: Update 'injectAllocationInfo' to also update a 
reservation's 'source' field
 Key: MESOS-9995
 URL: https://issues.apache.org/jira/browse/MESOS-9995
 Project: Mesos
  Issue Type: Task
Reporter: Benjamin Bannier








[jira] [Created] (MESOS-9996) Update 'internal::protobuf::stripAllocationInfo' for 'source' in RESERVE operations

2019-10-01 Thread Benjamin Bannier (Jira)
Benjamin Bannier created MESOS-9996:
---

 Summary: Update 'internal::protobuf::stripAllocationInfo' for 
'source' in RESERVE operations
 Key: MESOS-9996
 URL: https://issues.apache.org/jira/browse/MESOS-9996
 Project: Mesos
  Issue Type: Task
Reporter: Benjamin Bannier








[jira] [Created] (MESOS-9994) Update master scheduler call validation for 'source' in RESERVE operations

2019-10-01 Thread Benjamin Bannier (Jira)
Benjamin Bannier created MESOS-9994:
---

 Summary: Update master scheduler call validation for 'source' in 
RESERVE operations
 Key: MESOS-9994
 URL: https://issues.apache.org/jira/browse/MESOS-9994
 Project: Mesos
  Issue Type: Task
 Environment: We need to ensure that {{source}} and {{resources}} have 
a common ancestor.
Reporter: Benjamin Bannier








[jira] [Created] (MESOS-9993) Update operator API documentation for re-reservations

2019-10-01 Thread Benjamin Bannier (Jira)
Benjamin Bannier created MESOS-9993:
---

 Summary: Update operator API documentation for re-reservations
 Key: MESOS-9993
 URL: https://issues.apache.org/jira/browse/MESOS-9993
 Project: Mesos
  Issue Type: Task
Reporter: Benjamin Bannier








[jira] [Created] (MESOS-9992) Add end-to-end test exercising re-reservation operator API

2019-10-01 Thread Benjamin Bannier (Jira)
Benjamin Bannier created MESOS-9992:
---

 Summary: Add end-to-end test exercising re-reservation operator 
API
 Key: MESOS-9992
 URL: https://issues.apache.org/jira/browse/MESOS-9992
 Project: Mesos
  Issue Type: Task
Reporter: Benjamin Bannier








[jira] [Created] (MESOS-9991) Update 'Master::authorizeReserveResources' for re-reservations

2019-10-01 Thread Benjamin Bannier (Jira)
Benjamin Bannier created MESOS-9991:
---

 Summary: Update 'Master::authorizeReserveResources' for 
re-reservations
 Key: MESOS-9991
 URL: https://issues.apache.org/jira/browse/MESOS-9991
 Project: Mesos
  Issue Type: Task
Reporter: Benjamin Bannier


We need to authorize all modifications to bring {{source}} to the common 
ancestor, and from the common ancestor to {{resources}}.
 * each removed reservation needs to be authorized as an {{unreserve}} 
operation
 * each added reservation needs to be authorized as a {{reserve}} operation
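
The two rules above can be sketched as follows. This is only an illustration 
under simplifying assumptions: reservation stacks are modeled as plain role 
names (outermost first), and the names Stack, Authorizations and 
requiredAuthorizations are hypothetical, not Mesos APIs.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Reservation stacks modeled as role names, outermost first (an assumption).
using Stack = std::vector<std::string>;

struct Authorizations
{
  std::vector<std::string> unreserve; // Innermost first, i.e., pop order.
  std::vector<std::string> reserve;   // Outermost first, i.e., push order.
};

Authorizations requiredAuthorizations(const Stack& source, const Stack& target)
{
  // The common ancestor is the longest common prefix of both stacks.
  const auto mismatch = std::mismatch(
      source.begin(), source.end(), target.begin(), target.end());
  const std::size_t common = mismatch.first - source.begin();

  Authorizations result;

  // Everything in `source` beyond the ancestor is removed, innermost first.
  result.unreserve.assign(source.rbegin(), source.rend() - common);

  // Everything in `target` beyond the ancestor is added, outermost first.
  result.reserve.assign(target.begin() + common, target.end());

  return result;
}
```

For a source stack a, a/b, a/b/c and a target stack a, a/d this yields 
unreserve authorizations for a/b/c and a/b, and a reserve authorization for 
a/d.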





[jira] [Created] (MESOS-9990) Consolidate 'Master::authorizeReserveResources' overloads

2019-10-01 Thread Benjamin Bannier (Jira)
Benjamin Bannier created MESOS-9990:
---

 Summary: Consolidate 'Master::authorizeReserveResources' overloads
 Key: MESOS-9990
 URL: https://issues.apache.org/jira/browse/MESOS-9990
 Project: Mesos
  Issue Type: Task
Reporter: Benjamin Bannier


We should remove {{Master::authorizeReserveResources(Resources, 
Option)}} in favor of {{Master::authorizeReserveResources(Reserve, 
Option)}}.





[jira] [Created] (MESOS-9989) Update 'Master::Http::_reserve' to pass 'source' into generated operation

2019-10-01 Thread Benjamin Bannier (Jira)
Benjamin Bannier created MESOS-9989:
---

 Summary: Update 'Master::Http::_reserve' to pass 'source' into 
generated operation
 Key: MESOS-9989
 URL: https://issues.apache.org/jira/browse/MESOS-9989
 Project: Mesos
  Issue Type: Task
Reporter: Benjamin Bannier








[jira] [Created] (MESOS-9988) Add 'source' field to scheduler reservation API

2019-10-01 Thread Benjamin Bannier (Jira)
Benjamin Bannier created MESOS-9988:
---

 Summary: Add 'source' field to scheduler reservation API
 Key: MESOS-9988
 URL: https://issues.apache.org/jira/browse/MESOS-9988
 Project: Mesos
  Issue Type: Task
Reporter: Benjamin Bannier








[jira] [Created] (MESOS-9987) Update 'Master::Http::_reserve' to also require 'source' resources

2019-10-01 Thread Benjamin Bannier (Jira)
Benjamin Bannier created MESOS-9987:
---

 Summary: Update 'Master::Http::_reserve' to also require 'source' 
resources
 Key: MESOS-9987
 URL: https://issues.apache.org/jira/browse/MESOS-9987
 Project: Mesos
  Issue Type: Task
Reporter: Benjamin Bannier


We need to always pass {{source}} into {{Master::Http::_reserve}}.





[jira] [Created] (MESOS-9986) Update 'getConsumedResources' and 'getResourceConversions' for 'source' in reservations

2019-10-01 Thread Benjamin Bannier (Jira)
Benjamin Bannier created MESOS-9986:
---

 Summary: Update 'getConsumedResources' and 
'getResourceConversions' for 'source' in reservations
 Key: MESOS-9986
 URL: https://issues.apache.org/jira/browse/MESOS-9986
 Project: Mesos
  Issue Type: Task
Reporter: Benjamin Bannier








[jira] [Created] (MESOS-9985) Update validation of 'ReserveResources' for 'source'

2019-10-01 Thread Benjamin Bannier (Jira)
Benjamin Bannier created MESOS-9985:
---

 Summary: Update validation of 'ReserveResources' for 'source'
 Key: MESOS-9985
 URL: https://issues.apache.org/jira/browse/MESOS-9985
 Project: Mesos
  Issue Type: Task
Reporter: Benjamin Bannier


We need to update {{master::validation::master::call}} for {{source}}. In 
particular we need to require that {{source}} and {{resources}} have a common 
ancestor.





[jira] [Created] (MESOS-9984) Provide a function to compute a common "reservation ancestor" between two 'Resources'

2019-10-01 Thread Benjamin Bannier (Jira)
Benjamin Bannier created MESOS-9984:
---

 Summary: Provide a function to compute a common "reservation 
ancestor" between two 'Resources'
 Key: MESOS-9984
 URL: https://issues.apache.org/jira/browse/MESOS-9984
 Project: Mesos
  Issue Type: Task
Reporter: Benjamin Bannier


We need to provide a function to compute a common "reservation ancestor" 
between two resources, {{Try getReservationAncestor(const 
Resources&, const Resources&)}}.

The common ancestor can be found by repeatedly popping dynamic reservations 
from the full {{Resources}}.

We should test the following cases:
 * either LHS or RHS empty
 * both empty -> empty ancestor
 * {{STATIC}} reservations on path
 * partially reserved LHS/RHS (partially reserved: not all {{Resource}} have 
the same reservation).
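
A minimal sketch of the popping approach, under stated assumptions: each side 
is reduced to a single reservation stack (so partially reserved inputs are 
not modeled), Reservation and Stack are simplified stand-ins rather than the 
Mesos protobuf types, and std::optional stands in for Try. Returning an error 
when a {{STATIC}} reservation would have to be popped is an assumed policy 
that the ticket does not specify.

```cpp
#include <optional>
#include <string>
#include <vector>

// Simplified stand-in for a reservation on a resource.
struct Reservation
{
  std::string role;
  bool dynamic;

  bool operator==(const Reservation& that) const
  {
    return role == that.role && dynamic == that.dynamic;
  }
};

// Reservation stack, outermost reservation first.
using Stack = std::vector<Reservation>;

// Pops dynamic reservations until both stacks are equal. Returns nullopt
// if a static reservation would have to be popped (assumed error policy).
std::optional<Stack> getReservationAncestor(Stack lhs, Stack rhs)
{
  while (lhs != rhs) {
    // Pop from the deeper stack; on equal depth pop from both sides.
    const bool popLhs = lhs.size() >= rhs.size();
    const bool popRhs = rhs.size() >= lhs.size();

    if (popLhs) {
      if (!lhs.back().dynamic) {
        return std::nullopt;
      }
      lhs.pop_back();
    }

    if (popRhs) {
      if (!rhs.back().dynamic) {
        return std::nullopt;
      }
      rhs.pop_back();
    }
  }

  return lhs;
}
```

Two empty stacks compare equal immediately, so the "both empty" case yields 
an empty ancestor without entering the loop.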





[jira] [Created] (MESOS-9983) Intermediate rejection of Reserve operations with source set

2019-10-01 Thread Benjamin Bannier (Jira)
Benjamin Bannier created MESOS-9983:
---

 Summary: Intermediate rejection of Reserve operations with source 
set
 Key: MESOS-9983
 URL: https://issues.apache.org/jira/browse/MESOS-9983
 Project: Mesos
  Issue Type: Task
Reporter: Benjamin Bannier


We need to update {{Master::authorizeReserveResources}} to reject any 
{{Reserve}} operation whenever {{source}} is set until we have a proper 
implementation in place.





[jira] [Created] (MESOS-9982) Add a 'source' field to operator API ReserveResources protobuf

2019-10-01 Thread Benjamin Bannier (Jira)
Benjamin Bannier created MESOS-9982:
---

 Summary: Add a 'source' field to operator API ReserveResources 
protobuf
 Key: MESOS-9982
 URL: https://issues.apache.org/jira/browse/MESOS-9982
 Project: Mesos
  Issue Type: Task
  Components: HTTP API
Reporter: Benjamin Bannier








[jira] [Created] (MESOS-9981) Introduce a Mesos API to update reservations

2019-10-01 Thread Benjamin Bannier (Jira)
Benjamin Bannier created MESOS-9981:
---

 Summary: Introduce a Mesos API to update reservations
 Key: MESOS-9981
 URL: https://issues.apache.org/jira/browse/MESOS-9981
 Project: Mesos
  Issue Type: Epic
Reporter: Benjamin Bannier
Assignee: Benjamin Bannier


We should introduce an API which allows updating resource reservations so 
e.g., persistent volumes can be moved between roles non-destructively.

Design doc: 
https://docs.google.com/document/d/1LFh0OkOEHslmK6xqok1fCn2MOqGefvNodusOOnV66Q4/





[jira] [Assigned] (MESOS-9961) Agent could fail to report completed tasks.

2019-10-01 Thread Benjamin Bannier (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-9961:
---

Assignee: (was: Benjamin Bannier)

> Agent could fail to report completed tasks.
> ---
>
> Key: MESOS-9961
> URL: https://issues.apache.org/jira/browse/MESOS-9961
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Meng Zhu
>Priority: Major
>  Labels: foundations
>
> When agent reregisters with a master, we don't report completed executors for 
> active frameworks. We only report completed executors if the framework is 
> also completed on the agent:
> https://github.com/apache/mesos/blob/1.7.x/src/slave/slave.cpp#L1785-L1832





[jira] [Commented] (MESOS-9961) Agent could fail to report completed tasks.

2019-10-01 Thread Benjamin Bannier (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941721#comment-16941721
 ] 

Benjamin Bannier commented on MESOS-9961:
-

It is unclear to me what the intended outcome for this ticket is. How would we 
use information on these completed executors, e.g., in {{GET_EXECUTOR}} calls? 
While we do provide information on completed frameworks (and their executors), 
the way they are used is very different from how we work on information on not 
yet completed frameworks (e.g., the master currently wouldn't be able to 
distinguish completed and not executed executors internally or on the API 
level).

I would suggest we scope this ticket better with a laundry list of outcomes 
this work should accomplish and ideally proposals on how this would affect APIs 
(e.g., {{ReregisterSlaveMessage}}, {{agent::GetExecutors}}, 
{{master::GetExecutors}}). We can then break down the work into smaller pieces. 
My initial assessment of 2 story points is way too low to capture this. 
[~greggomann], let's go back to the drawing board on this one.

> Agent could fail to report completed tasks.
> ---
>
> Key: MESOS-9961
> URL: https://issues.apache.org/jira/browse/MESOS-9961
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Meng Zhu
>Assignee: Benjamin Bannier
>Priority: Major
>  Labels: foundations
>
> When agent reregisters with a master, we don't report completed executors for 
> active frameworks. We only report completed executors if the framework is 
> also completed on the agent:
> https://github.com/apache/mesos/blob/1.7.x/src/slave/slave.cpp#L1785-L1832





[jira] [Comment Edited] (MESOS-9980) HierarchicalAllocatorTest.MaintenanceInverseOffers is flaky

2019-09-30 Thread Benjamin Bannier (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-9980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940894#comment-16940894
 ] 

Benjamin Bannier edited comment on MESOS-9980 at 9/30/19 1:16 PM:
--

Bisecting the success rate points to [https://reviews.apache.org/r/71440] as 
the source of the flakiness [~mzhu] [~asekretenko].
||revision||success rate||σ||
|1.2.0-rc1-6091-g73033130d|1.0|0.1|
|1.2.0-rc1-6141-g73033130d|1.0|0.1|
|1.2.0-rc1-6147-g48819af30|1.0|0.1|
|1.2.0-rc1-6150-g2ec34ca59|1.0|0.1|
|1.2.0-rc1-6151-gfdaabac78|1.0|0.1|
|1.2.0-rc1-6152-g09003294f|0.22581|0.09449|
|1.2.0-rc1-6153-g783fd45c5|0.20690|0.09279|
|1.2.0-rc1-6166-g3478e40c6|0.27907|0.09111|
|1.2.0-rc1-6194-gf1789b0fe|0.28571|0.09352|


was (Author: bbannier):
Bisecting the success rate with 
[https://gist.github.com/bbannier/7f219cf087bbcaaebe58ba97f56737ec] points to 
[https://reviews.apache.org/r/71440] as the source of the flakiness [~mzhu] 
[~asekretenko].
||revision||success rate||σ||
|1.2.0-rc1-6091-g73033130d|1.0|0.1|
|1.2.0-rc1-6141-g73033130d|1.0|0.1|
|1.2.0-rc1-6147-g48819af30|1.0|0.1|
|1.2.0-rc1-6150-g2ec34ca59|1.0|0.1|
|1.2.0-rc1-6151-gfdaabac78|1.0|0.1|
|1.2.0-rc1-6152-g09003294f|0.22581|0.09449|
|1.2.0-rc1-6153-g783fd45c5|0.20690|0.09279|
|1.2.0-rc1-6166-g3478e40c6|0.27907|0.09111|
|1.2.0-rc1-6194-gf1789b0fe|0.28571|0.09352|

> HierarchicalAllocatorTest.MaintenanceInverseOffers is flaky
> ---
>
> Key: MESOS-9980
> URL: https://issues.apache.org/jira/browse/MESOS-9980
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benjamin Bannier
>Priority: Major
>  Labels: allocator, resource-management
>
> This test seems to flake pretty quickly for me under system stress with 
> {{aed0b871479}},
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from HierarchicalAllocatorTest
> [ RUN  ] HierarchicalAllocatorTest.MaintenanceInverseOffers
> I0930 12:36:48.872493 17846 hierarchical.cpp:921] Added agent agent1 (agent1) 
> with cpus:2; mem:1024 (offered or allocated: {})
> I0930 12:36:48.873165 17856 hierarchical.cpp:663] Added framework framework1
> I0930 12:36:48.873967 17852 hierarchical.cpp:663] Added framework framework2
> I0930 12:36:48.874433 17848 hierarchical.cpp:921] Added agent agent2 (agent2) 
> with cpus:2; mem:1024 (offered or allocated: {})
> ../src/tests/hierarchical_allocator_tests.cpp:1148: Failure
> Value of: deallocations.get().isPending()
>   Actual: false
> Expected: true
> {noformat}
> When saturating my system with {{stress-ng}} this test only succeeds ~30% of 
> the time.





[jira] [Comment Edited] (MESOS-9980) HierarchicalAllocatorTest.MaintenanceInverseOffers is flaky

2019-09-30 Thread Benjamin Bannier (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-9980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940894#comment-16940894
 ] 

Benjamin Bannier edited comment on MESOS-9980 at 9/30/19 11:39 AM:
---

Bisecting the success rate with 
[https://gist.github.com/bbannier/7f219cf087bbcaaebe58ba97f56737ec] points to 
[https://reviews.apache.org/r/71440] as the source of the flakiness [~mzhu] 
[~asekretenko].
||revision||success rate||σ||
|1.2.0-rc1-6091-g73033130d|1.0|0.1|
|1.2.0-rc1-6141-g73033130d|1.0|0.1|
|1.2.0-rc1-6147-g48819af30|1.0|0.1|
|1.2.0-rc1-6150-g2ec34ca59|1.0|0.1|
|1.2.0-rc1-6151-gfdaabac78|1.0|0.1|
|1.2.0-rc1-6152-g09003294f|0.22581|0.09449|
|1.2.0-rc1-6153-g783fd45c5|0.20690|0.09279|
|1.2.0-rc1-6166-g3478e40c6|0.27907|0.09111|
|1.2.0-rc1-6194-gf1789b0fe|0.28571|0.09352|


was (Author: bbannier):
Bisecting the failure rate with 
[https://gist.github.com/bbannier/7f219cf087bbcaaebe58ba97f56737ec] points to 
[https://reviews.apache.org/r/71440] as the source of the flakiness [~mzhu] 
[~asekretenko].
||revision||success rate||σ||
|1.2.0-rc1-6091-g73033130d|1.0|0.1|
|1.2.0-rc1-6141-g73033130d|1.0|0.1|
|1.2.0-rc1-6147-g48819af30|1.0|0.1|
|1.2.0-rc1-6150-g2ec34ca59|1.0|0.1|
|1.2.0-rc1-6151-gfdaabac78|1.0|0.1|
|1.2.0-rc1-6152-g09003294f|0.22581|0.09449|
|1.2.0-rc1-6153-g783fd45c5|0.20690|0.09279|
|1.2.0-rc1-6166-g3478e40c6|0.27907|0.09111|
|1.2.0-rc1-6194-gf1789b0fe|0.28571|0.09352|

> HierarchicalAllocatorTest.MaintenanceInverseOffers is flaky
> ---
>
> Key: MESOS-9980
> URL: https://issues.apache.org/jira/browse/MESOS-9980
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benjamin Bannier
>Priority: Major
>  Labels: allocator, resource-management
>
> This test seems to flake pretty quickly for me under system stress with 
> {{aed0b871479}},
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from HierarchicalAllocatorTest
> [ RUN  ] HierarchicalAllocatorTest.MaintenanceInverseOffers
> I0930 12:36:48.872493 17846 hierarchical.cpp:921] Added agent agent1 (agent1) 
> with cpus:2; mem:1024 (offered or allocated: {})
> I0930 12:36:48.873165 17856 hierarchical.cpp:663] Added framework framework1
> I0930 12:36:48.873967 17852 hierarchical.cpp:663] Added framework framework2
> I0930 12:36:48.874433 17848 hierarchical.cpp:921] Added agent agent2 (agent2) 
> with cpus:2; mem:1024 (offered or allocated: {})
> ../src/tests/hierarchical_allocator_tests.cpp:1148: Failure
> Value of: deallocations.get().isPending()
>   Actual: false
> Expected: true
> {noformat}
> When saturating my system with {{stress-ng}} this test only succeeds ~30% of 
> the time.





[jira] [Commented] (MESOS-9980) HierarchicalAllocatorTest.MaintenanceInverseOffers is flaky

2019-09-30 Thread Benjamin Bannier (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-9980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940894#comment-16940894
 ] 

Benjamin Bannier commented on MESOS-9980:
-

Bisecting the failure rate with 
[https://gist.github.com/bbannier/7f219cf087bbcaaebe58ba97f56737ec] points to 
[https://reviews.apache.org/r/71440] as the source of the flakiness [~mzhu] 
[~asekretenko].
||revision||success rate||σ||
|1.2.0-rc1-6091-g73033130d|1.0|0.1|
|1.2.0-rc1-6141-g73033130d|1.0|0.1|
|1.2.0-rc1-6147-g48819af30|1.0|0.1|
|1.2.0-rc1-6150-g2ec34ca59|1.0|0.1|
|1.2.0-rc1-6151-gfdaabac78|1.0|0.1|
|1.2.0-rc1-6152-g09003294f|0.22581|0.09449|
|1.2.0-rc1-6153-g783fd45c5|0.20690|0.09279|
|1.2.0-rc1-6166-g3478e40c6|0.27907|0.09111|
|1.2.0-rc1-6194-gf1789b0fe|0.28571|0.09352|

> HierarchicalAllocatorTest.MaintenanceInverseOffers is flaky
> ---
>
> Key: MESOS-9980
> URL: https://issues.apache.org/jira/browse/MESOS-9980
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benjamin Bannier
>Priority: Major
>  Labels: allocator, resource-management
>
> This test seems to flake pretty quickly for me under system stress with 
> {{aed0b871479}},
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from HierarchicalAllocatorTest
> [ RUN  ] HierarchicalAllocatorTest.MaintenanceInverseOffers
> I0930 12:36:48.872493 17846 hierarchical.cpp:921] Added agent agent1 (agent1) 
> with cpus:2; mem:1024 (offered or allocated: {})
> I0930 12:36:48.873165 17856 hierarchical.cpp:663] Added framework framework1
> I0930 12:36:48.873967 17852 hierarchical.cpp:663] Added framework framework2
> I0930 12:36:48.874433 17848 hierarchical.cpp:921] Added agent agent2 (agent2) 
> with cpus:2; mem:1024 (offered or allocated: {})
> ../src/tests/hierarchical_allocator_tests.cpp:1148: Failure
> Value of: deallocations.get().isPending()
>   Actual: false
> Expected: true
> {noformat}
> When saturating my system with {{stress-ng}} this test only succeeds ~30% of 
> the time.





[jira] [Created] (MESOS-9980) HierarchicalAllocatorTest.MaintenanceInverseOffers is flaky

2019-09-30 Thread Benjamin Bannier (Jira)
Benjamin Bannier created MESOS-9980:
---

 Summary: HierarchicalAllocatorTest.MaintenanceInverseOffers is 
flaky
 Key: MESOS-9980
 URL: https://issues.apache.org/jira/browse/MESOS-9980
 Project: Mesos
  Issue Type: Bug
Reporter: Benjamin Bannier


This test seems to flake pretty quickly for me under system stress with 
{{aed0b871479}},
{noformat}
[==] Running 1 test from 1 test case.
[--] Global test environment set-up.
[--] 1 test from HierarchicalAllocatorTest
[ RUN  ] HierarchicalAllocatorTest.MaintenanceInverseOffers
I0930 12:36:48.872493 17846 hierarchical.cpp:921] Added agent agent1 (agent1) 
with cpus:2; mem:1024 (offered or allocated: {})
I0930 12:36:48.873165 17856 hierarchical.cpp:663] Added framework framework1
I0930 12:36:48.873967 17852 hierarchical.cpp:663] Added framework framework2
I0930 12:36:48.874433 17848 hierarchical.cpp:921] Added agent agent2 (agent2) 
with cpus:2; mem:1024 (offered or allocated: {})
../src/tests/hierarchical_allocator_tests.cpp:1148: Failure
Value of: deallocations.get().isPending()
  Actual: false
Expected: true
{noformat}

When saturating my system with {{stress-ng}} this test only succeeds ~30% of the 
time.





[jira] [Assigned] (MESOS-9961) Agent could fail to report completed tasks.

2019-09-25 Thread Benjamin Bannier (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-9961:
---

Assignee: Benjamin Bannier

> Agent could fail to report completed tasks.
> ---
>
> Key: MESOS-9961
> URL: https://issues.apache.org/jira/browse/MESOS-9961
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Meng Zhu
>Assignee: Benjamin Bannier
>Priority: Major
>  Labels: foundations
>
> When agent reregisters with a master, we don't report completed executors for 
> active frameworks. We only report completed executors if the framework is 
> also completed on the agent:
> https://github.com/apache/mesos/blob/1.7.x/src/slave/slave.cpp#L1785-L1832





[jira] [Created] (MESOS-9978) Nvml isolator cannot be disabled which makes it impossible to exclude non-free code

2019-09-24 Thread Benjamin Bannier (Jira)
Benjamin Bannier created MESOS-9978:
---

 Summary: Nvml isolator cannot be disabled which makes it 
impossible to exclude non-free code
 Key: MESOS-9978
 URL: https://issues.apache.org/jira/browse/MESOS-9978
 Project: Mesos
  Issue Type: Bug
  Components: build
Reporter: Benjamin Bannier
Assignee: Benjamin Bannier


We currently do not allow disabling of the {{nvml}} isolator which depends on a 
very likely non-free license. This might make it hard to include Mesos at all 
in distributions requiring only free licenses.

We should add a configuration time flag to disable this feature completely 
until we can provide a free replacement.





[jira] [Assigned] (MESOS-9968) WWWAuthenticate header parsing fails when commas are in (quoted) realm

2019-09-20 Thread Benjamin Bannier (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-9968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-9968:
---

Assignee: Benjamin Bannier

> WWWAuthenticate header parsing fails when commas are in (quoted) realm
> --
>
> Key: MESOS-9968
> URL: https://issues.apache.org/jira/browse/MESOS-9968
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API, libprocess
>Reporter: Jan Schlicht
>Assignee: Benjamin Bannier
>Priority: Major
>
> This was discovered when trying to launch the 
> {{[nvcr.io/nvidia/tensorflow:19.08-py3|http://nvcr.io/nvidia/tensorflow:19.08-py3]}}
>  image using the Mesos containerizer. This launch fails with
> {noformat}
> Failed to launch container: Failed to get WWW-Authenticate header: Unexpected 
> auth-param format: 
> 'realm="https://nvcr.io/proxy_auth?scope=repository:nvidia/tensorflow:pull' 
> in 
> 'realm="https://nvcr.io/proxy_auth?scope=repository:nvidia/tensorflow:pull,push;'
> {noformat}
> This is because the [header tokenization in 
> libprocess|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/http.cpp#L640]
>  can't handle commas in quoted realm values.
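
One way to tokenize such a header is to split on commas only outside double 
quotes. The sketch below illustrates that idea; splitOutsideQuotes is a 
hypothetical helper, not the libprocess implementation or its eventual fix.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits `input` on `delimiter`, ignoring delimiters inside double quotes.
// A backslash inside a quoted string escapes the following character.
std::vector<std::string> splitOutsideQuotes(
    const std::string& input, char delimiter = ',')
{
  std::vector<std::string> tokens;
  std::string current;
  bool quoted = false;

  for (std::size_t i = 0; i < input.size(); ++i) {
    const char c = input[i];

    if (quoted && c == '\\' && i + 1 < input.size()) {
      // Keep the escape sequence verbatim, including an escaped quote.
      current += c;
      current += input[++i];
    } else if (c == '"') {
      quoted = !quoted;
      current += c;
    } else if (c == delimiter && !quoted) {
      tokens.push_back(current);
      current.clear();
    } else {
      current += c;
    }
  }

  tokens.push_back(current);
  return tokens;
}
```

With this approach a realm value containing "pull,push" inside quotes stays 
in a single token, while an unquoted comma between two auth-params still 
separates them.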





[jira] [Comment Edited] (MESOS-9948) master::Slave::hasExecutor occupies 37% of a 150 second perf sample from a user.

2019-09-19 Thread Benjamin Bannier (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933393#comment-16933393
 ] 

Benjamin Bannier edited comment on MESOS-9948 at 9/19/19 1:44 PM:
--

Looking at the trace more carefully the issue here _is not_ with inefficient 
lookups, but due to the creation of temporaries. The user likely has frameworks 
with lots of executors and by accessing the framework's executors with 
{{Option::get}} we force creation of a temporary copying the whole executor 
map, even though a reference would work just fine.
{code}
return executors.contains(frameworkId) &&
  executors.get(frameworkId)->contains(executorId);
{code}

This does not seem to be an isolated instance of this misuse, just one 
with particularly inefficient behavior. We should consider whether providing 
{{hashmap::get}} at all is useful given its potential for misuse.
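
A copy-free variant of the check above could look like the following sketch. 
It uses std::unordered_map and std::unordered_set with string IDs as 
simplified stand-ins for the master's actual data structures; the Executors 
alias and this free-function hasExecutor are assumptions for illustration.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>

// Framework ID -> set of executor IDs (simplified stand-in types).
using Executors =
  std::unordered_map<std::string, std::unordered_set<std::string>>;

bool hasExecutor(
    const Executors& executors,
    const std::string& frameworkId,
    const std::string& executorId)
{
  // Single lookup of the framework; the inner container is accessed
  // through the iterator, so no temporary copy is created.
  auto framework = executors.find(frameworkId);

  return framework != executors.end() &&
         framework->second.count(executorId) > 0;
}
```

This performs two lookups in total and never copies the per-framework 
executor container.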

Reviews:
https://reviews.apache.org/r/71519/
https://reviews.apache.org/r/71520/


was (Author: bbannier):
Looking at the trace more carefully the issue here _is not_ with inefficient 
lookups, but due to the creation of temporaries. The user likely has frameworks 
with lots of executors and by accessing the framework's executors with 
{{Option::get}} we force creation of a temporary copying the whole executor 
map, even though a reference would work just fine.
{code}
return executors.contains(frameworkId) &&
  executors.get(frameworkId)->contains(executorId);
{code}

This seems not seem to be an isolated instance of this misuse, but only one 
with particularly inefficient case. We should consider whether providing 
{{hashmap::get}} at all is useful given its potential for misuse.

Reviews:
https://reviews.apache.org/r/71519/
https://reviews.apache.org/r/71520/

> master::Slave::hasExecutor occupies 37% of a 150 second perf sample from a 
> user.
> 
>
> Key: MESOS-9948
> URL: https://issues.apache.org/jira/browse/MESOS-9948
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Benjamin Mahler
>Assignee: Benjamin Bannier
>Priority: Major
>  Labels: foundations, performance
> Attachments: long-fei-enable-debug-slow-master.gz
>
>
> If you drop the attached perf stacks into flamescope, you can see that 
> mesos::internal::master::Slave::hasExecutor occupies 37% of the overall 
> samples!
> This function does 3 hashmap lookups, 1 can be eliminated for a quick win. 
> However, the larger improvement here will come from eliminating many of the 
> calls to this function.
> This was reported by [~carlone].





[jira] [Commented] (MESOS-9948) master::Slave::hasExecutor occupies 37% of a 150 second perf sample from a user.

2019-09-19 Thread Benjamin Bannier (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933393#comment-16933393
 ] 

Benjamin Bannier commented on MESOS-9948:
-

Looking at the trace more carefully the issue here _is not_ with inefficient 
lookups, but due to the creation of temporaries. The user likely has frameworks 
with lots of executors and by accessing the framework's executors with 
{{Option::get}} we force creation of a temporary copying the whole executor 
map, even though a reference would work just fine.
{code}
return executors.contains(frameworkId) &&
  executors.get(frameworkId)->contains(executorId);
{code}

This does not seem to be an isolated instance of this misuse, but only one 
with particularly inefficient behavior. We should consider whether providing 
{{hashmap::get}} at all is useful given its potential for misuse.

Reviews:
https://reviews.apache.org/r/71519/
https://reviews.apache.org/r/71520/

> master::Slave::hasExecutor occupies 37% of a 150 second perf sample from a 
> user.
> 
>
> Key: MESOS-9948
> URL: https://issues.apache.org/jira/browse/MESOS-9948
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Benjamin Mahler
>Assignee: Benjamin Bannier
>Priority: Major
>  Labels: foundations, performance
> Attachments: long-fei-enable-debug-slow-master.gz
>
>
> If you drop the attached perf stacks into flamescope, you can see that 
> mesos::internal::master::Slave::hasExecutor occupies 37% of the overall 
> samples!
> This function does 3 hashmap lookups, 1 can be eliminated for a quick win. 
> However, the larger improvement here will come from eliminating many of the 
> calls to this function.
> This was reported by [~carlone].





[jira] [Assigned] (MESOS-9948) master::Slave::hasExecutor occupies 37% of a 150 second perf sample from a user.

2019-09-19 Thread Benjamin Bannier (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-9948:
---

Assignee: Benjamin Bannier

> master::Slave::hasExecutor occupies 37% of a 150 second perf sample from a 
> user.
> 
>
> Key: MESOS-9948
> URL: https://issues.apache.org/jira/browse/MESOS-9948
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Benjamin Mahler
>Assignee: Benjamin Bannier
>Priority: Major
>  Labels: foundations, performance
> Attachments: long-fei-enable-debug-slow-master.gz
>
>
> If you drop the attached perf stacks into flamescope, you can see that 
> mesos::internal::master::Slave::hasExecutor occupies 37% of the overall 
> samples!
> This function does 3 hashmap lookups, 1 can be eliminated for a quick win. 
> However, the larger improvement here will come from eliminating many of the 
> calls to this function.
> This was reported by [~carlone].





[jira] [Commented] (MESOS-9630) Consider moving linter setup to pre-commit

2019-09-18 Thread Benjamin Bannier (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-9630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932300#comment-16932300
 ] 

Benjamin Bannier commented on MESOS-9630:
-

Moving r/71300 to MESOS-9974 so this ticket can be closed.

> Consider moving linter setup to pre-commit
> --
>
> Key: MESOS-9630
> URL: https://issues.apache.org/jira/browse/MESOS-9630
> Project: Mesos
>  Issue Type: Wish
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Minor
>
> Mesos currently uses a mix of hand-crafted git commit hooks and mesos-style 
> to perform linting. While this has served us well our current approach also 
> has some drawbacks, e.g.,
>  * the linter setup is spread between hooks and {{support/mesos-style.py}}
>  * adding new linters can be cumbersome
>  * mesos-style.py creates a single virtualenv to install linters in, which 
> is tied to the source tree
>  * linter dependencies are only cached to an extent and it is easy to run 
> into a situation where one needs to update linter dependencies over the 
> network even though one has successfully linted a revision before
>  * {{support/mesos-style.py}} lacks a number of features, e.g., running over 
> only staged files, running linters in parallel for improved throughput, 
> running only specific linters or disabling certain linters, and the 
> parameterization of the linters is strongly coupled to implementation of the 
> style checker itself.
> The [pre-commit tool|https://pre-commit.com/] solves most of these issues and 
> using it in Mesos would not only allow us to get rid of tooling which is hard 
> to maintain, but also unlock other features. It is licensed under an MIT 
> license. We should consider moving our linting setup over to pre-commit.





[jira] [Created] (MESOS-9974) Remove support/mesos-style.py transition script

2019-09-18 Thread Benjamin Bannier (Jira)
Benjamin Bannier created MESOS-9974:
---

 Summary: Remove support/mesos-style.py transition script
 Key: MESOS-9974
 URL: https://issues.apache.org/jira/browse/MESOS-9974
 Project: Mesos
  Issue Type: Task
Affects Versions: 1.10
Reporter: Benjamin Bannier
Assignee: Benjamin Bannier


In MESOS-9630 we moved our linter stack to pre-commit. We still have a 
dummy script {{support/mesos-style.py}} in tree instructing developers to 
migrate. We should remove it before releasing {{1.10.0}}, but allow enough 
transition time for developers to migrate their setups.





[jira] [Commented] (MESOS-9630) Consider moving linter setup to pre-commit

2019-09-18 Thread Benjamin Bannier (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-9630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932299#comment-16932299
 ] 

Benjamin Bannier commented on MESOS-9630:
-

The following patches have landed on {{master}} (1.10-dev):
{noformat}
commit fb467a03cb8a5bde5147dc06ca6b73c9df04ff48
Author: Benjamin Bannier 
Date:   Wed Sep 18 11:37:19 2019 +0200

Enabled a number of additional pre-commit checks.

This patch enables checkers for well-formed YAML and JSON, and a linter
which checks that all executable scripts have a valid shebang line.

Review: https://reviews.apache.org/r/71209/

commit 2af339668fd90212999bae06a050a05824f2971e
Author: Benjamin Bannier 
Date:   Wed Sep 18 11:37:18 2019 +0200

Revert "Updated cpplint to be compatible with Python 3."

This reverts commit 89db66e3df831eaa50fffb4149a3894097505c14.

This patch was necessary when we were running cpplint in the python3
environment used e.g., also for bindings and other scripts. With
pre-commit we have freedom to choose the Python environment needed so we
can undo our adjustments here to stay closer to upstream.

Review: https://reviews.apache.org/r/71208/

commit 3478e40c656160b8f08e0ad8c154289417bb6aaa
Author: Benjamin Bannier 
Date:   Wed Sep 18 11:37:17 2019 +0200

Revert "Updated cpplint.py to be less verbose when there is no linting 
issue."

This reverts commit c0f8f56d5a93f3fb870e448fedfd22f1491356ca.

This patch was necessary when we were running cpplint via
`support/mesos-style.py` to prevent it from cluttering up the hook
output. When running under pre-commit linter output is not shown if no
errors occur so we can undo our change to stay closer to upstream.

Review: https://reviews.apache.org/r/71207/

commit 37d76fff124d28a0281b9231058bb1b92fc65abe
Author: Benjamin Bannier 
Date:   Wed Sep 18 11:37:15 2019 +0200

Removed old mesos-style and references.

This patch removes references to `support/mesos-style.py` which was
replaced with a pre-commit setup in a previous commit. We also remove
the tool itself.

Review: https://reviews.apache.org/r/71206/

commit 454661dd0dcbb7a7bc87ac58ad74fd6dd04c5c15
Author: Benjamin Bannier 
Date:   Wed Sep 18 11:37:14 2019 +0200

Switched commit hooks to pre-commit.

This patch switches commit hooks to be orchestrated by the pre-commit
tool mirroring the previous linters invoked through git commit
hooks (orchestrated by `support/mesos-style.py` or standalone hooks).

Using pre-commit removes the burden of maintaining
`support/mesos-style.py`, making sure that hooks have the expected
environment (e.g., Python version, Node installed). Additionally,
upstream provides a number of additional linters which are not hard to
add to Mesos' hooks.

Review: https://reviews.apache.org/r/71205/

commit a138c2bd7cb3749f1dceb0e520e1138536abb531
Author: Benjamin Bannier 
Date:   Wed Sep 18 11:37:13 2019 +0200

Added separate script to install developer setup.

This patch breaks the installation of developer tools (i.e., linter
configuration files and git hooks) out of `./bootstrap`. This not only
simplifies and streamlines the setup, but will allow us to add
developer-only features without breaking users who are just interested
in building a distribution tarball.

Review: https://reviews.apache.org/r/71299/

commit cbaca81a54720771662c119c80aec6101f120afc
Author: Benjamin Bannier 
Date:   Wed Sep 18 11:37:11 2019 +0200

Added gitlint config.

This patch adds a config for the gitlint tool which is slated to replace
a custom commit-msg hook once we switch our hook infrastructure to the
pre-commit tool.

Review: https://reviews.apache.org/r/71204/

commit 526043b586da0201fd7e374197139e75b249e299
Author: Benjamin Bannier 
Date:   Wed Sep 18 11:37:10 2019 +0200

Added check script to check for license headers.

This check adds a script which validates that source files have valid
license headers. This will allow us to reuse this functionality with
e.g., the pre-commit tool.

At the moment the code added here is not invoked from
`support/mesos-style.py` since it will be removed in a follow-up commit.

Review: https://reviews.apache.org/r/71203/

commit 2232d48ce5b07c4e094c6850cb28212495824110
Author: Benjamin Bannier 
Date:   Wed Sep 18 11:37:09 2019 +0200

Moved cpplint configuration into dedicated file.

With this change we not only reduce the amount of code in
`support/mesos-style.py` in favor of a configuration supported by
upstream, but we also make it easier to interoperate with editor
integrations for cpplint.

Review: https://reviews.apache.org/r/70096/
{noformat}

> Consider moving linter setup to pre-commit
> 

[jira] [Commented] (MESOS-9798) How to reduce compile time after had changed/improved source code?

2019-09-17 Thread Benjamin Bannier (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-9798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931672#comment-16931672
 ] 

Benjamin Bannier commented on MESOS-9798:
-

[~rchatsiri], you should take advantage of parallelized processing when 
invoking {{make}}, i.e., your step (3) above could be (assuming your dev 
machine has 12 cores)
{noformat}
$ make -j 12
{noformat}

This assumes that your {{MAKEFLAGS}} environment variable does not already 
contain {{-j 12}} or similar. With this flag {{make}} runs up to 12 
compilation processes in parallel; steps like linking (e.g., of {{libmesos}}) 
are still mostly sequential and become a bottleneck for highly parallelized 
builds (linking {{libmesos}} can take up to a minute depending on your 
hardware, linker, and flags).

> How to reduce compile time after had changed/improved source code?
> --
>
> Key: MESOS-9798
> URL: https://issues.apache.org/jira/browse/MESOS-9798
> Project: Mesos
>  Issue Type: Improvement
>  Components: cmake
>Affects Versions: 1.8.0
> Environment: Linux firework-vm01 4.9.0-9-amd64 #1 SMP Debian 
> 4.9.168-1+deb9u2 (2019-05-13) x86_64 GNU/Linux
>Reporter: chatsiri
>Priority: Minor
>  Labels: newbie
>
> Hello all, 
> I have changed some code in the src/ directory, but the compiler takes a 
> long time to finish the build. How can I reduce the compile time per 
> component or source directory? For example, with these simple steps:
>  # I add a new member function to class Docker in docker.hpp. This class 
> is declared in a file in the docker directory.
>  # Compile the source again from the build directory. This directory is 
> created in the base source directory, next to src/, bin/ and include/.
>  # Go to the build path and run
>  ## $ cd build
>  ## $ ../configure --disable-python --disable-java --enable-debug 
> --enable-fast-install
>  ## $ make
>  ## $ sudo make install
> In step 3 the compiler takes a long time to compile the source code. How can 
> we reduce the compile time for just the source directory we changed?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (MESOS-9412) Create a Kubernetes framework for Mesos

2019-09-05 Thread Benjamin Bannier (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923208#comment-16923208
 ] 

Benjamin Bannier commented on MESOS-9412:
-

Thanks for filing this issue [~Cameron]. Have you made any progress on this? If 
yes, please share e.g., on the dev mailing list. I am sure this would be 
interesting for a wider audience.

> Create a Kubernetes framework for Mesos
> ---
>
> Key: MESOS-9412
> URL: https://issues.apache.org/jira/browse/MESOS-9412
> Project: Mesos
>  Issue Type: Wish
>Reporter: Cameron Chen
>Priority: Minor
>
> No Kubernetes framework for Mesos is available today. Kubernetes is 
> gaining rapid adoption and we would like to provide the Kubernetes API to our 
> end users. The virtual kubelet model allows us to use the Kubernetes API 
> while leveraging Mesos as the cluster backend.  
>  
> [https://github.com/virtual-kubelet/virtual-kubelet]
>  
> We are interested in collaborating with the community to create a Mesos 
> Provider for Virtual Kubelet. 





[jira] [Comment Edited] (MESOS-8400) Handle plugin crashes gracefully in SLRP recovery.

2019-09-04 Thread Benjamin Bannier (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-8400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917584#comment-16917584
 ] 

Benjamin Bannier edited comment on MESOS-8400 at 9/4/19 1:51 PM:
-

Reviews:

 -[https://reviews.apache.org/r/71382]-
 -[https://reviews.apache.org/r/71383]-
 [https://reviews.apache.org/r/71384]
 [https://reviews.apache.org/r/71385]


was (Author: bbannier):
Reviews:

-https://reviews.apache.org/r/71382-
 -[https://reviews.apache.org/r/71383]-
 [https://reviews.apache.org/r/71384]
 [https://reviews.apache.org/r/71385]

> Handle plugin crashes gracefully in SLRP recovery.
> --
>
> Key: MESOS-8400
> URL: https://issues.apache.org/jira/browse/MESOS-8400
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Chun-Hung Hsiao
>Assignee: Benjamin Bannier
>Priority: Blocker
>  Labels: mesosphere, mesosphere-dss-post-ga, storage
>
> When a CSI plugin crashes, the container daemon in SLRP will reset its 
> corresponding {{csi::Client}} service future. However, if a CSI call races 
> with a plugin crash, the call may be issued before the service future is 
> reset, resulting in a failure for that CSI call. MESOS-9517 partly addresses 
> this for {{CreateVolume}} and {{DeleteVolume}} calls, but calls in the SLRP 
> recovery path, e.g., {{ListVolume}}, {{GetCapacity}}, {{Probe}}, could make 
> the SLRP unrecoverable.
> There are two main issues:
>  1. For {{Probe}}, we should investigate if it is needed to make a few retry 
> attempts, then after that, we should recover from failed attempts (e.g., kill 
> the plugin container), then make the container daemon relaunch the plugin 
> instead of failing the daemon.
> 2. For other calls in the recovery path, we should either retry the call, or 
> make the local resource provider daemon be able to restart the SLRP after it 
> fails.
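
The first point can be sketched as a bounded retry followed by a recovery step. This is only an illustration under simplifying assumptions, not the SLRP code: it is synchronous (no libprocess futures), and {{probeWithRetries}}, {{ProbeResult}} and both callbacks are hypothetical names.

```cpp
#include <functional>

// Hypothetical outcome of trying to reach a plugin.
enum class ProbeResult { Ready, RelaunchRequested };

// Calls `probe` up to `maxAttempts` times. If every attempt fails, `recover`
// is invoked (e.g., to kill the plugin container) and the caller is told to
// relaunch the plugin instead of treating the failure as fatal.
ProbeResult probeWithRetries(
    const std::function<bool()>& probe,
    const std::function<void()>& recover,
    int maxAttempts)
{
  for (int attempt = 0; attempt < maxAttempts; ++attempt) {
    if (probe()) {
      return ProbeResult::Ready;
    }
  }

  // All attempts failed: clean up, then ask for a relaunch.
  recover();
  return ProbeResult::RelaunchRequested;
}
```

A real implementation would likely also wait between attempts; the sketch leaves that out to stay short.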





[jira] [Comment Edited] (MESOS-8400) Handle plugin crashes gracefully in SLRP recovery.

2019-09-04 Thread Benjamin Bannier (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-8400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917584#comment-16917584
 ] 

Benjamin Bannier edited comment on MESOS-8400 at 9/4/19 1:50 PM:
-

Reviews:

[-https://reviews.apache.org/r/71382-]
 -[https://reviews.apache.org/r/71383]-
 [https://reviews.apache.org/r/71384]
 [https://reviews.apache.org/r/71385]


was (Author: bbannier):
Reviews:

https://reviews.apache.org/r/71382
 https://reviews.apache.org/r/71383
 https://reviews.apache.org/r/71384
 https://reviews.apache.org/r/71385

> Handle plugin crashes gracefully in SLRP recovery.
> --
>
> Key: MESOS-8400
> URL: https://issues.apache.org/jira/browse/MESOS-8400
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Chun-Hung Hsiao
>Assignee: Benjamin Bannier
>Priority: Blocker
>  Labels: mesosphere, mesosphere-dss-post-ga, storage
>
> When a CSI plugin crashes, the container daemon in SLRP will reset its 
> corresponding {{csi::Client}} service future. However, if a CSI call races 
> with a plugin crash, the call may be issued before the service future is 
> reset, resulting in a failure for that CSI call. MESOS-9517 partly addresses 
> this for {{CreateVolume}} and {{DeleteVolume}} calls, but calls in the SLRP 
> recovery path, e.g., {{ListVolume}}, {{GetCapacity}}, {{Probe}}, could make 
> the SLRP unrecoverable.
> There are two main issues:
>  1. For {{Probe}}, we should investigate if it is needed to make a few retry 
> attempts, then after that, we should recover from failed attempts (e.g., kill 
> the plugin container), then make the container daemon relaunch the plugin 
> instead of failing the daemon.
> 2. For other calls in the recovery path, we should either retry the call, or 
> make the local resource provider daemon be able to restart the SLRP after it 
> fails.





[jira] [Commented] (MESOS-8400) Handle plugin crashes gracefully in SLRP recovery.

2019-09-04 Thread Benjamin Bannier (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-8400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922502#comment-16922502
 ] 

Benjamin Bannier commented on MESOS-8400:
-

{noformat}
commit d1b32cc3753001f7001dfa30fcea9000264001ef
Author: Benjamin Bannier 
Date:   Wed Sep 4 13:03:22 2019 +0200

Added stringification for resource provider calls.

Review: https://reviews.apache.org/r/71383/

commit 4676938dbff75ab0badd6dad35496285ddcff65c
Author: Benjamin Bannier 
Date:   Wed Sep 4 13:03:20 2019 +0200

Removed unused and unimplemented method declaration.

Review: https://reviews.apache.org/r/71382/
 {noformat}

> Handle plugin crashes gracefully in SLRP recovery.
> --
>
> Key: MESOS-8400
> URL: https://issues.apache.org/jira/browse/MESOS-8400
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Chun-Hung Hsiao
>Assignee: Benjamin Bannier
>Priority: Blocker
>  Labels: mesosphere, mesosphere-dss-post-ga, storage
>
> When a CSI plugin crashes, the container daemon in SLRP will reset its 
> corresponding {{csi::Client}} service future. However, if a CSI call races 
> with a plugin crash, the call may be issued before the service future is 
> reset, resulting in a failure for that CSI call. MESOS-9517 partly addresses 
> this for {{CreateVolume}} and {{DeleteVolume}} calls, but calls in the SLRP 
> recovery path, e.g., {{ListVolume}}, {{GetCapacity}}, {{Probe}}, could make 
> the SLRP unrecoverable.
> There are two main issues:
>  1. For {{Probe}}, we should investigate if it is needed to make a few retry 
> attempts, then after that, we should recover from failed attempts (e.g., kill 
> the plugin container), then make the container daemon relaunch the plugin 
> instead of failing the daemon.
> 2. For other calls in the recovery path, we should either retry the call, or 
> make the local resource provider daemon be able to restart the SLRP after it 
> fails.





[jira] [Comment Edited] (MESOS-8400) Handle plugin crashes gracefully in SLRP recovery.

2019-09-04 Thread Benjamin Bannier (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-8400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917584#comment-16917584
 ] 

Benjamin Bannier edited comment on MESOS-8400 at 9/4/19 1:50 PM:
-

Reviews:

-https://reviews.apache.org/r/71382-
 -[https://reviews.apache.org/r/71383]-
 [https://reviews.apache.org/r/71384]
 [https://reviews.apache.org/r/71385]


was (Author: bbannier):
Reviews:

[-https://reviews.apache.org/r/71382-]
 -[https://reviews.apache.org/r/71383]-
 [https://reviews.apache.org/r/71384]
 [https://reviews.apache.org/r/71385]

> Handle plugin crashes gracefully in SLRP recovery.
> --
>
> Key: MESOS-8400
> URL: https://issues.apache.org/jira/browse/MESOS-8400
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Chun-Hung Hsiao
>Assignee: Benjamin Bannier
>Priority: Blocker
>  Labels: mesosphere, mesosphere-dss-post-ga, storage
>
> When a CSI plugin crashes, the container daemon in SLRP will reset its 
> corresponding {{csi::Client}} service future. However, if a CSI call races 
> with a plugin crash, the call may be issued before the service future is 
> reset, resulting in a failure for that CSI call. MESOS-9517 partly addresses 
> this for {{CreateVolume}} and {{DeleteVolume}} calls, but calls in the SLRP 
> recovery path, e.g., {{ListVolume}}, {{GetCapacity}}, {{Probe}}, could make 
> the SLRP unrecoverable.
> There are two main issues:
>  1. For {{Probe}}, we should investigate if it is needed to make a few retry 
> attempts, then after that, we should recover from failed attempts (e.g., kill 
> the plugin container), then make the container daemon relaunch the plugin 
> instead of failing the daemon.
> 2. For other calls in the recovery path, we should either retry the call, or 
> make the local resource provider daemon be able to restart the SLRP after it 
> fails.





[jira] [Assigned] (MESOS-9958) New CLI is not included in distribution tarball

2019-09-03 Thread Benjamin Bannier (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-9958:
---

Assignee: Benjamin Bannier

> New CLI is not included in distribution tarball
> ---
>
> Key: MESOS-9958
> URL: https://issues.apache.org/jira/browse/MESOS-9958
> Project: Mesos
>  Issue Type: Bug
>  Components: build, cli, release
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Major
>
> The files needed to build the new CLI are not included in distribution 
> tarballs. This makes it impossible to build the CLI from released tarballs, 
> and users instead have to build directly from the git sources.





[jira] [Created] (MESOS-9958) New CLI is not included in distribution tarball

2019-09-02 Thread Benjamin Bannier (Jira)
Benjamin Bannier created MESOS-9958:
---

 Summary: New CLI is not included in distribution tarball
 Key: MESOS-9958
 URL: https://issues.apache.org/jira/browse/MESOS-9958
 Project: Mesos
  Issue Type: Bug
  Components: build, cli, release
Reporter: Benjamin Bannier


The files needed to build the new CLI are not included in distribution 
tarballs. This makes it impossible to build the CLI from released tarballs, and 
users instead have to build directly from the git sources.





[jira] [Created] (MESOS-9952) ExampleTest.DiskFullFramework is slow

2019-08-22 Thread Benjamin Bannier (Jira)
Benjamin Bannier created MESOS-9952:
---

 Summary: ExampleTest.DiskFullFramework is slow
 Key: MESOS-9952
 URL: https://issues.apache.org/jira/browse/MESOS-9952
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Benjamin Bannier
Assignee: Benjamin Bannier


Executing {{ExampleTest.DiskFullFramework}} on my setup takes almost 18s in an 
unoptimized build. This is way too long for a default-enabled test.





[jira] [Assigned] (MESOS-8400) Handle plugin crashes gracefully in SLRP recovery.

2019-08-21 Thread Benjamin Bannier (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-8400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-8400:
---

  Sprint: Resource Mgmt: RI-17 Sprint 53
Assignee: Benjamin Bannier

> Handle plugin crashes gracefully in SLRP recovery.
> --
>
> Key: MESOS-8400
> URL: https://issues.apache.org/jira/browse/MESOS-8400
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Chun-Hung Hsiao
>Assignee: Benjamin Bannier
>Priority: Blocker
>  Labels: mesosphere, mesosphere-dss-post-ga, storage
>
> When a CSI plugin crashes, the container daemon in SLRP will reset its 
> corresponding {{csi::Client}} service future. However, if a CSI call races 
> with a plugin crash, the call may be issued before the service future is 
> reset, resulting in a failure for that CSI call. MESOS-9517 partly addresses 
> this for {{CreateVolume}} and {{DeleteVolume}} calls, but calls in the SLRP 
> recovery path, e.g., {{ListVolume}}, {{GetCapacity}}, {{Probe}}, could make 
> the SLRP unrecoverable.
> There are two main issues:
>  1. For {{Probe}}, we should investigate if it is needed to make a few retry 
> attempts, then after that, we should recover from failed attempts (e.g., kill 
> the plugin container), then make the container daemon relaunch the plugin 
> instead of failing the daemon.
> 2. For other calls in the recovery path, we should either retry the call, or 
> make the local resource provider daemon be able to restart the SLRP after it 
> fails.





[jira] [Assigned] (MESOS-9482) Resource provider manager can crash on invalid data from resource providers

2019-08-20 Thread Benjamin Bannier (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-9482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-9482:
---

Assignee: Benjamin Bannier

> Resource provider manager can crash on invalid data from resource providers
> ---
>
> Key: MESOS-9482
> URL: https://issues.apache.org/jira/browse/MESOS-9482
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Major
>  Labels: mesosphere, mesosphere-dss-post-ga, storage
>
> The resource provider manager code currently contains a number of assertions 
> which will crash the manager (and its agent) if some forms of invalid data 
> are received from a resource provider. This is dangerous since resource 
> providers are not necessarily part of Mesos-controlled code (they talk to the 
> manager over an HTTP API and could even be in external processes).
> Instead of crashing, the resource provider manager should disconnect the 
> resource providers in such scenarios.
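
The idea of validating instead of asserting could look roughly like the following. This is a hedged sketch, not the resource provider manager code: {{ProviderUpdate}}, its fields, {{validate}} and {{handleUpdate}} are all hypothetical and only stand in for whatever data the manager actually checks.

```cpp
#include <string>

// Hypothetical, simplified message from a resource provider.
struct ProviderUpdate
{
  std::string providerId;
  std::string resourceVersion;
};

// Returns an empty string if `update` is acceptable, and otherwise a
// description of the problem. Unlike an assertion, a failed check here leaves
// the decision (e.g., disconnecting the provider) to the caller.
std::string validate(const ProviderUpdate& update)
{
  if (update.providerId.empty()) {
    return "missing resource provider ID";
  }
  if (update.resourceVersion.empty()) {
    return "missing resource version";
  }
  return "";
}

// Hypothetical handler: reports whether the provider should stay connected.
// Invalid data yields `false` instead of aborting the process.
bool handleUpdate(const ProviderUpdate& update)
{
  return validate(update).empty();
}
```

The point of the design is that invalid input from an external process becomes an ordinary error path, so a single misbehaving provider cannot take down the manager and its agent.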





[jira] [Comment Edited] (MESOS-9630) Consider moving linter setup to pre-commit

2019-08-16 Thread Benjamin Bannier (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896520#comment-16896520
 ] 

Benjamin Bannier edited comment on MESOS-9630 at 8/16/19 7:43 PM:
--

Reviews:
https://reviews.apache.org/r/70096/
https://reviews.apache.org/r/71203/
https://reviews.apache.org/r/71204/
https://reviews.apache.org/r/71299/
https://reviews.apache.org/r/71205/
https://reviews.apache.org/r/71206/
https://reviews.apache.org/r/71207/
https://reviews.apache.org/r/71208/
https://reviews.apache.org/r/71209/

To be submitted when above chain has been submitted for some time:
https://reviews.apache.org/r/71300/



was (Author: bbannier):
Reviews:
https://reviews.apache.org/r/70096/
https://reviews.apache.org/r/71203/
https://reviews.apache.org/r/71204/
https://reviews.apache.org/r/71205/
https://reviews.apache.org/r/71206/
https://reviews.apache.org/r/71207/
https://reviews.apache.org/r/71208/
https://reviews.apache.org/r/71209/


> Consider moving linter setup to pre-commit
> --
>
> Key: MESOS-9630
> URL: https://issues.apache.org/jira/browse/MESOS-9630
> Project: Mesos
>  Issue Type: Wish
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Minor
>
> Mesos currently uses a mix of hand-crafted git commit hooks and mesos-style 
> to perform linting. While this has served us well our current approach also 
> has some drawbacks, e.g.,
>  * the linter setup is spread between hooks and {{support/mesos-style.py}}
>  * adding new linters can be cumbersome
>  * mesos-style.py creates a single virtualenv to install linters in, which 
> is tied to the source tree
>  * linter dependencies are only cached to an extent and it is easy to run 
> into a situation where one needs to update linter dependencies over the 
> network even though one has successfully linted a revision before
>  * {{support/mesos-style.py}} lacks a number of features, e.g., running over 
> only staged files, running linters in parallel for improved throughput, 
> running only specific linters or disabling certain linters, and the 
> parameterization of the linters is strongly coupled to implementation of the 
> style checker itself.
> The [pre-commit tool|https://pre-commit.com/] solves most of these issues and 
> using it in Mesos would not only allow us to get rid of tooling which is hard 
> to maintain, but also unlock other features. It is licensed under an MIT 
> license. We should consider moving our linting setup over to pre-commit.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (MESOS-8808) CSI documentation has a broken link to a non-existent page.

2019-08-13 Thread Benjamin Bannier (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906128#comment-16906128
 ] 

Benjamin Bannier commented on MESOS-8808:
-

[~joseph], is there anything we can help with to get 
[https://reviews.apache.org/r/65112/] over the finish line?

> CSI documentation has a broken link to a non-existent page.
> ---
>
> Key: MESOS-8808
> URL: https://issues.apache.org/jira/browse/MESOS-8808
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation, storage
>Affects Versions: 1.5.0
>Reporter: Gastón Kleiman
>Priority: Major
>  Labels: csi, documentation, mesosphere
>
> There's a broken link to a non-existent {{resource-provider.md}} document 
> here: https://mesos.apache.org/documentation/latest/csi/#resource-providers



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (MESOS-8808) CSI documentation has a broken link to a non-existent page.

2019-08-13 Thread Benjamin Bannier (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906128#comment-16906128
 ] 

Benjamin Bannier edited comment on MESOS-8808 at 8/13/19 12:08 PM:
---

[~kaysoky], is there anything we can help with to get 
[https://reviews.apache.org/r/65112/] over the finish line?


was (Author: bbannier):
[~joseph], is there anything we can help with to get 
[https://reviews.apache.org/r/65112/] over the finish line?

> CSI documentation has a broken link to a non-existent page.
> ---
>
> Key: MESOS-8808
> URL: https://issues.apache.org/jira/browse/MESOS-8808
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation, storage
>Affects Versions: 1.5.0
>Reporter: Gastón Kleiman
>Priority: Major
>  Labels: csi, documentation, mesosphere
>
> There's a broken link to a non-existent {{resource-provider.md}} document 
> here: https://mesos.apache.org/documentation/latest/csi/#resource-providers



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (MESOS-9560) ContentType/AgentAPITest.MarkResourceProviderGone/1 is flaky

2019-08-13 Thread Benjamin Bannier (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905216#comment-16905216
 ] 

Benjamin Bannier edited comment on MESOS-9560 at 8/13/19 9:29 AM:
--

Reviews:
[https://reviews.apache.org/r/71272/]
[https://reviews.apache.org/r/71277/]


was (Author: bbannier):
Review: https://reviews.apache.org/r/71272/

> ContentType/AgentAPITest.MarkResourceProviderGone/1 is flaky
> 
>
> Key: MESOS-9560
> URL: https://issues.apache.org/jira/browse/MESOS-9560
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Critical
>  Labels: flaky, flaky-test, mesosphere, storage, test
> Fix For: 1.9.0
>
> Attachments: consoleText.txt
>
>
> We observed a segfault in 
> {{ContentType/AgentAPITest.MarkResourceProviderGone/1}} on test teardown.
> {noformat}
> I0131 23:55:59.378453  6798 slave.cpp:923] Agent terminating
> I0131 23:55:59.378813 31143 master.cpp:1269] Agent 
> a27bcaba-70cc-4ec3-9786-38f9512c61fd-S0 at slave(1112)@172.16.10.236:43229 
> (ip-172-16-10-236.ec2.internal) disconnected
> I0131 23:55:59.378831 31143 master.cpp:3272] Disconnecting agent 
> a27bcaba-70cc-4ec3-9786-38f9512c61fd-S0 at slave(1112)@172.16.10.236:43229 
> (ip-172-16-10-236.ec2.internal)
> I0131 23:55:59.378846 31143 master.cpp:3291] Deactivating agent 
> a27bcaba-70cc-4ec3-9786-38f9512c61fd-S0 at slave(1112)@172.16.10.236:43229 
> (ip-172-16-10-236.ec2.internal)
> I0131 23:55:59.378891 31143 hierarchical.cpp:793] Agent 
> a27bcaba-70cc-4ec3-9786-38f9512c61fd-S0 deactivated
> F0131 23:55:59.378891 31149 logging.cpp:67] RAW: Pure virtual method called
> @ 0x7f633aaaebdd  google::LogMessage::Fail()
> @ 0x7f633aab6281  google::RawLog__()
> @ 0x7f6339821262  __cxa_pure_virtual
> @ 0x55671cacc113  
> testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
> @ 0x55671b532e78  
> mesos::internal::tests::resource_provider::MockResourceProvider<>::disconnected()
> @ 0x7f633978f6b0  process::AsyncExecutorProcess::execute<>()
> @ 0x7f633979f218  
> _ZN5cpp176invokeIZN7process8dispatchI7NothingNS1_20AsyncExecutorProcessERKSt8functionIFvvEES9_EENS1_6FutureIT_EERKNS1_3PIDIT0_EEMSE_FSB_T1_EOT2_EUlSt10unique_ptrINS1_7PromiseIS3_EESt14default_deleteISP_EEOS7_PNS1_11ProcessBaseEE_JSS_S7_SV_EEEDTclcl7forwardISB_Efp_Espcl7forwardIT0_Efp0_EEEOSB_DpOSX_
> @ 0x7f633a9f5d01  process::ProcessBase::consume()
> @ 0x7f633aa1a08a  process::ProcessManager::resume()
> @ 0x7f633aa1db06  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
> @ 0x7f633acc9f80  execute_native_thread_routine
> @ 0x7f6337142e25  start_thread
> @ 0x7f6336241bad  __clone
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (MESOS-9560) ContentType/AgentAPITest.MarkResourceProviderGone/1 is flaky

2019-08-12 Thread Benjamin Bannier (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905214#comment-16905214
 ] 

Benjamin Bannier commented on MESOS-9560:
-

Reopening since we are still observing similar failures.

> ContentType/AgentAPITest.MarkResourceProviderGone/1 is flaky
> 
>
> Key: MESOS-9560
> URL: https://issues.apache.org/jira/browse/MESOS-9560
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Critical
>  Labels: flaky, flaky-test, mesosphere, storage, test
> Fix For: 1.9.0
>
> Attachments: consoleText.txt
>
>
> We observed a segfault in 
> {{ContentType/AgentAPITest.MarkResourceProviderGone/1}} on test teardown.
> {noformat}
> I0131 23:55:59.378453  6798 slave.cpp:923] Agent terminating
> I0131 23:55:59.378813 31143 master.cpp:1269] Agent 
> a27bcaba-70cc-4ec3-9786-38f9512c61fd-S0 at slave(1112)@172.16.10.236:43229 
> (ip-172-16-10-236.ec2.internal) disconnected
> I0131 23:55:59.378831 31143 master.cpp:3272] Disconnecting agent 
> a27bcaba-70cc-4ec3-9786-38f9512c61fd-S0 at slave(1112)@172.16.10.236:43229 
> (ip-172-16-10-236.ec2.internal)
> I0131 23:55:59.378846 31143 master.cpp:3291] Deactivating agent 
> a27bcaba-70cc-4ec3-9786-38f9512c61fd-S0 at slave(1112)@172.16.10.236:43229 
> (ip-172-16-10-236.ec2.internal)
> I0131 23:55:59.378891 31143 hierarchical.cpp:793] Agent 
> a27bcaba-70cc-4ec3-9786-38f9512c61fd-S0 deactivated
> F0131 23:55:59.378891 31149 logging.cpp:67] RAW: Pure virtual method called
> @ 0x7f633aaaebdd  google::LogMessage::Fail()
> @ 0x7f633aab6281  google::RawLog__()
> @ 0x7f6339821262  __cxa_pure_virtual
> @ 0x55671cacc113  
> testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
> @ 0x55671b532e78  
> mesos::internal::tests::resource_provider::MockResourceProvider<>::disconnected()
> @ 0x7f633978f6b0  process::AsyncExecutorProcess::execute<>()
> @ 0x7f633979f218  
> _ZN5cpp176invokeIZN7process8dispatchI7NothingNS1_20AsyncExecutorProcessERKSt8functionIFvvEES9_EENS1_6FutureIT_EERKNS1_3PIDIT0_EEMSE_FSB_T1_EOT2_EUlSt10unique_ptrINS1_7PromiseIS3_EESt14default_deleteISP_EEOS7_PNS1_11ProcessBaseEE_JSS_S7_SV_EEEDTclcl7forwardISB_Efp_Espcl7forwardIT0_Efp0_EEEOSB_DpOSX_
> @ 0x7f633a9f5d01  process::ProcessBase::consume()
> @ 0x7f633aa1a08a  process::ProcessManager::resume()
> @ 0x7f633aa1db06  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
> @ 0x7f633acc9f80  execute_native_thread_routine
> @ 0x7f6337142e25  start_thread
> @ 0x7f6336241bad  __clone
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (MESOS-5542) Add support for wrapping of move-only types in Future

2019-08-04 Thread Benjamin Bannier (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899583#comment-16899583
 ] 

Benjamin Bannier edited comment on MESOS-5542 at 8/4/19 8:39 AM:
-

The major complication here is that in order to support move-only types in 
continuations {{process::Future}} would need to support move-only callbacks. 
Currently, we use thinly wrapped {{std::function}} for the callbacks which 
require {{CopyConstructible}} function objects.
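
To illustrate the constraint, here is a minimal sketch in plain standard C++. It does not use libprocess; {{MoveOnlyCallback}} and {{wrapShared}} are hypothetical names made up for this example. A callable that owns a {{std::unique_ptr}} is not copy-constructible, so it cannot be stored in a {{std::function}} directly. One possible workaround, assumed here only for illustration, is to move it into shared ownership and store a copyable lambda instead.

```cpp
#include <functional>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical callback that owns a move-only resource, similar to a
// continuation capturing a `std::unique_ptr`.
struct MoveOnlyCallback
{
  std::unique_ptr<int> value;

  int operator()() const { return *value; }
};

// The callback can be moved but not copied, so it does not satisfy the
// `CopyConstructible` requirement `std::function` places on its target.
static_assert(
    !std::is_copy_constructible<MoveOnlyCallback>::value,
    "MoveOnlyCallback is expected to be move-only");
static_assert(
    std::is_move_constructible<MoveOnlyCallback>::value,
    "MoveOnlyCallback is expected to be movable");

// Illustrative workaround (an assumption, not what libprocess does): move
// the callback into a `std::shared_ptr` so the capturing lambda is copyable.
std::function<int()> wrapShared(MoveOnlyCallback callback)
{
  auto shared = std::make_shared<MoveOnlyCallback>(std::move(callback));
  return [shared]() { return (*shared)(); };
}
```

The shared wrapper costs an extra allocation and gives the callback shared rather than unique ownership, which is part of why native support for move-only callbacks would be preferable.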


was (Author: bbannier):
The major complication here is that in order to support move-only types in 
continuations we {{process::Future}} would need to support move-only callbacks. 
Currently, we use thinly wrapped {{std::function}} for the callbacks which 
requires {{CopyConstructible}} function objects.

> Add support for wrapping of move-only types in Future 
> --
>
> Key: MESOS-5542
> URL: https://issues.apache.org/jira/browse/MESOS-5542
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Benjamin Bannier
>Priority: Major
>  Labels: c++11, mesosphere
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (MESOS-2238) Use Owned<> for Process pointers in wrapper classes

2019-08-04 Thread Benjamin Bannier (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-2238:
---

Assignee: (was: Akanksha Agrawal)
  Labels:   (was: easyfix newbie)

Unassigning this as there hasn't been any progress.

> Use Owned<> for Process pointers in wrapper classes
> ---
>
> Key: MESOS-2238
> URL: https://issues.apache.org/jira/browse/MESOS-2238
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>Priority: Major
>
> A common pattern in our code (see e.g. {{Isolator}}, {{DockerContainerizer}}, 
> {{Allocator}}) is to wrap Process-based class into a non Process-one. 
> However, our code base is inconsistent about how we store the pointer to the 
> underlying class: somewhere we wrap it into {{Owned<>}} (see e.g. 
> {{Isolator}}, {{DockerContainerizer}}), somewhere it is a raw pointer (see 
> e.g. {{Allocator}}, {{ExternalContainerizer}}).
> Using {{Owned<>}} for this particular case is preferable, since it signals 
> the correct semantics and intention to the reader. For consistency, sweep 
> through the code base and replace raw pointers with their {{Owned<>}} 
> counterparts.
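
The pattern described above can be sketched as follows. This is a simplified illustration, not Mesos code: {{std::unique_ptr}} stands in for {{Owned<>}}, the {{ExampleProcess}} and {{ExampleWrapper}} classes are hypothetical, and the spawning and termination of a real libprocess process is left out.

```cpp
#include <memory>

// Hypothetical stand-in for a Process-based class.
class ExampleProcess
{
public:
  int handle(int request) { return request + 1; }
};

// Hypothetical non-Process wrapper. Holding the process in an owning smart
// pointer (here `std::unique_ptr` as a stand-in for `Owned<>`) documents
// that the wrapper owns the process and is responsible for destroying it.
class ExampleWrapper
{
public:
  ExampleWrapper() : process(new ExampleProcess()) {}

  int handle(int request) { return process->handle(request); }

private:
  // With a raw `ExampleProcess*` the ownership would be implicit and the
  // destructor would need a manual `delete`.
  std::unique_ptr<ExampleProcess> process;
};
```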



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (MESOS-5542) Add support for wrapping of move-only types in Future

2019-08-04 Thread Benjamin Bannier (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899583#comment-16899583
 ] 

Benjamin Bannier commented on MESOS-5542:
-

The major complication here is that in order to support move-only types in 
continuations we {{process::Future}} would need to support move-only callbacks. 
Currently, we use thinly wrapped {{std::function}} for the callbacks which 
requires {{CopyConstructible}} function objects.

> Add support for wrapping of move-only types in Future 
> --
>
> Key: MESOS-5542
> URL: https://issues.apache.org/jira/browse/MESOS-5542
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Benjamin Bannier
>Priority: Major
>  Labels: c++11, mesosphere
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (MESOS-6382) Add option to enable parallel test runner for cmake builds

2019-08-02 Thread Benjamin Bannier (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898720#comment-16898720
 ] 

Benjamin Bannier commented on MESOS-6382:
-

{noformat}
commit e1176c453d04a8ef8f53cf23928b5bbb09173d78
Author: Benjamin Bannier 
Date: Fri Aug 2 11:10:37 2019 +0200

Renamed cmake parameter for parallel test execution.

The Jenkins setup (which uses `support/docker-build.sh` under the
covers) is parameterized with the `CONFIGURATION` environment variable.
While in we pass configure-style flags for both autotools and cmake
builds to it in the Jenkins
configuration, the script performs transformations so that
configure-style flags are transformed to cmake-style (replace `_` with
`-`, uppercase flags, replace `--` with `-D`).

We disable parallel test execution in Jenkins by passing
`--disable-parallel-test-execution` which with the transformations in
`support/docker-build.sh` leads to a cmake arg
`-DDISABLE_PARALLEL_TEST_EXECUTION=1`. This patch renames the cmake arg
from a default enabled `ENABLE_PARALLEL_TEST_EXECUTION` to a default
disabled `DISABLE_PARALLEL_TEST_EXECUTION` to support this workflow.

Review: https://reviews.apache.org/r/71232/
{noformat}
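
As a rough illustration of the flag transformation the commit message describes, here is a sketch in C++. The real logic lives in the shell script {{support/docker-build.sh}}; the function {{toCmakeFlag}} below is hypothetical and only handles valueless flags. It maps hyphens to underscores, since that is what yields the cmake argument quoted above, and the trailing {{=1}} is likewise taken from that quoted result.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper, not part of Mesos: turns a valueless configure-style
// flag such as `--disable-foo-bar` into a cmake-style `-DDISABLE_FOO_BAR=1`.
std::string toCmakeFlag(const std::string& configureFlag)
{
  // Leave anything that is not a `--` style flag untouched.
  if (configureFlag.compare(0, 2, "--") != 0) {
    return configureFlag;
  }

  std::string name = configureFlag.substr(2);

  for (char& c : name) {
    if (c == '-') {
      // Hyphens are not valid in cmake variable names used this way.
      c = '_';
    } else {
      c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
  }

  return "-D" + name + "=1";
}
```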

> Add option to enable parallel test runner for cmake builds
> --
>
> Key: MESOS-6382
> URL: https://issues.apache.org/jira/browse/MESOS-6382
> Project: Mesos
>  Issue Type: Bug
>  Components: cmake
>Reporter: Benjamin Bannier
>Priority: Major
> Fix For: 1.9.0
>
>
> We should add a config option to enable the parallel test runner already 
> available in the autotools setup also in the cmake setup.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (MESOS-9907) Retain agent draining start time in master

2019-07-25 Thread Benjamin Bannier (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-9907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-9907:
---

Assignee: Benjamin Bannier

> Retain agent draining start time in master
> --
>
> Key: MESOS-9907
> URL: https://issues.apache.org/jira/browse/MESOS-9907
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Greg Mann
>Assignee: Benjamin Bannier
>Priority: Major
>  Labels: foundations
>
> The master should store in memory the last time that a {{DrainSlaveMessage}} 
> was sent to the agent so that this time can be displayed in the web UI. This 
> would help operators determine the expected time at which the agent should 
> transition to DRAINED.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (MESOS-9836) Docker containerizer overwrites `/mesos/slave` cgroups.

2019-07-25 Thread Benjamin Bannier (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892622#comment-16892622
 ] 

Benjamin Bannier commented on MESOS-9836:
-

Any update on this [~gilbert]? We saw this again. This is a bad issue as it 
mainly manifests as just degraded agent performance. It could only be caught by 
monitoring of the agent cgroups. Can you please bump this up?

> Docker containerizer overwrites `/mesos/slave` cgroups.
> ---
>
> Key: MESOS-9836
> URL: https://issues.apache.org/jira/browse/MESOS-9836
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Chun-Hung Hsiao
>Priority: Critical
>  Labels: docker, mesosphere
>
> The following bug was observed on our internal testing cluster.
> The docker containerizer launched a container on an agent:
> {noformat}
> I0523 06:00:53.888579 21815 docker.cpp:1195] Starting container 
> 'f69c8a8c-eba4-4494-a305-0956a44a6ad2' for task 
> 'apps_docker-sleep-app.1fda5b8e-7d20-11e9-9717-7aa030269ee1' (and executor 
> 'apps_docker-sleep-app.1fda5b8e-7d20-11e9-9717-7aa030269ee1') of framework 
> 415284b7-2967-407d-b66f-f445e93f064e-0011
> I0523 06:00:54.524171 21815 docker.cpp:783] Checkpointing pid 13716 to 
> '/var/lib/mesos/slave/meta/slaves/60c42ab7-eb1a-4cec-b03d-ea06bff00c3f-S2/frameworks/415284b7-2967-407d-b66f-f445e93f064e-0011/executors/apps_docker-sleep-app.1fda5b8e-7d20-11e9-9717-7aa030269ee1/runs/f69c8a8c-eba4-4494-a305-0956a44a6ad2/pids/forked.pid'
> {noformat}
> After the container was launched, the docker containerizer did a {{docker 
> inspect}} on the container and cached the pid:
>  
> [https://github.com/apache/mesos/blob/0c431dd60ae39138cc7e8b099d41ad794c02c9a9/src/slave/containerizer/docker.cpp#L1764]
>  The pid should be slightly greater than 13716.
> The docker executor sent a {{TASK_FINISHED}} status update around 16 minutes 
> later:
> {noformat}
> I0523 06:16:17.287595 21809 slave.cpp:5566] Handling status update 
> TASK_FINISHED (Status UUID: 4e00b786-b773-46cd-8327-c7deb08f1de9) for task 
> apps_docker-sleep-app.1fda5b8e-7d20-11e9-9717-7aa030269ee1 of framework 
> 415284b7-2967-407d-b66f-f445e93f064e-0011 from executor(1)@172.31.1.7:36244
> {noformat}
> After receiving the terminal status update, the agent asked the docker 
> containerizer to update {{cpu.cfs_period_us}}, {{cpu.cfs_quota_us}} and 
> {{memory.soft_limit_in_bytes}} of the container through the cached pid:
>  
> [https://github.com/apache/mesos/blob/0c431dd60ae39138cc7e8b099d41ad794c02c9a9/src/slave/containerizer/docker.cpp#L1696]
> {noformat}
> I0523 06:16:17.290447 21815 docker.cpp:1868] Updated 'cpu.shares' to 102 at 
> /sys/fs/cgroup/cpu,cpuacct/mesos/slave for container 
> f69c8a8c-eba4-4494-a305-0956a44a6ad2
> I0523 06:16:17.290660 21815 docker.cpp:1895] Updated 'cpu.cfs_period_us' to 
> 100ms and 'cpu.cfs_quota_us' to 10ms (cpus 0.1) for container 
> f69c8a8c-eba4-4494-a305-0956a44a6ad2
> I0523 06:16:17.889816 21815 docker.cpp:1937] Updated 
> 'memory.soft_limit_in_bytes' to 32MB for container 
> f69c8a8c-eba4-4494-a305-0956a44a6ad2
> {noformat}
> Note that the cgroup of {{cpu.shares}} was {{/mesos/slave}}. This was 
> possibly because the pid got reused over the 16 minutes:
> {noformat}
> # zgrep 'systemd.cpp:98\]' /var/log/mesos/archive/mesos-agent.log.12.gz
> ...
> I0523 06:00:54.525178 21815 systemd.cpp:98] Assigned child process '13716' to 
> 'mesos_executors.slice'
> I0523 06:00:55.078546 21808 systemd.cpp:98] Assigned child process '13798' to 
> 'mesos_executors.slice'
> I0523 06:00:55.134096 21808 systemd.cpp:98] Assigned child process '13799' to 
> 'mesos_executors.slice'
> ...
> I0523 06:06:30.997439 21808 systemd.cpp:98] Assigned child process '32689' to 
> 'mesos_executors.slice'
> I0523 06:06:31.050976 21808 systemd.cpp:98] Assigned child process '32690' to 
> 'mesos_executors.slice'
> I0523 06:06:31.110514 21815 systemd.cpp:98] Assigned child process '32692' to 
> 'mesos_executors.slice'
> I0523 06:06:33.143726 21818 systemd.cpp:98] Assigned child process '446' to 
> 'mesos_executors.slice'
> I0523 06:06:33.196251 21818 systemd.cpp:98] Assigned child process '447' to 
> 'mesos_executors.slice'
> I0523 06:06:33.266332 21816 systemd.cpp:98] Assigned child process '449' to 
> 'mesos_executors.slice'
> ...
> I0523 06:09:34.870056 21808 systemd.cpp:98] Assigned child process '13717' to 
> 'mesos_executors.slice'
> I0523 06:09:34.937762 21813 systemd.cpp:98] Assigned child process '13744' to 
> 'mesos_executors.slice'
> I0523 06:09:35.073971 21817 systemd.cpp:98] Assigned child process '13754' to 
> 'mesos_executors.slice'
> ...
> {noformat}
> It was highly likely that the container itself exited around 06:09:35, way 
> before the docker executor detected and reported the terminal status update, 

[jira] [Commented] (MESOS-9901) Specialize jsonify for protobuf Maps.

2019-07-23 Thread Benjamin Bannier (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890778#comment-16890778
 ] 

Benjamin Bannier commented on MESOS-9901:
-

[~mzhu], could you document the expected behavior? AFAICT we already e.g., test 
that output produced by {{JSON::protobuf}} can be parsed back to proto maps, 
see {{ProtobufTest.JsonifyMap}} in {{3rdparty/stout/tests/protobuf_tests.cpp}}. 
If we do any changes there we should make sure to not break our JSON API.

> Specialize jsonify for protobuf Maps.
> -
>
> Key: MESOS-9901
> URL: https://issues.apache.org/jira/browse/MESOS-9901
> Project: Mesos
>  Issue Type: Improvement
>  Components: json api
>Reporter: Meng Zhu
>Priority: Major
>
> Jsonify currently treats a protobuf Map as a regular repeated field. For 
> example, for the schema 
> {noformat}
> message QuotaConfig {
>   required string role = 1;
>   map guarantees = 2;
>   map limits = 3;
> }
> {noformat}
> it will produce:
> {noformat}
>   "configs": [
> {
>   "role": "role1",
>   "guarantees": [
> {
>   "key": "cpus",
>   "value": {
> "value": 1
>   }
> },
> {
>   "key": "mem",
>   "value": {
> "value": 512
>   }
> }
>   ]
> {noformat}
> This output cannot be parsed back to proto messages. We need to specialize 
> jsonify for the Map type. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (MESOS-9254) Make SLRP be able to update its volumes and storage pools.

2019-07-18 Thread Benjamin Bannier (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-9254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-9254:
---

Assignee: Benjamin Bannier

> Make SLRP be able to update its volumes and storage pools.
> --
>
> Key: MESOS-9254
> URL: https://issues.apache.org/jira/browse/MESOS-9254
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Chun-Hung Hsiao
>Assignee: Benjamin Bannier
>Priority: Critical
>  Labels: mesosphere, mesosphere-dss-post-ga, storage
>
> We should consider making SLRP update its resources periodically, or adding 
> an endpoint to trigger that, for the following reasons:
> 1. Mesos currently assumes all profiles have disjoint storage pools. This is 
> because Mesos models each resource independently. However, in practice an 
> operator can set up, say two profiles, one for linear volumes and one for 
> raid volumes, and an "LVM" resource provider that can provision both linear 
> and raid volumes. The correlation between the storage pools of the linear and 
> raid profiles would reduce one's pool capacity when a volume of the other 
> type is provisioned. To reflect the actual sizes of correlated storage pools, 
> we need a way to make SLRP update its resources.
> 2. The SLRP now only queries the CSI plugin to report a list of volumes 
> during startup, so if a new device is added, the operator will have to 
> restart the agent to trigger another SLRP startup, which is inconvenient.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (MESOS-9895) SlaveTest.DrainingAgentRejectLaunch is flaky

2019-07-16 Thread Benjamin Bannier (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-9895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-9895:
---

Assignee: Benjamin Bannier

> SlaveTest.DrainingAgentRejectLaunch is flaky
> 
>
> Key: MESOS-9895
> URL: https://issues.apache.org/jira/browse/MESOS-9895
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Major
>  Labels: flaky-test, test
> Attachments: consoleText.txt
>
>
> We saw {{SlaveTest.DrainingAgentRejectLaunch}} fail repeatedly on ASF Jenkins 
> CI.
> {noformat}
> ../../src/tests/slave_tests.cpp:12408: Failure
> Failed to wait 15secs for runningUpdate2
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (MESOS-9895) SlaveTest.DrainingAgentRejectLaunch is flaky

2019-07-16 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-9895:
---

 Summary: SlaveTest.DrainingAgentRejectLaunch is flaky
 Key: MESOS-9895
 URL: https://issues.apache.org/jira/browse/MESOS-9895
 Project: Mesos
  Issue Type: Bug
Affects Versions: 1.9.0
Reporter: Benjamin Bannier
 Attachments: consoleText.txt

We saw {{SlaveTest.DrainingAgentRejectLaunch}} fail repeatedly on ASF Jenkins 
CI.
{noformat}
../../src/tests/slave_tests.cpp:12408: Failure
Failed to wait 15secs for runningUpdate2
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (MESOS-9846) Update UI for agent draining

2019-07-16 Thread Benjamin Bannier (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886050#comment-16886050
 ] 

Benjamin Bannier edited comment on MESOS-9846 at 7/16/19 11:30 AM:
---

[~greggomann], with the way the drain config is presented in master and agent 
state endpoints it is currently hard to interpret the meaning of 
{{max_grace_period}} as it represents a value relative to the time the draining 
started; however, that time is not exposed. WDYT about including the start 
time in HTTP responses as well? Specifying a duration as opposed to a deadline 
is convenient for users _triggering draining_, but hard to interpret after the 
fact. Looking back it might also have made sense to allow users to specify a 
relative {{max_grace_period}}, but already translate that to a deadline on the 
master before instructing the agent to drain.
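
A small sketch of the difference between the two representations, using only the C++ standard library. The names {{drainDeadline}} and {{remainingGrace}} are hypothetical and not part of any Mesos API. Given the drain start time, a relative grace period converts to an absolute deadline once, and anything reading the state later needs only the deadline and the current time.

```cpp
#include <chrono>

using Clock = std::chrono::system_clock;

// Hypothetical helper: translate a relative grace period into an absolute
// deadline, given the time at which draining started.
Clock::time_point drainDeadline(
    Clock::time_point drainStart,
    std::chrono::seconds maxGracePeriod)
{
  return drainStart + maxGracePeriod;
}

// Hypothetical helper: time left until the deadline, clamped at zero. Note
// that this needs no knowledge of when draining started.
std::chrono::seconds remainingGrace(
    Clock::time_point deadline,
    Clock::time_point now)
{
  if (now >= deadline) {
    return std::chrono::seconds(0);
  }

  return std::chrono::duration_cast<std::chrono::seconds>(deadline - now);
}
```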


was (Author: bbannier):
[~greggomann], with the way the drain config is presented in master and agent 
state endpoints it is currently hard to interpret the meaning of 
{{max_grace_period}} as represents a value relative to the time the draining 
was starting, however that time is not exposed. WDYT about including the start 
time in HTTP responses as well? Specifying a duration as opposed to a deadline 
is convenient for users _triggering draining_, but hard to interpret after the 
fact.

> Update UI for agent draining
> 
>
> Key: MESOS-9846
> URL: https://issues.apache.org/jira/browse/MESOS-9846
> Project: Mesos
>  Issue Type: Task
>  Components: webui
>Reporter: Greg Mann
>Assignee: Benjamin Bannier
>Priority: Major
>  Labels: foundations, mesosphere
>
> We should expose the new agent metadata in the web UI:
> * Drain info
> * Deactivation state
> It may also be worth exposing unreachable and gone agents in some way, so 
> that agents do not simply disappear from the UI when they transition to 
> unreachable and/or gone, during or after maintenance.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (MESOS-9846) Update UI for agent draining

2019-07-16 Thread Benjamin Bannier (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886050#comment-16886050
 ] 

Benjamin Bannier commented on MESOS-9846:
-

[~greggomann], with the way the drain config is presented in master and agent 
state endpoints it is currently hard to interpret the meaning of 
{{max_grace_period}} as it represents a value relative to the time the draining 
started; however, that time is not exposed. WDYT about including the start 
time in HTTP responses as well? Specifying a duration as opposed to a deadline 
is convenient for users _triggering draining_, but hard to interpret after the 
fact.

> Update UI for agent draining
> 
>
> Key: MESOS-9846
> URL: https://issues.apache.org/jira/browse/MESOS-9846
> Project: Mesos
>  Issue Type: Task
>  Components: webui
>Reporter: Greg Mann
>Assignee: Benjamin Bannier
>Priority: Major
>  Labels: foundations, mesosphere
>
> We should expose the new agent metadata in the web UI:
> * Drain info
> * Deactivation state
> It may also be worth exposing unreachable and gone agents in some way, so 
> that agents do not simply disappear from the UI when they transition to 
> unreachable and/or gone, during or after maintenance.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Issue Comment Deleted] (MESOS-9846) Update UI for agent draining

2019-07-16 Thread Benjamin Bannier (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-9846:

Comment: was deleted

(was: With MESOS-9816 we only add the drain information to the master 
endpoints. This only allows us to display drain information in {{/#/agents}} 
which uses master state; for the agent detail screen {{/#/agents/}} 
we would however need to display this information also the the agent state 
endpoint.)

> Update UI for agent draining
> 
>
> Key: MESOS-9846
> URL: https://issues.apache.org/jira/browse/MESOS-9846
> Project: Mesos
>  Issue Type: Task
>  Components: webui
>Reporter: Greg Mann
>Assignee: Benjamin Bannier
>Priority: Major
>  Labels: foundations, mesosphere
>
> We should expose the new agent metadata in the web UI:
> * Drain info
> * Deactivation state
> It may also be worth exposing unreachable and gone agents in some way, so 
> that agents do not simply disappear from the UI when they transition to 
> unreachable and/or gone, during or after maintenance.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (MESOS-9846) Update UI for agent draining

2019-07-16 Thread Benjamin Bannier (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886024#comment-16886024
 ] 

Benjamin Bannier commented on MESOS-9846:
-

With MESOS-9816 we only add the drain information to the master endpoints. This 
only allows us to display drain information in {{/#/agents}} which uses master 
state; for the agent detail screen {{/#/agents/}} we would however 
need to expose this information also in the agent state endpoint.

> Update UI for agent draining
> 
>
> Key: MESOS-9846
> URL: https://issues.apache.org/jira/browse/MESOS-9846
> Project: Mesos
>  Issue Type: Task
>  Components: webui
>Reporter: Greg Mann
>Assignee: Benjamin Bannier
>Priority: Major
>  Labels: foundations, mesosphere
>
> We should expose the new agent metadata in the web UI:
> * Drain info
> * Deactivation state
> It may also be worth exposing unreachable and gone agents in some way, so 
> that agents do not simply disappear from the UI when they transition to 
> unreachable and/or gone, during or after maintenance.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (MESOS-9846) Update UI for agent draining

2019-07-15 Thread Benjamin Bannier (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-9846:
---

Assignee: Benjamin Bannier

> Update UI for agent draining
> 
>
> Key: MESOS-9846
> URL: https://issues.apache.org/jira/browse/MESOS-9846
> Project: Mesos
>  Issue Type: Task
>  Components: webui
>Reporter: Greg Mann
>Assignee: Benjamin Bannier
>Priority: Major
>
> We should expose the new agent metadata in the web UI:
> * Drain info
> * Deactivation state
> It may also be worth exposing unreachable and gone agents in some way, so 
> that agents do not simply disappear from the UI when they transition to 
> unreachable and/or gone, during or after maintenance.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

