[jira] [Updated] (MESOS-8035) Correct mesos-tests CMake build dependencies

2017-09-27 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-8035:

Labels: cmake  (was: )

> Correct mesos-tests CMake build dependencies
> 
>
> Key: MESOS-8035
> URL: https://issues.apache.org/jira/browse/MESOS-8035
> Project: Mesos
>  Issue Type: Bug
>  Components: cmake
>Reporter: Andrew Schwartzmeyer
>Assignee: Andrew Schwartzmeyer
>Priority: Minor
>  Labels: cmake
>
> Specifically, we currently get away with building all the dependencies by 
> having {{mesos-tests}} depend on everything, or we build the default target, 
> which builds every executable not specifically excluded. However, we should 
> correct the dependency graph to instead have {{mesos-agent}} depend on 
> {{mesos-fetcher}} etc., and {{mesos-tests}} only depend on leaf dependencies.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-8035) Correct mesos-tests CMake build dependencies

2017-09-27 Thread Andrew Schwartzmeyer (JIRA)
Andrew Schwartzmeyer created MESOS-8035:
---

 Summary: Correct mesos-tests CMake build dependencies
 Key: MESOS-8035
 URL: https://issues.apache.org/jira/browse/MESOS-8035
 Project: Mesos
  Issue Type: Bug
  Components: cmake
Reporter: Andrew Schwartzmeyer
Assignee: Andrew Schwartzmeyer
Priority: Minor


Specifically, we currently get away with building all the dependencies by 
having {{mesos-tests}} depend on everything, or we build the default target, 
which builds every executable not specifically excluded. However, we should 
correct the dependency graph to instead have {{mesos-agent}} depend on 
{{mesos-fetcher}} etc., and {{mesos-tests}} only depend on leaf dependencies.
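
For illustration, a minimal CMake sketch of the intended shape of the dependency graph; the source file names are placeholders, not the actual CMakeLists.txt contents:

{noformat}
# Each executable is its own target.
add_executable(mesos-fetcher fetcher_main.cpp)
add_executable(mesos-agent agent_main.cpp)
add_executable(mesos-tests tests_main.cpp)

# The agent launches the fetcher at runtime, so building the agent should
# also build the fetcher.
add_dependencies(mesos-agent mesos-fetcher)

# The tests then only depend on the leaf targets they exercise directly,
# and everything else follows transitively.
add_dependencies(mesos-tests mesos-agent)
{noformat}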



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (MESOS-8034) Remove LIBNAME_VERSION from EXTERNAL

2017-09-27 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-8034:
---

Assignee: Andrew Schwartzmeyer

> Remove LIBNAME_VERSION from EXTERNAL
> 
>
> Key: MESOS-8034
> URL: https://issues.apache.org/jira/browse/MESOS-8034
> Project: Mesos
>  Issue Type: Improvement
>  Components: cmake
>Reporter: Andrew Schwartzmeyer
>Assignee: Andrew Schwartzmeyer
>Priority: Minor
>  Labels: cmake
>
> This setting is superfluous, and moreover, overwrites the existing variables 
> set in {{Versions.cmake}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-8034) Remove LIBNAME_VERSION from EXTERNAL

2017-09-27 Thread Andrew Schwartzmeyer (JIRA)
Andrew Schwartzmeyer created MESOS-8034:
---

 Summary: Remove LIBNAME_VERSION from EXTERNAL
 Key: MESOS-8034
 URL: https://issues.apache.org/jira/browse/MESOS-8034
 Project: Mesos
  Issue Type: Improvement
  Components: cmake
Reporter: Andrew Schwartzmeyer
Priority: Minor


This setting is superfluous, and moreover, overwrites the existing variables 
set in {{Versions.cmake}}.
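
For illustration, a generic sketch of how a duplicated assignment clobbers a version variable; the variable name and values are placeholders, not the actual contents of the Mesos build files:

{noformat}
# Versions.cmake: intended single source of truth for dependency versions.
set(GLOG_VERSION "x.y.z")

# A later, duplicated assignment (e.g. inside the EXTERNAL helper) silently
# overwrites the value above:
set(GLOG_VERSION "a.b.c")

message(STATUS "glog version: ${GLOG_VERSION}")  # Prints "a.b.c".
{noformat}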



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8033) Use more idiomatic CMake for compiler features

2017-09-27 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-8033:

Labels: cmake  (was: )

> Use more idiomatic CMake for compiler features
> --
>
> Key: MESOS-8033
> URL: https://issues.apache.org/jira/browse/MESOS-8033
> Project: Mesos
>  Issue Type: Improvement
>  Components: cmake
>Reporter: Andrew Schwartzmeyer
>Priority: Minor
>  Labels: cmake
>
> Specifically, we should replace
> {noformat}
>   string(APPEND CMAKE_CXX_FLAGS " -std=c++11")
> {noformat}
> With {{CMAKE_CXX_STANDARD}}, and use [compile feature 
> requirements|https://cmake.org/cmake/help/latest/manual/cmake-compile-features.7.html#compile-feature-requirements].
> And replace
> {noformat}
>   string(APPEND CMAKE_CXX_FLAGS " -Wformat-security")
> {noformat}
> With compile options instead of appending to {{CMAKE_CXX_FLAGS}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-8033) Use more idiomatic CMake for compiler features

2017-09-27 Thread Andrew Schwartzmeyer (JIRA)
Andrew Schwartzmeyer created MESOS-8033:
---

 Summary: Use more idiomatic CMake for compiler features
 Key: MESOS-8033
 URL: https://issues.apache.org/jira/browse/MESOS-8033
 Project: Mesos
  Issue Type: Improvement
  Components: cmake
Reporter: Andrew Schwartzmeyer
Priority: Minor


Specifically, we should replace

{noformat}
  string(APPEND CMAKE_CXX_FLAGS " -std=c++11")
{noformat}

With {{CMAKE_CXX_STANDARD}}, and use [compile feature 
requirements|https://cmake.org/cmake/help/latest/manual/cmake-compile-features.7.html#compile-feature-requirements].

And replace

{noformat}
  string(APPEND CMAKE_CXX_FLAGS " -Wformat-security")
{noformat}
With compile options instead of appending to {{CMAKE_CXX_FLAGS}}.
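
For illustration, a minimal sketch of the more idiomatic equivalents; the target name {{mesos}} is a placeholder:

{noformat}
# Instead of string(APPEND CMAKE_CXX_FLAGS " -std=c++11"):
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Or express it per target as a compile feature requirement:
target_compile_features(mesos PUBLIC cxx_std_11)

# Instead of string(APPEND CMAKE_CXX_FLAGS " -Wformat-security"):
target_compile_options(mesos PRIVATE -Wformat-security)
{noformat}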



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-8032) Launching SLRP

2017-09-27 Thread Chun-Hung Hsiao (JIRA)
Chun-Hung Hsiao created MESOS-8032:
--

 Summary: Launching SLRP
 Key: MESOS-8032
 URL: https://issues.apache.org/jira/browse/MESOS-8032
 Project: Mesos
  Issue Type: Task
  Components: agent
Reporter: Chun-Hung Hsiao
Assignee: Chun-Hung Hsiao
 Fix For: 1.5.0


Launching an SLRP requires the following steps:
1. Verify the configuration.
2. Launch the CSI plugins in standalone containers. The SLRP needs to use the
V1 API to talk to the agent to launch the plugins, which may require authN/authZ.
3. Get the resources from the CSI plugins and register with the resource
provider manager through the Resource Provider API.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8031) SLRP Configuration

2017-09-27 Thread Chun-Hung Hsiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun-Hung Hsiao updated MESOS-8031:
---
Labels: storage  (was: )

> SLRP Configuration
> --
>
> Key: MESOS-8031
> URL: https://issues.apache.org/jira/browse/MESOS-8031
> Project: Mesos
>  Issue Type: Task
>  Components: agent
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: storage
>
> A typical SLRP configuration could look like the following:
> {noformat}
> {
>   "type": "org.apache.mesos.rp.local.storage",
>   "name": "local-volume",
>   "storage": {
> "csi_plugins": [
>   {
> "name": "plugin_1",
> "command": {...},
> "resources": [...],
> "container": {...}
>   },
>   {
> "name": "plugin_2",
> "command": {...},
> "resources": [...],
> "container": {...}
>   }
> ],
> "controller_plugin_name": "plugin_1",
> "node_plugin_name": "plugin_2"
>   }
> }
> {noformat}
> The {{csi_plugins}} field lists the configurations to launch standalone
> containers for CSI plugins. The plugins are specified through a map, and we
> then use the {{controller_plugin_name}} and {{node_plugin_name}} fields to
> refer to the corresponding plugin. With this design, we can support both
> headless and split-component deployments for CSI.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-8031) SLRP Configuration

2017-09-27 Thread Chun-Hung Hsiao (JIRA)
Chun-Hung Hsiao created MESOS-8031:
--

 Summary: SLRP Configuration
 Key: MESOS-8031
 URL: https://issues.apache.org/jira/browse/MESOS-8031
 Project: Mesos
  Issue Type: Task
  Components: agent
Reporter: Chun-Hung Hsiao
Assignee: Chun-Hung Hsiao


A typical SLRP configuration could look like the following:
{noformat}
{
  "type": "org.apache.mesos.rp.local.storage",
  "name": "local-volume",
  "storage": {
"csi_plugins": [
  {
"name": "plugin_1",
"command": {...},
"resources": [...],
"container": {...}
  },
  {
"name": "plugin_2",
"command": {...},
"resources": [...],
"container": {...}
  }
],
"controller_plugin_name": "plugin_1",
"node_plugin_name": "plugin_2"
  }
}
{noformat}
The {{csi_plugins}} field lists the configurations to launch standalone
containers for CSI plugins. The plugins are specified through a map, and we
then use the {{controller_plugin_name}} and {{node_plugin_name}} fields to
refer to the corresponding plugin. With this design, we can support both
headless and split-component deployments for CSI.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (MESOS-8027) os::open doesn't always atomically apply O_CLOEXEC

2017-09-27 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach reassigned MESOS-8027:
--

Assignee: James Peach

| [r/62638|https://reviews.apache.org/r/62638] | Removed support for platforms 
without O_CLOEXEC. |

> os::open doesn't always atomically apply O_CLOEXEC
> --
>
> Key: MESOS-8027
> URL: https://issues.apache.org/jira/browse/MESOS-8027
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: James Peach
>Assignee: James Peach
>  Labels: security
>
> In [r/39180|https://reviews.apache.org/r/39180], the {{os/open.hpp}} header 
> was refactored so that it conditionally includes {{fcntl.h}}. However 
> {{fcntl.h}} is required to make the {{O_CLOEXEC}} symbol visible, so it is 
> quite likely that {{O_CLOEXEC_UNDEFINED}} will be defined even on systems 
> that do actually support {{O_CLOEXEC}}. This causes {{os::open}} to fall back 
> to the non-atomic open+fcntl sequence, which can leak file descriptors into 
> child processes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8030) A resource provider for supporting local storage through CSI

2017-09-27 Thread Chun-Hung Hsiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun-Hung Hsiao updated MESOS-8030:
---
Labels: storage  (was: )

> A resource provider for supporting local storage through CSI
> 
>
> Key: MESOS-8030
> URL: https://issues.apache.org/jira/browse/MESOS-8030
> Project: Mesos
>  Issue Type: Epic
>  Components: agent
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: storage
>
> The Storage Local Resource Provider (SLRP) is a resource provider component 
> in Mesos to manage persistent local storage on agents. SLRP should support 
> the following MVP functions:
> * Registering to the RP manager (P0)
> * Reporting available disk resources through a CSI controller plugin. (P0)
> * Processing resource converting operations (CREATE_BLOCK, CREATE_VOLUME, 
> DESTROY_BLOCK, DESTROY_VOLUME) issued by frameworks to convert RAW disk 
> resources to mount or block volumes through a CSI controller plugin (P0)
> * Publish/unpublish a disk resource through CSI controller/node plugins for a 
> task (P0)
> * Support storage profiles through modules (P1)
> * Tracking and checkpointing resources and reservations (P1)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-8030) A resource provider for supporting local storage through CSI

2017-09-27 Thread Chun-Hung Hsiao (JIRA)
Chun-Hung Hsiao created MESOS-8030:
--

 Summary: A resource provider for supporting local storage through 
CSI
 Key: MESOS-8030
 URL: https://issues.apache.org/jira/browse/MESOS-8030
 Project: Mesos
  Issue Type: Epic
  Components: agent
Reporter: Chun-Hung Hsiao
Assignee: Chun-Hung Hsiao


The Storage Local Resource Provider (SLRP) is a resource provider component in 
Mesos to manage persistent local storage on agents. SLRP should support the 
following MVP functions:
* Registering to the RP manager (P0)
* Reporting available disk resources through a CSI controller plugin. (P0)
* Processing resource converting operations (CREATE_BLOCK, CREATE_VOLUME, 
DESTROY_BLOCK, DESTROY_VOLUME) issued by frameworks to convert RAW disk 
resources to mount or block volumes through a CSI controller plugin (P0)
* Publish/unpublish a disk resource through CSI controller/node plugins for a 
task (P0)
* Support storage profiles through modules (P1)
* Tracking and checkpointing resources and reservations (P1)




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-8029) Benchmark hierarchical roles for large role trees.

2017-09-27 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-8029:
--

 Summary: Benchmark hierarchical roles for large role trees.
 Key: MESOS-8029
 URL: https://issues.apache.org/jira/browse/MESOS-8029
 Project: Mesos
  Issue Type: Improvement
  Components: allocation, master
Reporter: Benjamin Mahler


It's unlikely that large hierarchical role trees will have acceptable 
performance for the use cases we want to be able to support (e.g. consider a 
scheduler with each service or job having a role).

We'll need to perform some benchmarking and identify performance improvements 
to make role trees scalable.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-4065) slave FD for ZK tcp connection leaked to executor process

2017-09-27 Thread R.B. Boyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

R.B. Boyer updated MESOS-4065:
--
Affects Version/s: 1.2.2

> slave FD for ZK tcp connection leaked to executor process
> -
>
> Key: MESOS-4065
> URL: https://issues.apache.org/jira/browse/MESOS-4065
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.24.1, 0.25.0, 1.2.2
>Reporter: James DeFelice
>  Labels: mesosphere, security
>
> {code}
> core@ip-10-0-0-45 ~ $ ps auxwww|grep -e etcd
> root  1432 99.3  0.0 202420 12928 ?Rsl  21:32  13:51 
> ./etcd-mesos-executor -log_dir=./
> root  1450  0.4  0.1  38332 28752 ?Sl   21:32   0:03 ./etcd 
> --data-dir=etcd_data --name=etcd-1449178273 
> --listen-peer-urls=http://10.0.0.45:1025 
> --initial-advertise-peer-urls=http://10.0.0.45:1025 
> --listen-client-urls=http://10.0.0.45:1026 
> --advertise-client-urls=http://10.0.0.45:1026 
> --initial-cluster=etcd-1449178273=http://10.0.0.45:1025,etcd-1449178271=http://10.0.2.95:1025,etcd-1449178272=http://10.0.2.216:1025
>  --initial-cluster-state=existing
> core  1651  0.0  0.0   6740   928 pts/0S+   21:46   0:00 grep 
> --colour=auto -e etcd
> core@ip-10-0-0-45 ~ $ sudo lsof -p 1432|grep -e 2181
> etcd-meso 1432 root   10u IPv4  21973  0t0TCP 
> ip-10-0-0-45.us-west-2.compute.internal:54016->ip-10-0-5-206.us-west-2.compute.internal:2181
>  (ESTABLISHED)
> core@ip-10-0-0-45 ~ $ ps auxwww|grep -e slave
> root  1124  0.2  0.1 900496 25736 ?Ssl  21:11   0:04 
> /opt/mesosphere/packages/mesos--52cbecde74638029c3ba0ac5e5ab81df8debf0fa/sbin/mesos-slave
> core  1658  0.0  0.0   6740   832 pts/0S+   21:46   0:00 grep 
> --colour=auto -e slave
> core@ip-10-0-0-45 ~ $ sudo lsof -p 1124|grep -e 2181
> mesos-sla 1124 root   10u IPv4  21973  0t0TCP 
> ip-10-0-0-45.us-west-2.compute.internal:54016->ip-10-0-5-206.us-west-2.compute.internal:2181
>  (ESTABLISHED)
> {code}
> I only tested against mesos 0.24.1 and 0.25.0.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-8028) Ensure webui can handle frameworks with large number of roles.

2017-09-27 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-8028:
--

 Summary: Ensure webui can handle frameworks with large number of 
roles.
 Key: MESOS-8028
 URL: https://issues.apache.org/jira/browse/MESOS-8028
 Project: Mesos
  Issue Type: Improvement
  Components: webui
Reporter: Benjamin Mahler


Currently, the webui displays the list of framework roles in the framework 
tables as well as in the framework page. With the introduction of multi-role 
frameworks and hierarchical roles, frameworks may use a very large number of 
roles (e.g. consider a framework that uses 1 role per service job). The webui 
should assume the number of roles can be O(10,000s).

* This means listing out all the roles in table columns and so on is 
problematic.
* The 'Roles' tab will likely need to be updated to display more of an
expandable / collapsible tree-like structure to navigate the portions of the
hierarchy that the user is interested in (per the original suggestion in
MESOS-6995).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8027) os::open doesn't always atomically apply O_CLOEXEC

2017-09-27 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach updated MESOS-8027:
---
Labels: security  (was: )

> os::open doesn't always atomically apply O_CLOEXEC
> --
>
> Key: MESOS-8027
> URL: https://issues.apache.org/jira/browse/MESOS-8027
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: James Peach
>  Labels: security
>
> In [r/39180|https://reviews.apache.org/r/39180], the {{os/open.hpp}} header 
> was refactored so that it conditionally includes {{fcntl.h}}. However 
> {{fcntl.h}} is required to make the {{O_CLOEXEC}} symbol visible, so it is 
> quite likely that {{O_CLOEXEC_UNDEFINED}} will be defined even on systems 
> that do actually support {{O_CLOEXEC}}. This causes {{os::open}} to fall back 
> to the non-atomic open+fcntl sequence, which can leak file descriptors into 
> child processes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-8027) os::open doesn't always atomically apply O_CLOEXEC

2017-09-27 Thread James Peach (JIRA)
James Peach created MESOS-8027:
--

 Summary: os::open doesn't always atomically apply O_CLOEXEC
 Key: MESOS-8027
 URL: https://issues.apache.org/jira/browse/MESOS-8027
 Project: Mesos
  Issue Type: Bug
  Components: stout
Reporter: James Peach


In [r/39180|https://reviews.apache.org/r/39180], the {{os/open.hpp}} header was 
refactored so that it conditionally includes {{fcntl.h}}. However {{fcntl.h}} 
is required to make the {{O_CLOEXEC}} symbol visible, so it is quite likely 
that {{O_CLOEXEC_UNDEFINED}} will be defined even on systems that do actually 
support {{O_CLOEXEC}}. This causes {{os::open}} to fall back to the non-atomic 
open+fcntl sequence, which can leak file descriptors into child processes.
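
For illustration, a plain POSIX sketch of the difference between the two code
paths (this is not the actual stout implementation):

{code}
#include <fcntl.h>   // Required for O_CLOEXEC to be visible.
#include <unistd.h>

int open_cloexec(const char* path)
{
#ifdef O_CLOEXEC
  // Atomic: the descriptor is never observable without the close-on-exec
  // flag already set.
  return ::open(path, O_RDONLY | O_CLOEXEC);
#else
  // Non-atomic fallback: between open() and fcntl(), another thread may
  // fork+exec and leak the descriptor into the child process.
  int fd = ::open(path, O_RDONLY);
  if (fd >= 0) {
    ::fcntl(fd, F_SETFD, FD_CLOEXEC);
  }
  return fd;
#endif
}
{code}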



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7651) Consider a more explicit way to bind reservations / volumes to a framework.

2017-09-27 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-7651:
---
Labels: multitenancy  (was: )

> Consider a more explicit way to bind reservations / volumes to a framework.
> ---
>
> Key: MESOS-7651
> URL: https://issues.apache.org/jira/browse/MESOS-7651
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Mahler
>  Labels: multitenancy
>
> Currently, when a framework creates a reservation or a persistent volume, and 
> it wants exclusive access to this volume or reservation, it must take a few 
> steps:
> * Ensure that no other frameworks are running within the reservation role (or 
> the other frameworks are co-operative).
> * With hierarchical roles, frameworks must also ensure that the role is a 
> leaf so that no descendant roles will have access to the reservation/volume. 
> This could be done by generating a role (e.g. eng/kafka/).
> It's not easy for the framework to ensure these things, since role ACLs are 
> controlled by the operator.
> We should consider a more direct way for a framework to ensure that their 
> reservation/volume cannot be shared. E.g. by binding it to their framework id 
> (perhaps re-using roles for this rather than introducing something new?)
> We should also consider binding the reservation / volumes, much like other 
> objects (tasks, executors), to the framework's lifecycle. So that if the 
> framework is removed, the reservations / volumes it left behind are cleaned 
> up.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7258) Provide scheduler calls to subscribe to additional roles and unsubscribe from roles.

2017-09-27 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-7258:
---
Description: 
The current support for schedulers to subscribe to additional roles or 
unsubscribe from some of their roles requires that the scheduler obtain a new 
subscription with the master which invalidates the event stream.

A more lightweight mechanism would be to provide calls for the scheduler to 
subscribe to additional roles or unsubscribe from some roles such that the 
existing event stream remains open and offers to the new roles arrive on the 
existing event stream. E.g.

SUBSCRIBE_TO_ROLE
UNSUBSCRIBE_FROM_ROLE

One open question pertains to the terminology here, whether we would want to 
avoid using "subscribe" in this context. An alternative would be:

UPDATE_FRAMEWORK_INFO

Which provides a generic mechanism for a framework to perform framework info 
updates without obtaining a new event stream.

*NOTE*: Not specific to this issue, but we need to figure out how to allow the 
framework to not leak reservations, e.g. MESOS-7651.

  was:
The current support for schedulers to subscribe to additional roles or 
unsubscribe from some of their roles requires that the scheduler obtain a new 
subscription with the master which invalidates the event stream.

A more lightweight mechanism would be to provide calls for the scheduler to 
subscribe to additional roles or unsubscribe from some roles such that the 
existing event stream remains open and offers to the new roles arrive on the 
existing event stream. E.g.

SUBSCRIBE_TO_ROLE
UNSUBSCRIBE_FROM_ROLE

One open question pertains to the terminology here, whether we would want to 
avoid using "subscribe" in this context. An alternative would be:

UPDATE_FRAMEWORK_INFO

Which provides a generic mechanism for a framework to perform framework info 
updates without obtaining a new event stream.


> Provide scheduler calls to subscribe to additional roles and unsubscribe from 
> roles.
> 
>
> Key: MESOS-7258
> URL: https://issues.apache.org/jira/browse/MESOS-7258
> Project: Mesos
>  Issue Type: Improvement
>  Components: master, scheduler api
>Reporter: Benjamin Mahler
>  Labels: multitenancy
>
> The current support for schedulers to subscribe to additional roles or 
> unsubscribe from some of their roles requires that the scheduler obtain a new 
> subscription with the master which invalidates the event stream.
> A more lightweight mechanism would be to provide calls for the scheduler to 
> subscribe to additional roles or unsubscribe from some roles such that the 
> existing event stream remains open and offers to the new roles arrive on the 
> existing event stream. E.g.
> SUBSCRIBE_TO_ROLE
> UNSUBSCRIBE_FROM_ROLE
> One open question pertains to the terminology here, whether we would want to 
> avoid using "subscribe" in this context. An alternative would be:
> UPDATE_FRAMEWORK_INFO
> Which provides a generic mechanism for a framework to perform framework info 
> updates without obtaining a new event stream.
> *NOTE*: Not specific to this issue, but we need to figure out how to allow 
> the framework to not leak reservations, e.g. MESOS-7651.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-8026) Satisfy quota evenly across roles.

2017-09-27 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-8026:
--

 Summary: Satisfy quota evenly across roles.
 Key: MESOS-8026
 URL: https://issues.apache.org/jira/browse/MESOS-8026
 Project: Mesos
  Issue Type: Improvement
  Components: allocation
Reporter: Benjamin Mahler


Currently, quota is allocated based on a DRF sorting of the roles by each
role's cluster-wide allocation. This strategy is in place to attempt to
allocate quota in a fair manner. However, it is ill-fitted to quota, since
quota bears no connection to a fair share of the overall cluster. Rather, a
more appropriate notion of fairness for quota would be a DRF sorting of the
roles based on each role's share of its quota.

For example:

{noformat}
small_role: quota = [10,20,30]
large_role: quota = [100,200,300]

both have allocation of [0,0,0] with dominant share of 0.

allocate to small_role [1,2,3], small role now has dominant share of 0.1.

allocate to large_role [1,2,3], large role now has dominant share 0.01,
allocate to large_role [2,4,6], large role now has dominant share 0.02,
...
allocate to large_role [10,20,30], large role now has dominant share 0.1.

now small_role and large_role are tied again, repeat.
{noformat}

This strategy attempts to satisfy quota evenly across the roles. In this
example, since large_role has 10x the quota of small_role, it receives
resources in a 10:1 ratio compared to small_role.
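
For illustration, a sketch of the proposed sorting criterion, with plain
vectors standing in for Mesos resource objects (this is not the allocator
implementation):

{code}
#include <algorithm>
#include <vector>

// Dominant share of a role relative to its quota (rather than relative to
// the cluster total): the maximum over resources of allocation / quota.
double quotaDominantShare(
    const std::vector<double>& allocation,
    const std::vector<double>& quota)
{
  double share = 0.0;
  for (size_t i = 0; i < std::min(allocation.size(), quota.size()); i++) {
    if (quota[i] > 0.0) {
      share = std::max(share, allocation[i] / quota[i]);
    }
  }
  return share;
}

// With the example above, after the first round:
//   quotaDominantShare({1, 2, 3}, {10, 20, 30})    == 0.1
//   quotaDominantShare({1, 2, 3}, {100, 200, 300}) == 0.01
// so large_role keeps being picked until its share also reaches 0.1.
{code}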



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7605) UCR doesn't isolate uts namespace w/ host networking

2017-09-27 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16183154#comment-16183154
 ] 

James Peach commented on MESOS-7605:


After thinking about this some more, there are 3 cases:

1. No container image. In this case there's no container image (so we won't 
rewrite {{/etc/hostname}}) but we still want to enter a UTS namespace for 
security reasons.

2. Container image with {{network/cni}}. When we have a container image, we can 
consistently set the hostname inside the container.  {{network/cni}} only 
enters a UTS namespace when setting the hostname.

3. Container image w/ {{network/port_mapping}}. This isolator never enters a 
UTS namespace to set the hostname and is agnostic to whether there is a 
container image.

The goal here is to isolate the UTS namespace, not necessarily to support
per-container hostnames in every configuration. Since {{network/cni}} is always
enabled by default, we could have that isolator always enter a UTS namespace;
however, it seems unreasonable to exclude {{network/port_mapping}} users.

So what I would like to do here is add a simple {{namespaces/uts}} isolator 
that does nothing more than place a container tree in a new UTS namespace. To 
be compatible with CNI and network namespaces, all the containers in the tree 
should join the same UTS namespace.

Note that this would not change the semantics of how we set the hostname in
containers. It merely adds an additional layer of namespace isolation in more
cases. The {{namespaces/uts}} isolator will not update the hostname inside the
container. If that is required, then the {{network/cni}} isolator should be
used.
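
For illustration, a plain Linux sketch of what entering a new UTS namespace
amounts to at the syscall level (this is not the proposed isolator
implementation):

{code}
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main()
{
  // Detach from the parent's UTS namespace (requires CAP_SYS_ADMIN).
  if (unshare(CLONE_NEWUTS) != 0) {
    perror("unshare");
    return 1;
  }

  // Changing the hostname now only affects this namespace; the host's
  // hostname is left untouched.
  const char* name = "container-host";
  if (sethostname(name, strlen(name)) != 0) {
    perror("sethostname");
    return 1;
  }

  return 0;
}
{code}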


Ping [~gilbert], [~qianzhang], [~jieyu], [~avinash.mesos]

> UCR doesn't isolate uts namespace w/ host networking
> 
>
> Key: MESOS-7605
> URL: https://issues.apache.org/jira/browse/MESOS-7605
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: James DeFelice
>Assignee: James Peach
>  Labels: mesosphere
>
> Docker's {{run}} command supports a {{--hostname}} parameter which impacts 
> container isolation, even in {{host}} network mode: (via 
> https://docs.docker.com/engine/reference/run/)
> {quote}
> Even in host network mode a container has its own UTS namespace by default. 
> As such --hostname is allowed in host network mode and will only change the 
> hostname inside the container. Similar to --hostname, the --add-host, --dns, 
> --dns-search, and --dns-option options can be used in host network mode.
> {quote}
> I see no evidence that UCR offers a similar isolation capability.
> Related: the {{ContainerInfo}} protobuf has a {{hostname}} field which was 
> initially added to support the Docker containerizer's use of the 
> {{--hostname}} Docker {{run}} flag.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-4812) Mesos fails to escape command health checks

2017-09-27 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-4812:
--
  Sprint: Mesosphere Sprint 65
Story Points: 5

> Mesos fails to escape command health checks
> ---
>
> Key: MESOS-4812
> URL: https://issues.apache.org/jira/browse/MESOS-4812
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
>Reporter: Lukas Loesche
>Assignee: Andrei Budnik
>  Labels: health-check, mesosphere, tech-debt
> Attachments: health_task.gif
>
>
> As described in https://github.com/mesosphere/marathon/issues/
> I would like to run a command health check
> {noformat}
> /bin/bash -c " {noformat}
> The health check fails because Mesos, while running the command inside the
> double quotes of an {{sh -c ""}} invocation, doesn't escape the double quotes
> in the command.
> If I escape the double quotes myself, the command health check succeeds. But
> this would mean that the user needs intimate knowledge of how Mesos executes
> his commands, which can't be right.
> I was told this is not a Marathon but a Mesos issue so am opening this JIRA. 
> I don't know if this only affects the command health check.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (MESOS-7504) Parent's mount namespace cannot be determined when launching a nested container.

2017-09-27 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone reassigned MESOS-7504:
-

Assignee: Andrei Budnik

> Parent's mount namespace cannot be determined when launching a nested 
> container.
> 
>
> Key: MESOS-7504
> URL: https://issues.apache.org/jira/browse/MESOS-7504
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.3.0
> Environment: Ubuntu 16.04
>Reporter: Alexander Rukletsov
>Assignee: Andrei Budnik
>  Labels: containerizer, flaky-test, mesosphere
>
> I've observed this failure twice in different Linux environments. Here is an 
> example of such failure:
> {noformat}
> [ RUN  ] 
> NestedMesosContainerizerTest.ROOT_CGROUPS_DestroyDebugContainerOnRecover
> I0509 21:53:25.471657 17167 containerizer.cpp:221] Using isolation: 
> cgroups/cpu,filesystem/linux,namespaces/pid,network/cni,volume/image
> I0509 21:53:25.475124 17167 linux_launcher.cpp:150] Using 
> /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> I0509 21:53:25.475407 17167 provisioner.cpp:249] Using default backend 
> 'overlay'
> I0509 21:53:25.481232 17186 containerizer.cpp:608] Recovering containerizer
> I0509 21:53:25.482295 17186 provisioner.cpp:410] Provisioner recovery complete
> I0509 21:53:25.482587 17187 containerizer.cpp:1001] Starting container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d for executor 'executor' of framework 
> I0509 21:53:25.482918 17189 cgroups.cpp:410] Creating cgroup at 
> '/sys/fs/cgroup/cpu,cpuacct/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753/21bc372c-0f2c-49f5-b8ab-8d32c232b95d'
>  for container 21bc372c-0f2c-49f5-b8ab-8d32c232b95d
> I0509 21:53:25.484103 17190 cpu.cpp:101] Updated 'cpu.shares' to 1024 (cpus 
> 1) for container 21bc372c-0f2c-49f5-b8ab-8d32c232b95d
> I0509 21:53:25.484808 17186 containerizer.cpp:1524] Launching 
> 'mesos-containerizer' with flags '--help="false" 
> --launch_info="{"clone_namespaces":[131072,536870912],"command":{"shell":true,"value":"sleep
>  
> 1000"},"environment":{"variables":[{"name":"MESOS_SANDBOX","type":"VALUE","value":"\/tmp\/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_zlywyr"}]},"pre_exec_commands":[{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/home\/ubuntu\/workspace\/mesos\/Mesos_CI-build\/FLAG\/SSL\/label\/mesos-ec2-ubuntu-16.04\/mesos\/build\/src\/mesos-containerizer"},{"shell":true,"value":"mount
>  -n -t proc proc \/proc -o 
> nosuid,noexec,nodev"}],"working_directory":"\/tmp\/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_zlywyr"}"
>  --pipe_read="29" --pipe_write="32" 
> --runtime_directory="/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_sKhtj7/containers/21bc372c-0f2c-49f5-b8ab-8d32c232b95d"
>  --unshare_namespace_mnt="false"'
> I0509 21:53:25.484978 17189 linux_launcher.cpp:429] Launching container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d and cloning with namespaces CLONE_NEWNS 
> | CLONE_NEWPID
> I0509 21:53:25.513890 17186 containerizer.cpp:1623] Checkpointing container's 
> forked pid 1873 to 
> '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_Rdjw6M/meta/slaves/frameworks/executors/executor/runs/21bc372c-0f2c-49f5-b8ab-8d32c232b95d/pids/forked.pid'
> I0509 21:53:25.515878 17190 fetcher.cpp:353] Starting to fetch URIs for 
> container: 21bc372c-0f2c-49f5-b8ab-8d32c232b95d, directory: 
> /tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_zlywyr
> I0509 21:53:25.517715 17193 containerizer.cpp:1791] Starting nested container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d.ea991d38-e1a5-44fe-a522-622b15142e35
> I0509 21:53:25.518569 17193 switchboard.cpp:545] Launching 
> 'mesos-io-switchboard' with flags '--heartbeat_interval="30secs" 
> --help="false" 
> --socket_address="/tmp/mesos-io-switchboard-ca463cf2-70ba-4121-a5c6-1a170ae40c1b"
>  --stderr_from_fd="36" --stderr_to_fd="2" --stdin_to_fd="32" 
> --stdout_from_fd="33" --stdout_to_fd="1" --tty="false" 
> --wait_for_connection="true"' for container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d.ea991d38-e1a5-44fe-a522-622b15142e35
> I0509 21:53:25.521229 17193 switchboard.cpp:575] Created I/O switchboard 
> server (pid: 1881) listening on socket file 
> '/tmp/mesos-io-switchboard-ca463cf2-70ba-4121-a5c6-1a170ae40c1b' for 
> container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d.ea991d38-e1a5-44fe-a522-622b15142e35
> I0509 21:53:25.522195 17191 containerizer.cpp:1524] Launching 
> 'mesos-containerizer' with flags '--help="false" 
> --launch_info="{"command":{"shell":true,"value":"sleep 
> 

[jira] [Updated] (MESOS-7504) Parent's mount namespace cannot be determined when launching a nested container.

2017-09-27 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7504:
--
  Sprint: Mesosphere Sprint 65
Story Points: 3

> Parent's mount namespace cannot be determined when launching a nested 
> container.
> 
>
> Key: MESOS-7504
> URL: https://issues.apache.org/jira/browse/MESOS-7504
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.3.0
> Environment: Ubuntu 16.04
>Reporter: Alexander Rukletsov
>  Labels: containerizer, flaky-test, mesosphere
>
> I've observed this failure twice in different Linux environments. Here is an 
> example of such failure:
> {noformat}
> [ RUN  ] 
> NestedMesosContainerizerTest.ROOT_CGROUPS_DestroyDebugContainerOnRecover
> I0509 21:53:25.471657 17167 containerizer.cpp:221] Using isolation: 
> cgroups/cpu,filesystem/linux,namespaces/pid,network/cni,volume/image
> I0509 21:53:25.475124 17167 linux_launcher.cpp:150] Using 
> /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> I0509 21:53:25.475407 17167 provisioner.cpp:249] Using default backend 
> 'overlay'
> I0509 21:53:25.481232 17186 containerizer.cpp:608] Recovering containerizer
> I0509 21:53:25.482295 17186 provisioner.cpp:410] Provisioner recovery complete
> I0509 21:53:25.482587 17187 containerizer.cpp:1001] Starting container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d for executor 'executor' of framework 
> I0509 21:53:25.482918 17189 cgroups.cpp:410] Creating cgroup at 
> '/sys/fs/cgroup/cpu,cpuacct/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753/21bc372c-0f2c-49f5-b8ab-8d32c232b95d'
>  for container 21bc372c-0f2c-49f5-b8ab-8d32c232b95d
> I0509 21:53:25.484103 17190 cpu.cpp:101] Updated 'cpu.shares' to 1024 (cpus 
> 1) for container 21bc372c-0f2c-49f5-b8ab-8d32c232b95d
> I0509 21:53:25.484808 17186 containerizer.cpp:1524] Launching 
> 'mesos-containerizer' with flags '--help="false" 
> --launch_info="{"clone_namespaces":[131072,536870912],"command":{"shell":true,"value":"sleep
>  
> 1000"},"environment":{"variables":[{"name":"MESOS_SANDBOX","type":"VALUE","value":"\/tmp\/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_zlywyr"}]},"pre_exec_commands":[{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/home\/ubuntu\/workspace\/mesos\/Mesos_CI-build\/FLAG\/SSL\/label\/mesos-ec2-ubuntu-16.04\/mesos\/build\/src\/mesos-containerizer"},{"shell":true,"value":"mount
>  -n -t proc proc \/proc -o 
> nosuid,noexec,nodev"}],"working_directory":"\/tmp\/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_zlywyr"}"
>  --pipe_read="29" --pipe_write="32" 
> --runtime_directory="/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_sKhtj7/containers/21bc372c-0f2c-49f5-b8ab-8d32c232b95d"
>  --unshare_namespace_mnt="false"'
> I0509 21:53:25.484978 17189 linux_launcher.cpp:429] Launching container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d and cloning with namespaces CLONE_NEWNS 
> | CLONE_NEWPID
> I0509 21:53:25.513890 17186 containerizer.cpp:1623] Checkpointing container's 
> forked pid 1873 to 
> '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_Rdjw6M/meta/slaves/frameworks/executors/executor/runs/21bc372c-0f2c-49f5-b8ab-8d32c232b95d/pids/forked.pid'
> I0509 21:53:25.515878 17190 fetcher.cpp:353] Starting to fetch URIs for 
> container: 21bc372c-0f2c-49f5-b8ab-8d32c232b95d, directory: 
> /tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_zlywyr
> I0509 21:53:25.517715 17193 containerizer.cpp:1791] Starting nested container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d.ea991d38-e1a5-44fe-a522-622b15142e35
> I0509 21:53:25.518569 17193 switchboard.cpp:545] Launching 
> 'mesos-io-switchboard' with flags '--heartbeat_interval="30secs" 
> --help="false" 
> --socket_address="/tmp/mesos-io-switchboard-ca463cf2-70ba-4121-a5c6-1a170ae40c1b"
>  --stderr_from_fd="36" --stderr_to_fd="2" --stdin_to_fd="32" 
> --stdout_from_fd="33" --stdout_to_fd="1" --tty="false" 
> --wait_for_connection="true"' for container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d.ea991d38-e1a5-44fe-a522-622b15142e35
> I0509 21:53:25.521229 17193 switchboard.cpp:575] Created I/O switchboard 
> server (pid: 1881) listening on socket file 
> '/tmp/mesos-io-switchboard-ca463cf2-70ba-4121-a5c6-1a170ae40c1b' for 
> container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d.ea991d38-e1a5-44fe-a522-622b15142e35
> I0509 21:53:25.522195 17191 containerizer.cpp:1524] Launching 
> 'mesos-containerizer' with flags '--help="false" 
> --launch_info="{"command":{"shell":true,"value":"sleep 
> 1000"},"enter_namespaces":[131072,536870912],"environment":{}}" 
> 

[jira] [Updated] (MESOS-7500) Command checks via agent lead to flaky tests.

2017-09-27 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7500:
--
Sprint: Mesosphere Sprint 56, Mesosphere Sprint 65  (was: Mesosphere Sprint 
56)

> Command checks via agent lead to flaky tests.
> -
>
> Key: MESOS-7500
> URL: https://issues.apache.org/jira/browse/MESOS-7500
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rukletsov
>Assignee: Andrei Budnik
>  Labels: check, flaky-test, health-check, mesosphere
>
> Tests that rely on command checks via agent are flaky on Apache CI. Here is 
> an example from one of the failed run: https://pastebin.com/g2mPgYzu



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7634) OsTest.ChownNoAccess fails on s390x machines

2017-09-27 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16182920#comment-16182920
 ] 

Vinod Kone commented on MESOS-7634:
---

Looks like that node is offline. Can you ask INFRA to get it back up?

> OsTest.ChownNoAccess fails on s390x machines
> 
>
> Key: MESOS-7634
> URL: https://issues.apache.org/jira/browse/MESOS-7634
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Nayana Thorat
>
> Running a custom branch of Mesos (with some fixes in docker build scripts for 
> s390x) on s390x based CI machines throws the following error when running 
> stout tests.
> {code}
> [ RUN  ] OsTest.ChownNoAccess
> ../../../../3rdparty/stout/tests/os_tests.cpp:839: Failure
> Value of: os::chown(uid.get(), gid.get(), "one", true).isError()
>   Actual: false
> Expected: true
> ../../../../3rdparty/stout/tests/os_tests.cpp:840: Failure
> Value of: os::chown(uid.get(), gid.get(), "one/two", true).isError()
>   Actual: false
> {code}
> One can repro this by building Mesos from my custom branch here: 
> https://github.com/vinodkone/mesos/tree/vinod/s390x



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7136) Eliminate fair sharing between frameworks within a role.

2017-09-27 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-7136:
---
Labels: multitenancy tech-debt  (was: tech-debt)

> Eliminate fair sharing between frameworks within a role.
> 
>
> Key: MESOS-7136
> URL: https://issues.apache.org/jira/browse/MESOS-7136
> Project: Mesos
>  Issue Type: Epic
>Reporter: Benjamin Mahler
>  Labels: multitenancy, tech-debt
>
> The current fair sharing algorithm performs fair sharing between frameworks 
> within a role. This is equivalent to having the framework id behave as a 
> pseudo-role beneath the role. Consider the case where there are two spark 
> frameworks running within the same "spark" role. This behaves similarly to 
> hierarchical roles with the framework ID acting as an implicit role:
> {noformat}
>                        ^
>                      /   \
>                 spark     services
>                   ^
>                 /   \
>      FrameworkId1   FrameworkId2
>   (fixed weight of 1)  (fixed weight of 1)
> {noformat}
> Unfortunately, the frameworks cannot change their weight to be a value other 
> than 1 (see MESOS-6247) and they cannot set quota.
> With the addition of hierarchical roles (see MESOS-6375) we can eliminate the 
> notion of the framework ID acting as a pseudo-role in favor of explicitly 
> using hierarchical roles. E.g.
> {noformat}
>                  ^
>                /   \
>             eng     sales
>              ^
>            /   \
>    analytics    ui
>        ^
>      /   \
>  learning   reports
> {noformat}
> Here if two frameworks run within the eng/analytics role, then they will 
> compete for resources without fair sharing. However, if resource guarantees 
> are required, sub-roles can be created explicitly, e.g. 
> eng/analytics/learning and eng/analytics/reports. These roles can be given 
> weights and quota.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-4441) Allocate revocable resources beyond quota guarantee.

2017-09-27 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-4441:
---
Labels: mesosphere multitenancy  (was: mesosphere)

> Allocate revocable resources beyond quota guarantee.
> 
>
> Key: MESOS-4441
> URL: https://issues.apache.org/jira/browse/MESOS-4441
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Alexander Rukletsov
>  Labels: mesosphere, multitenancy
>
> h4. Status Quo
> Currently resources allocated to frameworks in a role with quota (aka 
> quota'ed role) beyond quota guarantee are marked non-revocable. This impacts 
> our flexibility for revoking them if we decide so in the future.
> h4. Proposal
> Once the quota guarantee is satisfied, we need not necessarily allocate
> further resources as non-revocable. Instead, we can mark all offered resources
> beyond the guarantee as revocable. When {{RevocableInfo}} evolves in the
> future, frameworks will get additional information about the "revocability" of
> the resource (i.e. allocation slack).
> h4. Caveats
> Though it seems like a simple change, it has several implications.
> h6. Fairness
> Currently the hierarchical allocator considers revocable resources as regular 
> resources when doing fairness calculations. This may prevent frameworks 
> getting non-revocable resources as part of their role's quota guarantee if 
> they accept some revocable resources as well.
> Consider the following scenario. A single framework in a role with quota set 
> to {{10}} CPUs is allocated {{10}} CPUs as non-revocable resources as part of 
> its quota and additionally {{2}} revocable CPUs. Now a task using {{2}} 
> non-revocable CPUs finishes and its resources are returned. Total allocation 
> for the role is {{8}} non-revocable + {{2}} revocable. However, the role may 
> not be offered additional {{2}} non-revocable since its total allocation 
> satisfies quota.
> h6. Resource math
> If we allocate non-revocable resources as revocable, we should make sure we 
> do accounting right: either we should update total agent resources and mark 
> them as revocable as well, or bookkeep resources as non-revocable and convert 
> them to revocable when necessary.
> h6. Coarse-grained nature of allocation
> The hierarchical allocator performs "coarse-grained" allocation, meaning it 
> always allocates the entire remaining agent resources to a single framework. 
> This may lead to over-allocating some resources as non-revocable beyond quota 
> guarantee.
> h6. Quotas smaller than fair share
> If a quota set for a role is smaller than its fair share, it may reduce the 
> amount of resources offered to this role, if frameworks in it do not accept 
> revocable resources. This is probably the most important consequence of the 
> proposed change. Operators may set quota to get guarantees, but may observe a 
> decrease in the amount of resources a role gets, which is not intuitive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (MESOS-7790) Design hierarchical quota allocation.

2017-09-27 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler reassigned MESOS-7790:
--

Assignee: Michael Park

> Design hierarchical quota allocation.
> -
>
> Key: MESOS-7790
> URL: https://issues.apache.org/jira/browse/MESOS-7790
> Project: Mesos
>  Issue Type: Task
>  Components: allocation
>Reporter: Benjamin Mahler
>Assignee: Michael Park
>  Labels: multitenancy
>
> When quota is assigned in the role hierarchy (see MESOS-6375), it's possible 
> for there to be "undelegated" quota for a role. For example:
> {noformat}
>                  ^
>                /   \
>     eng (90 cpus)   sales (10 cpus)
>           ^
>         /   \
>  ads (50 cpus)   build (10 cpus)
> {noformat}
> Here, the "eng" role has 60 of its 90 cpus of quota delegated to its 
> children, and 30 cpus remain undelegated. We need to design how to allocate 
> these 30 undelegated cpus. Are they allocated entirely to the "eng" 
> role? Are they allocated to the "eng" role tree? If so, how do we determine 
> how much is allocated to each role in the "eng" tree (i.e. "eng", "eng/ads", 
> "eng/build").



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-6375) Support hierarchical resource allocation roles.

2017-09-27 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-6375:
---
Labels: mesosphere multitenancy  (was: mesosphere)

> Support hierarchical resource allocation roles.
> ---
>
> Key: MESOS-6375
> URL: https://issues.apache.org/jira/browse/MESOS-6375
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation
>Reporter: Benjamin Mahler
>Assignee: Neil Conway
>  Labels: mesosphere, multitenancy
>
> Currently mesos provides a non-hierarchical resource allocation model, in 
> which all roles are siblings of one another.
> Organizations often have a need for hierarchical resource allocation 
> constraints, whether for fair sharing of resources or for specifying quota 
> constraints.
> Consider the following fair sharing hierarchy based on "shares":
> {noformat}
>             ^                               ^
>           /   \                           /   \
>    eng (3)     sales (1)    =>    eng (75%)     sales (25%)
>       ^                              ^
>     /   \                          /   \
>  ads (2)   build (1)          ads (66%)   build (33%)
> {noformat}
> The hierarchy specifies that the engineering organization should get 3x as 
> many resources as sales, and within these resources the ads team should get 
> 2x as many resources as the build team. The implication of this is that, if 
> the ads team is not using some of its resources, the build team and 
> engineering organization will be able to use these resources before the sales 
> organization can. Without a hierarchy, the resources unused by the ads team 
> would be re-distributed among all other roles (rather than only its siblings).
> Quota can also apply in a hierarchical manner:
> {noformat}
>                  ^
>                /   \
>     eng (90 cpus)   sales (10 cpus)
>           ^
>         /   \
>  ads (50 cpus)   build (10 cpus)
> {noformat}
> See https://people.eecs.berkeley.edu/~alig/papers/h-drf.pdf for some 
> discussion w.r.t. sharing resources in a hierarchical model.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-8025) Update the master field in the new CLI config to accept a URL instead of an

2017-09-27 Thread Kevin Klues (JIRA)
Kevin Klues created MESOS-8025:
--

 Summary: Update the master field in the new CLI config to accept a 
URL instead of an 
 Key: MESOS-8025
 URL: https://issues.apache.org/jira/browse/MESOS-8025
 Project: Mesos
  Issue Type: Improvement
  Components: cli
 Environment: This will be useful in cases where the master is behind a 
proxy or when the master is sitting directly on port 80.
Reporter: Kevin Klues
Assignee: Armand Grillet






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7985) Use ASF CI for automating RPM packaging and upload to bintray.

2017-09-27 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya updated MESOS-7985:
--
Shepherd: Till Toenshoff
  Sprint: Mesosphere Sprint 64
Story Points: 3
 Summary: Use ASF CI for automating RPM packaging and upload to 
bintray.  (was: Use ASF CI for automating RPM/DEB packaging and upload to 
bintray.)

> Use ASF CI for automating RPM packaging and upload to bintray.
> --
>
> Key: MESOS-7985
> URL: https://issues.apache.org/jira/browse/MESOS-7985
> Project: Mesos
>  Issue Type: Task
>Reporter: Kapil Arya
>Assignee: Kapil Arya
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-8024) Add Mesos CLI command to list agents

2017-09-27 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16182529#comment-16182529
 ] 

Kevin Klues commented on MESOS-8024:


{noformat}
commit 1270703291a132d8d959d71bf99e4dfe4cf4292e
Author: Armand Grillet 
Date:   Wed Sep 27 15:02:04 2017 +0200

Added 'mesos agent list' command to CLI.

This command displays the agents in a cluster by
reaching the slaves endpoint of a master.

Review: https://reviews.apache.org/r/62065/
{noformat}

> Add Mesos CLI command to list agents
> 
>
> Key: MESOS-8024
> URL: https://issues.apache.org/jira/browse/MESOS-8024
> Project: Mesos
>  Issue Type: Task
>Reporter: Armand Grillet
>Assignee: Armand Grillet
>
> We should have a command listing the agents in a Mesos cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-8024) Add Mesos CLI command to list agents

2017-09-27 Thread Armand Grillet (JIRA)
Armand Grillet created MESOS-8024:
-

 Summary: Add Mesos CLI command to list agents
 Key: MESOS-8024
 URL: https://issues.apache.org/jira/browse/MESOS-8024
 Project: Mesos
  Issue Type: Task
Reporter: Armand Grillet
Assignee: Armand Grillet


We should have a command listing the agents in a Mesos cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (MESOS-7985) Use ASF CI for automating RPM/DEB packaging and upload to bintray.

2017-09-27 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya reassigned MESOS-7985:
-

Assignee: Kapil Arya

> Use ASF CI for automating RPM/DEB packaging and upload to bintray.
> --
>
> Key: MESOS-7985
> URL: https://issues.apache.org/jira/browse/MESOS-7985
> Project: Mesos
>  Issue Type: Task
>Reporter: Kapil Arya
>Assignee: Kapil Arya
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7634) OsTest.ChownNoAccess fails on s390x machines

2017-09-27 Thread vandita (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16182495#comment-16182495
 ] 

vandita commented on MESOS-7634:


Hi Vinod,

Could you please trigger a build on the mesos slave VM (148.100.33.168) labeled 
mesos2?

> OsTest.ChownNoAccess fails on s390x machines
> 
>
> Key: MESOS-7634
> URL: https://issues.apache.org/jira/browse/MESOS-7634
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Nayana Thorat
>
> Running a custom branch of Mesos (with some fixes in docker build scripts for 
> s390x) on s390x based CI machines throws the following error when running 
> stout tests.
> {code}
> [ RUN  ] OsTest.ChownNoAccess
> ../../../../3rdparty/stout/tests/os_tests.cpp:839: Failure
> Value of: os::chown(uid.get(), gid.get(), "one", true).isError()
>   Actual: false
> Expected: true
> ../../../../3rdparty/stout/tests/os_tests.cpp:840: Failure
> Value of: os::chown(uid.get(), gid.get(), "one/two", true).isError()
>   Actual: false
> {code}
> One can repro this by building Mesos from my custom branch here: 
> https://github.com/vinodkone/mesos/tree/vinod/s390x



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-8023) Warn users trying to use HTTP Basic Authentication over non-secure channels

2017-09-27 Thread Benno Evers (JIRA)
Benno Evers created MESOS-8023:
--

 Summary: Warn users trying to use HTTP Basic Authentication over 
non-secure channels
 Key: MESOS-8023
 URL: https://issues.apache.org/jira/browse/MESOS-8023
 Project: Mesos
  Issue Type: Improvement
Reporter: Benno Evers


Since HTTP Basic authentication submits usernames and passwords in plain text, 
it should only be used when the connection is already secured through another 
layer, e.g. when using HTTPS.

Since many users are not aware of this fact, Mesos should try to detect and warn 
about this situation where possible, to prevent accidental leaking of passwords.
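
For illustration, the credentials in a Basic {{Authorization}} header are only
base64-encoded, not encrypted, so anyone who can observe the plain-text
connection can recover them (the values below are made up):

{noformat}
Authorization: Basic dXNlcjpwYXNz    <-- base64("user:pass")
{noformat}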



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-8013) Add test for blkio statistics

2017-09-27 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16182081#comment-16182081
 ] 

Qian Zhang commented on MESOS-8013:
---

RR:
https://reviews.apache.org/r/62579/

> Add test for blkio statistics
> -
>
> Key: MESOS-8013
> URL: https://issues.apache.org/jira/browse/MESOS-8013
> Project: Mesos
>  Issue Type: Task
>  Components: cgroups
>Reporter: Qian Zhang
>Assignee: Qian Zhang
>
> In [MESOS-6162|https://issues.apache.org/jira/browse/MESOS-6162], we have 
> added the support for cgroups blkio statistics. In this ticket, we'd like to 
> add a test to verify the cgroups blkio statistics can be correctly retrieved 
> via Mesos containerizer's {{usage()}} method.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)