[jira] [Assigned] (MESOS-8968) Wire `UPDATE_QUOTA` call.

2019-07-10 Thread Meng Zhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-8968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Meng Zhu reassigned MESOS-8968:
---

Assignee: Meng Zhu

> Wire `UPDATE_QUOTA` call.
> -
>
> Key: MESOS-8968
> URL: https://issues.apache.org/jira/browse/MESOS-8968
> Project: Mesos
>  Issue Type: Bug
>Reporter: Meng Zhu
>Assignee: Meng Zhu
>Priority: Major
>  Labels: Quota, allocator, multitenancy
>
> Wire the existing master, auth, registrar, and allocator pieces together to 
> complete the `UPDATE_QUOTA` call.
> This would enable the master capability `QUOTA_V2`.
> This also fixes the "ignoring zero resource quota" bug in the old quota 
> implementation, namely:
> Currently, Mesos discards resource objects with zero scalar values when 
> parsing resources. This means a quota set to zero would be ignored and not 
> enforced. For example, a role with quota set to "cpu:10;mem:10;gpu:0" intends 
> to get no GPUs. Due to the above issue, the allocator can only see the quota 
> as "cpu:10;mem:10", and having no GPU quota means no guarantee and, crucially, 
> no limit. Thus GPUs may still be allocated to this role.
> With the completion of `UPDATE_QUOTA`, which takes a map of resource names to 
> scalar values, zero values will no longer be dropped.
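For illustration, here is a minimal, self-contained C++ sketch of the zero-drop behavior described above (a toy parser, not the real Mesos `Resources::parse`; all names are hypothetical):

{code:java}
#include <iostream>
#include <map>
#include <sstream>
#include <string>

// Toy parser modeled on the behavior described in the ticket:
// entries with zero scalar values are silently dropped.
std::map<std::string, double> parseQuota(const std::string& spec) {
  std::map<std::string, double> quota;
  std::stringstream ss(spec);
  std::string item;
  while (std::getline(ss, item, ';')) {
    const size_t colon = item.find(':');
    const double value = std::stod(item.substr(colon + 1));
    if (value != 0) {  // The bug: "gpu:0" never reaches the allocator.
      quota[item.substr(0, colon)] = value;
    }
  }
  return quota;
}

int main() {
  auto quota = parseQuota("cpu:10;mem:10;gpu:0");
  // Prints only cpu and mem; the gpu:0 entry (meaning "no GPUs") is
  // gone, so the allocator sees neither a guarantee nor a limit for gpu.
  for (const auto& [name, value] : quota) {
    std::cout << name << ":" << value << std::endl;
  }
}
{code}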



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (MESOS-8968) Wire `UPDATE_QUOTA` call.

2019-07-10 Thread Meng Zhu (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882508#comment-16882508
 ] 

Meng Zhu edited comment on MESOS-8968 at 7/10/19 11:54 PM:
---

{noformat}
commit 0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8 (apache/master)
Author: Meng Zhu 
Date:   Fri Jul 5 18:05:59 2019 -0700

Implemented `UPDATE_QUOTA` operator call.

This patch wires up the master, auth, registrar, and allocator
pieces for the `UPDATE_QUOTA` call.

This enables the master capability `QUOTA_V2`. The capability
implies that the quota v2 API is capable of writes (`UPDATE_QUOTA`)
and that the master is capable of recovering from v2 quota
(`QuotaConfig`) in the registry.

This patch lacks the rescind offer logic. When quota limits
and guarantees are configured, it might be necessary to
rescind offers on the fly to satisfy new guarantees or be
constrained by the new limits. A todo is left and will be
tackled in subsequent patches.

Also enabled test `MasterQuotaTest.RecoverQuotaEmptyCluster`.

Review: https://reviews.apache.org/r/71021
{noformat}

{noformat}
commit dcd73437549413790751d1ff127989dbb29bd753 (HEAD -> update_quota, 
apache/master)
Author: Meng Zhu 
Date:   Sun Jul 7 14:27:14 2019 -0700

Added tests for `UPDATE_QUOTA`.

These tests reuse the existing tests for the `SET_QUOTA` and
`REMOVE_QUOTA` calls. In general, an `UPDATE_QUOTA` request
should fail where `SET_QUOTA` fails. Where an existing test
expects a `SET_QUOTA` call to succeed, we test the
`UPDATE_QUOTA` call by first removing the set quota and then
sending the `UPDATE_QUOTA` request.

Review: https://reviews.apache.org/r/71022
{noformat}


was (Author: mzhu):
{noformat}
commit 0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8 (apache/master)
Author: Meng Zhu 
Date:   Fri Jul 5 18:05:59 2019 -0700

Implemented `UPDATE_QUOTA` operator call.

This patch wires up the master, auth, registrar, and allocator
pieces for the `UPDATE_QUOTA` call.

This enables the master capability `QUOTA_V2`. The capability
implies that the quota v2 API is capable of writes (`UPDATE_QUOTA`)
and that the master is capable of recovering from v2 quota
(`QuotaConfig`) in the registry.

This patch lacks the rescind offer logic. When quota limits
and guarantees are configured, it might be necessary to
rescind offers on the fly to satisfy new guarantees or be
constrained by the new limits. A todo is left and will be
tackled in subsequent patches.

Also enabled test `MasterQuotaTest.RecoverQuotaEmptyCluster`.

Review: https://reviews.apache.org/r/71021
{noformat}


> Wire `UPDATE_QUOTA` call.
> -
>
> Key: MESOS-8968
> URL: https://issues.apache.org/jira/browse/MESOS-8968
> Project: Mesos
>  Issue Type: Bug
>Reporter: Meng Zhu
>Priority: Major
>  Labels: Quota, allocator, multitenancy
>
> Wire the existing master, auth, registar, and allocator pieces together to 
> complete the `UPDATE_QUOTA` call.
> This would enable the master capability `QUOTA_V2`.
> This also fixes the "ignoring zero resource quota" bug in the old quota 
> implementation, namely:
> Currently, Mesos discards resource object with zero scalar value when parsing 
> resources. This means quota set to zero would be ignored and not enforced. 
> For example, role with quota set to "cpu:10;mem:10;gpu:0" intends to get no 
> GPU. Due to the above issue, the allocator can only see the quota as 
> "cpu:10;mem:10", and no quota GPU means no guarantee and NO limit. Thus GPUs 
> may still be allocated to this role. 
> With the completion of `UPDATE_QUOTA` which takes a map of name, scalar 
> values, zero value will no longer be dropped.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-9812) Add achievability validation for update quota call.

2019-07-10 Thread Meng Zhu (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882512#comment-16882512
 ] 

Meng Zhu commented on MESOS-9812:
-

This covers the guarantee overcommitment check and the hierarchical guarantees check:

{noformat}
commit 16f0b0c295960e397e56f6d504b8075cb62e6e4f
Author: Meng Zhu 
Date:   Fri Jul 5 15:41:01 2019 -0700

Added overcommit and hierarchical inclusion check for `UPDATE_QUOTA`.

The overcommit check validates that the total quota guarantees in
the cluster are contained by the cluster capacity.

The hierarchical inclusion check validates that the sum of the
children's guarantees is contained by the parent's guarantee.

Further validation is needed for:

- Checking whether a role's limit is less than its current consumption.
- Checking that a role's limit does not exceed its parent's limit.

Review: https://reviews.apache.org/r/71020
{noformat}
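For concreteness, here is a minimal, self-contained C++ sketch of the two checks described in the commit message (toy types, not the actual Mesos quota-validation code; all names here are hypothetical):

{code:java}
#include <map>
#include <string>
#include <vector>

// Toy stand-in for Mesos resource quantities: name -> scalar value.
using Quantities = std::map<std::string, double>;

// True iff every quantity in `inner` fits within `outer`
// (missing entries count as zero).
bool contains(const Quantities& outer, const Quantities& inner) {
  for (const auto& [name, value] : inner) {
    const auto it = outer.find(name);
    if ((it == outer.end() ? 0.0 : it->second) < value) {
      return false;
    }
  }
  return true;
}

Quantities sum(const std::vector<Quantities>& all) {
  Quantities total;
  for (const auto& q : all) {
    for (const auto& [name, value] : q) {
      total[name] += value;
    }
  }
  return total;
}

// Overcommit check: the summed guarantees of all roles must be
// contained by the cluster capacity.
bool overcommitCheck(const Quantities& capacity,
                     const std::vector<Quantities>& guarantees) {
  return contains(capacity, sum(guarantees));
}

// Hierarchical inclusion check: the summed guarantees of a role's
// children must be contained by the parent's guarantee.
bool hierarchicalCheck(const Quantities& parent,
                       const std::vector<Quantities>& children) {
  return contains(parent, sum(children));
}

int main() {
  const Quantities capacity = {{"cpus", 100}, {"mem", 1024}};
  const std::vector<Quantities> guarantees = {
      {{"cpus", 60}, {"mem", 512}}, {{"cpus", 30}}};
  return overcommitCheck(capacity, guarantees) ? 0 : 1;  // Passes: 90 <= 100.
}
{code}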

Leaving the ticket open for now for the remaining checks:
limits < consumption, and the hierarchical limits invariant.

> Add achievability validation for update quota call.
> ---
>
> Key: MESOS-9812
> URL: https://issues.apache.org/jira/browse/MESOS-9812
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Meng Zhu
>Assignee: Meng Zhu
>Priority: Major
>  Labels: resource-management
>
> Add an overcommit check, hierarchical quota validation, and a `force` flag 
> override for the update quota call.
> Right now, we only have validation for each individual quota config. We need 
> to add further validation for the update quota call regarding:
> 1. Check whether the role's resource limits are already breached. To achieve 
> this, we need to first rescind offers until the role's allocated resources 
> are below its limits. If, after all rescinds, allocated resources are still 
> above the requested limits, we will return an error unless the `force` flag 
> is used.
> 2. Check whether the aggregated quota guarantees of all roles exceed the 
> cluster capacity. If so, we will return an error unless the `force` flag is 
> used.
> 3. Hierarchical limits validation:
>   a. Check that a role's limit does not exceed its parent's limit.
>   b. Check that the sum of the children's guarantees does not exceed their 
> parent's guarantee.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-8968) Wire `UPDATE_QUOTA` call.

2019-07-10 Thread Meng Zhu (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882508#comment-16882508
 ] 

Meng Zhu commented on MESOS-8968:
-

{noformat}
commit 0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8 (apache/master)
Author: Meng Zhu 
Date:   Fri Jul 5 18:05:59 2019 -0700

Implemented `UPDATE_QUOTA` operator call.

This patch wires up the master, auth, registrar, and allocator
pieces for the `UPDATE_QUOTA` call.

This enables the master capability `QUOTA_V2`. The capability
implies that the quota v2 API is capable of writes (`UPDATE_QUOTA`)
and that the master is capable of recovering from v2 quota
(`QuotaConfig`) in the registry.

This patch lacks the rescind offer logic. When quota limits
and guarantees are configured, it might be necessary to
rescind offers on the fly to satisfy new guarantees or be
constrained by the new limits. A todo is left and will be
tackled in subsequent patches.

Also enabled test `MasterQuotaTest.RecoverQuotaEmptyCluster`.

Review: https://reviews.apache.org/r/71021
{noformat}


> Wire `UPDATE_QUOTA` call.
> -
>
> Key: MESOS-8968
> URL: https://issues.apache.org/jira/browse/MESOS-8968
> Project: Mesos
>  Issue Type: Bug
>Reporter: Meng Zhu
>Priority: Major
>  Labels: Quota, allocator, multitenancy
>
> Wire the existing master, auth, registrar, and allocator pieces together to 
> complete the `UPDATE_QUOTA` call.
> This would enable the master capability `QUOTA_V2`.
> This also fixes the "ignoring zero resource quota" bug in the old quota 
> implementation, namely:
> Currently, Mesos discards resource objects with zero scalar values when 
> parsing resources. This means a quota set to zero would be ignored and not 
> enforced. For example, a role with quota set to "cpu:10;mem:10;gpu:0" intends 
> to get no GPUs. Due to the above issue, the allocator can only see the quota 
> as "cpu:10;mem:10", and having no GPU quota means no guarantee and, crucially, 
> no limit. Thus GPUs may still be allocated to this role.
> With the completion of `UPDATE_QUOTA`, which takes a map of resource names to 
> scalar values, zero values will no longer be dropped.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-8968) Wire `UPDATE_QUOTA` call.

2019-07-10 Thread Meng Zhu (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882509#comment-16882509
 ] 

Meng Zhu commented on MESOS-8968:
-

Leaving it open for now, until more tests have landed.

> Wire `UPDATE_QUOTA` call.
> -
>
> Key: MESOS-8968
> URL: https://issues.apache.org/jira/browse/MESOS-8968
> Project: Mesos
>  Issue Type: Bug
>Reporter: Meng Zhu
>Priority: Major
>  Labels: Quota, allocator, multitenancy
>
> Wire the existing master, auth, registrar, and allocator pieces together to 
> complete the `UPDATE_QUOTA` call.
> This would enable the master capability `QUOTA_V2`.
> This also fixes the "ignoring zero resource quota" bug in the old quota 
> implementation, namely:
> Currently, Mesos discards resource objects with zero scalar values when 
> parsing resources. This means a quota set to zero would be ignored and not 
> enforced. For example, a role with quota set to "cpu:10;mem:10;gpu:0" intends 
> to get no GPUs. Due to the above issue, the allocator can only see the quota 
> as "cpu:10;mem:10", and having no GPU quota means no guarantee and, crucially, 
> no limit. Thus GPUs may still be allocated to this role.
> With the completion of `UPDATE_QUOTA`, which takes a map of resource names to 
> scalar values, zero values will no longer be dropped.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-8503) Improve UI when displaying frameworks with many roles.

2019-07-10 Thread Benjamin Mahler (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-8503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler reassigned MESOS-8503:
--

Assignee: (was: Armand Grillet)

> Improve UI when displaying frameworks with many roles.
> --
>
> Key: MESOS-8503
> URL: https://issues.apache.org/jira/browse/MESOS-8503
> Project: Mesos
>  Issue Type: Task
>Reporter: Armand Grillet
>Priority: Major
> Attachments: Screen Shot 2018-01-29 à 10.38.05.png
>
>
> The /frameworks UI endpoint displays all the roles of each framework in a 
> table:
> !Screen Shot 2018-01-29 à 10.38.05.png!
> This is not readable if a framework has many roles. We thus need to provide 
> a solution that displays only a few roles per framework and shows more when 
> a user wants to see all of them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (MESOS-9887) Race condition between two terminal task status updates for Docker executor.

2019-07-10 Thread Andrei Budnik (JIRA)
Andrei Budnik created MESOS-9887:


 Summary: Race condition between two terminal task status updates 
for Docker executor.
 Key: MESOS-9887
 URL: https://issues.apache.org/jira/browse/MESOS-9887
 Project: Mesos
  Issue Type: Bug
  Components: agent, containerization
Reporter: Andrei Budnik
 Attachments: race_example.txt

h2. Overview

Expected behavior:
 The task successfully finishes and sends a TASK_FINISHED status update.

Observed behavior:
 The task successfully finishes, but the agent sends TASK_FAILED with the 
reason "REASON_EXECUTOR_TERMINATED".

In normal circumstances, the Docker executor 
[sends|https://github.com/apache/mesos/blob/0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8/src/docker/executor.cpp#L758]
 the final TASK_FINISHED status update, which then [gets 
processed|https://github.com/apache/mesos/blob/0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8/src/slave/slave.cpp#L5543]
 by the agent before the executor's process terminates.

However, if the processing of the initial TASK_FINISHED gets delayed, there is 
a chance that the Docker executor terminates first and the agent 
[triggers|https://github.com/apache/mesos/blob/0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8/src/slave/slave.cpp#L6662]
 TASK_FAILED, which will [be 
handled|https://github.com/apache/mesos/blob/0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8/src/slave/slave.cpp#L5816-L5826]
 prior to the TASK_FINISHED status update.

See the attached logs, which contain an example of the race condition.
h2. Reproducing the bug

1. Add the following code:
{code:java}
  static int c = 0;
  if (++c == 3) { // Skip TASK_STARTING and TASK_RUNNING status updates.
    ::sleep(2);
  }
{code}
to the 
[`ComposingContainerizerProcess::status`|https://github.com/apache/mesos/blob/0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8/src/slave/containerizer/composing.cpp#L578]
 and to the 
[`DockerContainerizerProcess::status`|https://github.com/apache/mesos/blob/0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8/src/slave/containerizer/docker.cpp#L2167].

2. Recompile Mesos.

3. Launch a Mesos master and agent locally.

4. Launch a simple Docker task via `mesos-execute`:
{code}
cd build
./src/mesos-execute --master="`hostname`:5050" --name="a" \
  --containerizer=docker --docker_image=alpine --resources="cpus:1;mem:32" \
  --command="ls"
{code}
h2. Race condition: description

1. The Mesos agent receives the TASK_FINISHED status update and then subscribes on 
[`containerizer->status()`|https://github.com/apache/mesos/blob/0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8/src/slave/slave.cpp#L5754-L5761].

2. The `containerizer->status()` operation for the TASK_FINISHED status update 
gets delayed in the composing containerizer (e.g., due to a switch of the 
worker thread that executes the `status` method).

3. The Docker executor terminates and the agent 
[triggers|https://github.com/apache/mesos/blob/0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8/src/slave/slave.cpp#L6662]
 TASK_FAILED.

4. The Docker containerizer destroys the container. A registered callback for 
the `containerizer->wait` call in the composing containerizer dispatches a 
[lambda 
function|https://github.com/apache/mesos/blob/0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8/src/slave/containerizer/composing.cpp#L368-L373]
 that will clean up the `containers_` map.

5. The composing containerizer resumes and dispatches the 
`[status()|https://github.com/apache/mesos/blob/0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8/src/slave/containerizer/composing.cpp#L579]`
 method to the Docker containerizer for TASK_FINISHED, which in turn hangs for 
a few seconds.

6. The corresponding `containerId` gets removed from the `containers_` map of 
the composing containerizer.

7. The Mesos agent subscribes on 
[`containerizer->status()`|https://github.com/apache/mesos/blob/0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8/src/slave/slave.cpp#L5754-L5761]
 for the TASK_FAILED status update.

8. The composing containerizer returns ["Container not 
found"|https://github.com/apache/mesos/blob/0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8/src/slave/containerizer/composing.cpp#L576]
 for TASK_FAILED.

9. 
`[Slave::_statusUpdate|https://github.com/apache/mesos/blob/0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8/src/slave/slave.cpp#L5826]`
 stores the terminal TASK_FAILED status update in the executor's data structure.

10. The Docker containerizer resumes and finishes processing the `status()` 
method for TASK_FINISHED. Control then returns to the `Slave::_statusUpdate` 
continuation, which 
[discovers|https://github.com/apache/mesos/blob/0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8/src/slave/slave.cpp#L5808-L5814]
 that the executor has already been destroyed.
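To make the ordering hazard in steps 1-10 easier to see, here is a self-contained C++ sketch (toy code, not Mesos internals; the delays and names are made up) in which a stalled continuation for the first terminal update lets the second one land first:

{code:java}
#include <chrono>
#include <future>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

std::mutex m;
std::vector<std::string> stored;  // Order in which terminal updates land.

// Analogue of the Slave::_statusUpdate continuation: the first update to
// finish its (possibly delayed) containerizer->status() step is stored first.
void handleUpdate(const std::string& update, int delayMs) {
  std::this_thread::sleep_for(std::chrono::milliseconds(delayMs));
  std::lock_guard<std::mutex> lock(m);
  stored.push_back(update);
}

int main() {
  // TASK_FINISHED arrives first but its status() continuation stalls...
  auto finished = std::async(std::launch::async, handleUpdate,
                             std::string("TASK_FINISHED"), 200);
  // ...while TASK_FAILED (executor termination) is processed promptly.
  auto failed = std::async(std::launch::async, handleUpdate,
                           std::string("TASK_FAILED"), 0);
  finished.wait();
  failed.wait();
  // Prints TASK_FAILED: the later-triggered update won the race.
  std::cout << "first terminal update stored: " << stored.front() << std::endl;
}
{code}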



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-9618) Display quota consumption in the webui.

2019-07-10 Thread Benjamin Mahler (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler reassigned MESOS-9618:
--

Assignee: Benjamin Mahler

> Display quota consumption in the webui.
> ---
>
> Key: MESOS-9618
> URL: https://issues.apache.org/jira/browse/MESOS-9618
> Project: Mesos
>  Issue Type: Improvement
>  Components: webui
>Reporter: Benjamin Mahler
>Assignee: Benjamin Mahler
>Priority: Major
>  Labels: resource-management
>
> Currently, the Roles table in the webui displays allocation and quota 
> guarantees / limits. However, quota "consumption" is different from 
> allocation, in that reserved resources are always considered consumed against 
> the quota.
> This discrepancy has led to confusion among users. One example occurred when 
> an agent was added with a large reservation exceeding the memory quota 
> guarantee. The user saw memory chopping in offers, and since the scheduler 
> didn't want to use the reservation, it couldn't launch its tasks.
> If consumption is shown in the UI, we should include a tooltip that indicates 
> how consumption is calculated so that users know how to interpret it.
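As a hedged, worked example of the allocation-vs-consumption gap described above (the numbers are made up, not from the ticket):

{code:java}
#include <iostream>

int main() {
  // Hypothetical mem figures (in MB) for one role.
  double allocated = 64;             // Currently allocated to the role.
  double reserved = 256;             // Reserved for the role on some agent.
  double allocatedFromReserved = 64; // Allocation backed by the reservation.

  double allocation = allocated;
  // Consumption counts the whole reservation, allocated or not.
  double consumed = allocated + (reserved - allocatedFromReserved);

  std::cout << "allocation: " << allocation << " MB\n"  // 64
            << "consumed:   " << consumed << " MB\n";   // 256
  // A webui showing only `allocation` hides that the role's mem quota
  // guarantee may already be fully consumed by the reservation.
}
{code}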



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (MESOS-9886) RoleTest.RolesEndpointContainsConsumedQuota is flaky.

2019-07-10 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-9886:
--

 Summary: RoleTest.RolesEndpointContainsConsumedQuota is flaky.
 Key: MESOS-9886
 URL: https://issues.apache.org/jira/browse/MESOS-9886
 Project: Mesos
  Issue Type: Bug
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler


{noformat}
[ RUN  ] RoleTest.RolesEndpointContainsConsumedQuota
I0710 07:05:42.670790  9995 cluster.cpp:176] Creating default 'local' authorizer
I0710 07:05:42.672238   master.cpp:440] Master 8db40cec-43ef-41a1-89a4-4f7b877d8f13 (ip-172-16-10-69.ec2.internal) started on 172.16.10.69:37082
I0710 07:05:42.672256   master.cpp:443] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="hierarchical" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/1d0m6o/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_operator_event_stream_subscribers="1000" --max_unreachable_tasks_per_framework="1000" --memory_profiling="false" --min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --publish_per_framework_metrics="true" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --require_agent_domain="false" --role_sorter="drf" --root_submissions="true" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/1d0m6o/master" --zk_session_timeout="10secs"
I0710 07:05:42.672351   master.cpp:492] Master only allowing authenticated frameworks to register
I0710 07:05:42.672356   master.cpp:498] Master only allowing authenticated agents to register
I0710 07:05:42.672360   master.cpp:504] Master only allowing authenticated HTTP frameworks to register
I0710 07:05:42.672364   credentials.hpp:37] Loading credentials for authentication from '/tmp/1d0m6o/credentials'
I0710 07:05:42.672430   master.cpp:548] Using default 'crammd5' authenticator
I0710 07:05:42.672466   http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
I0710 07:05:42.672508   http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
I0710 07:05:42.672538   http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
I0710 07:05:42.672569   master.cpp:629] Authorization enabled
I0710 07:05:42.672658 10001 hierarchical.cpp:241] Initialized hierarchical allocator process
I0710 07:05:42.672685 10001 whitelist_watcher.cpp:77] No whitelist given
I0710 07:05:42.673316 10001 master.cpp:2150] Elected as the leading master!
I0710 07:05:42.673331 10001 master.cpp:1664] Recovering from registrar
I0710 07:05:42.673616 10001 registrar.cpp:339] Recovering registrar
I0710 07:05:42.673874 10001 registrar.cpp:383] Successfully fetched the registry (0B) in 239104ns
I0710 07:05:42.673923 10001 registrar.cpp:487] Applied 1 operations in 7745ns; attempting to update the registry
I0710 07:05:42.674052   registrar.cpp:544] Successfully updated the registry in 108032ns
I0710 07:05:42.674082   registrar.cpp:416] Successfully recovered registrar
I0710 07:05:42.674152   master.cpp:1799] Recovered 0 agents from the registry (180B); allowing 10mins for agents to reregister
I0710 07:05:42.674185  9996 hierarchical.cpp:280] Skipping recovery of hierarchical allocator: nothing to recover
W0710 07:05:42.676100  9995 process.cpp:2877] Attempted to spawn already running process files@172.16.10.69:37082
I0710 07:05:42.676537  9995 containerizer.cpp:314] Using isolation { environment_secret, posix/cpu, posix/mem, filesystem/posix, network/cni }
I0710 07:05:42.678514  9995 linux_launcher.cpp:144] Using /cgroup/freezer as the freezer hierarchy for the Linux launcher
I0710 07:05:42.678980  9995 provisioner.cpp:298] Using default backend 'copy'
I0710 07:05:42.680043  9995 cluster.cpp:510] Creating default 'local' authorizer
I0710 07:05:42.680832  9998 slave.cpp:265] Mesos agent started on (522)@172.16.10.69:37082
I0710 07:05:42.680850  9998 slave.cpp:266] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://; --a

[jira] [Commented] (MESOS-9849) Add support for per-role REVIVE / SUPPRESS to V0 scheduler driver.

2019-07-10 Thread Andrei Sekretenko (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882214#comment-16882214
 ] 

Andrei Sekretenko commented on MESOS-9849:
--

Exercising this functionality in the Java V0 test framework in the Mesos tests: 
[https://reviews.apache.org/r/71047|https://reviews.apache.org/r/71047] (to 
ensure that the bindings don't fail to pass these parameters through due to a 
simple mistake).

> Add support for per-role REVIVE / SUPPRESS to V0 scheduler driver.
> --
>
> Key: MESOS-9849
> URL: https://issues.apache.org/jira/browse/MESOS-9849
> Project: Mesos
>  Issue Type: Task
>  Components: scheduler driver
>Reporter: Benjamin Mahler
>Assignee: Andrei Sekretenko
>Priority: Major
>  Labels: resource-management
>
> Unfortunately, there are still schedulers that are using the v0 bindings and 
> are unable to move to v1 before they want to use the per-role REVIVE / 
> SUPPRESS calls.
> We'll need to add per-role REVIVE / SUPPRESS to the v0 scheduler driver.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (MESOS-9885) Resource provider configuration is only removed with its container, causing issues in failover scenarios

2019-07-10 Thread Jan Schlicht (JIRA)
Jan Schlicht created MESOS-9885:
---

 Summary: Resource provider configuration is only removed with its 
container, causing issues in failover scenarios
 Key: MESOS-9885
 URL: https://issues.apache.org/jira/browse/MESOS-9885
 Project: Mesos
  Issue Type: Bug
  Components: resource provider
Affects Versions: 1.8.0
Reporter: Jan Schlicht


An agent could crash while it is handling a {{REMOVE_RESOURCE_PROVIDER_CONFIG}} 
call. In that case, the resource provider won't be removed. This is because its 
configuration is only removed once the actual resource provider container has 
been stopped, i.e. in {{LocalResourceProviderDaemonProcess::remove}}, {{os::rm}} 
is only called if {{cleanupContainers}} was successful. After an agent failover, 
the resource provider will still be running. This can be a problem for 
frameworks/operators, because there isn't a feedback channel that informs them 
whether their removal request was successful or not.
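A minimal sketch of the problematic ordering described above (toy C++, not the actual {{LocalResourceProviderDaemonProcess}} code; the names and return types are hypothetical):

{code:java}
#include <cstdio>
#include <string>

// Stub standing in for stopping the resource provider container; the
// real operation can be interrupted if the agent crashes mid-removal.
bool cleanupContainers() { return true; }

// The on-disk config is removed only after container cleanup succeeds,
// so a crash between the two steps leaves the provider configured and
// it comes back after agent failover.
bool removeProvider(const std::string& configPath) {
  if (!cleanupContainers()) {
    return false;  // Config survives: the provider restarts on failover.
  }
  // Analogue of the os::rm call gated on cleanup success.
  return std::remove(configPath.c_str()) == 0;
}

int main() {
  removeProvider("/tmp/resource_provider_config.json");
}
{code}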



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)