[jira] [Assigned] (MESOS-7170) Allow for custom filters on Mesos APIs

2017-05-24 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann reassigned MESOS-7170:


Assignee: (was: Gilbert Song)

> Allow for custom filters on Mesos APIs 
> ---
>
> Key: MESOS-7170
> URL: https://issues.apache.org/jira/browse/MESOS-7170
> Project: Mesos
>  Issue Type: Improvement
>  Components: HTTP API
>Reporter: Miguel Bernadin
>Priority: Minor
>
> For the tasks.json API and others like state.json, the data that the Mesos
> master sends on larger clusters is quite lengthy. It would be good to provide
> filters in the API so that Mesos sends only the RUNNING tasks in the cluster
> and does less work. Creating this JIRA so we can have intelligent filters that
> pick what data to send on the server side, rather than filtering it out on
> the client side.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-7564) Introduce a heartbeat mechanism for executor <-> agent communication.

2017-05-24 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7564:
--
Target Version/s: 1.4.0

> Introduce a heartbeat mechanism for executor <-> agent communication.
> -
>
> Key: MESOS-7564
> URL: https://issues.apache.org/jira/browse/MESOS-7564
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>
> Currently, we do not have heartbeats for executor <-> agent communication.
> This is especially problematic in scenarios where IPFilters are enabled, since
> the default conntrack timeout for established TCP connections is 5 days. When
> that timeout elapses, the executor doesn't get notified via a socket
> disconnection when the agent process restarts. The executor would then get
> killed if it doesn't re-register when the agent recovery process is completed.
> Enabling application-level heartbeats or TCP keepalives is one possible
> way to fix this issue.
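A minimal sketch of the application-level heartbeat option, written in Python for brevity rather than in Mesos' C++. It assumes a hypothetical `HeartbeatMonitor` owned by whichever side needs to notice a dead peer (for example, an executor watching for heartbeats from its agent); the interval and the number of tolerated misses are illustrative values, not Mesos settings.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatMonitor:
    """Hypothetical tracker for heartbeats received from remote peers.

    Times are plain floats in seconds so the logic can be exercised
    without a real clock; callers would normally pass time.monotonic().
    """

    interval: float          # expected seconds between heartbeats
    max_missed: int = 3      # missed intervals tolerated before giving up
    last_seen: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, peer_id: str, now: float) -> None:
        # Record the most recent heartbeat received from a peer.
        self.last_seen[peer_id] = now

    def is_alive(self, peer_id: str, now: float) -> bool:
        # A peer we have never heard from is not considered alive.
        seen = self.last_seen.get(peer_id)
        if seen is None:
            return False
        return now - seen <= self.interval * self.max_missed

    def expired(self, now: float) -> List[str]:
        # Peers whose silence exceeds the allowed number of intervals.
        # These would be treated as disconnected (and reconnected to)
        # instead of waiting for a socket error that may never arrive.
        return sorted(p for p in self.last_seen if not self.is_alive(p, now))
```

Because the decision is driven by missing heartbeats rather than by a socket error, it still works when a firewall or an expired conntrack entry silently drops the packets that would have signalled the disconnect.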





[jira] [Updated] (MESOS-7563) Make the HTTP command executor the default implementation.

2017-05-24 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7563:
--
Target Version/s: 1.4.0

> Make the HTTP command executor the default implementation.
> --
>
> Key: MESOS-7563
> URL: https://issues.apache.org/jira/browse/MESOS-7563
> Project: Mesos
>  Issue Type: Epic
>Reporter: Anand Mazumdar
>
> This epic tracks the work needed to make HTTP command executors the default,
> i.e., to enable the {{http_command_executor}} flag by default. Currently, all
> command executors use the old executor driver implementation. With this flag
> always enabled, command executors would use the v1 HTTP API.





[jira] [Created] (MESOS-7564) Introduce a heartbeat mechanism for executor <-> agent communication.

2017-05-24 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-7564:
-

 Summary: Introduce a heartbeat mechanism for executor <-> agent 
communication.
 Key: MESOS-7564
 URL: https://issues.apache.org/jira/browse/MESOS-7564
 Project: Mesos
  Issue Type: Task
Reporter: Anand Mazumdar


Currently, we do not have heartbeats for executor <-> agent communication. This
is especially problematic in scenarios where IPFilters are enabled, since the
default conntrack timeout for established TCP connections is 5 days. When that
timeout elapses, the executor doesn't get notified via a socket disconnection
when the agent process restarts. The executor would then get killed if it
doesn't re-register when the agent recovery process is completed.

Enabling application-level heartbeats or TCP keepalives is one possible way to
fix this issue.
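The TCP keepalive alternative can be sketched with standard socket options. This is a Python illustration, not how libprocess configures its sockets; the idle, interval, and count values are assumptions chosen to be far below the 5-day timeout mentioned above. Keepalive probes put traffic on an otherwise idle connection, so a peer that has vanished is detected after roughly idle + interval * count seconds.

```python
import socket


def enable_tcp_keepalive(sock: socket.socket, idle: int = 60,
                         interval: int = 10, count: int = 5) -> None:
    """Turn on TCP keepalive probes for a socket.

    idle:     seconds of inactivity before the first probe is sent
    interval: seconds between unanswered probes
    count:    unanswered probes before the kernel drops the connection
    The per-connection tuning options are platform specific, so each one
    is applied only if this Python build exposes it.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):          # Linux name
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    elif hasattr(socket, "TCP_KEEPALIVE"):       # macOS equivalent
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```

`TCP_KEEPIDLE` is the Linux name for the idle threshold; macOS exposes the same setting as `TCP_KEEPALIVE`, which is why the sketch checks for both.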





[jira] [Assigned] (MESOS-7562) MasterTest.IgnoreOldAgentReregistration is flaky

2017-05-24 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway reassigned MESOS-7562:
--

Assignee: Neil Conway

> MasterTest.IgnoreOldAgentReregistration is flaky
> 
>
> Key: MESOS-7562
> URL: https://issues.apache.org/jira/browse/MESOS-7562
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>Assignee: Neil Conway
>
> {noformat}
> [ RUN  ] MasterTest.IgnoreOldAgentReregistration
> I0524 16:29:07.143152 29236 cluster.cpp:162] Creating default 'local' 
> authorizer
> I0524 16:29:07.149690 29287 master.cpp:436] Master 
> 3912ae61-36a4-468c-bef5-82f082370f3d (core-dev) started on 10.0.49.2:42980
> I0524 16:29:07.149724 29287 master.cpp:438] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/gg4ie7/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/gg4ie7/master" 
> --zk_session_timeout="10secs"
> I0524 16:29:07.149896 29287 master.cpp:488] Master only allowing 
> authenticated frameworks to register
> I0524 16:29:07.149905 29287 master.cpp:502] Master only allowing 
> authenticated agents to register
> I0524 16:29:07.149912 29287 master.cpp:515] Master only allowing 
> authenticated HTTP frameworks to register
> I0524 16:29:07.149920 29287 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/gg4ie7/credentials'
> I0524 16:29:07.150065 29287 master.cpp:560] Using default 'crammd5' 
> authenticator
> I0524 16:29:07.150133 29287 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0524 16:29:07.150168 29287 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0524 16:29:07.150223 29287 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0524 16:29:07.150259 29287 master.cpp:640] Authorization enabled
> I0524 16:29:07.151617 29274 master.cpp:2161] Elected as the leading master!
> I0524 16:29:07.151644 29274 master.cpp:1700] Recovering from registrar
> I0524 16:29:07.152218 29261 registrar.cpp:389] Successfully fetched the 
> registry (0B) in 505088ns
> I0524 16:29:07.152268 29261 registrar.cpp:493] Applied 1 operations in 
> 4200ns; attempting to update the registry
> I0524 16:29:07.152664 29261 registrar.cpp:550] Successfully updated the 
> registry in 371200ns
> I0524 16:29:07.152703 29261 registrar.cpp:422] Successfully recovered 
> registrar
> I0524 16:29:07.153328 29291 master.cpp:1799] Recovered 0 agents from the 
> registry (119B); allowing 10mins for agents to re-register
> I0524 16:29:07.160094 29236 containerizer.cpp:230] Using isolation: 
> posix/cpu,posix/mem,filesystem/posix,network/cni,environment_secret
> W0524 16:29:07.160295 29236 backend.cpp:76] Failed to create 'overlay' 
> backend: OverlayBackend requires root privileges
> W0524 16:29:07.160326 29236 backend.cpp:76] Failed to create 'bind' backend: 
> BindBackend requires root privileges
> I0524 16:29:07.160334 29236 provisioner.cpp:255] Using default backend 'copy'
> I0524 16:29:07.161916 29236 cluster.cpp:448] Creating default 'local' 
> authorizer
> I0524 16:29:07.162616 29276 slave.cpp:225] Mesos agent started on 
> (7738)@10.0.49.2:42980
> I0524 16:29:07.162644 29276 slave.cpp:226] Flags at startup: --acls="" 
> --appc_simple_discovery_uri_prefix="http://"
> --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticatee="crammd5" 
> --authentication_backoff_factor="1secs" --authorizer="local" 
> --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
> --cgroups_hierarchy="/sys/fs/cgroup" 

[jira] [Created] (MESOS-7563) Make the HTTP command executor the default implementation.

2017-05-24 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-7563:
-

 Summary: Make the HTTP command executor the default implementation.
 Key: MESOS-7563
 URL: https://issues.apache.org/jira/browse/MESOS-7563
 Project: Mesos
  Issue Type: Epic
Reporter: Anand Mazumdar


This epic tracks the work needed to make HTTP command executors the default,
i.e., to enable the {{http_command_executor}} flag by default. Currently, all
command executors use the old executor driver implementation. With this flag
always enabled, command executors would use the v1 HTTP API.





[jira] [Created] (MESOS-7562) MasterTest.IgnoreOldAgentReregistration is flaky

2017-05-24 Thread Neil Conway (JIRA)
Neil Conway created MESOS-7562:
--

 Summary: MasterTest.IgnoreOldAgentReregistration is flaky
 Key: MESOS-7562
 URL: https://issues.apache.org/jira/browse/MESOS-7562
 Project: Mesos
  Issue Type: Bug
Reporter: Neil Conway


{noformat}
[ RUN  ] MasterTest.IgnoreOldAgentReregistration
I0524 16:29:07.143152 29236 cluster.cpp:162] Creating default 'local' authorizer
I0524 16:29:07.149690 29287 master.cpp:436] Master 
3912ae61-36a4-468c-bef5-82f082370f3d (core-dev) started on 10.0.49.2:42980
I0524 16:29:07.149724 29287 master.cpp:438] Flags at startup: --acls="" 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticators="crammd5" 
--authorizers="local" --credentials="/tmp/gg4ie7/credentials" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--http_authenticators="basic" --http_framework_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--root_submissions="true" --user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/gg4ie7/master" 
--zk_session_timeout="10secs"
I0524 16:29:07.149896 29287 master.cpp:488] Master only allowing authenticated 
frameworks to register
I0524 16:29:07.149905 29287 master.cpp:502] Master only allowing authenticated 
agents to register
I0524 16:29:07.149912 29287 master.cpp:515] Master only allowing authenticated 
HTTP frameworks to register
I0524 16:29:07.149920 29287 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/gg4ie7/credentials'
I0524 16:29:07.150065 29287 master.cpp:560] Using default 'crammd5' 
authenticator
I0524 16:29:07.150133 29287 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I0524 16:29:07.150168 29287 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I0524 16:29:07.150223 29287 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I0524 16:29:07.150259 29287 master.cpp:640] Authorization enabled
I0524 16:29:07.151617 29274 master.cpp:2161] Elected as the leading master!
I0524 16:29:07.151644 29274 master.cpp:1700] Recovering from registrar
I0524 16:29:07.152218 29261 registrar.cpp:389] Successfully fetched the 
registry (0B) in 505088ns
I0524 16:29:07.152268 29261 registrar.cpp:493] Applied 1 operations in 4200ns; 
attempting to update the registry
I0524 16:29:07.152664 29261 registrar.cpp:550] Successfully updated the 
registry in 371200ns
I0524 16:29:07.152703 29261 registrar.cpp:422] Successfully recovered registrar
I0524 16:29:07.153328 29291 master.cpp:1799] Recovered 0 agents from the 
registry (119B); allowing 10mins for agents to re-register
I0524 16:29:07.160094 29236 containerizer.cpp:230] Using isolation: 
posix/cpu,posix/mem,filesystem/posix,network/cni,environment_secret
W0524 16:29:07.160295 29236 backend.cpp:76] Failed to create 'overlay' backend: 
OverlayBackend requires root privileges
W0524 16:29:07.160326 29236 backend.cpp:76] Failed to create 'bind' backend: 
BindBackend requires root privileges
I0524 16:29:07.160334 29236 provisioner.cpp:255] Using default backend 'copy'
I0524 16:29:07.161916 29236 cluster.cpp:448] Creating default 'local' authorizer
I0524 16:29:07.162616 29276 slave.cpp:225] Mesos agent started on 
(7738)@10.0.49.2:42980
I0524 16:29:07.162644 29276 slave.cpp:226] Flags at startup: --acls="" 
--appc_simple_discovery_uri_prefix="http://"
--appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticatee="crammd5" 
--authentication_backoff_factor="1secs" --authorizer="local" 
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
--cgroups_root="mesos" --container_disk_watch_interval="15secs" 
--containerizers="mesos" 
--credential="/tmp/MasterTest_IgnoreOldAgentReregistration_WX8CZz/credential" 
--default_role="*" --disk_watch_interval="1mins" --docker="docker" 
--docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io"
--docker_remove_delay="6hrs" 

[jira] [Commented] (MESOS-7476) Restrict capabilities to only the bounding set.

2017-05-24 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023909#comment-16023909
 ] 

James Peach commented on MESOS-7476:


| [https://reviews.apache.org/r/59547/|https://reviews.apache.org/r/59547/] | Rename ContainerLaunchInfo `capabilities` field. |
| [https://reviews.apache.org/r/59548/|https://reviews.apache.org/r/59548/] | Add a `bounding_capabilities` field to ContainerLaunchInfo. |
| [https://reviews.apache.org/r/59549/|https://reviews.apache.org/r/59549/] | Add the agent --bounding_capabilities flag. |
| [https://reviews.apache.org/r/59550/|https://reviews.apache.org/r/59550/] | Check bounding capabilities at isolator creation time |
| [https://reviews.apache.org/r/59551/|https://reviews.apache.org/r/59551/] | Change launcher working directory before dropping privilege. |
| [https://reviews.apache.org/r/59552/|https://reviews.apache.org/r/59552/] | Add support for explicitly setting bounding capabilities. |



> Restrict capabilities to only the bounding set.
> ---
>
> Key: MESOS-7476
> URL: https://issues.apache.org/jira/browse/MESOS-7476
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: James Peach
>Assignee: James Peach
>
> As a security improvement, it would be useful to be able to set the bounding 
> capability set without also granting those capabilities. This is what the 
> {{--allowed_capabilities}} flag sounds like it does.
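On Linux, the bounding and effective capability sets of a process are exposed as hex masks (`CapBnd`, `CapEff`) in `/proc/<pid>/status`, where bit N corresponds to capability number N. The small Python sketch below assumes that file format. It decodes those masks and lists the capabilities that the bounding set still allows but that are not currently granted, which is the distinction this issue wants to control. It models only the kernel's view, not the proposed Mesos flags.

```python
from typing import Dict, List


def decode_mask(hex_mask: str) -> List[int]:
    """Return the capability numbers whose bits are set in a hex mask,
    e.g. the CapBnd or CapEff field of /proc/<pid>/status."""
    value = int(hex_mask, 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def read_status_masks(path: str = "/proc/self/status") -> Dict[str, str]:
    """Collect the Cap* lines (CapInh, CapPrm, CapEff, CapBnd, CapAmb)."""
    masks = {}
    with open(path) as f:
        for line in f:
            key, _, rest = line.partition(":")
            if key.startswith("Cap"):
                masks[key] = rest.strip()
    return masks


def bounded_but_not_granted(bnd: str, eff: str) -> List[int]:
    # Capabilities the bounding set still permits but that are not active.
    granted = set(decode_mask(eff))
    return [cap for cap in decode_mask(bnd) if cap not in granted]
```

On a Linux host, `bounded_but_not_granted(m["CapBnd"], m["CapEff"])` with `m = read_status_masks()` shows what the current process could still acquire but does not hold.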





[jira] [Commented] (MESOS-7170) Allow for custom filters on Mesos APIs

2017-05-24 Thread Eric Chung (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023906#comment-16023906
 ] 

Eric Chung commented on MESOS-7170:
---

We're having the same issue: our production cluster typically runs 30k+ tasks,
which makes downloading the entire list of tasks too slow for a productive user
experience and also creates unnecessary load on the master.

It would be awesome if we could filter tasks by label, which can be customized 
by the user, so we could do something like:
`curl /tasks?=&=`

We could of course also do something fancier, like using a resource query
language, but that is up for debate.
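A minimal sketch of the server-side filtering requested here, under stated assumptions: tasks are modeled as plain dicts with a `state` string and a flat `labels` mapping (the real `/tasks` JSON is shaped differently), and the query parameters `state` and `label` are hypothetical, not an existing Mesos API. Multiple states are OR-ed; every supplied label must match.

```python
from typing import Dict, List
from urllib.parse import parse_qs


def filter_tasks(tasks: List[Dict], query: str) -> List[Dict]:
    """Filter tasks by hypothetical ?state=...&label=key:value parameters.

    Multiple `state` values are OR-ed; multiple `label` values are AND-ed.
    """
    params = parse_qs(query)
    states = set(params.get("state", []))
    wanted = []
    for item in params.get("label", []):
        key, sep, value = item.partition(":")
        if sep:  # ignore malformed label filters without a ':'
            wanted.append((key, value))

    result = []
    for task in tasks:
        if states and task.get("state") not in states:
            continue
        labels = task.get("labels", {})
        if all(labels.get(k) == v for k, v in wanted):
            result.append(task)
    return result
```

To actually reduce work on the master, a filter like this would have to run before the response is serialized; filtering an already-built response would only save bandwidth.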

> Allow for custom filters on Mesos APIs 
> ---
>
> Key: MESOS-7170
> URL: https://issues.apache.org/jira/browse/MESOS-7170
> Project: Mesos
>  Issue Type: Improvement
>  Components: HTTP API
>Reporter: Miguel Bernadin
>Assignee: Gilbert Song
>Priority: Minor
>
> For the tasks.json API and others like state.json, the data that the Mesos
> master sends on larger clusters is quite lengthy. It would be good to provide
> filters in the API so that Mesos sends only the RUNNING tasks in the cluster
> and does less work. Creating this JIRA so we can have intelligent filters that
> pick what data to send on the server side, rather than filtering it out on
> the client side.





[jira] [Issue Comment Deleted] (MESOS-7476) Restrict capabilities to only the bounding set.

2017-05-24 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach updated MESOS-7476:
---
Comment: was deleted

(was: | [https://reviews.apache.org/r/59183/|https://reviews.apache.org/r/59183/] | Refactor setting capabilities into a helper function. |
| [https://reviews.apache.org/r/59184/|https://reviews.apache.org/r/59184/] | Add support for explicitly setting bounding capabilities. |)

> Restrict capabilities to only the bounding set.
> ---
>
> Key: MESOS-7476
> URL: https://issues.apache.org/jira/browse/MESOS-7476
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: James Peach
>Assignee: James Peach
>
> As a security improvement, it would be useful to be able to set the bounding 
> capability set without also granting those capabilities. This is what the 
> {{--allowed_capabilities}} flag sounds like it does.





[jira] [Comment Edited] (MESOS-7477) Support ambient capabilities.

2017-05-24 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006761#comment-16006761
 ] 

James Peach edited comment on MESOS-7477 at 5/24/17 11:52 PM:
--

| [https://reviews.apache.org/r/59185/|https://reviews.apache.org/r/59185/] | Add ambient capability support. |
| [https://reviews.apache.org/r/59553/|https://reviews.apache.org/r/59553/] | Add ambient capabilities to launched tasks. |
| [https://reviews.apache.org/r/59554/|https://reviews.apache.org/r/59554/] | Rename the `\-\-allowed_capabilities` flag to `\-\-effective_capabilities`. |
| [https://reviews.apache.org/r/59186/|https://reviews.apache.org/r/59186/] | Additional linux/capabilities isolator documentation. |




was (Author: jamespeach):
| [https://reviews.apache.org/r/59185/|https://reviews.apache.org/r/59185/] | Add ambient capability support. |
| [https://reviews.apache.org/r/59186/|https://reviews.apache.org/r/59186/] | Additional linux/capabilities isolator documentation. |

> Support ambient capabilities.
> -
>
> Key: MESOS-7477
> URL: https://issues.apache.org/jira/browse/MESOS-7477
> Project: Mesos
>  Issue Type: Improvement
>Reporter: James Peach
>Assignee: James Peach
>
> Add support for ambient capabilities so that capabilities granted in the 
> {{LaunchTask}} message can be made active in the task without the requirement 
> for matching file-based capabilities.
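Ambient capabilities remove the need for file capabilities because of the execve() transformation rules documented in the Linux capabilities(7) man page. The Python sketch below models those rules with sets of capability numbers. It ignores the special cases for set-user-ID-root programs and securebits, and it is a model of the kernel, not Mesos code. Note that the kernel only admits a capability into the ambient set if it is both permitted and inheritable.

```python
from dataclasses import dataclass
from typing import FrozenSet

Caps = FrozenSet[int]


@dataclass(frozen=True)
class ProcCaps:
    inheritable: Caps
    permitted: Caps
    effective: Caps
    bounding: Caps
    ambient: Caps


@dataclass(frozen=True)
class FileCaps:
    inheritable: Caps = frozenset()
    permitted: Caps = frozenset()
    effective: bool = False   # the file "effective" bit is a single flag
    setid: bool = False       # set-user-ID / set-group-ID program

    def privileged(self) -> bool:
        # capabilities(7): a file is "privileged" if it has file
        # capabilities or is set-user-ID/set-group-ID.
        return bool(self.inheritable or self.permitted or self.effective
                    or self.setid)


def execve(p: ProcCaps, f: FileCaps) -> ProcCaps:
    """Capability sets after execve(), per the capabilities(7) rules."""
    ambient = frozenset() if f.privileged() else p.ambient
    permitted = ((p.inheritable & f.inheritable)
                 | (f.permitted & p.bounding)
                 | ambient)
    effective = permitted if f.effective else ambient
    # Inheritable and bounding sets are carried over unchanged.
    return ProcCaps(p.inheritable, permitted, effective, p.bounding, ambient)
```

With a non-empty ambient set, an ordinary binary with no file capabilities keeps the capability after exec; without it, the same binary ends up with nothing effective. Executing a set-user-ID/set-group-ID or file-capability binary clears the ambient set.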





[jira] [Updated] (MESOS-7521) Major performance regression in DRF sorter.

2017-05-24 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-7521:
---
Summary: Major performance regression in DRF sorter.  (was: Major 
performance regression in drf sorter)

> Major performance regression in DRF sorter.
> ---
>
> Key: MESOS-7521
> URL: https://issues.apache.org/jira/browse/MESOS-7521
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Affects Versions: 1.3.0
>Reporter: Dario Rexin
>Assignee: Neil Conway
>Priority: Blocker
>  Labels: perfomance
>
> The addition of hierarchical roles to the framework sorter
> (https://github.com/apache/mesos/commit/e5ef1992b2b8e84b5d1487f1578f18f2291cd082)
> has introduced a major performance regression in 1.3. Suppressing offers for
> frameworks no longer seems to reduce allocation time, as it did in 1.2. Here
> are some relevant benchmark results:
> Mesos 1.2:
> {noformat}
> [ RUN  ] 
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.SuppressOffers/7
> Using 1000 agents and 6000 frameworks
> Added 6000 frameworks in 105957us
> Added 1000 agents in 34.937438secs
> allocate() took 27.408828secs to make 1000 offers with 1200 out of 6000 
> frameworks suppressing offers
> allocate() took 20.121897secs to make 1000 offers with 2400 out of 6000 
> frameworks suppressing offers
> allocate() took 12.964302secs to make 1000 offers with 3600 out of 6000 
> frameworks suppressing offers
> allocate() took 6.534221secs to make 1000 offers with 4800 out of 6000 
> frameworks suppressing offers
> allocate() took 8953us to make 0 offers with 6000 out of 6000 frameworks 
> suppressing offers
> [   OK ] 
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.SuppressOffers/7 
> (106198 ms)
> {noformat}
> Mesos 1.3:
> {noformat}
> [ RUN  ] 
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.SuppressOffers/7
> Using 1000 agents and 6000 frameworks
> Added 6000 frameworks in 1.036217secs
> Added 1000 agents in 10.093938secs
> allocate() took 10.629448secs to make 1000 offers with 1200 out of 6000 
> frameworks suppressing offers
> allocate() took 11.607185secs to make 1000 offers with 2400 out of 6000 
> frameworks suppressing offers
> allocate() took 12.896578secs to make 1000 offers with 3600 out of 6000 
> frameworks suppressing offers
> allocate() took 14.162431secs to make 1000 offers with 4800 out of 6000 
> frameworks suppressing offers
> allocate() took 257060us to make 0 offers with 6000 out of 6000 frameworks 
> suppressing offers
> [   OK ] 
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.SuppressOffers/7 
> (64011 ms)
> {noformat}
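Dividing the Mesos 1.3 allocate() time by the Mesos 1.2 time at each suppression level, using only the numbers from the two benchmark runs above, makes the trend explicit: 1.2 gets faster as more frameworks suppress offers, while 1.3 gets slower. The dictionary layout is just a convenient way to hold those reported values.

```python
# allocate() timings copied from the benchmark output in this issue.
# Keys: number of frameworks (out of 6000) suppressing offers.
# Values: seconds; the 6000 case was reported in microseconds.
MESOS_1_2 = {1200: 27.408828, 2400: 20.121897, 3600: 12.964302,
             4800: 6.534221, 6000: 8953e-6}
MESOS_1_3 = {1200: 10.629448, 2400: 11.607185, 3600: 12.896578,
             4800: 14.162431, 6000: 257060e-6}


def slowdown(old: dict, new: dict) -> dict:
    """Ratio new/old per suppression level (>1 means slower in new)."""
    return {k: new[k] / old[k] for k in sorted(old)}


if __name__ == "__main__":
    for suppressed, ratio in slowdown(MESOS_1_2, MESOS_1_3).items():
        print(f"{suppressed:>4} suppressed: 1.3 takes {ratio:.2f}x the 1.2 time")
```

The ratios come out to roughly 0.39, 0.58, 0.99, 2.17 and 28.7: 1.3 is faster when few frameworks suppress offers, but the benefit of suppression disappears, and the fully suppressed case is about 29 times slower.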





[jira] [Updated] (MESOS-7552) MasterAllocatorTest/0.FrameworkExited is flaky

2017-05-24 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-7552:
---
Shepherd: Anand Mazumdar

> MasterAllocatorTest/0.FrameworkExited is flaky
> --
>
> Key: MESOS-7552
> URL: https://issues.apache.org/jira/browse/MESOS-7552
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere
>
> {noformat}
> [ RUN  ] MasterAllocatorTest/0.FrameworkExited
> I0523 19:43:15.274132 29720 cluster.cpp:162] Creating default 'local' 
> authorizer
> I0523 19:43:15.280047 29758 master.cpp:436] Master 
> a2abf627-97d2-4603-bda2-301f78203413 (core-dev) started on 10.0.49.2:33691
> I0523 19:43:15.280078 29758 master.cpp:438] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="50ms" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/GdDJ5A/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/GdDJ5A/master" 
> --zk_session_timeout="10secs"
> I0523 19:43:15.280259 29758 master.cpp:488] Master only allowing 
> authenticated frameworks to register
> I0523 19:43:15.280269 29758 master.cpp:502] Master only allowing 
> authenticated agents to register
> I0523 19:43:15.280297 29758 master.cpp:515] Master only allowing 
> authenticated HTTP frameworks to register
> I0523 19:43:15.280305 29758 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/GdDJ5A/credentials'
> I0523 19:43:15.280433 29758 master.cpp:560] Using default 'crammd5' 
> authenticator
> I0523 19:43:15.280496 29758 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0523 19:43:15.280544 29758 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0523 19:43:15.280743 29758 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0523 19:43:15.280772 29758 master.cpp:640] Authorization enabled
> I0523 19:43:15.281690 29774 master.cpp:2161] Elected as the leading master!
> I0523 19:43:15.281720 29774 master.cpp:1700] Recovering from registrar
> I0523 19:43:15.281911 29768 registrar.cpp:389] Successfully fetched the 
> registry (0B) in 120320ns
> I0523 19:43:15.281942 29768 registrar.cpp:493] Applied 1 operations in 
> 2995ns; attempting to update the registry
> I0523 19:43:15.282146 29768 registrar.cpp:550] Successfully updated the 
> registry in 192us
> I0523 19:43:15.282207 29768 registrar.cpp:422] Successfully recovered 
> registrar
> I0523 19:43:15.282466 29779 master.cpp:1799] Recovered 0 agents from the 
> registry (119B); allowing 10mins for agents to re-register
> I0523 19:43:15.289202 29720 cluster.cpp:448] Creating default 'local' 
> authorizer
> I0523 19:43:15.289670 29758 slave.cpp:225] Mesos agent started on 
> (50)@10.0.49.2:33691
> I0523 19:43:15.289695 29758 slave.cpp:226] Flags at startup: --acls="" 
> --appc_simple_discovery_uri_prefix="http://"
> --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticatee="crammd5" 
> --authentication_backoff_factor="1secs" --authorizer="local" 
> --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
> --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
> --cgroups_root="mesos" --container_disk_watch_interval="15secs" 
> --containerizers="mesos" 
> --credential="/tmp/MasterAllocatorTest_0_FrameworkExited_OPmret/credential" 
> --default_role="*" --disk_watch_interval="1mins" --docker="docker" 
> --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io"
> --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" 
> --docker_stop_timeout="0ns" 

[jira] [Created] (MESOS-7561) Add storage resource provider specific information in ResourceProviderInfo.

2017-05-24 Thread Jie Yu (JIRA)
Jie Yu created MESOS-7561:
-

 Summary: Add storage resource provider specific information in 
ResourceProviderInfo.
 Key: MESOS-7561
 URL: https://issues.apache.org/jira/browse/MESOS-7561
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu


For the storage resource provider, there will be some provider-specific
configuration information. The most important piece is the `ContainerConfig`
of the CSI Plugin container.

That config information will be sent to the corresponding agent that will use
the resources provided by the resource provider. For the storage resource
provider in particular, the agent needs to launch the CSI Node Plugin to mount
the volumes.

As an alternative to adding first-class storage resource provider information,
we could add a generic labels field in ResourceProviderInfo and let the
resource provider itself figure out the format of the labels. However, I
believe a first-class solution is better and clearer.
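The trade-off in the last paragraph can be sketched with Python dataclasses. All class, field, and label names here (`StorageInfo`, `plugin`, `csi.image`, and so on) are hypothetical stand-ins, not the actual protobuf messages. The point is that a first-class field is typed and checked in one place, while a generic labels map pushes key conventions, parsing, and error handling into every consumer.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ContainerConfig:
    # Hypothetical, heavily simplified config for the CSI plugin container.
    image: str
    command: str


@dataclass
class StorageInfo:
    # First-class, storage-specific information (hypothetical shape).
    plugin: ContainerConfig


@dataclass
class ResourceProviderInfo:
    type: str
    name: str
    storage: Optional[StorageInfo] = None                  # first-class option
    labels: Dict[str, str] = field(default_factory=dict)   # generic option


def plugin_config(info: ResourceProviderInfo) -> ContainerConfig:
    """What an agent would need before launching the CSI Node Plugin."""
    if info.storage is not None:
        return info.storage.plugin  # already typed, nothing to parse
    # Labels alternative: every consumer must agree on keys and parse them.
    try:
        return ContainerConfig(image=info.labels["csi.image"],
                               command=info.labels["csi.command"])
    except KeyError as missing:
        raise ValueError(f"missing label {missing}") from None
```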





[jira] [Created] (MESOS-7560) Add 'type' and 'name' to ResourceProviderInfo.

2017-05-24 Thread Jie Yu (JIRA)
Jie Yu created MESOS-7560:
-

 Summary: Add 'type' and 'name' to ResourceProviderInfo.
 Key: MESOS-7560
 URL: https://issues.apache.org/jira/browse/MESOS-7560
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu


The 'type' field will be used to load the corresponding implementation (either
internal or via a module). To avoid conflicts, the naming should follow the Java
package naming scheme (e.g., org.apache.mesos.resource_provider.local.storage).

Since there could be multiple instances of the same resource provider type,
it's important to also add a 'name' field to distinguish between instances of
the same type.
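A minimal sketch of how 'type' and 'name' could work together, assuming a hypothetical `ProviderRegistry` keyed by the (type, name) pair. The type check is a deliberately strict, hypothetical policy (lowercase dot-separated identifiers, at least two segments); real Java package names permit more.

```python
import re
from typing import Dict, List, Tuple

# Dot-separated lowercase identifiers in the style of Java package names,
# e.g. org.apache.mesos.resource_provider.local.storage.
_TYPE_RE = re.compile(r"[a-z_][a-z0-9_]*(?:\.[a-z_][a-z0-9_]*)+")


class ProviderRegistry:
    """Hypothetical registry of resource provider instances."""

    def __init__(self) -> None:
        self._providers: Dict[Tuple[str, str], object] = {}

    def add(self, rp_type: str, name: str, impl: object) -> None:
        if not _TYPE_RE.fullmatch(rp_type):
            raise ValueError(f"type {rp_type!r} is not a dotted package-style name")
        if not name:
            raise ValueError("name must be non-empty")
        key = (rp_type, name)
        # The same type may appear many times; the name tells them apart.
        if key in self._providers:
            raise ValueError(f"duplicate resource provider {key}")
        self._providers[key] = impl

    def instances(self, rp_type: str) -> List[str]:
        return sorted(n for (t, n) in self._providers if t == rp_type)
```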





[jira] [Updated] (MESOS-7559) CMake builds using parallel execution fail on OS X

2017-05-24 Thread Aaron Wood (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Wood updated MESOS-7559:
--
Description: 
When running {code}cmake -DCMAKE_BUILD_TYPE=Release -DENABLE_DEBUG=0 .. && make -j4{code}, some strange transient errors pop up:

{code}
Scanning dependencies of target boost-1.53.0
/usr/local/Cellar/cmake/3.8.1/bin/cmake -E make_directory 
/Users/myusername/Code/src/mesos/src
/Applications/Xcode.app/Contents/Developer/usr/bin/make -f 
CMakeFiles/make_bin_include_dir.dir/build.make 
CMakeFiles/make_bin_include_dir.dir/build
make[1]: /Applications/Xcode.app/Contents/Developer/usr/bin/make: Permission 
denied
make[1]: *** [3rdparty/CMakeFiles/protobuf-2.6.1.dir/all] Error 1
make[1]: *** Waiting for unfinished jobs
/Applications/Xcode.app/Contents/Developer/usr/bin/make -f 
3rdparty/CMakeFiles/boost-1.53.0.dir/build.make 
3rdparty/CMakeFiles/boost-1.53.0.dir/build
make[1]: /Applications/Xcode.app/Contents/Developer/usr/bin/make: Permission 
denied
make[1]: *** [CMakeFiles/make_bin_include_dir.dir/all] Error 1
make[1]: *** [3rdparty/CMakeFiles/boost-1.53.0.dir/all] Error 1
[  0%] Built target make_bin_src_dir
make: *** [all] Error 2
{code}

{code}
/usr/include/assert.h:93:25: note: expanded from macro 'assert'
(__builtin_expect(!(e), 0) ? __assert_rtn(__func__, __FILE__, __LINE__, #e) 
: (void)0)
^
29 warnings generated.
libtool: compile:  gcc -DHAVE_CONFIG_H -I. 
-I/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22 -g 
-O3 -MT ev.lo -MD -MP -MF .deps/ev.Tpo -c 
/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22/ev.c 
-o ev.o >/dev/null 2>&1
mv -f .deps/ev.Tpo .deps/ev.Plo
/bin/sh ./libtool  --tag=CC   --mode=link gcc  -g -O3 -version-info 4:0:0  -o 
libev.la -rpath 
/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-lib/lib
 ev.lo event.lo  
libtool: link: gcc -dynamiclib -Wl,-undefined -Wl,dynamic_lookup -o 
.libs/libev.4.dylib  .libs/ev.o .libs/event.o-O3   -install_name  
/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-lib/lib/libev.4.dylib
 -compatibility_version 5 -current_version 5.0 -Wl,-single_module
libtool: link: (cd ".libs" && rm -f "libev.dylib" && ln -s "libev.4.dylib" 
"libev.dylib")
libtool: link: ar cru .libs/libev.a  ev.o event.o
libtool: link: ranlib .libs/libev.a
libtool: link: ( cd ".libs" && rm -f "libev.la" && ln -s "../libev.la" 
"libev.la" )
cd 
/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-build 
&& /usr/local/Cellar/cmake/3.8.1/bin/cmake -E touch 
/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-stamp/libev-4.22-build
[  4%] Performing install step for 'libev-4.22'
cd 
/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-build 
&& mkdir -p 
/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-lib/lib
 && cp -r 
/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-build/.libs/.
 
/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-lib/lib
cd 
/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-build 
&& /usr/local/Cellar/cmake/3.8.1/bin/cmake -E touch 
/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-stamp/libev-4.22-install
[  6%] Completed 'libev-4.22'
cd /Users/myusername/Code/src/mesos/build/3rdparty && 
/usr/local/Cellar/cmake/3.8.1/bin/cmake -E make_directory 
/Users/myusername/Code/src/mesos/build/3rdparty/CMakeFiles
cd /Users/myusername/Code/src/mesos/build/3rdparty && 
/usr/local/Cellar/cmake/3.8.1/bin/cmake -E touch 
/Users/myusername/Code/src/mesos/build/3rdparty/CMakeFiles/libev-4.22-complete
cd /Users/myusername/Code/src/mesos/build/3rdparty && 
/usr/local/Cellar/cmake/3.8.1/bin/cmake -E touch 
/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-stamp/libev-4.22-done
[  6%] Built target libev-4.22
make: *** [all] Error 2
{code}

Further along, there seems to be an error that the build cannot get past:

{code}
[ 27%] Completed 'glog-0.3.3'
cd /Users/myusername/Code/src/mesos/build/3rdparty && 
/usr/local/Cellar/cmake/3.8.1/bin/cmake -E make_directory 
/Users/myusername/Code/src/mesos/build/3rdparty/CMakeFiles
cd /Users/myusername/Code/src/mesos/build/3rdparty && 
/usr/local/Cellar/cmake/3.8.1/bin/cmake -E touch 
/Users/myusername/Code/src/mesos/build/3rdparty/CMakeFiles/glog-0.3.3-complete
cd /Users/myusername/Code/src/mesos/build/3rdparty && 
/usr/local/Cellar/cmake/3.8.1/bin/cmake -E touch 
/Users/myusername/Code/src/mesos/build/3rdparty/glog-0.3.3/src/glog-0.3.3-stamp/glog-0.3.3-done
gmake[2]: Leaving directory '/Users/myusername/Code/src/mesos/build'
[ 27%] Built target glog-0.3.3
gmake[1]: Leaving directory '/Users/myusername/Code/src/mesos/build'
gmake: *** [Makefile:120: all] Error 2
{code}

  was:
There are some strange transient 

[jira] [Created] (MESOS-7559) CMake builds using parallel execution fail on OS X

2017-05-24 Thread Aaron Wood (JIRA)
Aaron Wood created MESOS-7559:
-

 Summary: CMake builds using parallel execution fail on OS X
 Key: MESOS-7559
 URL: https://issues.apache.org/jira/browse/MESOS-7559
 Project: Mesos
  Issue Type: Bug
  Components: build, cmake
Reporter: Aaron Wood
Assignee: Andrew Schwartzmeyer
Priority: Minor


There are some strange transient errors that pop up:

{code}
Scanning dependencies of target boost-1.53.0
/usr/local/Cellar/cmake/3.8.1/bin/cmake -E make_directory 
/Users/myusername/Code/src/mesos/src
/Applications/Xcode.app/Contents/Developer/usr/bin/make -f 
CMakeFiles/make_bin_include_dir.dir/build.make 
CMakeFiles/make_bin_include_dir.dir/build
make[1]: /Applications/Xcode.app/Contents/Developer/usr/bin/make: Permission 
denied
make[1]: *** [3rdparty/CMakeFiles/protobuf-2.6.1.dir/all] Error 1
make[1]: *** Waiting for unfinished jobs
/Applications/Xcode.app/Contents/Developer/usr/bin/make -f 
3rdparty/CMakeFiles/boost-1.53.0.dir/build.make 
3rdparty/CMakeFiles/boost-1.53.0.dir/build
make[1]: /Applications/Xcode.app/Contents/Developer/usr/bin/make: Permission 
denied
make[1]: *** [CMakeFiles/make_bin_include_dir.dir/all] Error 1
make[1]: *** [3rdparty/CMakeFiles/boost-1.53.0.dir/all] Error 1
[  0%] Built target make_bin_src_dir
make: *** [all] Error 2
{code}

{code}
/usr/include/assert.h:93:25: note: expanded from macro 'assert'
(__builtin_expect(!(e), 0) ? __assert_rtn(__func__, __FILE__, __LINE__, #e) 
: (void)0)
^
29 warnings generated.
libtool: compile:  gcc -DHAVE_CONFIG_H -I. 
-I/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22 -g 
-O3 -MT ev.lo -MD -MP -MF .deps/ev.Tpo -c 
/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22/ev.c 
-o ev.o >/dev/null 2>&1
mv -f .deps/ev.Tpo .deps/ev.Plo
/bin/sh ./libtool  --tag=CC   --mode=link gcc  -g -O3 -version-info 4:0:0  -o 
libev.la -rpath 
/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-lib/lib
 ev.lo event.lo  
libtool: link: gcc -dynamiclib -Wl,-undefined -Wl,dynamic_lookup -o 
.libs/libev.4.dylib  .libs/ev.o .libs/event.o-O3   -install_name  
/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-lib/lib/libev.4.dylib
 -compatibility_version 5 -current_version 5.0 -Wl,-single_module
libtool: link: (cd ".libs" && rm -f "libev.dylib" && ln -s "libev.4.dylib" 
"libev.dylib")
libtool: link: ar cru .libs/libev.a  ev.o event.o
libtool: link: ranlib .libs/libev.a
libtool: link: ( cd ".libs" && rm -f "libev.la" && ln -s "../libev.la" 
"libev.la" )
cd 
/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-build 
&& /usr/local/Cellar/cmake/3.8.1/bin/cmake -E touch 
/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-stamp/libev-4.22-build
[  4%] Performing install step for 'libev-4.22'
cd 
/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-build 
&& mkdir -p 
/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-lib/lib
 && cp -r 
/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-build/.libs/.
 
/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-lib/lib
cd 
/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-build 
&& /usr/local/Cellar/cmake/3.8.1/bin/cmake -E touch 
/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-stamp/libev-4.22-install
[  6%] Completed 'libev-4.22'
cd /Users/myusername/Code/src/mesos/build/3rdparty && 
/usr/local/Cellar/cmake/3.8.1/bin/cmake -E make_directory 
/Users/myusername/Code/src/mesos/build/3rdparty/CMakeFiles
cd /Users/myusername/Code/src/mesos/build/3rdparty && 
/usr/local/Cellar/cmake/3.8.1/bin/cmake -E touch 
/Users/myusername/Code/src/mesos/build/3rdparty/CMakeFiles/libev-4.22-complete
cd /Users/myusername/Code/src/mesos/build/3rdparty && 
/usr/local/Cellar/cmake/3.8.1/bin/cmake -E touch 
/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-stamp/libev-4.22-done
[  6%] Built target libev-4.22
make: *** [all] Error 2
{code}

Further along, there seems to be an error that the build cannot get past:

{code}
[ 27%] Completed 'glog-0.3.3'
cd /Users/myusername/Code/src/mesos/build/3rdparty && 
/usr/local/Cellar/cmake/3.8.1/bin/cmake -E make_directory 
/Users/myusername/Code/src/mesos/build/3rdparty/CMakeFiles
cd /Users/myusername/Code/src/mesos/build/3rdparty && 
/usr/local/Cellar/cmake/3.8.1/bin/cmake -E touch 
/Users/myusername/Code/src/mesos/build/3rdparty/CMakeFiles/glog-0.3.3-complete
cd /Users/myusername/Code/src/mesos/build/3rdparty && 
/usr/local/Cellar/cmake/3.8.1/bin/cmake -E touch 
/Users/myusername/Code/src/mesos/build/3rdparty/glog-0.3.3/src/glog-0.3.3-stamp/glog-0.3.3-done
gmake[2]: Leaving directory '/Users/myusername/Code/src/mesos/build'
[ 27%] Built target glog-0.3.3
gmake[1]: Leaving 

[jira] [Updated] (MESOS-7515) MasterAllocatorTest/0.ResourcesUnused is flaky

2017-05-24 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-7515:
---
Shepherd: Anand Mazumdar

> MasterAllocatorTest/0.ResourcesUnused is flaky
> --
>
> Key: MESOS-7515
> URL: https://issues.apache.org/jira/browse/MESOS-7515
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere
>
> {noformat}
> [ RUN  ] MasterAllocatorTest/0.ResourcesUnused
> I0516 11:23:52.681485 27347 cluster.cpp:162] Creating default 'local' 
> authorizer
> I0516 11:23:52.689667 27389 master.cpp:436] Master 
> 0596a957-df3e-4b44-94d6-d99478d0bb6e (core-dev) started on 10.0.49.2:42110
> I0516 11:23:52.689745 27389 master.cpp:438] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/5Pnjkv/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/5Pnjkv/master" 
> --zk_session_timeout="10secs"
> I0516 11:23:52.690110 27389 master.cpp:488] Master only allowing 
> authenticated frameworks to register
> I0516 11:23:52.690142 27389 master.cpp:502] Master only allowing 
> authenticated agents to register
> I0516 11:23:52.690166 27389 master.cpp:515] Master only allowing 
> authenticated HTTP frameworks to register
> I0516 11:23:52.690218 27389 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/5Pnjkv/credentials'
> I0516 11:23:52.690475 27389 master.cpp:560] Using default 'crammd5' 
> authenticator
> I0516 11:23:52.690603 27389 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0516 11:23:52.690723 27389 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0516 11:23:52.690870 27389 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0516 11:23:52.691264 27389 master.cpp:640] Authorization enabled
> I0516 11:23:52.694108 27394 master.cpp:2161] Elected as the leading master!
> I0516 11:23:52.694157 27394 master.cpp:1700] Recovering from registrar
> I0516 11:23:52.695142 27362 registrar.cpp:389] Successfully fetched the 
> registry (0B) in 756992ns
> I0516 11:23:52.695263 27362 registrar.cpp:493] Applied 1 operations in 
> 14433ns; attempting to update the registry
> I0516 11:23:52.695825 27362 registrar.cpp:550] Successfully updated the 
> registry in 457984ns
> I0516 11:23:52.695955 27362 registrar.cpp:422] Successfully recovered 
> registrar
> I0516 11:23:52.697041 27381 master.cpp:1799] Recovered 0 agents from the 
> registry (119B); allowing 10mins for agents to re-register
> I0516 11:23:52.712441 27347 cluster.cpp:448] Creating default 'local' 
> authorizer
> I0516 11:23:52.713631 27375 slave.cpp:225] Mesos agent started on 
> (79)@10.0.49.2:42110
> I0516 11:23:52.713680 27375 slave.cpp:226] Flags at startup: --acls="" 
> --appc_simple_discovery_uri_prefix="http://" 
> --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticatee="crammd5" 
> --authentication_backoff_factor="1secs" --authorizer="local" 
> --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
> --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
> --cgroups_root="mesos" --container_disk_watch_interval="15secs" 
> --containerizers="mesos" 
> --credential="/tmp/MasterAllocatorTest_0_ResourcesUnused_KNgb71/credential" 
> --default_role="*" --disk_watch_interval="1mins" --docker="docker" 
> --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" 
> --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" 
> --docker_stop_timeout="0ns" 

[jira] [Created] (MESOS-7558) Add resource provider validation

2017-05-24 Thread Jan Schlicht (JIRA)
Jan Schlicht created MESOS-7558:
---

 Summary: Add resource provider validation
 Key: MESOS-7558
 URL: https://issues.apache.org/jira/browse/MESOS-7558
 Project: Mesos
  Issue Type: Task
  Components: master
Reporter: Jan Schlicht


Similar to how it's done during agent registration/re-registration, the 
information provided by a resource provider needs to be validated during 
certain operations (e.g., re-registration, applying offer operations, ...).
Some of these validations only cover the provided information (e.g., are the 
resources in {{ResourceProviderInfo}} all of type {{disk}}?), while others take 
the current cluster state into account (e.g., do the resources that a task 
wants to use exist on the resource provider?).
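A minimal sketch of the two kinds of checks, assuming a simplified resource model in which resources are matched by name and scalar amount only. `ResourceSketch`, `onlyDiskResources`, and `requestedResourcesExist` are hypothetical names, not Mesos APIs, and the real {{Resource}} protobuf carries much more (roles, reservations, disk info).

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified resource; not the Mesos `Resource` protobuf.
struct ResourceSketch {
  std::string name;  // e.g. "disk", "cpus"
  double amount;
};

// Check on the provided information only: all resources must be "disk".
bool onlyDiskResources(const std::vector<ResourceSketch>& provided) {
  return std::all_of(provided.begin(), provided.end(),
                     [](const ResourceSketch& r) { return r.name == "disk"; });
}

// Check against current state: the requested resources, summed per name,
// must not exceed what the resource provider actually has.
bool requestedResourcesExist(const std::vector<ResourceSketch>& available,
                             const std::vector<ResourceSketch>& requested) {
  std::map<std::string, double> have;
  std::map<std::string, double> want;
  for (const auto& r : available) have[r.name] += r.amount;
  for (const auto& r : requested) want[r.name] += r.amount;

  for (const auto& entry : want) {
    auto it = have.find(entry.first);
    if (it == have.end() || it->second < entry.second) {
      return false;
    }
  }
  return true;
}
```

Summing requests per name keeps two requests from each passing on their own while together exceeding what the provider offers.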





[jira] [Created] (MESOS-7557) Test that resource providers can re-register after a master failover

2017-05-24 Thread Jan Schlicht (JIRA)
Jan Schlicht created MESOS-7557:
---

 Summary: Test that resource providers can re-register after a master failover
 Key: MESOS-7557
 URL: https://issues.apache.org/jira/browse/MESOS-7557
 Project: Mesos
  Issue Type: Task
Reporter: Jan Schlicht


Restarting a master in a test environment should trigger a resource provider 
re-registration.





[jira] [Created] (MESOS-7556) Wait for resource provider re-registrations after a master failover

2017-05-24 Thread Jan Schlicht (JIRA)
Jan Schlicht created MESOS-7556:
---

 Summary: Wait for resource provider re-registrations after a master failover
 Key: MESOS-7556
 URL: https://issues.apache.org/jira/browse/MESOS-7556
 Project: Mesos
  Issue Type: Task
  Components: master
Reporter: Jan Schlicht


Recover all resource provider IDs from the registrar after a failover and set 
up timeouts for resource providers to re-register.
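A minimal sketch of the recovery step, assuming every recovered ID gets the same re-registration deadline and that time is modeled as a plain integer number of seconds so the example stays deterministic. `RecoveredProvidersSketch` and its methods are hypothetical; a real master would use its own timer facilities instead.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical: IDs recovered from the registrar, each with a deadline.
class RecoveredProvidersSketch {
 public:
  RecoveredProvidersSketch(const std::set<std::string>& recoveredIds,
                           long now, long timeout) {
    for (const std::string& id : recoveredIds) {
      deadlines_[id] = now + timeout;
    }
  }

  // A recovered provider re-registers. Returns false if the ID is unknown
  // or its deadline has already passed.
  bool reregister(const std::string& id, long now) {
    auto it = deadlines_.find(id);
    if (it == deadlines_.end() || now > it->second) {
      return false;
    }
    deadlines_.erase(it);
    reregistered_.insert(id);
    return true;
  }

  // IDs still pending whose deadline has passed at time `now`.
  std::set<std::string> expired(long now) const {
    std::set<std::string> result;
    for (const auto& entry : deadlines_) {
      if (now > entry.second) result.insert(entry.first);
    }
    return result;
  }

  bool isReregistered(const std::string& id) const {
    return reregistered_.count(id) > 0;
  }

 private:
  std::map<std::string, long> deadlines_;  // pending ID -> deadline
  std::set<std::string> reregistered_;
};
```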





[jira] [Created] (MESOS-7555) Add resource provider IDs to the registry

2017-05-24 Thread Jan Schlicht (JIRA)
Jan Schlicht created MESOS-7555:
---

 Summary: Add resource provider IDs to the registry
 Key: MESOS-7555
 URL: https://issues.apache.org/jira/browse/MESOS-7555
 Project: Mesos
  Issue Type: Task
  Components: master
Reporter: Jan Schlicht


To support resource provider re-registration following a master failover, the 
IDs of registered resource providers need to be kept in the registry.
An operation to commit those IDs using the registrar needs to be added as well.
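A minimal sketch of such a registry operation, assuming the persisted state only needs a set of resource provider IDs and that an operation reports whether it actually changed the registry. `RegistrySketch` and `AddResourceProviderSketch` are hypothetical names; this does not reproduce the interface of the Mesos registrar.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Hypothetical persisted state: IDs of registered resource providers.
struct RegistrySketch {
  std::set<std::string> resourceProviderIds;
};

// Hypothetical operation committing one ID. `apply` returns true only if
// the registry was mutated, so re-applying the same operation is harmless.
class AddResourceProviderSketch {
 public:
  explicit AddResourceProviderSketch(std::string id) : id_(std::move(id)) {}

  bool apply(RegistrySketch* registry) const {
    return registry->resourceProviderIds.insert(id_).second;
  }

 private:
  std::string id_;
};
```

Making the operation idempotent means a retried commit (e.g., after a failed store) does not duplicate the ID.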





[jira] [Created] (MESOS-7554) Add a re-registration timeout for resource providers

2017-05-24 Thread Jan Schlicht (JIRA)
Jan Schlicht created MESOS-7554:
---

 Summary: Add a re-registration timeout for resource providers
 Key: MESOS-7554
 URL: https://issues.apache.org/jira/browse/MESOS-7554
 Project: Mesos
  Issue Type: Task
  Components: master
Reporter: Jan Schlicht


This re-registration timeout will be started when a resource provider seems to 
have disconnected, similar to how it's done for agents. While waiting for the 
resource provider to reconnect, it will be deactivated. On re-registration, the 
timeout will be canceled and the resource provider reactivated. If the timeout 
expires, the internal state will be changed to {{unreachable}} (as it is for 
agents in that situation) and the resource provider will be considered gone.
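A minimal sketch of the transitions described above, assuming integer time, a single timeout value, and that a resource provider which has timed out cannot re-register (the description says it is considered gone). `ProviderState` and `ProviderTimeoutSketch` are hypothetical names, not Mesos identifiers.

```cpp
#include <cassert>

// Hypothetical states for a single resource provider.
enum class ProviderState { ACTIVE, DEACTIVATED, UNREACHABLE };

class ProviderTimeoutSketch {
 public:
  explicit ProviderTimeoutSketch(long timeout) : timeout_(timeout) {}

  // The provider seems to have disconnected: deactivate it, start the timer.
  void disconnected(long now) {
    if (state_ == ProviderState::ACTIVE) {
      state_ = ProviderState::DEACTIVATED;
      deadline_ = now + timeout_;
    }
  }

  // The provider re-registers: reactivating it means `tick` no longer
  // fires, which effectively cancels the pending timeout. Assumed to fail
  // once the provider has been marked unreachable.
  bool reregistered() {
    if (state_ == ProviderState::UNREACHABLE) return false;
    state_ = ProviderState::ACTIVE;
    return true;
  }

  // Called periodically; marks the provider unreachable once the deadline hits.
  void tick(long now) {
    if (state_ == ProviderState::DEACTIVATED && now >= deadline_) {
      state_ = ProviderState::UNREACHABLE;
    }
  }

  ProviderState state() const { return state_; }

 private:
  long timeout_;
  long deadline_ = 0;
  ProviderState state_ = ProviderState::ACTIVE;
};
```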





[jira] [Created] (MESOS-7553) Distinguish between different resource provider states in the master

2017-05-24 Thread Jan Schlicht (JIRA)
Jan Schlicht created MESOS-7553:
---

 Summary: Distinguish between different resource provider states in the master
 Key: MESOS-7553
 URL: https://issues.apache.org/jira/browse/MESOS-7553
 Project: Mesos
  Issue Type: Task
  Components: master
Reporter: Jan Schlicht


In preparation for supporting timeouts for resource provider re-registration, 
the master needs to be able to distinguish between registered, unreachable, and 
gone resource providers, so that resources of resource providers that are not 
registered aren't offered. For that, internal resource provider states have to 
be added to the master, as is already done for agents (i.e., the 
{{completed}}, {{registered}}, and {{removed}} maps in {{master.cpp}}).


