[jira] [Assigned] (MESOS-7170) Allow for custom filters on Mesos APIs
[ https://issues.apache.org/jira/browse/MESOS-7170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Mann reassigned MESOS-7170:

    Assignee: (was: Gilbert Song)

> Allow for custom filters on Mesos APIs
> ---
>
> Key: MESOS-7170
> URL: https://issues.apache.org/jira/browse/MESOS-7170
> Project: Mesos
> Issue Type: Improvement
> Components: HTTP API
> Reporter: Miguel Bernadin
> Priority: Minor
>
> For tasks.json API and others like state.json, etc, on larger clusters the
> data that Mesos master sends is quite lengthy. It would be good to provide
> filters in the API to allow Mesos to just send only the RUNNING tasks in the
> cluster so it does less work. Creating this JIRA so we can have intelligent
> filters to pick what data to send on the server side, rather than filtering
> it out on the client side.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Updated] (MESOS-7564) Introduce a heartbeat mechanism for executor <-> agent communication.
[ https://issues.apache.org/jira/browse/MESOS-7564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anand Mazumdar updated MESOS-7564:
--
    Target Version/s: 1.4.0

> Introduce a heartbeat mechanism for executor <-> agent communication.
> -
>
> Key: MESOS-7564
> URL: https://issues.apache.org/jira/browse/MESOS-7564
> Project: Mesos
> Issue Type: Task
> Reporter: Anand Mazumdar
>
> Currently, we do not have heartbeats for executor <-> agent communication.
> This is especially problematic in scenarios when IPFilters are enabled since
> the default conntrack keep alive timeout is 5 days. When that timeout
> elapses, the executor doesn't get notified via a socket disconnection when
> the agent process restarts. The executor would then get killed if it doesn't
> re-register when the agent recovery process is completed.
> Enabling application-level heartbeats or TCP keepalives is a possible
> way to fix this issue.
[jira] [Updated] (MESOS-7563) Make the HTTP command executor the default implementation.
[ https://issues.apache.org/jira/browse/MESOS-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anand Mazumdar updated MESOS-7563:
--
    Target Version/s: 1.4.0

> Make the HTTP command executor the default implementation.
> --
>
> Key: MESOS-7563
> URL: https://issues.apache.org/jira/browse/MESOS-7563
> Project: Mesos
> Issue Type: Epic
> Reporter: Anand Mazumdar
>
> This epic tracks the work needed to make HTTP command executors the default
> i.e., enable the {{http_command_executor}} flag. Currently, all command
> executors use the old executor driver implementation. With this flag being
> always enabled, the command executors would use the v1 HTTP API.
[jira] [Created] (MESOS-7564) Introduce a heartbeat mechanism for executor <-> agent communication.
Anand Mazumdar created MESOS-7564:
-

Summary: Introduce a heartbeat mechanism for executor <-> agent communication.
Key: MESOS-7564
URL: https://issues.apache.org/jira/browse/MESOS-7564
Project: Mesos
Issue Type: Task
Reporter: Anand Mazumdar

Currently, we do not have heartbeats for executor <-> agent communication. This is especially problematic in scenarios when IPFilters are enabled since the default conntrack keep alive timeout is 5 days. When that timeout elapses, the executor doesn't get notified via a socket disconnection when the agent process restarts. The executor would then get killed if it doesn't re-register when the agent recovery process is completed.

Enabling application-level heartbeats or TCP keepalives is a possible way to fix this issue.
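For readers unfamiliar with the second option mentioned in the description, here is a minimal sketch of turning on TCP keepalive for a socket, using only Python's standard `socket` module. It is not Mesos code and does not reflect how the executor or agent would implement this; the helper name `enable_tcp_keepalive` and the timing values are assumptions chosen for illustration.

```python
import socket


def enable_tcp_keepalive(sock, idle_secs=60, interval_secs=10, max_probes=5):
    """Turn on TCP keepalive probes for an already-created socket.

    Hypothetical helper for illustration; the default timings are arbitrary.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket tuning options are platform specific (Linux exposes all
    # three), so each one is applied only when the constant is available.
    for name, value in (
        ("TCP_KEEPIDLE", idle_secs),       # idle time before the first probe
        ("TCP_KEEPINTVL", interval_secs),  # gap between unanswered probes
        ("TCP_KEEPCNT", max_probes),       # failed probes before giving up
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_tcp_keepalive(s)
    print("keepalive enabled:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

With an idle time well below the connection-tracking timeout, periodic probes would keep the tracked entry fresh, and a peer that has gone away would eventually surface as a socket error rather than silence.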
[jira] [Assigned] (MESOS-7562) MasterTest.IgnoreOldAgentReregistration is flaky
[ https://issues.apache.org/jira/browse/MESOS-7562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway reassigned MESOS-7562: -- Assignee: Neil Conway > MasterTest.IgnoreOldAgentReregistration is flaky > > > Key: MESOS-7562 > URL: https://issues.apache.org/jira/browse/MESOS-7562 > Project: Mesos > Issue Type: Bug >Reporter: Neil Conway >Assignee: Neil Conway > > {noformat} > [ RUN ] MasterTest.IgnoreOldAgentReregistration > I0524 16:29:07.143152 29236 cluster.cpp:162] Creating default 'local' > authorizer > I0524 16:29:07.149690 29287 master.cpp:436] Master > 3912ae61-36a4-468c-bef5-82f082370f3d (core-dev) started on 10.0.49.2:42980 > I0524 16:29:07.149724 29287 master.cpp:438] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="true" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/gg4ie7/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --http_authenticators="basic" --http_framework_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" > --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" > --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" > --recovery_agent_removal_limit="100%" --registry="in_memory" > --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" > --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" --registry_strict="false" > --root_submissions="true" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/gg4ie7/master" > 
--zk_session_timeout="10secs" > I0524 16:29:07.149896 29287 master.cpp:488] Master only allowing > authenticated frameworks to register > I0524 16:29:07.149905 29287 master.cpp:502] Master only allowing > authenticated agents to register > I0524 16:29:07.149912 29287 master.cpp:515] Master only allowing > authenticated HTTP frameworks to register > I0524 16:29:07.149920 29287 credentials.hpp:37] Loading credentials for > authentication from '/tmp/gg4ie7/credentials' > I0524 16:29:07.150065 29287 master.cpp:560] Using default 'crammd5' > authenticator > I0524 16:29:07.150133 29287 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I0524 16:29:07.150168 29287 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I0524 16:29:07.150223 29287 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I0524 16:29:07.150259 29287 master.cpp:640] Authorization enabled > I0524 16:29:07.151617 29274 master.cpp:2161] Elected as the leading master! 
> I0524 16:29:07.151644 29274 master.cpp:1700] Recovering from registrar > I0524 16:29:07.152218 29261 registrar.cpp:389] Successfully fetched the > registry (0B) in 505088ns > I0524 16:29:07.152268 29261 registrar.cpp:493] Applied 1 operations in > 4200ns; attempting to update the registry > I0524 16:29:07.152664 29261 registrar.cpp:550] Successfully updated the > registry in 371200ns > I0524 16:29:07.152703 29261 registrar.cpp:422] Successfully recovered > registrar > I0524 16:29:07.153328 29291 master.cpp:1799] Recovered 0 agents from the > registry (119B); allowing 10mins for agents to re-register > I0524 16:29:07.160094 29236 containerizer.cpp:230] Using isolation: > posix/cpu,posix/mem,filesystem/posix,network/cni,environment_secret > W0524 16:29:07.160295 29236 backend.cpp:76] Failed to create 'overlay' > backend: OverlayBackend requires root privileges > W0524 16:29:07.160326 29236 backend.cpp:76] Failed to create 'bind' backend: > BindBackend requires root privileges > I0524 16:29:07.160334 29236 provisioner.cpp:255] Using default backend 'copy' > I0524 16:29:07.161916 29236 cluster.cpp:448] Creating default 'local' > authorizer > I0524 16:29:07.162616 29276 slave.cpp:225] Mesos agent started on > (7738)@10.0.49.2:42980 > I0524 16:29:07.162644 29276 slave.cpp:226] Flags at startup: --acls="" > --appc_simple_discovery_uri_prefix="http://; > --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticatee="crammd5" > --authentication_backoff_factor="1secs" --authorizer="local" > --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" > --cgroups_hierarchy="/sys/fs/cgroup"
[jira] [Created] (MESOS-7563) Make the HTTP command executor the default implementation.
Anand Mazumdar created MESOS-7563:
-

Summary: Make the HTTP command executor the default implementation.
Key: MESOS-7563
URL: https://issues.apache.org/jira/browse/MESOS-7563
Project: Mesos
Issue Type: Epic
Reporter: Anand Mazumdar

This epic tracks the work needed to make HTTP command executors the default i.e., enable the {{http_command_executor}} flag. Currently, all command executors use the old executor driver implementation. With this flag being always enabled, the command executors would use the v1 HTTP API.
[jira] [Created] (MESOS-7562) MasterTest.IgnoreOldAgentReregistration is flaky
Neil Conway created MESOS-7562: -- Summary: MasterTest.IgnoreOldAgentReregistration is flaky Key: MESOS-7562 URL: https://issues.apache.org/jira/browse/MESOS-7562 Project: Mesos Issue Type: Bug Reporter: Neil Conway {noformat} [ RUN ] MasterTest.IgnoreOldAgentReregistration I0524 16:29:07.143152 29236 cluster.cpp:162] Creating default 'local' authorizer I0524 16:29:07.149690 29287 master.cpp:436] Master 3912ae61-36a4-468c-bef5-82f082370f3d (core-dev) started on 10.0.49.2:42980 I0524 16:29:07.149724 29287 master.cpp:438] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/gg4ie7/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/gg4ie7/master" --zk_session_timeout="10secs" I0524 16:29:07.149896 29287 master.cpp:488] Master only allowing authenticated frameworks to register I0524 16:29:07.149905 29287 master.cpp:502] Master only allowing authenticated agents to register I0524 
16:29:07.149912 29287 master.cpp:515] Master only allowing authenticated HTTP frameworks to register I0524 16:29:07.149920 29287 credentials.hpp:37] Loading credentials for authentication from '/tmp/gg4ie7/credentials' I0524 16:29:07.150065 29287 master.cpp:560] Using default 'crammd5' authenticator I0524 16:29:07.150133 29287 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly' I0524 16:29:07.150168 29287 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite' I0524 16:29:07.150223 29287 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler' I0524 16:29:07.150259 29287 master.cpp:640] Authorization enabled I0524 16:29:07.151617 29274 master.cpp:2161] Elected as the leading master! I0524 16:29:07.151644 29274 master.cpp:1700] Recovering from registrar I0524 16:29:07.152218 29261 registrar.cpp:389] Successfully fetched the registry (0B) in 505088ns I0524 16:29:07.152268 29261 registrar.cpp:493] Applied 1 operations in 4200ns; attempting to update the registry I0524 16:29:07.152664 29261 registrar.cpp:550] Successfully updated the registry in 371200ns I0524 16:29:07.152703 29261 registrar.cpp:422] Successfully recovered registrar I0524 16:29:07.153328 29291 master.cpp:1799] Recovered 0 agents from the registry (119B); allowing 10mins for agents to re-register I0524 16:29:07.160094 29236 containerizer.cpp:230] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni,environment_secret W0524 16:29:07.160295 29236 backend.cpp:76] Failed to create 'overlay' backend: OverlayBackend requires root privileges W0524 16:29:07.160326 29236 backend.cpp:76] Failed to create 'bind' backend: BindBackend requires root privileges I0524 16:29:07.160334 29236 provisioner.cpp:255] Using default backend 'copy' I0524 16:29:07.161916 29236 cluster.cpp:448] Creating default 'local' authorizer I0524 16:29:07.162616 29276 slave.cpp:225] Mesos agent started on 
(7738)@10.0.49.2:42980 I0524 16:29:07.162644 29276 slave.cpp:226] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://; --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --credential="/tmp/MasterTest_IgnoreOldAgentReregistration_WX8CZz/credential" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io; --docker_remove_delay="6hrs"
[jira] [Commented] (MESOS-7476) Restrict capabilities to only the bounding set.
[ https://issues.apache.org/jira/browse/MESOS-7476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023909#comment-16023909 ]

James Peach commented on MESOS-7476:

| [https://reviews.apache.org/r/59547/|https://reviews.apache.org/r/59547/] | Rename ContainerLaunchInfo `capabilities` field. |
| [https://reviews.apache.org/r/59548/|https://reviews.apache.org/r/59548/] | Add a `bounding_capabilities` field to ContainerLaunchInfo. |
| [https://reviews.apache.org/r/59549/|https://reviews.apache.org/r/59549/] | Add the agent --bounding_capabilities flag. |
| [https://reviews.apache.org/r/59550/|https://reviews.apache.org/r/59550/] | Check bounding capabilities at isolator creation time. |
| [https://reviews.apache.org/r/59551/|https://reviews.apache.org/r/59551/] | Change launcher working directory before dropping privilege. |
| [https://reviews.apache.org/r/59552/|https://reviews.apache.org/r/59552/] | Add support for explicitly setting bounding capabilities. |

> Restrict capabilities to only the bounding set.
> ---
>
> Key: MESOS-7476
> URL: https://issues.apache.org/jira/browse/MESOS-7476
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Reporter: James Peach
> Assignee: James Peach
>
> As a security improvement, it would be useful to be able to set the bounding
> capability set without also granting those capabilities. This is what the
> {{--allowed_capabilities}} flag sounds like it does.
[jira] [Commented] (MESOS-7170) Allow for custom filters on Mesos APIs
[ https://issues.apache.org/jira/browse/MESOS-7170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023906#comment-16023906 ]

Eric Chung commented on MESOS-7170:
---

We're having the same issue: our production cluster typically runs 30k+ tasks, which makes downloading the entire list of tasks too slow for a productive user experience and also puts unnecessary load on the master. It would be awesome if we could filter tasks by label, which can be customized by the user, so we could do something like: `curl /tasks?=&=`

We could of course also do something fancy like using a resource query language, but that can be up for debate.

> Allow for custom filters on Mesos APIs
> ---
>
> Key: MESOS-7170
> URL: https://issues.apache.org/jira/browse/MESOS-7170
> Project: Mesos
> Issue Type: Improvement
> Components: HTTP API
> Reporter: Miguel Bernadin
> Assignee: Gilbert Song
> Priority: Minor
>
> For tasks.json API and others like state.json, etc, on larger clusters the
> data that Mesos master sends is quite lengthy. It would be good to provide
> filters in the API to allow Mesos to just send only the RUNNING tasks in the
> cluster so it does less work. Creating this JIRA so we can have intelligent
> filters to pick what data to send on the server side, rather than filtering
> it out on the client side.
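To make the request concrete, the sketch below shows the kind of filtering by state and by label that clients currently have to do themselves after downloading the full task list, and that this issue proposes moving to the server. It is illustrative only: the function name `filter_tasks` is hypothetical, and the payload shape (a list of task objects with a `state` string and a `labels` list of `key`/`value` pairs) is an assumption about the tasks response, not something stated in this thread.

```python
def filter_tasks(tasks, state=None, labels=None):
    """Return the tasks matching `state` and every key/value pair in `labels`.

    Hypothetical client-side helper; the task dictionary layout is assumed.
    """
    wanted = labels or {}
    selected = []
    for task in tasks:
        if state is not None and task.get("state") != state:
            continue
        # Assumed shape: labels are a list of {"key": ..., "value": ...} dicts.
        task_labels = {
            label["key"]: label.get("value") for label in task.get("labels", [])
        }
        if all(task_labels.get(key) == value for key, value in wanted.items()):
            selected.append(task)
    return selected


# Made-up sample data standing in for a downloaded task list.
SAMPLE_TASKS = [
    {"id": "web.1", "state": "TASK_RUNNING",
     "labels": [{"key": "team", "value": "search"}]},
    {"id": "web.2", "state": "TASK_FINISHED",
     "labels": [{"key": "team", "value": "search"}]},
    {"id": "db.1", "state": "TASK_RUNNING",
     "labels": [{"key": "team", "value": "storage"}]},
]

if __name__ == "__main__":
    running_search = filter_tasks(
        SAMPLE_TASKS, state="TASK_RUNNING", labels={"team": "search"}
    )
    print([task["id"] for task in running_search])
```

The cost being discussed is that with 30k+ tasks the whole list must be serialized and transferred before a loop like this can discard most of it; a server-side filter would apply the same predicate before building the response.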
[jira] [Issue Comment Deleted] (MESOS-7476) Restrict capabilities to only the bounding set.
[ https://issues.apache.org/jira/browse/MESOS-7476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Peach updated MESOS-7476:
---
    Comment: was deleted

(was:
| [https://reviews.apache.org/r/59183/|https://reviews.apache.org/r/59183/] | Refactor setting capabilities into a helper function. |
| [https://reviews.apache.org/r/59184/|https://reviews.apache.org/r/59184/] | Add support for explicitly setting bounding capabilities. |)

> Restrict capabilities to only the bounding set.
> ---
>
> Key: MESOS-7476
> URL: https://issues.apache.org/jira/browse/MESOS-7476
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Reporter: James Peach
> Assignee: James Peach
>
> As a security improvement, it would be useful to be able to set the bounding
> capability set without also granting those capabilities. This is what the
> {{--allowed_capabilities}} flag sounds like it does.
[jira] [Comment Edited] (MESOS-7477) Support ambient capabilities.
[ https://issues.apache.org/jira/browse/MESOS-7477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006761#comment-16006761 ]

James Peach edited comment on MESOS-7477 at 5/24/17 11:52 PM:
--

| [https://reviews.apache.org/r/59185/|https://reviews.apache.org/r/59185/] | Add ambient capability support. |
| [https://reviews.apache.org/r/59553/|https://reviews.apache.org/r/59553/] | Add ambient capabilities to launched tasks. |
| [https://reviews.apache.org/r/59554/|https://reviews.apache.org/r/59554/] | Rename the `\-\-allowed_capabilities` flag to `\-\-effective_capabilities`. |
| [https://reviews.apache.org/r/59186/|https://reviews.apache.org/r/59186/] | Additional linux/capabilities isolator documentation. |

was (Author: jamespeach):
| [https://reviews.apache.org/r/59185/|https://reviews.apache.org/r/59185/] | Add ambient capability support. |
| [https://reviews.apache.org/r/59186/|https://reviews.apache.org/r/59186/] | Additional linux/capabilities isolator documentation. |

> Support ambient capabilities.
> -
>
> Key: MESOS-7477
> URL: https://issues.apache.org/jira/browse/MESOS-7477
> Project: Mesos
> Issue Type: Improvement
> Reporter: James Peach
> Assignee: James Peach
>
> Add support for ambient capabilities so that capabilities granted in the
> {{LaunchTask}} message can be made active in the task without the requirement
> for matching file-based capabilities.
[jira] [Updated] (MESOS-7521) Major performance regression in DRF sorter.
[ https://issues.apache.org/jira/browse/MESOS-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neil Conway updated MESOS-7521:
---
    Summary: Major performance regression in DRF sorter. (was: Major performance regression in drf sorter)

> Major performance regression in DRF sorter.
> ---
>
> Key: MESOS-7521
> URL: https://issues.apache.org/jira/browse/MESOS-7521
> Project: Mesos
> Issue Type: Bug
> Components: allocation
> Affects Versions: 1.3.0
> Reporter: Dario Rexin
> Assignee: Neil Conway
> Priority: Blocker
> Labels: perfomance
>
> The addition of hierarchical roles to the framework sorter
> (https://github.com/apache/mesos/commit/e5ef1992b2b8e84b5d1487f1578f18f2291cd082)
> has introduced a major performance regression to 1.2. Suppressing offers for
> frameworks does not seem to reduce allocation time anymore, like it used to
> in 1.2. Here are some relevant benchmark results:
> Mesos 1.2:
> {noformat}
> [ RUN ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.SuppressOffers/7
> Using 1000 agents and 6000 frameworks
> Added 6000 frameworks in 105957us
> Added 1000 agents in 34.937438secs
> allocate() took 27.408828secs to make 1000 offers with 1200 out of 6000 frameworks suppressing offers
> allocate() took 20.121897secs to make 1000 offers with 2400 out of 6000 frameworks suppressing offers
> allocate() took 12.964302secs to make 1000 offers with 3600 out of 6000 frameworks suppressing offers
> allocate() took 6.534221secs to make 1000 offers with 4800 out of 6000 frameworks suppressing offers
> allocate() took 8953us to make 0 offers with 6000 out of 6000 frameworks suppressing offers
> [ OK ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.SuppressOffers/7 (106198 ms)
> {noformat}
> Mesos 1.3:
> {noformat}
> [ RUN ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.SuppressOffers/7
> Using 1000 agents and 6000 frameworks
> Added 6000 frameworks in 1.036217secs
> Added 1000 agents in 10.093938secs
> allocate() took 10.629448secs to make 1000 offers with 1200 out of 6000 frameworks suppressing offers
> allocate() took 11.607185secs to make 1000 offers with 2400 out of 6000 frameworks suppressing offers
> allocate() took 12.896578secs to make 1000 offers with 3600 out of 6000 frameworks suppressing offers
> allocate() took 14.162431secs to make 1000 offers with 4800 out of 6000 frameworks suppressing offers
> allocate() took 257060us to make 0 offers with 6000 out of 6000 frameworks suppressing offers
> [ OK ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.SuppressOffers/7 (64011 ms)
> {noformat}
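The trend in the two benchmark runs is easier to see side by side. The short script below simply copies the `allocate()` timings quoted in the issue and compares the time at 4800 suppressed frameworks with the time at 1200; the variable and function names are made up for this illustration and the script is not part of the Mesos benchmark.

```python
# allocate() wall-clock seconds copied from the quoted benchmark output, for
# 1200, 2400, 3600 and 4800 (out of 6000) frameworks suppressing offers.
SUPPRESSED = [1200, 2400, 3600, 4800]
ALLOCATE_SECS = {
    "1.2": [27.408828, 20.121897, 12.964302, 6.534221],
    "1.3": [10.629448, 11.607185, 12.896578, 14.162431],
}


def trend(times):
    """Ratio of the last measurement to the first (below 1.0 means faster)."""
    return times[-1] / times[0]


if __name__ == "__main__":
    for version, times in ALLOCATE_SECS.items():
        print(f"Mesos {version}: {SUPPRESSED[0]} -> {SUPPRESSED[-1]} suppressed, "
              f"time ratio {trend(times):.2f}")
```

In the 1.2 run the allocation time falls to roughly a quarter as more frameworks suppress offers, while in the 1.3 run it grows by about a third over the same range, which is the behavior the report describes.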
[jira] [Updated] (MESOS-7552) MasterAllocatorTest/0.FrameworkExited is flaky
[ https://issues.apache.org/jira/browse/MESOS-7552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-7552: --- Shepherd: Anand Mazumdar > MasterAllocatorTest/0.FrameworkExited is flaky > -- > > Key: MESOS-7552 > URL: https://issues.apache.org/jira/browse/MESOS-7552 > Project: Mesos > Issue Type: Bug >Reporter: Neil Conway >Assignee: Neil Conway > Labels: mesosphere > > {noformat} > [ RUN ] MasterAllocatorTest/0.FrameworkExited > I0523 19:43:15.274132 29720 cluster.cpp:162] Creating default 'local' > authorizer > I0523 19:43:15.280047 29758 master.cpp:436] Master > a2abf627-97d2-4603-bda2-301f78203413 (core-dev) started on 10.0.49.2:33691 > I0523 19:43:15.280078 29758 master.cpp:438] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="50ms" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="true" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/GdDJ5A/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --http_authenticators="basic" --http_framework_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" > --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" > --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" > --recovery_agent_removal_limit="100%" --registry="in_memory" > --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" > --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" --registry_strict="false" > --root_submissions="true" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" 
--work_dir="/tmp/GdDJ5A/master" > --zk_session_timeout="10secs" > I0523 19:43:15.280259 29758 master.cpp:488] Master only allowing > authenticated frameworks to register > I0523 19:43:15.280269 29758 master.cpp:502] Master only allowing > authenticated agents to register > I0523 19:43:15.280297 29758 master.cpp:515] Master only allowing > authenticated HTTP frameworks to register > I0523 19:43:15.280305 29758 credentials.hpp:37] Loading credentials for > authentication from '/tmp/GdDJ5A/credentials' > I0523 19:43:15.280433 29758 master.cpp:560] Using default 'crammd5' > authenticator > I0523 19:43:15.280496 29758 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I0523 19:43:15.280544 29758 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I0523 19:43:15.280743 29758 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I0523 19:43:15.280772 29758 master.cpp:640] Authorization enabled > I0523 19:43:15.281690 29774 master.cpp:2161] Elected as the leading master! 
> I0523 19:43:15.281720 29774 master.cpp:1700] Recovering from registrar > I0523 19:43:15.281911 29768 registrar.cpp:389] Successfully fetched the > registry (0B) in 120320ns > I0523 19:43:15.281942 29768 registrar.cpp:493] Applied 1 operations in > 2995ns; attempting to update the registry > I0523 19:43:15.282146 29768 registrar.cpp:550] Successfully updated the > registry in 192us > I0523 19:43:15.282207 29768 registrar.cpp:422] Successfully recovered > registrar > I0523 19:43:15.282466 29779 master.cpp:1799] Recovered 0 agents from the > registry (119B); allowing 10mins for agents to re-register > I0523 19:43:15.289202 29720 cluster.cpp:448] Creating default 'local' > authorizer > I0523 19:43:15.289670 29758 slave.cpp:225] Mesos agent started on > (50)@10.0.49.2:33691 > I0523 19:43:15.289695 29758 slave.cpp:226] Flags at startup: --acls="" > --appc_simple_discovery_uri_prefix="http://; > --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticatee="crammd5" > --authentication_backoff_factor="1secs" --authorizer="local" > --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" > --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" > --cgroups_root="mesos" --container_disk_watch_interval="15secs" > --containerizers="mesos" > --credential="/tmp/MasterAllocatorTest_0_FrameworkExited_OPmret/credential" > --default_role="*" --disk_watch_interval="1mins" --docker="docker" > --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io; > --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" > --docker_stop_timeout="0ns"
[jira] [Created] (MESOS-7561) Add storage resource provider specific information in ResourceProviderInfo.
Jie Yu created MESOS-7561:
-

Summary: Add storage resource provider specific information in ResourceProviderInfo.
Key: MESOS-7561
URL: https://issues.apache.org/jira/browse/MESOS-7561
Project: Mesos
Issue Type: Task
Reporter: Jie Yu

For a storage resource provider, there will be some specific configuration information. For instance, the most important one is the `ContainerConfig` of the CSI Plugin container. That config information will be sent to the corresponding agent that will use the resources provided by the resource provider. For a storage resource provider in particular, the agent needs to launch the CSI Node Plugin to mount the volumes.

An alternative to adding first-class storage resource provider information is to add a generic labels field in ResourceProviderInfo and let the resource provider itself figure out the format of the labels. However, I believe a first-class solution is better and clearer.
[jira] [Created] (MESOS-7560) Add 'type' and 'name' to ResourceProviderInfo.
Jie Yu created MESOS-7560:
-

Summary: Add 'type' and 'name' to ResourceProviderInfo.
Key: MESOS-7560
URL: https://issues.apache.org/jira/browse/MESOS-7560
Project: Mesos
Issue Type: Task
Reporter: Jie Yu

The 'type' field will be used to load the corresponding implementation (either internal or via module). To avoid conflicts, the naming should follow the Java package naming scheme (e.g., org.apache.mesos.resource_provider.local.storage).

Since there could be multiple instances of the same resource provider type, it's important to also add a 'name' field to distinguish between instances of the same type.
[jira] [Updated] (MESOS-7559) CMake builds using parallel execution fail on OS X
[ https://issues.apache.org/jira/browse/MESOS-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Wood updated MESOS-7559: -- Description: When doing a {code}cmake -DCMAKE_BUILD_TYPE=Release -DENABLE_DEBUG=0 .. && make -j4{code} there are some strange transient errors that pop up: {code} Scanning dependencies of target boost-1.53.0 /usr/local/Cellar/cmake/3.8.1/bin/cmake -E make_directory /Users/myusername/Code/src/mesos/src /Applications/Xcode.app/Contents/Developer/usr/bin/make -f CMakeFiles/make_bin_include_dir.dir/build.make CMakeFiles/make_bin_include_dir.dir/build make[1]: /Applications/Xcode.app/Contents/Developer/usr/bin/make: Permission denied make[1]: *** [3rdparty/CMakeFiles/protobuf-2.6.1.dir/all] Error 1 make[1]: *** Waiting for unfinished jobs /Applications/Xcode.app/Contents/Developer/usr/bin/make -f 3rdparty/CMakeFiles/boost-1.53.0.dir/build.make 3rdparty/CMakeFiles/boost-1.53.0.dir/build make[1]: /Applications/Xcode.app/Contents/Developer/usr/bin/make: Permission denied make[1]: *** [CMakeFiles/make_bin_include_dir.dir/all] Error 1 make[1]: *** [3rdparty/CMakeFiles/boost-1.53.0.dir/all] Error 1 [ 0%] Built target make_bin_src_dir make: *** [all] Error 2 {code} {code} /usr/include/assert.h:93:25: note: expanded from macro 'assert' (__builtin_expect(!(e), 0) ? __assert_rtn(__func__, __FILE__, __LINE__, #e) : (void)0) ^ 29 warnings generated. libtool: compile: gcc -DHAVE_CONFIG_H -I. 
-I/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22 -g -O3 -MT ev.lo -MD -MP -MF .deps/ev.Tpo -c /Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22/ev.c -o ev.o >/dev/null 2>&1 mv -f .deps/ev.Tpo .deps/ev.Plo /bin/sh ./libtool --tag=CC --mode=link gcc -g -O3 -version-info 4:0:0 -o libev.la -rpath /Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-lib/lib ev.lo event.lo libtool: link: gcc -dynamiclib -Wl,-undefined -Wl,dynamic_lookup -o .libs/libev.4.dylib .libs/ev.o .libs/event.o-O3 -install_name /Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-lib/lib/libev.4.dylib -compatibility_version 5 -current_version 5.0 -Wl,-single_module libtool: link: (cd ".libs" && rm -f "libev.dylib" && ln -s "libev.4.dylib" "libev.dylib") libtool: link: ar cru .libs/libev.a ev.o event.o libtool: link: ranlib .libs/libev.a libtool: link: ( cd ".libs" && rm -f "libev.la" && ln -s "../libev.la" "libev.la" ) cd /Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-build && /usr/local/Cellar/cmake/3.8.1/bin/cmake -E touch /Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-stamp/libev-4.22-build [ 4%] Performing install step for 'libev-4.22' cd /Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-build && mkdir -p /Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-lib/lib && cp -r /Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-build/.libs/. 
/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-lib/lib cd /Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-build && /usr/local/Cellar/cmake/3.8.1/bin/cmake -E touch /Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-stamp/libev-4.22-install [ 6%] Completed 'libev-4.22' cd /Users/myusername/Code/src/mesos/build/3rdparty && /usr/local/Cellar/cmake/3.8.1/bin/cmake -E make_directory /Users/myusername/Code/src/mesos/build/3rdparty/CMakeFiles cd /Users/myusername/Code/src/mesos/build/3rdparty && /usr/local/Cellar/cmake/3.8.1/bin/cmake -E touch /Users/myusername/Code/src/mesos/build/3rdparty/CMakeFiles/libev-4.22-complete cd /Users/myusername/Code/src/mesos/build/3rdparty && /usr/local/Cellar/cmake/3.8.1/bin/cmake -E touch /Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-stamp/libev-4.22-done [ 6%] Built target libev-4.22 make: *** [all] Error 2 {code} And there seems to be an impassable error further along: {code} [ 27%] Completed 'glog-0.3.3' cd /Users/myusername/Code/src/mesos/build/3rdparty && /usr/local/Cellar/cmake/3.8.1/bin/cmake -E make_directory /Users/myusername/Code/src/mesos/build/3rdparty/CMakeFiles cd /Users/myusername/Code/src/mesos/build/3rdparty && /usr/local/Cellar/cmake/3.8.1/bin/cmake -E touch /Users/myusername/Code/src/mesos/build/3rdparty/CMakeFiles/glog-0.3.3-complete cd /Users/myusername/Code/src/mesos/build/3rdparty && /usr/local/Cellar/cmake/3.8.1/bin/cmake -E touch /Users/myusername/Code/src/mesos/build/3rdparty/glog-0.3.3/src/glog-0.3.3-stamp/glog-0.3.3-done gmake[2]: Leaving directory '/Users/myusername/Code/src/mesos/build' [ 27%] Built target glog-0.3.3 gmake[1]: Leaving directory '/Users/myusername/Code/src/mesos/build' gmake: *** [Makefile:120: all] Error 2 {code} was: There are some strange transient
[jira] [Created] (MESOS-7559) CMake builds using parallel execution fail on OS X
Aaron Wood created MESOS-7559: - Summary: CMake builds using parallel execution fail on OS X Key: MESOS-7559 URL: https://issues.apache.org/jira/browse/MESOS-7559 Project: Mesos Issue Type: Bug Components: build, cmake Reporter: Aaron Wood Assignee: Andrew Schwartzmeyer Priority: Minor There are some strange transient errors that pop up: {code} Scanning dependencies of target boost-1.53.0 /usr/local/Cellar/cmake/3.8.1/bin/cmake -E make_directory /Users/myusername/Code/src/mesos/src /Applications/Xcode.app/Contents/Developer/usr/bin/make -f CMakeFiles/make_bin_include_dir.dir/build.make CMakeFiles/make_bin_include_dir.dir/build make[1]: /Applications/Xcode.app/Contents/Developer/usr/bin/make: Permission denied make[1]: *** [3rdparty/CMakeFiles/protobuf-2.6.1.dir/all] Error 1 make[1]: *** Waiting for unfinished jobs /Applications/Xcode.app/Contents/Developer/usr/bin/make -f 3rdparty/CMakeFiles/boost-1.53.0.dir/build.make 3rdparty/CMakeFiles/boost-1.53.0.dir/build make[1]: /Applications/Xcode.app/Contents/Developer/usr/bin/make: Permission denied make[1]: *** [CMakeFiles/make_bin_include_dir.dir/all] Error 1 make[1]: *** [3rdparty/CMakeFiles/boost-1.53.0.dir/all] Error 1 [ 0%] Built target make_bin_src_dir make: *** [all] Error 2 {code} {code} /usr/include/assert.h:93:25: note: expanded from macro 'assert' (__builtin_expect(!(e), 0) ? __assert_rtn(__func__, __FILE__, __LINE__, #e) : (void)0) ^ 29 warnings generated. libtool: compile: gcc -DHAVE_CONFIG_H -I. 
-I/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22 -g -O3 -MT ev.lo -MD -MP -MF .deps/ev.Tpo -c /Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22/ev.c -o ev.o >/dev/null 2>&1 mv -f .deps/ev.Tpo .deps/ev.Plo /bin/sh ./libtool --tag=CC --mode=link gcc -g -O3 -version-info 4:0:0 -o libev.la -rpath /Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-lib/lib ev.lo event.lo libtool: link: gcc -dynamiclib -Wl,-undefined -Wl,dynamic_lookup -o .libs/libev.4.dylib .libs/ev.o .libs/event.o-O3 -install_name /Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-lib/lib/libev.4.dylib -compatibility_version 5 -current_version 5.0 -Wl,-single_module libtool: link: (cd ".libs" && rm -f "libev.dylib" && ln -s "libev.4.dylib" "libev.dylib") libtool: link: ar cru .libs/libev.a ev.o event.o libtool: link: ranlib .libs/libev.a libtool: link: ( cd ".libs" && rm -f "libev.la" && ln -s "../libev.la" "libev.la" ) cd /Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-build && /usr/local/Cellar/cmake/3.8.1/bin/cmake -E touch /Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-stamp/libev-4.22-build [ 4%] Performing install step for 'libev-4.22' cd /Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-build && mkdir -p /Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-lib/lib && cp -r /Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-build/.libs/. 
/Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-lib/lib cd /Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-build && /usr/local/Cellar/cmake/3.8.1/bin/cmake -E touch /Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-stamp/libev-4.22-install [ 6%] Completed 'libev-4.22' cd /Users/myusername/Code/src/mesos/build/3rdparty && /usr/local/Cellar/cmake/3.8.1/bin/cmake -E make_directory /Users/myusername/Code/src/mesos/build/3rdparty/CMakeFiles cd /Users/myusername/Code/src/mesos/build/3rdparty && /usr/local/Cellar/cmake/3.8.1/bin/cmake -E touch /Users/myusername/Code/src/mesos/build/3rdparty/CMakeFiles/libev-4.22-complete cd /Users/myusername/Code/src/mesos/build/3rdparty && /usr/local/Cellar/cmake/3.8.1/bin/cmake -E touch /Users/myusername/Code/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-stamp/libev-4.22-done [ 6%] Built target libev-4.22 make: *** [all] Error 2 {code} And there seems to be an impassable error further along: {code} [ 27%] Completed 'glog-0.3.3' cd /Users/myusername/Code/src/mesos/build/3rdparty && /usr/local/Cellar/cmake/3.8.1/bin/cmake -E make_directory /Users/myusername/Code/src/mesos/build/3rdparty/CMakeFiles cd /Users/myusername/Code/src/mesos/build/3rdparty && /usr/local/Cellar/cmake/3.8.1/bin/cmake -E touch /Users/myusername/Code/src/mesos/build/3rdparty/CMakeFiles/glog-0.3.3-complete cd /Users/myusername/Code/src/mesos/build/3rdparty && /usr/local/Cellar/cmake/3.8.1/bin/cmake -E touch /Users/myusername/Code/src/mesos/build/3rdparty/glog-0.3.3/src/glog-0.3.3-stamp/glog-0.3.3-done gmake[2]: Leaving directory '/Users/myusername/Code/src/mesos/build' [ 27%] Built target glog-0.3.3 gmake[1]: Leaving
[jira] [Updated] (MESOS-7515) MasterAllocatorTest/0.ResourcesUnused is flaky
[ https://issues.apache.org/jira/browse/MESOS-7515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-7515: --- Shepherd: Anand Mazumdar > MasterAllocatorTest/0.ResourcesUnused is flaky > -- > > Key: MESOS-7515 > URL: https://issues.apache.org/jira/browse/MESOS-7515 > Project: Mesos > Issue Type: Bug >Reporter: Neil Conway >Assignee: Neil Conway > Labels: mesosphere > > {noformat} > [ RUN ] MasterAllocatorTest/0.ResourcesUnused > I0516 11:23:52.681485 27347 cluster.cpp:162] Creating default 'local' > authorizer > I0516 11:23:52.689667 27389 master.cpp:436] Master > 0596a957-df3e-4b44-94d6-d99478d0bb6e (core-dev) started on 10.0.49.2:42110 > I0516 11:23:52.689745 27389 master.cpp:438] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="true" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/5Pnjkv/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --http_authenticators="basic" --http_framework_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" > --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" > --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" > --recovery_agent_removal_limit="100%" --registry="in_memory" > --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" > --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" --registry_strict="false" > --root_submissions="true" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" 
--work_dir="/tmp/5Pnjkv/master" > --zk_session_timeout="10secs" > I0516 11:23:52.690110 27389 master.cpp:488] Master only allowing > authenticated frameworks to register > I0516 11:23:52.690142 27389 master.cpp:502] Master only allowing > authenticated agents to register > I0516 11:23:52.690166 27389 master.cpp:515] Master only allowing > authenticated HTTP frameworks to register > I0516 11:23:52.690218 27389 credentials.hpp:37] Loading credentials for > authentication from '/tmp/5Pnjkv/credentials' > I0516 11:23:52.690475 27389 master.cpp:560] Using default 'crammd5' > authenticator > I0516 11:23:52.690603 27389 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I0516 11:23:52.690723 27389 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I0516 11:23:52.690870 27389 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I0516 11:23:52.691264 27389 master.cpp:640] Authorization enabled > I0516 11:23:52.694108 27394 master.cpp:2161] Elected as the leading master! 
> I0516 11:23:52.694157 27394 master.cpp:1700] Recovering from registrar > I0516 11:23:52.695142 27362 registrar.cpp:389] Successfully fetched the > registry (0B) in 756992ns > I0516 11:23:52.695263 27362 registrar.cpp:493] Applied 1 operations in > 14433ns; attempting to update the registry > I0516 11:23:52.695825 27362 registrar.cpp:550] Successfully updated the > registry in 457984ns > I0516 11:23:52.695955 27362 registrar.cpp:422] Successfully recovered > registrar > I0516 11:23:52.697041 27381 master.cpp:1799] Recovered 0 agents from the > registry (119B); allowing 10mins for agents to re-register > I0516 11:23:52.712441 27347 cluster.cpp:448] Creating default 'local' > authorizer > I0516 11:23:52.713631 27375 slave.cpp:225] Mesos agent started on > (79)@10.0.49.2:42110 > I0516 11:23:52.713680 27375 slave.cpp:226] Flags at startup: --acls="" > --appc_simple_discovery_uri_prefix="http://; > --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticatee="crammd5" > --authentication_backoff_factor="1secs" --authorizer="local" > --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" > --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" > --cgroups_root="mesos" --container_disk_watch_interval="15secs" > --containerizers="mesos" > --credential="/tmp/MasterAllocatorTest_0_ResourcesUnused_KNgb71/credential" > --default_role="*" --disk_watch_interval="1mins" --docker="docker" > --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io; > --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" > --docker_stop_timeout="0ns"
[jira] [Created] (MESOS-7558) Add resource provider validation
Jan Schlicht created MESOS-7558:
---

Summary: Add resource provider validation
Key: MESOS-7558
URL: https://issues.apache.org/jira/browse/MESOS-7558
Project: Mesos
Issue Type: Task
Components: master
Reporter: Jan Schlicht

Similar to how it is done during agent registration and re-registration, the information provided by a resource provider needs to be validated during certain operations (e.g. re-registration, while applying offer operations, ...). Some of these validations cover only the provided information (e.g. are the resources in {{ResourceProviderInfo}} only of type {{disk}}?); others take the current cluster state into account (e.g. do the resources that a task wants to use exist on the resource provider?).

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
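The two kinds of validation described in this ticket could be sketched roughly as follows. This is only an illustration in Python, not Mesos code: `Resource`, `ProviderInfo`, `validate_info` and `validate_request` are hypothetical names, and the real checks would operate on the {{ResourceProviderInfo}} protobuf inside the master.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Resource:
    # Hypothetical simplification of a Mesos resource: a name and a scalar.
    name: str
    amount: float


@dataclass
class ProviderInfo:
    # Hypothetical stand-in for ResourceProviderInfo.
    provider_id: str
    resources: List[Resource] = field(default_factory=list)


def _totals(resources: List[Resource]) -> Dict[str, float]:
    """Sum the amounts per resource name."""
    totals: Dict[str, float] = {}
    for r in resources:
        totals[r.name] = totals.get(r.name, 0.0) + r.amount
    return totals


def validate_info(info: ProviderInfo) -> Optional[str]:
    """Stateless check: looks only at the provided information."""
    for r in info.resources:
        if r.name != "disk":
            return f"resource '{r.name}' is not of type 'disk'"
    return None


def validate_request(info: ProviderInfo,
                     requested: List[Resource]) -> Optional[str]:
    """Stateful check: compares a request with what the provider has."""
    available = _totals(info.resources)
    for name, amount in _totals(requested).items():
        if available.get(name, 0.0) < amount:
            return f"provider '{info.provider_id}' lacks {amount} of '{name}'"
    return None
```

Both functions return `None` on success and an error string otherwise, so a caller can run the stateless check first and skip the stateful one if it already failed.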
[jira] [Created] (MESOS-7557) Test that resource providers can re-register after a master failover
Jan Schlicht created MESOS-7557:
---

Summary: Test that resource providers can re-register after a master failover
Key: MESOS-7557
URL: https://issues.apache.org/jira/browse/MESOS-7557
Project: Mesos
Issue Type: Task
Reporter: Jan Schlicht

Restarting a master in a test environment should trigger a resource provider re-registration.
[jira] [Created] (MESOS-7556) Wait for resource provider re-registrations after a master failover
Jan Schlicht created MESOS-7556:
---

Summary: Wait for resource provider re-registrations after a master failover
Key: MESOS-7556
URL: https://issues.apache.org/jira/browse/MESOS-7556
Project: Mesos
Issue Type: Task
Components: master
Reporter: Jan Schlicht

After a failover, recover all resource provider IDs from the registrar and set up timeouts for the resource providers to re-register.
[jira] [Created] (MESOS-7555) Add resource provider IDs to the registry
Jan Schlicht created MESOS-7555:
---

Summary: Add resource provider IDs to the registry
Key: MESOS-7555
URL: https://issues.apache.org/jira/browse/MESOS-7555
Project: Mesos
Issue Type: Task
Components: master
Reporter: Jan Schlicht

To support resource provider re-registration following a master failover, the IDs of registered resource providers need to be kept in the registry. An operation to commit those IDs using the registrar needs to be added as well.
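As a rough illustration of what "an operation to commit those IDs using the registrar" might look like, here is a small Python sketch. It is not the Mesos registrar: `Registry`, `AdmitResourceProvider` and `Registrar` are hypothetical names, storage is an in-memory object, and the only assumption carried over from the ticket is that IDs are written through an operation and can be read back after a failover.

```python
import copy
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Registry:
    # Hypothetical: only the proposed set of provider IDs is modeled here.
    resource_provider_ids: Set[str] = field(default_factory=set)


class AdmitResourceProvider:
    """Hypothetical operation that records one resource provider ID."""

    def __init__(self, provider_id: str) -> None:
        self.provider_id = provider_id

    def perform(self, registry: Registry) -> bool:
        """Mutate the registry; return True only if something changed."""
        if self.provider_id in registry.resource_provider_ids:
            return False
        registry.resource_provider_ids.add(self.provider_id)
        return True


class Registrar:
    """In-memory stand-in that applies operations to a stored registry."""

    def __init__(self) -> None:
        self._stored = Registry()

    def apply(self, operation: AdmitResourceProvider) -> bool:
        # Work on a copy so a no-op operation never touches stored state.
        working = copy.deepcopy(self._stored)
        if operation.perform(working):
            self._stored = working
            return True
        return False

    def recover(self) -> Set[str]:
        """Return the IDs a new master would see after a failover."""
        return set(self._stored.resource_provider_ids)
```

A master recovering after a failover would call `recover()` and start waiting for each returned ID to re-register.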
[jira] [Created] (MESOS-7554) Add a re-registration timeout for resource providers
Jan Schlicht created MESOS-7554:
---

Summary: Add a re-registration timeout for resource providers
Key: MESOS-7554
URL: https://issues.apache.org/jira/browse/MESOS-7554
Project: Mesos
Issue Type: Task
Components: master
Reporter: Jan Schlicht

This re-registration timeout will be started when a resource provider seems to have disconnected, similar to how it is done for agents. While the master waits for the resource provider to reconnect, the provider will be deactivated. On re-registration, the timeout will be canceled and the resource provider activated again. If the timeout expires, the provider's internal state will be changed to {{unreachable}} (as it is for agents in that situation) and the provider will be considered gone.
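The lifecycle described above can be read as a small state machine. The Python sketch below is one possible reading, not the planned implementation: `State`, `ProviderTracker` and its methods are hypothetical, time is passed in explicitly instead of using real timers, and ignoring a re-registration that arrives after the timeout is an assumption, since the ticket does not say what should happen then.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    REGISTERED = "registered"
    DEACTIVATED = "deactivated"   # disconnected, timeout running
    UNREACHABLE = "unreachable"   # timeout expired, considered gone


class ProviderTracker:
    """Tracks one resource provider; `now` is supplied by the caller."""

    def __init__(self, timeout_secs: float) -> None:
        self.timeout_secs = timeout_secs
        self.state = State.REGISTERED
        self.deadline: Optional[float] = None

    def disconnected(self, now: float) -> None:
        # Start the re-registration timeout and deactivate the provider.
        if self.state is State.REGISTERED:
            self.state = State.DEACTIVATED
            self.deadline = now + self.timeout_secs

    def reregistered(self) -> bool:
        # Cancel the timeout and activate the provider again.
        # Assumption: an unreachable provider is not revived here.
        if self.state is State.DEACTIVATED:
            self.state = State.REGISTERED
            self.deadline = None
            return True
        return False

    def tick(self, now: float) -> None:
        # Expire the timeout if the deadline has passed.
        if (self.state is State.DEACTIVATED
                and self.deadline is not None
                and now >= self.deadline):
            self.state = State.UNREACHABLE
            self.deadline = None
```

Passing the clock in as an argument keeps the sketch deterministic; a real master would use its own timer facilities instead.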
[jira] [Created] (MESOS-7553) Distinguish between different resource provider states in the master
Jan Schlicht created MESOS-7553:
---

Summary: Distinguish between different resource provider states in the master
Key: MESOS-7553
URL: https://issues.apache.org/jira/browse/MESOS-7553
Project: Mesos
Issue Type: Task
Components: master
Reporter: Jan Schlicht

In preparation for supporting timeouts for resource provider re-registrations, the master needs to be able to distinguish between registered, unreachable and gone resource providers, so that the resources of a provider that is not registered are not offered. For that, internal resource provider states have to be added to the master, as is already implemented for agents (i.e. the {{completed}}, {{registered}}, {{removed}} maps in {{master.cpp}}).
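One way to picture the proposed bookkeeping is a set of per-state maps, with offers drawn only from the registered one. The Python sketch below is an assumption-laden illustration: `ProviderBook` and its method names are hypothetical, resources are plain lists of strings, and refusing to re-register a gone provider is a guess rather than something the ticket states.

```python
from typing import Dict, List


class ProviderBook:
    """Keeps resource providers in one map per state."""

    def __init__(self) -> None:
        self.registered: Dict[str, List[str]] = {}
        self.unreachable: Dict[str, List[str]] = {}
        self.gone: Dict[str, List[str]] = {}

    def register(self, provider_id: str, resources: List[str]) -> None:
        # Assumption: a provider that is gone may not come back.
        if provider_id in self.gone:
            raise ValueError(f"provider '{provider_id}' is gone")
        self.unreachable.pop(provider_id, None)
        self.registered[provider_id] = resources

    def mark_unreachable(self, provider_id: str) -> None:
        # Raises KeyError if the provider is not currently registered.
        self.unreachable[provider_id] = self.registered.pop(provider_id)

    def mark_gone(self, provider_id: str) -> None:
        resources = self.registered.pop(provider_id, None)
        if resources is None:
            resources = self.unreachable.pop(provider_id, None)
        if resources is None:
            raise KeyError(provider_id)
        self.gone[provider_id] = resources

    def offerable(self) -> Dict[str, List[str]]:
        """Only resources of registered providers may be offered."""
        return dict(self.registered)
```

Because `offerable()` reads only the `registered` map, moving a provider to another map is enough to stop its resources from being offered.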