[jira] [Commented] (MESOS-6184) Health checks should use a general mechanism to enter namespaces of the task.
[ https://issues.apache.org/jira/browse/MESOS-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944546#comment-15944546 ]

Deshi Xiao commented on MESOS-6184:
---
I have rebased the patch onto the 1.2.0 branch and am testing it. The health check always aborts and leaves a core dump file.

```
I0328 11:48:12.92218148 exec.cpp:162] Version: 1.2.0
I0328 11:48:12.92925254 exec.cpp:237] Executor registered on agent a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4
I0328 11:48:12.93164054 docker.cpp:850] Running docker -H unix:///var/run/docker.sock run --cpu-shares 10 --memory 33554432 --env-file /tmp/gvqGyb -v /data/mesos/slaves/a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4/frameworks/d7ef5d2b-f924-42d9-a274-c020afba6bce-/executors/0-hc-xychu-datamanmesos-2f3b47f9ffc048539c7b22baa6c32d8f/runs/458189b8-2ff4-4337-ad3a-67321e96f5cb:/mnt/mesos/sandbox --net bridge --label=USER_NAME=xychu --label=GROUP_NAME=groupautotest --label=APP_ID=hc --label=VCLUSTER=clusterautotest --label=USER=xychu --label=CLUSTER=datamanmesos --label=SLOT=0 --label=APP=hc -p 31000:80/tcp --name mesos-a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4.458189b8-2ff4-4337-ad3a-67321e96f5cb nginx
I0328 11:48:16.14571453 health_checker.cpp:196] Ignoring failure as health check still in grace period
W0328 11:48:26.28995849 health_checker.cpp:202] Health check failed 1 times consecutively: HTTP health check failed: curl returned terminated with signal Aborted (core dumped): ABORT: (../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 18596: Pid 18596 does not exist
*** Aborted at 1490672906 (unix time) try "date -d @1490672906" if you are using GNU date ***
PC: @ 0x7f26bfb485f7 __GI_raise
*** SIGABRT (@0x4a) received by PID 74 (TID 0x7f26ba152700) from PID 74; stack trace: ***
    @ 0x7f26c0703100 (unknown)
    @ 0x7f26bfb485f7 __GI_raise
    @ 0x7f26bfb49ce8 __GI_abort
    @ 0x7f26c315778e _Abort()
    @ 0x7f26c31577cc _Abort()
    @ 0x7f26c237a4b6 process::internal::childMain()
    @ 0x7f26c2379e9c std::_Function_handler<>::_M_invoke()
    @ 0x7f26c2379e53 process::internal::defaultClone()
    @ 0x7f26c237b951 process::internal::cloneChild()
    @ 0x7f26c237954f process::subprocess()
    @ 0x7f26c15a9fb1 mesos::internal::checks::HealthCheckerProcess::httpHealthCheck()
    @ 0x7f26c15ababd mesos::internal::checks::HealthCheckerProcess::performSingleCheck()
    @ 0x7f26c2331389 process::ProcessManager::resume()
    @ 0x7f26c233a3f7 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
    @ 0x7f26c04a1220 (unknown)
    @ 0x7f26c06fbdc5 start_thread
    @ 0x7f26bfc0928d __clone
W0328 11:48:36.34005555 health_checker.cpp:202] Health check failed 2 times consecutively: HTTP health check failed: curl returned terminated with signal Aborted (core dumped): ABORT: (../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 18596: Pid 18596 does not exist
*** Aborted at 1490672916 (unix time) try "date -d @1490672916" if you are using GNU date ***
PC: @ 0x7f26bfb485f7 __GI_raise
*** SIGABRT (@0x4b) received by PID 75 (TID 0x7f26b9951700) from PID 75; stack trace: ***
    @ 0x7f26c0703100 (unknown)
    @ 0x7f26bfb485f7 __GI_raise
    @ 0x7f26bfb49ce8 __GI_abort
    @ 0x7f26c315778e _Abort()
    @ 0x7f26c31577cc _Abort()
    @ 0x7f26c237a4b6 process::internal::childMain()
    @ 0x7f26c2379e9c std::_Function_handler<>::_M_invoke()
    @ 0x7f26c2379e53 process::internal::defaultClone()
    @ 0x7f26c237b951 process::internal::cloneChild()
    @ 0x7f26c237954f process::subprocess()
    @ 0x7f26c15a9fb1 mesos::internal::checks::HealthCheckerProcess::httpHealthCheck()
    @ 0x7f26c15ababd mesos::internal::checks::HealthCheckerProcess::performSingleCheck()
    @ 0x7f26c2331389 process::ProcessManager::resume()
    @ 0x7f26c233a3f7 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
    @ 0x7f26c04a1220 (unknown)
    @ 0x7f26c06fbdc5 start_thread
    @ 0x7f26bfc0928d __clone
W0328 11:48:46.38653349 health_checker.cpp:202] Health check failed 3 times consecutively: HTTP health check failed: curl returned terminated with signal Aborted (core dumped): ABORT: (../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 18596: Pid 18596 does not exist
*** Aborted at 1490672926 (unix time) try "date -d @1490672926" if you are using GNU date ***
PC: @ 0x7f26bfb485f7 __GI_raise
*** SIGABRT (@0x4c) received by PID 76 (TID 0x7f26ba152700) from PID 76; stack trace: ***
    @
```
[jira] [Issue Comment Deleted] (MESOS-6184) Health checks should use a general mechanism to enter namespaces of the task.
[ https://issues.apache.org/jira/browse/MESOS-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deshi Xiao updated MESOS-6184:
--
    Comment: was deleted

(was: good for me)

> Health checks should use a general mechanism to enter namespaces of the task.
> -
>
>         Key: MESOS-6184
>         URL: https://issues.apache.org/jira/browse/MESOS-6184
>     Project: Mesos
>  Issue Type: Improvement
>    Reporter: haosdent
>    Assignee: haosdent
>    Priority: Critical
>      Labels: health-check, mesosphere
>
> To perform health checks for tasks, we need to enter the corresponding
> namespaces of the container. For now, the health check uses a custom clone to
> implement this:
> {code}
>   return process::defaultClone([=]() -> int {
>     if (taskPid.isSome()) {
>       foreach (const string& ns, namespaces) {
>         Try<Nothing> setns = ns::setns(taskPid.get(), ns);
>         if (setns.isError()) {
>           ...
>         }
>       }
>     }
>     return func();
>   });
> {code}
> After the childHooks patches are merged, we could change the health check to
> use childHooks to call {{setns}} and make {{process::defaultClone}} private
> again.

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
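For readers unfamiliar with what "entering the namespaces of the task" involves at the system-call level, here is a minimal Linux-only sketch. It is not the Mesos implementation: `namespacePath` and `enterNamespace` are hypothetical helper names, and the only real APIs used are `open(2)`, `setns(2)` and `close(2)`. Entering another process's namespace normally requires elevated privileges, and the call fails early when the target pid is not visible under `/proc`.

```cpp
// Linux-only sketch. `namespacePath` and `enterNamespace` are hypothetical
// helpers for illustration, not part of Mesos or libprocess.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // Exposes setns(2) in <sched.h>.
#endif

#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

#include <cerrno>
#include <cstring>
#include <string>

// Builds the procfs path of one namespace of a process,
// e.g. "/proc/18596/ns/net".
std::string namespacePath(pid_t pid, const std::string& ns)
{
  return "/proc/" + std::to_string(pid) + "/ns/" + ns;
}

// Tries to move the calling thread into namespace `ns` of process `pid`.
// Returns an empty string on success, otherwise an error message.
std::string enterNamespace(pid_t pid, const std::string& ns)
{
  const std::string path = namespacePath(pid, ns);

  const int fd = ::open(path.c_str(), O_RDONLY | O_CLOEXEC);
  if (fd < 0) {
    // Typical when the pid does not exist in the caller's pid namespace.
    return "Failed to open '" + path + "': " + std::strerror(errno);
  }

  // An nstype of 0 accepts whatever namespace type `fd` refers to.
  const int result = ::setns(fd, 0);
  const int error = errno;
  ::close(fd);

  if (result != 0) {
    return "Failed to enter '" + path + "': " + std::strerror(error);
  }

  return "";
}
```

A caller would loop over the wanted namespace names (for example "net" and "mnt") in the child process before running the check command, and treat any non-empty return value as a failed check rather than a crash.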
[jira] [Updated] (MESOS-7319) Rename the DRAIN maintenance mode to SCHEDULED to avoid confusion.
[ https://issues.apache.org/jira/browse/MESOS-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler updated MESOS-7319: --- Description: The current naming of the DRAIN mode in maintenance has been confusing to users as there tends to be an expectation of mesos doing something (e.g. not sending offers, or killing tasks) to achieve the drain, whereas in reality mesos does nothing and expects the schedulers to act (this only applies to maintenance-aware schedulers). Rather, what's actually happening in the DRAIN mode is that the maintenance is scheduled; that's it. So a name like SCHEDULED would be less confusing for users: http://mesos.apache.org/documentation/latest/maintenance/ Component/s: documentation > Rename the DRAIN maintenance mode to SCHEDULED to avoid confusion. > -- > > Key: MESOS-7319 > URL: https://issues.apache.org/jira/browse/MESOS-7319 > Project: Mesos > Issue Type: Improvement > Components: documentation, HTTP API, master >Reporter: Benjamin Mahler > > The current naming of the DRAIN mode in maintenance has been confusing to > users as there tends to be an expectation of mesos doing something (e.g. not > sending offers, or killing tasks) to achieve the drain, whereas in reality > mesos does nothing and expects the schedulers to act (this only applies to > maintenance-aware schedulers). > Rather, what's actually happening in the DRAIN mode is that the > maintenance is scheduled; that's it. So a name like SCHEDULED would be less > confusing for users: http://mesos.apache.org/documentation/latest/maintenance/ -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (MESOS-7319) Rename the DRAIN maintenance mode to SCHEDULED to avoid confusion.
Benjamin Mahler created MESOS-7319: -- Summary: Rename the DRAIN maintenance mode to SCHEDULED to avoid confusion. Key: MESOS-7319 URL: https://issues.apache.org/jira/browse/MESOS-7319 Project: Mesos Issue Type: Improvement Components: HTTP API, master Reporter: Benjamin Mahler -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-7201) Improvements to maintenance primitives
[ https://issues.apache.org/jira/browse/MESOS-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944343#comment-15944343 ] Benjamin Mahler commented on MESOS-7201: [~kaysoky] I'm inclined to rename the {{DRAIN}} mode to {{SCHEDULED}} as there is not necessarily "draining" occurring in the {{DRAIN}} mode, so this tends to confuse users as they have an expectation of mesos doing something (e.g. not sending offers, or killing tasks) to achieve the drain. Thoughts? > Improvements to maintenance primitives > -- > > Key: MESOS-7201 > URL: https://issues.apache.org/jira/browse/MESOS-7201 > Project: Mesos > Issue Type: Epic >Reporter: Joseph Wu > Labels: mesosphere > > This is a follow up epic to MESOS-1474 to capture further improvements for > maintenance primitives. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-7317) Add master endpoint to deactivate / activate agent
[ https://issues.apache.org/jira/browse/MESOS-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler updated MESOS-7317: --- Target Version/s: 1.3.0 > Add master endpoint to deactivate / activate agent > -- > > Key: MESOS-7317 > URL: https://issues.apache.org/jira/browse/MESOS-7317 > Project: Mesos > Issue Type: Improvement > Components: agent, master >Reporter: Neil Conway > Labels: mesosphere > > This would allow the operator to deactivate and then subsequently activate an > agent. The allocator does not make offers for deactivated agents; this > functionality would be useful to help operators "manually (incrementally) > drain" the tasks running on an agent, e.g., before taking the agent down. > At present, if the operator causes a framework to kill a task running on the > agent, the framework will often receive an offer for the unused resources on > the agent, which will often result in respawning the killed task on the same > agent. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-7235) Improve Storage Support using Resources Provider and CSI
[ https://issues.apache.org/jira/browse/MESOS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-7235: -- Summary: Improve Storage Support using Resources Provider and CSI (was: Improve Storage Support using Resources Provider) > Improve Storage Support using Resources Provider and CSI > > > Key: MESOS-7235 > URL: https://issues.apache.org/jira/browse/MESOS-7235 > Project: Mesos > Issue Type: Epic >Reporter: Jie Yu >Assignee: Jie Yu > > Currently, Mesos supports both [local persistent > volumes|https://github.com/apache/mesos/blob/master/docs/persistent-volume.md] > as well as [external persistent > volumes|https://github.com/apache/mesos/blob/master/docs/docker-volume.md]. > However, neither of them is ideal. > Local persistent volumes do not support offering physical or logical block > devices directly. Also, frameworks cannot choose the filesystems > for their local persistent volumes. There are also some [usability > problems|https://issues.apache.org/jira/browse/MESOS-4209] with the local > persistent volumes. Mesos does support [multiple local > disks|https://github.com/apache/mesos/blob/master/docs/multiple-disk.md]. > However, it’s a big burden for operators to configure each agent properly to > be able to leverage this feature. > External persistent volume support in Mesos currently bypasses the resource > management part. In other words, using an external persistent volume does not > go through the usual offer cycle. Mesos doesn’t track resources associated > with the external volumes. This makes quota control, reservation, and fair > sharing almost impossible to implement. Also, the current interface Mesos > uses to interact with volume providers is the [Docker Volume Driver interface > (DVDI)|https://docs.docker.com/engine/extend/plugins_volume/], which is very > specific to operations on a particular agent. > The main problem I see currently is that we don’t have a coherent story for > storage. 
Yes, we have some primitives in Mesos that can support some stateful > services, but this is far from ideal. Some of them are just stop-gap > solutions (e.g., the external volume support). This epic tries to tell a > coherent story for supporting storage in Mesos. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-7317) Add master endpoint to deactivate / activate agent
[ https://issues.apache.org/jira/browse/MESOS-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944208#comment-15944208 ] Benjamin Mahler commented on MESOS-7317: Linking in the "maintenance improvements" epic. > Add master endpoint to deactivate / activate agent > -- > > Key: MESOS-7317 > URL: https://issues.apache.org/jira/browse/MESOS-7317 > Project: Mesos > Issue Type: Improvement > Components: agent, master >Reporter: Neil Conway > Labels: mesosphere > > This would allow the operator to deactivate and then subsequently activate an > agent. The allocator does not make offers for deactivated agents; this > functionality would be useful to help operators "manually (incrementally) > drain" the tasks running on an agent, e.g., before taking the agent down. > At present, if the operator causes a framework to kill a task running on the > agent, the framework will often receive an offer for the unused resources on > the agent, which will often result in respawning the killed task on the same > agent. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (MESOS-7318) Libprocess delays and timers should be undisturbed by system clock jumps.
Benjamin Mahler created MESOS-7318: -- Summary: Libprocess delays and timers should be undisturbed by system clock jumps. Key: MESOS-7318 URL: https://issues.apache.org/jira/browse/MESOS-7318 Project: Mesos Issue Type: Bug Components: libprocess Reporter: Benjamin Mahler Currently, libprocess timers / delays / timeouts are affected by system clock jumps because they do not use a monotonic clock as a reference point. Since these require relative timing, we can use a monotonic clock as the reference point. We also need the approach to be affected by clock manipulation at the libprocess level (i.e. {{Clock::advance(...)}} and {{Clock::update(...)}}) for testing purposes. The current recommendation is for users to use NTP with skewing applied to adjust for leaps, e.g.: https://googleblog.blogspot.com/2011/09/time-technology-and-leaping-seconds.html I thought we already had a ticket for this but can't seem to find it. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
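To make the idea in MESOS-7318 concrete, here is a minimal sketch of a timeout that is measured against a monotonic clock yet can still be advanced manually for tests. It is not libprocess code: `ManualClock` and `Timeout` are hypothetical names, and the manual offset only loosely mirrors what {{Clock::advance(...)}} is used for. The only real API assumed is `std::chrono::steady_clock`, which is not affected by wall-clock adjustments.

```cpp
#include <chrono>

// Hypothetical test-friendly clock: monotonic time plus an offset that tests
// can move forward by hand.
class ManualClock
{
public:
  using Duration = std::chrono::steady_clock::duration;
  using TimePoint = std::chrono::steady_clock::time_point;

  // steady_clock never jumps when the system (wall) clock is changed.
  TimePoint now() const
  {
    return std::chrono::steady_clock::now() + offset_;
  }

  // Moves this clock forward without sleeping.
  void advance(Duration duration) { offset_ += duration; }

private:
  Duration offset_ = Duration::zero();
};

// A relative timeout expressed as a deadline on the monotonic time line.
class Timeout
{
public:
  Timeout(const ManualClock& clock, ManualClock::Duration delay)
    : clock_(clock), deadline_(clock.now() + delay) {}

  bool expired() const { return clock_.now() >= deadline_; }

private:
  const ManualClock& clock_;
  ManualClock::TimePoint deadline_;
};
```

Because the deadline is stored as a point on the monotonic time line, a wall-clock jump cannot fire the timeout early or delay it, while a test can still force expiry by calling `advance` on the shared clock.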
[jira] [Updated] (MESOS-7317) Add master endpoint to deactivate / activate agent
[ https://issues.apache.org/jira/browse/MESOS-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-7317: --- Description: This would allow the operator to deactivate and then subsequently activate an agent. The allocator does not make offers for deactivated agents; this functionality would be useful to help operators "manually (incrementally) drain" the tasks running on an agent, e.g., before taking the agent down. At present, if the operator causes a framework to kill a task running on the agent, the framework will often receive an offer for the unused resources on the agent, which will often result in respawning the killed task on the same agent. was: This would allow the operator to deactivate and then subsequently activate an agent. The allocator does not make offers for deactivated agents; this functionality would be useful to help operators "manually (incrementally) drain" the tasks running on an agent, e.g., before taking the agent down. At present, if the operator causes a framework to kill a task running on the agent, the framework will receive an offer for the unused resources on the agent, which will often result in respawning the killed task on the same agent. > Add master endpoint to deactivate / activate agent > -- > > Key: MESOS-7317 > URL: https://issues.apache.org/jira/browse/MESOS-7317 > Project: Mesos > Issue Type: Improvement > Components: agent, master >Reporter: Neil Conway > Labels: mesosphere > > This would allow the operator to deactivate and then subsequently activate an > agent. The allocator does not make offers for deactivated agents; this > functionality would be useful to help operators "manually (incrementally) > drain" the tasks running on an agent, e.g., before taking the agent down. 
> At present, if the operator causes a framework to kill a task running on the > agent, the framework will often receive an offer for the unused resources on > the agent, which will often result in respawning the killed task on the same > agent. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (MESOS-7317) Add master endpoint to deactivate / activate agent
Neil Conway created MESOS-7317: -- Summary: Add master endpoint to deactivate / activate agent Key: MESOS-7317 URL: https://issues.apache.org/jira/browse/MESOS-7317 Project: Mesos Issue Type: Improvement Components: agent, master Reporter: Neil Conway This would allow the operator to deactivate and then subsequently activate an agent. The allocator does not make offers for deactivated agents; this functionality would be useful to help operators "manually (incrementally) drain" the tasks running on an agent, e.g., before taking the agent down. At present, if the operator causes a framework to kill a task running on the agent, the framework will receive an offer for the unused resources on the agent, which will often result in respawning the killed task on the same agent. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
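The behaviour requested in MESOS-7317 can be pictured with a toy allocator. This is a simplified sketch under stated assumptions, not the Mesos allocator: `ToyAllocator`, `Agent` and `offerable` are hypothetical names, and resources are reduced to a single CPU count.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, heavily simplified agent record.
struct Agent
{
  bool activated;
  double unusedCpus;
};

// Toy allocator that only offers resources of activated agents.
class ToyAllocator
{
public:
  void addAgent(const std::string& id, double unusedCpus)
  {
    agents_[id] = Agent{true, unusedCpus};
  }

  void deactivate(const std::string& id) { setActivated(id, false); }
  void activate(const std::string& id) { setActivated(id, true); }

  // Returns the IDs (in sorted order) of agents whose unused resources
  // would be offered to frameworks.
  std::vector<std::string> offerable() const
  {
    std::vector<std::string> result;
    for (const auto& entry : agents_) {
      if (entry.second.activated && entry.second.unusedCpus > 0.0) {
        result.push_back(entry.first);
      }
    }
    return result;
  }

private:
  void setActivated(const std::string& id, bool activated)
  {
    auto it = agents_.find(id);
    if (it != agents_.end()) {
      it->second.activated = activated;
    }
  }

  std::map<std::string, Agent> agents_;
};
```

With an endpoint that toggles this flag, an operator could deactivate an agent, kill its tasks one at a time, and be sure the freed resources are not offered back in the meantime, which is the respawn-on-the-same-agent problem the ticket describes.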
[jira] [Updated] (MESOS-7169) Documentation still references `ContainerLogger::recover`
[ https://issues.apache.org/jira/browse/MESOS-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Allen updated MESOS-7169: - Component/s: modules > Documentation still references `ContainerLogger::recover` > - > > Key: MESOS-7169 > URL: https://issues.apache.org/jira/browse/MESOS-7169 > Project: Mesos > Issue Type: Bug > Components: documentation, modules >Affects Versions: 1.1.0 >Reporter: Charles Allen > > MESOS-6371 removed {{ContainerLogger::recover}} but > https://github.com/apache/mesos/blob/1.1.0/include/mesos/slave/container_logger.hpp#L143 > still discusses the recovery process as being important. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-7316) Upgrading Mesos to 1.2.0 results in some information missing from the `/flags` endpoint.
[ https://issues.apache.org/jira/browse/MESOS-7316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943999#comment-15943999 ] Anand Mazumdar commented on MESOS-7316: --- cc: [~bbannier] > Upgrading Mesos to 1.2.0 results in some information missing from the > `/flags` endpoint. > > > Key: MESOS-7316 > URL: https://issues.apache.org/jira/browse/MESOS-7316 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Reporter: Anand Mazumdar >Priority: Critical > Labels: mesosphere > > From OSS Mesos Slack: > I recently tried upgrading one of our Mesos clusters from 1.1.0 to 1.2.0. > After doing this, it looks like the {{zk}} field on the {{/master/flags}} > endpoint is no longer present. > This looks related to the recent {{Flags}} refactoring that was done which > resulted in some flags no longer being populated since they were not part of > {{master::Flags}} in {{src/master/flags.hpp}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (MESOS-7316) Upgrading Mesos to 1.2.0 results in some information missing from the `/flags` endpoint.
Anand Mazumdar created MESOS-7316: - Summary: Upgrading Mesos to 1.2.0 results in some information missing from the `/flags` endpoint. Key: MESOS-7316 URL: https://issues.apache.org/jira/browse/MESOS-7316 Project: Mesos Issue Type: Bug Components: HTTP API Reporter: Anand Mazumdar Priority: Critical From OSS Mesos Slack: I recently tried upgrading one of our Mesos clusters from 1.1.0 to 1.2.0. After doing this, it looks like the {{zk}} field on the {{/master/flags}} endpoint is no longer present. This looks related to the recent {{Flags}} refactoring that was done which resulted in some flags no longer being populated since they were not part of {{master::Flags}} in {{src/master/flags.hpp}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-7186) Metrics about used/allocated shared resources are incorrect accounted.
[ https://issues.apache.org/jira/browse/MESOS-7186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anindya Sinha updated MESOS-7186: - Shepherd: Yan Xu > Metrics about used/allocated shared resources are incorrect accounted. > -- > > Key: MESOS-7186 > URL: https://issues.apache.org/jira/browse/MESOS-7186 > Project: Mesos > Issue Type: Bug >Reporter: Yan Xu >Assignee: Anindya Sinha > > Certain gauges like {{master/_used}} are calculated from data > structures like {{hashmapusedResources}} which are > keyed off by the framework. However because frameworks under the same role > can have the same shared persistent volumes, we need to de-duplicate (via > Resource arithmetics) before extracting and summing up the double scalar > values. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
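The over-counting described in MESOS-7186 can be shown with a small model. This is only an illustration: `Disk`, `UsedResources`, `naiveUsedDisk` and `dedupedUsedDisk` are hypothetical names, and a set of persistence IDs stands in for the Resource arithmetic mentioned in the ticket.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a disk resource.
struct Disk
{
  std::string persistenceId;  // Identifies a persistent volume.
  bool shared;                // True if several frameworks may use it.
  double megabytes;
};

// Used resources keyed by framework ID, as in the ticket's description.
using UsedResources = std::map<std::string, std::vector<Disk>>;

// Naive gauge: a shared volume is counted once per framework using it.
double naiveUsedDisk(const UsedResources& used)
{
  double total = 0.0;
  for (const auto& entry : used) {
    for (const Disk& disk : entry.second) {
      total += disk.megabytes;
    }
  }
  return total;
}

// De-duplicated gauge: each shared volume is counted exactly once.
double dedupedUsedDisk(const UsedResources& used)
{
  std::set<std::string> seenShared;
  double total = 0.0;
  for (const auto& entry : used) {
    for (const Disk& disk : entry.second) {
      // insert().second is false when the ID was already recorded.
      if (disk.shared && !seenShared.insert(disk.persistenceId).second) {
        continue;
      }
      total += disk.megabytes;
    }
  }
  return total;
}
```

For two frameworks that both use one shared 100 MB volume, where the first also uses a private 50 MB disk, the naive sum reports 250 MB while the de-duplicated sum reports 150 MB.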
[jira] [Commented] (MESOS-6236) Launch subprocesses associated with specified namespaces.
[ https://issues.apache.org/jira/browse/MESOS-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943782#comment-15943782 ] Deshi Xiao commented on MESOS-6236: --- The patch is outdated. I have updated it, but it still needs testing. Once the testing is done, I will submit the patch. > Launch subprocesses associated with specified namespaces. > - > > Key: MESOS-6236 > URL: https://issues.apache.org/jira/browse/MESOS-6236 > Project: Mesos > Issue Type: Improvement >Reporter: Qian Zhang >Assignee: haosdent > Labels: mesosphere > > Currently there is no standard way in Mesos to launch a child process in a > different namespace (e.g. {{net}}, {{mnt}}). A user may leverage > {{Subprocess}} and provide their own {{clone}} callback, but this approach is > error-prone. > One possible solution is to implement a {{Subprocess}} child hook. In > [MESOS-5070|https://issues.apache.org/jira/browse/MESOS-5070], we > introduced a child hook framework in subprocess and implemented three child > hooks {{CHDIR}}, {{SETSID}} and {{SUPERVISOR}}. We suggest introducing > another child hook {{SETNS}} so that other components (e.g., health check) > can call it to enter the namespaces of a specific process. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
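The child hook idea in MESOS-6236 amounts to running a list of callbacks in the child process after the fork and before the actual work starts. The following POSIX sketch illustrates that pattern only; it is not the libprocess {{Subprocess}} API, and `ChildHook` and `runWithHooks` are hypothetical names. Where this sketch exits with status 127 on a failed hook, the log in MESOS-6184 shows the real code aborting.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <functional>
#include <vector>

// Hypothetical stand-in for a child hook: returns true on success.
using ChildHook = std::function<bool()>;

// Forks, runs every hook in the child in order, then runs `func` and exits
// with its result. Returns the child's exit status, or -1 if forking or
// waiting failed or the child did not exit normally.
int runWithHooks(
    const std::vector<ChildHook>& hooks,
    const std::function<int()>& func)
{
  const pid_t pid = ::fork();
  if (pid < 0) {
    return -1;
  }

  if (pid == 0) {
    // Child: hooks run before the actual work, e.g. chdir, setsid, setns.
    for (const ChildHook& hook : hooks) {
      if (!hook()) {
        ::_exit(127);  // Assumption of this sketch: fail with a fixed code.
      }
    }
    ::_exit(func());
  }

  // Parent: wait for the child and report how it exited.
  int status = 0;
  if (::waitpid(pid, &status, 0) != pid) {
    return -1;
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A hypothetical {{SETNS}} hook would simply be one more entry in `hooks` that enters the target namespaces and returns false if any of them cannot be entered.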
[jira] [Updated] (MESOS-7315) Design doc for resource provider and storage integration.
[ https://issues.apache.org/jira/browse/MESOS-7315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-7315: -- Description: https://docs.google.com/document/d/125YWqg_5BB5OY9a6M7LZcby5RSqBwo2PZzpVLuxYXh4/edit?usp=sharing > Design doc for resource provider and storage integration. > - > > Key: MESOS-7315 > URL: https://issues.apache.org/jira/browse/MESOS-7315 > Project: Mesos > Issue Type: Task >Reporter: Jie Yu >Assignee: Jie Yu > > https://docs.google.com/document/d/125YWqg_5BB5OY9a6M7LZcby5RSqBwo2PZzpVLuxYXh4/edit?usp=sharing -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (MESOS-7235) Improve Storage Support using Resources Provider
[ https://issues.apache.org/jira/browse/MESOS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu reassigned MESOS-7235: - Assignee: Jie Yu > Improve Storage Support using Resources Provider > > > Key: MESOS-7235 > URL: https://issues.apache.org/jira/browse/MESOS-7235 > Project: Mesos > Issue Type: Epic >Reporter: Jie Yu >Assignee: Jie Yu > > Currently, Mesos supports both [local persistent > volumes|https://github.com/apache/mesos/blob/master/docs/persistent-volume.md] > as well as [external persistent > volumes|https://github.com/apache/mesos/blob/master/docs/docker-volume.md]. > However, neither of them is ideal. > Local persistent volumes do not support offering physical or logical block > devices directly. Also, frameworks cannot choose the filesystems > for their local persistent volumes. There are also some [usability > problems|https://issues.apache.org/jira/browse/MESOS-4209] with the local > persistent volumes. Mesos does support [multiple local > disks|https://github.com/apache/mesos/blob/master/docs/multiple-disk.md]. > However, it’s a big burden for operators to configure each agent properly to > be able to leverage this feature. > External persistent volume support in Mesos currently bypasses the resource > management part. In other words, using an external persistent volume does not > go through the usual offer cycle. Mesos doesn’t track resources associated > with the external volumes. This makes quota control, reservation, and fair > sharing almost impossible to implement. Also, the current interface Mesos > uses to interact with volume providers is the [Docker Volume Driver interface > (DVDI)|https://docs.docker.com/engine/extend/plugins_volume/], which is very > specific to operations on a particular agent. > The main problem I see currently is that we don’t have a coherent story for > storage. Yes, we have some primitives in Mesos that can support some stateful > services, but this is far from ideal. 
Some of them are just stop-gap > solutions (e.g., the external volume support). This epic tries to tell a > coherent story for supporting storage in Mesos. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (MESOS-7315) Design doc for resource provider and storage integration.
Jie Yu created MESOS-7315: - Summary: Design doc for resource provider and storage integration. Key: MESOS-7315 URL: https://issues.apache.org/jira/browse/MESOS-7315 Project: Mesos Issue Type: Task Reporter: Jie Yu Assignee: Jie Yu -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-7312) Update Resource proto for storage resource providers.
[ https://issues.apache.org/jira/browse/MESOS-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-7312: -- Summary: Update Resource proto for storage resource providers. (was: Update Resource proto for CSI requirements) > Update Resource proto for storage resource providers. > - > > Key: MESOS-7312 > URL: https://issues.apache.org/jira/browse/MESOS-7312 > Project: Mesos > Issue Type: Bug >Reporter: Benjamin Bannier >Assignee: Benjamin Bannier > > CSI requires a number of changes to the {{Resource}} proto: > * support for {{RAW}} and {{BLOCK}} type {{Resource::DiskInfo::Source}} > * {{ResourceProviderID}} in Resource > * {{Resource::DiskInfo::Source::Path}} should be {{optional}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-7312) Update Resource proto for storage resource providers.
[ https://issues.apache.org/jira/browse/MESOS-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-7312: -- Description: Storage resource provider support requires a number of changes to the {{Resource}} proto: * support for {{RAW}} and {{BLOCK}} type {{Resource::DiskInfo::Source}} * {{ResourceProviderID}} in Resource * {{Resource::DiskInfo::Source::Path}} should be {{optional}}. was: CSI requires a number of changes to the {{Resource}} proto: * support for {{RAW}} and {{BLOCK}} type {{Resource::DiskInfo::Source}} * {{ResourceProviderID}} in Resource * {{Resource::DiskInfo::Source::Path}} should be {{optional}}. > Update Resource proto for storage resource providers. > - > > Key: MESOS-7312 > URL: https://issues.apache.org/jira/browse/MESOS-7312 > Project: Mesos > Issue Type: Bug >Reporter: Benjamin Bannier >Assignee: Benjamin Bannier > > Storage resource provider support requires a number of changes to the > {{Resource}} proto: > * support for {{RAW}} and {{BLOCK}} type {{Resource::DiskInfo::Source}} > * {{ResourceProviderID}} in Resource > * {{Resource::DiskInfo::Source::Path}} should be {{optional}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-7302) Support launching standalone containers.
[ https://issues.apache.org/jira/browse/MESOS-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943574#comment-15943574 ] Jie Yu commented on MESOS-7302: --- [~avinash.mesos] Containerizer has a 'wait' interface to wait for the termination of the container. We don't need default or command executor semantics. > Support launching standalone containers. > > > Key: MESOS-7302 > URL: https://issues.apache.org/jira/browse/MESOS-7302 > Project: Mesos > Issue Type: Epic > Components: containerization >Reporter: Jie Yu > > Containerizer should support launching containers (both top level and nested) > that are not tied to a particular Mesos task or executor. This is for the > case where the agent wants to launch some system containers (e.g., for CSI > plugin) that will be managed by Mesos. > More specifically, the Containerizer interfaces should be refactored so that > they do not depend on TaskInfo or ExecutorInfo. Currently, the `launch` > interface depends on them. Instead, we should consistently use ContainerInfo > and CommandInfo in Containerizer and isolators. > This is also one necessary step towards running MesosContainerizer in > standalone mode. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-7302) Support launching standalone containers.
[ https://issues.apache.org/jira/browse/MESOS-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943468#comment-15943468 ] Avinash Sridharan commented on MESOS-7302: -- [~jieyu] How would the standalone containers report task updates to the agent? They would still need to have the default/command executor semantics to monitor the container, right? Just a thought: in order to launch standalone containers, why can't we make the agent act as a first-class framework that can launch containers directly on the agent? > Support launching standalone containers. > > > Key: MESOS-7302 > URL: https://issues.apache.org/jira/browse/MESOS-7302 > Project: Mesos > Issue Type: Epic > Components: containerization >Reporter: Jie Yu > > Containerizer should support launching containers (both top level and nested) > that are not tied to a particular Mesos task or executor. This is for the > case where the agent wants to launch some system containers (e.g., for CSI > plugin) that will be managed by Mesos. > More specifically, the Containerizer interfaces should be refactored so that > they do not depend on TaskInfo or ExecutorInfo. Currently, the `launch` > interface depends on them. Instead, we should consistently use ContainerInfo > and CommandInfo in Containerizer and isolators. > This is also one necessary step towards running MesosContainerizer in > standalone mode. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (MESOS-7314) Add offer operations for converting disk resources
Jan Schlicht created MESOS-7314: --- Summary: Add offer operations for converting disk resources Key: MESOS-7314 URL: https://issues.apache.org/jira/browse/MESOS-7314 Project: Mesos Issue Type: Task Components: master Reporter: Jan Schlicht Assignee: Jan Schlicht One should be able to convert {{RAW}} and {{BLOCK}} disk resources into different types by applying operations to them. The offer operations and the related validation and resource handling need to be implemented. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (MESOS-7313) HealthCheckTest.HealthyTaskViaTCP is flaky on Windows.
Alexander Rukletsov created MESOS-7313:
--

Summary: HealthCheckTest.HealthyTaskViaTCP is flaky on Windows.
Key: MESOS-7313
URL: https://issues.apache.org/jira/browse/MESOS-7313
Project: Mesos
Issue Type: Bug
Components: test
Environment: Windows Server 2016 + Containers AMI
Reporter: Alexander Rukletsov

Log: https://pastebin.com/1vgKCmv7

According to the log, this does not seem related to health check failure, but rather to the task failure.

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-7312) Update Resource proto for CSI requirements
[ https://issues.apache.org/jira/browse/MESOS-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Bannier updated MESOS-7312:

Summary: Update Resource proto for CSI requirements (was: Update Resource proto for CSI requiremenets)

> Update Resource proto for CSI requirements
>
> Key: MESOS-7312
> URL: https://issues.apache.org/jira/browse/MESOS-7312
> Project: Mesos
> Issue Type: Bug
> Reporter: Benjamin Bannier
> Assignee: Benjamin Bannier
>
> CSI requires a number of changes to the {{Resource}} proto:
> * support for {{RAW}} and {{BLOCK}} type {{Resource::DiskInfo::Source}}
> * {{ResourceProviderID}} in Resource
> * {{Resource::DiskInfo::Source::Path}} should be {{optional}}.

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (MESOS-7026) Update authorization / authorization-filtering to handle hierarchical roles.
[ https://issues.apache.org/jira/browse/MESOS-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Bannier reassigned MESOS-7026:
---

Assignee: Alexander Rojas (was: Benjamin Bannier)

> Update authorization / authorization-filtering to handle hierarchical roles.
>
> Key: MESOS-7026
> URL: https://issues.apache.org/jira/browse/MESOS-7026
> Project: Mesos
> Issue Type: Task
> Components: agent, HTTP API, master
> Reporter: Benjamin Mahler
> Assignee: Alexander Rojas
>
> Authorization and endpoint filtering will need to be updated in order to allow the authorization to be performed in a hierarchical manner (e.g. a user can see all beneath /eng/* vs. a user can see all beneath /eng/frontend/*).

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
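
[Editor's illustration, not part of the original message.] The hierarchical visibility rule described in MESOS-7026 can be sketched roughly as below. This is only a minimal sketch under stated assumptions: the function names `role_is_beneath` and `visible_roles` are hypothetical, this is not the Mesos authorizer API, and it assumes that a role counts as "beneath" itself, which the ticket does not specify.

```python
from typing import Iterable, List


def role_is_beneath(role: str, ancestor: str) -> bool:
    """Return True if `role` equals `ancestor` or is nested beneath it.

    Hypothetical helper: roles are compared component by component, so
    "engineering" is NOT treated as being beneath "eng".
    """
    role_parts = [p for p in role.split("/") if p]
    ancestor_parts = [p for p in ancestor.split("/") if p]
    return role_parts[:len(ancestor_parts)] == ancestor_parts


def visible_roles(roles: Iterable[str], permitted: Iterable[str]) -> List[str]:
    """Filter `roles` down to those beneath any permitted ancestor role."""
    permitted = list(permitted)
    return [r for r in roles if any(role_is_beneath(r, a) for a in permitted)]


if __name__ == "__main__":
    all_roles = ["eng/frontend", "eng/backend", "sales"]
    # A user permitted on "eng" sees both eng sub-roles.
    print(visible_roles(all_roles, ["eng"]))
    # A user permitted only on "eng/frontend" sees just that subtree.
    print(visible_roles(all_roles, ["eng/frontend"]))
```

Comparing whole path components rather than raw string prefixes is the design choice worth noting here: a plain `startswith("eng")` check would wrongly match an unrelated role such as "engineering".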
[jira] [Updated] (MESOS-7312) Update Resource proto for CSI requiremenets
[ https://issues.apache.org/jira/browse/MESOS-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Bannier updated MESOS-7312:

Description:
CSI requires a number of changes to the {{Resource}} proto:
* support for {{RAW}} and {{BLOCK}} type {{Resource::DiskInfo::Source}}
* {{ResourceProviderID}} in Resource
* {{Resource::DiskInfo::Source::Path}} should be {{optional}}.

Summary: Update Resource proto for CSI requiremenets (was: Support RAW and BLOCK type Resource::DiskInfo::Source)

> Update Resource proto for CSI requiremenets
> ---
>
> Key: MESOS-7312
> URL: https://issues.apache.org/jira/browse/MESOS-7312
> Project: Mesos
> Issue Type: Bug
> Reporter: Benjamin Bannier
> Assignee: Benjamin Bannier
>
> CSI requires a number of changes to the {{Resource}} proto:
> * support for {{RAW}} and {{BLOCK}} type {{Resource::DiskInfo::Source}}
> * {{ResourceProviderID}} in Resource
> * {{Resource::DiskInfo::Source::Path}} should be {{optional}}.

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-7312) Update Resource proto for CSI requiremenets
[ https://issues.apache.org/jira/browse/MESOS-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Bannier updated MESOS-7312:

Story Points: 3 (was: 1)

> Update Resource proto for CSI requiremenets
> ---
>
> Key: MESOS-7312
> URL: https://issues.apache.org/jira/browse/MESOS-7312
> Project: Mesos
> Issue Type: Bug
> Reporter: Benjamin Bannier
> Assignee: Benjamin Bannier
>
> CSI requires a number of changes to the {{Resource}} proto:
> * support for {{RAW}} and {{BLOCK}} type {{Resource::DiskInfo::Source}}
> * {{ResourceProviderID}} in Resource
> * {{Resource::DiskInfo::Source::Path}} should be {{optional}}.

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (MESOS-7312) Support RAW and BLOCK type Resource::DiskInfo::Source
Benjamin Bannier created MESOS-7312:
---

Summary: Support RAW and BLOCK type Resource::DiskInfo::Source
Key: MESOS-7312
URL: https://issues.apache.org/jira/browse/MESOS-7312
Project: Mesos
Issue Type: Bug
Reporter: Benjamin Bannier
Assignee: Benjamin Bannier

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-5745) AuthenticationTest.UnauthenticatedSlave fails with clang++3.8
[ https://issues.apache.org/jira/browse/MESOS-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942776#comment-15942776 ]

Benjamin Bannier commented on MESOS-5745:
-

> Is there a way to force no optimizations via an ENV?

[~ghulands]: Since you did not specify {{--enable-optimize}} on the command line, I suspect you ended up with an optimized build because you either have some {{CXXFLAGS}} set in your environment, or your system activated optimization by default via either {{CONFIG_SITE}} or a compiler site config.

As things stand currently, Mesos will ignore {{--disable-optimize}} when e.g., {{CXXFLAGS}} are set, so you might need to disable optimizations by explicitly redefining {{CXXFLAGS}} to include e.g., {{-O0}}.

We can likely give more concrete suggestions once we know how optimizations were enabled in your build.

> AuthenticationTest.UnauthenticatedSlave fails with clang++3.8
> -
>
> Key: MESOS-5745
> URL: https://issues.apache.org/jira/browse/MESOS-5745
> Project: Mesos
> Issue Type: Bug
> Components: test
> Reporter: Michael Park
> Assignee: Benjamin Bannier
> Labels: mesosphere
>
> With {{clang++-3.8}}, {{make check}} fails with the following message:
> {noformat}
> [ RUN ] AuthenticationTest.UnauthenticatedSlave
> *** Aborted at 1467208613 (unix time) try "date -d @1467208613" if you are using GNU date ***
> PC: @0x10b7f5a8b std::__1::__tree<>::__assign_multi<>()
> *** SIGSEGV (@0x0) received by PID 40053 (TID 0x7fff73aaf000) stack trace: ***
> @ 0x7fff8af4252a _sigtramp
> @0x110216a00 (unknown)
> @0x10b7f5881 mesos::internal::logging::Flags::operator=()
> @0x10b7f3076 mesos::internal::slave::Flags::operator=()
> @0x10b7f1cbf mesos::internal::tests::cluster::Slave::start()
> @0x10bf1a2d1 mesos::internal::tests::MesosTest::StartSlave()
> @0x10b7511b9 mesos::internal::tests::AuthenticationTest_UnauthenticatedSlave_Test::TestBody()
> @0x10c703caa testing::internal::HandleExceptionsInMethodIfSupported<>()
> @0x10c703b0a testing::Test::Run()
> @0x10c704b02 testing::TestInfo::Run()
> @0x10c7053c3 testing::TestCase::Run()
> @0x10c70cefb testing::internal::UnitTestImpl::RunAllTests()
> @0x10c70ca43 testing::internal::HandleExceptionsInMethodIfSupported<>()
> @0x10c70c95e testing::UnitTest::Run()
> @0x10bbe44f3 main
> @ 0x7fff9071a5ad start
> make[3]: *** [check-local] Segmentation fault: 11
> make[2]: *** [check-am] Error 2
> make[1]: *** [check] Error 2
> make: *** [check-recursive] Error 1
> {noformat}

-- This message was sent by Atlassian JIRA (v6.3.15#6346)