[jira] [Commented] (MESOS-6184) Health checks should use a general mechanism to enter namespaces of the task.

2017-03-27 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944546#comment-15944546
 ] 

Deshi Xiao commented on MESOS-6184:
---

I have rebased the patch onto the 1.2.0 branch codebase and am testing it; it 
always produces a core dump file.

```
I0328 11:48:12.92218148 exec.cpp:162] Version: 1.2.0
I0328 11:48:12.92925254 exec.cpp:237] Executor registered on agent 
a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4
I0328 11:48:12.93164054 docker.cpp:850] Running docker -H 
unix:///var/run/docker.sock run --cpu-shares 10 --memory 33554432 --env-file 
/tmp/gvqGyb -v 
/data/mesos/slaves/a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4/frameworks/d7ef5d2b-f924-42d9-a274-c020afba6bce-/executors/0-hc-xychu-datamanmesos-2f3b47f9ffc048539c7b22baa6c32d8f/runs/458189b8-2ff4-4337-ad3a-67321e96f5cb:/mnt/mesos/sandbox
 --net bridge --label=USER_NAME=xychu --label=GROUP_NAME=groupautotest 
--label=APP_ID=hc --label=VCLUSTER=clusterautotest --label=USER=xychu 
--label=CLUSTER=datamanmesos --label=SLOT=0 --label=APP=hc -p 31000:80/tcp 
--name 
mesos-a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4.458189b8-2ff4-4337-ad3a-67321e96f5cb
 nginx
I0328 11:48:16.14571453 health_checker.cpp:196] Ignoring failure as health 
check still in grace period
W0328 11:48:26.28995849 health_checker.cpp:202] Health check failed 1 times 
consecutively: HTTP health check failed: curl returned terminated with signal 
Aborted (core dumped): ABORT: 
(../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed 
to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 
18596: Pid 18596 does not exist
*** Aborted at 1490672906 (unix time) try "date -d @1490672906" if you are 
using GNU date ***
PC: @ 0x7f26bfb485f7 __GI_raise
*** SIGABRT (@0x4a) received by PID 74 (TID 0x7f26ba152700) from PID 74; stack 
trace: ***
@ 0x7f26c0703100 (unknown)
@ 0x7f26bfb485f7 __GI_raise
@ 0x7f26bfb49ce8 __GI_abort
@ 0x7f26c315778e _Abort()
@ 0x7f26c31577cc _Abort()
@ 0x7f26c237a4b6 process::internal::childMain()
@ 0x7f26c2379e9c std::_Function_handler<>::_M_invoke()
@ 0x7f26c2379e53 process::internal::defaultClone()
@ 0x7f26c237b951 process::internal::cloneChild()
@ 0x7f26c237954f process::subprocess()
@ 0x7f26c15a9fb1 
mesos::internal::checks::HealthCheckerProcess::httpHealthCheck()
@ 0x7f26c15ababd 
mesos::internal::checks::HealthCheckerProcess::performSingleCheck()
@ 0x7f26c2331389 process::ProcessManager::resume()
@ 0x7f26c233a3f7 
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
@ 0x7f26c04a1220 (unknown)
@ 0x7f26c06fbdc5 start_thread
@ 0x7f26bfc0928d __clone
W0328 11:48:36.34005555 health_checker.cpp:202] Health check failed 2 times 
consecutively: HTTP health check failed: curl returned terminated with signal 
Aborted (core dumped): ABORT: 
(../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed 
to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 
18596: Pid 18596 does not exist
*** Aborted at 1490672916 (unix time) try "date -d @1490672916" if you are 
using GNU date ***
PC: @ 0x7f26bfb485f7 __GI_raise
*** SIGABRT (@0x4b) received by PID 75 (TID 0x7f26b9951700) from PID 75; stack 
trace: ***
@ 0x7f26c0703100 (unknown)
@ 0x7f26bfb485f7 __GI_raise
@ 0x7f26bfb49ce8 __GI_abort
@ 0x7f26c315778e _Abort()
@ 0x7f26c31577cc _Abort()
@ 0x7f26c237a4b6 process::internal::childMain()
@ 0x7f26c2379e9c std::_Function_handler<>::_M_invoke()
@ 0x7f26c2379e53 process::internal::defaultClone()
@ 0x7f26c237b951 process::internal::cloneChild()
@ 0x7f26c237954f process::subprocess()
@ 0x7f26c15a9fb1 
mesos::internal::checks::HealthCheckerProcess::httpHealthCheck()
@ 0x7f26c15ababd 
mesos::internal::checks::HealthCheckerProcess::performSingleCheck()
@ 0x7f26c2331389 process::ProcessManager::resume()
@ 0x7f26c233a3f7 
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
@ 0x7f26c04a1220 (unknown)
@ 0x7f26c06fbdc5 start_thread
@ 0x7f26bfc0928d __clone
W0328 11:48:46.38653349 health_checker.cpp:202] Health check failed 3 times 
consecutively: HTTP health check failed: curl returned terminated with signal 
Aborted (core dumped): ABORT: 
(../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed 
to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 
18596: Pid 18596 does not exist
*** Aborted at 1490672926 (unix time) try "date -d @1490672926" if you are 
using GNU date ***
PC: @ 0x7f26bfb485f7 __GI_raise
*** SIGABRT (@0x4c) received by PID 76 (TID 0x7f26ba152700) from PID 76; stack 
trace: ***
@
```

[jira] [Issue Comment Deleted] (MESOS-6184) Health checks should use a general mechanism to enter namespaces of the task.

2017-03-27 Thread Deshi Xiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deshi Xiao updated MESOS-6184:
--
Comment: was deleted

(was: good for me)

> Health checks should use a general mechanism to enter namespaces of the task.
> -
>
> Key: MESOS-6184
> URL: https://issues.apache.org/jira/browse/MESOS-6184
> Project: Mesos
>  Issue Type: Improvement
>Reporter: haosdent
>Assignee: haosdent
>Priority: Critical
>  Labels: health-check, mesosphere
>
> To perform health checks for tasks, we need to enter the corresponding 
> namespaces of the container. For now, health checks use a custom clone to 
> implement this:
> {code}
>   return process::defaultClone([=]() -> int {
>     if (taskPid.isSome()) {
>       foreach (const string& ns, namespaces) {
>         Try<Nothing> setns = ns::setns(taskPid.get(), ns);
>         if (setns.isError()) {
>           ...
>         }
>       }
>     }
>     return func();
>   });
> {code}
> After the childHooks patches are merged, we could change the health check to 
> use childHooks to call {{setns}} and make {{process::defaultClone}} private 
> again.
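
For illustration, the namespace entry that such a child hook performs can be 
sketched roughly as follows. This is a minimal Linux-only sketch and not the 
Mesos implementation: `namespacePath` and `enterNamespace` are hypothetical 
names, and returning an error string instead of aborting the child is an 
assumption about the desired behaviour.

```cpp
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

#include <cerrno>
#include <cstring>
#include <string>

// Path of the namespace handle the kernel exposes for `pid`
// (hypothetical helper, for illustration only).
std::string namespacePath(pid_t pid, const std::string& ns) {
  return "/proc/" + std::to_string(pid) + "/ns/" + ns;
}

// Tries to move the calling thread into namespace `ns` of process `pid`.
// Returns an empty string on success, otherwise an error message.
std::string enterNamespace(pid_t pid, const std::string& ns) {
  const std::string path = namespacePath(pid, ns);

  // If the pid has already exited, the open fails here, before setns(2).
  int fd = ::open(path.c_str(), O_RDONLY | O_CLOEXEC);
  if (fd < 0) {
    return "Failed to open '" + path + "': " + std::strerror(errno);
  }

  // nstype 0 accepts whatever namespace type the descriptor refers to.
  if (::setns(fd, 0) != 0) {
    const std::string error = std::strerror(errno);
    ::close(fd);
    return "Failed to enter the " + ns + " namespace of pid " +
           std::to_string(pid) + ": " + error;
  }

  ::close(fd);
  return "";
}
```

A caller could check the returned message and fail the single health check 
with it, rather than terminating the forked child with SIGABRT.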



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-7319) Rename the DRAIN maintenance mode to SCHEDULED to avoid confusion.

2017-03-27 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-7319:
---
Description: 
The current naming of the DRAIN mode in maintenance has been confusing to users 
as there tends to be an expectation of mesos doing something (e.g. not sending 
offers, or killing tasks) to achieve the drain, whereas in reality mesos does 
nothing and expects the schedulers to act (this only applies for maintenance 
aware schedulers).

Rather, what's actually happening in the DRAIN mode is that the maintenance 
is scheduled, that's it. So a name like SCHEDULED would be less confusing for 
users: http://mesos.apache.org/documentation/latest/maintenance/
Component/s: documentation

> Rename the DRAIN maintenance mode to SCHEDULED to avoid confusion.
> --
>
> Key: MESOS-7319
> URL: https://issues.apache.org/jira/browse/MESOS-7319
> Project: Mesos
>  Issue Type: Improvement
>  Components: documentation, HTTP API, master
>Reporter: Benjamin Mahler
>
> The current naming of the DRAIN mode in maintenance has been confusing to 
> users as there tends to be an expectation of mesos doing something (e.g. not 
> sending offers, or killing tasks) to achieve the drain, whereas in reality 
> mesos does nothing and expects the schedulers to act (this only applies for 
> maintenance aware schedulers).
> Rather, what's actually happening in the DRAIN mode is that the 
> maintenance is scheduled, that's it. So a name like SCHEDULED would be less 
> confusing for users: http://mesos.apache.org/documentation/latest/maintenance/





[jira] [Created] (MESOS-7319) Rename the DRAIN maintenance mode to SCHEDULED to avoid confusion.

2017-03-27 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-7319:
--

 Summary: Rename the DRAIN maintenance mode to SCHEDULED to avoid 
confusion.
 Key: MESOS-7319
 URL: https://issues.apache.org/jira/browse/MESOS-7319
 Project: Mesos
  Issue Type: Improvement
  Components: HTTP API, master
Reporter: Benjamin Mahler








[jira] [Commented] (MESOS-7201) Improvements to maintenance primitives

2017-03-27 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944343#comment-15944343
 ] 

Benjamin Mahler commented on MESOS-7201:


[~kaysoky] I'm inclined to rename the {{DRAIN}} mode to {{SCHEDULED}} as there 
is not necessarily "draining" occurring in the {{DRAIN}} mode, so this tends to 
confuse users as they have an expectation of mesos doing something (e.g. not 
sending offers, or killing tasks) to achieve the drain. Thoughts?

> Improvements to maintenance primitives
> --
>
> Key: MESOS-7201
> URL: https://issues.apache.org/jira/browse/MESOS-7201
> Project: Mesos
>  Issue Type: Epic
>Reporter: Joseph Wu
>  Labels: mesosphere
>
> This is a follow up epic to MESOS-1474 to capture further improvements for 
> maintenance primitives.





[jira] [Updated] (MESOS-7317) Add master endpoint to deactivate / activate agent

2017-03-27 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-7317:
---
Target Version/s: 1.3.0

> Add master endpoint to deactivate / activate agent
> --
>
> Key: MESOS-7317
> URL: https://issues.apache.org/jira/browse/MESOS-7317
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent, master
>Reporter: Neil Conway
>  Labels: mesosphere
>
> This would allow the operator to deactivate and then subsequently activate an 
> agent. The allocator does not make offers for deactivated agents; this 
> functionality would be useful to help operators "manually (incrementally) 
> drain" the tasks running on an agent, e.g., before taking the agent down.
> At present, if the operator causes a framework to kill a task running on the 
> agent, the framework will often receive an offer for the unused resources on 
> the agent, which will often result in respawning the killed task on the same 
> agent.





[jira] [Updated] (MESOS-7235) Improve Storage Support using Resources Provider and CSI

2017-03-27 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-7235:
--
Summary: Improve Storage Support using Resources Provider and CSI  (was: 
Improve Storage Support using Resources Provider)

> Improve Storage Support using Resources Provider and CSI
> 
>
> Key: MESOS-7235
> URL: https://issues.apache.org/jira/browse/MESOS-7235
> Project: Mesos
>  Issue Type: Epic
>Reporter: Jie Yu
>Assignee: Jie Yu
>
> Currently, Mesos supports both [local persistent 
> volumes|https://github.com/apache/mesos/blob/master/docs/persistent-volume.md]
>  as well as [external persistent 
> volumes|https://github.com/apache/mesos/blob/master/docs/docker-volume.md]. 
> However, both of them are not ideal.
> Local persistent volumes do not support offering physical or logical block 
> devices directly. Also, frameworks do not have choices to select filesystems 
> for their local persistent volumes. There are also some [usability 
> problem|https://issues.apache.org/jira/browse/MESOS-4209] with the local 
> persistent volumes. Mesos does support [multiple local 
> disks|https://github.com/apache/mesos/blob/master/docs/multiple-disk.md]. 
> However, it’s a big burden for operators to configure each agent properly to 
> be able to leverage this feature.
> External persistent volumes support in Mesos currently bypasses the resource 
> management part. In other words, using an external persistent volume does not 
> go through the usual offer cycle. Mesos doesn’t track resources associated 
> with the external volumes. This makes quota control, reservation, fair 
> sharing almost impossible to implement. Also, the current interface Mesos 
> uses to interact with volume providers is the [Docker Volume Driver interface 
> (DVDI)|https://docs.docker.com/engine/extend/plugins_volume/], which is very 
> specific to operations on a particular agent.
> The main problem I see currently is that we don’t have a coherent story for 
> storage. Yes, we have some primitives in Mesos that can support some stateful 
> services, but this is far from ideal. Some of them are just the stop gap 
> solution (e.g., the external volume support). This epic tries to tell a 
> coherent story for supporting storage in Mesos.





[jira] [Commented] (MESOS-7317) Add master endpoint to deactivate / activate agent

2017-03-27 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944208#comment-15944208
 ] 

Benjamin Mahler commented on MESOS-7317:


Linking in the "maintenance improvements" epic.

> Add master endpoint to deactivate / activate agent
> --
>
> Key: MESOS-7317
> URL: https://issues.apache.org/jira/browse/MESOS-7317
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent, master
>Reporter: Neil Conway
>  Labels: mesosphere
>
> This would allow the operator to deactivate and then subsequently activate an 
> agent. The allocator does not make offers for deactivated agents; this 
> functionality would be useful to help operators "manually (incrementally) 
> drain" the tasks running on an agent, e.g., before taking the agent down.
> At present, if the operator causes a framework to kill a task running on the 
> agent, the framework will often receive an offer for the unused resources on 
> the agent, which will often result in respawning the killed task on the same 
> agent.





[jira] [Created] (MESOS-7318) Libprocess delays and timers should be undisturbed by system clock jumps.

2017-03-27 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-7318:
--

 Summary: Libprocess delays and timers should be undisturbed by 
system clock jumps.
 Key: MESOS-7318
 URL: https://issues.apache.org/jira/browse/MESOS-7318
 Project: Mesos
  Issue Type: Bug
  Components: libprocess
Reporter: Benjamin Mahler


Currently, libprocess timers / delays / timeouts are affected by system clock 
jumps because they do not use a monotonic clock as a reference point.

Since these require relative timing, we can use a monotonic clock as the 
reference point. We also need the approach to be affected by clock manipulation 
at the libprocess level (i.e. {{Clock::advance(...)}} and 
{{Clock::update(...)}}) for testing purposes.
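
A rough sketch of the idea, assuming a hypothetical `MonotonicClock` class 
(this is not libprocess's `Clock`, and the names `Timer` and `delay` are 
illustrative only): deadlines are measured against `std::chrono::steady_clock`, 
which is unaffected by system clock jumps, while a test-only offset still lets 
the clock be advanced manually.

```cpp
#include <chrono>

using Duration = std::chrono::nanoseconds;

// Hypothetical clock for illustration; it is not libprocess's `Clock`.
class MonotonicClock {
public:
  // Time elapsed on the monotonic clock since construction, plus any
  // offset injected via advance(). Wall-clock jumps never affect this.
  Duration now() const {
    return std::chrono::duration_cast<Duration>(
               std::chrono::steady_clock::now() - start_) +
           offset_;
  }

  // Test hook: move the clock forward without sleeping.
  void advance(Duration d) { offset_ += d; }

private:
  std::chrono::steady_clock::time_point start_ =
      std::chrono::steady_clock::now();
  Duration offset_{0};
};

// A timer stores its deadline relative to the monotonic clock.
struct Timer {
  Duration deadline;

  bool expired(const MonotonicClock& clock) const {
    return clock.now() >= deadline;
  }
};

// Creates a timer that expires `d` after the clock's current time.
inline Timer delay(const MonotonicClock& clock, Duration d) {
  return Timer{clock.now() + d};
}
```

In a test, calling `advance` with the timer's delay makes the timer expire 
immediately, without depending on the system clock.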

The current recommendation is for users to use NTP with skewing applied to 
adjust for leaps, e.g.: 
https://googleblog.blogspot.com/2011/09/time-technology-and-leaping-seconds.html

I thought we already had a ticket for this but can't seem to find it.





[jira] [Updated] (MESOS-7317) Add master endpoint to deactivate / activate agent

2017-03-27 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-7317:
---
Description: 
This would allow the operator to deactivate and then subsequently activate an 
agent. The allocator does not make offers for deactivated agents; this 
functionality would be useful to help operators "manually (incrementally) 
drain" the tasks running on an agent, e.g., before taking the agent down.

At present, if the operator causes a framework to kill a task running on the 
agent, the framework will often receive an offer for the unused resources on 
the agent, which will often result in respawning the killed task on the same 
agent.

  was:
This would allow the operator to deactivate and then subsequently activate an 
agent. The allocator does not make offers for deactivated agents; this 
functionality would be useful to help operators "manually (incrementally) 
drain" the tasks running on an agent, e.g., before taking the agent down.

At present, if the operator causes a framework to kill a task running on the 
agent, the framework will receive an offer for the unused resources on the 
agent, which will often result in respawning the killed task on the same agent.


> Add master endpoint to deactivate / activate agent
> --
>
> Key: MESOS-7317
> URL: https://issues.apache.org/jira/browse/MESOS-7317
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent, master
>Reporter: Neil Conway
>  Labels: mesosphere
>
> This would allow the operator to deactivate and then subsequently activate an 
> agent. The allocator does not make offers for deactivated agents; this 
> functionality would be useful to help operators "manually (incrementally) 
> drain" the tasks running on an agent, e.g., before taking the agent down.
> At present, if the operator causes a framework to kill a task running on the 
> agent, the framework will often receive an offer for the unused resources on 
> the agent, which will often result in respawning the killed task on the same 
> agent.





[jira] [Created] (MESOS-7317) Add master endpoint to deactivate / activate agent

2017-03-27 Thread Neil Conway (JIRA)
Neil Conway created MESOS-7317:
--

 Summary: Add master endpoint to deactivate / activate agent
 Key: MESOS-7317
 URL: https://issues.apache.org/jira/browse/MESOS-7317
 Project: Mesos
  Issue Type: Improvement
  Components: agent, master
Reporter: Neil Conway


This would allow the operator to deactivate and then subsequently activate an 
agent. The allocator does not make offers for deactivated agents; this 
functionality would be useful to help operators "manually (incrementally) 
drain" the tasks running on an agent, e.g., before taking the agent down.

At present, if the operator causes a framework to kill a task running on the 
agent, the framework will receive an offer for the unused resources on the 
agent, which will often result in respawning the killed task on the same agent.
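
As a rough illustration of the intended allocator behaviour, assuming a 
hypothetical, simplified `Agent` record (these are not the Mesos allocator's 
types, and `offerableAgents` is an illustrative name): a deactivated agent is 
skipped when choosing agents to make offers for, so resources freed by a killed 
task are not re-offered.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified agent record; not the Mesos allocator's types.
struct Agent {
  std::string id;
  bool activated;     // false once the operator has deactivated the agent
  double unusedCpus;  // resources currently not used by any task
};

// Returns the IDs of agents whose unused resources may be offered.
std::vector<std::string> offerableAgents(const std::vector<Agent>& agents) {
  std::vector<std::string> result;
  for (const Agent& agent : agents) {
    // Deactivated agents are skipped, so freed resources are not offered.
    if (agent.activated && agent.unusedCpus > 0.0) {
      result.push_back(agent.id);
    }
  }
  return result;
}
```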





[jira] [Updated] (MESOS-7169) Documentation still references `ContainerLogger::recover`

2017-03-27 Thread Charles Allen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Allen updated MESOS-7169:
-
Component/s: modules

> Documentation still references `ContainerLogger::recover`
> -
>
> Key: MESOS-7169
> URL: https://issues.apache.org/jira/browse/MESOS-7169
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation, modules
>Affects Versions: 1.1.0
>Reporter: Charles Allen
>
> MESOS-6371 removed {{ContainerLogger::recover}} but 
> https://github.com/apache/mesos/blob/1.1.0/include/mesos/slave/container_logger.hpp#L143
>  still discusses the recovery process as being important.





[jira] [Commented] (MESOS-7316) Upgrading Mesos to 1.2.0 results in some information missing from the `/flags` endpoint.

2017-03-27 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943999#comment-15943999
 ] 

Anand Mazumdar commented on MESOS-7316:
---

cc: [~bbannier]

> Upgrading Mesos to 1.2.0 results in some information missing from the 
> `/flags` endpoint.
> 
>
> Key: MESOS-7316
> URL: https://issues.apache.org/jira/browse/MESOS-7316
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: Anand Mazumdar
>Priority: Critical
>  Labels: mesosphere
>
> From OSS Mesos Slack:
> I recently tried upgrading one of our Mesos clusters from 1.1.0 to 1.2.0. 
> After doing this, it looks like the {{zk}} field on the {{/master/flags}} 
> endpoint is no longer present. 
> This looks related to the recent {{Flags}} refactoring that was done which 
> resulted in some flags no longer being populated since they were not part of 
> {{master::Flags}} in {{src/master/flags.hpp}}.





[jira] [Created] (MESOS-7316) Upgrading Mesos to 1.2.0 results in some information missing from the `/flags` endpoint.

2017-03-27 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-7316:
-

 Summary: Upgrading Mesos to 1.2.0 results in some information 
missing from the `/flags` endpoint.
 Key: MESOS-7316
 URL: https://issues.apache.org/jira/browse/MESOS-7316
 Project: Mesos
  Issue Type: Bug
  Components: HTTP API
Reporter: Anand Mazumdar
Priority: Critical


From OSS Mesos Slack:
I recently tried upgrading one of our Mesos clusters from 1.1.0 to 1.2.0. After 
doing this, it looks like the {{zk}} field on the {{/master/flags}} endpoint is 
no longer present. 

This looks related to the recent {{Flags}} refactoring that was done which 
resulted in some flags no longer being populated since they were not part of 
{{master::Flags}} in {{src/master/flags.hpp}}.





[jira] [Updated] (MESOS-7186) Metrics about used/allocated shared resources are incorrect accounted.

2017-03-27 Thread Anindya Sinha (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anindya Sinha updated MESOS-7186:
-
Shepherd: Yan Xu

> Metrics about used/allocated shared resources are incorrect accounted.
> --
>
> Key: MESOS-7186
> URL: https://issues.apache.org/jira/browse/MESOS-7186
> Project: Mesos
>  Issue Type: Bug
>Reporter: Yan Xu
>Assignee: Anindya Sinha
>
> Certain gauges like {{master/_used}} are calculated from data 
> structures like {{hashmap usedResources}} which are 
> keyed off by the framework. However because frameworks under the same role 
> can have the same shared persistent volumes, we need to de-duplicate (via 
> Resource arithmetics) before extracting and summing up the double scalar 
> values.
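
A minimal sketch of the de-duplication described above, under stated 
assumptions: `Resource` here is a hypothetical, simplified struct rather than 
the Mesos `Resource` protobuf, and a set of seen identifiers stands in for the 
Resource arithmetic the description mentions.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a Mesos resource.
struct Resource {
  std::string id;  // e.g. the identifier of a persistent volume
  double scalar;   // e.g. disk size
  bool shared;     // true if several frameworks may hold it at once
};

// Sums the scalar values across all frameworks, counting each shared
// resource only once no matter how many frameworks use it.
double totalUsed(
    const std::map<std::string, std::vector<Resource>>& usedByFramework) {
  std::set<std::string> seenShared;
  double total = 0.0;
  for (const auto& entry : usedByFramework) {
    for (const Resource& resource : entry.second) {
      // insert().second is false if this shared resource was already seen.
      if (resource.shared && !seenShared.insert(resource.id).second) {
        continue;
      }
      total += resource.scalar;
    }
  }
  return total;
}
```

Summing per framework without the `seenShared` check would count a volume 
shared by two frameworks twice.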





[jira] [Commented] (MESOS-6236) Launch subprocesses associated with specified namespaces.

2017-03-27 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943782#comment-15943782
 ] 

Deshi Xiao commented on MESOS-6236:
---

The patch is outdated. I have updated it and need to test it; once the testing 
is done, I will submit a patch.

> Launch subprocesses associated with specified namespaces.
> -
>
> Key: MESOS-6236
> URL: https://issues.apache.org/jira/browse/MESOS-6236
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Qian Zhang
>Assignee: haosdent
>  Labels: mesosphere
>
> Currently there is no standard way in Mesos to launch a child process in a 
> different namespace (e.g. {{net}}, {{mnt}}). A user may leverage 
> {{Subprocess}} and provide its own {{clone}} callback, but this approach is 
> error-prone.
> One possible solution is to implement a {{Subprocess}}' child hook. In 
> [MESOS-5070|https://issues.apache.org/jira/browse/MESOS-5070], we have 
> introduced a child hook framework in subprocess and implemented three child 
> hooks {{CHDIR}}, {{SETSID}} and {{SUPERVISOR}}. We suggest introducing 
> another child hook {{SETNS}} so that other components (e.g., health check) 
> can call it to enter the namespaces of a specific process.





[jira] [Updated] (MESOS-7315) Design doc for resource provider and storage integration.

2017-03-27 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-7315:
--
Description: 
https://docs.google.com/document/d/125YWqg_5BB5OY9a6M7LZcby5RSqBwo2PZzpVLuxYXh4/edit?usp=sharing

> Design doc for resource provider and storage integration.
> -
>
> Key: MESOS-7315
> URL: https://issues.apache.org/jira/browse/MESOS-7315
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Jie Yu
>
> https://docs.google.com/document/d/125YWqg_5BB5OY9a6M7LZcby5RSqBwo2PZzpVLuxYXh4/edit?usp=sharing





[jira] [Assigned] (MESOS-7235) Improve Storage Support using Resources Provider

2017-03-27 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-7235:
-

Assignee: Jie Yu

> Improve Storage Support using Resources Provider
> 
>
> Key: MESOS-7235
> URL: https://issues.apache.org/jira/browse/MESOS-7235
> Project: Mesos
>  Issue Type: Epic
>Reporter: Jie Yu
>Assignee: Jie Yu
>
> Currently, Mesos supports both [local persistent 
> volumes|https://github.com/apache/mesos/blob/master/docs/persistent-volume.md]
>  as well as [external persistent 
> volumes|https://github.com/apache/mesos/blob/master/docs/docker-volume.md]. 
> However, both of them are not ideal.
> Local persistent volumes do not support offering physical or logical block 
> devices directly. Also, frameworks do not have choices to select filesystems 
> for their local persistent volumes. There are also some [usability 
> problem|https://issues.apache.org/jira/browse/MESOS-4209] with the local 
> persistent volumes. Mesos does support [multiple local 
> disks|https://github.com/apache/mesos/blob/master/docs/multiple-disk.md]. 
> However, it’s a big burden for operators to configure each agent properly to 
> be able to leverage this feature.
> External persistent volumes support in Mesos currently bypasses the resource 
> management part. In other words, using an external persistent volume does not 
> go through the usual offer cycle. Mesos doesn’t track resources associated 
> with the external volumes. This makes quota control, reservation, fair 
> sharing almost impossible to implement. Also, the current interface Mesos 
> uses to interact with volume providers is the [Docker Volume Driver interface 
> (DVDI)|https://docs.docker.com/engine/extend/plugins_volume/], which is very 
> specific to operations on a particular agent.
> The main problem I see currently is that we don’t have a coherent story for 
> storage. Yes, we have some primitives in Mesos that can support some stateful 
> services, but this is far from ideal. Some of them are just the stop gap 
> solution (e.g., the external volume support). This epic tries to tell a 
> coherent story for supporting storage in Mesos.





[jira] [Created] (MESOS-7315) Design doc for resource provider and storage integration.

2017-03-27 Thread Jie Yu (JIRA)
Jie Yu created MESOS-7315:
-

 Summary: Design doc for resource provider and storage integration.
 Key: MESOS-7315
 URL: https://issues.apache.org/jira/browse/MESOS-7315
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu
Assignee: Jie Yu








[jira] [Updated] (MESOS-7312) Update Resource proto for storage resource providers.

2017-03-27 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-7312:
--
Summary: Update Resource proto for storage resource providers.  (was: 
Update Resource proto for CSI requirements)

> Update Resource proto for storage resource providers.
> -
>
> Key: MESOS-7312
> URL: https://issues.apache.org/jira/browse/MESOS-7312
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>
> CSI requires a number of changes to the {{Resource}} proto:
> * support for {{RAW}} and {{BLOCK}} type {{Resource::DiskInfo::Source}}
> * {{ResourceProviderID}} in Resource
> * {{Resource::DiskInfo::Source::Path}} should be {{optional}}.





[jira] [Updated] (MESOS-7312) Update Resource proto for storage resource providers.

2017-03-27 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-7312:
--
Description: 
Storage resource provider support requires a number of changes to the 
{{Resource}} proto:

* support for {{RAW}} and {{BLOCK}} type {{Resource::DiskInfo::Source}}
* {{ResourceProviderID}} in Resource
* {{Resource::DiskInfo::Source::Path}} should be {{optional}}.

  was:
CSI requires a number of changes to the {{Resource}} proto:

* support for {{RAW}} and {{BLOCK}} type {{Resource::DiskInfo::Source}}
* {{ResourceProviderID}} in Resource
* {{Resource::DiskInfo::Source::Path}} should be {{optional}}.


> Update Resource proto for storage resource providers.
> -
>
> Key: MESOS-7312
> URL: https://issues.apache.org/jira/browse/MESOS-7312
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>
> Storage resource provider support requires a number of changes to the 
> {{Resource}} proto:
> * support for {{RAW}} and {{BLOCK}} type {{Resource::DiskInfo::Source}}
> * {{ResourceProviderID}} in Resource
> * {{Resource::DiskInfo::Source::Path}} should be {{optional}}.





[jira] [Commented] (MESOS-7302) Support launching standalone containers.

2017-03-27 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943574#comment-15943574
 ] 

Jie Yu commented on MESOS-7302:
---

[~avinash.mesos] Containerizer has a 'wait' interface to wait for the 
termination of the container. We don't need default or command executor 
semantics.

> Support launching standalone containers.
> 
>
> Key: MESOS-7302
> URL: https://issues.apache.org/jira/browse/MESOS-7302
> Project: Mesos
>  Issue Type: Epic
>  Components: containerization
>Reporter: Jie Yu
>
> Containerizer should support launching containers (both top level and nested) 
> that are not tied to a particular Mesos task or executor. This is for the 
> case where the agent wants to launch some system containers (e.g., for CSI 
> plugin) that will be managed by Mesos.
> More specifically, the Containerizer interfaces should be refactored so that 
> they do not depend on TaskInfo or ExecutorInfo. Currently, the `launch` 
> interface depends on them. Instead, we should consistently use ContainerInfo 
> and CommandInfo in Containerizer and isolators.
> This is also one necessary step towards running MesosContainerizer in 
> standalone mode.





[jira] [Commented] (MESOS-7302) Support launching standalone containers.

2017-03-27 Thread Avinash Sridharan (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943468#comment-15943468
 ] 

Avinash Sridharan commented on MESOS-7302:
--

[~jieyu]
How would the standalone containers report task updates to the agent? They 
would still need to have the default/command executor semantics to monitor the 
container right?

Just a thought: in order to launch standalone containers, why can't we make the 
agent act as a first-class framework that can launch containers directly on the 
agent?

> Support launching standalone containers.
> 
>
> Key: MESOS-7302
> URL: https://issues.apache.org/jira/browse/MESOS-7302
> Project: Mesos
> Issue Type: Epic
> Components: containerization
> Reporter: Jie Yu
>
> Containerizer should support launching containers (both top level and nested) 
> that are not tied to a particular Mesos task or executor. This is for the 
> case where the agent wants to launch some system containers (e.g., for CSI 
> plugin) that will be managed by Mesos.
> More specifically, the Containerizer interfaces should be refactored so that 
> they do not depend on TaskInfo or ExecutorInfo. Currently, the `launch` 
> interface depends on them. Instead, we should consistently use ContainerInfo 
> and CommandInfo in Containerizer and isolators.
> This is also one necessary step towards running MesosContainerizer in 
> standalone mode.





[jira] [Created] (MESOS-7314) Add offer operations for converting disk resources

2017-03-27 Thread Jan Schlicht (JIRA)
Jan Schlicht created MESOS-7314:
---

Summary: Add offer operations for converting disk resources
Key: MESOS-7314
URL: https://issues.apache.org/jira/browse/MESOS-7314
Project: Mesos
Issue Type: Task
Components: master
Reporter: Jan Schlicht
Assignee: Jan Schlicht


One should be able to convert {{RAW}} and {{BLOCK}} disk resources into 
different types by applying operations to them. The offer operations and the 
related validation and resource handling need to be implemented.





[jira] [Created] (MESOS-7313) HealthCheckTest.HealthyTaskViaTCP is flaky on Windows.

2017-03-27 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-7313:
--

Summary: HealthCheckTest.HealthyTaskViaTCP is flaky on Windows.
Key: MESOS-7313
URL: https://issues.apache.org/jira/browse/MESOS-7313
Project: Mesos
Issue Type: Bug
Components: test
Environment: Windows Server 2016 + Containers AMI
Reporter: Alexander Rukletsov


Log: https://pastebin.com/1vgKCmv7

According to the log, this does not seem to be related to a health check 
failure, but rather to a task failure.





[jira] [Updated] (MESOS-7312) Update Resource proto for CSI requirements

2017-03-27 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-7312:

Summary: Update Resource proto for CSI requirements  (was: Update Resource 
proto for CSI requiremenets)

> Update Resource proto for CSI requirements
> --
>
> Key: MESOS-7312
> URL: https://issues.apache.org/jira/browse/MESOS-7312
> Project: Mesos
> Issue Type: Bug
> Reporter: Benjamin Bannier
> Assignee: Benjamin Bannier
>
> CSI requires a number of changes to the {{Resource}} proto:
> * support for {{RAW}} and {{BLOCK}} type {{Resource::DiskInfo::Source}}
> * {{ResourceProviderID}} in Resource
> * {{Resource::DiskInfo::Source::Path}} should be {{optional}}.





[jira] [Assigned] (MESOS-7026) Update authorization / authorization-filtering to handle hierarchical roles.

2017-03-27 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-7026:
---

Assignee: Alexander Rojas  (was: Benjamin Bannier)

> Update authorization / authorization-filtering to handle hierarchical roles.
> 
>
> Key: MESOS-7026
> URL: https://issues.apache.org/jira/browse/MESOS-7026
> Project: Mesos
> Issue Type: Task
> Components: agent, HTTP API, master
> Reporter: Benjamin Mahler
> Assignee: Alexander Rojas
>
> Authorization and endpoint filtering will need to be updated in order to 
> allow the authorization to be performed in a hierarchical manner (e.g. a user 
> can see all beneath /eng/* vs. a user can see all beneath /eng/frontend/*).





[jira] [Updated] (MESOS-7312) Update Resource proto for CSI requiremenets

2017-03-27 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-7312:

Description: 
CSI requires a number of changes to the {{Resource}} proto:

* support for {{RAW}} and {{BLOCK}} type {{Resource::DiskInfo::Source}}
* {{ResourceProviderID}} in Resource
* {{Resource::DiskInfo::Source::Path}} should be {{optional}}.
Summary: Update Resource proto for CSI requiremenets  (was: Support RAW 
and BLOCK type Resource::DiskInfo::Source)

> Update Resource proto for CSI requiremenets
> ---
>
> Key: MESOS-7312
> URL: https://issues.apache.org/jira/browse/MESOS-7312
> Project: Mesos
> Issue Type: Bug
> Reporter: Benjamin Bannier
> Assignee: Benjamin Bannier
>
> CSI requires a number of changes to the {{Resource}} proto:
> * support for {{RAW}} and {{BLOCK}} type {{Resource::DiskInfo::Source}}
> * {{ResourceProviderID}} in Resource
> * {{Resource::DiskInfo::Source::Path}} should be {{optional}}.





[jira] [Updated] (MESOS-7312) Update Resource proto for CSI requiremenets

2017-03-27 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-7312:

Story Points: 3  (was: 1)

> Update Resource proto for CSI requiremenets
> ---
>
> Key: MESOS-7312
> URL: https://issues.apache.org/jira/browse/MESOS-7312
> Project: Mesos
> Issue Type: Bug
> Reporter: Benjamin Bannier
> Assignee: Benjamin Bannier
>
> CSI requires a number of changes to the {{Resource}} proto:
> * support for {{RAW}} and {{BLOCK}} type {{Resource::DiskInfo::Source}}
> * {{ResourceProviderID}} in Resource
> * {{Resource::DiskInfo::Source::Path}} should be {{optional}}.





[jira] [Created] (MESOS-7312) Support RAW and BLOCK type Resource::DiskInfo::Source

2017-03-27 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-7312:
---

Summary: Support RAW and BLOCK type Resource::DiskInfo::Source
Key: MESOS-7312
URL: https://issues.apache.org/jira/browse/MESOS-7312
Project: Mesos
Issue Type: Bug
Reporter: Benjamin Bannier
Assignee: Benjamin Bannier








[jira] [Commented] (MESOS-5745) AuthenticationTest.UnauthenticatedSlave fails with clang++3.8

2017-03-27 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942776#comment-15942776
 ] 

Benjamin Bannier commented on MESOS-5745:
-

> Is there a way to force no optimizations via an ENV?

[~ghulands]: Since you did not specify {{--enable-optimize}} on the command 
line, I suspect you ended up with an optimized build because you either have 
some {{CXXFLAGS}} set in your environment, or your system activated 
optimization by default via either {{CONFIG_SITE}} or a compiler site config. 
As things currently stand, Mesos will ignore {{--disable-optimize}} when, e.g., 
{{CXXFLAGS}} are set, so you might need to disable optimizations by explicitly 
redefining {{CXXFLAGS}} to include, e.g., {{-O0}}. We can likely give more 
concrete suggestions once we know how optimizations were enabled in your build.
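
One way to check whether a given set of flags really produced an unoptimized 
build is to compile a small probe with the same flags. The sketch below is not 
part of Mesos, and the function name optimizationState is hypothetical. It 
relies on GCC and Clang defining the {{__OPTIMIZE__}} macro when optimization 
is enabled; other compilers may not define it at all.

```cpp
#include <string>

// Hypothetical probe: reports whether this translation unit was compiled
// with optimization enabled. GCC and Clang define __OPTIMIZE__ at -O1 and
// above; with other compilers the macro may simply be absent.
std::string optimizationState() {
#if defined(__OPTIMIZE__)
  return "optimized";
#else
  return "not optimized (or the compiler does not define __OPTIMIZE__)";
#endif
}
```

Building this probe with the {{CXXFLAGS}} that the Mesos build ends up using, 
and printing the result, should show whether optimization is still active.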

> AuthenticationTest.UnauthenticatedSlave fails with clang++3.8
> -
>
> Key: MESOS-5745
> URL: https://issues.apache.org/jira/browse/MESOS-5745
> Project: Mesos
> Issue Type: Bug
> Components: test
> Reporter: Michael Park
> Assignee: Benjamin Bannier
> Labels: mesosphere
>
> With {{clang++-3.8}}, {{make check}} fails with the following message:
> {noformat}
> [ RUN  ] AuthenticationTest.UnauthenticatedSlave
> *** Aborted at 1467208613 (unix time) try "date -d @1467208613" if you are using GNU date ***
> PC: @ 0x10b7f5a8b std::__1::__tree<>::__assign_multi<>()
> *** SIGSEGV (@0x0) received by PID 40053 (TID 0x7fff73aaf000) stack trace: ***
> @ 0x7fff8af4252a _sigtramp
> @ 0x110216a00 (unknown)
> @ 0x10b7f5881 mesos::internal::logging::Flags::operator=()
> @ 0x10b7f3076 mesos::internal::slave::Flags::operator=()
> @ 0x10b7f1cbf mesos::internal::tests::cluster::Slave::start()
> @ 0x10bf1a2d1 mesos::internal::tests::MesosTest::StartSlave()
> @ 0x10b7511b9 mesos::internal::tests::AuthenticationTest_UnauthenticatedSlave_Test::TestBody()
> @ 0x10c703caa testing::internal::HandleExceptionsInMethodIfSupported<>()
> @ 0x10c703b0a testing::Test::Run()
> @ 0x10c704b02 testing::TestInfo::Run()
> @ 0x10c7053c3 testing::TestCase::Run()
> @ 0x10c70cefb testing::internal::UnitTestImpl::RunAllTests()
> @ 0x10c70ca43 testing::internal::HandleExceptionsInMethodIfSupported<>()
> @ 0x10c70c95e testing::UnitTest::Run()
> @ 0x10bbe44f3 main
> @ 0x7fff9071a5ad start
> make[3]: *** [check-local] Segmentation fault: 11
> make[2]: *** [check-am] Error 2
> make[1]: *** [check] Error 2
> make: *** [check-recursive] Error 1
> {noformat}


