[jira] [Created] (MESOS-8150) Attributes documentation indicates that sets are valid attribute types; code disagrees

2017-10-30 Thread Tim Harper (JIRA)
Tim Harper created MESOS-8150:
-

 Summary: Attributes documentation indicates that sets are valid 
attribute types; code disagrees
 Key: MESOS-8150
 URL: https://issues.apache.org/jira/browse/MESOS-8150
 Project: Mesos
  Issue Type: Documentation
Reporter: Tim Harper
Priority: Minor


On the [Mesos Attributes & 
Resources|http://mesos.apache.org/documentation/latest/attributes-resources/] 
page, it says:

{quote}The types of values that are supported by Attributes and Resources in 
Mesos are scalar, ranges, sets and text.{quote}

However, the code for 1.4.x disagrees. Sets are not supported for attribute 
types:

https://github.com/apache/mesos/blob/1.4.0/src/common/attributes.cpp#L171

https://github.com/apache/mesos/blob/1.4.0/src/common/attributes.cpp#L115-L128
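For illustration, a paraphrased sketch of the value dispatch those links point 
at (abbreviated; {{looksLikeRange}} and {{looksLikeScalar}} are hypothetical 
helpers standing in for the real checks). Note that there is no SET branch, so 
a set literal like {{\{a,b\}}} falls through and is parsed as TEXT:

{code}
// Paraphrased sketch of the dispatch in src/common/attributes.cpp,
// not the exact source. looksLikeRange/looksLikeScalar are
// hypothetical helpers; error handling is elided.
Attribute parse(const std::string& name, const std::string& text)
{
  Attribute attribute;
  attribute.set_name(name);

  if (looksLikeRange(text)) {          // e.g. "[1-100]"
    attribute.set_type(Value::RANGES);
    // ... parse the ranges ...
  } else if (looksLikeScalar(text)) {  // e.g. "3.14"
    attribute.set_type(Value::SCALAR);
    // ... parse the scalar ...
  } else {                             // anything else, including
    attribute.set_type(Value::TEXT);   // "{a,b}", becomes TEXT
    attribute.mutable_text()->set_value(text);
  }

  return attribute;
}
{code}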






[jira] [Commented] (MESOS-8051) Killing TASK_GROUP fail to kill some tasks

2017-10-30 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226257#comment-16226257
 ] 

Qian Zhang commented on MESOS-8051:
---

commit 05c7dd88f269692b7248c1087a3f57759eba6853
Author: Qian Zhang 
Date:   Mon Oct 9 09:01:15 2017 +0800

Ignored the tasks already being killed when killing the task group.

When the scheduler tries to kill multiple tasks in the task group
simultaneously, the default executor will kill the tasks one by
one. When the first task is killed, the default executor will kill
all the other tasks in the task group, however, we need to ignore
the tasks which are already being killed, otherwise, the check
`CHECK(!container->killing);` in `DefaultExecutor::kill()` will fail.

Review: https://reviews.apache.org/r/62836

commit 28831de34d098c894042246dd6fef402eb3b960d
Author: Qian Zhang 
Date:   Mon Oct 9 14:25:31 2017 +0800

Added a test `DefaultExecutorTest.KillMultipleTasks`.

Review: https://reviews.apache.org/r/62837
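For illustration, a minimal sketch of the guard the first commit describes 
(hypothetical names; the actual change is in the reviews above):

{code}
// Sketch of the guard described in the commit message above
// (hypothetical names; see the linked reviews for the real change).
// When one task's death triggers killing the whole task group, skip
// containers whose kill is already in flight so that kill() is never
// entered twice for the same container.
foreach (Container* container, containers) {
  if (container->killing) {
    continue;  // Already being killed; avoid the CHECK failure.
  }
  kill(container);
}
{code}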

> Killing TASK_GROUP fail to kill some tasks
> --
>
> Key: MESOS-8051
> URL: https://issues.apache.org/jira/browse/MESOS-8051
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, executor
>Affects Versions: 1.4.0
>Reporter: A. Dukhovniy
>Assignee: Qian Zhang
>Priority: Critical
> Attachments: dcos-mesos-master.log.gz, dcos-mesos-slave.log.gz, 
> screenshot-1.png
>
>
> When starting the following pod definition via Marathon:
> {code:java}
> {
>   "id": "/simple-pod",
>   "scaling": {
> "kind": "fixed",
> "instances": 3
>   },
>   "environment": {
> "PING": "PONG"
>   },
>   "containers": [
> {
>   "name": "ct1",
>   "resources": {
> "cpus": 0.1,
> "mem": 32
>   },
>   "image": {
> "kind": "MESOS",
> "id": "busybox"
>   },
>   "exec": {
> "command": {
>   "shell": "while true; do echo the current time is $(date) > 
> ./test-v1/clock; sleep 1; done"
> }
>   },
>   "volumeMounts": [
> {
>   "name": "v1",
>   "mountPath": "test-v1"
> }
>   ]
> },
> {
>   "name": "ct2",
>   "resources": {
> "cpus": 0.1,
> "mem": 32
>   },
>   "exec": {
> "command": {
>   "shell": "while true; do echo -n $PING ' '; cat ./etc/clock; sleep 
> 1; done"
> }
>   },
>   "volumeMounts": [
> {
>   "name": "v1",
>   "mountPath": "etc"
> },
> {
>   "name": "v2",
>   "mountPath": "docker"
> }
>   ]
> }
>   ],
>   "networks": [
> {
>   "mode": "host"
> }
>   ],
>   "volumes": [
> {
>   "name": "v1"
> },
> {
>   "name": "v2",
>   "host": "/var/lib/docker"
> }
>   ]
> }
> {code}
> Mesos will successfully kill all {{ct2}} containers but fail to kill some or 
> all of the {{ct1}} containers. I've attached both master and agent logs. The 
> interesting part starts after Marathon issues 6 kills:
> {code:java}
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.209966  4746 master.cpp:5297] Processing 
> KILL call for task 'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853d
> bf20.ct1' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) 
> at scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210033  4746 master.cpp:5371] Telling 
> agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
> 10.0.1.207) to kill task 
> simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct1 of framework 
> bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
> scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
> .229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210471  4748 master.cpp:5297] Processing 
> KILL call for task 'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853d
> bf20.ct2' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) 
> at scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210518  4748 master.cpp:5371] Telling 
> agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
> 10.0.1.207) to kill task 
> simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct2 of framework 
> bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
> scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
> .229:15101
> Oct 04 14:58:25 

[jira] [Updated] (MESOS-8050) Mesos HTTP/HTTPS health checks for IPv6 docker containers.

2017-10-30 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan updated MESOS-8050:
-
  Sprint: Mesosphere Sprint 67
Story Points: 5

> Mesos HTTP/HTTPS health checks for IPv6 docker containers.
> --
>
> Key: MESOS-8050
> URL: https://issues.apache.org/jira/browse/MESOS-8050
> Project: Mesos
>  Issue Type: Task
>Reporter: Avinash Sridharan
>Assignee: Qian Zhang
>
> Currently the Mesos HTTP/HTTPS health checks hardcode the IP address 
> 127.0.0.1 when pinging containers. With IPv6 containers, even on dual-stack 
> kernels, the container will have both the IPv4 and IPv6 loopback interfaces 
> (127.0.0.1 and ::1). Further, it is up to the application's discretion to 
> open either an INET or an INET6 socket, which implies that to support IPv6 
> containers the Mesos HTTP/HTTPS health checks need to be configurable to 
> perform health checks against 127.0.0.1 or ::1.
> A proposal here would be to introduce the concept of a transport on which 
> Mesos HTTP/HTTPS health checks operate; that is, the framework specifies 
> whether Mesos HTTP health checks work over TCP or TCP6.
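For illustration, a minimal sketch of the proposed knob (hypothetical enum and 
function names, not the reviewed protobuf change):

{code}
// Sketch only: hypothetical names illustrating the proposal above.
// The framework's choice of transport decides which loopback address
// the health checker connects to inside a dual-stack container.
enum class HealthCheckProtocol { TCP, TCP6 };

std::string loopbackAddress(HealthCheckProtocol protocol)
{
  return protocol == HealthCheckProtocol::TCP6 ? "::1" : "127.0.0.1";
}
{code}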





[jira] [Commented] (MESOS-8050) Mesos HTTP/HTTPS health checks for IPv6 docker containers.

2017-10-30 Thread Avinash Sridharan (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226011#comment-16226011
 ] 

Avinash Sridharan commented on MESOS-8050:
--

Protobuf changes for `HTTP` and `TCP` health checks:
https://reviews.apache.org/r/63434/

> Mesos HTTP/HTTPS health checks for IPv6 docker containers.
> --
>
> Key: MESOS-8050
> URL: https://issues.apache.org/jira/browse/MESOS-8050
> Project: Mesos
>  Issue Type: Task
>Reporter: Avinash Sridharan
>Assignee: Qian Zhang
>
> Currently the Mesos HTTP/HTTPS health checks hardcode the IP address 
> 127.0.0.1 when pinging containers. With IPv6 containers, even on dual-stack 
> kernels, the container will have both the IPv4 and IPv6 loopback interfaces 
> (127.0.0.1 and ::1). Further, it is up to the application's discretion to 
> open either an INET or an INET6 socket, which implies that to support IPv6 
> containers the Mesos HTTP/HTTPS health checks need to be configurable to 
> perform health checks against 127.0.0.1 or ::1.
> A proposal here would be to introduce the concept of a transport on which 
> Mesos HTTP/HTTPS health checks operate; that is, the framework specifies 
> whether Mesos HTTP health checks work over TCP or TCP6.





[jira] [Updated] (MESOS-8149) Move all `markGone` logic into master.

2017-10-30 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-8149:
--
Affects Version/s: 1.5.0

> Move all `markGone` logic into master.
> --
>
> Key: MESOS-8149
> URL: https://issues.apache.org/jira/browse/MESOS-8149
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 1.5.0
>Reporter: Jie Yu
>
> Currently, this logic is split between the master's HTTP handler 
> (src/master/http.cpp) and the master itself (src/master/master.cpp).
> https://github.com/apache/mesos/blob/6ecbf02c21d3cfdb74c56cbdde5d2c5879149ae9/src/master/master.cpp#L7473
> https://github.com/apache/mesos/blob/6ecbf02c21d3cfdb74c56cbdde5d2c5879149ae9/src/master/http.cpp#L5398-L5420
> We should consider moving all logic related to marking an agent gone into 
> master.cpp.





[jira] [Created] (MESOS-8149) Move all `markGone` logic into master.

2017-10-30 Thread Jie Yu (JIRA)
Jie Yu created MESOS-8149:
-

 Summary: Move all `markGone` logic into master.
 Key: MESOS-8149
 URL: https://issues.apache.org/jira/browse/MESOS-8149
 Project: Mesos
  Issue Type: Improvement
Reporter: Jie Yu


Currently, this logic is split between the master's HTTP handler 
(src/master/http.cpp) and the master itself (src/master/master.cpp).

https://github.com/apache/mesos/blob/6ecbf02c21d3cfdb74c56cbdde5d2c5879149ae9/src/master/master.cpp#L7473
https://github.com/apache/mesos/blob/6ecbf02c21d3cfdb74c56cbdde5d2c5879149ae9/src/master/http.cpp#L5398-L5420

We should consider moving all logic related to marking an agent gone into 
master.cpp.
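For illustration, a rough sketch of the suggested split (hypothetical 
signatures, not a patch):

{code}
// Hypothetical sketch: the HTTP handler only validates and
// dispatches, while Master::markGone() owns every state transition
// for marking an agent gone, all in master.cpp.
Future<Response> Http::markAgentGone(const Request& request) const
{
  // ... authorize the caller and parse the SlaveID ...
  dispatch(master, &Master::markGone, slaveId);
  return OK();
}

void Master::markGone(const SlaveID& slaveId)
{
  // Registry update, task state transitions, framework notification,
  // and agent shutdown would all live here.
}
{code}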





[jira] [Commented] (MESOS-8130) Add placeholder handlers for offer operation feedback

2017-10-30 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16225743#comment-16225743
 ] 

Greg Mann commented on MESOS-8130:
--

Review here: https://reviews.apache.org/r/63322/

> Add placeholder handlers for offer operation feedback
> -
>
> Key: MESOS-8130
> URL: https://issues.apache.org/jira/browse/MESOS-8130
> Project: Mesos
>  Issue Type: Task
>  Components: agent, master
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
>
> In order to sketch out the flow of messages necessary to facilitate offer 
> operation feedback, we should add some empty placeholder handlers to the 
> master and agent as detailed in the [offer operation feedback design 
> doc|https://docs.google.com/document/d/1GGh14SbPTItjiweSZfann4GZ6PCteNrn-1y4pxOjgcI/edit#].





[jira] [Commented] (MESOS-8130) Add placeholder handlers for offer operation feedback

2017-10-30 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16225744#comment-16225744
 ] 

Greg Mann commented on MESOS-8130:
--

{code}
commit 6ecbf02c21d3cfdb74c56cbdde5d2c5879149ae9
Author: Greg Mann g...@mesosphere.io
Date:   Mon Oct 30 13:02:18 2017 -0700

Added placeholder handlers and other changes for operation updates.

This patch adds empty placeholder handler functions which will
be used for offer operation status updates as well as their
acknowledgement and reconciliation.

A number of switch statements are also updated to handle new
enum values and validation code is added.

Review: https://reviews.apache.org/r/63322/
{code}
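For context, the "empty placeholder handlers" the commit describes are stub 
member functions wired into message dispatch now so the logic can follow in 
later patches; a hypothetical sketch (the real names and signatures are in the 
linked review):

{code}
// Hypothetical sketch of a placeholder handler; the real handler
// names and signatures are in the linked review.
void Master::updateOperationStatus(
    const UpdateOperationStatusMessage& message)
{
  // Intentionally empty: wiring first, logic in later patches of the
  // offer operation feedback chain.
}
{code}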

> Add placeholder handlers for offer operation feedback
> -
>
> Key: MESOS-8130
> URL: https://issues.apache.org/jira/browse/MESOS-8130
> Project: Mesos
>  Issue Type: Task
>  Components: agent, master
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
>
> In order to sketch out the flow of messages necessary to facilitate offer 
> operation feedback, we should add some empty placeholder handlers to the 
> master and agent as detailed in the [offer operation feedback design 
> doc|https://docs.google.com/document/d/1GGh14SbPTItjiweSZfann4GZ6PCteNrn-1y4pxOjgcI/edit#].





[jira] [Commented] (MESOS-8131) Add new protobuf messages for offer operation feedback

2017-10-30 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16225731#comment-16225731
 ] 

Greg Mann commented on MESOS-8131:
--

{code}
commit e6bec836af3a672a0838cd6a1b7687f087d5594f
Author: Greg Mann 
Date:   Mon Oct 30 13:00:58 2017 -0700

Added protobuf messages for V1 scheduler operation feedback.

This patch adds new and updated protobuf messages to facilitate
offer operation status updates, as well as acknowledgement of
those updates and operation status reconciliation.

Review: https://reviews.apache.org/r/63321/
{code}

> Add new protobuf messages for offer operation feedback
> --
>
> Key: MESOS-8131
> URL: https://issues.apache.org/jira/browse/MESOS-8131
> Project: Mesos
>  Issue Type: Task
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
> Fix For: 1.5.0
>
>
> We should add the necessary protobuf messages for offer operation feedback as 
> detailed in the [offer operation feedback design 
> doc|https://docs.google.com/document/d/1GGh14SbPTItjiweSZfann4GZ6PCteNrn-1y4pxOjgcI/edit#].





[jira] [Commented] (MESOS-8131) Add new protobuf messages for offer operation feedback

2017-10-30 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16225729#comment-16225729
 ] 

Greg Mann commented on MESOS-8131:
--

Review here: https://reviews.apache.org/r/63321/

> Add new protobuf messages for offer operation feedback
> --
>
> Key: MESOS-8131
> URL: https://issues.apache.org/jira/browse/MESOS-8131
> Project: Mesos
>  Issue Type: Task
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
>
> We should add the necessary protobuf messages for offer operation feedback as 
> detailed in the [offer operation feedback design 
> doc|https://docs.google.com/document/d/1GGh14SbPTItjiweSZfann4GZ6PCteNrn-1y4pxOjgcI/edit#].





[jira] [Assigned] (MESOS-7616) Consider supporting changes to agent's domain without full drain.

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone reassigned MESOS-7616:
-

Assignee: Benno Evers  (was: Neil Conway)

> Consider supporting changes to agent's domain without full drain.
> -
>
> Key: MESOS-7616
> URL: https://issues.apache.org/jira/browse/MESOS-7616
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Neil Conway
>Assignee: Benno Evers
>  Labels: mesosphere
>
> In the initial review chain, any change to an agent's domain requires a full 
> drain. This is simple and straightforward, but it makes it more difficult for 
> operators to opt-in to using fault domains.
> We should consider allowing agents to transition from "no configured domain" 
> to "configured domain" without requiring an agent drain. This has some 
> complications, however: e.g., without an API for communicating changes in an 
> agent's configuration to frameworks, they might not realize that an agent's 
> domain has changed.





[jira] [Assigned] (MESOS-8148) Enforce text attribute value specification for zone and region values

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone reassigned MESOS-8148:
-

Assignee: Benno Evers

> Enforce text attribute value specification for zone and region values
> -
>
> Key: MESOS-8148
> URL: https://issues.apache.org/jira/browse/MESOS-8148
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Tim Harper
>Assignee: Benno Evers
>
> Mesos has a specification for characters allowed by attribute values:
> http://mesos.apache.org/documentation/latest/attributes-resources/
> The specification is as follows:
> {code}
> scalar : floatValue
> floatValue : ( intValue ( "." intValue )? ) | ...
> intValue : [0-9]+
> range : "[" rangeValue ( "," rangeValue )* "]"
> rangeValue : scalar "-" scalar
> set : "{" text ( "," text )* "}"
> text : [a-zA-Z0-9_/.-]
> {code}
> Marathon is [implementing IN and IS 
> constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub],
>  and includes plans to support further attribute types as it makes sense to 
> do so (IE {{a,b IS b,a}}, {{5 IN [0-10]}}). In order 
> to do this, Marathon has adopted the Mesos attribute value specification and 
> will enforce it in the validation layer. As an example, it will be possible 
> to write things like:
> {code:java}
> "constraints": [
>   ["attribute", "IN", "{value-a,value-b,value-c}"]
> ]
> {code}
> Additionally, Marathon allows one to specify constraints on non-attribute 
> properties, such as region, hostname, or zone. If somebody specified a zone 
> value with a comma, then the user would not be able to use the Mesos set 
> value type specification to describe a set of zones in which an app should be 
> deployed, which, as a consequence, would result in additional complexity (IE: 
> Marathon would need to implement an escaping mechanism for this case).
> Ideally, the character space is confined to begin with. If the text type 
> specification is sufficient, then it seems simpler to re-use it rather than 
> create another one.
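For illustration, a minimal validator for the {{text}} production quoted above 
(assuming the class is meant to repeat; the grammar line omits an explicit 
{{+}}):

{code}
#include <regex>
#include <string>

// Checks a value against the `text` production above, assuming one or
// more characters from the class (the spec omits a repetition marker).
bool isValidTextValue(const std::string& value)
{
  static const std::regex pattern("[a-zA-Z0-9_/.-]+");
  return std::regex_match(value, pattern);
}
{code}

Under this rule a zone value containing a comma (e.g. "us-east-1a,us-east-1b") 
is rejected outright, which is exactly the confinement the ticket argues for.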





[jira] [Updated] (MESOS-8148) Enforce text attribute value specification for zone and region values

2017-10-30 Thread Tim Harper (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Harper updated MESOS-8148:
--
Description: 
Mesos has a specification for characters allowed by attribute values:

http://mesos.apache.org/documentation/latest/attributes-resources/

The specification is as follows:

{code}
scalar : floatValue

floatValue : ( intValue ( "." intValue )? ) | ...

intValue : [0-9]+

range : "[" rangeValue ( "," rangeValue )* "]"

rangeValue : scalar "-" scalar

set : "{" text ( "," text )* "}"

text : [a-zA-Z0-9_/.-]
{code}

Marathon is [implementing IN and IS 
constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub],
 and includes plans to support further attribute types as it makes sense to do 
so (IE {{a,b IS b,a}}, {{5 IN [0-10]}}). In order to do 
this, Marathon has adopted the Mesos attribute value specification and will 
enforce it in the validation layer. As an example, it will be possible to write 
things like:

{code:java}
"constraints": [
  ["attribute", "IN", "{value-a,value-b,value-c}"]
]
{code}

Additionally, Marathon allows one to specify constraints on non-attribute 
properties, such as region, hostname, or zone. If somebody specified a zone 
value with a comma, then the user would not be able to use the Mesos set value 
type specification to describe a set of zones in which an app should be 
deployed, which, as a consequence, would result in additional complexity (IE: 
Marathon would need to implement an escaping mechanism for this case).

Ideally, the character space is confined to begin with. If the text type 
specification is sufficient, then it seems simpler to re-use it rather than 
create another one.

  was:
Mesos has a specification for characters allowed by attribute values:

http://mesos.apache.org/documentation/latest/attributes-resources/

Marathon is [implementing IN and IS 
constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub],
 and includes plans to support further attribute types as it makes sense to do 
so (IE {{a,b IS b,a}}, {{5 IN [0-10]}}). In order to do 
this, Marathon has adopted the Mesos attribute value specification and will 
enforce it in the validation layer. As an example, it will be possible to write 
things like:

{code:java}
"constraints": [
  ["attribute", "IN", "{value-a,value-b,value-c}"]
]
{code}

Additionally, Marathon allows one to specify constraints on non-attribute 
properties, such as region, hostname, or zone. If somebody specified a zone 
value with a comma, then the user would not be able to use the Mesos set value 
type specification to describe a set of zones in which an app should be 
deployed, which, as a consequence, would result in additional complexity (IE: 
Marathon would need to implement an escaping mechanism for this case).

Ideally, the character space is confined to begin with. If the text type 
specification is sufficient, then it seems simpler to re-use it rather than 
create another one.


> Enforce text attribute value specification for zone and region values
> -
>
> Key: MESOS-8148
> URL: https://issues.apache.org/jira/browse/MESOS-8148
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Tim Harper
>
> Mesos has a specification for characters allowed by attribute values:
> http://mesos.apache.org/documentation/latest/attributes-resources/
> The specification is as follows:
> {code}
> scalar : floatValue
> floatValue : ( intValue ( "." intValue )? ) | ...
> intValue : [0-9]+
> range : "[" rangeValue ( "," rangeValue )* "]"
> rangeValue : scalar "-" scalar
> set : "{" text ( "," text )* "}"
> text : [a-zA-Z0-9_/.-]
> {code}
> Marathon is [implementing IN and IS 
> constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub],
>  and includes plans to support further attribute types as it makes sense to 
> do so (IE {{a,b IS b,a}}, {{5 IN [0-10]}}). In order 
> to do this, Marathon has adopted the Mesos attribute value specification and 
> will enforce it in the validation layer. As an example, it will be possible 
> to write things like:
> {code:java}
> "constraints": [
>   ["attribute", "IN", "{value-a,value-b,value-c}"]
> ]
> {code}
> Additionally, Marathon allows one to specify constraints on non-attribute 
> properties, such as region, hostname, or zone. If somebody specified a zone 
> value with a comma, then the user would not be able to use the Mesos set 
> value type specification to describe a set of zones in which an app should be 
> deployed, which, as a consequence, would result in additional complexity (IE: 
> Marathon would need to implement an 

[jira] [Updated] (MESOS-8148) Enforce text attribute value specification for zone and region values

2017-10-30 Thread Tim Harper (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Harper updated MESOS-8148:
--
Description: 
Mesos has a specification for characters allowed by attribute values:

http://mesos.apache.org/documentation/latest/attributes-resources/

Marathon is [implementing IN and IS 
constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub],
 and includes plans to support further attribute types as it makes sense to do 
so (IE {{a,b IS b,a}}, {{5 IN [0-10]}}). In order to do 
this, Marathon has adopted the Mesos attribute value specification and will 
enforce it in the validation layer. As an example, it will be possible to write 
things like:

{code:java}
"constraints": [
  ["attribute", "IN", "{value-a,value-b,value-c}"]
]
{code}

Additionally, Marathon allows one to specify constraints on non-attribute 
properties, such as region, hostname, or zone. If somebody specified a zone 
value with a comma, then the user would not be able to use the Mesos set value 
type specification to describe a set of zones in which an app should be 
deployed, which, as a consequence, would result in additional complexity (IE: 
Marathon would need to implement an escaping mechanism for this case).

Ideally, the character space is confined to begin with. If the text type 
specification is sufficient, then it seems simpler to re-use it rather than 
create another one.

  was:
Mesos has a specification for characters allowed by attribute values:

http://mesos.apache.org/documentation/latest/attributes-resources/

Marathon is [implementing IN and IS 
constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub],
 and includes plans to support further attribute types as it makes sense to do 
so (IE {{a,b IS b,a}}, {{5 IN [0-10]}}). In order to do 
this, Marathon has adopted the Mesos attribute value specification and will 
enforce it in the validation layer. As an example, it will be possible to write 
things like:

{code:java}
"constraints": [
  ["attribute", "IN", "{value-a,value-b,value-c}"]
]
{code}

Additionally, Marathon allows one to specify constraints on non-attribute 
properties, such as region, hostname, or zone. If somebody specified a zone 
value with a comma, then the user would not be able to use the Mesos set value 
type specification to describe a set of zones in which an app would be 
deployed, which would result in additional complexity (IE: Marathon would need 
to implement an escaping mechanism for this case).

Ideally, the character space is confined to begin with. If the text type 
specification is sufficient, then it seems simpler to re-use it rather than 
create another one.


> Enforce text attribute value specification for zone and region values
> -
>
> Key: MESOS-8148
> URL: https://issues.apache.org/jira/browse/MESOS-8148
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Tim Harper
>
> Mesos has a specification for characters allowed by attribute values:
> http://mesos.apache.org/documentation/latest/attributes-resources/
> Marathon is [implementing IN and IS 
> constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub],
>  and includes plans to support further attribute types as it makes sense to 
> do so (IE {{a,b IS b,a}}, {{5 IN [0-10]}}). In order 
> to do this, Marathon has adopted the Mesos attribute value specification and 
> will enforce it in the validation layer. As an example, it will be possible 
> to write things like:
> {code:java}
> "constraints": [
>   ["attribute", "IN", "{value-a,value-b,value-c}"]
> ]
> {code}
> Additionally, Marathon allows one to specify constraints on non-attribute 
> properties, such as region, hostname, or zone. If somebody specified a zone 
> value with a comma, then the user would not be able to use the Mesos set 
> value type specification to describe a set of zones in which an app should be 
> deployed, which, as a consequence, would result in additional complexity (IE: 
> Marathon would need to implement an escaping mechanism for this case).
> Ideally, the character space is confined to begin with. If the text type 
> specification is sufficient, then it seems simpler to re-use it rather than 
> create another one.





[jira] [Updated] (MESOS-8148) Enforce text attribute value specification for zone and region values

2017-10-30 Thread Tim Harper (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Harper updated MESOS-8148:
--
Description: 
Mesos has a specification for characters allowed by attribute values:

http://mesos.apache.org/documentation/latest/attributes-resources/

Marathon is [implementing IN and IS 
constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub],
 and includes plans to support further attribute types as it makes sense to do 
so (IE {{a,b IS b,a}}, {{5 IN [0-10]}}). In order to do 
this, Marathon has adopted the Mesos attribute value specification and will 
enforce it in the validation layer. As an example, it will be possible to write 
things like:

{code:java}
"constraints": [
  ["attribute", "IN", "{value-a,value-b,value-c}"]
]
{code}

Additionally, Marathon allows one to specify constraints on non-attribute 
properties, such as region, hostname, or zone. If somebody specified a zone 
value with a comma, then the user would not be able to use the Mesos set value 
type specification to describe a set of zones in which an app would be 
deployed, which would result in additional complexity (IE: Marathon would need 
to implement an escaping mechanism for this case).

Ideally, the character space is confined to begin with. If the text type 
specification is sufficient, then it seems simpler to re-use it rather than 
create another one.

  was:
Mesos has a specification for characters allowed by attribute values:

http://mesos.apache.org/documentation/latest/attributes-resources/

Marathon is [implementing IN and IS 
constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub],
 and includes plans to support further attribute types as it makes sense to do 
so (IE {{\{a,b\} IS \{b,a\} }}, 5 IN [0-10]). In order to do this, Marathon has 
adopted the Mesos attribute value specification and will enforce it in the 
validation layer. As an example, it will be possible to write things like:

{code:java}
"constraints": [
  ["attribute", "IN", "{value-a,value-b,value-c}"]
]
{code}

Additionally, Marathon allows one to specify constraints on non-attribute 
properties, such as region, hostname, or zone. If somebody specified a zone 
value with a comma, then the user would not be able to use the Mesos set value 
type specification to describe a set of zones in which an app would be 
deployed, which would result in additional complexity (IE: Marathon would need 
to implement an escaping mechanism for this case).

Ideally, the character space is confined to begin with. If the text type 
specification is sufficient, then it seems simpler to re-use it rather than 
create another one.


> Enforce text attribute value specification for zone and region values
> -
>
> Key: MESOS-8148
> URL: https://issues.apache.org/jira/browse/MESOS-8148
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Tim Harper
>
> Mesos has a specification for characters allowed by attribute values:
> http://mesos.apache.org/documentation/latest/attributes-resources/
> Marathon is [implementing IN and IS 
> constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub],
>  and includes plans to support further attribute types as it makes sense to 
> do so (IE {{a,b IS b,a}}, {{5 IN [0-10]}}). In order 
> to do this, Marathon has adopted the Mesos attribute value specification and 
> will enforce it in the validation layer. As an example, it will be possible 
> to write things like:
> {code:java}
> "constraints": [
>   ["attribute", "IN", "{value-a,value-b,value-c}"]
> ]
> {code}
> Additionally, Marathon allows one to specify constraints on non-attribute 
> properties, such as region, hostname, or zone. If somebody specified a zone 
> value with a comma, then the user would not be able to use the Mesos set 
> value type specification to describe a set of zones in which an app would be 
> deployed, which would result in additional complexity (IE: Marathon would need 
> to implement an escaping mechanism for this case).
> Ideally, the character space is confined to begin with. If the text type 
> specification is sufficient, then it seems simpler to re-use it rather than 
> create another one.





[jira] [Updated] (MESOS-8148) Enforce text attribute value specification for zone and region values

2017-10-30 Thread Tim Harper (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Harper updated MESOS-8148:
--
Description: 
Mesos has a specification for characters allowed by attribute values:

http://mesos.apache.org/documentation/latest/attributes-resources/

Marathon is [implementing IN and IS 
constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub],
 and includes plans to support further attribute types as it makes sense to do 
so (IE {{\{a,b\} IS \{b,a\} }}, 5 IN [0-10]). In order to do this, Marathon has 
adopted the Mesos attribute value specification and will enforce it in the 
validation layer. As an example, it will be possible to write things like:

{code:java}
"constraints": [
  ["attribute", "IN", "{value-a,value-b,value-c}"]
]
{code}

Additionally, Marathon allows one to specify constraints on non-attribute 
properties, such as region, hostname, or zone. If somebody specified a zone 
value with a comma, then the user would not be able to use the Mesos set value 
type specification to describe a set of zones in which an app would be 
deployed, which would result in additional complexity (IE: Marathon would need 
to implement an escaping mechanism for this case).

Ideally, the character space is confined to begin with. If the text type 
specification is sufficient, then it seems simpler to re-use it rather than 
create another one.

  was:
Mesos has a specification for characters allowed by attribute values:

http://mesos.apache.org/documentation/latest/attributes-resources/

Marathon is [implementing IN and IS 
constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub],
 and includes plans to support further attribute types as it makes sense to do 
so (IE {a,b} IS {b,a}, 5 IN [0-10]). In order to do this, Marathon has adopted 
the Mesos attribute value specification and will enforce it in the validation 
layer. As an example, it will be possible to write things like:

{code:java}
"constraints": [
  ["attribute", "IN", "{value-a,value-b,value-c}"]
]
{code}

Additionally, Marathon allows one to specify constraints on non-attribute 
properties, such as region, hostname, or zone. If somebody specified a zone 
value with a comma, then the user would not be able to use the Mesos set value 
type specification to describe a set of zones in which an app would be 
deployed, which would result in additional complexity (IE: Marathon would need 
to implement an escaping mechanism for this case).

Ideally, the character space is confined to begin with. If the text type 
specification is sufficient, then it seems simpler to re-use it rather than 
create another one.


> Enforce text attribute value specification for zone and region values
> -
>
> Key: MESOS-8148
> URL: https://issues.apache.org/jira/browse/MESOS-8148
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Tim Harper
>
> Mesos has a specification for characters allowed by attribute values:
> http://mesos.apache.org/documentation/latest/attributes-resources/
> Marathon is [implementing IN and IS 
> constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub],
>  and includes plans to support further attribute types as it makes sense to 
> do so (IE {{\{a,b\} IS \{b,a\} }}, 5 IN [0-10]). In order to do this, 
> Marathon has adopted the Mesos attribute value specification and will enforce 
> it in the validation layer. As an example, it will be possible to write 
> things like:
> {code:java}
> "constraints": [
>   ["attribute", "IN", "{value-a,value-b,value-c}"]
> ]
> {code}
> Additionally, Marathon allows one to specify constraints on non-attribute 
> properties, such as region, hostname, or zone. If somebody specified a zone 
> value with a comma, then the user would not be able to use the Mesos set 
> value type specification to describe a set of zones in which an app would be 
> deployed, which would result in additional complexity (IE: Marathon would need 
> to implement an escaping mechanism for this case).
> Ideally, the character space is confined to begin with. If the text type 
> specification is sufficient, then it seems simpler to re-use it rather than 
> create another one.





[jira] [Created] (MESOS-8148) Enforce text attribute value specification for zone and region values

2017-10-30 Thread Tim Harper (JIRA)
Tim Harper created MESOS-8148:
-

 Summary: Enforce text attribute value specification for zone and 
region values
 Key: MESOS-8148
 URL: https://issues.apache.org/jira/browse/MESOS-8148
 Project: Mesos
  Issue Type: Improvement
Reporter: Tim Harper


Mesos has a specification for characters allowed by attribute values:

http://mesos.apache.org/documentation/latest/attributes-resources/

Marathon is [implementing IN and IS 
constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub],
 and includes plans to support further attribute types as it makes sense to do 
so (IE {a,b} IS {b,a}, 5 IN [0-10]). In order to do this, Marathon has adopted 
the Mesos attribute value specification and will enforce it in the validation 
layer. As an example, it will be possible to write things like:

{code:java}
"constraints": [
  ["attribute", "IN", "{value-a,value-b,value-c}"]
]
{code}

Additionally, Marathon allows one to specify constraints on non-attribute 
properties, such as region, hostname, or zone. If somebody specified a zone 
value with a comma, then the user would not be able to use the Mesos set value 
type specification to describe a set of zones in which an app would be 
deployed, which would result in additional complexity (IE: Marathon would need 
to implement an escaping mechanism for this case).

Ideally, the character space is confined to begin with. If the text type 
specification is sufficient, then it seems simpler to re-use it rather than 
create another one.





[jira] [Commented] (MESOS-7506) Multiple tests leave orphan containers.

2017-10-30 Thread Andrei Budnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16225515#comment-16225515
 ] 

Andrei Budnik commented on MESOS-7506:
--

Some tests (from {{SlaveTest}} and {{SlaveRecoveryTest}}) have a pattern [like 
this|https://github.com/apache/mesos/blob/ff01d0c44251e2ffaa2f4f47b33c790594d194d9/src/tests/slave_tests.cpp#L393-L406],
 where the clock is advanced by {{executor_registration_timeout}} and the test 
then waits in a loop until a task status update is sent. This loop executes 
while the container is being destroyed. Container destruction consists of 
multiple steps, one of which waits for [cgroups 
destruction|https://github.com/apache/mesos/blob/ff01d0c44251e2ffaa2f4f47b33c790594d194d9/src/slave/containerizer/mesos/linux_launcher.cpp#L567].
 That means we have a race between the container destruction process and the 
loop that advances the clock, with the following possible outcomes:
# The container is completely destroyed before the advancing clock reaches a 
timeout (e.g. {{cgroups::DESTROY_TIMEOUT}}).
# A timeout is triggered by the advancing clock before container destruction 
completes. That results in [leaving 
orphaned|https://github.com/apache/mesos/blob/ff01d0c44251e2ffaa2f4f47b33c790594d194d9/src/slave/containerizer/mesos/containerizer.cpp#L2367-L2380]
 containers, which are detected by the [Slave 
destructor|https://github.com/apache/mesos/blob/ff01d0c44251e2ffaa2f4f47b33c790594d194d9/src/tests/cluster.cpp#L559-L584]
 in {{tests/cluster.cpp}}, so the test fails.

The issue is easily reproduced by advancing the clock by 60 seconds or more in 
the loop that waits for a status update.
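For illustration, a condensed sketch of the racy pattern (paraphrased from the 
linked test; {{Clock}} is the libprocess paced-clock API and {{statusUpdate}} 
stands for the future of the expected status update):

{code}
// Condensed sketch of the racy test pattern, not the exact source.
Clock::pause();
Clock::advance(flags.executor_registration_timeout);

// Wait for the status update. Each advance can also fire timers
// inside the in-flight container destruction (e.g. the cgroups
// destroy timeout), which produces outcome 2 above.
while (statusUpdate.isPending()) {
  Clock::advance(Seconds(1));
  Clock::settle();
}
{code}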

> Multiple tests leave orphan containers.
> ---
>
> Key: MESOS-7506
> URL: https://issues.apache.org/jira/browse/MESOS-7506
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: Ubuntu 16.04
> Fedora 23
> other Linux distros
>Reporter: Alexander Rukletsov
>Assignee: Andrei Budnik
>  Labels: containerizer, flaky-test, mesosphere
>
> I've observed a number of flaky tests that leave orphan containers upon 
> cleanup. A typical log looks like this:
> {noformat}
> ../../src/tests/cluster.cpp:580: Failure
> Value of: containers->empty()
>   Actual: false
> Expected: true
> Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 }
> {noformat}





[jira] [Updated] (MESOS-8054) Feedback for offer operations

2017-10-30 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-8054:
-
Shepherd: Greg Mann

> Feedback for offer operations
> -
>
> Key: MESOS-8054
> URL: https://issues.apache.org/jira/browse/MESOS-8054
> Project: Mesos
>  Issue Type: Epic
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>
> Only LAUNCH operations provide feedback on success or failure; all operations 
> should do so. RESERVE, UNRESERVE, CREATE, DESTROY, CREATE_VOLUME, and 
> DESTROY_VOLUME should all provide feedback on success or failure.





[jira] [Updated] (MESOS-4945) Garbage collect unused docker layers in the store.

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-4945:
--
Sprint: Mesosphere Sprint 65, Mesosphere Sprint 66  (was: Mesosphere Sprint 
65, Mesosphere Sprint 66, Mesosphere Sprint 67)

> Garbage collect unused docker layers in the store.
> --
>
> Key: MESOS-4945
> URL: https://issues.apache.org/jira/browse/MESOS-4945
> Project: Mesos
>  Issue Type: Epic
>Reporter: Jie Yu
>Assignee: Zhitao Li
>  Labels: Mesosphere
>
> Right now, we don't have any garbage collection in place for docker layers. 
> It's not straightforward to implement because we don't know which containers 
> are currently using a given layer. We probably need a way to track the 
> current usage of layers.





[jira] [Created] (MESOS-8147) Rename systemd/upstart unit files from 'slave' to 'agent'.

2017-10-30 Thread Kapil Arya (JIRA)
Kapil Arya created MESOS-8147:
-

 Summary: Rename systemd/upstart unit files from 'slave' to 'agent'.
 Key: MESOS-8147
 URL: https://issues.apache.org/jira/browse/MESOS-8147
 Project: Mesos
  Issue Type: Task
Reporter: Kapil Arya


The following three files in support/packaging/common should be renamed:
* mesos-slave
* mesos-slave.service
* mesos-slave.upstart

This will require changes to the RPM specfile as well. Perhaps we can use the 
systemd service name alias feature to accomplish some of it.





[jira] [Updated] (MESOS-8054) Feedback for offer operations

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8054:
--
Sprint: Mesosphere Sprint 65, Mesosphere Sprint 66  (was: Mesosphere Sprint 
65, Mesosphere Sprint 66, Mesosphere Sprint 67)

> Feedback for offer operations
> -
>
> Key: MESOS-8054
> URL: https://issues.apache.org/jira/browse/MESOS-8054
> Project: Mesos
>  Issue Type: Epic
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>
> Only LAUNCH operations provide feedback on success or failure; all operations 
> should do so. RESERVE, UNRESERVE, CREATE, DESTROY, CREATE_VOLUME, and 
> DESTROY_VOLUME should all provide feedback on success or failure.





[jira] [Updated] (MESOS-8140) Executors should clear their auth tokens

2017-10-30 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-8140:
-
Shepherd: Greg Mann
  Labels: security  (was: )

> Executors should clear their auth tokens
> 
>
> Key: MESOS-8140
> URL: https://issues.apache.org/jira/browse/MESOS-8140
> Project: Mesos
>  Issue Type: Bug
>  Components: executor, security
>Reporter: James Peach
>Assignee: James Peach
>  Labels: security
> Fix For: 1.5.0
>
>
> The built-in executors should clear {{MESOS_EXECUTOR_AUTHENTICATION_TOKEN}} 
> from their environment since otherwise tasks running as the same user in the 
> same container can trivially inspect it.
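For illustration, a minimal sketch of the shape such a fix could take (assuming 
stout's {{os::getenv}}/{{os::unsetenv}}; the actual patch may differ):

{code}
// Sketch: read the token once at startup, then scrub it from the
// environment so tasks sharing the container cannot read it.
// The actual patch may differ.
Option<std::string> token =
  os::getenv("MESOS_EXECUTOR_AUTHENTICATION_TOKEN");

if (token.isSome()) {
  os::unsetenv("MESOS_EXECUTOR_AUTHENTICATION_TOKEN");
}
{code}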





[jira] [Updated] (MESOS-7506) Multiple tests leave orphan containers.

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7506:
--
Story Points: 3

Adding 3 story points for now. [~abudnik] Please update as you see fit.

> Multiple tests leave orphan containers.
> ---
>
> Key: MESOS-7506
> URL: https://issues.apache.org/jira/browse/MESOS-7506
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: Ubuntu 16.04
> Fedora 23
> other Linux distros
>Reporter: Alexander Rukletsov
>Assignee: Andrei Budnik
>  Labels: containerizer, flaky-test, mesosphere
>
> I've observed a number of flaky tests that leave orphan containers upon 
> cleanup. A typical log looks like this:
> {noformat}
> ../../src/tests/cluster.cpp:580: Failure
> Value of: containers->empty()
>   Actual: false
> Expected: true
> Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 }
> {noformat}





[jira] [Updated] (MESOS-7790) Design hierarchical quota allocation.

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7790:
--
Story Points: 8

> Design hierarchical quota allocation.
> -
>
> Key: MESOS-7790
> URL: https://issues.apache.org/jira/browse/MESOS-7790
> Project: Mesos
>  Issue Type: Task
>  Components: allocation
>Reporter: Benjamin Mahler
>Assignee: Michael Park
>  Labels: multitenancy
>
> When quota is assigned in the role hierarchy (see MESOS-6375), it's possible 
> for there to be "undelegated" quota for a role. For example:
> {noformat}
>                     ^
>                   /   \
>                 /       \
>       eng (90 cpus)   sales (10 cpus)
>            ^
>          /   \
>        /       \
>   ads (50 cpus)   build (10 cpus)
> {noformat}
> Here, the "eng" role has 60 of its 90 cpus of quota delegated to its 
> children, and 30 cpus remain undelegated. We need to design how to allocate 
> these 30 cpus undelegated cpus. Are they allocated entirely to the "eng" 
> role? Are they allocated to the "eng" role tree? If so, how do we determine 
> how much is allocated to each role in the "eng" tree (i.e. "eng", "eng/ads", 
> "eng/build").





[jira] [Updated] (MESOS-7966) check for maintenance on agent causes fatal error

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7966:
--
Sprint: Mesosphere Sprint 66  (was: Mesosphere Sprint 66, Mesosphere Sprint 
67)

> check for maintenance on agent causes fatal error
> -
>
> Key: MESOS-7966
> URL: https://issues.apache.org/jira/browse/MESOS-7966
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.1.0
>Reporter: Rob Johnson
>Assignee: Armand Grillet
>Priority: Blocker
>  Labels: reliability
>
> We interact with the maintenance API frequently to orchestrate gracefully 
> draining agents of tasks without impacting service availability.
> Occasionally we seem to trigger a fatal error in Mesos when interacting with 
> the api. This happens relatively frequently, and impacts us when downstream 
> frameworks (marathon) react badly to leader elections.
> Here is the log line that we see when the master dies:
> {code}
> F0911 12:18:49.543401 123748 hierarchical.cpp:872] Check failed: 
> slaves[slaveId].maintenance.isSome()
> {code}
> It's quite possible we're using the maintenance API in the wrong way. We're 
> happy to provide any other logs you need - please let me know what would be 
> useful for debugging.
> Thanks.





[jira] [Updated] (MESOS-7991) fatal, check failed !framework->recovered()

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7991:
--
Story Points: 3

> fatal, check failed !framework->recovered()
> ---
>
> Key: MESOS-7991
> URL: https://issues.apache.org/jira/browse/MESOS-7991
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jack Crawford
>Assignee: Armand Grillet
>Priority: Blocker
>  Labels: reliability
>
> The Mesos master crashed during what appears to be framework recovery.
> mesos master version: 1.3.1
> mesos agent version: 1.3.1
> {code}
> W0920 14:58:54.756364 25452 master.cpp:7568] Task 
> 862181ec-dffb-4c03-8807-5fb4c4e9a907 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756369 25452 master.cpp:7568] Task 
> 9c21c48a-63ad-4d58-9e22-f720af19a644 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756376 25452 master.cpp:7568] Task 
> 05c451f8-c48a-47bd-a235-0ceb9b3f8d0c of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756381 25452 master.cpp:7568] Task 
> e8641b1f-f67f-42fe-821c-09e5a290fc60 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756386 25452 master.cpp:7568] Task 
> f838a03c-5cd4-47eb-8606-69b004d89808 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756392 25452 master.cpp:7568] Task 
> 685ca5da-fa24-494d-a806-06e03bbf00bd of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756397 25452 master.cpp:7568] Task 
> 65ccf39b-5c46-4121-9fdd-21570e8068e6 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> F0920 14:58:54.756404 25452 master.cpp:7601] Check failed: 
> !framework->recovered()
> *** Check failure stack trace: ***
> @ 0x7f7bf80087ed  google::LogMessage::Fail()
> @ 0x7f7bf800a5a0  google::LogMessage::SendToLog()
> @ 0x7f7bf80083d3  google::LogMessage::Flush()
> @ 0x7f7bf800afc9  google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f7bf736fe7e  
> mesos::internal::master::Master::reconcileKnownSlave()
> @ 0x7f7bf739e612  mesos::internal::master::Master::_reregisterSlave()
> @ 0x7f7bf73a580e  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master6MasterERKNS5_9SlaveInfoERKNS0_4UPIDERK6OptionINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIc
> RKSt6vectorINS5_8ResourceESaISQ_EERKSP_INS5_12ExecutorInfoESaISV_EERKSP_INS5_4TaskESaIS10_EERKSP_INS5_13FrameworkInfoESaIS15_EERKSP_INS6_17Archive_FrameworkESaIS1A_EERKSL_RKSP_INS5_20SlaveInfo_CapabilityESaIS
> 1H_EERKNS0_6FutureIbEES9_SC_SM_SS_SX_S12_S17_S1C_SL_S1J_S1N_EEvRKNS0_3PIDIT_EEMS1R_FvT0_T1_T2_T3_T4_T5_T6_T7_T8_T9_T10_ET11_T12_T13_T14_T15_T16_T17_T18_T19_T20_T21_EUlS2_E_E9_M_invokeERKSt9_Any_dataOS2_
> @ 0x7f7bf7f5e69c  process::ProcessBase::visit()
> @ 0x7f7bf7f71403  process::ProcessManager::resume()
> @ 0x7f7bf7f7c127  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> @ 0x7f7bf60b5c80  (unknown)
> @ 0x7f7bf58c86ba  start_thread
> @ 0x7f7bf55fe3dd  (unknown)
> mesos-master.service: Main process exited, code=killed, status=6/ABRT
> mesos-master.service: Unit entered failed state.
> mesos-master.service: Failed with result 'signal'.
> {code}





[jira] [Updated] (MESOS-7905) GarbageCollectorIntegrationTest.ExitedFramework is flaky

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7905:
--
Sprint: Mesosphere Sprint 62, Mesosphere Sprint 63, Mesosphere Sprint 64, 
Mesosphere Sprint 66, Mesosphere Sprint 67  (was: Mesosphere Sprint 62, 
Mesosphere Sprint 63, Mesosphere Sprint 64, Mesosphere Sprint 66)

> GarbageCollectorIntegrationTest.ExitedFramework is flaky
> 
>
> Key: MESOS-7905
> URL: https://issues.apache.org/jira/browse/MESOS-7905
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Vinod Kone
>Assignee: Kapil Arya
>
> Observed this on ASF CI.
> {code}
> [ RUN  ] GarbageCollectorIntegrationTest.ExitedFramework
> I0818 23:51:42.881799  5882 cluster.cpp:162] Creating default 'local' 
> authorizer
> I0818 23:51:42.884285  5907 master.cpp:442] Master 
> 6d3f4c59-27e2-4701-9f7f-7c1f301e7fba (ef22537e2401) started on 
> 172.17.0.3:57495
> I0818 23:51:42.884332  5907 master.cpp:444] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/rYJzr3/credentials" 
> --filter_gpu_resources="true" --framework_sorter="drf" --help="false" 
> --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-1.4.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/rYJzr3/master" --zk_session_timeout="10secs"
> I0818 23:51:42.884627  5907 master.cpp:494] Master only allowing 
> authenticated frameworks to register
> I0818 23:51:42.884644  5907 master.cpp:508] Master only allowing 
> authenticated agents to register
> I0818 23:51:42.884658  5907 master.cpp:521] Master only allowing 
> authenticated HTTP frameworks to register
> I0818 23:51:42.884774  5907 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/rYJzr3/credentials'
> I0818 23:51:42.885066  5907 master.cpp:566] Using default 'crammd5' 
> authenticator
> I0818 23:51:42.885213  5907 http.cpp:1026] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0818 23:51:42.885382  5907 http.cpp:1026] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0818 23:51:42.885512  5907 http.cpp:1026] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0818 23:51:42.885640  5907 master.cpp:646] Authorization enabled
> I0818 23:51:42.885818  5903 hierarchical.cpp:171] Initialized hierarchical 
> allocator process
> I0818 23:51:42.886016  5905 whitelist_watcher.cpp:77] No whitelist given
> I0818 23:51:42.889050  5908 master.cpp:2163] Elected as the leading master!
> I0818 23:51:42.889081  5908 master.cpp:1702] Recovering from registrar
> I0818 23:51:42.889387  5909 registrar.cpp:347] Recovering registrar
> I0818 23:51:42.889838  5909 registrar.cpp:391] Successfully fetched the 
> registry (0B) in 409856ns
> I0818 23:51:42.889966  5909 registrar.cpp:495] Applied 1 operations in 
> 38859ns; attempting to update the registry
> I0818 23:51:42.890450  5909 registrar.cpp:552] Successfully updated the 
> registry in 425216ns
> I0818 23:51:42.890552  5909 registrar.cpp:424] Successfully recovered 
> registrar
> I0818 23:51:42.890890  5909 master.cpp:1801] Recovered 0 agents from the 
> registry (129B); allowing 10mins for agents to re-register
> I0818 23:51:42.890969  5910 hierarchical.cpp:209] Skipping recovery of 
> hierarchical allocator: nothing to recover
> I0818 23:51:42.895795  5882 process.cpp:3228] Attempting to spawn already 
> spawned process files@172.17.0.3:57495
> I0818 23:51:42.896057  5882 cluster.cpp:448] Creating default 'local' 
> authorizer
> I0818 23:51:42.897809  5904 slave.cpp:250] Mesos agent started on 
> (85)@172.17.0.3:57495
> I0818 23:51:42.897848  5904 slave.cpp:251] Flags at startup: --acls="" 
> --appc_simple_discovery_uri_prefix="http://; 
> 

[jira] [Updated] (MESOS-8135) Masters can lose track of executor IDs.

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8135:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67  (was: Mesosphere Sprint 
66)

> Masters can lose track of executor IDs.
> ---
>
> Key: MESOS-8135
> URL: https://issues.apache.org/jira/browse/MESOS-8135
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>  Labels: mesosphere, mesosphere-oncall
>
> The response to the master's {{state}} endpoint sometimes doesn't include the 
> executor id corresponding to some tasks.
> Agents send a list of tasks to the leading master when reregistering. Tasks 
> that are not started by the command executor should contain an executor id, 
> but the following snippet can sometimes clear the executor id of tasks 
> started by other executors: 
> https://github.com/apache/mesos/blob/1.4.0/src/slave/slave.cpp#L1515-L1522
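> A rough sketch of the kind of guard that would avoid this (the type and 
> field names below are hypothetical, not code from {{slave.cpp}}):
> {code}
> #include <vector>
>
> // Hypothetical task representation; illustrative only.
> struct Task
> {
>   bool launchedByCommandExecutor;
>   bool hasExecutorId;
> };
>
> // Only clear the executor id for tasks that were actually launched by
> // the command executor, instead of doing so unconditionally while
> // processing a re-registration.
> void stripCommandExecutorIds(std::vector<Task>& tasks)
> {
>   for (Task& task : tasks) {
>     if (task.launchedByCommandExecutor) {
>       task.hasExecutorId = false;  // i.e. clear_executor_id()
>     }
>   }
> }
> {code}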



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8051) Killing TASK_GROUP fail to kill some tasks

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8051:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67  (was: Mesosphere Sprint 
66)

> Killing TASK_GROUP fail to kill some tasks
> --
>
> Key: MESOS-8051
> URL: https://issues.apache.org/jira/browse/MESOS-8051
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, executor
>Affects Versions: 1.4.0
>Reporter: A. Dukhovniy
>Assignee: Qian Zhang
>Priority: Critical
> Attachments: dcos-mesos-master.log.gz, dcos-mesos-slave.log.gz, 
> screenshot-1.png
>
>
> When starting following pod definition via marathon:
> {code:java}
> {
>   "id": "/simple-pod",
>   "scaling": {
> "kind": "fixed",
> "instances": 3
>   },
>   "environment": {
> "PING": "PONG"
>   },
>   "containers": [
> {
>   "name": "ct1",
>   "resources": {
> "cpus": 0.1,
> "mem": 32
>   },
>   "image": {
> "kind": "MESOS",
> "id": "busybox"
>   },
>   "exec": {
> "command": {
>   "shell": "while true; do echo the current time is $(date) > 
> ./test-v1/clock; sleep 1; done"
> }
>   },
>   "volumeMounts": [
> {
>   "name": "v1",
>   "mountPath": "test-v1"
> }
>   ]
> },
> {
>   "name": "ct2",
>   "resources": {
> "cpus": 0.1,
> "mem": 32
>   },
>   "exec": {
> "command": {
>   "shell": "while true; do echo -n $PING ' '; cat ./etc/clock; sleep 
> 1; done"
> }
>   },
>   "volumeMounts": [
> {
>   "name": "v1",
>   "mountPath": "etc"
> },
> {
>   "name": "v2",
>   "mountPath": "docker"
> }
>   ]
> }
>   ],
>   "networks": [
> {
>   "mode": "host"
> }
>   ],
>   "volumes": [
> {
>   "name": "v1"
> },
> {
>   "name": "v2",
>   "host": "/var/lib/docker"
> }
>   ]
> }
> {code}
> Mesos will successfully kill all {{ct2}} containers but fail to kill some or 
> all of the {{ct1}} containers. I've attached both master and agent logs. The 
> interesting part starts after marathon issues 6 kills:
> {code:java}
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.209966  4746 master.cpp:5297] Processing 
> KILL call for task 'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853d
> bf20.ct1' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) 
> at scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210033  4746 master.cpp:5371] Telling 
> agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
> 10.0.1.207) to kill task 
> simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct1 of framework 
> bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
> scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
> .229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210471  4748 master.cpp:5297] Processing 
> KILL call for task 'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853d
> bf20.ct2' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) 
> at scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210518  4748 master.cpp:5371] Telling 
> agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
> 10.0.1.207) to kill task 
> simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct2 of framework 
> bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
> scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
> .229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210602  4748 master.cpp:5297] Processing 
> KILL call for task 'simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853d
> bf20.ct1' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) 
> at scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210639  4748 master.cpp:5371] Telling 
> agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
> 10.0.1.207) to kill task 
> simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct1 of framework 
> bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
> scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
> .229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210932  

[jira] [Updated] (MESOS-7790) Design hierarchical quota allocation.

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7790:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67  (was: Mesosphere Sprint 
66)

> Design hierarchical quota allocation.
> -
>
> Key: MESOS-7790
> URL: https://issues.apache.org/jira/browse/MESOS-7790
> Project: Mesos
>  Issue Type: Task
>  Components: allocation
>Reporter: Benjamin Mahler
>Assignee: Michael Park
>  Labels: multitenancy
>
> When quota is assigned in the role hierarchy (see MESOS-6375), it's possible 
> for there to be "undelegated" quota for a role. For example:
> {noformat}
>                ^
>              /   \
>             /     \
>    eng (90 cpus)   sales (10 cpus)
>         ^
>       /   \
>      /     \
> ads (50 cpus)   build (10 cpus)
> {noformat}
> Here, the "eng" role has 60 of its 90 cpus of quota delegated to its 
> children, and 30 cpus remain undelegated. We need to design how to allocate 
> these 30 cpus undelegated cpus. Are they allocated entirely to the "eng" 
> role? Are they allocated to the "eng" role tree? If so, how do we determine 
> how much is allocated to each role in the "eng" tree (i.e. "eng", "eng/ads", 
> "eng/build").



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-4812) Mesos fails to escape command health checks

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-4812:
--
Sprint: Mesosphere Sprint 65, Mesosphere Sprint 66, Mesosphere Sprint 67  
(was: Mesosphere Sprint 65, Mesosphere Sprint 66)

> Mesos fails to escape command health checks
> ---
>
> Key: MESOS-4812
> URL: https://issues.apache.org/jira/browse/MESOS-4812
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
>Reporter: Lukas Loesche
>Assignee: Andrei Budnik
>  Labels: health-check, mesosphere, tech-debt
> Attachments: health_task.gif
>
>
> As described in https://github.com/mesosphere/marathon/issues/
> I would like to run a command health check
> {noformat}
> /bin/bash -c "
> {noformat}
> The health check fails because Mesos, while wrapping the command in the 
> double quotes of an {{sh -c ""}}, doesn't escape the double quotes inside 
> the command.
> If I escape the double quotes myself, the command health check succeeds. But 
> this would mean that the user needs intimate knowledge of how Mesos executes 
> their commands, which can't be right.
> I was told this is not a Marathon but a Mesos issue, so I am opening this 
> JIRA. I don't know if this only affects the command health check.
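> To illustrate the kind of escaping that is needed (a sketch, not Mesos' 
> actual code path):
> {code}
> #include <string>
>
> // Wrap a user command in sh -c "...", escaping embedded double quotes
> // and backslashes. (A complete implementation would also need to handle
> // `$` and backticks.)
> std::string wrapInShell(const std::string& command)
> {
>   std::string escaped;
>   for (char c : command) {
>     if (c == '"' || c == '\\') {
>       escaped += '\\';
>     }
>     escaped += c;
>   }
>   return "sh -c \"" + escaped + "\"";
> }
> {code}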



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-564) Update Contribution Documentation

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-564:
-
Sprint: Mesosphere Sprint 64, Mesosphere Sprint 65, Mesosphere Sprint 66, 
Mesosphere Sprint 67  (was: Mesosphere Sprint 64, Mesosphere Sprint 65, 
Mesosphere Sprint 66)

> Update Contribution Documentation
> -
>
> Key: MESOS-564
> URL: https://issues.apache.org/jira/browse/MESOS-564
> Project: Mesos
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Dave Lester
>Assignee: Greg Mann
>  Labels: documentation, mesosphere
>
> Our contribution guide is currently fairly verbose, and it focuses on the 
> ReviewBoard workflow for making code contributions. It would be helpful for 
> new contributors to have a first-time contribution guide which focuses on 
> using GitHub PRs to make small contributions, since that workflow has a 
> smaller barrier to entry for new users.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7985) Use ASF CI for automating RPM packaging and upload to bintray.

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7985:
--
Sprint: Mesosphere Sprint 64, Mesosphere Sprint 66, Mesosphere Sprint 67  
(was: Mesosphere Sprint 64, Mesosphere Sprint 66)

> Use ASF CI for automating RPM packaging and upload to bintray.
> --
>
> Key: MESOS-7985
> URL: https://issues.apache.org/jira/browse/MESOS-7985
> Project: Mesos
>  Issue Type: Task
>Reporter: Kapil Arya
>Assignee: Kapil Arya
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7881) Building gRPC with CMake

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7881:
--
Sprint: Mesosphere Sprint 61, Mesosphere Sprint 62, Mesosphere Sprint 63, 
Mesosphere Sprint 64, Mesosphere Sprint 66, Mesosphere Sprint 67  (was: 
Mesosphere Sprint 61, Mesosphere Sprint 62, Mesosphere Sprint 63, Mesosphere 
Sprint 64, Mesosphere Sprint 66)

> Building gRPC with CMake
> 
>
> Key: MESOS-7881
> URL: https://issues.apache.org/jira/browse/MESOS-7881
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: storage
> Fix For: 1.4.0
>
>
> gRPC manages its own third-party libraries, which overlap with Mesos' 
> third-party library bundles. We need to write proper CMake rules that 
> configure gRPC's own CMake project correctly so that we can build it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8119) ROOT_DOCKER_DockerHealthyTask segfaults in debian 8.

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8119:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67  (was: Mesosphere Sprint 
66)

> ROOT_DOCKER_DockerHealthyTask segfaults in debian 8.
> 
>
> Key: MESOS-8119
> URL: https://issues.apache.org/jira/browse/MESOS-8119
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: Amazon debian 8 AMIs:
> * gcc (Debian 4.9.2-10) 4.9.2
> * Docker version 17.07.0-ce, build 8784753
> * SSL or CMake builds only!
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: flaky-test, mesosphere
> Attachments: ROOT_DOCKER_DockerHealthyTask-segfault1.txt, 
> ROOT_DOCKER_DockerHealthyTask-segfault2.txt
>
>
> This test consistently cannot recover the agent on two Debian 8 builds: the 
> SSL and the CMake-based one. The error is always the same (full logs attached):
> {noformat}
> 19:40:59 E1019 19:40:58.581372 16873 slave.cpp:6301] EXIT with status 1: 
> Failed to perform recovery: Failed to run 'docker -H 
> unix:///var/run/docker.sock ps -a': exited with status 1; stderr='error 
> during connect: Get 
> http://%2Fvar%2Frun%2Fdocker.sock/v1.31/containers/json?all=1: read unix 
> @->/var/run/docker.sock: read: connection reset by peer
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7699) "stdlib.h: No such file or directory" when building with GCC 6 (Debian stable freshly released)

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7699:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67  (was: Mesosphere Sprint 
66)

> "stdlib.h: No such file or directory" when building with GCC 6 (Debian stable 
> freshly released)
> ---
>
> Key: MESOS-7699
> URL: https://issues.apache.org/jira/browse/MESOS-7699
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.2.0
>Reporter: Adam Cecile
>Assignee: Benno Evers
>  Labels: autotools
>
> Hi,
> It seems the issue comes from a workaround added a while ago:
> https://reviews.apache.org/r/40326/
> https://reviews.apache.org/r/40327/
> When building with external libraries, the generated build command lines end 
> up containing -isystem /usr/include, which is clearly stated as being wrong 
> according to the GCC developers:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70129
> I'll do some testing by reverting all -isystem to -I and will let you know 
> whether it builds.
> Regards, Adam.
> {noformat}
> configure:21642: result: no
> configure:21642: checking glog/logging.h presence
> configure:21642: g++ -E -I/usr/include -I/usr/include/apr-1 
> -I/usr/include/apr-1.0 -Wdate-time -D_FORTIFY_SOURCE=2 -isystem /usr/include 
> -I/usr/include conftest.cpp
> In file included from /usr/include/c++/6/ext/string_conversions.h:41:0,
>  from /usr/include/c++/6/bits/basic_string.h:5417,
>  from /usr/include/c++/6/string:52,
>  from /usr/include/c++/6/bits/locale_classes.h:40,
>  from /usr/include/c++/6/bits/ios_base.h:41,
>  from /usr/include/c++/6/ios:42,
>  from /usr/include/c++/6/ostream:38,
>  from /usr/include/glog/logging.h:43,
>  from conftest.cpp:32:
> /usr/include/c++/6/cstdlib:75:25: fatal error: stdlib.h: No such file or 
> directory
>  #include_next 
>  ^
> compilation terminated.
> configure:21642: $? = 1
> configure: failed program was:
> | /* confdefs.h */
> | #define PACKAGE_NAME "mesos"
> | #define PACKAGE_TARNAME "mesos"
> | #define PACKAGE_VERSION "1.2.0"
> | #define PACKAGE_STRING "mesos 1.2.0"
> | #define PACKAGE_BUGREPORT ""
> | #define PACKAGE_URL ""
> | #define PACKAGE "mesos"
> | #define VERSION "1.2.0"
> | #define STDC_HEADERS 1
> | #define HAVE_SYS_TYPES_H 1
> | #define HAVE_SYS_STAT_H 1
> | #define HAVE_STDLIB_H 1
> | #define HAVE_STRING_H 1
> | #define HAVE_MEMORY_H 1
> | #define HAVE_STRINGS_H 1
> | #define HAVE_INTTYPES_H 1
> | #define HAVE_STDINT_H 1
> | #define HAVE_UNISTD_H 1
> | #define HAVE_DLFCN_H 1
> | #define LT_OBJDIR ".libs/"
> | #define HAVE_CXX11 1
> | #define HAVE_PTHREAD_PRIO_INHERIT 1
> | #define HAVE_PTHREAD 1
> | #define HAVE_LIBZ 1
> | #define HAVE_FTS_H 1
> | #define HAVE_APR_POOLS_H 1
> | #define HAVE_LIBAPR_1 1
> | #define HAVE_BOOST_VERSION_HPP 1
> | #define HAVE_LIBCURL 1
> | /* end confdefs.h.  */
> | #include 
> configure:21642: result: no
> configure:21642: checking for glog/logging.h
> configure:21642: result: no
> configure:21674: error: cannot find glog
> ---
> You have requested the use of a non-bundled glog but no suitable
> glog could be found.
> You may want specify the location of glog by providing a prefix
> path via --with-glog=DIR, or check that the path you provided is
> correct if you're already doing this.
> ---
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8143) Publish and unpublish storage local resources through CSI plugins.

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8143:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67  (was: Mesosphere Sprint 
66)

> Publish and unpublish storage local resources through CSI plugins.
> --
>
> Key: MESOS-8143
> URL: https://issues.apache.org/jira/browse/MESOS-8143
> Project: Mesos
>  Issue Type: Task
>  Components: storage
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: mesosphere, storage
> Fix For: 1.5.0
>
>
> The storage local resource provider needs to make the following CSI calls to 
> publish CSI volumes for tasks to use:
> 1. ControllerPublishVolume (optional)
> 2. NodePublishVolume
> Although we don't need to unpublish CSI volumes after tasks complete, we 
> still need to unpublish them for DESTROY_VOLUME or DESTROY_BLOCK:
> 1. NodeUnpublishVolume
> 2. ControllerUnpublishVolume (optional)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8141) Add filesystem layout for storage resource providers.

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8141:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67  (was: Mesosphere Sprint 
66)

> Add filesystem layout for storage resource providers.
> -
>
> Key: MESOS-8141
> URL: https://issues.apache.org/jira/browse/MESOS-8141
> Project: Mesos
>  Issue Type: Task
>  Components: storage
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: mesosphere, storage
> Fix For: 1.5.0
>
>
> We need directories for placing mount points and for checkpointing CSI 
> volume state for storage resource providers. Unlike resource checkpoints, 
> CSI volume state should persist across agents, since otherwise the CSI 
> plugins might not work properly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8052) "protoc" not found when running "make -j4 check" directly in stout

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8052:
--
Sprint: Mesosphere Sprint 65, Mesosphere Sprint 66, Mesosphere Sprint 67  
(was: Mesosphere Sprint 65, Mesosphere Sprint 66)

> "protoc" not found when running "make -j4 check" directly in stout
> --
>
> Key: MESOS-8052
> URL: https://issues.apache.org/jira/browse/MESOS-8052
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: compile-error
> Fix For: 1.4.1
>
>
> If we run {{make -j4 check}} without running {{make}} first, we will get the 
> following error message:
> {noformat}
> 3rdparty/protobuf-3.3.0/src/protoc -I../tests --cpp_out=. 
> ../tests/protobuf_tests.proto
> /bin/bash: 3rdparty/protobuf-3.3.0/src/protoc: No such file or directory
> Makefile:1934: recipe for target 'protobuf_tests.pb.cc' failed
> make: *** [protobuf_tests.pb.cc] Error 127
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-6894) Checkpoint 'ContainerConfig' in Mesos Containerizer.

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6894:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67  (was: Mesosphere Sprint 
66)

> Checkpoint 'ContainerConfig' in Mesos Containerizer.
> 
>
> Key: MESOS-6894
> URL: https://issues.apache.org/jira/browse/MESOS-6894
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Zhitao Li
>Assignee: Zhitao Li
>  Labels: mesosphere
>
> This information can be used for image GC in the Mesos containerizer, as 
> well as for other purposes.
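> A minimal sketch of such a checkpoint (illustrative; in practice Mesos' own 
> checkpointing helpers and the container's runtime directory would be used):
> {code}
> #include <fstream>
> #include <string>
>
> // Persist the serialized ContainerConfig under the container's runtime
> // directory so it can be consulted later (e.g. for image GC).
> bool checkpointContainerConfig(
>     const std::string& runtimeDir,
>     const std::string& serializedConfig)
> {
>   std::ofstream out(runtimeDir + "/config", std::ios::binary);
>   out << serializedConfig;
>   return static_cast<bool>(out);
> }
> {code}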



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8078) Some fields went missing with no replacement in api/v1

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8078:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67  (was: Mesosphere Sprint 
66)

> Some fields went missing with no replacement in api/v1
> --
>
> Key: MESOS-8078
> URL: https://issues.apache.org/jira/browse/MESOS-8078
> Project: Mesos
>  Issue Type: Story
>  Components: HTTP API
>Reporter: Dmitrii Rozhkov
>Assignee: Vinod Kone
>  Labels: mesosphere
>
> Hi friends, 
> These fields are available via state.json but went missing in v1 of the API:
> leader_info
> start_time
> elected_time
> As we're showing them on the Overview page of the DC/OS UI, yet would like 
> to stop using state.json, it would be great to have them somewhere in v1.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-4945) Garbage collect unused docker layers in the store.

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-4945:
--
Sprint: Mesosphere Sprint 65, Mesosphere Sprint 66, Mesosphere Sprint 67  
(was: Mesosphere Sprint 65, Mesosphere Sprint 66)

> Garbage collect unused docker layers in the store.
> --
>
> Key: MESOS-4945
> URL: https://issues.apache.org/jira/browse/MESOS-4945
> Project: Mesos
>  Issue Type: Epic
>Reporter: Jie Yu
>Assignee: Zhitao Li
>  Labels: Mesosphere
>
> Right now, we don't have any garbage collection in place for Docker layers. 
> It's not straightforward to implement because we don't know which containers 
> are currently using a layer. We probably need a way to track the current 
> usage of layers.
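> One possible tracking scheme, as a rough sketch (not the implemented 
> design): reference-count layers by the containers using them, and treat a 
> layer as collectable once no container references it.
> {code}
> #include <map>
> #include <set>
> #include <string>
>
> class LayerUsageTracker
> {
> public:
>   // Record that a launched container uses the given layers.
>   void onContainerLaunched(
>       const std::string& containerId,
>       const std::set<std::string>& layers)
>   {
>     for (const std::string& layer : layers) {
>       users[layer].insert(containerId);
>     }
>   }
>
>   // Drop the container from every layer's user set on destruction.
>   void onContainerDestroyed(const std::string& containerId)
>   {
>     for (auto& entry : users) {
>       entry.second.erase(containerId);
>     }
>   }
>
>   // A layer is collectable when no container references it anymore.
>   bool collectable(const std::string& layer) const
>   {
>     auto it = users.find(layer);
>     return it == users.end() || it->second.empty();
>   }
>
> private:
>   std::map<std::string, std::set<std::string>> users;
> };
> {code}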



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8144) Add a mock resource provider manager.

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8144:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67  (was: Mesosphere Sprint 
66)

> Add a mock resource provider manager.
> -
>
> Key: MESOS-8144
> URL: https://issues.apache.org/jira/browse/MESOS-8144
> Project: Mesos
>  Issue Type: Task
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
> Fix For: 1.5.0
>
>
> To test a storage local resource provider, we need to inject a mock resource 
> provider manager such that:
> 1. A full agent will start during the test so the resource provider can 
> launch standalone containers for CSI plugins.
> 2. We can inject offer operations through the mock manager to test the 
> resource provider.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8115) Add a master flag to disallow agents that are not configured with fault domain

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8115:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67  (was: Mesosphere Sprint 
66)

> Add a master flag to disallow agents that are not configured with fault domain
> --
>
> Key: MESOS-8115
> URL: https://issues.apache.org/jira/browse/MESOS-8115
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Vinod Kone
>
> Once mesos masters and agents in a cluster are *all* upgraded to a version 
> where the fault domains feature is available, it is beneficial to enforce 
> that agents without a fault domain configured are not allowed to join the 
> cluster. 
> This is a safety net for operators who could forget to configure the fault 
> domain of a remote agent and let it join the cluster. If this happens, an 
> agent in a remote region will be considered a local agent by the master and 
> frameworks (because the agent's fault domain is not configured), potentially 
> causing tasks to land on a remote agent, which is undesirable.
> Note that this has to be a configurable flag and not enforced by default, 
> because otherwise upgrades from a cluster without fault domains configured 
> to one with them would not be possible.
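> A minimal sketch of such a check (the flag name and types below are 
> hypothetical; the actual flag is yet to be designed):
> {code}
> // Hypothetical flag and agent info; illustrative only.
> struct MasterFlags { bool require_agent_domain; };
> struct AgentInfo { bool hasDomain; };
>
> // With the flag enabled, an agent without a configured fault domain is
> // refused registration instead of silently joining as a "local" agent.
> bool allowRegistration(const MasterFlags& flags, const AgentInfo& agent)
> {
>   return !flags.require_agent_domain || agent.hasDomain;
> }
> {code}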



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7550) Publish Local Resource Provider resources in the agent before container launch or update.

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7550:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67  (was: Mesosphere Sprint 
66)

> Publish Local Resource Provider resources in the agent before container 
> launch or update.
> -
>
> Key: MESOS-7550
> URL: https://issues.apache.org/jira/browse/MESOS-7550
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Chun-Hung Hsiao
>  Labels: mesosphere, storage
> Fix For: 1.5.0
>
>
> The agent will ask the RP manager to publish the resources before containers 
> can start to use them. The SLRP (storage local resource provider) will be 
> responsible for making sure the CSI volume is made available on the host. 
> This will involve calling the `ControllerPublishVolume` and 
> `NodePublishVolume` RPCs of the CSI plugin.
> This will happen when a workload (i.e., a task or executor) that uses a CSI 
> volume is being launched on the agent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7837) Propagate resource updates from local resource providers to master

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7837:
--
Sprint: Mesosphere Sprint 60, Mesosphere Sprint 61, Mesosphere Sprint 62, 
Mesosphere Sprint 63, Mesosphere Sprint 64, Mesosphere Sprint 65, Mesosphere 
Sprint 66, Mesosphere Sprint 67  (was: Mesosphere Sprint 60, Mesosphere Sprint 
61, Mesosphere Sprint 62, Mesosphere Sprint 63, Mesosphere Sprint 64, 
Mesosphere Sprint 65, Mesosphere Sprint 66)

> Propagate resource updates from local resource providers to master
> --
>
> Key: MESOS-7837
> URL: https://issues.apache.org/jira/browse/MESOS-7837
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: mesosphere, storage
>
> When a resource provider registers with a resource provider manager, the 
> manager should send a message to its subscribers informing them about the 
> changed resources.
> For the first iteration, where we add agent-specific, local resource 
> providers, the agent would be subscribed to the manager. It should be changed 
> to handle such a resource update by informing the master about its changed 
> resources. In order to support master failovers, we should make sure to 
> similarly inform the master on agent reregistration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7975) The command/default/docker executor can incorrectly send a TASK_FINISHED update even when the task is killed

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7975:
--
Sprint: Mesosphere Sprint 65, Mesosphere Sprint 66, Mesosphere Sprint 67  
(was: Mesosphere Sprint 65, Mesosphere Sprint 66)

> The command/default/docker executor can incorrectly send a TASK_FINISHED 
> update even when the task is killed
> 
>
> Key: MESOS-7975
> URL: https://issues.apache.org/jira/browse/MESOS-7975
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Qian Zhang
>Priority: Critical
>  Labels: mesosphere
>
> Currently, when a task is killed, the default/command/docker executor 
> incorrectly sends a {{TASK_FINISHED}} status update instead of 
> {{TASK_KILLED}}. This is due to an unfortunate missed conditional check when 
> the task exits with a zero status code.
> {code}
>   if (WSUCCEEDED(status)) {
> taskState = TASK_FINISHED;
>   } else if (killed) {
> // Send TASK_KILLED if the task was killed as a result of
> // kill() or shutdown().
> taskState = TASK_KILLED;
>   } else {
> taskState = TASK_FAILED;
>   }
> {code}
> We should modify the code to correctly send {{TASK_KILLED}} status updates 
> when a task is killed.
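> A minimal sketch of the intended ordering (illustrative; the actual patch 
> may differ): check {{killed}} before the exit status, so that a killed task 
> that happens to exit 0 is still reported as {{TASK_KILLED}}.
> {code}
>   if (killed) {
>     // Send TASK_KILLED if the task was killed as a result of
>     // kill() or shutdown(), regardless of the exit status.
>     taskState = TASK_KILLED;
>   } else if (WSUCCEEDED(status)) {
>     taskState = TASK_FINISHED;
>   } else {
>     taskState = TASK_FAILED;
>   }
> {code}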



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7601) Some container launch failures are mistakenly treated as errors.

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7601:
--
Sprint: Mesosphere Sprint 59, Mesosphere Sprint 62, Mesosphere Sprint 63, 
Mesosphere Sprint 64, Mesosphere Sprint 66, Mesosphere Sprint 67  (was: 
Mesosphere Sprint 59, Mesosphere Sprint 62, Mesosphere Sprint 63, Mesosphere 
Sprint 64, Mesosphere Sprint 66)

> Some container launch failures are mistakenly treated as errors.
> 
>
> Key: MESOS-7601
> URL: https://issues.apache.org/jira/browse/MESOS-7601
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.3.0
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: containerizer, mesosphere, tech-debt
>
> I've observed a case where a scheduler stops (i.e. calls TEARDOWN) while some 
> of its tasks are being launched. While this is valid behaviour, the agent 
> prints an error and increases the container launch error metrics.
> Below are log excerpts for such a framework, 
> {{6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092}}.
> *Master log*
> {noformat}
> [centos@ip-172-31-6-200 ~]$ journalctl _PID=29716 --since "2 hours ago" 
> --no-pager | grep 
> "6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092"
> Jun 01 11:32:58 ip-172-31-6-200.us-west-2.compute.internal 
> mesos-master[29716]: I0601 11:32:58.226218 29724 master.cpp:6072] Updating 
> info for framework 
> 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092
> Jun 01 11:32:58 ip-172-31-6-200.us-west-2.compute.internal 
> mesos-master[29716]: I0601 11:32:58.226356 29728 hierarchical.cpp:274] Added 
> framework 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092
> Jun 01 11:32:58 ip-172-31-6-200.us-west-2.compute.internal 
> mesos-master[29716]: I0601 11:32:58.226405 29728 hierarchical.cpp:379] 
> Deactivated framework 
> 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092
> Jun 01 11:32:58 ip-172-31-6-200.us-west-2.compute.internal 
> mesos-master[29716]: I0601 11:32:58.228570 29728 hierarchical.cpp:343] 
> Activated framework 
> 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092
> Jun 01 11:32:58 ip-172-31-6-200.us-west-2.compute.internal 
> mesos-master[29716]: I0601 11:32:58.246068 29721 master.cpp:7105] Sending 1 
> offers to framework 
> 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092 
> (TeraValidate) at 
> scheduler-3b84262b-e1a6-47a8-ac0f-00af50b24f5c@172.31.7.83:45531
> Jun 01 11:32:58 ip-172-31-6-200.us-west-2.compute.internal 
> mesos-master[29716]: I0601 11:32:58.247851 29721 master.cpp:7194] Sending 1 
> inverse offers to framework 
> 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092 
> (TeraValidate) at 
> scheduler-3b84262b-e1a6-47a8-ac0f-00af50b24f5c@172.31.7.83:45531
> Jun 01 11:32:58 ip-172-31-6-200.us-west-2.compute.internal 
> mesos-master[29716]: I0601 11:32:58.912937 29728 master.cpp:4806] Processing 
> DECLINE call for offers: [ 92434aef-27da-4fd1-a5c4-b286d640d5b3-O509464 ] for 
> framework 
> 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092 
> (TeraValidate) at 
> scheduler-3b84262b-e1a6-47a8-ac0f-00af50b24f5c@172.31.7.83:45531
> Jun 01 11:32:59 ip-172-31-6-200.us-west-2.compute.internal 
> mesos-master[29716]: I0601 11:32:59.804184 29727 master.cpp:7105] Sending 2 
> offers to framework 
> 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092 
> (TeraValidate) at 
> scheduler-3b84262b-e1a6-47a8-ac0f-00af50b24f5c@172.31.7.83:45531
> Jun 01 11:32:59 ip-172-31-6-200.us-west-2.compute.internal 
> mesos-master[29716]: I0601 11:32:59.804411 29727 master.cpp:7194] Sending 2 
> inverse offers to framework 
> 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092 
> (TeraValidate) at 
> scheduler-3b84262b-e1a6-47a8-ac0f-00af50b24f5c@172.31.7.83:45531
> Jun 01 11:33:01 ip-172-31-6-200.us-west-2.compute.internal 
> mesos-master[29716]: I0601 11:33:01.248924 29721 master.cpp:7105] Sending 2 
> offers to framework 
> 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092 
> (TeraValidate) at 
> scheduler-3b84262b-e1a6-47a8-ac0f-00af50b24f5c@172.31.7.83:45531
> Jun 01 11:33:01 ip-172-31-6-200.us-west-2.compute.internal 
> mesos-master[29716]: I0601 11:33:01.249289 29721 master.cpp:7194] Sending 2 
> inverse offers to framework 
> 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092 
> (TeraValidate) at 
> scheduler-3b84262b-e1a6-47a8-ac0f-00af50b24f5c@172.31.7.83:45531
> Jun 01 11:33:01 ip-172-31-6-200.us-west-2.compute.internal 
> mesos-master[29716]: I0601 11:33:01.249724 29721 master.cpp:3851] Processing 
> ACCEPT call for offers: [ 

[jira] [Updated] (MESOS-7944) Implement jemalloc support for Mesos

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7944:
--
Sprint: Mesosphere Sprint 63, Mesosphere Sprint 65, Mesosphere Sprint 66, 
Mesosphere Sprint 67  (was: Mesosphere Sprint 63, Mesosphere Sprint 65, 
Mesosphere Sprint 66)

> Implement jemalloc support for Mesos
> 
>
> Key: MESOS-7944
> URL: https://issues.apache.org/jira/browse/MESOS-7944
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benno Evers
>Assignee: Benno Evers
>  Labels: mesosphere
>
> After investigation in MESOS-7876 and discussion on the mailing list, this 
> task is for tracking progress on adding out-of-the-box memory profiling 
> support using jemalloc to Mesos.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8087) Add operation status update handler in Master.

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8087:
--
Sprint: Mesosphere Sprint 65, Mesosphere Sprint 66, Mesosphere Sprint 67  
(was: Mesosphere Sprint 65, Mesosphere Sprint 66)

> Add operation status update handler in Master.
> --
>
> Key: MESOS-8087
> URL: https://issues.apache.org/jira/browse/MESOS-8087
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Jan Schlicht
>  Labels: mesosphere
>
> Please follow this doc for details.
> https://docs.google.com/document/d/1RrrLVATZUyaURpEOeGjgxA6ccshuLo94G678IbL-Yco/edit#
> This handler will process operation status updates from resource providers. 
> Depending on whether the operation is old or new, the logic is slightly 
> different.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8102) Add a test CSI plugin for storage local resource provider.

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8102:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67  (was: Mesosphere Sprint 
66)

> Add a test CSI plugin for storage local resource provider.
> --
>
> Key: MESOS-8102
> URL: https://issues.apache.org/jira/browse/MESOS-8102
> Project: Mesos
>  Issue Type: Task
>  Components: test
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: mesosphere, storage
> Fix For: 1.5.0
>
>
> We need a dummy CSI plugin for testing storage local resource providers. The 
> test CSI plugin would just create subdirectories under its working directory 
> to mimic the behavior of creating volumes, then bind-mount those volumes to 
> mimic publishing.
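> A minimal Linux-only sketch of what such a test plugin could do 
> (illustrative, not the actual implementation):
> {code}
> #include <sys/mount.h>
> #include <sys/stat.h>
>
> #include <string>
>
> // "Create" a volume as a subdirectory of the plugin's working directory.
> bool createVolume(const std::string& workDir, const std::string& volumeId)
> {
>   const std::string path = workDir + "/" + volumeId;
>   return ::mkdir(path.c_str(), 0755) == 0;
> }
>
> // "Publish" a volume by bind-mounting its directory onto the target path.
> bool publishVolume(const std::string& volumePath, const std::string& target)
> {
>   return ::mount(
>       volumePath.c_str(), target.c_str(), nullptr, MS_BIND, nullptr) == 0;
> }
> {code}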



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8101) Import resources from CSI plugins in storage local resource provider.

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8101:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67  (was: Mesosphere Sprint 
66)

> Import resources from CSI plugins in storage local resource provider.
> -
>
> Key: MESOS-8101
> URL: https://issues.apache.org/jira/browse/MESOS-8101
> Project: Mesos
>  Issue Type: Task
>  Components: storage
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: mesosphere, storage
> Fix For: 1.5.0
>
>
> The following lists the steps to import resources from a CSI plugin:
> 1. Launch the node plugin
> 1.1 GetSupportedVersions
> 1.2 GetPluginInfo
> 1.3 ProbeNode
> 1.4 GetNodeCapabilities
> 2. Launch the controller plugin
> 2.1 GetSupportedVersions
> 2.2 GetPluginInfo
> 2.3 GetControllerCapabilities
> 3. GetCapacity
> 4. ListVolumes
> 5. Report to the resource provider through UPDATE_TOTAL_RESOURCES
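> A schematic of this sequence (the {{CsiClient}} type below is a placeholder 
> whose method names mirror the calls listed above; real plugins are driven 
> over gRPC):
> {code}
> // Placeholder client; each method stands in for the CSI call of the
> // same name. Illustrative only.
> struct CsiClient
> {
>   void GetSupportedVersions() {}
>   void GetPluginInfo() {}
>   void ProbeNode() {}
>   void GetNodeCapabilities() {}
>   void GetControllerCapabilities() {}
>   void GetCapacity() {}
>   void ListVolumes() {}
> };
>
> // The import sequence, in the order given above.
> void importResources(CsiClient& node, CsiClient& controller)
> {
>   node.GetSupportedVersions();      // 1.1
>   node.GetPluginInfo();             // 1.2
>   node.ProbeNode();                 // 1.3
>   node.GetNodeCapabilities();       // 1.4
>
>   controller.GetSupportedVersions();       // 2.1
>   controller.GetPluginInfo();              // 2.2
>   controller.GetControllerCapabilities();  // 2.3
>
>   controller.GetCapacity();   // 3
>   controller.ListVolumes();   // 4
>
>   // 5. Report the result through UPDATE_TOTAL_RESOURCES.
> }
> {code}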



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7305) Adjust the recover logic of MesosContainerizer to allow standalone containers.

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7305:
--
Sprint: Mesosphere Sprint 57, Mesosphere Sprint 58, Mesosphere Sprint 59, 
Mesosphere Sprint 60, Mesosphere Sprint 61, Mesosphere Sprint 62, Mesosphere 
Sprint 63, Mesosphere Sprint 64, Mesosphere Sprint 65, Mesosphere Sprint 66, 
Mesosphere Sprint 67  (was: Mesosphere Sprint 57, Mesosphere Sprint 58, 
Mesosphere Sprint 59, Mesosphere Sprint 60, Mesosphere Sprint 61, Mesosphere 
Sprint 62, Mesosphere Sprint 63, Mesosphere Sprint 64, Mesosphere Sprint 65, 
Mesosphere Sprint 66)

> Adjust the recover logic of MesosContainerizer to allow standalone containers.
> --
>
> Key: MESOS-7305
> URL: https://issues.apache.org/jira/browse/MESOS-7305
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Jie Yu
>Assignee: Joseph Wu
>  Labels: mesosphere, storage
>
> The current recovery logic in MesosContainerizer assumes that all top-level 
> containers are tied to some Mesos executors. Adding standalone containers 
> will invalidate this assumption. The recovery logic must be changed to adapt 
> to that.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7982) Create Centos 6/7 RPM package.

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7982:
--
Sprint: Mesosphere Sprint 63, Mesosphere Sprint 64, Mesosphere Sprint 66, 
Mesosphere Sprint 67  (was: Mesosphere Sprint 63, Mesosphere Sprint 64, 
Mesosphere Sprint 66)

> Create Centos 6/7 RPM package.
> --
>
> Key: MESOS-7982
> URL: https://issues.apache.org/jira/browse/MESOS-7982
> Project: Mesos
>  Issue Type: Task
>Reporter: Kapil Arya
>Assignee: Kapil Arya
>
> Create SPEC file and a corresponding Docker file for CentOS 6 and 7.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7594) Implement 'apply' for resource provider related operations

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7594:
--
Sprint: Mesosphere Sprint 57, Mesosphere Sprint 62, Mesosphere Sprint 63, 
Mesosphere Sprint 64, Mesosphere Sprint 66, Mesosphere Sprint 67  (was: 
Mesosphere Sprint 57, Mesosphere Sprint 62, Mesosphere Sprint 63, Mesosphere 
Sprint 64, Mesosphere Sprint 66)

> Implement 'apply' for resource provider related operations
> --
>
> Key: MESOS-7594
> URL: https://issues.apache.org/jira/browse/MESOS-7594
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>  Labels: mesosphere, storage
>
> Resource providers provide new offer operations ({{CREATE_BLOCK}}, 
> {{DESTROY_BLOCK}}, {{CREATE_VOLUME}}, {{DESTROY_VOLUME}}). These operations 
> can be applied by frameworks when they accept an offer. Handling of these 
> operations has to be added to the master's {{accept}} call, i.e. the 
> corresponding resource provider needs to be extracted from the offer's 
> resources and a {{resource_provider::Event::OPERATION}} has to be sent to the 
> resource provider. The resource provider will answer with a 
> {{resource_provider::Call::Update}}, which needs to be handled as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7851) Master stores old resource format in the registry

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7851:
--
Sprint: Mesosphere Sprint 61, Mesosphere Sprint 62, Mesosphere Sprint 63, 
Mesosphere Sprint 64, Mesosphere Sprint 65, Mesosphere Sprint 66, Mesosphere 
Sprint 67  (was: Mesosphere Sprint 61, Mesosphere Sprint 62, Mesosphere Sprint 
63, Mesosphere Sprint 64, Mesosphere Sprint 65, Mesosphere Sprint 66)

> Master stores old resource format in the registry
> -
>
> Key: MESOS-7851
> URL: https://issues.apache.org/jira/browse/MESOS-7851
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Greg Mann
>Assignee: Michael Park
>  Labels: master, mesosphere, reservation
>
> We intend for the master to store all internal resource representations in 
> the new, post-reservation-refinement format. However, [when persisting 
> registered agents to the 
> registrar|https://github.com/apache/mesos/blob/498a000ac1bb8f51dc871f22aea265424a407a17/src/master/master.cpp#L5861-L5876],
>  the master does not convert the resources; agents provide resources in the 
> pre-reservation-refinement format, and these resources are stored as-is. This 
> means that after recovery, any agents in the master's {{slaves.recovered}} 
> map will have {{SlaveInfo.resources}} in the pre-reservation-refinement 
> format.
> We should update the master to convert these resources before persisting them 
> to the registry.
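> A rough sketch of the intended fix, assuming the resource-format helper in 
> {{src/common/resources_utils.hpp}} (signature approximated; treat the names 
> below as an assumption, not the exact patch):
> {code}
> // Sketch only. Assumed helper (approximated from resources_utils.hpp):
> //
> //   void convertResourceFormat(
> //       google::protobuf::RepeatedPtrField<Resource>* resources,
> //       ResourceFormat format);
> //
> // Before the registrar write linked above, the master would upgrade the
> // agent-provided resources, roughly:
> //
> //   convertResourceFormat(
> //       slaveInfo.mutable_resources(), POST_RESERVATION_REFINEMENT);
> {code}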



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7506) Multiple tests leave orphan containers.

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7506:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67  (was: Mesosphere Sprint 
66)

> Multiple tests leave orphan containers.
> ---
>
> Key: MESOS-7506
> URL: https://issues.apache.org/jira/browse/MESOS-7506
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: Ubuntu 16.04
> Fedora 23
> other Linux distros
>Reporter: Alexander Rukletsov
>Assignee: Andrei Budnik
>  Labels: containerizer, flaky-test, mesosphere
>
> I've observed a number of flaky tests that leave orphan containers upon 
> cleanup. A typical log looks like this:
> {noformat}
> ../../src/tests/cluster.cpp:580: Failure
> Value of: containers->empty()
>   Actual: false
> Expected: true
> Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7966) check for maintenance on agent causes fatal error

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7966:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67  (was: Mesosphere Sprint 
66)

> check for maintenance on agent causes fatal error
> -
>
> Key: MESOS-7966
> URL: https://issues.apache.org/jira/browse/MESOS-7966
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.1.0
>Reporter: Rob Johnson
>Assignee: Armand Grillet
>Priority: Blocker
>  Labels: reliability
>
> We interact with the maintenance API frequently to orchestrate gracefully 
> draining agents of tasks without impacting service availability.
> Occasionally we seem to trigger a fatal error in Mesos when interacting with 
> the api. This happens relatively frequently, and impacts us when downstream 
> frameworks (marathon) react badly to leader elections.
> Here is the log line that we see when the master dies:
> {code}
> F0911 12:18:49.543401 123748 hierarchical.cpp:872] Check failed: 
> slaves[slaveId].maintenance.isSome()
> {code}
> It's quite possible we're using the maintenance API in the wrong way. We're 
> happy to provide any other logs you need - please let me know what would be 
> useful for debugging.
> Thanks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8055) Design doc for offer operations feedback

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8055:
--
Sprint: Mesosphere Sprint 65, Mesosphere Sprint 66, Mesosphere Sprint 67  
(was: Mesosphere Sprint 65, Mesosphere Sprint 66)

> Design doc for offer operations feedback
> 
>
> Key: MESOS-8055
> URL: https://issues.apache.org/jira/browse/MESOS-8055
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>
> https://docs.google.com/document/d/1GGh14SbPTItjiweSZfann4GZ6PCteNrn-1y4pxOjgcI



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8100) Authorize standalone container calls from local resource providers.

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8100:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67  (was: Mesosphere Sprint 
66)

> Authorize standalone container calls from local resource providers.
> ---
>
> Key: MESOS-8100
> URL: https://issues.apache.org/jira/browse/MESOS-8100
> Project: Mesos
>  Issue Type: Task
>  Components: agent
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: mesosphere
> Fix For: 1.5.0
>
>
> We need to add authorization for a local resource provider to call the 
> standalone container API, to prevent the provider from manipulating arbitrary 
> containers. We can use the same JWT-based authN/authZ mechanism as for 
> executors, where the agent will create an auth token for each local resource 
> provider instance:
> {noformat}
> class LocalResourceProvider
> {
> public:
>   static Try<process::Owned<LocalResourceProvider>> create(
>       const process::http::URL& url,
>       const std::string& workDir,
>       const mesos::ResourceProviderInfo& info,
>       const Option<std::string>& authToken);
>   ...
> };
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8070) Bundled GRPC build does not build on Debian 8

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8070:
--
Sprint: Mesosphere Sprint 65, Mesosphere Sprint 66, Mesosphere Sprint 67  
(was: Mesosphere Sprint 65, Mesosphere Sprint 66)

> Bundled GRPC build does not build on Debian 8
> -
>
> Key: MESOS-8070
> URL: https://issues.apache.org/jira/browse/MESOS-8070
> Project: Mesos
>  Issue Type: Bug
>Reporter: Zhitao Li
>Assignee: Chun-Hung Hsiao
> Fix For: 1.5.0
>
>
> Debian 8 includes an outdated version of libc-ares-dev, which prevents the 
> bundled gRPC from building.
> I believe [~chhsia0] already has a fix.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8108) Process offer operations in storage local resource provider

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8108:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67  (was: Mesosphere Sprint 
66)

> Process offer operations in storage local resource provider
> ---
>
> Key: MESOS-8108
> URL: https://issues.apache.org/jira/browse/MESOS-8108
> Project: Mesos
>  Issue Type: Task
>  Components: storage
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
> Fix For: 1.5.0
>
>
> The storage local resource provider receives offer operations for 
> reservations and resource conversions, and invokes the proper CSI calls to 
> implement these operations.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8054) Feedback for offer operations

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8054:
--
Sprint: Mesosphere Sprint 65, Mesosphere Sprint 66, Mesosphere Sprint 67  
(was: Mesosphere Sprint 65, Mesosphere Sprint 66)

> Feedback for offer operations
> -
>
> Key: MESOS-8054
> URL: https://issues.apache.org/jira/browse/MESOS-8054
> Project: Mesos
>  Issue Type: Epic
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>
> Only LAUNCH operations currently provide feedback on success or failure; all 
> operations should do so. RESERVE, UNRESERVE, CREATE, DESTROY, CREATE_VOLUME, 
> and DESTROY_VOLUME should all provide feedback on success or failure.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7991) fatal, check failed !framework->recovered()

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7991:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67  (was: Mesosphere Sprint 
66)

> fatal, check failed !framework->recovered()
> ---
>
> Key: MESOS-7991
> URL: https://issues.apache.org/jira/browse/MESOS-7991
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jack Crawford
>Assignee: Armand Grillet
>Priority: Blocker
>  Labels: reliability
>
> The Mesos master crashed during what appears to be framework recovery.
> mesos master version: 1.3.1
> mesos agent version: 1.3.1
> {code}
> W0920 14:58:54.756364 25452 master.cpp:7568] Task 
> 862181ec-dffb-4c03-8807-5fb4c4e9a907 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756369 25452 master.cpp:7568] Task 
> 9c21c48a-63ad-4d58-9e22-f720af19a644 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756376 25452 master.cpp:7568] Task 
> 05c451f8-c48a-47bd-a235-0ceb9b3f8d0c of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756381 25452 master.cpp:7568] Task 
> e8641b1f-f67f-42fe-821c-09e5a290fc60 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756386 25452 master.cpp:7568] Task 
> f838a03c-5cd4-47eb-8606-69b004d89808 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756392 25452 master.cpp:7568] Task 
> 685ca5da-fa24-494d-a806-06e03bbf00bd of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756397 25452 master.cpp:7568] Task 
> 65ccf39b-5c46-4121-9fdd-21570e8068e6 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> F0920 14:58:54.756404 25452 master.cpp:7601] Check failed: 
> !framework->recovered()
> *** Check failure stack trace: ***
> @ 0x7f7bf80087ed  google::LogMessage::Fail()
> @ 0x7f7bf800a5a0  google::LogMessage::SendToLog()
> @ 0x7f7bf80083d3  google::LogMessage::Flush()
> @ 0x7f7bf800afc9  google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f7bf736fe7e  
> mesos::internal::master::Master::reconcileKnownSlave()
> @ 0x7f7bf739e612  mesos::internal::master::Master::_reregisterSlave()
> @ 0x7f7bf73a580e  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master6MasterERKNS5_9SlaveInfoERKNS0_4UPIDERK6OptionINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIc
> RKSt6vectorINS5_8ResourceESaISQ_EERKSP_INS5_12ExecutorInfoESaISV_EERKSP_INS5_4TaskESaIS10_EERKSP_INS5_13FrameworkInfoESaIS15_EERKSP_INS6_17Archive_FrameworkESaIS1A_EERKSL_RKSP_INS5_20SlaveInfo_CapabilityESaIS
> 1H_EERKNS0_6FutureIbEES9_SC_SM_SS_SX_S12_S17_S1C_SL_S1J_S1N_EEvRKNS0_3PIDIT_EEMS1R_FvT0_T1_T2_T3_T4_T5_T6_T7_T8_T9_T10_ET11_T12_T13_T14_T15_T16_T17_T18_T19_T20_T21_EUlS2_E_E9_M_invokeERKSt9_Any_dataOS2_
> @ 0x7f7bf7f5e69c  process::ProcessBase::visit()
> @ 0x7f7bf7f71403  process::ProcessManager::resume()
> @ 0x7f7bf7f7c127  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> @ 0x7f7bf60b5c80  (unknown)
> @ 0x7f7bf58c86ba  start_thread
> @ 0x7f7bf55fe3dd  (unknown)
> mesos-master.service: Main process exited, code=killed, status=6/ABRT
> mesos-master.service: Unit entered failed state.
> mesos-master.service: Failed with result 'signal'.
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8013) Add test for blkio statistics

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8013:
--
Sprint: Mesosphere Sprint 64, Mesosphere Sprint 66, Mesosphere Sprint 67  
(was: Mesosphere Sprint 64, Mesosphere Sprint 66)

> Add test for blkio statistics
> -
>
> Key: MESOS-8013
> URL: https://issues.apache.org/jira/browse/MESOS-8013
> Project: Mesos
>  Issue Type: Task
>  Components: cgroups
>Reporter: Qian Zhang
>Assignee: Qian Zhang
>  Labels: mesosphere
>
> In [MESOS-6162|https://issues.apache.org/jira/browse/MESOS-6162], we added 
> support for cgroups blkio statistics. In this ticket, we'd like to add a test 
> to verify that the cgroups blkio statistics can be correctly retrieved via the 
> Mesos containerizer's {{usage()}} method.
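
For illustration, a minimal sketch of such a check, assuming the containerizer
test fixtures and the {{blkio_statistics}} field added by MESOS-6162 (the names
here are assumptions, not the final test):

{code}
// Sketch only: ask the containerizer for the container's usage and
// verify that the cgroups isolator populated blkio statistics.
Future<ResourceStatistics> usage = containerizer->usage(containerId);

AWAIT_READY(usage);
EXPECT_TRUE(usage->has_blkio_statistics());
{code}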



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8096) Enqueueing events in MockHTTPScheduler can lead to segfaults.

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8096:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67  (was: Mesosphere Sprint 
66)

> Enqueueing events in MockHTTPScheduler can lead to segfaults.
> -
>
> Key: MESOS-8096
> URL: https://issues.apache.org/jira/browse/MESOS-8096
> Project: Mesos
>  Issue Type: Bug
>  Components: scheduler driver, test
> Environment: Fedora 23, Ubuntu 14.04, Ubuntu 16
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: flaky-test, mesosphere
> Attachments: AsyncExecutorProcess-badrun-1.txt, 
> AsyncExecutorProcess-badrun-2.txt, AsyncExecutorProcess-badrun-3.txt
>
>
> Various tests segfault for an as-yet-unknown reason. Comparing logs (attached) 
> hints that the problem might be in the scheduler's event queue.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8097) Add filesystem layout for local resource providers.

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8097:
--
Sprint: Mesosphere Sprint 65, Mesosphere Sprint 66, Mesosphere Sprint 67  
(was: Mesosphere Sprint 65, Mesosphere Sprint 66)

> Add filesystem layout for local resource providers.
> ---
>
> Key: MESOS-8097
> URL: https://issues.apache.org/jira/browse/MESOS-8097
> Project: Mesos
>  Issue Type: Task
>  Components: agent
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: mesosphere, storage
> Fix For: 1.5.0
>
>
> We need to add a checkpoint directory for local resource providers. The 
> checkpoints should be tied to the slave ID; otherwise, resources with the same 
> ID appearing on different agents (due to an agent failing over and registering 
> with a new ID) may confuse frameworks.
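
For illustration, a purely hypothetical layout showing how such checkpoints
could be scoped by slave ID (the directory names here are invented, not the
layout this ticket will settle on):

{noformat}
<work_dir>/meta/slaves/<slave_id>/resource_providers/
    <provider_type>/<provider_name>/<provider_id>/...
{noformat}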



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7939) Early disk usage check for garbage collection during recovery

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7939:
--
Sprint: Mesosphere Sprint 63, Mesosphere Sprint 64, Mesosphere Sprint 65, 
Mesosphere Sprint 66, Mesosphere Sprint 67  (was: Mesosphere Sprint 63, 
Mesosphere Sprint 64, Mesosphere Sprint 65, Mesosphere Sprint 66)

> Early disk usage check for garbage collection during recovery
> -
>
> Key: MESOS-7939
> URL: https://issues.apache.org/jira/browse/MESOS-7939
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>Priority: Critical
> Fix For: 1.4.1
>
>
> Currently the default value for `disk_watch_interval` is 1 minute. This is 
> not fast enough and could lead to the following scenario:
> 1. The disk usage was checked and there was not enough headroom:
> {noformat}
> I0901 17:54:33.00 25510 slave.cpp:5896] Current disk usage 99.87%. Max 
> allowed age: 0ns
> {noformat}
> But no container was pruned because no container had been scheduled for GC.
> 2. A task completed. The task itself contained a lot of nested 
> containers, each using a lot of disk space. Note that there is no way for 
> the Mesos agent to schedule individual nested containers for GC, since nested 
> containers are not necessarily tied to tasks. When the top-level container 
> completed, it was scheduled for GC, and the nested containers would be GC'ed 
> as well: 
> {noformat}
> I0901 17:54:44.00 25510 gc.cpp:59] Scheduling 
> '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e/runs/5e70adb1-939e-4d0f-a513-0f77704620bc'
>  for gc 1.9466483852days in the future
> I0901 17:54:44.00 25510 gc.cpp:59] Scheduling 
> '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e'
>  for gc 1.9466405037days in the future
> I0901 17:54:44.00 25510 gc.cpp:59] Scheduling 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e/runs/5e70adb1-939e-4d0f-a513-0f77704620bc'
>  for gc 1.946635763days in the future
> I0901 17:54:44.00 25510 gc.cpp:59] Scheduling 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e'
>  for gc 1.9466324148days in the future
> {noformat}
> 3. Since the next disk usage check was still 40-ish seconds away, no GC was 
> performed even though the disk was full. As a result, the Mesos agent failed 
> to checkpoint the task status:
> {noformat}
> I0901 17:54:49.00 25513 status_update_manager.cpp:323] Received status 
> update TASK_FAILED (UUID: bf24c3da-db23-4c82-a09f-a3b859e8cad4) for task 
> node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84 of framework 
> 9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005
> F0901 17:54:49.00 25513 slave.cpp:4748] CHECK_READY(future): is FAILED: 
> Failed to open 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__4ae69c7c-e32e-41d2-a485-88145a3e385c/runs/602befac-3ff5-44d7-acac-aeebdc0e4666/tasks/node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84/task.updates'
>  for status updates: No space left on device Failed to handle status update 
> TASK_FAILED (UUID: bf24c3da-db23-4c82-a09f-a3b859e8cad4) for task 
> node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84 of framework 
> 9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005
> {noformat}
> 4. When the agent restarted, it tried to checkpoint the task status again. 
> However, since the first disk usage check was scheduled 1 minute after 
> startup, the agent failed before GC kicked in, falling into a restart failure 
> loop:
> {noformat}
> F0901 17:55:06.00 31114 slave.cpp:4748] CHECK_READY(future): is FAILED: 
> Failed to open 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__4ae69c7c-e32e-41d2-a485-88145a3e385c/runs/602befac-3ff5-44d7-acac-aeebdc0e4666/tasks/node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84/task.updates'
>  for status updates: No space left on device Failed to handle status update 
> TASK_FAILED (UUID: fb9c3951-9a93-4925-a7f0-9ba7e38d2398) for task 
> node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84 of framework 
> 9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005
> {noformat}
> We should kick in GC early, so the agent can 
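
Until such an early GC pass exists, one stopgap, given the 1-minute default
described above, is to lower the agent's {{--disk_watch_interval}} flag so a
full disk is noticed sooner (the value and work dir below are illustrative):

{noformat}
mesos-agent --work_dir=/var/lib/mesos/slave \
  --disk_watch_interval=10secs
{noformat}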

[jira] [Updated] (MESOS-8079) Checkpoint and recover layers used to provision rootfs in provisioner

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8079:
--
Sprint: Mesosphere Sprint 65, Mesosphere Sprint 66, Mesosphere Sprint 67  
(was: Mesosphere Sprint 65, Mesosphere Sprint 66)

> Checkpoint and recover layers used to provision rootfs in provisioner
> -
>
> Key: MESOS-8079
> URL: https://issues.apache.org/jira/browse/MESOS-8079
> Project: Mesos
>  Issue Type: Task
>  Components: provisioner
>Reporter: Zhitao Li
>Assignee: Zhitao Li
>  Labels: Mesosphere
>
> This information will be necessary for {{provisioner}} to determine all 
> layers of active containers, which we need to retain when image gc happens.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8032) Launch CSI plugins in storage local resource provider.

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8032:
--
Sprint: Mesosphere Sprint 64, Mesosphere Sprint 65, Mesosphere Sprint 66, 
Mesosphere Sprint 67  (was: Mesosphere Sprint 64, Mesosphere Sprint 65, 
Mesosphere Sprint 66)

> Launch CSI plugins in storage local resource provider.
> --
>
> Key: MESOS-8032
> URL: https://issues.apache.org/jira/browse/MESOS-8032
> Project: Mesos
>  Issue Type: Task
>  Components: storage
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: mesosphere, storage
> Fix For: 1.5.0
>
>
> Launching a CSI plugin requires the following steps:
> 1. Verify the configuration.
> 2. Prepare a directory in the work directory of the resource provider where 
> the socket file should be placed, and construct the path of the socket file.
> 3. If the socket file already exists and the plugin is already running, we 
> should not launch another plugin instance.
> 4. Otherwise, launch a standalone container to run the plugin and connect to 
> it through the socket file.
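
For illustration, a minimal C++ sketch of steps 2 and 3, assuming stout's
{{path::join}} and {{os::exists}} helpers; the directory layout and function
names are hypothetical:

{code}
#include <string>

#include <stout/os.hpp>
#include <stout/path.hpp>

// Hypothetical layout: the socket lives under the resource
// provider's work directory (step 2).
std::string csiSocketPath(
    const std::string& workDir,
    const std::string& pluginName)
{
  return path::join(workDir, "csi", pluginName, "endpoint.sock");
}

// Step 3: if the socket file already exists (liveness probing of the
// running plugin is elided in this sketch), reuse it rather than
// launching another plugin instance.
bool shouldLaunchPlugin(const std::string& socketPath)
{
  return !os::exists(socketPath);
}
{code}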



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7306) Support mount propagation for host volumes.

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7306:
--
Sprint: Mesosphere Sprint 62, Mesosphere Sprint 63, Mesosphere Sprint 64, 
Mesosphere Sprint 66, Mesosphere Sprint 67  (was: Mesosphere Sprint 62, 
Mesosphere Sprint 63, Mesosphere Sprint 64, Mesosphere Sprint 66)

> Support mount propagation for host volumes.
> ---
>
> Key: MESOS-7306
> URL: https://issues.apache.org/jira/browse/MESOS-7306
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Jie Yu
>Assignee: Jie Yu
>  Labels: mesosphere, storage
>
> Currently, all mounts in a container are marked as 'slave' by default. 
> However, in some cases we may want mounts under certain directories in a 
> container to be propagated back to the root mount namespace. This is useful 
> when we want the mounts to survive container failures.
> See more documentation about mount propagation in:
> https://www.kernel.org/doc/Documentation/filesystems/sharedsubtree.txt
> Given that mount propagation is very hard for users to understand, it is 
> probably worth limiting this to just host volumes, because that is the only 
> use case we see at the moment.
> Some relevant discussion can be found here:
> https://github.com/kubernetes/community/blob/master/contributors/design-proposals/propagation.md



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8089) Add messages to publish resources on a resource provider

2017-10-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8089:
--
Sprint: Mesosphere Sprint 65, Mesosphere Sprint 66, Mesosphere Sprint 67  
(was: Mesosphere Sprint 65, Mesosphere Sprint 66)

> Add messages to publish resources on a resource provider
> 
>
> Key: MESOS-8089
> URL: https://issues.apache.org/jira/browse/MESOS-8089
> Project: Mesos
>  Issue Type: Task
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>  Labels: mesosphere
>
> Before launching a task that uses resource provider resources, the resource 
> provider needs to be informed to "publish" these resources, as it may need to 
> take some necessary actions. For external resource providers, resources might 
> also have to be "unpublished" when a task is finished. The resource provider 
> needs to acknowledge these calls once it's ready.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-8145) Breadcrumb link on Getting Started is broken

2017-10-30 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225338#comment-16225338
 ] 

James Peach commented on MESOS-8145:


Ping [~andschwa]

> Breadcrumb link on Getting Started is broken
> 
>
> Key: MESOS-8145
> URL: https://issues.apache.org/jira/browse/MESOS-8145
> Project: Mesos
>  Issue Type: Bug
>Reporter: Tomasz Janiszewski
>Priority: Minor
>
> After the getting started page was renamed from {{getting started}} to 
> {{getting-started}}, the breadcrumb is generated improperly and clicking it 
> gives a 404. This happens because the breadcrumb generator removes spaces 
> from the page breadcrumb and uses the result as part of the link.
> https://github.com/apache/mesos/blob/1.4.0/site/source/layouts/basic.erb#L73
> Introduced in MESOS-8117



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7594) Implement 'apply' for resource provider related operations

2017-10-30 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225131#comment-16225131
 ] 

Jie Yu commented on MESOS-7594:
---

commit 1d8992f26acb16c90ad406336fc6a7b9617ce8e3
Author: Jan Schlicht 
Date:   Fri Oct 27 11:38:29 2017 +0200

Implemented handling of storage-related offer operations in master.

While the resource provider manager is responsible for applying offer
operations by sending events to the respective resource providers,
the master takes care of accepting these operations. Hence, for local
resource providers the master sends an `ApplyResourceOperationMessage`
to the agent that the resource provider is running on. The agent
then relays the operation contained in the message to the resource
provider manager.

(This is based on https://reviews.apache.org/r/61947)

Review: https://reviews.apache.org/r/63356

> Implement 'apply' for resource provider related operations
> --
>
> Key: MESOS-7594
> URL: https://issues.apache.org/jira/browse/MESOS-7594
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>  Labels: mesosphere, storage
>
> Resource providers provide new offer operations ({{CREATE_BLOCK}}, 
> {{DESTROY_BLOCK}}, {{CREATE_VOLUME}}, {{DESTROY_VOLUME}}). These operations 
> can be applied by frameworks when they accept an offer. Handling of these 
> operations has to be added to the master's {{accept}} call, i.e. the 
> corresponding resource provider needs to be extracted from the offer's 
> resources and a {{resource_provider::Event::OPERATION}} has to be sent to the 
> resource provider. The resource provider will answer with a 
> {{resource_provider::Call::Update}}, which needs to be handled as well.
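
For illustration, a hedged sketch of how the master's {{accept}} path might
recognize these operations (assuming the new {{Offer::Operation}} types above;
this is not the actual master code):

{code}
#include <mesos/mesos.pb.h>

// Sketch only: operations of these types carry resource provider
// resources and would be forwarded to the corresponding resource
// provider rather than applied directly by the master.
bool isResourceProviderOperation(const mesos::Offer::Operation& operation)
{
  switch (operation.type()) {
    case mesos::Offer::Operation::CREATE_VOLUME:
    case mesos::Offer::Operation::DESTROY_VOLUME:
    case mesos::Offer::Operation::CREATE_BLOCK:
    case mesos::Offer::Operation::DESTROY_BLOCK:
      return true;
    default:
      return false;
  }
}
{code}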



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7594) Implement 'apply' for resource provider related operations

2017-10-30 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225127#comment-16225127
 ] 

Jie Yu commented on MESOS-7594:
---

commit 6c2c7d0ad9018e0b9050831aa7d9621b6c37fc03
Author: Jie Yu 
Date:   Thu Oct 26 12:19:56 2017 +0200

Added stub handler in agent for ApplyOfferOperationMessage.

Review: https://reviews.apache.org/r/63399

> Implement 'apply' for resource provider related operations
> --
>
> Key: MESOS-7594
> URL: https://issues.apache.org/jira/browse/MESOS-7594
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>  Labels: mesosphere, storage
>
> Resource providers provide new offer operations ({{CREATE_BLOCK}}, 
> {{DESTROY_BLOCK}}, {{CREATE_VOLUME}}, {{DESTROY_VOLUME}}). These operations 
> can be applied by frameworks when they accept an offer. Handling of these 
> operations has to be added to the master's {{accept}} call, i.e. the 
> corresponding resource provider needs to be extracted from the offer's 
> resources and a {{resource_provider::Event::OPERATION}} has to be sent to the 
> resource provider. The resource provider will answer with a 
> {{resource_provider::Call::Update}}, which needs to be handled as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7594) Implement 'apply' for resource provider related operations

2017-10-30 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225123#comment-16225123
 ] 

Jie Yu commented on MESOS-7594:
---

commit 2a620fd1296620004ed24e260e5a8993c9652427
Author: Jan Schlicht 
Date:   Tue Oct 17 15:19:29 2017 +0200

Removed TODOs from storage operation 'apply' handlers.

These operations don't alter the offered resources immediately.
Resources might be altered only after operation feedback has been
received, but that will be handled in a different code path.

Review: https://reviews.apache.org/r/63105

> Implement 'apply' for resource provider related operations
> --
>
> Key: MESOS-7594
> URL: https://issues.apache.org/jira/browse/MESOS-7594
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>  Labels: mesosphere, storage
>
> Resource providers provide new offer operations ({{CREATE_BLOCK}}, 
> {{DESTROY_BLOCK}}, {{CREATE_VOLUME}}, {{DESTROY_VOLUME}}). These operations 
> can be applied by frameworks when they accept an offer. Handling of these 
> operations has to be added to the master's {{accept}} call, i.e. the 
> corresponding resource provider needs to be extracted from the offer's 
> resources and a {{resource_provider::Event::OPERATION}} has to be sent to the 
> resource provider. The resource provider will answer with a 
> {{resource_provider::Call::Update}}, which needs to be handled as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7594) Implement 'apply' for resource provider related operations

2017-10-30 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225121#comment-16225121
 ] 

Jie Yu commented on MESOS-7594:
---

commit 6c8ae68180f92bba0dbdf0516cc833c04f958f5b
Author: Jan Schlicht 
Date:   Tue Oct 24 00:37:36 2017 -0700

Added validation for disk related new operations.

(This is based on https://reviews.apache.org/r/61946)

Review: https://reviews.apache.org/r/63355

> Implement 'apply' for resource provider related operations
> --
>
> Key: MESOS-7594
> URL: https://issues.apache.org/jira/browse/MESOS-7594
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>  Labels: mesosphere, storage
>
> Resource providers provide new offer operations ({{CREATE_BLOCK}}, 
> {{DESTROY_BLOCK}}, {{CREATE_VOLUME}}, {{DESTROY_VOLUME}}). These operations 
> can be applied by frameworks when they accept an offer. Handling of these 
> operations has to be added to the master's {{accept}} call, i.e. the 
> corresponding resource provider needs to be extracted from the offer's 
> resources and a {{resource_provider::Event::OPERATION}} has to be sent to the 
> resource provider. The resource provider will answer with a 
> {{resource_provider::Call::Update}}, which needs to be handled as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-8146) Mesos agent fails containers on restart if containers were started with memory-swap less than memory + 64mb

2017-10-30 Thread Guchakov Nikita (JIRA)
Guchakov Nikita created MESOS-8146:
--

 Summary: Mesos agent fails containers on restart if containers 
were started with memory-swap less than memory + 64mb
 Key: MESOS-8146
 URL: https://issues.apache.org/jira/browse/MESOS-8146
 Project: Mesos
  Issue Type: Bug
  Components: agent
Affects Versions: 1.4.0
 Environment: Mesos 1.4.0
Redhat 7.4
Marathon 1.4.8
docker 1.12.6 
docker api 1.24
Reporter: Guchakov Nikita


I've run into some strange behaviour with Mesos when trying to disable swap on 
Docker containers. The Mesos version in use is 1.4.0.
When marathon deploys containers with
```
"parameters": [
{
  "key": "memory",
  "value": "1024m"
},
{
  "key": "memory-swap",
  "value": "1024m"
}
  ]
```

then it deploys successfully. But when the mesos-slave restarts and tries to 
recover the executor, it fails:

```
E1027 11:11:47.367416 12626 slave.cpp:4287] Failed to update resources for 
container 6e3e07af-db09-4dc0-88f8-4e5599529cbe of executor 
'templates-api.d72549fd-baed-11e7-9742-96b37b4eca54' of framework 
20171020-202151-141892780-5050-1-0001, destroying container: Failed to set 
'memory.limit_in_bytes': Invalid argument
```

Things got weirder when I tried different memory-swap configurations: 
containers are not destroyed on the slave's restart only when memory-swap >= 
memory + 64mb.
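
Per the reporter's observation, a configuration that keeps memory-swap at least 
64mb above memory avoids the failure, e.g.:

```
"parameters": [
    {
      "key": "memory",
      "value": "1024m"
    },
    {
      "key": "memory-swap",
      "value": "1088m"
    }
  ]
```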



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-8094) Leverage helper functions to reduce boilerplate code related to v1 API.

2017-10-30 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224769#comment-16224769
 ] 

Alexander Rukletsov commented on MESOS-8094:


{noformat}
Commit: c0e1b41f82d764fe240de721c30314da2b62445a [c0e1b41]
Author: James Peach 
Date: 30 October 2017 at 12:44:55 GMT+1
Committer: Alexander Rukletsov 

Stopped awaiting the connected event in ports isolator tests.

Rather than explicitly waiting for the `connected` scheduler event,
consistently apply the testing pattern from the default executor
tests. We expect that the `connected` event happens, but we only
need to synchronize the test on the `subscribed` event.

Review: https://reviews.apache.org/r/63183/
{noformat}

> Leverage helper functions to reduce boilerplate code related to v1 API.
> ---
>
> Key: MESOS-8094
> URL: https://issues.apache.org/jira/browse/MESOS-8094
> Project: Mesos
>  Issue Type: Improvement
>  Components: test
>Reporter: Alexander Rukletsov
>  Labels: mesosphere, newbie
>
> https://reviews.apache.org/r/61982/ created an example of how test code 
> related to the scheduler v1 API can be simplified with appropriate use of 
> helper functions. For example, instead of crafting a subscribe call manually 
> like in
> {noformat}
>   {
> v1::scheduler::Call call;
> call.set_type(v1::scheduler::Call::SUBSCRIBE);
> v1::scheduler::Call::Subscribe* subscribe = call.mutable_subscribe();
> subscribe->mutable_framework_info()->CopyFrom(v1::DEFAULT_FRAMEWORK_INFO);
> mesos.send(call);
>   }
> {noformat}
> a helper function {{v1::scheduler::SendSubscribe()}} shall be invoked.
> To find all occurrences that shall be fixed, one can grep the test codebase 
> for {{call.set_type}}. At the moment I see the following files:
> {noformat}
> api_tests.cpp
> check_tests.cpp
> http_fault_tolerant_tests.cpp
> master_maintenance_tests.cpp
> master_tests.cpp
> scheduler_tests.cpp
> slave_authorization_tests.cpp
> slave_recovery_tests.cpp
> slave_tests.cpp
> {noformat}
> The same applies to sending status update acknowledgements: the 
> {{v1::scheduler::SendAcknowledge()}} action shall be used instead of manually 
> crafting acks.
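
For illustration, the manual subscribe block quoted above collapses into the
expectation itself when the gmock action from the test helpers is used (a
sketch of the intended pattern):

{noformat}
  EXPECT_CALL(*scheduler, connected(_))
    .WillOnce(v1::scheduler::SendSubscribe(v1::DEFAULT_FRAMEWORK_INFO));
{noformat}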



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7886) Add master hook for setting environment variables

2017-10-30 Thread Matthew Mead-Briggs (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224609#comment-16224609
 ] 

Matthew Mead-Briggs commented on MESOS-7886:


[~vinodkone] here's the ticket that we discussed at MesosCon.

I'm going to be rolling this approach out at Yelp pretty soon. The diff above 
is heavily inspired by the hook for setting task labels. After your talk I was 
thinking about how I would get the volume-based secrets working too. I wonder 
if it makes sense to have a hook that just allows the user to override the 
whole TaskInfo object, rather than a hook for each thing within it? It's a 
longer-term goal anyway, as it'll only be easy if we move to the Mesos 
containerizer. Maybe you have some better ideas about how we could hook secret 
resolvers into the master side more properly?

> Add master hook for setting environment variables
> -
>
> Key: MESOS-7886
> URL: https://issues.apache.org/jira/browse/MESOS-7886
> Project: Mesos
>  Issue Type: Improvement
>  Components: modules
>Reporter: Matthew Mead-Briggs
>
> At Yelp we're planning to integrate our secret store with our 
> platform-as-a-service, which runs on Mesos.
> I was hoping to write a module to "inject" environment variables on the 
> master side, but the necessary hook doesn't currently exist. Such a hook 
> already exists on the slave side; however, using it for this integration would 
> require me to give all the agents access to the secret store, and I'd much 
> prefer to limit this to the master side.
> There is already a hook for adding labels:
> https://github.com/apache/mesos/blob/72752fc6deb8ebcbfbd5448dc599ef3774339d31/include/mesos/hook.hpp#L44-L48
> So it seems it should be pretty easy to add one for setting environment 
> variables too? I had a crack at it the other day, but although I got my code 
> to compile, something was not working at runtime (note: I'm not a C++ dev). Is 
> there any reason why we wouldn't want such a hook? If anyone can confirm that 
> it's a sane thing to add, I'd be happy to spend some time trying to get 
> it working (although I may need some help)!
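
For illustration, a hypothetical master-side decorator mirroring the shape of
the existing label hook; this is a sketch of the reporter's proposal, not an
existing Mesos API:

{code}
// Hypothetical addition to include/mesos/hook.hpp (sketch only).
// Mirroring masterLaunchTaskLabelDecorator: the master would call
// this before launching a task and merge any returned variables
// into the task's command environment.
virtual Result<Environment> masterLaunchTaskEnvironmentDecorator(
    const TaskInfo& taskInfo,
    const FrameworkInfo& frameworkInfo)
{
  return None();
}
{code}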



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7972) SlaveTest.HTTPSchedulerSlaveRestart test is flaky

2017-10-30 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7972:
---
Shepherd: Alexander Rukletsov
  Sprint: Mesosphere Sprint 66
Story Points: 1
  Labels: flaky-test mesosphere  (was: )

> SlaveTest.HTTPSchedulerSlaveRestart test is flaky
> -
>
> Key: MESOS-7972
> URL: https://issues.apache.org/jira/browse/MESOS-7972
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.4.0
>Reporter: Vinod Kone
>Assignee: Benjamin Mahler
>  Labels: flaky-test, mesosphere
> Fix For: 1.5.0
>
> Attachments: slave_test_http_scheduler_restart.bad.log, 
> slave_test_http_scheduler_restart.good.log
>
>
> Saw this on ASF CI when testing 1.4.0-rc5
> {code}
> [ RUN  ] SlaveTest.HTTPSchedulerSlaveRestart
> I0912 05:40:15.280185 32547 cluster.cpp:162] Creating default 'local' 
> authorizer
> I0912 05:40:15.282783 32554 master.cpp:442] Master 
> c23ff8cf-cb2f-40d0-8f18-871a41f128cf (b909d5e22907) started on 
> 172.17.0.2:58922
> I0912 05:40:15.282804 32554 master.cpp:444] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/he1E9j/credentials" 
> --filter_gpu_resources="true" --framework_sorter="drf" --help="false" 
> --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/he1E9j/master" 
> --zk_session_timeout="10secs"
> I0912 05:40:15.283092 32554 master.cpp:494] Master only allowing 
> authenticated frameworks to register
> I0912 05:40:15.283110 32554 master.cpp:508] Master only allowing 
> authenticated agents to register
> I0912 05:40:15.283118 32554 master.cpp:521] Master only allowing 
> authenticated HTTP frameworks to register
> I0912 05:40:15.283123 32554 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/he1E9j/credentials'
> I0912 05:40:15.283394 32554 master.cpp:566] Using default 'crammd5' 
> authenticator
> I0912 05:40:15.283543 32554 http.cpp:1026] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0912 05:40:15.283731 32554 http.cpp:1026] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0912 05:40:15.283887 32554 http.cpp:1026] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0912 05:40:15.284021 32554 master.cpp:646] Authorization enabled
> I0912 05:40:15.284293 32552 whitelist_watcher.cpp:77] No whitelist given
> I0912 05:40:15.284335 32550 hierarchical.cpp:171] Initialized hierarchical 
> allocator process
> I0912 05:40:15.287078 32561 master.cpp:2163] Elected as the leading master!
> I0912 05:40:15.287103 32561 master.cpp:1702] Recovering from registrar
> I0912 05:40:15.287214 32557 registrar.cpp:347] Recovering registrar
> I0912 05:40:15.287703 32557 registrar.cpp:391] Successfully fetched the 
> registry (0B) in 455936ns
> I0912 05:40:15.287791 32557 registrar.cpp:495] Applied 1 operations in 
> 24179ns; attempting to update the registry
> I0912 05:40:15.288317 32557 registrar.cpp:552] Successfully updated the 
> registry in 473088ns
> I0912 05:40:15.288435 32557 registrar.cpp:424] Successfully recovered 
> registrar
> I0912 05:40:15.288789 32548 master.cpp:1801] Recovered 0 agents from the 
> registry (129B); allowing 10mins for agents to re-register
> I0912 05:40:15.288822 32559 hierarchical.cpp:209] Skipping recovery of 
> hierarchical allocator: nothing to recover
> I0912 05:40:15.292457 32547 containerizer.cpp:246] Using isolation: 
> posix/cpu,posix/mem,filesystem/posix,network/cni,environment_secret
> W0912 05:40:15.293053 32547 backend.cpp:76] Failed to create 'aufs' backend: 
> AufsBackend requires root privileges
> W0912 05:40:15.293184 32547