[jira] [Commented] (MESOS-6596) Dynamic reservation endpoint returns 409s

2016-11-15 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669164#comment-15669164
 ] 

Joseph Wu commented on MESOS-6596:
--

Can you include the actual body of the request you are making?  Your request 
appears to be failing on a validation step, which should be deterministic.  
Keep in mind that the reserve call is not guaranteed to succeed, as a framework 
may be given those resources before you manage to complete the call.

Also, please include the version of Mesos.  In the current codebase, it is not 
possible to get a 409 return status with that message.  It is possible to get a 
400 return status, or a 409 with a different message.
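
For reference, a request to the master's /reserve endpoint is normally form-encoded with two fields, `slaveId` and `resources`, where `resources` is a JSON array. The Python sketch below builds such a body. The agent ID, role, and principal values are placeholders, and the resource format differs between Mesos versions, so this illustrates the expected shape and is not the reporter's actual payload.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_reserve_body(agent_id, role, principal, mem_mb):
    """Build a form-encoded body for POST /master/reserve.

    The field names (slaveId, resources) follow the Mesos operator endpoint
    documentation; every value passed in here is a hypothetical placeholder.
    """
    resources = [
        {
            "name": "mem",
            "type": "SCALAR",
            "scalar": {"value": mem_mb},
            "role": role,
            "reservation": {"principal": principal},
        }
    ]
    return urlencode({"slaveId": agent_id, "resources": json.dumps(resources)})


body = build_reserve_body("agent-id-placeholder", "cassandra-role", "cassandra", 120621)

# Decode the body again to confirm what the master would receive.
decoded = parse_qs(body)
print(decoded["slaveId"][0])
print(json.loads(decoded["resources"][0]))
```

Posting the decoded body alongside the bug report makes it easy to see whether the requested resources match what the agent actually has unreserved.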


> Dynamic reservation endpoint returns 409s
> -
>
> Key: MESOS-6596
> URL: https://issues.apache.org/jira/browse/MESOS-6596
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Kunal Thakar
>
> The operation to dynamically reserve a host for a framework consistently 
> fails, but succeeds sometimes.
> We are calling the /reserve endpoint on the master with the same payload and 
> it mostly returns 409, with the occasional success. Pasting the output of two 
> consecutive /reserve calls:
> {code}
> * About to connect() to computexxx-yyy port 5050 (#0)
> *   Trying 10.184.21.3... connected
> * Server auth using Basic with user 'cassandra'
> > POST /master/reserve HTTP/1.1
> > Authorization: Basic blah
> > User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.2j 
> > zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> > Host: computexxx-yyy:5050
> > Accept: */*
> > Content-Length: 1046
> > Content-Type: application/x-www-form-urlencoded
> > Expect: 100-continue
> >
> * Done waiting for 100-continue
> < HTTP/1.1 409 Conflict
> HTTP/1.1 409 Conflict
> < Date: Tue, 15 Nov 2016 23:07:10 GMT
> Date: Tue, 15 Nov 2016 23:07:10 GMT
> < Content-Type: text/plain; charset=utf-8
> Content-Type: text/plain; charset=utf-8
> < Content-Length: 58
> Content-Length: 58
> * HTTP error before end of send, stop sending
> <
> * Closing connection #0
> Invalid RESERVE Operation:  does not contain mem(*):120621
> {code}
> {code}
> * About to connect() to computexxx-yyy port 5050 (#0)
> *   Trying 10.184.21.3... connected
> * Server auth using Basic with user 'cassandra'
> > POST /master/reserve HTTP/1.1
> > Authorization: Basic blah
> > User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.2j 
> > zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> > Host: computexxx-yyy:5050
> > Accept: */*
> > Content-Length: 1046
> > Content-Type: application/x-www-form-urlencoded
> > Expect: 100-continue
> >
> * Done waiting for 100-continue
> < HTTP/1.1 202 Accepted
> HTTP/1.1 202 Accepted
> < Date: Tue, 15 Nov 2016 23:07:16 GMT
> Date: Tue, 15 Nov 2016 23:07:16 GMT
> < Content-Length: 0
> Content-Length: 0
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6196) Make the disk/xfs isolator nesting aware.

2016-11-15 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668947#comment-15668947
 ] 

James Peach commented on MESOS-6196:


FWIW I don't think that there's anything to do here as long as

* the XFS quota is applied to the enclosing task group directory
* all sub-containers have their scratch space within that directory
* the sub-container disk resource is a subset of the task group disk resource

One reason for adding nesting support would be to restrict the disk resource of 
individual sub-containers. Not sure whether that is expected to be part of the 
nesting semantics.
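
The three conditions above can be sketched as a simple check. This Python illustration is not part of the isolator; the function name, the `(scratch_dir, disk_mb)` pair layout, and the reading of "subset" as "the sub-container disk amounts sum to no more than the task group's" are all assumptions.

```python
from pathlib import PurePosixPath


def nesting_is_covered(group_dir, group_disk_mb, sub_containers):
    """Return True if the task group's quota already covers every sub-container.

    sub_containers is a list of (scratch_dir, disk_mb) pairs. All names here
    are hypothetical and are not part of the Mesos isolator API.
    """
    group = PurePosixPath(group_dir)
    total = 0
    for scratch_dir, disk_mb in sub_containers:
        scratch = PurePosixPath(scratch_dir)
        # The scratch space must live inside the task group directory.
        if scratch != group and group not in scratch.parents:
            return False
        total += disk_mb
    # The sub-container disk resources must fit within the group's resource.
    return total <= group_disk_mb
```

If all three conditions hold, a single project quota on the task group directory bounds every sub-container, which is why no per-container handling would be required.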

> Make the disk/xfs isolator nesting aware.
> -
>
> Key: MESOS-6196
> URL: https://issues.apache.org/jira/browse/MESOS-6196
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>






[jira] [Commented] (MESOS-6575) Change `disk/xfs` isolator to terminate executor when it exceeds quota

2016-11-15 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668934#comment-15668934
 ] 

James Peach commented on MESOS-6575:


A significant benefit of the {{disk/xfs}} isolator is that it doesn't kill the 
task, so I'm not very supportive of this. I suppose that it could be 
implemented as an additional feature flag, but I'm not sure why you would want 
this. IMHO the behavior of the {{disk/du}} isolator is pretty undesirable.

> Change `disk/xfs` isolator to terminate executor when it exceeds quota
> --
>
> Key: MESOS-6575
> URL: https://issues.apache.org/jira/browse/MESOS-6575
> Project: Mesos
>  Issue Type: Task
>  Components: isolation, slave
>Reporter: Santhosh Kumar Shanmugham
>
> Unlike the {{disk/du}} isolator, which sends a {{ContainerLimitation}} protobuf 
> when the executor exceeds the quota, the {{disk/xfs}} isolator relies on 
> XFS's internal quota enforcement and silently fails the {{write}} operation 
> that would cause the quota limit to be exceeded, without surfacing the quota 
> breach information.
> This task is to change the `disk/xfs` isolator so that a 
> {{ContainerLimitation}} message is triggered when the quota is exceeded. 
> This feature will rely on the underlying filesystem being mounted with 
> {{pqnoenforce}} (accounting-only mode), so that XFS does not silently cause 
> an {{EDQUOT}} error on writes that would exceed the quota. The 
> isolator can then track the disk quota via {{xfs_quota}}, much as 
> {{disk/du}} does using {{du}}, every {{container_disk_watch_interval}} and surface 
> the quota-exceeded event via a {{ContainerLimitation}} protobuf, 
> causing the executor to be terminated. This feature can then be turned on or off 
> via the existing {{enforce_container_disk_quota}} option.
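
The proposed behavior amounts to a periodic check of measured usage against the limit. A rough Python sketch of one such check follows; the function name, the `measure_usage` callable, and the returned dict (a stand-in for the {{ContainerLimitation}} protobuf) are hypothetical and do not reflect the isolator's real interface.

```python
def check_disk_quota(container_id, measure_usage, limit_bytes, enforce=True):
    """Run one watch-interval check and return a limitation record or None.

    measure_usage is any callable returning the bytes in use; in the proposal
    this figure would come from XFS quota accounting. The returned dict is a
    hypothetical stand-in for the ContainerLimitation protobuf.
    """
    usage = measure_usage()
    # With enforcement off, usage is only observed and never reported as a breach.
    if enforce and usage > limit_bytes:
        return {
            "container_id": container_id,
            "resource": "disk",
            "message": f"Disk usage ({usage} B) exceeds quota ({limit_bytes} B)",
        }
    return None
```

The `enforce` parameter plays the role the description assigns to {{enforce_container_disk_quota}}: accounting continues either way, and only the reporting of a limitation is switched.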





[jira] [Assigned] (MESOS-5393) XFS disk isolator should disallow sandbox writes when no 'disk' is used in executor/task

2016-11-15 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach reassigned MESOS-5393:
--

Assignee: James Peach

> XFS disk isolator should disallow sandbox writes when no 'disk' is used in 
> executor/task
> 
>
> Key: MESOS-5393
> URL: https://issues.apache.org/jira/browse/MESOS-5393
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.0.0
>Reporter: Yan Xu
>Assignee: James Peach
>
> This is similar to MESOS-5081 and was left as a TODO in the first patch for 
> the XFS isolator.
> {noformat:title=}
> // TODO(jpeach) If there's no disk resource attached, we should set the
> // minimum quota (1 block), since a zero quota would be unconstrained.
> {noformat}
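
The idea in the TODO can be sketched as a clamp on the computed quota. The Python below is an illustration only: the helper name and the 512-byte block size are assumptions, not taken from the isolator source.

```python
BASIC_BLOCK_SIZE = 512  # Assumed quota block size in bytes.


def quota_blocks(disk_bytes):
    """Convert a disk resource in bytes to a block quota, never returning zero.

    A zero quota would mean "no limit", so a missing or empty disk resource
    is clamped to the minimum quota of one block.
    """
    # Integer ceiling division avoids floating-point rounding.
    blocks = -(-disk_bytes // BASIC_BLOCK_SIZE)
    return max(1, blocks)
```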





[jira] [Assigned] (MESOS-5158) Provide XFS quota support for persistent volumes.

2016-11-15 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach reassigned MESOS-5158:
--

Assignee: James Peach

> Provide XFS quota support for persistent volumes.
> -
>
> Key: MESOS-5158
> URL: https://issues.apache.org/jira/browse/MESOS-5158
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Yan Xu
>Assignee: James Peach
>
> Given that the lifecycle of persistent volumes is managed outside of the 
> isolator, we may need to further abstract out the quota management 
> functionality to do it outside the XFS isolator.





[jira] [Assigned] (MESOS-5116) Investigate supporting accounting only mode in XFS isolator

2016-11-15 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach reassigned MESOS-5116:
--

Assignee: James Peach

> Investigate supporting accounting only mode in XFS isolator
> ---
>
> Key: MESOS-5116
> URL: https://issues.apache.org/jira/browse/MESOS-5116
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Yan Xu
>Assignee: James Peach
>
> The initial implementation of XFS isolator always enforces the disk quota 
> limit. In contrast, Posix disk isolator supports optionally monitoring the 
> disk usage without enforcement. This eases the transition into disk quota 
> enforcement mode.
> Mesos agent provides a {{flags.enforce_container_disk_quota}} flag to turn on 
> enforcement when the Posix isolator is added. With XFS either we support it 
> as well or we need to change the flag so it's Posix disk isolator specific.





[jira] [Updated] (MESOS-6596) Dynamic reservation endpoint returns 409s

2016-11-15 Thread Kunal Thakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Thakar updated MESOS-6596:

Description: 
The operation to dynamically reserve a host for a framework consistently fails, 
but succeeds sometimes.

We are calling the /reserve endpoint on the master with the same payload and it 
mostly returns 409, with the occasional success. Pasting the output of two 
consecutive /reserve calls:

{code}
* About to connect() to computexxx-yyy port 5050 (#0)
*   Trying 10.184.21.3... connected
* Server auth using Basic with user 'cassandra'
> POST /master/reserve HTTP/1.1
> Authorization: Basic blah
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.2j 
> zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: computexxx-yyy:5050
> Accept: */*
> Content-Length: 1046
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
* Done waiting for 100-continue
< HTTP/1.1 409 Conflict
HTTP/1.1 409 Conflict
< Date: Tue, 15 Nov 2016 23:07:10 GMT
Date: Tue, 15 Nov 2016 23:07:10 GMT
< Content-Type: text/plain; charset=utf-8
Content-Type: text/plain; charset=utf-8
< Content-Length: 58
Content-Length: 58

* HTTP error before end of send, stop sending
<
* Closing connection #0
Invalid RESERVE Operation:  does not contain mem(*):120621
{code}

{code}
* About to connect() to computexxx-yyy port 5050 (#0)
*   Trying 10.184.21.3... connected
* Server auth using Basic with user 'cassandra'
> POST /master/reserve HTTP/1.1
> Authorization: Basic blah
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.2j 
> zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: computexxx-yyy:5050
> Accept: */*
> Content-Length: 1046
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
* Done waiting for 100-continue
< HTTP/1.1 202 Accepted
HTTP/1.1 202 Accepted
< Date: Tue, 15 Nov 2016 23:07:16 GMT
Date: Tue, 15 Nov 2016 23:07:16 GMT
< Content-Length: 0
Content-Length: 0
{code}


  was:
The operation to dynamically reserve a host for a framework consistently fails, 
but succeeds sometimes.

We are calling the /reserve endpoint on the master with the same payload and it 
mostly returns 409, with the occasional success. Pasting the output of two 
consecutive /reserve calls:

```
* About to connect() to computexxx-yyy port 5050 (#0)
*   Trying 10.184.21.3... connected
* Server auth using Basic with user 'cassandra'
> POST /master/reserve HTTP/1.1
> Authorization: Basic blah
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.2j 
> zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: computexxx-yyy:5050
> Accept: */*
> Content-Length: 1046
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
* Done waiting for 100-continue
< HTTP/1.1 409 Conflict
HTTP/1.1 409 Conflict
< Date: Tue, 15 Nov 2016 23:07:10 GMT
Date: Tue, 15 Nov 2016 23:07:10 GMT
< Content-Type: text/plain; charset=utf-8
Content-Type: text/plain; charset=utf-8
< Content-Length: 58
Content-Length: 58

* HTTP error before end of send, stop sending
<
* Closing connection #0
Invalid RESERVE Operation:  does not contain mem(*):120621
```

```
* About to connect() to computexxx-yyy port 5050 (#0)
*   Trying 10.184.21.3... connected
* Server auth using Basic with user 'cassandra'
> POST /master/reserve HTTP/1.1
> Authorization: Basic blah
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.2j 
> zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: computexxx-yyy:5050
> Accept: */*
> Content-Length: 1046
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
* Done waiting for 100-continue
< HTTP/1.1 202 Accepted
HTTP/1.1 202 Accepted
< Date: Tue, 15 Nov 2016 23:07:16 GMT
Date: Tue, 15 Nov 2016 23:07:16 GMT
< Content-Length: 0
Content-Length: 0
```


> Dynamic reservation endpoint returns 409s
> -
>
> Key: MESOS-6596
> URL: https://issues.apache.org/jira/browse/MESOS-6596
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Kunal Thakar
>
> The operation to dynamically reserve a host for a framework consistently 
> fails, but succeeds sometimes.
> We are calling the /reserve endpoint on the master with the same payload and 
> it mostly returns 409, with the occasional success. Pasting the output of two 
> consecutive /reserve calls:
> {code}
> * About to connect() to computexxx-yyy port 5050 (#0)
> *   Trying 10.184.21.3... connected
> * Server auth using Basic with user 'cassandra'
> > POST /master/reserve HTTP/1.1
> > Authorization: Basic blah
> > User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.2j 
> > zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> > Host: computexxx-yyy:5050
> > Accept: */*
> > Content-Length: 1046
> > Content-Type: application/x-www-form-urlencoded
> > Expect: 100-continue
> >
> * Done waiting 

[jira] [Created] (MESOS-6596) Dynamic reservation endpoint returns 409s

2016-11-15 Thread Kunal Thakar (JIRA)
Kunal Thakar created MESOS-6596:
---

 Summary: Dynamic reservation endpoint returns 409s
 Key: MESOS-6596
 URL: https://issues.apache.org/jira/browse/MESOS-6596
 Project: Mesos
  Issue Type: Bug
  Components: master
Reporter: Kunal Thakar


The operation to dynamically reserve a host for a framework consistently fails, 
but succeeds sometimes.

We are calling the /reserve endpoint on the master with the same payload and it 
mostly returns 409, with the occasional success. Pasting the output of two 
consecutive /reserve calls:

```
* About to connect() to computexxx-yyy port 5050 (#0)
*   Trying 10.184.21.3... connected
* Server auth using Basic with user 'cassandra'
> POST /master/reserve HTTP/1.1
> Authorization: Basic blah
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.2j 
> zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: computexxx-yyy:5050
> Accept: */*
> Content-Length: 1046
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
* Done waiting for 100-continue
< HTTP/1.1 409 Conflict
HTTP/1.1 409 Conflict
< Date: Tue, 15 Nov 2016 23:07:10 GMT
Date: Tue, 15 Nov 2016 23:07:10 GMT
< Content-Type: text/plain; charset=utf-8
Content-Type: text/plain; charset=utf-8
< Content-Length: 58
Content-Length: 58

* HTTP error before end of send, stop sending
<
* Closing connection #0
Invalid RESERVE Operation:  does not contain mem(*):120621
```

```
* About to connect() to computexxx-yyy port 5050 (#0)
*   Trying 10.184.21.3... connected
* Server auth using Basic with user 'cassandra'
> POST /master/reserve HTTP/1.1
> Authorization: Basic blah
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.2j 
> zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: computexxx-yyy:5050
> Accept: */*
> Content-Length: 1046
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
* Done waiting for 100-continue
< HTTP/1.1 202 Accepted
HTTP/1.1 202 Accepted
< Date: Tue, 15 Nov 2016 23:07:16 GMT
Date: Tue, 15 Nov 2016 23:07:16 GMT
< Content-Length: 0
Content-Length: 0
```





[jira] [Commented] (MESOS-6595) As a Mesos user I want to launch processes that will run on every node in the cluster

2016-11-15 Thread James DeFelice (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668408#comment-15668408
 ] 

James DeFelice commented on MESOS-6595:
---

Marathon users have been asking for this feature for years: 
https://github.com/mesosphere/marathon/issues/846

> As a Mesos user I want to launch processes that will run on every node in the 
> cluster
> -
>
> Key: MESOS-6595
> URL: https://issues.apache.org/jira/browse/MESOS-6595
> Project: Mesos
>  Issue Type: Story
>Reporter: James DeFelice
>  Labels: mesosphere
>
> Some applicable use cases:
> - log collection
> - metrics and monitoring
> - service discovery
> It might also be useful to break this functionality down into: daemon 
> processes for master nodes vs. daemon processes for agent nodes.
> There was some initial discussion and back-of-the-napkin design for this at 
> Mesoscon this past year (with an emphasis on agent nodes) but I'm not aware 
> that anything significant materialized from that.





[jira] [Created] (MESOS-6595) As a Mesos user I want to launch processes that will run on every node in the cluster

2016-11-15 Thread James DeFelice (JIRA)
James DeFelice created MESOS-6595:
-

 Summary: As a Mesos user I want to launch processes that will run 
on every node in the cluster
 Key: MESOS-6595
 URL: https://issues.apache.org/jira/browse/MESOS-6595
 Project: Mesos
  Issue Type: Story
Reporter: James DeFelice


Some applicable use cases:
- log collection
- metrics and monitoring
- service discovery

It might also be useful to break this functionality down into: daemon processes 
for master nodes vs. daemon processes for agent nodes.

There was some initial discussion and back-of-the-napkin design for this at 
Mesoscon this past year (with an emphasis on agent nodes) but I'm not aware 
that anything significant materialized from that.





[jira] [Created] (MESOS-6594) Add `Containerizer::attach()` API call

2016-11-15 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-6594:
-

 Summary: Add `Containerizer::attach()` API call
 Key: MESOS-6594
 URL: https://issues.apache.org/jira/browse/MESOS-6594
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
Assignee: Vinod Kone


This ticket just tracks the API change but not the actual implementation.





[jira] [Updated] (MESOS-6546) Update the Containerizer to handle attachInput and attachOutput calls.

2016-11-15 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6546:
--
Summary: Update the Containerizer to handle attachInput and attachOutput 
calls.  (was: Update the Containerizer API to include attachInput and 
attachOutput calls.)

> Update the Containerizer to handle attachInput and attachOutput calls.
> --
>
> Key: MESOS-6546
> URL: https://issues.apache.org/jira/browse/MESOS-6546
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> With the per-container I/O switchboard we are adding, the containerizer 
> should be responsible for both launching the I/O switchboard process, as well 
> as allowing external components to interface with it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6162) Add support for cgroups blkio subsystem

2016-11-15 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6162:

Assignee: Jason Lai  (was: Zhitao Li)

> Add support for cgroups blkio subsystem
> ---
>
> Key: MESOS-6162
> URL: https://issues.apache.org/jira/browse/MESOS-6162
> Project: Mesos
>  Issue Type: Task
>Reporter: haosdent
>Assignee: Jason Lai
>
> Note that the cgroups blkio subsystem may have performance issues; refer to 
> https://github.com/opencontainers/runc/issues/861





[jira] [Updated] (MESOS-6590) Update protobuf for cgroups blkio subsystem

2016-11-15 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6590:

Assignee: Jason Lai

> Update protobuf for cgroups blkio subsystem
> ---
>
> Key: MESOS-6590
> URL: https://issues.apache.org/jira/browse/MESOS-6590
> Project: Mesos
>  Issue Type: Task
>Reporter: haosdent
>Assignee: Jason Lai
>






[jira] [Comment Edited] (MESOS-6588) LinuxRootfs misses required files

2016-11-15 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667754#comment-15667754
 ] 

James Peach edited comment on MESOS-6588 at 11/15/16 5:37 PM:
--

|Move containerizer Rootfs support to a cpp file. 
|[https://reviews.apache.org/r/53790|https://reviews.apache.org/r/53790] |
|Use the stout ELF parser to collect Linux rootfs files. 
|[https://reviews.apache.org/r/53791|https://reviews.apache.org/r/53791] |


was (Author: jamespeach):
|Move containerizer Rootfs support to a cpp file. 
|https://reviews.apache.org/r/53790 |
|Use the stout ELF parser to collect Linux rootfs files. 
|https://reviews.apache.org/r/53791 |

> LinuxRootfs misses required files
> -
>
> Key: MESOS-6588
> URL: https://issues.apache.org/jira/browse/MESOS-6588
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, tests
>Reporter: James Peach
>Assignee: James Peach
>
> The hard-coded list of required files in 
> {{src/tests/containerizer/rootfs.hpp}} is out of date for Fedora 24. F24 now 
> requires {{libtinfo.so.6}} and {{/lib64/libcrypto.so.10}}.





[jira] [Assigned] (MESOS-6558) Added stub classes for rest cgroups subsystems.

2016-11-15 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-6558:
---

Assignee: haosdent

> Added stub classes for rest cgroups subsystems.
> ---
>
> Key: MESOS-6558
> URL: https://issues.apache.org/jira/browse/MESOS-6558
> Project: Mesos
>  Issue Type: Task
>Reporter: haosdent
>Assignee: haosdent
>






[jira] [Created] (MESOS-6593) Update protobuf for cgroups net_prio subsystem

2016-11-15 Thread haosdent (JIRA)
haosdent created MESOS-6593:
---

 Summary: Update protobuf for cgroups net_prio subsystem
 Key: MESOS-6593
 URL: https://issues.apache.org/jira/browse/MESOS-6593
 Project: Mesos
  Issue Type: Task
Reporter: haosdent








[jira] [Created] (MESOS-6591) Update protobuf for cgroups pids subsystem

2016-11-15 Thread haosdent (JIRA)
haosdent created MESOS-6591:
---

 Summary: Update protobuf for cgroups pids subsystem
 Key: MESOS-6591
 URL: https://issues.apache.org/jira/browse/MESOS-6591
 Project: Mesos
  Issue Type: Task
Reporter: haosdent








[jira] [Created] (MESOS-6592) Update protobuf for cgroups cpuset subsystem

2016-11-15 Thread haosdent (JIRA)
haosdent created MESOS-6592:
---

 Summary: Update protobuf for cgroups cpuset subsystem
 Key: MESOS-6592
 URL: https://issues.apache.org/jira/browse/MESOS-6592
 Project: Mesos
  Issue Type: Task
Reporter: haosdent








[jira] [Created] (MESOS-6590) Update protobuf for cgroups blkio subsystem

2016-11-15 Thread haosdent (JIRA)
haosdent created MESOS-6590:
---

 Summary: Update protobuf for cgroups blkio subsystem
 Key: MESOS-6590
 URL: https://issues.apache.org/jira/browse/MESOS-6590
 Project: Mesos
  Issue Type: Task
Reporter: haosdent








[jira] [Commented] (MESOS-6567) Actively Scan for CNI Configurations

2016-11-15 Thread Dan Osborne (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667560#comment-15667560
 ] 

Dan Osborne commented on MESOS-6567:


Ah, I didn't interpret "existing" correctly in your post. Makes sense.

Since a file scan already happens at container runtime, it hopefully 
shouldn't be too difficult to extend it to networks that don't already 
exist.

> Actively Scan for CNI Configurations
> 
>
> Key: MESOS-6567
> URL: https://issues.apache.org/jira/browse/MESOS-6567
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Dan Osborne
>
> Mesos-Agent currently loads the CNI configs into memory at startup. After 
> this point, new configurations that are added will remain unknown to the 
> Mesos Agent process until it is restarted.
> This ticket is to request that the Mesos Agent process scan the CNI config 
> directory each time it is networking a task, so that modifying, adding, and 
> removing networks will not require a slave reboot.
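
The requested behavior can be sketched as a directory scan performed on every lookup instead of once at startup. This Python illustration is not the agent's implementation: the function name is hypothetical, and the only assumption about the files is that each is a JSON CNI network configuration with a top-level `name` field.

```python
import json
import tempfile
from pathlib import Path


def scan_cni_configs(config_dir):
    """Map each network name to its config file path, reading the directory afresh."""
    networks = {}
    for path in sorted(Path(config_dir).iterdir()):
        if not path.is_file():
            continue
        try:
            config = json.loads(path.read_text())
        except (OSError, ValueError):
            continue  # Skip unreadable or malformed files.
        name = config.get("name") if isinstance(config, dict) else None
        if name:
            networks[name] = path
    return networks


# Demo: a network added after the first scan is picked up by the next one.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "net-a.conf").write_text(json.dumps({"name": "net-a", "type": "bridge"}))
    first = sorted(scan_cni_configs(tmp))
    (Path(tmp) / "net-b.conf").write_text(json.dumps({"name": "net-b", "type": "bridge"}))
    second = sorted(scan_cni_configs(tmp))

print(first, second)
```

Because nothing is cached between calls, removed or modified configurations are reflected the same way as added ones, at the cost of a directory read per task launch.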





[jira] [Comment Edited] (MESOS-6223) Allow agents to re-register post a host reboot

2016-11-15 Thread Megha (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655590#comment-15655590
 ] 

Megha edited comment on MESOS-6223 at 11/15/16 3:39 PM:


[~neilc]
Here, I am analyzing the impact of allowing an agent to recover after a reboot 
in the context of partition awareness. In my understanding, there is no new 
transition that does not already happen with partition awareness. Do you think 
there could be a risk involved in allowing recovery after reboots?

1. If there are no partition-aware frameworks on the agent: while rebooting, 
the agent could either be disconnected or fail the master health check 
timeout. The executors don't re-register, as they have exited because of the 
reboot. The agent re-registers and starts to send status updates for unacked 
updates. From the framework's point of view, the transition is simply 
TASK_STARTING/TASK_RUNNING -> TASK_LOST.

2. If there are tasks from partition-aware frameworks on the agent: 
a. The transition is the same as above if the agent is disconnected.
b. If the agent is marked unreachable while it was rebooting, then from the 
framework's point of view the tasks transition from TASK_UNREACHABLE -> 
TASK_GONE when the agent re-registers and sends status updates. Since 
unreachable agents are in the registry, the master will remember them across 
its failovers, so if the agent doesn't come back, frameworks will receive a 
TASK_UNREACHABLE update upon reconciliation unless the registry is purged.
c. If the agent is marked gone, then the master is going to send 
TASK_GONE_BY_OPERATOR, and if such an agent doesn't come back, future 
framework reconciliations will result in a TASK_UNKNOWN status update, since 
there is no gone registry and the agents won't be remembered across master 
failovers. If the agent eventually comes back, the task could 
transition from TASK_UNKNOWN back to TASK_GONE.
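
The cases above can be summarized as a lookup table. The Python sketch below only restates the analysis in this comment: the `agent_state` labels and the function name are hypothetical, and the gone case lists the states in the order the comment describes them (initial update, later reconciliation, agent returning).

```python
# (partition_aware, agent_state) -> task states the framework observes, in order.
# The agent_state labels are hypothetical; the task states come from the comment.
TRANSITIONS = {
    (False, "disconnected"): ["TASK_STARTING/TASK_RUNNING", "TASK_LOST"],
    (False, "unreachable"): ["TASK_STARTING/TASK_RUNNING", "TASK_LOST"],
    (True, "disconnected"): ["TASK_STARTING/TASK_RUNNING", "TASK_LOST"],
    (True, "unreachable"): ["TASK_UNREACHABLE", "TASK_GONE"],
    (True, "gone"): ["TASK_GONE_BY_OPERATOR", "TASK_UNKNOWN", "TASK_GONE"],
}


def expected_states(partition_aware, agent_state):
    """Return the task states a framework would see for the given scenario."""
    try:
        return TRANSITIONS[(partition_aware, agent_state)]
    except KeyError:
        raise ValueError(
            f"scenario not covered by the analysis: {partition_aware}, {agent_state}"
        ) from None
```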



was (Author: megha.sharma):
[~neilc]
Here, I am analyzing the impact of allowing an agent to recover after a reboot 
in the context of partition awareness. In my understanding, there is no new 
transition that does not already happen with partition awareness. Do you think 
there could be a risk involved in allowing recovery after reboots?

1. If there are no partition-aware frameworks on the agent: while rebooting, 
the agent could either be disconnected or fail the master health check 
timeout. The executors don't re-register, as they have exited because of the 
reboot. The agent re-registers and starts to send status updates for unacked 
updates. From the framework's point of view, the transition is simply 
TASK_STARTING/TASK_RUNNING -> TASK_LOST.

2. If there are tasks from partition-aware frameworks on the agent: 
a. The transition is the same as above if the agent is disconnected.
b. If the agent is marked unreachable while it was rebooting, then from the 
framework's point of view the tasks transition from TASK_UNREACHABLE -> 
TASK_GONE when the agent re-registers and sends status updates. Since 
unreachable agents are in the registry, the master will remember them across 
its failovers, so if the agent doesn't come back, frameworks will receive a 
TASK_UNREACHABLE update upon reconciliation unless the registry is purged.
c. If the agent is marked gone, then the master sends TASK_GONE, and if such 
an agent doesn't come back, future framework reconciliations will result in a 
TASK_UNKNOWN status update, since there is no gone registry and the agents 
won't be remembered across master failovers. If the agent eventually comes 
back, the task could transition from TASK_UNKNOWN back to TASK_GONE.


> Allow agents to re-register post a host reboot
> --
>
> Key: MESOS-6223
> URL: https://issues.apache.org/jira/browse/MESOS-6223
> Project: Mesos
>  Issue Type: Improvement
>  Components: slave
>Reporter: Megha
>Assignee: Megha
>
> An agent doesn't recover its state after a host reboot; it registers with the 
> master and gets a new SlaveID. With partition awareness, agents are now 
> allowed to re-register after they have been marked Unreachable. The executors 
> are terminated on the agent when it reboots anyway, so there is no harm in 
> letting the agent keep its SlaveID, re-register with the master, and reconcile 
> the lost executors. This is a prerequisite for supporting 
> persistent/restartable tasks in Mesos (MESOS-3545).





[jira] [Updated] (MESOS-5966) Add libprocess HTTP tests with SSL support

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5966:
-
Sprint: Mesosphere Sprint 40, Mesosphere Sprint 41, Mesosphere Sprint 42, 
Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, Mesosphere 
Sprint 47  (was: Mesosphere Sprint 40, Mesosphere Sprint 41, Mesosphere Sprint 
42, Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46)

> Add libprocess HTTP tests with SSL support
> --
>
> Key: MESOS-5966
> URL: https://issues.apache.org/jira/browse/MESOS-5966
> Project: Mesos
>  Issue Type: Task
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
>
> Libprocess contains SSL unit tests which test our SSL support using simple 
> sockets. We should add tests which also make use of libprocess's various HTTP 
> classes and helpers in a variety of SSL configurations.





[jira] [Updated] (MESOS-6395) HealthChecker sends updates to executor via libprocess messaging.

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6395:
-
Sprint: Mesosphere Sprint 46, Mesosphere Sprint 47  (was: Mesosphere Sprint 
46)

> HealthChecker sends updates to executor via libprocess messaging.
> -
>
> Key: MESOS-6395
> URL: https://issues.apache.org/jira/browse/MESOS-6395
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: health-check, mesosphere
>
> Currently {{HealthChecker}} sends status updates via libprocess messaging to 
> the executor's UPID. This seems unnecessary after refactoring health checker 
> into the library: a simple callback will do. Moreover, not requiring 
> executor's {{UPID}} will simplify creating a mocked {{HealthChecker}}.
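The callback approach described above could look roughly like the sketch below. It is an illustration only: CallbackHealthChecker, HealthUpdate and collectUpdates are hypothetical names, not the Mesos classes, and the real checker carries far more state. The point is that the owner hands in a function and no UPID is needed, which also makes a mock trivial to write.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a task health status update.
struct HealthUpdate {
  std::string taskId;
  bool healthy;
};

// Hypothetical checker that reports through a callback instead of
// sending a libprocess message to the executor's UPID.
class CallbackHealthChecker {
public:
  using Callback = std::function<void(const HealthUpdate&)>;

  CallbackHealthChecker(std::string taskId, Callback callback)
    : taskId_(std::move(taskId)), callback_(std::move(callback)) {}

  // Called once per completed check; forwards the result to the owner.
  void report(bool healthy) const {
    callback_(HealthUpdate{taskId_, healthy});
  }

private:
  std::string taskId_;
  Callback callback_;
};

// Example owner: collects updates in a vector, as a test or mock might.
std::vector<HealthUpdate> collectUpdates(const std::vector<bool>& results) {
  std::vector<HealthUpdate> seen;
  CallbackHealthChecker checker(
      "task-1",
      [&seen](const HealthUpdate& update) { seen.push_back(update); });
  for (bool healthy : results) {
    checker.report(healthy);
  }
  return seen;
}
```

Calling collectUpdates with a list of check results returns one update per result, in order, without any messaging layer involved.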



[jira] [Updated] (MESOS-6477) Build a standalone python client for connecting to our Mock HTTP Server that implements the new Debug APIs

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6477:
-
Sprint: Mesosphere Sprint 46, Mesosphere Sprint 47  (was: Mesosphere Sprint 
46)

> Build a standalone python client for connecting to our Mock HTTP Server that 
> implements the new Debug APIs
> --
>
> Key: MESOS-6477
> URL: https://issues.apache.org/jira/browse/MESOS-6477
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Steven Locke
>  Labels: debugging, mesosphere
>
> This client prototype should have a similar CLI to what we eventually want to 
> build into the Mesos or DC/OS CLI.
> {noformat}
> Streaming HTTP Client
> Usage:
>   client task exec [--tty] [--interactive]   [...]
>   client task attach [--tty] [--interactive] 
> Options:
>   --tty  Allocate a tty on the server before
>  attaching to the container.
>   --interactive  Connect the stdin of the client to
>  the stdin of the container.
> {noformat}



[jira] [Updated] (MESOS-6366) Design doc for executor authentication

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6366:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, 
Mesosphere Sprint 47  (was: Mesosphere Sprint 44, Mesosphere Sprint 45, 
Mesosphere Sprint 46)

> Design doc for executor authentication
> --
>
> Key: MESOS-6366
> URL: https://issues.apache.org/jira/browse/MESOS-6366
> Project: Mesos
>  Issue Type: Task
>  Components: slave
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
>




[jira] [Updated] (MESOS-6476) Build a Mock HTTP Server that implements the new Debugging API calls

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6476:
-
Sprint: Mesosphere Sprint 46, Mesosphere Sprint 47  (was: Mesosphere Sprint 
46)

> Build a Mock HTTP Server that implements the new Debugging API calls
> 
>
> Key: MESOS-6476
> URL: https://issues.apache.org/jira/browse/MESOS-6476
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Steven Locke
>  Labels: debugging, mesosphere
>
> The mock server should simply launch a process to run whatever command is 
> passed to it, rather than attempt to launch an actual nested container in 
> mesos. However, it should do everything necessary to deal with attaching a 
> {{pty}}  / redirecting {{stdin/stdout/stderr}} properly.
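As a rough illustration of "launch a process and redirect its output", the sketch below forks, points the child's stdout at a pipe, and execs the command. runAndCapture is a hypothetical helper, not part of the mock server; it covers only stdout and does not allocate a pty, which the real server would also have to handle.

```cpp
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs args[0] with the given arguments and returns what it wrote to stdout.
// Returns an empty string if the process could not be started.
std::string runAndCapture(const std::vector<std::string>& args) {
  if (args.empty()) {
    return "";
  }
  int fds[2];
  if (pipe(fds) != 0) {
    return "";
  }
  pid_t pid = fork();
  if (pid < 0) {
    close(fds[0]);
    close(fds[1]);
    return "";
  }
  if (pid == 0) {
    // Child: make the pipe's write end its stdout, then exec the command.
    dup2(fds[1], STDOUT_FILENO);
    close(fds[0]);
    close(fds[1]);
    std::vector<char*> argv;
    for (const std::string& arg : args) {
      argv.push_back(const_cast<char*>(arg.c_str()));
    }
    argv.push_back(nullptr);
    execvp(argv[0], argv.data());
    _exit(127);  // Only reached if exec failed.
  }
  // Parent: read everything the child writes until it closes the pipe.
  close(fds[1]);
  std::string output;
  char buffer[256];
  ssize_t n;
  while ((n = read(fds[0], buffer, sizeof(buffer))) > 0) {
    output.append(buffer, static_cast<size_t>(n));
  }
  close(fds[0]);
  int status = 0;
  waitpid(pid, &status, 0);
  return output;
}
```

The same dup2 pattern extends to stdin and stderr by creating one pipe per stream.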



[jira] [Updated] (MESOS-6335) Add user doc for task group tasks

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6335:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, 
Mesosphere Sprint 47  (was: Mesosphere Sprint 44, Mesosphere Sprint 45, 
Mesosphere Sprint 46)

> Add user doc for task group tasks
> -
>
> Key: MESOS-6335
> URL: https://issues.apache.org/jira/browse/MESOS-6335
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Vinod Kone
>Assignee: Gilbert Song
>
> Committed some basic documentation. So moving this to pods-improvements epic 
> and targeting this for 1.2.0. I would like this to track the more 
> comprehensive documentation.



[jira] [Updated] (MESOS-3753) Test the HTTP Scheduler library with SSL enabled

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-3753:
-
Sprint: Mesosphere Sprint 39, Mesosphere Sprint 40, Mesosphere Sprint 41, 
Mesosphere Sprint 42, Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere 
Sprint 46, Mesosphere Sprint 47  (was: Mesosphere Sprint 39, Mesosphere Sprint 
40, Mesosphere Sprint 41, Mesosphere Sprint 42, Mesosphere Sprint 44, 
Mesosphere Sprint 45, Mesosphere Sprint 46)

> Test the HTTP Scheduler library with SSL enabled
> 
>
> Key: MESOS-3753
> URL: https://issues.apache.org/jira/browse/MESOS-3753
> Project: Mesos
>  Issue Type: Story
>  Components: framework, HTTP API, test
>Reporter: Joseph Wu
>Assignee: Greg Mann
>  Labels: mesosphere, security
>
> Currently, the HTTP Scheduler library does not support SSL-enabled Mesos.  
> (You can manually test this by spinning up an SSL-enabled master and 
> attempting to run the event-call framework example against it.)
> We need to add tests that check the HTTP Scheduler library against 
> SSL-enabled Mesos:
> * with downgrade support,
> * with required framework/client-side certificates,
> * with/without verification of certificates (master-side),
> * with/without verification of certificates (framework-side),
> * with a custom certificate authority (CA)
> These options should be controlled by the same environment variables found on 
> the [SSL user doc|http://mesos.apache.org/documentation/latest/ssl/].
> Note: This issue will be broken down into smaller sub-issues as bugs/problems 
> are discovered.



[jira] [Updated] (MESOS-5900) Support Unix domain socket connections in libprocess

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5900:
-
Sprint: Mesosphere Sprint 46, Mesosphere Sprint 47  (was: Mesosphere Sprint 
46)

> Support Unix domain socket connections in libprocess
> 
>
> Key: MESOS-5900
> URL: https://issues.apache.org/jira/browse/MESOS-5900
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Neil Conway
>Assignee: Benjamin Hindman
>  Labels: mesosphere
>
> We should consider allowing two programs on the same host using libprocess to 
> communicate via Unix domain sockets rather than TCP. This has a few 
> advantages:
> * Security: remote hosts cannot connect to the Unix socket. Domain sockets 
> also offer additional support for 
> [authentication|https://docs.fedoraproject.org/en-US/Fedora_Security_Team/1/html/Defensive_Coding/sect-Defensive_Coding-Authentication-UNIX_Domain.html].
> * Performance: domain sockets are marginally faster than localhost TCP.
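For readers unfamiliar with domain sockets, the sketch below shows the plain POSIX calls involved. It is not libprocess code: roundTrip and makeAddress are hypothetical helpers, and both ends of the connection live in one process purely to keep the example short. The only difference from a TCP socket is the address family and the filesystem path used as the address.

```cpp
#include <cstring>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

// Fills a sockaddr_un for `path`; returns false if the path is too long.
static bool makeAddress(const std::string& path, sockaddr_un* addr) {
  if (path.size() >= sizeof(addr->sun_path)) {
    return false;
  }
  std::memset(addr, 0, sizeof(*addr));
  addr->sun_family = AF_UNIX;
  std::memcpy(addr->sun_path, path.c_str(), path.size() + 1);
  return true;
}

// Sends `message` over a Unix domain socket bound at `path` and returns
// what the accepting side received (empty string on any failure).
std::string roundTrip(const std::string& path, const std::string& message) {
  sockaddr_un addr;
  if (!makeAddress(path, &addr)) {
    return "";
  }
  unlink(path.c_str());  // Remove a stale socket file, if any.

  int server = socket(AF_UNIX, SOCK_STREAM, 0);
  int client = socket(AF_UNIX, SOCK_STREAM, 0);
  std::string received;
  if (server >= 0 && client >= 0 &&
      bind(server, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0 &&
      listen(server, 1) == 0 &&
      connect(client, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0) {
    int peer = accept(server, nullptr, nullptr);
    if (peer >= 0) {
      const ssize_t sent = write(client, message.data(), message.size());
      if (sent == static_cast<ssize_t>(message.size())) {
        char buffer[256];
        const ssize_t n = read(peer, buffer, sizeof(buffer));
        if (n > 0) {
          received.assign(buffer, static_cast<size_t>(n));
        }
      }
      close(peer);
    }
  }
  if (server >= 0) close(server);
  if (client >= 0) close(client);
  unlink(path.c_str());
  return received;
}
```

Because the address is a path, a remote host has no way to name it, which is the security property mentioned above.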



[jira] [Updated] (MESOS-6193) Make the docker/volume isolator nesting aware.

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6193:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, 
Mesosphere Sprint 47  (was: Mesosphere Sprint 44, Mesosphere Sprint 45, 
Mesosphere Sprint 46)

> Make the docker/volume isolator nesting aware.
> --
>
> Key: MESOS-6193
> URL: https://issues.apache.org/jira/browse/MESOS-6193
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Gilbert Song
>  Labels: isolator, mesosphere
>




[jira] [Updated] (MESOS-6466) Add support for streaming HTTP requests in Mesos

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6466:
-
Sprint: Mesosphere Sprint 46, Mesosphere Sprint 47  (was: Mesosphere Sprint 
46)

> Add support for streaming HTTP requests in Mesos
> 
>
> Key: MESOS-6466
> URL: https://issues.apache.org/jira/browse/MESOS-6466
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Anand Mazumdar
>  Labels: debugging, mesosphere
>
> We already have support for streaming HTTP responses in Mesos. We now also 
> need to add support for streaming HTTP requests.



[jira] [Updated] (MESOS-6291) Add unit tests for nested container case for filesystem/linux isolator.

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6291:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, 
Mesosphere Sprint 47  (was: Mesosphere Sprint 44, Mesosphere Sprint 45, 
Mesosphere Sprint 46)

> Add unit tests for nested container case for filesystem/linux isolator.
> ---
>
> Key: MESOS-6291
> URL: https://issues.apache.org/jira/browse/MESOS-6291
> Project: Mesos
>  Issue Type: Improvement
>  Components: isolation
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: isolator, mesosphere
>
> Parameterize the existing tests so that they all work for both top-level 
> containers and nested containers.



[jira] [Updated] (MESOS-6292) Add unit tests for nested container case for docker/runtime isolator.

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6292:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, 
Mesosphere Sprint 47  (was: Mesosphere Sprint 44, Mesosphere Sprint 45, 
Mesosphere Sprint 46)

> Add unit tests for nested container case for docker/runtime isolator.
> -
>
> Key: MESOS-6292
> URL: https://issues.apache.org/jira/browse/MESOS-6292
> Project: Mesos
>  Issue Type: Improvement
>  Components: isolation
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: isolator, mesosphere
>
> Launch nested containers with different container images specified.



[jira] [Updated] (MESOS-5597) Document Mesos "health check" feature.

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5597:
-
Sprint: Mesosphere Sprint 46, Mesosphere Sprint 47  (was: Mesosphere Sprint 
46)

> Document Mesos "health check" feature.
> --
>
> Key: MESOS-5597
> URL: https://issues.apache.org/jira/browse/MESOS-5597
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Neil Conway
>Assignee: Alexander Rukletsov
>  Labels: documentation, health-check, mesosphere
>
> We don't talk about this feature at all.



[jira] [Updated] (MESOS-5963) HealthChecker should not decide when to kill tasks and when to stop performing health checks.

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5963:
-
Sprint: Mesosphere Sprint 46, Mesosphere Sprint 47  (was: Mesosphere Sprint 
46)

> HealthChecker should not decide when to kill tasks and when to stop 
> performing health checks.
> -
>
> Key: MESOS-5963
> URL: https://issues.apache.org/jira/browse/MESOS-5963
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: health-check, mesosphere
>
> Currently, the {{HealthChecker}} library decides when a task should be killed 
> based on its health status. Moreover, it stops checking the task's health 
> after that. This seems unfortunate, because it's up to the executor and / or 
> framework to decide both when to kill tasks and when to health check them.
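One way to picture the separation argued for here: the checker only reports results, and a policy object owned by the executor decides what to do with them. The sketch below is an assumption-laden illustration; KillPolicy and its consecutive-failure threshold are hypothetical and are not taken from the Mesos code.

```cpp
// Hypothetical executor-side policy: the checker only reports results,
// and this class decides when a task should be killed.
class KillPolicy {
public:
  explicit KillPolicy(int maxConsecutiveFailures)
    : maxConsecutiveFailures_(maxConsecutiveFailures), failures_(0) {}

  // Feed one health check result. Returns true if the task should be
  // killed, i.e. the failure streak has reached the threshold.
  bool onHealthResult(bool healthy) {
    failures_ = healthy ? 0 : failures_ + 1;
    return failures_ >= maxConsecutiveFailures_;
  }

private:
  int maxConsecutiveFailures_;
  int failures_;
};
```

With this split, an executor that never wants to kill on health status simply ignores the return value, and the checker keeps running either way.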



[jira] [Updated] (MESOS-5856) Logrotate ContainerLogger module does not rotate logs when run as root with `--switch_user`.

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5856:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, 
Mesosphere Sprint 47  (was: Mesosphere Sprint 44, Mesosphere Sprint 45, 
Mesosphere Sprint 46)

> Logrotate ContainerLogger module does not rotate logs when run as root with 
> `--switch_user`.
> 
>
> Key: MESOS-5856
> URL: https://issues.apache.org/jira/browse/MESOS-5856
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27.0, 0.28.0, 1.0.0
>Reporter: Joseph Wu
>Assignee: Sivaram Kannan
>Priority: Critical
>  Labels: logger, mesosphere, newbie
>
> The logrotate ContainerLogger module runs as the agent's user.  In most 
> cases, this is {{root}}.
> When {{logrotate}} is run as root, there is an additional check the 
> configuration files must pass (because a root {{logrotate}} needs to be 
> secured against non-root modifications to the configuration):
> https://github.com/logrotate/logrotate/blob/fe80cb51a2571ca35b1a7c8ba0695db5a68feaba/config.c#L807-L815
> Log rotation will fail under the following scenario:
> 1) The agent is run with {{--switch_user}} (default: true)
> 2) A task is launched with a non-root user specified
> 3) The logrotate module spawns a few companion processes (as root) and this 
> creates the {{stdout}}, {{stderr}}, {{stdout.logrotate.conf}}, and 
> {{stderr.logrotate.conf}} files (as root).  This step races with the next 
> step.
> 4) The Mesos containerizer and Fetcher will {{chown}} the task's sandbox to 
> the non-root user, including the files just created.
> 5) When {{logrotate}} is run, it will skip any non-root configuration files.  
> This means the files are not rotated.
> 
> Fix: The logrotate module's companion processes should call {{setuid}} and 
> {{setgid}}.
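A minimal sketch of the proposed fix, assuming the companion process knows the target user and group IDs. dropPrivileges is a hypothetical helper, not the module's code, and it deliberately ignores supplementary groups, which a complete fix would probably also need to reset.

```cpp
#include <sys/types.h>
#include <unistd.h>

// Drops the calling process to the given group and user.
// The group must be changed first: once the user ID is no longer root,
// the process is typically not allowed to change its group ID.
// Returns true on success, false if either call fails.
bool dropPrivileges(uid_t uid, gid_t gid) {
  if (setgid(gid) != 0) {
    return false;
  }
  if (setuid(uid) != 0) {
    return false;
  }
  // Confirm that the change took effect.
  return getuid() == uid && getgid() == gid;
}
```

If the companion processes ran as the task's user from the start, the files in step 3 would be created with that user's ownership and {{logrotate}} would no longer run as root, so the root-only check would not apply.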



[jira] [Updated] (MESOS-6494) Clean up the flags parsing in the executors.

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6494:
-
Sprint: Mesosphere Sprint 46, Mesosphere Sprint 47  (was: Mesosphere Sprint 
46)

> Clean up the flags parsing in the executors.
> 
>
> Key: MESOS-6494
> URL: https://issues.apache.org/jira/browse/MESOS-6494
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>  Labels: mesosphere
>
> The current executors and the executor libraries use a mix of `stout::flags` 
> and `os::getenv` to parse flags, leading to a lot of unnecessary and 
> sometimes duplicated code.
> This should be cleaned up, using only {{stout::flags}} to parse flags.
> Environment variables should be used for the flags that are common to ALL the 
> executors (listed in the Executor HTTP API doc).
> Command line parameters should be used for flags that apply only to 
> individual executors.
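The split described above (environment variables for flags common to all executors, command line parameters for the rest) can be sketched without {{stout::flags}} as follows. Everything here is hypothetical: loadFlags, the flag names and the environment variable names are made up for illustration and do not reflect the stout API.

```cpp
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper. `envNames` maps a flag name to the environment
// variable that carries it (flags common to all executors). `args` holds
// "--name=value" or "--name" command line parameters (flags specific to
// one executor). Command line values override environment values.
std::map<std::string, std::string> loadFlags(
    const std::map<std::string, std::string>& envNames,
    const std::vector<std::string>& args) {
  std::map<std::string, std::string> flags;

  for (const auto& entry : envNames) {
    const char* value = std::getenv(entry.second.c_str());
    if (value != nullptr) {
      flags[entry.first] = value;
    }
  }

  for (const std::string& arg : args) {
    if (arg.compare(0, 2, "--") != 0) {
      continue;  // Not a flag; ignored in this sketch.
    }
    const std::string::size_type eq = arg.find('=');
    if (eq == std::string::npos) {
      flags[arg.substr(2)] = "true";  // Bare "--name" is treated as boolean.
    } else {
      flags[arg.substr(2, eq - 2)] = arg.substr(eq + 1);
    }
  }
  return flags;
}
```

Keeping both sources behind one parser is what removes the duplicated `os::getenv` handling the issue complains about.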



[jira] [Updated] (MESOS-6184) Health checks should use a general mechanism to enter namespaces of the task.

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6184:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 46, Mesosphere Sprint 47  
(was: Mesosphere Sprint 44, Mesosphere Sprint 46)

> Health checks should use a general mechanism to enter namespaces of the task.
> -
>
> Key: MESOS-6184
> URL: https://issues.apache.org/jira/browse/MESOS-6184
> Project: Mesos
>  Issue Type: Improvement
>Reporter: haosdent
>Assignee: haosdent
>Priority: Blocker
>  Labels: health-check, mesosphere
>
> To perform health checks for tasks, we need to enter the corresponding 
> namespaces of the container. For now, health checks use a custom clone to 
> implement this:
> {code}
>   return process::defaultClone([=]() -> int {
>     if (taskPid.isSome()) {
>       foreach (const string& ns, namespaces) {
>         Try<Nothing> setns = ns::setns(taskPid.get(), ns);
>         if (setns.isError()) {
>           ...
>         }
>       }
>     }
>     return func();
>   });
> {code}
> After the childHooks patches are merged, we can change the health check to 
> use childHooks to call {{setns}} and make {{process::defaultClone}} private 
> again.
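The general idea of a child hook can be sketched with plain fork(): the caller supplies a list of functions that run in the child before the main function. This is not the libprocess API; ChildHook and runWithHooks below are hypothetical names. In the health check case, a hook would be the place to enter the task's namespaces.

```cpp
#include <functional>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical hook type: runs in the child before the main function and
// returns false to abort the child.
using ChildHook = std::function<bool()>;

// Forks a child, runs each hook in order, then runs `func` in the child.
// Returns the child's exit status, or -1 if fork or wait fails.
int runWithHooks(const std::vector<ChildHook>& hooks,
                 const std::function<int()>& func) {
  pid_t pid = fork();
  if (pid < 0) {
    return -1;
  }
  if (pid == 0) {
    for (const ChildHook& hook : hooks) {
      if (!hook()) {
        _exit(127);  // A failed hook aborts the child.
      }
    }
    _exit(func());
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) {
    return -1;
  }
  return WEXITSTATUS(status);
}
```

Since hooks run after the fork, anything they change (namespaces, working directory, variables) affects only the child, so no custom clone function is needed for that purpose.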



[jira] [Updated] (MESOS-6348) Allow `network/cni` isolator unit-tests to run with CNI plugins

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6348:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, 
Mesosphere Sprint 47  (was: Mesosphere Sprint 44, Mesosphere Sprint 45, 
Mesosphere Sprint 46)

> Allow `network/cni` isolator unit-tests to run with CNI plugins 
> 
>
> Key: MESOS-6348
> URL: https://issues.apache.org/jira/browse/MESOS-6348
> Project: Mesos
>  Issue Type: Task
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> Currently, we don't have any infrastructure that allows CNI plugins to be 
> used in `network/cni` isolator unit tests. This forces us to mock CNI plugins 
> that don't use new network namespaces, which leads to a very restrictive form 
> of unit test. 
> Especially for the port-mapper plugin, it would be very useful to run the 
> containers in a separate network namespace in order to test its DNAT 
> functionality, and that requires an actual CNI plugin.
> The proposal is to introduce a test filter called CNIPLUGIN that gets set 
> when the CNI_PATH env var is set. Tests using the CNIPLUGIN filter can then 
> use actual CNI plugins.
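The filter decision could be as small as the sketch below. It assumes, purely for illustration, that filtered tests carry a "CNIPLUGIN_" marker in their name; shouldSkip is a hypothetical helper and not the Mesos test environment code.

```cpp
#include <cstdlib>
#include <string>

// Returns true if the named test should be skipped: it is marked as
// needing real CNI plugins, but CNI_PATH is unset or empty.
bool shouldSkip(const std::string& testName) {
  const bool marked = testName.find("CNIPLUGIN_") != std::string::npos;
  const char* cniPath = std::getenv("CNI_PATH");
  const bool havePlugins = cniPath != nullptr && cniPath[0] != '\0';
  return marked && !havePlugins;
}
```

Unmarked tests are never skipped by this check, so existing mocked tests keep running on machines without CNI plugins installed.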



[jira] [Created] (MESOS-6589) Document DockerInfo.Parameter usage in the docker containerizer document

2016-11-15 Thread haosdent (JIRA)
haosdent created MESOS-6589:
---

 Summary: Document DockerInfo.Parameter usage in the docker 
containerizer document 
 Key: MESOS-6589
 URL: https://issues.apache.org/jira/browse/MESOS-6589
 Project: Mesos
  Issue Type: Improvement
  Components: docker, documentation
Reporter: haosdent
Assignee: haosdent
Priority: Minor


Some users would like to pass extra parameters when launching Docker containers 
through Mesos. Short of reading the Mesos protobuf definitions, users have no 
way to learn how to do that, because the documentation does not cover it.



[jira] [Commented] (MESOS-6567) Actively Scan for CNI Configurations

2016-11-15 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15666477#comment-15666477
 ] 

Qian Zhang commented on MESOS-6567:
---

Yes, NetworkCniIsolatorProcess::create happens at agent startup, so picking up 
new CNI network configuration files requires restarting the agent. However, a 
modification to an existing CNI network configuration file can be picked up at 
runtime, because the file is read every time a new container is attached to a 
CNI network 
(https://github.com/apache/mesos/blob/1.0.1/src/slave/containerizer/mesos/isolators/network/cni/cni.cpp#L969).
So it is possible to launch a container on a CNI network, modify that network's 
configuration file, and then launch another container on the same network with 
the new configuration.

> Actively Scan for CNI Configurations
> 
>
> Key: MESOS-6567
> URL: https://issues.apache.org/jira/browse/MESOS-6567
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Dan Osborne
>
> The Mesos agent currently loads the CNI configs into memory at startup. After 
> this point, new configurations that are added will remain unknown to the 
> Mesos agent process until it is restarted.
> This ticket requests that the Mesos agent process scan the CNI config 
> directory each time it is networking a task, so that modifying, adding, and 
> removing networks will not require an agent restart.


