[jira] [Commented] (MESOS-6596) Dynamic reservation endpoint returns 409s
[ https://issues.apache.org/jira/browse/MESOS-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669164#comment-15669164 ]

Joseph Wu commented on MESOS-6596:
--

Can you include the actual body of the request you are making? Your request appears to be failing on a validation step, which should be deterministic. Keep in mind that the reserve call is not guaranteed to succeed, as a framework may be given those resources before you manage to complete the call.

Also, please include the version of Mesos. In the current codebase, it is not possible to get a 409 return status with that message. It is possible to get a 400 return status, or a 409 with a different message.

> Dynamic reservation endpoint returns 409s
> -
>
> Key: MESOS-6596
> URL: https://issues.apache.org/jira/browse/MESOS-6596
> Project: Mesos
> Issue Type: Bug
> Components: master
> Reporter: Kunal Thakar
>
> The operation to dynamically reserve a host for a framework fails most of
> the time, but occasionally succeeds.
> We are calling the /reserve endpoint on the master with the same payload and
> it mostly returns 409, with the occasional success. Pasting the output of two
> consecutive /reserve calls:
> {code}
> * About to connect() to computexxx-yyy port 5050 (#0)
> * Trying 10.184.21.3... connected
> * Server auth using Basic with user 'cassandra'
> > POST /master/reserve HTTP/1.1
> > Authorization: Basic blah
> > User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.2j
> > zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> > Host: computexxx-yyy:5050
> > Accept: */*
> > Content-Length: 1046
> > Content-Type: application/x-www-form-urlencoded
> > Expect: 100-continue
> >
> * Done waiting for 100-continue
> < HTTP/1.1 409 Conflict
> HTTP/1.1 409 Conflict
> < Date: Tue, 15 Nov 2016 23:07:10 GMT
> Date: Tue, 15 Nov 2016 23:07:10 GMT
> < Content-Type: text/plain; charset=utf-8
> Content-Type: text/plain; charset=utf-8
> < Content-Length: 58
> Content-Length: 58
> * HTTP error before end of send, stop sending
> <
> * Closing connection #0
> Invalid RESERVE Operation: does not contain mem(*):120621
> {code}
> {code}
> * About to connect() to computexxx-yyy port 5050 (#0)
> * Trying 10.184.21.3... connected
> * Server auth using Basic with user 'cassandra'
> > POST /master/reserve HTTP/1.1
> > Authorization: Basic blah
> > User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.2j
> > zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> > Host: computexxx-yyy:5050
> > Accept: */*
> > Content-Length: 1046
> > Content-Type: application/x-www-form-urlencoded
> > Expect: 100-continue
> >
> * Done waiting for 100-continue
> < HTTP/1.1 202 Accepted
> HTTP/1.1 202 Accepted
> < Date: Tue, 15 Nov 2016 23:07:16 GMT
> Date: Tue, 15 Nov 2016 23:07:16 GMT
> < Content-Length: 0
> Content-Length: 0
> {code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
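Since the comment above asks for the actual request body, here is a minimal sketch of how such a body could be assembled and how the status codes mentioned in this thread might be read. It assumes the form-encoded `slaveId` and `resources` fields described in the Mesos reservation documentation for the 1.x releases. The agent ID, role, principal, and the helper names `build_reserve_body` and `describe_status` are hypothetical, and the status descriptions paraphrase this thread rather than the master's source.

```python
import json
from urllib.parse import urlencode


def build_reserve_body(slave_id, role, principal, mem_mb):
    """Return a form-encoded body for POST /master/reserve.

    Field names are an assumption based on the Mesos 1.x reservation docs.
    """
    resources = [{
        "name": "mem",
        "type": "SCALAR",
        "scalar": {"value": mem_mb},
        "role": role,
        "reservation": {"principal": principal},
    }]
    return urlencode({"slaveId": slave_id, "resources": json.dumps(resources)})


def describe_status(code):
    """Map the status codes discussed in this thread to a short explanation."""
    return {
        202: "accepted: the master took the request",
        400: "bad request: the body failed validation",
        409: "conflict: the agent no longer has the requested unreserved resources",
    }.get(code, "unexpected status")


if __name__ == "__main__":
    # All values below are placeholders, not taken from the report.
    print(build_reserve_body("hypothetical-agent-id", "cassandra", "cassandra", 512))
    print(describe_status(409))
```

Logging the exact body alongside each response makes it easier to tell a deterministic validation failure from a race with a framework that was offered the same resources.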
[jira] [Commented] (MESOS-6196) Make the disk/xfs isolator nesting aware.
[ https://issues.apache.org/jira/browse/MESOS-6196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668947#comment-15668947 ]

James Peach commented on MESOS-6196:

FWIW I don't think that there's anything to do here as long as
* the XFS quota is applied to the enclosing task group directory
* all sub-containers have their scratch space within that directory
* the sub-container disk resource is a subset of the task group disk resource

One reason for adding nesting support would be to restrict the disk resource of individual sub-containers. Not sure whether that is expected to be part of the nesting semantics.

> Make the disk/xfs isolator nesting aware.
> -
>
> Key: MESOS-6196
> URL: https://issues.apache.org/jira/browse/MESOS-6196
> Project: Mesos
> Issue Type: Task
> Reporter: Jie Yu
>

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6575) Change `disk/xfs` isolator to terminate executor when it exceeds quota
[ https://issues.apache.org/jira/browse/MESOS-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668934#comment-15668934 ]

James Peach commented on MESOS-6575:

A significant benefit of the {{disk/xfs}} isolator is that it doesn't kill the task, so I'm not very supportive of this. I suppose that it could be implemented as an additional feature flag, but I'm not sure why you would want this. IMHO the behavior of the {{disk/du}} isolator is pretty undesirable.

> Change `disk/xfs` isolator to terminate executor when it exceeds quota
> --
>
> Key: MESOS-6575
> URL: https://issues.apache.org/jira/browse/MESOS-6575
> Project: Mesos
> Issue Type: Task
> Components: isolation, slave
> Reporter: Santhosh Kumar Shanmugham
>
> Unlike the {{disk/du}} isolator, which sends a {{ContainerLimitation}} protobuf
> when the executor exceeds the quota, the {{disk/xfs}} isolator relies on
> XFS's internal quota enforcement: it silently fails the {{write}} operation
> that would exceed the quota limit, without surfacing the quota breach
> information.
> This task is to change the `disk/xfs` isolator so that a
> {{ContainerLimitation}} message is triggered when the quota is exceeded.
> This feature will rely on the underlying filesystem being mounted with
> {{pqnoenforce}} (accounting-only mode), so that XFS does not silently return
> an {{EDQUOT}} error on writes that would cause the quota to be exceeded. The
> isolator can then track disk usage via {{xfs_quota}}, much as {{disk/du}}
> does using {{du}}, every {{container_disk_watch_interval}}, and surface the
> quota-exceeded event via a {{ContainerLimitation}} protobuf, causing the
> executor to be terminated. This feature can then be turned on/off via the
> existing {{enforce_container_disk_quota}} option.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
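The proposal in this ticket boils down to a periodic comparison of measured usage against the limit, gated by the enforcement flag. A minimal sketch of that decision follows. It is not the isolator's C++ code: the `Limitation` class and `check_disk_quota` function are hypothetical stand-ins, and the usage figure is passed in rather than read from {{xfs_quota}}.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Limitation:
    """Hypothetical stand-in for the ContainerLimitation message."""
    resource: str
    message: str


def check_disk_quota(used_bytes: int, limit_bytes: int, enforce: bool) -> Optional[Limitation]:
    """Return a Limitation when usage exceeds the limit and enforcement is on.

    In accounting-only mode the filesystem lets the write through, so the
    breach is only visible to a check like this one, run once per watch interval.
    """
    if not enforce or used_bytes <= limit_bytes:
        return None
    return Limitation(
        resource="disk",
        message=f"Disk usage ({used_bytes} B) exceeds quota ({limit_bytes} B)",
    )


if __name__ == "__main__":
    print(check_disk_quota(used_bytes=2048, limit_bytes=1024, enforce=True))
    print(check_disk_quota(used_bytes=2048, limit_bytes=1024, enforce=False))
```

With enforcement off, the same check could still feed usage statistics without terminating anything, which is the monitoring behavior the comment above prefers.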
[jira] [Assigned] (MESOS-5393) XFS disk isolator should disallow sandbox writes when no 'disk' is used in executor/task
[ https://issues.apache.org/jira/browse/MESOS-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Peach reassigned MESOS-5393: -- Assignee: James Peach > XFS disk isolator should disallow sandbox writes when no 'disk' is used in > executor/task > > > Key: MESOS-5393 > URL: https://issues.apache.org/jira/browse/MESOS-5393 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.0.0 >Reporter: Yan Xu >Assignee: James Peach > > This is similar to MESOS-5081 and was left as a TODO in the first patch for > the XFS isolator. > {noformat:title=} > // TODO(jpeach) If there's no disk resource attached, we should set the > // minimum quota (1 block), since a zero quota would be unconstrained. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
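The TODO quoted above can be illustrated with a small helper that never returns a zero limit. This is only a sketch: the function name `quota_blocks` is hypothetical, and the 512-byte block size is an assumption about how XFS counts quota limits, not something stated in the ticket.

```python
# Assumption: quota limits are expressed in 512-byte basic blocks.
BASIC_BLOCK_BYTES = 512


def quota_blocks(disk_bytes: int) -> int:
    """Convert a disk resource in bytes to a block limit, with a floor of 1.

    A limit of 0 would mean "no limit", so a task with no disk resource
    gets the smallest possible quota instead.
    """
    return max(1, disk_bytes // BASIC_BLOCK_BYTES)


if __name__ == "__main__":
    for size in (0, 100, 1024):
        print(size, "bytes ->", quota_blocks(size), "block(s)")
```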
[jira] [Assigned] (MESOS-5158) Provide XFS quota support for persistent volumes.
[ https://issues.apache.org/jira/browse/MESOS-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Peach reassigned MESOS-5158: -- Assignee: James Peach > Provide XFS quota support for persistent volumes. > - > > Key: MESOS-5158 > URL: https://issues.apache.org/jira/browse/MESOS-5158 > Project: Mesos > Issue Type: Improvement > Components: containerization >Reporter: Yan Xu >Assignee: James Peach > > Given that the lifecycle of persistent volumes is managed outside of the > isolator, we may need to further abstract out the quota management > functionality to do it outside the XFS isolator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-5116) Investigate supporting accounting only mode in XFS isolator
[ https://issues.apache.org/jira/browse/MESOS-5116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Peach reassigned MESOS-5116: -- Assignee: James Peach > Investigate supporting accounting only mode in XFS isolator > --- > > Key: MESOS-5116 > URL: https://issues.apache.org/jira/browse/MESOS-5116 > Project: Mesos > Issue Type: Improvement > Components: containerization >Reporter: Yan Xu >Assignee: James Peach > > The initial implementation of XFS isolator always enforces the disk quota > limit. In contrast, Posix disk isolator supports optionally monitoring the > disk usage without enforcement. This eases the transition into disk quota > enforcement mode. > Mesos agent provides a {{flags.enforce_container_disk_quota}} flag to turn on > enforcement when the Posix isolator is added. With XFS either we support it > as well or we need to change the flag so it's Posix disk isolator specific. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6596) Dynamic reservation endpoint returns 409s
[ https://issues.apache.org/jira/browse/MESOS-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kunal Thakar updated MESOS-6596:

Description:
The operation to dynamically reserve a host for a framework consistently fails, but succeeds sometimes.
We are calling the /reserve endpoint on the master with the same payload and it mostly returns 409, with the occasional success. Pasting the output of two consecutive /reserve calls:
{code}
* About to connect() to computexxx-yyy port 5050 (#0)
* Trying 10.184.21.3... connected
* Server auth using Basic with user 'cassandra'
> POST /master/reserve HTTP/1.1
> Authorization: Basic blah
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.2j
> zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: computexxx-yyy:5050
> Accept: */*
> Content-Length: 1046
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
* Done waiting for 100-continue
< HTTP/1.1 409 Conflict
HTTP/1.1 409 Conflict
< Date: Tue, 15 Nov 2016 23:07:10 GMT
Date: Tue, 15 Nov 2016 23:07:10 GMT
< Content-Type: text/plain; charset=utf-8
Content-Type: text/plain; charset=utf-8
< Content-Length: 58
Content-Length: 58
* HTTP error before end of send, stop sending
<
* Closing connection #0
Invalid RESERVE Operation: does not contain mem(*):120621
{code}
{code}
* About to connect() to computexxx-yyy port 5050 (#0)
* Trying 10.184.21.3... connected
* Server auth using Basic with user 'cassandra'
> POST /master/reserve HTTP/1.1
> Authorization: Basic blah
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.2j
> zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: computexxx-yyy:5050
> Accept: */*
> Content-Length: 1046
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
* Done waiting for 100-continue
< HTTP/1.1 202 Accepted
HTTP/1.1 202 Accepted
< Date: Tue, 15 Nov 2016 23:07:16 GMT
Date: Tue, 15 Nov 2016 23:07:16 GMT
< Content-Length: 0
Content-Length: 0
{code}

was: the same text, with both curl transcripts wrapped in triple-backtick fences instead of {code} markers.
[jira] [Created] (MESOS-6596) Dynamic reservation endpoint returns 409s
Kunal Thakar created MESOS-6596:
---

Summary: Dynamic reservation endpoint returns 409s
Key: MESOS-6596
URL: https://issues.apache.org/jira/browse/MESOS-6596
Project: Mesos
Issue Type: Bug
Components: master
Reporter: Kunal Thakar

The operation to dynamically reserve a host for a framework fails most of the time, but occasionally succeeds.
We are calling the /reserve endpoint on the master with the same payload and it mostly returns 409, with the occasional success. Pasting the output of two consecutive /reserve calls:
```
* About to connect() to computexxx-yyy port 5050 (#0)
* Trying 10.184.21.3... connected
* Server auth using Basic with user 'cassandra'
> POST /master/reserve HTTP/1.1
> Authorization: Basic blah
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.2j
> zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: computexxx-yyy:5050
> Accept: */*
> Content-Length: 1046
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
* Done waiting for 100-continue
< HTTP/1.1 409 Conflict
HTTP/1.1 409 Conflict
< Date: Tue, 15 Nov 2016 23:07:10 GMT
Date: Tue, 15 Nov 2016 23:07:10 GMT
< Content-Type: text/plain; charset=utf-8
Content-Type: text/plain; charset=utf-8
< Content-Length: 58
Content-Length: 58
* HTTP error before end of send, stop sending
<
* Closing connection #0
Invalid RESERVE Operation: does not contain mem(*):120621
```
```
* About to connect() to computexxx-yyy port 5050 (#0)
* Trying 10.184.21.3... connected
* Server auth using Basic with user 'cassandra'
> POST /master/reserve HTTP/1.1
> Authorization: Basic blah
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.2j
> zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: computexxx-yyy:5050
> Accept: */*
> Content-Length: 1046
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
* Done waiting for 100-continue
< HTTP/1.1 202 Accepted
HTTP/1.1 202 Accepted
< Date: Tue, 15 Nov 2016 23:07:16 GMT
Date: Tue, 15 Nov 2016 23:07:16 GMT
< Content-Length: 0
Content-Length: 0
```

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6595) As a Mesos user I want to launch processes that will run on every node in the cluster
[ https://issues.apache.org/jira/browse/MESOS-6595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668408#comment-15668408 ] James DeFelice commented on MESOS-6595: --- Marathon users have been asking for this feature for .. years: https://github.com/mesosphere/marathon/issues/846 > As a Mesos user I want to launch processes that will run on every node in the > cluster > - > > Key: MESOS-6595 > URL: https://issues.apache.org/jira/browse/MESOS-6595 > Project: Mesos > Issue Type: Story >Reporter: James DeFelice > Labels: mesosphere > > Some applicable use cases: > - log collection > - metrics and monitoring > - service discovery > It might also be useful to break this functionality down into: daemon > processes for master nodes vs. daemon processes for agent nodes. > There was some initial discussion and back-of-the-napkin design for this at > Mesoscon this past year (with an emphasis on agent nodes) but I'm not aware > that anything significant materialized from that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6595) As a Mesos user I want to launch processes that will run on every node in the cluster
James DeFelice created MESOS-6595: - Summary: As a Mesos user I want to launch processes that will run on every node in the cluster Key: MESOS-6595 URL: https://issues.apache.org/jira/browse/MESOS-6595 Project: Mesos Issue Type: Story Reporter: James DeFelice Some applicable use cases: - log collection - metrics and monitoring - service discovery It might also be useful to break this functionality down into: daemon processes for master nodes vs. daemon processes for agent nodes. There was some initial discussion and back-of-the-napkin design for this at Mesoscon this past year (with an emphasis on agent nodes) but I'm not aware that anything significant materialized from that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6594) Add `Containerizer::attach()` API call
Vinod Kone created MESOS-6594: - Summary: Add `Containerizer::attach()` API call Key: MESOS-6594 URL: https://issues.apache.org/jira/browse/MESOS-6594 Project: Mesos Issue Type: Task Reporter: Vinod Kone Assignee: Vinod Kone This ticket just tracks the API change but not the actual implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6546) Update the Containerizer to handle attachInput and attachOutput calls.
[ https://issues.apache.org/jira/browse/MESOS-6546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-6546: -- Summary: Update the Containerizer to handle attachInput and attachOutput calls. (was: Update the Containerizer API to include attachInput and attachOutput calls.) > Update the Containerizer to handle attachInput and attachOutput calls. > -- > > Key: MESOS-6546 > URL: https://issues.apache.org/jira/browse/MESOS-6546 > Project: Mesos > Issue Type: Task >Reporter: Kevin Klues >Assignee: Kevin Klues > Labels: debugging, mesosphere > > With the per-container I/O switchboard we are adding, the containerizer > should be responsible for both launching the I/O switchboard process, as well > as allowing external components to interface with it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6162) Add support for cgroups blkio subsystem
[ https://issues.apache.org/jira/browse/MESOS-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-6162: Assignee: Jason Lai (was: Zhitao Li) > Add support for cgroups blkio subsystem > --- > > Key: MESOS-6162 > URL: https://issues.apache.org/jira/browse/MESOS-6162 > Project: Mesos > Issue Type: Task >Reporter: haosdent >Assignee: Jason Lai > > Noted that cgroups blkio subsystem may have performance issue, refer to > https://github.com/opencontainers/runc/issues/861 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6590) Update protobuf for cgroups blkio subsystem
[ https://issues.apache.org/jira/browse/MESOS-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-6590: Assignee: Jason Lai > Update protobuf for cgroups blkio subsystem > --- > > Key: MESOS-6590 > URL: https://issues.apache.org/jira/browse/MESOS-6590 > Project: Mesos > Issue Type: Task >Reporter: haosdent >Assignee: Jason Lai > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-6588) LinuxRootfs misses required files
[ https://issues.apache.org/jira/browse/MESOS-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667754#comment-15667754 ] James Peach edited comment on MESOS-6588 at 11/15/16 5:37 PM: -- |Move containerizer Rootfs support to a cpp file. |[https://reviews.apache.org/r/53790|https://reviews.apache.org/r/53790] | |Use the stout ELF parser to collect Linux rootfs files. |[https://reviews.apache.org/r/53791|https://reviews.apache.org/r/53791] | was (Author: jamespeach): |Move containerizer Rootfs support to a cpp file. |https://reviews.apache.org/r/53790 | |Use the stout ELF parser to collect Linux rootfs files. |https://reviews.apache.org/r/53791 | > LinuxRootfs misses required files > - > > Key: MESOS-6588 > URL: https://issues.apache.org/jira/browse/MESOS-6588 > Project: Mesos > Issue Type: Bug > Components: containerization, tests >Reporter: James Peach >Assignee: James Peach > > The hard-coded list of required files in > {{src/tests/containerizer/rootfs.hpp}} is out of date for Fedora 24. F24 now > requires {{libtinfo.so.6}} and {{/lib64/libcrypto.so.10}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-6558) Added stub classes for rest cgroups subsystems.
[ https://issues.apache.org/jira/browse/MESOS-6558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent reassigned MESOS-6558: --- Assignee: haosdent > Added stub classes for rest cgroups subsystems. > --- > > Key: MESOS-6558 > URL: https://issues.apache.org/jira/browse/MESOS-6558 > Project: Mesos > Issue Type: Task >Reporter: haosdent >Assignee: haosdent > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6593) Update protobuf for cgroups net_prio subsystem
haosdent created MESOS-6593: --- Summary: Update protobuf for cgroups net_prio subsystem Key: MESOS-6593 URL: https://issues.apache.org/jira/browse/MESOS-6593 Project: Mesos Issue Type: Task Reporter: haosdent -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6591) Update protobuf for cgroups pids subsystem
haosdent created MESOS-6591: --- Summary: Update protobuf for cgroups pids subsystem Key: MESOS-6591 URL: https://issues.apache.org/jira/browse/MESOS-6591 Project: Mesos Issue Type: Task Reporter: haosdent -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6592) Update protobuf for cgroups cpuset subsystem
haosdent created MESOS-6592: --- Summary: Update protobuf for cgroups cpuset subsystem Key: MESOS-6592 URL: https://issues.apache.org/jira/browse/MESOS-6592 Project: Mesos Issue Type: Task Reporter: haosdent -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6590) Update protobuf for cgroups blkio subsystem
haosdent created MESOS-6590: --- Summary: Update protobuf for cgroups blkio subsystem Key: MESOS-6590 URL: https://issues.apache.org/jira/browse/MESOS-6590 Project: Mesos Issue Type: Task Reporter: haosdent -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6567) Actively Scan for CNI Configurations
[ https://issues.apache.org/jira/browse/MESOS-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667560#comment-15667560 ]

Dan Osborne commented on MESOS-6567:

Ah I didn't interpret "existing" correctly in your post. Makes sense. So since there already is a filescan happening at container runtime, it hopefully shouldn't be too difficult to expand it to networks that don't already exist too.

> Actively Scan for CNI Configurations
>
>
> Key: MESOS-6567
> URL: https://issues.apache.org/jira/browse/MESOS-6567
> Project: Mesos
> Issue Type: Improvement
> Reporter: Dan Osborne
>
> Mesos-Agent currently loads the CNI configs into memory at startup. After
> this point, new configurations that are added will remain unknown to the
> Mesos Agent process until it is restarted.
> This ticket is to request that the Mesos Agent process scan the CNI config
> directory each time it is networking a task, so that modifying, adding, and
> removing networks will not require a slave reboot.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
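The request here amounts to re-reading the config directory on every use instead of caching it at startup. A rough sketch of such a scan is shown below. It assumes each CNI network config is a JSON file with a top-level `name` field, as in the CNI spec; the function name `scan_cni_configs` is hypothetical and this is not how the agent's C++ isolator is written.

```python
import json
import os


def scan_cni_configs(config_dir):
    """Return {network name: parsed config} for the files currently in config_dir.

    Calling this each time a task is networked picks up added, modified,
    and removed configs without restarting the process.
    """
    networks = {}
    for entry in sorted(os.listdir(config_dir)):
        path = os.path.join(config_dir, entry)
        if not os.path.isfile(path):
            continue
        try:
            with open(path) as f:
                config = json.load(f)
        except (OSError, ValueError):
            continue  # skip unreadable or malformed files
        if isinstance(config, dict) and config.get("name"):
            networks[config["name"]] = config
    return networks


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "net1.conf"), "w") as f:
            json.dump({"name": "net1", "type": "bridge"}, f)
        print(sorted(scan_cni_configs(d)))
```

The trade-off is one directory listing and a few small file reads per container launch, in exchange for not needing a restart when networks change.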
[jira] [Comment Edited] (MESOS-6223) Allow agents to re-register post a host reboot
[ https://issues.apache.org/jira/browse/MESOS-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655590#comment-15655590 ]

Megha edited comment on MESOS-6223 at 11/15/16 3:39 PM:

[~neilc] Here, I am analyzing the impact of allowing an agent to recover post reboot in the context of partition awareness. In my understanding there is no new transition which is not already happening with partition awareness. Do you think there could be a risk involved in allowing recovery post reboot?

1. If there are no partition-aware frameworks on the agent: While rebooting, the agent could either be disconnected or fail the master health check timeout. The executors don't re-register, as they have exited because of the reboot. The agent re-registers and starts to send status updates for unacked updates. From the framework's point of view the transition is simply TASK_STARTING/TASK_RUNNING -> TASK_LOST.
2. If there are tasks from partition-aware frameworks on the agent:
   a. The transition is the same as above if the agent is disconnected.
   b. If the agent is marked unreachable while it was rebooting, then from the framework's point of view the tasks transition from TASK_UNREACHABLE -> TASK_GONE when the agent re-registers and sends status updates. Since unreachable agents are stored in the registry, the master remembers them across its failovers; if the agent doesn't come back, frameworks will receive a TASK_UNREACHABLE update upon reconciliation unless the registry is purged.
   c. If the agent is marked gone, then the master is going to send TASK_GONE_BY_OPERATOR. If such an agent doesn't come back, future framework reconciliations will result in a TASK_UNKNOWN status update: there is no gone registry, so these agents won't be remembered across master failovers. If the agent eventually comes back, the task could transition from TASK_UNKNOWN back to TASK_GONE.

was (Author: megha.sharma):
[~neilc] Here, I am analyzing the impact of allowing an agent to recover post reboot in the context of partition awareness. In my understanding there is no new transition which is not already happening with partition awareness. Do you think there could be a risk involved in allowing recovery post reboot?

1. If there are no partition-aware frameworks on the agent: While rebooting, the agent could either be disconnected or fail the master health check timeout. The executors don't re-register, as they have exited because of the reboot. The agent re-registers and starts to send status updates for unacked updates. From the framework's point of view the transition is simply TASK_STARTING/TASK_RUNNING -> TASK_LOST.
2. If there are tasks from partition-aware frameworks on the agent:
   a. The transition is the same as above if the agent is disconnected.
   b. If the agent is marked unreachable while it was rebooting, then from the framework's point of view the tasks transition from TASK_UNREACHABLE -> TASK_GONE when the agent re-registers and sends status updates. Since unreachable agents are stored in the registry, the master remembers them across its failovers; if the agent doesn't come back, frameworks will receive a TASK_UNREACHABLE update upon reconciliation unless the registry is purged.
   c. If the agent is marked gone, then the master sends TASK_GONE. If such an agent doesn't come back, future framework reconciliations will result in a TASK_UNKNOWN status update: there is no gone registry, so these agents won't be remembered across master failovers. If the agent eventually comes back, the task could transition from TASK_UNKNOWN back to TASK_GONE.
> Allow agents to re-register post a host reboot
> --
>
> Key: MESOS-6223
> URL: https://issues.apache.org/jira/browse/MESOS-6223
> Project: Mesos
> Issue Type: Improvement
> Components: slave
> Reporter: Megha
> Assignee: Megha
>
> The agent doesn't recover its state after a host reboot; instead it registers
> with the master and gets a new SlaveID. With partition awareness, agents are now
> allowed to re-register after they have been marked Unreachable. The executors
> are terminated on the agent when it reboots anyway, so there is no harm in
> letting the agent keep its SlaveID, re-register with the master and reconcile
> the lost executors. This is a pre-requisite for supporting
> persistent/restartable tasks in mesos (MESOS-3545).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
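The cases enumerated in the edited comment can be restated as a small lookup table, which may make it easier to check that no new transition is introduced. This is only a restatement of that comment: the table keys and the helper name `expected_states` are hypothetical, and the states listed for the "gone" case follow the edited version of the comment.

```python
# Task states a framework would observe, in order, per the edited comment.
# Keys are (framework is partition-aware, how the master saw the agent).
EXPECTED_STATES = {
    (False, "disconnected"): ["TASK_STARTING/TASK_RUNNING", "TASK_LOST"],
    (True, "disconnected"): ["TASK_STARTING/TASK_RUNNING", "TASK_LOST"],
    (True, "unreachable"): ["TASK_UNREACHABLE", "TASK_GONE"],
    # TASK_UNKNOWN appears only if reconciliation happens after a master
    # failover, since gone agents are not kept in the registry.
    (True, "gone"): ["TASK_GONE_BY_OPERATOR", "TASK_UNKNOWN", "TASK_GONE"],
}


def expected_states(partition_aware, agent_status):
    """Look up the state sequence for a case, or None if the comment does not cover it."""
    return EXPECTED_STATES.get((partition_aware, agent_status))


if __name__ == "__main__":
    for key, states in EXPECTED_STATES.items():
        print(key, " -> ".join(states))
```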
[jira] [Updated] (MESOS-5966) Add libprocess HTTP tests with SSL support
[ https://issues.apache.org/jira/browse/MESOS-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-5966: - Sprint: Mesosphere Sprint 40, Mesosphere Sprint 41, Mesosphere Sprint 42, Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, Mesosphere Sprint 47 (was: Mesosphere Sprint 40, Mesosphere Sprint 41, Mesosphere Sprint 42, Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46) > Add libprocess HTTP tests with SSL support > -- > > Key: MESOS-5966 > URL: https://issues.apache.org/jira/browse/MESOS-5966 > Project: Mesos > Issue Type: Task >Reporter: Greg Mann >Assignee: Greg Mann > Labels: mesosphere > > Libprocess contains SSL unit tests which test our SSL support using simple > sockets. We should add tests which also make use of libprocess's various HTTP > classes and helpers in a variety of SSL configurations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6395) HealthChecker sends updates to executor via libprocess messaging.
[ https://issues.apache.org/jira/browse/MESOS-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-6395: - Sprint: Mesosphere Sprint 46, Mesosphere Sprint 47 (was: Mesosphere Sprint 46) > HealthChecker sends updates to executor via libprocess messaging. > - > > Key: MESOS-6395 > URL: https://issues.apache.org/jira/browse/MESOS-6395 > Project: Mesos > Issue Type: Improvement >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov > Labels: health-check, mesosphere > > Currently {{HealthChecker}} sends status updates via libprocess messaging to > the executor's UPID. This seems unnecessary after refactoring health checker > into the library: a simple callback will do. Moreover, not requiring > executor's {{UPID}} will simplify creating a mocked {{HealthChecker}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
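To illustrate why a plain callback simplifies mocking, here is a small sketch of the idea. It is written in Python for brevity and does not mirror the actual C++ {{HealthChecker}} interface; the class shape, `run_once`, and the `check` and `on_update` parameters are all hypothetical.

```python
class HealthChecker:
    """Runs a check and reports changes in health through a callback.

    The checker needs no address of the executor; whoever constructs it
    decides what the callback does with each update.
    """

    def __init__(self, check, on_update):
        self._check = check          # callable returning True when healthy
        self._on_update = on_update  # callable receiving the new health state
        self._last = None

    def run_once(self):
        healthy = bool(self._check())
        if healthy != self._last:
            self._last = healthy
            self._on_update(healthy)


if __name__ == "__main__":
    updates = []
    results = iter([True, True, False])
    checker = HealthChecker(check=lambda: next(results), on_update=updates.append)
    for _ in range(3):
        checker.run_once()
    print(updates)
```

In a test, the callback can simply append to a list, so no messaging endpoint has to be set up.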
[jira] [Updated] (MESOS-6477) Build a standalone python client for connecting to our Mock HTTP Server that implements the new Debug APIs
[ https://issues.apache.org/jira/browse/MESOS-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Harutyunyan updated MESOS-6477:
-

Sprint: Mesosphere Sprint 46, Mesosphere Sprint 47 (was: Mesosphere Sprint 46)

> Build a standalone python client for connecting to our Mock HTTP Server that
> implements the new Debug APIs
> --
>
> Key: MESOS-6477
> URL: https://issues.apache.org/jira/browse/MESOS-6477
> Project: Mesos
> Issue Type: Task
> Reporter: Kevin Klues
> Assignee: Steven Locke
> Labels: debugging, mesosphere
>
> This client prototype should have a similar CLI to what we eventually want to
> build into the Mesos or DC/OS CLI.
> {noformat}
> Streaming HTTP Client
> Usage:
>   client task exec [--tty] [--interactive] [...]
>   client task attach [--tty] [--interactive]
> Options:
>   --tty          Allocate a tty on the server before
>                  attaching to the container.
>   --interactive  Connect the stdin of the client to
>                  the stdin of the container.
> {noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
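The usage text in this ticket maps naturally onto nested subcommands. The following is a minimal `argparse` sketch of that shape, not the prototype itself. The positional arguments `container_id` and `command` are hypothetical: the usage text as archived elides the positional arguments, so their names and order here are assumptions for illustration only.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="client", description="Streaming HTTP Client")
    groups = parser.add_subparsers(dest="group", required=True)
    task = groups.add_parser("task")
    actions = task.add_subparsers(dest="action", required=True)

    for name in ("exec", "attach"):
        action = actions.add_parser(name)
        action.add_argument(
            "--tty", action="store_true",
            help="Allocate a tty on the server before attaching to the container.")
        action.add_argument(
            "--interactive", action="store_true",
            help="Connect the stdin of the client to the stdin of the container.")
        # Hypothetical positional: which container to target.
        action.add_argument("container_id")
        if name == "exec":
            # Hypothetical positional: the command and its arguments.
            action.add_argument("command", nargs=argparse.REMAINDER)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["task", "exec", "--tty", "c1", "ls", "/tmp"])
    print(args)
```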
[jira] [Updated] (MESOS-6366) Design doc for executor authentication
[ https://issues.apache.org/jira/browse/MESOS-6366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-6366: - Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, Mesosphere Sprint 47 (was: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46) > Design doc for executor authentication > -- > > Key: MESOS-6366 > URL: https://issues.apache.org/jira/browse/MESOS-6366 > Project: Mesos > Issue Type: Task > Components: slave >Reporter: Greg Mann >Assignee: Greg Mann > Labels: mesosphere > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6476) Build a Mock HTTP Server that implements the new Debugging API calls
[ https://issues.apache.org/jira/browse/MESOS-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-6476: - Sprint: Mesosphere Sprint 46, Mesosphere Sprint 47 (was: Mesosphere Sprint 46) > Build a Mock HTTP Server that implements the new Debugging API calls > > > Key: MESOS-6476 > URL: https://issues.apache.org/jira/browse/MESOS-6476 > Project: Mesos > Issue Type: Task >Reporter: Kevin Klues >Assignee: Steven Locke > Labels: debugging, mesosphere > > The mock server should simply launch a process to run whatever command is > passed to it, rather than attempt to launch an actual nested container in > mesos. However, it should do everything necessary to deal with attaching a > {{pty}} / redirecting {{stdin/stdout/stderr}} properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
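To illustrate the "launch a process and attach a pty" part of this ticket, here is a minimal sketch using Python's standard `pty` and `subprocess` modules. It is not the mock server: there is no HTTP layer, and the function name `run_with_pty` is an assumption. It only shows a child process whose `stdin/stdout/stderr` are redirected to the slave side of a pseudo-terminal while the parent reads from the master side.

```python
import os
import pty
import subprocess
import sys


def run_with_pty(argv):
    """Run argv with stdin/stdout/stderr attached to a new pty.

    Returns (exit_code, output_bytes) once the child has exited.
    """
    master, slave = pty.openpty()
    proc = subprocess.Popen(argv, stdin=slave, stdout=slave, stderr=slave)
    os.close(slave)  # The parent only needs the master side.

    chunks = []
    while True:
        try:
            data = os.read(master, 1024)
        except OSError:
            # Linux reports EIO on the master once the slave side is closed.
            break
        if not data:
            # Other platforms signal the same condition with an empty read.
            break
        chunks.append(data)
    os.close(master)
    return proc.wait(), b"".join(chunks)


if __name__ == "__main__":
    code, output = run_with_pty(
        [sys.executable, "-c", "import sys; print(sys.stdout.isatty())"])
    print(code, output)
```

A server built along these lines would forward bytes between the HTTP stream and the master file descriptor instead of collecting them in a list.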
[jira] [Updated] (MESOS-6335) Add user doc for task group tasks
[ https://issues.apache.org/jira/browse/MESOS-6335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-6335: - Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, Mesosphere Sprint 47 (was: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46) > Add user doc for task group tasks > - > > Key: MESOS-6335 > URL: https://issues.apache.org/jira/browse/MESOS-6335 > Project: Mesos > Issue Type: Documentation >Reporter: Vinod Kone >Assignee: Gilbert Song > > Committed some basic documentation. So moving this to pods-improvements epic > and targeting this for 1.2.0. I would like this to track the more > comprehensive documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3753) Test the HTTP Scheduler library with SSL enabled
[ https://issues.apache.org/jira/browse/MESOS-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-3753: - Sprint: Mesosphere Sprint 39, Mesosphere Sprint 40, Mesosphere Sprint 41, Mesosphere Sprint 42, Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, Mesosphere Sprint 47 (was: Mesosphere Sprint 39, Mesosphere Sprint 40, Mesosphere Sprint 41, Mesosphere Sprint 42, Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46) > Test the HTTP Scheduler library with SSL enabled > > > Key: MESOS-3753 > URL: https://issues.apache.org/jira/browse/MESOS-3753 > Project: Mesos > Issue Type: Story > Components: framework, HTTP API, test >Reporter: Joseph Wu >Assignee: Greg Mann > Labels: mesosphere, security > > Currently, the HTTP Scheduler library does not support SSL-enabled Mesos. > (You can manually test this by spinning up an SSL-enabled master and attempting > to run the event-call framework example against it.) > We need to add tests that check the HTTP Scheduler library against > SSL-enabled Mesos: > * with downgrade support, > * with required framework/client-side certificates, > * with/without verification of certificates (master-side), > * with/without verification of certificates (framework-side), > * with a custom certificate authority (CA) > These options should be controlled by the same environment variables found on > the [SSL user doc|http://mesos.apache.org/documentation/latest/ssl/]. > Note: This issue will be broken down into smaller sub-issues as bugs/problems > are discovered. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5900) Support Unix domain socket connections in libprocess
[ https://issues.apache.org/jira/browse/MESOS-5900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-5900: - Sprint: Mesosphere Sprint 46, Mesosphere Sprint 47 (was: Mesosphere Sprint 46) > Support Unix domain socket connections in libprocess > > > Key: MESOS-5900 > URL: https://issues.apache.org/jira/browse/MESOS-5900 > Project: Mesos > Issue Type: Improvement > Components: libprocess >Reporter: Neil Conway >Assignee: Benjamin Hindman > Labels: mesosphere > > We should consider allowing two programs on the same host using libprocess to > communicate via Unix domain sockets rather than TCP. This has a few > advantages: > * Security: remote hosts cannot connect to the Unix socket. Domain sockets > also offer additional support for > [authentication|https://docs.fedoraproject.org/en-US/Fedora_Security_Team/1/html/Defensive_Coding/sect-Defensive_Coding-Authentication-UNIX_Domain.html]. > * Performance: domain sockets are marginally faster than localhost TCP. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
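For readers unfamiliar with the transport being proposed, the following sketch shows two endpoints on the same host talking over a Unix domain socket. It uses Python's standard `socket` module rather than libprocess, and the function name and socket file name are assumptions made for illustration. The socket is addressed by a filesystem path instead of an IP address and port, which is why a remote host has nothing to connect to.

```python
import os
import socket
import tempfile


def unix_socket_round_trip(message):
    """Send message over a Unix domain socket and return the upper-cased reply."""
    with tempfile.TemporaryDirectory() as directory:
        # The address is a path on the local filesystem, not host:port.
        path = os.path.join(directory, "demo.sock")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
            server.bind(path)
            server.listen(1)
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
                # connect() completes against the listen backlog, so both
                # ends can be driven from one thread in this small example.
                client.connect(path)
                connection, _ = server.accept()
                with connection:
                    client.sendall(message)
                    received = connection.recv(len(message))
                    connection.sendall(received.upper())
                    return client.recv(len(message))


if __name__ == "__main__":
    print(unix_socket_round_trip(b"ping"))
```

Access to the socket can additionally be restricted with ordinary file permissions on the path, which is one way the security advantage mentioned in the ticket shows up in practice.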
[jira] [Updated] (MESOS-6193) Make the docker/volume isolator nesting aware.
[ https://issues.apache.org/jira/browse/MESOS-6193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-6193: - Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, Mesosphere Sprint 47 (was: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46) > Make the docker/volume isolator nesting aware. > -- > > Key: MESOS-6193 > URL: https://issues.apache.org/jira/browse/MESOS-6193 > Project: Mesos > Issue Type: Task >Reporter: Jie Yu >Assignee: Gilbert Song > Labels: isolator, mesosphere > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6466) Add support for streaming HTTP requests in Mesos
[ https://issues.apache.org/jira/browse/MESOS-6466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-6466: - Sprint: Mesosphere Sprint 46, Mesosphere Sprint 47 (was: Mesosphere Sprint 46) > Add support for streaming HTTP requests in Mesos > > > Key: MESOS-6466 > URL: https://issues.apache.org/jira/browse/MESOS-6466 > Project: Mesos > Issue Type: Task >Reporter: Kevin Klues >Assignee: Anand Mazumdar > Labels: debugging, mesosphere > > We already have support for streaming HTTP responses in Mesos. We now also > need to add support for streaming HTTP requests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6291) Add unit tests for nested container case for filesystem/linux isolator.
[ https://issues.apache.org/jira/browse/MESOS-6291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-6291: - Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, Mesosphere Sprint 47 (was: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46) > Add unit tests for nested container case for filesystem/linux isolator. > --- > > Key: MESOS-6291 > URL: https://issues.apache.org/jira/browse/MESOS-6291 > Project: Mesos > Issue Type: Improvement > Components: isolation >Reporter: Gilbert Song >Assignee: Gilbert Song > Labels: isolator, mesosphere > > Parameterize the existing tests so that all works for both top level > container and nested container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6292) Add unit tests for nested container case for docker/runtime isolator.
[ https://issues.apache.org/jira/browse/MESOS-6292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-6292: - Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, Mesosphere Sprint 47 (was: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46) > Add unit tests for nested container case for docker/runtime isolator. > - > > Key: MESOS-6292 > URL: https://issues.apache.org/jira/browse/MESOS-6292 > Project: Mesos > Issue Type: Improvement > Components: isolation >Reporter: Gilbert Song >Assignee: Gilbert Song > Labels: isolator, mesosphere > > Launch nested containers with different container images specified. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5597) Document Mesos "health check" feature.
[ https://issues.apache.org/jira/browse/MESOS-5597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-5597: - Sprint: Mesosphere Sprint 46, Mesosphere Sprint 47 (was: Mesosphere Sprint 46) > Document Mesos "health check" feature. > -- > > Key: MESOS-5597 > URL: https://issues.apache.org/jira/browse/MESOS-5597 > Project: Mesos > Issue Type: Documentation > Components: documentation >Reporter: Neil Conway >Assignee: Alexander Rukletsov > Labels: documentation, health-check, mesosphere > > We don't talk about this feature at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5963) HealthChecker should not decide when to kill tasks and when to stop performing health checks.
[ https://issues.apache.org/jira/browse/MESOS-5963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-5963: - Sprint: Mesosphere Sprint 46, Mesosphere Sprint 47 (was: Mesosphere Sprint 46) > HealthChecker should not decide when to kill tasks and when to stop > performing health checks. > - > > Key: MESOS-5963 > URL: https://issues.apache.org/jira/browse/MESOS-5963 > Project: Mesos > Issue Type: Bug >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov > Labels: health-check, mesosphere > > Currently, the {{HealthChecker}} library decides when a task should be killed > based on its health status. Moreover, it stops checking its health after that. > This seems unfortunate, because it's up to the executor and / or framework to > decide both when to kill tasks and when to health check them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5856) Logrotate ContainerLogger module does not rotate logs when run as root with `--switch_user`.
[ https://issues.apache.org/jira/browse/MESOS-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-5856: - Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, Mesosphere Sprint 47 (was: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46) > Logrotate ContainerLogger module does not rotate logs when run as root with > `--switch_user`. > > > Key: MESOS-5856 > URL: https://issues.apache.org/jira/browse/MESOS-5856 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.27.0, 0.28.0, 1.0.0 >Reporter: Joseph Wu >Assignee: Sivaram Kannan >Priority: Critical > Labels: logger, mesosphere, newbie > > The logrotate ContainerLogger module runs as the agent's user. In most > cases, this is {{root}}. > When {{logrotate}} is run as root, there is an additional check the > configuration files must pass (because a root {{logrotate}} needs to be > secured against non-root modifications to the configuration): > https://github.com/logrotate/logrotate/blob/fe80cb51a2571ca35b1a7c8ba0695db5a68feaba/config.c#L807-L815 > Log rotation will fail under the following scenario: > 1) The agent is run with {{--switch_user}} (default: true) > 2) A task is launched with a non-root user specified > 3) The logrotate module spawns a few companion processes (as root) and this > creates the {{stdout}}, {{stderr}}, {{stdout.logrotate.conf}}, and > {{stderr.logrotate.conf}} files (as root). This step races with the next > step. > 4) The Mesos containerizer and Fetcher will {{chown}} the task's sandbox to > the non-root user. Including the files just created. > 5) When {{logrotate}} is run, it will skip any non-root configuration files. > This means the files are not rotated. > > Fix: The logrotate module's companion processes should call {{setuid}} and > {{setgid}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
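The proposed fix (have the companion processes call {{setgid}} and {{setuid}}) can be sketched as follows. This is a Python illustration of the system calls involved, not the logger module's C++ code, and the function name `run_as` is an assumption. The point it shows is that a child which switches to the task's user before creating files ends up owning them, so a later {{chown}} of the sandbox no longer changes their ownership relative to the process that runs {{logrotate}}.

```python
import os
import subprocess
import sys


def run_as(uid, gid, argv):
    """Run argv in a child process after switching to the given gid and uid."""
    def demote():
        # Order matters: change the group first, because after setuid()
        # to a non-root user the process may no longer be allowed to.
        os.setgid(gid)
        os.setuid(uid)

    return subprocess.run(
        argv, preexec_fn=demote, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Using the caller's own ids works without root and shows the mechanics.
    result = run_as(
        os.getuid(), os.getgid(),
        [sys.executable, "-c", "import os; print(os.getuid(), os.getgid())"])
    print(result.stdout.strip())
```

Switching to a different user requires the parent to run as root, which is the situation described in the ticket; whether supplementary groups also need adjusting is not covered by this sketch.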
[jira] [Updated] (MESOS-6494) Clean up the flags parsing in the executors.
[ https://issues.apache.org/jira/browse/MESOS-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-6494: - Sprint: Mesosphere Sprint 46, Mesosphere Sprint 47 (was: Mesosphere Sprint 46) > Clean up the flags parsing in the executors. > > > Key: MESOS-6494 > URL: https://issues.apache.org/jira/browse/MESOS-6494 > Project: Mesos > Issue Type: Improvement >Reporter: Gastón Kleiman >Assignee: Gastón Kleiman > Labels: mesosphere > > The current executors and the executor libraries use a mix of `stout::flags` > and `os::getenv` to parse flags, leading to a lot of unnecessary and > sometimes duplicated code. > This should be cleaned up, using only {{stout::flags}} to parse flags. > Environment variables should be used for the flags that are common to ALL the > executors (listed in the Executor HTTP API doc). > Command line parameters should be used for flags that apply only to > individual executors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
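The split proposed in this ticket (environment variables for flags shared by all executors, command-line parameters for executor-specific ones) can be sketched in a few lines. This is a Python illustration only, not {{stout::flags}}. The `MESOS_` prefix, the two common flag names, and `--example_flag` are illustrative assumptions chosen for the example, not a list taken from the Executor HTTP API doc.

```python
import argparse

# Assumed for illustration: flags common to all executors arrive as
# environment variables carrying this prefix.
ENV_PREFIX = "MESOS_"
COMMON_FLAGS = ("framework_id", "executor_id")


def load_flags(argv, environ):
    """Merge common flags from the environment with executor-specific argv flags."""
    flags = {}
    for name in COMMON_FLAGS:
        value = environ.get(ENV_PREFIX + name.upper())
        if value is not None:
            flags[name] = value

    # Hypothetical executor-specific flag, parsed from the command line only.
    parser = argparse.ArgumentParser()
    parser.add_argument("--example_flag", default=None)
    flags.update(vars(parser.parse_args(argv)))
    return flags


if __name__ == "__main__":
    import os
    import sys
    print(load_flags(sys.argv[1:], os.environ))
```

Keeping both sources behind one loader is what removes the duplicated parsing code the ticket complains about; each flag has exactly one place it can come from.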
[jira] [Updated] (MESOS-6184) Health checks should use a general mechanism to enter namespaces of the task.
[ https://issues.apache.org/jira/browse/MESOS-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-6184: - Sprint: Mesosphere Sprint 44, Mesosphere Sprint 46, Mesosphere Sprint 47 (was: Mesosphere Sprint 44, Mesosphere Sprint 46) > Health checks should use a general mechanism to enter namespaces of the task. > - > > Key: MESOS-6184 > URL: https://issues.apache.org/jira/browse/MESOS-6184 > Project: Mesos > Issue Type: Improvement >Reporter: haosdent >Assignee: haosdent >Priority: Blocker > Labels: health-check, mesosphere > > To perform health checks for tasks, we need to enter the corresponding > namespaces of the container. For now, health checks use a custom clone to > implement this: > {code} > return process::defaultClone([=]() -> int { > if (taskPid.isSome()) { > foreach (const string& ns, namespaces) { > Try<Nothing> setns = ns::setns(taskPid.get(), ns); > if (setns.isError()) { > ... > } > } > } > return func(); > }); > {code} > After the childHooks patches are merged, we could change the health check to use > childHooks to call {{setns}} and make {{process::defaultClone}} private > again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6348) Allow `network/cni` isolator unit-tests to run with CNI plugins
[ https://issues.apache.org/jira/browse/MESOS-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-6348: - Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, Mesosphere Sprint 47 (was: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46) > Allow `network/cni` isolator unit-tests to run with CNI plugins > > > Key: MESOS-6348 > URL: https://issues.apache.org/jira/browse/MESOS-6348 > Project: Mesos > Issue Type: Task >Reporter: Avinash Sridharan >Assignee: Avinash Sridharan > Labels: mesosphere > > Currently, we don't have any infrastructure to allow for CNI plugins to be > used in `network/cni` isolator unit-tests. This forces us to mock CNI plugins > that don't use new network namespaces, leading to a very restrictive form of > unit-tests. > Especially for the port-mapper plugin, in order to test its DNAT functionality it > would be very useful to run the containers in a separate network namespace, > which requires an actual CNI plugin. > The proposal is to introduce a test filter called CNIPLUGIN that gets > set when the CNI_PATH env var is set. Tests using the CNIPLUGIN filter can then > use actual CNI plugins in their tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6589) Document DockerInfo.Parameter usage in the docker containerizer document
haosdent created MESOS-6589: --- Summary: Document DockerInfo.Parameter usage in the docker containerizer document Key: MESOS-6589 URL: https://issues.apache.org/jira/browse/MESOS-6589 Project: Mesos Issue Type: Improvement Components: docker, documentation Reporter: haosdent Assignee: haosdent Priority: Minor Some users would like to pass extra parameters when launching a Docker container via Mesos. Short of reading the Mesos protobuf message definitions, users have no way to learn how to do that from the documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6567) Actively Scan for CNI Configurations
[ https://issues.apache.org/jira/browse/MESOS-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15666477#comment-15666477 ] Qian Zhang commented on MESOS-6567: --- Yes, NetworkCniIsolatorProcess::create happens at boot, so picking up new CNI network configuration files requires a restart of the agent. However, a modification to an existing CNI network configuration file can be picked up at runtime, because the file is read every time a new container is attached to a CNI network (https://github.com/apache/mesos/blob/1.0.1/src/slave/containerizer/mesos/isolators/network/cni/cni.cpp#L969). It is therefore possible to launch a container on a CNI network, modify that network's configuration file, and then launch another container on the same network with the new configuration. > Actively Scan for CNI Configurations > > > Key: MESOS-6567 > URL: https://issues.apache.org/jira/browse/MESOS-6567 > Project: Mesos > Issue Type: Improvement >Reporter: Dan Osborne > > Mesos-Agent currently loads the CNI configs into memory at startup. After > this point, new configurations that are added will remain unknown to the > Mesos Agent process until it is restarted. > This ticket is to request that the Mesos Agent process scan the CNI config > directory each time it is networking a task, so that modifying, adding, and > removing networks will not require a slave reboot. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
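The behaviour requested in this ticket, rescanning the config directory each time a task is networked, can be sketched in Python. This is an illustration, not the isolator's C++ code. The function name `scan_cni_configs` is an assumption, and the only detail relied on from the CNI specification is that a network configuration is a JSON object with a `name` field.

```python
import json
import os


def scan_cni_configs(config_dir):
    """Return {network name: parsed config} for every valid file in config_dir."""
    networks = {}
    for entry in sorted(os.listdir(config_dir)):
        path = os.path.join(config_dir, entry)
        if not os.path.isfile(path):
            continue
        try:
            with open(path) as handle:
                config = json.load(handle)
        except (OSError, ValueError):
            # Skip unreadable or malformed files instead of failing the scan.
            continue
        if isinstance(config, dict) and "name" in config:
            networks[config["name"]] = config
    return networks
```

Calling this once at startup reproduces the current limitation: networks added later stay unknown. Calling it on every attach picks up added, modified, and removed files, at the cost of one directory listing and a few small file reads per container launch.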