[jira] [Commented] (MESOS-9258) Consider making Mesos subscribers send heartbeats
    [ https://issues.apache.org/jira/browse/MESOS-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680729#comment-16680729 ]

Joseph Wu commented on MESOS-9258:
----------------------------------

Prototype for the max lifetime proposal: https://reviews.apache.org/r/69302/

> Consider making Mesos subscribers send heartbeats
> -------------------------------------------------
>
>                 Key: MESOS-9258
>                 URL: https://issues.apache.org/jira/browse/MESOS-9258
>             Project: Mesos
>          Issue Type: Improvement
>          Components: HTTP API
>            Reporter: Gastón Kleiman
>            Assignee: Joseph Wu
>            Priority: Critical
>              Labels: mesosphere
>
> Some reverse proxies (e.g., ELB using an HTTP listener) won't close the
> upstream connection to Mesos when they detect that their client is
> disconnected.
> This can make Mesos leak subscribers, which generates unnecessary
> authorization requests and affects performance.
> We should evaluate methods (e.g., heartbeats) to enable Mesos to detect that
> a subscriber is gone, even if the TCP connection is still open.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
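As a rough illustration of the heartbeat idea raised in this ticket (not of the max lifetime prototype linked above, and not of any Mesos code), the sketch below tracks the last heartbeat seen from each subscriber and drops subscribers that stay silent for longer than a timeout. The class name `SubscriberRegistry`, its methods, and the timeout value are all hypothetical.

```python
import time
from typing import Callable, Dict, List


class SubscriberRegistry:
    """Hypothetical helper: tracks heartbeats and expires silent subscribers."""

    def __init__(self, timeout_secs: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout_secs
        self._clock = clock  # injectable so the behavior can be tested
        self._last_seen: Dict[str, float] = {}

    def subscribe(self, subscriber_id: str) -> None:
        # Subscribing counts as the first sign of life.
        self._last_seen[subscriber_id] = self._clock()

    def heartbeat(self, subscriber_id: str) -> bool:
        """Records a heartbeat; returns False for unknown subscribers."""
        if subscriber_id not in self._last_seen:
            return False
        self._last_seen[subscriber_id] = self._clock()
        return True

    def expire(self) -> List[str]:
        """Removes and returns subscribers whose last heartbeat is too old."""
        now = self._clock()
        gone = [s for s, seen in self._last_seen.items()
                if now - seen > self._timeout]
        for s in gone:
            del self._last_seen[s]
        return gone

    def active(self) -> List[str]:
        return sorted(self._last_seen)
```

The point of the design is that liveness is decided by the application-level signal rather than by the state of the TCP connection, which a reverse proxy may keep open after its client has left.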
[jira] [Assigned] (MESOS-7564) Introduce a heartbeat mechanism for v1 HTTP executor <-> agent communication.
    [ https://issues.apache.org/jira/browse/MESOS-7564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph Wu reassigned MESOS-7564:
--------------------------------

    Assignee: Joseph Wu

> Introduce a heartbeat mechanism for v1 HTTP executor <-> agent communication.
> -----------------------------------------------------------------------------
>
>                 Key: MESOS-7564
>                 URL: https://issues.apache.org/jira/browse/MESOS-7564
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent, executor
>            Reporter: Anand Mazumdar
>            Assignee: Joseph Wu
>            Priority: Critical
>              Labels: api, mesosphere, v1_api
>
> Currently, we do not have heartbeats for executor <-> agent communication.
> This is especially problematic in scenarios when IPFilters are enabled since
> the default conntrack keep alive timeout is 5 days. When that timeout
> elapses, the executor doesn't get notified via a socket disconnection when
> the agent process restarts. The executor would then get killed if it doesn't
> re-register when the agent recovery process is completed.
> Enabling application-level heartbeats or TCP keepalives is a possible
> way to fix this issue.
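Of the two options mentioned in the ticket, TCP keepalive is the one that can be shown with standard socket options alone. The following is a minimal sketch, not Mesos code: the function name `enable_keepalive` and the default values are assumptions, and the tuning options are platform-specific, so they are applied only where the Python `socket` module exposes them.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_secs: int = 60,
                     interval_secs: int = 10, probes: int = 5) -> None:
    """Turns on TCP keepalive probes for an existing socket."""
    # Portable switch: ask the kernel to probe an idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Platform-specific tuning; skipped where the constant is missing.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Idle time before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_secs)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Interval between unanswered probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_secs)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Number of unanswered probes before the connection is dropped.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```

For the scenario described above, the idle time would have to be far shorter than the connection-tracking timeout so that probes keep the tracked entry alive and a dead peer is noticed promptly.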
[jira] [Commented] (MESOS-9332) Debug container should run as the same user of its parent container by default
    [ https://issues.apache.org/jira/browse/MESOS-9332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679488#comment-16679488 ]

Qian Zhang commented on MESOS-9332:
-----------------------------------

After discussing with [~gilbert], we agreed that we should actually run every nested container (rather than just debug containers) as the same user as its parent container by default, so I have updated the above patches accordingly.

> Debug container should run as the same user of its parent container by default
> -------------------------------------------------------------------------------
>
>                 Key: MESOS-9332
>                 URL: https://issues.apache.org/jira/browse/MESOS-9332
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>            Reporter: Qian Zhang
>            Assignee: Qian Zhang
>            Priority: Major
>              Labels: containerizer, mesosphere
>
> Currently, when launching a debug container, the Mesos agent by default uses
> the executor's user as the debug container's user if the `user` field is not
> specified in the debug container's `commandInfo` (see
> [this code|https://github.com/apache/mesos/blob/1.7.0/src/slave/http.cpp#L2559]
> for details). This is fine for a command task, since the command executor's
> user is the same as the command task's user (see
> [this code|https://github.com/apache/mesos/blob/1.7.0/src/slave/slave.cpp#L6068:L6070]
> for details), so the debug container is launched as the same user as the
> task. But for a task in a task group, the default executor's user is the
> same as the framework user (see
> [this code|https://github.com/apache/mesos/blob/1.7.0/src/slave/slave.cpp#L8959]
> for details), so in this case the debug container is launched as the
> framework user rather than the task user. So when the framework user is a
> normal user but the task user is root, the debug container is launched as
> the normal user, which is not desired. The expectation is that the debug
> container runs as the same user as the container it debugs.
[jira] [Commented] (MESOS-9164) Subprocess should unset CLOEXEC on whitelisted file descriptors.
    [ https://issues.apache.org/jira/browse/MESOS-9164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679389#comment-16679389 ]

Qian Zhang commented on MESOS-9164:
-----------------------------------

commit d9a02acb8c9440c29811e6f66fe2e1146a04aa52
Author: Qian Zhang
Date:   Wed Aug 29 10:17:05 2018 +0800

    Closed all file descriptors except `whitelist_fds` in posix/subprocess.

    Review: https://reviews.apache.org/r/68644

commit df0a616e3555767e308a87c787d5ad5cdd4e66c1
Author: Qian Zhang
Date:   Fri Oct 12 22:04:02 2018 +0800

    Added a test `SubprocessTest.WhiteListFds`.

    Review: https://reviews.apache.org/r/69016

commit bb533b784928bca1553b6ed86d10105de26bb76d
Author: Qian Zhang
Date:   Mon Sep 3 15:09:24 2018 +0800

    Updated IO switchboard to use subprocess's `whitelist_fds` parameter.

    Review: https://reviews.apache.org/r/68645

commit 2455543d7534d2c1491854ff6efff1c75a1c4395
Author: Qian Zhang
Date:   Mon Sep 3 15:11:51 2018 +0800

    Updated launchers to use subprocess's `whitelist_fds` parameter.

    Review: https://reviews.apache.org/r/68646

commit face988a52b0775f0c3e959d1f164212c1eba96c
Author: Qian Zhang
Date:   Mon Oct 8 16:06:31 2018 +0800

    Removed the child hook `UNSET_CLOEXEC`.

    We do not need this child hook since any file descriptors need to unset
    the close-on-exec flag can be put in the `whitelist_fds` parameter of
    the `subprocess` method.

    Review: https://reviews.apache.org/r/68995

> Subprocess should unset CLOEXEC on whitelisted file descriptors.
> ----------------------------------------------------------------
>
>                 Key: MESOS-9164
>                 URL: https://issues.apache.org/jira/browse/MESOS-9164
>             Project: Mesos
>          Issue Type: Bug
>          Components: libprocess
>            Reporter: James Peach
>            Assignee: Qian Zhang
>            Priority: Major
>
> The libprocess subprocess API accepts a set of whitelisted file descriptors
> that are supposed to be inherited by the child process. On Windows, these
> are used, but otherwise the subprocess API just ignores them. We probably
> should make sure that the API clears the {{CLOEXEC}} flag on these descriptors
> so that they are inherited by the child.
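The flag manipulation that the ticket description asks for can be sketched with the POSIX `fcntl` interface. This is an illustration of the idea only, not the libprocess implementation (the commits above describe closing every descriptor except those in `whitelist_fds`); the helper names `clear_cloexec` and `is_inheritable` are hypothetical, and the code is POSIX-only.

```python
import fcntl
import os
from typing import Iterable


def clear_cloexec(fds: Iterable[int]) -> None:
    """Clears FD_CLOEXEC on each descriptor so it survives exec()."""
    for fd in fds:
        flags = fcntl.fcntl(fd, fcntl.F_GETFD)
        # Mask out only the close-on-exec bit; keep any other fd flags.
        fcntl.fcntl(fd, fcntl.F_SETFD, flags & ~fcntl.FD_CLOEXEC)


def is_inheritable(fd: int) -> bool:
    """Returns True if the descriptor would be kept open across exec()."""
    return not (fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)
```

A short usage example: Python creates pipe descriptors with close-on-exec set, so only the whitelisted end becomes inheritable.

```python
read_fd, write_fd = os.pipe()
clear_cloexec([read_fd])          # whitelist only the read end
print(is_inheritable(read_fd), is_inheritable(write_fd))
os.close(read_fd)
os.close(write_fd)
```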
[jira] [Created] (MESOS-9381) Design gRPC-based Mesos module interfaces.
Chun-Hung Hsiao created MESOS-9381:
--------------------------------------

             Summary: Design gRPC-based Mesos module interfaces.
                 Key: MESOS-9381
                 URL: https://issues.apache.org/jira/browse/MESOS-9381
             Project: Mesos
          Issue Type: Wish
          Components: modules
            Reporter: Chun-Hung Hsiao

We could consider designing gRPC-based Mesos module interfaces. This will enable users to write their own modules through more language bindings. For synchronous module interfaces, MESOS-7749 already provides gRPC client support.

We could move this to another epic in the future.
[jira] [Commented] (MESOS-9332) Nested container should run as the same user of its parent container by default
    [ https://issues.apache.org/jira/browse/MESOS-9332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679501#comment-16679501 ]

Qian Zhang commented on MESOS-9332:
-----------------------------------

commit be494213083b27bc768c919f3df1df2bca899955
Author: Qian Zhang
Date:   Fri Oct 26 09:23:27 2018 +0800

    Made nested container runs as its parent container's user by default.

    Review: https://reviews.apache.org/r/69234

commit 4e00b663910ac3a37dd86e454acadb78dba1322a
Author: Qian Zhang
Date:   Wed Oct 31 17:18:18 2018 -0700

    Added a test `ROOT_UNPRIVILEGED_USER_DefaultExecutorCommandHealthCheck`.

    Review: https://reviews.apache.org/r/69235

commit 05e2cb58dde866b67955304417804bee684d5817
Author: Qian Zhang
Date:   Thu Nov 1 13:35:49 2018 -0700

    Fixed a coding error that a test waited on a wrong task status update.

    Review: https://reviews.apache.org/r/69236

> Nested container should run as the same user of its parent container by
> default
> -----------------------------------------------------------------------
>
>                 Key: MESOS-9332
>                 URL: https://issues.apache.org/jira/browse/MESOS-9332
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>            Reporter: Qian Zhang
>            Assignee: Qian Zhang
>            Priority: Major
>              Labels: containerizer, mesosphere
>
> Currently, when launching a debug container, the Mesos agent by default uses
> the executor's user as the debug container's user if the `user` field is not
> specified in the debug container's `commandInfo` (see
> [this code|https://github.com/apache/mesos/blob/1.7.0/src/slave/http.cpp#L2559]
> for details). This is fine for a command task, since the command executor's
> user is the same as the command task's user (see
> [this code|https://github.com/apache/mesos/blob/1.7.0/src/slave/slave.cpp#L6068:L6070]
> for details), so the debug container is launched as the same user as the
> task. But for a task in a task group, the default executor's user is the
> same as the framework user (see
> [this code|https://github.com/apache/mesos/blob/1.7.0/src/slave/slave.cpp#L8959]
> for details), so in this case the debug container is launched as the
> framework user rather than the task user. So when the framework user is a
> normal user but the task user is root, the debug container is launched as
> the normal user, which is not desired. The expectation is that the debug
> container runs as the same user as the container it debugs.
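The default described in this ticket amounts to a precedence order for choosing a container's user. The sketch below states that order explicitly; it is not taken from the Mesos patches. The function `resolve_container_user` and its parameters are hypothetical, and the final fall-back to the executor's user when no parent user is known is an assumption made for the example.

```python
from typing import Optional


def resolve_container_user(command_user: Optional[str],
                           parent_user: Optional[str],
                           executor_user: Optional[str]) -> Optional[str]:
    """Picks the user for a nested container.

    Precedence (assumed for this sketch):
      1. the `user` set explicitly on the nested container's command,
      2. the user of the parent container,
      3. the executor's user as a last resort.
    """
    if command_user:
        return command_user
    if parent_user:
        return parent_user
    return executor_user
```

In the scenario from the description (framework user is a normal user, task user is root), `resolve_container_user(None, "root", "nobody")` selects `root`, whereas the old behavior corresponds to skipping step 2 and ending up with the executor's user.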
[jira] [Comment Edited] (MESOS-9332) Debug container should run as the same user of its parent container by default
    [ https://issues.apache.org/jira/browse/MESOS-9332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679488#comment-16679488 ]

Qian Zhang edited comment on MESOS-9332 at 11/8/18 9:26 AM:
------------------------------------------------------------

After discussing with [~gilbert], we agreed that we should actually run every nested container (rather than just debug containers) as the same user as its parent container by default, so I have updated the above patches and also the summary of this ticket accordingly.

was (Author: qianzhang):
After discussing with [~gilbert], we agreed that we should actually run every nested container (rather than just debug containers) as the same user as its parent container by default, so I have updated the above patches accordingly.

> Debug container should run as the same user of its parent container by default
> -------------------------------------------------------------------------------
>
>                 Key: MESOS-9332
>                 URL: https://issues.apache.org/jira/browse/MESOS-9332
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>            Reporter: Qian Zhang
>            Assignee: Qian Zhang
>            Priority: Major
>              Labels: containerizer, mesosphere
>
> Currently, when launching a debug container, the Mesos agent by default uses
> the executor's user as the debug container's user if the `user` field is not
> specified in the debug container's `commandInfo` (see
> [this code|https://github.com/apache/mesos/blob/1.7.0/src/slave/http.cpp#L2559]
> for details). This is fine for a command task, since the command executor's
> user is the same as the command task's user (see
> [this code|https://github.com/apache/mesos/blob/1.7.0/src/slave/slave.cpp#L6068:L6070]
> for details), so the debug container is launched as the same user as the
> task. But for a task in a task group, the default executor's user is the
> same as the framework user (see
> [this code|https://github.com/apache/mesos/blob/1.7.0/src/slave/slave.cpp#L8959]
> for details), so in this case the debug container is launched as the
> framework user rather than the task user. So when the framework user is a
> normal user but the task user is root, the debug container is launched as
> the normal user, which is not desired. The expectation is that the debug
> container runs as the same user as the container it debugs.
[jira] [Created] (MESOS-9380) Support v1 Executor API over gRPC.
Chun-Hung Hsiao created MESOS-9380:
--------------------------------------

             Summary: Support v1 Executor API over gRPC.
                 Key: MESOS-9380
                 URL: https://issues.apache.org/jira/browse/MESOS-9380
             Project: Mesos
          Issue Type: Task
          Components: agent
            Reporter: Chun-Hung Hsiao

Supporting the v1 Executor API over gRPC will enable people to write custom executors with more language bindings. The main work includes:

1. Define the Executor gRPC service in {{service.proto}}. The proto will be in proto3, but the request and response messages will still be in proto2, since this is what the current v1 API is based on.
2. Refactor the agent code to support both HTTP and gRPC connections, reusing most of the current code for the HTTP Executor API.
3. Implement handlers for gRPC calls.
[jira] [Created] (MESOS-9379) Support v1 Operator API over gRPC.
Chun-Hung Hsiao created MESOS-9379:
--------------------------------------

             Summary: Support v1 Operator API over gRPC.
                 Key: MESOS-9379
                 URL: https://issues.apache.org/jira/browse/MESOS-9379
             Project: Mesos
          Issue Type: Task
          Components: master
            Reporter: Chun-Hung Hsiao

Supporting the v1 Operator API over gRPC will enable people to interact with Mesos through more language bindings. The main work includes:

1. Define the Operator gRPC service in {{service.proto}}. The proto will be in proto3, but the request and response messages will still be in proto2, since this is what the current v1 API is based on.
2. Refactor the master code to support both HTTP and gRPC connections, reusing most of the current code for the HTTP Operator API.
3. Implement handlers for gRPC calls.
[jira] [Created] (MESOS-9378) Support v1 Agent API over gRPC.
Chun-Hung Hsiao created MESOS-9378:
--------------------------------------

             Summary: Support v1 Agent API over gRPC.
                 Key: MESOS-9378
                 URL: https://issues.apache.org/jira/browse/MESOS-9378
             Project: Mesos
          Issue Type: Task
          Components: agent
            Reporter: Chun-Hung Hsiao

Supporting the v1 Agent API over gRPC will enable people to query the agent with more language bindings, and the UI can be based on gRPC. The main work includes:

1. Define the Agent gRPC service in {{service.proto}}. The proto will be in proto3, but the request and response messages will still be in proto2, since this is what the current v1 API is based on.
2. Implement handlers for gRPC calls that delegate to the original agent call handlers.
[jira] [Created] (MESOS-9372) Support V1 API through GRPC.
Chun-Hung Hsiao created MESOS-9372:
--------------------------------------

             Summary: Support V1 API through GRPC.
                 Key: MESOS-9372
                 URL: https://issues.apache.org/jira/browse/MESOS-9372
             Project: Mesos
          Issue Type: Epic
            Reporter: Chun-Hung Hsiao

Supporting the V1 API over gRPC would make it easier for people to adopt the V1 API, as the current HTTP API is not easy to use, and gRPC can generate bindings for different languages.
[jira] [Created] (MESOS-9377) Support v1 Master API over gRPC.
Chun-Hung Hsiao created MESOS-9377:
--------------------------------------

             Summary: Support v1 Master API over gRPC.
                 Key: MESOS-9377
                 URL: https://issues.apache.org/jira/browse/MESOS-9377
             Project: Mesos
          Issue Type: Task
          Components: master
            Reporter: Chun-Hung Hsiao

Supporting the v1 Master API over gRPC will enable people to query the master with more language bindings, and the UI can be based on gRPC. The main work includes:

1. Define the Master gRPC service in {{service.proto}}. The proto will be in proto3, but the request and response messages will still be in proto2, since this is what the current v1 API is based on.
2. Implement handlers for gRPC calls that delegate to the original master call handlers.
[jira] [Created] (MESOS-9374) gRPC server support for unary calls in libprocess.
Chun-Hung Hsiao created MESOS-9374:
--------------------------------------

             Summary: gRPC server support for unary calls in libprocess.
                 Key: MESOS-9374
                 URL: https://issues.apache.org/jira/browse/MESOS-9374
             Project: Mesos
          Issue Type: Task
          Components: libprocess
            Reporter: Chun-Hung Hsiao

Supporting unary calls in the gRPC server will enable using the Mesos synchronous APIs (such as the master and agent APIs) over gRPC.
[jira] [Created] (MESOS-9376) Support v1 Scheduler API over gRPC.
Chun-Hung Hsiao created MESOS-9376:
--------------------------------------

             Summary: Support v1 Scheduler API over gRPC.
                 Key: MESOS-9376
                 URL: https://issues.apache.org/jira/browse/MESOS-9376
             Project: Mesos
          Issue Type: Task
          Components: scheduler api
            Reporter: Chun-Hung Hsiao

Supporting the v1 Scheduler API over gRPC will enable people to write frameworks with more language bindings. The main work includes:

1. Define the Scheduler gRPC service in {{service.proto}}. The proto will be in proto3, but the request and response messages will still be in proto2, since this is what the current v1 API is based on.
2. Refactor the master code to support both HTTP and gRPC connections, reusing most of the current code for the HTTP Scheduler API.
3. Implement handlers for gRPC calls.
[jira] [Created] (MESOS-9373) Design gRPC server and streaming support in Libprocess.
Chun-Hung Hsiao created MESOS-9373:
--------------------------------------

             Summary: Design gRPC server and streaming support in Libprocess.
                 Key: MESOS-9373
                 URL: https://issues.apache.org/jira/browse/MESOS-9373
             Project: Mesos
          Issue Type: Task
          Components: libprocess
            Reporter: Chun-Hung Hsiao

Currently libprocess only supports a gRPC client for unary gRPC calls, through MESOS-7749. To have full gRPC support for the V1 gRPC API, we have to lay out the design for:

1. gRPC server support for unary gRPC calls.
2. gRPC server-to-client streaming support.

Optionally, we could consider supporting the following:

3. gRPC client-to-server streaming support. The Mesos API does not use this pattern currently.
4. gRPC bi-directional streaming support. Only very few API calls use this, so it is not as important as the items above.
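The four call patterns listed in this ticket differ only in whether the request side, the response side, or both are streams. The sketch below models that with plain Python functions and generators; it does not use any gRPC library or libprocess API, and the function names and the message strings are purely illustrative.

```python
from typing import Iterable, Iterator


def unary(request: str) -> str:
    """One request, one response (synchronous query style)."""
    return f"reply:{request}"


def server_streaming(request: str, count: int = 3) -> Iterator[str]:
    """One request, a stream of responses (subscription style)."""
    for i in range(count):
        yield f"event{i}:{request}"


def client_streaming(requests: Iterable[str]) -> str:
    """A stream of requests, one response at the end."""
    return f"received:{len(list(requests))}"


def bidi_streaming(requests: Iterable[str]) -> Iterator[str]:
    """Both sides stream; here each request is acknowledged as it arrives."""
    for request in requests:
        yield f"ack:{request}"
```

Seen this way, items 1 and 2 cover the shapes where the client sends a single message, which matches the ticket's ordering of priorities.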
[jira] [Created] (MESOS-9375) Support gRPC server streaming calls in libprocess.
Chun-Hung Hsiao created MESOS-9375:
--------------------------------------

             Summary: Support gRPC server streaming calls in libprocess.
                 Key: MESOS-9375
                 URL: https://issues.apache.org/jira/browse/MESOS-9375
             Project: Mesos
          Issue Type: Task
          Components: libprocess
            Reporter: Chun-Hung Hsiao

Supporting server streaming calls in the gRPC server will enable using the Mesos streaming APIs (such as the scheduler and operator APIs) over gRPC.