[jira] [Commented] (MESOS-9258) Consider making Mesos subscribers send heartbeats

2018-11-08 Thread Joseph Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680729#comment-16680729
 ] 

Joseph Wu commented on MESOS-9258:
--

Prototype for the max lifetime proposal:
https://reviews.apache.org/r/69302/

> Consider making Mesos subscribers send heartbeats
> -
>
> Key: MESOS-9258
> URL: https://issues.apache.org/jira/browse/MESOS-9258
> Project: Mesos
> Issue Type: Improvement
> Components: HTTP API
> Reporter: Gastón Kleiman
> Assignee: Joseph Wu
> Priority: Critical
> Labels: mesosphere
>
> Some reverse proxies (e.g., ELB using an HTTP listener) won't close the 
> upstream connection to Mesos when they detect that their client is 
> disconnected.
> This can make Mesos leak subscribers, which generates unnecessary 
> authorization requests and affects performance.
> We should evaluate methods (e.g., heartbeats) to enable Mesos to detect that 
> a subscriber is gone, even if the TCP connection is still open.
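One way to read the heartbeat idea in the description is as a per-subscriber lease: each heartbeat renews it, and a periodic sweep drops subscribers whose lease has lapsed even though their TCP connection is still open. The following is a minimal Python sketch of that bookkeeping, not Mesos code; the class name `SubscriberRegistry`, the `timeout_secs` parameter, and the injectable `clock` are all hypothetical.

```python
import time


class SubscriberRegistry:
    """Tracks the last heartbeat per subscriber and expires silent ones.

    Hypothetical illustration only; names do not come from Mesos.
    """

    def __init__(self, timeout_secs, clock=time.monotonic):
        self._timeout_secs = timeout_secs
        self._clock = clock
        self._last_seen = {}

    def heartbeat(self, subscriber_id):
        # Called on subscribe and on every heartbeat from the subscriber.
        self._last_seen[subscriber_id] = self._clock()

    def expire(self):
        # Drop subscribers that have been silent for longer than the timeout
        # and return their IDs so the caller can close their connections.
        now = self._clock()
        expired = [
            sid for sid, seen in self._last_seen.items()
            if now - seen > self._timeout_secs
        ]
        for sid in expired:
            del self._last_seen[sid]
        return expired

    def active(self):
        # IDs of subscribers that still hold a lease.
        return sorted(self._last_seen)
```

Passing the clock in as a parameter is only a convenience for testing the expiry logic without sleeping.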



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-7564) Introduce a heartbeat mechanism for v1 HTTP executor <-> agent communication.

2018-11-08 Thread Joseph Wu (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-7564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu reassigned MESOS-7564:


Assignee: Joseph Wu

> Introduce a heartbeat mechanism for v1 HTTP executor <-> agent communication.
> -
>
> Key: MESOS-7564
> URL: https://issues.apache.org/jira/browse/MESOS-7564
> Project: Mesos
> Issue Type: Bug
> Components: agent, executor
> Reporter: Anand Mazumdar
> Assignee: Joseph Wu
> Priority: Critical
> Labels: api, mesosphere, v1_api
>
> Currently, we do not have heartbeats for executor <-> agent communication. 
> This is especially problematic in scenarios when IPFilters are enabled since 
> the default conntrack keep alive timeout is 5 days. When that timeout 
> elapses, the executor doesn't get notified via a socket disconnection when 
> the agent process restarts. The executor would then get killed if it doesn't 
> re-register when the agent recovery process is completed.
> Enabling application-level heartbeats or TCP keepalives is a possible 
> way to fix this issue.
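For the TCP keepalive option mentioned in the description, the sketch below shows how keepalive can be enabled on a socket in Python. It is an illustration of the general technique, not the Mesos or libprocess implementation; the function name `enable_keepalive` and its default timer values are assumptions chosen for the example. The timer options are platform-specific, so the sketch only sets those that the local `socket` module exposes.

```python
import socket


def enable_keepalive(sock, idle_secs=60, interval_secs=10, probes=5):
    """Turn on TCP keepalive; tune timers where the platform exposes them.

    Hypothetical helper with illustrative defaults.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options below are platform-specific (available on Linux),
    # so each one is applied only if the socket module defines it.
    for name, value in (
        ("TCP_KEEPIDLE", idle_secs),       # idle time before the first probe
        ("TCP_KEEPINTVL", interval_secs),  # interval between probes
        ("TCP_KEEPCNT", probes),           # failed probes before giving up
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
```

With keepalive probes enabled, a peer that has gone away should eventually surface as a socket error rather than leaving the connection silently idle.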





[jira] [Commented] (MESOS-9332) Debug container should run as the same user of its parent container by default

2018-11-08 Thread Qian Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679488#comment-16679488
 ] 

Qian Zhang commented on MESOS-9332:
---

After discussing with [~gilbert], we agreed that we should actually run a 
nested container (rather than just a debug container) as the same user as its 
parent container by default, so I have updated the above patches accordingly.

> Debug container should run as the same user of its parent container by default
> --
>
> Key: MESOS-9332
> URL: https://issues.apache.org/jira/browse/MESOS-9332
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Reporter: Qian Zhang
> Assignee: Qian Zhang
> Priority: Major
> Labels: containerizer, mesosphere
>
> Currently, when launching a debug container, the Mesos agent by default uses 
> the executor's user as the debug container's user if the `user` field is not 
> specified in the debug container's `commandInfo` (see [this 
> code|https://github.com/apache/mesos/blob/1.7.0/src/slave/http.cpp#L2559] for 
> details). This is fine for a command task, since the command executor's user 
> is the same as the command task's user (see [this 
> code|https://github.com/apache/mesos/blob/1.7.0/src/slave/slave.cpp#L6068:L6070]
>  for details), so the debug container is launched as the same user as 
> the task. For a task in a task group, however, the default executor's user is 
> the same as the framework user (see [this 
> code|https://github.com/apache/mesos/blob/1.7.0/src/slave/slave.cpp#L8959] 
> for details), so the debug container is launched as the framework's user 
> rather than the task's. If the framework user is a normal user but the task 
> user is root, the debug container is launched as the normal user, which is 
> not desired: a debug container should run as the same user as the container 
> it debugs.
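The expected default described above can be summarized as a small selection rule. This is a hedged Python sketch of that rule only, not the agent's actual C++ logic; the function `resolve_container_user` and its parameters are hypothetical names introduced for illustration.

```python
def resolve_container_user(command_user, parent_user):
    """Return the user a nested (e.g. debug) container would run as.

    Hypothetical illustration of the desired default, not Mesos code.
    """
    # An explicitly specified `user` in the command always wins.
    if command_user:
        return command_user
    # Otherwise inherit from the parent container (the container being
    # debugged), rather than from the executor or the framework.
    return parent_user
```

For example, with a root task launched by a framework that runs as a normal user, `resolve_container_user(None, "root")` yields `"root"`, which matches the expectation stated in the description.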





[jira] [Commented] (MESOS-9164) Subprocess should unset CLOEXEC on whitelisted file descriptors.

2018-11-08 Thread Qian Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679389#comment-16679389
 ] 

Qian Zhang commented on MESOS-9164:
---

commit d9a02acb8c9440c29811e6f66fe2e1146a04aa52
Author: Qian Zhang 
Date:   Wed Aug 29 10:17:05 2018 +0800

Closed all file descriptors except `whitelist_fds` in posix/subprocess.

Review: https://reviews.apache.org/r/68644

commit df0a616e3555767e308a87c787d5ad5cdd4e66c1
Author: Qian Zhang 
Date:   Fri Oct 12 22:04:02 2018 +0800

Added a test `SubprocessTest.WhiteListFds`.

Review: https://reviews.apache.org/r/69016

commit bb533b784928bca1553b6ed86d10105de26bb76d
Author: Qian Zhang 
Date:   Mon Sep 3 15:09:24 2018 +0800

Updated IO switchboard to use subprocess's `whitelist_fds` parameter.

Review: https://reviews.apache.org/r/68645

commit 2455543d7534d2c1491854ff6efff1c75a1c4395
Author: Qian Zhang 
Date:   Mon Sep 3 15:11:51 2018 +0800

Updated launchers to use subprocess's `whitelist_fds` parameter.

Review: https://reviews.apache.org/r/68646

commit face988a52b0775f0c3e959d1f164212c1eba96c
Author: Qian Zhang 
Date:   Mon Oct 8 16:06:31 2018 +0800

Removed the child hook `UNSET_CLOEXEC`.

We do not need this child hook since any file descriptors that need
the close-on-exec flag unset can be put in the `whitelist_fds`
parameter of the `subprocess` method.

Review: https://reviews.apache.org/r/68995

> Subprocess should unset CLOEXEC on whitelisted file descriptors.
> 
>
> Key: MESOS-9164
> URL: https://issues.apache.org/jira/browse/MESOS-9164
> Project: Mesos
> Issue Type: Bug
> Components: libprocess
> Reporter: James Peach
> Assignee: Qian Zhang
> Priority: Major
>
> The libprocess subprocess API accepts a set of whitelisted file descriptors 
> that are supposed to be inherited by the child process. On Windows, these 
> are used, but otherwise the subprocess API just ignores them. We should 
> probably make sure that the API clears the {{CLOEXEC}} flag on these 
> descriptors so that they are inherited by the child.
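The suggestion in the description, clearing the close-on-exec flag on a set of descriptors, can be illustrated with the POSIX `fcntl` interface. The Python sketch below shows only that general technique on POSIX systems; it is not the libprocess code, and the helper name `clear_cloexec` is hypothetical.

```python
import fcntl


def clear_cloexec(fds):
    """Clear the close-on-exec flag so the descriptors survive exec().

    Hypothetical helper illustrating the technique; POSIX only.
    """
    for fd in fds:
        # Read the descriptor flags, then write them back without FD_CLOEXEC.
        flags = fcntl.fcntl(fd, fcntl.F_GETFD)
        fcntl.fcntl(fd, fcntl.F_SETFD, flags & ~fcntl.FD_CLOEXEC)
```

Descriptors left out of the list keep their flag and are closed when the child calls exec.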





[jira] [Created] (MESOS-9381) Design gRPC-based Mesos module interfaces.

2018-11-08 Thread Chun-Hung Hsiao (JIRA)
Chun-Hung Hsiao created MESOS-9381:
--

 Summary: Design gRPC-based Mesos module interfaces.
 Key: MESOS-9381
 URL: https://issues.apache.org/jira/browse/MESOS-9381
 Project: Mesos
  Issue Type: Wish
  Components: modules
Reporter: Chun-Hung Hsiao


We could consider designing gRPC-based Mesos module interfaces. 
This would enable users to write their own modules through more language 
bindings. For synchronous module interfaces, MESOS-7749 already provides the 
gRPC client support.

We could move this to another epic in the future.





[jira] [Commented] (MESOS-9332) Nested container should run as the same user of its parent container by default

2018-11-08 Thread Qian Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679501#comment-16679501
 ] 

Qian Zhang commented on MESOS-9332:
---

commit be494213083b27bc768c919f3df1df2bca899955
Author: Qian Zhang 
Date:   Fri Oct 26 09:23:27 2018 +0800

Made nested container runs as its parent container's user by default.

Review: https://reviews.apache.org/r/69234

commit 4e00b663910ac3a37dd86e454acadb78dba1322a
Author: Qian Zhang 
Date:   Wed Oct 31 17:18:18 2018 -0700

Added a test `ROOT_UNPRIVILEGED_USER_DefaultExecutorCommandHealthCheck`.

Review: https://reviews.apache.org/r/69235

commit 05e2cb58dde866b67955304417804bee684d5817
Author: Qian Zhang 
Date:   Thu Nov 1 13:35:49 2018 -0700

Fixed a coding error that a test waited on a wrong task status update.

Review: https://reviews.apache.org/r/69236

> Nested container should run as the same user of its parent container by 
> default
> ---
>
> Key: MESOS-9332
> URL: https://issues.apache.org/jira/browse/MESOS-9332
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Reporter: Qian Zhang
> Assignee: Qian Zhang
> Priority: Major
> Labels: containerizer, mesosphere





[jira] [Comment Edited] (MESOS-9332) Debug container should run as the same user of its parent container by default

2018-11-08 Thread Qian Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679488#comment-16679488
 ] 

Qian Zhang edited comment on MESOS-9332 at 11/8/18 9:26 AM:


After discussing with [~gilbert], we agreed that we should actually run a 
nested container (rather than just a debug container) as the same user as its 
parent container by default, so I have updated the above patches and also the 
summary of this ticket accordingly.


was (Author: qianzhang):
After discussing with [~gilbert], we agreed that we should actually run a 
nested container (rather than just a debug container) as the same user as its 
parent container by default, so I have updated the above patches accordingly.

> Debug container should run as the same user of its parent container by default
> --
>
> Key: MESOS-9332
> URL: https://issues.apache.org/jira/browse/MESOS-9332
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Reporter: Qian Zhang
> Assignee: Qian Zhang
> Priority: Major
> Labels: containerizer, mesosphere





[jira] [Created] (MESOS-9380) Support v1 Executor API over gRPC.

2018-11-08 Thread Chun-Hung Hsiao (JIRA)
Chun-Hung Hsiao created MESOS-9380:
--

 Summary: Support v1 Executor API over gRPC.
 Key: MESOS-9380
 URL: https://issues.apache.org/jira/browse/MESOS-9380
 Project: Mesos
  Issue Type: Task
  Components: agent
Reporter: Chun-Hung Hsiao


Supporting v1 Executor API over gRPC will enable people to write custom 
executors with more language bindings. The main work includes:
 1. Define the Executor gRPC service in {{service.proto}}. The proto will be in 
proto3, but the request and response messages will still be in proto2, since 
this is what the current v1 API is based on.
 2. Refactor the agent code to support both HTTP and gRPC connections to reuse 
most of the current code for HTTP Executor API.
 3. Implement handlers for gRPC calls.





[jira] [Created] (MESOS-9379) Support v1 Operator API over gRPC.

2018-11-08 Thread Chun-Hung Hsiao (JIRA)
Chun-Hung Hsiao created MESOS-9379:
--

 Summary: Support v1 Operator API over gRPC.
 Key: MESOS-9379
 URL: https://issues.apache.org/jira/browse/MESOS-9379
 Project: Mesos
  Issue Type: Task
  Components: master
Reporter: Chun-Hung Hsiao


Supporting v1 Operator API over gRPC will enable people to interact with Mesos 
through more language bindings. The main work includes:
1. Define the Operator gRPC service in {{service.proto}}. The proto will be in 
proto3, but the request and response messages will still be in proto2, since 
this is what the current v1 API is based on.
2. Refactor the master code to support both HTTP and gRPC connections to reuse 
most of the current code for HTTP Operator API.
3. Implement handlers for gRPC calls.





[jira] [Created] (MESOS-9378) Support v1 Agent API over gRPC.

2018-11-08 Thread Chun-Hung Hsiao (JIRA)
Chun-Hung Hsiao created MESOS-9378:
--

 Summary: Support v1 Agent API over gRPC.
 Key: MESOS-9378
 URL: https://issues.apache.org/jira/browse/MESOS-9378
 Project: Mesos
  Issue Type: Task
  Components: agent
Reporter: Chun-Hung Hsiao


Supporting v1 Agent API over gRPC will enable people to query the agent with 
more language bindings, and the UI can be based on gRPC. The main work includes:
1. Define the Agent gRPC service in {{service.proto}}. The proto will be in 
proto3, but the request and response messages will still be in proto2, since 
this is what the current v1 API is based on.
2. Implement handlers for gRPC calls to delegate them to the original agent 
call handlers.





[jira] [Created] (MESOS-9372) Support V1 API through GRPC.

2018-11-08 Thread Chun-Hung Hsiao (JIRA)
Chun-Hung Hsiao created MESOS-9372:
--

 Summary: Support V1 API through GRPC.
 Key: MESOS-9372
 URL: https://issues.apache.org/jira/browse/MESOS-9372
 Project: Mesos
  Issue Type: Epic
Reporter: Chun-Hung Hsiao


Supporting the V1 API over GRPC would make it easier for people to adopt the V1 
API, as the current HTTP API is not easy to use, and GRPC can generate 
bindings for different languages.





[jira] [Created] (MESOS-9377) Support v1 Master API over gRPC.

2018-11-08 Thread Chun-Hung Hsiao (JIRA)
Chun-Hung Hsiao created MESOS-9377:
--

 Summary: Support v1 Master API over gRPC.
 Key: MESOS-9377
 URL: https://issues.apache.org/jira/browse/MESOS-9377
 Project: Mesos
  Issue Type: Task
  Components: master
Reporter: Chun-Hung Hsiao


Supporting v1 Master API over gRPC will enable people to query the master with 
more language bindings, and the UI can be based on gRPC. The main work includes:
1. Define the Master gRPC service in {{service.proto}}. The proto will be in 
proto3, but the request and response messages will still be in proto2, since 
this is what the current v1 API is based on.
2. Implement handlers for gRPC calls to delegate them to the original master 
call handlers.





[jira] [Created] (MESOS-9374) gRPC server support for unary calls in libprocess.

2018-11-08 Thread Chun-Hung Hsiao (JIRA)
Chun-Hung Hsiao created MESOS-9374:
--

 Summary: gRPC server support for unary calls in libprocess.
 Key: MESOS-9374
 URL: https://issues.apache.org/jira/browse/MESOS-9374
 Project: Mesos
  Issue Type: Task
  Components: libprocess
Reporter: Chun-Hung Hsiao


gRPC server support for unary calls will enable using the Mesos synchronous 
APIs (such as the master and agent APIs) over gRPC.





[jira] [Created] (MESOS-9376) Support v1 Scheduler API over gRPC.

2018-11-08 Thread Chun-Hung Hsiao (JIRA)
Chun-Hung Hsiao created MESOS-9376:
--

 Summary: Support v1 Scheduler API over gRPC.
 Key: MESOS-9376
 URL: https://issues.apache.org/jira/browse/MESOS-9376
 Project: Mesos
  Issue Type: Task
  Components: scheduler api
Reporter: Chun-Hung Hsiao


Supporting v1 Scheduler API over gRPC will enable people to write frameworks 
with more language bindings. The main work includes:
 1. Define the Scheduler gRPC service in {{service.proto}}. The proto will be 
in proto3, but the request and response messages will still be in proto2, since 
this is what the current v1 API is based on.
 2. Refactor the master code to support both HTTP and gRPC connections to reuse 
most of the current code for HTTP Scheduler API.
 3. Implement handlers for gRPC calls.





[jira] [Created] (MESOS-9373) Design gRPC server and streaming support in Libprocess.

2018-11-08 Thread Chun-Hung Hsiao (JIRA)
Chun-Hung Hsiao created MESOS-9373:
--

 Summary: Design gRPC server and streaming support in Libprocess.
 Key: MESOS-9373
 URL: https://issues.apache.org/jira/browse/MESOS-9373
 Project: Mesos
  Issue Type: Task
  Components: libprocess
Reporter: Chun-Hung Hsiao


Currently libprocess only supports a gRPC client for unary gRPC calls, added in 
MESOS-7749. To have full gRPC support for the V1 gRPC API, we have to lay out 
the design for:
1. gRPC server support for unary gRPC calls.
2. gRPC server-to-client streaming support.

Optionally, we could consider supporting the following:
3. gRPC client-to-server streaming support. The Mesos API does not currently 
use this pattern.
4. gRPC bi-directional streaming support. Only very few API calls use this, so 
it is not as important as the above.





[jira] [Created] (MESOS-9375) Support gRPC server streaming calls in libprocess.

2018-11-08 Thread Chun-Hung Hsiao (JIRA)
Chun-Hung Hsiao created MESOS-9375:
--

 Summary: Support gRPC server streaming calls in libprocess.
 Key: MESOS-9375
 URL: https://issues.apache.org/jira/browse/MESOS-9375
 Project: Mesos
  Issue Type: Task
  Components: libprocess
Reporter: Chun-Hung Hsiao


gRPC server support for server streaming calls will enable using the Mesos 
streaming APIs (such as the scheduler and operator APIs) over gRPC.


