[jira] [Commented] (MESOS-8247) Executor registered message is lost
[ https://issues.apache.org/jira/browse/MESOS-8247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441616#comment-16441616 ]

Gilbert Song commented on MESOS-8247:
-------------------------------------

[~abudnik][~alexr], do we still aim to land this in 1.5.1?

> Executor registered message is lost
> -----------------------------------
>
> Key: MESOS-8247
> URL: https://issues.apache.org/jira/browse/MESOS-8247
> Project: Mesos
> Issue Type: Bug
> Reporter: Andrei Budnik
> Priority: Major
>
> h3. Brief description of successful agent-executor communication.
> The executor sends a `RegisterExecutorMessage` to the agent during its initialization step. The agent sends an `ExecutorRegisteredMessage` back to the executor in its `registerExecutor()` method. Whenever the executor receives an `ExecutorRegisteredMessage`, it prints an `Executor registered on agent...` line to its stderr logs.
> h3. Problem description.
> The agent launches the built-in docker executor, which is stuck in the `STAGING` state.
> stderr logs of the docker executor:
> {code}
> I1114 23:03:17.919090 14322 exec.cpp:162] Version: 1.2.3
> {code}
> They don't contain a message like `Executor registered on agent...`. At the same time, the agent received the `RegisterExecutorMessage` and sent a `runTask` message to the executor.
> The stdout logs consist of the same repeating message:
> {code}
> Received killTask for task ...
> {code}
> Also, the docker executor process has no child processes.
> Currently, the executor [doesn't attempt|https://github.com/apache/mesos/blob/2a253093ecdc7d743c9c0874d6e01b68f6a813e4/src/exec/exec.cpp#L320] to launch a task if it is not registered with the agent, while [task killing|https://github.com/apache/mesos/blob/2a253093ecdc7d743c9c0874d6e01b68f6a813e4/src/exec/exec.cpp#L343] doesn't have such a check.
> It looks like the `ExecutorRegisteredMessage` has been lost.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
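The asymmetry described in the issue (launch is gated on registration, kill is not) can be reproduced with a toy model. This is only a sketch to make the symptom concrete: `ToyExecutor`, its method names and `replayLostRegistration()` are hypothetical and are not the Mesos executor driver API, and the real checks live in the `exec.cpp` lines linked above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of the two checks described in the report.
// All names are hypothetical; this is not the Mesos API.
class ToyExecutor {
public:
  // Called when the agent's registration reply arrives.
  void onExecutorRegistered() {
    registered_ = true;
    output_.push_back("Executor registered on agent");
  }

  // Launch is gated on registration: an unregistered executor drops it.
  void onRunTask(const std::string& taskId) {
    if (!registered_) {
      ++ignoredLaunches_;
      return;
    }
    launched_.push_back(taskId);
  }

  // Kill has no such gate, so it is always logged, launched or not.
  void onKillTask(const std::string& taskId) {
    output_.push_back("Received killTask for task " + taskId);
  }

  const std::vector<std::string>& output() const { return output_; }
  const std::vector<std::string>& launched() const { return launched_; }
  std::size_t ignoredLaunches() const { return ignoredLaunches_; }

private:
  bool registered_ = false;
  std::size_t ignoredLaunches_ = 0;
  std::vector<std::string> output_;
  std::vector<std::string> launched_;
};

// Replays the reported scenario: the registration reply never arrives,
// the launch is dropped, and every kill retry is merely logged.
inline ToyExecutor replayLostRegistration() {
  ToyExecutor executor;
  executor.onRunTask("t1");
  executor.onKillTask("t1");
  executor.onKillTask("t1");
  assert(executor.launched().empty());
  return executor;
}
```

Calling `replayLostRegistration()` leaves an executor with no launched task and an output that contains nothing but repeated kill lines, which matches the logs quoted in the description.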
[jira] [Commented] (MESOS-8247) Executor registered message is lost
[ https://issues.apache.org/jira/browse/MESOS-8247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16322114#comment-16322114 ]

Alexander Rukletsov commented on MESOS-8247:
--------------------------------------------

{noformat}
Commit: 9c03a463c1ac8f63dc00255945a04016c45f04e9 [9c03a46]
Parents: 164d99e1be
Author: Alexander Rukletsov
Date: 11 January 2018 at 13:07:29 GMT+1
Committer: Alexander Rukletsov
Labels: HEAD -> master

Logged socket create/connect failures on warning level.

If a socket create or connect failure occurs during link() or send(), the reason for error is not propagated to the library user. Also, the data being sent or queued is silently dropped on the floor. The socket code does not log itself on a higher level when an error situation occurs. The only trace is the log entries touched in this patch: having them at warning level will significantly simplify debugging.

This patch also consistently logs send / link target.

Review: https://reviews.apache.org/r/65048/
{noformat}

> Executor registered message is lost
> -----------------------------------
>
> Key: MESOS-8247
> URL: https://issues.apache.org/jira/browse/MESOS-8247
> Project: Mesos
> Issue Type: Bug
> Reporter: Andrei Budnik
> Assignee: Andrei Budnik

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
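To illustrate why the log level matters here, below is a rough sketch of a fire-and-forget `send()` that drops its payload when the connection attempt fails. It is not libprocess code: `ToyTransport`, `Severity`, the peer address and the wording of the log line are all made up for the example. The only point taken from the commit message is that the caller gets no error back, so the log entry is the sole trace.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical log levels: Verbose entries are assumed to be hidden
// in a default configuration, Warning entries are always written.
enum class Severity { Verbose, Warning };

// Hypothetical transport whose connect step may fail.
class ToyTransport {
public:
  ToyTransport(std::function<bool(const std::string&)> connect,
               std::ostream& log,
               Severity failureSeverity)
    : connect_(std::move(connect)),
      log_(log),
      failureSeverity_(failureSeverity) {}

  // Returns nothing, so the caller cannot learn about a failure.
  void send(const std::string& target, const std::string& payload) {
    if (!connect_(target)) {
      // The payload is dropped; the log entry is the only trace.
      if (failureSeverity_ == Severity::Warning) {
        log_ << "W Failed to connect to '" << target << "', dropping "
             << payload.size() << " bytes\n";
      }
      return;
    }
    ++delivered_;
  }

  std::size_t delivered() const { return delivered_; }

private:
  std::function<bool(const std::string&)> connect_;
  std::ostream& log_;
  Severity failureSeverity_;
  std::size_t delivered_ = 0;
};

// Sends one message to a peer that can never be reached and returns
// whatever ended up in the log at the given severity.
inline std::string sendToUnreachablePeer(Severity severity) {
  std::ostringstream log;
  ToyTransport transport(
      [](const std::string&) { return false; }, log, severity);
  transport.send("executor(1)@10.0.0.1:10895", "ExecutorRegisteredMessage");
  assert(transport.delivered() == 0);
  return log.str();
}
```

With `Severity::Verbose` the failed send leaves no trace at all, while `Severity::Warning` leaves one line naming the target, which is the kind of hint that was missing when this issue was first investigated.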
[jira] [Commented] (MESOS-8247) Executor registered message is lost
[ https://issues.apache.org/jira/browse/MESOS-8247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16320309#comment-16320309 ]

Alexander Rukletsov commented on MESOS-8247:
--------------------------------------------

https://reviews.apache.org/r/65048/ is not a fix, only a logging improvement.
[jira] [Commented] (MESOS-8247) Executor registered message is lost
[ https://issues.apache.org/jira/browse/MESOS-8247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308419#comment-16308419 ]

Jie Yu commented on MESOS-8247:
-------------------------------

OK, I retargeted this for 1.5.1.
[jira] [Commented] (MESOS-8247) Executor registered message is lost
[ https://issues.apache.org/jira/browse/MESOS-8247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16307902#comment-16307902 ]

Alexander Rukletsov commented on MESOS-8247:
--------------------------------------------

[~jieyu] nope.
[jira] [Commented] (MESOS-8247) Executor registered message is lost
[ https://issues.apache.org/jira/browse/MESOS-8247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16302046#comment-16302046 ]

Jie Yu commented on MESOS-8247:
-------------------------------

[~abudnik], [~alexr], is this a blocker for 1.5.0?
[jira] [Commented] (MESOS-8247) Executor registered message is lost
[ https://issues.apache.org/jira/browse/MESOS-8247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262862#comment-16262862 ]

Alexander Rukletsov commented on MESOS-8247:
--------------------------------------------

These patches ensure that the driver-based executors react to kill task requests even if the task has not been launched:
https://reviews.apache.org/r/64032/
https://reviews.apache.org/r/64033/
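One possible shape of "reacting to a kill for a task that was never launched" is sketched below. It is an assumption made for illustration, not a summary of the two review requests: `ToyKillAwareExecutor`, `StatusUpdate` and the choice to ignore a late launch after a kill are hypothetical, and the actual patches may behave differently.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

enum class TaskState { Running, Killed };

struct StatusUpdate {
  std::string taskId;
  TaskState state;
};

// Hypothetical executor that answers every kill with a terminal update,
// whether or not the task was ever launched.
class ToyKillAwareExecutor {
public:
  void onRunTask(const std::string& taskId) {
    // A task that was already killed is not launched afterwards.
    if (killed_.count(taskId) > 0) {
      return;
    }
    updates_.push_back({taskId, TaskState::Running});
  }

  void onKillTask(const std::string& taskId) {
    // Retried kills are idempotent: only the first produces an update.
    if (!killed_.insert(taskId).second) {
      return;
    }
    updates_.push_back({taskId, TaskState::Killed});
  }

  const std::vector<StatusUpdate>& updates() const { return updates_; }

private:
  std::set<std::string> killed_;
  std::vector<StatusUpdate> updates_;
};

// Kill arrives twice before any launch; a late launch follows.
inline std::vector<StatusUpdate> killBeforeLaunch() {
  ToyKillAwareExecutor executor;
  executor.onKillTask("t1");
  executor.onKillTask("t1");
  executor.onRunTask("t1");
  assert(!executor.updates().empty());
  return executor.updates();
}
```

In this model the scheduler would see a single terminal update for `t1` instead of a task that stays in `STAGING` while kill requests are logged and otherwise ignored.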
[jira] [Commented] (MESOS-8247) Executor registered message is lost
[ https://issues.apache.org/jira/browse/MESOS-8247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261281#comment-16261281 ]

Andrei Budnik commented on MESOS-8247:
--------------------------------------

Additional logs:
{code}
Nov 14 23:03:21 ip-xxx mesos-agent[2029]: E1114 23:03:21.049590 2057 process.cpp:2431] Failed to shutdown socket with fd 320: Transport endpoint is not connected
Nov 14 23:03:21 ip-xxx mesos-agent[2029]: I1114 23:03:21.049783 2054 slave.cpp:4484] Got exited event for executor(1)@xx.xx.yy.zzz:10895
{code}
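For reference, "Transport endpoint is not connected" in the agent log above is the Linux error text for `ENOTCONN`. The sketch below shows one way to obtain that errno with plain POSIX calls: calling `shutdown()` on a TCP socket that was never connected. It only shows where the string comes from and does not claim to reproduce the agent's code path; the function names are made up for the example.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>
#include <sys/socket.h>
#include <unistd.h>

// Calls shutdown() on a TCP socket that was never connected.
// Returns the resulting errno, 0 if shutdown() unexpectedly succeeded,
// or -1 if the socket could not be created at all.
inline int shutdownUnconnectedSocket() {
  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) {
    return -1;
  }

  int error = 0;
  if (::shutdown(fd, SHUT_RDWR) != 0) {
    error = errno;  // Saved before close() can overwrite it.
  }

  ::close(fd);
  return error;
}

// Human-readable form of the error, as a log line would print it.
inline std::string describeShutdownFailure() {
  int error = shutdownUnconnectedSocket();
  assert(error != 0);  // shutdown() is not expected to succeed here.
  return error > 0 ? std::strerror(error) : "socket creation failed";
}
```

The exact wording returned by `describeShutdownFailure()` is platform-dependent; the text in the agent log is the one glibc uses for `ENOTCONN`.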
[jira] [Commented] (MESOS-8247) Executor registered message is lost
[ https://issues.apache.org/jira/browse/MESOS-8247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257291#comment-16257291 ]

Andrei Budnik commented on MESOS-8247:
--------------------------------------

Related https://issues.apache.org/jira/browse/MESOS-3851 ?