[jira] [Commented] (MESOS-9000) Operator API event stream can miss task status updates.

2018-08-21 Thread Alexander Rukletsov (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587193#comment-16587193
 ] 

Alexander Rukletsov commented on MESOS-9000:


On the 1.7.x branch:
{noformat}
commit a2f826d5a641b8ae3e5742ffeab7166281e296f8
Author: Benno Evers 
AuthorDate: Tue Aug 21 10:58:35 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Tue Aug 21 11:08:41 2018 +0200

Changed operator API to notify subscribers on every status change.

Prior to this change, the master would only send `TaskUpdated`
messages to subscribers when the latest known task state on the
agent changed.

This implied that schedulers could not reliably wait for the status
information corresponding to specific state updates, i.e.,
`TASK_RUNNING`, since there is no guarantee that subscribers get
notified during the time when this status update will be included in
the status field.

After this change, `TaskUpdated` messages are sent whenever the latest
acknowledged state of the task changes.

Review: https://reviews.apache.org/r/67575/
{noformat}
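
The notification rule described in this commit message can be sketched as
follows. This is a simplified illustration, not the actual patch from the
review: `TaskState`, `Task`, and `shouldNotifySubscribers` are hypothetical
stand-ins for the Mesos types, and remembering the forwarded status states in
a vector is an assumption made only for this example.

```cpp
#include <vector>

// Hypothetical, reduced stand-in for the Mesos `TaskState` enum.
enum class TaskState { TASK_STAGING, TASK_STARTING, TASK_RUNNING };

// Hypothetical, reduced stand-in for the master's view of a task.
struct Task {
  // Latest known state of the task on the agent.
  TaskState state = TaskState::TASK_STAGING;

  // States of the status updates seen so far (assumption for this sketch).
  std::vector<TaskState> statusStates;
};

// Returns true if subscribers should receive a `TaskUpdated` event.
// The decision depends on whether the state carried by the incoming status
// differs from the last status seen, not on the latest known state.
bool shouldNotifySubscribers(
    Task& task, TaskState statusState, TaskState latestState)
{
  const bool statusChanged =
    task.statusStates.empty() || task.statusStates.back() != statusState;

  if (statusChanged) {
    task.statusStates.push_back(statusState);
  }

  // The latest known state is still tracked, but no longer gates delivery.
  task.state = latestState;

  return statusChanged;
}
```

Under this rule, an update whose status is TASK_RUNNING still triggers a
notification even if an earlier update already advanced the latest known state
to TASK_RUNNING; only a repeated status with the same state is suppressed.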

> Operator API event stream can miss task status updates.
> ---
>
> Key: MESOS-9000
> URL: https://issues.apache.org/jira/browse/MESOS-9000
> Project: Mesos
> Issue Type: Bug
> Components: HTTP API
> Reporter: Benno Evers
> Assignee: Benno Evers
> Priority: Major
> Labels: mesosphere
> Fix For: 1.7.0
>
>
> As of now, the master only sends TaskUpdated messages to subscribers when the 
> latest known task state on the agent changed:
> {noformat}
>   // src/master/master.cpp
>   if (!protobuf::isTerminalState(task->state())) {
>     if (status.state() != task->state()) {
>       sendSubscribersUpdate = true;
>     }
>     task->set_state(latestState.getOrElse(status.state()));
>   }
> {noformat}
> The latest state is set like this:
> {noformat}
> // src/messages/messages.proto
> message StatusUpdate {
>   [...]
>   // This corresponds to the latest state of the task according to the
>   // agent. Note that this state might be different than the state in
>   // 'status' because task status update manager queues updates. In
>   // other words, 'status' corresponds to the update at top of the
>   // queue and 'latest_state' corresponds to the update at bottom of
>   // the queue.
>   optional TaskState latest_state = 7;
> }
> {noformat}
> However, the `TaskStatus` message included in a `TaskUpdated` event is the 
> update at the top of the queue at the time the update was sent.
> So we can easily get into a situation where e.g. the first TaskUpdated has 
> .status.state == TASK_STARTING and .state == TASK_RUNNING, and the second 
> update with .status.state == TASK_RUNNING and .state == TASK_RUNNING would 
> not get delivered because the latest known state did not change.
> This implies that schedulers cannot reliably wait for the status information 
> corresponding to a specific task state, since there is no guarantee that 
> subscribers get notified during the time when this status update will be 
> included in the status field.
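
The scenario in the description can be reproduced with a small model. This is
a sketch under simplifying assumptions, not Mesos code: `TaskState`,
`StatusUpdate`, `TaskUpdated`, and `deliverEvents` are hypothetical stand-ins,
the agent's queue is modeled as a `std::deque`, the terminal-state check is
omitted, and every update is assumed to be acknowledged before the next one is
sent.

```cpp
#include <deque>
#include <vector>

// Hypothetical, reduced stand-in for the Mesos `TaskState` enum.
enum class TaskState { TASK_STAGING, TASK_STARTING, TASK_RUNNING };

// What the agent sends to the master in this model.
struct StatusUpdate {
  TaskState status;       // Update at the top (front) of the agent's queue.
  TaskState latestState;  // Update at the bottom (back) of the queue.
};

// What a subscriber receives in this model.
struct TaskUpdated {
  TaskState status;  // Corresponds to `.status.state` in the description.
  TaskState state;   // Corresponds to `.state` in the description.
};

// Feeds the queued task states through the condition quoted from
// src/master/master.cpp and returns the events that reach subscribers.
std::vector<TaskUpdated> deliverEvents(std::deque<TaskState> queue)
{
  TaskState taskState = TaskState::TASK_STAGING;
  std::vector<TaskUpdated> events;

  while (!queue.empty()) {
    const StatusUpdate update{queue.front(), queue.back()};

    // Quoted rule: notify only if the status differs from the task's
    // current state, then overwrite that state with the latest state.
    const bool sendSubscribersUpdate = update.status != taskState;
    taskState = update.latestState;

    if (sendSubscribersUpdate) {
      events.push_back({update.status, taskState});
    }

    // The update is acknowledged, so the agent moves on to the next one.
    queue.pop_front();
  }

  return events;
}
```

Calling `deliverEvents` with TASK_STARTING and TASK_RUNNING queued yields a
single event with status TASK_STARTING and state TASK_RUNNING. The second
update is dropped because the task's state was already set to TASK_RUNNING, so
no event ever carries the TASK_RUNNING status.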



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-9000) Operator API event stream can miss task status updates

2018-06-15 Thread Zhitao Li (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513816#comment-16513816
 ] 

Zhitao Li commented on MESOS-9000:

I believe the high-level intention was to avoid sending unnecessary duplicate 
status update messages, but I don't think we explicitly considered the scenario 
you described, in which multiple events are queued.

I think adding this sounds fine as long as we have a counter to monitor the 
rate of messages on the event stream.








[jira] [Commented] (MESOS-9000) Operator API event stream can miss task status updates

2018-06-15 Thread Benno Evers (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513740#comment-16513740
 ] 

Benno Evers commented on MESOS-9000:


[~zhitao], since you added the .status field to the `TaskUpdated` message, can 
you remember if this was the intended behaviour, or is this an oversight?






[jira] [Commented] (MESOS-9000) Operator API event stream can miss task status updates

2018-06-15 Thread Benno Evers (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513680#comment-16513680
 ] 

Benno Evers commented on MESOS-9000:


Review: https://reviews.apache.org/r/67575/

> Operator API event stream can miss task status updates
> --
>
> Key: MESOS-9000
> URL: https://issues.apache.org/jira/browse/MESOS-9000
> Project: Mesos
> Issue Type: Bug
> Reporter: Benno Evers
> Priority: Major
>
> As of now, the master only sends TaskUpdated messages
> to subscribers when the latest known task state on the agent changed.
> {noformat}
> // src/messages/messages.proto
> message StatusUpdate {
>   [...]
>   // This corresponds to the latest state of the task according to the
>   // agent. Note that this state might be different than the state in
>   // 'status' because task status update manager queues updates. In
>   // other words, 'status' corresponds to the update at top of the
>   // queue and 'latest_state' corresponds to the update at bottom of
>   // the queue.
>   optional TaskState latest_state = 7;
> }
> {noformat}
> This implied that schedulers could not reliably wait for the status
> information corresponding to specific state updates (i.e. TASK_RUNNING),
> since there is no guarantee that subscribers get notified during
> the time when this status update will be included in the status field.


