[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery

2016-05-05 Thread Stephan Erb (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15272922#comment-15272922
 ] 

Stephan Erb commented on MESOS-3870:


Could MESOS-5332 be due to the out-of-order delivery mentioned here?

> Prevent out-of-order libprocess message delivery
> 
>
> Key: MESOS-3870
> URL: https://issues.apache.org/jira/browse/MESOS-3870
> Project: Mesos
> Issue Type: Epic
> Components: libprocess
> Reporter: Neil Conway
> Assignee: Neil Conway
> Priority: Minor
> Labels: mesosphere
>
> I was under the impression that {{send()}} provided in-order, unreliable 
> message delivery. So if P1 sends <M1, M2> to P2, P2 might see <>, <M1>, <M2>, 
> or <M1, M2>, but not <M2, M1>.
> I suspect much of the code makes a similar assumption. However, it appears 
> that this behavior is not guaranteed. slave.cpp:2217 has the following 
> comment:
> {noformat}
>   // TODO(jieyu): Here we assume that CheckpointResourcesMessages are
>   // ordered (i.e., slave receives them in the same order master sends
>   // them). This should be true in most of the cases because TCP
>   // enforces in order delivery per connection. However, the ordering
>   // is technically not guaranteed because master creates multiple
>   // connections to the slave in some cases (e.g., persistent socket
>   // to slave breaks and master uses ephemeral socket). This could
>   // potentially be solved by using a version number and rejecting
>   // stale messages according to the version number.
> {noformat}
> We can improve this situation by _either_: (1) fixing libprocess to guarantee 
> ordered message delivery, e.g., by adding a sequence number, or (2) 
> clarifying that ordered message delivery is not guaranteed, and ideally 
> providing a tool to force messages to be delivered out-of-order.
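The version-number idea from the TODO quoted above could look roughly like the following. This is only a sketch under stated assumptions: `VersionedMessage` and `StaleMessageFilter` are hypothetical names, not libprocess or Mesos types, and the sender is assumed to stamp each message with a strictly increasing version. It gives in-order but unreliable delivery: a message that arrives late, for example over a second connection, is dropped rather than applied out of order.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical message envelope. `version` is assumed to be assigned by
// the sender and to increase with every message sent to a given receiver.
struct VersionedMessage {
  uint64_t version;
  std::string payload;
};

// Hypothetical receiver-side filter. It remembers the highest version
// accepted so far and rejects anything that is not strictly newer.
class StaleMessageFilter {
public:
  // Returns true if the message should be applied, false if it is stale.
  bool accept(const VersionedMessage& message) {
    if (seen_ && message.version <= latest_) {
      return false;  // Older than (or a duplicate of) something already applied.
    }
    seen_ = true;
    latest_ = message.version;
    return true;
  }

private:
  bool seen_ = false;    // Whether any message has been accepted yet.
  uint64_t latest_ = 0;  // Highest version accepted so far.
};
```

With this filter, if <M1, M2> is sent but M2 arrives first, the receiver applies only M2. It never applies M1 after M2.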



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery

2016-01-16 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15103504#comment-15103504
 ] 

Neil Conway commented on MESOS-3870:


See also {{Slave::checkpointResources}}, which has a long comment (and some 
workaround code) to deal with the fact that ordered message delivery is not 
currently guaranteed.



[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery

2015-11-10 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998971#comment-14998971
 ] 

haosdent commented on MESOS-3870:
-

... It looks like this is not possible. ProcessBase.state only becomes BLOCKED 
after event A has been consumed.



[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery

2015-11-10 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998884#comment-14998884
 ] 

haosdent commented on MESOS-3870:
-

I think there is another case that could make the same Process run in different 
threads. Suppose ProcessBase pops event A in thread 1 and changes 
ProcessBase.state to BLOCKED in 
https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/process.cpp#L2463
 , and event B has not been consumed yet. Then event B arrives and the same 
Process is enqueued onto ProcessManager.runq in 
https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/process.cpp#L3008
 . ProcessManager then dequeues it in thread 2, pops event B, and runs it while 
thread 1 has not yet run the consume function for event A. Is this possible?



[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery

2015-11-10 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998837#comment-14998837
 ] 

haosdent commented on MESOS-3870:
-

Oh, got it. Thank you for the explanation.



[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery

2015-11-10 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998806#comment-14998806
 ] 

Neil Conway commented on MESOS-3870:


You mean "volatile"? The variable is read and written inside a "synchronized" 
block, which will do the necessary synchronization (memory barriers) to ensure 
that other CPUs see the appropriate values (provided they also use synchronized 
blocks when examining the variable).

There are a few places that read "ProcessBase.state" without holding the mutex 
(e.g., ProcessManager::resume()) -- that is probably unsafe and should be fixed.

(Note that "volatile" is not sufficient/appropriate for ensuring reasonable 
semantics for concurrent access to shared state without mutual exclusion, 
anyway...)
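To illustrate the point about mutual exclusion, here is a minimal sketch in standard C++. It assumes `std::lock_guard` on a `std::mutex` as a stand-in for the libprocess "synchronized" block. `GuardedState` and its methods are hypothetical names, not the actual ProcessBase code. The state field is a plain member with no "volatile": readers see the latest value because every access goes through the same mutex.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the states a process can be in.
enum class State { BLOCKED, READY, RUNNING };

class GuardedState {
public:
  // Every read takes the mutex, so it observes the most recent write
  // that was made while holding the same mutex.
  State get() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return state_;
  }

  // Moves from `expected` to `desired` as one step under the mutex.
  // Returns false and leaves the state untouched if it was not `expected`.
  bool transition(State expected, State desired) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (state_ != expected) {
      return false;
    }
    state_ = desired;
    return true;
  }

private:
  mutable std::mutex mutex_;
  State state_ = State::BLOCKED;  // Plain field: the mutex provides visibility.
};
```

A read that skips the mutex, like the unguarded reads mentioned above, falls outside this guarantee.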





[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery

2015-11-10 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998757#comment-14998757
 ] 

haosdent commented on MESOS-3870:
-

Yes, but ProcessBase.state doesn't have violate. I am not sure whether a thread 
could still read a stale value after the state is changed in another thread.



[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery

2015-11-10 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998737#comment-14998737
 ] 

Neil Conway commented on MESOS-3870:


I don't see how: the routine acquires ProcessBase.mutex before examining 
ProcessBase.state.



[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery

2015-11-10 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998730#comment-14998730
 ] 

haosdent commented on MESOS-3870:
-

Yes, but I think it could still be enqueued twice, because ProcessBase.state 
may be held in different CPU core caches.



[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery

2015-11-10 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998720#comment-14998720
 ] 

Neil Conway commented on MESOS-3870:


A process can't be enqueued onto the runq twice. This is prevented because a 
process is only added to the runq when it receives an event and the process is 
in state "BLOCKED"; once a process is on the runq, its state is changed to 
"READY", so it won't be readded again in the future. 
(https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/process.cpp#L2998-L3017)
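The guard described above can be sketched as follows. This is a simplified model, not the libprocess source: `Process`, `RunQueue`, and `deliver` are hypothetical names, and `std::lock_guard` stands in for the "synchronized" block. The check of the state and the change to "READY" happen under the same per-process mutex, so only one delivery can ever observe "BLOCKED".

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>

enum class State { BLOCKED, READY };

// Hypothetical, heavily simplified process: a state plus an inbound queue.
struct Process {
  std::mutex mutex;
  State state = State::BLOCKED;
  std::deque<std::string> events;
};

class RunQueue {
public:
  // Delivers an event to the process. Returns true only if this call
  // placed the process on the run queue.
  bool deliver(Process* process, const std::string& event) {
    assert(process != nullptr);
    bool enqueue = false;
    {
      std::lock_guard<std::mutex> lock(process->mutex);
      process->events.push_back(event);
      if (process->state == State::BLOCKED) {
        // Check and update happen under one lock, so a second delivery
        // sees READY and does not enqueue the process again.
        process->state = State::READY;
        enqueue = true;
      }
    }
    if (enqueue) {
      std::lock_guard<std::mutex> lock(mutex_);
      runq_.push_back(process);
    }
    return enqueue;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return runq_.size();
  }

private:
  mutable std::mutex mutex_;
  std::deque<Process*> runq_;
};
```

Delivering two events to a blocked process leaves both events in its queue but only one entry for the process on the run queue.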



[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery

2015-11-10 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998700#comment-14998700
 ] 

haosdent commented on MESOS-3870:
-

Suppose a Process is enqueued onto the runq twice when it receives two events 
(or is that impossible? I could not find any code that prevents it from being 
enqueued multiple times).

It is then dequeued in different worker threads before it starts running. In 
worker thread 1, the Process dequeues event A but is not yet running. In worker 
thread 2, the Process dequeues event B and starts running.

Is this scenario possible?



[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery

2015-11-10 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998694#comment-14998694
 ] 

Neil Conway commented on MESOS-3870:


"runq" contains processes that have work to do (inbound events to consume) but 
aren't currently running. Once a process starts running, it is removed from the 
runq, and it processes its input queue in order. i.e., a given process will be 
running on at most one thread at any given time.
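A single-threaded model of that worker step, as a rough sketch only: `Process`, `consumed`, and `runOne` are hypothetical names, and the locking that the real code needs is omitted to keep the ordering argument visible. The process leaves the runq before it runs, and its events are drained from the front of its own queue.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical, simplified process.
struct Process {
  std::deque<std::string> events;     // Inbound events, oldest first.
  std::vector<std::string> consumed;  // Record of handled events, for illustration.
};

// Models one worker step: take the process at the front of the runq and
// drain its events in FIFO order. Returns false if the runq was empty.
bool runOne(std::deque<Process*>& runq) {
  if (runq.empty()) {
    return false;
  }
  Process* process = runq.front();
  assert(process != nullptr);
  // The process is removed from the runq before it runs, so no other
  // worker can pick up the same process while it is running.
  runq.pop_front();
  while (!process->events.empty()) {
    process->consumed.push_back(process->events.front());
    process->events.pop_front();
  }
  return true;
}
```

In this model, events A and B queued in that order are always consumed as A then B, whichever worker happens to pick up the process.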



[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery

2015-11-10 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998686#comment-14998686
 ] 

haosdent commented on MESOS-3870:
-

I think ProcessManager could dequeue the same Process in different worker threads?
{noformat}
ProcessBase* ProcessManager::dequeue()
{
  // TODO(benh): Remove a process from this thread's runq. If there
  // are no processes to run, and this is not a dedicated thread, then
  // steal one from another threads runq.

  ProcessBase* process = NULL;

  synchronized (runq_mutex) {
    if (!runq.empty()) {
      process = runq.front();
      runq.pop_front();
      // Increment the running count of processes in order to support
      // the Clock::settle() operation (this must be done atomically
      // with removing the process from the runq).
      running.fetch_add(1);
    }
  }

  return process;
}
{noformat}



[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery

2015-11-10 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998677#comment-14998677
 ] 

Neil Conway commented on MESOS-3870:


This should be accounted for by the fact that each process has a queue of input 
events that are consumed in-order (see the "events" deque in ProcessBase). 
i.e., although we can have many worker threads, a given process is only running 
in at most one thread at a time and each process' input events are consumed in 
the order in which they were delivered.



[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery

2015-11-10 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998605#comment-14998605
 ] 

haosdent commented on MESOS-3870:
-

I think that although send could be ordered when there is only one connection, 
execution could still be out-of-order in the receiver. ProcessManager creates 
worker threads (from 8 up to the number of CPUs) to handle input messages. 
Because ProcessManager dispatches work by event, the same Process could be 
called in different threads for different events.
