[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery
[ https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15272922#comment-15272922 ] Stephan Erb commented on MESOS-3870: Could MESOS-5332 be due to the out-of-order deliver mentioned here? > Prevent out-of-order libprocess message delivery > > > Key: MESOS-3870 > URL: https://issues.apache.org/jira/browse/MESOS-3870 > Project: Mesos > Issue Type: Epic > Components: libprocess >Reporter: Neil Conway >Assignee: Neil Conway >Priority: Minor > Labels: mesosphere > > I was under the impression that {{send()}} provided in-order, unreliable > message delivery. So if P1 sends to P2, P2 might see <>, , , > or — but not . > I suspect much of the code makes a similar assumption. However, it appears > that this behavior is not guaranteed. slave.cpp:2217 has the following > comment: > {noformat} > // TODO(jieyu): Here we assume that CheckpointResourcesMessages are > // ordered (i.e., slave receives them in the same order master sends > // them). This should be true in most of the cases because TCP > // enforces in order delivery per connection. However, the ordering > // is technically not guaranteed because master creates multiple > // connections to the slave in some cases (e.g., persistent socket > // to slave breaks and master uses ephemeral socket). This could > // potentially be solved by using a version number and rejecting > // stale messages according to the version number. > {noformat} > We can improve this situation by _either_: (1) fixing libprocess to guarantee > ordered message delivery, e.g., by adding a sequence number, or (2) > clarifying that ordered message delivery is not guaranteed, and ideally > providing a tool to force messages to be delivered out-of-order. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery
[ https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15103504#comment-15103504 ] Neil Conway commented on MESOS-3870: See also {{Slave::checkpointResources}}, which has a long comment (and some workaround code) to deal with the fact that ordered message delivery is not currently guaranteed. > Prevent out-of-order libprocess message delivery > > > Key: MESOS-3870 > URL: https://issues.apache.org/jira/browse/MESOS-3870 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Neil Conway >Priority: Minor > Labels: mesosphere > > I was under the impression that {{send()}} provided in-order, unreliable > message delivery. So if P1 sends to P2, P2 might see <>, , , > or — but not . > I suspect much of the code makes a similar assumption. However, it appears > that this behavior is not guaranteed. slave.cpp:2217 has the following > comment: > {noformat} > // TODO(jieyu): Here we assume that CheckpointResourcesMessages are > // ordered (i.e., slave receives them in the same order master sends > // them). This should be true in most of the cases because TCP > // enforces in order delivery per connection. However, the ordering > // is technically not guaranteed because master creates multiple > // connections to the slave in some cases (e.g., persistent socket > // to slave breaks and master uses ephemeral socket). This could > // potentially be solved by using a version number and rejecting > // stale messages according to the version number. > {noformat} > We can improve this situation by _either_: (1) fixing libprocess to guarantee > ordered message delivery, e.g., by adding a sequence number, or (2) > clarifying that ordered message delivery is not guaranteed, and ideally > providing a tool to force messages to be delivered out-of-order. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery
[ https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998971#comment-14998971 ] haosdent commented on MESOS-3870: - ... Looks not possible. Only after consume event A, ProcessBase.state become BLOCKED. > Prevent out-of-order libprocess message delivery > > > Key: MESOS-3870 > URL: https://issues.apache.org/jira/browse/MESOS-3870 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Neil Conway >Priority: Minor > Labels: mesosphere > > I was under the impression that {{send()}} provided in-order, unreliable > message delivery. So if P1 sends to P2, P2 might see <>, , , > or — but not . > I suspect much of the code makes a similar assumption. However, it appears > that this behavior is not guaranteed. slave.cpp:2217 has the following > comment: > {noformat} > // TODO(jieyu): Here we assume that CheckpointResourcesMessages are > // ordered (i.e., slave receives them in the same order master sends > // them). This should be true in most of the cases because TCP > // enforces in order delivery per connection. However, the ordering > // is technically not guaranteed because master creates multiple > // connections to the slave in some cases (e.g., persistent socket > // to slave breaks and master uses ephemeral socket). This could > // potentially be solved by using a version number and rejecting > // stale messages according to the version number. > {noformat} > We can improve this situation by _either_: (1) fixing libprocess to guarantee > ordered message delivery, e.g., by adding a sequence number, or (2) > clarifying that ordered message delivery is not guaranteed, and ideally > providing a tool to force messages to be delivered out-of-order. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery
[ https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998884#comment-14998884 ] haosdent commented on MESOS-3870: - I think have another case make same Process run in different thread. Suppose ProcessBase pop event A in thread 1 and change ProcessBase.state to BLOCKED in https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/process.cpp#L2463 . And not yet consume event B. Then event B reach and enqueue same Process to ProcessManager.runq in https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/process.cpp#L3008 . Then ProcessManager dequeue it in thread 2 and pop event B and run event B while thread 1 not yet running event A consume function. Is this possible? > Prevent out-of-order libprocess message delivery > > > Key: MESOS-3870 > URL: https://issues.apache.org/jira/browse/MESOS-3870 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Neil Conway >Priority: Minor > Labels: mesosphere > > I was under the impression that {{send()}} provided in-order, unreliable > message delivery. So if P1 sends to P2, P2 might see <>, , , > or — but not . > I suspect much of the code makes a similar assumption. However, it appears > that this behavior is not guaranteed. slave.cpp:2217 has the following > comment: > {noformat} > // TODO(jieyu): Here we assume that CheckpointResourcesMessages are > // ordered (i.e., slave receives them in the same order master sends > // them). This should be true in most of the cases because TCP > // enforces in order delivery per connection. However, the ordering > // is technically not guaranteed because master creates multiple > // connections to the slave in some cases (e.g., persistent socket > // to slave breaks and master uses ephemeral socket). This could > // potentially be solved by using a version number and rejecting > // stale messages according to the version number. > {noformat} > We can improve this situation by _either_: (1) fixing libprocess to guarantee > ordered message delivery, e.g., by adding a sequence number, or (2) > clarifying that ordered message delivery is not guaranteed, and ideally > providing a tool to force messages to be delivered out-of-order. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery
[ https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998837#comment-14998837 ] haosdent commented on MESOS-3870: - Ohoh, got it. Thank you for explanation. > Prevent out-of-order libprocess message delivery > > > Key: MESOS-3870 > URL: https://issues.apache.org/jira/browse/MESOS-3870 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Neil Conway >Priority: Minor > Labels: mesosphere > > I was under the impression that {{send()}} provided in-order, unreliable > message delivery. So if P1 sends to P2, P2 might see <>, , , > or — but not . > I suspect much of the code makes a similar assumption. However, it appears > that this behavior is not guaranteed. slave.cpp:2217 has the following > comment: > {noformat} > // TODO(jieyu): Here we assume that CheckpointResourcesMessages are > // ordered (i.e., slave receives them in the same order master sends > // them). This should be true in most of the cases because TCP > // enforces in order delivery per connection. However, the ordering > // is technically not guaranteed because master creates multiple > // connections to the slave in some cases (e.g., persistent socket > // to slave breaks and master uses ephemeral socket). This could > // potentially be solved by using a version number and rejecting > // stale messages according to the version number. > {noformat} > We can improve this situation by _either_: (1) fixing libprocess to guarantee > ordered message delivery, e.g., by adding a sequence number, or (2) > clarifying that ordered message delivery is not guaranteed, and ideally > providing a tool to force messages to be delivered out-of-order. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery
[ https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998806#comment-14998806 ] Neil Conway commented on MESOS-3870: You mean "volatile"? The variable is read and written inside a "synchronized" block, which will do the necessary synchronization (memory barriers) to ensure that other CPUs see the appropriate values (provided they also use synchronized blocks when examining the variable). There are a few places that read "ProcessBase.state" without holding the mutex (e.g., ProcessManager::resume()) -- that is probably unsafe and should be fixed. (Note that "volatile" is not sufficient/appropriate for ensuring reasonable semantics for concurrent access to shared state without mutual exclusion, anyway...) > Prevent out-of-order libprocess message delivery > > > Key: MESOS-3870 > URL: https://issues.apache.org/jira/browse/MESOS-3870 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Neil Conway >Priority: Minor > Labels: mesosphere > > I was under the impression that {{send()}} provided in-order, unreliable > message delivery. So if P1 sends to P2, P2 might see <>, , , > or — but not . > I suspect much of the code makes a similar assumption. However, it appears > that this behavior is not guaranteed. slave.cpp:2217 has the following > comment: > {noformat} > // TODO(jieyu): Here we assume that CheckpointResourcesMessages are > // ordered (i.e., slave receives them in the same order master sends > // them). This should be true in most of the cases because TCP > // enforces in order delivery per connection. However, the ordering > // is technically not guaranteed because master creates multiple > // connections to the slave in some cases (e.g., persistent socket > // to slave breaks and master uses ephemeral socket). This could > // potentially be solved by using a version number and rejecting > // stale messages according to the version number. > {noformat} > We can improve this situation by _either_: (1) fixing libprocess to guarantee > ordered message delivery, e.g., by adding a sequence number, or (2) > clarifying that ordered message delivery is not guaranteed, and ideally > providing a tool to force messages to be delivered out-of-order. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery
[ https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998805#comment-14998805 ] Neil Conway commented on MESOS-3870: You mean "volatile"? The variable is read and written inside a "synchronized" block, which will do the necessary synchronization (memory barriers) to ensure that other CPUs see the appropriate values (provided they also use synchronized blocks when examining the variable). There are a few places that read "ProcessBase.state" without holding the mutex (e.g., ProcessManager::resume()) -- that is probably unsafe and should be fixed. (Note that "volatile" is not sufficient/appropriate for ensuring reasonable semantics for concurrent access to shared state without mutual exclusion, anyway...) > Prevent out-of-order libprocess message delivery > > > Key: MESOS-3870 > URL: https://issues.apache.org/jira/browse/MESOS-3870 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Neil Conway >Priority: Minor > Labels: mesosphere > > I was under the impression that {{send()}} provided in-order, unreliable > message delivery. So if P1 sends to P2, P2 might see <>, , , > or — but not . > I suspect much of the code makes a similar assumption. However, it appears > that this behavior is not guaranteed. slave.cpp:2217 has the following > comment: > {noformat} > // TODO(jieyu): Here we assume that CheckpointResourcesMessages are > // ordered (i.e., slave receives them in the same order master sends > // them). This should be true in most of the cases because TCP > // enforces in order delivery per connection. However, the ordering > // is technically not guaranteed because master creates multiple > // connections to the slave in some cases (e.g., persistent socket > // to slave breaks and master uses ephemeral socket). This could > // potentially be solved by using a version number and rejecting > // stale messages according to the version number. > {noformat} > We can improve this situation by _either_: (1) fixing libprocess to guarantee > ordered message delivery, e.g., by adding a sequence number, or (2) > clarifying that ordered message delivery is not guaranteed, and ideally > providing a tool to force messages to be delivered out-of-order. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery
[ https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998757#comment-14998757 ] haosdent commented on MESOS-3870: - Yes, but ProcessBase.state don't have violate. I am not sure if it will still get dirty value while it change in other thread. > Prevent out-of-order libprocess message delivery > > > Key: MESOS-3870 > URL: https://issues.apache.org/jira/browse/MESOS-3870 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Neil Conway >Priority: Minor > Labels: mesosphere > > I was under the impression that {{send()}} provided in-order, unreliable > message delivery. So if P1 sends to P2, P2 might see <>, , , > or — but not . > I suspect much of the code makes a similar assumption. However, it appears > that this behavior is not guaranteed. slave.cpp:2217 has the following > comment: > {noformat} > // TODO(jieyu): Here we assume that CheckpointResourcesMessages are > // ordered (i.e., slave receives them in the same order master sends > // them). This should be true in most of the cases because TCP > // enforces in order delivery per connection. However, the ordering > // is technically not guaranteed because master creates multiple > // connections to the slave in some cases (e.g., persistent socket > // to slave breaks and master uses ephemeral socket). This could > // potentially be solved by using a version number and rejecting > // stale messages according to the version number. > {noformat} > We can improve this situation by _either_: (1) fixing libprocess to guarantee > ordered message delivery, e.g., by adding a sequence number, or (2) > clarifying that ordered message delivery is not guaranteed, and ideally > providing a tool to force messages to be delivered out-of-order. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery
[ https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998737#comment-14998737 ] Neil Conway commented on MESOS-3870: I don't see how: the routine acquires ProcessBase.mutex before examining ProcessBase.state. > Prevent out-of-order libprocess message delivery > > > Key: MESOS-3870 > URL: https://issues.apache.org/jira/browse/MESOS-3870 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Neil Conway >Priority: Minor > Labels: mesosphere > > I was under the impression that {{send()}} provided in-order, unreliable > message delivery. So if P1 sends to P2, P2 might see <>, , , > or — but not . > I suspect much of the code makes a similar assumption. However, it appears > that this behavior is not guaranteed. slave.cpp:2217 has the following > comment: > {noformat} > // TODO(jieyu): Here we assume that CheckpointResourcesMessages are > // ordered (i.e., slave receives them in the same order master sends > // them). This should be true in most of the cases because TCP > // enforces in order delivery per connection. However, the ordering > // is technically not guaranteed because master creates multiple > // connections to the slave in some cases (e.g., persistent socket > // to slave breaks and master uses ephemeral socket). This could > // potentially be solved by using a version number and rejecting > // stale messages according to the version number. > {noformat} > We can improve this situation by _either_: (1) fixing libprocess to guarantee > ordered message delivery, e.g., by adding a sequence number, or (2) > clarifying that ordered message delivery is not guaranteed, and ideally > providing a tool to force messages to be delivered out-of-order. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery
[ https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998730#comment-14998730 ] haosdent commented on MESOS-3870: - Yes, but I think it still could be enqueue twice? Because ProcessBase.state in different CPU core caches. > Prevent out-of-order libprocess message delivery > > > Key: MESOS-3870 > URL: https://issues.apache.org/jira/browse/MESOS-3870 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Neil Conway >Priority: Minor > Labels: mesosphere > > I was under the impression that {{send()}} provided in-order, unreliable > message delivery. So if P1 sends to P2, P2 might see <>, , , > or — but not . > I suspect much of the code makes a similar assumption. However, it appears > that this behavior is not guaranteed. slave.cpp:2217 has the following > comment: > {noformat} > // TODO(jieyu): Here we assume that CheckpointResourcesMessages are > // ordered (i.e., slave receives them in the same order master sends > // them). This should be true in most of the cases because TCP > // enforces in order delivery per connection. However, the ordering > // is technically not guaranteed because master creates multiple > // connections to the slave in some cases (e.g., persistent socket > // to slave breaks and master uses ephemeral socket). This could > // potentially be solved by using a version number and rejecting > // stale messages according to the version number. > {noformat} > We can improve this situation by _either_: (1) fixing libprocess to guarantee > ordered message delivery, e.g., by adding a sequence number, or (2) > clarifying that ordered message delivery is not guaranteed, and ideally > providing a tool to force messages to be delivered out-of-order. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery
[ https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998720#comment-14998720 ] Neil Conway commented on MESOS-3870: A process can't be enqueued onto the runq twice. This is prevented because a process is only added to the runq when it receives an event and the process is in state "BLOCKED"; once a process is on the runq, its state is changed to "READY", so it won't be readded again in the future. (https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/process.cpp#L2998-L3017) > Prevent out-of-order libprocess message delivery > > > Key: MESOS-3870 > URL: https://issues.apache.org/jira/browse/MESOS-3870 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Neil Conway >Priority: Minor > Labels: mesosphere > > I was under the impression that {{send()}} provided in-order, unreliable > message delivery. So if P1 sends to P2, P2 might see <>, , , > or — but not . > I suspect much of the code makes a similar assumption. However, it appears > that this behavior is not guaranteed. slave.cpp:2217 has the following > comment: > {noformat} > // TODO(jieyu): Here we assume that CheckpointResourcesMessages are > // ordered (i.e., slave receives them in the same order master sends > // them). This should be true in most of the cases because TCP > // enforces in order delivery per connection. However, the ordering > // is technically not guaranteed because master creates multiple > // connections to the slave in some cases (e.g., persistent socket > // to slave breaks and master uses ephemeral socket). This could > // potentially be solved by using a version number and rejecting > // stale messages according to the version number. > {noformat} > We can improve this situation by _either_: (1) fixing libprocess to guarantee > ordered message delivery, e.g., by adding a sequence number, or (2) > clarifying that ordered message delivery is not guaranteed, and ideally > providing a tool to force messages to be delivered out-of-order. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery
[ https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998700#comment-14998700 ] haosdent commented on MESOS-3870: - Suppose a Process enqueue to runq twice(Or impossible, seems I could not find any code avoid it enqueue multi times) when it receive two events. And the dequeue in different work threads, and not yet running. In work thread 1, Process dequeue event A and not yet running. In work thread 2, Process dequeue event B and start running. Is this scenario possible? > Prevent out-of-order libprocess message delivery > > > Key: MESOS-3870 > URL: https://issues.apache.org/jira/browse/MESOS-3870 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Neil Conway >Priority: Minor > Labels: mesosphere > > I was under the impression that {{send()}} provided in-order, unreliable > message delivery. So if P1 sends to P2, P2 might see <>, , , > or — but not . > I suspect much of the code makes a similar assumption. However, it appears > that this behavior is not guaranteed. slave.cpp:2217 has the following > comment: > {noformat} > // TODO(jieyu): Here we assume that CheckpointResourcesMessages are > // ordered (i.e., slave receives them in the same order master sends > // them). This should be true in most of the cases because TCP > // enforces in order delivery per connection. However, the ordering > // is technically not guaranteed because master creates multiple > // connections to the slave in some cases (e.g., persistent socket > // to slave breaks and master uses ephemeral socket). This could > // potentially be solved by using a version number and rejecting > // stale messages according to the version number. > {noformat} > We can improve this situation by _either_: (1) fixing libprocess to guarantee > ordered message delivery, e.g., by adding a sequence number, or (2) > clarifying that ordered message delivery is not guaranteed, and ideally > providing a tool to force messages to be delivered out-of-order. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery
[ https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998694#comment-14998694 ] Neil Conway commented on MESOS-3870: "runq" contains processes that have work to do (inbound events to consume) but aren't currently running. Once a process starts running, it is removed from the runq, and it processes its input queue in order. i.e., a given process will be running on at most one thread at any given time. > Prevent out-of-order libprocess message delivery > > > Key: MESOS-3870 > URL: https://issues.apache.org/jira/browse/MESOS-3870 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Neil Conway >Priority: Minor > Labels: mesosphere > > I was under the impression that {{send()}} provided in-order, unreliable > message delivery. So if P1 sends to P2, P2 might see <>, , , > or — but not . > I suspect much of the code makes a similar assumption. However, it appears > that this behavior is not guaranteed. slave.cpp:2217 has the following > comment: > {noformat} > // TODO(jieyu): Here we assume that CheckpointResourcesMessages are > // ordered (i.e., slave receives them in the same order master sends > // them). This should be true in most of the cases because TCP > // enforces in order delivery per connection. However, the ordering > // is technically not guaranteed because master creates multiple > // connections to the slave in some cases (e.g., persistent socket > // to slave breaks and master uses ephemeral socket). This could > // potentially be solved by using a version number and rejecting > // stale messages according to the version number. > {noformat} > We can improve this situation by _either_: (1) fixing libprocess to guarantee > ordered message delivery, e.g., by adding a sequence number, or (2) > clarifying that ordered message delivery is not guaranteed, and ideally > providing a tool to force messages to be delivered out-of-order. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery
[ https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998686#comment-14998686 ] haosdent commented on MESOS-3870: - I think ProcessManager could dequeue same Process in different work thread? {noformat} ProcessBase* ProcessManager::dequeue() { // TODO(benh): Remove a process from this thread's runq. If there // are no processes to run, and this is not a dedicated thread, then // steal one from another threads runq. ProcessBase* process = NULL; synchronized (runq_mutex) { if (!runq.empty()) { process = runq.front(); runq.pop_front(); // Increment the running count of processes in order to support // the Clock::settle() operation (this must be done atomically // with removing the process from the runq). running.fetch_add(1); } } return process; } {noformat} > Prevent out-of-order libprocess message delivery > > > Key: MESOS-3870 > URL: https://issues.apache.org/jira/browse/MESOS-3870 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Neil Conway >Priority: Minor > Labels: mesosphere > > I was under the impression that {{send()}} provided in-order, unreliable > message delivery. So if P1 sends to P2, P2 might see <>, , , > or — but not . > I suspect much of the code makes a similar assumption. However, it appears > that this behavior is not guaranteed. slave.cpp:2217 has the following > comment: > {noformat} > // TODO(jieyu): Here we assume that CheckpointResourcesMessages are > // ordered (i.e., slave receives them in the same order master sends > // them). This should be true in most of the cases because TCP > // enforces in order delivery per connection. However, the ordering > // is technically not guaranteed because master creates multiple > // connections to the slave in some cases (e.g., persistent socket > // to slave breaks and master uses ephemeral socket). This could > // potentially be solved by using a version number and rejecting > // stale messages according to the version number. > {noformat} > We can improve this situation by _either_: (1) fixing libprocess to guarantee > ordered message delivery, e.g., by adding a sequence number, or (2) > clarifying that ordered message delivery is not guaranteed, and ideally > providing a tool to force messages to be delivered out-of-order. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery
[ https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998677#comment-14998677 ] Neil Conway commented on MESOS-3870: This should be accounted for by the fact that each process has a queue of input events that are consumed in-order (see the "events" deque in ProcessBase). i.e., although we can have many worker threads, a given process is only running in at most one thread at a time and each process' input events are consumed in the order in which they were delivered. > Prevent out-of-order libprocess message delivery > > > Key: MESOS-3870 > URL: https://issues.apache.org/jira/browse/MESOS-3870 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Neil Conway >Priority: Minor > Labels: mesosphere > > I was under the impression that {{send()}} provided in-order, unreliable > message delivery. So if P1 sends to P2, P2 might see <>, , , > or — but not . > I suspect much of the code makes a similar assumption. However, it appears > that this behavior is not guaranteed. slave.cpp:2217 has the following > comment: > {noformat} > // TODO(jieyu): Here we assume that CheckpointResourcesMessages are > // ordered (i.e., slave receives them in the same order master sends > // them). This should be true in most of the cases because TCP > // enforces in order delivery per connection. However, the ordering > // is technically not guaranteed because master creates multiple > // connections to the slave in some cases (e.g., persistent socket > // to slave breaks and master uses ephemeral socket). This could > // potentially be solved by using a version number and rejecting > // stale messages according to the version number. > {noformat} > We can improve this situation by _either_: (1) fixing libprocess to guarantee > ordered message delivery, e.g., by adding a sequence number, or (2) > clarifying that ordered message delivery is not guaranteed, and ideally > providing a tool to force messages to be delivered out-of-order. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery
[ https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998605#comment-14998605 ] haosdent commented on MESOS-3870: - I think although send could be ordered when only have a connection, but execute also could be out-of-order in receiver. ProcessManager would create 8~cpu number work threads to handle input messages. Because ProcessManager dispatch work by event, same Process would be called in different thread for different event. > Prevent out-of-order libprocess message delivery > > > Key: MESOS-3870 > URL: https://issues.apache.org/jira/browse/MESOS-3870 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Neil Conway >Priority: Minor > Labels: mesosphere > > I was under the impression that {{send()}} provided in-order, unreliable > message delivery. So if P1 sends to P2, P2 might see <>, , , > or — but not . > I suspect much of the code makes a similar assumption. However, it appears > that this behavior is not guaranteed. slave.cpp:2217 has the following > comment: > {noformat} > // TODO(jieyu): Here we assume that CheckpointResourcesMessages are > // ordered (i.e., slave receives them in the same order master sends > // them). This should be true in most of the cases because TCP > // enforces in order delivery per connection. However, the ordering > // is technically not guaranteed because master creates multiple > // connections to the slave in some cases (e.g., persistent socket > // to slave breaks and master uses ephemeral socket). This could > // potentially be solved by using a version number and rejecting > // stale messages according to the version number. > {noformat} > We can improve this situation by _either_: (1) fixing libprocess to guarantee > ordered message delivery, e.g., by adding a sequence number, or (2) > clarifying that ordered message delivery is not guaranteed, and ideally > providing a tool to force messages to be delivered out-of-order. -- This message was sent by Atlassian JIRA (v6.3.4#6332)