-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40660/
-----------------------------------------------------------
Review request for mesos and Vinod Kone.
Bugs: MESOS-3851
https://issues.apache.org/jira/browse/MESOS-3851
Repository: mesos
Description
-------
Previously, we did not `link` against the executor `PID` while
(re-)registering. As a result, libprocess might create an ephemeral socket
every time `send(...)` was invoked, which led to races where messages
could arrive at the executor out of order. This change does a `link(...)` on
the executor PID to ensure ordered message delivery.
---Not to be included in commit message---
I am still not comfortable bringing back the reverted commit
https://reviews.apache.org/r/40107/ . I can see one more race condition even
with a `link(...)`: messages can still arrive out of order if the first
socket fails after the first message has been sent but while that message is
still in flight. Sending the second message then creates a new socket, and
the second message might arrive earlier than the first, leading to a race.
However, this is behavior that is heavily relied upon elsewhere in our
code base. Happy to be proven wrong, though, and to be convinced that we can
bring back the reverted commit after this change.
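To make the ordering argument concrete, here is a rough toy model of it. It
is only a sketch and does not use libprocess: `InFlight`, `deliver`,
`socketOf` and `delayOf` are hypothetical names, and the fixed per-socket
delay is an assumption standing in for real network timing. Messages on the
same socket stay in order, while messages on different sockets are ordered
only by when they happen to arrive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only; none of these names exist in libprocess or Mesos.
struct InFlight {
  std::string body;
  int arrival;  // Simulated time at which the receiver reads the message.
};

// Sends bodies[i] at time i over socket socketOf[i]. Socket s adds an
// assumed fixed delay of delayOf[s]. Returns the bodies in the order in
// which the receiver would see them.
std::vector<std::string> deliver(
    const std::vector<std::string>& bodies,
    const std::vector<std::size_t>& socketOf,
    const std::vector<int>& delayOf)
{
  assert(bodies.size() == socketOf.size());

  std::vector<InFlight> inFlight;
  for (std::size_t i = 0; i < bodies.size(); ++i) {
    // at() throws if a message names a socket with no configured delay.
    int arrival = static_cast<int>(i) + delayOf.at(socketOf[i]);
    inFlight.push_back({bodies[i], arrival});
  }

  // Order by arrival time. The stable sort keeps send order on ties,
  // which is a simplification; a real network makes no such promise.
  std::stable_sort(
      inFlight.begin(),
      inFlight.end(),
      [](const InFlight& a, const InFlight& b) {
        return a.arrival < b.arrival;
      });

  std::vector<std::string> received;
  for (const InFlight& message : inFlight) {
    received.push_back(message.body);
  }
  return received;
}
```

In this model, `deliver({"first", "second"}, {0, 0}, {5, 1})` puts both
messages on one socket (the linked case) and keeps them in send order.
`deliver({"first", "second"}, {0, 1}, {5, 1})` puts the second message on a
new, faster socket, so it overtakes the first. That second call stands for
both situations described above: an ephemeral socket per `send(...)`, and a
linked socket that fails while the first message is still in flight.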
Diffs
-----
src/slave/slave.cpp 9055f2a789cb19f3579c15a379ea505dfef0578c
Diff: https://reviews.apache.org/r/40660/diff/
Testing
-------
make check
Thanks,
Anand Mazumdar