Joseph Wu created MESOS-5748:
--------------------------------

             Summary: Potential segfault in `link` and `send` when linking to a remote process
                 Key: MESOS-5748
                 URL: https://issues.apache.org/jira/browse/MESOS-5748
             Project: Mesos
          Issue Type: Bug
          Components: libprocess
    Affects Versions: 0.28.0, 0.27.0, 0.26.0, 0.25.0, 0.24.0, 0.23.0, 0.22.0
            Reporter: Joseph Wu
             Fix For: 1.0.0


There is a race in the SocketManager between a remote {{link}} and the 
disconnection of the underlying socket.

We potentially segfault here: 
https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1512

{{\*socket}} dereferences the shared pointer underpinning the {{Socket*}} 
object.  However, the code above this line actually has ownership of the 
pointer:
https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1494-L1499

If the socket dies during the link, {{ignore_recv_data}} may delete the 
Socket out from under {{link}}:
https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1399-L1411
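
To illustrate the lifetime problem, here is a minimal sketch of one way to make this interleaving harmless. It is not the libprocess code and not necessarily the fix that was applied: {{FakeSocket}}, {{Manager}} and {{replay_interleaving}} are hypothetical names invented for this example. It assumes the manager's map stores a {{std::shared_ptr}} and that the linking path keeps its own copy, so a disconnection that erases the map entry cannot free the object while the link is still using it.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>

// Hypothetical stand-in for libprocess's Socket; not the real class.
struct FakeSocket {
  explicit FakeSocket(int fd) : fd(fd) {}
  int fd;
};

// Hypothetical stand-in for SocketManager.
class Manager {
public:
  // Registers a socket and returns a handle that the caller co-owns.
  std::shared_ptr<FakeSocket> open(int fd) {
    std::lock_guard<std::mutex> lock(mutex);
    auto socket = std::make_shared<FakeSocket>(fd);
    sockets[fd] = socket;
    return socket;
  }

  // Models the disconnection path: drops only the manager's reference.
  void close(int fd) {
    std::lock_guard<std::mutex> lock(mutex);
    sockets.erase(fd);
  }

  // Reports whether the manager still tracks this descriptor.
  bool contains(int fd) {
    std::lock_guard<std::mutex> lock(mutex);
    return sockets.count(fd) > 0;
  }

private:
  std::mutex mutex;
  std::map<int, std::shared_ptr<FakeSocket>> sockets;
};

// Replays the problematic ordering on a single thread:
// the "link" creates the socket, the disconnection handler removes it,
// and only then does the "link" touch the socket again.
int replay_interleaving() {
  Manager manager;
  std::shared_ptr<FakeSocket> socket = manager.open(7);
  manager.close(7);
  assert(!manager.contains(7));
  return socket->fd;  // Still valid: the local copy keeps the object alive.
}
```

With a raw pointer stored in the map and deleted by the disconnection path, the final dereference in {{replay_interleaving}} would be a use-after-free, which is the shape of the segfault described above.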

----
The same race exists for {{send}}.

This race was discovered while running a new test in repetition:
https://reviews.apache.org/r/49175/

On OSX, I hit the race consistently every 500-800 repetitions:
{code}
3rdparty/libprocess/libprocess-tests \
  --gtest_filter="ProcessRemoteLinkTest.RemoteLink" \
  --gtest_break_on_failure \
  --gtest_repeat=1000
{code}


