[ 
https://issues.apache.org/jira/browse/MESOS-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414303#comment-15414303
 ] 

Greg Mann commented on MESOS-5988:
----------------------------------

Review here: https://reviews.apache.org/r/50936/

> PollSocketImpl can write to a stale fd.
> ---------------------------------------
>
>                 Key: MESOS-5988
>                 URL: https://issues.apache.org/jira/browse/MESOS-5988
>             Project: Mesos
>          Issue Type: Bug
>          Components: libprocess
>            Reporter: Benjamin Mahler
>            Assignee: Greg Mann
>            Priority: Blocker
>              Labels: mesosphere
>             Fix For: 1.0.1
>
>
> When tracking down MESOS-5986 with [~greggomann] and [~anandmazumdar]. We 
> were curious why PollSocketImpl avoids the same issue. It seems that 
> PollSocketImpl has a similar race, however in the case of PollSocketImpl we 
> will simply write to a stale file descriptor.
> One example is {{PollSocketImpl::send(const char*, size_t)}}:
> https://github.com/apache/mesos/blob/1.0.0/3rdparty/libprocess/src/poll_socket.cpp#L241-L245
> {code}
> Future<size_t> PollSocketImpl::send(const char* data, size_t size)
> {
>   return io::poll(get(), io::WRITE)
>     .then(lambda::bind(&internal::socket_send_data, get(), data, size));
> }
> Future<size_t> socket_send_data(int s, const char* data, size_t size)
> {
>   CHECK(size > 0);
>   while (true) {
>     ssize_t length = send(s, data, size, MSG_NOSIGNAL);
> #ifdef __WINDOWS__
>     int error = WSAGetLastError();
> #else
>     int error = errno;
> #endif // __WINDOWS__
>     if (length < 0 && net::is_restartable_error(error)) {
>       // Interrupted, try again now.
>       continue;
>     } else if (length < 0 && net::is_retryable_error(error)) {
>       // Might block, try again later.
>       return io::poll(s, io::WRITE)
>         .then(lambda::bind(&internal::socket_send_data, s, data, size));
>     } else if (length <= 0) {
>       // Socket error or closed.
>       if (length < 0) {
>         const string error = os::strerror(errno);
>         VLOG(1) << "Socket error while sending: " << error;
>       } else {
>         VLOG(1) << "Socket closed while sending";
>       }
>       if (length == 0) {
>         return length;
>       } else {
>         return Failure(ErrnoError("Socket send failed"));
>       }
>     } else {
>       CHECK(length > 0);
>       return length;
>     }
>   }
> }
> {code}
> If the last reference to the {{Socket}} goes away before the 
> {{socket_send_data}} loop completes, then we will write to a stale fd!
> It turns out that we have avoided this issue because in libprocess we happen 
> to keep a reference to the {{Socket}} around when sending:
> https://github.com/apache/mesos/blob/1.0.0/3rdparty/libprocess/src/process.cpp#L1678-L1707
> {code}
> void send(Encoder* encoder, Socket socket)
> {
>   switch (encoder->kind()) {
>     case Encoder::DATA: {
>       size_t size;
>       const char* data = static_cast<DataEncoder*>(encoder)->next(&size);
>       socket.send(data, size)
>         .onAny(lambda::bind(
>             &internal::_send,
>             lambda::_1,
>             socket,
>             encoder,
>             size));
>       break;
>     }
>     case Encoder::FILE: {
>       off_t offset;
>       size_t size;
>       int fd = static_cast<FileEncoder*>(encoder)->next(&offset, &size);
>       socket.sendfile(fd, offset, size)
>         .onAny(lambda::bind(
>             &internal::_send,
>             lambda::_1,
>             socket,
>             encoder,
>             size));
>       break;
>     }
>   }
> }
> {code}
> However, this may not be true in all call-sites going forward. Currently, it 
> appears that http::Connection can trigger this bug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to