[ https://issues.apache.org/jira/browse/MESOS-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414303#comment-15414303 ]
Greg Mann commented on MESOS-5988: ---------------------------------- Review here: https://reviews.apache.org/r/50936/ > PollSocketImpl can write to a stale fd. > --------------------------------------- > > Key: MESOS-5988 > URL: https://issues.apache.org/jira/browse/MESOS-5988 > Project: Mesos > Issue Type: Bug > Components: libprocess > Reporter: Benjamin Mahler > Assignee: Greg Mann > Priority: Blocker > Labels: mesosphere > Fix For: 1.0.1 > > > When tracking down MESOS-5986 with [~greggomann] and [~anandmazumdar]. We > were curious why PollSocketImpl avoids the same issue. It seems that > PollSocketImpl has a similar race, however in the case of PollSocketImpl we > will simply write to a stale file descriptor. > One example is {{PollSocketImpl::send(const char*, size_t)}}: > https://github.com/apache/mesos/blob/1.0.0/3rdparty/libprocess/src/poll_socket.cpp#L241-L245 > {code} > Future<size_t> PollSocketImpl::send(const char* data, size_t size) > { > return io::poll(get(), io::WRITE) > .then(lambda::bind(&internal::socket_send_data, get(), data, size)); > } > Future<size_t> socket_send_data(int s, const char* data, size_t size) > { > CHECK(size > 0); > while (true) { > ssize_t length = send(s, data, size, MSG_NOSIGNAL); > #ifdef __WINDOWS__ > int error = WSAGetLastError(); > #else > int error = errno; > #endif // __WINDOWS__ > if (length < 0 && net::is_restartable_error(error)) { > // Interrupted, try again now. > continue; > } else if (length < 0 && net::is_retryable_error(error)) { > // Might block, try again later. > return io::poll(s, io::WRITE) > .then(lambda::bind(&internal::socket_send_data, s, data, size)); > } else if (length <= 0) { > // Socket error or closed. > if (length < 0) { > const string error = os::strerror(errno); > VLOG(1) << "Socket error while sending: " << error; > } else { > VLOG(1) << "Socket closed while sending"; > } > if (length == 0) { > return length; > } else { > return Failure(ErrnoError("Socket send failed")); > } > } else { > CHECK(length > 0); > return length; > } > } > } > {code} > If the last reference to the {{Socket}} goes away before the > {{socket_send_data}} loop completes, then we will write to a stale fd! > It turns out that we have avoided this issue because in libprocess we happen > to keep a reference to the {{Socket}} around when sending: > https://github.com/apache/mesos/blob/1.0.0/3rdparty/libprocess/src/process.cpp#L1678-L1707 > {code} > void send(Encoder* encoder, Socket socket) > { > switch (encoder->kind()) { > case Encoder::DATA: { > size_t size; > const char* data = static_cast<DataEncoder*>(encoder)->next(&size); > socket.send(data, size) > .onAny(lambda::bind( > &internal::_send, > lambda::_1, > socket, > encoder, > size)); > break; > } > case Encoder::FILE: { > off_t offset; > size_t size; > int fd = static_cast<FileEncoder*>(encoder)->next(&offset, &size); > socket.sendfile(fd, offset, size) > .onAny(lambda::bind( > &internal::_send, > lambda::_1, > socket, > encoder, > size)); > break; > } > } > } > {code} > However, this may not be true in all call-sites going forward. Currently, it > appears that http::Connection can trigger this bug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)