[kudu-CR] WIP: tls socket: properly handle temporary socket errors in Writev
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/8570 ) Change subject: WIP: tls_socket: properly handle temporary socket errors in Writev .. Patch Set 1: Code-Review+2 sgtm. As a side note, what do you think of adding PARTIAL_WRITE mode for TlsContext? So that Socket::Write() and TlsSocket::Write() would behave similar. If we are eager to fix the flakes, I think it's possible to add the test in a separate changelist. -- To view, visit http://gerrit.cloudera.org:8080/8570 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472 Gerrit-Change-Number: 8570 Gerrit-PatchSet: 1 Gerrit-Owner: Todd LipconGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Thu, 16 Nov 2017 21:35:49 + Gerrit-HasComments: No
[kudu-CR] WIP: tls socket: properly handle temporary socket errors in Writev
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/8570 ) Change subject: WIP: tls_socket: properly handle temporary socket errors in Writev .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8570/1/src/kudu/security/tls_socket.cc File src/kudu/security/tls_socket.cc: http://gerrit.cloudera.org:8080/#/c/8570/1/src/kudu/security/tls_socket.cc@99 PS1, Line 99: if (total_written > 0 && Socket::IsTemporarySocketError(write_status.posix_code())) { : return Status::OK(); : } > How do you know it will call it with the same length of 10? I think the code as it is today will always call with the same length. The edge case is if a partial write happens. But if Writev() does a partial write of an iovec/slice, it returns back to the caller to adjust the offset within that slice: https://github.com/apache/kudu/blob/f9a6a1b2d15feb8e2c044f20248774105a3a62bf/src/kudu/security/tls_socket.cc#L87 So there wouldn't be a case where writev() calls SSL_write() with a different length after a failure. But I agree that's not in the contract and it's best not to depend on implementation details that could change at any point, which may invalidate this fix. -- To view, visit http://gerrit.cloudera.org:8080/8570 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472 Gerrit-Change-Number: 8570 Gerrit-PatchSet: 1 Gerrit-Owner: Todd LipconGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Thu, 16 Nov 2017 19:49:33 + Gerrit-HasComments: Yes
[kudu-CR] WIP: tls socket: properly handle temporary socket errors in Writev
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/8570 ) Change subject: WIP: tls_socket: properly handle temporary socket errors in Writev .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8570/1/src/kudu/security/tls_socket.cc File src/kudu/security/tls_socket.cc: http://gerrit.cloudera.org:8080/#/c/8570/1/src/kudu/security/tls_socket.cc@99 PS1, Line 99: if (total_written > 0 && Socket::IsTemporarySocketError(write_status.posix_code())) { : return Status::OK(); : } > Ah, it seems I misread the docs regarding SSL_write() and SSL_MODE_ACCEPT_M Correction: actually, there is a restriction on the length for the follow-up write: it should not be less than the length specified when SSL_write() returned SSL_ERROR_WANT_WRITE, as seen from ssl/s3_pkt.c: if ((s->s3->wpend_tot > (int)len) || ((s->s3->wpend_buf != buf) && !(s->mode & SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER)) || (s->s3->wpend_type != type)) { SSLerr(SSL_F_SSL3_WRITE_PENDING, SSL_R_BAD_WRITE_RETRY); return (-1); } And specifying SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER would not help here. -- To view, visit http://gerrit.cloudera.org:8080/8570 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472 Gerrit-Change-Number: 8570 Gerrit-PatchSet: 1 Gerrit-Owner: Todd LipconGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Thu, 16 Nov 2017 19:48:08 + Gerrit-HasComments: Yes
[kudu-CR] WIP: tls socket: properly handle temporary socket errors in Writev
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/8570 ) Change subject: WIP: tls_socket: properly handle temporary socket errors in Writev .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8570/1/src/kudu/security/tls_socket.cc File src/kudu/security/tls_socket.cc: http://gerrit.cloudera.org:8080/#/c/8570/1/src/kudu/security/tls_socket.cc@99 PS1, Line 99: if (total_written > 0 && Socket::IsTemporarySocketError(write_status.posix_code())) { : return Status::OK(); : } > In other words, I think it's brittle to assume the caller will call it with Ah, it seems I misread the docs regarding SSL_write() and SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER. Only the buffer needs to be the same, right. No need to require the same length. So, no need for SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER mode -- the fix should be enough. -- To view, visit http://gerrit.cloudera.org:8080/8570 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472 Gerrit-Change-Number: 8570 Gerrit-PatchSet: 1 Gerrit-Owner: Todd LipconGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Thu, 16 Nov 2017 19:40:03 + Gerrit-HasComments: Yes
[kudu-CR] WIP: tls socket: properly handle temporary socket errors in Writev
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/8570 ) Change subject: WIP: tls_socket: properly handle temporary socket errors in Writev .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8570/1/src/kudu/security/tls_socket.cc File src/kudu/security/tls_socket.cc: http://gerrit.cloudera.org:8080/#/c/8570/1/src/kudu/security/tls_socket.cc@99 PS1, Line 99: if (total_written > 0 && Socket::IsTemporarySocketError(write_status.posix_code())) { : return Status::OK(); : } > How do you know it will call it with the same length of 10? In other words, I think it's brittle to assume the caller will call it with the same length. The offset in the buffer will be the same, yes, but it's brittle to assume the caller will use the same length for the next call, since that's not in the contract of that OutboundTransfer::SendBuffer API, as I understand. Maybe, we should also enable SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER for the TlsContext. Also, I found that enabling SSL_MODE_ENABLE_PARTIAL_WRITE for TlsContext (without any other changes) fixes the issue with that error in LargeWrites test. But this fix is also needed to have consistent behavior of TlsSocket::Writev(). Overall, I think enabling SSL_MODE_ENABLE_PARTIAL_WRITE might be beneficial to have as well: so far callers of TlsSocket::Write() handle partial writes correctly, and there will be less difference in behavior of TlsSocket::Write() and Socket::Write() in that case. -- To view, visit http://gerrit.cloudera.org:8080/8570 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472 Gerrit-Change-Number: 8570 Gerrit-PatchSet: 1 Gerrit-Owner: Todd LipconGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Thu, 16 Nov 2017 19:34:36 + Gerrit-HasComments: Yes
[kudu-CR] WIP: tls socket: properly handle temporary socket errors in Writev
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/8570 ) Change subject: WIP: tls_socket: properly handle temporary socket errors in Writev .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8570/1/src/kudu/security/tls_socket.cc File src/kudu/security/tls_socket.cc: http://gerrit.cloudera.org:8080/#/c/8570/1/src/kudu/security/tls_socket.cc@99 PS1, Line 99: if (total_written > 0 && Socket::IsTemporarySocketError(write_status.posix_code())) { : return Status::OK(); : } > It would get the same parameters, wouldn't it? How do you know it will call it with the same length of 10? -- To view, visit http://gerrit.cloudera.org:8080/8570 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472 Gerrit-Change-Number: 8570 Gerrit-PatchSet: 1 Gerrit-Owner: Todd LipconGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Thu, 16 Nov 2017 19:22:13 + Gerrit-HasComments: Yes
[kudu-CR] WIP: tls socket: properly handle temporary socket errors in Writev
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/8570 ) Change subject: WIP: tls_socket: properly handle temporary socket errors in Writev .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8570/1/src/kudu/security/tls_socket.cc File src/kudu/security/tls_socket.cc: http://gerrit.cloudera.org:8080/#/c/8570/1/src/kudu/security/tls_socket.cc@99 PS1, Line 99: if (total_written > 0 && Socket::IsTemporarySocketError(write_status.posix_code())) { : return Status::OK(); : } > Does this really help? It would get the same parameters, wouldn't it? If there is a vector with 2 frames of 10 bytes each, Writev() would call Write() twice with a different buffer: Write(buf, 10, ...); // succeeds Write(buf+10, 10, ...); // fails If the second call fails with 'SSL_ERROR_WANT_WRITE', we go back up to the caller with Status::OK() and 10 bytes written. The caller would retry Writev() with 'buf+10' as the new offset, so SSL_write() would see the same offset as it last tried with. -- To view, visit http://gerrit.cloudera.org:8080/8570 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472 Gerrit-Change-Number: 8570 Gerrit-PatchSet: 1 Gerrit-Owner: Todd LipconGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Thu, 16 Nov 2017 19:17:37 + Gerrit-HasComments: Yes
[kudu-CR] WIP: tls socket: properly handle temporary socket errors in Writev
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/8570 ) Change subject: WIP: tls_socket: properly handle temporary socket errors in Writev .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8570/1/src/kudu/security/tls_socket.cc File src/kudu/security/tls_socket.cc: http://gerrit.cloudera.org:8080/#/c/8570/1/src/kudu/security/tls_socket.cc@99 PS1, Line 99: if (total_written > 0 && Socket::IsTemporarySocketError(write_status.posix_code())) { : return Status::OK(); : } Does this really help? I'm not sure that would help in some cases. My concern is if it was a partial write indeed, the upper-level caller (i.e. OutboundTransfer::SendBuffer()) will adjust the cur_offset_in_slice_, so the next time it will call TlsSocket::Writev() with different buffer and offset, that would result calling SSL_write() with different parameters. However, SSL_write() expects the follow-up call after SSL_ERROR_WANT_WRITE to be exactly with the same set parameters that was used when SSL_write() returned SSL_ERROR_WANT_WRITE. -- To view, visit http://gerrit.cloudera.org:8080/8570 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472 Gerrit-Change-Number: 8570 Gerrit-PatchSet: 1 Gerrit-Owner: Todd LipconGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Thu, 16 Nov 2017 19:12:40 + Gerrit-HasComments: Yes
[kudu-CR] WIP: tls socket: properly handle temporary socket errors in Writev
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/8570 ) Change subject: WIP: tls_socket: properly handle temporary socket errors in Writev .. Patch Set 1: Code-Review+1 The fix makes sense to me. -- To view, visit http://gerrit.cloudera.org:8080/8570 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472 Gerrit-Change-Number: 8570 Gerrit-PatchSet: 1 Gerrit-Owner: Todd LipconGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Thu, 16 Nov 2017 07:27:48 + Gerrit-HasComments: No
[kudu-CR] WIP: tls socket: properly handle temporary socket errors in Writev
Hello Alexey Serbin, Sailesh Mukil, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/8570 to review the following change. Change subject: WIP: tls_socket: properly handle temporary socket errors in Writev .. WIP: tls_socket: properly handle temporary socket errors in Writev This fixes a bug which caused RaftConsensusITest.TestLargeBatches to fail when run under stress, as in the following command line: taskset -c 0-4 \ build/latest/bin/raft_consensus-itest \ --gtest_filter=\*LargeBat\* \ --stress-cpu-threads=8 This would produce an error like: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:878 This means that we were retrying a write after getting EAGAIN, but with a different buffer than the first time. I tracked this down to mishandling of temporary socket errors in TlsSocket::Writev(). In the case that we successfully write part of the io vector but hit such an error trying to write a later element in the vector, we were still propagating the error back up to the caller. The caller didn't realize that part of the write was successful, and thus it would retry the write from the beginning. WIP because this should have better TLS-socket-specific test coverage. Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472 --- M src/kudu/security/tls_socket.cc 1 file changed, 8 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/8570/1 -- To view, visit http://gerrit.cloudera.org:8080/8570 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472 Gerrit-Change-Number: 8570 Gerrit-PatchSet: 1 Gerrit-Owner: Todd LipconGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Sailesh Mukil