[kudu-CR] KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL write()
Todd Lipcon has removed a vote on this change. Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL_write() .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/9587 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683 Gerrit-Change-Number: 9587 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL write()
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9587 ) Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL_write() .. KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL_write() Previously, OutboundTransfer::TransferStarted() returns true iff non-zero bytes have been successfully sent via Writev(). As it turns out, this doesn't work well with SSL_write(). When SSL_write() returns -1 with errno EAGAIN or ETRYAGAIN, we need to retry the call with exactly the same buffer pointer next time even if 0 bytes have been written. The following sequence becomes problematic with the previous implementation of OutboundTransfer::TransferStarted(): - WriteHandler() calls SendBuffer() on an OutboundTransfer. - SendBuffer() calls TlsSocket::Writev() which hits the EAGAIN error above. Since 0 bytes were written, cur_slice_idx_ and cur_offset_in_slice_ remain 0 and OutboundTransfer::TransferStarted() still returns false. - OutboundTransfer is cancelled or timed out. car->call is set to NULL. - WirteHandler() is called again and as it notices that the OutboundTransfer hasn't really started yet and "car->call" is NULL due to cancellation, it removes it from the outbound transfer queue and moves on to the next entry in the queue. - WriteHandler() calls SendBuffer() with the next entry in the queue and eventually calls SSL_write() with a different buffer than expected by SSL_write(), leading to "SSL3_WRITE_PENDING:bad write retry" error. This change fixes the problem above by adding a boolean flag 'started_' which is set to true if OutboundTransfer::SendBuffer() has been called at least once. Also added some tests to exercise cancellation paths with multiple concurrent RPCs. Confirmed the problem above is fixed by running stress test in a 130 node cluster with Impala. The problem happened consistently without the fix. Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683 Reviewed-on: http://gerrit.cloudera.org:8080/9587 Reviewed-by: Sailesh Mukil Reviewed-by: Todd Lipcon Tested-by: Todd Lipcon --- M src/kudu/rpc/rpc-test-base.h M src/kudu/rpc/rpc-test.cc M src/kudu/rpc/transfer.cc M src/kudu/rpc/transfer.h 4 files changed, 84 insertions(+), 1 deletion(-) Approvals: Sailesh Mukil: Looks good to me, but someone else must approve Todd Lipcon: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/9587 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683 Gerrit-Change-Number: 9587 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL write()
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9587 ) Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL_write() .. Patch Set 2: Verified+1 Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/9587 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683 Gerrit-Change-Number: 9587 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 13 Mar 2018 15:10:59 + Gerrit-HasComments: No
[kudu-CR] KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL write()
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/9587 ) Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL_write() .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/9587/2/src/kudu/rpc/rpc-test.cc File src/kudu/rpc/rpc-test.cc: http://gerrit.cloudera.org:8080/#/c/9587/2/src/kudu/rpc/rpc-test.cc@1225 PS2, Line 1225: // This serves as a helper function for TestCancellationMultiThreads() to exercise : // cancellation when there are concurrent RPCs. > nit: Better to not explain what a user of this function would do, since it' Doesn't seem too bad to as this provides context on how this function fits in with the rest of the test. -- To view, visit http://gerrit.cloudera.org:8080/9587 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683 Gerrit-Change-Number: 9587 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 13 Mar 2018 06:43:50 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL write()
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/9587 ) Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL_write() .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/9587 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683 Gerrit-Change-Number: 9587 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 13 Mar 2018 04:34:13 + Gerrit-HasComments: No
[kudu-CR] KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL write()
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/9587 ) Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL_write() .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/9587/2/src/kudu/rpc/rpc-test.cc File src/kudu/rpc/rpc-test.cc: http://gerrit.cloudera.org:8080/#/c/9587/2/src/kudu/rpc/rpc-test.cc@1225 PS2, Line 1225: // This serves as a helper function for TestCancellationMultiThreads() to exercise : // cancellation when there are concurrent RPCs. nit: Better to not explain what a user of this function would do, since it's already explained in TestCancellationMultiThreads(). -- To view, visit http://gerrit.cloudera.org:8080/9587 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683 Gerrit-Change-Number: 9587 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 13 Mar 2018 04:33:57 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL write()
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/9587 ) Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL_write() .. Patch Set 2: Hit with flaky tests again in TSAN build: [ FAILED ] AllTypesItest/3.TestAllKeyTypes, where TypeParam = kudu::client::IntKeysTestSetup > [ FAILED ] AllTypesItest/7.TestAllKeyTypes, where TypeParam = kudu::client::IntKeysTestSetup > Both failed with the following backtraces: F0313 01:15:33.904728 14346 master_main.cc:74] Check failed: _s.ok() Bad status: Service unavailable: Cannot initialize clock: Error reading clock. Clock considered unsynchronized *** Check failure stack trace: *** @ 0x7f9641ff03b9 google::LogMessage::Flush() at ??:0 @ 0x7f9641ff446e google::LogMessageFatal::~LogMessageFatal() at ??:0 @ 0x4b5391 kudu::master::MasterMain() at /home/jenkins-slave/workspace/kudu-master/3/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/tsan/../sanitizer_common/sanitizer_common_interceptors.inc:6199 @ 0x4b4ddf main at /home/jenkins-slave/workspace/kudu-master/3/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/tsan/../sanitizer_common/sanitizer_common_interceptors.inc:6189 @ 0x7f963f379f45 __libc_start_main at ??:0 @ 0x41e092 (unknown) at ??:0 /home/jenkins-slave/workspace/kudu-master/3/src/kudu/integration-tests/all_types-itest.cc:518: Failure -- To view, visit http://gerrit.cloudera.org:8080/9587 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683 Gerrit-Change-Number: 9587 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 13 Mar 2018 01:47:34 + Gerrit-HasComments: No
[kudu-CR] KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL write()
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/9587 ) Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL_write() .. Patch Set 1: (11 comments) http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test-base.h File src/kudu/rpc/rpc-test-base.h: http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test-base.h@441 PS1, Line 441: FLAGS_rpc_encrypt_loopback_connections = true; > ah, does this mean we were not successfully testing encrypted RPCs in the p I believe so. I confirmed that the TlsSocket path is now being exercised by manually adding a CHECK() in that TlsSocket::Write(). http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc File src/kudu/rpc/rpc-test.cc: http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc@740 PS1, Line 740:// Linux, SSL enabled > hmm, is this fix related to the patch in question? I'm surprised we haven't Yes, as we weren't using TlsSocket before so we weren't triggering this error message. http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc@1223 PS1, Line 1223: // Helper function for TestCancellationMultiThreads. > better to document briefly what it does, rather than what it is. Done http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc@1224 PS1, Line 1224: SendRpcAndCancel > this should probably be named SendAndCancelRpcs or something, since it send Done http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc@1251 PS1, Line 1251: ASSERT_TRUE > have to use CHECK here since this isn't the main thread and gtest isn't thr Done http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc@1256 PS1, Line 1256: #define BUF_SIZE (16 * 1024 * 1024) > can you use a const int defined below within the function instead? We pref Done http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc@1258 PS1, Line 1258: Regression > did this test successfully repro the bug w/o the fix? Unfortunately, no even with multiple tries in different build types. That said, I think this is still a good test to include. Any suggestion to make this test more effective in triggering the bug is welcomed. I confirmed the fix with Impala stress test running on a 130 node cluster which hit this bug quite consistently without the fix. Updated the commit message with this piece of information. http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc@1274 PS1, Line 1274: unique_ptr buf(new uint8_t[BUF_SIZE]); : memset(buf.get(), 'a', BUF_SIZE); : Slice slice(buf.get(), BUF_SIZE); > you could probably get away with just: Done http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/transfer.h File src/kudu/rpc/transfer.h: http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/transfer.h@186 PS1, Line 186: True if we have tried sending anything in 'payload_slices_' > Actually isn't it more like "True if SendBuffer() is called at least once." Done http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/transfer.h@187 PS1, Line 187: // no bytes were sent successfully. This is needed as SSL_write() is stateful. > nit: Add a reference to this JIRA, else it's hard to understand why we adde Done http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/transfer.cc File src/kudu/rpc/transfer.cc: http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/transfer.cc@190 PS1, Line 190: started_ = true; > Do we know of any cases where setting started_ = true this early can backfi As far as I can tell, this should be fine. Both SendBuffer() and OutboundTransfer::TransferStarted() are called from Connection::WriteHandler() only. So, if we get to the point of SendBuffer(), we should have taken the path of if (!transfer->TransferStarted()) once. -- To view, visit http://gerrit.cloudera.org:8080/9587 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683 Gerrit-Change-Number: 9587 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 13 Mar 2018 00:54:24 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL write()
Hello Kudu Jenkins, Sailesh Mukil, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9587 to look at the new patch set (#2). Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL_write() .. KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL_write() Previously, OutboundTransfer::TransferStarted() returns true iff non-zero bytes have been successfully sent via Writev(). As it turns out, this doesn't work well with SSL_write(). When SSL_write() returns -1 with errno EAGAIN or ETRYAGAIN, we need to retry the call with exactly the same buffer pointer next time even if 0 bytes have been written. The following sequence becomes problematic with the previous implementation of OutboundTransfer::TransferStarted(): - WriteHandler() calls SendBuffer() on an OutboundTransfer. - SendBuffer() calls TlsSocket::Writev() which hits the EAGAIN error above. Since 0 bytes were written, cur_slice_idx_ and cur_offset_in_slice_ remain 0 and OutboundTransfer::TransferStarted() still returns false. - OutboundTransfer is cancelled or timed out. car->call is set to NULL. - WirteHandler() is called again and as it notices that the OutboundTransfer hasn't really started yet and "car->call" is NULL due to cancellation, it removes it from the outbound transfer queue and moves on to the next entry in the queue. - WriteHandler() calls SendBuffer() with the next entry in the queue and eventually calls SSL_write() with a different buffer than expected by SSL_write(), leading to "SSL3_WRITE_PENDING:bad write retry" error. This change fixes the problem above by adding a boolean flag 'started_' which is set to true if OutboundTransfer::SendBuffer() has been called at least once. Also added some tests to exercise cancellation paths with multiple concurrent RPCs. Confirmed the problem above is fixed by running stress test in a 130 node cluster with Impala. The problem happened consistently without the fix. Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683 --- M src/kudu/rpc/rpc-test-base.h M src/kudu/rpc/rpc-test.cc M src/kudu/rpc/transfer.cc M src/kudu/rpc/transfer.h 4 files changed, 84 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/87/9587/2 -- To view, visit http://gerrit.cloudera.org:8080/9587 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683 Gerrit-Change-Number: 9587 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL write()
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/9587 ) Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL_write() .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/transfer.h File src/kudu/rpc/transfer.h: http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/transfer.h@186 PS1, Line 186: True if we have tried sending anything in 'payload_slices_' Actually isn't it more like "True if SendBuffer() is called at least once." ? The Writev() function can return without attempting to send anything in some cases. http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/transfer.h@187 PS1, Line 187: // no bytes were sent successfully. This is needed as SSL_write() is stateful. nit: Add a reference to this JIRA, else it's hard to understand why we added it. http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/transfer.cc File src/kudu/rpc/transfer.cc: http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/transfer.cc@190 PS1, Line 190: started_ = true; Do we know of any cases where setting started_ = true this early can backfire? I looked at the code and it seems fine to me, but I just want to make sure. In some cases, we won't send anything at all: https://github.com/apache/kudu/blob/master/src/kudu/security/tls_socket.cc#L50-L55 Or we could hit errors before actually calling the SSL_write() or send() functions. So we should make sure that having started_ = true for these cases doesn't cause a problem elsewhere in the code. -- To view, visit http://gerrit.cloudera.org:8080/9587 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683 Gerrit-Change-Number: 9587 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 12 Mar 2018 23:36:34 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL write()
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9587 ) Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL_write() .. Patch Set 1: (8 comments) > Patch Set 1: > > the test failure seems to be infrastructure issue ? > > tablet-decoder-eval-test.0dist-test-slave-dist-test-slave-p2pc-2 > Nonefailed to download task files: WARNING 100 > isolateserver(1484): Adding unknown file 9a0053f2a34 yea that's odd, agree it's unrelated. http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test-base.h File src/kudu/rpc/rpc-test-base.h: http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test-base.h@441 PS1, Line 441: FLAGS_rpc_encrypt_loopback_connections = true; ah, does this mean we were not successfully testing encrypted RPCs in the past with this test? :( http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc File src/kudu/rpc/rpc-test.cc: http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc@740 PS1, Line 740:// Linux, SSL enabled hmm, is this fix related to the patch in question? I'm surprised we haven't seen this failure on our flaky test dashboard http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc@1223 PS1, Line 1223: // Helper function for TestCancellationMultiThreads. better to document briefly what it does, rather than what it is. http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc@1224 PS1, Line 1224: SendRpcAndCancel this should probably be named SendAndCancelRpcs or something, since it sends and cancels more than one of them http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc@1251 PS1, Line 1251: ASSERT_TRUE have to use CHECK here since this isn't the main thread and gtest isn't thread-safe http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc@1256 PS1, Line 1256: #define BUF_SIZE (16 * 1024 * 1024) can you use a const int defined below within the function instead? We prefer 'const int kBufferSize' vs a #define http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc@1258 PS1, Line 1258: Regression did this test successfully repro the bug w/o the fix? http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc@1274 PS1, Line 1274: unique_ptr buf(new uint8_t[BUF_SIZE]); : memset(buf.get(), 'a', BUF_SIZE); : Slice slice(buf.get(), BUF_SIZE); you could probably get away with just: string buf('a', kBufferSize); and then down below pass Slice(buf) -- To view, visit http://gerrit.cloudera.org:8080/9587 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683 Gerrit-Change-Number: 9587 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 12 Mar 2018 22:59:10 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL write()
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/9587 ) Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL_write() .. Patch Set 1: the test failure seems to be infrastructure issue ? tablet-decoder-eval-test.0 dist-test-slave-dist-test-slave-p2pc-2 Nonefailed to download task files: WARNING 100 isolateserver(1484): Adding unknown file 9a0053f2a34 -- To view, visit http://gerrit.cloudera.org:8080/9587 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683 Gerrit-Change-Number: 9587 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 12 Mar 2018 21:23:21 + Gerrit-HasComments: No
[kudu-CR] KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL write()
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9587 Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL_write() .. KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL_write() Previously, OutboundTransfer::TransferStarted() returns true iff non-zero bytes have been successfully sent via Writev(). As it turns out, this doesn't work well with SSL_write(). When SSL_write() returns -1 with errno EAGAIN or ETRYAGAIN, we need to retry the call with exactly the same buffer pointer next time even if 0 bytes have been written. The following sequence becomes problematic with the previous implementation of OutboundTransfer::TransferStarted(): - WriteHandler() calls SendBuffer() on an OutboundTransfer. - SendBuffer() calls TlsSocket::Writev() which hits the EAGAIN error above. Since 0 bytes were written, cur_slice_idx_ and cur_offset_in_slice_ remain 0 and OutboundTransfer::TransferStarted() still returns false. - OutboundTransfer is cancelled or timed out. car->call is set to NULL. - WirteHandler() is called again and as it notices that the OutboundTransfer hasn't really started yet and "car->call" is NULL due to cancellation, it removes it from the outbound transfer queue and moves on to the next entry in the queue. - WriteHandler() calls SendBuffer() with the next entry in the queue and eventually calls SSL_write() with a different buffer than expected by SSL_write(), leading to "SSL3_WRITE_PENDING:bad write retry" error. This change fixes the problem above by adding a boolean flag 'started_' which is set to true if OutboundTransfer::SendBuffer() has been called at least once. Also added some tests to exercise cancellation paths with multiple concurrent RPCs. Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683 --- M src/kudu/rpc/rpc-test-base.h M src/kudu/rpc/rpc-test.cc M src/kudu/rpc/transfer.cc M src/kudu/rpc/transfer.h 4 files changed, 83 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/87/9587/1 -- To view, visit http://gerrit.cloudera.org:8080/9587 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683 Gerrit-Change-Number: 9587 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho