Todd Lipcon has submitted this change and it was merged. (
Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with
KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL_write()
Previously, OutboundTransfer::TransferStarted() returns true iff
non-zero bytes have been successfully sent via Writev(). As it turns
out, this doesn't work well with SSL_write(). When SSL_write() returns -1
with errno EAGAIN or ETRYAGAIN, we need to retry the call with exactly
the same buffer pointer next time even if 0 bytes have been written.
The following sequence becomes problematic with the previous implementation
- WriteHandler() calls SendBuffer() on an OutboundTransfer.
- SendBuffer() calls TlsSocket::Writev() which hits the EAGAIN error above.
Since 0 bytes were written, cur_slice_idx_ and cur_offset_in_slice_ remain 0
and OutboundTransfer::TransferStarted() still returns false.
- OutboundTransfer is cancelled or timed out. car->call is set to NULL.
- WirteHandler() is called again and as it notices that the OutboundTransfer
hasn't really started yet and "car->call" is NULL due to cancellation, it
removes it from the outbound transfer queue and moves on to the next entry
in the queue.
- WriteHandler() calls SendBuffer() with the next entry in the queue and
eventually calls SSL_write() with a different buffer than expected by
SSL_write(), leading to "SSL3_WRITE_PENDING:bad write retry" error.
This change fixes the problem above by adding a boolean flag 'started_'
which is set to true if OutboundTransfer::SendBuffer() has been called
at least once. Also added some tests to exercise cancellation paths with
multiple concurrent RPCs.
Confirmed the problem above is fixed by running stress test in a 130 node
cluster with Impala. The problem happened consistently without the fix.
Reviewed-by: Sailesh Mukil <sail...@cloudera.com>
Reviewed-by: Todd Lipcon <t...@apache.org>
Tested-by: Todd Lipcon <t...@apache.org>
4 files changed, 84 insertions(+), 1 deletion(-)
Sailesh Mukil: Looks good to me, but someone else must approve
Todd Lipcon: Looks good to me, approved; Verified
To view, visit http://gerrit.cloudera.org:8080/9587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>