[ 
https://issues.apache.org/jira/browse/IMPALA-6624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho resolved IMPALA-6624.
--------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.12.0
                   Impala 3.0

Fixed at 
[https://github.com/apache/impala/commit/8079cd9d2a87051f81a41910b74fab15e35f36ea]

KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL_write()

Previously, OutboundTransfer::TransferStarted() returns true iff
non-zero bytes have been successfully sent via Writev(). As it turns
out, this doesn't work well with SSL_write(). When SSL_write() returns -1
with errno EAGAIN or ETRYAGAIN, we need to retry the call with exactly
the same buffer pointer next time even if 0 bytes have been written.

The following sequence becomes problematic with the previous implementation
of OutboundTransfer::TransferStarted():

- WriteHandler() calls SendBuffer() on an OutboundTransfer.
- SendBuffer() calls TlsSocket::Writev() which hits the EAGAIN error above.
  Since 0 bytes were written, cur_slice_idx_ and cur_offset_in_slice_ remain 0
  and OutboundTransfer::TransferStarted() still returns false.
- OutboundTransfer is cancelled or timed out. car->call is set to NULL.
- WriteHandler() is called again and as it notices that the OutboundTransfer
  hasn't really started yet and "car->call" is NULL due to cancellation, it
  removes it from the outbound transfer queue and moves on to the next entry
  in the queue.
- WriteHandler() calls SendBuffer() with the next entry in the queue and
  eventually calls SSL_write() with a different buffer than expected by
  SSL_write(), leading to "SSL3_WRITE_PENDING:bad write retry" error.

This change fixes the problem above by adding a boolean flag 'started_'
which is set to true if OutboundTransfer::SendBuffer() has been called
at least once. Also added some tests to exercise cancellation paths with
multiple concurrent RPCs.
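
The sequence above can be replayed with a small self-contained sketch. This is
not the Kudu code: MockTlsWriter, SketchTransfer, WriteOnce() and RunScenario()
are hypothetical names made up for illustration, and the mock only imitates the
one SSL_write() rule that matters here (a write that asked for a retry must be
retried with the same buffer pointer). The 'use_flag' switch selects between
the old offset-based check and a 'started'-style flag.

```cpp
#include <cassert>
#include <deque>

// Hypothetical mock, not OpenSSL: returns bytes written, 0 for "retry later",
// or -1 when a pending write is retried with a different buffer pointer.
struct MockTlsWriter {
  const char* pending = nullptr;  // buffer of the write that must be retried
  bool would_block = false;       // test knob: make the next write ask for a retry

  int Write(const char* buf, int len) {
    if (pending != nullptr && pending != buf) return -1;  // "bad write retry"
    if (would_block) { pending = buf; return 0; }
    pending = nullptr;
    return len;
  }
};

// Hypothetical, heavily simplified stand-in for an outbound transfer.
struct SketchTransfer {
  SketchTransfer(const char* d, int l) : data(d), len(l) {}
  const char* data;
  int len;
  int offset = 0;          // bytes successfully written so far
  bool started = false;    // set on the first send attempt, even if 0 bytes go out
  bool cancelled = false;
};

enum class Result { kOk, kRetry, kBadWriteRetry, kIdle };

// One pass of a write handler. With use_flag == false, "started" is derived
// from the offset (the old behaviour described above); with use_flag == true
// it comes from the flag.
Result WriteOnce(std::deque<SketchTransfer*>& queue, MockTlsWriter& tls,
                 bool use_flag) {
  while (!queue.empty()) {
    SketchTransfer* t = queue.front();
    const bool started = use_flag ? t->started : (t->offset > 0);
    if (t->cancelled && !started) {
      queue.pop_front();  // cancelled before any send attempt: safe to drop
      continue;
    }
    t->started = true;
    const int n = tls.Write(t->data + t->offset, t->len - t->offset);
    if (n < 0) return Result::kBadWriteRetry;
    if (n == 0) return Result::kRetry;
    t->offset += n;
    if (t->offset == t->len) queue.pop_front();
    return Result::kOk;
  }
  return Result::kIdle;
}

// Replays the sequence: a 0-byte write that must be retried, then a
// cancellation, then another write-handler pass.
Result RunScenario(bool use_flag) {
  static const char kFirst[] = "first";
  static const char kSecond[] = "second";
  SketchTransfer a(kFirst, 5);
  SketchTransfer b(kSecond, 6);
  std::deque<SketchTransfer*> queue{&a, &b};
  MockTlsWriter tls;

  tls.would_block = true;
  Result first = WriteOnce(queue, tls, use_flag);
  assert(first == Result::kRetry);  // 0 bytes written, retry is pending
  (void)first;

  a.cancelled = true;               // transfer is cancelled or timed out
  tls.would_block = false;
  return WriteOnce(queue, tls, use_flag);
}
```

In this sketch RunScenario(false) ends in kBadWriteRetry because the cancelled
transfer is dropped while its write is still pending, whereas RunScenario(true)
retries the same buffer and returns kOk.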

Confirmed the problem above is fixed by running a stress test on a 130-node
cluster with Impala. The problem happened consistently without the fix.

Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683
Reviewed-on: http://gerrit.cloudera.org:8080/9587
Reviewed-by: Sailesh Mukil <sail...@cloudera.com>
Reviewed-by: Todd Lipcon <t...@apache.org>
Tested-by: Todd Lipcon <t...@apache.org>
Reviewed-on: http://gerrit.cloudera.org:8080/9606
Tested-by: Impala Public Jenkins

> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-6624
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6624
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Distributed Exec
>    Affects Versions: Impala 3.0, Impala 2.12.0
>            Reporter: Michael Ho
>            Assignee: Michael Ho
>            Priority: Blocker
>             Fix For: Impala 3.0, Impala 2.12.0
>
>
> During stress testing in a secure 140 node cluster, Impalad ran into the 
> following errors. This is supposed to be fixed in KUDU-2218. The fix for 
> KUDU-2218 has already been cherry-picked to Impala code base at this 
> [commit|https://github.com/apache/impala/commit/678bf28e233e667b05585110422762614840bdc2]
>  and the build should have this commit. It's unclear if Impala may be missing 
> other commits or the issue in KUDU-2218 is not completely fixed.
> Assigning to [~sailesh] to lead the investigation. Please feel free to 
> reassign to me if you are swamped, Sailesh.
> {noformat}
> W0307 03:31:04.512100 158268 connection.cc:659] client connection to 
> 10.17.221.47:27000 send error: Network error: failed to write to TLS socket: 
> error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> W0307 03:31:04.524086 158268 connection.cc:153] Shutting down client 
> connection to 10.17.221.47:27000 with pending inbound data (11/16 bytes 
> received, last active 0 ns ago, status=Network error: failed to write to TLS 
> socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write 
> retry:s3_pkt.c:874)
> E0307 03:31:04.535635 123156 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> W0307 03:31:04.536145 158268 connection.cc:190] Error closing socket: Network 
> error: TlsSocket::Close: Success
> E0307 03:31:04.584370 140087 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> I0307 03:31:04.697773 158412 rpcz_store.cc:255] Call 
> impala.DataStreamService.TransmitData from 10.17.221.15:33716 (request call 
> id 509466) took 125221ms. Request Metrics: {}
> E0307 03:31:04.707012 64577 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:04.767437 123164 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:04.786669 117111 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:04.792443 118554 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:04.912823 108328 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.221110 64484 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.228492 167981 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.232076 117126 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.412305 69586 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.427347 64667 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.430274 65641 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.436692 66206 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.437369 116174 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.515347 115108 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.545945 66826 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.752233 68861 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.793612 117106 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.799756 102340 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.801004 107447 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.832300 138449 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.881510 66751 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:06.278340 138373 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:06.278870 116990 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:06.280494 136840 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:06.490084 66207 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:07.231269 67227 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:07.339190 66752 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
