[ https://issues.apache.org/jira/browse/IMPALA-6624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Ho resolved IMPALA-6624.
--------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.12.0
                   Impala 3.0

Fixed at [https://github.com/apache/impala/commit/8079cd9d2a87051f81a41910b74fab15e35f36ea]

KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL_write()

Previously, OutboundTransfer::TransferStarted() returned true iff non-zero bytes had been successfully sent via Writev(). As it turns out, this doesn't work well with SSL_write(). When SSL_write() returns -1 with errno EAGAIN or ETRYAGAIN, we need to retry the call with exactly the same buffer pointer next time even if 0 bytes have been written.

The following sequence becomes problematic with the previous implementation of OutboundTransfer::TransferStarted():

- WriteHandler() calls SendBuffer() on an OutboundTransfer.
- SendBuffer() calls TlsSocket::Writev() which hits the EAGAIN error above. Since 0 bytes were written, cur_slice_idx_ and cur_offset_in_slice_ remain 0 and OutboundTransfer::TransferStarted() still returns false.
- OutboundTransfer is cancelled or timed out. car->call is set to NULL.
- WriteHandler() is called again. It notices that the OutboundTransfer hasn't really started yet and that "car->call" is NULL due to cancellation, so it removes the transfer from the outbound transfer queue and moves on to the next entry in the queue.
- WriteHandler() calls SendBuffer() with the next entry in the queue and eventually calls SSL_write() with a different buffer than expected by SSL_write(), leading to the "SSL3_WRITE_PENDING:bad write retry" error.

This change fixes the problem above by adding a boolean flag 'started_' which is set to true if OutboundTransfer::SendBuffer() has been called at least once.

Also added some tests to exercise cancellation paths with multiple concurrent RPCs.

Confirmed the problem above is fixed by running a stress test in a 130 node cluster with Impala. The problem happened consistently without the fix.
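The sequence described above can be reproduced in a small standalone model. The sketch below is not the Kudu code: MockTlsSocket, Transfer, WriteHandler() and RunScenario() are hypothetical names invented for illustration, and the mock only imitates the documented SSL_write() rule that a write which asked to be retried must be retried with the same buffer. It assumes a single queue and a single cancellation, and compares the old "bytes were written" test with a 'started' flag.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical stand-in for a TLS socket. Like SSL_write() in its default
// mode, a write that returned "try again" must be retried with the same
// buffer pointer, otherwise the call fails with "bad write retry".
struct MockTlsSocket {
  const char* pending = nullptr;  // buffer of a write awaiting retry
  bool would_block = false;       // knob: make the next write ask for a retry
  bool bad_write_retry = false;   // set when a retry used a different buffer

  // Returns the number of bytes written, or -1 on "try again" or error.
  long Write(const char* buf, std::size_t len) {
    if (pending != nullptr && pending != buf) {
      bad_write_retry = true;
      return -1;
    }
    if (would_block) {
      would_block = false;
      pending = buf;
      return -1;
    }
    pending = nullptr;
    return static_cast<long>(len);
  }
};

// Hypothetical, heavily simplified model of an outbound transfer.
struct Transfer {
  std::string payload;
  std::size_t offset = 0;
  bool started = false;    // models the 'started_' flag from the fix
  bool cancelled = false;  // models car->call == NULL

  explicit Transfer(std::string p) : payload(std::move(p)) {}

  void SendBuffer(MockTlsSocket& sock) {
    started = true;  // set on the first call, even if 0 bytes get written
    long n = sock.Write(payload.data() + offset, payload.size() - offset);
    if (n > 0) offset += static_cast<std::size_t>(n);
  }
  bool Finished() const { return offset == payload.size(); }
  bool StartedOld() const { return offset != 0; }  // previous behaviour
  bool StartedNew() const { return started; }      // behaviour after the fix
};

// Simplified write loop: drop cancelled transfers that have not started,
// otherwise keep sending the transfer at the head of the queue.
void WriteHandler(std::deque<Transfer>& queue, MockTlsSocket& sock,
                  bool use_started_flag) {
  while (!queue.empty()) {
    Transfer& t = queue.front();
    bool started = use_started_flag ? t.StartedNew() : t.StartedOld();
    if (!started && t.cancelled) {
      queue.pop_front();
      continue;
    }
    t.SendBuffer(sock);
    if (!t.Finished()) return;  // socket not writable yet, wait for next event
    queue.pop_front();
  }
}

// Replays the sequence from the commit message and reports whether the
// mock socket saw a retry with a different buffer.
bool RunScenario(bool use_started_flag) {
  MockTlsSocket sock;
  std::deque<Transfer> queue;
  queue.emplace_back("first");
  queue.emplace_back("second");

  sock.would_block = true;                      // first write hits EAGAIN
  WriteHandler(queue, sock, use_started_flag);
  queue.front().cancelled = true;               // transfer is cancelled
  WriteHandler(queue, sock, use_started_flag);  // handler runs again
  return sock.bad_write_retry;
}
```

In this model, RunScenario(false) returns true because the cancelled head of the queue is dropped and the next write uses a different buffer, while RunScenario(true) returns false because the head transfer counts as started and is retried with its original buffer.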

Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683
Reviewed-on: http://gerrit.cloudera.org:8080/9587
Reviewed-by: Sailesh Mukil <sail...@cloudera.com>
Reviewed-by: Todd Lipcon <t...@apache.org>
Tested-by: Todd Lipcon <t...@apache.org>
Reviewed-on: http://gerrit.cloudera.org:8080/9606
Tested-by: Impala Public Jenkins

> Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-6624
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6624
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Distributed Exec
>    Affects Versions: Impala 3.0, Impala 2.12.0
>            Reporter: Michael Ho
>            Assignee: Michael Ho
>            Priority: Blocker
>             Fix For: Impala 3.0, Impala 2.12.0
>
>
> During stress testing in a secure 140 node cluster, Impalad ran into the following errors. This is supposed to be fixed in KUDU-2218. The fix for KUDU-2218 has already been cherry-picked to Impala code base at this [commit|https://github.com/apache/impala/commit/678bf28e233e667b05585110422762614840bdc2] and the build should have this commit. It's unclear if Impala may be missing other commits or the issue in KUDU-2218 is not completely fixed.
> Assigning to [~sailesh] to lead the investigation. Please feel free to reassign to me if you are swamped Sailesh.
> {noformat}
> W0307 03:31:04.512100 158268 connection.cc:659] client connection to 10.17.221.47:27000 send error: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> W0307 03:31:04.524086 158268 connection.cc:153] Shutting down client connection to 10.17.221.47:27000 with pending inbound data (11/16 bytes received, last active 0 ns ago, status=Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874)
> E0307 03:31:04.535635 123156 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> W0307 03:31:04.536145 158268 connection.cc:190] Error closing socket: Network error: TlsSocket::Close: Success
> E0307 03:31:04.584370 140087 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> I0307 03:31:04.697773 158412 rpcz_store.cc:255] Call impala.DataStreamService.TransmitData from 10.17.221.15:33716 (request call id 509466) took 125221ms. Request Metrics: {}
> E0307 03:31:04.707012 64577 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:04.767437 123164 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:04.786669 117111 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:04.792443 118554 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:04.912823 108328 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.221110 64484 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.228492 167981 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.232076 117126 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.412305 69586 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.427347 64667 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.430274 65641 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.436692 66206 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.437369 116174 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.515347 115108 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.545945 66826 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.752233 68861 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.793612 117106 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.799756 102340 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.801004 107447 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.832300 138449 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:05.881510 66751 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:06.278340 138373 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:06.278870 116990 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:06.280494 136840 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:06.490084 66207 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:07.231269 67227 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> E0307 03:31:07.339190 66752 krpc-data-stream-sender.cc:335] channel send to 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)