Will Berkeley has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12761 )
Change subject: Increase timeout in tls_socket-test
......................................................................

Increase timeout in tls_socket-test

Very rarely (~3/2000 times in TSAN with 8 stress threads),
tls_socket-test will fail with a log like the following:

  I0314 19:20:54.118880 236 tls_socket-test.cc:109] server: negotiation complete
  I0314 19:20:54.119151 223 tls_socket-test.cc:109] client: negotiation complete
  I0314 19:21:04.127199 236 tls_socket-test.cc:165] server echoing 33406976 bytes
  /data/6/wdberkeley/kudu/src/kudu/security/tls_socket-test.cc:234: Failure
  Failed
  Bad status: Network error: BlockingRecv error: failed to read from TLS socket (remote: unknown): Connection reset by peer (error 104)

It seems the following is happening:

1. The client and the echo server connect successfully.
2. The client sends its payload of 32MiB (33554432 bytes) in
   BlockingWrite.
3. The server, while looping in BlockingRecv to receive the payload,
   fails to read the whole payload before timing out, through some
   combination of resource saturation, unfavorable scheduling, and
   EINTR returns from recv. Notice the 10 second delay between the
   second and third log messages (the timeout is 10s) and that the
   number of bytes echoed (33406976) is less than 32MiB.
4. The server terminates the connection because of the timeout, but
   this does not result in a failure on its side because the server
   was stopped by the client.
5. The client fails on its first BlockingRecv from the closed
   connection, instead of on the second BlockingRecv as the test
   intends.

This seems like a test-only issue: the timeout on the server side seems
like reasonable behavior. Since the failure is so rare, tripling the
timeout should make the issue stop, or at least make it much rarer.

With a 10s timeout, 2000 runs on TSAN, and 8 stress threads, I saw 2-4
failures. With a 30s timeout, I saw 0.
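The partial read in step 3 can be illustrated with a minimal, self-contained
sketch of a deadline-bound receive loop over plain POSIX sockets. This is not
Kudu's BlockingRecv and involves no TLS; RecvAllWithDeadline and RecvResult
are hypothetical names introduced only for illustration. It shows how a
receiver that retries on EINTR can still hit its deadline with fewer bytes
than it asked for.

```cpp
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <chrono>
#include <cstddef>
#include <cstdint>

// Hypothetical result type for this sketch (not a Kudu Status).
enum class RecvResult { kOk, kTimedOut, kClosed, kError };

// Hypothetical helper: tries to read exactly 'amt' bytes from 'fd' before
// 'timeout' elapses. '*nread' always holds the number of bytes received so
// far, so a caller can see a partial read after a timeout.
RecvResult RecvAllWithDeadline(int fd, uint8_t* buf, size_t amt,
                               std::chrono::milliseconds timeout,
                               size_t* nread) {
  assert(buf != nullptr && nread != nullptr);
  using Clock = std::chrono::steady_clock;
  const Clock::time_point deadline = Clock::now() + timeout;
  *nread = 0;
  while (*nread < amt) {
    // Recompute the remaining budget on every iteration: time lost to
    // scheduling delays or EINTR retries counts against the same deadline.
    const auto remaining =
        std::chrono::duration_cast<std::chrono::milliseconds>(
            deadline - Clock::now());
    if (remaining.count() <= 0) return RecvResult::kTimedOut;

    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLIN;
    const int rc = poll(&pfd, 1, static_cast<int>(remaining.count()));
    if (rc < 0) {
      if (errno == EINTR) continue;  // Interrupted by a signal: retry.
      return RecvResult::kError;
    }
    if (rc == 0) return RecvResult::kTimedOut;  // No data before deadline.

    const ssize_t n = recv(fd, buf + *nread, amt - *nread, 0);
    if (n < 0) {
      if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK) continue;
      return RecvResult::kError;
    }
    if (n == 0) return RecvResult::kClosed;  // Peer closed the connection.
    *nread += static_cast<size_t>(n);
  }
  return RecvResult::kOk;
}
```

If a peer sends only part of the expected payload, the call returns kTimedOut
with *nread below amt, which corresponds to the server echoing 33406976 bytes
rather than the full 33554432. A longer timeout does not change this logic; it
only gives a slow sender more time before the receiver gives up.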
Change-Id: Ibc615ea8f03a74f38b2bd6f3b4c140b3e435d4f3
Reviewed-on: http://gerrit.cloudera.org:8080/12761
Reviewed-by: Alexey Serbin <[email protected]>
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <[email protected]>
---
M src/kudu/security/tls_socket-test.cc
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Alexey Serbin: Looks good to me, approved
  Kudu Jenkins: Verified
  Adar Dembo: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/12761
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ibc615ea8f03a74f38b2bd6f3b4c140b3e435d4f3
Gerrit-Change-Number: 12761
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <[email protected]>
