Hello Mike Percy,
I'd like you to do a code review. Please visit
http://gerrit.cloudera.org:8080/9414
to review the following change.
Change subject: webserver-stress-itest: fix flakiness
......................................................................
webserver-stress-itest: fix flakiness
This fixes a source of flakiness I found on the flaky dashboard. In some runs
of this test, we'd hit the following interleaving:
- we start the master with webserver_port=0 and it picks some port (e.g. 35000)
- we stop the master
- the curl threads are still running, and one of them picks port 35000 as the
local side of its TCP connection. It then tries to connect to port 35000 and
hits the dreaded "tcp loop connect" phenomenon[1], in which it actually
connects to _itself_. Thus it just hangs there, occupying the port
- we try to start the master again, and it fails to bind
- we now time out trying to Join() on the curl thread, which is waiting forever
for itself to respond to an HTTP request.
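
The self-connect step in the interleaving above can be reproduced in isolation. The following is a minimal sketch, not code from this change: `try_self_connect` is a hypothetical helper, and it forces the collision deterministically by binding the socket before connecting, whereas in the test the kernel happened to pick the colliding source port at random. It assumes Linux-style TCP simultaneous-open behaviour; other platforms may refuse the connect, in which case the helper returns None.

```python
import socket


def try_self_connect(timeout_secs=2.0):
    """Bind a TCP socket to a loopback port, then connect it to that same port.

    Returns (local_addr, peer_addr, echoed_bytes) if the socket ends up
    connected to itself, or None if the platform refuses the connection.
    This helper is hypothetical and exists only for illustration.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # A timeout keeps the sketch from hanging on unexpected behaviour.
        sock.settimeout(timeout_secs)
        # Let the kernel choose a free port, standing in for "port 35000".
        sock.bind(("127.0.0.1", 0))
        local = sock.getsockname()
        # Nothing is listening on this port, yet the connect can succeed:
        # the socket's own SYN is treated as a simultaneous open.
        sock.connect(local)
        # Anything written to the socket is read back from the same socket.
        sock.sendall(b"ping")
        echoed = sock.recv(4)
        return sock.getsockname(), sock.getpeername(), echoed
    except OSError:
        return None
    finally:
        sock.close()


if __name__ == "__main__":
    print(try_self_connect())
```

When the self-connect succeeds, the local and peer addresses are identical and the port stays occupied for as long as the socket is open, which is why a later bind to that port fails.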
The fix is to use non-ephemeral ports for the webserver, as we already do
for the RPC server. I also added timeouts to the curl calls.
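
Both halves of the fix can be sketched outside of Kudu. The example below is illustrative only and does not reflect the actual changes to curl_util.cc: it swaps in Python's urllib for the curl wrapper, and the names `ephemeral_range`, `is_ephemeral`, and `fetch_with_timeout` are hypothetical. The fallback range 32768-60999 is an assumption based on the usual Linux default for net.ipv4.ip_local_port_range.

```python
import http.client
import urllib.request

# Assumed fallback: the usual Linux default for net.ipv4.ip_local_port_range.
DEFAULT_EPHEMERAL_RANGE = (32768, 60999)


def ephemeral_range(path="/proc/sys/net/ipv4/ip_local_port_range"):
    """Return the (low, high) ephemeral port range, falling back to the default."""
    try:
        with open(path) as f:
            low, high = f.read().split()
            return int(low), int(high)
    except (OSError, ValueError):
        return DEFAULT_EPHEMERAL_RANGE


def is_ephemeral(port, port_range=None):
    """True if the kernel could hand out `port` as the source port of a connect()."""
    low, high = port_range or ephemeral_range()
    return low <= port <= high


def fetch_with_timeout(url, timeout_secs=5.0):
    """Fetch `url`, giving up after `timeout_secs` instead of waiting forever.

    Returns the response body as bytes, or None on any connection error,
    HTTP error, or timeout.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_secs) as resp:
            return resp.read()
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return None


if __name__ == "__main__":
    print(is_ephemeral(35000, DEFAULT_EPHEMERAL_RANGE))  # port from the example above
    print(fetch_with_timeout("http://127.0.0.1:1/", timeout_secs=0.5))
```

A server port outside the ephemeral range cannot be chosen as the local side of an outgoing connection, so the self-connect cannot occur on it. The timeout is a second line of defence: a client that does get stuck fails after a bounded wait rather than blocking the thread join.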
[1] http://www.rampa.sk/static/tcpLoopConnect.html
Change-Id: If754d7f47a4c9c04bae3e9ef31acad801dd4db9b
---
M src/kudu/integration-tests/linked_list-test-util.h
M src/kudu/integration-tests/webserver-stress-itest.cc
M src/kudu/util/curl_util.cc
M src/kudu/util/curl_util.h
4 files changed, 31 insertions(+), 2 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/14/9414/1
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If754d7f47a4c9c04bae3e9ef31acad801dd4db9b
Gerrit-Change-Number: 9414
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Mike Percy <[email protected]>