Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12647 )

Change subject: [TS heartbeater] avoid reconnecting to master too often
......................................................................

[TS heartbeater] avoid reconnecting to master too often

With this patch, the heartbeater thread in a tserver no longer resets
its master proxy and reconnects to the master (re-negotiating a
connection) on every heartbeat when the master is accepting connections
and Ping RPC requests but isn't able to properly respond to TS
heartbeats.

E.g., when running the RemoteKsckTest.TestClusterWithLocation test
scenario for TSAN builds, I sometimes saw log messages like the
following (the test sets FLAGS_heartbeat_interval_ms = 10):

  I0301 20:29:11.932394 3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:11.944639 3671 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:11.946904 3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:11.960994 3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:11.964995 3819 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:11.972220 3671 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:11.974987 3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:11.988946 3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:11.991653 3671 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:12.003091 3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:12.017015 3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:12.017540 3671 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:12.031175 3819 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:12.031175 3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:12.046165 3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:12.059644 3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:12.073026 3819 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:12.075335 3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:12.077802 3671 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:12.089138 3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:12.101193 3671 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:12.102268 3819 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:12.104634 3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:12.118392 3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:12.132237 3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:12.147235 3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:12.165709 3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:12.171120 3819 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:12.179481 3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
  I0301 20:29:12.191591 3671 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221

It turned out the counter of consecutively failed heartbeats kept
increasing while the master was responding with ServiceUnavailable to
incoming TS heartbeats.

Change-Id: I961ae453ffd6ce343574ce58cb0e13fdad218078
---
M src/kudu/tserver/heartbeater.cc
1 file changed, 18 insertions(+), 4 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/47/12647/1
--
To view, visit http://gerrit.cloudera.org:8080/12647
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I961ae453ffd6ce343574ce58cb0e13fdad218078
Gerrit-Change-Number: 12647
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <[email protected]>
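
For readers who want to see why an ever-growing failure counter can cause a reconnect on every heartbeat, here is a minimal, self-contained sketch. It assumes the reconnect decision is a simple threshold check on the count of consecutive failed heartbeats. The names ReconnectPolicy, ShouldReconnect and CountReconnects are hypothetical and are not taken from heartbeater.cc, and the second policy is only one possible way to throttle reconnects, not necessarily what this patch does.

```cpp
#include <cassert>

// Hypothetical reconnect policies for illustration only; this is not the
// code from src/kudu/tserver/heartbeater.cc.
enum class ReconnectPolicy {
  // Reset the proxy on every failure once the counter reaches the threshold.
  kEveryFailurePastThreshold,
  // Reset the proxy only each time another `threshold` failures accumulate.
  kOncePerThreshold,
};

// Returns true if the (hypothetical) heartbeater should drop its master
// proxy and reconnect after `consecutive_failures` failed heartbeats.
bool ShouldReconnect(ReconnectPolicy policy,
                     int consecutive_failures,
                     int threshold) {
  assert(threshold > 0);
  if (consecutive_failures < threshold) {
    return false;
  }
  switch (policy) {
    case ReconnectPolicy::kEveryFailurePastThreshold:
      return true;
    case ReconnectPolicy::kOncePerThreshold:
      return consecutive_failures % threshold == 0;
  }
  return false;
}

// Counts the reconnects triggered by `num_failures` failed heartbeats in a
// row, e.g. while the master keeps answering with ServiceUnavailable and
// the counter is never reset.
int CountReconnects(ReconnectPolicy policy, int num_failures, int threshold) {
  int reconnects = 0;
  for (int failures = 1; failures <= num_failures; ++failures) {
    if (ShouldReconnect(policy, failures, threshold)) {
      ++reconnects;
    }
  }
  return reconnects;
}
```

With a threshold of 3 and 30 consecutive failures, the first policy reconnects on every heartbeat from the third failure onward, while the second reconnects once per 3 failures. With a 10 ms heartbeat interval, the first pattern would produce a dense stream of "Connected to a master server" messages similar to the log above.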
