Alexey Serbin has uploaded this change for review. (http://gerrit.cloudera.org:8080/12647)


Change subject: [TS heartbeater] avoid reconnecting to master too often
......................................................................

[TS heartbeater] avoid reconnecting to master too often

With this patch, the heartbeater thread in a tserver no longer resets
its master proxy and reconnects to the master (re-negotiating a
connection) on every heartbeat when the master is accepting connections
and Ping RPC requests but isn't able to properly respond to TS heartbeats.

E.g., when running the RemoteKsckTest.TestClusterWithLocation test
scenario in TSAN builds, I sometimes saw log messages like the following
(the test sets FLAGS_heartbeat_interval_ms = 10):

I0301 20:29:11.932394  3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:11.944639  3671 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:11.946904  3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:11.960994  3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:11.964995  3819 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:11.972220  3671 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:11.974987  3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:11.988946  3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:11.991653  3671 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:12.003091  3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:12.017015  3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:12.017540  3671 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:12.031175  3819 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:12.031175  3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:12.046165  3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:12.059644  3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:12.073026  3819 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:12.075335  3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:12.077802  3671 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:12.089138  3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:12.101193  3671 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:12.102268  3819 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:12.104634  3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:12.118392  3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:12.132237  3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:12.147235  3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:12.165709  3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:12.171120  3819 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:12.179481  3746 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221
I0301 20:29:12.191591  3671 heartbeater.cc:345] Connected to a master server at 127.3.75.254:36221

It turned out that the counter of consecutive failed heartbeats kept
increasing while the master was responding with ServiceUnavailable
to incoming TS heartbeats.
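
To illustrate the effect described above, here is a minimal standalone
sketch. It is not the code from heartbeater.cc: the policy names, the
CountProxyResets() helper, and the threshold parameter are all
hypothetical, and the "every Nth failure" policy is only one plausible
way to throttle reconnects, not necessarily what this patch does. The
sketch only shows that a failure counter which never goes back to zero
triggers a proxy reset on every heartbeat once it passes a threshold.

```cpp
#include <cassert>

// Hypothetical reset policies, for illustration only; these names do not
// come from heartbeater.cc.
enum class Policy {
  kResetAtOrAboveThreshold,  // reset whenever the counter is >= threshold
  kResetEveryNthFailure,     // reset only on every threshold-th failure
};

// Counts how many times the master proxy would be reset over
// `num_failures` consecutive failed heartbeats (e.g. the master keeps
// answering ServiceUnavailable, so the counter is never cleared).
int CountProxyResets(Policy policy, int num_failures, int threshold) {
  assert(threshold > 0);  // also guards the modulo below
  int consecutive_failures = 0;
  int resets = 0;
  for (int i = 0; i < num_failures; ++i) {
    ++consecutive_failures;  // no heartbeat succeeds, so it only grows
    bool reset = false;
    switch (policy) {
      case Policy::kResetAtOrAboveThreshold:
        reset = consecutive_failures >= threshold;
        break;
      case Policy::kResetEveryNthFailure:
        reset = consecutive_failures % threshold == 0;
        break;
    }
    if (reset) {
      ++resets;  // stands in for resetting the proxy and reconnecting
    }
  }
  return resets;
}
```

With 30 failed heartbeats and an assumed threshold of 3, the first
policy yields 28 resets (one per heartbeat once the threshold is
reached), while the second yields 10.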

Change-Id: I961ae453ffd6ce343574ce58cb0e13fdad218078
---
M src/kudu/tserver/heartbeater.cc
1 file changed, 18 insertions(+), 4 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/47/12647/1
--
To view, visit http://gerrit.cloudera.org:8080/12647
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I961ae453ffd6ce343574ce58cb0e13fdad218078
Gerrit-Change-Number: 12647
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <[email protected]>
