Hi,

We are serving a few different resources whose total number of partitions is
~30K. We just did a rolling restart of the cluster, and the clients that use
the RoutingTableProvider are stuck in a bad state where they constantly
re-subscribe to changes in the cluster's external view. Below is the Helix
log on one client after our rolling restart finished - the client is
polling ZK continuously. The ZooKeeper node is pushing 300 Mbps right now,
and most of the traffic is being pulled by clients. Is this a race
condition, and is there an easy way to make the clients poll less
aggressively? We restarted one of the clients and no longer see these
messages on it. Also, is it possible to propagate external-view diffs
instead of the whole big znode?
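To make the last question concrete: a diff-based update would only need to carry the partitions whose state assignments changed, instead of the entire external view. Here is a minimal sketch of that idea in plain Python - this is not the Helix API, and the map shape (partition -> instance -> state) is an assumption based on how the external view nests its data:

```python
def external_view_diff(old, new):
    """Compute per-partition changes between two external-view snapshots,
    each shaped {partition: {instance: state}}.
    Returns only the partitions whose assignments changed."""
    diff = {}
    for partition in old.keys() | new.keys():
        if old.get(partition) != new.get(partition):
            # None as the value means the partition was removed.
            diff[partition] = new.get(partition)
    return diff

# During a rolling restart only a handful of the ~30K partitions move
# at any moment, so the diff stays small even when the full view is large.
old = {"p0": {"host1": "MASTER"}, "p1": {"host2": "SLAVE"}}
new = {"p0": {"host1": "MASTER"}, "p1": {"host3": "SLAVE"}}
print(external_view_diff(old, new))  # → {'p1': {'host3': 'SLAVE'}}
```

Something like this on the server side would shrink each notification from the full ~30K-partition znode down to just the changed entries, assuming the clients can apply incremental updates.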

15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE
/main_a/EXTERNALVIEW
listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms

15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE
/main_a/EXTERNALVIEW
listener:org.apache.helix.spectator.RoutingTableProvider

15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084 subscribes
child-change. path: /main_a/EXTERNALVIEW, listener:
org.apache.helix.spectator.RoutingTableProvider@76984879

15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE
/main_a/EXTERNALVIEW
listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms

15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE
/main_a/EXTERNALVIEW
listener:org.apache.helix.spectator.RoutingTableProvider

15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084 subscribes
child-change. path: /main_a/EXTERNALVIEW, listener:
org.apache.helix.spectator.RoutingTableProvider@76984879

15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE
/main_a/EXTERNALVIEW
listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms

15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE
/main_a/EXTERNALVIEW
listener:org.apache.helix.spectator.RoutingTableProvider
