My total external view across all resources is roughly 3 MB, and there are 100 clients downloading it twice for every node restart - that's 600 MB of data per restart. So I guess that is what's causing this issue. We are thinking of doing some tricks to reduce the number of clients from 100 to 1, which should help significantly.
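Roughly the shape of what we have in mind - a single shared spectator per host that all local clients query, instead of one RoutingTableProvider per client. This is only a sketch; the class and instance names are placeholders:

import java.net.InetAddress;

import org.apache.helix.HelixManager;
import org.apache.helix.HelixManagerFactory;
import org.apache.helix.InstanceType;
import org.apache.helix.spectator.RoutingTableProvider;

// Placeholder class: one spectator connection (and therefore one external-view
// watcher) per host, shared by every local client, instead of 100 independent ones.
public final class SharedRoutingTable {
  private static volatile RoutingTableProvider provider;

  public static RoutingTableProvider get(String zkAddr, String clusterName)
      throws Exception {
    if (provider == null) {
      synchronized (SharedRoutingTable.class) {
        if (provider == null) {
          String instanceName =
              "spectator_" + InetAddress.getLocalHost().getHostName();
          HelixManager manager = HelixManagerFactory.getZKHelixManager(
              clusterName, instanceName, InstanceType.SPECTATOR, zkAddr);
          manager.connect();
          RoutingTableProvider rtp = new RoutingTableProvider();
          // One external-view listener per host, so ZK serves one copy of the
          // external view per change instead of one copy per client.
          manager.addExternalViewChangeListener(rtp);
          provider = rtp;
        }
      }
    }
    return provider;
  }
}

Clients on the host would then call something like SharedRoutingTable.get(zkAddr, cluster).getInstances(resource, partition, "ONLINE") locally (the state name depends on the state model) rather than each opening its own spectator connection.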
Varun

On Mon, Feb 2, 2015 at 7:37 PM, Zhen Zhang <[email protected]> wrote:

> Hey Varun,
>
> I guess your external view is pretty large, since each external view
> callback takes ~3s. The RoutingTableProvider is callback based, so only
> when there is a change in the external view will the RoutingTableProvider
> read the entire external view from ZK. During the rolling upgrade there are
> lots of live-instance changes, which may lead to a lot of changes in the
> external view. One possible way to mitigate the issue is to smooth the
> traffic by adding some delay between bouncing nodes. We can do a rough
> estimate of how many external view changes you might see during the
> upgrade, how many listeners you have, and how large the external views are.
> Once we have these numbers, we can work out the ZK bandwidth requirement. ZK
> read bandwidth can be scaled by adding ZK observers.
>
> A ZK watcher is one-time only, so every time a listener receives a
> callback, it re-registers its watcher with ZK.
>
> It's normally unreliable to depend on delta changes instead of reading
> the entire znode; there are corner cases where deltas can be lost.
>
> For the ZK connection issue, do you have any log on the ZK server side
> regarding this connection?
>
> Thanks,
> Jason
>
> ------------------------------
> *From:* Varun Sharma [[email protected]]
> *Sent:* Monday, February 02, 2015 4:41 PM
> *To:* [email protected]
> *Subject:* Re: Excessive ZooKeeper load
>
> I believe there is a misbehaving client. Here is a stack trace - it
> probably lost its connection and is now stampeding ZooKeeper:
>
> "ZkClient-EventThread-104-terrapinzk001a:2181,terrapinzk002b:2181,terrapinzk003e:2181" daemon prio=10 tid=0x00007f534144b800 nid=0x7db5 in Object.wait() [0x00007f52ca9c3000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:503)
>         at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
>         - locked <0x00000004fb0d8c38> (a org.apache.zookeeper.ClientCnxn$Packet)
>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)
>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>         at org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95)
>         at org.I0Itec.zkclient.ZkClient$11.call(ZkClient.java:823)
>         *at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)*
>         *at org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:820)*
>         *at org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:136)*
>         at org.apache.helix.manager.zk.CallbackHandler.subscribeDataChange(CallbackHandler.java:241)
>         at org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:287)
>         at org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:202)
>         - locked <0x000000056b75a948> (a org.apache.helix.manager.zk.ZKHelixManager)
>         at org.apache.helix.manager.zk.CallbackHandler.handleDataChange(CallbackHandler.java:338)
>         at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:547)
>         at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
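A minimal illustration of the one-shot watch behavior Jason describes above, and of what the ZkClient.watchForData/exists frames in this trace amount to: every notification is followed by a fresh read that re-installs the watch. This is a generic raw-ZooKeeper sketch, not Helix's actual code; the class name and path are illustrative.

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// One-shot watch pattern: each callback re-reads the (entire) znode and passes
// the watcher again, which registers a new watch for the next change. This is
// why every external view change turns into another full read against ZK.
public class ReRegisteringWatcher implements Watcher {
  private final ZooKeeper zk;
  private final String path; // e.g. an EXTERNALVIEW znode; path is illustrative

  public ReRegisteringWatcher(ZooKeeper zk, String path) throws Exception {
    this.zk = zk;
    this.path = path;
    zk.getData(path, this, new Stat()); // initial read sets the first watch
  }

  @Override
  public void process(WatchedEvent event) {
    try {
      // The watch that fired is now gone; re-read the znode and pass `this`
      // again so a new watch is registered. (A real handler would also check
      // event.getType() and handle session events.)
      byte[] data = zk.getData(path, this, new Stat());
      handle(data);
    } catch (Exception e) {
      // reconnect / retry logic omitted in this sketch
    }
  }

  private void handle(byte[] data) {
    // application-specific processing of the new data
  }
}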
>
> On Mon, Feb 2, 2015 at 4:28 PM, Varun Sharma <[email protected]> wrote:
>
>> I am wondering what is causing the zk subscription to happen every 2-3
>> seconds - is this a new watch being established every 3 seconds?
>>
>> Thanks
>> Varun
>>
>> On Mon, Feb 2, 2015 at 4:23 PM, Varun Sharma <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> We are serving a few different resources whose total number of partitions
>>> is ~30K. We just did a rolling restart of the cluster, and the clients
>>> which use the RoutingTableProvider are stuck in a bad state where they are
>>> constantly re-subscribing to changes in the external view of the cluster.
>>> Here is the helix log on the client after our rolling restart finished -
>>> the client is constantly polling ZK. The zookeeper node is pushing 300mbps
>>> right now and most of the traffic is being pulled by clients. Is this a
>>> race condition, and is there an easy way to make the clients not poll so
>>> aggressively? We restarted one of the clients and no longer see these
>>> messages. Also, is it possible to propagate external view diffs instead of
>>> the whole big znode?
>>>
>>> 15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms
>>> 15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider
>>> 15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084 subscribes child-change. path: /main_a/EXTERNALVIEW, listener: org.apache.helix.spectator.RoutingTableProvider@76984879
>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms
>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider
>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084 subscribes child-change. path: /main_a/EXTERNALVIEW, listener: org.apache.helix.spectator.RoutingTableProvider@76984879
>>> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms
>>> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider
>>>
>>
>
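For reference, a quick back-of-envelope check of the ZK read traffic this pattern generates, using only numbers from this thread (~3 MB external view, one INVOKE cycle every ~3.3 s in the logs above, 100 spectator clients); the class name and defaults are illustrative:

// Back-of-envelope estimate of ZK egress from spectators that re-read the
// full external view on every callback.
public class ZkEgressEstimate {
  public static void main(String[] args) {
    double viewMegabytes = 3.0;        // ~3 MB total external view
    double secondsPerCallback = 3.3;   // ~3.3 s per INVOKE cycle in the logs
    int clients = 100;                 // spectator clients in the cluster

    double perClientMbps = viewMegabytes * 8 / secondsPerCallback;
    double totalMbps = perClientMbps * clients;

    System.out.printf("per client: %.1f Mbit/s, all %d clients: %.1f Mbit/s%n",
        perClientMbps, clients, totalMbps);
    // ~7.3 Mbit/s per looping client; roughly 40 clients stuck in this loop is
    // already enough to account for the ~300 Mbit/s seen on the ZooKeeper node.
  }
}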
