I assume that it also gets called when external views get modified? How can I
distinguish whether there was an add, a modify, or a delete?
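For concreteness, a minimal sketch of classifying the change by diffing
against the previous callback's snapshot (this listener is illustrative, not
an existing Helix API; Helix hands the full list of external views to each
callback, so the kind of change has to be inferred):

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

import org.apache.helix.ExternalViewChangeListener;
import org.apache.helix.NotificationContext;
import org.apache.helix.model.ExternalView;

// Hypothetical listener: infer add/modify/delete by diffing the current
// external views against the snapshot kept from the previous callback.
public class DiffingExternalViewListener implements ExternalViewChangeListener {
  private final Map<String, ExternalView> previous = new HashMap<String, ExternalView>();

  @Override
  public synchronized void onExternalViewChange(List<ExternalView> externalViewList,
      NotificationContext changeContext) {
    Map<String, ExternalView> current = new HashMap<String, ExternalView>();
    for (ExternalView ev : externalViewList) {
      current.put(ev.getResourceName(), ev);
    }
    for (Map.Entry<String, ExternalView> entry : current.entrySet()) {
      ExternalView old = previous.get(entry.getKey());
      if (old == null) {
        System.out.println("added: " + entry.getKey());
      } else if (!Objects.equals(old.getRecord(), entry.getValue().getRecord())) {
        // ZNRecord equality as a stand-in for "content changed"
        System.out.println("modified: " + entry.getKey());
      }
    }
    for (String resource : previous.keySet()) {
      if (!current.containsKey(resource)) {
        System.out.println("deleted: " + resource);
      }
    }
    previous.clear();
    previous.putAll(current);
  }
}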
Thanks,
Varun

On Thu, Feb 5, 2015 at 9:27 AM, Zhen Zhang <[email protected]> wrote:

> Yes. It will get invoked when external views are added or deleted.
>
> ------------------------------
> From: Varun Sharma [[email protected]]
> Sent: Thursday, February 05, 2015 1:27 AM
> To: [email protected]
> Subject: Re: Excessive ZooKeeper load
>
> I had another question - does the RoutingTableProvider
> onExternalViewChange call get invoked when a resource gets deleted (and
> hence its external view znode)?
>
> On Wed, Feb 4, 2015 at 10:54 PM, Zhen Zhang <[email protected]> wrote:
>
>> Yes. I think we did this in the incubating stage or even before. It's
>> probably in a separate branch for some performance evaluation.
>>
>> ------------------------------
>> From: kishore g [[email protected]]
>> Sent: Wednesday, February 04, 2015 9:54 PM
>> To: [email protected]
>> Subject: Re: Excessive ZooKeeper load
>>
>> Jason, I remember having the ability to compress/decompress; before we
>> added support for bucketizing, compression was used to support a large
>> number of partitions. However, I don't see the code anywhere. Did we do
>> this on a separate branch?
>>
>> thanks,
>> Kishore G
>>
>> On Wed, Feb 4, 2015 at 3:30 PM, Zhen Zhang <[email protected]> wrote:
>>
>>> Hi Varun, we can certainly add compression and have a config for
>>> turning it on/off. We did implement compression in our own zkclient
>>> before. The issues with compression might be:
>>> 1) CPU consumption on the controller will increase.
>>> 2) It is harder to debug.
>>>
>>> Thanks,
>>> Jason
>>> ------------------------------
>>> From: kishore g [[email protected]]
>>> Sent: Wednesday, February 04, 2015 3:08 PM
>>> To: [email protected]
>>> Subject: Re: Excessive ZooKeeper load
>>>
>>> We do have the ability to compress the data. I am not sure if there
>>> is an easy way to turn the compression on/off.
>>>
>>> On Wed, Feb 4, 2015 at 2:49 PM, Varun Sharma <[email protected]>
>>> wrote:
>>>
>>>> I am wondering if it's possible to gzip the external view znode - a
>>>> simple gzip cut the data size by 25x. Is it possible to plug in
>>>> compression/decompression as ZooKeeper nodes are read?
>>>>
>>>> Varun
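A gzip layer like the one Varun describes could plug in at the ZkClient
serializer level. A rough sketch against the I0Itec ZkSerializer interface
(the GzipZkSerializer wrapper below is hypothetical, distinct from whatever
compression existed in Helix's own zkclient):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

import org.I0Itec.zkclient.exception.ZkMarshallingError;
import org.I0Itec.zkclient.serialize.ZkSerializer;

// Sketch: wrap an existing ZkSerializer with transparent gzip, so znode
// payloads are compressed on write and decompressed on read.
public class GzipZkSerializer implements ZkSerializer {
  private final ZkSerializer inner;

  public GzipZkSerializer(ZkSerializer inner) {
    this.inner = inner;
  }

  @Override
  public byte[] serialize(Object data) throws ZkMarshallingError {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      GZIPOutputStream gzip = new GZIPOutputStream(bos);
      gzip.write(inner.serialize(data));
      gzip.close(); // finishes the gzip stream before reading the bytes
      return bos.toByteArray();
    } catch (IOException e) {
      throw new ZkMarshallingError(e);
    }
  }

  @Override
  public Object deserialize(byte[] bytes) throws ZkMarshallingError {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(bytes));
      byte[] buf = new byte[4096];
      int n;
      while ((n = gzip.read(buf)) > 0) {
        bos.write(buf, 0, n);
      }
      gzip.close();
      return inner.deserialize(bos.toByteArray());
    } catch (IOException e) {
      throw new ZkMarshallingError(e);
    }
  }
}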
>>>> On Mon, Feb 2, 2015 at 8:53 PM, kishore g <[email protected]> wrote:
>>>>
>>>>> There are multiple options we can try here.
>>>>> What if we used the CachedDataAccessor for this use case? Clients
>>>>> will only read if the node has changed. This optimization can benefit
>>>>> all use cases.
>>>>>
>>>>> What about batching the watch triggers? Not sure which version of
>>>>> Helix has this option.
>>>>>
>>>>> Another option is to use a poll-based routing table instead of a
>>>>> watch-based one. Coupled with the CachedDataAccessor, this can be very
>>>>> efficient.
>>>>>
>>>>> Thanks,
>>>>> Kishore G
>>>>> On Feb 2, 2015 8:17 PM, "Varun Sharma" <[email protected]> wrote:
>>>>>
>>>>>> My total external view across all resources is roughly 3 MB in size,
>>>>>> and there are 100 clients downloading it twice for every node
>>>>>> restart - that's 600 MB of data for every restart. So I guess that is
>>>>>> causing this issue. We are thinking of doing some tricks to limit the
>>>>>> number of clients from 100 to 1. I guess that should help
>>>>>> significantly.
>>>>>>
>>>>>> Varun
>>>>>>
>>>>>> On Mon, Feb 2, 2015 at 7:37 PM, Zhen Zhang <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hey Varun,
>>>>>>>
>>>>>>> I guess your external view is pretty large, since each external
>>>>>>> view callback takes ~3s. The RoutingTableProvider is callback
>>>>>>> based, so only when there is a change in the external view does the
>>>>>>> RoutingTableProvider read the entire external view from ZK. During
>>>>>>> the rolling upgrade, there are lots of live-instance changes, which
>>>>>>> may lead to a lot of changes in the external view. One possible way
>>>>>>> to mitigate the issue is to smooth the traffic by adding some delay
>>>>>>> between bouncing nodes. We can make a rough estimate of how many
>>>>>>> external view changes you might have during the upgrade, how many
>>>>>>> listeners you have, and how large the external views are. Once we
>>>>>>> have these numbers, we will know the ZK bandwidth requirement. ZK
>>>>>>> read bandwidth can be scaled by adding ZK observers.
>>>>>>>
>>>>>>> A ZK watcher is one-time only, so every time a listener receives a
>>>>>>> callback, it re-registers its watcher with ZK.
>>>>>>>
>>>>>>> It's normally unreliable to depend on delta changes instead of
>>>>>>> reading the entire znode. There might be corner cases where you
>>>>>>> would lose delta changes if you depended on that.
>>>>>>>
>>>>>>> For the ZK connection issue, do you have any log on the ZK server
>>>>>>> side regarding this connection?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jason
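Jason's point about one-shot watchers is what drives the repeated
re-subscription seen in the logs below: with the raw ZooKeeper client, a
fired watch is consumed and must be re-armed on every callback, roughly like
this (connect string and path are placeholders):

import java.io.IOException;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Illustration of one-shot watch semantics: a watch fires once and is gone,
// so every callback must re-read the children and re-register the watch.
// This is why the client logs a new "subscribes child-change" per change.
public class OneShotWatchExample implements Watcher {
  private final ZooKeeper zk;
  private final String path;

  public OneShotWatchExample(String connectString, String path) throws IOException {
    this.zk = new ZooKeeper(connectString, 30000, this);
    this.path = path;
  }

  public void arm() throws KeeperException, InterruptedException {
    // Passing a Watcher to getChildren sets a one-shot child watch.
    zk.getChildren(path, this);
  }

  @Override
  public void process(WatchedEvent event) {
    if (event.getType() == Event.EventType.NodeChildrenChanged) {
      try {
        arm(); // the fired watch is consumed; subscribe again or miss updates
      } catch (Exception e) {
        e.printStackTrace();
      }
    }
  }
}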
>>>>>>> ------------------------------
>>>>>>> From: Varun Sharma [[email protected]]
>>>>>>> Sent: Monday, February 02, 2015 4:41 PM
>>>>>>> To: [email protected]
>>>>>>> Subject: Re: Excessive ZooKeeper load
>>>>>>>
>>>>>>> I believe there is a misbehaving client. Here is a stack trace -
>>>>>>> it probably lost its connection and is now stampeding ZooKeeper:
>>>>>>>
>>>>>>> "ZkClient-EventThread-104-terrapinzk001a:2181,terrapinzk002b:2181,terrapinzk003e:2181"
>>>>>>> daemon prio=10 tid=0x00007f534144b800 nid=0x7db5 in Object.wait()
>>>>>>> [0x00007f52ca9c3000]
>>>>>>>   java.lang.Thread.State: WAITING (on object monitor)
>>>>>>>     at java.lang.Object.wait(Native Method)
>>>>>>>     at java.lang.Object.wait(Object.java:503)
>>>>>>>     at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
>>>>>>>     - locked <0x00000004fb0d8c38> (a org.apache.zookeeper.ClientCnxn$Packet)
>>>>>>>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)
>>>>>>>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>>>>>>>     at org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95)
>>>>>>>     at org.I0Itec.zkclient.ZkClient$11.call(ZkClient.java:823)
>>>>>>>     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
>>>>>>>     at org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:820)
>>>>>>>     at org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:136)
>>>>>>>     at org.apache.helix.manager.zk.CallbackHandler.subscribeDataChange(CallbackHandler.java:241)
>>>>>>>     at org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:287)
>>>>>>>     at org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:202)
>>>>>>>     - locked <0x000000056b75a948> (a org.apache.helix.manager.zk.ZKHelixManager)
>>>>>>>     at org.apache.helix.manager.zk.CallbackHandler.handleDataChange(CallbackHandler.java:338)
>>>>>>>     at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:547)
>>>>>>>     at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
>>>>>>>
>>>>>>> On Mon, Feb 2, 2015 at 4:28 PM, Varun Sharma <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I am wondering what is causing the zk subscription to happen every
>>>>>>>> 2-3 seconds - is a new watch being established every 3 seconds?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Varun
>>>>>>>>
>>>>>>>> On Mon, Feb 2, 2015 at 4:23 PM, Varun Sharma <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> We are serving a few different resources whose total number of
>>>>>>>>> partitions is ~30K. We just did a rolling restart of the cluster,
>>>>>>>>> and the clients which use the RoutingTableProvider are stuck in a
>>>>>>>>> bad state where they are constantly subscribing to changes in the
>>>>>>>>> external view of the cluster. Here is the Helix log on the client
>>>>>>>>> after our rolling restart finished - the client is constantly
>>>>>>>>> polling ZK. The ZooKeeper node is pushing 300 Mbps right now, and
>>>>>>>>> most of the traffic is being pulled by clients. Is this a race
>>>>>>>>> condition - and is there an easy way to make the clients not poll
>>>>>>>>> so aggressively? We restarted one of the clients, and we don't see
>>>>>>>>> these same messages anymore. Also, is it possible to propagate
>>>>>>>>> just external view diffs instead of the whole big znode?
>>>>>>>>>
>>>>>>>>> 15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE
>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms
>>>>>>>>>
>>>>>>>>> 15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE
>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>>>>>>
>>>>>>>>> 15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084 subscribes
>>>>>>>>> child-change. path: /main_a/EXTERNALVIEW, listener:
>>>>>>>>> org.apache.helix.spectator.RoutingTableProvider@76984879
>>>>>>>>>
>>>>>>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE
>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms
>>>>>>>>>
>>>>>>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE
>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>>>>>>
>>>>>>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084 subscribes
>>>>>>>>> child-change. path: /main_a/EXTERNALVIEW, listener:
>>>>>>>>> org.apache.helix.spectator.RoutingTableProvider@76984879
>>>>>>>>>
>>>>>>>>> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE
>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms
>>>>>>>>>
>>>>>>>>> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE
>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider
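For completeness, the spectator wiring that produces the callbacks in these
logs looks roughly like the sketch below; the cluster name is taken from the
/main_a/EXTERNALVIEW paths above, while the instance name, ZK address,
resource, and partition names are placeholders:

import org.apache.helix.HelixManager;
import org.apache.helix.HelixManagerFactory;
import org.apache.helix.InstanceType;
import org.apache.helix.spectator.RoutingTableProvider;

// Sketch of standard Helix spectator setup with a RoutingTableProvider.
public class SpectatorExample {
  public static void main(String[] args) throws Exception {
    HelixManager manager = HelixManagerFactory.getZKHelixManager(
        "main_a",               // cluster name, per /main_a/EXTERNALVIEW above
        "spectator-1",          // placeholder instance name
        InstanceType.SPECTATOR,
        "terrapinzk001a:2181"); // placeholder ZK connect string
    manager.connect();

    // Each external-view change triggers a full re-read of EXTERNALVIEW and
    // a watch re-subscription -- the START:INVOKE/END:INVOKE pairs above.
    RoutingTableProvider routingTable = new RoutingTableProvider();
    manager.addExternalViewChangeListener(routingTable);

    // Query which instances serve a partition in a given state.
    routingTable.getInstances("myResource", "myResource_0", "ONLINE");
  }
}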
