So does the RoutingTableProvider receive updates for all the external views - does the List<ExternalView> it is passed always contain all the external views?
Varun

On Fri, Feb 6, 2015 at 10:37 AM, Zhen Zhang <[email protected]> wrote:

It doesn't distinguish. The RoutingTableProvider is always trying to keep its
content the same as what is on ZK.

------------------------------
*From:* Varun Sharma [[email protected]]
*Sent:* Friday, February 06, 2015 10:27 AM
*To:* [email protected]
*Subject:* Re: Excessive ZooKeeper load

How does the original RoutingTableProvider distinguish deletions from
adds/updates?

On Fri, Feb 6, 2015 at 10:23 AM, Zhen Zhang <[email protected]> wrote:

Hi Varun, on the batching of updates: the Helix controller is not updating the
external view on every update. Normally the controller will aggregate updates
over a period of time. Say 100 partitions are updated at roughly the same
time; then the controller will update the external view only once. For the
routing table, what do you mean by ignoring delete events? The RoutingTable
will always be updated by ZK callbacks and synced up with the corresponding
external views on ZK.

Thanks,
Jason

------------------------------
*From:* Varun Sharma [[email protected]]
*Sent:* Thursday, February 05, 2015 9:17 PM
*To:* [email protected]
*Subject:* Re: Excessive ZooKeeper load

One more question for the routing table provider - is it possible to
distinguish between add/modify and delete? I essentially want to ignore the
delete events - can that be determined by looking at the list of
ExternalView(s) being passed?

Thanks
Varun

On Thu, Feb 5, 2015 at 8:48 PM, Varun Sharma <[email protected]> wrote:

I see - one more thing - there was talk of a batching mode where Helix can
batch updates. Can it batch multiple updates to the external view and write
once into ZooKeeper instead of writing for every update? For example, consider
the case when lots of partitions are being onlined - could we batch updates to
the external view into batches of 100? Is that supported in Helix 0.6.4?

Thanks!
Varun

On Thu, Feb 5, 2015 at 5:23 PM, Zhen Zhang <[email protected]> wrote:

Yes, the listener will be notified on add/delete/modify. You can distinguish
them if you keep a local cache and compare against it to get the delta.
Currently the API doesn't expose this.

------------------------------
*From:* Varun Sharma [[email protected]]
*Sent:* Thursday, February 05, 2015 1:53 PM
*To:* [email protected]
*Subject:* Re: Excessive ZooKeeper load

I assume that it also gets called when external views get modified? How can I
distinguish whether there was an add, a modify or a delete?

Thanks
Varun

On Thu, Feb 5, 2015 at 9:27 AM, Zhen Zhang <[email protected]> wrote:

Yes. It will get invoked when external views are added or deleted.

------------------------------
*From:* Varun Sharma [[email protected]]
*Sent:* Thursday, February 05, 2015 1:27 AM
*To:* [email protected]
*Subject:* Re: Excessive ZooKeeper load

I had another question - does the RoutingTableProvider onExternalViewChange
call get invoked when a resource gets deleted (and hence its external view
znode)?

On Wed, Feb 4, 2015 at 10:54 PM, Zhen Zhang <[email protected]> wrote:

Yes. I think we did this in the incubating stage or even before. It's probably
in a separate branch for some performance evaluation.
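Assuming the callback does carry the full list of external views on ZK (which is what Zhen's description of the RoutingTableProvider staying in sync implies), deletions can only be detected by diffing against a locally cached set of resource names, as Zhen suggests above. A minimal sketch of that approach, assuming the Helix 0.6.x ExternalViewChangeListener callback signature; the class and field names are illustrative:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.helix.ExternalViewChangeListener;
import org.apache.helix.NotificationContext;
import org.apache.helix.model.ExternalView;

// Illustrative listener: keeps a local cache of resource names and diffs it
// against each callback to tell adds/updates apart from deletions.
public class DiffingExternalViewListener implements ExternalViewChangeListener {
  private final Set<String> knownResources = new HashSet<String>();

  @Override
  public synchronized void onExternalViewChange(List<ExternalView> externalViewList,
                                                NotificationContext changeContext) {
    Set<String> current = new HashSet<String>();
    for (ExternalView ev : externalViewList) {
      current.add(ev.getResourceName());
    }

    // Resources present before but missing now were deleted; ignore them if desired.
    Set<String> deleted = new HashSet<String>(knownResources);
    deleted.removeAll(current);

    // Resources present now but not before were added; everything else is an update.
    Set<String> added = new HashSet<String>(current);
    added.removeAll(knownResources);

    knownResources.clear();
    knownResources.addAll(current);

    // Apply only the adds/updates to the local routing state here.
  }
}
```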
------------------------------
*From:* kishore g [[email protected]]
*Sent:* Wednesday, February 04, 2015 9:54 PM
*To:* [email protected]
*Subject:* Re: Excessive ZooKeeper load

Jason, I remember having the ability to compress/decompress, and before we
added the support to bucketize, compression was used to support a large number
of partitions. However, I don't see the code anywhere. Did we do this on a
separate branch?

thanks,
Kishore G

On Wed, Feb 4, 2015 at 3:30 PM, Zhen Zhang <[email protected]> wrote:

Hi Varun, we can certainly add compression and have a config for turning it
on/off. We have implemented compression in our own zkclient before. The issues
with compression might be:
1) CPU consumption on the controller will increase.
2) It is harder to debug.

Thanks,
Jason

------------------------------
*From:* kishore g [[email protected]]
*Sent:* Wednesday, February 04, 2015 3:08 PM
*To:* [email protected]
*Subject:* Re: Excessive ZooKeeper load

We do have the ability to compress the data. I am not sure if there is an easy
way to turn the compression on/off.

On Wed, Feb 4, 2015 at 2:49 PM, Varun Sharma <[email protected]> wrote:

I am wondering if it's possible to gzip the external view znode - a simple
gzip cut down the data size by 25X. Is it possible to plug in
compression/decompression as ZooKeeper nodes are read?

Varun
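Plugging compression in as znodes are written and read, as Varun asks above, is typically done at the zkclient serializer layer. A minimal sketch, assuming the I0Itec ZkSerializer interface and wrapping Helix's ZNRecordSerializer; this is only an illustration of the idea, not the branch Kishore mentions, and per the discussion there is no simple on/off switch for it in stock 0.6.4:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

import org.I0Itec.zkclient.exception.ZkMarshallingError;
import org.I0Itec.zkclient.serialize.ZkSerializer;
import org.apache.helix.manager.zk.ZNRecordSerializer;

// Illustrative serializer: gzips the bytes produced by Helix's ZNRecordSerializer
// before they are written to ZK, and gunzips them on read.
public class GzipZNRecordSerializer implements ZkSerializer {
  private final ZkSerializer inner = new ZNRecordSerializer();

  @Override
  public byte[] serialize(Object data) throws ZkMarshallingError {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      GZIPOutputStream gzip = new GZIPOutputStream(bos);
      gzip.write(inner.serialize(data));
      gzip.close();
      return bos.toByteArray();
    } catch (IOException e) {
      throw new ZkMarshallingError(e);
    }
  }

  @Override
  public Object deserialize(byte[] bytes) throws ZkMarshallingError {
    try {
      GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(bytes));
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      int n;
      while ((n = gzip.read(buf)) != -1) {
        bos.write(buf, 0, n);
      }
      return inner.deserialize(bos.toByteArray());
    } catch (IOException e) {
      throw new ZkMarshallingError(e);
    }
  }
}
```

The trade-offs are the ones Zhen lists: the controller pays the CPU cost on every write, and the znode contents are no longer human-readable when debugging.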
On Mon, Feb 2, 2015 at 8:53 PM, kishore g <[email protected]> wrote:

There are multiple options we can try here.

What if we used the cacheddataaccessor for this use case? Clients will only
read if the node has changed. This optimization can benefit all use cases.

What about batching the watch triggers? Not sure which version of Helix has
this option.

Another option is to use a poll-based routing table instead of a watch-based
one. This, coupled with the cacheddataaccessor, can be very efficient.

Thanks,
Kishore G

On Feb 2, 2015 8:17 PM, "Varun Sharma" <[email protected]> wrote:

My total external view across all resources is roughly 3M in size and there
are 100 clients downloading it twice for every node restart - that's 600M of
data for every restart. So I guess that is causing this issue. We are thinking
of doing some tricks to limit the # of clients to 1 from 100. I guess that
should help significantly.

Varun

On Mon, Feb 2, 2015 at 7:37 PM, Zhen Zhang <[email protected]> wrote:

Hey Varun,

I guess your external view is pretty large, since each external view callback
takes ~3s. The RoutingTableProvider is callback based, so only when there is a
change in the external view will the RoutingTableProvider read the entire
external view from ZK. During the rolling upgrade there are lots of live
instance changes, which may lead to a lot of changes in the external view. One
possible way to mitigate the issue is to smooth the traffic by having some
delays in between bouncing nodes. We can do a rough estimation of how many
external view changes you might have during the upgrade, how many listeners
you have, and how large the external views are. Once we have these numbers, we
should know the ZK bandwidth requirement. ZK read bandwidth can be scaled by
adding ZK observers.

A ZK watcher is one-time only, so every time a listener receives a callback,
it will re-register its watcher with ZK.

It's normally unreliable to depend on delta changes instead of reading the
entire znode. There might be some corner cases where you would lose delta
changes if you depend on that.

For the ZK connection issue, do you have any log on the ZK server side
regarding this connection?

Thanks,
Jason

------------------------------
*From:* Varun Sharma [[email protected]]
*Sent:* Monday, February 02, 2015 4:41 PM
*To:* [email protected]
*Subject:* Re: Excessive ZooKeeper load

I believe there is a misbehaving client. Here is a stack trace - it probably
lost its connection and is now stampeding ZooKeeper:

"ZkClient-EventThread-104-terrapinzk001a:2181,terrapinzk002b:2181,terrapinzk003e:2181" daemon prio=10 tid=0x00007f534144b800 nid=0x7db5 in Object.wait() [0x00007f52ca9c3000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:503)
        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
        - locked <0x00000004fb0d8c38> (a org.apache.zookeeper.ClientCnxn$Packet)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
        at org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95)
        at org.I0Itec.zkclient.ZkClient$11.call(ZkClient.java:823)
        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
        at org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:820)
        at org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:136)
        at org.apache.helix.manager.zk.CallbackHandler.subscribeDataChange(CallbackHandler.java:241)
        at org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:287)
        at org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:202)
        - locked <0x000000056b75a948> (a org.apache.helix.manager.zk.ZKHelixManager)
        at org.apache.helix.manager.zk.CallbackHandler.handleDataChange(CallbackHandler.java:338)
        at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:547)
        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
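The stack trace above shows the Helix CallbackHandler re-subscribing its data watches, which matches Zhen's point that ZK watches are one-shot. A minimal sketch of that one-shot watch pattern against the raw ZooKeeper client; the class, path, and handle method are illustrative, not Helix's actual CallbackHandler code:

```java
import java.util.List;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Illustrative: ZK watches fire once, so every notification must both re-read
// the data and re-register the watch, which is the pattern visible in the
// CallbackHandler stack trace above.
public class ExternalViewWatcher implements Watcher {
  private final ZooKeeper zk;
  private final String path; // e.g. "/main_a/EXTERNALVIEW"

  public ExternalViewWatcher(ZooKeeper zk, String path) {
    this.zk = zk;
    this.path = path;
  }

  public void start() throws KeeperException, InterruptedException {
    // Registers the watch and reads the current children in one call.
    handle(zk.getChildren(path, this));
  }

  @Override
  public void process(WatchedEvent event) {
    try {
      // The watch that fired is now gone; re-register it by reading again.
      handle(zk.getChildren(path, this));
    } catch (Exception e) {
      // In real code, handle session expiry and reconnects here.
    }
  }

  private void handle(List<String> resources) {
    // Update local routing state from the freshly read children.
  }
}
```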
On Mon, Feb 2, 2015 at 4:28 PM, Varun Sharma <[email protected]> wrote:

I am wondering what is causing the ZK subscription to happen every 2-3
seconds - is a new watch being established every 3 seconds?

Thanks
Varun

On Mon, Feb 2, 2015 at 4:23 PM, Varun Sharma <[email protected]> wrote:

Hi,

We are serving a few different resources whose total # of partitions is ~30K.
We just did a rolling restart of the cluster, and the clients which use the
RoutingTableProvider are stuck in a bad state where they are constantly
subscribing to changes in the external view of the cluster. Here is the Helix
log on the client after our rolling restart was finished - the client is
constantly polling ZK. The ZooKeeper node is pushing 300mbps right now and
most of the traffic is being pulled by clients. Is this a race condition - and
is there an easy way to make the clients not poll so aggressively? We
restarted one of the clients and we don't see these same messages anymore.
Also, is it possible to just propagate external view diffs instead of the
whole big znode?

15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms
15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider
15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084 subscribes child-change. path: /main_a/EXTERNALVIEW, listener: org.apache.helix.spectator.RoutingTableProvider@76984879
15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms
15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider
15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084 subscribes child-change. path: /main_a/EXTERNALVIEW, listener: org.apache.helix.spectator.RoutingTableProvider@76984879
15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms
15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider
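For context, the log lines above come from the standard spectator setup: a RoutingTableProvider registered as an external-view listener re-reads the views from ZK on every change (the START:INVOKE/END:INVOKE pairs), while lookups are answered from its in-memory copy. A minimal sketch of that setup, assuming the Helix 0.6.x spectator API; the cluster, instance, resource, state, and ZK addresses are placeholders:

```java
import java.util.List;

import org.apache.helix.HelixManager;
import org.apache.helix.HelixManagerFactory;
import org.apache.helix.InstanceType;
import org.apache.helix.model.InstanceConfig;
import org.apache.helix.spectator.RoutingTableProvider;

public class SpectatorExample {
  public static void main(String[] args) throws Exception {
    // Placeholder cluster, instance, and ZK connect string for illustration.
    HelixManager manager = HelixManagerFactory.getZKHelixManager(
        "main_a", "myHost_12345", InstanceType.SPECTATOR, "zkHost1:2181,zkHost2:2181");
    manager.connect();

    // Each external-view change triggers one callback that re-reads the views
    // from ZK; this is what the CallbackHandler log lines above record.
    RoutingTableProvider routingTable = new RoutingTableProvider();
    manager.addExternalViewChangeListener(routingTable);

    // Lookups are served from the provider's in-memory snapshot, not from ZK.
    List<InstanceConfig> instances =
        routingTable.getInstances("myResource", "myResource_0", "ONLINE");
    System.out.println("Hosts serving partition: " + instances);

    manager.disconnect();
  }
}
```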
