Another thing: the RoutingTableProvider is logging the line "Resetting the routing table." It looks like this happens when we fail to set the watcher.
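Kishore's reading fits ZooKeeper's documented watch semantics. A watch fires once and must be re-registered. A `getChildren` call on a missing node throws NoNode without leaving a watch. An `exists` call, by contrast, leaves a watch even when the node is absent, and that watch fires when the node is created. Below is a minimal in-memory simulation of that interaction, assuming a writer that deletes a resource's external view node and then writes it back. `FakeZk`, `Subscriber`, `watchChildren`, and the `exists()` fallback are hypothetical names and a hypothetical mitigation. They are not Helix, ZkClient, or ZooKeeper APIs, and the sketch does not describe how Helix's CallbackHandler actually re-subscribes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical in-memory model of ZooKeeper's one-shot watches (not a real client API).
public class LostWatchDemo {
  static class NoNodeException extends Exception {
    NoNodeException(String path) { super("NoNode for " + path); }
  }

  static class FakeZk {
    private final Set<String> nodes = new HashSet<>();
    private final Map<String, List<Consumer<String>>> childWatches = new HashMap<>();
    private final Map<String, List<Consumer<String>>> existsWatches = new HashMap<>();

    // Mimics getChildren(path, watch): a missing node throws and leaves no watch behind.
    void watchChildren(String path, Consumer<String> watcher) throws NoNodeException {
      if (!nodes.contains(path)) throw new NoNodeException(path);
      childWatches.computeIfAbsent(path, k -> new ArrayList<>()).add(watcher);
    }

    // Mimics exists(path, watch): the watch is left even if the node does not exist yet.
    boolean exists(String path, Consumer<String> watcher) {
      existsWatches.computeIfAbsent(path, k -> new ArrayList<>()).add(watcher);
      return nodes.contains(path);
    }

    void create(String path) {
      nodes.add(path);
      fire(existsWatches, path);
      fire(childWatches, parentOf(path));
    }

    void delete(String path) {
      nodes.remove(path);
      fire(existsWatches, path);
      fire(childWatches, path);
      fire(childWatches, parentOf(path));
    }

    private static String parentOf(String p) { return p.substring(0, p.lastIndexOf('/')); }

    // Watches are one-shot: they are removed before being invoked.
    private static void fire(Map<String, List<Consumer<String>>> watches, String path) {
      List<Consumer<String>> fired = watches.remove(path);
      if (fired != null) fired.forEach(w -> w.accept(path));
    }
  }

  static class Subscriber {
    private final FakeZk zk;
    private final String path;
    private final boolean fallbackToExists;
    int callbacks = 0;
    Subscriber(FakeZk zk, String path, boolean fallbackToExists) {
      this.zk = zk;
      this.path = path;
      this.fallbackToExists = fallbackToExists;
    }

    void subscribe() {
      try {
        zk.watchChildren(path, p -> onEvent());
      } catch (NoNodeException e) {
        // Without the fallback, no watch remains and every later change is missed.
        // With it, an exists() watch fires when the node is written back.
        if (fallbackToExists && zk.exists(path, p -> onEvent())) subscribe();
      }
    }

    private void onEvent() {
      callbacks++;
      subscribe(); // re-arm the consumed one-shot watch
    }
  }

  public static int run(boolean fallbackToExists) {
    FakeZk zk = new FakeZk();
    zk.create("/EXTERNALVIEW");
    zk.create("/EXTERNALVIEW/r1");
    Subscriber s = new Subscriber(zk, "/EXTERNALVIEW/r1", fallbackToExists);
    s.subscribe();
    zk.delete("/EXTERNALVIEW/r1");          // writer removes the resource node...
    zk.create("/EXTERNALVIEW/r1");          // ...then writes it back
    zk.create("/EXTERNALVIEW/r1/bucket_0"); // a later change under the node
    return s.callbacks;
  }

  public static void main(String[] args) {
    System.out.println("without fallback: " + run(false)); // 1
    System.out.println("with fallback: " + run(true));     // 3
  }
}
```

Without the fallback, only the deletion is observed (1 callback). The failed re-registration leaves no watch, so the re-creation and the later update are silently missed. With the fallback, all three events arrive (3 callbacks). A real mitigation would also have to cope with the node vanishing again between the `exists` and `getChildren` calls, which this single-threaded model does not exercise.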
thanks,
Kishore G

On Sat, Mar 7, 2015 at 8:05 PM, kishore g <[email protected]> wrote:

> Your explanation makes sense.
>
> https://github.com/apache/helix/blob/helix-0.6.4/helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixDataAccessor.java
>
> For bucketized resources, we see that the path is deleted and set again.
> Jason, any idea why we are removing the path?
>
>     case EXTERNALVIEW:
>       if (value.getBucketSize() == 0) {
>         records.add(value.getRecord());
>       } else {
>         _baseDataAccessor.remove(path, options);
>
> On Sat, Mar 7, 2015 at 4:03 PM, Varun Sharma <[email protected]> wrote:
>
>> How does the writing of the external view work for bucketized resources?
>> Is it possible that the top-level znode for the resource is first deleted
>> and then rewritten with the latest external view?
>>
>> On Sat, Mar 7, 2015 at 3:56 PM, Varun Sharma <[email protected]> wrote:
>>
>>> Here is the stack trace. There is a ZooKeeper race, and the detailed
>>> stack trace appears for bucketized resources. I saw that the ideal state
>>> for the resource was created on 26th February and modified on 7th March.
>>> However, the external view for the resource shows up as both created and
>>> modified on 7th March. The external view was created at 10:36:04 on 7th
>>> March, 20 seconds after this stack trace was logged. After this, the
>>> routing table provider no longer receives any ZK callbacks.
>>>
>>> 2015-03-07 10:35:43,735 [main-EventThread] (ZkAsyncCallbacks.java:127) WARN org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandler@3c8589f0, rc:NONODE, path: /main_a/EXTERNALVIEW/$terrapin$data$visual_seo_joins_staging$1422384697040
>>>
>>> 2015-03-07 10:35:43,736 [main-EventThread] (ZkAsyncCallbacks.java:127) WARN org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandler@63230a9a, rc:NONODE, path: /main_a/EXTERNALVIEW/$terrapin$data$recommendation_p2p_exp_candset_1$1425671237739
>>>
>>> 2015-03-07 10:35:43,736 [main-EventThread] (ZkAsyncCallbacks.java:127) WARN org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandler@118d374f, rc:NONODE, path: /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641250
>>>
>>> 2015-03-07 10:35:43,736 [ZkClient-EventThread-17-terrapinzk001a:2181] (CallbackHandler.java:304) WARN fail to subscribe child/data change. path: /main_a/EXTERNALVIEW, listener: com.pinterest.terrapin.controller.TerrapinRoutingTableProvider@2c6691da
>>> org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641250
>>>     at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
>>>     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
>>>     at org.apache.helix.manager.zk.ZkClient.getChildren(ZkClient.java:210)
>>>     at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:409)
>>>     at org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:279)
>>>     at org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:202)
>>>     at org.apache.helix.manager.zk.CallbackHandler.handleChildChange(CallbackHandler.java:391)
>>>     at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:570)
>>>     at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
>>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641250
>>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>     at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1249)
>>>
>>> 2015-03-07 10:35:43,848 [ZkClient-EventThread-17-terrapinzk001a:2181] (RoutingTableProvider.java:99) INFO Resetting the routing table.
>>>
>>> On Thu, Mar 5, 2015 at 11:33 AM, Varun Sharma <[email protected]> wrote:
>>>
>>>> I suspect the callbacks have not been coming in for a long time now.
>>>>
>>>> On Thu, Mar 5, 2015 at 11:30 AM, Varun Sharma <[email protected]> wrote:
>>>>
>>>>> I grepped this and found nothing:
>>>>>
>>>>> sudo grep START:INVOKE.*EXTERNALVIEW /var/log/terrapin/controller.log*
>>>>>
>>>>> I found a bunch of START:INVOKE entries for the IDEALSTATES znode, though.
>>>>>
>>>>> On Thu, Mar 5, 2015 at 11:15 AM, Zhen Zhang <[email protected]> wrote:
>>>>>
>>>>>> Yes. You should see a pair of "START:INVOKE..." and "END:INVOKE:..."
>>>>>> for each callback in your log.
>>>>>> ------------------------------
>>>>>> *From:* Varun Sharma [[email protected]]
>>>>>> *Sent:* Thursday, March 05, 2015 11:11 AM
>>>>>> *To:* [email protected]
>>>>>> *Subject:* Re: RoutingTableProvider dropping callbacks
>>>>>>
>>>>>> OK, is there a way to confirm that the callbacks are being processed
>>>>>> (from the logs, etc.)?
>>>>>>
>>>>>> On Thu, Mar 5, 2015 at 10:50 AM, Zhen Zhang <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Varun,
>>>>>>>
>>>>>>> This should not be a problem.
>>>>>>> When we register a callback, we expect a callback of type INIT first,
>>>>>>> followed by a sequence of CALLBACK types, and when you unregister the
>>>>>>> callback, you will receive a FINALIZED type. Since unregister is an
>>>>>>> async operation, you might still see a couple of CALLBACK type callbacks
>>>>>>> after you receive the FINALIZED type; these are simply ignored. The log
>>>>>>> is basically telling you that.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jason
>>>>>>> ------------------------------
>>>>>>> *From:* Varun Sharma [[email protected]]
>>>>>>> *Sent:* Thursday, March 05, 2015 10:44 AM
>>>>>>> *To:* [email protected]
>>>>>>> *Subject:* RoutingTableProvider dropping callbacks
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> It seems that the RoutingTableProvider is dropping callbacks in our
>>>>>>> case. Here is a log:
>>>>>>>
>>>>>>> [ZkClient-EventThread-17-terrapinzk001a:2181] (CallbackHandler.java:130) WARN Skip processing callbacks for listener: com.pinterest.terrapin.controller.TerrapinRoutingTableProvider@7e7f8062, path: /main_a/EXTERNALVIEW, expected types: [INIT] but was CALLBACK
>>>>>>>
>>>>>>> We have a custom RoutingTableProvider to catch callbacks and do some
>>>>>>> processing, and this is causing a lot of issues for us. What could be
>>>>>>> causing this?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Varun
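Jason's explanation describes a small state machine: INIT first, then any number of CALLBACKs, then a final type after unregistering, with stragglers that arrive after that final type skipped. Below is a minimal sketch of such a gate, assuming only the three types named in the thread (written here as INIT, CALLBACK, and FINALIZE). `CallbackGate` and `deliver` are hypothetical names, the skip message merely imitates the WARN line quoted above, and none of this is Helix's CallbackHandler.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the INIT -> CALLBACK* -> FINALIZE lifecycle; not Helix's CallbackHandler.
public class CallbackGate {
  public enum Type { INIT, CALLBACK, FINALIZE }

  private Set<Type> expected = EnumSet.of(Type.INIT);
  private final List<String> log = new ArrayList<>();

  // Processes the event and returns true, or records a skip and returns false.
  public boolean deliver(Type type) {
    if (!expected.contains(type)) {
      log.add("skip: expected " + expected + " but was " + type);
      return false;
    }
    switch (type) {
      case INIT:
      case CALLBACK:
        // After INIT (and after each CALLBACK), more CALLBACKs or a FINALIZE may follow.
        expected = EnumSet.of(Type.CALLBACK, Type.FINALIZE);
        break;
      case FINALIZE:
        // After unregistering, only a fresh INIT is accepted; late CALLBACKs are skipped.
        expected = EnumSet.of(Type.INIT);
        break;
    }
    log.add("process: " + type);
    return true;
  }

  public List<String> log() {
    return log;
  }

  public static void main(String[] args) {
    CallbackGate gate = new CallbackGate();
    gate.deliver(Type.INIT);
    gate.deliver(Type.CALLBACK);
    gate.deliver(Type.FINALIZE); // unregister (asynchronous in the real system)
    gate.deliver(Type.CALLBACK); // a straggler that was already in flight
    gate.log().forEach(System.out::println);
    // process: INIT
    // process: CALLBACK
    // process: FINALIZE
    // skip: expected [INIT] but was CALLBACK
  }
}
```

In this model a straggling CALLBACK after FINALIZE is harmless. The same skip would only become a real problem if a handler were still registered while its gate expected INIT, because every later CALLBACK would then be dropped.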
