If I recall correctly from a previous thread, we don't even support changing the bucket size of an existing resource, so it seems we should probably not be deleting the znode in this case?
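Here is a minimal sketch of that suggestion. It writes a bucketized external view without ever deleting the resource's parent znode, so a watcher listing that znode's children cannot hit NoNode in the middle of a write. It rests on assumptions: FakeZnodeStore is a toy in-memory stand-in for ZooKeeper, and the `bucket_<n>` child names are hypothetical. Neither is a Helix API, and this is not the ZKHelixDataAccessor code quoted below. Buckets are overwritten in place. Children left over from a different layout are trimmed only afterwards, which would still cover the bucket-size-change case Jason raises below.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

/** Toy in-memory stand-in for a ZooKeeper tree (path -> data). Not a real client. */
final class FakeZnodeStore {
    private final TreeMap<String, String> nodes = new TreeMap<>();

    void set(String path, String data) { nodes.put(path, data); }
    String get(String path) { return nodes.get(path); }
    boolean exists(String path) { return nodes.containsKey(path); }
    void delete(String path) { nodes.remove(path); }

    /** Names of the direct children of parentPath, in sorted order. */
    List<String> children(String parentPath) {
        String prefix = parentPath + "/";
        List<String> out = new ArrayList<>();
        // Keys sharing a prefix are contiguous in a TreeMap, so stop at the first non-match.
        for (String path : nodes.tailMap(prefix, true).keySet()) {
            if (!path.startsWith(prefix)) break;
            String rest = path.substring(prefix.length());
            if (!rest.contains("/")) out.add(rest);
        }
        return out;
    }
}

/** Writes a bucketized view by overwriting buckets in place; the parent is never deleted. */
final class BucketizedViewWriter {
    private final FakeZnodeStore store;

    BucketizedViewWriter(FakeZnodeStore store) { this.store = store; }

    void write(String resourcePath, List<String> partitions, int bucketSize) {
        if (bucketSize <= 0) throw new IllegalArgumentException("bucketSize must be positive");
        // 1. Create or update the parent; watchers on it never see it disappear.
        store.set(resourcePath, "bucketSize=" + bucketSize);
        // 2. Overwrite each bucket child in place (hypothetical "bucket_<n>" names).
        Set<String> live = new HashSet<>();
        int buckets = (partitions.size() + bucketSize - 1) / bucketSize;
        for (int b = 0; b < buckets; b++) {
            int from = b * bucketSize;
            int to = Math.min(from + bucketSize, partitions.size());
            String name = "bucket_" + b;
            store.set(resourcePath + "/" + name, String.join(",", partitions.subList(from, to)));
            live.add(name);
        }
        // 3. Only then trim children left over from an older bucket layout.
        for (String child : store.children(resourcePath)) {
            if (!live.contains(child)) store.delete(resourcePath + "/" + child);
        }
    }
}
```

Writing five partitions with bucket size 2 leaves bucket_0 through bucket_2. Rewriting the same partitions with bucket size 5 leaves only bucket_0, and the parent znode exists throughout.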
On Sun, Mar 8, 2015 at 11:43 PM, Zhen Zhang <[email protected]> wrote:

> @Kishore, I think the remove is there in case the bucket size is changed, so we can clean up all the buckets for the old size and set them using the new size.
>
> The issue looks like a race condition between setting the bucketized external view and adding watches on child paths. Will investigate more.
>
> Thanks,
> Jason
> ------------------------------
> *From:* Varun Sharma [[email protected]]
> *Sent:* Saturday, March 07, 2015 11:07 PM
> *To:* [email protected]
> *Subject:* Re: RoutingTableProvider dropping callbacks
>
> Please find the attached log file with the above trace.
>
> On Sat, Mar 7, 2015 at 8:12 PM, kishore g <[email protected]> wrote:
>
>> Another thing: the RoutingTableProvider is logging the line "Resetting the routing table." It looks like this happens when we fail to set the watcher.
>>
>> thanks,
>> Kishore G
>>
>> On Sat, Mar 7, 2015 at 8:05 PM, kishore g <[email protected]> wrote:
>>
>>> Your explanation makes sense.
>>>
>>> https://github.com/apache/helix/blob/helix-0.6.4/helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixDataAccessor.java
>>>
>>> For bucketized resources, we see that the path is deleted and set again. Jason, any idea why we are removing the path?
>>>
>>> case EXTERNALVIEW:
>>>   if (value.getBucketSize() == 0) {
>>>     records.add(value.getRecord());
>>>   } else {
>>>     _baseDataAccessor.remove(path, options);
>>>
>>> On Sat, Mar 7, 2015 at 4:03 PM, Varun Sharma <[email protected]> wrote:
>>>
>>>> How does writing the external view work for bucketized resources? Is it possible that the top-level znode for the resource is first deleted and then rewritten with the latest external view?
>>>>
>>>> On Sat, Mar 7, 2015 at 3:56 PM, Varun Sharma <[email protected]> wrote:
>>>>
>>>>> Here is the stack trace. There is a ZooKeeper race, and the detailed stack trace appears for bucketized resources.
>>>>> I saw that the ideal state for the resource was created on 26th Feb and modified on 7th March. However, the external view for the resource shows up as both created and modified on 7th March. The external view was created at 10:36:04 on 7th March, about 20 seconds after the stack trace below was logged. After this, the routing table provider no longer receives any ZK callbacks.
>>>>>
>>>>> 2015-03-07 10:35:43,735 [main-EventThread] (ZkAsyncCallbacks.java:127) WARN org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandler@3c8589f0, rc:NONODE, path: /main_a/EXTERNALVIEW/$terrapin$data$visual_seo_joins_staging$1422384697040
>>>>>
>>>>> 2015-03-07 10:35:43,736 [main-EventThread] (ZkAsyncCallbacks.java:127) WARN org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandler@63230a9a, rc:NONODE, path: /main_a/EXTERNALVIEW/$terrapin$data$recommendation_p2p_exp_candset_1$1425671237739
>>>>>
>>>>> 2015-03-07 10:35:43,736 [main-EventThread] (ZkAsyncCallbacks.java:127) WARN org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandler@118d374f, rc:NONODE, path: /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641250
>>>>>
>>>>> 2015-03-07 10:35:43,736 [ZkClient-EventThread-17-terrapinzk001a:2181] (CallbackHandler.java:304) WARN fail to subscribe child/data change.
>>>>> path: /main_a/EXTERNALVIEW, listener: com.pinterest.terrapin.controller.TerrapinRoutingTableProvider@2c6691da
>>>>>
>>>>> *org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641250*
>>>>>     at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
>>>>>     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
>>>>>     at org.apache.helix.manager.zk.ZkClient.getChildren(ZkClient.java:210)
>>>>>     at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:409)
>>>>>     at org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:279)
>>>>>     at org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:202)
>>>>>     at org.apache.helix.manager.zk.CallbackHandler.handleChildChange(CallbackHandler.java:391)
>>>>>     at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:570)
>>>>>     at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
>>>>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641250
>>>>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>>>>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>>>     at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1249)
>>>>>
>>>>> 2015-03-07 10:35:43,848 [ZkClient-EventThread-17-terrapinzk001a:2181] (RoutingTableProvider.java:99) INFO *Resetting* the routing table.
>>>>>
>>>>> On Thu, Mar 5, 2015 at 11:33 AM, Varun Sharma <[email protected]> wrote:
>>>>>
>>>>>> I suspect the callbacks have not been coming in for a long time now.
>>>>>>
>>>>>> On Thu, Mar 5, 2015 at 11:30 AM, Varun Sharma <[email protected]> wrote:
>>>>>>
>>>>>>> I grepped for this and found nothing:
>>>>>>>
>>>>>>> sudo grep START:INVOKE.*EXTERNALVIEW /var/log/terrapin/controller.log*
>>>>>>>
>>>>>>> I found a bunch of START:INVOKE entries for the IDEALSTATES znode, though.
>>>>>>>
>>>>>>> On Thu, Mar 5, 2015 at 11:15 AM, Zhen Zhang <[email protected]> wrote:
>>>>>>>
>>>>>>>> Yes, you should see a pair of "START:INVOKE..." and "END:INVOKE:..." lines for each callback in your log.
>>>>>>>> ------------------------------
>>>>>>>> *From:* Varun Sharma [[email protected]]
>>>>>>>> *Sent:* Thursday, March 05, 2015 11:11 AM
>>>>>>>> *To:* [email protected]
>>>>>>>> *Subject:* Re: RoutingTableProvider dropping callbacks
>>>>>>>>
>>>>>>>> OK, is there a way to confirm that the callbacks are being processed (from the logs, etc.)?
>>>>>>>>
>>>>>>>> On Thu, Mar 5, 2015 at 10:50 AM, Zhen Zhang <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Varun,
>>>>>>>>>
>>>>>>>>> This should not be a problem. When we register a callback, we expect a callback of type INIT first, followed by a sequence of CALLBACK types, and when you unregister the callback, you will receive a FINALIZED type. Since unregistering is an asynchronous operation, after you receive a FINALIZED type you might still see a couple of CALLBACK-type callbacks, which are simply ignored. The log is basically telling you that.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Jason
>>>>>>>>> ------------------------------
>>>>>>>>> *From:* Varun Sharma [[email protected]]
>>>>>>>>> *Sent:* Thursday, March 05, 2015 10:44 AM
>>>>>>>>> *To:* [email protected]
>>>>>>>>> *Subject:* RoutingTableProvider dropping callbacks
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> It seems that the RoutingTableProvider is dropping callbacks in our case.
>>>>>>>>> Here is a log:
>>>>>>>>>
>>>>>>>>> [ZkClient-EventThread-17-terrapinzk001a:2181] (CallbackHandler.java:130) WARN Skip processing callbacks for listener: com.pinterest.terrapin.controller.TerrapinRoutingTableProvider@7e7f8062, path: /main_a/EXTERNALVIEW, expected types: [INIT] but was CALLBACK
>>>>>>>>>
>>>>>>>>> We have a custom RoutingTableProvider that catches callbacks and does some processing, and this is causing a lot of issues for us. What could be causing this?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Varun
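Jason's description of the callback lifecycle (INIT first, then any number of CALLBACKs, then FINALIZED on unregister, with late CALLBACKs ignored) can be modeled as a small state machine. This is a toy sketch under stated assumptions. CallbackType and CallbackGate are hypothetical names that mirror the types named in the thread, not Helix's own classes. Out-of-order callbacks are recorded in the same "expected types: ... but was ..." shape as the warning in Varun's log.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;

/** Hypothetical callback types, named after the ones described in the thread. */
enum CallbackType { INIT, CALLBACK, FINALIZED }

/**
 * Toy model of the lifecycle described above: INIT first, then any number of
 * CALLBACKs, then FINALIZED on unregister. Anything out of order (for example a
 * late CALLBACK after FINALIZED) is skipped and recorded, not processed.
 */
final class CallbackGate {
    private EnumSet<CallbackType> expected = EnumSet.of(CallbackType.INIT);
    private final List<String> skipped = new ArrayList<>();

    /** Returns true if the callback should be processed. */
    boolean accept(CallbackType type) {
        if (!expected.contains(type)) {
            skipped.add("expected types: " + expected + " but was " + type);
            return false;
        }
        switch (type) {
            case INIT:
                expected = EnumSet.of(CallbackType.CALLBACK, CallbackType.FINALIZED);
                break;
            case CALLBACK:
                break; // still registered; keep accepting CALLBACK or FINALIZED
            case FINALIZED:
                expected = EnumSet.of(CallbackType.INIT); // only a fresh registration is accepted
                break;
        }
        return true;
    }

    List<String> skipped() { return Collections.unmodifiableList(skipped); }
}
```

In this model, a CALLBACK that arrives after FINALIZED, while the gate is back to expecting INIT, is skipped with the message "expected types: [INIT] but was CALLBACK". That matches Jason's reading of the warning as harmless on its own. It does not explain why callbacks later stopped arriving altogether.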

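The stack trace quoted above shows the failure path. The handler gets a child change on /main_a/EXTERNALVIEW, then calls getChildren on each resource znode to set watches. One of those znodes has just been deleted by the bucketized remove-then-set, so the call fails with NoNode. According to the thread, the subscription then fails, and the provider resets the routing table with no watcher left. Below is a minimal sketch of one possible mitigation under stated assumptions: a child that vanishes between listing the parent and subscribing to it is skipped instead of failing the whole pass. FakeTree, ChildSubscriber, and MissingNodeException are hypothetical toy classes, not Helix or ZkClient APIs, and this is not how Helix's CallbackHandler is written.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for ZooKeeper's NoNode error (unchecked for brevity). */
final class MissingNodeException extends RuntimeException {
    MissingNodeException(String path) { super("NoNode for " + path); }
}

/** Toy tree: znode path -> names of its children. Not a ZooKeeper client. */
final class FakeTree {
    private final Map<String, List<String>> children = new HashMap<>();

    void put(String path, String... kids) { children.put(path, new ArrayList<>(Arrays.asList(kids))); }

    void delete(String path) { children.remove(path); }

    List<String> getChildren(String path) {
        List<String> kids = children.get(path);
        if (kids == null) throw new MissingNodeException(path);
        return new ArrayList<>(kids);
    }
}

/** Sets "watches" on a parent and on each child, tolerating children that vanish mid-pass. */
final class ChildSubscriber {
    private final FakeTree tree;
    private final Set<String> watched = new TreeSet<>();

    ChildSubscriber(FakeTree tree) { this.tree = tree; }

    void subscribe(String parent) {
        List<String> kids = tree.getChildren(parent); // a missing parent is still an error
        watched.add(parent);
        for (String kid : kids) {
            String path = parent + "/" + kid;
            try {
                tree.getChildren(path);
                watched.add(path);
            } catch (MissingNodeException e) {
                // The child was deleted after the parent was listed. Skip it instead of
                // failing the whole pass; assuming the parent's child watch is still
                // registered, a later child-change event triggers another pass.
            }
        }
    }

    Set<String> watched() { return watched; }
}
```

If the parent itself is missing, subscribe still throws, because there is nothing to watch. Only a vanished child is tolerated.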