@Kishore, I think the remove is there in case the bucket size has changed, so
that we can clean out all the buckets written with the old size and write them
again using the new size.

The issue looks like a race condition between setting the bucketized external
view and adding watches on the child paths. I will investigate more.
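
To make the suspected interleaving concrete, here is a minimal sketch that replays it step by step against a toy in-memory store. This is not Helix or ZooKeeper code: the class BucketizedViewRace, the paths, and the assumption that buckets are stored as children of the resource znode are all hypothetical and only illustrate the ordering problem.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

/** Toy in-memory znode store (hypothetical; not ZooKeeper or Helix code). */
public class BucketizedViewRace {
    private final Map<String, String> nodes = new TreeMap<>();

    void set(String path, String data) {
        nodes.put(path, data);
    }

    /** Removes the node and everything under it, like a recursive delete. */
    void remove(String path) {
        nodes.keySet().removeIf(p -> p.equals(path) || p.startsWith(path + "/"));
    }

    /** Lists direct children; throws if the parent node is missing. */
    List<String> getChildren(String path) {
        if (!nodes.containsKey(path)) {
            throw new NoSuchElementException("NoNode for " + path);
        }
        List<String> children = new ArrayList<>();
        String prefix = path + "/";
        for (String p : nodes.keySet()) {
            if (p.startsWith(prefix) && p.indexOf('/', prefix.length()) < 0) {
                children.add(p);
            }
        }
        return children;
    }

    /** Plays one bad interleaving and returns what the listener observed. */
    static String simulate() {
        BucketizedViewRace zk = new BucketizedViewRace();
        String root = "/EXTERNALVIEW";
        String resource = root + "/resourceA";
        zk.set(root, "");
        zk.set(resource, "v1");
        zk.set(resource + "/bucket0", "v1");

        // Listener, step 1: list the resources under the root.
        List<String> resources = zk.getChildren(root);

        // Writer: remove the bucketized resource before rewriting it.
        zk.remove(resource);

        // Listener, step 2: subscribe to each resource it listed a moment ago.
        String outcome = "subscribed";
        try {
            for (String r : resources) {
                zk.getChildren(r);
            }
        } catch (NoSuchElementException e) {
            outcome = e.getMessage();
        }

        // Writer finishes: the node exists again, but the listener already failed.
        zk.set(resource, "v2");
        zk.set(resource + "/bucket0", "v2");
        return outcome;
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```

In this replay the listener fails on a node that exists both before and after the write, which is the shape of the NoNode failure in the trace, although the real cause still needs to be confirmed.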

Thanks,
Jason
________________________________
From: Varun Sharma [[email protected]]
Sent: Saturday, March 07, 2015 11:07 PM
To: [email protected]
Subject: Re: RoutingTableProvider dropping callbacks

Please find the attached log file with the above trace.

On Sat, Mar 7, 2015 at 8:12 PM, kishore g 
<[email protected]<mailto:[email protected]>> wrote:
Another thing is that the RoutingTable is logging this line "Resetting the 
routing table.". Looks like this happens when we fail to set the watcher.

thanks,
Kishore G

On Sat, Mar 7, 2015 at 8:05 PM, kishore g 
<[email protected]<mailto:[email protected]>> wrote:
Your explanation makes sense.

https://github.com/apache/helix/blob/helix-0.6.4/helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixDataAccessor.java.
 For a bucketized resource, we see that the path is deleted and then set again.
Jason, any idea why we are removing the path?

case EXTERNALVIEW:
  if (value.getBucketSize() == 0) {
    records.add(value.getRecord());
  } else {
    _baseDataAccessor.remove(path, options);

On Sat, Mar 7, 2015 at 4:03 PM, Varun Sharma 
<[email protected]<mailto:[email protected]>> wrote:
How does writing the external view work for bucketized resources? Is it
possible that the top-level znode for the resource is first deleted and then
rewritten with the latest external view?

On Sat, Mar 7, 2015 at 3:56 PM, Varun Sharma 
<[email protected]<mailto:[email protected]>> wrote:
Here is the stack trace. There is a ZooKeeper race, and the detailed stack
trace appears for bucketized resources. I saw that the ideal state for the
resource was created on 26th Feb and modified on 7th March. However, the
external view for the resource shows up as both created and modified on 7th
March. The external view was created at 10:36:04 on 7th March, which is 20
seconds after the stack trace below was logged. After this, the routing table
provider no longer receives any zk callbacks.


2015-03-07 10:35:43,735 [main-EventThread] (ZkAsyncCallbacks.java:127) WARN  org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandler@3c8589f0, rc:NONODE, path: /main_a/EXTERNALVIEW/$terrapin$data$visual_seo_joins_staging$1422384697040

2015-03-07 10:35:43,736 [main-EventThread] (ZkAsyncCallbacks.java:127) WARN  org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandler@63230a9a, rc:NONODE, path: /main_a/EXTERNALVIEW/$terrapin$data$recommendation_p2p_exp_candset_1$1425671237739

2015-03-07 10:35:43,736 [main-EventThread] (ZkAsyncCallbacks.java:127) WARN  org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandler@118d374f, rc:NONODE, path: /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641250

2015-03-07 10:35:43,736 [ZkClient-EventThread-17-terrapinzk001a:2181] (CallbackHandler.java:304) WARN  fail to subscribe child/data change. path: /main_a/EXTERNALVIEW, listener: com.pinterest.terrapin.controller.TerrapinRoutingTableProvider@2c6691da

org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641250
        at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
        at org.apache.helix.manager.zk.ZkClient.getChildren(ZkClient.java:210)
        at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:409)
        at org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:279)
        at org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:202)
        at org.apache.helix.manager.zk.CallbackHandler.handleChildChange(CallbackHandler.java:391)
        at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:570)
        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641250
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1249)

2015-03-07 10:35:43,848 [ZkClient-EventThread-17-terrapinzk001a:2181] (RoutingTableProvider.java:99) INFO  Resetting the routing table.

On Thu, Mar 5, 2015 at 11:33 AM, Varun Sharma 
<[email protected]<mailto:[email protected]>> wrote:
I suspect the callbacks have not been coming in for a long time now.

On Thu, Mar 5, 2015 at 11:30 AM, Varun Sharma 
<[email protected]<mailto:[email protected]>> wrote:
I grepped this and found nothing:


sudo grep START:INVOKE.*EXTERNALVIEW /var/log/terrapin/controller.log*

I found a bunch of START:INVOKE for the IDEALSTATES znode though.

On Thu, Mar 5, 2015 at 11:15 AM, Zhen Zhang 
<[email protected]<mailto:[email protected]>> wrote:
Yes, you should see a pair of "START:INVOKE..." and "END:INVOKE:..." for each 
callback in your log.
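
If grepping by hand gets tedious, a small check along these lines could count the pairs for one path. It is only a sketch: it assumes each log line carries the literal text START:INVOKE or END:INVOKE together with the path, and the class InvokePairCheck and its method are hypothetical names, not part of Helix.

```java
import java.util.List;

/** Counts START:INVOKE and END:INVOKE lines for one path (hypothetical helper). */
public class InvokePairCheck {

    /** Returns {starts, ends} for lines that mention the given path. */
    static int[] count(List<String> lines, String path) {
        int starts = 0;
        int ends = 0;
        for (String line : lines) {
            if (!line.contains(path)) {
                continue; // a different znode, e.g. IDEALSTATES
            }
            if (line.contains("START:INVOKE")) {
                starts++;
            } else if (line.contains("END:INVOKE")) {
                ends++;
            }
        }
        return new int[] {starts, ends};
    }

    /** True when at least one callback ran and every start has a matching end. */
    static boolean looksHealthy(List<String> lines, String path) {
        int[] c = count(lines, path);
        return c[0] > 0 && c[0] == c[1];
    }
}
```

A count of zero starts for a path means no callback was invoked for it at all, while more starts than ends would suggest a callback that never finished.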
________________________________
From: Varun Sharma [[email protected]<mailto:[email protected]>]
Sent: Thursday, March 05, 2015 11:11 AM
To: [email protected]<mailto:[email protected]>
Subject: Re: RoutingTableProvider dropping callbacks

OK, is there a way to confirm from the logs that the callbacks are being
processed?

On Thu, Mar 5, 2015 at 10:50 AM, Zhen Zhang 
<[email protected]<mailto:[email protected]>> wrote:
Hi Varun,

This should not be a problem. When we register a callback, we expect a
callback of type INIT first, followed by a sequence of CALLBACK types, and
when you unregister the callback, you will receive a FINALIZED type. Since
unregister is an async operation, you might still see a couple of CALLBACK
type callbacks after you receive the FINALIZED type; these are simply ignored.
The log is basically telling you that.
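
As a rough illustration of that sequencing, here is a minimal sketch of a gate that only lets through the callback types it currently expects. It is not the actual CallbackHandler code; the class CallbackTypeGate and its enum are hypothetical and use the type names from this thread.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical gate that models INIT, then CALLBACK..., then FINALIZED. */
public class CallbackTypeGate {
    enum Type { INIT, CALLBACK, FINALIZED }

    // Before registration (and again after FINALIZED) only INIT is expected.
    private Set<Type> expected = EnumSet.of(Type.INIT);

    /** Returns true if the callback is processed, false if it is skipped. */
    synchronized boolean accept(Type type) {
        if (!expected.contains(type)) {
            // Corresponds to "expected types: [INIT] but was CALLBACK".
            return false;
        }
        expected = (type == Type.FINALIZED)
            ? EnumSet.of(Type.INIT)
            : EnumSet.of(Type.CALLBACK, Type.FINALIZED);
        return true;
    }
}
```

With this model, a CALLBACK that arrives after FINALIZED is skipped because the gate is back to expecting INIT, which matches the warning in the log.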

Thanks,
Jason
________________________________
From: Varun Sharma [[email protected]<mailto:[email protected]>]
Sent: Thursday, March 05, 2015 10:44 AM
To: [email protected]<mailto:[email protected]>
Subject: RoutingTableProvider dropping callbacks

Hi,

It seems that the RoutingTableProvider is dropping callbacks in our case. Here 
is a log:


[ZkClient-EventThread-17-terrapinzk001a:2181] (CallbackHandler.java:130) WARN  
Skip processing callbacks for listener: 
com.pinterest.terrapin.controller.TerrapinRoutingTableProvider@7e7f8062, path: 
/main_a/EXTERNALVIEW, expected types: [INIT] but was CALLBACK


We have a custom RoutingTableProvider to catch callbacks and do some
processing, and this is causing a lot of issues for us. What could be causing
this?

Thanks
Varun







