yep .. Seeing this

$ grep -i flap /var/log/streamio/streamio.log
2015-04-30 16:08:50,823 ERROR - ZKHelixManager - instanceName: ??--checkpointer is flapping. disconnect it. maxDisconnectThreshold: 5 disconnects in 300000ms.
2015-04-30 16:09:30,140 ERROR - ZKHelixManager - instanceName: ??-controller- is flapping. disconnect it. maxDisconnectThreshold: 5 disconnects in 300000ms.
2015-04-30 16:11:05,679 ERROR - ZKHelixManager - instanceName: ??-controller- is flapping. disconnect it. maxDisconnectThreshold: 5 disconnects in 300000ms.

and confirmed it's GCing from the logs. (Sorry, I originally had a bad dashboard that did not catch this.)

Thanks
Vinoth

On Thu, Apr 30, 2015 at 12:12 PM, Zhen Zhang <[email protected]> wrote:

> Hi Vinoth,
>
> The NPE indicates the zookeeper connection in ZkClient is NULL. The
> connection becomes NULL only when HelixManager#disconnect() is called. This
> may happen if you directly call HelixManager#disconnect() or there are
> frequent GCs and HelixManager disconnects itself. You may grep
> "KeeperState" to figure out the connection state changes.
>
> Thanks,
> Jason
>
>
> On Thu, Apr 30, 2015 at 11:53 AM, Vinoth Chandar <[email protected]> wrote:
>
>> Hi guys,
>>
>> I am hitting the following with 0.6.5, upon a ZK connection timeout. We
>> make this call to the PropertyStore to figure out an offset to resume from.
>> This error eventually puts every partition into an error state, and
>> everything comes to a grinding halt. Any pointers to troubleshoot this?
>> Nonetheless, there shouldn't be an NPE, right?
>>
>> NullPointerException
>>
>>    - org.apache.helix.manager.zk.ZkClient$4 in call at line 241
>>    - org.apache.helix.manager.zk.ZkClient$4 in call at line 237
>>    - org.I0Itec.zkclient.ZkClient in retryUntilConnected at line 675
>>    - org.apache.helix.manager.zk.ZkClient in readData at line 237
>>    - org.I0Itec.zkclient.ZkClient in readData at line 761
>>    - org.apache.helix.manager.zk.ZkBaseDataAccessor in get at line 308
>>    - org.apache.helix.manager.zk.ZkCacheBaseDataAccessor in get at line 377
>>    - org.apache.helix.store.zk.AutoFallbackPropertyStore in get at line 100
>>
>>
>> Thanks
>> Vinoth
>>
>
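
For anyone hitting the same thing, here is a minimal sketch of how the flapping lines from the grep at the top of this mail could be tallied per instance. The regex is an assumption derived only from the three log lines quoted in this thread, and `summarize_flaps` and `SAMPLE` are hypothetical names used for illustration; other Helix versions or log layouts may need a different pattern.

```python
import re
from collections import Counter

# Assumption: pattern inferred from the three log lines quoted in this thread.
# It is not taken from Helix itself and may not match other log layouts.
FLAP_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} ERROR - ZKHelixManager - "
    r"instanceName: (?P<instance>\S+) is flapping\. disconnect it\. "
    r"maxDisconnectThreshold: (?P<threshold>\d+) disconnects in (?P<window_ms>\d+)ms\."
)


def summarize_flaps(lines):
    """Count how many flapping reports each instance name produced."""
    counts = Counter()
    for line in lines:
        match = FLAP_RE.match(line.strip())
        if match:
            counts[match.group("instance")] += 1
    return counts


# The three lines from the grep output above, copied verbatim.
SAMPLE = [
    "2015-04-30 16:08:50,823 ERROR - ZKHelixManager - instanceName: ??--checkpointer "
    "is flapping. disconnect it. maxDisconnectThreshold: 5 disconnects in 300000ms.",
    "2015-04-30 16:09:30,140 ERROR - ZKHelixManager - instanceName: ??-controller- "
    "is flapping. disconnect it. maxDisconnectThreshold: 5 disconnects in 300000ms.",
    "2015-04-30 16:11:05,679 ERROR - ZKHelixManager - instanceName: ??-controller- "
    "is flapping. disconnect it. maxDisconnectThreshold: 5 disconnects in 300000ms.",
]

if __name__ == "__main__":
    # Print instances ordered by number of flapping reports.
    for instance, count in summarize_flaps(SAMPLE).most_common():
        print(instance, count)
```

In practice the lines would come from the log file rather than `SAMPLE`, for example by passing an open file handle to `summarize_flaps`. Lining the report timestamps up against GC pauses is left out here, since the GC log format is not shown in this thread.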
