Hi Vinoth,

The NPE indicates that the ZooKeeper connection in ZkClient is null. The connection becomes null only when HelixManager#disconnect() is called. This can happen if you call HelixManager#disconnect() directly, or if frequent GCs cause HelixManager to disconnect itself. You can grep the logs for "KeeperState" to trace the connection state changes.
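
If plain grep output is hard to read, the same check can be sketched as a small standalone Java program. This is only an illustration, not part of Helix: the class name KeeperStateScan and its methods are hypothetical, and it assumes each relevant log line contains "KeeperState" followed by one of ZooKeeper's state names, so the pattern may need adjusting to your actual log layout.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Helix or ZooKeeper class).
class KeeperStateScan {
    // Assumption: the log prints "KeeperState" followed by optional
    // punctuation/whitespace and then the ZooKeeper state name.
    private static final Pattern STATE = Pattern.compile(
        "KeeperState\\W*(SyncConnected|Disconnected|Expired|AuthFailed"
        + "|ConnectedReadOnly|SaslAuthenticated)");

    // Returns the state names in the order they appear in the log.
    static List<String> extractStates(List<String> logLines) {
        List<String> states = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = STATE.matcher(line);
            if (m.find()) {
                states.add(m.group(1));
            }
        }
        return states;
    }

    // Counts transitions from SyncConnected to any other state.
    static int countDrops(List<String> states) {
        int drops = 0;
        for (int i = 1; i < states.size(); i++) {
            boolean wasConnected = "SyncConnected".equals(states.get(i - 1));
            boolean isConnected = "SyncConnected".equals(states.get(i));
            if (wasConnected && !isConnected) {
                drops++;
            }
        }
        return drops;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java KeeperStateScan.java <log-file>");
            return;
        }
        List<String> states = extractStates(Files.readAllLines(Paths.get(args[0])));
        System.out.println("state sequence: " + states);
        System.out.println("drops from SyncConnected: " + countDrops(states));
    }
}
```

Many drops close together, or an Expired state shortly before the NPE, would be consistent with the manager having disconnected itself rather than an explicit disconnect() call in your code.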

Thanks,
Jason

On Thu, Apr 30, 2015 at 11:53 AM, Vinoth Chandar <[email protected]> wrote:

> Hi guys,
>
> I am hitting the following with 0.6.5, upon a ZK connection timeout. We
> make this call to the PropertyStore to figure out an offset to resume from.
> This error eventually puts every partition into an error state and comes to
> a grinding halt. Any pointers to troubleshoot this? Nonetheless, there
> shouldn't be an NPE, right?
>
> NullPointerException
>
>    - org.apache.helix.manager.zk.ZkClient$4 in call at line 241
>    - org.apache.helix.manager.zk.ZkClient$4 in call at line 237
>    - org.I0Itec.zkclient.ZkClient in retryUntilConnected at line 675
>    - org.apache.helix.manager.zk.ZkClient in readData at line 237
>    - org.I0Itec.zkclient.ZkClient in readData at line 761
>    - org.apache.helix.manager.zk.ZkBaseDataAccessor in get at line 308
>    - org.apache.helix.manager.zk.ZkCacheBaseDataAccessor in get at line 377
>    - org.apache.helix.store.zk.AutoFallbackPropertyStore in get at line 100
>
> Thanks
> Vinoth
