[ https://issues.apache.org/jira/browse/SAMZA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shanthoosh Venkataraman updated SAMZA-1607:
-------------------------------------------
    Summary: Handle ZkNodeNotExists exception in zkUtils.readProcessorData  (was: Fix bug in reading the ephemeral processor nodes from zookeeper.)

> Handle ZkNodeNotExists exception in zkUtils.readProcessorData
> -------------------------------------------------------------
>
>                 Key: SAMZA-1607
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1607
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Shanthoosh Venkataraman
>            Assignee: Shanthoosh Venkataraman
>            Priority: Major
>
> The existing implementation reads the data of the ephemeral processor nodes in
> ZooKeeper in two steps:
> A. Fetch the list of ephemeral processor nodes.
> B. Read the data of each processor node in that list.
> An ephemeral ZooKeeper node that is present in step A might no longer exist in
> step B (e.g. because the processor that created it shut down or its ZooKeeper
> session expired in between). The resulting ZkNoNodeException is currently
> unhandled and can unnecessarily kill the leader processor. Here's the related
> exception observed in a dev setup.
> {code:java}
> org.apache.samza.SamzaException: Cannot read ZK node: /app-test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0/test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-coordinationData/processors/0000000001
>     at org.apache.samza.zk.ZkUtils.readProcessorData(ZkUtils.java:232)
>     at org.apache.samza.zk.ZkUtils.getActiveProcessorsIDs(ZkUtils.java:255)
>     at org.apache.samza.zk.ZkJobCoordinator.getActualProcessorIds(ZkJobCoordinator.java:292)
>     at org.apache.samza.zk.ZkJobCoordinator.doOnProcessorChange(ZkJobCoordinator.java:194)
>     at org.apache.samza.zk.ZkJobCoordinator.lambda$onProcessorChange$1(ZkJobCoordinator.java:188)
>     at org.apache.samza.zk.ScheduleAfterDebounceTime.lambda$getScheduleableAction$0(ScheduleAfterDebounceTime.java:134)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /app-test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0/test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-coordinationData/processors/0000000001
>     at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
>     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:1001)
>     at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1100)
>     at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1095)
>     at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1084)
>     at org.apache.samza.zk.ZkUtils.readProcessorData(ZkUtils.java:226)
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
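The race described in this issue (a processor node listed in step A is gone by step B) can be tolerated by treating a missing node as a processor that is no longer active. The following is a minimal, self-contained Java sketch of that idea, not the actual Samza fix: `ZkReader`, `NoNodeException`, `RacyFakeZk` and `ProcessorDataReader` are hypothetical stand-ins. In the real code path, the exception to catch would be the `org.I0Itec.zkclient.exception.ZkNoNodeException` shown in the trace.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of the two-step read (A: list children, B: read each child) that
 * tolerates a child node disappearing between the two steps.
 * All types here are hypothetical stand-ins, not Samza or zkclient APIs.
 */
public class ProcessorDataReader {

  /** Hypothetical stand-in for zkclient's ZkNoNodeException. */
  static class NoNodeException extends RuntimeException {
    NoNodeException(String path) {
      super("NoNode for " + path);
    }
  }

  /** Hypothetical minimal ZooKeeper client interface. */
  interface ZkReader {
    List<String> getChildren(String path); // step A
    String readData(String path);          // step B; throws NoNodeException if the node is gone
  }

  /** Reads every processor node's data, skipping nodes deleted after step A. */
  static List<String> readProcessorData(ZkReader zk, String processorsPath) {
    List<String> result = new ArrayList<>();
    for (String child : zk.getChildren(processorsPath)) { // step A
      String childPath = processorsPath + "/" + child;
      try {
        result.add(zk.readData(childPath));               // step B
      } catch (NoNodeException e) {
        // The ephemeral node vanished after it was listed; treat the
        // processor as inactive instead of failing the whole read.
        System.err.println("Skipping vanished node " + childPath);
      }
    }
    return result;
  }

  /** In-memory fake that deletes one node right after listing, simulating the race. */
  static class RacyFakeZk implements ZkReader {
    final Map<String, String> nodes = new LinkedHashMap<>();
    String deleteAfterListing;

    public List<String> getChildren(String path) {
      List<String> children = new ArrayList<>();
      for (String p : nodes.keySet()) {
        if (p.startsWith(path + "/")) {
          children.add(p.substring(path.length() + 1));
        }
      }
      if (deleteAfterListing != null) {
        nodes.remove(deleteAfterListing); // node disappears between step A and step B
      }
      return children;
    }

    public String readData(String path) {
      String data = nodes.get(path);
      if (data == null) {
        throw new NoNodeException(path);
      }
      return data;
    }
  }

  public static void main(String[] args) {
    RacyFakeZk zk = new RacyFakeZk();
    zk.nodes.put("/processors/0000000000", "processor-a");
    zk.nodes.put("/processors/0000000001", "processor-b");
    zk.deleteAfterListing = "/processors/0000000001";
    System.out.println(readProcessorData(zk, "/processors")); // prints [processor-a]
  }
}
```

Skipping rather than retrying is a deliberate choice in this sketch: an ephemeral node is removed when its owning session ends, so the missing processor is presumably no longer part of the group, and the change to the children list should trigger another membership recomputation on the leader anyway.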