[ https://issues.apache.org/jira/browse/SAMZA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16389275#comment-16389275 ]
ASF GitHub Bot commented on SAMZA-1607:
---------------------------------------

GitHub user shanthoosh opened a pull request:

    https://github.com/apache/samza/pull/437

    SAMZA-1607: Handle ZkNodeNotExistsException in zkUtils.readProcessorData

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shanthoosh/samza fix_zkutils_get_processor_data

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/samza/pull/437.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #437

----
commit 2c6d5f9cee4d833d8f63823ee4078f41a726203f
Author: Shanthoosh Venkataraman <svenkataraman@...>
Date:   2018-02-12T23:25:36Z

    SAMZA-1607: Handle ZkNodeNotExists exception in zkUtils.readProcessorData().

----


> Handle ZkNodeNotExists exception in zkUtils.readProcessorData
> -------------------------------------------------------------
>
> Key: SAMZA-1607
> URL: https://issues.apache.org/jira/browse/SAMZA-1607
> Project: Samza
> Issue Type: Bug
> Reporter: Shanthoosh Venkataraman
> Assignee: Shanthoosh Venkataraman
> Priority: Major
>
> The existing implementation reads the data of the ephemeral processor nodes
> in ZooKeeper in two steps:
> A. Fetch the list of ephemeral processor nodes.
> B. Read the data of each processor node in that list.
> An ephemeral ZooKeeper node present in step A might no longer exist in step B.
> The resulting exception is currently unhandled and can unnecessarily kill the
> leader processor. Here's the related exception observed in a dev setup.
> {code:java}
> org.apache.samza.SamzaException: Cannot read ZK node: /app-test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0/test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-coordinationData/processors/0000000001
>     at org.apache.samza.zk.ZkUtils.readProcessorData(ZkUtils.java:232)
>     at org.apache.samza.zk.ZkUtils.getActiveProcessorsIDs(ZkUtils.java:255)
>     at org.apache.samza.zk.ZkJobCoordinator.getActualProcessorIds(ZkJobCoordinator.java:292)
>     at org.apache.samza.zk.ZkJobCoordinator.doOnProcessorChange(ZkJobCoordinator.java:194)
>     at org.apache.samza.zk.ZkJobCoordinator.lambda$onProcessorChange$1(ZkJobCoordinator.java:188)
>     at org.apache.samza.zk.ScheduleAfterDebounceTime.lambda$getScheduleableAction$0(ScheduleAfterDebounceTime.java:134)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /app-test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0/test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-coordinationData/processors/0000000001
>     at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
>     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:1001)
>     at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1100)
>     at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1095)
>     at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1084)
>     at org.apache.samza.zk.ZkUtils.readProcessorData(ZkUtils.java:226)
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
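The failure above is a check-then-act race: a processor node listed in step A can be deleted (for example, when its ZooKeeper session ends) before step B reads it. The sketch below shows one possible way to tolerate this, by skipping a node that vanished between the two steps instead of letting the exception reach the job coordinator. It is not the code from PR #437. `ProcessorDataReader`, `ZkStore`, `NoNodeException` and `RacyStore` are hypothetical stand-ins for ZkUtils, I0Itec's `ZkClient` and `ZkNoNodeException`, chosen so the example runs without a ZooKeeper server.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProcessorDataReader {

    /** Hypothetical stand-in for org.I0Itec.zkclient.exception.ZkNoNodeException. */
    static class NoNodeException extends RuntimeException {
        NoNodeException(String path) {
            super("KeeperErrorCode = NoNode for " + path);
        }
    }

    /** Hypothetical minimal client: list a node's children and read a node's data. */
    interface ZkStore {
        List<String> getChildren(String parentPath);

        /** Returns the node's data, or throws NoNodeException if the node does not exist. */
        String readData(String path);
    }

    /**
     * Step A lists the ephemeral processor nodes; step B reads each one.
     * A node listed in step A may be gone by step B, so a missing node is
     * skipped rather than propagated to the caller.
     */
    static Map<String, String> readProcessorData(ZkStore zk, String processorsPath) {
        Map<String, String> data = new LinkedHashMap<>();
        for (String child : zk.getChildren(processorsPath)) {     // step A
            String path = processorsPath + "/" + child;
            try {
                data.put(child, zk.readData(path));                 // step B
            } catch (NoNodeException e) {
                // The node disappeared between A and B; treat that processor as departed.
                System.err.println("Skipping vanished processor node " + path);
            }
        }
        return data;
    }

    /** In-memory store that deletes one node right after listing, to simulate the race. */
    static class RacyStore implements ZkStore {
        private final Map<String, String> nodes = new LinkedHashMap<>();
        private final String nodeToDropAfterListing;

        RacyStore(Map<String, String> initialNodes, String nodeToDropAfterListing) {
            this.nodes.putAll(initialNodes);
            this.nodeToDropAfterListing = nodeToDropAfterListing;
        }

        @Override
        public List<String> getChildren(String parentPath) {
            List<String> children = new ArrayList<>();
            for (String path : nodes.keySet()) {
                if (path.startsWith(parentPath + "/")) {
                    children.add(path.substring(parentPath.length() + 1));
                }
            }
            nodes.remove(nodeToDropAfterListing); // the ephemeral node vanishes after step A
            return children;
        }

        @Override
        public String readData(String path) {
            String value = nodes.get(path);
            if (value == null) {
                throw new NoNodeException(path);
            }
            return value;
        }
    }

    public static void main(String[] args) {
        Map<String, String> nodes = new LinkedHashMap<>();
        nodes.put("/processors/0000000000", "processor-a");
        nodes.put("/processors/0000000001", "processor-b");
        nodes.put("/processors/0000000002", "processor-c");
        ZkStore zk = new RacyStore(nodes, "/processors/0000000001");
        System.out.println(readProcessorData(zk, "/processors"));
    }
}
```

Running `main` prints `{0000000000=processor-a, 0000000002=processor-c}`: the processor whose node disappeared after listing is dropped instead of failing the leader. Skipping is a reasonable choice here because an ephemeral node that no longer exists usually means its owning session has ended. With the real I0Itec `ZkClient`, a similar effect could come from catching `ZkNoNodeException` around `readData(path)` or from the `readData(path, true)` overload, which returns `null` for a missing node.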