[ https://issues.apache.org/jira/browse/KAFKA-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ron Dagostino resolved KAFKA-14890.
-----------------------------------
    Resolution: Duplicate

Duplicate of https://issues.apache.org/jira/browse/KAFKA-14887

> Kafka initiates shutdown due to connectivity problem with Zookeeper and 
> FatalExitError from ChangeNotificationProcessorThread
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-14890
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14890
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 3.3.2
>            Reporter: Denis Razuvaev
>            Priority: Major
>
> Hello,
> We have hit a deadlock in Kafka several times; a similar issue is
> https://issues.apache.org/jira/browse/KAFKA-13544
> The question: is it expected behavior that Kafka decides to shut down because
> of connectivity problems with Zookeeper? It seems to be related to the
> inability to read data from the */feature* ZK node and to the
> _ZooKeeperClientExpiredException_ thrown from the _ZooKeeperClient_ class.
> This exception is caught only in the catch block of the _doWork()_ method
> in _ChangeNotificationProcessorThread_, where it leads to a
> _FatalExitError_.
> The shutdown problem also reproduces in newer versions of Kafka (which
> already include the fix for the deadlock from KAFKA-13544).
> It is hard to write a synthetic test that reproduces the problem, but it can
> be reproduced locally in debug mode with the following steps:
> 1) Start Zookeeper and start Kafka in debug mode.
> 2) Emulate a connectivity problem between Kafka and Zookeeper; for example,
> the connection can be closed via the Netcrusher library.
> 3) Put a breakpoint in the _updateLatestOrThrow()_ method of the
> _FeatureCacheUpdater_ class, before the
> _zkClient.getDataAndVersion(featureZkNodePath)_ line executes.
> 4) Restore the connection between Kafka and Zookeeper after the session has
> expired. Kafka execution should stop at the breakpoint.
> 5) Resume execution until Kafka starts to execute the line
> _zooKeeperClient.handleRequests(remainingRequests)_ in the
> _retryRequestsUntilConnected_ method of the _KafkaZkClient_ class.
> 6) Emulate the connectivity problem between Kafka and Zookeeper again and
> wait until the session expires.
> 7) Restore the connection between Kafka and Zookeeper.
> 8) Kafka begins the shutdown process, due to:
> _ERROR [feature-zk-node-event-process-thread]: Failed to process feature ZK 
> node change event. The broker will eventually exit. 
> (kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread)_
>  
> In a real environment, this can be triggered by network problems that cause
> repeated disconnects from and reconnects to Zookeeper within a short time
> period.
> I started a mail thread at
> [https://lists.apache.org/thread/gbk4scwd8g7mg2tfsokzj5tjgrjrb9dw] about
> this problem, but have received no answers.
> To me this looks like a defect that should be fixed, because Kafka initiates
> shutdown after the connection between Kafka and Zookeeper has been restored.
> Thank you.
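
The failure mode described in the report (an exception raised during a ZK read is caught only by the catch block of _doWork()_, which escalates to a fatal exit) can be illustrated with a small, self-contained sketch. This is not Kafka code: every name below (SessionExpirySketch, SessionExpiredException, NodeReader, Outcome, flakyReader) is hypothetical and only stands in for the classes named in the report. The second method shows one possible alternative, treating session expiry as transient; it is an assumption for illustration and not necessarily how KAFKA-14887 addresses the problem.

```java
public class SessionExpirySketch {
    /** Hypothetical stand-in for ZooKeeperClientExpiredException. */
    public static class SessionExpiredException extends RuntimeException {
        public SessionExpiredException(String msg) { super(msg); }
    }

    /** Hypothetical stand-in for a ZooKeeper node read that may fail. */
    public interface NodeReader {
        String read(String path);
    }

    /** What a single doWork() pass ends up doing in this sketch. */
    public enum Outcome { PROCESSED, FATAL_EXIT, RETRY_LATER }

    /** Mirrors the reported behavior: any exception in doWork is fatal. */
    public static Outcome doWorkFatalOnAnyError(NodeReader reader) {
        try {
            reader.read("/feature");
            return Outcome.PROCESSED;
        } catch (RuntimeException e) {
            // In the report, this path ends in FatalExitError and broker shutdown.
            return Outcome.FATAL_EXIT;
        }
    }

    /** Assumed alternative: session expiry is transient, other errors stay fatal. */
    public static Outcome doWorkRetryOnExpiry(NodeReader reader) {
        try {
            reader.read("/feature");
            return Outcome.PROCESSED;
        } catch (SessionExpiredException e) {
            return Outcome.RETRY_LATER;
        } catch (RuntimeException e) {
            return Outcome.FATAL_EXIT;
        }
    }

    /** Reader that throws session expiry a fixed number of times, then succeeds. */
    public static NodeReader flakyReader(int failures) {
        int[] remaining = {failures};
        return path -> {
            if (remaining[0] > 0) {
                remaining[0]--;
                throw new SessionExpiredException("session expired while reading " + path);
            }
            return "data";
        };
    }

    public static void main(String[] args) {
        // One expired session is enough to reach the fatal path.
        System.out.println(doWorkFatalOnAnyError(flakyReader(1)));
        // With the assumed alternative, the same failure is retried and then succeeds.
        NodeReader reader = flakyReader(1);
        System.out.println(doWorkRetryOnExpiry(reader));
        System.out.println(doWorkRetryOnExpiry(reader));
    }
}
```

Running main prints FATAL_EXIT, then RETRY_LATER, then PROCESSED, which matches the scenario in the report: the connection is healthy again by the time the second attempt would run, but the first variant has already chosen to shut down.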



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
