Yash Mayya created KAFKA-14732:
----------------------------------

             Summary: Use an exponential backoff retry mechanism while reconfiguring connector tasks
                 Key: KAFKA-14732
                 URL: https://issues.apache.org/jira/browse/KAFKA-14732
             Project: Kafka
          Issue Type: Improvement
          Components: KafkaConnect
            Reporter: Yash Mayya
            Assignee: Yash Mayya


Kafka Connect in distributed mode retries indefinitely, with a fixed retry 
backoff of 250 ms, when errors occur during connector task reconfiguration. 
Tasks can be "reconfigured" during connector startup (to get the initial task 
configs from the connector), on a connector resume, or when a connector 
explicitly requests it via its context. Task reconfiguration essentially 
entails asking a connector instance for its task configs and writing them to 
the Connect cluster's config storage (if a change in task configs is 
detected). A fixed retry backoff of 250 ms leads to very aggressive retries, 
up to four attempts per second. Consider a Debezium connector which attempts 
to initiate a database connection in its 
[taskConfigs method|https://github.com/debezium/debezium/blob/bf347da71ad9b0819998a3bc9754b3cc96cc1563/debezium-connector-sqlserver/src/main/java/io/debezium/connector/sqlserver/SqlServerConnector.java#L63]. 
If the connection fails due to something like an invalid login, the Connect 
worker will essentially spam connection attempts frequently and indefinitely 
(until the connector config / database side configs are fixed). An exponential 
backoff retry mechanism seems better suited for the 
[DistributedHerder::reconfigureConnectorTasksWithRetry|https://github.com/apache/kafka/blob/a54a34a11c1c867ff62a7234334cad5139547fd7/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1873-L1898] 
method.
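
To illustrate the proposal, here is a minimal sketch of how the delay between 
retries could grow. It is not the DistributedHerder code: the class name 
BackoffSketch, the method delayMs and the 60 second cap are all hypothetical, 
and only the 250 ms starting value comes from the current behaviour described 
above. A real implementation would probably also add some random jitter.

```java
final class BackoffSketch {
    // Starting delay: the fixed backoff the herder uses today, per this issue.
    static final long INITIAL_BACKOFF_MS = 250L;
    // Upper bound on the delay. Assumed value for illustration only.
    static final long MAX_BACKOFF_MS = 60_000L;

    /**
     * Delay to wait before retry number `attempt` (0-based).
     * Doubles on every attempt (250, 500, 1000, ...) until it reaches the cap.
     */
    static long delayMs(int attempt) {
        // Clamp the exponent so the left shift can never overflow a long.
        int exponent = Math.min(Math.max(attempt, 0), 20);
        return Math.min(INITIAL_BACKOFF_MS << exponent, MAX_BACKOFF_MS);
    }

    public static void main(String[] args) {
        // Print the schedule for the first ten retries.
        for (int attempt = 0; attempt < 10; attempt++) {
            System.out.println("retry " + attempt + " after " + delayMs(attempt) + " ms");
        }
    }
}
```

With these assumed values the delay reaches the cap on the ninth retry, so a 
connector with a persistently invalid login would be retried about once a 
minute instead of several times a second.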



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
