[jira] [Created] (KAFKA-13045) Add test for batched OffsetFetch requests where we have the same group in the request twice.
Sanjana Kaundinya created KAFKA-13045: - Summary: Add test for batched OffsetFetch requests where we have the same group in the request twice. Key: KAFKA-13045 URL: https://issues.apache.org/jira/browse/KAFKA-13045 Project: Kafka Issue Type: Test Reporter: Sanjana Kaundinya -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KAFKA-9678) Introduce bounded exponential backoff in clients
[ https://issues.apache.org/jira/browse/KAFKA-9678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjana Kaundinya resolved KAFKA-9678. -- Resolution: Duplicate > Introduce bounded exponential backoff in clients > > > Key: KAFKA-9678 > URL: https://issues.apache.org/jira/browse/KAFKA-9678 > Project: Kafka > Issue Type: Improvement > Components: admin, consumer, producer >Reporter: Guozhang Wang >Assignee: Sanjana Kaundinya >Priority: Major > Labels: needs-kip > > In all clients (consumer, producer, admin, and streams) we have retry > mechanisms with fixed backoff to handle transient connection issues with > brokers. However, with a small backoff (many default to 100ms) we could send > tens of requests per second to the broker, and if the connection issue is > prolonged it means a huge overhead. > We should consider introducing upper-bounded exponential backoff universally > in those clients to reduce the number of retry requests during the period of > connection partitioning. -- This message was sent by Atlassian Jira (v8.3.4#803005)
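The upper-bounded exponential backoff described in the issue can be sketched as follows. This is a minimal illustration, not Kafka's actual implementation; the class and method names are hypothetical, and jitter (which a production implementation would add) is omitted for clarity.

```java
// Minimal sketch of upper-bounded exponential backoff: the delay doubles per
// attempt but never exceeds a fixed cap, avoiding a tight retry loop during
// prolonged connection issues. Names here are illustrative, not Kafka's API.
public class BoundedBackoff {
    private final long initialMs;
    private final long maxMs;
    private final double multiplier;

    public BoundedBackoff(long initialMs, long maxMs, double multiplier) {
        this.initialMs = initialMs;
        this.maxMs = maxMs;
        this.multiplier = multiplier;
    }

    /** Backoff in milliseconds for the given retry attempt, capped at maxMs. */
    public long backoffMs(int attempt) {
        double backoff = initialMs * Math.pow(multiplier, attempt);
        return (long) Math.min(backoff, (double) maxMs);
    }

    public static void main(String[] args) {
        BoundedBackoff backoff = new BoundedBackoff(100, 1000, 2.0);
        for (int attempt = 0; attempt < 6; attempt++) {
            // prints 100, 200, 400, 800, 1000, 1000
            System.out.println("attempt " + attempt + " -> " + backoff.backoffMs(attempt) + " ms");
        }
    }
}
```

With a 100ms initial backoff and a 1000ms cap, the delay grows 100, 200, 400, 800 and then saturates at 1000ms, so a prolonged outage costs at most one request per second per client instead of ten.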
[jira] [Reopened] (KAFKA-9678) Introduce bounded exponential backoff in clients
[ https://issues.apache.org/jira/browse/KAFKA-9678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjana Kaundinya reopened KAFKA-9678: -- > Introduce bounded exponential backoff in clients > > > Key: KAFKA-9678 > URL: https://issues.apache.org/jira/browse/KAFKA-9678 > Project: Kafka > Issue Type: Improvement > Components: admin, consumer, producer >Reporter: Guozhang Wang >Assignee: Sanjana Kaundinya >Priority: Major > Labels: needs-kip > > In all clients (consumer, producer, admin, and streams) we have retry > mechanisms with fixed backoff to handle transient connection issues with > brokers. However, with a small backoff (many default to 100ms) we could send > tens of requests per second to the broker, and if the connection issue is > prolonged it means a huge overhead. > We should consider introducing upper-bounded exponential backoff universally > in those clients to reduce the number of retry requests during the period of > connection partitioning. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KAFKA-9893) Configurable TCP connection timeout and improve the initial metadata fetch
[ https://issues.apache.org/jira/browse/KAFKA-9893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjana Kaundinya resolved KAFKA-9893. -- Resolution: Implemented > Configurable TCP connection timeout and improve the initial metadata fetch > -- > > Key: KAFKA-9893 > URL: https://issues.apache.org/jira/browse/KAFKA-9893 > Project: Kafka > Issue Type: New Feature >Reporter: Cheng Tan >Assignee: Cheng Tan >Priority: Major > > This issue has two parts: > # Support the transport-layer connection timeout described in KIP-601 > # Optimize the logic for NetworkClient.leastLoadedNode() > Changes: > # Added a new common client configuration parameter > socket.connection.setup.timeout.ms to the NetworkClient. A potential > transport-layer timeout is handled using the same approach as a potential > request timeout. > # When no connected channel exists, leastLoadedNode() will now provide a > disconnected node that has the least number of failed attempts. > # ClusterConnectionStates will keep the connecting node ids. It also has > several new public methods to provide relevant per-connection data. -- This message was sent by Atlassian Jira (v8.3.4#803005)
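The `leastLoadedNode()` fallback described above, choosing a disconnected node with the fewest failed connection attempts when no channel is connected, can be sketched like this. The class and method names are hypothetical stand-ins, not the actual `NetworkClient` internals.

```java
import java.util.Map;
import java.util.Optional;

// Illustrative sketch of the leastLoadedNode() fallback: when no connected
// channel exists, prefer the disconnected node with the fewest failed
// connection attempts. Names are hypothetical, not NetworkClient code.
public class LeastLoadedFallback {
    /** Returns the id of the disconnected node with the fewest failed attempts, if any. */
    static Optional<String> pickDisconnectedNode(Map<String, Integer> failedAttemptsByNode) {
        return failedAttemptsByNode.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        Map<String, Integer> attempts = Map.of("broker-0", 3, "broker-1", 1, "broker-2", 2);
        // broker-1 has the fewest failed attempts, so it is tried first
        System.out.println(pickDisconnectedNode(attempts).orElse("none"));
    }
}
```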
[jira] [Resolved] (KAFKA-9678) Introduce bounded exponential backoff in clients
[ https://issues.apache.org/jira/browse/KAFKA-9678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjana Kaundinya resolved KAFKA-9678. -- Resolution: Duplicate > Introduce bounded exponential backoff in clients > > > Key: KAFKA-9678 > URL: https://issues.apache.org/jira/browse/KAFKA-9678 > Project: Kafka > Issue Type: Improvement > Components: admin, consumer, producer >Reporter: Guozhang Wang >Assignee: Sanjana Kaundinya >Priority: Major > Labels: needs-kip > > In all clients (consumer, producer, admin, and streams) we have retry > mechanisms with fixed backoff to handle transient connection issues with > brokers. However, with a small backoff (many default to 100ms) we could send > tens of requests per second to the broker, and if the connection issue is > prolonged it means a huge overhead. > We should consider introducing upper-bounded exponential backoff universally > in those clients to reduce the number of retry requests during the period of > connection partitioning. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-10021) When reading to the end of the config log, check if fetch.max.wait.ms is greater than worker.sync.timeout.ms
Sanjana Kaundinya created KAFKA-10021: - Summary: When reading to the end of the config log, check if fetch.max.wait.ms is greater than worker.sync.timeout.ms Key: KAFKA-10021 URL: https://issues.apache.org/jira/browse/KAFKA-10021 Project: Kafka Issue Type: Bug Components: KafkaConnect Reporter: Sanjana Kaundinya Currently in the Connect code in DistributedHerder.java, we see the following piece of code {{if (!canReadConfigs && !readConfigToEnd(workerSyncTimeoutMs)) return; // Safe to return and tick immediately because readConfigToEnd will do the backoff for us}} where the workerSyncTimeoutMs passed in is the timeout given to read to the end of the config log. This is a bug, as we should check whether fetch.max.wait.ms is greater than worker.sync.timeout.ms and, if it is, use worker.sync.timeout.ms as the fetch.max.wait.ms. A better fix would be to use the AdminClient to read to the end of the log, but at a minimum we should check the configs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
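The config check proposed above amounts to capping the consumer's `fetch.max.wait.ms` at `worker.sync.timeout.ms`, so that a single fetch while reading to the end of the config log cannot outlast the sync timeout. A minimal sketch, with a hypothetical helper name:

```java
// Hypothetical sketch of the proposed config check: the effective fetch wait
// is the smaller of fetch.max.wait.ms and worker.sync.timeout.ms, so a
// read-to-end of the config log cannot block past the sync timeout.
public class ConnectTimeoutCheck {
    static long effectiveFetchMaxWaitMs(long fetchMaxWaitMs, long workerSyncTimeoutMs) {
        return Math.min(fetchMaxWaitMs, workerSyncTimeoutMs);
    }

    public static void main(String[] args) {
        // a fetch.max.wait.ms below the sync timeout is left alone
        System.out.println(effectiveFetchMaxWaitMs(500, 3000));  // 500
        // one above the sync timeout is capped to it
        System.out.println(effectiveFetchMaxWaitMs(5000, 3000)); // 3000
    }
}
```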
[jira] [Created] (KAFKA-9988) Log incorrectly reports task has failed when task takes too long to shutdown
Sanjana Kaundinya created KAFKA-9988: Summary: Log incorrectly reports task has failed when task takes too long to shutdown Key: KAFKA-9988 URL: https://issues.apache.org/jira/browse/KAFKA-9988 Project: Kafka Issue Type: Bug Components: KafkaConnect Reporter: Sanjana Kaundinya If the OffsetStorageReader is closed while the task is trying to shutdown, and the task is trying to access the offsets from the OffsetStorageReader, then we see the following in the logs. {code:java} [2020-05-05 05:28:58,937] ERROR WorkerSourceTask{id=replicator-18} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask) org.apache.kafka.connect.errors.ConnectException: Failed to fetch offsets. at org.apache.kafka.connect.storage.OffsetStorageReaderImpl.offsets(OffsetStorageReaderImpl.java:114) at org.apache.kafka.connect.storage.OffsetStorageReaderImpl.offset(OffsetStorageReaderImpl.java:63) at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:205) at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175) at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.kafka.connect.errors.ConnectException: Offset reader closed while attempting to read offsets. This is likely because the task was been scheduled to stop but has taken longer than the graceful shutdown period to do so. at org.apache.kafka.connect.storage.OffsetStorageReaderImpl.offsets(OffsetStorageReaderImpl.java:103) ... 
14 more [2020-05-05 05:28:58,937] ERROR WorkerSourceTask{id=replicator-18} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask) {code} This is misleading, because the task is already on its way to being shut down and doesn't actually need manual intervention to be restarted. We can see this because later in the logs it throws another unrecoverable exception. {code:java} [2020-05-05 05:40:39,361] ERROR WorkerSourceTask{id=replicator-18} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask) {code} If we know a task is on its way to shutting down, we should not throw a ConnectException and instead log a warning, so that we don't log false negatives. -- This message was sent by Atlassian Jira (v8.3.4#803005)
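The behavior suggested above, downgrading the error to a warning when the task is already stopping, can be sketched as follows. The `taskStopping` flag and the method name are illustrative, not the actual `WorkerSourceTask` code.

```java
// Sketch of the suggested fix: when the offset reader is closed because the
// task is already stopping, emit a warning instead of an unrecoverable error.
// Names and the returned log lines are illustrative, not Connect internals.
public class OffsetReadGuard {
    /** Chooses the log line to emit when the offset reader is found closed. */
    static String onOffsetReaderClosed(boolean taskStopping) {
        if (taskStopping) {
            // The task was already asked to stop; no manual restart is needed.
            return "WARN Offset reader closed during task shutdown; ignoring";
        }
        // Genuinely unexpected: surface it as an error as before.
        return "ERROR Failed to fetch offsets";
    }

    public static void main(String[] args) {
        System.out.println(onOffsetReaderClosed(true));
        System.out.println(onOffsetReaderClosed(false));
    }
}
```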
[jira] [Created] (KAFKA-9621) AdminClient listOffsets operation does not respect retries and backoff
Sanjana Kaundinya created KAFKA-9621: Summary: AdminClient listOffsets operation does not respect retries and backoff Key: KAFKA-9621 URL: https://issues.apache.org/jira/browse/KAFKA-9621 Project: Kafka Issue Type: Bug Components: admin Reporter: Sanjana Kaundinya Assignee: Cheng Tan Fix For: 2.6.0 Similar to https://issues.apache.org/jira/browse/KAFKA-9047, currently the listOffsets operation doesn't respect the configured retries and backoff for a given call. For example, the code path could go like so: 1) Make a metadata request and schedule subsequent list offsets calls 2) Metadata error comes back with `InvalidMetadataException` 3) Go back to 1 The problem here is that the state is not preserved across calls. We lose the information about how many times the call has been retried and how far out the next attempt should be scheduled. This could lead to a tight retry loop and put pressure on the brokers. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-9580) Log clearer error messages when there is an offset out of range
Sanjana Kaundinya created KAFKA-9580: Summary: Log clearer error messages when there is an offset out of range Key: KAFKA-9580 URL: https://issues.apache.org/jira/browse/KAFKA-9580 Project: Kafka Issue Type: Bug Components: core Reporter: Sanjana Kaundinya Currently the message logged in [Fetcher::initializeCompletedFetch|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java#L1259] does not say which offsets are actually out of range, which makes it harder to debug Consumer issues in this situation. It would be much more helpful to log a message similar to [Log::read|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L1468] to make debugging easier. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KAFKA-9306) Kafka Consumer does not clean up all metrics after shutdown
[ https://issues.apache.org/jira/browse/KAFKA-9306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjana Kaundinya resolved KAFKA-9306. -- Fix Version/s: 2.4.1 Resolution: Fixed > Kafka Consumer does not clean up all metrics after shutdown > --- > > Key: KAFKA-9306 > URL: https://issues.apache.org/jira/browse/KAFKA-9306 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Colin McCabe >Assignee: Colin McCabe >Priority: Major > Fix For: 2.4.1 > > > The Kafka Consumer does not clean up all metrics after shutdown. It seems > like this was a regression introduced in Kafka 2.4 when we added the > KafkaConsumerMetrics class. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-9558) getListOffsetsCalls doesn't update node in case of leader change
Sanjana Kaundinya created KAFKA-9558: Summary: getListOffsetsCalls doesn't update node in case of leader change Key: KAFKA-9558 URL: https://issues.apache.org/jira/browse/KAFKA-9558 Project: Kafka Issue Type: Bug Components: admin Affects Versions: 2.5.0 Reporter: Sanjana Kaundinya Assignee: Sanjana Kaundinya As seen here: [https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/admin/KafkaAdminClient.java#L3810] In handling the response in the `listOffsets` call, if there are errors in the topic partition that require a metadata refresh, it simply passes the call object as `this`. This produces incorrect behavior if there was a leader change, because the call object never gets its leader node updated. This results in a tight loop of list offsets requests being sent to the same old leader and never returning offsets, even though the metadata was correctly updated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
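The fix implied by the issue is that a retried call should re-resolve the partition leader from the refreshed metadata rather than reusing the node cached in the original call object. A minimal sketch with hypothetical names (these are not `KafkaAdminClient` internals):

```java
import java.util.Map;

// Illustrative sketch: on a metadata-refresh retry, look up the partition's
// leader in the refreshed metadata instead of reusing the stale cached node.
// All names here are hypothetical, not KafkaAdminClient code.
public class LeaderAwareRetry {
    /** Node to target on retry: the refreshed leader if known, else the cached one. */
    static String nodeForRetry(String partition, Map<String, String> refreshedLeaderByPartition,
                               String cachedLeader) {
        return refreshedLeaderByPartition.getOrDefault(partition, cachedLeader);
    }

    public static void main(String[] args) {
        Map<String, String> refreshed = Map.of("topic-0", "broker-2");
        // After a leader change, the retry goes to broker-2, not the stale broker-1
        System.out.println(nodeForRetry("topic-0", refreshed, "broker-1"));
    }
}
```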
[jira] [Created] (KAFKA-9520) Deprecate ZooKeeper access for kafka-reassign-partitions.sh
Sanjana Kaundinya created KAFKA-9520: Summary: Deprecate ZooKeeper access for kafka-reassign-partitions.sh Key: KAFKA-9520 URL: https://issues.apache.org/jira/browse/KAFKA-9520 Project: Kafka Issue Type: Sub-task Affects Versions: 2.5.0 Reporter: Sanjana Kaundinya Fix For: 2.5.0 As part of KIP-555, ZooKeeper access must be deprecated for kafka-reassign-partitions.sh. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-9519) Deprecate ZooKeeper access for kafka-configs.sh
Sanjana Kaundinya created KAFKA-9519: Summary: Deprecate ZooKeeper access for kafka-configs.sh Key: KAFKA-9519 URL: https://issues.apache.org/jira/browse/KAFKA-9519 Project: Kafka Issue Type: Sub-task Affects Versions: 2.5.0 Reporter: Sanjana Kaundinya Assignee: Sanjana Kaundinya Fix For: 2.5.0 As part of KIP-555, ZooKeeper access must be deprecated for kafka-configs.sh. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-9509) Fix flaky test MirrorConnectorsIntegrationTest.testReplication
Sanjana Kaundinya created KAFKA-9509: Summary: Fix flaky test MirrorConnectorsIntegrationTest.testReplication Key: KAFKA-9509 URL: https://issues.apache.org/jira/browse/KAFKA-9509 Project: Kafka Issue Type: Test Components: mirrormaker Affects Versions: 2.4.0, 2.4.1, 2.5.0 Reporter: Sanjana Kaundinya The test `MirrorConnectorsIntegrationTest.testReplication` is a flaky test for MirrorMaker 2.0. Its flakiness lies in the timing of when the connectors and tasks are started up. The fix is to wait, after the connectors are started up, until the REST endpoint returns a positive number of tasks, so that we can be confident the test can begin. -- This message was sent by Atlassian Jira (v8.3.4#803005)
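The wait condition described above can be sketched as a simple poll-until-positive helper. The `IntSupplier` here stands in for a (hypothetical) call that queries the Connect REST endpoint for the task count; the busy-wait loop keeps the sketch free of checked exceptions and is not how a real test would pace its polling.

```java
import java.util.function.IntSupplier;

// Sketch of the proposed test fix: after starting the connectors, poll a
// task-count source until it reports a positive number of tasks or the
// timeout expires. The supplier stands in for a REST query; names are
// illustrative, not the actual integration-test utilities.
public class AwaitTasks {
    /** Polls until taskCount is positive, or until timeoutMs elapses. */
    static boolean awaitPositiveTaskCount(IntSupplier taskCount, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        do {
            if (taskCount.getAsInt() > 0) {
                return true; // tasks are up; safe to start asserting replication
            }
        } while (System.currentTimeMillis() < deadline);
        return false; // timed out with no tasks reported
    }

    public static void main(String[] args) {
        System.out.println(awaitPositiveTaskCount(() -> 2, 1000)); // true
        System.out.println(awaitPositiveTaskCount(() -> 0, 10));   // false
    }
}
```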
[jira] [Created] (KAFKA-8945) Incorrect null check in the constructor for ConnectorHealth and AbstractState
Sanjana Kaundinya created KAFKA-8945: Summary: Incorrect null check in the constructor for ConnectorHealth and AbstractState Key: KAFKA-8945 URL: https://issues.apache.org/jira/browse/KAFKA-8945 Project: Kafka Issue Type: Bug Components: KafkaConnect Affects Versions: 2.3.0 Reporter: Sanjana Kaundinya This bug is in relation to KIP-285: [https://cwiki.apache.org/confluence/display/KAFKA/KIP-285%3A+Connect+Rest+Extension+Plugin] In the constructors of ConnectorHealth.java and AbstractState.java, the null check on the parameters is done incorrectly. The current code only allows the class to be instantiated if the parameters passed in are null. However, the expected behavior is the opposite: the class should only be instantiable if the parameters passed in are not null. While the fix for this is pretty trivial, it would be good to add tests covering the REST extension classes involved. -- This message was sent by Atlassian Jira (v8.3.4#803005)
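The corrected check amounts to rejecting null for required constructor parameters rather than requiring it. A minimal sketch, where `AbstractStateExample` is an illustrative stand-in for the real `AbstractState` class:

```java
import java.util.Objects;

// Sketch of the corrected null check: throw when a required parameter IS
// null, not when it isn't. AbstractStateExample is a hypothetical stand-in
// for the real AbstractState; field names are illustrative.
public class AbstractStateExample {
    private final String state;
    private final String traceMessage;

    public AbstractStateExample(String state, String traceMessage) {
        // Correct direction: reject null, accept non-null.
        this.state = Objects.requireNonNull(state, "state must not be null");
        this.traceMessage = traceMessage; // optional field may remain null
    }

    public String state() {
        return state;
    }

    public static void main(String[] args) {
        // Non-null required parameter: construction succeeds
        System.out.println(new AbstractStateExample("RUNNING", null).state());
    }
}
```

The inverted check in the original code would have made this constructor throw for every valid (non-null) argument, which is exactly why a small unit test on construction catches the bug immediately.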