[jira] [Created] (KAFKA-13045) Add test for batched OffsetFetch requests where we have the same group in the request twice.

2021-07-07 Thread Sanjana Kaundinya (Jira)
Sanjana Kaundinya created KAFKA-13045:
-

 Summary: Add test for batched OffsetFetch requests where we have 
the same group in the request twice.
 Key: KAFKA-13045
 URL: https://issues.apache.org/jira/browse/KAFKA-13045
 Project: Kafka
  Issue Type: Test
Reporter: Sanjana Kaundinya






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9678) Introduce bounded exponential backoff in clients

2020-07-01 Thread Sanjana Kaundinya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjana Kaundinya resolved KAFKA-9678.
--
Resolution: Duplicate

> Introduce bounded exponential backoff in clients
> 
>
> Key: KAFKA-9678
> URL: https://issues.apache.org/jira/browse/KAFKA-9678
> Project: Kafka
>  Issue Type: Improvement
>  Components: admin, consumer, producer 
>Reporter: Guozhang Wang
>Assignee: Sanjana Kaundinya
>Priority: Major
>  Labels: needs-kip
>
> In all clients (consumer, producer, admin, and streams) we have retry
> mechanisms with a fixed backoff to handle transient connection issues with
> brokers. However, with a small backoff (many default to 100ms) we could send
> tens of requests per second to the broker, and if the connection issue is
> prolonged this means huge overhead.
> We should consider introducing upper-bounded exponential backoff universally
> in those clients to reduce the number of retry requests during periods of
> connection partitioning.
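
An upper-bounded exponential backoff of the kind proposed above can be sketched as follows. This is an illustrative stand-alone sketch, not Kafka's implementation; the class and method names are hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;

public class BoundedExponentialBackoff {
    private final long initialMs;
    private final long maxMs;
    private final double multiplier;

    public BoundedExponentialBackoff(long initialMs, long maxMs, double multiplier) {
        this.initialMs = initialMs;
        this.maxMs = maxMs;
        this.multiplier = multiplier;
    }

    // Deterministic upper-bounded backoff for the given 0-based attempt.
    public long backoffMs(int attempt) {
        return (long) Math.min(initialMs * Math.pow(multiplier, attempt), maxMs);
    }

    // Same, with +/-20% jitter so retries from many clients do not align.
    public long jitteredBackoffMs(int attempt) {
        double jitter = ThreadLocalRandom.current().nextDouble(0.8, 1.2);
        return (long) (backoffMs(attempt) * jitter);
    }
}
```

With initialMs=100, maxMs=1000, multiplier=2, attempts back off 100, 200, 400, 800 ms and then stay capped at 1000 ms instead of hammering the broker every 100 ms.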





[jira] [Reopened] (KAFKA-9678) Introduce bounded exponential backoff in clients

2020-07-01 Thread Sanjana Kaundinya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjana Kaundinya reopened KAFKA-9678:
--

> Introduce bounded exponential backoff in clients
> 
>
> Key: KAFKA-9678
> URL: https://issues.apache.org/jira/browse/KAFKA-9678
> Project: Kafka
>  Issue Type: Improvement
>  Components: admin, consumer, producer 
>Reporter: Guozhang Wang
>Assignee: Sanjana Kaundinya
>Priority: Major
>  Labels: needs-kip
>
> In all clients (consumer, producer, admin, and streams) we have retry
> mechanisms with a fixed backoff to handle transient connection issues with
> brokers. However, with a small backoff (many default to 100ms) we could send
> tens of requests per second to the broker, and if the connection issue is
> prolonged this means huge overhead.
> We should consider introducing upper-bounded exponential backoff universally
> in those clients to reduce the number of retry requests during periods of
> connection partitioning.





[jira] [Resolved] (KAFKA-9893) Configurable TCP connection timeout and improve the initial metadata fetch

2020-07-01 Thread Sanjana Kaundinya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjana Kaundinya resolved KAFKA-9893.
--
Resolution: Implemented

> Configurable TCP connection timeout and improve the initial metadata fetch
> --
>
> Key: KAFKA-9893
> URL: https://issues.apache.org/jira/browse/KAFKA-9893
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Cheng Tan
>Assignee: Cheng Tan
>Priority: Major
>
> This issue has two parts:
>  # Support the transport-layer connection timeout described in KIP-601
>  # Optimize the logic for NetworkClient.leastLoadedNode()
> Changes:
>  # Added a new common client configuration parameter
> socket.connection.setup.timeout.ms to the NetworkClient. Potential
> transport-layer timeouts are handled with the same approach as potential
> request timeouts.
>  # When no connected channel exists, leastLoadedNode() will now provide a
> disconnected node that has the least number of failed connect attempts.
>  # ClusterConnectionStates will keep the connecting node ids. It also has
> several new public methods to provide per-connection relevant data.
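
The leastLoadedNode() change in point 2 can be sketched as follows. This is a simplified stand-alone illustration of the selection rule, not the actual NetworkClient code; the NodeState type and all names are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class LeastLoadedNodeSketch {
    // Minimal stand-in for per-node connection state.
    public record NodeState(String nodeId, boolean connected, int inFlightRequests,
                            int failedConnectAttempts) {}

    // Prefer the connected node with the fewest in-flight requests; if no
    // node is connected, fall back to the disconnected node with the fewest
    // failed connect attempts (the KAFKA-9893 change).
    public static Optional<NodeState> leastLoadedNode(List<NodeState> nodes) {
        Optional<NodeState> connected = nodes.stream()
                .filter(NodeState::connected)
                .min(Comparator.comparingInt(NodeState::inFlightRequests));
        if (connected.isPresent())
            return connected;
        return nodes.stream()
                .min(Comparator.comparingInt(NodeState::failedConnectAttempts));
    }
}
```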





[jira] [Resolved] (KAFKA-9678) Introduce bounded exponential backoff in clients

2020-06-23 Thread Sanjana Kaundinya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjana Kaundinya resolved KAFKA-9678.
--
Resolution: Duplicate

> Introduce bounded exponential backoff in clients
> 
>
> Key: KAFKA-9678
> URL: https://issues.apache.org/jira/browse/KAFKA-9678
> Project: Kafka
>  Issue Type: Improvement
>  Components: admin, consumer, producer 
>Reporter: Guozhang Wang
>Assignee: Sanjana Kaundinya
>Priority: Major
>  Labels: needs-kip
>
> In all clients (consumer, producer, admin, and streams) we have retry
> mechanisms with a fixed backoff to handle transient connection issues with
> brokers. However, with a small backoff (many default to 100ms) we could send
> tens of requests per second to the broker, and if the connection issue is
> prolonged this means huge overhead.
> We should consider introducing upper-bounded exponential backoff universally
> in those clients to reduce the number of retry requests during periods of
> connection partitioning.





[jira] [Created] (KAFKA-10021) When reading to the end of the config log, check if fetch.max.wait.ms is greater than worker.sync.timeout.ms

2020-05-19 Thread Sanjana Kaundinya (Jira)
Sanjana Kaundinya created KAFKA-10021:
-

 Summary: When reading to the end of the config log, check if 
fetch.max.wait.ms is greater than worker.sync.timeout.ms
 Key: KAFKA-10021
 URL: https://issues.apache.org/jira/browse/KAFKA-10021
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Reporter: Sanjana Kaundinya


Currently in the Connect code in DistributedHerder.java, we see the following
piece of code:

{code:java}
if (!canReadConfigs && !readConfigToEnd(workerSyncTimeoutMs))
    return; // Safe to return and tick immediately because readConfigToEnd will do the backoff for us
{code}

where the workerSyncTimeoutMs passed in is the timeout given to read to the end
of the config log. This is a bug: we should check whether fetch.max.wait.ms is
greater than worker.sync.timeout.ms, and if it is, use worker.sync.timeout.ms
as the fetch.max.wait.ms. A better fix would be to use the AdminClient to read
to the end of the log, but at a minimum we should check the configs.
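
The minimal check amounts to clamping the fetch wait to the worker sync timeout. A sketch, with a hypothetical method name:

```java
public class ConfigLogTimeouts {
    // If fetch.max.wait.ms exceeds worker.sync.timeout.ms, cap it so a single
    // fetch cannot outlive the time budget for reading the config log to the end.
    public static long effectiveFetchMaxWaitMs(long fetchMaxWaitMs, long workerSyncTimeoutMs) {
        return Math.min(fetchMaxWaitMs, workerSyncTimeoutMs);
    }
}
```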





[jira] [Created] (KAFKA-9988) Log incorrectly reports task has failed when task takes too long to shutdown

2020-05-13 Thread Sanjana Kaundinya (Jira)
Sanjana Kaundinya created KAFKA-9988:


 Summary: Log incorrectly reports task has failed when task takes 
too long to shutdown
 Key: KAFKA-9988
 URL: https://issues.apache.org/jira/browse/KAFKA-9988
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Reporter: Sanjana Kaundinya


If the OffsetStorageReader is closed while the task is trying to shutdown, and 
the task is trying to access the offsets from the OffsetStorageReader, then we 
see the following in the logs.

{code:java}
[2020-05-05 05:28:58,937] ERROR WorkerSourceTask{id=replicator-18} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
org.apache.kafka.connect.errors.ConnectException: Failed to fetch offsets.
    at org.apache.kafka.connect.storage.OffsetStorageReaderImpl.offsets(OffsetStorageReaderImpl.java:114)
    at org.apache.kafka.connect.storage.OffsetStorageReaderImpl.offset(OffsetStorageReaderImpl.java:63)
    at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:205)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.kafka.connect.errors.ConnectException: Offset reader closed while attempting to read offsets. This is likely because the task was been scheduled to stop but has taken longer than the graceful shutdown period to do so.
    at org.apache.kafka.connect.storage.OffsetStorageReaderImpl.offsets(OffsetStorageReaderImpl.java:103)
    ... 14 more
[2020-05-05 05:28:58,937] ERROR WorkerSourceTask{id=replicator-18} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)
{code}

This is a bit misleading, because the task is already on its way to being shut
down and doesn't actually need manual intervention to be restarted. We can see
this later in the logs, where it throws another unrecoverable exception:

{code:java}
[2020-05-05 05:40:39,361] ERROR WorkerSourceTask{id=replicator-18} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
{code}

If we know a task is on its way to shutting down, we should not throw a
ConnectException; instead we should log a warning so that we don't log false
negatives.
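
The proposed behavior can be sketched as follows; the class and its members are hypothetical stand-ins, not the actual OffsetStorageReaderImpl code:

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

public class OffsetReaderSketch {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    // Called when the task is scheduled to stop.
    public void close() {
        closed.set(true);
    }

    // Returns null instead of throwing when the reader was closed during a
    // graceful shutdown, so callers do not report a spurious fatal error.
    public Map<String, Object> offsets() {
        if (closed.get()) {
            System.err.println("WARN: offset reader closed while task was shutting down; skipping offset read");
            return null;
        }
        return Map.of(); // stand-in for the real offset fetch
    }
}
```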






[jira] [Created] (KAFKA-9621) AdminClient listOffsets operation does not respect retries and backoff

2020-02-27 Thread Sanjana Kaundinya (Jira)
Sanjana Kaundinya created KAFKA-9621:


 Summary: AdminClient listOffsets operation does not respect 
retries and backoff
 Key: KAFKA-9621
 URL: https://issues.apache.org/jira/browse/KAFKA-9621
 Project: Kafka
  Issue Type: Bug
  Components: admin
Reporter: Sanjana Kaundinya
Assignee: Cheng Tan
 Fix For: 2.6.0


Similar to https://issues.apache.org/jira/browse/KAFKA-9047, currently the 
listOffsets operation doesn't respect the configured retries and backoff for a 
given call. 

For example, the code path could go like so:
1) Make a metadata request and schedule subsequent list offsets calls
2) Metadata error comes back with `InvalidMetadataException`
3) Go back to 1

The problem here is that the state is not preserved across calls. We lose the
information about how many times the call has been tried and how far out we
should schedule the next attempt. This can lead to a tight retry loop and put
pressure on the brokers.
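
The missing bookkeeping can be sketched as a small per-call retry state object that survives metadata refreshes; the names here are hypothetical:

```java
public class CallRetryState {
    private final int maxTries;
    private final long retryBackoffMs;
    private int tries = 0;

    public CallRetryState(int maxTries, long retryBackoffMs) {
        this.maxTries = maxTries;
        this.retryBackoffMs = retryBackoffMs;
    }

    // Record one attempt; returns the earliest time the next attempt may run,
    // or -1 when the call has exhausted its retries. Because this object is
    // carried across metadata refreshes, the retry count is never reset.
    public long recordAttempt(long nowMs) {
        tries++;
        if (tries >= maxTries)
            return -1;
        return nowMs + retryBackoffMs;
    }

    public int tries() {
        return tries;
    }
}
```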





[jira] [Created] (KAFKA-9580) Log clearer error messages when there is an offset out of range

2020-02-20 Thread Sanjana Kaundinya (Jira)
Sanjana Kaundinya created KAFKA-9580:


 Summary: Log clearer error messages when there is an offset out of 
range
 Key: KAFKA-9580
 URL: https://issues.apache.org/jira/browse/KAFKA-9580
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Sanjana Kaundinya


Currently in
[Fetcher::initializeCompletedFetch|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java#L1259],
the log message is not informative about which offsets exactly are out of
range, which makes it harder to debug Consumer issues in this situation. It
would be much more helpful if we logged a message similar to
[Log::read|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L1468]
to make it easier to debug.





[jira] [Resolved] (KAFKA-9306) Kafka Consumer does not clean up all metrics after shutdown

2020-02-18 Thread Sanjana Kaundinya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjana Kaundinya resolved KAFKA-9306.
--
Fix Version/s: 2.4.1
   Resolution: Fixed

> Kafka Consumer does not clean up all metrics after shutdown
> ---
>
> Key: KAFKA-9306
> URL: https://issues.apache.org/jira/browse/KAFKA-9306
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Major
> Fix For: 2.4.1
>
>
> The Kafka Consumer does not clean up all metrics after shutdown.  It seems 
> like this was a regression introduced in Kafka 2.4 when we added the 
> KafkaConsumerMetrics class.





[jira] [Created] (KAFKA-9558) getListOffsetsCalls doesn't update node in case of leader change

2020-02-14 Thread Sanjana Kaundinya (Jira)
Sanjana Kaundinya created KAFKA-9558:


 Summary: getListOffsetsCalls doesn't update node in case of leader 
change
 Key: KAFKA-9558
 URL: https://issues.apache.org/jira/browse/KAFKA-9558
 Project: Kafka
  Issue Type: Bug
  Components: admin
Affects Versions: 2.5.0
Reporter: Sanjana Kaundinya
Assignee: Sanjana Kaundinya


As seen here:
[https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/admin/KafkaAdminClient.java#L3810]

In handling the response in the `listOffsets` call, if there are errors in the
topic partition that require a metadata refresh, it simply passes the call
object as `this`. This produces incorrect behavior if there was a leader
change, because the call object never gets its leader node updated. The result
is a tight loop of list offsets requests sent to the same old leader that never
return offsets, even though the metadata was correctly updated.





[jira] [Created] (KAFKA-9520) Deprecate ZooKeeper access for kafka-reassign-partitions.sh

2020-02-06 Thread Sanjana Kaundinya (Jira)
Sanjana Kaundinya created KAFKA-9520:


 Summary: Deprecate ZooKeeper access for 
kafka-reassign-partitions.sh
 Key: KAFKA-9520
 URL: https://issues.apache.org/jira/browse/KAFKA-9520
 Project: Kafka
  Issue Type: Sub-task
Affects Versions: 2.5.0
Reporter: Sanjana Kaundinya
 Fix For: 2.5.0


As part of KIP-555, ZooKeeper access must be deprecated for
kafka-reassign-partitions.sh.





[jira] [Created] (KAFKA-9519) Deprecate ZooKeeper access for kafka-configs.sh

2020-02-06 Thread Sanjana Kaundinya (Jira)
Sanjana Kaundinya created KAFKA-9519:


 Summary: Deprecate ZooKeeper access for kafka-configs.sh   
 Key: KAFKA-9519
 URL: https://issues.apache.org/jira/browse/KAFKA-9519
 Project: Kafka
  Issue Type: Sub-task
Affects Versions: 2.5.0
Reporter: Sanjana Kaundinya
Assignee: Sanjana Kaundinya
 Fix For: 2.5.0


As part of KIP-555, ZooKeeper access must be deprecated for kafka-configs.sh.





[jira] [Created] (KAFKA-9509) Fix flaky test MirrorConnectorsIntegrationTest.testReplication

2020-02-05 Thread Sanjana Kaundinya (Jira)
Sanjana Kaundinya created KAFKA-9509:


 Summary: Fix flaky test 
MirrorConnectorsIntegrationTest.testReplication
 Key: KAFKA-9509
 URL: https://issues.apache.org/jira/browse/KAFKA-9509
 Project: Kafka
  Issue Type: Test
  Components: mirrormaker
Affects Versions: 2.4.0, 2.4.1, 2.5.0
Reporter: Sanjana Kaundinya


The test `MirrorConnectorsIntegrationTest.testReplication` is a flaky test for
MirrorMaker 2.0. Its flakiness lies in the timing of when the connectors and
tasks are started up. The fix is to wait, once the connectors are started,
until the REST endpoint returns a positive number of tasks, so that we can be
confident replication is running before the test starts making assertions.
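
The proposed wait can be sketched as a polling helper; the task-count supplier stands in for the Connect REST status call, and all names are hypothetical:

```java
import java.util.function.LongSupplier;

public class ConnectorStartupWaiter {
    // Poll until taskCount reports a positive number of tasks or the
    // timeout elapses; returns whether tasks came up in time.
    public static boolean waitForTasks(LongSupplier taskCount, long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (taskCount.getAsLong() > 0)
                return true;
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return taskCount.getAsLong() > 0;
    }
}
```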





[jira] [Created] (KAFKA-8945) Incorrect null check in the constructor for ConnectorHealth and AbstractState

2019-09-25 Thread Sanjana Kaundinya (Jira)
Sanjana Kaundinya created KAFKA-8945:


 Summary: Incorrect null check in the constructor for 
ConnectorHealth and AbstractState
 Key: KAFKA-8945
 URL: https://issues.apache.org/jira/browse/KAFKA-8945
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Affects Versions: 2.3.0
Reporter: Sanjana Kaundinya


This bug is in relation to KIP-285: 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-285%3A+Connect+Rest+Extension+Plugin]

In the constructors of ConnectorHealth.java and AbstractState.java, the null
check on the parameters is done incorrectly. The current code only allows the
class to be instantiated if the parameters passed in are null. However, the
expected behavior is the opposite: we only want the class to be instantiated
if the parameters passed in are not null. While the fix for this is pretty
trivial, it would be good to add tests covering the classes related to the
REST extension.
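
The corrected direction of the check can be sketched as follows; this is a simplified stand-in for AbstractState, not the actual Connect code:

```java
import java.util.Objects;

public class AbstractStateSketch {
    private final String state;
    private final String trace;

    public AbstractStateSketch(String state, String trace) {
        // Correct direction of the check: fail fast when a required
        // parameter IS null, rather than requiring it to be null.
        this.state = Objects.requireNonNull(state, "state can't be null");
        this.trace = trace; // trace is optional and may be null
    }

    public String state() {
        return state;
    }
}
```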


