[jira] [Commented] (KAFKA-8898) if there is no message for poll, kafka consumer also apply memory

2019-10-07 Thread linking12 (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946467#comment-16946467
 ] 

linking12 commented on KAFKA-8898:
--

code:

   org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords

!image-2019-10-08-12-07-37-328.png!

> if there is no message for poll, kafka consumer also apply memory
> -
>
> Key: KAFKA-8898
> URL: https://issues.apache.org/jira/browse/KAFKA-8898
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.1.1
>Reporter: linking12
>Priority: Blocker
>  Labels: performance
> Attachments: image-2019-10-08-12-07-37-328.png
>
>
> When poll() returns no records, the consumer still allocates about 1000 bytes
> of memory.
> fetched = *new* HashMap<>() is not a good idea: it allocates heap memory even
> when there is no message.
> I think fetched = *new* HashMap<>() should only happen when records exist.
>  
> ```
> public Map<TopicPartition, List<ConsumerRecord<K, V>>> fetchedRecords() {
>     Map<TopicPartition, List<ConsumerRecord<K, V>>> fetched = new HashMap<>();
>     int recordsRemaining = maxPollRecords;
> ```
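A minimal sketch of the change the reporter is suggesting, under the assumption
that an early return is acceptable; the class and field names below are
illustrative stand-ins for Kafka's internal Fetcher, not the actual patch.
Allocate the result map only when there is something to drain, and return an
immutable empty map otherwise:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// TP and Rec stand in for TopicPartition and ConsumerRecord<K, V>.
class FetchBuffer<TP, Rec> {
    private final Map<TP, List<Rec>> completedFetches = new HashMap<>();

    Map<TP, List<Rec>> fetchedRecords() {
        // Early return: no per-poll HashMap allocation when nothing was fetched.
        if (completedFetches.isEmpty())
            return Collections.emptyMap();

        Map<TP, List<Rec>> fetched = new HashMap<>();
        for (Map.Entry<TP, List<Rec>> e : completedFetches.entrySet())
            fetched.put(e.getKey(), new ArrayList<>(e.getValue()));
        completedFetches.clear();
        return fetched;
    }
}
```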



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8898) if there is no message for poll, kafka consumer also apply memory

2019-10-07 Thread linking12 (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

linking12 updated KAFKA-8898:
-
Attachment: image-2019-10-08-12-07-37-328.png

> if there is no message for poll, kafka consumer also apply memory
> -
>
> Key: KAFKA-8898
> URL: https://issues.apache.org/jira/browse/KAFKA-8898
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.1.1
>Reporter: linking12
>Priority: Blocker
>  Labels: performance
> Attachments: image-2019-10-08-12-07-37-328.png
>
>
> When poll() returns no records, the consumer still allocates about 1000 bytes
> of memory.
> fetched = *new* HashMap<>() is not a good idea: it allocates heap memory even
> when there is no message.
> I think fetched = *new* HashMap<>() should only happen when records exist.
>  
> ```
> public Map<TopicPartition, List<ConsumerRecord<K, V>>> fetchedRecords() {
>     Map<TopicPartition, List<ConsumerRecord<K, V>>> fetched = new HashMap<>();
>     int recordsRemaining = maxPollRecords;
> ```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8206) A consumer can't discover new group coordinator when the cluster was partly restarted

2019-10-07 Thread Ivan Yurchenko (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946465#comment-16946465
 ] 

Ivan Yurchenko commented on KAFKA-8206:
---

[~guozhang] yes, I will do this.

> A consumer can't discover new group coordinator when the cluster was partly 
> restarted
> -
>
> Key: KAFKA-8206
> URL: https://issues.apache.org/jira/browse/KAFKA-8206
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 1.0.0, 2.0.0, 2.2.0
>Reporter: alex gabriel
>Assignee: Ivan Yurchenko
>Priority: Critical
>  Labels: needs-kip
>
> *A consumer can't discover new group coordinator when the cluster was partly 
> restarted*
> Preconditions:
> I use the Kafka server and the Java kafka-client lib, version 2.2.
> I have 2 Kafka nodes running locally (localhost:9092, localhost:9093) and 1
> ZK (localhost:2181).
> I have replication factor 2 for all my topics and
> '_unclean.leader.election.enable=true_' on both Kafka nodes.
> Steps to reproduce:
> 1) Start 2 nodes (localhost:9092/localhost:9093)
> 2) Start consumer with 'bootstrap.servers=localhost:9092,localhost:9093'
> {noformat}
> // discovered group coordinator (0-node)
> 2019-04-09 16:23:18,963 INFO 
> [org.apache.kafka.clients.consumer.internals.AbstractCoordinator$FindCoordinatorResponseHandler.onSuccess]
>  - [Consumer clientId=events-consumer0, groupId=events-group-gabriel] 
> Discovered group coordinator localhost:9092 (id: 2147483647 rack: null)>
> ...metadatacache is updated (2 nodes in the cluster list)
> 2019-04-09 16:23:18,928 DEBUG 
> [org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate] - 
> [Consumer clientId=events-consumer0, groupId=events-group-gabriel] Sending 
> metadata request (type=MetadataRequest, topics=) to node localhost:9092 
> (id: -1 rack: null)>
> 2019-04-09 16:23:18,940 DEBUG [org.apache.kafka.clients.Metadata.update] - 
> Updated cluster metadata version 2 to MetadataCache{cluster=Cluster(id = 
> P3pz1xU0SjK-Dhy6h2G5YA, nodes = [localhost:9092 (id: 0 rack: null), 
> localhost:9093 (id: 1 rack: null)], partitions = [], controller = 
> localhost:9092 (id: 0 rack: null))}>
> {noformat}
> 3) Shutdown 1-node (localhost:9093)
> {noformat}
> // metadata was updated to version 4 (but for some reason it still had 2
> alive nodes inside the cluster)
> 2019-04-09 16:23:46,981 DEBUG [org.apache.kafka.clients.Metadata.update] - 
> Updated cluster metadata version 4 to MetadataCache{cluster=Cluster(id = 
> P3pz1xU0SjK-Dhy6h2G5YA, nodes = [localhost:9093 (id: 1 rack: null), 
> localhost:9092 (id: 0 rack: null)], partitions = [Partition(topic = 
> events-sorted, partition = 1, leader = 0, replicas = [0,1], isr = [0,1], 
> offlineReplicas = []), Partition(topic = events-sorted, partition = 0, leader 
> = 0, replicas = [0,1], isr = [0,1], offlineReplicas = [])], controller = 
> localhost:9092 (id: 0 rack: null))}>
> // consumer thinks that node 1 is still alive and tries to send a coordinator
> lookup to it, but fails
> 2019-04-09 16:23:46,981 INFO 
> [org.apache.kafka.clients.consumer.internals.AbstractCoordinator$FindCoordinatorResponseHandler.onSuccess]
>  - [Consumer clientId=events-consumer0, groupId=events-group-gabriel] 
> Discovered group coordinator localhost:9093 (id: 2147483646 rack: null)>
> 2019-04-09 16:23:46,981 INFO 
> [org.apache.kafka.clients.consumer.internals.AbstractCoordinator.markCoordinatorUnknown]
>  - [Consumer clientId=events-consumer0, groupId=events-group-gabriel] Group 
> coordinator localhost:9093 (id: 2147483646 rack: null) is unavailable or 
> invalid, will attempt rediscovery>
> 2019-04-09 16:24:01,117 DEBUG 
> [org.apache.kafka.clients.NetworkClient.handleDisconnections] - [Consumer 
> clientId=events-consumer0, groupId=events-group-gabriel] Node 1 disconnected.>
> 2019-04-09 16:24:01,117 WARN 
> [org.apache.kafka.clients.NetworkClient.processDisconnection] - [Consumer 
> clientId=events-consumer0, groupId=events-group-gabriel] Connection to node 1 
> (localhost:9093) could not be established. Broker may not be available.>
> // refreshing metadata again
> 2019-04-09 16:24:01,117 DEBUG 
> [org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion]
>  - [Consumer clientId=events-consumer0, groupId=events-group-gabriel] 
> Cancelled request with header RequestHeader(apiKey=FIND_COORDINATOR, 
> apiVersion=2, clientId=events-consumer0, correlationId=112) due to node 1 
> being disconnected>
> 2019-04-09 16:24:01,117 DEBUG 
> [org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady]
>  - [Consumer clientId=events-consumer0, groupId=events-group-gabriel] 
> Coordinator discovery failed, refreshing metadata>
> // metadata was updated to version 5 where 

[jira] [Commented] (KAFKA-8272) Changed(De)Serializer does not forward configure() call

2019-10-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946418#comment-16946418
 ] 

ASF GitHub Bot commented on KAFKA-8272:
---

mjsax commented on pull request #6614: [WIP -- do not merge] KAFKA-8272: 
Changed(De)Serializer should forward call to configure()
URL: https://github.com/apache/kafka/pull/6614
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Changed(De)Serializer does not forward configure() call
> ---
>
> Key: KAFKA-8272
> URL: https://issues.apache.org/jira/browse/KAFKA-8272
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.0.0, 0.10.0.1, 0.10.1.0, 0.10.1.1, 0.10.2.0, 
> 0.10.2.1, 0.10.2.2, 0.11.0.0, 0.11.0.1, 0.11.0.2, 0.11.0.3, 1.0.0, 1.0.1, 
> 1.0.2, 1.1.0, 1.1.1, 2.0.0, 2.0.1, 2.1.0, 2.2.0, 2.1.1
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>Priority: Major
>
> Exposed via KAFKA-3729.
> Because this was not a problem in the past, it might not be necessary to
> back-port the fix to all versions.
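For illustration, a minimal delegating serializer that does forward
configure(), which is the behavior this ticket asks of Changed(De)Serializer;
the wrapper class here is a hypothetical example, not Streams' internal code:

```java
import java.util.Map;

import org.apache.kafka.common.serialization.Serializer;

// Hypothetical wrapper illustrating the fix: a delegating serializer must
// pass configure() through, or the inner serializer (e.g. one that reads a
// schema-registry URL from the configs) is never initialized.
class ForwardingSerializer<T> implements Serializer<T> {
    private final Serializer<T> inner;

    ForwardingSerializer(final Serializer<T> inner) {
        this.inner = inner;
    }

    @Override
    public void configure(final Map<String, ?> configs, final boolean isKey) {
        inner.configure(configs, isKey); // the forwarding KAFKA-8272 adds
    }

    @Override
    public byte[] serialize(final String topic, final T data) {
        return inner.serialize(topic, data);
    }

    @Override
    public void close() {
        inner.close();
    }
}
```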



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8939) Flaky test ReassignPartitionsClusterTest#shouldTriggerReassignmentWithZnodePrecedenceOnControllerStartup

2019-10-07 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946413#comment-16946413
 ] 

Matthias J. Sax commented on KAFKA-8939:


[https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/2391/consoleFull]

> Flaky test 
> ReassignPartitionsClusterTest#shouldTriggerReassignmentWithZnodePrecedenceOnControllerStartup
> 
>
> Key: KAFKA-8939
> URL: https://issues.apache.org/jira/browse/KAFKA-8939
> Project: Kafka
>  Issue Type: Bug
>  Components: admin, core
>Affects Versions: 2.4.0
>Reporter: Sophie Blee-Goldman
>Priority: Major
>  Labels: flaky-test
>
> h3. Stacktrace
> java.lang.AssertionError: expected: but was: at 
> org.junit.Assert.fail(Assert.java:89) at 
> org.junit.Assert.failNotEquals(Assert.java:835) at 
> org.junit.Assert.assertEquals(Assert.java:120) at 
> org.junit.Assert.assertEquals(Assert.java:146) at 
> kafka.admin.ReassignPartitionsClusterTest.shouldTriggerReassignmentWithZnodePrecedenceOnControllerStartup(ReassignPartitionsClusterTest.scala:687)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:566) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305) at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365) at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>  at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330) at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78) at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328) at 
> org.junit.runners.ParentRunner.access$100(ParentRunner.java:65) at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292) at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305) at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:412) at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>  at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>  at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>  at jdk.internal.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:566) at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
>  at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>  at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
>  at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
>  at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) at 
> org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
>  at jdk.internal.reflect.GeneratedMethodAccessor16.invoke(Unknown Source) at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:566) at 
> 

[jira] [Updated] (KAFKA-8967) Flaky test kafka.api.SaslSslAdminClientIntegrationTest.testCreateTopicsResponseMetadataAndConfig

2019-10-07 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-8967:
---
Labels: flaky-test  (was: )

> Flaky test 
> kafka.api.SaslSslAdminClientIntegrationTest.testCreateTopicsResponseMetadataAndConfig
> 
>
> Key: KAFKA-8967
> URL: https://issues.apache.org/jira/browse/KAFKA-8967
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Stanislav Kozlovski
>Priority: Major
>  Labels: flaky-test
>
> {code:java}
> java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
>   at 
> org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
>   at 
> org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
>   at 
> org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
>   at 
> org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
>   at 
> kafka.api.SaslSslAdminClientIntegrationTest.testCreateTopicsResponseMetadataAndConfig(SaslSslAdminClientIntegrationTest.scala:452)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.kafka.common.errors.UnknownTopicOrPartitionException: 
> This server does not host this topic-partition.{code}
> Failed in [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/25374]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8967) Flaky test kafka.api.SaslSslAdminClientIntegrationTest.testCreateTopicsResponseMetadataAndConfig

2019-10-07 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-8967:
---
Component/s: unit tests
 security
 core

> Flaky test 
> kafka.api.SaslSslAdminClientIntegrationTest.testCreateTopicsResponseMetadataAndConfig
> 
>
> Key: KAFKA-8967
> URL: https://issues.apache.org/jira/browse/KAFKA-8967
> Project: Kafka
>  Issue Type: Improvement
>  Components: core, security, unit tests
>Reporter: Stanislav Kozlovski
>Priority: Major
>  Labels: flaky-test
>
> {code:java}
> java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
>   at 
> org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
>   at 
> org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
>   at 
> org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
>   at 
> org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
>   at 
> kafka.api.SaslSslAdminClientIntegrationTest.testCreateTopicsResponseMetadataAndConfig(SaslSslAdminClientIntegrationTest.scala:452)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.kafka.common.errors.UnknownTopicOrPartitionException: 
> This server does not host this topic-partition.{code}
> Failed in [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/25374]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8967) Flaky test kafka.api.SaslSslAdminClientIntegrationTest.testCreateTopicsResponseMetadataAndConfig

2019-10-07 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946412#comment-16946412
 ] 

Matthias J. Sax commented on KAFKA-8967:


[https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/8378/consoleFull]

> Flaky test 
> kafka.api.SaslSslAdminClientIntegrationTest.testCreateTopicsResponseMetadataAndConfig
> 
>
> Key: KAFKA-8967
> URL: https://issues.apache.org/jira/browse/KAFKA-8967
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Stanislav Kozlovski
>Priority: Major
>
> {code:java}
> java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
>   at 
> org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
>   at 
> org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
>   at 
> org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
>   at 
> org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
>   at 
> kafka.api.SaslSslAdminClientIntegrationTest.testCreateTopicsResponseMetadataAndConfig(SaslSslAdminClientIntegrationTest.scala:452)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.kafka.common.errors.UnknownTopicOrPartitionException: 
> This server does not host this topic-partition.{code}
> Failed in [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/25374]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8995) Add new metric on broker to illustrate produce request compression percentage

2019-10-07 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-8995:
-
Reporter: Jun Rao  (was: Guozhang Wang)

> Add new metric on broker to illustrate produce request compression percentage
> -
>
> Key: KAFKA-8995
> URL: https://issues.apache.org/jira/browse/KAFKA-8995
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Jun Rao
>Assignee: Guozhang Wang
>Priority: Major
>  Labels: needs-kip
>
> When `compression.type` is set to `producer`, we accept the produce request
> and apply its encoded compression to the logs; otherwise we recompress the
> messages according to the configured compression type before appending.
> There are pros and cons to recompressing the data: you pay more CPU to
> recompress, but you reduce the storage cost.
> In practice, if the incoming produce requests are not compressed, then
> compressing before appending may be more beneficial; otherwise keeping them
> as-is with the `producer` config may be better. Adding a metric that exposes
> the percentage of incoming requests that are compressed would be a helpful
> data point to help operators select their compression policy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-8995) Add new metric on broker to illustrate produce request compression percentage

2019-10-07 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-8995:


 Summary: Add new metric on broker to illustrate produce request 
compression percentage
 Key: KAFKA-8995
 URL: https://issues.apache.org/jira/browse/KAFKA-8995
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: Guozhang Wang
Assignee: Guozhang Wang


When `compression.type` is set to `producer`, we accept the produce request
and apply its encoded compression to the logs; otherwise we recompress the
messages according to the configured compression type before appending. There
are pros and cons to recompressing the data: you pay more CPU to recompress,
but you reduce the storage cost.

In practice, if the incoming produce requests are not compressed, then
compressing before appending may be more beneficial; otherwise keeping them
as-is with the `producer` config may be better. Adding a metric that exposes
the percentage of incoming requests that are compressed would be a helpful
data point to help operators select their compression policy.
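A minimal sketch of the accounting such a metric would need; all names here
are hypothetical, and the actual metric name and broker wiring would be
defined by the KIP this ticket calls for:

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical accounting for the proposed broker metric.
class ProduceCompressionStats {
    private final LongAdder totalBatches = new LongAdder();
    private final LongAdder compressedBatches = new LongAdder();

    // Called once per incoming record batch on the produce path.
    void record(final boolean batchIsCompressed) {
        totalBatches.increment();
        if (batchIsCompressed)
            compressedBatches.increment();
    }

    // Value a gauge (e.g. "ProduceRequestCompressedPercent") could expose.
    double compressedPercent() {
        final long total = totalBatches.sum();
        return total == 0 ? 0.0 : 100.0 * compressedBatches.sum() / total;
    }
}
```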



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-8994) Streams should expose standby replication information & allow stale reads of state store

2019-10-07 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned KAFKA-8994:
-

Assignee: Vinoth Chandar

> Streams should expose standby replication information & allow stale reads of 
> state store
> 
>
> Key: KAFKA-8994
> URL: https://issues.apache.org/jira/browse/KAFKA-8994
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> Currently, Streams interactive queries (IQ) fail while a rebalance is in
> progress.
> Consider the following scenario in a three-node Streams cluster with node A,
> node B and node R, executing a stateful sub-topology/topic group with 1
> partition and `_num.standby.replicas=1_`:
>  * *t0*: A is the active instance owning the partition, B is the standby
> that keeps replicating A's state to its local disk, R just routes Streams
> IQs to the active instance using StreamsMetadata
>  * *t1*: IQs pick node R as the router; R forwards the query to A, and A
> responds back to R, which forwards the results back.
>  * *t2:* The active instance A is killed and a rebalance begins. IQs to A
> start failing
>  * *t3*: Rebalance assignment happens and standby B is promoted to active
> instance. IQs continue to fail
>  * *t4*: B fully catches up to the changelog tail and rewinds offsets to A's
> last commit position. IQs continue to fail
>  * *t5*: IQs to R get routed to B, which is now ready to serve results. IQs
> start succeeding again
>  
> Depending on Kafka consumer group session/heartbeat timeouts, steps t2-t3
> can take a few seconds (~10 seconds based on default values). Depending on
> how laggy the standby B was prior to A being killed, t4 can take anywhere
> from a few seconds to minutes.
>  
> While this behavior favors consistency over availability at all times, the
> long unavailability window might be undesirable for certain classes of
> applications (e.g. simple caches or dashboards).
>  
> This issue aims to expose information about standby B to R during each
> rebalance, such that an application can route queries to a standby to serve
> stale reads, choosing availability over consistency.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-8994) Streams should expose standby replication information & allow stale reads of state store

2019-10-07 Thread Vinoth Chandar (Jira)
Vinoth Chandar created KAFKA-8994:
-

 Summary: Streams should expose standby replication information & 
allow stale reads of state store
 Key: KAFKA-8994
 URL: https://issues.apache.org/jira/browse/KAFKA-8994
 Project: Kafka
  Issue Type: New Feature
  Components: streams
Reporter: Vinoth Chandar


Currently, Streams interactive queries (IQ) fail while a rebalance is in
progress.

Consider the following scenario in a three-node Streams cluster with node A,
node B and node R, executing a stateful sub-topology/topic group with 1
partition and `_num.standby.replicas=1_`:
 * *t0*: A is the active instance owning the partition, B is the standby that
keeps replicating A's state to its local disk, R just routes Streams IQs to
the active instance using StreamsMetadata
 * *t1*: IQs pick node R as the router; R forwards the query to A, and A
responds back to R, which forwards the results back.
 * *t2:* The active instance A is killed and a rebalance begins. IQs to A
start failing
 * *t3*: Rebalance assignment happens and standby B is promoted to active
instance. IQs continue to fail
 * *t4*: B fully catches up to the changelog tail and rewinds offsets to A's
last commit position. IQs continue to fail
 * *t5*: IQs to R get routed to B, which is now ready to serve results. IQs
start succeeding again

Depending on Kafka consumer group session/heartbeat timeouts, steps t2-t3 can
take a few seconds (~10 seconds based on default values). Depending on how
laggy the standby B was prior to A being killed, t4 can take anywhere from a
few seconds to minutes.

While this behavior favors consistency over availability at all times, the
long unavailability window might be undesirable for certain classes of
applications (e.g. simple caches or dashboards).

This issue aims to expose information about standby B to R during each
rebalance, such that an application can route queries to a standby to serve
stale reads, choosing availability over consistency.
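For illustration, a sketch of how router node R could use such information
once it is exposed. Only the active-host lookup (KafkaStreams#metadataForKey)
exists today; the standbyHostsFor lookup below is hypothetical, standing in
for whatever this ticket ends up exposing:

```java
import java.util.Collections;
import java.util.List;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.state.HostInfo;
import org.apache.kafka.streams.state.StreamsMetadata;

class StaleReadRouter {
    private final KafkaStreams streams;

    StaleReadRouter(final KafkaStreams streams) {
        this.streams = streams;
    }

    // Pick a host to serve an IQ for `key`: prefer the active instance,
    // fall back to a standby (stale read) while a rebalance is in flight.
    HostInfo hostFor(final String storeName, final String key) {
        final StreamsMetadata active =
            streams.metadataForKey(storeName, key, Serdes.String().serializer());
        if (active != null && isReachable(active.hostInfo()))
            return active.hostInfo();
        final List<HostInfo> standbys = standbyHostsFor(storeName, key);
        return standbys.isEmpty() ? null : standbys.get(0);
    }

    private boolean isReachable(final HostInfo host) {
        return host != null; // placeholder for a real health check
    }

    // Hypothetical: the standby replication information this ticket proposes.
    private List<HostInfo> standbyHostsFor(final String storeName, final String key) {
        return Collections.emptyList();
    }
}
```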


--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8206) A consumer can't discover new group coordinator when the cluster was partly restarted

2019-10-07 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-8206:
-
Labels: needs-kip  (was: )

> A consumer can't discover new group coordinator when the cluster was partly 
> restarted
> -
>
> Key: KAFKA-8206
> URL: https://issues.apache.org/jira/browse/KAFKA-8206
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 1.0.0, 2.0.0, 2.2.0
>Reporter: alex gabriel
>Assignee: Ivan Yurchenko
>Priority: Critical
>  Labels: needs-kip
>
> *A consumer can't discover new group coordinator when the cluster was partly 
> restarted*
> Preconditions:
> I use the Kafka server and the Java kafka-client lib, version 2.2.
> I have 2 Kafka nodes running locally (localhost:9092, localhost:9093) and 1
> ZK (localhost:2181).
> I have replication factor 2 for all my topics and
> '_unclean.leader.election.enable=true_' on both Kafka nodes.
> Steps to reproduce:
> 1) Start 2 nodes (localhost:9092/localhost:9093)
> 2) Start consumer with 'bootstrap.servers=localhost:9092,localhost:9093'
> {noformat}
> // discovered group coordinator (0-node)
> 2019-04-09 16:23:18,963 INFO 
> [org.apache.kafka.clients.consumer.internals.AbstractCoordinator$FindCoordinatorResponseHandler.onSuccess]
>  - [Consumer clientId=events-consumer0, groupId=events-group-gabriel] 
> Discovered group coordinator localhost:9092 (id: 2147483647 rack: null)>
> ...metadatacache is updated (2 nodes in the cluster list)
> 2019-04-09 16:23:18,928 DEBUG 
> [org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate] - 
> [Consumer clientId=events-consumer0, groupId=events-group-gabriel] Sending 
> metadata request (type=MetadataRequest, topics=) to node localhost:9092 
> (id: -1 rack: null)>
> 2019-04-09 16:23:18,940 DEBUG [org.apache.kafka.clients.Metadata.update] - 
> Updated cluster metadata version 2 to MetadataCache{cluster=Cluster(id = 
> P3pz1xU0SjK-Dhy6h2G5YA, nodes = [localhost:9092 (id: 0 rack: null), 
> localhost:9093 (id: 1 rack: null)], partitions = [], controller = 
> localhost:9092 (id: 0 rack: null))}>
> {noformat}
> 3) Shutdown 1-node (localhost:9093)
> {noformat}
> // metadata was updated to version 4 (but for some reason it still had 2
> alive nodes inside the cluster)
> 2019-04-09 16:23:46,981 DEBUG [org.apache.kafka.clients.Metadata.update] - 
> Updated cluster metadata version 4 to MetadataCache{cluster=Cluster(id = 
> P3pz1xU0SjK-Dhy6h2G5YA, nodes = [localhost:9093 (id: 1 rack: null), 
> localhost:9092 (id: 0 rack: null)], partitions = [Partition(topic = 
> events-sorted, partition = 1, leader = 0, replicas = [0,1], isr = [0,1], 
> offlineReplicas = []), Partition(topic = events-sorted, partition = 0, leader 
> = 0, replicas = [0,1], isr = [0,1], offlineReplicas = [])], controller = 
> localhost:9092 (id: 0 rack: null))}>
> // consumer thinks that node 1 is still alive and tries to send a coordinator
> lookup to it, but fails
> 2019-04-09 16:23:46,981 INFO 
> [org.apache.kafka.clients.consumer.internals.AbstractCoordinator$FindCoordinatorResponseHandler.onSuccess]
>  - [Consumer clientId=events-consumer0, groupId=events-group-gabriel] 
> Discovered group coordinator localhost:9093 (id: 2147483646 rack: null)>
> 2019-04-09 16:23:46,981 INFO 
> [org.apache.kafka.clients.consumer.internals.AbstractCoordinator.markCoordinatorUnknown]
>  - [Consumer clientId=events-consumer0, groupId=events-group-gabriel] Group 
> coordinator localhost:9093 (id: 2147483646 rack: null) is unavailable or 
> invalid, will attempt rediscovery>
> 2019-04-09 16:24:01,117 DEBUG 
> [org.apache.kafka.clients.NetworkClient.handleDisconnections] - [Consumer 
> clientId=events-consumer0, groupId=events-group-gabriel] Node 1 disconnected.>
> 2019-04-09 16:24:01,117 WARN 
> [org.apache.kafka.clients.NetworkClient.processDisconnection] - [Consumer 
> clientId=events-consumer0, groupId=events-group-gabriel] Connection to node 1 
> (localhost:9093) could not be established. Broker may not be available.>
> // refreshing metadata again
> 2019-04-09 16:24:01,117 DEBUG 
> [org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion]
>  - [Consumer clientId=events-consumer0, groupId=events-group-gabriel] 
> Cancelled request with header RequestHeader(apiKey=FIND_COORDINATOR, 
> apiVersion=2, clientId=events-consumer0, correlationId=112) due to node 1 
> being disconnected>
> 2019-04-09 16:24:01,117 DEBUG 
> [org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady]
>  - [Consumer clientId=events-consumer0, groupId=events-group-gabriel] 
> Coordinator discovery failed, refreshing metadata>
> // metadata was updated to version 5 where the cluster had only node 0
> localhost:9092 as 

[jira] [Commented] (KAFKA-8206) A consumer can't discover new group coordinator when the cluster was partly restarted

2019-10-07 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946323#comment-16946323
 ] 

Guozhang Wang commented on KAFKA-8206:
--

[~ivanyu] Thanks for the PR, I think your approach is promising but since it 
changes the default behavior of consumer / producer clients which is part of 
our public API (it is a semantics change), we'd better have a discussion about 
its pros / cons under different scenarios. Could you drive the KIP discussion 
as well?

> A consumer can't discover new group coordinator when the cluster was partly 
> restarted
> -
>
> Key: KAFKA-8206
> URL: https://issues.apache.org/jira/browse/KAFKA-8206
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 1.0.0, 2.0.0, 2.2.0
>Reporter: alex gabriel
>Assignee: Ivan Yurchenko
>Priority: Critical
>
> *A consumer can't discover new group coordinator when the cluster was partly 
> restarted*
> Preconditions:
> I use the Kafka server and the Java kafka-client lib, version 2.2.
> I have 2 Kafka nodes running locally (localhost:9092, localhost:9093) and 1
> ZK (localhost:2181).
> I have replication factor 2 for all my topics and
> '_unclean.leader.election.enable=true_' on both Kafka nodes.
> Steps to reproduce:
> 1) Start 2 nodes (localhost:9092/localhost:9093)
> 2) Start consumer with 'bootstrap.servers=localhost:9092,localhost:9093'
> {noformat}
> // discovered group coordinator (0-node)
> 2019-04-09 16:23:18,963 INFO 
> [org.apache.kafka.clients.consumer.internals.AbstractCoordinator$FindCoordinatorResponseHandler.onSuccess]
>  - [Consumer clientId=events-consumer0, groupId=events-group-gabriel] 
> Discovered group coordinator localhost:9092 (id: 2147483647 rack: null)>
> ...metadatacache is updated (2 nodes in the cluster list)
> 2019-04-09 16:23:18,928 DEBUG 
> [org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate] - 
> [Consumer clientId=events-consumer0, groupId=events-group-gabriel] Sending 
> metadata request (type=MetadataRequest, topics=) to node localhost:9092 
> (id: -1 rack: null)>
> 2019-04-09 16:23:18,940 DEBUG [org.apache.kafka.clients.Metadata.update] - 
> Updated cluster metadata version 2 to MetadataCache{cluster=Cluster(id = 
> P3pz1xU0SjK-Dhy6h2G5YA, nodes = [localhost:9092 (id: 0 rack: null), 
> localhost:9093 (id: 1 rack: null)], partitions = [], controller = 
> localhost:9092 (id: 0 rack: null))}>
> {noformat}
> 3) Shutdown 1-node (localhost:9093)
> {noformat}
> // metadata was updated to version 4 (but for some reason it still had 2
> alive nodes inside the cluster)
> 2019-04-09 16:23:46,981 DEBUG [org.apache.kafka.clients.Metadata.update] - 
> Updated cluster metadata version 4 to MetadataCache{cluster=Cluster(id = 
> P3pz1xU0SjK-Dhy6h2G5YA, nodes = [localhost:9093 (id: 1 rack: null), 
> localhost:9092 (id: 0 rack: null)], partitions = [Partition(topic = 
> events-sorted, partition = 1, leader = 0, replicas = [0,1], isr = [0,1], 
> offlineReplicas = []), Partition(topic = events-sorted, partition = 0, leader 
> = 0, replicas = [0,1], isr = [0,1], offlineReplicas = [])], controller = 
> localhost:9092 (id: 0 rack: null))}>
> // consumer thinks that node 1 is still alive and tries to send a coordinator
> lookup to it, but fails
> 2019-04-09 16:23:46,981 INFO 
> [org.apache.kafka.clients.consumer.internals.AbstractCoordinator$FindCoordinatorResponseHandler.onSuccess]
>  - [Consumer clientId=events-consumer0, groupId=events-group-gabriel] 
> Discovered group coordinator localhost:9093 (id: 2147483646 rack: null)>
> 2019-04-09 16:23:46,981 INFO 
> [org.apache.kafka.clients.consumer.internals.AbstractCoordinator.markCoordinatorUnknown]
>  - [Consumer clientId=events-consumer0, groupId=events-group-gabriel] Group 
> coordinator localhost:9093 (id: 2147483646 rack: null) is unavailable or 
> invalid, will attempt rediscovery>
> 2019-04-09 16:24:01,117 DEBUG 
> [org.apache.kafka.clients.NetworkClient.handleDisconnections] - [Consumer 
> clientId=events-consumer0, groupId=events-group-gabriel] Node 1 disconnected.>
> 2019-04-09 16:24:01,117 WARN 
> [org.apache.kafka.clients.NetworkClient.processDisconnection] - [Consumer 
> clientId=events-consumer0, groupId=events-group-gabriel] Connection to node 1 
> (localhost:9093) could not be established. Broker may not be available.>
> // refreshing metadata again
> 2019-04-09 16:24:01,117 DEBUG 
> [org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion]
>  - [Consumer clientId=events-consumer0, groupId=events-group-gabriel] 
> Cancelled request with header RequestHeader(apiKey=FIND_COORDINATOR, 
> apiVersion=2, clientId=events-consumer0, correlationId=112) due to node 1 
> being disconnected>
> 2019-04-09 16:24:01,117 DEBUG 
> 

[jira] [Resolved] (KAFKA-8839) Improve logging in Kafka Streams around debugging task lifecycle

2019-10-07 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar resolved KAFKA-8839.
---
Resolution: Fixed

Closing since the PR has been merged

> Improve logging in Kafka Streams around debugging task lifecycle 
> -
>
> Key: KAFKA-8839
> URL: https://issues.apache.org/jira/browse/KAFKA-8839
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 2.4.0
>
>
> As a follow-up to KAFKA-8831, this Jira will track efforts to improve
> logging/docs around:
>  * Being able to follow the state of tasks from assignment to restoration
>  * Better detection of a misconfigured state store dir
>  * Docs giving guidance on rebalance time and state store config



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8839) Improve logging in Kafka Streams around debugging task lifecycle

2019-10-07 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated KAFKA-8839:
--
Fix Version/s: 2.4.0

> Improve logging in Kafka Streams around debugging task lifecycle 
> -
>
> Key: KAFKA-8839
> URL: https://issues.apache.org/jira/browse/KAFKA-8839
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 2.4.0
>
>
> As a follow-up to KAFKA-8831, this Jira will track efforts to improve
> logging/docs around:
>  * Being able to follow the state of tasks from assignment to restoration
>  * Better detection of a misconfigured state store dir
>  * Docs giving guidance on rebalance time and state store config



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-5488) KStream.branch should not return a Array of streams we have to access by known index

2019-10-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946305#comment-16946305
 ] 

ASF GitHub Bot commented on KAFKA-5488:
---

mjsax commented on pull request #6164: KAFKA-5488: A method-chaining way to 
build branches for topology
URL: https://github.com/apache/kafka/pull/6164
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> KStream.branch should not return a Array of streams we have to access by 
> known index
> 
>
> Key: KAFKA-5488
> URL: https://issues.apache.org/jira/browse/KAFKA-5488
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Marcel "childNo͡.de" Trautwein
>Priority: Major
>  Labels: kip
>
> KIP-418: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-418%3A+A+method-chaining+way+to+branch+KStream]
> Long story short: it's a mess to get a {{KStream<>[]}} out of
> {{KStream<>branch(Predicate<>...)}}. It breaks the fluent API and produces
> code that is hard to maintain, since you have to know the right index for an
> unnamed branched stream.
> Example
> {code:java}
> import org.apache.kafka.streams.kstream.KStreamBuilder;
> import org.apache.kafka.streams.kstream.KStream;
> public class StreamAppWithBranches {
>     public static void main(String... args) {
>         KStream[] branchedStreams = new KStreamBuilder()
>             .stream("eventTopic")
>             .branch(
>                 (k, v) -> EventType.validData(v),
>                 (k, v) -> true
>             );
> 
>         branchedStreams[0]
>             .to("topicValidData");
> 
>         branchedStreams[1]
>             .to("topicInvalidData");
>     }
> }
> {code}
> Quick idea, something like {{void branch(final BranchDefinition<Predicate<K,
> V>, Consumer<KStream<K, V>>>... branchPredicatesAndHandlers);}} where you
> can write each branch's code nested where it belongs,
> so it would be possible to write code like
> {code:java}
> new KStreamBuilder()
>     .stream("eventTopic")
>     .branch(
>         Branch.create(
>             (k, v) -> EventType.validData(v),
>             stream -> stream.to("topicValidData")
>         ),
>         Branch.create(
>             (k, v) -> true,
>             stream -> stream.to("topicInvalidData")
>         )
>     );
> {code}
> I'll go forward to evaluate some ideas:
>  [https://gitlab.com/childno.de/apache_kafka/snippets/1665655]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-5488) KStream.branch should not return a Array of streams we have to access by known index

2019-10-07 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946304#comment-16946304
 ] 

Matthias J. Sax commented on KAFKA-5488:


The corresponding KIP was inactive for a long time, and it's unclear if 
[~iponomarev] is still working on it. Hence, I am closing the corresponding PR 
for now and un-assigning the ticket. Feel free to resume your work at any time 
[~iponomarev] – if anybody else wants to pick it up, feel free to do so.

> KStream.branch should not return a Array of streams we have to access by 
> known index
> 
>
> Key: KAFKA-5488
> URL: https://issues.apache.org/jira/browse/KAFKA-5488
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.2.1, 0.11.0.0
>Reporter: Marcel "childNo͡.de" Trautwein
>Assignee: Ivan Ponomarev
>Priority: Major
>  Labels: kip
>
> KIP-418: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-418%3A+A+method-chaining+way+to+branch+KStream]
> Long story short: it's a mess to get a {{KStream<>[]}} out of
> {{KStream<>branch(Predicate<>...)}}. It breaks the fluent API and produces
> code that is hard to maintain, since you have to know the right index for an
> unnamed branched stream.
> Example
> {code:java}
> import org.apache.kafka.streams.kstream.KStreamBuilder;
> import org.apache.kafka.streams.kstream.KStream;
> public class StreamAppWithBranches {
>     public static void main(String... args) {
>         KStream[] branchedStreams = new KStreamBuilder()
>             .stream("eventTopic")
>             .branch(
>                 (k, v) -> EventType.validData(v),
>                 (k, v) -> true
>             );
> 
>         branchedStreams[0]
>             .to("topicValidData");
> 
>         branchedStreams[1]
>             .to("topicInvalidData");
>     }
> }
> {code}
> Quick idea, something like {{void branch(final BranchDefinition<Predicate<K,
> V>, Consumer<KStream<K, V>>>... branchPredicatesAndHandlers);}} where you
> can write each branch's code nested where it belongs,
> so it would be possible to write code like
> {code:java}
> new KStreamBuilder()
>     .stream("eventTopic")
>     .branch(
>         Branch.create(
>             (k, v) -> EventType.validData(v),
>             stream -> stream.to("topicValidData")
>         ),
>         Branch.create(
>             (k, v) -> true,
>             stream -> stream.to("topicInvalidData")
>         )
>     );
> {code}
> I'll go forward to evaluate some ideas:
>  [https://gitlab.com/childno.de/apache_kafka/snippets/1665655]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-8432) Add static membership to Sticky assignor

2019-10-07 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen reassigned KAFKA-8432:
--

Assignee: (was: Boyang Chen)

> Add static membership to Sticky assignor
> 
>
> Key: KAFKA-8432
> URL: https://issues.apache.org/jira/browse/KAFKA-8432
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-7245) Deprecate WindowStore#put(key, value)

2019-10-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946280#comment-16946280
 ] 

ASF GitHub Bot commented on KAFKA-7245:
---

mjsax commented on pull request #7105: KAFKA-7245 (Deprecate 
WindowStore#put(key, value)) :- The method in t…
URL: https://github.com/apache/kafka/pull/7105
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Deprecate WindowStore#put(key, value)
> -
>
> Key: KAFKA-7245
> URL: https://issues.apache.org/jira/browse/KAFKA-7245
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Omkar Mestry
>Priority: Minor
>  Labels: kip, newbie
>
> We want to remove `WindowStore#put(key, value)` – for this, we first need to
> deprecate it via a KIP and remove it later.
> Instead of using `WindowStore#put(key, value)` we need to migrate code to
> specify the timestamp explicitly using `WindowStore#put(key, value,
> timestamp)`. The current code base already uses the explicit call to set the
> timestamp in production code. The simplified `put(key, value)` is only used
> in tests, and thus, we would need to update those tests.
> KIP-474 :- 
> [https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=115526545]
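For illustration, a minimal sketch of the migration the ticket describes;
the store name, key, value, and timestamp here are made up:

```java
import org.apache.kafka.streams.state.WindowStore;

class WindowStorePutMigration {
    void write(final WindowStore<String, Long> store, final long eventTimestamp) {
        // Before (to be deprecated): the timestamp is picked up implicitly.
        // store.put("some-key", 1L);

        // After: pass the window start timestamp explicitly.
        store.put("some-key", 1L, eventTimestamp);
    }
}
```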



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8972) Toggle bulkloading hit NPE

2019-10-07 Thread Sophie Blee-Goldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946262#comment-16946262
 ] 

Sophie Blee-Goldman commented on KAFKA-8972:


[~bchen225242] I've also seen this fail with a different but related error, a 
RocksDbException. Do you think we should track this all under one ticket for 
the test_broker_type_bounce failures, or have a separate ticket for each kind 
of failure we see? It's difficult to tell if/how exactly the two are related, 
but I suspect they are since they both have to do with incorrectly 
opening/closing a store

> Toggle bulkloading hit NPE
> --
>
> Key: KAFKA-8972
> URL: https://issues.apache.org/jira/browse/KAFKA-8972
> Project: Kafka
>  Issue Type: Bug
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>
> test `test_broker_type_bounce` could sometimes fail due to NPE in changelog 
> restoration:
>  
> ```
> [2019-09-30 15:22:43,574] ERROR stream-thread 
> [SmokeTest-357607f6-655b-4b3c-ad3e-f5e5e19df83e-StreamThread-2] Encountered 
> the following error during processing: 
> (org.apache.kafka.streams.processor.internals.StreamThread)
> java.lang.NullPointerException
>         at 
> org.apache.kafka.streams.state.internals.RocksDBStore.toggleDbForBulkLoading(RocksDBStore.java:403)
>         at 
> org.apache.kafka.streams.state.internals.RocksDBStore$RocksDBBatchingRestoreCallback.onRestoreStart(RocksDBStore.java:650)
>         at 
> org.apache.kafka.streams.processor.internals.CompositeRestoreListener.onRestoreStart(CompositeRestoreListener.java:59)
>         at 
> org.apache.kafka.streams.processor.internals.StateRestorer.restoreStarted(StateRestorer.java:76)
>         at 
> org.apache.kafka.streams.processor.internals.StoreChangelogReader.startRestoration(StoreChangelogReader.java:205)
>         at 
> org.apache.kafka.streams.processor.internals.StoreChangelogReader.initialize(StoreChangelogReader.java:181)
>         at 
> org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:79)
>         at 
> org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:327)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:863)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:792)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:761)
> ```
> This seems to be a bug in dbAccessor initialization.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-8993) Add an EOS performance test suite similar to ProducerPerformance

2019-10-07 Thread Boyang Chen (Jira)
Boyang Chen created KAFKA-8993:
--

 Summary: Add an EOS performance test suite similar to 
ProducerPerformance
 Key: KAFKA-8993
 URL: https://issues.apache.org/jira/browse/KAFKA-8993
 Project: Kafka
  Issue Type: Improvement
Reporter: Boyang Chen
Assignee: Boyang Chen






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8972) Toggle bulkloading hit NPE

2019-10-07 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-8972:
---
Description: 
test `test_broker_type_bounce` could sometimes fail due to NPE in changelog 
restoration:

 

```

[2019-09-30 15:22:43,574] ERROR stream-thread 
[SmokeTest-357607f6-655b-4b3c-ad3e-f5e5e19df83e-StreamThread-2] Encountered the 
following error during processing: 
(org.apache.kafka.streams.processor.internals.StreamThread)

java.lang.NullPointerException

        at 
org.apache.kafka.streams.state.internals.RocksDBStore.toggleDbForBulkLoading(RocksDBStore.java:403)

        at 
org.apache.kafka.streams.state.internals.RocksDBStore$RocksDBBatchingRestoreCallback.onRestoreStart(RocksDBStore.java:650)

        at 
org.apache.kafka.streams.processor.internals.CompositeRestoreListener.onRestoreStart(CompositeRestoreListener.java:59)

        at 
org.apache.kafka.streams.processor.internals.StateRestorer.restoreStarted(StateRestorer.java:76)

        at 
org.apache.kafka.streams.processor.internals.StoreChangelogReader.startRestoration(StoreChangelogReader.java:205)

        at 
org.apache.kafka.streams.processor.internals.StoreChangelogReader.initialize(StoreChangelogReader.java:181)

        at 
org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:79)

        at 
org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:327)

        at 
org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:863)

        at 
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:792)

        at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:761)
```
This seems to be a bug in dbAccessor initialization.

  was:test `test_broker_type_bounce` could sometimes fail due to a
task-migrated exception by the end of the test. This is because during a
cluster rolling bounce the member could time out and be rejected from
rejoining the group. A proper fix would be to improve the resilience of the
Streams client by making the session timeout longer.


> Toggle bulkloading hit NPE
> --
>
> Key: KAFKA-8972
> URL: https://issues.apache.org/jira/browse/KAFKA-8972
> Project: Kafka
>  Issue Type: Bug
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>
> test `test_broker_type_bounce` could sometimes fail due to NPE in changelog 
> restoration:
>  
> ```
> [2019-09-30 15:22:43,574] ERROR stream-thread 
> [SmokeTest-357607f6-655b-4b3c-ad3e-f5e5e19df83e-StreamThread-2] Encountered 
> the following error during processing: 
> (org.apache.kafka.streams.processor.internals.StreamThread)
> java.lang.NullPointerException
>         at 
> org.apache.kafka.streams.state.internals.RocksDBStore.toggleDbForBulkLoading(RocksDBStore.java:403)
>         at 
> org.apache.kafka.streams.state.internals.RocksDBStore$RocksDBBatchingRestoreCallback.onRestoreStart(RocksDBStore.java:650)
>         at 
> org.apache.kafka.streams.processor.internals.CompositeRestoreListener.onRestoreStart(CompositeRestoreListener.java:59)
>         at 
> org.apache.kafka.streams.processor.internals.StateRestorer.restoreStarted(StateRestorer.java:76)
>         at 
> org.apache.kafka.streams.processor.internals.StoreChangelogReader.startRestoration(StoreChangelogReader.java:205)
>         at 
> org.apache.kafka.streams.processor.internals.StoreChangelogReader.initialize(StoreChangelogReader.java:181)
>         at 
> org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:79)
>         at 
> org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:327)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:863)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:792)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:761)
> ```
> This seems to be a bug in dbAccessor initialization.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8972) Toggle bulkloading hit NPE

2019-10-07 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-8972:
---
Summary: Toggle bulkloading hit NPE  (was: Reduce 
streams_broker_bounce_test.py#test_broker_type_bounce flakiness)

> Toggle bulkloading hit NPE
> --
>
> Key: KAFKA-8972
> URL: https://issues.apache.org/jira/browse/KAFKA-8972
> Project: Kafka
>  Issue Type: Bug
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>
> test `test_broker_type_bounce` could sometimes fail due to a task-migrated 
> exception at the end of the test. This is because during a cluster rolling bounce 
> the member could time out and be rejected from rejoining the group. A proper 
> fix would be to improve the resilience of the streams client by making the 
> session timeout longer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8973) Give user options to turn off linger.ms/batch.size constraint

2019-10-07 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946232#comment-16946232
 ] 

Boyang Chen commented on KAFKA-8973:


I guess it's not an action item that makes sense, so I'm closing it.

> Give user options to turn off linger.ms/batch.size constraint
> -
>
> Key: KAFKA-8973
> URL: https://issues.apache.org/jira/browse/KAFKA-8973
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>
> Currently there is no explicit way to understand whether we hit linger.ms or 
> batch.size when the producer sends out batches. From the user's perspective, 
> they should be allowed to opt into either one or both, which would improve 
> throughput/latency control in general.
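
For context, a minimal sketch (not from the ticket; broker address, topic, and values are illustrative) of how the two settings interact today: a partition's batch is sent once it reaches batch.size, or once linger.ms has elapsed, whichever comes first.

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class BatchingConfigExample {
    public static void main(final String[] args) {
        final Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // illustrative
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // A batch is sent when either constraint is hit first; there is currently
        // no way to opt out of one of them, which is what this ticket asks about.
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384); // max bytes per partition batch
        props.put(ProducerConfig.LINGER_MS_CONFIG, 5);      // max ms to wait for a non-full batch

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test-topic", "key", "value")); // illustrative topic
        }
    }
}
{code}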



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing

2019-10-07 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946208#comment-16946208
 ] 

Guozhang Wang commented on KAFKA-8104:
--

[~nizhikov] Thanks for picking this up, I think this issue is the same as 
reported in https://issues.apache.org/jira/browse/KAFKA-8891 and 
https://issues.apache.org/jira/browse/KAFKA-7263 as well, which is a 
long-lurking bug. So I'm linking them together now.

I will take a look into your PR as well.

> Consumer cannot rejoin to the group after rebalancing
> -
>
> Key: KAFKA-8104
> URL: https://issues.apache.org/jira/browse/KAFKA-8104
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0
>Reporter: Gregory Koshelev
>Assignee: Nikolay Izhikov
>Priority: Critical
> Attachments: consumer-rejoin-fail.log
>
>
> TL;DR: {{KafkaConsumer}} cannot rejoin the group due to an inconsistent 
> {{AbstractCoordinator.generation}} (which is {{NO_GENERATION}}) and 
> {{AbstractCoordinator.joinFuture}} (which is a succeeded {{RequestFuture}}). 
> See explanation below.
> There are 16 consumers in a single process (threads from pool-4-thread-1 to 
> pool-4-thread-16). All of them belong to a single consumer group 
> {{hercules.sink.elastic.legacy_logs_elk_c2}}. Rebalancing has been triggered 
> and the consumers have got {{CommitFailedException}} as expected:
> {noformat}
> 2019-03-10T03:16:37.023Z [pool-4-thread-10] WARN  
> r.k.vostok.hercules.sink.SimpleSink - Commit failed due to rebalancing
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
> completed since the group has already rebalanced and assigned the partitions 
> to another member. This means that the time between subsequent calls to 
> poll() was longer than the configured max.poll.interval.ms, which typically 
> implies that the poll loop is spending too much time message processing. You 
> can address this either by increasing the session timeout or by reducing the 
> maximum size of batches returned in poll() with max.poll.records.
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:798)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:681)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1334)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1298)
>   at ru.kontur.vostok.hercules.sink.Sink.commit(Sink.java:156)
>   at ru.kontur.vostok.hercules.sink.SimpleSink.run(SimpleSink.java:104)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> After that, most of them successfully rejoined to the group with generation 
> 10699:
> {noformat}
> 2019-03-10T03:16:39.208Z [pool-4-thread-13] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-13, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group 
> with generation 10699
> 2019-03-10T03:16:39.209Z [pool-4-thread-13] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-13, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-18]
> ...
> 2019-03-10T03:16:39.216Z [pool-4-thread-11] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-11, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group 
> with generation 10699
> 2019-03-10T03:16:39.217Z [pool-4-thread-11] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-11, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-10, legacy_logs_elk_c2-11]
> ...
> 2019-03-10T03:16:39.218Z [pool-4-thread-15] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-15, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-24]
> 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-6, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed 
> since group is rebalancing
> 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-5, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed 
> since group is 

[jira] [Commented] (KAFKA-8262) Flaky Test MetricsIntegrationTest#testStreamMetric

2019-10-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946177#comment-16946177
 ] 

ASF GitHub Bot commented on KAFKA-8262:
---

guozhangwang commented on pull request #7457: KAFKA-8262, KAFKA-8263: Fix flaky 
test `MetricsIntegrationTest` (#6922)
URL: https://github.com/apache/kafka/pull/7457
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Flaky Test MetricsIntegrationTest#testStreamMetric
> --
>
> Key: KAFKA-8262
> URL: https://issues.apache.org/jira/browse/KAFKA-8262
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Affects Versions: 2.3.0
>Reporter: Matthias J. Sax
>Assignee: Bruno Cadonna
>Priority: Major
>  Labels: flaky-test
> Fix For: 2.4.0
>
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/3900/testReport/junit/org.apache.kafka.streams.integration/MetricsIntegrationTest/testStreamMetric/]
> {quote}java.lang.AssertionError: Condition not met within timeout 1. 
> testTaskMetric -> Size of metrics of type:'commit-latency-avg' must be equal 
> to:5 but it's equal to 0 expected:<5> but was:<0> at 
> org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:361) at 
> org.apache.kafka.streams.integration.MetricsIntegrationTest.testStreamMetric(MetricsIntegrationTest.java:228){quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8262) Flaky Test MetricsIntegrationTest#testStreamMetric

2019-10-07 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-8262:
-
Fix Version/s: 2.3.1

> Flaky Test MetricsIntegrationTest#testStreamMetric
> --
>
> Key: KAFKA-8262
> URL: https://issues.apache.org/jira/browse/KAFKA-8262
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Affects Versions: 2.3.0
>Reporter: Matthias J. Sax
>Assignee: Bruno Cadonna
>Priority: Major
>  Labels: flaky-test
> Fix For: 2.4.0, 2.3.1
>
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/3900/testReport/junit/org.apache.kafka.streams.integration/MetricsIntegrationTest/testStreamMetric/]
> {quote}java.lang.AssertionError: Condition not met within timeout 1. 
> testTaskMetric -> Size of metrics of type:'commit-latency-avg' must be equal 
> to:5 but it's equal to 0 expected:<5> but was:<0> at 
> org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:361) at 
> org.apache.kafka.streams.integration.MetricsIntegrationTest.testStreamMetric(MetricsIntegrationTest.java:228){quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8263) Flaky Test MetricsIntegrationTest#testStreamMetricOfWindowStore

2019-10-07 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-8263:
-
Fix Version/s: 2.3.1

> Flaky Test MetricsIntegrationTest#testStreamMetricOfWindowStore
> ---
>
> Key: KAFKA-8263
> URL: https://issues.apache.org/jira/browse/KAFKA-8263
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Affects Versions: 2.3.0
>Reporter: Matthias J. Sax
>Assignee: Bruno Cadonna
>Priority: Major
>  Labels: flaky-test
> Fix For: 2.4.0, 2.3.1
>
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/3900/testReport/junit/org.apache.kafka.streams.integration/MetricsIntegrationTest/testStreamMetricOfWindowStore/]
> {quote}java.lang.AssertionError: Condition not met within timeout 1. 
> testStoreMetricWindow -> Size of metrics of type:'put-latency-avg' must be 
> equal to:2 but it's equal to 0 expected:<2> but was:<0> at 
> org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:361){quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8940) Flaky Test SmokeTestDriverIntegrationTest.shouldWorkWithRebalance

2019-10-07 Thread Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sophie Blee-Goldman updated KAFKA-8940:
---
Summary: Flaky Test SmokeTestDriverIntegrationTest.shouldWorkWithRebalance  
(was: Flaky SmokeTestDriverIntegrationTest.shouldWorkWithRebalance)

> Flaky Test SmokeTestDriverIntegrationTest.shouldWorkWithRebalance
> -
>
> Key: KAFKA-8940
> URL: https://issues.apache.org/jira/browse/KAFKA-8940
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Reporter: Guozhang Wang
>Priority: Major
>  Labels: flaky-test
>
> I lost the screenshot unfortunately... it reports that the set of expected 
> records does not match the received records.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8951) Avoid unnecessary rebalances and downtime for "safe" partitions

2019-10-07 Thread Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sophie Blee-Goldman updated KAFKA-8951:
---
Affects Version/s: 2.4.0

> Avoid unnecessary rebalances and downtime for "safe" partitions
> ---
>
> Key: KAFKA-8951
> URL: https://issues.apache.org/jira/browse/KAFKA-8951
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, streams
>Affects Versions: 2.4.0
>Reporter: Sophie Blee-Goldman
>Priority: Major
>
> With cooperative rebalancing, any partition that is encoded in one consumer's 
> Subscription cannot be re-assigned to a different consumer during that 
> rebalance. The partition must be removed from the assignment and revoked by 
> its old owner before triggering a second rebalance during which it can be 
> assigned. This enforces a synchronization barrier so that no two 
> consumers can ever own the same partition at the same time.
> This leads to downtime for that partition plus a second rebalance, which may 
> not always be necessary. In Streams, for example, the consumer will pause all 
> partitions of an active task until it is running (i.e., has been initialized and 
> restored). It should be safe to give these partitions away, provided they are 
> not resumed between sending the JoinGroup request and receiving the SyncGroup 
> response.
> One proposal would be to modify two methods in the ConsumerPartitionAssignor 
> interface (sketched below). 1) ConsumerPartitionAssignor#subscriptionUserData would be passed 
> in the set of `ownedPartitions` that will be included in the subscription, 
> allowing it to remove any that it knows are safe to give away.
> 2) ConsumerPartitionAssignor#onAssignment would be passed the set of revoked 
> partitions, allowing it to remove any that it knows were already reassigned 
> and should not trigger another rebalance.
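
A rough sketch of what those two modified callbacks could look like (hypothetical interface name and signatures; in the shipped 2.4 API, subscriptionUserData receives only the topic set and onAssignment receives no revoked partitions):

{code:java}
import java.nio.ByteBuffer;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerGroupMetadata;
import org.apache.kafka.clients.consumer.ConsumerPartitionAssignor;
import org.apache.kafka.common.TopicPartition;

// Hypothetical shape of the proposal, not the shipped API:
public interface CooperativeAwareAssignor extends ConsumerPartitionAssignor {

    // 1) Also receives the partitions this consumer would report as owned,
    //    so the assignor can drop any that it knows are safe to give away.
    ByteBuffer subscriptionUserData(Set<String> topics,
                                    Set<TopicPartition> ownedPartitions);

    // 2) Also receives the partitions revoked in this rebalance, so the
    //    assignor can skip a follow-up rebalance for ones already reassigned.
    void onAssignment(Assignment assignment,
                      ConsumerGroupMetadata metadata,
                      Set<TopicPartition> revokedPartitions);
}
{code}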



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-8992) Don't expose Errors in `RemoveMemberFromGroupResult`

2019-10-07 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson reassigned KAFKA-8992:
--

Assignee: Boyang Chen  (was: Jason Gustafson)

> Don't expose Errors in `RemoveMemberFromGroupResult`
> 
>
> Key: KAFKA-8992
> URL: https://issues.apache.org/jira/browse/KAFKA-8992
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Boyang Chen
>Priority: Blocker
> Fix For: 2.4.0
>
>
> The type `RemoveMemberFromGroupResult` exposes `Errors` from `topLevelError`. 
> We should just get rid of this API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8992) Don't expose Errors in `RemoveMemberFromGroupResult`

2019-10-07 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-8992:
---
Description: The type `RemoveMemberFromGroupResult` exposes `Errors` from 
`topLevelError`. We should just get rid of this API.  (was: We have two new 
APIs which have exposed the `Errors` object: `RemoveMemberFromGroupResult` and 
`DeleteConsumerGroupOffsetsResult`. We should change them to use exception 
types instead.)

> Don't expose Errors in `RemoveMemberFromGroupResult`
> 
>
> Key: KAFKA-8992
> URL: https://issues.apache.org/jira/browse/KAFKA-8992
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Blocker
> Fix For: 2.4.0
>
>
> The type `RemoveMemberFromGroupResult` exposes `Errors` from `topLevelError`. 
> We should just get rid of this API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing

2019-10-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946114#comment-16946114
 ] 

ASF GitHub Bot commented on KAFKA-8104:
---

nizhikov commented on pull request #7460: KAFKA-8104: Consumer cannot rejoin to 
the group after rebalancing
URL: https://github.com/apache/kafka/pull/7460
 
 
   This PR contains the fix for a race condition bug between the "consumer thread" and 
the "consumer coordinator heartbeat thread". It reproduces in many production 
environments.
   
   Conditions for reproducing:
   
   1. The consumer thread initiates a rejoin to the group because of a commit timeout, 
calling `AbstractCoordinator#joinGroupIfNeeded`, which leads to 
`sendJoinGroupRequest`.
   2. `JoinGroupResponseHandler` writes the new generation data to 
`AbstractCoordinator.this.generation` and leaves the `synchronized` section.
   3. The heartbeat thread executes `maybeLeaveGroup` and clears the generation data via 
`resetGenerationOnLeaveGroup`.
   4. The consumer thread executes `onJoinComplete(generation.generationId, 
generation.memberId, generation.protocol, memberAssignment);` with the cleared 
generation data. This leads to the corresponding exception.
   
   The race is fixed with a condition in `maybeLeaveGroup`: if there is an ongoing 
rejoin in the consumer thread, there is no reason to reset the generation data 
and send a `LeaveGroupRequest` from the heartbeat thread (see the sketch below).
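   
   A minimal sketch of that guard (hypothetical helper names; the actual patch in the PR may differ):
   
```
// Sketch only (hypothetical names): the shape of the guard added to
// AbstractCoordinator#maybeLeaveGroup by this fix.
synchronized void maybeLeaveGroup() {
    // If the consumer thread is mid-rejoin, skip the reset and the
    // LeaveGroupRequest so we don't wipe generation data that was just
    // written between sendJoinGroupRequest and onJoinComplete.
    if (rejoinInProgress()) {        // hypothetical state check
        return;
    }
    resetGenerationOnLeaveGroup();
    sendLeaveGroupRequest();         // hypothetical wrapper around the request send
}
```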
   
   This PR contains an unfair "reproducer", implemented with a `CountDownLatch` that 
imitates the described race in the `AbstractCoordinator` code.
   
   I need assistance on how a fair reproducer should be implemented.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Consumer cannot rejoin to the group after rebalancing
> -
>
> Key: KAFKA-8104
> URL: https://issues.apache.org/jira/browse/KAFKA-8104
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0
>Reporter: Gregory Koshelev
>Assignee: Nikolay Izhikov
>Priority: Critical
> Attachments: consumer-rejoin-fail.log
>
>
> TL;DR: {{KafkaConsumer}} cannot rejoin the group due to an inconsistent 
> {{AbstractCoordinator.generation}} (which is {{NO_GENERATION}}) and 
> {{AbstractCoordinator.joinFuture}} (which is a succeeded {{RequestFuture}}). 
> See explanation below.
> There are 16 consumers in a single process (threads from pool-4-thread-1 to 
> pool-4-thread-16). All of them belong to a single consumer group 
> {{hercules.sink.elastic.legacy_logs_elk_c2}}. Rebalancing has been triggered 
> and the consumers have got {{CommitFailedException}} as expected:
> {noformat}
> 2019-03-10T03:16:37.023Z [pool-4-thread-10] WARN  
> r.k.vostok.hercules.sink.SimpleSink - Commit failed due to rebalancing
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
> completed since the group has already rebalanced and assigned the partitions 
> to another member. This means that the time between subsequent calls to 
> poll() was longer than the configured max.poll.interval.ms, which typically 
> implies that the poll loop is spending too much time message processing. You 
> can address this either by increasing the session timeout or by reducing the 
> maximum size of batches returned in poll() with max.poll.records.
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:798)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:681)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1334)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1298)
>   at ru.kontur.vostok.hercules.sink.Sink.commit(Sink.java:156)
>   at ru.kontur.vostok.hercules.sink.SimpleSink.run(SimpleSink.java:104)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> After that, most of them successfully rejoined to the group with generation 
> 10699:
> {noformat}
> 2019-03-10T03:16:39.208Z [pool-4-thread-13] INFO  
> o.a.k.c.c.i.AbstractCoordinator - 

[jira] [Updated] (KAFKA-8992) Don't expose Errors in `RemoveMemberFromGroupResult`

2019-10-07 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-8992:
---
Summary: Don't expose Errors in `RemoveMemberFromGroupResult`  (was: Don't 
expose Errors in Admin result objects)

> Don't expose Errors in `RemoveMemberFromGroupResult`
> 
>
> Key: KAFKA-8992
> URL: https://issues.apache.org/jira/browse/KAFKA-8992
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Blocker
> Fix For: 2.4.0
>
>
> We have two new APIs which have exposed the `Errors` object: 
> `RemoveMemberFromGroupResult` and `DeleteConsumerGroupOffsetsResult`. We 
> should change them to use exception types instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8992) Don't expose Errors in Admin result objects

2019-10-07 Thread Jason Gustafson (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946112#comment-16946112
 ] 

Jason Gustafson commented on KAFKA-8992:


My mistake. It is just `RemoveMemberFromGroupResult` that has exposed `Errors`. 

> Don't expose Errors in Admin result objects
> ---
>
> Key: KAFKA-8992
> URL: https://issues.apache.org/jira/browse/KAFKA-8992
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Blocker
> Fix For: 2.4.0
>
>
> We have two new APIs which have exposed the `Errors` object: 
> `RemoveMemberFromGroupResult` and `DeleteConsumerGroupOffsetsResult`. We 
> should change them to use exception types instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8147) Add changelog topic configuration to KTable suppress

2019-10-07 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-8147:
---
Description: 
The streams DSL does not provide a way to configure the changelog topic created 
by KTable.suppress.

From the perspective of an external user this could be implemented similarly to 
the configuration of aggregate + materialized, i.e.,
{code:java}
changelogTopicConfigs = // Configs
materialized = Materialized.as(..).withLoggingEnabled(changelogTopicConfigs)
..
KGroupedStream.aggregate(..,materialized)
{code}
[KIP-446: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-446%3A+Add+changelog+topic+configuration+to+KTable+suppress|https://cwiki.apache.org/confluence/display/KAFKA/KIP-446%3A+Add+changelog+topic+configuration+to+KTable+suppress]

  was:
The streams DSL does not provide a way to configure the changelog topic created 
by KTable.suppress.

From the perspective of an external user this could be implemented similarly to 
the configuration of aggregate + materialized, i.e., 
{code:java}
changelogTopicConfigs = // Configs
materialized = Materialized.as(..).withLoggingEnabled(changelogTopicConfigs)
..
KGroupedStream.aggregate(..,materialized)
{code}


> Add changelog topic configuration to KTable suppress
> 
>
> Key: KAFKA-8147
> URL: https://issues.apache.org/jira/browse/KAFKA-8147
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 2.1.1
>Reporter: Maarten
>Assignee: Maarten
>Priority: Minor
>  Labels: kip
>
> The streams DSL does not provide a way to configure the changelog topic 
> created by KTable.suppress.
> From the perspective of an external user this could be implemented similarly to 
> the configuration of aggregate + materialized, i.e.,
> {code:java}
> changelogTopicConfigs = // Configs
> materialized = Materialized.as(..).withLoggingEnabled(changelogTopicConfigs)
> ..
> KGroupedStream.aggregate(..,materialized)
> {code}
> [KIP-446: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-446%3A+Add+changelog+topic+configuration+to+KTable+suppress|https://cwiki.apache.org/confluence/display/KAFKA/KIP-446%3A+Add+changelog+topic+configuration+to+KTable+suppress]
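
For reference, a runnable version of the pseudocode above (a minimal sketch: the store name, changelog config, serdes, and the count-style aggregation are all illustrative):

{code:java}
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.kstream.KGroupedStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class ChangelogConfigExample {
    static KTable<String, Long> buildCounts(final KGroupedStream<String, String> groupedStream) {
        final Map<String, String> changelogTopicConfigs = new HashMap<>();
        changelogTopicConfigs.put("retention.ms", "86400000"); // illustrative override

        final Materialized<String, Long, KeyValueStore<Bytes, byte[]>> materialized =
            Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-store")
                .withKeySerde(Serdes.String())
                .withValueSerde(Serdes.Long())
                .withLoggingEnabled(changelogTopicConfigs); // changelog topic configs

        return groupedStream.aggregate(
            () -> 0L,                                   // initializer
            (key, value, aggregate) -> aggregate + 1L,  // aggregator
            materialized);
    }
}
{code}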



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-8992) Don't expose Errors in Admin result objects

2019-10-07 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-8992:
--

 Summary: Don't expose Errors in Admin result objects
 Key: KAFKA-8992
 URL: https://issues.apache.org/jira/browse/KAFKA-8992
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson
Assignee: Jason Gustafson
 Fix For: 2.4.0


We have two new APIs which have exposed the `Errors` object: 
`RemoveMemberFromGroupResult` and `DeleteConsumerGroupOffsetsResult`. We should 
change them to use exception types instead.
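
A rough sketch of the direction (hypothetical shape, not the shipped class): instead of exposing the internal `Errors` enum, the result object surfaces failures through a `KafkaFuture` that completes exceptionally with a public exception type.

{code:java}
import org.apache.kafka.common.KafkaFuture;

// Hypothetical result shape, not the shipped class:
public class RemoveMemberFromGroupResult {

    private final KafkaFuture<Void> future;

    RemoveMemberFromGroupResult(final KafkaFuture<Void> future) {
        this.future = future;
    }

    // Completes normally on success; completes exceptionally with a public
    // ApiException subtype instead of exposing the internal Errors enum.
    public KafkaFuture<Void> all() {
        return future;
    }
}
{code}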



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8147) Add changelog topic configuration to KTable suppress

2019-10-07 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-8147:
---
Labels: kip  (was: needs-kip)

> Add changelog topic configuration to KTable suppress
> 
>
> Key: KAFKA-8147
> URL: https://issues.apache.org/jira/browse/KAFKA-8147
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 2.1.1
>Reporter: Maarten
>Assignee: Maarten
>Priority: Minor
>  Labels: kip
>
> The streams DSL does not provide a way to configure the changelog topic 
> created by KTable.suppress.
> From the perspective of an external user this could be implemented similarly to 
> the configuration of aggregate + materialized, i.e., 
> {code:java}
> changelogTopicConfigs = // Configs
> materialized = Materialized.as(..).withLoggingEnabled(changelogTopicConfigs)
> ..
> KGroupedStream.aggregate(..,materialized)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8575) Investigate removing EAGER & cleaning up task suspension (part 8)

2019-10-07 Thread Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sophie Blee-Goldman updated KAFKA-8575:
---
Description: 
With KIP-429 the suspend/resume of tasks may have minimal gains while adding a 
lot of complexity and potential bugs. We should consider removing/cleaning it 
up.

We should also consider removing EAGER rebalancing from Streams entirely, if 
results indicate that COOPERATIVE displays superior performance. 

  was:
With KIP-429 the suspend/resume of tasks may have minimal gains while adding a 
lot of complexity and potential bugs. We should consider removing/cleaning it 
up.

 


> Investigate removing EAGER &  cleaning up task suspension (part 8)
> --
>
> Key: KAFKA-8575
> URL: https://issues.apache.org/jira/browse/KAFKA-8575
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Affects Versions: 3.0.0
>Reporter: Sophie Blee-Goldman
>Priority: Major
>
> With KIP-429 the suspend/resume of tasks may have minimal gains while adding 
> a lot of complexity and potential bugs. We should consider removing/cleaning 
> it up.
> We should also consider removing EAGER rebalancing from Streams entirely, if 
> results indicate that COOPERATIVE displays superior performance. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8575) Investigate removing EAGER & cleaning up task suspension (part 8)

2019-10-07 Thread Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sophie Blee-Goldman updated KAFKA-8575:
---
Description: 
With KIP-429 the suspend/resume of tasks may have minimal gains while adding a 
lot of complexity and potential bugs. We should consider removing/cleaning it 
up.

 

  was:With KIP-429 the suspend/resume of tasks may have minimal gains while 
adding a lot of complexity and potential bugs. We should consider 
removing/cleaning it up.


> Investigate removing EAGER &  cleaning up task suspension (part 8)
> --
>
> Key: KAFKA-8575
> URL: https://issues.apache.org/jira/browse/KAFKA-8575
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Sophie Blee-Goldman
>Priority: Major
>
> With KIP-429 the suspend/resume of tasks may have minimal gains while adding 
> a lot of complexity and potential bugs. We should consider removing/cleaning 
> it up.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8575) Investigate removing EAGER & cleaning up task suspension (part 8)

2019-10-07 Thread Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sophie Blee-Goldman updated KAFKA-8575:
---
Affects Version/s: 3.0.0

> Investigate removing EAGER &  cleaning up task suspension (part 8)
> --
>
> Key: KAFKA-8575
> URL: https://issues.apache.org/jira/browse/KAFKA-8575
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Affects Versions: 3.0.0
>Reporter: Sophie Blee-Goldman
>Priority: Major
>
> With KIP-429 the suspend/resume of tasks may have minimal gains while adding 
> a lot of complexity and potential bugs. We should consider removing/cleaning 
> it up.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8575) Investigate removing EAGER & cleaning up task suspension (part 8)

2019-10-07 Thread Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sophie Blee-Goldman updated KAFKA-8575:
---
Summary: Investigate removing EAGER &  cleaning up task suspension (part 8) 
 (was: Investigate cleaning up task suspension (part 8))

> Investigate removing EAGER &  cleaning up task suspension (part 8)
> --
>
> Key: KAFKA-8575
> URL: https://issues.apache.org/jira/browse/KAFKA-8575
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Sophie Blee-Goldman
>Priority: Major
>
> With KIP-429 the suspend/resume of tasks may have minimal gains while adding 
> a lot of complexity and potential bugs. We should consider removing/cleaning 
> it up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8510) Update StreamsPartitionAssignor to use the built-in owned partitions to achieve stickiness (part 7)

2019-10-07 Thread Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sophie Blee-Goldman updated KAFKA-8510:
---
Component/s: (was: consumer)

> Update StreamsPartitionAssignor to use the built-in owned partitions to 
> achieve stickiness (part 7)
> ---
>
> Key: KAFKA-8510
> URL: https://issues.apache.org/jira/browse/KAFKA-8510
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Sophie Blee-Goldman
>Priority: Major
> Fix For: 2.4.0
>
>
> Today this information is encoded as part of the user data bytes; we can now 
> remove it and leverage the owned partitions of the protocol directly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-6498) Add RocksDB statistics via Streams metrics

2019-10-07 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-6498:
---
Description: 
RocksDB's own stats can be programmatically exposed via 
{{Options.statistics()}} and the JNI `Statistics` has indeed implemented many 
useful settings already. However, these stats are not exposed directly via 
Streams today, so any user who wants access to them has to manually interact 
with the underlying RocksDB directly, not through Streams.

We should expose such stats via Streams metrics programmatically so that users 
can investigate them without accessing RocksDB directly.

[KIP-471: Expose RocksDB Metrics in Kafka 
Streams|http://cwiki.apache.org/confluence/display/KAFKA/KIP-471%3A+Expose+RocksDB+Metrics+in+Kafka+Streams]

  was:
RocksDB's own stats can be programmatically exposed via 
{{Options.statistics()}} and the JNI `Statistics` has indeed implemented many 
useful settings already. However, these stats are not exposed directly via 
Streams today, so any user who wants access to them has to manually interact 
with the underlying RocksDB directly, not through Streams.

We should expose such stats via Streams metrics programmatically so that users 
can investigate them without accessing RocksDB directly.



> Add RocksDB statistics via Streams metrics
> --
>
> Key: KAFKA-6498
> URL: https://issues.apache.org/jira/browse/KAFKA-6498
> Project: Kafka
>  Issue Type: Improvement
>  Components: metrics, streams
>Reporter: Guozhang Wang
>Assignee: Bruno Cadonna
>Priority: Major
>  Labels: kip
>
> RocksDB's own stats can be programmatically exposed via 
> {{Options.statistics()}} and the JNI `Statistics` has indeed implemented many 
> useful settings already. However, these stats are not exposed directly via 
> Streams today, so any user who wants access to them has to manually 
> interact with the underlying RocksDB directly, not through 
> Streams.
> We should expose such stats via Streams metrics programmatically so that users 
> can investigate them without accessing RocksDB directly.
> [KIP-471: Expose RocksDB Metrics in Kafka 
> Streams|http://cwiki.apache.org/confluence/display/KAFKA/KIP-471%3A+Expose+RocksDB+Metrics+in+Kafka+Streams]
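
Until that KIP lands, a minimal sketch of the manual route mentioned above: a RocksDBConfigSetter that turns on RocksDB-native statistics per store (assuming the bundled rocksdbjni exposes setStatistics; the dump period is illustrative, and the class must be registered via the `rocksdb.config.setter` Streams config):

{code:java}
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.Options;
import org.rocksdb.Statistics;

public class StatisticsConfigSetter implements RocksDBConfigSetter {

    @Override
    public void setConfig(final String storeName,
                          final Options options,
                          final Map<String, Object> configs) {
        // Enable RocksDB's own statistics for this store; they can then be
        // read from the Statistics object or from the periodic dump in the
        // RocksDB LOG file.
        options.setStatistics(new Statistics());
        options.setStatsDumpPeriodSec(600); // illustrative: dump every 10 minutes
    }
}
{code}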



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-6498) Add RocksDB statistics via Streams metrics

2019-10-07 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-6498:
---
Labels: kip  (was: needs-kip)

> Add RocksDB statistics via Streams metrics
> --
>
> Key: KAFKA-6498
> URL: https://issues.apache.org/jira/browse/KAFKA-6498
> Project: Kafka
>  Issue Type: Improvement
>  Components: metrics, streams
>Reporter: Guozhang Wang
>Assignee: Bruno Cadonna
>Priority: Major
>  Labels: kip
>
> RocksDB's own stats can be programmatically exposed via 
> {{Options.statistics()}} and the JNI `Statistics` has indeed implemented many 
> useful settings already. However, these stats are not exposed directly via 
> Streams today, so any user who wants access to them has to manually 
> interact with the underlying RocksDB directly, not through 
> Streams.
> We should expose such stats via Streams metrics programmatically so that users 
> can investigate them without accessing RocksDB directly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing

2019-10-07 Thread Nikolay Izhikov (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946061#comment-16946061
 ] 

Nikolay Izhikov edited comment on KAFKA-8104 at 10/7/19 5:28 PM:
-

Reproducer simplified and updated.

[1] https://github.com/nizhikov/kafka/pull/1


was (Author: nizhikov):
Reproducer simplified and updated.

> Consumer cannot rejoin to the group after rebalancing
> -
>
> Key: KAFKA-8104
> URL: https://issues.apache.org/jira/browse/KAFKA-8104
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0
>Reporter: Gregory Koshelev
>Assignee: Nikolay Izhikov
>Priority: Critical
> Attachments: consumer-rejoin-fail.log
>
>
> TL;DR: {{KafkaConsumer}} cannot rejoin the group due to an inconsistent 
> {{AbstractCoordinator.generation}} (which is {{NO_GENERATION}}) and 
> {{AbstractCoordinator.joinFuture}} (which is a succeeded {{RequestFuture}}). 
> See explanation below.
> There are 16 consumers in a single process (threads from pool-4-thread-1 to 
> pool-4-thread-16). All of them belong to a single consumer group 
> {{hercules.sink.elastic.legacy_logs_elk_c2}}. Rebalancing has been triggered 
> and the consumers have got {{CommitFailedException}} as expected:
> {noformat}
> 2019-03-10T03:16:37.023Z [pool-4-thread-10] WARN  
> r.k.vostok.hercules.sink.SimpleSink - Commit failed due to rebalancing
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
> completed since the group has already rebalanced and assigned the partitions 
> to another member. This means that the time between subsequent calls to 
> poll() was longer than the configured max.poll.interval.ms, which typically 
> implies that the poll loop is spending too much time message processing. You 
> can address this either by increasing the session timeout or by reducing the 
> maximum size of batches returned in poll() with max.poll.records.
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:798)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:681)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1334)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1298)
>   at ru.kontur.vostok.hercules.sink.Sink.commit(Sink.java:156)
>   at ru.kontur.vostok.hercules.sink.SimpleSink.run(SimpleSink.java:104)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> After that, most of them successfully rejoined to the group with generation 
> 10699:
> {noformat}
> 2019-03-10T03:16:39.208Z [pool-4-thread-13] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-13, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group 
> with generation 10699
> 2019-03-10T03:16:39.209Z [pool-4-thread-13] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-13, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-18]
> ...
> 2019-03-10T03:16:39.216Z [pool-4-thread-11] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-11, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group 
> with generation 10699
> 2019-03-10T03:16:39.217Z [pool-4-thread-11] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-11, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-10, legacy_logs_elk_c2-11]
> ...
> 2019-03-10T03:16:39.218Z [pool-4-thread-15] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-15, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-24]
> 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-6, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed 
> since group is rebalancing
> 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-5, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed 
> since group is rebalancing
> 2019-03-10T03:16:42.323Z [kafka-coordinator-heartbeat-thread | 
> 

[jira] [Commented] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing

2019-10-07 Thread Nikolay Izhikov (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946061#comment-16946061
 ] 

Nikolay Izhikov commented on KAFKA-8104:


Reproducer simplified and updated.

> Consumer cannot rejoin to the group after rebalancing
> -
>
> Key: KAFKA-8104
> URL: https://issues.apache.org/jira/browse/KAFKA-8104
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0
>Reporter: Gregory Koshelev
>Assignee: Nikolay Izhikov
>Priority: Critical
> Attachments: consumer-rejoin-fail.log
>
>
> TL;DR: {{KafkaConsumer}} cannot rejoin the group due to an inconsistent 
> {{AbstractCoordinator.generation}} (which is {{NO_GENERATION}}) and 
> {{AbstractCoordinator.joinFuture}} (which is a succeeded {{RequestFuture}}). 
> See explanation below.
> There are 16 consumers in a single process (threads from pool-4-thread-1 to 
> pool-4-thread-16). All of them belong to a single consumer group 
> {{hercules.sink.elastic.legacy_logs_elk_c2}}. Rebalancing has been triggered 
> and the consumers have got {{CommitFailedException}} as expected:
> {noformat}
> 2019-03-10T03:16:37.023Z [pool-4-thread-10] WARN  
> r.k.vostok.hercules.sink.SimpleSink - Commit failed due to rebalancing
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
> completed since the group has already rebalanced and assigned the partitions 
> to another member. This means that the time between subsequent calls to 
> poll() was longer than the configured max.poll.interval.ms, which typically 
> implies that the poll loop is spending too much time message processing. You 
> can address this either by increasing the session timeout or by reducing the 
> maximum size of batches returned in poll() with max.poll.records.
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:798)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:681)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1334)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1298)
>   at ru.kontur.vostok.hercules.sink.Sink.commit(Sink.java:156)
>   at ru.kontur.vostok.hercules.sink.SimpleSink.run(SimpleSink.java:104)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> After that, most of them successfully rejoined to the group with generation 
> 10699:
> {noformat}
> 2019-03-10T03:16:39.208Z [pool-4-thread-13] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-13, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group 
> with generation 10699
> 2019-03-10T03:16:39.209Z [pool-4-thread-13] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-13, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-18]
> ...
> 2019-03-10T03:16:39.216Z [pool-4-thread-11] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-11, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group 
> with generation 10699
> 2019-03-10T03:16:39.217Z [pool-4-thread-11] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-11, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-10, legacy_logs_elk_c2-11]
> ...
> 2019-03-10T03:16:39.218Z [pool-4-thread-15] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-15, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-24]
> 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-6, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed 
> since group is rebalancing
> 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-5, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed 
> since group is rebalancing
> 2019-03-10T03:16:42.323Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-7, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to 

[jira] [Resolved] (KAFKA-6263) Expose metric for group metadata loading duration

2019-10-07 Thread Manikumar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-6263.
--
Fix Version/s: 2.4.0
   Resolution: Fixed

> Expose metric for group metadata loading duration
> -
>
> Key: KAFKA-6263
> URL: https://issues.apache.org/jira/browse/KAFKA-6263
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Jason Gustafson
>Assignee: Anastasia Vela
>Priority: Major
> Fix For: 2.4.0
>
>
> We have seen several cases where, because the log cleaner either wasn't 
> enabled or had experienced some failure, __consumer_offsets partitions grew 
> excessively. When one of these partitions changes leadership, the new 
> coordinator must load the offset cache from the start of the log, which can 
> take arbitrarily long depending on how large the partition has grown (we have 
> seen cases where it took hours). Catching this problem is not always easy 
> because the condition is rare and the symptom just tends to be a long period 
> of inactivity in the consumer group which gradually gets worse over time. It 
> may therefore be useful to have a broker metric for the load time so that it 
> can be monitored and potentially alerted on. The same goes for the 
> transaction log.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8179) Incremental Rebalance Protocol for Kafka Consumer

2019-10-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946006#comment-16946006
 ] 

ASF GitHub Bot commented on KAFKA-8179:
---

guozhangwang commented on pull request #7386: KAFKA-8179: Part 7, cooperative 
rebalancing in Streams
URL: https://github.com/apache/kafka/pull/7386
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Incremental Rebalance Protocol for Kafka Consumer
> -
>
> Key: KAFKA-8179
> URL: https://issues.apache.org/jira/browse/KAFKA-8179
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>Priority: Major
>
> Recently the Kafka community has been promoting cooperative rebalancing to mitigate the 
> pain points of the stop-the-world rebalancing protocol. This ticket is 
> created to pursue that idea in the Kafka consumer client, which will be 
> beneficial for heavy-stateful consumers such as Kafka Streams applications.
> In short, the scope of this ticket includes reducing unnecessary rebalance 
> latency due to heavy partition migration, i.e. partitions being revoked and 
> re-assigned. This would make the built-in consumer assignors (range, 
> round-robin, etc.) aware of previously assigned partitions and sticky on a 
> best-effort basis.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8985) Use flexibleVersions with LeaderAndIsr, UMR, etc., and improve RequestResponseTest coverage

2019-10-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946000#comment-16946000
 ] 

ASF GitHub Bot commented on KAFKA-8985:
---

hachikuji commented on pull request #7453: KAFKA-8985; Add flexible version 
support to inter-broker APIs
URL: https://github.com/apache/kafka/pull/7453
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Use flexibleVersions with LeaderAndIsr, UMR, etc., and improve 
> RequestResponseTest coverage
> ---
>
> Key: KAFKA-8985
> URL: https://issues.apache.org/jira/browse/KAFKA-8985
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Major
>
> We should use flexibleVersions with LeaderAndIsr, UpdateMetadataRequest, and 
> the other inter-broker APIs. We should also bump ApiVersion and improve 
> the test coverage of RequestResponseTest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-8985) Use flexibleVersions with LeaderAndIsr, UMR, etc., and improve RequestResponseTest coverage

2019-10-07 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-8985.

Fix Version/s: 2.4.0
 Assignee: Jason Gustafson  (was: Colin McCabe)
   Resolution: Fixed

> Use flexibleVersions with LeaderAndIsr, UMR, etc., and improve 
> RequestResponseTest coverage
> ---
>
> Key: KAFKA-8985
> URL: https://issues.apache.org/jira/browse/KAFKA-8985
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Colin McCabe
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 2.4.0
>
>
> We should use flexibleVersions with LeaderAndIsr, UpdateMetadataRequest, and 
> the other inter-broker APIs. We should also bump ApiVersion and improve 
> the test coverage of RequestResponseTest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8262) Flaky Test MetricsIntegrationTest#testStreamMetric

2019-10-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16945965#comment-16945965
 ] 

ASF GitHub Bot commented on KAFKA-8262:
---

cadonna commented on pull request #7457: KAFKA-8262, KAFKA-8263: Fix flaky test 
`MetricsIntegrationTest` (#6922)
URL: https://github.com/apache/kafka/pull/7457
 
 
   - Timeout occurred due to initial slow rebalancing.
   - Added code to wait until the `KafkaStreams` instance is in state RUNNING to 
check registration of metrics, and in state NOT_RUNNING to check deregistration 
of metrics (see the sketch after this list).
   - I removed all other wait conditions, because they are not needed if the 
`KafkaStreams` instance is in the right state.
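   
   A minimal sketch of the state-based wait (timeout value and assertions are illustrative):
   
```
// In the test, given a configured KafkaStreams instance `kafkaStreams`
// (uses org.apache.kafka.test.TestUtils and org.apache.kafka.streams.KafkaStreams):
kafkaStreams.start();
TestUtils.waitForCondition(
    () -> kafkaStreams.state() == KafkaStreams.State.RUNNING,
    30_000L, // illustrative max wait in ms
    "KafkaStreams never reached state RUNNING");
// ... assert that the expected metrics are registered ...

kafkaStreams.close();
TestUtils.waitForCondition(
    () -> kafkaStreams.state() == KafkaStreams.State.NOT_RUNNING,
    30_000L,
    "KafkaStreams never reached state NOT_RUNNING");
// ... assert that the metrics have been deregistered ...
```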
   
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Flaky Test MetricsIntegrationTest#testStreamMetric
> --
>
> Key: KAFKA-8262
> URL: https://issues.apache.org/jira/browse/KAFKA-8262
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Affects Versions: 2.3.0
>Reporter: Matthias J. Sax
>Assignee: Bruno Cadonna
>Priority: Major
>  Labels: flaky-test
> Fix For: 2.4.0
>
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/3900/testReport/junit/org.apache.kafka.streams.integration/MetricsIntegrationTest/testStreamMetric/]
> {quote}java.lang.AssertionError: Condition not met within timeout 1. 
> testTaskMetric -> Size of metrics of type:'commit-latency-avg' must be equal 
> to:5 but it's equal to 0 expected:<5> but was:<0> at 
> org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:361) at 
> org.apache.kafka.streams.integration.MetricsIntegrationTest.testStreamMetric(MetricsIntegrationTest.java:228){quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8989) Embedded broker could not be reached in unit test

2019-10-07 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16945859#comment-16945859
 ] 

Bruno Cadonna commented on KAFKA-8989:
--

[~bchen225242] Could you please add some description or even better some code 
to reproduce this bug? 

> Embedded broker could not be reached in unit test
> -
>
> Key: KAFKA-8989
> URL: https://issues.apache.org/jira/browse/KAFKA-8989
> Project: Kafka
>  Issue Type: Bug
>Reporter: Boyang Chen
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-8934) Introduce Instance-level Metrics

2019-10-07 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna resolved KAFKA-8934.
--
Fix Version/s: 2.4.0
   Resolution: Fixed

> Introduce Instance-level Metrics
> 
>
> Key: KAFKA-8934
> URL: https://issues.apache.org/jira/browse/KAFKA-8934
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Bruno Cadonna
>Assignee: Bruno Cadonna
>Priority: Major
> Fix For: 2.4.0
>
>
> Introduce instance-level metrics as proposed in KIP-444.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8977) Remove MockStreamsMetrics Since it is not a Mock

2019-10-07 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16945729#comment-16945729
 ] 

Bruno Cadonna commented on KAFKA-8977:
--

Thank you for picking this up.
 
IMO we should mock {{StreamsMetricsImpl}} in the tests where we do not directly 
test {{StreamsMetricsImpl}}. However, there is no need to write a mock by hand. I would 
use {{EasyMock}} or {{PowerMock}} to mock {{StreamsMetricsImpl}}: {{EasyMock}} 
if only instance methods need to be mocked, and {{PowerMock}} if 
class methods or final methods need to be mocked. You can find an example of 
both types of mocks for {{StreamsMetricsImpl}} in {{NamedCacheMetricsTest}} 
(see also the sketch below). 
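
For the instance-method case, a minimal sketch with {{EasyMock}} (test and class names are illustrative):

{code:java}
import static org.easymock.EasyMock.createNiceMock;
import static org.easymock.EasyMock.replay;
import static org.easymock.EasyMock.verify;

import org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl;
import org.junit.Test;

public class SomeComponentTest {

    @Test
    public void shouldWorkWithMockedStreamsMetrics() {
        // A nice mock returns defaults for unexpected calls instead of failing,
        // which is usually enough when metrics are incidental to the test.
        final StreamsMetricsImpl streamsMetrics = createNiceMock(StreamsMetricsImpl.class);
        replay(streamsMetrics);

        // ... construct the component under test with streamsMetrics and exercise it ...

        verify(streamsMetrics);
    }
}
{code}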

> Remove MockStreamsMetrics Since it is not a Mock
> 
>
> Key: KAFKA-8977
> URL: https://issues.apache.org/jira/browse/KAFKA-8977
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Bruno Cadonna
>Assignee: bibin sebastian
>Priority: Minor
>  Labels: newbie
>
> The class {{MockStreamsMetrics}} is used throughout unit tests as a mock but 
> it is not really a mock since it only hides two parameters of the 
> {{StreamsMetricsImpl}} constructor. Either a real mock or the real 
> {{StreamsMetricsImpl}} should be used in the tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-7471) Multiple Consumer Group Management (Describe, Reset, Delete)

2019-10-07 Thread Manikumar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-7471.
--
Fix Version/s: 2.4.0
   Resolution: Fixed

> Multiple Consumer Group Management (Describe, Reset, Delete)
> 
>
> Key: KAFKA-7471
> URL: https://issues.apache.org/jira/browse/KAFKA-7471
> Project: Kafka
>  Issue Type: New Feature
>  Components: tools
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Alex Dunayevsky
>Assignee: Alex Dunayevsky
>Priority: Major
> Fix For: 2.4.0
>
>
> Functionality needed:
>  * Describe/Delete/Reset offsets on multiple consumer groups at a time 
> (including each group by repeating the `--group` parameter)
>  * Describe/Delete/Reset offsets on ALL consumer groups at a time (add a new 
> --groups-all option similar to --topics-all; example invocations below)
>  * Generate CSV for multiple consumer groups
> What are the benefits? 
>  * No need to start a new JVM to perform each query on every single consumer 
> group
>  * Ability to query groups by their status (for instance, `-v grepping` by 
> `Stable` to spot problematic/dead/empty groups)
>  * Ability to export offsets to reset for multiple consumer groups to a CSV 
> file (needs CSV generation export/import format rework)
>  
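
For illustration, the multi-group operations listed above might be invoked 
roughly as follows (a sketch: --describe, --state and the repeatable --group 
already exist in kafka-consumer-groups.sh, while --groups-all is only the name 
proposed in this ticket and may differ in the final CLI):

{code:bash}
# Describe several consumer groups in a single JVM invocation (repeated --group)
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group group-a --group group-b --group group-c

# Describe ALL consumer groups (proposed option, analogous to --topics-all)
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --groups-all

# Spot groups that are not Stable by filtering the combined output
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --groups-all --state | grep -v Stable
{code}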



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-7800) Extend Admin API to support dynamic application log levels

2019-10-07 Thread Manikumar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-7800.
--
Fix Version/s: 2.4.0
   Resolution: Fixed

> Extend Admin API to support dynamic application log levels
> --
>
> Key: KAFKA-7800
> URL: https://issues.apache.org/jira/browse/KAFKA-7800
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Stanislav Kozlovski
>Assignee: Stanislav Kozlovski
>Priority: Major
> Fix For: 2.4.0
>
>
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-412%3A+Extend+Admin+API+to+support+dynamic+application+log+levels]
> Logging is a critical part of any system's infrastructure. It is the most 
> direct way of observing what is happening with a system. In the case of 
> issues, it helps us diagnose the problem quickly, which in turn helps lower 
> the 
> [MTTR|http://enterprisedevops.org/article/devops-metric-mean-time-to-recovery-mttr-definition-and-reasoning].
> Kafka supports application logging via the log4j library and outputs messages 
> in various log levels (TRACE, DEBUG, INFO, WARN, ERROR). Log4j is a rich 
> library that supports fine-grained logging configurations (e.g. use INFO-level 
> logging in {{kafka.server.ReplicaManager}} and use DEBUG-level in 
> {{kafka.server.KafkaApis}}).
> This is statically configurable through the 
> [log4j.properties|https://github.com/apache/kafka/blob/trunk/config/log4j.properties]
>  file which gets read once at broker start-up.
> A problem with this static configuration is that we cannot alter the log 
> levels when a problem arises. It is severely impractical to edit a properties 
> file and restart all brokers in order to gain visibility of a problem taking 
> place in production.
> It would be very useful if we support dynamically altering the log levels at 
> runtime without needing to restart the Kafka process.
> Log4j itself supports dynamically altering the log levels in a programmatic 
> way and Kafka exposes a [JMX 
> API|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/utils/Log4jController.scala]
>  that lets you alter them. This allows users to change the log levels via a 
> GUI (jconsole) or a CLI (jmxterm) that uses JMX.
> There is one problem with changing log levels through JMX that we hope to 
> address and that is *Ease of Use*:
>  * Establishing a connection - Connecting to a remote process via JMX 
> requires configuring and exposing multiple JMX ports to the outside world. 
> This is a burden on users, as most production deployments may stand behind 
> layers of firewalls and have policies against opening ports. This makes 
> opening the ports and connections in the middle of an incident even more 
> burdensome
>  * Security - JMX and tools around it support authentication and 
> authorization but it is an additional hassle to set up credentials for 
> another system.
>  * Manual process - Changing the whole cluster's log level requires manually 
> connecting to each broker. In big deployments, this is severely impractical 
> and forces users to build tooling around it.
> h4. Proposition
> Ideally, Kafka would support dynamically changing log levels and address all 
> of the aforementioned concerns out of the box.
> We propose extending the IncrementalAlterConfig/DescribeConfig Admin API with 
> functionality for dynamically altering the broker's log level.
> This approach would also pave the way for even finer-grained logging logic 
> (e.g. log at DEBUG level only for a certain topic) and would allow us to leverage 
> the existing *AlterConfigPolicy* for custom user-defined validation of 
> log-level changes.
> These log-level changes will be *temporary* and reverted on broker restart - 
> we will not persist them anywhere.
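
For illustration, with the proposed extension a client could raise a single 
logger's level roughly like this (a sketch against the AdminClient API, 
assuming the BROKER_LOGGER resource type that KIP-412 introduces; the broker 
id and logger name are examples):

{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class BrokerLogLevelExample {
    public static void main(final String[] args) throws Exception {
        final Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // BROKER_LOGGER addresses the log4j loggers of broker "0".
            final ConfigResource loggers =
                new ConfigResource(ConfigResource.Type.BROKER_LOGGER, "0");

            // Raise kafka.server.ReplicaManager to DEBUG; per the proposal
            // the change is temporary and reverts on broker restart.
            final AlterConfigOp setDebug = new AlterConfigOp(
                new ConfigEntry("kafka.server.ReplicaManager", "DEBUG"),
                AlterConfigOp.OpType.SET);

            admin.incrementalAlterConfigs(
                Collections.singletonMap(loggers, Collections.singletonList(setDebug)))
                .all().get();
        }
    }
}
{code}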



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-8696) Clean up Sum/Count/Total metrics

2019-10-07 Thread Manikumar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-8696.
--
Resolution: Fixed

> Clean up Sum/Count/Total metrics
> 
>
> Key: KAFKA-8696
> URL: https://issues.apache.org/jira/browse/KAFKA-8696
> Project: Kafka
>  Issue Type: Improvement
>Reporter: John Roesler
>Assignee: John Roesler
>Priority: Minor
> Fix For: 2.4.0
>
>
> Kafka has a family of metrics consisting of:
> org.apache.kafka.common.metrics.stats.Count
> org.apache.kafka.common.metrics.stats.Sum
> org.apache.kafka.common.metrics.stats.Total
> org.apache.kafka.common.metrics.stats.Rate.SampledTotal
> org.apache.kafka.streams.processor.internals.metrics.CumulativeCount
> These metrics are all related to each other, but their relationship is 
> obscure (one is redundant, and another is internal).
> I've recently been involved in a third round of trying to work out which 
> metric does what. It seems like it's time to clean up the mess and save 
> everyone from having to work out the mystery for themselves.
> I've proposed https://cwiki.apache.org/confluence/x/kkAyBw to fix it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8943) Move SecurityProviderCreator to a public package

2019-10-07 Thread Rajini Sivaram (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16945698#comment-16945698
 ] 

Rajini Sivaram commented on KAFKA-8943:
---

[~mprsai] Yes, please go ahead. Thanks.

> Move SecurityProviderCreator to a public package
> 
>
> Key: KAFKA-8943
> URL: https://issues.apache.org/jira/browse/KAFKA-8943
> Project: Kafka
>  Issue Type: Bug
>  Components: security
>Reporter: Rajini Sivaram
>Priority: Blocker
> Fix For: 2.4.0
>
>
> The public interface `SecurityProviderCreator` added under KAFKA-8669 
> (KIP-492) is currently in the internal package 
> `org.apache.kafka.common.security` along with other internal classes. Since 
> this is a public interface, we should move it to a public package. We should 
> also add `@InterfaceStability.Evolving` annotation.
>  
> Marked as blocker for 2.4.0 since we should do this before the release.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8555) Flaky test ExampleConnectIntegrationTest#testSourceConnector

2019-10-07 Thread Levani Kokhreidze (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16945690#comment-16945690
 ] 

Levani Kokhreidze commented on KAFKA-8555:
--

[https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/25549/testReport/junit/org.apache.kafka.connect.integration/ExampleConnectIntegrationTest/testSourceConnector/]

> Flaky test ExampleConnectIntegrationTest#testSourceConnector
> 
>
> Key: KAFKA-8555
> URL: https://issues.apache.org/jira/browse/KAFKA-8555
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Boyang Chen
>Priority: Major
> Attachments: log-job139.txt, log-job141.txt, log-job23145.txt, 
> log-job23215.txt, log-job6046.txt
>
>
> [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/22798/console]
> 02:03:21 org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSourceConnector failed, log available in /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk8-scala2.11/connect/runtime/build/reports/testOutput/org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSourceConnector.test.stdout
> 02:03:21 org.apache.kafka.connect.integration.ExampleConnectIntegrationTest > testSourceConnector FAILED
> 02:03:21 org.apache.kafka.connect.errors.DataException: Insufficient records committed by connector simple-conn in 15000 millis. Records expected=2000, actual=1013
> 02:03:21     at org.apache.kafka.connect.integration.ConnectorHandle.awaitCommits(ConnectorHandle.java:188)
> 02:03:21     at org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSourceConnector(ExampleConnectIntegrationTest.java:181)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8686) Flaky test ExampleConnectIntegrationTest#testSinkConnector

2019-10-07 Thread Levani Kokhreidze (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16945691#comment-16945691
 ] 

Levani Kokhreidze commented on KAFKA-8686:
--

[https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/25549/testReport/junit/org.apache.kafka.connect.integration/ExampleConnectIntegrationTest/testSinkConnector/]

> Flaky test ExampleConnectIntegrationTest#testSinkConnector
> --
>
> Key: KAFKA-8686
> URL: https://issues.apache.org/jira/browse/KAFKA-8686
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect, unit tests
>Affects Versions: 2.4.0
>Reporter: Boyang Chen
>Priority: Major
>  Labels: flaky-test
>
> https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/429/console
> org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSinkConnector failed, log available in /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk11-scala2.13/connect/runtime/build/reports/testOutput/org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSinkConnector.test.stdout
> 20:09:20 org.apache.kafka.connect.integration.ExampleConnectIntegrationTest > testSinkConnector FAILED
> 20:09:20 java.lang.AssertionError: Condition not met within timeout 15000. Connector tasks were not assigned a partition each.
> 20:09:20     at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:376)
> 20:09:20     at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:353)
> 20:09:20     at org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSinkConnector(ExampleConnectIntegrationTest.java:128)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-8669) Add java security providers in Kafka Security config

2019-10-07 Thread Manikumar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-8669.
--
Fix Version/s: 2.4.0
 Assignee: Sai Sandeep
   Resolution: Fixed

Fixed in https://github.com/apache/kafka/pull/7090

> Add java security providers in Kafka Security config
> 
>
> Key: KAFKA-8669
> URL: https://issues.apache.org/jira/browse/KAFKA-8669
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Sai Sandeep
>Assignee: Sai Sandeep
>Priority: Minor
> Fix For: 2.4.0
>
>
> Currently Kafka supports the ssl.keymanager.algorithm and 
> ssl.trustmanager.algorithm parameters as part of the security config. These 
> parameters select the key managers and trust managers that provide keys and 
> certificates for SSL handshakes with clients/servers. The algorithms 
> configured by these parameters need to be registered by Java security 
> provider classes. These provider classes are configured as JVM properties 
> through the java.security file. An example file is given below:
> {code:java}
> $ cat /usr/lib/jvm/jdk-8-oracle-x64/jre/lib/security/java.security
> ...
> security.provider.1=sun.security.provider.Sun
> security.provider.2=sun.security.rsa.SunRsaSign
> security.provider.3=sun.security.ec.SunEC
> …
> {code}
> Custom key manager and trust manager algorithms can be used to supply the 
> Kafka brokers with keys and certificates; such algorithms can replace the 
> traditional, non-scalable static keystore and truststore JKS files.
> To take advantage of these custom algorithms, we want to support a Java 
> security provider parameter in the security config. This parameter can be 
> used by Kafka brokers or by Kafka clients (when connecting to the brokers). 
> The security providers can also be used to configure security for SASL-based 
> communication (see the sketch below).
>  
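
For illustration, a provider hook along the lines described in KIP-492 might 
look as follows (a sketch: the interface shape follows the KIP, the class 
below is hypothetical, and the anonymous Provider is a stand-in for a real 
implementation):

{code:java}
import java.security.Provider;
import java.util.Map;
import org.apache.kafka.common.security.auth.SecurityProviderCreator;

// Hypothetical creator that supplies custom key/trust manager algorithms.
public class CustomSecurityProviderCreator implements SecurityProviderCreator {

    @Override
    public void configure(final Map<String, ?> configs) {
        // Pick up any broker/client configs the provider needs.
    }

    @Override
    public Provider getProvider() {
        // A real implementation would register KeyManagerFactory /
        // TrustManagerFactory services under the algorithm names that are
        // then set via ssl.keymanager.algorithm / ssl.trustmanager.algorithm.
        return new Provider("CUSTOM", 1.0, "Example security provider") { };
    }
}
{code}

The creator class would then be listed in the security.providers config on the 
broker or client so that Kafka registers the provider before any SSL/SASL 
setup.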



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8907) Return topic configs in CreateTopics response

2019-10-07 Thread Manikumar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar updated KAFKA-8907:
-
Reviewer: Manikumar  (was: Manikumar Reddy)

> Return topic configs in CreateTopics response 
> --
>
> Key: KAFKA-8907
> URL: https://issues.apache.org/jira/browse/KAFKA-8907
> Project: Kafka
>  Issue Type: New Feature
>  Components: clients
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>Priority: Major
> Fix For: 2.4.0
>
>
> See 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-525+-+Return+topic+metadata+and+configs+in+CreateTopics+response]
>  for details



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-8907) Return topic configs in CreateTopics response

2019-10-07 Thread Manikumar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-8907.
--
  Reviewer: Manikumar Reddy
Resolution: Fixed

> Return topic configs in CreateTopics response 
> --
>
> Key: KAFKA-8907
> URL: https://issues.apache.org/jira/browse/KAFKA-8907
> Project: Kafka
>  Issue Type: New Feature
>  Components: clients
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>Priority: Major
> Fix For: 2.4.0
>
>
> See 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-525+-+Return+topic+metadata+and+configs+in+CreateTopics+response]
>  for details



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-7500) MirrorMaker 2.0 (KIP-382)

2019-10-07 Thread Manikumar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar reassigned KAFKA-7500:


Assignee: Ryanne Dolan  (was: Manikumar)

> MirrorMaker 2.0 (KIP-382)
> -
>
> Key: KAFKA-7500
> URL: https://issues.apache.org/jira/browse/KAFKA-7500
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect, mirrormaker
>Affects Versions: 2.4.0
>Reporter: Ryanne Dolan
>Assignee: Ryanne Dolan
>Priority: Major
>  Labels: pull-request-available, ready-to-commit
> Fix For: 2.4.0
>
> Attachments: Active-Active XDCR setup.png
>
>
> Implement a drop-in replacement for MirrorMaker leveraging the Connect 
> framework.
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0]
> [https://github.com/apache/kafka/pull/6295]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8233) Helper classes to make it simpler to write test logic with TopologyTestDriver

2019-10-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16945660#comment-16945660
 ] 

ASF GitHub Bot commented on KAFKA-8233:
---

mjsax commented on pull request #7378: KAFKA-8233: KIP-470: TopologyTestDriver 
test input and output usability improvements
URL: https://github.com/apache/kafka/pull/7378
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Helper classes to make it simpler to write test logic with TopologyTestDriver
> -
>
> Key: KAFKA-8233
> URL: https://issues.apache.org/jira/browse/KAFKA-8233
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Jukka Karvanen
>Assignee: Jukka Karvanen
>Priority: Minor
>  Labels: kip
>
> When using TopologyTestDriver, you need to call ConsumerRecordFactory to 
> create the ConsumerRecord passed into the pipeInput method to write to a 
> topic. Likewise, when calling readOutput to consume from a topic, you need to 
> provide the correct deserializers each time.
> You easily end up writing helper methods in your test classes, but this can 
> be avoided by adding generic input and output topic classes.
> This improvement adds a TestInputTopic class, which wraps the 
> TopologyTestDriver and ConsumerRecordFactory methods into a single class for 
> writing to input topics, and a TestOutputTopic class, which collects the 
> TopologyTestDriver reading methods and provides type-safe read methods.
> This is the KIP:
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-470%3A+TopologyTestDriver+test+input+and+output+usability+improvements]
>  
> More info and an example of what a Streams test looks like when using these 
> classes (see also the sketch below):
> [https://github.com/jukkakarvanen/kafka-streams-test-topics]
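
For illustration, a test written against the proposed classes might look 
roughly like this (a sketch following the KIP; the topology under test and the 
topic names are hypothetical):

{code:java}
import java.util.Properties;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.TestOutputTopic;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.TopologyTestDriver;

public class UppercaseTopologyTest {  // hypothetical topology under test

    public void shouldUppercaseValues(final Topology topology, final Properties config) {
        final TopologyTestDriver driver = new TopologyTestDriver(topology, config);
        try {
            // No ConsumerRecordFactory needed to write, and no deserializers
            // needed on every read:
            final TestInputTopic<String, String> input = driver.createInputTopic(
                "input-topic", new StringSerializer(), new StringSerializer());
            final TestOutputTopic<String, String> output = driver.createOutputTopic(
                "output-topic", new StringDeserializer(), new StringDeserializer());

            input.pipeInput("key", "value");
            // Type-safe read; no casts or manual deserialization:
            if (!"VALUE".equals(output.readValue())) {
                throw new AssertionError("unexpected output");
            }
        } finally {
            driver.close();
        }
    }
}
{code}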



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8690) Flaky test ConnectWorkerIntegrationTest#testAddAndRemoveWorker

2019-10-07 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16945656#comment-16945656
 ] 

Matthias J. Sax commented on KAFKA-8690:


[https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/2357/testReport/junit/org.apache.kafka.connect.integration/ConnectWorkerIntegrationTest/testAddAndRemoveWorker/]

> Flaky test ConnectWorkerIntegrationTest#testAddAndRemoveWorker
> ---
>
> Key: KAFKA-8690
> URL: https://issues.apache.org/jira/browse/KAFKA-8690
> Project: Kafka
>  Issue Type: Bug
>Reporter: Boyang Chen
>Priority: Major
>
> [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/23570/consoleFull]
> org.apache.kafka.connect.integration.ConnectWorkerIntegrationTest > testAddAndRemoveWorker STARTED
> 02:56:46 org.apache.kafka.connect.integration.ConnectWorkerIntegrationTest.testAddAndRemoveWorker failed, log available in /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk8-scala2.11/connect/runtime/build/reports/testOutput/org.apache.kafka.connect.integration.ConnectWorkerIntegrationTest.testAddAndRemoveWorker.test.stdout
> 02:56:46 org.apache.kafka.connect.integration.ConnectWorkerIntegrationTest > testAddAndRemoveWorker FAILED
> 02:56:46 java.lang.AssertionError: Condition not met within timeout 15000. Connector tasks did not start in time.
> 02:56:46     at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:376)
> 02:56:46     at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:353)
> 02:56:46     at org.apache.kafka.connect.integration.ConnectWorkerIntegrationTest.testAddAndRemoveWorker(ConnectWorkerIntegrationTest.java:118)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8686) Flaky test ExampleConnectIntegrationTest#testSinkConnector

2019-10-07 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16945655#comment-16945655
 ] 

Matthias J. Sax commented on KAFKA-8686:


[https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/2357/testReport/junit/org.apache.kafka.connect.integration/ExampleConnectIntegrationTest/testSinkConnector/]

> Flaky test ExampleConnectIntegrationTest#testSinkConnector
> --
>
> Key: KAFKA-8686
> URL: https://issues.apache.org/jira/browse/KAFKA-8686
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect, unit tests
>Affects Versions: 2.4.0
>Reporter: Boyang Chen
>Priority: Major
>  Labels: flaky-test
>
> https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/429/console
> org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSinkConnector failed, log available in /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk11-scala2.13/connect/runtime/build/reports/testOutput/org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSinkConnector.test.stdout
> 20:09:20 org.apache.kafka.connect.integration.ExampleConnectIntegrationTest > testSinkConnector FAILED
> 20:09:20 java.lang.AssertionError: Condition not met within timeout 15000. Connector tasks were not assigned a partition each.
> 20:09:20     at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:376)
> 20:09:20     at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:353)
> 20:09:20     at org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSinkConnector(ExampleConnectIntegrationTest.java:128)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8555) Flaky test ExampleConnectIntegrationTest#testSourceConnector

2019-10-07 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16945654#comment-16945654
 ] 

Matthias J. Sax commented on KAFKA-8555:


[https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/25558/testReport/org.apache.kafka.connect.integration/ExampleConnectIntegrationTest/testSourceConnector/]

> Flaky test ExampleConnectIntegrationTest#testSourceConnector
> 
>
> Key: KAFKA-8555
> URL: https://issues.apache.org/jira/browse/KAFKA-8555
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Boyang Chen
>Priority: Major
> Attachments: log-job139.txt, log-job141.txt, log-job23145.txt, 
> log-job23215.txt, log-job6046.txt
>
>
> [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/22798/console]
> 02:03:21 org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSourceConnector failed, log available in /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk8-scala2.11/connect/runtime/build/reports/testOutput/org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSourceConnector.test.stdout
> 02:03:21 org.apache.kafka.connect.integration.ExampleConnectIntegrationTest > testSourceConnector FAILED
> 02:03:21 org.apache.kafka.connect.errors.DataException: Insufficient records committed by connector simple-conn in 15000 millis. Records expected=2000, actual=1013
> 02:03:21     at org.apache.kafka.connect.integration.ConnectorHandle.awaitCommits(ConnectorHandle.java:188)
> 02:03:21     at org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSourceConnector(ExampleConnectIntegrationTest.java:181)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8555) Flaky test ExampleConnectIntegrationTest#testSourceConnector

2019-10-07 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16945658#comment-16945658
 ] 

Matthias J. Sax commented on KAFKA-8555:


[https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/8344/consoleFull]

> Flaky test ExampleConnectIntegrationTest#testSourceConnector
> 
>
> Key: KAFKA-8555
> URL: https://issues.apache.org/jira/browse/KAFKA-8555
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Boyang Chen
>Priority: Major
> Attachments: log-job139.txt, log-job141.txt, log-job23145.txt, 
> log-job23215.txt, log-job6046.txt
>
>
> [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/22798/console]
> 02:03:21 org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSourceConnector failed, log available in /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk8-scala2.11/connect/runtime/build/reports/testOutput/org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSourceConnector.test.stdout
> 02:03:21 org.apache.kafka.connect.integration.ExampleConnectIntegrationTest > testSourceConnector FAILED
> 02:03:21 org.apache.kafka.connect.errors.DataException: Insufficient records committed by connector simple-conn in 15000 millis. Records expected=2000, actual=1013
> 02:03:21     at org.apache.kafka.connect.integration.ConnectorHandle.awaitCommits(ConnectorHandle.java:188)
> 02:03:21     at org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSourceConnector(ExampleConnectIntegrationTest.java:181)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)