[jira] [Commented] (KAFKA-9888) REST extensions can mutate connector configs in worker config state snapshot
[ https://issues.apache.org/jira/browse/KAFKA-9888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086310#comment-17086310 ]

ASF GitHub Bot commented on KAFKA-9888:
---------------------------------------

C0urante commented on pull request #8511: KAFKA-9888: Copy connector configs before passing to REST extensions
URL: https://github.com/apache/kafka/pull/8511

[Jira](https://issues.apache.org/jira/browse/KAFKA-9888)

The changes made in [KIP-454](https://cwiki.apache.org/confluence/display/KAFKA/KIP-454%3A+Expansion+of+the+ConnectClusterState+interface) involved adding a `connectorConfig` method to the [ConnectClusterState](https://github.com/apache/kafka/blob/ecde596180975f8546c0e8e10f77f7eee5f1c4d8/connect/api/src/main/java/org/apache/kafka/connect/health/ConnectClusterState.java) interface that REST extensions can use to query the worker for the configuration of a given connector. The implementation of this method returns the Java `Map` stored in the worker's view of the config topic (when running in distributed mode). No copying is performed, so mutations of that `Map` object persist across invocations of `connectorConfig` and, even worse, propagate to the worker when, e.g., starting a connector.

The changes here cause the framework to copy that map before passing it to REST extensions, and expand a comment in `KafkaConfigBackingStore` on the mutability of the snapshots it provides, warning against changes that could lead to bugs like this one. An existing unit test is modified to ensure that REST extensions receive a copy of the connector config, not the original.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> REST extensions can mutate connector configs in worker config state snapshot
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-9888
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9888
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 2.3.0, 2.4.0, 2.3.1, 2.5.0, 2.4.1
>            Reporter: Chris Egerton
>            Assignee: Chris Egerton
>            Priority: Major
>
> The changes made in [KIP-454|https://cwiki.apache.org/confluence/display/KAFKA/KIP-454%3A+Expansion+of+the+ConnectClusterState+interface] involved adding a {{connectorConfig}} method to the [ConnectClusterState|https://github.com/apache/kafka/blob/ecde596180975f8546c0e8e10f77f7eee5f1c4d8/connect/api/src/main/java/org/apache/kafka/connect/health/ConnectClusterState.java] interface that REST extensions could use to query the worker for the configuration of a given connector. The [implementation for this method|https://github.com/apache/kafka/blob/ecde596180975f8546c0e8e10f77f7eee5f1c4d8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/health/ConnectClusterStateImpl.java#L86-L89] returns the Java {{Map}} that's stored in the worker's view of the config topic (when running in distributed mode). No copying is performed, which causes mutations of that {{Map}} object to persist across invocations of {{connectorConfig}} and, even worse, propagate to the worker when, e.g., starting a connector.
> We should not give REST extensions that original map, but instead a copy of it.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
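The fix described above is a defensive copy. As a minimal sketch (with hypothetical class and method names, not the actual Kafka Connect code), the difference between returning the live map and returning a copy looks like this:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the defensive-copy fix; names are illustrative,
// not the real Kafka Connect classes.
class ConfigSnapshot {
    private final Map<String, String> connectorConfig = new HashMap<>();

    ConfigSnapshot() {
        connectorConfig.put("connector.class", "FileStreamSource");
    }

    // Pre-fix behavior: the live map is returned, so a REST extension's
    // mutations leak back into the worker's config state snapshot.
    Map<String, String> connectorConfigUnsafe() {
        return connectorConfig;
    }

    // Post-fix behavior: each caller gets an independent copy.
    Map<String, String> connectorConfigCopy() {
        return new HashMap<>(connectorConfig);
    }
}
```

A REST extension that mutates the result of `connectorConfigCopy()` leaves the worker's snapshot untouched, which is the property the PR's modified unit test asserts.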
[jira] [Created] (KAFKA-9888) REST extensions can mutate connector configs in worker config state snapshot
Chris Egerton created KAFKA-9888:
------------------------------------

             Summary: REST extensions can mutate connector configs in worker config state snapshot
                 Key: KAFKA-9888
                 URL: https://issues.apache.org/jira/browse/KAFKA-9888
             Project: Kafka
          Issue Type: Bug
          Components: KafkaConnect
    Affects Versions: 2.4.1, 2.5.0, 2.3.1, 2.4.0, 2.3.0
            Reporter: Chris Egerton
            Assignee: Chris Egerton

The changes made in [KIP-454|https://cwiki.apache.org/confluence/display/KAFKA/KIP-454%3A+Expansion+of+the+ConnectClusterState+interface] involved adding a {{connectorConfig}} method to the [ConnectClusterState|https://github.com/apache/kafka/blob/ecde596180975f8546c0e8e10f77f7eee5f1c4d8/connect/api/src/main/java/org/apache/kafka/connect/health/ConnectClusterState.java] interface that REST extensions could use to query the worker for the configuration of a given connector. The [implementation for this method|https://github.com/apache/kafka/blob/ecde596180975f8546c0e8e10f77f7eee5f1c4d8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/health/ConnectClusterStateImpl.java#L86-L89] returns the Java {{Map}} that's stored in the worker's view of the config topic (when running in distributed mode). No copying is performed, which causes mutations of that {{Map}} object to persist across invocations of {{connectorConfig}} and, even worse, propagate to the worker when, e.g., starting a connector.

We should not give REST extensions that original map, but instead a copy of it.
[jira] [Commented] (KAFKA-9839) IllegalStateException on metadata update when broker learns about its new epoch after the controller
[ https://issues.apache.org/jira/browse/KAFKA-9839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086227#comment-17086227 ]

ASF GitHub Bot commented on KAFKA-9839:
---------------------------------------

apovzner commented on pull request #8509: KAFKA-9839: Broker should accept control requests with newer broker epoch
URL: https://github.com/apache/kafka/pull/8509

A broker throws IllegalStateException if the broker epoch in a LeaderAndIsrRequest/UpdateMetadataRequest/StopReplicaRequest is larger than its current broker epoch. However, there is no guarantee that the broker receives its latest epoch before the controller does: when the broker registers with ZK, a few more instructions must run before the broker "knows" its epoch, while the controller may already have been notified and sent an UPDATE_METADATA request (for example) with the new epoch. This results in clients getting stale metadata from this broker.

With this PR, a broker accepts a LeaderAndIsrRequest/UpdateMetadataRequest/StopReplicaRequest if the broker epoch is newer than its current epoch.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
> IllegalStateException on metadata update when broker learns about its new epoch after the controller
> ----------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-9839
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9839
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller, core
>    Affects Versions: 2.3.1
>            Reporter: Anna Povzner
>            Assignee: Anna Povzner
>            Priority: Critical
>
> The broker throws "java.lang.IllegalStateException: Epoch XXX larger than current broker epoch YYY" on UPDATE_METADATA when the controller learns about the broker epoch and sends UPDATE_METADATA before KafkaZkClient.registerBroker completes (i.e., before the broker learns about its new epoch).
> Here is the scenario we observed, in more detail:
> 1. ZK session expires on broker 1
> 2. Broker 1 establishes a new session to ZK and creates its znode
> 3. Controller learns about broker 1 and assigns an epoch
> 4. Broker 1 receives UPDATE_METADATA from the controller, but does not know about its new epoch yet, so we get an exception:
> {code}
> ERROR [KafkaApi-3] Error when handling request: clientId=1, correlationId=0, api=UPDATE_METADATA, body={
> .
> java.lang.IllegalStateException: Epoch XXX larger than current broker epoch YYY
>     at kafka.server.KafkaApis.isBrokerEpochStale(KafkaApis.scala:2725)
>     at kafka.server.KafkaApis.handleUpdateMetadataRequest(KafkaApis.scala:320)
>     at kafka.server.KafkaApis.handle(KafkaApis.scala:139)
>     at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> 5. KafkaZkClient.registerBroker completes on broker 1: "INFO Stat of the created znode at /brokers/ids/1"
> The result is that the broker has stale metadata for some time.
> Possible solutions:
> 1. Broker returns a more specific error and the controller retries UPDATE_METADATA
> 2. Broker accepts UPDATE_METADATA with a larger broker epoch.
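The chosen fix (solution 2) reduces to the direction of one comparison. A minimal sketch, with hypothetical method names rather than the actual KafkaApis code: a control request is rejected only when it carries an epoch older than the broker's current one, so a newer epoch that the controller learned first is accepted instead of raising IllegalStateException:

```java
// Hypothetical sketch of the broker-epoch staleness check, before and after
// the fix described above. Not the actual Kafka broker implementation.
class BrokerEpochCheck {
    // Old behavior (simplified): a request epoch *larger* than the current
    // epoch was treated as an illegal state and caused an exception.
    static boolean rejectsNewerEpoch(long currentEpoch, long requestEpoch) {
        return requestEpoch > currentEpoch;
    }

    // New behavior: only a request carrying an *older* epoch is stale;
    // a newer epoch (the controller heard about it first) is accepted.
    static boolean isBrokerEpochStale(long currentEpoch, long requestEpoch) {
        return requestEpoch < currentEpoch;
    }
}
```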
[jira] [Commented] (KAFKA-9839) IllegalStateException on metadata update when broker learns about its new epoch after the controller
[ https://issues.apache.org/jira/browse/KAFKA-9839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086185#comment-17086185 ]

Anna Povzner commented on KAFKA-9839:
-------------------------------------

[~junrao] Thanks, yes, I agree that accepting controller requests (UpdateMetadataRequest, LeaderAndIsrRequest, StopReplicaRequest) with a newer broker epoch is the best approach. I don't see any drawbacks to accepting newer metadata. I am opening a PR with this solution shortly.

> IllegalStateException on metadata update when broker learns about its new epoch after the controller
> ----------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-9839
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9839
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller, core
>    Affects Versions: 2.3.1
>            Reporter: Anna Povzner
>            Assignee: Anna Povzner
>            Priority: Critical
>
> The broker throws "java.lang.IllegalStateException: Epoch XXX larger than current broker epoch YYY" on UPDATE_METADATA when the controller learns about the broker epoch and sends UPDATE_METADATA before KafkaZkClient.registerBroker completes (i.e., before the broker learns about its new epoch).
> Here is the scenario we observed, in more detail:
> 1. ZK session expires on broker 1
> 2. Broker 1 establishes a new session to ZK and creates its znode
> 3. Controller learns about broker 1 and assigns an epoch
> 4. Broker 1 receives UPDATE_METADATA from the controller, but does not know about its new epoch yet, so we get an exception:
> {code}
> ERROR [KafkaApi-3] Error when handling request: clientId=1, correlationId=0, api=UPDATE_METADATA, body={
> .
> java.lang.IllegalStateException: Epoch XXX larger than current broker epoch YYY
>     at kafka.server.KafkaApis.isBrokerEpochStale(KafkaApis.scala:2725)
>     at kafka.server.KafkaApis.handleUpdateMetadataRequest(KafkaApis.scala:320)
>     at kafka.server.KafkaApis.handle(KafkaApis.scala:139)
>     at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> 5. KafkaZkClient.registerBroker completes on broker 1: "INFO Stat of the created znode at /brokers/ids/1"
> The result is that the broker has stale metadata for some time.
> Possible solutions:
> 1. Broker returns a more specific error and the controller retries UPDATE_METADATA
> 2. Broker accepts UPDATE_METADATA with a larger broker epoch.
[jira] [Commented] (KAFKA-7689) Add Commit/List Offsets Operations to AdminClient
[ https://issues.apache.org/jira/browse/KAFKA-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086181#comment-17086181 ]

zhangzhisheng commented on KAFKA-7689:
--------------------------------------

good

> Add Commit/List Offsets Operations to AdminClient
> -------------------------------------------------
>
>                 Key: KAFKA-7689
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7689
>             Project: Kafka
>          Issue Type: Improvement
>          Components: admin
>            Reporter: Mickael Maison
>            Assignee: Mickael Maison
>            Priority: Major
>             Fix For: 2.5.0
>
> Jira for KIP-396:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=97551484
[jira] [Commented] (KAFKA-9224) State store should not see uncommitted transaction result
[ https://issues.apache.org/jira/browse/KAFKA-9224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086127#comment-17086127 ]

Boyang Chen commented on KAFKA-9224:
------------------------------------

This basically goes back to the question of what consistency guarantee we provide for IQ queries. Consider a billing pipeline that doesn't turn on EOS and relies only on IQ: our at-least-once commitment doesn't guarantee ordering at all times, so it's possible to see a non-monotonically increasing cost value over time. These guarantees are protected by EOS, which aims not just for once-only processing but also for a strong ordering guarantee. Of course we could disagree here, but as long as unclean leader election is still possible, I don't think a failed-over task can provide any linearizable guarantee with respect to the previous owner. It would be good for us to clarify the SLA first if we don't think this is expected for non-EOS.

> State store should not see uncommitted transaction result
> ----------------------------------------------------------
>
>                 Key: KAFKA-9224
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9224
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Boyang Chen
>            Assignee: Boyang Chen
>            Priority: Major
>
> Currently under EOS, an uncommitted write can be reflected in the state store before the ongoing transaction finishes. This means an interactive query can see uncommitted data in the state store, which is not ideal for users relying on state stores for strong consistency. Ideally, we should have an option to include the state store commit as part of the ongoing transaction; however, an immediate step towards a better-reasoned system is to "write after transaction commit", which means we always buffer data within the stream cache for EOS until the ongoing transaction is committed.
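The "write after transaction commit" idea from the issue description can be sketched as a two-map buffer. This is a simplified illustration, not the Streams cache implementation: writes land in a pending buffer and become visible to readers (interactive queries) only when the transaction commits:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch (not the Kafka Streams implementation) of buffering writes
// until transaction commit, so queries never observe uncommitted data.
class TransactionalBuffer {
    private final Map<String, String> store = new HashMap<>();   // queryable state
    private final Map<String, String> pending = new HashMap<>(); // uncommitted writes

    // Writes go to the pending buffer, invisible to readers.
    void put(String key, String value) { pending.put(key, value); }

    // Interactive queries only ever see committed state.
    String get(String key) { return store.get(key); }

    // On commit, pending writes become visible atomically (from the
    // perspective of this single-threaded sketch).
    void commit() { store.putAll(pending); pending.clear(); }

    // On abort, uncommitted writes are simply discarded.
    void abort() { pending.clear(); }
}
```

The trade-off, as discussed in the comments, is read latency versus consistency: readers never see values that a later abort would retract, but they also lag the head of the stream until the next commit.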
[jira] [Resolved] (KAFKA-9818) Flaky Test RecordCollectorTest.shouldNotThrowStreamsExceptionOnSubsequentCallIfASendFailsWithContinueExceptionHandler
[ https://issues.apache.org/jira/browse/KAFKA-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias J. Sax resolved KAFKA-9818.
------------------------------------
    Fix Version/s: 2.6.0
       Resolution: Fixed

> Flaky Test RecordCollectorTest.shouldNotThrowStreamsExceptionOnSubsequentCallIfASendFailsWithContinueExceptionHandler
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-9818
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9818
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Sophie Blee-Goldman
>            Assignee: Matthias J. Sax
>            Priority: Major
>              Labels: flaky-test
>             Fix For: 2.6.0
>
> h3. Error Message
> java.lang.AssertionError
> h3. Stacktrace
> {code}
> java.lang.AssertionError
>     at org.junit.Assert.fail(Assert.java:87)
>     at org.junit.Assert.assertTrue(Assert.java:42)
>     at org.junit.Assert.assertTrue(Assert.java:53)
>     at org.apache.kafka.streams.processor.internals.RecordCollectorTest.shouldNotThrowStreamsExceptionOnSubsequentCallIfASendFailsWithContinueExceptionHandler(RecordCollectorTest.java:521)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>     at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>     at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>     at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>     at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>     at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>     at jdk.internal.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
>     at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>     at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
>     at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
>     at com.sun.proxy.$Proxy5.processTestClass(Unknown Source)
>     at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
>     at jdk.internal.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
>     at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
> {code}
[jira] [Commented] (KAFKA-9818) Flaky Test RecordCollectorTest.shouldNotThrowStreamsExceptionOnSubsequentCallIfASendFailsWithContinueExceptionHandler
[ https://issues.apache.org/jira/browse/KAFKA-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086090#comment-17086090 ]

ASF GitHub Bot commented on KAFKA-9818:
---------------------------------------

mjsax commented on pull request #8507: KAFKA-9818: Fix flaky test in RecordCollectorTest
URL: https://github.com/apache/kafka/pull/8507

> Flaky Test RecordCollectorTest.shouldNotThrowStreamsExceptionOnSubsequentCallIfASendFailsWithContinueExceptionHandler
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-9818
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9818
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Sophie Blee-Goldman
>            Assignee: Matthias J. Sax
>            Priority: Major
>              Labels: flaky-test
>
> h3. Error Message
> java.lang.AssertionError
> h3. Stacktrace
> {code}
> java.lang.AssertionError
>     at org.junit.Assert.fail(Assert.java:87)
>     at org.junit.Assert.assertTrue(Assert.java:42)
>     at org.junit.Assert.assertTrue(Assert.java:53)
>     at org.apache.kafka.streams.processor.internals.RecordCollectorTest.shouldNotThrowStreamsExceptionOnSubsequentCallIfASendFailsWithContinueExceptionHandler(RecordCollectorTest.java:521)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>     at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>     at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>     at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>     at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>     at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>     at jdk.internal.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
>     at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>     at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
>     at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
>     at com.sun.proxy.$Proxy5.processTestClass(Unknown Source)
>     at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
> {code}
[jira] [Commented] (KAFKA-9224) State store should not see uncommitted transaction result
[ https://issues.apache.org/jira/browse/KAFKA-9224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086088#comment-17086088 ]

John Roesler commented on KAFKA-9224:
-------------------------------------

Thanks [~guozhang], just for clarity: yes, that's the same situation I was talking about. So far, it seems like the problem as stated is actually orthogonal to EOS. My question was: why solve this with an approach that only applies to active queries under EOS, rather than seeking a general solution to the same problem in all circumstances?

> State store should not see uncommitted transaction result
> ----------------------------------------------------------
>
>                 Key: KAFKA-9224
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9224
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Boyang Chen
>            Assignee: Boyang Chen
>            Priority: Major
>
> Currently under EOS, an uncommitted write can be reflected in the state store before the ongoing transaction finishes. This means an interactive query can see uncommitted data in the state store, which is not ideal for users relying on state stores for strong consistency. Ideally, we should have an option to include the state store commit as part of the ongoing transaction; however, an immediate step towards a better-reasoned system is to "write after transaction commit", which means we always buffer data within the stream cache for EOS until the ongoing transaction is committed.
[jira] [Commented] (KAFKA-6145) Warm up new KS instances before migrating tasks - potentially a two phase rebalance
[ https://issues.apache.org/jira/browse/KAFKA-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086067#comment-17086067 ]

ASF GitHub Bot commented on KAFKA-6145:
---------------------------------------

vvcephei commented on pull request #8475: KAFKA-6145: KIP-441: Add test scenarios to ensure rebalance convergence
URL: https://github.com/apache/kafka/pull/8475

> Warm up new KS instances before migrating tasks - potentially a two phase rebalance
> -----------------------------------------------------------------------------------
>
>                 Key: KAFKA-6145
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6145
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: Antony Stubbs
>            Assignee: Sophie Blee-Goldman
>            Priority: Major
>              Labels: needs-kip
>
> Currently, when expanding the KS cluster, the new node's partitions will be unavailable during the rebalance, which for large states can take a very long time; for small state stores, even more than a few ms can be a deal breaker for microservice use cases.
> One workaround would be to execute the rebalance in two phases:
> 1) start running state store building on the new node
> 2) once the state store is fully populated on the new node, only then rebalance the tasks - there will still be a rebalance pause, but it would be greatly reduced
> Relates to: KAFKA-6144 - Allow state stores to serve stale reads during rebalance
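The two-phase idea above boils down to a placement rule. A hypothetical sketch (illustrative names and thresholds; KIP-441's actual assignor is considerably more involved): a task stays on its current owner while the candidate instance restores state, and migrates only once the candidate's changelog lag falls below an acceptable bound:

```java
// Hypothetical sketch of the warm-up placement rule, not the KIP-441 assignor.
class WarmupAssignor {
    // Phase 1: while the candidate is still restoring (lag too high), the
    // task remains active on its current owner and the candidate warms up.
    // Phase 2: once the candidate has caught up, the task migrates, so the
    // unavailability window shrinks to a single quick hand-off.
    static String chooseOwner(String current, String candidate,
                              long candidateLagRecords, long acceptableLagRecords) {
        return candidateLagRecords <= acceptableLagRecords ? candidate : current;
    }
}
```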
[jira] [Commented] (KAFKA-4090) JVM runs into OOM if (Java) client uses a SSL port without setting the security protocol
[ https://issues.apache.org/jira/browse/KAFKA-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086060#comment-17086060 ]

David Mollitor commented on KAFKA-4090:
---------------------------------------

Hey, any updates or thoughts on moving this forward?

> JVM runs into OOM if (Java) client uses a SSL port without setting the security protocol
> -----------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4090
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4090
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 0.9.0.1, 0.10.0.1, 2.1.0
>            Reporter: Jaikiran Pai
>            Assignee: Alexandre Dupriez
>            Priority: Major
>
> Quoting from the mail thread that was sent to the Kafka mailing list:
> {quote}
> We have been using Kafka 0.9.0.1 (server and Java client libraries). So far we had been using it with plaintext transport, but recently have been considering upgrading to SSL. It mostly works, except that a misconfigured producer (and even consumer) causes a hard-to-relate OutOfMemory exception, putting the JVM in which the client runs into a bad state. We can consistently reproduce that OOM very easily. We decided to check whether this is fixed in 0.10.0.1, so we upgraded one of our test systems to that version (both server and client libraries) but still see the same issue. Here's how it can be easily reproduced:
> 1. Enable the SSL listener on the broker via server.properties, as per the Kafka documentation:
> {code}
> listeners=PLAINTEXT://:9092,SSL://:9093
> ssl.keystore.location=
> ssl.keystore.password=pass
> ssl.key.password=pass
> ssl.truststore.location=
> ssl.truststore.password=pass
> {code}
> 2. Start ZooKeeper and the Kafka server
> 3. Create an "oom-test" topic (which will be used for these tests):
> {code}
> kafka-topics.sh --zookeeper localhost:2181 --create --topic oom-test --partitions 1 --replication-factor 1
> {code}
> 4. Create a simple producer which sends a single message to the topic via the Java (new producer) APIs:
> {code}
> public class OOMTest {
>     public static void main(final String[] args) throws Exception {
>         final Properties kafkaProducerConfigs = new Properties();
>         // NOTE: Intentionally use a SSL port without specifying security.protocol as SSL
>         kafkaProducerConfigs.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9093");
>         kafkaProducerConfigs.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
>         kafkaProducerConfigs.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
>         try (KafkaProducer<String, String> producer = new KafkaProducer<>(kafkaProducerConfigs)) {
>             System.out.println("Created Kafka producer");
>             final String topicName = "oom-test";
>             final String message = "Hello OOM!";
>             // send a message to the topic
>             final Future<RecordMetadata> recordMetadataFuture = producer.send(new ProducerRecord<>(topicName, message));
>             final RecordMetadata sentRecordMetadata = recordMetadataFuture.get();
>             System.out.println("Sent message '" + message + "' to topic '" + topicName + "'");
>         }
>         System.out.println("Tests complete");
>     }
> }
> {code}
> Notice that the server URL uses the SSL endpoint localhost:9093 but doesn't specify any of the other necessary SSL configs, like security.protocol.
> 5. For the sake of easily reproducing this issue, run this class with a max heap size of 256MB (-Xmx256M). Running this code throws the following OutOfMemoryError in one of the Sender threads:
> {code}
> 18:33:25,770 ERROR [KafkaThread] - Uncaught exception in kafka-producer-network-thread | producer-1:
> java.lang.OutOfMemoryError: Java heap space
>     at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>     at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>     at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
>     at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>     at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153)
>     at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134)
>     at org.apache.kafka.common.network.Selector.poll(Selector.java:286)
>     at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256)
>     at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216)
>     at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> Note that I set the heap size to 256MB to easily reproduce it, but this isn't
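The report is truncated here, but the misconfiguration it demonstrates is the missing security.protocol setting. A client connecting to the SSL listener should set it explicitly, along with the truststore configuration; the sketch below shows the corrected producer properties (paths and passwords are placeholders, and the keys are the standard Kafka client config names):

```java
import java.util.Properties;

// Sketch of the client configuration the reproduction above intentionally
// omits: when bootstrapping against an SSL listener (port 9093 here),
// security.protocol and the truststore settings must be set explicitly.
class SslClientConfig {
    static Properties producerConfig() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9093");
        // Match the listener's security protocol, avoiding the OOM scenario:
        props.setProperty("security.protocol", "SSL");
        // Placeholder path/password; real values come from the deployment.
        props.setProperty("ssl.truststore.location", "/path/to/client.truststore.jks");
        props.setProperty("ssl.truststore.password", "pass");
        return props;
    }
}
```

Without the `security.protocol` line, the plaintext client misinterprets the first bytes of the TLS handshake as a Kafka response size, which is what leads to the huge buffer allocation in the stack trace above.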
[jira] [Comment Edited] (KAFKA-9543) Consumer offset reset after new segment rolling
[ https://issues.apache.org/jira/browse/KAFKA-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17085938#comment-17085938 ]

Jason Gustafson edited comment on KAFKA-9543 at 4/17/20, 6:19 PM:
------------------------------------------------------------------

[~junrao] Good suggestion. I opened https://issues.apache.org/jira/browse/KAFKA-9886 to fix this separately. Note that I'll leave the issue here open until we can confirm whether it is fixed by KAFKA-9838.

was (Author: hachikuji):
[~junrao] Good suggestion. I opened https://issues.apache.org/jira/browse/KAFKA-9886. I'll leave this issue open until we can confirm whether it is fixed by KAFKA-9838.

> Consumer offset reset after new segment rolling
> ------------------------------------------------
>
>                 Key: KAFKA-9543
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9543
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Rafał Boniecki
>            Priority: Major
>         Attachments: Untitled.png, image-2020-04-06-17-10-32-636.png
>
> After upgrading from Kafka 2.1.1 to 2.4.0, I'm experiencing unexpected consumer offset resets.
> Consumer:
> {code:java}
> 2020-02-12T11:12:58.402+01:00 hostname 4a2a39a35a02 [2020-02-12T11:12:58,402][INFO][org.apache.kafka.clients.consumer.internals.Fetcher] [Consumer clientId=logstash-1, groupId=logstash] Fetch offset 1632750575 is out of range for partition stats-5, resetting offset
> {code}
> Broker:
> {code:java}
> 2020-02-12 11:12:58:400 CET INFO [data-plane-kafka-request-handler-1][kafka.log.Log] [Log partition=stats-5, dir=/kafka4/data] Rolled new log segment at offset 1632750565 in 2 ms.
> {code}
> All resets are perfectly correlated with rolling new segments at the broker: the segment is rolled first, then, a couple of ms later, the reset occurs on the consumer. Attached is a Grafana graph with consumer lag per partition. All sudden spikes in lag are offset resets due to this bug.
[jira] [Created] (KAFKA-9887) failed-task-count JMX metric not updated if task fails during startup
Chris Egerton created KAFKA-9887: Summary: failed-task-count JMX metric not updated if task fails during startup Key: KAFKA-9887 URL: https://issues.apache.org/jira/browse/KAFKA-9887 Project: Kafka Issue Type: Bug Components: KafkaConnect Affects Versions: 2.4.1, 2.5.0, 2.4.0 Reporter: Chris Egerton If a task fails on startup (specifically, during [this code section|https://github.com/apache/kafka/blob/00a59b392d92b0d6d3a321ef9a53dae4b3a9d030/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Worker.java#L427-L468]), the {{failed-task-count}} JMX metric is not updated to reflect the task failure, even though the status endpoints in the REST API do report the task as failed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9881) Verify whether RocksDBMetrics Get Measurements from RocksDB in a Unit Test instead of an Integration Test
[ https://issues.apache.org/jira/browse/KAFKA-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085956#comment-17085956 ] ASF GitHub Bot commented on KAFKA-9881: --- guozhangwang commented on pull request #8501: KAFKA-9881: Convert integration test to verify measurements from RocksDB to unit test URL: https://github.com/apache/kafka/pull/8501 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Verify whether RocksDBMetrics Get Measurements from RocksDB in a Unit Test > instead of an Intergration Test > -- > > Key: KAFKA-9881 > URL: https://issues.apache.org/jira/browse/KAFKA-9881 > Project: Kafka > Issue Type: Improvement > Components: streams >Affects Versions: 2.5.0 >Reporter: Bruno Cadonna >Priority: Major > > The integration test {{RocksDBMetricsIntegrationTest}} takes pretty long to > complete. The main part of the runtime is spent in the two tests that verify > whether the rocksDB metrics get actual measurements from RocksDB. Those tests > need to wait for the thread that collects the measurements of the RocksDB > metrics to trigger the first recordings of the metrics. These tests do not > need to run as integration tests and thus they shall be converted into unit > tests to save runtime. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-4084) automated leader rebalance causes replication downtime for clusters with too many partitions
[ https://issues.apache.org/jira/browse/KAFKA-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085948#comment-17085948 ] Evan Williams commented on KAFKA-4084: -- [~sql_consulting] Many thanks for that! I've just started trying to implement the patch and wanted to confirm a few things: # Do I *replace* the entire contents of /usr/share/java/kafka (where my Kafka JARs are) with the 96 JARs from the tar file? Or do I just *copy them in addition* to the existing JARs? I tried replacing them all, but Kafka just crashed on start. # Regarding the patched JARs, I see that they have "2.4.1" in the filenames, i.e. kafka-clients-2.4.1-SNAPSHOT.jar, whereas my existing JARs have 5.4.1. Will that affect things when I do the rolling restarts, i.e. mismatched versions? > automated leader rebalance causes replication downtime for clusters with too > many partitions > > > Key: KAFKA-4084 > URL: https://issues.apache.org/jira/browse/KAFKA-4084 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 0.8.2.2, 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1 >Reporter: Tom Crayford >Priority: Major > Labels: reliability > Fix For: 1.1.0 > > > If you enable {{auto.leader.rebalance.enable}} (which is on by default), and > you have a cluster with many partitions, there is a severe amount of > replication downtime following a restart. This causes > `UnderReplicatedPartitions` to fire, and replication is paused. > This is because the current automated leader rebalance mechanism changes > leaders for *all* imbalanced partitions at once, instead of doing it > gradually. This effectively stops all replica fetchers in the cluster > (assuming there are enough imbalanced partitions), and restarts them. This > can take minutes on busy clusters, during which no replication is happening > and user data is at risk. Clients with {{acks=-1}} also see issues at this > time, because replication is effectively stalled. 
> To quote Todd Palino from the mailing list: > bq. There is an admin CLI command to trigger the preferred replica election > manually. There is also a broker configuration “auto.leader.rebalance.enable” > which you can set to have the broker automatically perform the PLE when > needed. DO NOT USE THIS OPTION. There are serious performance issues when > doing so, especially on larger clusters. It needs some development work that > has not been fully identified yet. > This setting is extremely useful for smaller clusters, but with high > partition counts causes the huge issues stated above. > One potential fix could be adding a new configuration for the number of > partitions to do automated leader rebalancing for at once, and *stop* once > that number of leader rebalances are in flight, until they're done. There may > be better mechanisms, and I'd love to hear if anybody has any ideas. -- This message was sent by Atlassian Jira (v8.3.4#803005)
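The fix proposed in the last paragraph, limiting how many leader rebalances are in flight at once, can be sketched as simple batch planning: split the imbalanced partitions into bounded groups and submit one group at a time, waiting for each to finish. This is an illustrative sketch only; the names (`planBatches`, `MAX_IN_FLIGHT`) are hypothetical and not Kafka controller code:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedLeaderElection {
    // Hypothetical cap on concurrent preferred-leader elections.
    static final int MAX_IN_FLIGHT = 2;

    // Split the imbalanced partitions into batches of at most MAX_IN_FLIGHT;
    // each batch would only be submitted once the previous one completes, so
    // replica fetchers are never all stopped at the same time.
    static List<List<String>> planBatches(List<String> imbalanced) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < imbalanced.size(); i += MAX_IN_FLIGHT) {
            int end = Math.min(i + MAX_IN_FLIGHT, imbalanced.size());
            batches.add(new ArrayList<>(imbalanced.subList(i, end)));
        }
        return batches;
    }
}
```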
[jira] [Commented] (KAFKA-9543) Consumer offset reset after new segment rolling
[ https://issues.apache.org/jira/browse/KAFKA-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085938#comment-17085938 ] Jason Gustafson commented on KAFKA-9543: [~junrao] Good suggestion. I opened https://issues.apache.org/jira/browse/KAFKA-9886. I'll leave this issue open until we can confirm whether it is fixed by KAFKA-9838. > Consumer offset reset after new segment rolling > --- > > Key: KAFKA-9543 > URL: https://issues.apache.org/jira/browse/KAFKA-9543 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Rafał Boniecki >Priority: Major > Attachments: Untitled.png, image-2020-04-06-17-10-32-636.png > > > After upgrade from kafka 2.1.1 to 2.4.0, I'm experiencing unexpected consumer > offset resets. > Consumer: > {code:java} > 2020-02-12T11:12:58.402+01:00 hostname 4a2a39a35a02 > [2020-02-12T11:12:58,402][INFO > ][org.apache.kafka.clients.consumer.internals.Fetcher] [Consumer > clientId=logstash-1, groupId=logstash] Fetch offset 1632750575 is out of > range for partition stats-5, resetting offset > {code} > Broker: > {code:java} > 2020-02-12 11:12:58:400 CET INFO > [data-plane-kafka-request-handler-1][kafka.log.Log] [Log partition=stats-5, > dir=/kafka4/data] Rolled new log segment at offset 1632750565 in 2 ms.{code} > All resets are perfectly correlated to rolling new segments at the broker - > segment is rolled first, then, couple of ms later, reset on the consumer > occurs. Attached is grafana graph with consumer lag per partition. All sudden > spikes in lag are offset resets due to this bug. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-9886) Validate segment range before reading in `Log.read`
Jason Gustafson created KAFKA-9886: -- Summary: Validate segment range before reading in `Log.read` Key: KAFKA-9886 URL: https://issues.apache.org/jira/browse/KAFKA-9886 Project: Kafka Issue Type: Improvement Reporter: Jason Gustafson Assignee: Jason Gustafson Log.read uses the following logic to set the upper limit on a segment read. {code} val maxPosition = { // Use the max offset position if it is on this segment; otherwise, the segment size is the limit. if (maxOffsetMetadata.segmentBaseOffset == segment.baseOffset) { maxOffsetMetadata.relativePositionInSegment } else { segment.size } } {code} In the else branch, the expectation is that `maxOffsetMetadata.segmentBaseOffset > segment.baseOffset`. In KAFKA-9838, we found a bug where this assumption failed which led to reads above the high watermark. We should validate the expectation explicitly so that we don't leave the door open for similar bugs in the future. -- This message was sent by Atlassian Jira (v8.3.4#803005)
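The validation proposed in this ticket can be sketched in plain Java (Kafka's `Log` is Scala, and all names below are illustrative): make the else-branch assumption explicit, so a violation fails fast instead of silently reading up to the full segment size.

```java
public class MaxPositionCheck {
    // Computes the upper limit for a segment read, as in Log.read, but with
    // the else-branch assumption asserted explicitly. Parameter names are
    // illustrative, not Kafka's actual signatures.
    static int maxPosition(long maxOffsetSegmentBase, int relativePosition,
                           long segmentBase, int segmentSize) {
        if (maxOffsetSegmentBase == segmentBase) {
            // Max offset lies in this segment: cap the read at its position.
            return relativePosition;
        }
        // Otherwise the max offset must lie in a LATER segment; if it lies in
        // an earlier one, falling back to segment.size would read too far
        // (e.g. above the high watermark, as in KAFKA-9838).
        if (maxOffsetSegmentBase < segmentBase) {
            throw new IllegalStateException(
                "Max offset metadata is in an earlier segment than the one being read");
        }
        return segmentSize;
    }
}
```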
[jira] [Commented] (KAFKA-9543) Consumer offset reset after new segment rolling
[ https://issues.apache.org/jira/browse/KAFKA-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085916#comment-17085916 ] Jun Rao commented on KAFKA-9543: [~hachikuji]: Thanks for the analysis. It does seem this issue could be caused by KAFKA-9838. Also, in Log.read(), we have code like the following. If we get to the _else_ part, the assumption is that maxOffsetMetadata.segmentBaseOffset > segment.baseOffset. Perhaps it's useful to assert that. That may help uncover issues that we may not know yet. {code:java} val maxPosition = { // Use the max offset position if it is on this segment; otherwise, the segment size is the limit. if (maxOffsetMetadata.segmentBaseOffset == segment.baseOffset) { maxOffsetMetadata.relativePositionInSegment } else { segment.size } }{code} > Consumer offset reset after new segment rolling > --- > > Key: KAFKA-9543 > URL: https://issues.apache.org/jira/browse/KAFKA-9543 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Rafał Boniecki >Priority: Major > Attachments: Untitled.png, image-2020-04-06-17-10-32-636.png > > > After upgrade from kafka 2.1.1 to 2.4.0, I'm experiencing unexpected consumer > offset resets. > Consumer: > {code:java} > 2020-02-12T11:12:58.402+01:00 hostname 4a2a39a35a02 > [2020-02-12T11:12:58,402][INFO > ][org.apache.kafka.clients.consumer.internals.Fetcher] [Consumer > clientId=logstash-1, groupId=logstash] Fetch offset 1632750575 is out of > range for partition stats-5, resetting offset > {code} > Broker: > {code:java} > 2020-02-12 11:12:58:400 CET INFO > [data-plane-kafka-request-handler-1][kafka.log.Log] [Log partition=stats-5, > dir=/kafka4/data] Rolled new log segment at offset 1632750565 in 2 ms.{code} > All resets are perfectly correlated to rolling new segments at the broker - > segment is rolled first, then, couple of ms later, reset on the consumer > occurs. Attached is grafana graph with consumer lag per partition. 
All sudden > spikes in lag are offset resets due to this bug. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9818) Flaky Test RecordCollectorTest.shouldNotThrowStreamsExceptionOnSubsequentCallIfASendFailsWithContinueExceptionHandler
[ https://issues.apache.org/jira/browse/KAFKA-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085908#comment-17085908 ] ASF GitHub Bot commented on KAFKA-9818: --- mjsax commented on pull request #8507: KAFKA-9818: Fix flaky test in RecordCollectorTest URL: https://github.com/apache/kafka/pull/8507 Call for review @vvcephei. Follow up of #8488 (own ticket number) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Flaky Test > RecordCollectorTest.shouldNotThrowStreamsExceptionOnSubsequentCallIfASendFailsWithContinueExceptionHandler > - > > Key: KAFKA-9818 > URL: https://issues.apache.org/jira/browse/KAFKA-9818 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Sophie Blee-Goldman >Assignee: Matthias J. Sax >Priority: Major > Labels: flaky-test > > h3. Error Message > java.lang.AssertionError > h3. 
Stacktrace > java.lang.AssertionError at org.junit.Assert.fail(Assert.java:87) at > org.junit.Assert.assertTrue(Assert.java:42) at > org.junit.Assert.assertTrue(Assert.java:53) at > org.apache.kafka.streams.processor.internals.RecordCollectorTest.shouldNotThrowStreamsExceptionOnSubsequentCallIfASendFailsWithContinueExceptionHandler(RecordCollectorTest.java:521) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at > org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at > 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at > org.junit.runners.ParentRunner.run(ParentRunner.java:413) at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) > at > org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62) > at > org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) > at jdk.internal.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) > at > org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33) > at > org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94) > at com.sun.proxy.$Proxy5.processTestClass(Unknown Source) at >
[jira] [Resolved] (KAFKA-9819) Flaky Test StoreChangelogReaderTest#shouldNotThrowOnUnknownRevokedPartition[0]
[ https://issues.apache.org/jira/browse/KAFKA-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias J. Sax resolved KAFKA-9819. Fix Version/s: 2.6.0 Resolution: Fixed > Flaky Test StoreChangelogReaderTest#shouldNotThrowOnUnknownRevokedPartition[0] > -- > > Key: KAFKA-9819 > URL: https://issues.apache.org/jira/browse/KAFKA-9819 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Sophie Blee-Goldman >Assignee: Matthias J. Sax >Priority: Major > Labels: flaky-test > Fix For: 2.6.0 > > > h3. Error Message > java.lang.AssertionError: expected:<[test-reader Changelog partition > unknown-0 could not be found, it could be already cleaned up during the > handlingof task corruption and never restore again]> but was:<[[AdminClient > clientId=adminclient-91] Connection to node -1 (localhost/127.0.0.1:8080) > could not be established. Broker may not be available., test-reader Changelog > partition unknown-0 could not be found, it could be already cleaned up during > the handlingof task corruption and never restore again]> > h3. Stacktrace > java.lang.AssertionError: expected:<[test-reader Changelog partition > unknown-0 could not be found, it could be already cleaned up during the > handlingof task corruption and never restore again]> but was:<[[AdminClient > clientId=adminclient-91] Connection to node -1 (localhost/127.0.0.1:8080) > could not be established. Broker may not be available., test-reader Changelog > partition unknown-0 could not be found, it could be already cleaned up during > the handlingof task corruption and never restore again]> -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9819) Flaky Test StoreChangelogReaderTest#shouldNotThrowOnUnknownRevokedPartition[0]
[ https://issues.apache.org/jira/browse/KAFKA-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085832#comment-17085832 ] ASF GitHub Bot commented on KAFKA-9819: --- mjsax commented on pull request #8488: KAFKA-9819: Fix flaky test in StoreChangelogReaderTest URL: https://github.com/apache/kafka/pull/8488 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Flaky Test StoreChangelogReaderTest#shouldNotThrowOnUnknownRevokedPartition[0] > -- > > Key: KAFKA-9819 > URL: https://issues.apache.org/jira/browse/KAFKA-9819 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Sophie Blee-Goldman >Assignee: Matthias J. Sax >Priority: Major > Labels: flaky-test > > h3. Error Message > java.lang.AssertionError: expected:<[test-reader Changelog partition > unknown-0 could not be found, it could be already cleaned up during the > handlingof task corruption and never restore again]> but was:<[[AdminClient > clientId=adminclient-91] Connection to node -1 (localhost/127.0.0.1:8080) > could not be established. Broker may not be available., test-reader Changelog > partition unknown-0 could not be found, it could be already cleaned up during > the handlingof task corruption and never restore again]> > h3. Stacktrace > java.lang.AssertionError: expected:<[test-reader Changelog partition > unknown-0 could not be found, it could be already cleaned up during the > handlingof task corruption and never restore again]> but was:<[[AdminClient > clientId=adminclient-91] Connection to node -1 (localhost/127.0.0.1:8080) > could not be established. 
Broker may not be available., test-reader Changelog > partition unknown-0 could not be found, it could be already cleaned up during > the handlingof task corruption and never restore again]> -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-7965) Flaky Test ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup
[ https://issues.apache.org/jira/browse/KAFKA-7965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085744#comment-17085744 ] Bill Bejeck commented on KAFKA-7965: [https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/5826/testReport/junit/kafka.api/ConsumerBounceTest/testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup/] > Flaky Test > ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup > > > Key: KAFKA-7965 > URL: https://issues.apache.org/jira/browse/KAFKA-7965 > Project: Kafka > Issue Type: Bug > Components: clients, consumer, unit tests >Affects Versions: 1.1.1, 2.2.0, 2.3.0 >Reporter: Matthias J. Sax >Assignee: David Jacot >Priority: Critical > Labels: flaky-test > Fix For: 2.3.0 > > > To get stable nightly builds for `2.2` release, I create tickets for all > observed test failures. > [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/21/] > {quote}java.lang.AssertionError: Received 0, expected at least 68 at > org.junit.Assert.fail(Assert.java:88) at > org.junit.Assert.assertTrue(Assert.java:41) at > kafka.api.ConsumerBounceTest.receiveAndCommit(ConsumerBounceTest.scala:557) > at > kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$1(ConsumerBounceTest.scala:320) > at > kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$1$adapted(ConsumerBounceTest.scala:319) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at > kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup(ConsumerBounceTest.scala:319){quote} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (KAFKA-9839) IllegalStateException on metadata update when broker learns about its new epoch after the controller
[ https://issues.apache.org/jira/browse/KAFKA-9839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085702#comment-17085702 ] Sigurd Sandve edited comment on KAFKA-9839 at 4/17/20, 12:37 PM: - We experienced the same issue, adding some logs if it could be helpful. After this happened the server got stuck in a loop for an hour where ISR was not in sync, and it wasn't fixed until a hard restart was done on the server. {code:java} Apr 16 10:01:11 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:11,527] WARN Client session timed out, have not heard from server in 6981ms for sessionid 0x5484a370021 (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:11 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:11,892] INFO Client session timed out, have not heard from server in 6981ms for sessionid 0x5484a370021, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:12 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,930] INFO Opening socket connection to server Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,930] INFO Socket connection established to Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,932] WARN Unable to reconnect to ZooKeeper service, session 0x5484a370021 has expired (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,932] INFO Unable to reconnect to ZooKeeper service, session 0x5484a370021 has expired, closing socket connection (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,932] INFO EventThread shut down for session: 0x5484a370021 (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,932] INFO [ZooKeeperClient Kafka server] Session expired. 
(kafka.zookeeper.ZooKeeperClient) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,934] INFO [ZooKeeperClient Kafka server] Initializing a new session to <1><2><3><4><5>. (kafka.zookeepe Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,934] INFO Initiating client connection, connectString=<1><2><3><4><5> sessionTimeout=6000 watcher=kafka Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,934] INFO Creating /brokers/ids/0 (is it secure? false) (kafka.zk.KafkaZkClient) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:13,042] INFO Opening socket connection to server <1> Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:13,042] INFO Socket connection established to <1>, initiating session (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:13,832] INFO Session establishment complete on server <1>, sessionid = 0x1008d93001b, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:13,874] ERROR [KafkaApi-0] Error when handling request: clientId=22, correlationId=0, api=UPDATE_METADATA, body={controller_id=22,controller_epoch=83,broker_epoch=25770069286,topic_states=[{topic Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: =47,replicas=[0],offline_replicas=[]},{partition=54,controller_epoch=83,leader=-1,leader_epoch=47,isr=[0],zk_version=47,replicas=[0],offline_replicas=[]},{partition=55,controller_epoch=83,leader=-1,leader_epoch=47 Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: java.lang.IllegalStateException: Epoch 25770069286 larger than current broker epoch 25770067254 Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: at 
kafka.server.KafkaApis.isBrokerEpochStale(KafkaApis.scala:2612) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: at kafka.server.KafkaApis.handleUpdateMetadataRequest(KafkaApis.scala:242) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: at kafka.server.KafkaApis.handle(KafkaApis.scala:119) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: at java.lang.Thread.run(Thread.java:748) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:13,885] INFO Stat of the created znode at /brokers/ids/0 is: 25770069286,25770069286,1587024073851,1587024073851,1,0,0,72057596413149211,200,0,25770069286 Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: (kafka.zk.KafkaZkClient) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:13,885] INFO Registered
[jira] [Commented] (KAFKA-9839) IllegalStateException on metadata update when broker learns about its new epoch after the controller
[ https://issues.apache.org/jira/browse/KAFKA-9839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085702#comment-17085702 ] Sigurd Sandve commented on KAFKA-9839: -- We experienced the same issue, adding some logs if it could be helpful. After this happened this server got stuck in a loop for an hour where ISR was not in sync, and it wasn't fixed until a hard restart was done on the server. {code:java} Apr 16 10:01:11 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:11,527] WARN Client session timed out, have not heard from server in 6981ms for sessionid 0x5484a370021 (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:11 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:11,892] INFO Client session timed out, have not heard from server in 6981ms for sessionid 0x5484a370021, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:12 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,930] INFO Opening socket connection to server Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,930] INFO Socket connection established to Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,932] WARN Unable to reconnect to ZooKeeper service, session 0x5484a370021 has expired (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,932] INFO Unable to reconnect to ZooKeeper service, session 0x5484a370021 has expired, closing socket connection (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,932] INFO EventThread shut down for session: 0x5484a370021 (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,932] INFO [ZooKeeperClient Kafka server] Session expired. 
(kafka.zookeeper.ZooKeeperClient) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,934] INFO [ZooKeeperClient Kafka server] Initializing a new session to <1><2><3><4><5>. (kafka.zookeepe Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,934] INFO Initiating client connection, connectString=<1><2><3><4><5> sessionTimeout=6000 watcher=kafka Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,934] INFO Creating /brokers/ids/0 (is it secure? false) (kafka.zk.KafkaZkClient) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:13,042] INFO Opening socket connection to server <1> Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:13,042] INFO Socket connection established to <1>, initiating session (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:13,832] INFO Session establishment complete on server <1>, sessionid = 0x1008d93001b, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:13,874] ERROR [KafkaApi-0] Error when handling request: clientId=22, correlationId=0, api=UPDATE_METADATA, body={controller_id=22,controller_epoch=83,broker_epoch=25770069286,topic_states=[{topic Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: =47,replicas=[0],offline_replicas=[]},{partition=54,controller_epoch=83,leader=-1,leader_epoch=47,isr=[0],zk_version=47,replicas=[0],offline_replicas=[]},{partition=55,controller_epoch=83,leader=-1,leader_epoch=47 Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: java.lang.IllegalStateException: Epoch 25770069286 larger than current broker epoch 25770067254 Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: at 
kafka.server.KafkaApis.isBrokerEpochStale(KafkaApis.scala:2612) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: at kafka.server.KafkaApis.handleUpdateMetadataRequest(KafkaApis.scala:242) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: at kafka.server.KafkaApis.handle(KafkaApis.scala:119) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: at java.lang.Thread.run(Thread.java:748) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:13,885] INFO Stat of the created znode at /brokers/ids/0 is: 25770069286,25770069286,1587024073851,1587024073851,1,0,0,72057596413149211,200,0,25770069286 Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: (kafka.zk.KafkaZkClient) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:13,885] INFO Registered broker 0 at path /brokers/ids/0 with addresses:
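The IllegalStateException in the logs above comes from a staleness check that treats a request epoch *newer* than the broker's own epoch as an error, which is exactly what happens when the controller learns the re-registered broker's new epoch before the broker itself does. An illustrative sketch of that check (not the actual `KafkaApis` code):

```java
public class BrokerEpochCheck {
    // Returns true if the request was sent before the broker's current
    // registration (stale). An epoch larger than the broker's own is treated
    // as an invariant violation, matching the exception seen in the logs.
    static boolean isStale(long requestBrokerEpoch, long currentBrokerEpoch) {
        if (requestBrokerEpoch > currentBrokerEpoch) {
            throw new IllegalStateException("Epoch " + requestBrokerEpoch
                + " larger than current broker epoch " + currentBrokerEpoch);
        }
        return requestBrokerEpoch < currentBrokerEpoch;
    }
}
```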
[jira] [Created] (KAFKA-9885) Evict last members of a group when the maximum allowed is reached
David Jacot created KAFKA-9885: -- Summary: Evict last members of a group when the maximum allowed is reached Key: KAFKA-9885 URL: https://issues.apache.org/jira/browse/KAFKA-9885 Project: Kafka Issue Type: Bug Reporter: David Jacot Assignee: David Jacot While analysing https://issues.apache.org/jira/browse/KAFKA-7965, we found that multiple members of a group can be evicted if the leader of the consumer offsets partition changes before the group is persisted. This happens because the current eviction logic always evicts the first members that rejoin the group. We would like to change the eviction logic so that the last members to rejoin the group are kicked out instead. Here is an example of what happens when the leader changes: {noformat} // Group is loaded in GroupCoordinator 0 // A rebalance is triggered because the group is over capacity [2020-04-02 11:14:33,393] INFO [GroupMetadataManager brokerId=0] Scheduling loading of offsets and group metadata from __consumer_offsets-0 (kafka.coordinator.group.GroupMetadataManager:66) [2020-04-02 11:14:33,406] INFO [Consumer clientId=ConsumerTestConsumer, groupId=group-max-size-test] Discovered group coordinator localhost:40071 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:794) [2020-04-02 11:14:33,409] INFO Static member MemberMetadata(memberId=ConsumerTestConsumer-ceaab707-69af-4a65-8275-cb7db7fb66b3, groupInstanceId=Some(null), clientId=ConsumerTestConsumer, clientHost=/127.0.0.1, sessionTimeoutMs=1, rebalanceTimeoutMs=6, supportedProtocols=List(range), ).groupInstanceId of group group-max-size-test loaded with member id ConsumerTestConsumer-ceaab707-69af-4a65-8275-cb7db7fb66b3 at generation 1. 
(kafka.coordinator.group.GroupMetadata$:126)
[2020-04-02 11:14:33,410] INFO Static member MemberMetadata(memberId=ConsumerTestConsumer-07077ca2-30e9-45cd-b363-30672281bacb, groupInstanceId=Some(null), clientId=ConsumerTestConsumer, clientHost=/127.0.0.1, sessionTimeoutMs=1, rebalanceTimeoutMs=6, supportedProtocols=List(range), ).groupInstanceId of group group-max-size-test loaded with member id ConsumerTestConsumer-07077ca2-30e9-45cd-b363-30672281bacb at generation 1. (kafka.coordinator.group.GroupMetadata$:126)
[2020-04-02 11:14:33,412] INFO Static member MemberMetadata(memberId=ConsumerTestConsumer-5d359e65-1f11-43ce-874e-fddf55c0b49d, groupInstanceId=Some(null), clientId=ConsumerTestConsumer, clientHost=/127.0.0.1, sessionTimeoutMs=1, rebalanceTimeoutMs=6, supportedProtocols=List(range), ).groupInstanceId of group group-max-size-test loaded with member id ConsumerTestConsumer-5d359e65-1f11-43ce-874e-fddf55c0b49d at generation 1. (kafka.coordinator.group.GroupMetadata$:126)
[2020-04-02 11:14:33,413] INFO [GroupCoordinator 0]: Loading group metadata for group-max-size-test with generation 1 (kafka.coordinator.group.GroupCoordinator:66)
[2020-04-02 11:14:33,413] INFO [GroupCoordinator 0]: Preparing to rebalance group group-max-size-test in state PreparingRebalance with old generation 1 (__consumer_offsets-0) (reason: Freshly-loaded group is over capacity (GroupConfig(10,180,2,0).groupMaxSize). Rebalacing in order to give a chance for consumers to commit offsets) (kafka.coordinator.group.GroupCoordinator:66)
[2020-04-02 11:14:33,431] INFO [GroupMetadataManager brokerId=0] Finished loading offsets and group metadata from __consumer_offsets-0 in 28 milliseconds, of which 0 milliseconds was spent in the scheduler. (kafka.coordinator.group.GroupMetadataManager:66)
// A first consumer is kicked out of the group while trying to re-join
[2020-04-02 11:14:33,449] ERROR [Consumer clientId=ConsumerTestConsumer, groupId=group-max-size-test] Attempt to join group failed due to fatal error: The consumer group has reached its max size. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:627)
[2020-04-02 11:14:33,451] ERROR [daemon-consumer-assignment-2]: Error due to (kafka.api.AbstractConsumerTest$ConsumerAssignmentPoller:76) org.apache.kafka.common.errors.GroupMaxSizeReachedException: Consumer group group-max-size-test already has the configured maximum number of members.
[2020-04-02 11:14:33,451] INFO [daemon-consumer-assignment-2]: Stopped (kafka.api.AbstractConsumerTest$ConsumerAssignmentPoller:66)
// Before the rebalance is completed, a preferred replica leader election kicks in and moves the leader from 0 to 1
[2020-04-02 11:14:34,155] INFO [Controller id=0] Processing automatic preferred replica leader election (kafka.controller.KafkaController:66)
[2020-04-02 11:14:34,169] INFO [Controller id=0] Starting replica leader election (PREFERRED) for partitions group-max-size-test-0,group-max-size-test-3,__consumer_offsets-0 triggered by AutoTriggered (kafka.controller.KafkaController:66)
// The group
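The proposed change flips which end of the rejoin order gets evicted once a freshly loaded group is over capacity. A minimal, self-contained sketch of the intended selection (the helper name is hypothetical; the real logic lives in the group coordinator):

```java
import java.util.ArrayList;
import java.util.List;

public class Main {
    // Given members in the order they rejoined, keep the first maxSize and
    // evict everyone who rejoined after the group was already full.
    static List<String> selectEvictions(List<String> rejoinOrder, int maxSize) {
        if (rejoinOrder.size() <= maxSize) return List.of();
        return new ArrayList<>(rejoinOrder.subList(maxSize, rejoinOrder.size()));
    }

    public static void main(String[] args) {
        List<String> members = List.of("m1", "m2", "m3", "m4");
        // group.max.size = 2: the first two to rejoin survive, the rest are evicted
        System.out.println(selectEvictions(members, 2)); // [m3, m4]
    }
}
```

Under today's behaviour the equivalent helper would return the *first* members instead, which is what lets a leader change evict more members than necessary.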
[jira] [Commented] (KAFKA-6266) Kafka 1.0.0 : Repeated occurrence of WARN Resetting first dirty offset of __consumer_offsets-xx to log start offset 203569 since the checkpointed offset 120955 is inval
[ https://issues.apache.org/jira/browse/KAFKA-6266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085625#comment-17085625 ] William Reynolds commented on KAFKA-6266: - If anyone runs into this and needs a workaround before picking up 2.4.1/trunk, the following worked for us. It may need some tweaking based on where/how you log, will obviously differ if you aren't using systemd, and the user/dirs will need adjusting for it to be a general workaround {code:bash}
# Collect the partition and valid start offset from the reset warnings in the logs
journalctl -u kafka --since 'yesterday' | grep 'Resetting first dirty offset of' | awk '{print $14 ,$19}' | sort -u > /home/user/checkpoints
cp /kafka-topic-data/cleaner-offset-checkpoint /home/user
# Split the trailing "-<partition>" off the topic name to match the checkpoint file format
cat /home/user/checkpoints | sed -r 's/(.*)-/\1 /g' > /home/user/checkpoints-processed
# Merge the corrected entries with the existing checkpoint, keeping one entry per partition
cat /home/user/checkpoints-processed cleaner-offset-checkpoint | sort --key=1,2 -u > /home/user/cleaner-offset-checkpoint.clean
# Swap the corrected file in while the broker is down
sudo systemctl stop kafka; sudo mv /kafka-topic-data/cleaner-offset-checkpoint /home/user/; sudo mv /home/user/cleaner-offset-checkpoint.clean /kafka-topic-data/cleaner-offset-checkpoint; sudo chown -R kafka:kafka /kafka-topic-data/cleaner-offset-checkpoint; sleep 5; sudo systemctl start kafka
{code} > Kafka 1.0.0 : Repeated occurrence of WARN Resetting first dirty offset of > __consumer_offsets-xx to log start offset 203569 since the checkpointed > offset 120955 is invalid. (kafka.log.LogCleanerManager$) > -- > > Key: KAFKA-6266 > URL: https://issues.apache.org/jira/browse/KAFKA-6266 > Project: Kafka > Issue Type: Bug > Components: offset manager >Affects Versions: 1.0.0, 1.0.1 > Environment: CentOS 7, Apache kafka_2.12-1.0.0 >Reporter: VinayKumar >Assignee: David Mao >Priority: Major > Fix For: 2.5.0, 2.4.1 > > > I upgraded Kafka from 0.10.2.1 to 1.0.0 version. From then, I see the below > warnings in the log. > I'm seeing these continuously in the log, and want these to be fixed- so > that they wont repeat. Can someone please help me in fixing the below > warnings. 
> {code} > WARN Resetting first dirty offset of __consumer_offsets-17 to log start > offset 3346 since the checkpointed offset 3332 is invalid. > (kafka.log.LogCleanerManager$) > WARN Resetting first dirty offset of __consumer_offsets-23 to log start > offset 4 since the checkpointed offset 1 is invalid. > (kafka.log.LogCleanerManager$) > WARN Resetting first dirty offset of __consumer_offsets-19 to log start > offset 203569 since the checkpointed offset 120955 is invalid. > (kafka.log.LogCleanerManager$) > WARN Resetting first dirty offset of __consumer_offsets-35 to log start > offset 16957 since the checkpointed offset 7 is invalid. > (kafka.log.LogCleanerManager$) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9874) broker can not work when use dns fault
[ https://issues.apache.org/jira/browse/KAFKA-9874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085608#comment-17085608 ] Sönke Liebau commented on KAFKA-9874: - Hi [~zchzx] I am afraid I don't fully understand your ticket, could you please elaborate a little bit more? > broker can not work when use dns fault > -- > > Key: KAFKA-9874 > URL: https://issues.apache.org/jira/browse/KAFKA-9874 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 2.3.1, 2.4.1 >Reporter: para >Priority: Critical > Labels: acl, dns > Attachments: kast.log > > > In 2.3.1, SASL authentication blocks when the DNS service is faulty, caused by the Java native function getHostByAddr. > But the hostname is never used; the default name could be used instead. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
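For context, the blocking call the report points at is the JVM's reverse DNS lookup. A small stdlib illustration of the difference between that lookup and the no-DNS fallback the reporter suggests (the helper name is ours, not Kafka's):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class Main {
    // Pure formatting of the raw address -- no DNS round-trip involved, so it
    // cannot hang even when the DNS service is down.
    static String safeIdentity(byte[] rawAddress) {
        try {
            return InetAddress.getByAddress(rawAddress).getHostAddress();
        } catch (UnknownHostException e) {
            return null; // only thrown for an illegal address length
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress peer = InetAddress.getByAddress(new byte[] {127, 0, 0, 1});
        // getCanonicalHostName() triggers a reverse lookup (ultimately the
        // native getHostByAddr) and can block for the full resolver timeout.
        String resolved = peer.getCanonicalHostName();
        System.out.println(resolved + " / " + safeIdentity(new byte[] {127, 0, 0, 1}));
    }
}
```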
[jira] [Commented] (KAFKA-1614) Partition log directory name and segments information exposed via JMX
[ https://issues.apache.org/jira/browse/KAFKA-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085575#comment-17085575 ] Raffaele Saggino commented on KAFKA-1614: - After 5 years, do we need a KIP to get this merged? > Partition log directory name and segments information exposed via JMX > - > > Key: KAFKA-1614 > URL: https://issues.apache.org/jira/browse/KAFKA-1614 > Project: Kafka > Issue Type: Improvement > Components: log >Affects Versions: 0.8.1.1 >Reporter: Alexander Demidko >Assignee: Alexander Demidko >Priority: Major > Attachments: log_segments_dir_jmx_info.patch > > > Makes partition log directory and single segments information exposed via > JMX. This is useful to: > - monitor disks usage in a cluster and on single broker > - calculate disk space taken by different topics > - estimate space to be freed when segments are expired > Patch attached. -- This message was sent by Atlassian Jira (v8.3.4#803005)
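For reference, exposing per-partition log directory information over JMX follows the standard MBean pattern below. The attribute names here are hypothetical placeholders, not the ones from the attached log_segments_dir_jmx_info.patch:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class Main {
    // Hypothetical shape of the information the patch would expose.
    public interface LogDirInfoMBean {
        String getLogDirectory();
        long getTotalSegmentBytes();
    }

    public static class LogDirInfo implements LogDirInfoMBean {
        private final String dir;
        private final long bytes;
        public LogDirInfo(String dir, long bytes) { this.dir = dir; this.bytes = bytes; }
        public String getLogDirectory() { return dir; }
        public long getTotalSegmentBytes() { return bytes; }
    }

    // Register once on the platform MBean server; any JMX client
    // (jconsole, jmxterm, a metrics agent) can then read the attributes.
    static ObjectName register() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("kafka.log:type=LogDirInfo,partition=demo-0");
            if (!server.isRegistered(name)) {
                server.registerMBean(new LogDirInfo("/kafka-logs/demo-0", 1_048_576L), name);
            }
            return name;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static Object readAttribute(String attr) {
        try {
            return ManagementFactory.getPlatformMBeanServer().getAttribute(register(), attr);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAttribute("LogDirectory"));
    }
}
```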
[jira] [Assigned] (KAFKA-9704) z/OS won't let us resize file when mmap
[ https://issues.apache.org/jira/browse/KAFKA-9704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mickael Maison reassigned KAFKA-9704: - Assignee: Shuo Zhang > z/OS won't let us resize file when mmap > --- > > Key: KAFKA-9704 > URL: https://issues.apache.org/jira/browse/KAFKA-9704 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 2.4.0 >Reporter: Shuo Zhang >Assignee: Shuo Zhang >Priority: Major > Fix For: 2.6.0, 2.4.2, 2.5.1 > > > z/OS won't let us resize a file while it is mmapped, so we need to force an > unmap, as on Windows. -- This message was sent by Atlassian Jira (v8.3.4#803005)
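The fix mirrors what Kafka already does on Windows: explicitly release the mapping before resizing, because some platforms refuse to resize a file that is still mapped. A sketch of that unmap-then-resize pattern using the JDK's unsupported `Unsafe.invokeCleaner` (JDK 9+); this is an illustration of the technique, not Kafka's actual utility code:

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.lang.reflect.Field;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class Main {
    // Force-unmap a MappedByteBuffer. Without this, the mapping is released
    // only when the buffer is garbage-collected, and platforms such as
    // Windows and z/OS refuse to resize a file that is still mapped.
    static void unmap(MappedByteBuffer buffer) throws Exception {
        Field f = Class.forName("sun.misc.Unsafe").getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Object unsafe = f.get(null);
        unsafe.getClass()
              .getMethod("invokeCleaner", java.nio.ByteBuffer.class)
              .invoke(unsafe, buffer);
    }

    static long demo() {
        try {
            File file = File.createTempFile("index", ".tmp");
            file.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
                raf.setLength(4096);
                MappedByteBuffer map = raf.getChannel()
                        .map(FileChannel.MapMode.READ_WRITE, 0, 4096);
                map.putInt(0, 42);
                unmap(map);          // release the mapping first...
                raf.setLength(1024); // ...then the resize succeeds everywhere
            }
            return file.length();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("resized to " + demo());
    }
}
```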
[jira] [Comment Edited] (KAFKA-9603) Number of open files keeps increasing in Streams application
[ https://issues.apache.org/jira/browse/KAFKA-9603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085556#comment-17085556 ] Abhishek Kumar edited comment on KAFKA-9603 at 4/17/20, 8:44 AM: - [~lpandzic] Potentially related issue with using Confluent ksql. ksql is built using kafka streams. - [https://github.com/confluentinc/ksql/issues/5057] was (Author: a6kme): [~lpandzic] Something related while using Confluent ksql. ksql is built using kafka streams. - [https://github.com/confluentinc/ksql/issues/5057] > Number of open files keeps increasing in Streams application > > > Key: KAFKA-9603 > URL: https://issues.apache.org/jira/browse/KAFKA-9603 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.4.0, 2.3.1 > Environment: Spring Boot 2.2.4, OpenJDK 13, Centos image >Reporter: Bruno Iljazovic >Priority: Major > > Problem appeared when upgrading from *2.0.1* to *2.3.1*. > Relevant Kafka Streams code: > {code:java} > KStream events1 = > builder.stream(FIRST_TOPIC_NAME, Consumed.with(stringSerde, event1Serde, > event1TimestampExtractor(), null)) >.mapValues(...); > KStream events2 = > builder.stream(SECOND_TOPIC_NAME, Consumed.with(stringSerde, event2Serde, > event2TimestampExtractor(), null)) >.mapValues(...); > var joinWindows = JoinWindows.of(Duration.of(1, MINUTES).toMillis()) > .until(Duration.of(1, HOURS).toMillis()); > events2.join(events1, this::join, joinWindows, Joined.with(stringSerde, > event2Serde, event1Serde)) >.foreach(...); > {code} > Number of open *.sst files keeps increasing until eventually it hits the os > limit (65536) and causes this exception: > {code:java} > Caused by: org.rocksdb.RocksDBException: While open a file for appending: > /.../0_8/KSTREAM-JOINOTHER-10-store/KSTREAM-JOINOTHER-10-store.157943520/001354.sst: > Too many open files > at org.rocksdb.RocksDB.flush(Native Method) > at org.rocksdb.RocksDB.flush(RocksDB.java:2394) > {code} > Here are example files that are opened and never 
closed: > {code:java} > /.../0_27/KSTREAM-JOINTHIS-09-store/KSTREAM-JOINTHIS-09-store.158245920/000114.sst > /.../0_27/KSTREAM-JOINOTHER-10-store/KSTREAM-JOINOTHER-10-store.158245920/65.sst > /.../0_29/KSTREAM-JOINTHIS-09-store/KSTREAM-JOINTHIS-09-store.158215680/000115.sst > /.../0_29/KSTREAM-JOINTHIS-09-store/KSTREAM-JOINTHIS-09-store.158245920/000112.sst > /.../0_31/KSTREAM-JOINTHIS-09-store/KSTREAM-JOINTHIS-09-store.158185440/51.sst > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
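Until the leak itself is addressed, one mitigation sometimes used in Streams apps is to cap how many files RocksDB may hold open per store via the `rocksdb.config.setter` extension point: implement `RocksDBConfigSetter` and call `options.setMaxOpenFiles(...)` in `setConfig`. The class name below is a placeholder you would supply yourself, not a shipped class:

```
# Streams configuration (illustrative)
rocksdb.config.setter=com.example.BoundedOpenFilesConfigSetter
```

This bounds the symptom (open *.sst handles) rather than fixing the join-store growth described above.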
[jira] [Commented] (KAFKA-9543) Consumer offset reset after new segment rolling
[ https://issues.apache.org/jira/browse/KAFKA-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085539#comment-17085539 ] Brian Jones commented on KAFKA-9543: Thanks Jason - that sounds promising. I'll keep an eye out for the 2.4.2 / 2.5.1 / 2.6.0 releases. We only ever hit this in production, which makes testing tricky, but the fact that you have such a solid explanation of why it might've happened and why it only affected 2.4 is very reassuring. > Consumer offset reset after new segment rolling > --- > > Key: KAFKA-9543 > URL: https://issues.apache.org/jira/browse/KAFKA-9543 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Rafał Boniecki >Priority: Major > Attachments: Untitled.png, image-2020-04-06-17-10-32-636.png > > > After upgrade from kafka 2.1.1 to 2.4.0, I'm experiencing unexpected consumer > offset resets. > Consumer: > {code:java} > 2020-02-12T11:12:58.402+01:00 hostname 4a2a39a35a02 > [2020-02-12T11:12:58,402][INFO > ][org.apache.kafka.clients.consumer.internals.Fetcher] [Consumer > clientId=logstash-1, groupId=logstash] Fetch offset 1632750575 is out of > range for partition stats-5, resetting offset > {code} > Broker: > {code:java} > 2020-02-12 11:12:58:400 CET INFO > [data-plane-kafka-request-handler-1][kafka.log.Log] [Log partition=stats-5, > dir=/kafka4/data] Rolled new log segment at offset 1632750565 in 2 ms.{code} > All resets are perfectly correlated to rolling new segments at the broker - > segment is rolled first, then, couple of ms later, reset on the consumer > occurs. Attached is grafana graph with consumer lag per partition. All sudden > spikes in lag are offset resets due to this bug. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-9884) Unable to override some client properties in Mirror maker 2.0
Mithun Kumar created KAFKA-9884: --- Summary: Unable to override some client properties in Mirror maker 2.0 Key: KAFKA-9884 URL: https://issues.apache.org/jira/browse/KAFKA-9884 Project: Kafka Issue Type: Bug Components: mirrormaker Affects Versions: 2.4.1, 2.5.0, 2.4.0 Reporter: Mithun Kumar Attachments: mm2.log I have two 3-node Kafka clusters. MirrorMaker 2.0 is being run as a cluster with bin/connect-mirror-maker.sh mm2.properties I am trying to disable message duplication on replication by enabling idempotence. I understand that EOS is marked as future work in [KIP-382|https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0], however it should be possible by setting enable.idempotence = true and retries > 0. The enable.idempotence = true overrides take effect, however overriding retries fails. I tried all 3 versions that provide MM2: 2.4.0, 2.4.1 and 2.5.0. My mm2.properties config: {noformat}
name = pri_to_bkp
connector.class = org.apache.kafka.connect.mirror.MirrorSourceConnector
topics = test-mm-topic-3
groups = .*
clusters = pri, bkp
source.cluster.alias = pri
target.cluster.alias = bkp
sasl.mechanism = GSSAPI
sasl.kerberos.service.name = kafka
security.protocol = SASL_PLAINTEXT
sasl.jaas.config = com.sun.security.auth.module.Krb5LoginModule required \
  useKeyTab=true \
  keyTab="/etc/security/keytabs/user.keytab" \
  principal="u...@xx.xx.com";
pri.enable.idempotence = true
bkp.enable.idempotence = true
pri.retries = 2147483647
bkp.retries = 2147483647
pri.bootstrap.servers = SASL_PLAINTEXT://kafka1:9092, SASL_PLAINTEXT://kafka2:9092, SASL_PLAINTEXT://kafka3:9092
bkp.bootstrap.servers = SASL_PLAINTEXT://bkp-kafka1:9092, SASL_PLAINTEXT://bkp-kafka2:9092, SASL_PLAINTEXT://bkp-kafka3:9092
pri->bkp.enabled = true
pri->bkp.topics = "test-mm-topic-3"
{noformat} The error leading to failure is: {noformat} [2020-04-17 15:46:26,525] ERROR [Worker clientId=connect-1, groupId=pri-mm2] Uncaught exception in herder work thread, exiting: 
(org.apache.kafka.connect.runtime.distributed.DistributedHerder:297)
org.apache.kafka.common.config.ConfigException: Must set retries to non-zero when using the idempotent producer.
	at org.apache.kafka.clients.producer.ProducerConfig.maybeOverrideAcksAndRetries(ProducerConfig.java:432)
	at org.apache.kafka.clients.producer.ProducerConfig.postProcessParsedConfig(ProducerConfig.java:400)
	at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:110)
	at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:129)
	at org.apache.kafka.clients.producer.ProducerConfig.<init>(ProducerConfig.java:481)
	at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:326)
	at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:270)
	at org.apache.kafka.connect.util.KafkaBasedLog.createProducer(KafkaBasedLog.java:248)
	at org.apache.kafka.connect.util.KafkaBasedLog.start(KafkaBasedLog.java:129)
	at org.apache.kafka.connect.storage.KafkaStatusBackingStore.start(KafkaStatusBackingStore.java:199)
	at org.apache.kafka.connect.runtime.AbstractHerder.startServices(AbstractHerder.java:124)
	at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:284)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
[2020-04-17 15:46:29,089] INFO [Worker clientId=connect-1, groupId=pri-mm2] Herder stopped (org.apache.kafka.connect.runtime.distributed.DistributedHerder:636)
[2020-04-17 15:46:29,089] INFO [Worker clientId=connect-2, groupId=bkp-mm2] Herder stopping (org.apache.kafka.connect.runtime.distributed.DistributedHerder:616)
[2020-04-17 15:46:34,090] INFO [Worker clientId=connect-2, groupId=bkp-mm2] Herder stopped 
(org.apache.kafka.connect.runtime.distributed.DistributedHerder:636) [2020-04-17 15:46:34,090] INFO Kafka MirrorMaker stopped. (org.apache.kafka.connect.mirror.MirrorMaker:191) {noformat} The complete log file is attached. -- This message was sent by Atlassian Jira (v8.3.4#803005)
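The stack trace shows why the worker dies: `ProducerConfig.maybeOverrideAcksAndRetries` rejects an idempotent producer whose effective retries is 0, which is what the internal producer ends up with when the retries override is not applied. A toy reproduction of that validation rule (this is a sketch, not the actual client code):

```java
import java.util.Map;

public class Main {
    // Mirrors the rule enforced during producer config post-processing: an
    // idempotent producer with retries == 0 cannot guarantee de-duplication,
    // so the configuration is rejected up front.
    static void validate(Map<String, String> config) {
        boolean idempotent = Boolean.parseBoolean(
                config.getOrDefault("enable.idempotence", "false"));
        int retries = Integer.parseInt(
                config.getOrDefault("retries", String.valueOf(Integer.MAX_VALUE)));
        if (idempotent && retries == 0) {
            throw new IllegalArgumentException(
                    "Must set retries to non-zero when using the idempotent producer.");
        }
    }

    public static void main(String[] args) {
        validate(Map.of("enable.idempotence", "true", "retries", "2147483647")); // ok
        try {
            validate(Map.of("enable.idempotence", "true", "retries", "0"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```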
[jira] [Commented] (KAFKA-8812) Rebalance Producers
[ https://issues.apache.org/jira/browse/KAFKA-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085506#comment-17085506 ] Werner Daehn commented on KAFKA-8812: - I am still of the opinion this feature would help all and Kafka Connect in particular. Can we get a decision on that? > Rebalance Producers > --- > > Key: KAFKA-8812 > URL: https://issues.apache.org/jira/browse/KAFKA-8812 > Project: Kafka > Issue Type: Improvement > Components: core >Affects Versions: 2.3.0 >Reporter: Werner Daehn >Assignee: Werner Daehn >Priority: Major > Labels: kip > > [KIP-509: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-509%3A+Rebalance+and+restart+Producers|https://cwiki.apache.org/confluence/display/KAFKA/KIP-509%3A+Rebalance+and+restart+Producers] > Please bear with me. Initially this thought sounds stupid but it has its > merits. > > How do you build a distributed producer at the moment? You use Kafka Connect > which in turn requires a cluster that tells which instance is producing what > partitions. > On the consumer side it is different. There Kafka itself does the data > distribution. If you have 10 Kafka partitions and 10 consumers, each will get > data for one partition. With 5 consumers, each will get data from two > partitions. And if there is only a single consumer active, it gets all data. > All is managed by Kafka, all you have to do is start as many consumers as you > want. > > I'd like to suggest something similar for the producers. A producer would > tell Kafka that its source has 10 partitions. The Kafka server then responds > with a list of partitions this instance shall be responsible for. If it is > the only producer, the response would be all 10 partitions. If it is the > second instance starting up, the first instance would get the information it > should produce data for partition 1-5 and the new one for partition 6-10. 
If > the producer fails to respond with an alive packet, a rebalance does happen, > informing the active producer to take more load and the dead producer will > get an error when sending data again. > For restart, the producer rebalance has to send the starting point where to > start producing the data onwards from as well, of course. Would be best if > this is a user generated pointer and not the topic offset. Then it can be > e.g. the database system change number, a database transaction id or > something similar. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
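The consumer-side behaviour the ticket wants to mirror can be sketched as a range-style assignment of source partitions across alive producer instances. This is purely illustrative; no such producer-side mechanism exists in Kafka today:

```java
import java.util.ArrayList;
import java.util.List;

public class Main {
    // Range-style assignment: divide numPartitions among producerCount
    // instances; the first (numPartitions % producerCount) instances get one
    // extra partition, analogous to the consumer-side range assignor.
    static List<List<Integer>> assign(int numPartitions, int producerCount) {
        List<List<Integer>> out = new ArrayList<>();
        int base = numPartitions / producerCount;
        int extra = numPartitions % producerCount;
        int next = 0;
        for (int i = 0; i < producerCount; i++) {
            int size = base + (i < extra ? 1 : 0);
            List<Integer> slice = new ArrayList<>();
            for (int p = 0; p < size; p++) slice.add(next++);
            out.add(slice);
        }
        return out;
    }

    public static void main(String[] args) {
        // 10 source partitions, 2 producers -> "partitions 1-5 and 6-10"
        // in the ticket's terms (0-indexed here)
        System.out.println(assign(10, 2)); // [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
    }
}
```

When an instance misses its liveness deadline, the server would simply re-run the same assignment over the surviving instances, which is the rebalance the ticket describes.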
[jira] [Commented] (KAFKA-9863) update the deprecated --zookeeper option in the documentation into --bootstrap-server
[ https://issues.apache.org/jira/browse/KAFKA-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085487#comment-17085487 ] Luke Chen commented on KAFKA-9863: -- PR: https://github.com/apache/kafka/pull/8482 > update the deprecated --zookeeper option in the documentation into > --bootstrap-server > - > > Key: KAFKA-9863 > URL: https://issues.apache.org/jira/browse/KAFKA-9863 > Project: Kafka > Issue Type: Bug > Components: docs, documentation >Affects Versions: 2.4.1 >Reporter: Luke Chen >Assignee: Luke Chen >Priority: Major > > Since v2.2.0, the --zookeeper option has been deprecated because Kafka can > connect directly to brokers with --bootstrap-server (KIP-377). But in the > official documentation, many example commands still use --zookeeper > instead of --bootstrap-server. Following the commands in the documentation, > you'll get this warning, which is not good. > {code:java} > Warning: --zookeeper is deprecated and will be removed in a future version of > Kafka. > Use --bootstrap-server instead to specify a broker to connect to.{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
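As a concrete example of the documentation change, the topic-listing command moves from the deprecated form to its replacement (host/port values illustrative):

```
# deprecated -- prints the warning above
bin/kafka-topics.sh --zookeeper localhost:2181 --list

# replacement (KIP-377)
bin/kafka-topics.sh --bootstrap-server localhost:9092 --list
```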