[jira] [Commented] (KAFKA-7940) Flaky Test CustomQuotaCallbackTest#testCustomQuotaCallback
[ https://issues.apache.org/jira/browse/KAFKA-7940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794108#comment-16794108 ] Matthias J. Sax commented on KAFKA-7940: Failed again: [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/69/testReport/junit/kafka.api/CustomQuotaCallbackTest/testCustomQuotaCallback/] StackTrace is different: {quote}java.lang.AssertionError: Partition [group1_largeTopic,69] metadata not propagated after 15000 ms at kafka.utils.TestUtils$.fail(TestUtils.scala:381) at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:791) at kafka.utils.TestUtils$.waitUntilMetadataIsPropagated(TestUtils.scala:880) at kafka.utils.TestUtils$.$anonfun$createTopic$6(TestUtils.scala:360) at kafka.utils.TestUtils$.$anonfun$createTopic$6$adapted(TestUtils.scala:359) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) at scala.collection.Iterator.foreach(Iterator.scala:941) at scala.collection.Iterator.foreach$(Iterator.scala:941) at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) at scala.collection.MapLike$DefaultKeySet.foreach(MapLike.scala:181) at scala.collection.TraversableLike.map(TraversableLike.scala:237) at scala.collection.TraversableLike.map$(TraversableLike.scala:230) at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:51) at scala.collection.SetLike.map(SetLike.scala:104) at scala.collection.SetLike.map$(SetLike.scala:104) at scala.collection.AbstractSet.map(Set.scala:51) at kafka.utils.TestUtils$.createTopic(TestUtils.scala:359) at kafka.utils.TestUtils$.createTopic(TestUtils.scala:332) at kafka.api.CustomQuotaCallbackTest.createTopic(CustomQuotaCallbackTest.scala:181) at kafka.api.CustomQuotaCallbackTest.testCustomQuotaCallback(CustomQuotaCallbackTest.scala:136){quote} STDOUT {quote}[2019-03-15 16:44:31,140] WARN SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified 
JAAS configuration file: '/tmp/kafka8953054928214446748.tmp'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it. (org.apache.zookeeper.ClientCnxn:1011) [2019-03-15 16:44:31,140] ERROR [ZooKeeperClient] Auth failed. (kafka.zookeeper.ZooKeeperClient:74) [2019-03-15 16:44:31,545] WARN SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/tmp/kafka8953054928214446748.tmp'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it. (org.apache.zookeeper.ClientCnxn:1011) [2019-03-15 16:44:31,545] ERROR [ZooKeeperClient] Auth failed. (kafka.zookeeper.ZooKeeperClient:74) Completed Updating config for entity: user-principal 'scram-admin'. [2019-03-15 16:44:31,597] WARN SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/tmp/kafka8953054928214446748.tmp'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it. (org.apache.zookeeper.ClientCnxn:1011) [2019-03-15 16:44:31,599] ERROR [ZooKeeperClient] Auth failed. (kafka.zookeeper.ZooKeeperClient:74) [2019-03-15 16:44:31,728] WARN SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/tmp/kafka8953054928214446748.tmp'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it. (org.apache.zookeeper.ClientCnxn:1011) [2019-03-15 16:44:31,728] ERROR [ZooKeeperClient] Auth failed. 
(kafka.zookeeper.ZooKeeperClient:74) [2019-03-15 16:44:32,592] WARN SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/tmp/kafka8953054928214446748.tmp'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it. (org.apache.zookeeper.ClientCnxn:1011) [2019-03-15 16:44:32,604] ERROR [ZooKeeperClient] Auth failed. (kafka.zookeeper.ZooKeeperClient:74) Completed Updating config for entity: user-principal 'group0_user1'. [2019-03-15 16:44:36,625] WARN SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/tmp/kafka8953054928214446748.tmp'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it. (org.apache.zookeeper.ClientCnxn:1011) [2019-03-15 16:44:36,625] ERROR [ZooKeeperClient] Auth failed. (kafka.zookeeper.ZooKeeperClient:74) Completed Updating config for entity: user-principal 'group0_user2'.
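The assertion above comes from kafka.utils.TestUtils.waitUntilTrue giving up after the 15000 ms propagation timeout. That helper lives in Kafka's Scala test utils, but the poll-until-true pattern it implements can be sketched in plain Java (class and method names here are hypothetical, not Kafka's actual code):

```java
import java.util.function.BooleanSupplier;

public class WaitUntil {
    // Poll `condition` every `pollMs` until it returns true or `timeoutMs` elapses.
    // Returns false only if the condition is still false at the deadline.
    public static boolean waitUntilTrue(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true; // condition met before the deadline
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return condition.getAsBoolean();
            }
        }
        // one final check at the deadline, mirroring typical implementations
        return condition.getAsBoolean();
    }
}
```

Flakiness of this kind usually means the awaited condition (here, topic metadata propagating to every broker) occasionally needs longer than the fixed deadline on a loaded CI machine.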
[jira] [Updated] (KAFKA-7855) Kafka Streams Maven Archetype quickstart fails to compile out of the box
[ https://issues.apache.org/jira/browse/KAFKA-7855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias J. Sax updated KAFKA-7855: --- Affects Version/s: (was: 2.0.1) 2.0.0 > Kafka Streams Maven Archetype quickstart fails to compile out of the box > > > Key: KAFKA-7855 > URL: https://issues.apache.org/jira/browse/KAFKA-7855 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.0.0 > Environment: Java 8, OS X 10.13.6 >Reporter: Michael Drogalis >Assignee: Kristian Aurlien >Priority: Major > Labels: newbie++ > Fix For: 2.0.2, 2.3.0, 2.1.2, 2.2.1 > > Attachments: output.log > > > When I follow the [quickstart > tutorial|https://kafka.apache.org/21/documentation/streams/tutorial] and > issue the command to set up a new Maven project, the generated example fails > to compile. Adding a Produced.with() on the source seems to fix this. I've > attached the compiler output. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
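The fix referenced in the description ("Adding a Produced.with()") pins explicit serdes on the topology so the generated quickstart compiles. Against the 2.x Streams API it would look roughly like the sketch below; the topic names follow the quickstart's Pipe example, the surrounding class is illustrative, and this is not claimed to be the exact patch from the linked PR:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class Pipe {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        // Explicit serdes on both ends avoid relying on the default serde config;
        // Produced.with() is the addition the reporter says fixes the build.
        builder.stream("streams-plaintext-input",
                       Consumed.with(Serdes.String(), Serdes.String()))
               .to("streams-pipe-output",
                   Produced.with(Serdes.String(), Serdes.String()));
    }
}
```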
[jira] [Updated] (KAFKA-7855) Kafka Streams Maven Archetype quickstart fails to compile out of the box
[ https://issues.apache.org/jira/browse/KAFKA-7855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias J. Sax updated KAFKA-7855: --- Affects Version/s: (was: 2.1.0) 2.0.1 > Kafka Streams Maven Archetype quickstart fails to compile out of the box > > > Key: KAFKA-7855 > URL: https://issues.apache.org/jira/browse/KAFKA-7855 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.0.1 > Environment: Java 8, OS X 10.13.6 >Reporter: Michael Drogalis >Assignee: Kristian Aurlien >Priority: Major > Labels: newbie++ > Attachments: output.log > > > When I follow the [quickstart > tutorial|https://kafka.apache.org/21/documentation/streams/tutorial] and > issue the command to set up a new Maven project, the generated example fails > to compile. Adding a Produced.with() on the source seems to fix this. I've > attached the compiler output. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7855) Kafka Streams Maven Archetype quickstart fails to compile out of the box
[ https://issues.apache.org/jira/browse/KAFKA-7855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794101#comment-16794101 ] ASF GitHub Bot commented on KAFKA-7855: --- mjsax commented on pull request #6194: KAFKA-7855: Kafka Streams Maven Archetype quickstart fails to compile out of the box URL: https://github.com/apache/kafka/pull/6194 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Kafka Streams Maven Archetype quickstart fails to compile out of the box > > > Key: KAFKA-7855 > URL: https://issues.apache.org/jira/browse/KAFKA-7855 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.1.0 > Environment: Java 8, OS X 10.13.6 >Reporter: Michael Drogalis >Assignee: Kristian Aurlien >Priority: Major > Labels: newbie++ > Attachments: output.log > > > When I follow the [quickstart > tutorial|https://kafka.apache.org/21/documentation/streams/tutorial] and > issue the command to set up a new Maven project, the generated example fails > to compile. Adding a Produced.with() on the source seems to fix this. I've > attached the compiler output. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-8116) Add Kafka Streams archetype for Java11
Matthias J. Sax created KAFKA-8116: -- Summary: Add Kafka Streams archetype for Java11 Key: KAFKA-8116 URL: https://issues.apache.org/jira/browse/KAFKA-8116 Project: Kafka Issue Type: Bug Components: streams Reporter: Matthias J. Sax In https://issues.apache.org/jira/browse/KAFKA-5727 we added an archetype for Kafka Streams. However, this archetype only works for Java8 but not for Java11. Thus, we should add a new archetype project for Java11. This ticket requires a KIP: [https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals]
[jira] [Comment Edited] (KAFKA-8027) Gradual decline in performance of CachingWindowStore provider when number of keys grow
[ https://issues.apache.org/jira/browse/KAFKA-8027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793994#comment-16793994 ] Sophie Blee-Goldman edited comment on KAFKA-8027 at 3/15/19 10:28 PM: -- Hi [~prashantideal], I have been looking into this and have two PRs aimed at improving performance of segmented stores with caching enabled. Would you be able to test either or both of them out, and let me know if they improve things at all? You can find the first PR [here|https://github.com/apache/kafka/pull/6433] and the second one [here|https://github.com/apache/kafka/pull/6448]. Keep in mind these are just improvements to the caching layer and are unlikely to result in overall better fetching performance than withCachingDisabled, since as you point out for range queries we must search the underlying RocksDBStore anyway. If you don't need caching for other reasons (eg reducing downstream traffic or writes to RocksDB) and can afford to turn it off, I recommend doing so. was (Author: ableegoldman): Hi [~prashantideal], I have been looking into this and have two PRs aimed at improving performance of segmented stores with caching enabled. Would you be able to test either or both of them out, and let me know if they improve things at all? You can find the first PR [here|https://github.com/apache/kafka/pull/6433] and the second one [here|https://github.com/apache/kafka/pull/6448]. Keep in mind these are just improvements to the caching layer and are unlikely to result in overall better performance than withCachingDisabled, since as you point out for range queries we must search the underlying RocksDBStore anyway. If you don't need caching for other reasons (eg reducing downstream traffic) and can afford to turn it off, I recommend doing so.
> Gradual decline in performance of CachingWindowStore provider when number of > keys grow > -- > > Key: KAFKA-8027 > URL: https://issues.apache.org/jira/browse/KAFKA-8027 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.1.0 >Reporter: Prashant >Priority: Major > Labels: interactivequ, kafka-streams > > We observed this during a performance test of our stream application which > tracks user's activity and provides REST interface to query the window state > store. We used default configuration of Materialized i.e. withCachingEnabled > for storing user behaviour stats in a window state store > (CompositeWindowStore with CachingWindowStore as underlying which internally > uses RocksDBStore for persistence). > While querying window store with store.fetch(key, long, long), it internally > tries to fetch the range from ThreadCache which uses a byte iterator to > search for a key in cache and on a cache miss it goes to RocksDBStore for > result. So, when the number of keys in the cache becomes large this ThreadCache > search starts taking time (range Iterator on all keys) which impacts > WindowStore query performance. > > Workaround: If we disable cache with switch on Materialized instance i.e. > withCachingDisabled, key search is delegated directly to RocksDBStore which > is way faster and completes the search in microseconds versus milliseconds in case of > CachingWindowStore. > > Stats: With Unique users > 0.5M, random search for a key i.e. UserId: > > withCachingEnabled : 40 < t < 80ms (upper bound increases as unique users > grow) > withCachingDisabled: t < 1ms (Almost constant time)
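The withCachingDisabled workaround described in the report corresponds to roughly the following 2.1-era Streams DSL usage (store name, serdes, and window size are illustrative, not taken from the reporter's application):

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.kstream.*;
import org.apache.kafka.streams.state.WindowStore;

class CachingDisabledExample {
    // Count user activity into a windowed store with the cache turned off, so that
    // fetch(key, from, to) bypasses the ThreadCache and goes straight to RocksDB.
    static KTable<Windowed<String>, Long> windowedCounts(KStream<String, String> activity) {
        return activity
            .groupByKey()
            .windowedBy(TimeWindows.of(Duration.ofMinutes(5)))
            .count(Materialized.<String, Long, WindowStore<Bytes, byte[]>>as("user-activity-store")
                    .withKeySerde(Serdes.String())
                    .withValueSerde(Serdes.Long())
                    .withCachingDisabled());
    }
}
```

With the cache disabled, window fetches are served by RocksDB directly instead of first range-scanning the ThreadCache, at the cost of more downstream traffic and more writes to RocksDB.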
[jira] [Created] (KAFKA-8115) Flaky Test CoordinatorTest#testTaskRequestWithOldStartMsGetsUpdated
Matthias J. Sax created KAFKA-8115: -- Summary: Flaky Test CoordinatorTest#testTaskRequestWithOldStartMsGetsUpdated Key: KAFKA-8115 URL: https://issues.apache.org/jira/browse/KAFKA-8115 Project: Kafka Issue Type: Bug Components: core, unit tests Affects Versions: 2.3.0 Reporter: Matthias J. Sax Fix For: 2.3.0 [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/3254/testReport/junit/org.apache.kafka.trogdor.coordinator/CoordinatorTest/testTaskRequestWithOldStartMsGetsUpdated/] {quote}org.junit.runners.model.TestTimedOutException: test timed out after 12 milliseconds at java.base@11.0.1/jdk.internal.misc.Unsafe.park(Native Method) at java.base@11.0.1/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234) at java.base@11.0.1/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2123) at java.base@11.0.1/java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1454) at java.base@11.0.1/java.util.concurrent.Executors$DelegatedExecutorService.awaitTermination(Executors.java:709) at app//org.apache.kafka.trogdor.rest.JsonRestServer.waitForShutdown(JsonRestServer.java:157) at app//org.apache.kafka.trogdor.agent.Agent.waitForShutdown(Agent.java:123) at app//org.apache.kafka.trogdor.common.MiniTrogdorCluster.close(MiniTrogdorCluster.java:285) at app//org.apache.kafka.trogdor.coordinator.CoordinatorTest.testTaskRequestWithOldStartMsGetsUpdated(CoordinatorTest.java:596) at java.base@11.0.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base@11.0.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base@11.0.1/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base@11.0.1/java.lang.reflect.Method.invoke(Method.java:566) at app//org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at 
app//org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at app//org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at app//org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at app//org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288) at app//org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282) at java.base@11.0.1/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base@11.0.1/java.lang.Thread.run(Thread.java:834){quote} STDOUT {quote}[2019-03-15 09:23:41,364] INFO Creating MiniTrogdorCluster with agents: node02 and coordinator: node01 (org.apache.kafka.trogdor.common.MiniTrogdorCluster:135) [2019-03-15 09:23:41,595] INFO Logging initialized @13340ms to org.eclipse.jetty.util.log.Slf4jLog (org.eclipse.jetty.util.log:193) [2019-03-15 09:23:41,752] INFO Starting REST server (org.apache.kafka.trogdor.rest.JsonRestServer:89) [2019-03-15 09:23:41,912] INFO Registered resource org.apache.kafka.trogdor.agent.AgentRestResource@3fa38ceb (org.apache.kafka.trogdor.rest.JsonRestServer:94) [2019-03-15 09:23:42,178] INFO jetty-9.4.14.v20181114; built: 2018-11-14T21:20:31.478Z; git: c4550056e785fb5665914545889f21dc136ad9e6; jvm 11.0.1+13-LTS (org.eclipse.jetty.server.Server:370) [2019-03-15 09:23:42,360] INFO DefaultSessionIdManager workerName=node0 (org.eclipse.jetty.server.session:365) [2019-03-15 09:23:42,362] INFO No SessionScavenger set, using defaults (org.eclipse.jetty.server.session:370) [2019-03-15 09:23:42,370] INFO node0 Scavenging every 66ms (org.eclipse.jetty.server.session:149) [2019-03-15 09:23:44,412] INFO Started o.e.j.s.ServletContextHandler@335a5293\{/,null,AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler:855) [2019-03-15 09:23:44,473] INFO Started ServerConnector@79a93bf1\{HTTP/1.1,[http/1.1]}{0.0.0.0:33477} 
(org.eclipse.jetty.server.AbstractConnector:292) [2019-03-15 09:23:44,474] INFO Started @16219ms (org.eclipse.jetty.server.Server:407) [2019-03-15 09:23:44,475] INFO REST server listening at [http://127.0.1.1:33477/] (org.apache.kafka.trogdor.rest.JsonRestServer:123) [2019-03-15 09:23:44,484] INFO Starting REST server (org.apache.kafka.trogdor.rest.JsonRestServer:89) [2019-03-15 09:23:44,485] INFO Registered resource org.apache.kafka.trogdor.coordinator.CoordinatorRestResource@2e06ee92 (org.apache.kafka.trogdor.rest.JsonRestServer:94) [2019-03-15 09:23:44,486] INFO jetty-9.4.14.v20181114; built: 2018-11-14T21:20:31.478Z; git: c4550056e785fb5665914545889f21dc136ad9e6; jvm 11.0.1+13-LTS (org.eclipse.jetty.server.Server:370) [2019-03-15 09:23:44,536] INFO DefaultSessionIdManager workerName=node0 (org.eclipse.jetty.server.session:365) [2019-03-15
[jira] [Created] (KAFKA-8114) Flaky Test DelegationTokenEndToEndAuthorizationTest#testNoGroupAcl
Matthias J. Sax created KAFKA-8114: -- Summary: Flaky Test DelegationTokenEndToEndAuthorizationTest#testNoGroupAcl Key: KAFKA-8114 URL: https://issues.apache.org/jira/browse/KAFKA-8114 Project: Kafka Issue Type: Bug Components: core, unit tests Affects Versions: 2.3.0 Reporter: Matthias J. Sax Fix For: 2.3.0 [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/3254/testReport/junit/kafka.api/DelegationTokenEndToEndAuthorizationTest/testNoGroupAcl/] {quote}java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.SaslAuthenticationException: Authentication failed during authentication due to invalid credentials with SASL mechanism SCRAM-SHA-256 at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45) at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32) at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89) at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260) at kafka.api.DelegationTokenEndToEndAuthorizationTest.createDelegationToken(DelegationTokenEndToEndAuthorizationTest.scala:88) at kafka.api.DelegationTokenEndToEndAuthorizationTest.configureSecurityAfterServersStart(DelegationTokenEndToEndAuthorizationTest.scala:63) at kafka.integration.KafkaServerTestHarness.setUp(KafkaServerTestHarness.scala:107) at kafka.api.IntegrationTestHarness.doSetup(IntegrationTestHarness.scala:81) at kafka.api.IntegrationTestHarness.setUp(IntegrationTestHarness.scala:73) at kafka.api.EndToEndAuthorizationTest.setUp(EndToEndAuthorizationTest.scala:183) at kafka.api.DelegationTokenEndToEndAuthorizationTest.setUp(DelegationTokenEndToEndAuthorizationTest.scala:74){quote} STDOUT {quote}Adding ACLs for resource `Cluster:LITERAL:kafka-cluster`: User:scram-admin has Allow permission for operations: ClusterAction from hosts: * Current ACLs for resource `Cluster:LITERAL:kafka-cluster`: User:scram-admin has Allow permission for operations: 
ClusterAction from hosts: * Adding ACLs for resource `Topic:LITERAL:*`: User:scram-admin has Allow permission for operations: Read from hosts: * Current ACLs for resource `Topic:LITERAL:*`: User:scram-admin has Allow permission for operations: Read from hosts: * Completed Updating config for entity: user-principal 'scram-admin'. Completed Updating config for entity: user-principal 'scram-user'. Adding ACLs for resource `Topic:LITERAL:e2etopic`: User:scram-user has Allow permission for operations: Write from hosts: * User:scram-user has Allow permission for operations: Create from hosts: * User:scram-user has Allow permission for operations: Describe from hosts: * Current ACLs for resource `Topic:LITERAL:e2etopic`: User:scram-user has Allow permission for operations: Write from hosts: * User:scram-user has Allow permission for operations: Create from hosts: * User:scram-user has Allow permission for operations: Describe from hosts: * Adding ACLs for resource `Group:LITERAL:group`: User:scram-user has Allow permission for operations: Read from hosts: * Current ACLs for resource `Group:LITERAL:group`: User:scram-user has Allow permission for operations: Read from hosts: * Current ACLs for resource `Topic:LITERAL:e2etopic`: User:scram-user has Allow permission for operations: Write from hosts: * User:scram-user has Allow permission for operations: Create from hosts: * Current ACLs for resource `Topic:LITERAL:e2etopic`: User:scram-user has Allow permission for operations: Create from hosts: * [2019-03-15 09:58:16,481] ERROR [Consumer clientId=consumer-99, groupId=group] Topic authorization failed for topics [e2etopic] (org.apache.kafka.clients.Metadata:297) [2019-03-15 09:58:17,527] WARN Unable to read additional data from client sessionid 0x104549c2b88000a, likely client has closed socket (org.apache.zookeeper.server.NIOServerCnxn:376) Adding ACLs for resource `Cluster:LITERAL:kafka-cluster`: User:scram-admin has Allow permission for operations: ClusterAction from 
hosts: * Current ACLs for resource `Cluster:LITERAL:kafka-cluster`: User:scram-admin has Allow permission for operations: ClusterAction from hosts: * Adding ACLs for resource `Topic:LITERAL:*`: User:scram-admin has Allow permission for operations: Read from hosts: * Current ACLs for resource `Topic:LITERAL:*`: User:scram-admin has Allow permission for operations: Read from hosts: * Completed Updating config for entity: user-principal 'scram-admin'. Completed Updating config for entity: user-principal 'scram-user'. Adding ACLs for resource `Topic:PREFIXED:e2e`: User:scram-user has Allow permission for operations: Read from hosts: * User:scram-user has Allow permission for operations: Describe from hosts: * User:scram-user has Allow permission for operations: Write from hosts: *
[jira] [Commented] (KAFKA-8027) Gradual decline in performance of CachingWindowStore provider when number of keys grow
[ https://issues.apache.org/jira/browse/KAFKA-8027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793994#comment-16793994 ] Sophie Blee-Goldman commented on KAFKA-8027: Hi [~prashantideal], I have been looking into this and have two PRs aimed at improving performance of segmented stores with caching enabled. Would you be able to test either or both of them out, and let me know if they improve things at all? You can find the first PR [here|https://github.com/apache/kafka/pull/6433] and the second one [here|https://github.com/apache/kafka/pull/6448]. Keep in mind these are just improvements to the caching layer and are unlikely to result in overall better performance than withCachingDisabled, since as you point out for range queries we must search the underlying RocksDBStore anyway. If you don't need caching for other reasons (eg reducing downstream traffic) and can afford to turn it off, I recommend doing so. > Gradual decline in performance of CachingWindowStore provider when number of > keys grow > -- > > Key: KAFKA-8027 > URL: https://issues.apache.org/jira/browse/KAFKA-8027 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.1.0 >Reporter: Prashant >Priority: Major > Labels: interactivequ, kafka-streams > > We observed this during a performance test of our stream application which > tracks user's activity and provides REST interface to query the window state > store. We used default configuration of Materialized i.e. withCachingEnabled > for storing user behaviour stats in a window state store > (CompositeWindowStore with CachingWindowStore as underlying which internally > uses RocksDBStore for persistence). > While querying window store with store.fetch(key, long, long), it internally > tries to fetch the range from ThreadCache which uses a byte iterator to > search for a key in cache and on a cache miss it goes to RocksDBStore for > result.
So, when the number of keys in the cache becomes large this ThreadCache > search starts taking time (range Iterator on all keys) which impacts > WindowStore query performance. > > Workaround: If we disable cache with switch on Materialized instance i.e. > withCachingDisabled, key search is delegated directly to RocksDBStore which > is way faster and completes the search in microseconds versus milliseconds in case of > CachingWindowStore. > > Stats: With Unique users > 0.5M, random search for a key i.e. UserId: > > withCachingEnabled : 40 < t < 80ms (upper bound increases as unique users > grow) > withCachingDisabled: t < 1ms (Almost constant time)
[jira] [Commented] (KAFKA-8030) Flaky Test TopicCommandWithAdminClientTest#testDescribeUnderMinIsrPartitionsMixed
[ https://issues.apache.org/jira/browse/KAFKA-8030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793972#comment-16793972 ] Matthias J. Sax commented on KAFKA-8030: Failed again: [https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk11/detail/kafka-trunk-jdk11/376/tests] > Flaky Test > TopicCommandWithAdminClientTest#testDescribeUnderMinIsrPartitionsMixed > - > > Key: KAFKA-8030 > URL: https://issues.apache.org/jira/browse/KAFKA-8030 > Project: Kafka > Issue Type: Bug > Components: admin, unit tests >Affects Versions: 2.3.0 >Reporter: Matthias J. Sax >Assignee: Viktor Somogyi-Vass >Priority: Critical > Labels: flaky-test > Fix For: 2.3.0 > > > [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/2830/testReport/junit/kafka.admin/TopicCommandWithAdminClientTest/testDescribeUnderMinIsrPartitionsMixed/] > {quote}java.lang.AssertionError at org.junit.Assert.fail(Assert.java:87) at > org.junit.Assert.assertTrue(Assert.java:42) at > org.junit.Assert.assertTrue(Assert.java:53) at > kafka.admin.TopicCommandWithAdminClientTest.testDescribeUnderMinIsrPartitionsMixed(TopicCommandWithAdminClientTest.scala:602){quote} > STDERR > {quote}Option "[replica-assignment]" can't be used with option > "[partitions]"{quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-8091) Flaky test DynamicBrokerReconfigurationTest#testAddRemoveSaslListener
[ https://issues.apache.org/jira/browse/KAFKA-8091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793980#comment-16793980 ] Matthias J. Sax commented on KAFKA-8091: Ack. Thanks! > Flaky test DynamicBrokerReconfigurationTest#testAddRemoveSaslListener > --- > > Key: KAFKA-8091 > URL: https://issues.apache.org/jira/browse/KAFKA-8091 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 2.2.0 >Reporter: Rajini Sivaram >Assignee: Rajini Sivaram >Priority: Critical > Fix For: 2.3.0, 2.2.1 > > > See KAFKA-6824 for details. Since the SSL version of the test is currently > skipped using @Ignore, fixing this for SASL first and wait for that to be > stable before re-enabling SSL tests under KAFKA-6824.
[jira] [Commented] (KAFKA-7965) Flaky Test ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup
[ https://issues.apache.org/jira/browse/KAFKA-7965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793973#comment-16793973 ] Matthias J. Sax commented on KAFKA-7965: Failed again with "Should have received an class org.apache.kafka.common.errors.GroupMaxSizeReachedException during the cluster roll": [https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk8/detail/kafka-trunk-jdk8/3469/tests] > Flaky Test > ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup > > > Key: KAFKA-7965 > URL: https://issues.apache.org/jira/browse/KAFKA-7965 > Project: Kafka > Issue Type: Bug > Components: clients, consumer, unit tests >Affects Versions: 2.2.0, 2.3.0 >Reporter: Matthias J. Sax >Assignee: Stanislav Kozlovski >Priority: Critical > Labels: flaky-test > Fix For: 2.3.0, 2.2.1 > > > To get stable nightly builds for `2.2` release, I create tickets for all > observed test failures. > [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/21/] > {quote}java.lang.AssertionError: Received 0, expected at least 68 at > org.junit.Assert.fail(Assert.java:88) at > org.junit.Assert.assertTrue(Assert.java:41) at > kafka.api.ConsumerBounceTest.receiveAndCommit(ConsumerBounceTest.scala:557) > at > kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$1(ConsumerBounceTest.scala:320) > at > kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$1$adapted(ConsumerBounceTest.scala:319) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at > kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup(ConsumerBounceTest.scala:319){quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-8113) Flaky Test ListOffsetsRequestTest#testResponseIncludesLeaderEpoch
Matthias J. Sax created KAFKA-8113: -- Summary: Flaky Test ListOffsetsRequestTest#testResponseIncludesLeaderEpoch Key: KAFKA-8113 URL: https://issues.apache.org/jira/browse/KAFKA-8113 Project: Kafka Issue Type: Bug Components: core, unit tests Affects Versions: 2.3.0 Reporter: Matthias J. Sax Fix For: 2.3.0 [https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk8/detail/kafka-trunk-jdk8/3468/tests] {quote}java.lang.AssertionError at org.junit.Assert.fail(Assert.java:87) at org.junit.Assert.assertTrue(Assert.java:42) at org.junit.Assert.assertTrue(Assert.java:53) at kafka.server.ListOffsetsRequestTest.fetchOffsetAndEpoch$1(ListOffsetsRequestTest.scala:136) at kafka.server.ListOffsetsRequestTest.testResponseIncludesLeaderEpoch(ListOffsetsRequestTest.scala:151){quote} STDOUT {quote}[2019-03-15 17:16:13,029] ERROR [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Error for partition topic-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. [2019-03-15 17:16:13,231] ERROR [KafkaApi-0] Error while responding to offset request (kafka.server.KafkaApis:76) org.apache.kafka.common.errors.ReplicaNotAvailableException: Partition topic-0 is not available{quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-8112) Add system test to detect compatibility issues when requests are updated
Rajini Sivaram created KAFKA-8112: - Summary: Add system test to detect compatibility issues when requests are updated Key: KAFKA-8112 URL: https://issues.apache.org/jira/browse/KAFKA-8112 Project: Kafka Issue Type: Test Components: system tests Reporter: Rajini Sivaram Both compatibility_test_new_broker_test.py and upgrade_test.py passed with the Metadata version issue in KAFKA-8111. We didn't have a full system test build after the changes, so not sure if there are other tests which may have failed. This is to make sure that we add a test that would fail for similar compatibility issues in future.
[jira] [Updated] (KAFKA-8111) KafkaProducer can't produce data
[ https://issues.apache.org/jira/browse/KAFKA-8111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Roesler updated KAFKA-8111: Priority: Critical (was: Major) > KafkaProducer can't produce data > > > Key: KAFKA-8111 > URL: https://issues.apache.org/jira/browse/KAFKA-8111 > Project: Kafka > Issue Type: Bug > Components: clients, core >Affects Versions: 2.3.0 >Reporter: John Roesler >Assignee: Rajini Sivaram >Priority: Critical > > Using a Producer from the current trunk (a6691fb79), I'm unable to produce data to a 2.2 broker. > tl;dr: I narrowed down the problem to [https://github.com/apache/kafka/commit/a42f16f98]. My hypothesis is that some part of that commit broke backward compatibility with older brokers. > > Repro steps: > I'm using this Producer config: > {noformat} > final Properties properties = new Properties(); > properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, BROKER); > properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getCanonicalName()); > properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getCanonicalName()); > return properties;{noformat} > # create a simple Producer to produce test data to a broker > # build against commit a42f16f98 > # start an older broker (I was using 2.1, and someone else reproduced it with 2.2) > # run your producer and note that it doesn't produce data (it seems to hang; I see it produce 2 records in 1 minute) > # build against the predecessor commit 65aea1f36 > # run your producer and note that it DOES produce data (I see it produce 1M records every 15 seconds) > I've also confirmed that if I check out the current trunk (a6691fb79e2c55b3) and revert a42f16f98, I also observe that it produces as expected (1M every 15 seconds). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-8111) KafkaProducer can't produce data
[ https://issues.apache.org/jira/browse/KAFKA-8111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Roesler updated KAFKA-8111: Labels: blocker (was: ) > KafkaProducer can't produce data > > > Key: KAFKA-8111 > URL: https://issues.apache.org/jira/browse/KAFKA-8111 > Project: Kafka > Issue Type: Bug > Components: clients, core >Affects Versions: 2.3.0 >Reporter: John Roesler >Assignee: Rajini Sivaram >Priority: Critical > Labels: blocker > > Using a Producer from the current trunk (a6691fb79), I'm unable to produce data to a 2.2 broker. > tl;dr: I narrowed down the problem to [https://github.com/apache/kafka/commit/a42f16f98]. My hypothesis is that some part of that commit broke backward compatibility with older brokers. > > Repro steps: > I'm using this Producer config: > {noformat} > final Properties properties = new Properties(); > properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, BROKER); > properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getCanonicalName()); > properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getCanonicalName()); > return properties;{noformat} > # create a simple Producer to produce test data to a broker > # build against commit a42f16f98 > # start an older broker (I was using 2.1, and someone else reproduced it with 2.2) > # run your producer and note that it doesn't produce data (it seems to hang; I see it produce 2 records in 1 minute) > # build against the predecessor commit 65aea1f36 > # run your producer and note that it DOES produce data (I see it produce 1M records every 15 seconds) > I've also confirmed that if I check out the current trunk (a6691fb79e2c55b3) and revert a42f16f98, I also observe that it produces as expected (1M every 15 seconds). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-3083) a soft failure in controller may leave a topic partition in an inconsistent state
[ https://issues.apache.org/jira/browse/KAFKA-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793855#comment-16793855 ] Shannon Carey commented on KAFKA-3083: -- Is there a way to make this less likely to occur in versions before the fix? Would using a larger value for zookeeper.session.timeout.ms make any difference? I assume that "broker A's session expires" refers to the broker's Zookeeper session? > a soft failure in controller may leave a topic partition in an inconsistent > state > - > > Key: KAFKA-3083 > URL: https://issues.apache.org/jira/browse/KAFKA-3083 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.9.0.0 >Reporter: Jun Rao >Assignee: Onur Karaman >Priority: Major > Labels: reliability > Fix For: 1.1.0 > > > The following sequence can happen. > 1. Broker A is the controller and is in the middle of processing a broker > change event. As part of this process, let's say it's about to shrink the isr > of a partition. > 2. Then broker A's session expires and broker B takes over as the new > controller. Broker B sends the initial leaderAndIsr request to all brokers. > 3. Broker A continues by shrinking the isr of the partition in ZK and sends > the new leaderAndIsr request to the broker (say C) that leads the partition. > Broker C will reject this leaderAndIsr since the request comes from a > controller with an older epoch. Now we could be in a situation that Broker C > thinks the isr has all replicas, but the isr stored in ZK is different. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
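The controller-epoch check in step 3 can be sketched as follows. This is a hypothetical, simplified illustration (the class and method names are invented, not Kafka's actual code): each broker remembers the highest controller epoch it has seen and rejects a leaderAndIsr request from any controller with a lower epoch.

```java
// Hypothetical sketch of controller-epoch fencing; names are invented,
// not Kafka's actual classes.
public class EpochFencingSketch {
    static class Broker {
        private int highestSeenControllerEpoch = 0;

        // Returns true if the request is accepted, false if fenced as stale.
        boolean handleLeaderAndIsr(int controllerEpoch) {
            if (controllerEpoch < highestSeenControllerEpoch) {
                return false; // stale controller epoch: old controller is fenced
            }
            highestSeenControllerEpoch = controllerEpoch;
            return true;
        }
    }

    public static void main(String[] args) {
        Broker brokerC = new Broker();
        // Broker B becomes controller with epoch 2 and sends its initial
        // leaderAndIsr first (step 2 in the ticket).
        System.out.println("epoch 2 accepted: " + brokerC.handleLeaderAndIsr(2));
        // Zombie controller A (epoch 1) then sends its stale shrink-ISR
        // request (step 3), which broker C rejects.
        System.out.println("epoch 1 accepted: " + brokerC.handleLeaderAndIsr(1));
    }
}
```

Note that this fencing only protects broker C's in-memory state; as the ticket describes, the ISR the old controller already wrote to ZooKeeper can still diverge from what broker C believes.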
[jira] [Commented] (KAFKA-8091) Flaky test DynamicBrokerReconfigurationTest#testAddRemoveSaslListener
[ https://issues.apache.org/jira/browse/KAFKA-8091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793849#comment-16793849 ] Rajini Sivaram commented on KAFKA-8091: --- [~mjsax] Merged another fix today for that issue (after that failed run). Will wait and see if that has fixed the issue. > Flaky test DynamicBrokerReconfigurationTest#testAddRemoveSaslListener > --- > > Key: KAFKA-8091 > URL: https://issues.apache.org/jira/browse/KAFKA-8091 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 2.2.0 >Reporter: Rajini Sivaram >Assignee: Rajini Sivaram >Priority: Critical > Fix For: 2.3.0, 2.2.1 > > > See KAFKA-6824 for details. Since the SSL version of the test is currently > skipped using @Ignore, fixing this for SASL first and wait for that to be > stable before re-enabling SSL tests under KAFKA-6824. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-8106) Remove unnecessary decompression operation when logValidator do validation.
[ https://issues.apache.org/jira/browse/KAFKA-8106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flower.min updated KAFKA-8106: -- Description: We did performance testing of Kafka in the specific scenario described below. We built a Kafka cluster with one broker and created topics with different numbers of partitions; then we started many producer processes to send large amounts of messages to one of the topics in each test.
*_Specific scenario_*
# *_Server:_* CPU: 2*16; MemTotal: 256G; Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection; SSD.
# *_Topics:_* topic1: 50 partitions, topic2: 100 partitions, topic3: 200 partitions, ..., 2000 partitions
# *_Size of a single message:_* 1024B
*_Config of KafkaProducer:_*
# *_compression.type:_* lz4
# *_linger.ms:_* 1000ms/2000ms/5000ms
# *_batch.size:_* 16384B/10240B/102400B
# *_buffer.memory:_* 134217728B
*_The best result of performance testing:_*
# *_Performance:_* 2300 messages/s.
# *_Resource usage:_* network inflow rate: 550MB/s~610MB/s; CPU: 97%~99%; disk write speed: 550MB/s~610MB/s.
*_Phenomenon and my doubt:_*
The upper limit of CPU usage has been reached, but the upper limit of the server's network bandwidth has not. We are unsure what costs so much CPU time, and we want to improve performance and reduce the CPU usage of the Kafka server.

was: We did performance testing of Kafka in the specific scenario described below. We built a Kafka cluster with one broker and created topics with different numbers of partitions; then we started many producer processes to send large amounts of messages to one of the topics in each test.
*_Specific scenario_*
# *_Server:_* CPU: 2*16; MemTotal: 256G; Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection; SSD.
# *_Topics:_* topic1: 50 partitions, topic2: 100 partitions, topic3: 200 partitions, ..., 2000 partitions
# *_Size of a single message:_* 1024B
*_Config of KafkaProducer:_*
# *_compression.type:_* lz4
# *_linger.ms:_* 1000ms/2000ms/5000ms
# *_batch.size:_* 16384B/10240B/102400B
# *_buffer.memory:_* 134217728B
*_The best result of performance testing:_*
# *_Performance:_* 2300 messages/s.
# *_Resource usage:_* network inflow rate: 550MB/s~610MB/s; CPU: 97%~99%; disk write speed: 550MB/s~610MB/s.
*_Phenomenon and my doubt:_*
The upper limit of CPU usage has been reached, but the upper limit of the server's network bandwidth has not. We are unsure what costs so much CPU time, and we want to improve performance and reduce the CPU usage of the Kafka server.

> Remove unnecessary decompression operation when logValidator do validation. > > Key: KAFKA-8106 > URL: https://issues.apache.org/jira/browse/KAFKA-8106 > Project: Kafka > Issue Type: Bug > Components: clients, core >Affects Versions: 2.1.1 >Reporter: Flower.min >Priority: Major > >
> We did performance testing of Kafka in the specific scenario described below. We built a Kafka cluster with one broker and created topics with different numbers of partitions; then we started many producer processes to send large amounts of messages to one of the topics in each test.
> *_Specific scenario_*
> # *_Server:_* CPU: 2*16; MemTotal: 256G; Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection; SSD.
> # *_Topics:_* topic1: 50 partitions, topic2: 100 partitions, topic3: 200 partitions, ..., 2000 partitions
> # *_Size of a single message:_* 1024B
> *_Config of KafkaProducer:_*
> # *_compression.type:_* lz4
> # *_linger.ms:_* 1000ms/2000ms/5000ms
> # *_batch.size:_* 16384B/10240B/102400B
> # *_buffer.memory:_* 134217728B
> *_The best result of performance testing:_*
> # *_Performance:_* 2300 messages/s.
> # *_Resource usage:_* network inflow rate: 550MB/s~610MB/s; CPU: 97%~99%; disk write speed: 550MB/s~610MB/s.
> *_Phenomenon and my doubt:_*
> The upper limit of CPU usage has been reached, but the upper limit of the server's network bandwidth has not. We are unsure what costs so much CPU time, and we want to improve performance and reduce the CPU usage of the Kafka server.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
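The optimization the ticket title asks for rests on an observation about the message format: for v2 record batches the CRC is computed over the batch contents including the compressed payload, so an integrity check alone should not require decompressing the records. A minimal, hypothetical sketch of that idea follows (this is not Kafka's actual LogValidator code; the class and payload are invented for illustration):

```java
import java.util.zip.CRC32C;

// Hypothetical sketch: verifying a compressed batch's checksum without
// decompressing it. Not Kafka's actual LogValidator implementation.
public class BatchCrcSketch {
    // CRC32C over the raw (still-compressed) payload bytes.
    static long crcOf(byte[] compressedPayload) {
        CRC32C crc = new CRC32C();
        crc.update(compressedPayload, 0, compressedPayload.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] payload = "pretend-lz4-compressed-record-data".getBytes();
        long storedCrc = crcOf(payload); // the CRC the producer wrote into the batch
        // Broker-side integrity check: recompute over the compressed bytes only;
        // no decompression (and hence far less CPU) is needed for this step.
        System.out.println("batch intact: " + (storedCrc == crcOf(payload)));
    }
}
```

Of course, validations that must inspect individual records (e.g. per-record offsets or timestamps) still need decompression; the saving applies only to checks expressible over the compressed bytes.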
[jira] [Updated] (KAFKA-8106) Remove unnecessary decompression operation when logValidator do validation.
[ https://issues.apache.org/jira/browse/KAFKA-8106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flower.min updated KAFKA-8106: -- Description: We did performance testing of Kafka in the specific scenario described below. We built a Kafka cluster with one broker and created topics with different numbers of partitions; then we started many producer processes to send large amounts of messages to one of the topics in each test.
*_Specific scenario_*
# *_Server:_* CPU: 2*16; MemTotal: 256G; Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection; SSD.
# *_Topics:_* topic1: 50 partitions, topic2: 100 partitions, topic3: 200 partitions, ..., 2000 partitions
# *_Size of a single message:_* 1024B
*_Config of KafkaProducer:_*
# *_compression.type:_* lz4
# *_linger.ms:_* 1000ms/2000ms/5000ms
# *_batch.size:_* 16384B/10240B/102400B
# *_buffer.memory:_* 134217728B
*_The best result of performance testing:_*
# *_Performance:_* 2300 messages/s.
# *_Resource usage:_* network inflow rate: 550MB/s~610MB/s; CPU: 97%~99%; disk write speed: 550MB/s~610MB/s.
*_Phenomenon and my doubt:_*
The upper limit of CPU usage has been reached, but the upper limit of the server's network bandwidth has not. We are unsure what costs so much CPU time, and we want to improve performance and reduce the CPU usage of the Kafka server.

> Remove unnecessary decompression operation when logValidator do validation. > > Key: KAFKA-8106 > URL: https://issues.apache.org/jira/browse/KAFKA-8106 > Project: Kafka > Issue Type: Bug > Components: clients, core >Affects Versions: 2.1.1 >Reporter: Flower.min >Priority: Major > >
> We did performance testing of Kafka in the specific scenario described below. We built a Kafka cluster with one broker and created topics with different numbers of partitions; then we started many producer processes to send large amounts of messages to one of the topics in each test.
> *_Specific scenario_*
> # *_Server:_* CPU: 2*16; MemTotal: 256G; Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection; SSD.
> # *_Topics:_* topic1: 50 partitions, topic2: 100 partitions, topic3: 200 partitions, ..., 2000 partitions
> # *_Size of a single message:_* 1024B
> *_Config of KafkaProducer:_*
> # *_compression.type:_* lz4
> # *_linger.ms:_* 1000ms/2000ms/5000ms
> # *_batch.size:_* 16384B/10240B/102400B
> # *_buffer.memory:_* 134217728B
> *_The best result of performance testing:_*
> # *_Performance:_* 2300 messages/s.
> # *_Resource usage:_* network inflow rate: 550MB/s~610MB/s; CPU: 97%~99%; disk write speed: 550MB/s~610MB/s.
> *_Phenomenon and my doubt:_*
> The upper limit of CPU usage has been reached, but the upper limit of the server's network bandwidth has not. We are unsure what costs so much CPU time, and we want to improve performance and reduce the CPU usage of the Kafka server.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-8106) Remove
[ https://issues.apache.org/jira/browse/KAFKA-8106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flower.min updated KAFKA-8106: -- Summary: Remove (was: Remove unnecessary decompression operation when logValidator do validation.) > Remove > --- > > Key: KAFKA-8106 > URL: https://issues.apache.org/jira/browse/KAFKA-8106 > Project: Kafka > Issue Type: Bug > Components: clients, core >Affects Versions: 2.1.1 >Reporter: Flower.min >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-8111) KafkaProducer can't produce data
[ https://issues.apache.org/jira/browse/KAFKA-8111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793837#comment-16793837 ] ASF GitHub Bot commented on KAFKA-8111: --- rajinisivaram commented on pull request #6451: KAFKA-8111; Set min and max versions for Metadata requests URL: https://github.com/apache/kafka/pull/6451 ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > KafkaProducer can't produce data > > > Key: KAFKA-8111 > URL: https://issues.apache.org/jira/browse/KAFKA-8111 > Project: Kafka > Issue Type: Bug > Components: clients, core >Affects Versions: 2.3.0 >Reporter: John Roesler >Assignee: Rajini Sivaram >Priority: Major > > Using a Producer from the current trunk (a6691fb79), I'm unable to produce data to a 2.2 broker. > tl;dr: I narrowed down the problem to [https://github.com/apache/kafka/commit/a42f16f98]. My hypothesis is that some part of that commit broke backward compatibility with older brokers. > > Repro steps: > I'm using this Producer config: > {noformat} > final Properties properties = new Properties(); > properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, BROKER); > properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getCanonicalName()); > properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getCanonicalName()); > return properties;{noformat} > # create a simple Producer to produce test data to a broker > # build against commit a42f16f98 > # start an older broker (I was using 2.1, and someone else reproduced it with 2.2) > # run your producer and note that it doesn't produce data (it seems to hang; I see it produce 2 records in 1 minute) > # build against the predecessor commit 65aea1f36 > # run your producer and note that it DOES produce data (I see it produce 1M records every 15 seconds) > I've also confirmed that if I check out the current trunk (a6691fb79e2c55b3) and revert a42f16f98, I also observe that it produces as expected (1M every 15 seconds). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
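The PR title suggests the fix constrains the Metadata request to an explicit min/max version range. The usual rule in Kafka's protocol negotiation is that the client uses the highest version both sides support, and the connection is unusable if the ranges don't overlap. A hypothetical sketch of that rule (the class name and the concrete version numbers are invented for illustration, not taken from the actual NetworkClient code):

```java
import java.util.OptionalInt;

// Hypothetical sketch of API version-range negotiation; names are invented.
public class VersionNegotiationSketch {
    // Pick the highest version inside both [clientMin, clientMax] and
    // [brokerMin, brokerMax]; empty if the ranges do not overlap.
    static OptionalInt negotiate(int clientMin, int clientMax,
                                 int brokerMin, int brokerMax) {
        int usable = Math.min(clientMax, brokerMax);
        if (usable < Math.max(clientMin, brokerMin)) {
            return OptionalInt.empty(); // incompatible: no common version
        }
        return OptionalInt.of(usable);
    }

    public static void main(String[] args) {
        // A newer client against an older broker falls back to the broker's
        // highest supported version (version numbers here are illustrative).
        System.out.println(negotiate(0, 8, 0, 7));
        // A client that advertises a minimum above the broker's maximum
        // cannot talk to it at all.
        System.out.println(negotiate(9, 9, 0, 7));
    }
}
```

If a change accidentally narrows the advertised range (for instance by raising the minimum), requests to older brokers stop matching any common version, which is consistent with the produce hang reported in this ticket.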
[jira] [Updated] (KAFKA-8106) Remove unnecessary decompression operation when logValidator do validation.
[ https://issues.apache.org/jira/browse/KAFKA-8106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flower.min updated KAFKA-8106: -- Description: (was: We did performance testing of Kafka in the specific scenario described below. We built a Kafka cluster with one broker and created topics with different numbers of partitions; then we started many producer processes to send large amounts of messages to one of the topics in each test.
*_Specific scenario_*
# *_Server:_* CPU: 2*16; MemTotal: 256G; Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection; SSD.
# *_Topics:_* topic1: 50 partitions, topic2: 100 partitions, topic3: 200 partitions, ..., 2000 partitions
# *_Size of a single message:_* 1024B
*_Config of KafkaProducer:_*
# *_compression.type:_* lz4
# *_linger.ms:_* 1000ms/2000ms/5000ms
# *_batch.size:_* 16384B/10240B/102400B
# *_buffer.memory:_* 134217728B
*_The best result of performance testing:_*
# *_Performance:_* 2300 messages/s.
# *_Resource usage:_* network inflow rate: 550MB/s~610MB/s; CPU: 97%~99%; disk write speed: 550MB/s~610MB/s.
*_Phenomenon and my doubt:_*
The upper limit of CPU usage has been reached, but the upper limit of the server's network bandwidth has not. We are unsure what costs so much CPU time, and we want to improve performance and reduce the CPU usage of the Kafka server.)
> Remove unnecessary decompression operation when logValidator do validation. > > Key: KAFKA-8106 > URL: https://issues.apache.org/jira/browse/KAFKA-8106 > Project: Kafka > Issue Type: Bug > Components: clients, core >Affects Versions: 2.1.1 >Reporter: Flower.min >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (KAFKA-8111) KafkaProducer can't produce data
[ https://issues.apache.org/jira/browse/KAFKA-8111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajini Sivaram reassigned KAFKA-8111: - Assignee: Rajini Sivaram > KafkaProducer can't produce data > > > Key: KAFKA-8111 > URL: https://issues.apache.org/jira/browse/KAFKA-8111 > Project: Kafka > Issue Type: Bug > Components: clients, core >Affects Versions: 2.3.0 >Reporter: John Roesler >Assignee: Rajini Sivaram >Priority: Major > > Using a Producer from the current trunk (a6691fb79), I'm unable to produce data to a 2.2 broker. > tl;dr: I narrowed down the problem to [https://github.com/apache/kafka/commit/a42f16f98]. My hypothesis is that some part of that commit broke backward compatibility with older brokers. > > Repro steps: > I'm using this Producer config: > {noformat} > final Properties properties = new Properties(); > properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, BROKER); > properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getCanonicalName()); > properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getCanonicalName()); > return properties;{noformat} > # create a simple Producer to produce test data to a broker > # build against commit a42f16f98 > # start an older broker (I was using 2.1, and someone else reproduced it with 2.2) > # run your producer and note that it doesn't produce data (it seems to hang; I see it produce 2 records in 1 minute) > # build against the predecessor commit 65aea1f36 > # run your producer and note that it DOES produce data (I see it produce 1M records every 15 seconds) > I've also confirmed that if I check out the current trunk (a6691fb79e2c55b3) and revert a42f16f98, I also observe that it produces as expected (1M every 15 seconds). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7965) Flaky Test ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup
[ https://issues.apache.org/jira/browse/KAFKA-7965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793781#comment-16793781 ] Matthias J. Sax commented on KAFKA-7965: "Boxed error" here [https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk11/detail/kafka-trunk-jdk11/373/tests] > Flaky Test > ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup > > > Key: KAFKA-7965 > URL: https://issues.apache.org/jira/browse/KAFKA-7965 > Project: Kafka > Issue Type: Bug > Components: clients, consumer, unit tests >Affects Versions: 2.2.0, 2.3.0 >Reporter: Matthias J. Sax >Assignee: Stanislav Kozlovski >Priority: Critical > Labels: flaky-test > Fix For: 2.3.0, 2.2.1 > > > To get stable nightly builds for `2.2` release, I create tickets for all > observed test failures. > [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/21/] > {quote}java.lang.AssertionError: Received 0, expected at least 68 at > org.junit.Assert.fail(Assert.java:88) at > org.junit.Assert.assertTrue(Assert.java:41) at > kafka.api.ConsumerBounceTest.receiveAndCommit(ConsumerBounceTest.scala:557) > at > kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$1(ConsumerBounceTest.scala:320) > at > kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$1$adapted(ConsumerBounceTest.scala:319) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at > kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup(ConsumerBounceTest.scala:319){quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-8091) Flaky test DynamicBrokerReconfigurationTest#testAddRemoveSaslListener
[ https://issues.apache.org/jira/browse/KAFKA-8091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793780#comment-16793780 ] Matthias J. Sax commented on KAFKA-8091: [~rsivaram] this test failed again: [https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk8/detail/kafka-trunk-jdk8/3466/tests] {quote}org.scalatest.junit.JUnitTestFailedError: Operation should not have completed at org.scalatest.junit.AssertionsForJUnit.newAssertionFailedException(AssertionsForJUnit.scala:100) at org.scalatest.junit.AssertionsForJUnit.newAssertionFailedException$(AssertionsForJUnit.scala:99) at org.scalatest.junit.JUnitSuite.newAssertionFailedException(JUnitSuite.scala:71) at org.scalatest.Assertions.fail(Assertions.scala:1089) at org.scalatest.Assertions.fail$(Assertions.scala:1085) at org.scalatest.junit.JUnitSuite.fail(JUnitSuite.scala:71) at kafka.server.DynamicBrokerReconfigurationTest.verifyTimeout(DynamicBrokerReconfigurationTest.scala:1328) at kafka.server.DynamicBrokerReconfigurationTest.verifyRemoveListener(DynamicBrokerReconfigurationTest.scala:981) at kafka.server.DynamicBrokerReconfigurationTest.testAddRemoveSaslListeners(DynamicBrokerReconfigurationTest.scala:843){quote} STDOUT {quote}Completed Updating config for entity: brokers '0'. Completed Updating config for entity: brokers '1'. Completed Updating config for entity: brokers '2'. [2019-03-15 05:51:46,395] ERROR [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=1] Error for partition testtopic-6 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. [2019-03-15 05:51:46,395] ERROR [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=1] Error for partition testtopic-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. 
Completed Updating config for entity: brokers '0'. Completed Updating config for entity: brokers '1'. Completed Updating config for entity: brokers '2'. [2019-03-15 05:51:54,754] ERROR [ReplicaFetcher replicaId=2, leaderId=0, fetcherId=1] Error for partition testtopic-4 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. Completed Updating config for entity: brokers '0'. Completed Updating config for entity: brokers '1'. Completed Updating config for entity: brokers '2'. [2019-03-15 05:52:06,197] WARN Unable to reconnect to ZooKeeper service, session 0x10453e5652b0002 has expired (org.apache.zookeeper.ClientCnxn:1289) Completed Updating config for entity: brokers '0'. Completed Updating config for entity: brokers '1'. Completed Updating config for entity: brokers '2'. [2019-03-15 05:52:15,144] ERROR [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=1] Error for partition testtopic-6 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. [2019-03-15 05:52:15,144] ERROR [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=1] Error for partition testtopic-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. [2019-03-15 05:52:15,157] ERROR [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=0] Error for partition testtopic-7 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. 
[2019-03-15 05:52:15,157] ERROR [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=0] Error for partition testtopic-1 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. [2019-03-15 05:52:15,168] ERROR [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=1] Error for partition testtopic-2 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. [2019-03-15 05:52:15,168] ERROR [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition testtopic-5 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. [2019-03-15 05:52:15,168] ERROR [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=1] Error for partition testtopic-8 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. [2019-03-15 05:52:15,174] ERROR [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=1]
[jira] [Created] (KAFKA-8111) KafkaProducer can't produce data
John Roesler created KAFKA-8111: --- Summary: KafkaProducer can't produce data Key: KAFKA-8111 URL: https://issues.apache.org/jira/browse/KAFKA-8111 Project: Kafka Issue Type: Bug Components: clients, core Affects Versions: 2.3.0 Reporter: John Roesler Using a Producer from the current trunk (a6691fb79), I'm unable to produce data to a 2.2 broker. tl;dr: I narrowed down the problem to [https://github.com/apache/kafka/commit/a42f16f98]. My hypothesis is that some part of that commit broke backward compatibility with older brokers. Repro steps: I'm using this Producer config: {noformat} final Properties properties = new Properties(); properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, BROKER); properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getCanonicalName()); properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getCanonicalName()); return properties;{noformat} # create a simple Producer to produce test data to a broker # build against commit a42f16f98 # start an older broker (I was using 2.1, and someone else reproduced it with 2.2) # run your producer and note that it doesn't produce data (it seems to hang; I see it produce 2 records in 1 minute) # build against the predecessor commit 65aea1f36 # run your producer and note that it DOES produce data (I see it produce 1M records every 15 seconds) I've also confirmed that if I check out the current trunk (a6691fb79e2c55b3) and revert a42f16f98, I also observe that it produces as expected (1M every 15 seconds). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-8110) Flaky Test DescribeConsumerGroupTest#testDescribeMembersWithConsumersWithoutAssignedPartitions
Matthias J. Sax created KAFKA-8110: -- Summary: Flaky Test DescribeConsumerGroupTest#testDescribeMembersWithConsumersWithoutAssignedPartitions Key: KAFKA-8110 URL: https://issues.apache.org/jira/browse/KAFKA-8110 Project: Kafka Issue Type: Bug Components: core, unit tests Affects Versions: 2.2.0 Reporter: Matthias J. Sax Fix For: 2.3.0, 2.2.1 [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/67/testReport/junit/kafka.admin/DescribeConsumerGroupTest/testDescribeMembersWithConsumersWithoutAssignedPartitions/] {quote}java.lang.AssertionError: Partition [__consumer_offsets,0] metadata not propagated after 15000 ms at kafka.utils.TestUtils$.fail(TestUtils.scala:381) at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:791) at kafka.utils.TestUtils$.waitUntilMetadataIsPropagated(TestUtils.scala:880) at kafka.utils.TestUtils$.$anonfun$createTopic$3(TestUtils.scala:318) at kafka.utils.TestUtils$.$anonfun$createTopic$3$adapted(TestUtils.scala:317) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) at scala.collection.immutable.Range.foreach(Range.scala:158) at scala.collection.TraversableLike.map(TraversableLike.scala:237) at scala.collection.TraversableLike.map$(TraversableLike.scala:230) at scala.collection.AbstractTraversable.map(Traversable.scala:108) at kafka.utils.TestUtils$.createTopic(TestUtils.scala:317) at kafka.utils.TestUtils$.createOffsetsTopic(TestUtils.scala:375) at kafka.admin.DescribeConsumerGroupTest.testDescribeMembersWithConsumersWithoutAssignedPartitions(DescribeConsumerGroupTest.scala:372){quote} STDOUT {quote}[2019-03-14 20:01:52,347] WARN Ignoring unexpected runtime exception (org.apache.zookeeper.server.NIOServerCnxnFactory:236) java.nio.channels.CancelledKeyException at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73) at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:87) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:205) at 
java.lang.Thread.run(Thread.java:748)
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
foo   0         0              0              0   -           -    -
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
foo   0         0              0              0   -           -    -
COORDINATOR (ID) ASSIGNMENT-STRATEGY STATE #MEMBERS
localhost:44669 (0){quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7898) ERROR Caught unexpected throwable (org.apache.zookeeper.ClientCnxn)
[ https://issues.apache.org/jira/browse/KAFKA-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793755#comment-16793755 ] Steven McDonald commented on KAFKA-7898: {quote}I also consider it a bug that this NullPointerException leaves the Kafka cluster in a state that it does not recover from automatically.{quote} For this issue, I have raised [ZOOKEEPER-3315|https://issues.apache.org/jira/projects/ZOOKEEPER/issues/ZOOKEEPER-3315]. > ERROR Caught unexpected throwable (org.apache.zookeeper.ClientCnxn) > --- > > Key: KAFKA-7898 > URL: https://issues.apache.org/jira/browse/KAFKA-7898 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Gabriel Lukacs >Priority: Major > > We observed a NullPointerException on one of our broker in 3 broker cluster > environment. If I list the processes and open ports it seems that the faulty > broker is running, but the kafka-connect (we used it also) periodically > restarts due to fact that it can not connect to the kafka cluster (configured > ssl & plaintext mode too). Is it a bug in kafka/zookeeper? 
> > [2019-02-05 14:28:11,359] WARN Client session timed out, have not heard from > server in 4141ms for sessionid 0x310166e > (org.apache.zookeeper.ClientCnxn) > [2019-02-05 14:28:12,525] ERROR Caught unexpected throwable > (org.apache.zookeeper.ClientCnxn) > java.lang.NullPointerException > at > kafka.zookeeper.ZooKeeperClient$$anon$8.processResult(ZooKeeperClient.scala:217) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:633) > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508) > [2019-02-05 14:28:12,526] ERROR Caught unexpected throwable > (org.apache.zookeeper.ClientCnxn) > [2019-02-05 14:28:22,701] WARN Client session timed out, have not heard from > server in 4004ms for sessionid 0x310166e > (org.apache.zookeeper.ClientCnxn) > [2019-02-05 14:28:28,670] WARN Client session timed out, have not heard from > server in 4049ms for sessionid 0x310166e > (org.apache.zookeeper.ClientCnxn) > [2019-02-05 15:05:20,601] WARN [GroupCoordinator 1]: Failed to write empty > metadata for group > encodable-emvTokenAccess-delta-encoder-group-emvIssuerAccess-v2-2-0: The > group is rebalancing, so a rejoin is needed. > (kafka.coordinator.group.GroupCoordinator) > kafka 7381 1 0 14:22 ? 00:00:19 java -Xmx512M -Xms512M -server -XX:+UseG1GC > -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 > -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true > -Xloggc:/opt/kafka/bin/../logs/zookeeper-gc.log -verbose:gc > -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps > -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M > -Dcom.sun.management.jmxremote > -Dcom.sun.management.jmxremote.authenticate=false > -Dcom.sun.management.jmxremote.ssl=false > -Dkafka.logs.dir=/opt/kafka/bin/../logs > -Dlog4j.configuration=file:/opt/kafka/config/zoo-log4j.properties -cp >
[jira] [Comment Edited] (KAFKA-7027) Overloaded StreamsBuilder Build Method to Accept java.util.Properties
[ https://issues.apache.org/jira/browse/KAFKA-7027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793734#comment-16793734 ] Bill Bejeck edited comment on KAFKA-7027 at 3/15/19 4:08 PM: - cherry-picked [https://github.com/apache/kafka/pull/6373] to 2.2 and 2.1 was (Author: bbejeck): cherry-picked [https://github.com/apache/kafka/pull/6373] tp 2.2 and 2.1 > Overloaded StreamsBuilder Build Method to Accept java.util.Properties > - > > Key: KAFKA-7027 > URL: https://issues.apache.org/jira/browse/KAFKA-7027 > Project: Kafka > Issue Type: New Feature > Components: streams >Reporter: Bill Bejeck >Assignee: Bill Bejeck >Priority: Major > Labels: kip > Fix For: 2.1.0 > > > Add overloaded method to {{StreamsBuilder}} accepting a > {{java.utils.Properties}} instance. > > KIP can be found here > https://cwiki.apache.org/confluence/display/KAFKA/KIP-312%3A+Add+Overloaded+StreamsBuilder+Build+Method+to+Accept+java.util.Properties -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7027) Overloaded StreamsBuilder Build Method to Accept java.util.Properties
[ https://issues.apache.org/jira/browse/KAFKA-7027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793734#comment-16793734 ] Bill Bejeck commented on KAFKA-7027: cherry-picked [https://github.com/apache/kafka/pull/6373] to 2.2 and 2.1 > Overloaded StreamsBuilder Build Method to Accept java.util.Properties > - > > Key: KAFKA-7027 > URL: https://issues.apache.org/jira/browse/KAFKA-7027 > Project: Kafka > Issue Type: New Feature > Components: streams >Reporter: Bill Bejeck >Assignee: Bill Bejeck >Priority: Major > Labels: kip > Fix For: 2.1.0 > > > Add overloaded method to {{StreamsBuilder}} accepting a > {{java.utils.Properties}} instance. > > KIP can be found here > https://cwiki.apache.org/confluence/display/KAFKA/KIP-312%3A+Add+Overloaded+StreamsBuilder+Build+Method+to+Accept+java.util.Properties -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-8091) Flaky test DynamicBrokerReconfigurationTest#testAddRemoveSaslListener
[ https://issues.apache.org/jira/browse/KAFKA-8091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793720#comment-16793720 ] ASF GitHub Bot commented on KAFKA-8091: --- rajinisivaram commented on pull request #6450: KAFKA-8091; Use commitSync to check connection failure in listener update test URL: https://github.com/apache/kafka/pull/6450 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Flaky test DynamicBrokerReconfigurationTest#testAddRemoveSaslListener > --- > > Key: KAFKA-8091 > URL: https://issues.apache.org/jira/browse/KAFKA-8091 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 2.2.0 >Reporter: Rajini Sivaram >Assignee: Rajini Sivaram >Priority: Critical > Fix For: 2.3.0, 2.2.1 > > > See KAFKA-6824 for details. Since the SSL version of the test is currently > skipped using @Ignore, fixing this for SASL first and wait for that to be > stable before re-enabling SSL tests under KAFKA-6824. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7027) Overloaded StreamsBuilder Build Method to Accept java.util.Properties
[ https://issues.apache.org/jira/browse/KAFKA-7027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793688#comment-16793688 ] ASF GitHub Bot commented on KAFKA-7027: --- bbejeck commented on pull request #6373: KAFKA-7027: Add an overload build method in scala URL: https://github.com/apache/kafka/pull/6373 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Overloaded StreamsBuilder Build Method to Accept java.util.Properties > - > > Key: KAFKA-7027 > URL: https://issues.apache.org/jira/browse/KAFKA-7027 > Project: Kafka > Issue Type: New Feature > Components: streams >Reporter: Bill Bejeck >Assignee: Bill Bejeck >Priority: Major > Labels: kip > Fix For: 2.1.0 > > > Add overloaded method to {{StreamsBuilder}} accepting a > {{java.utils.Properties}} instance. > > KIP can be found here > https://cwiki.apache.org/confluence/display/KAFKA/KIP-312%3A+Add+Overloaded+StreamsBuilder+Build+Method+to+Accept+java.util.Properties -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7723) Kafka Connect support override worker kafka api configuration with connector configuration that post by rest api
[ https://issues.apache.org/jira/browse/KAFKA-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793635#comment-16793635 ] Randall Hauch commented on KAFKA-7723: -- [~laomei], first of all, thanks for logging this issue, creating [KIP-407|https://cwiki.apache.org/confluence/display/KAFKA/KIP-407%3A+Kafka+Connect+support+override+worker+kafka+api+configuration+with+connector+configuration+that+post+by+rest+api], and creating a pull request. However, I think this is nearly identical to (or rather a subset of) KAFKA-6890 / [KIP-296|https://cwiki.apache.org/confluence/display/KAFKA/KIP-296%3A+Connector+level+configurability+for+client+configs], which IMO has the correct scope and where a discussion is taking place about the requirements and user experience. Where this proposal seems to differ from KAFKA-6890 / KIP-296 is that the approach proposed here only addresses connector configurations specified via the REST API, and not via configuration files passed to the standalone Connect worker. This would be a significant departure from the current behavior, where the REST API and file configurations are completely compatible. Since KAFKA-6890 / KIP-296 are older, can we resolve this issue as DUPLICATE, close the PR without merging, and withdraw KIP-407? 
> Kafka Connect support override worker kafka api configuration with connector > configuration that post by rest api > > > Key: KAFKA-7723 > URL: https://issues.apache.org/jira/browse/KAFKA-7723 > Project: Kafka > Issue Type: Improvement > Components: KafkaConnect >Reporter: laomei >Priority: Minor > Labels: needs-kip > > I'm using kafka sink connect; "auto.offset.reset" is set in > connect-distributed*.properties; > It works for all connector which in one worker; So the consumer will poll > records from latest or earliest; I can not control the auto.offset.reset in > connector configs post with rest api; > So I think is necessary to override worker kafka api configs with connector > configs; > Like this > {code:java} > { > "name": "test", > "config": { > "consumer.auto.offset.reset": "latest", > "consumer.xxx" > "connector.class": "com.laomei.sis.solr.SolrConnector", > "tasks.max": "1", > "poll.interval.ms": "100", > "connect.timeout.ms": "6", > "topics": "test" > } > } > {code} > We can override kafka consumer auto offset reset in sink connector; -- This message was sent by Atlassian JIRA (v7.6.3#76005)
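The override mechanism the ticket asks for boils down to prefix-stripping: keys prefixed with {{consumer.}} in the connector config replace the worker's base consumer settings. A minimal illustrative sketch (the class and method names are hypothetical, not Connect's actual code):

```java
import java.util.HashMap;
import java.util.Map;

public class PrefixOverride {
    // Sketch of the override idea in the ticket: "consumer."-prefixed keys
    // from the connector config are stripped of their prefix and overlaid
    // onto the worker's base consumer configuration. Unprefixed connector
    // keys (tasks.max, topics, ...) are left alone.
    static Map<String, String> effectiveConsumerConfig(Map<String, String> workerBase,
                                                       Map<String, String> connectorConfig) {
        Map<String, String> merged = new HashMap<>(workerBase);
        connectorConfig.forEach((k, v) -> {
            if (k.startsWith("consumer.")) {
                merged.put(k.substring("consumer.".length()), v);
            }
        });
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> base = Map.of("auto.offset.reset", "earliest"); // from connect-distributed*.properties
        Map<String, String> connector = Map.of("consumer.auto.offset.reset", "latest",
                                               "tasks.max", "1");
        System.out.println(effectiveConsumerConfig(base, connector).get("auto.offset.reset")); // prints latest
    }
}
```

This is the same shape of merge that KIP-296 discusses at the connector level, which is why the two proposals overlap so heavily.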
[jira] [Updated] (KAFKA-7723) Kafka Connect support override worker kafka api configuration with connector configuration that post by rest api
[ https://issues.apache.org/jira/browse/KAFKA-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Randall Hauch updated KAFKA-7723: - Labels: needs-kip (was: ) > Kafka Connect support override worker kafka api configuration with connector > configuration that post by rest api > > > Key: KAFKA-7723 > URL: https://issues.apache.org/jira/browse/KAFKA-7723 > Project: Kafka > Issue Type: Improvement > Components: KafkaConnect >Reporter: laomei >Priority: Minor > Labels: needs-kip > > I'm using kafka sink connect; "auto.offset.reset" is set in > connect-distributed*.properties; > It works for all connector which in one worker; So the consumer will poll > records from latest or earliest; I can not control the auto.offset.reset in > connector configs post with rest api; > So I think is necessary to override worker kafka api configs with connector > configs; > Like this > {code:java} > { > "name": "test", > "config": { > "consumer.auto.offset.reset": "latest", > "consumer.xxx" > "connector.class": "com.laomei.sis.solr.SolrConnector", > "tasks.max": "1", > "poll.interval.ms": "100", > "connect.timeout.ms": "6", > "topics": "test" > } > } > {code} > We can override kafka consumer auto offset reset in sink connector; -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (KAFKA-7898) ERROR Caught unexpected throwable (org.apache.zookeeper.ClientCnxn)
[ https://issues.apache.org/jira/browse/KAFKA-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793567#comment-16793567 ] Steven McDonald edited comment on KAFKA-7898 at 3/15/19 11:54 AM: -- Hi! We encountered this as well and I did some investigation into the problem. It is a bug in Kafka 2.1.x that is fixed in 2.2.x (though not explicitly; the fix is the result of [a refactor|https://github.com/apache/kafka/commit/2155c6d54b087206b6aa1d58747f141761394eaf#diff-8bcd2c427556f434e33cf22abec548c2R217]). The underlying problem is with the Zookeeper client library's MultiCallback interface. The [documentation|https://zookeeper.apache.org/doc/r3.4.13/api/org/apache/zookeeper/AsyncCallback.MultiCallback.html] for this says that "all opResults are OpResult.ErrorResult", but [some error conditions|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxn.java#L689] will pass the callback a null pointer in place of a list. Kafka 2.1.x is implemented according to the documentation, so the null pointer case is not handled, leading to this bug. I have [raised this issue|https://issues.apache.org/jira/projects/ZOOKEEPER/issues/ZOOKEEPER-3314] with Zookeeper. I also consider it a bug that this NullPointerException leaves the Kafka cluster in a state that it does not recover from automatically. In our case, this bug was hit during a controller election, resulting in a node that was designated as controller but unable to function as such. It would be sufficient for this exception to simply kill the Kafka node so that the remaining nodes can recover, but I think that is a separate bug (which I will raise with Zookeeper first, as the exception is currently caught there). I can provide additional information on our experience if it's of any interest, but since this is already fixed in Kafka 2.2.x I don't see much point expanding here. was (Author: steven-usabilla): Hi! 
We encountered this as well and I did some investigation into the problem. It is a bug in Kafka 2.1.x that is fixed in 2.2.x (though not explicitly; the fix is the result of [a refactor|https://github.com/apache/kafka/commit/2155c6d54b087206b6aa1d58747f141761394eaf#diff-8bcd2c427556f434e33cf22abec548c2R217]). The underlying problem is with the Zookeeper client library's MultiCallback interface. The [documentation|https://zookeeper.apache.org/doc/r3.4.13/api/org/apache/zookeeper/AsyncCallback.MultiCallback.html] for this says that "all opResults are OpResult.ErrorResult", but [some error conditions|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxn.java#L689] will pass the callback a null pointer in place of a list. Kafka 2.1.x is implemented according to the documentation, so the null pointer case is not handled, leading to this bug. I also consider it a bug that this NullPointerException leaves the Kafka cluster in a state that it does not recover from automatically. In our case, this bug was hit during a controller election, resulting in a node that was designated as controller but unable to function as such. It would be sufficient for this exception to simply kill the Kafka node so that the remaining nodes can recover, but I think that is a separate bug (which I will raise with Zookeeper first, as the exception is currently caught there). I can provide additional information on our experience if it's of any interest, but since this is already fixed in Kafka 2.2.x I don't see much point expanding here. > ERROR Caught unexpected throwable (org.apache.zookeeper.ClientCnxn) > --- > > Key: KAFKA-7898 > URL: https://issues.apache.org/jira/browse/KAFKA-7898 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Gabriel Lukacs >Priority: Major > > We observed a NullPointerException on one of our broker in 3 broker cluster > environment. 
If I list the processes and open ports it seems that the faulty > broker is running, but the kafka-connect (we used it also) periodically > restarts due to fact that it can not connect to the kafka cluster (configured > ssl & plaintext mode too). Is it a bug in kafka/zookeeper? > > [2019-02-05 14:28:11,359] WARN Client session timed out, have not heard from > server in 4141ms for sessionid 0x310166e > (org.apache.zookeeper.ClientCnxn) > [2019-02-05 14:28:12,525] ERROR Caught unexpected throwable > (org.apache.zookeeper.ClientCnxn) > java.lang.NullPointerException > at > kafka.zookeeper.ZooKeeperClient$$anon$8.processResult(ZooKeeperClient.scala:217) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:633) > at
[jira] [Commented] (KAFKA-7898) ERROR Caught unexpected throwable (org.apache.zookeeper.ClientCnxn)
[ https://issues.apache.org/jira/browse/KAFKA-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793567#comment-16793567 ] Steven McDonald commented on KAFKA-7898: Hi! We encountered this as well and I did some investigation into the problem. It is a bug in Kafka 2.1.x that is fixed in 2.2.x (though not explicitly; the fix is the result of [a refactor|https://github.com/apache/kafka/commit/2155c6d54b087206b6aa1d58747f141761394eaf#diff-8bcd2c427556f434e33cf22abec548c2R217]). The underlying problem is with the Zookeeper client library's MultiCallback interface. The [documentation|https://zookeeper.apache.org/doc/r3.4.13/api/org/apache/zookeeper/AsyncCallback.MultiCallback.html] for this says that "all opResults are OpResult.ErrorResult", but [some error conditions|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxn.java#L689] will pass the callback a null pointer in place of a list. Kafka 2.1.x is implemented according to the documentation, so the null pointer case is not handled, leading to this bug. I also consider it a bug that this NullPointerException leaves the Kafka cluster in a state that it does not recover from automatically. In our case, this bug was hit during a controller election, resulting in a node that was designated as controller but unable to function as such. It would be sufficient for this exception to simply kill the Kafka node so that the remaining nodes can recover, but I think that is a separate bug (which I will raise with Zookeeper first, as the exception is currently caught there). I can provide additional information on our experience if it's of any interest, but since this is already fixed in Kafka 2.2.x I don't see much point expanding here. 
> ERROR Caught unexpected throwable (org.apache.zookeeper.ClientCnxn) > --- > > Key: KAFKA-7898 > URL: https://issues.apache.org/jira/browse/KAFKA-7898 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Gabriel Lukacs >Priority: Major > > We observed a NullPointerException on one of our broker in 3 broker cluster > environment. If I list the processes and open ports it seems that the faulty > broker is running, but the kafka-connect (we used it also) periodically > restarts due to fact that it can not connect to the kafka cluster (configured > ssl & plaintext mode too). Is it a bug in kafka/zookeeper? > > [2019-02-05 14:28:11,359] WARN Client session timed out, have not heard from > server in 4141ms for sessionid 0x310166e > (org.apache.zookeeper.ClientCnxn) > [2019-02-05 14:28:12,525] ERROR Caught unexpected throwable > (org.apache.zookeeper.ClientCnxn) > java.lang.NullPointerException > at > kafka.zookeeper.ZooKeeperClient$$anon$8.processResult(ZooKeeperClient.scala:217) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:633) > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508) > [2019-02-05 14:28:12,526] ERROR Caught unexpected throwable > (org.apache.zookeeper.ClientCnxn) > [2019-02-05 14:28:22,701] WARN Client session timed out, have not heard from > server in 4004ms for sessionid 0x310166e > (org.apache.zookeeper.ClientCnxn) > [2019-02-05 14:28:28,670] WARN Client session timed out, have not heard from > server in 4049ms for sessionid 0x310166e > (org.apache.zookeeper.ClientCnxn) > [2019-02-05 15:05:20,601] WARN [GroupCoordinator 1]: Failed to write empty > metadata for group > encodable-emvTokenAccess-delta-encoder-group-emvIssuerAccess-v2-2-0: The > group is rebalancing, so a rejoin is needed. > (kafka.coordinator.group.GroupCoordinator) > kafka 7381 1 0 14:22 ? 
00:00:19 java -Xmx512M -Xms512M -server -XX:+UseG1GC > -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 > -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true > -Xloggc:/opt/kafka/bin/../logs/zookeeper-gc.log -verbose:gc > -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps > -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M > -Dcom.sun.management.jmxremote > -Dcom.sun.management.jmxremote.authenticate=false > -Dcom.sun.management.jmxremote.ssl=false > -Dkafka.logs.dir=/opt/kafka/bin/../logs > -Dlog4j.configuration=file:/opt/kafka/config/zoo-log4j.properties -cp >
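The defensive handling that the 2.2.x refactor effectively provides can be sketched in isolation: when the multi-op callback is handed a null result list (as happens for connection-loss-style errors, contrary to the MultiCallback documentation), expand it into one error per submitted op instead of dereferencing the list. The types below are simplified stand-ins for org.apache.zookeeper.OpResult.ErrorResult and KeeperException.Code, so this compiles without the ZooKeeper jar:

```java
import java.util.Collections;
import java.util.List;

public class MultiResultDemo {
    // Simplified stand-in for OpResult.ErrorResult: just the result code.
    record OpError(int rc) {}

    // The MultiCallback contract documents "all opResults are ErrorResult"
    // on failure, but some error paths deliver null instead of a list.
    // Defensive normalization: treat null as "every op failed with the
    // overall error code" rather than hitting a NullPointerException.
    static List<OpError> normalize(int overallRc, int opCount, List<OpError> opResults) {
        if (opResults != null) return opResults;
        return Collections.nCopies(opCount, new OpError(overallRc));
    }

    public static void main(String[] args) {
        // rc = -4 is ZooKeeper's CONNECTIONLOSS code; three ops were submitted.
        List<OpError> results = normalize(-4, 3, null);
        System.out.println(results.size()); // prints 3
    }
}
```

Kafka 2.1.x implemented the callback strictly per the documented contract, which is exactly why the undocumented null case surfaced as the NullPointerException in ZooKeeperClient.scala:217.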
[jira] [Created] (KAFKA-8109) Consumer with isolation level as 'read_committed' are getting stuck for few partitions
Love Singh created KAFKA-8109: - Summary: Consumer with isolation level as 'read_committed' are getting stuck for few partitions Key: KAFKA-8109 URL: https://issues.apache.org/jira/browse/KAFKA-8109 Project: Kafka Issue Type: Bug Affects Versions: 1.0.0 Reporter: Love Singh Hello, consumers with the isolation level set to 'read_committed' are getting stuck on a few partitions in a topic; for the others they work fine. Upon examination we found that the LSO (last stable offset) lag for those topic-partitions is more than 25K (JMX metric: LastStableOffsetLag). In read_committed mode we can read any offset from these topic-partitions before the LSO, but consumers get stuck when they reach the LSO. READ_UNCOMMITTED mode works fine. We have seen the following error repeatedly in our log for those partitions: _"Found no record of producerId on the broker. It is possible that the last message with the producerId has been removed due to hitting the retention limit."_ All the producers are transactional. We are not sure what else to check here; can someone please have a look. Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
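For context, the isolation level in question is an ordinary consumer setting: with read_committed the consumer only fetches up to the last stable offset, which is why a pinned LSO stalls it, while read_uncommitted reads on to the high watermark. A minimal properties sketch, with placeholder broker address and group id:

```java
import java.util.Properties;

public class ReadCommittedConfig {
    // Bootstrap/group values are placeholders; only isolation.level matters here.
    static Properties consumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "example-group");           // placeholder
        // read_committed: fetch only up to the LSO, skipping aborted records.
        // read_uncommitted (the default): fetch up to the high watermark.
        props.put("isolation.level", "read_committed");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps().getProperty("isolation.level")); // prints read_committed
    }
}
```

These properties would be passed to a KafkaConsumer constructor; the stall described in the report is independent of the client code, since it is the broker-side LSO that stops advancing.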
[jira] [Commented] (KAFKA-8091) Flaky test DynamicBrokerReconfigurationTest#testAddRemoveSaslListener
[ https://issues.apache.org/jira/browse/KAFKA-8091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793473#comment-16793473 ] ASF GitHub Bot commented on KAFKA-8091: --- rajinisivaram commented on pull request #6450: KAFKA-8091; Use commitSync to check connection failure in listener update test URL: https://github.com/apache/kafka/pull/6450 ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Flaky test DynamicBrokerReconfigurationTest#testAddRemoveSaslListener > --- > > Key: KAFKA-8091 > URL: https://issues.apache.org/jira/browse/KAFKA-8091 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 2.2.0 >Reporter: Rajini Sivaram >Assignee: Rajini Sivaram >Priority: Critical > Fix For: 2.3.0, 2.2.1 > > > See KAFKA-6824 for details. Since the SSL version of the test is currently > skipped using @Ignore, fixing this for SASL first and wait for that to be > stable before re-enabling SSL tests under KAFKA-6824. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7697) Possible deadlock in kafka.cluster.Partition
[ https://issues.apache.org/jira/browse/KAFKA-7697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793469#comment-16793469 ] Ankit Singhal commented on KAFKA-7697: -- We also hit the same issue and had to restart the broker almost every 6 hours! [~jnadler], what is the issue with 2.1.1? We are planning to move to this version. [~rsivaram], shall we move to 2.0.1, since 2.1.1 was just released and we might hit other issues? 2.0.1 seems pretty stable! > Possible deadlock in kafka.cluster.Partition > > > Key: KAFKA-7697 > URL: https://issues.apache.org/jira/browse/KAFKA-7697 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Gian Merlino >Assignee: Rajini Sivaram >Priority: Blocker > Fix For: 2.2.0, 2.1.1 > > Attachments: threaddump.txt > > > After upgrading a fairly busy broker from 0.10.2.0 to 2.1.0, it locked up > within a few minutes (by "locked up" I mean that all request handler threads > were busy, and other brokers reported that they couldn't communicate with > it). I restarted it a few times and it did the same thing each time. After > downgrading to 0.10.2.0, the broker was stable. I attached a thread dump from > the last attempt on 2.1.0 that shows lots of kafka-request-handler- threads > trying to acquire the leaderIsrUpdateLock lock in kafka.cluster.Partition. > It jumps out that there are two threads that already have some read lock > (can't tell which one) and are trying to acquire a second one (on two > different read locks: 0x000708184b88 and 0x00070821f188): > kafka-request-handler-1 and kafka-request-handler-4. Both are handling a > produce request, and in the process of doing so, are calling > Partition.fetchOffsetSnapshot while trying to complete a DelayedFetch. At the > same time, both of those locks have writers from other threads waiting on > them (kafka-request-handler-2 and kafka-scheduler-6). 
Neither of those locks > appear to have writers that hold them (if only because no threads in the dump > are deep enough in inWriteLock to indicate that). > ReentrantReadWriteLock in nonfair mode prioritizes waiting writers over > readers. Is it possible that kafka-request-handler-1 and > kafka-request-handler-4 are each trying to read-lock the partition that is > currently locked by the other one, and they're both parked waiting for > kafka-request-handler-2 and kafka-scheduler-6 to get write locks, which they > never will, because the former two threads own read locks and aren't giving > them up? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
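The writer-priority behavior Gian describes is easy to reproduce in isolation: once a writer is parked at the head of a non-fair ReentrantReadWriteLock's wait queue, a new (non-reentrant) read acquisition blocks behind it. A self-contained sketch of that building block of the suspected deadlock (the thread-name comments map only loosely onto the thread dump):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class QueuedWriterBlocksNewReaders {
    // Returns whether a fresh reader can share the read lock while a writer
    // is parked at the head of the lock's wait queue.
    static boolean newReaderAcquires() throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(); // non-fair, like leaderIsrUpdateLock
        CountDownLatch holding = new CountDownLatch(1);

        // A thread takes the read lock and keeps it (stands in for the request
        // handler that owns the other partition's read lock in the scenario).
        Thread reader = new Thread(() -> {
            lock.readLock().lock();
            holding.countDown();
            try { Thread.sleep(60_000); } catch (InterruptedException ignored) {}
        });
        reader.setDaemon(true);
        reader.start();
        holding.await();

        // A writer queues behind the held read lock (like kafka-scheduler-6
        // or kafka-request-handler-2 waiting in inWriteLock).
        Thread writer = new Thread(() -> lock.writeLock().lock());
        writer.setDaemon(true);
        writer.start();
        Thread.sleep(200); // let the writer park at the head of the queue

        // Even in non-fair mode, a new read acquisition blocks once the queue
        // head is a writer, so this times out rather than sharing the lock.
        return lock.readLock().tryLock(500, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("new reader acquired: " + newReaderAcquires()); // prints false
    }
}
```

With two such locks and two threads each holding one lock's read side while waiting behind a queued writer on the other, neither reader can proceed and neither writer can acquire, which is exactly the cycle hypothesized above.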
[jira] [Commented] (KAFKA-6178) Broker is listed as only ISR for all partitions it is leader of
[ https://issues.apache.org/jira/browse/KAFKA-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793461#comment-16793461 ] Narayan Periwal commented on KAFKA-6178: We are also seeing the same issue in our kafka cluster. We are using the version 0.10.2.1 > Broker is listed as only ISR for all partitions it is leader of > --- > > Key: KAFKA-6178 > URL: https://issues.apache.org/jira/browse/KAFKA-6178 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.1.0 > Environment: Windows >Reporter: AS >Priority: Major > Labels: windows > Attachments: KafkaServiceOutput.txt, log-cleaner.log, server.log > > > We're running a 15 broker cluster on windows machines, and one of the > brokers, 10, is the only ISR on all partitions that it is the leader of. On > partitions where it isn't the leader, it seems to follow the leadeer fine. > This is an excerpt from 'describe': > Topic: ClientQosCombined Partition: 458 Leader: 10 Replicas: > 10,6,7,8,9,0,1 Isr: 10 > Topic: ClientQosCombined Partition: 459 Leader: 11 Replicas: > 11,7,8,9,0,1,10 Isr: 0,10,1,9,7,11,8 > The server.log files all seem to be pretty standard, and the only indication > of this issue is the following pattern that often repeats: > 2017-11-06 20:28:25,207 [INFO] kafka.cluster.Partition > [kafka-request-handler-8:] - Partition [ClientQosCombined,398] on broker 10: > Expanding ISR for partition [ClientQosCombined,398] from 10 to 5,10 > 2017-11-06 20:28:39,382 [INFO] kafka.cluster.Partition [kafka-scheduler-1:] - > Partition [ClientQosCombined,398] on broker 10: Shrinking ISR for partition > [ClientQosCombined,398] from 5,10 to 10 > For each of the partitions that 10 leads. This is the only topic that we > currently have in our cluster. The __consumer_offsets topic seems completely > normal in terms of isr counts. The controller is broker 5, which is cycling > through attempting and failing to trigger leader elections on broker 10 led > partitions. 
From the controller log in broker 5: > 2017-11-06 20:45:04,857 [INFO] kafka.controller.KafkaController > [kafka-scheduler-0:] - [Controller 5]: Starting preferred replica leader > election for partitions [ClientQosCombined,375] > 2017-11-06 20:45:04,857 [INFO] kafka.controller.PartitionStateMachine > [kafka-scheduler-0:] - [Partition state machine on Controller 5]: Invoking > state change to OnlinePartition for partitions [ClientQosCombined,375] > 2017-11-06 20:45:04,857 [INFO] > kafka.controller.PreferredReplicaPartitionLeaderSelector [kafka-scheduler-0:] > - [PreferredReplicaPartitionLeaderSelector]: Current leader 10 for partition > [ClientQosCombined,375] is not the preferred replica. Trigerring preferred > replica leader election > 2017-11-06 20:45:04,857 [WARN] kafka.controller.KafkaController > [kafka-scheduler-0:] - [Controller 5]: Partition [ClientQosCombined,375] > failed to complete preferred replica leader election. Leader is 10 > I've also attached the logs and output from broker 10. Any idea what's wrong > here? -- This message was sent by Atlassian JIRA (v7.6.3#76005)