[jira] [Commented] (KAFKA-8113) Flaky Test ListOffsetsRequestTest#testResponseIncludesLeaderEpoch
[ https://issues.apache.org/jira/browse/KAFKA-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16861750#comment-16861750 ]

Boyang Chen commented on KAFKA-8113:

Failed again: [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/5459/console]

> Flaky Test ListOffsetsRequestTest#testResponseIncludesLeaderEpoch
> -
>
> Key: KAFKA-8113
> URL: https://issues.apache.org/jira/browse/KAFKA-8113
> Project: Kafka
> Issue Type: Bug
> Components: core, unit tests
> Affects Versions: 2.3.0
> Reporter: Matthias J. Sax
> Priority: Critical
> Labels: flaky-test
> Fix For: 2.4.0
>
> [https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk8/detail/kafka-trunk-jdk8/3468/tests]
> {quote}java.lang.AssertionError
> at org.junit.Assert.fail(Assert.java:87)
> at org.junit.Assert.assertTrue(Assert.java:42)
> at org.junit.Assert.assertTrue(Assert.java:53)
> at kafka.server.ListOffsetsRequestTest.fetchOffsetAndEpoch$1(ListOffsetsRequestTest.scala:136)
> at kafka.server.ListOffsetsRequestTest.testResponseIncludesLeaderEpoch(ListOffsetsRequestTest.scala:151){quote}
> STDOUT
> {quote}[2019-03-15 17:16:13,029] ERROR [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Error for partition topic-0 at offset 0 (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
> [2019-03-15 17:16:13,231] ERROR [KafkaApi-0] Error while responding to offset request (kafka.server.KafkaApis:76)
> org.apache.kafka.common.errors.ReplicaNotAvailableException: Partition topic-0 is not available{quote}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (KAFKA-8525) Make log in Partion non-optional
[ https://issues.apache.org/jira/browse/KAFKA-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vikas Singh reassigned KAFKA-8525:
--
Assignee: Vikas Singh

> Make log in Partion non-optional
>
> Key: KAFKA-8525
> URL: https://issues.apache.org/jira/browse/KAFKA-8525
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 2.3.0
> Reporter: Vikas Singh
> Assignee: Vikas Singh
> Priority: Minor
>
> While moving the log out of Replica into Partition as part of KAFKA-8457, a bunch of code was cleaned up by removing checks like "if (!localReplica) throw". A couple of additional cleanups are still possible:
> # The log object in Partition can be made non-optional, since it doesn't make sense to have a partition without a log. Here is a comment on the PR for KAFKA-8457: {{I think it shouldn't be possible to have a Partition without a corresponding Log. Once this is merged, I think we can look into whether we can replace the optional log field in this class with a concrete instance.}}
> # The LocalReplica class can be removed, simplifying the Replica class. Here is another comment on the PR: {{it might be possible to turn Replica into a trait and then let Log implement it directly. Then we could get rid of LocalReplica. That would also help us clean up RemoteReplica, since the usage of LogOffsetMetadata only makes sense for the local replica.}}
> Creating this JIRA to track these refactoring tasks for the future.
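The refactoring the PR comments describe can be sketched as follows. This is a hypothetical illustration with simplified stand-in types, not the real Kafka classes: Replica becomes an interface that Log implements directly (eliminating LocalReplica), and Partition holds a concrete, non-optional Log.

```java
import java.util.ArrayList;
import java.util.List;

// Replica as a trait/interface that Log can implement directly.
interface Replica {
    long logEndOffset();
}

// The local replica IS the log, so no separate LocalReplica wrapper is needed.
class Log implements Replica {
    private final List<String> records = new ArrayList<>();

    public void append(String record) {
        records.add(record);
    }

    @Override
    public long logEndOffset() {
        return records.size();
    }
}

class Partition {
    // Non-optional: a Partition cannot exist without its Log.
    private final Log log;

    Partition(Log log) {
        if (log == null) {
            throw new IllegalArgumentException("Partition requires a Log");
        }
        this.log = log;
    }

    // No Option/if-present handling needed at call sites.
    public long localLogEndOffset() {
        return log.logEndOffset();
    }
}
```

With the log non-optional, every "is there a local replica?" branch collapses into a plain method call.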
[jira] [Commented] (KAFKA-8262) Flaky Test MetricsIntegrationTest#testStreamMetric
[ https://issues.apache.org/jira/browse/KAFKA-8262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16861639#comment-16861639 ]

Boyang Chen commented on KAFKA-8262:

Failed again: [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/22647/console]

> Flaky Test MetricsIntegrationTest#testStreamMetric
> --
>
> Key: KAFKA-8262
> URL: https://issues.apache.org/jira/browse/KAFKA-8262
> Project: Kafka
> Issue Type: Bug
> Components: streams, unit tests
> Affects Versions: 2.3.0
> Reporter: Matthias J. Sax
> Assignee: Bruno Cadonna
> Priority: Major
> Labels: flaky-test
> Fix For: 2.4.0
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/3900/testReport/junit/org.apache.kafka.streams.integration/MetricsIntegrationTest/testStreamMetric/]
> {quote}java.lang.AssertionError: Condition not met within timeout 1. testTaskMetric -> Size of metrics of type:'commit-latency-avg' must be equal to:5 but it's equal to 0 expected:<5> but was:<0>
> at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:361)
> at org.apache.kafka.streams.integration.MetricsIntegrationTest.testStreamMetric(MetricsIntegrationTest.java:228){quote}
[jira] [Created] (KAFKA-8527) add dynamic maintenance broker config
xiongqi wu created KAFKA-8527:
-
Summary: add dynamic maintenance broker config
Key: KAFKA-8527
URL: https://issues.apache.org/jira/browse/KAFKA-8527
Project: Kafka
Issue Type: Improvement
Reporter: xiongqi wu
Assignee: xiongqi wu

Before we remove a broker for maintenance, we want to move all partitions off the broker first to avoid introducing new Under Replicated Partitions (URPs), because shutting down (or killing) a broker that still hosts live partitions temporarily reduces the replica count of those partitions. Moving partitions off a broker can be done via partition reassignment. However, during the partition reassignment process, new topics can be created by Kafka, and thereby new partitions can be added to the broker that is pending removal. As a result, the removal process needs to recursively move new topic partitions off the maintenance broker. In a production environment where topic creation is frequent and URPs caused by broker removal cannot be tolerated, the removal process can take multiple iterations to complete the partition reassignment. We want to provide a mechanism to mark a broker as a maintenance broker (via a cluster-level dynamic configuration). One action Kafka can take for a maintenance broker is to not assign new topic partitions to it, thereby facilitating the broker's removal.
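The proposed "do not place new partitions on a maintenance broker" behavior could look roughly like the sketch below. All names here are hypothetical stand-ins, not Kafka's actual assignment code: replica assignment for a newly created topic simply filters out brokers flagged as in maintenance before choosing replicas.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class MaintenanceAwareAssigner {
    // Choose replicationFactor brokers for a new partition, never placing
    // replicas on brokers marked for maintenance (the proposed dynamic config).
    public static List<Integer> assignReplicas(List<Integer> liveBrokers,
                                               Set<Integer> maintenanceBrokers,
                                               int replicationFactor) {
        List<Integer> eligible = new ArrayList<>();
        for (int broker : liveBrokers) {
            if (!maintenanceBrokers.contains(broker)) {
                eligible.add(broker);  // only non-maintenance brokers are candidates
            }
        }
        if (eligible.size() < replicationFactor) {
            throw new IllegalStateException("Not enough eligible brokers for replication factor "
                    + replicationFactor);
        }
        // Real assignment would spread replicas (rack-aware, round-robin, etc.);
        // taking a prefix keeps the sketch minimal.
        return new ArrayList<>(eligible.subList(0, replicationFactor));
    }
}
```

With such a filter in place, the drain of an existing broker converges: no new partitions land on it while reassignment moves the old ones off.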
[jira] [Created] (KAFKA-8526) Broker may select a failed dir for new replica even in the presence of other live dirs
Anna Povzner created KAFKA-8526:
---
Summary: Broker may select a failed dir for new replica even in the presence of other live dirs
Key: KAFKA-8526
URL: https://issues.apache.org/jira/browse/KAFKA-8526
Project: Kafka
Issue Type: Bug
Affects Versions: 2.2.1, 2.1.1, 2.0.1, 1.1.1, 2.3.0
Reporter: Anna Povzner

Suppose a broker is configured with multiple log dirs. One of the log dirs fails, but there is no load on that dir, so the broker does not know about the failure yet, _i.e._, the failed dir is still in LogManager#_liveLogDirs. Suppose a new topic gets created, and the controller chooses the broker with the failed log dir to host one of the replicas. The broker gets a LeaderAndIsr request with the isNew flag set. LogManager#getOrCreateLog() selects a log dir for the new replica from _liveLogDirs, and then one of two things can happen:
1) getAbsolutePath can fail, in which case getOrCreateLog will throw an IOException
2) Creating the directory for the new replica's log may fail (_e.g._, if the directory becomes read-only after getAbsolutePath worked).
In both cases, the selected dir will be marked offline (which is correct). However, LeaderAndIsr will return an error and the replica will be marked offline, even though the broker may have other live dirs.

*Proposed solution*: The broker should retry selecting a dir for the new replica if the initially selected dir threw an IOException when trying to create a directory for the new replica. We should be able to do that in the LogManager#getOrCreateLog() method, but keep in mind that logDirFailureChannel.maybeAddOfflineLogDir does not synchronously remove the dir from _liveLogDirs. So, it makes sense to select the initial dir by calling LogManager#nextLogDir (current implementation), but if we fail to create the log on that dir, one approach is to select the next dir from _liveLogDirs in round-robin fashion (until we get back to the initial log dir – the case where all dirs failed).
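The proposed retry loop can be sketched as below. This is a hypothetical, self-contained illustration (method and parameter names are not LogManager's): pick an initial dir, and if creating the replica's log directory fails, try the remaining live dirs round-robin before failing the replica.

```java
import java.util.List;
import java.util.function.Predicate;

class DirSelector {
    // tryCreate stands in for creating the replica's log directory on a given
    // dir; it returns false when the dir has gone bad but has not yet been
    // removed from the live list (the race described in the report).
    public static String getOrCreateLogDir(List<String> liveDirs, Predicate<String> tryCreate) {
        int start = 0;  // in Kafka the initial choice would come from LogManager#nextLogDir
        for (int i = 0; i < liveDirs.size(); i++) {
            String dir = liveDirs.get((start + i) % liveDirs.size());
            if (tryCreate.test(dir)) {
                return dir;  // success: host the new replica here
            }
            // Otherwise: the dir would be marked offline, and we fall through
            // to the next round-robin candidate instead of failing immediately.
        }
        // Only when we wrap back around to the start have all dirs failed.
        throw new RuntimeException("All log dirs failed");
    }
}
```

The key point is that the replica is marked offline only after every live dir has been tried, not after the first unlucky selection.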
[jira] [Updated] (KAFKA-8305) AdminClient should support creating topics with default partitions and replication factor
[ https://issues.apache.org/jira/browse/KAFKA-8305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias J. Sax updated KAFKA-8305:
---
Labels: kip (was: )

> AdminClient should support creating topics with default partitions and replication factor
> -
>
> Key: KAFKA-8305
> URL: https://issues.apache.org/jira/browse/KAFKA-8305
> Project: Kafka
> Issue Type: Improvement
> Components: admin
> Reporter: Almog Gavra
> Assignee: Almog Gavra
> Priority: Major
> Labels: kip
> Fix For: 2.4.0
>
> Today, the AdminClient creates topics by requiring a `NewTopic` object, which must contain either partitions and replicas or an exact broker mapping (which then infers partitions and replicas). Some users, however, could benefit from just using the cluster default for replication factor but may not want to use auto topic creation.
> NOTE: I am planning on working on this, but I do not have permissions to assign this ticket to myself.
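One shape the improvement could take is making partition count and replication factor optional, resolving to broker-side defaults when absent. The sketch below mimics that idea with a self-contained stand-in class (it is not the real NewTopic API; the names and the resolution step are illustrative assumptions).

```java
import java.util.Optional;

// Hypothetical topic spec: absent values fall back to the cluster defaults
// (conceptually, the broker's num.partitions and default.replication.factor).
class TopicSpec {
    final String name;
    final Optional<Integer> partitions;
    final Optional<Short> replicationFactor;

    TopicSpec(String name, Optional<Integer> partitions, Optional<Short> replicationFactor) {
        this.name = name;
        this.partitions = partitions;
        this.replicationFactor = replicationFactor;
    }

    // Resolution would happen broker-side; shown here for illustration.
    int resolvedPartitions(int clusterDefault) {
        return partitions.orElse(clusterDefault);
    }

    short resolvedReplicationFactor(short clusterDefault) {
        return replicationFactor.orElse(clusterDefault);
    }
}
```

This lets a client say "create topic t, replication factor per cluster default" without enabling auto topic creation.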
[jira] [Commented] (KAFKA-8456) Flaky Test StoreUpgradeIntegrationTest#shouldMigratePersistentWindowStoreToTimestampedWindowStoreUsingPapi
[ https://issues.apache.org/jira/browse/KAFKA-8456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16861525#comment-16861525 ]

Boyang Chen commented on KAFKA-8456:

[~mjsax] Any tip for debugging this?

> Flaky Test StoreUpgradeIntegrationTest#shouldMigratePersistentWindowStoreToTimestampedWindowStoreUsingPapi
> ---
>
> Key: KAFKA-8456
> URL: https://issues.apache.org/jira/browse/KAFKA-8456
> Project: Kafka
> Issue Type: Bug
> Components: streams, unit tests
> Reporter: Boyang Chen
> Priority: Major
>
> [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/22331/console]
> *01:20:07* org.apache.kafka.streams.integration.StoreUpgradeIntegrationTest.shouldMigratePersistentWindowStoreToTimestampedWindowStoreUsingPapi failed, log available in /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk8-scala2.11/streams/build/reports/testOutput/org.apache.kafka.streams.integration.StoreUpgradeIntegrationTest.shouldMigratePersistentWindowStoreToTimestampedWindowStoreUsingPapi.test.stdout
> *01:20:07* org.apache.kafka.streams.integration.StoreUpgradeIntegrationTest > shouldMigratePersistentWindowStoreToTimestampedWindowStoreUsingPapi FAILED
> *01:20:07* java.lang.AssertionError: Condition not met within timeout 15000. Could not get expected result in time.
> *01:20:07* at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:375)
> *01:20:07* at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:335)
> *01:20:07* at org.apache.kafka.streams.integration.StoreUpgradeIntegrationTest.verifyWindowedCountWithTimestamp(StoreUpgradeIntegrationTest.java:830)
> *01:20:07* at org.apache.kafka.streams.integration.StoreUpgradeIntegrationTest.shouldMigrateWindowStoreToTimestampedWindowStoreUsingPapi(StoreUpgradeIntegrationTest.java:573)
> *01:20:07* at org.apache.kafka.streams.integration.StoreUpgradeIntegrationTest.shouldMigratePersistentWindowStoreToTimestampedWindowStoreUsingPapi(StoreUpgradeIntegrationTest.java:517)
[jira] [Commented] (KAFKA-8391) Flaky Test RebalanceSourceConnectorsIntegrationTest#testDeleteConnector
[ https://issues.apache.org/jira/browse/KAFKA-8391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16861519#comment-16861519 ]

Boyang Chen commented on KAFKA-8391:

Failed again: [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/5415/console]

> Flaky Test RebalanceSourceConnectorsIntegrationTest#testDeleteConnector
> ---
>
> Key: KAFKA-8391
> URL: https://issues.apache.org/jira/browse/KAFKA-8391
> Project: Kafka
> Issue Type: Bug
> Components: KafkaConnect
> Affects Versions: 2.3.0
> Reporter: Matthias J. Sax
> Priority: Critical
> Labels: flaky-test
> Fix For: 2.4.0
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/4747/testReport/junit/org.apache.kafka.connect.integration/RebalanceSourceConnectorsIntegrationTest/testDeleteConnector/]
> {quote}java.lang.AssertionError: Condition not met within timeout 3. Connector tasks did not stop in time.
> at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:375)
> at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:352)
> at org.apache.kafka.connect.integration.RebalanceSourceConnectorsIntegrationTest.testDeleteConnector(RebalanceSourceConnectorsIntegrationTest.java:166){quote}
[jira] [Created] (KAFKA-8525) Make log in Partion non-optional
Vikas Singh created KAFKA-8525:
--
Summary: Make log in Partion non-optional
Key: KAFKA-8525
URL: https://issues.apache.org/jira/browse/KAFKA-8525
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 2.3.0
Reporter: Vikas Singh
[jira] [Commented] (KAFKA-8456) Flaky Test StoreUpgradeIntegrationTest#shouldMigratePersistentWindowStoreToTimestampedWindowStoreUsingPapi
[ https://issues.apache.org/jira/browse/KAFKA-8456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16861337#comment-16861337 ]

Matthias J. Sax commented on KAFKA-8456:

One more: [https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk8/detail/kafka-trunk-jdk8/3716/tests]
[jira] [Updated] (KAFKA-4893) async topic deletion conflicts with max topic length
[ https://issues.apache.org/jira/browse/KAFKA-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin P. McCabe updated KAFKA-4893:
---
Fix Version/s: 2.1.2

> async topic deletion conflicts with max topic length
>
> Key: KAFKA-4893
> URL: https://issues.apache.org/jira/browse/KAFKA-4893
> Project: Kafka
> Issue Type: Bug
> Reporter: Onur Karaman
> Assignee: Colin P. McCabe
> Priority: Major
> Fix For: 2.3.0, 2.1.2, 2.2.2
>
> As per the [documentation|http://kafka.apache.org/documentation/#basic_ops_add_topic], topics can be only 249 characters long to line up with typical filesystem limitations:
> {quote}
> Each sharded partition log is placed into its own folder under the Kafka log directory. The name of such folders consists of the topic name, appended by a dash (\-) and the partition id. Since a typical folder name can not be over 255 characters long, there will be a limitation on the length of topic names. We assume the number of partitions will not ever be above 100,000. Therefore, topic names cannot be longer than 249 characters. This leaves just enough room in the folder name for a dash and a potentially 5 digit long partition id.
> {quote}
> {{kafka.common.Topic.maxNameLength}} is set to 249 and is used during validation.
> This limit ends up not being quite right, since topic deletion renames the directory to the form {{topic-partition.uniqueId-delete}}, as can be seen in {{LogManager.asyncDelete}}:
> {code}
> val dirName = new StringBuilder(removedLog.name)
>   .append(".")
>   .append(java.util.UUID.randomUUID.toString.replaceAll("-",""))
>   .append(Log.DeleteDirSuffix)
>   .toString()
> {code}
> So the unique id and "-delete" suffix end up hogging some of the characters.
> Deleting a long-named topic results in a log message such as the following:
> {code}
> kafka.common.KafkaStorageException: Failed to rename log directory from /tmp/kafka-logs0/0-0 to /tmp/kafka-logs0/0-0.797bba3fb2464729840f87769243edbb-delete
> at kafka.log.LogManager.asyncDelete(LogManager.scala:439)
> at kafka.cluster.Partition$$anonfun$delete$1.apply$mcV$sp(Partition.scala:142)
> at kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:137)
> at kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:137)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:213)
> at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:221)
> at kafka.cluster.Partition.delete(Partition.scala:137)
> at kafka.server.ReplicaManager.stopReplica(ReplicaManager.scala:230)
> at kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:260)
> at kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:259)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at kafka.server.ReplicaManager.stopReplicas(ReplicaManager.scala:259)
> at kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:174)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:86)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:64)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> The topic after this point still exists but has Leader set to -1, and the controller recognizes the topic deletion as incomplete (the topic znode is still in /admin/delete_topics).
> I don't believe LinkedIn has any topic name this long, but I'm making the ticket in case anyone runs into this problem.
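The length conflict described above is simple arithmetic, sketched here as a self-contained helper (the constants follow the report: a dot, a 32-character hex UUID, and the "-delete" suffix; the method name is illustrative).

```java
class TopicNameLength {
    static final int FS_NAME_LIMIT = 255;  // typical max filename length

    // Length of the live partition directory: "<topic>-<partition>"
    static int liveDirNameLength(int topicNameLen, int partitionDigits) {
        return topicNameLen + 1 + partitionDigits;  // +1 for the dash
    }

    // Length after async-delete rename: "<topic>-<partition>.<32-hex-uuid>-delete"
    static int deletedDirNameLength(int topicNameLen, int partitionDigits) {
        int deleteSuffix = 1 + 32 + "-delete".length();  // "." + uuid + "-delete" = 40 chars
        return liveDirNameLength(topicNameLen, partitionDigits) + deleteSuffix;
    }
}
```

A 249-character topic with a 5-digit partition id exactly fits the live limit (249 + 1 + 5 = 255), but the rename adds 40 more characters, so any topic name longer than about 209 characters makes the delete rename exceed the filesystem limit.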
[jira] [Commented] (KAFKA-5105) ReadOnlyKeyValueStore range scans are not ordered
[ https://issues.apache.org/jira/browse/KAFKA-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16861256#comment-16861256 ]

Dmitry Minkovsky commented on KAFKA-5105:
-
Wondering what the latest on this is. This [blog post|https://danlebrero.com/2018/12/17/big-results-in-kafka-streams-range-query-rocksdb/] concludes that, implementation-wise, a RocksDB-backed store, with or without cache, will iterate lexicographically.

> ReadOnlyKeyValueStore range scans are not ordered
> -
>
> Key: KAFKA-5105
> URL: https://issues.apache.org/jira/browse/KAFKA-5105
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Affects Versions: 0.10.2.0
> Reporter: Dmitry Minkovsky
> Priority: Major
>
> Following up on this thread: https://www.mail-archive.com/users@kafka.apache.org/msg25373.html
> Although ReadOnlyKeyValueStore's #range() is documented not to return values in order, it would be great if it did for keys within a single partition. This would facilitate using interactive queries and local state the way one would use HBase to index data by prefixed keys. If range returned keys in lexicographical order, I could use interactive queries for all my data needs except search.
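The ordering being asked for can be modeled with a sorted map: within one partition, a lexicographic range scan returns all keys sharing a prefix, in order, which is what makes HBase-style prefix indexing work. The sketch below models (and does not use) the store API; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class PrefixScan {
    // Inclusive range scan over [from, to], returned in lexicographic key
    // order, as a per-partition ordered store would provide.
    static List<String> range(TreeMap<String, String> store, String from, String to) {
        return new ArrayList<>(store.subMap(from, true, to, true).keySet());
    }
}
```

With keys like "user:1", "user:2", a scan from "user:" to "user:\uffff" yields exactly the prefixed entries, in order, without touching unrelated keys.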
[jira] [Updated] (KAFKA-4893) async topic deletion conflicts with max topic length
[ https://issues.apache.org/jira/browse/KAFKA-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin P. McCabe updated KAFKA-4893:
---
Priority: Major (was: Minor)
[jira] [Updated] (KAFKA-4893) async topic deletion conflicts with max topic length
[ https://issues.apache.org/jira/browse/KAFKA-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin P. McCabe updated KAFKA-4893:
---
Fix Version/s: 2.2.2
[jira] [Updated] (KAFKA-8487) Consumer should not resetGeneration upon REBALANCE_IN_PROGRESS in commit response handler
[ https://issues.apache.org/jira/browse/KAFKA-8487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guozhang Wang updated KAFKA-8487:
-
Priority: Major (was: Blocker)

> Consumer should not resetGeneration upon REBALANCE_IN_PROGRESS in commit response handler
> -
>
> Key: KAFKA-8487
> URL: https://issues.apache.org/jira/browse/KAFKA-8487
> Project: Kafka
> Issue Type: Bug
> Components: consumer, streams
> Affects Versions: 2.3.0
> Reporter: Guozhang Wang
> Assignee: Guozhang Wang
> Priority: Major
> Fix For: 2.4.0
>
> In the consumer, we handle the errors in sync / heartbeat / join responses as follows:
> 1. UNKNOWN_MEMBER_ID / ILLEGAL_GENERATION: we reset the generation and request a re-join.
> 2. REBALANCE_IN_PROGRESS: do nothing if a rejoin will be executed, or request a re-join explicitly.
> However, for the commit response, we call resetGeneration for REBALANCE_IN_PROGRESS as well. This is flawed in two ways:
> 1. As in KIP-345, with static members, resetting the generation will lose the member.id and hence may cause incorrect fencing.
> 2. As in KIP-429, resetting the generation will cause partitions to be "lost" unnecessarily before re-joining the group.
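The distinction the issue draws can be sketched as a small state machine. This is a simplified illustration with stand-in names (not the real ConsumerCoordinator code): fatal membership errors reset the generation and re-join, while REBALANCE_IN_PROGRESS in the commit path should only request a re-join, keeping generation and member.id intact.

```java
// Stand-in for the coordinator error codes discussed in the issue.
enum CoordinatorError { UNKNOWN_MEMBER_ID, ILLEGAL_GENERATION, REBALANCE_IN_PROGRESS }

class CommitResponseHandler {
    boolean generationReset = false;
    boolean rejoinRequested = false;

    void handle(CoordinatorError error) {
        switch (error) {
            case UNKNOWN_MEMBER_ID:
            case ILLEGAL_GENERATION:
                // Member identity is invalid: reset generation and re-join.
                generationReset = true;
                rejoinRequested = true;
                break;
            case REBALANCE_IN_PROGRESS:
                // Proposed fix: keep generation/member.id (KIP-345) and owned
                // partitions (KIP-429); just request a re-join.
                rejoinRequested = true;
                break;
        }
    }
}
```

Resetting the generation on REBALANCE_IN_PROGRESS would discard the member.id a static member needs and mark its partitions "lost" before the re-join, which is exactly what the fix avoids.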
[jira] [Updated] (KAFKA-8487) Consumer should not resetGeneration upon REBALANCE_IN_PROGRESS in commit response handler
[ https://issues.apache.org/jira/browse/KAFKA-8487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guozhang Wang updated KAFKA-8487:
-
Affects Version/s: (was: 2.3.0)
[jira] [Resolved] (KAFKA-8487) Consumer should not resetGeneration upon REBALANCE_IN_PROGRESS in commit response handler
[ https://issues.apache.org/jira/browse/KAFKA-8487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang resolved KAFKA-8487. -- Resolution: Fixed Fix Version/s: 2.4.0 > Consumer should not resetGeneration upon REBALANCE_IN_PROGRESS in commit > response handler > - > > Key: KAFKA-8487 > URL: https://issues.apache.org/jira/browse/KAFKA-8487 > Project: Kafka > Issue Type: Bug > Components: consumer, streams >Affects Versions: 2.3.0 >Reporter: Guozhang Wang >Assignee: Guozhang Wang >Priority: Blocker > Fix For: 2.4.0 > > > In the consumer, we handle the errors in sync / heartbeat / join responses such > that: > 1. UNKNOWN_MEMBER_ID / ILLEGAL_GENERATION: we reset the generation and > request re-join. > 2. REBALANCE_IN_PROGRESS: do nothing if a rejoin will be executed, or request > re-join explicitly. > However, for the commit response, we require resetGeneration for > REBALANCE_IN_PROGRESS as well. This is flawed in two ways: > 1. As in KIP-345, with static members, resetting generation will lose the > member.id and hence may cause incorrect fencing. > 2. As in KIP-429, resetting generation will cause partitions to be "lost" > unnecessarily before re-joining the group. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-8487) Consumer should not resetGeneration upon REBALANCE_IN_PROGRESS in commit response handler
[ https://issues.apache.org/jira/browse/KAFKA-8487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16861208#comment-16861208 ] ASF GitHub Bot commented on KAFKA-8487: --- guozhangwang commented on pull request #6894: KAFKA-8487: Only request re-join on REBALANCE_IN_PROGRESS in CommitOffsetResponse URL: https://github.com/apache/kafka/pull/6894 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Consumer should not resetGeneration upon REBALANCE_IN_PROGRESS in commit > response handler > - > > Key: KAFKA-8487 > URL: https://issues.apache.org/jira/browse/KAFKA-8487 > Project: Kafka > Issue Type: Bug > Components: consumer, streams >Affects Versions: 2.3.0 >Reporter: Guozhang Wang >Assignee: Guozhang Wang >Priority: Blocker > > In the consumer, we handle the errors in sync / heartbeat / join responses such > that: > 1. UNKNOWN_MEMBER_ID / ILLEGAL_GENERATION: we reset the generation and > request re-join. > 2. REBALANCE_IN_PROGRESS: do nothing if a rejoin will be executed, or request > re-join explicitly. > However, for the commit response, we require resetGeneration for > REBALANCE_IN_PROGRESS as well. This is flawed in two ways: > 1. As in KIP-345, with static members, resetting generation will lose the > member.id and hence may cause incorrect fencing. > 2. As in KIP-429, resetting generation will cause partitions to be "lost" > unnecessarily before re-joining the group. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
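The error-handling split the ticket describes can be sketched as follows. This is an illustrative model with hypothetical names (`CommitErrorHandling`, `onCommitError`), not Kafka's actual `AbstractCoordinator` code: REBALANCE_IN_PROGRESS only requests a rejoin, while the fatal errors also reset the generation state.

```java
// Hypothetical sketch of the proposed handling: on REBALANCE_IN_PROGRESS in the
// OffsetCommit response, request a rejoin but keep member.id and generation,
// unlike the fatal UNKNOWN_MEMBER_ID / ILLEGAL_GENERATION errors.
public class CommitErrorHandling {
    enum CommitError { UNKNOWN_MEMBER_ID, ILLEGAL_GENERATION, REBALANCE_IN_PROGRESS }

    String memberId;
    int generationId;
    boolean rejoinNeeded;

    CommitErrorHandling(String memberId, int generationId) {
        this.memberId = memberId;
        this.generationId = generationId;
    }

    void onCommitError(CommitError error) {
        switch (error) {
            case UNKNOWN_MEMBER_ID:
            case ILLEGAL_GENERATION:
                // Fatal to this member: wipe generation state before rejoining.
                memberId = "";
                generationId = -1;
                rejoinNeeded = true;
                break;
            case REBALANCE_IN_PROGRESS:
                // The fix: just rejoin; keeping member.id avoids incorrect
                // fencing of static members (KIP-345), and keeping the
                // generation avoids needlessly "losing" partitions (KIP-429).
                rejoinNeeded = true;
                break;
        }
    }

    public static void main(String[] args) {
        CommitErrorHandling h = new CommitErrorHandling("member-1", 7);
        h.onCommitError(CommitError.REBALANCE_IN_PROGRESS);
        System.out.println(h.memberId + " gen=" + h.generationId + " rejoin=" + h.rejoinNeeded);
    }
}
```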
[jira] [Resolved] (KAFKA-7469) Broker keeps group rebalance after adding FS
[ https://issues.apache.org/jira/browse/KAFKA-7469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gwen Shapira resolved KAFKA-7469. - Resolution: Fixed > Broker keeps group rebalance after adding FS > - > > Key: KAFKA-7469 > URL: https://issues.apache.org/jira/browse/KAFKA-7469 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.10.1.1 >Reporter: Boaz >Priority: Major > Fix For: 0.8.1.2 > > > Hi, > I'm using a kafka_2.10-0.10.1.1 cluster with 3 brokers. > A few days ago, we started running out of FS and our System Admin allocated > some more disk space. After the allocation, we started experiencing high lags > on the consumers, which kept growing. > On the Consumer side, we saw that no data was being consumed and the following > message kept coming in the log files: > o.a.k.c.c.internals.AbstractCoordinator : (Re-)joining group > AutoPaymentSyncGroup > > In the Broker logs, we keep seeing messages about restabilizing groups: > [2018-09-27 18:38:16,264] INFO [GroupCoordinator 0]: Preparing to restabilize > group AutoPaymentActivityGroup with old generation 357 > (kafka.coordinator.GroupCoordinator) > [2018-09-27 18:38:16,264] INFO [GroupCoordinator 0]: Preparing to restabilize > group AutoPaymentCreditCardTypeGroup with old generation 278 > (kafka.coordinator.GroupCoordinator) > [2018-09-27 18:38:16,284] INFO [GroupCoordinator 0]: Preparing to restabilize > group AutoPaymentAuthChannelGroup with old generation 349 > (kafka.coordinator.GroupCoordinator) > [2018-09-27 18:38:16,411] INFO [GroupCoordinator 0]: Preparing to restabilize > group AutoPaymentAuthCodeGroup with old generation 284 > (kafka.coordinator.GroupCoordinator) > [2018-09-27 18:38:16,463] INFO [GroupCoordinator 0]: Preparing to restabilize > group AutoPaymentInteractionSyncGroup with old generation 359 > (kafka.coordinator.GroupCoordinator) > [2018-09-27 18:38:16,464] INFO [GroupCoordinator 0]: Preparing to restabilize > group AutoPaymentSyncGroup with old
generation 358 > (kafka.coordinator.GroupCoordinator) > > After bouncing the broker, the issue was resolved. > Thanks, > Boaz. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-8524) Zookeeper Acl Sensitive Path Extension
[ https://issues.apache.org/jira/browse/KAFKA-8524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sebastien diaz updated KAFKA-8524: -- Labels: path zkcli zookeeper (was: ) > Zookeeper Acl Sensitive Path Extension > -- > > Key: KAFKA-8524 > URL: https://issues.apache.org/jira/browse/KAFKA-8524 > Project: Kafka > Issue Type: Bug > Components: zkclient >Affects Versions: 1.1.0, 2.2.1 >Reporter: sebastien diaz >Priority: Major > Labels: path, zkcli, zookeeper > > There is too much readable config in Zookeeper, such as /brokers, /controller, > /kafka-acl, etc. > As Zookeeper can be accessed by other projects/users, the security should be > extended to Zookeeper ACLs properly. > We should have the possibility to set these paths by configuration and not > (as it is today) in the code. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-8524) Zookeeper Acl Sensitive Path Extension
[ https://issues.apache.org/jira/browse/KAFKA-8524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sebastien diaz updated KAFKA-8524: -- Component/s: zkclient > Zookeeper Acl Sensitive Path Extension > -- > > Key: KAFKA-8524 > URL: https://issues.apache.org/jira/browse/KAFKA-8524 > Project: Kafka > Issue Type: Bug > Components: zkclient >Affects Versions: 1.1.0, 2.2.1 >Reporter: sebastien diaz >Priority: Major > > There is too much readable config in Zookeeper, such as /brokers, /controller, > /kafka-acl, etc. > As Zookeeper can be accessed by other projects/users, the security should be > extended to Zookeeper ACLs properly. > We should have the possibility to set these paths by configuration and not > (as it is today) in the code. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-8524) Zookeeper Acl Sensitive Path
[ https://issues.apache.org/jira/browse/KAFKA-8524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sebastien diaz updated KAFKA-8524: -- Description: There is too much readable config in Zookeeper, such as /brokers, /controller, /kafka-acl, etc. As Zookeeper can be accessed by other projects/users, the security should be extended to Zookeeper ACLs properly. We should have the possibility to set these paths by configuration and not (as it is today) in the code. > Zookeeper Acl Sensitive Path > > > Key: KAFKA-8524 > URL: https://issues.apache.org/jira/browse/KAFKA-8524 > Project: Kafka > Issue Type: Bug >Reporter: sebastien diaz >Priority: Major > > There is too much readable config in Zookeeper, such as /brokers, /controller, > /kafka-acl, etc. > As Zookeeper can be accessed by other projects/users, the security should be > extended to Zookeeper ACLs properly. > We should have the possibility to set these paths by configuration and not > (as it is today) in the code. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-8524) Zookeeper Acl Sensitive Path Extension
[ https://issues.apache.org/jira/browse/KAFKA-8524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sebastien diaz updated KAFKA-8524: -- Summary: Zookeeper Acl Sensitive Path Extension (was: Zookeeper Acl Sensitive Path) > Zookeeper Acl Sensitive Path Extension > -- > > Key: KAFKA-8524 > URL: https://issues.apache.org/jira/browse/KAFKA-8524 > Project: Kafka > Issue Type: Bug >Reporter: sebastien diaz >Priority: Major > > There is too much readable config in Zookeeper, such as /brokers, /controller, > /kafka-acl, etc. > As Zookeeper can be accessed by other projects/users, the security should be > extended to Zookeeper ACLs properly. > We should have the possibility to set these paths by configuration and not > (as it is today) in the code. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-8524) Zookeeper Acl Sensitive Path Extension
[ https://issues.apache.org/jira/browse/KAFKA-8524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sebastien diaz updated KAFKA-8524: -- Affects Version/s: 1.1.0 2.2.1 > Zookeeper Acl Sensitive Path Extension > -- > > Key: KAFKA-8524 > URL: https://issues.apache.org/jira/browse/KAFKA-8524 > Project: Kafka > Issue Type: Bug >Affects Versions: 1.1.0, 2.2.1 >Reporter: sebastien diaz >Priority: Major > > There is too much readable config in Zookeeper, such as /brokers, /controller, > /kafka-acl, etc. > As Zookeeper can be accessed by other projects/users, the security should be > extended to Zookeeper ACLs properly. > We should have the possibility to set these paths by configuration and not > (as it is today) in the code. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-8524) Zookeeper Acl Sensitive Path
[ https://issues.apache.org/jira/browse/KAFKA-8524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sebastien diaz updated KAFKA-8524: -- Summary: Zookeeper Acl Sensitive Path (was: Zookeeper Acl ) > Zookeeper Acl Sensitive Path > > > Key: KAFKA-8524 > URL: https://issues.apache.org/jira/browse/KAFKA-8524 > Project: Kafka > Issue Type: Bug >Reporter: sebastien diaz >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-8524) Zookeeper Acl
sebastien diaz created KAFKA-8524: - Summary: Zookeeper Acl Key: KAFKA-8524 URL: https://issues.apache.org/jira/browse/KAFKA-8524 Project: Kafka Issue Type: Bug Reporter: sebastien diaz -- This message was sent by Atlassian JIRA (v7.6.3#76005)
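What the reporter asks for in KAFKA-8524 — making the set of ACL-protected znode paths configurable instead of hard-coded — could look roughly like the sketch below. The `SensitivePaths` class and the comma-separated config format are purely illustrative assumptions; no such class or property exists in Kafka.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: hard-coded defaults plus a comma-separated config
// value extending the set of znode paths that should get restrictive ACLs.
public class SensitivePaths {
    // Defaults mirroring the paths mentioned in the ticket.
    static final Set<String> DEFAULTS = new LinkedHashSet<>(
            Arrays.asList("/brokers", "/controller", "/kafka-acl"));

    // Parse a user-supplied comma-separated list and merge it with defaults.
    static Set<String> fromConfig(String csv) {
        Set<String> paths = new LinkedHashSet<>(DEFAULTS);
        if (csv != null && !csv.trim().isEmpty()) {
            for (String p : csv.split(",")) {
                String trimmed = p.trim();
                if (!trimmed.isEmpty()) paths.add(trimmed);
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        System.out.println(fromConfig("/config, /admin"));
    }
}
```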
[jira] [Commented] (KAFKA-8456) Flaky Test StoreUpgradeIntegrationTest#shouldMigratePersistentWindowStoreToTimestampedWindowStoreUsingPapi
[ https://issues.apache.org/jira/browse/KAFKA-8456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16861155#comment-16861155 ] Boyang Chen commented on KAFKA-8456: Failed again: [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/5432/consoleFull] > Flaky Test > StoreUpgradeIntegrationTest#shouldMigratePersistentWindowStoreToTimestampedWindowStoreUsingPapi > --- > > Key: KAFKA-8456 > URL: https://issues.apache.org/jira/browse/KAFKA-8456 > Project: Kafka > Issue Type: Bug > Components: streams, unit tests >Reporter: Boyang Chen >Priority: Major > > [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/22331/console] > *01:20:07* > org.apache.kafka.streams.integration.StoreUpgradeIntegrationTest.shouldMigratePersistentWindowStoreToTimestampedWindowStoreUsingPapi > failed, log available in > /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk8-scala2.11/streams/build/reports/testOutput/org.apache.kafka.streams.integration.StoreUpgradeIntegrationTest.shouldMigratePersistentWindowStoreToTimestampedWindowStoreUsingPapi.test.stdout*01:20:07* > *01:20:07* org.apache.kafka.streams.integration.StoreUpgradeIntegrationTest > > shouldMigratePersistentWindowStoreToTimestampedWindowStoreUsingPapi > FAILED*01:20:07* java.lang.AssertionError: Condition not met within > timeout 15000. 
Could not get expected result in time.*01:20:07* at > org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:375)*01:20:07* > at > org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:335)*01:20:07* > at > org.apache.kafka.streams.integration.StoreUpgradeIntegrationTest.verifyWindowedCountWithTimestamp(StoreUpgradeIntegrationTest.java:830)*01:20:07* > at > org.apache.kafka.streams.integration.StoreUpgradeIntegrationTest.shouldMigrateWindowStoreToTimestampedWindowStoreUsingPapi(StoreUpgradeIntegrationTest.java:573)*01:20:07* > at > org.apache.kafka.streams.integration.StoreUpgradeIntegrationTest.shouldMigratePersistentWindowStoreToTimestampedWindowStoreUsingPapi(StoreUpgradeIntegrationTest.java:517) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-8502) Flakey test AdminClientIntegrationTest#testElectUncleanLeadersForAllPartitions
[ https://issues.apache.org/jira/browse/KAFKA-8502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16861154#comment-16861154 ] Boyang Chen commented on KAFKA-8502: [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/5432/consoleFull] Failed again > Flakey test AdminClientIntegrationTest#testElectUncleanLeadersForAllPartitions > -- > > Key: KAFKA-8502 > URL: https://issues.apache.org/jira/browse/KAFKA-8502 > Project: Kafka > Issue Type: Bug >Reporter: Boyang Chen >Priority: Major > > [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/5355/consoleFull] > > *18:06:01* *18:06:01* kafka.api.AdminClientIntegrationTest > > testElectUncleanLeadersForAllPartitions FAILED*18:06:01* > java.util.concurrent.ExecutionException: > org.apache.kafka.common.errors.TimeoutException: Aborted due to > timeout.*18:06:01* at > org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)*18:06:01* > at > org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)*18:06:01* > at > org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)*18:06:01* > at > org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)*18:06:01* > at > kafka.api.AdminClientIntegrationTest.testElectUncleanLeadersForAllPartitions(AdminClientIntegrationTest.scala:1496)*18:06:01* > *18:06:01* Caused by:*18:06:01* > org.apache.kafka.common.errors.TimeoutException: Aborted due to timeout. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-8501) Remove key and value from exception message
[ https://issues.apache.org/jira/browse/KAFKA-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Bejeck resolved KAFKA-8501. Resolution: Fixed > Remove key and value from exception message > --- > > Key: KAFKA-8501 > URL: https://issues.apache.org/jira/browse/KAFKA-8501 > Project: Kafka > Issue Type: Improvement > Components: streams >Reporter: Badai Aqrandista >Assignee: Carlos Manuel Duclos Vergara >Priority: Major > Labels: easy-fix, newbie > Fix For: 2.4.0 > > > KAFKA-7510 moves the WARN messages that contain key and value to TRACE level. > But the exceptions still contain key and value. These are the two in > RecordCollectorImpl: > > [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/RecordCollectorImpl.java#L185-L196] > > [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/RecordCollectorImpl.java#L243-L254] > > Can these be modified as well to remove key and value from the error message, > since we don't know at what log level it will be printed? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-8501) Remove key and value from exception message
[ https://issues.apache.org/jira/browse/KAFKA-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Bejeck updated KAFKA-8501: --- Fix Version/s: 2.4.0 > Remove key and value from exception message > --- > > Key: KAFKA-8501 > URL: https://issues.apache.org/jira/browse/KAFKA-8501 > Project: Kafka > Issue Type: Improvement > Components: streams >Reporter: Badai Aqrandista >Assignee: Carlos Manuel Duclos Vergara >Priority: Major > Labels: easy-fix, newbie > Fix For: 2.4.0 > > > KAFKA-7510 moves the WARN messages that contain key and value to TRACE level. > But the exceptions still contain key and value. These are the two in > RecordCollectorImpl: > > [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/RecordCollectorImpl.java#L185-L196] > > [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/RecordCollectorImpl.java#L243-L254] > > Can these be modified as well to remove key and value from the error message, > since we don't know at what log level it will be printed? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-8501) Remove key and value from exception message
[ https://issues.apache.org/jira/browse/KAFKA-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16861073#comment-16861073 ] ASF GitHub Bot commented on KAFKA-8501: --- bbejeck commented on pull request #6904: KAFKA-8501: Removing key and value from exception message URL: https://github.com/apache/kafka/pull/6904 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Remove key and value from exception message > --- > > Key: KAFKA-8501 > URL: https://issues.apache.org/jira/browse/KAFKA-8501 > Project: Kafka > Issue Type: Improvement > Components: streams >Reporter: Badai Aqrandista >Assignee: Carlos Manuel Duclos Vergara >Priority: Major > Labels: easy-fix, newbie > > KAFKA-7510 moves the WARN messages that contain key and value to TRACE level. > But the exceptions still contain key and value. These are the two in > RecordCollectorImpl: > > [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/RecordCollectorImpl.java#L185-L196] > > [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/RecordCollectorImpl.java#L243-L254] > > Can these be modified as well to remove key and value from the error message, > since we don't know at what log level it will be printed? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
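One way to phrase the change KAFKA-8501 requests — identifying the failed record by topic and payload sizes rather than embedding the key and value — is sketched below. `SafeErrorMessage.describe` is a hypothetical helper for illustration, not the actual `RecordCollectorImpl` patch.

```java
// Hypothetical sketch: build an exception message that identifies the record
// without embedding the key/value contents, which may be sensitive and whose
// eventual log level the library cannot know.
public class SafeErrorMessage {
    static String describe(String topic, byte[] key, byte[] value) {
        int keyLen = key == null ? 0 : key.length;
        int valueLen = value == null ? 0 : value.length;
        return "Error sending record to topic " + topic
                + " (key size: " + keyLen + " bytes, value size: " + valueLen
                + " bytes); key and value are omitted from this message";
    }

    public static void main(String[] args) {
        System.out.println(describe("orders", new byte[]{1, 2}, new byte[]{3, 4, 5}));
    }
}
```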
[jira] [Commented] (KAFKA-8334) Occasional OffsetCommit Timeout
[ https://issues.apache.org/jira/browse/KAFKA-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16861023#comment-16861023 ] ASF GitHub Bot commented on KAFKA-8334: --- windkit commented on pull request #6915: KAFKA-8334 Executor to retry delayed operations failed to obtain lock URL: https://github.com/apache/kafka/pull/6915 **ASF**: https://issues.apache.org/jira/browse/KAFKA-8334 ### Brief Summary: We have seen `OffsetCommit` time out when we do manual offset commits with low traffic. When we append the offset commit to the topic `__consumer_offsets`, the `DelayedProduce` would need to obtain the group metadata lock in order to complete. If the group metadata lock is held by others (such as `HeartBeat`), it would fail, and it would only be retried when there is a next `OffsetCommit` ### Reproduce Methodology 1. DelayedProduce on __consumer_offsets could not be completed if the group.lock is acquired by others 2. We spam requests like Heartbeat to keep acquiring group.lock 3. We keep sending OffsetCommit and check the processing time Reproduce Script https://gist.github.com/windkit/3384bb86dc146111d1e0857e66b85861 - jammer.py - join the group "wilson-tester" and keep spamming Heartbeat - tester.py - fetch one message and do a long processing (or sleep) and then commit the offset ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Occasional OffsetCommit Timeout > --- > > Key: KAFKA-8334 > URL: https://issues.apache.org/jira/browse/KAFKA-8334 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 1.1.1 >Reporter: windkithk >Priority: Major > Attachments: kafka8334.patch, offsetcommit_p99.9_cut.jpg > > > h2. 1) Issue Summary > Since we have upgraded to 1.1, we have observed occasional OffsetCommit > timeouts from clients > {code:java} > Offset commit failed on partition sometopic-somepartition at offset > someoffset: The request timed out{code} > Normally OffsetCommit completes within 10ms but when we check the 99.9 > percentile, we see the request duration time jumps up to 5000 ms > (offsets.commit.timeout.ms) > Here is a screenshot of prometheus recording > kafka_network_request_duration_milliseconds > (offsetcommit_p99.9_cut.jpg) > and after checking the duration breakdown, most of the time was spent on > "Remote" Scope > (Below is a request log line produced by inhouse slow request logger > {code:java} > [2019-04-16 13:06:20,339] WARN Slow > response:RequestHeader(apiKey=OFFSET_COMMIT, apiVersion=2, > clientId=kafka-python-1.4.6, correlationId=953) -- > {group_id=wilson-tester,generation_id=28,member_id=kafka-python-1.4.6-69ed979d-a069-4c6d-9862-e4fc34883269,retention_time=-1,topics=[{topic=test,partitions=[{partition=2,offset=63456,metadata=null}]}]} > from > connection;totalTime:5001.942000,requestQueueTime:0.03,localTime:0.574000,remoteTime:5001.173000,responseQueueTime:0.058000,sendTime:0.053000,securityProtocol:PLAINTEXT,principal:User:ANONYMOUS > (kafka.request.logger) > {code} > h2. 2) What got changed in 1.1 from 0.10.2.1? 
> # Log Level Changed > In 1.1 Kafka Consumer, logging about timed out OffsetCommit is changed from > DEBUG to WARN > # Group Lock is acquired when trying to complete DelayedProduce of > OffsetCommit > This was added after 0.11.0.2 > (Ticket: https://issues.apache.org/jira/browse/KAFKA-6042) > (PR: https://github.com/apache/kafka/pull/4103) > (in 1.1 > https://github.com/apache/kafka/blob/1.1/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L292) > # Followers do incremental fetch > https://cwiki.apache.org/confluence/display/KAFKA/KIP-227%3A+Introduce+Incremental+FetchRequests+to+Increase+Partition+Scalability > h2. 3) Interaction between OffsetCommit, DelayedProduce and FetchFollower > {quote} > OffsetCommit append a message of committed offset to partition of topic > `__consumer_offsets` > During the append, it would create a DelayedProduce with lock to > GroupMetadata.lock (ReentrantLock) and add to delayedProducePurgatory > When follower fetches the partition of topic `__consumer_offsets` and causes > an increase in HighWaterMark, delayedProducePurgatory would be transversed > and all operations related to the partition may be completed > {quote} > *DelayedProduce from
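The retry mechanism the PR title describes — re-queueing a delayed operation that could not acquire the group lock, rather than leaving it until the next OffsetCommit — can be sketched as follows. This model is hypothetical and far simpler than Kafka's actual DelayedOperation purgatory; the `lockHeld` flag stands in for the group's ReentrantLock.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch (not Kafka's purgatory): if the group lock is held by
// another request (e.g. a Heartbeat), hand the completion attempt to a retry
// queue drained by an executor, instead of silently dropping the attempt.
public class DelayedCompletion {
    boolean lockHeld = false;                       // stand-in for the group lock
    final Deque<Runnable> retryQueue = new ArrayDeque<>();
    boolean completed = false;

    void tryComplete() {
        if (!lockHeld) {
            completed = true;                       // e.g. respond to the OffsetCommit
        } else {
            retryQueue.add(this::tryComplete);      // the fix: schedule a retry
        }
    }

    void drainRetries() {                           // run by the retry executor
        Runnable r;
        while ((r = retryQueue.poll()) != null) r.run();
    }

    public static void main(String[] args) {
        DelayedCompletion d = new DelayedCompletion();
        d.lockHeld = true;                          // a Heartbeat holds the group lock
        d.tryComplete();                            // cannot complete yet
        d.lockHeld = false;
        d.drainRetries();                           // retry now succeeds
        System.out.println("completed=" + d.completed);
    }
}
```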
[jira] [Commented] (KAFKA-8523) InsertField transformation fails when encountering tombstone event
[ https://issues.apache.org/jira/browse/KAFKA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860794#comment-16860794 ] Gunnar Morling commented on KAFKA-8523: --- Filed https://github.com/apache/kafka/pull/6914. > InsertField transformation fails when encountering tombstone event > -- > > Key: KAFKA-8523 > URL: https://issues.apache.org/jira/browse/KAFKA-8523 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Reporter: Gunnar Morling >Priority: Major > > When applying the {{InsertField}} transformation to a tombstone event, an > exception is raised: > {code} > org.apache.kafka.connect.errors.DataException: Only Map objects supported in > absence of schema for [field insertion], found: null > at > org.apache.kafka.connect.transforms.util.Requirements.requireMap(Requirements.java:38) > at > org.apache.kafka.connect.transforms.InsertField.applySchemaless(InsertField.java:138) > at > org.apache.kafka.connect.transforms.InsertField.apply(InsertField.java:131) > at > org.apache.kafka.connect.transforms.InsertFieldTest.tombstone(InsertFieldTest.java:128) > {code} > AFAICS, the transform can still be made to work in this case by simply > building up a new value map from scratch. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-8523) InsertField transformation fails when encountering tombstone event
[ https://issues.apache.org/jira/browse/KAFKA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860793#comment-16860793 ] ASF GitHub Bot commented on KAFKA-8523: --- gunnarmorling commented on pull request #6914: KAFKA-8523 Enabling InsertField transform to be used with tombstone events URL: https://github.com/apache/kafka/pull/6914 Fixes https://issues.apache.org/jira/browse/KAFKA-8523 ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > InsertField transformation fails when encountering tombstone event > -- > > Key: KAFKA-8523 > URL: https://issues.apache.org/jira/browse/KAFKA-8523 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Reporter: Gunnar Morling >Priority: Major > > When applying the {{InsertField}} transformation to a tombstone event, an > exception is raised: > {code} > org.apache.kafka.connect.errors.DataException: Only Map objects supported in > absence of schema for [field insertion], found: null > at > org.apache.kafka.connect.transforms.util.Requirements.requireMap(Requirements.java:38) > at > org.apache.kafka.connect.transforms.InsertField.applySchemaless(InsertField.java:138) > at > org.apache.kafka.connect.transforms.InsertField.apply(InsertField.java:131) > at > org.apache.kafka.connect.transforms.InsertFieldTest.tombstone(InsertFieldTest.java:128) > {code} > AFAICS, the transform can still be made to work in this case by simply > building up a new value map from scratch. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-8523) InsertField transformation fails when encountering tombstone event
[ https://issues.apache.org/jira/browse/KAFKA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860688#comment-16860688 ] Gunnar Morling commented on KAFKA-8523: --- I'll provide a PR shortly. > InsertField transformation fails when encountering tombstone event > -- > > Key: KAFKA-8523 > URL: https://issues.apache.org/jira/browse/KAFKA-8523 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Reporter: Gunnar Morling >Priority: Major > > When applying the {{InsertField}} transformation to a tombstone event, an > exception is raised: > {code} > org.apache.kafka.connect.errors.DataException: Only Map objects supported in > absence of schema for [field insertion], found: null > at > org.apache.kafka.connect.transforms.util.Requirements.requireMap(Requirements.java:38) > at > org.apache.kafka.connect.transforms.InsertField.applySchemaless(InsertField.java:138) > at > org.apache.kafka.connect.transforms.InsertField.apply(InsertField.java:131) > at > org.apache.kafka.connect.transforms.InsertFieldTest.tombstone(InsertFieldTest.java:128) > {code} > AFAICS, the transform can still be made to work in this case by simply > building up a new value map from scratch. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-8523) InsertField transformation fails when encountering tombstone event
Gunnar Morling created KAFKA-8523: - Summary: InsertField transformation fails when encountering tombstone event Key: KAFKA-8523 URL: https://issues.apache.org/jira/browse/KAFKA-8523 Project: Kafka Issue Type: Bug Components: KafkaConnect Reporter: Gunnar Morling When applying the {{InsertField}} transformation to a tombstone event, an exception is raised: {code} org.apache.kafka.connect.errors.DataException: Only Map objects supported in absence of schema for [field insertion], found: null at org.apache.kafka.connect.transforms.util.Requirements.requireMap(Requirements.java:38) at org.apache.kafka.connect.transforms.InsertField.applySchemaless(InsertField.java:138) at org.apache.kafka.connect.transforms.InsertField.apply(InsertField.java:131) at org.apache.kafka.connect.transforms.InsertFieldTest.tombstone(InsertFieldTest.java:128) {code} AFAICS, the transform can still be made to work in this case by simply building up a new value map from scratch. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
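The fix the ticket suggests — building a new value map from scratch when the record value is null — can be sketched like this. It is an illustrative schemaless apply, with hypothetical names, not the actual `InsertField` patch.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the suggested fix: when the value is null (a
// tombstone), start from an empty map instead of calling requireMap on null.
public class InsertFieldSketch {
    static Map<String, Object> applySchemaless(Map<String, Object> value,
                                               String field, Object fieldValue) {
        Map<String, Object> updated =
                value == null ? new HashMap<>() : new HashMap<>(value);
        updated.put(field, fieldValue);
        return updated;
    }

    public static void main(String[] args) {
        // A tombstone (null value) no longer throws a DataException.
        System.out.println(applySchemaless(null, "source", "db1"));
    }
}
```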
[jira] [Commented] (KAFKA-8503) AdminClient should ignore retries config if a custom timeout is provided
[ https://issues.apache.org/jira/browse/KAFKA-8503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860642#comment-16860642 ] ASF GitHub Bot commented on KAFKA-8503: --- huxihx commented on pull request #6913: KAFKA-8503: Ignore retries config if a custom timeout is provided URL: https://github.com/apache/kafka/pull/6913 https://issues.apache.org/jira/browse/KAFKA-8503 When custom timeout is provided, `retries` config could be ignored for individual APIs in KafkaAdminClient. Tweaked code path for `Call.fail` to pass NPathComplexity check. *More detailed description of your change, if necessary. The PR title and PR message become the squashed commit message, so use a separate comment to ping reviewers.* *Summary of testing strategy (including rationale) for the feature or bug fix. Unit and/or integration tests are expected for any behaviour change and system tests should be considered for larger changes.* ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > AdminClient should ignore retries config if a custom timeout is provided > > > Key: KAFKA-8503 > URL: https://issues.apache.org/jira/browse/KAFKA-8503 > Project: Kafka > Issue Type: Improvement >Reporter: Jason Gustafson >Assignee: huxihx >Priority: Major > > The admin client takes a `retries` config similar to the producer. The > default value is 5. Individual APIs also accept an optional timeout, which is > defaulted to `request.timeout.ms`. The call will fail if either `retries` or > the API timeout is exceeded. This is not very intuitive. 
I think a user would > expect to wait if they provided a timeout and the operation cannot be > completed. In general, timeouts are much easier for users to work with and > reason about. > A couple options are either to ignore `retries` in this case or to increase > the default value of `retries` to something large and not likely to be > exceeded. I propose to do the first. Longer term, we could consider > deprecating `retries` and avoiding the overloading of `request.timeout.ms` by > providing a `default.api.timeout.ms` similar to the consumer. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (KAFKA-8503) AdminClient should ignore retries config if a custom timeout is provided
[ https://issues.apache.org/jira/browse/KAFKA-8503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huxihx reassigned KAFKA-8503: - Assignee: huxihx > AdminClient should ignore retries config if a custom timeout is provided > > > Key: KAFKA-8503 > URL: https://issues.apache.org/jira/browse/KAFKA-8503 > Project: Kafka > Issue Type: Improvement >Reporter: Jason Gustafson >Assignee: huxihx >Priority: Major > > The admin client takes a `retries` config similar to the producer. The > default value is 5. Individual APIs also accept an optional timeout, which is > defaulted to `request.timeout.ms`. The call will fail if either `retries` or > the API timeout is exceeded. This is not very intuitive. I think a user would > expect to wait if they provided a timeout and the operation cannot be > completed. In general, timeouts are much easier for users to work with and > reason about. > A couple options are either to ignore `retries` in this case or to increase > the default value of `retries` to something large and not likely to be > exceeded. I propose to do the first. Longer term, we could consider > deprecating `retries` and avoiding the overloading of `request.timeout.ms` by > providing a `default.api.timeout.ms` similar to the consumer. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-8507) Support --bootstrap-server in all command line tools
[ https://issues.apache.org/jira/browse/KAFKA-8507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860611#comment-16860611 ] Lee Dongjin commented on KAFKA-8507: I reviewed the tools in `bin` and found that the out-of-date tools are fixed in the following PRs: 1. https://github.com/apache/kafka/pull/3453 (KAFKA-5532) * ProducerPerformance 2. https://github.com/apache/kafka/pull/2161 (KAFKA-4307) * VerifiableConsumer * VerifiableProducer 3. https://github.com/apache/kafka/pull/3605 (KAFKA-2111) * ConsumerGroupCommand * ReassignPartitionsCommand * ConsoleConsumer * StreamsResetter Are there any tools I omitted? > Support --bootstrap-server in all command line tools > > > Key: KAFKA-8507 > URL: https://issues.apache.org/jira/browse/KAFKA-8507 > Project: Kafka > Issue Type: Improvement > Components: tools >Reporter: Jason Gustafson >Priority: Major > Labels: needs-kip > > This is an unambitious initial move toward standardizing the command line > tools. We have favored the name {{\-\-bootstrap-server}} in all new tools > since it matches the config {{bootstrap.servers}} which is used by all > clients. Some older commands use {{\-\-broker-list}} or > {{\-\-bootstrap-servers}} and maybe other exotic variations. We should > support {{\-\-bootstrap-server}} in all commands and deprecate the other > options. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
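As an illustration of the inconsistency described in the ticket, the same connection argument is spelled differently across tools today (the commands below show only the flag naming; the server address and topic are placeholders):

```shell
# Newer tools already accept --bootstrap-server, matching the client
# config bootstrap.servers:
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list

# Some older tools still use --broker-list for the same thing:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
```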
[jira] [Updated] (KAFKA-8193) Flaky Test MetricsIntegrationTest#testStreamMetricOfWindowStore
[ https://issues.apache.org/jira/browse/KAFKA-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias J. Sax updated KAFKA-8193: --- Fix Version/s: (was: 2.4.0) > Flaky Test MetricsIntegrationTest#testStreamMetricOfWindowStore > --- > > Key: KAFKA-8193 > URL: https://issues.apache.org/jira/browse/KAFKA-8193 > Project: Kafka > Issue Type: Bug > Components: streams, unit tests >Affects Versions: 2.3.0 >Reporter: Konstantine Karantasis >Assignee: Guozhang Wang >Priority: Major > Labels: flaky-test > > [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/3576/console] > *14:14:48* org.apache.kafka.streams.integration.MetricsIntegrationTest > > testStreamMetricOfWindowStore STARTED > *14:14:59* > org.apache.kafka.streams.integration.MetricsIntegrationTest.testStreamMetricOfWindowStore > failed, log available in > /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk11-scala2.12/streams/build/reports/testOutput/org.apache.kafka.streams.integration.MetricsIntegrationTest.testStreamMetricOfWindowStore.test.stdout > *14:14:59* > *14:14:59* org.apache.kafka.streams.integration.MetricsIntegrationTest > > testStreamMetricOfWindowStore FAILED > *14:14:59* java.lang.AssertionError: Condition not met within timeout 1. > testStoreMetricWindow -> Size of metrics of type:'put-latency-avg' must be > equal to:2 but it's equal to 0 expected:<2> but was:<0> > *14:14:59* at > org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:361) > *14:14:59* at > org.apache.kafka.streams.integration.MetricsIntegrationTest.testStreamMetricOfWindowStore(MetricsIntegrationTest.java:260) > *14:15:01* -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-8263) Flaky Test MetricsIntegrationTest#testStreamMetricOfWindowStore
[ https://issues.apache.org/jira/browse/KAFKA-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860599#comment-16860599 ] Matthias J. Sax commented on KAFKA-8263: [https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk8/detail/kafka-trunk-jdk8/3714/tests] > Flaky Test MetricsIntegrationTest#testStreamMetricOfWindowStore > --- > > Key: KAFKA-8263 > URL: https://issues.apache.org/jira/browse/KAFKA-8263 > Project: Kafka > Issue Type: Bug > Components: streams, unit tests >Affects Versions: 2.3.0 >Reporter: Matthias J. Sax >Assignee: Bruno Cadonna >Priority: Major > Labels: flaky-test > Fix For: 2.4.0 > > > [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/3900/testReport/junit/org.apache.kafka.streams.integration/MetricsIntegrationTest/testStreamMetricOfWindowStore/] > {quote}java.lang.AssertionError: Condition not met within timeout 1. > testStoreMetricWindow -> Size of metrics of type:'put-latency-avg' must be > equal to:2 but it's equal to 0 expected:<2> but was:<0> at > org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:361){quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-8418) Connect System tests are not waiting for REST resources to be registered
[ https://issues.apache.org/jira/browse/KAFKA-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias J. Sax updated KAFKA-8418: --- Fix Version/s: (was: 2.3) > Connect System tests are not waiting for REST resources to be registered > > > Key: KAFKA-8418 > URL: https://issues.apache.org/jira/browse/KAFKA-8418 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Affects Versions: 2.2.0 >Reporter: Oleksandr Diachenko >Assignee: Oleksandr Diachenko >Priority: Blocker > Fix For: 1.0.3, 1.1.2, 2.0.2, 2.3.0, 2.1.2, 2.2.1 > > > I am getting an error while running Kafka tests: > {code} > Traceback (most recent call last): File > "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/tests/runner_client.py", > line 132, in run data = self.run_test() File > "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/tests/runner_client.py", > line 189, in run_test return self.test_context.function(self.test) File > "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/tests/kafkatest/tests/connect/connect_rest_test.py", > line 89, in test_rest_api assert set([connector_plugin['class'] for > connector_plugin in self.cc.list_connector_plugins()]) == > \{self.FILE_SOURCE_CONNECTOR, self.FILE_SINK_CONNECTOR} File > "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/tests/kafkatest/services/connect.py", > line 218, in list_connector_plugins return self._rest('/connector-plugins/', > node=node) File > "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/tests/kafkatest/services/connect.py", > line 234, in _rest raise ConnectRestError(resp.status_code, resp.text, > resp.url) ConnectRestError > {code} > From the logs, I see two messages: > {code} > [2019-05-23 16:09:59,373] INFO REST server listening at > http://172.31.39.205:8083/, advertising URL http://worker1:8083/ > 
(org.apache.kafka.connect.runtime.rest.RestServer) > {code} > and {code} > [2019-05-23 16:10:00,738] INFO REST resources initialized; server is started > and ready to handle requests > (org.apache.kafka.connect.runtime.rest.RestServer) > {code} > It takes 1365 ms to actually load the REST resources, but the test only waits > for the port to be listening. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
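The fix direction implied by the ticket is to wait until the REST resources actually respond, not merely until the port is open. A hedged, self-contained sketch follows (in Java for illustration; the real system tests are Python/ducktape, and `awaitReady`/`statusCode` are hypothetical names standing in for an HTTP GET against the worker's REST API, e.g. /connector-plugins):

```java
import java.util.function.IntSupplier;

// Sketch of waiting for a Connect worker's REST API to be *ready* (serving
// a 2xx status) rather than merely listening on its port. The status
// supplier stands in for an HTTP GET; all names are illustrative.
public class RestReadiness {
    public static boolean awaitReady(IntSupplier statusCode,
                                     long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            int status = statusCode.getAsInt();
            if (status >= 200 && status < 300)
                return true;      // resources registered and serving
            Thread.sleep(pollMs); // e.g. 404: port open but not ready yet
        }
        return false;             // never became ready within the timeout
    }
}
```

Polling the resource endpoint closes exactly the 1365 ms gap seen in the logs above, where the port is listening before the resources are initialized.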
[jira] [Updated] (KAFKA-8265) Connect Client Config Override policy
[ https://issues.apache.org/jira/browse/KAFKA-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias J. Sax updated KAFKA-8265: --- Fix Version/s: (was: 2.3) 2.3.0 > Connect Client Config Override policy > - > > Key: KAFKA-8265 > URL: https://issues.apache.org/jira/browse/KAFKA-8265 > Project: Kafka > Issue Type: New Feature > Components: KafkaConnect >Reporter: Magesh kumar Nandakumar >Assignee: Magesh kumar Nandakumar >Priority: Major > Fix For: 2.3.0 > > > Right now, each source connector and sink connector inherits its client > configurations from the worker properties. Within the worker properties, all > configurations that have a prefix of "producer." or "consumer." are applied > to all source connectors and sink connectors respectively. > We should allow the "producer." and "consumer." configurations to be > overridden in accordance with an override policy determined by the > administrator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
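For illustration, a worker configuration using the prefixes described in the ticket might look like the following (the property values are examples only, not recommendations):

```properties
# Connect worker properties (illustrative values)
bootstrap.servers=localhost:9092

# Applied to the embedded producers of all source connectors:
producer.acks=all
producer.compression.type=lz4

# Applied to the embedded consumers of all sink connectors:
consumer.max.poll.records=500
```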
[jira] [Updated] (KAFKA-8473) Adjust Connect system tests for incremental cooperative rebalancing and enable them for both eager and incremental cooperative rebalancing
[ https://issues.apache.org/jira/browse/KAFKA-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias J. Sax updated KAFKA-8473: --- Affects Version/s: (was: 2.3) 2.3.0 > Adjust Connect system tests for incremental cooperative rebalancing and > enable them for both eager and incremental cooperative rebalancing > -- > > Key: KAFKA-8473 > URL: https://issues.apache.org/jira/browse/KAFKA-8473 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Affects Versions: 2.3.0 >Reporter: Konstantine Karantasis >Assignee: Konstantine Karantasis >Priority: Critical > Fix For: 2.3.0 > > > > {{connect.protocol=compatible}}, which enables incremental cooperative > rebalancing if all workers are on that version, is now the default option. > System tests should be parameterized to keep running for the eager > rebalancing protocol as well, to make sure no regressions have happened. > Also, for the incremental cooperative protocol, a few tests need to be > adjusted to have a lower rebalancing delay (the delay that is applied to wait > for returning workers). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-8475) Temporarily restore SslFactory.sslContext() helper
[ https://issues.apache.org/jira/browse/KAFKA-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias J. Sax updated KAFKA-8475: --- Fix Version/s: (was: 2.3) 2.3.0 > Temporarily restore SslFactory.sslContext() helper > -- > > Key: KAFKA-8475 > URL: https://issues.apache.org/jira/browse/KAFKA-8475 > Project: Kafka > Issue Type: Bug >Reporter: Colin P. McCabe >Assignee: Randall Hauch >Priority: Blocker > Fix For: 2.3.0 > > > Temporarily restore the SslFactory.sslContext() function, which some > connectors use. This function is not a public API and it will be removed > eventually. For now, we will mark it as deprecated. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-8473) Adjust Connect system tests for incremental cooperative rebalancing and enable them for both eager and incremental cooperative rebalancing
[ https://issues.apache.org/jira/browse/KAFKA-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias J. Sax updated KAFKA-8473: --- Fix Version/s: (was: 2.3) 2.3.0 > Adjust Connect system tests for incremental cooperative rebalancing and > enable them for both eager and incremental cooperative rebalancing > -- > > Key: KAFKA-8473 > URL: https://issues.apache.org/jira/browse/KAFKA-8473 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Affects Versions: 2.3 >Reporter: Konstantine Karantasis >Assignee: Konstantine Karantasis >Priority: Critical > Fix For: 2.3.0 > > > > {{connect.protocol=compatible}}, which enables incremental cooperative > rebalancing if all workers are on that version, is now the default option. > System tests should be parameterized to keep running for the eager > rebalancing protocol as well, to make sure no regressions have happened. > Also, for the incremental cooperative protocol, a few tests need to be > adjusted to have a lower rebalancing delay (the delay that is applied to wait > for returning workers). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7988) Flaky Test DynamicBrokerReconfigurationTest#testThreadPoolResize
[ https://issues.apache.org/jira/browse/KAFKA-7988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860597#comment-16860597 ] Matthias J. Sax commented on KAFKA-7988: Failed in 2.3: [https://builds.apache.org/blue/organizations/jenkins/kafka-2.3-jdk8/detail/kafka-2.3-jdk8/47/tests] {code:java} java.lang.AssertionError: expected:<{0=10, 1=11, 2=12, 3=13, 4=4, 5=5, 6=6, 7=7, 8=8, 9=9}> but was:<{0=10, 1=11, 2=12, 3=13, 4=14, 5=5, 6=6, 7=7, 8=8, 9=9}> at org.junit.Assert.fail(Assert.java:89) at org.junit.Assert.failNotEquals(Assert.java:835) at org.junit.Assert.assertEquals(Assert.java:120) at org.junit.Assert.assertEquals(Assert.java:146) at kafka.server.DynamicBrokerReconfigurationTest.stopAndVerifyProduceConsume(DynamicBrokerReconfigurationTest.scala:1353) at kafka.server.DynamicBrokerReconfigurationTest.verifyThreadPoolResize$1(DynamicBrokerReconfigurationTest.scala:615) at kafka.server.DynamicBrokerReconfigurationTest.testThreadPoolResize(DynamicBrokerReconfigurationTest.scala:629) {code} > Flaky Test DynamicBrokerReconfigurationTest#testThreadPoolResize > > > Key: KAFKA-7988 > URL: https://issues.apache.org/jira/browse/KAFKA-7988 > Project: Kafka > Issue Type: Bug > Components: core, unit tests >Affects Versions: 2.2.0 >Reporter: Matthias J. Sax >Assignee: Rajini Sivaram >Priority: Critical > Labels: flaky-test > Fix For: 2.4.0 > > > To get stable nightly builds for `2.2` release, I create tickets for all > observed test failures. 
> [https://builds.apache.org/blue/organizations/jenkins/kafka-2.2-jdk8/detail/kafka-2.2-jdk8/30/] > {quote}kafka.server.DynamicBrokerReconfigurationTest > testThreadPoolResize > FAILED java.lang.AssertionError: Invalid threads: expected 6, got 5: > List(ReplicaFetcherThread-0-0, ReplicaFetcherThread-0-1, > ReplicaFetcherThread-0-0, ReplicaFetcherThread-0-2, ReplicaFetcherThread-0-1) > at org.junit.Assert.fail(Assert.java:88) at > org.junit.Assert.assertTrue(Assert.java:41) at > kafka.server.DynamicBrokerReconfigurationTest.verifyThreads(DynamicBrokerReconfigurationTest.scala:1260) > at > kafka.server.DynamicBrokerReconfigurationTest.maybeVerifyThreadPoolSize$1(DynamicBrokerReconfigurationTest.scala:531) > at > kafka.server.DynamicBrokerReconfigurationTest.resizeThreadPool$1(DynamicBrokerReconfigurationTest.scala:550) > at > kafka.server.DynamicBrokerReconfigurationTest.reducePoolSize$1(DynamicBrokerReconfigurationTest.scala:536) > at > kafka.server.DynamicBrokerReconfigurationTest.$anonfun$testThreadPoolResize$3(DynamicBrokerReconfigurationTest.scala:559) > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158) at > kafka.server.DynamicBrokerReconfigurationTest.verifyThreadPoolResize$1(DynamicBrokerReconfigurationTest.scala:558) > at > kafka.server.DynamicBrokerReconfigurationTest.testThreadPoolResize(DynamicBrokerReconfigurationTest.scala:572){quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-7988) Flaky Test DynamicBrokerReconfigurationTest#testThreadPoolResize
[ https://issues.apache.org/jira/browse/KAFKA-7988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias J. Sax updated KAFKA-7988: --- Affects Version/s: 2.3.0 > Flaky Test DynamicBrokerReconfigurationTest#testThreadPoolResize > > > Key: KAFKA-7988 > URL: https://issues.apache.org/jira/browse/KAFKA-7988 > Project: Kafka > Issue Type: Bug > Components: core, unit tests >Affects Versions: 2.2.0, 2.3.0 >Reporter: Matthias J. Sax >Assignee: Rajini Sivaram >Priority: Critical > Labels: flaky-test > Fix For: 2.4.0 > > > To get stable nightly builds for `2.2` release, I create tickets for all > observed test failures. > [https://builds.apache.org/blue/organizations/jenkins/kafka-2.2-jdk8/detail/kafka-2.2-jdk8/30/] > {quote}kafka.server.DynamicBrokerReconfigurationTest > testThreadPoolResize > FAILED java.lang.AssertionError: Invalid threads: expected 6, got 5: > List(ReplicaFetcherThread-0-0, ReplicaFetcherThread-0-1, > ReplicaFetcherThread-0-0, ReplicaFetcherThread-0-2, ReplicaFetcherThread-0-1) > at org.junit.Assert.fail(Assert.java:88) at > org.junit.Assert.assertTrue(Assert.java:41) at > kafka.server.DynamicBrokerReconfigurationTest.verifyThreads(DynamicBrokerReconfigurationTest.scala:1260) > at > kafka.server.DynamicBrokerReconfigurationTest.maybeVerifyThreadPoolSize$1(DynamicBrokerReconfigurationTest.scala:531) > at > kafka.server.DynamicBrokerReconfigurationTest.resizeThreadPool$1(DynamicBrokerReconfigurationTest.scala:550) > at > kafka.server.DynamicBrokerReconfigurationTest.reducePoolSize$1(DynamicBrokerReconfigurationTest.scala:536) > at > kafka.server.DynamicBrokerReconfigurationTest.$anonfun$testThreadPoolResize$3(DynamicBrokerReconfigurationTest.scala:559) > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158) at > kafka.server.DynamicBrokerReconfigurationTest.verifyThreadPoolResize$1(DynamicBrokerReconfigurationTest.scala:558) > at > 
kafka.server.DynamicBrokerReconfigurationTest.testThreadPoolResize(DynamicBrokerReconfigurationTest.scala:572){quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-8078) Flaky Test TableTableJoinIntegrationTest#testInnerInner
[ https://issues.apache.org/jira/browse/KAFKA-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860596#comment-16860596 ] Matthias J. Sax commented on KAFKA-8078: Failed again with timeout: [https://builds.apache.org/blue/organizations/jenkins/kafka-2.3-jdk8/detail/kafka-2.3-jdk8/47/tests] testLeftOuter, caching enabled > Flaky Test TableTableJoinIntegrationTest#testInnerInner > --- > > Key: KAFKA-8078 > URL: https://issues.apache.org/jira/browse/KAFKA-8078 > Project: Kafka > Issue Type: Bug > Components: streams, unit tests >Affects Versions: 2.3.0 >Reporter: Matthias J. Sax >Priority: Major > Labels: flaky-test > Fix For: 2.4.0 > > > [https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk8/detail/kafka-trunk-jdk8/3445/tests] > {quote}java.lang.AssertionError: Condition not met within timeout 15000. > Never received expected final result. > at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:365) > at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:325) > at > org.apache.kafka.streams.integration.AbstractJoinIntegrationTest.runTest(AbstractJoinIntegrationTest.java:246) > at > org.apache.kafka.streams.integration.TableTableJoinIntegrationTest.testInnerInner(TableTableJoinIntegrationTest.java:196){quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-8041) Flaky Test LogDirFailureTest#testIOExceptionDuringLogRoll
[ https://issues.apache.org/jira/browse/KAFKA-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860594#comment-16860594 ] Matthias J. Sax commented on KAFKA-8041: Failed again: [https://builds.apache.org/blue/organizations/jenkins/kafka-2.0-jdk8/detail/kafka-2.0-jdk8/275/tests] > Flaky Test LogDirFailureTest#testIOExceptionDuringLogRoll > - > > Key: KAFKA-8041 > URL: https://issues.apache.org/jira/browse/KAFKA-8041 > Project: Kafka > Issue Type: Bug > Components: core, unit tests >Affects Versions: 2.0.1, 2.3.0 >Reporter: Matthias J. Sax >Priority: Critical > Labels: flaky-test > Fix For: 2.4.0 > > > [https://builds.apache.org/blue/organizations/jenkins/kafka-2.0-jdk8/detail/kafka-2.0-jdk8/236/tests] > {quote}java.lang.AssertionError: Expected some messages > at kafka.utils.TestUtils$.fail(TestUtils.scala:357) > at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:787) > at > kafka.server.LogDirFailureTest.testProduceAfterLogDirFailureOnLeader(LogDirFailureTest.scala:189) > at > kafka.server.LogDirFailureTest.testIOExceptionDuringLogRoll(LogDirFailureTest.scala:63){quote} > STDOUT > {quote}[2019-03-05 03:44:58,614] ERROR [ReplicaFetcher replicaId=1, > leaderId=0, fetcherId=0] Error for partition topic-6 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-03-05 03:44:58,614] ERROR [ReplicaFetcher replicaId=1, leaderId=0, > fetcherId=0] Error for partition topic-0 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-03-05 03:44:58,615] ERROR [ReplicaFetcher replicaId=1, leaderId=0, > fetcherId=0] Error for partition topic-10 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. 
> [2019-03-05 03:44:58,615] ERROR [ReplicaFetcher replicaId=1, leaderId=0, > fetcherId=0] Error for partition topic-4 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-03-05 03:44:58,615] ERROR [ReplicaFetcher replicaId=1, leaderId=0, > fetcherId=0] Error for partition topic-8 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-03-05 03:44:58,615] ERROR [ReplicaFetcher replicaId=1, leaderId=0, > fetcherId=0] Error for partition topic-2 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-03-05 03:45:00,248] ERROR Error while rolling log segment for topic-0 > in dir > /home/jenkins/jenkins-slave/workspace/kafka-2.0-jdk8/core/data/kafka-3869208920357262216 > (kafka.server.LogDirFailureChannel:76) > java.io.FileNotFoundException: > /home/jenkins/jenkins-slave/workspace/kafka-2.0-jdk8/core/data/kafka-3869208920357262216/topic-0/.index > (Not a directory) > at java.io.RandomAccessFile.open0(Native Method) > at java.io.RandomAccessFile.open(RandomAccessFile.java:316) > at java.io.RandomAccessFile.(RandomAccessFile.java:243) > at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:121) > at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:12) > at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251) > at kafka.log.AbstractIndex.resize(AbstractIndex.scala:115) > at kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:184) > at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:12) > at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251) > at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:184) > at 
kafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:501) > at kafka.log.Log.$anonfun$roll$8(Log.scala:1520) > at kafka.log.Log.$anonfun$roll$8$adapted(Log.scala:1520) > at scala.Option.foreach(Option.scala:257) > at kafka.log.Log.$anonfun$roll$2(Log.scala:1520) > at kafka.log.Log.maybeHandleIOException(Log.scala:1881) > at kafka.log.Log.roll(Log.scala:1484) > at > kafka.server.LogDirFailureTest.testProduceAfterLogDirFailureOnLeader(LogDirFailureTest.scala:154) > at > kafka.server.LogDirFailureTest.testIOExceptionDuringLogRoll(LogDirFailureTest.scala:63) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at >
[jira] [Comment Edited] (KAFKA-8516) Consider allowing all replicas to have read/write permissions
[ https://issues.apache.org/jira/browse/KAFKA-8516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860591#comment-16860591 ] Richard Yu edited comment on KAFKA-8516 at 6/11/19 6:16 AM: Well, this is when we start straying into an area called "consensus algorithms". Kafka's current leader-replica model loosely follows an algorithm referred to as Raft (research paper here: [https://raft.github.io/raft.pdf] ). If we wish to implement the write permissions part (which looks like a pretty big if), then we would perhaps have to consider something along the lines of EPaxos ( [https://www.cs.cmu.edu/~dga/papers/epaxos-sosp2013.pdf] ). cc [~hachikuji] your thoughts on this? Edit: If implemented correctly, there should be a pretty good performance gain (results in RAFT paper anyways seem to indicate this). was (Author: yohan123): Well, this is when we start straying into an area called "consensus algorithms". Kafka's current leader-replica model closely follows an algorithm referred to as Raft (research paper here: [https://raft.github.io/raft.pdf] ). If we wish to implement the write permissions part (which looks like a pretty big if), then we would perhaps have to consider something along the lines of EPaxos ( [https://www.cs.cmu.edu/~dga/papers/epaxos-sosp2013.pdf] ). cc [~hachikuji] your thoughts on this? Edit: If implemented correctly, there should be a pretty good performance gain (results in RAFT paper anyways seem to indicate this). > Consider allowing all replicas to have read/write permissions > - > > Key: KAFKA-8516 > URL: https://issues.apache.org/jira/browse/KAFKA-8516 > Project: Kafka > Issue Type: Improvement >Reporter: Richard Yu >Priority: Major > > Currently, in Kafka internals, a leader is responsible for all the read and > write operations requested by the user. 
This naturally incurs a bottleneck > since one replica, as the leader, would experience a significantly heavier > workload than other replicas and also means that all client commands must > pass through a chokepoint. If a leader fails, all processing effectively > comes to a halt until another leader election. In order to help solve this > problem, we could think about redesigning Kafka core so that any replica is > able to do read and write operations as well. That is, the system be changed > so that _all_ replicas have read/write permissions. > > This has multiple positives. Notably the following: > * Workload can be more evenly distributed since leader replicas are weighted > more than follower replicas (in this new design, all replicas are equal) > * Some failures would not be as catastrophic as in the leader-follower > paradigm. There is no one single "leader". If one replica goes down, others > are still able to read/write as needed. Processing could continue without > interruption. > The implementation for such a change like this will be very extensive and > discussion would be needed to decide if such an improvement as described > above would warrant such a drastic redesign of Kafka internals. > Relevant KIP for read permissions can be found here: > [https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (KAFKA-8516) Consider allowing all replicas to have read/write permissions
[ https://issues.apache.org/jira/browse/KAFKA-8516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860591#comment-16860591 ] Richard Yu edited comment on KAFKA-8516 at 6/11/19 6:15 AM: Well, this is when we start straying into an area called "consensus algorithms". Kafka's current leader-replica model closely follows an algorithm referred to as Raft (research paper here: [https://raft.github.io/raft.pdf] ). If we wish to implement the write permissions part (which looks like a pretty big if), then we would perhaps have to consider something along the lines of EPaxos ( [https://www.cs.cmu.edu/~dga/papers/epaxos-sosp2013.pdf] ). cc [~hachikuji] your thoughts on this? Edit: If implemented correctly, there should be a pretty good performance gain (results in RAFT paper anyways seem to indicate this). was (Author: yohan123): Well, this is when we start straying into an area called "consensus algorithms". Kafka's current leader-replica model closely follows an algorithm referred to as Raft (research paper here: [https://raft.github.io/raft.pdf] ). If we wish to implement the write permissions part (which looks like a pretty big if), then we would perhaps have to consider something along the lines of EPaxos ( [https://www.cs.cmu.edu/~dga/papers/epaxos-sosp2013.pdf] ). cc [~hachikuji] your thoughts on this? > Consider allowing all replicas to have read/write permissions > - > > Key: KAFKA-8516 > URL: https://issues.apache.org/jira/browse/KAFKA-8516 > Project: Kafka > Issue Type: Improvement >Reporter: Richard Yu >Priority: Major > > Currently, in Kafka internals, a leader is responsible for all the read and > write operations requested by the user. This naturally incurs a bottleneck > since one replica, as the leader, would experience a significantly heavier > workload than other replicas and also means that all client commands must > pass through a chokepoint. 
If a leader fails, all processing effectively > comes to a halt until another leader election. In order to help solve this > problem, we could think about redesigning Kafka core so that any replica is > able to do read and write operations as well. That is, the system be changed > so that _all_ replicas have read/write permissions. > > This has multiple positives. Notably the following: > * Workload can be more evenly distributed since leader replicas are weighted > more than follower replicas (in this new design, all replicas are equal) > * Some failures would not be as catastrophic as in the leader-follower > paradigm. There is no one single "leader". If one replica goes down, others > are still able to read/write as needed. Processing could continue without > interruption. > The implementation for such a change like this will be very extensive and > discussion would be needed to decide if such an improvement as described > above would warrant such a drastic redesign of Kafka internals. > Relevant KIP for read permissions can be found here: > [https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-8516) Consider allowing all replicas to have read/write permissions
[ https://issues.apache.org/jira/browse/KAFKA-8516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860591#comment-16860591 ] Richard Yu commented on KAFKA-8516: --- Well, this is when we start straying into an area called "consensus algorithms". Kafka's current leader-replica model closely follows an algorithm referred to as Raft (research paper here: [https://raft.github.io/raft.pdf] ). If we wish to implement the write permissions part (which looks like a pretty big if), then we would perhaps have to consider something along the lines of EPaxos ( [https://www.cs.cmu.edu/~dga/papers/epaxos-sosp2013.pdf] ). cc [~hachikuji] your thoughts on this? > Consider allowing all replicas to have read/write permissions > - > > Key: KAFKA-8516 > URL: https://issues.apache.org/jira/browse/KAFKA-8516 > Project: Kafka > Issue Type: Improvement >Reporter: Richard Yu >Priority: Major > > Currently, in Kafka internals, a leader is responsible for all the read and > write operations requested by the user. This naturally incurs a bottleneck, > since one replica, as the leader, experiences a significantly heavier > workload than the other replicas, and it also means that all client commands must > pass through a chokepoint. If a leader fails, all processing effectively > comes to a halt until another leader is elected. In order to help solve this > problem, we could think about redesigning Kafka core so that any replica is > able to do read and write operations as well. That is, the system would be changed > so that _all_ replicas have read/write permissions. > > This has multiple positives. Notably the following: > * Workload can be more evenly distributed, since currently leader replicas are weighted > more than follower replicas (in this new design, all replicas are equal) > * Some failures would not be as catastrophic as in the leader-follower > paradigm. There is no single "leader". If one replica goes down, others > are still able to read/write as needed. 
Processing could continue without > interruption. > The implementation of such a change will be very extensive, and > discussion would be needed to decide whether the improvement described > above warrants such a drastic redesign of Kafka internals. > The relevant KIP for read permissions can be found here: > [https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
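The leader bottleneck the ticket describes can be sketched in a few lines. This is a hypothetical toy model in Java, not Kafka's actual internals: the class names (`Replica`, `LeaderPartition`) and the forwarding logic are invented for illustration. The point is that every write, no matter which replica the client contacts, is funneled through the one leader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the single-leader write path (names invented for
// illustration; this is NOT Kafka's actual Partition/Replica code).
class Replica {
    final int id;
    final List<String> log = new ArrayList<>();
    Replica(int id) { this.id = id; }
}

class LeaderPartition {
    final List<Replica> replicas = new ArrayList<>();
    final Replica leader;

    LeaderPartition(int replicationFactor) {
        for (int i = 0; i < replicationFactor; i++) replicas.add(new Replica(i));
        leader = replicas.get(0); // exactly one leader per partition
    }

    // A write received by ANY replica is handled by the leader, which
    // appends first and then replicates to followers: the leader is the
    // chokepoint the ticket wants to remove.
    void write(Replica receivedBy, String record) {
        leader.log.add(record);
        for (Replica r : replicas)
            if (r != leader) r.log.add(record); // followers only copy
    }
}

public class LeaderBottleneck {
    public static void main(String[] args) {
        LeaderPartition p = new LeaderPartition(3);
        // Clients contact different replicas, but all traffic serializes
        // through the leader's log.
        p.write(p.replicas.get(1), "a");
        p.write(p.replicas.get(2), "b");
        System.out.println(p.leader.log);          // [a, b]
        System.out.println(p.replicas.get(1).log); // identical follower copy
    }
}
```

In this model, losing the leader stalls all writes until a new leader is chosen, which is exactly the failure mode the description calls out.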
[jira] [Commented] (KAFKA-8450) Augment processed in MockProcessor as KeyValueAndTimestamp
[ https://issues.apache.org/jira/browse/KAFKA-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860589#comment-16860589 ] Matthias J. Sax commented on KAFKA-8450: Yes. Instead of calling `makeRecord`, which returns a String, the idea is to use `KeyValueTimestamp`. > Augment processed in MockProcessor as KeyValueAndTimestamp > -- > > Key: KAFKA-8450 > URL: https://issues.apache.org/jira/browse/KAFKA-8450 > Project: Kafka > Issue Type: Improvement > Components: streams, unit tests >Reporter: Guozhang Wang >Assignee: SuryaTeja Duggi >Priority: Major > Labels: newbie > > Today the book-keeping list of `processed` records in MockProcessor is in the > form of String: we just call the key/value types' toString > functions in order to book-keep. This loses the type information and does not > retain the timestamp associated with each record. > It's better to translate its type to `KeyValueAndTimestamp` and refactor the > impacted unit tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
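A minimal sketch of what the refactor might look like. The `KeyValueTimestamp` class below is a hand-rolled stand-in for the Streams class of the same name, and `MockProcessorSketch` reduces MockProcessor to just its book-keeping list, so this is illustrative rather than Kafka's actual code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hand-rolled stand-in for Streams' KeyValueTimestamp: it keeps the key
// and value as typed fields (instead of a flattened String) and carries
// the record timestamp, which the String form dropped.
class KeyValueTimestamp<K, V> {
    final K key;
    final V value;
    final long timestamp;

    KeyValueTimestamp(K key, V value, long timestamp) {
        this.key = key;
        this.value = value;
        this.timestamp = timestamp;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof KeyValueTimestamp)) return false;
        KeyValueTimestamp<?, ?> other = (KeyValueTimestamp<?, ?>) o;
        return Objects.equals(key, other.key)
                && Objects.equals(value, other.value)
                && timestamp == other.timestamp;
    }

    @Override
    public int hashCode() { return Objects.hash(key, value, timestamp); }
}

// Reduced MockProcessor: instead of processed.add(key + ":" + value),
// it book-keeps typed records, so tests can assert on keys, values and
// timestamps directly without parsing Strings.
class MockProcessorSketch<K, V> {
    final List<KeyValueTimestamp<K, V>> processed = new ArrayList<>();

    void process(K key, V value, long timestamp) {
        processed.add(new KeyValueTimestamp<>(key, value, timestamp));
    }
}

public class MockProcessorDemo {
    public static void main(String[] args) {
        MockProcessorSketch<String, Integer> p = new MockProcessorSketch<>();
        p.process("k1", 42, 100L);
        System.out.println(p.processed.get(0).key);       // k1
        System.out.println(p.processed.get(0).timestamp); // 100
    }
}
```

With typed records, impacted unit tests compare whole `KeyValueTimestamp` objects via `equals` rather than comparing concatenated Strings.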
[jira] [Commented] (KAFKA-8513) Add kafka-streams-application-reset.bat for Windows platform
[ https://issues.apache.org/jira/browse/KAFKA-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860588#comment-16860588 ] Matthias J. Sax commented on KAFKA-8513: Ack. Fine with me. The PR looks good in general. > Add kafka-streams-application-reset.bat for Windows platform > > > Key: KAFKA-8513 > URL: https://issues.apache.org/jira/browse/KAFKA-8513 > Project: Kafka > Issue Type: Improvement > Components: streams, tools >Reporter: Kengo Seki >Assignee: Kengo Seki >Priority: Minor > > For improving Windows support, it'd be nice if there were a batch file > corresponding to bin/kafka-streams-application-reset.sh. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
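For context, the existing Windows wrappers in bin/windows all delegate to kafka-run-class.bat, so the new batch file would presumably follow the same one-liner pattern as bin/kafka-streams-application-reset.sh (which invokes `kafka.tools.StreamsResetter`). A sketch, with the Apache license header omitted:

```bat
@echo off
rem Sketch of bin/windows/kafka-streams-application-reset.bat, mirroring
rem bin/kafka-streams-application-reset.sh: delegate to kafka-run-class.bat
rem with the StreamsResetter main class, passing all arguments through.
"%~dp0kafka-run-class.bat" kafka.tools.StreamsResetter %*
```

Here `%~dp0` expands to the directory of the batch file itself, matching the `$(dirname $0)` idiom the shell script uses.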
[jira] [Commented] (KAFKA-8516) Consider allowing all replicas to have read/write permissions
[ https://issues.apache.org/jira/browse/KAFKA-8516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860587#comment-16860587 ] Matthias J. Sax commented on KAFKA-8516: Agreed. In fact, I am not even sure if we can (or want to) allow writing to different replicas at all. Solving the consistency problem is very, very(!) hard, and might not be possible without a major performance hit. Hence, I tend to think that it will never be implemented. > Consider allowing all replicas to have read/write permissions > - > > Key: KAFKA-8516 > URL: https://issues.apache.org/jira/browse/KAFKA-8516 > Project: Kafka > Issue Type: Improvement >Reporter: Richard Yu >Priority: Major > > Currently, in Kafka internals, a leader is responsible for all the read and > write operations requested by the user. This naturally incurs a bottleneck, > since one replica, as the leader, experiences a significantly heavier > workload than the other replicas, and it also means that all client commands must > pass through a chokepoint. If a leader fails, all processing effectively > comes to a halt until another leader is elected. In order to help solve this > problem, we could think about redesigning Kafka core so that any replica is > able to do read and write operations as well. That is, the system would be changed > so that _all_ replicas have read/write permissions. > > This has multiple positives. Notably the following: > * Workload can be more evenly distributed, since currently leader replicas are weighted > more than follower replicas (in this new design, all replicas are equal) > * Some failures would not be as catastrophic as in the leader-follower > paradigm. There is no single "leader". If one replica goes down, others > are still able to read/write as needed. Processing could continue without > interruption. 
> The implementation of such a change will be very extensive, and > discussion would be needed to decide whether the improvement described > above warrants such a drastic redesign of Kafka internals. > The relevant KIP for read permissions can be found here: > [https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
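The consistency concern raised in the comment above can be made concrete with a toy sketch (hypothetical Java, not Kafka code): if two replicas accept writes independently and without coordination, their logs can diverge at the same offset, and readers see different histories depending on which replica they contact. Avoiding this up front is what a consensus round per write (Raft- or EPaxos-style) buys, and where the performance hit comes from.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of why leaderless writes are hard: two replicas each
// accept a client write concurrently without coordinating, so their logs
// diverge at offset 0. Reconciling this after the fact is exactly the
// problem consensus protocols like Raft or EPaxos solve before the fact.
public class DivergenceDemo {
    static class Replica {
        final List<String> log = new ArrayList<>();
        void acceptWrite(String record) { log.add(record); } // no coordination
    }

    public static void main(String[] args) {
        Replica r1 = new Replica();
        Replica r2 = new Replica();

        // Two clients write to two different replicas "at the same time".
        r1.acceptWrite("x=1");
        r2.acceptWrite("x=2");

        // Offset 0 now holds different records on different replicas, so
        // consumers would observe conflicting histories.
        System.out.println("r1[0] = " + r1.log.get(0)); // x=1
        System.out.println("r2[0] = " + r2.log.get(0)); // x=2
        System.out.println("diverged = " + !r1.log.equals(r2.log)); // true
    }
}
```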