[jira] [Created] (KAFKA-10509) Add metric to track throttle time due to hitting connection rate quota
Anna Povzner created KAFKA-10509: Summary: Add metric to track throttle time due to hitting connection rate quota Key: KAFKA-10509 URL: https://issues.apache.org/jira/browse/KAFKA-10509 Project: Kafka Issue Type: Improvement Components: core Reporter: Anna Povzner Assignee: Anna Povzner Fix For: 2.7.0 See KIP-612. kafka.network:type=socket-server-metrics,name=connection-accept-throttle-time,listener={listenerName} * Type: SampledStat.Avg * Description: Average throttle time due to violating per-listener or broker-wide connection acceptance rate quota on a given listener. -- This message was sent by Atlassian Jira (v8.3.4#803005)
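The metric above is a SampledStat.Avg, i.e., an average over time-windowed samples. A minimal Python sketch of that idea (illustrative only, not Kafka's org.apache.kafka.common.metrics implementation; the window size and sample count are assumptions):

```python
class SampledAvg:
    """Toy windowed average: samples older than window_ms * num_samples
    are dropped, and the metric value is the average of the rest."""
    def __init__(self, window_ms=30_000, num_samples=2):
        self.retention_ms = window_ms * num_samples
        self.samples = []  # list of (timestamp_ms, value)

    def record(self, value, now_ms):
        self.samples.append((now_ms, value))

    def measure(self, now_ms):
        live = [v for (t, v) in self.samples
                if now_ms - t <= self.retention_ms]
        return sum(live) / len(live) if live else 0.0

avg = SampledAvg()
avg.record(100.0, now_ms=0)      # a 100 ms accept throttle
avg.record(300.0, now_ms=1000)   # a 300 ms accept throttle
print(avg.measure(now_ms=2000))  # -> 200.0
```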
[jira] [Resolved] (KAFKA-10458) Need a way to update quota for TokenBucket registered with Sensor
[ https://issues.apache.org/jira/browse/KAFKA-10458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner resolved KAFKA-10458. -- Resolution: Fixed > Need a way to update quota for TokenBucket registered with Sensor > - > > Key: KAFKA-10458 > URL: https://issues.apache.org/jira/browse/KAFKA-10458 > Project: Kafka > Issue Type: Improvement > Components: clients >Reporter: Anna Povzner >Assignee: David Jacot >Priority: Major > Fix For: 2.7.0 > > > For Rate() metric with quota config, we update quota by updating config of > KafkaMetric. However, it is not enough for TokenBucket, because it uses quota > config on record() to properly calculate the number of tokens. Sensor passes > config stored in the corresponding StatAndConfig, which currently never > changes. This means that after updating quota via KafkaMetric.config, our > current and only method, Sensor will record the value using old quota but > then measure the value to check for quota violation using the new quota > value. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-10458) Need a way to update quota for TokenBucket registered with Sensor
Anna Povzner created KAFKA-10458: Summary: Need a way to update quota for TokenBucket registered with Sensor Key: KAFKA-10458 URL: https://issues.apache.org/jira/browse/KAFKA-10458 Project: Kafka Issue Type: Improvement Components: clients Reporter: Anna Povzner Assignee: Anna Povzner Fix For: 2.7.0 For the Rate() metric with a quota config, we update the quota by updating the config of KafkaMetric. However, this is not enough for TokenBucket, because it uses the quota config on record() to properly calculate the number of tokens. Sensor passes the config stored in the corresponding StatAndConfig, which currently never changes. This means that after updating the quota via KafkaMetric.config (our current and only method), Sensor will record the value using the old quota but then measure the value to check for quota violation using the new quota value. -- This message was sent by Atlassian Jira (v8.3.4#803005)
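Why record() needs the current quota can be seen with a toy token bucket. This is a hedged Python sketch, not Kafka's TokenBucket: both the refill rate and the capacity are derived from the quota value, so a record() that still sees a stale config computes the wrong number of tokens:

```python
class TokenBucket:
    """Illustrative token bucket: credits return at `quota` per second,
    so record() must see the *current* quota; the bug described above
    is that the Sensor kept handing it a stale config snapshot."""
    def __init__(self, quota, burst_seconds=10):
        self.quota = quota                    # permits per second
        self.max_tokens = quota * burst_seconds
        self.tokens = self.max_tokens         # start full
        self.last_ms = 0

    def update_quota(self, quota, burst_seconds=10):
        # Both refill rate and capacity depend on the quota value.
        self.quota = quota
        self.max_tokens = quota * burst_seconds

    def record(self, cost, now_ms):
        # Refill using the quota in effect *now*, then spend.
        elapsed_s = (now_ms - self.last_ms) / 1000.0
        self.tokens = min(self.max_tokens,
                          self.tokens + elapsed_s * self.quota)
        self.last_ms = now_ms
        self.tokens -= cost

    def in_violation(self):
        return self.tokens < 0

bucket = TokenBucket(quota=5)
bucket.record(cost=10, now_ms=0)
print(bucket.in_violation())  # False: 50 tokens - 10 = 40 remain
```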
[jira] [Created] (KAFKA-10157) Multiple tests failed due to "Failed to process feature ZK node change event"
Anna Povzner created KAFKA-10157: Summary: Multiple tests failed due to "Failed to process feature ZK node change event" Key: KAFKA-10157 URL: https://issues.apache.org/jira/browse/KAFKA-10157 Project: Kafka Issue Type: Bug Reporter: Anna Povzner Multiple tests failed due to "Failed to process feature ZK node change event". Looks like a result of merge of this PR: [https://github.com/apache/kafka/pull/8680] Note that running tests without `--info` gives output like this one: {quote}Process 'Gradle Test Executor 36' finished with non-zero exit value 1 {quote} kafka.network.DynamicConnectionQuotaTest failed: {quote} kafka.network.DynamicConnectionQuotaTest > testDynamicConnectionQuota STANDARD_OUT [2020-06-11 20:52:42,596] ERROR [feature-zk-node-event-process-thread]: Failed to process feature ZK node change event. The broker will eventually exit. (kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread:76) java.lang.InterruptedException at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056) at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2090) at java.base/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:433) at kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread.$anonfun$doWork$1(FinalizedFeatureChangeListener.scala:147){quote} kafka.api.CustomQuotaCallbackTest failed: {quote} [2020-06-11 21:07:36,745] ERROR [feature-zk-node-event-process-thread]: Failed to process feature ZK node change event. The broker will eventually exit. 
(kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread:76) java.lang.InterruptedException at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056) at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2090) at java.base/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:433) at kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread.$anonfun$doWork$1(FinalizedFeatureChangeListener.scala:147) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18) at scala.util.control.Exception$Catch.apply(Exception.scala:227) at kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread.doWork(FinalizedFeatureChangeListener.scala:147) at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18) at scala.util.control.Exception$Catch.apply(Exception.scala:227) at kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread.doWork(FinalizedFeatureChangeListener.scala:147) at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96) {quote} kafka.server.DynamicBrokerReconfigurationTest failed: {quote} [2020-06-11 21:13:01,207] ERROR [feature-zk-node-event-process-thread]: Failed to process feature ZK node change event. The broker will eventually exit. 
(kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread:76) java.lang.InterruptedException at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056) at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2090) at java.base/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:433) at kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread.$anonfun$doWork$1(FinalizedFeatureChangeListener.scala:147) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18) at scala.util.control.Exception$Catch.apply(Exception.scala:227) at kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread.doWork(FinalizedFeatureChangeListener.scala:147) at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96) {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-10024) Add dynamic configuration and enforce quota for per-IP connection rate limits
Anna Povzner created KAFKA-10024: Summary: Add dynamic configuration and enforce quota for per-IP connection rate limits Key: KAFKA-10024 URL: https://issues.apache.org/jira/browse/KAFKA-10024 Project: Kafka Issue Type: Improvement Components: core Reporter: Anna Povzner Assignee: Anna Povzner This JIRA is for the second part of KIP-612 – Add per-IP connection creation rate limits. As described here: [https://cwiki.apache.org/confluence/display/KAFKA/KIP-612%3A+Ability+to+Limit+Connection+Creation+Rate+on+Brokers] -- This message was sent by Atlassian Jira (v8.3.4#803005)
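A minimal sketch of the per-IP limiting idea (illustrative Python, not the KIP-612 implementation; the one-second sliding window and method names are assumptions):

```python
from collections import defaultdict, deque

class PerIpConnectionRateLimiter:
    """Toy per-IP connection-rate limit: reject an accept when the
    source IP already opened max_per_second connections within the
    last second. The real broker throttles rather than rejects."""
    def __init__(self, max_per_second):
        self.max_per_second = max_per_second
        self.recent = defaultdict(deque)  # ip -> accept timestamps (ms)

    def try_accept(self, ip, now_ms):
        window = self.recent[ip]
        while window and now_ms - window[0] >= 1000:
            window.popleft()              # drop accepts older than 1s
        if len(window) >= self.max_per_second:
            return False                  # caller would delay this accept
        window.append(now_ms)
        return True

limiter = PerIpConnectionRateLimiter(max_per_second=2)
print(limiter.try_accept("10.0.0.1", 0))     # True
print(limiter.try_accept("10.0.0.1", 10))    # True
print(limiter.try_accept("10.0.0.1", 20))    # False (3rd in one second)
print(limiter.try_accept("10.0.0.1", 1010))  # True (old accepts expired)
```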
[jira] [Created] (KAFKA-10023) Enforce broker-wide and per-listener connection creation rate (KIP-612, part 1)
Anna Povzner created KAFKA-10023: Summary: Enforce broker-wide and per-listener connection creation rate (KIP-612, part 1) Key: KAFKA-10023 URL: https://issues.apache.org/jira/browse/KAFKA-10023 Project: Kafka Issue Type: Improvement Components: core Reporter: Anna Povzner Assignee: Anna Povzner This JIRA is for the first part of KIP-612 – Add an ability to configure and enforce broker-wide and per-listener connection creation rate. As described here: [https://cwiki.apache.org/confluence/display/KAFKA/KIP-612%3A+Ability+to+Limit+Connection+Creation+Rate+on+Brokers] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-9839) IllegalStateException on metadata update when broker learns about its new epoch after the controller
Anna Povzner created KAFKA-9839: --- Summary: IllegalStateException on metadata update when broker learns about its new epoch after the controller Key: KAFKA-9839 URL: https://issues.apache.org/jira/browse/KAFKA-9839 Project: Kafka Issue Type: Bug Components: controller, core Affects Versions: 2.3.1 Reporter: Anna Povzner Broker throws "java.lang.IllegalStateException: Epoch XXX larger than current broker epoch YYY" on UPDATE_METADATA when the controller learns about the broker epoch and sends UPDATE_METADATA before KafkaZkClient.registerBroker completes (i.e., before the broker learns about its new epoch). Here is the scenario we observed in more detail: 1. ZK session expires on broker 1 2. Broker 1 establishes a new session to ZK and creates its znode 3. Controller learns about broker 1 and assigns an epoch 4. Broker 1 receives UPDATE_METADATA from the controller, but it does not know about its new epoch yet, so we get an exception: ERROR [KafkaApi-3] Error when handling request: clientId=1, correlationId=0, api=UPDATE_METADATA, body={ . java.lang.IllegalStateException: Epoch XXX larger than current broker epoch YYY at kafka.server.KafkaApis.isBrokerEpochStale(KafkaApis.scala:2725) at kafka.server.KafkaApis.handleUpdateMetadataRequest(KafkaApis.scala:320) at kafka.server.KafkaApis.handle(KafkaApis.scala:139) at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69) at java.lang.Thread.run(Thread.java:748) 5. KafkaZkClient.registerBroker completes on broker 1: "INFO Stat of the created znode at /brokers/ids/1" The result is that the broker has stale metadata for some time. Possible solutions: 1. Broker returns a more specific error and the controller retries UPDATE_METADATA 2. Broker accepts UPDATE_METADATA with the larger broker epoch. -- This message was sent by Atlassian Jira (v8.3.4#803005)
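A toy model of the epoch check involved (simplified Python, not KafkaApis code; the returned labels are made up). Today only the "smaller epoch" case is an expected condition, while the "larger epoch" case produced by the race above surfaces as an IllegalStateException:

```python
def check_broker_epoch(request_epoch, current_epoch):
    # Illustrative model of the race described above: the controller can
    # legitimately learn the broker's new epoch before the broker's own
    # registerBroker call returns, so a larger request epoch is a benign
    # race, not an invariant violation.
    if request_epoch < current_epoch:
        return "STALE_BROKER_EPOCH"      # reject a stale controller request
    if request_epoch > current_epoch:
        # Today this raises IllegalStateException; the proposals above are
        # to return a retryable error or accept the newer epoch.
        return "EPOCH_AHEAD_OF_BROKER"
    return "OK"

print(check_broker_epoch(9, 7))  # EPOCH_AHEAD_OF_BROKER
```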
[jira] [Created] (KAFKA-9677) Low consume bandwidth quota may cause consumer not being able to fetch data
Anna Povzner created KAFKA-9677: --- Summary: Low consume bandwidth quota may cause consumer not being able to fetch data Key: KAFKA-9677 URL: https://issues.apache.org/jira/browse/KAFKA-9677 Project: Kafka Issue Type: Bug Components: core Affects Versions: 2.3.1, 2.4.0, 2.2.2, 2.1.1, 2.0.1 Reporter: Anna Povzner Assignee: Anna Povzner When we changed quota communication with KIP-219, fetch requests get throttled by returning an empty response with the delay in `throttle_time_ms`, and the Kafka consumer retries again after the delay. With default configs, the maximum fetch size could be as big as 50MB (or 10MB per partition). The default broker config (1-second window, 10 full windows of tracked bandwidth/thread utilization usage) means that a < 5MB/s consumer quota (per broker) may stop a fetch request from ever being successful. Or the other way around: a 1 MB/s consumer quota (per broker) means that any fetch request that gets >= 10MB of data (10 seconds * 1MB/second) in the response will never get through. Proposed fix: Return less data in the fetch response in this case: cap `fetchMaxBytes` passed to replicaManager.fetchMessages() from KafkaApis.handleFetchRequest() to the quota multiplied by the full quota sample window (quota window size * number of windows). In the example of default configs and a 1MB/s consumer bandwidth quota, fetchMaxBytes will be 10MB. -- This message was sent by Atlassian Jira (v8.3.4#803005)
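The arithmetic above can be made concrete. Assuming the cap is quota × window size × number of windows (an assumption that matches the 1 MB/s → 10 MB example in the report), a sketch:

```python
def max_fetchable_bytes(quota_bytes_per_sec, window_size_s=1, num_windows=10):
    # The bandwidth sample window spans window_size_s * num_windows
    # seconds, so any single response larger than quota * total_window
    # keeps the measured rate above quota no matter when it is retried;
    # capping the fetch size at this value lets it eventually succeed.
    return quota_bytes_per_sec * window_size_s * num_windows

# 1 MB/s quota with the default 10 x 1-second windows:
print(max_fetchable_bytes(1 * 1024 * 1024))  # 10485760 bytes = 10 MB
```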
[jira] [Created] (KAFKA-9658) Removing default user quota doesn't take effect until broker restart
Anna Povzner created KAFKA-9658: --- Summary: Removing default user quota doesn't take effect until broker restart Key: KAFKA-9658 URL: https://issues.apache.org/jira/browse/KAFKA-9658 Project: Kafka Issue Type: Bug Affects Versions: 2.3.1, 2.4.0, 2.2.2, 2.1.1, 2.0.1 Reporter: Anna Povzner Assignee: Anna Povzner To reproduce (for any quota type: produce, consume, and request): Example with consumer quota, assuming no user/client quotas are set initially. 1. Set default user consumer quota: ./kafka-configs.sh --zookeeper --alter --add-config 'consumer_byte_rate=1' --entity-type users --entity-default 2. Send some consume load for some user, say user1. 3. Remove the default user consumer quota using: ./kafka-configs.sh --zookeeper --alter --delete-config 'consumer_byte_rate' --entity-type users --entity-default Result: --describe (as below) returns the correct result that there is no quota, but the quota bound in ClientQuotaManager.metrics does not get updated for users that were sending load, which causes the broker to continue throttling requests with the previously set quota. /opt/confluent/bin/kafka-configs.sh --zookeeper --describe --entity-type users --entity-default -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (KAFKA-8800) Flaky Test SaslScramSslEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl
[ https://issues.apache.org/jira/browse/KAFKA-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner reopened KAFKA-8800: - Assignee: Anastasia Vela (was: Lee Dongjin) > Flaky Test > SaslScramSslEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl > -- > > Key: KAFKA-8800 > URL: https://issues.apache.org/jira/browse/KAFKA-8800 > Project: Kafka > Issue Type: Bug > Components: core, security, unit tests >Affects Versions: 2.4.0 >Reporter: Matthias J. Sax >Assignee: Anastasia Vela >Priority: Critical > Labels: flaky-test > Fix For: 2.4.0, 2.3.1 > > > [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/6956/testReport/junit/kafka.api/SaslScramSslEndToEndAuthorizationTest/testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl/] > {quote}org.scalatest.exceptions.TestFailedException: Consumed 0 records > before timeout instead of the expected 1 records at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) > at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389) > at org.scalatest.Assertions.fail(Assertions.scala:1091) at > org.scalatest.Assertions.fail$(Assertions.scala:1087) at > org.scalatest.Assertions$.fail(Assertions.scala:1389) at > kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:822) at > kafka.utils.TestUtils$.pollRecordsUntilTrue(TestUtils.scala:781) at > kafka.utils.TestUtils$.pollUntilAtLeastNumRecords(TestUtils.scala:1312) at > kafka.utils.TestUtils$.consumeRecords(TestUtils.scala:1320) at > kafka.api.EndToEndAuthorizationTest.consumeRecords(EndToEndAuthorizationTest.scala:522) > at > kafka.api.EndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl(EndToEndAuthorizationTest.scala:361){quote} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (KAFKA-8837) KafkaMetricReporterClusterIdTest may not shutdown ZooKeeperTestHarness
Anna Povzner created KAFKA-8837: --- Summary: KafkaMetricReporterClusterIdTest may not shutdown ZooKeeperTestHarness Key: KAFKA-8837 URL: https://issues.apache.org/jira/browse/KAFKA-8837 Project: Kafka Issue Type: Bug Components: core Reporter: Anna Povzner Assignee: Anastasia Vela The @After method in KafkaMetricReporterClusterIdTest calls `TestUtils.verifyNonDaemonThreadsStatus(this.getClass.getName)` before it calls tearDown on ZooKeeperTestHarness (which shuts down ZK and the ZK client). If verifyNonDaemonThreadsStatus asserts, the rest of the resources will not get cleaned up. We should move `TestUtils.verifyNonDaemonThreadsStatus(this.getClass.getName)` to the end of `tearDown()`. However, it would also be good to prevent people from using this method in tear down similarly in the future. Maybe just adding a comment would help here. -- This message was sent by Atlassian Jira (v8.3.2#803003)
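The proposed ordering, as a language-agnostic sketch in Python (the function and class names are hypothetical, not the actual Scala test code):

```python
def tear_down(harness, verify_no_daemon_threads):
    # Proposed ordering: release the ZK harness and brokers first; only
    # then run the thread-leak assertion, so a failed assertion can no
    # longer leave ZooKeeper and the ZK client running.
    harness.tear_down()
    verify_no_daemon_threads()

calls = []
class FakeHarness:
    def tear_down(self):
        calls.append("harness")

tear_down(FakeHarness(), lambda: calls.append("verify"))
print(calls)  # ['harness', 'verify']
```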
[jira] [Created] (KAFKA-8782) ReplicationQuotaManagerTest and ClientQuotaManagerTest should close Metrics object
Anna Povzner created KAFKA-8782: --- Summary: ReplicationQuotaManagerTest and ClientQuotaManagerTest should close Metrics object Key: KAFKA-8782 URL: https://issues.apache.org/jira/browse/KAFKA-8782 Project: Kafka Issue Type: Bug Components: core Affects Versions: 2.2.1 Reporter: Anna Povzner ReplicationQuotaManagerTest and ClientQuotaManagerTest create Metrics objects in several tests, but do not close() them at the end. It would be good to clean up resources in those tests, which also helps with reducing overall test flakiness. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (KAFKA-8526) Broker may select a failed dir for new replica even in the presence of other live dirs
Anna Povzner created KAFKA-8526: --- Summary: Broker may select a failed dir for new replica even in the presence of other live dirs Key: KAFKA-8526 URL: https://issues.apache.org/jira/browse/KAFKA-8526 Project: Kafka Issue Type: Bug Affects Versions: 2.2.1, 2.1.1, 2.0.1, 1.1.1, 2.3.0 Reporter: Anna Povzner Suppose a broker is configured with multiple log dirs. One of the log dirs fails, but there is no load on that dir, so the broker does not know about the failure yet, _i.e._, the failed dir is still in LogManager#_liveLogDirs. Suppose a new topic gets created, and the controller chooses the broker with the failed log dir to host one of the replicas. The broker gets a LeaderAndIsr request with the isNew flag set. LogManager#getOrCreateLog() selects a log dir for the new replica from _liveLogDirs, then one of two things can happen: 1) getAbsolutePath can fail, in which case getOrCreateLog will throw an IOException 2) Creating the directory for the new replica log may fail (_e.g._, if the directory became read-only after getAbsolutePath succeeded). In both cases, the selected dir will be marked offline (which is correct). However, LeaderAndIsr will return an error and the replica will be marked offline, even though the broker may have other live dirs. *Proposed solution*: The broker should retry selecting a dir for the new replica if the initially selected dir threw an IOException when trying to create a directory for the new replica. We should be able to do that in the LogManager#getOrCreateLog() method, but keep in mind that logDirFailureChannel.maybeAddOfflineLogDir does not synchronously remove the dir from _liveLogDirs. So, it makes sense to select the initial dir by calling LogManager#nextLogDir (current implementation), but if we fail to create the log on that dir, one approach is to select the next dir from _liveLogDirs in round-robin fashion (until we get back to the initial log dir – the case where all dirs failed). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
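A sketch of the proposed round-robin retry in Python (the helper and its signature are hypothetical, not LogManager's actual API); a plain file stands in for a failed log dir:

```python
import os
import tempfile

def create_log_dir(live_dirs, partition, initial_index):
    # Try the dir nextLogDir picked first; on an IO error, advance
    # round-robin through the remaining live dirs instead of
    # immediately failing the replica.
    errors = []
    n = len(live_dirs)
    for step in range(n):
        d = live_dirs[(initial_index + step) % n]
        try:
            path = os.path.join(d, partition)
            os.makedirs(path)            # fails if the dir is bad
            return path
        except OSError as err:
            errors.append((d, err))      # real broker: mark d offline
    raise OSError(f"all {n} log dirs failed: {errors}")

# A regular file standing in for a failed (unwritable) log dir:
bad = tempfile.NamedTemporaryFile(delete=False).name
good = tempfile.mkdtemp()
created = create_log_dir([bad, good], "topic-0", initial_index=0)
print(created == os.path.join(good, "topic-0"))  # True
```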
[jira] [Created] (KAFKA-8481) Clients may fetch incomplete set of topic partitions just after topic is created
Anna Povzner created KAFKA-8481: --- Summary: Clients may fetch incomplete set of topic partitions just after topic is created Key: KAFKA-8481 URL: https://issues.apache.org/jira/browse/KAFKA-8481 Project: Kafka Issue Type: Bug Affects Versions: 2.2.1 Reporter: Anna Povzner KafkaConsumer#partitionsFor() or AdminClient#describeTopics() may return an incomplete set of partitions for the given topic if the topic just got created. Cause: When a topic gets created, in most cases, the controller sends the partitions of this topic via several UpdateMetadataRequests (vs. one UpdateMetadataRequest with all partitions). The first UpdateMetadataRequest contains the partitions for which this broker hosts replicas, followed by one or more UpdateMetadataRequests for the remaining partitions. This means that if a broker gets a topic metadata request between the first and last UpdateMetadataRequest, the response will contain only a subset of the topic's partitions. Proposed fix: In KafkaController#processTopicChange(), before calling onNewPartitionCreation(), send an UpdateMetadataRequest with the partitions of the new topics (addedPartitionReplicaAssignment) to all live brokers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-8480) Clients may fetch incomplete set of topic partitions during cluster startup
Anna Povzner created KAFKA-8480: --- Summary: Clients may fetch incomplete set of topic partitions during cluster startup Key: KAFKA-8480 URL: https://issues.apache.org/jira/browse/KAFKA-8480 Project: Kafka Issue Type: Bug Affects Versions: 2.2.1 Reporter: Anna Povzner Assignee: Anna Povzner KafkaConsumer#partitionsFor() or AdminClient#describeTopics() may not return all partitions for a given topic when the cluster is starting up (after the cluster was down). The cause is the controller, on becoming controller, sending an UpdateMetadataRequest for all partitions with at least one online replica, and then a separate UpdateMetadataRequest for all partitions with at least one offline replica. If a client sends a metadata request while a broker is between processing those two update metadata requests, the client will get an incomplete set of partitions. Proposed fix: the controller should send one UpdateMetadataRequest (containing all partitions) in ReplicaStateMachine#startup(). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-8002) Replica reassignment to new log dir may not complete if future and current replicas segment files have different base offsets
Anna Povzner created KAFKA-8002: --- Summary: Replica reassignment to new log dir may not complete if future and current replicas segment files have different base offsets Key: KAFKA-8002 URL: https://issues.apache.org/jira/browse/KAFKA-8002 Project: Kafka Issue Type: Bug Components: core Affects Versions: 2.1.1 Reporter: Anna Povzner Once the future replica fetches up to the log end offset, the intended logic is to finish the move (rename the future dir to the current replica dir, etc). However, the check in Partition.maybeReplaceCurrentWithFutureReplica compares the whole LogOffsetMetadata rather than just the log end offset. The resulting behavior is that the re-assignment will not finish for topic partitions that were cleaned/compacted such that the base offset of the last segment differs between the current and future replica. The proposed fix is to compare only the log end offsets of the current and future replica. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-8001) Fetch from future replica stalls when local replica becomes a leader
Anna Povzner created KAFKA-8001: --- Summary: Fetch from future replica stalls when local replica becomes a leader Key: KAFKA-8001 URL: https://issues.apache.org/jira/browse/KAFKA-8001 Project: Kafka Issue Type: Bug Components: core Affects Versions: 2.1.1 Reporter: Anna Povzner With KIP-320, a fetch from a follower / future replica returns FENCED_LEADER_EPOCH if the current leader epoch in the request is lower than the leader epoch known to the leader (or to the local replica, in the case of future replica fetching). In the case of a future replica fetching from the local replica, if the local replica becomes the leader of the partition, the next fetch from the future replica fails with FENCED_LEADER_EPOCH, and fetching from the future replica is stopped until the next leader change. Proposed solution: on a local replica leader change, the future replica should "become a follower" again and go through the truncation phase. Or we could optimize this and just update the partition state of the future replica to reflect the updated current leader epoch. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7923) Add unit test to verify Kafka-7401 in AK versions >= 2.0
Anna Povzner created KAFKA-7923: --- Summary: Add unit test to verify KAFKA-7401 in AK versions >= 2.0 Key: KAFKA-7923 URL: https://issues.apache.org/jira/browse/KAFKA-7923 Project: Kafka Issue Type: Test Affects Versions: 2.1.0, 2.0.1 Reporter: Anna Povzner Assignee: Anna Povzner KAFKA-7401 affected versions 1.0 and 1.1; it was fixed and a unit test was added. Version 2.0 did not have that bug, because it was fixed as part of another change. To make sure we don't regress, we need to add a unit test similar to the one added as part of KAFKA-7401. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7786) Fast update of leader epoch may stall partition fetching due to FENCED_LEADER_EPOCH
Anna Povzner created KAFKA-7786: --- Summary: Fast update of leader epoch may stall partition fetching due to FENCED_LEADER_EPOCH Key: KAFKA-7786 URL: https://issues.apache.org/jira/browse/KAFKA-7786 Project: Kafka Issue Type: Bug Affects Versions: 2.1.0 Reporter: Anna Povzner KIP-320/KAFKA-7395 added a FENCED_LEADER_EPOCH error response to an OffsetsForLeaderEpoch request if the epoch in the request is lower than the broker's leader epoch. ReplicaFetcherThread builds an OffsetsForLeaderEpoch request under _partitionMapLock_, sends the request outside the lock, and then processes the response under _partitionMapLock_. The broker may receive LeaderAndIsr with the same leader but with the next leader epoch, and remove and add the partition to the fetcher thread (with partition state reflecting the updated leader epoch) – all while the OffsetsForLeaderEpoch request (with the old leader epoch) is still outstanding/waiting for the lock to process the OffsetsForLeaderEpoch response. As a result, the partition gets removed from partitionStates and this broker will not fetch for this partition until the next LeaderAndIsr, which may take a while. We will see a log message like this: [2018-12-23 07:23:04,802] INFO [ReplicaFetcher replicaId=3, leaderId=2, fetcherId=0] Partition test_topic-17 has an older epoch (7) than the current leader. Will await the new LeaderAndIsr state before resuming fetching. (kafka.server.ReplicaFetcherThread) We saw this happen with kafkatest.tests.core.reassign_partitions_test.ReassignPartitionsTest.test_reassign_partitions.bounce_brokers=True.reassign_from_offset_zero=True. This test does partition re-assignment while bouncing 2 out of 4 total brokers. When the failure happened, each bounced broker was also a controller. 
Because of re-assignment, the controller updates the leader epoch without updating the leader on controller change or on broker startup, so we see several leader epoch changes without a leader change, which increases the likelihood of the race condition described above. Here are the exact events that happened in this test (around the failure): We have 4 brokers: 1, 2, 3, 4. Partition re-assignment is started for test_topic-17 [2, 4, 1] —> [3, 1, 2]. At time t0, the leader of test_topic-17 is broker 2. 1. Clean shutdown of broker 3, which is also the controller 2. Broker 4 becomes controller, continues re-assignment and updates the leader epoch for test_topic-17 to 6 (with the same leader) 3. Broker 2 (leader of test_topic-17) receives the new leader epoch: “test_topic-17 starts at Leader Epoch 6 from offset 1388. Previous Leader Epoch was: 5” 4. Broker 3 is started again after the clean shutdown 5. The controller sees broker 3's startup, and sends LeaderAndIsr(leader epoch 6) to broker 3 6. The controller updates the leader epoch to 7 7. Broker 2 (leader of test_topic-17) receives LeaderAndIsr for leader epoch 7: “test_topic-17 starts at Leader Epoch 7 from offset 1974. Previous Leader Epoch was: 6” 8. Broker 3 receives LeaderAndIsr for test_topic-17 and leader epoch 6 from the controller: “Added fetcher to broker BrokerEndPoint(id=2) for leader epoch 6” and sends an OffsetsForLeaderEpoch request to broker 2 9. Broker 3 receives LeaderAndIsr for test_topic-17 and leader epoch 7 from the controller; removes the fetcher thread and adds a fetcher thread + executes AbstractFetcherThread.addPartitions(), which updates the partition state with leader epoch 7 10. Broker 3 receives FENCED_LEADER_EPOCH in response to OffsetsForLeaderEpoch(leader epoch 6), because the leader received LeaderAndIsr for leader epoch 7 before it got OffsetsForLeaderEpoch(leader epoch 6) from broker 3. 
As a result, it removes the partition from partitionStates and does not fetch until the controller updates the leader epoch and sends LeaderAndIsr for this partition to broker 3. The test fails because re-assignment does not finish on time (due to broker 3 not fetching). One way to address this is possibly to add more state to PartitionFetchState. However, we may introduce another race condition. A cleaner way, I think, is to return the leader epoch in the OffsetsForLeaderEpoch response with the FENCED_LEADER_EPOCH error, and then ignore the error if the partition state contains a higher leader epoch. The advantage is less state maintenance, but the disadvantage is that it requires bumping the inter-broker protocol. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
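The proposed fix can be sketched as follows (illustrative Python; the return labels and the exact comparison are assumptions based on the description above, not actual fetcher code):

```python
def handle_fenced_epoch(requested_epoch, current_partition_epoch):
    # With the leader's epoch echoed back in the FENCED_LEADER_EPOCH
    # response, the fetcher can tell a stale request apart from a real
    # fence. If the partition state has already advanced past the epoch
    # the request carried, the fence is stale: retry with the current
    # epoch instead of removing the partition from partitionStates.
    if current_partition_epoch > requested_epoch:
        return "RETRY_WITH_CURRENT_EPOCH"
    return "AWAIT_NEW_LEADER_AND_ISR"

# Broker 3's situation from the scenario above: the request used epoch 6,
# but LeaderAndIsr already moved the partition state to epoch 7.
print(handle_fenced_epoch(requested_epoch=6, current_partition_epoch=7))
# -> RETRY_WITH_CURRENT_EPOCH
```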
[jira] [Created] (KAFKA-7415) OffsetsForLeaderEpoch may incorrectly respond with undefined epoch causing truncation to HW
Anna Povzner created KAFKA-7415: --- Summary: OffsetsForLeaderEpoch may incorrectly respond with undefined epoch causing truncation to HW Key: KAFKA-7415 URL: https://issues.apache.org/jira/browse/KAFKA-7415 Project: Kafka Issue Type: Bug Components: replication Affects Versions: 2.0.0 Reporter: Anna Povzner If the follower's last appended epoch is ahead of the leader's last appended epoch, the OffsetsForLeaderEpoch response will incorrectly send (UNDEFINED_EPOCH, UNDEFINED_EPOCH_OFFSET), and the follower will truncate to HW. This may lead to data loss in some rare cases where 2 back-to-back leader elections happen (failure of one leader, followed by quick re-election of the next leader due to preferred leader election, so that all replicas are still in the ISR, and then failure of the 3rd leader). The bug is in LeaderEpochFileCache.endOffsetFor(), which returns (UNDEFINED_EPOCH, UNDEFINED_EPOCH_OFFSET) if the requested leader epoch is ahead of the last leader epoch in the cache. The method should return (last leader epoch in the cache, LEO) in this scenario. Here is an example of a scenario where the issue leads to data loss. Suppose we have three replicas: r1, r2, and r3. Initially, the ISR consists of (r1, r2, r3) and the leader is r1. The data up to offset 10 has been committed to the ISR. Here is the initial state: {code:java} Leader: r1 leader epoch: 0 ISR(r1, r2, r3) r1: [hw=10, leo=10] r2: [hw=8, leo=10] r3: [hw=5, leo=10] {code} Replica 1 fails and leaves the ISR, which makes Replica 2 the new leader with leader epoch = 1. The leader appends a batch, but it is not replicated yet to the followers. {code:java} Leader: r2 leader epoch: 1 ISR(r2, r3) r1: [hw=10, leo=10] r2: [hw=8, leo=11] r3: [hw=5, leo=10] {code} Replica 3 is elected leader (due to preferred leader election), with leader epoch 2, before it has a chance to truncate. 
{code:java} Leader: r3 leader epoch: 2 ISR(r2, r3) r1: [hw=10, leo=10] r2: [hw=8, leo=11] r3: [hw=5, leo=10] {code} Replica 2 sends OffsetsForLeaderEpoch(leader epoch = 1) to Replica 3. Replica 3 incorrectly replies with UNDEFINED_EPOCH_OFFSET, and Replica 2 truncates to HW. If Replica 3 fails before Replica 2 re-fetches the data, this may lead to data loss. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
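A simplified Python model of the endOffsetFor fix described above (the cache is modeled as a sorted list of (epoch, start offset) entries; this is a sketch, not LeaderEpochFileCache):

```python
UNDEFINED_EPOCH = -1
UNDEFINED_EPOCH_OFFSET = -1

def end_offset_for(epoch_entries, requested_epoch, log_end_offset):
    """epoch_entries: sorted list of (epoch, start_offset)."""
    if not epoch_entries:
        return (UNDEFINED_EPOCH, UNDEFINED_EPOCH_OFFSET)
    last_epoch, _ = epoch_entries[-1]
    if requested_epoch >= last_epoch:
        # Fix described above: previously the "requested ahead of last
        # cached epoch" case returned the undefined pair, which made the
        # follower truncate to HW instead of to this offset.
        return (last_epoch, log_end_offset)
    # The end offset of an older epoch is the start offset of the next one.
    result = (UNDEFINED_EPOCH, UNDEFINED_EPOCH_OFFSET)
    for (epoch, _), (_next_epoch, next_start) in zip(epoch_entries,
                                                     epoch_entries[1:]):
        if epoch <= requested_epoch:
            result = (epoch, next_start)
    return result

# Replica 3 in the scenario above: it became leader with epoch 2 but has
# appended nothing yet, so its last cached epoch is still 0 and its LEO
# is 10. Replica 2 asks for epoch 1:
print(end_offset_for([(0, 0)], requested_epoch=1, log_end_offset=10))
# -> (0, 10): Replica 2 truncates to 10 instead of to its HW of 8
```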
[jira] [Created] (KAFKA-7151) Broker running out of disk space may result in state where unclean leader election is required
Anna Povzner created KAFKA-7151: --- Summary: Broker running out of disk space may result in state where unclean leader election is required Key: KAFKA-7151 URL: https://issues.apache.org/jira/browse/KAFKA-7151 Project: Kafka Issue Type: Bug Reporter: Anna Povzner We have seen situations like the following: 1) Broker A is the leader for a topic partition, and brokers B and C are the followers 2) Broker A is running out of disk space, shrinks the ISR to only itself, and then sometime later gets disk errors, etc. 3) Broker A is stopped, disk space is reclaimed, and broker A is restarted Result: Broker A becomes the leader, but the followers cannot fetch because their log is ahead. The only way to continue is to enable unclean leader election. There are several issues here: -- If the machine is running out of disk space, we do not reliably get an error from the file system as soon as that happens. The broker could be in a state where some writes succeed (possibly if the write is not flushed to disk) and some writes fail, or fail later. This may cause fetchers to fetch records that are still in the leader's file system cache; the flush to disk then failing on the leader causes followers to be ahead of the leader. -- I am not sure exactly why, but it seems like the leader broker (that is running out of disk space) may also stop servicing fetch requests, making followers fall behind and get kicked out of the ISR. Ideally, the broker should stop being a leader for any topic partition before accepting any records that may fail to be flushed to disk. One option is to automatically detect disk space usage and make a broker read-only for topic partitions if disk space gets to 80% or something. Maybe there is a better option. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
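The "read-only above a usage threshold" idea at the end can be sketched with a plain filesystem check (illustrative Python; the 80% threshold and function name are for the proposal, not an existing Kafka API):

```python
import shutil

def should_demote_to_read_only(log_dir, threshold=0.80):
    # Check filesystem usage for the log dir; once usage crosses the
    # threshold, the broker would stop accepting produce traffic and
    # leadership *before* writes start failing non-deterministically.
    usage = shutil.disk_usage(log_dir)
    return (usage.used / usage.total) >= threshold

# threshold=0.0 trivially triggers, just to demonstrate the call shape:
print(should_demote_to_read_only(".", threshold=0.0))  # True
```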
[jira] [Created] (KAFKA-7150) Error in processing fetched data for one partition may stop follower fetching other partitions
Anna Povzner created KAFKA-7150: --- Summary: Error in processing fetched data for one partition may stop follower fetching other partitions Key: KAFKA-7150 URL: https://issues.apache.org/jira/browse/KAFKA-7150 Project: Kafka Issue Type: Bug Components: replication Affects Versions: 1.1.0, 1.0.2, 0.11.0.3, 0.10.2.2 Reporter: Anna Povzner If a follower fails to process data for one topic partition, e.g. due to an out-of-order offsets error, the whole ReplicaFetcherThread is killed, which also stops fetching for the other topic partitions serviced by that fetcher thread. This may result in unnecessary under-replicated partitions. It would be better to continue fetching for the other topic partitions and just remove the partition with the error from the responsibility of the fetcher thread. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
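The proposed behavior, isolating a per-partition failure instead of killing the whole thread, can be sketched like this (hypothetical types; not the actual fetcher code):

```java
import java.util.HashSet;
import java.util.Set;

public class FetcherSketch {
    interface PartitionProcessor {
        void process(String topicPartition) throws Exception;
    }

    // Process every assigned partition; on error, drop only the failing
    // partition from the fetcher's responsibility and keep fetching the rest.
    static Set<String> processAll(Set<String> partitions, PartitionProcessor p) {
        Set<String> stillFetched = new HashSet<>();
        for (String tp : partitions) {
            try {
                p.process(tp);
                stillFetched.add(tp);
            } catch (Exception e) {
                // e.g. out-of-order offsets: log and remove this partition only
            }
        }
        return stillFetched;
    }
}
```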
[jira] [Created] (KAFKA-7104) ReplicaFetcher thread may die because of inconsistent log start offset in fetch response
Anna Povzner created KAFKA-7104: --- Summary: ReplicaFetcher thread may die because of inconsistent log start offset in fetch response Key: KAFKA-7104 URL: https://issues.apache.org/jira/browse/KAFKA-7104 Project: Kafka Issue Type: Bug Affects Versions: 1.1.0, 1.0.0 Reporter: Anna Povzner Assignee: Anna Povzner What we saw: The follower fetched offset 116617, which it was able to append successfully. However, the leader's log start offset in the fetch response was 116753, which was higher than the fetched offset 116617. When the replica fetcher thread tried to increment the log start offset to the leader's log start offset, it failed with OffsetOutOfRangeException: [2018-06-23 00:45:37,409] ERROR [ReplicaFetcher replicaId=1002, leaderId=1001, fetcherId=0] Error due to (kafka.server.ReplicaFetcherThread) kafka.common.KafkaException: Error processing data for partition X-N offset 116617 Caused by: org.apache.kafka.common.errors.OffsetOutOfRangeException: Cannot increment the log start offset to 116753 of partition X-N since it is larger than the high watermark 116619 In the leader's log, we see that the log start offset was incremented at almost the same time (within 100 ms or so). [2018-06-23 00:45:37,339] INFO Incrementing log start offset of partition X-N to 116753 in dir /kafka/kafka-logs (kafka.log.Log) In the leader's logic: ReplicaManager first calls readFromLocalLog(), which reads from the local log and returns a LogReadResult containing the fetched data and the leader's log start offset and HW. However, it then calls ReplicaManager#updateFollowerLogReadResults(), which may move the leader's log start offset and update the log start offset and HW in the fetch response. If deleteRecords() happens in between, the log start offset may move beyond the fetched offset. As a result, the fetch response will contain fetched data but a log start offset beyond the fetched offset, indicating a state on the leader in which the fetched data no longer exists.
When a follower receives such a fetch response, it will first append, then move its HW no further than its LEO, which may be less than the leader's log start offset in the fetch response, and then call `replica.maybeIncrementLogStartOffset(leaderLogStartOffset)`, which will throw OffsetOutOfRangeException, causing the fetcher thread to stop. *Suggested fix:* If the leader moves the log start offset beyond the fetched offset, ReplicaManager#updateFollowerLogReadResults() should update the log read result with an OFFSET_OUT_OF_RANGE error, which will cause the follower to reset its fetch offset to the leader's log start offset. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
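The suggested fix boils down to one comparison on the leader before the response goes out; a minimal sketch (illustrative names, not the actual updateFollowerLogReadResults code):

```java
public class LogReadResultSketch {
    // Error the leader should attach to the read result for this follower fetch.
    static String errorFor(long leaderLogStartOffset, long fetchOffset) {
        // deleteRecords() may have advanced the log start offset past the
        // offset this response's data was read from.
        return leaderLogStartOffset > fetchOffset ? "OFFSET_OUT_OF_RANGE" : "NONE";
    }
}
```

With the offsets from the report, errorFor(116753L, 116617L) yields OFFSET_OUT_OF_RANGE, making the follower reset to the leader's log start offset instead of dying.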
[jira] [Created] (KAFKA-6975) AdminClient.deleteRecords() may cause replicas unable to fetch from beginning
Anna Povzner created KAFKA-6975: --- Summary: AdminClient.deleteRecords() may cause replicas unable to fetch from beginning Key: KAFKA-6975 URL: https://issues.apache.org/jira/browse/KAFKA-6975 Project: Kafka Issue Type: Bug Affects Versions: 1.1.0 Reporter: Anna Povzner Assignee: Anna Povzner AdminClient.deleteRecords(beforeOffset(offset)) will set the log start offset to the requested offset. If the requested offset is in the middle of a batch, the replica will not be able to fetch from that offset (because it is in the middle of the batch). One use case where this could cause problems is replica re-assignment. Suppose we have a topic partition with 3 initial replicas, and at some point the user issues AdminClient.deleteRecords() for an offset that falls in the middle of a batch. It now becomes the log start offset for this topic partition. Suppose at some later time, the user starts partition re-assignment to 3 new replicas. The new replicas (followers) will start with HW = 0 and try to fetch from 0, then get an "out of order offset" error because 0 < log start offset (LSO); the follower will reset its offset to the leader's LSO and fetch from it; the leader will send a batch in response with base offset
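Whether a requested delete offset lands mid-batch is easy to model given the batches' base offsets (hypothetical helper; the real batches live in the log segments):

```java
public class BatchSketch {
    // Base offset of the batch containing `offset`; batchBases sorted ascending.
    static long containingBatchBase(long[] batchBases, long offset) {
        long base = batchBases[0];
        for (long b : batchBases) {
            if (b <= offset) base = b;
            else break;
        }
        return base;
    }

    // A log start offset set by deleteRecords() is "mid-batch" when it is not
    // itself a batch base offset, which is what breaks fetching from it.
    static boolean midBatch(long[] batchBases, long logStartOffset) {
        return containingBatchBase(batchBases, logStartOffset) != logStartOffset;
    }
}
```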
[jira] [Resolved] (KAFKA-6795) Add unit test for ReplicaAlterLogDirsThread
[ https://issues.apache.org/jira/browse/KAFKA-6795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner resolved KAFKA-6795. - Resolution: Fixed Fix Version/s: 2.0.0 > Add unit test for ReplicaAlterLogDirsThread > --- > > Key: KAFKA-6795 > URL: https://issues.apache.org/jira/browse/KAFKA-6795 > Project: Kafka > Issue Type: Improvement >Reporter: Anna Povzner >Assignee: Anna Povzner >Priority: Major > Fix For: 2.0.0 > > > ReplicaAlterLogDirsThread was added as part of KIP-113 implementation, but > there is no unit test. > [~lindong] I assigned this to myself, since ideally I wanted to add unit > tests for KAFKA-6361 related changes (KIP-279), but feel free to re-assign. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6859) Follower should not send OffsetForLeaderEpoch for undefined leader epochs
Anna Povzner created KAFKA-6859: --- Summary: Follower should not send OffsetForLeaderEpoch for undefined leader epochs Key: KAFKA-6859 URL: https://issues.apache.org/jira/browse/KAFKA-6859 Project: Kafka Issue Type: Bug Affects Versions: 0.11.0.0 Reporter: Anna Povzner This is more of an optimization than a correctness issue. Currently, if the follower is on inter-broker protocol version 0.11 or higher but on an older message format, it does not track leader epochs. However, it will still send an OffsetForLeaderEpoch request to the leader with an undefined epoch, which is guaranteed to return an undefined offset, so the follower truncates to the high watermark. Another example is a bootstrapping follower that does not have any leader epochs recorded. It is cleaner and more efficient to not send OffsetForLeaderEpoch requests with undefined leader epochs, since we already know the answer. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
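The proposed optimization is a one-line predicate on the follower (sketch with an assumed sentinel value for an untracked epoch; names are illustrative):

```java
public class EpochSketch {
    static final int UNDEFINED_EPOCH = -1; // assumed sentinel for "no epoch tracked"

    // Skip the OffsetForLeaderEpoch round trip when the answer is already known:
    // an undefined epoch is guaranteed to come back as an undefined offset,
    // so the follower can truncate to its high watermark directly.
    static boolean shouldSendOffsetForLeaderEpoch(int latestTrackedEpoch) {
        return latestTrackedEpoch != UNDEFINED_EPOCH;
    }
}
```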
[jira] [Created] (KAFKA-6824) Transient failure in DynamicBrokerReconfigurationTest.testAddRemoveSslListener
Anna Povzner created KAFKA-6824: --- Summary: Transient failure in DynamicBrokerReconfigurationTest.testAddRemoveSslListener Key: KAFKA-6824 URL: https://issues.apache.org/jira/browse/KAFKA-6824 Project: Kafka Issue Type: Bug Reporter: Anna Povzner Saw in my PR build (*JDK 7 and Scala 2.11* ): *17:20:49* kafka.server.DynamicBrokerReconfigurationTest > testAddRemoveSslListener FAILED *17:20:49* java.lang.AssertionError: expected:<10> but was:<12> *17:20:49* at org.junit.Assert.fail(Assert.java:88) *17:20:49* at org.junit.Assert.failNotEquals(Assert.java:834) *17:20:49* at org.junit.Assert.assertEquals(Assert.java:645) *17:20:49* at org.junit.Assert.assertEquals(Assert.java:631) *17:20:49* at kafka.server.DynamicBrokerReconfigurationTest.verifyProduceConsume(DynamicBrokerReconfigurationTest.scala:959) *17:20:49* at kafka.server.DynamicBrokerReconfigurationTest.verifyRemoveListener(DynamicBrokerReconfigurationTest.scala:784) *17:20:49* at kafka.server.DynamicBrokerReconfigurationTest.testAddRemoveSslListener(DynamicBrokerReconfigurationTest.scala:705) *17:20:50* -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6823) Transient failure in DynamicBrokerReconfigurationTest.testThreadPoolResize
Anna Povzner created KAFKA-6823: --- Summary: Transient failure in DynamicBrokerReconfigurationTest.testThreadPoolResize Key: KAFKA-6823 URL: https://issues.apache.org/jira/browse/KAFKA-6823 Project: Kafka Issue Type: Bug Reporter: Anna Povzner Saw in my PR build (*JDK 10 and Scala 2.12*): *15:58:46* kafka.server.DynamicBrokerReconfigurationTest > testThreadPoolResize FAILED *15:58:46* java.lang.AssertionError: Invalid threads: expected 6, got 7: List(ReplicaFetcherThread-0-0, ReplicaFetcherThread-0-1, ReplicaFetcherThread-0-2, ReplicaFetcherThread-0-0, ReplicaFetcherThread-0-2, ReplicaFetcherThread-1-0, ReplicaFetcherThread-0-1) *15:58:46* at org.junit.Assert.fail(Assert.java:88) *15:58:46* at org.junit.Assert.assertTrue(Assert.java:41) *15:58:46* at kafka.server.DynamicBrokerReconfigurationTest.verifyThreads(DynamicBrokerReconfigurationTest.scala:1147) *15:58:46* at kafka.server.DynamicBrokerReconfigurationTest.maybeVerifyThreadPoolSize$1(DynamicBrokerReconfigurationTest.scala:412) *15:58:46* at kafka.server.DynamicBrokerReconfigurationTest.resizeThreadPool$1(DynamicBrokerReconfigurationTest.scala:431) *15:58:46* at kafka.server.DynamicBrokerReconfigurationTest.reducePoolSize$1(DynamicBrokerReconfigurationTest.scala:417) *15:58:46* at kafka.server.DynamicBrokerReconfigurationTest.$anonfun$testThreadPoolResize$3(DynamicBrokerReconfigurationTest.scala:440) *15:58:46* at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:156) *15:58:46* at kafka.server.DynamicBrokerReconfigurationTest.verifyThreadPoolResize$1(DynamicBrokerReconfigurationTest.scala:439) *15:58:46* at kafka.server.DynamicBrokerReconfigurationTest.testThreadPoolResize(DynamicBrokerReconfigurationTest.scala:453) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6809) connections-created metric does not behave as expected
Anna Povzner created KAFKA-6809: --- Summary: connections-created metric does not behave as expected Key: KAFKA-6809 URL: https://issues.apache.org/jira/browse/KAFKA-6809 Project: Kafka Issue Type: Bug Affects Versions: 1.0.1 Reporter: Anna Povzner The "connections-created" sensor is described as "new connections established". It currently records only connections that the broker creates, but does not count connections received. It seems we should also count connections received: either include them in this metric (and clarify the description) or add a new metric (counting the two types of connections separately). I am not sure how useful it is to separate them, so I think we should take the first approach. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6795) Add unit test for ReplicaAlterLogDirsThread
Anna Povzner created KAFKA-6795: --- Summary: Add unit test for ReplicaAlterLogDirsThread Key: KAFKA-6795 URL: https://issues.apache.org/jira/browse/KAFKA-6795 Project: Kafka Issue Type: Improvement Reporter: Anna Povzner Assignee: Anna Povzner ReplicaAlterLogDirsThread was added as part of KIP-113 implementation, but there is no unit test. [~lindong] I assigned this to myself, since ideally I wanted to add unit tests for KAFKA-6361 related changes (KIP-279), but feel free to re-assign. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-6693) Add Consumer-only benchmark workload to Trogdor
[ https://issues.apache.org/jira/browse/KAFKA-6693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner resolved KAFKA-6693. - Resolution: Fixed https://github.com/apache/kafka/pull/4775 > Add Consumer-only benchmark workload to Trogdor > --- > > Key: KAFKA-6693 > URL: https://issues.apache.org/jira/browse/KAFKA-6693 > Project: Kafka > Issue Type: Improvement > Components: system tests >Reporter: Anna Povzner >Assignee: Anna Povzner >Priority: Major > Original Estimate: 48h > Remaining Estimate: 48h > > Consumer-only benchmark workload that uses existing pre-populated topic -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6693) Add Consumer-only benchmark workload to Trogdor
Anna Povzner created KAFKA-6693: --- Summary: Add Consumer-only benchmark workload to Trogdor Key: KAFKA-6693 URL: https://issues.apache.org/jira/browse/KAFKA-6693 Project: Kafka Issue Type: Improvement Components: system tests Reporter: Anna Povzner Assignee: Anna Povzner Consumer-only benchmark workload that uses existing pre-populated topic -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-4691) ProducerInterceptor.onSend() is called after key and value are serialized
[ https://issues.apache.org/jira/browse/KAFKA-4691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838829#comment-15838829 ] Anna Povzner commented on KAFKA-4691: - I agree with [~mjsax] about not changing the KafkaProducer API. Instead, if we make that change, Streams should not have any producer interceptors configured and should do the intercepting itself. If we completely disable the producer interceptor and implement this functionality in Streams, RecordCollectorImpl.send() should also call the interceptor's onAcknowledgement() in the same situations KafkaProducer does. E.g. if send() fails, onAcknowledgement() should be called with a mostly empty RecordMetadata but with topic and partition set. Also, onAcknowledgement() should be called from onCompletion in RecordCollectorImpl.send(). It looks like all of that could be implemented in RecordCollectorImpl.send(). > ProducerInterceptor.onSend() is called after key and value are serialized > - > > Key: KAFKA-4691 > URL: https://issues.apache.org/jira/browse/KAFKA-4691 > Project: Kafka > Issue Type: Bug > Components: clients, streams >Affects Versions: 0.10.1.1 >Reporter: Francesco Lemma > Labels: easyfix > Attachments: 2017-01-24 00_50_55-SDG_CR33_DevStudio - Java EE - > org.apache.kafka.streams.processor.internals.Reco.png > > > According to the JavaDoc > (https://kafka.apache.org/0101/javadoc/org/apache/kafka/clients/producer/ProducerInterceptor.html) > " This is called from KafkaProducer.send(ProducerRecord) and > KafkaProducer.send(ProducerRecord, Callback) methods, before key and value > get serialized and partition is assigned (if partition is not specified in > ProducerRecord)". > Although when using this with Kafka Streams > (StreamsConfig.producerPrefix(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG)) the > key and value contained in the record object are already serialized.
> As you can see from the screenshot, the serialization is performed inside > RecordCollectorImpl.send(ProducerRecord<K, V> record, Serializer<K> keySerializer, > Serializer<V> valueSerializer, StreamPartitioner<K, V> partitioner), effectively > before calling the send method of the producer, which triggers the > interceptor. > This makes it impossible to perform any kind of operation involving the key or > value of the message without performing at least an additional > deserialization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
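The contract at issue can be shown with a toy model of the two orderings (hypothetical simplified types; the real hook is ProducerInterceptor.onSend):

```java
import java.nio.charset.StandardCharsets;

public class InterceptOrderSketch {
    interface Interceptor {
        String onSend(String value); // sees the ORIGINAL value, per the JavaDoc contract
    }

    // Documented contract: intercept BEFORE serialization, so the interceptor
    // can transform the value; serializing first (as Streams did here) would
    // hand the interceptor opaque bytes instead.
    static byte[] send(String value, Interceptor interceptor) {
        String intercepted = interceptor.onSend(value);
        return intercepted.getBytes(StandardCharsets.UTF_8);
    }
}
```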
[jira] [Work started] (KAFKA-3777) Extract the existing LRU cache out of RocksDBStore
[ https://issues.apache.org/jira/browse/KAFKA-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on KAFKA-3777 started by Anna Povzner. --- > Extract the existing LRU cache out of RocksDBStore > -- > > Key: KAFKA-3777 > URL: https://issues.apache.org/jira/browse/KAFKA-3777 > Project: Kafka > Issue Type: Sub-task > Components: streams >Affects Versions: 0.10.1.0 >Reporter: Eno Thereska >Assignee: Anna Povzner > Fix For: 0.10.1.0 > > > The LRU cache that is currently inside the RocksDbStore class. As part of > KAFKA-3776 it needs to come outside of RocksDbStore and be a separate > component used in: > 1. KGroupedStream.aggregate() / reduce(), > 2. KStream.aggregateByKey() / reduceByKey(), > 3. KTable.to() (this will be done in KAFKA-3779). > As all of the above operators can have a cache on top to deduplicate the > materialized state store in RocksDB. > The scope of this JIRA is to extract out the cache of RocksDBStore, and keep > them as item 1) and 2) above; and it should be done together / after > KAFKA-3780. > Note it is NOT in the scope of this JIRA to re-write the cache, so this will > basically stay the same record-based cache we currently have. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
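The record-based cache to be extracted is essentially an LRU map; a self-contained sketch of the idea using JDK classes (not the Streams implementation, which also has to handle dirty entries and flushing):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal record-count-bounded LRU cache: the shape of what sits in front of
// a RocksDB-backed store to deduplicate writes to the materialized state.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true gives LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict least-recently-used on overflow
    }
}
```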
[jira] [Updated] (KAFKA-3597) Enable query ConsoleConsumer and VerifiableProducer if they shutdown cleanly
[ https://issues.apache.org/jira/browse/KAFKA-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner updated KAFKA-3597: Status: Patch Available (was: In Progress) > Enable query ConsoleConsumer and VerifiableProducer if they shutdown cleanly > > > Key: KAFKA-3597 > URL: https://issues.apache.org/jira/browse/KAFKA-3597 > Project: Kafka > Issue Type: Test >Reporter: Anna Povzner >Assignee: Anna Povzner > Fix For: 0.10.0.0 > > > It would be useful for some tests to check if ConsoleConsumer and > VerifiableProducer shutdown cleanly or not. > Add methods to ConsoleConsumer and VerifiableProducer that return true if all > producers/consumes shutdown cleanly; otherwise false. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (KAFKA-3566) Enable VerifiableProducer and ConsoleConsumer to run with interceptors
[ https://issues.apache.org/jira/browse/KAFKA-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on KAFKA-3566 started by Anna Povzner. --- > Enable VerifiableProducer and ConsoleConsumer to run with interceptors > -- > > Key: KAFKA-3566 > URL: https://issues.apache.org/jira/browse/KAFKA-3566 > Project: Kafka > Issue Type: Test >Affects Versions: 0.10.0.0 >Reporter: Anna Povzner >Assignee: Anna Povzner > Labels: test > > Add interceptor class list and export path list params to VerifiableProducer > and ConsoleConsumer constructors. This is to allow running VerifiableProducer > and ConsoleConsumer with interceptors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (KAFKA-3597) Enable query ConsoleConsumer and VerifiableProducer if they shutdown cleanly
[ https://issues.apache.org/jira/browse/KAFKA-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on KAFKA-3597 started by Anna Povzner. --- > Enable query ConsoleConsumer and VerifiableProducer if they shutdown cleanly > > > Key: KAFKA-3597 > URL: https://issues.apache.org/jira/browse/KAFKA-3597 > Project: Kafka > Issue Type: Test >Reporter: Anna Povzner >Assignee: Anna Povzner > Fix For: 0.10.0.0 > > > It would be useful for some tests to check if ConsoleConsumer and > VerifiableProducer shutdown cleanly or not. > Add methods to ConsoleConsumer and VerifiableProducer that return true if all > producers/consumes shutdown cleanly; otherwise false. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3597) Enable query ConsoleConsumer and VerifiableProducer if they shutdown cleanly
Anna Povzner created KAFKA-3597: --- Summary: Enable query ConsoleConsumer and VerifiableProducer if they shutdown cleanly Key: KAFKA-3597 URL: https://issues.apache.org/jira/browse/KAFKA-3597 Project: Kafka Issue Type: Test Reporter: Anna Povzner Assignee: Anna Povzner Fix For: 0.10.0.0 It would be useful for some tests to check whether ConsoleConsumer and VerifiableProducer shut down cleanly or not. Add methods to ConsoleConsumer and VerifiableProducer that return true if all producers/consumers shut down cleanly; otherwise false. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3566) Enable VerifiableProducer and ConsoleConsumer to run with interceptors
Anna Povzner created KAFKA-3566: --- Summary: Enable VerifiableProducer and ConsoleConsumer to run with interceptors Key: KAFKA-3566 URL: https://issues.apache.org/jira/browse/KAFKA-3566 Project: Kafka Issue Type: Test Affects Versions: 0.10.0.0 Reporter: Anna Povzner Assignee: Anna Povzner Add interceptor class list and export path list params to VerifiableProducer and ConsoleConsumer constructors. This is to allow running VerifiableProducer and ConsoleConsumer with interceptors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3555) Unexpected close of KStreams transformer
Anna Povzner created KAFKA-3555: --- Summary: Unexpected close of KStreams transformer Key: KAFKA-3555 URL: https://issues.apache.org/jira/browse/KAFKA-3555 Project: Kafka Issue Type: Bug Components: streams Affects Versions: 0.10.0.0 Reporter: Anna Povzner Assignee: Guozhang Wang I consistently get this behavior when running my system test, which runs a 1-node Kafka cluster. We implemented a TransformerSupplier, and the topology is transform().filter().map().filter().aggregate(). I have a log message in my transformer's close() method. On every run of the test, I see that after running 10-20 seconds, the transformer's close() is called. Then, in about 20 seconds, I see that the transformer is re-initialized and continues running. I don't see any exceptions in KStreams before close() happens. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3320) Add successful acks verification to ProduceConsumeValidateTest
[ https://issues.apache.org/jira/browse/KAFKA-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213677#comment-15213677 ] Anna Povzner commented on KAFKA-3320: - If you look at verifiable_producer.py, it collects all successfully produced messages into acked_values. If producer send() was unsuccessful, those messages are collected into not_acked_values. However, our tests do not check whether any produce send() got an error. Suppose the test tried to produce 100 messages, and only 50 were successfully produced. If the consumer successfully consumed 50 messages, then the test is considered a success. It would be good to verify that we also did not get any produce errors for some tests. > Add successful acks verification to ProduceConsumeValidateTest > -- > > Key: KAFKA-3320 > URL: https://issues.apache.org/jira/browse/KAFKA-3320 > Project: Kafka > Issue Type: Test >Reporter: Anna Povzner > > Currently ProduceConsumeValidateTest only validates that each acked message > was consumed. Some tests may want an additional verification that all acks > were successful. > This JIRA is to add an addition optional verification that all acks were > successful and use it in couple of tests that need that verification. Example > is compression test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
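The stricter check amounts to requiring that no sends failed, not just that every acked message was consumed; a sketch of the two validation modes (hypothetical helper; the real check lives in the ducktape tests):

```java
public class AckValidationSketch {
    // Existing validation: every acked message was consumed.
    // Optional stricter validation: additionally, no sends failed to be acked.
    static boolean valid(int acked, int notAcked, int consumedAcked, boolean requireAllAcked) {
        boolean ackedWereConsumed = consumedAcked >= acked;
        return requireAllAcked ? (notAcked == 0 && ackedWereConsumed) : ackedWereConsumed;
    }
}
```

With 100 attempted sends of which only 50 were acked and consumed, valid(50, 50, 50, false) passes today, while valid(50, 50, 50, true) correctly fails.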
[jira] [Commented] (KAFKA-3202) Add system test for KIP-31 and KIP-32 - Change message format version on the fly
[ https://issues.apache.org/jira/browse/KAFKA-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189917#comment-15189917 ] Anna Povzner commented on KAFKA-3202: - [~enothereska] I think you meant to post the test description in KAFKA-3188 (Compatibility test). Not sure if you meant to pick this JIRA or KAFKA-3188. > Add system test for KIP-31 and KIP-32 - Change message format version on the > fly > > > Key: KAFKA-3202 > URL: https://issues.apache.org/jira/browse/KAFKA-3202 > Project: Kafka > Issue Type: Sub-task > Components: system tests >Reporter: Jiangjie Qin >Assignee: Eno Thereska >Priority: Blocker > Fix For: 0.10.0.0 > > > The system test should cover the case that message format changes are made > when clients are producing/consuming. The message format change should not > cause client side issue. > We already cover 0.10 brokers with old producers/consumers in upgrade tests. > So, the main thing to test is a mix of 0.9 and 0.10 producers and consumers. > E.g., test1: 0.9 producer/0.10 consumer and then test2: 0.10 producer/0.9 > consumer. And then, each of them: compression/no compression (like in upgrade > test). And we could probably add another dimension : topic configured with > CreateTime (default) and LogAppendTime. So, total 2x2x2 combinations (but > maybe can reduce that — eg. do LogAppendTime with compression only). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-3303) Pass partial record metadata to Interceptor onAcknowledgement in case of errors
[ https://issues.apache.org/jira/browse/KAFKA-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner updated KAFKA-3303: Status: Patch Available (was: In Progress) > Pass partial record metadata to Interceptor onAcknowledgement in case of > errors > --- > > Key: KAFKA-3303 > URL: https://issues.apache.org/jira/browse/KAFKA-3303 > Project: Kafka > Issue Type: Improvement >Affects Versions: 0.10.0.0 >Reporter: Anna Povzner >Assignee: Anna Povzner >Priority: Blocker > Fix For: 0.10.0.0 > > > Currently Interceptor.onAcknowledgement behaves similarly to Callback. If > exception occurred and exception is passed to onAcknowledgement, metadata > param is set to null. > However, it would be useful to pass topic, and partition if available to the > interceptor so that it knows which topic/partition got an error. > This is part of KIP-42. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-3201) Add system test for KIP-31 and KIP-32 - Upgrade Test
[ https://issues.apache.org/jira/browse/KAFKA-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner updated KAFKA-3201: Status: Patch Available (was: In Progress) Verified that upgrade tests are passing on Jenkins (ran 5 times). > Add system test for KIP-31 and KIP-32 - Upgrade Test > > > Key: KAFKA-3201 > URL: https://issues.apache.org/jira/browse/KAFKA-3201 > Project: Kafka > Issue Type: Sub-task > Components: system tests >Reporter: Jiangjie Qin >Assignee: Anna Povzner >Priority: Blocker > Fix For: 0.10.0.0 > > > This system test should test the procedure to upgrade a Kafka broker from > 0.8.x and 0.9.0 to 0.10.0 > The procedure is documented in KIP-32: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-32+-+Add+timestamps+to+Kafka+message -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (KAFKA-3303) Pass partial record metadata to Interceptor onAcknowledgement in case of errors
[ https://issues.apache.org/jira/browse/KAFKA-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on KAFKA-3303 started by Anna Povzner. --- > Pass partial record metadata to Interceptor onAcknowledgement in case of > errors > --- > > Key: KAFKA-3303 > URL: https://issues.apache.org/jira/browse/KAFKA-3303 > Project: Kafka > Issue Type: Improvement >Affects Versions: 0.10.0.0 >Reporter: Anna Povzner >Assignee: Anna Povzner >Priority: Blocker > Fix For: 0.10.0.0 > > > Currently Interceptor.onAcknowledgement behaves similarly to Callback. If > exception occurred and exception is passed to onAcknowledgement, metadata > param is set to null. > However, it would be useful to pass topic, and partition if available to the > interceptor so that it knows which topic/partition got an error. > This is part of KIP-42. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3320) Add successful acks verification to ProduceConsumeValidateTest
Anna Povzner created KAFKA-3320: --- Summary: Add successful acks verification to ProduceConsumeValidateTest Key: KAFKA-3320 URL: https://issues.apache.org/jira/browse/KAFKA-3320 Project: Kafka Issue Type: Test Reporter: Anna Povzner Currently ProduceConsumeValidateTest only validates that each acked message was consumed. Some tests may want an additional verification that all acks were successful. This JIRA is to add an additional optional verification that all acks were successful and use it in a couple of tests that need that verification. An example is the compression test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3303) Pass partial record metadata to Interceptor onAcknowledgement in case of errors
Anna Povzner created KAFKA-3303: --- Summary: Pass partial record metadata to Interceptor onAcknowledgement in case of errors Key: KAFKA-3303 URL: https://issues.apache.org/jira/browse/KAFKA-3303 Project: Kafka Issue Type: Improvement Affects Versions: 0.10.0.0 Reporter: Anna Povzner Assignee: Anna Povzner Fix For: 0.10.0.0 Currently Interceptor.onAcknowledgement behaves similarly to Callback. If an exception occurred and the exception is passed to onAcknowledgement, the metadata param is set to null. However, it would be useful to pass the topic, and the partition if available, to the interceptor so that it knows which topic/partition got an error. This is part of KIP-42. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
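A sketch of what an interceptor could do once partial metadata is passed on error (hypothetical simplified signature; the real hook is ProducerInterceptor.onAcknowledgement(RecordMetadata, Exception)):

```java
import java.util.HashMap;
import java.util.Map;

public class ErrorCountingInterceptor {
    final Map<String, Integer> errorsByTopicPartition = new HashMap<>();

    // With the proposed change, metadata on error still carries topic and
    // partition (offset stays undefined), so errors can be attributed.
    void onAcknowledgement(String topic, int partition, Exception exception) {
        if (exception != null && topic != null)
            errorsByTopicPartition.merge(topic + "-" + partition, 1, Integer::sum);
    }
}
```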
[jira] [Updated] (KAFKA-3196) KIP-42 (part 2): add record size and CRC to RecordMetadata and ConsumerRecords
[ https://issues.apache.org/jira/browse/KAFKA-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner updated KAFKA-3196: Status: Patch Available (was: In Progress) > KIP-42 (part 2): add record size and CRC to RecordMetadata and ConsumerRecords > -- > > Key: KAFKA-3196 > URL: https://issues.apache.org/jira/browse/KAFKA-3196 > Project: Kafka > Issue Type: Improvement >Reporter: Anna Povzner >Assignee: Anna Povzner > Fix For: 0.10.0.0 > > > This is the second (smaller) part of KIP-42, which includes: Add record size > and CRC to RecordMetadata and ConsumerRecord. > See details in KIP-42 wiki: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-42%3A+Add+Producer+and+Consumer+Interceptors > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-3196) KIP-42 (part 2): add record size and CRC to RecordMetadata and ConsumerRecords
[ https://issues.apache.org/jira/browse/KAFKA-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner updated KAFKA-3196: Fix Version/s: 0.10.0.0 > KIP-42 (part 2): add record size and CRC to RecordMetadata and ConsumerRecords > -- > > Key: KAFKA-3196 > URL: https://issues.apache.org/jira/browse/KAFKA-3196 > Project: Kafka > Issue Type: Improvement >Reporter: Anna Povzner >Assignee: Anna Povzner > Fix For: 0.10.0.0 > > > This is the second (smaller) part of KIP-42, which includes: Add record size > and CRC to RecordMetadata and ConsumerRecord. > See details in KIP-42 wiki: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-42%3A+Add+Producer+and+Consumer+Interceptors > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-3214) Add consumer system tests for compressed topics
[ https://issues.apache.org/jira/browse/KAFKA-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner updated KAFKA-3214: Fix Version/s: 0.10.0.0 > Add consumer system tests for compressed topics > --- > > Key: KAFKA-3214 > URL: https://issues.apache.org/jira/browse/KAFKA-3214 > Project: Kafka > Issue Type: Test > Components: consumer >Reporter: Jason Gustafson >Assignee: Anna Povzner > Fix For: 0.10.0.0 > > > As far as I can tell, we don't have any ducktape tests which verify > correctness when compression is enabled. If we did, we might have caught > KAFKA-3179 earlier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-3214) Add consumer system tests for compressed topics
[ https://issues.apache.org/jira/browse/KAFKA-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner updated KAFKA-3214: Status: Patch Available (was: In Progress) > Add consumer system tests for compressed topics > --- > > Key: KAFKA-3214 > URL: https://issues.apache.org/jira/browse/KAFKA-3214 > Project: Kafka > Issue Type: Test > Components: consumer >Reporter: Jason Gustafson >Assignee: Anna Povzner > Fix For: 0.10.0.0 > > > As far as I can tell, we don't have any ducktape tests which verify > correctness when compression is enabled. If we did, we might have caught > KAFKA-3179 earlier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (KAFKA-3214) Add consumer system tests for compressed topics
[ https://issues.apache.org/jira/browse/KAFKA-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on KAFKA-3214 started by Anna Povzner.
[jira] [Commented] (KAFKA-3201) Add system test for KIP-31 and KIP-32 - Upgrade Test
[ https://issues.apache.org/jira/browse/KAFKA-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159416#comment-15159416 ] Anna Povzner commented on KAFKA-3201: - Here is a set of tests:
1. Setup: Producer (0.8) → Kafka Cluster → Consumer (0.8). First rolling bounce: set inter.broker.protocol.version = 0.8 and message.format.version = 0.8. Second rolling bounce: use the latest (default) inter.broker.protocol.version and message.format.version.
2. Setup: Producer (0.9) → Kafka Cluster → Consumer (0.9). First rolling bounce: set inter.broker.protocol.version = 0.9 and message.format.version = 0.9. Second rolling bounce: use the latest (default) inter.broker.protocol.version and message.format.version.
3. Setup: Producer (0.9) → Kafka Cluster → Consumer (0.9). First rolling bounce: set inter.broker.protocol.version = 0.9 and message.format.version = 0.9. Second rolling bounce: use inter.broker.protocol.version = 0.10 and message.format.version = 0.9.
All three tests will produce/consume in the background; at the end, the consumed messages will be validated. We will use a mix of producers: some using compression and some not. [~becket_qin] I understand the test where the message format changes on the fly is KAFKA-3202, so I will not add a third phase to setup 3 to move to the 0.10 message format (because it will be covered in KAFKA-3202). Is my understanding correct? > Add system test for KIP-31 and KIP-32 - Upgrade Test > > > Key: KAFKA-3201 > URL: https://issues.apache.org/jira/browse/KAFKA-3201 > Project: Kafka > Issue Type: Sub-task > Components: system tests >Reporter: Jiangjie Qin >Assignee: Anna Povzner > Fix For: 0.10.0.0 > > > This system test should test the procedure to upgrade a Kafka broker from > 0.8.x and 0.9.0 to 0.10.0 > The procedure is documented in KIP-32: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-32+-+Add+timestamps+to+Kafka+message
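The two rolling bounces described in this comment pin, then raise, two broker settings. In server.properties terms the phases look roughly like this (an illustrative sketch, not the test's actual config; note the shipped broker property for the message format is log.message.format.version, and the exact version strings may differ):

```properties
# Phase 1: bounce each broker on the new code with the old protocol
# and message format pinned, so old brokers/clients still interoperate
inter.broker.protocol.version=0.9.0
log.message.format.version=0.9.0

# Phase 2: once all brokers run the new code, bounce again with the new
# inter-broker protocol, optionally still keeping the old message format
inter.broker.protocol.version=0.10.0
log.message.format.version=0.9.0
```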
[jira] [Work started] (KAFKA-3201) Add system test for KIP-31 and KIP-32 - Upgrade Test
[ https://issues.apache.org/jira/browse/KAFKA-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on KAFKA-3201 started by Anna Povzner.
[jira] [Commented] (KAFKA-3256) Large number of system test failures
[ https://issues.apache.org/jira/browse/KAFKA-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157723#comment-15157723 ] Anna Povzner commented on KAFKA-3256: - [~becket_qin] I wrote my comment without seeing yours. Yes, I think the tear down timeout failures are unrelated and I don't think they actually cause any issues ([~geoffra] ?). I'll take on the upgrade and compatibility system tests if you don't mind. > Large number of system test failures > > > Key: KAFKA-3256 > URL: https://issues.apache.org/jira/browse/KAFKA-3256 > Project: Kafka > Issue Type: Bug >Reporter: Geoff Anderson >Assignee: Jiangjie Qin > > Confluent's nightly run of the kafka system tests reported a large number of > failures beginning 2/20/2016 > Test run: 2016-02-19--001.1455897182--apache--trunk--eee9522/ > Link: > http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-02-19--001.1455897182--apache--trunk--eee9522/report.html > Pass: 136 > Fail: 0 > Test run: 2016-02-20--001.1455979842--apache--trunk--5caa800/ > Link: > http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-02-20--001.1455979842--apache--trunk--5caa800/report.html > Pass: 72 > Fail: 64 > I.e. trunk@eee9522 was the last passing run, and trunk@5caa800 had a large > number of failures. > Given its complexity, the most likely culprit is 45c8195fa, and I confirmed > this is the first commit with failures on a small number of tests. > [~becket_qin] do you mind investigating? > {code} > commit 5caa800e217c6b83f62ee3e6b5f02f56e331b309 > Author: Jun Rao > Date: Fri Feb 19 09:40:59 2016 -0800 > trivial fix to authorization CLI table > commit 45c8195fa14c766b200c720f316836dbb84e9d8b > Author: Jiangjie Qin > Date: Fri Feb 19 07:56:40 2016 -0800 > KAFKA-3025; Added timetamp to Message and use relative offset. 
> commit eee95228fabe1643baa016a2d49fb0a9fe2c66bd > Author: Yasuhiro Matsuda > Date: Thu Feb 18 09:39:30 2016 +0800 > MINOR: remove streams config params from producer/consumer configs > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3256) Large number of system test failures
[ https://issues.apache.org/jira/browse/KAFKA-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157719#comment-15157719 ] Anna Povzner commented on KAFKA-3256: - FYI: The upgrade test fails with this error: java.lang.IllegalArgumentException: requirement failed: message.format.version 0.10.0-IV0 cannot be used when inter.broker.protocol.version is set to 0.8.2 I think this is expected, right? We need to use the 0.9.0 (or 0.8) message format in the first pass of the 0.8 to 0.10 upgrade test (which is what the current upgrade test is testing), is that correct?
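The requirement behind that exception can be sketched as a simple version-ordering check: the message format may not be newer than the inter-broker protocol. The helper below is a hypothetical Python illustration of that rule (the real check lives in the broker's Scala config validation, not in this form):

```python
def check_versions(message_format_version: str, inter_broker_protocol_version: str) -> None:
    """Reject a message format newer than the inter-broker protocol.

    Illustrative sketch only: compares the numeric x.y.z prefix and
    ignores suffixes such as "-IV0".
    """
    def key(version: str):
        # "0.10.0-IV0" -> (0, 10, 0)
        return tuple(int(part) for part in version.split("-")[0].split(".") if part.isdigit())

    if key(message_format_version) > key(inter_broker_protocol_version):
        raise ValueError(
            f"requirement failed: message.format.version {message_format_version} "
            f"cannot be used when inter.broker.protocol.version is set to "
            f"{inter_broker_protocol_version}")
```

Under this rule the failing combination in the comment (message format 0.10.0-IV0 with protocol 0.8.2) is rejected, while pinning the 0.9.0 message format during the first bounce passes.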
[jira] [Commented] (KAFKA-3256) Large number of system test failures
[ https://issues.apache.org/jira/browse/KAFKA-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157604#comment-15157604 ] Anna Povzner commented on KAFKA-3256: - [~becket_qin], [~ijuma], [~geoffra] The remaining failing system tests are the compatibility test and the rolling upgrade tests. The issue is that both tests assume trunk to be 0.9. Since we test the 0.8 to 0.9 upgrade (and similarly the compatibility tests) in the 0.9 branch, we don't need to port those tests to run 0.9 versus trunk. We have separate JIRAs (KAFKA-3201 and KAFKA-3188) to add 0.8 to 0.10 and 0.9 to 0.10 upgrade tests, and to test compatibility of a mix of 0.9 and 0.10 clients with 0.10 brokers. My proposal is to have a patch with the current fixes, and to address the compatibility and upgrade test failures as part of KAFKA-3201 and KAFKA-3188, which are currently assigned to me.
[jira] [Work started] (KAFKA-3196) KIP-42 (part 2): add record size and CRC to RecordMetadata and ConsumerRecords
[ https://issues.apache.org/jira/browse/KAFKA-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on KAFKA-3196 started by Anna Povzner.
[jira] [Commented] (KAFKA-3256) Large number of system test failures
[ https://issues.apache.org/jira/browse/KAFKA-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15155911#comment-15155911 ] Anna Povzner commented on KAFKA-3256: - Also, for completeness, the remaining system tests: mirror_maker_test.py, zookeeper_security_upgrade_test.py, and upgrade_test.py all use the console consumer and set message_validator=is_int. So they all expect the console consumer to output values that are integers, and the additional "CreateTime:<timestamp>" prefix breaks that.
[jira] [Comment Edited] (KAFKA-3256) Large number of system test failures
[ https://issues.apache.org/jira/browse/KAFKA-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15155910#comment-15155910 ] Anna Povzner edited comment on KAFKA-3256 at 2/21/16 6:18 AM: -- I looked more into the system tests to find where the format is expected, and there are actually several places: 1) Connect tests expect the output to be in JSON format. The value is published in JSON format, and since the test previously expected only the value, it was written to expect the console consumer output in JSON format. 2) Other tests, such as reassign_partition and the compatibility tests, that use ConsoleConsumer set message_validator=is_int when constructing it (because they expected only an integer value in the console consumer output). This means that produce_consume_validate.py expects the consumer output to be an integer. Unless we want a system test that specifically verifies a timestamp, I don't think we should modify existing tests to work with console consumer output containing the timestamp type and timestamp. So I agree with your decision to not output the timestamp type and timestamp by default. If we want to write/extend system tests that specifically check timestamps (the type, or validating the timestamp range), we will use an output with a timestamp.
[jira] [Commented] (KAFKA-3256) Large number of system test failures
[ https://issues.apache.org/jira/browse/KAFKA-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15155910#comment-15155910 ] Anna Povzner commented on KAFKA-3256: - I looked more into the system tests to find where the format is expected, and there are actually several places: 1) Connect tests expect the output to be in JSON format. The value is published in JSON format, and since the test previously expected only the value, it was written to expect the console consumer output in JSON format. 2) Other tests, such as reassign_partition and the compatibility tests, that use ConsoleConsumer set message_validator=is_int when constructing it (because they expected only an integer value in the console consumer output). This means that produce_consume_validate.py expects the consumer output to be an integer. Unless we want a system test that specifically verifies a timestamp, I don't think we should modify existing tests to work with console consumer output containing the timestamp type and timestamp. So I agree with your decision to not output the timestamp type and timestamp by default. If we want to write/extend system tests that specifically check timestamps (the type, or validating the timestamp range), we will use an output with a timestamp.
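The message_validator=is_int hook discussed in these comments can be sketched as below (a simplified stand-in for the ducktape-based kafkatest helper of the same name; the exact signature in the test suite may differ). It also shows why the timestamp prefix broke validation: a prefixed line no longer parses as a bare integer.

```python
def is_int(msg: str):
    """Return the parsed integer if the consumed line is a bare int,
    else None, which the validate step treats as a non-matching record.
    (Sketch of the kafkatest validator, not the actual implementation.)"""
    try:
        return int(msg)
    except ValueError:
        return None

# A plain value validates, but a line with the "CreateTime:<ts>\t" prefix
# (as in the failing runs) does not:
print(is_int("123"))                               # parses
print(is_int("CreateTime:1455962742782\t123"))     # rejected
```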
[jira] [Comment Edited] (KAFKA-3256) Large number of system test failures
[ https://issues.apache.org/jira/browse/KAFKA-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15155841#comment-15155841 ] Anna Povzner edited comment on KAFKA-3256 at 2/21/16 1:40 AM: -- [~becket_qin] The "No JSON object could be decoded" failure is also caused by ConsoleConsumer outputting the timestamp type and timestamp. If I remove that output from ConsoleConsumer, I stop getting this failure. Also, once I did that, reassign_partitions_test also started passing. This leads me to believe that there is some common test tool which expects a particular *format* of output, rather than a test expecting a specific output. What was the reason for outputting the timestamp type and timestamp in ConsoleConsumer? If it does not bring much value, I propose to just go back to the original format of outputting key and value in ConsoleConsumer. I think that would fix most of the tests.
[jira] [Commented] (KAFKA-3256) Large number of system test failures
[ https://issues.apache.org/jira/browse/KAFKA-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15155832#comment-15155832 ] Anna Povzner commented on KAFKA-3256: - [~becket_qin] Most of these tests are reproducible locally, so should be easier to debug. I found the issue with some of the connect tests, which produce output like this: Expected ["foo", "bar", "baz", "razz", "ma", "tazz"] but saw ["CreateTime:1455962742782\tfoo", "CreateTime:1455962742789\tbar", "CreateTime:1455962742789\tbaz", "CreateTime:1455962758003\trazz", "CreateTime:1455962758009\tma", "CreateTime:1455962758009\ttazz"] in Kafka Traceback (most recent call last): File "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.10-py2.7.egg/ducktape/tests/runner.py", line 102, in run_all_tests result.data = self.run_single_test() File "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.10-py2.7.egg/ducktape/tests/runner.py", line 154, in run_single_test return self.current_test_context.function(self.current_test) File "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.10-py2.7.egg/ducktape/mark/_mark.py", line 331, in wrapper return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) File "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/connect_test.py", line 86, in test_file_source_and_sink assert expected == actual, "Expected %s but saw %s in Kafka" % (expected, actual) AssertionError: Expected ["foo", "bar", "baz", "razz", "ma", "tazz"] but saw ["CreateTime:1455962742782\tfoo", "CreateTime:1455962742789\tbar", "CreateTime:1455962742789\tbaz", "CreateTime:1455962758003\trazz", "CreateTime:1455962758009\tma", "CreateTime:1455962758009\ttazz"] in Kafka ConsoleConsumer was changed to also output timestamp type and timestamp value in addition to key/value. 
However, it looks like the connect tests expect output with just the key and value. See test_file_source_and_sink in connect_test.py for an example.
[jira] [Assigned] (KAFKA-3201) Add system test for KIP-31 and KIP-32 - Upgrade Test
[ https://issues.apache.org/jira/browse/KAFKA-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner reassigned KAFKA-3201: --- Assignee: Anna Povzner
[jira] [Commented] (KAFKA-3201) Add system test for KIP-31 and KIP-32 - Upgrade Test
[ https://issues.apache.org/jira/browse/KAFKA-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153249#comment-15153249 ] Anna Povzner commented on KAFKA-3201: - Actually, it makes more sense to do this one before the compatibility tests (KAFKA-3188). I will assign to myself and start working on it.
[jira] [Commented] (KAFKA-3188) Add system test for KIP-31 and KIP-32 - Compatibility Test
[ https://issues.apache.org/jira/browse/KAFKA-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133419#comment-15133419 ] Anna Povzner commented on KAFKA-3188: - Cool! I took compatibility test. I can take one more later if needed. > Add system test for KIP-31 and KIP-32 - Compatibility Test > -- > > Key: KAFKA-3188 > URL: https://issues.apache.org/jira/browse/KAFKA-3188 > Project: Kafka > Issue Type: Sub-task >Affects Versions: 0.9.0.0 >Reporter: Jiangjie Qin >Assignee: Anna Povzner > Fix For: 0.10.0.0 > > > The integration test should test the compatibility between 0.10.0 broker with > clients on older versions. The clients version should include 0.9.0 and 0.8.x. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (KAFKA-3188) Add system test for KIP-31 and KIP-32 - Compatibility Test
[ https://issues.apache.org/jira/browse/KAFKA-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner reassigned KAFKA-3188: --- Assignee: Anna Povzner
[jira] [Commented] (KAFKA-3188) Add system test for KIP-31 and KIP-32 - Compatibility Test
[ https://issues.apache.org/jira/browse/KAFKA-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1519#comment-1519 ] Anna Povzner commented on KAFKA-3188: - [~becket_qin] Thanks! Do you mind if I take one of these three tickets? I can do either this one (Compatibility test) or the rolling upgrade. Let me know.
[jira] [Updated] (KAFKA-3162) KIP-42: Add Producer and Consumer Interceptors
[ https://issues.apache.org/jira/browse/KAFKA-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner updated KAFKA-3162: Description: This JIRA is for KIP-42 implementation, which includes: 1. Add ProducerInterceptor interface and call its callbacks from appropriate places in Kafka Producer. 2. Add ConsumerInterceptor interface and call its callbacks from appropriate places in Kafka Consumer. 3. Add unit tests for interceptor changes 4. Add integration test for both mutable consumer and producer interceptors (running at the same time). 5. Add record size and CRC to RecordMetadata and ConsumerRecord. See details in KIP-42 wiki: https://cwiki.apache.org/confluence/display/KAFKA/KIP-42%3A+Add+Producer+and+Consumer+Interceptors was: This JIRA is for KIP-42 implementation, which includes: 1. Add ProducerInterceptor interface and call its callbacks from appropriate places in Kafka Producer. 2. Add ConsumerInterceptor interface and call its callbacks from appropriate places in Kafka Consumer. See details in KIP-42 wiki: https://cwiki.apache.org/confluence/display/KAFKA/KIP-42%3A+Add+Producer+and+Consumer+Interceptors > KIP-42: Add Producer and Consumer Interceptors > -- > > Key: KAFKA-3162 > URL: https://issues.apache.org/jira/browse/KAFKA-3162 > Project: Kafka > Issue Type: Bug >Reporter: Anna Povzner >Assignee: Anna Povzner > > This JIRA is for KIP-42 implementation, which includes: > 1. Add ProducerInterceptor interface and call its callbacks from appropriate > places in Kafka Producer. > 2. Add ConsumerInterceptor interface and call its callbacks from appropriate > places in Kafka Consumer. > 3. Add unit tests for interceptor changes > 4. Add integration test for both mutable consumer and producer interceptors > (running at the same time). > 5. Add record size and CRC to RecordMetadata and ConsumerRecord. 
> See details in KIP-42 wiki: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-42%3A+Add+Producer+and+Consumer+Interceptors > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
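[Editor's sketch] The interceptor callback flow that items 1–2 of the KIP-42 description introduce can be sketched as follows. This is a hedged Python sketch of the pattern only — the real KIP-42 interfaces are Java (`ProducerInterceptor.onSend`/`onAcknowledgement`, `ConsumerInterceptor.onConsume`), and the class names below are invented for illustration:

```python
# Minimal sketch of the KIP-42 interceptor pattern (illustrative only;
# the actual interfaces are Java: ProducerInterceptor, ConsumerInterceptor).

class CountingInterceptor:
    """Hypothetical interceptor that observes every send and ack."""
    def __init__(self):
        self.sent = 0
        self.acked = 0

    def on_send(self, record):
        # Called before the record is handed to the wire; may return a
        # (possibly mutated) record — this is what makes interceptors "mutable".
        self.sent += 1
        return record

    def on_acknowledgement(self, metadata, error):
        # Called once the broker acknowledges (or fails) the send.
        self.acked += 1


class SketchProducer:
    """Toy producer that threads each record through an interceptor chain."""
    def __init__(self, interceptors):
        self.interceptors = interceptors

    def send(self, record):
        for ic in self.interceptors:
            record = ic.on_send(record)
        # ... network send elided; pretend the broker acked immediately ...
        for ic in self.interceptors:
            ic.on_acknowledgement({"offset": 0}, None)
        return record
```

Running several interceptors at once (item 4, the integration test for concurrent mutable producer and consumer interceptors) is just a longer chain passed to the constructor.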
[jira] [Commented] (KAFKA-3188) Add integration test for KIP-31 and KIP-32
[ https://issues.apache.org/jira/browse/KAFKA-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129105#comment-15129105 ] Anna Povzner commented on KAFKA-3188: - [~becket_qin] Do you mind making separate JIRAs for each of the three tests? It will be much easier to review as separate PRs and also would be easy to share workload if one person cannot take on all three tests. Thanks! > Add integration test for KIP-31 and KIP-32 > -- > > Key: KAFKA-3188 > URL: https://issues.apache.org/jira/browse/KAFKA-3188 > Project: Kafka > Issue Type: Sub-task >Affects Versions: 0.9.0.0 >Reporter: Jiangjie Qin > Fix For: 0.10.0.0 > > > The integration test should cover the followings: > 1. Compatibility test. > 2. Upgrade test > 3. Changing message format type on the fly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-3162) KIP-42: Add Producer and Consumer Interceptors
[ https://issues.apache.org/jira/browse/KAFKA-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner updated KAFKA-3162: Status: Patch Available (was: In Progress) > KIP-42: Add Producer and Consumer Interceptors > -- > > Key: KAFKA-3162 > URL: https://issues.apache.org/jira/browse/KAFKA-3162 > Project: Kafka > Issue Type: Improvement >Reporter: Anna Povzner >Assignee: Anna Povzner > > This JIRA is for main part of KIP-42 implementation, which includes: > 1. Add ProducerInterceptor interface and call its callbacks from appropriate > places in Kafka Producer. > 2. Add ConsumerInterceptor interface and call its callbacks from appropriate > places in Kafka Consumer. > 3. Add unit tests for interceptor changes > 4. Add integration test for both mutable consumer and producer interceptors > (running at the same time). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3196) KIP-42 (part 2): add record size and CRC to RecordMetadata and ConsumerRecords
Anna Povzner created KAFKA-3196: --- Summary: KIP-42 (part 2): add record size and CRC to RecordMetadata and ConsumerRecords Key: KAFKA-3196 URL: https://issues.apache.org/jira/browse/KAFKA-3196 Project: Kafka Issue Type: Improvement Reporter: Anna Povzner Assignee: Anna Povzner This is the second (smaller) part of KIP-42, which includes: Add record size and CRC to RecordMetadata and ConsumerRecord. See details in KIP-42 wiki: https://cwiki.apache.org/confluence/display/KAFKA/KIP-42%3A+Add+Producer+and+Consumer+Interceptors -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-3162) KIP-42: Add Producer and Consumer Interceptors
[ https://issues.apache.org/jira/browse/KAFKA-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner updated KAFKA-3162: Description: This JIRA is for main part of KIP-42 implementation, which includes: 1. Add ProducerInterceptor interface and call its callbacks from appropriate places in Kafka Producer. 2. Add ConsumerInterceptor interface and call its callbacks from appropriate places in Kafka Consumer. 3. Add unit tests for interceptor changes 4. Add integration test for both mutable consumer and producer interceptors (running at the same time). was: This JIRA is for KIP-42 implementation, which includes: 1. Add ProducerInterceptor interface and call its callbacks from appropriate places in Kafka Producer. 2. Add ConsumerInterceptor interface and call its callbacks from appropriate places in Kafka Consumer. 3. Add unit tests for interceptor changes 4. Add integration test for both mutable consumer and producer interceptors (running at the same time). 5. Add record size and CRC to RecordMetadata and ConsumerRecord. See details in KIP-42 wiki: https://cwiki.apache.org/confluence/display/KAFKA/KIP-42%3A+Add+Producer+and+Consumer+Interceptors > KIP-42: Add Producer and Consumer Interceptors > -- > > Key: KAFKA-3162 > URL: https://issues.apache.org/jira/browse/KAFKA-3162 > Project: Kafka > Issue Type: Bug >Reporter: Anna Povzner >Assignee: Anna Povzner > > This JIRA is for main part of KIP-42 implementation, which includes: > 1. Add ProducerInterceptor interface and call its callbacks from appropriate > places in Kafka Producer. > 2. Add ConsumerInterceptor interface and call its callbacks from appropriate > places in Kafka Consumer. > 3. Add unit tests for interceptor changes > 4. Add integration test for both mutable consumer and producer interceptors > (running at the same time). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-3162) KIP-42: Add Producer and Consumer Interceptors
[ https://issues.apache.org/jira/browse/KAFKA-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner updated KAFKA-3162: Issue Type: Improvement (was: Bug) > KIP-42: Add Producer and Consumer Interceptors > -- > > Key: KAFKA-3162 > URL: https://issues.apache.org/jira/browse/KAFKA-3162 > Project: Kafka > Issue Type: Improvement >Reporter: Anna Povzner >Assignee: Anna Povzner > > This JIRA is for main part of KIP-42 implementation, which includes: > 1. Add ProducerInterceptor interface and call its callbacks from appropriate > places in Kafka Producer. > 2. Add ConsumerInterceptor interface and call its callbacks from appropriate > places in Kafka Consumer. > 3. Add unit tests for interceptor changes > 4. Add integration test for both mutable consumer and producer interceptors > (running at the same time). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3162) KIP-42: Add Producer and Consumer Interceptors
Anna Povzner created KAFKA-3162: --- Summary: KIP-42: Add Producer and Consumer Interceptors Key: KAFKA-3162 URL: https://issues.apache.org/jira/browse/KAFKA-3162 Project: Kafka Issue Type: Bug Reporter: Anna Povzner Assignee: Anna Povzner This JIRA is for KIP-42 implementation, which includes: 1. Add ProducerInterceptor interface and call its callbacks from appropriate places in Kafka Producer. 2. Add ConsumerInterceptor interface and call its callbacks from appropriate places in Kafka Consumer. See details in KIP-42 wiki: https://cwiki.apache.org/confluence/display/KAFKA/KIP-42%3A+Add+Producer+and+Consumer+Interceptors -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (KAFKA-3162) KIP-42: Add Producer and Consumer Interceptors
[ https://issues.apache.org/jira/browse/KAFKA-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on KAFKA-3162 started by Anna Povzner. --- > KIP-42: Add Producer and Consumer Interceptors > -- > > Key: KAFKA-3162 > URL: https://issues.apache.org/jira/browse/KAFKA-3162 > Project: Kafka > Issue Type: Bug >Reporter: Anna Povzner >Assignee: Anna Povzner > > This JIRA is for KIP-42 implementation, which includes: > 1. Add ProducerInterceptor interface and call its callbacks from appropriate > places in Kafka Producer. > 2. Add ConsumerInterceptor interface and call its callbacks from appropriate > places in Kafka Consumer. > See details in KIP-42 wiki: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-42%3A+Add+Producer+and+Consumer+Interceptors > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-3025) KIP-32 (part 1): Add timestamp field to message, configs, and Producer/ConsumerRecord
[ https://issues.apache.org/jira/browse/KAFKA-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner updated KAFKA-3025: Description: The discussion about this JIRA assignment is still under discussion with [~becket_qin]. Not going to implement without his agreement. This JIRA is for changes for KIP-32 excluding broker checking and acting on timestamp field in a message. This JIRA includes: 1. Add time field to the message Timestamp => int64 Timestamp is the number of milliseconds since Unix Epoch 2. Add time field to both ProducerRecord and Consumer Record If a user specifies the timestamp in a ProducerRecord, the ProducerRecord is sent with this timestamp. If a user does not specify the timestamp in a ProducerRecord, the producer stamps the ProducerRecord with current time. ConsumerRecord will have the timestamp of the message that were stored on broker. 3. Add two new configurations to the broker. Configuration is per topic. * message.timestamp.type: type of a timestamp. Possible values: CreateTime, LogAppendTime. Default: CreateTime * max.message.time.difference.ms: threshold for the acceptable time difference between Timestamp in the message and local time on the broker. Default: Long.MaxValue was: This JIRA is for changes for KIP-32 excluding broker checking and acting on timestamp field in a message. This JIRA includes: 1. Add time field to the message Timestamp => int64 Timestamp is the number of milliseconds since Unix Epoch 2. Add time field to both ProducerRecord and Consumer Record If a user specifies the timestamp in a ProducerRecord, the ProducerRecord is sent with this timestamp. If a user does not specify the timestamp in a ProducerRecord, the producer stamps the ProducerRecord with current time. ConsumerRecord will have the timestamp of the message that were stored on broker. 3. Add two new configurations to the broker. Configuration is per topic. * message.timestamp.type: type of a timestamp. 
Possible values: CreateTime, LogAppendTime. Default: CreateTime * max.message.time.difference.ms: threshold for the acceptable time difference between Timestamp in the message and local time on the broker. Default: Long.MaxValue > KIP-32 (part 1): Add timestamp field to message, configs, and > Producer/ConsumerRecord > - > > Key: KAFKA-3025 > URL: https://issues.apache.org/jira/browse/KAFKA-3025 > Project: Kafka > Issue Type: Improvement >Reporter: Anna Povzner >Assignee: Anna Povzner > > The discussion about this JIRA assignment is still under discussion with > [~becket_qin]. Not going to implement without his agreement. > This JIRA is for changes for KIP-32 excluding broker checking and acting on > timestamp field in a message. > This JIRA includes: > 1. Add time field to the message > Timestamp => int64 > Timestamp is the number of milliseconds since Unix Epoch > 2. Add time field to both ProducerRecord and Consumer Record > If a user specifies the timestamp in a ProducerRecord, the ProducerRecord is > sent with this timestamp. > If a user does not specify the timestamp in a ProducerRecord, the producer > stamps the ProducerRecord with current time. > ConsumerRecord will have the timestamp of the message that were stored on > broker. > 3. Add two new configurations to the broker. Configuration is per topic. > * message.timestamp.type: type of a timestamp. Possible values: CreateTime, > LogAppendTime. Default: CreateTime > * max.message.time.difference.ms: threshold for the acceptable time > difference between Timestamp in the message and local time on the broker. > Default: Long.MaxValue -- This message was sent by Atlassian JIRA (v6.3.4#6332)
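[Editor's sketch] The producer-side stamping rule in item 2 of the KIP-32 (part 1) description can be sketched in Python. The real implementation is Java inside the Kafka producer; the function name and record shape below are illustrative:

```python
import time

def stamp_producer_record(record, now_ms=None):
    """KIP-32 producer rule sketch: if the user supplied no timestamp,
    stamp the record with the producer's current wall-clock time
    (milliseconds since the Unix epoch); otherwise keep the user's value."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    if record.get("timestamp") is None:
        record = dict(record, timestamp=now_ms)
    return record
```

The consumer side then simply surfaces whatever timestamp the broker stored, which is why `ConsumerRecord` needs the new field but no stamping logic of its own.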
[jira] [Updated] (KAFKA-3026) KIP-32 (part 2): Changes in broker to over-write timestamp or reject message
[ https://issues.apache.org/jira/browse/KAFKA-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner updated KAFKA-3026: Description: The discussion about this JIRA assignment is still under discussion with [~becket_qin]. Not going to implement without his agreement. This JIRA includes: When the broker receives a message, it checks the configs: 1. If message.timestamp.type=LogAppendTime, the server over-writes the timestamp with its current local time Message could be compressed or not compressed. In either case, the timestamp is always over-written to broker's current time 2. If message.timestamp.type=CreateTime, the server calculated the difference between the current time on broker and Timestamp in the message: If difference is within max.message.time.difference.ms, the server will accept it and append it to the log. For compressed message, server will update the timestamp in compressed message to -1: this means that CreateTime is used and the timestamp is in each individual inner message. If difference is higher than max.message.time.difference.ms, the server will reject the entire batch with TimestampExceededThresholdException. (Actually adding the timestamp to the message and adding configs are covered by KAFKA-3025). was: This JIRA includes: When the broker receives a message, it checks the configs: 1. If message.timestamp.type=LogAppendTime, the server over-writes the timestamp with its current local time Message could be compressed or not compressed. In either case, the timestamp is always over-written to broker's current time 2. If message.timestamp.type=CreateTime, the server calculated the difference between the current time on broker and Timestamp in the message: If difference is within max.message.time.difference.ms, the server will accept it and append it to the log. 
For compressed message, server will update the timestamp in compressed message to -1: this means that CreateTime is used and the timestamp is in each individual inner message. If difference is higher than max.message.time.difference.ms, the server will reject the entire batch with TimestampExceededThresholdException. (Actually adding the timestamp to the message and adding configs are covered by KAFKA-3025). > KIP-32 (part 2): Changes in broker to over-write timestamp or reject message > > > Key: KAFKA-3026 > URL: https://issues.apache.org/jira/browse/KAFKA-3026 > Project: Kafka > Issue Type: Improvement >Reporter: Anna Povzner >Assignee: Anna Povzner > > The discussion about this JIRA assignment is still under discussion with > [~becket_qin]. Not going to implement without his agreement. > This JIRA includes: > When the broker receives a message, it checks the configs: > 1. If message.timestamp.type=LogAppendTime, the server over-writes the > timestamp with its current local time > Message could be compressed or not compressed. In either case, the timestamp > is always over-written to broker's current time > 2. If message.timestamp.type=CreateTime, the server calculated the difference > between the current time on broker and Timestamp in the message: > If difference is within max.message.time.difference.ms, the server will > accept it and append it to the log. For compressed message, server will > update the timestamp in compressed message to -1: this means that CreateTime > is used and the timestamp is in each individual inner message. > If difference is higher than max.message.time.difference.ms, the server will > reject the entire batch with TimestampExceededThresholdException. > (Actually adding the timestamp to the message and adding configs are covered > by KAFKA-3025). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
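[Editor's sketch] The broker-side accept/over-write/reject decision that the KAFKA-3026 description walks through can be condensed into a small function. This is an illustrative Python sketch under the ticket's stated semantics — the real logic lives in the broker's log-append path, and the exception name is taken from the JIRA text:

```python
LOG_APPEND_TIME = "LogAppendTime"   # per-topic config message.timestamp.type
CREATE_TIME = "CreateTime"

class TimestampExceededThresholdError(Exception):
    """Stands in for the TimestampExceededThresholdException named in the JIRA."""

def broker_timestamp(msg_ts, broker_now_ms, timestamp_type, max_diff_ms):
    """Return the timestamp the broker stores for a message, or raise if
    the batch must be rejected (KIP-32 part 2 sketch)."""
    if timestamp_type == LOG_APPEND_TIME:
        # Case 1: always over-write with the broker's current local time,
        # compressed or not.
        return broker_now_ms
    # Case 2 (CreateTime): accept only if the difference between broker
    # time and the message timestamp is within max.message.time.difference.ms.
    if abs(broker_now_ms - msg_ts) <= max_diff_ms:
        return msg_ts
    raise TimestampExceededThresholdError(
        f"|{broker_now_ms} - {msg_ts}| exceeds {max_diff_ms}")
```

The default `max.message.time.difference.ms` of `Long.MaxValue` (from KAFKA-3025) makes the CreateTime branch accept everything unless an operator tightens it.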
[jira] [Updated] (KAFKA-3026) KIP-32 (part 2): Changes in broker to over-write timestamp or reject message
[ https://issues.apache.org/jira/browse/KAFKA-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner updated KAFKA-3026: Description: The decision about this JIRA assignment is still under discussion with [~becket_qin]. Not going to implement without his agreement. This JIRA includes: When the broker receives a message, it checks the configs: 1. If message.timestamp.type=LogAppendTime, the server over-writes the timestamp with its current local time Message could be compressed or not compressed. In either case, the timestamp is always over-written to broker's current time 2. If message.timestamp.type=CreateTime, the server calculated the difference between the current time on broker and Timestamp in the message: If difference is within max.message.time.difference.ms, the server will accept it and append it to the log. For compressed message, server will update the timestamp in compressed message to -1: this means that CreateTime is used and the timestamp is in each individual inner message. If difference is higher than max.message.time.difference.ms, the server will reject the entire batch with TimestampExceededThresholdException. (Actually adding the timestamp to the message and adding configs are covered by KAFKA-3025). was: The discussion about this JIRA assignment is still under discussion with [~becket_qin]. Not going to implement without his agreement. This JIRA includes: When the broker receives a message, it checks the configs: 1. If message.timestamp.type=LogAppendTime, the server over-writes the timestamp with its current local time Message could be compressed or not compressed. In either case, the timestamp is always over-written to broker's current time 2. If message.timestamp.type=CreateTime, the server calculated the difference between the current time on broker and Timestamp in the message: If difference is within max.message.time.difference.ms, the server will accept it and append it to the log. 
For compressed message, server will update the timestamp in compressed message to -1: this means that CreateTime is used and the timestamp is in each individual inner message. If difference is higher than max.message.time.difference.ms, the server will reject the entire batch with TimestampExceededThresholdException. (Actually adding the timestamp to the message and adding configs are covered by KAFKA-3025). > KIP-32 (part 2): Changes in broker to over-write timestamp or reject message > > > Key: KAFKA-3026 > URL: https://issues.apache.org/jira/browse/KAFKA-3026 > Project: Kafka > Issue Type: Improvement >Reporter: Anna Povzner >Assignee: Anna Povzner > > The decision about this JIRA assignment is still under discussion with > [~becket_qin]. Not going to implement without his agreement. > This JIRA includes: > When the broker receives a message, it checks the configs: > 1. If message.timestamp.type=LogAppendTime, the server over-writes the > timestamp with its current local time > Message could be compressed or not compressed. In either case, the timestamp > is always over-written to broker's current time > 2. If message.timestamp.type=CreateTime, the server calculated the difference > between the current time on broker and Timestamp in the message: > If difference is within max.message.time.difference.ms, the server will > accept it and append it to the log. For compressed message, server will > update the timestamp in compressed message to -1: this means that CreateTime > is used and the timestamp is in each individual inner message. > If difference is higher than max.message.time.difference.ms, the server will > reject the entire batch with TimestampExceededThresholdException. > (Actually adding the timestamp to the message and adding configs are covered > by KAFKA-3025). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3025) KIP-32 (part 1): Add timestamp field to message, configs, and Producer/ConsumerRecord
[ https://issues.apache.org/jira/browse/KAFKA-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15068580#comment-15068580 ] Anna Povzner commented on KAFKA-3025: - [~lindong] : I am in contact with [~becket_qin] about KIP-32 and the plan for implementation. I will update my two JIRAs with this info. Sorry for the confusion. > KIP-32 (part 1): Add timestamp field to message, configs, and > Producer/ConsumerRecord > - > > Key: KAFKA-3025 > URL: https://issues.apache.org/jira/browse/KAFKA-3025 > Project: Kafka > Issue Type: Improvement >Reporter: Anna Povzner >Assignee: Anna Povzner > > This JIRA is for changes for KIP-32 excluding broker checking and acting on > timestamp field in a message. > This JIRA includes: > 1. Add time field to the message > Timestamp => int64 > Timestamp is the number of milliseconds since Unix Epoch > 2. Add time field to both ProducerRecord and Consumer Record > If a user specifies the timestamp in a ProducerRecord, the ProducerRecord is > sent with this timestamp. > If a user does not specify the timestamp in a ProducerRecord, the producer > stamps the ProducerRecord with current time. > ConsumerRecord will have the timestamp of the message that were stored on > broker. > 3. Add two new configurations to the broker. Configuration is per topic. > * message.timestamp.type: type of a timestamp. Possible values: CreateTime, > LogAppendTime. Default: CreateTime > * max.message.time.difference.ms: threshold for the acceptable time > difference between Timestamp in the message and local time on the broker. > Default: Long.MaxValue -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3025) KIP-31 (part 1): Add timestamp field to message, configs, and Producer/ConsumerRecord
Anna Povzner created KAFKA-3025: --- Summary: KIP-31 (part 1): Add timestamp field to message, configs, and Producer/ConsumerRecord Key: KAFKA-3025 URL: https://issues.apache.org/jira/browse/KAFKA-3025 Project: Kafka Issue Type: Improvement Reporter: Anna Povzner Assignee: Anna Povzner This JIRA is for changes for KIP-32 excluding broker checking and acting on timestamp field in a message. This JIRA includes: 1. Add time field to the message Timestamp => int64 Timestamp is the number of milliseconds since Unix Epoch 2. Add time field to both ProducerRecord and Consumer Record If a user specifies the timestamp in a ProducerRecord, the ProducerRecord is sent with this timestamp. If a user does not specify the timestamp in a ProducerRecord, the producer stamps the ProducerRecord with current time. ConsumerRecord will have the timestamp of the message that were stored on broker. 3. Add two new configurations to the broker. Configuration is per topic. * message.timestamp.type: type of a timestamp. Possible values: CreateTime, LogAppendTime. Default: CreateTime * max.message.time.difference.ms: threshold for the acceptable time difference between Timestamp in the message and local time on the broker. Default: Long.MaxValue -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3026) KIP-32 (part 2): Changes in broker to over-write timestamp or reject message
Anna Povzner created KAFKA-3026: --- Summary: KIP-32 (part 2): Changes in broker to over-write timestamp or reject message Key: KAFKA-3026 URL: https://issues.apache.org/jira/browse/KAFKA-3026 Project: Kafka Issue Type: Improvement Reporter: Anna Povzner Assignee: Anna Povzner This JIRA includes: When the broker receives a message, it checks the configs: 1. If message.timestamp.type=LogAppendTime, the server over-writes the timestamp with its current local time Message could be compressed or not compressed. In either case, the timestamp is always over-written to broker's current time 2. If message.timestamp.type=CreateTime, the server calculated the difference between the current time on broker and Timestamp in the message: If difference is within max.message.time.difference.ms, the server will accept it and append it to the log. For compressed message, server will update the timestamp in compressed message to -1: this means that CreateTime is used and the timestamp is in each individual inner message. If difference is higher than max.message.time.difference.ms, the server will reject the entire batch with TimestampExceededThresholdException. (Actually adding the timestamp to the message and adding configs are covered by KAFKA-3025). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-3025) KIP-32 (part 1): Add timestamp field to message, configs, and Producer/ConsumerRecord
[ https://issues.apache.org/jira/browse/KAFKA-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner updated KAFKA-3025: Summary: KIP-32 (part 1): Add timestamp field to message, configs, and Producer/ConsumerRecord (was: KIP-31 (part 1): Add timestamp field to message, configs, and Producer/ConsumerRecord) > KIP-32 (part 1): Add timestamp field to message, configs, and > Producer/ConsumerRecord > - > > Key: KAFKA-3025 > URL: https://issues.apache.org/jira/browse/KAFKA-3025 > Project: Kafka > Issue Type: Improvement >Reporter: Anna Povzner >Assignee: Anna Povzner > > This JIRA is for changes for KIP-32 excluding broker checking and acting on > timestamp field in a message. > This JIRA includes: > 1. Add time field to the message > Timestamp => int64 > Timestamp is the number of milliseconds since Unix Epoch > 2. Add time field to both ProducerRecord and Consumer Record > If a user specifies the timestamp in a ProducerRecord, the ProducerRecord is > sent with this timestamp. > If a user does not specify the timestamp in a ProducerRecord, the producer > stamps the ProducerRecord with current time. > ConsumerRecord will have the timestamp of the message that were stored on > broker. > 3. Add two new configurations to the broker. Configuration is per topic. > * message.timestamp.type: type of a timestamp. Possible values: CreateTime, > LogAppendTime. Default: CreateTime > * max.message.time.difference.ms: threshold for the acceptable time > difference between Timestamp in the message and local time on the broker. > Default: Long.MaxValue -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2896) System test for partition re-assignment
[ https://issues.apache.org/jira/browse/KAFKA-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner updated KAFKA-2896: Status: Patch Available (was: In Progress) > System test for partition re-assignment > --- > > Key: KAFKA-2896 > URL: https://issues.apache.org/jira/browse/KAFKA-2896 > Project: Kafka > Issue Type: Task >Reporter: Gwen Shapira >Assignee: Anna Povzner > > Lots of users depend on partition re-assignment tool to manage their cluster. > Will be nice to have a simple system tests that creates a topic with few > partitions and few replicas, reassigns everything and validates the ISR > afterwards. > Just to make sure we are not breaking anything. Especially since we have > plans to improve (read: modify) this area. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2825) Add controller failover to existing replication tests
[ https://issues.apache.org/jira/browse/KAFKA-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner updated KAFKA-2825: Reviewer: Guozhang Wang (was: Geoff Anderson) > Add controller failover to existing replication tests > - > > Key: KAFKA-2825 > URL: https://issues.apache.org/jira/browse/KAFKA-2825 > Project: Kafka > Issue Type: Test >Reporter: Anna Povzner >Assignee: Anna Povzner > Fix For: 0.9.1.0 > > > Extend existing replication tests to include controller failover: > * clean/hard shutdown > * clean/hard bounce -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-2851) system tests: error copying keytab file
[ https://issues.apache.org/jira/browse/KAFKA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010231#comment-15010231 ] Anna Povzner edited comment on KAFKA-2851 at 12/2/15 6:16 PM: -- Pull request: https://github.com/apache/kafka/pull/610 was (Author: apovzner): Pull request: https://github.com/apache/kafka/pull/609 > system tests: error copying keytab file > --- > > Key: KAFKA-2851 > URL: https://issues.apache.org/jira/browse/KAFKA-2851 > Project: Kafka > Issue Type: Bug >Reporter: Geoff Anderson >Assignee: Anna Povzner >Priority: Minor > > It is best to use unique paths for temporary files on the test driver machine > so that multiple test jobs don't conflict. > If the test driver machine is running multiple ducktape jobs concurrently, as > is the case with Confluent nightly test runs, conflicts can occur if the same > canonical path is always used. > In this case, security_config.py copies a file to /tmp/keytab on the test > driver machine, while other jobs may remove this from the driver machine. 
> Then you can get errors like this: > {code} > > test_id: > 2015-11-17--001.kafkatest.tests.replication_test.ReplicationTest.test_replication_with_broker_failure.security_protocol=SASL_PLAINTEXT.failure_mode=clean_bounce > status: FAIL > run time: 1 minute 33.395 seconds > > Traceback (most recent call last): > File > "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/tests/runner.py", > line 101, in run_all_tests > result.data = self.run_single_test() > File > "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/tests/runner.py", > line 151, in run_single_test > return self.current_test_context.function(self.current_test) > File > "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/mark/_mark.py", > line 331, in wrapper > return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) > File > "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py", > line 132, in test_replication_with_broker_failure > self.run_produce_consume_validate(core_test_action=lambda: > failures[failure_mode](self)) > File > "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/produce_consume_validate.py", > line 66, in run_produce_consume_validate > core_test_action() > File > "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py", > line 132, in > self.run_produce_consume_validate(core_test_action=lambda: > failures[failure_mode](self)) > File > "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py", > line 43, in clean_bounce > test.kafka.restart_node(prev_leader_node, clean_shutdown=True) > File > "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/kafka/kafka.py", > line 275, in restart_node > 
self.start_node(node) > File > "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/kafka/kafka.py", > line 123, in start_node > self.security_config.setup_node(node) > File > "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/security/security_config.py", > line 130, in setup_node > node.account.scp_to(MiniKdc.LOCAL_KEYTAB_FILE, SecurityConfig.KEYTAB_PATH) > File > "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/cluster/remoteaccount.py", > line 174, in scp_to > return self._ssh_quiet(self.scp_to_command(src, dest, recursive)) > File > "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/cluster/remoteaccount.py", > line 219, in _ssh_quiet > raise e > CalledProcessError: Command 'scp -o 'HostName 52.33.250.202' -o 'Port 22' -o > 'UserKnownHostsFile /dev/null' -o 'StrictHostKeyChecking no' -o > 'PasswordAuthentication no' -o 'IdentityFile /var/lib/jenkins/muckrake.pem' > -o 'IdentitiesOnly yes' -o 'LogLevel FATAL' /tmp/keytab > ubuntu@worker2:/mnt/security/keytab' returned non-zero exit status 1 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
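The fix the issue calls for — a per-job unique path instead of the fixed /tmp/keytab — can be sketched as below. This is a hedged illustration, not the actual patch: the helper name and prefix are made up, and the real fix lives in security_config.py.

```python
import os
import tempfile

def unique_keytab_path(prefix="keytab-"):
    """Return a keytab path unique to this test job, so concurrent
    ducktape runs on the same driver machine do not clobber each
    other's copy. (Illustrative helper, not the actual fix.)"""
    # mkdtemp creates a fresh directory with an unguessable suffix,
    # so every job gets its own location under the system temp dir.
    tmpdir = tempfile.mkdtemp(prefix=prefix)
    return os.path.join(tmpdir, "keytab")

path_a = unique_keytab_path()
path_b = unique_keytab_path()
assert path_a != path_b  # each job sees a distinct path
```

With a scheme like this, one job deleting its own keytab can no longer break another job's scp of the shared /tmp/keytab.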
[jira] [Updated] (KAFKA-2851) system tests: error copying keytab file
[ https://issues.apache.org/jira/browse/KAFKA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner updated KAFKA-2851: Reviewer: Guozhang Wang (was: Gwen Shapira)
[jira] [Updated] (KAFKA-2851) system tests: error copying keytab file
[ https://issues.apache.org/jira/browse/KAFKA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner updated KAFKA-2851: Resolution: Fixed Status: Resolved (was: Patch Available)
[jira] [Commented] (KAFKA-2896) System test for partition re-assignment
[ https://issues.apache.org/jira/browse/KAFKA-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034606#comment-15034606 ] Anna Povzner commented on KAFKA-2896: - [~jinxing6...@126.com] I am already working on it; sorry I did not assign it to myself earlier. > System test for partition re-assignment > --- > > Key: KAFKA-2896 > URL: https://issues.apache.org/jira/browse/KAFKA-2896 > Project: Kafka > Issue Type: Task >Reporter: Gwen Shapira >Assignee: Anna Povzner > > Lots of users depend on the partition re-assignment tool to manage their clusters. > It would be nice to have a simple system test that creates a topic with a few > partitions and a few replicas, reassigns everything, and validates the ISR > afterwards. > Just to make sure we are not breaking anything, especially since we have > plans to improve (read: modify) this area.
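The re-assignment the test description refers to is driven by a small JSON plan consumed by kafka-reassign-partitions.sh. A minimal sketch of building such a plan (topic name and broker ids below are made-up examples):

```python
import json

def reassignment_json(topic, assignment):
    """Build the JSON plan consumed by kafka-reassign-partitions.sh.
    `assignment` maps partition id -> list of target broker ids.
    (Illustrative helper; not part of the actual test code.)"""
    return json.dumps({
        "version": 1,
        "partitions": [
            {"topic": topic, "partition": p, "replicas": brokers}
            for p, brokers in sorted(assignment.items())
        ],
    })

# Move partition 0 to brokers 2,3 and partition 1 to brokers 3,1.
plan = reassignment_json("test-topic", {0: [2, 3], 1: [3, 1]})
```

After the tool applies such a plan, a test can poll the topic's ISR until it matches the target replica lists.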
[jira] [Work started] (KAFKA-2896) System test for partition re-assignment
[ https://issues.apache.org/jira/browse/KAFKA-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on KAFKA-2896 started by Anna Povzner.
[jira] [Assigned] (KAFKA-2896) System test for partition re-assignment
[ https://issues.apache.org/jira/browse/KAFKA-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner reassigned KAFKA-2896: Assignee: Anna Povzner
[jira] [Issue Comment Deleted] (KAFKA-2851) system tests: error copying keytab file
[ https://issues.apache.org/jira/browse/KAFKA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner updated KAFKA-2851: Comment: was deleted (was: In the same PR as KAFKA-2825)
[jira] [Comment Edited] (KAFKA-2851) system tests: error copying keytab file
[ https://issues.apache.org/jira/browse/KAFKA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010231#comment-15010231 ] Anna Povzner edited comment on KAFKA-2851 at 12/2/15 12:10 AM: --- Pull request: https://github.com/apache/kafka/pull/609 was (Author: apovzner): Pull request: https://github.com/apache/kafka/pull/518
[jira] [Updated] (KAFKA-2851) system tests: error copying keytab file
[ https://issues.apache.org/jira/browse/KAFKA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner updated KAFKA-2851: Reviewer: Gwen Shapira Status: Patch Available (was: In Progress) In the same PR as KAFKA-2825
[jira] [Updated] (KAFKA-2825) Add controller failover to existing replication tests
[ https://issues.apache.org/jira/browse/KAFKA-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner updated KAFKA-2825: Reviewer: Gwen Shapira Status: Patch Available (was: In Progress) The PR also contains the fix for KAFKA-2851. Will assign the same reviewer. > Add controller failover to existing replication tests > - > > Key: KAFKA-2825 > URL: https://issues.apache.org/jira/browse/KAFKA-2825 > Project: Kafka > Issue Type: Test >Reporter: Anna Povzner >Assignee: Anna Povzner > > Extend existing replication tests to include controller failover: > * clean/hard shutdown > * clean/hard bounce
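The failure modes listed in KAFKA-2825 plug into the dispatch pattern visible in the traceback above (`failures[failure_mode](self)`). A stand-in sketch of that pattern — the action bodies and test object here are placeholders, not the real ducktape services:

```python
# Each failure mode is a callable keyed by name; the parameterized
# test looks its mode up and invokes it mid-run. The bodies below
# only record the action, standing in for real broker operations.
def clean_shutdown(test):
    test.events.append("clean_shutdown")

def hard_shutdown(test):
    test.events.append("hard_shutdown")

def clean_bounce(test):
    test.events.append("clean_bounce")

def hard_bounce(test):
    test.events.append("hard_bounce")

failures = {
    "clean_shutdown": clean_shutdown,
    "hard_shutdown": hard_shutdown,
    "clean_bounce": clean_bounce,
    "hard_bounce": hard_bounce,
}

class FakeTest:
    """Placeholder for the replication test object."""
    def __init__(self):
        self.events = []

test = FakeTest()
failures["clean_bounce"](test)  # dispatch one parameterized mode
```

Extending the matrix to controller failover then amounts to adding controller-targeted entries to the same table rather than new test classes.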
[jira] [Assigned] (KAFKA-2851) system tests: error copying keytab file
[ https://issues.apache.org/jira/browse/KAFKA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Povzner reassigned KAFKA-2851: Assignee: Anna Povzner
[jira] [Work started] (KAFKA-2851) system tests: error copying keytab file
[ https://issues.apache.org/jira/browse/KAFKA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on KAFKA-2851 started by Anna Povzner.
---
> system tests: error copying keytab file
>
> Key: KAFKA-2851
> URL: https://issues.apache.org/jira/browse/KAFKA-2851
> Project: Kafka
> Issue Type: Bug
> Reporter: Geoff Anderson
> Assignee: Anna Povzner
> Priority: Minor
>
> It is best to use unique paths for temporary files on the test driver machine so that multiple test jobs don't conflict.
> If the test driver machine is running multiple ducktape jobs concurrently, as is the case with Confluent nightly test runs, conflicts can occur if the same canonical path is always used.
> In this case, security_config.py copies a file to /tmp/keytab on the test driver machine, while other jobs may remove this file from the driver machine.
> Then you can get errors like this:
> {code}
> test_id: 2015-11-17--001.kafkatest.tests.replication_test.ReplicationTest.test_replication_with_broker_failure.security_protocol=SASL_PLAINTEXT.failure_mode=clean_bounce
> status: FAIL
> run time: 1 minute 33.395 seconds
>
> Traceback (most recent call last):
>   File "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/tests/runner.py", line 101, in run_all_tests
>     result.data = self.run_single_test()
>   File "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/tests/runner.py", line 151, in run_single_test
>     return self.current_test_context.function(self.current_test)
>   File "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/mark/_mark.py", line 331, in wrapper
>     return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py", line 132, in test_replication_with_broker_failure
>     self.run_produce_consume_validate(core_test_action=lambda: failures[failure_mode](self))
>   File "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/produce_consume_validate.py", line 66, in run_produce_consume_validate
>     core_test_action()
>   File "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py", line 132, in <lambda>
>     self.run_produce_consume_validate(core_test_action=lambda: failures[failure_mode](self))
>   File "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py", line 43, in clean_bounce
>     test.kafka.restart_node(prev_leader_node, clean_shutdown=True)
>   File "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/kafka/kafka.py", line 275, in restart_node
>     self.start_node(node)
>   File "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/kafka/kafka.py", line 123, in start_node
>     self.security_config.setup_node(node)
>   File "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/security/security_config.py", line 130, in setup_node
>     node.account.scp_to(MiniKdc.LOCAL_KEYTAB_FILE, SecurityConfig.KEYTAB_PATH)
>   File "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/cluster/remoteaccount.py", line 174, in scp_to
>     return self._ssh_quiet(self.scp_to_command(src, dest, recursive))
>   File "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/cluster/remoteaccount.py", line 219, in _ssh_quiet
>     raise e
> CalledProcessError: Command 'scp -o 'HostName 52.33.250.202' -o 'Port 22' -o 'UserKnownHostsFile /dev/null' -o 'StrictHostKeyChecking no' -o 'PasswordAuthentication no' -o 'IdentityFile /var/lib/jenkins/muckrake.pem' -o 'IdentitiesOnly yes' -o 'LogLevel FATAL' /tmp/keytab ubuntu@worker2:/mnt/security/keytab' returned non-zero exit status 1
> {code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
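The fix the issue asks for — a unique per-job path instead of the shared /tmp/keytab — can be sketched as below. This is an illustrative helper, not the actual security_config.py patch; the function name `unique_keytab_path` and the prefix are assumptions for the example:

```python
import os
import tempfile


def unique_keytab_path():
    """Return a keytab path inside a freshly created per-job directory.

    tempfile.mkdtemp() creates a new, uniquely named directory on each
    call, so two ducktape jobs running concurrently on the same test
    driver machine can never clobber each other's keytab file the way
    a fixed /tmp/keytab path allows.
    """
    job_dir = tempfile.mkdtemp(prefix="kafka-keytab-")
    return os.path.join(job_dir, "keytab")
```

The driver-side copy in security_config.py would then scp to `unique_keytab_path()` rather than a hard-coded constant, and clean the directory up when the job finishes.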
[jira] [Commented] (KAFKA-2851) system tests: error copying keytab file
[ https://issues.apache.org/jira/browse/KAFKA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010231#comment-15010231 ] Anna Povzner commented on KAFKA-2851: Pull request: https://github.com/apache/kafka/pull/518
> system tests: error copying keytab file
[jira] [Work started] (KAFKA-2825) Add controller failover to existing replication tests
[ https://issues.apache.org/jira/browse/KAFKA-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on KAFKA-2825 started by Anna Povzner.
---
> Add controller failover to existing replication tests
>
> Key: KAFKA-2825
> URL: https://issues.apache.org/jira/browse/KAFKA-2825
> Project: Kafka
> Issue Type: Test
> Reporter: Anna Povzner
> Assignee: Anna Povzner
>
> Extend existing replication tests to include controller failover:
> * clean/hard shutdown
> * clean/hard bounce
[jira] [Created] (KAFKA-2825) Add controller failover to existing replication tests
Anna Povzner created KAFKA-2825:
---
Summary: Add controller failover to existing replication tests
Key: KAFKA-2825
URL: https://issues.apache.org/jira/browse/KAFKA-2825
Project: Kafka
Issue Type: Test
Reporter: Anna Povzner
Assignee: Anna Povzner

Extend existing replication tests to include controller failover:
* clean/hard shutdown
* clean/hard bounce
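The controller-failover failure modes described in KAFKA-2825 could be plugged into the replication tests' `failures` table roughly as follows. This is a hypothetical sketch: `test.kafka.controller()` and the node start/stop/restart helpers are assumed names standing in for whatever the kafkatest service API actually exposes:

```python
# Illustrative controller-failover actions for the replication tests.
# `test` is assumed to be the test instance, with `test.kafka` exposing
# a controller() lookup and node lifecycle methods (assumed API).

def clean_bounce_controller(test):
    """Cleanly shut down and restart the current controller node,
    forcing a controller re-election mid-test."""
    controller_node = test.kafka.controller()
    test.kafka.restart_node(controller_node, clean_shutdown=True)


def hard_bounce_controller(test):
    """Kill the controller without a clean shutdown, then restart it."""
    controller_node = test.kafka.controller()
    test.kafka.stop_node(controller_node, clean_shutdown=False)
    test.kafka.start_node(controller_node)
```

These mirror the existing clean/hard bounce actions that target the partition leader (e.g. `clean_bounce` at replication_test.py line 43 in the traceback above), but aim the bounce at the controller instead.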