[jira] [Created] (KAFKA-10509) Add metric to track throttle time due to hitting connection rate quota

2020-09-21 Thread Anna Povzner (Jira)
Anna Povzner created KAFKA-10509:


 Summary: Add metric to track throttle time due to hitting 
connection rate quota
 Key: KAFKA-10509
 URL: https://issues.apache.org/jira/browse/KAFKA-10509
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: Anna Povzner
Assignee: Anna Povzner
 Fix For: 2.7.0


See KIP-612.

 

kafka.network:type=socket-server-metrics,name=connection-accept-throttle-time,listener={listenerName}
 * Type: SampledStat.Avg
 * Description: Average throttle time due to violating per-listener or 
broker-wide connection acceptance rate quota on a given listener.
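
A hedged sketch (not the actual SocketServer code) of how such a sensor could be 
registered with Kafka's common metrics API. The Metrics instance, the listener tag 
value, and where record() would be called are assumptions made for illustration.

{code:scala}
import java.util.Collections
import org.apache.kafka.common.metrics.Metrics
import org.apache.kafka.common.metrics.stats.Avg

object ConnectionAcceptThrottleMetricSketch {
  def main(args: Array[String]): Unit = {
    val metrics = new Metrics()
    val listenerName = "PLAINTEXT" // assumed listener tag value

    // Register a sensor whose only stat is the average throttle time.
    val sensor = metrics.sensor(s"connection-accept-throttle-time-$listenerName")
    sensor.add(
      metrics.metricName(
        "connection-accept-throttle-time",
        "socket-server-metrics",
        "Average throttle time due to violating the connection acceptance rate quota",
        Collections.singletonMap("listener", listenerName)),
      new Avg())

    // Wherever the acceptor decides to delay an accepted connection,
    // it would record the computed throttle time (in ms) on this sensor.
    sensor.record(250.0)

    metrics.close()
  }
}
{code}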



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-10458) Need a way to update quota for TokenBucket registered with Sensor

2020-09-21 Thread Anna Povzner (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner resolved KAFKA-10458.
--
Resolution: Fixed

> Need a way to update quota for TokenBucket registered with Sensor
> -
>
> Key: KAFKA-10458
> URL: https://issues.apache.org/jira/browse/KAFKA-10458
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Reporter: Anna Povzner
>Assignee: David Jacot
>Priority: Major
> Fix For: 2.7.0
>
>
> For a Rate() metric with a quota config, we update the quota by updating the config 
> of the KafkaMetric. However, this is not sufficient for TokenBucket, because it uses 
> the quota config on record() to properly calculate the number of tokens. Sensor 
> passes the config stored in the corresponding StatAndConfig, which currently never 
> changes. This means that after updating the quota via KafkaMetric.config (our 
> current and only method), Sensor will record the value using the old quota but then 
> measure the value to check for quota violation using the new quota value. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10458) Need a way to update quota for TokenBucket registered with Sensor

2020-09-02 Thread Anna Povzner (Jira)
Anna Povzner created KAFKA-10458:


 Summary: Need a way to update quota for TokenBucket registered 
with Sensor
 Key: KAFKA-10458
 URL: https://issues.apache.org/jira/browse/KAFKA-10458
 Project: Kafka
  Issue Type: Improvement
  Components: clients
Reporter: Anna Povzner
Assignee: Anna Povzner
 Fix For: 2.7.0


For a Rate() metric with a quota config, we update the quota by updating the config of 
the KafkaMetric. However, this is not sufficient for TokenBucket, because it uses the 
quota config on record() to properly calculate the number of tokens. Sensor passes the 
config stored in the corresponding StatAndConfig, which currently never changes. This 
means that after updating the quota via KafkaMetric.config (our current and only 
method), Sensor will record the value using the old quota but then measure the value 
to check for quota violation using the new quota value. 
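
A minimal sketch, using only the public Metrics/Sensor API, of the update path 
described above; it illustrates the gap rather than reproducing Kafka's quota 
managers. The sensor and metric names are made up for the example.

{code:scala}
import org.apache.kafka.common.metrics.{MetricConfig, Metrics, Quota}
import org.apache.kafka.common.metrics.stats.Rate

object QuotaUpdateSketch {
  def main(args: Array[String]): Unit = {
    val metrics = new Metrics()

    // The quota lives in the MetricConfig that the sensor/stat is registered with.
    val initialConfig = new MetricConfig().quota(Quota.upperBound(10.0))
    val sensor = metrics.sensor("connection-rate", initialConfig)
    val metricName = metrics.metricName("connection-rate", "test-group")
    sensor.add(metricName, new Rate())

    // Today quotas are updated by swapping the config on the KafkaMetric only.
    val newConfig = new MetricConfig().quota(Quota.upperBound(20.0))
    metrics.metric(metricName).config(newConfig)

    // Quota violation checks see the new bound, but the config captured in the
    // sensor's StatAndConfig (which stats such as TokenBucket read inside record())
    // still holds the old quota, so the two disagree until it is also updated.
    sensor.record(1.0)

    metrics.close()
  }
}
{code}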



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10157) Multiple tests failed due to "Failed to process feature ZK node change event"

2020-06-11 Thread Anna Povzner (Jira)
Anna Povzner created KAFKA-10157:


 Summary: Multiple tests failed due to "Failed to process feature 
ZK node change event"
 Key: KAFKA-10157
 URL: https://issues.apache.org/jira/browse/KAFKA-10157
 Project: Kafka
  Issue Type: Bug
Reporter: Anna Povzner


Multiple tests failed due to "Failed to process feature ZK node change event". 
This looks like a result of merging this PR: 
[https://github.com/apache/kafka/pull/8680]

Note that running tests without `--info` gives output like this one: 
{quote}Process 'Gradle Test Executor 36' finished with non-zero exit value 1
{quote}
kafka.network.DynamicConnectionQuotaTest failed:
{quote}
kafka.network.DynamicConnectionQuotaTest > testDynamicConnectionQuota 
STANDARD_OUT
 [2020-06-11 20:52:42,596] ERROR [feature-zk-node-event-process-thread]: Failed 
to process feature ZK node change event. The broker will eventually exit. 
(kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread:76)
 java.lang.InterruptedException
 at 
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056)
 at 
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2090)
 at 
java.base/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:433)
 at 
kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread.$anonfun$doWork$1(FinalizedFeatureChangeListener.scala:147){quote}
 

kafka.api.CustomQuotaCallbackTest failed:

{quote}[2020-06-11 21:07:36,745] ERROR [feature-zk-node-event-process-thread]: Failed 
to process feature ZK node change event. The broker will eventually exit. 
(kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread:76)
 java.lang.InterruptedException
 at 
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056)
 at 
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2090)
 at 
java.base/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:433)
 at 
kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread.$anonfun$doWork$1(FinalizedFeatureChangeListener.scala:147)
 at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
 at scala.util.control.Exception$Catch.apply(Exception.scala:227)
 at 
kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread.doWork(FinalizedFeatureChangeListener.scala:147)
 at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
{quote}
 

kafka.server.DynamicBrokerReconfigurationTest failed:

{quote}[2020-06-11 21:13:01,207] ERROR [feature-zk-node-event-process-thread]: Failed 
to process feature ZK node change event. The broker will eventually exit. 
(kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread:76)
 java.lang.InterruptedException
 at 
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056)
 at 
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2090)
 at 
java.base/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:433)
 at 
kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread.$anonfun$doWork$1(FinalizedFeatureChangeListener.scala:147)
 at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
 at scala.util.control.Exception$Catch.apply(Exception.scala:227)
 at 
kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread.doWork(FinalizedFeatureChangeListener.scala:147)
 at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
{quote}
 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10024) Add dynamic configuration and enforce quota for per-IP connection rate limits

2020-05-19 Thread Anna Povzner (Jira)
Anna Povzner created KAFKA-10024:


 Summary: Add dynamic configuration and enforce quota for per-IP 
connection rate limits
 Key: KAFKA-10024
 URL: https://issues.apache.org/jira/browse/KAFKA-10024
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: Anna Povzner
Assignee: Anna Povzner


This JIRA is for the second part of KIP-612 – Add per-IP connection creation 
rate limits.

As described here: 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-612%3A+Ability+to+Limit+Connection+Creation+Rate+on+Brokers]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10023) Enforce broker-wide and per-listener connection creation rate (KIP-612, part 1)

2020-05-19 Thread Anna Povzner (Jira)
Anna Povzner created KAFKA-10023:


 Summary: Enforce broker-wide and per-listener connection creation 
rate (KIP-612, part 1)
 Key: KAFKA-10023
 URL: https://issues.apache.org/jira/browse/KAFKA-10023
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: Anna Povzner
Assignee: Anna Povzner


This JIRA is for the first part of KIP-612 – Add an ability to configure and 
enforce broker-wide and per-listener connection creation rate. 

As described here: 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-612%3A+Ability+to+Limit+Connection+Creation+Rate+on+Brokers]
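
A rough sketch, not the KIP-612 implementation, of how a per-listener connection rate 
quota could be enforced with the existing metrics machinery: a Rate stat registered 
with a quota'd config, recording one unit per accepted connection and backing off when 
the quota is violated. The class and sensor names are hypothetical.

{code:scala}
import org.apache.kafka.common.metrics.{MetricConfig, Metrics, Quota, QuotaViolationException}
import org.apache.kafka.common.metrics.stats.Rate

// Hypothetical helper, not broker code: tracks accepted connections per listener
// and reports whether the configured rate (connections/sec) has been exceeded.
class ListenerConnectionRateLimiter(metrics: Metrics, listener: String, maxRate: Double) {
  private val sensor = {
    val config = new MetricConfig().quota(Quota.upperBound(maxRate))
    val s = metrics.sensor(s"connection-accept-rate-$listener", config)
    s.add(metrics.metricName("connection-accept-rate", "socket-server-metrics"), new Rate())
    s
  }

  /** Record one accepted connection; returns true if we are still within quota. */
  def tryAccept(): Boolean =
    try {
      sensor.record(1.0)
      true
    } catch {
      case _: QuotaViolationException =>
        // The real acceptor would compute a throttle time and pause accepting
        // on this listener instead of simply rejecting the connection.
        false
    }
}
{code}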



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9839) IllegalStateException on metadata update when broker learns about its new epoch after the controller

2020-04-08 Thread Anna Povzner (Jira)
Anna Povzner created KAFKA-9839:
---

 Summary: IllegalStateException on metadata update when broker 
learns about its new epoch after the controller
 Key: KAFKA-9839
 URL: https://issues.apache.org/jira/browse/KAFKA-9839
 Project: Kafka
  Issue Type: Bug
  Components: controller, core
Affects Versions: 2.3.1
Reporter: Anna Povzner


Broker throws "java.lang.IllegalStateException: Epoch XXX larger than current 
broker epoch YYY"  on UPDATE_METADATA when the controller learns about the 
broker epoch and sends UPDATE_METADATA before KafkaZkCLient.registerBroker 
completes (the broker learns about its new epoch).

Here is the scenario we observed in more detail:
1. ZK session expires on broker 1
2. Broker 1 establishes new session to ZK and creates znode
3. Controller learns about broker 1 and assigns epoch
4. Broker 1 receives UPDATE_METADATA from controller, but it does not know 
about its new epoch yet, so we get an exception:

ERROR [KafkaApi-3] Error when handling request: clientId=1, correlationId=0, 
api=UPDATE_METADATA, body={
.
java.lang.IllegalStateException: Epoch XXX larger than current broker epoch YYY 
at kafka.server.KafkaApis.isBrokerEpochStale(KafkaApis.scala:2725) at 
kafka.server.KafkaApis.handleUpdateMetadataRequest(KafkaApis.scala:320) at 
kafka.server.KafkaApis.handle(KafkaApis.scala:139) at 
kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69) at 
java.lang.Thread.run(Thread.java:748)

5. KafkaZkClient.registerBroker completes on broker 1: "INFO Stat of the 
created znode at /brokers/ids/1"

The result is that the broker has stale metadata for some time.

Possible solutions:
1. The broker returns a more specific error and the controller retries UPDATE_METADATA.
2. The broker accepts UPDATE_METADATA with a larger broker epoch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9677) Low consume bandwidth quota may cause consumer not being able to fetch data

2020-03-06 Thread Anna Povzner (Jira)
Anna Povzner created KAFKA-9677:
---

 Summary: Low consume bandwidth quota may cause consumer not being 
able to fetch data
 Key: KAFKA-9677
 URL: https://issues.apache.org/jira/browse/KAFKA-9677
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 2.3.1, 2.4.0, 2.2.2, 2.1.1, 2.0.1
Reporter: Anna Povzner
Assignee: Anna Povzner


When we changed quota communication with KIP-219, fetch requests get throttled 
by returning an empty response with the delay in `throttle_time_ms`, and the 
Kafka consumer retries after the delay. 

With default configs, the maximum fetch size could be as big as 50MB (or 10MB 
per partition). The default broker config (1-second window, 10 full windows of 
tracked bandwidth/thread utilization usage) means that a consumer quota below 
5MB/s (per broker) may prevent a fetch request from ever being successful.

Or the other way around: a 1MB/s consumer quota (per broker) means that any 
fetch request that gets >= 10MB of data (10 seconds * 1MB/second) in the 
response will never get through.
h3. Proposed fix

Return less data in the fetch response in this case: cap `fetchMaxBytes` passed to 
replicaManager.fetchMessages() from KafkaApis.handleFetchRequest() to the consumer 
bandwidth quota multiplied by the length of the quota window (number of samples * 
sample window size). In the example of default configs and a 1MB/s consumer 
bandwidth quota, fetchMaxBytes will be 10MB.
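
The arithmetic behind the proposed cap, as a small sketch with hypothetical names; 
the window parameters are meant to correspond to the broker's quota.window.size.seconds 
and quota.window.num configs.

{code:scala}
object FetchMaxBytesCapSketch {
  /** Cap the fetch size so one full response stays within the quota window. */
  def cappedFetchMaxBytes(requestedMaxBytes: Int,
                          quotaBytesPerSec: Double,
                          windowSizeSeconds: Int,
                          numWindows: Int): Int = {
    val maxWithinWindow = (quotaBytesPerSec * windowSizeSeconds * numWindows).toLong
    math.min(requestedMaxBytes.toLong, maxWithinWindow).toInt
  }

  def main(args: Array[String]): Unit = {
    // Default broker config: 1-second samples, 10 samples; quota of 1 MB/s.
    val capped = cappedFetchMaxBytes(
      requestedMaxBytes = 50 * 1024 * 1024, // default consumer fetch.max.bytes
      quotaBytesPerSec = 1024 * 1024,
      windowSizeSeconds = 1,
      numWindows = 10)
    println(s"fetchMaxBytes capped to $capped bytes") // 10 MB, matching the example above
  }
}
{code}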



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9658) Removing default user quota doesn't take effect until broker restart

2020-03-04 Thread Anna Povzner (Jira)
Anna Povzner created KAFKA-9658:
---

 Summary: Removing default user quota doesn't take effect until 
broker restart
 Key: KAFKA-9658
 URL: https://issues.apache.org/jira/browse/KAFKA-9658
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.3.1, 2.4.0, 2.2.2, 2.1.1, 2.0.1
Reporter: Anna Povzner
Assignee: Anna Povzner


To reproduce (for any quota type: produce, consume, and request):

Example with consumer quota, assuming no user/client quotas are set initially.
1. Set default user consumer quotas:

{{./kafka-configs.sh --zookeeper  --alter --add-config 
'consumer_byte_rate=1' --entity-type users --entity-default}}

2. Send some consume load for some user, say user1.

3. Remove default user consumer quota using:
./kafka-configs.sh --zookeeper  --alter --delete-config 
'consumer_byte_rate' --entity-type users --entity-default

Result: --describe (as below) correctly reports that there is no quota, 
but the quota bound in ClientQuotaManager.metrics does not get updated for users 
that were sending load, which causes the broker to continue throttling requests 
with the previously set quota.
/opt/confluent/bin/kafka-configs.sh --zookeeper   --describe 
--entity-type users --entity-default



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (KAFKA-8800) Flaky Test SaslScramSslEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl

2019-08-26 Thread Anna Povzner (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner reopened KAFKA-8800:
-
  Assignee: Anastasia Vela  (was: Lee Dongjin)

> Flaky Test 
> SaslScramSslEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl
> --
>
> Key: KAFKA-8800
> URL: https://issues.apache.org/jira/browse/KAFKA-8800
> Project: Kafka
>  Issue Type: Bug
>  Components: core, security, unit tests
>Affects Versions: 2.4.0
>Reporter: Matthias J. Sax
>Assignee: Anastasia Vela
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.4.0, 2.3.1
>
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/6956/testReport/junit/kafka.api/SaslScramSslEndToEndAuthorizationTest/testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl/]
> {quote}org.scalatest.exceptions.TestFailedException: Consumed 0 records 
> before timeout instead of the expected 1 records at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) 
> at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389) 
> at org.scalatest.Assertions.fail(Assertions.scala:1091) at 
> org.scalatest.Assertions.fail$(Assertions.scala:1087) at 
> org.scalatest.Assertions$.fail(Assertions.scala:1389) at 
> kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:822) at 
> kafka.utils.TestUtils$.pollRecordsUntilTrue(TestUtils.scala:781) at 
> kafka.utils.TestUtils$.pollUntilAtLeastNumRecords(TestUtils.scala:1312) at 
> kafka.utils.TestUtils$.consumeRecords(TestUtils.scala:1320) at 
> kafka.api.EndToEndAuthorizationTest.consumeRecords(EndToEndAuthorizationTest.scala:522)
>  at 
> kafka.api.EndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl(EndToEndAuthorizationTest.scala:361){quote}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (KAFKA-8837) KafkaMetricReporterClusterIdTest may not shutdown ZooKeeperTestHarness

2019-08-26 Thread Anna Povzner (Jira)
Anna Povzner created KAFKA-8837:
---

 Summary: KafkaMetricReporterClusterIdTest may not shutdown 
ZooKeeperTestHarness
 Key: KAFKA-8837
 URL: https://issues.apache.org/jira/browse/KAFKA-8837
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Anna Povzner
Assignee: Anastasia Vela


The @After method in KafkaMetricReporterClusterIdTest calls 
`TestUtils.verifyNonDaemonThreadsStatus(this.getClass.getName)` before it calls 
tearDown on ZooKeeperTestHarness (which shuts down ZK and the ZK client). If the 
verifyNonDaemonThreadsStatus assertion fails, the rest of the resources will not 
get cleaned up.

We should move `TestUtils.verifyNonDaemonThreadsStatus(this.getClass.getName)` 
to the end of `tearDown()`. However, it would also be good to prevent people from 
using this method in tear-down like this in the future. Maybe just adding a 
comment would help here.
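
One hedged way to implement this, sketched under the assumption that the test extends 
ZooKeeperTestHarness and uses JUnit 4: run the assertion last, in a finally block, so 
a failing thread check can no longer skip the harness cleanup.

{code:scala}
import kafka.utils.TestUtils
import kafka.zk.ZooKeeperTestHarness
import org.junit.After

// Sketch only: the proposed ordering applied to a test extending ZooKeeperTestHarness.
class MetricReporterTearDownSketch extends ZooKeeperTestHarness {
  @After
  override def tearDown(): Unit = {
    try {
      // Always shut down brokers, the ZK client and the embedded ZK first.
      super.tearDown()
    } finally {
      // Assert only after cleanup, so a failure here cannot leak resources.
      TestUtils.verifyNonDaemonThreadsStatus(this.getClass.getName)
    }
  }
}
{code}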



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (KAFKA-8782) ReplicationQuotaManagerTest and ClientQuotaManagerTest should close Metrics object

2019-08-09 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-8782:
---

 Summary: ReplicationQuotaManagerTest and ClientQuotaManagerTest 
should close Metrics object
 Key: KAFKA-8782
 URL: https://issues.apache.org/jira/browse/KAFKA-8782
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 2.2.1
Reporter: Anna Povzner


ReplicationQuotaManagerTest and ClientQuotaManagerTest create Metrics objects 
in several tests, but do not close() them at the end. It would be good to 
clean up resources in those tests, which also helps reduce overall test 
flakiness.
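
A minimal sketch of the cleanup pattern this asks for, using only the public Metrics 
API; the test body and the quota manager construction are elided.

{code:scala}
import org.apache.kafka.common.metrics.Metrics
import org.junit.Test

class MetricsCleanupSketch {
  @Test
  def quotaTestWithCleanup(): Unit = {
    val metrics = new Metrics()
    try {
      // ... construct ClientQuotaManager / ReplicationQuotaManager with `metrics`
      // and run the test body here ...
    } finally {
      metrics.close() // close reporters and sensors even if an assertion above fails
    }
  }
}
{code}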



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (KAFKA-8526) Broker may select a failed dir for new replica even in the presence of other live dirs

2019-06-11 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-8526:
---

 Summary: Broker may select a failed dir for new replica even in 
the presence of other live dirs
 Key: KAFKA-8526
 URL: https://issues.apache.org/jira/browse/KAFKA-8526
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.2.1, 2.1.1, 2.0.1, 1.1.1, 2.3.0
Reporter: Anna Povzner


Suppose a broker is configured with multiple log dirs. One of the log dirs 
fails, but there is no load on that dir, so the broker does not know about the 
failure yet, _i.e._, the failed dir is still in LogManager#_liveLogDirs. 
Suppose a new topic gets created, and the controller chooses the broker with 
the failed log dir to host one of the replicas. The broker gets a LeaderAndIsr 
request with the isNew flag set. LogManager#getOrCreateLog() selects a log dir 
for the new replica from _liveLogDirs, and then one of two things can happen:
1) getAbsolutePath can fail, in which case getOrCreateLog will throw an 
IOException
2) Creating the directory for the new replica's log may fail (_e.g._, if the 
directory became read-only after getAbsolutePath succeeded). 

In both cases, the selected dir will be marked offline (which is correct). 
However, LeaderAndIsr will return an error and the replica will be marked 
offline, even though the broker may have other live dirs. 

*Proposed solution*: The broker should retry selecting a dir for the new 
replica if the initially selected dir threw an IOException when trying to 
create a directory for the new replica. We should be able to do that in the 
LogManager#getOrCreateLog() method, but keep in mind that 
logDirFailureChannel.maybeAddOfflineLogDir does not synchronously remove the 
dir from _liveLogDirs. So, it makes sense to select the initial dir by calling 
LogManager#nextLogDir (current implementation), but if we fail to create the 
log on that dir, one approach is to select the next dir from _liveLogDirs in 
round-robin fashion (until we get back to the initial log dir – the case where 
all dirs failed).
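
A sketch of that retry loop; liveLogDirs, nextLogDir and createLogInDir are 
hypothetical stand-ins for the LogManager internals, not the actual code.

{code:scala}
import java.io.{File, IOException}

// Hypothetical sketch of retrying dir selection; not LogManager code.
object LogDirRetrySketch {
  def createLogWithRetry(liveLogDirs: Seq[File],
                         nextLogDir: () => File,
                         createLogInDir: File => Unit): File = {
    val initialDir = nextLogDir()
    // Try the initially selected dir first, then the remaining live dirs round-robin.
    val start = math.max(liveLogDirs.indexOf(initialDir), 0)
    val candidates = Iterator.range(0, liveLogDirs.size)
      .map(i => liveLogDirs((start + i) % liveLogDirs.size))
    val chosen = candidates.find { dir =>
      try { createLogInDir(dir); true }
      catch { case _: IOException => false } // dir gets marked offline asynchronously
    }
    chosen.getOrElse(throw new IOException("All log dirs failed"))
  }
}
{code}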



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8481) Clients may fetch incomplete set of topic partitions just after topic is created

2019-06-04 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-8481:
---

 Summary: Clients may fetch incomplete set of topic partitions just 
after topic is created
 Key: KAFKA-8481
 URL: https://issues.apache.org/jira/browse/KAFKA-8481
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.2.1
Reporter: Anna Povzner


KafkaConsumer#partitionsFor() or AdminClient#describeTopics() may return an 
incomplete set of partitions for the given topic if the topic just got created.

Cause: When a topic gets created, in most cases the controller sends the 
partitions of this topic via several UpdateMetadataRequests (vs. one 
UpdateMetadataRequest with all partitions). The first UpdateMetadataRequest 
contains the partitions for which a given broker hosts replicas, followed by 
one or more UpdateMetadataRequests for the remaining partitions. This means 
that if a broker gets a topic metadata request between the first and last 
UpdateMetadataRequest, the response will contain only a subset of the topic's 
partitions.

Proposed fix: In KafkaController#processTopicChange(), before calling 
OnNewPartitionCreation(), send an UpdateMetadataRequest with the partitions of 
new topics (addedPartitionReplicaAssignment) to all live brokers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8480) Clients may fetch incomplete set of topic partitions during cluster startup

2019-06-04 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-8480:
---

 Summary: Clients may fetch incomplete set of topic partitions 
during cluster startup
 Key: KAFKA-8480
 URL: https://issues.apache.org/jira/browse/KAFKA-8480
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.2.1
Reporter: Anna Povzner
Assignee: Anna Povzner


KafkaConsumer#partitionsFor() or AdminClient#describeTopics() may return an 
incomplete set of partitions for a given topic when the cluster is starting up 
(after the cluster was down). 

The cause is that the controller, on becoming the controller, sends an 
UpdateMetadataRequest for all partitions with at least one online replica, and 
then a separate UpdateMetadataRequest for all partitions with at least one 
offline replica. If a client sends a metadata request in between a broker 
processing those two UpdateMetadataRequests, the client will get an incomplete 
set of partitions.

Proposed fix: the controller should send one UpdateMetadataRequest (containing 
all partitions) in ReplicaStateMachine#startup().



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8002) Replica reassignment to new log dir may not complete if future and current replicas segment files have different base offsets

2019-02-25 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-8002:
---

 Summary: Replica reassignment to new log dir may not complete if 
future and current replicas segment files have different base offsets
 Key: KAFKA-8002
 URL: https://issues.apache.org/jira/browse/KAFKA-8002
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 2.1.1
Reporter: Anna Povzner


Once the future replica catches up to the current replica's log end offset, the 
intended logic is to finish the move (rename the future dir to the current 
replica dir, etc.). However, the check in 
Partition.maybeReplaceCurrentWithFutureReplica compares the whole 
LogOffsetMetadata rather than just the log end offset. The resulting behavior is 
that the re-assignment will not finish for topic partitions that were 
cleaned/compacted such that the base offset of the last segment differs between 
the current and future replica. 

The proposed fix is to compare only the log end offsets of the current and 
future replica.
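
The gist of the fix as a sketch; the case class and method names are illustrative 
only, not the actual Partition code.

{code:scala}
// Illustrative only: LogOffsetMetadata carries more than the message offset.
case class LogOffsetMetadata(messageOffset: Long, segmentBaseOffset: Long, relativePositionInSegment: Int)

object FutureReplicaSwapSketch {
  // Before: equality on the whole metadata fails when segment base offsets differ
  // (e.g. after cleaning/compaction), so the swap never happens.
  def readyToSwapStrict(current: LogOffsetMetadata, future: LogOffsetMetadata): Boolean =
    current == future

  // After: only the log end offsets need to match for the move to complete.
  def readyToSwap(currentLeo: Long, futureLeo: Long): Boolean =
    currentLeo == futureLeo
}
{code}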



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8001) Fetch from future replica stalls when local replica becomes a leader

2019-02-25 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-8001:
---

 Summary: Fetch from future replica stalls when local replica 
becomes a leader
 Key: KAFKA-8001
 URL: https://issues.apache.org/jira/browse/KAFKA-8001
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 2.1.1
Reporter: Anna Povzner


With KIP-320, a fetch from a follower / future replica returns FENCED_LEADER_EPOCH 
if the current leader epoch in the request is lower than the leader epoch known to 
the leader (or to the local replica, in the case of future replica fetching). In 
the case of the future replica fetching from the local replica, if the local 
replica becomes the leader of the partition, the next fetch from the future 
replica fails with FENCED_LEADER_EPOCH and fetching from the future replica is 
stopped until the next leader change. 

Proposed solution: on a local replica leader change, the future replica should 
"become a follower" again and go through the truncation phase. Or we could 
optimize it and just update the partition state of the future replica to reflect 
the updated current leader epoch. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7923) Add unit test to verify Kafka-7401 in AK versions >= 2.0

2019-02-12 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-7923:
---

 Summary: Add unit test to verify Kafka-7401 in AK versions >= 2.0
 Key: KAFKA-7923
 URL: https://issues.apache.org/jira/browse/KAFKA-7923
 Project: Kafka
  Issue Type: Test
Affects Versions: 2.1.0, 2.0.1
Reporter: Anna Povzner
Assignee: Anna Povzner


KAFKA-7401 affected versions 1.0 and 1.1; it was fixed there and a unit test was 
added. Versions 2.0 and later did not have that bug, because it was fixed as part 
of another change. To make sure we don't regress, we need to add a unit test 
similar to the one added as part of KAFKA-7401.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7786) Fast update of leader epoch may stall partition fetching due to FENCED_LEADER_EPOCH

2019-01-03 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-7786:
---

 Summary: Fast update of leader epoch may stall partition fetching 
due to FENCED_LEADER_EPOCH
 Key: KAFKA-7786
 URL: https://issues.apache.org/jira/browse/KAFKA-7786
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.1.0
Reporter: Anna Povzner


KIP-320/KAFKA-7395 added a FENCED_LEADER_EPOCH error response to an 
OffsetsForLeaderEpoch request if the epoch in the request is lower than the 
broker's leader epoch. ReplicaFetcherThread builds an OffsetsForLeaderEpoch 
request under _partitionMapLock_, sends the request outside the lock, and then 
processes the response under _partitionMapLock_. The broker may receive a 
LeaderAndIsr request with the same leader but with the next leader epoch, and 
remove and re-add the partition to the fetcher thread (with partition state 
reflecting the updated leader epoch) – all while the OffsetsForLeaderEpoch 
request (with the old leader epoch) is still outstanding / waiting for the lock 
to process the OffsetsForLeaderEpoch response. As a result, the partition gets 
removed from partitionStates and this broker will not fetch for this partition 
until the next LeaderAndIsr, which may take a while. We will see a log message 
like this:

[2018-12-23 07:23:04,802] INFO [ReplicaFetcher replicaId=3, leaderId=2, 
fetcherId=0] Partition test_topic-17 has an older epoch (7) than the current 
leader. Will await the new LeaderAndIsr state before resuming fetching. 
(kafka.server.ReplicaFetcherThread)

We saw this happen with 
kafkatest.tests.core.reassign_partitions_test.ReassignPartitionsTest.test_reassign_partitions.bounce_brokers=True.reassign_from_offset_zero=True.
This test does partition re-assignment while bouncing 2 out of 4 total 
brokers. When the failure happened, each bounced broker was also a controller. 
Because of re-assignment, the controller updates the leader epoch without 
updating the leader on controller change or on broker startup, so we see 
several leader epoch changes without a leader change, which increases the 
likelihood of the race condition described above.

Here are the exact events that happened in this test (around the failure):

We have 4 brokers: 1, 2, 3, and 4. Partition re-assignment is started for 
test_topic-17 [2, 4, 1] -> [3, 1, 2]. At time t0, the leader of test_topic-17 is 
broker 2.
 # clean shutdown of broker 3, which is also a controller
 # broker 4 becomes controller, continues re-assignment and updates leader 
epoch for test_topic-17 to 6 (with same leader)
 # broker 2 (leader of test_topic-17) receives new leader epoch: “test_topic-17 
starts at Leader Epoch 6 from offset 1388. Previous Leader Epoch was: 5”
 # broker 3 is started again after clean shutdown
 # controller sees broker 3 startup, and sends LeaderAndIsr(leader epoch 6) to 
broker 3
 # controller updates leader epoch to 7
 # broker 2 (leader of test_topic-17) receives LeaderAndIsr for leader epoch 7: 
“test_topic-17 starts at Leader Epoch 7 from offset 1974. Previous Leader Epoch 
was: 6”
 # broker 3 receives LeaderAndIsr for test_topic-17 and leader epoch 6 from 
controller: “Added fetcher to broker BrokerEndPoint(id=2) for leader epoch 6” 
and sends OffsetsForLeaderEpoch request to broker 2
 # broker 3 receives LeaderAndIsr for test_topic-17 and leader epoch 7 from 
controller; removes fetcher thread and adds fetcher thread + executes 
AbstractFetcherThread.addPartitions() which updates partition state with leader 
epoch 7
 # broker 3 receives FENCED_LEADER_EPOCH in response to 
OffsetsForLeaderEpoch(leader epoch 6), because the leader received LeaderAndIsr 
for leader epoch 7 before it got OffsetsForLeaderEpoch(leader epoch 6) from 
broker 3. As a result, it removes partition from partitionStates and it does 
not fetch until controller updates leader epoch and sends LeaderAndIsr for this 
partition to broker 3. The test fails, because re-assignment does not finish on 
time (due to broker 3 not fetching).

 

One way to address this is possibly to add more state to PartitionFetchState. 
However, that may introduce other race conditions. A cleaner way, I think, is to 
return the leader epoch in the OffsetsForLeaderEpoch response along with the 
FENCED_LEADER_EPOCH error, and then ignore the error if the partition state 
contains a higher leader epoch. The advantage is less state maintenance, but the 
disadvantage is that it requires bumping the inter-broker protocol.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7415) OffsetsForLeaderEpoch may incorrectly respond with undefined epoch causing truncation to HW

2018-09-14 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-7415:
---

 Summary: OffsetsForLeaderEpoch may incorrectly respond with 
undefined epoch causing truncation to HW
 Key: KAFKA-7415
 URL: https://issues.apache.org/jira/browse/KAFKA-7415
 Project: Kafka
  Issue Type: Bug
  Components: replication
Affects Versions: 2.0.0
Reporter: Anna Povzner


If the follower's last appended epoch is ahead of the leader's last appended 
epoch, the OffsetsForLeaderEpoch response will incorrectly send 
(UNDEFINED_EPOCH, UNDEFINED_EPOCH_OFFSET), and the follower will truncate to 
HW. This may lead to data loss in some rare cases where 2 back-to-back leader 
elections happen (failure of one leader, followed by quick re-election of the 
next leader due to preferred leader election, so that all replicas are still in 
the ISR, and then failure of the 3rd leader).

The bug is in LeaderEpochFileCache.endOffsetFor(), which returns 
(UNDEFINED_EPOCH, UNDEFINED_EPOCH_OFFSET) if the requested leader epoch is 
ahead of the last leader epoch in the cache. The method should return (last 
leader epoch in the cache, LEO) in this scenario.
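
A sketch of the proposed endOffsetFor behavior, using a hypothetical in-memory cache 
representation rather than LeaderEpochFileCache itself; only the "requested epoch 
ahead of the cache" case changes.

{code:scala}
// Hypothetical epoch cache: sorted (epoch -> startOffset) entries plus the log end offset.
object EndOffsetForSketch {
  val UNDEFINED_EPOCH = -1
  val UNDEFINED_EPOCH_OFFSET = -1L

  def endOffsetFor(requestedEpoch: Int,
                   epochStartOffsets: Seq[(Int, Long)], // sorted by epoch
                   logEndOffset: Long): (Int, Long) =
    epochStartOffsets.lastOption match {
      case None =>
        (UNDEFINED_EPOCH, UNDEFINED_EPOCH_OFFSET)
      case Some((lastEpoch, _)) if requestedEpoch > lastEpoch =>
        // Proposed behavior: answer with the last epoch we know about and our LEO,
        // instead of UNDEFINED_EPOCH(_OFFSET), so the follower does not truncate to HW.
        (lastEpoch, logEndOffset)
      case Some((lastEpoch, _)) =>
        // Regular lookup: the end offset of the requested epoch is the start offset of
        // the next epoch in the cache, or the LEO if it is the latest known epoch.
        epochStartOffsets.collectFirst { case (e, start) if e > requestedEpoch => (requestedEpoch, start) }
          .getOrElse((lastEpoch, logEndOffset))
    }
}
{code}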

 

Here is an example of a scenario where the issue leads to the data loss.

Suppose we have three replicas: r1, r2, and r3. Initially, the ISR consists of 
(r1, r2, r3) and the leader is r1. The data up to offset 10 has been committed 
to the ISR. Here is the initial state:
{code:java}
Leader: r1
leader epoch: 0
ISR(r1, r2, r3)
r1: [hw=10, leo=10]
r2: [hw=8, leo=10]
r3: [hw=5, leo=10]
{code}
Replica 1 fails and leaves the ISR, which makes Replica 2 the new leader with 
leader epoch = 1. The leader appends a batch, but it is not replicated yet to 
the followers.
{code:java}
Leader: r2
leader epoch: 1
ISR(r2, r3)
r1: [hw=10, leo=10]
r2: [hw=8, leo=11]
r3: [hw=5, leo=10]
{code}
Replica 3 is elected a leader (due to preferred leader election) before it has 
a chance to truncate, with leader epoch 2. 
{code:java}
Leader: r3
leader epoch: 2
ISR(r2, r3)
r1: [hw=10, leo=10]
r2: [hw=8, leo=11]
r3: [hw=5, leo=10]
{code}
Replica 2 sends OffsetsForLeaderEpoch(leader epoch = 1) to Replica 3. Replica 3 
incorrectly replies with UNDEFINED_EPOCH_OFFSET, and Replica 2 truncates to HW. 
If Replica 3 fails before Replica 2 re-fetches the data, this may lead to data 
loss.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7151) Broker running out of disk space may result in state where unclean leader election is required

2018-07-11 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-7151:
---

 Summary: Broker running out of disk space may result in state 
where unclean leader election is required
 Key: KAFKA-7151
 URL: https://issues.apache.org/jira/browse/KAFKA-7151
 Project: Kafka
  Issue Type: Bug
Reporter: Anna Povzner


We have seen situations like the following:

1) Broker A is a leader for topic partition, and brokers B and C are the 
followers

2) Broker A is running out of disk space, shrinks ISR only to itself, and then 
sometime later gets disk errors, etc.

3) Broker A is stopped, disk space is reclaimed, and broker A is restarted

Result: Broker A becomes a leader, but followers cannot fetch because their log 
is ahead. The only way to continue is to enable unclean leader election.

 

There are several issues here:

-- If the machine is running out of disk space, we do not reliably get an error 
from the file system as soon as that happens. The broker could be in a state 
where some writes succeed (possibly if the write is not flushed to disk) and 
some writes fail, or fail later. This may cause fetchers to fetch records that 
are still in the leader's file system cache; if the flush to disk then fails on 
the leader, the followers end up ahead of the leader.

-- I am not sure exactly why, but it seems like the leader broker (that is 
running out of disk space) may also stop servicing fetch requests, making 
followers fall behind and get kicked out of the ISR.

Ideally, the broker should stop being a leader for any topic partition before 
accepting any records that may fail to be flushed to disk. One option is to 
automatically detect disk space usage and make a broker read-only for topic 
partitions if disk space gets to 80% or something. Maybe there is a better 
option.  

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7150) Error in processing fetched data for one partition may stop follower fetching other partitions

2018-07-11 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-7150:
---

 Summary: Error in processing fetched data for one partition may 
stop follower fetching other partitions
 Key: KAFKA-7150
 URL: https://issues.apache.org/jira/browse/KAFKA-7150
 Project: Kafka
  Issue Type: Bug
  Components: replication
Affects Versions: 1.1.0, 1.0.2, 0.11.0.3, 0.10.2.2
Reporter: Anna Povzner


If the follower fails to process data for one topic partition, e.g. due to an 
out-of-order offsets error, the whole ReplicaFetcherThread is killed, which also 
stops fetching for other topic partitions serviced by this fetcher thread. This 
may result in unnecessarily under-replicated partitions. I think it would be 
better to continue fetching for the other topic partitions, and just remove the 
partition with the error from the responsibility of the fetcher thread.
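
A sketch of the suggested per-partition handling with hypothetical types; the real 
change would live in the fetcher thread's response processing, not in a standalone 
object like this.

{code:scala}
// Hypothetical sketch: process each partition's data independently and drop only
// the failing partition from this fetcher's responsibility, instead of letting the
// exception kill the whole thread.
object PerPartitionErrorHandlingSketch {
  final case class TopicPartition(topic: String, partition: Int)

  def processFetchedData(fetched: Map[TopicPartition, Array[Byte]],
                         appendToLog: (TopicPartition, Array[Byte]) => Unit,
                         removePartition: TopicPartition => Unit): Unit = {
    fetched.foreach { case (tp, records) =>
      try appendToLog(tp, records)
      catch {
        case e: Exception =>
          // Log and stop fetching only this partition; the others keep replicating.
          println(s"Error processing data for $tp, removing it from the fetcher: ${e.getMessage}")
          removePartition(tp)
      }
    }
  }
}
{code}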



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7104) ReplicaFetcher thread may die because of inconsistent log start offset in fetch response

2018-06-26 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-7104:
---

 Summary: ReplicaFetcher thread may die because of inconsistent log 
start offset in fetch response
 Key: KAFKA-7104
 URL: https://issues.apache.org/jira/browse/KAFKA-7104
 Project: Kafka
  Issue Type: Bug
Affects Versions: 1.1.0, 1.0.0
Reporter: Anna Povzner
Assignee: Anna Povzner


What we saw:

The follower fetches offset 116617, which it was able to successfully append. 
However, the leader's log start offset in the fetch response was 116753, which 
was higher than the fetched offset 116617. When the replica fetcher thread tried 
to increment its log start offset to the leader's log start offset, it failed 
with an OffsetOutOfRangeException: 

[2018-06-23 00:45:37,409] ERROR [ReplicaFetcher replicaId=1002, leaderId=1001, 
fetcherId=0] Error due to (kafka.server.ReplicaFetcherThread) 
 kafka.common.KafkaException: Error processing data for partition X-N offset 
116617 

Caused by: org.apache.kafka.common.errors.OffsetOutOfRangeException: Cannot 
increment the log start offset to 116753 of partition X-N since it is larger 
than the high watermark 116619

 

In the leader's log, we see that the log start offset was incremented at almost 
the same time (within 100 ms or so). 

[2018-06-23 00:45:37,339] INFO Incrementing log start offset of partition X-N 
to 116753 in dir /kafka/kafka-logs (kafka.log.Log)

 

In the leader's logic: ReplicaManager first calls readFromLocalLog(), which reads 
from the local log and returns a LogReadResult containing the fetched data and the 
leader's log start offset and HW. It then calls 
ReplicaManager#updateFollowerLogReadResults(), which may move the leader's log 
start offset and update the leader's log start offset and HW in the fetch 
response. If deleteRecords() happens in between, the log start offset may move 
beyond the fetched offset. As a result, the fetch response will contain fetched 
data but a log start offset that is beyond the fetched offset (indicating state 
on the leader where the fetched data does not actually exist anymore).

When a follower receives such a fetch response, it will first append, then move 
its HW no further than its LEO (which may be less than the leader's log start 
offset in the fetch response), and then call 
`replica.maybeIncrementLogStartOffset(leaderLogStartOffset)`, which will throw an 
OffsetOutOfRangeException, causing the fetcher thread to stop. 

 

*Suggested fix:*

If the leader moves the log start offset beyond the fetched offset, 
ReplicaManager#updateFollowerLogReadResults() should update the log read 
result with an OFFSET_OUT_OF_RANGE error, which will cause the follower to reset 
its fetch offset to the leader's log start offset.
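
The shape of the suggested fix as a sketch; the types and field names here are 
illustrative, not the actual LogReadResult or ReplicaManager code.

{code:scala}
// Illustrative types only; the real code manipulates LogReadResult in ReplicaManager.
object LeaderFetchResponseSketch {
  sealed trait FetchError
  case object NoError extends FetchError
  case object OffsetOutOfRange extends FetchError

  final case class LogReadResult(fetchOffset: Long, logStartOffset: Long, error: FetchError)

  /** After moving the log start offset, invalidate reads that now start below it. */
  def adjustForLogStartOffset(result: LogReadResult, newLogStartOffset: Long): LogReadResult =
    if (result.fetchOffset < newLogStartOffset)
      result.copy(logStartOffset = newLogStartOffset, error = OffsetOutOfRange)
    else
      result.copy(logStartOffset = newLogStartOffset)
}
{code}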



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-6975) AdminClient.deleteRecords() may cause replicas unable to fetch from beginning

2018-05-31 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-6975:
---

 Summary: AdminClient.deleteRecords() may cause replicas unable to 
fetch from beginning
 Key: KAFKA-6975
 URL: https://issues.apache.org/jira/browse/KAFKA-6975
 Project: Kafka
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Anna Povzner
Assignee: Anna Povzner


AdminClient.deleteRecords(beforeOffset(offset)) will set the log start offset to 
the requested offset. If the requested offset is in the middle of a batch, the 
replica will not be able to fetch from that offset (because it is in the middle 
of the batch). 

One use-case where this could cause problems is replica re-assignment. Suppose 
we have a topic partition with 3 initial replicas, and at some point the user 
issues AdminClient.deleteRecords() for an offset that falls in the middle of a 
batch. That offset then becomes the log start offset for this topic partition. 
Suppose at some later time, the user starts partition re-assignment to 3 new 
replicas. The new replicas (followers) will start with HW = 0 and will try to 
fetch from 0, then get "out of order offset" because 0 < log start offset (LSO); 
the follower will be able to reset its offset to the LSO of the leader and fetch 
from the LSO; the leader will send a batch in response with base offset 

[jira] [Resolved] (KAFKA-6795) Add unit test for ReplicaAlterLogDirsThread

2018-05-04 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner resolved KAFKA-6795.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

> Add unit test for ReplicaAlterLogDirsThread
> ---
>
> Key: KAFKA-6795
> URL: https://issues.apache.org/jira/browse/KAFKA-6795
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Anna Povzner
>Assignee: Anna Povzner
>Priority: Major
> Fix For: 2.0.0
>
>
> ReplicaAlterLogDirsThread was added as part of KIP-113 implementation, but 
> there is no unit test. 
> [~lindong] I assigned this to myself, since ideally I wanted to add unit 
> tests for KAFKA-6361 related changes (KIP-279), but feel free to re-assign. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-6859) Follower should not send OffsetForLeaderEpoch for undefined leader epochs

2018-05-03 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-6859:
---

 Summary: Follower should not send OffsetForLeaderEpoch for 
undefined leader epochs
 Key: KAFKA-6859
 URL: https://issues.apache.org/jira/browse/KAFKA-6859
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.11.0.0
Reporter: Anna Povzner


This is more of an optimization than a correctness issue.

Currently, if the follower is on inter-broker protocol version 0.11 or higher but 
on an older message format, it does not track leader epochs. However, it will 
still send an OffsetForLeaderEpoch request to the leader with an undefined epoch, 
which is guaranteed to return an undefined offset, so the follower truncates to 
its high watermark. Another example is a bootstrapping follower that does not 
have any leader epochs recorded.

It is cleaner and more efficient for the follower not to send OffsetForLeaderEpoch 
requests for undefined leader epochs, since we already know the answer.
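
A sketch of the proposed short-circuit on the follower side, with hypothetical 
helper names standing in for the fetcher's truncation logic.

{code:scala}
// Hypothetical follower-side sketch: skip the round trip when there is no epoch to ask about.
object MaybeSendOffsetForLeaderEpochSketch {
  val UNDEFINED_EPOCH = -1

  def truncationOffset(latestLocalEpoch: Int,
                       highWatermark: Long,
                       fetchEndOffsetFromLeader: Int => Long): Long =
    if (latestLocalEpoch == UNDEFINED_EPOCH)
      highWatermark // we already know the answer today: truncate to HW, so skip the request
    else
      fetchEndOffsetFromLeader(latestLocalEpoch) // only ask the leader when we have a real epoch
}
{code}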



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-6824) Transient failure in DynamicBrokerReconfigurationTest.testAddRemoveSslListener

2018-04-24 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-6824:
---

 Summary: Transient failure in 
DynamicBrokerReconfigurationTest.testAddRemoveSslListener
 Key: KAFKA-6824
 URL: https://issues.apache.org/jira/browse/KAFKA-6824
 Project: Kafka
  Issue Type: Bug
Reporter: Anna Povzner


Saw in my PR build (*JDK 7 and Scala 2.11* ):

*17:20:49* kafka.server.DynamicBrokerReconfigurationTest > 
testAddRemoveSslListener FAILED

*17:20:49*     java.lang.AssertionError: expected:<10> but was:<12>

*17:20:49*         at org.junit.Assert.fail(Assert.java:88)

*17:20:49*         at org.junit.Assert.failNotEquals(Assert.java:834)

*17:20:49*         at org.junit.Assert.assertEquals(Assert.java:645)

*17:20:49*         at org.junit.Assert.assertEquals(Assert.java:631)

*17:20:49*         at 
kafka.server.DynamicBrokerReconfigurationTest.verifyProduceConsume(DynamicBrokerReconfigurationTest.scala:959)

*17:20:49*         at 
kafka.server.DynamicBrokerReconfigurationTest.verifyRemoveListener(DynamicBrokerReconfigurationTest.scala:784)

*17:20:49*         at 
kafka.server.DynamicBrokerReconfigurationTest.testAddRemoveSslListener(DynamicBrokerReconfigurationTest.scala:705)

*17:20:50*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-6823) Transient failure in DynamicBrokerReconfigurationTest.testThreadPoolResize

2018-04-24 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-6823:
---

 Summary: Transient failure in 
DynamicBrokerReconfigurationTest.testThreadPoolResize
 Key: KAFKA-6823
 URL: https://issues.apache.org/jira/browse/KAFKA-6823
 Project: Kafka
  Issue Type: Bug
Reporter: Anna Povzner


Saw in my PR build (*JDK 10 and Scala 2.12*):

*15:58:46* kafka.server.DynamicBrokerReconfigurationTest > testThreadPoolResize 
FAILED

*15:58:46*     java.lang.AssertionError: Invalid threads: expected 6, got 7: 
List(ReplicaFetcherThread-0-0, ReplicaFetcherThread-0-1, 
ReplicaFetcherThread-0-2, ReplicaFetcherThread-0-0, ReplicaFetcherThread-0-2, 
ReplicaFetcherThread-1-0, ReplicaFetcherThread-0-1)

*15:58:46*         at org.junit.Assert.fail(Assert.java:88)

*15:58:46*         at org.junit.Assert.assertTrue(Assert.java:41)

*15:58:46*         at 
kafka.server.DynamicBrokerReconfigurationTest.verifyThreads(DynamicBrokerReconfigurationTest.scala:1147)

*15:58:46*         at 
kafka.server.DynamicBrokerReconfigurationTest.maybeVerifyThreadPoolSize$1(DynamicBrokerReconfigurationTest.scala:412)

*15:58:46*         at 
kafka.server.DynamicBrokerReconfigurationTest.resizeThreadPool$1(DynamicBrokerReconfigurationTest.scala:431)

*15:58:46*         at 
kafka.server.DynamicBrokerReconfigurationTest.reducePoolSize$1(DynamicBrokerReconfigurationTest.scala:417)

*15:58:46*         at 
kafka.server.DynamicBrokerReconfigurationTest.$anonfun$testThreadPoolResize$3(DynamicBrokerReconfigurationTest.scala:440)

*15:58:46*         at 
scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:156)

*15:58:46*         at 
kafka.server.DynamicBrokerReconfigurationTest.verifyThreadPoolResize$1(DynamicBrokerReconfigurationTest.scala:439)

*15:58:46*         at 
kafka.server.DynamicBrokerReconfigurationTest.testThreadPoolResize(DynamicBrokerReconfigurationTest.scala:453)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-6809) connections-created metric does not behave as expected

2018-04-19 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-6809:
---

 Summary: connections-created metric does not behave as expected
 Key: KAFKA-6809
 URL: https://issues.apache.org/jira/browse/KAFKA-6809
 Project: Kafka
  Issue Type: Bug
Affects Versions: 1.0.1
Reporter: Anna Povzner


"connections-created" sensor is described as "new connections established". It 
currently records only connections that the broker creates, but does not count 
connections received. Seems like we should also count connections received – 
either include them into this metric (and also clarify the description) or add 
a new metric (separately counting two types of connections). I am not sure how 
useful is to separate them, so I think we should do the first approach.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-6795) Add unit test for ReplicaAlterLogDirsThread

2018-04-16 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-6795:
---

 Summary: Add unit test for ReplicaAlterLogDirsThread
 Key: KAFKA-6795
 URL: https://issues.apache.org/jira/browse/KAFKA-6795
 Project: Kafka
  Issue Type: Improvement
Reporter: Anna Povzner
Assignee: Anna Povzner


ReplicaAlterLogDirsThread was added as part of KIP-113 implementation, but 
there is no unit test. 

[~lindong] I assigned this to myself, since ideally I wanted to add unit tests 
for KAFKA-6361 related changes (KIP-279), but feel free to re-assign. 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-6693) Add Consumer-only benchmark workload to Trogdor

2018-04-10 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner resolved KAFKA-6693.
-
Resolution: Fixed

https://github.com/apache/kafka/pull/4775

> Add Consumer-only benchmark workload to Trogdor
> ---
>
> Key: KAFKA-6693
> URL: https://issues.apache.org/jira/browse/KAFKA-6693
> Project: Kafka
>  Issue Type: Improvement
>  Components: system tests
>Reporter: Anna Povzner
>Assignee: Anna Povzner
>Priority: Major
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Consumer-only benchmark workload that uses existing pre-populated topic



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-6693) Add Consumer-only benchmark workload to Trogdor

2018-03-20 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-6693:
---

 Summary: Add Consumer-only benchmark workload to Trogdor
 Key: KAFKA-6693
 URL: https://issues.apache.org/jira/browse/KAFKA-6693
 Project: Kafka
  Issue Type: Improvement
  Components: system tests
Reporter: Anna Povzner
Assignee: Anna Povzner


Consumer-only benchmark workload that uses existing pre-populated topic



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-4691) ProducerInterceptor.onSend() is called after key and value are serialized

2017-01-25 Thread Anna Povzner (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838829#comment-15838829
 ] 

Anna Povzner commented on KAFKA-4691:
-

I agree with [~mjsax] about not changing the KafkaProducer API. Instead, if we 
make that change and let Streams do the intercepting, the producer should not 
have any producer interceptors configured.

In the case of completely disabling the producer interceptors and implementing 
this functionality in Streams, RecordCollectorImpl.send() should also call the 
interceptor's onAcknowledgement() in the same situations as KafkaProducer 
does. E.g., if send() fails, onAcknowledgement() should be called with a mostly 
empty RecordMetadata but with topic and partition set. Also, 
onAcknowledgement() should be called from the onCompletion callback in 
RecordCollectorImpl.send(). It looks like all of that could be implemented in 
RecordCollectorImpl.send(). 

> ProducerInterceptor.onSend() is called after key and value are serialized
> -
>
> Key: KAFKA-4691
> URL: https://issues.apache.org/jira/browse/KAFKA-4691
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, streams
>Affects Versions: 0.10.1.1
>Reporter: Francesco Lemma
>  Labels: easyfix
> Attachments: 2017-01-24 00_50_55-SDG_CR33_DevStudio - Java EE - 
> org.apache.kafka.streams.processor.internals.Reco.png
>
>
> According to the JavaDoc 
> (https://kafka.apache.org/0101/javadoc/org/apache/kafka/clients/producer/ProducerInterceptor.html)
>  " This is called from KafkaProducer.send(ProducerRecord) and 
> KafkaProducer.send(ProducerRecord, Callback) methods, before key and value 
> get serialized and partition is assigned (if partition is not specified in 
> ProducerRecord)".
> Although when using this with Kafka Streams 
> (StreamsConfig.producerPrefix(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG)) the 
> key and value contained in the record object are already serialized.
> As you can see from the screenshot, the serialization is performed inside 
> RecordCollectionImpl.send(ProducerRecord record, Serializer 
> keySerializer, Serializer valueSerializer,
> StreamPartitioner partitioner), effectively 
> before calling the send method of the producer which will trigger the 
> interceptor.
> This makes it unable to perform any kind of operation involving the key or 
> value of the message, unless at least performing an additional 
> deserialization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (KAFKA-3777) Extract the existing LRU cache out of RocksDBStore

2016-07-22 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-3777 started by Anna Povzner.
---
> Extract the existing LRU cache out of RocksDBStore
> --
>
> Key: KAFKA-3777
> URL: https://issues.apache.org/jira/browse/KAFKA-3777
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Eno Thereska
>Assignee: Anna Povzner
> Fix For: 0.10.1.0
>
>
> The LRU cache that is currently inside the RocksDbStore class. As part of 
> KAFKA-3776 it needs to come outside of RocksDbStore and be a separate 
> component used in:
> 1. KGroupedStream.aggregate() / reduce(), 
> 2. KStream.aggregateByKey() / reduceByKey(),
> 3. KTable.to() (this will be done in KAFKA-3779).
> As all of the above operators can have a cache on top to deduplicate the 
> materialized state store in RocksDB.
> The scope of this JIRA is to extract out the cache of RocksDBStore, and keep 
> them as item 1) and 2) above; and it should be done together / after 
> KAFKA-3780.
> Note it is NOT in the scope of this JIRA to re-write the cache, so this will 
> basically stay the same record-based cache we currently have.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3597) Enable query ConsoleConsumer and VerifiableProducer if they shutdown cleanly

2016-04-28 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner updated KAFKA-3597:

Status: Patch Available  (was: In Progress)

> Enable query ConsoleConsumer and VerifiableProducer if they shutdown cleanly
> 
>
> Key: KAFKA-3597
> URL: https://issues.apache.org/jira/browse/KAFKA-3597
> Project: Kafka
>  Issue Type: Test
>Reporter: Anna Povzner
>Assignee: Anna Povzner
> Fix For: 0.10.0.0
>
>
> It would be useful for some tests to check if ConsoleConsumer and 
> VerifiableProducer shutdown cleanly or not. 
> Add methods to ConsoleConsumer and VerifiableProducer that return true if all 
> producers/consumes shutdown cleanly; otherwise false. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (KAFKA-3566) Enable VerifiableProducer and ConsoleConsumer to run with interceptors

2016-04-27 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-3566 started by Anna Povzner.
---
> Enable VerifiableProducer and ConsoleConsumer to run with interceptors
> --
>
> Key: KAFKA-3566
> URL: https://issues.apache.org/jira/browse/KAFKA-3566
> Project: Kafka
>  Issue Type: Test
>Affects Versions: 0.10.0.0
>Reporter: Anna Povzner
>Assignee: Anna Povzner
>  Labels: test
>
> Add interceptor class list and export path list params to VerifiableProducer 
> and ConsoleConsumer constructors. This is to allow running VerifiableProducer 
> and ConsoleConsumer with interceptors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (KAFKA-3597) Enable query ConsoleConsumer and VerifiableProducer if they shutdown cleanly

2016-04-27 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-3597 started by Anna Povzner.
---
> Enable query ConsoleConsumer and VerifiableProducer if they shutdown cleanly
> 
>
> Key: KAFKA-3597
> URL: https://issues.apache.org/jira/browse/KAFKA-3597
> Project: Kafka
>  Issue Type: Test
>Reporter: Anna Povzner
>Assignee: Anna Povzner
> Fix For: 0.10.0.0
>
>
> It would be useful for some tests to check if ConsoleConsumer and 
> VerifiableProducer shutdown cleanly or not. 
> Add methods to ConsoleConsumer and VerifiableProducer that return true if all 
> producers/consumes shutdown cleanly; otherwise false. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3597) Enable query ConsoleConsumer and VerifiableProducer if they shutdown cleanly

2016-04-20 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-3597:
---

 Summary: Enable query ConsoleConsumer and VerifiableProducer if 
they shutdown cleanly
 Key: KAFKA-3597
 URL: https://issues.apache.org/jira/browse/KAFKA-3597
 Project: Kafka
  Issue Type: Test
Reporter: Anna Povzner
Assignee: Anna Povzner
 Fix For: 0.10.0.0


It would be useful for some tests to check whether ConsoleConsumer and 
VerifiableProducer shut down cleanly or not. 

Add methods to ConsoleConsumer and VerifiableProducer that return true if all 
producers/consumers shut down cleanly, and false otherwise. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3566) Enable VerifiableProducer and ConsoleConsumer to run with interceptors

2016-04-15 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-3566:
---

 Summary: Enable VerifiableProducer and ConsoleConsumer to run with 
interceptors
 Key: KAFKA-3566
 URL: https://issues.apache.org/jira/browse/KAFKA-3566
 Project: Kafka
  Issue Type: Test
Affects Versions: 0.10.0.0
Reporter: Anna Povzner
Assignee: Anna Povzner


Add interceptor class list and export path list params to VerifiableProducer 
and ConsoleConsumer constructors. This is to allow running VerifiableProducer 
and ConsoleConsumer with interceptors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3555) Unexpected close of KStreams transformer

2016-04-13 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-3555:
---

 Summary: Unexpected close of KStreams transformer 
 Key: KAFKA-3555
 URL: https://issues.apache.org/jira/browse/KAFKA-3555
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 0.10.0.0
Reporter: Anna Povzner
Assignee: Guozhang Wang


I consistently get this behavior when running my system test, which runs a 
1-node Kafka cluster.

We implemented a TransformerSupplier, and the topology is 
transform().filter().map().filter().aggregate(). I have a log message in my 
transformer's close() method. On every run of the test, I see that after 
running for 10-20 seconds, the transformer's close() is called. Then, after 
about 20 seconds, I see that the transformer is re-initialized and continues 
running.

I don't see any exceptions happening in KStreams before close() happens.
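
For reference, a minimal pass-through transformer in the shape described above, 
written against the 0.10-era Streams API, with logging in init() and close() so 
the unexpected lifecycle is visible (the class name is a placeholder, not the 
reporter's code):

{code}
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.kstream.TransformerSupplier;
import org.apache.kafka.streams.processor.ProcessorContext;

public class LoggingTransformerSupplier
        implements TransformerSupplier<String, String, KeyValue<String, String>> {

    @Override
    public Transformer<String, String, KeyValue<String, String>> get() {
        return new Transformer<String, String, KeyValue<String, String>>() {
            @Override
            public void init(ProcessorContext context) {
                System.out.println("transformer init at " + System.currentTimeMillis());
            }

            @Override
            public KeyValue<String, String> transform(String key, String value) {
                return KeyValue.pair(key, value); // pass-through
            }

            @Override
            public KeyValue<String, String> punctuate(long timestamp) {
                return null; // no scheduled output
            }

            @Override
            public void close() {
                // Per the report, this fires after 10-20 seconds with no prior exception.
                System.out.println("transformer close at " + System.currentTimeMillis());
            }
        };
    }
}
{code}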



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3320) Add successful acks verification to ProduceConsumeValidateTest

2016-03-27 Thread Anna Povzner (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213677#comment-15213677
 ] 

Anna Povzner commented on KAFKA-3320:
-

If you look at verifiable_producer.py, it collects all successfully produced 
messages into acked_values. If a producer send() was unsuccessful, those 
messages are collected into not_acked_values. However, our tests do not check 
whether any producer send() got an error. Suppose the test tried to produce 100 
messages and only 50 were successfully produced. If the consumer successfully 
consumed those 50 messages, then the test is considered a success. For some 
tests, it would be good to also verify that we did not get any produce errors.
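
In Java-client terms the bookkeeping is the usual producer Callback pattern, 
and the proposed verification would assert on the failure count after the 
produce phase. The class below is an illustration only; the real counting 
lives in the Python test framework.

{code}
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.RecordMetadata;

// Counts acked vs. failed sends, mirroring acked_values / not_acked_values.
public class AckCountingCallback implements Callback {
    private final AtomicInteger acked = new AtomicInteger();
    private final AtomicInteger failed = new AtomicInteger();

    @Override
    public void onCompletion(RecordMetadata metadata, Exception exception) {
        if (exception == null) {
            acked.incrementAndGet();
        } else {
            failed.incrementAndGet();
        }
    }

    // What the additional verification would check after producing `expected` messages.
    public boolean allAcksSuccessful(int expected) {
        return failed.get() == 0 && acked.get() == expected;
    }
}
{code}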

> Add successful acks verification to ProduceConsumeValidateTest
> --
>
> Key: KAFKA-3320
> URL: https://issues.apache.org/jira/browse/KAFKA-3320
> Project: Kafka
>  Issue Type: Test
>Reporter: Anna Povzner
>
> Currently ProduceConsumeValidateTest only validates that each acked message 
> was consumed. Some tests may want an additional verification that all acks 
> were successful.
> This JIRA is to add an additional, optional verification that all acks were 
> successful and use it in a couple of tests that need that verification. An 
> example is the compression test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3202) Add system test for KIP-31 and KIP-32 - Change message format version on the fly

2016-03-10 Thread Anna Povzner (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189917#comment-15189917
 ] 

Anna Povzner commented on KAFKA-3202:
-

[~enothereska] I think you meant to post the test description in KAFKA-3188 
(Compatibility test). Not sure if you meant to pick this JIRA or KAFKA-3188.

> Add system test for KIP-31 and KIP-32 - Change message format version on the 
> fly
> 
>
> Key: KAFKA-3202
> URL: https://issues.apache.org/jira/browse/KAFKA-3202
> Project: Kafka
>  Issue Type: Sub-task
>  Components: system tests
>Reporter: Jiangjie Qin
>Assignee: Eno Thereska
>Priority: Blocker
> Fix For: 0.10.0.0
>
>
> The system test should cover the case that message format changes are made 
> when clients are producing/consuming. The message format change should not 
> cause client side issue.
> We already cover 0.10 brokers with old producers/consumers in upgrade tests. 
> So, the main thing to test is a mix of 0.9 and 0.10 producers and consumers. 
> E.g., test1: 0.9 producer/0.10 consumer and then test2: 0.10 producer/0.9 
> consumer. And then, each of them: compression/no compression (like in upgrade 
> test). And we could probably add another dimension : topic configured with 
> CreateTime (default) and LogAppendTime. So, total 2x2x2 combinations (but 
> maybe can reduce that — eg. do LogAppendTime with compression only).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3303) Pass partial record metadata to Interceptor onAcknowledgement in case of errors

2016-03-04 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner updated KAFKA-3303:

Status: Patch Available  (was: In Progress)

> Pass partial record metadata to Interceptor onAcknowledgement in case of 
> errors
> ---
>
> Key: KAFKA-3303
> URL: https://issues.apache.org/jira/browse/KAFKA-3303
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.10.0.0
>Reporter: Anna Povzner
>Assignee: Anna Povzner
>Priority: Blocker
> Fix For: 0.10.0.0
>
>
> Currently Interceptor.onAcknowledgement behaves similarly to Callback. If 
> exception occurred and exception is passed to onAcknowledgement, metadata 
> param is set to null.
> However, it would be useful to pass topic, and partition if available to the 
> interceptor so that it knows which topic/partition got an error.
> This is part of KIP-42.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3201) Add system test for KIP-31 and KIP-32 - Upgrade Test

2016-03-03 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner updated KAFKA-3201:

Status: Patch Available  (was: In Progress)

Verified that upgrade tests are passing on Jenkins (ran 5 times).

> Add system test for KIP-31 and KIP-32 - Upgrade Test
> 
>
> Key: KAFKA-3201
> URL: https://issues.apache.org/jira/browse/KAFKA-3201
> Project: Kafka
>  Issue Type: Sub-task
>  Components: system tests
>Reporter: Jiangjie Qin
>Assignee: Anna Povzner
>Priority: Blocker
> Fix For: 0.10.0.0
>
>
> This system test should test the procedure to upgrade a Kafka broker from 
> 0.8.x and 0.9.0 to 0.10.0
> The procedure is documented in KIP-32:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-32+-+Add+timestamps+to+Kafka+message



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (KAFKA-3303) Pass partial record metadata to Interceptor onAcknowledgement in case of errors

2016-03-03 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-3303 started by Anna Povzner.
---
> Pass partial record metadata to Interceptor onAcknowledgement in case of 
> errors
> ---
>
> Key: KAFKA-3303
> URL: https://issues.apache.org/jira/browse/KAFKA-3303
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.10.0.0
>Reporter: Anna Povzner
>Assignee: Anna Povzner
>Priority: Blocker
> Fix For: 0.10.0.0
>
>
> Currently Interceptor.onAcknowledgement behaves similarly to Callback. If 
> exception occurred and exception is passed to onAcknowledgement, metadata 
> param is set to null.
> However, it would be useful to pass topic, and partition if available to the 
> interceptor so that it knows which topic/partition got an error.
> This is part of KIP-42.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3320) Add successful acks verification to ProduceConsumeValidateTest

2016-03-02 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-3320:
---

 Summary: Add successful acks verification to 
ProduceConsumeValidateTest
 Key: KAFKA-3320
 URL: https://issues.apache.org/jira/browse/KAFKA-3320
 Project: Kafka
  Issue Type: Test
Reporter: Anna Povzner


Currently ProduceConsumeValidateTest only validates that each acked message was 
consumed. Some tests may want an additional verification that all acks were 
successful.

This JIRA is to add an additional, optional verification that all acks were 
successful and use it in a couple of tests that need that verification. An 
example is the compression test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3303) Pass partial record metadata to Interceptor onAcknowledgement in case of errors

2016-02-29 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-3303:
---

 Summary: Pass partial record metadata to Interceptor 
onAcknowledgement in case of errors
 Key: KAFKA-3303
 URL: https://issues.apache.org/jira/browse/KAFKA-3303
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.10.0.0
Reporter: Anna Povzner
Assignee: Anna Povzner
 Fix For: 0.10.0.0


Currently Interceptor.onAcknowledgement behaves similarly to Callback. If 
exception occurred and exception is passed to onAcknowledgement, metadata param 
is set to null.

However, it would be useful to pass topic, and partition if available to the 
interceptor so that it knows which topic/partition got an error.

This is part of KIP-42.
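
A sketch of an interceptor that benefits from this change. It assumes that, 
once implemented, the metadata passed on the error path carries at least the 
topic and partition, with the offset treated as unknown:

{code}
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class ErrorAttributingInterceptor implements ProducerInterceptor<String, String> {

    @Override
    public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
        return record; // no mutation
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
        // With this change, topic/partition are expected to be available even on errors.
        if (exception != null && metadata != null) {
            System.err.println("send failed for " + metadata.topic() + "-"
                    + metadata.partition() + ": " + exception.getMessage());
        }
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}
{code}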



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3196) KIP-42 (part 2): add record size and CRC to RecordMetadata and ConsumerRecords

2016-02-24 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner updated KAFKA-3196:

Status: Patch Available  (was: In Progress)

> KIP-42 (part 2): add record size and CRC to RecordMetadata and ConsumerRecords
> --
>
> Key: KAFKA-3196
> URL: https://issues.apache.org/jira/browse/KAFKA-3196
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Anna Povzner
>Assignee: Anna Povzner
> Fix For: 0.10.0.0
>
>
> This is the second (smaller) part of KIP-42, which includes: Add record size 
> and CRC to RecordMetadata and ConsumerRecord.
> See details in KIP-42 wiki: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-42%3A+Add+Producer+and+Consumer+Interceptors
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3196) KIP-42 (part 2): add record size and CRC to RecordMetadata and ConsumerRecords

2016-02-24 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner updated KAFKA-3196:

Fix Version/s: 0.10.0.0

> KIP-42 (part 2): add record size and CRC to RecordMetadata and ConsumerRecords
> --
>
> Key: KAFKA-3196
> URL: https://issues.apache.org/jira/browse/KAFKA-3196
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Anna Povzner
>Assignee: Anna Povzner
> Fix For: 0.10.0.0
>
>
> This is the second (smaller) part of KIP-42, which includes: Add record size 
> and CRC to RecordMetadata and ConsumerRecord.
> See details in KIP-42 wiki: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-42%3A+Add+Producer+and+Consumer+Interceptors
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3214) Add consumer system tests for compressed topics

2016-02-24 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner updated KAFKA-3214:

Fix Version/s: 0.10.0.0

> Add consumer system tests for compressed topics
> ---
>
> Key: KAFKA-3214
> URL: https://issues.apache.org/jira/browse/KAFKA-3214
> Project: Kafka
>  Issue Type: Test
>  Components: consumer
>Reporter: Jason Gustafson
>Assignee: Anna Povzner
> Fix For: 0.10.0.0
>
>
> As far as I can tell, we don't have any ducktape tests which verify 
> correctness when compression is enabled. If we did, we might have caught 
> KAFKA-3179 earlier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3214) Add consumer system tests for compressed topics

2016-02-24 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner updated KAFKA-3214:

Status: Patch Available  (was: In Progress)

> Add consumer system tests for compressed topics
> ---
>
> Key: KAFKA-3214
> URL: https://issues.apache.org/jira/browse/KAFKA-3214
> Project: Kafka
>  Issue Type: Test
>  Components: consumer
>Reporter: Jason Gustafson
>Assignee: Anna Povzner
> Fix For: 0.10.0.0
>
>
> As far as I can tell, we don't have any ducktape tests which verify 
> correctness when compression is enabled. If we did, we might have caught 
> KAFKA-3179 earlier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (KAFKA-3214) Add consumer system tests for compressed topics

2016-02-23 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-3214 started by Anna Povzner.
---
> Add consumer system tests for compressed topics
> ---
>
> Key: KAFKA-3214
> URL: https://issues.apache.org/jira/browse/KAFKA-3214
> Project: Kafka
>  Issue Type: Test
>  Components: consumer
>Reporter: Jason Gustafson
>Assignee: Anna Povzner
>
> As far as I can tell, we don't have any ducktape tests which verify 
> correctness when compression is enabled. If we did, we might have caught 
> KAFKA-3179 earlier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3201) Add system test for KIP-31 and KIP-32 - Upgrade Test

2016-02-23 Thread Anna Povzner (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159416#comment-15159416
 ] 

Anna Povzner commented on KAFKA-3201:
-

Here is a set of tests:

1. Setup:  Producer (0.8) → Kafka Cluster → Consumer (0.8)
First rolling bounce: Set inter.broker.protocol.version = 0.8 and 
message.format.version = 0.8
Second rolling bounce: use latest (default) inter.broker.protocol.version and 
message.format.version

2. Setup:  Producer (0.9) → Kafka Cluster → Consumer (0.9)
First rolling bounce: Set inter.broker.protocol.version = 0.9 and 
message.format.version = 0.9
Second rolling bounce: use latest (default) inter.broker.protocol.version and 
message.format.version

3. Setup:  Producer (0.9) → Kafka Cluster → Consumer (0.9)
First rolling bounce: Set inter.broker.protocol.version = 0.9 and 
message.format.version = 0.9
Second rolling bounce: use inter.broker.protocol.version = 0.10 and 
message.format.version = 0.9

All three tests will be producing/consuming in the background; in the end the 
messages consumed will be validated; we will use a mix of producers: using 
compression and not using compression.

[~becket_qin] I understand the test where message format changes on the fly is 
KAFKA-3202, so I will not do a third phase in setup 3 to move to 0.10 message 
format (because it will be covered in KAFKA-3202). Is my understanding correct?


> Add system test for KIP-31 and KIP-32 - Upgrade Test
> 
>
> Key: KAFKA-3201
> URL: https://issues.apache.org/jira/browse/KAFKA-3201
> Project: Kafka
>  Issue Type: Sub-task
>  Components: system tests
>Reporter: Jiangjie Qin
>Assignee: Anna Povzner
> Fix For: 0.10.0.0
>
>
> This system test should test the procedure to upgrade a Kafka broker from 
> 0.8.x and 0.9.0 to 0.10.0
> The procedure is documented in KIP-32:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-32+-+Add+timestamps+to+Kafka+message



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (KAFKA-3201) Add system test for KIP-31 and KIP-32 - Upgrade Test

2016-02-22 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-3201 started by Anna Povzner.
---
> Add system test for KIP-31 and KIP-32 - Upgrade Test
> 
>
> Key: KAFKA-3201
> URL: https://issues.apache.org/jira/browse/KAFKA-3201
> Project: Kafka
>  Issue Type: Sub-task
>  Components: system tests
>Reporter: Jiangjie Qin
>Assignee: Anna Povzner
> Fix For: 0.10.0.0
>
>
> This system test should test the procedure to upgrade a Kafka broker from 
> 0.8.x and 0.9.0 to 0.10.0
> The procedure is documented in KIP-32:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-32+-+Add+timestamps+to+Kafka+message



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3256) Large number of system test failures

2016-02-22 Thread Anna Povzner (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157723#comment-15157723
 ] 

Anna Povzner commented on KAFKA-3256:
-

[~becket_qin] I wrote my comment without seeing yours. Yes, I think the 
tear-down timeout failures are unrelated, and I don't think they actually 
cause any issues ([~geoffra]?). 

I'll take on upgrade and compatibility system tests if you don't mind. 

> Large number of system test failures
> 
>
> Key: KAFKA-3256
> URL: https://issues.apache.org/jira/browse/KAFKA-3256
> Project: Kafka
>  Issue Type: Bug
>Reporter: Geoff Anderson
>Assignee: Jiangjie Qin
>
> Confluent's nightly run of the kafka system tests reported a large number of 
> failures beginning 2/20/2016
> Test run: 2016-02-19--001.1455897182--apache--trunk--eee9522/
> Link: 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-02-19--001.1455897182--apache--trunk--eee9522/report.html
> Pass: 136
> Fail: 0
> Test run: 2016-02-20--001.1455979842--apache--trunk--5caa800/
> Link: 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-02-20--001.1455979842--apache--trunk--5caa800/report.html
> Pass: 72
> Fail: 64
> I.e. trunk@eee9522 was the last passing run, and trunk@5caa800 had a large 
> number of failures.
> Given its complexity, the most likely culprit is 45c8195fa, and I confirmed 
> this is the first commit with failures on a small number of tests.
> [~becket_qin] do you mind investigating?
> {code}
> commit 5caa800e217c6b83f62ee3e6b5f02f56e331b309
> Author: Jun Rao 
> Date:   Fri Feb 19 09:40:59 2016 -0800
> trivial fix to authorization CLI table
> commit 45c8195fa14c766b200c720f316836dbb84e9d8b
> Author: Jiangjie Qin 
> Date:   Fri Feb 19 07:56:40 2016 -0800
> KAFKA-3025; Added timetamp to Message and use relative offset.
> commit eee95228fabe1643baa016a2d49fb0a9fe2c66bd
> Author: Yasuhiro Matsuda 
> Date:   Thu Feb 18 09:39:30 2016 +0800
> MINOR: remove streams config params from producer/consumer configs
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3256) Large number of system test failures

2016-02-22 Thread Anna Povzner (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157719#comment-15157719
 ] 

Anna Povzner commented on KAFKA-3256:
-

FYI: The upgrade test fails with this error:
java.lang.IllegalArgumentException: requirement failed: message.format.version 
0.10.0-IV0 cannot be used when inter.broker.protocol.version is set to 0.8.2

I think this is expected, right? We need to use the 0.9.0 (or 0.8) message 
format in the first pass of the 0.8 to 0.10 upgrade test (which is what the 
current upgrade test is testing), is that correct?

> Large number of system test failures
> 
>
> Key: KAFKA-3256
> URL: https://issues.apache.org/jira/browse/KAFKA-3256
> Project: Kafka
>  Issue Type: Bug
>Reporter: Geoff Anderson
>Assignee: Jiangjie Qin
>
> Confluent's nightly run of the kafka system tests reported a large number of 
> failures beginning 2/20/2016
> Test run: 2016-02-19--001.1455897182--apache--trunk--eee9522/
> Link: 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-02-19--001.1455897182--apache--trunk--eee9522/report.html
> Pass: 136
> Fail: 0
> Test run: 2016-02-20--001.1455979842--apache--trunk--5caa800/
> Link: 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-02-20--001.1455979842--apache--trunk--5caa800/report.html
> Pass: 72
> Fail: 64
> I.e. trunk@eee9522 was the last passing run, and trunk@5caa800 had a large 
> number of failures.
> Given its complexity, the most likely culprit is 45c8195fa, and I confirmed 
> this is the first commit with failures on a small number of tests.
> [~becket_qin] do you mind investigating?
> {code}
> commit 5caa800e217c6b83f62ee3e6b5f02f56e331b309
> Author: Jun Rao 
> Date:   Fri Feb 19 09:40:59 2016 -0800
> trivial fix to authorization CLI table
> commit 45c8195fa14c766b200c720f316836dbb84e9d8b
> Author: Jiangjie Qin 
> Date:   Fri Feb 19 07:56:40 2016 -0800
> KAFKA-3025; Added timetamp to Message and use relative offset.
> commit eee95228fabe1643baa016a2d49fb0a9fe2c66bd
> Author: Yasuhiro Matsuda 
> Date:   Thu Feb 18 09:39:30 2016 +0800
> MINOR: remove streams config params from producer/consumer configs
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3256) Large number of system test failures

2016-02-22 Thread Anna Povzner (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157604#comment-15157604
 ] 

Anna Povzner commented on KAFKA-3256:
-

[~becket_qin], [~ijuma], [~geoffra] The remaining failing system tests are the 
compatibility test and the rolling upgrade tests. The issue is that both tests 
assume trunk to be 0.9. Since we are testing 0.8 to 0.9 upgrades (and 
similarly compatibility) in the 0.9 branch, we don't need to port those tests 
to run the 0.9 version against trunk. We have separate JIRAs (KAFKA-3201 and 
KAFKA-3188) to add 0.8 to 0.10 and 0.9 to 0.10 upgrade tests, and to test 
compatibility of a mix of 0.9 and 0.10 clients with 0.10 brokers. My proposal 
is to land a patch with the current fixes, and address the compatibility and 
upgrade test failures as part of KAFKA-3201 and KAFKA-3188, which are 
currently assigned to me.

> Large number of system test failures
> 
>
> Key: KAFKA-3256
> URL: https://issues.apache.org/jira/browse/KAFKA-3256
> Project: Kafka
>  Issue Type: Bug
>Reporter: Geoff Anderson
>Assignee: Jiangjie Qin
>
> Confluent's nightly run of the kafka system tests reported a large number of 
> failures beginning 2/20/2016
> Test run: 2016-02-19--001.1455897182--apache--trunk--eee9522/
> Link: 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-02-19--001.1455897182--apache--trunk--eee9522/report.html
> Pass: 136
> Fail: 0
> Test run: 2016-02-20--001.1455979842--apache--trunk--5caa800/
> Link: 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-02-20--001.1455979842--apache--trunk--5caa800/report.html
> Pass: 72
> Fail: 64
> I.e. trunk@eee9522 was the last passing run, and trunk@5caa800 had a large 
> number of failures.
> Given its complexity, the most likely culprit is 45c8195fa, and I confirmed 
> this is the first commit with failures on a small number of tests.
> [~becket_qin] do you mind investigating?
> {code}
> commit 5caa800e217c6b83f62ee3e6b5f02f56e331b309
> Author: Jun Rao 
> Date:   Fri Feb 19 09:40:59 2016 -0800
> trivial fix to authorization CLI table
> commit 45c8195fa14c766b200c720f316836dbb84e9d8b
> Author: Jiangjie Qin 
> Date:   Fri Feb 19 07:56:40 2016 -0800
> KAFKA-3025; Added timetamp to Message and use relative offset.
> commit eee95228fabe1643baa016a2d49fb0a9fe2c66bd
> Author: Yasuhiro Matsuda 
> Date:   Thu Feb 18 09:39:30 2016 +0800
> MINOR: remove streams config params from producer/consumer configs
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (KAFKA-3196) KIP-42 (part 2): add record size and CRC to RecordMetadata and ConsumerRecords

2016-02-21 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-3196 started by Anna Povzner.
---
> KIP-42 (part 2): add record size and CRC to RecordMetadata and ConsumerRecords
> --
>
> Key: KAFKA-3196
> URL: https://issues.apache.org/jira/browse/KAFKA-3196
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Anna Povzner
>Assignee: Anna Povzner
>
> This is the second (smaller) part of KIP-42, which includes: Add record size 
> and CRC to RecordMetadata and ConsumerRecord.
> See details in KIP-42 wiki: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-42%3A+Add+Producer+and+Consumer+Interceptors
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3256) Large number of system test failures

2016-02-20 Thread Anna Povzner (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15155911#comment-15155911
 ] 

Anna Povzner commented on KAFKA-3256:
-

Also for completeness, the remaining system tests:
mirror_maker_test.py, zookeeper_security_upgrade_test.py, and upgrade_test.py 
all use console consumer and set message_validator=is_int. So, they all expect 
the console consumer to output values that are integers, and the additional 
"CreateTime:<timestamp>\t" prefix breaks that.


> Large number of system test failures
> 
>
> Key: KAFKA-3256
> URL: https://issues.apache.org/jira/browse/KAFKA-3256
> Project: Kafka
>  Issue Type: Bug
>Reporter: Geoff Anderson
>Assignee: Jiangjie Qin
>
> Confluent's nightly run of the kafka system tests reported a large number of 
> failures beginning 2/20/2016
> Test run: 2016-02-19--001.1455897182--apache--trunk--eee9522/
> Link: 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-02-19--001.1455897182--apache--trunk--eee9522/report.html
> Pass: 136
> Fail: 0
> Test run: 2016-02-20--001.1455979842--apache--trunk--5caa800/
> Link: 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-02-20--001.1455979842--apache--trunk--5caa800/report.html
> Pass: 72
> Fail: 64
> I.e. trunk@eee9522 was the last passing run, and trunk@5caa800 had a large 
> number of failures.
> Given its complexity, the most likely culprit is 45c8195fa, and I confirmed 
> this is the first commit with failures on a small number of tests.
> [~becket_qin] do you mind investigating?
> {code}
> commit 5caa800e217c6b83f62ee3e6b5f02f56e331b309
> Author: Jun Rao 
> Date:   Fri Feb 19 09:40:59 2016 -0800
> trivial fix to authorization CLI table
> commit 45c8195fa14c766b200c720f316836dbb84e9d8b
> Author: Jiangjie Qin 
> Date:   Fri Feb 19 07:56:40 2016 -0800
> KAFKA-3025; Added timetamp to Message and use relative offset.
> commit eee95228fabe1643baa016a2d49fb0a9fe2c66bd
> Author: Yasuhiro Matsuda 
> Date:   Thu Feb 18 09:39:30 2016 +0800
> MINOR: remove streams config params from producer/consumer configs
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-3256) Large number of system test failures

2016-02-20 Thread Anna Povzner (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15155910#comment-15155910
 ] 

Anna Povzner edited comment on KAFKA-3256 at 2/21/16 6:18 AM:
--

I looked more into system tests to find where the format is expected, and there 
are several places actually:
1) Connect tests expect the output to be in JSON format. The value is published 
in JSON format, and since the test previously expected only the value, it was 
written to expect the console consumer output in JSON format.
2) Other tests such as reassign_partition, compatibility tests that are using 
ConsoleConsumer are setting message_validator=is_int when constructing it 
(because they were expecting only value of type integer in the console consumer 
output). This means that produce_consume_validate.py will be expecting consumer 
output to be an integer.

I actually think unless we want a system test that specifically verifies a 
timestamp, we shouldn't modify existing tests to work with a console consumer 
output containing timestamp type and timestamp. So I agree with your decision 
to not output timestamp type and timestamp by default. If we want to 
write/extend system tests that specifically check for timestamps (type, or 
validate the timestamp range), then we will use an output with a timestamp.


was (Author: apovzner):
I looked more into system tests to find where the format is expected, and there 
are several places actually:
1) Connect tests expect the output to be in JSON format. The value is published 
in JSON format, and since before the test was expecting the value only, the 
test was written to expect the console consumer output in JSON format.
2) Other tests such as reassign_partition, compatibility tests that are using 
ConsoleConsumer are setting message_validator=is_int when constructing it 
(because they were expecting only value of type integer in the console consumer 
output). This means that produce_consume_validate.py will be expecting consumer 
output to be an integer.

I actually think unless we want a system test that specifically verify a 
timestamp, we shouldn't modify existing tests to work with a console consumer 
output containing timestamp type and timestamp. So I agree with your decision 
to not output timestamp type and timestamp by default. If we want to 
write/extend system tests that specifically checks for timestamps (type or 
validates timestamp range), then we will use an output with a timestamp.

> Large number of system test failures
> 
>
> Key: KAFKA-3256
> URL: https://issues.apache.org/jira/browse/KAFKA-3256
> Project: Kafka
>  Issue Type: Bug
>Reporter: Geoff Anderson
>Assignee: Jiangjie Qin
>
> Confluent's nightly run of the kafka system tests reported a large number of 
> failures beginning 2/20/2016
> Test run: 2016-02-19--001.1455897182--apache--trunk--eee9522/
> Link: 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-02-19--001.1455897182--apache--trunk--eee9522/report.html
> Pass: 136
> Fail: 0
> Test run: 2016-02-20--001.1455979842--apache--trunk--5caa800/
> Link: 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-02-20--001.1455979842--apache--trunk--5caa800/report.html
> Pass: 72
> Fail: 64
> I.e. trunk@eee9522 was the last passing run, and trunk@5caa800 had a large 
> number of failures.
> Given its complexity, the most likely culprit is 45c8195fa, and I confirmed 
> this is the first commit with failures on a small number of tests.
> [~becket_qin] do you mind investigating?
> {code}
> commit 5caa800e217c6b83f62ee3e6b5f02f56e331b309
> Author: Jun Rao 
> Date:   Fri Feb 19 09:40:59 2016 -0800
> trivial fix to authorization CLI table
> commit 45c8195fa14c766b200c720f316836dbb84e9d8b
> Author: Jiangjie Qin 
> Date:   Fri Feb 19 07:56:40 2016 -0800
> KAFKA-3025; Added timetamp to Message and use relative offset.
> commit eee95228fabe1643baa016a2d49fb0a9fe2c66bd
> Author: Yasuhiro Matsuda 
> Date:   Thu Feb 18 09:39:30 2016 +0800
> MINOR: remove streams config params from producer/consumer configs
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3256) Large number of system test failures

2016-02-20 Thread Anna Povzner (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15155910#comment-15155910
 ] 

Anna Povzner commented on KAFKA-3256:
-

I looked more into system tests to find where the format is expected, and there 
are several places actually:
1) Connect tests expect the output to be in JSON format. The value is published 
in JSON format, and since before the test was expecting the value only, the 
test was written to expect the console consumer output in JSON format.
2) Other tests such as reassign_partition, compatibility tests that are using 
ConsoleConsumer are setting message_validator=is_int when constructing it 
(because they were expecting only value of type integer in the console consumer 
output). This means that produce_consume_validate.py will be expecting consumer 
output to be an integer.

I actually think unless we want a system test that specifically verify a 
timestamp, we shouldn't modify existing tests to work with a console consumer 
output containing timestamp type and timestamp. So I agree with your decision 
to not output timestamp type and timestamp by default. If we want to 
write/extend system tests that specifically checks for timestamps (type or 
validates timestamp range), then we will use an output with a timestamp.

> Large number of system test failures
> 
>
> Key: KAFKA-3256
> URL: https://issues.apache.org/jira/browse/KAFKA-3256
> Project: Kafka
>  Issue Type: Bug
>Reporter: Geoff Anderson
>Assignee: Jiangjie Qin
>
> Confluent's nightly run of the kafka system tests reported a large number of 
> failures beginning 2/20/2016
> Test run: 2016-02-19--001.1455897182--apache--trunk--eee9522/
> Link: 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-02-19--001.1455897182--apache--trunk--eee9522/report.html
> Pass: 136
> Fail: 0
> Test run: 2016-02-20--001.1455979842--apache--trunk--5caa800/
> Link: 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-02-20--001.1455979842--apache--trunk--5caa800/report.html
> Pass: 72
> Fail: 64
> I.e. trunk@eee9522 was the last passing run, and trunk@5caa800 had a large 
> number of failures.
> Given its complexity, the most likely culprit is 45c8195fa, and I confirmed 
> this is the first commit with failures on a small number of tests.
> [~becket_qin] do you mind investigating?
> {code}
> commit 5caa800e217c6b83f62ee3e6b5f02f56e331b309
> Author: Jun Rao 
> Date:   Fri Feb 19 09:40:59 2016 -0800
> trivial fix to authorization CLI table
> commit 45c8195fa14c766b200c720f316836dbb84e9d8b
> Author: Jiangjie Qin 
> Date:   Fri Feb 19 07:56:40 2016 -0800
> KAFKA-3025; Added timetamp to Message and use relative offset.
> commit eee95228fabe1643baa016a2d49fb0a9fe2c66bd
> Author: Yasuhiro Matsuda 
> Date:   Thu Feb 18 09:39:30 2016 +0800
> MINOR: remove streams config params from producer/consumer configs
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-3256) Large number of system test failures

2016-02-20 Thread Anna Povzner (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15155841#comment-15155841
 ] 

Anna Povzner edited comment on KAFKA-3256 at 2/21/16 1:40 AM:
--

[~becket_qin] The "No JSON object could be decoded" failure is also caused by 
ConsoleConsumer outputting timestamp type and timestamp. If I remove that 
output from ConsoleConsumer, I stop getting this failure. 

Also, once I did that, reassign_partitions_test also started passing.

So, it leads me to believe that there is some common test tool which expects a 
particular *format* of an output, rather than a test expecting a specific 
output.

What was the reason for outputting timestamp type and timestamp in 
ConsoleConsumer? If it does not bring much value, I propose to just go back to 
the original format of outputting key and value in ConsoleConsumer. I think 
that would fix most of the tests.  



was (Author: apovzner):
[~becket_qin] The "No JSON object could be decoded" failure is also caused by 
ConsoleConsumer outputting timestamp type and timestamp. If I remove that 
output from ConsoleConsumer, I stop getting this failure. 

Also, once I did not, reassign_partitions_test also started passing.

So, it leads me to believe that there is some common test tool which expects a 
particular *format* of an output, rather than a test expecting a specific 
output.

What was the reason for outputting timestamp type and timestamp in 
ConsoleConsumer? If it does not bring much value, I propose to just got back to 
the original format of outputting key and value in ConsoleConsumer. I think  
that would fix most of the tests.  


> Large number of system test failures
> 
>
> Key: KAFKA-3256
> URL: https://issues.apache.org/jira/browse/KAFKA-3256
> Project: Kafka
>  Issue Type: Bug
>Reporter: Geoff Anderson
>Assignee: Jiangjie Qin
>
> Confluent's nightly run of the kafka system tests reported a large number of 
> failures beginning 2/20/2016
> Test run: 2016-02-19--001.1455897182--apache--trunk--eee9522/
> Link: 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-02-19--001.1455897182--apache--trunk--eee9522/report.html
> Pass: 136
> Fail: 0
> Test run: 2016-02-20--001.1455979842--apache--trunk--5caa800/
> Link: 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-02-20--001.1455979842--apache--trunk--5caa800/report.html
> Pass: 72
> Fail: 64
> I.e. trunk@eee9522 was the last passing run, and trunk@5caa800 had a large 
> number of failures.
> Given its complexity, the most likely culprit is 45c8195fa, and I confirmed 
> this is the first commit with failures on a small number of tests.
> [~becket_qin] do you mind investigating?
> {code}
> commit 5caa800e217c6b83f62ee3e6b5f02f56e331b309
> Author: Jun Rao 
> Date:   Fri Feb 19 09:40:59 2016 -0800
> trivial fix to authorization CLI table
> commit 45c8195fa14c766b200c720f316836dbb84e9d8b
> Author: Jiangjie Qin 
> Date:   Fri Feb 19 07:56:40 2016 -0800
> KAFKA-3025; Added timetamp to Message and use relative offset.
> commit eee95228fabe1643baa016a2d49fb0a9fe2c66bd
> Author: Yasuhiro Matsuda 
> Date:   Thu Feb 18 09:39:30 2016 +0800
> MINOR: remove streams config params from producer/consumer configs
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3256) Large number of system test failures

2016-02-20 Thread Anna Povzner (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15155832#comment-15155832
 ] 

Anna Povzner commented on KAFKA-3256:
-

[~becket_qin] Most of these tests are reproducible locally, so they should be 
easier to debug.

I found the issue with some of the connect tests, which produce output like 
this:
 Expected ["foo", "bar", "baz", "razz", "ma", "tazz"] but saw 
["CreateTime:1455962742782\tfoo", "CreateTime:1455962742789\tbar", 
"CreateTime:1455962742789\tbaz", "CreateTime:1455962758003\trazz", 
"CreateTime:1455962758009\tma", "CreateTime:1455962758009\ttazz"] in Kafka
Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.10-py2.7.egg/ducktape/tests/runner.py",
 line 102, in run_all_tests
result.data = self.run_single_test()
  File 
"/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.10-py2.7.egg/ducktape/tests/runner.py",
 line 154, in run_single_test
return self.current_test_context.function(self.current_test)
  File 
"/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.10-py2.7.egg/ducktape/mark/_mark.py",
 line 331, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/connect_test.py",
 line 86, in test_file_source_and_sink
assert expected == actual, "Expected %s but saw %s in Kafka" % (expected, 
actual)
AssertionError: Expected ["foo", "bar", "baz", "razz", "ma", "tazz"] but saw 
["CreateTime:1455962742782\tfoo", "CreateTime:1455962742789\tbar", 
"CreateTime:1455962742789\tbaz", "CreateTime:1455962758003\trazz", 
"CreateTime:1455962758009\tma", "CreateTime:1455962758009\ttazz"] in Kafka

ConsoleConsumer was changed to also output timestamp type and timestamp value 
in addition to key/value. However, it looks like the connect tests expect 
output with just key and value. See test_file_source_and_sink in 
connect_test.py for an example.
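
To make the mismatch concrete (illustration only; the real checks are Python 
helpers in the system tests), a line that used to be just the value now 
carries a "CreateTime:<timestamp>\t" prefix, so value-only comparisons fail 
unless the prefix is stripped:

{code}
public class ConsoleLineParsing {
    public static void main(String[] args) {
        String line = "CreateTime:1455962742782\tfoo";

        // Old expectation: the whole line is the value ("foo") -- now false.
        System.out.println("value-only match: " + line.equals("foo"));

        // What a timestamp-aware check would have to do instead.
        String value = line.contains("\t") ? line.substring(line.indexOf('\t') + 1) : line;
        System.out.println("stripped value:   " + value); // prints "foo"
    }
}
{code}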



> Large number of system test failures
> 
>
> Key: KAFKA-3256
> URL: https://issues.apache.org/jira/browse/KAFKA-3256
> Project: Kafka
>  Issue Type: Bug
>Reporter: Geoff Anderson
>Assignee: Jiangjie Qin
>
> Confluent's nightly run of the kafka system tests reported a large number of 
> failures beginning 2/20/2016
> Test run: 2016-02-19--001.1455897182--apache--trunk--eee9522/
> Link: 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-02-19--001.1455897182--apache--trunk--eee9522/report.html
> Pass: 136
> Fail: 0
> Test run: 2016-02-20--001.1455979842--apache--trunk--5caa800/
> Link: 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-02-20--001.1455979842--apache--trunk--5caa800/report.html
> Pass: 72
> Fail: 64
> I.e. trunk@eee9522 was the last passing run, and trunk@5caa800 had a large 
> number of failures.
> Given its complexity, the most likely culprit is 45c8195fa, and I confirmed 
> this is the first commit with failures on a small number of tests.
> [~becket_qin] do you mind investigating?
> {code}
> commit 5caa800e217c6b83f62ee3e6b5f02f56e331b309
> Author: Jun Rao 
> Date:   Fri Feb 19 09:40:59 2016 -0800
> trivial fix to authorization CLI table
> commit 45c8195fa14c766b200c720f316836dbb84e9d8b
> Author: Jiangjie Qin 
> Date:   Fri Feb 19 07:56:40 2016 -0800
> KAFKA-3025; Added timetamp to Message and use relative offset.
> commit eee95228fabe1643baa016a2d49fb0a9fe2c66bd
> Author: Yasuhiro Matsuda 
> Date:   Thu Feb 18 09:39:30 2016 +0800
> MINOR: remove streams config params from producer/consumer configs
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-3201) Add system test for KIP-31 and KIP-32 - Upgrade Test

2016-02-18 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner reassigned KAFKA-3201:
---

Assignee: Anna Povzner

> Add system test for KIP-31 and KIP-32 - Upgrade Test
> 
>
> Key: KAFKA-3201
> URL: https://issues.apache.org/jira/browse/KAFKA-3201
> Project: Kafka
>  Issue Type: Sub-task
>  Components: system tests
>Reporter: Jiangjie Qin
>Assignee: Anna Povzner
> Fix For: 0.10.0.0
>
>
> This system test should test the procedure to upgrade a Kafka broker from 
> 0.8.x and 0.9.0 to 0.10.0
> The procedure is documented in KIP-32:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-32+-+Add+timestamps+to+Kafka+message



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3201) Add system test for KIP-31 and KIP-32 - Upgrade Test

2016-02-18 Thread Anna Povzner (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153249#comment-15153249
 ] 

Anna Povzner commented on KAFKA-3201:
-

Actually, it makes more sense to do this one before the compatibility tests 
(KAFKA-3188). I will assign it to myself and start working on it.

> Add system test for KIP-31 and KIP-32 - Upgrade Test
> 
>
> Key: KAFKA-3201
> URL: https://issues.apache.org/jira/browse/KAFKA-3201
> Project: Kafka
>  Issue Type: Sub-task
>  Components: system tests
>Reporter: Jiangjie Qin
> Fix For: 0.10.0.0
>
>
> This system test should test the procedure to upgrade a Kafka broker from 
> 0.8.x and 0.9.0 to 0.10.0
> The procedure is documented in KIP-32:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-32+-+Add+timestamps+to+Kafka+message



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3188) Add system test for KIP-31 and KIP-32 - Compatibility Test

2016-02-04 Thread Anna Povzner (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133419#comment-15133419
 ] 

Anna Povzner commented on KAFKA-3188:
-

Cool! I took compatibility test. I can take one more later if needed. 

> Add system test for KIP-31 and KIP-32 - Compatibility Test
> --
>
> Key: KAFKA-3188
> URL: https://issues.apache.org/jira/browse/KAFKA-3188
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.9.0.0
>Reporter: Jiangjie Qin
>Assignee: Anna Povzner
> Fix For: 0.10.0.0
>
>
> The integration test should test the compatibility between 0.10.0 broker with 
> clients on older versions. The clients version should include 0.9.0 and 0.8.x.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-3188) Add system test for KIP-31 and KIP-32 - Compatibility Test

2016-02-04 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner reassigned KAFKA-3188:
---

Assignee: Anna Povzner

> Add system test for KIP-31 and KIP-32 - Compatibility Test
> --
>
> Key: KAFKA-3188
> URL: https://issues.apache.org/jira/browse/KAFKA-3188
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.9.0.0
>Reporter: Jiangjie Qin
>Assignee: Anna Povzner
> Fix For: 0.10.0.0
>
>
> The integration test should test the compatibility between 0.10.0 broker with 
> clients on older versions. The clients version should include 0.9.0 and 0.8.x.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3188) Add system test for KIP-31 and KIP-32 - Compatibility Test

2016-02-04 Thread Anna Povzner (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1519#comment-1519
 ] 

Anna Povzner commented on KAFKA-3188:
-

[~becket_qin] Thanks! Do you mind if I take one of these three tickets? I can 
do either this one (Compatibility test) or the rolling upgrade. Let me know. 

> Add system test for KIP-31 and KIP-32 - Compatibility Test
> --
>
> Key: KAFKA-3188
> URL: https://issues.apache.org/jira/browse/KAFKA-3188
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.9.0.0
>Reporter: Jiangjie Qin
> Fix For: 0.10.0.0
>
>
> The integration test should test the compatibility between 0.10.0 broker with 
> clients on older versions. The clients version should include 0.9.0 and 0.8.x.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3162) KIP-42: Add Producer and Consumer Interceptors

2016-02-02 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner updated KAFKA-3162:

Description: 
This JIRA is for KIP-42 implementation, which includes:

1. Add ProducerInterceptor interface and call its callbacks from appropriate 
places in Kafka Producer.
2. Add ConsumerInterceptor interface and call its callbacks from appropriate 
places in Kafka Consumer.
3. Add unit tests for interceptor changes
4. Add integration test for both mutable consumer and producer interceptors 
(running at the same time).
5. Add record size and CRC to RecordMetadata and ConsumerRecord.

See details in KIP-42 wiki: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-42%3A+Add+Producer+and+Consumer+Interceptors
 

  was:
This JIRA is for KIP-42 implementation, which includes:

1. Add ProducerInterceptor interface and call its callbacks from appropriate 
places in Kafka Producer.
2. Add ConsumerInterceptor interface and call its callbacks from appropriate 
places in Kafka Consumer.

See details in KIP-42 wiki: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-42%3A+Add+Producer+and+Consumer+Interceptors
 


> KIP-42: Add Producer and Consumer Interceptors
> --
>
> Key: KAFKA-3162
> URL: https://issues.apache.org/jira/browse/KAFKA-3162
> Project: Kafka
>  Issue Type: Bug
>Reporter: Anna Povzner
>Assignee: Anna Povzner
>
> This JIRA is for KIP-42 implementation, which includes:
> 1. Add ProducerInterceptor interface and call its callbacks from appropriate 
> places in Kafka Producer.
> 2. Add ConsumerInterceptor interface and call its callbacks from appropriate 
> places in Kafka Consumer.
> 3. Add unit tests for interceptor changes
> 4. Add integration test for both mutable consumer and producer interceptors 
> (running at the same time).
> 5. Add record size and CRC to RecordMetadata and ConsumerRecord.
> See details in KIP-42 wiki: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-42%3A+Add+Producer+and+Consumer+Interceptors
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3188) Add integration test for KIP-31 and KIP-32

2016-02-02 Thread Anna Povzner (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129105#comment-15129105
 ] 

Anna Povzner commented on KAFKA-3188:
-

[~becket_qin] Do you mind making separate JIRAs for each of the three tests? It 
will be much easier to review them as separate PRs, and it would also be 
easier to share the workload if one person cannot take on all three tests. 
Thanks!

> Add integration test for KIP-31 and KIP-32
> --
>
> Key: KAFKA-3188
> URL: https://issues.apache.org/jira/browse/KAFKA-3188
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.9.0.0
>Reporter: Jiangjie Qin
> Fix For: 0.10.0.0
>
>
> The integration test should cover the followings:
> 1. Compatibility test.
> 2. Upgrade test
> 3. Changing message format type on the fly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3162) KIP-42: Add Producer and Consumer Interceptors

2016-02-02 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner updated KAFKA-3162:

Status: Patch Available  (was: In Progress)

> KIP-42: Add Producer and Consumer Interceptors
> --
>
> Key: KAFKA-3162
> URL: https://issues.apache.org/jira/browse/KAFKA-3162
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Anna Povzner
>Assignee: Anna Povzner
>
> This JIRA is for main part of KIP-42 implementation, which includes:
> 1. Add ProducerInterceptor interface and call its callbacks from appropriate 
> places in Kafka Producer.
> 2. Add ConsumerInterceptor interface and call its callbacks from appropriate 
> places in Kafka Consumer.
> 3. Add unit tests for interceptor changes
> 4. Add integration test for both mutable consumer and producer interceptors 
> (running at the same time).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3196) KIP-42 (part 2): add record size and CRC to RecordMetadata and ConsumerRecords

2016-02-02 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-3196:
---

 Summary: KIP-42 (part 2): add record size and CRC to 
RecordMetadata and ConsumerRecords
 Key: KAFKA-3196
 URL: https://issues.apache.org/jira/browse/KAFKA-3196
 Project: Kafka
  Issue Type: Improvement
Reporter: Anna Povzner
Assignee: Anna Povzner


This is the second (smaller) part of KIP-42, which includes: Add record size 
and CRC to RecordMetadata and ConsumerRecord.

See details in KIP-42 wiki: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-42%3A+Add+Producer+and+Consumer+Interceptors
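
A rough sketch of how a producer callback would read the new fields once this 
lands; the accessor names (serializedKeySize, serializedValueSize, checksum) 
are assumed from the KIP-42 discussion rather than quoted from released code:

{code}
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.RecordMetadata;

public class SizeAndCrcLogger implements Callback {
    @Override
    public void onCompletion(RecordMetadata metadata, Exception exception) {
        if (exception == null) {
            System.out.println(metadata.topic() + "-" + metadata.partition()
                    + " offset=" + metadata.offset()
                    + " keyBytes=" + metadata.serializedKeySize()
                    + " valueBytes=" + metadata.serializedValueSize()
                    + " crc=" + metadata.checksum());
        }
    }
}
{code}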
 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3162) KIP-42: Add Producer and Consumer Interceptors

2016-02-02 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner updated KAFKA-3162:

Description: 
This JIRA is for main part of KIP-42 implementation, which includes:

1. Add ProducerInterceptor interface and call its callbacks from appropriate 
places in Kafka Producer.
2. Add ConsumerInterceptor interface and call its callbacks from appropriate 
places in Kafka Consumer.
3. Add unit tests for interceptor changes
4. Add integration test for both mutable consumer and producer interceptors 
(running at the same time).

  was:
This JIRA is for KIP-42 implementation, which includes:

1. Add ProducerInterceptor interface and call its callbacks from appropriate 
places in Kafka Producer.
2. Add ConsumerInterceptor interface and call its callbacks from appropriate 
places in Kafka Consumer.
3. Add unit tests for interceptor changes
4. Add integration test for both mutable consumer and producer interceptors 
(running at the same time).
5. Add record size and CRC to RecordMetadata and ConsumerRecord.

See details in KIP-42 wiki: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-42%3A+Add+Producer+and+Consumer+Interceptors
 


> KIP-42: Add Producer and Consumer Interceptors
> --
>
> Key: KAFKA-3162
> URL: https://issues.apache.org/jira/browse/KAFKA-3162
> Project: Kafka
>  Issue Type: Bug
>Reporter: Anna Povzner
>Assignee: Anna Povzner
>
> This JIRA is for main part of KIP-42 implementation, which includes:
> 1. Add ProducerInterceptor interface and call its callbacks from appropriate 
> places in Kafka Producer.
> 2. Add ConsumerInterceptor interface and call its callbacks from appropriate 
> places in Kafka Consumer.
> 3. Add unit tests for interceptor changes
> 4. Add integration test for both mutable consumer and producer interceptors 
> (running at the same time).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3162) KIP-42: Add Producer and Consumer Interceptors

2016-02-02 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner updated KAFKA-3162:

Issue Type: Improvement  (was: Bug)

> KIP-42: Add Producer and Consumer Interceptors
> --
>
> Key: KAFKA-3162
> URL: https://issues.apache.org/jira/browse/KAFKA-3162
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Anna Povzner
>Assignee: Anna Povzner
>
> This JIRA is for main part of KIP-42 implementation, which includes:
> 1. Add ProducerInterceptor interface and call its callbacks from appropriate 
> places in Kafka Producer.
> 2. Add ConsumerInterceptor interface and call its callbacks from appropriate 
> places in Kafka Consumer.
> 3. Add unit tests for interceptor changes
> 4. Add integration test for both mutable consumer and producer interceptors 
> (running at the same time).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3162) KIP-42: Add Producer and Consumer Interceptors

2016-01-27 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-3162:
---

 Summary: KIP-42: Add Producer and Consumer Interceptors
 Key: KAFKA-3162
 URL: https://issues.apache.org/jira/browse/KAFKA-3162
 Project: Kafka
  Issue Type: Bug
Reporter: Anna Povzner
Assignee: Anna Povzner


This JIRA is for KIP-42 implementation, which includes:

1. Add ProducerInterceptor interface and call its callbacks from appropriate 
places in Kafka Producer.
2. Add ConsumerInterceptor interface and call its callbacks from appropriate 
places in Kafka Consumer.

See details in KIP-42 wiki: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-42%3A+Add+Producer+and+Consumer+Interceptors
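
For illustration, the consumer-side interface proposed in KIP-42 as a no-op 
pass-through implementation (the released signatures may differ slightly from 
this sketch):

{code}
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerInterceptor;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class PassThroughConsumerInterceptor implements ConsumerInterceptor<String, String> {

    @Override
    public ConsumerRecords<String, String> onConsume(ConsumerRecords<String, String> records) {
        return records; // inspect or filter records here
    }

    @Override
    public void onCommit(Map<TopicPartition, OffsetAndMetadata> offsets) {
        // called after offsets have been committed
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}
{code}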
 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (KAFKA-3162) KIP-42: Add Producer and Consumer Interceptors

2016-01-27 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-3162 started by Anna Povzner.
---
> KIP-42: Add Producer and Consumer Interceptors
> --
>
> Key: KAFKA-3162
> URL: https://issues.apache.org/jira/browse/KAFKA-3162
> Project: Kafka
>  Issue Type: Bug
>Reporter: Anna Povzner
>Assignee: Anna Povzner
>
> This JIRA is for KIP-42 implementation, which includes:
> 1. Add ProducerInterceptor interface and call its callbacks from appropriate 
> places in Kafka Producer.
> 2. Add ConsumerInterceptor interface and call its callbacks from appropriate 
> places in Kafka Consumer.
> See details in KIP-42 wiki: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-42%3A+Add+Producer+and+Consumer+Interceptors
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3025) KIP-32 (part 1): Add timestamp field to message, configs, and Producer/ConsumerRecord

2015-12-22 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner updated KAFKA-3025:

Description: 
The assignment of this JIRA is still under discussion with [~becket_qin]. Not 
going to implement it without his agreement.

This JIRA is for changes for KIP-32 excluding broker checking and acting on 
timestamp field in a message.

This JIRA includes:
1. Add time field to the message
Timestamp => int64
Timestamp is the number of milliseconds since Unix Epoch

2. Add time field to both ProducerRecord and ConsumerRecord
If a user specifies the timestamp in a ProducerRecord, the ProducerRecord is 
sent with this timestamp.
If a user does not specify the timestamp in a ProducerRecord, the producer 
stamps the ProducerRecord with the current time.
ConsumerRecord will have the timestamp of the message that was stored on the 
broker.

3. Add two new configurations to the broker. Configuration is per topic.
* message.timestamp.type: type of a timestamp. Possible values: CreateTime, 
LogAppendTime. Default: CreateTime
* max.message.time.difference.ms: threshold for the acceptable time difference 
between Timestamp in the message and local time on the broker. Default: 
Long.MaxValue

  was:
This JIRA is for changes for KIP-32 excluding broker checking and acting on 
timestamp field in a message.

This JIRA includes:
1. Add time field to the message
Timestamp => int64
Timestamp is the number of milliseconds since Unix Epoch

2. Add time field to both ProducerRecord and ConsumerRecord
If a user specifies the timestamp in a ProducerRecord, the ProducerRecord is 
sent with this timestamp.
If a user does not specify the timestamp in a ProducerRecord, the producer 
stamps the ProducerRecord with the current time.
ConsumerRecord will have the timestamp of the message that was stored on the 
broker.

3. Add two new configurations to the broker. Configuration is per topic.
* message.timestamp.type: type of a timestamp. Possible values: CreateTime, 
LogAppendTime. Default: CreateTime
* max.message.time.difference.ms: threshold for the acceptable time difference 
between Timestamp in the message and local time on the broker. Default: 
Long.MaxValue


> KIP-32 (part 1): Add timestamp field to message, configs, and 
> Producer/ConsumerRecord
> -
>
> Key: KAFKA-3025
> URL: https://issues.apache.org/jira/browse/KAFKA-3025
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Anna Povzner
>Assignee: Anna Povzner
>
> The decision about this JIRA assignment is still under discussion with 
> [~becket_qin]. Not going to implement without his agreement.
> This JIRA is for changes for KIP-32 excluding broker checking and acting on 
> timestamp field in a message.
> This JIRA includes:
> 1. Add time field to the message
> Timestamp => int64
> Timestamp is the number of milliseconds since Unix Epoch
> 2. Add time field to both ProducerRecord and ConsumerRecord
> If a user specifies the timestamp in a ProducerRecord, the ProducerRecord is 
> sent with this timestamp.
> If a user does not specify the timestamp in a ProducerRecord, the producer 
> stamps the ProducerRecord with the current time.
> ConsumerRecord will have the timestamp of the message that was stored on the 
> broker.
> 3. Add two new configurations to the broker. Configuration is per topic.
> * message.timestamp.type: type of a timestamp. Possible values: CreateTime, 
> LogAppendTime. Default: CreateTime
> * max.message.time.difference.ms: threshold for the acceptable time 
> difference between Timestamp in the message and local time on the broker. 
> Default: Long.MaxValue
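
To make the ProducerRecord side of this concrete, here is a minimal sketch using the 
timestamp-aware constructor as it later shipped in the Java producer; the topic name 
and bootstrap address are assumptions, not part of this ticket.

{code}
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class TimestampedSend {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            long createTime = System.currentTimeMillis();
            // Explicit CreateTime; passing null instead lets the producer stamp
            // the current time, matching the default behaviour described above.
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("test-topic", null, createTime, "key", "value");
            RecordMetadata metadata = producer.send(record).get();
            // Under the per-topic LogAppendTime setting, this would be the broker's
            // append time rather than createTime.
            System.out.println("stored timestamp = " + metadata.timestamp());
        }
    }
}
{code}

On the consumer side, ConsumerRecord exposes the stored value through its timestamp() 
accessor.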



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3026) KIP-32 (part 2): Changes in broker to over-write timestamp or reject message

2015-12-22 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner updated KAFKA-3026:

Description: 
The discussion about this JIRA assignment is still under discussion with 
[~becket_qin]. Not going to implement without his agreement.

This JIRA includes:
When the broker receives a message, it checks the configs:
1. If message.timestamp.type=LogAppendTime, the server over-writes the 
timestamp with its current local time.
The message could be compressed or not compressed. In either case, the 
timestamp is always over-written to the broker's current time.

2. If message.timestamp.type=CreateTime, the server calculates the difference 
between the current time on the broker and the Timestamp in the message:
If the difference is within max.message.time.difference.ms, the server will 
accept it and append it to the log. For a compressed message, the server will 
update the timestamp in the compressed message to -1: this means that 
CreateTime is used and the timestamp is in each individual inner message.
If the difference is higher than max.message.time.difference.ms, the server 
will reject the entire batch with TimestampExceededThresholdException.

(Actually adding the timestamp to the message and adding configs are covered by 
KAFKA-3025).

  was:
This JIRA includes:
When the broker receives a message, it checks the configs:
1. If message.timestamp.type=LogAppendTime, the server over-writes the 
timestamp with its current local time.
The message could be compressed or not compressed. In either case, the 
timestamp is always over-written to the broker's current time.

2. If message.timestamp.type=CreateTime, the server calculates the difference 
between the current time on the broker and the Timestamp in the message:
If the difference is within max.message.time.difference.ms, the server will 
accept it and append it to the log. For a compressed message, the server will 
update the timestamp in the compressed message to -1: this means that 
CreateTime is used and the timestamp is in each individual inner message.
If the difference is higher than max.message.time.difference.ms, the server 
will reject the entire batch with TimestampExceededThresholdException.

(Actually adding the timestamp to the message and adding configs are covered by 
KAFKA-3025).


> KIP-32 (part 2): Changes in broker to over-write timestamp or reject message
> 
>
> Key: KAFKA-3026
> URL: https://issues.apache.org/jira/browse/KAFKA-3026
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Anna Povzner
>Assignee: Anna Povzner
>
> The discussion about this JIRA assignment is still under discussion with 
> [~becket_qin]. Not going to implement without his agreement.
> This JIRA includes:
> When the broker receives a message, it checks the configs:
> 1. If message.timestamp.type=LogAppendTime, the server over-writes the 
> timestamp with its current local time.
> The message could be compressed or not compressed. In either case, the 
> timestamp is always over-written to the broker's current time.
> 2. If message.timestamp.type=CreateTime, the server calculates the difference 
> between the current time on the broker and the Timestamp in the message:
> If the difference is within max.message.time.difference.ms, the server will 
> accept it and append it to the log. For a compressed message, the server will 
> update the timestamp in the compressed message to -1: this means that 
> CreateTime is used and the timestamp is in each individual inner message.
> If the difference is higher than max.message.time.difference.ms, the server 
> will reject the entire batch with TimestampExceededThresholdException.
> (Actually adding the timestamp to the message and adding configs are covered 
> by KAFKA-3025).
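
Purely as an illustration of the decision described above, here is a small Java sketch 
of the check. The real logic would live in the broker's log-append path, the exception 
name is the one this ticket proposes (a plain RuntimeException stands in for it here), 
and the compressed-wrapper -1 rewrite is omitted for brevity.

{code}
/** Illustrative only: mirrors the per-topic timestamp policy described in this ticket. */
public final class TimestampPolicy {

    public enum Type { CREATE_TIME, LOG_APPEND_TIME }

    private final Type type;            // message.timestamp.type
    private final long maxDifferenceMs; // max.message.time.difference.ms

    public TimestampPolicy(Type type, long maxDifferenceMs) {
        this.type = type;
        this.maxDifferenceMs = maxDifferenceMs;
    }

    /** Returns the timestamp to store, or throws when the batch must be rejected. */
    public long validate(long messageTimestamp, long brokerNowMs) {
        if (type == Type.LOG_APPEND_TIME) {
            return brokerNowMs; // broker over-writes the timestamp with its local time
        }
        if (Math.abs(brokerNowMs - messageTimestamp) > maxDifferenceMs) {
            // The ticket proposes TimestampExceededThresholdException for this case.
            throw new RuntimeException("timestamp differs from broker time by more than "
                    + maxDifferenceMs + " ms");
        }
        return messageTimestamp; // CreateTime is kept as supplied by the producer
    }
}
{code}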



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3026) KIP-32 (part 2): Changes in broker to over-write timestamp or reject message

2015-12-22 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner updated KAFKA-3026:

Description: 
The decision about this JIRA assignment is still under discussion with 
[~becket_qin]. Not going to implement without his agreement.

This JIRA includes:
When the broker receives a message, it checks the configs:
1. If message.timestamp.type=LogAppendTime, the server over-writes the 
timestamp with its current local time.
The message could be compressed or not compressed. In either case, the 
timestamp is always over-written to the broker's current time.

2. If message.timestamp.type=CreateTime, the server calculates the difference 
between the current time on the broker and the Timestamp in the message:
If the difference is within max.message.time.difference.ms, the server will 
accept it and append it to the log. For a compressed message, the server will 
update the timestamp in the compressed message to -1: this means that 
CreateTime is used and the timestamp is in each individual inner message.
If the difference is higher than max.message.time.difference.ms, the server 
will reject the entire batch with TimestampExceededThresholdException.

(Actually adding the timestamp to the message and adding configs are covered by 
KAFKA-3025).

  was:
The discussion about this JIRA assignment is still under discussion with 
[~becket_qin]. Not going to implement without his agreement.

This JIRA includes:
When the broker receives a message, it checks the configs:
1. If message.timestamp.type=LogAppendTime, the server over-writes the 
timestamp with its current local time.
The message could be compressed or not compressed. In either case, the 
timestamp is always over-written to the broker's current time.

2. If message.timestamp.type=CreateTime, the server calculates the difference 
between the current time on the broker and the Timestamp in the message:
If the difference is within max.message.time.difference.ms, the server will 
accept it and append it to the log. For a compressed message, the server will 
update the timestamp in the compressed message to -1: this means that 
CreateTime is used and the timestamp is in each individual inner message.
If the difference is higher than max.message.time.difference.ms, the server 
will reject the entire batch with TimestampExceededThresholdException.

(Actually adding the timestamp to the message and adding configs are covered by 
KAFKA-3025).


> KIP-32 (part 2): Changes in broker to over-write timestamp or reject message
> 
>
> Key: KAFKA-3026
> URL: https://issues.apache.org/jira/browse/KAFKA-3026
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Anna Povzner
>Assignee: Anna Povzner
>
> The decision about this JIRA assignment is still under discussion with 
> [~becket_qin]. Not going to implement without his agreement.
> This JIRA includes:
> When the broker receives a message, it checks the configs:
> 1. If message.timestamp.type=LogAppendTime, the server over-writes the 
> timestamp with its current local time.
> The message could be compressed or not compressed. In either case, the 
> timestamp is always over-written to the broker's current time.
> 2. If message.timestamp.type=CreateTime, the server calculates the difference 
> between the current time on the broker and the Timestamp in the message:
> If the difference is within max.message.time.difference.ms, the server will 
> accept it and append it to the log. For a compressed message, the server will 
> update the timestamp in the compressed message to -1: this means that 
> CreateTime is used and the timestamp is in each individual inner message.
> If the difference is higher than max.message.time.difference.ms, the server 
> will reject the entire batch with TimestampExceededThresholdException.
> (Actually adding the timestamp to the message and adding configs are covered 
> by KAFKA-3025).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3025) KIP-32 (part 1): Add timestamp field to message, configs, and Producer/ConsumerRecord

2015-12-22 Thread Anna Povzner (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15068580#comment-15068580
 ] 

Anna Povzner commented on KAFKA-3025:
-

[~lindong] : I am in contact with [~becket_qin] about KIP-32 and the plan for 
implementation. I will update my two JIRAs with this info. Sorry for the 
confusion.

> KIP-32 (part 1): Add timestamp field to message, configs, and 
> Producer/ConsumerRecord
> -
>
> Key: KAFKA-3025
> URL: https://issues.apache.org/jira/browse/KAFKA-3025
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Anna Povzner
>Assignee: Anna Povzner
>
> This JIRA is for changes for KIP-32 excluding broker checking and acting on 
> timestamp field in a message.
> This JIRA includes:
> 1. Add time field to the message
> Timestamp => int64
> Timestamp is the number of milliseconds since Unix Epoch
> 2. Add time field to both ProducerRecord and ConsumerRecord
> If a user specifies the timestamp in a ProducerRecord, the ProducerRecord is 
> sent with this timestamp.
> If a user does not specify the timestamp in a ProducerRecord, the producer 
> stamps the ProducerRecord with the current time.
> ConsumerRecord will have the timestamp of the message that was stored on the 
> broker.
> 3. Add two new configurations to the broker. Configuration is per topic.
> * message.timestamp.type: type of a timestamp. Possible values: CreateTime, 
> LogAppendTime. Default: CreateTime
> * max.message.time.difference.ms: threshold for the acceptable time 
> difference between Timestamp in the message and local time on the broker. 
> Default: Long.MaxValue



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3025) KIP-31 (part 1): Add timestamp field to message, configs, and Producer/ConsumerRecord

2015-12-21 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-3025:
---

 Summary: KIP-31 (part 1): Add timestamp field to message, configs, 
and Producer/ConsumerRecord
 Key: KAFKA-3025
 URL: https://issues.apache.org/jira/browse/KAFKA-3025
 Project: Kafka
  Issue Type: Improvement
Reporter: Anna Povzner
Assignee: Anna Povzner


This JIRA is for changes for KIP-32 excluding broker checking and acting on 
timestamp field in a message.

This JIRA includes:
1. Add time field to the message
Timestamp => int64
Timestamp is the number of milliseconds since Unix Epoch

2. Add time field to both ProducerRecord and ConsumerRecord
If a user specifies the timestamp in a ProducerRecord, the ProducerRecord is 
sent with this timestamp.
If a user does not specify the timestamp in a ProducerRecord, the producer 
stamps the ProducerRecord with the current time.
ConsumerRecord will have the timestamp of the message that was stored on the 
broker.

3. Add two new configurations to the broker. Configuration is per topic.
* message.timestamp.type: type of a timestamp. Possible values: CreateTime, 
LogAppendTime. Default: CreateTime
* max.message.time.difference.ms: threshold for the acceptable time difference 
between Timestamp in the message and local time on the broker. Default: 
Long.MaxValue



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3026) KIP-32 (part 2): Changes in broker to over-write timestamp or reject message

2015-12-21 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-3026:
---

 Summary: KIP-32 (part 2): Changes in broker to over-write 
timestamp or reject message
 Key: KAFKA-3026
 URL: https://issues.apache.org/jira/browse/KAFKA-3026
 Project: Kafka
  Issue Type: Improvement
Reporter: Anna Povzner
Assignee: Anna Povzner


This JIRA includes:
When the broker receives a message, it checks the configs:
1. If message.timestamp.type=LogAppendTime, the server over-writes the 
timestamp with its current local time.
The message could be compressed or not compressed. In either case, the 
timestamp is always over-written to the broker's current time.

2. If message.timestamp.type=CreateTime, the server calculates the difference 
between the current time on the broker and the Timestamp in the message:
If the difference is within max.message.time.difference.ms, the server will 
accept it and append it to the log. For a compressed message, the server will 
update the timestamp in the compressed message to -1: this means that 
CreateTime is used and the timestamp is in each individual inner message.
If the difference is higher than max.message.time.difference.ms, the server 
will reject the entire batch with TimestampExceededThresholdException.

(Actually adding the timestamp to the message and adding configs are covered by 
KAFKA-3025).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3025) KIP-32 (part 1): Add timestamp field to message, configs, and Producer/ConsumerRecord

2015-12-21 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner updated KAFKA-3025:

Summary: KIP-32 (part 1): Add timestamp field to message, configs, and 
Producer/ConsumerRecord  (was: KIP-31 (part 1): Add timestamp field to message, 
configs, and Producer/ConsumerRecord)

> KIP-32 (part 1): Add timestamp field to message, configs, and 
> Producer/ConsumerRecord
> -
>
> Key: KAFKA-3025
> URL: https://issues.apache.org/jira/browse/KAFKA-3025
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Anna Povzner
>Assignee: Anna Povzner
>
> This JIRA is for changes for KIP-32 excluding broker checking and acting on 
> timestamp field in a message.
> This JIRA includes:
> 1. Add time field to the message
> Timestamp => int64
> Timestamp is the number of milliseconds since Unix Epoch
> 2. Add time field to both ProducerRecord and ConsumerRecord
> If a user specifies the timestamp in a ProducerRecord, the ProducerRecord is 
> sent with this timestamp.
> If a user does not specify the timestamp in a ProducerRecord, the producer 
> stamps the ProducerRecord with the current time.
> ConsumerRecord will have the timestamp of the message that was stored on the 
> broker.
> 3. Add two new configurations to the broker. Configuration is per topic.
> * message.timestamp.type: type of a timestamp. Possible values: CreateTime, 
> LogAppendTime. Default: CreateTime
> * max.message.time.difference.ms: threshold for the acceptable time 
> difference between Timestamp in the message and local time on the broker. 
> Default: Long.MaxValue



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2896) System test for partition re-assignment

2015-12-10 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner updated KAFKA-2896:

Status: Patch Available  (was: In Progress)

> System test for partition re-assignment
> ---
>
> Key: KAFKA-2896
> URL: https://issues.apache.org/jira/browse/KAFKA-2896
> Project: Kafka
>  Issue Type: Task
>Reporter: Gwen Shapira
>Assignee: Anna Povzner
>
> Lots of users depend on the partition re-assignment tool to manage their 
> cluster. 
> It will be nice to have a simple system test that creates a topic with a few 
> partitions and a few replicas, reassigns everything and validates the ISR 
> afterwards. 
> Just to make sure we are not breaking anything, especially since we have 
> plans to improve (read: modify) this area.
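
This ticket predates the Admin-client reassignment API (at the time the scenario would 
be driven through the kafka-reassign-partitions tool and a JSON plan), but purely to 
illustrate the scenario being asked for, here is a sketch using the modern Java Admin 
API; the topic name, target broker IDs, and bootstrap address are assumptions.

{code}
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewPartitionReassignment;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartition;

public class ReassignmentSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed cluster

        try (Admin admin = Admin.create(props)) {
            // Move partition 0 of "reassign-test" onto brokers 2 and 3.
            Map<TopicPartition, Optional<NewPartitionReassignment>> moves = new HashMap<>();
            moves.put(new TopicPartition("reassign-test", 0),
                    Optional.of(new NewPartitionReassignment(Arrays.asList(2, 3))));
            admin.alterPartitionReassignments(moves).all().get();

            // A real test would poll listPartitionReassignments() until it is empty,
            // then assert that the ISR matches the new replica set.
            TopicDescription description = admin
                    .describeTopics(Collections.singleton("reassign-test"))
                    .allTopicNames().get()
                    .get("reassign-test");
            description.partitions().forEach(p ->
                    System.out.println("partition " + p.partition() + " isr=" + p.isr()));
        }
    }
}
{code}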



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2825) Add controller failover to existing replication tests

2015-12-03 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner updated KAFKA-2825:

Reviewer: Guozhang Wang  (was: Geoff Anderson)

> Add controller failover to existing replication tests
> -
>
> Key: KAFKA-2825
> URL: https://issues.apache.org/jira/browse/KAFKA-2825
> Project: Kafka
>  Issue Type: Test
>Reporter: Anna Povzner
>Assignee: Anna Povzner
> Fix For: 0.9.1.0
>
>
> Extend existing replication tests to include controller failover:
> * clean/hard shutdown
> * clean/hard bounce



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-2851) system tests: error copying keytab file

2015-12-02 Thread Anna Povzner (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010231#comment-15010231
 ] 

Anna Povzner edited comment on KAFKA-2851 at 12/2/15 6:16 PM:
--

Pull request: https://github.com/apache/kafka/pull/610




was (Author: apovzner):
Pull request: https://github.com/apache/kafka/pull/609



> system tests: error copying keytab file
> ---
>
> Key: KAFKA-2851
> URL: https://issues.apache.org/jira/browse/KAFKA-2851
> Project: Kafka
>  Issue Type: Bug
>Reporter: Geoff Anderson
>Assignee: Anna Povzner
>Priority: Minor
>
> It is best to use unique paths for temporary files on the test driver machine 
> so that multiple test jobs don't conflict. 
> If the test driver machine is running multiple ducktape jobs concurrently, as 
> is the case with Confluent nightly test runs, conflicts can occur if the same 
> canonical path is always used.
> In this case, security_config.py copies a file to /tmp/keytab on the test 
> driver machine, while other jobs may remove this from the driver machine. 
> Then you can get errors like this:
> {code}
> 
> test_id:
> 2015-11-17--001.kafkatest.tests.replication_test.ReplicationTest.test_replication_with_broker_failure.security_protocol=SASL_PLAINTEXT.failure_mode=clean_bounce
> status: FAIL
> run time:   1 minute 33.395 seconds
> 
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/tests/runner.py",
>  line 101, in run_all_tests
> result.data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/tests/runner.py",
>  line 151, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/mark/_mark.py",
>  line 331, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py",
>  line 132, in test_replication_with_broker_failure
> self.run_produce_consume_validate(core_test_action=lambda: 
> failures[failure_mode](self))
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 66, in run_produce_consume_validate
> core_test_action()
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py",
>  line 132, in 
> self.run_produce_consume_validate(core_test_action=lambda: 
> failures[failure_mode](self))
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py",
>  line 43, in clean_bounce
> test.kafka.restart_node(prev_leader_node, clean_shutdown=True)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/kafka/kafka.py",
>  line 275, in restart_node
> self.start_node(node)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/kafka/kafka.py",
>  line 123, in start_node
> self.security_config.setup_node(node)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/security/security_config.py",
>  line 130, in setup_node
> node.account.scp_to(MiniKdc.LOCAL_KEYTAB_FILE, SecurityConfig.KEYTAB_PATH)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/cluster/remoteaccount.py",
>  line 174, in scp_to
> return self._ssh_quiet(self.scp_to_command(src, dest, recursive))
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/cluster/remoteaccount.py",
>  line 219, in _ssh_quiet
> raise e
> CalledProcessError: Command 'scp -o 'HostName 52.33.250.202' -o 'Port 22' -o 
> 'UserKnownHostsFile /dev/null' -o 'StrictHostKeyChecking no' -o 
> 'PasswordAuthentication no' -o 'IdentityFile /var/lib/jenkins/muckrake.pem' 
> -o 'IdentitiesOnly yes' -o 'LogLevel FATAL'  /tmp/keytab 
> ubuntu@worker2:/mnt/security/keytab' returned non-zero exit status 1
> {code}
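
The actual fix belongs in the Python ducktape service code (security_config.py and 
friends); the tiny Java sketch below only illustrates the unique-path idea the 
description asks for, with the prefix and suffix chosen arbitrarily.

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class UniqueTempPath {
    public static void main(String[] args) throws IOException {
        // Instead of a fixed /tmp/keytab that concurrent test jobs fight over,
        // ask the OS for a per-run file; the returned path embeds a unique suffix.
        Path keytab = Files.createTempFile("keytab-", ".tmp");
        System.out.println("copy the keytab to " + keytab);
        // ... scp the file and run the test, then clean up:
        Files.deleteIfExists(keytab);
    }
}
{code}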



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2851) system tests: error copying keytab file

2015-12-02 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner updated KAFKA-2851:

Reviewer: Guozhang Wang  (was: Gwen Shapira)

> system tests: error copying keytab file
> ---
>
> Key: KAFKA-2851
> URL: https://issues.apache.org/jira/browse/KAFKA-2851
> Project: Kafka
>  Issue Type: Bug
>Reporter: Geoff Anderson
>Assignee: Anna Povzner
>Priority: Minor
>
> It is best to use unique paths for temporary files on the test driver machine 
> so that multiple test jobs don't conflict. 
> If the test driver machine is running multiple ducktape jobs concurrently, as 
> is the case with Confluent nightly test runs, conflicts can occur if the same 
> canonical path is always used.
> In this case, security_config.py copies a file to /tmp/keytab on the test 
> driver machine, while other jobs may remove this from the driver machine. 
> Then you can get errors like this:
> {code}
> 
> test_id:
> 2015-11-17--001.kafkatest.tests.replication_test.ReplicationTest.test_replication_with_broker_failure.security_protocol=SASL_PLAINTEXT.failure_mode=clean_bounce
> status: FAIL
> run time:   1 minute 33.395 seconds
> 
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/tests/runner.py",
>  line 101, in run_all_tests
> result.data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/tests/runner.py",
>  line 151, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/mark/_mark.py",
>  line 331, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py",
>  line 132, in test_replication_with_broker_failure
> self.run_produce_consume_validate(core_test_action=lambda: 
> failures[failure_mode](self))
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 66, in run_produce_consume_validate
> core_test_action()
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py",
>  line 132, in 
> self.run_produce_consume_validate(core_test_action=lambda: 
> failures[failure_mode](self))
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py",
>  line 43, in clean_bounce
> test.kafka.restart_node(prev_leader_node, clean_shutdown=True)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/kafka/kafka.py",
>  line 275, in restart_node
> self.start_node(node)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/kafka/kafka.py",
>  line 123, in start_node
> self.security_config.setup_node(node)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/security/security_config.py",
>  line 130, in setup_node
> node.account.scp_to(MiniKdc.LOCAL_KEYTAB_FILE, SecurityConfig.KEYTAB_PATH)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/cluster/remoteaccount.py",
>  line 174, in scp_to
> return self._ssh_quiet(self.scp_to_command(src, dest, recursive))
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/cluster/remoteaccount.py",
>  line 219, in _ssh_quiet
> raise e
> CalledProcessError: Command 'scp -o 'HostName 52.33.250.202' -o 'Port 22' -o 
> 'UserKnownHostsFile /dev/null' -o 'StrictHostKeyChecking no' -o 
> 'PasswordAuthentication no' -o 'IdentityFile /var/lib/jenkins/muckrake.pem' 
> -o 'IdentitiesOnly yes' -o 'LogLevel FATAL'  /tmp/keytab 
> ubuntu@worker2:/mnt/security/keytab' returned non-zero exit status 1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2851) system tests: error copying keytab file

2015-12-02 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner updated KAFKA-2851:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> system tests: error copying keytab file
> ---
>
> Key: KAFKA-2851
> URL: https://issues.apache.org/jira/browse/KAFKA-2851
> Project: Kafka
>  Issue Type: Bug
>Reporter: Geoff Anderson
>Assignee: Anna Povzner
>Priority: Minor
>
> It is best to use unique paths for temporary files on the test driver machine 
> so that multiple test jobs don't conflict. 
> If the test driver machine is running multiple ducktape jobs concurrently, as 
> is the case with Confluent nightly test runs, conflicts can occur if the same 
> canonical path is always used.
> In this case, security_config.py copies a file to /tmp/keytab on the test 
> driver machine, while other jobs may remove this from the driver machine. 
> Then you can get errors like this:
> {code}
> 
> test_id:
> 2015-11-17--001.kafkatest.tests.replication_test.ReplicationTest.test_replication_with_broker_failure.security_protocol=SASL_PLAINTEXT.failure_mode=clean_bounce
> status: FAIL
> run time:   1 minute 33.395 seconds
> 
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/tests/runner.py",
>  line 101, in run_all_tests
> result.data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/tests/runner.py",
>  line 151, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/mark/_mark.py",
>  line 331, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py",
>  line 132, in test_replication_with_broker_failure
> self.run_produce_consume_validate(core_test_action=lambda: 
> failures[failure_mode](self))
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 66, in run_produce_consume_validate
> core_test_action()
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py",
>  line 132, in 
> self.run_produce_consume_validate(core_test_action=lambda: 
> failures[failure_mode](self))
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py",
>  line 43, in clean_bounce
> test.kafka.restart_node(prev_leader_node, clean_shutdown=True)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/kafka/kafka.py",
>  line 275, in restart_node
> self.start_node(node)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/kafka/kafka.py",
>  line 123, in start_node
> self.security_config.setup_node(node)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/security/security_config.py",
>  line 130, in setup_node
> node.account.scp_to(MiniKdc.LOCAL_KEYTAB_FILE, SecurityConfig.KEYTAB_PATH)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/cluster/remoteaccount.py",
>  line 174, in scp_to
> return self._ssh_quiet(self.scp_to_command(src, dest, recursive))
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/cluster/remoteaccount.py",
>  line 219, in _ssh_quiet
> raise e
> CalledProcessError: Command 'scp -o 'HostName 52.33.250.202' -o 'Port 22' -o 
> 'UserKnownHostsFile /dev/null' -o 'StrictHostKeyChecking no' -o 
> 'PasswordAuthentication no' -o 'IdentityFile /var/lib/jenkins/muckrake.pem' 
> -o 'IdentitiesOnly yes' -o 'LogLevel FATAL'  /tmp/keytab 
> ubuntu@worker2:/mnt/security/keytab' returned non-zero exit status 1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2896) System test for partition re-assignment

2015-12-01 Thread Anna Povzner (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034606#comment-15034606
 ] 

Anna Povzner commented on KAFKA-2896:
-

[~jinxing6...@126.com] I am already working on it, sorry I did not assign it to 
myself earlier.

> System test for partition re-assignment
> ---
>
> Key: KAFKA-2896
> URL: https://issues.apache.org/jira/browse/KAFKA-2896
> Project: Kafka
>  Issue Type: Task
>Reporter: Gwen Shapira
>Assignee: Anna Povzner
>
> Lots of users depend on the partition re-assignment tool to manage their 
> cluster. 
> It will be nice to have a simple system test that creates a topic with a few 
> partitions and a few replicas, reassigns everything and validates the ISR 
> afterwards. 
> Just to make sure we are not breaking anything, especially since we have 
> plans to improve (read: modify) this area.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (KAFKA-2896) System test for partition re-assignment

2015-12-01 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-2896 started by Anna Povzner.
---
> System test for partition re-assignment
> ---
>
> Key: KAFKA-2896
> URL: https://issues.apache.org/jira/browse/KAFKA-2896
> Project: Kafka
>  Issue Type: Task
>Reporter: Gwen Shapira
>Assignee: Anna Povzner
>
> Lots of users depend on the partition re-assignment tool to manage their 
> cluster. 
> It will be nice to have a simple system test that creates a topic with a few 
> partitions and a few replicas, reassigns everything and validates the ISR 
> afterwards. 
> Just to make sure we are not breaking anything, especially since we have 
> plans to improve (read: modify) this area.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-2896) System test for partition re-assignment

2015-12-01 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner reassigned KAFKA-2896:
---

Assignee: Anna Povzner

> System test for partition re-assignment
> ---
>
> Key: KAFKA-2896
> URL: https://issues.apache.org/jira/browse/KAFKA-2896
> Project: Kafka
>  Issue Type: Task
>Reporter: Gwen Shapira
>Assignee: Anna Povzner
>
> Lots of users depend on the partition re-assignment tool to manage their 
> cluster. 
> It will be nice to have a simple system test that creates a topic with a few 
> partitions and a few replicas, reassigns everything and validates the ISR 
> afterwards. 
> Just to make sure we are not breaking anything, especially since we have 
> plans to improve (read: modify) this area.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (KAFKA-2851) system tests: error copying keytab file

2015-12-01 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner updated KAFKA-2851:

Comment: was deleted

(was: In the same PR as KAFKA-2825)

> system tests: error copying keytab file
> ---
>
> Key: KAFKA-2851
> URL: https://issues.apache.org/jira/browse/KAFKA-2851
> Project: Kafka
>  Issue Type: Bug
>Reporter: Geoff Anderson
>Assignee: Anna Povzner
>Priority: Minor
>
> It is best to use unique paths for temporary files on the test driver machine 
> so that multiple test jobs don't conflict. 
> If the test driver machine is running multiple ducktape jobs concurrently, as 
> is the case with Confluent nightly test runs, conflicts can occur if the same 
> canonical path is always used.
> In this case, security_config.py copies a file to /tmp/keytab on the test 
> driver machine, while other jobs may remove this from the driver machine. 
> Then you can get errors like this:
> {code}
> 
> test_id:
> 2015-11-17--001.kafkatest.tests.replication_test.ReplicationTest.test_replication_with_broker_failure.security_protocol=SASL_PLAINTEXT.failure_mode=clean_bounce
> status: FAIL
> run time:   1 minute 33.395 seconds
> 
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/tests/runner.py",
>  line 101, in run_all_tests
> result.data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/tests/runner.py",
>  line 151, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/mark/_mark.py",
>  line 331, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py",
>  line 132, in test_replication_with_broker_failure
> self.run_produce_consume_validate(core_test_action=lambda: 
> failures[failure_mode](self))
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 66, in run_produce_consume_validate
> core_test_action()
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py",
>  line 132, in 
> self.run_produce_consume_validate(core_test_action=lambda: 
> failures[failure_mode](self))
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py",
>  line 43, in clean_bounce
> test.kafka.restart_node(prev_leader_node, clean_shutdown=True)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/kafka/kafka.py",
>  line 275, in restart_node
> self.start_node(node)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/kafka/kafka.py",
>  line 123, in start_node
> self.security_config.setup_node(node)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/security/security_config.py",
>  line 130, in setup_node
> node.account.scp_to(MiniKdc.LOCAL_KEYTAB_FILE, SecurityConfig.KEYTAB_PATH)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/cluster/remoteaccount.py",
>  line 174, in scp_to
> return self._ssh_quiet(self.scp_to_command(src, dest, recursive))
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/cluster/remoteaccount.py",
>  line 219, in _ssh_quiet
> raise e
> CalledProcessError: Command 'scp -o 'HostName 52.33.250.202' -o 'Port 22' -o 
> 'UserKnownHostsFile /dev/null' -o 'StrictHostKeyChecking no' -o 
> 'PasswordAuthentication no' -o 'IdentityFile /var/lib/jenkins/muckrake.pem' 
> -o 'IdentitiesOnly yes' -o 'LogLevel FATAL'  /tmp/keytab 
> ubuntu@worker2:/mnt/security/keytab' returned non-zero exit status 1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-2851) system tests: error copying keytab file

2015-12-01 Thread Anna Povzner (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010231#comment-15010231
 ] 

Anna Povzner edited comment on KAFKA-2851 at 12/2/15 12:10 AM:
---

Pull request: https://github.com/apache/kafka/pull/609




was (Author: apovzner):
Pull request: https://github.com/apache/kafka/pull/518

> system tests: error copying keytab file
> ---
>
> Key: KAFKA-2851
> URL: https://issues.apache.org/jira/browse/KAFKA-2851
> Project: Kafka
>  Issue Type: Bug
>Reporter: Geoff Anderson
>Assignee: Anna Povzner
>Priority: Minor
>
> It is best to use unique paths for temporary files on the test driver machine 
> so that multiple test jobs don't conflict. 
> If the test driver machine is running multiple ducktape jobs concurrently, as 
> is the case with Confluent nightly test runs, conflicts can occur if the same 
> canonical path is always used.
> In this case, security_config.py copies a file to /tmp/keytab on the test 
> driver machine, while other jobs may remove this from the driver machine. 
> Then you can get errors like this:
> {code}
> 
> test_id:
> 2015-11-17--001.kafkatest.tests.replication_test.ReplicationTest.test_replication_with_broker_failure.security_protocol=SASL_PLAINTEXT.failure_mode=clean_bounce
> status: FAIL
> run time:   1 minute 33.395 seconds
> 
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/tests/runner.py",
>  line 101, in run_all_tests
> result.data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/tests/runner.py",
>  line 151, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/mark/_mark.py",
>  line 331, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py",
>  line 132, in test_replication_with_broker_failure
> self.run_produce_consume_validate(core_test_action=lambda: 
> failures[failure_mode](self))
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 66, in run_produce_consume_validate
> core_test_action()
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py",
>  line 132, in 
> self.run_produce_consume_validate(core_test_action=lambda: 
> failures[failure_mode](self))
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py",
>  line 43, in clean_bounce
> test.kafka.restart_node(prev_leader_node, clean_shutdown=True)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/kafka/kafka.py",
>  line 275, in restart_node
> self.start_node(node)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/kafka/kafka.py",
>  line 123, in start_node
> self.security_config.setup_node(node)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/security/security_config.py",
>  line 130, in setup_node
> node.account.scp_to(MiniKdc.LOCAL_KEYTAB_FILE, SecurityConfig.KEYTAB_PATH)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/cluster/remoteaccount.py",
>  line 174, in scp_to
> return self._ssh_quiet(self.scp_to_command(src, dest, recursive))
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/cluster/remoteaccount.py",
>  line 219, in _ssh_quiet
> raise e
> CalledProcessError: Command 'scp -o 'HostName 52.33.250.202' -o 'Port 22' -o 
> 'UserKnownHostsFile /dev/null' -o 'StrictHostKeyChecking no' -o 
> 'PasswordAuthentication no' -o 'IdentityFile /var/lib/jenkins/muckrake.pem' 
> -o 'IdentitiesOnly yes' -o 'LogLevel FATAL'  /tmp/keytab 
> ubuntu@worker2:/mnt/security/keytab' returned non-zero exit status 1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2851) system tests: error copying keytab file

2015-11-18 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner updated KAFKA-2851:

Reviewer: Gwen Shapira
  Status: Patch Available  (was: In Progress)

In the same PR as KAFKA-2825

> system tests: error copying keytab file
> ---
>
> Key: KAFKA-2851
> URL: https://issues.apache.org/jira/browse/KAFKA-2851
> Project: Kafka
>  Issue Type: Bug
>Reporter: Geoff Anderson
>Assignee: Anna Povzner
>Priority: Minor
>
> It is best to use unique paths for temporary files on the test driver machine 
> so that multiple test jobs don't conflict. 
> If the test driver machine is running multiple ducktape jobs concurrently, as 
> is the case with Confluent nightly test runs, conflicts can occur if the same 
> canonical path is always used.
> In this case, security_config.py copies a file to /tmp/keytab on the test 
> driver machine, while other jobs may remove this from the driver machine. 
> Then you can get errors like this:
> {code}
> 
> test_id:
> 2015-11-17--001.kafkatest.tests.replication_test.ReplicationTest.test_replication_with_broker_failure.security_protocol=SASL_PLAINTEXT.failure_mode=clean_bounce
> status: FAIL
> run time:   1 minute 33.395 seconds
> 
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/tests/runner.py",
>  line 101, in run_all_tests
> result.data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/tests/runner.py",
>  line 151, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/mark/_mark.py",
>  line 331, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py",
>  line 132, in test_replication_with_broker_failure
> self.run_produce_consume_validate(core_test_action=lambda: 
> failures[failure_mode](self))
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 66, in run_produce_consume_validate
> core_test_action()
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py",
>  line 132, in 
> self.run_produce_consume_validate(core_test_action=lambda: 
> failures[failure_mode](self))
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py",
>  line 43, in clean_bounce
> test.kafka.restart_node(prev_leader_node, clean_shutdown=True)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/kafka/kafka.py",
>  line 275, in restart_node
> self.start_node(node)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/kafka/kafka.py",
>  line 123, in start_node
> self.security_config.setup_node(node)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/security/security_config.py",
>  line 130, in setup_node
> node.account.scp_to(MiniKdc.LOCAL_KEYTAB_FILE, SecurityConfig.KEYTAB_PATH)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/cluster/remoteaccount.py",
>  line 174, in scp_to
> return self._ssh_quiet(self.scp_to_command(src, dest, recursive))
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/cluster/remoteaccount.py",
>  line 219, in _ssh_quiet
> raise e
> CalledProcessError: Command 'scp -o 'HostName 52.33.250.202' -o 'Port 22' -o 
> 'UserKnownHostsFile /dev/null' -o 'StrictHostKeyChecking no' -o 
> 'PasswordAuthentication no' -o 'IdentityFile /var/lib/jenkins/muckrake.pem' 
> -o 'IdentitiesOnly yes' -o 'LogLevel FATAL'  /tmp/keytab 
> ubuntu@worker2:/mnt/security/keytab' returned non-zero exit status 1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2825) Add controller failover to existing replication tests

2015-11-18 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner updated KAFKA-2825:

Reviewer: Gwen Shapira
  Status: Patch Available  (was: In Progress)

The PR also contains the fix for KAFKA-2851. Will assign the same reviewer.

> Add controller failover to existing replication tests
> -
>
> Key: KAFKA-2825
> URL: https://issues.apache.org/jira/browse/KAFKA-2825
> Project: Kafka
>  Issue Type: Test
>Reporter: Anna Povzner
>Assignee: Anna Povzner
>
> Extend existing replication tests to include controller failover:
> * clean/hard shutdown
> * clean/hard bounce



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-2851) system tests: error copying keytab file

2015-11-17 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Povzner reassigned KAFKA-2851:
---

Assignee: Anna Povzner

> system tests: error copying keytab file
> ---
>
> Key: KAFKA-2851
> URL: https://issues.apache.org/jira/browse/KAFKA-2851
> Project: Kafka
>  Issue Type: Bug
>Reporter: Geoff Anderson
>Assignee: Anna Povzner
>Priority: Minor
>
> It is best to use unique paths for temporary files on the test driver machine 
> so that multiple test jobs don't conflict. 
> If the test driver machine is running multiple ducktape jobs concurrently, as 
> is the case with Confluent nightly test runs, conflicts can occur if the same 
> canonical path is always used.
> In this case, security_config.py copies a file to /tmp/keytab on the test 
> driver machine, while other jobs may remove this from the driver machine. 
> Then you can get errors like this:
> {code}
> 
> test_id:
> 2015-11-17--001.kafkatest.tests.replication_test.ReplicationTest.test_replication_with_broker_failure.security_protocol=SASL_PLAINTEXT.failure_mode=clean_bounce
> status: FAIL
> run time:   1 minute 33.395 seconds
> 
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/tests/runner.py",
>  line 101, in run_all_tests
> result.data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/tests/runner.py",
>  line 151, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/mark/_mark.py",
>  line 331, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py",
>  line 132, in test_replication_with_broker_failure
> self.run_produce_consume_validate(core_test_action=lambda: 
> failures[failure_mode](self))
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 66, in run_produce_consume_validate
> core_test_action()
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py",
>  line 132, in 
> self.run_produce_consume_validate(core_test_action=lambda: 
> failures[failure_mode](self))
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py",
>  line 43, in clean_bounce
> test.kafka.restart_node(prev_leader_node, clean_shutdown=True)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/kafka/kafka.py",
>  line 275, in restart_node
> self.start_node(node)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/kafka/kafka.py",
>  line 123, in start_node
> self.security_config.setup_node(node)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/security/security_config.py",
>  line 130, in setup_node
> node.account.scp_to(MiniKdc.LOCAL_KEYTAB_FILE, SecurityConfig.KEYTAB_PATH)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/cluster/remoteaccount.py",
>  line 174, in scp_to
> return self._ssh_quiet(self.scp_to_command(src, dest, recursive))
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/cluster/remoteaccount.py",
>  line 219, in _ssh_quiet
> raise e
> CalledProcessError: Command 'scp -o 'HostName 52.33.250.202' -o 'Port 22' -o 
> 'UserKnownHostsFile /dev/null' -o 'StrictHostKeyChecking no' -o 
> 'PasswordAuthentication no' -o 'IdentityFile /var/lib/jenkins/muckrake.pem' 
> -o 'IdentitiesOnly yes' -o 'LogLevel FATAL'  /tmp/keytab 
> ubuntu@worker2:/mnt/security/keytab' returned non-zero exit status 1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (KAFKA-2851) system tests: error copying keytab file

2015-11-17 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-2851 started by Anna Povzner.
---
> system tests: error copying keytab file
> ---
>
> Key: KAFKA-2851
> URL: https://issues.apache.org/jira/browse/KAFKA-2851
> Project: Kafka
>  Issue Type: Bug
>Reporter: Geoff Anderson
>Assignee: Anna Povzner
>Priority: Minor
>
> It is best to use unique paths for temporary files on the test driver machine 
> so that multiple test jobs don't conflict. 
> If the test driver machine is running multiple ducktape jobs concurrently, as 
> is the case with Confluent nightly test runs, conflicts can occur if the same 
> canonical path is always used.
> In this case, security_config.py copies a file to /tmp/keytab on the test 
> driver machine, while other jobs may remove this from the driver machine. 
> Then you can get errors like this:
> {code}
> 
> test_id:
> 2015-11-17--001.kafkatest.tests.replication_test.ReplicationTest.test_replication_with_broker_failure.security_protocol=SASL_PLAINTEXT.failure_mode=clean_bounce
> status: FAIL
> run time:   1 minute 33.395 seconds
> 
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/tests/runner.py",
>  line 101, in run_all_tests
> result.data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/tests/runner.py",
>  line 151, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/mark/_mark.py",
>  line 331, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py",
>  line 132, in test_replication_with_broker_failure
> self.run_produce_consume_validate(core_test_action=lambda: 
> failures[failure_mode](self))
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 66, in run_produce_consume_validate
> core_test_action()
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py",
>  line 132, in 
> self.run_produce_consume_validate(core_test_action=lambda: 
> failures[failure_mode](self))
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py",
>  line 43, in clean_bounce
> test.kafka.restart_node(prev_leader_node, clean_shutdown=True)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/kafka/kafka.py",
>  line 275, in restart_node
> self.start_node(node)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/kafka/kafka.py",
>  line 123, in start_node
> self.security_config.setup_node(node)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/security/security_config.py",
>  line 130, in setup_node
> node.account.scp_to(MiniKdc.LOCAL_KEYTAB_FILE, SecurityConfig.KEYTAB_PATH)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/cluster/remoteaccount.py",
>  line 174, in scp_to
> return self._ssh_quiet(self.scp_to_command(src, dest, recursive))
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/cluster/remoteaccount.py",
>  line 219, in _ssh_quiet
> raise e
> CalledProcessError: Command 'scp -o 'HostName 52.33.250.202' -o 'Port 22' -o 
> 'UserKnownHostsFile /dev/null' -o 'StrictHostKeyChecking no' -o 
> 'PasswordAuthentication no' -o 'IdentityFile /var/lib/jenkins/muckrake.pem' 
> -o 'IdentitiesOnly yes' -o 'LogLevel FATAL'  /tmp/keytab 
> ubuntu@worker2:/mnt/security/keytab' returned non-zero exit status 1
> {code}
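
A minimal sketch of the kind of fix the report points at, not the code in the actual pull request: derive a per-job keytab path on the driver instead of always using /tmp/keytab, so concurrent ducktape jobs cannot clobber each other's file. The helper name and its use of tempfile below are illustrative assumptions.

{code}
# Sketch only (assumed helper, not the actual patch): generate a keytab path
# that is unique to this test job instead of the shared /tmp/keytab.
import os
import tempfile


def unique_local_keytab_path():
    """Return a keytab path private to this ducktape job on the driver."""
    # mkdtemp creates a fresh directory such as /tmp/keytab-abc123, so two
    # jobs running concurrently on the same driver never share the file.
    job_dir = tempfile.mkdtemp(prefix="keytab-")
    return os.path.join(job_dir, "keytab")


if __name__ == "__main__":
    local_keytab = unique_local_keytab_path()
    print("local keytab for this job: %s" % local_keytab)
    # The scp shown in the traceback would then use this unique local path as
    # its source, e.g. node.account.scp_to(local_keytab, SecurityConfig.KEYTAB_PATH)
{code}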



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2851) system tests: error copying keytab file

2015-11-17 Thread Anna Povzner (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010231#comment-15010231
 ] 

Anna Povzner commented on KAFKA-2851:
-

Pull request: https://github.com/apache/kafka/pull/518

> system tests: error copying keytab file
> ---
>
> Key: KAFKA-2851
> URL: https://issues.apache.org/jira/browse/KAFKA-2851
> Project: Kafka
>  Issue Type: Bug
>Reporter: Geoff Anderson
>Assignee: Anna Povzner
>Priority: Minor
>
> It is best to use unique paths for temporary files on the test driver machine 
> so that multiple test jobs don't conflict. 
> If the test driver machine is running multiple ducktape jobs concurrently, as 
> is the case with Confluent nightly test runs, conflicts can occur if the same 
> canonical path is always used.
> In this case, security_config.py copies a file to the shared path /tmp/keytab 
> on the test driver machine, and a concurrently running job may remove that 
> file from the driver machine. This can then produce errors like the following:
> {code}
> 
> test_id:
> 2015-11-17--001.kafkatest.tests.replication_test.ReplicationTest.test_replication_with_broker_failure.security_protocol=SASL_PLAINTEXT.failure_mode=clean_bounce
> status: FAIL
> run time:   1 minute 33.395 seconds
> 
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/tests/runner.py",
>  line 101, in run_all_tests
> result.data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/tests/runner.py",
>  line 151, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/mark/_mark.py",
>  line 331, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py",
>  line 132, in test_replication_with_broker_failure
> self.run_produce_consume_validate(core_test_action=lambda: 
> failures[failure_mode](self))
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 66, in run_produce_consume_validate
> core_test_action()
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py",
> line 132, in <lambda>
> self.run_produce_consume_validate(core_test_action=lambda: 
> failures[failure_mode](self))
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/tests/replication_test.py",
>  line 43, in clean_bounce
> test.kafka.restart_node(prev_leader_node, clean_shutdown=True)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/kafka/kafka.py",
>  line 275, in restart_node
> self.start_node(node)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/kafka/kafka.py",
>  line 123, in start_node
> self.security_config.setup_node(node)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/tests/kafkatest/services/security/security_config.py",
>  line 130, in setup_node
> node.account.scp_to(MiniKdc.LOCAL_KEYTAB_FILE, SecurityConfig.KEYTAB_PATH)
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/cluster/remoteaccount.py",
>  line 174, in scp_to
> return self._ssh_quiet(self.scp_to_command(src, dest, recursive))
>   File 
> "/var/lib/jenkins/workspace/kafka_system_tests/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/cluster/remoteaccount.py",
>  line 219, in _ssh_quiet
> raise e
> CalledProcessError: Command 'scp -o 'HostName 52.33.250.202' -o 'Port 22' -o 
> 'UserKnownHostsFile /dev/null' -o 'StrictHostKeyChecking no' -o 
> 'PasswordAuthentication no' -o 'IdentityFile /var/lib/jenkins/muckrake.pem' 
> -o 'IdentitiesOnly yes' -o 'LogLevel FATAL'  /tmp/keytab 
> ubuntu@worker2:/mnt/security/keytab' returned non-zero exit status 1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (KAFKA-2825) Add controller failover to existing replication tests

2015-11-12 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-2825 started by Anna Povzner.
---
> Add controller failover to existing replication tests
> -
>
> Key: KAFKA-2825
> URL: https://issues.apache.org/jira/browse/KAFKA-2825
> Project: Kafka
>  Issue Type: Test
>Reporter: Anna Povzner
>Assignee: Anna Povzner
>
> Extend existing replication tests to include controller failover:
> * clean/hard shutdown
> * clean/hard bounce



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2825) Add controller failover to existing replication tests

2015-11-12 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-2825:
---

 Summary: Add controller failover to existing replication tests
 Key: KAFKA-2825
 URL: https://issues.apache.org/jira/browse/KAFKA-2825
 Project: Kafka
  Issue Type: Test
Reporter: Anna Povzner
Assignee: Anna Povzner


Extend existing replication tests to include controller failover:
* clean/hard shutdown
* clean/hard bounce
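
A rough sketch, in the style of the existing clean_bounce failure action in replication_test.py, of what controller-targeted failure actions could look like. The kafka.controller() lookup and the stop_node(..., clean_shutdown=...) signature are assumptions about the kafkatest service API, not code from the eventual change.

{code}
# Sketch only: additional failure actions that target the broker currently
# acting as controller, mirroring the existing leader-bounce actions.


def clean_controller_shutdown(test):
    """Cleanly stop whichever broker currently holds the controller role."""
    controller_node = test.kafka.controller()   # assumed helper
    test.kafka.stop_node(controller_node, clean_shutdown=True)   # assumed signature


def hard_controller_bounce(test):
    """Kill and restart the controller broker without a clean shutdown."""
    controller_node = test.kafka.controller()   # assumed helper
    test.kafka.restart_node(controller_node, clean_shutdown=False)


# Registered alongside the existing entries in the failures dict, these slot
# into the unchanged test body:
#
#     self.run_produce_consume_validate(
#         core_test_action=lambda: failures[failure_mode](self))
failures = {
    "clean_controller_shutdown": clean_controller_shutdown,
    "hard_controller_bounce": hard_controller_bounce,
}
{code}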




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

