[jira] [Updated] (KAFKA-4214) kafka-reassign-partitions fails all the time when brokers are bounced during reassignment

2016-09-23 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta updated KAFKA-4214:

Description: 
Due to KAFKA-4204, we never realized that the existing system test for 
reassignment would always fail when brokers were bounced mid-process. This 
happens reliably, even for topics with varying numbers of partitions and 
varying replication factors.

In particular, we see errors like this in the logs when the brokers are 
bounced: 

{noformat}
Status of partition reassignment:
ERROR: Assigned replicas (1,2) don't match the list of replicas for 
reassignment (1) for partition [test_topic,2]
Reassignment of partition [test_topic,1] completed successfully
Reassignment of partition [test_topic,2] failed
Reassignment of partition [test_topic,3] completed successfully
Reassignment of partition [test_topic,0] is still in progress
{noformat}

Currently, the tests which bounce brokers during reassignment are disabled 
until this bug is fixed.

  was:
Due to KAFKA-4204, we never realized that the existing system test for 
reassignment would always fail when brokers were bounced mid-process. 

In particular, we see errors like this in the logs when the brokers are 
bounced: 

{noformat}
Status of partition reassignment:
ERROR: Assigned replicas (1,2) don't match the list of replicas for 
reassignment (1) for partition [test_topic,2]
Reassignment of partition [test_topic,1] completed successfully
Reassignment of partition [test_topic,2] failed
Reassignment of partition [test_topic,3] completed successfully
Reassignment of partition [test_topic,0] is still in progress
{noformat}

Currently, the tests which bounce brokers during reassignment are disabled 
until this bug is fixed.


> kafka-reassign-partitions fails all the time when brokers are bounced during 
> reassignment
> -
>
> Key: KAFKA-4214
> URL: https://issues.apache.org/jira/browse/KAFKA-4214
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>
> Due to KAFKA-4204, we never realized that the existing system test for 
> reassignment would always fail when brokers were bounced mid-process. This 
> happens reliably, even for topics with varying numbers of partitions and 
> varying replication factors.
> In particular, we see errors like this in the logs when the brokers are 
> bounced: 
> {noformat}
> Status of partition reassignment:
> ERROR: Assigned replicas (1,2) don't match the list of replicas for 
> reassignment (1) for partition [test_topic,2]
> Reassignment of partition [test_topic,1] completed successfully
> Reassignment of partition [test_topic,2] failed
> Reassignment of partition [test_topic,3] completed successfully
> Reassignment of partition [test_topic,0] is still in progress
> {noformat}
> Currently, the tests which bounce brokers during reassignment are disabled 
> until this bug is fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4215) Consumers miss messages during partition reassignment

2016-09-23 Thread Apurva Mehta (JIRA)
Apurva Mehta created KAFKA-4215:
---

 Summary: Consumers miss messages during partition reassignment
 Key: KAFKA-4215
 URL: https://issues.apache.org/jira/browse/KAFKA-4215
 Project: Kafka
  Issue Type: Bug
Reporter: Apurva Mehta
Assignee: Apurva Mehta


In the specific case where the replication-factor of a topic is 1, when 
partition reassignment is ongoing, and when a broker is bounced, consumers 
reliably lose some messages in the stream. 

This can be reproduced in system tests where the following error message is 
observed:

{noformat}
AssertionError: 737 acked message did not make it to the Consumer. They are: 
22530, 45059, 22534, 45063, 22538, 45067, 22542, 45071, 22546, 45075, 22550, 
45079, 22554, 45083, 22558, 45087, 22562, 45091, 22566, 45095, ...plus 717 
more. Total Acked: 51809, Total Consumed: 51073. We validated that the first 
737 of these missing messages correctly made it into Kafka's data files. This 
suggests they were lost on their way to the consumer.
{noformat}
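
The check behind this assertion is essentially a set difference between the 
acked and consumed message values. A minimal sketch of that comparison (the 
names here are illustrative, not the actual test-framework API):

{code}
# Sketch of the acked-vs-consumed check behind the assertion above.
# `acked` and `consumed` stand in for the values recovered from the
# producer and consumer logs; the names are illustrative only.
def find_missing(acked, consumed):
    """Return acked message values that were never consumed, in order."""
    consumed_set = set(consumed)
    return [m for m in acked if m not in consumed_set]

acked = [22530, 45059, 22534, 45063]
consumed = [22530, 45059, 22534, 45063]
missing = find_missing(acked, consumed)
assert not missing, "%d acked messages did not make it to the consumer" % len(missing)
{code}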






Build failed in Jenkins: kafka-trunk-jdk8 #909

2016-09-23 Thread Apache Jenkins Server
See 

Changes:

[jason] KAFKA-3590; Handle not-enough-replicas errors when writing to offsets

[ismael] KAFKA-3719; Allow underscores in hostname

--
[...truncated 3550 lines...]
kafka.security.auth.ZkAuthorizationTest > testZkMigration PASSED

kafka.security.auth.ZkAuthorizationTest > testChroot STARTED

kafka.security.auth.ZkAuthorizationTest > testChroot PASSED

kafka.security.auth.ZkAuthorizationTest > testDelete STARTED

kafka.security.auth.ZkAuthorizationTest > testDelete PASSED

kafka.security.auth.ZkAuthorizationTest > testDeleteRecursive STARTED

kafka.security.auth.ZkAuthorizationTest > testDeleteRecursive PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testAllowAllAccess STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testAllowAllAccess PASSED

kafka.security.auth.SimpleAclAuthorizerTest > 
testLocalConcurrentModificationOfResourceAcls STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testLocalConcurrentModificationOfResourceAcls PASSED

kafka.security.auth.SimpleAclAuthorizerTest > 
testHighConcurrencyDeletionOfResourceAcls STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testHighConcurrencyDeletionOfResourceAcls PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testNoAclFound STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testNoAclFound PASSED

kafka.security.auth.SimpleAclAuthorizerTest > 
testDistributedConcurrentModificationOfResourceAcls STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testDistributedConcurrentModificationOfResourceAcls PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testAclManagementAPIs STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testAclManagementAPIs PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testWildCardAcls STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testWildCardAcls PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testTopicAcl STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testTopicAcl PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testSuperUserHasAccess STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testSuperUserHasAccess PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testDenyTakesPrecedence STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testDenyTakesPrecedence PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testNoAclFoundOverride STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testNoAclFoundOverride PASSED

kafka.security.auth.SimpleAclAuthorizerTest > 
testHighConcurrencyModificationOfResourceAcls STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testHighConcurrencyModificationOfResourceAcls PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testLoadCache STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testLoadCache PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata 
STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata 
PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata 
STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.integration.PrimitiveApiTest > testMultiProduce STARTED

kafka.integration.PrimitiveApiTest > testMultiProduce PASSED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch STARTED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch PASSED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize 
STARTED


Build failed in Jenkins: kafka-trunk-jdk7 #1566

2016-09-23 Thread Apache Jenkins Server
See 

Changes:

[ismael] KAFKA-3719; Allow underscores in hostname

--
[...truncated 1004 lines...]
kafka.utils.ZkUtilsTest > testClusterIdentifierJsonParsing STARTED

kafka.utils.ZkUtilsTest > testClusterIdentifierJsonParsing PASSED

kafka.utils.ByteBoundedBlockingQueueTest > testByteBoundedBlockingQueue STARTED

kafka.utils.ByteBoundedBlockingQueueTest > testByteBoundedBlockingQueue PASSED

kafka.utils.ReplicationUtilsTest > testUpdateLeaderAndIsr STARTED

kafka.utils.ReplicationUtilsTest > testUpdateLeaderAndIsr PASSED

kafka.utils.ReplicationUtilsTest > testGetLeaderIsrAndEpochForPartition STARTED

kafka.utils.ReplicationUtilsTest > testGetLeaderIsrAndEpochForPartition PASSED

kafka.utils.timer.TimerTest > testAlreadyExpiredTask STARTED

kafka.utils.timer.TimerTest > testAlreadyExpiredTask PASSED

kafka.utils.timer.TimerTest > testTaskExpiration STARTED

kafka.utils.timer.TimerTest > testTaskExpiration PASSED

kafka.utils.timer.TimerTaskListTest > testAll STARTED

kafka.utils.timer.TimerTaskListTest > testAll PASSED

kafka.utils.SchedulerTest > testMockSchedulerNonPeriodicTask STARTED

kafka.utils.SchedulerTest > testMockSchedulerNonPeriodicTask PASSED

kafka.utils.SchedulerTest > testMockSchedulerPeriodicTask STARTED

kafka.utils.SchedulerTest > testMockSchedulerPeriodicTask PASSED

kafka.utils.SchedulerTest > testNonPeriodicTask STARTED

kafka.utils.SchedulerTest > testNonPeriodicTask PASSED

kafka.utils.SchedulerTest > testRestart STARTED

kafka.utils.SchedulerTest > testRestart PASSED

kafka.utils.SchedulerTest > testReentrantTaskInMockScheduler STARTED

kafka.utils.SchedulerTest > testReentrantTaskInMockScheduler PASSED

kafka.utils.SchedulerTest > testPeriodicTask STARTED

kafka.utils.SchedulerTest > testPeriodicTask PASSED

kafka.utils.CommandLineUtilsTest > testParseEmptyArg STARTED

kafka.utils.CommandLineUtilsTest > testParseEmptyArg PASSED

kafka.utils.CommandLineUtilsTest > testParseSingleArg STARTED

kafka.utils.CommandLineUtilsTest > testParseSingleArg PASSED

kafka.utils.CommandLineUtilsTest > testParseArgs STARTED

kafka.utils.CommandLineUtilsTest > testParseArgs PASSED

kafka.utils.CommandLineUtilsTest > testParseEmptyArgAsValid STARTED

kafka.utils.CommandLineUtilsTest > testParseEmptyArgAsValid PASSED

kafka.utils.UtilsTest > testGenerateUuidAsBase64 STARTED

kafka.utils.UtilsTest > testGenerateUuidAsBase64 PASSED

kafka.utils.UtilsTest > testAbs STARTED

kafka.utils.UtilsTest > testAbs PASSED

kafka.utils.UtilsTest > testReplaceSuffix STARTED

kafka.utils.UtilsTest > testReplaceSuffix PASSED

kafka.utils.UtilsTest > testCircularIterator STARTED

kafka.utils.UtilsTest > testCircularIterator PASSED

kafka.utils.UtilsTest > testReadBytes STARTED

kafka.utils.UtilsTest > testReadBytes PASSED

kafka.utils.UtilsTest > testCsvList STARTED

kafka.utils.UtilsTest > testCsvList PASSED

kafka.utils.UtilsTest > testReadInt STARTED

kafka.utils.UtilsTest > testReadInt PASSED

kafka.utils.UtilsTest > testUrlSafeBase64EncodeUUID STARTED

kafka.utils.UtilsTest > testUrlSafeBase64EncodeUUID PASSED

kafka.utils.UtilsTest > testCsvMap STARTED

kafka.utils.UtilsTest > testCsvMap PASSED

kafka.utils.UtilsTest > testInLock STARTED

kafka.utils.UtilsTest > testInLock PASSED

kafka.utils.UtilsTest > testSwallow STARTED

kafka.utils.UtilsTest > testSwallow PASSED

kafka.utils.JsonTest > testJsonEncoding STARTED

kafka.utils.JsonTest > testJsonEncoding PASSED

kafka.utils.IteratorTemplateTest > testIterator STARTED

kafka.utils.IteratorTemplateTest > testIterator PASSED

kafka.metrics.KafkaTimerTest > testKafkaTimer STARTED

kafka.metrics.KafkaTimerTest > testKafkaTimer PASSED

kafka.metrics.MetricsTest > testMetricsReporterAfterDeletingTopic STARTED

kafka.metrics.MetricsTest > testMetricsReporterAfterDeletingTopic PASSED

kafka.metrics.MetricsTest > 
testBrokerTopicMetricsUnregisteredAfterDeletingTopic STARTED

kafka.metrics.MetricsTest > 
testBrokerTopicMetricsUnregisteredAfterDeletingTopic PASSED

kafka.metrics.MetricsTest > testClusterIdMetric STARTED

kafka.metrics.MetricsTest > testClusterIdMetric PASSED

kafka.metrics.MetricsTest > testMetricsLeak STARTED

kafka.metrics.MetricsTest > testMetricsLeak PASSED

kafka.consumer.TopicFilterTest > testWhitelists STARTED

kafka.consumer.TopicFilterTest > testWhitelists PASSED

kafka.consumer.TopicFilterTest > 
testWildcardTopicCountGetTopicCountMapEscapeJson STARTED

kafka.consumer.TopicFilterTest > 
testWildcardTopicCountGetTopicCountMapEscapeJson PASSED

kafka.consumer.TopicFilterTest > testBlacklists STARTED

kafka.consumer.TopicFilterTest > testBlacklists PASSED

kafka.consumer.PartitionAssignorTest > testRoundRobinPartitionAssignor STARTED

kafka.consumer.PartitionAssignorTest > testRoundRobinPartitionAssignor PASSED

kafka.consumer.PartitionAssignorTest > testRangePartitionAssignor STARTED

kafka.consumer.PartitionAssignorTest > 

[jira] [Created] (KAFKA-4214) kafka-reassign-partitions fails all the time when brokers are bounced during reassignment

2016-09-23 Thread Apurva Mehta (JIRA)
Apurva Mehta created KAFKA-4214:
---

 Summary: kafka-reassign-partitions fails all the time when brokers 
are bounced during reassignment
 Key: KAFKA-4214
 URL: https://issues.apache.org/jira/browse/KAFKA-4214
 Project: Kafka
  Issue Type: Bug
Reporter: Apurva Mehta
Assignee: Apurva Mehta


Due to KAFKA-4204, we never realized that the existing system test for 
reassignment would always fail when brokers were bounced mid-process. 

In particular, we see errors like this in the logs when the brokers are 
bounced: 

{noformat}
Status of partition reassignment:
ERROR: Assigned replicas (1,2) don't match the list of replicas for 
reassignment (1) for partition [test_topic,2]
Reassignment of partition [test_topic,1] completed successfully
Reassignment of partition [test_topic,2] failed
Reassignment of partition [test_topic,3] completed successfully
Reassignment of partition [test_topic,0] is still in progress
{noformat}

Currently, the tests which bounce brokers during reassignment are disabled 
until this bug is fixed.





Build failed in Jenkins: kafka-trunk-jdk7 #1565

2016-09-23 Thread Apache Jenkins Server
See 

Changes:

[jason] KAFKA-3590; Handle not-enough-replicas errors when writing to offsets

--
[...truncated 11978 lines...]
:94:
 method fetchTopicMetadata in object ClientUtils is deprecated: This method has 
been deprecated and will be removed in a future release.
fetchTopicMetadata(topics, brokers, producerConfig, correlationId)
^
:393:
 constructor UpdateMetadataRequest in class UpdateMetadataRequest is 
deprecated: see corresponding Javadoc for more information.
new UpdateMetadataRequest(controllerId, controllerEpoch, 
liveBrokers.asJava, partitionStates.asJava)
^
:191:
 object ProducerRequestStatsRegistry in package producer is deprecated: This 
object has been deprecated and will be removed in a future release.
ProducerRequestStatsRegistry.removeProducerRequestStats(clientId)
^
:129:
 method readFromReadableChannel in class NetworkReceive is deprecated: see 
corresponding Javadoc for more information.
  response.readFromReadableChannel(channel)
   ^
:311:
 value timestamp in class PartitionData is deprecated: see corresponding 
Javadoc for more information.
  if (partitionData.timestamp == 
OffsetCommitRequest.DEFAULT_TIMESTAMP)
^
:314:
 value timestamp in class PartitionData is deprecated: see corresponding 
Javadoc for more information.
offsetRetention + partitionData.timestamp
^
:553:
 method offsetData in class ListOffsetRequest is deprecated: see corresponding 
Javadoc for more information.
val (authorizedRequestInfo, unauthorizedRequestInfo) = 
offsetRequest.offsetData.asScala.partition {
 ^
:553:
 class PartitionData in object ListOffsetRequest is deprecated: see 
corresponding Javadoc for more information.
val (authorizedRequestInfo, unauthorizedRequestInfo) = 
offsetRequest.offsetData.asScala.partition {

  ^
:558:
 constructor PartitionData in class PartitionData is deprecated: see 
corresponding Javadoc for more information.
  new PartitionData(Errors.TOPIC_AUTHORIZATION_FAILED.code, 
List[JLong]().asJava)
  ^
:583:
 constructor PartitionData in class PartitionData is deprecated: see 
corresponding Javadoc for more information.
(topicPartition, new ListOffsetResponse.PartitionData(Errors.NONE.code, 
offsets.map(new JLong(_)).asJava))
 ^
:590:
 constructor PartitionData in class PartitionData is deprecated: see 
corresponding Javadoc for more information.
  (topicPartition, new 
ListOffsetResponse.PartitionData(Errors.forException(e).code, 
List[JLong]().asJava))
   ^
:593:
 constructor PartitionData in class PartitionData is deprecated: see 
corresponding Javadoc for more information.
  (topicPartition, new 
ListOffsetResponse.PartitionData(Errors.forException(e).code, 
List[JLong]().asJava))
   ^
:270:
 class PartitionData in object ListOffsetRequest is deprecated: see 
corresponding Javadoc for more information.
val partitions = Map(topicPartition -> new 
ListOffsetRequest.PartitionData(earliestOrLatest, 1))
 ^

[jira] [Assigned] (KAFKA-4211) Change system tests to use the new consumer by default

2016-09-23 Thread Vahid Hashemian (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vahid Hashemian reassigned KAFKA-4211:
--

Assignee: Vahid Hashemian

> Change system tests to use the new consumer by default
> --
>
> Key: KAFKA-4211
> URL: https://issues.apache.org/jira/browse/KAFKA-4211
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
>Assignee: Vahid Hashemian
> Fix For: 0.10.2.0
>
>
> We have utility methods like `run_produce_consume_validate` that use the old 
> consumer by default. We should change them to use the new consumer by default 
> while ensuring that we still have coverage for the old consumer (while we 
> still support it).
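
As a sketch of the proposed change, the default could simply be flipped in the 
utility method while leaving an explicit opt-out for old-consumer coverage. 
Only run_produce_consume_validate is named in the ticket; the new_consumer 
parameter below is a hypothetical illustration:

{code}
# Hypothetical sketch only: `new_consumer` is an assumed parameter name;
# only run_produce_consume_validate itself comes from the ticket.
class ProduceConsumeValidateTest(object):
    def run_produce_consume_validate(self, new_consumer=True):
        # Flipping the default exercises the new (Java) consumer everywhere,
        # while tests covering the old consumer pass new_consumer=False
        # until it is no longer supported.
        return "new" if new_consumer else "old"

test = ProduceConsumeValidateTest()
assert test.run_produce_consume_validate() == "new"            # new default
assert test.run_produce_consume_validate(new_consumer=False) == "old"
{code}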





[GitHub] kafka pull request #1904: KAFKA-4213: First set of system tests for replicat...

2016-09-23 Thread apurvam
GitHub user apurvam opened a pull request:

https://github.com/apache/kafka/pull/1904

KAFKA-4213: First set of system tests for replication throttling, KIP-73.

This patch also fixes the following:

  1. KafkaService.verify_reassign_partitions did not check whether
partition reassignment actually completed successfully (KAFKA-4204).
This patch works around those shortcomings so that we get the right
signal from this method.

  2. ProduceConsumeValidateTest.annotate_missing_messages would call
`pop' on the list of missing messages, causing downstream methods to get
incomplete data. We fix that in this patch as well.
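
Point 2 is easy to illustrate in isolation; a minimal sketch of the bug and 
the fix (illustrative, not the actual test code):

{code}
# Sketch of the bug in point 2 (illustrative, not the actual test code).
missing = list(range(100))

# Buggy: pop() mutates the list, so after enumerating the first 20 messages
# every downstream count is off by 20.
first_20 = [missing.pop(0) for _ in range(20)]
assert len(missing) == 80      # 20 entries silently disappeared

# Fixed: slicing leaves the list intact for downstream methods.
missing = list(range(100))
first_20 = missing[:20]
assert len(missing) == 100
{code}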

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apurvam/kafka throttling-tests

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1904.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1904


commit fe4a0b1070f25e687fb8075210da9c5a356fa1c8
Author: Apurva Mehta 
Date:   2016-09-23T23:41:02Z

Initial commit of system tests for replication throttling, KIP-73.

This patch also fixes the following:

  1. KafkaService.verify_reassign_partitions did not check whether
partition reassignment actually completed successfully (KAFKA-4204).
This patch works around those shortcomings so that we get the right
signal from this method.

  2. ProduceConsumeValidateTest.annotate_missing_messages would call
`pop' on the list of missing messages, causing downstream methods to get
incomplete data. We fix that in this patch as well.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4213) Add system tests for replication throttling (KIP-73)

2016-09-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517913#comment-15517913
 ] 

ASF GitHub Bot commented on KAFKA-4213:
---

GitHub user apurvam opened a pull request:

https://github.com/apache/kafka/pull/1904

KAFKA-4213: First set of system tests for replication throttling, KIP-73.

This patch also fixes the following:

  1. KafkaService.verify_reassign_partitions did not check whether
partition reassignment actually completed successfully (KAFKA-4204).
This patch works around those shortcomings so that we get the right
signal from this method.

  2. ProduceConsumeValidateTest.annotate_missing_messages would call
`pop' on the list of missing messages, causing downstream methods to get
incomplete data. We fix that in this patch as well.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apurvam/kafka throttling-tests

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1904.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1904


commit fe4a0b1070f25e687fb8075210da9c5a356fa1c8
Author: Apurva Mehta 
Date:   2016-09-23T23:41:02Z

Initial commit of system tests for replication throttling, KIP-73.

This patch also fixes the following:

  1. KafkaService.verify_reassign_partitions did not check whether
partition reassignment actually completed successfully (KAFKA-4204).
This patch works around those shortcomings so that we get the right
signal from this method.

  2. ProduceConsumeValidateTest.annotate_missing_messages would call
`pop' on the list of missing messages, causing downstream methods to get
incomplete data. We fix that in this patch as well.




> Add system tests for replication throttling (KIP-73)
> 
>
> Key: KAFKA-4213
> URL: https://issues.apache.org/jira/browse/KAFKA-4213
> Project: Kafka
>  Issue Type: Test
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>
> Add system tests for replication throttling. The two main things to test are: 
> 1. kafka-reassign-partitions: in this use case, a new broker is added to a 
> cluster, and we are testing throttling of the partitions being replicated to 
> this broker. The '--throttle' option of the reassign-partitions tool is what 
> we want to test. We will invoke the tool with this option, and assert that 
> the replication takes a minimum amount of time, based on the throttle and the 
> amount of data being replicated.
> 2. kafka-configs: in this use case, we lost a broker in an existing cluster 
> for whatever reason, and want to re-replicate data to it from some point in 
> time. We want this re-replicated data to be throttled. Again, we will check 
> that the re-replication took at least a certain amount of time, based on the 
> value of the throttle and the amount of data being replicated.





[jira] [Commented] (KAFKA-4213) Add system tests for replication throttling (KIP-73)

2016-09-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517878#comment-15517878
 ] 

ASF GitHub Bot commented on KAFKA-4213:
---

Github user apurvam closed the pull request at:

https://github.com/apache/kafka/pull/1903


> Add system tests for replication throttling (KIP-73)
> 
>
> Key: KAFKA-4213
> URL: https://issues.apache.org/jira/browse/KAFKA-4213
> Project: Kafka
>  Issue Type: Test
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>
> Add system tests for replication throttling. The two main things to test are: 
> 1. kafka-reassign-partitions: in this use case, a new broker is added to a 
> cluster, and we are testing throttling of the partitions being replicated to 
> this broker. The '--throttle' option of the reassign-partitions tool is what 
> we want to test. We will invoke the tool with this option, and assert that 
> the replication takes a minimum amount of time, based on the throttle and the 
> amount of data being replicated.
> 2. kafka-configs: in this use case, we lost a broker in an existing cluster 
> for whatever reason, and want to re-replicate data to it from some point in 
> time. We want this re-replicated data to be throttled. Again, we will check 
> that the re-replication took at least a certain amount of time, based on the 
> value of the throttle and the amount of data being replicated.





[GitHub] kafka pull request #1903: KAFKA-4213: First set of system tests for replicat...

2016-09-23 Thread apurvam
Github user apurvam closed the pull request at:

https://github.com/apache/kafka/pull/1903




[jira] [Commented] (KAFKA-4213) Add system tests for replication throttling (KIP-73)

2016-09-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517875#comment-15517875
 ] 

ASF GitHub Bot commented on KAFKA-4213:
---

GitHub user apurvam opened a pull request:

https://github.com/apache/kafka/pull/1903

KAFKA-4213: First set of system tests for replication throttling 

Added the first set of system tests for replication quotas. These tests 
validate throttling behavior during partition reassignment. 

Along with this patch are fixes to the test framework which include:

1. KafkaService.verify_replica_reassignment: this method was a no-op and 
would always return success, as explained in KAFKA-4204. This patch adds a 
workaround to the problems mentioned there, by grepping correctly for the 
success, failure, and 'in progress' states of partition reassignment (see 
the sketch below).
2. ProduceConsumeValidateTest.annotate_missing_messages would call 
missing.pop() to enumerate the first 20 missing messages. This meant that all 
future counts of what was actually missing would be off by 20, giving the 
false impression of data loss.
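
A minimal sketch of the grepping in point 1, based on the --verify output 
format shown in KAFKA-4214 (illustrative, not the actual KafkaService code):

{code}
# Sketch: classify each per-partition line of the --verify output into the
# three states named in point 1. Based on the output format quoted in
# KAFKA-4214; not the actual KafkaService implementation.
def classify_reassignment(verify_output):
    states = {"completed": 0, "failed": 0, "in_progress": 0}
    for line in verify_output.splitlines():
        if "completed successfully" in line:
            states["completed"] += 1
        elif "failed" in line or line.startswith("ERROR"):
            states["failed"] += 1
        elif "is still in progress" in line:
            states["in_progress"] += 1
    return states

out = """Reassignment of partition [test_topic,1] completed successfully
Reassignment of partition [test_topic,2] failed
Reassignment of partition [test_topic,0] is still in progress"""
assert classify_reassignment(out) == {"completed": 1, "failed": 1,
                                      "in_progress": 1}
{code}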



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apurvam/kafka throttling-tests

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1903.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1903


commit 372db72c2da55dd2aba70019b258429855832804
Author: Apurva Mehta 
Date:   2016-09-09T18:53:42Z

Merge remote-tracking branch 'apache/trunk' into trunk

commit ae912d444d3fb63c2e5487f88949408e0b1207e9
Author: Jason Gustafson 
Date:   2016-09-09T20:44:55Z

KAFKA-3807; Fix transient test failure caused by race on future completion

Author: Jason Gustafson 

Reviewers: Dan Norwood , Ismael Juma 


Closes #1821 from hachikuji/KAFKA-3807

commit d0a86ffdec330f6e7213a370287a2d81bb93e2bc
Author: Vahid Hashemian 
Date:   2016-09-10T07:16:23Z

KAFKA-4145; Avoid redundant integration testing in ProducerSendTests

Author: Vahid Hashemian 

Reviewers: Ismael Juma 

Closes #1842 from vahidhashemian/KAFKA-4145

commit 42b5583561895e308063ed9e2186d83c83ca35d8
Author: Jason Gustafson 
Date:   2016-09-11T07:46:20Z

KAFKA-4147; Fix transient failure in 
ConsumerCoordinatorTest.testAutoCommitDynamicAssignment

Author: Jason Gustafson 

Reviewers: Ismael Juma 

Closes #1841 from hachikuji/KAFKA-4147

commit e7697ad0ab0f292ad1e29d9a159d113574bfcf67
Author: Eric Wasserman 
Date:   2016-09-12T01:45:05Z

KAFKA-1981; Make log compaction point configurable

Now uses LogSegment.largestTimestamp to determine age of segment's messages.

Author: Eric Wasserman 

Reviewers: Jun Rao 

Closes #1794 from ewasserman/feat-1981

commit b36034eaa4eb284fafddb1a7507a2cf187993e62
Author: Damian Guy 
Date:   2016-09-12T04:00:32Z

MINOR: catch InvalidStateStoreException in QueryableStateIntegrationTest

A couple of the tests may transiently fail in QueryableStateIntegrationTest 
as they are not catching InvalidStateStoreException. This exception is expected 
during rebalance.

Author: Damian Guy 

Reviewers: Eno Thereska, Guozhang Wang

Closes #1840 from dguy/minor-fix

commit 642b709f919a02379f9d0c9313586b02d179ca78
Author: Tim Brooks 
Date:   2016-09-13T03:28:01Z

KAFKA-2311; Make KafkaConsumer's ensureNotClosed method thread-safe

Here is the patch on github ijuma.

Acquiring the consumer lock (the single-thread access control) requires 
that the consumer be open. I changed the closed variable to be volatile so that 
another thread's writes will be visible to the reading thread.

There was also a check of whether the consumer was closed after the lock 
was acquired. This check is no longer necessary.

This is my original work and I license it to the project under the 
project's open source license.

Author: Tim Brooks 

Reviewers: Jason Gustafson 

Closes #1637 from tbrooks8/KAFKA-2311

commit ca539df5887bdfdbe86ba45f5514ed54b3b648d4
Author: Dong Lin 
Date:   2016-09-14T00:33:54Z

KAFKA-4158; Reset quota to default value if quota override is deleted

Author: Dong Lin 

Reviewers: Joel Koshy 

[jira] [Work started] (KAFKA-4213) Add system tests for replication throttling (KIP-73)

2016-09-23 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-4213 started by Apurva Mehta.
---
> Add system tests for replication throttling (KIP-73)
> 
>
> Key: KAFKA-4213
> URL: https://issues.apache.org/jira/browse/KAFKA-4213
> Project: Kafka
>  Issue Type: Test
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>
> Add system tests for replication throttling. The two main things to test are: 
> 1. kafka-reassign-partitions: in this use case, a new broker is added to a 
> cluster, and we are testing throttling of the partitions being replicated to 
> this broker. The '--throttle' option of the reassign-partitions tool is what 
> we want to test. We will invoke the tool with this option, and assert that 
> the replication takes a minimum amount of time, based on the throttle and the 
> amount of data being replicated.
> 2. kafka-configs: in this use case, we lost a broker in an existing cluster 
> for whatever reason, and want to re-replicate data to it from some point in 
> time. We want this re-replicated data to be throttled. Again, we will check 
> that the re-replication took at least a certain amount of time, based on the 
> value of the throttle and the amount of data being replicated.





[GitHub] kafka pull request #1903: KAFKA-4213: First set of system tests for replicat...

2016-09-23 Thread apurvam
GitHub user apurvam opened a pull request:

https://github.com/apache/kafka/pull/1903

KAFKA-4213: First set of system tests for replication throttling 

Added the first set of system tests for replication quotas. These tests 
validate throttling behavior during partition reassignment. 

Along with this patch are fixes to the test framework which include:

1. KafkaService.verify_replica_reassignment: this method was a no-op and 
would always return success, as explained in KAFKA-4204. This patch adds a 
workaround to the problems mentioned there, by grepping correctly for the 
success, failure, and 'in progress' states of partition reassignment.
2. ProduceConsumeValidateTest.annotate_missing_messages would call 
missing.pop() to enumerate the first 20 missing messages. This meant that all 
future counts of what was actually missing would be off by 20, giving the 
false impression of data loss.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apurvam/kafka throttling-tests

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1903.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1903


commit 372db72c2da55dd2aba70019b258429855832804
Author: Apurva Mehta 
Date:   2016-09-09T18:53:42Z

Merge remote-tracking branch 'apache/trunk' into trunk

commit ae912d444d3fb63c2e5487f88949408e0b1207e9
Author: Jason Gustafson 
Date:   2016-09-09T20:44:55Z

KAFKA-3807; Fix transient test failure caused by race on future completion

Author: Jason Gustafson 

Reviewers: Dan Norwood , Ismael Juma 


Closes #1821 from hachikuji/KAFKA-3807

commit d0a86ffdec330f6e7213a370287a2d81bb93e2bc
Author: Vahid Hashemian 
Date:   2016-09-10T07:16:23Z

KAFKA-4145; Avoid redundant integration testing in ProducerSendTests

Author: Vahid Hashemian 

Reviewers: Ismael Juma 

Closes #1842 from vahidhashemian/KAFKA-4145

commit 42b5583561895e308063ed9e2186d83c83ca35d8
Author: Jason Gustafson 
Date:   2016-09-11T07:46:20Z

KAFKA-4147; Fix transient failure in 
ConsumerCoordinatorTest.testAutoCommitDynamicAssignment

Author: Jason Gustafson 

Reviewers: Ismael Juma 

Closes #1841 from hachikuji/KAFKA-4147

commit e7697ad0ab0f292ad1e29d9a159d113574bfcf67
Author: Eric Wasserman 
Date:   2016-09-12T01:45:05Z

KAFKA-1981; Make log compaction point configurable

Now uses LogSegment.largestTimestamp to determine age of segment's messages.

Author: Eric Wasserman 

Reviewers: Jun Rao 

Closes #1794 from ewasserman/feat-1981

commit b36034eaa4eb284fafddb1a7507a2cf187993e62
Author: Damian Guy 
Date:   2016-09-12T04:00:32Z

MINOR: catch InvalidStateStoreException in QueryableStateIntegrationTest

A couple of the tests may transiently fail in QueryableStateIntegrationTest 
as they are not catching InvalidStateStoreException. This exception is expected 
during rebalance.

Author: Damian Guy 

Reviewers: Eno Thereska, Guozhang Wang

Closes #1840 from dguy/minor-fix

commit 642b709f919a02379f9d0c9313586b02d179ca78
Author: Tim Brooks 
Date:   2016-09-13T03:28:01Z

KAFKA-2311; Make KafkaConsumer's ensureNotClosed method thread-safe

Here is the patch on github ijuma.

Acquiring the consumer lock (the single-thread access control) requires 
that the consumer be open. I changed the closed variable to be volatile so that 
another thread's writes will be visible to the reading thread.

There was also a check of whether the consumer was closed after the lock 
was acquired. This check is no longer necessary.

This is my original work and I license it to the project under the 
project's open source license.

Author: Tim Brooks 

Reviewers: Jason Gustafson 

Closes #1637 from tbrooks8/KAFKA-2311

commit ca539df5887bdfdbe86ba45f5514ed54b3b648d4
Author: Dong Lin 
Date:   2016-09-14T00:33:54Z

KAFKA-4158; Reset quota to default value if quota override is deleted

Author: Dong Lin 

Reviewers: Joel Koshy , Jiangjie Qin 


Closes #1851 from lindong28/KAFKA-4158

commit ba712d29eb2880fbf1709b5d0921028735a09f68
Author: Ismael Juma 
Date:   2016-09-14T16:16:29Z

MINOR: 

[jira] [Created] (KAFKA-4213) Add system tests for replication throttling (KIP-73)

2016-09-23 Thread Apurva Mehta (JIRA)
Apurva Mehta created KAFKA-4213:
---

 Summary: Add system tests for replication throttling (KIP-73)
 Key: KAFKA-4213
 URL: https://issues.apache.org/jira/browse/KAFKA-4213
 Project: Kafka
  Issue Type: Test
Reporter: Apurva Mehta
Assignee: Apurva Mehta


Add system tests for replication throttling. The two main things to test are: 

1. kafka-reassign-partitions: in this use case, a new broker is added to a 
cluster, and we are testing throttling of the partitions being replicated to 
this broker. The '--throttle' option of the reassign-partitions tool is what 
we want to test. We will invoke the tool with this option, and assert that the 
replication takes a minimum amount of time, based on the throttle and the 
amount of data being replicated.

2. kafka-configs: in this use case, we lost a broker in an existing cluster for 
whatever reason, and want to re-replicate data to it from some point in time. 
We want this re-replicated data to be throttled. Again, we will check that the 
re-replication took at least a certain amount of time, based on the value of 
the throttle and the amount of data being replicated.
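
In both cases the assertion reduces to simple arithmetic: with a throttle of R 
bytes/sec and D bytes to move, replication cannot complete in less than D/R 
seconds. A sketch of that check (names are illustrative):

{code}
# Sketch of the timing assertion: reassignment under a throttle of
# `throttle_bps` bytes/sec moving `data_bytes` of data cannot finish in
# less than data_bytes / throttle_bps seconds. Names are illustrative.
def assert_throttled(data_bytes, throttle_bps, observed_seconds):
    min_seconds = data_bytes / float(throttle_bps)
    assert observed_seconds >= min_seconds, (
        "reassignment took %.1fs but the throttle implies at least %.1fs"
        % (observed_seconds, min_seconds))

# e.g. 1 GB moved under a 10 MB/s throttle must take at least ~100 seconds.
assert_throttled(data_bytes=10**9, throttle_bps=10**7, observed_seconds=104.2)
{code}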





[jira] [Updated] (KAFKA-1313) Support adding replicas to existing topic partitions

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-1313:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Support adding replicas to existing topic partitions
> 
>
> Key: KAFKA-1313
> URL: https://issues.apache.org/jira/browse/KAFKA-1313
> Project: Kafka
>  Issue Type: New Feature
>  Components: tools
>Affects Versions: 0.8.0
>Reporter: Marc Labbe
>Assignee: Geoff Anderson
>Priority: Critical
>  Labels: newbie++
> Fix For: 0.10.2.0
>
>
> There is currently no easy way to add replicas to existing topic 
> partitions.
> For example, topic create-test has been created with ReplicationFactor=1: 
> Topic:create-test  PartitionCount:3  ReplicationFactor:1  Configs:
> Topic: create-test  Partition: 0  Leader: 1  Replicas: 1  Isr: 1
> Topic: create-test  Partition: 1  Leader: 2  Replicas: 2  Isr: 2
> Topic: create-test  Partition: 2  Leader: 3  Replicas: 3  Isr: 3
> I would like to increase the ReplicationFactor to 2 (or more) so it shows up 
> like this instead:
> Topic:create-test  PartitionCount:3  ReplicationFactor:2  Configs:
> Topic: create-test  Partition: 0  Leader: 1  Replicas: 1,2  Isr: 1,2
> Topic: create-test  Partition: 1  Leader: 2  Replicas: 2,3  Isr: 2,3
> Topic: create-test  Partition: 2  Leader: 3  Replicas: 3,1  Isr: 3,1
> Use cases for this:
> - adding brokers and thus increasing fault tolerance
> - fixing human errors for topics created with wrong values
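
In the meantime, replication factor can be raised by feeding a manual plan to 
kafka-reassign-partitions.sh. A sketch of generating the plan for the example 
above:

{code}
# Sketch: build the reassignment plan that takes create-test from the
# RF=1 layout to the RF=2 layout shown above, for use with
# kafka-reassign-partitions.sh --execute --reassignment-json-file.
import json

plan = {
    "version": 1,
    "partitions": [
        {"topic": "create-test", "partition": 0, "replicas": [1, 2]},
        {"topic": "create-test", "partition": 1, "replicas": [2, 3]},
        {"topic": "create-test", "partition": 2, "replicas": [3, 1]},
    ],
}
with open("increase-rf.json", "w") as f:
    json.dump(plan, f, indent=2)
{code}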





[jira] [Updated] (KAFKA-1044) change log4j to slf4j

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-1044:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> change log4j to slf4j 
> --
>
> Key: KAFKA-1044
> URL: https://issues.apache.org/jira/browse/KAFKA-1044
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.8.0
>Reporter: shijinkui
> Fix For: 0.10.2.0
>
>
> Can you change log4j to slf4j? In my project I use logback, which conflicts 
> with log4j.





[jira] [Updated] (KAFKA-3462) Allow SinkTasks to disable consumer offset commit

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-3462:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Allow SinkTasks to disable consumer offset commit 
> --
>
> Key: KAFKA-3462
> URL: https://issues.apache.org/jira/browse/KAFKA-3462
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.1.0
>Reporter: Liquan Pei
>Assignee: Liquan Pei
>Priority: Minor
> Fix For: 0.10.2.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> SinkTasks should be able to disable consumer offset commit if they manage 
> offsets in the sink data store rather than using Kafka consumer offsets. For 
> example, an HDFS connector might record offsets in HDFS to provide 
> exactly-once delivery. When the SinkTask is started or a rebalance occurs, the 
> task would reload offsets from HDFS. In this case, disabling consumer offset 
> commit will save some CPU cycles and network I/O.





[jira] [Updated] (KAFKA-3987) Allow configuration of the hash algorithm used by the LogCleaner's offset map

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-3987:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Allow configuration of the hash algorithm used by the LogCleaner's offset map
> -
>
> Key: KAFKA-3987
> URL: https://issues.apache.org/jira/browse/KAFKA-3987
> Project: Kafka
>  Issue Type: Improvement
>  Components: config
>Reporter: Luciano Afranllie
>Priority: Minor
> Fix For: 0.10.2.0
>
>
> In order to be able to do deployments of Kafka that are FIPS 140-2 
> (https://en.wikipedia.org/wiki/FIPS_140-2) compliant, one of the requirements 
> is not to use MD5.
> Kafka uses MD5 to hash message keys in the offset map (SkimpyOffsetMap) 
> used by the log cleaner.
> The idea is to make this hash algorithm configurable to something allowed 
> by FIPS using a new configuration property.
> The property could be named "log.cleaner.hash.algorithm" with a default value 
> of "MD5"; the idea is to use it in the constructor of CleanerConfig.





[jira] [Updated] (KAFKA-2273) Add rebalance with a minimal number of reassignments to server-defined strategy list

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-2273:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Add rebalance with a minimal number of reassignments to server-defined 
> strategy list
> 
>
> Key: KAFKA-2273
> URL: https://issues.apache.org/jira/browse/KAFKA-2273
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Olof Johansson
>Assignee: Vahid Hashemian
>  Labels: newbie++, newbiee
> Fix For: 0.10.2.0
>
>
> Add a new partitions.assignment.strategy to the server-defined list that will 
> do reassignments based on moving as few partitions as possible. This should 
> be quite a common reassignment strategy, especially for cases where the 
> consumer has to maintain state, either in memory or on disk.
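
A minimal sketch of the idea (illustrative only, not the actual strategy): keep 
each partition with its previous owner when possible, and move only the 
partitions whose owner is gone or overloaded:

{code}
# Sketch of minimal-movement rebalancing (illustrative only): a partition
# stays with its previous owner if that consumer is still present and not
# over target capacity; only the remainder is moved.
def rebalance(partitions, consumers, previous):
    assignment = {c: [] for c in consumers}
    target = -(-len(partitions) // len(consumers))   # ceiling division
    unplaced = []
    for p in partitions:
        owner = previous.get(p)
        if owner in assignment and len(assignment[owner]) < target:
            assignment[owner].append(p)              # no movement
        else:
            unplaced.append(p)
    for p in unplaced:                               # only these move
        least = min(consumers, key=lambda c: len(assignment[c]))
        assignment[least].append(p)
    return assignment

previous = {"p0": "c1", "p1": "c1", "p2": "c2", "p3": "c3"}  # c3 has left
result = rebalance(["p0", "p1", "p2", "p3"], ["c1", "c2"], previous)
assert result == {"c1": ["p0", "p1"], "c2": ["p2", "p3"]}    # only p3 moved
{code}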





[jira] [Updated] (KAFKA-2617) Move protocol field default values to Protocol

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-2617:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Move protocol field default values to Protocol
> --
>
> Key: KAFKA-2617
> URL: https://issues.apache.org/jira/browse/KAFKA-2617
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Jakub Nowak
>  Labels: newbie
> Fix For: 0.10.2.0
>
>
> Right now the default values are scattered in the Request / Response classes, 
> and some duplicates already exist, such as JoinGroupRequest.UNKNOWN_CONSUMER_ID 
> and OffsetCommitRequest.DEFAULT_CONSUMER_ID. We would like to move all 
> default values into org.apache.kafka.common.protocol.Protocol since 
> org.apache.kafka.common.requests depends on org.apache.kafka.common.protocol 
> anyway.





[jira] [Updated] (KAFKA-2145) An option to add topic owners.

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-2145:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> An option to add topic owners. 
> ---
>
> Key: KAFKA-2145
> URL: https://issues.apache.org/jira/browse/KAFKA-2145
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Parth Brahmbhatt
>Assignee: Parth Brahmbhatt
> Fix For: 0.10.2.0
>
>
> We need to expose a way for users to identify the users/groups that share 
> ownership of a topic. We discussed adding this as part of 
> https://issues.apache.org/jira/browse/KAFKA-2035 and agreed that it will be 
> simpler to add owner as a log config. 
> The owner field can be used for auditing and also by the authorization layer 
> to grant access without having to explicitly configure ACLs. 





[jira] [Updated] (KAFKA-1207) Launch Kafka from within Apache Mesos

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-1207:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Launch Kafka from within Apache Mesos
> -
>
> Key: KAFKA-1207
> URL: https://issues.apache.org/jira/browse/KAFKA-1207
> Project: Kafka
>  Issue Type: Bug
>Reporter: Joe Stein
>  Labels: mesos
> Fix For: 0.10.2.0
>
> Attachments: KAFKA-1207.patch, KAFKA-1207_2014-01-19_00:04:58.patch, 
> KAFKA-1207_2014-01-19_00:48:49.patch
>
>
> There are a few components to this.
> 1) The Framework: This is going to be responsible for starting up and 
> managing the failover of brokers within the Mesos cluster. This will have 
> to take some Kafka-focused parameters for launching new replica brokers and 
> moving topics and partitions around based on what is happening in the grid 
> through time.
> 2) The Scheduler: This is what is going to ask for resources for Kafka 
> brokers (new ones, replacement ones, commissioned ones) and other operations 
> such as stopping tasks (decommissioning brokers). I think this should also 
> expose a user interface (or at least a REST API) for producers and consumers 
> so we can have producers and consumers run inside of the Mesos cluster if 
> folks want (just add the jar).
> 3) The Executor: This is the task launcher. It launches tasks and kills them 
> off.
> 4) Sharing data between Scheduler and Executor: I looked at a few 
> implementations of this. I like parts of the Storm implementation but think 
> using the environment variable 
> ExecutorInfo.CommandInfo.Environment.Variables[] is the best shot. We can 
> have a command line bin/kafka-mesos-scheduler-start.sh that would build the 
> contrib project if not already built and support conf/server.properties to 
> start.
> The Framework and operating Scheduler would run on an administrative node. 
> I am probably going to hook Apache Curator into it so it can do its own 
> failover to another follower. Running more than 2 should be sufficient as 
> long as it can bring back its state (e.g. from zk). I think we can add this 
> in later once everything is working.
> Additional detail can be found on the Wiki page 
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=38570672





[jira] [Updated] (KAFKA-1894) Avoid long or infinite blocking in the consumer

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-1894:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Avoid long or infinite blocking in the consumer
> ---
>
> Key: KAFKA-1894
> URL: https://issues.apache.org/jira/browse/KAFKA-1894
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Jay Kreps
>Assignee: Jason Gustafson
> Fix For: 0.10.2.0
>
>
> The new consumer has a lot of loops that look something like
> {code}
>   while (!isThingComplete())
>     client.poll();
> {code}
> This occurs both in KafkaConsumer but also in NetworkClient.completeAll. 
> These retry loops are actually mostly the behavior we want but there are 
> several cases where they may cause problems:
>  - In the case of a hard failure we may hang for a long time or indefinitely 
> before realizing the connection is lost.
>  - In the case where the cluster is malfunctioning or down we may retry 
> forever.
> It would probably be better to give a timeout to these. The proposed approach 
> would be to add something like retry.time.ms=6 and only continue retrying 
> for that period of time.
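
A generic sketch of the bounded version of the loop above (plain Python rather 
than the actual consumer/NetworkClient code):

{code}
# Sketch of bounding the retry loop quoted above: keep polling, but give
# up once the configured retry time has elapsed. Generic Python, not the
# actual client code.
import time

def poll_until(is_complete, poll, retry_time_ms):
    deadline = time.time() + retry_time_ms / 1000.0
    while not is_complete():
        if time.time() >= deadline:
            raise TimeoutError("gave up after %d ms" % retry_time_ms)
        poll()

state = {"polls": 0}
poll_until(lambda: state["polls"] >= 3,
           lambda: state.update(polls=state["polls"] + 1),
           retry_time_ms=1000)
assert state["polls"] == 3
{code}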





[jira] [Updated] (KAFKA-3281) Improve message of stop scripts when no processes are running

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-3281:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Improve message of stop scripts when no processes are running
> -
>
> Key: KAFKA-3281
> URL: https://issues.apache.org/jira/browse/KAFKA-3281
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.10.0.0
>Reporter: Sasaki Toru
>Assignee: Sasaki Toru
>Priority: Minor
> Fix For: 0.10.1.0
>
>
> Stop scripts such as kafka-server-stop.sh print the kill command's error 
> message when no processes are running.
> Example (brokers are not running):
> {code}
> $ bin/kafka-server-stop.sh 
> kill: invalid argument S
> Usage:
>  kill [options] <pid> [...]
> Options:
>  <pid> [...]            send signal to every <pid> listed
>  -<signal>, -s, --signal <signal>
>                         specify the <signal> to be sent
>  -l, --list=[<signal>]  list all signal names, or convert one to a name
>  -L, --table            list all signal names in a nice table
>  -h, --help             display this help and exit
>  -V, --version          output version information and exit
> For more details see kill(1).
> {code}





[jira] [Updated] (KAFKA-2955) Add Prompt to kafka-console-producer

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-2955:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Add Prompt to kafka-console-producer
> 
>
> Key: KAFKA-2955
> URL: https://issues.apache.org/jira/browse/KAFKA-2955
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Affects Versions: 0.9.0.0
>Reporter: Jesse Anderson
>Assignee: Manikumar Reddy
> Fix For: 0.10.2.0
>
>
> A common source of confusion for people using the kafka-console-producer is a 
> lack of prompt. People think that kafka-console-producer is still starting up 
> or connecting. Adding a ">" prompt to show that the kafka-console-producer is 
> ready will fix that.





[jira] [Updated] (KAFKA-2500) Expose fetch response high watermark in ConsumerRecords

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-2500:
---
Summary: Expose fetch response high watermark in ConsumerRecords  (was: 
Make logEndOffset available in the 0.8.3 Consumer)

> Expose fetch response high watermark in ConsumerRecords
> ---
>
> Key: KAFKA-2500
> URL: https://issues.apache.org/jira/browse/KAFKA-2500
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Affects Versions: 0.9.0.0
>Reporter: Will Funnell
>Assignee: Jason Gustafson
>Priority: Critical
> Fix For: 0.10.2.0
>
>
> Originally created in the old consumer here: 
> https://issues.apache.org/jira/browse/KAFKA-1977
> The requirement is to create a snapshot from the Kafka topic but NOT do 
> continual reads after that point. For example you might be creating a backup 
> of the data to a file.
> This ticket covers the addition of the functionality to the new consumer.
> In order to achieve that, a recommended solution by Joel Koshy and Jay Kreps 
> was to expose the high watermark, as maxEndOffset, from the FetchResponse 
> object through to each MessageAndMetadata object in order to be aware when 
> the consumer has reached the end of each partition.
> The submitted patch achieves this by adding the maxEndOffset to the 
> PartitionTopicInfo, which is updated when a new message arrives in the 
> ConsumerFetcherThread and then exposed in MessageAndMetadata.
> See here for discussion:
> http://search-hadoop.com/m/4TaT4TpJy71





[jira] [Commented] (KAFKA-2500) Make logEndOffset available in the 0.8.3 Consumer

2016-09-23 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517826#comment-15517826
 ] 

Jason Gustafson commented on KAFKA-2500:


Renaming this ticket to avoid confusion with KIP-79, which exposed an API for 
retrieving the current high watermark. I think this issue is about whether or 
not to expose the high watermark returned from fetch responses in the 
{{ConsumerRecords}} object. One benefit of doing so is that it makes lag 
tracking available to interceptors.

> Make logEndOffset available in the 0.8.3 Consumer
> -
>
> Key: KAFKA-2500
> URL: https://issues.apache.org/jira/browse/KAFKA-2500
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Affects Versions: 0.9.0.0
>Reporter: Will Funnell
>Assignee: Jason Gustafson
>Priority: Critical
> Fix For: 0.10.2.0
>
>
> Originally created in the old consumer here: 
> https://issues.apache.org/jira/browse/KAFKA-1977
> The requirement is to create a snapshot from the Kafka topic but NOT do 
> continual reads after that point. For example you might be creating a backup 
> of the data to a file.
> This ticket covers the addition of the functionality to the new consumer.
> In order to achieve that, a recommended solution by Joel Koshy and Jay Kreps 
> was to expose the high watermark, as maxEndOffset, from the FetchResponse 
> object through to each MessageAndMetadata object in order to be aware when 
> the consumer has reached the end of each partition.
> The submitted patch achieves this by adding the maxEndOffset to the 
> PartitionTopicInfo, which is updated when a new message arrives in the 
> ConsumerFetcherThread and then exposed in MessageAndMetadata.
> See here for discussion:
> http://search-hadoop.com/m/4TaT4TpJy71





[jira] [Updated] (KAFKA-2387) Improve KafkaConsumer API

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-2387:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Improve KafkaConsumer API
> -
>
> Key: KAFKA-2387
> URL: https://issues.apache.org/jira/browse/KAFKA-2387
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jiangjie Qin
> Fix For: 0.10.2.0
>
>
> Currently KafkaConsumer API has several behaviors that are not intuitive or 
> might be a little bit hard to use. This is an umbrella ticket for the 
> improvements of API.





[jira] [Updated] (KAFKA-2500) Expose fetch response high watermark in ConsumerRecords

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-2500:
---
Fix Version/s: 0.10.2.0

> Expose fetch response high watermark in ConsumerRecords
> ---
>
> Key: KAFKA-2500
> URL: https://issues.apache.org/jira/browse/KAFKA-2500
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Affects Versions: 0.9.0.0
>Reporter: Will Funnell
>Assignee: Jason Gustafson
>Priority: Critical
> Fix For: 0.10.2.0
>
>
> Originally created in the old consumer here: 
> https://issues.apache.org/jira/browse/KAFKA-1977
> The requirement is to create a snapshot from the Kafka topic but NOT do 
> continual reads after that point. For example you might be creating a backup 
> of the data to a file.
> This ticket covers the addition of the functionality to the new consumer.
> In order to achieve that, a recommended solution by Joel Koshy and Jay Kreps 
> was to expose the high watermark, as maxEndOffset, from the FetchResponse 
> object through to each MessageAndMetadata object in order to be aware when 
> the consumer has reached the end of each partition.
> The submitted patch achieves this by adding the maxEndOffset to the 
> PartitionTopicInfo, which is updated when a new message arrives in the 
> ConsumerFetcherThread and then exposed in MessageAndMetadata.
> See here for discussion:
> http://search-hadoop.com/m/4TaT4TpJy71



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (KAFKA-3283) Remove beta from new consumer documentation

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson closed KAFKA-3283.
--

> Remove beta from new consumer documentation
> ---
>
> Key: KAFKA-3283
> URL: https://issues.apache.org/jira/browse/KAFKA-3283
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Reporter: Ismael Juma
>Assignee: Ismael Juma
> Fix For: 0.10.1.0
>
>
> Ideally, we would:
> * Remove the beta label
> * Fill any critical gaps in functionality
> * Update the documentation on the old consumers to recommend the new consumer 
> (without deprecating the old consumer, however)
> Current target is 0.10.1.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2435) More optimally balanced partition assignment strategy

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-2435:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> More optimally balanced partition assignment strategy
> -
>
> Key: KAFKA-2435
> URL: https://issues.apache.org/jira/browse/KAFKA-2435
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Andrew Olson
>Assignee: Andrew Olson
> Fix For: 0.10.2.0
>
> Attachments: KAFKA-2435.patch
>
>
> While the roundrobin partition assignment strategy is an improvement over the 
> range strategy, when the consumer topic subscriptions are not identical 
> (previously disallowed but will be possible as of KAFKA-2172) it can produce 
> heavily skewed assignments. As suggested 
> [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
>  it would be nice to have a strategy that attempts to assign an equal number 
> of partitions to each consumer in a group, regardless of how similar their 
> individual topic subscriptions are. We can accomplish this by tracking the 
> number of partitions assigned to each consumer, and having the partition 
> assignment loop assign each partition to a consumer interested in that topic 
> with the least number of partitions assigned. 
> Additionally, we can optimize the distribution fairness by adjusting the 
> partition assignment order:
> * Topics with fewer consumers are assigned first.
> * In the event of a tie for least consumers, the topic with more partitions 
> is assigned first.
> The general idea behind these two rules is to keep the most flexible 
> assignment choices available as long as possible by starting with the most 
> constrained partitions/consumers.
> This JIRA addresses the original high-level consumer. For the new consumer, 
> see KAFKA-3297.
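
A small sketch of the greedy strategy and the two ordering rules above 
(simplified stand-in types, not the actual assignor classes):

{code}
import java.util.*;

class BalancedAssignor {
    // topicSubscribers: topic -> consumers subscribed to it
    // topicPartitions: topic -> number of partitions
    static Map<String, List<String>> assign(Map<String, List<String>> topicSubscribers,
                                            Map<String, Integer> topicPartitions) {
        Map<String, List<String>> assignment = new HashMap<>();
        Map<String, Integer> load = new HashMap<>();
        for (List<String> consumers : topicSubscribers.values())
            for (String c : consumers) {
                assignment.putIfAbsent(c, new ArrayList<>());
                load.putIfAbsent(c, 0);
            }

        // Rule 1: topics with fewer consumers first;
        // Rule 2: on a tie, the topic with more partitions first.
        List<String> topics = new ArrayList<>(topicSubscribers.keySet());
        topics.sort(Comparator
                .comparingInt((String t) -> topicSubscribers.get(t).size())
                .thenComparing(t -> -topicPartitions.get(t)));

        for (String topic : topics) {
            for (int p = 0; p < topicPartitions.get(topic); p++) {
                // assign to the interested consumer with the fewest partitions so far
                String target = Collections.min(topicSubscribers.get(topic),
                        Comparator.comparingInt(load::get));
                assignment.get(target).add(topic + "-" + p);
                load.merge(target, 1, Integer::sum);
            }
        }
        return assignment;
    }
}
{code}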



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3304) KIP-35 - Retrieving protocol version

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-3304:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> KIP-35 - Retrieving protocol version
> 
>
> Key: KAFKA-3304
> URL: https://issues.apache.org/jira/browse/KAFKA-3304
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ashish K Singh
>Assignee: Ashish K Singh
>Priority: Critical
> Fix For: 0.10.2.0
>
>
> Uber JIRA to track the addition of functionality to retrieve protocol versions. 
> More discussion can be found on 
> [KIP-35|https://cwiki.apache.org/confluence/display/KAFKA/KIP-35+-+Retrieving+protocol+version].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3545) Generalized Serdes for List/Map

2016-09-23 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3545:
-
Fix Version/s: (was: 0.10.1.0)

> Generalized Serdes for List/Map
> ---
>
> Key: KAFKA-3545
> URL: https://issues.apache.org/jira/browse/KAFKA-3545
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Greg Fodor
>Priority: Minor
>  Labels: api, newbie
>
> In working with Kafka Streams I've found it's often the case I want to 
> perform a "group by" operation, where I repartition a stream based on a 
> foreign key and then do an aggregation of all the values into a single 
> collection, so the stream becomes one where each entry has a value that is a 
> serialized list of values that belonged to the key. (This seems unrelated to 
> the 'group by' operation talked about in KAFKA-3544.) Basically the same 
> typical group by operation found in systems like Cascading.
> In order to create these intermediate list values I needed to define custom 
> avro schemas that simply wrap the elements of interest into a list. It seems 
> desirable that there be some basic facility for constructing simple Serdes of 
> Lists/Maps/Sets of other types, potentially using avro's serialization under 
> the hood. If this existed in the core library it would also enable the 
> addition of higher level operations on streams that can use these Serdes to 
> perform simple operations like the "group by" example I mention.
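
For illustration, a minimal sketch of such a facility, using plain 
length-prefixed framing rather than Avro (ListSerde is an illustrative name, 
not an existing Kafka class):

{code}
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serializer;

// Serde<List<T>> built on top of an inner Serde<T>.
public class ListSerde<T> implements Serde<List<T>>, Serializer<List<T>>, Deserializer<List<T>> {
    private final Serde<T> inner;

    public ListSerde(Serde<T> inner) { this.inner = inner; }

    @Override
    public byte[] serialize(String topic, List<T> data) {
        if (data == null) return null;
        List<byte[]> encoded = new ArrayList<>();
        int total = 4; // element count
        for (T element : data) {
            byte[] bytes = inner.serializer().serialize(topic, element);
            encoded.add(bytes);
            total += 4 + bytes.length; // length prefix + payload
        }
        ByteBuffer buffer = ByteBuffer.allocate(total);
        buffer.putInt(encoded.size());
        for (byte[] bytes : encoded) {
            buffer.putInt(bytes.length);
            buffer.put(bytes);
        }
        return buffer.array();
    }

    @Override
    public List<T> deserialize(String topic, byte[] bytes) {
        if (bytes == null) return null;
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        int count = buffer.getInt();
        List<T> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte[] element = new byte[buffer.getInt()];
            buffer.get(element);
            result.add(inner.deserializer().deserialize(topic, element));
        }
        return result;
    }

    @Override public Serializer<List<T>> serializer() { return this; }
    @Override public Deserializer<List<T>> deserializer() { return this; }
    @Override public void configure(Map<String, ?> configs, boolean isKey) { }
    @Override public void close() { }
}
{code}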



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1300) Added WaitForReplication admin tool.

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-1300:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Added WaitForReplication admin tool.
> ---
>
> Key: KAFKA-1300
> URL: https://issues.apache.org/jira/browse/KAFKA-1300
> Project: Kafka
>  Issue Type: New Feature
>  Components: tools
>Affects Versions: 0.8.0
> Environment: Ubuntu 12.04
>Reporter: Brenden Matthews
>  Labels: patch
> Fix For: 0.10.2.0
>
> Attachments: 0001-Added-WaitForReplaction-admin-tool.patch
>
>
> I have created a tool similar to the broker shutdown tool for doing rolling 
> restarts of Kafka clusters.
> The tool watches the max replica lag of the specified broker, and waits until 
> the lag drops to 0 before exiting.
> To do a rolling restart, here's the process we use:
> for (broker <- brokers) {
>   run shutdown tool for broker
>   terminate broker
>   start new broker
>   run wait for replication tool on new broker
> }
> Here's an example command line use:
> ./kafka-run-class.sh kafka.admin.WaitForReplication --zookeeper 
> zk.host.com:2181 --num.retries 100 --retry.interval.ms 6 --broker 0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3715) Higher granularity streams metrics

2016-09-23 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3715:
-
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Higher granularity streams metrics 
> ---
>
> Key: KAFKA-3715
> URL: https://issues.apache.org/jira/browse/KAFKA-3715
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Jeff Klukas
>Assignee: aarti gupta
>Priority: Minor
>  Labels: api, newbie
> Fix For: 0.10.2.0
>
>
> Originally proposed by [~guozhang] in 
> https://github.com/apache/kafka/pull/1362#issuecomment-218326690
> We can consider adding metrics for process / punctuate / commit rate at the 
> granularity of each processor node in addition to the global rate mentioned 
> above. This is very helpful in debugging.
> We can consider adding rate / total cumulated metrics for context.forward 
> indicating how many records were forwarded downstream from this processor 
> node as well. This is helpful in debugging.
> We can consider adding metrics for each stream partition's timestamp. This is 
> helpful in debugging.
> Besides the latency metrics, we can also add throughput metrics in terms of 
> source records consumed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3835) Streams is creating two ProducerRecords for each send via RecordCollector

2016-09-23 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3835:
-
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Streams is creating two ProducerRecords for each send via RecordCollector
> -
>
> Key: KAFKA-3835
> URL: https://issues.apache.org/jira/browse/KAFKA-3835
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Damian Guy
>Assignee: Guozhang Wang
>Priority: Minor
> Fix For: 0.10.2.0
>
>
> The RecordCollector.send(..) method below currently receives a 
> ProducerRecord from its caller and then creates another one to forward on to 
> its producer. The creation of two ProducerRecords per send should be 
> eliminated.
> {code}
> public <K, V> void send(ProducerRecord<K, V> record, Serializer<K> keySerializer,
> Serializer<V> valueSerializer,
> StreamPartitioner<K, V> partitioner)
> {code}
> We could replace the above method with
> {code}
> public <K, V> void send(String topic,
> K key,
> V value,
> Integer partition,
> Long timestamp,
> Serializer<K> keySerializer,
> Serializer<V> valueSerializer,
> StreamPartitioner<K, V> partitioner)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1449) Extend wire protocol to allow CRC32C

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-1449:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Extend wire protocol to allow CRC32C
> 
>
> Key: KAFKA-1449
> URL: https://issues.apache.org/jira/browse/KAFKA-1449
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Reporter: Albert Strasheim
>Assignee: Neha Narkhede
> Fix For: 0.10.2.0
>
>
> Howdy
> We are currently building out a number of Kafka consumers in Go, based on a 
> patched version of the Sarama library that Shopify released a while back.
> We have a reasonably fast serialization protocol (Cap'n Proto), a 10G network 
> and lots of cores. We have various consumers computing all kinds of 
> aggregates on a reasonably high volume access log stream (1.1e6 messages/sec 
> peak, about 500-600 bytes per message uncompressed).
> When profiling our consumer, our single hottest function (until we disabled 
> it), was the CRC32 checksum validation, since the deserialization and 
> aggregation in these consumers is pretty cheap.
> We believe things could be improved by extending the wire protocol to support 
> CRC-32C (Castagnoli), since SSE 4.2 has an instruction to accelerate its 
> calculation.
> https://en.wikipedia.org/wiki/SSE4#SSE4.2
> It might be hard to use from Java, but consumers written in most other 
> languages will benefit a lot.
> To give you an idea, here are some benchmarks for the Go CRC32 functions 
> running on a Intel(R) Core(TM) i7-3540M CPU @ 3.00GHz core:
> BenchmarkCrc32KB   90196 ns/op 363.30 MB/s
> BenchmarkCrcCastagnoli32KB 3404 ns/op 9624.42 MB/s
> I believe BenchmarkCrc32 written in C would do about 600-700 MB/sec, and the 
> CRC32-C speed should be close to what one achieves in Go.
> (Met Todd and Clark at the meetup last night. Thanks for the great 
> presentation!)
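
For what it's worth, the JDK later added java.util.zip.CRC32C (JDK 9), which 
implements the Castagnoli polynomial and which HotSpot can accelerate with 
SSE 4.2 intrinsics. A minimal sketch computing both checksums from Java:

{code}
import java.util.zip.CRC32;
import java.util.zip.CRC32C; // JDK 9+
import java.util.zip.Checksum;

public class CrcCompare {
    static long checksum(Checksum crc, byte[] data) {
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] payload = new byte[32 * 1024]; // 32 KB, as in the Go benchmark
        new java.util.Random(42).nextBytes(payload);
        System.out.println("CRC32  : " + Long.toHexString(checksum(new CRC32(), payload)));
        System.out.println("CRC32C : " + Long.toHexString(checksum(new CRC32C(), payload)));
    }
}
{code}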



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3537) Provide access to low-level Metrics in ProcessorContext

2016-09-23 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3537:
-
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Provide access to low-level Metrics in ProcessorContext
> ---
>
> Key: KAFKA-3537
> URL: https://issues.apache.org/jira/browse/KAFKA-3537
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Affects Versions: 0.9.0.1
>Reporter: Michael Coon
>Assignee: Guozhang Wang
>Priority: Minor
>  Labels: semantics
> Fix For: 0.10.2.0
>
>
> It would be good to have access to the underlying Metrics component in 
> StreamMetrics. StreamMetrics forces a naming convention for metrics that does 
> not fit our use case for reporting. We need to be able to convert the stream 
> metrics to our own metrics formatting and it's cumbersome to extract group/op 
> names from pre-formatted strings the way they are set up in StreamMetricsImpl. 
> If there were a "metrics()" method of StreamMetrics to give me the underlying 
> Metrics object, I could register my own sensors/metrics as needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3534) Deserialize on demand when default time extractor used

2016-09-23 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3534:
-
Fix Version/s: (was: 0.10.1.0)

> Deserialize on demand when default time extractor used
> --
>
> Key: KAFKA-3534
> URL: https://issues.apache.org/jira/browse/KAFKA-3534
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.9.0.1
>Reporter: Michael Coon
>Assignee: Guozhang Wang
>Priority: Minor
>  Labels: performance
>
> When records are added to the RecordQueue, they are deserialized at that time 
> in order to extract the timestamp. But for some data flows where large 
> messages are consumed (particularly compressed messages), this can result in 
> large spikes in memory as all messages must be deserialized prior to 
> processing (potentially running out of memory). An optimization might be to 
> only require deserialization at this stage if a non-default timestamp 
> extractor is being used.
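
One way to picture the proposed optimization (purely illustrative, not the 
actual RecordQueue code): keep the raw bytes at enqueue time and defer 
deserialization until a processor actually needs the value, since the default 
extractor does not need the deserialized payload:

{code}
import org.apache.kafka.common.serialization.Deserializer;

// Holds raw bytes and materializes the value only on first access.
public class LazyValue<V> {
    private final String topic;
    private final byte[] raw;
    private final Deserializer<V> deserializer;
    private V value;

    public LazyValue(String topic, byte[] raw, Deserializer<V> deserializer) {
        this.topic = topic;
        this.raw = raw;
        this.deserializer = deserializer;
    }

    public V get() {
        if (value == null && raw != null)
            value = deserializer.deserialize(topic, raw); // deserialize on demand
        return value;
    }
}
{code}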



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3184) Add Checkpoint for In-memory State Store

2016-09-23 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3184:
-
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Add Checkpoint for In-memory State Store
> 
>
> Key: KAFKA-3184
> URL: https://issues.apache.org/jira/browse/KAFKA-3184
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>  Labels: user-experience
> Fix For: 0.10.2.0
>
>
> Currently Kafka Streams does not make a checkpoint of the persistent state 
> store upon committing, which would be expensive since it is a "stop the 
> world" operation and writes to disk: for example, RocksDB would naively 
> require copying the file directory to make a checkpoint. 
> However, for in-memory stores checkpointing may be doable in an asynchronous 
> manner and hence can be done quickly. And the benefit of having intermediate 
> checkpoints is to avoid restoring from scratch if standby tasks are not 
> present.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4168) More precise accounting of memory usage

2016-09-23 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-4168:
-
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> More precise accounting of memory usage
> ---
>
> Key: KAFKA-4168
> URL: https://issues.apache.org/jira/browse/KAFKA-4168
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Eno Thereska
> Fix For: 0.10.2.0
>
>
> Right now, the cache.max.bytes.buffering parameter controls the size of the 
> cache used. Specifically the size includes the size of the values stored in 
> the cache plus basic overheads, such as key size, all the LRU entry sizes, 
> etc. However, we could be more fine-grained in the memory accounting and add 
> up the size of hash sets, hash maps and their entries more precisely. For 
> example, currently a dirty entry is placed into a dirty keys set, but we do 
> not account for the size of that set in the memory consumption calculation.
> It is likely this falls under "memory management" rather than "buffer cache 
> management".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3779) Add the LRU cache for KTable.to() operator

2016-09-23 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3779:
-
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Add the LRU cache for KTable.to() operator
> --
>
> Key: KAFKA-3779
> URL: https://issues.apache.org/jira/browse/KAFKA-3779
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Eno Thereska
> Fix For: 0.10.2.0
>
>
> The KTable.to operator currently does not use a cache. We can add a cache to 
> this operator to deduplicate and reduce data traffic as well. This is to be 
> done after KAFKA-3777.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3559) Task creation time taking too long in rebalance callback

2016-09-23 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3559:
-
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Task creation time taking too long in rebalance callback
> 
>
> Key: KAFKA-3559
> URL: https://issues.apache.org/jira/browse/KAFKA-3559
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Eno Thereska
>  Labels: architecture
> Fix For: 0.10.2.0
>
>
> Currently in Kafka Streams, we create stream tasks upon getting newly 
> assigned partitions in the rebalance callback function {code} 
> onPartitionsAssigned {code}, which involves initialization of the processor 
> state stores as well (including opening the RocksDB instances, restoring the 
> stores from the changelog, etc., which takes time).
> With a large number of state stores, the initialization time itself could 
> take tens of seconds, which usually is larger than the consumer session 
> timeout. As a result, by the time the callback completes, the consumer is 
> already treated as failed by the coordinator and rebalances again.
> We need to consider whether we can optimize the initialization process, or 
> move it out of the callback function, and while initializing the stores 
> one-by-one, use poll() calls to send heartbeats to avoid being kicked out by 
> the coordinator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4058) Failure in org.apache.kafka.streams.integration.ResetIntegrationTest.testReprocessingFromScratchAfterReset

2016-09-23 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517803#comment-15517803
 ] 

Guozhang Wang commented on KAFKA-4058:
--

[~mjsax] Could you take a look at this issue again before the 0.10.1.0 code 
freeze?

> Failure in 
> org.apache.kafka.streams.integration.ResetIntegrationTest.testReprocessingFromScratchAfterReset
> --
>
> Key: KAFKA-4058
> URL: https://issues.apache.org/jira/browse/KAFKA-4058
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Matthias J. Sax
>  Labels: test
> Fix For: 0.10.1.0
>
>
> {code}
> java.lang.AssertionError: expected:<0> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at org.junit.Assert.assertEquals(Assert.java:631)
>   at 
> org.apache.kafka.streams.integration.ResetIntegrationTest.cleanGlobal(ResetIntegrationTest.java:225)
>   at 
> org.apache.kafka.streams.integration.ResetIntegrationTest.testReprocessingFromScratchAfterReset(ResetIntegrationTest.java:103)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:114)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:57)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:66)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>   at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>   at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
>   at 
> org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:109)
>   at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:377)
>   at 
> 

[jira] [Updated] (KAFKA-4001) Improve Kafka Streams Join Semantics (KIP-77)

2016-09-23 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-4001:
-
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Improve Kafka Streams Join Semantics (KIP-77)
> -
>
> Key: KAFKA-4001
> URL: https://issues.apache.org/jira/browse/KAFKA-4001
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
> Fix For: 0.10.2.0
>
>
> Kafka Streams supports three types of joins:
> * KStream-KStream
> * KStream-KTable
> * KTable-KTable
> Furthermore, Kafka Streams supports the join variant, namely
> * inner join
> * left join
> * outer join
> Not all combination of "type" and "variant" are supported.
> *The problem is that the different joins use different semantics (and are 
> thus inconsistent).*
> With this ticket, we want to
> * introduce unique semantics over all joins
> * improve handling of "null"
> * add missing inner KStream-KTable join
> See KIP-77 for more details: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-77%3A+Improve+Kafka+Streams+Join+Semantics



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3452) Support session windows besides time interval windows

2016-09-23 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3452:
-
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Support session windows besides time interval windows
> -
>
> Key: KAFKA-3452
> URL: https://issues.apache.org/jira/browse/KAFKA-3452
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Damian Guy
>  Labels: api
> Fix For: 0.10.2.0
>
>
> The Streams DSL currently does not provide session windows as in the DataFlow 
> model. We have seen some common use cases for this feature, so it would be 
> better to add this support ASAP.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3973) Investigate feasibility of caching bytes vs. records

2016-09-23 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3973:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Investigate feasibility of caching bytes vs. records
> 
>
> Key: KAFKA-3973
> URL: https://issues.apache.org/jira/browse/KAFKA-3973
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Eno Thereska
>Assignee: Bill Bejeck
> Fix For: 0.10.1.0
>
> Attachments: MemBytesBenchmark.txt
>
>
> Currently the cache stores and accounts for records, not bytes or objects. 
> This investigation would be around measuring any performance overheads that 
> come from storing bytes or objects. As an outcome we should know whether 1) 
> we should store bytes or 2) we should store objects. 
> If we store objects, the cache still needs to know their size (so that it can 
> know if the object fits in the allocated cache space, e.g., if the cache is 
> 100MB and the object is 10MB, we'd have space for 10 such objects). The 
> investigation needs to figure out how to find out the size of the object 
> efficiently in Java.
> If we store bytes, then we are serialising an object into bytes before 
> caching it, i.e., we take a serialisation cost. The investigation needs to 
> measure how bad this cost can be, especially for the case when all objects 
> fit in cache (and thus any extra serialisation cost would show).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3262) Make KafkaStreams debugging friendly

2016-09-23 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3262:
-
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Make KafkaStreams debugging friendly
> 
>
> Key: KAFKA-3262
> URL: https://issues.apache.org/jira/browse/KAFKA-3262
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Yasuhiro Matsuda
>Assignee: Eno Thereska
>  Labels: user-experience
> Fix For: 0.10.2.0
>
>
> Currently KafkaStreams polls records in the same thread as the data 
> processing thread. This makes debugging user code, as well as KafkaStreams 
> itself, difficult. When the thread is suspended by the debugger, the next 
> heartbeat of the consumer tied to the thread won't be sent until the thread 
> is resumed. This often results in missed heartbeats and causes a group 
> rebalance. So it may well be a completely different context when the thread 
> hits the breakpoint the next time.
> We should consider using separate threads for polling and processing.
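
A rough sketch of that separation, with a dedicated poll thread feeding a 
processing thread through a queue (config values are placeholders; note that 
before KIP-62, heartbeats are only sent from poll(), so the poll loop must 
keep running for this to help):

{code}
import java.util.Arrays;
import java.util.Properties;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class DecoupledPolling {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "debug-friendly");          // placeholder
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        BlockingQueue<ConsumerRecord<byte[], byte[]>> queue = new ArrayBlockingQueue<>(1000);

        // Poll thread: owns the consumer and keeps heartbeats flowing.
        Thread poller = new Thread(() -> {
            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Arrays.asList("some-topic")); // placeholder
                while (!Thread.currentThread().isInterrupted()) {
                    ConsumerRecords<byte[], byte[]> records = consumer.poll(100);
                    for (ConsumerRecord<byte[], byte[]> record : records)
                        queue.put(record); // blocks if the processor falls behind
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        poller.start();

        // Processing thread: can be suspended in a debugger without the group
        // immediately rebalancing, since polling continues on the other thread.
        while (true) {
            ConsumerRecord<byte[], byte[]> record = queue.take();
            // process(record)
        }
    }
}
{code}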



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3903) Convert tests to use static helper methods for Consumer/Producer/StreamsConfigs setup

2016-09-23 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3903:
-
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Convert tests to use static helper methods for 
> Consumer/Producer/StreamsConfigs setup
> -
>
> Key: KAFKA-3903
> URL: https://issues.apache.org/jira/browse/KAFKA-3903
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Bill Bejeck
>Assignee: Bill Bejeck
> Fix For: 0.10.2.0
>
>
> There are several unit/integration tests where we create 
> Consumer/Producer/Streams configs.  All of these calls essentially create the 
> same configs over and over.  We should migrate these config setups to use the 
> static helper methods  TestUtils.consumerConfigs, TestUtils.producerConfigs, 
> StreamsTestUtils.getStreamsConfigs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3705) Support non-key joining in KTable

2016-09-23 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3705:
-
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Support non-key joining in KTable
> -
>
> Key: KAFKA-3705
> URL: https://issues.apache.org/jira/browse/KAFKA-3705
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Liquan Pei
>  Labels: api
> Fix For: 0.10.2.0
>
>
> Today in Kafka Streams DSL, KTable joins are only based on keys. If users 
> want to join a KTable A by key {{a}} with another KTable B by key {{b}} but 
> with a "foreign key" {{a}}, and assuming they are read from two topics which 
> are partitioned on {{a}} and {{b}} respectively, they need to do the 
> following pattern:
> {code}
> tableB' = tableB.groupBy(/* select on field "a" */).agg(...); // now tableB' 
> is partitioned on "a"
> tableA.join(tableB', joiner);
> {code}
> Even if these two tables are read from two topics which are already 
> partitioned on {{a}}, users still need to do the pre-aggregation in order to 
> make the two joining streams be on the same key. This is a drawback for 
> programmability and we should fix it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3429) Remove Serdes needed for repartitioning in KTable stateful operations

2016-09-23 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3429:
-
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Remove Serdes needed for repartitioning in KTable stateful operations
> -
>
> Key: KAFKA-3429
> URL: https://issues.apache.org/jira/browse/KAFKA-3429
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Matthias J. Sax
>  Labels: api, newbie++
> Fix For: 0.10.2.0
>
>
> Currently in KTable aggregate operations where a repartition is possibly 
> needed since the aggregation key may not be the same as the original primary 
> key, we require the users to provide serdes (default to configured ones) for 
> read / write to the internally created re-partition topic. However, these are 
> not necessary since for all KTable instances either generated from the topics 
> directly:
> {code}table = builder.table(...){code}
> or from aggregation operations:
> {code}table = stream.aggregate(...){code}
> There is already a serde provided for materializing the data, and hence the 
> same serde can be re-used when the resulting KTable is involved in future 
> aggregation operations. For example:
> {code}
> table1 = stream.aggregateByKey(serde);
> table2 = table1.aggregate(aggregator, selector, originalSerde, 
> aggregateSerde);
> {code}
> We would not need to require users to specify the "originalSerde" in 
> table1.aggregate since it could always reuse the "serde" from 
> stream.aggregateByKey, which is used to materialize the table1 object.
> In order to get rid of it, implementation-wise we need to carry the serde 
> information along with the KTableImpl instance in order to re-use it in a 
> future operation that requires repartitioning.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3514) Stream timestamp computation needs some further thoughts

2016-09-23 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3514:
-
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Stream timestamp computation needs some further thoughts
> 
>
> Key: KAFKA-3514
> URL: https://issues.apache.org/jira/browse/KAFKA-3514
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>  Labels: architecture
> Fix For: 0.10.2.0
>
>
> Our current stream task's timestamp is used for punctuate function as well as 
> selecting which stream to process next (i.e. best effort stream 
> synchronization). And it is defined as the smallest timestamp over all 
> partitions in the task's partition group. This results in two unintuitive 
> corner cases:
> 1) observing a late-arrived record would keep that stream's timestamp low for 
> a period of time, and hence that stream keeps being processed until the late 
> record is consumed. For example, take two partitions within the same task, 
> annotated by their timestamps:
> {code}
> Stream A: 5, 6, 7, 8, 9, 1, 10
> {code}
> {code}
> Stream B: 2, 3, 4, 5
> {code}
> The late-arrived record with timestamp "1" will cause stream A to be selected 
> continuously in the thread loop, i.e. messages with timestamp 5, 6, 7, 8, 9 
> until the record itself is dequeued and processed, then stream B will be 
> selected starting with timestamp 2.
> 2) an empty buffered partition will cause its timestamp to be not advanced, 
> and hence the task timestamp as well since it is the smallest among all 
> partitions. This may not be a severe problem compared with 1) above though.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3600) Enhance java clients to use ApiVersion Req/Resp to check if the broker they are talking to supports required api versions

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-3600:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Enhance java clients to use ApiVersion Req/Resp to check if the broker they 
> are talking to supports required api versions
> -
>
> Key: KAFKA-3600
> URL: https://issues.apache.org/jira/browse/KAFKA-3600
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ashish K Singh
>Assignee: Ashish K Singh
> Fix For: 0.10.2.0
>
>
> Enhance the Java clients to use the new ApiVersion Req/Resp to check whether 
> the broker they are talking to supports the required API versions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1326) New consumer checklist

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-1326:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> New consumer checklist
> --
>
> Key: KAFKA-1326
> URL: https://issues.apache.org/jira/browse/KAFKA-1326
> Project: Kafka
>  Issue Type: New Feature
>  Components: consumer
>Affects Versions: 0.8.2.1
>Reporter: Neha Narkhede
>Assignee: Neha Narkhede
>  Labels: feature
> Fix For: 0.10.2.0
>
>
> We will use this JIRA to track the list of issues to resolve to get a working 
> new consumer client. The consumer client can work in phases -
> 1. Add new consumer APIs and configs
> 2. Refactor Sender. We will need to use some common APIs from Sender.java 
> (https://issues.apache.org/jira/browse/KAFKA-1316)
> 3. Add metadata fetch and refresh functionality to the consumer (This will 
> require https://issues.apache.org/jira/browse/KAFKA-1316)
> 4. Add functionality to support subscribe(TopicPartition...partitions). This 
> will add SimpleConsumer functionality to the new consumer. This does not 
> include any group management related work.
> 5. Add ability to commit offsets to Kafka. This will include adding 
> functionality to the commit()/commitAsync()/committed() APIs. This still does 
> not include any group management related work.
> 6. Add functionality to the offsetsBeforeTime() API.
> 7. Add consumer co-ordinator election to the server. This will only add a new 
> module for the consumer co-ordinator, but not necessarily all the logic to do 
> group management. 
> At this point, we will have a fully functional standalone consumer and a 
> server side co-ordinator module. This will be a good time to start adding 
> group management functionality to the server and consumer.
> 8. Add failure detection capability to the consumer when group management is 
> used. This will not include any rebalancing logic, just the ability to detect 
> failures using session.timeout.ms.
> 9. Add rebalancing logic to the server and consumer. This will be a tricky 
> and potentially large change since it will involve implementing the group 
> management protocol.
> 10. Add system tests for the new consumer
> 11. Add metrics 
> 12. Convert mirror maker to use the new consumer.
> 13. Convert perf test to use the new consumer
> 14. Performance testing and analysis.
> 15. Review and fine tune log4j logging



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-316) disallow recursively compressed message

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-316:
--
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> disallow recursively compressed message
> ---
>
> Key: KAFKA-316
> URL: https://issues.apache.org/jira/browse/KAFKA-316
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Jun Rao
> Fix For: 0.10.2.0
>
> Attachments: 
> 0001-KAFKA-316-Disallow-MessageSets-within-MessageSets.patch
>
>
> Currently, it is possible to create a compressed Message that contains a set 
> of Messages, each of which is further compressed. Supporting recursively 
> compressed messages has little benefit and can complicate the on-disk storage 
> format. We should probably disallow this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-72 Allow Sizing Incoming Request Queue in Bytes

2016-09-23 Thread Jun Rao
Hi, Radai,

Thanks for the updated KIP. A few more questions/comments below.

10. For "the mute/unmute happens just before poll(), which means as a worst
case there will be no reads for 300ms if memory was unavailable", I am
thinking that memory-pool could track if there is any pending request and
wake up the selector when memory is released and there is a pending
request. This way, poll() doesn't have to wait for the timeout if memory
frees up early.
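
A rough sketch of that idea, with illustrative names (not the actual patch):
the pool remembers that an allocation failed and wakes the selector on the
next release, so poll() can return early and retry the read.

import java.nio.ByteBuffer;
import java.nio.channels.Selector;
import java.util.concurrent.atomic.AtomicLong;

class WakingMemoryPool {
    private final AtomicLong available;
    private final Selector selector;         // selector to wake on release
    private volatile boolean requestPending; // set when an allocation failed

    WakingMemoryPool(long capacityBytes, Selector selector) {
        this.available = new AtomicLong(capacityBytes);
        this.selector = selector;
    }

    ByteBuffer tryAllocate(int sizeBytes) {
        while (true) {
            long current = available.get();
            if (current <= 0) {        // exhausted: remember there is a waiter
                requestPending = true;
                return null;
            }
            // any-size allocation while capacity remains, as described below;
            // the counter may go negative
            if (available.compareAndSet(current, current - sizeBytes))
                return ByteBuffer.allocate(sizeBytes);
        }
    }

    void release(ByteBuffer buffer) {
        available.addAndGet(buffer.capacity());
        if (requestPending) {
            requestPending = false;
            selector.wakeup();         // let poll() return before its timeout
        }
    }
}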

11. For "to facilitate faster implementation (as a safety net) the pool
will be implemented in such a way that memory that was not release()ed (but
still garbage collected) would be detected and "reclaimed". this is to
prevent "leaks" in case of code paths that fail to release() properly.",
could you explain a bit at the high level how this is done?

12. For "As the pool would allow any size request if it has any capacity
available, the actual memory bound is queued.max.bytes +
socket.request.max.bytes.", it seems intuitively, the pool should only give
the Buffer back if it has enough available bytes. Then the request memory
can be bounded by queued.max.bytes. We can validate that queued.max.bytes
is at least socket.request.max.bytes.

13. For the naming, it seems request.queue.max.bytes is clearer than
queue.max.bytes.

Jun



On Thu, Sep 22, 2016 at 10:53 AM, radai  wrote:

> As discussed in the KIP call, I have updated the kip-72 page (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 72%3A+Allow+putting+a+bound+on+memory+consumed+by+Incoming+requests)
> to record both configuration validations and implementation concerns.
> I've also implemented channel muting/unmuting in response to memory
> pressure. its available as a separate branch here -
> https://github.com/radai-rosenblatt/kafka/tree/broker-
> memory-pool-with-muting
> . the implementation without muting is here -
> https://github.com/radai-rosenblatt/kafka/tree/broker-memory-pool.
>
> the mute/unmute happens just before poll(), which means as a worst case
> there will be no reads for 300ms if memory was unavailable (that's the
> timeout until the next poll). Perhaps a design with dedicated read threads
> could do better (such a thread could actually block waiting for memory),
> but that would be a giant change.
>
> On Tue, Sep 13, 2016 at 9:20 AM, radai  wrote:
>
> > The specific memory pool implementation I wrote will allocate _any_
> > amount you request if it has _any_ memory available (so if it has 1 byte
> > available and you ask for 1MB you will get 1MB and the counter will go
> > negative). This was done to avoid issues with starvation of large
> > requests. Other implementations may be more strict. To me this means that
> > generally it's not a simple "have memory" vs "no memory" split (which
> > gets worse under a hypothetical tiered pool scheme for QoS).
> >
> > To allow this flexibility in pool implementations I must preserve the
> > amount of memory required. Once read from the channel I can't put it
> > back, so I store it?
> >
> > On Tue, Sep 13, 2016 at 5:30 AM, Rajini Sivaram <
> > rajinisiva...@googlemail.com> wrote:
> >
> >> Is there any value in allowing the 4-byte size to be read even when the
> >> request memory limit has been reached? If not, you can disable OP_READ
> >> interest for all channels that are ready inside Selector.poll() when
> >> memory
> >> limit has been reached and re-enable before returning from poll().
> Perhaps
> >> a listener that is invoked when MemoryPool moves from unavailable to
> >> available state can wakeup the selector. The changes for this should be
> >> fairly contained without any additional channel state. And it would
> avoid
> >> the overhead of polls that return immediately even when progress cannot
> be
> >> made because memory limit has been reached.
> >>
> >> On Tue, Sep 13, 2016 at 12:31 AM, radai 
> >> wrote:
> >>
> >> > Hi Jun,
> >> >
> >> > Yes, you're right - right now the next select() call will return
> >> > immediately with the same set of keys as earlier (at least) as they
> >> > were not previously handled (no memory).
> >> > My assumption is that this happens under considerable load - something
> >> has
> >> > to be occupying all this memory. also, this happens in the context of
> >> > SocketServer.Processor.run():
> >> >
> >> > while (isRunning) {
> >> >configureNewConnections()
> >> >processNewResponses()
> >> >poll()   <-- HERE
> >> >processCompletedReceives()
> >> >processCompletedSends()
> >> >processDisconnected()
> >> > }
> >> >
> >> > even within poll(), things like finishConnection(), prepare(), and
> >> write()s
> >> > can still make progress under low memory conditions. and given the
> load,
> >> > there's probably progress to be made in processCompletedReceives(),
> >> > processCompletedSends() and processDisconnected().
> >> >
> >> > if there's progress to be made in other things 

[jira] [Updated] (KAFKA-4064) Add support for infinite endpoints for range queries in Kafka Streams KV stores

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-4064:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Add support for infinite endpoints for range queries in Kafka Streams KV 
> stores
> ---
>
> Key: KAFKA-4064
> URL: https://issues.apache.org/jira/browse/KAFKA-4064
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Roger Hoover
>Assignee: Roger Hoover
>Priority: Minor
> Fix For: 0.10.2.0
>
>
> In some applications, it's useful to iterate over the key-value store either:
> 1. from the beginning up to a certain key
> 2. from a certain key to the end
> We can add two new methods rangeUntil() and rangeFrom() easily to support this.
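
A sketch of the shape of the proposed additions (rangeUntil() and rangeFrom() 
are this ticket's proposal, not part of the released interface):

{code}
import org.apache.kafka.streams.state.KeyValueIterator;

public interface OpenEndedRangeQueries<K, V> {
    KeyValueIterator<K, V> range(K from, K to); // existing API shape: [from, to]
    KeyValueIterator<K, V> rangeUntil(K to);    // proposed: start of the store up to 'to'
    KeyValueIterator<K, V> rangeFrom(K from);   // proposed: 'from' to the end of the store
}
{code}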



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3625) Move kafka-streams test fixtures into a published package

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-3625:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Move kafka-streams test fixtures into a published package
> -
>
> Key: KAFKA-3625
> URL: https://issues.apache.org/jira/browse/KAFKA-3625
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Jeff Klukas
>Assignee: Guozhang Wang
>Priority: Minor
>  Labels: user-experience
> Fix For: 0.10.2.0
>
>
> The KStreamTestDriver and related fixtures defined in 
> streams/src/test/java/org/apache/kafka/test would be useful to developers 
> building applications on top of Kafka Streams, but they are not currently 
> exposed in a package.
> I propose moving this directory to live under streams/fixtures/src/main and 
> creating a new 'streams:fixtures' project in the gradle configuration to 
> publish these as a separate package.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1056) Evenly Distribute Intervals in OffsetIndex

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-1056:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Evenly Distribute Intervals in OffsetIndex
> --
>
> Key: KAFKA-1056
> URL: https://issues.apache.org/jira/browse/KAFKA-1056
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>  Labels: newbie++, newbiee
> Fix For: 0.10.2.0
>
>
> Today a new entry will be created in OffsetIndex for each produce request 
> regardless of the number of messages it contains. It is better to evenly 
> distribute the intervals between index entries for index search efficiency.
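
A sketch of what evenly spaced (byte-based) index entries could look like 
(illustrative only, not the actual log code):

{code}
// Add an index entry only after at least indexIntervalBytes of log data have
// been appended since the previous entry, instead of one entry per produce
// request.
class SparseIndexWriter {
    private final int indexIntervalBytes; // e.g. 4096
    private long bytesSinceLastEntry = 0;

    SparseIndexWriter(int indexIntervalBytes) {
        this.indexIntervalBytes = indexIntervalBytes;
    }

    void onAppend(long offset, long physicalPosition, int messageSetBytes) {
        if (bytesSinceLastEntry >= indexIntervalBytes) {
            appendIndexEntry(offset, physicalPosition);
            bytesSinceLastEntry = 0;
        }
        bytesSinceLastEntry += messageSetBytes;
    }

    private void appendIndexEntry(long offset, long physicalPosition) {
        // write the (relativeOffset, position) pair to the index file
    }
}
{code}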



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1722) static analysis code coverage for pci audit needs

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-1722:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> static analysis code coverage for pci audit needs
> -
>
> Key: KAFKA-1722
> URL: https://issues.apache.org/jira/browse/KAFKA-1722
> Project: Kafka
>  Issue Type: Bug
>  Components: security
>Reporter: Joe Stein
>Assignee: Ashish K Singh
> Fix For: 0.10.2.0
>
> Attachments: KAFKA-1722.patch, KAFKA-1722_2015-01-29_12:33:01.patch, 
> Sonar's summary report.png, clients coverage.png, core coverage.png
>
>
> Code coverage is a measure used to describe the degree to which the source 
> code of a product is tested. A product with high code coverage has been more 
> thoroughly tested and has a lower chance of containing software bugs than a 
> product with low code coverage. Apart from PCI audit needs, increasing user 
> base of Kafka makes it important to increase code coverage of Kafka. 
> Something just can not be improved without being measured.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3359) Parallel log-recovery of un-flushed segments on startup

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-3359:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Parallel log-recovery of un-flushed segments on startup
> ---
>
> Key: KAFKA-3359
> URL: https://issues.apache.org/jira/browse/KAFKA-3359
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Affects Versions: 0.8.2.2, 0.9.0.1
>Reporter: Vamsi Subhash Achanta
>Assignee: Jay Kreps
> Fix For: 0.10.2.0
>
>
> On startup, currently the log segments within a logDir are loaded 
> sequentially when there is an unclean shutdown. Loading the segments takes a 
> lot of time because logSegment.recover(..) is called for every segment, and 
> for brokers which have many partitions the total time can be very high (we 
> have noticed ~40 mins for 2k partitions).
> https://github.com/apache/kafka/pull/1035
> This pull request will make the log-segment load parallel with two 
> configurable properties "log.recovery.threads" and 
> "log.recovery.max.interval.ms".
> Logic:
> 1. Have a threadpool defined of fixed length (log.recovery.threads)
> 2. Submit the logSegment recovery as a job to the threadpool and add the 
> future returned to a job list
> 3. Wait until all the jobs are done within the requested time 
> (log.recovery.max.interval.ms - default set to Long.Max).
> 4. If they are done and the futures all return null (meaning that the jobs 
> completed successfully), recovery is considered done.
> 5. If any of the recovery jobs failed, it is logged and 
> LogRecoveryFailedException is thrown.
> 6. If the timeout is reached, LogRecoveryFailedException is thrown.
> The logic is backward compatible with the current sequential implementation 
> as the default thread count is set to 1.
> PS: I am new to Scala and the code might look Java-ish, but I will be happy 
> to modify the code per review changes.
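
A rough Java sketch of steps 1-6 (the actual patch is in Scala; LogSegment and 
LogRecoveryFailedException below are stand-ins for the PR's types):

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

interface LogSegment { void recover(); }

class LogRecoveryFailedException extends RuntimeException {
    LogRecoveryFailedException(String message) { super(message); }
    LogRecoveryFailedException(String message, Throwable cause) { super(message, cause); }
}

class ParallelLogRecovery {
    // threads ~ log.recovery.threads, maxIntervalMs ~ log.recovery.max.interval.ms
    static void recover(List<LogSegment> segments, int threads, long maxIntervalMs)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> jobs = new ArrayList<>();
        for (LogSegment segment : segments)
            jobs.add(pool.submit(segment::recover)); // one recovery job per segment
        pool.shutdown();
        if (!pool.awaitTermination(maxIntervalMs, TimeUnit.MILLISECONDS))
            throw new LogRecoveryFailedException("timed out recovering segments");
        for (Future<?> job : jobs) {
            try {
                job.get(); // surfaces any failure from a recovery job
            } catch (ExecutionException e) {
                throw new LogRecoveryFailedException("segment recovery failed", e.getCause());
            }
        }
    }
}
{code}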



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3186) KIP-50: Move Authorizer and related classes to separate package.

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-3186:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> KIP-50: Move Authorizer and related classes to separate package.
> 
>
> Key: KAFKA-3186
> URL: https://issues.apache.org/jira/browse/KAFKA-3186
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.9.0.0
>Reporter: Ashish K Singh
>Assignee: Ashish K Singh
> Fix For: 0.10.2.0
>
>
> [KIP-50|https://cwiki.apache.org/confluence/display/KAFKA/KIP-50+-+Move+Authorizer+to+a+separate+package]
>  has more details.
> Kafka supports pluggable authorization. Third party authorizer 
> implementations allow existing authorization systems like, Apache Sentry, 
> Apache Ranger, etc to extend authorization to Kafka as well. Implementing 
> Kafka's authorizer interface requires depending on kafka's core, which is 
> huge. This has been already raised as a concern by Sentry, Ranger and Kafka 
> community. Even Kafka clients require duplication of authorization related 
> classes, like Resource, Operation, etc, for adding ACLs CRUD APIs.
> The Kafka authorizer is agnostic of the principal types it supports, as are 
> the ACL CRUD methods in the Authorizer interface. The intent behind this is 
> to keep Kafka 
> principal types pluggable, which is really great. However, this leads to Acls 
> CRUD methods not performing any check on validity of acls, as they are not 
> aware of what principal types Authorizer implementation supports. This opens 
> up space for lots of user errors, KAFKA-3097 is an instance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3455) Connect custom processors with the streams DSL

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-3455:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Connect custom processors with the streams DSL
> --
>
> Key: KAFKA-3455
> URL: https://issues.apache.org/jira/browse/KAFKA-3455
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Affects Versions: 0.10.0.1
>Reporter: Jonathan Bender
>  Labels: user-experience
> Fix For: 0.10.2.0
>
>
> From the kafka users email thread, we discussed the idea of connecting custom 
> processors with topologies defined from the Streams DSL (and being able to 
> sink data from the processor).  Possibly this could involve exposing the 
> underlying processor's name in the streams DSL so it can be connected with 
> the standard processor API.
> {quote}
> Thanks for the feedback. This is definitely something we wanted to support
> in the Streams DSL.
> One tricky thing, though, is that some operations do not translate to a
> single processor, but a sub-graph of processors (think of a stream-stream
> join, which is translated to actually 5 processors for windowing / state
> queries / merging, each with a different internal name). So how to define
> the API to return the processor name needs some more thinking.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3177) Kafka consumer can hang when position() is called on a non-existing partition.

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-3177:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Kafka consumer can hang when position() is called on a non-existing partition.
> --
>
> Key: KAFKA-3177
> URL: https://issues.apache.org/jira/browse/KAFKA-3177
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.0
>Reporter: Jiangjie Qin
>Assignee: Jason Gustafson
>Priority: Critical
> Fix For: 0.10.2.0
>
>
> This can be easily reproduced as following:
> {code}
> ...
> consumer.assign(Collections.singletonList(someNonExistingTopicPartition));
> consumer.position(someNonExistingTopicPartition);
> ...
> {code}
> It seems when position() is called we will try to do the following:
> 1. Fetch committed offsets.
> 2. If there are no committed offsets, try to reset the offset using the reset 
> strategy. In sendListOffsetRequest(), if the consumer does not know the 
> TopicPartition, it will refresh its metadata and retry. In this case, because 
> the partition does not exist, we fall into an infinite loop of refreshing 
> topic metadata.
> Another orthogonal issue is that if the topic in the above code piece does 
> not exist, position() call will actually create the topic due to the fact 
> that currently topic metadata request could automatically create the topic. 
> This is a known separate issue.
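
Until this is fixed, a defensive workaround sketch (the class and method names 
are illustrative; note that {{listTopics()}} fetches metadata for all topics, 
which may be costly on large clusters):

{code}
import java.util.Collections;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class SafePosition {
    // Verify the topic exists before assign()/position(), so the consumer
    // never enters the metadata-refresh loop described above.
    public static long positionOrFail(KafkaConsumer<String, String> consumer,
                                      String topic, int partition) {
        if (!consumer.listTopics().containsKey(topic))
            throw new IllegalArgumentException("Topic does not exist: " + topic);
        TopicPartition tp = new TopicPartition(topic, partition);
        consumer.assign(Collections.singletonList(tp));
        return consumer.position(tp);
    }
}
{code}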



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4207) Partitions stopped after a rapid restart of a broker

2016-09-23 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517727#comment-15517727
 ] 

Dustin Cote commented on KAFKA-4207:


Agreed, marking this one as a duplicate

> Partitions stopped after a rapid restart of a broker
> 
>
> Key: KAFKA-4207
> URL: https://issues.apache.org/jira/browse/KAFKA-4207
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.9.0.1, 0.10.0.1
>Reporter: Dustin Cote
>
> Environment:
> 4 Kafka brokers
> 10,000 topics with one partition each, replication factor 3
> Partitions with 4KB data each
> No data being produced or consumed
> Scenario:
> Initiate controlled shutdown on one broker
> Interrupt the controlled shutdown prior to completion with a SIGKILL
> Immediately start a new broker with the same broker ID as the one that was 
> just killed
> Symptoms:
> After starting the new broker, the other three brokers in the cluster will 
> see under-replicated partitions forever for some partitions that are hosted 
> on the broker that was killed and restarted
> Cause:
> Today, the controller sends a StopReplica command for each replica hosted on 
> a broker that has initiated a controlled shutdown.  For a large number of 
> replicas this can take a while.  When the broker that is doing the controlled 
> shutdown is killed, the StopReplica commands remain queued up even though the 
> request queue to the broker is cleared.  When the broker comes back online, 
> the StopReplica commands that were queued get sent to the broker that just 
> started up.  
> CC: [~junrao] since he's familiar with the scenario seen here



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-4207) Partitions stopped after a rapid restart of a broker

2016-09-23 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote resolved KAFKA-4207.

Resolution: Duplicate

> Partitions stopped after a rapid restart of a broker
> 
>
> Key: KAFKA-4207
> URL: https://issues.apache.org/jira/browse/KAFKA-4207
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.9.0.1, 0.10.0.1
>Reporter: Dustin Cote
>
> Environment:
> 4 Kafka brokers
> 10,000 topics with one partition each, replication factor 3
> Partitions with 4KB data each
> No data being produced or consumed
> Scenario:
> Initiate controlled shutdown on one broker
> Interrupt the controlled shutdown prior to completion with a SIGKILL
> Immediately start a new broker with the same broker ID as the one that was 
> just killed
> Symptoms:
> After starting the new broker, the other three brokers in the cluster will 
> see under-replicated partitions forever for some partitions that are hosted 
> on the broker that was killed and restarted
> Cause:
> Today, the controller sends a StopReplica command for each replica hosted on 
> a broker that has initiated a controlled shutdown.  For a large number of 
> replicas this can take a while.  When the broker that is doing the controlled 
> shutdown is killed, the StopReplica commands remain queued up even though the 
> request queue to the broker is cleared.  When the broker comes back online, 
> the StopReplica commands that were queued get sent to the broker that just 
> started up.  
> CC: [~junrao] since he's familiar with the scenario seen here



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3177) Kafka consumer can hang when position() is called on a non-existing partition.

2016-09-23 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517724#comment-15517724
 ] 

Jason Gustafson commented on KAFKA-3177:


Kicking this to 0.10.2. Improving the logging alone doesn't bring much value. 
We need a better long-term solution for addressing unknown topics/blocking in 
the consumer.

> Kafka consumer can hang when position() is called on a non-existing partition.
> --
>
> Key: KAFKA-3177
> URL: https://issues.apache.org/jira/browse/KAFKA-3177
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.0
>Reporter: Jiangjie Qin
>Assignee: Jason Gustafson
>Priority: Critical
> Fix For: 0.10.2.0
>
>
> This can be easily reproduced as follows:
> {code}
> TopicPartition nonExistingPartition = new TopicPartition("no-such-topic", 0);
> consumer.assign(Collections.singletonList(nonExistingPartition));
> consumer.position(nonExistingPartition); // hangs here
> {code}
> It seems that when position() is called we try to do the following:
> 1. Fetch the committed offsets.
> 2. If there are no committed offsets, try to reset the offset using the reset 
> strategy. In sendListOffsetRequest(), if the consumer does not know the 
> TopicPartition, it will refresh its metadata and retry. In this case, because 
> the partition does not exist, we fall into an infinite loop of refreshing 
> topic metadata.
> Another orthogonal issue is that if the topic in the above code snippet does 
> not exist, the position() call will actually create the topic, due to the fact 
> that currently a topic metadata request can automatically create the topic. 
> This is a known separate issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-3829) Warn that kafka-connect group.id must not conflict with connector names

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson reassigned KAFKA-3829:
--

Assignee: Jason Gustafson  (was: Liquan Pei)

> Warn that kafka-connect group.id must not conflict with connector names
> ---
>
> Key: KAFKA-3829
> URL: https://issues.apache.org/jira/browse/KAFKA-3829
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.9.0.1
>Reporter: Barry Kaplan
>Assignee: Jason Gustafson
>Priority: Critical
>  Labels: documentation
> Fix For: 0.10.1.0
>
>
> If the group.id value happens to be the same as a connector's name, the 
> following error will be issued:
> {quote}
> Attempt to join group connect-elasticsearch-indexer failed due to: The group 
> member's supported protocols are incompatible with those of existing members.
> {quote}
> Maybe the documentation for Distributed Worker Configuration group.id could 
> be worded:
> {quote}
> A unique string that identifies the Connect cluster group this worker belongs 
> to. This value must be different than all connector configuration 'name' 
> properties.
> {quote}
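
A hedged illustration of the constraint (all names below are made up):

{code}
# connect-distributed.properties (worker configuration)
group.id=connect-cluster

# Connector configuration submitted to the worker. Per this ticket, the
# 'name' value must not conflict with the worker's group.id, otherwise
# group members fail to join with the "incompatible protocols" error
# quoted above.
name=elasticsearch-indexer
{code}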



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3829) Warn that kafka-connect group.id must not conflict with connector names

2016-09-23 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517622#comment-15517622
 ] 

Jason Gustafson commented on KAFKA-3829:


[~liquanpei] Going to go ahead and pick this up since the code freeze is just a 
week out. Let me know if you have time and would rather finish it yourself.

> Warn that kafka-connect group.id must not conflict with connector names
> ---
>
> Key: KAFKA-3829
> URL: https://issues.apache.org/jira/browse/KAFKA-3829
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.9.0.1
>Reporter: Barry Kaplan
>Assignee: Liquan Pei
>Priority: Critical
>  Labels: documentation
> Fix For: 0.10.1.0
>
>
> If the group.id value happens to be the same as a connector's name, the 
> following error will be issued:
> {quote}
> Attempt to join group connect-elasticsearch-indexer failed due to: The group 
> member's supported protocols are incompatible with those of existing members.
> {quote}
> Maybe the documentation for Distributed Worker Configuration group.id could 
> be worded:
> {quote}
> A unique string that identifies the Connect cluster group this worker belongs 
> to. This value must be different than all connector configuration 'name' 
> properties.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-3719) Pattern regex org.apache.kafka.common.utils.Utils.HOST_PORT_PATTERN is too narrow

2016-09-23 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-3719.

   Resolution: Fixed
Fix Version/s: 0.10.1.0

Issue resolved by pull request 1856
[https://github.com/apache/kafka/pull/1856]

> Pattern regex org.apache.kafka.common.utils.Utils.HOST_PORT_PATTERN is too 
> narrow
> -
>
> Key: KAFKA-3719
> URL: https://issues.apache.org/jira/browse/KAFKA-3719
> Project: Kafka
>  Issue Type: Bug
>Reporter: Balazs Kossovics
>Assignee: Ryan P
>Priority: Trivial
> Fix For: 0.10.1.0
>
>
> In our continuous integration environment the Kafka brokers run on hosts 
> containing underscores in their names. The current regex incorrectly splits 
> these names into host and port parts.
> I could submit a pull request if someone confirms that this is indeed a bug.
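
A hedged illustration of the failure mode; the two patterns below are 
simplified stand-ins for {{Utils.HOST_PORT_PATTERN}}, not the actual regex:

{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HostPortDemo {
    // A character class without '_' rejects hosts like "ci_worker-3":
    static final Pattern NARROW = Pattern.compile("^([0-9a-zA-Z\\-.]*):([0-9]+)$");
    // Adding '_' to the class accepts them:
    static final Pattern FIXED = Pattern.compile("^([0-9a-zA-Z\\-._]*):([0-9]+)$");

    public static void main(String[] args) {
        String hostPort = "ci_worker-3:9092";
        System.out.println(NARROW.matcher(hostPort).matches()); // false
        Matcher m = FIXED.matcher(hostPort);
        if (m.matches())
            System.out.println("host=" + m.group(1) + ", port=" + m.group(2));
    }
}
{code}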



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3719) Pattern regex org.apache.kafka.common.utils.Utils.HOST_PORT_PATTERN is too narrow

2016-09-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517614#comment-15517614
 ] 

ASF GitHub Bot commented on KAFKA-3719:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1856


> Pattern regex org.apache.kafka.common.utils.Utils.HOST_PORT_PATTERN is too 
> narrow
> -
>
> Key: KAFKA-3719
> URL: https://issues.apache.org/jira/browse/KAFKA-3719
> Project: Kafka
>  Issue Type: Bug
>Reporter: Balazs Kossovics
>Assignee: Ryan P
>Priority: Trivial
> Fix For: 0.10.1.0
>
>
> In our continuous integration environment the Kafka brokers run on hosts 
> containing underscores in their names. The current regex incorrectly splits 
> these names into host and port parts.
> I could submit a pull request if someone confirms that this is indeed a bug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1856: KAFKA-3719: Allow underscores in hostname

2016-09-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1856


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (KAFKA-4212) Add a key-value store that is a TTL persistent cache

2016-09-23 Thread Elias Levy (JIRA)
Elias Levy created KAFKA-4212:
-

 Summary: Add a key-value store that is a TTL persistent cache
 Key: KAFKA-4212
 URL: https://issues.apache.org/jira/browse/KAFKA-4212
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Affects Versions: 0.10.0.1
Reporter: Elias Levy
Assignee: Guozhang Wang


Some jobs need to maintain as state a large set of key-values for some period 
of time.  I.e. they need to maintain a TTL cache of values potentially larger 
than memory. 

Currently Kafka Streams provides non-windowed and windowed key-value stores.  
Neither is an exact fit for this use case.  

The {{RocksDBStore}}, a {{KeyValueStore}}, stores one value per key as 
required, but does not support expiration.  The TTL option of RocksDB is 
explicitly not used.

The {{RocksDBWindowsStore}}, a {{WindowsStore}}, can expire items via segment 
dropping, but it stores multiple items per key, based on their timestamp.  
However, this store can be repurposed as a cache by fetching the items in 
reverse chronological order and returning the first item found.

KAFKA-2594 introduced a fixed-capacity in-memory LRU caching store, but here we 
desire a variable-capacity memory-overflowing TTL caching store.

Although {{RocksDBWindowsStore}} can be repurposed as a cache, it would be 
useful to have an official and proper TTL cache API and implementation.
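
A minimal sketch of the repurposing trick described above, assuming a 
{{WindowStore}} obtained from the {{ProcessorContext}} (the class and method 
names here are illustrative):

{code}
import org.apache.kafka.streams.state.WindowStore;
import org.apache.kafka.streams.state.WindowStoreIterator;

public class WindowStoreTtlCache<K, V> {
    private final WindowStore<K, V> store;
    private final long ttlMs;

    public WindowStoreTtlCache(WindowStore<K, V> store, long ttlMs) {
        this.store = store;
        this.ttlMs = ttlMs;
    }

    public void put(K key, V value) {
        store.put(key, value); // expired eventually via segment dropping
    }

    // Returns the most recent value for the key within the TTL, or null.
    public V get(K key, long nowMs) {
        V latest = null;
        // fetch() iterates oldest-to-newest, so the last element is the newest;
        // this is equivalent to a reverse-chronological scan for the first hit.
        try (WindowStoreIterator<V> iter = store.fetch(key, nowMs - ttlMs, nowMs)) {
            while (iter.hasNext())
                latest = iter.next().value;
        }
        return latest;
    }
}
{code}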



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4207) Partitions stopped after a rapid restart of a broker

2016-09-23 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517526#comment-15517526
 ] 

Jun Rao commented on KAFKA-4207:


This seems to be the same issue as reported in 
https://issues.apache.org/jira/browse/KAFKA-1342. We can probably just 
consolidate.

> Partitions stopped after a rapid restart of a broker
> 
>
> Key: KAFKA-4207
> URL: https://issues.apache.org/jira/browse/KAFKA-4207
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.9.0.1, 0.10.0.1
>Reporter: Dustin Cote
>
> Environment:
> 4 Kafka brokers
> 10,000 topics with one partition each, replication factor 3
> Partitions with 4KB data each
> No data being produced or consumed
> Scenario:
> Initiate controlled shutdown on one broker
> Interrupt the controlled shutdown prior to completion with a SIGKILL
> Immediately start a new broker with the same broker ID as the one that was 
> just killed
> Symptoms:
> After starting the new broker, the other three brokers in the cluster will 
> see under-replicated partitions forever for some partitions that are hosted 
> on the broker that was killed and restarted
> Cause:
> Today, the controller sends a StopReplica command for each replica hosted on 
> a broker that has initiated a controlled shutdown.  For a large number of 
> replicas this can take a while.  When the broker that is doing the controlled 
> shutdown is killed, the StopReplica commands remain queued up even though the 
> request queue to the broker is cleared.  When the broker comes back online, 
> the StopReplica commands that were queued get sent to the broker that just 
> started up.  
> CC: [~junrao] since he's familiar with the scenario seen here



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-3590) KafkaConsumer fails with "Messages are rejected since there are fewer in-sync replicas than required." when polling

2016-09-23 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-3590.

   Resolution: Fixed
Fix Version/s: 0.10.2.0

Issue resolved by pull request 1859
[https://github.com/apache/kafka/pull/1859]

> KafkaConsumer fails with "Messages are rejected since there are fewer in-sync 
> replicas than required." when polling
> ---
>
> Key: KAFKA-3590
> URL: https://issues.apache.org/jira/browse/KAFKA-3590
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
> Environment: JDK1.8 Ubuntu 14.04
>Reporter: Sergey Alaev
>Assignee: Jason Gustafson
> Fix For: 0.10.1.0, 0.10.2.0
>
>
> KafkaConsumer.poll() fails with "Messages are rejected since there are fewer 
> in-sync replicas than required.". Isn't this message about minimum number of 
> ISR's when *sending* messages?
> Stacktrace:
> org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: 
> Messages are rejected since there are fewer in-sync replicas than required.
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:444)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:411)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:644)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:222)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:311)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:890)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853) 
> ~[kafka-clients-0.9.0.1.jar:na]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3590) KafkaConsumer fails with "Messages are rejected since there are fewer in-sync replicas than required." when polling

2016-09-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517451#comment-15517451
 ] 

ASF GitHub Bot commented on KAFKA-3590:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1859


> KafkaConsumer fails with "Messages are rejected since there are fewer in-sync 
> replicas than required." when polling
> ---
>
> Key: KAFKA-3590
> URL: https://issues.apache.org/jira/browse/KAFKA-3590
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
> Environment: JDK1.8 Ubuntu 14.04
>Reporter: Sergey Alaev
>Assignee: Jason Gustafson
> Fix For: 0.10.1.0, 0.10.2.0
>
>
> KafkaConsumer.poll() fails with "Messages are rejected since there are fewer 
> in-sync replicas than required.". Isn't this message about minimum number of 
> ISR's when *sending* messages?
> Stacktrace:
> org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: 
> Messages are rejected since there are fewer in-sync replicas than required.
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:444)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:411)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:644)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:222)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:311)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:890)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853) 
> ~[kafka-clients-0.9.0.1.jar:na]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1859: KAFKA-3590: Handle not-enough-replicas errors when...

2016-09-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1859


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-4201) Add an --assignment-strategy option to new-consumer-based Mirror Maker

2016-09-23 Thread Vahid Hashemian (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vahid Hashemian updated KAFKA-4201:
---
Summary: Add an --assignment-strategy option to new-consumer-based Mirror 
Maker  (was: Add an --assignment-strategy option to Mirror Maker)

> Add an --assignment-strategy option to new-consumer-based Mirror Maker
> --
>
> Key: KAFKA-4201
> URL: https://issues.apache.org/jira/browse/KAFKA-4201
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>
> The default assignment strategy in mirror maker will be changed from range to 
> round robin in an upcoming release ([see 
> KAFKA-3818|https://issues.apache.org/jira/browse/KAFKA-3818]). In order to 
> make it easier for users to change the assignment strategy, add an 
> {{--assignment-strategy}} option to the Mirror Maker command line tool.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4201) Add an --assignment-strategy option to Mirror Maker

2016-09-23 Thread Vahid Hashemian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517269#comment-15517269
 ] 

Vahid Hashemian commented on KAFKA-4201:


[~hachikuji] It seems to me that the default assignment strategy of MM is 
inherited from the default strategy of both old and new consumers. So, with 
this change we are adding a config option to MM to override the default 
strategy of the consumers or any strategy set in the consumer configuration. If 
that is correct, then the fix for 
[KAFKA-3818|https://issues.apache.org/jira/browse/KAFKA-3818] is merely setting 
a (new) default for this new config.

> Add an --assignment-strategy option to Mirror Maker
> ---
>
> Key: KAFKA-4201
> URL: https://issues.apache.org/jira/browse/KAFKA-4201
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>
> The default assignment strategy in mirror maker will be changed from range to 
> round robin in an upcoming release ([see 
> KAFKA-3818|https://issues.apache.org/jira/browse/KAFKA-3818]). In order to 
> make it easier for users to change the assignment strategy, add an 
> {{--assignment-strategy}} option to the Mirror Maker command line tool.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-48 Support for delegation tokens as an authentication mechanism

2016-09-23 Thread Gwen Shapira
Is it updated? Are all concerns addressed? Do you want to start a vote?

Sorry for being pushy, I do appreciate that we are all volunteers and
finding time is difficult. This feature is important for anything that
integrates with Kafka (stream processors, Flume, NiFi, etc) and I
don't want to see this getting stuck because we lack coordination
within the community.

On Thu, Sep 15, 2016 at 6:39 PM, Harsha Chintalapani  wrote:
> The only pending update for the KIP is to write up the protocol changes like
> we've done for KIP-4. I'll update the wiki.
>
>
> On Thu, Sep 15, 2016 at 4:27 PM Ashish Singh  wrote:
>>
>> I think we decided to not support secret rotation, I guess this can be
>> stated clearly on the KIP. Also, more details on how clients will perform
>> token distribution and how CLI will look like will be helpful.
>>
>> On Thu, Sep 15, 2016 at 3:20 PM, Gwen Shapira  wrote:
>>
>> > Hi Guys,
>> >
>> > This discussion was dead for a while. Are there still contentious
>> > points? If not, why are there no votes?
>> >
>> > On Tue, Aug 23, 2016 at 1:26 PM, Jun Rao  wrote:
>> > > Ashish,
>> > >
>> > > Yes, I will send out a KIP invite for next week to discuss KIP-48 and
>> > other
>> > > remaining KIPs.
>> > >
>> > > Thanks,
>> > >
>> > > Jun
>> > >
>> > > On Tue, Aug 23, 2016 at 1:22 PM, Ashish Singh 
>> > wrote:
>> > >
>> > >> Thanks Harsha!
>> > >>
>> >> Jun, can we add KIP-48 to next KIP hangout's agenda. Also, we did not
>> >> actually make a call on when we should have next KIP call. As there are
>> >> a few outstanding KIPs that could not be discussed this week, can we
>> >> have a KIP hangout call next week?
>> > >>
>> > >> On Tue, Aug 23, 2016 at 1:10 PM, Harsha Chintalapani
>> > >> 
>> > >> wrote:
>> > >>
>> > >>> Ashish,
>> > >>> Yes we are working on it. Lets discuss in the next KIP
>> > >>> meeting.
>> > >>> I'll join.
>> > >>> -Harsha
>> > >>>
>> > >>> On Tue, Aug 23, 2016 at 12:07 PM Ashish Singh 
>> > >>> wrote:
>> > >>>
>> > >>> > Hello Harsha,
>> > >>> >
>> > >>> > Are you still working on this? Wondering if we can discuss this in
>> > next
>> > >>> KIP
>> > >>> > meeting, if you can join.
>> > >>> >
>> > >>> > On Mon, Jul 18, 2016 at 9:51 AM, Harsha Chintalapani <
>> > ka...@harsha.io>
>> > >>> > wrote:
>> > >>> >
>> > >>> > > Hi Grant,
>> > >>> > >   We are working on it. Will add the details to KIP
>> > >>> > > about
>> > the
>> > >>> > > request protocol.
>> > >>> > >
>> > >>> > > Thanks,
>> > >>> > > Harsha
>> > >>> > >
>> > >>> > > On Mon, Jul 18, 2016 at 6:50 AM Grant Henke
>> > >>> > > 
>> > >>> wrote:
>> > >>> > >
>> > >>> > > > Hi Parth,
>> > >>> > > >
>> > >>> > > > Are you still working on this? If you need any help please
>> > >>> > > > don't
>> > >>> > hesitate
>> > >>> > > > to ask.
>> > >>> > > >
>> > >>> > > > Thanks,
>> > >>> > > > Grant
>> > >>> > > >
>> > >>> > > > On Thu, Jun 30, 2016 at 4:35 PM, Jun Rao 
>> > wrote:
>> > >>> > > >
>> > >>> > > > > Parth,
>> > >>> > > > >
>> > >>> > > > > Thanks for the reply.
>> > >>> > > > >
>> >>> > > > > It makes sense to only allow the renewal by users that
>> >>> > > > > authenticated using *non* delegation token mechanism. Then,
>> >>> > > > > should we make the renewal a list? For example, in the case of
>> >>> > > > > rest proxy, it will be useful for every instance of rest proxy
>> >>> > > > > to be able to renew the tokens.
>> > >>> > > > >
>> >>> > > > > It would be clearer if we can document the request protocol like
>> >>> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicsRequest(KAFKA-2945):(VotedandPlannedforin0.10.1.0)
>> >>> > > > > .
>> > >>> > > > >
>> > >>> > > > > It would also be useful to document the client APIs.
>> > >>> > > > >
>> > >>> > > > > Thanks,
>> > >>> > > > >
>> > >>> > > > > Jun
>> > >>> > > > >
>> > >>> > > > > On Tue, Jun 28, 2016 at 2:55 PM, parth brahmbhatt <
>> > >>> > > > > brahmbhatt.pa...@gmail.com> wrote:
>> > >>> > > > >
>> > >>> > > > > > Hi,
>> > >>> > > > > >
>> >>> > > > > > I am suggesting that we will only allow the renewal by users
>> >>> > > > > > that authenticated using *non* delegation token mechanism. For
>> >>> > > > > > example, if user Alice authenticated using kerberos and
>> >>> > > > > > requested delegation tokens, only user Alice authenticated via
>> >>> > > > > > non delegation token mechanism can renew.
>> > >>> > > > 

[jira] [Comment Edited] (KAFKA-4204) KafkaService.verify_reassign_partitions is a no-op

2016-09-23 Thread Apurva Mehta (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517207#comment-15517207
 ] 

Apurva Mehta edited comment on KAFKA-4204 at 9/23/16 6:36 PM:
--

Another issue with {{KafkaService.verify_reassign_partitions}}: it doesn't 
explicitly check for reassignment failure. I.e., any completion is treated as a 
success. This means that it reports success even if the reassignment completely 
failed. This also needs to be addressed by any fix associated with this ticket.



was (Author: apurva):
Another issue with `KafkaService.verify_reassign_partitions`: it doesn't 
explicitly check for reassignment failure. I.e., any completion is treated as a 
success. This means that it reports success even if the reassignment completely 
failed. This also needs to be addressed by any fix associated with this ticket.


> KafkaService.verify_reassign_partitions is a no-op 
> ---
>
> Key: KAFKA-4204
> URL: https://issues.apache.org/jira/browse/KAFKA-4204
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.1
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>
> In the 'verify_reassign_partitions' method of the KafkaService class in the 
> system tests, we execute the kafka-reassign-partitions command and then do a 
> regular expression match on the tool's output to verify that the reassignment 
> succeeded. 
> In particular, we search for the pattern 'is in progress' in the output 
> string. If the pattern exists, it means that the reassignment is still in 
> progress. 
> As it stands, this mechanism is broken because the tool outputs 'is still in 
> progress' for each reassignment which hasn't completed. Further, the tool 
> outputs a multi-line string, but the regex does not factor this in. 
> In general, depending on a specific pattern on stdout to determine success or 
> failure of an operation like reassignment is very fragile. 
> The right thing to do would be for the tool to output a well-defined data 
> structure, which can be accurately interpreted by the test or any other 
> program which needs that information. 
> This JIRA is going to track the discussion and progress of implementing the 
> latter. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4204) KafkaService.verify_reassign_partitions is a no-op

2016-09-23 Thread Apurva Mehta (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517207#comment-15517207
 ] 

Apurva Mehta commented on KAFKA-4204:
-

Another issue with `KafkaService.verify_reassign_partitions`: it doesn't 
explicitly check for reassignment failure. I.e., any completion is treated as a 
success. This means that it reports success even if the reassignment completely 
failed. This also needs to be addressed by any fix associated with this ticket.


> KafkaService.verify_reassign_partitions is a no-op 
> ---
>
> Key: KAFKA-4204
> URL: https://issues.apache.org/jira/browse/KAFKA-4204
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.1
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>
> In the 'verify_reassign_partitions' method of the KafkaService class in the 
> system tests, we execute the kafka-reassign-partitions command and then do a 
> regular expression match on the tool's output to verify that the reassignment 
> succeeded. 
> In particular, we search for the pattern 'is in progress' in the output 
> string. If the pattern exists, it means that the reassignment is still in 
> progress. 
> As it stands, this mechanism is broken because the tool outputs 'is still in 
> progress' for each reassignment which hasn't completed. Further, the tool 
> outputs a multi-line string, but the regex does not factor this in. 
> In general, depending on a specific pattern on stdout to determine success or 
> failure of an operation like reassignment is very fragile. 
> The right thing to do would be for the tool to output a well-defined data 
> structure, which can be accurately interpreted by the test or any other 
> program which needs that information. 
> This JIRA is going to track the discussion and progress of implementing the 
> latter. 
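
To make the mismatch concrete, a hedged illustration in Java (the actual 
system test is written in Python, but the logic is the same):

{code}
import java.util.regex.Pattern;

public class ReassignCheckDemo {
    public static void main(String[] args) {
        String output = "Status of partition reassignment:\n"
                + "Reassignment of partition [test_topic,0] is still in progress\n"
                + "Reassignment of partition [test_topic,1] completed successfully\n";

        // The tool prints "is still in progress", so a literal search for
        // "is in progress" never matches and the check is a no-op:
        boolean naive = Pattern.compile("is in progress").matcher(output).find();
        boolean robust = Pattern.compile("is (still )?in progress").matcher(output).find();

        System.out.println(naive);  // false -- the broken check
        System.out.println(robust); // true  -- reassignment is still in progress
    }
}
{code}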



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4211) Change system tests to use the new consumer by default

2016-09-23 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517183#comment-15517183
 ] 

Ismael Juma commented on KAFKA-4211:


[~vahid], I haven't looked in detail, so I am not sure to be honest. :) 
[~geoffra] or [~hachikuji] may have an opinion. If not, then maybe you could 
take a look and suggest something more concrete that makes sense.

> Change system tests to use the new consumer by default
> --
>
> Key: KAFKA-4211
> URL: https://issues.apache.org/jira/browse/KAFKA-4211
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
> Fix For: 0.10.2.0
>
>
> We have utility methods like `run_produce_consume_validate` that use the old 
> consumer by default. We should change them to use the new consumer by default 
> while ensuring that we still have coverage for the old consumer (for as long 
> as we still support it).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4211) Change system tests to use the new consumer by default

2016-09-23 Thread Vahid Hashemian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517177#comment-15517177
 ] 

Vahid Hashemian commented on KAFKA-4211:


[~ijuma] So the default consumer in all those tests needs to change to the "new 
consumer", and for every existing test that uses the "old consumer" the same 
test should exist after this change of default? I could give it a try.

> Change system tests to use the new consumer by default
> --
>
> Key: KAFKA-4211
> URL: https://issues.apache.org/jira/browse/KAFKA-4211
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
> Fix For: 0.10.2.0
>
>
> We have utility methods like `run_produce_consume_validate` that use the old 
> consumer by default. We should change them to use the new consumer by default 
> while ensuring that we still have coverage for the old consumer (for as long 
> as we still support it).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-3906) Connect logical types do not support nulls.

2016-09-23 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan resolved KAFKA-3906.

Resolution: Fixed

I think we should handle null values at the converter layer to avoid 
duplication in logical type impls, as [~ewencp] suggested in the PR. KAFKA-4183 
fixes the null handling for logical types in {{JsonConverter}}.

> Connect logical types do not support nulls.
> ---
>
> Key: KAFKA-3906
> URL: https://issues.apache.org/jira/browse/KAFKA-3906
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0
>Reporter: Jeremy Custenborder
>Assignee: Ewen Cheslack-Postava
>
> The logical types for Kafka Connect do not support null data values. Date, 
> Decimal, Time, and Timestamp all will throw null reference exceptions if a 
> null is passed in to their fromLogical and toLogical methods. Date, Time, and 
> Timestamp require signature changes for these methods to support nullable 
> types.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4211) Change system tests to use the new consumer by default

2016-09-23 Thread Ismael Juma (JIRA)
Ismael Juma created KAFKA-4211:
--

 Summary: Change system tests to use the new consumer by default
 Key: KAFKA-4211
 URL: https://issues.apache.org/jira/browse/KAFKA-4211
 Project: Kafka
  Issue Type: Improvement
Reporter: Ismael Juma
 Fix For: 0.10.2.0


We have utility methods like `run_produce_consume_validate` that use the old 
consumer by default. We should change them to use the new consumer by default 
while ensuring that we still have coverage for the old consumer (for as long as 
we still support it).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4210) Improve Replication Follower Throttling Stability By Checking Quota During Backoff

2016-09-23 Thread Ben Stopford (JIRA)
Ben Stopford created KAFKA-4210:
---

 Summary: Improve Replication Follower Throttling Stability By 
Checking Quota During Backoff
 Key: KAFKA-4210
 URL: https://issues.apache.org/jira/browse/KAFKA-4210
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.11.0.0
Reporter: Ben Stopford
Assignee: Ben Stopford


In Replication Throttling, the leader is more stable than the follower, 
particularly at low throttle throughputs, because the leader uses Purgatory's 
Delayed Fetch feature to terminate a blocked request early should the quota no 
longer be breached. 

We can simulate the same behaviour on the follower simply by altering the way 
the back-off works in the AbstractFetcherThread so that, for replication 
throttling, it checks the quota during the pause to see if a new request should 
be sent early. 
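
A hedged sketch of the idea (the names are illustrative, not the actual 
AbstractFetcherThread API): rather than sleeping for the whole back-off, the 
follower polls the quota in small slices and resumes fetching as soon as the 
quota has headroom again:

{code}
import java.util.function.BooleanSupplier;

public class QuotaAwareBackoff {
    // Sleep at most backoffMs, but wake early once the quota is no longer
    // breached, mirroring the leader's Purgatory-based early completion.
    public static void backoff(long backoffMs, long pollIntervalMs,
                               BooleanSupplier quotaBreached) throws InterruptedException {
        long deadline = System.currentTimeMillis() + backoffMs;
        long remaining;
        while ((remaining = deadline - System.currentTimeMillis()) > 0) {
            if (!quotaBreached.getAsBoolean())
                return; // quota has headroom: send the next fetch early
            Thread.sleep(Math.min(pollIntervalMs, remaining));
        }
    }
}
{code}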



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3396) Unauthorized topics are returned to the user

2016-09-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516761#comment-15516761
 ] 

ASF GitHub Bot commented on KAFKA-3396:
---

Github user edoardocomar closed the pull request at:

https://github.com/apache/kafka/pull/1428


> Unauthorized topics are returned to the user
> 
>
> Key: KAFKA-3396
> URL: https://issues.apache.org/jira/browse/KAFKA-3396
> Project: Kafka
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.9.0.0, 0.10.0.0
>Reporter: Grant Henke
>Assignee: Edoardo Comar
>Priority: Critical
> Fix For: 0.10.1.0
>
>
> Kafka's clients and protocol expose unauthorized topics to the end user. 
> This is often considered a security hole. To some, the topic name is 
> considered sensitive information. Those that do not consider the name 
> sensitive still consider it extra information that allows a user to try to 
> circumvent security.  Instead, if a user does not have access to the topic, 
> the servers should act as if the topic does not exist. 
> To solve this some of the changes could include:
>   - The broker should not return a TOPIC_AUTHORIZATION(29) error for 
> requests (metadata, produce, fetch, etc) that include a topic that the user 
> does not have DESCRIBE access to.
>   - A user should not receive a TopicAuthorizationException when they do 
> not have DESCRIBE access to a topic or the cluster.
>  - The client should not maintain and expose a list of unauthorized 
> topics in org.apache.kafka.common.Cluster. 
> Other changes may be required that are not listed here. Further analysis is 
> needed. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1428: KAFKA-3396 : Unauthorized topics are returned to t...

2016-09-23 Thread edoardocomar
Github user edoardocomar closed the pull request at:

https://github.com/apache/kafka/pull/1428


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-2392) Kafka Server does not accept 0 as a port

2016-09-23 Thread JIRA

[ 
https://issues.apache.org/jira/browse/KAFKA-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516397#comment-15516397
 ] 

Stig Rohde Døssing commented on KAFKA-2392:
---

I don't know if this helps you, but this can already be done if you're willing 
to run Kafka in-memory. We're using Curator's TestingServer to run an in-memory 
Zookeeper, which allows you to get the assigned port 
https://curator.apache.org/apidocs/org/apache/curator/test/TestingServer.html#getConnectString--.
 You can then use that string to start an in-memory Kafka broker. 

{code}
TestingServer testingServer = new TestingServer();
Properties properties = new Properties();
properties.setProperty("zookeeper.connect", testingServer.getConnectString());
properties.setProperty("advertised.host.name", "localhost");
properties.setProperty("host.name", "localhost");
properties.setProperty("port", "0");

KafkaServer server = new KafkaServer(new KafkaConfig(properties), new 
SystemTime(), scala.Option$.MODULE$.apply(null));
server.startup();
{code}
Where SystemTime is defined as:
{code}

public class SystemTime implements Time {
public long milliseconds() {
return System.currentTimeMillis();
}

public long nanoseconds() {
return System.nanoTime();
}

public void sleep(long ms) {
try {
Thread.sleep(ms);
} catch (InterruptedException e) {
// Ignore
}
}
}
{code}
We're on Kafka 0.9.0.1; the parameters may be slightly different on 0.8.

> Kafka Server does not accept 0 as a port
> 
>
> Key: KAFKA-2392
> URL: https://issues.apache.org/jira/browse/KAFKA-2392
> Project: Kafka
>  Issue Type: Bug
>  Components: config
>Affects Versions: 0.8.2.1
>Reporter: Buğra Gedik
>Priority: Minor
>
> I want to specify 0 as a port number for Zookeeper as well as the Kafka 
> Server. For instance, the server.properties configuration file has a 'port' 
> property but does not accept 0 as a value. Similarly, zookeeper.properties 
> has a 'clientPort' property but does not accept 0 as a value.
> I want 0 to specify that the port will be selected automatically (OS 
> assigned). In my use case, I want to run Zookeeper with an automatically 
> picked port, and use that port to create a Kafka Server configuration file, 
> that specifies the Kafka Server port as 0 as well. I parse the output from 
> the servers to figure out the actual ports used. All this is needed for a 
> testing environment.
> Not supporting automatically selected ports makes it difficult to run Kafka 
> server as part of our tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4209) Reduce time taken to run quota integration tests

2016-09-23 Thread Rajini Sivaram (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram updated KAFKA-4209:
--
Status: Patch Available  (was: Open)

> Reduce time taken to run quota integration tests
> 
>
> Key: KAFKA-4209
> URL: https://issues.apache.org/jira/browse/KAFKA-4209
> Project: Kafka
>  Issue Type: Test
>  Components: unit tests
>Affects Versions: 0.10.1.0
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>
> Quota integration tests take over a minute to run for each class. Since there 
> are now three versions of the test (for client-id, user, and <user, client-id> 
> quotas), the total time for these tests has a big impact on the total test run 
> time. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4209) Reduce time taken to run quota integration tests

2016-09-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516137#comment-15516137
 ] 

ASF GitHub Bot commented on KAFKA-4209:
---

GitHub user rajinisivaram opened a pull request:

https://github.com/apache/kafka/pull/1902

KAFKA-4209: Reduce run time for quota integration tests

Run quota tests which expect throttling only until the first 
produce/consume request is throttled.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rajinisivaram/kafka KAFKA-4209

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1902.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1902


commit 8425363159d5237a89e7ce738f9409daa1429f60
Author: Rajini Sivaram 
Date:   2016-09-23T09:38:31Z

KAFKA-4209: Reduce run time for quota integration tests




> Reduce time taken to run quota integration tests
> 
>
> Key: KAFKA-4209
> URL: https://issues.apache.org/jira/browse/KAFKA-4209
> Project: Kafka
>  Issue Type: Test
>  Components: unit tests
>Affects Versions: 0.10.1.0
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>
> Quota integration tests take over a minute to run for each class. Since there 
> are now three versions of the test (for client-id, user, and <user, client-id> 
> quotas), the total time for these tests has a big impact on the total test run 
> time. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1902: KAFKA-4209: Reduce run time for quota integration ...

2016-09-23 Thread rajinisivaram
GitHub user rajinisivaram opened a pull request:

https://github.com/apache/kafka/pull/1902

KAFKA-4209: Reduce run time for quota integration tests

Run quota tests which expect throttling only until the first 
produce/consume request is throttled.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rajinisivaram/kafka KAFKA-4209

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1902.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1902


commit 8425363159d5237a89e7ce738f9409daa1429f60
Author: Rajini Sivaram 
Date:   2016-09-23T09:38:31Z

KAFKA-4209: Reduce run time for quota integration tests




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (KAFKA-4209) Reduce time taken to run quota integration tests

2016-09-23 Thread Rajini Sivaram (JIRA)
Rajini Sivaram created KAFKA-4209:
-

 Summary: Reduce time taken to run quota integration tests
 Key: KAFKA-4209
 URL: https://issues.apache.org/jira/browse/KAFKA-4209
 Project: Kafka
  Issue Type: Test
  Components: unit tests
Affects Versions: 0.10.1.0
Reporter: Rajini Sivaram
Assignee: Rajini Sivaram


Quota integration tests take over a minute to run for each class. Since there 
are now three versions of the test (for client-id, user, and <user, client-id> 
quotas), the total time for these tests has a big impact on the total test run 
time. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-2170) 10 LogTest cases failed for file.renameTo failed under windows

2016-09-23 Thread Harald Kirsch (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515792#comment-15515792
 ] 

Harald Kirsch edited comment on KAFKA-2170 at 9/23/16 8:37 AM:
---

Better, but not yet there, it seems. Here is what I did.

I applied the .diff file of the 1757 pull request, as downloaded from GitHub, 
with the patch command to 106a7456060750ab0604d290b8c1e33a57adbf20 from 
http://git-wip-us.apache.org/repos/asf/kafka.git. It applied just fine.

Built the tgz, put it on my test machine, and ran Kafka with these settings:
{code}
log.segment.bytes=6000111
log.cleaner.enable=true
log.cleanup.policy=compact
log.cleaner.min.cleanable.ratio=0.01
log.cleaner.backoff.ms=15000
log.segment.delete.delay.ms=600
auto.create.topics.enable=false
{code}

Ran in around 75 files as binary blobs between 10k and 6M in size, twice. The 
cleanup triggered and worked just fine.

Tried this a few more times, also running the files in in quick succession, 
and it worked just fine.

Stopped Kafka with CTRL-C; it shut down nicely and restarted (mentioning this 
just for completeness, not sure whether it is relevant).

Tried this a few more times, running the files in roughly 15 times in quick 
succession, and it bombed out as follows:

{code}
kafka.common.KafkaStorageException: Failed to change the log file suffix from  
to .deleted for log segment 695
at kafka.log.LogSegment.kafkaStorageException$1(LogSegment.scala:327)
at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:329)
at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:956)
at kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1002)
at kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:997)
at scala.collection.immutable.List.foreach(List.scala:318)
at kafka.log.Log.replaceSegments(Log.scala:997)
at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:425)
at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:364)
at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363)
at scala.collection.immutable.List.foreach(List.scala:318)
at kafka.log.Cleaner.clean(LogCleaner.scala:363)
at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:239)
at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:218)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
Caused by: java.nio.file.FileAlreadyExistsException: 
C:\Users\hk\tmp\kafka-data\hktest-0\0695.log -> 
C:\Users\hk\tmp\kafka-data\hktest-0\0695.log.deleted
at 
sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:81)
at 
sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
at 
sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
at java.nio.file.Files.move(Files.java:1395)
at 
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:670)
at kafka.log.FileMessageSet.renameTo(FileMessageSet.scala:431)
... 14 more
Suppressed: java.nio.file.AccessDeniedException: 
C:\Users\hk\tmp\kafka-data\hktest-0\0695.log -> 
C:\Users\hk\tmp\kafka-data\hktest-0\0695.log.deleted
at 
sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:83)
at 
sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301)
at 
sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
at java.nio.file.Files.move(Files.java:1395)
at 
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:667)
... 15 more
[2016-09-23 10:12:36,254] INFO [kafka-log-cleaner-thread-0], Stopped  
(kafka.log.LogCleaner)
{code}

Afterwards I restarted Kafka, which worked without complaints. I ran one round 
on the 75 files and the log cleaner cleaned up just fine, so it looks like 
you're pretty close to a working solution.
Let me know if I need to provide more information or run a different experiment.


was (Author: haraldk):
Better, but not yet there, it seems. Here is what I did.

I applied the .diff file of the 1757 pull request, as downloaded from GitHub, 
with the patch command to 106a7456060750ab0604d290b8c1e33a57adbf20 from 
http://git-wip-us.apache.org/repos/asf/kafka.git. It applied just fine.

Built the tgz, put it on my test machine, and ran Kafka with these settings:
{code}
log.segment.bytes=6000111
log.cleaner.enable=true
log.cleanup.policy=compact
log.cleaner.min.cleanable.ratio=0.01
log.cleaner.backoff.ms=15000
log.segment.delete.delay.ms=600
auto.create.topics.enable=false

[jira] [Commented] (KAFKA-2170) 10 LogTest cases failed for file.renameTo failed under windows

2016-09-23 Thread Harald Kirsch (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515792#comment-15515792
 ] 

Harald Kirsch commented on KAFKA-2170:
--

Better, but not yet there, it seems. Here is what I did.

I applied the .diff file of the 1757 pull request, as downloaded from GitHub, 
with the patch command to 106a7456060750ab0604d290b8c1e33a57adbf20 from 
http://git-wip-us.apache.org/repos/asf/kafka.git. It applied just fine.

Built the tgz, put it on my test machine, and ran Kafka with these settings:
{code}
log.segment.bytes=6000111
log.cleaner.enable=true
log.cleanup.policy=compact
log.cleaner.min.cleanable.ratio=0.01
log.cleaner.backoff.ms=15000
log.segment.delete.delay.ms=600
auto.create.topics.enable=false
{code}

Ran in around 75 files as binary blobs between 10k and 6M in size, twice. The 
cleanup triggered and worked just fine.

Tried this a few more times, also running the files in in quick succession, 
and it worked just fine.

Stopped Kafka with CTRL-C; it shut down nicely and restarted (mentioning this 
just for completeness, not sure whether it is relevant).

Tried this a few more times, running the files in roughly 15 times in quick 
succession, and it bombed out as follows:

{code}
kafka.common.KafkaStorageException: Failed to change the log file suffix from  to .deleted for log segment 695
    at kafka.log.LogSegment.kafkaStorageException$1(LogSegment.scala:327)
    at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:329)
    at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:956)
    at kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1002)
    at kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:997)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at kafka.log.Log.replaceSegments(Log.scala:997)
    at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:425)
    at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:364)
    at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at kafka.log.Cleaner.clean(LogCleaner.scala:363)
    at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:239)
    at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:218)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
Caused by: java.nio.file.FileAlreadyExistsException: C:\Users\hk\tmp\kafka-data\hktest-0\0695.log -> C:\Users\hk\tmp\kafka-data\hktest-0\0695.log.deleted
    at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:81)
    at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
    at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
    at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
    at java.nio.file.Files.move(Files.java:1395)
    at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:670)
    at kafka.log.FileMessageSet.renameTo(FileMessageSet.scala:431)
    ... 14 more
    Suppressed: java.nio.file.AccessDeniedException: C:\Users\hk\tmp\kafka-data\hktest-0\0695.log -> C:\Users\hk\tmp\kafka-data\hktest-0\0695.log.deleted
        at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:83)
        at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
        at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301)
        at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
        at java.nio.file.Files.move(Files.java:1395)
        at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:667)
        ... 15 more
[2016-09-23 10:12:36,254] INFO [kafka-log-cleaner-thread-0], Stopped  (kafka.log.LogCleaner)
{code}
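
The suppressed exception is telling: the pattern in org.apache.kafka.common.utils.Utils.atomicMoveWithFallback is, roughly, "try an atomic rename, fall back to a plain move, and attach the first failure as a suppressed exception". A minimal sketch of that shape (not the Kafka source; the comments describe this Windows failure specifically, not general behaviour):

{code}
import java.io.IOException
import java.nio.file.{Files, Path, StandardCopyOption}

object AtomicMoveSketch {
  def atomicMoveWithFallback(source: Path, target: Path): Unit = {
    try {
      // Atomic rename: on Windows this throws AccessDeniedException while the
      // segment file is still open or memory-mapped by another thread.
      Files.move(source, target, StandardCopyOption.ATOMIC_MOVE)
    } catch {
      case outer: IOException =>
        try {
          // Plain-move fallback: throws FileAlreadyExistsException when a stale
          // ".deleted" target is left over from an earlier failed attempt.
          Files.move(source, target)
        } catch {
          case inner: IOException =>
            // The failed atomic attempt becomes the suppressed exception --
            // exactly the "Suppressed: ... AccessDeniedException" in the trace.
            inner.addSuppressed(outer)
            throw inner
        }
    }
  }
}
{code}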

Let me know if I need to provide more information or run a different experiment.

> 10 LogTest cases failed for  file.renameTo failed under windows
> ---
>
> Key: KAFKA-2170
> URL: https://issues.apache.org/jira/browse/KAFKA-2170
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Windows
>Reporter: Honghai Chen
>Assignee: Jay Kreps
>
> Get the latest code from trunk, then run the test:
> gradlew -i core:test --tests kafka.log.LogTest
> Got 10 cases failing for the same reason:
> kafka.common.KafkaStorageException: Failed to change the log file suffix from  to .deleted for log segment 0
>   at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:259)
>   at 

Re: Does Kafka 0.9 can guarantee not loss data

2016-09-23 Thread Kafka
Oh, please ignore my last reply.
I see now that if leaderReplica.highWatermark.messageOffset >= requiredOffset, this 
already ensures that the LEO of every replica in curInSyncReplicas is >= requiredOffset, 
since the high watermark is the minimum LEO across the ISR.

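To make that concrete, here is a tiny self-contained model (illustrative names, not 
the real broker classes; runnable in the Scala REPL) of the completion check:

case class Replica(id: Int, leo: Long)

// The leader's high watermark is the minimum LEO across the current ISR,
// so hw >= requiredOffset implies every ISR member already has the message.
def canComplete(isr: Seq[Replica], requiredOffset: Long, minIsr: Int): Boolean = {
  val hw = isr.map(_.leo).min
  hw >= requiredOffset && isr.size >= minIsr
}

// Leader (leo = 10) plus a lagging follower (leo = 7), requiredOffset = 10, minIsr = 2:
// the ISR is large enough, but hw = 7 < 10, so the request stays in purgatory.
assert(!canComplete(Seq(Replica(1, 10), Replica(2, 7)), requiredOffset = 10, minIsr = 2))
// Once the follower catches up, hw = 10 and the acks=-1 response can be sent.
assert(canComplete(Seq(Replica(1, 10), Replica(2, 10)), requiredOffset = 10, minIsr = 2))
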
> On 23 Sep 2016, at 15:39, Kafka wrote:
> 
> OK, the example before was not enough to expose the problem.
> What happens when numAcks is 1 and curInSyncReplicas.size >= minIsr, but in
> fact only one replica in curInSyncReplicas has caught up to the leader, and
> that replica is the leader itself? This is not safe when the machine that
> hosts the leader partition's broker is restarted.
> 
> The current code is as follows:
> if (minIsr <= curInSyncReplicas.size) {
>   (true, ErrorMapping.NoError)
> } else {
>   (true, ErrorMapping.NotEnoughReplicasAfterAppendCode)
> }
> 
> Why not code like the following?
> if (minIsr <= curInSyncReplicas.size && minIsr <= numAcks) {
>   (true, ErrorMapping.NoError)
> } else {
>   (true, ErrorMapping.NotEnoughReplicasAfterAppendCode)
> }
> 
> It seems that a single condition in the Kafka broker's code is not enough to
> ensure safety, because the replicas in curInSyncReplicas are not strongly
> synchronized.
> 
>> On 23 Sep 2016, at 13:45, Becket Qin wrote:
>> 
>> In order to complete a produce response, two conditions must hold:
>> A. The leader's high watermark is at least the requiredOffset
>> (the max offset in that produce request for that partition).
>> B. The number of in-sync replicas is at least min.isr.
>> 
>> The ultimate goal here is to make sure at least min.isr replicas have
>> caught up to requiredOffset. So the check is not only whether we have
>> enough replicas in the ISR, but also whether the replicas in the ISR have
>> caught up to the required offset.
>> 
>> In your example, if numAcks is 0 and curInSyncReplicas.size >= minIsr, the
>> produce response won't return if min.isr > 0, because
>> leaderReplica.highWatermark must be less than requiredOffset given that
>> numAcks is 0, i.e. condition A is not met.
>> 
>> We are actually doing a stronger-than-necessary check here. Theoretically,
>> as long as min.isr replicas have caught up to requiredOffset we should be
>> able to return the response, but we also require those replicas to be in
>> the ISR.
>> 
>> On Thu, Sep 22, 2016 at 8:15 PM, Kafka  wrote:
>> 
>>> @wangguozhang, could you give me some advice?
>>> 
>>>> On 22 Sep 2016, at 18:56, Kafka wrote:
>>>> 
>>>> Hi all,
>>>> In terms of the topic, we create a topic with 6 partitions, each with 3
>>>> replicas.
>>>> In terms of the producer, we send messages with acks=-1 using the sync
>>>> interface.
>>>> In terms of the brokers, we set min.insync.replicas to 2.
>>>> 
>>>> After reviewing the Kafka broker's code, we know that when we send a
>>>> message to the broker with acks=-1, we get a response if the ISR of the
>>>> partition is greater than or equal to min.insync.replicas. What confuses
>>>> me is that the replicas in the ISR are not strongly consistent: in Kafka
>>>> 0.9 the replica.lag.time.max.ms parameter decides whether to shrink the
>>>> ISR, and its default is 10000 ms, so a replica in the ISR can lag the
>>>> leader by up to 10 seconds.
>>>> If we restart the broker that owns this partition's leader, the
>>>> controller starts a new leader election, which chooses the first replica
>>>> in the ISR that is not the current leader as the new leader, and this
>>>> can lose data.
>>>> 
>>>> The main produce-handling code is shown below:
>>>> 
>>>> val numAcks = curInSyncReplicas.count(r => {
>>>>   if (!r.isLocal)
>>>>     if (r.logEndOffset.messageOffset >= requiredOffset) {
>>>>       trace("Replica %d of %s-%d received offset %d".format(r.brokerId,
>>>>         topic, partitionId, requiredOffset))
>>>>       true
>>>>     } else
>>>>       false
>>>>   else
>>>>     true /* also count the local (leader) replica */
>>>> })
>>>> 
>>>> trace("%d acks satisfied for %s-%d with acks = -1".format(numAcks, topic, partitionId))
>>>> 
>>>> val minIsr = leaderReplica.log.get.config.minInSyncReplicas
>>>> 
>>>> if (leaderReplica.highWatermark.messageOffset >= requiredOffset) {
>>>>   /*
>>>>    * The topic may be configured not to accept messages if there are not
>>>>    * enough replicas in ISR; in this scenario the request was already
>>>>    * appended locally and then added to the purgatory before the ISR was
>>>>    * shrunk
>>>>    */
>>>>   if (minIsr <= curInSyncReplicas.size) {
>>>>     (true, ErrorMapping.NoError)
>>>>   } else {
>>>>     (true, ErrorMapping.NotEnoughReplicasAfterAppendCode)
>>>>   }
>>>> } else
>>>>   (false, ErrorMapping.NoError)
>>>> 
>>>> Why is numAcks only logged and never compared with minIsr? If numAcks is
>>>> 0 but curInSyncReplicas.size >= minIsr, then this will return; since the
>>>> ISR shrink procedure is not real-time, can this lose data after a leader
>>>> election?

Re: Does Kafka 0.9 can guarantee not loss data

2016-09-23 Thread Kafka
OK, the example before was not enough to expose the problem.
What happens when numAcks is 1 and curInSyncReplicas.size >= minIsr, but in 
fact only one replica in curInSyncReplicas has caught up to the leader, and 
that replica is the leader itself? This is not safe when the machine that 
hosts the leader partition's broker is restarted.

The current code is as follows:
if (minIsr <= curInSyncReplicas.size) {
  (true, ErrorMapping.NoError)
} else {
  (true, ErrorMapping.NotEnoughReplicasAfterAppendCode)
}

Why not code like the following?
if (minIsr <= curInSyncReplicas.size && minIsr <= numAcks) {
  (true, ErrorMapping.NoError)
} else {
  (true, ErrorMapping.NotEnoughReplicasAfterAppendCode)
}

It seems that a single condition in the Kafka broker's code is not enough to 
ensure safety, because the replicas in curInSyncReplicas are not strongly 
synchronized.

> On 23 Sep 2016, at 13:45, Becket Qin wrote:
> 
> In order to complete a produce response, two conditions must hold:
> A. The leader's high watermark is at least the requiredOffset
> (the max offset in that produce request for that partition).
> B. The number of in-sync replicas is at least min.isr.
> 
> The ultimate goal here is to make sure at least min.isr replicas have
> caught up to requiredOffset. So the check is not only whether we have
> enough replicas in the ISR, but also whether the replicas in the ISR have
> caught up to the required offset.
> 
> In your example, if numAcks is 0 and curInSyncReplicas.size >= minIsr, the
> produce response won't return if min.isr > 0, because
> leaderReplica.highWatermark must be less than requiredOffset given that
> numAcks is 0, i.e. condition A is not met.
> 
> We are actually doing a stronger-than-necessary check here. Theoretically,
> as long as min.isr replicas have caught up to requiredOffset we should be
> able to return the response, but we also require those replicas to be in
> the ISR.
> 
> On Thu, Sep 22, 2016 at 8:15 PM, Kafka  wrote:
> 
>> @wangguozhang, could you give me some advice?
>> 
>>> On 22 Sep 2016, at 18:56, Kafka wrote:
>>> 
>>> Hi all,
>>> In terms of the topic, we create a topic with 6 partitions, each with 3
>>> replicas.
>>> In terms of the producer, we send messages with acks=-1 using the sync
>>> interface.
>>> In terms of the brokers, we set min.insync.replicas to 2.
>>> 
>>> After reviewing the Kafka broker's code, we know that when we send a
>>> message to the broker with acks=-1, we get a response if the ISR of the
>>> partition is greater than or equal to min.insync.replicas. What confuses
>>> me is that the replicas in the ISR are not strongly consistent: in Kafka
>>> 0.9 the replica.lag.time.max.ms parameter decides whether to shrink the
>>> ISR, and its default is 10000 ms, so a replica in the ISR can lag the
>>> leader by up to 10 seconds.
>>> If we restart the broker that owns this partition's leader, the
>>> controller starts a new leader election, which chooses the first replica
>>> in the ISR that is not the current leader as the new leader, and this can
>>> lose data.
>>> 
>>> The main produce-handling code is shown below:
>>> 
>>> val numAcks = curInSyncReplicas.count(r => {
>>>   if (!r.isLocal)
>>>     if (r.logEndOffset.messageOffset >= requiredOffset) {
>>>       trace("Replica %d of %s-%d received offset %d".format(r.brokerId,
>>>         topic, partitionId, requiredOffset))
>>>       true
>>>     } else
>>>       false
>>>   else
>>>     true /* also count the local (leader) replica */
>>> })
>>> 
>>> trace("%d acks satisfied for %s-%d with acks = -1".format(numAcks, topic, partitionId))
>>> 
>>> val minIsr = leaderReplica.log.get.config.minInSyncReplicas
>>> 
>>> if (leaderReplica.highWatermark.messageOffset >= requiredOffset) {
>>>   /*
>>>    * The topic may be configured not to accept messages if there are not
>>>    * enough replicas in ISR; in this scenario the request was already
>>>    * appended locally and then added to the purgatory before the ISR was
>>>    * shrunk
>>>    */
>>>   if (minIsr <= curInSyncReplicas.size) {
>>>     (true, ErrorMapping.NoError)
>>>   } else {
>>>     (true, ErrorMapping.NotEnoughReplicasAfterAppendCode)
>>>   }
>>> } else
>>>   (false, ErrorMapping.NoError)
>>> 
>>> Why is numAcks only logged and never compared with minIsr? If numAcks is
>>> 0 but curInSyncReplicas.size >= minIsr, then this will return; since the
>>> ISR shrink procedure is not real-time, can this lose data after a leader
>>> election?
>>> 
>>> Feedback is greatly appreciated. Thanks.
>>> meituan.inf
>>> 
>>> 
>>> 
>> 
>> 
>> 




[GitHub] kafka pull request #1901: MINOR: Fix Javadoc for KafkaConsumer.poll

2016-09-23 Thread reftel
GitHub user reftel opened a pull request:

https://github.com/apache/kafka/pull/1901

MINOR: Fix Javadoc for KafkaConsumer.poll



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/reftel/kafka feature/poll_javadoc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1901.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1901


commit 0be3abbda9a97c6f3d135226d3e38755aed7f283
Author: Magnus Reftel 
Date:   2016-09-23T07:02:59Z

MINOR: Fix Javadoc for KafkaConsumer.poll




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

