[jira] [Commented] (KAFKA-4104) Queryable state metadata is sometimes invalid

2016-09-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15457479#comment-15457479
 ] 

ASF GitHub Bot commented on KAFKA-4104:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1804


> Queryable state metadata is sometimes invalid
> -
>
> Key: KAFKA-4104
> URL: https://issues.apache.org/jira/browse/KAFKA-4104
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Eno Thereska
>Assignee: Damian Guy
> Fix For: 0.10.1.0
>
>
> The streams.metadataForKey method sometimes fails because the cluster-wide 
> metadata is invalid or null on non-leader nodes.
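For application code querying state during rebalances, a minimal defensive sketch, assuming the 0.10.1 queryable-state API ({{KafkaStreams#metadataForKey}}); the store name, key type, retry loop, and backoff are illustrative and not part of the fix itself:

{code}
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.state.StreamsMetadata;

// Retry until the cluster-wide metadata for the key is usable; it can be
// null/invalid on non-leader instances while a rebalance is in flight.
static StreamsMetadata metadataFor(KafkaStreams streams, String store, String key)
        throws InterruptedException {
    while (true) {
        StreamsMetadata md =
            streams.metadataForKey(store, key, Serdes.String().serializer());
        if (md != null)
            return md;          // host/port of the instance holding the key
        Thread.sleep(100L);     // back off while metadata is rebuilt
    }
}
{code}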





[jira] [Updated] (KAFKA-4104) Queryable state metadata is sometimes invalid

2016-09-01 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-4104:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Issue resolved by pull request 1804
[https://github.com/apache/kafka/pull/1804]

> Queryable state metadata is sometimes invalid
> -
>
> Key: KAFKA-4104
> URL: https://issues.apache.org/jira/browse/KAFKA-4104
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Eno Thereska
>Assignee: Damian Guy
> Fix For: 0.10.1.0
>
>
> The streams.metadataForKey method sometimes fails because the cluster-wide 
> metadata is invalid or null on non-leader nodes.





[GitHub] kafka pull request #1804: KAFKA-4104: Queryable state metadata is sometimes ...

2016-09-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1804




[GitHub] kafka pull request #1815: KAFKA-4115: grow default heap size for connect-dis...

2016-09-01 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1815

KAFKA-4115: grow default heap size for connect-distributed.sh to 1G



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka connect-heap-opts

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1815.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1815


commit 1ceb80ed040ed0a5b988857273af5a2606f1df0b
Author: Shikhar Bhushan 
Date:   2016-09-02T04:01:27Z

KAFKA-4115: grow default heap size for connect-distributed.sh to 1G






[jira] [Commented] (KAFKA-4115) Grow default heap settings for distributed Connect from 256M to 1G

2016-09-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15457438#comment-15457438
 ] 

ASF GitHub Bot commented on KAFKA-4115:
---

GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1815

KAFKA-4115: grow default heap size for connect-distributed.sh to 1G



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka connect-heap-opts

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1815.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1815


commit 1ceb80ed040ed0a5b988857273af5a2606f1df0b
Author: Shikhar Bhushan 
Date:   2016-09-02T04:01:27Z

KAFKA-4115: grow default heap size for connect-distributed.sh to 1G




> Grow default heap settings for distributed Connect from 256M to 1G
> --
>
> Key: KAFKA-4115
> URL: https://issues.apache.org/jira/browse/KAFKA-4115
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
>
> Currently, both {{connect-standalone.sh}} and {{connect-distributed.sh}} 
> start the Connect JVM with the default heap setting of {{-Xmx256M}} from 
> {{kafka-run-class.sh}}.
> At least for distributed Connect, we should default to a much higher limit, 
> like 1G. While the 'correct' sizing is workload-dependent, in a system 
> where you can run arbitrary connector plugins that may buffer data, we 
> should provide more headroom.





[jira] [Created] (KAFKA-4115) Grow default heap settings for distributed Connect from 256M to 1G

2016-09-01 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-4115:
--

 Summary: Grow default heap settings for distributed Connect from 
256M to 1G
 Key: KAFKA-4115
 URL: https://issues.apache.org/jira/browse/KAFKA-4115
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Reporter: Shikhar Bhushan
Assignee: Shikhar Bhushan


Currently, both {{connect-standalone.sh}} and {{connect-distributed.sh}} start 
the Connect JVM with the default heap setting of {{-Xmx256M}} from 
{{kafka-run-class.sh}}.

At least for distributed Connect, we should default to a much higher limit, like 
1G. While the 'correct' sizing is workload-dependent, in a system where you can 
run arbitrary connector plugins that may buffer data, we should provide more 
headroom.





[GitHub] kafka pull request #1814: MINOR: doc fix related to monitoring consumer lag.

2016-09-01 Thread alexlod
GitHub user alexlod opened a pull request:

https://github.com/apache/kafka/pull/1814

MINOR: doc fix related to monitoring consumer lag.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alexlod/kafka consumer-lag-doc-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1814.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1814


commit 4567d9f1da35569324cc66601139ba7e028494a4
Author: Alex Loddengaard 
Date:   2016-09-02T02:09:17Z

MINOR: doc fix related to monitoring consumer lag.






[jira] [Comment Edited] (KAFKA-3144) report members with no assigned partitions in ConsumerGroupCommand

2016-09-01 Thread Vahid Hashemian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15456584#comment-15456584
 ] 

Vahid Hashemian edited comment on KAFKA-3144 at 9/2/16 12:44 AM:
-

[~junrao] How do you think the consumer group coordinator (I assume the broker 
id is sufficient) should be displayed in the output?
One option would be in a new {{COORDINATOR}} column (which repeats the same id 
in each row similar to how group name gets repeated):
{code}
GROUP    COORDINATOR  TOPIC  PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  MEMBER-ID                            MEMBER-HOST  CLIENT-ID
group1   0            bar    0          445             445             0    group1_kafka-1472762710305-24f2ffa3
{code}

Also, I believe this applies to the case where {{--new-consumer}} is used. 
Correct? 


was (Author: vahid):
[~junrao] How do you think the consumer group coordinator (I assume the broker 
id is sufficient) should be displayed in the output?
One option would be in a new {{COORDINATOR}} column (which repeats the same id 
in each row similar to how group name gets repeated):
{code}
GROUP    COORDINATOR  TOPIC  PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  MEMBER-ID                            MEMBER-HOST  CLIENT-ID
group1   0            bar    0          445             445             0    group1_kafka-1472762710305-24f2ffa3
{code}


> report members with no assigned partitions in ConsumerGroupCommand
> --
>
> Key: KAFKA-3144
> URL: https://issues.apache.org/jira/browse/KAFKA-3144
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>Assignee: Vahid Hashemian
>  Labels: newbie
>
> A couple of suggestions on improving ConsumerGroupCommand. 
> 1. It would be useful to list members with no assigned partitions when doing 
> describe in ConsumerGroupCommand.
> 2. Currently, we show the client.id of each member when doing describe in 
> ConsumerGroupCommand. Since client.id is supposed to be the logical 
> application id, all members in the same group are supposed to set the same 
> client.id. So, it would be clearer if we show the client id as well as the 
> member id.





[jira] [Commented] (KAFKA-3148) SslSelectorTest.testRenegotiation can hang on slow boxes

2016-09-01 Thread Ashish K Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15457099#comment-15457099
 ] 

Ashish K Singh commented on KAFKA-3148:
---

[~revans2] since you have figured out the issue, do you want to create a PR?

> SslSelectorTest.testRenegotiation can hang on slow boxes
> 
>
> Key: KAFKA-3148
> URL: https://issues.apache.org/jira/browse/KAFKA-3148
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.0
>Reporter: Robert Joseph Evans
>
> The SslSelectorTest hangs very frequently (about 75% of the time) for me when 
> running on a very slow Linux virtual machine, but I can artificially 
> reproduce the issue on a fast box by inserting a sleep right before the 
> renegotiation happens in the EchoServer.
> It appears that after the renegotiation happens, the client will not be woken 
> up to receive anything unless a send is first done.





[GitHub] kafka pull request #1813: make mirror maker threads daemons and make sure an...

2016-09-01 Thread radai-rosenblatt
GitHub user radai-rosenblatt opened a pull request:

https://github.com/apache/kafka/pull/1813

make mirror maker threads daemons and make sure any uncaught exceptions are 
logged



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/radai-rosenblatt/kafka mm-fixes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1813.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1813


commit 211928bdd5d90e16d375fd965cca592795a44d30
Author: radai-rosenblatt 
Date:   2016-09-02T00:33:49Z

make mirror maker threads daemons and make sure any uncaught exceptions are 
logged






[jira] [Updated] (KAFKA-4099) Change the time based log rolling to only based on the message timestamp.

2016-09-01 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated KAFKA-4099:

Status: Patch Available  (was: Open)

> Change the time based log rolling to only based on the message timestamp.
> -
>
> Key: KAFKA-4099
> URL: https://issues.apache.org/jira/browse/KAFKA-4099
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Fix For: 0.10.1.0
>
>
> This is an issue introduced in KAFKA-3163. When partition relocation occurs, 
> the newly created replica may have messages with old timestamps and cause the 
> log segment to roll for each message. The fix is to base time-based log 
> rolling only on the message timestamp when the messages are in message format 
> 0.10.0 or above. If the first message in the segment does not have a 
> timestamp, we fall back to using the wall clock time for log rolling.
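A minimal sketch of the rolling rule described above (names are illustrative, not the actual broker internals):

{code}
// Roll on the span of *message* timestamps when the first message in the
// segment carries one (format 0.10.0+); otherwise fall back to the wall-clock
// age of the segment. firstMessageTimestampMs is -1 for older formats.
static boolean shouldRoll(long firstMessageTimestampMs,
                          long newMessageTimestampMs,
                          long segmentCreatedAtMs,
                          long nowMs,
                          long rollMs) {
    if (firstMessageTimestampMs >= 0)
        return newMessageTimestampMs - firstMessageTimestampMs > rollMs;
    return nowMs - segmentCreatedAtMs > rollMs;   // wall-clock fallback
}
{code}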





Build failed in Jenkins: kafka-trunk-jdk8 #857

2016-09-01 Thread Apache Jenkins Server
See 

Changes:

[wangguoz] KAFKA-4112: Remove alpha quality label from Kafka Streams in docs

--
[...truncated 4698 lines...]
kafka.admin.AdminRackAwareTest > testMoreReplicasThanRacks STARTED

kafka.admin.AdminRackAwareTest > testMoreReplicasThanRacks PASSED

kafka.admin.AdminRackAwareTest > testSingleRack STARTED

kafka.admin.AdminRackAwareTest > testSingleRack PASSED

kafka.admin.AdminRackAwareTest > 
testAssignmentWithRackAwareWithRandomStartIndex STARTED

kafka.admin.AdminRackAwareTest > 
testAssignmentWithRackAwareWithRandomStartIndex PASSED

kafka.admin.AdminRackAwareTest > testLargeNumberPartitionsAssignment STARTED

kafka.admin.AdminRackAwareTest > testLargeNumberPartitionsAssignment PASSED

kafka.admin.AdminRackAwareTest > testLessReplicasThanRacks STARTED

kafka.admin.AdminRackAwareTest > testLessReplicasThanRacks PASSED

kafka.admin.ConfigCommandTest > testArgumentParse STARTED

kafka.admin.ConfigCommandTest > testArgumentParse PASSED

kafka.admin.AddPartitionsTest > testReplicaPlacementAllServers STARTED

kafka.admin.AddPartitionsTest > testReplicaPlacementAllServers PASSED

kafka.admin.AddPartitionsTest > testWrongReplicaCount STARTED

kafka.admin.AddPartitionsTest > testWrongReplicaCount PASSED

kafka.admin.AddPartitionsTest > testReplicaPlacementPartialServers STARTED

kafka.admin.AddPartitionsTest > testReplicaPlacementPartialServers PASSED

kafka.admin.AddPartitionsTest > testTopicDoesNotExist STARTED

kafka.admin.AddPartitionsTest > testTopicDoesNotExist PASSED

kafka.admin.AddPartitionsTest > testIncrementPartitions STARTED

kafka.admin.AddPartitionsTest > testIncrementPartitions PASSED

kafka.admin.AddPartitionsTest > testManualAssignmentOfReplicas STARTED

kafka.admin.AddPartitionsTest > testManualAssignmentOfReplicas PASSED

kafka.admin.DeleteConsumerGroupTest > 
testGroupWideDeleteInZKDoesNothingForActiveConsumerGroup STARTED

kafka.admin.DeleteConsumerGroupTest > 
testGroupWideDeleteInZKDoesNothingForActiveConsumerGroup PASSED

kafka.admin.DeleteConsumerGroupTest > 
testGroupTopicWideDeleteInZKDoesNothingForActiveGroupConsumingMultipleTopics 
STARTED

kafka.admin.DeleteConsumerGroupTest > 
testGroupTopicWideDeleteInZKDoesNothingForActiveGroupConsumingMultipleTopics 
PASSED

kafka.admin.DeleteConsumerGroupTest > 
testConsumptionOnRecreatedTopicAfterTopicWideDeleteInZK STARTED

kafka.admin.DeleteConsumerGroupTest > 
testConsumptionOnRecreatedTopicAfterTopicWideDeleteInZK PASSED

kafka.admin.DeleteConsumerGroupTest > testTopicWideDeleteInZK STARTED

kafka.admin.DeleteConsumerGroupTest > testTopicWideDeleteInZK PASSED

kafka.admin.DeleteConsumerGroupTest > 
testGroupTopicWideDeleteInZKForGroupConsumingOneTopic STARTED

kafka.admin.DeleteConsumerGroupTest > 
testGroupTopicWideDeleteInZKForGroupConsumingOneTopic PASSED

kafka.admin.DeleteConsumerGroupTest > 
testGroupTopicWideDeleteInZKForGroupConsumingMultipleTopics STARTED

kafka.admin.DeleteConsumerGroupTest > 
testGroupTopicWideDeleteInZKForGroupConsumingMultipleTopics PASSED

kafka.admin.DeleteConsumerGroupTest > testGroupWideDeleteInZK STARTED

kafka.admin.DeleteConsumerGroupTest > testGroupWideDeleteInZK PASSED

kafka.admin.ReassignPartitionsCommandTest > testRackAwareReassign STARTED

kafka.admin.ReassignPartitionsCommandTest > testRackAwareReassign PASSED

kafka.admin.TopicCommandTest > testCreateIfNotExists STARTED

kafka.admin.TopicCommandTest > testCreateIfNotExists PASSED

kafka.admin.TopicCommandTest > testCreateAlterTopicWithRackAware STARTED

kafka.admin.TopicCommandTest > testCreateAlterTopicWithRackAware PASSED

kafka.admin.TopicCommandTest > testTopicDeletion STARTED

kafka.admin.TopicCommandTest > testTopicDeletion PASSED

kafka.admin.TopicCommandTest > testConfigPreservationAcrossPartitionAlteration 
STARTED

kafka.admin.TopicCommandTest > testConfigPreservationAcrossPartitionAlteration 
PASSED

kafka.admin.TopicCommandTest > testAlterIfExists STARTED

kafka.admin.TopicCommandTest > testAlterIfExists PASSED

kafka.admin.TopicCommandTest > testDeleteIfExists STARTED

kafka.admin.TopicCommandTest > testDeleteIfExists PASSED

kafka.admin.AdminTest > testBasicPreferredReplicaElection STARTED

kafka.admin.AdminTest > testBasicPreferredReplicaElection PASSED

kafka.admin.AdminTest > testPreferredReplicaJsonData STARTED

kafka.admin.AdminTest > testPreferredReplicaJsonData PASSED

kafka.admin.AdminTest > testReassigningNonExistingPartition STARTED

kafka.admin.AdminTest > testReassigningNonExistingPartition PASSED

kafka.admin.AdminTest > testGetBrokerMetadatas STARTED

kafka.admin.AdminTest > testGetBrokerMetadatas PASSED

kafka.admin.AdminTest > testBootstrapClientIdConfig STARTED

kafka.admin.AdminTest > testBootstrapClientIdConfig PASSED

kafka.admin.AdminTest > testPartitionReassignmentNonOverlappingReplicas STARTED

kafka.admin.AdminTest > testPartitionReassignmentNonOverlappingReplicas PASSED

kafka.admin.AdminTest > 

[jira] [Commented] (KAFKA-4113) Allow KTable bootstrap

2016-09-01 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15456866#comment-15456866
 ] 

James Cheng commented on KAFKA-4113:


I wrote a blog post on some other ways to know when you have read everything in 
a Kafka topic: 
https://logallthethings.com/2016/06/28/how-to-read-to-the-end-of-a-kafka-topic/
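For reference, a rough sketch of one such technique with the Java consumer (assuming the 0.10.1+ collection-based seek APIs; topic, configs, and types are illustrative): record each partition's end offset up front, then consume until every partition has reached its recorded end.

{code}
import java.util.*;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;

static void readToEnd(KafkaConsumer<String, String> consumer, String topic) {
    List<TopicPartition> parts = new ArrayList<>();
    for (PartitionInfo pi : consumer.partitionsFor(topic))
        parts.add(new TopicPartition(topic, pi.partition()));
    consumer.assign(parts);

    consumer.seekToEnd(parts);                        // jump to the current end...
    Map<TopicPartition, Long> end = new HashMap<>();
    for (TopicPartition tp : parts)
        end.put(tp, consumer.position(tp));           // ...and record it
    consumer.seekToBeginning(parts);

    Set<TopicPartition> remaining = new HashSet<>();
    for (TopicPartition tp : parts)
        if (consumer.position(tp) < end.get(tp))      // skip empty partitions
            remaining.add(tp);

    while (!remaining.isEmpty()) {
        for (ConsumerRecord<String, String> r : consumer.poll(1000)) {
            // process r ...
            TopicPartition tp = new TopicPartition(r.topic(), r.partition());
            if (r.offset() + 1 >= end.get(tp))
                remaining.remove(tp);                 // reached the recorded end
        }
    }
}
{code}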



> Allow KTable bootstrap
> --
>
> Key: KAFKA-4113
> URL: https://issues.apache.org/jira/browse/KAFKA-4113
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Guozhang Wang
>
> On the mailing list, there are multiple requests about the possibility to 
> "fully populate" a KTable before actual stream processing starts.
> Even if it is somewhat difficult to define when the initial populating phase 
> should end, there are multiple possibilities:
> The main idea is that there is a rarely updated topic that contains the 
> data. Only after this topic has been read completely and the KTable is ready 
> should the application start processing. This implies that on startup, the 
> current partition sizes must be fetched and stored, and after the KTable has 
> been populated up to those offsets, stream processing can start.
> Other discussed ideas are:
> 1) an initial fixed time period for populating
> (it might be hard for a user to estimate the correct value)
> 2) an "idle" period, i.e., if no update to a KTable is done for a certain 
> time, we consider it populated
> 3) a timestamp cut-off point, i.e., all records with an older timestamp
> belong to the initial populating phase
> The API change is not decided yet, and the API design is part of this JIRA.
> One suggestion (for option (4)) was:
> {noformat}
> KTable table = builder.table("topic", 1000); // populate the table without 
> reading any other topics until we see one record with timestamp 1000.
> {noformat}
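To make the suggestion concrete, a hedged sketch of how the proposed overload might be used with the existing 0.10.x DSL; only the plain {{table()}}/{{stream()}} calls exist today, and the second (timestamp) argument is the hypothetical part:

{code}
import org.apache.kafka.streams.kstream.*;

KStreamBuilder builder = new KStreamBuilder();
// Hypothetical: bootstrap the table before processing anything with a
// timestamp above the cut-off (per the suggestion above).
KTable<String, String> table = builder.table("topic", 1000L);
KStream<String, String> stream = builder.stream("events");   // "events" illustrative
stream.leftJoin(table, (v, t) -> v + "," + t).to("enriched");
{code}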





Build failed in Jenkins: kafka-0.10.0-jdk7 #198

2016-09-01 Thread Apache Jenkins Server
See 

Changes:

[wangguoz] KAFKA-4112: Remove alpha quality label from Kafka Streams in docs

--
[...truncated 5748 lines...]
org.apache.kafka.streams.kstream.internals.KStreamForeachTest > testForeach 
PASSED

org.apache.kafka.streams.kstream.internals.KTableMapValuesTest > 
testSendingOldValue PASSED

org.apache.kafka.streams.kstream.internals.KTableMapValuesTest > 
testNotSendingOldValue PASSED

org.apache.kafka.streams.kstream.internals.KTableMapValuesTest > testKTable 
PASSED

org.apache.kafka.streams.kstream.internals.KTableMapValuesTest > 
testValueGetter PASSED

org.apache.kafka.streams.kstream.internals.KStreamSelectKeyTest > testSelectKey 
PASSED

org.apache.kafka.streams.kstream.internals.KTableKTableJoinTest > testJoin 
PASSED

org.apache.kafka.streams.kstream.internals.KTableKTableJoinTest > 
testNotSendingOldValues PASSED

org.apache.kafka.streams.kstream.internals.KTableKTableJoinTest > 
testSendingOldValues PASSED

org.apache.kafka.streams.kstream.internals.WindowedStreamPartitionerTest > 
testCopartitioning PASSED

org.apache.kafka.streams.kstream.internals.KStreamKStreamJoinTest > 
testOuterJoin PASSED

org.apache.kafka.streams.kstream.internals.KStreamKStreamJoinTest > testJoin 
PASSED

org.apache.kafka.streams.kstream.internals.KStreamKStreamJoinTest > 
testWindowing PASSED

org.apache.kafka.streams.kstream.internals.KeyValuePrinterProcessorTest > 
testPrintKeyValueWithProvidedSerde PASSED

org.apache.kafka.streams.kstream.internals.KeyValuePrinterProcessorTest > 
testPrintKeyValueDefaultSerde PASSED

org.apache.kafka.streams.kstream.internals.KStreamTransformValuesTest > 
testTransform PASSED

org.apache.kafka.streams.kstream.internals.KGroupedTableImplTest > 
testGroupedCountOccurences PASSED

org.apache.kafka.streams.kstream.internals.KStreamKTableLeftJoinTest > 
testNotJoinable PASSED

org.apache.kafka.streams.kstream.internals.KStreamKTableLeftJoinTest > testJoin 
PASSED

org.apache.kafka.streams.kstream.internals.KTableKTableOuterJoinTest > 
testSendingOldValue PASSED

org.apache.kafka.streams.kstream.internals.KTableKTableOuterJoinTest > testJoin 
PASSED

org.apache.kafka.streams.kstream.internals.KTableKTableOuterJoinTest > 
testNotSendingOldValue PASSED

org.apache.kafka.streams.kstream.internals.KTableKTableLeftJoinTest > 
testSendingOldValue PASSED

org.apache.kafka.streams.kstream.internals.KTableKTableLeftJoinTest > testJoin 
PASSED

org.apache.kafka.streams.kstream.internals.KTableKTableLeftJoinTest > 
testNotSendingOldValue PASSED

org.apache.kafka.streams.kstream.internals.KTableFilterTest > 
testSendingOldValue PASSED

org.apache.kafka.streams.kstream.internals.KTableFilterTest > 
testNotSendingOldValue PASSED

org.apache.kafka.streams.kstream.internals.KTableFilterTest > 
testSkipNullOnMaterialization PASSED

org.apache.kafka.streams.kstream.internals.KTableFilterTest > testKTable PASSED

org.apache.kafka.streams.kstream.internals.KTableFilterTest > testValueGetter 
PASSED

org.apache.kafka.streams.kstream.internals.KStreamImplTest > 
testToWithNullValueSerdeDoesntNPE PASSED

org.apache.kafka.streams.kstream.internals.KStreamImplTest > testNumProcesses 
PASSED

org.apache.kafka.streams.kstream.internals.KTableForeachTest > testForeach 
PASSED

org.apache.kafka.streams.kstream.internals.KStreamKStreamLeftJoinTest > 
testLeftJoin PASSED

org.apache.kafka.streams.kstream.internals.KStreamKStreamLeftJoinTest > 
testWindowing PASSED

org.apache.kafka.streams.kstream.internals.KTableMapKeysTest > 
testMapKeysConvertingToStream PASSED

org.apache.kafka.streams.kstream.internals.KStreamBranchTest > 
testKStreamBranch PASSED

org.apache.kafka.streams.kstream.internals.KStreamMapValuesTest > 
testFlatMapValues PASSED

org.apache.kafka.streams.kstream.internals.KTableSourceTest > 
testNotSedingOldValue PASSED

org.apache.kafka.streams.kstream.internals.KTableSourceTest > 
testSedingOldValue PASSED

org.apache.kafka.streams.kstream.internals.KTableSourceTest > testKTable PASSED

org.apache.kafka.streams.kstream.internals.KTableSourceTest > testValueGetter 
PASSED

org.apache.kafka.streams.kstream.internals.KStreamFlatMapTest > testFlatMap 
PASSED

org.apache.kafka.streams.kstream.internals.KStreamFlatMapValuesTest > 
testFlatMapValues PASSED

org.apache.kafka.streams.kstream.internals.KStreamFilterTest > testFilterNot 
PASSED

org.apache.kafka.streams.kstream.internals.KStreamFilterTest > testFilter PASSED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > testAggBasic 
PASSED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > 
testAggRepartition PASSED

org.apache.kafka.streams.kstream.internals.KTableImplTest > testStateStore 
PASSED

org.apache.kafka.streams.kstream.internals.KTableImplTest > testRepartition 
PASSED

org.apache.kafka.streams.kstream.internals.KTableImplTest > 
testStateStoreLazyEval PASSED


Jenkins build is back to normal : kafka-trunk-jdk7 #1513

2016-09-01 Thread Apache Jenkins Server
See 



[jira] [Comment Edited] (KAFKA-3144) report members with no assigned partitions in ConsumerGroupCommand

2016-09-01 Thread Vahid Hashemian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15456584#comment-15456584
 ] 

Vahid Hashemian edited comment on KAFKA-3144 at 9/1/16 8:56 PM:


[~junrao] How do you think the consumer group coordinator (I assume the broker 
id is sufficient) should be displayed in the output?
One option would be in a new {{COORDINATOR}} column (which repeats the same id 
in each row similar to how group name gets repeated):
{code}
GROUP    COORDINATOR  TOPIC  PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  MEMBER-ID                            MEMBER-HOST  CLIENT-ID
group1   0            bar    0          445             445             0    group1_kafka-1472762710305-24f2ffa3
{code}



was (Author: vahid):
[~junrao] How do you think the consumer group coordinator (I assume the broker 
id is sufficient) should be displayed in the output?
One option would be in a new {{COORDINATOR}} column (which repeats the same id 
in each row similar to how group name gets repeated):
{code}
GROUP    COORDINATOR  TOPIC  PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  MEMBER-ID                            MEMBER-HOST  CLIENT-ID
group1   0            bar    0          445             445             0    group1_kafka-1472762710305-24f2ffa3
{code}


> report members with no assigned partitions in ConsumerGroupCommand
> --
>
> Key: KAFKA-3144
> URL: https://issues.apache.org/jira/browse/KAFKA-3144
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>Assignee: Vahid Hashemian
>  Labels: newbie
>
> A couple of suggestions on improving ConsumerGroupCommand. 
> 1. It would be useful to list members with no assigned partitions when doing 
> describe in ConsumerGroupCommand.
> 2. Currently, we show the client.id of each member when doing describe in 
> ConsumerGroupCommand. Since client.id is supposed to be the logical 
> application id, all members in the same group are supposed to set the same 
> client.id. So, it would be clearer if we show the client id as well as the 
> member id.





[jira] [Commented] (KAFKA-3144) report members with no assigned partitions in ConsumerGroupCommand

2016-09-01 Thread Vahid Hashemian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15456584#comment-15456584
 ] 

Vahid Hashemian commented on KAFKA-3144:


[~junrao] How do you think the consumer group coordinator (I assume the broker 
id is sufficient) should be displayed in the output?
One option would be in a new {{COORDINATOR}} column (which repeats the same id 
in each row similar to how group name gets repeated):
{code}
GROUP    COORDINATOR  TOPIC  PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  MEMBER-ID                            MEMBER-HOST  CLIENT-ID
group1   0            bar    0          445             445             0    group1_kafka-1472762710305-24f2ffa3
{code}


> report members with no assigned partitions in ConsumerGroupCommand
> --
>
> Key: KAFKA-3144
> URL: https://issues.apache.org/jira/browse/KAFKA-3144
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>Assignee: Vahid Hashemian
>  Labels: newbie
>
> A couple of suggestions on improving ConsumerGroupCommand. 
> 1. It would be useful to list members with no assigned partitions when doing 
> describe in ConsumerGroupCommand.
> 2. Currently, we show the client.id of each member when doing describe in 
> ConsumerGroupCommand. Since client.id is supposed to be the logical 
> application id, all members in the same group are supposed to set the same 
> client.id. So, it would be clearer if we show the client id as well as the 
> member id.





Re: [DISCUSS] Remove beta label from the new Java consumer

2016-09-01 Thread Jason Gustafson
Hey Harsha,

I'm trying to understand your specific concern. Is it that the client cannot
tell that a topic does not exist and hence needlessly retries (instead of
perhaps raising an error)? Or is it specifically that in some cases the
consumer will block because of this? In other words, would you be satisfied
with retrying internally for non-existing topics as long as it doesn't prevent
the consumer from making progress on other assigned partitions?

Thanks,
Jason


On Thu, Sep 1, 2016 at 11:28 AM, Harsha Chintalapani 
wrote:

> I would like to see us address
> https://issues.apache.org/jira/browse/KAFKA-1894 . This is problematic in a
> secure cluster when users try to access topics for which they don't have
> ACLs set up.
>
> -Harsha
>
> On Tue, Aug 30, 2016 at 9:40 PM Jaikiran Pai 
> wrote:
>
> > We have been using the (new) Java consumer API in 0.9.0.1 for a while
> > now. We have some well-known issues with it, like heartbeats being handled
> > on the same thread, causing the consumer to sometimes be considered
> > dead. I understand that this has been fixed in 0.10.0.1 but we haven't
> > yet had a chance to migrate to it. We plan to do that in the next month
> > or so.
> >
> > Personally, I would be OK if the beta label is removed from it if the
> > dev team is sure the API isn't going to change. I don't know if that's
> > true or not post 0.10.0.1. For me, the major things that need to
> > be addressed are these JIRAs, which expose some API
> > implementation-level issues. I'm not sure if solving those issues will
> > involve changes to the API itself:
> >
> > https://issues.apache.org/jira/browse/KAFKA-1894
> > https://issues.apache.org/jira/browse/KAFKA-3540
> > https://issues.apache.org/jira/browse/KAFKA-3539
> >
> > If solving issues like these will not involve changes to the API, I
> > think it's safe to remove the beta label.
> >
> > -Jaikiran
> >
> > On Tuesday 30 August 2016 05:09 PM, Ismael Juma wrote:
> > > Thanks for the feedback everyone. Since Harsha said that he is OK either
> > > way and everyone else is in favour, I think we should go ahead with this.
> > > Since we committed to API stability for the new Java consumer in 0.10.0.0
> > > via KIP-45, this is simply a documentation change and I don't think we need
> > > an official vote thread (we didn't have one for the equivalent producer
> > > change).
> > >
> > > Ismael
> > >
> > > On Mon, Aug 29, 2016 at 7:37 PM, Jay Kreps  wrote:
> > >
> > >> +1 I talk to a lot of Kafka users, and I would say > 75% of people doing
> > >> new things are on the new consumer despite our warnings :-)
> > >>
> > >> -Jay
> > >>
> > >> On Thu, Aug 25, 2016 at 2:05 PM, Jason Gustafson 
> > >> wrote:
> > >>
> > >>> I'm +1 also. I feel a lot more confident about this with all of the
> > >>> system testing we now have in place (including the tests covering
> > >>> Streams and Connect).
> > >>>
> > >>> -Jason
> > >>>
> > >>> On Thu, Aug 25, 2016 at 9:57 AM, Gwen Shapira 
> > wrote:
> > >>>
> >  Makes sense :)
> > 
> >  On Thu, Aug 25, 2016 at 9:40 AM, Neha Narkhede 
> > >>> wrote:
> > > Yeah, I'm supportive of this.
> > >
> > > On Thu, Aug 25, 2016 at 9:26 AM Ismael Juma 
> > >> wrote:
> > >> Hi Gwen,
> > >>
> > >> We have a few recent stories of people using Connect and Streams in
> > >> production. That means the new Java Consumer too. :)
> > >>
> > >> Ismael
> > >>
> > >> On Thu, Aug 25, 2016 at 5:09 PM, Gwen Shapira 
> >  wrote:
> > >>> Originally, we suggested keeping the beta label until we know someone
> > >>> successfully uses the new consumer in production.
> > >>>
> > >>> We can consider the recent KIPs enough, but IMO it would be better if
> > >>> someone with a production deployment hanging out on our mailing list
> > >>> confirms a good experience with the new consumer.
> > >>>
> > >>> Gwen
> > >>>
> > >>> On Wed, Aug 24, 2016 at 8:45 PM, Ismael Juma 
> >  wrote:
> >  Hi all,
> > 
> >  We currently say the following in our documentation:
> > 
> >  "As of the 0.9.0 release we have added a new Java consumer to
> >  replace
> > >> our
> >  existing high-level ZooKeeper-based consumer and low-level
> > >>> consumer
> > >> APIs.
> >  This client is considered beta quality."[1]
> > 
> >  Since then, Jason and the community have done a lot of work to improve it
> >  (including KIP-41 and KIP-62), we declared it API stable in 0.10.0.0 and
> >  it's the only option for those that need security support. Yes, it still
> >  has bugs, but so does 

Build failed in Jenkins: kafka-trunk-jdk7 #1512

2016-09-01 Thread Apache Jenkins Server
See 

Changes:

[me] KAFKA-4077: Backdate system test certificates to cope with clock skew

--
[...truncated 12207 lines...]

org.apache.kafka.streams.kstream.JoinWindowsTest > validWindows STARTED

org.apache.kafka.streams.kstream.JoinWindowsTest > validWindows PASSED

org.apache.kafka.streams.kstream.JoinWindowsTest > 
timeDifferenceMustNotBeNegative STARTED

org.apache.kafka.streams.kstream.JoinWindowsTest > 
timeDifferenceMustNotBeNegative PASSED

org.apache.kafka.streams.kstream.KStreamBuilderTest > 
shouldHaveCorrectSourceTopicsForTableFromMergedStream STARTED

org.apache.kafka.streams.kstream.KStreamBuilderTest > 
shouldHaveCorrectSourceTopicsForTableFromMergedStream PASSED

org.apache.kafka.streams.kstream.KStreamBuilderTest > 
shouldHaveCorrectSourceTopicsForTableFromMergedStreamWithProcessors STARTED

org.apache.kafka.streams.kstream.KStreamBuilderTest > 
shouldHaveCorrectSourceTopicsForTableFromMergedStreamWithProcessors PASSED

org.apache.kafka.streams.kstream.KStreamBuilderTest > testMerge STARTED

org.apache.kafka.streams.kstream.KStreamBuilderTest > testMerge PASSED

org.apache.kafka.streams.kstream.KStreamBuilderTest > testFrom STARTED

org.apache.kafka.streams.kstream.KStreamBuilderTest > testFrom PASSED

org.apache.kafka.streams.kstream.KStreamBuilderTest > 
shouldThrowExceptionWhenTopicNamesAreNull STARTED

org.apache.kafka.streams.kstream.KStreamBuilderTest > 
shouldThrowExceptionWhenTopicNamesAreNull PASSED

org.apache.kafka.streams.kstream.KStreamBuilderTest > 
shouldThrowExceptionWhenNoTopicPresent STARTED

org.apache.kafka.streams.kstream.KStreamBuilderTest > 
shouldThrowExceptionWhenNoTopicPresent PASSED

org.apache.kafka.streams.kstream.KStreamBuilderTest > testNewName STARTED

org.apache.kafka.streams.kstream.KStreamBuilderTest > testNewName PASSED

org.apache.kafka.streams.kstream.TimeWindowsTest > 
shouldHaveSaneEqualsAndHashCode STARTED

org.apache.kafka.streams.kstream.TimeWindowsTest > 
shouldHaveSaneEqualsAndHashCode PASSED

org.apache.kafka.streams.kstream.TimeWindowsTest > windowSizeMustNotBeZero 
STARTED

org.apache.kafka.streams.kstream.TimeWindowsTest > windowSizeMustNotBeZero 
PASSED

org.apache.kafka.streams.kstream.TimeWindowsTest > advanceIntervalMustNotBeZero 
STARTED

org.apache.kafka.streams.kstream.TimeWindowsTest > advanceIntervalMustNotBeZero 
PASSED

org.apache.kafka.streams.kstream.TimeWindowsTest > windowSizeMustNotBeNegative 
STARTED

org.apache.kafka.streams.kstream.TimeWindowsTest > windowSizeMustNotBeNegative 
PASSED

org.apache.kafka.streams.kstream.TimeWindowsTest > 
advanceIntervalMustNotBeNegative STARTED

org.apache.kafka.streams.kstream.TimeWindowsTest > 
advanceIntervalMustNotBeNegative PASSED

org.apache.kafka.streams.kstream.TimeWindowsTest > 
advanceIntervalMustNotBeLargerThanWindowSize STARTED

org.apache.kafka.streams.kstream.TimeWindowsTest > 
advanceIntervalMustNotBeLargerThanWindowSize PASSED

org.apache.kafka.streams.kstream.TimeWindowsTest > windowsForTumblingWindows 
STARTED

org.apache.kafka.streams.kstream.TimeWindowsTest > windowsForTumblingWindows 
PASSED

org.apache.kafka.streams.kstream.TimeWindowsTest > windowsForHoppingWindows 
STARTED

org.apache.kafka.streams.kstream.TimeWindowsTest > windowsForHoppingWindows 
PASSED

org.apache.kafka.streams.kstream.TimeWindowsTest > 
windowsForBarelyOverlappingHoppingWindows STARTED

org.apache.kafka.streams.kstream.TimeWindowsTest > 
windowsForBarelyOverlappingHoppingWindows PASSED

org.apache.kafka.streams.StreamsConfigTest > testGetConsumerConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > testGetConsumerConfigs PASSED

org.apache.kafka.streams.StreamsConfigTest > defaultSerdeShouldBeConfigured 
STARTED

org.apache.kafka.streams.StreamsConfigTest > defaultSerdeShouldBeConfigured 
PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportPrefixedConsumerConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportPrefixedConsumerConfigs PASSED

org.apache.kafka.streams.StreamsConfigTest > testGetProducerConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > testGetProducerConfigs PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportPrefixedProducerConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportPrefixedProducerConfigs PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldBeSupportNonPrefixedConsumerConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldBeSupportNonPrefixedConsumerConfigs PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportMultipleBootstrapServers STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportMultipleBootstrapServers PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportNonPrefixedProducerConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportNonPrefixedProducerConfigs PASSED


[jira] [Resolved] (KAFKA-4112) Remove alpha quality label from Kafka Streams in docs

2016-09-01 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-4112.
--
    Resolution: Fixed
Fix Version/s: 0.10.0.2
               0.10.1.0

> Remove alpha quality label from Kafka Streams in docs
> -
>
> Key: KAFKA-4112
> URL: https://issues.apache.org/jira/browse/KAFKA-4112
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Damian Guy
>Assignee: Damian Guy
>Priority: Trivial
> Fix For: 0.10.1.0, 0.10.0.2
>
>






[jira] [Commented] (KAFKA-4112) Remove alpha quality label from Kafka Streams in docs

2016-09-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15456357#comment-15456357
 ] 

ASF GitHub Bot commented on KAFKA-4112:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1811


> Remove alpha quality label from Kafka Streams in docs
> -
>
> Key: KAFKA-4112
> URL: https://issues.apache.org/jira/browse/KAFKA-4112
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Damian Guy
>Assignee: Damian Guy
>Priority: Trivial
>






[GitHub] kafka pull request #1811: KAFKA-4112: Remove alpha quality label from Kafka ...

2016-09-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1811




[jira] [Commented] (KAFKA-4113) Allow KTable bootstrap

2016-09-01 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15456332#comment-15456332
 ] 

Matthias J. Sax commented on KAFKA-4113:


Two things to add. (1) Prioritizing smaller timestamps in processing is "best 
effort" and thus only a weak guarantee. (2) This feature is useful if you have 
a mostly "static table" that you need to fully populate on the *very first 
Streams application startup*. This table might get an update every few weeks, 
but the "current content" of the table must be fully available: if I get a 
first KStream record, I need to enrich it via a join to the KTable, and if the 
KTable record is missing, the result would be wrong. (Assume you know that 
there will be a matching KTable record for every KStream record.) Thus, we need 
a way to _guarantee_ that the KTable is populated completely first. Our current 
"best effort" approach would start to process all streams from the beginning 
(even if it would prioritize the KTable and catch up eventually). However, the 
first KStream records would be processed incorrectly, as no matching KTable 
record might be found; the KStream record then gets dropped on an inner join 
(or has a missing right-hand side on a left join and is thus corrupted).
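For illustration, a hedged sketch in the 0.10.x DSL (topic names illustrative) of the failure mode described above: until the table side is populated, the joiner sees {{null}} for the right-hand side.

{code}
import org.apache.kafka.streams.kstream.*;

KStreamBuilder builder = new KStreamBuilder();
KTable<String, String> customers = builder.table("customers");
KStream<String, String> orders = builder.stream("orders");

orders.leftJoin(customers, (order, customer) ->
        customer == null ? order + ",UNKNOWN"   // table not yet populated
                         : order + "," + customer)
      .to("enriched-orders");
{code}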

> Allow KTable bootstrap
> --
>
> Key: KAFKA-4113
> URL: https://issues.apache.org/jira/browse/KAFKA-4113
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Guozhang Wang
>
> On the mailing list, there are multiple requests about the possibility to 
> "fully populate" a KTable before actual stream processing starts.
> Even if it is somewhat difficult to define when the initial populating phase 
> should end, there are multiple possibilities:
> The main idea is that there is a rarely updated topic that contains the 
> data. Only after this topic has been read completely and the KTable is ready 
> should the application start processing. This implies that on startup, the 
> current partition sizes must be fetched and stored, and after the KTable has 
> been populated up to those offsets, stream processing can start.
> Other discussed ideas are:
> 1) an initial fixed time period for populating
> (it might be hard for a user to estimate the correct value)
> 2) an "idle" period, i.e., if no update to a KTable is done for a certain 
> time, we consider it populated
> 3) a timestamp cut-off point, i.e., all records with an older timestamp
> belong to the initial populating phase
> The API change is not decided yet, and the API design is part of this JIRA.
> One suggestion (for option (4)) was:
> {noformat}
> KTable table = builder.table("topic", 1000); // populate the table without 
> reading any other topics until we see one record with timestamp 1000.
> {noformat}





[jira] [Issue Comment Deleted] (KAFKA-3478) Finer Stream Flow Control

2016-09-01 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-3478:
---
Comment: was deleted

(was: [~bbejeck] I just created 
https://issues.apache.org/jira/browse/KAFKA-4113 and 
https://issues.apache.org/jira/browse/KAFKA-4114. Feel free to grab one :))

> Finer Stream Flow Control
> -
>
> Key: KAFKA-3478
> URL: https://issues.apache.org/jira/browse/KAFKA-3478
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Guozhang Wang
>  Labels: user-experience
> Fix For: 0.10.1.0
>
>
> Today we have an event-time based flow control mechanism in order to 
> synchronize multiple input streams in a best-effort manner:
> http://docs.confluent.io/3.0.0/streams/architecture.html#flow-control-with-timestamps
> However, there are some use cases where users would like to have finer 
> control of the input streams, for example, with two input streams, one of 
> them always reading from offset 0 upon (re)starting, and the other reading 
> from the log end offset.
> Today we only have one consumer config "auto.offset.reset" to control that 
> behavior, which means all streams are read either from "earliest" or "latest".
> We should consider how to improve these settings to allow users to have finer 
> control over these streams.
> =
> A finer flow control could also be used to allow for populating a {{KTable}} 
> (with an "initial" state) before starting the actual processing (this feature 
> was asked for on the mailing list multiple times already). Even if it is quite 
> hard to define *when* the initial populating phase should end, this might 
> still be useful. There would be the following possibilities:
>  1) an initial fixed time period for populating
> (it might be hard for a user to estimate the correct value)
>  2) an "idle" period, i.e., if no update to a KTable is done for a certain 
> time, we consider it populated
>  3) a timestamp cut-off point, i.e., all records with an older timestamp
> belong to the initial populating phase
>  4) a throughput threshold, i.e., if the populating frequency falls below
> the threshold, the KTable is considered "finished"
>  5) maybe something else ??
> The API might look something like this:
> {noformat}
> KTable table = builder.table("topic", 1000); // populate the table without 
> reading any other topics until we see one record with timestamp 1000.
> {noformat}





[jira] [Commented] (KAFKA-3199) LoginManager should allow using an existing Subject

2016-09-01 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15456314#comment-15456314
 ] 

Alejandro Abdelnur commented on KAFKA-3199:
---

[~sriharsha], thanks for your quick reply. 

Here is why I think it is relevant, not in form but in behavior. The proposed 
patch makes the following change:

{code}
 private LoginManager(LoginType loginType, Map<String, ?> configs) throws IOException, LoginException {
     this.loginType = loginType;
     String loginContext = loginType.contextName();
-    login = new Login(loginContext, configs);
+
+    // Check for an existing Subject
+    AccessControlContext context = AccessController.getContext();
+    subject = context != null ? Subject.getSubject(context) : null;
+
+    // Otherwise try to login
+    if (subject == null || !JaasUtils.hasValidKerberosTicket(subject)) {
+        login = new Login(loginContext, configs);
+        login.startThreadIfNeeded();
+        subject = login.subject();
+    } else {
+        login = null;
+    }
     this.serviceName = getServiceName(loginContext, configs);
-    login.startThreadIfNeeded();
 }
{code}

So, while the Kafka API does not receive a {{Subject}} as a parameter, it will 
obtain it from the current context, and if there is one it will use it. If the 
Subject was obtained from the context, the Kafka client should not be 
responsible for its renewal, and that is what the patch does.
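For context, a minimal sketch (not part of the patch) of how a client application that manages its own Kerberos login would surface that Subject to the patched constructor: build the Kafka client inside {{Subject.doAs(...)}} so that {{Subject.getSubject(AccessController.getContext())}} sees it. The "MyApp" JAAS entry and the producer configs are illustrative, and exception handling is omitted:

{code}
import java.security.PrivilegedAction;
import java.util.Properties;
import javax.security.auth.Subject;
import javax.security.auth.login.LoginContext;
import org.apache.kafka.clients.producer.KafkaProducer;

LoginContext lc = new LoginContext("MyApp");   // app-managed login
lc.login();
Subject subject = lc.getSubject();

Properties configs = new Properties();
configs.put("bootstrap.servers", "broker:9092");
configs.put("security.protocol", "SASL_PLAINTEXT");
configs.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
configs.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

// Anything constructed inside doAs() runs with "subject" on the
// AccessControlContext, which is where the patched LoginManager looks.
KafkaProducer<String, String> producer = Subject.doAs(subject,
    (PrivilegedAction<KafkaProducer<String, String>>) () ->
        new KafkaProducer<>(configs));
{code}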

> LoginManager should allow using an existing Subject
> ---
>
> Key: KAFKA-3199
> URL: https://issues.apache.org/jira/browse/KAFKA-3199
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 0.9.0.0
>Reporter: Adam Kunicki
>Assignee: Adam Kunicki
>Priority: Critical
>
> LoginManager currently creates a new Login in the constructor, which then 
> performs a login and starts a ticket renewal thread. The problem here is that 
> because Kafka performs its own login, it doesn't offer the ability to re-use 
> an existing Subject that's already managed by the client application.
> The goal of LoginManager appears to be to return a valid Subject. 
> It would be a simple fix to have LoginManager.acquireLoginManager() check for 
> a new config, e.g. kerberos.use.existing.subject. 
> Instead of creating a new Login in the constructor, this would simply call 
> Subject.getSubject(AccessController.getContext()) to use the already-logged-in 
> Subject.
> This is also doable without introducing a new configuration, by simply 
> checking if there is already a valid Subject available, but I think it may be 
> preferable to require that users explicitly request this behavior.





[jira] [Commented] (KAFKA-3478) Finer Stream Flow Control

2016-09-01 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15456311#comment-15456311
 ] 

Matthias J. Sax commented on KAFKA-3478:


[~bbejeck] I just created https://issues.apache.org/jira/browse/KAFKA-4113 and 
https://issues.apache.org/jira/browse/KAFKA-4114. Feel free to grab one :)

> Finer Stream Flow Control
> -
>
> Key: KAFKA-3478
> URL: https://issues.apache.org/jira/browse/KAFKA-3478
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Guozhang Wang
>  Labels: user-experience
> Fix For: 0.10.1.0
>
>
> Today we have an event-time based flow control mechanism in order to 
> synchronize multiple input streams in a best-effort manner:
> http://docs.confluent.io/3.0.0/streams/architecture.html#flow-control-with-timestamps
> However, there are some use cases where users would like to have finer 
> control of the input streams, for example, with two input streams, one of 
> them always reading from offset 0 upon (re)starting, and the other reading 
> from the log end offset.
> Today we only have one consumer config "auto.offset.reset" to control that 
> behavior, which means all streams are read either from "earliest" or "latest".
> We should consider how to improve these settings to allow users to have finer 
> control over these streams.
> =
> A finer flow control could also be used to allow for populating a {{KTable}} 
> (with an "initial" state) before starting the actual processing (this feature 
> was asked for on the mailing list multiple times already). Even if it is quite 
> hard to define *when* the initial populating phase should end, this might 
> still be useful. There would be the following possibilities:
>  1) an initial fixed time period for populating
> (it might be hard for a user to estimate the correct value)
>  2) an "idle" period, i.e., if no update to a KTable is done for a certain 
> time, we consider it populated
>  3) a timestamp cut-off point, i.e., all records with an older timestamp
> belong to the initial populating phase
>  4) a throughput threshold, i.e., if the populating frequency falls below
> the threshold, the KTable is considered "finished"
>  5) maybe something else ??
> The API might look something like this:
> {noformat}
> KTable table = builder.table("topic", 1000); // populate the table without 
> reading any other topics until we see one record with timestamp 1000.
> {noformat}







[jira] [Comment Edited] (KAFKA-4113) Allow KTable bootstrap

2016-09-01 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15456296#comment-15456296
 ] 

Jay Kreps edited comment on KAFKA-4113 at 9/1/16 6:52 PM:
--

Don't you get this naturally out of the message timestamps and the 
prioritization we already do? People say they want to "fully populate" a table, 
but I think this isn't true. Rather, you want the table to be in the same state 
the associated streams would be in. To see the difference, imagine a case where 
you have a job that is doing a stream-table join, and say you lose all your 
materialized table state and have a job that is down for three hours (for 
whatever reason--maintenance or something). When it comes back up, you don't 
actually want to catch all the way up on the table, because if you do that you 
will be joining table data from now to stream data from three hours ago. 
Rather, what you want is to catch up the table to three hours ago and then keep 
the two roughly aligned, so you are joining stream data from time X to the state 
of the table at time X.

But isn't this exactly what the timestamp prioritization does already? It 
naturally leads to you catching up on populating the table first if that data 
is older, right?


was (Author: jkreps):
Don't you get this naturally out of the message timestamps and the 
prioritization we already do? People say they want to "fully populate" a table, 
but I think this isn't true. Rather, you want the table to be in the same state 
the associated streams would be in. To see the difference, imagine a case where 
you have a job that is doing a stream-table join, and say you lose all your 
materialized table state and have a job that is down for three hours (for 
whatever reason--maintenance or something). When it comes back up, you don't 
actually want to catch all the way up on the table, because if you do that you 
will be joining table data from now to stream data from three hours ago. 
Rather, what you want is to catch up the table to three hours ago and then keep 
the two roughly aligned, so you are joining stream data from time X to the state 
of the table at time X.

But isn't this exactly what the timestamp prioritization does already?

> Allow KTable bootstrap
> --
>
> Key: KAFKA-4113
> URL: https://issues.apache.org/jira/browse/KAFKA-4113
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Guozhang Wang
>
> On the mailing list, there are multiple requests about the possibility to 
> "fully populate" a KTable before actual stream processing starts.
> Even if it is somewhat difficult to define when the initial populating phase 
> should end, there are multiple possibilities:
> The main idea is that there is a rarely updated topic that contains the 
> data. Only after this topic got read completely and the KTable is ready should 
> the application start processing. This would indicate that, on startup, 
> the current partition sizes must be fetched and stored, and after the KTable 
> got populated up to those offsets, stream processing can start.
> Other discussed ideas are:
> 1) an initial fixed time period for populating
> (it might be hard for a user to estimate the correct value)
> 2) an "idle" period, ie, if no update to a KTable for a certain time is
> done, we consider it as populated
> 3) a timestamp cut off point, ie, all records with an older timestamp
> belong to the initial populating phase
> The API change is not decided yet, and the API design is part of this JIRA.
> One suggestion (for option (4)) was:
> {noformat}
> KTable table = builder.table("topic", 1000); // populate the table without 
> reading any other topics until seeing one record with timestamp 1000.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4114) Allow for different "auto.offset.reset" strategies for different input streams

2016-09-01 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-4114:
--

 Summary: Allow for different "auto.offset.reset" strategies for 
different input streams
 Key: KAFKA-4114
 URL: https://issues.apache.org/jira/browse/KAFKA-4114
 Project: Kafka
  Issue Type: Sub-task
  Components: streams
Reporter: Matthias J. Sax
Assignee: Guozhang Wang


Today we only have one consumer config "auto.offset.reset" to control that 
behavior, which means all streams are read either from "earliest" or "latest".

However, it would be useful to improve this setting to allow users to have finer 
control over different input streams. For example, with two input streams, one 
of them could always read from offset 0 upon (re)starting, and the other read 
from the log end offset.

This JIRA requires extending the {{KStreamBuilder}} API methods 
{{.stream(...)}} and {{.table(...)}} with a new parameter that indicates the 
initial offset to be used.
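As a sketch of the gap (not part of the JIRA text): today the single, global
setting is supplied through the consumer config, while any per-source control
would need new, still-hypothetical overloads. Property names below are the
standard configs; the overloads in the comment are invented for illustration.

{noformat}
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class GlobalResetToday {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "reset-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Today this one setting applies to ALL input topics of the application:
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        // A per-source parameter, as requested above, might look like (hypothetical):
        //   builder.stream(AutoOffsetReset.EARLIEST, "topic-a");
        //   builder.table(AutoOffsetReset.LATEST, "topic-b");
    }
}
{noformat}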



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-4113) Allow KTable bootstrap

2016-09-01 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456296#comment-15456296
 ] 

Jay Kreps edited comment on KAFKA-4113 at 9/1/16 6:51 PM:
--

Don't you get this naturally out of the message timestamps and the 
prioritization we already do? People say they want to "fully populate" a table 
but I think this isn't true. Rather you want the table to be in the same state 
the associated streams would be in. To see the difference imagine a case where 
you have a job that is doing a stream-table join and say you lose all your 
materialized table state and have a job that is down for three hours (for 
whatever reason--maintenance or something). When it comes back up you don't 
actually want to catch all the way up on the table because if you do that you 
will be joining table data from now to stream data from three hours ago. 
Rather, what you want is to catch up the table to three hours ago and then keep 
the two roughly aligned so you are joining stream data from time X to the state 
of the table at time X.

But isn't this exactly what the time stamp prioritization does already?


was (Author: jkreps):
Don't you get this naturally out of the message timestamps and the 
prioritization we already do? People say they want to "fully populate" a table 
but I think this isn't true. Rather you want the table to be in the same state 
the associated streams would be in. To see the difference imagine a case where 
you have a job that is doing a stream-table join and say you lose all your 
materialized table state and have a job that is down for two hours (for 
whatever reason--maintenance or something). When it comes back up you don't 
actually want to catch all the way up on the table because if you do that you 
will be joining table data from now to stream data from three hours ago. 
Rather, what you want is to catch up the table to three hours ago and then keep 
the two roughly aligned.

But isn't this exactly what the time stamp prioritization does already?

> Allow KTable bootstrap
> --
>
> Key: KAFKA-4113
> URL: https://issues.apache.org/jira/browse/KAFKA-4113
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Guozhang Wang
>
> On the mailing list, there are multiple requests about the possibility to 
> "fully populate" a KTable before actual stream processing starts.
> Even if it is somewhat difficult to define when the initial populating phase 
> should end, there are multiple possibilities:
> The main idea is that there is a rarely updated topic that contains the 
> data. Only after this topic got read completely and the KTable is ready should 
> the application start processing. This would indicate that, on startup, 
> the current partition sizes must be fetched and stored, and after the KTable 
> got populated up to those offsets, stream processing can start.
> Other discussed ideas are:
> 1) an initial fixed time period for populating
> (it might be hard for a user to estimate the correct value)
> 2) an "idle" period, ie, if no update to a KTable for a certain time is
> done, we consider it as populated
> 3) a timestamp cut off point, ie, all records with an older timestamp
> belong to the initial populating phase
> The API change is not decided yet, and the API design is part of this JIRA.
> One suggestion (for option (4)) was:
> {noformat}
> KTable table = builder.table("topic", 1000); // populate the table without 
> reading any other topics until seeing one record with timestamp 1000.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-78: Cluster Id

2016-09-01 Thread Dong Lin
Hey Ismael,

Thanks for the KIP.

I share the view with Harsha and would like to understand how the current
approach of randomly generating cluster.id compares with the approach of
manually specifying it in meta.properties.

I think one big advantage of defining it manually in zookeeper is that we
can easily tell which cluster it is by simply looking at the sensor name,
which makes it more useful to the auditing or monitoring use-case that this
KIP intends to address. With a random id, on the other hand, you can only tell
whether two sensors are measuring the same cluster or not. Also note that even
this goal is not easily guaranteed, because you need an external mechanism to
manually re-generate the znode with the old cluster.id if the znode is deleted
or if the same cluster (w.r.t. purpose) is changed to use a different ZooKeeper.

I read your reply to Harsha but I still don't fully understand your concern
with that approach. I think the broker can simply register the cluster.id in
that znode if it is not specified yet, in the same way that this KIP proposes
to do it, right? Can you please elaborate more on your concern with this
approach?

Thanks,
Dong


On Thu, Sep 1, 2016 at 10:19 AM, Ismael Juma  wrote:

> Hi Jun,
>
> Thanks for the feedback. Using `Cluster` was appealing because it has the
> information we need and it is already a public class, so we would not need
> to introduce a new public class with a potentially confusing name. Having
> said that, I agree with your points and I have updated the KIP so that the
> listener is called ClusterResourceListener and we now pass a
> ClusterResource instance.
>
> I have also clarified the motivation section a little and added a paragraph
> about the fact that the cluster id is only stored in ZooKeeper. I think
> this is fine for the first version and we can tackle improvements in future
> KIPs.
>
> I intend to start a vote soon as it seems like people are generally fine
> with the current proposal.
>
> Ismael
>
> On Wed, Aug 31, 2016 at 4:34 PM, Jun Rao  wrote:
>
> > Ismael,
> >
> > Thanks for the proposal. It looks good overall. Just a comment below.
> >
> > We are adding the following interface in ClusterListener and Cluster
> > includes all brokers' endpoint and the metadata of topic partitions.
> >
> > void onClusterUpdate(Cluster cluster);
> >
> >
> > On the broker side, will that method be called when there is broker or
> > topic/partition change in metadata cache? Another thing is that Cluster
> > only includes one endpoint. This makes sense for the client. However, on
> > the broker side, it's not clear which endpoint should be used.
> >
> > In general, I am not sure how useful it is for serializers, interceptors,
> > metric reporters to know all brokers endpoints and topic/partition
> > metadata.
> >
> I was thinking we could instead pass in something like a ClusterResourceMeta
> and
> > just include the clusterId for now. This way, we can guarantee that
> > onClusterUpdate()
> > will only be called once and it's easier to implement on the broker side.
> >
> > Jun
> >
> >
> > On Wed, Aug 31, 2016 at 1:24 AM, Ismael Juma  wrote:
> >
> > > Thanks for the feedback Guozhang. Comment inline.
> > >
> > > On Wed, Aug 31, 2016 at 2:49 AM, Guozhang Wang 
> > wrote:
> > >
> > > > About logging / debugging with the cluster id: I think the random
> UUID
> > > > itself may not be very helpful for human-readable debugging
> > information,
> > > > and we'd better use the cluster name mentioned in future work in
> > logging.
> > > >
> > >
> > > We can also add the human-readable value once it's available. However,
> > the
> > > random UUID is still useful now. After all, we use Git commit hashes in
> > > many places and they are significantly longer than what we are
> proposing
> > > here (40 instead of 22 characters) . When comparing by eye, one can
> often
> > > just look at the first few characters to distinguish. Does that make
> > sense?
> > >
> > > Ismael
> > >
> >
>


[jira] [Commented] (KAFKA-4113) Allow KTable bootstrap

2016-09-01 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456296#comment-15456296
 ] 

Jay Kreps commented on KAFKA-4113:
--

Don't you get this naturally out of the message timestamps and the 
prioritization we already do? People say they want to "fully populate" a table 
but I think this isn't true. Rather you want the table to be in the same state 
the associated streams would be in. To see the difference imagine a case where 
you have a job that is doing a stream-table join and say you lose all your 
materialized table state and have a job that is down for two hours (for 
whatever reason--maintenance or something). When it comes back up you don't 
actually want to catch all the way up on the table because if you do that you 
will be joining table data from now to stream data from three hours ago. 
Rather, what you want is to catch up the table to three hours ago and then keep 
the two roughly aligned.

But isn't this exactly what the time stamp prioritization does already?

> Allow KTable bootstrap
> --
>
> Key: KAFKA-4113
> URL: https://issues.apache.org/jira/browse/KAFKA-4113
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Guozhang Wang
>
> On the mailing list, there are multiple requests about the possibility to 
> "fully populate" a KTable before actual stream processing starts.
> Even if it is somewhat difficult to define when the initial populating phase 
> should end, there are multiple possibilities:
> The main idea is that there is a rarely updated topic that contains the 
> data. Only after this topic got read completely and the KTable is ready should 
> the application start processing. This would indicate that, on startup, 
> the current partition sizes must be fetched and stored, and after the KTable 
> got populated up to those offsets, stream processing can start.
> Other discussed ideas are:
> 1) an initial fixed time period for populating
> (it might be hard for a user to estimate the correct value)
> 2) an "idle" period, ie, if no update to a KTable for a certain time is
> done, we consider it as populated
> 3) a timestamp cut off point, ie, all records with an older timestamp
> belong to the initial populating phase
> The API change is not decided yet, and the API design is part of this JIRA.
> One suggestion (for option (4)) was:
> {noformat}
> KTable table = builder.table("topic", 1000); // populate the table without 
> reading any other topics until seeing one record with timestamp 1000.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4113) Allow KTable bootstrap

2016-09-01 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-4113:
--

 Summary: Allow KTable bootstrap
 Key: KAFKA-4113
 URL: https://issues.apache.org/jira/browse/KAFKA-4113
 Project: Kafka
  Issue Type: Sub-task
  Components: streams
Reporter: Matthias J. Sax
Assignee: Guozhang Wang


On the mailing list, there are multiple requests about the possibility to "fully 
populate" a KTable before actual stream processing starts.

Even if it is somewhat difficult to define when the initial populating phase 
should end, there are multiple possibilities:

The main idea is that there is a rarely updated topic that contains the data. 
Only after this topic got read completely and the KTable is ready should the 
application start processing. This would indicate that, on startup, the 
current partition sizes must be fetched and stored, and after the KTable got 
populated up to those offsets, stream processing can start.

Other discussed ideas are:
1) an initial fixed time period for populating
(it might be hard for a user to estimate the correct value)
2) an "idle" period, ie, if no update to a KTable for a certain time is
done, we consider it as populated
3) a timestamp cut off point, ie, all records with an older timestamp
belong to the initial populating phase

The API change is not decided yet, and the API design is part of this JIRA.

One suggestion (for option (4)) was:
{noformat}
KTable table = builder.table("topic", 1000); // populate the table without 
reading any other topics until seeing one record with timestamp 1000.
{noformat}
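
As an aside for readers who want to experiment with the offset-based variant
today, a rough sketch against the plain consumer API (this assumes a consumer
version that exposes endOffsets(), which 0.10.0 does not; topic name and
serdes are placeholders, and this is not a Streams API):

{noformat}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;

public class TableBootstrapSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "table-bootstrap-demo");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        Map<String, String> table = new HashMap<>();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            List<TopicPartition> parts = new ArrayList<>();
            for (PartitionInfo p : consumer.partitionsFor("table-topic"))
                parts.add(new TopicPartition(p.topic(), p.partition()));
            consumer.assign(parts);
            consumer.seekToBeginning(parts);
            // Snapshot the partition end offsets at startup ...
            Map<TopicPartition, Long> end = consumer.endOffsets(parts);
            // ... and consume until every partition has reached its snapshot.
            boolean caughtUp = false;
            while (!caughtUp) {
                for (ConsumerRecord<String, String> r : consumer.poll(100))
                    table.put(r.key(), r.value());
                caughtUp = true;
                for (TopicPartition tp : parts)
                    caughtUp &= consumer.position(tp) >= end.get(tp);
            }
            // The KTable-like state is "bootstrapped"; processing could start here.
        }
    }
}
{noformat}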



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: PartitionAssignor / Sort members per subscription time before assigning partitions

2016-09-01 Thread Jason Gustafson
Hi Florian,

I'm not totally sure I understand the problem. The consumer id consists of
the clientId configured by the user with a UUID appended to it. If the
clientId has not been passed in configuration, we use "consumer-{n}" for it
where n is incremented for every new consumer instance. Is the problem
basically that this default, when applied independently on different JVMs, is
giving you less than ideal ordering?

For what it's worth, I know that Kafka Streams is a little more clever in
how partitions are assigned. It uses a custom assignor which takes into
account the consumer's host information.

Thanks,
Jason

On Thu, Sep 1, 2016 at 9:00 AM, Florian Hussonnois 
wrote:

> Hi Kafka Team,
>
> I would like to have your opinion before creating a new JIRA.
>
> I'm working with the Java Consumer API. The current partition assignors use
> the consumer ids to sort members before assigning partitions.
>
> This works pretty well as long as all consumers are started into the same
> JVM and no more consumers than the number of partitions are created.
>
> However, in many cases consumers are distributed across multiple hosts. To
> continue consuming with an optimal number of consumers (even with a host
> failure) we can create as many consumers as partitions on each host.
>
> For example, we have two consumers C0, C1 each on a dedicated host and one
> topic with 4 partitions. With current assignors it is not possible to have
> 2 consuming threads and 2 idle threads per host.
>
> Instead of that, C0 will have 4 consuming threads and C1 will have 4 idle
> threads.
>
> One solution could be to keep a timestamp the first time a member
> subscribes to a topic. This timestamp can then be used to sort members for
> the partitions assignment. In this way, the partition assignment will be
> more predictable as it will not depend on member ids.
>
> One drawback of this solution is that the consumer responsible for
> assignments will keep local state.
>
> Thanks,
>
> --
> Florian
>
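
For what it's worth, until a smarter assignor exists, one workaround suggested
by Jason's note is to set client.id explicitly so the lexicographic member
sort interleaves hosts. To the best of my knowledge the member id is built as
client.id plus a random suffix, so the suffix only breaks ties within one
client.id prefix; the ids below are illustrative:

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ClientIdExample {
    // Illustrative only: choosing client.id so the lexicographic member sort
    // interleaves hosts, e.g. "0-hostA", "0-hostB", "1-hostA", "1-hostB".
    static KafkaConsumer<byte[], byte[]> newConsumer(int threadIndex, String host) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
        props.put(ConsumerConfig.CLIENT_ID_CONFIG, threadIndex + "-" + host);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        return new KafkaConsumer<>(props);
    }
}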


Re: [DISCUSS] Remove beta label from the new Java consumer

2016-09-01 Thread Harsha Chintalapani
I would like to see us address
https://issues.apache.org/jira/browse/KAFKA-1894 . This is problematic in
secure clusters when users try to access topics for which they don't have
ACLs turned on.

-Harsha

On Tue, Aug 30, 2016 at 9:40 PM Jaikiran Pai 
wrote:

> We have been using the (new) Java consumer API in 0.9.0.1 for a while
> now. We have some well-known issues with it - like heartbeats being
> handled on the same thread, causing the consumer to sometimes be considered
> dead. I understand that this has been fixed in 0.10.0.1 but we haven't
> yet had a chance to migrate to it. We plan to do that in the next month
> or so.
>
> Personally, I would be OK if the beta label is removed from it if the
> dev team is sure the API isn't going to change. I don't know if that's
> true or not post 0.10.0.1. For me the major thing that I think needs to
> be addressed is these JIRAs which actually expose some API
> implementation level issues. Not sure if solving those issues will
> involve changes to API itself:
>
> https://issues.apache.org/jira/browse/KAFKA-1894
> https://issues.apache.org/jira/browse/KAFKA-3540
> https://issues.apache.org/jira/browse/KAFKA-3539
>
> If solving issues like these will not involve changes to the API, I
> think it's safe to move it out of beta label.
>
> -Jaikiran
>
> On Tuesday 30 August 2016 05:09 PM, Ismael Juma wrote:
> > Thanks for the feedback everyone. Since Harsha said that he is OK either
> > way and everyone else is in favour, I think we should go ahead with this.
> > Since we committed to API stability for the new Java consumer in 0.10.0.0
> > via KIP-45, this is simply a documentation change and I don't think we
> need
> > an official vote thread (we didn't have one for the equivalent producer
> > change).
> >
> > Ismael
> >
> > On Mon, Aug 29, 2016 at 7:37 PM, Jay Kreps  wrote:
> >
> >> +1 I talk to a lot of kafka users, and I would say > 75% of people doing
> >> new things are on the new consumer despite our warnings :-)
> >>
> >> -Jay
> >>
> >> On Thu, Aug 25, 2016 at 2:05 PM, Jason Gustafson 
> >> wrote:
> >>
> >>> I'm +1 also. I feel a lot more confident about this with all of the
> >> system
> >>> testing we now have in place (including the tests covering Streams and
> >>> Connect).
> >>>
> >>> -Jason
> >>>
> >>> On Thu, Aug 25, 2016 at 9:57 AM, Gwen Shapira 
> wrote:
> >>>
>  Makes sense :)
> 
>  On Thu, Aug 25, 2016 at 9:40 AM, Neha Narkhede 
> >>> wrote:
> > Yeah, I'm supportive of this.
> >
> > On Thu, Aug 25, 2016 at 9:26 AM Ismael Juma 
> >> wrote:
> >> Hi Gwen,
> >>
> >> We have a few recent stories of people using Connect and Streams in
> >> production. That means the new Java Consumer too. :)
> >>
> >> Ismael
> >>
> >> On Thu, Aug 25, 2016 at 5:09 PM, Gwen Shapira 
>  wrote:
> >>> Originally, we suggested keeping the beta label until we know
> >>> someone
> >>> successfully uses the new consumer in production.
> >>>
> >>> We can consider the recent KIPs enough, but IMO it will be better
> >> if
> >>> someone with production deployment hanging out on our mailing list
> >>> will confirm good experience with the new consumer.
> >>>
> >>> Gwen
> >>>
> >>> On Wed, Aug 24, 2016 at 8:45 PM, Ismael Juma 
>  wrote:
>  Hi all,
> 
>  We currently say the following in our documentation:
> 
>  "As of the 0.9.0 release we have added a new Java consumer to
>  replace
> >> our
>  existing high-level ZooKeeper-based consumer and low-level
> >>> consumer
> >> APIs.
>  This client is considered beta quality."[1]
> 
>  Since then, Jason and the community have done a lot of work to
>  improve
> >> it
>  (including KIP-41 and KIP-62), we declared it API stable in
> >>> 0.10.0.0
> >> and
>  it's the only option for those that need security support. Yes,
> >> it
> >> still
>  has bugs, but so does the old consumer and all development is
>  currently
>  focused on the new consumer.
> 
>  As such, I propose we remove the beta label for the next release
> >>> and
> >>> switch
>  our tools to use the new consumer by default unless the
> >> zookeeper
>  command-line option is present (for compatibility). This is
> >>> similar
>  to
> >>> what
>  we did it for the new producer in 0.9.0.0, but backwards
> >>> compatible.
>  Thoughts?
> 
>  Ismael
> 
>  [1] http://kafka.apache.org/documentation.html#consumerapi
> >>>
> >>>
> >>> --
> >>> Gwen Shapira
> >>> Product Manager | Confluent
> >>> 650.450.2760 | @gwenshap
> >>> Follow us: Twitter | blog
> >>>
> > --
> > 

[jira] [Commented] (KAFKA-3199) LoginManager should allow using an existing Subject

2016-09-01 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456228#comment-15456228
 ] 

Sriharsha Chintalapani commented on KAFKA-3199:
---

[~tucu00] HADOOP-13558 is not relevant here. We don't accept subjects that 
come from the host application and there is no such provision. A client will 
initiate the LoginManager, and it will take care of the life cycle of the subject.

> LoginManager should allow using an existing Subject
> ---
>
> Key: KAFKA-3199
> URL: https://issues.apache.org/jira/browse/KAFKA-3199
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 0.9.0.0
>Reporter: Adam Kunicki
>Assignee: Adam Kunicki
>Priority: Critical
>
> LoginManager currently creates a new Login in the constructor which then 
> performs a login and starts a ticket renewal thread. The problem here is that 
> because Kafka performs its own login, it doesn't offer the ability to re-use 
> an existing subject that's already managed by the client application.
> The goal of LoginManager appears to be to be able to return a valid Subject. 
> It would be a simple fix to have LoginManager.acquireLoginManager() check for 
> a new config e.g. kerberos.use.existing.subject. 
> This would, instead of creating a new Login in the constructor, simply call 
> Subject.getSubject(AccessController.getContext()); to use the already logged 
> in Subject.
> This is also doable without introducing a new configuration and simply 
> checking if there is already a valid Subject available, but I think it may be 
> preferable to require that users explicitly request this behavior.
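
To make the reuse path concrete, a tiny sketch of the lookup described above
(kerberos.use.existing.subject is the config proposed in this JIRA, not an
existing Kafka setting):

{noformat}
import java.security.AccessController;
import javax.security.auth.Subject;

public class SubjectReuseSketch {
    // Sketch: return a Subject already established by the host application,
    // if any, instead of performing a fresh JAAS login in the Kafka client.
    static Subject existingSubjectOrNull() {
        return Subject.getSubject(AccessController.getContext());
    }
}
{noformat}

As discussed elsewhere in this thread, ticket renewal for such a Subject would
remain the host application's responsibility, not the Kafka client's.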



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3199) LoginManager should allow using an existing Subject

2016-09-01 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456209#comment-15456209
 ] 

Alejandro Abdelnur commented on KAFKA-3199:
---

[~sriharsha], if the {{Subject}} was given to the Kafka client by the host 
application, it is the responsibility of the host application to renew the 
ticket, Kafka client should not attempt that. A related discussion in Hadoop 
and the {{UserGroupInformation}} class doing the same is at HADOOP-13558.

> LoginManager should allow using an existing Subject
> ---
>
> Key: KAFKA-3199
> URL: https://issues.apache.org/jira/browse/KAFKA-3199
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 0.9.0.0
>Reporter: Adam Kunicki
>Assignee: Adam Kunicki
>Priority: Critical
>
> LoginManager currently creates a new Login in the constructor which then 
> performs a login and starts a ticket renewal thread. The problem here is that 
> because Kafka performs its own login, it doesn't offer the ability to re-use 
> an existing subject that's already managed by the client application.
> The goal of LoginManager appears to be to be able to return a valid Subject. 
> It would be a simple fix to have LoginManager.acquireLoginManager() check for 
> a new config e.g. kerberos.use.existing.subject. 
> This would, instead of creating a new Login in the constructor, simply call 
> Subject.getSubject(AccessController.getContext()); to use the already logged 
> in Subject.
> This is also doable without introducing a new configuration and simply 
> checking if there is already a valid Subject available, but I think it may be 
> preferable to require that users explicitly request this behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-4106) Consumer / add configure method to PartitionAssignor interface

2016-09-01 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-4106.

Resolution: Resolved
  Assignee: Jason Gustafson

The requested functionality already exists.
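
For the record, the existing route appears to be that the consumer instantiates
assignors reflectively from partition.assignment.strategy and calls configure()
on any that also implement Configurable, so the PartitionAssignor interface
itself does not need to change. A minimal sketch (the config key is invented
for illustration and would be passed as an ordinary consumer property):

{noformat}
import java.util.Map;
import org.apache.kafka.clients.consumer.RoundRobinAssignor;
import org.apache.kafka.common.Configurable;

// Sketch: receive consumer configs in a custom assignor without touching the
// PartitionAssignor interface, by also implementing Configurable.
public class RackAwareRoundRobinAssignor extends RoundRobinAssignor implements Configurable {

    private String rack = "unknown";

    @Override
    public void configure(Map<String, ?> configs) {
        Object value = configs.get("assignor.rack.id"); // invented key for illustration
        if (value != null)
            rack = value.toString();
    }

    @Override
    public String name() {
        return "rackawareroundrobin";
    }
}
{noformat}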

> Consumer / add configure method to PartitionAssignor interface
> --
>
> Key: KAFKA-4106
> URL: https://issues.apache.org/jira/browse/KAFKA-4106
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, consumer
>Affects Versions: 0.10.0.1
>Reporter: Florian Hussonnois
>Assignee: Jason Gustafson
>Priority: Minor
>
> Currently, we can implement a custom PartitionAssignor which will forward 
> user data to be used during the assignment protocol. For example, this 
> data can be used to implement a rack-aware assignor.
> However, currently we cannot dynamically configure a PartitionAssignor 
> instance.
> It would be nice to add a method configure(Map<String, ?> configs) to the 
> PartitionAssignor interface. Then, this method would be invoked by the 
> KafkaConsumer on each assignor, as is done for deserializers.
> The code modifications are pretty straightforward but involve modifying the 
> public interface PartitionAssignor. Does that mean this JIRA needs a KIP?
> I can contribute to that improvement.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: kafka-trunk-jdk8 #856

2016-09-01 Thread Apache Jenkins Server
See 

Changes:

[me] KAFKA-4077: Backdate system test certificates to cope with clock skew

--
[...truncated 4906 lines...]

kafka.network.SocketServerTest > tooBigRequestIsRejected STARTED

kafka.network.SocketServerTest > tooBigRequestIsRejected PASSED

kafka.utils.ZkUtilsTest > testAbortedConditionalDeletePath STARTED

kafka.utils.ZkUtilsTest > testAbortedConditionalDeletePath PASSED

kafka.utils.ZkUtilsTest > testSuccessfulConditionalDeletePath STARTED

kafka.utils.ZkUtilsTest > testSuccessfulConditionalDeletePath PASSED

kafka.utils.ByteBoundedBlockingQueueTest > testByteBoundedBlockingQueue STARTED

kafka.utils.ByteBoundedBlockingQueueTest > testByteBoundedBlockingQueue PASSED

kafka.utils.ReplicationUtilsTest > testUpdateLeaderAndIsr STARTED

kafka.utils.ReplicationUtilsTest > testUpdateLeaderAndIsr PASSED

kafka.utils.ReplicationUtilsTest > testGetLeaderIsrAndEpochForPartition STARTED

kafka.utils.ReplicationUtilsTest > testGetLeaderIsrAndEpochForPartition PASSED

kafka.utils.timer.TimerTest > testAlreadyExpiredTask STARTED

kafka.utils.timer.TimerTest > testAlreadyExpiredTask PASSED

kafka.utils.timer.TimerTest > testTaskExpiration STARTED

kafka.utils.timer.TimerTest > testTaskExpiration PASSED

kafka.utils.timer.TimerTaskListTest > testAll STARTED

kafka.utils.timer.TimerTaskListTest > testAll PASSED

kafka.utils.SchedulerTest > testMockSchedulerNonPeriodicTask STARTED

kafka.utils.SchedulerTest > testMockSchedulerNonPeriodicTask PASSED

kafka.utils.SchedulerTest > testMockSchedulerPeriodicTask STARTED

kafka.utils.SchedulerTest > testMockSchedulerPeriodicTask PASSED

kafka.utils.SchedulerTest > testNonPeriodicTask STARTED

kafka.utils.SchedulerTest > testNonPeriodicTask PASSED

kafka.utils.SchedulerTest > testRestart STARTED

kafka.utils.SchedulerTest > testRestart PASSED

kafka.utils.SchedulerTest > testReentrantTaskInMockScheduler STARTED

kafka.utils.SchedulerTest > testReentrantTaskInMockScheduler PASSED

kafka.utils.SchedulerTest > testPeriodicTask STARTED

kafka.utils.SchedulerTest > testPeriodicTask PASSED

kafka.utils.CommandLineUtilsTest > testParseEmptyArg STARTED

kafka.utils.CommandLineUtilsTest > testParseEmptyArg PASSED

kafka.utils.CommandLineUtilsTest > testParseSingleArg STARTED

kafka.utils.CommandLineUtilsTest > testParseSingleArg PASSED

kafka.utils.CommandLineUtilsTest > testParseArgs STARTED

kafka.utils.CommandLineUtilsTest > testParseArgs PASSED

kafka.utils.CommandLineUtilsTest > testParseEmptyArgAsValid STARTED

kafka.utils.CommandLineUtilsTest > testParseEmptyArgAsValid PASSED

kafka.utils.UtilsTest > testAbs STARTED

kafka.utils.UtilsTest > testAbs PASSED

kafka.utils.UtilsTest > testReplaceSuffix STARTED

kafka.utils.UtilsTest > testReplaceSuffix PASSED

kafka.utils.UtilsTest > testCircularIterator STARTED

kafka.utils.UtilsTest > testCircularIterator PASSED

kafka.utils.UtilsTest > testReadBytes STARTED

kafka.utils.UtilsTest > testReadBytes PASSED

kafka.utils.UtilsTest > testCsvList STARTED

kafka.utils.UtilsTest > testCsvList PASSED

kafka.utils.UtilsTest > testReadInt STARTED

kafka.utils.UtilsTest > testReadInt PASSED

kafka.utils.UtilsTest > testCsvMap STARTED

kafka.utils.UtilsTest > testCsvMap PASSED

kafka.utils.UtilsTest > testInLock STARTED

kafka.utils.UtilsTest > testInLock PASSED

kafka.utils.UtilsTest > testSwallow STARTED

kafka.utils.UtilsTest > testSwallow PASSED

kafka.utils.JsonTest > testJsonEncoding STARTED

kafka.utils.JsonTest > testJsonEncoding PASSED

kafka.utils.IteratorTemplateTest > testIterator STARTED

kafka.utils.IteratorTemplateTest > testIterator PASSED

kafka.metrics.KafkaTimerTest > testKafkaTimer STARTED

kafka.metrics.KafkaTimerTest > testKafkaTimer PASSED

kafka.metrics.MetricsTest > testMetricsReporterAfterDeletingTopic STARTED

kafka.metrics.MetricsTest > testMetricsReporterAfterDeletingTopic PASSED

kafka.metrics.MetricsTest > 
testBrokerTopicMetricsUnregisteredAfterDeletingTopic STARTED

kafka.metrics.MetricsTest > 
testBrokerTopicMetricsUnregisteredAfterDeletingTopic PASSED

kafka.metrics.MetricsTest > testMetricsLeak STARTED

kafka.metrics.MetricsTest > testMetricsLeak PASSED

kafka.consumer.TopicFilterTest > testWhitelists STARTED

kafka.consumer.TopicFilterTest > testWhitelists PASSED

kafka.consumer.TopicFilterTest > 
testWildcardTopicCountGetTopicCountMapEscapeJson STARTED

kafka.consumer.TopicFilterTest > 
testWildcardTopicCountGetTopicCountMapEscapeJson PASSED

kafka.consumer.TopicFilterTest > testBlacklists STARTED

kafka.consumer.TopicFilterTest > testBlacklists PASSED

kafka.consumer.PartitionAssignorTest > testRoundRobinPartitionAssignor STARTED

kafka.consumer.PartitionAssignorTest > testRoundRobinPartitionAssignor PASSED

kafka.consumer.PartitionAssignorTest > testRangePartitionAssignor STARTED

kafka.consumer.PartitionAssignorTest > testRangePartitionAssignor PASSED


Re: [DISCUSS] KIP-78: Cluster Id

2016-09-01 Thread Ismael Juma
Hi Jun,

Thanks for the feedback. Using `Cluster` was appealing because it has the
information we need and it is already a public class, so we would not need
to introduce a new public class with a potentially confusing name. Having
said that, I agree with your points and I have updated the KIP so that the
listener is called ClusterResourceListener and we now pass a
ClusterResource instance.

I have also clarified the motivation section a little and added a paragraph
about the fact that the cluster id is only stored in ZooKeeper. I think
this is fine for the first version and we can tackle improvements in future
KIPs.

I intend to start a vote soon as it seems like people are generally fine
with the current proposal.

Ismael

On Wed, Aug 31, 2016 at 4:34 PM, Jun Rao  wrote:

> Ismael,
>
> Thanks for the proposal. It looks good overall. Just a comment below.
>
> We are adding the following interface in ClusterListener and Cluster
> includes all brokers' endpoint and the metadata of topic partitions.
>
> void onClusterUpdate(Cluster cluster);
>
>
> On the broker side, will that method be called when there is broker or
> topic/partition change in metadata cache? Another thing is that Cluster
> only includes one endpoint. This makes sense for the client. However, on
> the broker side, it's not clear which endpoint should be used.
>
> In general, I am not sure how useful it is for serializers, interceptors,
> metric reporters to know all brokers endpoints and topic/partition
> metadata.
>
> I was thinking we could instead pass in something like a ClusterResourceMeta and
> just include the clusterId for now. This way, we can guarantee that
> onClusterUpdate()
> will only be called once and it's easier to implement on the broker side.
>
> Jun
>
>
> On Wed, Aug 31, 2016 at 1:24 AM, Ismael Juma  wrote:
>
> > Thanks for the feedback Guozhang. Comment inline.
> >
> > On Wed, Aug 31, 2016 at 2:49 AM, Guozhang Wang 
> wrote:
> >
> > > About logging / debugging with the cluster id: I think the random UUID
> > > itself may not be very helpful for human-readable debugging
> information,
> > > and we'd better use the cluster name mentioned in future work in
> logging.
> > >
> >
> > We can also add the human-readable value once it's available. However,
> the
> > random UUID is still useful now. After all, we use Git commit hashes in
> > many places and they are significantly longer than what we are proposing
> > here (40 instead of 22 characters) . When comparing by eye, one can often
> > just look at the first few characters to distinguish. Does that make
> sense?
> >
> > Ismael
> >
>
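
To make the updated shape concrete, here is a minimal sketch of a metrics
reporter picking up the cluster id through the proposed listener. The method
name onUpdate and the exact interface are assumptions based on the KIP text as
of this discussion and may still change before the vote:

import java.util.List;
import java.util.Map;
import org.apache.kafka.common.ClusterResource;
import org.apache.kafka.common.ClusterResourceListener;
import org.apache.kafka.common.metrics.KafkaMetric;
import org.apache.kafka.common.metrics.MetricsReporter;

// Sketch only: a reporter that tags its output with the cluster id once known.
public class ClusterIdTaggingReporter implements MetricsReporter, ClusterResourceListener {

    private volatile String clusterId = "unknown";

    @Override
    public void onUpdate(ClusterResource clusterResource) {
        // Per the KIP, called once the cluster id is known; ClusterResource
        // carries only the id for now.
        clusterId = clusterResource.clusterId();
    }

    @Override
    public void configure(Map<String, ?> configs) { }

    @Override
    public void init(List<KafkaMetric> metrics) { }

    @Override
    public void metricChange(KafkaMetric metric) {
        System.out.println("[" + clusterId + "] " + metric.metricName());
    }

    @Override
    public void metricRemoval(KafkaMetric metric) { }

    @Override
    public void close() { }
}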


[jira] [Commented] (KAFKA-4077) Backdate validity of certificates in system tests to cope with clock skew

2016-09-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456026#comment-15456026
 ] 

ASF GitHub Bot commented on KAFKA-4077:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1810


> Backdate validity of certificates in system tests to cope with clock skew
> -
>
> Key: KAFKA-4077
> URL: https://issues.apache.org/jira/browse/KAFKA-4077
> Project: Kafka
>  Issue Type: Bug
>  Components: system tests
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>Priority: Minor
> Fix For: 0.10.1.0
>
>
> The scenario described by [~ewencp] in 
> https://github.com/apache/kafka/pull/1483 where tests failed with 
> java.security.cert.CertificateNotYetValidException. Certificates are created 
> on the host and copied to VMs and hence should cope with a small amount of 
> clock skew. Set start date to fix this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4077) Backdate validity of certificates in system tests to cope with clock skew

2016-09-01 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-4077:
-
   Resolution: Fixed
Fix Version/s: 0.10.1.0
   Status: Resolved  (was: Patch Available)

Issue resolved by pull request 1810
[https://github.com/apache/kafka/pull/1810]

> Backdate validity of certificates in system tests to cope with clock skew
> -
>
> Key: KAFKA-4077
> URL: https://issues.apache.org/jira/browse/KAFKA-4077
> Project: Kafka
>  Issue Type: Bug
>  Components: system tests
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>Priority: Minor
> Fix For: 0.10.1.0
>
>
> The scenario described by [~ewencp] in 
> https://github.com/apache/kafka/pull/1483 where tests failed with 
> java.security.cert.CertificateNotYetValidException. Certificates are created 
> on the host and copied to VMs and hence should cope with a small amount of 
> clock skew. Set start date to fix this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1810: KAFKA-4077: Backdate system test certificates to c...

2016-09-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1810


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #1812: KIP-74: Add Fetch Response Size Limit in Bytes (KI...

2016-09-01 Thread nepal
GitHub user nepal opened a pull request:

https://github.com/apache/kafka/pull/1812

KIP-74: Add Fetch Response Size Limit in Bytes (KIP-74) [WIP]

This PR is an incomplete implementation of 
[KIP-74](https://cwiki.apache.org/confluence/display/KAFKA/KIP-74%3A+Add+Fetch+Response+Size+Limit+in+Bytes)
 which was originally motivated by 
[KAFKA-2063](https://issues.apache.org/jira/browse/KAFKA-2063).

What is missing:

1) FetchRequest is modified to preserve partition ordering, but clients 
(Consumer & ReplicaFetcherThread) currently do not do round-robin shuffling of 
partitions
2) The server can exceed the response limit by up to max.partition.fetch.bytes, 
rather than only by the size of the first message in the first partition

Your comments are greatly appreciated.
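
On missing item (1), a small sketch of the kind of round-robin rotation the
KIP calls for, so that an ordered, size-limited response does not starve
partitions at the tail of a fixed list (names are illustrative, not the PR's
actual code):

import java.util.ArrayList;
import java.util.List;

public class FetchOrder {
    // Rotate the partition list by 'start' so each fetch begins at a different
    // partition; callers advance 'start' between fetches.
    static <T> List<T> rotate(List<T> partitions, int start) {
        List<T> rotated = new ArrayList<>(partitions.subList(start, partitions.size()));
        rotated.addAll(partitions.subList(0, start));
        return rotated;
    }
}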

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nepal/kafka kip-74

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1812.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1812


commit 4a03265fabb563104f37329aff28b27167055c3e
Author: Andrey L. Neporada 
Date:   2016-07-29T12:28:07Z

KAFKA-2063: Add possibility to bound fetch response size




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


PartitionAssignor / Sort members per subscription time before assigning partitions

2016-09-01 Thread Florian Hussonnois
Hi Kafka Team,

I would like to have your opinion before creating a new JIRA.

I'm working with the Java Consumer API. The current partition assignors use
the consumer ids to sort members before assigning partitions.

This works pretty well as long as all consumers are started into the same
JVM and no more consumers than the number of partitions are created.

However, in many cases consumers are distributed across multiple hosts. To
continue consuming with an optimal number of consumers (even with a host
failure) we can create as many consumers as partitions on each host.

For example, we have two consumers C0, C1 each on a dedicated host and one
topic with 4 partitions. With current assignors it is not possible to have
2 consuming threads and 2 idle threads per host.

Instead of that, C0 will have 4 consuming threads and C1 will have 4 idle
threads.

One solution could be to keep a timestamp the first time a member
subscribes to a topic. This timestamp can then be used to sort members for
the partitions assignment. In this way, the partition assignment will be
more predictable as it will not depend on member ids.

One drawback of this solution is that the consumer responsible for
assignments will keep local state.

Thanks,

-- 
Florian


[jira] [Assigned] (KAFKA-3708) Rethink exception handling in KafkaStreams

2016-09-01 Thread Damian Guy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Guy reassigned KAFKA-3708:
-

Assignee: Damian Guy

> Rethink exception handling in KafkaStreams
> --
>
> Key: KAFKA-3708
> URL: https://issues.apache.org/jira/browse/KAFKA-3708
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Guozhang Wang
>Assignee: Damian Guy
>  Labels: user-experience
> Fix For: 0.10.1.0
>
>
> As of 0.10.0.0, the worker threads (i.e. {{StreamThreads}}) can possibly 
> encounter the following runtime exceptions:
> 1) {{consumer.poll()}} could throw KafkaException if some of the 
> configurations are not accepted, such as topics not authorized to read / write 
> (security), session-timeout value not valid, etc; these exceptions will be 
> thrown in the first ever {{poll()}}.
> 2) {{task.addRecords()}} could throw KafkaException (most likely 
> SerializationException) if the deserialization fails.
> 3) {{task.process() / punctuate()}} could throw various KafkaException; for 
> example, serialization / deserialization errors, state storage operation 
> failures (RocksDBException, for example),  producer sending failures, etc.
> 4) {{maybeCommit / commitAll / commitOne}} could throw various Exceptions if 
> the flushing of state store fails, and when {{consumer.commitSync}} throws 
> exceptions other than {{CommitFailedException}}.
> For all of the above 4 cases, KafkaStreams does not capture and handle them, but 
> exposes them to users and lets users handle them via 
> {{KafkaStreams.setUncaughtExceptionHandler}}. We need to re-think whether the 
> library should just handle these cases without exposing them to users and 
> kill the threads / migrate tasks to others, since they are all not recoverable.
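
For reference, a minimal sketch of the current hand-off to the user mentioned
above (configs and topology are placeholders; the library only reports the
failure through the handler, restarting is up to the application):

{noformat}
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class HandlerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "handler-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        KStreamBuilder builder = new KStreamBuilder();
        builder.stream("input-topic").to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder, props);
        // Today the library only reports the failure; recovery is up to the app.
        streams.setUncaughtExceptionHandler((thread, throwable) ->
            System.err.println("Stream thread " + thread.getName() + " died: " + throwable));
        streams.start();
    }
}
{noformat}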



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4112) Remove alpha quality label from Kafka Streams in docs

2016-09-01 Thread Damian Guy (JIRA)
Damian Guy created KAFKA-4112:
-

 Summary: Remove alpha quality label from Kafka Streams in docs
 Key: KAFKA-4112
 URL: https://issues.apache.org/jira/browse/KAFKA-4112
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Affects Versions: 0.10.1.0
Reporter: Damian Guy
Assignee: Damian Guy
Priority: Trivial






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-33 - Add a time based log index

2016-09-01 Thread Ismael Juma
Sounds good to me too.

For completeness, one thing that is mentioned in the JIRA but not in the
message below is: "If the first message in the segment does not have a
timestamp, we will fall back to use the wall clock time for log rolling".

Ismael
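
Putting the two rules together, a tiny sketch of the resulting roll condition
(an illustrative paraphrase, not the actual LogSegment code):

// Illustrative paraphrase of the time-based rolling rule discussed in this thread.
final class TimeBasedRoll {
    static final long NO_TIMESTAMP = -1L; // sentinel for messages without a timestamp

    static boolean shouldRoll(long firstMessageTimestamp, long segmentCreateMs,
                              long nowMs, long incomingTimestamp, long rollMs) {
        if (firstMessageTimestamp == NO_TIMESTAMP)
            return nowMs - segmentCreateMs > rollMs;               // wall-clock fallback
        return incomingTimestamp - firstMessageTimestamp > rollMs; // timestamp-based
    }
}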

On Wed, Aug 31, 2016 at 7:59 PM, Guozhang Wang  wrote:

> Some of the streams integration tests also encounter this issue where the
> timestamps we are using in the test are very small (e.g. 1, 2, 3...), which
> causes the log to roll upon each append, and the old segment gets deleted
> very soon. Arguably this can be resolved by enforcing the LogAppendTime
> configuration on the embedded server.
>
> +1 on the proposed change, makes sense to me.
>
> Guozhang
>
>
> On Tue, Aug 30, 2016 at 4:33 PM, Becket Qin  wrote:
>
> > Hi folks,
> >
> > Here is another update on the change of time based log rolling.
> >
> > After the latest implementation, we encountered KAFKA-4099. The issue is
> > that if users move replicas, for the messages in the old segments, the
> new
> > replica will create one log segment for each message. The root cause of
> > this is we are comparing the wall clock time with the message timestamp.
> A
> > solution to that is also described in KAFKA-4099, which is to change the
> > log rolling purely based on the timestamp in the messages. More
> > specifically, we roll out the log segment if the timestamp in the current
> > message is greater than the timestamp of the first message in the segment
> > by more than log.roll.ms. This approach is wall clock independent and
> > should solve the problem. With message.timestamp.difference.max.ms
> > configuration, we can achieve 1) the log segment will be rolled out in a
> > bounded time, 2) no excessively large timestamp will be accepted and
> cause
> > frequent log rolling.
> >
> > Any concern regarding this change?
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Mon, Jun 13, 2016 at 2:30 PM, Guozhang Wang 
> wrote:
> >
> > > Thanks Jiangjie,
> > >
> > > I see the need for sensitive data purging, the above proposed change
> > LGTM.
> > > One minor concern is that a wrongly marked timestamp on the first
> record
> > > could cause the segment to roll much later / earlier, though it may be
> > > rare.
> > >
> > > Guozhang
> > >
> > > On Fri, Jun 10, 2016 at 10:07 AM, Becket Qin 
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > During the implementation of KIP-33, we found it might be useful to
> > have
> > > a
> > > > more deterministic time based log rolling than what proposed in the
> > KIP.
> > > >
> > > > The current KIP proposal uses the largest timestamp in the segment
> for
> > > time
> > > > based rolling. The active log segment only rolls when there is no
> > message
> > > > appended in max.roll.ms since the largest timestamp in the segment.
> > i.e.
> > > > the rolling time may change if user keeping appending messages into
> the
> > > > segment. This may not be a desirable behavior for people who have
> > > sensitive
> > > > data and want to make sure they are removed after some time.
> > > >
> > > > To solve the above issue, we want to modify the KIP proposal
> regarding
> > > the
> > > > time based rolling to the following behavior. The time based log
> > rolling
> > > > will be based on the first message with a timestamp in the log
> segment
> > if
> > > > there is such a message. If no message in the segment has a
> timestamp,
> > > the
> > > > time based log rolling will still be based on log segment create
> time,
> > > > which is the same as we are doing now. The reasons we don't want to
> > > always
> > > > roll based on file create time are because 1) the message timestamp
> may
> > > be
> > > > assigned by clients which can be different from the create time of
> the
> > > log
> > > > segment file. 2) On some Linux, the file create time is not
> available,
> > so
> > > > using segment file create time may not always work.
> > > >
> > > > Do people have any concern for this change? I will update the KIP if
> > > people
> > > > think the change is OK.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Tue, Apr 19, 2016 at 6:27 PM, Becket Qin 
> > > wrote:
> > > >
> > > > > Thanks Joel and Ismael. I just updated the KIP based on your
> > feedback.
> > > > >
> > > > > KIP-33 has passed with +4 (binding) and +2 (non-binding)
> > > > >
> > > > > Thanks everyone for the reading, feedback and voting!
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > > On Tue, Apr 19, 2016 at 5:25 PM, Ismael Juma 
> > > wrote:
> > > > >
> > > > >> Thanks Becket. I think it would be nice to update the KIP with
> > regards
> > > > to
> > > > >> point 3 and 4.
> > > > >>
> > > > >> In any case, +1 (non-binding)
> > > > >>
> > > > >> Ismael
> > > > >>
> > > > >> On Tue, Apr 19, 2016 at 2:03 AM, Becket Qin  >
> > > > wrote:
> > > > >>

WIKI page outdated

2016-09-01 Thread UMESH CHAUDHARY
Hello Mates,
I may be reiterating the previously identified issue here so please forgive
me if you find it noisy.

The page
https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example
has a Producer example which seems to be outdated, yet it uses the
"partitioner.class" property, which was introduced after 0.8.x.

Also, it is using kafka.producer.ProducerConfig, which seems a bit old.

Can we update the page and correct the example?

Regards,
Umesh


[jira] [Updated] (KAFKA-4111) broker compress data of certain size instead on a produce request

2016-09-01 Thread julien1987 (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

julien1987 updated KAFKA-4111:
--
Summary: broker compress data of certain size instead on a produce request  
(was: broker compress data on certain size instead on a produce request)

> broker compress data of certain size instead on a produce request
> -
>
> Key: KAFKA-4111
> URL: https://issues.apache.org/jira/browse/KAFKA-4111
> Project: Kafka
>  Issue Type: Improvement
>  Components: compression
>Affects Versions: 0.10.0.1
>Reporter: julien1987
>
> When "compression.type" is set on broker config, broker compress data on 
> every produce request. But on our sences, produce requst is very many, and 
> data of every request is not so much. So compression result is not good. Can 
> Broker compress data of every certain size from many produce requests?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4111) broker compress data on certain size instead on a produce request

2016-09-01 Thread julien1987 (JIRA)
julien1987 created KAFKA-4111:
-

 Summary: broker compress data on certain size instead on a produce 
request
 Key: KAFKA-4111
 URL: https://issues.apache.org/jira/browse/KAFKA-4111
 Project: Kafka
  Issue Type: Improvement
  Components: compression
Affects Versions: 0.10.0.1
Reporter: julien1987


When "compression.type" is set on broker config, broker compress data on every 
produce request. But on our sences, produce requst is very many, and data of 
every request is not so much. So compression result is not good. Can Broker 
compress data of every certain size from many produce requests?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Message sent ordering guarantees

2016-09-01 Thread Gerard Klijs
For async you could set acks to -1, but it stays slow, because to know messages
are received in order it has to wait for the broker(s) before sending the next
one; you need this if ordering is very important. When sending async, the order
can get changed in case the leader becomes temporarily unavailable (and on
other errors), because data can be sent successfully to the new leader before
the send fails on the old leader and is retried against the new leader.
Depending on your use case you have different options to handle this, for
example by using timestamps in the messages, so you know which event
happened first, but this will not work when you use compaction.

On Thu, Sep 1, 2016 at 6:19 AM 郭旭  wrote:

> Hi Kafka Experts,
>
> (Sorry to send this question to DEV group, but it seems that I can not find
> related document in user manual.)
>
> For official document ,I can find message sent guarantee as below.
> For a *sync producer*, I think it is true, but sync sends are very slow (about
> 408 messages per second if acks = all, 1000 messages per second if acks = 1).
>
> Batch and async sends could satisfy our throughput requirement, but I'm not
> sure if message send ordering is guaranteed in *async* style.
>
> For some critical applications, for example replicating a MySQL binlog to a Kafka
> distributed commit log, binlog ordering is important (partitioned by
> database/table/PK). Throughput is also important.
>
> If I use an async producer, partition the binlog by table, and send the records
> in batches, is it safe for binlog ordering for a single table?
>
> Will an async producer guarantee the send ordering?
>
>
> Regards
> Shawn
>
> Guarantees: At a high level, Kafka gives the following guarantees:
>
>- Messages sent by a producer to a particular topic partition will be
>appended in the order they are sent. That is, if a message M1 is sent by
>the same producer as a message M2, and M1 is sent first, then M1 will
> have
>a lower offset than M2 and appear earlier in the log.
>
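
As a practical footnote for this thread, a hedged sketch of the settings
usually combined when per-partition ordering matters more than raw throughput
(property names are standard producer configs; broker address, topic, and key
are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OrderedAsyncProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");
        // Retries plus more than one in-flight request can reorder batches after
        // a leader change; capping in-flight requests at 1 preserves ordering.
        props.put("retries", "3");
        props.put("max.in.flight.requests.per.connection", "1");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keying by table keeps each table's binlog in one partition, hence ordered.
            producer.send(new ProducerRecord<>("binlog", "db1.users", "event-payload"));
        }
    }
}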


[jira] [Commented] (KAFKA-4106) Consumer / add configure method to PartitionAssignor interface

2016-09-01 Thread Florian Hussonnois (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15454817#comment-15454817
 ] 

Florian Hussonnois commented on KAFKA-4106:
---

Thank you very much, sorry to have created this Jira too hastily.

> Consumer / add configure method to PartitionAssignor interface
> --
>
> Key: KAFKA-4106
> URL: https://issues.apache.org/jira/browse/KAFKA-4106
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, consumer
>Affects Versions: 0.10.0.1
>Reporter: Florian Hussonnois
>Priority: Minor
>
> Currently, we can implement a custom PartitionAssignor which will forward 
> user data that will be used during the assignments protocol. For example, 
> data can be used to implement a rack-aware assignor
> However, currently we cannot dynamically configure a PartitionAssignor 
> instance.
> It would be nice to add a method configure(Map PartitionAssignor interface. Then, this method will be invoked by the 
> KafkaConsumer  on each assignor, as this is do for deserializers.
> The code modifications are pretty straight-forward but involve modifying the 
> public interface PartitionAssignor. Does that mean this JIRA needs a KIP ?
> I can contribute to that improvement.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4110) When running multiple Kafka instances, the stop script kills all running instances

2016-09-01 Thread Jan Callewaert (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Callewaert updated KAFKA-4110:
--
Attachment: my-kafka-server-stop.sh

Attached an update to the stop script which uses the path of the script to 
determine which process to kill.

> When running multiple Kafka instances, the stop script kills all running 
> instances
> --
>
> Key: KAFKA-4110
> URL: https://issues.apache.org/jira/browse/KAFKA-4110
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 0.10.0.1
> Environment: Linux
>Reporter: Jan Callewaert
>Priority: Minor
> Attachments: my-kafka-server-stop.sh
>
>
> The script kafka-server-stop.sh uses ps to get the PIDs of all the running 
> Kafka processes and then kills them.
> In my development environment, I run multiple Kafka instances on the same 
> machine. When I want to stop one instance, the other instances are also 
> stopped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4110) When running multiple Kafka instances, the stop script kills all running instances

2016-09-01 Thread Jan Callewaert (JIRA)
Jan Callewaert created KAFKA-4110:
-

 Summary: When running multiple Kafka instances, the stop script 
kills all running instances
 Key: KAFKA-4110
 URL: https://issues.apache.org/jira/browse/KAFKA-4110
 Project: Kafka
  Issue Type: Bug
  Components: tools
Affects Versions: 0.10.0.1
 Environment: Linux
Reporter: Jan Callewaert
Priority: Minor


The script kafka-server-stop.sh uses ps to get the PIDs of all the running 
Kafka processes and then kills them.

In my development environment, I run multiple Kafka instances on the same 
machine. When I want to stop one instance, the other instances are also stopped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1811: MINOR: rephrase alpha quality kafka streams wordin...

2016-09-01 Thread dguy
GitHub user dguy opened a pull request:

https://github.com/apache/kafka/pull/1811

MINOR: rephrase alpha quality kafka streams wording

Rephrase 'alpha quality' wording in Streams section of api.html.
Couple of other minor fixes in streams.html

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dguy/kafka kstreams-312

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1811.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1811


commit 886e848fc2908a084a5720cf123398d38a5ab3c0
Author: Damian Guy 
Date:   2016-09-01T07:32:59Z

remove alpha quality wording from Kafka Streams section in doc




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-4077) Backdate validity of certificates in system tests to cope with clock skew

2016-09-01 Thread Rajini Sivaram (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram updated KAFKA-4077:
--
Status: Patch Available  (was: Open)

> Backdate validity of certificates in system tests to cope with clock skew
> -
>
> Key: KAFKA-4077
> URL: https://issues.apache.org/jira/browse/KAFKA-4077
> Project: Kafka
>  Issue Type: Bug
>  Components: system tests
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>Priority: Minor
>
> This addresses the scenario described by [~ewencp] in 
> https://github.com/apache/kafka/pull/1483, where tests failed with 
> java.security.cert.CertificateNotYetValidException. Certificates are created 
> on the host and copied to VMs, and hence should cope with a small amount of 
> clock skew. Backdating the certificate start date fixes this.
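
For reference, keytool can issue certificates whose validity window begins in 
the past via its -startdate option. A minimal sketch of the idea follows; the 
exact command and flag values used by the system tests are not shown in this 
thread, so the keystore name, passwords, and offset below are illustrative 
assumptions:

# Generate a self-signed test certificate that became valid one day ago,
# so a VM whose clock lags slightly behind the host still accepts it.
keytool -genkeypair -keystore test.keystore.jks -alias localhost \
  -keyalg RSA -validity 365 -startdate -1d \
  -storepass test1234 -keypass test1234 -dname "CN=localhost"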



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4077) Backdate validity of certificates in system tests to cope with clock skew

2016-09-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15454596#comment-15454596
 ] 

ASF GitHub Bot commented on KAFKA-4077:
---

GitHub user rajinisivaram opened a pull request:

https://github.com/apache/kafka/pull/1810

KAFKA-4077: Backdate system test certificates to cope with clock skew



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rajinisivaram/kafka KAFKA-4077

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1810.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1810


commit 89236f9093e583752eb7fdca71bd6b68b416cc0f
Author: Rajini Sivaram 
Date:   2016-08-24T10:06:45Z

KAFKA-4077: Backdate system test certificates to cope with clock skew




> Backdate validity of certificates in system tests to cope with clock skew
> -
>
> Key: KAFKA-4077
> URL: https://issues.apache.org/jira/browse/KAFKA-4077
> Project: Kafka
>  Issue Type: Bug
>  Components: system tests
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>Priority: Minor
>
> This addresses the scenario described by [~ewencp] in 
> https://github.com/apache/kafka/pull/1483, where tests failed with 
> java.security.cert.CertificateNotYetValidException. Certificates are created 
> on the host and copied to VMs, and hence should cope with a small amount of 
> clock skew. Backdating the certificate start date fixes this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1810: KAFKA-4077: Backdate system test certificates to c...

2016-09-01 Thread rajinisivaram
GitHub user rajinisivaram opened a pull request:

https://github.com/apache/kafka/pull/1810

KAFKA-4077: Backdate system test certificates to cope with clock skew



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rajinisivaram/kafka KAFKA-4077

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1810.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1810


commit 89236f9093e583752eb7fdca71bd6b68b416cc0f
Author: Rajini Sivaram 
Date:   2016-08-24T10:06:45Z

KAFKA-4077: Backdate system test certificates to cope with clock skew




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #1780: Kafka 4077: Backdate system test certificates to c...

2016-09-01 Thread rajinisivaram
Github user rajinisivaram closed the pull request at:

https://github.com/apache/kafka/pull/1780


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---