[jira] [Commented] (KAFKA-2720) Periodic purging groups in the coordinator

2016-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299416#comment-15299416
 ] 

ASF GitHub Bot commented on KAFKA-2720:
---

GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/1427

KAFKA-2720 [WIP]: expire group metadata when all offsets have expired



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka KAFKA-2720

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1427.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1427


commit f077556496c7c8e2da43a2989efb545d6a0e7de5
Author: Jason Gustafson 
Date:   2016-05-25T03:45:13Z

KAFKA-2720 [WIP]: expire group metadata when all offsets have expired




> Periodic purging groups in the coordinator
> --
>
> Key: KAFKA-2720
> URL: https://issues.apache.org/jira/browse/KAFKA-2720
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Guozhang Wang
>Assignee: Jason Gustafson
> Fix For: 0.10.1.0
>
>
> Currently the coordinator removes the group (i.e. both removing it from the 
> cache and writing the tombstone message on its local replica, without waiting 
> for acks) once it becomes an empty group.
> This can lead to a few issues: 1) group removal and creation churn when a 
> group with very few members is being rebalanced; 2) if the local write fails 
> or is not propagated to the followers, the group can only be removed again 
> after a new coordinator takes over and detects that the group has no members.
> We could instead piggy-back group purging on the periodic offset expiration, 
> removing any group that no longer has any members.
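The piggy-backed purge described above can be sketched roughly as follows. This is a simplified illustration, not the coordinator's actual API: `GroupMetadata`, its fields, and the expiry check are stand-ins.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Simplified stand-in for the coordinator's per-group metadata.
class GroupMetadata {
    final String groupId;
    final int numMembers;
    final Map<String, Long> offsetExpirationMs; // partition -> expiry timestamp

    GroupMetadata(String groupId, int numMembers, Map<String, Long> offsetExpirationMs) {
        this.groupId = groupId;
        this.numMembers = numMembers;
        this.offsetExpirationMs = offsetExpirationMs;
    }
}

class GroupPurger {
    // Run from the same periodic task that expires offsets: a group becomes
    // eligible for removal once it has no members and all of its committed
    // offsets have expired.
    static List<String> groupsToPurge(List<GroupMetadata> groups, long nowMs) {
        return groups.stream()
                .filter(g -> g.numMembers == 0)
                .filter(g -> g.offsetExpirationMs.values().stream().allMatch(t -> t <= nowMs))
                .map(g -> g.groupId)
                .collect(Collectors.toList());
    }
}
```

Tying the purge to offset expiration avoids the churn described above: a briefly-empty group survives until its offsets actually age out.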



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-2720 [WIP]: expire group metadata when a...

2016-05-24 Thread hachikuji
GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/1427

KAFKA-2720 [WIP]: expire group metadata when all offsets have expired



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka KAFKA-2720

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1427.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1427


commit f077556496c7c8e2da43a2989efb545d6a0e7de5
Author: Jason Gustafson 
Date:   2016-05-25T03:45:13Z

KAFKA-2720 [WIP]: expire group metadata when all offsets have expired




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [VOTE] KIP-58 - Make Log Compaction Point Configurable

2016-05-24 Thread Ewen Cheslack-Postava
+1 (binding)

Agreed that log.cleaner.compaction.delay.ms is probably a better name, and
consistent with log.segment.delete.delay.ms. I checked configs for other
suffixes that seemed reasonable, and although ".delay.ms" appears in only that
one broker config, it seems the best match.

-Ewen

On Tue, May 24, 2016 at 8:16 PM, Jay Kreps  wrote:

> I'm +1 on the concept.
>
> As with others I think the core challenge is to express this in an
> intuitive way, and carry the same terminology across the docs, the configs,
> and docstrings for the configs. Pictures would help.
>
> -Jay
>
> On Tue, May 24, 2016 at 6:54 PM, James Cheng  wrote:
>
> > I'm not sure what are the rules for who is allowed to vote, but I'm:
> >
> > +1 (non-binding) on the proposal
> >
> > I agree that the "log.cleaner.min.compaction.lag.ms" name is a little
> > confusing.
> >
> > I like Becket's "log.cleaner.compaction.delay.ms", or something similar.
> >
> > The KIP describes it as the portion of the topic "that will remain
> > uncompacted", so if you're open to alternate names:
> >
> > "log.cleaner.uncompacted.range.ms"
> > "log.cleaner.uncompacted.head.ms" (Except that I always get "log tail"
> > and "log head" mixed up...)
> > "log.cleaner.uncompacted.retention.ms" (Will it be confusing to have the
> > word "retention" in non-time-based topics?)
> >
> > I just thought of something: what happens to the value of "
> > log.cleaner.delete.retention.ms"? Does it still have the same meaning as
> > before? Does the timer start when log compaction happens (as it currently
> > does), so in reality, tombstones will only be removed from the log some
> > time after (log.cleaner.min.compaction.lag.ms +
> > log.cleaner.delete.retention.ms)?
> >
> > -James
> >
> > > On May 24, 2016, at 5:46 PM, Becket Qin  wrote:
> > >
> > > +1 (non-binding) on the proposal. Just a minor suggestion.
> > >
> > > I am wondering should we change the config name to "
> > > log.cleaner.compaction.delay.ms"? The first glance at the
> configuration
> > > name is a little confusing. I was thinking do we have a "max" lag? And
> is
> > > this "lag" a bad thing?
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > >
> > > On Tue, May 24, 2016 at 4:21 PM, Gwen Shapira 
> wrote:
> > >
> > >> +1 (binding)
> > >>
> > >> Thanks for responding to all my original concerns in the discussion
> > thread.
> > >>
> > >> On Tue, May 24, 2016 at 1:37 PM, Eric Wasserman <
> > eric.wasser...@gmail.com>
> > >> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> I would like to begin voting on KIP-58 - Make Log Compaction Point
> > >>> Configurable
> > >>>
> > >>> KIP-58 is here:  <
> > >>>
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-58+-+Make+Log+Compaction+Point+Configurable
> > 
> > >>>
> > >>> The Jira ticket KAFKA-1981 Make log compaction point configurable
> > >>> is here: 
> > >>>
> > >>> The original pull request is here: <
> > >>> https://github.com/apache/kafka/pull/1168>
> > >>> (this includes configurations for size and message count lags that
> will
> > >> be
> > >>> removed per discussion of KIP-58).
> > >>>
> > >>> The vote will run for 72 hours.
> > >>>
> > >>
> >
> >
>



-- 
Thanks,
Ewen


Kafka server contains messages but consumer cannot receive any(Producer used compression.type = snappy)

2016-05-24 Thread Shaolu Xu
Hi dev,

Kafka version: 0.9.0  Language: Java

When using Kafka, I can set a codec by setting the compression.type=snappy
property of my Kafka producer.

Suppose I use snappy compression in my producer. I can see the messages using
KafkaMonitor, but when I consume them from Kafka with a consumer, I do not
receive any messages.
Should I do something to decode the data from snappy, or set some
configuration when creating the consumer?
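For reference, no consumer-side decompression setting exists or is needed: compression.type is a producer-only config, and the Kafka consumer detects the codec from each message set and decompresses it transparently (assuming the snappy library is on its classpath). A minimal sketch of the two sides, with placeholder broker address and group id:

```java
import java.util.Properties;

class CompressionConfigs {
    // Producer side: opt in to snappy compression.
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("compression.type", "snappy");
        return props;
    }

    // Consumer side: no compression-related setting at all; the consumer
    // decompresses each fetched message set itself based on its codec byte.
    static Properties consumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("group.id", "example-group");           // placeholder group
        return props;
    }
}
```

If no messages arrive at all, the cause is more likely elsewhere (e.g. offset position or a missing snappy jar on the consumer classpath) than the compression codec itself.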


Thanks in advance.


Thanks,
Nicole


Re: [VOTE] KIP-58 - Make Log Compaction Point Configurable

2016-05-24 Thread Jay Kreps
I'm +1 on the concept.

As with others I think the core challenge is to express this in an
intuitive way, and carry the same terminology across the docs, the configs,
and docstrings for the configs. Pictures would help.

-Jay

On Tue, May 24, 2016 at 6:54 PM, James Cheng  wrote:

> I'm not sure what are the rules for who is allowed to vote, but I'm:
>
> +1 (non-binding) on the proposal
>
> I agree that the "log.cleaner.min.compaction.lag.ms" name is a little
> confusing.
>
> I like Becket's "log.cleaner.compaction.delay.ms", or something similar.
>
> The KIP describes it as the portion of the topic "that will remain
> uncompacted", so if you're open to alternate names:
>
> "log.cleaner.uncompacted.range.ms"
> "log.cleaner.uncompacted.head.ms" (Except that I always get "log tail"
> and "log head" mixed up...)
> "log.cleaner.uncompacted.retention.ms" (Will it be confusing to have the
> word "retention" in non-time-based topics?)
>
> I just thought of something: what happens to the value of "
> log.cleaner.delete.retention.ms"? Does it still have the same meaning as
> before? Does the timer start when log compaction happens (as it currently
> does), so in reality, tombstones will only be removed from the log some
> time after (log.cleaner.min.compaction.lag.ms +
> log.cleaner.delete.retention.ms)?
>
> -James
>
> > On May 24, 2016, at 5:46 PM, Becket Qin  wrote:
> >
> > +1 (non-binding) on the proposal. Just a minor suggestion.
> >
> > I am wondering should we change the config name to "
> > log.cleaner.compaction.delay.ms"? The first glance at the configuration
> > name is a little confusing. I was thinking do we have a "max" lag? And is
> > this "lag" a bad thing?
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> > On Tue, May 24, 2016 at 4:21 PM, Gwen Shapira  wrote:
> >
> >> +1 (binding)
> >>
> >> Thanks for responding to all my original concerns in the discussion
> thread.
> >>
> >> On Tue, May 24, 2016 at 1:37 PM, Eric Wasserman <
> eric.wasser...@gmail.com>
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> I would like to begin voting on KIP-58 - Make Log Compaction Point
> >>> Configurable
> >>>
> >>> KIP-58 is here:  <
> >>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-58+-+Make+Log+Compaction+Point+Configurable
> 
> >>>
> >>> The Jira ticket KAFKA-1981 Make log compaction point configurable
> >>> is here: 
> >>>
> >>> The original pull request is here: <
> >>> https://github.com/apache/kafka/pull/1168>
> >>> (this includes configurations for size and message count lags that will
> >> be
> >>> removed per discussion of KIP-58).
> >>>
> >>> The vote will run for 72 hours.
> >>>
> >>
>
>


Jenkins build is back to normal : kafka-trunk-jdk8 #647

2016-05-24 Thread Apache Jenkins Server
See 



Re: [VOTE] KIP-58 - Make Log Compaction Point Configurable

2016-05-24 Thread James Cheng
I'm not sure what the rules are for who is allowed to vote, but I'm:

+1 (non-binding) on the proposal

I agree that the "log.cleaner.min.compaction.lag.ms" name is a little confusing.

I like Becket's "log.cleaner.compaction.delay.ms", or something similar.

The KIP describes it as the portion of the topic "that will remain 
uncompacted", so if you're open to alternate names:

"log.cleaner.uncompacted.range.ms"
"log.cleaner.uncompacted.head.ms" (Except that I always get "log tail" and "log 
head" mixed up...)
"log.cleaner.uncompacted.retention.ms" (Will it be confusing to have the word 
"retention" in non-time-based topics?)

I just thought of something: what happens to the value of 
"log.cleaner.delete.retention.ms"? Does it still have the same meaning as 
before? Does the timer start when log compaction happens (as it currently 
does), so in reality, tombstones will only be removed from the log some time 
after (log.cleaner.min.compaction.lag.ms + log.cleaner.delete.retention.ms)?
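If that reading is correct, the earliest moment a tombstone can disappear is simply the sum of the two settings. A toy sketch of the arithmetic (the timer interpretation follows the question above; the values in the usage are arbitrary):

```java
class TombstoneTiming {
    // Earliest wall-clock time a tombstone written at writeTimeMs could be
    // removed, assuming the record first sits out the compaction lag and the
    // delete-retention timer only starts once compaction can touch it.
    static long earliestRemovalMs(long writeTimeMs,
                                  long minCompactionLagMs,
                                  long deleteRetentionMs) {
        return writeTimeMs + minCompactionLagMs + deleteRetentionMs;
    }
}
```

For example, a 1-hour lag plus a 24-hour delete retention would keep a tombstone visible for at least 25 hours after it is written.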

-James

> On May 24, 2016, at 5:46 PM, Becket Qin  wrote:
> 
> +1 (non-binding) on the proposal. Just a minor suggestion.
> 
> I am wondering should we change the config name to "
> log.cleaner.compaction.delay.ms"? The first glance at the configuration
> name is a little confusing. I was thinking do we have a "max" lag? And is
> this "lag" a bad thing?
> 
> Thanks,
> 
> Jiangjie (Becket) Qin
> 
> 
> On Tue, May 24, 2016 at 4:21 PM, Gwen Shapira  wrote:
> 
>> +1 (binding)
>> 
>> Thanks for responding to all my original concerns in the discussion thread.
>> 
>> On Tue, May 24, 2016 at 1:37 PM, Eric Wasserman 
>> wrote:
>> 
>>> Hi,
>>> 
>>> I would like to begin voting on KIP-58 - Make Log Compaction Point
>>> Configurable
>>> 
>>> KIP-58 is here:  <
>>> 
>>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-58+-+Make+Log+Compaction+Point+Configurable
 
>>> 
>>> The Jira ticket KAFKA-1981 Make log compaction point configurable
>>> is here: 
>>> 
>>> The original pull request is here: <
>>> https://github.com/apache/kafka/pull/1168>
>>> (this includes configurations for size and message count lags that will
>> be
>>> removed per discussion of KIP-58).
>>> 
>>> The vote will run for 72 hours.
>>> 
>> 



[GitHub] kafka pull request: Setting broker state as running after publishi...

2016-05-24 Thread theduderog
GitHub user theduderog opened a pull request:

https://github.com/apache/kafka/pull/1426

Setting broker state as running after publishing to ZK

@junrao 

Currently, the broker state is set to running before it registers itself in 
ZooKeeper.  This is too early in the broker lifecycle.  If clients use the 
broker state as an indicator that the broker is ready to accept requests, they 
will get errors.  This change is to delay setting the broker state to running 
until it's registered in ZK.
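The ordering fix can be illustrated with a tiny state machine. The enum values and method names below are illustrative stand-ins, not the broker's actual code:

```java
import java.util.ArrayList;
import java.util.List;

class BrokerStartup {
    enum State { STARTING, REGISTERED_IN_ZK, RUNNING }

    // Record each transition so the ordering is observable.
    final List<State> transitions = new ArrayList<>();

    void registerInZookeeper() { transitions.add(State.REGISTERED_IN_ZK); }
    void markRunning()         { transitions.add(State.RUNNING); }

    // After the change: RUNNING is only advertised once ZK registration has
    // completed, so a client polling the broker state never sees RUNNING
    // before the broker is discoverable and ready to accept requests.
    void startup() {
        transitions.add(State.STARTING);
        registerInZookeeper();
        markRunning();
    }
}
```

The invariant clients rely on is exactly this ordering: REGISTERED_IN_ZK strictly precedes RUNNING.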



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/theduderog/kafka broker-running-after-zk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1426.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1426


commit 54052cb17b1572b50ce770168a30c5d9cbcb278a
Author: Roger Hoover 
Date:   2016-05-25T01:37:24Z

Setting broker state as running after publishing to ZK

commit cc8ce55f874121dc4c26a63ffd6c8c7eb3a8107d
Author: Roger Hoover 
Date:   2016-05-25T01:40:51Z

Restore imports






Re: [VOTE] KIP-58 - Make Log Compaction Point Configurable

2016-05-24 Thread Becket Qin
+1 (non-binding) on the proposal. Just a minor suggestion.

I am wondering whether we should change the config name to
"log.cleaner.compaction.delay.ms"? The configuration name is a little
confusing at first glance: I was left wondering whether there is a "max" lag,
and whether this "lag" is a bad thing.

Thanks,

Jiangjie (Becket) Qin


On Tue, May 24, 2016 at 4:21 PM, Gwen Shapira  wrote:

> +1 (binding)
>
> Thanks for responding to all my original concerns in the discussion thread.
>
> On Tue, May 24, 2016 at 1:37 PM, Eric Wasserman 
> wrote:
>
> > Hi,
> >
> > I would like to begin voting on KIP-58 - Make Log Compaction Point
> > Configurable
> >
> > KIP-58 is here:  <
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-58+-+Make+Log+Compaction+Point+Configurable
> > >
> >
> > The Jira ticket KAFKA-1981 Make log compaction point configurable
> > is here: 
> >
> > The original pull request is here: <
> > https://github.com/apache/kafka/pull/1168>
> > (this includes configurations for size and message count lags that will
> be
> > removed per discussion of KIP-58).
> >
> > The vote will run for 72 hours.
> >
>


[jira] [Commented] (KAFKA-3370) Add options to auto.offset.reset to reset offsets upon initialization only

2016-05-24 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299252#comment-15299252
 ] 

Gwen Shapira commented on KAFKA-3370:
-

Not necessarily efficiency - maybe correctness of some kind. 

Right now, we assume that all offsets that are out of range are the same. 
It looks like we need a finer breakdown:
* First time we see a partition (because we are new, or the partition is new)
* offset too small for range (starting from earliest almost always makes sense?)
* offset too large for range (going to end makes more sense?)

Maybe there are other "special" cases? 

> Add options to auto.offset.reset to reset offsets upon initialization only
> --
>
> Key: KAFKA-3370
> URL: https://issues.apache.org/jira/browse/KAFKA-3370
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Vahid Hashemian
> Fix For: 0.10.1.0
>
>
> Currently "auto.offset.reset" is applied in the following two cases:
> 1) upon starting the consumer for the first time (hence no committed offsets 
> before);
> 2) upon fetching offsets out-of-range.
> For scenarios where case 2) needs to be avoided (i.e. people need to be 
> notified about out-of-range offsets rather than having them silently reset), 
> "auto.offset.reset" needs to be set to "none". However, for case 1), setting 
> "auto.offset.reset" to "none" will cause a NoOffsetForPartitionException upon 
> polling. In that case, seekToBeginning/seekToEnd is mistakenly applied to set 
> the offset at initialization, although these methods are actually designed 
> for use during the lifetime of the consumer (in a rebalance callback, for 
> example).
> The proposed fix is to add two more options to "auto.offset.reset", 
> "earliest-on-start" and "latest-on-start", whose semantics are "earliest" 
> and "latest" for case 1) only, and "none" for case 2).
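The proposed split can be summarized as a small decision function. The policy strings follow the proposal above; the enum and method are stand-ins for the consumer's internals, not its real API:

```java
class ResetPolicy {
    enum Action { SEEK_EARLIEST, SEEK_LATEST, RAISE_ERROR }

    // firstStart = true means no committed offsets exist yet (case 1);
    // false means the fetch position fell out of range (case 2).
    static Action decide(String policy, boolean firstStart) {
        switch (policy) {
            case "earliest":          return Action.SEEK_EARLIEST;
            case "latest":            return Action.SEEK_LATEST;
            case "earliest-on-start": return firstStart ? Action.SEEK_EARLIEST : Action.RAISE_ERROR;
            case "latest-on-start":   return firstStart ? Action.SEEK_LATEST : Action.RAISE_ERROR;
            default:                  return Action.RAISE_ERROR; // "none"
        }
    }
}
```

The two new values behave like "earliest"/"latest" only on first start, and like "none" for out-of-range fetches.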





Build failed in Jenkins: kafka-trunk-jdk7 #1311

2016-05-24 Thread Apache Jenkins Server
See 

Changes:

[cshapi] Minor: Fix ps command example in docs

[cshapi] KAFKA-3683; Add file descriptor recommendation to ops guide

[cshapi] MINOR: Add virtual env to Kafka system test README.md

--
[...truncated 84 lines...]
^
:191:
 object ProducerRequestStatsRegistry in package producer is deprecated: This 
object has been deprecated and will be removed in a future release.
ProducerRequestStatsRegistry.removeProducerRequestStats(clientId)
^
:129:
 method readFromReadableChannel in class NetworkReceive is deprecated: see 
corresponding Javadoc for more information.
  response.readFromReadableChannel(channel)
   ^
:301:
 value timestamp in class PartitionData is deprecated: see corresponding 
Javadoc for more information.
  if (partitionData.timestamp == 
OffsetCommitRequest.DEFAULT_TIMESTAMP)
^
:304:
 value timestamp in class PartitionData is deprecated: see corresponding 
Javadoc for more information.
offsetRetention + partitionData.timestamp
^
:43:
 class OldProducer in package producer is deprecated: This class has been 
deprecated and will be removed in a future release. Please use 
org.apache.kafka.clients.producer.KafkaProducer instead.
new OldProducer(getOldProducerProps(config))
^
:45:
 class NewShinyProducer in package producer is deprecated: This class has been 
deprecated and will be removed in a future release. Please use 
org.apache.kafka.clients.producer.KafkaProducer instead.
new NewShinyProducer(getNewProducerProps(config))
^
14 warnings found
:kafka-trunk-jdk7:core:processResources UP-TO-DATE
:kafka-trunk-jdk7:core:classes
:kafka-trunk-jdk7:clients:compileTestJavaNote: 

 uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

:kafka-trunk-jdk7:clients:processTestResources
:kafka-trunk-jdk7:clients:testClasses
:kafka-trunk-jdk7:core:copyDependantLibs
:kafka-trunk-jdk7:core:jar
:jar_core_2_11
Building project 'core' with Scala version 2.11.8
:kafka-trunk-jdk7:clients:compileJava UP-TO-DATE
:kafka-trunk-jdk7:clients:processResources UP-TO-DATE
:kafka-trunk-jdk7:clients:classes UP-TO-DATE
:kafka-trunk-jdk7:clients:determineCommitId UP-TO-DATE
:kafka-trunk-jdk7:clients:createVersionFile
:kafka-trunk-jdk7:clients:jar UP-TO-DATE
:kafka-trunk-jdk7:core:compileJava UP-TO-DATE
:kafka-trunk-jdk7:core:compileScala
:79:
 value DEFAULT_TIMESTAMP in object OffsetCommitRequest is deprecated: see 
corresponding Javadoc for more information.

org.apache.kafka.common.requests.OffsetCommitRequest.DEFAULT_TIMESTAMP
 ^
:36:
 value DEFAULT_TIMESTAMP in object OffsetCommitRequest is deprecated: see 
corresponding Javadoc for more information.
 commitTimestamp: Long = 
org.apache.kafka.common.requests.OffsetCommitRequest.DEFAULT_TIMESTAMP,

  ^
:37:
 value DEFAULT_TIMESTAMP in object OffsetCommitRequest is deprecated: see 
corresponding Javadoc for more information.
 expireTimestamp: Long = 
org.apache.kafka.common.requests.OffsetCommitRequest.DEFAULT_TIMESTAMP) {

  ^
:401:
 value DEFAULT_TIMESTAMP in object OffsetCommitRequest is deprecated: see 
corresponding Javadoc for more information.
  if (value.expireTimestamp == 

[jira] [Commented] (KAFKA-3370) Add options to auto.offset.reset to reset offsets upon initialization only

2016-05-24 Thread Vahid Hashemian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299249#comment-15299249
 ] 

Vahid Hashemian commented on KAFKA-3370:


[~gwenshap] Thanks for your feedback. If I understand you correctly, you are 
describing an offset reset policy that is "smart" (for lack of a better term) 
and acts as "earliest" or "latest" depending on the situation, to improve 
efficiency.

> Add options to auto.offset.reset to reset offsets upon initialization only
> --
>
> Key: KAFKA-3370
> URL: https://issues.apache.org/jira/browse/KAFKA-3370
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Vahid Hashemian
> Fix For: 0.10.1.0
>
>
> Currently "auto.offset.reset" is applied in the following two cases:
> 1) upon starting the consumer for the first time (hence no committed offsets 
> before);
> 2) upon fetching offsets out-of-range.
> For scenarios where case 2) needs to be avoided (i.e. people need to be 
> notified about out-of-range offsets rather than having them silently reset), 
> "auto.offset.reset" needs to be set to "none". However, for case 1), setting 
> "auto.offset.reset" to "none" will cause a NoOffsetForPartitionException upon 
> polling. In that case, seekToBeginning/seekToEnd is mistakenly applied to set 
> the offset at initialization, although these methods are actually designed 
> for use during the lifetime of the consumer (in a rebalance callback, for 
> example).
> The proposed fix is to add two more options to "auto.offset.reset", 
> "earliest-on-start" and "latest-on-start", whose semantics are "earliest" 
> and "latest" for case 1) only, and "none" for case 2).





[GitHub] kafka pull request: MINOR: Add virtual env to Kafka system test RE...

2016-05-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1346




[jira] [Resolved] (KAFKA-3683) Add file descriptor recommendation to ops guide

2016-05-24 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira resolved KAFKA-3683.
-
   Resolution: Fixed
Fix Version/s: 0.10.0.1
   0.10.1.0

Issue resolved by pull request 1353
[https://github.com/apache/kafka/pull/1353]

> Add file descriptor recommendation to ops guide
> ---
>
> Key: KAFKA-3683
> URL: https://issues.apache.org/jira/browse/KAFKA-3683
> Project: Kafka
>  Issue Type: Improvement
>  Components: website
>Reporter: Dustin Cote
>Assignee: Dustin Cote
>Priority: Trivial
> Fix For: 0.10.1.0, 0.10.0.1
>
>
> The Ops section of the documentation says that the file descriptor limits are 
> an important OS configuration to pay attention to but offer no guidance on 
> how to configure them.  





[jira] [Commented] (KAFKA-3683) Add file descriptor recommendation to ops guide

2016-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299230#comment-15299230
 ] 

ASF GitHub Bot commented on KAFKA-3683:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1353


> Add file descriptor recommendation to ops guide
> ---
>
> Key: KAFKA-3683
> URL: https://issues.apache.org/jira/browse/KAFKA-3683
> Project: Kafka
>  Issue Type: Improvement
>  Components: website
>Reporter: Dustin Cote
>Assignee: Dustin Cote
>Priority: Trivial
> Fix For: 0.10.1.0, 0.10.0.1
>
>
> The Ops section of the documentation says that the file descriptor limits are 
> an important OS configuration to pay attention to but offer no guidance on 
> how to configure them.  





[GitHub] kafka pull request: KAFKA-3683: Add file descriptor recommendation...

2016-05-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1353




Build failed in Jenkins: kafka-trunk-jdk8 #646

2016-05-24 Thread Apache Jenkins Server
See 

Changes:

[cshapi] Minor: Fix ps command example in docs

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on ubuntu-2 (docker Ubuntu ubuntu yahoo-not-h2) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url 
 > https://git-wip-us.apache.org/repos/asf/kafka.git # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/kafka.git
 > git --version # timeout=10
 > git -c core.askpass=true fetch --tags --progress 
 > https://git-wip-us.apache.org/repos/asf/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
Checking out Revision 98928f7ad0f556caf4b8ff4b212c946ce948d7d6 
(refs/remotes/origin/trunk)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 98928f7ad0f556caf4b8ff4b212c946ce948d7d6
 > git rev-list 5f498855d9e247b07af54c6329b5b5347e469ace # timeout=10
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK1_8_0_66_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_66
[kafka-trunk-jdk8] $ /bin/bash -xe /tmp/hudson3270836720972156384.sh
+ 
/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2/bin/gradle
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
http://gradle.org/docs/2.4-rc-2/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.6
:downloadWrapper

BUILD SUCCESSFUL

Total time: 9.581 secs
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK1_8_0_66_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_66
[kafka-trunk-jdk8] $ /bin/bash -xe /tmp/hudson9061066027008067371.sh
+ export GRADLE_OPTS=-Xmx1024m
+ GRADLE_OPTS=-Xmx1024m
+ ./gradlew -Dorg.gradle.project.maxParallelForks=1 clean jarAll testAll
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
https://docs.gradle.org/2.13/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.6
Build file ': 
line 231
useAnt has been deprecated and is scheduled to be removed in Gradle 3.0. The 
Ant-Based Scala compiler is deprecated, please see 
https://docs.gradle.org/current/userguide/scala_plugin.html.
:clean UP-TO-DATE
:clients:clean
:connect:clean UP-TO-DATE
:core:clean
:examples:clean UP-TO-DATE
:log4j-appender:clean UP-TO-DATE
:streams:clean UP-TO-DATE
:tools:clean UP-TO-DATE
:connect:api:clean UP-TO-DATE
:connect:file:clean UP-TO-DATE
:connect:json:clean UP-TO-DATE
:connect:runtime:clean UP-TO-DATE
:streams:examples:clean UP-TO-DATE
:jar_core_2_10
Building project 'core' with Scala version 2.10.6
:kafka-trunk-jdk8:clients:compileJavawarning: [options] bootstrap class path 
not set in conjunction with -source 1.7
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
1 warning

:kafka-trunk-jdk8:clients:processResources UP-TO-DATE
:kafka-trunk-jdk8:clients:classes
:kafka-trunk-jdk8:clients:determineCommitId UP-TO-DATE
:kafka-trunk-jdk8:clients:createVersionFile
:kafka-trunk-jdk8:clients:jar
:kafka-trunk-jdk8:core:compileJava UP-TO-DATE
:kafka-trunk-jdk8:core:compileScalaJava HotSpot(TM) 64-Bit Server VM warning: 
ignoring option MaxPermSize=512m; support was removed in 8.0

:79:
 value DEFAULT_TIMESTAMP in object OffsetCommitRequest is deprecated: see 
corresponding Javadoc for more information.

org.apache.kafka.common.requests.OffsetCommitRequest.DEFAULT_TIMESTAMP
 ^
:36:
 value DEFAULT_TIMESTAMP in object OffsetCommitRequest is deprecated: see 
corresponding Javadoc for more information.
 commitTimestamp: Long = 
org.apache.kafka.common.requests.OffsetCommitRequest.DEFAULT_TIMESTAMP,

  ^
:37:
 value DEFAULT_TIMESTAMP in object OffsetCommitRequest is deprecated: see 
corresponding Javadoc for more information.
 expireTimestamp: Long = 

[GitHub] kafka pull request: Update doc

2016-05-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1386




[jira] [Commented] (KAFKA-3737) Closing connection during produce request should be log with WARN level.

2016-05-24 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299191#comment-15299191
 ] 

Gwen Shapira commented on KAFKA-3737:
-

I don't mind bumping up the LOG level, but I don't think this is the log you 
were looking for...

This INFO is only logged when data is sent with "request.required.acks=0" from 
the producer, which theoretically means the producer doesn't care whether the 
data made it or not (by the time we write this log, the producer already 
assumes everything is fine, which is why we need to close the connection to 
get its attention).

Maybe the log message you are looking for is:
debug("Produce request with correlation id %d from client %s on partition %s 
failed due to %s".format(...
a few lines up?

That one covers all produce errors, but it can flood the logs pretty easily - 
brokers handle thousands of produce requests per second...



In general:
1. The best way to detect producer errors is from the producer log... 
2. We recommend running the broker log at INFO level anyway :)

> Closing connection during produce request should be log with WARN level.
> 
>
> Key: KAFKA-3737
> URL: https://issues.apache.org/jira/browse/KAFKA-3737
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.9.0.1
>Reporter: Florian Hussonnois
>Priority: Trivial
>
> Currently, if an error occurs during a produce request, the exception is 
> logged at INFO level:
> INFO [KafkaApi-0] Closing connection due to error during produce request with 
> correlation id 24 from client id console-producer with ack=0
> Topic and partition to exceptions: [test,0] -> 
> kafka.common.MessageSizeTooLargeException (kafka.server.KafkaApis)
> It would be more convenient to log this at WARN level to ease tracing of 
> these errors.





[jira] [Commented] (KAFKA-3736) Add http metrics reporter

2016-05-24 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299164#comment-15299164
 ] 

Gwen Shapira commented on KAFKA-3736:
-

Any reason not to leave this as a github project (like connectors, non-java 
clients, tons of admin tools, etc) rather than putting it inside Kafka?

I hope the KIP will answer this concern.

> Add http metrics reporter
> -
>
> Key: KAFKA-3736
> URL: https://issues.apache.org/jira/browse/KAFKA-3736
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Adrian Muraru
> Fix For: 0.10.1.0
>
>
> The current builtin JMX metrics reporter is pretty heavy in terms of load and 
> collection. A new lightweight HTTP reporter is proposed to expose the metrics 
> via a local http port.





Jenkins build is back to normal : kafka-0.10.0-jdk7 #106

2016-05-24 Thread Apache Jenkins Server
See 



[jira] [Commented] (KAFKA-3370) Add options to auto.offset.reset to reset offsets upon initialization only

2016-05-24 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299148#comment-15299148
 ] 

Gwen Shapira commented on KAFKA-3370:
-

actually, it is more complicated...

suppose you have MirrorMaker - you want it to use "earliest" on start to 
avoid missing data if MirrorMaker starts when topics already exist. But if it 
accidentally gets ahead of (or inconsistent with) the HWM on the broker, you 
don't want it to reread and rereplicate the entire topic (true story) - in that 
case, we want it to just move to latest.

Maybe we want to think through a few scenarios (earliest if you are close to the 
beginning and latest if you are closer to the end?) to see what makes sense.

> Add options to auto.offset.reset to reset offsets upon initialization only
> --
>
> Key: KAFKA-3370
> URL: https://issues.apache.org/jira/browse/KAFKA-3370
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Vahid Hashemian
> Fix For: 0.10.1.0
>
>
> Currently "auto.offset.reset" is applied in the following two cases:
> 1) upon starting the consumer for the first time (hence no committed offsets 
> before);
> 2) upon fetching offsets out-of-range.
> For scenarios where case 2) needs to be avoided (i.e. people need to be 
> notified upon offsets out-of-range rather than having offsets silently reset), 
> "auto.offset.reset" needs to be set to "none". However for case 1) setting 
> "auto.offset.reset" to "none" will cause NoOffsetForPartitionException upon 
> polling. And in this case, seekToBeginning/seekToEnd is mistakenly applied 
> trying to set the offset at initialization, while these are actually designed 
> for use during the lifetime of the consumer (in a rebalance callback, for 
> example).
> The fix proposal is to add two more options to "auto.offset.reset", 
> "earliest-on-start" and "latest-on-start", whose semantics are "earliest" 
> and "latest" for case 1) only, and "none" for case 2).
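
A minimal sketch of how the proposed option would be configured. The value 
"earliest-on-start" is the name proposed in this ticket and does not exist in any 
released Kafka client; this only illustrates the intended setting:

```java
import java.util.Properties;

// Sketch of a consumer configuration using the proposed option. The value
// "earliest-on-start" is hypothetical (taken from this ticket's proposal).
public class OffsetResetConfigSketch {
    static Properties consumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "mirror-maker-group");
        // Proposed semantics: behave like "earliest" only when the group has
        // no committed offsets yet (case 1), and like "none" on out-of-range
        // fetches (case 2), so out-of-range errors surface to the application.
        props.put("auto.offset.reset", "earliest-on-start");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps().getProperty("auto.offset.reset"));
    }
}
```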





Re: [VOTE] KIP-58 - Make Log Compaction Point Configurable

2016-05-24 Thread Gwen Shapira
+1 (binding)

Thanks for responding to all my original concerns in the discussion thread.

On Tue, May 24, 2016 at 1:37 PM, Eric Wasserman 
wrote:

> Hi,
>
> I would like to begin voting on KIP-58 - Make Log Compaction Point
> Configurable
>
> KIP-58 is here:  <
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-58+-+Make+Log+Compaction+Point+Configurable
> >
>
> The Jira ticket KAFKA-1981 Make log compaction point configurable
> is here: 
>
> The original pull request is here: <
> https://github.com/apache/kafka/pull/1168>
> (this includes configurations for size and message count lags that will be
> removed per discussion of KIP-58).
>
> The vote will run for 72 hours.
>


Re: [DISCUSS] scalability limits in the coordinator

2016-05-24 Thread Gwen Shapira
Regarding the change to the assignment field. It would be a protocol bump,
otherwise consumers will not know how to parse the bytes the broker is
returning, right?
Or did I misunderstand the suggestion?

On Tue, May 24, 2016 at 2:52 PM, Guozhang Wang  wrote:

> I think for just solving issue 1), Jun's suggestion is sufficient and
> simple. So I'd prefer that approach.
>
> In addition, Jason's optimization on the assignment field would be good for
> 2) and 3) as well, and I like that optimization for its simplicity and lack of
> format change. And in the future I'm in favor of considering a change to
> the in-memory cache format as Jiangjie suggested.
>
> Guozhang
>
>
> On Tue, May 24, 2016 at 12:42 PM, Becket Qin  wrote:
>
> > Hi Jason,
> >
> > There are a few problems we want to solve here:
> > 1. The group metadata is too big to be appended to the log.
> > 2. Reduce the memory footprint on the broker
> > 3. Reduce the bytes transferred over the wire.
> >
> > To solve (1), I like your idea of having separate messages per member.
> The
> > proposal (Onur's option 8) is to break metadata into small records in the
> > same uncompressed message set so each record is small. I agree it would
> be
> > ideal if we are able to store the metadata separately for each member. I
> > was also thinking about storing the metadata into multiple messages, too.
> > What concerns me is that having multiple messages seems to break
> > atomicity. I am not sure how we are going to deal with the potential
> > issues. For example, what if the group metadata is replicated but the member
> > metadata is not? It might be fine depending on the implementation though,
> > but I am not sure.
> >
> > For (2) we want to store the metadata onto the disk, which is what we
> have
> > to do anyway. The only question is in what format should we store them.
> >
> > To address (3) we want the metadata to be compressed, which
> > contradicts the above solution for (1).
> >
> > I think Jun's suggestion is probably still the simplest. To avoid changing
> > the behavior for consumers, maybe we can do that only for offset_topic,
> > i.e., if the max fetch bytes of the fetch request is smaller than the
> > message size on the offset topic, we always return at least one full
> > message. This should avoid the unexpected problem on the client side
> > because supposedly only tools and brokers will fetch from the internal
> > topics.
> >
> > As a modification to what you suggested, one solution I was thinking was
> to
> > have multiple messages in a single compressed message. That means for
> > SyncGroupResponse we still need to read the entire compressed messages
> and
> > extract the inner messages, which seems not quite different from having a
> > single message containing everything. But let me just put it here and see
> > if that makes sense.
> >
> > We can have a map of GroupMetadataKey -> GroupMetadataValueOffset.
> >
> > The GroupMetadataValue is stored in a compressed message. The inner
> > messages are the following:
> >
> > Inner Message 0: Version GroupId Generation
> >
> > Inner Message 1: MemberId MemberMetadata_1 (we can compress the bytes
> here)
> >
> > Inner Message 2: MemberId MemberMetadata_2
> > 
> > Inner Message N: MemberId MemberMetadata_N
> >
> > The MemberMetadata format is the following:
> >   MemberMetadata => Version Generation ClientId Host Subscription
> > Assignment
> >
> > So DescribeGroupResponse will just return the entire compressed
> > GroupMetadataMessage. SyncGroupResponse will return the corresponding
> inner
> > message.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> >
> > On Tue, May 24, 2016 at 9:14 AM, Jason Gustafson 
> > wrote:
> >
> > > Hey Becket,
> > >
> > > I like your idea to store only the offset for the group metadata in
> > memory.
> > > I think it would be safe to keep it in memory for a short time after
> the
> > rebalance completes, but after that, its only real purpose is to
> answer
> > > DescribeGroup requests, so your proposal makes a lot of sense to me.
> > >
> > > As for the specific problem with the size of the group metadata message
> > for
> > > the MM case, if we cannot succeed in reducing the size of the
> > > subscription/assignment (which I think is still probably the best
> > > alternative if it can work), then I think there are some options for
> > > changing the message format (option #8 in Onur's initial e-mail).
> > > Currently, the key used for storing the group metadata is this:
> > >
> > > GroupMetadataKey => Version GroupId
> > >
> > > And the value is something like this (some details elided):
> > >
> > > GroupMetadataValue => Version GroupId Generation [MemberMetadata]
> > >   MemberMetadata => ClientId Host Subscription Assignment
> > >
> > > I don't think we can change the key without a lot of pain, but it seems
> > > like we can change the value format. Maybe we can 

[jira] [Commented] (KAFKA-3708) Rethink exception handling in KafkaStreams

2016-05-24 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299107#comment-15299107
 ] 

Guozhang Wang commented on KAFKA-3708:
--

Related commit to this ticket: 

https://github.com/apache/kafka/pull/894

> Rethink exception handling in KafkaStreams
> --
>
> Key: KAFKA-3708
> URL: https://issues.apache.org/jira/browse/KAFKA-3708
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>  Labels: user-experience
> Fix For: 0.10.1.0
>
>
> As of 0.10.0.0, the worker threads (i.e. {{StreamThreads}}) can possibly 
> encounter the following runtime exceptions:
> 1) {{consumer.poll()}} could throw KafkaException if some of the 
> configurations are not accepted, such as topics not authorized to read / write 
> (security), session-timeout value not valid, etc; these exceptions will be 
> thrown in the first ever {{poll()}}.
> 2) {{task.addRecords()}} could throw KafkaException (most likely 
> SerializationException) if the deserialization fails.
> 3) {{task.process() / punctuate()}} could throw various KafkaException; for 
> example, serialization / deserialization errors, state storage operation 
> failures (RocksDBException, for example),  producer sending failures, etc.
> 4) {{maybeCommit / commitAll / commitOne}} could throw various Exceptions if 
> the flushing of state store fails, and when {{consumer.commitSync}} throws 
> exceptions other than {{CommitFailedException}}.
> For all of the above 4 cases, KafkaStreams does not capture and handle them, 
> but exposes them to users, letting users handle them via 
> {{KafkaStreams.setUncaughtExceptionHandler}}. We need to re-think whether the 
> library should just handle these cases without exposing them to users - 
> killing the threads / migrating tasks to others - since they are all 
> unrecoverable.





Re: Kafka Connect connector monitoring tasks via offset storage reader

2016-05-24 Thread Randall Hauch



On May 24, 2016 at 4:28:41 PM, Liquan Pei (liquan...@gmail.com) wrote:

Hi Randall, 

This is interesting. Essentially we can track the progress by getting data 
from offset storage. What are the use cases you have in mind that use the 
offsets of source partitions? I can imagine comparing the source 
offsets for new data with those of already delivered data and making decisions 
such as task reconfiguration, etc.
Yes, exactly. I can think of two use cases:

1. Upon startup, using previously committed offsets to identify new vs existing 
source partitions, and using this to more intelligently distribute the source 
partitions across tasks.
2. Tracking the progress of tasks by reading the committed offsets, and 
signaling for task reconfiguration when task(s) have reached some predetermined 
point.
One concern is that offsets read from OffsetStorageReader may be stale and 
we may end up making decisions not based on the latest data. In general, I think 
the question is what we want to do once we know that data up to a given 
source offset has been delivered to Kafka. 
Would they really be stale during connector startup? Aren’t they accurate in 
this case, enabling the connector to make intelligent decisions about task 
configurations?

However, if the connector and tasks are running then, yes, the only guarantee 
about offsets read from storage is that the offsets were committed, so tasks 
have _at least_ recorded those offsets but may have done more.

From the API perspective, do we want to expose the OffsetStorageReader or 
just add a method to return the source offsets? Note that this is only 
relevant to source connectors; not sure whether it makes sense to 
create SourceConnectorContext and SinkConnectorContext. 
Yes, this probably is only related to source connectors, and defining 
SourceConnectorContext and SinkConnectorContext may be the appropriate way to 
go forward. Changing the type of the ‘context’ field in the Connector abstract 
class or moving that to the appropriate subclasses would break binary classfile 
compatibility with earlier versions (meaning a newer version of Kafka Connect 
could not use a connector compiled with an older Kafka Connect library). But 
that may not matter, since the `Connector` interface is still annotated with 
“@InterfaceStability.Unstable” in the 0.10.0.0 tag [1] and in the trunk branch 
[2]. In fact, it may be useful to define those ConnectorContext subtypes sooner 
than later to allow binary classfile backward compatibility later on.
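
As a rough sketch of the split discussed here - the interface and method names 
below (SourceConnectorContext, offsetStorageReader()) are assumptions drawn from 
this thread, not the released Connect API:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a source-specific connector context; none of these
// interfaces match the released Connect API exactly.
public class ConnectorContextSketch {

    // Mirrors the existing reader concept: look up the last committed
    // offset for a given source partition.
    interface OffsetStorageReader {
        Map<String, Object> offset(Map<String, String> partition);
    }

    // Proposed: a source-specific context exposing the offset reader so a
    // connector can watch its tasks' progress and request reconfiguration.
    interface SourceConnectorContext {
        OffsetStorageReader offsetStorageReader();
        void requestTaskReconfiguration();
    }

    // Example connector logic: once every table snapshot has completed,
    // the connector could reconfigure down to a single log-reading task.
    static boolean snapshotsComplete(SourceConnectorContext ctx,
                                     Iterable<Map<String, String>> partitions) {
        for (Map<String, String> p : partitions) {
            Map<String, Object> offset = ctx.offsetStorageReader().offset(p);
            if (offset == null || !Boolean.TRUE.equals(offset.get("snapshotDone"))) {
                return false;
            }
        }
        return true;
    }

    // Self-contained demo backed by an in-memory offset store.
    static boolean demo() {
        Map<String, String> p1 = Map.of("table", "orders");
        Map<Map<String, String>, Map<String, Object>> store = new HashMap<>();
        store.put(p1, Map.of("snapshotDone", true));
        SourceConnectorContext ctx = new SourceConnectorContext() {
            public OffsetStorageReader offsetStorageReader() { return store::get; }
            public void requestTaskReconfiguration() { /* no-op in this sketch */ }
        };
        return snapshotsComplete(ctx, List.of(p1));
    }
}
```

The "snapshotDone" flag is likewise invented; a real connector would define its 
own offset fields.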

Best regards,

Randall

[1] 
https://github.com/apache/kafka/blob/0.10.0/connect/api/src/main/java/org/apache/kafka/connect/connector/Connector.java

[2] 
https://github.com/apache/kafka/blob/trunk/connect/api/src/main/java/org/apache/kafka/connect/connector/Connector.java



Thanks, 
Liquan 

On Tue, May 24, 2016 at 1:31 PM, Randall Hauch  wrote: 

> I have a need for one of my SourceConnector implementations to configure a 
> bunch of tasks and, when those are all “done”, request a task 
> reconfiguration so that it can run a single task. Think: many tasks to make 
> snapshot of database tables, then when those are completed reconfigure 
> itself so that it then started _one_ task to read the transaction log. 
> 
> Unfortunately, I can’t figure out a way for the connector to “monitor” the 
> progress of its tasks, especially when those tasks are distributed across 
> the cluster. The only way I can think of to get around this is to have my 
> connector start *one* task that performs the snapshot and then starts 
> reading the transaction log. Unfortunately, that means to parallelize the 
> snapshotting work, the task would need to manage its own threads. That’s 
> possible, but undesirable for many reasons, not the least of which is that 
> the work can’t be distributed as multiple tasks amongst the cluster of 
> Kafka Connect workers. 
> 
> On the other hand, a simple enhancement to Kafka Connect would make this 
> very easy: add to the ConnectorContext a method that returned the 
> OffsetStorageReader. The connector could start a thread to periodically 
> poll the offsets for various partitions, and effectively watch the progress 
> of the tasks. Not only that, the connector’s 'taskConfigs(int)’ method 
> could use the OffsetStorageReader to read previously-recorded offsets to 
> more intelligently configure its tasks. This seems very straightforward, 
> backward compatible, and non-intrusive. 
> 
> Is there any interest in this? If so, I can create an issue and work on a 
> pull request. 
> 
> Best regards, 
> 
> Randall Hauch 




-- 
Liquan Pei 
Software Engineer, Confluent Inc 


[jira] [Commented] (KAFKA-3745) Consider adding join key to ValueJoiner interface

2016-05-24 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299045#comment-15299045
 ] 

Guozhang Wang commented on KAFKA-3745:
--

Let's leave this ticket open for a while in case we see more common use cases 
that motivate modifying this API.

> Consider adding join key to ValueJoiner interface
> -
>
> Key: KAFKA-3745
> URL: https://issues.apache.org/jira/browse/KAFKA-3745
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Greg Fodor
>Priority: Minor
>  Labels: api, newbie
>
> In working with Kafka Stream joining, it's sometimes the case that a join key 
> is not actually present in the values of the joins themselves (if, for 
> example, a previous transform generated an ephemeral join key.) In such 
> cases, the actual key of the join is not available in the ValueJoiner 
> implementation to be used to construct the final joined value. This can be 
> worked around by explicitly threading the join key into the value if needed, 
> but it seems like extending the interface to pass the join key along as well 
> would be helpful.





Re: [DISCUSS] scalability limits in the coordinator

2016-05-24 Thread Guozhang Wang
I think for just solving issue 1), Jun's suggestion is sufficient and
simple. So I'd prefer that approach.

In addition, Jason's optimization on the assignment field would be good for
2) and 3) as well, and I like that optimization for its simplicity and lack of
format change. And in the future I'm in favor of considering a change to
the in-memory cache format as Jiangjie suggested.

Guozhang


On Tue, May 24, 2016 at 12:42 PM, Becket Qin  wrote:

> Hi Jason,
>
> There are a few problems we want to solve here:
> 1. The group metadata is too big to be appended to the log.
> 2. Reduce the memory footprint on the broker
> 3. Reduce the bytes transferred over the wire.
>
> To solve (1), I like your idea of having separate messages per member. The
> proposal (Onur's option 8) is to break metadata into small records in the
> same uncompressed message set so each record is small. I agree it would be
> ideal if we are able to store the metadata separately for each member. I
> was also thinking about storing the metadata into multiple messages, too.
> What concerns me is that having multiple messages seems to break
> atomicity. I am not sure how we are going to deal with the potential
> issues. For example, what if the group metadata is replicated but the member
> metadata is not? It might be fine depending on the implementation though,
> but I am not sure.
>
> For (2) we want to store the metadata onto the disk, which is what we have
> to do anyway. The only question is in what format should we store them.
>
> To address (3) we want the metadata to be compressed, which
> contradicts the above solution for (1).
>
> I think Jun's suggestion is probably still the simplest. To avoid changing
> the behavior for consumers, maybe we can do that only for offset_topic,
> i.e., if the max fetch bytes of the fetch request is smaller than the
> message size on the offset topic, we always return at least one full
> message. This should avoid the unexpected problem on the client side
> because supposedly only tools and brokers will fetch from the internal
> topics.
>
> As a modification to what you suggested, one solution I was thinking was to
> have multiple messages in a single compressed message. That means for
> SyncGroupResponse we still need to read the entire compressed messages and
> extract the inner messages, which seems not quite different from having a
> single message containing everything. But let me just put it here and see
> if that makes sense.
>
> We can have a map of GroupMetadataKey -> GroupMetadataValueOffset.
>
> The GroupMetadataValue is stored in a compressed message. The inner
> messages are the following:
>
> Inner Message 0: Version GroupId Generation
>
> Inner Message 1: MemberId MemberMetadata_1 (we can compress the bytes here)
>
> Inner Message 2: MemberId MemberMetadata_2
> 
> Inner Message N: MemberId MemberMetadata_N
>
> The MemberMetadata format is the following:
>   MemberMetadata => Version Generation ClientId Host Subscription
> Assignment
>
> So DescribeGroupResponse will just return the entire compressed
> GroupMetadataMessage. SyncGroupResponse will return the corresponding inner
> message.
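
As a rough illustration of the layout described above - a single compressed value 
whose inner messages can be extracted individually for a SyncGroupResponse - here 
is a minimal, hypothetical sketch; the framing (count plus length prefixes) is 
invented for the example and is not Kafka's wire format:

```java
import java.io.*;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// One compressed blob holding a group header plus one entry per member.
public class GroupMetadataSketch {

    static byte[] compress(List<byte[]> innerMessages) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(bos))) {
                out.writeInt(innerMessages.size());
                for (byte[] msg : innerMessages) {
                    out.writeInt(msg.length);  // length-prefix each inner message
                    out.write(msg);
                }
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // SyncGroupResponse-style access: decompress and pull out one inner message.
    static byte[] innerMessage(byte[] compressed, int index) {
        try (DataInputStream in = new DataInputStream(
                new GZIPInputStream(new ByteArrayInputStream(compressed)))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                byte[] msg = new byte[in.readInt()];
                in.readFully(msg);
                if (i == index) return msg;
            }
            throw new NoSuchElementException("inner message " + index);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static boolean demo() {
        byte[] blob = compress(List.of(
                "v1 my-group gen-5".getBytes(),     // Inner Message 0: header
                "member-1 <metadata>".getBytes(),   // Inner Message 1
                "member-2 <metadata>".getBytes())); // Inner Message 2
        return new String(innerMessage(blob, 2)).equals("member-2 <metadata>");
    }
}
```

This also makes the trade-off concrete: a DescribeGroupResponse could return the 
whole blob, while extracting one member still requires decompressing the blob.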
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
> On Tue, May 24, 2016 at 9:14 AM, Jason Gustafson 
> wrote:
>
> > Hey Becket,
> >
> > I like your idea to store only the offset for the group metadata in
> memory.
> > I think it would be safe to keep it in memory for a short time after the
> > rebalance completes, but after that, its only real purpose is to answer
> > DescribeGroup requests, so your proposal makes a lot of sense to me.
> >
> > As for the specific problem with the size of the group metadata message
> for
> > the MM case, if we cannot succeed in reducing the size of the
> > subscription/assignment (which I think is still probably the best
> > alternative if it can work), then I think there are some options for
> > changing the message format (option #8 in Onur's initial e-mail).
> > Currently, the key used for storing the group metadata is this:
> >
> > GroupMetadataKey => Version GroupId
> >
> > And the value is something like this (some details elided):
> >
> > GroupMetadataValue => Version GroupId Generation [MemberMetadata]
> >   MemberMetadata => ClientId Host Subscription Assignment
> >
> > I don't think we can change the key without a lot of pain, but it seems
> > like we can change the value format. Maybe we can take the
> > subscription/assignment payloads out of the value and introduce a new
> > "MemberMetadata" message for each member in the group. For example:
> >
> > MemberMetadataKey => Version GroupId MemberId
> >
> > MemberMetadataValue => Version Generation ClientId Host Subscription
> > Assignment
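
The per-member key proposed above could be encoded along these lines - the field 
layout below is purely illustrative, not the actual group metadata schema:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical encoding of the proposed per-member message key:
// MemberMetadataKey => Version GroupId MemberId
public class MemberMetadataKeySketch {
    static byte[] encodeKey(short version, String groupId, String memberId) {
        byte[] g = groupId.getBytes(StandardCharsets.UTF_8);
        byte[] m = memberId.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(2 + 4 + g.length + 4 + m.length);
        buf.putShort(version);
        buf.putInt(g.length).put(g);  // length-prefixed strings
        buf.putInt(m.length).put(m);
        return buf.array();
    }

    static String[] decodeKey(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        short version = buf.getShort();
        byte[] g = new byte[buf.getInt()]; buf.get(g);
        byte[] m = new byte[buf.getInt()]; buf.get(m);
        return new String[] { String.valueOf(version),
                new String(g, StandardCharsets.UTF_8),
                new String(m, StandardCharsets.UTF_8) };
    }
}
```

Keying on (GroupId, MemberId) is what lets each member's metadata be written and 
compacted as its own message rather than one large group record.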
> >
> > When a new generation is created, we would first write the group metadata
> > message which includes the generation and all of the memberIds, and then
> > we'd write the member metadata messages. To answer 

[jira] [Commented] (KAFKA-3752) Provide a way for KStreams to recover from unclean shutdown

2016-05-24 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298949#comment-15298949
 ] 

Guozhang Wang commented on KAFKA-3752:
--

We are adding locks to the directory only for the case where there are multiple 
stream threads on the same machine, when one thread is accessing (hence owning) 
this directory and the other is cleaning it. We should revisit this issue and 
see if there is a better solution than using locks here.
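
The directory-locking pattern being described can be sketched as below - each 
thread takes an OS-level lock on a lock file inside the state directory before 
using or cleaning it. The file name and helper are hypothetical, not Streams' 
actual implementation:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative state-directory locking: tryLock returns null when another
// process already holds the lock, so a cleaner thread can skip the directory.
public class StateDirLockSketch {
    static FileLock tryLockStateDir(Path stateDir) {
        try {
            Files.createDirectories(stateDir);
            FileChannel channel = FileChannel.open(stateDir.resolve(".lock"),
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            FileLock lock = channel.tryLock();  // null if held elsewhere
            if (lock == null) channel.close();
            return lock;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static boolean demo() {
        try {
            Path dir = Files.createTempDirectory("state-dir-sketch");
            FileLock lock = tryLockStateDir(dir);
            boolean acquired = lock != null;
            if (acquired) lock.release();  // release so others may proceed
            return acquired;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that OS file locks are released when a process dies, which is one reason the 
SIGKILL problem in this ticket centers on leftover LOCK files rather than held 
locks.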

> Provide a way for KStreams to recover from unclean shutdown
> ---
>
> Key: KAFKA-3752
> URL: https://issues.apache.org/jira/browse/KAFKA-3752
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Roger Hoover
>Assignee: Guozhang Wang
>  Labels: architecture
>
> If a KStream application gets killed with SIGKILL (e.g. by the Linux OOM 
> Killer), it may leave behind lock files and fail to recover.
> It would be useful to have an option (say --force) to tell KStreams to 
> proceed even if it finds old LOCK files.
> {noformat}
> [2016-05-24 17:37:52,886] ERROR Failed to create an active task #0_0 in 
> thread [StreamThread-1]:  
> (org.apache.kafka.streams.processor.internals.StreamThread:583)
> org.apache.kafka.streams.errors.ProcessorStateException: Error while creating 
> the state manager
>   at 
> org.apache.kafka.streams.processor.internals.AbstractTask.(AbstractTask.java:71)
>   at 
> org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:86)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:550)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:577)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.access$000(StreamThread.java:68)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:123)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:222)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:232)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:227)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$2.onSuccess(RequestFuture.java:182)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:436)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:422)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:679)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:658)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:426)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:243)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:345)
>   at 
> 

Re: Kafka Connect connector monitoring tasks via offset storage reader

2016-05-24 Thread Liquan Pei
Hi Randall,

This is interesting. Essentially we can track the progress by getting data
from offset storage. What are the use cases you have in mind that use the
offsets of source partitions? I can imagine comparing the source
offsets for new data with those of already delivered data and making decisions
such as task reconfiguration, etc.

One concern is that offsets read from OffsetStorageReader may be stale and
we may end up making decisions not based on the latest data. In general, I think
the question is what we want to do once we know that data up to a given
source offset has been delivered to Kafka.

From the API perspective, do we want to expose the OffsetStorageReader or
just add a method to return the source offsets? Note that this is only
relevant to source connectors; not sure whether it makes sense to
create SourceConnectorContext and SinkConnectorContext.

Thanks,
Liquan

On Tue, May 24, 2016 at 1:31 PM, Randall Hauch  wrote:

> I have a need for one of my SourceConnector implementations to configure a
> bunch of tasks and, when those are all “done”, request a task
> reconfiguration so that it can run a single task. Think: many tasks to make
> snapshot of database tables, then when those are completed reconfigure
> itself so that it then started _one_ task to read the transaction log.
>
> Unfortunately, I can’t figure out a way for the connector to “monitor” the
> progress of its tasks, especially when those tasks are distributed across
> the cluster. The only way I can think of to get around this is to have my
> connector start *one* task that performs the snapshot and then starts
> reading the transaction log. Unfortunately, that means to parallelize the
> snapshotting work, the task would need to manage its own threads. That’s
> possible, but undesirable for many reasons, not the least of which is that
> the work can’t be distributed as multiple tasks amongst the cluster of
> Kafka Connect workers.
>
> On the other hand, a simple enhancement to Kafka Connect would make this
> very easy: add to the ConnectorContext a method that returned the
> OffsetStorageReader. The connector could start a thread to periodically
> poll the offsets for various partitions, and effectively watch the progress
> of the tasks. Not only that, the connector’s 'taskConfigs(int)’ method
> could use the OffsetStorageReader to read previously-recorded offsets to
> more intelligently configure its tasks. This seems very straightforward,
> backward compatible, and non-intrusive.
>
> Is there any interest in this? If so, I can create an issue and work on a
> pull request.
>
> Best regards,
>
> Randall Hauch




-- 
Liquan Pei
Software Engineer, Confluent Inc


[jira] [Updated] (KAFKA-3752) Provide a way for KStreams to recover from unclean shutdown

2016-05-24 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3752:
-
Labels: architecture  (was: )

> Provide a way for KStreams to recover from unclean shutdown
> ---
>
> Key: KAFKA-3752
> URL: https://issues.apache.org/jira/browse/KAFKA-3752
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Roger Hoover
>Assignee: Guozhang Wang
>  Labels: architecture
>
> If a KStream application gets killed with SIGKILL (e.g. by the Linux OOM 
> Killer), it may leave behind lock files and fail to recover.
> It would be useful to have an option (say --force) to tell KStreams to 
> proceed even if it finds old LOCK files.
> {noformat}
> [2016-05-24 17:37:52,886] ERROR Failed to create an active task #0_0 in 
> thread [StreamThread-1]:  
> (org.apache.kafka.streams.processor.internals.StreamThread:583)
> org.apache.kafka.streams.errors.ProcessorStateException: Error while creating 
> the state manager
>   at 
> org.apache.kafka.streams.processor.internals.AbstractTask.(AbstractTask.java:71)
>   at 
> org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:86)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:550)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:577)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.access$000(StreamThread.java:68)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:123)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:222)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:232)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:227)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$2.onSuccess(RequestFuture.java:182)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:436)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:422)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:679)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:658)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:426)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:243)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:345)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:977)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:295)
>   ... (remainder of stack trace truncated)
> {noformat}
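A --force style option like the one requested above would first need to decide whether a leftover LOCK file is actually held by a live process. Below is a minimal sketch of that check, not Kafka Streams' actual implementation; the `isStale` helper and the reliance on OS advisory file locks are assumptions for illustration. Since OS-level file locks are released when their owning process dies (including on SIGKILL), successfully acquiring the lock indicates the file is stale debris.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class StaleLockCheck {

    // Returns true if the LOCK file is not held by any live process.
    // OS advisory locks die with their owner, so a successful tryLock
    // means the file is a leftover from a killed process and can be
    // deleted before proceeding.
    static boolean isStale(Path lockFile) throws IOException {
        try (FileChannel channel = FileChannel.open(
                lockFile, StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
            FileLock lock = channel.tryLock();
            if (lock != null) {
                lock.release();
                return true;  // nobody holds it: safe to clean up and proceed
            }
            return false;     // another live process still owns the lock
        }
    }

    public static void main(String[] args) throws IOException {
        Path lock = Files.createTempFile("state-dir", ".lock");
        System.out.println("stale: " + isStale(lock)); // prints "stale: true"
        Files.deleteIfExists(lock);
    }
}
```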

[jira] [Commented] (KAFKA-3728) EndToEndAuthorizationTest offsets_topic misconfigured

2016-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298907#comment-15298907
 ] 

ASF GitHub Bot commented on KAFKA-3728:
---

GitHub user edoardocomar opened a pull request:

https://github.com/apache/kafka/pull/1425

KAFKA-3728 EndToEndAuthorizationTest offsets_topic misconfigured

Set OffsetsTopicReplicationFactorProp to 3 like MinInSyncReplicasProp
Earlier set acl for admin to read all
OffsetCommitRequiredAcksProp to 1 

Rationales:
Moving the set acl for admin avoided errors like :

ERROR [ReplicaFetcherThread-0-0], Error for partition
[__consumer_offsets,0] to broker
0:org.apache.kafka.common.errors.TopicAuthorizationException: Not
authorized to access topics: [Topic authorization failed.]
(kafka.server.ReplicaFetcherThread:97)

that were happening before the read to all was set

OffsetCommitRequiredAcksProp to 1 should not be needed, but without it,
testProduceConsumeViaSubscribe fails if it's not the first one to be
executed. Which suggests a cleanup problem, the test passes on its own.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/edoardocomar/kafka KAFKA-3728

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1425.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1425


commit 82a1d1bb0ce3065d7008f18f9d1bcd84bcb24d94
Author: Edoardo Comar 
Date:   2016-05-19T15:52:15Z

KAFKA-3728 EndToEndAuthorizationTest offsets_topic misconfigured

Set OffsetsTopicReplicationFactorProp to 3 like MinInSyncReplicasProp
Earlier set acl for admin to read all
OffsetCommitRequiredAcksProp to 1 

Rationales:
Moving the set acl for admin avoided errors like :

ERROR [ReplicaFetcherThread-0-0], Error for partition
[__consumer_offsets,0] to broker
0:org.apache.kafka.common.errors.TopicAuthorizationException: Not
authorized to access topics: [Topic authorization failed.]
(kafka.server.ReplicaFetcherThread:97)

that were happening before the read to all was set

OffsetCommitRequiredAcksProp to 1 should not be needed, but without it,
testProduceConsumeViaSubscribe fails if it's not the first one to be
executed. Which suggests a cleanup problem, the test passes on its own.




> EndToEndAuthorizationTest offsets_topic misconfigured
> -
>
> Key: KAFKA-3728
> URL: https://issues.apache.org/jira/browse/KAFKA-3728
> Project: Kafka
>  Issue Type: Bug
>Reporter: Edoardo Comar
>
> A consumer that is manually assigned a topic-partition is able to consume 
> messages that a consumer that subscribes to the topic can not.
> To reproduce : take the test 
> EndToEndAuthorizationTest.testProduceConsume 
> (eg the SaslSslEndToEndAuthorizationTest implementation)
>  
> it passes ( = messages are consumed) 
> if the consumer is assigned the single topic-partition
>   consumers.head.assign(List(tp).asJava)
> but fails 
> if the consumer subscribes to the topic - changing the line to :
>   consumers.head.subscribe(List(topic).asJava)
> The failure when subscribed shows this error about synchronization:
>  org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: 
> Messages are rejected since there are fewer in-sync replicas than required.
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:455)
> The test passes in both cases (subscribe and assign) with the setting
>   this.serverConfig.setProperty(KafkaConfig.MinInSyncReplicasProp, "1")



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
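The misconfiguration described above boils down to the internal offsets topic being created with a replication factor smaller than what the in-sync-replica requirement demands, so the group's SyncGroup write to __consumer_offsets is rejected. A hedged sketch of the broker properties the fix aligns; the property keys are real Kafka broker configs, but the concrete values are illustrative assumptions for a 3-broker test cluster:

```java
import java.util.Properties;

public class OffsetsTopicConfig {
    // Broker settings for a 3-node secure test cluster: the internal
    // __consumer_offsets topic must be replicated widely enough to
    // satisfy the min-ISR requirement, or group-coordination writes
    // fail with "fewer in-sync replicas than required".
    static Properties serverConfig() {
        Properties props = new Properties();
        props.setProperty("min.insync.replicas", "3");
        // match the ISR requirement (test default was lower):
        props.setProperty("offsets.topic.replication.factor", "3");
        // workaround from the PR description; reportedly should not be needed:
        props.setProperty("offsets.commit.required.acks", "1");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(
            serverConfig().getProperty("offsets.topic.replication.factor"));
    }
}
```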



[jira] [Commented] (KAFKA-1573) Transient test failures on LogTest.testCorruptLog

2016-05-24 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298901#comment-15298901
 ] 

Gwen Shapira commented on KAFKA-1573:
-

[~guozhang] is this still failing?

> Transient test failures on LogTest.testCorruptLog
> -
>
> Key: KAFKA-1573
> URL: https://issues.apache.org/jira/browse/KAFKA-1573
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Guozhang Wang
>  Labels: transient-unit-test-failure
> Fix For: 0.10.1.0
>
>
> Here is an example of the test failure trace:
> junit.framework.AssertionFailedError: expected:<87> but was:<68>
>   at junit.framework.Assert.fail(Assert.java:47)
>   at junit.framework.Assert.failNotEquals(Assert.java:277)
>   at junit.framework.Assert.assertEquals(Assert.java:64)
>   at junit.framework.Assert.assertEquals(Assert.java:130)
>   at junit.framework.Assert.assertEquals(Assert.java:136)
>   at 
> kafka.log.LogTest$$anonfun$testCorruptLog$1.apply$mcVI$sp(LogTest.scala:615)
>   at 
> scala.collection.immutable.Range$ByOne$class.foreach$mVc$sp(Range.scala:282)
>   at 
> scala.collection.immutable.Range$$anon$2.foreach$mVc$sp(Range.scala:265)
>   at kafka.log.LogTest.testCorruptLog(LogTest.scala:595)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.junit.internal.runners.TestMethodRunner.executeMethodBody(TestMethodRunner.java:99)
>   at 
> org.junit.internal.runners.TestMethodRunner.runUnprotected(TestMethodRunner.java:81)
>   at 
> org.junit.internal.runners.BeforeAndAfterRunner.runProtected(BeforeAndAfterRunner.java:34)
>   at 
> org.junit.internal.runners.TestMethodRunner.runMethod(TestMethodRunner.java:75)
>   at 
> org.junit.internal.runners.TestMethodRunner.run(TestMethodRunner.java:45)
>   at 
> org.junit.internal.runners.TestClassMethodsRunner.invokeTestMethod(TestClassMethodsRunner.java:71)
>   at 
> org.junit.internal.runners.TestClassMethodsRunner.run(TestClassMethodsRunner.java:35)
>   at 
> org.junit.internal.runners.TestClassRunner$1.runUnprotected(TestClassRunner.java:42)
>   at 
> org.junit.internal.runners.BeforeAndAfterRunner.runProtected(BeforeAndAfterRunner.java:34)
>   at 
> org.junit.internal.runners.TestClassRunner.run(TestClassRunner.java:52)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:80)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:47)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:69)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:49)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.messaging.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>   at 
> org.gradle.messaging.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>   at $Proxy2.processTestClass(Unknown Source)
>   at 
> org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:103)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.messaging.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:355)
>   at 
> org.gradle.internal.concurrent.DefaultExecutorFactory$StoppableExecutorImpl$1.run(DefaultExecutorFactory.java:66)
>   ... (remainder of stack trace truncated)

[jira] [Commented] (KAFKA-1211) Hold the produce request with ack > 1 in purgatory until replicas' HW has larger than the produce offset

2016-05-24 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298900#comment-15298900
 ] 

Gwen Shapira commented on KAFKA-1211:
-

[~junrao] is this still an issue?

> Hold the produce request with ack > 1 in purgatory until replicas' HW has 
> larger than the produce offset
> 
>
> Key: KAFKA-1211
> URL: https://issues.apache.org/jira/browse/KAFKA-1211
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
> Fix For: 0.10.1.0
>
>
> Today during leader failover there is a window of weakness while the 
> followers truncate their data before fetching from the new leader, i.e., the 
> number of in-sync replicas is just 1. If during this time the new leader also 
> fails, then produce requests with ack > 1 that have already been responded to 
> can still be lost. To avoid this scenario we would prefer to hold the produce 
> request in purgatory until the replicas' HW has advanced past the produce 
> offset, instead of just past their end-of-log offsets.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
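The change the ticket asks for can be stated as a completion predicate: a delayed produce with ack > 1 completes only once enough replicas' high watermarks have passed the appended offset. A simplified stand-alone sketch; the method name and the list-based replica model are illustrative, not Kafka's actual DelayedProduce code:

```java
import java.util.List;

public class ProducePurgatoryCheck {
    // The produce request may be answered only when at least requiredAcks
    // replicas have a high watermark at or beyond the last appended offset,
    // rather than merely matching log-end offsets.
    static boolean canComplete(long requiredOffset, int requiredAcks,
                               List<Long> replicaHighWatermarks) {
        long caughtUp = replicaHighWatermarks.stream()
                .filter(hw -> hw >= requiredOffset)
                .count();
        return caughtUp >= requiredAcks;
    }

    public static void main(String[] args) {
        // one follower's HW still at 90: with acks=2 the request must wait
        System.out.println(canComplete(100, 2, List.of(120L, 90L)));  // false
        System.out.println(canComplete(100, 2, List.of(120L, 105L))); // true
    }
}
```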


[jira] [Updated] (KAFKA-1011) Decompression and re-compression on MirrorMaker could result in messages being dropped in the pipeline

2016-05-24 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated KAFKA-1011:

   Resolution: Fixed
Fix Version/s: (was: 0.10.1.0)
   Status: Resolved  (was: Patch Available)

MirrorMaker was rewritten, compression was rewritten, let's just assume it was 
fixed :)

> Decompression and re-compression on MirrorMaker could result in messages 
> being dropped in the pipeline
> --
>
> Key: KAFKA-1011
> URL: https://issues.apache.org/jira/browse/KAFKA-1011
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
> Attachments: KAFKA-1011.v1.patch
>
>
> The way MirrorMaker works today is that its consumers could use deep iterator 
> to decompress messages received from the source brokers and its producers 
> could re-compress the messages while sending them to the target brokers. 
> Since MirrorMakers use a centralized data channel for its consumers to pipe 
> messages to its producers, and since producers would compress messages with 
> the same topic within a batch as a single produce request, this could result 
> in messages accepted at the front end of the pipeline being dropped at the 
> target brokers of the MirrorMaker due to MessageSizeTooLargeException if it 
> happens that one batch of messages contain too many messages of the same 
> topic in MirrorMaker's producer. If we can use shallow iterator at the 
> MirrorMaker's consumer side to directly pipe compressed messages this issue 
> can be fixed. 
> Also as Swapnil pointed out, currently if the MirrorMaker lags and there are 
> large messages in the MirrorMaker queue (large after decompression), it can 
> run into an OutOfMemoryException. Shallow iteration will be very helpful in 
> avoiding this exception.
> The proposed solution of this issue is also related to KAFKA-527.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1190) create a draw performance graph script

2016-05-24 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated KAFKA-1190:

   Resolution: Won't Fix
Fix Version/s: (was: 0.10.1.0)
   Status: Resolved  (was: Patch Available)

With this delay, let's just assume it is out of scope for now. I don't 
think we want to maintain R scripts in this project anyway.

> create a draw performance graph script
> --
>
> Key: KAFKA-1190
> URL: https://issues.apache.org/jira/browse/KAFKA-1190
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Joe Stein
> Attachments: KAFKA-1190.patch
>
>
> This will be an R script to draw relevant graphs given a bunch of csv files 
> from the above tools. The output of this script will be a bunch of png files 
> that can be combined with some html to act as a perf report.
> Here are the graphs that would be good to see:
> * Latency histogram for producer
> * MB/sec and messages/sec produced
> * MB/sec and messages/sec consumed
> * Flush time
> * Errors (should not be any)
> * Consumer cache hit ratio (both the bytes and count, specifically 1 - 
>   #physical_reads / #requests and 1 - physical_bytes_read / bytes_read)
> * Write merge ratio (num_physical_writes/num_produce_requests and 
> avg_request_size/avg_physical_write_size)
> * CPU, network, I/O, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
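The two cache-hit-ratio formulas listed in the ticket compute as follows; the variable names are illustrative, and the inputs would come from the csv files the tools produce:

```java
public class HitRatios {
    // count-based hit ratio: 1 - #physical_reads / #requests
    static double countHitRatio(long physicalReads, long requests) {
        return 1.0 - (double) physicalReads / requests;
    }

    // byte-based hit ratio: 1 - physical_bytes_read / bytes_read
    static double byteHitRatio(long physicalBytesRead, long bytesRead) {
        return 1.0 - (double) physicalBytesRead / bytesRead;
    }

    public static void main(String[] args) {
        // 20 physical reads serving 100 requests -> 80% of reads hit cache
        System.out.println(countHitRatio(20, 100)); // 0.8
    }
}
```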


[jira] [Resolved] (KAFKA-485) Support MacOS for this test framework

2016-05-24 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira resolved KAFKA-485.

   Resolution: Fixed
Fix Version/s: (was: 0.10.1.0)

Related to the old test framework, which no longer exists.

> Support MacOS for this test framework
> -
>
> Key: KAFKA-485
> URL: https://issues.apache.org/jira/browse/KAFKA-485
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.8.0
>Reporter: John Fung
>Assignee: John Fung
>  Labels: replication-testing
>
> Currently this test framework doesn't work properly in MacOS due to the 
> different "ps" arguments from Linux. It is required to have "ps" to work in 
> MacOS to stop background running processes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


subscription request

2016-05-24 Thread Sriram Subramanian



[jira] [Resolved] (KAFKA-174) Add performance suite for Kafka

2016-05-24 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira resolved KAFKA-174.

   Resolution: Fixed
Fix Version/s: (was: 0.10.1.0)

was done ages ago...

> Add performance suite for Kafka
> ---
>
> Key: KAFKA-174
> URL: https://issues.apache.org/jira/browse/KAFKA-174
> Project: Kafka
>  Issue Type: New Feature
>Affects Versions: 0.8.0
>Reporter: Neha Narkhede
>  Labels: replication
>
> This is a placeholder JIRA for adding a perf suite to Kafka. The high level 
> proposal is here -
> https://cwiki.apache.org/confluence/display/KAFKA/Performance+testing
> There will be more JIRAs covering smaller tasks to fully implement this. They 
> will be linked to this JIRA. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-175) Add helper scripts to wrap the current perf tools

2016-05-24 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated KAFKA-175:
---
   Resolution: Fixed
Fix Version/s: (was: 0.10.1.0)
   Status: Resolved  (was: Patch Available)

This was done ages ago...

> Add helper scripts to wrap the current perf tools
> -
>
> Key: KAFKA-175
> URL: https://issues.apache.org/jira/browse/KAFKA-175
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Neha Narkhede
>Assignee: Neha Narkhede
> Attachments: kafka-175-updated.patch, kafka-175.patch
>
>
> We have 3 useful tools to run producer and consumer perf tests - 
> ProducerPerformance.scala, SimpleConsumerPerformance.scala and 
> ConsumerPerformance.scala.
> These tests expose several options that allows you to define the load for 
> each perf run. It will be good to expose some helper scripts that will cover 
> some single node perf testing scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[VOTE] KIP-58 - Make Log Compaction Point Configurable

2016-05-24 Thread Eric Wasserman
Hi,

I would like to begin voting on KIP-58 - Make Log Compaction Point
Configurable

KIP-58 is here:  <
https://cwiki.apache.org/confluence/display/KAFKA/KIP-58+-+Make+Log+Compaction+Point+Configurable
>

The Jira ticket KAFKA-1981 Make log compaction point configurable
is here: 

The original pull request is here: <
https://github.com/apache/kafka/pull/1168>
(this includes configurations for size and message count lags that will be
removed per discussion of KIP-58).

The vote will run for 72 hours.


Kafka Connect connector monitoring tasks via offset storage reader

2016-05-24 Thread Randall Hauch
I have a need for one of my SourceConnector implementations to configure a 
bunch of tasks and, when those are all “done”, request a task reconfiguration 
so that it can run a single task. Think: many tasks to make a snapshot of 
database tables; then, when those are completed, the connector reconfigures 
itself so that it starts _one_ task to read the transaction log.

Unfortunately, I can’t figure out a way for the connector to “monitor” the 
progress of its tasks, especially when those tasks are distributed across the 
cluster. The only way I can think of to get around this is to have my connector 
start *one* task that performs the snapshot and then starts reading the 
transaction log. Unfortunately, that means to parallelize the snapshotting 
work, the task would need to manage its own threads. That’s possible, but 
undesirable for many reasons, not the least of which is that the work can’t be 
distributed as multiple tasks amongst the cluster of Kafka Connect workers.

On the other hand, a simple enhancement to Kafka Connect would make this very 
easy: add to the ConnectorContext a method that returned the 
OffsetStorageReader. The connector could start a thread to periodically poll 
the offsets for various partitions, and effectively watch the progress of the 
tasks. Not only that, the connector’s 'taskConfigs(int)’ method could use the 
OffsetStorageReader to read previously-recorded offsets to more intelligently 
configure its tasks. This seems very straightforward, backward compatible, and 
non-intrusive.

Is there any interest in this? If so, I can create an issue and work on a pull 
request.

Best regards,

Randall Hauch
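The proposal above can be sketched with simplified stand-ins for Kafka Connect's types; the `OffsetReader` interface shape and the `snapshotComplete` offset key are assumptions for illustration, not the actual Connect API (the real `org.apache.kafka.connect.storage.OffsetStorageReader` is what the email asks to expose on `ConnectorContext`):

```java
import java.util.List;
import java.util.Map;

public class SnapshotProgressMonitor {
    // Simplified stand-in for org.apache.kafka.connect.storage.OffsetStorageReader.
    interface OffsetReader {
        Map<String, Object> offset(Map<String, String> sourcePartition);
    }

    // A connector monitoring thread would call this periodically; once every
    // snapshot task has recorded a completion marker in its offsets, the
    // connector can call context.requestTaskReconfiguration() to switch to
    // a single transaction-log-reading task.
    static boolean allSnapshotsComplete(OffsetReader reader,
                                        List<Map<String, String>> partitions) {
        for (Map<String, String> partition : partitions) {
            Map<String, Object> offset = reader.offset(partition);
            if (offset == null || !Boolean.TRUE.equals(offset.get("snapshotComplete"))) {
                return false; // at least one task has not finished its snapshot
            }
        }
        return true;
    }

    public static void main(String[] args) {
        OffsetReader reader = p -> Map.of("snapshotComplete", true);
        System.out.println(allSnapshotsComplete(reader,
                List.of(Map.of("table", "a"), Map.of("table", "b")))); // true
    }
}
```

The same reader would also let `taskConfigs(int)` consult previously recorded offsets when dividing work among tasks, as the email suggests.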

Build failed in Jenkins: kafka-trunk-jdk8 #645

2016-05-24 Thread Apache Jenkins Server
See 

Changes:

[cshapi] MINOR: Fix documentation table of contents and

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on ubuntu-2 (docker Ubuntu ubuntu yahoo-not-h2) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url 
 > https://git-wip-us.apache.org/repos/asf/kafka.git # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/kafka.git
 > git --version # timeout=10
 > git -c core.askpass=true fetch --tags --progress 
 > https://git-wip-us.apache.org/repos/asf/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
Checking out Revision 5f498855d9e247b07af54c6329b5b5347e469ace 
(refs/remotes/origin/trunk)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 5f498855d9e247b07af54c6329b5b5347e469ace
 > git rev-list fe27d8f787f38428e0add36edeac9d694f16af53 # timeout=10
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK1_8_0_66_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_66
[kafka-trunk-jdk8] $ /bin/bash -xe /tmp/hudson2810088534898225725.sh
+ 
/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2/bin/gradle
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
http://gradle.org/docs/2.4-rc-2/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.6
:downloadWrapper

BUILD SUCCESSFUL

Total time: 11.163 secs
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK1_8_0_66_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_66
[kafka-trunk-jdk8] $ /bin/bash -xe /tmp/hudson6692265967400446311.sh
+ export GRADLE_OPTS=-Xmx1024m
+ GRADLE_OPTS=-Xmx1024m
+ ./gradlew -Dorg.gradle.project.maxParallelForks=1 clean jarAll testAll
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
https://docs.gradle.org/2.13/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.6
Build file ': 
line 231
useAnt has been deprecated and is scheduled to be removed in Gradle 3.0. The 
Ant-Based Scala compiler is deprecated, please see 
https://docs.gradle.org/current/userguide/scala_plugin.html.
:clean UP-TO-DATE
:clients:clean
:connect:clean UP-TO-DATE
:core:clean
:examples:clean UP-TO-DATE
:log4j-appender:clean UP-TO-DATE
:streams:clean UP-TO-DATE
:tools:clean UP-TO-DATE
:connect:api:clean UP-TO-DATE
:connect:file:clean UP-TO-DATE
:connect:json:clean UP-TO-DATE
:connect:runtime:clean UP-TO-DATE
:streams:examples:clean UP-TO-DATE
:jar_core_2_10
Building project 'core' with Scala version 2.10.6
:kafka-trunk-jdk8:clients:compileJava
warning: [options] bootstrap class path 
not set in conjunction with -source 1.7
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
1 warning

:kafka-trunk-jdk8:clients:processResources UP-TO-DATE
:kafka-trunk-jdk8:clients:classes
:kafka-trunk-jdk8:clients:determineCommitId UP-TO-DATE
:kafka-trunk-jdk8:clients:createVersionFile
:kafka-trunk-jdk8:clients:jar
:kafka-trunk-jdk8:core:compileJava UP-TO-DATE
:kafka-trunk-jdk8:core:compileScala
Java HotSpot(TM) 64-Bit Server VM warning: 
ignoring option MaxPermSize=512m; support was removed in 8.0

:79:
 value DEFAULT_TIMESTAMP in object OffsetCommitRequest is deprecated: see 
corresponding Javadoc for more information.

org.apache.kafka.common.requests.OffsetCommitRequest.DEFAULT_TIMESTAMP
 ^
:36:
 value DEFAULT_TIMESTAMP in object OffsetCommitRequest is deprecated: see 
corresponding Javadoc for more information.
 commitTimestamp: Long = 
org.apache.kafka.common.requests.OffsetCommitRequest.DEFAULT_TIMESTAMP,

  ^
:37:
 value DEFAULT_TIMESTAMP in object OffsetCommitRequest is deprecated: see 
corresponding Javadoc for more information.
 expireTimestamp: Long 

[jira] [Assigned] (KAFKA-3370) Add options to auto.offset.reset to reset offsets upon initialization only

2016-05-24 Thread Vahid Hashemian (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vahid Hashemian reassigned KAFKA-3370:
--

Assignee: Vahid Hashemian

> Add options to auto.offset.reset to reset offsets upon initialization only
> --
>
> Key: KAFKA-3370
> URL: https://issues.apache.org/jira/browse/KAFKA-3370
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Vahid Hashemian
> Fix For: 0.10.1.0
>
>
> Currently "auto.offset.reset" is applied in the following two cases:
> 1) upon starting the consumer for the first time (hence no committed offsets 
> before);
> 2) upon fetching offsets out-of-range.
> For scenarios where case 2) needs to be avoided (i.e. people need to be 
> notified upon out-of-range offsets rather than having them silently reset), 
> "auto.offset.reset" needs to be set to "none". However, for case 1) setting 
> "auto.offset.reset" to "none" will cause NoOffsetForPartitionException upon 
> polling. In this case, seekToBeginning/seekToEnd is then mistakenly applied 
> to set the offset at initialization, even though those calls are actually 
> designed for use during the lifetime of the consumer (in a rebalance 
> callback, for example).
> The fix proposal is to add two more options to "auto.offset.reset": 
> "earliest-on-start" and "latest-on-start", whose semantics are "earliest" 
> and "latest" for case 1) only, and "none" for case 2).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
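Under the proposal, a consumer that wants reset-on-first-start but hard failures on any later out-of-range fetch would be configured as below. Note the "earliest-on-start" value is the ticket's proposed, hypothetical option, not a setting that exists in released Kafka:

```java
import java.util.Properties;

public class ProposedResetConfig {
    static Properties consumerProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "example-group");
        // Proposed (hypothetical) value: behave like "earliest" only when no
        // committed offset exists at startup (case 1), and like "none"
        // (i.e. throw) on any later out-of-range fetch (case 2).
        props.setProperty("auto.offset.reset", "earliest-on-start");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps().getProperty("auto.offset.reset"));
    }
}
```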


[GitHub] kafka pull request: MINOR: Fix documentation table of contents and...

2016-05-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1423


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] scalability limits in the coordinator

2016-05-24 Thread Becket Qin
Hi Jason,

There are a few problems we want to solve here:
1. The group metadata is too big to be appended to the log.
2. Reduce the memory footprint on the broker
3. Reduce the bytes transferred over the wire.

To solve (1), I like your idea of having separate messages per member. The
proposal (Onur's option 8) is to break metadata into small records in the
same uncompressed message set so each record is small. I agree it would be
ideal if we are able to store the metadata separately for each member. I
was also thinking about storing the metadata in multiple messages.
What concerns me is that having multiple messages seems to break
atomicity, and I am not sure how we would deal with the potential
issues. For example, what if the group metadata is replicated but the member
metadata is not? It might be fine depending on the implementation, though
I am not sure.

For (2) we want to store the metadata onto the disk, which is what we have
to do anyway. The only question is in what format should we store them.

To address (3) we want the metadata to be compressed, which
contradicts the above solution to (1).

I think Jun's suggestion is probably still the simplest. To avoid changing
the behavior for consumers, maybe we can do that only for the offsets topic,
i.e., if the max fetch bytes of the fetch request is smaller than the
message size on the offsets topic, we always return at least one full
message. This should avoid unexpected problems on the client side,
because supposedly only tools and brokers will fetch from the internal
topics.

As a modification to what you suggested, one solution I was considering is to
have multiple messages in a single compressed message. That means for the
SyncGroupResponse we still need to read the entire compressed message and
extract the inner messages, which does not seem very different from having a
single message containing everything. But let me put it here and see
if it makes sense.

We can have a map of GroupMetadataKey -> GroupMetadataValueOffset.

The GroupMetadataValue is stored in a compressed message. The inner
messages are the following:

Inner Message 0: Version GroupId Generation

Inner Message 1: MemberId MemberMetadata_1 (we can compress the bytes here)

Inner Message 2: MemberId MemberMetadata_2

Inner Message N: MemberId MemberMetadata_N

The MemberMetadata format is the following:
  MemberMetadata => Version Generation ClientId Host Subscription Assignment

So DescribeGroupResponse will just return the entire compressed
GroupMetadataMessage. SyncGroupResponse will return the corresponding inner
message.
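The compressed-wrapper layout above can be sketched as follows. JSON plus gzip stand in for Kafka's binary message formats, and all function names are hypothetical:

```python
# Sketch of the compressed-wrapper layout: one compressed message whose
# inner messages are the group header followed by one entry per member.
# JSON + gzip stand in for Kafka's binary formats; names are hypothetical.
import gzip
import json

def build_group_metadata(group_id, generation, members):
    """members: dict of member_id -> metadata dict."""
    inner = [{"version": 0, "group_id": group_id, "generation": generation}]
    inner += [{"member_id": mid, "metadata": meta} for mid, meta in members.items()]
    return gzip.compress(json.dumps(inner).encode())

def describe_group_response(blob):
    # DescribeGroup can return the entire compressed wrapper as-is.
    return blob

def sync_group_response(blob, member_id):
    # SyncGroup still has to decompress the whole wrapper to extract one
    # inner message -- the caveat noted above about this approach.
    inner = json.loads(gzip.decompress(blob))
    return next(m for m in inner[1:] if m["member_id"] == member_id)

blob = build_group_metadata("mm-group", 5, {"m1": {"host": "a"}, "m2": {"host": "b"}})
print(sync_group_response(blob, "m2"))  # {'member_id': 'm2', 'metadata': {'host': 'b'}}
```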

Thanks,

Jiangjie (Becket) Qin



On Tue, May 24, 2016 at 9:14 AM, Jason Gustafson  wrote:

> Hey Becket,
>
> I like your idea to store only the offset for the group metadata in memory.
> I think it would be safe to keep it in memory for a short time after the
> rebalance completes, but after that, its only real purpose is to answer
> DescribeGroup requests, so your proposal makes a lot of sense to me.
>
> As for the specific problem with the size of the group metadata message for
> the MM case, if we cannot succeed in reducing the size of the
> subscription/assignment (which I think is still probably the best
> alternative if it can work), then I think there are some options for
> changing the message format (option #8 in Onur's initial e-mail).
> Currently, the key used for storing the group metadata is this:
>
> GroupMetadataKey => Version GroupId
>
> And the value is something like this (some details elided):
>
> GroupMetadataValue => Version GroupId Generation [MemberMetadata]
>   MemberMetadata => ClientId Host Subscription Assignment
>
> I don't think we can change the key without a lot of pain, but it seems
> like we can change the value format. Maybe we can take the
> subscription/assignment payloads out of the value and introduce a new
> "MemberMetadata" message for each member in the group. For example:
>
> MemberMetadataKey => Version GroupId MemberId
>
> MemberMetadataValue => Version Generation ClientId Host Subscription
> Assignment
>
> When a new generation is created, we would first write the group metadata
> message which includes the generation and all of the memberIds, and then
> we'd write the member metadata messages. To answer the DescribeGroup
> request, we'd read the group metadata at the cached offset and, depending
> on the version, all of the following member metadata. This would be more
> complex to maintain, but it seems doable if it comes to it.
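As an editorial aside, the split format described in the quoted message can be sketched with an in-memory log. The dict keys and values stand in for the real binary schemas, and the sketch ignores log compaction and writes from other groups interleaving with this one:

```python
# Sketch of the split format: one group metadata record naming the
# generation and memberIds, then one MemberMetadata record per member.
# The "log" is a plain list; schemas are illustrative dicts.
log = []

def write_generation(group_id, generation, member_metadata):
    """Append the group record, then the member records; return the
    group record's offset (which the broker would cache per group)."""
    group_offset = len(log)
    member_ids = sorted(member_metadata)
    log.append(({"group": group_id}, {"generation": generation, "members": member_ids}))
    for mid in member_ids:
        log.append(({"group": group_id, "member": mid}, member_metadata[mid]))
    return group_offset

def describe_group(group_offset):
    """Answer DescribeGroup by reading the group record at the cached
    offset and then the member records that follow it."""
    _key, value = log[group_offset]
    members = {}
    for i, mid in enumerate(value["members"]):
        _mkey, mval = log[group_offset + 1 + i]
        members[mid] = mval
    return {"generation": value["generation"], "members": members}

off = write_generation("g1", 7, {"m1": {"host": "a"}, "m2": {"host": "b"}})
print(describe_group(off)["generation"])  # 7
```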
>
> Thanks,
> Jason
>
> On Mon, May 23, 2016 at 6:15 PM, Becket Qin  wrote:
>
> > It might be worth thinking a little further. We have discussed this before
> > that we want to avoid holding all the group metadata in memory.
> >
> > I am thinking about the following end state:
> >
> > 1. Enable compression on the offset topic.
> > 2. Instead of holding the entire group metadata in memory on the brokers,
> > each 

Re: Kafka KIP meeting May 24 at 11:00am PST

2016-05-24 Thread Jun Rao
The following are the notes from today's KIP discussion.


   - KIP-58 - Make Log Compaction Point Configurable: We want to start with
   just a time-based configuration since there is no good use case for a byte-based
   or message-based configuration. Eric will change the KIP and start the vote.
   - KIP-4 - Admin api: Grant will pick up the work. Initially, he plans to
   route the write requests from the admin clients to the controller directly
   to avoid having the broker forward the requests to the controller.
   - KIP-48 - Delegation tokens: Two of the remaining issues are (1) how to
   store the delegation tokens and (2) how token expiration works. Since Parth
   wasn't able to attend the meeting, we will follow up on the mailing list.


The video will be uploaded soon at
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals

Thanks,

Jun


On Mon, May 23, 2016 at 9:20 AM, Jun Rao  wrote:

> Hi, Everyone,
>
> Now that Kafka 0.10.0.0 is released, we will have a Kafka KIP meeting tomorrow
> at 11:00am PST. If you plan to attend but haven't received an invite,
> please let me know. The following is the agenda.
>
> Agenda:
>
> KIP-48 - Delegation tokens
> KIP-58 - Make Log Compaction Point Configurable
> KIP-4   - Status check
>
> Thanks,
>
> Jun
>


[jira] [Commented] (KAFKA-3728) EndToEndAuthorizationTest offsets_topic misconfigured

2016-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298724#comment-15298724
 ] 

ASF GitHub Bot commented on KAFKA-3728:
---

Github user edoardocomar closed the pull request at:

https://github.com/apache/kafka/pull/1414


> EndToEndAuthorizationTest offsets_topic misconfigured
> -
>
> Key: KAFKA-3728
> URL: https://issues.apache.org/jira/browse/KAFKA-3728
> Project: Kafka
>  Issue Type: Bug
>Reporter: Edoardo Comar
>
> A consumer that is manually assigned a topic-partition is able to consume
> messages that a consumer that subscribes to the topic cannot.
> To reproduce: take the test
> EndToEndAuthorizationTest.testProduceConsume
> (e.g. the SaslSslEndToEndAuthorizationTest implementation).
>
> It passes (messages are consumed)
> if the consumer is assigned the single topic-partition:
>   consumers.head.assign(List(tp).asJava)
> but fails
> if the consumer subscribes to the topic, changing the line to:
>   consumers.head.subscribe(List(topic).asJava)
> The failure when subscribed shows this error about synchronization:
>  org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: 
> Messages are rejected since there are fewer in-sync replicas than required.
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:455)
> The test passes in both cases (subscribe and assign) with the setting
>   this.serverConfig.setProperty(KafkaConfig.MinInSyncReplicasProp, "1")



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-3728 EndToEndAuthorizationTest offsets_t...

2016-05-24 Thread edoardocomar
Github user edoardocomar closed the pull request at:

https://github.com/apache/kafka/pull/1414


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3728) EndToEndAuthorizationTest offsets_topic misconfigured

2016-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298722#comment-15298722
 ] 

ASF GitHub Bot commented on KAFKA-3728:
---

Github user edoardocomar closed the pull request at:

https://github.com/apache/kafka/pull/1414


> EndToEndAuthorizationTest offsets_topic misconfigured
> -
>
> Key: KAFKA-3728
> URL: https://issues.apache.org/jira/browse/KAFKA-3728
> Project: Kafka
>  Issue Type: Bug
>Reporter: Edoardo Comar
>
> A consumer that is manually assigned a topic-partition is able to consume
> messages that a consumer that subscribes to the topic cannot.
> To reproduce: take the test
> EndToEndAuthorizationTest.testProduceConsume
> (e.g. the SaslSslEndToEndAuthorizationTest implementation).
>
> It passes (messages are consumed)
> if the consumer is assigned the single topic-partition:
>   consumers.head.assign(List(tp).asJava)
> but fails
> if the consumer subscribes to the topic, changing the line to:
>   consumers.head.subscribe(List(topic).asJava)
> The failure when subscribed shows this error about synchronization:
>  org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: 
> Messages are rejected since there are fewer in-sync replicas than required.
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:455)
> The test passes in both cases (subscribe and assign) with the setting
>   this.serverConfig.setProperty(KafkaConfig.MinInSyncReplicasProp, "1")





[jira] [Commented] (KAFKA-3728) EndToEndAuthorizationTest offsets_topic misconfigured

2016-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298723#comment-15298723
 ] 

ASF GitHub Bot commented on KAFKA-3728:
---

GitHub user edoardocomar reopened a pull request:

https://github.com/apache/kafka/pull/1414

KAFKA-3728 EndToEndAuthorizationTest offsets_topic misconfigured

Set OffsetsTopicReplicationFactorProp to 3 like MinInSyncReplicasProp
and OffsetCommitRequiredAcksProp to 1 to avoid timeouts

Added a unit test for a consumer that subscribes.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/edoardocomar/kafka KAFKA-3728

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1414.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1414


commit 82a1d1bb0ce3065d7008f18f9d1bcd84bcb24d94
Author: Edoardo Comar 
Date:   2016-05-19T15:52:15Z

KAFKA-3728 EndToEndAuthorizationTest offsets_topic misconfigured

Set OffsetsTopicReplicationFactorProp to 3 like MinInSyncReplicasProp
Earlier set acl for admin to read all
OffsetCommitRequiredAcksProp to 1 

Rationales:
Moving the set acl for admin avoided errors like :

ERROR [ReplicaFetcherThread-0-0], Error for partition
[__consumer_offsets,0] to broker
0:org.apache.kafka.common.errors.TopicAuthorizationException: Not
authorized to access topics: [Topic authorization failed.]
(kafka.server.ReplicaFetcherThread:97)

that were happening before the read to all was set

OffsetCommitRequiredAcksProp to 1 should not be needed, but without it,
testProduceConsumeViaSubscribe fails if it is not the first test to be
executed, which suggests a cleanup problem; the test passes on its own.




> EndToEndAuthorizationTest offsets_topic misconfigured
> -
>
> Key: KAFKA-3728
> URL: https://issues.apache.org/jira/browse/KAFKA-3728
> Project: Kafka
>  Issue Type: Bug
>Reporter: Edoardo Comar
>
> A consumer that is manually assigned a topic-partition is able to consume
> messages that a consumer that subscribes to the topic cannot.
> To reproduce: take the test
> EndToEndAuthorizationTest.testProduceConsume
> (e.g. the SaslSslEndToEndAuthorizationTest implementation).
>
> It passes (messages are consumed)
> if the consumer is assigned the single topic-partition:
>   consumers.head.assign(List(tp).asJava)
> but fails
> if the consumer subscribes to the topic, changing the line to:
>   consumers.head.subscribe(List(topic).asJava)
> The failure when subscribed shows this error about synchronization:
>  org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: 
> Messages are rejected since there are fewer in-sync replicas than required.
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:455)
> The test passes in both cases (subscribe and assign) with the setting
>   this.serverConfig.setProperty(KafkaConfig.MinInSyncReplicasProp, "1")





[GitHub] kafka pull request: KAFKA-3728 EndToEndAuthorizationTest offsets_t...

2016-05-24 Thread edoardocomar
Github user edoardocomar closed the pull request at:

https://github.com/apache/kafka/pull/1414




[GitHub] kafka pull request: KAFKA-3728 EndToEndAuthorizationTest offsets_t...

2016-05-24 Thread edoardocomar
GitHub user edoardocomar reopened a pull request:

https://github.com/apache/kafka/pull/1414

KAFKA-3728 EndToEndAuthorizationTest offsets_topic misconfigured

Set OffsetsTopicReplicationFactorProp to 3 like MinInSyncReplicasProp
and OffsetCommitRequiredAcksProp to 1 to avoid timeouts

Added a unit test for a consumer that subscribes.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/edoardocomar/kafka KAFKA-3728

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1414.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1414


commit 82a1d1bb0ce3065d7008f18f9d1bcd84bcb24d94
Author: Edoardo Comar 
Date:   2016-05-19T15:52:15Z

KAFKA-3728 EndToEndAuthorizationTest offsets_topic misconfigured

Set OffsetsTopicReplicationFactorProp to 3 like MinInSyncReplicasProp
Earlier set acl for admin to read all
OffsetCommitRequiredAcksProp to 1 

Rationales:
Moving the set acl for admin avoided errors like :

ERROR [ReplicaFetcherThread-0-0], Error for partition
[__consumer_offsets,0] to broker
0:org.apache.kafka.common.errors.TopicAuthorizationException: Not
authorized to access topics: [Topic authorization failed.]
(kafka.server.ReplicaFetcherThread:97)

that were happening before the read to all was set

OffsetCommitRequiredAcksProp to 1 should not be needed, but without it,
testProduceConsumeViaSubscribe fails if it is not the first test to be
executed, which suggests a cleanup problem; the test passes on its own.






[jira] [Commented] (KAFKA-3370) Add options to auto.offset.reset to reset offsets upon initialization only

2016-05-24 Thread Vahid Hashemian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298700#comment-15298700
 ] 

Vahid Hashemian commented on KAFKA-3370:


[~guozhang] Is this 
https://github.com/apache/kafka/compare/trunk...vahidhashemian:KAFKA-3370?expand=1
 close to what you had in mind for this JIRA? Thanks.

> Add options to auto.offset.reset to reset offsets upon initialization only
> --
>
> Key: KAFKA-3370
> URL: https://issues.apache.org/jira/browse/KAFKA-3370
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
> Fix For: 0.10.1.0
>
>
> Currently "auto.offset.reset" is applied in the following two cases:
> 1) upon starting the consumer for the first time (hence no committed offsets 
> before);
> 2) upon fetching offsets out-of-range.
> For scenarios where case 2) needs to be avoided (i.e. people need to be
> notified of out-of-range offsets rather than having them silently reset),
> "auto.offset.reset" needs to be set to "none". However, for case 1) setting
> "auto.offset.reset" to "none" will cause a NoOffsetForPartitionException upon
> polling. In that case, seekToBeginning/seekToEnd is mistakenly applied
> to set the offset at initialization, even though those methods are designed for
> use during the lifetime of the consumer (in a rebalance callback, for example).
> The fix proposal is to add two more options to "auto.offset.reset",
> "earliest-on-start" and "latest-on-start", whose semantics are "earliest"
> and "latest" for case 1) only, and "none" for case 2).
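The proposed semantics can be summarized as a small resolution function. The option names come from the proposal above; everything else is an illustrative sketch, not consumer code:

```python
# Sketch of the proposed reset semantics: the new "*-on-start" values
# act like earliest/latest only at initialization (case 1) and like
# "none" on an out-of-range fetch (case 2). Illustrative, not consumer code.
def resolve_reset(policy: str, on_start: bool) -> str:
    """Return 'earliest' or 'latest', or raise when no reset applies.

    on_start=True  -> case 1: no committed offset at initialization.
    on_start=False -> case 2: fetched offset out of range."""
    if policy in ("earliest", "latest"):
        return policy  # existing options apply in both cases
    if policy == "earliest-on-start" and on_start:
        return "earliest"
    if policy == "latest-on-start" and on_start:
        return "latest"
    # "none", or an *-on-start policy hit by an out-of-range fetch:
    # the application must be notified instead of a silent reset.
    raise LookupError(f"no offset reset for policy={policy!r}, on_start={on_start}")

print(resolve_reset("earliest-on-start", on_start=True))   # earliest
print(resolve_reset("latest", on_start=False))             # latest
```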





Re: [ANNOUCE] Apache Kafka 0.10.0.0 Released

2016-05-24 Thread Grant Henke
Awesome! Thanks for managing the release Gwen!

On Tue, May 24, 2016 at 11:24 AM, Gwen Shapira  wrote:

> The Apache Kafka community is pleased to announce the release for Apache
> Kafka 0.10.0.0.
> This is a major release with exciting new features, including first
> release of KafkaStreams and many other improvements.
>
> All of the changes in this release can be found:
>
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.0/RELEASE_NOTES.html
>
> Apache Kafka is a high-throughput, publish-subscribe messaging system
> rethought as a distributed commit log.
>
> ** Fast => A single Kafka broker can handle hundreds of megabytes of reads
> and
> writes per second from thousands of clients.
>
> ** Scalable => Kafka is designed to allow a single cluster to serve as the
> central data backbone
> for a large organization. It can be elastically and transparently expanded
> without downtime.
> Data streams are partitioned and spread over a cluster of machines to allow
> data streams
> larger than the capability of any single machine and to allow clusters of
> co-ordinated consumers.
>
> ** Durable => Messages are persisted on disk and replicated within the
> cluster to prevent
> data loss. Each broker can handle terabytes of messages without performance
> impact.
>
> ** Distributed by Design => Kafka has a modern cluster-centric design that
> offers
> strong durability and fault-tolerance guarantees.
>
> You can download the source release from
>
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.0/kafka-0.10.0.0-src.tgz
>
> and binary releases from
>
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.0/kafka_2.10-0.10.0.0.tgz
>
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.0/kafka_2.11-0.10.0.0.tgz
>
> A big thank you to the following people who have contributed to the
> 0.10.0.0 release.
>
> Adam Kunicki, Aditya Auradkar, Alex Loddengaard, Alex Sherwin, Allen
> Wang, Andrea Cosentino, Anna Povzner, Ashish Singh, Atul Soman, Ben
> Stopford, Bill Bejeck, BINLEI XUE, Chen Shangan, Chen Zhu, Christian
> Posta, Cory Kolbeck, Damian Guy, dan norwood, Dana Powers, David
> Jacot, Denise Fernandez, Dionysis Grigoropoulos, Dmitry Stratiychuk,
> Dong Lin, Dongjoon Hyun, Drausin Wulsin, Duncan Sands, Dustin Cote,
> Eamon Zhang, edoardo, Edward Ribeiro, Eno Thereska, Ewen
> Cheslack-Postava, Flavio Junqueira, Francois Visconte, Frank Scholten,
> Gabriel Zhang, gaob13, Geoff Anderson, glikson, Grant Henke, Greg
> Fodor, Guozhang Wang, Gwen Shapira, Igor Stepanov, Ishita Mandhan,
> Ismael Juma, Jaikiran Pai, Jakub Nowak, James Cheng, Jason Gustafson,
> Jay Kreps, Jeff Klukas, Jeremy Custenborder, Jesse Anderson, jholoman,
> Jiangjie Qin, Jin Xing, jinxing, Jonathan Bond, Jun Rao, Ján Koščo,
> Kaufman Ng, kenji yoshida, Kim Christensen, Kishore Senji, Konrad,
> Liquan Pei, Luciano Afranllie, Magnus Edenhill, Maksim Logvinenko,
> manasvigupta, Manikumar reddy O, Mark Grover, Matt Fluet, Matt
> McClure, Matthias J. Sax, Mayuresh Gharat, Micah Zoltu, Michael Blume,
> Michael G. Noll, Mickael Maison, Onur Karaman, ouyangliduo, Parth
> Brahmbhatt, Paul Cavallaro, Pierre-Yves Ritschard, Piotr Szwed,
> Praveen Devarao, Rafael Winterhalter, Rajini Sivaram, Randall Hauch,
> Richard Whaling, Ryan P, Samuel Julius Hecht, Sasaki Toru, Som Sahu,
> Sriharsha Chintalapani, Stig Rohde Døssing, Tao Xiao, Tom Crayford,
> Tom Dearman, Tom Graves, Tom Lee, Tomasz Nurkiewicz, Vahid Hashemian,
> William Thurston, Xin Wang, Yasuhiro Matsuda, Yifan Ying, Yuto
> Kawamura, zhuchen1018
>
> We welcome your help and feedback. For more information on how to
> report problems, and to get involved, visit the project website at
> http://kafka.apache.org/
>
> Thanks,
>
> Gwen
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


[jira] [Created] (KAFKA-3752) Provide a way for KStreams to recover from unclean shutdown

2016-05-24 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-3752:
---

 Summary: Provide a way for KStreams to recover from unclean 
shutdown
 Key: KAFKA-3752
 URL: https://issues.apache.org/jira/browse/KAFKA-3752
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Affects Versions: 0.10.0.0
Reporter: Roger Hoover
Assignee: Guozhang Wang


If a KStream application gets killed with SIGKILL (e.g. by the Linux OOM 
Killer), it may leave behind lock files and fail to recover.

It would be useful to have an option (say --force) to tell KStreams to proceed
even if it finds old LOCK files.
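The requested --force behavior amounts to sweeping stale lock files before creating tasks. A minimal sketch follows; the ".lock" file name and directory layout are assumptions, not the actual Kafka Streams internals:

```python
# Sketch of the requested --force behavior: sweep leftover lock files
# under the state directory before creating tasks. The ".lock" file
# name and directory layout are assumptions, not the actual Kafka
# Streams internals.
import os
import tempfile

def clear_stale_locks(state_dir):
    """Remove every .lock file under state_dir; return the removed paths.
    Only safe when no other live instance shares this directory."""
    removed = []
    for root, _dirs, files in os.walk(state_dir):
        for name in files:
            if name == ".lock":
                path = os.path.join(root, name)
                os.remove(path)
                removed.append(path)
    return removed

# Demo on a throwaway directory shaped like <state>/<task>/.lock:
demo = tempfile.mkdtemp()
os.makedirs(os.path.join(demo, "0_0"))
open(os.path.join(demo, "0_0", ".lock"), "w").close()
print(len(clear_stale_locks(demo)))  # 1
```

Blindly deleting locks is only safe when the operator knows no other instance shares the state directory, which is why the request is for an explicit opt-in flag.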

{noformat}
[2016-05-24 17:37:52,886] ERROR Failed to create an active task #0_0 in thread 
[StreamThread-1]:  
(org.apache.kafka.streams.processor.internals.StreamThread:583)
org.apache.kafka.streams.errors.ProcessorStateException: Error while creating 
the state manager
at 
org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:71)
at 
org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:86)
at 
org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:550)
at 
org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:577)
at 
org.apache.kafka.streams.processor.internals.StreamThread.access$000(StreamThread.java:68)
at 
org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:123)
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:222)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:232)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:227)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture$2.onSuccess(RequestFuture.java:182)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:436)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:422)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:679)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:658)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:426)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:243)
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:345)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:977)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937)
at 
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:295)
at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:218)
Caused by: java.io.IOException: Failed to lock the state directory: 
/data/test/2/kafka-streams/test-2/0_0
at 
org.apache.kafka.streams.processor.internals.ProcessorStateManager.<init>(ProcessorStateManager.java:95)
at 
org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:69)
  

Re: [DISCUSS] KIP-48 Support for delegation tokens as an authentication mechanism

2016-05-24 Thread Liquan Pei
It seems that the links to images in the KIP are broken.

Liquan

On Tue, May 24, 2016 at 9:33 AM, parth brahmbhatt <
brahmbhatt.pa...@gmail.com> wrote:

> 110. What does getDelegationTokenAs mean?
> In the current proposal we only allow a user to get a delegation token for
> the identity that it authenticated as using another mechanism, i.e. a user
> that authenticates using a keytab for principal us...@example.com will get
> delegation tokens for that user only. In the future I think we will have to
> extend support such that we allow some set of users (
> kafka-rest-u...@example.com, storm-nim...@example.com) to acquire
> delegation tokens on behalf of other users whose identity they have
> verified independently. Kafka brokers will have ACLs to control which
> users are allowed to impersonate other users and get tokens on their
> behalf. Overall, impersonation is a whole different problem in my opinion, and
> I think we can tackle it in a separate KIP.
>
> 111. What's the typical rate of getting and renewing delegation tokens?
> Typically this should be very low; 1 request per minute is a
> relatively high estimate. However, it depends on the token expiration. I am
> less worried about the extra load it puts on the controller than about the
> added complexity versus the value it offers.
>
> Thanks
> Parth
>
>
>
> On Tue, May 24, 2016 at 7:30 AM, Ismael Juma  wrote:
>
> > Thanks Rajini. It would probably require a separate KIP as it will
> > introduce user visible changes. We could also update KIP-48 to have this
> > information, but it seems cleaner to do it separately. We can discuss
> that
> > in the KIP call today.
> >
> > Ismael
> >
> > On Tue, May 24, 2016 at 3:19 PM, Rajini Sivaram <
> > rajinisiva...@googlemail.com> wrote:
> >
> > > Ismael,
> > >
> > > I have created a JIRA (
> https://issues.apache.org/jira/browse/KAFKA-3751)
> > > for adding SCRAM as a SASL mechanism. Would that need another KIP? If
> > > KIP-48 will use this mechanism, can this just be a JIRA that gets
> > reviewed
> > > when the PR is ready?
> > >
> > > Thank you,
> > >
> > > Rajini
> > >
> > > On Tue, May 24, 2016 at 2:46 PM, Ismael Juma 
> wrote:
> > >
> > > > Thanks Rajini, SCRAM seems like a good candidate.
> > > >
> > > > Gwen had independently mentioned this as a SASL mechanism that might
> be
> > > > useful for Kafka and I have been meaning to check it in more detail.
> > Good
> > > > to know that you are willing to contribute an implementation. Maybe
> we
> > > > should file a separate JIRA for this?
> > > >
> > > > Ismael
> > > >
> > > > On Tue, May 24, 2016 at 2:12 PM, Rajini Sivaram <
> > > > rajinisiva...@googlemail.com> wrote:
> > > >
> > > > > SCRAM (Salted Challenge Response Authentication Mechanism) is a
> > better
> > > > > mechanism than Digest-MD5. Java doesn't come with a built-in SCRAM
> > > > > SaslServer or SaslClient, but I will be happy to add support in
> Kafka
> > > > since
> > > > > it would be a useful mechanism to support anyway.
> > > > > https://tools.ietf.org/html/rfc7677 describes the protocol for
> > > > > SCRAM-SHA-256.
> > > > >
> > > > > On Tue, May 24, 2016 at 2:37 AM, Jun Rao  wrote:
> > > > >
> > > > > > Parth,
> > > > > >
> > > > > > Thanks for the explanation. A couple of more questions.
> > > > > >
> > > > > > 110. What does getDelegationTokenAs mean?
> > > > > >
> > > > > > 111. What's the typical rate of getting and renewing delegation
> > > tokens?
> > > > > > That may have an impact on whether they should be directed to the
> > > > > > controller.
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Mon, May 23, 2016 at 1:19 PM, parth brahmbhatt <
> > > > > > brahmbhatt.pa...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Jun,
> > > > > > >
> > > > > > > Thanks for reviewing.
> > > > > > >
> > > > > > > * We could add a Cluster action to add acls on who can request
> > > > > delegation
> > > > > > > tokens. I don't see the use case for that yet but down the line
> > > when
> > > > we
> > > > > > > start supporting getDelegationTokenAs it will be necessary.
> > > > > > > * Yes we recommend tokens to be only used/distributed over
> secure
> > > > > > channels.
> > > > > > > * Depending on what design we end up choosing Invalidation will
> > be
> > > > > > > responsibility of every broker or controller.
> > > > > > > * I am not sure if I documented somewhere that invalidation
> will
> > > > > directly
> > > > > > > go through zookeeper but that is not the intent. Invalidation
> > will
> > > > > either
> > > > > > > be request based or due to expiration. No direct zookeeper
> > > > interaction
> > > > > > from
> > > > > > > any client.
> > > > > > > * "Broker also stores the DelegationToken without the hmac in
> the
> > > > > > > zookeeper." : Sorry about the confusion. The sole purpose of
> > > > zookeeper
> > > > > in
> > > > > > > this design is as distribution channel for tokens between all
> > > brokers
> > > > > > and a
> > > > > > > 

Re: [ANNOUCE] Apache Kafka 0.10.0.0 Released

2016-05-24 Thread Ismael Juma
I had an offline chat with Gwen and the 0.10.0.0-rc6 tag is the right one.
I've updated the 0.10.0.0 tag to be the same:

https://github.com/apache/kafka/tree/0.10.0.0

Ismael

On Tue, May 24, 2016 at 6:04 PM, Ismael Juma  wrote:

> Hmm, sorry. The tag seems wrong. The commit you linked Tom seems the
> correct one:
>
>
> https://github.com/apache/kafka/commit/b8642491e78c5a137f5012e31d347c01f3b02339
>
> Gwen, is this right?
>
> Ismael
>
> On Tue, May 24, 2016 at 6:03 PM, Ismael Juma  wrote:
>
>> Hi Tom,
>>
>> The official commit can always be found via the relevant Git tag:
>>
>> https://github.com/apache/kafka/tree/0.10.0.0
>>
>> https://github.com/apache/kafka/commit/1b5879653e0d956c79556301d1d11987baf6f2d7
>>
>> Ismael
>>
>> On Tue, May 24, 2016 at 5:57 PM, Tom Crayford 
>> wrote:
>>
>>> Can I just confirm that
>>>
>>> https://github.com/apache/kafka/commit/b8642491e78c5a137f5012e31d347c01f3b02339
>>> is the official commit for the release? The source download doesn't have
>>> the git repo and I can't see a sha anywhere in the downloaded source.
>>>
>>> On Tue, May 24, 2016 at 5:42 PM, Becket Qin 
>>> wrote:
>>>
>>> > Awesome!
>>> >
>>> > On Tue, May 24, 2016 at 9:41 AM, Jay Kreps  wrote:
>>> >
>>> > > Woohoo!!! :-)
>>> > >
>>> > > -Jay
>>> > >
>>> > > On Tue, May 24, 2016 at 9:24 AM, Gwen Shapira 
>>> > wrote:
>>> > >

Re: [ANNOUCE] Apache Kafka 0.10.0.0 Released

2016-05-24 Thread Mayuresh Gharat
Great!!! cheers :)

Thanks,

Mayuresh

On Tue, May 24, 2016 at 10:07 AM, Ismael Juma  wrote:

> Awesome, thanks for running the release Gwen. :)
>
> Ismael
>
> On Tue, May 24, 2016 at 5:24 PM, Gwen Shapira  wrote:
>
> > The Apache Kafka community is pleased to announce the release for Apache
> > Kafka 0.10.0.0.
> > This is a major release with exciting new features, including first
> > release of KafkaStreams and many other improvements.
> >
> > All of the changes in this release can be found at:
> >
> >
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.0/RELEASE_NOTES.html
> >
> > Apache Kafka is a high-throughput, publish-subscribe messaging system
> > rethought as a distributed commit log.
> >
> > ** Fast => A single Kafka broker can handle hundreds of megabytes of
> reads
> > and
> > writes per second from thousands of clients.
> >
> > ** Scalable => Kafka is designed to allow a single cluster to serve as
> the
> > central data backbone
> > for a large organization. It can be elastically and transparently
> expanded
> > without downtime.
> > Data streams are partitioned and spread over a cluster of machines to
> allow
> > data streams
> > larger than the capability of any single machine and to allow clusters of
> > co-ordinated consumers.
> >
> > ** Durable => Messages are persisted on disk and replicated within the
> > cluster to prevent
> > data loss. Each broker can handle terabytes of messages without
> performance
> > impact.
> >
> > ** Distributed by Design => Kafka has a modern cluster-centric design
> that
> > offers
> > strong durability and fault-tolerance guarantees.
> >
> > You can download the source release from
> >
> >
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.0/kafka-0.10.0.0-src.tgz
> >
> > and binary releases from
> >
> >
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.0/kafka_2.10-0.10.0.0.tgz
> >
> >
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.0/kafka_2.11-0.10.0.0.tgz
> >
> > A big thank you to the following people who contributed to the
> > 0.10.0.0 release.
> >
> > Adam Kunicki, Aditya Auradkar, Alex Loddengaard, Alex Sherwin, Allen
> > Wang, Andrea Cosentino, Anna Povzner, Ashish Singh, Atul Soman, Ben
> > Stopford, Bill Bejeck, BINLEI XUE, Chen Shangan, Chen Zhu, Christian
> > Posta, Cory Kolbeck, Damian Guy, dan norwood, Dana Powers, David
> > Jacot, Denise Fernandez, Dionysis Grigoropoulos, Dmitry Stratiychuk,
> > Dong Lin, Dongjoon Hyun, Drausin Wulsin, Duncan Sands, Dustin Cote,
> > Eamon Zhang, edoardo, Edward Ribeiro, Eno Thereska, Ewen
> > Cheslack-Postava, Flavio Junqueira, Francois Visconte, Frank Scholten,
> > Gabriel Zhang, gaob13, Geoff Anderson, glikson, Grant Henke, Greg
> > Fodor, Guozhang Wang, Gwen Shapira, Igor Stepanov, Ishita Mandhan,
> > Ismael Juma, Jaikiran Pai, Jakub Nowak, James Cheng, Jason Gustafson,
> > Jay Kreps, Jeff Klukas, Jeremy Custenborder, Jesse Anderson, jholoman,
> > Jiangjie Qin, Jin Xing, jinxing, Jonathan Bond, Jun Rao, Ján Koščo,
> > Kaufman Ng, kenji yoshida, Kim Christensen, Kishore Senji, Konrad,
> > Liquan Pei, Luciano Afranllie, Magnus Edenhill, Maksim Logvinenko,
> > manasvigupta, Manikumar reddy O, Mark Grover, Matt Fluet, Matt
> > McClure, Matthias J. Sax, Mayuresh Gharat, Micah Zoltu, Michael Blume,
> > Michael G. Noll, Mickael Maison, Onur Karaman, ouyangliduo, Parth
> > Brahmbhatt, Paul Cavallaro, Pierre-Yves Ritschard, Piotr Szwed,
> > Praveen Devarao, Rafael Winterhalter, Rajini Sivaram, Randall Hauch,
> > Richard Whaling, Ryan P, Samuel Julius Hecht, Sasaki Toru, Som Sahu,
> > Sriharsha Chintalapani, Stig Rohde Døssing, Tao Xiao, Tom Crayford,
> > Tom Dearman, Tom Graves, Tom Lee, Tomasz Nurkiewicz, Vahid Hashemian,
> > William Thurston, Xin Wang, Yasuhiro Matsuda, Yifan Ying, Yuto
> > Kawamura, zhuchen1018
> >
> > We welcome your help and feedback. For more information on how to
> > report problems, and to get involved, visit the project website at
> > http://kafka.apache.org/
> >
> > Thanks,
> >
> > Gwen
> >
>
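The announcement's claim that "data streams are partitioned and spread over a cluster of machines" can be sketched concretely. The snippet below is a simplified stand-in, not Kafka's actual default partitioner (which hashes the key bytes with murmur2); it only illustrates the key-to-partition mapping idea:

```python
import hashlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index (simplified stand-in for
    Kafka's default partitioner, which uses murmur2 on the key bytes)."""
    h = int.from_bytes(hashlib.md5(key).digest()[:4], "big")
    return h % num_partitions


# Records with the same key always land in the same partition, so per-key
# ordering is preserved while load spreads across the cluster.
assignment = {k: partition_for(k.encode(), 5) for k in ("user-1", "user-2", "user-3")}
print(assignment)
```

Because the mapping depends only on the key and the partition count, any producer computes the same placement independently, which is what lets a cluster be expanded "elastically and transparently" as long as the partition count is stable.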



-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125


Re: [ANNOUCE] Apache Kafka 0.10.0.0 Released

2016-05-24 Thread Ismael Juma
Awesome, thanks for running the release Gwen. :)

Ismael

On Tue, May 24, 2016 at 5:24 PM, Gwen Shapira  wrote:



Re: [ANNOUCE] Apache Kafka 0.10.0.0 Released

2016-05-24 Thread Ismael Juma
Hmm, sorry. The tag seems wrong. The commit you linked, Tom, seems to be
the correct one:

https://github.com/apache/kafka/commit/b8642491e78c5a137f5012e31d347c01f3b02339

Gwen, is this right?

Ismael

On Tue, May 24, 2016 at 6:03 PM, Ismael Juma  wrote:

> Hi Tom,
>
> The official commit can always be found via the relevant Git tag:
>
> https://github.com/apache/kafka/tree/0.10.0.0
>
> https://github.com/apache/kafka/commit/1b5879653e0d956c79556301d1d11987baf6f2d7
>
> Ismael
>
> On Tue, May 24, 2016 at 5:57 PM, Tom Crayford 
> wrote:
>
>> Can I just confirm that
>>
>> https://github.com/apache/kafka/commit/b8642491e78c5a137f5012e31d347c01f3b02339
>> is the official commit for the release? The source download doesn't have
>> the git repo and I can't see a sha anywhere in the downloaded source.
>>

Re: [ANNOUCE] Apache Kafka 0.10.0.0 Released

2016-05-24 Thread Ismael Juma
Hi Tom,

The official commit can always be found via the relevant Git tag:

https://github.com/apache/kafka/tree/0.10.0.0
https://github.com/apache/kafka/commit/1b5879653e0d956c79556301d1d11987baf6f2d7

Ismael
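Ismael's point is that a Git tag resolves deterministically to a single commit SHA, which is what ties a release back to the source tree. A minimal local sketch of that relationship, using a throwaway repository with hypothetical contents:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=release -c user.email=release@example.invalid \
    commit -q --allow-empty -m "0.10.0.0 release"
# A tag pins the release to exactly one commit:
git tag 0.10.0.0
# Resolving the tag yields the same SHA as the commit it was placed on:
test "$(git rev-parse 0.10.0.0)" = "$(git rev-parse HEAD)" && echo TAG_OK
```

Against the real repository, `git rev-parse 0.10.0.0^{commit}` in a clone of apache/kafka (or `git ls-remote --tags https://github.com/apache/kafka 0.10.0.0`) shows the release SHA without having to trust anything inside the tarball.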

On Tue, May 24, 2016 at 5:57 PM, Tom Crayford  wrote:

> Can I just confirm that
>
> https://github.com/apache/kafka/commit/b8642491e78c5a137f5012e31d347c01f3b02339
> is the official commit for the release? The source download doesn't have
> the git repo and I can't see a sha anywhere in the downloaded source.
>

[GitHub] kafka-site pull request: fix typo in site_name

2016-05-24 Thread ijuma
Github user ijuma commented on the pull request:

https://github.com/apache/kafka-site/pull/15#issuecomment-221336904
  
Thanks for the PR, merged.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka-site pull request: fix typo in site_name

2016-05-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka-site/pull/15




Re: [ANNOUCE] Apache Kafka 0.10.0.0 Released

2016-05-24 Thread Tom Crayford
Can I just confirm that
https://github.com/apache/kafka/commit/b8642491e78c5a137f5012e31d347c01f3b02339
is the official commit for the release? The source download doesn't have
the git repo and I can't see a sha anywhere in the downloaded source.
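Since the source tarball itself carries no git metadata, the usual way to check what was downloaded is to compare it against the checksum files published alongside the artifacts (Apache releases also ship GPG signatures). A sketch of that round trip, using a stand-in file rather than a real download:

```shell
set -e
tmp=$(mktemp -d)
# Stand-in for the downloaded artifact (hypothetical contents):
printf 'kafka release bytes' > "$tmp/kafka-0.10.0.0-src.tgz"
# Publisher side: record the checksum next to the artifact.
(cd "$tmp" && sha512sum kafka-0.10.0.0-src.tgz > kafka-0.10.0.0-src.tgz.sha512)
# Downloader side: recompute and compare; sha512sum -c exits non-zero on mismatch.
(cd "$tmp" && sha512sum -c kafka-0.10.0.0-src.tgz.sha512)
```

The checksum only proves the bytes match what was published; the GPG signature (verified against the project's KEYS file) is what ties the artifact to the release manager.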



[jira] [Resolved] (KAFKA-3120) Consumer doesn't get messages from some partitions after reassign

2016-05-24 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-3120.

Resolution: Duplicate

> Consumer doesn't get messages from some partitions after reassign
> -
>
> Key: KAFKA-3120
> URL: https://issues.apache.org/jira/browse/KAFKA-3120
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.0
> Environment: Linux
>Reporter: Jakub Neubauer
>Priority: Critical
>
> I tested a scenario on a one-node cluster with this setup:
> * A topic with 5 partitions, 1 replica.
> * One producer (new Java client)
> * 2 consumers were started (let's say A and B), both using the new Java
> client. 2 partitions were assigned to A, 3 to B.
> Then I stopped one of the consumers (cleanly). The partitions were
> re-assigned (the surviving consumer got all 5 partitions in its
> "ConsumerRebalanceListener.onPartitionsAssigned" listener).
> But as messages were produced, the living consumer received messages from only
> some partitions (magically, those that had belonged to the now-dead consumer).
> The messages were not lost. After I restarted the second consumer, it got the
> messages that it previously didn't get. But without restarting, the messages
> were not consumed by it.
> It is quite a serious issue, since there is no sign of anything being wrong.
> Everything seems to be working, so the administrator has no chance to realize
> that (only some) messages are not being consumed on the "healthy" system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3120) Consumer doesn't get messages from some partitions after reassign

2016-05-24 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298478#comment-15298478
 ] 

Jason Gustafson commented on KAFKA-3120:


I'm going to go ahead and close this as a duplicate of KAFKA-2877. 
[~jakub.neubauer] Feel free to reopen if you hit this problem on 0.9.0.1 or 
later.

> Consumer doesn't get messages from some partitions after reassign
> -
>
> Key: KAFKA-3120
> URL: https://issues.apache.org/jira/browse/KAFKA-3120
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.0
> Environment: Linux
>Reporter: Jakub Neubauer
>Priority: Critical
>
> I tested some scenario on one-node cluster with this setup:
> * A topic with 5 partitions, 1 replica.
> * One producer (new java client)
> * 2 consumers were started (let's say A and B). Using the new java client. 2 
> partitions to A, 3 partitions to B were assigned.
> Then I stopped one of the consumers (cleanly). The partitions were 
> re-assigned (The consumer got all 5 partitions in the 
> "ConsumerRebalanceListener.onPartitionsAssigned" listener.
> But as messages were produced, the living consumer received only messages of 
> some partitions (magically those that belonged to the now-dead consumer).
> The messages were not lost. After I restarted the second consumer, it got the 
> messages that it previously didn't get. But without restarting, the messages 
> were not consumed by it.
> It is quite serious issue, since there is no sign of something being wrong. 
> Everything seems to be working. So the administrator has no chance to get the 
> idea that (only some) messages are not consumed on the "healthy" system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [ANNOUCE] Apache Kafka 0.10.0.0 Released

2016-05-24 Thread Becket Qin
Awesome!

On Tue, May 24, 2016 at 9:41 AM, Jay Kreps  wrote:

> Woohoo!!! :-)
>
> -Jay
>
> On Tue, May 24, 2016 at 9:24 AM, Gwen Shapira  wrote:
>
> > The Apache Kafka community is pleased to announce the release for Apache
> > Kafka 0.10.0.0.
> > This is a major release with exciting new features, including first
> > release of KafkaStreams and many other improvements.
> >
> > All of the changes in this release can be found:
> >
> >
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.0/RELEASE_NOTES.html
> >
> > Apache Kafka is high-throughput, publish-subscribe messaging system
> > rethought of as a distributed commit log.
> >
> > ** Fast => A single Kafka broker can handle hundreds of megabytes of
> reads
> > and
> > writes per second from thousands of clients.
> >
> > ** Scalable => Kafka is designed to allow a single cluster to serve as
> the
> > central data backbone
> > for a large organization. It can be elastically and transparently
> expanded
> > without downtime.
> > Data streams are partitioned and spread over a cluster of machines to
> allow
> > data streams
> > larger than the capability of any single machine and to allow clusters of
> > co-ordinated consumers.
> >
> > ** Durable => Messages are persisted on disk and replicated within the
> > cluster to prevent
> > data loss. Each broker can handle terabytes of messages without
> performance
> > impact.
> >
> > ** Distributed by Design => Kafka has a modern cluster-centric design
> that
> > offers
> > strong durability and fault-tolerance guarantees.
> >
> > You can download the source release from
> >
> >
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.0/kafka-0.10.0.0-src.tgz
> >
> > and binary releases from
> >
> >
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.0/kafka_2.10-0.10.0.0.tgz
> >
> >
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.0/kafka_2.11-0.10.0.0.tgz
> >
> > A big thank you for the following people who have contributed to the
> > 0.10.0.0 release.
> >
> > Adam Kunicki, Aditya Auradkar, Alex Loddengaard, Alex Sherwin, Allen
> > Wang, Andrea Cosentino, Anna Povzner, Ashish Singh, Atul Soman, Ben
> > Stopford, Bill Bejeck, BINLEI XUE, Chen Shangan, Chen Zhu, Christian
> > Posta, Cory Kolbeck, Damian Guy, dan norwood, Dana Powers, David
> > Jacot, Denise Fernandez, Dionysis Grigoropoulos, Dmitry Stratiychuk,
> > Dong Lin, Dongjoon Hyun, Drausin Wulsin, Duncan Sands, Dustin Cote,
> > Eamon Zhang, edoardo, Edward Ribeiro, Eno Thereska, Ewen
> > Cheslack-Postava, Flavio Junqueira, Francois Visconte, Frank Scholten,
> > Gabriel Zhang, gaob13, Geoff Anderson, glikson, Grant Henke, Greg
> > Fodor, Guozhang Wang, Gwen Shapira, Igor Stepanov, Ishita Mandhan,
> > Ismael Juma, Jaikiran Pai, Jakub Nowak, James Cheng, Jason Gustafson,
> > Jay Kreps, Jeff Klukas, Jeremy Custenborder, Jesse Anderson, jholoman,
> > Jiangjie Qin, Jin Xing, jinxing, Jonathan Bond, Jun Rao, Ján Koščo,
> > Kaufman Ng, kenji yoshida, Kim Christensen, Kishore Senji, Konrad,
> > Liquan Pei, Luciano Afranllie, Magnus Edenhill, Maksim Logvinenko,
> > manasvigupta, Manikumar reddy O, Mark Grover, Matt Fluet, Matt
> > McClure, Matthias J. Sax, Mayuresh Gharat, Micah Zoltu, Michael Blume,
> > Michael G. Noll, Mickael Maison, Onur Karaman, ouyangliduo, Parth
> > Brahmbhatt, Paul Cavallaro, Pierre-Yves Ritschard, Piotr Szwed,
> > Praveen Devarao, Rafael Winterhalter, Rajini Sivaram, Randall Hauch,
> > Richard Whaling, Ryan P, Samuel Julius Hecht, Sasaki Toru, Som Sahu,
> > Sriharsha Chintalapani, Stig Rohde Døssing, Tao Xiao, Tom Crayford,
> > Tom Dearman, Tom Graves, Tom Lee, Tomasz Nurkiewicz, Vahid Hashemian,
> > William Thurston, Xin Wang, Yasuhiro Matsuda, Yifan Ying, Yuto
> > Kawamura, zhuchen1018
> >
> > We welcome your help and feedback. For more information on how to
> > report problems, and to get involved, visit the project website at
> > http://kafka.apache.org/
> >
> > Thanks,
> >
> > Gwen
> >
>


Re: [ANNOUNCE] Apache Kafka 0.10.0.0 Released

2016-05-24 Thread Jay Kreps
Woohoo!!! :-)

-Jay



Re: [ANNOUNCE] Apache Kafka 0.10.0.0 Released

2016-05-24 Thread Jun Rao
Gwen,

Thanks for running the release!

Jun



[GitHub] kafka-site pull request: fix typo in site_name

2016-05-24 Thread onurkaraman
GitHub user onurkaraman opened a pull request:

https://github.com/apache/kafka-site/pull/15

fix typo in site_name



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/onurkaraman/kafka-site asf-site

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka-site/pull/15.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15


commit cdcc1832d62128b5bef7a1bd62d5a4d50284b245
Author: Onur Karaman 
Date:   2016-05-24T16:35:20Z

fix typo in site_name




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-48 Support for delegation tokens as an authentication mechanism

2016-05-24 Thread parth brahmbhatt
110. What does getDelegationTokenAs mean?
In the current proposal we only allow a user to get a delegation token for
the identity that it authenticated as using another mechanism, i.e. a user
that authenticates using a keytab for principal us...@example.com will get
delegation tokens for that user only. In the future I think we will have to
extend support such that we allow some set of users (
kafka-rest-u...@example.com, storm-nim...@example.com) to acquire
delegation tokens on behalf of other users whose identity they have
verified independently.  Kafka brokers will have ACLs to control which
users are allowed to impersonate other users and get tokens on their
behalf. Overall, impersonation is a whole different problem in my opinion,
and I think we can tackle it in a separate KIP.

111. What's the typical rate of getting and renewing delegation tokens?
Typically this should be very low; 1 request per minute is a relatively
high estimate. However, it depends on the token expiration. I am less
worried about the extra load it puts on the controller than about the
added complexity relative to the value it offers.

Thanks
Parth




[ANNOUNCE] Apache Kafka 0.10.0.0 Released

2016-05-24 Thread Gwen Shapira
The Apache Kafka community is pleased to announce the release for Apache
Kafka 0.10.0.0.
This is a major release with exciting new features, including first
release of KafkaStreams and many other improvements.

All of the changes in this release can be found at:
https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.0/RELEASE_NOTES.html

Apache Kafka is a high-throughput, publish-subscribe messaging system
rethought as a distributed commit log.

** Fast => A single Kafka broker can handle hundreds of megabytes of reads
and writes per second from thousands of clients.

** Scalable => Kafka is designed to allow a single cluster to serve as the
central data backbone for a large organization. It can be elastically and
transparently expanded without downtime. Data streams are partitioned and
spread over a cluster of machines to allow data streams larger than any
single machine can handle and to allow clusters of coordinated consumers.

** Durable => Messages are persisted on disk and replicated within the
cluster to prevent data loss. Each broker can handle terabytes of messages
without performance impact.

** Distributed by Design => Kafka has a modern cluster-centric design that
offers strong durability and fault-tolerance guarantees.

You can download the source release from
https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.0/kafka-0.10.0.0-src.tgz

and binary releases from
https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.0/kafka_2.10-0.10.0.0.tgz
https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.0/kafka_2.11-0.10.0.0.tgz

A big thank you to the following people who contributed to the
0.10.0.0 release.

Adam Kunicki, Aditya Auradkar, Alex Loddengaard, Alex Sherwin, Allen
Wang, Andrea Cosentino, Anna Povzner, Ashish Singh, Atul Soman, Ben
Stopford, Bill Bejeck, BINLEI XUE, Chen Shangan, Chen Zhu, Christian
Posta, Cory Kolbeck, Damian Guy, dan norwood, Dana Powers, David
Jacot, Denise Fernandez, Dionysis Grigoropoulos, Dmitry Stratiychuk,
Dong Lin, Dongjoon Hyun, Drausin Wulsin, Duncan Sands, Dustin Cote,
Eamon Zhang, edoardo, Edward Ribeiro, Eno Thereska, Ewen
Cheslack-Postava, Flavio Junqueira, Francois Visconte, Frank Scholten,
Gabriel Zhang, gaob13, Geoff Anderson, glikson, Grant Henke, Greg
Fodor, Guozhang Wang, Gwen Shapira, Igor Stepanov, Ishita Mandhan,
Ismael Juma, Jaikiran Pai, Jakub Nowak, James Cheng, Jason Gustafson,
Jay Kreps, Jeff Klukas, Jeremy Custenborder, Jesse Anderson, jholoman,
Jiangjie Qin, Jin Xing, jinxing, Jonathan Bond, Jun Rao, Ján Koščo,
Kaufman Ng, kenji yoshida, Kim Christensen, Kishore Senji, Konrad,
Liquan Pei, Luciano Afranllie, Magnus Edenhill, Maksim Logvinenko,
manasvigupta, Manikumar reddy O, Mark Grover, Matt Fluet, Matt
McClure, Matthias J. Sax, Mayuresh Gharat, Micah Zoltu, Michael Blume,
Michael G. Noll, Mickael Maison, Onur Karaman, ouyangliduo, Parth
Brahmbhatt, Paul Cavallaro, Pierre-Yves Ritschard, Piotr Szwed,
Praveen Devarao, Rafael Winterhalter, Rajini Sivaram, Randall Hauch,
Richard Whaling, Ryan P, Samuel Julius Hecht, Sasaki Toru, Som Sahu,
Sriharsha Chintalapani, Stig Rohde Døssing, Tao Xiao, Tom Crayford,
Tom Dearman, Tom Graves, Tom Lee, Tomasz Nurkiewicz, Vahid Hashemian,
William Thurston, Xin Wang, Yasuhiro Matsuda, Yifan Ying, Yuto
Kawamura, zhuchen1018

We welcome your help and feedback. For more information on how to
report problems, and to get involved, visit the project website at
http://kafka.apache.org/

Thanks,

Gwen


Re: [DISCUSS] scalability limits in the coordinator

2016-05-24 Thread Jason Gustafson
Hey Becket,

I like your idea to store only the offset for the group metadata in memory.
I think it would be safe to keep it in memory for a short time after the
rebalance completes, but after that, its only real purpose is to answer
DescribeGroup requests, so your proposal makes a lot of sense to me.

As for the specific problem with the size of the group metadata message for
the MM case, if we cannot succeed in reducing the size of the
subscription/assignment (which I think is still probably the best
alternative if it can work), then I think there are some options for
changing the message format (option #8 in Onur's initial e-mail).
Currently, the key used for storing the group metadata is this:

GroupMetadataKey => Version GroupId

And the value is something like this (some details elided):

GroupMetadataValue => Version GroupId Generation [MemberMetadata]
  MemberMetadata => ClientId Host Subscription Assignment

I don't think we can change the key without a lot of pain, but it seems
like we can change the value format. Maybe we can take the
subscription/assignment payloads out of the value and introduce a new
"MemberMetadata" message for each member in the group. For example:

MemberMetadataKey => Version GroupId MemberId

MemberMetadataValue => Version Generation ClientId Host Subscription
Assignment

When a new generation is created, we would first write the group metadata
message which includes the generation and all of the memberIds, and then
we'd write the member metadata messages. To answer the DescribeGroup
request, we'd read the group metadata at the cached offset and, depending
on the version, all of the following member metadata. This would be more
complex to maintain, but it seems doable if it comes to it.
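As a rough illustration of the split proposed above — a sketch only: the field names follow the message layouts in this email, but the JSON encoding and helper names are invented for readability (Kafka's real log format is binary):

```python
# Sketch (not Kafka's wire format): write one slim group record per
# generation, then one record per member keyed by (group, memberId), so
# subscriptions/assignments no longer live in the single group value.
import json

def encode_group_record(group_id, generation, member_ids):
    # Slim record: generation plus member ids only, no per-member payloads.
    key = {"version": 0, "group": group_id}
    value = {"version": 0, "generation": generation, "members": member_ids}
    return json.dumps(key), json.dumps(value)

def encode_member_record(group_id, member_id, generation, client_id, host,
                         subscription, assignment):
    # One record per member, keyed by (group, memberId) as proposed above.
    key = {"version": 0, "group": group_id, "member": member_id}
    value = {"version": 0, "generation": generation, "clientId": client_id,
             "host": host, "subscription": subscription,
             "assignment": assignment}
    return json.dumps(key), json.dumps(value)

def write_generation(log, group_id, generation, members):
    # Group record first, then the member metadata records, matching the
    # write order described in the email.
    log.append(encode_group_record(group_id, generation, sorted(members)))
    for member_id, meta in sorted(members.items()):
        log.append(encode_member_record(group_id, member_id, generation, *meta))
```

A DescribeGroup handler would then read the group record at the cached offset and, depending on the version, the member records that follow it.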

Thanks,
Jason

On Mon, May 23, 2016 at 6:15 PM, Becket Qin  wrote:

> It might be worth thinking a little further. We have discussed before
> that we want to avoid holding all the group metadata in memory.
>
> I am thinking about the following end state:
>
> 1. Enable compression on the offset topic.
> 2. Instead of holding the entire group metadata in memory on the brokers,
> each broker only keeps a [group -> Offset] map, the offset points to the
> message in the offset topic which holds the latest metadata of the group.
> 3. DescribeGroupResponse will read from the offset topic directly like a
> normal consumption, except that exactly one message will be returned.
> 4. SyncGroupResponse will read the message, extract the assignment part and
> send back the partition assignment. We can compress the partition
> assignment before sending it out if we want.
>
> Jiangjie (Becket) Qin
>
> On Mon, May 23, 2016 at 5:08 PM, Jason Gustafson 
> wrote:
>
> > >
> > > Jason, doesn't gzip (or other compression) basically do this? If the
> > topic
> > > is a string and the topic is repeated throughout, won't compression
> > > basically replace all repeated instances of it with an index reference
> to
> > > the full string?
> >
> >
> > Hey James, yeah, that's probably true, but keep in mind that the
> > compression happens on the broker side. It would be nice to have a more
> > compact representation so that we get some benefit over the wire as well.
> This
> > seems to be less of a concern here, so the bigger gains are probably from
> > reducing the number of partitions that need to be listed individually.
> >
> > -Jason
> >
> > On Mon, May 23, 2016 at 4:23 PM, Onur Karaman <
> > onurkaraman.apa...@gmail.com>
> > wrote:
> >
> > > When figuring out these optimizations, it's worth keeping in mind the
> > > improvements when the message is uncompressed vs when it's compressed.
> > >
> > > When uncompressed:
> > > Fixing the Assignment serialization to instead be a topic index into
> the
> > > corresponding member's subscription list would usually be a good thing.
> > >
> > > I think the proposal is only worse when the topic names are small. The
> > > Type.STRING we use in our protocol for the assignment's TOPIC_KEY_NAME
> is
> > > limited in length to Short.MAX_VALUE, so our strings are first
> prepended
> > > with 2 bytes to indicate the string size.
> > >
> > > The new proposal does worse when:
> > > 2 + utf_encoded_string_payload_size < index_type_size
> > > in other words when:
> > > utf_encoded_string_payload_size < index_type_size - 2
> > >
> > > If the index type ends up being Type.INT32, then the proposal is worse
> > when
> > > the topic is length 1.
> > > If the index type ends up being Type.INT64, then the proposal is worse
> > when
> > > the topic is less than length 6.
> > >
> > > When compressed:
> > > As James Cheng brought up, I'm not sure how things change when
> > compression
> > > comes into the picture. This would be worth investigating.
> > >
> > > On Mon, May 23, 2016 at 4:05 PM, James Cheng 
> > wrote:
> > >
> > > >
> > > > > On May 23, 2016, at 10:59 AM, Jason Gustafson 
> > > > wrote:
> > > > 
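Onur's break-even arithmetic quoted above can be checked mechanically; a short sketch (the 4- and 8-byte `index_size` values for Type.INT32/Type.INT64 are the obvious fixed widths):

```python
# Re-derives the break-even points: a Type.STRING topic field costs a 2-byte
# length prefix plus its UTF-8 payload, so replacing it with a fixed-width
# index is a net loss exactly when 2 + payload_size < index_size.
def string_field_size(topic):
    return 2 + len(topic.encode("utf-8"))

def index_is_worse(topic, index_size):
    return string_field_size(topic) < index_size

# INT32 index (4 bytes): only worse for single-byte topic names.
assert index_is_worse("a", 4) and not index_is_worse("ab", 4)
# INT64 index (8 bytes): worse for topic names shorter than 6 bytes.
assert index_is_worse("abcde", 8) and not index_is_worse("abcdef", 8)
```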

[jira] [Commented] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties

2016-05-24 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298296#comment-15298296
 ] 

Dustin Cote commented on KAFKA-2394:


[~jinxing6...@126.com] are you able to make a pull request for this issue 
instead of attaching the patch to the JIRA?  I think you've already been following 
[the 
process|https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes#ContributingCodeChanges-PullRequest]
 on other JIRAs, and this one seems like an easy one to translate for you.

> Use RollingFileAppender by default in log4j.properties
> --
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: jin xing
>Priority: Minor
>  Labels: newbie
> Attachments: log4j.properties.patch
>
>
> The default log4j.properties bundled with Kafka uses ConsoleAppender and 
> DailyRollingFileAppender, which offer no protection to users from spammy 
> logging. In extreme cases (such as when issues like KAFKA-1461 are 
> encountered), the logs can exhaust the local disk space. This could be a 
> problem for Kafka adoption since new users are less likely to adjust the 
> logging properties themselves, and are more likely to have configuration 
> problems which result in log spam. 
> To fix this, we can use RollingFileAppender, which offers two settings for 
> controlling the maximum space that log files will use.
> maxBackupIndex: how many backup files to retain
> maxFileSize: the max size of each log file
> One question is whether this change is a compatibility concern. The backup 
> strategy and filenames used by RollingFileAppender are different from those 
> used by DailyRollingFileAppender, so any tools which depend on the old format 
> will break. If we think this is a serious problem, one solution would be to 
> provide two versions of log4j.properties and add a flag to enable the new 
> one. Another solution would be to include the RollingFileAppender 
> configuration in the default log4j.properties, but commented out.
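A configuration along the lines the ticket proposes could look like the following sketch (appender name, file path, and limits are illustrative, not shipped defaults):

```properties
# Hypothetical RollingFileAppender setup capping total log disk usage at
# roughly (MaxBackupIndex + 1) * MaxFileSize.
log4j.rootLogger=INFO, kafkaAppender
log4j.appender.kafkaAppender=org.apache.log4j.RollingFileAppender
log4j.appender.kafkaAppender.File=${kafka.logs.dir}/server.log
log4j.appender.kafkaAppender.MaxFileSize=100MB
log4j.appender.kafkaAppender.MaxBackupIndex=10
log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
```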



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-48 Support for delegation tokens as an authentication mechanism

2016-05-24 Thread Ismael Juma
Thanks Rajini. It would probably require a separate KIP as it will
introduce user visible changes. We could also update KIP-48 to have this
information, but it seems cleaner to do it separately. We can discuss that
in the KIP call today.

Ismael

On Tue, May 24, 2016 at 3:19 PM, Rajini Sivaram <
rajinisiva...@googlemail.com> wrote:

> Ismael,
>
> I have created a JIRA (https://issues.apache.org/jira/browse/KAFKA-3751)
> for adding SCRAM as a SASL mechanism. Would that need another KIP? If
> KIP-48 will use this mechanism, can this just be a JIRA that gets reviewed
> when the PR is ready?
>
> Thank you,
>
> Rajini
>
> On Tue, May 24, 2016 at 2:46 PM, Ismael Juma  wrote:
>
> > Thanks Rajini, SCRAM seems like a good candidate.
> >
> > Gwen had independently mentioned this as a SASL mechanism that might be
> > useful for Kafka and I have been meaning to check it in more detail. Good
> > to know that you are willing to contribute an implementation. Maybe we
> > should file a separate JIRA for this?
> >
> > Ismael
> >
> > On Tue, May 24, 2016 at 2:12 PM, Rajini Sivaram <
> > rajinisiva...@googlemail.com> wrote:
> >
> > > SCRAM (Salted Challenge Response Authentication Mechanism) is a better
> > > mechanism than Digest-MD5. Java doesn't come with a built-in SCRAM
> > > SaslServer or SaslClient, but I will be happy to add support in Kafka
> > since
> > > it would be a useful mechanism to support anyway.
> > > https://tools.ietf.org/html/rfc7677 describes the protocol for
> > > SCRAM-SHA-256.
> > >
> > > On Tue, May 24, 2016 at 2:37 AM, Jun Rao  wrote:
> > >
> > > > Parth,
> > > >
> > > > Thanks for the explanation. A couple of more questions.
> > > >
> > > > 110. What does getDelegationTokenAs mean?
> > > >
> > > > 111. What's the typical rate of getting and renewing delegation
> tokens?
> > > > That may have an impact on whether they should be directed to the
> > > > controller.
> > > >
> > > > Jun
> > > >
> > > > On Mon, May 23, 2016 at 1:19 PM, parth brahmbhatt <
> > > > brahmbhatt.pa...@gmail.com> wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > Thanks for reviewing.
> > > > >
> > > > > * We could add a Cluster action to add acls on who can request
> > > delegation
> > > > > tokens. I don't see the use case for that yet but down the line
> when
> > we
> > > > > start supporting getDelegationTokenAs it will be necessary.
> > > > > * Yes we recommend tokens to be only used/distributed over secure
> > > > channels.
> > > > > * Depending on what design we end up choosing Invalidation will be
> > > > > responsibility of every broker or controller.
> > > > > * I am not sure if I documented somewhere that invalidation will
> > > directly
> > > > > go through zookeeper but that is not the intent. Invalidation will
> > > either
> > > > > be request based or due to expiration. No direct zookeeper
> > interaction
> > > > from
> > > > > any client.
> > > > > * "Broker also stores the DelegationToken without the hmac in the
> > > > > zookeeper." : Sorry about the confusion. The sole purpose of
> > zookeeper
> > > in
> > > > > this design is as distribution channel for tokens between all
> brokers
> > > > and a
> > > > > layer that ensures only tokens that were generated by making a
> > request
> > > > to a
> > > > > broker will be accepted (more on this in second paragraph). The
> token
> > > > > consists of few elements (owner, renewer, uuid , expiration, hmac)
> ,
> > > one
> > > > of
> > > > > which is the finally generated hmac but hmac it self is derivable
> if
> > > you
> > > > > have all the other elements of the token + secret key to generate
> > hmac.
> > > > > Given zookeeper does not provide SSL support we do not want the
> > entire
> > > > > token to be wire transferred to zookeeper as that will be an
> insecure
> > > > wire
> > > > > transfer. Instead we only store all the other elements of a
> > delegation
> > > > > tokens. Brokers can read these elements and because they also have
> > > access
> > > > > to secret key they will be able to generate hmac on their end.
> > > > >
> > > > > One of the alternative proposed is to avoid zookeeper altogether. A
> > > > Client
> > > > > will call broker with required information (owner, renwer,
> > expiration)
> > > > and
> > > > > get back (signed hmac, uuid). Broker won't store this in zookeeper.
> > > From
> > > > > this point a client can contact any broker with all the delegation
> > > token
> > > > > info (owner, rewner, expiration, hmac, uuid) the borker will
> > regenerate
> > > > the
> > > > > hmac and as long as it matches with hmac presented by client ,
> broker
> > > > will
> > > > > allow the request to authenticate.  Only problem with this approach
> > is
> > > if
> > > > > the secret key is compromised any client can now generate random
> > tokens
> > > > and
> > > > > they will still be able to authenticate as any user they like. with
> > > > > zookeeper we guarantee that only tokens acquired via a broker
> 

[jira] [Commented] (KAFKA-3511) Add common aggregation functions like Sum and Avg as built-ins in Kafka Streams DSL

2016-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298244#comment-15298244
 ] 

ASF GitHub Bot commented on KAFKA-3511:
---

GitHub user enothereska opened a pull request:

https://github.com/apache/kafka/pull/1424

KAFKA-3511: Initial commit for aggregators [WiP]

Initial structure. Removed initialiser. Two simple aggregators.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/enothereska/kafka KAFKA-3511-sum-avg

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1424.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1424


commit 18416bb213b6eaa3fa5952af67dc5396204e247c
Author: Eno Thereska 
Date:   2016-05-24T14:25:47Z

Initial commit for aggregators




> Add common aggregation functions like Sum and Avg as build-ins in Kafka 
> Streams DSL
> ---
>
> Key: KAFKA-3511
> URL: https://issues.apache.org/jira/browse/KAFKA-3511
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Eno Thereska
>  Labels: api
> Fix For: 0.10.1.0
>
>
> Currently we have the following aggregation APIs in the Streams DSL:
> {code}
> KStream.aggregateByKey(..)
> KStream.reduceByKey(..)
> KStream.countByKey(..)
> KTable.groupBy(...).aggregate(..)
> KTable.groupBy(...).reduce(..)
> KTable.groupBy(...).count(..)
> {code}
> It would be better to add common aggregation functions like Sum and Avg as 
> built-ins in the Streams DSL. A few questions to ask though:
> 1. Should we add those built-in functions as, for example, 
> {{KTable.groupBy(...).sum(...)}} or {{KTable.groupBy(...).aggregate(SUM, 
> ...)}}? Please see the comments below for detailed pros and cons.
> 2. If we go with the second option above, should we replace the countByKey / 
> count operators with aggregate(COUNT) as well? Personally I (Guozhang) feel 
> it is not necessary, as COUNT is a special aggregate function since we do not 
> need to map on any value fields; this is the same approach as in Spark, 
> where Count is built in as a first-class citizen in the DSL, and others are 
> built in as {{aggregate(SUM)}}, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
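
The design question in the thread above — built-ins like `sum()` versus a generic `aggregate(SUM)` — can be illustrated with a small sketch. This is plain Python and purely hypothetical (it is not the Kafka Streams API); it just shows how SUM and AVG fall out of a generic aggregate combinator while COUNT is special because it never touches the value field:

```python
# Hypothetical sketch of the two API options discussed above, not the real
# Kafka Streams DSL.

def aggregate(records, initializer, aggregator):
    """Generic aggregation: fold each (key, value) into a per-key accumulator."""
    state = {}
    for key, value in records:
        acc = state.get(key, initializer())
        state[key] = aggregator(acc, value)
    return state

# Built-ins expressed on top of aggregate() -- option 2 in the discussion.
SUM = (lambda: 0, lambda acc, v: acc + v)
# COUNT ignores the value field entirely, which is why the thread argues it
# deserves first-class treatment rather than aggregate(COUNT).
COUNT = (lambda: 0, lambda acc, _v: acc + 1)
# AVG carries a (sum, count) pair and is finalized with a mapping step.
AVG = (lambda: (0, 0), lambda acc, v: (acc[0] + v, acc[1] + 1))

records = [("a", 2), ("a", 4), ("b", 10)]
sums = aggregate(records, *SUM)      # {'a': 6, 'b': 10}
counts = aggregate(records, *COUNT)  # {'a': 2, 'b': 1}
avgs = {k: s / n for k, (s, n) in aggregate(records, *AVG).items()}  # {'a': 3.0, 'b': 10.0}
```

The sketch makes the trade-off concrete: `aggregate(SUM)` keeps the DSL surface small, while dedicated `sum()` / `count()` operators hide the accumulator plumbing from the caller.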


[GitHub] kafka pull request: KAFKA-3511: Initial commit for aggregators [Wi...

2016-05-24 Thread enothereska
GitHub user enothereska opened a pull request:

https://github.com/apache/kafka/pull/1424

KAFKA-3511: Initial commit for aggregators [WiP]

Initial structure. Removed initialiser. Two simple aggregators.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/enothereska/kafka KAFKA-3511-sum-avg

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1424.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1424


commit 18416bb213b6eaa3fa5952af67dc5396204e247c
Author: Eno Thereska 
Date:   2016-05-24T14:25:47Z

Initial commit for aggregators




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Work started] (KAFKA-3511) Add common aggregation functions like Sum and Avg as build-ins in Kafka Streams DSL

2016-05-24 Thread Eno Thereska (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-3511 started by Eno Thereska.
---
> Add common aggregation functions like Sum and Avg as build-ins in Kafka 
> Streams DSL
> ---
>
> Key: KAFKA-3511
> URL: https://issues.apache.org/jira/browse/KAFKA-3511
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Eno Thereska
>  Labels: api
> Fix For: 0.10.1.0
>
>
> Currently we have the following aggregation APIs in the Streams DSL:
> {code}
> KStream.aggregateByKey(..)
> KStream.reduceByKey(..)
> KStream.countByKey(..)
> KTable.groupBy(...).aggregate(..)
> KTable.groupBy(...).reduce(..)
> KTable.groupBy(...).count(..)
> {code}
> It would be better to add common aggregation functions like Sum and Avg as 
> built-ins in the Streams DSL. A few questions to ask though:
> 1. Should we add those built-in functions as, for example, 
> {{KTable.groupBy(...).sum(...)}} or {{KTable.groupBy(...).aggregate(SUM, 
> ...)}}? Please see the comments below for detailed pros and cons.
> 2. If we go with the second option above, should we replace the countByKey / 
> count operators with aggregate(COUNT) as well? Personally I (Guozhang) feel 
> it is not necessary, as COUNT is a special aggregate function since we do not 
> need to map on any value fields; this is the same approach as in Spark, 
> where Count is built in as a first-class citizen in the DSL, and others are 
> built in as {{aggregate(SUM)}}, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-48 Support for delegation tokens as an authentication mechanism

2016-05-24 Thread Rajini Sivaram
Ismael,

I have created a JIRA (https://issues.apache.org/jira/browse/KAFKA-3751)
for adding SCRAM as a SASL mechanism. Would that need another KIP? If
KIP-48 will use this mechanism, can this just be a JIRA that gets reviewed
when the PR is ready?

Thank you,

Rajini

On Tue, May 24, 2016 at 2:46 PM, Ismael Juma  wrote:

> Thanks Rajini, SCRAM seems like a good candidate.
>
> Gwen had independently mentioned this as a SASL mechanism that might be
> useful for Kafka and I have been meaning to check it in more detail. Good
> to know that you are willing to contribute an implementation. Maybe we
> should file a separate JIRA for this?
>
> Ismael
>
> On Tue, May 24, 2016 at 2:12 PM, Rajini Sivaram <
> rajinisiva...@googlemail.com> wrote:
>
> > SCRAM (Salted Challenge Response Authentication Mechanism) is a better
> > mechanism than Digest-MD5. Java doesn't come with a built-in SCRAM
> > SaslServer or SaslClient, but I will be happy to add support in Kafka
> since
> > it would be a useful mechanism to support anyway.
> > https://tools.ietf.org/html/rfc7677 describes the protocol for
> > SCRAM-SHA-256.
> >
> > On Tue, May 24, 2016 at 2:37 AM, Jun Rao  wrote:
> >
> > > Parth,
> > >
> > > Thanks for the explanation. A couple of more questions.
> > >
> > > 110. What does getDelegationTokenAs mean?
> > >
> > > 111. What's the typical rate of getting and renewing delegation tokens?
> > > That may have an impact on whether they should be directed to the
> > > controller.
> > >
> > > Jun
> > >
> > > On Mon, May 23, 2016 at 1:19 PM, parth brahmbhatt <
> > > brahmbhatt.pa...@gmail.com> wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > Thanks for reviewing.
> > > >
> > > > * We could add a Cluster action to add acls on who can request
> > delegation
> > > > tokens. I don't see the use case for that yet but down the line when
> we
> > > > start supporting getDelegationTokenAs it will be necessary.
> > > > * Yes we recommend tokens to be only used/distributed over secure
> > > channels.
> > > > * Depending on what design we end up choosing Invalidation will be
> > > > responsibility of every broker or controller.
> > > > * I am not sure if I documented somewhere that invalidation will
> > directly
> > > > go through zookeeper but that is not the intent. Invalidation will
> > either
> > > > be request based or due to expiration. No direct zookeeper
> interaction
> > > from
> > > > any client.
> > > > * "Broker also stores the DelegationToken without the hmac in the
> > > > zookeeper." : Sorry about the confusion. The sole purpose of
> zookeeper
> > in
> > > > this design is as distribution channel for tokens between all brokers
> > > and a
> > > > layer that ensures only tokens that were generated by making a
> request
> > > to a
> > > > broker will be accepted (more on this in second paragraph). The token
> > > > consists of few elements (owner, renewer, uuid , expiration, hmac) ,
> > one
> > > of
> > > > which is the finally generated hmac but hmac it self is derivable if
> > you
> > > > have all the other elements of the token + secret key to generate
> hmac.
> > > > Given zookeeper does not provide SSL support we do not want the
> entire
> > > > token to be wire transferred to zookeeper as that will be an insecure
> > > wire
> > > > transfer. Instead we only store all the other elements of a
> delegation
> > > > tokens. Brokers can read these elements and because they also have
> > access
> > > > to secret key they will be able to generate hmac on their end.
> > > >
> > > > One of the alternative proposed is to avoid zookeeper altogether. A
> > > Client
> > > > will call broker with required information (owner, renwer,
> expiration)
> > > and
> > > > get back (signed hmac, uuid). Broker won't store this in zookeeper.
> > From
> > > > this point a client can contact any broker with all the delegation
> > token
> > > > info (owner, rewner, expiration, hmac, uuid) the borker will
> regenerate
> > > the
> > > > hmac and as long as it matches with hmac presented by client , broker
> > > will
> > > > allow the request to authenticate.  Only problem with this approach
> is
> > if
> > > > the secret key is compromised any client can now generate random
> tokens
> > > and
> > > > they will still be able to authenticate as any user they like. with
> > > > zookeeper we guarantee that only tokens acquired via a broker (using
> > some
> > > > auth scheme other than delegation token) will be accepted. We need to
> > > > discuss which proposal makes more sense and we can go over it in
> > > tomorrow's
> > > > meeting.
> > > >
> > > > Also, can you forward the invite to me?
> > > >
> > > > Thanks
> > > > Parth
> > > >
> > > >
> > > >
> > > > On Mon, May 23, 2016 at 10:35 AM, Jun Rao  wrote:
> > > >
> > > > > Thanks for the KIP. A few comments.
> > > > >
> > > > > 100. This potentially can be useful for Kafka Connect and Kafka
> rest
> > > > proxy
> > > > > where a worker agent will 

[jira] [Created] (KAFKA-3751) Add support for SASL mechanism SCRAM-SHA-256

2016-05-24 Thread Rajini Sivaram (JIRA)
Rajini Sivaram created KAFKA-3751:
-

 Summary: Add support for SASL mechanism SCRAM-SHA-256 
 Key: KAFKA-3751
 URL: https://issues.apache.org/jira/browse/KAFKA-3751
 Project: Kafka
  Issue Type: Improvement
  Components: security
Affects Versions: 0.10.0.0
Reporter: Rajini Sivaram
Assignee: Rajini Sivaram


Salted Challenge Response Authentication Mechanism (SCRAM) provides secure 
authentication and is increasingly being adopted as an alternative to 
Digest-MD5 which is now obsolete. SCRAM is described in the RFC 
[https://tools.ietf.org/html/rfc5802]. It will be good to add support for 
SCRAM-SHA-256 ([https://tools.ietf.org/html/rfc7677]) as a SASL mechanism for 
Kafka.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
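
The SCRAM mechanism proposed in this JIRA never stores or transmits the plaintext password; the server keeps only derived keys. A minimal sketch of the credential derivation from RFC 5802 (with SHA-256 per RFC 7677) follows — the password, salt, and iteration count here are illustrative, and this is not the eventual Kafka implementation:

```python
import hashlib
import hmac

def hi(password: bytes, salt: bytes, iterations: int) -> bytes:
    # Hi() in RFC 5802 is PBKDF2 with HMAC-SHA-256 as the PRF.
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations)

def scram_credentials(password: bytes, salt: bytes, iterations: int):
    """Derive the keys a SCRAM server stores instead of the password."""
    salted = hi(password, salt, iterations)
    client_key = hmac.new(salted, b"Client Key", hashlib.sha256).digest()
    server_key = hmac.new(salted, b"Server Key", hashlib.sha256).digest()
    # The server stores only StoredKey = H(ClientKey) and ServerKey.
    stored_key = hashlib.sha256(client_key).digest()
    return stored_key, server_key

stored_key, server_key = scram_credentials(b"pencil", b"some-salt", 4096)
```

During authentication the client proves knowledge of ClientKey via a challenge-response exchange, and the server verifies it using StoredKey alone, which is why a leaked credential store does not directly reveal passwords.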


Re: [DISCUSS] KIP-48 Support for delegation tokens as an authentication mechanism

2016-05-24 Thread Ismael Juma
Thanks Rajini, SCRAM seems like a good candidate.

Gwen had independently mentioned this as a SASL mechanism that might be
useful for Kafka and I have been meaning to check it in more detail. Good
to know that you are willing to contribute an implementation. Maybe we
should file a separate JIRA for this?

Ismael

On Tue, May 24, 2016 at 2:12 PM, Rajini Sivaram <
rajinisiva...@googlemail.com> wrote:

> SCRAM (Salted Challenge Response Authentication Mechanism) is a better
> mechanism than Digest-MD5. Java doesn't come with a built-in SCRAM
> SaslServer or SaslClient, but I will be happy to add support in Kafka since
> it would be a useful mechanism to support anyway.
> https://tools.ietf.org/html/rfc7677 describes the protocol for
> SCRAM-SHA-256.
>
> On Tue, May 24, 2016 at 2:37 AM, Jun Rao  wrote:
>
> > Parth,
> >
> > Thanks for the explanation. A couple of more questions.
> >
> > 110. What does getDelegationTokenAs mean?
> >
> > 111. What's the typical rate of getting and renewing delegation tokens?
> > That may have an impact on whether they should be directed to the
> > controller.
> >
> > Jun
> >
> > On Mon, May 23, 2016 at 1:19 PM, parth brahmbhatt <
> > brahmbhatt.pa...@gmail.com> wrote:
> >
> > > Hi Jun,
> > >
> > > Thanks for reviewing.
> > >
> > > * We could add a Cluster action to add acls on who can request
> delegation
> > > tokens. I don't see the use case for that yet but down the line when we
> > > start supporting getDelegationTokenAs it will be necessary.
> > > * Yes we recommend tokens to be only used/distributed over secure
> > channels.
> > > * Depending on what design we end up choosing Invalidation will be
> > > responsibility of every broker or controller.
> > > * I am not sure if I documented somewhere that invalidation will
> directly
> > > go through zookeeper but that is not the intent. Invalidation will
> either
> > > be request based or due to expiration. No direct zookeeper interaction
> > from
> > > any client.
> > > * "Broker also stores the DelegationToken without the hmac in the
> > > zookeeper." : Sorry about the confusion. The sole purpose of zookeeper
> in
> > > this design is as distribution channel for tokens between all brokers
> > and a
> > > layer that ensures only tokens that were generated by making a request
> > to a
> > > broker will be accepted (more on this in second paragraph). The token
> > > consists of few elements (owner, renewer, uuid , expiration, hmac) ,
> one
> > of
> > > which is the finally generated hmac but hmac it self is derivable if
> you
> > > have all the other elements of the token + secret key to generate hmac.
> > > Given zookeeper does not provide SSL support we do not want the entire
> > > token to be wire transferred to zookeeper as that will be an insecure
> > wire
> > > transfer. Instead we only store all the other elements of a delegation
> > > tokens. Brokers can read these elements and because they also have
> access
> > > to secret key they will be able to generate hmac on their end.
> > >
> > > One of the alternative proposed is to avoid zookeeper altogether. A
> > Client
> > > will call broker with required information (owner, renwer, expiration)
> > and
> > > get back (signed hmac, uuid). Broker won't store this in zookeeper.
> From
> > > this point a client can contact any broker with all the delegation
> token
> > > info (owner, rewner, expiration, hmac, uuid) the borker will regenerate
> > the
> > > hmac and as long as it matches with hmac presented by client , broker
> > will
> > > allow the request to authenticate.  Only problem with this approach is
> if
> > > the secret key is compromised any client can now generate random tokens
> > and
> > > they will still be able to authenticate as any user they like. with
> > > zookeeper we guarantee that only tokens acquired via a broker (using
> some
> > > auth scheme other than delegation token) will be accepted. We need to
> > > discuss which proposal makes more sense and we can go over it in
> > tomorrow's
> > > meeting.
> > >
> > > Also, can you forward the invite to me?
> > >
> > > Thanks
> > > Parth
> > >
> > >
> > >
> > > On Mon, May 23, 2016 at 10:35 AM, Jun Rao  wrote:
> > >
> > > > Thanks for the KIP. A few comments.
> > > >
> > > > 100. This potentially can be useful for Kafka Connect and Kafka rest
> > > proxy
> > > > where a worker agent will need to run a task on behalf of a client.
> We
> > > will
> > > > likely need to change how those services use Kafka clients
> > > > (producer/consumer). Instead of a shared client per worker, we will
> > need
> > > a
> > > > client per user task since the authentication happens at the
> connection
> > > > level. For Kafka Connect, the renewer will be the workers. So, we
> > > probably
> > > > need to allow multiple renewers. For Kafka rest proxy, the renewer
> can
> > > > probably just be the creator of the token.
> > > >
> > > > 101. Do we need new acl on who can request delegation 

Re: [DISCUSS] KIP-48 Support for delegation tokens as an authentication mechanism

2016-05-24 Thread Rajini Sivaram
SCRAM (Salted Challenge Response Authentication Mechanism) is a better
mechanism than Digest-MD5. Java doesn't come with a built-in SCRAM
SaslServer or SaslClient, but I will be happy to add support in Kafka since
it would be a useful mechanism to support anyway.
https://tools.ietf.org/html/rfc7677 describes the protocol for
SCRAM-SHA-256.

On Tue, May 24, 2016 at 2:37 AM, Jun Rao  wrote:

> Parth,
>
> Thanks for the explanation. A couple of more questions.
>
> 110. What does getDelegationTokenAs mean?
>
> 111. What's the typical rate of getting and renewing delegation tokens?
> That may have an impact on whether they should be directed to the
> controller.
>
> Jun
>
> On Mon, May 23, 2016 at 1:19 PM, parth brahmbhatt <
> brahmbhatt.pa...@gmail.com> wrote:
>
> > Hi Jun,
> >
> > Thanks for reviewing.
> >
> > * We could add a Cluster action to add acls on who can request delegation
> > tokens. I don't see the use case for that yet but down the line when we
> > start supporting getDelegationTokenAs it will be necessary.
> > * Yes we recommend tokens to be only used/distributed over secure
> channels.
> > * Depending on what design we end up choosing Invalidation will be
> > responsibility of every broker or controller.
> > * I am not sure if I documented somewhere that invalidation will directly
> > go through zookeeper but that is not the intent. Invalidation will either
> > be request based or due to expiration. No direct zookeeper interaction
> from
> > any client.
> > * "Broker also stores the DelegationToken without the hmac in the
> > zookeeper." : Sorry about the confusion. The sole purpose of zookeeper in
> > this design is as distribution channel for tokens between all brokers
> and a
> > layer that ensures only tokens that were generated by making a request
> to a
> > broker will be accepted (more on this in second paragraph). The token
> > consists of few elements (owner, renewer, uuid , expiration, hmac) , one
> of
> > which is the finally generated hmac but hmac it self is derivable if you
> > have all the other elements of the token + secret key to generate hmac.
> > Given zookeeper does not provide SSL support we do not want the entire
> > token to be wire transferred to zookeeper as that will be an insecure
> wire
> > transfer. Instead we only store all the other elements of a delegation
> > tokens. Brokers can read these elements and because they also have access
> > to secret key they will be able to generate hmac on their end.
> >
> > One of the alternative proposed is to avoid zookeeper altogether. A
> Client
> > will call broker with required information (owner, renwer, expiration)
> and
> > get back (signed hmac, uuid). Broker won't store this in zookeeper. From
> > this point a client can contact any broker with all the delegation token
> > info (owner, rewner, expiration, hmac, uuid) the borker will regenerate
> the
> > hmac and as long as it matches with hmac presented by client , broker
> will
> > allow the request to authenticate.  Only problem with this approach is if
> > the secret key is compromised any client can now generate random tokens
> and
> > they will still be able to authenticate as any user they like. with
> > zookeeper we guarantee that only tokens acquired via a broker (using some
> > auth scheme other than delegation token) will be accepted. We need to
> > discuss which proposal makes more sense and we can go over it in
> tomorrow's
> > meeting.
> >
> > Also, can you forward the invite to me?
> >
> > Thanks
> > Parth
> >
> >
> >
> > On Mon, May 23, 2016 at 10:35 AM, Jun Rao  wrote:
> >
> > > Thanks for the KIP. A few comments.
> > >
> > > 100. This potentially can be useful for Kafka Connect and Kafka rest
> > proxy
> > > where a worker agent will need to run a task on behalf of a client. We
> > will
> > > likely need to change how those services use Kafka clients
> > > (producer/consumer). Instead of a shared client per worker, we will
> need
> > a
> > > client per user task since the authentication happens at the connection
> > > level. For Kafka Connect, the renewer will be the workers. So, we
> > probably
> > > need to allow multiple renewers. For Kafka rest proxy, the renewer can
> > > probably just be the creator of the token.
> > >
> > > 101. Do we need new acl on who can request delegation tokens?
> > >
> > > 102. Do we recommend people to send delegation tokens in an encrypted
> > > channel?
> > >
> > > 103. Who is responsible for expiring tokens, every broker?
> > >
> > > 104. For invalidating tokens, would it be better to do it in a request
> > > instead of going to ZK directly?
> > >
> > > 105. The terminology of client in the wiki sometimes refers to the end
> > > client and some other times refers to the client using the delegation
> > > tokens. It would be useful to distinguish between the two.
> > >
> > > 106. Could you explain the sentence "Broker also stores the
> > DelegationToken
> > > without the hmac in 
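
The HMAC scheme debated in this thread — brokers regenerating the hmac from the other token elements plus a shared secret key — can be sketched as follows. This is an illustrative sketch only, not the KIP-48 implementation; the secret, token layout, and field separator are all assumptions made for the example:

```python
import hashlib
import hmac

# Hypothetical shared secret known only to brokers.
SECRET_KEY = b"broker-shared-secret"

def compute_hmac(owner: str, renewer: str, uuid: str, expiration_ms: int) -> str:
    """Derive the token hmac from the other token elements + the secret key."""
    message = "|".join([owner, renewer, uuid, str(expiration_ms)]).encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

def broker_authenticate(presented: dict) -> bool:
    """A broker regenerates the hmac and compares it with what the client presents."""
    expected = compute_hmac(presented["owner"], presented["renewer"],
                            presented["uuid"], presented["expiration_ms"])
    # Constant-time comparison to avoid timing side channels.
    return hmac.compare_digest(expected, presented["hmac"])

token = {"owner": "alice", "renewer": "connect-worker", "uuid": "abc-123",
         "expiration_ms": 1717200000000}
token["hmac"] = compute_hmac(token["owner"], token["renewer"],
                             token["uuid"], token["expiration_ms"])
assert broker_authenticate(token)   # matching hmac -> request authenticates
token["owner"] = "mallory"          # tampering with any element breaks the hmac
assert not broker_authenticate(token)
```

The sketch also shows the weakness raised above: anyone holding SECRET_KEY can mint a valid token for any owner, which is the argument for keeping zookeeper as a record of tokens actually issued by a broker.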

Jenkins build is back to normal : kafka-trunk-jdk7 #1309

2016-05-24 Thread Apache Jenkins Server
See 



Re: [DISCUSS] KIP-55: Secure quotas for authenticated users

2016-05-24 Thread Rajini Sivaram
Jun,

I have updated the KIP based on your suggestion. Can you take a look?

Thank you,

Rajini

On Tue, May 24, 2016 at 11:20 AM, Rajini Sivaram <
rajinisiva...@googlemail.com> wrote:

> Jun,
>
> Thank you for the review. I agree that a simple user principal based quota
> is sufficient to allocate broker resources fairly in a multi-user system.
> Hierarchical quotas proposed in the KIP currently enables clients of a user
> to be rate-limited as well. This allows a user to run multiple clients
> which don't interfere with each other's quotas. Since there is no clear
> requirement to support this at the moment, I am happy to limit the scope of
> the KIP to a single-level user-based quota. Will update the KIP today.
>
> Regards,
>
> Rajini
>
> On Mon, May 23, 2016 at 5:24 PM, Jun Rao  wrote:
>
>> Rajini,
>>
>> Thanks for the KIP. When we first added the quota support, the intention
>> was to be able to add a quota per application. Since at that time we
>> didn't have security yet, we essentially simulated users with
>> client-ids. Now that we do have security, it seems that we just need to
>> have a way to set quota
>> at the user level. Setting quota at the combination of users and
>> client-ids
>> seems more complicated and I am not sure if there is a good use case.
>>
>> Also, the new config quota.secure.enable seems a bit weird. Would it be
>> better to add a new config quota.type. It defaults to clientId for
>> backward
>> compatibility. If one sets it to user, then the default broker level quota
>> is for users w/o a customized quota. In this setting, brokers will also
>> only take quota set at the user level (i.e., quota set at clientId level
>> will be ignored).
>>
>> Thanks,
>>
>> Jun
>>
>> On Tue, May 3, 2016 at 4:32 AM, Rajini Sivaram <
>> rajinisiva...@googlemail.com
>> > wrote:
>>
>> > Ewen,
>> >
>> > Thank you for the review. I agree that ideally we would have one
>> definition
>> > of quotas that handles all cases. But I couldn't quite fit all the
>> > combinations that are possible today with client-id-based quotas into
>> the
>> > new configuration. I think upgrade path is not bad since quotas are
>> > per-broker. You can configure quotas based on the new configuration, set
>> > quota.secure.enable=true and restart the broker. Since there is no
>> > requirement for both insecure client-id based quotas and secure
>> user-based
>> > quotas to co-exist in a cluster, isn't that sufficient? The
>> implementation
>> > does use a unified approach, so if an alternative configuration can be
>> > defined (perhaps with some acceptable limitations?) which can express
>> both,
>> > it will be easy to implement. Suggestions welcome :-)
>> >
>> > The cases that the new configuration cannot express, but the old one can
>> > are:
>> >
>> >1. SSL/SASL with multiple users, same client ids used by multiple
>> users,
>> >client-id based quotas where quotas are shared between multiple users
>> >2. Default quotas for client-ids. In the new configuration, default
>> >quotas are defined for users and clients with no configured sub-quota
>> > share
>> >the user's quota.
>> >
>> >
>> >
>> > On Sat, Apr 30, 2016 at 6:21 AM, Ewen Cheslack-Postava <
>> e...@confluent.io>
>> > wrote:
>> >
>> > > Rajini,
>> > >
>> > > I'm admittedly not very familiar with a lot of this code or
>> > implementation,
>> > > so correct me if I'm making any incorrect assumptions.
>> > >
>> > > I've only scanned the KIP, but my main concern is the rejection of the
>> > > alternative -- unifying client-id and principal quotas. In particular,
>> > > doesn't this make an upgrade for brokers using those different
>> approaches
>> > > difficult since you have to make a hard break between client-id and
>> > > principal quotas? If people adopt client-id quotas to begin with, it
>> > seems
>> > > like we might not be providing a clean upgrade path.
>> > >
>> > > As I said, I haven't kept up to date with the details of the security
>> and
>> > > quota features, but I'd want to make sure we didn't suggest one path
>> with
>> > > 0.9, then add another that we can't provide a clean upgrade path to.
>> > >
>> > > -Ewen
>> > >
>> > > On Fri, Apr 22, 2016 at 7:22 AM, Rajini Sivaram <
>> > > rajinisiva...@googlemail.com> wrote:
>> > >
>> > > > The PR for KAFKA-3492 (https://github.com/apache/kafka/pull/1256)
>> > > contains
>> > > > the code associated with KIP-55. I will keep it updated during the
>> > review
>> > > > process.
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Rajini
>> > > >
>> > > > On Mon, Apr 18, 2016 at 4:41 PM, Rajini Sivaram <
>> > > > rajinisiva...@googlemail.com> wrote:
>> > > >
>> > > > > Hi All,
>> > > > >
>> > > > > I have just created KIP-55 to support quotas based on
>> authenticated
>> > > user
>> > > > > principals.
>> > > > >
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-55%3A+Secure+Quotas+for+Authenticated+Users
>> > > > >
>> > > > > 

Build failed in Jenkins: kafka-trunk-jdk8 #644

2016-05-24 Thread Apache Jenkins Server
See 

Changes:

[ismael] KAFKA-3747; Close `RecordBatch.records` when append to batch fails

--
Started by an SCM change
Started by an SCM change
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on ubuntu-2 (docker Ubuntu ubuntu yahoo-not-h2) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url 
 > https://git-wip-us.apache.org/repos/asf/kafka.git # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/kafka.git
 > git --version # timeout=10
 > git -c core.askpass=true fetch --tags --progress 
 > https://git-wip-us.apache.org/repos/asf/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
Checking out Revision fe27d8f787f38428e0add36edeac9d694f16af53 
(refs/remotes/origin/trunk)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f fe27d8f787f38428e0add36edeac9d694f16af53
 > git rev-list dee38806663b0062706dfaca40da9537792f05a9 # timeout=10
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK1_8_0_66_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_66
[kafka-trunk-jdk8] $ /bin/bash -xe /tmp/hudson6140242388804723532.sh
+ 
/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2/bin/gradle
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
http://gradle.org/docs/2.4-rc-2/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.6
:downloadWrapper

BUILD SUCCESSFUL

Total time: 10.795 secs
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK1_8_0_66_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_66
[kafka-trunk-jdk8] $ /bin/bash -xe /tmp/hudson50909447252358389.sh
+ export GRADLE_OPTS=-Xmx1024m
+ GRADLE_OPTS=-Xmx1024m
+ ./gradlew -Dorg.gradle.project.maxParallelForks=1 clean jarAll testAll
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
https://docs.gradle.org/2.13/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.6
Build file ': 
line 231
useAnt has been deprecated and is scheduled to be removed in Gradle 3.0. The 
Ant-Based Scala compiler is deprecated, please see 
https://docs.gradle.org/current/userguide/scala_plugin.html.
:clean UP-TO-DATE
:clients:clean
:connect:clean UP-TO-DATE
:core:clean
:examples:clean UP-TO-DATE
:log4j-appender:clean UP-TO-DATE
:streams:clean UP-TO-DATE
:tools:clean UP-TO-DATE
:connect:api:clean UP-TO-DATE
:connect:file:clean UP-TO-DATE
:connect:json:clean UP-TO-DATE
:connect:runtime:clean UP-TO-DATE
:streams:examples:clean UP-TO-DATE
:jar_core_2_10
Building project 'core' with Scala version 2.10.6
:kafka-trunk-jdk8:clients:compileJavawarning: [options] bootstrap class path 
not set in conjunction with -source 1.7
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
1 warning

:kafka-trunk-jdk8:clients:processResources UP-TO-DATE
:kafka-trunk-jdk8:clients:classes
:kafka-trunk-jdk8:clients:determineCommitId UP-TO-DATE
:kafka-trunk-jdk8:clients:createVersionFile
:kafka-trunk-jdk8:clients:jar
:kafka-trunk-jdk8:core:compileJava UP-TO-DATE
:kafka-trunk-jdk8:core:compileScalaJava HotSpot(TM) 64-Bit Server VM warning: 
ignoring option MaxPermSize=512m; support was removed in 8.0

:79:
 value DEFAULT_TIMESTAMP in object OffsetCommitRequest is deprecated: see 
corresponding Javadoc for more information.

org.apache.kafka.common.requests.OffsetCommitRequest.DEFAULT_TIMESTAMP
 ^
:36:
 value DEFAULT_TIMESTAMP in object OffsetCommitRequest is deprecated: see 
corresponding Javadoc for more information.
 commitTimestamp: Long = 
org.apache.kafka.common.requests.OffsetCommitRequest.DEFAULT_TIMESTAMP,

  ^
:37:
 value DEFAULT_TIMESTAMP in object OffsetCommitRequest is deprecated: see 
corresponding Javadoc for 

Build failed in Jenkins: kafka-0.10.0-jdk7 #105

2016-05-24 Thread Apache Jenkins Server
See 

Changes:

[ismael] KAFKA-3747; Close `RecordBatch.records` when append to batch fails

--
Started by an SCM change
Started by an SCM change
Started by an SCM change
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on ubuntu-5 (docker Ubuntu ubuntu5 ubuntu yahoo-not-h2) in 
workspace 
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url 
 > https://git-wip-us.apache.org/repos/asf/kafka.git # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/kafka.git
 > git --version # timeout=10
 > git -c core.askpass=true fetch --tags --progress 
 > https://git-wip-us.apache.org/repos/asf/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/0.10.0^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/0.10.0^{commit} # timeout=10
Checking out Revision d1e24000c8770d6c207dca265f4cdafe33690325 
(refs/remotes/origin/0.10.0)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f d1e24000c8770d6c207dca265f4cdafe33690325
 > git rev-list a86ae26fcb18d307d5d54f7061df613ce148fc33 # timeout=10
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51
[kafka-0.10.0-jdk7] $ /bin/bash -xe /tmp/hudson9006276355688133109.sh
+ 
/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2/bin/gradle
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
http://gradle.org/docs/2.4-rc-2/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.6
:downloadWrapper

BUILD SUCCESSFUL

Total time: 19.174 secs
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51
[kafka-0.10.0-jdk7] $ /bin/bash -xe /tmp/hudson2449681455978980752.sh
+ export GRADLE_OPTS=-Xmx1024m
+ GRADLE_OPTS=-Xmx1024m
+ ./gradlew -Dorg.gradle.project.maxParallelForks=1 clean jarAll testAll
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
https://docs.gradle.org/2.13/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.6
Build file ': 
line 231
useAnt has been deprecated and is scheduled to be removed in Gradle 3.0. The 
Ant-Based Scala compiler is deprecated, please see 
https://docs.gradle.org/current/userguide/scala_plugin.html.
:clean UP-TO-DATE
:clients:clean
:connect:clean UP-TO-DATE
:core:clean
:examples:clean UP-TO-DATE
:log4j-appender:clean UP-TO-DATE
:streams:clean UP-TO-DATE
:tools:clean UP-TO-DATE
:connect:api:clean UP-TO-DATE
:connect:file:clean UP-TO-DATE
:connect:json:clean UP-TO-DATE
:connect:runtime:clean UP-TO-DATE
:streams:examples:clean UP-TO-DATE
:jar_core_2_10
Building project 'core' with Scala version 2.10.6
:kafka-0.10.0-jdk7:clients:compileJavaNote: Some input files use unchecked or 
unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

:kafka-0.10.0-jdk7:clients:processResources UP-TO-DATE
:kafka-0.10.0-jdk7:clients:classes
:kafka-0.10.0-jdk7:clients:determineCommitId UP-TO-DATE
:kafka-0.10.0-jdk7:clients:createVersionFile
:kafka-0.10.0-jdk7:clients:jar
:kafka-0.10.0-jdk7:core:compileJava UP-TO-DATE
:kafka-0.10.0-jdk7:core:compileScala
:79:
 value DEFAULT_TIMESTAMP in object OffsetCommitRequest is deprecated: see 
corresponding Javadoc for more information.

org.apache.kafka.common.requests.OffsetCommitRequest.DEFAULT_TIMESTAMP
 ^
:36:
 value DEFAULT_TIMESTAMP in object OffsetCommitRequest is deprecated: see 
corresponding Javadoc for more information.
 commitTimestamp: Long = 
org.apache.kafka.common.requests.OffsetCommitRequest.DEFAULT_TIMESTAMP,

  ^
:37:
 value DEFAULT_TIMESTAMP in object OffsetCommitRequest is deprecated: see 
corresponding Javadoc for more information.
 expireTimestamp: Long = 
org.apache.kafka.common.requests.OffsetCommitRequest.DEFAULT_TIMESTAMP) {
 

[GitHub] kafka pull request: MINOR: Fix documentation table of contents and...

2016-05-24 Thread ijuma
GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/1423

MINOR: Fix documentation table of contents and `BLOCK_ON_BUFFER_FULL_DOC`



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka minor-doc-fixes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1423.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1423


commit 68a26649972e91520da228ac57455726cadd5f06
Author: Ismael Juma 
Date:   2016-05-24T11:30:24Z

Fix documentation table of contents and minor `BLOCK_ON_BUFFER_FULL_DOC` fix




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request: MINOR: Removed 1/2 of the hardcoded sleep time...

2016-05-24 Thread enothereska
GitHub user enothereska opened a pull request:

https://github.com/apache/kafka/pull/1422

MINOR: Removed 1/2 of the hardcoded sleep times



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/enothereska/kafka minor-integration-timeout2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1422.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1422


commit 012494a9cb7ac469c4ae7291f0a439ecac133904
Author: Eno Thereska 
Date:   2016-05-24T10:40:30Z

Removed 1/2 of the hardcoded sleep times




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3218) Kafka-0.9.0.0 does not work as OSGi module

2016-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298026#comment-15298026
 ] 

ASF GitHub Bot commented on KAFKA-3218:
---

Github user rajinisivaram closed the pull request at:

https://github.com/apache/kafka/pull/888


> Kafka-0.9.0.0 does not work as OSGi module
> --
>
> Key: KAFKA-3218
> URL: https://issues.apache.org/jira/browse/KAFKA-3218
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.0
> Environment: Apache Felix OSGi container
> jdk_1.8.0_60
>Reporter: Joe O'Connor
>Assignee: Rajini Sivaram
> Attachments: ContextClassLoaderBug.tar.gz
>
>
> KAFKA-2295 changed all Class.forName() calls to use 
> currentThread().getContextClassLoader() instead of the default "classloader 
> that loaded the current class". 
> OSGi loads each module's classes using a separate classloader so this is now 
> broken.
> Steps to reproduce: 
> # install the kafka-clients servicemix OSGi module 0.9.0.0_1
> # attempt to initialize the Kafka producer client from Java code 
> Expected results: 
> - call to "new KafkaProducer()" succeeds
> Actual results: 
> - "new KafkaProducer()" throws ConfigException:
> {quote}Suppressed: java.lang.Exception: Error starting bundle54: 
> Activator start error in bundle com.openet.testcase.ContextClassLoaderBug 
> [54].
> at 
> org.apache.karaf.bundle.command.BundlesCommand.doExecute(BundlesCommand.java:66)
> ... 12 more
> Caused by: org.osgi.framework.BundleException: Activator start error 
> in bundle com.openet.testcase.ContextClassLoaderBug [54].
> at 
> org.apache.felix.framework.Felix.activateBundle(Felix.java:2276)
> at 
> org.apache.felix.framework.Felix.startBundle(Felix.java:2144)
> at 
> org.apache.felix.framework.BundleImpl.start(BundleImpl.java:998)
> at 
> org.apache.karaf.bundle.command.Start.executeOnBundle(Start.java:38)
> at 
> org.apache.karaf.bundle.command.BundlesCommand.doExecute(BundlesCommand.java:64)
> ... 12 more
> Caused by: java.lang.ExceptionInInitializerError
> at 
> org.apache.kafka.clients.producer.KafkaProducer.(KafkaProducer.java:156)
> at com.openet.testcase.Activator.start(Activator.java:16)
> at 
> org.apache.felix.framework.util.SecureAction.startActivator(SecureAction.java:697)
> at 
> org.apache.felix.framework.Felix.activateBundle(Felix.java:2226)
> ... 16 more
> *Caused by: org.apache.kafka.common.config.ConfigException: Invalid 
> value org.apache.kafka.clients.producer.internals.DefaultPartitioner for 
> configuration partitioner.class: Class* 
> *org.apache.kafka.clients.producer.internals.DefaultPartitioner could not be 
> found.*
> at 
> org.apache.kafka.common.config.ConfigDef.parseType(ConfigDef.java:255)
> at 
> org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:78)
> at 
> org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:94)
> at 
> org.apache.kafka.clients.producer.ProducerConfig.(ProducerConfig.java:206)
> {quote}
> A workaround is to call "currentThread().setContextClassLoader(null)" before 
> initializing the Kafka producer.
> A possible fix is to catch ClassNotFoundException at ConfigDef.java:247 and 
> retry the Class.forName() call with the default classloader. However, with 
> this fix there is still a problem at AbstractConfig.java:206, where the 
> newInstance() call succeeds but "instanceof" is false because the classes 
> were loaded by different classloaders.
> Testcase attached, see README.txt for instructions.
> See also SM-2743



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
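The fix proposed in the description above (fall back to the default classloader when the context classloader cannot resolve the class) can be sketched as follows. This is an illustration only, with assumed method names, not Kafka's actual API:

```java
/**
 * Hedged sketch of the fix discussed for KAFKA-3218 (method names are
 * illustrative, not Kafka's actual API): try the thread context classloader
 * first, and fall back to the classloader that defined this class when the
 * context classloader cannot see the requested class, as happens under
 * OSGi where each bundle has its own classloader.
 */
public class ClassLoaderFallback {
    public static Class<?> loadClass(String name) throws ClassNotFoundException {
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        if (context != null) {
            try {
                return Class.forName(name, true, context);
            } catch (ClassNotFoundException e) {
                // Fall through to the defining classloader below.
            }
        }
        return Class.forName(name, true, ClassLoaderFallback.class.getClassLoader());
    }

    public static void main(String[] args) throws Exception {
        // Even with a null context classloader (the state the reported
        // workaround puts the thread in), resolution still succeeds
        // via the defining-classloader fallback.
        Thread.currentThread().setContextClassLoader(null);
        System.out.println(loadClass("java.util.ArrayList").getName());
        // prints java.util.ArrayList
    }
}
```

Note that, as the description points out, this fallback alone does not solve the second problem: an "instanceof" check can still fail when the same class is loaded by two different classloaders.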


[GitHub] kafka pull request: KAFKA-3218: Use static classloading for defaul...

2016-05-24 Thread rajinisivaram
Github user rajinisivaram closed the pull request at:

https://github.com/apache/kafka/pull/888


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-55: Secure quotas for authenticated users

2016-05-24 Thread Rajini Sivaram
Jun,

Thank you for the review. I agree that a simple user principal based quota
is sufficient to allocate broker resources fairly in a multi-user system.
The hierarchical quotas proposed in the KIP currently enable clients of a
user to be rate-limited as well. This allows a user to run multiple clients
that don't interfere with each other's quotas. Since there is no clear
requirement to support this at the moment, I am happy to limit the scope of
the KIP to a single-level user-based quota. Will update the KIP today.

Regards,

Rajini

On Mon, May 23, 2016 at 5:24 PM, Jun Rao  wrote:

> Rajini,
>
> Thanks for the KIP. When we first added quota support, the intention was
> to be able to add a quota per application. Since we didn't have security
> at that time, we essentially simulated users with client-ids. Now that we
> do have security, it seems that we just need a way to set quotas at the
> user level. Setting quotas on the combination of user and client-id seems
> more complicated, and I am not sure there is a good use case for it.
>
> Also, the new config quota.secure.enable seems a bit weird. Would it be
> better to add a new config, quota.type? It would default to clientId for
> backward compatibility. If one sets it to user, then the default broker
> level quota is for users w/o a customized quota, and brokers will only
> take quotas set at the user level (i.e., quotas set at the clientId level
> will be ignored).
>
> Thanks,
>
> Jun
>
> On Tue, May 3, 2016 at 4:32 AM, Rajini Sivaram <
> rajinisiva...@googlemail.com
> > wrote:
>
> > Ewen,
> >
> > Thank you for the review. I agree that ideally we would have one
> definition
> > of quotas that handles all cases. But I couldn't quite fit all the
> > combinations that are possible today with client-id-based quotas into the
> > new configuration. I think upgrade path is not bad since quotas are
> > per-broker. You can configure quotas based on the new configuration, set
> > quota.secure.enable=true and restart the broker. Since there is no
> > requirement for both insecure client-id based quotas and secure
> user-based
> > quotas to co-exist in a cluster, isn't that sufficient? The
> implementation
> > does use a unified approach, so if an alternative configuration can be
> > defined (perhaps with some acceptable limitations?) which can express
> both,
> > it will be easy to implement. Suggestions welcome :-)
> >
> > The cases that the new configuration cannot express, but the old one can
> > are:
> >
> >1. SSL/SASL with multiple users, same client ids used by multiple
> users,
> >client-id based quotas where quotas are shared between multiple users
> >2. Default quotas for client-ids. In the new configuration, default
> >quotas are defined for users and clients with no configured sub-quota
> > share
> >the user's quota.
> >
> >
> >
> > On Sat, Apr 30, 2016 at 6:21 AM, Ewen Cheslack-Postava <
> e...@confluent.io>
> > wrote:
> >
> > > Rajini,
> > >
> > > I'm admittedly not very familiar with a lot of this code or
> > implementation,
> > > so correct me if I'm making any incorrect assumptions.
> > >
> > > I've only scanned the KIP, but my main concern is the rejection of the
> > > alternative -- unifying client-id and principal quotas. In particular,
> > > doesn't this make an upgrade for brokers using those different
> approaches
> > > difficult since you have to make a hard break between client-id and
> > > principal quotas? If people adopt client-id quotas to begin with, it
> > seems
> > > like we might not be providing a clean upgrade path.
> > >
> > > As I said, I haven't kept up to date with the details of the security
> and
> > > quota features, but I'd want to make sure we didn't suggest one path
> with
> > > 0.9, then add another that we can't provide a clean upgrade path to.
> > >
> > > -Ewen
> > >
> > > On Fri, Apr 22, 2016 at 7:22 AM, Rajini Sivaram <
> > > rajinisiva...@googlemail.com> wrote:
> > >
> > > > The PR for KAFKA-3492 (https://github.com/apache/kafka/pull/1256)
> > > contains
> > > > the code associated with KIP-55. I will keep it updated during the
> > review
> > > > process.
> > > >
> > > > Thanks,
> > > >
> > > > Rajini
> > > >
> > > > On Mon, Apr 18, 2016 at 4:41 PM, Rajini Sivaram <
> > > > rajinisiva...@googlemail.com> wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I have just created KIP-55 to support quotas based on authenticated
> > > user
> > > > > principals.
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-55%3A+Secure+Quotas+for+Authenticated+Users
> > > > >
> > > > > Comments and feedback are appreciated.
> > > > >
> > > > > Thank you...
> > > > >
> > > > > Regards,
> > > > >
> > > > > Rajini
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Regards,
> > > >
> > > > Rajini
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks,
> > > Ewen
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > Rajini
> >
>
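The quota.type switch suggested in the quoted email could resolve a request's quota key along these lines. This is a hedged sketch under assumed names (QuotaKeyResolver and its methods are not the actual Kafka API):

```java
/**
 * Illustrative sketch of the quota.type config discussed in the thread
 * (class and method names are assumptions, not Kafka's actual API).
 * With quota.type=user, quotas set at the clientId level are ignored.
 */
public class QuotaKeyResolver {
    enum QuotaType { CLIENT_ID, USER }

    private final QuotaType type;

    QuotaKeyResolver(QuotaType type) {
        this.type = type;
    }

    // Pick the key used to look up a quota for an incoming request.
    String quotaKey(String userPrincipal, String clientId) {
        return type == QuotaType.USER
                ? "user:" + userPrincipal   // clientId ignored entirely
                : "client-id:" + clientId;  // legacy default behaviour
    }

    public static void main(String[] args) {
        QuotaKeyResolver legacy = new QuotaKeyResolver(QuotaType.CLIENT_ID);
        QuotaKeyResolver secure = new QuotaKeyResolver(QuotaType.USER);
        System.out.println(legacy.quotaKey("alice", "app-1")); // prints client-id:app-1
        System.out.println(secure.quotaKey("alice", "app-1")); // prints user:alice
    }
}
```

Defaulting to CLIENT_ID preserves backward compatibility, which is the core of Jun's suggestion: a cluster never mixes the two key spaces, so the upgrade is a config change plus a broker restart.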



-- 

Reporting security issues

2016-05-24 Thread Ismael Juma
Hi all,

Since Kafka implements a number of security features, we need a procedure
for reporting potential security vulnerabilities privately (as per
http://www.apache.org/security/). We have added a simple page to the
website that describes the procedure (thanks Flavio):

http://kafka.apache.org/project-security.html

See https://issues.apache.org/jira/browse/KAFKA-3709 for more background.

If you have suggestions on how the page could be improved, pull requests
are welcome. :)

Ismael


[jira] [Resolved] (KAFKA-3709) Create project security page

2016-05-24 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-3709.

   Resolution: Fixed
Fix Version/s: 0.10.0.0

> Create project security page
> 
>
> Key: KAFKA-3709
> URL: https://issues.apache.org/jira/browse/KAFKA-3709
> Project: Kafka
>  Issue Type: Improvement
>  Components: website
>Reporter: Flavio Junqueira
>Assignee: Flavio Junqueira
> Fix For: 0.10.0.0
>
>
> We are creating a security@k.a.o mailing list to receive reports of potential 
> vulnerabilities. Now that Kafka has security in place, the community might 
> start receiving vulnerability reports, and we need to follow the guidelines 
> here:
> http://www.apache.org/security/
> Specifically, security issues are better handled in a project-specific list. 
> This jira is to create a web page that informs users and contributors of how 
> we are supposed to handle security issues. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3709) Create project security page

2016-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297978#comment-15297978
 ] 

ASF GitHub Bot commented on KAFKA-3709:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka-site/pull/12


> Create project security page
> 
>
> Key: KAFKA-3709
> URL: https://issues.apache.org/jira/browse/KAFKA-3709
> Project: Kafka
>  Issue Type: Improvement
>  Components: website
>Reporter: Flavio Junqueira
>Assignee: Flavio Junqueira
>
> We are creating a security@k.a.o mailing list to receive reports of potential 
> vulnerabilities. Now that Kafka has security in place, the community might 
> start receiving vulnerability reports, and we need to follow the guidelines 
> here:
> http://www.apache.org/security/
> Specifically, security issues are better handled in a project-specific list. 
> This jira is to create a web page that informs users and contributors of how 
> we are supposed to handle security issues. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka-site pull request: KAFKA-3709: Create a project security pag...

2016-05-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka-site/pull/12


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3744) Message format needs to identify serializer

2016-05-24 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297898#comment-15297898
 ] 

Ismael Juma commented on KAFKA-3744:


Hi [~davek22]. A change to the message format would require a KIP:

https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals

You may also choose to email the mailing list before doing the KIP to get 
feedback from a wider group. There are other ways of achieving something like 
this (e.g. https://github.com/confluentinc/schema-registry) with different 
trade-offs.

> Message format needs to identify serializer
> ---
>
> Key: KAFKA-3744
> URL: https://issues.apache.org/jira/browse/KAFKA-3744
> Project: Kafka
>  Issue Type: Improvement
>Reporter: David Kay
>Priority: Minor
>
> https://issues.apache.org/jira/browse/KAFKA-3698 was recently resolved with 
> https://github.com/apache/kafka/commit/27a19b964af35390d78e1b3b50bc03d23327f4d0.
> But Kafka documentation on message formats needs to be more explicit for new 
> users. Section 1.3 Step 4 says: "Send some messages" and takes lines of text 
> from the command line. Beginner's guide 
> (http://www.slideshare.net/miguno/apache-kafka-08-basic-training-verisign 
> Slide 104 says:
> {noformat}
>Kafka does not care about data format of msg payload
>Up to developer to handle serialization/deserialization
>   Common choices: Avro, JSON
> {noformat}
> If one producer sends lines of console text, another producer sends Avro, a 
> third producer sends JSON, and a fourth sends CBOR, how does the consumer 
> identify which deserializer to use for the payload?  The commit includes an 
> opaque K byte Key that could potentially include a codec identifier, but 
> provides no guidance on how to use it:
> {quote}
> "Leaving the key and value opaque is the right decision: there is a great 
> deal of progress being made on serialization libraries right now, and any 
> particular choice is unlikely to be right for all uses. Needless to say a 
> particular application using Kafka would likely mandate a particular 
> serialization type as part of its usage."
> {quote}
> Mandating any particular serialization is as unrealistic as mandating a 
> single mime-type for all web content.  There must be a way to signal the 
> serialization used to produce this message's V byte payload, and documenting 
> the existence of even a rudimentary codec registry with a few values (text, 
> Avro, JSON, CBOR) would establish the pattern to be used for future 
> serialization libraries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
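The codec-identifier idea from the KAFKA-3744 description can be sketched as a thin envelope over the opaque value bytes. This is illustrative only and not part of Kafka's message format; the one-byte prefix is the same pattern used by wire formats such as Confluent's schema registry (which uses a magic byte plus a schema id):

```java
import java.nio.ByteBuffer;

/**
 * Illustrative sketch only (not part of Kafka's message format): prefix the
 * opaque value with a one-byte codec id so consumers can pick a
 * deserializer. Codec ids here are hypothetical registry entries.
 */
public class PayloadEnvelope {
    static final byte CODEC_TEXT = 0;
    static final byte CODEC_JSON = 1;
    static final byte CODEC_AVRO = 2;

    // Producer side: prepend the codec id to the serialized payload.
    static byte[] wrap(byte codecId, byte[] payload) {
        return ByteBuffer.allocate(1 + payload.length)
                .put(codecId)
                .put(payload)
                .array();
    }

    // Consumer side: read the first byte to select a deserializer.
    static byte codecId(byte[] message) {
        return message[0];
    }

    public static void main(String[] args) {
        byte[] msg = wrap(CODEC_AVRO, new byte[] {10, 20});
        System.out.println(codecId(msg)); // prints 2
    }
}
```

This keeps the broker's view of keys and values fully opaque, which is consistent with the design quote in the description; only producers and consumers need to agree on the registry of codec ids.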


[jira] [Updated] (KAFKA-3747) Close `RecordBatch.records` when append to batch fails

2016-05-24 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3747:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Issue resolved by pull request 1418
[https://github.com/apache/kafka/pull/1418]

> Close `RecordBatch.records` when append to batch fails
> --
>
> Key: KAFKA-3747
> URL: https://issues.apache.org/jira/browse/KAFKA-3747
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
>Assignee: Ismael Juma
> Fix For: 0.10.0.1
>
>
> We should close the existing `RecordBatch.records` when we create a new 
> `RecordBatch` for the `TopicPartition`.
> This would mean that we would only retain temporary resources like 
> compression stream buffers for one `RecordBatch` per partition, which can 
> have a significant impact when producers are dealing with slow brokers, see 
> KAFKA-3704 for more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
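The change described in KAFKA-3747 can be sketched in simplified form. The names below are illustrative, not the real RecordAccumulator: the point is that when an append does not fit and a new batch is created, the previous batch's records stream is closed immediately so its compression buffers are released, rather than lingering until the batch is sent:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Simplified sketch of the idea behind KAFKA-3747 (illustrative names,
 * not Kafka's real accumulator): close the previous batch's stream as
 * soon as a new batch is created for the partition, so only one batch
 * per partition holds temporary compression buffers.
 */
public class BatchSketch {
    static class Batch {
        boolean streamClosed = false;
        private int records = 0;

        // Pretend every batch fills up after one record.
        boolean tryAppend(byte[] record) {
            if (records >= 1) {
                return false;
            }
            records++;
            return true;
        }

        // Stand-in for closing RecordBatch.records (frees buffers).
        void closeStream() {
            streamClosed = true;
        }
    }

    final Deque<Batch> batches = new ArrayDeque<>();

    void append(byte[] record) {
        Batch last = batches.peekLast();
        if (last != null && last.tryAppend(record)) {
            return;
        }
        if (last != null) {
            last.closeStream(); // the fix: release the full batch's stream now
        }
        Batch fresh = new Batch();
        fresh.tryAppend(record); // sketch assumes a fresh batch always fits
        batches.addLast(fresh);
    }

    public static void main(String[] args) {
        BatchSketch acc = new BatchSketch();
        acc.append(new byte[8]);
        acc.append(new byte[8]); // forces a new batch; the old one is closed
        System.out.println(acc.batches.peekFirst().streamClosed); // prints true
    }
}
```

With slow brokers many full batches can queue up per partition, so releasing each batch's buffers at roll-over time rather than at send time bounds the retained temporary memory, as the JIRA description notes.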


[jira] [Commented] (KAFKA-3747) Close `RecordBatch.records` when append to batch fails

2016-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297877#comment-15297877
 ] 

ASF GitHub Bot commented on KAFKA-3747:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1418


> Close `RecordBatch.records` when append to batch fails
> --
>
> Key: KAFKA-3747
> URL: https://issues.apache.org/jira/browse/KAFKA-3747
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
>Assignee: Ismael Juma
> Fix For: 0.10.0.1
>
>
> We should close the existing `RecordBatch.records` when we create a new 
> `RecordBatch` for the `TopicPartition`.
> This would mean that we would only retain temporary resources like 
> compression stream buffers for one `RecordBatch` per partition, which can 
> have a significant impact when producers are dealing with slow brokers, see 
> KAFKA-3704 for more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

