[jira] [Assigned] (KAFKA-3123) Follower Broker cannot start if offsets are already out of range

2016-08-09 Thread Soumyajit Sahu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Soumyajit Sahu reassigned KAFKA-3123:
-

Assignee: Soumyajit Sahu  (was: Neha Narkhede)

> Follower Broker cannot start if offsets are already out of range
> 
>
> Key: KAFKA-3123
> URL: https://issues.apache.org/jira/browse/KAFKA-3123
> Project: Kafka
>  Issue Type: Bug
>  Components: core, replication
>Affects Versions: 0.9.0.0
>Reporter: Soumyajit Sahu
>Assignee: Soumyajit Sahu
>Priority: Critical
>  Labels: patch
> Fix For: 0.10.0.2
>
> Attachments: 
> 0001-Fix-Follower-crashes-when-offset-out-of-range-during.patch
>
>
> I was trying to upgrade our test Windows cluster from 0.8.1.1 to 0.9.0 one 
> machine at a time. Our logs have just 2 hours of retention. I had re-imaged 
> the test machine under consideration, and got the following error in a loop 
> after starting afresh with the 0.9.0 broker:
> [2016-01-19 13:57:28,809] WARN [ReplicaFetcherThread-1-169595708], Replica 
> 15588 for partition [EventLogs4,1] reset its fetch offset from 0 to 
> current leader 169595708's start offset 334086 
> (kafka.server.ReplicaFetcherThread)
> [2016-01-19 13:57:28,809] ERROR [ReplicaFetcherThread-1-169595708], Error 
> getting offset for partition [EventLogs4,1] to broker 169595708 
> (kafka.server.ReplicaFetcherThread)
> java.lang.IllegalStateException: Compaction for partition [EXO_EventLogs4,1] 
> cannot be aborted and paused since it is in LogCleaningPaused state.
>   at 
> kafka.log.LogCleanerManager$$anonfun$abortAndPauseCleaning$1.apply$mcV$sp(LogCleanerManager.scala:149)
>   at 
> kafka.log.LogCleanerManager$$anonfun$abortAndPauseCleaning$1.apply(LogCleanerManager.scala:140)
>   at 
> kafka.log.LogCleanerManager$$anonfun$abortAndPauseCleaning$1.apply(LogCleanerManager.scala:140)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>   at 
> kafka.log.LogCleanerManager.abortAndPauseCleaning(LogCleanerManager.scala:140)
>   at kafka.log.LogCleaner.abortAndPauseCleaning(LogCleaner.scala:141)
>   at kafka.log.LogManager.truncateFullyAndStartAt(LogManager.scala:304)
>   at 
> kafka.server.ReplicaFetcherThread.handleOffsetOutOfRange(ReplicaFetcherThread.scala:185)
>   at 
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:152)
>   at 
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:122)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:122)
>   at 
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:120)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>   at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
>   at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
>   at 
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:120)
>   at 
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:120)
>   at 
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:120)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>   at 
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
>   at 
> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:93)
>   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> I could unblock myself with a code change. I deleted the action for the 
> "case s =>" branch in LogCleanerManager.scala's abortAndPauseCleaning(). I 
> think this function should not throw an exception if the state is already 
> LogCleaningAborted or LogCleaningPaused, but should instead just proceed.
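
The proposed change can be modeled as a small state machine. The following is an illustrative, self-contained Java sketch, not Kafka's actual Scala code: the state names and the method name mirror kafka.log.LogCleanerManager, but the class itself is hypothetical.

```java
// Hypothetical model of the proposed fix: abortAndPauseCleaning() becomes
// idempotent instead of throwing when the partition is already paused/aborted.
import java.util.HashMap;
import java.util.Map;

public class CleanerStateSketch {
    enum State { NONE, IN_PROGRESS, ABORTED, PAUSED }

    private final Map<String, State> partitionStates = new HashMap<>();

    void startCleaning(String tp) {
        partitionStates.put(tp, State.IN_PROGRESS);
    }

    // Before the fix, calling this while the partition was already ABORTED or
    // PAUSED threw IllegalStateException and crashed the fetcher thread.
    void abortAndPauseCleaning(String tp) {
        State s = partitionStates.getOrDefault(tp, State.NONE);
        switch (s) {
            case NONE:
                partitionStates.put(tp, State.PAUSED);
                break;
            case IN_PROGRESS:
                partitionStates.put(tp, State.ABORTED); // cleaner moves it to PAUSED later
                break;
            case ABORTED:
            case PAUSED:
                // no-op: the partition is already being aborted/paused
                break;
        }
    }

    State state(String tp) {
        return partitionStates.getOrDefault(tp, State.NONE);
    }
}
```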



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4027) Leader for a certain partition unavailable forever

2016-08-09 Thread tu nguyen khac (JIRA)
tu nguyen khac created KAFKA-4027:
-

 Summary: Leader for a certain partition unavailable forever
 Key: KAFKA-4027
 URL: https://issues.apache.org/jira/browse/KAFKA-4027
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.10.0.0
Reporter: tu nguyen khac


I have a cluster of 9 brokers, which I have named 0 through 8.
Yesterday some of the servers went down (hard reset). I restarted the downed
servers, but after that some topics could not assign a leader.
I checked the server log and retrieved this logging:

kafka.common.NotAssignedReplicaException: Leader 1 failed to record follower 
6's position -1 since the replica is not recognized to be one of the assigned 
replicas 1 for partition [tos_htv3tv.com,31].
at 
kafka.cluster.Partition.updateReplicaLogReadResult(Partition.scala:251)
at 
kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:864)
at 
kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:861)
at 
scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:221)
at 
scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
at 
kafka.server.ReplicaManager.updateFollowerLogReadResults(ReplicaManager.scala:861)
at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:470)
at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:496)
at kafka.server.KafkaApis.handle(KafkaApis.scala:77)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
at java.lang.Thread.run(Thread.java:745)
I tried to run the Preferred Leader election tool, but it didn't work (some
partitions have no leader) :(






Re: [VOTE] KIP-70: Revise Partition Assignment Semantics on New Consumer's Subscription Change

2016-08-09 Thread Ewen Cheslack-Postava
+1 (binding), thanks for working on this Vahid.

@Dana - See https://cwiki.apache.org/confluence/display/KAFKA/Bylaws re:
binding/non-binding, although I now notice that we specify criteria (lazy
majority) on the KIP overview
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#KafkaImprovementProposals-Process
but don't seem to specify whose votes are binding -- we've used active
committers as binding votes for KIPs.

-Ewen

On Tue, Aug 9, 2016 at 11:25 AM, Guozhang Wang  wrote:

> +1.
>
> On Tue, Aug 9, 2016 at 10:06 AM, Jun Rao  wrote:
>
> > Vahid,
> >
> > Thanks for the clear explanation in the KIP. +1
> >
> > Jun
> >
> > On Mon, Aug 8, 2016 at 11:53 AM, Vahid S Hashemian <
> > vahidhashem...@us.ibm.com> wrote:
> >
> > > I would like to initiate the voting process for KIP-70 (
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 70%3A+Revise+Partition+Assignment+Semantics+on+New+
> > > Consumer%27s+Subscription+Change
> > > ).
> > >
> > > The only issue that was discussed in the discussion thread is
> > > compatibility, but because it applies to an edge case, it is not
> expected
> > > to impact existing users.
> > > The proposal was shared with Spark and Storm users and no issue was
> > raised
> > > by those communities.
> > >
> > > Thanks.
> > >
> > > Regards,
> > > --Vahid
> > >
> > >
> >
>
>
>
> --
> -- Guozhang
>



-- 
Thanks,
Ewen


Re: Consumer Offset Migration Tool

2016-08-09 Thread Grant Henke
Hi Jun,

Exactly what Gwen said. I am assuming you shut down the old consumers, migrate
the offsets, and then start the new consumers. This is best for cases where you
are migrating from the old clients to the new clients without ever using dual
commit in the old client. Because the new clients can't coordinate with the old
ones, an outage is required regardless.

Thanks,
Grant

On Tue, Aug 9, 2016 at 8:19 PM, Gwen Shapira  wrote:

> Jun,
>
> Grant's use-case is about migrating from old-consumer-committing-to-ZK
> to new-consumer-committing-to-Kafka (which is what happens if you
> upgrade Flume, and maybe other stream processing systems too). This
> seems to require shutting down all instances in any case.
>
> Gwen
>
> On Tue, Aug 9, 2016 at 6:05 PM, Jun Rao  wrote:
> > Hi, Grant,
> >
> > For your tool to work, do you expect all consumer instances in the
> consumer
> > group to be stopped before copying the offsets? Some applications may not
> > want to do that.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Tue, Aug 9, 2016 at 10:01 AM, Grant Henke 
> wrote:
> >
> >> I had to write a simple offset migration tool and I wanted to get
> feedback
> >> on whether or not this would be a useful addition to Apache Kafka.
> >>
> >> Currently the path to upgrade from the zookeeper offsets to the Kafka
> >> offset (and often the Scala to Java client) is via dual commit. The
> process
> >> is documented here:
> >> http://kafka.apache.org/documentation.html#offsetmigration
> >>
> >> The reason that process wasn't sufficient in my case is that:
> >>
> >>- It needs to be done ahead of the upgrade
> >>- It requires the old client to commit at least once in dual commit
> mode
> >>- Some frameworks don't expose the dual commit functionality well
> >>- Dual commit is not supported in 0.8.1.x
> >>
> >> The tool I wrote takes the relevant connection information and a
> consumer
> >> group and simply copies the Zookeeper offsets into the Kafka offsets for
> >> that group.
> >> A rough WIP PR can be seen here: https://github.com/apache/
> kafka/pull/1715
> >>
> >> Even though many users have already made the transition, I think this
> could
> >> still be useful in Kafka. Here are a few reasons:
> >>
> >>- It simplifies the migration for users who have yet to migrate,
> >>especially as the old clients get deprecated and removed
> >>- Though the tool is not available in the Kafka 0.8.x or 0.9.x
> series,
> >>downloading and using the jar from maven would be fairly
> straightforward
> >>   - Alternatively this could be a separate repo or jar, though I
> hardly
> >>   want to push this single tool to maven as a standalone artifact.
> >>
> >> Do you think this is useful in Apache Kafka? Any thoughts on the
> approach?
> >>
> >> Thanks,
> >> Grant
> >> --
> >> Grant Henke
> >> Software Engineer | Cloudera
> >> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
> >>
>
>
>
> --
> Gwen Shapira
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: Consumer Offset Migration Tool

2016-08-09 Thread Gwen Shapira
Jun,

Grant's use-case is about migrating from old-consumer-committing-to-ZK
to new-consumer-committing-to-Kafka (which is what happens if you
upgrade Flume, and maybe other stream processing systems too). This
seems to require shutting down all instances in any case.

Gwen

On Tue, Aug 9, 2016 at 6:05 PM, Jun Rao  wrote:
> Hi, Grant,
>
> For your tool to work, do you expect all consumer instances in the consumer
> group to be stopped before copying the offsets? Some applications may not
> want to do that.
>
> Thanks,
>
> Jun
>
>
> On Tue, Aug 9, 2016 at 10:01 AM, Grant Henke  wrote:
>
>> I had to write a simple offset migration tool and I wanted to get feedback
>> on whether or not this would be a useful addition to Apache Kafka.
>>
>> Currently the path to upgrade from the zookeeper offsets to the Kafka
>> offset (and often the Scala to Java client) is via dual commit. The process
>> is documented here:
>> http://kafka.apache.org/documentation.html#offsetmigration
>>
>> The reason that process wasn't sufficient in my case is that:
>>
>>- It needs to be done ahead of the upgrade
>>- It requires the old client to commit at least once in dual commit mode
>>- Some frameworks don't expose the dual commit functionality well
>>- Dual commit is not supported in 0.8.1.x
>>
>> The tool I wrote takes the relevant connection information and a consumer
>> group and simply copies the Zookeeper offsets into the Kafka offsets for
>> that group.
>> A rough WIP PR can be seen here: https://github.com/apache/kafka/pull/1715
>>
>> Even though many users have already made the transition, I think this could
>> still be useful in Kafka. Here are a few reasons:
>>
>>- It simplifies the migration for users who have yet to migrate,
>>especially as the old clients get deprecated and removed
>>- Though the tool is not available in the Kafka 0.8.x or 0.9.x series,
>>downloading and using the jar from maven would be fairly straightforward
>>   - Alternatively this could be a separate repo or jar, though I hardly
>>   want to push this single tool to maven as a standalone artifact.
>>
>> Do you think this is useful in Apache Kafka? Any thoughts on the approach?
>>
>> Thanks,
>> Grant
>> --
>> Grant Henke
>> Software Engineer | Cloudera
>> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>>



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog


Re: Consumer Offset Migration Tool

2016-08-09 Thread Jun Rao
Hi, Grant,

For your tool to work, do you expect all consumer instances in the consumer
group to be stopped before copying the offsets? Some applications may not
want to do that.

Thanks,

Jun


On Tue, Aug 9, 2016 at 10:01 AM, Grant Henke  wrote:

> I had to write a simple offset migration tool and I wanted to get feedback
> on whether or not this would be a useful addition to Apache Kafka.
>
> Currently the path to upgrade from the zookeeper offsets to the Kafka
> offset (and often the Scala to Java client) is via dual commit. The process
> is documented here:
> http://kafka.apache.org/documentation.html#offsetmigration
>
> The reason that process wasn't sufficient in my case is that:
>
>- It needs to be done ahead of the upgrade
>- It requires the old client to commit at least once in dual commit mode
>- Some frameworks don't expose the dual commit functionality well
>- Dual commit is not supported in 0.8.1.x
>
> The tool I wrote takes the relevant connection information and a consumer
> group and simply copies the Zookeeper offsets into the Kafka offsets for
> that group.
> A rough WIP PR can be seen here: https://github.com/apache/kafka/pull/1715
>
> Even though many users have already made the transition, I think this could
> still be useful in Kafka. Here are a few reasons:
>
>- It simplifies the migration for users who have yet to migrate,
>especially as the old clients get deprecated and removed
>- Though the tool is not available in the Kafka 0.8.x or 0.9.x series,
>downloading and using the jar from maven would be fairly straightforward
>   - Alternatively this could be a separate repo or jar, though I hardly
>   want to push this single tool to maven as a standalone artifact.
>
> Do you think this is useful in Apache Kafka? Any thoughts on the approach?
>
> Thanks,
> Grant
> --
> Grant Henke
> Software Engineer | Cloudera
> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>
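
The copy step the thread discusses is conceptually simple: read each group's offsets from the old consumer's ZooKeeper layout (/consumers/&lt;group&gt;/offsets/&lt;topic&gt;/&lt;partition&gt;, with the offset stored as node data) and commit them through the new consumer. A minimal sketch of the path-parsing half follows; the live ZooKeeper reads and the final KafkaConsumer.commitSync() call are stubbed out as an in-memory map, and the helper names are assumptions, not the actual tool's API.

```java
// Illustrative sketch only: builds a (topic-partition -> offset) commit map
// for one group from ZooKeeper-style offset nodes. In the real tool the
// entries would come from a live ZooKeeper ensemble and the result would be
// committed via the new consumer's commitSync().
import java.util.LinkedHashMap;
import java.util.Map;

public class OffsetMigrationSketch {
    static Map<String, Long> buildCommitMap(String group, Map<String, String> zkNodes) {
        String prefix = "/consumers/" + group + "/offsets/";
        Map<String, Long> commit = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : zkNodes.entrySet()) {
            if (!e.getKey().startsWith(prefix)) continue;   // skip other groups
            String[] parts = e.getKey().substring(prefix.length()).split("/");
            if (parts.length != 2) continue;                // expect <topic>/<partition>
            commit.put(parts[0] + "-" + parts[1], Long.parseLong(e.getValue()));
        }
        return commit;
    }
}
```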


[jira] [Commented] (KAFKA-3123) Follower Broker cannot start if offsets are already out of range

2016-08-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414528#comment-15414528
 ] 

ASF GitHub Bot commented on KAFKA-3123:
---

GitHub user soumyajit-sahu opened a pull request:

https://github.com/apache/kafka/pull/1716

KAFKA-3123: Make abortAndPauseCleaning() a no-op when state is already 
LogCleaningPaused

The function of LogCleanerManager.abortAndPauseCleaning() is to mark a 
TopicAndPartition as LogCleaningPaused. Hence, it should be a no-op when the 
TopicAndPartition is already in that state.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Microsoft/kafka 
ignoreLogCleaningPauseRequestWhenAlreadyInPausedState

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1716.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1716


commit 3ad294377eee6e6ad3fdb47b7220dcb0913f1fc9
Author: Som Sahu 
Date:   2016-08-10T00:45:55Z

Make abortAndPauseCleaning a no-op when state is already LogCleaningPaused





[GitHub] kafka pull request #1716: KAFKA-3123: Make abortAndPauseCleaning() a no-op w...

2016-08-09 Thread soumyajit-sahu
GitHub user soumyajit-sahu opened a pull request:

https://github.com/apache/kafka/pull/1716

KAFKA-3123: Make abortAndPauseCleaning() a no-op when state is already 
LogCleaningPaused

The function of LogCleanerManager.abortAndPauseCleaning() is to mark a 
TopicAndPartition as LogCleaningPaused. Hence, it should be a no-op when the 
TopicAndPartition is already in that state.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Microsoft/kafka 
ignoreLogCleaningPauseRequestWhenAlreadyInPausedState

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1716.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1716


commit 3ad294377eee6e6ad3fdb47b7220dcb0913f1fc9
Author: Som Sahu 
Date:   2016-08-10T00:45:55Z

Make abortAndPauseCleaning a no-op when state is already LogCleaningPaused




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3123) Follower Broker cannot start if offsets are already out of range

2016-08-09 Thread Soumyajit Sahu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414500#comment-15414500
 ] 

Soumyajit Sahu commented on KAFKA-3123:
---

I hit another (similar) exception while trying out Kafka 0.10.0.1.

I think abortAndPauseCleaning() should be a no-op if the state is already 
LogCleaningPaused. I will create a PR.

java.lang.IllegalStateException: Compaction for partition [simplestress_5,7] 
cannot be aborted and paused since it is in LogCleaningPaused state.
at 
kafka.log.LogCleanerManager$$anonfun$abortAndPauseCleaning$1.apply$mcV$sp(LogCleanerManager.scala:149)
at 
kafka.log.LogCleanerManager$$anonfun$abortAndPauseCleaning$1.apply(LogCleanerManager.scala:140)
at 
kafka.log.LogCleanerManager$$anonfun$abortAndPauseCleaning$1.apply(LogCleanerManager.scala:140)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
at 
kafka.log.LogCleanerManager.abortAndPauseCleaning(LogCleanerManager.scala:140)
at kafka.log.LogCleaner.abortAndPauseCleaning(LogCleaner.scala:148)
at kafka.log.LogManager.truncateFullyAndStartAt(LogManager.scala:307)
at 
kafka.server.ReplicaFetcherThread.handleOffsetOutOfRange(ReplicaFetcherThread.scala:218)
at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:157)
at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:127)
at scala.Option.foreach(Option.scala:257)
at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:127)
at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:125)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at 
scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:125)
at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:125)
at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:125)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
at 
kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:123)
at 
kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:98)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)


[jira] [Closed] (KAFKA-3448) Support zone index in IPv6 regex

2016-08-09 Thread Soumyajit Sahu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Soumyajit Sahu closed KAFKA-3448.
-

> Support zone index in IPv6 regex
> 
>
> Key: KAFKA-3448
> URL: https://issues.apache.org/jira/browse/KAFKA-3448
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.9.0.0
> Environment: Windows,Linux
>Reporter: Soumyajit Sahu
>Assignee: Soumyajit Sahu
> Fix For: 0.10.0.0
>
>
> When an address is written textually, the zone index is appended to the 
> address, separated by a percent sign (%). The actual syntax of zone indices 
> depends on the operating system.





[jira] [Closed] (KAFKA-2305) Cluster Monitoring and Management UI

2016-08-09 Thread Soumyajit Sahu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Soumyajit Sahu closed KAFKA-2305.
-

> Cluster Monitoring and Management UI
> 
>
> Key: KAFKA-2305
> URL: https://issues.apache.org/jira/browse/KAFKA-2305
> Project: Kafka
>  Issue Type: Wish
>  Components: admin, replication
>Reporter: Soumyajit Sahu
>Assignee: Soumyajit Sahu
>Priority: Minor
>
> At present, we don't have a Admin and Monitoring UI for Kafka cluster.
> We need a view from the perspective of the machines in the cluster.
> Following issues need to be addressed using a UI:
> 1) Resource usage and Kafka statistics by machine.
> 2) View of the partition and replication layout by machine.
> 3) View of spindle usage (or different log directories usage) pattern by 
> machine.
> 4) Ability to move replicas among brokers using the UI and by leveraging the 
> Reassign Partitions Tool.
> More details in the doc in the External Issue Url field.





[DISCUSS] Time-based releases for Apache Kafka

2016-08-09 Thread Gwen Shapira
Dear Kafka Developers and Users,

In the past, our releases have been quite unpredictable. We'd notice
that a large number of nice features had made it in (or were close),
someone would suggest a release, and we'd do it. This is fun, but it
makes planning really hard - we saw this during the last release, which
we decided to delay by a few weeks to allow more features to "land".

Many other communities have adopted time-based releases successfully
(Cassandra, GCC, LLVM, Fedora, Gnome, Ubuntu, etc.), and I think it
would make sense for the Apache Kafka community to try doing the same.

The benefits of this approach are:

1. A quicker feedback cycle, and users can benefit from features
sooner (assuming a reasonably short time between releases - I was
thinking 4 months)

2. Predictability for contributors and users:
* Developers and reviewers can decide in advance what release they are
aiming for with specific features.
* If a feature misses a release we have a good idea of when it will show up.
* Users know when to expect their features.

3. Transparency - There will be a published cut-off date (AKA feature
freeze) for the release and people will know about it in advance.
Hopefully this will remove the contention around which features make
it.

4. Quality - we've seen issues pop up in release candidates due to
last-minute features that didn't have proper time to bake in. More
time between feature freeze and release will let us test more,
document more and resolve more issues.

Since nothing is ever perfect, there will be some downsides:

1. Most notably, features that miss the feature-freeze date for a
release will have to wait a few months for the next release. Features
will reach users faster overall, as per benefit #1, but individual
features that just miss the cut will lose out.

2. More releases a year mean that being a committer is more work -
release management is still a headache, and we'll have more releases to
manage. Hopefully we'll get better at it. Also, the committer list is
growing, so hopefully it will be no more than a once-a-year effort for
each committer.

3. For users, figuring out which release to use and having frequent
new releases to upgrade to may be a bit confusing.

4. Frequent releases mean we need to do bugfix releases for older
branches. Right now we only do bugfix releases for the latest release.

I think the benefits outweigh the drawbacks - or at least that it's
worth trying. We can have another discussion in a few releases to see
if we want to keep it this way or try something else.

My suggestion for the process:

1. We decide on a reasonable release cadence
2. We decide on release dates (even a rough estimate such as "end of
February") and work backward to feature-freeze dates.
3. Committers volunteer to be "release managers" for specific
releases. We can coordinate on the list or on a wiki. If no committer
volunteers, we assume the community doesn't need a release and skip
it.
4. At the "feature freeze" date, the release manager announces the
contents of the release (which KIPs made it in on time), creates the
release branch and starts the release process as usual. From this
point onwards, only bug fixes should be double-committed to the
release branch while trunk can start collecting features for the
subsequent release.

Comments and improvements are appreciated.

Gwen Shapira
Former-release-manager


Re: [DISCUSS] KIP-73: Replication Quotas

2016-08-09 Thread Jun Rao
When there are several unthrottled replicas, we could also just do what's
suggested in KIP-74. The client is responsible for reordering the
partitions and the leader fills in the bytes to those partitions in order,
up to the quota limit.

We could also do what you suggested: if the quota is exceeded, include
empty data in the response for throttled replicas, and keep doing that
until enough time has passed that the quota is no longer exceeded. This
potentially allows better batching per partition. Not sure if the two
make a big difference in practice, though.

Thanks,

Jun


On Tue, Aug 9, 2016 at 2:31 PM, Joel Koshy  wrote:

> >
> >
> >
> > On the leader side, one challenge is related to the fairness issue that
> Ben
> > brought up. The question is what if the fetch response limit is filled up
> > by the throttled replicas? If this happens constantly, we will delay the
> > progress of those un-throttled replicas. However, I think we can address
> > this issue by trying to fill up the unthrottled replicas in the response
> > first. So, the algorithm would be: fill up unthrottled replicas up to the
> > fetch response limit. If there is space left, fill up throttled replicas.
> > If quota is exceeded for the throttled replicas, reduce the bytes in the
> > throttled replicas in the response accordingly.
> >
>
> Right - that's what I was trying to convey by truncation (vs empty). So we
> would attempt to fill the response for throttled partitions as much as we
> can before hitting the quota limit. There is one more detail to handle in
> this: if there are several throttled partitions and not enough remaining
> allowance in the fetch response to include all the throttled replicas then
> we would need to decide which of those partitions get a share; which is why
> I'm wondering if it is easier to return empty for those partitions entirely
> in the fetch response - they will make progress in the subsequent fetch. If
> they don't make fast enough progress then that would be a case for raising
> the threshold or letting it complete at an off-peak time.
>
>
> >
> > With this approach, we need some new logic to handle throttling on the
> > leader, but we can leave the replica threading model unchanged. So,
> > overall, this still seems to be a simpler approach.
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Aug 9, 2016 at 11:57 AM, Mayuresh Gharat <
> > gharatmayures...@gmail.com
> > > wrote:
> >
> > > Nice write up Ben.
> > >
> > > I agree with Joel for keeping this simple by excluding the partitions
> > from
> > > the fetch request/response when the quota is violated at the follower
> or
> > > leader instead of having a separate set of threads for handling the
> quota
> > > and non quota cases. Even though its different from the current quota
> > > implementation it should be OK since its internal to brokers and can be
> > > handled by tuning the quota configs for it appropriately by the admins.
> > >
> > > Also can you elaborate with an example how this would be handled :
> > > *guaranteeing
> > > ordering of updates when replicas shift threads*
> > >
> > > Thanks,
> > >
> > > Mayuresh
> > >
> > >
> > > On Tue, Aug 9, 2016 at 10:49 AM, Joel Koshy 
> wrote:
> > >
> > > > On the need for both leader/follower throttling: that makes sense -
> > > thanks
> > > > for clarifying. For completeness, can we add this detail to the doc -
> > > say,
> > > > after the quote that I pasted earlier?
> > > >
> > > > From an implementation perspective though: I’m still interested in
> the
> > > > simplicity of not having to add separate replica fetchers, delay
> queue
> > on
> > > > the leader, and “move” partitions from the throttled replica fetchers
> > to
> > > > the regular replica fetchers once caught up.
> > > >
> > > > Instead, I think it would work and be simpler to include or exclude
> the
> > > > partitions in the fetch request from the follower and fetch response
> > from
> > > > the leader when the quota is violated. The issue of fairness that Ben
> > > noted
> > > > may be a wash between the two options (that Ben wrote in his email).
> > With
> > > > the default quota delay mechanism, partitions get delayed essentially
> > at
> > > > random - i.e., whoever fetches at the time of quota violation gets
> > > delayed
> > > > at the leader. So we can adopt a similar policy in choosing to
> truncate
> > > > partitions in fetch responses. i.e., if at the time of handling the
> > fetch
> > > > the “effect” replication rate exceeds the quota then either empty or
> > > > truncate those partitions from the response. (BTW effect replication
> is
> > > > your terminology in the wiki - i.e., replication due to partition
> > > > reassignment, adding brokers, etc.)
> > > >
> > > > While this may be slightly different from the existing quota
> mechanism
> > I
> > > > think the difference is small (since we would reuse the quota manager
> > at
> > > > worst with some refactoring) and will be internal to 

Re: [DISCUSS] KIP-73: Replication Quotas

2016-08-09 Thread Joel Koshy
>
>
>
> On the leader side, one challenge is related to the fairness issue that Ben
> brought up. The question is what if the fetch response limit is filled up
> by the throttled replicas? If this happens constantly, we will delay the
> progress of those un-throttled replicas. However, I think we can address
> this issue by trying to fill up the unthrottled replicas in the response
> first. So, the algorithm would be: fill up unthrottled replicas up to the
> fetch response limit. If there is space left, fill up throttled replicas.
> If quota is exceeded for the throttled replicas, reduce the bytes in the
> throttled replicas in the response accordingly.
>

Right - that's what I was trying to convey by truncation (vs empty). So we
would attempt to fill the response for throttled partitions as much as we
can before hitting the quota limit. There is one more detail to handle in
this: if there are several throttled partitions and not enough remaining
allowance in the fetch response to include all the throttled replicas then
we would need to decide which of those partitions get a share; which is why
I'm wondering if it is easier to return empty for those partitions entirely
in the fetch response - they will make progress in the subsequent fetch. If
they don't make fast enough progress then that would be a case for raising
the threshold or letting it complete at an off-peak time.


>
> With this approach, we need some new logic to handle throttling on the
> leader, but we can leave the replica threading model unchanged. So,
> overall, this still seems to be a simpler approach.
>
> Thanks,
>
> Jun
>
> On Tue, Aug 9, 2016 at 11:57 AM, Mayuresh Gharat <
> gharatmayures...@gmail.com
> > wrote:
>
> > Nice write up Ben.
> >
> > I agree with Joel for keeping this simple by excluding the partitions
> from
> > the fetch request/response when the quota is violated at the follower or
> > leader instead of having a separate set of threads for handling the quota
> > and non quota cases. Even though its different from the current quota
> > implementation it should be OK since its internal to brokers and can be
> > handled by tuning the quota configs for it appropriately by the admins.
> >
> > Also can you elaborate with an example how this would be handled :
> > *guaranteeing
> > ordering of updates when replicas shift threads*
> >
> > Thanks,
> >
> > Mayuresh
> >
> >
> > On Tue, Aug 9, 2016 at 10:49 AM, Joel Koshy  wrote:
> >
> > > On the need for both leader/follower throttling: that makes sense -
> > thanks
> > > for clarifying. For completeness, can we add this detail to the doc -
> > say,
> > > after the quote that I pasted earlier?
> > >
> > > From an implementation perspective though: I’m still interested in the
> > > simplicity of not having to add separate replica fetchers, delay queue
> on
> > > the leader, and “move” partitions from the throttled replica fetchers
> to
> > > the regular replica fetchers once caught up.
> > >
> > > Instead, I think it would work and be simpler to include or exclude the
> > > partitions in the fetch request from the follower and fetch response
> from
> > > the leader when the quota is violated. The issue of fairness that Ben
> > noted
> > > may be a wash between the two options (that Ben wrote in his email).
> With
> > > the default quota delay mechanism, partitions get delayed essentially
> at
> > > random - i.e., whoever fetches at the time of quota violation gets
> > delayed
> > > at the leader. So we can adopt a similar policy in choosing to truncate
> > > partitions in fetch responses. i.e., if at the time of handling the
> fetch
> > > the “effect” replication rate exceeds the quota then either empty or
> > > truncate those partitions from the response. (BTW effect replication is
> > > your terminology in the wiki - i.e., replication due to partition
> > > reassignment, adding brokers, etc.)
> > >
> > > While this may be slightly different from the existing quota mechanism
> I
> > > think the difference is small (since we would reuse the quota manager
> at
> > > worst with some refactoring) and will be internal to the broker.
> > >
> > > So I guess the question is if this alternative is simpler enough and
> > > equally functional to not go with dedicated throttled replica fetchers.
> > >
> > > On Tue, Aug 9, 2016 at 9:44 AM, Jun Rao  wrote:
> > >
> > > > Just to elaborate on what Ben said why we need throttling on both the
> > > > leader and the follower side.
> > > >
> > > > If we only have throttling on the follower side, consider a case that
> > we
> > > > add 5 more new brokers and want to move some replicas from existing
> > > brokers
> > > > over to those 5 brokers. Each of those broker is going to fetch data
> > from
> > > > all existing brokers. Then, it's possible that the aggregated fetch
> > load
> > > > from those 5 brokers on a particular existing broker exceeds its
> > outgoing
> > > > network bandwidth, even though the 

Re: [DISCUSS] KIP-4 ACL Admin Schema

2016-08-09 Thread Jun Rao
Grant,

Thanks for the reply. I had one inline reply below.

On Mon, Aug 8, 2016 at 2:44 PM, Grant Henke  wrote:

> Thank you for the feedback everyone. Below I respond to the last batch of
> emails:
>
> You mention that "delete" actions
> > will get processed before "add" actions, which makes sense to me. An
> > alternative to avoid the confusion in the first place would be to replace
> > the AlterAcls APIs with separate AddAcls and DeleteAcls APIs. Was this
> > option already rejected?
>
>
> 4. There is no CreateAcls or DeleteAcls (unlike CreateTopics and
>
> DeleteTopics, for example). It would be good to explain the reasoning for
>
> this choice (Jason also asked this question).
>
>
> Re: 4 and Create/Delete vs Alter, I'm a fan of being able to bundle a bunch
> > of changes in one request. Seems like an ACL change could easily include
> > additions + deletions and is nice to bundle in one request that can be
> > processed as quickly as possible. I don't think its a requirement, but
> the
> > usage pattern for changing ACLs does seem like its different from how you
> > manage topics where you probably usually are either only creating or only
> > deleting topics.
>
>
> I had chosen to handle everything (Add/Delete) in Alter mainly for the
> reason Ewen mentioned. I thought it would be fairly common for a single
> change to Add and Remove permissions at the same time. That said the idea
> of having separate requests has not been discussed and is not out of the
> question. More on this below.
>
> I'm a bit concerned about the ordering -- the KIP mentions processing
> > deletes before adds. Isn't that likely to lead to gaps in security? For
> > example, lets say I have a very simple 1 rule ACL. To change it, I have
> to
> > delete it and add a new one. If we do the delete first, don't we end up
> > with an (albeit small) window where the resource ends up unprotected? I
> > would think processing them in the order they are requested gives the
> > operator the control they need to correctly protect resources. Reordering
> > the processing of requests is also just unintuitive to me and I think
> > probably unintuitive to users as well.
>
>
> I agree that this is a bit of a nuance that could lead to rare but
> confusing behavior. This is another chip in the "separate requests"
> bucket.
>
> The more I think about the protocol and its most common usage in clients,
> the more I lean towards separate requests. Clients are most likely to have
> separate add and remove apis. In order to ensure correct/controlled
> ordering and expected access they will likely fire separate requests
> instead of batching.
>
> Though breaking the request into 2 requests causes more protocol messages,
> it vastly simplifies the broker-side implementation and makes the
> expectations very clear. It also follows the Authorizer API much more
> closely.
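To make the ordering concern above concrete, here is a toy sketch of why processing deletes before adds can leave a momentary unprotected window when replacing a resource's only ACL rule. All names, rule strings, and types here are invented for illustration; this is not Kafka's Authorizer API.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model: a resource's ACL is a set of rule strings. Replacing the
// single rule via one batched request that reorders deletes before adds
// makes the set momentarily empty; adding first avoids that window.
public class AclOrdering {
    public static boolean hasUnprotectedWindow(Set<String> rules,
                                               String ruleToAdd,
                                               String ruleToDelete,
                                               boolean deletesFirst) {
        Set<String> s = new HashSet<>(rules);
        boolean everEmpty = false;
        if (deletesFirst) {
            s.remove(ruleToDelete);
            everEmpty |= s.isEmpty(); // resource unprotected at this instant
            s.add(ruleToAdd);
        } else {
            s.add(ruleToAdd);
            s.remove(ruleToDelete);
        }
        everEmpty |= s.isEmpty();
        return everEmpty;
    }

    public static void main(String[] args) {
        Set<String> rules = new HashSet<>();
        rules.add("allow alice read topicX");
        System.out.println(hasUnprotectedWindow(rules, "allow bob read topicX",
                "allow alice read topicX", true));  // deletes first: gap
        System.out.println(hasUnprotectedWindow(rules, "allow bob read topicX",
                "allow alice read topicX", false)); // adds first: no gap
    }
}
```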
>
> 1. My main concern is that we are skipping the discussion on the desired
> > model for controlling ACL access and updates. I understand the desire to
> > reduce the scope, but this seems to be a fundamental aspect of the design
> > that we need to get right. Without a plan for that, it is difficult to
> > evaluate if that part of the current proposal is fine.
>
>
> ++1.
>
>
> Do you have any specific thoughts on the approach you would like to see? I
> may be missing the impact and how it affects the rest of the proposal.
>
> The current proposal suggests using a wide authorization requirement where
> the principal must be authorized to the "All" Operation on the "Cluster"
> resource to alter ACLs. I suggested this approach so we could maintain
> backwards compatibility with existing Authorizer implementations while
> still leaving the door open for a more fine grained approach later. If we
> add new Operations or ResourceTypes the existing Authorizer implementations
> will not expect them and could break.
>
> I think choosing to solve it now or later should have no bearing on the
> approach or impact since the "All" Operation on the "Cluster" resource will
> grant access to any model we choose. I could be missing something though.
>
> I had marked it in the follow up work section on the KIP-4 Wiki:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 4+-+Command+line+and+centralized+administrative+operations#KIP-4-
> Commandlineandcentralizedadministrativeoperations-ACLAdminSchema
>
>
> > 2. Are the Java objects in "org.apache.kafka.common.security.auth" going
> > to
> > be public API? If so, we should explain why they should be public and
> > describe them in the KIP. If not, we should mention that.
>
>
> They are public API. They are leveraged and exposed in the Request/Response
> objects and likely the Admin clients. I can describe them a bit more in the
> KIP. They are basically a 1-1 mapping from the Authorizer interface. The only
> reason they are being moved/duplicated is because they are needed by the
> clients package and in Java.
>
>
> > 3. It would 

Re: [DISCUSS] KIP-73: Replication Quotas

2016-08-09 Thread Jun Rao
Joel, Mayuresh,

Thanks for the suggestion. Yes, I think we could use a single replica fetch
thread to handle both throttled and unthrottled replicas.

On the follower side, this is easy. If the quota is exceeded, we remember
the amount of the delay needed, take those throttled replicas off the fetch
request, and after the delay, add them back.
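The follower-side handling described above could be sketched roughly as follows. All class and method names here are illustrative stand-ins, not Kafka's actual fetcher code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: when a replica exceeds its quota, remember how long it must
// wait, drop it from subsequent fetch requests, and add it back once
// the delay has elapsed.
public class ThrottledFetchSet {
    private final Map<String, Long> resumeAtMs = new HashMap<>();

    // Record that this partition exceeded the quota and must back off.
    public void throttle(String partition, long nowMs, long delayMs) {
        resumeAtMs.put(partition, nowMs + delayMs);
    }

    // Partitions that may be included in the next fetch request.
    public List<String> fetchable(List<String> assigned, long nowMs) {
        List<String> out = new ArrayList<>();
        for (String p : assigned) {
            Long resume = resumeAtMs.get(p);
            if (resume == null || nowMs >= resume) {
                resumeAtMs.remove(p); // delay elapsed: add it back
                out.add(p);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ThrottledFetchSet set = new ThrottledFetchSet();
        set.throttle("reassign-0", 0L, 100L);
        System.out.println(set.fetchable(List.of("normal-0", "reassign-0"), 50L));
        System.out.println(set.fetchable(List.of("normal-0", "reassign-0"), 150L));
    }
}
```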

On the leader side, one challenge is related to the fairness issue that Ben
brought up. The question is what if the fetch response limit is filled up
by the throttled replicas? If this happens constantly, we will delay the
progress of those un-throttled replicas. However, I think we can address
this issue by trying to fill up the unthrottled replicas in the response
first. So, the algorithm would be: fill up unthrottled replicas up to the
fetch response limit. If there is space left, fill up throttled replicas.
If quota is exceeded for the throttled replicas, reduce the bytes in the
throttled replicas in the response accordingly.
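The leader-side fill order described above can be sketched as below: unthrottled partitions are served first up to the response limit, and throttled partitions share whatever space remains, further capped by the quota allowance. This is an illustrative sketch, not Kafka's actual response-building code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the fill algorithm: maps are partition -> bytes available
// on the leader; the result is partition -> bytes placed in the response.
public class FetchFill {
    public static Map<String, Integer> fill(LinkedHashMap<String, Integer> unthrottled,
                                            LinkedHashMap<String, Integer> throttled,
                                            int responseLimit,
                                            int quotaAllowance) {
        Map<String, Integer> out = new LinkedHashMap<>();
        int remaining = responseLimit;
        // Unthrottled replicas are filled first, up to the response limit.
        for (Map.Entry<String, Integer> e : unthrottled.entrySet()) {
            int take = Math.min(e.getValue(), remaining);
            out.put(e.getKey(), take);
            remaining -= take;
        }
        // Throttled replicas are bounded by both the space left in the
        // response and the remaining quota allowance.
        int throttledBudget = Math.min(remaining, quotaAllowance);
        for (Map.Entry<String, Integer> e : throttled.entrySet()) {
            int take = Math.min(e.getValue(), throttledBudget);
            out.put(e.getKey(), take);
            throttledBudget -= take;
        }
        return out;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Integer> un = new LinkedHashMap<>();
        un.put("topicA-0", 600);
        LinkedHashMap<String, Integer> th = new LinkedHashMap<>();
        th.put("reassign-0", 500);
        // 1000-byte response limit, 300 bytes of quota left for throttled traffic:
        // topicA-0 gets its full 600; reassign-0 is reduced to 300.
        System.out.println(fill(un, th, 1000, 300));
    }
}
```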

With this approach, we need some new logic to handle throttling on the
leader, but we can leave the replica threading model unchanged. So,
overall, this still seems to be a simpler approach.

Thanks,

Jun

On Tue, Aug 9, 2016 at 11:57 AM, Mayuresh Gharat  wrote:

> Nice write up Ben.
>
> I agree with Joel for keeping this simple by excluding the partitions from
> the fetch request/response when the quota is violated at the follower or
> leader instead of having a separate set of threads for handling the quota
> and non quota cases. Even though its different from the current quota
> implementation it should be OK since its internal to brokers and can be
> handled by tuning the quota configs for it appropriately by the admins.
>
> Also can you elaborate with an example how this would be handled :
> *guaranteeing
> ordering of updates when replicas shift threads*
>
> Thanks,
>
> Mayuresh
>
>
> On Tue, Aug 9, 2016 at 10:49 AM, Joel Koshy  wrote:
>
> > On the need for both leader/follower throttling: that makes sense -
> thanks
> > for clarifying. For completeness, can we add this detail to the doc -
> say,
> > after the quote that I pasted earlier?
> >
> > From an implementation perspective though: I’m still interested in the
> > simplicity of not having to add separate replica fetchers, delay queue on
> > the leader, and “move” partitions from the throttled replica fetchers to
> > the regular replica fetchers once caught up.
> >
> > Instead, I think it would work and be simpler to include or exclude the
> > partitions in the fetch request from the follower and fetch response from
> > the leader when the quota is violated. The issue of fairness that Ben
> noted
> > may be a wash between the two options (that Ben wrote in his email). With
> > the default quota delay mechanism, partitions get delayed essentially at
> > random - i.e., whoever fetches at the time of quota violation gets
> delayed
> > at the leader. So we can adopt a similar policy in choosing to truncate
> > partitions in fetch responses. i.e., if at the time of handling the fetch
> > the “effect” replication rate exceeds the quota then either empty or
> > truncate those partitions from the response. (BTW effect replication is
> > your terminology in the wiki - i.e., replication due to partition
> > reassignment, adding brokers, etc.)
> >
> > While this may be slightly different from the existing quota mechanism I
> > think the difference is small (since we would reuse the quota manager at
> > worst with some refactoring) and will be internal to the broker.
> >
> > So I guess the question is if this alternative is simpler enough and
> > equally functional to not go with dedicated throttled replica fetchers.
> >
> > On Tue, Aug 9, 2016 at 9:44 AM, Jun Rao  wrote:
> >
> > > Just to elaborate on what Ben said why we need throttling on both the
> > > leader and the follower side.
> > >
> > > If we only have throttling on the follower side, consider a case that
> we
> > > add 5 more new brokers and want to move some replicas from existing
> > brokers
> > > over to those 5 brokers. Each of those broker is going to fetch data
> from
> > > all existing brokers. Then, it's possible that the aggregated fetch
> load
> > > from those 5 brokers on a particular existing broker exceeds its
> outgoing
> > > network bandwidth, even though the inbounding traffic on each of those
> 5
> > > brokers is bounded.
> > >
> > > If we only have throttling on the leader side, consider the same
> example
> > > above. It's possible for the incoming traffic to each of those 5
> brokers
> > to
> > > exceed its network bandwidth since it is fetching data from all
> existing
> > > brokers.
> > >
> > > So, being able to set a quota on both the follower and the leader side
> > > protects both cases.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Tue, Aug 9, 2016 at 4:43 AM, Ben Stopford  wrote:
> > >
> > > > Hi Joel
> > > >

[jira] [Comment Edited] (KAFKA-4022) TopicCommand is using default max.message.bytes instead of broker's setting

2016-08-09 Thread Vahid Hashemian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413806#comment-15413806
 ] 

Vahid Hashemian edited comment on KAFKA-4022 at 8/9/16 7:47 PM:


[~kornicameis...@gmail.com] Yes, I see what you're saying. It should read from 
the actual setting (not necessarily the default) and report the warning only if 
the topic-level setting is greater than the active broker setting. I hope I got 
that right this time :)

BTW, you can use the {{--force}} switch to suppress the user prompt.
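The corrected check described above could be sketched as follows. The class and method names are hypothetical, not TopicCommand's actual code; the point is only that the comparison should use the broker's active setting rather than a hard-coded default.

```java
// Sketch: warn only when the topic-level max.message.bytes exceeds the
// broker's actual configured value (which may differ from the default).
public class MaxMessageBytesCheck {
    public static boolean shouldWarn(Integer topicMaxMessageBytes, int brokerMaxMessageBytes) {
        // No topic-level override means the broker value applies: no warning.
        return topicMaxMessageBytes != null && topicMaxMessageBytes > brokerMaxMessageBytes;
    }

    public static void main(String[] args) {
        System.out.println(shouldWarn(1048576, 112));     // topic > broker: warn
        System.out.println(shouldWarn(1000000, 1048576)); // within broker limit: no warning
    }
}
```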


was (Author: vahid):
[~kornicameis...@gmail.com] Yes, I see what you're saying. It should read from 
the actual setting (not necessarily the default) and report the warning only if 
the topic-level setting is greater than the active broker setting. I hope I got 
that right this time :)

> TopicCommand is using default max.message.bytes instead of broker's setting
> ---
>
> Key: KAFKA-4022
> URL: https://issues.apache.org/jira/browse/KAFKA-4022
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0
> Environment: Linux testHost 3.10.0-229.el7.x86_64 #1 SMP Fri Mar 6 
> 11:36:42 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Tomasz Trębski
>Assignee: Vahid Hashemian
>
> Even though it is possible to configure brokers to support messages that are 
> bigger than 1 MB, the admin tool to create topics won't accept one whose size 
> is above 1,000,000 bytes. 
> Responsible line can be found under this link:
> [TopicCommand|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/admin/TopicCommand.scala#L366]
> Console output also seems to confirm that:
>  ERROR 
> {code:title=console_output|borderStyle=solid}
> 
> *** WARNING: you are creating a topic where the max.message.bytes is greater 
> than the broker  ***
> *** default. This operation is dangerous. There are two potential side 
> effects:  ***
> *** - Consumers will get failures if their fetch.message.max.bytes < the 
> value you are using ***
> *** - Producer requests larger than replica.fetch.max.bytes will not 
> replicate and hence have***
> ***   a higher risk of data loss  
>***
> *** You should ensure both of these settings are greater than the value set 
> here before using***
> *** this topic.   
>***
> 
> - value set here: 1048576
> - Default Broker replica.fetch.max.bytes: 1048576
> - Default Broker max.message.bytes: 112
> - Default Consumer fetch.message.max.bytes: 1048576
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-73: Replication Quotas

2016-08-09 Thread Mayuresh Gharat
Nice write up Ben.

I agree with Joel on keeping this simple by excluding the partitions from
the fetch request/response when the quota is violated at the follower or
leader, instead of having a separate set of threads for handling the quota
and non-quota cases. Even though it's different from the current quota
implementation, it should be OK since it's internal to the brokers and can
be handled by admins tuning the quota configs for it appropriately.

Also, can you elaborate with an example on how this would be handled:
*guaranteeing ordering of updates when replicas shift threads*?

Thanks,

Mayuresh


On Tue, Aug 9, 2016 at 10:49 AM, Joel Koshy  wrote:

> On the need for both leader/follower throttling: that makes sense - thanks
> for clarifying. For completeness, can we add this detail to the doc - say,
> after the quote that I pasted earlier?
>
> From an implementation perspective though: I’m still interested in the
> simplicity of not having to add separate replica fetchers, delay queue on
> the leader, and “move” partitions from the throttled replica fetchers to
> the regular replica fetchers once caught up.
>
> Instead, I think it would work and be simpler to include or exclude the
> partitions in the fetch request from the follower and fetch response from
> the leader when the quota is violated. The issue of fairness that Ben noted
> may be a wash between the two options (that Ben wrote in his email). With
> the default quota delay mechanism, partitions get delayed essentially at
> random - i.e., whoever fetches at the time of quota violation gets delayed
> at the leader. So we can adopt a similar policy in choosing to truncate
> partitions in fetch responses. i.e., if at the time of handling the fetch
> the “effect” replication rate exceeds the quota then either empty or
> truncate those partitions from the response. (BTW effect replication is
> your terminology in the wiki - i.e., replication due to partition
> reassignment, adding brokers, etc.)
>
> While this may be slightly different from the existing quota mechanism I
> think the difference is small (since we would reuse the quota manager at
> worst with some refactoring) and will be internal to the broker.
>
> So I guess the question is if this alternative is simpler enough and
> equally functional to not go with dedicated throttled replica fetchers.
>
> On Tue, Aug 9, 2016 at 9:44 AM, Jun Rao  wrote:
>
> > Just to elaborate on what Ben said why we need throttling on both the
> > leader and the follower side.
> >
> > If we only have throttling on the follower side, consider a case where we
> > add 5 new brokers and want to move some replicas from existing brokers
> > over to those 5 brokers. Each of those brokers is going to fetch data from
> > all existing brokers. Then, it's possible that the aggregated fetch load
> > from those 5 brokers on a particular existing broker exceeds its outgoing
> > network bandwidth, even though the inbound traffic on each of those 5
> > brokers is bounded.
> >
> > If we only have throttling on the leader side, consider the same example
> > above. It's possible for the incoming traffic to each of those 5 brokers
> to
> > exceed its network bandwidth since it is fetching data from all existing
> > brokers.
> >
> > So, being able to set a quota on both the follower and the leader side
> > protects both cases.
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Aug 9, 2016 at 4:43 AM, Ben Stopford  wrote:
> >
> > > Hi Joel
> > >
> > > Thanks for taking the time to look at this. Appreciated.
> > >
> > > Regarding throttling on both leader and follower, this proposal covers
> a
> > > more general solution which can guarantee a quota, even when a
> rebalance
> > > operation produces an asymmetric profile of load. This means
> > administrators
> > > don’t need to calculate the impact that a follower-only quota will have
> > on
> > > the leaders they are fetching from. So for example where replica sizes
> > are
> > > skewed or where a partial rebalance is required.
> > >
> > > Having said that, even with both leader and follower quotas, the use of
> > > additional threads is actually optional. There appear to be two general
> > > approaches (1) omit partitions from fetch requests (follower) / fetch
> > > responses (leader) when they exceed their quota (2) delay them, as the
> > > existing quota mechanism does, using separate fetchers. Both appear
> > valid,
> > > but with slightly different design tradeoffs.
> > >
> > > The issue with approach (1) is that it departs somewhat from the
> existing
> > > quotas implementation, and must include a notion of fairness within the
> > > now size-bounded request and response. The issue with (2) is
> > guaranteeing
> > > ordering of updates when replicas shift threads, but this is handled,
> for
> > > the most part, in the code today.
> > >
> > > I’ve updated the rejected alternatives section to make this a little
> > > 

[jira] [Commented] (KAFKA-3936) Validate user parameters as early as possible

2016-08-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413983#comment-15413983
 ] 

ASF GitHub Bot commented on KAFKA-3936:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1711


> Validate user parameters as early as possible
> -
>
> Key: KAFKA-3936
> URL: https://issues.apache.org/jira/browse/KAFKA-3936
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Damian Guy
>Priority: Minor
>  Labels: beginner, newbie, starter
> Fix For: 0.10.1.0
>
>
> Currently, parameters handed in by the user via public API, are not 
> validated. For example {{stream.to(null)}} would fail when the underlying 
> producer gets instantiated. This results in a stack trace from deep down in 
> the library, making it hard to reason about the problem for the user.
> We want to check all given user parameters as early as possible and raise 
> corresponding (and helpful!) exceptions to explain to users what the problem is, 
> and how to fix it. All parameter checks should get unit tested.
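The fail-fast validation this ticket asks for can be sketched as follows: check the argument at the public API boundary and raise a helpful exception there, instead of letting the underlying producer fail later deep inside the library. `StreamSketch` and its `to()` method are illustrative stand-ins, not the actual Kafka Streams API.

```java
// Sketch of early parameter validation at the public API boundary.
public class StreamSketch {
    public void to(String topic) {
        if (topic == null) {
            throw new NullPointerException("topic can't be null");
        }
        if (topic.trim().isEmpty()) {
            throw new IllegalArgumentException("topic can't be an empty string");
        }
        // ... only now hand the validated topic to the underlying producer ...
    }

    public static void main(String[] args) {
        try {
            new StreamSketch().to(null);
        } catch (NullPointerException e) {
            // The user sees a clear message at the call site, not a deep stack trace.
            System.out.println("rejected early: " + e.getMessage());
        }
    }
}
```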



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3936) Validate user parameters as early as possible

2016-08-09 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3936:
-
   Resolution: Fixed
Fix Version/s: 0.10.1.0
   Status: Resolved  (was: Patch Available)

Issue resolved by pull request 1711
[https://github.com/apache/kafka/pull/1711]

> Validate user parameters as early as possible
> -
>
> Key: KAFKA-3936
> URL: https://issues.apache.org/jira/browse/KAFKA-3936
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Damian Guy
>Priority: Minor
>  Labels: beginner, newbie, starter
> Fix For: 0.10.1.0
>
>
> Currently, parameters handed in by the user via public API, are not 
> validated. For example {{stream.to(null)}} would fail when the underlying 
> producer gets instantiated. This results in a stack trace from deep down in 
> the library, making it hard to reason about the problem for the user.
> We want to check all given user parameters as early as possible and raise 
> corresponding (and helpful!) exceptions to explain to users what the problem is, 
> and how to fix it. All parameter checks should get unit tested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1711: KAFKA-3936: Validate parameters as early as possib...

2016-08-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1711


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #1709: MINOR: Doc individual partition must fit on the se...

2016-08-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1709


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [VOTE] KIP-70: Revise Partition Assignment Semantics on New Consumer's Subscription Change

2016-08-09 Thread Guozhang Wang
+1.

On Tue, Aug 9, 2016 at 10:06 AM, Jun Rao  wrote:

> Vahid,
>
> Thanks for the clear explanation in the KIP. +1
>
> Jun
>
> On Mon, Aug 8, 2016 at 11:53 AM, Vahid S Hashemian <
> vahidhashem...@us.ibm.com> wrote:
>
> > I would like to initiate the voting process for KIP-70 (
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 70%3A+Revise+Partition+Assignment+Semantics+on+New+
> > Consumer%27s+Subscription+Change
> > ).
> >
> > The only issue that was discussed in the discussion thread is
> > compatibility, but because it applies to an edge case, it is not expected
> > to impact existing users.
> > The proposal was shared with Spark and Storm users and no issue was
> raised
> > by those communities.
> >
> > Thanks.
> >
> > Regards,
> > --Vahid
> >
> >
>



-- 
-- Guozhang


Jenkins build is back to normal : kafka-trunk-jdk8 #808

2016-08-09 Thread Apache Jenkins Server
See 



Re: [DISCUSS] KIP-73: Replication Quotas

2016-08-09 Thread Joel Koshy
On the need for both leader/follower throttling: that makes sense - thanks
for clarifying. For completeness, can we add this detail to the doc - say,
after the quote that I pasted earlier?

From an implementation perspective, though: I’m still interested in the
simplicity of not having to add separate replica fetchers, delay queue on
the leader, and “move” partitions from the throttled replica fetchers to
the regular replica fetchers once caught up.

Instead, I think it would work and be simpler to include or exclude the
partitions in the fetch request from the follower and fetch response from
the leader when the quota is violated. The issue of fairness that Ben noted
in his email may be a wash between the two options. With
the default quota delay mechanism, partitions get delayed essentially at
random - i.e., whoever fetches at the time of quota violation gets delayed
at the leader. So we can adopt a similar policy in choosing to truncate
partitions in fetch responses. i.e., if at the time of handling the fetch
the “effect” replication rate exceeds the quota then either empty or
truncate those partitions from the response. (BTW effect replication is
your terminology in the wiki - i.e., replication due to partition
reassignment, adding brokers, etc.)

While this may be slightly different from the existing quota mechanism I
think the difference is small (since we would reuse the quota manager at
worst with some refactoring) and will be internal to the broker.

So I guess the question is if this alternative is simpler enough and
equally functional to not go with dedicated throttled replica fetchers.
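Joel's response-side option can be sketched concretely as follows (a minimal illustration with made-up partition names and byte counts; the real fetch path and quota manager APIs differ):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Minimal sketch of the response-side idea: when the observed "effect"
// replication rate exceeds the quota at the time the fetch is handled,
// return the throttled partitions empty; serve regular partitions as usual.
// Partition names and byte counts are illustrative, not Kafka's API.
class ThrottledFetchResponse {
    static Map<String, Integer> build(Map<String, Integer> partitionBytes,
                                      Set<String> throttledPartitions,
                                      double observedRate,
                                      double quota) {
        boolean overQuota = observedRate > quota;
        Map<String, Integer> response = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : partitionBytes.entrySet()) {
            if (overQuota && throttledPartitions.contains(e.getKey()))
                response.put(e.getKey(), 0);   // truncate: delay this data
            else
                response.put(e.getKey(), e.getValue());
        }
        return response;
    }
}
```

Regular partitions are always served, so only reassignment traffic is delayed when the quota is violated.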

On Tue, Aug 9, 2016 at 9:44 AM, Jun Rao  wrote:

> Just to elaborate on Ben's point about why we need throttling on both the
> leader and the follower side.
>
> If we only have throttling on the follower side, consider a case where we
> add 5 new brokers and want to move some replicas from existing brokers
> over to those 5 brokers. Each of those brokers is going to fetch data from
> all existing brokers. Then, it's possible that the aggregated fetch load
> from those 5 brokers on a particular existing broker exceeds its outgoing
> network bandwidth, even though the inbound traffic on each of those 5
> brokers is bounded.
>
> If we only have throttling on the leader side, consider the same example
> above. It's possible for the incoming traffic to each of those 5 brokers to
> exceed its network bandwidth since it is fetching data from all existing
> brokers.
>
> So, being able to set a quota on both the follower and the leader side
> protects both cases.
>
> Thanks,
>
> Jun
>
> On Tue, Aug 9, 2016 at 4:43 AM, Ben Stopford  wrote:
>
> > Hi Joel
> >
> > Thanks for taking the time to look at this. Appreciated.
> >
> > Regarding throttling on both leader and follower, this proposal covers a
> > more general solution which can guarantee a quota, even when a rebalance
> > operation produces an asymmetric profile of load. This means
> administrators
> > don’t need to calculate the impact that a follower-only quota will have
> on
> > the leaders they are fetching from. So for example where replica sizes
> are
> > skewed or where a partial rebalance is required.
> >
> > Having said that, even with both leader and follower quotas, the use of
> > additional threads is actually optional. There appear to be two general
> > approaches (1) omit partitions from fetch requests (follower) / fetch
> > responses (leader) when they exceed their quota (2) delay them, as the
> > existing quota mechanism does, using separate fetchers. Both appear
> valid,
> > but with slightly different design tradeoffs.
> >
> > The issue with approach (1) is that it departs somewhat from the existing
> > quotas implementation, and must include a notion of fairness within, the
> > now size-bounded, request and response. The issue with (2) is
> guaranteeing
> > ordering of updates when replicas shift threads, but this is handled, for
> > the most part, in the code today.
> >
> > I’ve updated the rejected alternatives section to make this a little
> > clearer.
> >
> > B
> >
> >
> >
> > > On 8 Aug 2016, at 20:38, Joel Koshy  wrote:
> > >
> > > Hi Ben,
> > >
> > > Thanks for the detailed write-up. So the proposal involves
> > self-throttling
> > > on the fetcher side and throttling at the leader. Can you elaborate on
> > the
> > > reasoning that is given on the wiki: *“The throttle is applied to both
> > > leaders and followers. This allows the admin to exert strong guarantees
> > on
> > > the throttle limit".* Is there any reason why one or the other wouldn't
> > be
> > > sufficient.
> > >
> > > Specifically, if we were to only do self-throttling on the fetchers, we
> > > could potentially avoid the additional replica fetchers right? i.e.,
> the
> > > replica fetchers would maintain its quota metrics as you proposed and
> > each
> > > (normal) replica fetch presents an 

Re: [DISCUSS] KIP-74: Add Fetch Response Size Limit in Bytes

2016-08-09 Thread Jun Rao
Hi, Andrey,

Thanks for the proposal. It looks good overall. Some minor comments.

1. It seems a bit weird that fetch.partition.max.bytes is a broker-level
configuration while fetch.limit.bytes is a client-side configuration.
Intuitively, it seems both should be set by the client. If we do that, one
benefit is that we can validate that fetch.limit.bytes >=
fetch.partition.max.bytes on the client side.

2. Naming-wise, fetch.response.max.bytes and replica.fetch.response.max.bytes
seem more consistent with our current convention than
fetch.limit.bytes and replica.fetch.limit.bytes.

3. When you say "This way we can ensure that response size is less than (
*limit_bytes* + *message.max.bytes*).", it should be "less than
max(limit_bytes, message.max.bytes)", right?
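The max() bound in point 3 follows from the usual accumulation rule, sketched here under the assumption that the first message is always returned even if it alone exceeds the limit:

```java
import java.util.List;

// Sketch: include messages while under limitBytes, but always return at
// least the first message so a single large message can still make progress.
// With each message no larger than message.max.bytes, the total response is
// bounded by max(limitBytes, messageMaxBytes), not their sum.
class FetchSizeBound {
    static long responseSize(List<Long> messageSizes, long limitBytes) {
        long total = 0;
        for (long size : messageSizes) {
            if (total > 0 && total + size > limitBytes)
                break;           // stop before exceeding the limit
            total += size;       // first message is always included
        }
        return total;
    }
}
```

Only the first message can push the total past the limit, and it is itself bounded by the maximum message size; every later message is admitted only if it keeps the total within the limit.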

Finally, KIP-73 (replication quota) is proposing a similar change to fetch
request protocol. We can probably just combine the two changes into one,
instead of bumping the fetch request version twice.

Thanks,

Jun


On Mon, Aug 8, 2016 at 10:11 AM, Andrey L. Neporada <
anepor...@yandex-team.ru> wrote:

> Hi all!
>
> I’ve just created KIP-74: Add Fetch Response Size Limit in Bytes.
>
> The idea is to limit client memory consumption when fetching many
> partitions (especially useful for replication).
>
> Full details are here:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-74%3A+
> Add+Fetch+Response+Size+Limit+in+Bytes
>
> Thanks
> Andrey.
>
>


Re: [VOTE] KIP-70: Revise Partition Assignment Semantics on New Consumer's Subscription Change

2016-08-09 Thread Jun Rao
Vahid,

Thanks for the clear explanation in the KIP. +1

Jun

On Mon, Aug 8, 2016 at 11:53 AM, Vahid S Hashemian <
vahidhashem...@us.ibm.com> wrote:

> I would like to initiate the voting process for KIP-70 (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 70%3A+Revise+Partition+Assignment+Semantics+on+New+
> Consumer%27s+Subscription+Change
> ).
>
> The only issue that was discussed in the discussion thread is
> compatibility, but because it applies to an edge case, it is not expected
> to impact existing users.
> The proposal was shared with Spark and Storm users and no issue was raised
> by those communities.
>
> Thanks.
>
> Regards,
> --Vahid
>
>


Consumer Offset Migration Tool

2016-08-09 Thread Grant Henke
I had to write a simple offset migration tool and I wanted to get feedback
on whether or not this would be a useful addition to Apache Kafka.

Currently the path to upgrade from the zookeeper offsets to the Kafka
offset (and often the Scala to Java client) is via dual commit. The process
is documented here:
http://kafka.apache.org/documentation.html#offsetmigration

The reason that process wasn't sufficient in my case is because:

   - It needs to be done ahead of the upgrade
   - It requires the old client to commit at least once in dual commit mode
   - Some frameworks don't expose the dual commit functionality well
   - Dual commit is not supported in 0.8.1.x

The tool I wrote takes the relevant connection information and a consumer
group and simply copies the Zookeeper offsets into the Kafka offsets for
that group.
A rough WIP PR can be seen here: https://github.com/apache/kafka/pull/1715
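Conceptually the copy is a read-and-commit loop; a hedged sketch with hypothetical stand-ins for the ZooKeeper read and the Kafka commit (the actual PR uses Kafka's internal clients and request types):

```java
import java.util.Map;
import java.util.function.BiConsumer;

// Conceptual sketch only: every (topic-partition -> offset) pair read from
// /consumers/<group>/offsets in ZooKeeper is committed unchanged to the
// Kafka-based offset storage for the same group. The ZooKeeper read and
// OffsetCommitRequest plumbing are abstracted behind the arguments; these
// names are hypothetical, not the tool's actual code.
class OffsetMigrationSketch {
    static int migrate(Map<String, Long> zkOffsets,
                       BiConsumer<String, Long> commitToKafka) {
        int migrated = 0;
        for (Map.Entry<String, Long> e : zkOffsets.entrySet()) {
            commitToKafka.accept(e.getKey(), e.getValue());
            migrated++;
        }
        return migrated;
    }
}
```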

Even though many users have already made the transition, I think this could
still be useful in Kafka. Here are a few reasons:

   - It simplifies the migration for users who have yet to migrate,
   especially as the old clients get deprecated and removed
   - Though the tool is not available in the Kafka 0.8.x or 0.9.x series,
   downloading and using the jar from maven would be fairly straightforward
  - Alternatively this could be a separate repo or jar, though I hardly
  want to push this single tool to maven as a standalone artifact.

Do you think this is useful in Apache Kafka? Any thoughts on the approach?

Thanks,
Grant
-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


[GitHub] kafka pull request #1715: WIP: Add a consumer offset migration tool

2016-08-09 Thread granthenke
GitHub user granthenke opened a pull request:

https://github.com/apache/kafka/pull/1715

WIP: Add a consumer offset migration tool

Please see the dev mailing list for context. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/granthenke/kafka offset-migrator

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1715.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1715


commit 6fb19323075b6caaf1e1253eff10fdfbfef85f56
Author: Grant Henke 
Date:   2016-08-09T17:00:38Z

WIP: Add a consumer offset migration tool






[jira] [Comment Edited] (KAFKA-2629) Enable getting SSL password from an executable rather than passing plaintext password

2016-08-09 Thread Bharat Viswanadham (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413820#comment-15413820
 ] 

Bharat Viswanadham edited comment on KAFKA-2629 at 8/9/16 4:51 PM:
---

[~singhashish] What is the final approach for this solution?



was (Author: bharatviswa):
[~singhashish] What is the final approach for this solution?
Will also be using hadoop credential provider to address this problem?

> Enable getting SSL password from an executable rather than passing plaintext 
> password
> -
>
> Key: KAFKA-2629
> URL: https://issues.apache.org/jira/browse/KAFKA-2629
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 0.9.0.0
>Reporter: Ashish K Singh
>Assignee: Ashish K Singh
>
> Currently there are a couple of options to pass SSL passwords to Kafka, i.e., 
> via properties file or via command line argument. Both of these are not 
> recommended security practices.
> * A password on a command line is a no-no: it's trivial to see that password 
> just by using the 'ps' utility.
> * Putting a password into a file, and then passing the location to that file, 
> is the next best option. The access to the file will be governed by unix 
> access permissions which we all know and love. The downside is that the 
> password is still just sitting there in a file, and those who have access can 
> still see it trivially.
> * The most general, secure solution is to provide a layer of abstraction: 
> provide functionality to get the password from "somewhere else".  The most 
> flexible and generic way to do this is to simply call an executable which 
> returns the desired password. 
> ** The executable is again protected with normal file system privileges
> ** The simplest form, a script that looks like "echo 'my-password'", devolves 
> back to putting the password in a file
> ** A more interesting implementation could open up a local encrypted password 
> store and extract the password from it
> ** A maximally secure implementation could contact an external secret manager 
> with centralized control and audit functionality.
> ** In short: getting the password as the output of a script/executable is 
> maximally generic and enables both simple and complex use cases.
> This JIRA intend to add a config param to enable passing an executable to 
> Kafka for SSL passwords.
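The executable-based approach in the description can be sketched as follows (the class name, method, and shell invocation are illustrative assumptions, not the proposed patch):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hedged sketch: treat the configured value as a command, run it, and use
// the first line of its stdout as the password. A real implementation would
// also need timeouts and careful error reporting.
class PasswordFromExecutable {
    static String run(String command) throws Exception {
        Process p = new ProcessBuilder("/bin/sh", "-c", command).start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line = r.readLine();
            if (p.waitFor() != 0 || line == null)
                throw new IllegalStateException("password command failed: " + command);
            return line.trim();
        }
    }
}
```

The degenerate case `echo 'my-password'` behaves like a password file, while the same hook can call out to an encrypted store or an external secret manager.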





[jira] [Commented] (KAFKA-2629) Enable getting SSL password from an executable rather than passing plaintext password

2016-08-09 Thread Bharat Viswanadham (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413820#comment-15413820
 ] 

Bharat Viswanadham commented on KAFKA-2629:
---

[~singhashish] What is the final approach for this solution?
Will it also use the Hadoop credential provider to address this problem?

> Enable getting SSL password from an executable rather than passing plaintext 
> password
> -
>
> Key: KAFKA-2629
> URL: https://issues.apache.org/jira/browse/KAFKA-2629
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 0.9.0.0
>Reporter: Ashish K Singh
>Assignee: Ashish K Singh
>
> Currently there are a couple of options to pass SSL passwords to Kafka, i.e., 
> via properties file or via command line argument. Both of these are not 
> recommended security practices.
> * A password on a command line is a no-no: it's trivial to see that password 
> just by using the 'ps' utility.
> * Putting a password into a file, and then passing the location to that file, 
> is the next best option. The access to the file will be governed by unix 
> access permissions which we all know and love. The downside is that the 
> password is still just sitting there in a file, and those who have access can 
> still see it trivially.
> * The most general, secure solution is to provide a layer of abstraction: 
> provide functionality to get the password from "somewhere else".  The most 
> flexible and generic way to do this is to simply call an executable which 
> returns the desired password. 
> ** The executable is again protected with normal file system privileges
> ** The simplest form, a script that looks like "echo 'my-password'", devolves 
> back to putting the password in a file
> ** A more interesting implementation could open up a local encrypted password 
> store and extract the password from it
> ** A maximally secure implementation could contact an external secret manager 
> with centralized control and audit functionality.
> ** In short: getting the password as the output of a script/executable is 
> maximally generic and enables both simple and complex use cases.
> This JIRA intend to add a config param to enable passing an executable to 
> Kafka for SSL passwords.





[jira] [Commented] (KAFKA-2629) Enable getting SSL password from an executable rather than passing plaintext password

2016-08-09 Thread Bharat Viswanadham (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413818#comment-15413818
 ] 

Bharat Viswanadham commented on KAFKA-2629:
---

[~singhashish] Thank you for the update.
I hope it gets approved and makes it into Kafka soon.

> Enable getting SSL password from an executable rather than passing plaintext 
> password
> -
>
> Key: KAFKA-2629
> URL: https://issues.apache.org/jira/browse/KAFKA-2629
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 0.9.0.0
>Reporter: Ashish K Singh
>Assignee: Ashish K Singh
>
> Currently there are a couple of options to pass SSL passwords to Kafka, i.e., 
> via properties file or via command line argument. Both of these are not 
> recommended security practices.
> * A password on a command line is a no-no: it's trivial to see that password 
> just by using the 'ps' utility.
> * Putting a password into a file, and then passing the location to that file, 
> is the next best option. The access to the file will be governed by unix 
> access permissions which we all know and love. The downside is that the 
> password is still just sitting there in a file, and those who have access can 
> still see it trivially.
> * The most general, secure solution is to provide a layer of abstraction: 
> provide functionality to get the password from "somewhere else".  The most 
> flexible and generic way to do this is to simply call an executable which 
> returns the desired password. 
> ** The executable is again protected with normal file system privileges
> ** The simplest form, a script that looks like "echo 'my-password'", devolves 
> back to putting the password in a file
> ** A more interesting implementation could open up a local encrypted password 
> store and extract the password from it
> ** A maximally secure implementation could contact an external secret manager 
> with centralized control and audit functionality.
> ** In short: getting the password as the output of a script/executable is 
> maximally generic and enables both simple and complex use cases.
> This JIRA intend to add a config param to enable passing an executable to 
> Kafka for SSL passwords.





Re: [DISCUSS] KIP-73: Replication Quotas

2016-08-09 Thread Jun Rao
Just to elaborate on Ben's point about why we need throttling on both the
leader and the follower side.

If we only have throttling on the follower side, consider a case where we
add 5 new brokers and want to move some replicas from existing brokers
over to those 5 brokers. Each of those brokers is going to fetch data from
all existing brokers. Then, it's possible that the aggregated fetch load
from those 5 brokers on a particular existing broker exceeds its outgoing
network bandwidth, even though the inbound traffic on each of those 5
brokers is bounded.

If we only have throttling on the leader side, consider the same example
above. It's possible for the incoming traffic to each of those 5 brokers to
exceed its network bandwidth since it is fetching data from all existing
brokers.

So, being able to set a quota on both the follower and the leader side
protects both cases.

Thanks,

Jun
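The two worst cases Jun describes can be made concrete with toy numbers (all figures below are assumptions for illustration, not from the KIP):

```java
// Toy model of the two failure modes: a follower-only quota bounds each new
// broker's inbound rate but not a leader's aggregate outbound rate, and a
// leader-only quota has the symmetric problem.
class ReplicationQuotaScenario {
    // Worst case: all new brokers replicate mostly from one existing leader.
    static double leaderEgressMBps(int newBrokers, double followerQuotaMBps) {
        return newBrokers * followerQuotaMBps;
    }
    // Worst case: all existing leaders serve one new broker at full quota.
    static double followerIngressMBps(int existingBrokers, double leaderQuotaMBps) {
        return existingBrokers * leaderQuotaMBps;
    }
}
```

With 5 new brokers each capped at 50 MB/s inbound, a single leader may still serve 250 MB/s outbound; with 10 leaders each capped at 50 MB/s outbound, a single follower may still receive 500 MB/s inbound. Quotas on both sides bound both directions.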

On Tue, Aug 9, 2016 at 4:43 AM, Ben Stopford  wrote:

> Hi Joel
>
> Thanks for taking the time to look at this. Appreciated.
>
> Regarding throttling on both leader and follower, this proposal covers a
> more general solution which can guarantee a quota, even when a rebalance
> operation produces an asymmetric profile of load. This means administrators
> don’t need to calculate the impact that a follower-only quota will have on
> the leaders they are fetching from. So for example where replica sizes are
> skewed or where a partial rebalance is required.
>
> Having said that, even with both leader and follower quotas, the use of
> additional threads is actually optional. There appear to be two general
> approaches (1) omit partitions from fetch requests (follower) / fetch
> responses (leader) when they exceed their quota (2) delay them, as the
> existing quota mechanism does, using separate fetchers. Both appear valid,
> but with slightly different design tradeoffs.
>
> The issue with approach (1) is that it departs somewhat from the existing
> quotas implementation, and must include a notion of fairness within, the
> now size-bounded, request and response. The issue with (2) is guaranteeing
> ordering of updates when replicas shift threads, but this is handled, for
> the most part, in the code today.
>
> I’ve updated the rejected alternatives section to make this a little
> clearer.
>
> B
>
>
>
> > On 8 Aug 2016, at 20:38, Joel Koshy  wrote:
> >
> > Hi Ben,
> >
> > Thanks for the detailed write-up. So the proposal involves
> self-throttling
> > on the fetcher side and throttling at the leader. Can you elaborate on
> the
> > reasoning that is given on the wiki: *“The throttle is applied to both
> > leaders and followers. This allows the admin to exert strong guarantees
> on
> > the throttle limit".* Is there any reason why one or the other wouldn't
> be
> > sufficient.
> >
> > Specifically, if we were to only do self-throttling on the fetchers, we
> > could potentially avoid the additional replica fetchers right? i.e., the
> > replica fetchers would maintain its quota metrics as you proposed and
> each
> > (normal) replica fetch presents an opportunity to make progress for the
> > throttled partitions as long as their effective consumption rate is below
> > the quota limit. If it exceeds the consumption rate then don’t include
> the
> > throttled partitions in the subsequent fetch requests until the effective
> > consumption rate for those partitions returns to within the quota
> threshold.
> >
> > I have more questions on the proposal, but was more interested in the
> above
> > to see if it could simplify things a bit.
> >
> > Also, can you open up access to the google-doc that you link to?
> >
> > Thanks,
> >
> > Joel
> >
> > On Mon, Aug 8, 2016 at 5:54 AM, Ben Stopford  wrote:
> >
> >> We’ve created KIP-73: Replication Quotas
> >>
> >> The idea is to allow an admin to throttle moving replicas. Full details
> >> are here:
> >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+
> >> Replication+Quotas
> >>
> >> Please take a look and let us know your thoughts.
> >>
> >> Thanks
> >>
> >> B
> >>
> >>
>
>


[jira] [Commented] (KAFKA-4022) TopicCommand is using default max.message.bytes instead of broker's setting

2016-08-09 Thread Vahid Hashemian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413806#comment-15413806
 ] 

Vahid Hashemian commented on KAFKA-4022:


[~kornicameis...@gmail.com] Yes, I see what you're saying. It should read from 
the actual setting (not necessarily the default) and report the warning only if 
the topic-level setting is greater than the active broker setting. I hope I got 
that right this time :)
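The corrected check can be sketched as (hypothetical helper, not TopicCommand's actual code):

```java
// Warn based on the broker's *configured* max.message.bytes, read from the
// live broker config, rather than the compiled-in default. The helper name
// and long-based signature are assumptions for illustration.
class MaxMessageBytesCheck {
    static boolean shouldWarn(long topicMaxMessageBytes, long brokerMaxMessageBytes) {
        return topicMaxMessageBytes > brokerMaxMessageBytes;
    }
}
```

A topic-level value of 2 MB would then warn against a broker configured at 1 MB but not against one configured at 4 MB, regardless of what the default is.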

> TopicCommand is using default max.message.bytes instead of broker's setting
> ---
>
> Key: KAFKA-4022
> URL: https://issues.apache.org/jira/browse/KAFKA-4022
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0
> Environment: Linux testHost 3.10.0-229.el7.x86_64 #1 SMP Fri Mar 6 
> 11:36:42 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Tomasz Trębski
>
> Even though it is possible to configure brokers to support messages that are 
> bigger than 1 MB, the admin tool for creating topics won't accept one whose 
> size is above 1,000,000 bytes. 
> The responsible line can be found under this link:
> [TopicCommand|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/admin/TopicCommand.scala#L366]
> The console output also seems to confirm that:
>  ERROR 
> {code:title=console_otuput|borderStyle=solid}
> 
> *** WARNING: you are creating a topic where the max.message.bytes is greater 
> than the broker  ***
> *** default. This operation is dangerous. There are two potential side 
> effects:  ***
> *** - Consumers will get failures if their fetch.message.max.bytes < the 
> value you are using ***
> *** - Producer requests larger than replica.fetch.max.bytes will not 
> replicate and hence have***
> ***   a higher risk of data loss  
>***
> *** You should ensure both of these settings are greater than the value set 
> here before using***
> *** this topic.   
>***
> 
> - value set here: 1048576
> - Default Broker replica.fetch.max.bytes: 1048576
> - Default Broker max.message.bytes: 112
> - Default Consumer fetch.message.max.bytes: 1048576
> {code}





[jira] [Assigned] (KAFKA-4022) TopicCommand is using default max.message.bytes instead of broker's setting

2016-08-09 Thread Vahid Hashemian (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vahid Hashemian reassigned KAFKA-4022:
--

Assignee: Vahid Hashemian

> TopicCommand is using default max.message.bytes instead of broker's setting
> ---
>
> Key: KAFKA-4022
> URL: https://issues.apache.org/jira/browse/KAFKA-4022
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0
> Environment: Linux testHost 3.10.0-229.el7.x86_64 #1 SMP Fri Mar 6 
> 11:36:42 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Tomasz Trębski
>Assignee: Vahid Hashemian
>
> Even though it is possible to configure brokers to support messages that are 
> bigger than 1 MB, the admin tool for creating topics won't accept one whose 
> size is above 1,000,000 bytes. 
> The responsible line can be found under this link:
> [TopicCommand|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/admin/TopicCommand.scala#L366]
> The console output also seems to confirm that:
>  ERROR 
> {code:title=console_otuput|borderStyle=solid}
> 
> *** WARNING: you are creating a topic where the max.message.bytes is greater 
> than the broker  ***
> *** default. This operation is dangerous. There are two potential side 
> effects:  ***
> *** - Consumers will get failures if their fetch.message.max.bytes < the 
> value you are using ***
> *** - Producer requests larger than replica.fetch.max.bytes will not 
> replicate and hence have***
> ***   a higher risk of data loss  
>***
> *** You should ensure both of these settings are greater than the value set 
> here before using***
> *** this topic.   
>***
> 
> - value set here: 1048576
> - Default Broker replica.fetch.max.bytes: 1048576
> - Default Broker max.message.bytes: 112
> - Default Consumer fetch.message.max.bytes: 1048576
> {code}





[jira] [Updated] (KAFKA-3934) Start scripts enable GC by default with no way to disable

2016-08-09 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-3934:
-
   Resolution: Fixed
Fix Version/s: 0.10.1.0
   Status: Resolved  (was: Patch Available)

Issue resolved by pull request 1631
[https://github.com/apache/kafka/pull/1631]

> Start scripts enable GC by default with no way to disable
> -
>
> Key: KAFKA-3934
> URL: https://issues.apache.org/jira/browse/KAFKA-3934
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.0
>Reporter: Grant Henke
>Assignee: Grant Henke
> Fix For: 0.10.1.0
>
>
> In KAFKA-1127 the following line was added to kafka-server-start.sh:
> {noformat}
> EXTRA_ARGS="-name kafkaServer -loggc"
> {noformat}
> This prevents gc logging from being disabled without some unusual environment 
> variable workarounds. 
> I suggest EXTRA_ARGS is made overridable like below: 
> {noformat}
> if [ "x$EXTRA_ARGS" = "x" ]; then
> export EXTRA_ARGS="-name kafkaServer -loggc"
> fi
> {noformat}
> *Note:* I am also not sure I understand why the existing code uses the "x" 
> thing when checking the variable instead of the following:
> {noformat}
> export EXTRA_ARGS=${EXTRA_ARGS-'-name kafkaServer -loggc'}
> {noformat}
> This lets the variable be overridden to "" without taking the default. 
> *Workaround:* As a workaround the user should be able to set 
> $KAFKA_GC_LOG_OPTS to fit their needs. Since kafka-run-class.sh will not 
> ignore the -loggc parameter if that is set. 
> {noformat}
> -loggc)
>   if [ -z "$KAFKA_GC_LOG_OPTS" ]; then
> GC_LOG_ENABLED="true"
>   fi
>   shift
> {noformat}
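The difference between the two forms in the note above can be demonstrated directly (a minimal shell repro, not taken from the Kafka scripts): ${VAR-default} substitutes only when VAR is unset, while the [ "x$VAR" = "x" ] test also treats an explicitly empty value as unset.

```shell
# ${EXTRA_ARGS-default} keeps an explicitly empty value, so a user can opt
# out of -loggc by exporting EXTRA_ARGS=""; the x-check cannot distinguish
# "empty" from "unset" and would re-apply the default either way.
unset EXTRA_ARGS
with_unset="${EXTRA_ARGS--name kafkaServer -loggc}"

EXTRA_ARGS=""
with_empty="${EXTRA_ARGS--name kafkaServer -loggc}"

echo "unset: '$with_unset'"
echo "empty: '$with_empty'"
```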





[jira] [Commented] (KAFKA-3934) Start scripts enable GC by default with no way to disable

2016-08-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413794#comment-15413794
 ] 

ASF GitHub Bot commented on KAFKA-3934:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1631


> Start scripts enable GC by default with no way to disable
> -
>
> Key: KAFKA-3934
> URL: https://issues.apache.org/jira/browse/KAFKA-3934
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.0
>Reporter: Grant Henke
>Assignee: Grant Henke
> Fix For: 0.10.1.0
>
>
> In KAFKA-1127 the following line was added to kafka-server-start.sh:
> {noformat}
> EXTRA_ARGS="-name kafkaServer -loggc"
> {noformat}
> This prevents gc logging from being disabled without some unusual environment 
> variable workarounds. 
> I suggest EXTRA_ARGS is made overridable like below: 
> {noformat}
> if [ "x$EXTRA_ARGS" = "x" ]; then
> export EXTRA_ARGS="-name kafkaServer -loggc"
> fi
> {noformat}
> *Note:* I am also not sure I understand why the existing code uses the "x" 
> thing when checking the variable instead of the following:
> {noformat}
> export EXTRA_ARGS=${EXTRA_ARGS-'-name kafkaServer -loggc'}
> {noformat}
> This lets the variable be overridden to "" without taking the default. 
> *Workaround:* As a workaround the user should be able to set 
> $KAFKA_GC_LOG_OPTS to fit their needs. Since kafka-run-class.sh will not 
> ignore the -loggc parameter if that is set. 
> {noformat}
> -loggc)
>   if [ -z "$KAFKA_GC_LOG_OPTS" ]; then
> GC_LOG_ENABLED="true"
>   fi
>   shift
> {noformat}





[GitHub] kafka pull request #1631: KAFKA-3934: Start scripts enable GC by default wit...

2016-08-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1631




Re: Review request for KAFKA-3600

2016-08-09 Thread Ashish Singh
Provided the wrong link to the PR earlier; here is the PR for KAFKA-3600.

On Tue, Aug 9, 2016 at 9:45 AM, Ashish Singh  wrote:

> Hey Guys,
>
> KAFKA-3600  was part of
> KIP-35's proposal. KAFKA-3307
> ,
> adding ApiVersionsRequest/Response, was committed to 0.10.0.0, but
> KAFKA-3600, enhancing java clients, is still under review. Here is the PR
> 
>
> I have addressed all review comments and have been waiting for further
> reviews / for this to go in for quite some time. I would really appreciate
> it if a committer could help make progress on this.
>
> --
>
> Regards,
> Ashish
>



-- 

Regards,
Ashish


[GitHub] kafka pull request #1487: MINOR: kafka-run-class.sh now runs under Cygwin.

2016-08-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1487




Review request for KAFKA-3600

2016-08-09 Thread Ashish Singh
Hey Guys,

KAFKA-3600  was part of
KIP-35's proposal. KAFKA-3307
,
adding ApiVersionsRequest/Response, was committed to 0.10.0.0, but
KAFKA-3600, enhancing java clients, is still under review. Here is the PR


I have addressed all review comments and have been waiting for further
reviews, and for this to go in, for quite some time. I would really appreciate
it if a committer could help with making progress on this.

-- 

Regards,
Ashish


[jira] [Commented] (KAFKA-2629) Enable getting SSL password from an executable rather than passing plaintext password

2016-08-09 Thread Ashish K Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413738#comment-15413738
 ] 

Ashish K Singh commented on KAFKA-2629:
---

[~bharatviswa] I would like this to go in. We had quite a bit of discussion 
here, but now that it has been a while maybe we can revisit this. [~sriharsha] 
do you think it is OK to add this as an optional config? If so, would you be 
willing to be the reviewer on this? I can work on a patch if we have at least 
one committer willing to commit this.

> Enable getting SSL password from an executable rather than passing plaintext 
> password
> -
>
> Key: KAFKA-2629
> URL: https://issues.apache.org/jira/browse/KAFKA-2629
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 0.9.0.0
>Reporter: Ashish K Singh
>Assignee: Ashish K Singh
>
> Currently there are a couple of options to pass SSL passwords to Kafka, i.e., 
> via properties file or via command line argument. Both of these are not 
> recommended security practices.
> * A password on a command line is a no-no: it's trivial to see that password 
> just by using the 'ps' utility.
> * Putting a password into a file, and then passing the location to that file, 
> is the next best option. The access to the file will be governed by unix 
> access permissions which we all know and love. The downside is that the 
> password is still just sitting there in a file, and those who have access can 
> still see it trivially.
> * The most general, secure solution is to provide a layer of abstraction: 
> provide functionality to get the password from "somewhere else".  The most 
> flexible and generic way to do this is to simply call an executable which 
> returns the desired password. 
> ** The executable is again protected with normal file system privileges
> ** The simplest form, a script that looks like "echo 'my-password'", devolves 
> back to putting the password in a file
> ** A more interesting implementation could open up a local encrypted password 
> store and extract the password from it
> ** A maximally secure implementation could contact an external secret manager 
> with centralized control and audit functionality.
> ** In short: getting the password as the output of a script/executable is 
> maximally generic and enables both simple and complex use cases.
> This JIRA intends to add a config param to enable passing an executable to 
> Kafka for SSL passwords.
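A minimal sketch of the executable-based lookup proposed above. Everything here is illustrative: KAFKA-2629 has not fixed a config name, a provider interface, or a storage location, so the file path, the function standing in for the provider executable, and the flow are assumptions, not Kafka code.

```shell
# Sketch of the proposed flow: Kafka would run a user-supplied executable and
# read the SSL password from its stdout. Names and paths are illustrative.

# Stand-in for the protected storage: a file readable only by the broker user.
secret_file=$(mktemp)
chmod 600 "$secret_file"
printf 'not-a-real-password' > "$secret_file"

# The "provider executable" in its simplest form just cats the protected file;
# a richer provider could decrypt a local store or query a secret manager.
get_ssl_password() {
  cat "$secret_file"
}

# The broker side would capture stdout as the password, so the plaintext never
# appears in server.properties or on a command line visible to 'ps'.
password=$(get_ssl_password)
printf 'fetched a %s-character password\n' "${#password}"

rm -f "$secret_file"
```

A hypothetical broker setting such as `ssl.keystore.password.command=/path/to/provider` (name invented here, not a real Kafka config) would wire this in.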





Re: [VOTE] KIP-70: Revise Partition Assignment Semantics on New Consumer's Subscription Change

2016-08-09 Thread Jay Kreps
+1

On Mon, Aug 8, 2016 at 11:53 AM, Vahid S Hashemian <
vahidhashem...@us.ibm.com> wrote:

> I would like to initiate the voting process for KIP-70 (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 70%3A+Revise+Partition+Assignment+Semantics+on+New+
> Consumer%27s+Subscription+Change
> ).
>
> The only issue that was discussed in the discussion thread is
> compatibility, but because it applies to an edge case, it is not expected
> to impact existing users.
> The proposal was shared with Spark and Storm users and no issue was raised
> by those communities.
>
> Thanks.
>
> Regards,
> --Vahid
>
>


[jira] [Updated] (KAFKA-3934) Start scripts enable GC by default with no way to disable

2016-08-09 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KAFKA-3934:
---
Summary: Start scripts enable GC by default with no way to disable  (was: 
kafka-server-start.sh enables GC by default with no way to disable)

> Start scripts enable GC by default with no way to disable
> -
>
> Key: KAFKA-3934
> URL: https://issues.apache.org/jira/browse/KAFKA-3934
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.0
>Reporter: Grant Henke
>Assignee: Grant Henke
>
> In KAFKA-1127 the following line was added to kafka-server-start.sh:
> {noformat}
> EXTRA_ARGS="-name kafkaServer -loggc"
> {noformat}
> This prevents GC logging from being disabled without some unusual environment 
> variable workarounds. 
> I suggest EXTRA_ARGS be made overridable like below: 
> {noformat}
> if [ "x$EXTRA_ARGS" = "x" ]; then
> export EXTRA_ARGS="-name kafkaServer -loggc"
> fi
> {noformat}
> *Note:* I am also not sure why the existing code uses the "x"-prefix idiom 
> when checking the variable instead of the following:
> {noformat}
> export EXTRA_ARGS=${EXTRA_ARGS-'-name kafkaServer -loggc'}
> {noformat}
> This lets the variable be overridden to "" without taking the default. 
> *Workaround:* As a workaround, the user should be able to set 
> $KAFKA_GC_LOG_OPTS to fit their needs, since kafka-run-class.sh skips the 
> default GC logging options for -loggc when that variable is set: 
> {noformat}
> -loggc)
>   if [ -z "$KAFKA_GC_LOG_OPTS" ]; then
> GC_LOG_ENABLED="true"
>   fi
>   shift
> {noformat}
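The behavioral difference between the two guard styles quoted above can be checked directly in any POSIX shell. This is a self-contained demo, not Kafka code; only the default value is taken from the report:

```shell
# Compares the [ "x$VAR" = "x" ] guard with ${VAR-default} expansion.
# The key difference: only the latter lets EXTRA_ARGS be overridden to "".

default='-name kafkaServer -loggc'

# Unset: both forms fall back to the default.
unset EXTRA_ARGS
a="${EXTRA_ARGS-$default}"

# Set-but-empty: ${VAR-default} keeps the empty override...
EXTRA_ARGS=""
b="${EXTRA_ARGS-$default}"

# ...while the "x" test treats "" the same as unset and reapplies the default.
if [ "x$EXTRA_ARGS" = "x" ]; then
  c="$default"
else
  c="$EXTRA_ARGS"
fi

printf 'unset, dash-default -> %s\n' "$a"
printf 'empty, dash-default -> [%s]\n' "$b"
printf 'empty, x-test       -> %s\n' "$c"
```

Note that `${EXTRA_ARGS:-$default}` (with the colon) would behave like the "x" test, which is presumably why the colon-less form is suggested in the report.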





Re: [DISCUSS] KIP-72 Allow Sizing Incoming Request Queue in Bytes

2016-08-09 Thread Jun Rao
Radi,

Yes, I got the benefit of bounding the request queue by bytes. My concern
is the following if we don't change the behavior of processor blocking on
queue full.

If the broker truly doesn't have enough memory for buffering outstanding
requests from all connections, we have to either hit OOM or block the
processor. Both will be bad. I am not sure if one is clearly better than
the other. In this case, the solution is probably to expand the cluster to
reduce the per broker request load.

If the broker actually has enough memory, we want to be able to configure
the request queue in such a way that it never blocks. You can tell people
to just set the request queue to be unbounded, which may scare them. If we
do want to put a bound, it seems it's easier to configure the queue size
based on # requests. Basically, we can tell people to set the queue size
based on number of connections. If the queue is based on bytes, it's not
clear how people should set it w/o causing the processor to block.

Finally, Rajini has a good point. The ByteBuffer in the request object is
allocated as soon as we see the first 4 bytes from the socket. So, I am not
sure if just bounding the request queue itself is enough to bound the
memory related to requests.

Thanks,

Jun



On Mon, Aug 8, 2016 at 4:46 PM, radai  wrote:

> I agree that filling up the request queue can cause clients to time out
> (and presumably retry?). However, for the workloads where we expect this
> configuration to be useful the alternative is currently an OOM crash.
> In my opinion an initial implementation of this feature could be
> constrained to a simple drop-in replacement of ArrayBlockingQueue
> (conditional, opt-in) and further study of behavior patterns under load can
> drive future changes to the API later when those behaviors are better
> understood (like back-pressure, nop filler responses to avoid client
> timeouts or whatever).
>
> On Mon, Aug 8, 2016 at 2:23 PM, Mayuresh Gharat <
> gharatmayures...@gmail.com>
> wrote:
>
> > Nice write up Radai.
> > I think what Jun said is a valid concern.
> > If I am not wrong as per the proposal, we are depending on the entire
> > pipeline to flow smoothly from accepting requests to handling it, calling
> > KafkaApis and handing back the responses.
> >
> > Thanks,
> >
> > Mayuresh
> >
> >
> > On Mon, Aug 8, 2016 at 12:22 PM, Joel Koshy  wrote:
> >
> > > >
> > > > .
> > > >>
> > > >>
> > > > Hi Becket,
> > > >
> > > > I don't think progress can be made in the processor's run loop if the
> > > > queue fills up. i.e., I think Jun's point is that if the queue is
> full
> > > > (either due to the proposed max.bytes or today due to max.requests
> > > hitting
> > > > the limit) then processCompletedReceives will block and no further
> > > progress
> > > > can be made.
> > > >
> > >
> > > I'm sorry - this isn't right. There will be progress as long as the API
> > > handlers are able to pick requests off the request queue and add the
> > > responses to the response queues (which are effectively unbounded).
> > > However, the point is valid that blocking in the request channel's put
> > has
> > > the effect of exacerbating the pressure on the socket server.
> > >
> > >
> > > >
> > > >>
> > > >> On Mon, Aug 8, 2016 at 10:04 AM, Jun Rao  wrote:
> > > >>
> > > >> > Radai,
> > > >> >
> > > >> > Thanks for the proposal. A couple of comments on this.
> > > >> >
> > > >> > 1. Since we store request objects in the request queue, how do we
> > get
> > > an
> > > >> > accurate size estimate for those requests?
> > > >> >
> > > >> > 2. Currently, it's bad if the processor blocks on adding a request
> > to
> > > >> the
> > > >> > request queue. Once blocked, the processor can't process the
> sending
> > > of
> > > >> > responses of other socket keys either. This will cause all clients
> > in
> > > >> this
> > > >> > processor with an outstanding request to eventually timeout.
> > > Typically,
> > > >> > this will trigger client-side retries, which will add more load on
> > the
> > > >> > broker and cause potentially more congestion in the request queue.
> > > With
> > > >> > queued.max.requests, to prevent blocking on the request queue, our
> > > >> > recommendation is to configure queued.max.requests to be the same
> as
> > > the
> > > >> > number of socket connections on the broker. Since the broker never
> > > >> > processes more than 1 request per connection at a time, the
> request
> > > >> queue
> > > >> > will never be blocked. With queued.max.bytes, it's going to be
> > harder
> > > to
> > > >> > configure the value properly to prevent blocking.
> > > >> >
> > > >> > So, while adding queued.max.bytes is potentially useful for memory
> > > >> > management, for it to be truly useful, we probably need to address
> > the
> > > >> > processor blocking issue for it to be really useful in practice.
> One
> > > >> > possibility is to put back-pressure to the client 

Re: [DISCUSS] KIP-72 Allow Sizing Incoming Request Queue in Bytes

2016-08-09 Thread Rajini Sivaram
I like the simplicity of the approach and can see that it is an improvement
over the current implementation in typical scenarios. But I would like to
see Jun's proposal to mute sockets explored further. With the proposal in
the KIP to limit queue size, I am not sure how to calculate the total
memory requirements for brokers to avoid OOM, particularly to avoid DoS
attacks in a secure cluster. Even though the KIP says, the actual memory
bound is "queued.max.bytes + socket.request.max.bytes",  it seems to me
that even with the queue limits, brokers will need close to
(socket.request.max.bytes
* n) bytes of memory to guarantee they don't OOM (where n is the number of
connections). That upper limit is no different to the current
implementation. With client quotas that delay responses and client requests
that may be retried due to blocking processors, as soon as there is space
in the request queue that unblocks the processor, the next select operation
could read in a lot of data from a lot of connections. So controlling the
amount of data read from the socket could be as important as controlling
the request queue size.
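A back-of-envelope check of that worst case, with purely illustrative numbers (these are not recommended settings):

```shell
# Worst case sketched above: each of n connections can hold one request of up
# to socket.request.max.bytes outside the queue, so the queue bound alone
# does not cap total request memory. All numbers are illustrative.
queued_max_bytes=$((512 * 1024 * 1024))          # proposed queue bound
socket_request_max_bytes=$((100 * 1024 * 1024))  # per-request ceiling
connections=50

kip_stated_bound=$((queued_max_bytes + socket_request_max_bytes))
per_connection_worst_case=$((connections * socket_request_max_bytes))

echo "queued.max.bytes + socket.request.max.bytes = $((kip_stated_bound / 1048576)) MB"
echo "n * socket.request.max.bytes                = $((per_connection_worst_case / 1048576)) MB"
```

With these inputs the KIP's stated bound is 612 MB, while the n-connections worst case is 5000 MB, which is the gap the message above is pointing at.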

Don't mind this being done in two stages, but it will be good to understand
what the final solution would look like before this one is committed.


On Tue, Aug 9, 2016 at 1:58 AM, Becket Qin  wrote:

> Thought about this again. If I understand correctly Jun's concern is about
> the cascading effect. Currently the processor will try to put all the
> requests received in one poll() call into the RequestChannel. This could
> potentially be long if the queue is moving really really slowly. If we
> don't mute the sockets and clients time out then reconnect, the processor
> may read more requests from the new sockets and take even longer to put
> them into the RequestChannel. So it would be good to see if we can avoid this
> cascading effect.
>
> On the other hand, allowing the processor to keep reading requests from the
> sockets when the RequestChannel is full also seems to hurt the memory usage
> control effort.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Mon, Aug 8, 2016 at 4:46 PM, radai  wrote:
>
> > I agree that filling up the request queue can cause clients to time out
> > (and presumably retry?). However, for the workloads where we expect this
> > configuration to be useful the alternative is currently an OOM crash.
> > In my opinion an initial implementation of this feature could be
> > constrained to a simple drop-in replacement of ArrayBlockingQueue
> > (conditional, opt-in) and further study of behavior patterns under load
> can
> > drive future changes to the API later when those behaviors are better
> > understood (like back-pressure, nop filler responses to avoid client
> > timeouts or whatever).
> >
> > On Mon, Aug 8, 2016 at 2:23 PM, Mayuresh Gharat <
> > gharatmayures...@gmail.com>
> > wrote:
> >
> > > Nice write up Radai.
> > > I think what Jun said is a valid concern.
> > > If I am not wrong as per the proposal, we are depending on the entire
> > > pipeline to flow smoothly from accepting requests to handling it,
> calling
> > > KafkaApis and handing back the responses.
> > >
> > > Thanks,
> > >
> > > Mayuresh
> > >
> > >
> > > On Mon, Aug 8, 2016 at 12:22 PM, Joel Koshy 
> wrote:
> > >
> > > > >
> > > > > .
> > > > >>
> > > > >>
> > > > > Hi Becket,
> > > > >
> > > > > I don't think progress can be made in the processor's run loop if
> the
> > > > > queue fills up. i.e., I think Jun's point is that if the queue is
> > full
> > > > > (either due to the proposed max.bytes or today due to max.requests
> > > > hitting
> > > > > the limit) then processCompletedReceives will block and no further
> > > > progress
> > > > > can be made.
> > > > >
> > > >
> > > > I'm sorry - this isn't right. There will be progress as long as the
> API
> > > > handlers are able to pick requests off the request queue and add the
> > > > responses to the response queues (which are effectively unbounded).
> > > > However, the point is valid that blocking in the request channel's
> put
> > > has
> > > > the effect of exacerbating the pressure on the socket server.
> > > >
> > > >
> > > > >
> > > > >>
> > > > >> On Mon, Aug 8, 2016 at 10:04 AM, Jun Rao 
> wrote:
> > > > >>
> > > > >> > Radai,
> > > > >> >
> > > > >> > Thanks for the proposal. A couple of comments on this.
> > > > >> >
> > > > >> > 1. Since we store request objects in the request queue, how do
> we
> > > get
> > > > an
> > > > >> > accurate size estimate for those requests?
> > > > >> >
> > > > >> > 2. Currently, it's bad if the processor blocks on adding a
> request
> > > to
> > > > >> the
> > > > >> > request queue. Once blocked, the processor can't process the
> > sending
> > > > of
> > > > >> > responses of other socket keys either. This will cause all
> clients
> > > in
> > > > >> this
> > > > >> > processor with an outstanding request to eventually 

Re: [VOTE] KIP-70: Revise Partition Assignment Semantics on New Consumer's Subscription Change

2016-08-09 Thread Dana Powers
+1 (still confused about bindings)

On Tue, Aug 9, 2016 at 6:24 AM, Ismael Juma  wrote:
> Thanks for the KIP. +1 (binding)
>
> Ismael
>
> On Mon, Aug 8, 2016 at 7:53 PM, Vahid S Hashemian > wrote:
>
>> I would like to initiate the voting process for KIP-70 (
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> 70%3A+Revise+Partition+Assignment+Semantics+on+New+
>> Consumer%27s+Subscription+Change
>> ).
>>
>> The only issue that was discussed in the discussion thread is
>> compatibility, but because it applies to an edge case, it is not expected
>> to impact existing users.
>> The proposal was shared with Spark and Storm users and no issue was raised
>> by those communities.
>>
>> Thanks.
>>
>> Regards,
>> --Vahid
>>
>>


Re: [VOTE] KIP-70: Revise Partition Assignment Semantics on New Consumer's Subscription Change

2016-08-09 Thread Ismael Juma
Thanks for the KIP. +1 (binding)

Ismael

On Mon, Aug 8, 2016 at 7:53 PM, Vahid S Hashemian  wrote:

> I would like to initiate the voting process for KIP-70 (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 70%3A+Revise+Partition+Assignment+Semantics+on+New+
> Consumer%27s+Subscription+Change
> ).
>
> The only issue that was discussed in the discussion thread is
> compatibility, but because it applies to an edge case, it is not expected
> to impact existing users.
> The proposal was shared with Spark and Storm users and no issue was raised
> by those communities.
>
> Thanks.
>
> Regards,
> --Vahid
>
>


Re: [VOTE] KIP-70: Revise Partition Assignment Semantics on New Consumer's Subscription Change

2016-08-09 Thread Rajini Sivaram
Vahid, Thanks for the KIP.

+1 (non-binding)

On Mon, Aug 8, 2016 at 8:09 PM, Jason Gustafson  wrote:

> Thanks Vahid. +1 from me (non-binding).
>
> On Mon, Aug 8, 2016 at 11:53 AM, Vahid S Hashemian <
> vahidhashem...@us.ibm.com> wrote:
>
> > I would like to initiate the voting process for KIP-70 (
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 70%3A+Revise+Partition+Assignment+Semantics+on+New+
> > Consumer%27s+Subscription+Change
> > ).
> >
> > The only issue that was discussed in the discussion thread is
> > compatibility, but because it applies to an edge case, it is not expected
> > to impact existing users.
> > The proposal was shared with Spark and Storm users and no issue was
> raised
> > by those communities.
> >
> > Thanks.
> >
> > Regards,
> > --Vahid
> >
> >
>



-- 
Regards,

Rajini


ISR issues

2016-08-09 Thread Pascu, Ciprian (Nokia - FI/Espoo)
Hi,

We are testing a Kafka cluster with high traffic loads (2+ messages/second) 
and quite frequently encounter issues with brokers persistently dropping out 
of the ISR for some partitions. Looking at the code, I noticed in class 
AbstractFetcherThread, processFetchRequest method, that a KafkaException is 
thrown when an exception other than CorruptRecordException is raised from the 
processPartitionData method. In my understanding, this will cause the fetcher 
thread to end, so the replica update it was doing will stop and the broker 
will be removed from some ISR lists. Couldn't we, in this case also, just log 
an error message and update 'partitionsWithError' (like it's done in the 'case 
OFFSET_OUT_OF_RANGE' and the 'case _' branches)?


Ciprian.




Re: [DISCUSS] KIP-55: Secure quotas for authenticated users

2016-08-09 Thread Rajini Sivaram
Hi Tom,

Have updated the KIP wiki. Will submit a PR later this week.

Regards,

Rajini

On Tue, Aug 9, 2016 at 12:30 PM, Tom Crayford  wrote:

> Seeing as voting passed on this, can somebody with access update the wiki?
>
> Is there code for this KIP in a PR somewhere that needs merging?
>
> Thanks
> Tom Crayford
> Heroku Kafka
>
> On Friday, 1 July 2016, Rajini Sivaram 
> wrote:
>
> > Thank you, Jun.
> >
> > Hi all,
> >
> > Please let me know if you have any comments or suggestions on the updated
> > KIP. If there are no objections, I will initiate voting next week.
> >
> > Thank you...
> >
> >
> > On Thu, Jun 30, 2016 at 10:37 PM, Jun Rao  >
> > wrote:
> >
> > > Rajini,
> > >
> > > The latest wiki looks good to me. Perhaps you want to ask other people
> to
> > > also take a look and then we can start the voting.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Tue, Jun 28, 2016 at 6:27 AM, Rajini Sivaram <
> > > rajinisiva...@googlemail.com > wrote:
> > >
> > > > Jun,
> > > >
> > > > Thank you for the review. I have changed all default property configs
> > > > to be stored with the node name <default>. So the defaults are
> > > > /config/clients/<default> for default client-id quota,
> > > > /config/users/<default> for default user quota and
> > > > /config/users/<default>/clients/<default> for default <user, client-id>
> > > > quota. Hope that makes sense.
> > > >
> > > > On Mon, Jun 27, 2016 at 10:25 PM, Jun Rao  > > wrote:
> > > >
> > > > > Rajini,
> > > > >
> > > > > Thanks for the update. Looks good to me. My only comment is that
> > > > > instead of /config/users//clients,
> > > > > would it be better to represent it as
> > > > > /config/users//clients/
> > > > > so that it's more consistent?
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Thu, Jun 23, 2016 at 2:16 PM, Rajini Sivaram <
> > > > > rajinisiva...@googlemail.com > wrote:
> > > > >
> > > > > > Jun,
> > > > > >
> > > > > > Yes, I agree that it makes sense to retain the existing semantics
> > for
> > > > > > client-id quotas for compatibility. Especially if we can provide
> > the
> > > > > option
> > > > > > to enable secure client-id quotas for multi-user clusters as
> well.
> > > > > >
> > > > > > I have updated the KIP - each of these levels can have defaults
> as
> > > well
> > > > > as
> > > > > > specific entries:
> > > > > >
> > > > > >- /config/clients : Insecure <client-id> quotas with the same
> > > > > >semantics as now
> > > > > >- /config/users: User quotas
> > > > > >- /config/users/userA/clients: <userA, client-id> quotas for userA
> > > > > >- /config/users/<default>/clients: Default <user, client-id> quotas
> > > > > >
> > > > > > Now it is fully flexible as well as compatible with the current
> > > > > > implementation. I used /config/users/<default>/clients rather than
> > > > > > /config/users/clients since "clients" is a valid (unlikely, but still
> > > > > > possible) user principal. I used <default>, but it could be anything
> > > > > > that is a valid Zookeeper node name, but not a valid URL-encoded name.
> > > > > >
> > > > > > Thank you,
> > > > > >
> > > > > > Rajini
> > > > > >
> > > > > > On Thu, Jun 23, 2016 at 3:43 PM, Jun Rao  > > wrote:
> > > > > >
> > > > > > > Hi, Rajini,
> > > > > > >
> > > > > > > For the following statements, would it be better to allocate
> the
> > > > quota
> > > > > to
> > > > > > > all connections whose client-id is clientX? This way, existing
> > > > > client-id
> > > > > > > quotas are fully compatible in the new release whether the
> > cluster
> > > is
> > > > > in
> > > > > > a
> > > > > > > single user or multi-user environment.
> > > > > > >
> > > > > > > 4. If client-id quota override is defined for clientX in
> > > > > > > /config/clients/clientX, this quota is allocated for the sole use of
> > > > > > > <user1, clientX>
> > > > > > > 5. If dynamic client-id default is configured in /config/clients, this
> > > > > > > default quota is allocated for the sole use of <user1, clientX>
> > > > > > > 6. If quota.producer.default is configured for the broker in
> > > > > > > server.properties, this default quota is allocated for the sole use of
> > > > > > > <user1, clientX>
> > > > > > >
> > > > > > > We can potentially add a default quota for both user and client
> > at
> > > > path
> > > > > > > /config/users/clients?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > > On Wed, Jun 22, 2016 at 3:01 AM, Rajini Sivaram <
> > > > > > > rajinisiva...@googlemail.com > wrote:
> > > > > > >
> > > > > > > > Ismael, Jun,
> > > > > > > >
> > > > > > > > Thank you both for the feedback. Have updated the KIP to add
> > > > dynamic
> > > > > > > > default quotas for client-id with deprecation of existing
> > 

Re: [DISCUSS] KIP-73: Replication Quotas

2016-08-09 Thread Ben Stopford
Hi Joel

Thanks for taking the time to look at this. Appreciated. 

Regarding throttling on both leader and follower, this proposal covers a more 
general solution which can guarantee a quota, even when a rebalance operation 
produces an asymmetric profile of load. This means administrators don’t need to 
calculate the impact that a follower-only quota will have on the leaders they 
are fetching from. So for example where replica sizes are skewed or where a 
partial rebalance is required.

Having said that, even with both leader and follower quotas, the use of 
additional threads is actually optional. There appear to be two general 
approaches (1) omit partitions from fetch requests (follower) / fetch responses 
(leader) when they exceed their quota (2) delay them, as the existing quota 
mechanism does, using separate fetchers. Both appear valid, but with slightly 
different design tradeoffs. 

The issue with approach (1) is that it departs somewhat from the existing 
quotas implementation, and must include a notion of fairness within, the now 
size-bounded, request and response. The issue with (2) is guaranteeing ordering 
of updates when replicas shift threads, but this is handled, for the most part, 
in the code today. 

I’ve updated the rejected alternatives section to make this a little clearer. 

B



> On 8 Aug 2016, at 20:38, Joel Koshy  wrote:
> 
> Hi Ben,
> 
> Thanks for the detailed write-up. So the proposal involves self-throttling
> on the fetcher side and throttling at the leader. Can you elaborate on the
> reasoning that is given on the wiki: *“The throttle is applied to both
> leaders and followers. This allows the admin to exert strong guarantees on
> the throttle limit".* Is there any reason why one or the other wouldn't be
> sufficient.
> 
> Specifically, if we were to only do self-throttling on the fetchers, we
> could potentially avoid the additional replica fetchers right? i.e., the
> replica fetchers would maintain its quota metrics as you proposed and each
> (normal) replica fetch presents an opportunity to make progress for the
> throttled partitions as long as their effective consumption rate is below
> the quota limit. If it exceeds the consumption rate then don’t include the
> throttled partitions in the subsequent fetch requests until the effective
> consumption rate for those partitions returns to within the quota threshold.
> 
> I have more questions on the proposal, but was more interested in the above
> to see if it could simplify things a bit.
> 
> Also, can you open up access to the google-doc that you link to?
> 
> Thanks,
> 
> Joel
> 
> On Mon, Aug 8, 2016 at 5:54 AM, Ben Stopford  wrote:
> 
>> We’ve created KIP-73: Replication Quotas
>> 
>> The idea is to allow an admin to throttle moving replicas. Full details
>> are here:
>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+
>> Replication+Quotas > luence/display/KAFKA/KIP-73+Replication+Quotas>
>> 
>> Please take a look and let us know your thoughts.
>> 
>> Thanks
>> 
>> B
>> 
>> 



Re: [DISCUSS] KIP-55: Secure quotas for authenticated users

2016-08-09 Thread Tom Crayford
Seeing as voting passed on this, can somebody with access update the wiki?

Is there code for this KIP in a PR somewhere that needs merging?

Thanks
Tom Crayford
Heroku Kafka

On Friday, 1 July 2016, Rajini Sivaram  wrote:

> Thank you, Jun.
>
> Hi all,
>
> Please let me know if you have any comments or suggestions on the updated
> KIP. If there are no objections, I will initiate voting next week.
>
> Thank you...
>
>
> On Thu, Jun 30, 2016 at 10:37 PM, Jun Rao >
> wrote:
>
> > Rajini,
> >
> > The latest wiki looks good to me. Perhaps you want to ask other people to
> > also take a look and then we can start the voting.
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Jun 28, 2016 at 6:27 AM, Rajini Sivaram <
> > rajinisiva...@googlemail.com > wrote:
> >
> > > Jun,
> > >
> > > Thank you for the review. I have changed all default property configs
> > > to be stored with the node name <default>. So the defaults are
> > > /config/clients/<default> for default client-id quota,
> > > /config/users/<default> for default user quota and
> > > /config/users/<default>/clients/<default> for default <user, client-id>
> > > quota. Hope that makes sense.
> > >
> > > On Mon, Jun 27, 2016 at 10:25 PM, Jun Rao  > wrote:
> > >
> > > > Rajini,
> > > >
> > > > Thanks for the update. Looks good to me. My only comment is that
> > > > instead of /config/users//clients,
> > > > would it be better to represent it as
> > > > /config/users//clients/
> > > > so that it's more consistent?
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Thu, Jun 23, 2016 at 2:16 PM, Rajini Sivaram <
> > > > rajinisiva...@googlemail.com > wrote:
> > > >
> > > > > Jun,
> > > > >
> > > > > Yes, I agree that it makes sense to retain the existing semantics
> for
> > > > > client-id quotas for compatibility. Especially if we can provide
> the
> > > > option
> > > > > to enable secure client-id quotas for multi-user clusters as well.
> > > > >
> > > > > I have updated the KIP - each of these levels can have defaults as
> > well
> > > > as
> > > > > specific entries:
> > > > >
> > > > >- /config/clients : Insecure <client-id> quotas with the same
> > > > >semantics as now
> > > > >- /config/users: User quotas
> > > > >- /config/users/userA/clients: <userA, client-id> quotas for userA
> > > > >- /config/users/<default>/clients: Default <user, client-id> quotas
> > > > >
> > > > > Now it is fully flexible as well as compatible with the current
> > > > > implementation. I used /config/users/<default>/clients rather than
> > > > > /config/users/clients since "clients" is a valid (unlikely, but still
> > > > > possible) user principal. I used <default>, but it could be anything
> > > > > that is a valid Zookeeper node name, but not a valid URL-encoded name.
> > > > >
> > > > > Thank you,
> > > > >
> > > > > Rajini
> > > > >
> > > > > On Thu, Jun 23, 2016 at 3:43 PM, Jun Rao  > wrote:
> > > > >
> > > > > > Hi, Rajini,
> > > > > >
> > > > > > For the following statements, would it be better to allocate the
> > > quota
> > > > to
> > > > > > all connections whose client-id is clientX? This way, existing
> > > > client-id
> > > > > > quotas are fully compatible in the new release whether the
> cluster
> > is
> > > > in
> > > > > a
> > > > > > single user or multi-user environment.
> > > > > >
> > > > > > 4. If client-id quota override is defined for clientX in
> > > > > > /config/clients/clientX, this quota is allocated for the sole use of
> > > > > > <user1, clientX>
> > > > > > 5. If dynamic client-id default is configured in /config/clients, this
> > > > > > default quota is allocated for the sole use of <user1, clientX>
> > > > > > 6. If quota.producer.default is configured for the broker in
> > > > > > server.properties, this default quota is allocated for the sole use of
> > > > > > <user1, clientX>
> > > > > >
> > > > > > We can potentially add a default quota for both user and client
> at
> > > path
> > > > > > /config/users/clients?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Wed, Jun 22, 2016 at 3:01 AM, Rajini Sivaram <
> > > > > > rajinisiva...@googlemail.com > wrote:
> > > > > >
> > > > > > > Ismael, Jun,
> > > > > > >
> > > > > > > Thank you both for the feedback. Have updated the KIP to add
> > > dynamic
> > > > > > > default quotas for client-id with deprecation of existing
> static
> > > > > default
> > > > > > > properties.
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Jun 22, 2016 at 12:50 AM, Jun Rao  >
> > > wrote:
> > > > > > >
> > > > > > > > Yes, for consistency, perhaps we can allow client-id quota to
> > be
> > > > > > > configured
> > > > > > > > dynamically too and mark the static config in the broker as
> > > > > deprecated.
> > > > > > > If
> > > > > > > > both are set, the dynamic one wins.
> > 

[jira] [Commented] (KAFKA-4022) TopicCommand is using default max.message.bytes instead of broker's setting

2016-08-09 Thread JIRA

[ 
https://issues.apache.org/jira/browse/KAFKA-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413088#comment-15413088
 ] 

Tomasz Trębski commented on KAFKA-4022:
---

[~vahid], so maybe I was misguided by the error. Basically I am trying to create
a topic via Ansible using a command (not sure if you are familiar with that).
The simple fact that it now displays a prompt awaiting a user answer made my
role useless, as it was not able to cope with the prompt.

Anyway, only after I read your response did I realize that there is actually a
prompt, and I started digging into how to handle that in Ansible.

BTW, I pointed out that IMHO this code should read the values from the broker's
actual configuration instead of the defaults.
Does that make any sense?

> TopicCommand is using default max.message.bytes instead of broker's setting
> ---
>
> Key: KAFKA-4022
> URL: https://issues.apache.org/jira/browse/KAFKA-4022
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0
> Environment: Linux testHost 3.10.0-229.el7.x86_64 #1 SMP Fri Mar 6 
> 11:36:42 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Tomasz Trębski
>
> Even though it is possible to configure brokers to support messages that are 
> bigger than 1MB, the admin tool to create topics won't accept one whose size 
> is above 1,000,000 bytes. 
> The responsible line can be found under this link:
> [TopicCommand|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/admin/TopicCommand.scala#L366]
> The console output also seems to confirm that:
>  ERROR 
> {code:title=console_otuput|borderStyle=solid}
> 
> *** WARNING: you are creating a topic where the max.message.bytes is greater 
> than the broker  ***
> *** default. This operation is dangerous. There are two potential side 
> effects:  ***
> *** - Consumers will get failures if their fetch.message.max.bytes < the 
> value you are using ***
> *** - Producer requests larger than replica.fetch.max.bytes will not 
> replicate and hence have***
> ***   a higher risk of data loss  
>***
> *** You should ensure both of these settings are greater than the value set 
> here before using***
> *** this topic.   
>***
> 
> - value set here: 1048576
> - Default Broker replica.fetch.max.bytes: 1048576
> - Default Broker max.message.bytes: 112
> - Default Consumer fetch.message.max.bytes: 1048576
> {code}
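The consistency check behind the quoted warning can be sketched as below: a topic-level max.message.bytes larger than the consumer's fetch.message.max.bytes breaks consumers, and one larger than the broker's replica.fetch.max.bytes breaks replication. The function name and return shape are hypothetical, for illustration only; note that with the equal values shown in the console output (all 1048576), no conflict actually exists:

```python
# Sketch of the check behind TopicCommand's warning, quoted above.
# The setting names are real Kafka configs; this helper is illustrative only.

def check_max_message_bytes(topic_max, replica_fetch_max, consumer_fetch_max):
    """Return warnings for settings that conflict with the topic's max.message.bytes."""
    warnings = []
    if consumer_fetch_max < topic_max:
        # Consumers cannot fetch messages larger than fetch.message.max.bytes.
        warnings.append(
            "consumers with fetch.message.max.bytes < max.message.bytes will fail")
    if replica_fetch_max < topic_max:
        # Followers cannot replicate messages above replica.fetch.max.bytes.
        warnings.append(
            "messages above replica.fetch.max.bytes will not replicate (data-loss risk)")
    return warnings

# Values from the console output above: everything at 1048576, so no conflict.
print(check_max_message_bytes(1048576, 1048576, 1048576))  # []
```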



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: kafka-trunk-jdk8 #807

2016-08-09 Thread Apache Jenkins Server
See 

Changes:

[me] MINOR: Remove redundant clause in secureAclsEnabled check

[me] KAFKA-3954; Consumer should use internal topics information returned by

--
[...truncated 8287 lines...]

org.apache.kafka.common.metrics.JmxReporterTest > testJmxRegistration STARTED

org.apache.kafka.common.metrics.JmxReporterTest > testJmxRegistration PASSED

org.apache.kafka.common.metrics.stats.HistogramTest > testHistogram STARTED

org.apache.kafka.common.metrics.stats.HistogramTest > testHistogram PASSED

org.apache.kafka.common.metrics.stats.HistogramTest > testConstantBinScheme 
STARTED

org.apache.kafka.common.metrics.stats.HistogramTest > testConstantBinScheme 
PASSED

org.apache.kafka.common.metrics.stats.HistogramTest > testLinearBinScheme 
STARTED

org.apache.kafka.common.metrics.stats.HistogramTest > testLinearBinScheme PASSED

org.apache.kafka.common.utils.CrcTest > testUpdateInt STARTED

org.apache.kafka.common.utils.CrcTest > testUpdateInt PASSED

org.apache.kafka.common.utils.CrcTest > testUpdate STARTED

org.apache.kafka.common.utils.CrcTest > testUpdate PASSED

org.apache.kafka.common.utils.UtilsTest > testAbs STARTED

org.apache.kafka.common.utils.UtilsTest > testAbs PASSED

org.apache.kafka.common.utils.UtilsTest > testMin STARTED

org.apache.kafka.common.utils.UtilsTest > testMin PASSED

org.apache.kafka.common.utils.UtilsTest > testJoin STARTED

org.apache.kafka.common.utils.UtilsTest > testJoin PASSED

org.apache.kafka.common.utils.UtilsTest > testReadBytes STARTED

org.apache.kafka.common.utils.UtilsTest > testReadBytes PASSED

org.apache.kafka.common.utils.UtilsTest > testGetHost STARTED

org.apache.kafka.common.utils.UtilsTest > testGetHost PASSED

org.apache.kafka.common.utils.UtilsTest > testGetPort STARTED

org.apache.kafka.common.utils.UtilsTest > testGetPort PASSED

org.apache.kafka.common.utils.UtilsTest > testCloseAll STARTED

org.apache.kafka.common.utils.UtilsTest > testCloseAll PASSED

org.apache.kafka.common.utils.UtilsTest > testFormatAddress STARTED

org.apache.kafka.common.utils.UtilsTest > testFormatAddress PASSED

org.apache.kafka.common.utils.AbstractIteratorTest > testEmptyIterator STARTED

org.apache.kafka.common.utils.AbstractIteratorTest > testEmptyIterator PASSED

org.apache.kafka.common.utils.AbstractIteratorTest > testIterator STARTED

org.apache.kafka.common.utils.AbstractIteratorTest > testIterator PASSED

org.apache.kafka.common.ClusterTest > testBootstrap STARTED

org.apache.kafka.common.ClusterTest > testBootstrap PASSED

org.apache.kafka.common.cache.LRUCacheTest > testEviction STARTED

org.apache.kafka.common.cache.LRUCacheTest > testEviction PASSED

org.apache.kafka.common.cache.LRUCacheTest > testPutGet STARTED

org.apache.kafka.common.cache.LRUCacheTest > testPutGet PASSED

org.apache.kafka.common.cache.LRUCacheTest > testRemove STARTED

org.apache.kafka.common.cache.LRUCacheTest > testRemove PASSED

org.apache.kafka.common.network.SslTransportLayerTest > 
testClientAuthenticationRequestedValidProvided STARTED

org.apache.kafka.common.network.SslTransportLayerTest > 
testClientAuthenticationRequestedValidProvided PASSED

org.apache.kafka.common.network.SslTransportLayerTest > testUnsupportedCiphers 
STARTED

org.apache.kafka.common.network.SslTransportLayerTest > testUnsupportedCiphers 
PASSED

org.apache.kafka.common.network.SslTransportLayerTest > 
testUnsupportedTLSVersion STARTED

org.apache.kafka.common.network.SslTransportLayerTest > 
testUnsupportedTLSVersion PASSED

org.apache.kafka.common.network.SslTransportLayerTest > 
testClientAuthenticationRequiredNotProvided STARTED

org.apache.kafka.common.network.SslTransportLayerTest > 
testClientAuthenticationRequiredNotProvided PASSED

org.apache.kafka.common.network.SslTransportLayerTest > 
testClientAuthenticationRequestedNotProvided STARTED

org.apache.kafka.common.network.SslTransportLayerTest > 
testClientAuthenticationRequestedNotProvided PASSED

org.apache.kafka.common.network.SslTransportLayerTest > 
testInvalidKeystorePassword STARTED

org.apache.kafka.common.network.SslTransportLayerTest > 
testInvalidKeystorePassword PASSED

org.apache.kafka.common.network.SslTransportLayerTest > 
testClientAuthenticationDisabledNotProvided STARTED

org.apache.kafka.common.network.SslTransportLayerTest > 
testClientAuthenticationDisabledNotProvided PASSED

org.apache.kafka.common.network.SslTransportLayerTest > 
testInvalidEndpointIdentification STARTED

org.apache.kafka.common.network.SslTransportLayerTest > 
testInvalidEndpointIdentification PASSED

org.apache.kafka.common.network.SslTransportLayerTest > 
testEndpointIdentificationDisabled STARTED

org.apache.kafka.common.network.SslTransportLayerTest > 
testEndpointIdentificationDisabled PASSED

org.apache.kafka.common.network.SslTransportLayerTest > 
testClientAuthenticationRequiredUntrustedProvided STARTED

org.apache.kafka.common.network.SslTransportLayerTest