[jira] [Commented] (KAFKA-3852) Clarify how to handle message format upgrade without killing performance

2016-07-26 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15395092#comment-15395092
 ] 

Ewen Cheslack-Postava commented on KAFKA-3852:
--

I just grabbed it to try to get to it tomorrow. If you want to take a stab, 
feel free. I worked out KAFKA-3851 first. I'll look again tomorrow, we can 
coordinate then if you're already making progress.

> Clarify how to handle message format upgrade without killing performance
> 
>
> Key: KAFKA-3852
> URL: https://issues.apache.org/jira/browse/KAFKA-3852
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.10.0.1
>
>
> The upgrade notes re: the performance impact of the message format change in 
> 0.10.0.0 are a bit wishy-washy about what you *should* do. They describe the 
> potential impact and then say (at the end of a paragraph): "To avoid such 
> message conversion before consumers are upgraded to 0.10.0.0, one can set". 
> This should probably be written as a playbook for doing the upgrade without 
> affecting performance, and it should be a recommendation, not presented as 
> just an option. Nobody with an existing cluster wants the perf impact of 
> switching their brokers to the new format while consumers are still using the 
> old format, unless the cluster has extremely light load.
> Additionally, if we have some simple benchmark numbers on the performance 
> impact on brokers, we should include that information. Today it is very 
> unclear how bad that change will be -- 5%, 20%, 90% worse?
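
As a hedged sketch of what such a playbook could look like (the config keys are the real broker settings the upgrade notes refer to; the version values are illustrative and depend on the cluster's starting point):

{code}
# Step 1: rolling-bounce brokers onto 0.10.0.0 while pinning the old format,
# so the brokers never have to down-convert for not-yet-upgraded consumers.
inter.broker.protocol.version=0.10.0
log.message.format.version=0.8.2

# Step 2: upgrade all consumers to 0.10.0.x clients.

# Step 3: rolling-bounce brokers again with the new on-disk format enabled.
log.message.format.version=0.10.0
{code}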



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3851) Add references to important installation/upgrade notes to release notes

2016-07-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15395078#comment-15395078
 ] 

ASF GitHub Bot commented on KAFKA-3851:
---

GitHub user ewencp opened a pull request:

https://github.com/apache/kafka/pull/1670

KAFKA-3851: Automate release notes and include links to upgrade notes for 
release and most recent docs to forward users of older releases to newest docs.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ewencp/kafka kafka-3851-automate-release-notes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1670.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1670


commit b2ecaa27aca21f715bd4dc88ce8f898a7ccfcaab
Author: Ewen Cheslack-Postava 
Date:   2016-07-27T05:33:29Z

KAFKA-3851: Automate release notes and include links to upgrade notes for 
release and most recent docs to forward users of older releases to newest docs.




> Add references to important installation/upgrade notes to release notes 
> 
>
> Key: KAFKA-3851
> URL: https://issues.apache.org/jira/browse/KAFKA-3851
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.10.0.1
>
>
> Today we use the release notes exactly as exported from JIRA (see "Prepare 
> release notes" on 
> https://cwiki.apache.org/confluence/display/KAFKA/Release+Process) and then 
> rely on users to dig through our documentation (none of which starts with 
> release-specific notes) to find upgrade and installation notes.
> Especially for folks who aren't intimately familiar with our docs, the 
> information is *very* easy to miss. Ideally we could automate the release 
> notes process a bit and have the notes automatically modified to *at least* 
> include links at the very top (it'd be nice if we had some other header 
> material added as well, since the automatically generated release notes are 
> very barebones...). Even better would be if we could pull in the 
> version-specific installation/upgrade notes directly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1670: KAFKA-3851: Automate release notes and include lin...

2016-07-26 Thread ewencp
GitHub user ewencp opened a pull request:

https://github.com/apache/kafka/pull/1670

KAFKA-3851: Automate release notes and include links to upgrade notes for 
release and most recent docs to forward users of older releases to newest docs.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ewencp/kafka kafka-3851-automate-release-notes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1670.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1670


commit b2ecaa27aca21f715bd4dc88ce8f898a7ccfcaab
Author: Ewen Cheslack-Postava 
Date:   2016-07-27T05:33:29Z

KAFKA-3851: Automate release notes and include links to upgrade notes for 
release and most recent docs to forward users of older releases to newest docs.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: Kafka 0.10.0.1 plan

2016-07-26 Thread Ismael Juma
Yes, I'll do it. I will certainly take you up on your offer. :)

Ismael

On Wed, Jul 27, 2016 at 3:58 AM, Gwen Shapira  wrote:

> Thanks Ismael :)
>
> I assume you are managing the release (i.e. rolling the RCs and
> starting the vote)? Let me know if you need help with the RC rollout.
> I did it fairly recently (41 bugs ago?)
>
> Gwen
>
> On Tue, Jul 26, 2016 at 7:55 PM, Ismael Juma  wrote:
> > Hi everyone,
> >
> > We have fixed 41 JIRAs[1] (including a few critical bugs) in the 0.10.0
> > branch since 0.10.0.0 was released and it would be good to release
> 0.10.0.1
> > soon. The current list of issues that have 0.10.0.1 as the target version
> > can be found below:
> >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20fixVersion%20%3D%200.10.0.1%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC%2C%20key%20DESC
> >
> > Once we have resolved the outstanding issues (note that a couple of them
> > are documentation JIRAs), we will start the release process.
> >
> > Ismael
> >
> > [1]
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%200.10.0.1%20ORDER%20BY%20priority%20DESC%2C%20key%20DESC
>


[jira] [Updated] (KAFKA-3996) ByteBufferMessageSet.writeTo() should be non-blocking

2016-07-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3996:
---
Reviewer: Jun Rao
  Status: Patch Available  (was: In Progress)

> ByteBufferMessageSet.writeTo() should be non-blocking
> -
>
> Key: KAFKA-3996
> URL: https://issues.apache.org/jira/browse/KAFKA-3996
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Jun Rao
>Assignee: Ismael Juma
>Priority: Blocker
> Fix For: 0.10.0.1
>
>
> Currently, in ByteBufferMessageSet.writeTo(), we try to finish writing all 
> bytes in the buffer in a single call. The code has been like that since 0.8. 
> This hasn't been a problem historically since the broker uses zero-copy to 
> send fetch responses and only uses ByteBufferMessageSet to send produce 
> responses, which are small. However, in 0.10.0, if a consumer is on a version 
> before 0.10.0, the broker has to down-convert the messages and use 
> ByteBufferMessageSet to send a fetch response to the consumer. If the client 
> is slow and there are lots of bytes in the ByteBufferMessageSet, we may not 
> be able to completely send all the bytes in the buffer for a long period of 
> time. When this happens, the Processor is blocked and can't handle other 
> connections, which is bad.
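
As a minimal Java sketch of the behavioral difference being described (illustrative class and method names; the actual ByteBufferMessageSet is Scala code in core, and the real fix is in PR 1669):

{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.GatheringByteChannel;

final class WriteToSketch {
    // Pre-fix behavior: loop until the buffer is drained. If the client reads
    // slowly, the socket stops accepting bytes and this stalls the caller --
    // in the broker's case, the Processor thread.
    static long writeFully(GatheringByteChannel channel, ByteBuffer buffer) throws IOException {
        long written = 0;
        while (buffer.hasRemaining())
            written += channel.write(buffer);
        return written;
    }

    // The non-blocking direction of the fix: write once, report how much was
    // sent, and let the caller retry on a later poll when the socket is
    // writable again.
    static long writeOnce(GatheringByteChannel channel, ByteBuffer buffer) throws IOException {
        return channel.write(buffer); // may be 0 or a partial count
    }
}
{code}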



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1669: KAFKA-3996: ByteBufferMessageSet.writeTo() should ...

2016-07-26 Thread ijuma
GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/1669

KAFKA-3996: ByteBufferMessageSet.writeTo() should be non-blocking

Also:
* Introduce a blocking variant to be used by `FileMessageSet.append`,
* Add tests
* Minor clean-ups

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka 
kafka-3996-byte-buffer-message-set-write-to-non-blocking

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1669.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1669


commit c057a92a6f296d0b54946452626c2d5aba9513ee
Author: Ismael Juma 
Date:   2016-07-27T05:13:32Z

KAFKA-3996: ByteBufferMessageSet.writeTo() should be non-blocking

Also:
* Introduce a blocking variant to be used by `FileMessageSet.append`,
* Add tests
* Minor clean-ups




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3996) ByteBufferMessageSet.writeTo() should be non-blocking

2016-07-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15395055#comment-15395055
 ] 

ASF GitHub Bot commented on KAFKA-3996:
---

GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/1669

KAFKA-3996: ByteBufferMessageSet.writeTo() should be non-blocking

Also:
* Introduce a blocking variant to be used by `FileMessageSet.append`,
* Add tests
* Minor clean-ups

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka 
kafka-3996-byte-buffer-message-set-write-to-non-blocking

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1669.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1669


commit c057a92a6f296d0b54946452626c2d5aba9513ee
Author: Ismael Juma 
Date:   2016-07-27T05:13:32Z

KAFKA-3996: ByteBufferMessageSet.writeTo() should be non-blocking

Also:
* Introduce a blocking variant to be used by `FileMessageSet.append`,
* Add tests
* Minor clean-ups




> ByteBufferMessageSet.writeTo() should be non-blocking
> -
>
> Key: KAFKA-3996
> URL: https://issues.apache.org/jira/browse/KAFKA-3996
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Jun Rao
>Assignee: Ismael Juma
>Priority: Blocker
> Fix For: 0.10.0.1
>
>
> Currently, in ByteBufferMessageSet.writeTo(), we try to finish writing all 
> bytes in the buffer in a single call. The code has been like that since 0.8. 
> This hasn't been a problem historically since the broker uses zero-copy to 
> send fetch responses and only uses ByteBufferMessageSet to send produce 
> responses, which are small. However, in 0.10.0, if a consumer is on a version 
> before 0.10.0, the broker has to down-convert the messages and use 
> ByteBufferMessageSet to send a fetch response to the consumer. If the client 
> is slow and there are lots of bytes in the ByteBufferMessageSet, we may not 
> be able to completely send all the bytes in the buffer for a long period of 
> time. When this happens, the Processor is blocked and can't handle other 
> connections, which is bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Jenkins build is back to normal : kafka-trunk-jdk8 #777

2016-07-26 Thread Apache Jenkins Server
See 



Jenkins build is back to normal : kafka-0.10.0-jdk7 #163

2016-07-26 Thread Apache Jenkins Server
See 



[jira] [Commented] (KAFKA-3994) Deadlock between consumer heartbeat expiration and offset commit.

2016-07-26 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15395001#comment-15395001
 ] 

Jiangjie Qin commented on KAFKA-3994:
-

[~ijuma] ^^

> Deadlock between consumer heartbeat expiration and offset commit.
> -
>
> Key: KAFKA-3994
> URL: https://issues.apache.org/jira/browse/KAFKA-3994
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
>Reporter: Jiangjie Qin
> Fix For: 0.10.0.2
>
>
> I got the following stacktraces from ConsumerBounceTest
> {code}
> ...
> "Test worker" #12 prio=5 os_prio=0 tid=0x7fbb28b7f000 nid=0x427c runnable 
> [0x7fbb06445000]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
> - locked <0x0003d48bcbc0> (a sun.nio.ch.Util$2)
> - locked <0x0003d48bcbb0> (a 
> java.util.Collections$UnmodifiableSet)
> - locked <0x0003d48bbd28> (a sun.nio.ch.EPollSelectorImpl)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
> at org.apache.kafka.common.network.Selector.select(Selector.java:454)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:277)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:179)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:411)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1086)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1054)
> at 
> kafka.api.ConsumerBounceTest.consumeWithBrokerFailures(ConsumerBounceTest.scala:103)
> at 
> kafka.api.ConsumerBounceTest.testConsumptionWithBrokerFailures(ConsumerBounceTest.scala:70)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:105)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:56)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:64)
> at 
> 

[jira] [Commented] (KAFKA-3994) Deadlock between consumer heartbeat expiration and offset commit.

2016-07-26 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15395000#comment-15395000
 ] 

Jiangjie Qin commented on KAFKA-3994:
-

[~hachikuji] This is on the trunk. I will not be able to work on this for a 
while so please feel free to take the ticket.

> Deadlock between consumer heartbeat expiration and offset commit.
> -
>
> Key: KAFKA-3994
> URL: https://issues.apache.org/jira/browse/KAFKA-3994
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
>Reporter: Jiangjie Qin
> Fix For: 0.10.0.2
>
>
> I got the following stacktraces from ConsumerBounceTest
> {code}
> ...
> "Test worker" #12 prio=5 os_prio=0 tid=0x7fbb28b7f000 nid=0x427c runnable 
> [0x7fbb06445000]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
> - locked <0x0003d48bcbc0> (a sun.nio.ch.Util$2)
> - locked <0x0003d48bcbb0> (a 
> java.util.Collections$UnmodifiableSet)
> - locked <0x0003d48bbd28> (a sun.nio.ch.EPollSelectorImpl)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
> at org.apache.kafka.common.network.Selector.select(Selector.java:454)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:277)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:179)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:411)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1086)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1054)
> at 
> kafka.api.ConsumerBounceTest.consumeWithBrokerFailures(ConsumerBounceTest.scala:103)
> at 
> kafka.api.ConsumerBounceTest.testConsumptionWithBrokerFailures(ConsumerBounceTest.scala:70)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:105)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:56)
> at 
> 

[jira] [Commented] (KAFKA-3852) Clarify how to handle message format upgrade without killing performance

2016-07-26 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15394993#comment-15394993
 ] 

Gwen Shapira commented on KAFKA-3852:
-

ah, I see you just picked it up. Never mind, then :)

> Clarify how to handle message format upgrade without killing performance
> 
>
> Key: KAFKA-3852
> URL: https://issues.apache.org/jira/browse/KAFKA-3852
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.10.0.1
>
>
> The upgrade notes re: the performance impact of the message format change in 
> 0.10.0.0 are a bit wishy-washy about what you *should* do. They describe the 
> potential impact and then say (at the end of a paragraph): "To avoid such 
> message conversion before consumers are upgraded to 0.10.0.0, one can set". 
> This should probably be written as a playbook for doing the upgrade without 
> affecting performance, and it should be a recommendation, not presented as 
> just an option. Nobody with an existing cluster wants the perf impact of 
> switching their brokers to the new format while consumers are still using the 
> old format, unless the cluster has extremely light load.
> Additionally, if we have some simple benchmark numbers on the performance 
> impact on brokers, we should include that information. Today it is very 
> unclear how bad that change will be -- 5%, 20%, 90% worse?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3852) Clarify how to handle message format upgrade without killing performance

2016-07-26 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15394990#comment-15394990
 ] 

Gwen Shapira commented on KAFKA-3852:
-

Hey [~ewencp], mind if I take a crack at that one? I want to try and get it in 
for the upcoming bugfix release.

> Clarify how to handle message format upgrade without killing performance
> 
>
> Key: KAFKA-3852
> URL: https://issues.apache.org/jira/browse/KAFKA-3852
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.10.0.1
>
>
> The upgrade notes re: the performance impact of the message format change in 
> 0.10.0.0 are a bit wishy-washy about what you *should* do. They describe the 
> potential impact and then say (at the end of a paragraph): "To avoid such 
> message conversion before consumers are upgraded to 0.10.0.0, one can set". 
> This should probably be written as a playbook for doing the upgrade without 
> affecting performance, and it should be a recommendation, not presented as 
> just an option. Nobody with an existing cluster wants the perf impact of 
> switching their brokers to the new format while consumers are still using the 
> old format, unless the cluster has extremely light load.
> Additionally, if we have some simple benchmark numbers on the performance 
> impact on brokers, we should include that information. Today it is very 
> unclear how bad that change will be -- 5%, 20%, 90% worse?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-3852) Clarify how to handle message format upgrade without killing performance

2016-07-26 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava reassigned KAFKA-3852:


Assignee: Ewen Cheslack-Postava

> Clarify how to handle message format upgrade without killing performance
> 
>
> Key: KAFKA-3852
> URL: https://issues.apache.org/jira/browse/KAFKA-3852
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.10.0.1
>
>
> The upgrade notes re: the performance impact of the message format change in 
> 0.10.0.0 are a bit wishy-washy about what you *should* do. They describe the 
> potential impact and then say (at the end of a paragraph): "To avoid such 
> message conversion before consumers are upgraded to 0.10.0.0, one can set". 
> This should probably be written as a playbook for doing the upgrade without 
> affecting performance, and it should be a recommendation, not presented as 
> just an option. Nobody with an existing cluster wants the perf impact of 
> switching their brokers to the new format while consumers are still using the 
> old format, unless the cluster has extremely light load.
> Additionally, if we have some simple benchmark numbers on the performance 
> impact on brokers, we should include that information. Today it is very 
> unclear how bad that change will be -- 5%, 20%, 90% worse?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Kafka 0.10.0.1 plan

2016-07-26 Thread Gwen Shapira
Thanks Ismael :)

I assume you are managing the release (i.e. rolling the RCs and
starting the vote)? Let me know if you need help with the RC rollout.
I did it fairly recently (41 bugs ago?)

Gwen

On Tue, Jul 26, 2016 at 7:55 PM, Ismael Juma  wrote:
> Hi everyone,
>
> We have fixed 41 JIRAs[1] (including a few critical bugs) in the 0.10.0
> branch since 0.10.0.0 was released and it would be good to release 0.10.0.1
> soon. The current list of issues that have 0.10.0.1 as the target version
> can be found below:
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20fixVersion%20%3D%200.10.0.1%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC%2C%20key%20DESC
>
> Once we have resolved the outstanding issues (note that a couple of them
> are documentation JIRAs), we will start the release process.
>
> Ismael
>
> [1]
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%200.10.0.1%20ORDER%20BY%20priority%20DESC%2C%20key%20DESC


[jira] [Assigned] (KAFKA-3851) Add references to important installation/upgrade notes to release notes

2016-07-26 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava reassigned KAFKA-3851:


Assignee: Ewen Cheslack-Postava

> Add references to important installation/upgrade notes to release notes 
> 
>
> Key: KAFKA-3851
> URL: https://issues.apache.org/jira/browse/KAFKA-3851
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.10.0.1
>
>
> Today we use the release notes exactly as exported from JIRA (see "Prepare 
> release notes" on 
> https://cwiki.apache.org/confluence/display/KAFKA/Release+Process) and then 
> rely on users to dig through our documentation (none of which starts with 
> release-specific notes) to find upgrade and installation notes.
> Especially for folks who aren't intimately familiar with our docs, the 
> information is *very* easy to miss. Ideally we could automate the release 
> notes process a bit and have the notes automatically modified to *at least* 
> include links at the very top (it'd be nice if we had some other header 
> material added as well, since the automatically generated release notes are 
> very barebones...). Even better would be if we could pull in the 
> version-specific installation/upgrade notes directly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Kafka 0.10.0.1 plan

2016-07-26 Thread Ismael Juma
Hi everyone,

We have fixed 41 JIRAs[1] (including a few critical bugs) in the 0.10.0
branch since 0.10.0.0 was released and it would be good to release 0.10.0.1
soon. The current list of issues that have 0.10.0.1 as the target version
can be found below:

https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20fixVersion%20%3D%200.10.0.1%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC%2C%20key%20DESC

Once we have resolved the outstanding issues (note that a couple of them
are documentation JIRAs), we will start the release process.

Ismael

[1]
https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%200.10.0.1%20ORDER%20BY%20priority%20DESC%2C%20key%20DESC


[jira] [Updated] (KAFKA-3500) KafkaOffsetBackingStore set method needs to handle null

2016-07-26 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated KAFKA-3500:

   Resolution: Fixed
Fix Version/s: 0.10.1.0
   Status: Resolved  (was: Patch Available)

Issue resolved by pull request 1662
[https://github.com/apache/kafka/pull/1662]

> KafkaOffsetBackingStore set method needs to handle null 
> 
>
> Key: KAFKA-3500
> URL: https://issues.apache.org/jira/browse/KAFKA-3500
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0
>Reporter: Liquan Pei
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.10.1.0, 0.10.0.1
>
>
> In some cases, the key or value for the offset map can be null. However, it 
> seems that we don't handle null properly in this case and don't perform a 
> null check when converting the byte buffer back to a byte array.
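
A minimal sketch of the null-safe conversion the fix calls for (the helper name is illustrative, not the actual KafkaOffsetBackingStore code):

{code}
import java.nio.ByteBuffer;

final class NullSafeBytes {
    // A null key or value in the offset map should stay null rather than
    // cause an NPE when converted back to a byte array.
    static byte[] toArrayOrNull(ByteBuffer buffer) {
        if (buffer == null)
            return null;
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }
}
{code}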



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3500) KafkaOffsetBackingStore set method needs to handle null

2016-07-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15394972#comment-15394972
 ] 

ASF GitHub Bot commented on KAFKA-3500:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1662


> KafkaOffsetBackingStore set method needs to handle null 
> 
>
> Key: KAFKA-3500
> URL: https://issues.apache.org/jira/browse/KAFKA-3500
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0
>Reporter: Liquan Pei
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.10.1.0, 0.10.0.1
>
>
> In some cases, the key or value for the offset map can be null. However, it 
> seems that we don't handle null properly in this case and don't perform a 
> null check when converting the byte buffer back to a byte array.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1662: KAFKA-3500: Handle null keys and values in KafkaOf...

2016-07-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1662


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-3996) ByteBufferMessageSet.writeTo() should be non-blocking

2016-07-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3996:
---
Priority: Blocker  (was: Major)

> ByteBufferMessageSet.writeTo() should be non-blocking
> -
>
> Key: KAFKA-3996
> URL: https://issues.apache.org/jira/browse/KAFKA-3996
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Jun Rao
>Assignee: Ismael Juma
>Priority: Blocker
> Fix For: 0.10.0.1
>
>
> Currently, in ByteBufferMessageSet.writeTo(), we try to finish writing all 
> bytes in the buffer in a single call. The code has been like that since 0.8. 
> This hasn't been a problem historically since the broker uses zero-copy to 
> send fetch responses and only uses ByteBufferMessageSet to send produce 
> responses, which are small. However, in 0.10.0, if a consumer is on a version 
> before 0.10.0, the broker has to down-convert the messages and use 
> ByteBufferMessageSet to send a fetch response to the consumer. If the client 
> is slow and there are lots of bytes in the ByteBufferMessageSet, we may not 
> be able to completely send all the bytes in the buffer for a long period of 
> time. When this happens, the Processor is blocked and can't handle other 
> connections, which is bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (KAFKA-3996) ByteBufferMessageSet.writeTo() should be non-blocking

2016-07-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-3996 started by Ismael Juma.
--
> ByteBufferMessageSet.writeTo() should be non-blocking
> -
>
> Key: KAFKA-3996
> URL: https://issues.apache.org/jira/browse/KAFKA-3996
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Jun Rao
>Assignee: Ismael Juma
> Fix For: 0.10.0.1
>
>
> Currently, in ByteBufferMessageSet.writeTo(), we try to finish writing all 
> bytes in the buffer in a single call. The code has been like that since 0.8. 
> This hasn't been a problem historically since the broker uses zero-copy to 
> send fetch responses and only uses ByteBufferMessageSet to send produce 
> responses, which are small. However, in 0.10.0, if a consumer is on a version 
> before 0.10.0, the broker has to down-convert the messages and use 
> ByteBufferMessageSet to send a fetch response to the consumer. If the client 
> is slow and there are lots of bytes in the ByteBufferMessageSet, we may not 
> be able to completely send all the bytes in the buffer for a long period of 
> time. When this happens, the Processor is blocked and can't handle other 
> connections, which is bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3995) Add a new configuration "enable.compression.ratio.estimation" to the producer config

2016-07-26 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated KAFKA-3995:

Description: 
We have recently seen a few cases where RecordTooLargeException is thrown because 
the compressed message sent by KafkaProducer exceeded the max message size.

The root cause of this issue is that the compressor estimates the batch 
size using an estimated compression ratio based on heuristic compression ratio 
statistics. This does not quite work for traffic with highly variable 
compression ratios. 

For example, suppose the batch size is set to 1MB and the max message size is 1MB. 
Initially the producer is sending messages (each message is 1MB) to topic_1 
whose data can be compressed to 1/10 of the original size. After a while the 
estimated compression ratio in the compressor will be trained to 1/10 and the 
producer will put 10 messages into one batch. Now the producer starts to send 
messages (each message is also 1MB) to topic_2 whose messages can only be 
compressed to 1/5 of the original size. The producer would still use 1/10 as the 
estimated compression ratio and put 10 messages into a batch. That batch would 
be 2 MB after compression, which exceeds the maximum message size. In this case 
the user does not have many options other than resending everything or closing 
the producer if they care about ordering.

This is especially an issue for services like MirrorMaker whose producer is 
shared by many different topics.

To solve this issue, we can probably add a configuration 
"enable.compression.ratio.estimation" to the producer. When this configuration 
is set to false, we stop estimating the compressed size and instead close the 
batch once the uncompressed bytes in the batch reach the batch size.

  was:
We have recently seen a few cases where RecordTooLargeException is thrown because 
the compressed message sent by KafkaProducer exceeded the max message size.

The root cause of this issue is that the compressor estimates the batch 
size using an estimated compression ratio based on heuristic compression ratio 
statistics. This does not quite work for traffic with highly variable 
compression ratios. 

For example, suppose the batch size is set to 100KB and the max message size is 1MB. 
Initially the producer is sending messages (each message is 100KB) to topic_1 
whose data can be compressed to 1/10 of the original size. After a while the 
estimated compression ratio in the compressor will be trained to 1/10 and the 
producer will put 10 messages into one batch. Now the producer starts to send 
messages (each message is also 100KB) to topic_2 whose messages can only be 
compressed to 1/5 of the original size. The producer would still use 1/10 as the 
estimated compression ratio and put 10 messages into a batch. That batch would 
be 2 MB after compression, which exceeds the maximum message size. In this case 
the user does not have many options other than resending everything or closing 
the producer if they care about ordering.

This is especially an issue for services like MirrorMaker whose producer is 
shared by many different topics.

To solve this issue, we can probably add a configuration 
"enable.compression.ratio.estimation" to the producer. When this configuration 
is set to false, we stop estimating the compressed size and instead close the 
batch once the uncompressed bytes in the batch reach the batch size.
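
The arithmetic behind this failure mode, as a self-contained sketch (illustrative constants, not producer internals):

{code}
public class CompressionEstimateDemo {
    public static void main(String[] args) {
        long messageBytes = 1 << 20;     // 1 MB per uncompressed message
        long maxMessageBytes = 1 << 20;  // broker-side limit: 1 MB

        // A 1/10 ratio learned from topic_1 lets ten 1 MB messages "fit"
        // into one batch whose estimated compressed size is 1 MB.
        int messagesPerBatch = 10;

        // topic_2 actually compresses to only 1/5, so the real batch is 2 MB.
        double actualRatio = 0.20;
        long compressedBatch = (long) (messagesPerBatch * messageBytes * actualRatio);

        System.out.printf("compressed batch = %d bytes, limit = %d -> RecordTooLargeException%n",
                compressedBatch, maxMessageBytes);
    }
}
{code}

With the proposed "enable.compression.ratio.estimation" set to false, the batch would instead close once a single 1 MB message filled the uncompressed batch size, so it could never exceed the limit after compression.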


> Add a new configuration "enable.compression.ratio.estimation" to the producer 
> config
> 
>
> Key: KAFKA-3995
> URL: https://issues.apache.org/jira/browse/KAFKA-3995
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 0.10.0.0
>Reporter: Jiangjie Qin
>Assignee: Mayuresh Gharat
> Fix For: 0.10.1.0
>
>
> We have recently seen a few cases where RecordTooLargeException is thrown because 
> the compressed message sent by KafkaProducer exceeded the max message size.
> The root cause of this issue is that the compressor estimates the 
> batch size using an estimated compression ratio based on heuristic 
> compression ratio statistics. This does not quite work for traffic with 
> highly variable compression ratios. 
> For example, suppose the batch size is set to 1MB and the max message size is 1MB. 
> Initially the producer is sending messages (each message is 1MB) to topic_1 
> whose data can be compressed to 1/10 of the original size. After a while the 
> estimated compression ratio in the compressor will be trained to 1/10 and the 
> producer will put 10 messages into one batch. Now the producer starts to 
> send messages (each message is also 1MB) to topic_2 whose messages can only be 
> compressed to 1/5 of the original size. The producer would still use 1/10 as 
> the estimated compression ratio and put 10 messages into a batch. 

Jenkins build is back to normal : kafka-trunk-jdk7 #1441

2016-07-26 Thread Apache Jenkins Server
See 



[jira] [Commented] (KAFKA-3986) completedReceives can contain closed channels

2016-07-26 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15394923#comment-15394923
 ] 

Ismael Juma commented on KAFKA-3986:


Agreed [~sriramsub].

> completedReceives can contain closed channels 
> --
>
> Key: KAFKA-3986
> URL: https://issues.apache.org/jira/browse/KAFKA-3986
> Project: Kafka
>  Issue Type: Bug
>  Components: network
>Reporter: Ryan P
> Fix For: 0.10.0.2
>
>
> I'm not entirely sure why at this point, but it is possible to throw a 
> NullPointerException when processing completedReceives. This happens when a 
> fairly substantial number of connections are initiated with the server 
> simultaneously. 
> The processor thread does carry on, but it may be worth investigating how a 
> channel could be both closed and present in completedReceives. 
> The NPE in question is thrown here:
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/network/SocketServer.scala#L490
> It cannot be consistently reproduced either. 
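
A hedged sketch of the kind of defensive check implied here, assuming the public Selector.channel(id) and NetworkReceive.source() accessors from the clients network package (the actual code in question is the Scala SocketServer linked above):

{code}
import java.util.List;
import org.apache.kafka.common.network.KafkaChannel;
import org.apache.kafka.common.network.NetworkReceive;
import org.apache.kafka.common.network.Selector;

final class CompletedReceivesGuard {
    // Skip receives whose channel the selector has already closed, instead
    // of dereferencing null and throwing from the Processor loop.
    static void process(Selector selector, List<NetworkReceive> completedReceives) {
        for (NetworkReceive receive : completedReceives) {
            KafkaChannel channel = selector.channel(receive.source());
            if (channel == null)
                continue; // closed between poll() and processing
            // ... normal request handling using 'channel' ...
        }
    }
}
{code}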



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-3986) completedReceives can contain closed channels

2016-07-26 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15394923#comment-15394923
 ] 

Ismael Juma edited comment on KAFKA-3986 at 7/27/16 1:50 AM:
-

Agreed [~sriramsub], moved to the subsequent version.


was (Author: ijuma):
Agreed [~sriramsub].

> completedReceives can contain closed channels 
> --
>
> Key: KAFKA-3986
> URL: https://issues.apache.org/jira/browse/KAFKA-3986
> Project: Kafka
>  Issue Type: Bug
>  Components: network
>Reporter: Ryan P
> Fix For: 0.10.0.2
>
>
> I'm not entirely sure why at this point, but it is possible to throw a 
> NullPointerException when processing completedReceives. This happens when a 
> fairly substantial number of connections are initiated with the server 
> simultaneously. 
> The processor thread does carry on, but it may be worth investigating how a 
> channel could be both closed and present in completedReceives. 
> The NPE in question is thrown here:
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/network/SocketServer.scala#L490
> It cannot be consistently reproduced either. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3994) Deadlock between consumer heartbeat expiration and offset commit.

2016-07-26 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15394920#comment-15394920
 ] 

Jason Gustafson commented on KAFKA-3994:


Good catch [~becket_qin]. Just to clarify, this was not on trunk? I was 
wondering if this could have been caused by the changes in KAFKA-2720.

> Deadlock between consumer heartbeat expiration and offset commit.
> -
>
> Key: KAFKA-3994
> URL: https://issues.apache.org/jira/browse/KAFKA-3994
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
>Reporter: Jiangjie Qin
> Fix For: 0.10.0.2
>
>
> I got the following stacktraces from ConsumerBounceTest
> {code}
> ...
> "Test worker" #12 prio=5 os_prio=0 tid=0x7fbb28b7f000 nid=0x427c runnable 
> [0x7fbb06445000]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
> - locked <0x0003d48bcbc0> (a sun.nio.ch.Util$2)
> - locked <0x0003d48bcbb0> (a 
> java.util.Collections$UnmodifiableSet)
> - locked <0x0003d48bbd28> (a sun.nio.ch.EPollSelectorImpl)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
> at org.apache.kafka.common.network.Selector.select(Selector.java:454)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:277)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:179)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:411)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1086)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1054)
> at 
> kafka.api.ConsumerBounceTest.consumeWithBrokerFailures(ConsumerBounceTest.scala:103)
> at 
> kafka.api.ConsumerBounceTest.testConsumptionWithBrokerFailures(ConsumerBounceTest.scala:70)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:105)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:56)
> at 
> 

[jira] [Updated] (KAFKA-3994) Deadlock between consumer heartbeat expiration and offset commit.

2016-07-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3994:
---
Fix Version/s: (was: 0.10.0.1)
   0.10.0.2

> Deadlock between consumer heartbeat expiration and offset commit.
> -
>
> Key: KAFKA-3994
> URL: https://issues.apache.org/jira/browse/KAFKA-3994
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
>Reporter: Jiangjie Qin
> Fix For: 0.10.0.2
>
>
> I got the following stacktraces from ConsumerBounceTest
> {code}
> ...
> "Test worker" #12 prio=5 os_prio=0 tid=0x7fbb28b7f000 nid=0x427c runnable 
> [0x7fbb06445000]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
> - locked <0x0003d48bcbc0> (a sun.nio.ch.Util$2)
> - locked <0x0003d48bcbb0> (a 
> java.util.Collections$UnmodifiableSet)
> - locked <0x0003d48bbd28> (a sun.nio.ch.EPollSelectorImpl)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
> at org.apache.kafka.common.network.Selector.select(Selector.java:454)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:277)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:179)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:411)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1086)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1054)
> at 
> kafka.api.ConsumerBounceTest.consumeWithBrokerFailures(ConsumerBounceTest.scala:103)
> at 
> kafka.api.ConsumerBounceTest.testConsumptionWithBrokerFailures(ConsumerBounceTest.scala:70)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:105)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:56)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:64)
> at 
> 

[jira] [Updated] (KAFKA-3689) Exception when attempting to decrease connection count for address with no connections

2016-07-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3689:
---
Fix Version/s: (was: 0.10.0.1)
   0.10.0.2

> Exception when attempting to decrease connection count for address with no 
> connections
> --
>
> Key: KAFKA-3689
> URL: https://issues.apache.org/jira/browse/KAFKA-3689
> Project: Kafka
>  Issue Type: Bug
>  Components: network
>Affects Versions: 0.9.0.1
> Environment: ubuntu 14.04,
> java version "1.7.0_95"
> OpenJDK Runtime Environment (IcedTea 2.6.4) (7u95-2.6.4-0ubuntu0.14.04.2)
> OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)
> 3 broker cluster (all 3 servers identical -  Intel Xeon E5-2670 @2.6GHz, 
> 8cores, 16 threads 64 GB RAM & 1 TB Disk)
> Kafka Cluster is managed by 3 server ZK cluster (these servers are different 
> from Kafka broker servers). All 6 servers are connected via 10G switch. 
> Producers run from external servers.
>Reporter: Buvaneswari Ramanan
> Fix For: 0.10.1.0, 0.10.0.2
>
> Attachments: KAFKA-3689.log.redacted, kafka-3689-instrumentation.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> As per Ismael Juma's suggestion in the email thread to us...@kafka.apache.org 
> with the same subject, I am creating this bug report.
> The following error occurs in one of the brokers in our 3-broker cluster, 
> which serves about 8000 topics. These topics are single-partitioned with a 
> replication factor of 3. Each topic gets data at a low rate – 200 bytes per 
> sec. Leaders are balanced across the topics.
> Producers run from external servers (4 Ubuntu servers with the same config as 
> the brokers), each producing to 2000 topics using the kafka-python library.
> This error message occurs repeatedly on one of the servers. Between the hours 
> of 10:30am and 1:30pm on 5/9/16, there were about 10 million such 
> occurrences. This was right after a cluster restart.
> This is not the first time we have seen this error on this broker. In those 
> instances, the error occurred hours or days after a cluster restart.
> =
> [2016-05-09 10:38:43,932] ERROR Processor got uncaught exception. 
> (kafka.network.Processor)
> java.lang.IllegalArgumentException: Attempted to decrease connection count 
> for address with no connections, address: /X.Y.Z.144 (actual network address 
> masked)
> at 
> kafka.network.ConnectionQuotas$$anonfun$9.apply(SocketServer.scala:565)
> at 
> kafka.network.ConnectionQuotas$$anonfun$9.apply(SocketServer.scala:565)
> at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
> at scala.collection.AbstractMap.getOrElse(Map.scala:59)
> at kafka.network.ConnectionQuotas.dec(SocketServer.scala:564)
> at 
> kafka.network.Processor$$anonfun$run$13.apply(SocketServer.scala:450)
> at 
> kafka.network.Processor$$anonfun$run$13.apply(SocketServer.scala:445)
> at scala.collection.Iterator$class.foreach(Iterator.scala:742)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at kafka.network.Processor.run(SocketServer.scala:445)
> at java.lang.Thread.run(Thread.java:745)
> [2016-05-09 10:38:43,932] ERROR Processor got uncaught exception. 
> (kafka.network.Processor)
> java.lang.IllegalArgumentException: Attempted to decrease connection count 
> for address with no connections, address: /X.Y.Z.144
> at 
> kafka.network.ConnectionQuotas$$anonfun$9.apply(SocketServer.scala:565)
> at 
> kafka.network.ConnectionQuotas$$anonfun$9.apply(SocketServer.scala:565)
> at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
> at scala.collection.AbstractMap.getOrElse(Map.scala:59)
> at kafka.network.ConnectionQuotas.dec(SocketServer.scala:564)
> at 
> kafka.network.Processor$$anonfun$run$13.apply(SocketServer.scala:450)
> at 
> kafka.network.Processor$$anonfun$run$13.apply(SocketServer.scala:445)
> at scala.collection.Iterator$class.foreach(Iterator.scala:742)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at kafka.network.Processor.run(SocketServer.scala:445)
> at java.lang.Thread.run(Thread.java:745)
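
For reference, a hedged Java sketch of the counting pattern behind this error (illustrative, not kafka.network.ConnectionQuotas itself, which is Scala):

{code}
import java.net.InetAddress;
import java.util.HashMap;
import java.util.Map;

final class ConnectionCountSketch {
    private final Map<InetAddress, Integer> counts = new HashMap<>();

    synchronized void inc(InetAddress addr) {
        counts.merge(addr, 1, Integer::sum);
    }

    // Decrementing an address with no recorded connections throws the exact
    // IllegalArgumentException seen in the log above, so the underlying bug
    // is an inc/dec imbalance somewhere in the connection lifecycle.
    synchronized void dec(InetAddress addr) {
        Integer count = counts.get(addr);
        if (count == null)
            throw new IllegalArgumentException(
                "Attempted to decrease connection count for address with no connections, address: " + addr);
        if (count == 1)
            counts.remove(addr);
        else
            counts.put(addr, count - 1);
    }
}
{code}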



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3994) Deadlock between consumer heartbeat expiration and offset commit.

2016-07-26 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15394919#comment-15394919
 ] 

Ismael Juma commented on KAFKA-3994:


Also, did you see this issue with 0.10.0.0 or trunk?

> Deadlock between consumer heartbeat expiration and offset commit.
> -
>
> Key: KAFKA-3994
> URL: https://issues.apache.org/jira/browse/KAFKA-3994
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
>Reporter: Jiangjie Qin
> Fix For: 0.10.0.2
>
>
> I got the following stack traces from ConsumerBounceTest
> {code}
> ...
> "Test worker" #12 prio=5 os_prio=0 tid=0x7fbb28b7f000 nid=0x427c runnable 
> [0x7fbb06445000]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
> - locked <0x0003d48bcbc0> (a sun.nio.ch.Util$2)
> - locked <0x0003d48bcbb0> (a 
> java.util.Collections$UnmodifiableSet)
> - locked <0x0003d48bbd28> (a sun.nio.ch.EPollSelectorImpl)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
> at org.apache.kafka.common.network.Selector.select(Selector.java:454)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:277)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:179)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:411)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1086)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1054)
> at 
> kafka.api.ConsumerBounceTest.consumeWithBrokerFailures(ConsumerBounceTest.scala:103)
> at 
> kafka.api.ConsumerBounceTest.testConsumptionWithBrokerFailures(ConsumerBounceTest.scala:70)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:105)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:56)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:64)
> at 
> 

[jira] [Updated] (KAFKA-3986) completedReceives can contain closed channels

2016-07-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3986:
---
Fix Version/s: (was: 0.10.0.1)
   0.10.0.2

> completedReceives can contain closed channels 
> --
>
> Key: KAFKA-3986
> URL: https://issues.apache.org/jira/browse/KAFKA-3986
> Project: Kafka
>  Issue Type: Bug
>  Components: network
>Reporter: Ryan P
> Fix For: 0.10.0.2
>
>
> I'm not entirely sure why at this point, but it is possible to throw a 
> NullPointerException when processing completedReceives. This happens when a 
> fairly substantial number of connections are initiated with the server 
> simultaneously. 
> The processor thread does carry on, but it may be worth investigating how a 
> channel could be both closed and present in completedReceives. 
> The NPE in question is thrown here:
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/network/SocketServer.scala#L490
> It cannot be consistently reproduced either. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3689) Exception when attempting to decrease connection count for address with no connections

2016-07-26 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-3689:
---
Assignee: (was: Jun Rao)

> Exception when attempting to decrease connection count for address with no 
> connections
> --
>
> Key: KAFKA-3689
> URL: https://issues.apache.org/jira/browse/KAFKA-3689
> Project: Kafka
>  Issue Type: Bug
>  Components: network
>Affects Versions: 0.9.0.1
> Environment: ubuntu 14.04,
> java version "1.7.0_95"
> OpenJDK Runtime Environment (IcedTea 2.6.4) (7u95-2.6.4-0ubuntu0.14.04.2)
> OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)
> 3 broker cluster (all 3 servers identical -  Intel Xeon E5-2670 @2.6GHz, 
> 8cores, 16 threads 64 GB RAM & 1 TB Disk)
> Kafka Cluster is managed by 3 server ZK cluster (these servers are different 
> from Kafka broker servers). All 6 servers are connected via 10G switch. 
> Producers run from external servers.
>Reporter: Buvaneswari Ramanan
> Fix For: 0.10.1.0, 0.10.0.1
>
> Attachments: KAFKA-3689.log.redacted, kafka-3689-instrumentation.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> As per Ismael Juma's suggestion in an email thread to us...@kafka.apache.org 
> with the same subject, I am creating this bug report.
> The following error occurs in one of the brokers in our 3-broker cluster, 
> which serves about 8000 topics. These topics are single-partitioned with a 
> replication factor of 3. Each topic gets data at a low rate – 200 bytes per 
> sec. Leaders are balanced across the brokers.
> Producers run from external servers (4 Ubuntu servers with the same config as 
> the brokers), each producing to 2000 topics using the kafka-python library.
> This error message occurs repeatedly on one of the servers. Between the hours 
> of 10:30am and 1:30pm on 5/9/16, there were about 10 million such 
> occurrences, right after a cluster restart.
> This is not the first time we have seen this error on this broker; in earlier 
> instances, the error occurred hours or days after a cluster restart.
> =
> [2016-05-09 10:38:43,932] ERROR Processor got uncaught exception. 
> (kafka.network.Processor)
> java.lang.IllegalArgumentException: Attempted to decrease connection count 
> for address with no connections, address: /X.Y.Z.144 (actual network address 
> masked)
> at 
> kafka.network.ConnectionQuotas$$anonfun$9.apply(SocketServer.scala:565)
> at 
> kafka.network.ConnectionQuotas$$anonfun$9.apply(SocketServer.scala:565)
> at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
> at scala.collection.AbstractMap.getOrElse(Map.scala:59)
> at kafka.network.ConnectionQuotas.dec(SocketServer.scala:564)
> at 
> kafka.network.Processor$$anonfun$run$13.apply(SocketServer.scala:450)
> at 
> kafka.network.Processor$$anonfun$run$13.apply(SocketServer.scala:445)
> at scala.collection.Iterator$class.foreach(Iterator.scala:742)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at kafka.network.Processor.run(SocketServer.scala:445)
> at java.lang.Thread.run(Thread.java:745)
> [2016-05-09 10:38:43,932] ERROR Processor got uncaught exception. 
> (kafka.network.Processor)
> java.lang.IllegalArgumentException: Attempted to decrease connection count 
> for address with no connections, address: /X.Y.Z.144
> at 
> kafka.network.ConnectionQuotas$$anonfun$9.apply(SocketServer.scala:565)
> at 
> kafka.network.ConnectionQuotas$$anonfun$9.apply(SocketServer.scala:565)
> at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
> at scala.collection.AbstractMap.getOrElse(Map.scala:59)
> at kafka.network.ConnectionQuotas.dec(SocketServer.scala:564)
> at 
> kafka.network.Processor$$anonfun$run$13.apply(SocketServer.scala:450)
> at 
> kafka.network.Processor$$anonfun$run$13.apply(SocketServer.scala:445)
> at scala.collection.Iterator$class.foreach(Iterator.scala:742)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at kafka.network.Processor.run(SocketServer.scala:445)
> at java.lang.Thread.run(Thread.java:745)
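For readers unfamiliar with the code path above, the counting logic implicated 
by the stack trace (ConnectionQuotas.dec at SocketServer.scala:564-565) has 
roughly the shape of this simplified Scala sketch; the class below is an 
illustrative stand-in, not the actual kafka.network.ConnectionQuotas:

{code}
import scala.collection.mutable

// Simplified stand-in for kafka.network.ConnectionQuotas (illustration only).
class ConnectionCounter {
  private val counts = mutable.Map.empty[String, Int]

  def inc(address: String): Unit = counts.synchronized {
    counts(address) = counts.getOrElse(address, 0) + 1
  }

  // Throws the reported IllegalArgumentException when no connection is
  // recorded for the address, which is exactly what the Processor hits above.
  def dec(address: String): Unit = counts.synchronized {
    val count = counts.getOrElse(address,
      throw new IllegalArgumentException(
        s"Attempted to decrease connection count for address with no connections, address: $address"))
    if (count == 1) counts.remove(address)
    else counts(address) = count - 1
  }
}
{code}

The sketch also shows why the error repeats: once the count for an address is 
lost (or decremented twice), every later close from that address trips the 
same exception.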



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3986) completedReceives can contain closed channels

2016-07-26 Thread Sriram Subramanian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394900#comment-15394900
 ] 

Sriram Subramanian commented on KAFKA-3986:
---

Given that it cannot be consistently reproduced, is it a blocker for 0.10.0.1? 
cc [~ijuma]

> completedReceives can contain closed channels 
> --
>
> Key: KAFKA-3986
> URL: https://issues.apache.org/jira/browse/KAFKA-3986
> Project: Kafka
>  Issue Type: Bug
>  Components: network
>Reporter: Ryan P
> Fix For: 0.10.0.1
>
>
> I'm not entirely sure why at this point, but it is possible to throw a 
> NullPointerException when processing completedReceives. This happens when a 
> fairly substantial number of connections are initiated with the server 
> simultaneously. 
> The processor thread does carry on, but it may be worth investigating how a 
> channel could be both closed and present in completedReceives. 
> The NPE in question is thrown here:
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/network/SocketServer.scala#L490
> It cannot be consistently reproduced either. 
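A minimal sketch of the kind of guard this investigation points at (an 
assumption about the direction of a fix, not a committed patch): check that 
the channel behind a completed receive still exists before processing it. 
Selector.completedReceives and Selector.channel(id) are the existing 
client-network APIs; the request-handling body is elided.

{code}
import scala.collection.JavaConverters._
import org.apache.kafka.common.network.Selector

// Illustrative guard only: skip receives whose channel was closed before
// the processor drained completedReceives, instead of hitting an NPE.
def processCompletedReceives(selector: Selector): Unit =
  selector.completedReceives.asScala.foreach { receive =>
    Option(selector.channel(receive.source)) match {
      case Some(channel) =>
        // ... normal request handling using `channel` goes here
      case None =>
        // channel already closed and removed; drop the stale receive
    }
  }
{code}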



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: kafka-trunk-jdk8 #776

2016-07-26 Thread Apache Jenkins Server
See 

Changes:

[ismael] MINOR: Increase default timeout for other `wait` methods in `TestUtils`

--
[...truncated 1312 lines...]
kafka.server.LogRecoveryTest > testHWCheckpointWithFailuresMultipleLogSegments 
PASSED

kafka.server.LogRecoveryTest > testHWCheckpointNoFailuresSingleLogSegment 
STARTED

kafka.server.LogRecoveryTest > testHWCheckpointNoFailuresSingleLogSegment PASSED

kafka.server.LogRecoveryTest > testHWCheckpointWithFailuresSingleLogSegment 
STARTED

kafka.server.LogRecoveryTest > testHWCheckpointWithFailuresSingleLogSegment 
PASSED

kafka.server.AbstractFetcherThreadTest > testConsumerLagRemovedWithPartition 
STARTED

kafka.server.AbstractFetcherThreadTest > testConsumerLagRemovedWithPartition 
PASSED

kafka.server.AbstractFetcherThreadTest > testMetricsRemovedOnShutdown STARTED

kafka.server.AbstractFetcherThreadTest > testMetricsRemovedOnShutdown PASSED

kafka.server.ServerShutdownTest > testCleanShutdownAfterFailedStartup STARTED

kafka.server.ServerShutdownTest > testCleanShutdownAfterFailedStartup PASSED

kafka.server.ServerShutdownTest > testConsecutiveShutdown STARTED

kafka.server.ServerShutdownTest > testConsecutiveShutdown PASSED

kafka.server.ServerShutdownTest > testCleanShutdown STARTED

kafka.server.ServerShutdownTest > testCleanShutdown PASSED

kafka.server.ServerShutdownTest > testCleanShutdownWithDeleteTopicEnabled 
STARTED

kafka.server.ServerShutdownTest > testCleanShutdownWithDeleteTopicEnabled PASSED

kafka.server.ServerStartupTest > testBrokerStateRunningAfterZK STARTED

kafka.server.ServerStartupTest > testBrokerStateRunningAfterZK PASSED

kafka.server.ServerStartupTest > testBrokerCreatesZKChroot STARTED

kafka.server.ServerStartupTest > testBrokerCreatesZKChroot PASSED

kafka.server.ServerStartupTest > testConflictBrokerRegistration STARTED

kafka.server.ServerStartupTest > testConflictBrokerRegistration PASSED

kafka.server.ServerStartupTest > testBrokerSelfAware STARTED

kafka.server.ServerStartupTest > testBrokerSelfAware PASSED

kafka.server.EdgeCaseRequestTest > testInvalidApiVersionRequest STARTED

kafka.server.EdgeCaseRequestTest > testInvalidApiVersionRequest PASSED

kafka.server.EdgeCaseRequestTest > testMalformedHeaderRequest STARTED

kafka.server.EdgeCaseRequestTest > testMalformedHeaderRequest PASSED

kafka.server.EdgeCaseRequestTest > testProduceRequestWithNullClientId STARTED

kafka.server.EdgeCaseRequestTest > testProduceRequestWithNullClientId PASSED

kafka.server.EdgeCaseRequestTest > testInvalidApiKeyRequest STARTED

kafka.server.EdgeCaseRequestTest > testInvalidApiKeyRequest PASSED

kafka.server.EdgeCaseRequestTest > testHeaderOnlyRequest STARTED

kafka.server.EdgeCaseRequestTest > testHeaderOnlyRequest PASSED

kafka.server.CreateTopicsRequestTest > testValidCreateTopicsRequests STARTED

kafka.server.CreateTopicsRequestTest > testValidCreateTopicsRequests PASSED

kafka.server.CreateTopicsRequestTest > testErrorCreateTopicsRequests STARTED

kafka.server.CreateTopicsRequestTest > testErrorCreateTopicsRequests PASSED

kafka.server.CreateTopicsRequestTest > testInvalidCreateTopicsRequests STARTED

kafka.server.CreateTopicsRequestTest > testInvalidCreateTopicsRequests PASSED

kafka.server.CreateTopicsRequestTest > testNotController STARTED

kafka.server.CreateTopicsRequestTest > testNotController PASSED

kafka.server.OffsetCommitTest > testUpdateOffsets STARTED

kafka.server.OffsetCommitTest > testUpdateOffsets PASSED

kafka.server.OffsetCommitTest > testLargeMetadataPayload STARTED

kafka.server.OffsetCommitTest > testLargeMetadataPayload PASSED

kafka.server.OffsetCommitTest > testCommitAndFetchOffsets STARTED

kafka.server.OffsetCommitTest > testCommitAndFetchOffsets PASSED

kafka.server.OffsetCommitTest > testOffsetExpiration STARTED

kafka.server.OffsetCommitTest > testOffsetExpiration PASSED

kafka.server.OffsetCommitTest > testNonExistingTopicOffsetCommit STARTED

kafka.server.OffsetCommitTest > testNonExistingTopicOffsetCommit PASSED

kafka.server.ReplicaManagerTest > testHighWaterMarkDirectoryMapping STARTED

kafka.server.ReplicaManagerTest > testHighWaterMarkDirectoryMapping PASSED

kafka.server.ReplicaManagerTest > 
testFetchBeyondHighWatermarkReturnEmptyResponse STARTED

kafka.server.ReplicaManagerTest > 
testFetchBeyondHighWatermarkReturnEmptyResponse PASSED

kafka.server.ReplicaManagerTest > testIllegalRequiredAcks STARTED

kafka.server.ReplicaManagerTest > testIllegalRequiredAcks PASSED

kafka.server.ReplicaManagerTest > testClearPurgatoryOnBecomingFollower STARTED

kafka.server.ReplicaManagerTest > testClearPurgatoryOnBecomingFollower PASSED

kafka.server.ReplicaManagerTest > testHighwaterMarkRelativeDirectoryMapping 
STARTED

kafka.server.ReplicaManagerTest > testHighwaterMarkRelativeDirectoryMapping 
PASSED

kafka.server.IsrExpirationTest > testIsrExpirationForSlowFollowers STARTED

kafka.server.IsrExpirationTest > 

[jira] [Commented] (KAFKA-3994) Deadlock between consumer heartbeat expiration and offset commit.

2016-07-26 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394836#comment-15394836
 ] 

Ismael Juma commented on KAFKA-3994:


[~becket_qin], are you intending to work on this? cc [~hachikuji]

> Deadlock between consumer heartbeat expiration and offset commit.
> -
>
> Key: KAFKA-3994
> URL: https://issues.apache.org/jira/browse/KAFKA-3994
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
>Reporter: Jiangjie Qin
> Fix For: 0.10.0.1
>
>
> I got the following stack traces from ConsumerBounceTest
> {code}
> ...
> "Test worker" #12 prio=5 os_prio=0 tid=0x7fbb28b7f000 nid=0x427c runnable 
> [0x7fbb06445000]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
> - locked <0x0003d48bcbc0> (a sun.nio.ch.Util$2)
> - locked <0x0003d48bcbb0> (a 
> java.util.Collections$UnmodifiableSet)
> - locked <0x0003d48bbd28> (a sun.nio.ch.EPollSelectorImpl)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
> at org.apache.kafka.common.network.Selector.select(Selector.java:454)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:277)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:179)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:411)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1086)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1054)
> at 
> kafka.api.ConsumerBounceTest.consumeWithBrokerFailures(ConsumerBounceTest.scala:103)
> at 
> kafka.api.ConsumerBounceTest.testConsumptionWithBrokerFailures(ConsumerBounceTest.scala:70)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:105)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:56)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:64)
> at 
> 

[jira] [Updated] (KAFKA-3733) Avoid long command lines by setting CLASSPATH in environment

2016-07-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3733:
---
Fix Version/s: (was: 0.10.0.1)
   0.10.1.0

> Avoid long command lines by setting CLASSPATH in environment
> 
>
> Key: KAFKA-3733
> URL: https://issues.apache.org/jira/browse/KAFKA-3733
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Reporter: Adrian Muraru
>Priority: Minor
> Fix For: 0.10.1.0
>
>
> {{kafka-run-class.sh}} sets the JVM classpath on the command line via {{-cp}}.
> This generates long command lines that get trimmed by the shell in commands 
> like ps, pgrep, etc.
> An alternative is to set the CLASSPATH in the environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3585) Shutdown slow when there is only one broker which is controller

2016-07-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3585:
---
Fix Version/s: (was: 0.10.0.1)
   0.10.0.2

> Shutdown slow when there is only one broker which is controller
> ---
>
> Key: KAFKA-3585
> URL: https://issues.apache.org/jira/browse/KAFKA-3585
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.9.0.1
>Reporter: Pengwei
>Priority: Minor
> Fix For: 0.10.0.2
>
>
> Reproducer steps:
> 1. Install a 3-broker cluster
> 2. Create a topic with 3 partitions
> 3. Shut down the brokers one by one; you will find the last one shuts down 
> very slowly because of this error:
> [2016-04-19 20:30:19,168] INFO [Kafka Server 1], Remaining partitions to 
> move: 
> __consumer_offsets-48,__consumer_offsets-13,__consumer_offsets-46,__consumer_offsets-11,__consumer_offsets-44,__consumer_offsets-42,__consumer_offsets-21,__consumer_offsets-19,__consumer_offsets-32,__consumer_offsets-30,__consumer_offsets-28,__consumer_offsets-26,__consumer_offsets-7,__consumer_offsets-40,__consumer_offsets-38,__consumer_offsets-36,__consumer_offsets-1,__consumer_offsets-34,__consumer_offsets-16,__consumer_offsets-45,__consumer_offsets-14,__consumer_offsets-12,__consumer_offsets-41,__consumer_offsets-10,__consumer_offsets-24,__consumer_offsets-22,__consumer_offsets-20,__consumer_offsets-49,__consumer_offsets-18,__consumer_offsets-31,__consumer_offsets-0,test2-0,__consumer_offsets-27,__consumer_offsets-39,__consumer_offsets-8,__consumer_offsets-37,__consumer_offsets-6,__consumer_offsets-4,__consumer_offsets-2
>  (kafka.server.KafkaServer)
> [2016-04-19 20:30:19,169] INFO [Kafka Server 1], Error code from controller: 
> 0 (kafka.server.KafkaServer)
> [2016-04-19 20:30:24,169] WARN [Kafka Server 1], Retrying controlled shutdown 
> after the previous attempt failed... (kafka.server.KafkaServer)
> [2016-04-19 20:30:24,171] WARN [Kafka Server 1], Proceeding to do an unclean 
> shutdown as all the controlled shutdown attempts failed 
> (kafka.server.KafkaServer)
> The retry behaviour is determined by:
> controlled.shutdown.retry.backoff.ms = 5000
> controlled.shutdown.max.retries = 3
> so the controlled shutdown attempts add roughly 3 x 5 = 15 seconds before the 
> unclean shutdown proceeds.
> It is slow because the last broker cannot elect new leaders for the remaining 
> partitions. The last broker could be improved to shut down quickly: we can 
> skip the controlled-shutdown error when it is the last broker standing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3437) We don't need sitedocs package for every scala version

2016-07-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3437:
---
Fix Version/s: (was: 0.10.0.1)
   0.10.1.0

> We don't need sitedocs package for every scala version
> --
>
> Key: KAFKA-3437
> URL: https://issues.apache.org/jira/browse/KAFKA-3437
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.0
>Reporter: Gwen Shapira
>Assignee: Grant Henke
>Priority: Minor
> Fix For: 0.10.1.0
>
>
> When running {{./gradlew releaseTarGzAll}}, it generates a binary tarball for 
> every Scala version we support (good!) and also a sitedoc tarball for every 
> Scala version we support (useless).
> It would be nice if we had a way to generate just one sitedoc tarball for our 
> release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3500) KafkaOffsetBackingStore set method needs to handle null

2016-07-26 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-3500:
-
Priority: Blocker  (was: Critical)

> KafkaOffsetBackingStore set method needs to handle null 
> 
>
> Key: KAFKA-3500
> URL: https://issues.apache.org/jira/browse/KAFKA-3500
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0
>Reporter: Liquan Pei
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.10.0.1
>
>
> In some cases, the key or value for the offset map can be null. However, it 
> seems that we didn't handle null properly in this case and didn't perform a 
> null check when converting the byte buffer back to a byte array. 
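A minimal sketch of the null-safe conversion being described (illustrative 
shape only; KafkaOffsetBackingStore itself is Java, and toNullableArray is a 
made-up helper name):

{code}
import java.nio.ByteBuffer

// Tolerate null keys/values instead of dereferencing a null buffer.
def toNullableArray(buffer: ByteBuffer): Array[Byte] =
  if (buffer == null) null
  else {
    val bytes = new Array[Byte](buffer.remaining())
    buffer.duplicate().get(bytes) // duplicate() leaves the caller's position untouched
    bytes
  }
{code}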



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3396) Unauthorized topics are returned to the user

2016-07-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3396:
---
Fix Version/s: (was: 0.10.0.1)
   0.10.1.0

> Unauthorized topics are returned to the user
> 
>
> Key: KAFKA-3396
> URL: https://issues.apache.org/jira/browse/KAFKA-3396
> Project: Kafka
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.9.0.0, 0.10.0.0
>Reporter: Grant Henke
>Assignee: Edoardo Comar
> Fix For: 0.10.1.0
>
>
> Kafka's clients and protocol expose unauthorized topics to the end user. 
> This is often considered a security hole. To some, the topic name is 
> considered sensitive information. Those who do not consider the name 
> sensitive still consider it extra information that allows a user to try to 
> circumvent security. Instead, if a user does not have access to the topic, 
> the servers should act as if the topic does not exist. 
> To solve this, some of the changes could include:
>   - The broker should not return a TOPIC_AUTHORIZATION(29) error for 
> requests (metadata, produce, fetch, etc.) that include a topic that the user 
> does not have DESCRIBE access to.
>   - A user should not receive a TopicAuthorizationException when they do 
> not have DESCRIBE access to a topic or the cluster.
>   - The client should not maintain and expose a list of unauthorized 
> topics in org.apache.kafka.common.Cluster. 
> Other changes may be required that are not listed here. Further analysis is 
> needed. 
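A sketch of the broker-side shape this implies (an assumption, not the 
implemented patch; it uses the pluggable kafka.security.auth.Authorizer API): 
topics the principal cannot DESCRIBE are simply filtered out of responses, as 
if they did not exist.

{code}
import kafka.network.RequestChannel.Session
import kafka.security.auth.{Authorizer, Describe, Resource, Topic}

// Illustrative filtering: unauthorized topics disappear from responses
// rather than being reported back with TOPIC_AUTHORIZATION errors.
def visibleTopics(requested: Set[String], authorizer: Authorizer,
                  session: Session): Set[String] =
  requested.filter { topic =>
    authorizer.authorize(session, Describe, Resource(Topic, topic))
  }
{code}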



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3947) kafka-reassign-partitions.sh should support dumping current assignment

2016-07-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3947:
---
Fix Version/s: (was: 0.10.0.1)
   0.10.1.0

> kafka-reassign-partitions.sh should support dumping current assignment
> --
>
> Key: KAFKA-3947
> URL: https://issues.apache.org/jira/browse/KAFKA-3947
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 0.10.0.0
>Reporter: Yuto Kawamura
>Assignee: Yuto Kawamura
>Priority: Minor
> Fix For: 0.10.1.0
>
>
> When I was building my own tool to perform reassignment of partitions, I 
> realized that there's no way to dump the current partition assignment in a 
> machine-parsable format such as JSON.
> Actually, giving the {{\-\-generate}} option to the kafka-reassign-partitions.sh 
> script dumps the current assignment of the topics given by 
> {{\-\-topics-to-assign-json-file}}, but it's very inconvenient because:
> - I want the dump to contain all topics. That is, I want to skip generating 
> the list of current topics just to pass it to the generate command.
> - The output is concatenated with the result of the reassignment, so I can't 
> simply do something like: {{kafka-reassign-partitions.sh --generate ... > 
> current-assignment.json}}
> - There's no need to ask Kafka to generate a reassignment just to get the 
> current assignment in the first place.
> Here I'd like to add the {{\-\-dump}} option to kafka-reassign-partitions.sh.
> I was wondering whether this functionality should be provided by 
> {{kafka-reassign-partitions.sh}} or {{kafka-topics.sh}}, but now I think 
> {{kafka-reassign-partitions.sh}} is the more appropriate place, as the 
> resulting JSON should be in the format of {{\-\-reassignment-json-file}}, 
> which sticks to this command.
> Will follow up with a patch implementing this shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3428) Remove metadata sync bottleneck from mirrormaker's producer

2016-07-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3428:
---
Fix Version/s: (was: 0.10.0.1)
   0.10.1.0

> Remove metadata sync bottleneck from mirrormaker's producer
> ---
>
> Key: KAFKA-3428
> URL: https://issues.apache.org/jira/browse/KAFKA-3428
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.9.0.1
>Reporter: Maysam Yabandeh
> Fix For: 0.10.1.0
>
>
> Due to synchronization on the single producer, MM in a setup with 32 consumer 
> threads could not send more than 358k msg/sec, and hence could not saturate 
> the NIC. Profiling showed that producer.send takes 0.080 ms on average, which 
> explains the bottleneck of 358k msg/sec. The following explains the 
> bottleneck in producer.send and suggests how to improve it.
> The current impl of MM relies on a single producer. For EACH message, 
> producer.send() calls waitOnMetadata, which runs the following synchronized 
> method:
> {code}
> // add topic to metadata topic list if it is not there already.
> if (!this.metadata.containsTopic(topic))
> this.metadata.add(topic);
> {code}
> Although the code is mostly a no-op, since containsTopic is synchronized it 
> becomes the bottleneck in MM.
> Profiling highlights this bottleneck:
> {code}
> 100.0% - 65,539 ms kafka.tools.MirrorMaker$MirrorMakerThread.run
>   18.9% - 12,403 ms org.apache.kafka.clients.producer.KafkaProducer.send
>   13.8% - 9,056 ms 
> org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata
>   12.1% - 7,933 ms org.apache.kafka.clients.Metadata.containsTopic
>   1.7% - 1,088 ms org.apache.kafka.clients.Metadata.fetch
>   2.6% - 1,729 ms org.apache.kafka.clients.Metadata.fetch
>   2.2% - 1,442 ms 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append
> {code}
> After replacing this bottleneck with a kind of no-op, another run of the 
> profiler shows that fetch is the next bottleneck:
> {code}
> org.xerial.snappy.SnappyNative.arrayCopy   132 s (54 %)   n/a n/a
>   java.lang.Thread.run 50,776 ms (21 %)   n/a n/a
>   org.apache.kafka.clients.Metadata.fetch  20,881 ms (8 %)   n/a n/a
>   6.8% - 16,546 ms 
> org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata
>   6.8% - 16,546 ms org.apache.kafka.clients.producer.KafkaProducer.send
>   6.8% - 16,546 ms kafka.tools.MirrorMaker$MirrorMakerProducer.send
> {code}
> However, the fetch method does not need to be synchronized:
> {code}
> public synchronized Cluster fetch() {
> return this.cluster;
> }
> {code}
> Removing sync from the fetch method shows that the bottleneck disappears:
> {code}
> org.xerial.snappy.SnappyNative.arrayCopy   249 s (78 %)   n/a n/a
>   org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel 
> 24,489 ms (7 %)   n/a n/a
>   org.xerial.snappy.SnappyNative.rawUncompress 17,024 ms (5 %)   n/a n/a
>   org.apache.kafka.clients.producer.internals.RecordAccumulator.append 
> 13,817 ms (4 %)   n/a n/a
>   4.3% - 13,817 ms org.apache.kafka.clients.producer.KafkaProducer.send
> {code}
> Internally we have applied a patch to remove this bottleneck. The patch does 
> the following:
> 1. replace the HashSet with a concurrent hash set
> 2. remove sync from containsTopic and fetch
> 3. pass a copy of topics to getClusterForCurrentTopics, since this 
> synchronized method accesses topics at two locations and topics being changed 
> in the middle might mess with the semantics.
> Any interest in applying this patch? Any alternative suggestions?
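A Scala sketch of the shape of points 1 and 2 (illustrative; the real class is 
org.apache.kafka.clients.Metadata, which is Java): a concurrent set makes 
containsTopic lock-free, and a volatile field replaces the synchronized 
fetch(). Point 3, snapshotting the topic set before getClusterForCurrentTopics 
reads it twice, is omitted here.

{code}
import java.util.concurrent.ConcurrentHashMap
import org.apache.kafka.common.Cluster

// Illustrative stand-in for the patched metadata hot path.
class MetadataSketch {
  private val topics = ConcurrentHashMap.newKeySet[String]()
  @volatile private var cluster: Cluster = Cluster.empty()

  def containsTopic(topic: String): Boolean = topics.contains(topic) // no monitor
  def add(topic: String): Unit = topics.add(topic)

  def fetch(): Cluster = cluster // volatile read instead of a synchronized block

  def update(newCluster: Cluster): Unit = { cluster = newCluster }
}
{code}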



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3234) Minor documentation edits: clarify minISR; some topic-level configs are missing

2016-07-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3234:
---
Fix Version/s: (was: 0.10.0.1)
   0.10.1.0

> Minor documentation edits: clarify minISR; some topic-level configs are 
> missing
> ---
>
> Key: KAFKA-3234
> URL: https://issues.apache.org/jira/browse/KAFKA-3234
> Project: Kafka
>  Issue Type: Improvement
>  Components: website
>Reporter: Joel Koshy
>Assignee: Joel Koshy
> Fix For: 0.10.1.0
>
>
> Based on an offline conversation with [~junrao] and [~gwenshap]
> The current documentation is somewhat confusing on minISR in that it says 
> that it offers a trade-off between consistency and availability. From the 
> user's viewpoint, consistency (at least in the usual sense of the term) is 
> achieved by disabling unclean leader election - since no replica that was out 
> of ISR can be elected as the leader. So a consumer will never see a message 
> that was not acknowledged to a producer that set acks to "all". Or to put it 
> another way, setting minISR alone will not prevent exposing uncommitted 
> messages - disabling unclean leader election is the stronger requirement. You 
> can achieve the same effect, though, by setting minISR equal to the number of 
> replicas.
> There is also some stale documentation that needs to be removed:
> {quote}
> In our current release we choose the second strategy and favor choosing a 
> potentially inconsistent replica when all replicas in the ISR are dead. In 
> the future, we would like to make this configurable to better support use 
> cases where downtime is preferable to inconsistency.
> {quote}
> Finally, it was reported on the mailing list (from Elias Levy) that 
> compression.type should be added under the topic configs. Same goes for 
> unclean leader election. Would be good to have these auto-generated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3284) Consider removing beta label in security documentation

2016-07-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3284:
---
Fix Version/s: (was: 0.10.0.1)
   0.10.1.0

> Consider removing beta label in security documentation
> --
>
> Key: KAFKA-3284
> URL: https://issues.apache.org/jira/browse/KAFKA-3284
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Reporter: Ismael Juma
>Assignee: Ismael Juma
> Fix For: 0.10.1.0
>
>
> We currently state that our security support is beta. It would be good to 
> remove that for 0.10.0.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3438) Rack Aware Replica Reassignment should warn of overloaded brokers

2016-07-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3438:
---
Fix Version/s: (was: 0.10.0.1)
   0.10.1.0

> Rack Aware Replica Reassignment should warn of overloaded brokers
> -
>
> Key: KAFKA-3438
> URL: https://issues.apache.org/jira/browse/KAFKA-3438
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.10.0.0
>Reporter: Ben Stopford
> Fix For: 0.10.1.0
>
>
> We've changed the replica reassignment code to be rack-aware.
> One problem that might catch users out is that they rebalance the cluster 
> using kafka-reassign-partitions.sh, but their rack configuration means that a 
> high proportion of replicas are pushed onto a single, or small number of, 
> brokers. 
> This should be an easy problem to avoid by changing the rack assignment 
> information, but we should probably warn users if they are going to create 
> something that is unbalanced. 
> So imagine I have a Kafka cluster of 12 nodes spread over two racks with rack 
> awareness enabled. If I add a 13th machine, on a new rack, and run the 
> rebalance tool, that new machine will get ~6x as many replicas as the least 
> loaded broker. 
> Suggest a warning be added to the tool output when --generate is called: 
> "The most loaded broker has 2.3x as many replicas as the least loaded 
> broker. This is likely due to an uneven distribution of brokers across racks. 
> You're advised to alter the rack config so there are approximately the same 
> number of brokers per rack", and display the individual rack→#brokers and 
> broker→#replicas data for the proposed move.
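A tiny sketch of the suggested check (illustrative only, assuming the tool 
already has the proposed broker to replica-count mapping):

{code}
// Warn when a proposed assignment is heavily skewed across brokers.
def warnIfUnbalanced(replicasPerBroker: Map[Int, Int], threshold: Double = 1.5): Unit = {
  val counts = replicasPerBroker.values
  val spread = counts.max.toDouble / counts.min
  if (spread > threshold)
    println(f"The most loaded broker has $spread%.1fx as many replicas as the " +
      "least loaded broker. This is likely due to an uneven distribution of " +
      "brokers across racks.")
}

// The 13-broker example above: 12 old brokers on two racks plus one new
// broker on its own rack gives the new broker ~6x the replicas.
warnIfUnbalanced(((1 to 12).map(_ -> 60) :+ (13 -> 360)).toMap)
{code}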



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3493) Replica fetcher load is not balanced over fetcher threads

2016-07-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3493:
---
Fix Version/s: (was: 0.10.0.1)
   0.10.0.2

> Replica fetcher load is not balanced over fetcher threads
> -
>
> Key: KAFKA-3493
> URL: https://issues.apache.org/jira/browse/KAFKA-3493
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.9.0.1
>Reporter: Maysam Yabandeh
> Fix For: 0.10.0.2
>
>
> The replicas are not evenly distributed among the fetcher threads. This has 
> caused some fetcher threads to get overloaded, and hence their requests time 
> out frequently. This is an especially big issue when a new node is added to 
> the cluster and the fetch traffic is high. 
> Here is an example run in a test cluster with 10 brokers and 6 fetcher 
> threads (per source broker). A single topic consisting of 500+ partitions was 
> assigned to have a replica for each partition on the newly added broker.
> {code}[kafka-jetstream.canary]myabandeh@sjc8c-rl17-23b:~$ for i in `seq 0 5`; 
> do grep ReplicaFetcherThread-$i- /var/log/kafka/server.log | grep "reset its 
> fetch offset from 0" | wc -l; done
> 85
> 83
> 85
> 83
> 85
> 85
> [kafka-jetstream.canary]myabandeh@sjc8c-rl17-23b:~$ for i in `seq 0 5`; do 
> grep ReplicaFetcherThread-$i-22 /var/log/kafka/server.log | grep "reset its 
> fetch offset from 0" | wc -l; done
> 15
> 1
> 13
> 1
> 14
> 1
> {code}
> The problem is that the AbstractFetcherManager::getFetcherId method does not 
> take the broker id into account:
> {code}
>   private def getFetcherId(topic: String, partitionId: Int) : Int = {
> Utils.abs(31 * topic.hashCode() + partitionId) % numFetchers
>   }
> {code}
> Hence although the replicas are evenly distributed among the fetcher ids 
> across all source brokers, this is not necessarily the case for each broker 
> separately. 
> I think a random function would do a much better job in distributing the load 
> over the fetcher threads from each source broker.
> Thoughts?
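One possible variant along these lines (an assumption, not a committed 
change): rather than a random function, mixing the source broker id into the 
existing hash already spreads each source broker's partitions across its own 
fetcher threads, and stays deterministic so a partition maps back to the same 
fetcher when it is removed.

{code}
import org.apache.kafka.common.utils.Utils

// Illustrative variant of AbstractFetcherManager.getFetcherId that also
// mixes in the source broker id.
class FetcherIdSketch(numFetchers: Int) {
  def getFetcherId(sourceBrokerId: Int, topic: String, partitionId: Int): Int =
    Utils.abs(31 * (31 * topic.hashCode + partitionId) + sourceBrokerId) % numFetchers
}
{code}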



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3665) Default ssl.endpoint.identification.algorithm should be https

2016-07-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3665:
---
Fix Version/s: (was: 0.10.0.1)
   0.10.1.0

> Default ssl.endpoint.identification.algorithm should be https
> -
>
> Key: KAFKA-3665
> URL: https://issues.apache.org/jira/browse/KAFKA-3665
> Project: Kafka
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.9.0.1
>Reporter: Ismael Juma
>Assignee: Ismael Juma
> Fix For: 0.10.1.0
>
>
> The default `ssl.endpoint.identification.algorithm` is `null`, which is not a 
> secure default (man-in-the-middle attacks are possible).
> We should probably use `https` instead. A more conservative alternative would 
> be to update the documentation instead of changing the default.
> A paper on the topic (thanks to Ryan Pridgeon for the reference): 
> http://www.cs.utexas.edu/~shmat/shmat_ccs12.pdf
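Until any default changes, clients can opt in explicitly; a minimal sketch of 
the client-side setting:

{code}
import java.util.Properties

val props = new Properties()
// Enable hostname verification of the peer's certificate (opt-in today).
props.put("ssl.endpoint.identification.algorithm", "https")
{code}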



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3123) Follower Broker cannot start if offsets are already out of range

2016-07-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3123:
---
Fix Version/s: (was: 0.10.0.1)
   0.10.0.2

> Follower Broker cannot start if offsets are already out of range
> 
>
> Key: KAFKA-3123
> URL: https://issues.apache.org/jira/browse/KAFKA-3123
> Project: Kafka
>  Issue Type: Bug
>  Components: core, replication
>Affects Versions: 0.9.0.0
>Reporter: Soumyajit Sahu
>Assignee: Neha Narkhede
>Priority: Critical
>  Labels: patch
> Fix For: 0.10.0.2
>
> Attachments: 
> 0001-Fix-Follower-crashes-when-offset-out-of-range-during.patch
>
>
> I was trying to upgrade our test Windows cluster from 0.8.1.1 to 0.9.0 one 
> machine at a time. Our logs have just 2 hours of retention. I had re-imaged 
> the test machine under consideration, and got the following error in a loop 
> after starting afresh with the 0.9.0 broker:
> [2016-01-19 13:57:28,809] WARN [ReplicaFetcherThread-1-169595708], Replica 
> 15588 for partition [EventLogs4,1] reset its fetch offset from 0 to 
> current leader 169595708's start offset 334086 
> (kafka.server.ReplicaFetcherThread)
> [2016-01-19 13:57:28,809] ERROR [ReplicaFetcherThread-1-169595708], Error 
> getting offset for partition [EventLogs4,1] to broker 169595708 
> (kafka.server.ReplicaFetcherThread)
> java.lang.IllegalStateException: Compaction for partition [EXO_EventLogs4,1] 
> cannot be aborted and paused since it is in LogCleaningPaused state.
>   at 
> kafka.log.LogCleanerManager$$anonfun$abortAndPauseCleaning$1.apply$mcV$sp(LogCleanerManager.scala:149)
>   at 
> kafka.log.LogCleanerManager$$anonfun$abortAndPauseCleaning$1.apply(LogCleanerManager.scala:140)
>   at 
> kafka.log.LogCleanerManager$$anonfun$abortAndPauseCleaning$1.apply(LogCleanerManager.scala:140)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>   at 
> kafka.log.LogCleanerManager.abortAndPauseCleaning(LogCleanerManager.scala:140)
>   at kafka.log.LogCleaner.abortAndPauseCleaning(LogCleaner.scala:141)
>   at kafka.log.LogManager.truncateFullyAndStartAt(LogManager.scala:304)
>   at 
> kafka.server.ReplicaFetcherThread.handleOffsetOutOfRange(ReplicaFetcherThread.scala:185)
>   at 
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:152)
>   at 
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:122)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:122)
>   at 
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:120)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>   at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
>   at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
>   at 
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:120)
>   at 
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:120)
>   at 
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:120)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>   at 
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
>   at 
> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:93)
>   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> I could unblock myself with a code change. I deleted the action for "case s 
> =>" in LogCleanerManager.scala's abortAndPauseCleaning(). I think we should 
> not throw an exception if the state is already LogCleaningAborted or 
> LogCleaningPaused in this function, but instead just let it roll.
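A self-contained sketch of that suggestion (an illustration of the 
tolerated-state idea, not LogCleanerManager itself):

{code}
sealed trait CleaningState
case object LogCleaningInProgress extends CleaningState
case object LogCleaningAborted extends CleaningState
case object LogCleaningPaused extends CleaningState

// Instead of throwing IllegalStateException for "unexpected" states, treat
// an already-aborted or already-paused partition as a no-op and move on.
def abortAndPause(state: Option[CleaningState]): CleaningState = state match {
  case None | Some(LogCleaningInProgress)                 => LogCleaningAborted
  case Some(s @ (LogCleaningAborted | LogCleaningPaused)) => s // just let it roll
}
{code}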



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3824) Docs indicate auto.commit breaks at least once delivery but that is incorrect

2016-07-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3824:
---
Fix Version/s: (was: 0.10.0.1)
   0.10.0.2

> Docs indicate auto.commit breaks at least once delivery but that is incorrect
> -
>
> Key: KAFKA-3824
> URL: https://issues.apache.org/jira/browse/KAFKA-3824
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Affects Versions: 0.10.0.0
>Reporter: Jay Kreps
>Assignee: Jason Gustafson
>  Labels: newbie
> Fix For: 0.10.1.0, 0.10.0.2
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The javadocs for the new consumer indicate that auto commit breaks 
> at-least-once delivery. This is no longer correct as of 0.10. 
> http://kafka.apache.org/0100/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3866) KerberosLogin refresh time bug and other improvements

2016-07-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3866:
---
Fix Version/s: (was: 0.10.0.1)
   0.10.1.0

> KerberosLogin refresh time bug and other improvements
> -
>
> Key: KAFKA-3866
> URL: https://issues.apache.org/jira/browse/KAFKA-3866
> Project: Kafka
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.10.0.0
>Reporter: Ismael Juma
>Assignee: Ismael Juma
> Fix For: 0.10.1.0
>
>
> ZOOKEEPER-2295 describes a bug in the Kerberos refresh time logic that is 
> also present in our KerberosLogin class. While looking at the code, I found a 
> number of things that could be improved. More details in the PR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3374) Failure in security rolling upgrade phase 2 system test

2016-07-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3374:
---
Fix Version/s: 0.10.0.2

> Failure in security rolling upgrade phase 2 system test
> ---
>
> Key: KAFKA-3374
> URL: https://issues.apache.org/jira/browse/KAFKA-3374
> Project: Kafka
>  Issue Type: Test
>  Components: system tests
>Reporter: Ismael Juma
>Assignee: Ben Stopford
>Priority: Critical
> Fix For: 0.10.1.0, 0.10.0.2
>
>
> [~geoffra] reported the following a few days ago.
> Seeing fairly consistent failures in
> "Module: kafkatest.tests.security_rolling_upgrade_test
> Class:  TestSecurityRollingUpgrade
> Method: test_rolling_upgrade_phase_two
> Arguments:
> {
>   "broker_protocol": "SASL_PLAINTEXT",
>   "client_protocol": "SASL_SSL"
> }
> Last successful run (git hash): 2a58ba9
> First failure: f7887bd
> (note: failures are not 100% consistent, so there's a non-zero chance the 
> commit that introduced the failure is prior to 2a58ba9)
> See for example:
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-03-08--001.1457454171--apache--trunk--f6e35de/report.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3190) KafkaProducer should not invoke callback in send()

2016-07-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3190:
---
Fix Version/s: (was: 0.10.0.1)
   0.10.0.2

> KafkaProducer should not invoke callback in send()
> --
>
> Key: KAFKA-3190
> URL: https://issues.apache.org/jira/browse/KAFKA-3190
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, producer 
>Affects Versions: 0.9.0.0
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
>Priority: Critical
> Fix For: 0.10.0.2
>
>
> Currently KafkaProducer will invoke callback.onComplete() if it receives an 
> ApiException during send(). This breaks the guarantee that callbacks will be 
> invoked in order. It seems the ApiException in send() only comes from 
> metadata refresh. If so, we can probably simply throw it instead of invoking 
> the callback.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3135) Unexpected delay before fetch response transmission

2016-07-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3135:
---
Fix Version/s: (was: 0.10.0.1)
   0.10.0.2

> Unexpected delay before fetch response transmission
> ---
>
> Key: KAFKA-3135
> URL: https://issues.apache.org/jira/browse/KAFKA-3135
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.0
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Critical
> Fix For: 0.10.0.2
>
>
> From the user list, Krzysztof Ciesielski reports the following:
> {quote}
> Scenario description:
> First, a producer writes 50 elements into a topic
> Then, a consumer starts to read, polling in a loop.
> When "max.partition.fetch.bytes" is set to a relatively small value, each
> "consumer.poll()" returns a batch of messages.
> If this value is left at its default, the output tends to look like this:
> Poll returned 13793 elements
> Poll returned 13793 elements
> Poll returned 13793 elements
> Poll returned 13793 elements
> Poll returned 0 elements
> Poll returned 0 elements
> Poll returned 0 elements
> Poll returned 0 elements
> Poll returned 13793 elements
> Poll returned 13793 elements
> As we can see, there are weird "gaps" when poll returns 0 elements for some
> time. What is the reason for that? Maybe there are some good practices
> about setting "max.partition.fetch.bytes" which I don't follow?
> {quote}
> The gist to reproduce this problem is here: 
> https://gist.github.com/kciesielski/054bb4359a318aa17561.
> After some initial investigation, the delay appears to be in the server's 
> networking layer. Basically I see a delay of 5 seconds from the time that 
> Selector.send() is invoked in SocketServer.Processor with the fetch response 
> to the time that the send is completed. Using netstat in the middle of the 
> delay shows the following output:
> {code}
> tcp4   0  0  10.191.0.30.55455  10.191.0.30.9092   ESTABLISHED
> tcp4   0 102400  10.191.0.30.9092   10.191.0.30.55454  ESTABLISHED
> {code}
> From this, it looks like the data reaches the send buffer, but needs to be 
> flushed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3374) Failure in security rolling upgrade phase 2 system test

2016-07-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3374:
---
Fix Version/s: (was: 0.10.0.1)

> Failure in security rolling upgrade phase 2 system test
> ---
>
> Key: KAFKA-3374
> URL: https://issues.apache.org/jira/browse/KAFKA-3374
> Project: Kafka
>  Issue Type: Test
>  Components: system tests
>Reporter: Ismael Juma
>Assignee: Ben Stopford
>Priority: Critical
> Fix For: 0.10.1.0
>
>
> [~geoffra] reported the following a few days ago.
> Seeing fairly consistent failures in
> "Module: kafkatest.tests.security_rolling_upgrade_test
> Class:  TestSecurityRollingUpgrade
> Method: test_rolling_upgrade_phase_two
> Arguments:
> {
>   "broker_protocol": "SASL_PLAINTEXT",
>   "client_protocol": "SASL_SSL"
> }
> Last successful run (git hash): 2a58ba9
> First failure: f7887bd
> (note: failures are not 100% consistent, so there's a non-zero chance the 
> commit that introduced the failure is prior to 2a58ba9)
> See for example:
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-03-08--001.1457454171--apache--trunk--f6e35de/report.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1665: MINOR: Increase default timeout for other `wait` m...

2016-07-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1665


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-3996) ByteBufferMessageSet.writeTo() should be non-blocking

2016-07-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3996:
---
Fix Version/s: 0.10.0.1

> ByteBufferMessageSet.writeTo() should be non-blocking
> -
>
> Key: KAFKA-3996
> URL: https://issues.apache.org/jira/browse/KAFKA-3996
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Jun Rao
>Assignee: Ismael Juma
> Fix For: 0.10.0.1
>
>
> Currently, in ByteBufferMessageSet.writeTo(), we try to finish writing all 
> bytes in the buffer in a single call. The code has been like that since 0.8. 
> This hasn't been a problem historically, since the broker uses zero-copy to 
> send fetch responses and only uses ByteBufferMessageSet to send produce 
> responses, which are small. However, in 0.10.0, if a consumer is older than 
> 0.10.0, the broker has to down-convert the messages and use 
> ByteBufferMessageSet to send a fetch response to the consumer. If the client 
> is slow and there are lots of bytes in the ByteBufferMessageSet, we may not 
> be able to completely send all the bytes in the buffer for a long period of 
> time. When this happens, the Processor will be blocked and can't handle other 
> connections, which is bad.
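A minimal sketch of the non-blocking shape (an assumption about the direction 
of the fix, not the merged patch): attempt a single write() per call and 
report progress, so a slow pre-0.10.0 consumer cannot pin the Processor while 
a down-converted response drains.

{code}
import java.nio.ByteBuffer
import java.nio.channels.GatheringByteChannel

// Illustrative non-blocking send: one write attempt per invocation; the
// caller retries on a later Selector poll instead of looping here.
class PartialSend(buffer: ByteBuffer) {
  def writeTo(channel: GatheringByteChannel): Int =
    channel.write(buffer) // may send only part; buffer.position() tracks progress

  def completed: Boolean = !buffer.hasRemaining
}
{code}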



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-3996) ByteBufferMessageSet.writeTo() should be non-blocking

2016-07-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma reassigned KAFKA-3996:
--

Assignee: Ismael Juma

> ByteBufferMessageSet.writeTo() should be non-blocking
> -
>
> Key: KAFKA-3996
> URL: https://issues.apache.org/jira/browse/KAFKA-3996
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Jun Rao
>Assignee: Ismael Juma
>
> Currently, in ByteBufferMessageSet.writeTo(), we try to finish writing all 
> bytes in the buffer in a single call. The code has been like that since 0.8. 
> This hasn't been a problem historically, since the broker uses zero-copy to 
> send fetch responses and only uses ByteBufferMessageSet to send produce 
> responses, which are small. However, in 0.10.0, if a consumer is older than 
> 0.10.0, the broker has to down-convert the messages and use 
> ByteBufferMessageSet to send a fetch response to the consumer. If the client 
> is slow and there are lots of bytes in the ByteBufferMessageSet, we may not 
> be able to completely send all the bytes in the buffer for a long period of 
> time. When this happens, the Processor will be blocked and can't handle other 
> connections, which is bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3973) Investigate feasibility of caching bytes vs. records

2016-07-26 Thread Bill Bejeck (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bejeck updated KAFKA-3973:
---
Status: Patch Available  (was: In Progress)

> Investigate feasibility of caching bytes vs. records
> 
>
> Key: KAFKA-3973
> URL: https://issues.apache.org/jira/browse/KAFKA-3973
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Eno Thereska
>Assignee: Bill Bejeck
> Fix For: 0.10.1.0
>
> Attachments: CachingPerformanceBenchmarks.java, MemoryLRUCache.java
>
>
> Currently the cache stores and accounts for records, not bytes or objects. 
> This investigation would be around measuring any performance overheads that 
> come from storing bytes or objects. As an outcome we should know whether 1) 
> we should store bytes or 2) we should store objects. 
> If we store objects, the cache still needs to know their size (so that it can 
> know if the object fits in the allocated cache space, e.g., if the cache is 
> 100MB and the object is 10MB, we'd have space for 10 such objects). The 
> investigation needs to figure out how to find out the size of the object 
> efficiently in Java.
> If we store bytes, then we are serialising an object into bytes before 
> caching it, i.e., we take a serialisation cost. The investigation needs to 
> measure how bad this cost can be, especially for the case when all objects 
> fit in cache (and thus any extra serialisation cost would show).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1668: Kafka 3973 feasibility caching bytes vs records

2016-07-26 Thread bbejeck
GitHub user bbejeck opened a pull request:

https://github.com/apache/kafka/pull/1668

Kafka 3973 feasibility caching bytes vs records

@enothereska 

Added the benchmark test per comments.

Thanks

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bbejeck/kafka 
KAFKA-3973_feasibility_caching_bytes_vs_records

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1668.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1668


commit 62a17892d2dfc8f443e48d3a1ed01ec13b44d4b4
Author: bbejeck 
Date:   2016-07-26T00:55:46Z

KAFKA-3973: updates to MemoryLRUCache and CachingPerformanceTest

commit 2b8f88a740fd9a098354b078af25386125d75721
Author: bbejeck 
Date:   2016-07-26T23:11:28Z

KAFKA-3973: Added a caching performance test for caching objects tracking 
cache size by memory, created mock cache, reverted changes to the 
MemoryLRUCache, added the
jamm.jar to build.gradle




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3973) Investigate feasibility of caching bytes vs. records

2016-07-26 Thread Bill Bejeck (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394761#comment-15394761
 ] 

Bill Bejeck commented on KAFKA-3973:


[~ijuma]  

I re-ran the tests with no instrumentation, using the FALLBACK_UNSAFE enum; the 
results were the same, if not slower. The benchmark can now be run with no 
instrumentation.
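
For context, FALLBACK_UNSAFE is one of jamm's guessing modes (jamm being the 
jar the PR adds to build.gradle). A minimal sketch of measuring deep object 
size this way, assuming the jamm 0.3.x API:

{code}
import org.github.jamm.MemoryMeter;

// Sketch: measure the deep (retained) size of a cached value in bytes,
// falling back to sun.misc.Unsafe instead of -javaagent instrumentation.
public class ObjectSizeProbe {
    public static void main(String[] args) {
        MemoryMeter meter = new MemoryMeter()
                .withGuessing(MemoryMeter.Guess.FALLBACK_UNSAFE);
        String value = "some-cached-value";
        // This is the figure a bytes-based cache bound would account against.
        System.out.println(meter.measureDeep(value) + " bytes");
    }
}
{code}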

> Investigate feasibility of caching bytes vs. records
> 
>
> Key: KAFKA-3973
> URL: https://issues.apache.org/jira/browse/KAFKA-3973
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Eno Thereska
>Assignee: Bill Bejeck
> Fix For: 0.10.1.0
>
> Attachments: CachingPerformanceBenchmarks.java, MemoryLRUCache.java
>
>
> Currently the cache stores and accounts for records, not bytes or objects. 
> This investigation would be around measuring any performance overheads that 
> come from storing bytes or objects. As an outcome we should know whether 1) 
> we should store bytes or 2) we should store objects. 
> If we store objects, the cache still needs to know their size (so that it can 
> know if the object fits in the allocated cache space, e.g., if the cache is 
> 100MB and the object is 10MB, we'd have space for 10 such objects). The 
> investigation needs to figure out how to find out the size of the object 
> efficiently in Java.
> If we store bytes, then we are serialising an object into bytes before 
> caching it, i.e., we take a serialisation cost. The investigation needs to 
> measure how bad this cost can be, especially for the case when all objects 
> fit in cache (and thus any extra serialisation cost would show).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3996) ByteBufferMessageSet.writeTo() should be non-blocking

2016-07-26 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394755#comment-15394755
 ] 

Jun Rao commented on KAFKA-3996:


To address this issue, we will need to get rid of the while loop and return the 
bytes returned from channel.write(buffer) similar to what's in ByteBufferSend.
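
A minimal Java sketch of that shape (illustrative only; the real class is 
Scala, and ByteBufferSend is the model being followed):

{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.GatheringByteChannel;

// Sketch of the proposed fix: attempt a single write and report progress,
// instead of looping until the buffer is fully drained.
final class NonBlockingSendSketch {
    private final ByteBuffer buffer;

    NonBlockingSendSketch(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    // May write fewer bytes than buffer.remaining(); the caller retries on the
    // next poll, so a slow consumer no longer blocks the Processor thread.
    int writeTo(GatheringByteChannel channel) throws IOException {
        return channel.write(buffer);
    }

    boolean completed() {
        return !buffer.hasRemaining();
    }
}
{code}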

> ByteBufferMessageSet.writeTo() should be non-blocking
> -
>
> Key: KAFKA-3996
> URL: https://issues.apache.org/jira/browse/KAFKA-3996
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Jun Rao
>
> Currently, in ByteBufferMessageSet.writeTo(), we try to finish writing all 
> bytes in the buffer in a single call. The code has been like that since 0.8. 
> This hasn't been a problem historically since the broker uses zero-copy to 
> send fetch responses and only uses ByteBufferMessageSet to send produce 
> responses, which are small. However, in 0.10.0, if a consumer is on a version 
> before 0.10.0, the broker has to down-convert the messages and use 
> ByteBufferMessageSet to send a fetch response to the consumer. If the client 
> is slow and there are lots of bytes in the ByteBufferMessageSet, we may not 
> be able to completely send all the bytes in the buffer for a long period of 
> time. When this happens, the Processor will be blocked and can't handle other 
> connections, which is bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3996) ByteBufferMessageSet.writeTo() should be non-blocking

2016-07-26 Thread Jun Rao (JIRA)
Jun Rao created KAFKA-3996:
--

 Summary: ByteBufferMessageSet.writeTo() should be non-blocking
 Key: KAFKA-3996
 URL: https://issues.apache.org/jira/browse/KAFKA-3996
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.10.0.0
Reporter: Jun Rao


Currently, in ByteBufferMessageSet.writeTo(), we try to finish writing all 
bytes in the buffer in a single call. The code has been like that since 0.8. 
This hasn't been a problem historically since the broker uses zero-copy to send 
fetch responses and only uses ByteBufferMessageSet to send produce responses, 
which are small. However, in 0.10.0, if a consumer is on a version before 
0.10.0, the broker has to down-convert the messages and use ByteBufferMessageSet 
to send a fetch response to the consumer. If the client is slow and there are 
lots of bytes in the ByteBufferMessageSet, we may not be able to completely send 
all the bytes in the buffer for a long period of time. When this happens, the 
Processor will be blocked and can't handle other connections, which is bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-3995) Add a new configuration "enable.compression.ratio.estimation" to the producer config

2016-07-26 Thread Mayuresh Gharat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayuresh Gharat reassigned KAFKA-3995:
--

Assignee: Mayuresh Gharat

> Add a new configuration "enable.compression.ratio.estimation" to the producer 
> config
> 
>
> Key: KAFKA-3995
> URL: https://issues.apache.org/jira/browse/KAFKA-3995
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 0.10.0.0
>Reporter: Jiangjie Qin
>Assignee: Mayuresh Gharat
> Fix For: 0.10.1.0
>
>
> We recently saw a few cases where RecordTooLargeException is thrown because 
> the compressed message sent by KafkaProducer exceeded the max message size.
> The root cause of this issue is that the compressor estimates the batch 
> size using an estimated compression ratio based on heuristic compression 
> ratio statistics. This does not quite work for traffic with highly variable 
> compression ratios. 
> For example, suppose the batch size is set to 100KB and the max message size 
> is 1MB. Initially the producer is sending messages (each message is 100KB) to 
> topic_1 whose data can be compressed to 1/10 of the original size. After a 
> while the estimated compression ratio in the compressor will be trained to 
> 1/10 and the producer would put 10 messages into one batch. Now the producer 
> starts to send messages (each message is also 100KB) to topic_2 whose 
> messages can only be compressed to 1/5 of the original size. The producer 
> would still use 1/10 as the estimated compression ratio and put 10 messages 
> into a batch. That batch would be 2 MB after compression, which exceeds the 
> maximum message size. In this case the user does not have many options other 
> than resending everything or closing the producer if they care about ordering.
> This is especially an issue for services like MirrorMaker whose producer is 
> shared by many different topics.
> To solve this issue, we can probably add a configuration 
> "enable.compression.ratio.estimation" to the producer. So when this 
> configuration is set to false, we stop estimating the compressed size but 
> will close the batch once the uncompressed bytes in the batch reach the 
> batch size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3995) Add a new configuration "enable.compression.ratio.estimation" to the producer config

2016-07-26 Thread Jiangjie Qin (JIRA)
Jiangjie Qin created KAFKA-3995:
---

 Summary: Add a new configuration 
"enable.comrpession.ratio.estimation" to the producer config
 Key: KAFKA-3995
 URL: https://issues.apache.org/jira/browse/KAFKA-3995
 Project: Kafka
  Issue Type: Improvement
  Components: clients
Affects Versions: 0.10.0.0
Reporter: Jiangjie Qin
 Fix For: 0.10.1.0


We recently saw a few cases where RecordTooLargeException is thrown because the 
compressed message sent by KafkaProducer exceeded the max message size.

The root cause of this issue is that the compressor estimates the batch size 
using an estimated compression ratio based on heuristic compression ratio 
statistics. This does not quite work for traffic with highly variable 
compression ratios. 

For example, suppose the batch size is set to 100KB and the max message size is 
1MB. Initially the producer is sending messages (each message is 100KB) to 
topic_1 whose data can be compressed to 1/10 of the original size. After a while 
the estimated compression ratio in the compressor will be trained to 1/10 and 
the producer would put 10 messages into one batch. Now the producer starts to 
send messages (each message is also 100KB) to topic_2 whose messages can only be 
compressed to 1/5 of the original size. The producer would still use 1/10 as the 
estimated compression ratio and put 10 messages into a batch. That batch would 
be 2 MB after compression, which exceeds the maximum message size. In this case 
the user does not have many options other than resending everything or closing 
the producer if they care about ordering.

This is especially an issue for services like MirrorMaker whose producer is 
shared by many different topics.

To solve this issue, we can probably add a configuration 
"enable.compression.ratio.estimation" to the producer. So when this 
configuration is set to false, we stop estimating the compressed size but will 
close the batch once the uncompressed bytes in the batch reach the batch size.
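
To make the failure mode concrete, a small self-contained sketch of the 
mechanism (round numbers chosen so the arithmetic is self-consistent; the 
figures above are approximate):

{code}
// The compressor admits messages while the *estimated* compressed size stays
// under the batch size; a stale ratio lets far too many bytes in.
public class StaleCompressionEstimate {
    public static void main(String[] args) {
        long batchSize = 1 << 20;        // 1MB compressed-batch target
        long maxMessageSize = 1 << 20;   // 1MB broker-side limit
        long messageSize = 100 << 10;    // 100KB per message

        double trainedRatio = 0.10;      // learned on topic_1 (10x compression)
        double actualRatio = 0.20;       // topic_2 only compresses 5x

        long uncompressed = 0;
        while ((uncompressed + messageSize) * trainedRatio <= batchSize) {
            uncompressed += messageSize; // the estimate says it still fits
        }

        long actualCompressed = (long) (uncompressed * actualRatio);
        // ~10MB uncompressed, ~2MB compressed: twice the broker's limit.
        System.out.printf("uncompressed=%d actual=%d limit=%d tooLarge=%b%n",
                uncompressed, actualCompressed, maxMessageSize,
                actualCompressed > maxMessageSize);
    }
}
{code}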



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3994) Deadlock between consumer heartbeat expiration and offset commit.

2016-07-26 Thread Jiangjie Qin (JIRA)
Jiangjie Qin created KAFKA-3994:
---

 Summary: Deadlock between consumer heartbeat expiration and offset 
commit.
 Key: KAFKA-3994
 URL: https://issues.apache.org/jira/browse/KAFKA-3994
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.10.0.0
Reporter: Jiangjie Qin
 Fix For: 0.10.0.1


I got the following stacktraces from ConsumerBounceTest

{code}
...
"Test worker" #12 prio=5 os_prio=0 tid=0x7fbb28b7f000 nid=0x427c runnable 
[0x7fbb06445000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x0003d48bcbc0> (a sun.nio.ch.Util$2)
- locked <0x0003d48bcbb0> (a java.util.Collections$UnmodifiableSet)
- locked <0x0003d48bbd28> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at org.apache.kafka.common.network.Selector.select(Selector.java:454)
at org.apache.kafka.common.network.Selector.poll(Selector.java:277)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:179)
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:411)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1086)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1054)
at 
kafka.api.ConsumerBounceTest.consumeWithBrokerFailures(ConsumerBounceTest.scala:103)
at 
kafka.api.ConsumerBounceTest.testConsumptionWithBrokerFailures(ConsumerBounceTest.scala:70)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:105)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:56)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:64)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 

[jira] [Assigned] (KAFKA-2806) Allow Kafka System Tests under different JDK versions

2016-07-26 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava reassigned KAFKA-2806:


Assignee: Ewen Cheslack-Postava

> Allow Kafka System Tests under different JDK versions
> -
>
> Key: KAFKA-2806
> URL: https://issues.apache.org/jira/browse/KAFKA-2806
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.10.1.0
>
>
> Currently the Kafka system tests (using ducktape) use JDK7 as the runtime 
> inside vagrant processes. However, there are some known issues with executing 
> Java8 builds with JDK7 under Scala:
> https://gist.github.com/AlainODea/1375759b8720a3f9f094
> http://stackoverflow.com/questions/24448723/java-error-java-util-concurrent-concurrenthashmap-keyset
> We need to be able to configure the system tests to execute under different 
> JDK versions in the virtual machines.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3993) Console producer drops data

2016-07-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394244#comment-15394244
 ] 

ASF GitHub Bot commented on KAFKA-3993:
---

GitHub user theduderog opened a pull request:

https://github.com/apache/kafka/pull/1667

KAFKA-3993 Console Producer Drops Data 

@ijuma Would you mind taking a look?

This fixes the problem for me using this test script.

```
export BOOTSTRAP_SERVERS=localhost:9092
export TOPIC=bar
export MESSAGES=1
./bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 1 
--replication-factor 1 --topic "$TOPIC" \
&& echo "acks=all" > /tmp/producer.config \
&& echo "linger.ms=0" >> /tmp/producer.config \
&& seq "$MESSAGES" | ./bin/kafka-console-producer.sh --broker-list 
"$BOOTSTRAP_SERVERS" --topic "$TOPIC" \
&& ./bin/kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP_SERVERS" 
--new-consumer --from-beginning --max-messages "${MESSAGES}" --topic "$TOPIC"
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/theduderog/kafka KAFKA-3993-close-producer

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1667.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1667


commit 093d2aaaf85a49dbfe1a07a8ea47739909a321b7
Author: Roger Hoover 
Date:   2016-07-26T18:08:47Z

Added close to producer




> Console producer drops data
> ---
>
> Key: KAFKA-3993
> URL: https://issues.apache.org/jira/browse/KAFKA-3993
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.0.0
>Reporter: Roger Hoover
>
> The console producer drops data if the process exits too quickly. I 
> suspect that the shutdown hook does not call close() or something goes wrong 
> during that close().
> Here's a simple script to illustrate the issue:
> {noformat}
> export BOOTSTRAP_SERVERS=localhost:9092
> export TOPIC=bar
> export MESSAGES=1
> ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 1 
> --replication-factor 1 --topic "$TOPIC" \
> && echo "acks=all" > /tmp/producer.config \
> && echo "linger.ms=0" >> /tmp/producer.config \
> && seq "$MESSAGES" | ./bin/kafka-console-producer.sh --broker-list 
> "$BOOTSTRAP_SERVERS" --topic "$TOPIC" \
> && ./bin/kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP_SERVERS" 
> --new-consumer --from-beginning --max-messages "${MESSAGES}" --topic "$TOPIC"
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1667: KAFKA-3993 Console Producer Drops Data

2016-07-26 Thread theduderog
GitHub user theduderog opened a pull request:

https://github.com/apache/kafka/pull/1667

KAFKA-3993 Console Producer Drops Data 

@ijuma Would you mind taking a look?

This fixes the problem for me using this test script.

```
export BOOTSTRAP_SERVERS=localhost:9092
export TOPIC=bar
export MESSAGES=1
./bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 1 
--replication-factor 1 --topic "$TOPIC" \
&& echo "acks=all" > /tmp/producer.config \
&& echo "linger.ms=0" >> /tmp/producer.config \
&& seq "$MESSAGES" | ./bin/kafka-console-producer.sh --broker-list 
"$BOOTSTRAP_SERVERS" --topic "$TOPIC" \
&& ./bin/kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP_SERVERS" 
--new-consumer --from-beginning --max-messages "${MESSAGES}" --topic "$TOPIC"
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/theduderog/kafka KAFKA-3993-close-producer

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1667.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1667


commit 093d2aaaf85a49dbfe1a07a8ea47739909a321b7
Author: Roger Hoover 
Date:   2016-07-26T18:08:47Z

Added close to producer




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (KAFKA-3993) Console producer drops data

2016-07-26 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-3993:
---

 Summary: Console producer drops data
 Key: KAFKA-3993
 URL: https://issues.apache.org/jira/browse/KAFKA-3993
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 0.10.0.0
Reporter: Roger Hoover


The console producer drops data if the process exits too quickly. I 
suspect that the shutdown hook does not call close() or something goes wrong 
during that close().

Here's a simple script to illustrate the issue:

{noformat}
export BOOTSTRAP_SERVERS=localhost:9092
export TOPIC=bar
export MESSAGES=1
./bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 1 
--replication-factor 1 --topic "$TOPIC" \
&& echo "acks=all" > /tmp/producer.config \
&& echo "linger.ms=0" >> /tmp/producer.config \
&& seq "$MESSAGES" | ./bin/kafka-console-producer.sh --broker-list 
"$BOOTSTRAP_SERVERS" --topic "$TOPIC" \
&& ./bin/kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP_SERVERS" 
--new-consumer --from-beginning --max-messages "${MESSAGES}" --topic "$TOPIC"
{noformat}
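
The PR's fix is to close the producer before exit; conceptually (a sketch of 
the idea, not the actual patch):

{code}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Sketch: flush and close the producer before the JVM exits, so records still
// buffered in the accumulator are not silently dropped on a fast exit.
public class CloseBeforeExit {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
        props.put("acks", "all");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        try {
            producer.send(new ProducerRecord<>("bar", "hello"));
        } finally {
            producer.close(); // blocks until outstanding sends complete
        }
    }
}
{code}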



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3991) MirrorMaker: allow custom publisher

2016-07-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394075#comment-15394075
 ] 

ASF GitHub Bot commented on KAFKA-3991:
---

GitHub user hsun-cnnxty opened a pull request:

https://github.com/apache/kafka/pull/1666

[KAFKA-3991]: allow MirrorMaker to have custom producer

Please see jira: https://issues.apache.org/jira/browse/KAFKA-3991

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hsun-cnnxty/kafka k3991

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1666.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1666


commit a5fdf0946fc47a3b31c48d037f6c5e3e1aa868ee
Author: Hang Sun 
Date:   2016-07-26T16:36:42Z

KAFKA-3991: allow MirrorMaker to have custom producer




> MirrorMaker: allow custom publisher
> ---
>
> Key: KAFKA-3991
> URL: https://issues.apache.org/jira/browse/KAFKA-3991
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 0.9.0.1
>Reporter: Hang Sun
>Priority: Minor
>  Labels: features
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> It would be nice if MirrorMaker allowed a custom publisher to be plugged in. 
> The use case we have is to use a customized REST publisher with Kafka REST 
> Proxy (http://docs.confluent.io/2.0.0/kafka-rest/docs/index.html) to mirror 
> data across data centers via the public internet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1666: [KAFKA-3991]: allow MirrorMaker to have custom pro...

2016-07-26 Thread hsun-cnnxty
GitHub user hsun-cnnxty opened a pull request:

https://github.com/apache/kafka/pull/1666

[KAFKA-3991]: allow MirrorMaker to have custom producer

Please see jira: https://issues.apache.org/jira/browse/KAFKA-3991

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hsun-cnnxty/kafka k3991

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1666.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1666


commit a5fdf0946fc47a3b31c48d037f6c5e3e1aa868ee
Author: Hang Sun 
Date:   2016-07-26T16:36:42Z

KAFKA-3991: allow MirrorMaker to have custom producer




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3992) InstanceAlreadyExistsException Error for Consumers Starting in Parallel

2016-07-26 Thread Manikumar Reddy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15393962#comment-15393962
 ] 

Manikumar Reddy commented on KAFKA-3992:


It looks like you are giving the same client-Id (consumerId) to multiple 
consumers. With this, we will miss some metrics.
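
For illustration, one way to avoid the collision is to assign each consumer in 
the process its own client.id (a sketch; the names here are made up):

{code}
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Sketch: a distinct client.id per consumer thread keeps the per-client JMX
// mbean names unique, so registration no longer fails.
public class DistinctClientIds {
    public static KafkaConsumer<String, String> newConsumer(int threadIndex) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption
        props.put("group.id", "my-group");
        props.put("client.id", "my-app-consumer-" + threadIndex);
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return new KafkaConsumer<>(props);
    }
}
{code}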

> InstanceAlreadyExistsException Error for Consumers Starting in Parallel
> ---
>
> Key: KAFKA-3992
> URL: https://issues.apache.org/jira/browse/KAFKA-3992
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.0
>Reporter: Alexander Cook
>
> I see the following error sometimes when I start multiple consumers at about 
> the same time in the same process (separate threads). Everything seems to 
> work fine afterwards, so should this not actually be an ERROR level message, 
> or could there be something going wrong that I don't see? 
> Let me know if I can provide any more info! 
> Error processing messages: Error registering mbean 
> kafka.consumer:type=consumer-node-metrics,client-id=consumer-1,node-id=node--1
> org.apache.kafka.common.KafkaException: Error registering mbean 
> kafka.consumer:type=consumer-node-metrics,client-id=consumer-1,node-id=node--1
>  
> Caused by: javax.management.InstanceAlreadyExistsException: 
> kafka.consumer:type=consumer-node-metrics,client-id=consumer-1,node-id=node--1
> Here is the full stack trace: 
> M[?:com.ibm.streamsx.messaging.kafka.KafkaConsumerV9.produceTuples:-1]  - 
> Error processing messages: Error registering mbean 
> kafka.consumer:type=consumer-node-metrics,client-id=consumer-1,node-id=node--1
> org.apache.kafka.common.KafkaException: Error registering mbean 
> kafka.consumer:type=consumer-node-metrics,client-id=consumer-1,node-id=node--1
>   at 
> org.apache.kafka.common.metrics.JmxReporter.reregister(JmxReporter.java:159)
>   at 
> org.apache.kafka.common.metrics.JmxReporter.metricChange(JmxReporter.java:77)
>   at 
> org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:288)
>   at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:177)
>   at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:162)
>   at 
> org.apache.kafka.common.network.Selector$SelectorMetrics.maybeRegisterConnectionMetrics(Selector.java:641)
>   at org.apache.kafka.common.network.Selector.poll(Selector.java:268)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:270)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:303)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:197)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:187)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:126)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorKnown(AbstractCoordinator.java:186)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:857)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:829)
>   at 
> com.ibm.streamsx.messaging.kafka.KafkaConsumerV9.produceTuples(KafkaConsumerV9.java:129)
>   at 
> com.ibm.streamsx.messaging.kafka.KafkaConsumerV9$1.run(KafkaConsumerV9.java:70)
>   at java.lang.Thread.run(Thread.java:785)
>   at 
> com.ibm.streams.operator.internal.runtime.OperatorThreadFactory$2.run(OperatorThreadFactory.java:137)
> Caused by: javax.management.InstanceAlreadyExistsException: 
> kafka.consumer:type=consumer-node-metrics,client-id=consumer-1,node-id=node--1
>   at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:449)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerWithRepository(DefaultMBeanServerInterceptor.java:1910)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:978)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:912)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:336)
>   at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:534)
>   at 
> org.apache.kafka.common.metrics.JmxReporter.reregister(JmxReporter.java:157)
>   ... 18 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3374) Failure in security rolling upgrade phase 2 system test

2016-07-26 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-3374:
-
Fix Version/s: 0.10.0.1

> Failure in security rolling upgrade phase 2 system test
> ---
>
> Key: KAFKA-3374
> URL: https://issues.apache.org/jira/browse/KAFKA-3374
> Project: Kafka
>  Issue Type: Test
>  Components: system tests
>Reporter: Ismael Juma
>Assignee: Ben Stopford
>Priority: Critical
> Fix For: 0.10.1.0, 0.10.0.1
>
>
> [~geoffra] reported the following a few days ago.
> Seeing fairly consistent failures in
> "Module: kafkatest.tests.security_rolling_upgrade_test
> Class:  TestSecurityRollingUpgrade
> Method: test_rolling_upgrade_phase_two
> Arguments:
> {
>   "broker_protocol": "SASL_PLAINTEXT",
>   "client_protocol": "SASL_SSL"
> }
> Last successful run (git hash): 2a58ba9
> First failure: f7887bd
> (note failures are not 100% consistent, so there's non-zero chance the commit 
> that introduced the failure is prior to 2a58ba9)
> See for example:
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-03-08--001.1457454171--apache--trunk--f6e35de/report.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3374) Failure in security rolling upgrade phase 2 system test

2016-07-26 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15393912#comment-15393912
 ] 

Ewen Cheslack-Postava commented on KAFKA-3374:
--

There are also a few cases of this on the 0.10.0 branch, so I'm going to add 
0.10.0.1 to the fix version.

> Failure in security rolling upgrade phase 2 system test
> ---
>
> Key: KAFKA-3374
> URL: https://issues.apache.org/jira/browse/KAFKA-3374
> Project: Kafka
>  Issue Type: Test
>  Components: system tests
>Reporter: Ismael Juma
>Assignee: Ben Stopford
>Priority: Critical
> Fix For: 0.10.1.0, 0.10.0.1
>
>
> [~geoffra] reported the following a few days ago.
> Seeing fairly consistent failures in
> "Module: kafkatest.tests.security_rolling_upgrade_test
> Class:  TestSecurityRollingUpgrade
> Method: test_rolling_upgrade_phase_two
> Arguments:
> {
>   "broker_protocol": "SASL_PLAINTEXT",
>   "client_protocol": "SASL_SSL"
> }
> Last successful run (git hash): 2a58ba9
> First failure: f7887bd
> (note failures are not 100% consistent, so there's non-zero chance the commit 
> that introduced the failure is prior to 2a58ba9)
> See for example:
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-03-08--001.1457454171--apache--trunk--f6e35de/report.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3980) JmxReporter uses excessive memory causing OutOfMemoryException

2016-07-26 Thread Andrew Jorgensen (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15393877#comment-15393877
 ] 

Andrew Jorgensen commented on KAFKA-3980:
-

[~omkreddy] do you mean enabling the debug log on the producer side or the kafka 
side? It looks like the debug log for the producer itself doesn't include its 
client id, so I'm assuming you mean on the kafka side?
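
As an aside, one way to quantify the metric buildup described in the report 
below is to count live Kafka mbeans. A sketch; the "kafka.producer" JMX domain 
is an assumption about where the leak shows up, so adjust it to your heap dump:

{code}
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Sketch: count registered Kafka mbeans to see per-client-id metric buildup.
public class MBeanCount {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName pattern = new ObjectName("kafka.producer:*");
        System.out.println(server.queryNames(pattern, null).size() + " mbeans");
    }
}
{code}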

> JmxReporter uses excessive memory causing OutOfMemoryException
> --
>
> Key: KAFKA-3980
> URL: https://issues.apache.org/jira/browse/KAFKA-3980
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Andrew Jorgensen
>
> I have some nodes in a kafka cluster that occasionally will run out of memory 
> whenever I restart the producers. I was able to take a heap dump from both a 
> recently restarted Kafka node, which weighed in at about 20 MB, and a node 
> that has been running for 2 months, which is using over 700MB of memory. 
> Looking at the heap dump it looks like the JmxReporter is holding on to 
> metrics and causing them to build up over time. 
> !http://imgur.com/N6Cd0Ku.png!
> !http://imgur.com/kQBqA2j.png!
> The ultimate problem this causes is that there is a chance when I restart the 
> producers it will cause the node to experience a Java heap space exception 
> and OOM. The nodes then fail to start up correctly and write a -1 as the 
> leader number to the partitions they were responsible for, effectively 
> resetting the offset and rendering those partitions unavailable. The kafka 
> process then needs to be restarted in order to re-assign the node to the 
> partitions that it owns.
> I have a few questions:
> 1. I am not quite sure why there are so many client id entries in that 
> JmxReporter map.
> 2. Is there a way to have the JmxReporter release metrics after a set amount 
> of time, or a way to turn certain high-cardinality metrics like these off?
> I can provide any logs or heap dumps if more information is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3992) InstanceAlreadyExistsException Error for Consumers Starting in Parallel

2016-07-26 Thread Alexander Cook (JIRA)
Alexander Cook created KAFKA-3992:
-

 Summary: InstanceAlreadyExistsException Error for Consumers 
Starting in Parallel
 Key: KAFKA-3992
 URL: https://issues.apache.org/jira/browse/KAFKA-3992
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.9.0.0
Reporter: Alexander Cook


I see the following error sometimes when I start multiple consumers at about 
the same time in the same process (separate threads). Everything seems to work 
fine afterwards, so should this not actually be an ERROR level message, or 
could there be something going wrong that I don't see? 

Let me know if I can provide any more info! 

Error processing messages: Error registering mbean 
kafka.consumer:type=consumer-node-metrics,client-id=consumer-1,node-id=node--1
org.apache.kafka.common.KafkaException: Error registering mbean 
kafka.consumer:type=consumer-node-metrics,client-id=consumer-1,node-id=node--1
 
Caused by: javax.management.InstanceAlreadyExistsException: 
kafka.consumer:type=consumer-node-metrics,client-id=consumer-1,node-id=node--1


Here is the full stack trace: 

M[?:com.ibm.streamsx.messaging.kafka.KafkaConsumerV9.produceTuples:-1]  - Error 
processing messages: Error registering mbean 
kafka.consumer:type=consumer-node-metrics,client-id=consumer-1,node-id=node--1
org.apache.kafka.common.KafkaException: Error registering mbean 
kafka.consumer:type=consumer-node-metrics,client-id=consumer-1,node-id=node--1
at 
org.apache.kafka.common.metrics.JmxReporter.reregister(JmxReporter.java:159)
at 
org.apache.kafka.common.metrics.JmxReporter.metricChange(JmxReporter.java:77)
at 
org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:288)
at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:177)
at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:162)
at 
org.apache.kafka.common.network.Selector$SelectorMetrics.maybeRegisterConnectionMetrics(Selector.java:641)
at org.apache.kafka.common.network.Selector.poll(Selector.java:268)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:270)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:303)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:197)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:187)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:126)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorKnown(AbstractCoordinator.java:186)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:857)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:829)
at 
com.ibm.streamsx.messaging.kafka.KafkaConsumerV9.produceTuples(KafkaConsumerV9.java:129)
at 
com.ibm.streamsx.messaging.kafka.KafkaConsumerV9$1.run(KafkaConsumerV9.java:70)
at java.lang.Thread.run(Thread.java:785)
at 
com.ibm.streams.operator.internal.runtime.OperatorThreadFactory$2.run(OperatorThreadFactory.java:137)
Caused by: javax.management.InstanceAlreadyExistsException: 
kafka.consumer:type=consumer-node-metrics,client-id=consumer-1,node-id=node--1
at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:449)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerWithRepository(DefaultMBeanServerInterceptor.java:1910)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:978)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:912)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:336)
at 
com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:534)
at 
org.apache.kafka.common.metrics.JmxReporter.reregister(JmxReporter.java:157)
... 18 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3991) MirrorMaker: allow custom publisher

2016-07-26 Thread Hang Sun (JIRA)
Hang Sun created KAFKA-3991:
---

 Summary: MirrorMaker: allow custom publisher
 Key: KAFKA-3991
 URL: https://issues.apache.org/jira/browse/KAFKA-3991
 Project: Kafka
  Issue Type: Improvement
  Components: tools
Affects Versions: 0.9.0.1
Reporter: Hang Sun
Priority: Minor


It would be nice if MirrorMaker allowed a custom publisher to be plugged in. The 
use case we have is to use a customized REST publisher with Kafka REST Proxy 
(http://docs.confluent.io/2.0.0/kafka-rest/docs/index.html) to mirror data 
across data centers via the public internet.
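
For illustration, the plug-in point could take a shape like the following 
(purely a hypothetical sketch; the interface and class names are invented here 
and are not taken from the actual patch):

{code}
// Hypothetical SPI a pluggable MirrorMaker could expose.
public interface MirrorMakerPublisher {
    void publish(String topic, byte[] key, byte[] value);
    void close();
}

// Example implementation posting records to a REST endpoint such as the
// Kafka REST Proxy (HTTP plumbing elided for brevity).
class RestProxyPublisher implements MirrorMakerPublisher {
    private final String baseUrl;

    RestProxyPublisher(String baseUrl) {
        this.baseUrl = baseUrl; // e.g. "https://rest-proxy.example.com"
    }

    @Override
    public void publish(String topic, byte[] key, byte[] value) {
        // POST to baseUrl + "/topics/" + topic with the record payload.
    }

    @Override
    public void close() {
        // Release HTTP client resources.
    }
}
{code}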



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [GitHub] kafka pull request #1661: KAFKA-3987: Allow config of the hash algorithm use...

2016-07-26 Thread Luciano Afranllie
Hi

Could somebody with commit permission review (and eventually merge) this
pull request?

Regards
Luciano

On Mon, Jul 25, 2016 at 11:49 AM, luafran  wrote:

> GitHub user luafran opened a pull request:
>
> https://github.com/apache/kafka/pull/1661
>
> KAFKA-3987: Allow config of the hash algorithm used by the log cleaner
>
> Allow configuration of the hash algorithm used by the Log Cleaner's
> offset map
>
> You can merge this pull request into a Git repository by running:
>
> $ git pull https://github.com/luafran/kafka
> config-for-log-cleaner-hash-algo
>
> Alternatively you can review and apply these changes as the patch at:
>
> https://github.com/apache/kafka/pull/1661.patch
>
> To close this pull request, make a commit to your master/trunk branch
> with (at least) the following in the commit message:
>
> This closes #1661
>
> 
> commit 2e7e507903c73740ca498405c5680a8c528ccda6
> Author: Luciano Afranllie 
> Date:   2016-07-25T14:39:59Z
>
> KAFKA-3987: Allow configuration of the hash algorithm used by the
> LogCleaner's offset map
>
> 
>
>
> ---
> If your project is set up for it, you can reply to this email and have your
> reply appear on GitHub as well. If your project does not have this feature
> enabled and wishes so, or if the feature is enabled but not working, please
> contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
> with INFRA.
> ---
>


[jira] [Commented] (KAFKA-3990) Kafka New Producer may raise an OutOfMemoryError

2016-07-26 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15393625#comment-15393625
 ] 

Ismael Juma commented on KAFKA-3990:


The buffer is allocated based on the size returned by the broker. It's unlikely 
that the broker would return such a big payload to the producer, so perhaps the 
message got corrupted on the way? Is there anything of interest in the broker 
logs?
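
One quick sanity check on a suspicious receiveSize is to decode it back into 
bytes. Notably, {{1213486160}} is exactly the big-endian int formed by the 
ASCII bytes "HTTP", which typically means the client read the start of an HTTP 
response off a non-Kafka port rather than a Kafka length header. A sketch:

{code}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Decode the 4-byte size prefix from the OOME report back into characters.
public class ReceiveSizeProbe {
    public static void main(String[] args) {
        int receiveSize = 1213486160; // value from the stack trace above
        byte[] raw = ByteBuffer.allocate(4).putInt(receiveSize).array();
        // Prints "HTTP": the "size" was really the first four bytes of an
        // HTTP response interpreted as a big-endian int.
        System.out.println(new String(raw, StandardCharsets.US_ASCII));
    }
}
{code}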

> Kafka New Producer may raise an OutOfMemoryError
> 
>
> Key: KAFKA-3990
> URL: https://issues.apache.org/jira/browse/KAFKA-3990
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
> Environment: Docker, Base image : CentOS
> Java 8u77
>Reporter: Brice Dutheil
>
> We are regularly seeing OOME errors on a kafka producer; we first saw:
> {code}
> java.lang.OutOfMemoryError: Java heap space
> at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_77]
> at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_77]
> at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at org.apache.kafka.common.network.Selector.poll(Selector.java:286) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_77]
> {code}
> This line refer to a buffer allocation {{ByteBuffer.allocate(receiveSize)}} 
> (see 
> https://github.com/apache/kafka/blob/0.9.0.1/clients/src/main/java/org/apache/kafka/common/network/NetworkReceive.java#L93)
> Usually the app runs fine within a 200/400 MB heap and a 64 MB Metaspace, and 
> we are producing small messages, 500B at most.
> Also, the error doesn't appear in the development environment. In order to 
> identify the issue we tweaked the code to give us actual data about the 
> allocation size, and we got this stack:
> {code}
> 09:55:49.484 [auth] [kafka-producer-network-thread | producer-1] WARN  
> o.a.k.c.n.NetworkReceive HEAP-ISSUE: constructor : Integer='-1', String='-1'
> 09:55:49.485 [auth] [kafka-producer-network-thread | producer-1] WARN  
> o.a.k.c.n.NetworkReceive HEAP-ISSUE: method : 
> NetworkReceive.readFromReadableChannel.receiveSize=1213486160
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to /tmp/tomcat.hprof ...
> Heap dump file created [69583827 bytes in 0.365 secs]
> 09:55:50.324 [auth] [kafka-producer-network-thread | producer-1] ERROR 
> o.a.k.c.utils.KafkaThread Uncaught exception in kafka-producer-network-thread 
> | producer-1: 
> java.lang.OutOfMemoryError: Java heap space
>   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_77]
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_77]
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
>  ~[kafka-clients-0.9.0.1.jar:na]
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>  ~[kafka-clients-0.9.0.1.jar:na]
>   at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153) 
> ~[kafka-clients-0.9.0.1.jar:na]
>   at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134) 
> ~[kafka-clients-0.9.0.1.jar:na]
>   at org.apache.kafka.common.network.Selector.poll(Selector.java:286) 
> ~[kafka-clients-0.9.0.1.jar:na]
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256) 
> ~[kafka-clients-0.9.0.1.jar:na]
>   at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) 
> ~[kafka-clients-0.9.0.1.jar:na]
>   at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) 
> ~[kafka-clients-0.9.0.1.jar:na]
>   at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_77]
> {code}
> Notice the size to allocate: {{1213486160}}, ~1.2 GB. I'm not yet sure how 
> this size is initialised.
> Notice as well that every time this OOME appears the {{NetworkReceive}} 
> constructor at 
> https://github.com/apache/kafka/blob/0.9.0.1/clients/src/main/java/org/apache/kafka/common/network/NetworkReceive.java#L49
>  receives the parameters : {{maxSize=-1}}, 

[jira] [Updated] (KAFKA-3990) Kafka New Producer may raise an OutOfMemoryError

2016-07-26 Thread Brice Dutheil (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brice Dutheil updated KAFKA-3990:
-
Description: 
We are regularly seeing OOME errors on a kafka producer; we first saw:

{code}
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_77]
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_77]
at 
org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
 ~[kafka-clients-0.9.0.1.jar:na]
at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71) 
~[kafka-clients-0.9.0.1.jar:na]
at 
org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153) 
~[kafka-clients-0.9.0.1.jar:na]
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134) 
~[kafka-clients-0.9.0.1.jar:na]
at org.apache.kafka.common.network.Selector.poll(Selector.java:286) 
~[kafka-clients-0.9.0.1.jar:na]
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256) 
~[kafka-clients-0.9.0.1.jar:na]
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) 
~[kafka-clients-0.9.0.1.jar:na]
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) 
~[kafka-clients-0.9.0.1.jar:na]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_77]
{code}


This line refers to a buffer allocation {{ByteBuffer.allocate(receiveSize)}} 
(see 
https://github.com/apache/kafka/blob/0.9.0.1/clients/src/main/java/org/apache/kafka/common/network/NetworkReceive.java#L93)

Usually the app runs fine within a 200/400 MB heap and a 64 MB Metaspace, and we 
are producing small messages, 500B at most.


Also, the error doesn't appear in the development environment. In order to 
identify the issue we tweaked the code to give us actual data about the 
allocation size, and we got this stack:

{code}
09:55:49.484 [auth] [kafka-producer-network-thread | producer-1] WARN  
o.a.k.c.n.NetworkReceive HEAP-ISSUE: constructor : Integer='-1', String='-1'
09:55:49.485 [auth] [kafka-producer-network-thread | producer-1] WARN  
o.a.k.c.n.NetworkReceive HEAP-ISSUE: method : 
NetworkReceive.readFromReadableChannel.receiveSize=1213486160
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /tmp/tomcat.hprof ...
Heap dump file created [69583827 bytes in 0.365 secs]
09:55:50.324 [auth] [kafka-producer-network-thread | producer-1] ERROR 
o.a.k.c.utils.KafkaThread Uncaught exception in kafka-producer-network-thread | 
producer-1: 
java.lang.OutOfMemoryError: Java heap space
  at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_77]
  at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_77]
  at 
org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
 ~[kafka-clients-0.9.0.1.jar:na]
  at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71) 
~[kafka-clients-0.9.0.1.jar:na]
  at 
org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153) 
~[kafka-clients-0.9.0.1.jar:na]
  at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134) 
~[kafka-clients-0.9.0.1.jar:na]
  at org.apache.kafka.common.network.Selector.poll(Selector.java:286) 
~[kafka-clients-0.9.0.1.jar:na]
  at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256) 
~[kafka-clients-0.9.0.1.jar:na]
  at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) 
~[kafka-clients-0.9.0.1.jar:na]
  at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) 
~[kafka-clients-0.9.0.1.jar:na]
  at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_77]
{code}


Notice the size to allocate: {{1213486160}}, ~1.2 GB. I'm not yet sure how this 
size is initialised.
Notice as well that every time this OOME appears the {{NetworkReceive}} 
constructor at 
https://github.com/apache/kafka/blob/0.9.0.1/clients/src/main/java/org/apache/kafka/common/network/NetworkReceive.java#L49
 receives the parameters {{maxSize=-1}}, {{source="-1"}}

We may have missed some configuration in our setup, but kafka clients shouldn't 
raise an OOME. For reference, the producer is initialised with:

{code}
Properties props = new Properties();
props.put(BOOTSTRAP_SERVERS_CONFIG, properties.bootstrapServers);
props.put(ACKS_CONFIG, "ONE");
props.put(RETRIES_CONFIG, 0);
props.put(BATCH_SIZE_CONFIG, 16384);
props.put(LINGER_MS_CONFIG, 0);
props.put(BUFFER_MEMORY_CONFIG, 33554432);
props.put(REQUEST_TIMEOUT_MS_CONFIG, 1000);
props.put(MAX_BLOCK_MS_CONFIG, 1000);
props.put(KEY_SERIALIZER_CLASS_CONFIG, 
StringSerializer.class.getName());
props.put(VALUE_SERIALIZER_CLASS_CONFIG, 
JSONSerializer.class.getName());
{code}

For reference while googling for the issue we found a similar stack trace with 
the new 

[jira] [Created] (KAFKA-3990) Kafka New Producer may raise an OutOfMemoryError

2016-07-26 Thread Brice Dutheil (JIRA)
Brice Dutheil created KAFKA-3990:


 Summary: Kafka New Producer may raise an OutOfMemoryError
 Key: KAFKA-3990
 URL: https://issues.apache.org/jira/browse/KAFKA-3990
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 0.9.0.1
 Environment: Docker, Base image : CentOS
Java 8u77
Reporter: Brice Dutheil


We are regularly seeing OOME errors on a kafka producer; we first saw:

{code}
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_77]
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_77]
at 
org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
 ~[kafka-clients-0.9.0.1.jar:na]
at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71) 
~[kafka-clients-0.9.0.1.jar:na]
at 
org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153) 
~[kafka-clients-0.9.0.1.jar:na]
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134) 
~[kafka-clients-0.9.0.1.jar:na]
at org.apache.kafka.common.network.Selector.poll(Selector.java:286) 
~[kafka-clients-0.9.0.1.jar:na]
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256) 
~[kafka-clients-0.9.0.1.jar:na]
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) 
~[kafka-clients-0.9.0.1.jar:na]
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) 
~[kafka-clients-0.9.0.1.jar:na]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_77]
{code}


This line refers to a buffer allocation {{ByteBuffer.allocate(receiveSize)}} 
(see 
https://github.com/apache/kafka/blob/0.9.0.1/clients/src/main/java/org/apache/kafka/common/network/NetworkReceive.java#L93)

Usually the app runs fine within a 200/400 MB heap and a 64 MB Metaspace, and we 
are producing small messages, 500B at most.


Also, the error doesn't appear in the development environment. In order to 
identify the issue we tweaked the code to give us actual data about the 
allocation size, and we got this stack:

{code}
09:55:49.484 [auth] [kafka-producer-network-thread | producer-1] WARN  
o.a.k.c.n.NetworkReceive HEAP-ISSUE: constructor : Integer='-1', String='-1'
09:55:49.485 [auth] [kafka-producer-network-thread | producer-1] WARN  
o.a.k.c.n.NetworkReceive HEAP-ISSUE: method : 
NetworkReceive.readFromReadableChannel.receiveSize=1213486160
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /tmp/tomcat.hprof ...
Heap dump file created [69583827 bytes in 0.365 secs]
09:55:50.324 [auth] [kafka-producer-network-thread | producer-1] ERROR 
o.a.k.c.utils.KafkaThread Uncaught exception in kafka-producer-network-thread | 
producer-1: 
java.lang.OutOfMemoryError: Java heap space
  at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_77]
  at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_77]
  at 
org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
 ~[kafka-clients-0.9.0.1.jar:na]
  at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71) 
~[kafka-clients-0.9.0.1.jar:na]
  at 
org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153) 
~[kafka-clients-0.9.0.1.jar:na]
  at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134) 
~[kafka-clients-0.9.0.1.jar:na]
  at org.apache.kafka.common.network.Selector.poll(Selector.java:286) 
~[kafka-clients-0.9.0.1.jar:na]
  at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256) 
~[kafka-clients-0.9.0.1.jar:na]
  at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) 
~[kafka-clients-0.9.0.1.jar:na]
  at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) 
~[kafka-clients-0.9.0.1.jar:na]
  at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_77]
{code}


Notice the size to allocate: {{1213486160}}, ~1.2 GB. I'm not yet sure how this 
size is initialised.
Notice as well that every time this OOME appears the NetworkReceive constructor 
receives the parameters {{maxSize=-1}}, {{source="-1"}} 
(https://github.com/apache/kafka/blob/0.9.0.1/clients/src/main/java/org/apache/kafka/common/network/NetworkReceive.java#L49)

We may have missed some configuration in our setup, but kafka clients shouldn't 
raise an OOME. For reference, the producer is initialised with:

{code}
Properties props = new Properties();
props.put(BOOTSTRAP_SERVERS_CONFIG, properties.bootstrapServers);
props.put(ACKS_CONFIG, "ONE");
props.put(RETRIES_CONFIG, 0);
props.put(BATCH_SIZE_CONFIG, 16384);
props.put(LINGER_MS_CONFIG, 0);
props.put(BUFFER_MEMORY_CONFIG, 33554432);
props.put(REQUEST_TIMEOUT_MS_CONFIG, 1000);
props.put(MAX_BLOCK_MS_CONFIG, 1000);

Re: [VOTE] KIP-55: Secure quotas for authenticated users

2016-07-26 Thread Rajini Sivaram
Hi all,

This vote has passed with 3 binding and one non-binding +1. Many thanks to
those who voted.

If you have any comments or suggestions, please continue to add them to the
discussion thread.


On Fri, Jul 15, 2016 at 6:47 AM, Gwen Shapira  wrote:

> +1
>
> Thanks for working through this, Rajini.
>
> On Thu, Jul 7, 2016 at 7:12 AM, Rajini Sivaram
>  wrote:
> > I would like to initiate voting for KIP-55 (
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-55%3A+Secure+Quotas+for+Authenticated+Users
> ).
> > Since the KIP has changed quite a lot since the last vote, we will
> discard
> > the previous vote and start this new voting thread.
> >
> > KIP-55 extends the existing client-id quota implementation to enable secure
> > quotas for multi-user environments. The KIP proposes a flexible, unified
> > design that supports quotas at <user>, <client-id> or <user, client-id>
> > levels. It retains compatibility with the existing <client-id> quotas when
> > new user level quotas are not configured.
> >
> > Thank you...
> >
> >
> > Regards,
> >
> > Rajini
>



-- 
Regards,

Rajini


[GitHub] kafka pull request #1665: MINOR: Increase default timeout for other `wait` m...

2016-07-26 Thread ijuma
GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/1665

MINOR: Increase default timeout for other `wait` methods in `TestUtils`

They are now consistent with `waitUntilTrue`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka increase-default-wait-until-time

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1665.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1665


commit 3ebff18e4cd850d0b652d39dc4aae6b3e1f0c4f2
Author: Ismael Juma 
Date:   2016-07-26T07:50:21Z

MINOR: Increase default timeout for other `wait` methods in `TestUtils`

They are now consistent with `waitUntilTrue`.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---