[jira] [Commented] (KAFKA-4050) Allow configuration of the PRNG used for SSL

2016-08-16 Thread Joel Koshy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423839#comment-15423839
 ] 

Joel Koshy commented on KAFKA-4050:
---

A stack trace should help further clarify. (This is from a thread dump that 
Todd shared with us offline). Thanks [~toddpalino] and [~mgharat] for finding 
this.

{noformat}
"kafka-network-thread-1393-SSL-30" #114 prio=5 os_prio=0 tid=0x7f2ec8c30800 nid=0x5c1e waiting for monitor entry [0x7f213b8f9000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:481)
    - waiting to lock <0x000641508bf8> (a java.lang.Object)
    at sun.security.provider.NativePRNG$RandomIO.access$400(NativePRNG.java:329)
    at sun.security.provider.NativePRNG.engineNextBytes(NativePRNG.java:218)
    at java.security.SecureRandom.nextBytes(SecureRandom.java:468)
    - locked <0x00066aad9880> (a java.security.SecureRandom)
    at sun.security.ssl.CipherBox.createExplicitNonce(CipherBox.java:1015)
    at sun.security.ssl.EngineOutputRecord.write(EngineOutputRecord.java:287)
    at sun.security.ssl.EngineOutputRecord.write(EngineOutputRecord.java:225)
    at sun.security.ssl.EngineWriter.writeRecord(EngineWriter.java:186)
    - locked <0x000671c5c978> (a sun.security.ssl.EngineWriter)
    at sun.security.ssl.SSLEngineImpl.writeRecord(SSLEngineImpl.java:1300)
    at sun.security.ssl.SSLEngineImpl.writeAppRecord(SSLEngineImpl.java:1271)
    - locked <0x000671ce7170> (a java.lang.Object)
    at sun.security.ssl.SSLEngineImpl.wrap(SSLEngineImpl.java:1186)
    - locked <0x000671ce7150> (a java.lang.Object)
    at javax.net.ssl.SSLEngine.wrap(SSLEngine.java:469)
    at org.apache.kafka.common.network.SslTransportLayer.write(SslTransportLayer.java:557)
    at kafka.api.TopicDataSend.writeTo(FetchResponse.scala:146)
    at org.apache.kafka.common.network.MultiSend.writeTo(MultiSend.java:81)
    at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:292)
    at org.apache.kafka.common.network.KafkaChannel.send(KafkaChannel.java:158)
    at org.apache.kafka.common.network.KafkaChannel.write(KafkaChannel.java:146)
    at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:329)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:283)
    at kafka.network.Processor.poll(SocketServer.scala:472)
    at kafka.network.Processor.run(SocketServer.scala:412)
    at java.lang.Thread.run(Thread.java:745)
{noformat}

Of note is that all of the network threads are waiting on the same NativePRNG 
lock (0x000641508bf8).
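
For illustration (an editorial sketch, not from the ticket): the dump above boils 
down to many threads sharing one SecureRandom instance, which under NativePRNG 
serializes them on a single internal lock. A minimal, self-contained Java demo 
of that pattern:

{noformat}
import java.security.SecureRandom;

public class PrngContentionDemo {
    // One SecureRandom shared by all threads, as in the JDK's SSL engine.
    // Under NativePRNG, nextBytes() reaches the globally locked
    // implNextBytes(), so the threads below make progress one at a time.
    private static final SecureRandom SHARED = new SecureRandom();

    public static void main(String[] args) {
        for (int i = 0; i < 8; i++) {
            new Thread(() -> {
                byte[] nonce = new byte[16];          // one nonce per SSL record
                for (int n = 0; n < 1_000_000; n++) {
                    SHARED.nextBytes(nonce);          // threads block here, serially
                }
            }, "network-thread-" + i).start();
        }
    }
}
{noformat}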

> Allow configuration of the PRNG used for SSL
> 
>
> Key: KAFKA-4050
> URL: https://issues.apache.org/jira/browse/KAFKA-4050
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 0.10.0.1
>Reporter: Todd Palino
>Assignee: Todd Palino
>  Labels: security, ssl
>
> This change will make the pseudo-random number generator (PRNG) 
> implementation used by the SSLContext configurable. The configuration is not 
> required, and the default is to use whatever the default PRNG for the JDK/JRE 
> is. Providing a string, such as "SHA1PRNG", will cause that specific 
> SecureRandom implementation to get passed to the SSLContext.
> When enabling inter-broker SSL in our certification cluster, we observed 
> severe performance issues. For reference, this cluster can take up to 600 
> MB/sec of inbound produce traffic over SSL, with RF=2, before it gets close 
> to saturation, and the mirror maker normally produces about 400 MB/sec 
> (unless it is lagging). When we enabled inter-broker SSL, we saw persistent 
> replication problems in the cluster at any inbound rate of more than about 6 
> or 7 MB/sec per-broker. This was narrowed down to all the network threads 
> blocking on a single lock in the SecureRandom code.
> It turns out that the default PRNG implementation on Linux is NativePRNG. 
> This uses randomness from /dev/urandom (which, by itself, is a non-blocking 
> read) and mixes it with randomness from SHA1. The problem is that the entire 
> application shares a single SecureRandom instance, and NativePRNG has a 
> global lock within the implNextBytes method. Switching to another 
> implementation (SHA1PRNG, which has better performance characteristics and is 
> still considered secure) completely eliminated the bottleneck and allowed the 
> cluster to work properly at saturation.
> The SSLContext initialization has an optional argument to provide a 
> SecureRandom instance, which the code currently sets to null. This change 
> creates a new config to specify an implementation, and instantiates that and 
> passes it to SSLContext if provided. This will also let someone select a 
> stronger source of randomness (obviously at a performance cost) if desired.
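
As an editorial sketch of the change described above (the factory method and 
parameter names here are hypothetical, not taken from the patch): the optional 
SecureRandom would be instantiated from a configured algorithm name and handed 
to SSLContext.init(), with null preserving today's default behavior.

{noformat}
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.net.ssl.KeyManager;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;

public class SslContextSketch {
    // implName would come from the new config, e.g. "SHA1PRNG";
    // null keeps the current behavior (the JDK/JRE default PRNG).
    static SSLContext build(String implName, KeyManager[] km, TrustManager[] tm)
            throws GeneralSecurityException {
        SecureRandom random =
            (implName == null) ? null : SecureRandom.getInstance(implName);
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(km, tm, random);  // a null SecureRandom means provider default
        return ctx;
    }
}
{noformat}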

Re: [DISCUSS] Java 8 as a minimum requirement

2016-08-16 Thread Alexis Midon
Java 7 is end of life. http://www.oracle.com/technetwork/java/eol-135779.html
+1



On Tue, Aug 16, 2016 at 6:43 AM Ismael Juma  wrote:

> Hey Harsha,
>
> I noticed that you proposed that Storm should drop support for Java 7 in
> master:
>
> http://markmail.org/message/25do6wd3a6g7cwpe
>
> It's useful to know what other Apache projects are doing in this regard, so
> I'm interested in the timeline being proposed for Storm's transition. I
> could not find it in the thread above, so I'd appreciate it if you could
> clarify it for us (and sorry if I missed it).
>
> Thanks,
> Ismael
>
> On Mon, Jun 20, 2016 at 5:05 AM, Harsha  wrote:
>
> > Hi Ismael,
> >   Agree that timing is more important. If we give enough heads
> >   up to the users who are on Java 7, that's great, but shipping
> >   this in the 0.10.x line won't be good, as it is still
> >   perceived as a maintenance release even if the release contains
> >   a lot of features. If we can make this part of 0.11, move the
> >   0.10.1 features to 0.11, and give a rough timeline for when
> >   that would be released, that would be ideal.
> >
> > Thanks,
> > Harsha
> >
> > On Fri, Jun 17, 2016, at 11:13 AM, Ismael Juma wrote:
> > > Hi Harsha,
> > >
> > > Comments below.
> > >
> > > On Fri, Jun 17, 2016 at 7:48 PM, Harsha  wrote:
> > >
> > > > Hi Ismael,
> > > > "Are you saying that you are aware of many Kafka users still
> > > > using Java 7
> > > > > who would be ready to upgrade to the next Kafka feature release
> > (whatever
> > > > > that version number is) before they can upgrade to Java 8?"
> > > > I know there are quite a few users who are still on Java 7
> > >
> > >
> > > This is good to know.
> > >
> > >
> > > > and regarding the
> > > > upgrade, we can't say yes or no. It's up to the user's discretion when
> > > > they choose to upgrade, and of course on whether there are any
> > > > critical fixes that might go into the release. We shouldn't be
> > > > restricting their upgrade path just because we removed Java 7 support.
> > > >
> > >
> > > My point is that both paths have their pros and cons and we need to
> > > weigh them up. If some users are slow to upgrade the Java version (Java
> > > 7 has been EOL'd for over a year), there's a good chance that they are
> > > slow to upgrade Kafka too. And if that is the case (and it may not be),
> > > then holding up improvements for the ones who actually do upgrade may
> > > be the wrong call. To be clear, I am still in listening mode and I
> > > haven't made up my mind on the subject.
> > >
> > > > Once we released 0.9.0 there aren't any 0.8.x releases, i.e. we don't
> > > > have an LTS-type release where we continually ship critical fixes over
> > > > 0.8.x minor releases. So if a user notices a critical fix, the only
> > > > option today is to upgrade to the next version where that fix is
> > > > shipped.
> > > >
> > >
> > > We haven't done a great job at this in the past, but there is no
> > > decision that once a new major release is out, we don't do patch
> > > releases for the previous major release. In fact, we have been
> > > collecting critical fixes in the 0.9.0 branch for a potential 0.9.0.2.
> > >
> > > > I understand there is no decision made yet, but the premise was to
> > > > ship this in 0.10.x, possibly 0.10.1, which I don't agree with. In
> > > > general I am against shipping this in a 0.10.x version. Removing Java
> > > > 7 support when the release is minor is in general not a good idea for
> > > > users.
> > > >
> > >
> > > Sorry if I didn't communicate this properly. I simply meant the next
> > > feature release. I used 0.10.1.0 as an example, but it could also be
> > > 0.11.0.0 if that turns out to be the next release. A discussion on that
> > > will probably take place once the scope is clear. Personally, I think
> > > the timing is more important than the version number, but it seems like
> > > some people disagree.
> > >
> > > Ismael
> >
>


[jira] [Updated] (KAFKA-4052) Allow passing properties file to ProducerPerformance

2016-08-16 Thread Ashish K Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish K Singh updated KAFKA-4052:
--
Status: Patch Available  (was: Open)

> Allow passing properties file to ProducerPerformance
> 
>
> Key: KAFKA-4052
> URL: https://issues.apache.org/jira/browse/KAFKA-4052
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ashish K Singh
>Assignee: Ashish K Singh
>
> Allow passing properties file to ProducerPerformance, to enable using the 
> tool against secure Kafka installs.
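
For context, an editorial sketch of why a properties file is needed (the file 
name and settings are illustrative): security options such as SSL can only 
reach the producer through its config properties, so the perf tool has to be 
able to load them from a file.

{noformat}
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class PerfToolConfigSketch {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // e.g. security.protocol=SSL, ssl.truststore.location=..., etc.
        try (FileInputStream in = new FileInputStream("producer.properties")) {
            props.load(in);
        }
        props.put("bootstrap.servers", "broker:9093");  // illustrative
        KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(
            props, new ByteArraySerializer(), new ByteArraySerializer());
        producer.close();
    }
}
{noformat}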



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4052) Allow passing properties file to ProducerPerformance

2016-08-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423723#comment-15423723
 ] 

ASF GitHub Bot commented on KAFKA-4052:
---

GitHub user SinghAsDev opened a pull request:

https://github.com/apache/kafka/pull/1749

KAFKA-4052: Allow passing properties file to ProducerPerformance



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SinghAsDev/kafka KAFKA-4052

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1749.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1749


commit 4fee22a1489286a8146935f18f942fe405a71732
Author: Ashish Singh 
Date:   2016-08-17T01:27:06Z

KAFKA-4052: Allow passing properties file to ProducerPerformance




> Allow passing properties file to ProducerPerformance
> 
>
> Key: KAFKA-4052
> URL: https://issues.apache.org/jira/browse/KAFKA-4052
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ashish K Singh
>Assignee: Ashish K Singh
>
> Allow passing properties file to ProducerPerformance, to enable using the 
> tool against secure Kafka installs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1749: KAFKA-4052: Allow passing properties file to Produ...

2016-08-16 Thread SinghAsDev
GitHub user SinghAsDev opened a pull request:

https://github.com/apache/kafka/pull/1749

KAFKA-4052: Allow passing properties file to ProducerPerformance



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SinghAsDev/kafka KAFKA-4052

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1749.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1749


commit 4fee22a1489286a8146935f18f942fe405a71732
Author: Ashish Singh 
Date:   2016-08-17T01:27:06Z

KAFKA-4052: Allow passing properties file to ProducerPerformance




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #1748: KAFKA-3940 Log should check the return value of di...

2016-08-16 Thread imandhan
GitHub user imandhan opened a pull request:

https://github.com/apache/kafka/pull/1748

KAFKA-3940 Log should check the return value of dir.mkdirs()

This commit replaces all occurrences of dir.mkdirs() with 
Files.createDirectory(dir.toPath())

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/imandhan/kafka KAFKA-3940

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1748.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1748


commit baf70497759e636a55d50f03d0c5a9673ac71d2d
Author: Ishita Mandhan 
Date:   2016-08-17T00:32:22Z

KAFKA-3940 Log should check the return value of dir.mkdirs()

This commit replaces all occurrences of dir.mkdirs() with 
Files.createDirectory(dir.toPath())




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3940) Log should check the return value of dir.mkdirs()

2016-08-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423669#comment-15423669
 ] 

ASF GitHub Bot commented on KAFKA-3940:
---

GitHub user imandhan opened a pull request:

https://github.com/apache/kafka/pull/1748

KAFKA-3940 Log should check the return value of dir.mkdirs()

This commit replaces all occurrences of dir.mkdirs() with 
Files.createDirectory(dir.toPath())

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/imandhan/kafka KAFKA-3940

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1748.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1748


commit baf70497759e636a55d50f03d0c5a9673ac71d2d
Author: Ishita Mandhan 
Date:   2016-08-17T00:32:22Z

KAFKA-3940 Log should check the return value of dir.mkdirs()

This commit replaces all occurrences of dir.mkdirs() with 
Files.createDirectory(dir.toPath())




> Log should check the return value of dir.mkdirs()
> -
>
> Key: KAFKA-3940
> URL: https://issues.apache.org/jira/browse/KAFKA-3940
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.0.0
>Reporter: Jun Rao
>Assignee: Ishita Mandhan
>  Labels: newbie
>
> In Log.loadSegments(), we call dir.mkdirs() without checking the return value 
> and just assume the directory will exist after the call. However, if the 
> directory can't be created (e.g. due to no space), we will hit a 
> NullPointerException in the next statement, which will be confusing.
>    for (file <- dir.listFiles if file.isFile) {
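
An editorial sketch (assumed, not from the patch) of the difference between the 
two calls: File.mkdirs() reports failure only through its boolean return value, 
while Files.createDirectory() fails fast with a descriptive IOException at the 
point of failure.

{noformat}
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DirCreationSketch {
    public static void main(String[] args) throws IOException {
        // Old pattern: the boolean result is easy to ignore; if creation
        // fails (no space, permissions), dir.listFiles() later returns null
        // and the NullPointerException surfaces far from the real cause.
        File dir = new File("/tmp/kafka-logs/old-style-0");  // illustrative path
        if (!dir.mkdirs()) {
            System.err.println("could not create " + dir);
        }

        // New pattern: throws IOException (FileAlreadyExistsException,
        // AccessDeniedException, ...) immediately. Note createDirectory()
        // does not create missing parent directories.
        Path path = Paths.get("/tmp/kafka-logs/new-style-0");
        Files.createDirectory(path);
    }
}
{noformat}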



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4052) Allow passing properties file to ProducerPerformance

2016-08-16 Thread Ashish K Singh (JIRA)
Ashish K Singh created KAFKA-4052:
-

 Summary: Allow passing properties file to ProducerPerformance
 Key: KAFKA-4052
 URL: https://issues.apache.org/jira/browse/KAFKA-4052
 Project: Kafka
  Issue Type: Improvement
Reporter: Ashish K Singh
Assignee: Ashish K Singh


Allow passing properties file to ProducerPerformance, to enable using the tool 
against secure Kafka installs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Strange behavior when turn the system clock back

2016-08-16 Thread Gabriel Ibarra
Hi, thanks for answering, Ismael. I'm sorry I was absent the last few days.

Here is the link to the issue:
https://issues.apache.org/jira/browse/KAFKA-4051


On Thu, Aug 11, 2016 at 5:11 PM, Ismael Juma  wrote:

> It's probably worth filing a ticket in JIRA. Please also include a bit of
> context why it's important for the consumers to tolerate system clock
> changes.
>
> Ismael
>
> On Thu, Aug 11, 2016 at 7:54 PM, Gabriel Ibarra <
> gabriel.iba...@tallertechnologies.com> wrote:
>
> > Thanks Ismael,
> > I agree with you, it seems to be a problem related to absolute timers.
> >
> > So, how do we continue? Do you agree with reporting this as a bug?
> > In our system this issue has a great impact, and maybe this particular
> > issue could be fixed without seriously decreasing performance.
> >
> >
> > On Thu, Aug 11, 2016 at 11:11 AM, Ismael Juma  wrote:
> >
> > > Kafka code uses System.currentTimeMillis in a number of places, so it
> > > would not surprise me if it misbehaves when the clock is turned back by
> > > an hour. System.nanoTime is meant to handle this issue, but there are
> > > questions about the performance impact of using that
> > > (https://github.com/apache/kafka/pull/837).
> > >
> > > Ismael
> > >
> > > On Thu, Aug 11, 2016 at 2:19 PM, Gabriel Ibarra <
> > > gabriel.iba...@tallertechnologies.com> wrote:
> > >
> > > > Thanks for answering, all help is welcome.
> > > >
> > > > Yes, I tested without changing the clock and it works well.
> > > > Actually both consumers are running in different processes,
> > > > so I think it is not the case that you mention.
> > > >
> > > > I even tested this using two different Kafka clients,
> > > > the Java client and edenhill's librdkafka (a C client),
> > > > and I got the same results.
> > > > That is why I think that the problem comes from Kafka.
> > > >
> > > > Gabriel
> > > >
> > > >
> > > > On Thu, Aug 11, 2016 at 2:20 AM, Gwen Shapira 
> > wrote:
> > > >
> > > > > I know it sounds silly, but did you check that your test setup
> > > > > works when you don't change the clock?
> > > > >
> > > > > This pattern can happen when two consumers somehow block each other
> > > > > (for example, one thread with two consumers) - so one waits for the
> > > > > other to join, but the other is blocked, so the first is timed out
> > > > > and then the second is unblocked and manages to join, but now the
> > > > > first is blocked, and so on...
> > > > >
> > > > > Gwen
> > > > >
> > > > > On Wed, Aug 10, 2016 at 10:29 AM, Gabriel Ibarra
> > > > >  wrote:
> > > > > > Hello guys, I am dealing with an issue when turning the system
> > > > > > clock back (either due to NTP or administrator action). I'm using
> > > > > > kafka_2.11-0.10.0.0
> > > > > >
> > > > > > I follow the next steps.
> > > > > > - Start a consumer for TOPIC_NAME with group id GROUP_NAME. It
> > > > > > will be owner of all the partitions.
> > > > > > - Turn the system clock back. For instance 1 hour.
> > > > > > - Start a new consumer for TOPIC_NAME using the same group id; it
> > > > > > will force a rebalance.
> > > > > >
> > > > > > After these actions the kafka server constantly logs the messages
> > > > > > below, and after a while both consumers do not receive any more
> > > > > > messages. I saw that this condition lasts at least the time that
> > > > > > the clock went back, for this example 1 hour, and finally after
> > > > > > this time kafka comes back to work.
> > > > > >
> > > > > > [2016-08-08 11:30:23,023] INFO [GroupCoordinator 0]: Preparing to
> > > > > > restabilize group GROUP_NAME with old generation 2
> > > > > > (kafka.coordinator.GroupCoordinator)
> > > > > > [2016-08-08 11:30:23,025] INFO [GroupCoordinator 0]: Stabilized
> > > > > > group GROUP_NAME generation 3 (kafka.coordinator.GroupCoordinator)
> > > > > > [2016-08-08 11:30:23,027] INFO [GroupCoordinator 0]: Preparing to
> > > > > > restabilize group GROUP_NAME with old generation 3
> > > > > > (kafka.coordinator.GroupCoordinator)
> > > > > > [2016-08-08 11:30:23,029] INFO [GroupCoordinator 0]: Group
> > > > > > GROUP_NAME generation 3 is dead and removed
> > > > > > (kafka.coordinator.GroupCoordinator)
> > > > > > [2016-08-08 11:30:23,032] INFO [GroupCoordinator 0]: Preparing to
> > > > > > restabilize group GROUP_NAME with old generation 0
> > > > > > (kafka.coordinator.GroupCoordinator)
> > > > > > [2016-08-08 11:30:23,032] INFO [GroupCoordinator 0]: Stabilized
> > > > > > group GROUP_NAME generation 1 (kafka.coordinator.GroupCoordinator)
> > > > > > [2016-08-08 11:30:23,033] INFO [GroupCoordinator 0]: Preparing to
> > > > > > restabilize group GROUP_NAME with old generation 1
> > > > > > (kafka.coordinator.GroupCoordinator)
> > > > > > [2016-08-08 11:30:23,034] INFO [GroupCoordinator 0]: Group GROUP
> > > > > > generation 1 is dead and 

[jira] [Created] (KAFKA-4051) Strange behavior during rebalance when turning the OS clock back

2016-08-16 Thread Gabriel Ibarra (JIRA)
Gabriel Ibarra created KAFKA-4051:
-

 Summary: Strange behavior during rebalance when turning the OS 
clock back
 Key: KAFKA-4051
 URL: https://issues.apache.org/jira/browse/KAFKA-4051
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 0.10.0.0
 Environment: OS: Ubuntu 14.04 - 64bits

Reporter: Gabriel Ibarra


If a rebalance is performed after turning the OS clock back, then the kafka 
server enters a loop and the rebalance cannot be completed until the system 
returns to the previous date/hour.

Steps to Reproduce:

- Start a consumer for TOPIC_NAME with group id GROUP_NAME. It will be owner of 
all the partitions.
- Turn the system (OS) clock back. For instance 1 hour.
- Start a new consumer for TOPIC_NAME using the same group id; it will force a 
rebalance.

After these actions the kafka server logs constantly display the messages 
below, and after a while both consumers do not receive any more messages. This 
condition lasts at least the time that the clock went back, for this example 1 
hour, and finally after this time kafka comes back to work.

[2016-08-08 11:30:23,023] INFO [GroupCoordinator 0]: Preparing to restabilize 
group GROUP_NAME with old generation 2 (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,025] INFO [GroupCoordinator 0]: Stabilized group 
GROUP_NAME generation 3 (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,027] INFO [GroupCoordinator 0]: Preparing to restabilize 
group GROUP_NAME with old generation 3 (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,029] INFO [GroupCoordinator 0]: Group GROUP_NAME 
generation 3 is dead and removed (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,032] INFO [GroupCoordinator 0]: Preparing to restabilize 
group GROUP_NAME with old generation 0 (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,032] INFO [GroupCoordinator 0]: Stabilized group 
GROUP_NAME generation 1 (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,033] INFO [GroupCoordinator 0]: Preparing to restabilize 
group GROUP_NAME with old generation 1 (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,034] INFO [GroupCoordinator 0]: Group GROUP generation 1 
is dead and removed (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,043] INFO [GroupCoordinator 0]: Preparing to restabilize 
group GROUP_NAME with old generation 0 (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,044] INFO [GroupCoordinator 0]: Stabilized group 
GROUP_NAME generation 1 (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,044] INFO [GroupCoordinator 0]: Preparing to restabilize 
group GROUP_NAME with old generation 1 (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,045] INFO [GroupCoordinator 0]: Group GROUP_NAME 
generation 1 is dead and removed (kafka.coordinator.GroupCoordinator)

Because some systems may have NTP enabled, or an administrator may change the 
system clock (date/time), it's important to be able to do this safely. 
Currently the only safe way is the following:

1- Shut down the Kafka server.
2- Change the date/time.
3- Bring the Kafka server back up.

But this approach works only if the change is performed by the administrator, 
not by NTP. Also, in many systems shutting down the Kafka server might cause 
information to be lost.
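
For illustration only (editorial, not from the ticket): the underlying hazard 
is a timeout deadline computed from the wall clock. A monotonic clock such as 
System.nanoTime, mentioned in the mailing-list discussion, never jumps with the 
OS clock.

{noformat}
import java.util.concurrent.TimeUnit;

public class DeadlineSketch {
    static final long SESSION_TIMEOUT_MS = 30_000;  // illustrative value

    // Wall-clock deadline: if the OS clock is turned back one hour, this
    // deadline suddenly lies one hour in the future, so an expired group
    // member keeps looking alive until the clock catches up again.
    static boolean expiredWallClock(long deadlineMs) {
        return System.currentTimeMillis() >= deadlineMs;
    }

    // Monotonic deadline: unaffected by NTP or administrator clock changes.
    // The subtraction form is also safe against numeric overflow.
    static boolean expiredMonotonic(long deadlineNanos) {
        return System.nanoTime() - deadlineNanos >= 0;
    }

    public static void main(String[] args) {
        long wall = System.currentTimeMillis() + SESSION_TIMEOUT_MS;
        long mono = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(SESSION_TIMEOUT_MS);
        System.out.println(expiredWallClock(wall) + " " + expiredMonotonic(mono));
    }
}
{noformat}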




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4050) Allow configuration of the PRNG used for SSL

2016-08-16 Thread Todd Palino (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423400#comment-15423400
 ] 

Todd Palino commented on KAFKA-4050:


It appears to be called every time something needs to be encrypted (have to get 
randomness to run the crypto routines), so yeah, every single packet being sent 
would need a call to get random bytes.

> Allow configuration of the PRNG used for SSL
> 
>
> Key: KAFKA-4050
> URL: https://issues.apache.org/jira/browse/KAFKA-4050
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 0.10.0.1
>Reporter: Todd Palino
>Assignee: Todd Palino
>  Labels: security, ssl
>
> This change will make the pseudo-random number generator (PRNG) 
> implementation used by the SSLContext configurable. The configuration is not 
> required, and the default is to use whatever the default PRNG for the JDK/JRE 
> is. Providing a string, such as "SHA1PRNG", will cause that specific 
> SecureRandom implementation to get passed to the SSLContext.
> When enabling inter-broker SSL in our certification cluster, we observed 
> severe performance issues. For reference, this cluster can take up to 600 
> MB/sec of inbound produce traffic over SSL, with RF=2, before it gets close 
> to saturation, and the mirror maker normally produces about 400 MB/sec 
> (unless it is lagging). When we enabled inter-broker SSL, we saw persistent 
> replication problems in the cluster at any inbound rate of more than about 6 
> or 7 MB/sec per-broker. This was narrowed down to all the network threads 
> blocking on a single lock in the SecureRandom code.
> It turns out that the default PRNG implementation on Linux is NativePRNG. 
> This uses randomness from /dev/urandom (which, by itself, is a non-blocking 
> read) and mixes it with randomness from SHA1. The problem is that the entire 
> application shares a single SecureRandom instance, and NativePRNG has a 
> global lock within the implNextBytes method. Switching to another 
> implementation (SHA1PRNG, which has better performance characteristics and is 
> still considered secure) completely eliminated the bottleneck and allowed the 
> cluster to work properly at saturation.
> The SSLContext initialization has an optional argument to provide a 
> SecureRandom instance, which the code currently sets to null. This change 
> creates a new config to specify an implementation, and instantiates that and 
> passes it to SSLContext if provided. This will also let someone select a 
> stronger source of randomness (obviously at a performance cost) if desired.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4050) Allow configuration of the PRNG used for SSL

2016-08-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423390#comment-15423390
 ] 

ASF GitHub Bot commented on KAFKA-4050:
---

GitHub user toddpalino opened a pull request:

https://github.com/apache/kafka/pull/1747

KAFKA-4050: Allow configuration of the PRNG used for SSL

Add an optional configuration for the SecureRandom PRNG implementation, 
with the default behavior being the same (use the default implementation in the 
JDK/JRE).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/toddpalino/kafka trunk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1747.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1747


commit 4d101bdf5faf58fb2e0e54c0848133cc9deda6ef
Author: Todd Palino 
Date:   2016-08-16T20:58:34Z

Make the SecureRandom implementation configurable

commit 4793b69b3287b0f4fc8905aa62608e5b64f57f90
Author: Todd Palino 
Date:   2016-08-16T20:58:52Z

Add a note in the docs about setting the SecureRandom PRNG implementation




> Allow configuration of the PRNG used for SSL
> 
>
> Key: KAFKA-4050
> URL: https://issues.apache.org/jira/browse/KAFKA-4050
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 0.10.0.1
>Reporter: Todd Palino
>Assignee: Todd Palino
>  Labels: security, ssl
>
> This change will make the pseudo-random number generator (PRNG) 
> implementation used by the SSLContext configurable. The configuration is not 
> required, and the default is to use whatever the default PRNG for the JDK/JRE 
> is. Providing a string, such as "SHA1PRNG", will cause that specific 
> SecureRandom implementation to get passed to the SSLContext.
> When enabling inter-broker SSL in our certification cluster, we observed 
> severe performance issues. For reference, this cluster can take up to 600 
> MB/sec of inbound produce traffic over SSL, with RF=2, before it gets close 
> to saturation, and the mirror maker normally produces about 400 MB/sec 
> (unless it is lagging). When we enabled inter-broker SSL, we saw persistent 
> replication problems in the cluster at any inbound rate of more than about 6 
> or 7 MB/sec per-broker. This was narrowed down to all the network threads 
> blocking on a single lock in the SecureRandom code.
> It turns out that the default PRNG implementation on Linux is NativePRNG. 
> This uses randomness from /dev/urandom (which, by itself, is a non-blocking 
> read) and mixes it with randomness from SHA1. The problem is that the entire 
> application shares a single SecureRandom instance, and NativePRNG has a 
> global lock within the implNextBytes method. Switching to another 
> implementation (SHA1PRNG, which has better performance characteristics and is 
> still considered secure) completely eliminated the bottleneck and allowed the 
> cluster to work properly at saturation.
> The SSLContext initialization has an optional argument to provide a 
> SecureRandom instance, which the code currently sets to null. This change 
> creates a new config to specify an implementation, and instantiates that and 
> passes it to SSLContext if provided. This will also let someone select a 
> stronger source of randomness (obviously at a performance cost) if desired.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1747: KAFKA-4050: Allow configuration of the PRNG used f...

2016-08-16 Thread toddpalino
GitHub user toddpalino opened a pull request:

https://github.com/apache/kafka/pull/1747

KAFKA-4050: Allow configuration of the PRNG used for SSL

Add an optional configuration for the SecureRandom PRNG implementation, 
with the default behavior being the same (use the default implementation in the 
JDK/JRE).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/toddpalino/kafka trunk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1747.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1747


commit 4d101bdf5faf58fb2e0e54c0848133cc9deda6ef
Author: Todd Palino 
Date:   2016-08-16T20:58:34Z

Make the SecureRandom implementation configurable

commit 4793b69b3287b0f4fc8905aa62608e5b64f57f90
Author: Todd Palino 
Date:   2016-08-16T20:58:52Z

Add a note in the docs about setting the SecureRandom PRNG implementation




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Build failed in Jenkins: kafka-trunk-jdk7 #1478

2016-08-16 Thread Apache Jenkins Server
See 

Changes:

[me] KAFKA-3769: Create new sensors per-thread in KafkaStreams

[me] HOTFIX: Re-inserted SimpleBenchmark output for system tests

--
[...truncated 12167 lines...]
org.apache.kafka.streams.processor.internals.MinTimestampTrackerTest > 
testTracking PASSED

org.apache.kafka.streams.processor.internals.StreamTaskTest > testProcessOrder 
STARTED

org.apache.kafka.streams.processor.internals.StreamTaskTest > testProcessOrder 
PASSED

org.apache.kafka.streams.processor.internals.StreamTaskTest > testPauseResume 
STARTED

org.apache.kafka.streams.processor.internals.StreamTaskTest > testPauseResume 
PASSED

org.apache.kafka.streams.processor.internals.StreamTaskTest > 
testMaybePunctuate STARTED

org.apache.kafka.streams.processor.internals.StreamTaskTest > 
testMaybePunctuate PASSED

org.apache.kafka.streams.processor.internals.assignment.AssignmentInfoTest > 
testEncodeDecode STARTED

org.apache.kafka.streams.processor.internals.assignment.AssignmentInfoTest > 
testEncodeDecode PASSED

org.apache.kafka.streams.processor.internals.assignment.AssignmentInfoTest > 
shouldDecodePreviousVersion STARTED

org.apache.kafka.streams.processor.internals.assignment.AssignmentInfoTest > 
shouldDecodePreviousVersion PASSED

org.apache.kafka.streams.processor.internals.assignment.SubscriptionInfoTest > 
testEncodeDecode STARTED

org.apache.kafka.streams.processor.internals.assignment.SubscriptionInfoTest > 
testEncodeDecode PASSED

org.apache.kafka.streams.processor.internals.assignment.SubscriptionInfoTest > 
shouldEncodeDecodeWithUserEndPoint STARTED

org.apache.kafka.streams.processor.internals.assignment.SubscriptionInfoTest > 
shouldEncodeDecodeWithUserEndPoint PASSED

org.apache.kafka.streams.processor.internals.assignment.SubscriptionInfoTest > 
shouldBeBackwardCompatible STARTED

org.apache.kafka.streams.processor.internals.assignment.SubscriptionInfoTest > 
shouldBeBackwardCompatible PASSED

org.apache.kafka.streams.processor.internals.assignment.TaskAssignorTest > 
testStickiness STARTED

org.apache.kafka.streams.processor.internals.assignment.TaskAssignorTest > 
testStickiness PASSED

org.apache.kafka.streams.processor.internals.assignment.TaskAssignorTest > 
testAssignWithStandby STARTED

org.apache.kafka.streams.processor.internals.assignment.TaskAssignorTest > 
testAssignWithStandby PASSED

org.apache.kafka.streams.processor.internals.assignment.TaskAssignorTest > 
testAssignWithoutStandby STARTED

org.apache.kafka.streams.processor.internals.assignment.TaskAssignorTest > 
testAssignWithoutStandby PASSED

org.apache.kafka.streams.processor.internals.ProcessorTopologyTest > 
testDrivingMultiplexingTopology STARTED

org.apache.kafka.streams.processor.internals.ProcessorTopologyTest > 
testDrivingMultiplexingTopology PASSED

org.apache.kafka.streams.processor.internals.ProcessorTopologyTest > 
testDrivingStatefulTopology STARTED

org.apache.kafka.streams.processor.internals.ProcessorTopologyTest > 
testDrivingStatefulTopology PASSED

org.apache.kafka.streams.processor.internals.ProcessorTopologyTest > 
testDrivingSimpleTopology STARTED

org.apache.kafka.streams.processor.internals.ProcessorTopologyTest > 
testDrivingSimpleTopology PASSED

org.apache.kafka.streams.processor.internals.ProcessorTopologyTest > 
testTopologyMetadata STARTED

org.apache.kafka.streams.processor.internals.ProcessorTopologyTest > 
testTopologyMetadata PASSED

org.apache.kafka.streams.processor.internals.ProcessorTopologyTest > 
testDrivingMultiplexByNameTopology STARTED

org.apache.kafka.streams.processor.internals.ProcessorTopologyTest > 
testDrivingMultiplexByNameTopology PASSED

org.apache.kafka.streams.processor.internals.RecordCollectorTest > 
testSpecificPartition STARTED

org.apache.kafka.streams.processor.internals.RecordCollectorTest > 
testSpecificPartition PASSED

org.apache.kafka.streams.processor.internals.RecordCollectorTest > 
testStreamPartitioner STARTED

org.apache.kafka.streams.processor.internals.RecordCollectorTest > 
testStreamPartitioner PASSED

org.apache.kafka.streams.processor.internals.PunctuationQueueTest > 
testPunctuationInterval STARTED

org.apache.kafka.streams.processor.internals.PunctuationQueueTest > 
testPunctuationInterval PASSED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
shouldThrowExceptionIfApplicationServerConfigIsNotHostPortPair STARTED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
shouldThrowExceptionIfApplicationServerConfigIsNotHostPortPair PASSED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
shouldMapUserEndPointToTopicPartitions STARTED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
shouldMapUserEndPointToTopicPartitions PASSED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
shouldAddUserDefinedEndPointToSubscription STARTED


Re: [VOTE] KIP-74: Add FetchResponse size limit in bytes

2016-08-16 Thread Jun Rao
Andrey,

Thanks for the KIP. +1

Jun

On Tue, Aug 16, 2016 at 1:32 PM, Andrey L. Neporada <
anepor...@yandex-team.ru> wrote:

> Hi!
>
> I would like to initiate the voting process for KIP-74:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 74%3A+Add+Fetch+Response+Size+Limit+in+Bytes
>
>
> Thanks,
> Andrey.


[jira] [Commented] (KAFKA-4050) Allow configuration of the PRNG used for SSL

2016-08-16 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423368#comment-15423368
 ] 

Jun Rao commented on KAFKA-4050:


[~toddpalino], thanks for reporting this. Do you know how often the PRNG is 
called? On every network packet?

> Allow configuration of the PRNG used for SSL
> 
>
> Key: KAFKA-4050
> URL: https://issues.apache.org/jira/browse/KAFKA-4050
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 0.10.0.1
>Reporter: Todd Palino
>Assignee: Todd Palino
>  Labels: security, ssl
>
> This change will make the pseudo-random number generator (PRNG) 
> implementation used by the SSLContext configurable. The configuration is not 
> required, and the default is to use whatever the default PRNG for the JDK/JRE 
> is. Providing a string, such as "SHA1PRNG", will cause that specific 
> SecureRandom implementation to get passed to the SSLContext.
> When enabling inter-broker SSL in our certification cluster, we observed 
> severe performance issues. For reference, this cluster can take up to 600 
> MB/sec of inbound produce traffic over SSL, with RF=2, before it gets close 
> to saturation, and the mirror maker normally produces about 400 MB/sec 
> (unless it is lagging). When we enabled inter-broker SSL, we saw persistent 
> replication problems in the cluster at any inbound rate of more than about 6 
> or 7 MB/sec per-broker. This was narrowed down to all the network threads 
> blocking on a single lock in the SecureRandom code.
> It turns out that the default PRNG implementation on Linux is NativePRNG. 
> This uses randomness from /dev/urandom (which, by itself, is a non-blocking 
> read) and mixes it with randomness from SHA1. The problem is that the entire 
> application shares a single SecureRandom instance, and NativePRNG has a 
> global lock within the implNextBytes method. Switching to another 
> implementation (SHA1PRNG, which has better performance characteristics and is 
> still considered secure) completely eliminated the bottleneck and allowed the 
> cluster to work properly at saturation.
> The SSLContext initialization has an optional argument to provide a 
> SecureRandom instance, which the code currently sets to null. This change 
> creates a new config to specify an implementation, and instantiates that and 
> passes it to SSLContext if provided. This will also let someone select a 
> stronger source of randomness (obviously at a performance cost) if desired.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4050) Allow configuration of the PRNG used for SSL

2016-08-16 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423354#comment-15423354
 ] 

Ismael Juma commented on KAFKA-4050:


Nice find. :)

> Allow configuration of the PRNG used for SSL
> 
>
> Key: KAFKA-4050
> URL: https://issues.apache.org/jira/browse/KAFKA-4050
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 0.10.0.1
>Reporter: Todd Palino
>Assignee: Todd Palino
>  Labels: security, ssl
>
> This change will make the pseudo-random number generator (PRNG) 
> implementation used by the SSLContext configurable. The configuration is not 
> required, and the default is to use whatever the default PRNG for the JDK/JRE 
> is. Providing a string, such as "SHA1PRNG", will cause that specific 
> SecureRandom implementation to get passed to the SSLContext.
> When enabling inter-broker SSL in our certification cluster, we observed 
> severe performance issues. For reference, this cluster can take up to 600 
> MB/sec of inbound produce traffic over SSL, with RF=2, before it gets close 
> to saturation, and the mirror maker normally produces about 400 MB/sec 
> (unless it is lagging). When we enabled inter-broker SSL, we saw persistent 
> replication problems in the cluster at any inbound rate of more than about 6 
> or 7 MB/sec per-broker. This was narrowed down to all the network threads 
> blocking on a single lock in the SecureRandom code.
> It turns out that the default PRNG implementation on Linux is NativePRNG. 
> This uses randomness from /dev/urandom (which, by itself, is a non-blocking 
> read) and mixes it with randomness from SHA1. The problem is that the entire 
> application shares a single SecureRandom instance, and NativePRNG has a 
> global lock within the implNextBytes method. Switching to another 
> implementation (SHA1PRNG, which has better performance characteristics and is 
> still considered secure) completely eliminated the bottleneck and allowed the 
> cluster to work properly at saturation.
> The SSLContext initialization has an optional argument to provide a 
> SecureRandom instance, which the code currently sets to null. This change 
> creates a new config to specify an implementation, and instantiates that and 
> passes it to SSLContext if provided. This will also let someone select a 
> stronger source of randomness (obviously at a performance cost) if desired.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4050) Allow configuration of the PRNG used for SSL

2016-08-16 Thread Todd Palino (JIRA)
Todd Palino created KAFKA-4050:
--

 Summary: Allow configuration of the PRNG used for SSL
 Key: KAFKA-4050
 URL: https://issues.apache.org/jira/browse/KAFKA-4050
 Project: Kafka
  Issue Type: Improvement
  Components: security
Affects Versions: 0.10.0.1
Reporter: Todd Palino
Assignee: Todd Palino


This change will make the pseudo-random number generator (PRNG) implementation 
used by the SSLContext configurable. The configuration is not required, and the 
default is to use whatever the default PRNG for the JDK/JRE is. Providing a 
string, such as "SHA1PRNG", will cause that specific SecureRandom 
implementation to get passed to the SSLContext.

When enabling inter-broker SSL in our certification cluster, we observed severe 
performance issues. For reference, this cluster can take up to 600 MB/sec of 
inbound produce traffic over SSL, with RF=2, before it gets close to 
saturation, and the mirror maker normally produces about 400 MB/sec (unless it 
is lagging). When we enabled inter-broker SSL, we saw persistent replication 
problems in the cluster at any inbound rate of more than about 6 or 7 MB/sec 
per-broker. This was narrowed down to all the network threads blocking on a 
single lock in the SecureRandom code.

It turns out that the default PRNG implementation on Linux is NativePRNG. This 
uses randomness from /dev/urandom (which, by itself, is a non-blocking read) 
and mixes it with randomness from SHA1. The problem is that the entire 
application shares a single SecureRandom instance, and NativePRNG has a global 
lock within the implNextBytes method. Switching to another implementation 
(SHA1PRNG, which has better performance characteristics and is still considered 
secure) completely eliminated the bottleneck and allowed the cluster to work 
properly at saturation.

The SSLContext initialization has an optional argument to provide a 
SecureRandom instance, which the code currently sets to null. This change 
creates a new config to specify an implementation, and instantiates that and 
passes it to SSLContext if provided. This will also let someone select a 
stronger source of randomness (obviously at a performance cost) if desired.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4039) Exit Strategy: using exceptions instead of inline invocation of exit/halt

2016-08-16 Thread Maysam Yabandeh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423346#comment-15423346
 ] 

Maysam Yabandeh commented on KAFKA-4039:


Certainly. I will submit a patch soon.

> Exit Strategy: using exceptions instead of inline invocation of exit/halt
> -
>
> Key: KAFKA-4039
> URL: https://issues.apache.org/jira/browse/KAFKA-4039
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
>Reporter: Maysam Yabandeh
>
> The current practice is to directly invoke halt/exit right after the line 
> that intends to terminate the execution. In the case of System.exit this 
> could cause deadlocks if the thread invoking System.exit is holding a lock 
> that will be requested by the shutdown hook threads started by System.exit. 
> An example is reported by [~aozeritsky] in KAFKA-3924. This also makes 
> testing more difficult, as it would require mocking static methods of the 
> System and Runtime classes, which is not natively supported in Java.
> One alternative suggested 
> [here|https://issues.apache.org/jira/browse/KAFKA-3924?focusedCommentId=15420269&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15420269]
>  would be to throw some dedicated exceptions that will eventually invoke 
> exit/halt:
> {quote} it would be great to move away from executing `System.exit` inline in 
> favour of throwing an exception (two examples, but maybe we can find better 
> names: FatalExitException and FatalHaltException) that is caught by some 
> central code that then does the `System.exit` or `Runtime.getRuntime.halt`. 
> This helps in a couple of ways:
> (1) Avoids issues with locks being held as in this issue
> (2) It makes it possible to abstract the action, which is very useful in 
> tests. At the moment, we can't easily test for these conditions as they cause 
> the whole test harness to exit. Worse, these conditions are sometimes 
> triggered in the tests and it's unclear why.
> (3) We can have more consistent logging around these actions and possibly 
> extended logging for tests
> {quote}
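
An editorial sketch of the pattern suggested in the quote above (the exception 
name is one of the examples floated there, not an agreed API): termination 
sites throw a dedicated exception, and one central handler performs the actual 
exit with no locks held.

{noformat}
public class ExitViaExceptionSketch {
    // One of the two example names from the quote; a FatalHaltException
    // would map to Runtime.getRuntime().halt() instead.
    static class FatalExitException extends RuntimeException {
        final int statusCode;
        FatalExitException(int statusCode, String message) {
            super(message);
            this.statusCode = statusCode;
        }
    }

    static void startup() {
        // Instead of calling System.exit(1) inline (possibly while holding
        // locks that the shutdown hooks will then request), throw:
        throw new FatalExitException(1, "fatal error during startup");
    }

    public static void main(String[] args) {
        try {
            startup();
        } catch (FatalExitException e) {
            System.err.println(e.getMessage());
            // The exit happens in exactly one place, and tests can replace
            // this central handler with an assertion instead of dying.
            System.exit(e.statusCode);
        }
    }
}
{noformat}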



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[VOTE] KIP-74: Add FetchResponse size limit in bytes

2016-08-16 Thread Andrey L. Neporada
Hi!

I would like to initiate the voting process for KIP-74:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-74%3A+Add+Fetch+Response+Size+Limit+in+Bytes


Thanks,
Andrey.

Build failed in Jenkins: kafka-trunk-jdk8 #818

2016-08-16 Thread Apache Jenkins Server
See 

Changes:

[me] HOTFIX: Re-inserted SimpleBenchmark output for system tests

--
[...truncated 6821 lines...]

kafka.security.auth.SimpleAclAuthorizerTest > testLoadCache STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testLoadCache PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata 
STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata 
PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata 
STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.integration.PrimitiveApiTest > testMultiProduce STARTED

kafka.integration.PrimitiveApiTest > testMultiProduce PASSED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch STARTED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch PASSED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize 
STARTED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize PASSED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests STARTED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests PASSED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch STARTED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch PASSED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression STARTED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression PASSED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic STARTED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic PASSED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest STARTED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopicWithCollision 
STARTED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopicWithCollision 
PASSED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokerListWithNoTopics 
STARTED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokerListWithNoTopics 
PASSED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.PlaintextTopicMetadataTest > testGetAllTopicMetadata STARTED

kafka.integration.PlaintextTopicMetadataTest > testGetAllTopicMetadata PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.PlaintextTopicMetadataTest > testBasicTopicMetadata STARTED

kafka.integration.PlaintextTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooLow 
STARTED

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooLow PASSED

kafka.integration.AutoOffsetResetTest > testResetToEarliestWhenOffsetTooLow 
STARTED

kafka.integration.AutoOffsetResetTest > testResetToEarliestWhenOffsetTooLow 
PASSED

kafka.integration.AutoOffsetResetTest > testResetToEarliestWhenOffsetTooHigh 
STARTED

kafka.integration.AutoOffsetResetTest > 

Re: [VOTE] KIP-74: Add Fetch Response Size Limit in Bytes

2016-08-16 Thread Ismael Juma
Hi Andrey,

Can you please start a new thread for the vote? Gmail is showing your vote
message in the discuss thread.

Ismael

On Tue, Aug 16, 2016 at 9:15 PM, Andrey L. Neporada <
anepor...@yandex-team.ru> wrote:

> Hi!
>
> I would like to initiate the voting process for KIP-74:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 74%3A+Add+Fetch+Response+Size+Limit+in+Bytes
>
> Thanks,
> Andrey.


[jira] [Updated] (KAFKA-4048) Connect does not support RetriableException consistently for sinks

2016-08-16 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan updated KAFKA-4048:
---
Description: We only allow for handling {{RetriableException}} from calls 
to {{SinkTask.put()}}, but this is something we should support also for 
{{flush()}}  and arguably also {{open()}}.  (was: We only allow for handling 
{{RetriableException}} from calls to {{SinkTask.put()}}, but this is something 
we should support also for {{flush()}}  and arguably also {{open()}}.

We don't have support for {{RetriableException}} with {{SourceTask}}.)

> Connect does not support RetriableException consistently for sinks
> --
>
> Key: KAFKA-4048
> URL: https://issues.apache.org/jira/browse/KAFKA-4048
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
>
> We only allow for handling {{RetriableException}} from calls to 
> {{SinkTask.put()}}, but this is something we should support also for 
> {{flush()}}  and arguably also {{open()}}.
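
An editorial sketch of what consistent support would allow a sink connector to 
do (the task and its internals are hypothetical); today only put() may throw 
RetriableException and have the framework retry.

{noformat}
import java.util.Collection;
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.errors.RetriableException;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public class ExampleSinkTask extends SinkTask {
    @Override
    public void put(Collection<SinkRecord> records) {
        // Supported today: throwing RetriableException here makes the
        // framework redeliver the batch instead of killing the task.
    }

    @Override
    public void flush(Map<TopicPartition, OffsetAndMetadata> offsets) {
        try {
            commitBuffered();  // hypothetical write to the external system
        } catch (RuntimeException e) {
            // Proposed: allow the same escape hatch for transient failures
            // during flush() (and arguably open()).
            throw new RetriableException(e);
        }
    }

    private void commitBuffered() { /* no-op placeholder */ }

    @Override public String version() { return "0"; }
    @Override public void start(Map<String, String> props) { }
    @Override public void stop() { }
}
{noformat}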



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4039) Exit Strategy: using exceptions instead of inline invocation of exit/halt

2016-08-16 Thread Maysam Yabandeh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423311#comment-15423311
 ] 

Maysam Yabandeh commented on KAFKA-4039:


Revert is certainly an option, but I am not sure the deadlock is more serious 
than data loss: with the current status it shuts down cleanly but does not 
exit, waiting for one last thread that is processing an outstanding request. 
For me this is a bit of an operational burden, but it is at least safe.

Also, the suggested solution of throwing an exception seems pretty 
straightforward and would address other, perhaps-yet-unreported issues with 
inline invocation of System.exit. If there is no other volunteer I would be 
happy to take up this jira and prepare the patch.

> Exit Strategy: using exceptions instead of inline invocation of exit/halt
> -
>
> Key: KAFKA-4039
> URL: https://issues.apache.org/jira/browse/KAFKA-4039
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
>Reporter: Maysam Yabandeh
>
> The current practice is to directly invoke halt/exit right after the line 
> that intends to terminate the execution. In the case of System.exit this 
> could cause deadlocks if the thread invoking System.exit is holding a lock 
> that will be requested by the shutdown hook threads started by System.exit. 
> An example is reported by [~aozeritsky] in KAFKA-3924. This also makes 
> testing more difficult, as it would require mocking static methods of the 
> System and Runtime classes, which is not natively supported in Java.
> One alternative suggested 
> [here|https://issues.apache.org/jira/browse/KAFKA-3924?focusedCommentId=15420269&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15420269]
>  would be to throw some dedicated exceptions that will eventually invoke 
> exit/halt:
> {quote} it would be great to move away from executing `System.exit` inline in 
> favour of throwing an exception (two examples, but maybe we can find better 
> names: FatalExitException and FatalHaltException) that is caught by some 
> central code that then does the `System.exit` or `Runtime.getRuntime.halt`. 
> This helps in a couple of ways:
> (1) Avoids issues with locks being held as in this issue
> (2) It makes it possible to abstract the action, which is very useful in 
> tests. At the moment, we can't easily test for these conditions as they cause 
> the whole test harness to exit. Worse, these conditions are sometimes 
> triggered in the tests and it's unclear why.
> (3) We can have more consistent logging around these actions and possibly 
> extended logging for tests
> {quote}
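
A rough sketch of the proposed shape (the class names below are just the 
examples floated in the quote, not a settled design):

{noformat}
// Dedicated exception thrown instead of calling System.exit inline.
class FatalExitException extends RuntimeException {
    final int statusCode;

    FatalExitException(int statusCode, String message) {
        super(message);
        this.statusCode = statusCode;
    }
}

// One central, test-friendly place that actually terminates the JVM.
class Exit {
    // Tests can swap this out so fatal conditions can be asserted on
    // without bringing down the whole test harness.
    static volatile java.util.function.IntConsumer exitAction =
            status -> System.exit(status);

    static void handle(FatalExitException e) {
        // Consistent logging of the fatal condition would go here.
        exitAction.accept(e.statusCode);
    }
}
{noformat}

Call sites would throw {{FatalExitException}} instead of exiting while holding 
locks; the stack unwinds (releasing those locks) before the outermost loop of 
the thread calls {{Exit.handle()}}.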



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4048) Connect does not support RetriableException consistently for sinks

2016-08-16 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan updated KAFKA-4048:
---
Summary: Connect does not support RetriableException consistently for sinks 
 (was: Connect does not support RetriableException consistently for sources & 
sinks)

> Connect does not support RetriableException consistently for sinks
> --
>
> Key: KAFKA-4048
> URL: https://issues.apache.org/jira/browse/KAFKA-4048
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
>
> We only allow for handling {{RetriableException}} from calls to 
> {{SinkTask.put()}}, but this is something we should support also for 
> {{flush()}}  and arguably also {{open()}}.
> We don't have support for {{RetriableException}} with {{SourceTask}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[VOTE] KIP-74: Add Fetch Response Size Limit in Bytes

2016-08-16 Thread Andrey L. Neporada
Hi!

I would like to initiate the voting process for KIP-74:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-74%3A+Add+Fetch+Response+Size+Limit+in+Bytes

Thanks,
Andrey.

[jira] [Commented] (KAFKA-4039) Exit Strategy: using exceptions instead of inline invocation of exit/halt

2016-08-16 Thread Alexey Ozeritskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423298#comment-15423298
 ] 

Alexey Ozeritskiy commented on KAFKA-4039:
--

Thanks [~maysamyabandeh].
I think we have to decide how to solve the problem or revert commit 
d1757c70a198014e85026c01a1a4ccab6a12da7d, because it introduces a more serious 
problem. In fact, my "solution" from KAFKA-3924 does not solve the problem 
because the shutdown hook waits for the scheduler and vice versa. 

> Exit Strategy: using exceptions instead of inline invocation of exit/halt
> -
>
> Key: KAFKA-4039
> URL: https://issues.apache.org/jira/browse/KAFKA-4039
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
>Reporter: Maysam Yabandeh
>
> The current practice is to directly invoke halt/exit right after the line 
> that intends to terminate the execution. In the case of System.exit this 
> could cause deadlocks if the thread invoking System.exit is holding a lock 
> that will be requested by the shutdown hook threads that will be started by 
> System.exit. An example is reported by [~aozeritsky] in KAFKA-3924. This 
> would also make testing more difficult, as it would require mocking static 
> methods of the System and Runtime classes, which is not natively supported in 
> Java.
> One alternative suggested 
> [here|https://issues.apache.org/jira/browse/KAFKA-3924?focusedCommentId=15420269=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15420269]
>  would be to throw some dedicated exceptions that will eventually invoke 
> exit/halt:
> {quote} it would be great to move away from executing `System.exit` inline in 
> favour of throwing an exception (two examples, but maybe we can find better 
> names: FatalExitException and FatalHaltException) that is caught by some 
> central code that then does the `System.exit` or `Runtime.getRuntime.halt`. 
> This helps in a couple of ways:
> (1) Avoids issues with locks being held as in this issue
> (2) It makes it possible to abstract the action, which is very useful in 
> tests. At the moment, we can't easily test for these conditions as they cause 
> the whole test harness to exit. Worse, these conditions are sometimes 
> triggered in the tests and it's unclear why.
> (3) We can have more consistent logging around these actions and possibly 
> extended logging for tests
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1746: KAFKA-4049: Fix transient failure in RegexSourceIn...

2016-08-16 Thread guozhangwang
GitHub user guozhangwang opened a pull request:

https://github.com/apache/kafka/pull/1746

KAFKA-4049: Fix transient failure in RegexSourceIntegrationTest



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/guozhangwang/kafka 
K4049-RegexSourceIntegrationTest-failure

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1746.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1746


commit e045b977cc5607e72402e64547df773d9bda7a61
Author: Guozhang Wang 
Date:   2016-08-16T20:08:27Z

fix transient failure




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4049) Transient failure in RegexSourceIntegrationTest.testRegexMatchesTopicsAWhenDeleted

2016-08-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423304#comment-15423304
 ] 

ASF GitHub Bot commented on KAFKA-4049:
---

GitHub user guozhangwang opened a pull request:

https://github.com/apache/kafka/pull/1746

KAFKA-4049: Fix transient failure in RegexSourceIntegrationTest



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/guozhangwang/kafka 
K4049-RegexSourceIntegrationTest-failure

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1746.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1746


commit e045b977cc5607e72402e64547df773d9bda7a61
Author: Guozhang Wang 
Date:   2016-08-16T20:08:27Z

fix transient failure




> Transient failure in 
> RegexSourceIntegrationTest.testRegexMatchesTopicsAWhenDeleted
> --
>
> Key: KAFKA-4049
> URL: https://issues.apache.org/jira/browse/KAFKA-4049
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>  Labels: test
>
> There is a hidden assumption in this test case that the created 
> {{TEST-TOPIC-A}} and {{TEST-TOPIC-B}} are propagated to the streams client at 
> the same time, and stored as {{assignedTopicPartitions[0]}}. However, this is 
> not always true, since these two topics may be added on the client side in 
> two consecutive metadata refreshes.
> The proposed fix includes the following:
> 1. In {{waitForCondition}}, do not trigger the {{conditionMet}} function again 
> after the while loop; just remember the value returned by the last call. This 
> is safer: if the condition changes after the while loop, the change will not 
> be taken into account.
> 2. Do not remember a map of all the previously assigned partitions, only the 
> most recent one. Also get rid of the final check after the streams client is 
> closed, by just using {{equals}} in the condition to make sure that it is 
> exactly the same as the expected assignment.
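
A minimal illustration of fix 1 (hypothetical signature, not the actual test 
utility):

{noformat}
import java.util.function.Supplier;

public final class WaitUtil {
    // Remember the last value returned inside the loop instead of
    // evaluating the condition one extra time after the loop exits.
    public static void waitForCondition(Supplier<Boolean> conditionMet,
                                        long maxWaitMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + maxWaitMs;
        boolean met = false;
        while (!met && System.currentTimeMillis() < deadline) {
            met = conditionMet.get();   // the last observed value is what we trust
            if (!met) Thread.sleep(50);
        }
        if (!met)                       // note: no extra conditionMet.get() here
            throw new AssertionError("Condition not met within " + maxWaitMs + " ms");
    }
}
{noformat}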



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4049) Transient failure in RegexSourceIntegrationTest.testRegexMatchesTopicsAWhenDeleted

2016-08-16 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-4049:
-
Description: 
There is a hidden assumption in this test case that the created 
{{TEST-TOPIC-A}} and {{TEST-TOPIC-B}} are propagated to the streams client at 
the same time, and stored as {{assignedTopicPartitions[0]}}. However, this is 
not always true, since these two topics may be added on the client side in two 
consecutive metadata refreshes.

The proposed fix includes the following:

1. In {{waitForCondition}}, do not trigger the {{conditionMet}} function again 
after the while loop; just remember the value returned by the last call. This is 
safer: if the condition changes after the while loop, the change will not be 
taken into account.

2. Do not remember a map of all the previously assigned partitions, only the 
most recent one. Also get rid of the final check after the streams client is 
closed, by just using {{equals}} in the condition to make sure that it is 
exactly the same as the expected assignment.

  was:There is a race condition in 


> Transient failure in 
> RegexSourceIntegrationTest.testRegexMatchesTopicsAWhenDeleted
> --
>
> Key: KAFKA-4049
> URL: https://issues.apache.org/jira/browse/KAFKA-4049
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>  Labels: test
>
> There is a hidden assumption in this test case that the created 
> {{TEST-TOPIC-A}} and {{TEST-TOPIC-B}} are propagated to the streams client at 
> the same time, and stored as {{assignedTopicPartitions[0]}}. However, this is 
> not always true, since these two topics may be added on the client side in 
> two consecutive metadata refreshes.
> The proposed fix includes the following:
> 1. In {{waitForCondition}}, do not trigger the {{conditionMet}} function again 
> after the while loop; just remember the value returned by the last call. This 
> is safer: if the condition changes after the while loop, the change will not 
> be taken into account.
> 2. Do not remember a map of all the previously assigned partitions, only the 
> most recent one. Also get rid of the final check after the streams client is 
> closed, by just using {{equals}} in the condition to make sure that it is 
> exactly the same as the expected assignment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4049) Transient failure in RegexSourceIntegrationTest.testRegexMatchesTopicsAWhenDeleted

2016-08-16 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-4049:
-
Description: There is a race condition in 

> Transient failure in 
> RegexSourceIntegrationTest.testRegexMatchesTopicsAWhenDeleted
> --
>
> Key: KAFKA-4049
> URL: https://issues.apache.org/jira/browse/KAFKA-4049
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>  Labels: test
>
> There is a race condition in 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3410) Unclean leader election and "Halting because log truncation is not allowed"

2016-08-16 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423285#comment-15423285
 ] 

Guozhang Wang commented on KAFKA-3410:
--

[~wushujames] Could you check if KAFKA-3924 fixed this issue?

> Unclean leader election and "Halting because log truncation is not allowed"
> ---
>
> Key: KAFKA-3410
> URL: https://issues.apache.org/jira/browse/KAFKA-3410
> Project: Kafka
>  Issue Type: Bug
>Reporter: James Cheng
>
> I ran into a scenario where one of my brokers would continually shutdown, 
> with the error message:
> [2016-02-25 00:29:39,236] FATAL [ReplicaFetcherThread-0-1], Halting because 
> log truncation is not allowed for topic test, Current leader 1's latest 
> offset 0 is less than replica 2's latest offset 151 
> (kafka.server.ReplicaFetcherThread)
> I managed to reproduce it with the following scenario:
> 1. Start broker1, with unclean.leader.election.enable=false
> 2. Start broker2, with unclean.leader.election.enable=false
> 3. Create topic, single partition, with replication-factor 2.
> 4. Write data to the topic.
> 5. At this point, both brokers are in the ISR. Broker1 is the partition 
> leader.
> 6. Ctrl-Z on broker2. (Simulates a GC pause or a slow network) Broker2 gets 
> dropped out of ISR. Broker1 is still the leader. I can still write data to 
> the partition.
> 7. Shutdown Broker1. Hard or controlled, doesn't matter.
> 8. rm -rf the log directory of broker1. (This simulates a disk replacement or 
> full hardware replacement)
> 9. Resume broker2. It attempts to connect to broker1, but doesn't succeed 
> because broker1 is down. At this point, the partition is offline. Can't write 
> to it.
> 10. Resume broker1. Broker1 resumes leadership of the topic. Broker2 attempts 
> to join ISR, and immediately halts with the error message:
> [2016-02-25 00:29:39,236] FATAL [ReplicaFetcherThread-0-1], Halting because 
> log truncation is not allowed for topic test, Current leader 1's latest 
> offset 0 is less than replica 2's latest offset 151 
> (kafka.server.ReplicaFetcherThread)
> I am able to recover by setting unclean.leader.election.enable=true on my 
> brokers.
> I'm trying to understand a couple things:
> * In step 10, why is broker1 allowed to resume leadership even though it has 
> no data?
> * In step 10, why is it necessary to stop the entire broker due to one 
> partition that is in this state? Wouldn't it be possible for the broker to 
> continue to serve traffic for all the other topics, and just mark this one as 
> unavailable?
> * Would it make sense to allow an operator to manually specify which broker 
> they want to become the new master? This would give me more control over how 
> much data loss I am willing to handle. In this case, I would want broker2 to 
> become the new master. Or, is that possible and I just don't know how to do 
> it?
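
For reference, the broker setting involved in both the failure and the 
recovery (the two values below are alternatives, not meant to be combined):

{noformat}
# server.properties in steps 1-2 of the reproduction: refuse to elect an
# out-of-sync replica, which is what ultimately halts broker2
unclean.leader.election.enable=false

# temporary recovery setting described above: allow the unclean election,
# accepting the possible data loss
unclean.leader.election.enable=true
{noformat}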



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4049) Transient failure in RegexSourceIntegrationTest.testRegexMatchesTopicsAWhenDeleted

2016-08-16 Thread Guozhang Wang (JIRA)
Guozhang Wang created KAFKA-4049:


 Summary: Transient failure in 
RegexSourceIntegrationTest.testRegexMatchesTopicsAWhenDeleted
 Key: KAFKA-4049
 URL: https://issues.apache.org/jira/browse/KAFKA-4049
 Project: Kafka
  Issue Type: Sub-task
  Components: streams
Reporter: Guozhang Wang
Assignee: Guozhang Wang






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4042) DistributedHerder thread can die because of connector & task lifecycle exceptions

2016-08-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423239#comment-15423239
 ] 

ASF GitHub Bot commented on KAFKA-4042:
---

GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1745

WIP: KAFKA-4042: prevent DistributedHerder thread from dying from 
connector/task lifecycle exceptions



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka distherder-stayup

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1745.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1745


commit 715e7535d4aaae6174b3d6c1607617b382f0a8b8
Author: Shikhar Bhushan 
Date:   2016-08-16T19:06:56Z

KAFKA-4042: prevent DistributedHerder thread from dying from connector/task 
lifecycle exceptions




> DistributedHerder thread can die because of connector & task lifecycle 
> exceptions
> -
>
> Key: KAFKA-4042
> URL: https://issues.apache.org/jira/browse/KAFKA-4042
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
>
> As one example, there isn't exception handling in 
> {{DistributedHerder.startConnector()}} or the call-chain for it originating 
> in {{tick()}} on the herder thread, and it can throw an exception because 
> of a bad class name in the connector config. (Report of the issue in the 
> wild: https://groups.google.com/d/msg/confluent-platform/EnleFnXpZCU/3B_gRxsRAgAJ)
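
The general shape of a fix would be to guard lifecycle calls on the herder 
thread, along these lines (an illustrative sketch with hypothetical names, not 
the actual patch):

{noformat}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public final class HerderGuard {
    private static final Logger log = LoggerFactory.getLogger(HerderGuard.class);

    /** Runs a connector/task lifecycle action so that an exception thrown by
     *  it (e.g. a bad connector class name) cannot kill the herder thread. */
    public static void runSafely(String connName, Runnable lifecycleAction) {
        try {
            lifecycleAction.run();
        } catch (Throwable t) {
            // Record the failure for this connector only; tick() keeps running.
            log.error("Lifecycle action for connector '{}' failed; herder stays up",
                      connName, t);
        }
    }
}
{noformat}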



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1745: WIP: KAFKA-4042: prevent DistributedHerder thread ...

2016-08-16 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1745

WIP: KAFKA-4042: prevent DistributedHerder thread from dying from 
connector/task lifecycle exceptions



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka distherder-stayup

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1745.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1745


commit 715e7535d4aaae6174b3d6c1607617b382f0a8b8
Author: Shikhar Bhushan 
Date:   2016-08-16T19:06:56Z

KAFKA-4042: prevent DistributedHerder thread from dying from connector/task 
lifecycle exceptions




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Build failed in Jenkins: kafka-trunk-jdk8 #817

2016-08-16 Thread Apache Jenkins Server
See 

Changes:

[me] KAFKA-3769: Create new sensors per-thread in KafkaStreams

--
[...truncated 3429 lines...]

kafka.security.auth.SimpleAclAuthorizerTest > testLoadCache STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testLoadCache PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata 
STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata 
PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata 
STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.integration.PrimitiveApiTest > testMultiProduce STARTED

kafka.integration.PrimitiveApiTest > testMultiProduce PASSED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch STARTED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch PASSED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize 
STARTED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize PASSED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests STARTED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests PASSED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch STARTED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch PASSED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression STARTED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression PASSED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic STARTED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic PASSED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest STARTED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopicWithCollision 
STARTED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopicWithCollision 
PASSED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokerListWithNoTopics 
STARTED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokerListWithNoTopics 
PASSED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.PlaintextTopicMetadataTest > testGetAllTopicMetadata STARTED

kafka.integration.PlaintextTopicMetadataTest > testGetAllTopicMetadata PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.PlaintextTopicMetadataTest > testBasicTopicMetadata STARTED

kafka.integration.PlaintextTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooLow 
STARTED

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooLow PASSED

kafka.integration.AutoOffsetResetTest > testResetToEarliestWhenOffsetTooLow 
STARTED

kafka.integration.AutoOffsetResetTest > testResetToEarliestWhenOffsetTooLow 
PASSED

kafka.integration.AutoOffsetResetTest > testResetToEarliestWhenOffsetTooHigh 
STARTED

kafka.integration.AutoOffsetResetTest > 

[jira] [Updated] (KAFKA-4042) DistributedHerder thread can die because of connector & task lifecycle exceptions

2016-08-16 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan updated KAFKA-4042:
---
Component/s: KafkaConnect

> DistributedHerder thread can die because of connector & task lifecycle 
> exceptions
> -
>
> Key: KAFKA-4042
> URL: https://issues.apache.org/jira/browse/KAFKA-4042
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
>
> As one example, there isn't exception handling in 
> {{DistributedHerder.startConnector()}} or the call-chain for it originating 
> in {{tick()}} on the herder thread, and it can throw an exception because 
> of a bad class name in the connector config. (Report of the issue in the 
> wild: https://groups.google.com/d/msg/confluent-platform/EnleFnXpZCU/3B_gRxsRAgAJ)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4048) Connect does not support RetriableException consistently for sources & sinks

2016-08-16 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-4048:
--

 Summary: Connect does not support RetriableException consistently 
for sources & sinks
 Key: KAFKA-4048
 URL: https://issues.apache.org/jira/browse/KAFKA-4048
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Reporter: Shikhar Bhushan
Assignee: Shikhar Bhushan


We only allow for handling {{RetriableException}} from calls to 
{{SinkTask.put()}}, but this is something we should also support for 
{{flush()}} and arguably also {{open()}}.

We don't have support for {{RetriableException}} with {{SourceTask}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4047) Return more useful information from ConsumerGroupCommand for consumer groups that are rebalancing or manually assigned

2016-08-16 Thread Vahid Hashemian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423212#comment-15423212
 ] 

Vahid Hashemian commented on KAFKA-4047:


Thanks for verifying.

> Return more useful information from ConsumerGroupCommand for consumer groups 
> that are rebalancing or manually assigned
> --
>
> Key: KAFKA-4047
> URL: https://issues.apache.org/jira/browse/KAFKA-4047
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer, tools
>Affects Versions: 0.9.0.1, 0.10.0.0, 0.10.0.1
>Reporter: Sean McKibben
>Priority: Minor
>
> For consumers which are manually assigned topic partitions, or for offsets 
> which are otherwise manually committed to brokers using the new consumer API, 
> the existing ConsumerGroupCommand tools are operationally insufficient.
> There are many use cases in production operation when it is important to be 
> able to easily retrieve the stored offsets and calculated lag of a consumer 
> group, even if the group is rebalancing or partition assignment information 
> is indeterminate. Often these cases involve a misbehaving or crashed client 
> application, and having the ConsumerGroupCommand return {{Consumer group 
> `myGroupID` does not exist or is rebalancing.}} instead of the information it 
> does know is not very helpful. Additionally, when manual offset commits are 
> used, or the automatic consumer group subscription management is not used, 
> the same message is returned by the tool. In these cases, the offsets are 
> actually stored and available even though the broker/coordinator doesn't have 
> information about partition ownership.
> Previously, this was a non-issue as a ZK client could be leveraged relatively 
> easily to get important information like the last stored offset positions, or 
> consumer-offset-checker would return more information in its responses. With 
> the new consumer, however, the process for getting offset and lag is not as 
> straightforward, and tools to provide this information at the operational 
> level are necessary as part of a Kafka installation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1744: HOTFIX: Re-inserted system out

2016-08-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1744


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Reopened] (KAFKA-4045) Investigate feasibility of hooking into RocksDb's cache

2016-08-16 Thread Eno Thereska (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eno Thereska reopened KAFKA-4045:
-

Reopening since it's worth checking with the RocksDb community on future plans 
around this.

> Investigate feasibility of hooking into RocksDb's cache
> ---
>
> Key: KAFKA-4045
> URL: https://issues.apache.org/jira/browse/KAFKA-4045
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Eno Thereska
>Assignee: Damian Guy
> Fix For: 0.10.1.0
>
>
> Ideally we could hook a listener into RocksDb's cache so that when entries are 
> flushed or evicted from the cache the listener is called (and can 
> subsequently perform Kafka Streams-specific functions, like forwarding a 
> record downstream). That way we don't build our own cache.
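
A purely hypothetical shape for such a hook (assuming, as the ticket implies, 
that RocksDb does not expose one today):

{noformat}
/** Hypothetical callback a state store could register with the RocksDb cache. */
public interface CacheEvictionListener {
    /** Invoked when an entry is flushed or evicted from the cache; Streams
     *  could forward the record downstream here instead of maintaining its
     *  own cache. */
    void onEviction(byte[] key, byte[] value);
}
{noformat}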



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-4047) Return more useful information from ConsumerGroupCommand for consumer groups that are rebalancing or manually assigned

2016-08-16 Thread Sean McKibben (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean McKibben resolved KAFKA-4047.
--
Resolution: Duplicate

I think KAFKA-3853 is pretty close, so I'll mark this one as a duplicate.

> Return more useful information from ConsumerGroupCommand for consumer groups 
> that are rebalancing or manually assigned
> --
>
> Key: KAFKA-4047
> URL: https://issues.apache.org/jira/browse/KAFKA-4047
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer, tools
>Affects Versions: 0.9.0.1, 0.10.0.0, 0.10.0.1
>Reporter: Sean McKibben
>Priority: Minor
>
> For consumers which are manually assigned topic partitions, or for offsets 
> which are otherwise manually committed to brokers using the new consumer API, 
> the existing ConsumerGroupCommand tools are operationally insufficient.
> There are many use cases in production operation when it is important to be 
> able to easily retrieve the stored offsets and calculated lag of a consumer 
> group, even if the group is rebalancing or partition assignment information 
> is indeterminate. Often these cases involve a misbehaving or crashed client 
> application, and having the ConsumerGroupCommand return {{Consumer group 
> `myGroupID` does not exist or is rebalancing.}} instead of the information it 
> does know is not very helpful. Additionally, when manual offset commits are 
> used, or the automatic consumer group subscription management is not used, 
> the same message is returned by the tool. In these cases, the offsets are 
> actually stored and available even though the broker/coordinator doesn't have 
> information about partition ownership.
> Previously, this was a non-issue as a ZK client could be leveraged relatively 
> easily to get important information like the last stored offset positions, or 
> consumer-offset-checker would return more information in its responses. With 
> the new consumer, however, the process for getting offset and lag is not as 
> straightforward, and tools to provide this information at the operational 
> level are necessary as part of a Kafka installation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1744: HOTFIX: Re-inserted system out

2016-08-16 Thread enothereska
GitHub user enothereska opened a pull request:

https://github.com/apache/kafka/pull/1744

HOTFIX: Re-inserted system out



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/enothereska/kafka hotfix-ducktape-marker

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1744.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1744


commit 72e69e30a4ac5899a196381ffe83cea5edc18ab5
Author: Eno Thereska 
Date:   2016-08-16T17:59:06Z

Re-inserted system out




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-74: Add Fetch Response Size Limit in Bytes

2016-08-16 Thread Andrey L. Neporada
Hi!

> On 16 Aug 2016, at 20:28, Jun Rao  wrote:
> 
> Hi, Andrey,
> 
> I was thinking of just doing 2 for the new fetch request for backward
> compatibility.
> 
> It seems there are no more comments on this thread. So, we can probably
> start the voting thread once you update the wiki.

OK, makes sense to me.
Will update the wiki today.

> 
> Also, it seems that KIP-73 depends on this KIP. Do you think you will be
> actively working on the implementation of this KIP in the next couple of
> weeks so that KIP-73 can proceed?
> 

We really need this change as soon as possible, so I do plan to work actively 
on KIP-74. 

> Thanks,
> 
> Jun

Andrey.



[jira] [Resolved] (KAFKA-3769) KStream job spending 60% of time writing metrics

2016-08-16 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava resolved KAFKA-3769.
--
   Resolution: Fixed
Fix Version/s: 0.10.1.0

Issue resolved by pull request 1530
[https://github.com/apache/kafka/pull/1530]

> KStream job spending 60% of time writing metrics
> 
>
> Key: KAFKA-3769
> URL: https://issues.apache.org/jira/browse/KAFKA-3769
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Greg Fodor
>Assignee: Guozhang Wang
>Priority: Critical
> Fix For: 0.10.1.0
>
>
> I've been profiling a complex streams job, and found two major hotspots when 
> writing metrics, which take up about 60% of the CPU time of the job. (!) A PR 
> is attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3769) KStream job spending 60% of time writing metrics

2016-08-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423136#comment-15423136
 ] 

ASF GitHub Bot commented on KAFKA-3769:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1530


> KStream job spending 60% of time writing metrics
> 
>
> Key: KAFKA-3769
> URL: https://issues.apache.org/jira/browse/KAFKA-3769
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Greg Fodor
>Assignee: Guozhang Wang
>Priority: Critical
>
> I've been profiling a complex streams job, and found two major hotspots when 
> writing metrics, which take up about 60% of the CPU time of the job. (!) A PR 
> is attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1530: KAFKA-3769: Create new sensors per-thread in Kafka...

2016-08-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1530


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


ZooKeeper performance issue for storing offset

2016-08-16 Thread Xin Jin
Hi,

I'm working on streaming systems in AMPLab at UC Berkeley. This article (
https://cwiki.apache.org/confluence/display/KAFKA/Committing+and+fetching+consumer+offsets+in+Kafka)
mentioned the ZooKeeper performance issue when consumers store offsets in
ZooKeeper.

"In Kafka releases through 0.8.1.1, consumers commit their offsets to
ZooKeeper. ZooKeeper does not scale extremely well (especially for writes)
when there are a large number of offsets (i.e., consumer-count *
partition-count)."

Can anyone tell me in production scenarios, how many consumers and
partitions do you have? How much write (offset update) traffic do you
generate that ZooKeeper cannot handle?

Thank you very much!
Xin


Re: [DISCUSS] KIP-74: Add Fetch Response Size Limit in Bytes

2016-08-16 Thread Jun Rao
Hi, Andrey,

I was thinking of just doing 2 for the new fetch request for backward
compatibility.

It seems there are no more comments on this thread. So, we can probably
start the voting thread once you update the wiki.

Also, it seems that KIP-73 depends on this KIP. Do you think you will be
actively working on the implementation of this KIP in the next couple of
weeks so that KIP-73 can proceed?

Thanks,

Jun

On Tue, Aug 16, 2016 at 9:28 AM, Andrey L. Neporada <
anepor...@yandex-team.ru> wrote:

> Hi, Jun!
>
> > On 16 Aug 2016, at 18:52, Jun Rao  wrote:
> >
> > Hi, Andrey,
> >
> > For 2, we actually can know the next message size. In LogSegment.read(),
> we
> > first use the offset index to find the file position close to the
> requested
> > offset and then scan the log forward to find the message whose offset is
> at
> > or larger than the requested offset. By the time we find such a message,
> we
> > know exactly the size of the first message that we need to return in the
> > fetch response. No additional read is needed. We can probably just
> > propagate this information back to the caller. I added a comment related
> to
> > this in https://issues.apache.org/jira/browse/KAFKA-3810.
>
> Good point!
>
> I think it makes sense to apply the strategy you propose in KAFKA-3810 not
> only to the new fetch request but also to the old (“unlimited”) one.
>
> So we return the full message even if the first unread message exceeds the
> partition limit, and we continue doing so until the response limit is reached.
> What do you think?
>
>
> >
> > Thanks,
> >
> > Jun
> >
>
> Andrey.
>
>


[GitHub] kafka pull request #1735: MINOR: Add application id prefix for copartitionGr...

2016-08-16 Thread guozhangwang
Github user guozhangwang closed the pull request at:

https://github.com/apache/kafka/pull/1735


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4047) Return more useful information from ConsumerGroupCommand for consumer groups that are rebalancing or manually assigned

2016-08-16 Thread Vahid Hashemian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423013#comment-15423013
 ] 

Vahid Hashemian commented on KAFKA-4047:


[~graphex] This could be a duplicate of issues reported in 
[KAFKA-3144|https://issues.apache.org/jira/browse/KAFKA-3144], 
[KAFKA-3859|https://issues.apache.org/jira/browse/KAFKA-3859], and 
[KAFKA-3853|https://issues.apache.org/jira/browse/KAFKA-3853]. Please advise if 
that is not the case. Thanks.

> Return more useful information from ConsumerGroupCommand for consumer groups 
> that are rebalancing or manually assigned
> --
>
> Key: KAFKA-4047
> URL: https://issues.apache.org/jira/browse/KAFKA-4047
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer, tools
>Affects Versions: 0.9.0.1, 0.10.0.0, 0.10.0.1
>Reporter: Sean McKibben
>Priority: Minor
>
> For consumers which are manually assigned topic partitions, or for offsets 
> which are otherwise manually committed to brokers using the new consumer API, 
> the existing ConsumerGroupCommand tools are operationally insufficient.
> There are many use cases in production operation when it is important to be 
> able to easily retrieve the stored offsets and calculated lag of a consumer 
> group, even if the group is rebalancing or partition assignment information 
> is indeterminate. Often these cases involve a misbehaving or crashed client 
> application, and having the ConsumerGroupCommand return {{Consumer group 
> `myGroupID` does not exist or is rebalancing.}} instead of the information it 
> does know is not very helpful. Additionally, when manual offset commits are 
> used, or the automatic consumer group subscription management is not used, 
> the same message is returned by the tool. In these cases, the offsets are 
> actually stored and available even though the broker/coordinator doesn't have 
> information about partition ownership.
> Previously, this was a non-issue as a ZK client could be leveraged relatively 
> easily to get important information like the last stored offset positions, or 
> consumer-offset-checker would return more information in its responses. With 
> the new consumer, however, the process for getting offset and lag is not as 
> straightforward, and tools to provide this information at the operational 
> level are necessary as part of a Kafka installation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] Time-based releases for Apache Kafka

2016-08-16 Thread Jim Jagielski
I'm following along on the thread so for sure! :)

> On Aug 16, 2016, at 12:19 PM, Gwen Shapira  wrote:
> 
> Absolutely!
> 
> If you have any concrete suggestions for steps we can take to improve
> the process, this will be most awesome. We'd love to learn from your
> long experience in Apache :)
> 
> Gwen
> 
> On Tue, Aug 16, 2016 at 6:59 AM, Jim Jagielski  wrote:
>> By being aware of the potential issues, it's easier to address
>> them at the start, and to create a process which does what
>> it can to "ensure" the problems don't pop up :)
>> 
>>> On Aug 16, 2016, at 9:48 AM, Ismael Juma  wrote:
>>> 
>>> Hi Jim,
>>> 
>>> Thanks for your feedback. We value the community and we definitely want
>>> Kafka to remain a fun and friendly place to participate. Under this
>>> proposal, volunteers will still be able to do the work when they can. The
>>> benefit is that it is likely to reach users faster since the next release
>>> is never far away.
>>> 
>>> Ismael
>>> 
>>> On Tue, Aug 16, 2016 at 2:42 PM, Jim Jagielski  wrote:
>>> 
 The idea of time-based releases makes sense. The issue is
 when they become the tail wagging the dog.
 
 Recall that all developers and contributors are assumed to
 be doing this because they are personally invested in the
 project. There is also the assumption that, as such, they
 are volunteers and do the work "when they can". And finally,
 there is the fact that working on Apache projects should be
 FUN. It should be someplace where you aren't beholden to,
 or under, some artificial schedule.
 
 If time-based releases are put in place, and held to under
 unforgiving standards, all the above are put at risk. And
 when that happens it puts the project and the community at
 risk as well.
 
 So having a set schedule is fine... it's how "we" do it that
 is key.
 
>> 
> 
> 
> 
> -- 
> Gwen Shapira
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog



Re: [DISCUSS] KIP-74: Add Fetch Response Size Limit in Bytes

2016-08-16 Thread Andrey L. Neporada
Hi, Jun!

> On 16 Aug 2016, at 18:52, Jun Rao  wrote:
> 
> Hi, Andrey,
> 
> For 2, we actually can know the next message size. In LogSegment.read(), we
> first use the offset index to find the file position close to the requested
> offset and then scan the log forward to find the message whose offset is at
> or larger than the requested offset. By the time we find such a message, we
> know exactly the size of the first message that we need to return in the
> fetch response. No additional read is needed. We can probably just
> propagate this information back to the caller. I added a comment related to
> this in https://issues.apache.org/jira/browse/KAFKA-3810.

Good point!

I think it makes sense to apply the strategy you propose in KAFKA-3810 not only 
to the new fetch request but also to the old (“unlimited”) one.

So we return the full message even if the first unread message exceeds the 
partition limit, and we continue doing so until the response limit is reached.
What do you think?


> 
> Thanks,
> 
> Jun
> 

Andrey.



Re: [DISCUSS] Time-based releases for Apache Kafka

2016-08-16 Thread Gwen Shapira
Absolutely!

If you have any concrete suggestions for steps we can take to improve
the process, this will be most awesome. We'd love to learn from your
long experience in Apache :)

Gwen

On Tue, Aug 16, 2016 at 6:59 AM, Jim Jagielski  wrote:
> By being aware of the potential issues, it's easier to address
> them at the start, and to create a process which does what
> it can to "ensure" the problems don't pop up :)
>
>> On Aug 16, 2016, at 9:48 AM, Ismael Juma  wrote:
>>
>> Hi Jim,
>>
>> Thanks for your feedback. We value the community and we definitely want
>> Kafka to remain a fun and friendly place to participate. Under this
>> proposal, volunteers will still be able to do the work when they can. The
>> benefit is that it is likely to reach users faster since the next release
>> is never far away.
>>
>> Ismael
>>
>> On Tue, Aug 16, 2016 at 2:42 PM, Jim Jagielski  wrote:
>>
>>> The idea of time-based releases makes sense. The issue is
>>> when they become the tail wagging the dog.
>>>
>>> Recall that all developers and contributors are assumed to
>>> be doing this because they are personally invested in the
>>> project. There is also the assumption that, as such, they
>>> are volunteers and do the work "when they can". And finally,
>>> there is the fact that working on Apache projects should be
>>> FUN. It should be someplace where you aren't beholden to,
>>> or under, some artificial schedule.
>>>
>>> If time-based releases are put in place, and held to under
>>> unforgiving standards, all the above are put at risk. And
>>> when that happens it puts the project and the community at
>>> risk as well.
>>>
>>> So having a set schedule is fine... it's how "we" do it that
>>> is key.
>>>
>



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog


[jira] [Created] (KAFKA-4047) Return more useful information from ConsumerGroupCommand for consumer groups that are rebalancing or manually assigned

2016-08-16 Thread Sean McKibben (JIRA)
Sean McKibben created KAFKA-4047:


 Summary: Return more useful information from ConsumerGroupCommand 
for consumer groups that are rebalancing or manually assigned
 Key: KAFKA-4047
 URL: https://issues.apache.org/jira/browse/KAFKA-4047
 Project: Kafka
  Issue Type: Improvement
  Components: consumer, tools
Affects Versions: 0.10.0.1, 0.10.0.0, 0.9.0.1
Reporter: Sean McKibben
Priority: Minor


For consumers which are manually assigned topic partitions, or for offsets 
which are otherwise manually committed to brokers using the new consumer API, 
the existing ConsumerGroupCommand tools are operationally insufficient.
There are many use cases in production operation when it is important to be 
able to easily retrieve the stored offsets and calculated lag of a consumer 
group, even if the group is rebalancing or partition assignment information is 
indeterminate. Often these cases involve a misbehaving or crashed client 
application, and having the ConsumerGroupCommand return {{Consumer group 
`myGroupID` does not exist or is rebalancing.}} instead of the information it 
does know is not very helpful. Additionally, when manual offset commits are 
used, or the automatic consumer group subscription management is not used, the 
same message is returned by the tool. In these cases, the offsets are actually 
stored and available even though the broker/coordinator doesn't have 
information about partition ownership.
Previously, this was a non-issue as a ZK client could be leveraged relatively 
easily to get important information like the last stored offset positions, or 
consumer-offset-checker would return more information in its responses. With 
the new consumer, however, the process for getting offset and lag is not as 
straightforward, and tools to provide this information at the operational level 
are necessary as part of a Kafka installation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-74: Add Fetch Response Size Limit in Bytes

2016-08-16 Thread Jun Rao
Hi, Andrey,

For 2, we actually can know the next message size. In LogSegment.read(), we
first use the offset index to find the file position close to the requested
offset and then scan the log forward to find the message whose offset is at
or larger than the requested offset. By the time we find such a message, we
know exactly the size of the first message that we need to return in the
fetch response. No additional read is needed. We can probably just
propagate this information back to the caller. I added a comment related to
this in https://issues.apache.org/jira/browse/KAFKA-3810.
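
An illustrative sketch of why the size comes "for free" (every name below is a 
hypothetical stand-in, not Kafka's actual LogSegment/OffsetIndex code):

{noformat}
import java.util.List;

final class MessageHeader {
    final long offset;
    final int size;
    MessageHeader(long offset, int size) { this.offset = offset; this.size = size; }
}

final class SegmentScan {
    /**
     * The offset index points at a message at or before the requested offset;
     * the forward scan parses each message header anyway, so once we find the
     * first message with offset >= requested, its size is already in hand.
     */
    static MessageHeader firstMessageAtOrAfter(long requestedOffset, int indexedStart,
                                               List<MessageHeader> segment) {
        for (int i = indexedStart; i < segment.size(); i++) {
            MessageHeader h = segment.get(i);
            if (h.offset >= requestedOffset)
                return h;    // caller learns the first message's exact size, no extra read
        }
        return null;         // requested offset is past the end of the segment
    }
}
{noformat}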

Thanks,

Jun

On Tue, Aug 16, 2016 at 4:58 AM, Andrey L. Neporada <
anepor...@yandex-team.ru> wrote:

> Hi Jun!
> Thanks for feedback.
>
> > On 15 Aug 2016, at 20:04, Jun Rao  wrote:
> >
> > Hi, Andrey,
> >
> > Thanks for the update to the wiki. Just a few more minor comments.
> >
> > 1. "If *response_max_bytes* parameter is zero ("no limit"), the request
> is
> > processed *exactly* as before." Instead of using 0, it seems it's more
> > natural to use Int.MAX_INT to preserve the old behaviour.
> >
> OK, done.
>
> > 2. "For the first partition, server always fetches at least
> > *message.max.bytes."
> > *To be more precise, the server only fetches more bytes than
> > *response_max_bytes
> > *(and up *message.max.bytes*t) if there is a message whose size is larger
> > than *response_max_bytes.*
> >
>
> Unfortunately, there is no easy way to obtain the size of the next message in
> ReplicaManager:fetchMessages() - you would need to issue an extra small read
> from the log to find it.
> So, unless I am missing something important, I would like to keep the
> proposal (and algorithm) as is.
>
> > 3. "The solution is to continue fetching from first empty partition in
> > round-robin fashion or to perform random shuffle of partitions." We
> > probably want to make it clearer by saying that this is for ordering the
> > partitions in the fetch request.
> >
> > 4. Just to make it clear. Could you include the new fetch request protocol
> > in the wiki (e.g.
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-MetadataRequest(version1))
> > and mark the new field?
> >
>
> OK, done.
>
> BTW, what is the target version of this KIP? Currently the inter-broker
> protocol version in the KIP is set to 0.11.0-IV0. Do we want to target 0.11
> or maybe somewhat earlier?
>
>
> > Jun
> >
>
> Thanks,
> Andrey.
>
>


[jira] [Commented] (KAFKA-3478) Finer Stream Flow Control

2016-08-16 Thread Bill Bejeck (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15422907#comment-15422907
 ] 

Bill Bejeck commented on KAFKA-3478:


Is this task still available and is it a feature that is still in the current 
plan/desired to get done?

> Finer Stream Flow Control
> -
>
> Key: KAFKA-3478
> URL: https://issues.apache.org/jira/browse/KAFKA-3478
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Guozhang Wang
>  Labels: user-experience
> Fix For: 0.10.1.0
>
>
> Today we have an event-time based flow control mechanism in order to 
> synchronize multiple input streams in a best-effort manner:
> http://docs.confluent.io/3.0.0/streams/architecture.html#flow-control-with-timestamps
> However, there are some use cases where users would like to have finer 
> control of the input streams, for example, with two input streams, one of 
> them always reading from offset 0 upon (re)-starting, and the other reading 
> from the log end offset.
> Today we only have one consumer config "auto.offset.reset" to control that 
> behavior, which means all streams are read either from "earliest" or "latest".
> We should consider how to improve this setting to allow users to have finer 
> control over these streams.
> =
> A finer flow control could also be used to allow for populating a {{KTable}} 
> (with an "initial" state) before starting the actual processing (this feature 
> was asked for on the mailing list multiple times already). Even if it is quite 
> hard to define *when* the initial populating phase should end, this might 
> still be useful. There would be the following possibilities:
>  1) an initial fixed time period for populating
>(it might be hard for a user to estimate the correct value)
>  2) an "idle" period, ie, if no update to a KTable for a certain time is
> done, we consider it as populated
>  3) a timestamp cut off point, ie, all records with an older timestamp
> belong to the initial populating phase
>  4) a throughput threshold, ie, if the populating frequency falls below
> the threshold, the KTable is considered "finished"
>  5) maybe something else ??
> The API might look something like this
> {noformat}
> KTable table = builder.table("topic", 1000); // populate the table without 
> reading any other topics until we see one record with timestamp 1000.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4046) Expose Cleaner Offset for Compacted Topics

2016-08-16 Thread Sean McKibben (JIRA)
Sean McKibben created KAFKA-4046:


 Summary: Expose Cleaner Offset for Compacted Topics
 Key: KAFKA-4046
 URL: https://issues.apache.org/jira/browse/KAFKA-4046
 Project: Kafka
  Issue Type: Improvement
  Components: metrics, offset manager
Affects Versions: 0.10.0.1, 0.10.0.0, 0.9.0.1
Reporter: Sean McKibben
Priority: Minor


Certain use cases, such as a streaming distinct view on data, could make use of 
topic compaction to retrieve a snapshot of unique messages from a compacted 
topic. Currently, the point to which compaction has progressed is not exposed 
for such use cases, though it appears to be readily available to the brokers. 
Without that information being exposed to consumers for such use cases, 
techniques such as pigging (https://en.wikipedia.org/wiki/Pigging) need to be 
implemented on an ad-hoc basis to discover the approximate offset to which 
compaction has progressed.
A simple API for retrieving the position of compaction for a topic partition 
would be useful both operationally and for use cases that need the general 
uniqueness guarantees offered by topic compaction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4023) Add thread id as prefix in Kafka Streams thread logging

2016-08-16 Thread Bill Bejeck (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15422890#comment-15422890
 ] 

Bill Bejeck commented on KAFKA-4023:


Picking this one up. Just let me know if someone is currently working on this 
and I'll unassign myself.

> Add thread id as prefix in Kafka Streams thread logging
> ---
>
> Key: KAFKA-4023
> URL: https://issues.apache.org/jira/browse/KAFKA-4023
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Bill Bejeck
>  Labels: newbie++
>
> A single Kafka Streams instance can include multiple stream threads, and 
> hence without a logging prefix it is difficult to determine which thread is 
> producing which log entries.
> We should
> 1) add the log-prefix as the thread id in the StreamThread logger, as well as 
> in its contained StreamPartitionAssignor.
> 2) add the log-prefix as the task id in StreamTask / StandbyTask, as well as 
> in their contained RecordCollector and ProcessorStateManager.
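
A minimal illustration of the requested change (hypothetical class and field 
names, not the actual Streams code):

{noformat}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class PrefixedLoggingExample {
    private static final Logger log = LoggerFactory.getLogger(PrefixedLoggingExample.class);
    private final String logPrefix;

    public PrefixedLoggingExample(String threadId) {
        // Build the prefix once per thread/task and prepend it to every line.
        this.logPrefix = String.format("stream-thread [%s] ", threadId);
    }

    void commitTask(String taskId) {
        log.info("{}Committing task {}", logPrefix, taskId);
        // => "stream-thread [StreamThread-1] Committing task 0_1"
    }
}
{noformat}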



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-4023) Add thread id as prefix in Kafka Streams thread logging

2016-08-16 Thread Bill Bejeck (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bejeck reassigned KAFKA-4023:
--

Assignee: Bill Bejeck

> Add thread id as prefix in Kafka Streams thread logging
> ---
>
> Key: KAFKA-4023
> URL: https://issues.apache.org/jira/browse/KAFKA-4023
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Bill Bejeck
>  Labels: newbie++
>
> A single Kafka Streams instance can include multiple stream threads, and 
> hence without a logging prefix it is difficult to determine which thread is 
> producing which log entries.
> We should
> 1) add the log-prefix as the thread id in the StreamThread logger, as well as 
> in its contained StreamPartitionAssignor.
> 2) add the log-prefix as the task id in StreamTask / StandbyTask, as well as 
> in their contained RecordCollector and ProcessorStateManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3937) Kafka Clients Leak Native Memory For Longer Than Needed With Compressed Messages

2016-08-16 Thread William Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15422828#comment-15422828
 ] 

William Yu commented on KAFKA-3937:
---

[~ijuma] Took a stab at making the fix. I made it against the 0.9.0 branch as 
that is what my clients and brokers are running. I tested it in my 
environment and I saw that native memory was no longer going above the Java 
-Xmx setting, as the streams were being closed when the RecordIterator 
completed.

> Kafka Clients Leak Native Memory For Longer Than Needed With Compressed 
> Messages
> 
>
> Key: KAFKA-3937
> URL: https://issues.apache.org/jira/browse/KAFKA-3937
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.8.2.2, 0.9.0.1, 0.10.0.0
> Environment: Linux, latest oracle java-8
>Reporter: Tom Crayford
>Priority: Minor
>
> In https://issues.apache.org/jira/browse/KAFKA-3933, we discovered that 
> brokers can crash when performing log recovery, as they leak native memory 
> whilst decompressing compressed segments, and that native memory isn't 
> cleaned up rapidly enough by garbage collection and finalizers. The work to 
> fix that in the brokers is taking place in 
> https://github.com/apache/kafka/pull/1598. As part of that PR, Ismael Juma 
> asked me to fix similar issues in the client. Rather than have one large PR 
> that fixes everything, I'd rather break this work up into separate things, 
> so I'm filing this JIRA to track the follow-up work. I should get to a PR on 
> this at some point relatively soon, once the other PR has landed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3937) Kafka Clients Leak Native Memory For Longer Than Needed With Compressed Messages

2016-08-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15422805#comment-15422805
 ] 

ASF GitHub Bot commented on KAFKA-3937:
---

GitHub user wiyu opened a pull request:

https://github.com/apache/kafka/pull/1743

KAFKA-3937: Kafka Clients Leak Native Memory For Longer Than Needed With 
Compressed Messages

@ijuma - Creating this PR against 0.9.0 as this is what we're using in 
prod. I can modify this for trunk if the logic/solution looks good. 

I modified the record iterator to implement Closeable and implemented the 
close() method to call streams.close(). The stream will now be closed when we 
are done traversing the inner iterator. I had to move the `try {} catch {}` 
out of the `if` portion to deal with the IOException coming from `innerDone()`.
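
A rough sketch of the idea, with invented names rather than the actual client 
classes:

{noformat}
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Illustrative sketch only: an iterator over a decompression stream that
// closes the stream as soon as the last record has been read, so any native
// memory held by the decompressor is freed eagerly instead of via finalizers.
final class ClosingRecordIterator implements Iterator<Integer>, Closeable {
    private final InputStream stream;
    private int next;

    ClosingRecordIterator(InputStream stream) throws IOException {
        this.stream = stream;
        this.next = stream.read(); // read ahead so hasNext() needs no IO
    }

    @Override
    public boolean hasNext() {
        return next != -1;
    }

    @Override
    public Integer next() {
        if (next == -1)
            throw new NoSuchElementException();
        int current = next;
        try {
            next = stream.read();
            if (next == -1)
                close(); // done traversing: release the stream right away
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return current;
    }

    @Override
    public void close() throws IOException {
        stream.close();
    }
}
{noformat}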


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wiyu/kafka compressor_memory_leak_in_fetcher

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1743.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1743


commit b56ec1563e34d40c338224970c6b8cd3802e4dcd
Author: William Yu 
Date:   2016-08-16T14:09:28Z

KAFKA-3937: Kafka Clients Leak Native Memory For Longer Than Needed With 
Compressed Messages




> Kafka Clients Leak Native Memory For Longer Than Needed With Compressed 
> Messages
> 
>
> Key: KAFKA-3937
> URL: https://issues.apache.org/jira/browse/KAFKA-3937
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.8.2.2, 0.9.0.1, 0.10.0.0
> Environment: Linux, latest oracle java-8
>Reporter: Tom Crayford
>Priority: Minor
>
> In https://issues.apache.org/jira/browse/KAFKA-3933, we discovered that 
> brokers can crash when performing log recovery, as they leak native memory 
> whilst decompressing compressed segments, and that native memory isn't 
> cleaned up rapidly enough by garbage collection and finalizers. The work to 
> fix that in the brokers is taking place in 
> https://github.com/apache/kafka/pull/1598. As part of that PR, Ismael Juma 
> asked me to fix similar issues in the client. Rather than have one large PR 
> that fixes everything, I'd rather break this work up into separate things, 
> so I'm filing this JIRA to track the follow-up work. I should get to a PR on 
> this at some point relatively soon, once the other PR has landed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1743: KAFKA-3937: Kafka Clients Leak Native Memory For L...

2016-08-16 Thread wiyu
GitHub user wiyu opened a pull request:

https://github.com/apache/kafka/pull/1743

KAFKA-3937: Kafka Clients Leak Native Memory For Longer Than Needed With 
Compressed Messages

@ijuma - Creating this PR against 0.9.0 as this is what we're using in 
prod. I can modify this for trunk if the logic/solution looks good. 

I modified the record iterator to implement Closeable and implemented the 
close() method to call streams.close(). The stream will now be closed when we 
are done traversing the inner iterator. I had to move the `try {} catch {}` 
out of the `if` portion to deal with the IOException coming from `innerDone()`.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wiyu/kafka compressor_memory_leak_in_fetcher

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1743.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1743


commit b56ec1563e34d40c338224970c6b8cd3802e4dcd
Author: William Yu 
Date:   2016-08-16T14:09:28Z

KAFKA-3937: Kafka Clients Leak Native Memory For Longer Than Needed With 
Compressed Messages




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-59 - Proposal for a kafka broker command - kafka-brokers.sh

2016-08-16 Thread Jayesh Thakrar
All,
If there is no discussion, feedback or objection, is there any concern in 
going to the next step of voting?
Thanks,
Jayesh

From: Jayesh Thakrar 
To: "dev@kafka.apache.org"  
Sent: Saturday, August 13, 2016 11:44 PM
Subject: [DISCUSS] KIP-59 - Proposal for a kafka broker command - 
kafka-brokers.sh

This is to start off a discussion on the above KIP at
https://cwiki.apache.org/confluence/display/KAFKA/KIP-59%3A+Proposal+for+a+kafka+broker+command
The proposal is to fill the void of a command line tool/utility that can 
provide information on the brokers in a Kafka cluster.
The code is available on GitHub at https://github.com/JThakrar/kafka
The KIP page has the help documentation as well as the output from the 
command with various options.
Thank you,
Jayesh Thakrar

Re: [DISCUSS] Time-based releases for Apache Kafka

2016-08-16 Thread Jim Jagielski
By being aware of the potential issues, it's easier to address
them at the start, and to create a process which does what
it can to "ensure" the problems don't pop up :)

> On Aug 16, 2016, at 9:48 AM, Ismael Juma  wrote:
> 
> Hi Jim,
> 
> Thanks for your feedback. We value the community and we definitely want
> Kafka to remain a fun and friendly place to participate. Under this
> proposal, volunteers will still be able to do the work when they can. The
> benefit is that it is likely to reach users faster since the next release
> is never far away.
> 
> Ismael
> 
> On Tue, Aug 16, 2016 at 2:42 PM, Jim Jagielski  wrote:
> 
>> The idea of time-based releases makes sense. The issue is
>> when they become the tail wagging the dog.
>> 
>> Recall that all developers and contributors are assumed to
>> be doing this because they are personally invested in the
>> project. There is also the assumption that, as such, they
>> are volunteers and do the work "when they can". And finally,
>> there is the fact that working on Apache projects should be
>> FUN. It should be someplace where you aren't beholden to,
>> or under, some artificial schedule.
>> 
>> If time-based releases are put in place, and held to under
>> unforgiving standards, all the above are put at risk. And
>> when that happens it puts the project and the community at
>> risk as well.
>> 
>> So having a set schedule is fine... it's how "we" do it that
>> is key.
>> 



Re: [DISCUSS] Time-based releases for Apache Kafka

2016-08-16 Thread Ismael Juma
Hi Jim,

Thanks for your feedback. We value the community and we definitely want
Kafka to remain a fun and friendly place to participate. Under this
proposal, volunteers will still be able to do the work when they can. The
benefit is that it is likely to reach users faster since the next release
is never far away.

Ismael

On Tue, Aug 16, 2016 at 2:42 PM, Jim Jagielski  wrote:

> The idea of time-based releases makes sense. The issue is
> when they become the tail wagging the dog.
>
> Recall that all developers and contributors are assumed to
> be doing this because they are personally invested in the
> project. There is also the assumption that, as such, they
> are volunteers and do the work "when they can". And finally,
> there is the fact that working on Apache projects should be
> FUN. It should be someplace where you aren't beholden to,
> or under, some artificial schedule.
>
> If time-based releases are put in place, and held to under
> unforgiving standards, all the above are put at risk. And
> when that happens it puts the project and the community at
> risk as well.
>
> So having a set schedule is fine... it's how "we" do it that
> is key.
>


Re: [DISCUSS] Java 8 as a minimum requirement

2016-08-16 Thread Ismael Juma
Hey Harsha,

I noticed that you proposed that Storm should drop support for Java 7 in
master:

http://markmail.org/message/25do6wd3a6g7cwpe

It's useful to know what other Apache projects are doing in this regard, so
I'm interested in the timeline being proposed for Storm's transition. I
could not find it in the thread above, so I'd appreciate it if you could
clarify it for us (and sorry if I missed it).

Thanks,
Ismael

On Mon, Jun 20, 2016 at 5:05 AM, Harsha  wrote:

> Hi Ismael,
>   Agreed that timing is more important. If we give enough heads
>   up to the users who are on Java 7, that's great, but still,
>   shipping this in the 0.10.x line won't be good, as it is still
>   perceived as a maintenance release even if the release contains
>   a lot of features. If we can make this part of 0.11, moving the
>   0.10.1 features to 0.11 and giving a rough timeline for when
>   that would be released, that would be ideal.
>
> Thanks,
> Harsha
>
> On Fri, Jun 17, 2016, at 11:13 AM, Ismael Juma wrote:
> > Hi Harsha,
> >
> > Comments below.
> >
> > On Fri, Jun 17, 2016 at 7:48 PM, Harsha  wrote:
> >
> > > Hi Ismael,
> > > "Are you saying that you are aware of many Kafka users still
> > > using Java 7
> > > > who would be ready to upgrade to the next Kafka feature release
> (whatever
> > > > that version number is) before they can upgrade to Java 8?"
> > > I know there are quite a few users who are still on Java 7
> >
> >
> > This is good to know.
> >
> >
> > > and regarding the
> > > upgrade, we can't say yes or no. It's up to the user's discretion when
> > > they choose to upgrade, and of course whether there are any critical
> > > fixes that might go into the release. We shouldn't be restricting their
> > > upgrade path just because we removed Java 7 support.
> > >
> >
> > My point is that both paths have their pros and cons and we need to weigh
> > them up. If some users are slow to upgrade the Java version (Java 7 has
> > been EOL'd for over a year), there's a good chance that they are slow to
> > upgrade Kafka too. And if that is the case (and it may not be), then
> > holding up improvements for the ones who actually do upgrade may be the
> > wrong call. To be clear, I am still in listening mode and I haven't made
> > up
> > my mind on the subject.
> >
> > > Once we released 0.9.0, there aren't any 0.8.x releases, i.e., we don't
> > > have an LTS-type release line where we continually ship critical fixes
> > > over 0.8.x minor releases. So if a user needs a critical fix, the only
> > > option today is to upgrade to the next version where that fix is shipped.
> > >
> >
> > We haven't done a great job at this in the past, but there is no decision
> > that once a new major release is out, we don't do patch releases for the
> > previous major release. In fact, we have been collecting critical fixes
> > in
> > the 0.9.0 branch for a potential 0.9.0.2.
> >
> > > I understand there is no decision made yet, but the premise was to
> > > ship this in 0.10.x, possibly 0.10.1, which I don't agree with. In
> > > general I am against shipping this in a 0.10.x version. Removing Java 7
> > > support when the release is minor is in general not a good idea for
> > > users.
> > >
> >
> > Sorry if I didn't communicate this properly. I simply meant the next
> > feature release. I used 0.10.1.0 as an example, but it could also be
> > 0.11.0.0 if that turns out to be the next release. A discussion on that
> > will probably take place once the scope is clear. Personally, I think the
> > timing is more important than the version number, but it seems like some
> > people disagree.
> >
> > Ismael
>


Re: [DISCUSS] Time-based releases for Apache Kafka

2016-08-16 Thread Jim Jagielski
The idea of time-based releases makes sense. The issue is
when they become the tail wagging the dog.

Recall that all developers and contributors are assumed to
be doing this because they are personally invested in the
project. There is also the assumption that, as such, they
are volunteers and do the work "when they can". And finally,
there is the fact that working on Apache projects should be
FUN. It should be someplace where you aren't beholden to,
or under, some artificial schedule.

If time-based releases are put in place, and held to under
unforgiving standards, all the above are put at risk. And
when that happens it puts the project and the community at
risk as well.

So having a set schedule is fine... it's how "we" do it that
is key.


Re: Change Apache Kafka properties dynamically without restarting servers using Archaius

2016-08-16 Thread Tom Crayford
Replies inline.

On Mon, Aug 15, 2016 at 5:49 PM, VIJJU CH  wrote:

> Hello,
>
> I have some questions related to properties of Apache Kafka. We have a four
> node Kafka cluster which is on Amazon EMR. Currently, in order to change
> any properties, we restart the servers to pick up the changes. For
> the following:
>
> 1. Can we change the topic name that a producer is writing to
>

Specified in the message by your code, so no problem (see the sketch at the 
end of this reply).


> 2. Can we change the bootstrap server list
>

Not without restarting the producer/consumer objects.


> 3. Can we change the consumer group name
>

Not without restarting the consumer object.


> 4. Can we change the number of consumers etc. and be able to verify and
> measure that the change is picked up and did not result in lost or
> duplicate message nor introduce any consumer lag
>

This is simply the number of client objects you have running. It's under
your control.

Note that "did not result in lost or duplicate messages" is not entirely
possible in the presence of failure. You have to pick one and optimize for
that (Kafka's docs often refer to this as "at least once" or "at most once"
messaging).
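
To illustrate the first point, a minimal sketch (broker addresses and topic 
names are placeholders): the topic is an argument of each record, so it can 
vary per send, while bootstrap.servers is fixed when the producer is 
constructed.

{noformat}
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TopicPerRecordExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Fixed for the lifetime of this producer instance.
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The topic is chosen per record, so it can change send by send.
            producer.send(new ProducerRecord<>("topic-a", "key", "value"));
            producer.send(new ProducerRecord<>("topic-b", "key", "value"));
        }
    }
}
{noformat}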

Thanks

Tom Crayford
Heroku Kafka


>
> Is it possible to change these properties without restarting the servers?
> I mean, can we change these dynamically using Archaius
> (https://github.com/Netflix/archaius/wiki)?
>
> Please reply at your earliest convenience.
>
> Thanks,
> Vijju
>


Re: [DISCUSS] KIP-74: Add Fetch Response Size Limit in Bytes

2016-08-16 Thread Ismael Juma
Hi Andrey,

On Tue, Aug 16, 2016 at 12:58 PM, Andrey L. Neporada <
anepor...@yandex-team.ru> wrote:
>
> BTW, what is the target version of this KIP? Currently the inter-broker
> protocol version in the KIP is set to 0.11.0-IV0. Do we want to target 0.11
> or maybe somewhat earlier?
>

I suggest 0.10.1-IV0 for now. If we decide to name the next version
0.11.0.0, we can change it then.

Ismael


Re: [DISCUSS] KIP-74: Add Fetch Response Size Limit in Bytes

2016-08-16 Thread Andrey L. Neporada
Hi Jun!
Thanks for the feedback.

> On 15 Aug 2016, at 20:04, Jun Rao  wrote:
> 
> Hi, Andrey,
> 
> Thanks for the update to the wiki. Just a few more minor comments.
> 
> 1. "If *response_max_bytes* parameter is zero ("no limit"), the request is
> processed *exactly* as before." Instead of using 0, it seems it's more
> natural to use Int.MAX_INT to preserve the old behaviour.
> 
OK, done.

> 2. "For the first partition, the server always fetches at least
> *message.max.bytes*." To be more precise, the server only fetches more
> bytes than *response_max_bytes* (and up to *message.max.bytes*) if there is
> a message whose size is larger than *response_max_bytes*.
> 

Unfortunately, there is no easy way to obtain the size of the next message in 
ReplicaManager.fetchMessages() - you would need to issue an extra small read 
from the log to find it.
So, unless I am missing something important, I would like to keep the proposal 
(and algorithm) as is.
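
For reference, a rough sketch of the budgeting described in the proposal (all 
names are illustrative, not the broker code): the first partition may exceed 
the remaining budget by up to one max-sized message, so a single large message 
cannot stall the fetch, while later partitions are capped by whatever budget 
remains.

{noformat}
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the KIP-74 limiting idea; readableBytes() stands in
// for an actual log read and simply returns a fixed size here.
public class FetchBudgetSketch {

    static int readableBytes(String partition) {
        return 4096; // placeholder for the bytes available in this partition's log
    }

    static List<String> partitionsToInclude(List<String> partitions,
                                            int responseMaxBytes,
                                            int messageMaxBytes) {
        List<String> included = new ArrayList<>();
        int remaining = responseMaxBytes;
        for (int i = 0; i < partitions.size(); i++) {
            // First partition: always allow at least one max-sized message.
            int allowance = (i == 0) ? Math.max(remaining, messageMaxBytes)
                                     : remaining;
            if (allowance <= 0)
                break;
            int taken = Math.min(readableBytes(partitions.get(i)), allowance);
            remaining -= taken;
            included.add(partitions.get(i));
        }
        return included;
    }
}
{noformat}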

> 3. "The solution is to continue fetching from first empty partition in
> round-robin fashion or to perform random shuffle of partitions." We
> probably want to make it clearer by saying that this is for ordering the
> partitions in the fetch request.
> 
> 4. Just to make it clear. Could you include the new fetch request protocol
> in the wiki (e.g.
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-MetadataRequest(version1))
> and mark the new field?
> 

OK, done.

BTW, what is the target version of this KIP? Currently the inter-broker 
protocol version in the KIP is set to 0.11.0-IV0. Do we want to target 0.11 
or maybe somewhat earlier?


> Jun
> 

Thanks, 
Andrey.



[jira] [Created] (KAFKA-4045) Investigate feasibility of hooking into RocksDb's cache

2016-08-16 Thread Eno Thereska (JIRA)
Eno Thereska created KAFKA-4045:
---

 Summary: Investigate feasibility of hooking into RocksDb's cache
 Key: KAFKA-4045
 URL: https://issues.apache.org/jira/browse/KAFKA-4045
 Project: Kafka
  Issue Type: Sub-task
Reporter: Eno Thereska
Assignee: Damian Guy


Ideally we could hook a listener into RocksDb's cache so that when entries 
are flushed or evicted from the cache the listener is called (and can 
subsequently perform Kafka Streams-specific functions, like forwarding a 
record downstream). That way we don't build our own cache.
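
Purely as an illustration of the hook being asked about (this is not an 
existing RocksDb or Streams interface):

{noformat}
// Hypothetical listener shape -- not a real RocksDb API. If such a hook
// existed, Streams could forward records downstream on flush/eviction
// instead of maintaining its own cache.
public interface CacheEventListener<K, V> {
    void onEvict(K key, V value);
    void onFlush(K key, V value);
}
{noformat}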



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-4045) Investigate feasibility of hooking into RocksDb's cache

2016-08-16 Thread Eno Thereska (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eno Thereska resolved KAFKA-4045.
-
Resolution: Fixed

Unfortunately hooking into RocksDb is not possible at this time. End of 
investigation.

> Investigate feasibility of hooking into RocksDb's cache
> ---
>
> Key: KAFKA-4045
> URL: https://issues.apache.org/jira/browse/KAFKA-4045
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Eno Thereska
>Assignee: Damian Guy
> Fix For: 0.10.1.0
>
>
> Ideally we could hook a listener into RocksDb's cache so that when entries 
> are flushed or evicted from the cache the listener is called (and can 
> subsequently perform Kafka Streams-specific functions, like forwarding a 
> record downstream). That way we don't build our own cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Do not log value of configs that Kafka doesn't recognize

2016-08-16 Thread Jaikiran Pai
We are using version 0.9.0.1 of the Kafka (Java) libraries for our Kafka 
consumers and producers. In one of our consumers, our consumer config had an 
SSL-specific property which ended up being used against a non-SSL Kafka 
broker port. As a result, the logs ended up with messages like:


17:53:33,722  WARN [o.a.k.c.c.ConsumerConfig] - The configuration 
*ssl.truststore.password = foobar* was supplied but isn't a known config.


The log message is fine and makes sense, but can Kafka please not log 
the values of the properties and instead just include the config name 
which it considers unknown? That way it won't end up logging these 
potentially sensitive values. I understand that only those with access 
to these log files can end up seeing these values, but even then some of 
our internal processes forbid logging such sensitive information. This 
log message will still be useful if only the config name is logged 
without the value.
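
A minimal sketch of the suggested change, with invented names rather than the 
actual config code: log only the key of an unrecognized entry, never its 
value.

{noformat}
import java.util.Map;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class UnknownConfigLogging {
    private static final Logger log =
            LoggerFactory.getLogger(UnknownConfigLogging.class);

    // Log only the names of unrecognized entries so secrets such as
    // ssl.truststore.password never reach the log files.
    static void warnUnknown(Map<String, ?> unknownConfigs) {
        for (String name : unknownConfigs.keySet()) {
            log.warn("The configuration '{}' was supplied but isn't a known config.",
                    name);
        }
    }
}
{noformat}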


Can I add this as a JIRA and provide a patch?

-Jaikiran


[GitHub] kafka pull request #1742: KAFKA-4015: Add new cleanup.policy, compact_and_de...

2016-08-16 Thread dguy
GitHub user dguy opened a pull request:

https://github.com/apache/kafka/pull/1742

KAFKA-4015: Add new cleanup.policy, compact_and_delete

Added the compact_and_delete cleanup.policy to LogConfig.
Updated LogCleaner.CleanerThread to also run deletion for any topics 
configured with compact_and_delete.
Ensured Log.deleteSegments only runs when delete is enabled.
Added integration and unit tests to cover the new option.
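
Assuming the policy name proposed in this PR, enabling it as a topic-level 
override might look like this (subject to change before the PR lands):

{noformat}
# Hypothetical usage, assuming the PR's proposed policy value: compact by key,
# but also delete segments that fall outside the retention window.
bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics --entity-name my-topic \
  --add-config cleanup.policy=compact_and_delete
{noformat}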

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dguy/kafka kafka-4015

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1742.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1742


commit 889f5b8cc763cd488567c6034d8c25be28596ee1
Author: Damian Guy 
Date:   2016-08-05T10:27:12Z

enable cleanup.policy=compact_delete




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4015) Add new cleanup.policy, compact_and_delete

2016-08-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15422445#comment-15422445
 ] 

ASF GitHub Bot commented on KAFKA-4015:
---

GitHub user dguy opened a pull request:

https://github.com/apache/kafka/pull/1742

KAFKA-4015: Add new cleanup.policy, compact_and_delete

Added the compact_and_delete cleanup.policy to LogConfig.
Updated LogCleaner.CleanerThread to also run deletion for any topics 
configured with compact_and_delete.
Ensured Log.deleteSegments only runs when delete is enabled.
Added integration and unit tests to cover the new option.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dguy/kafka kafka-4015

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1742.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1742


commit 889f5b8cc763cd488567c6034d8c25be28596ee1
Author: Damian Guy 
Date:   2016-08-05T10:27:12Z

enable cleanup.policy=compact_delete




> Add new cleanup.policy, compact_and_delete
> --
>
> Key: KAFKA-4015
> URL: https://issues.apache.org/jira/browse/KAFKA-4015
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 0.10.1.0
>Reporter: Damian Guy
>Assignee: Damian Guy
> Fix For: 0.10.1.0
>
>
> There are some use cases where it is desirable to have a topic that supports 
> both compact and delete policies, i.e., any topic that wants to be compacted 
> by key, but also wants keys that haven't been updated for some time to be 
> automatically expired.
> Add a new compact_and_delete option to cleanup.policy. When set, both compact 
> and delete cleanup strategies should run. This change needs to guarantee 
> thread-safety.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)