Re: [VOTE] 0.10.0.0 RC4

2016-05-16 Thread Gwen Shapira
Thanks, man! Good to see Heroku being good friends to the Kafka community by testing new releases, reporting issues and following up with the cause and a documentation PR. With this out of the way, we have closed all known blockers for 0.10.0.0. I'll roll out a new RC tomorrow morning. Gwen On Sun,

Re: [VOTE] 0.10.0.0 RC4

2016-05-15 Thread Tom Crayford
https://github.com/apache/kafka/pull/1389 On Sun, May 15, 2016 at 9:22 PM, Ismael Juma wrote: > Hi Tom, > > Great to hear that the failure testing scenario went well. :) > > Your suggested improvement sounds good to me and a PR would be great. For > this kind of change, you

Re: [VOTE] 0.10.0.0 RC4

2016-05-15 Thread Ismael Juma
Hi Tom, Great to hear that the failure testing scenario went well. :) Your suggested improvement sounds good to me and a PR would be great. For this kind of change, you can skip the JIRA, just prefix the PR title with `MINOR:`. Thanks, Ismael On Sun, May 15, 2016 at 9:17 PM, Tom Crayford

Re: [VOTE] 0.10.0.0 RC4

2016-05-15 Thread Tom Crayford
How about this? Note: Due to the additional timestamp introduced in each message (8 bytes of data), producers sending small messages may see a message throughput degradation because of the increased overhead. Likewise, replication now transmits an additional 8 bytes per message. If
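To make the proposed note concrete, here is a rough back-of-the-envelope sketch of what 8 extra bytes per message means for small messages. Only the 8-byte timestamp comes from the thread; the message sizes and the 1 Gbit/s link are hypothetical values chosen for illustration, and the model ignores batching, compression, and TCP/TLS framing.

```python
# Back-of-the-envelope estimate only; not a benchmark.
TIMESTAMP_BYTES = 8  # per-message timestamp added by the new message format


def relative_overhead(bytes_per_message_before: int) -> float:
    """Fractional growth in bytes on the wire when 8 bytes are added per message.

    `bytes_per_message_before` is an assumed total on-the-wire size of one
    message under the old format (payload plus existing per-message headers).
    """
    if bytes_per_message_before <= 0:
        raise ValueError("message size must be positive")
    return TIMESTAMP_BYTES / bytes_per_message_before


def max_messages_per_sec(link_bytes_per_sec: float, bytes_per_message: int) -> float:
    """Upper bound on message rate if the network link is the only limit."""
    if bytes_per_message <= 0:
        raise ValueError("message size must be positive")
    return link_bytes_per_sec / bytes_per_message


if __name__ == "__main__":
    link = 125_000_000  # hypothetical 1 Gbit/s NIC, in bytes per second
    for size in (100, 1000):  # hypothetical old-format message sizes in bytes
        before = max_messages_per_sec(link, size)
        after = max_messages_per_sec(link, size + TIMESTAMP_BYTES)
        print(f"{size} B: +{relative_overhead(size):.1%} bytes, "
              f"{before:,.0f} -> {after:,.0f} msgs/sec at link saturation")
```

Under these assumptions the extra bytes matter mostly for small messages on a link that is already close to saturation; for larger messages the relative cost shrinks proportionally.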

Re: [VOTE] 0.10.0.0 RC4

2016-05-15 Thread Ismael Juma
Hi Tom, Thanks for the update and for all the testing you have done! No worries about the chase here, I'd much rather have false positives by people who are validating the releases than false negatives because people don't validate the releases. :) The upgrade note we currently have follows:

Re: [VOTE] 0.10.0.0 RC4

2016-05-15 Thread Tom Crayford
I've been digging into this some more. It seems like this may have been an issue with benchmarks maxing out the network card - under 0.10.0.0-RC the slight additional bandwidth per message seems to have pushed the broker's NIC into overload territory where it starts dropping packets (verified

Re: [VOTE] 0.10.0.0 RC4

2016-05-13 Thread Gwen Shapira
Also, perhaps sharing the broker configuration? Maybe this will provide some hints... On Fri, May 13, 2016 at 5:31 PM, Ismael Juma wrote: > Thanks Tom. I just wanted to share that I have been unable to reproduce > this so far. Please feel free to share whatever information

Re: [VOTE] 0.10.0.0 RC4

2016-05-13 Thread Ismael Juma
Thanks Tom. I just wanted to share that I have been unable to reproduce this so far. Please feel free to share whatever information you have so far when you have a chance; don't feel that you need to have all the answers. Ismael On Fri, May 13, 2016 at 7:32 PM, Tom Crayford

Re: [VOTE] 0.10.0.0 RC4

2016-05-13 Thread Becket Qin
Gwen, The version we are currently running in production is the trunk on Feb 24. Which has KAFKA-3025. Our release test cluster has been running this version for about two months, I haven't seen throughput issues so far. But we are probably not running at the max capacity of the brokers. I will

Re: [VOTE] 0.10.0.0 RC4

2016-05-13 Thread Gwen Shapira
Becket, Did you try deploying one of the 0.10.0 candidates at LinkedIn? Did you see this issue? Gwen On Fri, May 13, 2016 at 10:30 AM, Becket Qin wrote: > Tom, > > Maybe it was mentioned and I missed it. I am wondering if you see performance > degradation on the consumer side

Re: [VOTE] 0.10.0.0 RC4

2016-05-13 Thread Gwen Shapira
Hi, We (Ismael, Magnus and Jun) are also trying to reproduce and figure it out on our side. Will keep you posted. Gwen On Fri, May 13, 2016 at 11:32 AM, Tom Crayford wrote: > I've been investigating this pretty hard since I first noticed it. Right > now I have more

Re: [VOTE] 0.10.0.0 RC4

2016-05-13 Thread Tom Crayford
I've been investigating this pretty hard since I first noticed it. Right now I have more avenues for investigation than I can shake a stick at, and am also dealing with several other things in flight/on fire. I'll respond when I have more information and can confirm things. On Fri, May 13, 2016

Re: [VOTE] 0.10.0.0 RC4

2016-05-13 Thread Becket Qin
Tom, Maybe it was mentioned and I missed it. I am wondering if you see performance degradation on the consumer side when TLS is used? This could help us understand whether the issue is only producer related or TLS in general. Thanks, Jiangjie (Becket) Qin On Fri, May 13, 2016 at 6:19 AM, Tom

Re: [VOTE] 0.10.0.0 RC4

2016-05-13 Thread Tom Crayford
Ismael, Thanks. I'm writing up an issue with some new findings since yesterday right now. Thanks Tom On Fri, May 13, 2016 at 1:06 PM, Ismael Juma wrote: > Hi Tom, > > That's because JIRA is in lockdown due to excessive spam. I have added you > as a contributor in JIRA and

Re: [VOTE] 0.10.0.0 RC4

2016-05-13 Thread Ismael Juma
Hi Tom, That's because JIRA is in lockdown due to excessive spam. I have added you as a contributor in JIRA and you should be able to file a ticket now. Thanks, Ismael On Fri, May 13, 2016 at 12:17 PM, Tom Crayford wrote: > Ok, I don't seem to be able to file a new Jira

Re: [VOTE] 0.10.0.0 RC4

2016-05-13 Thread Tom Crayford
Ok, I don't seem to be able to file a new Jira issue at all. Can somebody check my permissions on Jira? My user is `tcrayford-heroku` Tom Crayford Heroku Kafka On Fri, May 13, 2016 at 12:24 AM, Jun Rao wrote: > Tom, > > We don't have a CSV metrics reporter in the producer

Re: [VOTE] 0.10.0.0 RC4

2016-05-12 Thread Jun Rao
Tom, We don't have a CSV metrics reporter in the producer right now. The metrics will be available in jmx. You can find out the details in http://kafka.apache.org/documentation.html#new_producer_monitoring Thanks, Jun On Thu, May 12, 2016 at 3:08 PM, Tom Crayford wrote:

Re: [VOTE] 0.10.0.0 RC4

2016-05-12 Thread Tom Crayford
Yep, I can try those particular commits tomorrow. Before I try a bisect, I'm going to replicate with a smaller-scale perf test that is less intensive to iterate on. Jun, inline: On Thursday, 12 May 2016, Jun Rao wrote: > Tom, > > Thanks for reporting this. A few quick comments. > >

Re: [VOTE] 0.10.0.0 RC4

2016-05-12 Thread Jun Rao
Tom, Thanks for reporting this. A few quick comments. 1. Did you send the right command for producer-perf? The command limits the throughput to 100 msgs/sec. So, not sure how a single producer can get 75K msgs/sec. 2. Could you collect some stats (e.g. average batch size) in the producer and

Re: [VOTE] 0.10.0.0 RC4

2016-05-12 Thread Ismael Juma
Hi Tom, This is puzzling because, as you said, not much has changed in the TLS code since 0.9.0.1. A JIRA sounds good. I was going to ask if you could test the commit before/after KAFKA-3025, but I see that Gwen has already done that. :) Ismael On Thu, May 12, 2016 at 9:26 PM, Tom Crayford

Re: [VOTE] 0.10.0.0 RC4

2016-05-12 Thread Gwen Shapira
I know it is a big ask, but can you try bisecting? For example, test before/after on commits: * 45c8195 KAFKA-3025; Added timetamp to Message and use relative offset. * 5b375d7 KAFKA-3149; Extend SASL implementation to support more mechanisms * 69d9a66 KAFKA-3618; Handle ApiVersionsRequest before

Re: [VOTE] 0.10.0.0 RC4

2016-05-12 Thread Tom Crayford
Yep, confirm. On Thu, May 12, 2016 at 9:37 PM, Gwen Shapira wrote: > Just to confirm: > You tested both versions with plain text and saw no performance drop? > > > On Thu, May 12, 2016 at 1:26 PM, Tom Crayford > wrote: > > We've started running our

Re: [VOTE] 0.10.0.0 RC4

2016-05-12 Thread Gwen Shapira
Just to confirm: You tested both versions with plain text and saw no performance drop? On Thu, May 12, 2016 at 1:26 PM, Tom Crayford wrote: > We've started running our usual suite of performance tests against Kafka > 0.10.0.0 RC. These tests orchestrate multiple

Re: [VOTE] 0.10.0.0 RC4

2016-05-12 Thread Tom Crayford
We've started running our usual suite of performance tests against Kafka 0.10.0.0 RC. These tests orchestrate multiple consumer/producer machines to run a fairly normal mixed workload of producers and consumers (each producer/consumer is just an instance of Kafka's built-in consumer/producer perf

Re: [VOTE] 0.10.0.0 RC4

2016-05-10 Thread Gwen Shapira
Thanks for finding and reporting, Liquan. I'll wait a day or two for more testing and roll out a new RC. In other news: We keep running into last minute issues with our shell scripts, because we have zero automated testing for them. Contribution of automated tests for our scripts will be super

Re: [VOTE] 0.10.0.0 RC4

2016-05-10 Thread Liquan Pei
We found a blocking issue in the release: https://issues.apache.org/jira/browse/KAFKA-3692. This may cause the external CLASSPATH not to be included in the final CLASSPATH in kafka-run-class.sh. There is no easy workaround for this, and we need a new RC. Thanks, Liquan On Mon, May 9, 2016 at 6:49 PM,

[VOTE] 0.10.0.0 RC4

2016-05-09 Thread Gwen Shapira
Hello Kafka users, developers and client-developers, This is the first candidate for release of Apache Kafka 0.10.0.0. This is a major release that includes: (1) New message format including timestamps (2) client interceptor API (3) Kafka Streams. Since this is a major release, we will give