Ok... I don't believe my test results, but I have repeated them.

I found that build 99cfb99 (Mar 22) worked for me -- KafkaProducer didn't hang/time out. The next-newer build, 73470b0, did hang, and KafkaProducer send() calls would time out. The only differences between the two builds were documentation changes and a tiny config file change. So I started making the changes manually, one at a time, until I found the issue... and I can't believe it.

Build 73470b0 removed two lines in config/server.properties:

    #advertised.host.name=<hostname routable by clients>
    #advertised.port=<port accessible by clients>

(It also commented out a listeners config, but turning this on/off made no difference.)

Note: THE LINES ARE COMMENTED OUT! I flipped this a few times in disbelief but the results were the same.

My process:

1) ./gradlew clean
2) ./gradlew -PscalaVersion=2.11 releaseTarGz -x signArchives
3) (build a Docker image with the tgz file just like in spotify/kafka)
4) ./gradlew -PscalaVersion=2.11 install_2_11
5) clean, update, then recompile my test code in sbt
6) run

This process is repeatable and I applied it consistently through all tests.

Not convinced yet? I went back to trunk, popped in those 2 commented-out lines, and rebuilt using the process above. Worked!

Email will probably mess up my formatting here, but here's the git diff on my 2-line change to trunk:

$ git diff
diff --git a/config/server.properties b/config/server.properties
index aebcb87..d9d17e8 100644
--- a/config/server.properties
+++ b/config/server.properties
@@ -29,6 +29,9 @@ broker.id=0
 # listeners = PLAINTEXT://your.host.name:9092
 #listeners=PLAINTEXT://:9092
 
+#advertised.host.name=<hostname routable by clients>
+#advertised.port=<port accessible by clients>
+
 # Hostname and port the broker will advertise to producers and consumers. If not set,
 # it uses the value for "listeners" if configured. Otherwise, it will use the value
 # returned from java.net.InetAddress.getCanonicalHostName().

I dunno what to say.
I'll keep flipping this and have a friend confirm the results. Is some code or script textually looking at this file (not parsing it) for either of these 2 things?

From: Greg Zoller <gwzol...@yahoo.com.INVALID>
To: "users@kafka.apache.org" <users@kafka.apache.org>
Sent: Saturday, April 16, 2016 8:17 PM
Subject: Re: Producer Bug? Moving form 0.9.0.1 to 0.10.1.0-SNAPSHOT (latest)

Well, I've made progress but I don't understand the results.

Build 99cfb99 (Mar 22): KafkaProducer works (built it twice separately to confirm).
Build 73470b0 (next one): does not, but... this commit is pretty much just comment changes! I can't see anything that would change actual behavior.

I'm going to play with these results more next week because it doesn't make any sense. I was methodical to keep a clean environment but clearly something's not right... comments don't break functionality.

I'll try building the commit you mentioned and the one just before and see what I get.

Greg
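
On the question raised at the top of the thread, whether something might be looking at config/server.properties textually rather than parsing it: the sketch below shows how that could make commented-out lines matter. It is only an illustration in Python, not code taken from Kafka or from the spotify/kafka image. It assumes a hypothetical container start-up step that uncomments the advertised.* placeholders with a sed-style substitution; the names parse_properties and uncomment_advertised and the value docker.local are made up for the example.

```python
import re

# server.properties as it looked while the commented placeholders were present.
WITH_PLACEHOLDERS = """\
broker.id=0
#listeners=PLAINTEXT://:9092
#advertised.host.name=<hostname routable by clients>
#advertised.port=<port accessible by clients>
"""

# The same file after the two commented lines were removed.
WITHOUT_PLACEHOLDERS = """\
broker.id=0
#listeners=PLAINTEXT://:9092
"""


def parse_properties(text):
    """Simplified properties parsing: blank lines and '#' lines are ignored."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def uncomment_advertised(text, host, port):
    """Hypothetical sed-style rewrite (an assumption, not a known script):
    replace the commented placeholders with real settings."""
    text = re.sub(r"(?m)^#(advertised\.host\.name)=.*$",
                  lambda m: f"{m.group(1)}={host}", text)
    text = re.sub(r"(?m)^#(advertised\.port)=.*$",
                  lambda m: f"{m.group(1)}={port}", text)
    return text


if __name__ == "__main__":
    samples = [("with placeholders", WITH_PLACEHOLDERS),
               ("without placeholders", WITHOUT_PLACEHOLDERS)]
    for name, original in samples:
        rewritten = uncomment_advertised(original, "docker.local", 9092)
        print(name, "->", parse_properties(rewritten))
```

To a parser the two files are identical, since both yield only broker.id. The textual rewrite, however, has nothing to match once the placeholders are gone, so the advertised host and port would silently stay unset. Per the comment in server.properties, the broker would then fall back to listeners or to getCanonicalHostName(), which inside a container may not be reachable by a client running outside Docker. Whether the image's start script actually does something like this would need to be checked.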

From: Ismael Juma <ism...@juma.me.uk>
To: users@kafka.apache.org; Greg Zoller <gwzol...@yahoo.com>
Sent: Saturday, April 16, 2016 6:47 PM
Subject: Re: Producer Bug? Moving form 0.9.0.1 to 0.10.1.0-SNAPSHOT (latest)

That's interesting, thanks for investigating this. I had a look at the commits since that date, and one that changed a related part of the code was:

https://github.com/apache/kafka/commit/1fbe445dde71df0023a978c5e54dd229d3d23e1b

Maybe you could try testing with that commit and seeing if it breaks at that point.

Ismael

On Sun, Apr 17, 2016 at 12:02 AM, Greg Zoller <gwzol...@yahoo.com.invalid> wrote:

> I think I'm going to take a brute-force approach...
> I went back and built 702d560 (Mar 21), the first build for
> 0.10.1.0-SNAPSHOT.
> This worked! No issues with KafkaProducer and initial, post-population
> offsets were set appropriately. My send() callback returned no exceptions
> and reported valid metadata.
> Now the fun part... I'm going to try to zero in to find the build, and
> therefore the change, that broke this functionality. More updates to
> follow...
> Greg
>
> From: Greg Zoller <gwzol...@yahoo.com.INVALID>
> To: "users@kafka.apache.org" <users@kafka.apache.org>; Greg Zoller <gwzol...@yahoo.com>
> Sent: Saturday, April 16, 2016 3:32 PM
> Subject: Re: Producer Bug? Moving form 0.9.0.1 to 0.10.1.0-SNAPSHOT (latest)
>
> Not sure if this helps, but I also just tried adding this line immediately
> before I call my send() code:
>     p.partitionsFor("lowercaseStrings")
> where p is my KafkaProducer and lowercaseStrings is my topic.
> partitionsFor() returned immediately and successfully with expected values
> for partition information. Not 100% sure but from the KafkaProducer code
> it appears this pulls information from metadata, so if that's true I do
> have a meaningful connection to my server and getting the metadata.
>
> From: Greg Zoller <gwzol...@yahoo.com.INVALID>
> To: "users@kafka.apache.org" <users@kafka.apache.org>
> Sent: Saturday, April 16, 2016 3:02 PM
> Subject: Re: Producer Bug? Moving form 0.9.0.1 to 0.10.1.0-SNAPSHOT (latest)
>
> Hi, Ismael,
> Thank you for your help!
> The only difference in usage between the versions is that I flip my
> project dependency version from 0.9.0.1 to 0.10.1.0 and clean+test. Code
> is identical, including the Docker packaging.
> OK, so I'm looking at the logs and honestly I'm not 100% sure what I'm
> looking at. I didn't see anything that looked like an error but that
> doesn't mean it's not there.
> I'd be very grateful if you'd be willing to look. I've attached the
> compressed logs for just this one failed test and no more, so they're
> small. (attached)
> Greg
>
> From: Ismael Juma <isma...@gmail.com>
> To: users@kafka.apache.org
> Sent: Saturday, April 16, 2016 2:30 PM
> Subject: Re: Producer Bug? Moving form 0.9.0.1 to 0.10.1.0-SNAPSHOT (latest)
>
> Hi Greg,
>
> I asked if there were any errors in the broker logs because the metadata
> request sent by the producer timed out. No messages can be sent if the
> producer is unable to retrieve the topic metadata. If the broker has
> started without errors, then the producer is unable to reach it for some
> reason (network setup or config problems are common reasons).
>
> I know you said this works for 0.9.0.1, but are you sure there is no
> other difference apart from the version? It may be worth trying it without
> using Docker.
>
> Ismael
>
> 0.10.1.0-SNAPSHOT is the broker too
>
> My procedure is the same as 0.9.0.1. I do a signed tgz build and then a
> local maven install. I unpack the tgz into a docker just like
> spotify/kafka. I run that as my server broker with ports 9092 and 2181
> exposed.
>
> I link my producer code in the gist to my local maven snapshot lib (not in
> a Docker)
>
> Same process works for the 0.9 version
>
> Sent from my iPhone
>
> > On Apr 15, 2016, at 7:17 PM, Ismael Juma <ism...@juma.me.uk> wrote:
> >
> > Hi Greg,
> >
> > What is the broker version? Have you checked that there are no errors
> > logged in the broker?
> >
> > Ismael
> >
> > On Fri, Apr 15, 2016 at 11:55 PM, Greg Zoller <gwzol...@yahoo.com.invalid> wrote:
> >
> >> ok... one more attempt for a working link...
> >> https://gist.github.com/gzoller/145faef1fefc8acea212e87e06fc86e8
> >> (If this doesn't work please copy/paste link.)
> >>
> >> From: Greg Zoller <gwzol...@yahoo.com>
> >> To: "users@kafka.apache.org" <users@kafka.apache.org>; Greg Zoller <gwzol...@yahoo.com>
> >> Sent: Friday, April 15, 2016 5:53 PM
> >> Subject: Re: Producer Bug? Moving form 0.9.0.1 to 0.10.1.0-SNAPSHOT (latest)
> >>
> >> GIST: https://gist.github.com/gzoller/145faef1fefc8acea212e87e06fc86e8
> >>
> >> From: Greg Zoller <gwzol...@yahoo.com.INVALID>
> >> To: "users@kafka.apache.org" <users@kafka.apache.org>
> >> Sent: Friday, April 15, 2016 5:51 PM
> >> Subject: Producer Bug? Moving form 0.9.0.1 to 0.10.1.0-SNAPSHOT (latest)
> >>
> >> I have built 0.10.1.0-SNAPSHOT from scratch and used it with
> >> my KafkaProducer code from 0.9.0.1. It compiled just fine but when run it
> >> hangs (times out actually) on send(). I've created a gist below with a
> >> clip from the output in comments at the end of the file.
> >> Remember, this code worked flawlessly as-is with 0.9.0.1. Did a breaking
> >> change occur that requires this code to be modified, or is this a bug in
> >> 0.10.1.0-SNAPSHOT?