The metadata request is sent to the broker, which will read from ZK. I suggest that you turn on trace-level logging for the class kafka.network.RequestChannel$ in all brokers. The log will tell you how long each metadata request takes on the broker. You can then set your socket timeout in the producer accordingly.
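To make that concrete: in the stock config/log4j.properties that ships
with the 0.8 broker (layout assumed here; requestAppender is already
defined in that file), the change is a one-line edit on each broker:

    # Raise the request-channel logger from WARN to TRACE so every
    # completed request is logged with its timing breakdown.
    log4j.logger.kafka.network.RequestChannel$=TRACE, requestAppender
    log4j.additivity.kafka.network.RequestChannel$=false

After a restart, grep the request log for the metadata requests and read
off each one's total time; the socket timeout should comfortably exceed
the slowest of them.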
Thanks,

Jun

On Fri, Mar 22, 2013 at 9:38 AM, Bob Jervis <bjer...@gmail.com> wrote:

> I've tried this and it appears that we are still seeing the issue. Here
> is a stack trace of one of the socket timeout exceptions we are seeing
> (we converted to the SimpleConsumer):
>
> 2013-03-22 04:54:51,807 INFO kafka.client.ClientUtils$: Fetching metadata
> for topic Set(v1-japanese-0, v1-indonesian-5, v1-french-3, v1-other-4,
> v1-portuguese-1, v1-other-1, v1-german-2, v1-english-15, v1-english-8,
> v1-portuguese-6, v1-arabic-0, v1-english-6, v1-korean-1, v1-spanish-5,
> v1-english-10, v1-japanese-4, v1-english-1, v1-italian-3, v1-spanish-1,
> v1-english-5, v1-other-7, v1-portuguese-2, v1-other-0, v1-indonesian-4,
> v1-english-9, v1-japanese-1, v1-spanish-2, v1-portuguese-7, v1-german-3,
> v1-arabic-1, v1-japanese-7, v1-spanish-4, v1-other-3, v1-japanese-3,
> v1-english-0, v1-french-1, v1-indonesian-0, v1-arabic-2, v1-english-4,
> v1-french-0, v1-indonesian-7, v1-english-12, v1-spanish-3, v1-japanese-6,
> v1-indonesian-3, v1-english-13, v1-other-6, v1-portuguese-3, v1-italian-1,
> v1-english-7, v1-german-0, v1-korean-3, v1-spanish-7, v1-farsi-2,
> v1-russian-1, v1-indonesian-1, v1-portuguese-0, v1-indonesian-6,
> v1-french-2, v1-english-11, v1-japanese-2, v1-other-2, v1-indonesian-2,
> v1-english-14, v1-portuguese-5, v1-spanish-6, v1-german-1, v1-italian-0,
> v1-english-2, v1-japanese-5, v1-portuguese-4, v1-spanish-0, v1-english-3,
> v1-arabic-3, v1-other-5, v1-korean-2)
> 2013-03-22 04:54:51,808 INFO kafka.producer.SyncProducer: Connected to
> kafka01.qa.viq:9092 for producing
> 2013-03-22 04:54:51,983 INFO kafka.producer.SyncProducer: Disconnecting
> from kafka01.qa.viq:9092
> 2013-03-22 04:54:51,987 INFO kafka.producer.SyncProducer: Connected to
> 100.100.106.91:9092 for producing
> 2013-03-22 14:41:42,650 INFO kafka.consumer.SimpleConsumer: Reconnect due
> to socket error:
> java.net.SocketTimeoutException
>     at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:201)
>     at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:86)
>     at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:221)
>     at kafka.utils.Utils$.read(Utils.scala:372)
>     at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>     at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
>     at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
>     at kafka.network.BlockingChannel.receive(BlockingChannel.scala:100)
>     at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:124)
>     at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:122)
>     at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:161)
>     at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:161)
>     at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:161)
>     at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>     at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:160)
>     at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:160)
>     at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:160)
>     at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>     at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:159)
>     at kafka.javaapi.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:48)
>     at kafka.javaapi.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:58)
>     at com.visibletechnologies.platform.common.kafka.KafkaReader.initializeIterator(KafkaReader.java:231)
>     at com.visibletechnologies.platform.common.kafka.KafkaReader.read(KafkaReader.java:181)
>     at com.visibletechnologies.platform.katta.krouter.KRouter.doWork(KRouter.java:117)
>     at com.visibletechnologies.framework.servicebase.ServiceBase.start(ServiceBase.java:187)
>     at com.visibletechnologies.platform.katta.krouter.Main.main(Main.java:132)
>
> I don't see any Zookeeper interactions here. It looks like fetch requests
> to the Kafka broker, but maybe I'm missing something.
>
>
> On Thu, Mar 21, 2013 at 9:16 PM, Jun Rao <jun...@gmail.com> wrote:
>
>> Bob,
>>
>> Currently, the metadata request needs to do at least one ZK read per
>> partition, so the more topics/partitions you have, the longer the
>> request takes. You need to increase the request timeout accordingly.
>> Try something like 60 * 1000 ms.
>>
>> Thanks,
>>
>> Jun
>>
>> On Thu, Mar 21, 2013 at 12:46 PM, Bob Jervis <bjer...@gmail.com> wrote:
>>
>>> We are seeing horrible problems. We cannot move data through our 0.8
>>> broker because we are getting socket timeout exceptions, and I cannot
>>> figure out what the settings should be. The fetch-metadata requests
>>> are throwing these exceptions, and no matter how I tweak the timeouts,
>>> I still get timeouts and no progress moving data.
>>>
>>> On test environments where there are only 12 topics there are no
>>> problems.
>>>
>>> When the number of topics goes to ~75, we can't move anything because
>>> the fetch metadata requests time out.
>>>
>>> What can we do to fix this? I am desperate.
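To make the two timeout knobs from this thread concrete, here is a
minimal, untested Java sketch against the 0.8 client API. The broker
host, topic, and clientId are placeholders taken from the logs above,
not a verified configuration:

    import kafka.api.FetchRequest;
    import kafka.api.FetchRequestBuilder;
    import kafka.javaapi.FetchResponse;
    import kafka.javaapi.consumer.SimpleConsumer;

    public class FetchWithLongerTimeout {
        public static void main(String[] args) {
            // The third constructor argument is the socket timeout in
            // ms; it should comfortably exceed the slowest request time
            // observed in the broker's request log.
            SimpleConsumer consumer = new SimpleConsumer(
                    "kafka01.qa.viq", 9092, // placeholder broker
                    60 * 1000,              // soTimeout: 60s, per Jun
                    64 * 1024,              // receive buffer, in bytes
                    "krouter");             // placeholder clientId
            try {
                FetchRequest req = new FetchRequestBuilder()
                        .clientId("krouter")
                        .addFetch("v1-english-0", 0, 0L, 100 * 1024)
                        .build();
                FetchResponse resp = consumer.fetch(req);
                System.out.println("fetch hasError=" + resp.hasError());
            } finally {
                consumer.close();
            }
        }
    }

On the producer side, the analogous knob is request.timeout.ms in the
producer Properties (the 0.8 name; as far as I can tell it also bounds
the socket read for the metadata request), e.g.
props.put("request.timeout.ms", "60000").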