Vikas,

"Kafka server is started with default properties except the log retention period being 15 minutes"

This is a very aggressive log retention setting on the Kafka side, so you might be running into "Got fetch request with offset out of range" because of it.

"Too many failed messages at spout. I assumed that initially when topology starts, because of initialization latency, there might be few thousands of messages which fail, however, it seems that this behavior is not limited to initialization and messages fails quite often and very rarely I am seeing that there is no failed message in last 10 minutes. :)"

Have you seen any errors in the worker logs? "Failed messages at the spout" is a bit confusing: it might be that your bolts are failing and the spout is receiving a "fail" acknowledgement from the bolts.

"Every time I submit my topology, it takes more than 10 minutes to reach messages to the first bolt. First spout tries to accumulate message (which too many failed messages) for first few minutes (10 mins or so)"

This seems strange. How many partitions does your topic have, and what is the parallelism on the spout?

-Harsha

On Tue, Sep 2, 2014, at 10:22 PM, Vikas Agarwal wrote:

Hi,

I am not sure if this mailing list is the correct place for this; however, I decided to ask here, assuming that many Storm cluster installations use Kafka as their spout.

I have set the following properties for the Kafka spout:

kafkaConfig.bufferSizeBytes = 1024 * 1024 * 4;
kafkaConfig.fetchSizeBytes = 1024 * 1024 * 4;
kafkaConfig.forceFromStart = true|false; (tried both, true and false)

The Kafka server is started with default properties except the log retention period being 15 minutes.
And the Storm configuration is as described in Michael Noll's blog [1]:

conf.put(Config.TOPOLOGY_RECEIVER_BUFFER_SIZE, 8);
conf.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 32);
conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384);
conf.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 16384);
topology.max.spout.pending = 10000

I am using the Hortonworks distribution for installing the Hadoop ecosystem. We are consuming the Twitter stream and pushing the tweets to a Kafka topic, and then a Storm topology tries to consume those tweets using KafkaSpout with the configuration described above. We are using the Twitter filter stream with many filter keywords, so the input rate is quite high (not as high as with the firehose, but still very high) and varies a lot depending on the time of day and on whether any of the keywords used as a track filter goes viral on a particular day.

Now I am facing 3 major issues with my topology (which contains 3 bolts after the Kafka spout):

1) Too many failed messages at the spout. I assumed that when the topology starts, a few thousand messages might fail because of initialization latency. However, this behavior is not limited to initialization: messages fail quite often, and only very rarely do I see no failed messages in the last 10 minutes. :)

2) After a while the Kafka spout begins to throw the "Got fetch request with offset out of range" error message continuously and never picks up any message from the Kafka topic, while the stream collector is still able to push messages to the topic.

3) Every time I submit my topology, it takes more than 10 minutes for messages to reach the first bolt. First the spout accumulates messages (with too many failed messages) for the first few minutes (10 mins or so), then each bolt starts accumulating messages sequentially, and after 15-20 min every bolt in the topology has some messages to process.
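To illustrate how a 15-minute retention period could lead to the "offset out of range" error in issue 2, here is a minimal Java sketch. It is only a rough model: it assumes a steady produce rate and purely time-based deletion, whereas Kafka actually deletes whole log segments. The class name, the produce rate, and the offsets are hypothetical values chosen for illustration; only the 15-minute retention comes from this thread.

```java
// Illustrative sketch only: the produce rate and offsets are made-up numbers.
final class RetentionCheck {

    /** Oldest offset still on the broker, assuming a steady produce rate and purely time-based retention. */
    static long earliestRetainedOffset(long latestOffset, long messagesPerSecond, long retentionSeconds) {
        long retained = messagesPerSecond * retentionSeconds;
        return Math.max(0L, latestOffset - retained);
    }

    /** True if the offset the spout wants to fetch next has already been deleted. */
    static boolean isOutOfRange(long nextOffsetToFetch, long earliestRetained) {
        return nextOffsetToFetch < earliestRetained;
    }

    public static void main(String[] args) {
        long retentionSeconds = 15 * 60;   // 15-minute retention, as stated in the thread
        long messagesPerSecond = 500;      // hypothetical tweet rate
        long latestOffset = 2_000_000;     // hypothetical newest offset in the partition
        long spoutOffset = 1_400_000;      // hypothetical: the spout is 600,000 messages behind

        long earliest = earliestRetainedOffset(latestOffset, messagesPerSecond, retentionSeconds);
        System.out.println("earliest retained offset = " + earliest);
        System.out.println("spout offset out of range = " + isOutOfRange(spoutOffset, earliest));
    }
}
```

Under these assumed numbers the broker keeps only about 450,000 messages, so a spout that has fallen 600,000 messages behind would be asking for an offset that no longer exists.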
I am not able to understand why a message that has been processed by the spout is not delivered to the next bolt immediately. I guess the message buffers described in Michael Noll's blog are responsible for this, but changing the buffer sizes did not change the behavior.

--
Regards,
Vikas Agarwal
91 – 9928301411

InfoObjects, Inc.
Execution Matters
http://www.infoobjects.com [2]
2041 Mission College Boulevard, #280
Santa Clara, CA 95054
+1 (408) 988-2000 Work
+1 (408) 716-2726 Fax

References

1. http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/
2. http://www.infoobjects.com/
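One possible cause of the failed messages at the spout, if the bolts cannot keep up, is tuples timing out before they are fully acked. The Java sketch below gives a rough estimate using Little's law (average latency = tuples in flight / completion rate). It assumes the message timeout is left at Storm's default of 30 seconds (topology.message.timeout.secs); the class name and the throughput figure are hypothetical, and only the pending limit of 10000 comes from this thread.

```java
// Illustrative sketch only: the throughput is a made-up number.
final class PendingTimeoutEstimate {

    /** Average seconds a tuple spends in flight when the pending limit is saturated (Little's law). */
    static double averageSecondsInFlight(int maxSpoutPending, double ackedTuplesPerSecond) {
        return maxSpoutPending / ackedTuplesPerSecond;
    }

    /** True if the estimated time in flight exceeds the message timeout, so tuples would be failed and replayed. */
    static boolean likelyToTimeOut(int maxSpoutPending, double ackedTuplesPerSecond, int messageTimeoutSecs) {
        return averageSecondsInFlight(maxSpoutPending, ackedTuplesPerSecond) > messageTimeoutSecs;
    }

    public static void main(String[] args) {
        int maxSpoutPending = 10000;          // topology.max.spout.pending, as stated in the thread
        int messageTimeoutSecs = 30;          // assumed: Storm default, not overridden
        double ackedTuplesPerSecond = 200.0;  // hypothetical rate at which the bolts fully ack tuples

        System.out.println("average seconds in flight = "
                + averageSecondsInFlight(maxSpoutPending, ackedTuplesPerSecond));
        System.out.println("likely to time out = "
                + likelyToTimeOut(maxSpoutPending, ackedTuplesPerSecond, messageTimeoutSecs));
    }
}
```

If the real numbers look like this, lowering topology.max.spout.pending or raising the message timeout would be the usual things to try; the worker logs should show whether the failures are timeouts or explicit fails from a bolt.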
