I was able to verify this on my setup. I'm still not able to figure out if KafkaSpout can write offset to Kafka instead of Zookeeper. Has anyone tried this? Anyway I can pass through consumer configs like "offsets.storage = kafka"?
I found a few reference that explicitly say that KafkaSpout will write to Zookeeper. http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.1/bk_user-guide/content/ch_storm-using-ingest-connectors.html "The Kafka spout stores its offsets in the same instance of Zookeeper used by Apache Storm." We have a Solr cloud using Zookeeper and now I'm planning to use it for Kafka+Storm. Our clusters will be rather small but I'd like to reduce zookeeper usage as much as possible. Has anyone here tried three applications using same Zookeeper? We would probably have 4-5 Solr cloud nodes, 3-4 kafka nodes, ~10 Storm nodes to begin with. Thanks, Srikanth On Thu, Apr 9, 2015 at 7:58 PM, Erik Weathers <[email protected]> wrote: > Srikanth, > > Yes, the KafkaSpout ensures "at least once" processing. > > As for whether the topology restart leads to reading from the last offset > committed, it is configurable. e.g., with the kafka-0.8 spout available > with storm-0.9.3+, you can set the startOffsetTime to force a different > behavior than "last committed offset": > > spoutConfig.startOffsetTime = kafka.api.OffsetRequest.EarliestTime(); > spoutConfig.startOffsetTime = kafka.api.OffsetRequest.LatestTime() > > - Erik > > On Thu, Apr 9, 2015 at 4:12 PM, Srikanth <[email protected]> wrote: > >> Hello, >> >> We have an uses case for Storm+Kafka where we need atleast once >> processing. >> >> If I understand correctly kafkaSpout >> i) Adds each offset Id to pending set when emitting a tuple. >> ii) Removes offset ID from pending set only when processing is >> successful downstream(ack()) >> iii) Does a commit() to zookeeper every stateUpdateIntervalMs. >> iv) Commit writes lowest pending offset to Zookeeper. >> >> So, when a topology restarts, it will start from the last write in step >> iv hence assuring atleast once processing. Is my understanding correct? >> >> Thanks, >> Srikanth >> > >
