We solved this offset sync issue by making our topology idempotent, which our use case allowed. Our Storm topology consumes documents from Kafka, writes them to Elasticsearch and inserts records into Cassandra. The topology can re-consume from the beginning of the queue, because the doc IDs and primary keys are chosen so that a replayed record simply overwrites the existing one with the same document.
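A minimal Python sketch of the overwrite-on-replay idea, assuming each message carries a natural key (the `source_id` field is hypothetical). The in-memory `IdempotentSink` stands in for the Elasticsearch index and the Cassandra table. In the real systems the derived value would be used as the document `_id` and as the primary key: indexing an existing `_id` in Elasticsearch replaces the document, and a Cassandra INSERT with an existing primary key overwrites the row.

```python
import hashlib
import json


def doc_id(record: dict) -> str:
    """Derive a stable ID from the record's natural key (hypothetical field 'source_id')."""
    return hashlib.sha1(record["source_id"].encode("utf-8")).hexdigest()


class IdempotentSink:
    """In-memory stand-in for an Elasticsearch index / Cassandra table keyed by ID."""

    def __init__(self) -> None:
        self.rows = {}

    def upsert(self, record: dict) -> None:
        # Writing under a deterministic key overwrites instead of duplicating.
        self.rows[doc_id(record)] = record


def consume(sink: IdempotentSink, messages) -> None:
    """Process raw JSON messages as the topology would."""
    for raw in messages:
        sink.upsert(json.loads(raw))


if __name__ == "__main__":
    msgs = [json.dumps({"source_id": "a-1", "body": "x"}),
            json.dumps({"source_id": "b-2", "body": "y"})]
    sink = IdempotentSink()
    consume(sink, msgs)
    consume(sink, msgs)  # replay from the beginning of the topic
    print(len(sink.rows))  # 2
```

Replaying the same messages leaves two rows, not four. A Kafka coordinate such as topic/partition/offset would also be stable across replays; the important property is only that the same input always maps to the same key.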
cheers,
/Manish

On Thu, Dec 21, 2017 at 1:23 PM, Stig Rohde Døssing <[email protected]> wrote:
> Hi Nasron,
>
> I don't believe there's currently a tool to help you migrate. We did it
> manually by writing a small utility that looked up the commit offsets in
> Storm's Zookeeper, opened a KafkaConsumer with the new consumer group id
> and committed the offsets for the appropriate partitions. We stopped our
> topologies, used this utility and redeployed with the new spout.
>
> Assuming there isn't already a tool for migration floating around
> somewhere, I think we could probably build some migration support into the
> storm-kafka-client spout. If the path to the old offsets in Storm's
> Zookeeper is given, we might be able to extract them and start up the new
> spout from there.
>
> 2017-12-19 21:59 GMT+01:00 Nasron Cheong <[email protected]>:
>
>> Hi,
>>
>> I'm trying to determine steps for migration to the storm-kafka-client in
>> order to use the new kafka client.
>>
>> It's not quite clear to me how offsets are migrated - is there a specific
>> set of steps to ensure offsets are moved from the ZK based offsets into the
>> kafka based offsets?
>>
>> Or is the original configuration respected, and storm-kafka-client can
>> mostly be a drop in replacement?
>>
>> I want to avoid having spouts reset to the beginning of topics after
>> deployment, due to this change.
>>
>> Thanks.
>>
>> - Nasron
>>
>
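The manual migration Stig describes can be sketched roughly as follows. This Python sketch only covers turning the old offset znodes into a per-partition commit plan. It assumes the old ZK-based spout stores one JSON znode per partition (assumed to live under something like `<zkRoot>/<spout id>/partition_<N>`) containing at least `topic`, `partition` and `offset`. Fetching the znodes with a ZooKeeper client is left out, and `parse_old_offset` / `build_commit_plan` are hypothetical helper names.

```python
import json
from typing import Dict, Iterable, Tuple


def parse_old_offset(znode_data: bytes) -> Tuple[str, int, int]:
    """Return (topic, partition, offset) from one old-spout offset znode.

    Assumed layout (hypothetical example):
    {"topology": {...}, "offset": 42, "partition": 0, "broker": {...}, "topic": "docs"}
    """
    record = json.loads(znode_data.decode("utf-8"))
    return record["topic"], int(record["partition"]), int(record["offset"])


def build_commit_plan(znodes: Iterable[bytes]) -> Dict[Tuple[str, int], int]:
    """Map (topic, partition) -> offset to commit under the new consumer group."""
    plan: Dict[Tuple[str, int], int] = {}
    for data in znodes:
        topic, partition, offset = parse_old_offset(data)
        key = (topic, partition)
        # Refuse to guess if two znodes disagree about the same partition.
        if key in plan and plan[key] != offset:
            raise ValueError(f"conflicting offsets for {key}: {plan[key]} vs {offset}")
        plan[key] = offset
    return plan


if __name__ == "__main__":
    old = [b'{"topology": {"id": "t-1", "name": "docs-topology"}, "offset": 42, '
           b'"partition": 0, "broker": {"host": "kafka1", "port": 9092}, "topic": "docs"}']
    print(build_commit_plan(old))  # {('docs', 0): 42}
```

The resulting map would then be committed under the new spout's consumer group while the topologies are stopped, e.g. with the Java client's `KafkaConsumer#commitSync(Map<TopicPartition, OffsetAndMetadata>)`. Kafka treats a committed offset as the next offset to read, so it is worth checking which meaning the old spout's stored value has before committing, and confirming that the new spout's first-poll offset strategy (for example `UNCOMMITTED_EARLIEST`) honours committed offsets.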
