Thanks Bill, and my apologies for not elaborating on my use case. In our pipeline, data from Cassandra is pushed to Kafka, and we then consume from Kafka into Snowflake. Once the data is in Snowflake, we do not want to go back to the source (Cassandra) to pull it again. There are occasions where we are asked to pull the data for a certain date and time, and I thought storing the offset would help with that case. The other item is our validation framework: we need to validate that we are processing all the rows that Cassandra pushes to Kafka. So the validation program needs to look at the number of rows in Cassandra for a particular key and confirm that we have that many messages in Kafka and Snowflake for that key.
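For the "pull the data for a certain date and time" part of this use case, Kafka's consumer API can already map a timestamp to an offset (`offsetsForTimes` in the Java client, `offsets_for_times` in kafka-python), so a custom offset store may not be needed just for replay. A minimal sketch, assuming the kafka-python client and a hypothetical topic/broker (those names are not from the thread):

```python
from datetime import datetime, timezone

def ts_millis(dt):
    """Kafka's time index is keyed by epoch milliseconds."""
    return int(dt.timestamp() * 1000)

# Hypothetical usage against a running broker (topic/server names assumed):
#   from kafka import KafkaConsumer, TopicPartition
#   consumer = KafkaConsumer(bootstrap_servers="localhost:9092")
#   tp = TopicPartition("cassandra-events", 0)
#   consumer.assign([tp])
#   start = ts_millis(datetime(2020, 5, 1, tzinfo=timezone.utc))
#   offsets = consumer.offsets_for_times({tp: start})
#   if offsets[tp] is not None:
#       consumer.seek(tp, offsets[tp].offset)  # replay from that date/time
```

The broker returns the earliest offset whose timestamp is >= the requested one, so re-pulling a date range never requires going back to Cassandra.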
Thanks
Rajib

-----Original Message-----
From: Bill Bejeck <b...@confluent.io>
Sent: Tuesday, May 12, 2020 7:41 AM
To: users@kafka.apache.org
Subject: Re: Offset Management...

Hi Rajib,

Generally, it's best to let Kafka handle offset management. Under normal circumstances, when you restart a consumer, it will start reading records from the last committed offset; there's no need for you to manage that process yourself. If you need to commit records manually vs. using auto-commit, you can use one of the commit API methods, commitSync <https://kafka.apache.org/25/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#commitSync--> or commitAsync <https://kafka.apache.org/25/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#commitAsync-org.apache.kafka.clients.consumer.OffsetCommitCallback->.

-Bill

On Mon, May 11, 2020 at 9:52 PM Rajib Deb <rajib_...@infosys.com> wrote:
> Hi, I wanted to know if it is a good practice to develop a custom
> offset management method while consuming from Kafka. I am thinking of
> developing it as below:
>
> 1. Create a PartitionInfo named tuple, e.g.
>    PartitionInfo = namedtuple("PartitionInfo", ["header", "custom_writer", "offset"])
> 2. Populate the tuple with the header, writer, and last offset details
> 3. Write the tuple to a file/database once the consumer commits the
>    message
> 4. Next time the consumer starts, it checks the last offset and
>    reads from there
>
> Thanks
> Rajib
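The four quoted steps can be sketched in Python. This is only an illustration of the proposal, not a recommended pattern (as Bill notes, committed offsets in Kafka usually make this unnecessary); the file name, field names, and the kafka-python `seek` call in the comment are assumptions:

```python
import json
from collections import namedtuple
from pathlib import Path

# Step 1: the named tuple from the proposal ("custom writer" renamed to a
# valid Python identifier)
PartitionInfo = namedtuple("PartitionInfo", ["header", "custom_writer", "offset"])

def save_position(path, info):
    """Step 3: persist the last processed position after the commit."""
    Path(path).write_text(json.dumps(info._asdict()))

def load_position(path):
    """Step 4: on restart, recover the last position (None on first run)."""
    p = Path(path)
    if not p.exists():
        return None
    return PartitionInfo(**json.loads(p.read_text()))

# With kafka-python (assumed client), the consumer would seek on startup:
#   info = load_position("offsets.json")
#   if info is not None:
#       consumer.seek(TopicPartition(topic, partition), info.offset + 1)
```

Note the `+ 1` when seeking: the stored offset is the last message processed, so consumption resumes at the next one, mirroring Kafka's own committed-offset semantics.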