Thanks Bill, my apologies for not elaborating on my use case.

In my use case, data from Cassandra is pushed to Kafka, and we then consume
from Kafka into Snowflake. Once we push the data to Snowflake, we do not want
to go back to the source (Cassandra) to pull it again. There are occasions
where we are asked to pull the data for a certain date and time; I thought
storing the offset would help with that use case. The other item is our
validation framework. We need to validate that we are processing all the rows
that Cassandra pushes to Kafka, so the validation program needs to look at the
number of rows in Cassandra for a particular key and check that we have that
many messages in Kafka and Snowflake for that key.
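A minimal sketch of that validation check, in pure Python. How the per-key counts are fetched from Cassandra, Kafka, and Snowflake is out of scope here; the dicts below are assumed inputs, not a real client API:

```python
# Hedged sketch: given per-key row/message counts pulled from each
# system, report any key whose counts disagree across the pipeline.
def find_count_mismatches(cassandra_counts, kafka_counts, snowflake_counts):
    mismatches = {}
    for key, expected in cassandra_counts.items():
        in_kafka = kafka_counts.get(key, 0)
        in_snowflake = snowflake_counts.get(key, 0)
        if not (expected == in_kafka == in_snowflake):
            mismatches[key] = (expected, in_kafka, in_snowflake)
    return mismatches

# Example: key "b" lost a row somewhere between Kafka and Snowflake.
print(find_count_mismatches(
    {"a": 10, "b": 5},
    {"a": 10, "b": 5},
    {"a": 10, "b": 4},
))  # → {'b': (5, 5, 4)}
```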


Thanks
Rajib

-----Original Message-----
From: Bill Bejeck <b...@confluent.io> 
Sent: Tuesday, May 12, 2020 7:41 AM
To: users@kafka.apache.org
Subject: Re: Offset Management...

Hi Rajib,

Generally, it's best to let Kafka handle the offset management.
Under normal circumstances, when you restart a consumer, it will start reading
records from the last committed offset; there's no need for you to manage that
process yourself.
If you need to commit offsets manually instead of using auto-commit, you can
use one of the commit API methods: commitSync
<https://kafka.apache.org/25/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#commitSync-->
or commitAsync
<https://kafka.apache.org/25/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#commitAsync-org.apache.kafka.clients.consumer.OffsetCommitCallback->.
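To make the manual-commit pattern concrete, a hedged Python sketch follows. `consumer` is assumed to expose `poll()` and `commit()` in the style of kafka-python's `KafkaConsumer` (created with `enable_auto_commit=False`), and `handle` is a hypothetical per-record callback:

```python
# Sketch only: commit offsets synchronously after all records from a
# poll() have been processed, so a restarted consumer resumes from the
# last committed offset. Assumes kafka-python-style poll()/commit().
def consume_batch_with_manual_commit(consumer, handle):
    records = consumer.poll(timeout_ms=1000)  # {TopicPartition: [records]}
    processed = 0
    for partition_records in records.values():
        for record in partition_records:
            handle(record)
            processed += 1
    if processed:
        consumer.commit()  # synchronous, analogous to Java's commitSync()
    return processed
```

Committing only after the whole batch is handled gives at-least-once delivery: a crash before the commit means those records are re-read on restart.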

-Bill


On Mon, May 11, 2020 at 9:52 PM Rajib Deb <rajib_...@infosys.com> wrote:

> Hi, I wanted to know if it is a good practice to develop a custom 
> offset management method while consuming from Kafka. I am thinking of 
> developing it as below.
>
>
>   1.  Create a PartitionInfo named tuple as below:
>
> PartitionInfo = namedtuple("PartitionInfo", ["header", "custom_writer", "offset"])
>
>   2.  Then populate the tuple with the header, writer and last offset 
> details
>   3.  Write the tuple to a file/database once the consumer commits the 
> message
>   4.  Next time the consumer starts, it checks the last offset and 
> reads from there
>
> Thanks
> Rajib
>
>
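The steps Rajib lists above could be sketched as follows. This is one possible reading, persisting to a local JSON file; the field names follow the email, and the file path is arbitrary:

```python
import json
from collections import namedtuple

# Hedged sketch of the proposed custom offset store: keep the last
# committed offset (plus header and writer details) in a JSON file,
# and read it back on startup.
PartitionInfo = namedtuple("PartitionInfo", ["header", "custom_writer", "offset"])

OFFSET_FILE = "last_offset.json"

def save_offset(info, path=OFFSET_FILE):
    # Called after the consumer commits a message (step 3).
    with open(path, "w") as f:
        json.dump(info._asdict(), f)

def load_offset(path=OFFSET_FILE):
    # On startup, resume from the stored offset if present (step 4).
    try:
        with open(path) as f:
            return PartitionInfo(**json.load(f))
    except FileNotFoundError:
        return None
```

As Bill notes, Kafka's own committed offsets usually make this unnecessary; an external store like this mainly helps if you need to audit or replay from outside the consumer group.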
