Hi guys,

I am using Apache Beam 2.2.0 in production, reading from Kafka and writing to
BigQuery.

In your KafkaIO documentation you mention that one can enable auto-commit of
offsets by setting ENABLE_AUTO_COMMIT_CONFIG to true.
But I noticed in the Dataflow logs that my consumer config always has this
value set to false, even though I set it to true in my code (via the
updateConsumerProperties method).
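
For reference, this is roughly how I build the properties map that I hand to
updateConsumerProperties (a minimal stdlib-only sketch; I'm using the literal
key "enable.auto.commit", which as far as I know is what the
ENABLE_AUTO_COMMIT_CONFIG constant expands to):

```java
import java.util.HashMap;
import java.util.Map;

public class ConsumerProps {
    // Builds the consumer properties I pass to KafkaIO's updateConsumerProperties.
    static Map<String, Object> buildConsumerProps() {
        Map<String, Object> props = new HashMap<>();
        // "enable.auto.commit" is the plain Kafka key behind ENABLE_AUTO_COMMIT_CONFIG.
        props.put("enable.auto.commit", true);
        return props;
    }

    public static void main(String[] args) {
        // In the pipeline this map goes into KafkaIO.read().updateConsumerProperties(...),
        // yet the Dataflow logs still show enable.auto.commit=false.
        System.out.println(buildConsumerProps());
    }
}
```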

I had a look at the source code and noticed that in the KafkaIO class, at line
1153, auto-commit is hardcoded to false.
This means the documentation is not in line with the implementation.

I view this as a problem, because I have already run into situations where I
couldn't redeploy my pipeline with the --update flag due to incompatible
changes, but I still didn't want to reprocess old data. In that case I would
have to use the earliest offset reset, which would force me to reprocess all
the messages in my topic. I cannot use the latest offset reset, because I
would lose data. KafkaIO is also not manually committing those offsets on
successful checkpoints, which would definitely be the optimal solution, as it
is handled in Flink.
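
To make the trade-off concrete: with no committed offsets, a fresh (non-updated)
deploy falls back to the consumer's reset policy. A small sketch, again using
the plain Kafka key "auto.offset.reset" (an assumption on my part that this is
the relevant consumer property):

```java
import java.util.HashMap;
import java.util.Map;

public class OffsetReset {
    // With no committed offsets, "auto.offset.reset" decides where a new
    // consumer group starts: "earliest" reprocesses the whole topic,
    // "latest" skips (and therefore loses) anything produced before startup.
    static Map<String, Object> resetPolicy(String policy) {
        Map<String, Object> props = new HashMap<>();
        props.put("auto.offset.reset", policy);
        return props;
    }

    public static void main(String[] args) {
        // Neither option is acceptable for me: earliest duplicates work,
        // latest drops data.
        System.out.println(resetPolicy("earliest"));
        System.out.println(resetPolicy("latest"));
    }
}
```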

Do you have any suggestions on how to solve this problem?

Thanks,
Kamil.


-- 
Kamil DziubliƄski
Big Data Engineer

Azimo
173 Upper Street
London, N1 1RG

azimo.com
facebook.com/azimomoney
twitter.com/azimo
