Hi all, I am using Apache Beam 2.2.0 in production, reading from Kafka and writing to BigQuery.
The KafkaIO documentation mentions that one can enable auto-commit of offsets by setting ENABLE_AUTO_COMMIT_CONFIG to true. However, I noticed in the Dataflow logs that my consumer config always has this value set to false, even though I set it to true in my code (via the updateConsumerProperties method). I had a look at the source code and saw that the KafkaIO class hard-codes auto-commit to false (line nr: 1153), which means the documentation is not in line with the implementation.

I see this as a problem, because I have already run into situations where I could not redeploy my pipeline with the --update flag due to incompatible changes, but I still did not want to reprocess old data. In that case I would have to use the "earliest" offset reset, which would force me to reprocess all the messages in my topic. I cannot use "latest", because I would lose data. KafkaIO also does not manually commit offsets on successful checkpoints, which would definitely be the most robust solution, and is how Flink handles this.

Do you have any suggestions on how to solve this problem?

Thanks,
Kamil

--
Kamil Dziubliński
Big Data Engineer
Azimo
173 Upper Street, London, N1 1RG
azimo.com | facebook.com/azimomoney | twitter.com/azimo
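P.S. For reference, this is roughly how I try to enable auto-commit in my pipeline (a simplified sketch, not my actual job; the bootstrap servers, topic name, and deserializers are placeholders):

```java
import com.google.common.collect.ImmutableMap;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaAutoCommitExample {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply(KafkaIO.<String, String>read()
        .withBootstrapServers("kafka:9092")            // placeholder
        .withTopic("my-topic")                         // placeholder
        .withKeyDeserializer(StringDeserializer.class)
        .withValueDeserializer(StringDeserializer.class)
        // Here I ask Kafka to auto-commit offsets, but the consumer
        // config in the Dataflow logs still shows this as false.
        .updateConsumerProperties(ImmutableMap.<String, Object>of(
            ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true))
        .withoutMetadata());

    p.run();
  }
}
```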
