Which runner are you using? Can you elaborate a bit more on what you mean by 'executor/thread shutting down'? If the KafkaIO reader shuts down cleanly, it closes the consumer cleanly, which triggers an auto commit. But if you shut down a pipeline, it might not close the consumers cleanly. What auto_commit interval do you have? Please note that there is no way to coordinate consistency between a Beam pipeline and externally maintained auto-committed offsets, since they live outside Beam. The 'Drain' feature in Dataflow can help (it allows a clean shutdown of the pipeline). Also note that many runners provide clean ways to update a pipeline while keeping all the state from the previous run (in this case, the Kafka offsets), which is the only way for Beam to provide its processing guarantees across runs.
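For reference, here is a minimal sketch of the consumer settings in question, using only standard Kafka consumer configuration keys. The class name, group name, and interval value are hypothetical placeholders, not taken from your pipeline. I am assuming the resulting map is handed to KafkaIO through its consumer-properties hook (`updateConsumerProperties` in the current Java SDK, as far as I know); that call is left out so the snippet runs without Beam on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

public class AutoCommitConfigSketch {
    // Hypothetical group name. Keeping it in a single constant avoids the
    // "typo in the consumer group name upon restart" failure mode.
    static final String GROUP_ID = "my-pipeline-group";

    // Builds the consumer properties relevant to auto commit.
    static Map<String, Object> consumerProperties(String groupId, int autoCommitIntervalMs) {
        Map<String, Object> props = new HashMap<>();
        props.put("group.id", groupId);
        // Let KafkaConsumer commit offsets periodically and on a clean close().
        props.put("enable.auto.commit", true);
        props.put("auto.commit.interval.ms", autoCommitIntervalMs);
        // Only consulted when the group has no committed offset for a partition.
        props.put("auto.offset.reset", "latest");
        return props;
    }

    public static void main(String[] args) {
        // 1000 ms is an illustrative value, not a recommendation.
        Map<String, Object> props = consumerProperties(GROUP_ID, 1000);
        props.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Note that `auto.offset.reset` only applies when the group has no committed offset, so a changed group name or offsets that were never committed would make the consumer fall back to that setting on restart.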
KafkaIO leaves auto_commit handling entirely to KafkaConsumer. If you are seeing that the consumer is not honoring the auto-committed offset, please check the KafkaConsumer log on the worker. The only user error I can think of is a typo in the consumer group name upon restart.

Currently KafkaIO does not actively participate in auto_commit management. It lets the user set the KafkaConsumer configuration directly. Maybe there is a case for more active support for auto_commit management. Please provide more details about your case so that we can discuss the actual specifics and potential improvements.

On Mon, Nov 6, 2017 at 8:15 AM, NerdyNick <[email protected]> wrote:

> There seems to be a lot of oddities with the auto offset committer and the
> watermark management as well as kafka offsets in general.
>
> One issue I keep having is the auto committer will just not commit any
> offsets. So the topic will look like it's backing up. From what I've been
> able to trace on it, it appears to be in relation to the executor/thread
> shutting down before the auto commit has a chance to run. Even though the
> min read times are set, it still prematurely shuts down. Turning the auto
> commit interval down seems to help but doesn't resolve the issue. It just
> seems to allow it to correct itself much quicker.
>
> Another I just had happen is after restarting a pipeline the auto
> committed offsets reset to the earliest record and the pipeline appears to
> be working on those records. Which is odd and contrary to a lot of things.
> When I shut the pipeline down it was only a few thousand records behind.
> The consumer is configured to start at the latest offset, not the earliest.
> Given that, it would appear the recorded watermarks had an odd corruption or
> something where they believed they were in the past.
>
> --
> Nick Verbeck - NerdyNick
> ----------------------------------------------------
> NerdyNick.com
> Coloco.ubuntu-rocks.org
>
