skonto edited a comment on issue #24613: [SPARK-27549][SS] Add support for committing kafka offsets per batch for supporting external tooling
URL: https://github.com/apache/spark/pull/24613#issuecomment-493974211

> Why not use one group and listConsumerGroupOffsets?

@gaborgsomogyi yes, `listConsumerGroupOffsets` could be used, but is it implemented for most clients? Check [here](https://github.com/edenhill/librdkafka/issues/2173). If people use the admin client, it makes sense. It seems to make two [calls](https://github.com/apache/kafka/blob/3b1524c5dfd2a94f3fb919dad0de70984963772b/clients/src/main/java/org/apache/kafka/clients/admin/KafkaAdminClient.java#L2776): first it finds the coordinator, then it lists the offsets.

On the other hand, when I say filtering I don't mean filtering the whole topic. It would mean picking up from the latest offsets and, as new records come into that topic, processing them or not based on the filter. Of course, that could also be slow. Anyway, I don't have a clear view of the performance at the moment, but I don't mind switching to supporting that special call if possible. @seglo thoughts? What are you using to get the offsets?

When you say one group, what do you mean? If I create the same groupId per source per query, then partial data may be assigned, as described [here](https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L81). So it does not make sense for multiple queries running in parallel. If the same query is restarted and using the same groupId does not create issues, then I could do it by checkpointing that info, e.g. by enforcing a unique groupId equal to the query ID that is persisted across restarts in the metadata dir.
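
To make the last idea concrete, here is a minimal sketch of deriving a group ID from a query ID that survives restarts. It is not Spark's implementation: the `query_id` file name, the `spark-kafka-source` prefix, and both function names are hypothetical, and the directory only stands in for the query's metadata dir.

```python
import uuid
from pathlib import Path


def load_or_create_query_id(metadata_dir: str) -> str:
    """Return the query ID stored in metadata_dir, creating one on first run."""
    directory = Path(metadata_dir)
    directory.mkdir(parents=True, exist_ok=True)
    id_file = directory / "query_id"  # hypothetical file name, not Spark's layout
    if id_file.exists():
        # Restart: reuse the persisted ID so the group ID stays the same.
        return id_file.read_text(encoding="utf-8").strip()
    # First run: generate a fresh ID and persist it for later restarts.
    query_id = str(uuid.uuid4())
    id_file.write_text(query_id, encoding="utf-8")
    return query_id


def group_id_for_query(metadata_dir: str, prefix: str = "spark-kafka-source") -> str:
    """Build a group ID that is unique per query but stable across restarts."""
    # The prefix is an assumption made for illustration only.
    return f"{prefix}-{load_or_create_query_id(metadata_dir)}"
```

With this shape, two queries using different metadata dirs get different group IDs, so they would not split partitions between them, while a restarted query pointing at the same dir gets the group ID it had before.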
