skonto edited a comment on issue #24613: [SPARK-27549][SS] Add support for committing kafka offsets per batch for supporting external tooling
URL: https://github.com/apache/spark/pull/24613#issuecomment-494313373

@gaborgsomogyi

> Something like monitoringGroupId

One option for passing a fully predictable groupId is at the point when the query object is created (`start`), but it seems I can do that as well: options are defined per source, from what I see, and can be used when the set relations are transformed into the logical plan.

@HeartSaVioR

> EDIT: This case might still be an issue even if the query ID is used as the consumer ID - it may also occur when queries are started/restarted in quick succession.

Could you elaborate a bit more on this? Since the streaming object is kept around on a quick restart, I think we could just start assigning the ids again from the beginning. Do you mean there would be issues because the same id is re-used within a short period of time? In that case I think the brokers are notified, since the consumer is closed during a graceful shutdown of the query.

My decision is to do the following:
a) Allow the user to pass a groupId per source via `monitoringGroupId`, which takes effect only when the user also sets the `commitOffsetsOnCheckpoints` flag.
b) On query restarts we start re-assigning the ids. When queries run in parallel, we make sure the ids are distinct unless the user makes them identical on purpose.
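To make (a) and (b) concrete, here is a minimal sketch of how per-source consumer group ids could be assigned. The option names `monitoringGroupId` and `commitOffsetsOnCheckpoints` come from this thread; everything else (the function, the id format, the collision-avoidance suffix) is a hypothetical illustration, not Spark's actual implementation:

```python
# Illustrative sketch only (not Spark code): assign a Kafka consumer group id
# per source of a query so that restarts can reuse ids deterministically while
# parallel queries stay distinct. All names and formats here are assumptions.

def assign_group_id(monitoring_group_id, query_run_id, source_index, taken_ids):
    """Return a consumer group id for one Kafka source of one query.

    taken_ids: set of group ids already in use by queries running in parallel.
    """
    if monitoring_group_id is not None:
        # (a) user-supplied id, intended to be honored only together with the
        # commitOffsetsOnCheckpoints flag; the user may deliberately share it
        # across queries, so no uniqueness check is applied here.
        return f"{monitoring_group_id}-{source_index}"
    # (b) otherwise derive a deterministic id from the run id and source index,
    # then disambiguate against ids already taken by parallel queries.
    base = f"spark-kafka-source-{query_run_id}-{source_index}"
    candidate, suffix = base, 0
    while candidate in taken_ids:
        suffix += 1
        candidate = f"{base}-{suffix}"
    taken_ids.add(candidate)
    return candidate
```

On a quick restart the same `query_run_id` and `source_index` reproduce the same base id, matching the "start re-assigning the ids from the beginning" behavior described above.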
