skonto edited a comment on issue #24613: [SPARK-27549][SS] Add support for committing kafka offsets per batch for supporting external tooling
URL: https://github.com/apache/spark/pull/24613#issuecomment-494092510

@gaborgsomogyi @HeartSaVioR what if I introduce queryOptions at the `query.start()` call as optional parameters, so that I can pass a query-specific unique gId [here](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L101)? For example:

```
query1.start(id1)
query2.start(id2)
```

This way we don't have to share anything between queries, and the user has to pass the gId explicitly per query if they want to integrate with external monitoring. Thoughts? It would also make the per-query config more flexible in the future, in case we want to add more query-specific options.

> query ID might be considered as unique group id since it can provide both unique and continuous, but it should consider the case where multiple Kafka sources are being used in same query.

In that case we could add an increasing id as sources get registered, assuming the code is not modified, so that the registration order stays the same on restart.

Let me know which option is viable here. I prefer extending the API, as it provides maximum flexibility, but modifying a public API may be too intrusive for Spark. In that case, an auto-generated gId based on the query id would solve the issues. The problem with auto-generated gIds is that you need to discover them first in order to use them elsewhere, e.g. on the monitoring side. With static, fixed ids it's easier, and right now Spark offers no option for this that is also safe in every case. The goal is not to make anything the default; it would be turned on or off via a flag.
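To make the two options concrete, here is a minimal sketch of how a per-source gId could be resolved. Everything in it is an assumption for illustration: `GroupIdSketch`, `groupId`, and the `spark-kafka-source` prefix are hypothetical names, not existing Spark API. It only shows an explicit per-query gId taking precedence over one derived from the query id, with the source registration index appended in both cases.

```scala
import java.util.UUID

// Hypothetical helper, not part of Spark: it only illustrates the naming
// schemes discussed in this comment.
object GroupIdSketch {
  // Assumed prefix for auto-generated ids; chosen for illustration only.
  val AutoPrefix = "spark-kafka-source"

  // Returns the group id for one Kafka source of a query.
  //  - userGroupId: explicit gId passed by the user for this query, if any
  //  - queryId:     the query id
  //  - sourceIndex: position of the source in registration order (0-based)
  def groupId(userGroupId: Option[String], queryId: UUID, sourceIndex: Int): String = {
    require(sourceIndex >= 0, "sourceIndex must be non-negative")
    // An explicit id wins; otherwise fall back to an id derived from the query id.
    val base = userGroupId.getOrElse(s"$AutoPrefix-$queryId")
    // The index keeps multiple Kafka sources in the same query distinct.
    s"$base-$sourceIndex"
  }
}
```

With this sketch, `GroupIdSketch.groupId(Some("id1"), queryId, 0)` yields `id1-0`, which a monitoring tool can know in advance, while passing `None` yields an id that has to be discovered from the query id first. Both forms stay the same on restart only if the sources are registered in the same order.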
