skonto edited a comment on issue #24613: [SPARK-27549][SS] Add support for committing kafka offsets per batch for supporting external tooling

URL: https://github.com/apache/spark/pull/24613#issuecomment-494092510

@gaborgsomogyi @HeartSaVioR what if I introduce queryOptions at the `query.start()` call as optional parameters, so that I can pass a query-specific unique group id [here](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L101). For example:
```
query1.start(id1)
query2.start(id2)
```
This way we don't share anything unless we want to, and the user has to pass the group id explicitly per query if they want to integrate with external monitoring. Thoughts? This would also make the per-query config more flexible in the future, in case we want to add more query-specific options.

> query ID might be considered as unique group id since it can provide both unique and continuous, but it should consider the case where multiple Kafka sources are being used in same query.

In that case we could add an increasing id as the sources are registered, assuming the code is not modified so that the registration order stays the same on restart.

I can do either, but let me know which option is viable here. I prefer the first one, since it matches what monitoring tools expect, though it may be too intrusive for Spark; the second one is also possible.
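As a rough illustration of the second option, here is a minimal Scala sketch of deriving a deterministic group id per Kafka source from a query-level group id plus the source's registration index. All names (`GroupIdPerSource`, `assignGroupIds`) are hypothetical, not part of Spark's actual API; the point is just that the ids stay stable only if the registration order is the same on restart.

```scala
// Hypothetical helper, NOT Spark's actual API: derive one unique Kafka
// group id per source within a query by suffixing the source's
// registration index onto a user-supplied query-level group id.
object GroupIdPerSource {
  def groupIdFor(queryGroupId: String, sourceIndex: Int): String =
    s"$queryGroupId-source-$sourceIndex"

  // Ids are assigned in registration order; a restart must register the
  // sources in the same order for the ids to remain stable.
  def assignGroupIds(queryGroupId: String, numSources: Int): Seq[String] =
    (0 until numSources).map(groupIdFor(queryGroupId, _))
}
```

With this scheme two queries started with distinct query-level ids never share a group id, even when each query reads from several Kafka sources.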
---
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
---
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
