skonto edited a comment on issue #24613: [SPARK-27549][SS] Add support for committing kafka offsets per batch for supporting external tooling
URL: https://github.com/apache/spark/pull/24613#issuecomment-494520497

@zsxwing

> The inconsistency between Spark's checkpoint and Kafka commit metadata may be confusing

I could move the commit code in `runActivatedStream` to the end, after progress is reported and the trigger has finished, which would eliminate the issue.

> This one also only works in micro-batch mode.

Given that the old streaming API is not even officially EOL AFAIK, and continuous streaming is experimental, micro-batch mode is the supported, production-ready one, so supporting it first seems like the reasonable choice.

> In addition, it seems hard to reason about the case such as self-join and self-union. I prefer to not add this into Spark.

What is the issue with this case? Could you elaborate? In a self-join you only have one consumer, no?

> I prefer to not add this into Spark. You can just implement this inside a StreamingQueryListener and expose the listener as a library

We discussed this in the related JIRA. Certainly it can be done that way, with a `StreamingQueryListener` (a sketch follows below). However, since Flink does this, why not do it in Spark and help users, assuming there are no issues? Why should Spark require special handling when integrating with monitoring tooling for Kafka consumer progress?
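For reference, here is a minimal sketch of the `StreamingQueryListener` workaround mentioned above, not part of this PR. It parses the `endOffset` JSON that each Kafka source reports in the query progress and commits those offsets to a consumer group through Kafka's `AdminClient` (its `alterConsumerGroupOffsets` call requires Kafka 2.4+). The group id, bootstrap servers, and the `"Kafka"` description filter are illustrative assumptions:

```scala
import java.util.Properties

import scala.collection.JavaConverters._

import org.apache.kafka.clients.admin.AdminClient
import org.apache.kafka.clients.consumer.OffsetAndMetadata
import org.apache.kafka.common.TopicPartition
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._
import org.json4s._
import org.json4s.jackson.JsonMethods._

/**
 * Commits the end offsets of every Kafka source to an external consumer
 * group after each micro-batch, so tools that track consumer lag can see
 * the query's progress. Sketch only: bootstrapServers and groupId are
 * placeholders supplied by the caller.
 */
class KafkaOffsetCommitListener(bootstrapServers: String, groupId: String)
    extends StreamingQueryListener {

  private implicit val formats: Formats = DefaultFormats

  private lazy val admin: AdminClient = {
    val props = new Properties()
    props.put("bootstrap.servers", bootstrapServers)
    AdminClient.create(props)
  }

  override def onQueryStarted(event: QueryStartedEvent): Unit = ()

  override def onQueryTerminated(event: QueryTerminatedEvent): Unit =
    admin.close()

  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    event.progress.sources
      // Assumption: Kafka sources can be identified by their description.
      .filter(_.description.contains("Kafka"))
      .foreach { source =>
        // endOffset is JSON of the form {"topicA":{"0":123,"1":456}, ...},
        // where each value is the next offset to consume -- the same
        // convention Kafka uses for committed offsets.
        val endOffsets =
          parse(source.endOffset).extract[Map[String, Map[String, Long]]]
        val commits = endOffsets.flatMap { case (topic, partitions) =>
          partitions.map { case (partition, offset) =>
            new TopicPartition(topic, partition.toInt) ->
              new OffsetAndMetadata(offset)
          }
        }.asJava
        // Requires Kafka 2.4+; blocks until the commit is acknowledged.
        admin.alterConsumerGroupOffsets(groupId, commits).all().get()
      }
  }
}
```

Registering it is a one-liner, e.g. `spark.streams.addListener(new KafkaOffsetCommitListener("localhost:9092", "spark-kafka-offsets"))`; the point of this PR is that users should not each have to write and maintain this themselves.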
