skonto edited a comment on issue #24613: [SPARK-27549][SS] Add support for 
committing kafka offsets per batch for supporting external tooling
URL: https://github.com/apache/spark/pull/24613#issuecomment-494520497
 
 
   @zsxwing 
   
   > The inconsistency between Spark's checkpoint and Kafka commit metadata may 
be confusing
   
   I could move the commit code in `runActivatedStream` to the end, after 
progress is reported and the trigger has finished, and eliminate the issue.
   
   > This one also only works in micro-batch mode.
   
   Given that the old streaming API is not even officially EOL AFAIK, 
continuous processing is experimental, and micro-batch mode is the supported, 
production-ready one, I see supporting this first in micro-batch mode as the 
reasonable choice. 
   
   > In addition, it seems hard to reason about the case such as self-join and 
self-union. I prefer to not add this into Spark.
   
   What is the issue with this case? Could you elaborate? In a self-join 
you only have one consumer, no?
   
   > I prefer to not add this into Spark. You can just implement this inside a 
StreamingQueryListener and expose the listener as a library
   
   We discussed this in the related JIRA. Certainly it can be done this way. 
Since this is done in Flink, why not do it in Spark, assuming there are no 
issues?
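   For context, a listener-based approach would, on each `QueryProgressEvent`, 
read the Kafka source's `endOffset` JSON from the progress report and commit 
those offsets back via `KafkaConsumer.commitSync`. A minimal sketch of the 
parsing step, using only the standard library (the function name is 
illustrative, not part of any Spark API):

   ```python
   import json

   def parse_end_offsets(end_offset_json):
       """Parse the endOffset JSON that Spark's Kafka source reports in a
       StreamingQueryProgress, e.g. '{"topic-a": {"0": 42, "1": 17}}'.

       Returns a {(topic, partition): offset} map. In a real listener these
       pairs would be handed to KafkaConsumer.commitSync.
       """
       offsets = json.loads(end_offset_json)
       return {(topic, int(partition)): offset
               for topic, partitions in offsets.items()
               for partition, offset in partitions.items()}

   # Example progress payload from a query reading one topic:
   parse_end_offsets('{"topic-a": {"0": 42, "1": 17}}')
   # → {("topic-a", 0): 42, ("topic-a", 1): 17}
   ```

   The point of the sketch is that all the information a library would need is 
already exposed through the listener, which is why the listener route works at 
all; the question is only whether it should live inside Spark.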
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
