yijiacui-db commented on pull request #31944:
URL: https://github.com/apache/spark/pull/31944#issuecomment-806074981


   > The latest offset in source progress is the latest offset available in the 
source, not the latest consumed offset by the stream.
   
   @viirya That's a good point. I used the latest consumed offset in the 
metrics method without realizing that the Kafka source already reports the 
latest available offset through reportLatestOffset. The metrics interface is 
still more general, though: it also covers sources that don't implement 
reportLatestOffset and so have no latest available offset in the source 
progress.
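   
   To make the distinction concrete, here is a rough sketch of how a source 
could turn the two offsets into a lag metric. It is only an illustration: the 
class name, the method signature, and the metric key are placeholders assumed 
for this example, not the actual interfaces in this PR.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a source that reports custom metrics.
// Names here are illustrative and are not the real Spark interfaces.
class OffsetLagSketch {

    // Computes the largest per-partition gap between the latest offset
    // available in the source and the latest offset the stream has consumed.
    static Map<String, String> metrics(
            Map<Integer, Long> latestAvailable,
            Optional<Map<Integer, Long>> latestConsumed) {
        Map<String, String> result = new HashMap<>();
        if (!latestConsumed.isPresent()) {
            return result; // nothing consumed yet, so there is no lag to report
        }
        long maxLag = 0L;
        for (Map.Entry<Integer, Long> e : latestConsumed.get().entrySet()) {
            Long available = latestAvailable.get(e.getKey());
            if (available == null) {
                continue; // partition unknown to the source, skip it
            }
            maxLag = Math.max(maxLag, available - e.getValue());
        }
        // Metric key is an assumed name used only for this sketch.
        result.put("maxOffsetsBehindLatest", Long.toString(maxLag));
        return result;
    }
}
```

   A source that already exposes its latest available offset in the progress 
lets users derive this lag themselves, which is why the metric looks 
redundant for Kafka but not for sources without that report.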
   
   It's definitely true that for the Kafka source this API isn't really 
necessary, because the latest offset is already reported. @zsxwing @tdas Do 
you think we should remove this API for the Kafka source since it's 
duplicated? And if so, do we still want to merge only the metrics API into 
apache/spark?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


