gaborgsomogyi commented on pull request #31944: URL: https://github.com/apache/spark/pull/31944#issuecomment-808166003
> Take Kafka as an example: we can apply a read limit while consuming the offsets, so we only consume a certain number of offsets even though more data is available in Kafka. The same applies to all the other streaming sources. Some users want to know through the listener whether they are falling behind, so they can adjust the cluster size accordingly.

If I understand correctly, the plan is to monitor whether Spark is falling behind in Kafka processing. If that's true, there is an existing solution for this which works like a charm: the user can commit the offsets back to Kafka with a listener, and the delta between the available and the committed offsets can be monitored. If this were the only use case, I'm not 100% sure it's worth the ~300-line change set in the Kafka part. I'd like to emphasize that I'm not against making this better; I'd just like to see a bit more from a use-case perspective.
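
For reference, here is a minimal sketch of the listener pattern I mean (the class name, group id, and the JSON parsing helper are illustrative assumptions, not part of this PR): a `StreamingQueryListener` that commits the offsets Spark has finished processing back to Kafka under a dedicated consumer group, so lag can be watched with standard Kafka tooling.

```scala
import java.util.Properties
import scala.collection.JavaConverters._

import org.apache.kafka.clients.consumer.{KafkaConsumer, OffsetAndMetadata}
import org.apache.kafka.common.TopicPartition
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._
import org.json4s._
import org.json4s.jackson.JsonMethods

// Commits the offsets Spark has processed back to Kafka under a dedicated
// consumer group, so the delta between available and committed offsets can
// be monitored externally. Class name and group id are illustrative.
class OffsetCommitListener(bootstrapServers: String, groupId: String)
    extends StreamingQueryListener {

  private lazy val consumer = {
    val props = new Properties()
    props.put("bootstrap.servers", bootstrapServers)
    props.put("group.id", groupId)
    props.put("key.deserializer",
      "org.apache.kafka.common.serialization.ByteArrayDeserializer")
    props.put("value.deserializer",
      "org.apache.kafka.common.serialization.ByteArrayDeserializer")
    new KafkaConsumer[Array[Byte], Array[Byte]](props)
  }

  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()

  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    // Assumes every source in the query is a Kafka source; a production
    // listener should check source.description before parsing.
    event.progress.sources.foreach { source =>
      val offsets = Option(source.endOffset)
        .map(parseEndOffset)
        .getOrElse(Map.empty)
      if (offsets.nonEmpty) consumer.commitSync(offsets.asJava)
    }
  }

  // The Kafka source reports endOffset as JSON like {"topic":{"0":42}};
  // turn it into TopicPartition -> OffsetAndMetadata for the commit.
  private def parseEndOffset(json: String): Map[TopicPartition, OffsetAndMetadata] =
    JsonMethods.parse(json) match {
      case JObject(topics) =>
        topics.flatMap {
          case (topic, JObject(partitions)) =>
            partitions.collect { case (partition, JInt(offset)) =>
              new TopicPartition(topic, partition.toInt) ->
                new OffsetAndMetadata(offset.longValue)
            }
          case _ => Nil
        }.toMap
      case _ => Map.empty
    }
}
```

Registering it is a one-liner (`spark.streams.addListener(new OffsetCommitListener("broker:9092", "spark-lag-monitor"))`), and the lag then shows up in any standard tool, e.g. `kafka-consumer-groups.sh --bootstrap-server broker:9092 --describe --group spark-lag-monitor`, without any change on the Spark side.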
